Grundlehren der mathematischen Wissenschaften A Series of Comprehensive Studies in Mathematics
Series editors M. Berger P. de la Harpe F. Hirzebruch N.J. Hitchin L. Hörmander A. Kupiainen G. Lebeau F.-H. Lin B.C. Ngô M. Ratner D. Serre Ya.G. Sinai N.J.A. Sloane A.M. Vershik M. Waldschmidt Editor-in-Chief A. Chenciner J. Coates S.R.S. Varadhan
104
For further volumes: http://www.springer.com/series/138
Kai Lai Chung
Markov Chains With Stationary Transition Probabilities
Second Edition
Kai Lai Chung (1917-2009) Stanford University, USA
ISSN 0072-7830 ISBN-13: 978-3-642-62017-1 e-ISBN-13: 978-3-642-62015-7 DOI: 10.1007/978-3-642-62015-7 Springer Heidelberg Dordrecht London New York Library of Congress Catalog Card Number: 66-25793 Mathematics Subject Classification (2010): 60-XX, 60J10
© by Springer-Verlag OHG, Berlin · Göttingen · Heidelberg 1960
© by Springer-Verlag, Berlin · Heidelberg 1967
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Cover design: VTEX, Vilnius
Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)
To my parents
Preface to the First Edition

The theory of Markov chains, although a special case of Markov processes, is here developed for its own sake and presented on its own merits. In general, the hypothesis of a denumerable state space, which is the defining hypothesis of what we call a "chain" here, generates more clear-cut questions and demands more precise and definitive answers. For example, the principal limit theorem (§§ I.6, II.10), still the object of research for general Markov processes, is here in its neat final form; and the strong Markov property (§ II.9) is here always applicable. While probability theory has advanced far enough that a degree of sophistication is needed even in the limited context of this book, it is still possible here to keep the proportion of definitions to theorems relatively low.

From the standpoint of the general theory of stochastic processes, a continuous parameter Markov chain appears to be the first essentially discontinuous process that has been studied in some detail. It is common that the sample functions of such a chain have discontinuities worse than jumps, and these baser discontinuities play a central role in the theory, of which the mystery remains to be completely unraveled. In this connection the basic concepts of separability and measurability, which are usually applied only at an early stage of the discussion to establish a certain smoothness of the sample functions, are here applied constantly as indispensable tools. Hence it is hoped that this book may also serve as an illustration of the modern rigorous approach to stochastic processes toward which there is still so much misgiving.

The two parts of the book, dealing respectively with a discrete and a continuous parameter, are almost independent. It was my original intention to write only the second part, preceded by whatever necessary material from the first.
As it turned out, I have omitted details of the continuous parameter analogues when they are obvious enough, in order to concentrate in Part II on those topics which have no counterparts in the discrete parameter case, such as the local properties of sample functions and of transition probability functions. It is these topics that make the continuous parameter case a relatively original and still challenging theory. Markov process is named after A. A. MARKOV who introduced the concept in 1907 with a discrete parameter and finite number of states. The denumerable case was launched by KOLMOGOROV in 1936, followed
closely by DOEBLIN whose contributions pervade all parts of the Markov theory. Fundamental work on continuous parameter chains was done by DOOB in 1942 and 1945; and in 1951 PAUL LEVY, with his unique intuition, drew a comprehensive picture of the field. The present work has grown out of efforts to consolidate and continue the pioneering work of these mathematicians. It is natural that I have based the exposition on my own papers, with major revisions and additions; in particular, the first few sections form an expansion of my lecture notes (mimeographed, Columbia University 1951) which have had some circulation. Quite a few new results, by myself and by colleagues subject to my propaganda, have been as it were made to order for this presentation. Historical comments and credit acknowledgements are to be found in the Notes at the end of the sections. But as a rule I do not try to assign priority to fairly obvious results; to do so would be to insult the intelligence of the reader as well as that of the authors involved.

This book presupposes no knowledge of Markov chains but it does assume the elements of general probability theory as given in a modern introductory course. Part I is on about the same mathematical level as FELLER'S Introduction to probability theory and its applications, vol. 1. For Part II the reader should know the elementary theory of real functions such as the oft-quoted theorems of DINI, FATOU, FUBINI and LEBESGUE. He should also be ready to consult, if not already familiar with, certain basic measure-theoretic propositions in DOOB's Stochastic processes. An attempt is made to isolate and expose [sic] the latter material, rather than to assure the reader that it is useless luxury. The mature reader can read Part II with only occasional references to Part I.

Markov chains have been used a good deal in applied probability and statistics. In these applications one is generally looking for something considerably more specific or rather more general.
In the former category belong finite chains, birth-and-death processes, etc.; in the latter belong various models involving a continuous state space subject to some discretization such as queueing problems. It should be clear that such examples cannot be adequately treated here. In general, the practical man in search of ready-made solutions to his own problems will discover in this book, as elsewhere, that mathematicians are more inclined to build fire stations than to put out fires. A more regrettable omission, from my point of view, is that of a discussion of semigroup or resolvent theory which is pertinent to the last few sections of the book. Let us leave it to another treatise by more competent hands. A book must be ended, but not without a few words about what lies beyond it. First, sporadic remarks on open problems are given in the Notes. Even for a discrete parameter and in the classical vein, a semblance of fullness exists only in the positive-recurrent case. Much less
is known in the null-recurrent case, and a serious study of nonrecurrent phenomena has just begun recently. The last is intimately related to an analysis of the discontinuities of continuous parameter sample functions already mentioned. In the terminology of this book, the question can be put as follows: how do the sample curves manage to go to infinity and to come back from there? A satisfactory answer will include a real grasp on the behavior of instantaneous states, but the question is equally exigent even if we confine ourselves to stable states (as in §§ II.17 to 20). This area of investigations has been called the theory of "boundaries" in analogy with classical analysis, but it is perhaps more succinctly described as an intrinsic theory of compactification of the denumerable state space of the Markov chain. There are a number of allusions to this theme scattered throughout the book, but I have refrained from telling an unfinished story. The solicitous voice of a friend has been heard saying that such a new theory would supersede the part of the present treatment touching on the boundary. Presumably and gladly so. Indeed, to use a Chinese expression, why should the azure not be superior to the blue?

Among friends who have read large portions of the manuscript and suggested valuable improvements are J. L. DOOB, HENRY P. McKEAN Jr. and G. E. H. REUTER. My own work in the field, much of it appearing here and some of it for the first time, has been supported in part by the Office of Scientific Research of the United States Air Force. To these, and quite a few others who rendered help of one kind or other, I extend my hearty thanks. K. L. C.
January, 1960
Preface to the Second Edition

In this revised edition I have added some new material as well as making corrections and improvements. In Part I the additions (in §§ 9, 10, 11) are a few results closely related to the original text and illustrative of the method of taboos. In Part II the major additions have to do with the boundary theory; these include "fine topology" in § 11, "Martin boundary" and "entrance law" in § 19. The old Addenda have been expanded into the new § 12, some of the developments there being also germane to the boundary theory. It is hoped that these efforts have now brought the reader right up to the edge of the boundary. A number of selected items have been inserted in the Bibliography as a further guide to the latest literature on the above-mentioned and other topics.
For some of the important revisions I am particularly indebted to STEVEN OREY and DAVID FREEDMAN for their extensive counsel. All other readers who have sent in corrections are also gratefully acknowledged here. Numerous lesser changes are made, including some for the sake of pedagogy and commentary. Personal taste and habit not being stationary in time, I should have liked to make more radical departures such as deleting hundreds of the ω's in the cumbersome notation, but have generally decided to leave well enough alone. Most of the corrections have already been incorporated in the Russian translation which was published in 1964, but many of the additions are new.

The second edition appears at a time when boundary theory (envisaged in this book as a study in depth of the behavior of sample functions in relation to the "infinities") has just begun to take shape. This vital theme, already announced in the preface to the first edition, will no doubt be the most challenging part of the theory to come. I have chosen not to enter into it in detail in the belief that such a development needs more time to mature. In this regard it may be a timely observation that the theory of Markov processes in general state space, which flourished in recent years and has built up a powerful machinery, has had to date little impact on the denumerable [chain] case. This is because the prevailing assumptions allow the sample functions virtually no other discontinuities than jumps - a situation which would make a trite object of a chain. On the other hand, the special theory of Markov chains has yet to adapt its methodology to a broader context suitable for the general state space. Thus there exists at the moment a state of mutual detachment which surely must not be suffered to continue. Future progress in the field looks to a meaningful fusion of these two aspects of the Markovian phenomenon.

October, 1966
K. L. C.
Contents

Part I. Discrete Parameter
§ 1. Fundamental definitions
§ 2. Transition probabilities
§ 3. Classification of states
§ 4. Recurrence
§ 5. Criteria and examples
§ 6. The main limit theorem
§ 7. Various complements
§ 8. Repetitive pattern and renewal process
§ 9. Taboo probabilities
§ 10. The generating function
§ 11. The moments of first entrance time distributions
§ 12. A random walk example
§ 13. System theorems
§ 14. Functionals and associated random variables
§ 15. Ergodic theorems
§ 16. Further limit theorems
§ 17. Almost closed and sojourn sets

Part II. Continuous Parameter
§ 1. Transition matrix: basic properties
§ 2. Standard transition matrix
§ 3. Differentiability
§ 4. Definitions and measure-theoretic foundations
§ 5. The sets of constancy
§ 6. Continuity properties of sample functions
§ 7. Further specifications of the process
§ 8. Optional random variable
§ 9. Strong Markov property
§ 10. Classification of states
§ 11. Taboo probability functions
§ 12. Last exit time
§ 13. Ratio limit theorems; discrete approximations
§ 14. Functionals
§ 15. Post-exit process
§ 16. Imbedded renewal process
§ 17. The two systems of differential equations
§ 18. The minimal solution
§ 19. The first infinity
§ 20. Examples

Bibliography

Index
Part I. Discrete parameter

§ 1. Fundamental definitions

The precise definition of the term "Markov chain" as used in this monograph will be given below. However, the following remarks will help clarify our usage for the benefit of those readers who have had previous contact with the terminology. A Markov process is a special type of stochastic process distinguished by a certain Markov property; a Markov chain is a Markov process with a denumerable (namely, finite or denumerably infinite) number of states. The time parameter may be taken to be the set of nonnegative integers or the set of nonnegative real numbers; accordingly we have the discrete parameter case or the continuous parameter case. The adjective "simple" is sometimes used to qualify our Markov chain, but since we do not discuss "multiple" chains we shall not make the distinction. Furthermore, we shall discuss only Markov chains "with stationary (or temporally homogeneous) transition probabilities" so that the qualifying phrase in quotes will be understood. Finally, our discussion does not differentiate between a finite or a denumerably infinite number of states so that no special treatment is given to the former case.

In Part I we deal with a discrete parameter. Here the requisite foundations can be summarized as follows. We consider an abstract set $\Omega$, called the probability space (or sample space), with the generic element $\omega$, called the elementary event (or sample point); a Borel field $\mathcal{F}$ of subsets of $\Omega$, called measurable sets or events, including $\Omega$ as a member; and a (countably additive) probability measure $P$ defined on $\mathcal{F}$. The triple $(\Omega, \mathcal{F}, P)$ is called a probability triple. A set in $\mathcal{F}$ of probability zero will be called a null set; "almost all $\omega$ (a.a. $\omega$)" or "almost everywhere (a.e.)" means "all $\omega$ except a null set". The pair $(\mathcal{F}, P)$ will be assumed to be complete in the sense that every subset of a null set belongs to $\mathcal{F}$ (and is a null set).

A Borel subfield of $\mathcal{F}$ is said to be augmented iff it contains all null sets. Given a Borel subfield there is a unique smallest augmented Borel subfield of $\mathcal{F}$ containing the given one. Unless otherwise specified all Borel fields used below will be assumed to be augmented. A (real) random variable (in the generalized sense) is a single-valued function from a set $\Delta_0$ in $\mathcal{F}$ to the closed real line $X = [-\infty, \infty]$ such that for every real number $c$ the set of $\omega$ in $\Delta_0$ for which $x(\omega) \le c$ belongs to $\mathcal{F}$. $\Delta_0$ is called the domain of definition of $x$ and the set $\Delta$ of $\omega$ in $\Delta_0$ for which
$|x(\omega)| <$ […] $> 0$. If $P(\Lambda_1) = 0$, then $P(\Lambda_2 \mid \Lambda_1)$ is undefined. Undefined conditional probabilities will appear frequently in what follows
as a matter of convenience. If one of them is multiplied by a quantity which is equal to 0, the product is taken to be 0. The conditional probability of the set $\{\omega: x_3(\omega) = c_3\}$ relative to the set $\bigcap_{n=1}^{2} \{\omega: x_n(\omega) = c_n\}$ e.g. is denoted by

$$P\{x_3(\omega) = c_3 \mid x_1(\omega) = c_1,\ x_2(\omega) = c_2\}.$$
The random variables $\{x_\nu, 1 \le \nu \le n\}$, not necessarily finite-valued, are said to be independent in case

$$P\Big\{\bigcap_{\nu=1}^{n} [x_\nu(\omega) \le c_\nu]\Big\} = \prod_{\nu=1}^{n} P\{x_\nu(\omega) \le c_\nu\}$$

for arbitrary real (finite) $c_\nu$, $1 \le \nu \le n$. It follows that the same equation holds if the sets $\{\omega: x_\nu(\omega) \le c_\nu\}$ are replaced by the more general sets $\{\omega: x_\nu(\omega) \in A_\nu\}$ where the $A_\nu$ are Borel sets in $X$. The sequence $\{x_n, n \ge 1\}$ is a sequence of independent random variables in case any finite number of them are independent. The measurable sets $\Lambda_\nu$, $1 \le \nu \le n$ or $1 \le \nu < \infty$, are independent in case their indicators are, the indicator of a set being the function which equals one on the set and zero elsewhere. The (mathematical) expectation of a random variable $x$ is the abstract Lebesgue-Stieltjes integral
$$E(x) = \int_{\Omega} x(\omega)\, P(d\omega).$$

Frequently we extend this definition to a random variable that assumes one of the two values $+\infty$ or $-\infty$ with positive probability, provided the integral exists, finite or infinite. If $\Lambda \in \mathcal{F}$ and $P(\Lambda) > 0$, the conditional expectation of $x$ relative to $\Lambda$ is defined to be

$$E(x \mid \Lambda) = \int_{\Omega} x(\omega)\, P(d\omega \mid \Lambda) = \frac{1}{P(\Lambda)} \int_{\Lambda} x(\omega)\, P(d\omega).$$

In particular if $x$ is discrete with all its possible values in the denumerable set $A$ then

$$E(x \mid \Lambda) = \frac{1}{P(\Lambda)} \sum_{i \in A} i\, P\{\Lambda;\ x(\omega) = i\}$$
provided the series converges absolutely.

Throughout Part I the letters $n, m, \nu, r, s, t$ denote nonnegative integers unless otherwise specified. A discrete parameter stochastic process is a sequence of random variables $\{x_n, n \ge 0\}$ defined with respect to a probability triple $(\Omega, \mathcal{F}, P)$. If all the random variables are discrete, the union $I$ of all the possible values of all $x_n$ is a denumerable set called the (minimal) state space
of the process and each element of $I$ a state. Thus $i \in I$ if and only if there exists an $n \ge 0$ such that $P\{x_n(\omega) = i\} > 0$. We are borrowing from the language of physics where the term "state" refers to that of a material system whose evolution in time is described by the model that is our stochastic process. A discrete parameter Markov chain is a sequence of discrete random variables $\{x_n, n \ge 0\}$ possessing the following property: for any $n \ge 2$, $0 \le t_1 < \cdots < t_n$ and any $i_1, \ldots, i_n$ in the state space $I$ we have

$$P\{x_{t_n}(\omega) = i_n \mid x_{t_1}(\omega) = i_1, \ldots, x_{t_{n-1}}(\omega) = i_{n-1}\} = P\{x_{t_n}(\omega) = i_n \mid x_{t_{n-1}}(\omega) = i_{n-1}\} \tag{1}$$

whenever the left member is defined. The condition (1) will be referred to as the Markov property. It is equivalent to the following apparently weaker condition: for every $n \ge 1$,

$$P\{x_n(\omega) = i_n \mid x_0(\omega) = i_0, \ldots, x_{n-1}(\omega) = i_{n-1}\} = P\{x_n(\omega) = i_n \mid x_{n-1}(\omega) = i_{n-1}\}.$$
The simple proof is omitted. An important consequence of (1) is that for any $n \ge 2$ and $0 \le t_1 < \cdots < t_n <$ […] $> 0$, we have
$$P\{x_{m+r}(\omega) \in C_r \text{ for all } r \ge 1 \mid x_m(\omega) = i\} = 1.$$

Suppose now that $P\{x_0(\omega) \in C_0\} = 1$, viz. $\sum_{i \in C_0} p_i = 1$. Consider the sequence of random variables $\{x_{nd}, n \ge 0\}$. Clearly this is a M.C. with state space $C_0$. Its initial distribution is $\{p_i, i \in C_0\}$ and its transition matrix is $(p_{ij}^{(d)})$, $i, j \in C_0$. For this reduced M.C. the reduced state space $C_0$ is an essential class of period one. It is thus clear why considerations involving the period can be reduced to the case where it is equal to one.
A nonempty set of states $A$ is called (stochastically) closed iff we have

$$\sum_{j \in A} p_{ij} = 1$$

for every $i \in A$. It then follows that $\sum_{j \in A} p_{ij}^{(n)} = 1$ for every $n$. A closed set is called minimal iff it contains no proper subset that is closed. The state space is closed but not necessarily minimal closed.
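The definition of a closed set can be checked mechanically on a finite transition matrix. The following is a minimal sketch, not part of the original text: the 4-state matrix `P` and the helper names `mat_mul` and `is_closed` are hypothetical and chosen only for illustration. It tests the row-sum condition on a set of states and confirms that the same condition persists for the $n$-step matrices.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_closed(p, states, tol=1e-12):
    """True if every row i in `states` puts total mass 1 on `states`."""
    return all(abs(sum(p[i][j] for j in states) - 1.0) <= tol for i in states)


# Hypothetical chain: states 0 and 1 leak into {2, 3}, which cannot be left.
P = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.3, 0.7],
    [0.0, 0.0, 1.0, 0.0],
]

closed_23 = is_closed(P, {2, 3})          # {2, 3} is closed
closed_01 = is_closed(P, {0, 1})          # not closed: state 1 leaks to 2
closed_all = is_closed(P, {0, 1, 2, 3})   # the whole state space is closed

# The row-sum condition on {2, 3} also holds for the n-step matrices.
Pn = P
closed_under_powers = True
for _ in range(5):
    Pn = mat_mul(Pn, P)
    closed_under_powers = closed_under_powers and is_closed(Pn, {2, 3})
```

In this sketch the whole state space is closed without being minimal, since it contains the proper closed subset {2, 3}.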
Theorem 5. The union of a finite number of inessential classes is not closed. A set is minimal closed if and only if it is an essential class.

Proof. We define a partial ordering among inessential classes as follows: $A_1 \rightsquigarrow A_2$ iff there are $i_1 \in A_1$ and $i_2 \in A_2$ such that $i_1 \rightsquigarrow i_2$. This is then true for every $i_1 \in A_1$ and $i_2 \in A_2$. Clearly this relation is reflexive, anti-symmetric between two distinct inessential classes, and transitive; hence it establishes a partial ordering. Now let $A_1, \ldots, A_m$ be distinct inessential classes. Consider all possible partial orderings among them and let $A_{m_1} \rightsquigarrow A_{m_2} \rightsquigarrow \cdots \rightsquigarrow A_{m_l}$ be one of the longest ordered sequences so that there is no $n$, $1 \le n \le m$, such that $A_{m_l} \rightsquigarrow A_n$. Let $i \in A_{m_l}$. Since $i$ is inessential there is a $j$ such that $i \rightsquigarrow j$ but not $j \rightsquigarrow i$. By the choice of $A_{m_l}$ we have $j \notin \bigcup_{n=1}^{m} A_n$, proving that this union is not closed.

It is clear that an essential class is minimal closed. It follows that if a minimal closed set contains an essential state, then it consists of the corresponding class. Now consider a closed set $C$ containing only inessential states. Let $i$ be one of them and let $A$ be the set of all $j$ in $C$ such that $i \rightsquigarrow j$ but not $j \rightsquigarrow i$. Then $A - \{i\}$ is nonempty, closed and a proper subset of $C$. Hence $C$ cannot be minimal closed. □

The trivial example where $I$ is the set of nonnegative integers, $p_0 = 1$, and $p_{i,i+1} = 1$ for $i \ge 0$ shows that (i) the closed set $I$ may be the union of a denumerably infinite number of inessential states (classes); (ii) a closed set may contain no minimal closed set.
Let $A$ be closed, $i \in A$ and $m \ge 0$. Let $\tilde\Omega = \{\omega: x_m(\omega) = i\}$ and suppose $P(\tilde\Omega) > 0$; let $\tilde{\mathcal{F}} = \{\omega: x_m(\omega) = i\} \cap \mathcal{F}\{x_s, s \ge m+1\}$, namely $\tilde{\mathcal{F}}$ is the Borel field of sets each of which is the intersection of $\{\omega: x_m(\omega) = i\}$ with a set from the Borel field $\mathcal{F}\{x_s, s \ge m+1\}$ defined in § 1; and $\tilde P(\Lambda) = P(\Lambda \mid \tilde\Omega)$ for $\Lambda \in \tilde{\mathcal{F}}$. On the probability triple $(\tilde\Omega, \tilde{\mathcal{F}}, \tilde P)$ we define a process $\{\tilde x_n, n \ge 0\}$ as follows:

[…]

Then $\{\tilde x_n, n \ge 0\}$ is a M.C. whose state space is a subset $\tilde A$ of $A$ with $\tilde A = A$ if in particular $A$ is minimal; whose initial distribution is given by $\tilde p_i = 1$ and whose transition matrix is $(\tilde p_{jk}) = (p_{jk})$, $j, k \in \tilde A$. For conditional transitions within the closed set $A$, relative to the event $\{\omega: x_m(\omega) = i\}$, the reduced M.C. $\{\tilde x_n\}$ with the reduced state space and reduced transition matrix suffices. More generally we may consider transitions within $A$ relative to the event $\{\omega: x_m(\omega) \in A\}$. This explains why in the sequel we shall often assume that $I$ is an essential class. A M.C. is called irreducible iff its state space is minimal closed, thus consisting of one essential class. It is called indecomposable iff its state space does not contain two disjoint closed sets.
§ 4. Recurrence

For further classification of the states we must introduce new quantities. To employ a vivid language we shall say that the M.C. $\{x_n\}$ is in the state $i$ at the time $n$ or at the $n$-th step to mean the event $\{\omega: x_n(\omega) = i\}$. Similar expressions like starting, entering, returning, avoiding etc.; and also temporal phrases like for the first (or last) time, at least once etc. are to be interpreted in the same spirit. The first new quantity is the probability that the M.C. will be in $j$ for the first time at the $n$-th step, given that it starts from $i$. This is denoted by

(1) $P\{x_{m+\nu}(\omega) \ne j,\ 0 <$ […] $> 0$; $i \rightsquigarrow j$ if and only if $f^*_{ij} > 0$.
Proof. This follows from the obvious inequalities

$$\sup_{n} p_{ij}^{(n)} \le f^*_{ij} \le \sum_{n=1}^{\infty} p_{ij}^{(n)}.$$
Theorem 2. $g_{ij} = f^*_{ij}\, g_{jj}$.

Proof. We have

$g_{ij} = \sum_{n=1}^{\infty} P\{x_\nu(\omega) \ne j,\ 0 <$ […]
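Since $\varepsilon$ is arbitrary, (1) follows. Now let $b$ be infinite; without loss of generality we may suppose that $b = +\infty$. Given any $M > 0$, there exists an $N = N(M)$ such that $b_n \ge M$ if $n \ge N$. We have then

$$\sum_{\nu=0}^{n} a_\nu b_{n-\nu} \ge \Big(\sum_{\nu=0}^{n-N} a_\nu\Big) M + \Big(\sum_{\nu=n-N+1}^{n} a_\nu\Big) \Big(\min_{0 \le n \le N} b_n\Big).$$

Dividing by $\sum_{\nu=0}^{n} a_\nu$, letting $n \to \infty$ and using (3) we obtain

$$\liminf_{n \to \infty} \frac{\sum_{\nu=0}^{n} a_\nu b_{n-\nu}}{\sum_{\nu=0}^{n} a_\nu} \ge M.$$

Since $M$ is arbitrary, (1) follows. □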
Remark. The condition (2) is satisfied if either

(i) $\sum_n a_n < \infty$

or

(ii) $\sum_n a_n = \infty$ and $a_n$ is bounded.

These are the specific conditions under which Lemma A will be applied. We note further that the denominator in (1) may be replaced e.g. by $\sum_{\nu=0}^{n+1} a_\nu$.
One of the basic formulas concerning recurrence is the following. It will be considerably generalized in § 9.
Theorem 2. For every $i$ and $j$ in $I$ and $n \ge 1$ we have

$$p_{ij}^{(n)} = \sum_{\nu=0}^{n-1} f_{ij}^{(n-\nu)}\, p_{jj}^{(\nu)}. \tag{4}$$

This is proved by the method of first entrance (into $j$).
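The first-entrance decomposition can be verified numerically on a small chain. The sketch below reads formula (4) as $p_{ij}^{(n)} = \sum_{\nu=0}^{n-1} f_{ij}^{(n-\nu)} p_{jj}^{(\nu)}$, which is the form summed in the proof of Theorem 3; the 3-state matrix and all function names are hypothetical. The first-entrance probabilities are built from the one-step recursion $f_{ij}^{(n)} = \sum_{k \ne j} p_{ik} f_{kj}^{(n-1)}$, which is an assumption of this sketch rather than a formula quoted from the text.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def n_step_matrices(p, n_max):
    """Return [P^0, P^1, ..., P^n_max]."""
    size = len(p)
    powers = [[[1.0 if i == j else 0.0 for j in range(size)]
               for i in range(size)]]
    for _ in range(n_max):
        powers.append(mat_mul(powers[-1], p))
    return powers


def first_entrance(p, j, n_max):
    """f[n][i] = probability of entering j for the first time at step n from i."""
    size = len(p)
    f = [[0.0] * size, [p[i][j] for i in range(size)]]
    for _ in range(2, n_max + 1):
        prev = f[-1]
        # First step goes to some k != j, then first entrance into j from k.
        f.append([sum(p[i][k] * prev[k] for k in range(size) if k != j)
                  for i in range(size)])
    return f


# Hypothetical 3-state transition matrix used only for the check.
P = [
    [0.2, 0.5, 0.3],
    [0.4, 0.1, 0.5],
    [0.6, 0.0, 0.4],
]
N = 8
powers = n_step_matrices(P, N)

# Largest discrepancy between the two sides of the decomposition.
max_gap = 0.0
for j in range(3):
    f = first_entrance(P, j, N)
    for i in range(3):
        for n in range(1, N + 1):
            rhs = sum(f[n - v][i] * powers[v][j][j] for v in range(n))
            max_gap = max(max_gap, abs(powers[n][i][j] - rhs))
```

The discrepancy `max_gap` should be at the level of floating-point rounding only.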
Theorem 3. For every $i$ and $j$ we have

$$\lim_{N \to \infty} \frac{\sum_{n=1}^{N} p_{ij}^{(n)}}{\sum_{n=0}^{N} p_{jj}^{(n)}} = f^*_{ij}.$$

Proof. Summing (4) from $n = 1$ to $n = N$, we have

$$\sum_{n=1}^{N} p_{ij}^{(n)} = \sum_{n=1}^{N} \sum_{\nu=0}^{n-1} f_{ij}^{(n-\nu)} p_{jj}^{(\nu)} = \sum_{\nu=0}^{N-1} p_{jj}^{(\nu)} \sum_{n=\nu+1}^{N} f_{ij}^{(n-\nu)} = \sum_{\nu=0}^{N-1} p_{jj}^{(\nu)} \sum_{n=1}^{N-\nu} f_{ij}^{(n)}.$$

Applying Lemma A with $n = N$, $a_\nu = p_{jj}^{(\nu)}$, and $b_0 = 0$, $b_\nu = \sum_{n=1}^{\nu} f_{ij}^{(n)}$, we obtain Theorem 3. It is clear that the $a_\nu$ satisfy either (i) or (ii) in the Remark to Lemma A. □
Theorem 4. The state $i$ is recurrent or nonrecurrent according as the series $\sum_{n=0}^{\infty} p_{ii}^{(n)}$ diverges or converges. In the latter case the sum of the series is equal to $(1 - f^*_{ii})^{-1}$.

Proof. Theorem 3 is valid for $i = j$, hence:

$$\lim_{N \to \infty} \frac{\sum_{n=1}^{N} p_{ii}^{(n)}}{1 + \sum_{n=1}^{N} p_{ii}^{(n)}} = f^*_{ii}.$$

Both assertions now follow from the definition of recurrence. □
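For a nonrecurrent state the identity in Theorem 4 can be observed numerically. The sketch below takes the series to start at $n = 0$ (so that its sum is $(1 - f^*_{ii})^{-1}$) and uses a hypothetical 3-state chain in which state 2 is absorbing, so that state 0 is nonrecurrent. The function names are illustrative and not from the text.

```python
def step(dist, p):
    """One step of the chain: row vector times transition matrix."""
    size = len(p)
    return [sum(dist[k] * p[k][j] for k in range(size)) for j in range(size)]


def series_and_return_probability(p, i, n_max):
    """Partial sums of sum_{n>=0} p_ii^(n) and of f*_ii = sum_{n>=1} f_ii^(n)."""
    size = len(p)
    dist = [0.0] * size
    dist[i] = 1.0
    taboo = dist[:]      # mass on paths that have not yet returned to i
    series = 1.0         # the n = 0 term p_ii^(0) = 1
    f_star = 0.0
    for _ in range(n_max):
        dist = step(dist, p)
        series += dist[i]
        taboo = step(taboo, p)
        f_star += taboo[i]   # first return to i at this step
        taboo[i] = 0.0       # discard paths that have already returned
    return series, f_star


# Hypothetical chain: states 0 and 1 are nonrecurrent, state 2 is absorbing.
P = [
    [0.2, 0.5, 0.3],
    [0.4, 0.1, 0.5],
    [0.0, 0.0, 1.0],
]

series, f_star = series_and_return_probability(P, 0, 200)
predicted = 1.0 / (1.0 - f_star)
```

Here `series` and `predicted` should agree up to truncation and rounding error, and `f_star` stays strictly below 1, in line with the nonrecurrence of state 0.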
Remark. Theorem 4 yields another proof of Corollary 2 to Theorem 4.5. For let $i$ be recurrent and $p_{ij}^{(m)} > 0$, $p_{ji}^{(n)} > 0$. Since

$$p_{jj}^{(n+\nu+m)} \ge p_{ji}^{(n)}\, p_{ii}^{(\nu)}\, p_{ij}^{(m)},$$

the divergence of $\sum_\nu p_{ii}^{(\nu)}$ implies that of $\sum_\nu p_{jj}^{(\nu)}$ and consequently $j$ is recurrent.
Theorem 5. The series

$$\sum_{n=0}^{\infty} p_{ij}^{(n)} \tag{5}$$

converges or diverges according as $g_{ij} = 0$ or $> 0$.

Proof. If $g_{ij} = 0$ then $f^*_{ij}\, g_{jj} = 0$ by Theorem 4.2. If $f^*_{ij} = 0$ then the series (5) vanishes termwise. If $g_{jj} = 0$ then $j$ is nonrecurrent and $\sum_n p_{jj}^{(n)}$ converges by Theorem 4; hence (5) converges by Theorem 3.

If $g_{ij} > 0$ then $f^*_{ij} \ge g_{ij} > 0$ and $g_{jj} = g_{ij} (f^*_{ij})^{-1} > 0$ by Theorem 4.2. Hence $j$ is recurrent by the Corollary to Theorem 4.3; $\sum_n p_{jj}^{(n)}$ diverges by Theorem 4; and (5) diverges by Theorem 3. □
Corollary. If $j$ is nonrecurrent then (5) converges. If $j$ is recurrent then (5) diverges or vanishes termwise according as $i \rightsquigarrow j$ or not.

Remark. The following trivial example shows that Theorem 5 is a best possible result. Let $0 <$ […] $> j > 0$ then the $f^*_{ij}$ in Example 2 is the same as in Example 1, because starting from $i$ the M.C. $\{x_n\}$ must enter $j$ before it can enter 0, and the $p_{ij}$ for $i \ge 1$, $j \ge 1$ are the same in the two examples. Furthermore it is clear that

$$f^*_{ij} = (f^*_{10})^{i-j}.$$

We now calculate $f^*_{10}$ as follows. We have for the chain in Example 1:

$$p_{10}^{(2n+1)} = \binom{2n+1}{n} p^n (1-p)^{n+1}, \qquad p_{10}^{(2n)} = 0.$$

Hence we obtain using (6), if $p \ne \tfrac12$,

(8) […]

From (7) and (8) we obtain by Theorem 3,

$$f^*_{10} = \frac{1 - |1 - 2p|}{2p} = 1 \ \text{ or } \ \frac{1-p}{p} \tag{9}$$

according as $p < \tfrac12$ or $p > \tfrac12$. For $p = \tfrac12$ we know $f^*_{10} = 1$ by recurrence. Thus if $p \le \tfrac12$ we have $f^*_{ij} = 1$ for $i > j \ge 0$, although $C$ is inessential and therefore nonrecurrent by Theorem 4.4. This explains why no assertion is made regarding $f^*_{ij}$, $i \ne j$, in a nonrecurrent class. It is clear however that at least one of $f^*_{ij}$ and $f^*_{ji}$ is less than 1 since $f^*_{ij} f^*_{ji} \le f^*_{ii} < 1$. If $p > \tfrac12$ we have

$$f^*_{i0} = \Big(\frac{1-p}{p}\Big)^{i}.$$

In either case $g_{i0} = f^*_{i0}$ and $g_{ij} = 0$ for all $j \ge 1$.

These facts imply the following: if $p \le \tfrac12$,

$$P\{\lim_{n \to \infty} x_n(\omega) = 0\} = 1;$$

if $p > \tfrac12$,

$$P\{\lim_{n \to \infty} x_n(\omega) = 0\} = \Big(\frac{1-p}{p}\Big)^{h}, \qquad P\{\lim_{n \to \infty} x_n(\omega) = \infty\} = 1 - \Big(\frac{1-p}{p}\Big)^{h}.$$
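The closed form for the probability of ever reaching 0 can be compared with a direct summation of first-entrance probabilities. The sketch below assumes the walk moves up with probability $p$ and down with probability $1 - p$ on the positive integers, and reads (9) as $f^*_{10} = (1 - |1 - 2p|)/(2p)$; the function names and the truncation at 800 steps are choices made here for illustration only.

```python
def first_passage_to_zero(p, start, n_max):
    """Sum of f_{start,0}^{(n)} for n <= n_max; steps are +1 w.p. p, -1 w.p. 1-p."""
    q = 1.0 - p
    alive = [0.0] * (start + n_max + 2)   # mass that has not yet touched 0
    alive[start] = 1.0
    highest = start                        # largest state that can carry mass
    total = 0.0
    for _ in range(n_max):
        new = [0.0] * len(alive)
        for k in range(1, highest + 1):
            mass = alive[k]
            if mass:
                new[k - 1] += q * mass
                new[k + 1] += p * mass
        highest += 1
        total += new[0]   # first entrance into 0 at this step
        new[0] = 0.0      # taboo: drop paths that have already reached 0
        alive = new
    return total


def formula_9(p):
    """Closed form (1 - |1 - 2p|) / (2p), assumed to be formula (9)."""
    return (1.0 - abs(1.0 - 2.0 * p)) / (2.0 * p)


drift_up = first_passage_to_zero(0.6, 1, 800)     # expected near (1 - p) / p
drift_down = first_passage_to_zero(0.4, 1, 800)   # expected near 1
from_two = first_passage_to_zero(0.6, 2, 800)     # expected near ((1 - p) / p)^2
```

The third value illustrates the product rule $f^*_{i0} = (f^*_{10})^i$ for $i = 2$. The truncated sums slightly underestimate the limits, but the tail decays geometrically when $p \ne \tfrac12$.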
Example 3 (random walk with a reflecting barrier). The state space is the same as in Example 2; $p_h = 1$ for some $h \ge 0$; and

$$p_{00} = 1 - p, \quad p_{01} = p, \quad p_{i,i+1} = p, \quad p_{i,i-1} = 1 - p, \quad i \ge 1.$$

Thus the free random walk is modified so as to return to the state 0 whenever the state $-1$ would have been entered. It is customary to imagine a reflecting barrier placed at $-\tfrac12$. All the states form an essential class of period 1. The same considerations as in Example 2 lead to the validity of (9) for the new M.C. It follows from Theorem 6 that it is nonrecurrent if $p > \tfrac12$. Furthermore a direct argument yields $f_{01}^{(n)} = (1-p)^{n-1} p$ and consequently $f^*_{01} = 1$ if $p > 0$. It is also clear that $f^*_{00} \ge f^*_{01} f^*_{10}$. Hence if $p \le \tfrac12$ we conclude from these facts and (9) that $f^*_{00} = 1$; it follows from Theorem 6 that the M.C. is recurrent and all $f^*_{ij} = 1$.

A slight modification of this example is obtained by replacing the first two equations above by the single one: $p_{01} = 1$.
Example 4 (random walk with two absorbing barriers). The state space is the set of integers $i$: $0 \le i \le l$ where $l \ge 2$; $p_h = 1$ for some $h$,
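$0 <$ […] $> 0$. Hence $d'_N$ divides $\nu$, $d''_N$ divides $N + 1 - \nu$ and consequently either number divides $N + 1$ since $d'_N = d''_N$. Thus $d'_{N+1} = d'_N = d''_N = d''_{N+1}$ as was to be proved. This establishes the lemma.

We define for $n \ge 0$,

$$r_n = \sum_{\nu=n+1}^{\infty} f_\nu.$$

Then $r_0 = 1$ and

$$\sum_{n=0}^{\infty} r_n = \sum_{\nu=1}^{\infty} \nu f_\nu = m.$$

Since $r_\nu - r_{\nu-1} = -f_\nu$ we have upon substituting into (1),

$$p_n = -\sum_{\nu=1}^{n} (r_\nu - r_{\nu-1})\, p_{n-\nu}$$

or

$$\sum_{\nu=0}^{n} r_\nu\, p_{n-\nu} = \sum_{\nu=0}^{n-1} r_\nu\, p_{n-1-\nu}.$$

This means that the sum does not depend on the value of $n$; since $r_0 p_0 = 1$ we have therefore

$$\sum_{\nu=0}^{n} r_\nu\, p_{n-\nu} = 1, \qquad n \ge 0. \tag{4}$$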
Let […] there exists a subsequence $\{n_k\}$ such that $\lim_{k \to \infty} p_{n_k d} = \lambda$. Let $s$ be such that $f_s > 0$; then $d$ divides $s$. We have, using (1) and (2),

$$\lambda = \lim_{k \to \infty} p_{n_k d} \le f_s \liminf_{k \to \infty} p_{n_k d - s} + \sum_{\nu \ne s} f_\nu \limsup_{k \to \infty} p_{n_k d - \nu} \le f_s \liminf_{k \to \infty} p_{n_k d - s} + (1 - f_s)\, \lambda.$$

Thus $\liminf_{k \to \infty} p_{n_k d - s} \ge \lambda$ and consequently $\lim_{k \to \infty} p_{n_k d - s} = \lambda$ by the definition of $\lambda$. This being true for every $s$ for which $f_s > 0$ and every subsequence $\{n_k\}$ for which $\lim_{k \to \infty} p_{n_k d} = \lambda$ we apply the result a finite number of times to conclude that $\lim_{k \to \infty} p_{n_k d - t} = \lambda$ for any $t$ of the form $\sum_{j=1}^{l} c_j s_j$ where the $c_j$ and $s_j$ are positive integers such that $f_{s_j} > 0$, $1 \le j \le l$. By the lemma there exist $s_j$ with $f_{s_j} > 0$, $1 \le j \le l$, such that their greatest common divisor is equal to $d$. An elementary result from number theory which we have already used in the proof of Theorem 3.3 then implies the following: if $s \ge s_0$ then there exist positive integers $c_j$ such that

$$s d = \sum_{j=1}^{l} c_j s_j.$$

We have therefore proved that for every $s \ge s_0$,

$$\lim_{k \to \infty} p_{(n_k - s) d} = \lambda.$$

Putting $n = (n_k - s_0) d$ in (4) and noting that $p_\nu = 0$ unless $d$ divides $\nu$, we obtain

$$\sum_{\nu=0}^{n_k - s_0} r_{\nu d}\, p_{(n_k - s_0 - \nu) d} = 1.$$

Letting $k \to \infty$, we obtain

[…]

provided that the series converges; otherwise $\lambda$ must vanish so that in either case we have

[…]
Since $f_\nu = 0$ unless $d$ divides $\nu$ we see that $r_{\nu d} = \frac{1}{d} \sum_{j=\nu d}^{\nu d + d - 1} r_j$. Hence

$$\sum_{\nu=0}^{\infty} r_{\nu d} = \frac{1}{d} \sum_{\nu=0}^{\infty} r_\nu = \frac{m}{d},$$

and we have proved that $\lambda = d/m$. The preceding evaluation of $\limsup_{n \to \infty} p_{nd}$ applies, mutatis mutandis, to $\liminf_{n \to \infty} p_{nd}$. Theorem 1 is therefore proved. □
Corollary. Let $i$ and $j$ belong to the same recurrent class of period $d$ and let $j \in C_r(i)$, $1 \le r \le d$, according to the notation in § 3. Then we have

$$\lim_{n \to \infty} p_{ij}^{(nd+r)} = \frac{d}{m_{jj}}.$$

Proof. Since $p_{jj}^{(\nu)} = 0$ unless $d$ divides $\nu$, we have

$$p_{ij}^{(nd+r)} = \sum_{\nu=0}^{n} f_{ij}^{(\nu d + r)}\, p_{jj}^{(nd - \nu d)}. \tag{5}$$

We have $\sum_{\nu=0}^{\infty} f_{ij}^{(\nu d + r)} = \sum_{\nu=1}^{\infty} f_{ij}^{(\nu)} = f^*_{ij} = 1$. Letting $n \to \infty$ and using Lemma A of § 5 we obtain the corollary from the theorem. □
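The limit $d/m$ can be observed on a small periodic chain. The sketch below uses a hypothetical 3-state walk on $\{0, 1, 2\}$ that is reflected at both ends, which has period 2. It computes the period from the return times, the mean recurrence time $m_{00} = \sum_n n f_{00}^{(n)}$ from first-return probabilities, and compares $p_{00}^{(N)}$ for a large even $N$ with $d/m_{00}$. All names and the truncation at 200 steps are illustrative assumptions.

```python
from math import gcd


def step(dist, p):
    """One step of the chain: row vector times transition matrix."""
    size = len(p)
    return [sum(dist[k] * p[k][j] for k in range(size)) for j in range(size)]


# Hypothetical walk on {0, 1, 2}, reflected at both ends (period 2).
P = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
N = 200  # an even number of steps, i.e. a multiple of the period

dist = [1.0, 0.0, 0.0]   # law of x_n given x_0 = 0
taboo = dist[:]          # same, restricted to paths not yet returned to 0
p00 = [1.0]              # p00[n] = p_00^(n)
f00 = [0.0]              # f00[n] = f_00^(n)
for _ in range(N):
    dist = step(dist, P)
    p00.append(dist[0])
    taboo = step(taboo, P)
    f00.append(taboo[0])
    taboo[0] = 0.0       # discard paths that have already returned

# Period: greatest common divisor of the n >= 1 with p_00^(n) > 0.
d = 0
for n in range(1, N + 1):
    if p00[n] > 0:
        d = gcd(d, n)

m = sum(n * f for n, f in enumerate(f00))   # mean recurrence time of state 0
limit_value = p00[N]
```

For this chain the return probabilities at odd times vanish, so only the subsequence of even times carries the limit.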
A recurrent state $i$ is called positive or null according as $\lim_{n \to \infty} p_{ii}^{(nd)}$ is positive or zero. Thus a recurrent state $i$ is positive if and only if $m_{ii} < \infty$.

Theorem 2. The property of a state being positive (or null) is a class property.

Proof. Let $i \leftrightsquigarrow j$ and $p_{ij}^{(m)} > 0$, $p_{ji}^{(n)} > 0$. If $d$ is the common period we have

$$p_{ii}^{(m + \nu d + n)} \ge p_{ij}^{(m)}\, p_{jj}^{(\nu d)}\, p_{ji}^{(n)}.$$

Letting $\nu \to \infty$ we obtain Theorem 2 from Theorem 1, since $i$ and $j$ are interchangeable in the argument. □

Before we consider the limiting behavior of $p_{ij}^{(n)}$ for arbitrary $i$ and $j$ we shall introduce and study certain new quantities. Let the period of $j$ be $d$. For every integer $r$ we define

$$f^*_{ij}(r) = \sum_{\substack{n=1 \\ n \equiv r \ (\mathrm{mod}\ d)}}^{\infty} f_{ij}^{(n)}.$$

It is clear that

$$f^*_{ij}(r) = P\{x_n(\omega) = j \text{ for some } n \equiv r\ (\mathrm{mod}\ d),\ n > 0 \mid x_0(\omega) = i\}$$

and that

$$\sum_{r=1}^{d} f^*_{ij}(r) = f^*_{ij}.$$
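Furthermore, let $C(\alpha)$ with period $d_\alpha$, $\alpha \ge 1$, be an enumeration of all the essential classes. For each $\alpha$ let $C_\beta(\alpha)$, $1 \le \beta \le d_\alpha$, be the subclasses of $C(\alpha)$. For every $i$ we define

$$f^*(r; i, C_\beta(\alpha)) = P\{x_n(\omega) \in C_\beta(\alpha) \text{ for some } n \equiv r\ (\mathrm{mod}\ d_\alpha),\ n > 0 \mid x_0(\omega) = i\},$$

$$f^*(i, C(\alpha)) = P\{x_n(\omega) \in C(\alpha) \text{ for some } n > 0 \mid x_0(\omega) = i\}.$$

It is easy to see that

$$f^*(i, C(\alpha)) = \sum_{\beta=1}^{d_\alpha} f^*(r; i, C_\beta(\alpha))$$

for every $r \ge 0$; and that

$$\sum_{\alpha} f^*(i, C(\alpha)) \le 1.$$

Theorem 3. Let $C(\alpha)$ be recurrent. Then for an arbitrary $i$,

$$f^*(r+s; i, C_{\beta+s}(\alpha)) = f^*(r; i, C_\beta(\alpha))$$

for every $r$ and $s$;

$$f^*(r; i, C_\beta(\alpha)) = f^*_{ij}(r)$$

for every $r$ and any $j \in C_\beta(\alpha)$; and

$$f^*(i, C(\alpha)) = f^*_{ij}$$

for any $j \in C(\alpha)$.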
Proof. Let $j \in C_\beta(\alpha)$. Then clearly we have $f_{ij}^{*}(r) \le f^{*}(r; i, C_\beta(\alpha))$. On the other hand if $r \ge 1$ we have by the method of first entrance (into $C_\beta(\alpha)$),
f0(r) =
00
'L 'L
n=O kECp{")
P{xnd+r(w)=k; xvd+r(w)EE Cp (Ct.) , O~Y-co
kEG_,(i)n->-co
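or
$$\pi_i \ge \sum_{k \in C} \pi_k\, p_{ki}. \tag{4}$$
Furthermore we have for each $r$,
$$1 = \lim_{n\to\infty} \sum_{j \in C_r(i)} p_{ij}^{(nd+r)} \ \ge\ \sum_{j \in C_r(i)} \lim_{n\to\infty} p_{ij}^{(nd+r)} = \sum_{j \in C_r(i)} d\,\pi_j.$$
Consequently we have
$$\sum_{j \in C_r(i)} \pi_j \le \frac{1}{d}$$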
and
$$\sum_{j \in C} \pi_j \le 1. \tag{5}$$
Summing (4) over all $i \in C$ we obtain
$$\sum_{i \in C} \pi_i \ \ge\ \sum_{i \in C} \sum_{k \in C} \pi_k\, p_{ki} = \sum_{k \in C} \pi_k \sum_{i \in C} p_{ki} = \sum_{k \in C} \pi_k. \tag{6}$$
The interchange of summations is justified by (5). Comparing (4) and (6) we see that equality must hold in (4):
$$\pi_i = \sum_{k \in C} \pi_k\, p_{ki}.$$
Thus $\{c\,\pi_i, i \in C\}$ for every constant $c$ is a solution of (1).
(ii) To prove the stated uniqueness of the solution above, let $\{u_i\}$ be a solution satisfying $\sum_{i \in C} |u_i| < \infty$. Iterating (1) and interchanging summations we obtain for every $n$ and $r$:
$$u_i = \sum_{k \in C_{-r}(i)} u_k\, p_{ki}^{(nd+r)}.$$
Letting $n \to \infty$ we have by the Corollary to Theorem 6.1,
$$u_i = \Big\{ \sum_{k \in C_{-r}(i)} u_k \Big\}\, d\,\pi_i. \tag{7}$$
If $C$ is null then $\pi_i = 0$ so that $u_i = 0$. If $C$ is positive then each $\pi_i > 0$. Equation (7) being true for every $r$, we see that the bracketed sum does not depend on $r$. Hence it does not depend on $i$ either, since $C_{-r}(i)$, $1 \le r \le d$, is simply an enumeration of the subclasses of $C$. Thus in either case $u_i = c\,\pi_i$ as was to be proved.
(iii) In case $C$ is a positive class it follows from (i) and (ii) that we may replace the $u$'s by the $\pi$'s in (7). Since $\pi_i > 0$ we may divide through to obtain (2) and consequently (3). □
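As a numerical companion to Theorem 1, the following sketch approximates $\{\pi_i\}$ by the Cesàro averages $\frac{1}{n}\sum_{\nu=1}^{n} p_{ij}^{(\nu)}$ and checks that the result nearly satisfies $\pi_i = \sum_k \pi_k p_{ki}$. The four-state chain of period 2 is a hypothetical example chosen for illustration, not one taken from the text.

```python
def step(dist, P):
    """One step of the chain: returns the row vector dist * P."""
    size = len(P)
    return [sum(dist[k] * P[k][i] for k in range(size)) for i in range(size)]


def cesaro_limit(P, start, n):
    """Average of the v-step distributions from `start`, v = 1, ..., n."""
    size = len(P)
    dist = [1.0 if k == start else 0.0 for k in range(size)]
    acc = [0.0] * size
    for _ in range(n):
        dist = step(dist, P)
        acc = [a + x for a, x in zip(acc, dist)]
    return [a / n for a in acc]


# Hypothetical positive class of period 2 (assumed for illustration only).
P = [[0.0, 1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0, 0.0]]

pi = cesaro_limit(P, 0, 20000)
# Largest violation of pi_i = sum_k pi_k p_ki; it is of order 1/n here.
residual = max(abs(a - b) for a, b in zip(step(pi, P), pi))
print(pi, residual)
```

Averaging is used instead of a plain power of `P` because the individual $p_{ij}^{(n)}$ oscillate in a periodic class while the averages still converge.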
Corollary. If $C$ is a positive class, we have for every $i$,
$$\sum_{j \in C} \pi_{ij} = f^{*}(i, C).$$
In particular the sum is equal to one if $i \in C$.
Proof. This follows from Theorem 6.3, the Corollary to Theorem 6.4 and (3) above. □
A M.C. $\{x_n, n \ge 0\}$ such that
$$p_i^{(n)} = P\{x_n(\omega) = i\} = P\{x_0(\omega) = i\} = p_i$$
for every $n \ge 0$ and $i \in I$ is called a stationary M.C. It follows from the definition and the stationarity of the transition probabilities that the joint distribution of $\{x_{n_1+n}, x_{n_2+n}, \dots, x_{n_l+n}\}$ for any $\{n_1, n_2, \dots, n_l\}$ is independent of $n$ so that the process is stationary in the established sense.
Theorem 2. Let $\{x_n, n \ge 0\}$ be a M.C. with the initial distribution $\{p_i\}$ and transition matrix $(p_{ij})$; and let the positive classes be $D(\alpha)$, $\alpha \in A$, where $A$ is a denumerable set of indices, $D(\alpha) \ne D(\alpha')$ if $\alpha \ne \alpha'$ and $\bigcup_{\alpha \in A} D(\alpha) = D$. In order that the M.C. be stationary it is necessary and sufficient that there exists a sequence of constants $\{\lambda_\alpha, \alpha \in A\}$ with $\lambda_\alpha \ge 0$, $\sum_{\alpha \in A} \lambda_\alpha = 1$ such that
$$p_i = \begin{cases} 0 & \text{if } i \notin D; \\ \lambda_\alpha \pi_i & \text{if } i \in D(\alpha),\ \alpha \in A. \end{cases} \tag{8}$$
Proof. (i) Necessity. If $\{x_n, n \ge 0\}$ is a stationary M.C., then
$$p_i = p_i^{(n)} = \sum_{j} p_j\, p_{ji}^{(n)}.$$
If $i \notin D$, we have by Theorem 6.4 $\lim_{n\to\infty} p_{ji}^{(n)} = 0$ for every $j$. Hence letting $n \to \infty$ in the above equation we obtain $p_i = 0$. If $i \in D(\alpha)$, then since $p_{ji}^{(\nu)} = 0$ if $j \in D - D(\alpha)$ we have by what has just been proved
$$p_i = \sum_{j \in D(\alpha)} p_j\, p_{ji}^{(\nu)}$$
for every $\nu$. It follows that
$$p_i = \lim_{n\to\infty} \sum_{j \in D(\alpha)} p_j \Big\{ \frac{1}{n} \sum_{\nu=1}^{n} p_{ji}^{(\nu)} \Big\}.$$
We can pass to the limit under the summation sign here because the series on the right side is dominated by $\sum_j p_j \le 1$. Hence by the Corollary to Theorem 6.4 we obtain
$$p_i = \Big( \sum_{j \in D(\alpha)} p_j \Big) \pi_i.$$
Taking $\lambda_\alpha = \sum_{j \in D(\alpha)} p_j$ we see that (8) is satisfied.
(ii) Sufficiency. Let $\{p_i\}$ be defined by (8). Let $i \notin D$; then if $j \in D$, $p_{ji}^{(n)} = 0$ for every $n$. On the other hand $p_j = 0$ for every $j \notin D$ by definition. Hence we have for every $n$,
$$p_i^{(n)} = \sum_{j} p_j\, p_{ji}^{(n)} = 0 = p_i.$$
Now let $i \in D(\alpha)$ and choose any $j \in D(\alpha)$. We have, since $p_k = 0$ if $k \notin D$ and $p_{ki}^{(n)} = 0$ if $k \in D - D(\alpha)$,
$$p_i^{(n)} = \sum_{k \in D(\alpha)} p_k\, p_{ki}^{(n)} = \lambda_\alpha \sum_{k \in D(\alpha)} \pi_k\, p_{ki}^{(n)}.$$
From Theorem 1 we know that
$$\sum_{k \in D(\alpha)} \pi_k\, p_{ki} = \pi_i.$$
On account of (3) we can iterate this to obtain
$$\sum_{k \in D(\alpha)} \pi_k\, p_{ki}^{(n)} = \pi_i.$$
We have therefore $p_i^{(n)} = \lambda_\alpha \pi_i = p_i$. Thus $\{x_n, n \ge 0\}$ is a stationary M.C. □
If $C$ is a positive class the probability distribution given by $\{\pi_i, i \in C\}$ is called the stationary absolute distribution for the class.
Corollary to Theorem 2. Let the state space be a positive class. The M.C. is stationary if and only if its initial distribution is the stationary absolute distribution.
The following theorems throw some light on the situation, both analytically and probabilistically. Let $D(\alpha)$ and $D$ be as in Theorem 2 and let
$$f^{*}(i, D) = \sum_{\alpha \in A} f^{*}(i, D(\alpha))$$
where the right side is defined in § 6. The quantity $f^{*}(i, D)$ may be called the probability of absorption of $i$ into some positive class:
$$f^{*}(i, D) = P\{x_n(\omega) \in D \text{ for some } n > 0 \mid x_0(\omega) = i\}.$$
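Theorem 2 and the absorption probabilities $f^{*}(i, D(\alpha))$ can be illustrated on a toy chain. The sketch below assumes a hypothetical four-state chain (not from the text) with one nonrecurrent state 0 and two positive classes $D(1) = \{1, 2\}$ and $D(2) = \{3\}$. It checks that mixtures $\lambda \pi^{(1)} + (1-\lambda)\pi^{(2)}$ are stationary, that an initial distribution charging state 0 is not, and that the long-run row from state 0 equals $f^{*}(0, D(\alpha))\,\pi_j$.

```python
def step(dist, P):
    """One step of the chain: returns the row vector dist * P."""
    size = len(P)
    return [sum(dist[k] * P[k][i] for k in range(size)) for i in range(size)]


def is_stationary(dist, P, tol=1e-12):
    """True when dist * P agrees with dist up to tol in every coordinate."""
    return max(abs(a - b) for a, b in zip(step(dist, P), dist)) < tol


def absorption_probability(P, start, target, n_iter=500):
    """Approximate P{x_n in target for some n > 0 | x_0 = start}.

    Iterates h(i) = sum_k p_ik * (1 if k in target else h(k)) starting from
    h = 0, which increases to the minimal nonnegative solution.
    """
    size = len(P)
    h = [0.0] * size
    for _ in range(n_iter):
        h = [sum(P[i][k] * (1.0 if k in target else h[k]) for k in range(size))
             for i in range(size)]
    return h[start]


# Hypothetical chain: state 0 is nonrecurrent, {1, 2} and {3} are positive classes.
P = [[0.2, 0.3, 0.0, 0.5],
     [0.0, 0.5, 0.5, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.0]]
pi_1 = [0.0, 2.0 / 3.0, 1.0 / 3.0, 0.0]   # stationary distribution of {1, 2}
pi_2 = [0.0, 0.0, 0.0, 1.0]               # stationary distribution of {3}


def mixture(lam):
    """Initial distribution of the form (8) with weights lam and 1 - lam."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(pi_1, pi_2)]


f_class_1 = absorption_probability(P, 0, {1, 2})
f_class_2 = absorption_probability(P, 0, {3})

# Distribution of x_200 given x_0 = 0; both classes here are aperiodic.
long_run = [1.0, 0.0, 0.0, 0.0]
for _ in range(200):
    long_run = step(long_run, P)

print(is_stationary(mixture(0.4), P), f_class_1, f_class_2, long_run)
```

In this example $f^{*}(0, D) = f^{*}(0, D(1)) + f^{*}(0, D(2)) = 1$, so the chain started at 0 is absorbed into a positive class with probability one.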
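Theorem 3. The series
$$\sum_{j} \frac{1}{n} \sum_{\nu=1}^{n} p_{ij}^{(\nu)}$$
in $j$ converges uniformly with respect to $n$ if and only if $\sum_j \pi_{ij} = 1$ or equivalently $f^{*}(i, D) = 1$.
Proof. Let
$$a_{ij}^{(n)} = \frac{1}{n} \sum_{\nu=1}^{n} p_{ij}^{(\nu)};$$
then $\lim_{n\to\infty} a_{ij}^{(n)} = \pi_{ij}$ by the Corollary to Theorem 6.4. Hence if $\sum_j a_{ij}^{(n)}$ converges uniformly with respect to $n$, we have $\sum_j \pi_{ij} = 1$ upon passing to the limit under the summation sign.
To prove the converse let $J$ be a finite subset of $I$. We have
$$\lim_{n\to\infty} \sum_{j \notin J} a_{ij}^{(n)} = 1 - \lim_{n\to\infty} \sum_{j \in J} a_{ij}^{(n)} = 1 - \sum_{j \in J} \pi_{ij}.$$
Since $\sum_j \pi_{ij} = 1$, given any $\varepsilon > 0$ we can choose $J = J(\varepsilon)$ so that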
There exists no = no (e) such that if n > no,
La}j) ii=L i
L
exEA iED(ex)
~*!.= Lf*{i,D(rx)) 11
exEA
L iED(ex)
n:.. = Lf*{i,D(rx)). 11
exEA
This completes the proof of the theorem. □
The M.C. is called non-dissipative iff $\sum_j \pi_{ij} = 1$ for every $i$.
Theorem 4. Within an essential class the series $\sum_j p_{ij}^{(n)}$ converges uniformly with respect to $n$ for every $i$ or for no $i$ according as the class is positive or not.
Proof. If the period of the class $C$ is equal to one the proof is similar to that of Theorem 3, since now $\pi_{ij} = \pi_j$ and $\sum_{j \in C} \pi_j = 1$ if and only if $C$ is positive. Let the period be $d$; then for every $r$,
$$\sum_{j} p_{ij}^{(nd+r)} = \sum_{j \in C_r(i)} p_{ij}^{(nd+r)}$$
where $\lim_{n\to\infty} p_{ij}^{(nd+r)} = d\,\pi_j$ for every $j \in C_r(i)$ and $1 \le r \le d$, $\sum_{j \in C_r(i)} d\,\pi_j = 1$. Hence the same conclusion holds. □
Corollary. Within a positive class of period $d$, we have $\lim_{n\to\infty} p_{ij}^{(nd+r)} = d\,\pi_j$ uniformly with respect to $j$ in $C_r(i)$.
The probabilistic meaning of Theorem 4 is as follows. If $i$ is in a positive class $C$, then given any $\varepsilon > 0$ there exists a finite set $J(i, \varepsilon)$ of states in $C$ such that for all $n \ge 0$,
$$P\{x_n(\omega) \in J(i, \varepsilon) \mid x_0(\omega) = i\} > 1 - \varepsilon.$$
For a discussion of the following system of equations
$$u_i = \sum_{j} p_{ij}\, u_j \tag{9}$$
which is "conjugate" to the system (1), see Theorem 17.5 and its corollary. The following simple results are related to the question.
Theorem 5. Let C be an arbitrary class, not necessarily essential. Suppose that Ui~O for each iEC and that we have for some iEC: (10) Then (11) Furthermore, {m, i EC} is the minimal nonnegative solution to the system of inequalities (10), in fact for it there is equality there. Proof. For each 'V ~ 1, the following relation holds for any two i and i belonging to the same class: (12)
$$\sum_{k \in C - \{j\}} p_{ik}\, f_{kj}^{(\nu)} = f_{ij}^{(\nu+1)}.$$
1
This is clear by definition if the sum is extended over l-{il Now if kr!i.C and Pik > 0, then irvO (see HARDY [1J). To avoid a direct application of this restricted Tauberian theorem we may proceed as follows. Let A(u)=iii(U)
iii(U)=oon~1 an un,
B(U)= __1 _ - Lb un 1 - A (u) -
n~O n
1 lul 11m (10) 0i Oo=eOi' n--+oo
Now consider the equation
(n+m)/p(n) _ "{p(n)/p(n)}p(m) P00 00 - L..J Ok 00 kO' k
It follows by (10) that
1 ~ lim {Pbn) /PbnJ} pi~) n--+oo
+ L,*ieOk P~"'r} . k
The last written sum without the restriction "k=J=i" is equal to e oo =/t0=1 by Theorem 9.3. Choosing m so that pi~) > 0, it follows that the upper limit appearing above does not exceed eOi and so in combination with (10) we have ·
11m n--+oo
(11)
p(n)/p(n) 0i Oo=e oi '
Similarly we obtain, by arguments which are dual to the above: lim p(n)/p(n)=r" =1 10 00 10
(12)
n--+oo
.
Furthermore it follows from the equation (~) = P11
and by (11) that (13)
·
11m n--+oo
" j(v) p(n:-v) +0p(n) 01 11 '
n-l
L..J 10 v=1
{p(n)/p(n)}>j* ii 00 = iO eoi-e oi '
On the other hand, by applying (11) and (13) to the equation = P(n) 01
we obtain eo i ~ f~i)
"
n
f(v). p(~-v)
L..J 01 v=1
11
lim {PJi- m ) /PbnJ}
n--+oo
'
+ (1 -
/bi)) eo i'
Choosing m so that f~i) > 0, it follows that the upper limit appearing above, which is the same as lim {pii) /PbnJ} by (6), does not exceed eOi and n--+oo
so in combination with (13) we obtain · {p(nl/p(nl} 11m (14) ii 00 =eoi · n-+oo
Consequently, the condition (6) is now valid with 0 replaced by an arbitrary state, and so are the relations (11), (12) and (14) under the same replacement. The existence of the limit in (7) is thereby established, and its value is of course the same as in (9.17), evaluated with h=l.
Notes. Theorem 1 is elementary; Theorem 2 is Theorem 96 in HARDY [1]; the other Tauberian theorem cited is a consequence of Theorem 108 there. It may be mentioned that a somewhat more elaborate use of the generating function involving WIENER'S Theorem on absolutely convergent Fourier series leads to a proof of the main limit theorem (Theorem 6.1) in the case mii m(s) 1>,"'1 .LJ rru £..J T"'1 '
r"'l
(4)
11.(1))
rH
= Jr'u ./1\1» + " .m(v) " (s +y)1> m(s) £..J 1,""'1 £..J ,1"'· 00
00
v=1
s=l
These equations are to be interpreted according to our conventions regarding 00 (see §9). We divide the rest of the proof into three steps. (i) We show that (1) implies that /1\1»
r"11
< 00 •
If i=i there is nothing to prove. If i=t=i we have from (3), smce (s+y)l>~yl>, //(1))
2: 't,V1,1 .//(1))
(""'fl1 -
+ £...J" 7TU .m(v) yl> " m(s) > .11(1)) + m~ .. //(1)) L..J ,"'1 00
00
v=1
s=1
=
tr1,1
T"'11{""ld.•
Since cp~ > 0 it follows that (1))
(5)
illii
/1.(1)) < .11(1)) r"'''' =1,1,'&
o;
00
where c and c' are positive numbers such that L f~nJ ~ 1. The state 0 n~l
and therefore the M. C. is recurrent if and only if equality holds. It is obvious that the order is p in either case; and if P> 0, mlfJ < - 00 in (18) we therefore obtain (17). 0 00 It follows at once from (19) and the divergence of L: P\1) that N
~pl't.)
lim _n,,;.;-o__ = N-->oo
~
p\'P
_1_
= ei k < 00 •
eki
n~O
This and Theorem 5.3 yield a quick proof of (9.18) and hence also of Corollary 2 to Theorem 9.4 in a recurrent class.
Notes. Formulas (11) and (13) are due to HARRIS [1]; the rest is essentially in CHUNG [2] and [3] except for Theorem 8. Theorem 1 has been generalized here to allow for a taboo set. The proof of Theorem 3 is new; in CHUNG [2] certain other moments and their combinatorial relations are studied. For empty $H$ and $p = 1$, Theorem 1 reduces to the result of KOLMOGOROV that all mean first entrance times are finite in a positive class. It is not exactly clear what property of the power $n^p$ lies behind this theorem, although the method of proof permits trivial extensions. The direct proof of Theorem 4 given in CHUNG [2] is incorrect because of non-absolute convergence in formula (15) there. Theorem 8 is taken from CHUNG [10], extending a result due to SPITZER on random walks; see NEVEU [4] for further generalization. The special case where all $e_{ki} = 1$ is discussed in CHUNG [10].
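§ 12. A random walk example
In this section we study in some detail a random walk scheme which is more general than the one we studied in § 5. First we develop a general method. The taboo probabilities ${}_Hp_{ij}^{(n)}$ satisfy the obvious system of equations:
$${}_Hp_{ij}^{(n)} = \sum_{k \notin H} p_{ik}\; {}_Hp_{kj}^{(n-1)}, \qquad n \ge 2. \tag{1}$$
Summing over $n$ we obtain
$${}_Hp_{ij}^{*} = \sum_{k \notin H} p_{ik}\; {}_Hp_{kj}^{*} + p_{ij} \tag{2}$$
provided that every ${}_Hp_{kj}^{*} < \infty$ and the series above converges. Next, multiplying (1) through by $n$, summing and using (2) we obtain
$${}_HM_{ij} = \sum_{k \notin H} p_{ik}\; {}_HM_{kj} + {}_Hp_{ij}^{*} \tag{3}$$
where
$${}_HM_{ij} = \sum_{n=1}^{\infty} n\; {}_Hp_{ij}^{(n)},$$
and both members of (3) may be $\infty$.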
and both members of (3) may be 00. Specializing to the case where H consists of a single state that {u i , iEl} where if i=Fi, u.= {It-1 if i=i, • 1 is a solution of the system Ui=I.,PijUj, I
Specializing to the case where H consists of two states
Io
k/~ ~f ~ =F1: or k, 1
is a solution of the system Ui=
Furthermore
{Ui'
If~=1,
if i=k,
I., hi Uj, I
i=Fi or k.
iEI} where U.=
•
{km '1.. 0
see
i=Fi·
i=Fk we see that {u i , iEI} where ui=
i we
if i=Fi or k, if i=i or k,
i
and k where
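is a solution of the system
$$u_i = \sum_{l} p_{il}\, u_l + {}_kf_{ij}^{*}, \qquad i \ne j \text{ or } k.$$
Finally $\{u_i, i \in I\}$ where if $i \ne j$ or $k$, if $i = j$ or $k$,
is a solution of the system
$$u_i = \sum_{l} p_{il}\, u_l + {}_kf_{ij}^{*} + {}_jf_{ik}^{*}, \qquad i \ne j \text{ or } k.$$
If the state space is a recurrent class then the last system reduces to
$$u_i = \sum_{l} p_{il}\, u_l + 1.$$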
In the first two cases the condition of finiteness and convergence is automatically satisfied while in the last two cases it must be ascertained.
We are now going to apply these results to an example in which the above systems of equations with boundary values can be solved. Consider a M.C. whose state space is the set of nonnegative integers. The initial distribution is arbitrary and the transition matrix is given as follows. Let $\alpha_0 = 1$, $0 < \alpha_i < 1$, $\beta_i = 1 - \alpha_i$;
$$p_{j,j+1} = \alpha_j, \qquad p_{j,j-1} = \beta_j, \qquad j \ge 0.$$
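For a concrete feel of such boundary value systems, the sketch below computes mean first entrance times $m_{ij}$, $i < j$, for a walk of this type and verifies the standard first-step equations $u_i = \sum_{l \ne j} p_{il} u_l + 1$. Two things are assumptions of the sketch rather than statements of the text: the transition law is taken to be $p_{i,i+1} = \alpha_i$, $p_{i,i-1} = 1 - \alpha_i$ with $\alpha_0 = 1$, and the solution is obtained from the elementary recursion $t_i = (1 + \beta_i t_{i-1})/\alpha_i$ for the mean time $t_i$ to pass from $i$ to $i+1$, which is not the method of the text.

```python
def passage_times(alpha, j):
    """Return u with u[i] = mean first entrance time into j from i, 0 <= i <= j.

    alpha[i] is the (assumed) probability of stepping from i to i + 1;
    alpha[0] must equal 1.0 so that 0 is a reflecting barrier. u[j] is set
    to 0, which encodes the taboo on j in the first-step equations.
    """
    t = [0.0] * j  # t[i] = mean time to go from i to i + 1
    for i in range(j):
        beta = 1.0 - alpha[i]
        previous = t[i - 1] if i > 0 else 0.0
        t[i] = (1.0 + beta * previous) / alpha[i]
    u = [0.0] * (j + 1)
    for i in range(j - 1, -1, -1):
        u[i] = u[i + 1] + t[i]
    return u


def first_step_residual(alpha, u):
    """Largest violation of u_i = 1 + alpha_i u_{i+1} + beta_i u_{i-1}, i < j."""
    j = len(u) - 1
    worst = 0.0
    for i in range(j):
        below = u[i - 1] if i > 0 else 0.0  # beta_0 = 0, so the value is irrelevant
        rhs = 1.0 + alpha[i] * u[i + 1] + (1.0 - alpha[i]) * below
        worst = max(worst, abs(u[i] - rhs))
    return worst


# Symmetric steps away from the barrier: the mean time from 0 to j is j * j.
u_symmetric = passage_times([1.0, 0.5, 0.5, 0.5], 4)

# A hypothetical inhomogeneous walk.
alpha_general = [1.0, 0.6, 0.3, 0.7, 0.5]
u_general = passage_times(alpha_general, 5)
print(u_symmetric, first_step_residual(alpha_general, u_general))
```

Only $\alpha_0, \dots, \alpha_{j-1}$ enter the computation, since the walk cannot pass beyond $j$ before entering it.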
Thus the generalization over the classical scheme (§ 5) lies in the fact that the transition probabilities at different states are generally different although from each state it is possible only to go to one of the two adjacent states at the next step. The condition $\alpha_0 = 1$ making 0 a reflecting barrier is not as restrictive as it may seem. By an obvious relabelling we could replace 0 by any negative integer and so achieve an apparent extension of the scheme. In particular, quantities like ${}_kf_{ij}^{*}$ and ${}_km_{ij}$ which depend only on a finite portion of the transition matrix because of the special nature of the random walk will not be affected by any change outside this portion. Let us first compute the According to the general method they satisfy the following systems:
m.
Since $\alpha_0 = 1$ we have $u_0 = u_1$. Rewriting the system as
we see that the unique solution is Ui=(J., (4)
O~i~i.
Thus
o~io}. The event Al(W)=O is the event that xn(w)=Fj between the first and second entrances at i; hence
P{A1(w) = O}=itti ,
P{A1(w) >0}=1- itti·
Thus the random variable N' has a geometric distribution and we have for every p~O, (7) Now consider the sum N-l
Tv(i)-l
TV+l(i)-l
TN(i)-l
W= n=l L Y,,= n=l L Yn+n=Tv(i) L Yn+ n=Tv+,(i) L Yn-
(8) We have by (6)
(9)
E(I WI
1
P)
~E {(:~:Iy"ln ~E {C~11y"1 +:~w;Jy"ln ~2P E{C~1IYnln+2PE {C~tN;lly"ln.
Applying the second part of Theorem 3, we see that the last two expectations are equal so that
E(IWI P)~2P+1E{C~1IYnln·
I
If P > 1, using
(10)
~
HOLDER'S
E {( IY"I)P} n-1
inequality we obtain
~ E {(N')P-1 ~ IY"I P} n-1
=m~t{N'(w) =m}mP- 1E L~p';,jP1 N'(w) =m}.
We have by the definition of $N'$:
$$E\{|Y_n|^p \mid N'(\omega) = m\} = E\{|Y_n|^p \mid x_s(\omega) \ne j \text{ for all } s \in I_n(i; \omega)\}, \qquad 1 \le n < m;$$
$$E\{|Y_m|^p \mid N'(\omega) = m\} = E\{|Y_m|^p \mid x_s(\omega) = j \text{ for some } s \in I_m(i; \omega)\}.$$
It follows from Theorem 13.5 that the right members above are independent of $n$ and $m$ respectively; hence we may denote them by $E_1$ and $E_2$. The hypothesis $E(|Y_n|^p) < \infty$ for every $n$ implies that $E_1 < \infty$ and $E_2 < \infty$. Substituting into (10) we obtain by (6) and (7),
EK~lly"ln ~~t{N'(w)=m}mP-1{(m-1)E1+E2}
OO
ZS(n) n
=E(Y)=:i5;(I).
In the same way it is proved that the above relation holds if I~ o. Hence it holds for a general I, provided :i5;(I/I) .}Y,,(w)P(dw)
=
n-l
L
• =1
{E(Y,,)-
J
{len. w);£.}
Y,,(w)P(dw)} .
n
The last summation may be replaced by 2;. Now it is clear that v=1
{w: l(n, w) 0,
6
:n;2i2"""" A :n;2n •
i;?,An
Furthermore the events {w:YanH(w)~An}, n~1, are independent since xan (w) = with probability one. Hence by the Borel-Cantelli
°
lemma, we have
P{YanH(w)~An i.o.}=1.
This and the fact 5 an (w)=0 together imply that
· Sn PI-·Im-=oo. - sn 0= Pl Im-< n..:::;oo n
n~oo
n
Consequently the limit in (5) does not exist, nor does that in (9) since E(5 an )=0, E(5 anH )=00.
°
Example 2. Let $I$ consist of 0 and ordered pairs $(r, s)$ of integers such that $r \ge 1$, $1 \le s \le 2^r$. Let $p_0 = 1$,
$$p_{00} = 2^{-1}, \qquad p_{0,(r,1)} = 3^{-r}, \qquad p_{(r,2^r),0} = 1, \qquad r \ge 1,$$
$$p_{(r,s),(r,s+1)} = 1, \qquad r \ge 1;\ 1 \le s \le 2^r - 1.$$
It is easily seen that
$$m_{00} = 2^{-1} + \sum_{r=1}^{\infty} \frac{2^r + 1}{3^r} = 3,$$
$$m_{(r,s),(r,s)} = 3^{r+1}, \qquad r \ge 1,\ 1 \le s \le 2^r.$$
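The two mean recurrence times just stated can be confirmed with exact rational arithmetic. The sketch below is only a check under the reading of Example 2 in which an excursion from 0 through row $r$ has probability $3^{-r}$ and length $2^r + 1$; the helper names and the truncation level `R` are illustrative choices.

```python
from fractions import Fraction


def mean_return_to_zero(R):
    """Partial sum of m_00: return in one step with probability 1/2, or in
    2**r + 1 steps with probability 3**(-r), for r = 1, ..., R."""
    total = Fraction(1, 2)
    for r in range(1, R + 1):
        total += Fraction(2 ** r + 1, 3 ** r)
    return total


def stationary_mass(R):
    """Mass of pi on state 0 and rows 1..R, with pi_0 = 1/3 and
    pi_(r,s) = 3**(-r-1) = 1 / m_(r,s),(r,s) for each of the 2**r states."""
    pi_0 = Fraction(1, 3)
    return pi_0 + sum(2 ** r * Fraction(1, 3 ** (r + 1)) for r in range(1, R + 1))


R = 60
gap_mean = 3 - mean_return_to_zero(R)   # remainder of the series for m_00
gap_mass = 1 - stationary_mass(R)       # remainder of the total mass
print(float(gap_mean), float(gap_mass))
```

Both remainders decay like $(2/3)^R$, so $m_{00} = 3$ and the masses $1/m_{ii}$ add up to one, as they must in a positive class.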
Let 1(0)=0, and
I((r,s))= {
+(!)' if s is odd; - (!)'
if s is even.
Then we have
on the other hand it is clear that f-ti=O for every iEf. Now it is easy to see that for every n~O,
(14) It follows that for every It> (lg 3-lg 2) Ilg 3,
L P{ISn(w)1 >nA} n
Thus the limit in (5) is 0. Furthermore it follows from (14) that $E(S_n)$ is bounded in $n$; hence the limit in (9) is also 0.
Notes. Theorem 1 is given in CHUNG [3]. HARRIS and ROBBINS [1] proved it for a general state space using HOPF's ergodic theorem. Corollary 1 was proved before Theorem 1 by LEVY [1] and HARRIS [1]. Theorem 2 can be derived without pain from G. D. BIRKHOFF'S ergodic theorem, where the needed metric transitivity can be proved by martingale theory, as suggested by DOOB (in a letter). For our purposes our proofs are by far the simplest. For the kind of application of Theorem 1 alluded to after its proof see the Notes on § 16.
For generalization of ratio ergodic theorems as well as ratio limit theorems (§ 9) to general state space, see JAIN [1], KRENGEL [1]. In the chain case the latter will allow ratios of the form
$$\sum_{n=1}^{N} p^{(n)}(i, A) \Big/ \sum_{n=1}^{N} p^{(n)}(j, B)$$
where e.g. $p^{(n)}(i, A) = \sum_{k \in A} p_{ik}^{(n)}$.
Theorem 3 corresponds of course to the $L_1$-convergence version of the ergodic theorem; for the $L_2$-convergence version see § 16. Unlike the other results here, these two do not extend to an arbitrary initial distribution because the $E$'s in question need not then be finite under our conditions. Theorem 4 is in CHUNG [3], but the proof there is inaccurate since we need the strong, rather than the weak, law of large numbers, a rather unexpected twist of fortune.
An open problem is to find necessary and sufficient conditions for the ergodic theorems given here. The two examples show that "deterministic circuits" must be excluded in order to reach a satisfactory result. It may be objected that these circuits are rather artificial and we may combine e.g. the two states $2i-1$ and $2i$ in Example 1 into one without any loss of information on the transition of the M.C. Nevertheless we can modify these circuits slightly to destroy the complete determinism while still keeping the requisite properties to serve as counterexamples. For instance we may let $p_{2i,2i-1} = p_{2i,0} = \tfrac{1}{2}$ in Example 1.
§ 16. Further limit theorems
In this section we give several more limit theorems about $S_n$ including the central limit theorem and the law of the iterated logarithm. The state space $I$ will now be assumed to be a positive class, in fact beginning with Theorem 3 below the stronger assumption that $m_{ii}^{(2)} < \infty$ for some and hence all $i$ will be made. Let us rewrite (15.12) as l(n)-l
(1)
Sn-enMn= Y'(n)
+ L Z.+ Y"(n)-e
n
M(n-l'n(I)+l'l)
.~l
where M Let
= 'J1:i fli
is independent of i by the Corollary to Theorem 15.4. (J~ =E {Z~ (i)}
and
Note that (J~ is not the variance of Y,,(i). Applying Theorem 14.4 to the functional f - M we see that if (J~ < 00 for some i EI then (J~ < 00 for all iEl.
1=
Theorem 1. If 0 < (J~ < 00, then Sn is asymptotically normally distributed with mean Mn and variance En; viz. for every real u,
V
Proof. Dividing (1) through by En and letting n-+oo, we see that the first, third, and fourth terms on the right all tend to zero in probability by (14.4) and (14.20), which latter is applied once as given and 7*
once with t= 1. (For the second application we may also use Theorem 14.2.) Thus it remains only to prove that lim p
{
n~oo
1
lin,
,,=1
Let 0 < 13 < 1 and
(2)
w)-1
1fBn " Bn L...J
Z " (w);;:;;u
}
=
(/J(u).
n'= [n;(1- e3 )n] ,
where the square bracket denotes the integral part. It follows from Corollary 1 to Theorem 15.2 that there exists a number no(e) such that the w-set A={w: n'no(e)'
If wEA then for all n>no(e) we have
II(n~~1-1 Zv{w)- V~1 Zv(w) I;;:;; 2 n'~~tn"lv~;+/v(W) I· By the well known KOLMOGOROV inequality (see e.g. FELLER [3, p.220]) we have
(3)
P
I
{ max I L
v'
-}
Zv(w) >2- 1ea iVn*;;:;;
n' (u-s)O'p .
Hence if wE(Q-Ao)AIA2' then l(n, w)-1 >n' so that there exists n(w) (u- 4s)O'Vn*}.
The last equation is equivalent to
1-P
C6 A s) +pC6oAs) ~P t~~~[S.(w)-Mp]~ (U-4s)O'Vn'isO'Vn*}
t= 'tl(n. W)
~p{ I;;;;.;;;; maxu,,(w»sO'Vn*} ..
since
It= L
I
v
max
'l':l(n,ro) 2 :
denote positive constants. We have then
••• , C4
f'(j)
O
such that
n* In =
no < 1 it follows that
~~ (t1Y,,-A~)
tends in distri-
bution to (/). This has been shown to be impossible. Case 2. There exists a sequence {nk} such that Bnk=o(Vnk). It follows then from (14) that (16)
plim Vi * nk
k ..... oo
(~Y,,-Ank)=O. .~1
According to the necessary and sufficient conditions for the validity of the weak law of large numbers (see GNEDENKO-KoLMOGOROV [1; p. 134J) it is necessary that
nt{I1';.(w)-ml ~ Vn:}=o(1) where m is a median of 1';.. But the left member is asymptotically equal to C2 nk* -----:-,-=-----
n: ,+1 192n:
by (12); consequently (16) is false and Case 2 is impossible.
We have therefore proved that in either case the sequence of random variables (15) does not tend in distribution to t;P and with this our goal is reached. Notes. Theorems 1 and 2 are due to DOEBLIN [4J and [1J. Our proof of Theorem 1 is slightly simpler than DOEBLIN'S and, thanks to an observation due to D. G. KENDALL [3J, does not require the condition 1, owing to the circumstance that ~ (I) =~ (g) = 0 for the appropriate j and g. This happens in general when the random variables Y" (i) have to be normalized by subtracting the mean, for then ~ (1) = O. Such is the case also with FELLER'S Theorem 5 which is the particular case of our Theorem 1 for j = el i . Thus the result is valid, but the method of Theorem 15.1 fails. How to replace the latter theorem in the indeterminate case is an open
mW
question. On the other hand, FELLER'S method does not seem to yield DOEBLIN'S theorem.
§ 17. Almost closed and sojourn sets
In this last section of Part I we return to general considerations of the evolution of the M.C. The new results will concern primarily nonrecurrent (essential or inessential) states. Logically speaking, this section may be placed immediately after § 4 as it is largely independent of the developments thereafter. Its postponement is partly due to the methodological difference and partly due to its being an introduction to an as yet unexplored part of the theory of M.C.
Let $I^\infty$ be the space of infinite state sequences, namely sequences of elements of $I$. A generic element of $I^\infty$ may be denoted by $j = \{j_0, j_1, \dots\}$. Define a transformation $T$ on $I^\infty$, called the shift, by
$$T\{j_0, j_1, \dots\} = \{j_1, j_2, \dots\}.$$
Let $f$ be a function with domain $I^\infty$ and range the finite real line. The function $f$ is said to be invariant (under the shift) iff $f(Tj) = f(j)$ for every $j \in I^\infty$. Now consider the M.C. $\{x_n, n \ge 0\}$ with state space $I$ on the triple $(\Omega, \mathfrak{F}, P)$. It is well known (see DOOB [4; p. 603]) that a function $\varphi$ defined on $\Omega$ and measurable with respect to $\mathfrak{F}\{x_n, n \ge 0\}$ is a Baire function $f$ of the sequence $x = \{x_0, x_1, \dots\}$:
$$\varphi(\omega) = f(x(\omega)) = f(x_0(\omega), x_1(\omega), \dots). \tag{1}$$
For a given $\varphi$ there may be more than one $f$ satisfying (1); e.g., if $x_0 \equiv i_0$ on $\Omega$ then the identically vanishing function $\varphi$ may be represented in the form (1) by either the identically vanishing $f$ or the function $f = x_0 - i_0$. The function $\varphi$ is said to be invariant iff there exists an invariant Baire function $f$ for which (1) holds. A set $\Lambda$ in $\mathfrak{F}\{x_n, n \ge 0\}$ is said to be invariant iff its indicator $\varphi_\Lambda$ is an invariant function and hence representable in the form (1) by an invariant Baire function $f_\Lambda$ of $x$. It follows that for every $n \ge 0$,
$$\varphi_\Lambda(\omega) = f_\Lambda(T^n x) = f_\Lambda(x_n(\omega), x_{n+1}(\omega), \dots). \tag{2}$$
The collection of all invariant sets is a nonempty Borel subfield of $\mathfrak{F}$; this field will be called the invariant field and denoted by $\mathfrak{G}$. It is not necessarily complete with respect to $P$.
The above definition may be relaxed by allowing an exceptional null set of $\omega$ in (1). Two functions on $\Omega$ are said to be equivalent iff their values coincide except on a null set; and two sets $\Lambda_1$ and $\Lambda_2$ in $\mathfrak{F}$ are said to be equivalent iff their symmetric difference $(\Lambda_1 - \Lambda_1\Lambda_2) \cup (\Lambda_2 - \Lambda_1\Lambda_2)$ is a null set. We write $\Lambda_1 \doteq \Lambda_2$ when $\Lambda_1$ and $\Lambda_2$ are equivalent, and we
write Ala} and $a$ is an arbitrary number satisfying $0 < a < 1$. For an arbitrary set of states $A$ let us write
$$p^{(n)}(i, A) = P\{x_n(\omega) \in A \mid x_0(\omega) = i\}.$$
Clearly we have (7)
P{2(A) Ixo(w)=i}~ lim p(n) (i, A)~P{2(A) Ixo(w)=i}. n---+oo
Hence if $A$ is almost closed then for any $m \ge 0$ and $i \in I$,
$$P\{\mathfrak{L}(A) \mid x_m(\omega) = i\} = \lim_{n\to\infty} p^{(n)}(i, A)$$
exists. (The converse is not true: take $A$ to be a single nonrecurrent state.) Next, it follows from the second inequality in (7) that if $A$ is transient then $\lim_{n\to\infty} p^{(n)}(i, A) = 0$. (The converse is not true: take $A$ to be a single null recurrent state.)
A set of states $S$ is called a sojourn set iff $P(\mathfrak{L}(S)) > 0$. This is equivalent to the apparently stronger condition that there exist $i \in S$ and $N \ge 0$ such that (8)
For it is a consequence of the definition that there exists an $N < \infty$ such that
from which (8) follows. Let $S$ be a sojourn set and $0 < a < 1$; we put
$$S(a) = \{i : P\{\mathfrak{L}(S) \mid x_0(\omega) = i\} > a\}.$$
Theorem 2. If $S$ is a sojourn set then for every $a$, $0 < a < 1$, the set $S\,S(a)$ is almost closed, $S(a) - S\,S(a)$ is transient, and
Proof. The set 2(5) being invariant, 5(a) is almost closed and 2(5) == 2(5(a)) by the Corollary to Theorem 1. Consequently 2( 5) = 2 (5) 2 (5(a)) = 2 (5 5(a))
and
f£(5 5(a)) 0 and $\Lambda$ does not contain two disjoint subsets in $\mathfrak{A}$ of positive probability; it is called completely nonatomic iff $P(\Lambda) > 0$ and $\Lambda$ does not contain any atomic subset. In the latter case for each given $c$, $0 < c < P(\Lambda)$, there exists a subset $M$ in $\mathfrak{A}$ of $\Lambda$ such that $P(M) = c$. The following lemma is well known (see e.g. LOÈVE [1, p. 100]).
Lemma. Relative to $(\Omega, \mathfrak{A}, P)$ we have the following decomposition: (9)
where the $\Lambda$'s are a finite or denumerable number of disjoint sets in $\mathfrak{A}$, at most one of which is completely nonatomic and the others are atomic. The decomposition is unique modulo null sets in $\mathfrak{A}$.
Let us apply this lemma to $\mathfrak{A} = \mathfrak{G}$ and use the notation in (9). To each invariant set $\Lambda_n$ of positive probability there corresponds an almost closed set of states $A_n$. In analogy with the definitions above an almost closed set will be called atomic iff it does not contain two disjoint almost closed sets, and completely nonatomic iff it does not contain any atomic almost closed set.
Theorem 4. We have the following decomposition of the state space: (10)
where the $A$'s are a finite or denumerable number of disjoint almost closed sets, at most one of which is completely nonatomic and the others are atomic; and
$$\sum_{n=0}^{\infty} P\{\mathfrak{L}(A_n)\} = 1. \tag{11}$$
The decomposition is unique modulo transient sets.
Proof. Let $B_n$ correspond to $\Lambda_n$ in the Lemma with $\mathfrak{A} = \mathfrak{G}$, according to Theorem 1. Let $A_0 = B_0$, $A_n = B_n - \bigcup_{\nu=0}^{n-1} B_\nu B_n$, $n \ge 1$. It follows from the isomorphism of the correspondence that $B_\nu B_n$ is transient if $\nu \ne n$, hence $B_n - A_n$ is transient. Thus $A_n$ as well as $B_n$ is almost closed and $\mathfrak{L}(A_n) \doteq \mathfrak{L}(B_n) \doteq \Lambda_n$. The atomicity of the sets $A_n$ and the uniqueness follow from the isomorphism and (11) follows from (9). The set $I - \bigcup_{n=0}^{\infty} A_n$ is transient and may be absorbed in any $A_n$. □
Corollary. Each existing recurrent class may be taken as one of the atomic $A_n$'s; each of the remaining $A_n$'s including the completely nonatomic one, if present, contains only nonrecurrent states.
This follows from the fact, a consequence of the Corollary to Theorem 4.5, that a recurrent class forms an atomic almost closed set.
Example 1. Let $I$ be the set of nonnegative integers, $p_{01} = p_{02} = \tfrac{1}{2}$; $p_{2i-1,2i+1} = p_{2i,2i+2} = 1 - 1/(i+1)^2$, $p_{2i-1,0} = p_{2i,0} = 1/(i+1)^2$, $i \ge 1$. Then $I$ is a nonrecurrent class; the set of even integers and the set of odd integers are two disjoint atomic almost closed sets.
Example 2. Let $I$ be the set of ordered pairs of nonnegative integers. $p_{(i,0),(i+1,0)} = 1 - 1/(i+2)^2$, $p_{(i,0),(i,1)} = 1/(i+2)^2$, $p_{(i,j),(i,j+1)} = 1$, $i \ge 0$, $j \ge 1$.
All states are inessential. The atomic almost closed sets are $A_0 = \{(i, 0), i \ge 0\}$, $A_{i+1} = \{(i, j), j \ge 1\}$, $i \ge 0$; which yield the decomposition asserted in Theorem 4. Let $B_i = \{(i, j), j \ge 0\}$, $i \ge 0$; then the $B_i$'s are disjoint atomic almost closed sets, but they do not yield the decomposition in the sense of Theorem 4 since (11) is not satisfied. There is no representative of $A_0$ among the $B_i$'s.
Example 3. Let $\{\theta_n, n \ge 1\}$ be a sequence of independent and identically distributed random variables taking the values 1 and $-1$ with probability $\tfrac{1}{2}$ each; and let
$$x_n = 2^{-1} + \sum_{\nu=1}^{n} \theta_\nu\, 2^{-\nu-1}.$$
Then $\{x_n, n \ge 0\}$ is a M.C. with independent (but not stationary!) increments, of which the state space $I$ is the set of numbers of the form $(2m+1)2^{-n}$, $1 \le 2m+1 < 2^n$, $n \ge 1$. It is easy to see that $I$ is a completely nonatomic (almost) closed set, since for each $n$ there exist $2^n$ disjoint almost closed sets $A_\nu$ such that $P\{\mathfrak{L}(A_\nu)\} = 2^{-n}$.
We wind up with the discussion of the system of equations (7.9) which is "conjugate" to the determining system (7.1).
Theorem 5. To every bounded solution $\{u(i), i \in I\}$ of the system of equations
$$u(i) = \sum_{j \in I} p_{ij}\, u(j), \qquad i \in I \tag{12}$$
there corresponds a bounded invariant function $\varphi$ such that
$$u(x_0) = E\{\varphi \mid x_0\} \tag{13}$$
a.e., and conversely.
Proof. Suppose that (12) is true and consider the functional process $\{u(x_n), n \ge 0\}$. We have
$$E\{u(x_{n+1}) \mid x_0, \dots, x_n\} = E\{u(x_{n+1}) \mid x_n\} = \sum_{j} p_{x_n, j}\, u(j) = u(x_n).$$
Hence the sequence $\{u(x_n), \mathfrak{F}_n\}$ where $\mathfrak{F}_n = \mathfrak{F}\{x_\nu, 0 \le \nu \le n\}$ is a martingale. Since $u$ is bounded it follows from a martingale convergence theorem (DOOB [4; p. 319]) that
$$\lim_{n\to\infty} u(x_n) = \varphi$$
exists and $E\{\varphi \mid x_0\} = u(x_0)$ with probability one. Clearly $\varphi$ is bounded and equivalent to an invariant function. Conversely, if (13) is true where $\varphi$ is bounded and invariant, then by the Markov property:
Thus $\{u(x_n), \mathfrak{F}_n\}$ is a martingale and (12) becomes the defining relation
Corollary. $I$ consists of a single atomic almost closed set if and only if the only bounded solution of the system (12) is $u(\cdot) = $ constant.
Notes. The main results, Theorems 1, 2, and 5, are due to BLACKWELL [3], to which we refer for application to chains with independent and stationary increments. For further results along these lines see BREIMAN [1]. Theorems 3 and 4 amount to new proofs of some auxiliary results in FELLER [4]; other related results can be derived by the same method. Apart from their intrinsic interest, these results will be useful in the future development of the continuous parameter theory; see FELLER [5] and § 19 of Part II. The discussion of the transition probability $p^{(n)}(i, A)$ to a set becomes properly speaking a part of the general theory of Markov processes with an arbitrary, not necessarily denumerable, state space; see CHUNG [13].
It should be remarked that the notion of "shift" is used only in this section of the book. Its introduction requires either a specification of the sample space as discussed here, or a new axiomatic foundation as in DYNKIN [2]. Our avoidance of it elsewhere in the book is done at some expense of extra work, for instance in § 15 of Part II. However, the novice is sternly warned against taking this shifty operation for granted.
For connections with discrete potential theory, see DOOB [5], HUNT [1] and NEVEU [4]. For applications to boundary theory see CHUNG [12].
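Theorem 5 and its Corollary in this section can be made tangible on a finite chain. The sketch below assumes a hypothetical symmetric walk on $\{0, 1, \dots, N\}$ with absorbing end points (an illustration, not an example from the text). It has two recurrent classes, $\{0\}$ and $\{N\}$, hence more than one atomic almost closed set, and accordingly the system $u(i) = \sum_j p_{ij} u(j)$ has the nonconstant bounded solution $u(i) = i/N$. The corresponding invariant function is the indicator of absorption at $N$, so $u(i)$ should agree with the limit of $p^{(n)}(i, \{N\})$.

```python
def apply_matrix(P, u):
    """Column action of P: returns the vector with entries sum_j p_ij u(j)."""
    size = len(P)
    return [sum(P[i][j] * u[j] for j in range(size)) for i in range(size)]


def distribution_after(P, start, n_steps):
    """Distribution of x_n given x_0 = start, for n = n_steps."""
    size = len(P)
    dist = [1.0 if k == start else 0.0 for k in range(size)]
    for _ in range(n_steps):
        dist = [sum(dist[k] * P[k][j] for k in range(size)) for j in range(size)]
    return dist


# Hypothetical chain: symmetric steps inside, states 0 and N absorbing.
N = 4
P = [[0.0] * (N + 1) for _ in range(N + 1)]
P[0][0] = 1.0
P[N][N] = 1.0
for i in range(1, N):
    P[i][i - 1] = 0.5
    P[i][i + 1] = 0.5

u = [i / N for i in range(N + 1)]          # candidate bounded solution of (12)
harmonic_gap = max(abs(a - b) for a, b in zip(apply_matrix(P, u), u))

# Probability of having been absorbed at N by time 400, starting from each state.
absorbed_at_top = [distribution_after(P, i, 400)[N] for i in range(N + 1)]
print(harmonic_gap, absorbed_at_top)
```

Replacing the two absorbing states by a single one would leave only constant bounded solutions, in line with the Corollary.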
Part II. Continuous parameter
§ 1. Transition matrix: basic properties
In Part II it will be convenient to begin with the analysis of a denumerable set of real valued functions, later (in § 4) to be identified as the transition probability functions of a continuous parameter Markov chain. Although the questions treated in the next three sections are of a purely analytic nature some of the methods used will be probabilistically inspired. Occasionally the results are given in a more general form than is required by later applications.
Henceforth $T$ stands for the interval $[0, \infty)$, $T^0$ for $(0, \infty)$; $I$ for a denumerable set of indices, later to be specified to be the minimal state space of a Markov chain. Unless otherwise stated, the letters $s$, $t$, $u$, $\delta$, $\varepsilon$ denote positive real numbers, namely points of $T^0$; $i$, $j$, $k$, $l$ denote elements of $I$; $\nu$, $m$, $n$ denote positive integers. An unspecified sum over the indices is over all $I$.
A transition matrix is a finite or denumerable array of functions $(p_{ij}(\cdot))$ or more simply $(p_{ij})$, $i, j \in I$, defined on $T^0$ and satisfying the following three conditions; for every $i$, $j$ and $s$, $t$,
Pii(t)~O;
(B)
LPij(t)=1;
(C)
L Pik (s)hj(t) =Pii (s+t).
i
k
If we denote the matrix (Pii(t)
by P(t), then these properties can be stated as follows; each element of P(t) is nonnegative; each row sum is equal to one; and the family {P(t), tETO} is a semigroup with respect to the usual matrix multiplication. (Convergence of the row-column product in (C) in the infinite case is ensured by (A) and (B).) Conditions (A) and (B) together may be expressed by saying that for each t, the matrix (hi (t) is stochastic. Condition (C) is often referred to as the Chapman-Kolmogorov equation. In the sequel measurability on T or part of it means Lebesgue measurability, unless otherwise specified. This measure is denoted by ft and the phrases" almost all" and "almost everywhere", abbreviated to "a.a." and" a.e.", have the usual meaning. An example of a transition matrix whose elements are not measurable will be given at the end of this monograph. Measurability of all the
II Continuous parameter
120
elements of the matrix has far-reaching consequences. We record this as a basic assumption: Every Pij is a measurable function in TO.
(D)
A transition matrix which satisfies the condition (D) is said to be
measurable. Theorem 1. For any transition matrix (Pij) and each fixed h> 0 the sum 2:IPii(t+h)- Pii(t)1 i
is nonincreasing as t increases. If (Pii) is measurable, then the above sum tends to zero uniformly in t~!5 > 0 as h--+O. In particular, each Pi i is uniformly continuous in [15, 00) for every 15> O. Proof. We have if 0 < s < t,
$$\sum_j |p_{ij}(t+h) - p_{ij}(t)| = \sum_j \Big| \sum_k [p_{ik}(s+h) - p_{ik}(s)]\, p_{kj}(t-s) \Big| \le \sum_k |p_{ik}(s+h) - p_{ik}(s)| \sum_j p_{kj}(t-s) = \sum_k |p_{ik}(s+h) - p_{ik}(s)|.$$

This proves the first assertion. Next if the matrix is measurable, then upon integrating we obtain, if $t \ge \delta > 0$,

$$\sum_j |p_{ij}(t+h) - p_{ij}(t)| \le \frac{1}{\delta} \sum_k \int_0^{\delta} |p_{ik}(s+h) - p_{ik}(s)|\, ds.$$

Hence if $0 \le h \le \delta$ the second series is dominated by

$$\frac{2}{\delta} \sum_k \int_0^{2\delta} p_{ik}(s)\, ds$$

and consequently converges uniformly in $h \in [0, \delta]$. By a well known theorem (see e.g. TITCHMARSH [1; p. 377]) we have for each $k$,

$$\lim_{h \to 0} \int_0^{\delta} |p_{ik}(s+h) - p_{ik}(s)|\, ds = 0.$$

The second assertion of the theorem follows from this and the uniform convergence, and the third assertion is an immediate consequence. □

The most important part of Theorem 1 is: every element $p_{ij}$ of a measurable transition matrix is a continuous function in $T^0$. In order to lead to a further essential hypothesis regarding the transition matrix, we study the behavior of $p_{ij}(t)$ as $t \to 0$ from the right. (Since $p_{ij}(t)$ is not defined for $t < 0$ we shall simply write $t \to 0$ for such a limit.) We prove first an algebraic result.
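Before the algebraic result, conditions (A), (B), (C) and the monotonicity asserted in Theorem 1 can be checked numerically on the simplest nontrivial case. The sketch below is only an illustration under stated assumptions: it uses the well-known closed form for a two-state chain with hypothetical jump rates `a` and `b` (these names and values are not from the text), and all function names are invented for the example.

```python
import math

def two_state_P(t, a=2.0, b=1.0):
    """Closed-form P(t) for a two-state chain; a and b are assumed jump rates."""
    s = a + b
    e = math.exp(-s * t)
    return [[(b + a * e) / s, (a - a * e) / s],
            [(b - b * e) / s, (a + b * e) / s]]

def matmul(A, B):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def check_conditions(s, t, tol=1e-12):
    """Return the truth values of (A), (B) at time t and of (C) for the pair (s, t)."""
    Ps, Pt, Pst = two_state_P(s), two_state_P(t), two_state_P(s + t)
    cond_a = all(x >= 0 for row in Pt for x in row)
    cond_b = all(abs(sum(row) - 1.0) < tol for row in Pt)
    prod = matmul(Ps, Pt)
    cond_c = all(abs(prod[i][j] - Pst[i][j]) < tol
                 for i in range(2) for j in range(2))
    return cond_a, cond_b, cond_c

def row_variation(i, t, h):
    """The sum over j of |p_ij(t+h) - p_ij(t)| appearing in Theorem 1."""
    P1, P0 = two_state_P(t + h), two_state_P(t)
    return sum(abs(P1[i][j] - P0[i][j]) for j in range(2))

print(check_conditions(0.3, 0.7))
print([row_variation(0, t, 0.05) for t in (0.1, 0.5, 2.0)])
```

For fixed `h` the printed variations decrease as `t` grows, which is the first assertion of Theorem 1 in this special case; a finite example of course says nothing about the measurability questions that make the general theorem delicate.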
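Theorem 2. Let the matrix $(u_{ij})$ be given satisfying the following conditions for every $i$ and $j$:

$$u_{ij} \ge 0; \tag{1}$$

$$\sum_j u_{ij} \le 1; \tag{2}$$

$$u_{ij} \le \sum_k u_{ik} u_{kj}. \tag{3}$$

Then the index set $I$ may be partitioned into disjoint sets $F, I, J, \ldots$ such that

$$u_{ij} = 0 \quad \text{if } j \in F; \tag{4}$$

$$u_{ij} = \delta_{IJ}\, u_j \quad \text{if } i \in I,\ j \in J; \tag{5}$$

where $\{u_j, j \in I - F\}$ are numbers satisfying

$$u_j > 0, \qquad \sum_{j \in J} u_j = 1. \tag{6}$$

Proof. Let

$$u_j = \sup_i u_{ij}.$$

From (1) and (3) we have

$$u_{ij} \le \sum_k u_{ik} u_{kj} \le u_{ij} u_{jj} + \Big( \sum_{k \ne j} u_{ik} \Big) u_j$$

and so by (2)

$$u_{ij} \le u_{ij} u_{jj} + (1 - u_{ij})\, u_j.$$

Hence

$$u_j (1 + u_j - u_{jj}) \le u_j.$$

If $u_j = 0$ then $u_{jj} = 0$. If $u_j > 0$ then it follows from the last inequality that $u_j - u_{jj} \le 0$. Hence by the definition of $u_j$ we have

$$u_{jj} = u_j. \tag{7}$$

Thus this equation holds in any case; using it and (3) again we obtain

$$u_{jj} \le \sum_k u_{jk} u_{jj} + u_{ji} (u_{ij} - u_{jj}) \le u_{jj} + u_{ji} (u_{ij} - u_{jj}).$$

It follows that

$$u_{ij} = u_{jj} \quad \text{if } u_{ji} > 0. \tag{8}$$

Now define $F$ to be the set of $j$ such that $u_{jj} = 0$. If $j \in F$, then

$$u_{ij} \le u_{jj} = 0.$$

Hence $u_{ij} = 0$ for every $i$, proving (4). Let $j \in I - F$; we have

$$0 < u_{jj} \le \sum_i u_{ji} u_{ij} \le \Big( \sum_i u_{ji} \Big) u_j.$$

It follows by (7) that

$$\sum_j u_{ij} = 1 \quad \text{if } i \notin F. \tag{9}$$

Furthermore we have

$$1 = \sum_j u_{ij} \le \sum_j \sum_k u_{ik} u_{kj} = \sum_k u_{ik} \sum_j u_{kj} \le \sum_k u_{ik} = 1.$$

Thus equality must hold throughout, and we obtain

$$u_{ij} = \sum_k u_{ik} u_{kj} \quad \text{if } i \notin F,$$

strengthening (3). In particular

$$u_{ij} \ge u_{ik} u_{kj};$$

consequently $u_{ik} > 0$ and $u_{kj} > 0$ imply that $u_{ij} > 0$. We may thus partition the set $I - F$ into equivalence classes: two indices $i$ and $j$ belong to the same class if $u_{ij} > 0$. This relation is reflexive by (7) and symmetric by (8), since $u_{jj} > 0$ if $j \notin F$; and transitive as just shown. As a notational device, the class to which an index belongs will be denoted by the corresponding capital letter, e.g. $i \in I$, $j \in J$, unless the contrary is explicitly indicated. (The class $I$ is not to be confused with the index set $I$.) Thus if $I \ne J$, then $u_{ij} = 0$. If $I = J$, then $u_{ji} > 0$ and we have from (7) and (8)

$$u_{ij} = u_{jj} = u_j.$$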
These two cases are combined in the formula (5); (6) now follows from (9). □

The index set for the classes distinct from $F$ will be denoted by $C$ and an unspecified sum over the classes is over all $C$.

Corollary. We have actually

$$u_{ij} = \sum_k u_{ik} u_{kj} \tag{3'}$$

for every $i$ and $j$. There exist numbers $\{\rho_{iJ}, i \in F, J \in C\}$ satisfying

$$\rho_{iJ} \ge 0, \qquad \sum_J \rho_{iJ} \le 1 \tag{10}$$

and such that

$$u_{ij} = \rho_{iJ}\, u_j \quad \text{if } i \in F,\ j \in J. \tag{11}$$

Conversely, given any partition of $I$ into disjoint sets $F, I, J, \ldots$ and given any $\{u_j, j \in I - F\}$ and $\{\rho_{iJ}, i \in F, J \in C\}$, satisfying (6) and (10) respectively, there exists a matrix $(u_{ij})$ satisfying (1), (2), (3'), (4), (5) and (11).
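The converse half of the Corollary is constructive, so it can be illustrated on a small example. The sketch below rests on an assumed reading of the formulas: (4) as $u_{ij} = 0$ for $j \in F$, (5) as $u_{ij} = \delta_{IJ} u_j$ for $i \in I$, $j \in J$, and (11) as $u_{ij} = \rho_{iJ} u_j$ for $i \in F$, $j \in J$. The state labels, weights and function names are hypothetical. Exact rational arithmetic is used so that the identity (3') can be tested with equality rather than a tolerance.

```python
from fractions import Fraction as Fr

def build_u(F, classes, u, rho):
    """Assemble (u_ij) from a partition, weights u_j and numbers rho_iJ.

    F       : list of indices forming the set F
    classes : dict mapping a class label to the list of its indices
    u       : dict index -> positive weight, summing to 1 within each class
    rho     : dict (i, class label) -> number, for i in F
    """
    cls_of = {j: J for J, members in classes.items() for j in members}
    index = list(F) + [j for members in classes.values() for j in members]
    U = {}
    for i in index:
        for j in index:
            if j in F:
                U[i, j] = Fr(0)                       # assumed form of (4)
            elif i in F:
                U[i, j] = rho[i, cls_of[j]] * u[j]    # assumed form of (11)
            else:
                same = cls_of[i] == cls_of[j]
                U[i, j] = u[j] if same else Fr(0)     # assumed form of (5)
    return index, U

def satisfies_1_2_3prime(index, U):
    """Check nonnegativity (1), row sums at most one (2), and the identity (3')."""
    nonneg = all(U[i, j] >= 0 for i in index for j in index)
    rows = all(sum(U[i, j] for j in index) <= 1 for i in index)
    idem = all(U[i, j] == sum(U[i, k] * U[k, j] for k in index)
               for i in index for j in index)
    return nonneg and rows and idem

# Hypothetical data: F = {f}, class A = {a1, a2}, class B = {b}.
F = ["f"]
classes = {"A": ["a1", "a2"], "B": ["b"]}
u = {"a1": Fr(1, 3), "a2": Fr(2, 3), "b": Fr(1)}
rho = {("f", "A"): Fr(1, 2), ("f", "B"): Fr(1, 4)}

index, U = build_u(F, classes, u, rho)
print(satisfies_1_2_3prime(index, U))
```

In this example the row belonging to the index in $F$ sums to $3/4$, which shows that (2) can hold with strict inequality exactly on $F$, in agreement with (9).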
Proof. By (4) the sum in (3) need only be extended to $k \notin F$. Now upon summing over all $j$ and using (9) we see that equality must hold in (3), namely (3') is true. We have from (3'), (4) and (5)

$$u_{ij} = \Big( \sum_{k \in J} u_{ik} \Big) u_j \quad \text{if } j \in J.$$

If we define

$$\rho_{iJ} = \sum_{k \in J} u_{ik}$$

then (10) is satisfied because of (1) and (2). To prove the converse we simply define the elements of the matrix $(u_{ij})$ by (4), (5) and (11). The verification of (1), (2) and (3') is straightforward. □

We are now ready to prove that the continuity (hence also the measurability) of all $p_{ij}$ is equivalent to the existence of their limits at zero; furthermore in this case each $p_{ij}$ is decomposed into the product of this limit and a simpler function.

Theorem 3. Let $(p_{ij})$ be a transition matrix. Then all $p_{ij}$ are continuous in $T^0$ if and only if all the following limits exist:

$$\lim_{t \to 0} p_{ij}(t) = u_{ij}.$$

In this case the limit matrix $(u_{ij})$ satisfies the conditions (1), (2) and (3') above and we have

$$p_{ij}(t) = \sum_k p_{ik}(t)\, u_{kj} \ge \sum_k u_{ik}\, p_{kj}(t). \tag{12}$$

Furthermore, in the notation of Theorem 2, we have

$$p_{ij}(t) \equiv 0 \quad \text{if } j \in F. \tag{13}$$

There exists a transition matrix $(\Pi_{IJ})$, $I, J \in C$, satisfying the additional condition

$$\lim_{t \to 0} \Pi_{IJ}(t) = \delta_{IJ} \tag{14}$$

such that

$$p_{ij}(t) = \Pi_{IJ}(t)\, u_j \quad \text{if } i \in I,\ j \in J. \tag{15}$$

There exist functions $\Pi_{iJ}$ on $T^0$, $i \in F$, $J \in C$, satisfying

$$\Pi_{iJ}(t) \ge 0, \qquad \sum_J \Pi_{iJ}(t) = 1, \qquad \sum_K \Pi_{iK}(s)\, \Pi_{KJ}(t) = \Pi_{iJ}(s+t) \tag{16}$$

and such that

$$p_{ij}(t) = \Pi_{iJ}(t)\, u_j \quad \text{if } i \in F,\ j \in J. \tag{17}$$
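Conversely, given any partition of $I$ into disjoint sets $F, I, J, \ldots$ and given any transition matrix $(\Pi_{IJ})$, $I, J \in C$, satisfying (14), any continuous functions $\{\Pi_{iJ}, i \in F, J \in C\}$ satisfying (16) and any $\{u_j, j \in I - F\}$ satisfying (6), there exists a measurable transition matrix satisfying (13), (15) and (17).

Proof. Suppose that all $p_{ij}$ are continuous. Let $\{t_n\}$, $\{t'_n\}$ be two sequences tending to zero such that

$$\lim_{n \to \infty} p_{ij}(t_n) = u_{ij}, \qquad \lim_{n \to \infty} p_{ij}(t'_n) = u'_{ij}$$

for all $i$, $j$. Such sequences exist by the Bolzano-Weierstrass theorem and the diagonal procedure. We have by FATOU's lemma,

$$1 = \lim_{n \to \infty} \sum_j p_{ij}(t_n) \ge \sum_j u_{ij}.$$

Furthermore, by the continuity of $p_{ij}$ and dominated convergence we have

$$p_{ij}(t) = \lim_{n \to \infty} p_{ij}(t + t_n) = \lim_{n \to \infty} \sum_k p_{ik}(t)\, p_{kj}(t_n) = \sum_k p_{ik}(t)\, u_{kj} \tag{18}$$

and consequently

$$1 = \sum_j p_{ij}(t) = \sum_k p_{ik}(t) \sum_j u_{kj}.$$

It follows that if $\sum_j u_{kj} < 1$ for a certain $k$, then $p_{ik}(t) = 0$ for this $k$, every $i$ and every $t$; in particular $u'_{ik} = 0$. Next we have by FATOU's lemma

$$p_{ij}(t) = \lim_{n \to \infty} p_{ij}(t'_n + t) \ge \sum_k u'_{ik}\, p_{kj}(t). \tag{19}$$

Letting $t \to 0$ along the sequence $\{t_n\}$ we obtain

$$u_{ij} \ge \sum_k u'_{ik}\, u_{kj}. \tag{20}$$

On the other hand, letting $t \to 0$ along $\{t'_n\}$ in (18) we obtain

$$u'_{ij} \ge \sum_k u'_{ik}\, u_{kj} \tag{21}$$

and consequently

$$\sum_j u'_{ij} \ge \sum_k u'_{ik} \sum_j u_{kj} = \sum_k u'_{ik}$$

since $u'_{ik} = 0$ whenever $\sum_j u_{kj} < 1$, as shown above. It follows that equality must hold in (21), and upon comparison with (20) we obtain $u_{ij} \ge u'_{ij}$ and so $u_{ij} = u'_{ij}$ by symmetry. Thus any two limiting matrices of $(p_{ij}(t))$ as $t \to 0$ are identical and there is a unique limit matrix $(u_{ij})$. The relation (21) with equality sign becomes (3') and (18) and (19)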
become (12). We have therefore proved that continuity of all $p_{ij}$ implies the existence of a unique limit matrix $(u_{ij})$ at zero satisfying the stated conditions. Suppose on the other hand that there is a unique limit matrix $(u_{ij})$. Then reading (18) without the first member we see that every $p_{ij}$ has a right-hand limit at every $t$. Such a function has only a denumerable set of discontinuities (see e.g. SAKS [1; p. 261]). Hence all $p_{ij}$ are measurable and therefore continuous by Theorem 1.

We proceed to prove the other assertions. If $j \in F$, then $u_{ij} = 0$ for every $i$ by (4) and it follows from (12) that $p_{ij}(t) \equiv 0$. Let $i \in I$, $j \in J$. Since $p_{ik}(t)\, u_{kj} = 0$ if $k \notin J$, and $u_{kj} = u_j$ if $k \in J$ by (5), we have from (12)

$$p_{ij}(t) = \Big( \sum_{k \in J} p_{ik}(t) \Big) u_j.$$

Thus $p_{ij}(t)\, u_j^{-1}$ depends only on $i$ and $J$. Furthermore, upon summing (19) and writing $u_{ik}$ for $u'_{ik}$ we have

$$1 = \sum_j p_{ij}(t) \ge \sum_k u_{ik} \sum_j p_{kj}(t) = \sum_k u_{ik} = 1,$$

the last equation following from (9). Hence equality must hold in (19) and using (5) we obtain

$$p_{ij}(t) = \sum_{k \in I} u_k\, p_{kj}(t).$$

Thus $p_{ij}(t)\, u_j^{-1}$ depends only on $I$ and $j$. Combining this with the preceding remark we conclude that $p_{ij}(t)\, u_j^{-1}$ depends only on $I$ and $J$ so that we can define $\Pi_{IJ}$ by (15). Clearly $\Pi_{IJ}(t) \ge 0$;

$$1 = \sum_{j \notin F} p_{ij}(t) = \sum_J \sum_{j \in J} \Pi_{IJ}(t)\, u_j = \sum_J \Pi_{IJ}(t);$$

and

$$\Pi_{IJ}(s+t) = p_{ij}(s+t)\, u_j^{-1} = \sum_{k \notin F} p_{ik}(s)\, p_{kj}(t)\, u_j^{-1} = \sum_{k \notin F} \Pi_{IK}(s)\, u_k\, \Pi_{KJ}(t) = \sum_K \Pi_{IK}(s)\, \Pi_{KJ}(t) \sum_{k \in K} u_k = \sum_K \Pi_{IK}(s)\, \Pi_{KJ}(t).$$

Finally we have by (5),

$$\lim_{t \to 0} \Pi_{IJ}(t) = u_{ij}\, u_j^{-1} = \delta_{IJ}.$$

[...] $p_{ii}(t) \equiv 0$ or $p_{ii}(t) > 0$ for all $t$ according as $i \in F$ or $i \notin F$. If $p_{ij}(t_0) > 0$ then $p_{ij}(t) > 0$ for all $t \ge t_0$.

Proof. If $i \in F$, then $p_{ii}(t) \equiv 0$ by (13). If $i \notin F$, then by (5) and (15), $\lim_{t \to 0} p_{ii}(t) = u_i > 0$. Since

$$p_{ii}(t) \ge \Big[ p_{ii}\Big(\frac{t}{n}\Big) \Big]^n \tag{22}$$

for every $n$ by (A) and (C), it follows that $p_{ii}(t) > 0$ for all $t$. Next, $p_{ij}(t_0) > 0$ implies that $j \notin F$; hence if $t > t_0$ we have by (A) and (C) and what has just been proved,

$$p_{ij}(t) \ge p_{ij}(t_0)\, p_{jj}(t - t_0) > 0. \qquad □$$
The second assertion of Theorem 4 will now be sharpened into a rather deep result.

Theorem 5. Let $(p_{ij})$ be a measurable transition matrix. Then each $p_{ij}$ is either identically zero or never zero in $T^0$.

Proof. We may take $I$ to be the set of positive integers. Suppose first that the theorem is false for $p_{ij}$ where $i \notin F$. Then by Theorem 4 there exists a $t_0 > 0$ such that
Pi/(t)=O if
O0.
By the Corollary to Theorem 3, there exists an N such that (23)
L Pii(t) < :
i>N
if 0 < t~ 2to'
Let s=to/2N and define for
m~ 1:
Am={k: Pik(m s) >O}. By Theorem 4 we have Am(Am+!. Let BI=AI' Bm=Am-Am-1 for m~2. If kEf Am' then
O=Pik (m s) = L Pii ((m-1) s) Pik(S) = L Pii((m-1) s) Pik (s) and consequently
(24)
i
jEA m _ 1
Pik(S)=O
if JEAm-I' kEf Am.
Suppose if possible that $A_m = A_{m-1}$ for a certain $m$, $1$ [...] $> 0$ for a certain $t_1 > 0$. Then by (13) there is at least one $k$ not in $F$ such that $p_{ik}(t_1/2) > 0$ and $p_{kj}(t_1/2) > 0$. It follows from what has just been proved that $p_{ij}(3t_1/4) > 0$. Repeating this argument and using Theorem 4 we conclude that $p_{ij}(t) > 0$ for all $t > 0$, as was to be shown. □

Another completely different and probabilistic proof of Theorem 5 will be given in Theorem 5.2.

Notes. Theorems 1, 2 and 3 are due to DOOB [1], given here with slight improvements. Similar results in the case where $I$ is finite were previously given by DOEBLIN [2]. The proof of Theorem 1 given here follows the general semigroup treatment (see HILLE-PHILLIPS [1, p. 305]) with a simplification communicated by JURKAT, who also observed that (3') follows from (1) to (3). The proof of Theorem 2 is that of CHACON [1]. Both are simpler than DOOB's original versions. Theorem 5 was first proved by D. G. AUSTIN and simplified by the author, using probabilistic methods; see Theorem 5.2 below. The algebraic proof given here was communicated by D. ORNSTEIN. The result remains true even if condition (B) for $(p_{ij})$ is dropped; see CHUNG [2]. The term "transition matrix" as defined in this section should not cause confusion with that in § I.2, where it denotes the stochastic matrix of the one-step transition probabilities of a discrete parameter M.C. In the latter sense the term may be re-named "one-step transition matrix" to avoid any conflict. Alternatively, the more logical but academic term "transition matrix function" may be used for the present purpose, as in DOOB [4].
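As a concrete instance of the converse part of Theorem 3, one can start from a standard transition matrix on the classes and spread each class over several indices by fixed weights. The sketch below assumes the reading $p_{ij}(t) = \Pi_{IJ}(t)\, u_j$ of formula (15) with $F$ empty; the two-state closed form, the rates `a` and `b`, the class assignment and the weights are all hypothetical choices made for illustration.

```python
import math

def Pi(t, a=2.0, b=1.0):
    """A standard 2x2 transition matrix on the classes (two-state closed form)."""
    s = a + b
    e = math.exp(-s * t)
    return [[(b + a * e) / s, (a - a * e) / s],
            [(b - b * e) / s, (a + b * e) / s]]

CLASS_OF = [0, 0, 1]        # indices 0 and 1 form one class; index 2 forms another
WEIGHT = [0.25, 0.75, 1.0]  # weights u_j, summing to one inside each class

def p(t):
    """p_ij(t) = Pi_IJ(t) * u_j, the assumed form of (15); the set F is empty here."""
    M = Pi(t)
    return [[M[CLASS_OF[i]][CLASS_OF[j]] * WEIGHT[j] for j in range(3)]
            for i in range(3)]

def max_ck_error(s, t):
    """Largest absolute deviation from the Chapman-Kolmogorov equation (C)."""
    Ps, Pt, Pst = p(s), p(t), p(s + t)
    return max(abs(sum(Ps[i][k] * Pt[k][j] for k in range(3)) - Pst[i][j])
               for i in range(3) for j in range(3))

print(max_ck_error(0.4, 0.9))
print(p(1e-9)[0])  # the first row near t = 0
```

Near $t = 0$ the first row is close to $(0.25, 0.75, 0)$ rather than $(1, 0, 0)$: the matrix is continuous and satisfies (A), (B), (C), yet its limit at zero is the matrix $(\delta_{IJ} u_j)$ and not the identity. This is the situation that the notion of a standard transition matrix in § 2 excludes.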
§ 2. Standard transition matrix

According to Theorem 1.3, the analytic study of $(p_{ij})$ is reduced to that of $(\Pi_{IJ})$ if the set of indices $F$ is ignored. In fact, the curtailed matrix $(p_{ij})$, $i, j \in I - F$, is a transition matrix and its elements differ from those of $(\Pi_{IJ})$ only in certain constant factors depending on the second index. From the standpoint of probability, it will be seen (in § 4) that the set $F$ plays a nuisance role and can indeed be ignored. Moreover, the reduction from $(p_{ij})$ to $(\Pi_{IJ})$ is also justified on probabilistic grounds (see Theorem 4.3). The distinctive feature of the transition matrix $(\Pi_{IJ})$ is the property (1.14) which will now be formulated as a definition. A transition matrix $(p_{ij})$ is called standard iff

$$\lim_{t \to 0} p_{ij}(t) = \delta_{ij} \tag{E}$$

for every $i$ and $j$. This is equivalent to $\lim_{t \to 0} p_{ii}(t) = 1$ for every $i$. For a standard transition matrix it is natural to extend the definition of each $p_{ij}$ to $T$ by setting

$$p_{ij}(0) = \delta_{ij}. \tag{1}$$

The following result is a simple consequence of Theorem 1.3.

Theorem 1. The transition matrix $(p_{ij})$ is standard if and only if (a) it is measurable; (b) in the notation of § 1, the class $F$ is empty and each of the remaining classes contains only one element.

In particular, the continuity of the elements of a standard transition matrix follows from Theorem 1.3, but a sharper result can be proved in a simpler way.

Theorem 2. Let $(p_{ij})$ be a standard transition matrix. Then for every $i$, $j$, $t \ge 0$ and $h > 0$ we have

$$|p_{ij}(t+h) - p_{ij}(t)| \le 1 - p_{ii}(h). \tag{2}$$

In other words, the modulus of continuity of $p_{ij}$ does not exceed that of $p_{ii}$ at zero.

Proof. We have by (1.C),

$$p_{ij}(t+h) - p_{ij}(t) = \sum_k p_{ik}(h)\, p_{kj}(t) - p_{ij}(t) = [p_{ii}(h) - 1]\, p_{ij}(t) + \sum_{k \ne i} p_{ik}(h)\, p_{kj}(t).$$

Neither term in the last member exceeds $1 - p_{ii}(h) = \sum_{k \ne i} p_{ik}(h)$ in absolute value by (1.A) and (1.B) and they are of opposite signs; hence (2) follows. □
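The bound in Theorem 2, read here as $|p_{ij}(t+h) - p_{ij}(t)| \le 1 - p_{ii}(h)$, can be observed directly on a standard two-state matrix. The following is a minimal sketch under that reading; the closed form and the rates `a`, `b` are assumptions chosen for illustration, and the function names are invented.

```python
import math

def p(i, j, t, a=2.0, b=1.0):
    """Entry p_ij(t) of a standard two-state transition matrix; p_ij(0) = delta_ij."""
    s = a + b
    e = math.exp(-s * t)
    table = [[(b + a * e) / s, (a - a * e) / s],
             [(b - b * e) / s, (a + b * e) / s]]
    return table[i][j]

def worst_slack(h, ts):
    """Smallest value of (1 - p_ii(h)) - |p_ij(t+h) - p_ij(t)| over i, j and t in ts."""
    return min((1.0 - p(i, i, h)) - abs(p(i, j, t + h) - p(i, j, t))
               for i in range(2) for j in range(2) for t in ts)

ts = [0.0, 0.1, 1.0, 5.0]
for h in (0.01, 0.5):
    # A nonnegative slack (up to rounding) means the bound holds on the grid.
    print(h, worst_slack(h, ts))
```

For this chain the slack vanishes at $t = 0$, so the bound is attained there up to rounding, and it grows with $t$.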
Corollary. Let $(p_{ij})$ be a measurable transition matrix. For every $i \notin F$ and every $j$, $p_{ij}$ is uniformly continuous in $T$.

It turns out that the continuity of $p_{ij}$ follows already from assumptions (1.A), (1.C) and (2.E) without the intervention of (1.B), the finiteness of all $p_{ij}$ being assumed. We prove the following more general result which will also be used on several other occasions; note that the dual statements correspond to the transposition of a matrix.

Theorem 3. Let $(g_{ij})$, $i, j \in I$, be a matrix of finite, nonnegative functions on $T$ satisfying the condition that for every $i$:
Let {Ii' i EI} be finite, nonnegative functions on TO satisfying the following equations for s, tETo: li(S+t)= L Ii (S)gii(t) , iEl;
(4)
i
(5) Then each Ii may be extended to T to be continuous there. Proof. We prove (4) since (5) is entirely similar. For a fixed i and t~o suppose that tn-+O, t~-+O, OO we have also Ij(t+e)~/j(t)gjj(e) from which and a similar argument it follows that for every i:
(6)
Given any t > 0, there exists an s such that 0 < s < t and st£D. If 0 < e < s then li(t- e) = L li(S- e)gij(t- s), i
and consequently by FATou's lemma li(t- 0) ~ L li(S)gii(t- s) =Ij(t). ;
Together with (6) we see that each Ii is left continuous. If D were not empty, then this and (6) would imply that for any to in D, there is some 1· such that Ij (to 0) > Ij(t o). It follows that for all s so small thatgii(s) >0, we should have
+
~ lim Ij(t o+ b)gjj(s) +lim L li(tO+ l5)gij(s) ~tO
~/j(to+O)gii(s)+
>
~tO
i*i
L Ii (to)gii(S)
i*i
Li Ii (to)gii(s) =Ij(to+s).
Hence all such $s$ would belong to $D$ which is impossible since $D$ is countable. Thus $D$ is empty and we have proved that each $f_i$ is continuous in $T^0$ and has a finite limit at 0. The theorem follows. □

Corollary. Let $(g_{ij})$ be finite, nonnegative functions on $T^0$ satisfying the condition (3) and the following equations:

$$g_{ij}(s+t) = \sum_k g_{ik}(s)\, g_{kj}(t), \qquad s, t \in T^0.$$

Then each $g_{ij}$ is continuous in $T$ by extension.

Remark. Uniform continuity in $T$ is ruled out by the trivial counterexample $g_{ij}(t) = \delta_{ij}\, e^{q_i t}$ where $q_i > 0$.

We proceed to establish the existence of the derivatives at zero of the elements of a standard transition matrix $(p_{ij})$ in the next two theorems. Naturally we consider only the right-hand derivative at zero which will be denoted simply by $p'_{ij}(0)$.

Theorem 4. For every $i$,

$$-p'_{ii}(0) = \lim_{t \to 0} \frac{1 - p_{ii}(t)}{t}$$

exists but may be infinite.

Proof. We know from Theorem 1.4 that $p_{ii}(t) > 0$ for all $t$; let $-\log p_{ii}(t) = \varphi(t)$; then $\varphi$ is finite-valued. The inequality

$$p_{ii}(s+t) \ge p_{ii}(s)\, p_{ii}(t), \tag{7}$$

which follows from (1.A) and (1.C), becomes the subadditive property of $\varphi$: $\varphi(s+t) \le \varphi(s) + \varphi(t)$. Put

$$q_i = \sup_{t > 0} \frac{\varphi(t)}{t}$$
1- 2e by (12). Using this in (9) we obtain n-l Pii(n h) > (1-2e) L. Pii(h) (1- e)~ (1- 3 e)n Pii(h); .=0
or (13) Put
Pi~(;h) >(1-3 e) Pi~(h), if nh 0; and we have
I
lim" Pii(t+h) -Pii(t) -p~.(t)l=o
h-+O
L.."
h
i
o. If (3) holds also at t=O, namely if qi= L qii' then the uniformity holds in t ~ 0 in all three cases. Hoi The proof of the corollary is similar to that of Theorem 1.1. Theorem 2. If qi < 00, then for every i, Pii has a continuous derivative in T. Furthermore we have (11)
t~o,
S>O.
Proof. We have, using (6), Pii(t+h)- Pii(t) ~Pii(t) rPii(h)-1] ~- Pii(t) qi h
and consequently D [Pii(t) eq1t ]
= [DPii(t) +Pii(t) qi] #;t~o
where D denotes the right-hand lower derivate. Hence P; i (t) eqjt is nondecreasing in t and DPii is finite a.e. Put
(12) Writing (13) applying D with respect to s and using FUBINI'S theorem quoted in the preceding proof, we obtain (14) for each t and a.a.s. The corresponding inequality with "~" replacing "=" is true for all t and s by FATou's lemma. In particular 00
>vii(to)~Pii(tO-S)Vii(s)
for a.a. to and s < t, from which it follows that each vii is finite everywhere. By FUBINI'S theorem on product measure, (14) is valid if SEEZ and tf£Zs where,u (Z) =,u (Zs) = O. For an sof£Z suppose that for a certain t we have
(15)
Then if t' > t it follows that Vii(t'
+ so) ~ LI
Pi! (t'- t) VI i (t+ so) > L Pit (t'- t) L Plk (t) vki (so) I
k
= LPik(t')vki(SO)' k
This is impossible by the choice of So since (14) holds for a.a.t. Hence Zs is empty if sElZ. Next, let s> 0 be arbitrary, Oo and consequently vii is continuous in TO, by Theorem 2.3. The function Pii' having a continuous Dini derivate DPii in TO, has in fact a continuous derivative P;i there (see e.g. SAKS [1; p. 204J). The continuity in T follows as in the preceding proof. 0 We remark that the difference between Theorem 1 and Theorem 2 lies in the fact that for a fixed i we have the assumption (1.B) for which there is no analogue for a fixed j. The proof of Theorem 2 uses only the finiteness of all Pii and assumptions (i.A), (1.e) and (2.E) without the intervention of (1.B). Under the same assumptions the first sentence in Theorem 1 and the relation (5), namely the part of Theorem 1 which is the dual of Theorem 2, can be proved in a way exactly dual to the proof of Theorem 2. The details are left to the reader. We take this occasion to discuss a simple but important extension of the notion of a transition matrix. A finite or denumerable array of functions -°iEI
which does not involve an index in F remains valid for a substochastic transition matrix; so do Theorems 2.4, 2.5, 2.6, 3.1 and 3.2 except that equation (3) in Theorem 3.1 should be replaced by the corresponding inequality L P;i (t) ~ 0, since p;~ (t) ~ 0. i
Notes. Theorems 1 and 2, except the continuity of $p'_{ij}$ at zero, were first proved by D. G. AUSTIN [1], [2] by purely analytic means. A new proof of Theorem 1 and a partial proof of (11) in Theorem 2 are given in CHUNG [5], in which probabilistic methods are used to introduce the nonnegative quantities $r_{ij}$ and $v_{ij}$ and to derive their properties; see §§ 15 to 16. Another proof is given by YUSKEVIC [2] who also gave an example in which $p_{ij}$ does not have a finite second derivative. JURKAT [1] simplified and strengthened the ideas in CHUNG [5] and the present proofs are cast largely in his form, with a further simplification suggested by G. E. H. REUTER. A simplified version of AUSTIN's proof of Theorem 1 together with its connection with semigroup theory is given by REUTER [2]. A substochastic transition matrix is called a (Markov) process, and a transition matrix an honest process in the last cited paper. For simpler new analytic proofs of Theorems 1 and 2, obtained after the first edition and based on Theorem 12.4 below, see CHUNG [9]. For differentiability of transition functions in Euclidean space, see HSU [1].
§ 4. Definitions and measure-theoretic foundations

For general definitions, conventions and notation we refer to § I.1 and § II.1. A continuous parameter stochastic process is a one-parameter family of random variables $\{x_t\}$ on a probability triple $(\Omega, \mathcal{F}, P)$. The parameter $t$ may range over an arbitrary linear set $S$, although we shall be mainly concerned with the sets $T = [0, \infty)$ and $T^0 = (0, \infty)$. For a fixed $\omega$, the function $x(\cdot, \omega) = x_{\cdot}(\omega)$ of $t \in S$ is called the sample function corresponding to $\omega$. Its domain is $S$; let its range be $\mathcal{R}(\omega)$. The union of $\mathcal{R}(\omega)$ for all $\omega \in \Omega$ is the range of the process, but since a null set is usually immaterial the more pertinent concept is that of the essential range, defined to be the set of values each of which is contained in $\mathcal{R}(\omega)$ for a set of $\omega$ of positive probability. The essential range of a process is therefore invariant if the sample functions of a null set are changed.

We shall be mainly concerned with a process in which each $x_t$ is a discrete random variable and moreover the union $I$ of all the possible values (see § I.1) of all the $x_t$ is still a denumerable set. Such a process will be called denumerably-valued and $I$ its minimal state space. In contrast to the discrete parameter case, the essential range of a denumerably-valued process may be larger than the minimal state space; in fact it may be nondenumerable. A value in the essential range will be called a state of the process; and it will be called a fictitious state iff it is not in $I$. As a convention an unspecified state shall not be fictitious.

A continuous parameter Markov chain is a stochastic process $\{x_t, t \in T\}$ which is denumerably-valued with the minimal state space $I$ and which possesses the Markov property: for every $n \ge 2$, $0 \le t_1 < \ldots\ s\} = 0$ for every $s$; it is said to be stochastically continuous in $S \subset T$ iff it is so at every $t_0 \in S$, and simply stochastically continuous when $S = T$.

The following theorem is a restatement of the separability definition in terms of sample functions, with an important strengthening. It will be referred to in the sequel by the phrase: "by separability".

Theorem 1. The process $\{x_t, t \in T\}$ is separable with $R$ as a separability set if and only if almost all sample functions are separable in $T - R$. If so and if the process is stochastically continuous, then almost all sample functions are separable in $T$.

Proof. Suppose the process is separable with $R$. Let $A_n$ denote the
closure of values x(r,w),!r-t! O. This contradiction disproves the assumption that 0 < u (i) < 00 and so proves the theorem. 0 By FUBINI'S theorem, we have if t> e and i =l= 00,
I'}
1 1 E { 2"8,u[Si(w),,(t-e,t+e)J x(t,w)=~ =2"8
f •(PiPi(t)
(t - s)
+1 ) Pii(s)ds.
o
Hence the conditional expectation tends to one as etO. It follows that ,u[S.(w),,(t-e,t+e)]!2e tends to one in probability under the hypothesis x (t, w) = i. The following theorem strengthens this result.
Theorem 3 (M). For each fixed t>O, we have if i=l=oo, (2)
P{lim +-,u [Si(W),,(t- e, t+ e)] = 11 x (t, w) =i}= 1, 6t O e
(For t=o the limit is to be replaced
bylim~,u[Si(w)"(O, 6t O e
e)].)
Proof. Let 0< e < t and define e(s, t) =0 or 1 according as Is- tl ~ e or . Hence by FUBINI'S theorem, ,u [Si(W),,(t- e, t+ e)J =
00
J e(s, t);(s, w) ds
o
is measurable fJ6 x$>. Thus for each e,,u [Si(W)" (t- e, t+ e)] as a function of (t, w) is measurable fJ6x$> and consequently by letting en=n-1 and n~oo we see that r={(t, w): lim -12 ,u [Si(W)" (t- e, t+ e)] =1}EfJ6X $>.
6t O
e
Let A(t)={w: x(t, w)=i; (t, w)E£r} and A(w)={t: x(t, w)=i; (t, w)E£r}. Since Si(W) is a Lebesgue measurable set for a. a. w, we have,u [A (w)] =0 for a. a. w by the density theorem in measure theory (see e.g. SAKS [1; p.129]). Hence by FUBINI'S theorem we have P{A (tn =0 for a. a. t. To prove (2) for all t we show that the left member of (2) does not depend on t by using the following lemma which is of general applicability.
Lemma. Let {;lvl , tE T}, '11= 1, 2, be two measurable processes whose sample functions are (almost all) Lebesgue integrable and suppose that the two processes have the same finite-dimensional joint distributions. Let (ai' bi)' 1-;;;;'j-;;;;'k be any intervals. Then the two bf
sets of k random variables J ;(v) (t, w) dt, 1 -;;;;'j;;;' k, have the same joint distribution. 4f
The lemma follows from a known result (see e.g. DooB [4; pp. 64- 5J) according to which there exists a sequence of partitions {s}n)} with a= s&n) < sin) < ... < sl;')= b such that b
J;(") (t, w) dt =
a
lim
n-l
2.:
n-+oo j=O
;(v)
(s}n) , w) (Sj~l- sjn))
for a.a. wand V= 1,2. Let LI (t)= {w: x (t, w)=i}, M (t)= {w: (t, w) Er} so that the probability in (2) reduces to P{M (t) ILl (t)}. As a consequence of the Lemma, the conditional joint distribution of ,u[S;(w)n(t-e, t+e)J for any finite number of values of e, relative to LI (t- Cl), where 0 < e < Cl, is the same for all t> Cl, owing to the stationarity of the transition probabilities. Hence in particular the probability
(3)
P{LI (t - Cl)nLl (t) nM (t)}jP{LI (t - Cl)}= P{LI (t)nM (t) ILl (t - Cl)}
is the same for all t> Cl. Letting Cl t 0 it follows from the stochastic continuity of {Xt} that the ratio in (3) tends to P{LI (t)nM (t)}jP{LI (t))= P{M (t) ILl (t)}. Thus the latter is the same for all t > 0, as was to be shown. 0 Theorem 3 may be stated as follows: if a given t belongs to 5 i (w ), then it is a point of density of S;(w) for a.a. w. A linear set 5 is called metrically dense in itself iff every open interval which contains a point of 5 contains a subset of 5 of positive measure. Theorem 4 (SM). For a.a. w, the set S;(w), i=l=oo, is metrically dense in itself. Proof. Let R be a separability set. For a.a. w, an open interval which contains a point of Si(W) contains also apointofRnS;(w). This follows from the definition of separability if we take the closed set A in (4.10) to be i-{i}. On the other hand, applying Theorem 3 to t=r for every rER we see that for a.a. w, an open interval containing a point of Rn Si (w) contains a subset of 5; (w) of positive measure (indeed on either side of r, if r=l= 0). Theorem 4 follows from these two statements. 0 Theorem 5 (S). For any iEI, s;SO and t>O, we have
(4)
P{x(u, w)==i, s 0, the set 5;(w) contains only a finite number of i-intervals in (0, A). Consider for each n the sequence of numbers
(5)
an
=
{~ 2A (n-1)A} n J n , ... , n .
Since the union of all numbers of
an' n~
1, is denumerable it follows
00
from Theorem 6 that for any tE U an' the set {w: x(t, w)=i} differs n=1
from the set {w: t belongs to an i-interval of x(o, w)} by a null set. We define a sequence of w-functions as follows: .in ) (w)=the smallest number in an that belongs to an i-interval of x (0, w), if such a number exists; otherwise .in ) (w) = A; .~~1 (w) = the smallest number in an that exceeds T~n) (w) and belongs to an i-interval of x(o, w) distinct from the one containing .~n) (w), if such a number exists; otherwise .~~dw)=A. In particular if .~n)(w)=A then all the subsequent ones are equal to A. It is easy to verify that {.in ), s~1} are random variables. Let N be a positive integer. Consider the w-set A~)={w: T~)(W) 00
after the first edition states that every stochastically continuous process has a well-separable and progressively naturally Borel measurable version. This can be used to replace Theorem 1 here and simplify proofs elsewhere. Theorem 4 is a more specific version of a result due to DOOB [1], made possible by the more detailed analysis of stable vs. instantaneous states given in the preceding theorems. The Corollary to Theorem 5 is explicitly proved by LEVY [1] in which some other results are implicit.
§ 8. Optional random variable

We begin by recalling the definition of a conditional expectation relative to a given field. Given the probability triple $(\Omega, \mathcal{F}, P)$, a random variable $\zeta$ with $E(|\zeta|) < \infty$ and an augmented Borel subfield $\mathcal{G}$ of $\mathcal{F}$, any $\omega$-function $\chi(\cdot)$ which is measurable $\mathcal{G}$ and such that

$$\int_M \chi(\omega)\, P(d\omega) = \int_M \zeta(\omega)\, P(d\omega)$$

for every $M \in \mathcal{G}$, is called a version of the conditional expectation of $\zeta$ relative to $\mathcal{G}$, and denoted collectively by $E(\zeta \mid \mathcal{G})$. Thus $\chi'$ is another version of the conditional probability if and only if $\chi'$ is measurable $\mathcal{G}$ and $\chi' = \chi$ with probability one. When $\mathcal{G}$ is the field generated by the random variable $\alpha$ the conditional expectation $E(\zeta \mid \mathcal{G})$ will be denoted by $E(\zeta \mid \alpha)$ and its value at $\omega$ by $E(\zeta \mid \alpha(\omega))$. It is well known (see e.g. DOOB [4; p. 603]) that one version of $E(\zeta \mid \alpha)$ is a Baire function of $\alpha$; when so regarded its value on the set $\{\omega: \alpha(\omega) = s\}$ will be denoted by $E(\zeta \mid \alpha = s)$. When $\zeta$ is the indicator of a set $\Lambda$ in $\mathcal{F}$, the conditional probability of $\Lambda$ relative to $\mathcal{G}$ is denoted by $P(\Lambda \mid \mathcal{G})$ or $P(\Lambda \mid \alpha)$, etc.
Let {XI' tET} be a M.e. and let {~, tET} be a family of Borel fields such that (1) and such that for every i EI and 0 < s < t we have
with probability one. From this it follows more generally that if AE~{Xt, t~s}, then
P{AI~}=P{AI
xs}
with probability one. Observe that if ~=~{xs' O;?;;s;?;;t} then (1) is satisfied and (2) is a consequence of the Markov property. Now let O;?;; r < t and let s t r through rational values in (2), then the middle member converges with probability one to Px"i (t- r) by (6.2) and the argument following it, while the left member becomes P{Xt(w)=il~+o} where ~+o= n~. This being true for every r we conclude_ that (2) remains valid if ~ is replaced by the generally larger field ~=~+O. The n:.w f~mily of Borel fields {it, tET} satisfies also (1) and is such that ~=~+o' Replacing the given family by the new one without changing the notation, we shall suppose in what follows that the family {31i, tET} satisfies (1) and (2) together with the additional condition of right continuity,' Such a family of Borel fields will be called admissible relative to {Xt' {~tO, tET}, where §'to=~{xS' O~s;?;;t}; in this case there is right continuity as it stands by the theorem below. Theorem 1. §'t~O = §'to for each t E T.
tET}. The most important one is
Proof. According to the preceding remarks we have, if
r>t} then (3)
AE~{Xr'
P{AI~O}=P{AI~~o},
with probability one. Now let A be an "elementary" set in ~~ of the form A={w: xt,(w)=i., 1;?;;v;?;;I}. Then we have A=AI nA 2 where AIE~o and A2E~{Xr' r>t}. It follows that if g; denotes the indicator of AI:
P{AI §'n=g;p{A 2 §'n=G;p{A 2 §'t~o}=P{AI~t~o} 1
1
with probability one. The class of sets A satisfying (3) is a monotone class containing the field generated by the elementary sets, hence it contains the least Borel field ~~ so generated. In particular if AE~~o ,
then the right member in (3) reduces to the indicator of $\Lambda$ while the left member is measurable with respect to $\mathcal{F}_t^0$. Hence $\Lambda \in \mathcal{F}_t^0$ since all fields are augmented by assumption, proving the theorem. □

The family $\{\mathcal{F}_t^0, t \in T\}$ will be called the minimal admissible family relative to the M.C. $\{x_t, t \in T\}$. Let $\mathcal{F}_t$ be the smallest Borel field generated by $\mathcal{F}_t^0$ and random variables independent of $\{x_t, t \in T\}$, then $\{\mathcal{F}_t, t \in T\}$ is an admissible family which is larger than the minimal one.

Let $\Delta \in \mathcal{F}$ and $P(\Delta) > 0$. A nonnegative extended-valued random variable $\alpha$ defined on $\Delta$ is said to be optional relative to $\{x_t, \mathcal{F}_t, t \in T\}$, where $\{\mathcal{F}_t, t \in T\}$ is an admissible family, iff for each $t \ge 0$ we have $\{\omega: \alpha(\omega) < t\} \in \mathcal{F}_t$. Because $\mathcal{F}_{t+0} = \mathcal{F}_t$ the last condition is equivalent to

$$\{\omega: \alpha(\omega) \le t\} \in \mathcal{F}_t. \tag{4}$$

According to this definition, $\alpha$ is optional relative to $\{x_t, \mathcal{F}_t\}$, where $\{\mathcal{F}_t\}$ is any admissible family, if it is optional relative to $\{x_t, \mathcal{F}_t^0\}$. In the latter case we shall say that $\alpha$ is optional relative to $\{x_t, t \in T\}$ or simply "optional". On the set $\Omega - \Delta$ the random variable $\alpha$ may be undefined so that the pertinent probability triple for $\alpha$ is the reduced triple $(\Delta, \Delta\mathcal{F}, P(\cdot \mid \Delta))$. If $\alpha$ is optional relative to $\{x_t, \mathcal{F}_t, t \in T\}$ it is easily seen that the collection of sets $\Lambda$ in $\Delta\mathcal{F}$ such that (5) for every $t \ge 0$ forms a Borel field. This will be called the pre-$\alpha$ field (relative to $\{x_t, \mathcal{F}_t, t \in T\}$) and denoted by $\mathcal{F}_\alpha$. Note that when $\alpha$ reduces to a constant $t_0 \ge 0$ (which is an optional random variable relative to any admissible family) the corresponding $\mathcal{F}_\alpha$ reduces indeed to $\mathcal{F}_{t_0}$. In the particular case of the minimal admissible family the pre-$\alpha$ field may be denoted by $\mathcal{F}\{x_t, t \le \alpha\}$.

In the remainder of this section the M.C. $\{x_t, t \in T\}$ is assumed to be separable and measurable. Since we have already assumed that the transition matrix is standard, or equivalently (Theorem 4.3) that the process is stochastically continuous, this implies by a theorem quoted before (see the paragraph preceding Theorem 4.3) that the process is well-separable. In particular, the set of terminating dyadics $\{m 2^{-n}, m, n \ge 0\}$ or the set of rational numbers is a separability set. The first set is especially convenient since as $n$ increases, the partial sets $\{m 2^{-n}, m \ge 0\}$ give rise to partitions of $T$ which are refinements of preceding ones.

Let $\alpha$ with domain of finiteness $\Delta$ be optional relative to $\{x_t, \mathcal{F}_t, t \in T\}$. For each $i \in I$ we define an $\omega$-function $\gamma_i$ as follows:

$$\gamma_i(\omega) = \inf\{t: t > \alpha(\omega);\ x(t, \omega) = i\}. \tag{6}$$
II Continuous parameter
In this and subsequent similar definitions it is understood that the infimum is $\infty$ if no $t$ exists satisfying the given condition. Since for a.a. $\omega$, $x(\cdot,\omega)$ is separable in $T$ by Theorem 4.1, we infer $\gamma_i(\omega)$ is a limit point of $R\cap S_i(\omega)$ from the right, where $R$ is a separability set. The domain of definition and finiteness of $\gamma_i$ is $\Gamma_i=\Delta\cap\{\omega:\ S_i(\omega)\cap(\alpha(\omega),\infty)\ne\emptyset\}$.
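The definition (6) can be made concrete on a sample function that is a step function. The following is only a minimal sketch in Python, assuming a path with finitely many jumps that is constant on half-open intervals closed on the left; the function name `first_entrance_after` and the list representation of the path are our own illustrative choices and do not come from the text.

```python
import math

def first_entrance_after(jump_times, states, alpha, target):
    """Return inf{t : t > alpha, x(t) = target} for a step path.

    Assumed representation (hypothetical): the path equals states[k] on
    [jump_times[k], jump_times[k + 1]) and states[-1] from jump_times[-1]
    onward; jump_times is increasing and starts at 0.
    """
    for k, state in enumerate(states):
        if state != target:
            continue
        start = jump_times[k]
        end = jump_times[k + 1] if k + 1 < len(jump_times) else math.inf
        if end > alpha:
            # This target interval contains times t > alpha, and such times
            # come arbitrarily close to max(start, alpha).
            return max(start, alpha)
    # Convention of the text: the infimum is infinity if no such t exists.
    return math.inf
```

Note that when the path is already in the target state just after $\alpha$, the infimum equals $\alpha$ itself although it is taken over $t>\alpha$.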
We have, if {s.} is a separability set, {w: Yi(w) 0,
$x(t+t',\omega)=k\}=\sum_j P\{\Lambda;\ \alpha(\omega)\le s;\ x(t,\omega)=j\}\,p_{jk}(t').$
This being true for every $s$, it follows from the defining property of a conditional probability that
$$P\{x(t+t',\omega)=k\mid\Lambda;\ \alpha\}=\sum_j P\{x(t,\omega)=j\mid\Lambda;\ \alpha\}\,p_{jk}(t')$$
with probability one; that is, by Theorem 3,
(14) $r_k(s,t+t'\mid\Lambda)=\sum_j r_j(s,t\mid\Lambda)\,p_{jk}(t')$
for $t,t'\ge0$, a.a. $s\in[0,t]$. Both members of (14) being Baire functions of $(s,t,t')$ by Theorem 3, we can apply FUBINI's theorem and infer that (14) is also true for a.a. $s$, a.a. $(t,t')\in[s,\infty)\times[0,\infty)$. From this it follows, by the right continuity given in Theorem 3 and FATOU's lemma:
(15) $r_k(s,t+t'\mid\Lambda)\ge\sum_j r_j(s,t\mid\Lambda)\,p_{jk}(t')$, a.a. $s$, $t\ge s$, $t'\ge0$.
Summing over $k$, we obtain
(16) $\sum_k r_k(s,t+t'\mid\Lambda)\ge\sum_j r_j(s,t\mid\Lambda)$.
Next, we have from Theorem 3,
(17) $\sum_j r_j(s,t\mid\Lambda)=P\{\Lambda\mid\Lambda;\ \alpha=s\}=1$
for $t>0$, a.a. $s\in[0,t]$. Hence by FUBINI's theorem (17) is also true for a.a. $s$, a.a. $t\ge s$. In the same way as (16) follows from (14), we have from (17),
(18) $\sum_j r_j(s,t\mid\Lambda)\le1$, a.a. $s$, $t\ge s$.
Consider a fixed $s\notin E$ where $E$ is the exceptional set of $A$-measure zero occurring in (15) and (18). For such an $s$ if (17) is true for a certain value of $t$ then it is true for all larger values of $t$ by (16) and (18). Hence (17) is in fact true for all $t>s$, since it is true for a.a. $t\ge s$. This proves
(13). Hence there must be equality in (16) and so also in (15). This proves (12). The continuity of $r_j(s,\cdot\mid\Lambda)$ in $(s,\infty)$, for a.a. $s$, now follows from (12), (13) and the continuity of $p_{jk}$. Since for each $s$ it is right continuous at $s$ by Theorem 3, the continuity may be extended to $[s,\infty)$. □

Notes. An optional random variable is called a "stopping time" in BLUMENTHAL [1] and a "random variable independent of the future" in DYNKIN and YUSKEVIC [1]. In the latter as well as in CHUNG [6], however, the more exigent condition (4) is imposed. The relaxation to the present condition with strict inequality, which is more easily verified in applications, is made possible by Theorem 1 due to DOOB and his observation (both communicated in a letter) that the right continuity of the field family may be assumed. These definitions extend at once to a Markov process with general state space; see the first two papers cited above, and also YUSKEVIC [1]. For a general discussion of questions concerning optionality see CHUNG and DOOB [1].

Instead of defining a M.C. $\{x_t\}$ first and then an admissible family $\{\mathcal{F}_t\}$, we may define $\{x_t\}$ to be a M.C. relative to $\{\mathcal{F}_t\}$ iff the double family $\{x_t,\mathcal{F}_t\}$ satisfies (1) and (2) with a given transition matrix $(p_{ij})$. The larger the fields $\mathcal{F}_t$ are the more stringent this definition becomes, but at the same time the less stringent the definition of optionality relative to $\{x_t,\mathcal{F}_t\}$. This kind of generality is needed, though not in this monograph, when several processes are considered simultaneously.

The proof of Theorem 1 is new and much simpler than the one given in the first edition. Theorems 2 to 4 are taken from CHUNG [6]; Theorems 3 and 4 were first obtained in the special case treated in § 15.
§ 9. Strong Markov property

To begin with, we assume that the M.C. $\{x_t,\ t\in T\}$ is Borel measurable. Using the notation of § 8, we may define a family of random variables $\{\xi_t,\ t\in T\}$ on the triple $(\Delta,\ \Delta\mathcal{F},\ P(\cdot\mid\Delta))$ as follows:
(1) $\xi_t(\omega)=\xi(t,\omega)=x(\alpha(\omega)+t,\omega)$.
For each $t$, $\xi_t(\cdot)$ is measurable $\mathcal{F}$ since $x(\cdot,\cdot)$ is measurable $\mathcal{B}\times\mathcal{F}$. In other words each $\xi_t$ is a random variable, though not necessarily finite-valued. It will be proved later that except for $\xi_0$ which may be $\infty$ a.e. on $\Delta$, each $\xi_t$ is actually finite a.e. on $\Delta$. The stochastic process $\{\xi_t,\ t\in T\}$, taken in this general sense, is called the post-$\alpha$ process relative to $\{x_t,\mathcal{F}_t,\ t\in T\}$. The Borel field $\mathcal{F}\{\xi_t,\ t\in T\}$ will be called the post-$\alpha$ field and denoted by $\mathcal{F}'_\alpha$. We remark that (1) defines a process if the $x$-process is an arbitrary Borel measurable process and $\alpha\ge0$; on the
other hand, even if the $x$-process is not Borel measurable it may happen that (1) still defines a process (see e.g. CHUNG [5]).

For intuitive background we refer to § I.13 where the discrete parameter analogue is discussed. Indeed, even in the continuous parameter case, if the optional random variable $\alpha$ is assumed to be denumerably-valued the results of § I.13 can be immediately extended. We leave this as an exercise since the following discussion covers the general situation.

Using the notation in § 8, we put for any $\Lambda\in\mathcal{F}_\alpha$:
(2) $r_j(\Lambda;t)=\int_0^\infty r_j(s,s+t\mid\Lambda)\,A(\Lambda;ds),\quad j\in I,\ t\ge0.$
It follows from Theorem 8.4 that for any $j,k\in I$ and $t,t'>0$, we have
(3) $r_j(\Lambda;t)\ge0;\quad\sum_j r_j(\Lambda;t)=P(\Lambda);\quad\sum_j r_j(\Lambda;t)\,p_{jk}(t')=r_k(\Lambda;t+t').$
Moreover, $r_j(\Lambda;t)$ is continuous on $T$; this can also be proved by applying Theorem 2.3 to the last relation in (3). In particular, $\lim_{t\downarrow0}r_j(\Lambda;t)=r_j(\Lambda;0)$. Hence by FATOU's lemma, the two equations in (3) become inequalities with "$\le$" when $t=0$. The probabilistic meaning of $r_j(\Lambda;0)$ will be given first.
Theorem 1. For the $x_+$ version, we have if $\Lambda\in\mathcal{F}_\alpha$,
(4) $P\{\Lambda;\ \xi_0(\omega)=j\}=r_j(\Lambda;0),\quad j\in I;\qquad P\{\Lambda;\ \xi_0(\omega)=\infty\}=P(\Lambda)-\sum_j r_j(\Lambda;0).$
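Proof. Suppose that $\alpha(\omega)=\gamma_j(\omega)$ for a.a. $\omega$ in $\Delta$, where $\gamma_j$ is defined in (8.6). Then by Theorem 6.1, the only finite limiting value of $x(t,\omega)$ as $t\downarrow\alpha(\omega)$ is $j$ and $\xi(0,\omega)=j$ for a.a. $\omega$ in $\Delta$ by the choice of the $x_+$ version. Conversely if $\xi(0,\omega)=j$ then $\alpha(\omega)=\gamma_j(\omega)$ for a.a. $\omega$ in $\Delta$ by the definition of $\gamma_j$. We have thus $\{\omega:\ \xi(0,\omega)=j\}=\{\omega:\ \alpha(\omega)=\gamma_j(\omega)\}$. Consequently we obtain, using (8.7),
$$P\{\Lambda;\ \xi(0,\omega)=j\}=P\{\Lambda;\ \alpha(\omega)=\gamma_j(\omega)\}=\lim_{n\to\infty}\sum_{m\ge0}\int_{[\frac{m}{n},\frac{m+1}{n})}C_j\Big(s,\frac{m+1}{n}\,\Big|\,\Lambda\Big)A(\Lambda;ds)$$
$$=\lim_{n\to\infty}\int_0^\infty C_j\Big(s,\frac{[ns]+1}{n}\,\Big|\,\Lambda\Big)A(\Lambda;ds)=\int_0^\infty C_j(s,s\mid\Lambda)\,A(\Lambda;ds).$$
Since $C_j(s,s\mid\Lambda)=r_j(s,s\mid\Lambda)$ by (8.10) the first equation in (4) is proved; the second then follows since $\xi(0,\omega)=\infty$ if and only if $\xi(0,\omega)\notin I$. □

Theorem 1 is immediately generalized as follows.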
Theorem 2. For the x+ version, we haveifAE~; 0;;;::;tO 0 let us put 1]t(w)=~t.+t(w)= X(IX(W)+tO+t, w);
then $\{\eta_t,\ t\in T\}$ is the post-$(\alpha+t_0)$ process and $\mathcal{F}\{\eta_t,\ t\ge0\}=\mathcal{F}'_{\alpha+t_0}$. Since $\eta_0=\xi_{t_0}$ is finite a.e. on $\Delta$ by Corollary 2 to Theorem 2, we may apply (5) to the $\eta$-process; thus if $\Lambda\in\mathcal{F}_{\alpha+t_0}$ and $M\in\mathcal{F}'_{\alpha+t_0}$ then
$$P\{\Lambda M\mid\eta_0\}=P\{\Lambda\mid\eta_0\}\,P\{M\mid\eta_0\}$$
a.e. on $\Delta$. This proves (12) in full. Dividing (12) through by $P(\Lambda\mid\xi_{t_0})$ provided the latter is not zero, we obtain
$$P\{M\mid\Lambda;\ \xi_{t_0}\}=P\{M\mid\xi_{t_0}\}.$$
The truth of this for every $t_0\in T$, a.e. in $\Delta$, and Corollary 1 to Theorem 2, imply the admissibility of $\{\mathcal{F}_{\alpha+t},\ t\in T\}$ relative to $\{\xi_t,\ t\in T\}$. The last assertion in Theorem 3 is implicit in the discussion above. □

It is easy to see that $\alpha$ is measurable $\mathcal{F}_\alpha$ so that for any $s$ the set $\{\omega:\ \alpha(\omega)\le s\}$ belongs to $\mathcal{F}_\alpha$ and may be absorbed in the set $\Lambda$ in (5). However, it is advantageous to make this more explicit.
Corollary 1. We have if AE~, (13)
l
o~to 0,15' > O}, or alternatively by {C; (15),15 > O}, defined below:
$$B_i(\delta,\delta')=\{k\in I:\ p_{ki}(\delta)>1-\delta'\};\qquad C_i(\delta)=\{k\in I:\ \sup_{0\le t\le\delta}p_{ki}(t)>1-\delta\}.$$
Two distinct states $i$ and $j$ have two disjoint fine neighborhoods; namely, $I$ endowed with the fine topology is a Hausdorff space. Furthermore, almost every sample function $x_+(\cdot,\omega)$ [$x_-(\cdot,\omega)$] is right [left] continuous in the fine topology on the set $\{t:\ x(t,\omega)\in I\}$.
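The two families of neighborhoods can be examined on a chain whose transition matrix is known in closed form. The following is a minimal sketch in Python, assuming a two-state chain with hypothetical rates `A` (from 0 to 1) and `B` (from 1 to 0); the helper names are ours, and the supremum and infimum are approximated on a finite grid rather than computed exactly. It checks the comparison $p_{ki}(\delta)\le\sup_{0\le t\le\delta}p_{ki}(t)\le p_{ki}(\delta)[\inf_{0\le t\le\delta}p_{ii}(t)]^{-1}$ and the disjointness of $B_0(\delta,\tfrac12)$ and $B_1(\delta,\tfrac12)$.

```python
import math

A, B = 1.0, 2.0  # hypothetical rates: 0 -> 1 at rate A, 1 -> 0 at rate B

def p(k, i, t):
    """Transition probability p_{ki}(t) of the two-state chain (closed form)."""
    total = A + B
    stationary = (B / total, A / total)   # limiting probabilities of states 0 and 1
    decay = math.exp(-total * t)
    if k == i:
        return stationary[i] + (1.0 - stationary[i]) * decay
    return stationary[i] * (1.0 - decay)

def sup_p(k, i, delta, steps=1000):
    """Grid approximation of the supremum of p_{ki}(t) over 0 <= t <= delta."""
    return max(p(k, i, delta * m / steps) for m in range(steps + 1))

def inf_p_diag(i, delta, steps=1000):
    """Grid approximation of the infimum of p_{ii}(t) over 0 <= t <= delta."""
    return min(p(i, i, delta * m / steps) for m in range(steps + 1))

def in_B(k, i, delta, delta_prime):
    """Membership of k in B_i(delta, delta')."""
    return p(k, i, delta) > 1.0 - delta_prime

def in_C(k, i, delta):
    """Membership of k in C_i(delta), with the supremum taken on the grid."""
    return sup_p(k, i, delta) > 1.0 - delta
```

In this finite example every state is stable and each fine neighborhood of $i$ with small parameters reduces to $\{i\}$; the sketch is meant only to make the definitions tangible.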
II.11 Taboo probability functions
Proof. The equivalence of the two bases follows from the inequalities:
$$p_{ki}(\delta)\le\sup_{0\le t\le\delta}p_{ki}(t)\le p_{ki}(\delta)\Big[\inf_{0\le t\le\delta}p_{ii}(t)\Big]^{-1}.$$
The Hausdorff separation property is demonstrated by the obvious relation $B_i(\delta,\tfrac12)\cap B_j(\delta,\tfrac12)=\emptyset$, since $p_{ki}(\delta)+p_{kj}(\delta)\le1$ for each $k$ and $\delta>0$. It remains to prove that the sets $C_i(\delta)$ form a base of fine neighborhoods for $i$ and to prove the last assertion of the theorem for $x_+$ (that for $x_-$ being entirely similar). Both will be accomplished if we prove that given any $i$ and $C_i(\delta_0)$, for a.e. $\omega$ the sample function $x(\cdot,\omega)$ has the following property: if $x(t,\omega)=i$, then there exists $h(\omega)>0$ such that
(10) $x(s,\omega)\in C_i(\delta_0)\cup\{\infty\}$ for $s\in(t,t+h(\omega))$.
This is indeed trivial if i is stable, by Theorem 5.7. In general we consider the martingale {1']t, O-;i;, t < A} introduced in the proof of Theorem 6.1. It follows from the discussion there that for a. e. w, and every generic t, we have lim Px(r w) ,.(A-r)=1']A(t,w) rER,1',£,t
"
where R is a separability set for {Xt}. If x+(t, w)=i, then as r.j.t there are infinitely many values of r for which x (r, w) = i, and consequently by the continuity of Pii the limit on the left side above must be Pii(A-t). Thus we have in particular for i=i (11)
lim Px(r
rER,rp
"
w)
;(A-r)=p;i(A-t).
Fix a 6 such that 61-60 , For tin S;(w)n O~t~d
[(m-1) 6, m6] where m is a positive integer, apply (11) with A=m6. It follows that for a. e. w we have lim [sup PX(rW)i(U)]~ inf Pti(t).
rER,?~t O;;;;u;;;;d
Hence for a. e. w there exists h (w)
"
O;;i;t;;;;d
> 0 such that
x(r, W)EC i (6o) for rERn(t, t+h(w)) for every t in the said range. This relation implies the desired relation (10) by separability and by varying m. 0 Let us note that the unilateral continuity asserted in Theorem 4 amounts to a strengthening of Theorem 3 from a "fixed t" version to a "generic t" version. Needless to add that a bilateral strengthening would be out of question, e.g., at a t in St(w) n Si(w) where i is instantaneous and i is stable.
Corollary. If i is instantaneous and H is adjacent to i, then there exists an infinite sequence {iv} from H such that for every i EI and t> 0 we have limAv1.(t)=Pdt). lI---?OO 1 Proof. Since H is adjacent to i it intersects every fine neighborhood of i and so in particular every B;(I5, 15'). Taking two sequences I5n~O and 15:~0 it follows that there exists a sequence {iv} from H such that Now we write for I5 n < t:
and the corollary follows at once from the continuity of Pii' 0 Turning to the behavior of HPii(t) as t--HX), we prove the following analogue of Theorem I.9.3. We write as in the discrete parameter case i r-.7o. H iff there is a state k EH such that i r-.7o. k. Theorem 5. If i r-.7o.H then 00
f HPii(t) dt0, U>O. For t>o and from (4),
n~O
we have
BPii (nu+t) Pik (u) ~ e--qj(nu+t) +P{e; (w) ~ nu+t; x(s, w)t£H, e;(w) oo
That this limit does not depend on the choice of R is a consequence; in particular since {x t } may be assumed to be well-separable we infer the purely analytical statement that the limit is the same for any sequence R dense in T. 0 The following theorem treats two similar but distinct aspects of taboo probabilities for a post-Q( process. Theorem 7. Suppose that the x+ version is used and that We have (16)
P{~(u, W}ffH, 0 < u0:
L CfJij((j; s) iPik(t)= CfJij((j; s+t) i
and consequently by integration: (34)
LcPij((j; S);P;k(t) = cPik((j; s+t) - cPik((j; t). j
From (20) we have if O-O. Theorem 6. If i=}j and i, j and k belong to one recurrent class then (22) Observe that the results in this section were derived from the first entrance decomposition of § 11 without using the last exit decomposition of § 12. As a matter of fact, they were obtained before the results of § 12, which are more difficult, were proved. The latter may be used to yield quicker proofs as well as new formulas. For instance, we deduce from Theorem 12.13 or just (12.24) that P;i (t) · 11m -p. .-(t-)-
t-+oo
$10
=
Gi i ( (0)
as the dual to (5); and therefore lim _lhQL I-H)O
Pil(t)
=
Fii(oo)
Gii(oo)
as a complement to Theorem 4, the finiteness of Gi i ( (0) in this case being a consequence.
In what sense and how well do the discrete skeletons Cl:s approximate the M.C. Cl: as s.j.O? We have already seen on several occasions, notably in Theorems 10.2 and 10.4, that results about a d. p. M. C. can be used to obtain the corresponding ones about a c. p. M. C. by suitable approximations. Further illustrations of this method will now be given. Consider the discrete skeletons Cl:s : { Xn s , n ~ O} of Cl:: {X t , t ~ 0}. Quantities defined for a skeleton will be denoted by the symbols used for a d. p. M. C. in Part I (see in particular § I.9) followed by a parenthesis indicating the scale of the skeleton; e.g. HP~i) (s) = P{x. s (w}EEH, 1 ~v < n; xns (w}=jlxo(w)= i}
HP~ (s) =
00
L HPW (s);
n~l
Hili) (s) =i. HPW (s);
00
mii(s}=
L
nM(s}.
n~l
It is clear that HPii(t} is a good analogue of HPW only if i is instantane-
ous while we need other definitions if i is stable. Note also that although pli)(s}=Pii(ns}, HPW(S} is not equal to HPii(ns} in general. We shall suppose iEEH below so that the definitions of the latter given in § 11 and § 12 agree. Let us take the X+ version in what follows and define on LI;= {w: xo(w}=i} for each s>O,
els ) (w}=max{ns:
n~O; x(vs, w}=i, O~v~n}.
Furthermore we define for each s > 0,
(Xls! (w) =min{ns: n~ 1; ns > e;s) (w); x (ns, w) =j},
tWI (w)=min{ns: n~ 1; nS~(Xii(w); x(ns, w)=j}, if wELlii=LI;n{w: (Xii(W) O. That is, given any 8>0, there exists a t=t(8) such that s
2:
n>ts- 1
HP}j) (s)
O. Since Pik is continuous, there is a b > 0 and a t1 > to such that Pi k (t) ~ b if to"'£ t"'£ t1 . For each sufficiently small s there
II.13 Ratio limit theorems; discrete approximations
exists an integer ms such that to-;;;'mss-;;;'ti' We have
Sn~_.HP}1)(S)-;;;' Pik(~SS) n~_.P{x(vs,w)=I=k, 1-;;;'v-;;;'n; x((n+ms)s, w}=kl LI;}. The last sum may be written, by putting n= Ams+r, as ms-l
L
r~O
L
A> (ts-'-r)
m,'
P{x(vs,w)=I=k,1-;;;'v-;;;'Ams+r; X((Ams+r+ms)s, W}=kILl;}
-;;;, msP{x (vs, W) =1= k, 1-;;;'v-;;;' [tS-I] + 11 LI;} -;;;,msP{aW (w) >tILl;}. It follows from Theorem 8 that
()
00
-1' '" 1msL.J s~O
n>ts-'
(n)( s ) Tn (W); X {t, w)=j},
n ~ 2;
Tn{W)= inf{t: t> O'n (w); x{t, w)=i},
n ~ 1-
These are all random variables and except for 1'1 which may equal zero with positive probability, they all have continuous distributions by Theorem 6.3. It is also easy to see that they are all optional relative to {Xt' tEl}; d. Theorem 15.1 below. For a.a. w, we have lim O'n{w)= lim Tn{W)= 00.
(1 )
n-700
~oo
For suppose that the limit is Coo (w) < 00 for wEA. Then if wEA, as ttCoo{w), x{t, w) has the two finite limiting values i and j. Hence P{A)=O by Theorem 6.1-
Theorem 1. If CU,;) (It/) < 00, then the integrals Tn+'(W) T,(w) A (2) f y{t,w)dt, f y{t,w)dt, fy{t,w)dt Tn(W) 0 0 exist and are finite for a. a. w, for every n~ 1 and every constant A Moreover, as functions of w they are finite random variables.
Proof. We have
1
Tnl(W)ly {t, w)1 dt = ~(~
~ If{k)1 {,u [Sk(W) " (Tn(W) , O'n(W)] +
kEL
,u [Sk(W) " (O'n(w) , Tn+l (w))]).
Applying (11.17) to the post-Tn process, we have
P{X{Tn(W) +u, w) +j, 0 < U < t; X (Tn(W) +t, w)= k}= iP;k (t). It follows by FUBINI's theorem that 00
E {,u [Sk(W) " (Tn(W), O'n(w))]} = f iPik (t) dt= iP"k (oo); o
similarly
00
E{,u[Sk(W)"(O'n(w), Tn+l(W))]}= f iPik{t) dt=iPik{OO). Substituting into (3), we obtain
o
Tn+1(W) } E { f ly{t,w)ldt = ~ If{k)le~,i)=C(i,i)(It/)s}= {w: ,u [Si(w)n (0, t)] > s} which clearly belongs to ff~; hence Yi(S)=Yi(S,.) is optional relative to {Xt, tET} for each s~O. Using the notation of § 11 we define on Ll i :
HCii(t, w)= {
x(t, w)=f and x(s, w)rtH for min(t, ei(W)) ex(w), x(t, w)=!=i}; It follows that
e (w)=,B (w)- ex(w).
$$\rho(\omega)=\inf\{t:\ t>0;\ x(\alpha(\omega)+t,\omega)\ne i\}=\inf\{t:\ t>0;\ \xi(t,\omega)\ne i\}$$
and consequently for each $s>0$,
(1) $\Delta\cap\{\omega:\ \rho(\omega)>s\}=\Delta\cap\{\omega:\ \xi(t,\omega)=i,\ 0\le t\le s\}$.
By Corollary 3 to Theorem 9.3, $\{\xi_t\}$ is separable so that the probability of the right member above is defined. We infer from (1) that $\rho$ is optional relative to $\{\xi_t,\mathcal{F}_{\alpha+t},\ t\in T\}$. From this we infer that $\beta$ is optional relative to $\{x_t,\mathcal{F}_t,\ t\in T\}$ by the following general result.

Theorem 1. If $\alpha$ is optional relative to $\{x_t,\mathcal{F}_t,\ t\in T\}$ and $\rho$ is optional relative to $\{\xi_t,\mathcal{F}_{\alpha+t},\ t\in T\}$ where $\{\xi_t,\ t\in T\}$ is the post-$\alpha$ process; then $\alpha+\rho$ is optional relative to $\{x_t,\mathcal{F}_t,\ t\in T\}$.

Proof. Let $\mathcal{G}_s=\mathcal{F}\{\xi_t,\ 0\le t\le s\}$. We have (2)
{w: ex(w)+e(w) 0 for some t> 0, then Fi i (t) > 0 for an arbitrarily small t> O. All statements below concerning w will be valid for a. a. w. Since ,u [500 (w)]= 0 by Theorem 5.1 we have (7)
T1
(i; w)= L:,u [5k (w)" {O, kEI
T1
(i, w))].
Let so> 0 be given; by hypothesis there exists an s, 0 < S < so' and an w-set Al with P{A1) > 0 such that if wEAl then s < T1 (f, w) < 00. The series in (7) therefore converges on AI; it follows that there exists a finite set K with N elements and A 2 (A 1 with P{A 2»0 such that if wEA 2 , (This as well as the next statement is a simple case of EGORov'S theorem.) Since all states are stable each 5k (w) is the union of intervals of lengths An (k; w), n;;S; 1, it follows again that there exists an N' < 00 and A3 (A 2 with P{A3) > 0 such that if wEA 3, there exist {kv, n., 1 ~v~m} with k.EK and 1 ~n.~N' such that (a)
and (b)
T1{f;W)-
~ < v=l fAn.{k.;w)~T1{i;w).
Consider all possible ordering of {Tn{k), kEK, 1~n~N'} subject to the restriction that Tn (k) < Tn' (k) if n < n'; and consider all the initial sections of these ordered sequences, a typical one of which may be denoted by q.>= {Tn, (k1) < ' .. < Tn". (km)}· Let A 3CP denote the subset of Aa for which (a) and (b) hold. Since the number of q.>'s is finite there exists at least one q.> for which P (Aacp) > o. Without complicating the notation let
II.15 Post-exit process
this be the cp indicated above. For coEA 3tp we put
15/ (co) = 'Tn / H (kz+1; co) - 'T~/(kz; co), 15m (co) =
'T1 (j;
1 ~ l~ m- 1 ;
co) - 'Tnm (km ; co).
Furthermore we put
A4= {co: 'Tn, (kl; co) < ... 0.
p=1
This concludes the proof since $s_0$ is arbitrary. □

From now on in this section we shall deal with a fixed post-exit process. There is then no loss of generality if we assume $\alpha\equiv0$ in the above so that $P\{x_0(\omega)=i\}=1$. We have thus
$$\rho_i(\omega)=\inf\{t:\ t>0;\ x(t,\omega)\ne i\};\qquad\eta(t,\omega)=x(\rho_i(\omega)+t,\omega)$$
defined a.e. on $\Delta=\Omega$.

Theorem 4. For each $j\in I$ there is a function $r_{ij}$ continuous in $[0,\infty)$ such that
(8)
for each $t\ge0$ and a.a. (Lebesgue measure) $s\in[0,t]$. Moreover we have
(9)

Proof. It follows from Theorem 2 that the random variable $\rho_i$ and the Borel field $\mathcal{F}\{\eta_t,\ t\in T\}$ are independent. Hence Corollary to Theorem 9.4 is applicable and the theorem follows. Note that here the $A(\Delta;\cdot)$ measure is that determined by the distribution function $1-e^{-q_it}$. □

The analytical content of (8) is recapitulated in the next theorem, which also serves to identify $r_{ij}$ analytically.

Theorem 5. Let $i$ be stable and $j\in I$. We have for every $t\ge0$,
(10) $p_{ij}(t)=\delta_{ij}e^{-q_it}+\int_0^t q_ie^{-q_i(t-s)}\,r_{ij}(s)\,ds,$
where $r_{ij}$ is continuous in $[0,\infty)$. Consequently $p'_{ij}(t)$ exists in $[0,\infty)$ and
(11) $p'_{ij}(t)=q_i\,[r_{ij}(t)-p_{ij}(t)].$
Proof. We may suppose that $q_i>0$; otherwise the assertions are trivial. In view of (8), the equation (10) is simply the analytic expression of the relation
$$p_{ij}(t)=P\{\rho_i(\omega)>t;\ x(t,\omega)=j\}+P\{\rho_i(\omega)\le t;\ x(t,\omega)=j\}$$
$$=P\{\rho_i(\omega)>t;\ x(t,\omega)=j\}+\int_{\{\omega:\ \rho_i(\omega)\le t\}}P\{x(t,\omega)=j\mid\rho_i\}\,P(d\omega).$$
Equation (11) follows from (10) upon differentiation since $r_{ij}$ is continuous. □

We have thus a new and probabilistic proof of the main assertions of Theorem 3.1, namely the existence of the continuous derivative $p'_{ij}$. On the other hand, the latter can be used to prove the probabilistic statement (8) via (10). Similarly (3.3) and (3.5) have their counterparts in the following equations:
(12) $\sum_j r_{ij}(t)=1$;
(13) $r_{ij}(s+t)=\sum_k r_{ik}(s)\,p_{kj}(t)$.
These are particular cases of (8.12) and (8.13) and express the fundamental Markov property of the post-exit process. Theorem 2.6 is also given a new meaning as follows.
Theorem 6. We have for every $j\in I$,
(14) $P\{\eta(0,\omega)=j\}=r_{ij}(0)=(1-\delta_{ij})\,\dfrac{q_{ij}}{q_i}$
and
$$P\{\eta(0,\omega)=\infty\}=1-\sum_j r_{ij}(0).$$

Proof. This is the particular case of Theorem 9.1. The evaluation of $r_{ij}(0)$ follows from (11) and Theorems 2.4 and 2.5. □

Theorem 6 is illustrative of the general sample function behavior described in Theorem 7.4. According as $j$ is stable or instantaneous, as $s\downarrow\rho_i(\omega)$, $r_{ij}(0)$ is the probability of case (a) or (b) (read $j$ for the $i$ there), and $1-\sum_j r_{ij}(0)$ is the probability of case (c) of that theorem. Clearly $r_{ii}(0)=0$.

We proceed to derive some further relations. Let $\gamma_{ij}=\alpha_{ij}-\rho_i$; then by Theorem 2 the two random variables $\rho_i$ and $\gamma_{ij}$ are independent. We have $\gamma_{ij}(\omega)=\inf\{t:\ t\ge0;\ \eta(t,\omega)=j\}$. Let $R_{ij}$ be the distribution function of $\gamma_{ij}$:
$$R_{ij}(t)=P\{\gamma_{ij}(\omega)\le t\}.$$
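The relations between $p_{ij}$ and $r_{ij}$ for a stable state $i$ can be checked numerically in the simplest case. The following is a minimal sketch in Python, assuming a two-state chain with hypothetical rates `A` and `B`, and assuming that $r_{0j}(t)=p_{1j}(t)$ because every exit from state 0 lands in state 1; these choices and all function names are ours. It compares the right member of (10) with $p_{0j}(t)$, and recovers $r_{0j}$ from the relation recorded in the text as (15.11 bis), $r_{ij}(t)=q_i^{-1}p'_{ij}(t)+p_{ij}(t)$, using a central difference for the derivative.

```python
import math

A, B = 1.0, 2.0  # hypothetical rates: q_0 = q_01 = A and q_1 = q_10 = B

def p(k, j, t):
    """Transition probability p_{kj}(t) of the two-state chain (closed form)."""
    total = A + B
    stationary = (B / total, A / total)
    decay = math.exp(-total * t)
    if k == j:
        return stationary[j] + (1.0 - stationary[j]) * decay
    return stationary[j] * (1.0 - decay)

def simpson(f, a, b, n=200):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for m in range(1, n):
        total += (4 if m % 2 else 2) * f(a + m * h)
    return total * h / 3.0

def r(j, t):
    """r_{0j}(t), assumed equal to p_{1j}(t) for this two-state example."""
    return p(1, j, t)

def rhs_of_10(j, t):
    """Right member of (10) for i = 0."""
    delta = 1.0 if j == 0 else 0.0
    integral = simpson(lambda s: A * math.exp(-A * (t - s)) * r(j, s), 0.0, t)
    return delta * math.exp(-A * t) + integral

def r_from_derivative(j, t, h=1e-5):
    """r_{0j}(t) computed as p'_{0j}(t) / q_0 + p_{0j}(t)."""
    slope = (p(0, j, t + h) - p(0, j, t - h)) / (2.0 * h)
    return slope / A + p(0, j, t)
```

At $t=0$ the assumed $r_{0j}$ gives $r_{00}(0)=0$ and $r_{01}(0)=1=q_{01}/q_0$, in agreement with the value $(1-\delta_{ij})q_{ij}/q_i$.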
Theorem 7. $R_{ij}$ is continuous in $[0,\infty)$ and
(15) $F_{ij}(t)=\int_0^t[1-e^{-q_i(t-s)}]\,dR_{ij}(s)$;
(16) $r_{ij}(t)=\int_0^t p_{jj}(t-s)\,dR_{ij}(s)$.
Furthermore $F'_{ij}$ exists and is continuous in $[0,\infty)$ and
(17) $F'_{ij}(t)=q_i\,[R_{ij}(t)-F_{ij}(t)]$.
$F''_{ij}$ and $R'_{ij}$ exist a.e.; if $j$ is stable then $p''_{ij}$ and $r'_{ij}$ exist a.e.

Proof. The continuity of $R_{ij}$ follows from the definition of $\gamma_{ij}$ and Theorem 6.3; (15) follows from the independence of $\rho_i$ and $\gamma_{ij}$. Formula (16) is the analogue of formula (13.3) for the $\eta$-process. Integrating by parts in (15) we obtain
(18) $F_{ij}(t)=\int_0^t q_ie^{-q_i(t-s)}\,R_{ij}(s)\,ds.$
Hence Fii has a continuous derivative given by (17). Since the right member of (17) is a function of bounded variation, F/; exists a.e. and so does R;i by (17). By (12.21), ferentiate (13.3) to obtain
P;i (t)
t
J Ipii(s)lds00 exists and is finite. It now follows from (8) and Theorem 10.1 that lim P;i (t) exists; this limit must be zero since Pii(t);;;;, 1. 0 1->00
The idea behind the basic formula (3) is of course that" x(t, w)=j" amounts to "t belonging to a j-interval of x(o, w)". One may inquire further about the probability that this j-interval be preceded by a k-interval where k is a stable state; in other words, the probability that the last discontinuity of the sample function before t be a jump from k to j. To find this we first note the following analogue of (2): 00
(13)
L. P{'t'~ (j; w);;;;' tl Ll i} =
n~l
[Uii*eqj] (t) = W;i (t)
II.16 Imbedded renewal process
where from now on we use the notation (14) Formula (13) follows from (2) and Theorem 15.2. Upon differentiation we obtain t
H{j (t) = qi J e-qj(t-s) d Uii (S) = Pii (t) qi'
(15)
o
The desired probability can now be evaluated as follows, using Theorem 9.5 and (11.17) by taking the IX there to be 'Z'~(k) restricted to the set {co: x('Z'~(co), co) =i}, whose probability is given by Theorem 15.6:
L P{'Z'~(k; co)~t< 'Z'~+dk; co); x(u, co) =i, 'Z'~(k; co)~u ~tl Ll;} n=l 00
00
Lf
=
t
P{x(u, co)=i,
n=lo t
'Z'~(k; co)~u~tl 'Z'~(k)=s}ds P{'Z'~(k; co)~s} t
= f~k: e-qj(t-s) dH{k(S)
fhk(S) e-qj(t-s) ds.
=
o
0
Heavy machinery has been used in the above calculation which is given here only as an illustration of the method. It turns out that there is a more delicate proof which extends the result to an instantaneous $k$. Let us say that the sample function $x(\cdot,\omega)$ has a pseudo-jump at $\tau$ from $k$ to $j$ iff
(16) $\liminf_{s\uparrow\tau}x(s,\omega)=k,\qquad\liminf_{s\downarrow\tau}x(s,\omega)=j$
where $k\ne j$ and either $k$ or $j$ may be $\infty$. If either $k$ or $j$ is stable then the corresponding $\liminf$ may be replaced by $\lim$ by Theorem 7.4 (a), and if both are stable then the pseudo-jump becomes an ordinary jump.

Theorem 3. Let $j$, $k$ be arbitrary, $j$ stable and $P(\Delta_i)=1$. The probability that $x(t,\omega)=j$ and that the last discontinuity before $t$ of the sample function $x(\cdot,\omega)$ is a pseudo-jump from $k$ to $j$ is equal to
(17) $\int_0^t p_{ik}(s)\,q_{kj}\,e^{-q_j(t-s)}\,ds$
if $k\ne\infty$; and equal to
(18) $p_{ij}(t)-\delta_{ij}e^{-q_jt}-\sum_{k\ne j}\int_0^t p_{ik}(s)\,q_{kj}\,e^{-q_j(t-s)}\,ds$
if $k=\infty$.
Proof. Let $\Lambda$ denote the set of $\omega$ such that $t$ belongs to a $j$-interval of $x(\cdot,\omega)$ and that at the left endpoint $\tau=\tau(\omega)$ of this interval (16) is true. It follows from Theorems 5.7 and 7.4 that $P(\Lambda)$ is equal to the
required probability. Suppose first that k+ 00. Then the r just defined satisfies the conditions of Theorem 12.1, indeed only (ii) needs a simple verification. Define rm as in that theorem and let Am and A;" be respec· tively the intersection of A with {w: x (rm(w), w) = k} and {w: x (rm(w), w) +k}. We have lim P(A;") by Theorem 12.1. Furthermore, suppose that m--+oo
for infinitely many values of m tending to infinity there is an nm(w) such that x(nm Tm, w)=k and x(s, w)==j for (nm+1)Tm~s~t, then the corresponding set of {nm 2- m } has a right limiting point r(w) for which (16) is true by well-separability and Theorem 6.1, and consequently wEA except for a null set. This means, if we put Mm=
U
{w: x(nTm, w)=k; x(s, w)==j, (n+1)Tm~s~t},
(n+l)2-m 0, t~ O. These equations are valid for arbitrary i and j by Theorem 12.8 but we shall be concerned only with the cases specifically noted. The limiting cases for s= 0 may be written as
These equations are no longer necessarily valid, but the corresponding inequalities with "$\ge$" replacing the "$=$", which will be referred to as $(1^{\ge}_{ij})$ and $(2^{\ge}_{ij})$ respectively, are valid. The simplest way to prove these inequalities is to write for $t\ge0$, $h>0$,
$$\frac{p_{ij}(t+h)-p_{ij}(t)}{h}=\frac{p_{ii}(h)-1}{h}\,p_{ij}(t)+\sum_{k\ne i}\frac{p_{ik}(h)}{h}\,p_{kj}(t);$$
$$\frac{p_{ij}(t+h)-p_{ij}(t)}{h}=p_{ij}(t)\,\frac{p_{jj}(h)-1}{h}+\sum_{k\ne j}p_{ik}(t)\,\frac{p_{kj}(h)}{h};$$
let $h\downarrow0$ and use Theorems 2.4 and 2.5 together with FATOU's lemma.

We shall call the equations $(1_{ij})$ for a fixed stable $i$ and all $j$ the first subsystem $(1_i)$; the equations $(2_{ij})$ for a fixed stable $j$ and all $i$ the second subsystem $(2_j)$; the collection of $(1_i)$ for all $i$ the first system; and the collection of $(2_j)$ for all $j$ the second system. As a mnemonic aid we note that in the first [second] system all the $q$'s appearing in one equation have the same first [second] index.

The more meaningful probabilistic counterparts of (3.5) and (3.11) are given by (15.13) and (16.9), which we record as
(15.13 bis)
rii(s+t)= L rik(s) hi(t), k
if O 0, t~ 0. These equations have the analytical advantage of involving only nonnegative quantities. Applying F ATOU'S lemma we obtain the limiting inequalities for s= 0: (3ti )
rii(t)~Lrik(O)hi(t);
(40)
vii(t)~ LPik(t) Vki(O).
k
k
The corresponding equations will be referred to as (3ii) and (4 ii ) which are equivalent to (iii) and (2ii) respectively; smilarly for the systems (3i) and (4i ) The equivalence is established by (15.11) and (16.8) which we record as (15.11 bis) rii(t)= q;lp;i(t) +Pii(t), if O 0,
(8)
The function d ii is nonnegative and continuous in (0, 00); lim dii(t)=O; lim dij(t) exists and is finite. qo
°
t_oo
Proof. The equation (8) follows easily from (16.9 bis); dii~ by (4ti ); d ii is continuous in (0, 00) by Theorem 2.3; the limit at zero exists by the same theorem and the limit at infinity exists and is finite as in the proof of Theorem 16.2. Moreover we have lim v· .(t) ~ lim 'L P'k (t) Vk1·(0) qo (t) ~ o. Next, if we put a;j>(t)=
n
2: P;i>(t) we have from
v=o
(17.3)
A simple induction on n then shows that (4)
n
2: afj> (t) = 2: 2: P;i> (t) ~ 1 i
i .=0
for every t and n. Hence the series in (17.15) converges. Summing (17.13) over n~O and differentiating we see that (Pij) is a solution of
(2). To prove that it is also a solution of (3) we shall show that if we use (17.14) instead of (17.13) in the above construction, the resulting $\bar{p}^{(n)}_{ij}$ coincides with the previous $p^{(n)}_{ij}$ for all $n$. This is true for $n=0$ by (17.12). For $n=1$ both $p^{(1)}_{ij}$ and $\bar{p}^{(1)}_{ij}$ are equal to
$$(1-\delta_{ij})\,q_{ij}\,[e_{q_i}*e_{q_j}]$$
in the notation of (16.14). Assuming that $\bar{p}^{(\nu)}_{ij}=p^{(\nu)}_{ij}$ for all $i$ and $j$, $\nu=n-1$ and $\nu=n$, we see that both $p^{(n+1)}_{ij}$ and $\bar{p}^{(n+1)}_{ij}$ are equal to
$$e_{q_i}*\Big[\sum_{k\ne i}\sum_{l\ne j}q_{ik}\,p^{(n-1)}_{kl}\,q_{lj}\Big]*e_{q_j}.$$
Hence the induction is complete and consequently (17.14) holds for the $p^{(n)}_{ij}$'s constructed from (17.13). Summing (17.14) over $n\ge0$ and differentiating we see that (3) is satisfied with $z_{ij}=p_{ij}$ for almost all $t$. To prove that $(p_{ij})$ is a solution of (3) for all $t$ we will first show that $(p_{ij})$ is a substochastic transition matrix. For this suppose we prove the identity
(5) $p^{(n)}_{ij}(s+t)=\sum_{\nu=0}^{n}\sum_k p^{(\nu)}_{ik}(s)\,p^{(n-\nu)}_{kj}(t)$
in $i$, $j$, $s$ and $t$. For $n=0$ this is trivial. Assuming that (5) is true for a given $n$, we have from (17.13):
= L pf!.> (s)Pkj+!) (t) k = e-q;s pii+!) (t) = e-q,s L l*i = L I*i
n+1
+L
L L v=l k I*i
s
Je- qjU qil pfi,-l) (s- u)Pk'!+!-V) (t) du
0
s
+ L Je-q;u qil pf;> (s- u+t) du
t
1*;
0
s
J e-qiUqiIP~n) (t-u) du+ L Je-q;UqizPf;> (s+t-u) du I*i
0
s+t
J e- qjU qil PI\n) (s+t-u)
0
du=P;'!+!) (s+t).
0
Hence (5) is proved by induction on $n$. Summing it over $n$ we obtain
$$p_{ij}(s+t)=\sum_k\sum_{\nu=0}^{\infty}p^{(\nu)}_{ik}(s)\sum_{n=\nu}^{\infty}p^{(n-\nu)}_{kj}(t)=\sum_k p_{ik}(s)\,p_{kj}(t).$$
This establishes property (1.C) for the matrix $(p_{ij})$; combined with (4) we see that it is indeed a substochastic transition matrix. It is standard since $\lim_{t\downarrow0}p_{ii}(t)\ge\lim_{t\downarrow0}p^{(0)}_{ii}(t)=1$. By Theorems 3.1 (or 3.2) and 3.3, $p'_{ij}(t)$ is a continuous function of $t$; by the Corollary to Theorem 17.3, so is $\sum_k p_{ik}(t)\,q_{kj}$. We have already proved that these two functions of $t$ are equal for a.a. $t$; hence they are equal everywhere. This completes the proof of the following theorem.

Theorem 1. For any Q-matrix $(q_{ij})$ the system of equations (2) and (3) have a solution $(p_{ij})$ which is a standard substochastic transition matrix. The elements of the solution matrix are constructed by (17.12), either (17.13) or (17.14), and (17.15).

Let $(z_{ij})$ be any substochastic transition matrix such that for every $i$ and $j$,
(6)
$z'_{ij}(0)=q_{ij}$;
in other words $(z_{ij})$ satisfies the equations (2) and (3) for $t=0$. Then $z_{ij}$ and $z'_{ij}$ are continuous by Theorem 3.3, and the inequalities corresponding to $(1^{\ge}_{ij})$ of § 17 hold for $(z_{ij})$ by FATOU's lemma. In particular we have $z'_{ij}(t)\ge-q_i\,z_{ij}(t)$ from which it follows that
$$z_{ij}(t)\ge\delta_{ij}e^{-q_it}=p^{(0)}_{ij}(t).$$
Furthermore, by integrating the said inequalities we have (cf. (17.9))
$$z_{ij}(t)\ge\delta_{ij}e^{-q_it}+\sum_{k\ne i}\int_0^t e^{-q_i(t-s)}\,q_{ik}\,z_{kj}(s)\,ds.$$
Assuming then, for the sake of induction on $n$, that $z_{kj}(t)\ge\sum_{\nu=0}^{n}p^{(\nu)}_{kj}(t)$ for all $k$ and $j$, we obtain from (17.13) and the above,
$$z_{ij}(t)\ge\delta_{ij}e^{-q_it}+\sum_{\nu=0}^{n}\sum_{k\ne i}\int_0^t e^{-q_i(t-s)}\,q_{ik}\,p^{(\nu)}_{kj}(s)\,ds=\delta_{ij}e^{-q_it}+\sum_{\nu=0}^{n}p^{(\nu+1)}_{ij}(t)=\sum_{\nu=0}^{n+1}p^{(\nu)}_{ij}(t).$$
This completes the induction and we have upon letting $n\to\infty$
(7) $z_{ij}(t)\ge p_{ij}(t)$.

Theorem 2. If $(z_{ij})$ is any substochastic transition matrix satisfying (6) and $(p_{ij})$ is the solution constructed according to Theorem 1, then (7) is true.

The substochastic transition matrix $(p_{ij})$ will be called the minimal solution corresponding to the Q-matrix $(q_{ij})$.

Theorem 3. If the minimal solution corresponding to a given Q-matrix is a transition matrix then it is the unique substochastic transition matrix whose Q-matrix coincides with the given one, necessarily conservative. In particular it is the unique solution of the two systems (2) and (3).
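For a finite conservative Q-matrix the situation of Theorem 3 always obtains, and the two systems can be checked numerically. The following is a minimal sketch in Python, assuming that (2) and (3) stand for the first and second systems $p'_{ij}(t)=\sum_k q_{ik}p_{kj}(t)$ and $p'_{ij}(t)=\sum_k p_{ik}(t)q_{kj}$; the three-state matrix `Q` and all function names are hypothetical. The transition matrix is computed by uniformization, a standard numerical device that is not taken from the text, and its derivative is approximated by a central difference.

```python
import math

# Hypothetical conservative Q-matrix on three states: every row sums to zero.
Q = [[-3.0, 2.0, 1.0],
     [1.0, -1.0, 0.0],
     [2.0, 2.0, -4.0]]

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transition_matrix(t, terms=80):
    """P(t) = exp(tQ) as a Poisson mixture of powers of the matrix I + Q / lam."""
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n))
    jump = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
            for i in range(n)]
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    weight = math.exp(-lam * t)          # Poisson probability of 0 events
    P = [[0.0] * n for _ in range(n)]
    for m in range(terms):
        for i in range(n):
            for j in range(n):
                P[i][j] += weight * power[i][j]
        power = matmul(power, jump)
        weight *= lam * t / (m + 1)      # Poisson probability of m + 1 events
    return P

def derivative(t, h=1e-5):
    """Central-difference approximation of the matrix of derivatives P'(t)."""
    plus, minus = transition_matrix(t + h), transition_matrix(t - h)
    n = len(Q)
    return [[(plus[i][j] - minus[i][j]) / (2.0 * h) for j in range(n)]
            for i in range(n)]
```

Comparing `derivative(t)` with `matmul(Q, transition_matrix(t))` and with `matmul(transition_matrix(t), Q)` exhibits both systems, while the row sums of `transition_matrix(t)` equal one up to rounding. With infinitely many states neither conclusion is automatic, which is the point of the present section.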
Proof. Let $(z_{ij})$ be a solution; then (7) is true by Theorem 2. It follows that
$$1\ge\sum_j z_{ij}(t)\ge\sum_j p_{ij}(t)=1$$
and consequently $z_{ij}=p_{ij}$ for every $i$ and $j$. The Q-matrix is conservative by Theorem 17.2. □

The converse of this theorem will be given in the Corollary to Theorem 19.4. If the minimal solution is substochastic, we may complete it by adding one new state $\theta$ in accordance with Theorem 3.3. Any M.C. with the resulting transition matrix, an arbitrary initial distribution and the minimal state space $I$ will be called a minimal chain corresponding to the given Q-matrix.

We proceed to derive some further properties of the minimal solution, leaving the probabilistic meaning to the next section. Since each $p^{(n)}_{ij}$ is obviously continuous by induction on $n$, and $p_{ij}$ is also continuous, it follows by DINI'S theorem that the series in (17.15) converges uniformly in every finite interval. The convergence need not be uniform in $T$ since if $q_j>0$, $\lim_{t\to\infty}p^{(n)}_{ij}(t)=0$ for each $n$ but $\lim_{t\to\infty}p_{ij}(t)$ may be positive. The following result is essentially a special case of Theorem 15.3 on account of Theorem 3.3, and can also be easily proved by induction on $n$.
Theorem 4. Each $p^{(n)}_{ij}$ or $p_{ij}$ is either identically zero or never zero in $T^0$.
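The series construction of the minimal solution can be tried out numerically on a small example. The following is a minimal sketch, not the author's procedure: it assumes that $p^{(0)}_{ij}(t)=\delta_{ij}e^{-q_i t}$ and that the recursion takes the integral form used in the induction step above, $p^{(n+1)}_{ij}(t)=\sum_{k\ne i}\int_0^t e^{-q_i(t-s)}q_{ik}p^{(n)}_{kj}(s)\,ds$. The function name `minimal_solution`, the grid size and the number of terms are illustrative choices, and the integral is approximated by the trapezoid rule.

```python
import math

def minimal_solution(Q, t, steps=400, terms=40):
    """Approximate sum_{nu <= terms} p_ij^(nu)(t) for a finite Q-matrix Q."""
    n = len(Q)
    h = t / steps
    q = [-Q[i][i] for i in range(n)]              # q_i = -q_ii
    decay = [math.exp(-q[i] * h) for i in range(n)]
    # cur[i][j][m] approximates p_ij^(nu)(m*h); start with nu = 0.
    cur = [[[math.exp(-q[i] * m * h) if i == j else 0.0
             for m in range(steps + 1)] for j in range(n)] for i in range(n)]
    total = [[cur[i][j][steps] for j in range(n)] for i in range(n)]
    for _ in range(terms):
        nxt = [[[0.0] * (steps + 1) for _ in range(n)] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                # g(s) = sum_{k != i} q_ik * p_kj^(nu)(s) on the grid
                g = [sum(Q[i][k] * cur[k][j][m] for k in range(n) if k != i)
                     for m in range(steps + 1)]
                for m in range(1, steps + 1):
                    # trapezoid update of int_0^t e^{-q_i (t - s)} g(s) ds
                    nxt[i][j][m] = (decay[i] * nxt[i][j][m - 1]
                                    + 0.5 * h * (decay[i] * g[m - 1] + g[m]))
                total[i][j] += nxt[i][j][steps]
        cur = nxt
    return total

# Two-state conservative Q-matrix with q_0 = 1, q_1 = 2 (an illustrative choice).
P = minimal_solution([[-1.0, 1.0], [2.0, -2.0]], 1.0)
```

For this two-state example the minimal solution is a transition matrix, so the rows of `P` should sum to approximately one and `P[0][0]` should be close to the closed form $\tfrac23+\tfrac13e^{-3}$.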
For any subset $J$ of the index set $I$ let
$$p_{iJ}(t)=\sum_{j\in J}p_{ij}(t). \tag{8}$$

Theorem 5. If $p_{iI}(t_0)=1$ for a certain $t_0>0$ then $p_{iI}(t)\equiv 1$ for all $t\ge 0$; in this case $p_{kI}(t)\equiv 1$ for every $k$ for which $p_{ik}(t)\not\equiv 0$.
Proof. We have if 0 … $>0$ for all $t>0$ by Theorem 4. Hence if $k\in E$, then $p_{kI}(t)=1$ for $0\le t\le t_0$. We have therefore if $t_0$ …
$$L^{(n)}_i(t)=1-\sum_{\nu=0}^{n}p^{(\nu)}_{iI}(t).$$
It follows from (4) that $L^{(n)}_i(t)\ge 0$; since each $p^{(n)}_{iI}$ is continuous in $T$ by the preceding Corollary, so is $L^{(n)}_i$. Let
$$L_i(t)=\lim_{n\to\infty}L^{(n)}_i(t)=1-p_{iI}(t); \tag{10}$$
then $L_i$ is a continuous function in $T$ by the Corollary above, but more is true as follows.
Theorem 6. Each $L_i$ has a continuous derivative $l_i$ satisfying (11) for $t>0$, $s\ge 0$.

Proof. From the definition (10) and the semigroup property (1.C) of $(p_{ij})$, we deduce (12) for each $i\in I$. Applying Theorem 12.4 to the system of equations (12) we obtain (11). □
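To illustrate the kind of relation that the definition (10) and the semigroup property produce, here is a short worked computation. It is a sketch under the assumptions that $L_i(t)=1-\sum_j p_{ij}(t)$ and that the sums may be interchanged because all terms are nonnegative; it is not claimed to be the exact form in which the numbered equations of the proof are displayed.

```latex
\begin{align*}
L_i(s+t) &= 1-\sum_{j} p_{ij}(s+t)
          = 1-\sum_{k} p_{ik}(s)\sum_{j} p_{kj}(t) \\
         &= 1-\sum_{k} p_{ik}(s)\bigl(1-L_k(t)\bigr)
          = L_i(s)+\sum_{k} p_{ik}(s)\,L_k(t).
\end{align*}
```

Differentiating formally in $t$ suggests $l_i(s+t)=\sum_k p_{ik}(s)\,l_k(t)$; making this step rigorous is what the appeal to Theorem 12.4 is for.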
Comparison of this result with Theorem 12.5 suggests that the minimal transition probabilities may be regarded as taboo probabilities with the "fictitious taboo state $\infty$". This formal analogy has been substantiated and much refined by the boundary theory; see CHUNG [12] and [14]; NEVEU [2].
Theorem 7. If the Q-matrix is conservative, then
$$L^{(n)}_i(t)=\int_0^t\sum_j p^{(n)}_{ij}(s)\,q_j\,ds. \tag{13}$$
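Proof. Using (17.14), we have if $n\ge 0$,
$$\begin{aligned}
\int_0^t p^{(n+1)}_{ij}(u)\,q_j\,du
&=\int_0^t\int_0^u\sum_{k\ne j}p^{(n)}_{ik}(s)\,q_{kj}\,e^{-q_j(u-s)}\,q_j\,ds\,du\\
&=\sum_{k\ne j}\int_0^t p^{(n)}_{ik}(s)\,q_{kj}\bigl[1-e^{-q_j(t-s)}\bigr]\,ds\\
&=\sum_{k\ne j}\int_0^t p^{(n)}_{ik}(s)\,q_{kj}\,ds-p^{(n+1)}_{ij}(t).
\end{aligned}$$
Summing over $j$, we obtain
$$p^{(n+1)}_{iI}(t)=\int_0^t\sum_j p^{(n)}_{ij}(s)\,q_j\,ds-\int_0^t\sum_j p^{(n+1)}_{ij}(s)\,q_j\,ds.$$
It follows that
$$\sum_{\nu=0}^{n}p^{(\nu)}_{iI}(t)=e^{-q_i t}+\int_0^t q_i e^{-q_i s}\,ds-\int_0^t\sum_j p^{(n)}_{ij}(s)\,q_j\,ds
=1-\int_0^t\sum_j p^{(n)}_{ij}(s)\,q_j\,ds,$$
proving (13). □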
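Formula (13) can be checked numerically in a case where everything is explicit. The sketch below assumes a Poisson process with rate `lam`, that is, the conservative Q-matrix with $q_i=\lambda$ and $q_{i,i+1}=\lambda$ (an illustrative choice, not an example from the text). There $p^{(n)}_{i,i+n}(t)=e^{-\lambda t}(\lambda t)^n/n!$ and all other $p^{(n)}_{ij}$ vanish, so both members of (13) can be computed directly. The function names are hypothetical, and the integral is approximated by the composite Simpson rule.

```python
import math

def p_n(n, t, lam):
    """Probability of exactly n jumps in (0, t) for a Poisson process of rate lam."""
    return math.exp(-lam * t) * (lam * t) ** n / math.factorial(n)

def L_n_from_definition(n, t, lam):
    """L^(n)(t) = 1 - sum_{nu=0}^{n} p^(nu)(t): probability of more than n jumps."""
    return 1.0 - sum(p_n(nu, t, lam) for nu in range(n + 1))

def L_n_from_integral(n, t, lam, steps=2000):
    """Right member of (13): integral over (0, t) of p^(n)(s) * lam (steps must be even)."""
    h = t / steps
    def f(s):
        return p_n(n, s, lam) * lam
    total = f(0.0) + f(t)
    for m in range(1, steps):
        total += (4 if m % 2 else 2) * f(m * h)
    return total * h / 3.0

# Compare the two members of (13) for a few values of n and t.
gaps = [abs(L_n_from_definition(n, t, 1.5) - L_n_from_integral(n, t, 1.5))
        for n in range(5) for t in (0.5, 2.0)]
```

In this special case (13) reduces to the familiar fact that the time of the $(n+1)$-st jump has a gamma density, so every entry of `gaps` should be negligibly small.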
Corollary. A sufficient condition that $L_i(t)=0$ for all $t$ is that $\sum_j p_{ij}(t)\,q_j<\infty$ for some $t>0$.

Proof. If $0<s<t$ we have … and consequently
$$\sum_{n=0}^{\infty}L^{(n)}_i(t)=\int_0^t\sum_j p_{ij}(s)\,q_j\,ds<\infty.$$
Hence $L_i(t)=\lim_{n\to\infty}L^{(n)}_i(t)=0$ and the Corollary follows from Theorem 5. □
Theorem 8. If the Q-matrix is conservative, then (14) holds.

Proof. Multiplying (17.13) through by $e^{q_i t}$, differentiating, cancelling $e^{q_i t}$ and integrating, we obtain a relation which, upon multiplying by $q_j$, summing over $j$ and using (13), yields (14). □
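The steps named in the proof can be carried out as follows. This is a sketch under the assumption that (17.13) is the recursion $p^{(n+1)}_{ij}(t)=\sum_{k\ne i}\int_0^t e^{-q_i(t-s)}q_{ik}p^{(n)}_{kj}(s)\,ds$ used in the induction earlier in this section; interchanges of summation with differentiation and integration are taken for granted, and the last line is not claimed to be the exact displayed form of (14).

```latex
\begin{align*}
e^{q_i t}\,p^{(n+1)}_{ij}(t)
  &= \sum_{k\ne i}\int_0^t e^{q_i s}\,q_{ik}\,p^{(n)}_{kj}(s)\,ds, \\
\frac{d}{dt}p^{(n+1)}_{ij}(t)+q_i\,p^{(n+1)}_{ij}(t)
  &= \sum_{k\ne i} q_{ik}\,p^{(n)}_{kj}(t), \\
p^{(n+1)}_{ij}(t)+q_i\int_0^t p^{(n+1)}_{ij}(s)\,ds
  &= \sum_{k\ne i} q_{ik}\int_0^t p^{(n)}_{kj}(s)\,ds, \\
\sum_j p^{(n+1)}_{ij}(t)\,q_j+q_i\,L^{(n+1)}_i(t)
  &= \sum_{k\ne i} q_{ik}\,L^{(n)}_k(t).
\end{align*}
```

The third line uses $p^{(n+1)}_{ij}(0)=0$, and the fourth uses (13) twice. Since by (13) the sum $\sum_j p^{(n+1)}_{ij}(t)\,q_j$ is the derivative of $L^{(n+1)}_i$, the last line is a linear differential equation for $L^{(n+1)}_i$ whose solution with initial value zero is an integral of the form $\int_0^t e^{-q_i(t-s)}\sum_{k\ne i}q_{ik}L^{(n)}_k(s)\,ds$.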
An equivalent form of (14) is as follows:
$$L^{(n+1)}_i(t)=\int_0^t e^{-q_i(t-s)}\sum_{k\ne i}q_{ik}L^{(n)}_k(s)\,ds. \tag{15}$$
Letting $n\to\infty$ we have
$$L_i(t)=\int_0^t e^{-q_i(t-s)}\sum_{k\ne i}q_{ik}L_k(s)\,ds. \tag{16}$$
The differentiated form of (16) is
$$l_i(t)=\sum_k q_{ik}L_k(t) \tag{17}$$
and the right member is equal to
$$\sum_k q_{ik}\Bigl(1-\sum_j p_{kj}(t)\Bigr)=-\sum_j\sum_k q_{ik}\,p_{kj}(t)=-\sum_j p'_{ij}(t).$$
Thus the series $\sum_j p_{ij}(t)=1-L_i(t)$ may be differentiated term by term. This is actually a case of (3.3), in view of Theorem 3.3. Another consequence of (17) is that $l_i(0)=0$ for each $i$. It follows from (15), by induction on $n$, that
$$\lim_{t\to\infty}L^{(n)}_i(t)=\int_0^{\infty}\sum_j p^{(n)}_{ij}(s)\,q_j\,ds=1. \tag{18}$$
The probabilistic meaning of the formulas (12), (16) and (18) will become clear in the next section.

Notes. The main results of this section are specializations of those in FELLER [1] which treats more general processes. A new treatment of our special case by means of Laplace transforms (resolvents) is given in FELLER [5]; another approach through approximation by truncated matrices is given by REUTER and LEDERMANN [1]; a third one by a perturbation method by KATO [1]. See also REUTER [2] and HILLE and PHILLIPS [1]. The methods of functional analysis, however, tend to obliterate certain fine points which may be significant for the study of the stochastic process. It may be mentioned that even for the minimal solution $(p_{ij})$, questions of representation, higher derivatives and further analytical properties seem to be open, except in special cases such as bounded or symmetric Q-matrix; see e.g. AUSTIN [1]; JURKAT [2]. For historical orientation see KOLMOGOROV [1].
§ 19. The first infinity

We return to the M.C. $\{x_t, t\in T\}$ which is assumed to be well-separable and Borel measurable. Furthermore we assume that for each $i$, $q_i>0$ and that $(p'_{ij}(0))=(q_{ij})$ is a conservative Q-matrix. By Theorem 17.2 this is equivalent to the validity of the first system of differential equations. In probabilistic language, all states are stable and non-absorbing and the discontinuity of the sample functions at any exit time is an ordinary jump with probability one (see Theorems 15.2 and 15.6). It follows that we can enumerate the successive jumps until the first discontinuity that is not a jump, if such a discontinuity exists. By Theorem 7.4, since only case (c) there is possible, the sample function tends to infinity as this discontinuity is approached from the left. Let us put
$$\tau_\infty(\omega)=\inf\{t: t>0;\ \lim_{s\uparrow t}x(s,\omega)=\infty\} \tag{1}$$
if the $t$-set between the braces is not empty, and $\tau_\infty(\omega)=\infty$ otherwise; $\tau_\infty(\omega)$ is called the first infinity of $x(\cdot,\omega)$. It is easy to verify that $\tau_\infty$ is an optional random variable relative to $\{x_t, t\in T\}$. For each $\omega$, we have $\tau_\infty(\omega)=\infty$ if and only if the sample function $x(\cdot,\omega)$ is a "step function" in $T$, namely one which has no other discontinuities except jumps on a discrete (finite or countably infinite) set of points and which is constant between jumps. Let the successive points of jump of $x(\cdot,\omega)$ until $\tau_\infty(\omega)$ be $\{\tau_n(\omega), n\ge 1\}$; for a.a. $\omega$ this is an increasing infinite sequence and
$$\lim_{n\to\infty}\tau_n(\omega)=\tau_\infty(\omega).$$
Let $\tau_0=0$ and
$$\varrho_n(\omega)=\tau_{n+1}(\omega)-\tau_n(\omega)$$
for $n\ge 0$ so that
$$\tau_\infty(\omega)=\sum_{n=0}^{\infty}\varrho_n(\omega). \tag{2}$$
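The decomposition (2) of the first infinity into the successive intervals between jumps is easy to simulate in a simple case. The sketch below assumes a pure birth chain started at 0 with the hypothetical rates $q_n=(n+1)^2$, for which the time spent in state $n$ is exponentially distributed with mean $q_n^{-1}$ and the order of states visited is deterministic. This chain is an illustrative assumption, not an example from the text, and the infinite sum is truncated after finitely many jumps.

```python
import math
import random

def birth_rates(n_jumps):
    """Hypothetical rates q_n = (n + 1)**2 for a pure birth chain started at 0."""
    return [float((n + 1) ** 2) for n in range(n_jumps)]

def expected_time_of_jumps(rates):
    """Expected time of the N-th jump: the sum of the mean holding times 1/q_n."""
    return sum(1.0 / q for q in rates)

def sample_time_of_jumps(rates, rng):
    """One sample of the N-th jump time as a sum of independent exponential times."""
    return sum(rng.expovariate(q) for q in rates)

rng = random.Random(0)
samples = [sample_time_of_jumps(birth_rates(200), rng) for _ in range(2000)]
sample_mean = sum(samples) / len(samples)
limit_of_means = math.pi ** 2 / 6  # sum over all n of 1 / (n + 1)**2
```

Because the expected holding times have a finite sum, the truncated jump times stay bounded in expectation as more jumps are included, which is the situation in which the first infinity is finite with positive probability. With rates such as $q_n=n+1$ the sum of the means diverges instead.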
We may now apply the analytical results of § 18 to the probabilities $p^{(n)}_{ij}$ and $p_{ij}$ defined in (17.11) and (17.16). The quantities $p^{(n)}_{iI}(t)$, $L^{(n)}_i(t)$, $p_{iI}(t)$ and $L_i(t)$ are, respectively, the probabilities that $x(\cdot,\omega)$ has exactly $n$, more than $n$, finitely many, and infinitely many jumps (and no other discontinuities) in $(0,t)$ (or $[0,t]$ indifferently), under the hypothesis that $x_0(\omega)=i$. This interpretation remains valid even if $(q_{ij})$ is not conservative, by Theorem 15.6, since a pseudo-jump to $\infty$ is a discontinuity other than a jump. We have therefore
$$L^{(n)}_i(t)=P\{\tau_{n+1}(\omega)\le t\mid x_0(\omega)=i\};\qquad L_i(t)=P\{\tau_\infty(\omega)\le t\mid x_0(\omega)=i\}.$$
According to (18.13), the distribution $L^{(n)}_i$ of $\tau_{n+1}$ has the density function $\sum_j p^{(n)}_{ij}(t)\,q_j$ which is continuous by (18.14). As for the distribution $L_i$ of $\tau_\infty$, it has a continuous density function given by (18.17),
which is the limit as $n\to\infty$ of the density function of $\tau_n$, by letting $n\to\infty$ in (18.14). We have $P\{\tau_\infty(\omega)>t\mid x_0(\omega)=i\}=1$ if and only if $p_{iI}(t)=1$ or $L_i(t)=0$. Theorem 18.5 asserts that each $L_i$ is either identically zero or never zero. This means that if almost all sample functions with $x(0,\omega)=i$ are step functions in $(0,\varepsilon)$ for any $\varepsilon>0$ then so are they in $(0,\infty)$. Alternatively, if there is a positive probability that the first infinity is finite then there is a positive probability that it is less than any given positive number. This result is to be compared with Theorem 15.3. For the sake of definiteness let us now take the $x_+$ version so that
$$x(t,\omega)=x(\tau_n(\omega),\omega)$$
if 7:n(co)~t 0, there exists to= to (e) such that inf lim Pii(t»1-e.
iEIO;;i;t;;i;t.
This to has the properties required of it in the cited proof. In particular if nh
n-+oo iEA
°
$${}_A r^{(n)}_{ij}=P\{\chi_\nu(\omega)\in A,\ 1\le\nu<n;\ \chi_n(\omega)=j\mid\chi_0(\omega)=i\}.$$
Theorem 3. If there exists a sojourn set $A$ for the jump matrix and an $i$ satisfying both (10) and (11) then (12). Conversely if (12) is true then there exists a sojourn set $A$ and an $i$ satisfying (10) and
$$\sum_{n=0}^{\infty}\sum_{j\in A}{}_A r^{(n)}_{ij}\,q_j^{-1}<\infty. \tag{13}$$
Remark. Since (11) is implied by (13) we may use the latter condition in both parts of the theorem. Furthermore we may allow some $q_j=0$ if we set the corresponding $q_j^{-1}=\infty$.

Proof. Let $\Delta_i=\{\omega: x_0(\omega)=i\}=\{\omega:\chi_0(\omega)=i\}$. As already remarked, the condition (10) is equivalent to $P(\Lambda\mid\Delta_i)>0$. We have … consequently by (11)
$$\int_\Lambda\sum_n q^{-1}_{\chi_n(\omega)}\,P(d\omega\mid\Delta_i)<\infty.$$
Hence for a.a. $\omega$ in $\Delta_i\Lambda$,
$$\sum_n q^{-1}_{\chi_n(\omega)}<\infty.$$
It follows from Theorem 1 that
$$P\{\tau_\infty(\omega)<\infty\mid\Delta_i\}\ge P(\Lambda\mid\Delta_i)>0.$$
To prove the converse suppose that (12) is true; then there exists a $t>0$ and a $c>0$ such that $L_i(t)=P\{\tau_\infty(\omega)\le t\mid\Delta_i\}=c$. Let $A=\{j: L_j(t)\ge c\}$; we shall show that $A$ is a sojourn set. Let
A,1t = n=l L: P{x.(w} EA, 1 ~vr(w).
"P(}(t) = 1- ~"Pj(t); j
then Theorem 4 may be restated as follows.

Theorem 4'. The reverse chain $\{y_t, t\in T^0\}$ in $(0, r(\omega))$ is an open M.C., with the transition matrix $(\psi_{ij})$, $i,j\in I\cup\{\theta\}$; and the absolute distribution $\{\psi_j(t)\}$, $j\in I\cup\{\theta\}$, at time $t$.

The main limit theorem for the approach to the first infinity is obtained by applying the martingale convergence theorem to the reverse chain. Observe that $\psi_{y(s,\omega),j}(t-s)$ is a version of the conditional probability
P{y(t, w)=iIY (r, w), O-O
$(p_{ee'})$ is a standard transition matrix. The analytical verification of (1.B) and (1.C) is left to the reader. Thus there is a version of $\{x_t, t\in T\}$ which is well-separable and measurable by Theorem 4.3. For such a version we have for every $s\ge 0$, $h>0$,
$$\begin{aligned}
P\{x(t,\omega)=e,\ s\le t\le s+h\mid x(s,\omega)=e\}
&=\prod_{n=1}^{\infty}P\{x^{(n)}(t,\omega)=e_n,\ s\le t\le s+h\mid x^{(n)}(s,\omega)=e_n\}\\
&=\prod_{n=1}^{\infty}e^{-q^{(n)}_{e_n}h}=e^{-\sum_{n=1}^{\infty}q^{(n)}_{e_n}h}.
\end{aligned}$$
Since $e_n=0$ for all sufficiently large $n$, the last term is equal to zero by (28). Hence every state $e$ is instantaneous.

If all states are instantaneous, then we have (30) for a.e. $\omega$. To prove this we need the following extension of BAIRE'S lemma which can be proved in the same way as its usual form (see e.g. KELLEY [1; p. 200]).
Lemma. Let $\{U_n, n\ge 1\}$ be a sequence of subsets of $T$ such that each $U_n$ is dense in $T$ and has the following property: for any open interval $V$ in $T$, whenever $U_n\cap V$ is not empty it contains a nonempty open interval. Then $\bigcap_{n=1}^{\infty}U_n$ is dense in $T$.
To apply the lemma to the proof of (30), we may suppose that $I$ is the set of positive integers. For each $\omega$ and $n\ge 1$ put
$$U_n(\omega)=\bigcup_{n\le m\le\infty}S_m(\omega).$$
If all states are instantaneous, it follows from Theorem 7.4 ($\beta$) and the fact that $\mu[S_\infty(\omega)]=0$ for a.e. $\omega$ (Theorem 5.1) that for a.e. $\omega$ the sequence $\{U_n(\omega), n\ge 1\}$ satisfies the conditions of the Lemma. Hence
$$S_\infty(\omega)=\bigcap_{n=1}^{\infty}U_n(\omega)$$
is dense in $T$, as was to be shown. □
Example 7. Even if all states are stable, it is possible that (almost) no sample function has any jumps so that all its discontinuities are of the second kind. Each such function is a singular nondecreasing function in $T$; its graph is obtained by inverting that of a purely discontinuous strictly increasing function whose set of discontinuity is that of all the rationals in $T$, the segments of the jumps being added to the graph. Thus as $t$ increases, $x(t,\omega)$ assumes all values in $T$, each rational being assumed in a nondegenerate interval while the set of irrationals is assumed in a set of Lebesgue measure zero.

To construct such a M.C., let $R$ be the set of rationals in $T$ and let $r$ denote an element of $R$ below. Assign to each $r$ a positive number $q_r$ such that $q_r=q_{r+1}$ for every $r$ and $\sum_{0\le r\le 1}q_r^{-1}<\infty$. Thus $\sum_{r\in R}q_r^{-1}=\infty$. Let $\{\tau_r, r\in R\}$ be a countable family of independent random variables such that $\tau_r$ has the distribution function $e_{q_r}$ so that $E\{\tau_r\}=q_r^{-1}$. Put
$$\sigma(s,\omega)=\sum_{r\le s}\tau_r(\omega)$$
for $s\in T$. There exists a set $\Omega_0$ with $P(\Omega_0)=1$ such that if $\omega\in\Omega_0$, then the following statements are all true. For each $r$, $0<\tau_r(\omega)<\infty$; for each $s\in T$, $\sigma(s,\omega)<\infty$; $\sigma(\cdot,\omega)$ is strictly increasing in $T$; $\lim_{s\to\infty}\sigma(s,\omega)=\infty$; and
$$\sigma(s+0,\omega)-\sigma(s-0,\omega)=\begin{cases}\tau_r(\omega)>0, & \text{if } s=r\in R;\\ 0, & \text{if } s\in T-R.\end{cases} \tag{31}$$
Thus $\sigma(\cdot,\omega)$ has jumps at each $r$ in $R$ and is continuous elsewhere. These properties follow from the proposition that the sum of a countable family of independent positive random variables is finite for a.e. $\omega$ if and only if the sum of their expectations is finite; and that if a countable family of positive numbers have a finite sum then any sequence from the family has a convergent sum.
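A finite truncation of this construction can be sketched in code. The version below is only an illustration under loudly stated assumptions: it keeps just the rationals in $[0,1]$ with denominator at most `max_den`, so the jumps are no longer dense, and it uses the hypothetical rates $q_r=2^{k+1}$ for the $k$-th such rational, chosen so that the mean waiting times have a finite sum. The names `rationals_in_unit_interval` and `sample_sigma` are not from the text.

```python
import random
from fractions import Fraction

def rationals_in_unit_interval(max_den):
    """Distinct rationals in [0, 1] with denominator <= max_den, in increasing order."""
    return sorted({Fraction(n, d) for d in range(1, max_den + 1) for n in range(d + 1)})

def sample_sigma(max_den, seed=0):
    """Sample the waiting times tau_r and return (rationals, taus, sigma)."""
    rng = random.Random(seed)
    rs = rationals_in_unit_interval(max_den)
    # Hypothetical rates q_r = 2**(k + 1), so that the sum of 1/q_r is below 1.
    taus = {r: rng.expovariate(2.0 ** (k + 1)) for k, r in enumerate(rs)}

    def sigma(s):
        # sigma(s) = sum of tau_r over the retained rationals r <= s
        return sum(tau for r, tau in taus.items() if r <= s)

    return rs, taus, sigma

rs, taus, sigma = sample_sigma(6)
half = Fraction(1, 2)
jump_at_half = sigma(half) - sigma(half - Fraction(1, 1000))
```

Here `jump_at_half` equals the sampled waiting time attached to $r=\tfrac12$ up to rounding, since no other retained rational lies between $0.499$ and $0.5$; this mirrors the jump relation for $\sigma$ at rational points.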
Now put for $\omega\in\Omega_0$ and $t\in T$:
$$x(t,\omega)=s\quad\text{for } t\in[\sigma(s-0,\omega),\,\sigma(s+0,\omega)],\quad s\in T. \tag{32}$$
The sample functions of the process $\{x_t, t\in T\}$ have the properties described above. If $s\in T-R$ and $t\in T$, then
$$P\{x(t,\omega)=s\}=P\{\sigma(s,\omega)=t\}=0$$
by (31) and the (absolute) continuity of the distribution of $\sigma(s,\cdot)$. Next if $0=t_0$