Applied Mathematical Sciences
EDITORS
Fritz John, Courant Institute of Mathematical Sciences, New York University, New York, N.Y. 10012
Lawrence Sirovich, Division of Applied Mathematics, Brown University, Providence, R.I. 02912
Joseph P. LaSalle, Division of Applied Mathematics, Lefschetz Center for Dynamical Systems, Providence, R.I. 02912
ADVISORS
H. Cabannes, University of Paris-VI
M. Ghil, New York University
J.K. Hale, Brown University
J. Keller, Stanford University
J. Marsden, Univ. of California at Berkeley
G.B. Whitham, California Inst. of Technology
EDITORIAL STATEMENT The mathematization of all sciences, the fading of traditional scientific boundaries, the impact of computer technology, the growing importance of mathematical-computer modelling and the necessity of scientific planning all create the need both in education and research for books that are introductory to and abreast of these developments. The purpose of this series is to provide such books, suitable for the user of mathematics, the mathematician interested in applications, and the student scientist. In particular, this series will provide an outlet for material less formally presented and more anticipatory of needs than finished texts or monographs, yet of immediate interest because of the novelty of its treatment of an application or of mathematics being applied or lying close to applications. The aim of the series is, through rapid publication in an attractive but inexpensive format, to make material of current interest widely accessible. This implies the absence of excessive generality and abstraction, and unrealistic idealization, but with quality of exposition as a goal. Many of the books will originate out of and will stimulate the development of new undergraduate and graduate courses in the applications of mathematics. Some of the books will present introductions to new areas of research, new applications and act as signposts for new directions in the mathematical sciences. This series will often serve as an intermediate stage of the publication of material which, through exposure here, will be further developed and refined. These will appear in conventional format and in hard cover.
MANUSCRIPTS The Editors welcome all inquiries regarding the submission of manuscripts for the series. Final preparation of all manuscripts will take place in the editorial offices of the series in the Division of Applied Mathematics, Brown University, Providence, Rhode Island. SPRINGER-VERLAG NEW YORK INC., 175 Fifth Avenue, New York, N.Y. 10010
Applied Mathematical Sciences
Volume 23
John Lamperti
Stochastic Processes A Survey of the Mathematical Theory
Springer-Verlag New York Heidelberg Berlin
John Lamperti, Department of Mathematics, Dartmouth College, Hanover, New Hampshire 03755
AMS Subject Classifications: 60-01, 60Gxx, 60Jxx
Library of Congress Cataloging in Publication Data Lamperti, John. Stochastic processes. (Applied mathematical sciences ; v. 23) Bibliography: p. Includes index. 1. Stochastic processes. 2. Stationary processes. 3. Markov processes. I. Title. II. Series. QA1.A647 vol. 23 [QA274] 510'.8s [519.2] 77-24321 All rights reserved. No part of this book may be translated or reproduced in any form without written permission from Springer-Verlag. © 1977 by Springer-Verlag, New York Inc.
9 8 7 6 5 4 3
ISBN 978-0-387-90275-3 ISBN 978-1-4684-9358-0 (eBook) DOI 10.1007/978-1-4684-9358-0
PREFACE This book is the result of lectures which I gave during the academic year 1972-73 to third-year students at Aarhus University in Denmark.
The purpose of the book, as of the
lectures, is to survey some of the main themes in the modern theory of stochastic processes. In my previous book Probability: A Survey of the Mathematical Theory I gave a short overview of "classical" probability mathematics, concentrating especially on sums of independent random variables. I did not discuss specific applications of the theory; I did strive for a spirit friendly to application by coming to grips as fast as I could with the major problems and techniques and by avoiding too high levels of abstraction and completeness.
At the same time, I tried
to make the proofs both rigorous and motivated and to show how certain results have evolved rather than just presenting them in polished final form.
The same remarks apply to this
book, at least as a statement of intentions, and it can serve as a sequel to the earlier one continuing the story in the same style and spirit. The contents of the present book fall roughly into two parts.
The first deals mostly with stationary processes,
which provide the mathematics for describing phenomena in a steady state overall but subject to random fluctuations. Chapter 4 is the heart of this part.
The simple geometry of
the Wold decomposition is the starting point for discussing linear prediction, and the analysis is derived from it in a direct and natural way.
Basic results such as criteria for
singularity or regularity, the factorization of the spectral
density and the error of optimum prediction are thus obtained with a minimum of heavy analytical machinery, while the need for those tools which are used becomes clear in advance.
The
individual ergodic theorem and the strong law of large numbers then round out the study of stationary processes. The second part of the book is mainly about Markov processes; if desired this can be read before most of part one by going directly from Chapter 2 to Chapter 6. I think that the application of semigroup theory is the key to understanding in this area, and so Chapter 7, in which this tool is developed, is basic to the discussion.
Properties of path
functions, strong Markov processes and a little martingale theory are introduced in the later chapters of this part. I believe that, in the last analysis, probability cannot be properly understood just as pure mathematics, separated from the body of experience and examples which have brought it to life.
The students attending my Aarhus lectures already had a considerable acquaintance with applied probability and statistics, and I regard some such experience as one of the essential prerequisites for reading this book with profit. The other prerequisites are a general knowledge of "real analysis" as well as some familiarity with the measure-theoretic formulation of probability theory itself; details are given below.
I hope that after finishing this book
readers will be prepared either to go on to the frontiers of mathematical research through more specialized literature, or to turn toward applied problems with an ability to relate them to the general theory and to use its tools and ideas as far as may be possible.
If it is true that the mathematics discussed in this book is applicable, the question naturally must arise "Applicable for what?"
In the preface to the mimeographed version
of my Aarhus lectures,*) given while the Indochina War was still raging, I said: "It is impossible for me these days to write or lecture about mathematics without ambivalence. It is obvious that in many nations, and most of all in my own, science and mathematics are all too often serving as tools for militarism and oppression. Probability theory has played a considerable role in some of these perversions, and those who, like myself, work in "pure mathematics" rather than directly with applications must also accept a share of the responsibility. I believe that today it is a vital duty for the scientific community to struggle against such misuse of science, and to resist the demands - made in the name of "defense" or "security" - to develop ever more efficient means for killing and exploiting other human beings." Such concerns, of course, are not new.
The American
mathematician who has contributed most to the theories developed in this book is undoubtedly Norbert Wiener.
In 1947, in
the Introduction to his influential book Cybernetics,**) Wiener wrote: "Those of us who have contributed to the new science of cybernetics thus stand in a moral position which is, to say the least, not very comfortable. We have contributed to the initiation of a new science which, as I have said, embraces technical developments with great possibilities for good and for evil. We can only hand it over into the world that exists about us, and this is the world of Belsen and Hiroshima. We do not even have the choice of suppressing these new technical developments. They belong to the age, and the most any of us can do by suppression is to put the development of the subject into the hands of the most irresponsible and most venal of our engineers. The best we can do is to see that a large public understands the trend and the bearing of the present work and to confine our personal efforts to those fields, such as physiology and psychology, most remote from war and exploitation."
*) (Aarhus University Mathematics Lecture Note Series, no. 38.)
**) [W], page 28.
That was an important statement, but we must now go further.
I believe that scientists have an obligation to try
to estimate which of the possible results of new technical developments are likely to occur in reality. This cannot be done in a social and political vacuum. In a peaceful, liberated, nonexploitative society there would be little to fear; beneficial applications would be pushed while harmful ones would wither.
But in today's United States it is mainly the
government, especially the Pentagon, and the giant corporations which have the resources and the desire to exploit advanced technology for their own purposes. I do not think the prospects here for the benign application of science are encouraging.
Elsewhere in the world the outlook is rarely much
better, and sometimes worse. What then can be done?
To personally abstain from
immediately harmful work is a first step, but no more. Wiener's emphasis on public education is surely important; the vital decisions must not be left to the experts and rulers, but should be made in a broad political forum.
This
is beginning to happen in the nuclear energy controversy, for example, despite powerful efforts to exclude the public from meaningful participation.
Individual scientists
and
engineers, and several organizations of scientists, have played important roles in this process. Perhaps the key word which must be added to Wiener's statement is "organize."
The great day of the dedicated
solitary researcher is over, if indeed it ever existed.
Now
our scientific work is elaborately planned and supported, but the old individualistic ideology of "disinterested research" and "knowledge for its own sake" persists.
These concepts can
serve as intellectual blinders which prevent us from understanding the social role which we in fact do play as mathematicians, scientists and engineers, and which keep us from working effectively for change. In their stead, concern for the human consequences of scientific and technological achievement must become part of our working lives, of our teaching and learning, of our professional meetings and writing.
Only through organized collective action can this be
achieved. The goal of controlling and humanizing science will not be fully attained, I believe, until radical changes have been made in the structure of society.
I also believe that
to wait for that day before beginning to act invites disaster. Fortunately there appear to be a growing number of people, in the U. S. and elsewhere, who are deeply concerned about the social consequences of their scientific work, who are ready to give this concern a major role in their professional lives, and who are getting together in old and new ways to develop their ideas and to put them into practice.
Since this must
be the starting point, perhaps there is some basis for optimism. The author of a book such as this one is obviously indebted to almost everyone who has contributed to the field, and I am drawing not only on the research but also on the expository writing of many others.
In particular, lecture
notes of A. D. Wentzel and (especially) of K. Ito have been very helpful in developing my feeling for stochastic processes, and the writings of Noam Chomsky, D. F. Fleming and
I. F. Stone have aided me to better understand the world in which we live.
Other resources are listed in the bibliography.
I wish to express my appreciation for the hospitality of the Mathematics Institute of Aarhus University where my lectures were given four years ago; my visit there was both pleasant and profitable.
And finally, I am grateful to the
Dartmouth College mathematics department for its generosity with leave and assistance during the preparation of the final version of the book.
John Lamperti Hanover, N. H. February, 1977
PREREQUISITES There are three general mathematical prerequisites for reading this book with profit.
They are an adequate knowledge of mathematical analysis, knowledge of basic probability mathematics (including its measure-theoretic foundations), and familiarity with examples and applications from elementary probability, preferably including finite Markov chains. Taking up the last point first, there is no use in trying to prescribe exactly what must be known; the idea is
that readers should have some feeling for the importance of the mathematics we will be discussing, as well as some basic intuition about how probability works.
There are innumerable
valid ways to gain such experience, but in my opinion anyone who does not yet have it should postpone reading this book. I can think of no better source to which to turn than William Feller's beautiful text [F1].
The chapters on Markov
chains and on examples of continuous-time stochastic processes (chapters 15 and 17 respectively in the 3rd edition) will be especially helpful in motivating most of part 2 here. It is assumed that the reader has already studied the modern formulation of probability theory:
probability spaces,
random variables as measurable functions, the concept of independence and some properties of sums of independent random variables such as the laws of large numbers (weak and strong) and the central limit theorem.
One source for all this is
chapters 1 through 3 of my previous book [L].
Of course
there are many other places to find this material, and some of them are listed in the bibliographies here and in [L].
One specific comment:
the general (non-discrete) theory of
conditional probabilities and expectations is not treated in [L] but will be needed here.
For this reason the essentials
have been set out in Appendix 2, and any reader who is not familiar with these topics should begin there. The prerequisites in analysis begin, of course, with an adequate knowledge of measure theory.
This must include familiarity with abstract measure spaces and integrals on them, plus such basic results as the extension theorem for constructing $\sigma$-additive measures on Borel fields, the dominated and monotone convergence theorems, the Fubini theorem for product measures, and the Radon-Nikodym theorem. In addition to measure theory, considerable use is made of Hilbert and Banach spaces, both in the abstract and such more specific ones as $L_2$ and spaces of continuous or bounded functions.
The concept of "Hilbert space" is not defined in this book; it is assumed that the reader has studied Hilbert spaces already and is familiar with such things as orthogonal bases and series (generalized Fourier series), subspaces, bounded linear operators and functionals, and projections.
The spectral theorem is not assumed, although a form of it is basic for Chapters 3 and 4; we will derive what we need there in (I hope) a relatively painless way. As for Banach spaces, the concepts of linear functional and of bounded and unbounded linear operators will be used (in part 2) without special explanations. The results of F. Riesz relating linear functionals on spaces of continuous functions to integrals should also be familiar.
The material above is usually included in a course in "real analysis" such as the one taught at Dartmouth College for first-year mathematics graduate students. All that I have mentioned, and much more, can be found in Rudin's Real and Complex Analysis [R]. I might add, however, that it is certainly not necessary to know everything in that book in order to read this one! Also, I personally feel that maintaining strict logical priorities can be a block to learning mathematics; it is not a sin to understand the statement of a theorem and to use it before learning the proof! If this
point of view is accepted, the above list of prerequisites may seem much less formidable. Some of the more particular bits of information needed are listed below, arranged according to the chapter in which they occur:
Chapter 2. In section 3 some facts about the multivariate normal distribution are used without proof or much explanation; adequate background can be found in [F2], Chapter 3, section 6.
Chapter 3. When reading section 1 it will help to have seen the solution of linear, homogeneous finite-difference equations. Section 3 requires the use of "Helly's theorems" about the weak convergence of measures on $R^1$; one reference among many is [L], section 12. Weak convergence turns up elsewhere in the book too.
Chapter 4. Some standard Hilbert-space theory and harmonic analysis are needed throughout the chapter, and in section 2 we need to know that linear functionals on continuous functions are represented by at most one signed measure. For this, see [R], Chapter 6 (the Riesz representation theorem). Then in section 3 we require a theorem of F. and M. Riesz whose proof is sketched later in section 7. That proof (which can be omitted without loss of continuity) involves some complex analysis and a few facts about harmonic functions which go beyond what is needed elsewhere; they are all in [R].
Chapter 7. Basic ideas about Banach spaces are used throughout the chapter, but semigroup theory is developed from scratch. A little advance familiarity with Laplace transforms may help; it's not strictly required. The weak topology is mentioned several times but never really used except at the end of section 6 (which can be omitted). Some knowledge of elementary differential equations will help with certain examples.
Chapters 8 and 9 involve lots of measure theory, but nothing exotic. The last section of Chapter 9 probably can't be appreciated without some prior knowledge of potential theory and harmonic functions (see [R] and/or [K]) but it plays no role in what follows.
And that's about all.
REMARKS ON NOTATION
As a rule, bold-face letters have been used for operators throughout the book; at times they have also been used for the names of particular Hilbert or Banach spaces. The Hardy symbols $O$ and $o$ appear occasionally; the statement that a function $f(h) = o(h)$ as $h \to a$ means that $\lim_{h \to a} f(h)/h = 0$, while the similar statement with $O(h)$ means that $f(h)/h$ is bounded. (The "as $h \to a$" may be omitted if it is clear from the context.) A few other symbols used repeatedly, and their meanings, are listed below:
$(\Omega, \mathcal{F}, P)$ is the standard notation for a probability space, consisting of a set $\Omega$, a $\sigma$-field $\mathcal{F}$ of its subsets and a probability measure $P$. The letter $E$ is always used to denote mathematical expectation of a random variable.
$\mathcal{F}(W)$, where $W$ is either a collection of sets or of random variables, means the smallest $\sigma$-field containing all the sets in $W$, or with respect to which all the random variables are measurable, whichever is appropriate.
$\varphi_A(x)$ is the indicator function of the set $A$; that is, $\varphi_A(x) = 1$ if $x \in A$ and $0$ otherwise.
$\mathbb{C}$ denotes the complex numbers.
$R^n$ is real Euclidean $n$-space; $R^{n*}$ is real $n$-space compactified by adding a single "point at $\infty$."
$\delta_{ij}$ is the Kronecker $\delta$, which is $1$ if $i = j$ and $0$ if $i \ne j$.
$\mathbf{1}$ means the constant function whose value is always $1$.
'iff' is sometimes used in place of 'if and only if.'
TABLE OF CONTENTS
Prerequisites
Notation
Chapter 1. General Introduction
Chapter 2. Second-Order Random Functions
Chapter 3. Stationary Second-Order Processes
Chapter 4. Interpolation and Prediction
Chapter 5. Strictly-Stationary Processes and Ergodic Theory
Chapter 6. Markov Transition Functions
Chapter 7. The Application of Semigroup Theory
Chapter 8. Markov Processes
Chapter 9. Strong Markov Processes
Chapter 10. Martingale Theory
Appendix 1.
Appendix 2.
Bibliography
Index
CHAPTER 1 GENERAL INTRODUCTION
1. Basic definitions.
A stochastic (or random) process is formally defined to be a collection of random variables defined on a common probability space $(\Omega, \mathcal{F}, P)$ and indexed by the elements of a parameter set $T$. The set $T$ will in this book generally be one of these: $R^1$, $R^+ = [0, \infty)$, $Z = \{\ldots, -1, 0, 1, 2, \ldots\}$, or $Z^+ = \{0, 1, 2, \ldots\}$; in all these cases the parameter $t \in T$ may usually be thought of as time. If $T = Z$ or $Z^+$, one sometimes speaks of a random sequence. If $T = R^n$ with $n > 1$ the process is often called a random field. The random variables of the process need not always be real-valued but must have the same range-space $S$; this may be $R^n$ (a vector-valued process) or some other measurable space. In the first part of this book the (common) range-space of the random variables making up the process will almost always be the real or complex numbers; this is not generally true in the latter part, although the reader can privately make that restriction if desired. In any case the range $S$ of the variables is called the state space of the process.
In describing a stochastic process as we have just done there is a certain psychological bias:
one tends to regard the process primarily as a function on $T$ whose values for each $t \in T$ are random variables. Of course we are really dealing with one function of two variables, say $X = X(t, \omega)$, where $t \in T$, $\omega \in \Omega$, and where for each fixed $t$ the function $X(t, \cdot)$ is measurable with respect to $\mathcal{F}$. If instead we fix an $\omega \in \Omega$, we obtain a function $X(\cdot, \omega): T \to R^1$ (or into whatever the state space $S$ may be) which is called a trajectory or a path-function or sample-function of the process. It is also legitimate, and sometimes most appropriate, to think of the process $X$ as a single random variable whose range is a space of functions on $T$; the term random function perhaps suggests this point of view.
Let $t_1, t_2, \ldots, t_n \in T$, and let $C$ be any measurable set in $S^n$. Then the definition
$$P_{t_1, \ldots, t_n}(C) = P\{\omega : (X(t_1, \omega), \ldots, X(t_n, \omega)) \in C\}$$
makes sense since each function $X(t_i, \cdot)$ is $\mathcal{F}$-measurable, and the set function $P_{t_1, \ldots, t_n}(\cdot)$ is a probability measure on the measurable sets of $S^n$. The measures so obtained, for all finite subsets of $T$, are called the finite-dimensional distributions of the process. This family of distributions is often (but not always!) the most important aspect of the process, and one frequently needs to study a random process by starting with its finite-dimensional distributions. The existence of such a process -- that is, one whose finite-dimensional distributions coincide with a given family of measures -- is often assured provided the measures satisfy certain simple consistency conditions. This theorem is due to Kolmogorov and should be familiar; to review it see Appendix 1.
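The finite-dimensional distributions and the consistency condition behind Kolmogorov's theorem can be made concrete on a toy process. The sketch below assumes a simple symmetric random walk with $T = Z^+$ and $X_0 = 0$, an example chosen here for illustration and not taken from the text; the names `fdd` and `marginal` are hypothetical. It computes the joint law of $(X_{t_1}, \ldots, X_{t_n})$ exactly by enumerating paths, and checks that dropping a coordinate from one finite-dimensional distribution gives the distribution for the smaller set of times.

```python
from fractions import Fraction
from itertools import product


def fdd(times, steps=(-1, 1)):
    """Exact joint law of (X_{t_1}, ..., X_{t_n}) for a walk with X_0 = 0.

    Assumption: X_t is the sum of the first t steps, each step drawn
    uniformly from `steps` (a simple symmetric random walk by default).
    """
    horizon = max(times)
    weight = Fraction(1, len(steps) ** horizon)  # every path is equally likely
    law = {}
    for path_steps in product(steps, repeat=horizon):
        positions = [0]
        for s in path_steps:
            positions.append(positions[-1] + s)
        key = tuple(positions[t] for t in times)
        law[key] = law.get(key, 0) + weight
    return law


def marginal(law, keep):
    """Sum out every coordinate whose index is not listed in `keep`."""
    out = {}
    for key, p in law.items():
        sub = tuple(key[i] for i in keep)
        out[sub] = out.get(sub, 0) + p
    return out


joint = fdd((1, 2, 4))
# Consistency: the law at times (1, 2, 4) with the middle coordinate
# summed out must equal the law at times (1, 4).
print(marginal(joint, (0, 2)) == fdd((1, 4)))
```

Exact rational arithmetic is used so that the consistency check is an equality rather than a numerical approximation.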
2.
Remarks on methods. The theory of random processes, although always dealing with the same general sort of mathematical object, has many aspects and uses very diverse methods. For example, suppose that $X$ is real or complex and $E(|X_t|^2) < \infty$ for all $t \in T$. Then the function $K(s,t) = E(X_s \overline{X_t})$ [...]
[...] we have
$$P(A \cap B \mid X_t) = P(A \mid X_t)\,P(B \mid X_t)$$
(a.s.). (1)
That is, given the "present" of the process, the "past" event $A$ and the "future" event $B$ are conditionally independent. The general physical interpretation mentioned above can be paraphrased as asserting that the conditional probability of a "future" event $B$ given the "present" is the same as the probability of $B$ given the present and the past; in analogy with (1) this becomes
$$P(B \mid \mathcal{F}_{\le t}) = P(B \mid X_t)$$
(a.s.) for any $B \in \mathcal{F}_{\ge t}$. (2)
Problem 6.
Prove that (1) and (2) are equivalent.
Note that the definition (1) is symmetric with respect to past and future.
It follows that the process $X'$ defined by $X'_t = X_{-t}$ is also a Markov process whenever $X$ is one. It also follows that another equivalent expression of the Markov property can be obtained by interchanging the roles of past and future in (2). The notion of Markov process has a generalization to cases when
$T$ is an ordered set not consisting of real numbers, but this extension seems not to be of great interest. However, the extension of a sort of Markov property to certain random fields is a subject of active research at this time. We will not deal with Markov random fields in this book; an introduction can be found in [S].
A real or complex process $X$ with $T \subset R^1$ is said to have independent increments if, whenever $t_1 < t_2 < \cdots < t_n \in T$, the variables $(X_{t_2} - X_{t_1}), (X_{t_3} - X_{t_2}), \ldots, (X_{t_n} - X_{t_{n-1}})$ are mutually independent. If $T = Z^+$, such a process consists of the partial sums of a sequence of independent random variables. Especially important is the special case of stationary independent increments which obtains when $X$ satisfies the requirements both for stationary increments and for independent ones. In the example $T = Z^+$ this means that the process is formed from partial sums of independent and identically-distributed random variables. The most famous examples with $T = R^+$ are the Brownian-motion and the
Poisson processes.*)
Problem 7. Prove that if [...] and the process $X$ has independent increments, then $X$ is a Markov process. (Hint: Choose $t > 0$ and define a future set $B$ as follows:
$$B = \{X_{t_1} \in E_1, \ldots, X_{t_n} \in E_n\},$$
where $t < t_1 < t_2 < \cdots < t_n$ and the $E_j$ are Borel sets. First show that [...] for such sets $B$, using the independence of the increments of the process and properties of conditional probability. Then show that the class of sets $B$ for which the property holds must form a $\sigma$-field.)
*) These two examples will be mentioned repeatedly later on. For introductions to them see for example [L], sections 20 and 21, and [F1], chapter 17, section 2, respectively.
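The remark that, for $T = Z^+$, partial sums of independent and identically-distributed variables have stationary independent increments can be checked exactly on a small case. The following is a minimal sketch under assumptions made here: the steps take the values $\pm 1$ with probability $1/2$ each, and the names `STEP_LAW`, `increment_law` and `is_product_measure` are hypothetical. It enumerates all paths, builds the joint law of the increments, and tests whether that law is the product of its marginals.

```python
from fractions import Fraction
from itertools import product

# Hypothetical step distribution: xi = -1 or +1, each with probability 1/2.
STEP_LAW = {-1: Fraction(1, 2), 1: Fraction(1, 2)}


def increment_law(times):
    """Exact joint law of (X_{t2}-X_{t1}, ..., X_{tn}-X_{t(n-1)}),
    where X_t = xi_1 + ... + xi_t and `times` is increasing."""
    horizon = times[-1]
    law = {}
    for steps in product(STEP_LAW, repeat=horizon):
        prob = Fraction(1)
        for s in steps:
            prob *= STEP_LAW[s]
        sums = [0]
        for s in steps:
            sums.append(sums[-1] + s)
        incs = tuple(sums[b] - sums[a] for a, b in zip(times, times[1:]))
        law[incs] = law.get(incs, 0) + prob
    return law


def is_product_measure(law):
    """True iff the joint law equals the product of its one-dimensional marginals."""
    n = len(next(iter(law)))
    margins = []
    for i in range(n):
        m = {}
        for key, p in law.items():
            m[key[i]] = m.get(key[i], 0) + p
        margins.append(m)
    for values in product(*margins):
        expected = Fraction(1)
        for i, v in enumerate(values):
            expected *= margins[i][v]
        if law.get(values, 0) != expected:
            return False
    return True


# Independent increments over (0, 2] and (2, 5]:
print(is_product_measure(increment_law((0, 2, 5))))
# Stationary increments: the law over (0, 2] equals the law over (3, 5].
print(increment_law((0, 2)) == increment_law((3, 5)))
```

The same two functions can be pointed at any finite step distribution by changing `STEP_LAW`.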
CHAPTER 2 SECOND-ORDER RANDOM FUNCTIONS
In this chapter we will consider some general properties of stochastic processes with finite second moments. As already noted, the variables of such a process $\{X_t\}$ can be thought of as elements in the Hilbert space $H = L_2(\Omega, \mathcal{F}, P)$. It will be convenient to sometimes allow complex values for the process, and so we will understand that $H$ may mean either the real or the complex $L_2$ space. The norm and inner product will be denoted
$$\|X\| = E(|X|^2)^{1/2}, \qquad (X, Y) = E(X\overline{Y})$$
respectively, and this notation will be used when the Hilbert-space aspects of the process, rather than more probabilistic ones, are in the foreground. The covariance function of $\{X_t\}$ can be written in two ways:
$$K(s,t) = E(X_s \overline{X_t}) = (X_s, X_t).$$
We will be looking mainly at those aspects of $\{X_t\}$ which make sense in Hilbert-space terms and we always assume that $T$ is a subset of $R^1$ (although certain things can be generalized to other cases).
1.
Differential calculus. The word "limit" will now always mean convergence in the $L_2$ norm; for example, if $T = R^1$, $Y \in L_2$ and [...] means that [...] $\to 0$. It is worth noting in passing, however, that rapid convergence in the norm does imply almost sure convergence:
Proposition. Suppose $T = Z^+$ [...] such a way that $\sum_{n=1}^{\infty} E(|X_n - Y|^2)$ [...]
Proof. The functions [...]
D.
However, our proof of (1) used the definiteness property only in the case when [...]. Thus the apparently weaker condition [...] for all real $\lambda$ is actually sufficient for (1), while it is easy to see that (1) in turn implies the positivity inequality in its original form (with arbitrary $z_1, \ldots, z_N$).
The Bochner-Khintchine theorem, dealing with the continuous case, is often interpreted in probability theory as the condition for a function $\varphi(t)$ to be the characteristic function of some probability distribution. We will not prove it here; see, for instance, [F2], chapter 19, section 2.
We now derive the spectral form of the stationary process itself, which is based on the "stochastic integral" discussed in Chapter 2. We will use the notation of the continuous case $T = R^1$; changes for the discrete case are trivial and obvious. The idea is to set up an isometry between the Hilbert space $M$ (a subspace of $L_2(\Omega, \mathcal{F}, P)$) generated by the random variables $\{X_t\}$ and the space $L_2(R^1, dF)$, where $dF$ is the spectral measure.
We begin with the set $M_1$ of all finite linear combinations of the $X_t$'s and with the set of trigonometric polynomials, and we define a correspondence between them by letting
$$\psi\Bigl(\sum_{j=1}^{n} c_j X_{t_j}\Bigr) = \sum_{j=1}^{n} c_j e^{i t_j \lambda}$$
for any $t_j \in T$, $c_j \in \mathbb{C}$. It is easily verified using (2) that $\psi$ is an isometry: [...]
Then in the usual way $\psi$ extends to an isometry between the closure of the set of trigonometric polynomials --- which is $L_2(R^1, dF)$*) --- and the closure $M$ of $M_1$. We have $\psi X_t = e^{it\lambda}$, and the unitary operator $U_t$ on $M$ characterized by [...] goes over into multiplication by the function $e^{it\lambda}$.
*) This point may not seem quite so obvious as before, and the reader is invited to think it over a little.
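The isometry property of $\psi$ on finite linear combinations can be written out in one chain of equalities. The sketch below rests on two assumptions that are not confirmed by the surviving text: that the equation referred to as (2) is the spectral representation of the covariance, $E(X_s \overline{X_t}) = R(s - t) = \int e^{i(s-t)\lambda}\,dF(\lambda)$, and that $R$ is an acceptable placeholder name for the covariance function.

```latex
\begin{aligned}
\Bigl\| \sum_{j=1}^{n} c_j X_{t_j} \Bigr\|^2
  &= \sum_{j=1}^{n}\sum_{k=1}^{n} c_j \overline{c_k}\,
     E\bigl(X_{t_j}\overline{X_{t_k}}\bigr)
   = \sum_{j=1}^{n}\sum_{k=1}^{n} c_j \overline{c_k}\, R(t_j - t_k) \\
  &= \sum_{j=1}^{n}\sum_{k=1}^{n} c_j \overline{c_k}
     \int_{-\infty}^{\infty} e^{i(t_j - t_k)\lambda}\, dF(\lambda)
   = \int_{-\infty}^{\infty}
     \Bigl| \sum_{j=1}^{n} c_j e^{i t_j \lambda} \Bigr|^2 dF(\lambda).
\end{aligned}
```

The left side is the squared norm in $L_2(\Omega, \mathcal{F}, P)$ and the right side is the squared norm of the corresponding trigonometric polynomial in $L_2(R^1, dF)$. In particular, two representations of the same random variable map to trigonometric polynomials that agree $dF$-almost everywhere, so under these assumptions the correspondence is well defined.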
The function [...] is in $L_2(R^1, dF)$ and so it must correspond to something in $M$, let's say to $Z_a$. It is easy to see that the random variables $\{Z_a\}$ [...] with orthogonal increments; in fact, if $a < b$ [...]
[...] $(\xi_n, X_{n-k}) = 0$ [...]. But
$$(\xi_n, X_{n-k}) = \int_{-\pi}^{\pi} e^{in\lambda}\,\psi(\lambda)\,e^{-i(n-k)\lambda} f(\lambda)\,d\lambda,$$
so that [...] is orthogonal to $e^{-ik\lambda}$ for all $k > 0$. Therefore we may say that $f_2 \in L_{\ge 0}$, which means the span in $L_2$ (Lebesgue) of $\{e^{in\lambda} : n \ge 0\}$; in other words, $f_2$ is of "power series type." Finally, we must have $\xi_n \in M_n$; $\xi_0 \in M_0$ is enough. But [...] It is clear that this will belong to $M_0$ if the Fourier series of $f_1^{-1}$ contains only terms with non-positive frequencies,*) since then [...] can be expressed as a limit of linear combinations of $X_n$'s with $n \le 0$. Consequently, the series for $f_2^{-1}$ will involve only non-negative frequencies. To sum up, we have proved:
Theorem. Suppose the spectral density $f$ has a factorization [...] in which $f_2$ and $f_2^{-1}$ belong to $L_{\ge 0}$ (i.e., have Fourier series of power-series type). Then the random variables [...] (3) are the orthonormal set appearing in the Wold decomposition.
*) This statement requires some caution, since $f_1^{-1}$ may not be square-integrable. We will just assume that $f_1^{-1} \in L_2$ (i.e., $f^{-1} \in L_1$); in particular, this is true if $f$ is continuous [...]
Corollary.
~
Suppose that
rzn
fl(~)
-ik~
f 1 E L)
A) then
[-w,w)
$\{X_t\}$ must be stationary. (Compare this with problem 4 of Chapter 1.)
(b) If $\{X_t\}$ and $\{Y_t\}$ are stationary processes in $R^n$ which are independent, then $\{X_t + Y_t\}$ is stationary. (Proof?) In particular, the random oscillations of example (a) can be added if they are independent to form a larger class of stationary processes.
(c) An independent sequence $\{\xi_n\}$ is a stationary process (with $T = Z$) iff the random variables are identically distributed.
(d) Let $g: R^{N+1} \to R^1$ be a Borel function and define [...] where $\xi_n$ are independent and identically distributed (iid). Then $\{X_n\}$ is stationary. If $g$ is a linear function $\{X_n\}$ will be a moving average of the type we have considered before, but $g$ need not necessarily be linear. (It is possible also for $g$ to involve infinitely many variables.)
(e) Suppose that $P = [p_{ij}]$ is a stochastic matrix; that is, an $N \times N$ matrix of non-negative numbers with row-sums $= 1$. Also, suppose that $\pi$ is a row-vector of probabilities such that $\pi P = \pi$. Then there exists a stationary Markov chain with $P$ as its transition matrix and $\pi$ as
its stationary distribution.*)
To construct such a process, let $S$ be any set of cardinality $N$ (which may be countably infinite) and think of the indices $i, j$ as ranging over $S$. Now set $\Omega = S^Z$; $S$ will be the state space of the process under construction. Sets such as
$$C = \{\omega \in \Omega : \omega_a = s_a, \omega_{a+1} = s_{a+1}, \ldots, \omega_b = s_b\},$$
where $s_t \in S$ and $a, b \in Z$, are called cylinder sets; let $\mathcal{F}$ be the $\sigma$-field which they generate. Finally, we define a measure $P$ on cylinder sets by putting
$$P(C) = \pi_{s_a}\, p_{s_a s_{a+1}} \cdots p_{s_{b-1} s_b}.$$
This prescription determines a family of finite-dimensional distributions. It is not hard to see that they are consistent, and so by Kolmogorov's theorem $P$ extends to a measure on $\mathcal{F}$. Since $P(C)$ is independent of $a$, the joint distributions of the random variables $X_n(\omega) = \omega_n$ are invariant under time-translation; hence $\{X_n\}$ is indeed a stationary process. It's also true that $\{X_n\}$ is a Markov process as defined in Chapter 1, but we will not prove it now since this matter is discussed in more generality in Chapter 8.
A method of obtaining many more examples will be described in the next section.
the famous strong law of large numbers (slln) holds pro-
vided
E(I~
n
I)
O} n
E fI, n
o.
~
(2)
is a stationary process.
If $\psi$ is invertible, (2) can be extended to negative values of $n$ as well. As we will see later, it is "almost true" that every stationary sequence is of this form.
Clearly $X_n \in L_1$ (or $L_2$) for all $n$ iff $X \in L_1$ ($L_2$). The mp mapping $\psi$ induces a norm-preserving operator $U$ on $L_2$ which is defined by
\[ (UX)(\omega) = X(\psi\omega) \]
when $X \in L_2$; if $\psi$ is invertible, then $U$ is unitary. (Of course, this $U$ agrees with the unitary operator associated with the stationary process $\{X_n\}$.) Although every invertible mp mapping induces a unitary operator on $L_2$ in this way, most unitary operators are not of this form. (Consider the case when $\Omega$ consists of 2 points, so that $L_2$ is 2-dimensional!) This illustrates the difference between wide-sense and strict stationarity.
Here are some examples of mp mappings. Translation of $R^1$ or $R^n$ is measure-preserving; this fact is a basic property of Lebesgue measure. Of course these are not probability spaces. But one can also translate modulo 1 (that is, on the circumference of a circle), and it is easy to check that the mapping $\psi_a: [0,1) \to [0,1)$ defined by
\[ \psi_a(x) = x + a \pmod 1 \]
is indeed measure preserving and invertible for every real $a$. This idea extends to compact topological groups in general, on which one can construct probability measures (the Haar measures) invariant under translation by the group elements.
The transformation $\psi x = 2x \pmod 1$ on $[0,1)$ is measure-preserving but not invertible; it is two-to-one everywhere, in fact.
Looking back to the Markov chain example (e) above, we can define $\psi: \Omega \to \Omega$ by
\[ (\psi\omega)_n = \omega_{n+1}, \quad n \in Z; \]
in other words, $\omega$ -- which is a doubly-infinite sequence of elements of $S$ -- is simply translated one unit to the left. This mapping is called a shift and it is clearly measure-preserving. Moreover, the Markov chain random variables were defined by (2) above using the shift: we have $X_0(\omega) = \omega_0$ and $X_n(\omega) = X_0(\psi^n\omega)$. The shift is invertible provided the process is defined for all $n \in Z$, but even if $T = Z^+$ the shift makes sense; it
:'mp:;.'~)no.::ng:r.ol~-tO!Jn~. th:O:.::::P:::,:'..:u:t:O::): sequence of independent-coin-tossing random variables.
When $T = Z^+$, the shift for this process can be identified with the mapping $x \to 2x \pmod 1$ on $[0,1)$. (How?)
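The two claims about the doubling map can be illustrated with a small exact computation. This is only a sketch: the helper names `doubling_map` and `binary_digits`, the test interval, and the grid size are all hypothetical choices, and `fractions.Fraction` is used so that no rounding occurs. It counts grid points whose image falls in an interval (to compare the measure of a set and of its preimage) and checks that applying the map drops the first binary digit of $x$, which is one way of seeing the link with the coin-tossing shift.

```python
from fractions import Fraction

def doubling_map(x):
    """The map x -> 2x (mod 1) on [0, 1)."""
    return (2 * x) % 1

def binary_digits(x, n):
    """First n binary digits of a number x in [0, 1)."""
    digits = []
    for _ in range(n):
        x *= 2
        d = int(x)          # integer part is 0 or 1
        digits.append(d)
        x -= d
    return digits

# Hypothetical test interval [a, b) and a uniform grid of N points.
a, b = Fraction(1, 5), Fraction(7, 10)
N = 1000
grid = [Fraction(k, N) for k in range(N)]

# Number of grid points in [a, b), and number whose image lies in [a, b).
in_interval = sum(1 for x in grid if a <= x < b)
in_preimage = sum(1 for x in grid if a <= doubling_map(x) < b)

# Applying the map shifts the binary expansion one place to the left.
x = Fraction(5, 13)
shift_ok = binary_digits(doubling_map(x), 8) == binary_digits(x, 9)[1:]
```

The two counts agree because the preimage of $[a,b)$ consists of two intervals, each of half the length.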
As a final example, we mention the context in which the study of mp transformations began. Suppose that a
90
S.
STRICTLY-STATIONARY PROCESSES
classical mechanical system is confined to a bounded region $V$ of its phase space (position and momentum space). Its time-evolution can be described by Hamilton's system of first-order differential equations, whose solutions provide a family of mappings $\psi_t: V \to V$ ($t \in R^1$). The mapping $\psi_t$ has the following interpretation: If the system is in a state $x_0 \in V$ at time $t_0$, then its state at time $t_0 + t$ will be $\psi_t x_0$. These mappings satisfy the group condition
\[ \psi_{t+s} = \psi_t \psi_s \]
for all $t, s \in R^1$; moreover, they preserve the Lebesgue measure of subsets of $V$. (The latter fact is known as Liouville's theorem, and it partially explains why position-momentum coordinates are more useful in mechanics than position-velocity ones, for which the mappings corresponding to $\psi_t$ are no longer measure-preserving.)
A mechanical system can thus be considered as a stationary stochastic process. Of course, its "randomness" lies only in the choice of an initial point in the phase space $V$; after that is done the path of the process is determined. But the same thing is true for any process written in the form (2), including, for example, a sequence of independent random variables! There is less here than meets the eye at first glance.
We will now prove the "recurrence theorem" of H. Poincaré, which (in 1912) was the first general result about measure-preserving transformations. In the context of mechanics as described above, this theorem implies that a system
2.
Measure-preserving transformations
91
returns again and again arbitrarily close to its initial state. This apparently rules out irreversible changes of state such as the assertion in thermodynamics that the entropy of a system can only increase, and much has been written in attempts to resolve this seeming contradiction.
Theorem. Let $\psi$ be a mp transformation on a probability space $(\Omega, \mathcal{F}, P)$. Then for any $A \in \mathcal{F}$, $\psi^n\omega \in A$ holds for infinitely many values of $n$ for almost every $\omega \in A$.
Proof. Let $B = \{\omega \in A: \psi^k\omega \notin A \text{ for all } k \ge 1\}$. Clearly $B \cap \psi^{-n}B = \emptyset$ for all $n \ge 1$. Then
\[ \psi^{-m}B \cap \psi^{-(m+n)}B = \psi^{-m}(B \cap \psi^{-n}B) = \emptyset \]
also, so that the sets $\{\psi^{-n}B\}$ are disjoint for $n \ge 0$. These sets all have the same measure because $\psi$ is mp. But then since $P(\Omega) = 1$ we must have $P(B) = 0$. Thus for almost every $\omega \in A$, $\psi^n\omega \in A$ for at least one $n \ge 1$.
To see that $\psi^n\omega \in A$ infinitely often, we shall apply the result already proved to the mp transformations $\psi^k$. When $\omega \in A - N$, where $N$ is a set of measure 0 equal to the union of the exceptional sets for each $k \ge 1$, we see that for every $k$ there exists $n_k \ge 1$ such that $(\psi^k)^{n_k}\omega \in A$. Thus $\psi^n\omega \in A$ for arbitrarily large $n$, as asserted.
Corollary 1. Let $f \ge 0$. Then
\[ \sum_{n=0}^{\infty} f(\psi^n\omega) = \infty \]
almost everywhere on $\{\omega: f(\omega) > 0\}$.
Proof. Apply the recurrence theorem to the set $A_n = \{\omega: f(\omega) > 1/n\}$, and then let $n \to \infty$.
Corollary 2. With probability 1, a stationary Markov chain returns infinitely often to the state it occupies at time 0.
Proof. Apply the theorem to the sets $\{\omega: X_0(\omega) = i\}$ for each $i \in S$.
3.
The ergodic theorem.
Ergodic theory has been called "a theorem looking for a theory." This is doubtless no longer fair, but for some time the subject consisted mostly of applications and refinements of George Birkhoff's celebrated ergodic theorem, first proved by him in 1931. The proof given here is much simpler than the original one. Many authors have contributed improvements to the proof; here we are using a lemma published by A. Garsia in 1965. Our version is adapted from [T].
Ergodic
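Theorem. Let $\psi$ be a mp transformation on a probability space $(\Omega, \mathcal{F}, P)$, and let $f \in L_1$. Then
\[ \lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} f(\psi^k\omega) = \hat f(\omega) \tag{1} \]
exists almost everywhere. Moreover, $\hat f \in L_1$ and
\[ \int_A \hat f\,dP = \int_A f\,dP \]
for every $A \in \mathcal{F}_I$, where $\mathcal{F}_I = \{C \in \mathcal{F}: \psi^{-1}C = C\}$.
Lemma 1. Define $S_0 = 0$ and $S_n = \sum_{k=0}^{n-1} f(\psi^k\omega)$, and let $B = \{\omega: S_n(\omega) > 0 \text{ for some } n\}$. Then
\[ \int_{A \cap B} f\,dP \ge 0 \]
for every $A \in \mathcal{F}_I$.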
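Proof. Let $S_n^*(\omega) = \max_{0 \le k \le n} S_k(\omega)$ and $B_n = \{\omega: S_n^*(\omega) > 0\}$. For $1 \le k \le n$ we have $S_k(\omega) = f(\omega) + S_{k-1}(\psi\omega) \le f(\omega) + S_n^*(\psi\omega)$, so that $f(\omega) \ge S_n^*(\omega) - S_n^*(\psi\omega)$ when $\omega \in B_n$. Hence, since $S_n^*(\psi\omega) \ge 0$, we have
\[ \int_{A \cap B_n} f\,dP \ge \int_{A \cap B_n} S_n^*(\omega)\,dP - \int_A S_n^*(\psi\omega)\,dP. \]
But because $S_n^*(\omega) = 0$ if $\omega \notin B_n$,
\[ \int_{A \cap B_n} f\,dP \ge \int_A S_n^*(\omega)\,dP - \int_A S_n^*(\psi\omega)\,dP = 0. \]
(Note that for any integrable function $g$ and any invariant set $A \in \mathcal{F}_I$, $\int_A g(\psi\omega)\,dP = \int_A g\,dP$ since $\psi$ is measure-preserving.) Hence $\int_{A \cap B_n} f\,dP \ge 0$, and since $B_n \uparrow B$ as $n \to \infty$, it follows that $\int_{A \cap B} f\,dP \ge 0$ also.
Problem 1. Give a complete proof of the assertion in parentheses just above which is used in the proof of the lemma.
Lemma 2.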
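Let
\[ B_\alpha = \Big\{\omega: \sup_{n \ge 1} \frac{1}{n} \sum_{k=0}^{n-1} f(\psi^k\omega) > \alpha\Big\}. \]
Then
\[ \int_{B_\alpha \cap A} f\,dP \ge \alpha P(B_\alpha \cap A) \quad \text{for } A \in \mathcal{F}_I. \]
Proof. Apply Lemma 1 to the function $g(\omega) = f(\omega) - \alpha$.
Proof of the Ergodic Theorem. Let us put
\[ \bar f(\omega) = \limsup_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} f(\psi^k\omega), \]
and let $\underline f(\omega)$ be the lim inf. For any real numbers $\alpha, \beta$ let
\[ E_{\alpha,\beta} = \{\omega: \underline f(\omega) < \beta,\ \bar f(\omega) > \alpha\}, \]
and note both that $E_{\alpha,\beta} \in \mathcal{F}_I$ and that $E_{\alpha,\beta} \subset B_\alpha$. Then by Lemma 2,
\[ \int_{E_{\alpha,\beta}} f\,dP \ge \alpha P(E_{\alpha,\beta}). \tag{2} \]
Replacing $f, \alpha, \beta$ by $-f, -\beta, -\alpha$ respectively and applying Lemma 2 again, we have the same set $E_{\alpha,\beta}$ as before and so (2) becomes
\[ \int_{E_{\alpha,\beta}} f\,dP \le \beta P(E_{\alpha,\beta}). \tag{3} \]
Comparing (2) and (3), we see that $P(E_{\alpha,\beta}) = 0$ if $\beta < \alpha$. But since
\[ \{\omega: \underline f(\omega) < \bar f(\omega)\} = \bigcup_{r < s;\ r, s \text{ rational}} E_{s,r}, \]
this implies that $P(\underline f < \bar f) = 0$; that is, the limit (1) exists almost everywhere.
0, be two random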
sequences, not necessarily defined on the same probability space, which have the same finite-dimensional joint distributions. Suppose that
\[ P\big(\lim_{n\to\infty} X'_n = X'\big) = 1, \]
where $X'$ is a random variable defined on the same space as $\{X'_n\}$. Prove that
\[ P\big(\lim_{n\to\infty} X_n = X\big) = 1, \]
where $X$ is a random variable defined on the same space as $\{X_n\}$ and having the same distribution as $X'$.
Clearly the same conclusion holds for Cesàro convergence, and so Theorem 1 -- the strong law of large numbers for all stationary sequences -- follows from the ergodic theorem.
6.
Continuous time-parameter.
In this final section, we shall show how Theorem 2 (section 1) can be deduced from what we have already proved. First, we note that if $m = E(|X_t|)$ (we assumed that it is finite) then
\[ E\Big(\int_0^T |X_t|\,dt\Big) = \int_0^T E(|X_t|)\,dt = mT \]
by Fubini's theorem, since $\{X_t\}$ is measurable. It follows that for every $T$, $\int_0^T |X_t|\,dt$ exists almost surely, as does $\int_0^T X_t\,dt$. Now let us define two random sequences
\[ Y_n = \int_n^{n+1} X_t\,dt, \qquad Z_n = \int_n^{n+1} |X_t|\,dt. \]
It is plausible that both $\{Y_n\}$ and $\{Z_n\}$ are stationary, and clearly both are integrable since
\[ E(|Y_n|) \le E(Z_n) = \int_n^{n+1} E(|X_t|)\,dt = m < \infty. \]
The question of stationarity, however, is not as obvious as it at first appears, and we will return to this point below. Accepting the stationarity for the moment, by the strong law already proved for sequences we have
\[ \frac{1}{n} \sum_{k=0}^{n-1} Y_k = \frac{1}{n} \int_0^n X_t\,dt \to \bar X \quad \text{a.s.}, \tag{1} \]
where $\bar X = E(Y_0 \mid \mathcal{F}_I)$. To prove Theorem 2, we must extend (1) from integral to arbitrary values of $n$. It is just for this small point that the process $\{Z_n\}$ was introduced. In fact, the ergodic theorem applies to $\{Z_n\}$ too and yields
\[ \frac{1}{n} \sum_{k=0}^{n-1} Z_k = \frac{1}{n} \int_0^n |X_t|\,dt \to \bar Z \quad \text{a.s.} \tag{2} \]
If a sequence $a_k$ converges in the Cesàro sense to a finite limit it follows that $a_n/n \to 0$, and so (2) implies that
\[ \frac{1}{n} \int_n^{n+1} |X_t|\,dt \to 0 \quad \text{a.s.} \tag{3} \]
But we can write ($[t]$ means the greatest integer $\le t$)
\[ \frac{1}{t} \int_0^t X_s\,ds = \frac{[t]}{t} \cdot \frac{1}{[t]} \int_0^{[t]} X_s\,ds + \frac{1}{t} \int_{[t]}^t X_s\,ds, \]
and then applying (1) and (3) we see that the first term on the right side converges a.s. to $\bar X$, while the second term tends to 0 a.s. Thus
\[ \frac{1}{t} \int_0^t X_s\,ds \to \bar X \quad \text{a.s.} \]
where $\bar X \in L_1$ and $E(\bar X) = E(Y_0) = E(X_0)$ (the second step involves Fubini's theorem again), and the slln is proved.
It remains to show that $\{Y_n\}$ is stationary. (The result for $\{Z_n\}$ will be included, since $\{|X_t|\}$ is stationary provided $\{X_t\}$ is.) This would be quite easy if the integrals defining the $Y_n$'s were of the Riemann type, for then each $Y_n$ could be approximated by a sum involving finitely many of the $X_t$'s. But what in general, when we assume only measurability? It is not at a glance obvious that the distribution of, for example, $Y_0 = \int_0^1 X_s\,ds$ is even determined by the finite-dimensional distributions of $\{X_t\}$. Nevertheless, it is true:
Lemma.
Let
dom process such
rUt: a that
~
t
~
be a measurable real ran-
b}
f~ E(lutl)dt
distribution of the random variable
0
for
pn = 11'
lim
11'
finite stochastic matrix hav-
Then
n ....
exists, where
~
is
a
(1)
00
stochastic matrix with identical rows.
Assume first that $m = 1$, so that $p_{ij} \ge \varepsilon > 0$ for each $i, j$. Let $m_j(n) = \min_i p_{ij}^{(n)}$ denote the smallest element in the $j$'th column of $P^n$, and similarly let $M_j(n)$ be the largest element. Then
1.
Markov chains:
discrete time and states
109
\[ p_{ij}^{(n)} = \sum_k p_{ik}\, p_{kj}^{(n-1)} \ge \sum_k p_{ik}\, m_j(n-1) = m_j(n-1). \tag{2} \]
Since (2) holds for each $i$ it holds for the minimum, and so
\[ m_j(n) \ge m_j(n-1); \tag{3} \]
i.e., the column-minima increase with $n$. In just the same way we can see that the maxima decrease; both $m_j(n)$ and $M_j(n)$, therefore, have limits as $n \to \infty$. The theorem will be proved if we can show that these limits are the same.
To see this we estimate a little more carefully and we use (for the first time) the assumption that $p_{ij} \ge \varepsilon$. Suppose that the minimum $m_j(n)$ and maximum $M_j(n-1)$ are attained when $i = i_0$ and $i = i_1$, respectively. Then
\[ m_j(n) = p_{i_0 j}^{(n)} = p_{i_0 i_1} M_j(n-1) + \sum_{k \ne i_1} p_{i_0 k}\, p_{kj}^{(n-1)} \ge p_{i_0 i_1} M_j(n-1) + (1 - p_{i_0 i_1})\, m_j(n-1), \]
so that, since $p_{i_0 i_1} \ge \varepsilon$ and $M_j(n-1) \ge m_j(n-1)$,
\[ m_j(n) \ge \varepsilon M_j(n-1) + (1-\varepsilon)\, m_j(n-1). \tag{4} \]
In just the same way (do it!) it can be shown that
\[ M_j(n) \le \varepsilon\, m_j(n-1) + (1-\varepsilon) M_j(n-1). \tag{5} \]
Subtracting (4) from (5) then yields
\[ M_j(n) - m_j(n) \le (1-2\varepsilon)\,[M_j(n-1) - m_j(n-1)], \]
so that
\[ M_j(n) - m_j(n) \le (1-2\varepsilon)^{n-1} \to 0 \quad \text{as } n \to \infty, \tag{6} \]
which proves the theorem when $m = 1$.
Now suppose that $m > 1$. We know in any case that
\[ \lim_{n\to\infty} P^{nm} = \Pi \tag{7} \]
since the stochastic matrix $P^m$ falls into the case already treated. For any $k = 1, 2, \ldots, m-1$ we then have
\[ \lim_{n\to\infty} P^{nm+k} = P^k \Pi. \tag{8} \]
Since $P^k$ has row-sums equal to 1 and $\Pi$ has constant columns, $P^k\Pi$ is simply $\Pi$; (7) and (8) therefore imply that (1) holds in this case also and the proof is complete.
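The contraction estimate in this proof is easy to watch numerically. The following is a minimal sketch, not part of the original argument: the 3-state matrix `P` is a hypothetical example with all entries at least $\varepsilon = 0.1$ (the case $m = 1$), and the helper names `mat_mul` and `column_spread` are invented for illustration. It tracks the largest column spread $M_j(n) - m_j(n)$ of $P^n$ and checks that each step shrinks it by at least the factor $1 - 2\varepsilon$.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def column_spread(M):
    """Largest value of M_j - m_j over the columns j of M."""
    return max(max(col) - min(col) for col in zip(*M))

# Hypothetical 3-state chain with every entry >= eps = 0.1 (the case m = 1).
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]
eps = min(min(row) for row in P)

Pn = P
spreads = [column_spread(Pn)]          # spreads[n-1] belongs to P^n
for _ in range(30):
    Pn = mat_mul(P, Pn)                # P^n = P * P^(n-1)
    spreads.append(column_spread(Pn))

# Each step shrinks the spread by at least the factor (1 - 2*eps);
# the small additive tolerance only absorbs floating-point noise.
contraction_ok = all(
    spreads[k + 1] <= (1 - 2 * eps) * spreads[k] + 1e-12
    for k in range(len(spreads) - 1)
)

pi = Pn[0]                             # any row of a high power approximates pi
pi_P = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]
```

After the loop the rows of `Pn` agree to within rounding, and `pi_P` reproduces `pi`, as the limiting rows should.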
Corollary. The (identical) rows $\pi = \{\pi_i\}$ of the limiting matrix $\Pi$ satisfy
\[ \sum_i \pi_i\, p_{ij} = \pi_j, \qquad \pi_i > 0, \quad \text{and} \quad \sum_i \pi_i = 1. \]
There is no vector other than $\pi$ which satisfies these conditions.
Proof. Everything is clear except (possibly) the uniqueness. Suppose $v$ is another vector which satisfies the conditions. Then
\[ v = vP = vP^n = \lim_{n\to\infty} vP^n = v\Pi. \]
But since $\sum_i v_i = 1$ and $\Pi$ has constant columns, $v\Pi = \pi$, which proves uniqueness.
The hypothesis of the theorem that, for some $m$, $p_{ij}^{(m)} > 0$ for all $i, j$ is easily seen to be necessary as well as sufficient for (1) when it is assumed that $\Pi$ is positive in addition to having identical rows. Still, this hypothesis is probabilistically quite unnatural and it is
useful to rephrase it.
Definition. If for every pair of states $s_i$, $s_j$ we have $p_{ij}^{(n)} > 0$ for some $n$ (which may depend on $i$ and $j$) we say the matrix $P$ is irreducible or that all states communicate.
Definition. The integer $d_i = \text{g.c.d.}\{n \ge 1: p_{ii}^{(n)} > 0\}$ is called the period of state $s_i$. If $d_i = 1$ the state is said to be aperiodic.
Proposition. A stochastic matrix $P$ satisfies the positivity condition of the theorem if and only if $P$ is irreducible and has at least one aperiodic state.
Proof. Clearly the positivity condition implies that all large powers of $P$ are strictly positive, and so all states communicate (in $m$ steps, in fact) and are aperiodic. Conversely, suppose that $s_i$ is aperiodic for some $i$. Since we have
\[ p_{ii}^{(n+m)} = \sum_k p_{ik}^{(n)}\, p_{ki}^{(m)} \ge p_{ii}^{(n)}\, p_{ii}^{(m)}, \]
the set $D = \{n: p_{ii}^{(n)} > 0\}$ is closed under addition. This, with the fact that its gcd $= 1$, implies that $D$ contains all large positive integers, say all $n \ge M$. Now consider any other state $s_j$ and choose $k$ and $\ell$ so that $p_{ij}^{(k)} > 0$ and $p_{ji}^{(\ell)} > 0$. Then for all $n \ge M$ we have
\[ p_{jj}^{(n+k+\ell)} \ge p_{ji}^{(\ell)}\, p_{ii}^{(n)}\, p_{ij}^{(k)} > 0, \]
which means that $p_{jj}^{(m)} > 0$ for all large $m$ so that $s_j$ is also aperiodic. Similarly we have
\[ p_{ij}^{(n+k)} \ge p_{ii}^{(n)}\, p_{ij}^{(k)} > 0 \]
when $n \ge M$, and so the off-diagonal entries of $P^n$ are likewise positive when $n$ is large enough. Since $P$ is a finite matrix, it is clear that for big values of $n$, $P^n$ is strictly positive.
If the hypothesis of irreducibility is kept but some state has period $d > 1$, the theorem remains true when Cesàro convergence is substituted for the limit; the statements about the stationary vector $\pi$ still hold.
Finally, when $P$ is reducible, the states of the system can be divided into equivalence classes of mutually communicating states within which $P$ acts as described above. There may also be some transient states which belong to none of the equivalence classes. Their probabilistic role is temporary; after a finite time (with probability one) the system must be found in one of the irreducible classes. For proofs and further developments refer to [F1] or the other references.
Finally we mention that if the states form a denumerably-infinite set much of this theory continues to hold, though different methods of proof are required to attain adequate generality. (The hypothesis that $P^m$ be strictly positive for some $m$ is now much too restrictive, and is no longer even sufficient for the theorem to hold.) In addition some quite different, and very interesting, phenomena arise including a far-reaching connection with potential theory. In addition to [F1], chapter 1 of [DY] is highly recommended for an introduction to some of these problems.
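For a finite matrix the notions of irreducibility, period and strict positivity of some power can all be checked mechanically from the pattern of positive entries. The sketch below is illustrative only: the function names and the two 2-state matrices are hypothetical, the period scan up to $3N$ steps is justified only for irreducible matrices, and the search for a positive power stops at Wielandt's bound $(N-1)^2 + 1$ for primitive matrices, a standard fact that is not taken from the text.

```python
from math import gcd

def pattern(P):
    """Boolean matrix marking the strictly positive entries of P."""
    return [[p > 0 for p in row] for row in P]

def bool_mul(A, B):
    """Boolean matrix product: which entries of a product are positive."""
    n = len(A)
    return [[any(A[i][k] and B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_irreducible(P):
    """True if every state reaches every state in some number of steps."""
    n = len(P)
    reach = pattern(P)
    for k in range(n):                       # Warshall's transitive closure
        for i in range(n):
            for j in range(n):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    return all(all(row) for row in reach)

def period(P, i):
    """g.c.d. of {n >= 1 : p_ii^(n) > 0}, scanning n = 1, ..., 3N.

    The cutoff 3N is enough when P is irreducible (assumed here).
    """
    A = pattern(P)
    An, d = A, 0
    for n in range(1, 3 * len(P) + 1):
        if An[i][i]:
            d = gcd(d, n)
        An = bool_mul(An, A)
    return d

def first_positive_power(P):
    """Smallest m with P^m > 0 entrywise, or None if none is found."""
    A = pattern(P)
    An, n = A, len(P)
    for m in range(1, (n - 1) ** 2 + 2):     # Wielandt's bound (N-1)^2 + 1
        if all(all(row) for row in An):
            return m
        An = bool_mul(An, A)
    return None

flip = [[0.0, 1.0], [1.0, 0.0]]    # irreducible, every state has period 2
lazy = [[0.5, 0.5], [1.0, 0.0]]    # irreducible, state 0 is aperiodic
```

Here `flip` is irreducible with period 2 and has no strictly positive power, while `lazy` has an aperiodic state and its square is strictly positive, in line with the proposition.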
2.
Continuous-time Markov chains
The intuitive idea will be the same as in section 1 except that "transitions" of the system from one state to another can take place at any time $t > 0$, not only at $t = 0, 1, 2, \ldots$ Thus for any $t \ge 0$ we should have a stochastic matrix $P(t)$ which expresses the probabilities of passage from one state to another after a time-lapse of duration $t$. The same argument which previously led us to conclude that the matrix powers $P^n$ represented the probabilities of transitions in $n$ steps now suggests that the matrices $P(t)$ should satisfy
\[ P(t+s) = P(t)P(s); \quad s, t > 0. \tag{1} \]
Thus the matrices $\{P(t): t > 0\}$ form a semigroup; we will see that this idea is central in the general theory of continuous-time Markov processes.
In contrast to the discrete case, where we could write down stochastic matrices at will, it is not so obvious how to find the solutions of equation (1). If $P(t)$ were a scalar rather than a matrix, (1) would be the functional equation of the exponential function (but a regularity condition would still be necessary to exclude non-measurable solutions). Something of the same sort holds for matrices as well, as we shall now show.
It is reasonable to imagine that no transition can take place in zero time, so we put $P(0) = I$. (Then (1) holds for $s, t \ge 0$.) As stated, $P(t)$ is an $N \times N$ stochastic matrix for each $t$. Finally, we assume that $P(\cdot)$ is continuous at $t = 0$:
\[ \lim_{t \to 0+} P(t) = I. \]
A matrix function $P(\cdot)$ satisfying
these conditions and equation (1) will be called a Markov transition function.
Definition. Let $A$ be any $N \times N$ matrix ($N < \infty$). Then
\[ \exp(A) = \sum_{n=0}^{\infty} \frac{A^n}{n!}. \]
Theorem 1. Let $P(t)$, $t \ge 0$, be a Markov transition function. Then
\[ P(t) = \exp(tG), \quad t \ge 0, \tag{2} \]
where $G$ is an $N \times N$ matrix satisfying
\[ g_{ij} \ge 0 \ \text{for } i \ne j \quad \text{and} \quad \sum_j g_{ij} = 0 \ \text{for all } i. \tag{3} \]
The correspondence between Markov transition functions and the class of matrices $G$ satisfying (3) is one-to-one.
Proof.
Equation (1) and the assumption that $P(t)$ is continuous from the right at $t = 0$ imply that $P(t)$ is continuous for all $t > 0$. In fact, we have for $h > 0$
\[ P(t+h) = P(t)P(h) \quad \text{and} \quad P(t-h) = P(t)P(h)^{-1} \]
($P(h)^{-1}$ exists for all small $h$ since $P(h) \to I$ as $h \to 0+$), and letting $h \to 0$ we see that
\[ \lim_{h \to 0} P(t \pm h) = P(t) \]
for all $t > 0$. (Actually $P(t)$ is uniformly continuous on $[0, \infty)$; we will use this fact later on.)
Now assume that $P(t)$ is also differentiable (from the right) at $t = 0$; i.e., that
\[ \lim_{t \to 0+} \frac{p_{ij}(t) - \delta_{ij}}{t} = g_{ij} \tag{4} \]
exists for all $i, j$. Clearly the $g_{ij}$ satisfy (3). Then
\[ P'(t) = \lim_{h \to 0+} \frac{P(t+h) - P(t)}{h} = P(t) \lim_{h \to 0+} \frac{P(h) - I}{h} = P(t)G = GP(t). \tag{5} \]
(Here $G = [g_{ij}]$. Left differentiability when $t > 0$ is just as easily verified.) Now both of the differential equations in (5) have unique solutions under the initial condition $P(0) = I$. But we know that the function $\exp(tG)$ satisfies both differential equations as well as the initial condition $P(0) = I$, and so (2) must hold: every differentiable transition function $P(t)$ is an exponential, with a generator (as the matrix $G$ is called) satisfying (3).
Conversely, $P(t) = \exp(tG)$ is a Markov transition function for every generator $G$. All the conditions are quite obvious except the fact that $P(t)$ must be a stochastic matrix for each $t > 0$. Since $G$ has row-sums equal to 0,
\[ P(t)\mathbf{1} = \sum_{n=0}^{\infty} \frac{(tG)^n}{n!}\,\mathbf{1} = \mathbf{1}, \]
which shows that $P(t)$ has row-sums equal to unity. (Here "$\mathbf{1}$" means a column-vector with all entries $= 1$.) Suppose that $g_{ij} > 0$ for each $i \ne j$. Then since
\[ p_{ij}(t) = \delta_{ij} + t g_{ij} + O(t^2), \]
it is clear that $p_{ij}(t) > 0$ for all small $t$. But because $P(t) = P(t/k)^k$ it follows that $P(t) > 0$ for all $t$ if it is true for small $t$. Finally, if $g_{ij} = 0$ in some cases, we consider a perturbed generator $G^{(\varepsilon)}$, $\varepsilon > 0$, whose off-diagonal entries are strictly positive and which tends to $G$ as $\varepsilon \to 0+$ (for instance, add $\varepsilon$ to each off-diagonal entry of $G$ and subtract $(N-1)\varepsilon$ from each diagonal entry). By the previous argument $P^{(\varepsilon)}(t) = \exp(tG^{(\varepsilon)}) > 0$. But then clearly
\[ P(t) = \lim_{\varepsilon \to 0+} P^{(\varepsilon)}(t) \ge 0, \]
and so we conclude that $P(t) = \exp(tG)$ is a Markov transition function.
To complete the proof of Theorem 1, we must prove that $P'(0)$ actually exists. From (1) we have
\[ [P(h) - I]\,[I + P(h) + P(2h) + \cdots + P(nh - h)] = P(nh) - I \tag{6} \]
for any $h > 0$ and any $n$. Now since $P(\cdot)$ is continuous, if $h \to 0+$ while $nh \to t > 0$ we have
\[ h \sum_{k=0}^{n-1} P(kh) \to \int_0^t P(u)\,du. \]
Since $P(0) = I$, it is easy to see that the integral is non-singular when $t$ is small enough; if we fix such a $t$, then the Riemann sum will also be non-singular for small $h$. Clearly $P(nh) - I \to P(t) - I$. Combining these facts with (6), we obtain
\[ \lim_{h \to 0+} \frac{P(h) - I}{h} = [P(t) - I] \Big[\int_0^t P(u)\,du\Big]^{-1}, \]
so that, in particular, $P'(0+)$ exists. This finishes the proof.
Next we will discuss the behaviour of $P(t)$ for large $t$; the facts are simpler than in the discrete-time case.
(We continue to assume that $N < \infty$.) A Markov transition function $P(\cdot)$ is called irreducible if $p_{ij}(t) > 0$ for all $t > 0$ and all $i, j$.
Criterion. $P(\cdot)$ is irreducible iff for every $i \ne j$ there exists a finite sequence of states indexed by $i = i_0, i_1, i_2, \ldots, i_n = j$ such that $g_{i_k i_{k+1}} > 0$ for $k = 0, 1, \ldots, n-1$.
Proof. If there is no such sequence for some pair $i \ne j$, then $[G^n]_{ij} = 0$ for all $n \ge 1$. Consequently,
\[ p_{ij}(t) = \sum_{n=0}^{\infty} \frac{t^n}{n!}\,[G^n]_{ij} = 0 \]
for all $t$, so $P(\cdot)$ is not irreducible. Conversely, if there are paths from $i$ to $j$ ($i \ne j$), let $n_0$ be the length of the shortest ones. All terms $g_{i_k i_{k+1}}$ in any such path must be nonnegative since diagonal terms cannot occur in paths of minimum length; accordingly $[G^{n_0}]_{ij} > 0$. As before, this implies that $p_{ij}(t) > 0$ for small $t$ -- and hence for all $t > 0$. Therefore $P(\cdot)$ is irreducible if the criterion on $G$ is satisfied.
Note that in the continuous-time case there is no analogue for "periodicity."
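The statements about $\exp(tG)$ in this section can be checked numerically on a small example. What follows is a sketch under stated assumptions: the generator `G` is a hypothetical 3-state example whose off-diagonal zeros still leave every state reachable from every other, and `expm` is a home-made scaling-and-squaring routine with a truncated Taylor series, adequate for small matrices but not a substitute for a library implementation.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def expm(A, squarings=10, terms=20):
    """exp(A): Taylor series of exp(A / 2^s), then s repeated squarings."""
    n = len(A)
    B = [[a / 2 ** squarings for a in row] for row in A]
    result, term = identity(n), identity(n)
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, B)]
        result = [[r + t for r, t in zip(rr, tt)] for rr, tt in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

# Hypothetical generator: off-diagonal entries >= 0, row-sums 0, and the
# cycle 0 -> 1 -> 2 -> 0 connects all states.
G = [[-2.0, 2.0, 0.0],
     [0.0, -1.0, 1.0],
     [3.0, 0.0, -3.0]]

def P(t):
    return expm([[t * g for g in row] for row in G])

P_half = P(0.5)
row_sums = [sum(row) for row in P_half]
min_entry = min(min(row) for row in P_half)

# Semigroup property: P(0.3) P(0.2) should equal P(0.5).
prod = mat_mul(P(0.3), P(0.2))
semigroup_gap = max(abs(prod[i][j] - P_half[i][j])
                    for i in range(3) for j in range(3))
```

Each row of `P_half` sums to 1, every entry is strictly positive even though `G` has zero off-diagonal entries, and `semigroup_gap` is at the level of rounding error.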
Theorem 2. Let $P(t)$, $t \ge 0$, be an irreducible Markov transition function with finitely many states. Then
\[ \lim_{t \to \infty} P(t) = \Pi \tag{7} \]
exists; the limiting matrix $\Pi$ has constant columns. Its (identical) rows are the unique probability vector $\pi$ such that $\pi = \pi P(t)$ for all $t > 0$.
Proof. For any $h > 0$, $P(h)$ is a strictly-positive stochastic matrix. By our results in the discrete-parameter case we know that
\[ \lim_{n \to \infty} P(nh) = \lim_{n \to \infty} P(h)^n = \Pi_h \tag{8} \]
exists, where the identical rows of $\Pi_h$ are the unique probability vector satisfying $\pi = \pi P(h)$. But we have seen that $P(t)$ is uniformly continuous on $[0, \infty)$. Any real, bounded, uniformly continuous function $f$ on $[0, \infty)$ such that $\lim_{n \to \infty} f(nh)$ exists for each $h > 0$ is easily seen to possess a limit as $t \to \infty$, and so the limit in (7) exists. The limits (8) thus are the same for each $h$, and so $\pi = \pi P(t)$ for every $t$. The solution is unique (even for a single fixed $t > 0$), and the proof of the theorem is complete.
It is clear that not every stochastic matrix $P$ can be embedded in a continuous-time Markov family so that the given matrix becomes $P(1)$; for example, if some entry $p_{ij} = 0$ while $P^m > 0$ for an $m > 1$ then $P$ cannot be embedded. (Why?) Another criterion: an embeddable $P$ must be invertible. (Again, why?) It is not always easy to tell when embedding is possible, and quite a few papers have been written on this problem. The following examples may be of interest:
Problem 2. Show that
\[ P = \begin{bmatrix} a & 1-a \\ 1-a & a \end{bmatrix} \]
can be embedded in a Markov transition function $P(t)$ iff $a \in (1/2, 1)$. If the condition holds, compute the corresponding $P(t)$ and show that it is unique.
Problem 3. (Jane Speakman, 1967). Even when it is possible to embed a stochastic matrix $P$ into a continuous-time Markov transition function, the latter need not be unique (in contrast to the previous problem). Let
\[ G_1 = \begin{bmatrix} -1 & 1 & 0 \\ 0 & -1 & 1 \\ 1 & 0 & -1 \end{bmatrix} \quad \text{and} \quad G_2 = \begin{bmatrix} -1 & 1/2 & 1/2 \\ 1/2 & -1 & 1/2 \\ 1/2 & 1/2 & -1 \end{bmatrix}; \]
note that both are generators. Find the corresponding transition functions $P^{(1)}(t)$ and $P^{(2)}(t)$, and show that they agree for $t = 4\pi k/\sqrt{3}$, where $k$ is any integer. (Hint: by considering the diagonal form of the generators, one sees without much computation that $p_{ij}(t)$ must be of a certain form. In the case of $P^{(2)}(t)$, for example, each $p_{ij}(t)$ has the form $a_{ij} + b_{ij} e^{-3t/2}$, where $a_{ij}$ and $b_{ij}$ are constants. Then the initial conditions and symmetries allow one quite easily to find the precise formulas; in particular, the diagonal terms of $P^{(2)}(t)$ are $1/3 + (2/3)\,e^{-3t/2}$.)
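The coincidence asserted in Problem 3 can be observed numerically before it is proved. The sketch below rests on an assumption: $G_1$ is taken to be the cyclic generator with rows $(-1, 1, 0)$, $(0, -1, 1)$, $(1, 0, -1)$, which is consistent with the stated times $t = 4\pi k/\sqrt{3}$ but is a reading of the problem rather than something this code establishes. The routine `expm` is a home-made scaling-and-squaring exponential, and all names are hypothetical.

```python
from math import exp, pi, sqrt

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(A, squarings=10, terms=20):
    """exp(A): Taylor series of exp(A / 2^s), then s repeated squarings."""
    n = len(A)
    B = [[a / 2 ** squarings for a in row] for row in A]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, B)]
        result = [[r + t for r, t in zip(rr, tt)] for rr, tt in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def transition(G, t):
    return expm([[t * g for g in row] for row in G])

def max_gap(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

G1 = [[-1.0, 1.0, 0.0], [0.0, -1.0, 1.0], [1.0, 0.0, -1.0]]   # assumed form
G2 = [[-1.0, 0.5, 0.5], [0.5, -1.0, 0.5], [0.5, 0.5, -1.0]]

t_star = 4 * pi / sqrt(3)                    # the case k = 1
gap_at_t_star = max_gap(transition(G1, t_star), transition(G2, t_star))
gap_at_half = max_gap(transition(G1, t_star / 2), transition(G2, t_star / 2))

# Diagonal of P2(t) against the formula in the hint.
diag_error = abs(transition(G2, 1.0)[0][0] - (1 / 3 + 2 / 3 * exp(-1.5)))
```

With these choices `gap_at_t_star` is at rounding level while `gap_at_half` is clearly nonzero, so the two transition functions differ in general but coincide at the special time.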
On the other hand, any stochastic matrix can describe the transitions of a continuous-time Markov chain if the transitions are governed by a "random clock"; that is, a Poisson process. We assume namely that in time $t$ our "system" undergoes $k$ transitions, where $k$ is a Poisson random variable with mean $\lambda t$. Each transition is governed by the stochastic matrix $P$ which can be quite arbitrary. Thus the probability of passing from state $i$ to $j$ after time $t$ has elapsed should be
\[ p_{ij}(t) = e^{-\lambda t} \sum_{k=0}^{\infty} \frac{(\lambda t)^k}{k!}\, p_{ij}^{(k)}, \]
so that we get
\[ P(t) = e^{-\lambda t} \exp(\lambda t P) = \exp[\lambda t (P - I)]. \tag{9} \]
Since $\lambda(P - I)$ is a generator (satisfies (3)), this function is a Markov transition function.
Conversely, let $P(t)$ be any transition function; we shall express it in the form (9). Since $P(t) = \exp(tG)$ by Theorem 1, we need only find a $\lambda > 0$ and a stochastic matrix $P$ such that $\lambda(P - I) = G$. Thus if $\lambda > 0$ is fixed, we must take
\[ p_{ij} = \frac{g_{ij}}{\lambda}, \quad i \ne j; \qquad p_{ii} = 1 + \frac{g_{ii}}{\lambda}. \tag{10} \]
The matrix defined in (10) automatically has row-sums $= 1$ and so it will be stochastic iff $p_{ii} \ge 0$; i.e., provided $\lambda \ge \max_i |g_{ii}|$. Hence, for all large $\lambda$, there is a $P$ such that $P(t)$ is given by (9).
The representation of $P(t)$ as a discrete-time Markov chain governed by a "random clock" makes it easy to construct the sample functions of the corresponding stochastic process. However, we will leave this matter until Chapter 8, when we will construct the sample spaces and random variables for a wide class of Markov processes.
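The two directions of the "random clock" representation can be compared numerically. This is a rough sketch with hypothetical names and a hypothetical 3-state generator `G`: it builds the stochastic matrix $P = I + G/\lambda$ with $\lambda = \max_i |g_{ii}|$, sums the Poisson-weighted powers of $P$ (truncated after 80 terms, which is an arbitrary cutoff), and compares the result with a home-made matrix exponential of $tG$.

```python
from math import exp

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def expm(A, squarings=10, terms=20):
    """exp(A): Taylor series of exp(A / 2^s), then s repeated squarings."""
    n = len(A)
    B = [[a / 2 ** squarings for a in row] for row in A]
    result, term = identity(n), identity(n)
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, B)]
        result = [[r + t for r, t in zip(rr, tt)] for rr, tt in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def poisson_clock(P, lam, t, terms=80):
    """e^{-lam t} * sum_k (lam t)^k / k! * P^k, truncated after `terms` terms."""
    n = len(P)
    weight = exp(-lam * t)                   # Poisson probability of k = 0
    Pk = identity(n)
    total = [[0.0] * n for _ in range(n)]
    for k in range(terms):
        total = [[s + weight * p for s, p in zip(rs, rp)]
                 for rs, rp in zip(total, Pk)]
        Pk = mat_mul(Pk, P)
        weight *= lam * t / (k + 1)          # move to the probability of k + 1
    return total

G = [[-2.0, 2.0, 0.0],                       # hypothetical generator
     [1.0, -3.0, 2.0],
     [0.0, 1.0, -1.0]]
lam = max(abs(G[i][i]) for i in range(3))
P = [[(1.0 if i == j else 0.0) + G[i][j] / lam for j in range(3)]
     for i in range(3)]

t = 1.5
via_clock = poisson_clock(P, lam, t)
via_exp = expm([[t * g for g in row] for row in G])
clock_gap = max(abs(a - b) for ra, rb in zip(via_clock, via_exp)
                for a, b in zip(ra, rb))
```

Any larger value of `lam` gives a different matrix `P` but the same transition function, which is the non-uniqueness of the clock rate noted in the text.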
Incidentally, using a "random clock" and equation (9) is still a valid way to construct continuous-time Markov transition functions from individual stochastic matrices even when the number of states ($N$) is infinite. However, one no longer gets all transition functions this way, and the ones which are missed are naturally the most interesting!
3.
Discrete time, general state-space.
Let $(S, \mathcal{S})$ be a measurable space; that is, a set $S$ together with a $\sigma$-field $\mathcal{S}$ of subsets. The analogues of the stochastic matrices considered in section 1 are functions $p = p(x, A)$, $x \in S$, $A \in \mathcal{S}$, which will be called Markov kernels or stochastic kernels provided that
(a) $p(x, \cdot)$ is a probability measure on $\mathcal{S}$ for each $x \in S$, and
(b) $p(\cdot, A)$ is measurable (with respect to $\mathcal{S}$) for each $A \in \mathcal{S}$.
Given such a function we define the "matrix powers" $p^{(n)}$ inductively by setting $p^{(1)} = p$ and
\[ p^{(n+1)}(x, A) = \int_S p(x, dy)\, p^{(n)}(y, A). \tag{1} \]
It is then easy to show that
\[ p^{(n+m)}(x, A) = \int_S p^{(n)}(x, dy)\, p^{(m)}(y, A) \tag{2} \]
for all $n, m \ge 0$. (We define $p^{(0)}(x, A) = \mathbf{1}_A(x)$ to be the point measure with mass at $x$.) Clearly the function $p^{(n)}(x, A)$ ought to represent the $n$-step transition probabilities of the Markov process, in analogy with the matrix powers of section 1.
Examples. It is easy to construct an enormous variety of stochastic kernels. Let $(S, \mathcal{S})$ be $R^1$ with the Borel field; we will consider some special types.
a) Suppose that from $x$ only two transitions are possible, say to $f_1(x)$ and $f_2(x)$ with probability $h(x)$ and $1 - h(x)$ respectively. Then
\[ p(x, A) = h(x)\,\mathbf{1}_A(f_1(x)) + (1 - h(x))\,\mathbf{1}_A(f_2(x)), \]
and this is a stochastic kernel provided only that $f_1$, $f_2$ and $h$ are measurable functions with $0 \le h(x) \le 1$ for all $x$. Such processes arise in certain mathematical models for psychological experiments in learning. If $f_1(x) = x + 1$, $f_2(x) = x - 1$, we obtain a random walk on the integers.
b) Alternatively, take a continuous distribution depending on several parameters -- to be specific, we can choose the normal $N_{\sigma,\mu}$. Let $\sigma$ and $\mu$ be arbitrary measurable functions of $x$, and then choose $N_{\sigma(x),\mu(x)}$ as the measure $p(x, \cdot)$. This again yields a stochastic kernel.
c) If $p(x, A)$ is independent of $x$, we have independent trials.
d) The most important case corresponds to the process "sums of independent random variables." Let $m(\cdot)$ be any probability measure on $R^1$ and define
\[ p(x, A) = m(A - x) \]
(where $A - x = \{y: y = a - x \text{ for some } a \in A\}$). Then it is easy to see that $p^{(n)}(x, A) = m^{(n)}(A - x)$, where $m^{(n)}$ denotes the $n$-fold convolution of $m$ with itself.
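Example d) can be made concrete for a measure concentrated on the integers. The following sketch is illustrative and all of its names are hypothetical: a measure is stored as a dictionary from integers to exact probabilities, `convolve` forms the convolution of two such measures, and the step distribution puts mass $h = 1/3$ on $+1$ and $2/3$ on $-1$ (the random walk of example a)). It checks the $n$-fold convolution against the binomial formula and verifies relation (2) in the form $m^{(2)} * m^{(4)} = m^{(6)}$.

```python
from fractions import Fraction
from math import comb

def convolve(m1, m2):
    """Convolution of two probability measures on the integers."""
    out = {}
    for a, pa in m1.items():
        for b, pb in m2.items():
            out[a + b] = out.get(a + b, 0) + pa * pb
    return out

def n_fold(m, n):
    """m^(n); for n = 0 this is the point mass at 0."""
    result = {0: Fraction(1)}
    for _ in range(n):
        result = convolve(result, m)
    return result

def kernel(m_n, x, A):
    """p^(n)(x, A) = m^(n)(A - x) for a finite set A of integers."""
    return sum(m_n.get(a - x, 0) for a in A)

h = Fraction(1, 3)                     # probability of a step +1
m = {1: h, -1: 1 - h}

m6 = n_fold(m, 6)
# After 6 steps with j steps down, the displacement is 6 - 2j.
binomial_ok = all(
    m6[6 - 2 * j] == comb(6, j) * h ** (6 - j) * (1 - h) ** j
    for j in range(7)
)
chapman_ok = convolve(n_fold(m, 2), n_fold(m, 4)) == m6
return_prob = kernel(m6, 2, {2})       # back at the starting point x = 2
```

Because the kernel depends on $x$ and $A$ only through $A - x$, a single dictionary per $n$ describes the whole $n$-step transition function.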
We shall not discuss the general theory of this class of Markov processes. Some of the important problems, as in the discrete case, are to study the limiting behaviour of $p^{(n)}(x, A)$ as $n \to \infty$, to investigate the possible existence of a "stationary measure" $\pi$ on $\mathcal{S}$ such that
\[ \pi(A) = \int_S p(x, A)\,\pi(dx), \]
and to formulate the best analogues for such concepts as irreducibility, aperiodicity and transience. Some of what has been done along these lines can be found in Chung's 1964 paper which reviews and extends the pioneering work done by W. Doeblin before his untimely death in World War II.
4.
Continuous time and space: definition and examples.
The idea behind the following definition is simple; the quantity $p_t(x, E)$, where $t \ge 0$, $x \in S$ and $E \in \mathcal{S}$, is supposed to mean the probability that a "system" which is in "state" $x$ at some time $s$ will be found at a later time $s + t$ in the set $E$. Accordingly we assume that
(a) $p_t(x, \cdot)$ is a probability measure on $\mathcal{S}$ for each $t \ge 0$ and each $x \in S$;
(b) $p_t(\cdot, E)$ is an $\mathcal{S}$-measurable function for each $t \ge 0$ and $E \in \mathcal{S}$;
(c) $p_0(x, E) = \mathbf{1}_E(x)$; and finally
(d) $p_{t+s}(x, E) = \int_S p_t(x, dy)\, p_s(y, E)$ for all $t, s \ge 0$.
A function satisfying (a) - (d) will be called a Markov transition function. The intuitive meaning of the crucial condition (d) should now be clear by analogy with what has gone before. This condition is often called the "Chapman-Kolmogorov equation," and it is just this property which reflects the Markov principle of the lack of any "memory" in the system. In addition (as in the finite-state case) some sort of continuity condition with respect to $t$ will be needed; we will formulate various forms of this condition later as appropriate.
Examples. (i) Let $S$ be a finite set with $\mathcal{S} = 2^S$. Then except for the absence of the continuity assumption the above definition reduces to that of section 2.
(ii) Let $p = p(x, A)$ be a stochastic kernel, as in section 3. If the transitions of the corresponding discrete-time Markov process are governed by a Poisson "clock" we obtain
\[ p_t(x, E) = e^{-\lambda t} \sum_{k=0}^{\infty} \frac{(\lambda t)^k}{k!}\, p^{(k)}(x, E), \tag{1} \]
where $\{p^{(k)}\}$ are the iterates of $p$ defined in (1), section 3. Note that the "Poisson process" itself is a special case: simply let $S = Z$ and $p(x, E) = \mathbf{1}_E(x + 1)$ so that (1) becomes
\[ p_t(x, \{x + n\}) = e^{-\lambda t}\, \frac{(\lambda t)^n}{n!}. \]
Problem 4. Verify that (1) defines a Markov transition function on $(S, \mathcal{S})$ and that it satisfies as well the continuity condition
(e) $\lim_{t \to 0+} p_t(x, \{x\}) = 1$, all $x \in S$.
(iii) Let $(S, \mathcal{S}) = (R^1, \mathcal{B})$, where $\mathcal{B}$ is the Borel field. An important class of transition functions are those which satisfy (a) - (d) and
(f) $p_t(x, E) = p_t(x + y, E + y)$, all $y \in R^1$.
We will see that when a Markov process is constructed from such a translation-invariant transition function, the result is a process with independent increments (Chapter 1).
The
Poisson process is the simplest example. Let ure on
~
be any infinitely-divisible probability meas-
Rl; this means that for each integer
is a distribution
whose
\.It+s
(where
"*"
\.It
such that
\.11
Pt (x,E)
t > 0
\.I.
there is a
= \.I and \.It * \.Is
denotes convolution).
it follows immediately that
there
n'th "convolution power" is
It is then not hard to show that for each probability measure
n > 0
We define
= \.It (E-x); Pt
is a translation-invariant
transition function. The most important example is the normal distribution: if
\.I
is
N(O,l), then
\.I
t
= N(O,t) e
_y2/2t
and we obtain dy.
(2)
This is the transition function of the famous Brownian motion or Wiener process.
Its character is quite different from that of the Poisson process. The transition function (2) does not satisfy the continuity condition (e), but does satisfy
$$p_t(x,[x-\varepsilon,x+\varepsilon]) = 1 - o(t) \tag{g}$$
for every $\varepsilon > 0$ and every $x \in R^1$. This means (as we will see later) that the Wiener process never stands still, as does the "compound-Poisson" process governed by (1), but is in constant motion. Condition (g), however, implies that the process changes state not by jumps but by continuous motion.
A Markov process with this property is called a diffusion.
Problem 5. Suppose that $\mu$ is the "Cauchy distribution" with density
$$f(x) = \frac{1}{\pi}\,\frac{1}{1+x^2}, \quad x \in R^1.$$
Show that $\mu$ is infinitely divisible and that $\mu_t$ is the distribution with density
$$f_t(x) = \frac{1}{\pi}\,\frac{t}{t^2+x^2}.$$
(The resulting transition function corresponds to a Markov process called the Cauchy process.) Does this transition function satisfy condition (g)?
(iv)
Other closely related examples can be constructed easily from (2). For example, if $\mu = N(m,1)$ we have $\mu_t = N(mt,t)$ so that
$$p_t(x,E) = \frac{1}{\sqrt{2\pi t}}\int_{E-x} e^{-(y-mt)^2/2t}\,dy. \tag{3}$$
This represents the Wiener process with a constant "drift" of magnitude $m$ superimposed. If $\{X_t\}$ are the random variables of the original Wiener process governed by (2), then (3) corresponds to the process $\{X_t + mt\}$. This new one still has independent increments.
Another modification leads to a transition function on $R^+$.
Due to the symmetry of the function (2) about $y = 0$ it is plausible that the process $\{|X_t|\}$, where $\{X_t\}$ is governed by (2), could represent a Wiener process on $R^+$ with a reflecting barrier at $x = 0$. Since $|X_t|$ goes from $x$ to $y$ if $X_t$ goes from $x$ to $\pm y$, the following transition function suggests itself:
$$p_t(x,E) = \frac{1}{\sqrt{2\pi t}}\left(\int_{E-x} e^{-y^2/2t}\,dy + \int_{E+x} e^{-y^2/2t}\,dy\right), \tag{4}$$
where $x \ge 0$ and $E \subset R^+$.
Problem 6. Verify that (4) is a Markov transition function on $(R^+,\mathscr{B}^+)$ and satisfies condition (g).
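The examples above lend themselves to a quick numerical check. The following is only a sketch under stated assumptions: the rate `lam` and the helper names (`poisson_step`, `chapman_kolmogorov_gap`, `brownian_exit`) are illustrative and do not come from the text. It tests the Chapman-Kolmogorov equation for the Poisson special case of example (ii), and contrasts conditions (e) and (g) for the Poisson and Wiener transition functions.

```python
import math

def poisson_step(lam, t, n):
    """p_t(x, {x+n}) = exp(-lam*t) (lam*t)^n / n! for the Poisson process."""
    return math.exp(-lam * t) * (lam * t) ** n / math.factorial(n)

def chapman_kolmogorov_gap(lam, t, s, n):
    """|p_{t+s}(x,{x+n}) - sum_k p_t(x,{x+k}) p_s(x+k,{x+n})|."""
    convolved = sum(poisson_step(lam, t, k) * poisson_step(lam, s, n - k)
                    for k in range(n + 1))
    return abs(poisson_step(lam, t + s, n) - convolved)

def brownian_exit(t, eps):
    """1 - p_t(x, [x-eps, x+eps]) for the Wiener transition function (2)."""
    return math.erfc(eps / math.sqrt(2.0 * t))

lam = 1.5  # hypothetical jump rate, chosen only for illustration

# Chapman-Kolmogorov for the Poisson transition function.
ck_gap = max(chapman_kolmogorov_gap(lam, 0.7, 1.1, n) for n in range(12))

# Condition (e): p_t(x,{x}) -> 1 as t -> 0+. It holds for the Poisson process;
# for Brownian motion p_t(x,{x}) = 0 for every t > 0, so (e) fails there.
poisson_at_x = [poisson_step(lam, t, 0) for t in (0.1, 0.01, 0.001)]

# Condition (g): (1 - p_t(x,[x-eps,x+eps])) / t -> 0 for Brownian motion.
g_ratio = [brownian_exit(t, 0.5) / t for t in (0.1, 0.05, 0.02)]

print(ck_gap, poisson_at_x, g_ratio)
```

The ratio in `g_ratio` shrinks rapidly as `t` decreases, which is the numerical face of the $1 - o(t)$ statement in (g).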
5. Kolmogorov's equations.
The Brownian motion, even with its modifications, is only one example of the large class of transition functions which satisfy conditions (a) - (d) and (g) of section 4 and which we anticipate may correspond to processes with continuous paths. We will now show that these functions generate solutions to certain parabolic partial-differential equations. Conversely, the equations can be used to construct and study the transition functions and the Markov processes they govern.
We begin with the Brownian case, where the transition function $p_t(x,E)$ has the density
$$f(t,x,y) = \frac{1}{\sqrt{2\pi t}}\,e^{-(y-x)^2/2t}, \quad t > 0. \tag{1}$$
This function is, for each $x$, the fundamental solution of the classical "diffusion equation" (or "heat equation")
$$\frac{\partial f}{\partial t} = \frac{1}{2}\,\frac{\partial^2 f}{\partial y^2}. \tag{2}$$
This means that the right side of (1) is a solution of (2) for $t > 0$, and that the measures defined using $f(t,x,\cdot)$ as a density converge weakly, as $t \to 0+$, to a unit mass at $x$. This way of looking at the problem makes (2) the so-called "forward equation" of the Wiener process: for fixed initial state, the density of the random variable $X_t$ satisfies a parabolic differential equation.
There is another approach, which at first appears less natural but turns out to be both simpler and more general. This is, basically, to consider the terminal state to be fixed
and to vary the initial state and the elapsed time. In the case of (1) the change seems deceptively trivial; if $y$ rather than $x$ is fixed, $f$ is clearly still a fundamental solution of (2) with $\partial^2/\partial x^2$ in place of $\partial^2/\partial y^2$. A slightly different way to put this is as follows: let $\varphi(x)$ be any bounded, continuous function on $R^1$ and define
$$\psi(t,x) = \int_{-\infty}^{\infty} \varphi(y)\,f(t,x,y)\,dy. \tag{3}$$
Then $\psi$ satisfies the heat equation (2) (with $x$ replacing $y$) and also the initial condition $\psi(0,x) = \varphi(x)$. In this context (2) becomes the "backward equation" of the process.
The situation can be clarified by returning to the finite-state case studied in section 2. We began with the basic matrix equation in the form $P(t+h) = P(h)P(t)$, which led to the differential equation $P'(t) = GP(t)$. This is the backward equation of the Markov chain since we analyzed a transition in time $t+h$ by holding the final state $s_j$ fixed and performing first the short-time transition (time $h$) from the initial state $s_i$ to a new intermediate state. Alternatively, if we begin with $P(t+h) = P(t)P(h)$ and derive $P'(t) = P(t)G$, we obtain the forward equation. In the first case the operator $G$ acts on the initial state $s_i$; in the second (forward) case, on the final state $s_j$.
More generally, let $p_t(x,E)$ be any transition function.
The forward approach seeks a differential equation (or a more general functional equation) satisfied by the density of the measure $p_t(x,\cdot)$ for fixed $x$; the choice of $x$ will enter only through the initial condition that as $t \to 0+$ the solution should converge to a unit mass at $x$. This method was known to physicists such as Smoluchowski even before the work of Wiener, and led to useful results about problems such as diffusion subject to outside forces or inhomogeneities in the medium.
The backward approach was introduced by Kolmogorov in 1931 in a famous paper which for the first time surveyed the whole question; we will now take a closer look.
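Before going on, the finite-state version of the two equations discussed above can be checked numerically. This is a minimal sketch, assuming a hypothetical three-state generator matrix `G` (rows summing to zero) and computing $P(t) = \exp(tG)$ by its power series; the helper names are illustrative and not taken from the text.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(g, t, terms=60):
    """P(t) = exp(tG), summed from the series sum_k (tG)^k / k!."""
    n = len(g)
    tg = [[t * g[i][j] for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, tg)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# Hypothetical generator: off-diagonal entries >= 0, each row sums to 0.
G = [[-2.0, 1.5, 0.5],
     [0.3, -0.3, 0.0],
     [1.0, 1.0, -2.0]]

t, h = 0.8, 1e-5
P_t = mat_exp(G, t)
P_plus, P_minus = mat_exp(G, t + h), mat_exp(G, t - h)
# Central difference approximation to P'(t).
deriv = [[(P_plus[i][j] - P_minus[i][j]) / (2 * h) for j in range(3)]
         for i in range(3)]

backward_gap = max_abs_diff(deriv, mat_mul(G, P_t))  # P'(t) = G P(t)
forward_gap = max_abs_diff(deriv, mat_mul(P_t, G))   # P'(t) = P(t) G
row_sums = [sum(row) for row in P_t]                 # each should equal 1

print(backward_gap, forward_gap, row_sums)
```

Both gaps are small because $G$ commutes with $\exp(tG)$; in the matrix case the backward and forward equations describe the same family $P(t)$ and differ only in which index $G$ acts on.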
Define, as in (3),
$$\psi(t,x) = \int_{-\infty}^{\infty} \varphi(y)\,p_t(x,dy), \tag{4}$$
where $\varphi$ is bounded and continuous. We will try to find the differential equation satisfied by these functions $\psi$; this time $\varphi$ will enter through the initial condition $\psi(0+,x) = \varphi(x)$. If the equation, subject to this condition, can be solved for every $\varphi$ it is possible to reconstruct the transition function (although the information is given in a less intuitive form than in the forward case). The advantages of the backward method lie in its greater generality and theoretical simplicity, and it has come to occupy the main place in the mathematical literature.
We will now derive a (backward) "diffusion equation" satisfied by the functions $\psi$ defined in (4). Since we are interested in continuous paths, we assume that $p_t(x,E)$ satisfies (g); that is, that
$$\lim_{t\to 0+} \frac{1}{t}\,p_t(x,\,R^1 - [x-\varepsilon,x+\varepsilon]) = 0 \tag{5}$$
for all $\varepsilon > 0$, all $x$. It is then also reasonable to require that the limits
$$\lim_{t\to 0+} \frac{1}{t}\int_{x-\varepsilon}^{x+\varepsilon} p_t(x,dy)\,(y-x) = a(x) \tag{6}$$
and
$$\lim_{t\to 0+} \frac{1}{t}\int_{x-\varepsilon}^{x+\varepsilon} p_t(x,dy)\,(y-x)^2 = b(x) \tag{7}$$
exist for each $x$, for some $\varepsilon > 0$ (and hence, because of (5), for all $\varepsilon$). Physically, $a(x)$ may be thought of as a mean (over $\omega$) instantaneous (with respect to $t$) velocity when the process is "at $x$"; $b(x)$ has a similar interpretation as a variance. Finally, we have to assume that the function $\psi$ has a continuous second partial derivative with respect to $x$, for all $t > 0$.
Theorem. If the above conditions hold, then the function $\psi$ defined in (4) satisfies the differential equation
$$\frac{\partial\psi}{\partial t} = a(x)\,\frac{\partial\psi}{\partial x} + \frac{b(x)}{2}\,\frac{\partial^2\psi}{\partial x^2}, \quad t > 0, \tag{8}$$
and the initial condition
$$\lim_{t\to 0+} \psi(t,x) = \varphi(x), \quad x \in R^1. \tag{9}$$
Proof.
We will write down the difference quotient which leads to $\partial\psi/\partial t$, and show that its limit exists. Assume first that $h > 0$. We can use the Chapman-Kolmogorov equation (d) to write $p_{t+h}$ in terms of $p_t$ and $p_h$; clearly there are again two ways of doing this. The one appropriate to the "backward" approach is to begin with $p_h$:
$$\psi(t+h,x) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} p_h(x,dy)\,p_t(y,dz)\,\varphi(z) = \int_{-\infty}^{\infty} p_h(x,dy)\,\psi(t,y),$$
which leads to
$$\frac{\psi(t+h,x)-\psi(t,x)}{h} = \frac{1}{h}\int_{-\infty}^{\infty} p_h(x,dy)\,\{\psi(t,y)-\psi(t,x)\}. \tag{10}$$
To take the limit as $h \to 0$, we first observe that by using (5) and the fact that $\psi$ is bounded by $\sup|\varphi(y)|$ we can rewrite (10) as
$$\frac{\psi(t+h,x)-\psi(t,x)}{h} = \frac{1}{h}\int_{x-\varepsilon}^{x+\varepsilon} p_h(x,dy)\,\{\psi(t,y)-\psi(t,x)\} + o(1) \tag{11}$$
for each $\varepsilon > 0$. Next, using the assumptions about the smoothness of $\psi$ we have
$$\psi(t,y) = \psi(t,x) + (y-x)\,\psi_x(t,x) + \frac{(y-x)^2}{2}\,\psi_{xx}(t,x) + r(t,y)\,(y-x)^2 \tag{12}$$
by Taylor's theorem, where $r(t,y) \to 0$ as $y \to x$. We are going to substitute (12) into the right side of (11), and then use (6) and (7). The error term in (12) leads to an integral bounded by the quantity
$$\max_{y\in[x-\varepsilon,x+\varepsilon]} |r(t,y)|\;\frac{1}{h}\int_{x-\varepsilon}^{x+\varepsilon} p_h(x,dy)\,(y-x)^2;$$
as $h \to 0+$ ... 0, verify that (9) holds for every bounded, continuous function $\varphi$.
We will not derive the forward equation analogous to (8), but the reader may still want to know what it looks like. Assumptions (5), (6) and (7) are again required along with regularity conditions more stringent than for the backward case; to begin with, $p_t(x,E)$ must have a density $f(t,x,y)$ which is twice continuously differentiable in $y$. The equation satisfied by $f$, for fixed $x$, turns out to be
$$\frac{\partial f}{\partial t} = \frac{1}{2}\,\frac{\partial^2}{\partial y^2}\{b(y)f\} - \frac{\partial}{\partial y}\{a(y)f\}, \tag{15}$$
so that the operator on the right is the formal adjoint of the one in (8). It is apparent that (15) holds less generally than the backward equation; for example, previously the functions $a$ and $b$ did not have to be differentiable.
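To make the coefficients $a(x)$ and $b(x)$ of (6) and (7) concrete, the sketch below estimates them for the Wiener process with drift, whose transition function (3) of the previous section is $N(x+mt,\,t)$. The drift value, the step `t`, the window `eps`, and the function name `local_moments` are illustrative assumptions; the integrals are approximated by the trapezoidal rule at one small value of $t$ rather than by taking a true limit.

```python
import math

def local_moments(m, t, eps, n=20001):
    """Approximate (1/t) * integral over [x-eps, x+eps] of p_t(x,dy) (y-x)^k
    for k = 1, 2, where p_t(x,.) is N(x + m*t, t). The integration variable
    is z = y - x, so the result does not depend on x."""
    dz = 2.0 * eps / (n - 1)
    norm = 1.0 / math.sqrt(2.0 * math.pi * t)
    first = second = 0.0
    for i in range(n):
        z = -eps + i * dz
        weight = 0.5 if i in (0, n - 1) else 1.0  # trapezoidal end weights
        density = norm * math.exp(-(z - m * t) ** 2 / (2.0 * t))
        first += weight * z * density * dz
        second += weight * z * z * density * dz
    return first / t, second / t

m = 0.5  # hypothetical drift
a_est, b_est = local_moments(m, t=1e-3, eps=1.0)
print(a_est, b_est)
```

At this small `t` the first value is close to $m$ and the second is close to $1 + m^2 t$, in line with $a(x) = m$ and $b(x) = 1$; with these coefficients (8) reads $\partial\psi/\partial t = m\,\partial\psi/\partial x + \tfrac12\,\partial^2\psi/\partial x^2$.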
We have shown that a transition function satisfying certain conditions generates solutions of the system (8) and (9). The most important applications, however, are in the opposite direction: given the functions $a(x)$ and $b(x)$, we would like to construct $p_t(x,E)$ by solving (8). This was done before 1920 in special cases (using the forward equation (15) instead of (8)), but the first general treatment was given by W. Feller in 1936. Under certain conditions on $a(x)$ and $b(x)$, Feller showed that there is a unique bounded solution $\psi$ to (8) and (9) for each $\varphi$, and that these functions $\psi$ are derived via (4) from a transition-probability function.*) It was first proved by R. Fortet in 1943 that these probabilities correspond to processes with continuous paths. This is about where the theory of diffusion stood at the end of the 1940's.
But there was already much more than this to be said about Markov processes. Kolmogorov's 1931 paper discussed, in addition to diffusions, processes of the "purely discontinuous" sort (the compound-Poisson processes of (1), last section, are examples of this kind) plus processes of "mixed type" which move both by jumps and continuously. In addition to these generalizations, the effect of boundaries (with associated "boundary conditions" such as "reflecting," "absorbing," etc.) had to be taken into account. The result of all this was a picture of considerable complexity. Most theorems required ad-hoc hypotheses and regularity conditions, and there were few complete, clear-cut results.
*)This is not quite how Feller put it, but equivalent. The hypotheses of our theorem, incidentally, are approximately those of Feller's 1936 paper rather than of Kolmogorov's.
CHAPTER 7
THE APPLICATION OF SEMIGROUP THEORY
1. Introduction.
The skeleton key which brings order out of all this chaos is the theory of semigroups; its application to Markov processes was developed in the early 1950's, with W. Feller doing the pioneering work.
Let $p_t$ be any Markov transition function on a measurable space $(S,\mathscr{S})$ and let $F(S)$ be the space of bounded, measurable functions from $S$ to $R^1$; $F(S)$ is a Banach space with respect to the supremum norm. For $f \in F(S)$ we define
$$(T_t f)(x) = \int_S p_t(x,dy)\,f(y). \tag{1}$$
It is clear that for each $t > 0$, $T_t$ is a contraction operator mapping $F$ into $F$; that is, $T_t$ is linear and $\|T_t f\| \le \|f\|$ for all $f \in F$. Moreover, the Chapman-Kolmogorov equation (condition (d) in the definition of a transition function) yields
$$(T_{t+s}f)(x) = \int p_{t+s}(x,dy)\,f(y) = \int\!\!\int p_t(x,du)\,p_s(u,dy)\,f(y) = (T_t T_s f)(x),$$
so that the operators $\{T_t\}$ satisfy the semigroup condition
$$T_{t+s} = T_t T_s. \tag{2}$$
Because of (c), we also have $T_0 = I$.
On the other hand, let $M(S)$ denote the Banach space of finite signed measures on $(S,\mathscr{S})$ with total-variation norm. For each $\nu \in M(S)$, we define
$$(U_t\nu)(E) = \int_S \nu(dx)\,p_t(x,E). \tag{3}$$
Again, it is easy to see that the operators $\{U_t;\ t > 0\}$ (with $U_0 = I$) also form a contraction semigroup of operators on $M(S)$. (The reader should verify this for herself.)
Finally, there is a natural bilinear functional between $F(S)$ and $M(S)$ defined by
$$(\nu,f) = \int_S f(x)\,\nu(dx)$$
for $\nu \in M$, $f \in F$. In terms of this functional the two semigroups we have constructed are adjoint, since using Fubini's theorem we have
$$(U_t\nu,\,f) = \int_S\!\int_S \nu(dx)\,p_t(x,dy)\,f(y) = (\nu,\,T_t f). \tag{4}$$
Thus we expect a certain duality to hold between the two semigroups; it is incomplete, since in general neither $M$ nor $F$ is the entire conjugate space of the other. However, we may expect that it will be enough to study primarily one of the two semigroups and we will see that $\{T_t\}$ is usually the better one on which to concentrate.
As an example, when $S$ is finite $F(S)$ is the space of column vectors and $M(S)$ that of row vectors. Equation (1) becomes
$$(T_t f)_i = \sum_j p_{ij}(t)\,f_j = (P(t)f)_i,$$
while $(U_t\nu)_j = (\nu P(t))_j$. The duality between $F$ and $M$ is of course complete in this case, and $U_t$ and $T_t$ are true adjoints of each other, corresponding simply to transposing the matrices $P(t)$. The semigroup equation (2) reduces to the multiplicative property of the stochastic matrices which we studied in section 2 of the last chapter.
In the next four sections we will study the structure of contraction-operator semigroups.
The theory bears considerable resemblance to the matrix case treated earlier; there is again a differential equation analogous to $y' = ay$ and an operator which "generates" the semigroup, although the simple exponential function no longer suffices to construct it. These general results will then be specialized to the Markov case, and the mysteries of "backward" and "forward," boundary conditions and the like will largely disappear. After all this we will turn in Chapter 8 to the study of the Markov processes themselves, using the considerable supply of information about their transition functions which will then be available.
2. Definitions and preliminaries.
We have seen that there are two semigroups associated in a natural way with each Markov transition function. We will now develop the most necessary results from the general theory of semigroups, due to E. Hille, K. Yosida, R. Phillips and others. From here on, the letter $V$ will mean an arbitrary Banach space over the real numbers. Elements of $V$ will be denoted by letters such as $x$, $y$ and $z$, scalars by Greek letters, and operators by boldface Latin capitals. The norm of an element $x \in V$ is written $\|x\|$.
Definition 1. Let $\{T_t\}$, $t \in R^+$, be a family of linear operators on $V$. Then $\{T_t\}$ is a one-parameter contraction semigroup provided that
(i) $\|T_t x\| \le \|x\|$ for all $x \in V$;
(ii) $T_{t+s} = T_t T_s$ for all $t, s \ge 0$;
(iii) $\lim_{t\to 0+} \|T_t x - x\| = 0$ for all $x \in V$.
We will say for short that $\{T_t\}$ is a semigroup if these three conditions are satisfied.
Remarks.
It follows from (ii) and (iii) that $T_0 = I$, the identity operator. (Why?) Condition (iii) is (by definition) equivalent to requiring that $T_t \to I$ as $t \to 0+$ in the strong operator topology. It is natural to ask why the uniform or the weak topologies are not used instead. The answer in the first case is that requiring uniform convergence would be much too restrictive; this holds only when the generator (defined below) of $\{T_t\}$ is a bounded operator. As for weak convergence, it turns out that it is actually equivalent to strong convergence in our situation. This fact is sometimes useful but will not be essential for us; see section 6, propositions 3 and 4.
Lemma. $\{T_t\}$ is strongly continuous in $t$ for all $t > 0$.
Proof. We must show that
$$\lim_{h\to 0} \|T_{t+h}x - T_t x\| = 0 \quad \text{for all } x \in V,\ t > 0.$$
But by (i) and (ii) we have for $h > 0$
$$\|T_{t+h}x - T_t x\| = \|T_t(T_h x - x)\| \le \|T_h x - x\|,$$
which tends to 0 as $h \to 0$ by (iii). The case $h < 0$ (but so small that $t+h > 0$) is handled similarly. As in the finite-dimensional case, the proof shows that $\{T_t\}$ is actually uniformly continuous for $t > 0$.
Definition 2. Let $\{T_t\}$ be a semigroup. Define
$$Ax = \lim_{t\to 0+} \frac{T_t x - x}{t} \tag{1}$$
provided the limit exists (in the norm topology). Then $A$ is called the generator*) of $\{T_t\}$, and its domain (that is, the set of $x \in V$ such that the limit (1) exists) is denoted $\mathscr{D}_A$.
It is clear that $A$ is a linear operator mapping $\mathscr{D}_A$ into $V$; it may or may not be continuous. This operator is a sort of time-derivative of the semigroup at $t = 0$, and the hope is that it can play a role like that of the matrix $G$ in the finite-dimensional case. In that case, $G$ was the generator of the matrix semigroup $P(t)$, and $\mathscr{D}_G$ was the whole space. In general $\mathscr{D}_A \ne V$ occurs frequently, but there are always enough elements in $\mathscr{D}_A$ to make the concept of generator very useful. This is quite easy to show:
*)Sometimes $A$ is also called the "infinitesimal generator" or the "infinitesimal operator" of $\{T_t\}$.
2.
Definitions and preliminaries Problem 1.
Let
I: f(t)TtxdtE~A
0 < a < b
0, and we have t
Proof.
> O.
(2)
The derivative on the left side of (2) is taken in the norm topology; this will always be assumed hereafter unless the contrary is explicitly stated. Fix $t > 0$. Then for any small $h > 0$ we have
$$\frac{T_{t+h}x - T_t x}{h} = T_t\left(\frac{T_h x - x}{h}\right) = \frac{T_h - I}{h}\,T_t x,$$
and as $h \to 0+$ the limit of the second expression is clearly $T_t(Ax)$ because of (1) and the fact that $T_t$ is continuous. The third expression thus approaches this limit also, which proves that $T_t x \in \mathscr{D}_A$ and that (2) holds with "right-hand derivative" on the left side.
Now assume that $h < 0$ (but $t+h > 0$). Then
7.
140
Tlhl x - x
~ I I Ih I
- TI hi . Axil
0;
C, m < ~;
0+.
We already know that $u(t) = T_t x$ is a solution and that it satisfies the conditions; it remains to show uniqueness. Suppose that $u_1$ and $u_2$ are both solutions satisfying (a) - (c) and set $v(t) = u_1(t) - u_2(t)$; then $v$ is a solution satisfying (a) and (b) and tending to 0 as $t \to 0+$. Put $w(t) = e^{-\lambda t}v(t)$, taking $\lambda > \max(m_1,m_2)$. Since $v(t)$ satisfies (6) we have
$$\frac{dw(t)}{dt} = -\lambda w(t) + e^{-\lambda t}Av(t) = -(\lambda I - A)\,w(t) = -R_\lambda^{-1}w(t)$$
by Theorem 1; i.e.,
$$w(t) = -R_\lambda\,\frac{dw(t)}{dt}.$$
Integrating both sides from 0 to $s$ we obtain
$$\int_0^s w(t)\,dt = -R_\lambda\int_0^s \frac{dw(t)}{dt}\,dt = -R_\lambda w(s)$$
since $R_\lambda$ commutes with the integration (why?) and $w(0) = 0$. But when $s \to \infty$, the left side tends to the Laplace transform of $v$, while the right side tends to 0 because of assumption (b) and the choice of $\lambda$. Thus the Laplace transform of $v(t)$ vanishes for each $\lambda > m$ and by the lemma above it follows that $v(t) \equiv 0$ itself. This completes the proof of uniqueness.
Corollary. If the generator $A$ of $\{T_t\}$ is bounded, then $T_t = \exp(tA)$.
Proof. We have seen that, when $A$ is bounded, $u(t) = \exp(tA)x$ satisfies the differential equation (6) and conditions (a) and (c). The verification that the exponential series converges also shows that $\|\exp(tA)x\| \le \exp(t\|A\|)\,\|x\|$, so that (b) holds as well. Hence by the theorem $T_t x = \exp(tA)x$ for each $x$ in $\mathscr{D}_A$, which in this case means all $x \in V$.
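For a finite state space everything in the corollary can be computed directly, which also gives a concrete look at the two adjoint semigroups of section 1. The sketch below is illustrative only: the generator matrix `G`, the vectors `f` and `nu`, and the helper names are assumptions, with $T_t f = P(t)f$ acting on column vectors, $U_t\nu = \nu P(t)$ on row vectors, and $P(t) = \exp(tG)$ summed from its series.

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(a, terms=60):
    """exp(a) for a square matrix, summed from its power series."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

# Hypothetical bounded generator on a three-point state space.
G = [[-1.0, 1.0, 0.0],
     [0.5, -1.5, 1.0],
     [0.0, 2.0, -2.0]]

def P(t):
    return mat_exp([[t * g for g in row] for row in G])

def apply_T(p, f):   # (T f)_i = sum_j p_ij f_j, f a column vector
    return [sum(p[i][j] * f[j] for j in range(len(f))) for i in range(len(f))]

def apply_U(nu, p):  # (U nu)_j = sum_i nu_i p_ij, nu a row vector
    return [sum(nu[i] * p[i][j] for i in range(len(nu)))
            for j in range(len(nu))]

def pairing(nu, f):  # the bilinear functional (nu, f)
    return sum(n * v for n, v in zip(nu, f))

f = [1.0, -2.0, 0.5]
nu = [0.2, -0.7, 0.4]   # a signed measure
t, s, h = 0.6, 0.9, 1e-6

adjoint_gap = abs(pairing(apply_U(nu, P(t)), f) - pairing(nu, apply_T(P(t), f)))
semigroup_gap = max(abs(x - y) for x, y in
                    zip(apply_T(P(t + s), f), apply_T(P(t), apply_T(P(s), f))))
sup_contraction = max(abs(v) for v in apply_T(P(t), f)) <= max(abs(v) for v in f) + 1e-12
tv_contraction = sum(abs(v) for v in apply_U(nu, P(t))) <= sum(abs(v) for v in nu) + 1e-12
# (T_h f - f)/h should be close to G f for small h.
generator_gap = max(abs((x - y) / h - z) for x, y, z in
                    zip(apply_T(P(h), f), f, apply_T(G, f)))

print(adjoint_gap, semigroup_gap, sup_contraction, tv_contraction, generator_gap)
```

The supremum norm is the natural one for column vectors (functions) and the sum of absolute values plays the role of total variation for row vectors (measures), which is why the two contraction checks use different norms.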
4. The Hille-Yosida theorem.
We have seen that semigroups are characterized by their generators, and have clarified the relation between generator, semigroup and resolvent. But for constructing semigroups, it is important to know precisely when an operator $A$ is the generator of some semigroup. This question is answered by the following:
Theorem (Hille and Yosida). Let $A$ be a linear operator on $V$ with domain $\mathscr{D}_A$. In order that $A$ be the generator of a contraction semigroup it is necessary and sufficient that $A$ satisfy the following three conditions:
(i) $\mathscr{D}_A$ is dense in $V$;
(ii) for every $\lambda > 0$ and $y \in V$ the equation $\lambda x - Ax = y$ has a unique solution $x \in \mathscr{D}_A$;
(iii) the solution in (ii) satisfies $\|x\| \le \|y\|/\lambda$.
Outline: We will find bounded operators $A_\lambda$ which approximate $A$ for $x \in \mathscr{D}_A$ when $\lambda$ is large. Using these operators, we construct semigroups by setting
$$T_t(\lambda) = \exp(tA_\lambda). \tag{1}$$
Finally, we will define $T_t = \lim_{\lambda\to\infty} T_t(\lambda)$, and prove that $\{T_t\}$ is a semigroup with generator $A$.
Proof. The necessity of the above three conditions was proved in section 3. Suppose the conditions hold. From (ii) it is clear that the operator $\lambda I - A$ is invertible, and we write $R_\lambda = (\lambda I - A)^{-1}$. (Of course we anticipate that $R_\lambda$ will become the resolvent of the semigroup we plan to construct.)
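For each $\lambda > 0$, $R_\lambda$ maps $V$ onto $\mathscr{D}_A$, and by (iii) we have $\|R_\lambda\| \le \lambda^{-1}$. We now define $A_\lambda = \lambda A R_\lambda$, $\lambda > 0$; this choice is motivated by the corollary to Theorem 1 of the previous section. Since
$$A_\lambda = \lambda^2 R_\lambda - \lambda I \tag{2}$$
we see that $A_\lambda$ is continuous, with $\|A_\lambda\| \le 2\lambda$. We will now prove that
$$\lim_{\lambda\to\infty} A_\lambda x = Ax \quad \text{for all } x \in \mathscr{D}_A. \tag{3}$$
In order to do this, we first show that
$$\lim_{\lambda\to\infty} \lambda R_\lambda x = x \quad \text{for all } x \in \mathscr{D}_A. \tag{4}$$
Since $A$ commutes with $R_\lambda$ we have
$$\lambda R_\lambda x - x = A R_\lambda x = R_\lambda Ax.$$
Therefore
$$\|\lambda R_\lambda x - x\| \le \lambda^{-1}\|Ax\|,$$
which proves (4).
Next we extend (4) from $\mathscr{D}_A$ to all of $V$. For any $x \in V$ and any $\varepsilon > 0$, we can choose $y \in \mathscr{D}_A$ so that $\|y - x\| \le \varepsilon$. Then
$$\|\lambda R_\lambda x - x\| \le \|\lambda R_\lambda(x-y)\| + \|\lambda R_\lambda y - y\| + \|y - x\| \le 2\varepsilon + \|\lambda R_\lambda y - y\|,$$
and then as $\lambda \to \infty$ we have $\limsup \|\lambda R_\lambda x - x\| \le 2\varepsilon$ by the result (4) already proved. Thus (4) holds for all $x$.
Finally, if $x \in \mathscr{D}_A$, we have
$$\lim_{\lambda\to\infty} A_\lambda x = \lim_{\lambda\to\infty} \lambda R_\lambda Ax = Ax$$
because (4) holds for $Ax$. Therefore (3) is proved; in a sense, this says that the bounded operators $A_\lambda$ approximate $A$ for large $\lambda$.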
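The approximation just described can be watched at work in the simplest possible setting. The following sketch assumes a hypothetical $2\times 2$ generator matrix `A` (so that $A$ is already bounded and everything can be computed exactly); it forms $R_\lambda = (\lambda I - A)^{-1}$, the operators $A_\lambda = \lambda^2 R_\lambda - \lambda I$ of (2), and compares $\exp(tA_\lambda)$ with $\exp(tA)$ as $\lambda$ grows. Function names are illustrative.

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(a, terms=80):
    """exp(a) for a square matrix, summed from its power series."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def scale(a, c):
    return [[c * x for x in row] for row in a]

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def resolvent(a, lam):
    """R_lam = (lam*I - a)^{-1} for a 2x2 matrix, by the explicit inverse."""
    m00, m01 = lam - a[0][0], -a[0][1]
    m10, m11 = -a[1][0], lam - a[1][1]
    det = m00 * m11 - m01 * m10
    return [[m11 / det, -m01 / det], [-m10 / det, m00 / det]]

def yosida(a, lam):
    """A_lam = lam^2 R_lam - lam I, as in equation (2)."""
    r = resolvent(a, lam)
    return [[lam * lam * r[i][j] - (lam if i == j else 0.0) for j in range(2)]
            for i in range(2)]

A = [[-1.0, 1.0], [2.0, -2.0]]  # hypothetical two-state generator
t = 0.5
exact = mat_exp(scale(A, t))
lams = (10.0, 100.0, 1000.0)

generator_gaps = [max_abs_diff(yosida(A, lam), A) for lam in lams]
semigroup_gaps = [max_abs_diff(mat_exp(scale(yosida(A, lam), t)), exact)
                  for lam in lams]
# Bound ||R_lam|| <= 1/lam in the supremum (maximum row sum) norm.
resolvent_bound_ok = all(
    max(sum(abs(x) for x in row) for row in resolvent(A, lam)) <= 1.0 / lam + 1e-12
    for lam in lams)

print(generator_gaps, semigroup_gaps, resolvent_bound_ok)
```

In infinite dimensions $A$ is typically unbounded and $\exp(tA)$ has no meaning as a series, so the limit of $\exp(tA_\lambda)$ is what defines the semigroup; the matrix case only illustrates the mechanism.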
Now we consider the operators $T_t(\lambda)$ defined by (1). We know that $\{T_t(\lambda);\ t \ge 0\}$ is a semigroup for each $\lambda > 0$, and it is easy to see that it is actually a contraction semigroup. In fact by (2) we have
$$T_t(\lambda) = e^{-\lambda t}\exp(t\lambda^2 R_\lambda),$$
so that
$$\|T_t(\lambda)\| \le e^{-\lambda t}\exp(t\lambda^2\|R_\lambda\|) \le e^{-\lambda t}e^{\lambda t} = 1.$$
We must next prove that $T_t(\lambda)x$ has a limit as $\lambda \to \infty$; to do this we will show that
$$\lim_{\lambda,\mu\to\infty} \|T_t(\lambda)x - T_t(\mu)x\| = 0 \quad \text{for } x \in \mathscr{D}_A. \tag{5}$$
For fixed $\lambda$ and $\mu$ and $x \in \mathscr{D}_A$ write
$$f(t) = T_t(\lambda)x - T_t(\mu)x$$
and note that
$$f'(t) = A_\lambda f(t) + g(t), \quad \text{where } g(t) = (A_\lambda - A_\mu)\,T_t(\mu)x.$$
But then we have*)
$$\frac{d}{dt}\{\exp(-tA_\lambda)f(t)\} = \exp(-tA_\lambda)\,g(t),$$
so that, since $f(0) = 0$,
$$\exp(-tA_\lambda)f(t) = \int_0^t \exp(-sA_\lambda)\,g(s)\,ds.$$
Noting that $T_t(\lambda)$ and $A_\mu$ commute (why?) we can write
$$f(t) = \int_0^t T_{t-s}(\lambda)\,T_s(\mu)\,(A_\lambda - A_\mu)x\,ds.$$
Hence, since the operators $\{T_t(\lambda)\}$ are contractions,
$$\|f(t)\| \le \int_0^t \|T_{t-s}(\lambda)\,T_s(\mu)\,(A_\lambda - A_\mu)x\|\,ds \le t\,\|(A_\lambda - A_\mu)x\|. \tag{6}$$
But by (3) the right side tends to 0 as $\lambda,\mu \to \infty$, and so (5) is established. We note that the convergence is uniform for $t$ ranging over any bounded set.
*)This formula involves taking the derivative of a "product" as in elementary calculus, but in quite different circumstances. Think through what is involved!
We now can define the operators $\{T_t\}$ by setting
$$T_t x = \lim_{\lambda\to\infty} T_t(\lambda)x, \quad x \in \mathscr{D}_A.$$
Because convergence is uniform in $t$, $T_t x$ will be continuous in $t$ and $\lim_{t\to 0} T_t x = x$. Moreover, for each $t$,
$$\|T_t x\| \le \|T_t(\lambda)x\| + \|T_t x - T_t(\lambda)x\|.$$
The first term is at most $\|x\|$ since $T_t(\lambda)$ is a contraction while the second term tends to 0 as $\lambda \to \infty$; hence $\|T_t x\| \le \|x\|$. It is now easy to extend the definition of $T_t$ to all $x \in V$, and to verify that the resulting operators are contractions and form a strongly continuous semigroup. (These details are left to the reader.)
Finally, we must show that the generator of the semigroup $\{T_t\}$ which we have constructed is precisely $A$. In any case $\{T_t\}$ has a generator; let us call it $A'$. We will
try to compute $A'x$ for $x \in \mathscr{D}_A$. We have
$$T_t(\lambda)x - x = \int_0^t T_s(\lambda)\,A_\lambda x\,ds. \tag{7}$$
Now
$$T_t(\lambda)A_\lambda x - T_t Ax = \{T_t(\lambda) - T_t\}Ax + T_t(\lambda)\{A_\lambda x - Ax\}.$$
The first term tends uniformly to 0 as $\lambda \to \infty$ for $t$ bounded. The second term is bounded by $\|Ax - A_\lambda x\|$ since $T_t(\lambda)$ is a contraction, and this bound tends to 0 as $\lambda \to \infty$. Therefore
$$\lim_{\lambda\to\infty} T_t(\lambda)A_\lambda x = T_t Ax$$
holds for each fixed $x \in \mathscr{D}_A$ and the convergence is uniform in $t$ over bounded intervals. As a result we can pass to the limit in (7) to obtain
$$T_t x - x = \int_0^t T_s Ax\,ds. \tag{8}$$
Dividing by $t$ and letting $t \to 0+$, we have
$$A'x = \lim_{t\to 0+} \frac{T_t x - x}{t} = \lim_{t\to 0+} \frac{1}{t}\int_0^t T_s Ax\,ds = Ax$$
(since $T_0 = I$). Hence $\mathscr{D}_{A'} \supset \mathscr{D}_A$ and $A'x = Ax$ for $x \in \mathscr{D}_A$; in other words, $A'$ is an extension of $A$.
Since $A'$ is the generator of $\{T_t\}$, $\lambda I - A'$ maps $\mathscr{D}_{A'}$ in a 1-1 manner onto $V$ by Theorem 1 of the last section. But $\lambda I - A'$ agrees with $\lambda I - A$ on $\mathscr{D}_A$ and maps this possibly smaller domain also onto $V$ by condition (ii). It follows that $\mathscr{D}_{A'} = \mathscr{D}_A$ and so $A' = A$. This completes the proof of the theorem.A)
5. Complements.
a) Although we will not need to do so, it is quite
easy to extend the above theory to certain semigroups which are not contractions; instead of condition (i) in the definition of a "semigroup" (page 137) we can assume
(i') $\|T_t x\| \le Ce^{mt}\|x\|$,
where $C$ and $m$ are constants independent of $x$. Nothing in section 2 is changed essentially. In section 3 we note that the resolvent $R_\lambda$ is defined for all $\lambda > m$, and the bound (2) is obviously replaced by
$$\|R_\lambda x\| \le \frac{C\|x\|}{\lambda - m}. \tag{1}$$
The theorems proved in the section remain true with only the obvious change that in Theorem 1 we must take $\lambda > m$.
Problem 4. Prove that the resolvent of a semigroup satisfying (i') is bounded by
$$\|R_\lambda^n x\| \le \frac{C\|x\|}{(\lambda-m)^n} \quad \text{for } n = 1,2,3,\ldots \tag{2}$$
in place of the similar formula, with $C^n$ on the right, which would be obtained by simply iterating (1).
As for the Hille-Yosida theorem (section 4), we expect that condition (i) remains unchanged and that (ii) is the same as before except that "for every $\lambda > 0$" becomes "for every $\lambda > m$." Because of (2) above, however, we must replace condition (iii) by the following:
(iii') the operator $(\lambda I - A)^{-1}$, which exists by (ii), satisfies
$$\|(\lambda I - A)^{-n}\| \le \frac{C}{(\lambda-m)^n}, \quad n = 1,2,\ldots$$
A)It is sometimes useful to note that conditions (ii) and (iii) need only be assumed to hold for all large $\lambda$, rather than for all $\lambda > 0$.
Problem 5. Outline the changes which must be made in the proof of the Hille-Yosida theorem to adapt it to these more general conditions.
b) The theory can be adapted to cases when condition (iii) in the definition of a semigroup, asserting that $T_t x \to x$ as $t \to 0$ for each $x$, is not satisfied. In that case we define
$$M = \{x \in V:\ \lim_{t\to 0+} T_t x = x\},$$
and restrict our attention to this space.
Problem 6. Show that $M$ is a closed subspace of $V$ and that the operators $\{T_t\}$ leave $M$ invariant (i.e., $T_t M \subset M$ for each $t > 0$).
Because of this result the theory we have developed applies to the semigroup $\{T_t\}$ restricted to $M$. The difficulty, of course, is that $M$ may be a very small subspace! But there are also situations in which this trick is quite useful.
c) Recall that an operator $B$ on $V$ is closed if $x_n \in \mathscr{D}_B$, $x_n \to x$ and $Bx_n \to y$ imply that $x \in \mathscr{D}_B$ and $y = Bx$. (That is, the graph of $B$ is a closed subset of $V \times V$.)
Problem 7. Show that the generator $A$ of a contraction semigroup (or the slightly more general type of (a) above) is closed. (Hint: use the proposition on page 139 which implies that
$$T_t z - z = \int_0^t T_s Az\,ds$$
for all $t > 0$ whenever $z \in \mathscr{D}_A$.)
From this fact, recalling the "closed graph theorem," we have at once this
Corollary. If $A$ is a generator whose domain is the entire space $V$, then $A$ must be a bounded operator.
6. Markov transition functions: continuity conditions.
We now return to the study of Markov transition functions. The idea is to apply the general theory we have just been developing to the semigroups $\{T_t\}$ and $\{U_t\}$ which were defined in section 1 by equations (1) and (3). (For technical reasons we will concentrate on $\{T_t\}$.) The "exponential" differential equations associated with $\{T_t\}$ and $\{U_t\}$ will then be respectively the "backward" and "forward" equations associated with the Markov process.
Our first job is to adapt the continuity condition (iii) of section 2 to the present context. Condition (iii) requires that $\|T_t x - x\| \to 0$ as $t \to 0+$ for all $x \in V$. For the Markov semigroup $\{T_t\}$ acting on the space $F(S)$ of bounded, measurable functions on $S$, this assumption becomes
$$\lim_{t\to 0+}\ \sup_{x\in S}\ \left|\int_S p_t(x,dy)\,f(y) - f(x)\right| = 0 \tag{1}$$
for all $f \in F$. In particular, choosing $f = \chi_{\{x\}}$ we get
$$\lim_{t\to 0+} p_t(x,\{x\}) = 1 \quad \text{for each } x \in S. \tag{2}$$
This is the same as condition (e) on page 124, and its validity is thus necessary in order for $\{T_t\}$ to be strongly continuous at $t = 0$ on the space $F(S)$.
As an example (or rather, a large class of examples) we can consider the "compound-Poisson" process defined by (1) on page 124 (see also problem 4 just below it).
Problem 8. Let $\{T_t\}$ be the semigroup on $F(S)$ obtained from the above process. Verify that $\{T_t\}$ is strongly continuous at $t = 0$, and that its generator is the bounded operator $A$ defined by
$$Af(x) = \lambda\int_S p(x,dy)\,[f(y) - f(x)] \tag{3}$$
for all $f \in F$. Verify also that the semigroup $\{U_t\}$ on $M(S)$ is strongly continuous and compute its generator.
Every semigroup is, as we have seen, associated with an "exponential" differential equation $\frac{du}{dt} = Au$, where $A$ is the generator of the semigroup. For the above semigroup $\{T_t\}$, this equation becomes
$$\frac{\partial\psi(x,t)}{\partial t} = \lambda\int_S p(x,dy)\,[\psi(y,t) - \psi(x,t)]. \tag{4}$$
Equation (4) is thus the Kolmogorov backward equation for the compound-Poisson process, and of course it has the solution $\psi(x,t) = T_t f(x)$ for $f \in F$. This function satisfies both the equation and the initial condition $\psi(x,0) = f(x)$, and it is unique under mild additional restrictions (Theorem 3, page 145). The forward equation is obtained similarly from the generator of the semigroup $\{U_t\}$; it may be a useful exercise to derive it.
It is clear, however, that we can't construct a satisfactory general theory in quite this way, since the Brownian motion process -- our most important and interesting example -- does not satisfy condition (2). Perhaps it would be natural to simply try and apply remark (b) in the last section, but we will do something a little different and shift our attention to continuous functions. We accordingly assume from now on that $S$ is a metric space and $\mathscr{S}$ is its $\sigma$-field of Borel sets. The metric itself will be denoted by $\rho$.
Definition 1. A Markov transition function on $(S,\mathscr{S})$ is a Feller function if $T_t f$, defined by
$$(T_t f)(x) = \int_S p_t(x,dy)\,f(y),$$
is a continuous function whenever $f$ is bounded and continuous.
Let $C(S)$ denote the Banach space of bounded continuous functions with the uniform norm. The "Feller property" is equivalent to the assertion that $C(S)$ is an invariant subspace of $F(S)$ for the operators $\{T_t\}$. It is also equivalent to say that the measures $p_t(x,\cdot)$ depend continuously on $x$ in the usual weak topology, for every fixed $t > 0$. Almost all the Markov processes which arise "naturally" in $R^1$ or $R^n$ (including Brownian motion!) do have the Feller property, although of course it is easy to construct examples where the property fails.
The Feller condition deals with continuity of the transition function $p_t(x,E)$ in $x$, and does not, by itself, imply anything about continuity in $t$. However, it is much easier for $\{T_t\}$ to be continuous in $t$ on $C$ than on $F$; in particular, condition (2) is no longer necessary since the function $\chi_{\{x\}}$ is usually not continuous.
We will now explore conditions related to continuity
in
t, in the sense appropriate to semi groups on
Notation:
we write
Ne;(x) '" {y E S; p(y,x)
0
and
XES, and define f E C(S).
Clearly $F$ is a linear functional on $C(S)$, and it is easy to see that it is bounded (its norm is 1) and non-negative. According to the Riesz representation theorem, such a functional always arises by integration with respect to a finite, non-negative Borel measure on $S$. Of course the functional $F$ and so the measure which represents it will depend on the choice of $t$ and $x$; we denote it $p_t(x,\cdot)$. Thus we have
$$T_t f(x) = F(f) = \int_S p_t(x,dy)\,f(y),$$
and we must show that $p_t(x,\cdot)$ is a normal Markov transition function.
Since $p_t(x,\cdot)$ is a finite non-negative measure, (ii) clearly implies that the total mass is 1. Because $T_0 = I$ the measure $p_0(x,\cdot)$ must be just a unit mass at the point $x$. $T_t f(x)$ is continuous in $x$ so the Feller property of $p_t$ comes automatically; this also implies that $p_t(\cdot,E)$ is measurable for each Borel set $E$. We have
$$\int_S p_{t+s}(x,du)\,f(u) = \int_S \left\{\int_S p_t(x,dy)\,p_s(y,du)\right\} f(u)$$
by the semigroup property of $\{T_t\}$ and Fubini's theorem. Since this holds for all $f \in C(S)$, it follows*) that
$$p_{t+s}(x,E) = \int_S p_t(x,dy)\,p_s(y,E),$$
i.e., the Chapman-Kolmogorov equation is satisfied. Thus $p_t$ is a Feller transition function.
*)If $\mu$ and $\nu$ are finite Borel measures on a metric space $S$ and if $\int f\,d\mu = \int f\,d\nu$ for all $f \in C(S)$, then $\mu = \nu$. When $S$ is compact this is the uniqueness part of the Riesz theorem used above, but it is quite easy to prove it directly for any metric space.
We know that $T_t f \to f$ uniformly as $t \to 0+$ for $f \in C(S)$. Even if the convergence were only pointwise, this implies stochastic continuity for $p_t$ by proposition 1. If we rely on proposition 3 of the last section, uniform stochastic continuity follows automatically and so the proof that $p_t$ is normal is complete. Without using proposition 3 we can proceed as follows:
define for
fE(x,y)
{:
E
.
> 0
1
E
p(x,y)
By the triangle inequality
for
p(x,y) < E;
for
p(x,y)
>
E.
we have, for any
x
and
z,
- f (z y)/ < p(x,z) max /f~(x,y)
f
f
Then for any
to x
=g
-hi
is unique.
Af - Af
=
g
at-
Repeating with $-f$ and $-g$, we obtain $\lambda \|f\| \leq \|g\|$, which verifies the third condition necessary to apply the Hille-Yosida theorem. Hence we can conclude that the operator $A$ does in fact generate some contraction semigroup on $C(S)$; let us call it $\{T_t\}$.
Next, we know that the initial-value problem $du/dt = Au$, $u(0) = \mathbf{1}$ has the unique bounded solution $u(t) = T_t \mathbf{1}$. We see that $u(t) = \mathbf{1}$ is also a solution, since $A\mathbf{1} = 0$ by (iv) above. Thus $T_t \mathbf{1} = \mathbf{1}$ for all $t$.
It only remains to prove that $T_t$ is non-negative and then we can apply Theorem 1. We first show that $R_\lambda = (\lambda I - A)^{-1}$ is non-negative. Suppose that $\lambda f - Af = g$ and that $g(x) < 0$ for all $x$. If $f$ attains its maximum at $x_0$, we have $\lambda f(x_0) = g(x_0) + Af(x_0) < 0$ (since $Af(x_0) \leq 0$ at a maximum), so that $f < 0$. But $f = R_\lambda g$ and so this means that $g < 0$ implies $R_\lambda g < 0$; that is, the operators $R_\lambda$ preserve positivity.
The same property holds for $T_t$. In the proof of the Hille-Yosida theorem, we constructed the operators
$$T_t^{(\lambda)} = e^{-t\lambda} \sum_{n=0}^{\infty} \frac{(t\lambda^2 R_\lambda)^n}{n!}.$$
Clearly, since $R_\lambda$ is a positive operator the same holds for $T_t^{(\lambda)}$. Finally, since $T_t = \lim_{\lambda \to \infty} T_t^{(\lambda)}$, we conclude that $T_t$ is itself positive. We have now shown that $A$ generates a semigroup $\{T_t\}$ and that this semigroup satisfies the hypotheses of Theorem 1. Hence $\{T_t\}$ is normal, and Theorem 2 is proved.
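The positivity argument can be watched at work in the smallest possible case. The sketch below is only an illustration under stated assumptions: it takes a two-state generator $G$ with rows $(-a, a)$ and $(b, -b)$, and all function names are hypothetical. It forms $R_\lambda = (\lambda I - G)^{-1}$ by hand, sums the series $e^{-t\lambda} \sum_n (t\lambda^2 R_\lambda)^n / n!$, and compares the result with the known closed form of $\exp(tG)$ for this chain. Every term of the series is non-negative, so the approximant is as well.

```python
import math

def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def resolvent_2x2(a, b, lam):
    """R_lam = (lam*I - G)^(-1) for G = [[-a, a], [b, -b]], inverted by hand."""
    det = lam * (lam + a + b)  # determinant of lam*I - G
    return [[(lam + b) / det, a / det],
            [b / det, (lam + a) / det]]

def yosida_approximant(a, b, lam, t, terms=600):
    """exp(-t*lam) * sum_n (t*lam^2*R_lam)^n / n!, truncated after `terms` terms.

    The truncation is adequate when t*lam is at most about 100.
    """
    R = resolvent_2x2(a, b, lam)
    M = [[t * lam * lam * r for r in row] for row in R]
    total = [[1.0, 0.0], [0.0, 1.0]]  # the n = 0 term (identity)
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        # term becomes M^n / n!; all entries stay non-negative
        term = [[v / n for v in row] for row in mat_mul(term, M)]
        total = [[total[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    scale = math.exp(-t * lam)
    return [[scale * v for v in row] for row in total]

def exact_two_state(a, b, t):
    """Closed form of exp(tG) for the two-state chain."""
    s = a + b
    e = math.exp(-s * t)
    return [[(b + a * e) / s, (a - a * e) / s],
            [(b - b * e) / s, (a + b * e) / s]]
```

For instance, `yosida_approximant(1.0, 2.0, 50.0, 1.0)` should have non-negative entries and unit row sums, and should differ from `exact_two_state(1.0, 2.0, 1.0)` only by an amount of order $1/\lambda$.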
Remark. In the definition of a Markov transition function, we assumed that $P_t(x,S) = 1$ for all $t, x$. It is sometimes useful to relax this by assuming only that $P_t(x,S) \leq 1$. The appropriate changes in Theorems 1 and 2 are simply to delete the conditions that $T_t \mathbf{1} = \mathbf{1}$ or that $\mathbf{1} \in \mathcal{D}$ and $A\mathbf{1} = 0$, respectively. If $P_t(x,S) < 1$, we interpret the difference as the chance that the process, starting from $x$, has "vanished" from the state-space $S$ before time $t$. Sometimes one says instead that the process has been "killed."
We conclude this section with a result of less importance, but which comes in handy for discussing certain examples and will have more applications later on.
Theorem 3.
Suppose that $\{T_t\}$ is a normal semigroup on $C(S)$, that $A$ is the generator of $\{T_t\}$, and suppose that for $f \in C(S)$
$$\lim_{t \to 0+} \frac{T_t f(x) - f(x)}{t} = g(x) \qquad (1)$$
exists pointwise (for each $x \in S$). Then if $g$ is continuous, $f \in \mathcal{D}_A$, the convergence in (1) must be uniform, and $Af = g$.
Proof. Let $\tilde{\mathcal{D}} = \{f \in C(S)$: the limit (1) exists and is continuous$\}$, and write $\tilde{A}f = g$ when $f \in \tilde{\mathcal{D}}$. It is clear that $\tilde{A}$ is an extension of the generator $A$. Therefore $\lambda I - \tilde{A}$ is an extension of $\lambda I - A$ as well. We will show that $\lambda I - \tilde{A}$ is one-to-one on $\tilde{\mathcal{D}}$. Suppose that $\lambda f - \tilde{A}f = 0$, and let $f$ attain its maximum at $x_0$; we find that $\tilde{A}f(x_0) \leq 0$, and so $f(x) \leq 0$ for all $x$. Repeating the argument for $-f$, $f = 0$ and so $\lambda I - \tilde{A}$ is one-to-one. But $\lambda I - A$ maps $\mathcal{D}_A$ onto $C(S)$; $\lambda I - \tilde{A}$ extends it and is also one-to-one. It is very easy to see that this implies that $\tilde{\mathcal{D}} = \mathcal{D}_A$. Clearly, this implies that $\tilde{A} = A$.
8.
Examples. (a) Suppose $S$ is finite (the case of Chapter 6, section 2). Then $F(S) = C(S)$ consists of column-vectors, and a transition function is a family of stochastic matrices satisfying $p(0) = I$, $p(t+s) = p(t) \cdot p(s)$, and $\lim_{t \to 0+} p(t) = I$. We define $T_t f(i) = \sum_j p_{ij}(t) f(j)$. The semigroup $\{T_t\}$ has a generator $G$ whose domain is dense; since $C$ is finite-dimensional, a dense linear subspace must be $C$ itself. Hence
$$\lim_{t \to 0+} \frac{\sum_j p_{ij}(t) f(j) - f(i)}{t} = Gf(i)$$
exists for all $f$. Choosing $f(i) = \delta_{i i_0}$ for fixed $i_0$, we find that $p'_{ij}(0) = g_{ij}$ exists for each $i, j \in S$; this differentiability was proved in a different way in Chapter 6. It also follows from the general theory that $p(t) = \exp(tG)$.
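The relation $p(t) = \exp(tG)$ is easy to experiment with numerically. The following sketch is an illustration, not part of the theory above: the helper names are hypothetical, and the exponential is computed by simply truncating its power series, which is adequate only for small matrices and moderate $t$. It assumes a matrix $G$ that is non-negative off the diagonal with zero row sums, and lets one check that $\exp(tG)$ is then a stochastic matrix satisfying $p(t+s) = p(t)\,p(s)$.

```python
def mat_mul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    """The n-by-n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def is_generator(G, tol=1e-12):
    """True if G is non-negative off the diagonal and every row sums to 0."""
    n = len(G)
    off_diagonal_ok = all(G[i][j] >= 0.0
                          for i in range(n) for j in range(n) if i != j)
    row_sums_ok = all(abs(sum(row)) <= tol for row in G)
    return off_diagonal_ok and row_sums_ok

def expm(G, t, terms=60):
    """exp(tG) from the truncated series sum_k (tG)^k / k!."""
    n = len(G)
    tG = [[t * v for v in row] for row in G]
    total = identity(n)
    term = identity(n)
    for k in range(1, terms):
        # term becomes (tG)^k / k!
        term = [[v / k for v in row] for row in mat_mul(term, tG)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total
```

With a three-state example such as `G = [[-2, 1, 1], [0.5, -0.5, 0], [1, 3, -4]]`, the rows of `expm(G, 0.3)` should sum to 1, and `expm(G, 0.8)` should agree with `mat_mul(expm(G, 0.3), expm(G, 0.5))` up to rounding.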
It is easy to use Theorem 2 above to verify that a matrix $G$ is the generator of a transition function iff it is non-negative off the main diagonal and has row-sums equal to 0. Indeed, any matrix $G$ satisfies (i) and (ii) of the theorem. Condition (iv) is clearly equivalent to row-sums being 0. If the entries of $G$ are also non-negative off the diagonal, it is clear that condition (iii) of Theorem 2 holds. Finally, suppose (iii) holds and consider the function (vector)
$$f(i) = \begin{cases} 1, & i = i_0, \\ -a, & i = i_1, \\ 0 & \text{otherwise.} \end{cases}$$
We take $a \geq 0$, so $f$ has its maximum at $i = i_0$. Then by (iii)
$$Gf(i_0) = g_{i_0 i_0} - a\, g_{i_0 i_1} \leq 0.$$
Taking $a = 0$, we conclude that $g_{i_0 i_0} \leq 0$. But if the non-diagonal term $g_{i_0 i_1}$ were negative, choosing a large enough $a$ would produce a contradiction. Hence $G$ does satisfy the description stated above, which agrees with the condition on generators derived earlier in Chapter 6.
(b)
The previous examples have all involved generators which are bounded operators. The simplest unbounded case is probably the deterministic motion with constant velocity $v$. We take $S = R^{1*}$ and $P_t(x,\{x + vt\}) = 1$, so that $T_t f(x) = f(x + vt)$ for any $f \in F(S)$. Clearly, if $f$ is differentiable we have
$$\lim_{t \to 0+} \frac{T_t f(x) - f(x)}{t} = v f'(x),$$
and if $f' \in C(S)$ the convergence will be uniform according to Theorem 3 in the last section. (It is also easy to see it directly using the mean value theorem.) Hence $C_1 \subset \mathcal{D}_A$ where $C_1 = \{f: f, f' \in C(S)\}$, and for $f \in C_1$ we have $Af = vf'$. Note that $f'(\infty)$ must be 0 since $f$ is bounded, which agrees with the need to have $Af(\infty) = 0$ because the state "$\infty$" is a trap. (A state $x_0$ is called a trap in case $P_t(x_0,\{x_0\}) = 1$ for $t > 0$.)
We now will prove that $\mathcal{D}_A = C_1$. Since $\lambda I - A$ extends $\lambda - v\,\frac{d}{dx}$ and $\lambda I - A$ maps $\mathcal{D}_A$ onto $C$ in a one-to-one way, it is enough to show that $\lambda - v\,\frac{d}{dx}$ actually maps $C_1$ onto $C$. That is, for all large $\lambda$ and all $g \in C$ we must show that
$$\lambda f(x) - v f'(x) = g(x) \qquad (1)$$
has a solution $f \in C_1$. The general solution of (1) is
$$f(x) = e^{\lambda x/v} \left[ K - \frac{1}{v} \int_0^x e^{-\lambda u/v} g(u)\,du \right],$$
where $K$ is an arbitrary constant. As $x \to +\infty$, the term in brackets must approach 0 if there is to be any chance for $f \in C(S)$ and so we have to choose
$$K = \frac{1}{v} \int_0^\infty e^{-\lambda u/v} g(u)\,du.$$
It is then easy to verify (do it) that, with this choice of $K$, $f$ actually is in $C_1$; hence (1) has a solution in $C_1$, and so $C_1 = \mathcal{D}_A$.
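The solution just found can be tested numerically. With the stated choice of $K$ (and $v > 0$) the formula collapses to $f(x) = \frac{1}{v}\int_0^\infty e^{-\lambda s/v} g(x+s)\,ds$. The sketch below evaluates this by the trapezoid rule; the function name, the truncation point `s_max`, and the step count are assumptions made for the illustration. One can then confirm with a central difference that $\lambda f - v f' = g$ holds to within discretization error.

```python
import math

def resolvent_drift(g, x, lam, v, s_max=40.0, steps=40000):
    """f(x) = (1/v) * integral_0^inf exp(-lam*s/v) * g(x + s) ds.

    Trapezoid rule on [0, s_max]; assumes v > 0 and that exp(-lam*s_max/v)
    is negligible.
    """
    ds = s_max / steps
    total = 0.0
    for k in range(steps + 1):
        s = k * ds
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(-lam * s / v) * g(x + s)
    return total * ds / v

def ode_residual(g, x, lam, v, h=1e-3):
    """lam*f(x) - v*f'(x) - g(x), with f' taken as a central difference."""
    f0 = resolvent_drift(g, x, lam, v)
    df = (resolvent_drift(g, x + h, lam, v)
          - resolvent_drift(g, x - h, lam, v)) / (2 * h)
    return lam * f0 - v * df - g(x)
```

For example, with `g = lambda u: 1 / (1 + u * u)`, `lam = 2.0` and `v = 1.5`, the value of `ode_residual(g, x, lam, v)` should be close to zero at any `x`.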
(c) As noted already, the Brownian motion transition function (page 125) is normal on $R^*$. We will now compute its generator. To begin with, we partly follow the method used in proving the theorem on page 130: if $f \in C_2 = \{f: f, f', f'' \in C(R^*)\}$ we write
$$\frac{T_t f(x) - f(x)}{t} = \frac{1}{t\sqrt{2\pi t}} \int_{-\infty}^{\infty} e^{-\frac{(y-x)^2}{2t}} [f(y) - f(x)]\,dy.$$
Then we use the Taylor expansion
$$f(y) - f(x) = f'(x)(y-x) + \tfrac{1}{2} f''(x)(y-x)^2 + r(x,y)(y-x)^2,$$
where $r(x,y) \to 0$ as $y \to x$, so that we have
$$\frac{T_t f(x) - f(x)}{t} = \tfrac{1}{2} f''(x) + \frac{1}{t\sqrt{2\pi t}} \int_{-\infty}^{\infty} r(x,y)(y-x)^2 e^{-\frac{(y-x)^2}{2t}}\,dy. \qquad (2)$$
Since $r(x,y)$ is bounded by $\max|f''|$ and tends to 0 near $y = x$, it is easy to show that
$$\lim_{t \to 0+} \frac{T_t f(x) - f(x)}{t} = \tfrac{1}{2} f''(x).$$
According to Theorem 3, since $f'' \in C$ we actually have uniform convergence, so that $f \in \mathcal{D}_A$ and $Af = \tfrac{1}{2} f''$. Thus $C_2 \subset \mathcal{D}_A$, and $A$ is an extension of $\tfrac{1}{2}(d^2/dx^2)$.
Actually, $C_2 = \mathcal{D}_A$. The method of proof resembles example (b); we simply need to verify that the equation
$$\lambda f(x) - \tfrac{1}{2} f''(x) = g(x)$$
has a solution $f \in C_2$ for any $g \in C$, $\lambda > 0$. Changing notation slightly, we consider the equivalent equation $f'' - \lambda^2 f = g$. The general solution is
$$f(x) = K_1 e^{\lambda x} + K_2 e^{-\lambda x} + \frac{1}{\lambda} \int_0^x \sinh(\lambda x - \lambda u)\, g(u)\,du, \qquad (3)$$
where $K_1$ and $K_2$ are arbitrary constants. We will leave it to the reader to verify that there is a solution to this equation in $C_2$; from this fact we can conclude that $C_2 = \mathcal{D}_A$.
Problem 9. Show that the constants $K_1$ and $K_2$ in (3) are uniquely determined by the requirement that $f$ be bounded, and that the resulting function $f$ actually belongs to $C_2$.
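(d) We give another derivation of the generator of the Brownian motion, using the resolvent. Because of the integration formula
$$\int_0^\infty e^{-\lambda t}\, \frac{1}{\sqrt{2\pi t}}\, e^{-\frac{u^2}{2t}}\,dt = \frac{1}{\sqrt{2\lambda}}\, e^{-\sqrt{2\lambda}\,|u|}$$
we have
$$R_\lambda g(x) = \int_{-\infty}^{\infty} g(y)\, \frac{1}{\sqrt{2\lambda}}\, e^{-\sqrt{2\lambda}\,|x-y|}\,dy.$$
Let us put $R_\lambda g = f$; by direct calculation it is easy to show that
$$f''(x) = -2g(x) + 2\lambda R_\lambda g(x).$$
But since $\lambda f - Af = g$ the last expression above equals $2Af(x)$, and we have proved that $\mathcal{D}_A \subset C_2$ and $Af = \tfrac{1}{2} f''$. This time the second derivative is seen to be an extension of $A$ instead of the reverse.
To prove $\mathcal{D}_A = C_2$, we must show that the solution $f$ of the equation $\lambda f - \tfrac{1}{2} f'' = g$ is unique. (Why?) But the general solution of $\lambda f - \tfrac{1}{2} f'' = 0$ is
$$K_1 e^{\sqrt{2\lambda}\,x} + K_2 e^{-\sqrt{2\lambda}\,x},$$
which clearly is not bounded (and hence not in $C_2$) unless $K_1 = K_2 = 0$. Again we have the result $\mathcal{D}_A = C_2$.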
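The integration formula behind the resolvent kernel, namely that the Laplace transform in $t$ of the Gaussian density $(2\pi t)^{-1/2} e^{-u^2/2t}$ equals $(2\lambda)^{-1/2} e^{-\sqrt{2\lambda}\,|u|}$, can be checked by quadrature. In the sketch below the substitution $t = s^2$ is an assumption of the illustration, used to remove the singularity at $t = 0$, and the function names are hypothetical.

```python
import math

def laplace_of_heat_kernel(u, lam, s_max=12.0, steps=120000):
    """integral_0^inf exp(-lam*t) * (2*pi*t)^(-1/2) * exp(-u^2/(2t)) dt.

    After the substitution t = s^2 the integrand becomes
    (2/sqrt(2*pi)) * exp(-lam*s^2 - u^2/(2*s^2)); the midpoint rule on
    [0, s_max] avoids evaluating it at s = 0.
    """
    ds = s_max / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * ds
        total += math.exp(-lam * s * s - u * u / (2 * s * s))
    return 2 * total * ds / math.sqrt(2 * math.pi)

def closed_form(u, lam):
    """The right-hand side exp(-sqrt(2*lam)*|u|) / sqrt(2*lam)."""
    return math.exp(-math.sqrt(2 * lam) * abs(u)) / math.sqrt(2 * lam)
```

The two functions should agree closely for any real `u` and any `lam` large enough that $e^{-\lambda s_{\max}^2}$ is negligible (with the default `s_max`, any `lam` of order 0.5 or more).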
(e) On page 126 we discussed Brownian motion on $R^+$ with a reflecting barrier at $x = 0$. Let us compute the generator of this process. If $x > 0$, essentially the same argument as before shows that when $f \in C_2$ we must have $Af(x) = \tfrac{1}{2} f''(x)$. At $\infty$, of course, $Af(\infty) = 0$. We will now consider the case $x = 0$. In fact, for $f \in C_2$ we again use a short Taylor series and get
$$\frac{T_t f(0) - f(0)}{t} = \frac{1}{t}\sqrt{\frac{2}{\pi t}} \int_0^\infty e^{-\frac{y^2}{2t}} [f(y) - f(0)]\,dy = f'(0)\,\frac{1}{t}\sqrt{\frac{2}{\pi t}} \int_0^\infty y\, e^{-\frac{y^2}{2t}}\,dy + \tfrac{1}{2} f''(0) + o(1);$$
the error term "o(1)" is similar to that in (2). The integral in the first term can easily be calculated, and the result is simply $f'(0)\sqrt{2/(\pi t)}$. Hence the limit as $t \to 0$ cannot exist unless $f'(0) = 0$, and so this boundary condition is a necessary restriction on $\mathcal{D}_A$. If $f \in C_2$ and satisfies $f'(0) = 0$, then
$$\lim_{t \to 0+} \frac{T_t f(x) - f(x)}{t} = \tfrac{1}{2} f''(x)$$
exists pointwise with $f'' \in C$, and again Theorem 3 implies that the convergence is uniform so that
$$\mathcal{D}_A \supset C_2 \cap \{f: f'(0) = 0\}.$$
Once more, there is actually equality. The proof follows the same path as before: the equation $\lambda f - \tfrac{1}{2} f'' = g$ has a solution $f \in C_2 \cap \{f'(0) = 0\}$ for each $g$. We leave the details as an exercise:
Problem 10. For each $\lambda > 0$ and each $g \in C([0,\infty])$, show that $\lambda f - \tfrac{1}{2} f'' = g$ has a unique solution $f \in C_2$ which satisfies $f'(0) = 0$. Conclude that the domain of the generator $A$ of the reflecting Brownian motion is exactly $C_2 \cap \{f'(0) = 0\}$.
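The role of the boundary condition $f'(0) = 0$ shows up clearly in a small computation. The sketch below is an illustration under an explicit assumption: the reflecting motion started at 0 is taken to be distributed at time $t$ as $\sqrt{t}\,|Z|$ with $Z$ standard normal, which is the density $\sqrt{2/(\pi t)}\,e^{-y^2/2t}$ on $y \geq 0$; the function names are hypothetical. For $f = \sin$ (so $f'(0) = 1$) the difference quotient at 0 grows like $\sqrt{2/(\pi t)}$, while for $f = \cos$ (so $f'(0) = 0$) it settles near $\tfrac{1}{2} f''(0) = -\tfrac{1}{2}$.

```python
import math

def reflected_semigroup_at_zero(f, t, z_max=8.0, steps=8000):
    """T_t f(0) = E f(sqrt(t)*|Z|) for the reflecting motion started at 0."""
    dz = z_max / steps
    root_t = math.sqrt(t)
    total = 0.0
    for k in range(steps + 1):
        z = k * dz
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(-0.5 * z * z) * f(root_t * z)
    # factor 2: the density of |Z| on [0, inf) is twice the normal density
    return 2 * total * dz / math.sqrt(2 * math.pi)

def quotient_at_zero(f, t):
    """(T_t f(0) - f(0)) / t."""
    return (reflected_semigroup_at_zero(f, t) - f(0.0)) / t
```

With `t = 1e-3`, `quotient_at_zero(math.sin, t)` multiplied by `math.sqrt(math.pi * t / 2)` should be close to 1, and `quotient_at_zero(math.cos, t)` should be close to -0.5.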
(f) There are two other simple cases: we mention the results without writing down the corresponding transition functions. When the Brownian motion reaches $x = 0$ for the first time, instead of reflecting it may stick there forever; in this case 0 is a trap. Consequently, we must have $T_t f(0) = f(0)$ for all $t$ and all $f$, so that $Af(0) = 0$. But for $x > 0$, we still have $Af(x) = \tfrac{1}{2} f''(x)$ if $f \in \mathcal{D}_A$. Since $Af$ is continuous, it follows that $f''(0) = 0$. This is called the sticking barrier boundary condition. The domain of $A$ is now just $C_2 \cap \{f: f''(0) = 0\}$, and for functions in this space the formula for the generator is still $Af = \tfrac{1}{2} f''(x)$.
Alternatively, the process may terminate upon reaching 0 by the "killing" of the Brownian "particle." We will have $T_t f(0) = 0$, and so $f \notin \mathcal{D}_A$ unless $f(0) = 0$; the latter is the boundary condition for this case.
this example involves
Pt(x,[O,~])
0
$$T_t f(0) = (1 - ct) f(0) + ct \int_0^\infty f(y)\,dF(y) + o(t),$$
where $c$ is the parameter in the exponential waiting-time distribution at $x = 0$. From this we would get
$$\lim_{t \to 0+} \frac{T_t f(0) - f(0)}{t} = c \int_0^\infty [f(y) - f(0)]\,dF(y).$$
*) See Chapter 8, section 4, for a proof.
But if $f \in \mathcal{D}_A$ and $Af(x) = \tfrac{1}{2} f''(x)$ for $x > 0$ the above limit must also equal $\tfrac{1}{2} f''(0)$. Thus we expect that
$$f''(0) = 2c \int_0^\infty [f(y) - f(0)]\,dF(y) \qquad (5)$$
will be a necessary condition for $f \in \mathcal{D}_A$; in other words, (5) is the appropriate "boundary condition" for this process. The dependence of the condition on values of $f$ far away from the boundary $x = 0$ (unlike our previous boundary conditions) reflects the fact that the motion of the process is not continuous in this case when it reaches the boundary.
Problem 11. Show that the operator $Af = \tfrac{1}{2} f''$, defined on the domain $C_2([0,\infty]) \cap \{f$ satisfies (5)$\}$, is the generator of a normal Markov function.
9. Conclusion.
Here are three more problems, perhaps a little harder than most of the earlier ones:
Problem 12. Consider the Brownian motion with "sticking" boundary at $x = 1$ and reflecting boundary at $x = 0$. (I.e., $Af = \tfrac{1}{2} f''$ and $f \in \mathcal{D}_A$ iff $f''$ is continuous on $[0,1]$, $f'(0) = 0$ and $f''(1) = 0$.) Prove that, for each $x$, $P_t(x,\cdot)$ converges weakly as $t \to \infty$ to the unit measure concentrated at $x = 1$. (That is, $\int_0^1 P_t(x,dy) f(y) \to f(1)$ for every continuous function $f$.) This gives some justification for the name "sticking".
Problem 13. Suppose instead that the boundary at $x = 1$ is "sticky" -- the boundary condition is now $f'(1) + p f''(1) = 0$ for $0 < p < \infty$. In this case, show that $T_t f(x) \to f(1)$ does not hold for all $f$ as $t \to \infty$; the process will never become permanently stuck at $x = 1$.
Problem 14. Let $A$ be the generator of a normal transition function on the compact space $S$, and let $p \in C(S)$ be strictly positive. Show that the operator $B$ defined by $Bf(x) = p(x)Af(x)$, so that $\mathcal{D}_B = \mathcal{D}_A$, is also the generator of a normal Markov function on $S$. (The corresponding new process is said to be obtained from the old one by a "random time change.")
A final remark.
It is clear that semigroup theory has not made everything utterly simple -- that would be too much to expect. What it has done, I hope, is to put the peculiarities of particular Markov processes into a unified context, so that one knows what to try and do, and why. Now there are specific technical problems to deal with, instead of general confusion. I feel it's a big improvement!
An important paper, in which many of these things were done for the first time in a systematic way, was published by Feller in 1952 with the title "The parabolic differential equations and the associated semigroups of transformations." To read it (or part of it) is still a valuable experience.
CHAPTER 8
MARKOV PROCESSES
So far in the second part of this book we have studied Markov transition functions with only informal references to the random variables which actually form the processes themselves. We now turn to this neglected side of our subject. Of necessity, the discussion will have a more measure-theoretical flavor than hitherto. In fact the modern theory of Markov processes has become very complex, because it has been necessary to introduce a great deal of machinery to bridge the gap between intuition about how a process "without memory" should behave, on one hand, and what can be rigorously proved, on the other. Here we will try to keep this machinery as simple as possible, and we will introduce its components only gradually, as they are needed.
1. Construction of processes.
Suppose that $(S, \mathcal{S})$ is a measurable space and that $P_t$ is a Markov transition function on $S$. The idea behind the definition of a transition function suggests the following:
Definition. A stochastic process $\{x_t(\omega): t \geq 0\}$ with values in $S$, defined on some probability space $(\Omega, \mathcal{F}, P)$, is said to be governed by the transition function $P_t$ provided that for some probability measure $\mu$ on $(S, \mathcal{S})$ we have
$$P(x_0 \in B_0, x_{t_1} \in B_1, \ldots, x_{t_n} \in B_n) = \int_{x \in B_0} \int_{y_1 \in B_1} \cdots \int_{y_n \in B_n} \mu(dx)\, P_{t_1}(x, dy_1)\, P_{t_2 - t_1}(y_1, dy_2) \cdots P_{t_n - t_{n-1}}(y_{n-1}, dy_n) \qquad (1)$$
for all $0 < t_1 < t_2 < \cdots < t_n$
0, and
D
E~,
where, of course, each equality is to hold almost surely.
We
begin with the second equality in (1); the other part drops out along the way. It is clear that
is measurable with respect
pu(xt,O)
to
?=t' and hence with respect to a measurable function from S into
~t' since Pu (. ,0) [0,1). What must be
verified, therefore, is that for any set
fA p
u
(xt(w),O)dP = peA n
{X t
+u
is
A E~t C ~t; thus we have A E ~t' we conclude that
B E ~t
and the theorem is
proved.
If~: s(n) ...
Corollary. pect to
u(n),
and
J
l' f
R1
is measurable with res-
'" ( x t + "" ,x t + ) uI un
,
, 1n ' t egra hI e, 1S
then
This result is,of course, an extension of (5) above, and is proved in much the same way using that fact that ~(Xt+
ul
, ... ,X t +U
n
omit the details.
)
is measurable with respect to
~t;
we
An extension to an even larger class of
functions is also possible, which amounts to letting
~
in
(8) depend on infinitely many of the "future" random variables instead of a finite number.
We won't stop to formulate this
now; however, although I won't call it a "problem," it is a good idea for the reader to try to state and prove a result
of this kind independently.
3. Path functions of Markov processes.
So far our study of Markov processes has been based entirely on equation (1) (page 182) which shows how the finite-dimensional distributions of the process are calculated from a Markov transition function. As we saw long ago, however, knowledge of all the finite-dimensional distributions may be insufficient to determine a random process as precisely as one may require. This difficulty arises also in the Markov case -- indeed, the example on page 4/5 is a Markov process and can be obtained from the normal transition function defined by $P_t(x,\{x\}) = 1$ for all $t > 0$ and $x \in [0,1]$. Thus the Markov property by itself does not imply that the path functions of a process must be "reasonable."
But as asserted in Chapter 1, the correct question to ask is a different one: given a transition function and an initial distribution, does there exist a process having the corresponding finite-dimensional distributions whose paths are almost all "nice" in some sense? Kolmogorov's existence theorem does not help much here, because if we use the space $\Omega = S^T$ and the field $\mathcal{F}$ generated by the cylinder sets of $\Omega$, as in proving that theorem we find that $\{\omega: x_t(\omega)$ is continuous$\}$ is not a measurable set. (The same holds for right-continuity, for being locally bounded, and for most other reasonable properties which we might want the paths of the process to have.) Nevertheless, under quite general conditions there does exist a Markov process with nice paths
equivalent*) to any given process:
Theorem 1. Suppose that $S$ is a compact metric space, and let $\{x_t; t \geq 0\}$ be a stochastic process with values in $S$ which is governed by a normal Markov transition function. Then there exists a process $\{\tilde{x}_t\}$, equivalent to $\{x_t\}$, such that except for an $\omega$-set of probability 0 the functions $\tilde{x}_{(\cdot)}(\omega)$ are right-continuous and have left-hand limits for all $t > 0$.
We will refer to such a process for short as a "right-continuous Markov process," although more than just right-continuity is involved. Since equivalent processes have the same finite-dimensional distributions, $\{\tilde{x}_t\}$ is governed by the same transition function as $\{x_t\}$ and has also the same initial distribution. This theorem will be fundamental for our further study of Markov processes. Its proof is rather a long story, since it uses the theory of "martingales" and "super-martingales" which we have not yet discussed. (They have many other uses as well.) We will therefore give a proof of Theorem 1 later, in Chapter 10 which is devoted to martingale theory and some of its applications. For now we will use the theorem without proof -- or, perhaps more logically, we can simply assume that we are dealing with right-continuous processes, while secretly relying on Theorem 1 for assurance that the objects under discussion do really exist.
*) Recall that processes $\{x_t\}$ and $\{y_t\}$ on the same probability space and parameter set $T$ are equivalent provided that $P(x_t = y_t) = 1$ for all $t \in T$.
It is naturally interesting and important to discover when the paths of $\{x_t\}$ are actually continuous for all $t$. The rest of this section is devoted to establishing a useful sufficient condition for path-continuity, and applying it to some of the examples we have considered earlier. The results below have evolved from ideas of Dynkin, Kinney and Dobrushin.
Theorem 2. Let $\{x_t(\omega); t \geq 0\}$ be a measurable*) random process with values in a metric space for each
E
and
> 0
M
e;}, then we
0, then
€
t
~
0)
= 1.
(5)
We know that right-continuity of its paths implies that the process $\{x_t\}$ is measurable (page 6). It is easy to see that condition (4) for the transition function implies (1); in fact
where
~
is the initial distribution, 'and if (4) holds uni-
formly (1) follows immediately.
Hence (2) holds, and since
by assumption the process has no discontinuities except jumps we get (5). The last result gives considerable justification for the idea that conditions like (5) on page
129 are closely
related to continuity of paths; we see that it is actually sufficient for continuity if that condition holds uniformly in
x. Problem 2.
Show that there exists a version of the
Brownian-motion (Wiener) proces.s which has almost-surely continuous path functions. A)Remember that by "right-continuous process" we mean that the paths have limits from the left as well.
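As a hint of what lies behind Problem 2, one can look at how fast the Brownian transition function empties the complement of a neighborhood. The sketch below rests on an assumption, since the statement of condition (4) is not reproduced in this part of the text: we read it as $P_h(x, S - N_\varepsilon(x)) = o(h)$ as $h \to 0$, uniformly in $x$. For Brownian motion that probability is $P(|B_h| \geq \varepsilon) = \operatorname{erfc}(\varepsilon/\sqrt{2h})$, which does not depend on $x$ at all; the function name is hypothetical.

```python
import math

def exit_probability(eps, h):
    """P(|B_h| >= eps) for a Brownian increment over time h; the same for every x."""
    return math.erfc(eps / math.sqrt(2.0 * h))

def exit_ratio(eps, h):
    """exit_probability / h, which should tend to 0 if the probability is o(h)."""
    return exit_probability(eps, h) / h
```

For `eps = 0.5` the ratio `exit_ratio(eps, h)` drops extremely quickly as `h` runs through 0.1, 0.01, 0.001, which is consistent with the $o(h)$ reading of the condition.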
It is usually difficult to verify condition (4) directly, since it is rather exceptional when any simple formula for the transition probability function is available. By making stronger assumptions, however, we can give a useful criterion in terms of the generator of the process:
Theorem 3. Let $\{x_t\}$ be a right-continuous Markov process governed by a normal transition function. Suppose that, for each $\varepsilon > 0$ and $x \in S$, there exists $f \in \mathcal{D}_A$ such that (i) $f \geq 0$ on $S$, (ii) $f(y) > 0$ for $y \notin N_\varepsilon(x)$, and (iii) $f$ and $Af$ both vanish on some neighborhood of $x$. Then
$$P(x_t(\omega) \text{ is continuous for all } t \geq 0) = 1.$$
Proof. Suppose that condition (4) of the corollary above does not hold. Then there is an $\varepsilon > 0$ such that $P_h(x, S - N_{2\varepsilon}(x))$ is not $o(h)$ uniformly in $x$; in other words, for this $\varepsilon$ there exist $\delta > 0$, $h_n \to 0$ and $x_n \in S$ such that
$$P_{h_n}(x_n, S - N_{2\varepsilon}(x_n)) > \delta h_n. \qquad (6)$$
Since $S$ is compact, we can assume that $x_n$ tends to some limit $x$; then when $\rho(x_n, x) < \varepsilon$ we will have $S - N_\varepsilon(x) \supset S - N_{2\varepsilon}(x_n)$. Hence (6) implies that
$$P_{h_n}(x_n, S - N_\varepsilon(x)) > \delta h_n \qquad (7)$$
for all large $n$. Now using this $x$ and $\varepsilon$, consider the function $f$ described in the hypothesis of the theorem. Since $f$ is positive on the compact set $S - N_\varepsilon(x)$, let $c > 0$ denote its minimum there. By (7) we have
$$T_{h_n} f(x_n) = \int_S P_{h_n}(x_n, dy) f(y) \geq \int_{S - N_\varepsilon(x)} P_{h_n}(x_n, dy) f(y) > c\,\delta h_n. \qquad (8)$$
But, since $f \in \mathcal{D}_A$, the convergence
$$\lim_{t \to 0} \left[ \frac{T_t f(y) - f(y)}{t} - Af(y) \right] = 0$$
is uniform in $y$. Substituting $t = h_n$ and $y = x_n$, and noting that $f(x_n) = Af(x_n) = 0$ for large $n$ (since $x_n$ is then close to $x$), we find a contradiction with (8). If such functions $f \in \mathcal{D}_A$ exist as postulated, therefore, (6) is impossible and so we see condition (4) does hold. The conclusion of Theorem 3 therefore follows from the corollary to Theorem 2 above.
This theorem is easy to apply in many cases.
We have, for example, the following:
Corollary. Let $\{x_t\}$ be a right-continuous Markov process on $R^{1*}$ or $R^{+*}$ whose generator is of the form
$$Af(x) = \frac{b(x)}{2} f''(x) + a(x) f'(x)$$
with $a(x)$ and $b(x)$ continuous and $b(x) > 0$. (See Chapter 7, section 8i.) Suppose the domain of $A$ is $C_2$ restricted by boundary conditions such as
at each finite end-point $a_i$ of the state-interval. Then $\{x_t\}$ has continuous paths with probability 1.
Proof. As an example, suppose $S = R^{+*}$, and consider at first $x \in (0,\infty)$. If $f$ vanishes in a neighborhood of $x$ so does $Af$, since $A$ is a differential operator. Thus it is only necessary to find a $C_2$ function satisfying the boundary condition at 0 which vanishes near $x$ and is positive elsewhere; there is no difficulty at all in doing this. The cases $x = 0$ and $x = \infty$ must be examined separately but they are just as simple. The reader can think over the details of the various cases more easily than they can be written down.
To understand the working of our criterion for continuity a little better, it is useful to think through why it does not hold in the case of example 8j; even though $A$ is still a differential operator in that case, the boundary condition is no longer of local character. Another example:
when is the condition satisfied in the case (example 8a) of a finite Markov chain?
4. Holding times.
Suppose that a Markov process $\{x_t\}$ on $S$ begins in state $x$. How long does it stay at $x$ before leaving for the first time? In other words, if $\tau_x = \inf\{t \geq 0: x_t \neq x\}$, what can be said about the distribution $F_x(t) = P_x(\tau_x \leq t)$?
The first thing to notice is that in general $\tau_x$ may not be a random variable. In fact, if $S = R^1$ and the random variables $\{x_t\}$ and $\Omega$, $\mathcal{F}$ are those constructed in the proof of Kolmogorov's existence theorem, $\tau_x$ is definitely not $\mathcal{F}$-measurable. This melancholy situation is redeemed by Theorem 1 above, however, since when $\{x_t\}$ is right-continuous we have
$$\{\omega: \tau_x \geq t\} = \{\omega: x_s = x \text{ for all } s < t\} = \{\omega: x_s = x \text{ for all rational } s < t\} = \bigcap_{\text{rational } s < t} \{\omega: x_s = x\}.$$
O',
At the other extreme,
a trap.
Using the right continuity of the paths, we have
$$f(t) = \lim_{n \to \infty} P_x(x_{it/n} = x,\ i = 0, 1, \ldots, n-1) = \lim_{n \to \infty} [P_{t/n}(x,\{x\})]^{n-1}.$$
Thus the theorem will be proved as soon as we establish the following:
Lemma. Suppose that $a(t)$ is a function on $[0,\infty)$ with values in $[0,1]$ such that $\lim_{n \to \infty} a(t/n)^n = f(t)$ exists for every $t > 0$, where $f(t)$ is non-increasing. Then $f(t) = e^{-ct}$ for some constant $c \in [0,\infty]$.
Proof. Formally, for any $\alpha > 0$ we can simply write
$$f(\alpha t) = \lim_{n \to \infty} a\!\left(\frac{\alpha t}{n}\right)^{n} = \lim_{m \to \infty} a\!\left(\frac{t}{m}\right)^{\alpha m} = f(t)^\alpha.$$
The only difficulty with this is that making the change of variable $n = \alpha m$ may lead to non-integral values of $m$. If $\alpha$ is rational, however, $m$ will be an integer infinitely often, and so the limit relation $f(\alpha t) = f(t)^\alpha$ is justified in this case. Since $f$ is monotonic the relation then must hold for all $\alpha$, and so fixing $t$ and writing $s$ for $\alpha t$ and $c$ for $-\log f(t)/t$ we get $f(s) = \exp(-cs)$ as claimed.
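The content of the lemma is easy to see in a concrete case. In the sketch below, which is an illustration and not part of the proof, $a(s)$ is taken to be the probability $p_s(x,\{x\})$ of being found at $x$ after time $s$ in a two-state chain that leaves $x$ at rate $c$ and returns at rate $d$; this choice, and the function names, are assumptions made for the example. Then $a(s) = 1 - cs + O(s^2)$, and the powers $a(t/n)^n$ approach $e^{-ct}$, the exponential holding-time law.

```python
import math

def stay_probability(c, d, s):
    """p_s(x,{x}) for a two-state chain: rate c out of x, rate d back into x."""
    total_rate = c + d
    return (d + c * math.exp(-total_rate * s)) / total_rate

def nth_power(a, t, n):
    """a(t/n)**n, whose limit as n grows is the function f(t) of the lemma."""
    return a(t / n) ** n
```

With `c = 2.0`, `d = 3.0` and `a = lambda s: stay_probability(c, d, s)`, the value `nth_power(a, 0.7, 10**6)` should be very close to `math.exp(-2.0 * 0.7)`, and doubling `t` should approximately square the result, in line with $f(\alpha t) = f(t)^\alpha$.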
This proves the lemma and the theorem.
The discussion so far has concerned only the first holding time in $x$, assuming $x$ is the initial state. The
result can be strengthened by using the Markov property: Theorem 2. governed by where
AE
Let
be a right-continuous process
-----
Pt' and suppose that ~t'
for
P~({Xt
= x} n
A)
>
0
Then t u is deter-
that
We must have
mined by n Z + k'
{Ti' i > Z}
and must produce the values
tl - u, t1 + t z
at times
-
that (Z) defines the original path. peNt
1
= k'
u
k'
and
in just the same way
Thus we have
+ I, N = k' + 1 + n 2 ) tl+t z
Substituting from (3) for the probability in the integrand (because of the induction hypothesis), this becomes k' k' nZ A (tl-u) -HZ (Hz) } -----e du k"•
e
which agrees with (3). k
=
-At
1
n Z·'
(At l )
k' +1
(k' + 1) !
e
-At z (At Z)
nZ
Thus (3) holds in all cases where $k = 2$. Formula (3) for general $k$ can now be established by an induction on $k$ in just the same way we passed above from $k = 1$ to $k = 2$. The details of this induction step will be omitted.
There is nothing very surprising, incidentally, about this dual way of describing the Poisson process -- either as a Markov process with Poisson-distributed independent increments or as a "counting" of the number of exponentially-distributed waiting times which have elapsed. One can see the same phenomenon in a simple model using discrete Bernoulli trials. Suppose for each $n = 1, 2, \ldots$ that $X_n = 1$ or $0$ with probability $p$ or $q = 1 - p$ and that the $\{X_n\}$ are independent; then set $N_n = \sum_{i=1}^n X_i$ (with $N_0 = 0$). The distribution of $N_n$ is obviously binomial for each $n$, and $\{N_n\}$ has independent increments. On the other hand, the waiting times between occurrences of a "one" are independent and have a common geometric distribution, and the relation of $N_n$ to the number of such waiting times which have elapsed before time $n$ is analogous to our construction of the Poisson process. Moreover, if $p \to 0$ and time is measured in large units by setting $n = [\lambda t/p]$, the discrete model approaches the Poisson process with parameter $\lambda$ as a limit.
Problem 4. Find a way to formulate rigorously the above statement about the approximation of the discrete process to the Poisson -- and then prove your assertion.
In much the same way -- using the idea of waiting-times in each state -- one can directly construct the paths of any finite-state Markov chain (Chapter 6) and lots of more complicated processes as well. But of course this won't work for Brownian motion, since in that case $P_x(x_t = x) = 0$ for any $t > 0$. A very nice direct construction of Brownian motion as the sum of an infinite series of functions of $t$ with random coefficients is described in Chapter 4 of my book Probability (I did not discover it!); Wiener's original construction used random Fourier series. We won't go further with such things in this book. In my opinion, though, comparison of special methods with general ones is often productive and almost always interesting, and no one can "really understand" probability theory without doing some of that for herself.
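The approach of the Bernoulli model to the Poisson process can at least be observed at a single time point before one attempts Problem 4. The sketch below is an illustration with hypothetical names: it compares the binomial distribution of $N_n$, with $n = [\lambda t/p]$, against the Poisson distribution with mean $\lambda t$, and reports the largest gap between the two probability functions over $k = 0, \ldots, k_{\max}$. It says nothing about the joint distributions at several times, which a rigorous formulation would have to address.

```python
import math

def binomial_pmf(n, p, k):
    """P(N_n = k) for n independent Bernoulli(p) trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(mean, k):
    """P(N = k) for a Poisson variable with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def max_pmf_gap(lam, t, p, k_max=30):
    """Largest |binomial - Poisson| over k = 0..k_max, with n = [lam*t/p]."""
    # the small offset guards against floating-point round-down of lam*t/p
    n = math.floor(lam * t / p + 1e-9)
    return max(abs(binomial_pmf(n, p, k) - poisson_pmf(lam * t, k))
               for k in range(k_max + 1))
```

For $\lambda = 2$ and $t = 1.5$, the gap returned by `max_pmf_gap(2.0, 1.5, p)` should shrink roughly in proportion to `p` as `p` runs through 0.1, 0.01, 0.001.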
CHAPTER 9
STRONG MARKOV PROCESSES
Suppose that $\{x_t\}$ is a Markov process. If it is known that $x_{t_0} = x$, the process can be thought of as "beginning afresh" thereafter as though $x$ had been its initial state.
In particular, we have seen that (i)
But suppose that $t_0$ is not fixed, but instead is itself a random variable. Does any analogous statement hold in this case?
It is quite natural to expect that something similar to (i) should be true. For example, if $t_0$ is the random moment when a continuous Brownian motion first reaches the state $x = a$ (having started elsewhere), then it is plausible and true that the motion behaves subsequently as though it were beginning in state $a$ with no "memory" of its past. On the other hand, if $t_0$ means a certain fixed length of time before the process reaches $a$, then clearly (i) does not hold.
The tasks of the present chapter are to formulate precisely this idea of beginning afresh at "suitable" random times, to show that many processes do satisfy the resulting "strong Markov property," and to explore some of the consequences. In the following pages, of course, we will only make a beginning with these matters which lead into much of the recent research on Markov processes. In particular, section 6 is a brief introduction to a substantial chapter in contemporary probability theory. Some suggestions for further reading can be found in the bibliography.
1.
Families of $\sigma$-fields.
We have previously defined and discussed the Markov property using the
$\sigma$-fields
~t' ~t
and
~t
(page 9)
which are generated by the random variables of a process $\{x_t\}$. A small change in our definition and point of view turns out to be quite useful:
Definition 1. Let $(\Omega, \mathcal{F}, P)$ be a probability space, let $T$ be a subset of $R^1$, and suppose that, for each $t \in T$, there is a sub $\sigma$-algebra $\mathcal{F}_t$ of $\mathcal{F}$ such that $\mathcal{F}_{t_1} \subset \mathcal{F}_{t_2}$ when $t_1 \leq t_2$. Let $\{x_t\}$ be a random process on $(\Omega, \mathcal{F}, P)$. Then $\{x_t\}$ is said to be adapted to the fields $\{\mathcal{F}_t\}$ provided that the function $x_t(\cdot)$ is $\mathcal{F}_t$-measurable for every $t \in T$. Naturally every process
past, that is, to the fields smallest possible ones.
{x t }
~t'
In fact
is adapted to its own
and these fields are the {x t }
is adapted to any
non-decreasing family of fields such that
~ :>~t
for all
t, and to no others.
However, important aspects of the re-
lationship between the variables and the fields may be distorted if the latter are "too big."
We will discuss this
idea a little in the case of Markov processes.
Definition 2. Let $\{x_t\}$ be a random process on $(\Omega, \mathcal{F}, P)$ with $T = R^+$ or $R^1$, and let $\{\mathcal{F}_t\}$ be an increasing family of sub $\sigma$-fields of $\mathcal{F}$. We say that $\{x_t\}$ has the Markov property with respect to the fields $\{\mathcal{F}_t\}$ provided that (i) $\{x_t\}$ is adapted to $\{\mathcal{F}_t\}$ and (ii)
(a. s. )
for every
t E T
and every set
BE
(1)
~t'
This definition reduces to our earlier one if
~
=
Y0
m>O
('I
hk
2.
l }] a + -m
The set in brackets on the right is in
Y a+m -1 and so the limit as m-+-oo belongs to ~+' or; because of right-continuity, to ~. This proves that A E SZ, so
nn
'!
(vii)
~
n
Let
C
~.
{x t }
on some metric space
be a right-continuous Markov process S, where the Markov property holds with
respect to fields { ?"t}. fine
T
Then
T
inf{t
=
~
Let
0: x t EU}
{w: x t EU
{w: T < a}
U).
{~}.
{w: x t EU for some rational t < a} = =
Obviously this set is in (viii)
be an open set and de-
(the first-passage time to
is a stopping time of
Proof.
U CS
t < a}
for some U
t 0
be generalized.)
p(x 1 , ... ,x n ), and assume for
everywhere in
Rn.
(This can easily
In elementary probability, one defines the
conditional density of to be the function
Xl"" ,Xn
Xn
given
258
APPENDIX 2
n Il < ""t as defined above can
It is an important exercise to verify that if then the conditional expectation of
Xn
EClx
be computed in this "elementary" way:
f~"" xp(X I
•· ..• xn_l,x)dx
f~