137 34 14MB
English Pages 296 [299] Year 2016
Annals of Mathematics Studies Number 4 4
ANNALS OF M ATHEM ATICS ST U D IES Edited by Robert C. Gunning, John C. Moore, and Marston Morse 1. Algebraic Theory of Numbers, by
H
W
erm ann
3. Consistency of the Continuum Hypothesis, by
eyl
K urt G odel
11. Introduction to Nonlinear Mechanics, by N.
Kryloff
16. Transcendental Numbers, by
S ie g e l
Carl L
u d w ig
and N.
B o g o l iu b o f f
17. Probleme General de la Stabilite du Mouvement, by M. A. L ia p o u n o f f 19.
Fourier Transforms, by
S. B o c h n e r
and
K. C h a n d r a s e k h a r a n
20. Contributions to the Theory of Nonlinear Oscillations, Vol. I, edited by S. 21. Functional Operators, Vol. I, by
Jo h n
22. Functional Operators, Vol. II, by
von
Jo h n
N
N
von
Zygm und,
W.
27. Isoperimetric Inequalities in Mathematical Physics, by G. 28. Contributions to the Theory of Games, Vol. II, edited by 29.
efschetz
eum ann
24. Contributions to the Theory of Games, Vol. I, edited by H. W. 25. Contributions to Fourier Analysis, edited by A. d e r o n , and S. B o c h n e r
L
eum ann
and G.
Po lya
H.
and A. W.
Kuhn
T ransue, M. M
W.
Kuhn
and A. W.
L. A h lfo rs
A.
P. C a l
Sz e g o
Contributions to the Theory of Nonlinear Oscillations, Vol. II, edited by S.
30. Contributions to the Theory of Riemann Surfaces, edited by
T ucker
orse,
L
T ucker
efschetz
et al.
31. Order-Preserving Maps and Integration Processes, by E d w a r d J. M c Sh a n e 33. Contributions to the Theory of Partial Differential Equations, edited by L. and F . Jo h n
B ers,
S.
B ochner,
34. Automata Studies, edited by C. E. Sh a n n o n and J. M c C a r t h y 36. Contributions to the Theory of Nonlinear Oscillations, Vol. Ill, edited by 37. Lectures on the Theory of Games, by
H arold
W.
Kuhn.
S. L
efsch etz
In press
38. Linear Inequalities and Related Systems, edited by H. W. K u h n and A. W. T u c k e r 39.
Contributions to the Theory of Games, Vol. Ill, edited by M. P. W o l f e
40. Contributions to the Theory of Games, Vol. IV, edited by R.
D
D
resher,
uncan
L
uce
A. W.
and A. W.
41. Contributions to the Theory of Nonlinear Oscillations, Vol. IV, edited by S. 42. Lectures on Fourier Integrals, by S.
Bochner.
T ucker
L
43. Ramification Theoretic Methods in Algebraic Geometry, by S. A b h y a n k a r 44. Stationary Processes and Prediction Theory, by
Ft. F u r s t e n b e r g
45. Contributions to the Theory of Nonlinear Oscillations, Vol. V, edited by S. L e f s c h e t z 46. Seminar on Transformation Groups, by A. B o r e l et al. 47. Theory of Formal Systems, by R. Sm u l l y a n
T ucker
efsch etz
In press
and
STATIO NAR Y
PROCESSES
A N D PREDICTION THEORY BY
Harry Furstenberg
PRINCETO N, N E W JERSEY PR IN C E TO N UNIVERSITY PRESS
i960
Copyright © 1 9 6 0 , by Princeton University Press
All Rights Reserved L. C. Card 6 0 - 1 2 2 2 6
Printed in the United States of America
To the Memory of Jekuthiel Ginsburg
ACKNOWLETOEMEIWS
This study was written while the author was at Princeton University and is an elaboration of his Princeton doctoral dissertation. During the period of his work the author was supported by grants from the National Science Foundation and the Office of Ordnance Research, U. S. Army Ordnance, for which he wishes to thank these agencies. In addition the author is indebted to Princeton University and par ticularly to Professors Solomon Bochner and William Feller for their friendship and encouragement and for their suggestions which have been most helpful in guiding the present work. Finally, the author wishes to express his gratitude to Dr. Leon Ehrenpreis without whose kind encouragement this work would not have advanced beyond its preliminary stages.
CONTENTS Page I N T R O D U C T I O N ........... ....................... ..............................
1
C H A P T E R 1.
8
S T O C H A S T I C P R O C E S S E S A N D S T O C H A S T I C S E Q U E N C E S .......... §1.
*
P reliminar Prelimin aireise:s :C oCmommumtuattaitviev eC C-alge - a l gb er ba rs a s .......... 1.1o C * - a l g e b r a s ...................................... ......................... 1.2. EE--aallggeebbrraass............................. ..................................
8 8 11
§2.P r e l i minarie s : S t a t i o n a r y S t o c h a s t i c P r o cesses... 14 2.1. O n e - s i d e s and T w o - s i d e d P r o c e s s e s ........... 14 2.2. The S a m p l e P aths of a P r o c e s s ................ 18 2-3S u b p r o c e s s e s ........................... ......... 20 2.4. E r g o d i c i t y . c ..................................... 21 §3 . S t o c h a s t i c S et qo cu he an scteisc................................... S e q u e n c e s ..................... 3 • 1 • DDeeffiinniittiioonnss....................................... ...................................... 3 .2 . DDiissttrriibbuuttiioonn FFuunnccttiioonnss of ofS tSotcohcahsatsitci c S e q u e n c e s ...................................... ......................................... 3 .3 . EExxaammpplleess..........................................
23 23 23
2k 24 26
§4.The TheP rocess P roces sA s As so sc oi ca it ae td e dw iwtiht ha aS tSotcohcahsatsitci cS eSqeuqeunecnec e 27 27 4.1. The The AAbbssttrraacctt PPrroocceessss of of aaS tSotcohcahsatsitci c S e q u e n c e . . . .................................... 27 4.2. The The PPrroocceessss XX((ss))............................... .............................. 29 4.3- EExxaammpplleess........ ...... .................................. ................................. 32 §55*. R e g u l a r S e q u eRnecgeusl..... a r S e................................. q u e n c e s ........ ........ ...... 5 • 1 • DDeeffiinniittiioonnss....................................... ...................................... 5 .2 . EExxiisstteennccee of .............. 5*2. of RReegguullaarr SSeeqquueenncceess............... 5*3* GGeenneerriicc PPooiinnttss................................... .................................. §6. E x a m p l e s . . . . . ............... E x a m p l e s . . . ............................. . .......... ............... 6.1. The The AAllmmoosstt PPeeriodic r i o d i c CCaassee....................... ...................... 6 .2 . The The RRaannddoomm CCaassee.................................. ................................. ............. 6 .3 . Operati Operat ioonnss on on RReegguullaarr SSeeqquueenncceess.............. §7*
C H A P T E R 2. 2.
33 33 33 37 38
41 41 k2 42 k2 42
E r g o d i c P r o p e r t i e s of R e g u l a r S e q u e n c e s ......... 7.1 . C r i t e r i a f o r E r g o d i c i t y ....................... 7.2. E x a m p l e s .........................................
kk . 44 53
T H E P R E D I PC RT EI DO IN C TP IR O NB L PE RM OS B LF EO MR S SFEOQRU ESNECQEUSE................... N C E S .............
56
§88.. The The S upport S upp o rof t of the the P r e Pd ri ec dt i co tn i Mo ne aMseuarseu.............. r e ........... 8.1. o .. .................................. 8 . 1 . LL--sseeqquueenncceess...................................... 8 .2 . LLiittttlleewwoooodd!s !s TThheeoorreemm........................... ........ ..................
56 56 59
§9«
60 60 61 61
D e t e r m i n i s Dt ei tc e rSmeiqnuiesntciecs ............................... S e q u e n c e s ................... 9-1- DDeeffiinniittiioonn.......................................... .................................. 9.2 o CCoonnssttrruuccttiioonn of of DDeetteerrmmiinniissttiicc Sequenc Sequenceess........
§10. §1 0 .P rPerdeidcitcitoino nM eMaesausruerse sand andC o Cn ot ni tn iu no u os u sP rPerdeidcitcatbaibli ltiyt y 63 63 10.1. 1 0 . 1 . P r e d i c t i o n M e a s u r e s ............................ 63 1 0 .2 . P r e l i m i n a r y L e m m a s .............................. 65 1 0 .3 . A C r i t e r i o n f o r C o n t i n u o u s P r e d i c t a b i l i t y . 68 10.4. C o n t i n u o u s P r e d i c t a b i l i t y at E v e r y Point.. 69 1 0 .5 . C o m p o s i t i o n of P r e d i c t i o n M e a s u r e s .......... 70 10.6. R e a l i z a t i o n of P r e d i c t i o n M e a s u r e s .......... 72 C H A P T E R 3-
E X A M P L E S A N D C O U N T E R E X A M P L E S ............................... §11.
R a n d o m and o fdf MSaerqkuo ef nf c eS se ......................... R a n dMoamr kan q u e n c e s ................
75 75
CONTENTS Page
11.1. Preliminaries............. ........... 1 1 .2 . Continuous Predictability of Random, Markoff, and M-Markoff Sequences....... §1 2 . A Non-Continuously Predictable Sequence........ 1 2 .1 . Construction of the Sequence............ 1 2 .2 . Application of the Prediction Procedure... 1 2 .3• Another Form of the Sequence............ §13- A Class of Deterministic Sequences... ........ . 13 •1• -Sequences.......... ..... ........... CHAPTER k . SUBPROCESSES OF MARKOFF PROCESSES. ....... . .......... §14.
Automorphism Groups.......................... . . 14.1. Definitions........................ . .... 14.2. An Illustration.......... ......... §15* Linear Transformations of Cones................ 15-1* The Projective Metric......... ..... 1 5 -2 . Projectively Bounded Transformations..... §1 6 . A Sufficient Condition for Continuous Predictability............. .......... . 1 6 .1 . Automorphisms and Their Adjoints..... 1 6 .2 . Construction of the Prediction Measures... §17- Normality and Continuous Predictability......... 17•1• Reduction to Markoff Processes..... 17-2. Consequences of Normality............... 17*3* An Example...................... .......
75 76 78 78 80 81 82 82 88 88 88
91 91 91 93 95 95 1 00 1 02 1 02
103 108
CHAPTER 5 . STOCHASTIC SEMIGROUPS AND CONTINUOUS PREDICTABILITY...
110
§1 8 . Stochastic Semigroups.......................... 1 8 . 1 . Definitions................... ......... 1 8 .2 . Ergodicity of X and £* (X)............ 1 8 .3 . Application to Symmetric Processes....... §19- Realization of Stochastic Semigroups...... 19-1- Linear Semigroups...................... 1 9 *2 . Examples........................ ....... 19*3* Farther Examples........................ §2 0 . Continuous Predictability for the Processes of Stochastic Semigroups..................... . 2 0 .1 . A Criterion for Continuous Predictability. 2 0 .2 . The Linear Case......... .......... . 2 0 .3 . Applications............................
1 10
125 127
STATISTICAL PREDICTABILITY..........................
130
§2 1 . The Continuously Predictable Cover of a Process.. 21.1. Definitions............................. §2 2 . Statistical Predictability..................... 2 2 .1 . Statistical Determination............... 2 2 .2 . Properties of Statistical Determination... 22.3* Statistical Predictability.............. §2 3 . The Continuously Predictable Cover of a Finitely Valued Process....................... ....... 2 3 .1 . An Identity for Prediction Measures...... 23*2. Construction of .................... A. §24. Applications to Finite Dimensional Processes....
130 130 132 132 134 136
CHAPTER 6 .
110 1 12
113 1 14 1 14 115 121
122 122
138 138 140 145
CONTENTS Page
24.1. The Canonical Semigroup.................. 24.2. The Sample Space ....................
145 147
CHAPTER 7- INDUCTIVE FUNCTIONS..................................
152
§2 5 . Inductive Functions; An Example...... .......... 2 5 .1 . Preliminaries............................ 2 5 .2 . The Equation zR+1 = xn+1 + Examples. 25-3- A Condition for Regularity............... 25.4. Existence of Representative Sequences.... §2 6 . G-roup-Valued Inductive Functions... ............ 2 6 .1 . The Equation zn+1 = xn+1zn .............. 2 6 .2 . Applications to Equidistribution......... §2 7 . Periodic Subsequences of Regular Sequences...... 2 7 .1 . Connection with Inductive Functions...... 2 7 .2 . Regularity of a-Sequences................ 2 7 .3 . Existence of Fourier Coefficients........
153 153
CHAPTER 8. INDUCTIVE FUNCTIONS AND MARKOFF PROCESSES............ §2 8 . Inductive Functions of Markoff Processes........ 2 8 .1 . Preliminaries............................ 2 8 .2 . A Counterexample......................... 2 8 .3 . A Law of Large Numbers for Markoff Processes............................. 2 8 .4 . Compact Inductive Functions of Random Processes............................. 2 8 .5 . Compact Inductive Functions of Markoff Processes............................. §2 9 . Compact Inductive Functions of Markoff Sequences. 2 9-1* Uniqueness of Inductive Functions.. 2 9 .2 . Application to Markoff Sequences.........
158 160 1 64 165 165 171 172 172
175 178
181 181 181 1 85 186
192 1 96
199 199 2 02
CHAPTER 9- PROJECTIVE INDUCTIVE FUNCTIONS AND PREDICTION........
207
§3 0 . Projective Inductive Functions.................. 3 0 .1 . Notation................................ 3 0 .2 . Preliminary Lemmas....................... 3 0 .3 . Projective Inductive Functions of a Markoff Process....................... 3 0 .4 . The Range of a Projective Inductive Function.............................. §3 1 • Projective Inductive Functions of Markoff Sequences................................ . 31-1* Statement of the Problem................. 31.2. A Reformulation......................... 31.3* A Sufficient Condition................... 31.4. A Further Sufficient Condition........... §3 2 . Adjoint Processes.............................. 3 2 .1 . Conditional Distributions................ 3 2 .2 . The Fundamental Lemma.................... 3 2 .3 * Restricted Solution to the Problem....... 32.4. The Remaining Alternative................ 3 2 .5 * Some Remarks.............................
2 07 207
210 214 226 228 228 230 232 239
241 241 242 252 260 274
CONTENTS
Page §33-
Application to Statistical Predictability........... 27 7 33 •1 • ResumS.o.......... ............. . ...... 277 2 78 3 3 *2 . Application of Theorem 3 2 .1 »............ 33*3 • Uniqueness. ...... ........ ............. 280
BIBLIOGRAPHY. . .................................................... ..... 82 2
STATIONARY PROCESSES AND PREDICTION THEORY
INTRODUCTION The theory of linear prediction developed b y Wiener and Kolmogoroff ([ 15 ], [8 ]) treats the problem of extrapolating from the known past of some variable quantity to its future. The known data is in the form of an infinite sequence of numerical readings taken from observations made at equally spaced intervals of time throughout the past. The problem is totomake make a a reliable estimate of the future future values values of the of the variable on the basis The method method of of Wiener basis of of this thisd adtaat.a .The Wiener and and Kolmogoroff Kolmogoroff may, for our purposes, be described as follows. One compares each of the past readings w i t h a fixed linear combination of those preceding it and asks for that particular linear combination which in some sense mi n i mizes the discrepancy between the two. Having found the most consistent linear combination, the predicted value for the next reading is taken as this linear combination of the past r e a d i n g s . ^he The questions w h i c h Wiener, Kolmogoroff, and their followers have investigated are for the most part concerned wit h the relationship between this prediction procedure and the harmonic analysis of the time series in question. The harmonic, or oscillatory, aspects of the time series are embodied in the so-called so-called "spectrum" "spectrum" of the series whichwhich analyzes analyzes the the series into its harmonic harmonic ccoommppoonneennttss..The The spectrum spectrum of of aa time time series series turns turns uupp in numerous questions involving "linear" properties of the time series. For example, it was found that from the spectrum one could compute the "expected" error of the prediction procedure described, and thereby one could ascertain whether or not the prediction would be e x a c t . Similar questions may be answered for vector-valued time series as well, but the theory in this case is considerably more complex ([5], [11], [15])[15] ) In order to present our version of the problem of prediction it will not be necessary to enter in great detail into the circle of prob lems connected with linear prediction theory. We do point out that our description oversimplifies the actual situation in several r e s p e c t s . To begin with, there need not exist an optimal linear combination and the p r e dicted value will have to be obtained b y some limiting p r o c e s s . Secondly, this limiting process need not converge for an individual time series and the procedure may be meaningful only for an ensemble of time series where the predicted value exists as a function on the ensemble defined "almost
2
INTRODUCTION
everywhere” . As a result the customary setting for the linear prediction problem is the theory of randomized ensembles of time series or, more precisely, the theory of stationary stochastic processes. The known data is then not an individual numerical sequence hut a sequence of random variables, and the problem is to find a variable defined in terms of the past that best approximates the variable representing the next reading. The sense in which the approximation is to be optimum is that of the "mean square" norm so that the desired random variable always exists by the Riesz-Fischer theorem. We emphasize that the sense in which the pre diction exists here is as a random variable defined almost everywhere on a set of time series. This is significant because a random variable de fined almost everywhere on a set of points need not be well-defined at any point and exists only as an equivalence class of well-defined func tions. Consequently, the resulting prediction need not have a meaning at any particular time series of the ensemble. However, the existing theory still makes plausible a problem of prediction for individual time series. Presumably if the series repre senting the past is sufficiently well-behaved there do not arise any con vergence difficulties and the predicted value for the next reading will be well-defined. Moreover, in principle, there is no reason to restrict oneself to linear prediction and for a suitably restricted time series it should be possible to speak of a non-linear prediction as well. That is, instead of considering only linear combinations of the past as esti mates of the future one can consider arbitrary functionals on the past, and take a sequence of such functionals that minimizes the discrepancy in the past between the estimates values and those occurring. If the series in question is sufficiently well-behaved the sequence of functionals should provide a sequence of values converging to an "absolutely (i.e., not nec essarily linear) best estimate" of the unknown next reading. To be more explicit, let the given time series be represented as (1.1)
... |(-n), ..., |(-2), |(-1), ?(o);
let f(t.j, ..., t^.) be a function of k variables, k arbitrary, and consider as an approximation to the given series, the series defined by (1.2)
lf(-n) =f(|(-n-i), |(-n-2), ..., |(-n-k))
.
Using the mean square norm to measure the discrepancy between we obtain for the error
|
and
^
INTRODUCTION N ( 1 - 3)
e (f)
= 11m s u p ^ u ^ - ^
|l(-n )
-
| f ( - n ) | ?l
Let e be the g.l.b. of e(f) for all f and let {fv} be a sequence of functions with s(f ) — > e. The sequences { (-n)} then represent v v closer and closer approximations to {|(-n)}, each defined in terms of "the past", namely, in terms of the preceding values of Hence (I.M
1 *0 ) =
lim v ->
f(l(o), 6(-l), ..., s(-k+ 1 ))
,
00
if it exists and is independent of the sequence would be an appropriate choice for the (least-squares) estimate of the next reading 1(1 )• We are now in a position to state the problem underlying this in vestigation. The Wiener-Kolmogoroff theory of prediction suggests that for some class of time series it is possible to affect linear and non-linear prediction of future values in a manner to be meaningful for the indi vidual series themselves. However, the particular procedure described in the preceding paragraph, as it stands, will not take us very far. For, it may be shown that the limit in (1.4) does not exist for general mini mizing sequences of functions except in the trivial case that {|(-n)) is periodic. The problem that we have set ourselves here is to determine whether there is nevertheless a precise sense in which one can identify a * value | (1 ) as the optimum estimate for the unknown reading |(1 ) of a time series (1.1 ). The problem of prediction that we pose is then no longer concerned with the computability of the future values (where for practical reasons one is restricted to linear extrapolation) but with the purely theoretical question whether the concept of a predicted value of a sequence is at all sensible. Our goal, therefore, will be to state a defi nition of "predictability" for time series and to demonstrate its useful ness by exhibiting as wide a class as possible of "predictable" time series. In this description of the problem we have restricted the scope of our notion of predictability to conform to the corresponding notion for stochastic processes. However there are points of view which suggest ex tending this notion. Namely, suppose that |(n) is not a numerical se quence but one that takes values in an abstract space, such as would be the case if we were to record the results of a series of coin tosses. Here, even if one identifies the abstract values with numerical values, the re sult of the prediction might be a number that does not correspond to one of the abstract values possible. For example, setting "heads" = 1, and
4
INTRODUCTION
"tails" = -1, the result of the WIener-Kolmogoroff prediction procedure would be 0 (if the coin is balanced) which is an impossible value in this case. The value 0 occurring in the foregoing situation does have a simple interpretation; namely, thatboth "heads" and "tails", or 1 and - 1, are equally likely to occur as the next reading. Now if one regards the concept of prediction as that of inferring information about the future on the basis of the past, then the conclusion that both possible values are equally likely to occur is information that is "predicted" from the past behavior. This suggests that what we should look for is not a particular "most likely" value in the future, but a probability distri bution over all the values of the future readings as conditioned by the known past. Having found this distribution, the estimate for the next value may be made for a numerical sequence in a number of ways, the most natural being, perhaps, to choose the expected value of the distribution. On the other hand, a probability distribution would be no less meaningful for non-numerical sequences - and this is the principal advantage. We shall, in fact, adopt this point of view and as a consequence, the result of prediction for a sequence, when meaningful, will not be a single value, but a probability distribution over a set of values. More generally, the desired prediction will provide a probability distribution, or measure, over the set of all possible values for all the future readings. This approach might, incidentally, be taken in the prediction problem for stochastic processes aswell; here it does not involve a radical departure from the ordinary prediction problem. Thus it is possible to define a "random prediction measure", which is a measure-valued random variable defined over the past of the process. The non-linear pre diction variable of the existing theory is related to the prediction measure in the manner described above: the random predicted values are the expecta tion value's of the random prediction measures. It should be pointed out that the notion of prediction measures avoids another of the limitations of the Wiener-Kolmogoroff approach; namely the arbitrariness of the least squares criterion arising in the form of the error term e(f) in (I.3 ). As may be shown, the least squares cri terion corresponds to just one of the possible choices of an estimate based on a probability distribution - namely, the expectation values, and pre diction values based on other norms may be obtained as well from the pre diction measure. To sum up, our problem is to propose a definition for the pre dictability of a sequence; i.e., for the existence of a procedure assigning to an abstract-valued sequence a measure defined on the set of all possible
INTRODUCTION
5
futures for the sequence and representing the likelihood of the occurrence of the future events based on the known past. Although our problem is formulated for individual time series it is not independent of the theory of stochastic processes. In fact, the class of sequences to which our analysis will be applicable will be such that to each sequence there corresponds a stationary stochastic process for which the given sequence is a "typical" sample sequence. Speaking heuristically, the stationary process plays the role of a mechanical sys tem on which the measurements are made. This mechanical system must be of a stationary nature, for otherwise there are no grounds on which to expect any connection between past and future behavior. Our first chapter will therefore be concerned with the question of determining when a se quence can be thought of as occurring from observations on a stationary mechanism, or its abstract counterpart, a stationary process. A sequence of this kind will be said to be "regular" and it determines a "generic" point of the associated process. To make these notions precise we shall have to develop systematically the relevant theory of stationary stochastic processes. We remark that a useful tool in this part of the exposition .£ will be the theory of commutative C -algebras; a stationary process will -ft be described in an "invariant" manner as a C -algebra with certain special properties. The problem of prediction for sequences turns out to be a special case of a general problem of predicting from one stationary process to another "at a point of the process". In this formulation there will be two notions of predictability that may be applicable: "continuous predicta bility" and "statistical predictability", and our investigation may be separated into two portions treating respectively these two notions. The more restricted notion, that of "continuous predictability" is easily de fined and the problem that we treat is to exhibit a large class of regular sequences to which this form of predictability applies. We find however that this class is not as wide as one might hope, and in particular one can find continuously predictable sequences of an elementary kind such that certain sequences derived from these by ordinary algebraic operations are no longer continuously predictable. To remedy this we introduce the broader concept of "statistical predictability" and our main problem will be to show that this form of predictability does apply to the sequences in question. In the course of our discussion the theory of stationary processes plays a double role. In the first place the definitions of predictability can be made only after imbedding the sequences into the sample spaces of appropriate stationary processes. Secondly, in exhibiting the regular
6
INTRODUCTION
sequences that are predictable we shall not, in general, given an exact description of the sequence (which, usually, we cannot do, but fortunately, this will not be necessary) but rather a description of the process it represents. Thus we shall speak of "Markoff sequences" and "multiple Markoff sequences", these being regular sequences generic for a stationary Markoff or multiple Markoff process. One question, therefore, to be considered is that of describing stationary processes. To begin with, one has the elementary processes, the "random" and "Markoff" processes where the probability relationships are of a simple kind. More complex processes are obtained by considering processes "derived" from these by algebraic operations and the like. These might be termed "sub-Markoff" processes. This gives rise to the first non-trivial problem in our theory: to determine when a sub-Markoff sequence is continuously predictable. In Chapter 3 we find that this is not always the case and in Chapter b we derive sufficient conditions for it to hold. In Chapter 5 we introduce the notion of a "stochastic semigroup" by means of which one may define a still wider class of stationary processeso Here too we shall find conditions for the pre dictability of the sequences arising in this way, the conditions being In terms of the generating stochastic semigroup. In our treatment of "statistical predictability" our main concern will be with sequences derived from Markoff sequences and our principal result Is that these are statistically predictable. In arriving at this result we shall come across another method of generating stationary processes, namely, by forming "inductive functions" of known processes. In this regard the main problem will be to see to what ex tent the inductive function of a regular sequence mirror the behavior of the inductive functions of the stationary process it represents. We shall see that this question arises in connections other than prediction theory and we devote Chapters 7 and 8 to an exposition of these. In particular, we shall show how a certain equidistribution theorm of H. Weyl ([13]) is deducible in this context and we also give, by these means another proof of a theorem of Wiener and Wintner regarding the Fourier coefficients of a stationary process ([16]). There is one more remark which we feel is appropriate. We wish to point out that although prediction theory is the unifying theme of this study, it has not always been our main concern to prove theorems re garding prediction theory. Thus our final result which may be stated "every finitely-valued derived sequence of a finite state Markoff se quence is statistically predictable" is too special to justify the last
INTRODUCTION
7
four chapters required for its proof. It would be more correct to say that our goal has been to show how certain concepts, most of which we present in Chapter 1, may be used to systematically analyze various stationary processes and their sample sequences. Prediction theory has therefore often been the excuse rather than the reason for carrying through a cer tain analysis illustrating the concepts and methods that we introduce.
CHAPTER 1 .
§1.
STOCHASTIC PROCESSES AND STOCHASTIC SEQUENCES
Preliminaries:
Commutative C*-Algebras
([10]).
1.1. C*-Algebras. We denote by C(X) the algebra of all complex valued continuous functions on a compact Hausdorff space X. provided with the norm ||f|| = sup If( X ) XeX
I
When
,
and the involution f -- > f*, f*(x) = f(x), C(X) becomes a commutative C -algebra. In other words it is a Banach algebra: (a) (b)
||xf|| = |x | ||f ||, ||f + g|| < ||f || + ||g||, ||fg|| < ||f || Hell C(X) is complete in the metric induced by ||.||, and ** * _ * the involution satisfies f = f, (\f) = \f , (f + g )* = f* + g*, (fg )* = f V , iiff*n = iifii2 .
,
*
These are the conditions for a C -algebra. To an extent the converse is true as well. If A is any commutative normed algebra with an identity element and an involution satisfying (a) and (b) then there is a compact Hausdorff space X, and an isomorphism of A with C(X) preserving norms and involutions. The class of algebras C(X) is thus identical * with the class of commutative C -algebras with identity. This will be of considerable importance for us, for we shall frequently recognize an alge* bra as a C -algebra and it will be useful to know that it is the algebra of continuous functions on a space X.
-
To study more closely the nature of this equivalence let A be ■X* a commutative C.-algebra with identity. We consider the set of all alge braic homomorphisms of A onto the complex numbers. These form a set X^ and the elements x € A induce functions x on X^ by defining x (h) = h(x) for h g X^. X^ may then be given the weakest topology that renders all these functions continuous. Withthis topology X^ is a compact Hausdorff space and the mapping x > x sends A onto a subalgebra of C(XA). Ordinarily, the mapping x -- > x would reduce norms; A * however when A is a commutative C -algebra the norm of x in A agrees exactly with the supremum norm of x on X^. This implies that our
§1.
PRELIMINARIES: COMMUTATIVE C*-ALGEBRAS
9
representation of A in C(X^) is 1 - 1 so that A may be identified with a subalgebra of C(X^), which, by thecompleteness of A, is uni formly closed. Since two points of X^ that give the samevalue to each x must correspond to the same homomorphismof A, hence must be identi cal, it follows that the subalgebra of C(X^) corresponding to A separates points in X * . Moreover it may be shown that involutions go /V* ^ over into conjugates in this representation, i.e., that (x ) = x. This implies that the subalgebra in question contains with every function on X^ its complex conjugate. At this point we Invoke the Stone-Weierstress theorem to the effect that a subalgebra of C(X) separating points of X and containing with each function its complex conjugate is dense either in C(X) or in the subalgebra of functions vanishing at some fixed point. Since the image of A in C(X^) satisfies the hypotheses of the theorem and more over is uniformly closed and contains the function 1, it follows that this algebra is C(X^) itself. In this way, the identity of A and C(X^) is established. We note that the compact Hausdorff space X^ is, moreover, uniquely determined (up to a homeomorphism) by the condition that A = C(X^). For if A = C(Y) for some space Y then since each point of Y provides a homomorphism of A onto the complex numbers we have a mapping of Y into X^. Since C(Y) separates points in Y it follows that this mapping is 1 - 1 and it will be continuous by the definition of the topology of X^. The image of Y in X^ will be compact; hence if the two were not identical there would be a non-zero function on X^ vanishing on the image of Y. But then the 0 element of A = C(Y) would be identified with a nonzero element of A = C(X^) which is impossible. Hence Y » X^. A corollary to this is that the only homomorphlsms of C(X) onto the complex numbers are those given by the points of X — namely, X ” xc(x). We shall, as a rule, identify the algebras A and C(X^). So an element g e A may be Interchangeably thought of as an abstract ele ment of A or as a function on the space X^. In the former case we shall write x(g) for the pairing of an x e X^ and g e A while in the latter we shall write g(x), the meaning being the same in both cases. Now let A and B be two C -algebras (henceforth it will be * understood that our C -algebras are commutative and possess an identity) and suppose B C A. Clearly every homomorphism (onto the complex numbers) of A Induces a homomorphism of B so that if A = C(X^) and B = C(X-g), we shall have a map (the term "mapping" will always be meant to imply con tinuity) p : X ^ -- > Xg. We can show that p has to be onto: p(X^) = Xg.
CHAPTER 1.
STOCHASTIC PROCESSES AND STOCHASTIC SEQUENCES
For, in any case, since p is continuous p(X^) will be a compact subset of Xg. If it is a proper subset then there exists a non-zero element of B vanishing on p(X^). This means that every homomorphism on A applied to this element gives 0, so that the element must be 0 which is a contradiction. Hence p is onto. An importantconsequence of this should be pointed out. Namely, since p is onto, every homomorphism of B is the restriction to B of some homomorphism of A, i.e., every homomorphism of a C -subalgebra of A may be extended to A. Consider next the converse situation where we are given two compact Hausdorff spaces X and Y and a map p of X onto Y. To every g e C(Y) we may associate the composite function g o p on X. If g o p vanishes identically then since p is onto it follows that g is identically zero. Hence the correspondence g -- > g o p imbeds C(Y) in a 1 - 1 manner into C(X) and we may write C(Y) C C(X), identifying * g with g o p. We thus find that the C -subalgebras of a C -algebra A are all obtained by talcing a mapping p of the homomorphism space X^ onto a compact Hausdorff space Y and identifying those functions on X^ of the form g o p with g e C (Y). Whenever we have a mapping p of a compact Hausdorff space X onto a space Y we may refer to Y as an identification space of X; for with an appropriate topology, Y is the space obtained by identifying the points of X that have the same image under p. The mapping p then becomes the canonical map taking a point into the class of equivalent points to which it belongs. Moreover the algebra C(Y) when identified with a subalgebra of C(X) consists of just those functions on X that take on equal values at points with the same image under p. Namely, if f = g o p with g e C(Y), then obviously p(x1 ) = p(x2 ) implies f(x1) = f(x2 ). The converse is also true. For a function taking equal values at equivalent points has the form f = g ° p for some function g on Y and it need only be shown that g is continuous. But if F is a closed subset of the complex plane then g"1(F) = p(f”1(F)) and since —1 —1 —i f” (F) is closed and hence compact so is p(f“ (F)). Hence g“ (F) is closed so that g is continuous. Returning to the two C -algebras A and B with B ( A, let us call two homomorphisms of A equivalent if they agree on B. This means that two points of X^ are considered equivalent If they have the same image in Xg under the canonical map p : X ^ -- > Xg. It follows from the foregoing discussion that the subalgebra B corresponds to just those functions on X^ taking equal values at equivalent points. As a result of this the elements of B may be thought of in several ways. To begin with they are elements of B, they are also the continuous functions on Xg, and finally, they are the continuous functions on X^ taking
§1.
PRELIMINARIES: COMMUTATIVE C*-ALGEBRAS
equal values at equivalent points. We shall not distinguish "between these three except where necessary so that g € e B, g e € C(Xg) and g 0 p € C(X^) are all to be identified. In particular g(x) will make sense for x either in or in Xg. *
*
Let T be a homomorphism of a C -algebra -algebra BB into into aa CC -algebra -algebra A. Every homomorphism h of A onto onto the thecomplex complex numbers numbers determines determines aa homomorphism h 0 T of A and so there is induced a mapping of X^ onto Xg. We We shall shall find find it it convenient convenientto denote this this mapping mapping again again by by TT so so that one may may write write either either x(Tf) x(Tf)or or Tx(f) Tx(f) (or (or also also Tf(x) Tf(x) or or f(Tx)). one f(Tx)). It will be noted that although T may be given only as an algebraic homomorphism it will necessarily preserve involutions as well: (Tf) = T(f ). This is seen by considering the induced map T of X^ onto Xg. For the isomorphism of A with C(X^) preserves the involution so that for x e X «, f*(x) = f(x). Hence for x £ X«, (Tf)*(x) = Tf (x) = f(Tx) = f*(Tx) = Tf*(x). If T : B -- > A is 1 - 1 then we are in the situation previously considered, B ( A. Here the induced map T = p takes X^ onto X Xg. -g. An incidental consequence of this is that the map T : B -- > A is norm preserving. For if f e B, ||f|| = sup X€Xb and since
T
is onto, the latter is sup X€
and
|f(x)|
|f(Tx)| = sup
|Tf (x ) | = ||Tf||
^
||f|| = ||Tf|||..
to If T : B -- > A is onto then : TB ->onA X^ is onto for all all ff Namely if if Tx1 Tx1 == Tx2, Tx2, then then f(Tx1) f(Tx1) == f(Txg f(Txg)) for Tf(x-j Tf(x1 ) = = Tf(x2 Tf(x2 ) )for for all all such such f. f. But But since since TT takes takesBB implies that as homomorphisms homomorphisms of of A, A, xx11 == xg xg so that TT
Xg 1 - 1 . X^ thenis T on inB, B,or or in onto onto A, A, this this is1 -1 1 - .1 .
toXgis1 - 1.
Finally we note that a mapping T of X^ into X^ Xg induces an algebraic homomorphism, again denoted by T, of B into A. Again the the original original map map was was onto onto or or 11 -- 11 induced map will be 1 - 1 or onto if the respectively. These statements follow just just as as readily readily as the as the previous previous ones. 1.2. E-algebras. If an algebra A is given as the algebra of continuous functions on a space X, then the borel measures on X de termine linear functionals on the algebra A:
J
E^(f) = J f(x)dn(x) X
.
12
CHAPTER 1 .
STOCHASTIC PROCESSES AMD STOCHASTIC SEQUENCES
In the following definition we describe in terms of an abstract C -algebra the situation that arises by choosing \i to be a probability meastore. (Compare Segal's "probability algebra", [12].) DEFINITION 1.1. An E-algebra is a C*-algebra with a distinguished E satisfying (i) E(ff*) > 0 with equality possible only if f = 0, (ii) E( 1 ) = 1 .
functional
-X-
When viewed as functions on X^, the elements ff are pre cisely the positive continuous functions and (i) requires that the func tional E be positive, i.e., take on positive values at positive func tions . It is clear that if we have a probability measure \± on X^, it will induce a functional E^ satisfying (i) and (ii) if \± takes on positive values on non-empty open sets. Conversely by the Riesz-Markoff representation theorem any functional on C(X^) satisfying (i) and (ii) is induced by such a probability measure. In general we shall denote by P the probability measure on X^ corresponding to the functional E of an E-algebra A. Suppose that Tis an algebraic homomorphism of B with A which furthermore preserves the functional E: E(Tf)= E(f),E( •)de noting the distinguished functional on both algebras. In particular, E(T(ff*)) = E(ff *) so that if f ^ 0, E(T(ff*)) > 0 . But T(ff*) = (Tf)(Tf*) so that it follows from f £ 0 that E((Tf)(Tf*)) 4 0 and Tf £ 0. Thus we conclude that a homomorphism of E-algebras preserving E is 1 - 1 and hence (§1 *1 )also norm-preserving. As a result, analge braic homomorphism of an E-algebra preserving the functional E, is an isomorphism preserving the entire E-algebra structure. As regards the homomorphism spaces, we note that T is a measure preserving map of X^ onto XB . Namely from
J XA
f(Tx)dPA (x) =
J
f(x)dPB (x)
XB
we deduce that P^(T“1(a)) = Pg(^) for any borel set A in X^; this is the sense in which T is to be thought of as measure preserving. (In general, if T maps X into Y and \i is a measure on X there is in duced a measure v on Y by (1.1)
v ( A ) = u(T- 1 ( A ) )
,
for borel sets A in Y. T is measure preserving If the induced measure agrees with the existing measure. Note that (1 .1 ) does not imply
(1 . 2 )
n( A) = v ( T ( a ) )
§1.
for borel subsets A
PRELIMINARIES: COMMUTATIVE C*-ALGEBRAS
of
13
X.)
By means of the measure P on the homomorphismspace of an E-algebra A we may define the Lebesgue spaces of A: DEFINITION 1.2. If A is an E-algebra, LP (A) (1 < P < ») will denote the Banach spaces of measurable functions on X^ satisfying
f
|fP (x)|dP(x)
0 implies z > 0 and that 7 = 1 . It follows then from - ||z|| < z ) = xn_1(aj). If the xn are essentially bounded (i.e., |x ^(cd)| C M, but for a set of co ofmeasure 0) they will generate an E-subalgebra of the algebra of all bounded measurable functions on ft. As elements in this E-algebra they may be identified with the continuous functions on a compact Hausdorff space. This enables us to choose asa particular version of the sample space, a compact Hausdorff space — the random variables of the process will then be continuous functions on this space. For our purposes it will be convenient to assume from the start that the sample space is a compact Hausdorff space and moreover to allow the random variables to take their values in an abstract topological space. DEFINITION 2.1. A one-sided process X consists of a separable compact Hausdorff space ft-^, a sequence fxn }n) (ii) the xn separate points in ft^ (iii) for every borel set A C P(T“1(a )) = P(a ) (iv) P(a ) > 0 for every non-empty open set A C X is a two-sided process if the index n ranges from - » to is a homeomorphism of the corresponding space ft^. ftx will be to as the sample space of the process X.
» and T referred
In this definition we have suppressed the adjective "stationary" because we shall deal almost exclusively with stationary processes so that a non-stationary process will be explicitly referred to as such. We note that conditions (iii) and(iv) together imply that the transformation T is onto. For T(ft^) is in any case compact so that ft-^ - T(ft^) is an openset and if it is non-empty P(ft-^ - T(ft-^)) > o and P (T (ft-^)) < 1. HoweverT is measure preserving by (iv) so that P (T (ft■£)) = P(T”1T(ft^)) > P(ft^) = 1 which gives a contradiction so T is necessarily onto. This shows that not only is the function deter mined by x^ according to (I) but also conversely so that all the xn
§2.
PRELIMINARIES: STATIONARY STOCHASTIC PROCESSES
15
are determined by any one of them. Condition (ii) is not an essential one in the sense that It can always be satisfied by modifying the space Namely, if (ii) is not satisfied we pass to the Identification space of nx obtained by identify ing pairs of points and 2 . Therefore T will be defined on the identification space and it follows that with the new space, the x^ determine a process. In particular suppose that X Is a two-sided process with vari ables x^ defined for all integers n. We may form from this a one sided process by considering only the xn with n < o, reducing the space so that (ii) will be satisfied for these variables. The resulting process we shall denote by X ” and the corresponding space (properly a^-) we denote by nx is then an identification space of and we de note the canonical map of &x onto byp . We shall think of X~ and X as the one-sided and two-sided versions of a single process. This will be justified when we show later that X may also be determined uniquely from X ” . In certain connections the variables xn play a secondary role and only the space ftx , the measure P, and the transformation T are of significance. It will be convenient to consider such a set-up in its own right and we give it a formal definition in terms of the algebra C (ftx ): DEFINITION 2.2. An abstract (one-sided) process X Is a separable E-algebra Ax with an endomorphism T of Ax preserving the functional E. X is two-sided if T is an automorphism of A^. We recall from §1.2 that an algebraic homomorphism of an E-algebra preserving the functional E must also preserve norms. Hence, the operator T provides an E-algebra isomorphism of Ax with T(AX ). If we start with an ordinary process X (one- or two-sided) we obtain an abstract process by taking Ax = C(nx ) with E the functional induced by P. E satisfies (i) and (ii) of Definition 1.1 because P is a probability measure satisfying (iv) of Definition 2.1. If X is two-sided then T, being a homeomorphism of &x , Induces an automorphism of Ax so that Ax will be two-sided. If X “ is the one-sided version of X then Ax -, which we shall write as Ax , is the algebra of con tinuous functions on an identification space of &x , hence may be Identified with a subalgebra of Ax . These algebras will be referred to as the E-algebras of the process X. The E-algebras of a process
X
may be obtained directly as
16
CHAPTER 1.
STOCHASTIC PROCESSES AND STOCHASTIC SEQUENCES
follows. Let i|r range over C(a ) where A is the range of the xn . Since the xn , - °o < n < «>, separate points in it follows that so do the set of all Hence the set of these functions generates the algebra Ax = 0(0^). Similarly isgenerated by the subset of the
^(xn ) ?or which
n < o.
When dealing with an ordinary two-sided process we are able to associate with it a one-sided process in a natural way. This is not the case for an abstract two-sided process. To see this let X be an ordinary process with variables x^^ and let X 1 be the process obtained by setting x^ = • Then clearly A ^ = A^; however A'^ ( AjJ in X X £ general. On the other hand to a one-sided abstract process it is possible to associate a canonical two-sided abstract process. Namely, we have THEOREM 2.1. If Y is a one-sided abstract process with algebra Ay, there exists a unique two-sided process X with A^ 5 Ay such that the set of x e Ax with Tmx 6 Ay for some m is dense in A-^. (Here, as always, "unique" means unique up to an isomorphism preserving the relevant structure.) PROOF. Let Ay-n ^, n >_ 0, be denumerably many copies of Ay and suppose each Ay11^ imbedded in Ay-n+1 ^ by identifying the element z of Ay11^ with the element Tz in Ay-n+1 ^. This is an imbedding since (§1 .2) the map T preserves E and Is therefore 1 - 1 ; moreover the image of Ay31^ in Ayn+1 ^ is isomorphic (as an E-algebra) to Ay.n ^ for the same reason. We now have Ay0 ^ C Ay1 ^ C .•. C Ay-n ^ C ••• and we may form the union of these algebras Ay. In Ay, the functional E is de fined as is the transformation T, and T still preserves E. Moreover T(Ay-n+1 = Ay11^. Set Ax equal to the completion of Ay in the norm determined upon it by the norms of the algebras Ay1). Since T is norm preserving (§1.1) it extends to Ax and so does E; since T takes Pr onto A^ it also follows that T takes onto A^. Since T is 1 - 1 and onto it is an automorphism of A-^ so that A-^ defines a twosided process X. Identifying Ay0 ^ with Ay we have Ay C A^. Moreover every xin A^ belongs to some Aym ^ and so T^x e A^0 ^ = Ay for some m so that, since Ay is dense in Ax , Ax will have the properties requiredof it. Now the latter property shows that it is unique for if A^ satisfies the conditions of the theorem, then since T is invertible in Ar7 and Ay C A^ it follows that Ay C Az and since Ay must be dense In A^ it follows that X = Z. We now turn to the case of an ordinary one-sided process Y with variables yn , n < o, and we wish to find a two-sided (ordinary) process X with Y - X ” . To do so let Y ’ be the abstract process corresponding
§2.
PRELIMINARIES: STATIONARY STOCHASTIC PROCESSES
17
to Y so that Ay, is the algebra of functions of Oy. By Theorem 2.1 there exists a unique process X 1 with Ay, C Ax , and X 1 two-sided, such that furthermore for a dense subset of z e Ax , Tmz e Ay, for some m. We define an ordinary two-sided process X by taking to be the homomorphism space of AX1, P the corresponding probability measure, and T the homeomorphism induced on by the automorphism T of There remains only to define the variables xR . Since Ay j C A-£, there is induced a canonical map P from to the homo morphism space of Ay, which Isfty. We then define the xn bysetting x^ = y^ o p for n < o and x = x (T-na ) ) for n > o, and a ) e ii ii — n o a To show that these functions define a process it is only necessary to show that the xR separate points in the other requirements being clearly satisfied. Suppose then that there exist and od2 in with
x (o)1 ) = x n (a)2 ^ **op a11
xn ^ " ma3l ^ = xn (T“ma>2 ) for all yn ^ T”ma3l ^ = ^ n ^ T”'m(JD2)* pT”™ ^
n * For ea°h n,
Since the
m
ve will then have
certainly for yn
= pT-mo32 . Now for a dense set of
n < o,
and therefore
separate points in z
in
A^,
Oy,
Tm z € Ay,
for some
m, and so z (od1 ) = z(TmT"ma)1 ) = Traz(T'ma)1 ) = Tmz )pT”moD1 ) = T^z (f3T“m2 ). Hence for all z In A^, z(o1 ) = z (cd2 ) and therefore = 0 to A. Each of these spaces is given the weak topology induced by the coordinate functions, i.e., the topology of pointwise convergence. The elements of a ^, a ”, and A* will be referred to as doubly-inf init e, left -infinite, and right-infinite A-sequences respectively. There are natural maps from A°° to both a 0”0 and a + :00' we shall be concerned primarily with the first of these, denoted p, defined by p ( c o ) ( n ) = c o ( n ) for co e a , P (c o ) a” and n < 0 . We shall also make use of the shift operator in Am and in A~ defined by (2.1 )
Tco(n) = a>(n-l ) .
(The shift in the opposite direction could be defined in A*, shall not need it.) T then represents an operator either in A~ and the two operators are related by (2 .2 )
TP = pT
but we A^ or in
.
Now let X be a two-sided process with variables xn taking their values in A and let X ” be the one-sided version of X. To every co € ^ we may associate the doubly-infinite A-sequence 7 (co) where (2 .3 )
7 (co)(n) = xn (cD)
- 00 < n < 00
,
and for every co! e we obtain a left-infinite A-sequence 7 1 (co ) de fined in the analogous way. We thus have 7 : and 7/ * : nXZ -- > A". 00
(2.*0
THEOREM 2 .3 . 7 and 7 * are 1 - 1 (a) T 7 (co ) = 7 T ( c o ) for co € (b ) T 7 *(co) = 7 !T(co) for co e ^ (c) 7 r p ( c o ) = P 7 ( c o ) for co €
(continuous) maps and
.
§2.
PRELIMINARIES: STATIONARY STOCHASTIC PROCESSES
The ranges of subsets of
7
19
and 7 ' are closed T-invariant and A“ respectively.
PROOF. In order that 7 and 7 1 he continuous functions it suffices that the coordinates of 7 (05) and 7 t (cd') depend continuously on co and cd* and this follows from the continuity of the x^. Since the xn in each case separate the points of the sample space or a flx , it follows that 7 and 71 are 1 - 1 . (a), (b) and (c) of (2.k) follow by direct substitution using the definitions of P, 7, 71, and T. (a) and (b) imply moreover that the ranges of 7 and 7 * are invariant under T and these ranges are necessarily closed as images of compact sets. The import of the first part of this theorem is that £2X and ax may be identified with subsets of A^ and A~; the equations in (2.k) ensure that under this identification the operators T and p de fined for and &x go over into the T and p defined for Aw and a “ . As a consequence of this we may suppose that from the start nx and nx were taken as subsets of ax and a “ respectively so that the points of and nx are A-sequences. and ax are now sequence spaces and the relationship be tween them is given in the next theorem. THEOREM 2.k. A left-infinite A-sequence is in if and only if it is the image under p of a sequence in A doubly-infinite A-sequence cd is in if and only if each of the left-infinite A-sequences p(T“ncD) is in ax where n >_ 0 . PROOF. an identification "only if" portion if cd € &x so do cd € A^ and that Sincep takes p(cDn ) = p(T"ncD).
The first assertion follows from the fact that ax is space of hence p takes flx onto flx . The of the second assertion follows from the same fact since each of the T-ncD. To prove the converse, assume that p(T”ncD) € &x for all n >_ 0; we shall show that cd € flx &x onto flx there is for each n a point cDn e ftx with Consider the points {T^cd^} in ftx and let k>o cd1 € be a limit point of some subsequence of this sequence is k. separable and compact — hence, sequentially compact), say T > 031 * We have then for
each
n, cd^ (n
-
k^) -- >
cd1
(n). Since
k
-- >
n - kwill eventually be negative so that cd, may bereplaced by -le -^m P(a^) = p(T cd) and we find co(n) = T cd(n - k^) -- > cd1(n). Hence cd1(n)
=
c D (n ),
cd*
= cd,
so that
cd
€
20
CHAPTER 1.
STOCHASTIC PROCESSES AND STOCHASTIC SEQUENCES
2 .3 . Subprocesses. If X is a A-valued process with variables and cp is a continuous function from a“ tosome other compact space A 1j weobtain a A'-valued process Y by setting
(2 .5 )
yn = ">(•••’
^n-i - V
•
The index n may range over all the integers or only over the negative ones depending upon whether X is two-sided or one-sided. We suppose that the function cp is independent of n so that we may write y n (T xn- 1 ^ ^
x n (T^ 0; to show that A~(g) is an E-algebra with this functional it needonly be * — shown that E(zz ) = 0 implies z = 0 . Suppose then that z g A (g) * and E(zz ) = 0 . If we take z
=
lim
k-» 00
z-.
K
with z^. e A-|(|) it will suffice to prove that ||ZjJI^ -- > °* sequences ^ in AQ (g) represent the z^. g A.|(g). Since
Let the
IIzk " Hoc ---^ 0 as 1 -- 5> 00> for arbitrary e > 0 there will exist a such that the set of n for which (n ) I > 6 density 0 for k, 4 > k^ . It follows that if for some k > 1^, |£k-(n) | > 2e on a set of non-zero upper density, then for all a > k£,
§if.
PROCESS ASSOCIATED WITH A STOCHASTIC SEQUENCE
29
lt*(n)| > e on a set with the same upper density In particular, p * E(|£^| ) would he bounded away from o. But since E(zz ) = 0, E ( | ^ | 2 ) = E(z^z*) -- > o. Hence as k — ->-— > 0 as was to be shown. Thus A”(|) is an E-algebra. To see that it defines an ab stract process we remark that the transformation T in A (g) obviously preserves the functional E> hence so does the induced shift transforma tion in A- (g) : E(Tz) = E(z). Since T is an endomorphism of A“ (g), the latter will be the algebra of a one-sided process. The corresponding two-sided process is determined as in Theorem 2.1 and the corresponding algebra is denoted A( g). k.2. The Process X( g). We may also associate with a stochastic sequence g(n) an ordinary process X(|). In order to define the vari ables x^ of this process let fl~(g) be the homomorphism space of A~(g) : A~(g) = C(fl"(|)). The transformation T and functional E in A~(g) induce a mapping T and a probability measure P on ft“(g) such thatP is preserved by T. We shall obtain a one-sided process by de fining a set of variables xn on fl~*(g)that separate thepoints of that space and satisfying x ^ T cd) = xn__1 (cd), cd e ft“(g).
To do so suppose that ξ(n) is a Λ-sequence, let ψ range over C(Λ), and consider the sequences in A_0(ξ) of the form

(4.3)    ξ_ψ^(k)(n) = ψ(ξ(k + n)),    n ≤ 0,

together with the corresponding elements z_ψ^(k) of A^-(ξ). Hence if ω ∈ Ω^-(ξ), ψ → ω(z_ψ^(k)) is a homomorphism of C(Λ) onto the complex numbers, so that we may write

(4.4)    ω(z_ψ^(k)) = ψ(λ),    λ ∈ Λ.

By the preceding equation each ω(z_ψ^(k)) depends continuously on ω; it follows that λ itself depends continuously on ω, so that we may write λ = x_k(ω), where x_k is a map of Ω^-(ξ) to Λ. The functions x_k are thus defined by the relationship

(4.5)    ψ(x_k(ω)) = ω(z_ψ^(k)).

Consider next Tω(z_ψ^(k)); by definition this is ω(Tz_ψ^(k)), where Tz_ψ^(k) is the element of A^-(ξ) corresponding to the translate of the sequence ξ_ψ^(k), i.e., to the sequence

(4.6)    Tξ_ψ^(k)(n) = ψ(ξ(n + k − 1)).

It follows that Tz_ψ^(k) = z_ψ^(k−1), so that

ψ(x_k(Tω)) = ω(Tz_ψ^(k)) = ω(z_ψ^(k−1)) = ψ(x_{k−1}(ω)).

This being true for all ψ in C(Λ), we have

(4.7)    x_k(Tω) = x_{k−1}(ω).
Finally we wish to show that the variables x_k separate the points of Ω^-(ξ). To do so we observe that the sequences ξ_ψ^(k) generate a dense subset of A_1(ξ) and hence of A^-(ξ). This is so because the sequence ξ_ψ^(k) corresponds to the function ψ(λ(k)) on Λ^-∞ (λ(k) is the kth component of a sequence in Λ^-∞), and the functions of this form separate points in Λ^-∞, hence they generate C(Λ^-∞). It follows that the z_ψ^(k) separate points in Ω^-(ξ). Now if x_k(ω_1) = x_k(ω_2) for all k, then by (4.5) ω_1(z_ψ^(k)) = ω_2(z_ψ^(k)) for all ψ and all k, so that ω_1 = ω_2; thus the x_k separate points in Ω^-(ξ).

DEFINITION 4.3. The process X^-(ξ) defined by a stochastic sequence ξ(n) is the process with variables x_k, space Ω^-(ξ), transformation T and probability measure P described in the foregoing paragraphs. The corresponding two-sided version will be denoted X(ξ), with space Ω(ξ).

Consider now the sequences of A_0(ξ) which, after identifying sequences that differ by a null sequence, form a dense subset of A^-(ξ). Each of these corresponds to a function on the space Ω^-(ξ), and we seek an exact relationship between the derived sequences of ξ(n) in A_0(ξ) and the functions on Ω^-(ξ). Consider, first of all, the functions of the form ψ(x_k) where ψ ∈ C(Λ). By (4.5) and (4.3), ψ(x_k) corresponds to the sequence ζ_k(n) = ψ(ξ(k + n)). Since the correspondence between functions on Ω^-(ξ) and sequences in A_0(ξ) preserves algebraic operations, it follows that a function of the form ψ(x_{j_1}, x_{j_2}, ..., x_{j_k}) corresponds to the sequence ζ(n) = ψ(ξ(n + j_1), ξ(n + j_2), ..., ξ(n + j_k)). Finally, taking uniform limits in the sequence algebra and in C(Λ^-∞), we deduce that for any ψ ∈ C(Λ^-∞) the function

(4.8)    z = ψ(..., x_{-2}, x_{-1}, x_0)

corresponds to the sequence

(4.9)    ζ(n) = ψ(..., ξ(n − 2), ξ(n − 1), ξ(n)).
From this we may draw several consequences. First of all, since the x_n are Λ-valued, Ω^-(ξ) may be imbedded in Λ^-∞ and hence (§1.1) there is induced a restriction map of C(Λ^-∞) onto C(Ω^-(ξ)) = A^-(ξ). Since the x_k are the coordinate functions imbedding Ω^-(ξ) into Λ^-∞, it follows that every continuous function on Ω^-(ξ), when considered as the restriction of a function on Λ^-∞, has the form (4.8). But then it corresponds to the derived sequence (4.9). In other words we have shown that

(4.10)    A^-(ξ) = A_1(ξ),

i.e., that every element of A^-(ξ) corresponds to a derived sequence of ξ(n).

The points of Λ^-∞ are left-infinite Λ-sequences and, in particular, ξ ∈ Λ^-∞, as do its translates T^n ξ. Thus (4.9) states that the sequence ζ(n) corresponding to the function z of (4.8) may be obtained by evaluating the function ψ on Λ^-∞ at the points T^n ξ. From this we deduce

THEOREM 4.2. Every point in Ω^-(ξ) is a limit of translates T^n ξ of the original sequence ξ(n).

PROOF. If not, there would be a non-zero function on Λ^-∞ vanishing at each T^n ξ but non-zero on Ω^-(ξ). If this function is denoted by ψ, then according to (4.9) the corresponding sequence in A_0(ξ) vanishes identically, whereas the corresponding function in A^-(ξ) would not be 0. This is a contradiction.

We shall later need the following.

THEOREM 4.3. Let φ : Λ^-∞ → Λ', and suppose that η(n) is the derived sequence of ξ(n) defined by

(4.11)    η(n) = φ(..., ξ(n − 2), ξ(n − 1), ξ(n)).

If ξ(n) is stochastic and X(ξ) is its associated process, then the process X(η) associated to the stochastic sequence η is the subprocess of X(ξ) determined by φ. (Cf. Definition 2.4 and Theorem 3.1.)
PROOF. The numerical derived sequences of η(n) are evidently derived sequences of ξ(n); hence A_0(η) ⊂ A_0(ξ) and A^-(η) ⊂ A^-(ξ), so that X(η) is a subprocess of X(ξ). If the variables of X(η) are denoted y_k, then y_k is defined by the correspondence of ψ(y_k) to the sequence ζ_k(n) = ψ(η(k + n)), where ψ ∈ C(Λ'). Using (4.11) we have ζ_k(n) = ψφ(..., ξ(k + n − 2), ξ(k + n − 1), ξ(k + n)), so that by (4.8) and (4.9) we find that ψ(y_k) corresponds to the function ψφ(..., x_{k−2}, x_{k−1}, x_k). Hence we may write y_k = φ(..., x_{k−1}, x_k), and this proves our assertion that X(η) is the subprocess of X(ξ) determined by φ.

The algebra A^-(ξ) is, according to (4.10), essentially a sequence algebra; namely, it is the algebra of sequences derived from ξ(n), identifying sequences that differ by a null sequence. The two-sided algebra A(ξ) has so far been defined abstractly, but it is possible to relate it to a sequence algebra as well. For the algebra A(ξ) is characterized by the property that for a dense subset of z ∈ A(ξ) there exists an m with T^m z ∈ A^-(ξ), or z = T^{-m}w with w ∈ A^-(ξ). Since the operator T in A^-(ξ) corresponds to translation to the right in A_0(ξ), it follows that a dense subset of A(ξ) will be obtained if we consider the translates to the left of sequences in A_0(ξ). More precisely, let B_0(ξ) be the set of all sequences defined on all but finitely many negative integers with the property that some translate to the right is a derived sequence of ξ(n). Since, in identifying sequences that differ by a null sequence, any finite set of values of the sequences is insignificant, this equivalence relation may be extended to B_0(ξ) so that the resulting algebra B_1(ξ) contains A_1(ξ). The completion of B_1(ξ) is then the algebra A(ξ).

4.3. Examples. By way of illustration we shall describe the processes X^-(ξ) and X(ξ) when ξ(n) is a periodic sequence. We note that if ξ(n) has the period v then so does every derived sequence of ξ(n), so that A_0(ξ) will be at most v-dimensional. Moreover, if v is the minimal period of ξ(n) then v is exactly the dimension of A_0(ξ). Now a periodic sequence cannot be a null sequence without being identically 0, so that A_0(ξ) = A_1(ξ), and by (4.10), or by the fact that a finite dimensional normed space is always complete, A_0(ξ) = A^-(ξ). Since every sequence of A_0(ξ) is determined by its last v entries, it follows that A^-(ξ) is isomorphic to the algebra of functions on a space with v elements: {1, 2, ..., v}. Moreover T^v = identity, so that T^{-1} = T^{v−1}. It follows that T is an automorphism of A^-(ξ), so that X^-(ξ) is already two-sided: X^-(ξ) = X(ξ). Since T is measure preserving, the probability measure on Ω^-(ξ) = Ω(ξ) is obtained by assigning probability 1/v to each of the v points of the space.
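A minimal numerical sketch of this example in Python; the specific period-3 sequence, the window length, and the finite truncation are assumptions chosen only for illustration. For a periodic ξ(n) of minimal period v, each of the v distinct length-v "pasts" observed along the sequence occurs with frequency close to 1/v:

    # Sketch: for a periodic sequence of minimal period v, the v distinct
    # length-v blocks each occur with frequency approximately 1/v.
    from collections import Counter

    v = 3
    xi = [n % v for n in range(3000)]                  # a period-3 Lambda-sequence (illustrative)

    blocks = Counter(tuple(xi[n:n + v]) for n in range(len(xi) - v))
    total = sum(blocks.values())
    freqs = {b: c / total for b, c in blocks.items()}   # each frequency is close to 1/v
    print(len(freqs), freqs)                            # 3 distinct blocks, each with frequency ~1/3

This matches the description above: the associated process lives on v points, each carrying probability 1/v.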
If the sequence ξ(n) is not periodic but is "eventually" periodic, say with period v, then each translate T^v ζ(n) of a derived sequence ζ(n) of ξ(n) will agree with ζ(n) except at a finite number of values of n. Hence η(n) = T^v ζ(n) − ζ(n) would be a null sequence that would not necessarily be identically 0. Thus A_1(ξ) ≠ A_0(ξ). However, we would still have T^v = identity in A_1(ξ), so that A_1(ξ) defines the same process as before.

The property A^-(ξ) = A(ξ) just observed for periodic sequences holds for almost periodic sequences as well. In fact, by the definition of almost periodicity, there exists for a given almost periodic sequence ξ(n) a sequence of integers n_k → ∞ such that T^{n_k} ξ → ξ uniformly. This is equivalent to x_{−n_k} → x_0 in A^-(ξ). Hence x_{j−n_k} → x_j for all j, so that each x_j is a limit of functions in A^-(ξ), and hence A(ξ) = A^-(ξ).

§5. Regular Sequences

5.1. Definitions. For the purposes of prediction theory, the class of left-infinite sequences that permit satisfactory analysis is still further restricted. The condition to be imposed in this section will be such as to exclude sequences that exhibit "sudden" changes of behavior. An example of such a sequence would be one that is periodic up to a point and then violates its periodicity, e.g.,

(5.1)    ξ(n) = (−1)^n,    n ≤ −2,    ξ(−1) = ξ(0) = 1.
We first prove

THEOREM 5.1. Let ξ(n) be a stochastic left-infinite Λ-sequence. The following statements regarding ξ(n) are equivalent:

(a) If Λ_1, Λ_2, ..., Λ_k are open subsets of Λ, then the set of n for which ξ(n) ∈ Λ_1, ξ(n − 1) ∈ Λ_2, ..., ξ(n − k + 1) ∈ Λ_k is either empty or has strictly positive upper density.

(b) If ζ(n) is a non-negative derived sequence of ξ(n) that is not identically 0, then ζ(n) > 0 on a set with strictly positive upper density.

(c) If ζ(n) is a non-negative derived sequence of ξ(n) that is not identically 0, then ζ(n) > 0 on a set with strictly positive lower density.

(d) There are no null sequences in A_0(ξ) but for the sequence identically 0.

(e) As a point of Λ^-∞, ξ ∈ Ω^-(ξ).
PROOF. The implication from (a) to (b) is made in several steps. First we note that (a) implies that if r is any open subset of Ak (the k-fold cartesian product of A with itself) then the set of n for which g(n), I(n - 1), ..., £(n - k + 1)) e r is either empty or has positive upper density. This is so because r is the union ofsets of the form A 1 x a2 x . . . x a^, From this we conclude that if y is a non-negative function on A^ then \|r(g(n), g(n - 1), ..., g(n - k + 1 )) is either identically o or positive on a set of positive upper density. Now suppose 5 (n) is a derived sequence of g(n) £(n) = 0; there exists a non-negative y on Ak for some k such that (5.2)
for all
I? ( n)
n.
- iU(n),
|(n - i ) ,
...,
|( n - k + 1) ) | < |
But then
+(|(n), |(n - i),
|(n - k + 1)) > | a
when £(n) = a so that the function V(|(n), g(n - l), ..., i (n - k + 1 )) where V = max(\jr - 2/3a, 0) is not identically o. Hence it is strictly positive on a set of positive upper density; therefore iKi(n), S(n - 1)> ..., g ( n - k + 1 )> 2/ 3 ct on a set of positive upper density and by (5 .2 ) £(n) > a/ 3 on this set of n. This shows that (a) implies (b). To see that (b) implies (c) we remark that if £ (n) > 0 on a non-empty set of n, then also £ (n) - 5 > 0 on a non-empty set of n
for sufficiently small 5. Since this is again a derived sequence of g(n), by (b) £ (n) - 5 > o on a set of positive upper density. Now let F(\ ) be the distribution function of the stochastic sequence - £ (n) (Theorem 3 .3 ). F(x ) will have a point of continuity between - & and 0 and hence D {n : - 5 (n) < - 81} exists for some 8’, 0 < &T < 5. But 0 < D{n : - £ (n) < - 8} < D{n : - £(n) < - 8'} < D {n : - £(n) < 0} = D{n : £ (n) > 0} and this proves our assertion. Condition (d) is contained in the stronger statement: (df) The essential supremum (Definition 4.1) of any derived sequence of g is given by lie IL = sup I? (n) I n 0 (d1) follows.
by Definition 4.1, be positive only on = 0 and |£(n)| < ||E and
To see that (d) implies (e) suppose that g(n) satisfies (d) and that | i oT (g). There then exists a non-negative function cp on a” vanishing on fl~(|) but non-zero at g. Consider the derived sequence £ ( n ) = cp ( . . . ,
| ( n - 2 ),
|(n -
1),
g(n))
we have 5 (0 )> 0 . By (4.8) and (4.9) the element sponding to £(n) is given by z = c p ( . . . , x_2, x ^ , xQ ) Now the
x^
are the variables of
X~(g)
(..., x_2, x-i > xo) -^e ^-n the sample space into a sequence space that ?(n) is a null sequence and by £(0 ) > 0 and therefore we must have
z in
;
A~(g)corre
.
and as such the sequences
the definition of the imbedding of (§2 .2 ). Hence z = 0 .It follows (d) ? (n) = 0 . This contradicts | e ft~(g).
Finally suppose that g € ft“(g) and let A 1, a2 , ..., Ak be open subsets of A. There exist continuous functions t2, ..., ^ on A with positive in Aj andvanishing on A- a •. Consider the element z e A~ (g) defined by (5.3)
By (4.8) and (4.9) this corresponds to the sequence

(5.4)    ζ(n) = ψ_1(ξ(n)) ψ_2(ξ(n − 1)) ⋯ ψ_k(ξ(n − k + 1)).

We see that ζ(n) > 0 if and only if the conditions ξ(n) ∈ Λ_1, ξ(n − 1) ∈ Λ_2, ..., ξ(n − k + 1) ∈ Λ_k are fulfilled. Now since ξ ∈ Ω^-(ξ), all its translates T^n ξ ∈ Ω^-(ξ) (Theorem 2.3). Hence we may rewrite (5.4) as

(5.5)    ζ(n) = z(T^{-n} ξ),

z being defined by (5.3). It follows that if the set of n for which ξ(n) ∈ Λ_1, ξ(n − 1) ∈ Λ_2, ..., ξ(n − k + 1) ∈ Λ_k is non-empty, then z is not identically 0. However, since z ≥ 0 this implies E(z) > 0, and by the definition of E on A^-(ξ) this means that the average of the sequence ζ(n) in (5.4) is strictly positive. Finally this implies that the upper density of the set of points for which ζ(n) > 0 is strictly positive, and this is equivalent to (a). This completes the proof of the theorem.

DEFINITION 5.1. A left-infinite sequence ξ(n) is regular if it is stochastic and it satisfies the five equivalent conditions of Theorem 5.1.

Thus, in the example of (5.1), although ξ(n) is stochastic it is not regular, since the pair of values (1, 1) occurs in ξ(n) but not with positive upper density, thus violating (a). We remark that if ξ(n) is regular then by (d) the ideal N of A_0(ξ) consists only of the 0 element, and hence A_1(ξ) = A_0(ξ). Combining this with (4.10) we find that A^-(ξ) = A_0(ξ), or that A^-(ξ) is simply the algebra of numerical valued derived sequences of ξ(n). The correspondence between the elements of A^-(ξ) as functions on Ω^-(ξ) and the sequences in A_0(ξ) is obtained by way of (e), which asserts that ξ ∈ Ω^-(ξ). Namely, if z ∈ A^-(ξ) is a function on Ω^-(ξ), it may be evaluated at ξ and each of its translates T^n ξ. The resulting sequence is the sequence in A_0(ξ) corresponding to z. More precisely, we may write

(5.6)    ζ(−n) = z(T^n ξ).

This is a direct consequence of equations (4.8) and (4.9) relating the functions on Ω^-(ξ) (or on Λ^-∞) with sequences in A_0(ξ).
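An empirical check of condition (a) on the sequence (5.1) can be sketched as follows (Python; the finite truncation of the past and the helper names are assumptions made only for illustration): the block (1, 1) does occur, but its frequency among the first N positions of the past tends to 0, so the upper density of its occurrences vanishes and (a) fails.

    # Sketch: condition (a) tested on the sequence (5.1),
    # xi(n) = (-1)^n for n <= -2, xi(-1) = xi(0) = 1.

    def xi(n):
        return 1 if n in (-1, 0) else (-1) ** n

    def block_density(block, N):
        """Fraction of positions 0 >= n > -N at which
        (xi(n), xi(n-1), ..., xi(n-k+1)) equals `block`."""
        k = len(block)
        hits = sum(
            all(xi(n - j) == block[j] for j in range(k))
            for n in range(0, -N, -1)
        )
        return hits / N

    for N in (100, 1000, 10000):
        print(N, block_density((1, 1), N))   # a couple of hits out of N: positive,
                                             # but tending to 0, so the upper density is 0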
The following theorem is easily verified:

THEOREM 5.2. A derived sequence of a regular sequence is regular, and a uniform limit of regular sequences is again a regular sequence.

5.2. Existence of Regular Sequences. Theorem 3.4, which states that almost all sample sequences of a stationary process are stochastic and thereby ensures a plentiful supply of stochastic sequences, may be sharpened to give

THEOREM 5.3. Almost all sample sequences {x_n(ω)} of a one-sided stationary process are regular.
PROOF. By Theorem 3.4 and (b) of Theorem 5.1, what remains to be shown is that for almost all ω in the sample space and for any non-negative φ on Λ^-∞ (Λ is the range of the variables x_n of the process in question), the derived sequence of {x_n(ω)} corresponding to φ is either identically 0 or is non-zero on a set of positive upper density. It would certainly suffice to show that for each rational r, the derived sequence is either never > r or is > r on a set of positive upper density. Let us call the pair (φ, ω) regular if φ ∈ C(Λ^-∞), ω ∈ Ω_X^-, and if the derived sequence φ(..., x_{n−2}(ω), x_{n−1}(ω), x_n(ω)) has this property for all rationals. Observe that if φ_k → φ uniformly and the (φ_k, ω) are regular, then (φ, ω) is also regular. For if the derived sequence for φ is somewhere > r, it is also > r + 2ε for some rational ε > 0. Hence if ||φ − φ_k|| < ε, the derived sequence for φ_k is > r + ε on a set of positive upper density, and so the original sequence for φ is > r on a set of positive upper density. Hence (φ, ω) is regular.

It follows that it is only necessary to show that for almost all ω and a denumerable dense set of φ ∈ C(Λ^-∞) the pair (φ, ω) is regular. Since the condition for regularity involves the denumerable set of rationals, it suffices to show that for a fixed φ and r, the derived sequence for (φ, ω) is either never > r or is so on a set of positive upper density. Let Φ = max(φ − r, 0) and let

(5.7)    A = {ω : lim_{N→∞} (1/(N+1)) Σ_{n=0}^{N} Φ(..., x_{n−1}(ω), x_n(ω)) = 0}.
Our theorem will follow if we show that for almost all
As a point of Λ^-∞, ξ ∈ Ω^-(ξ) by (e) of Theorem 5.1. Hence if z ∈ A^-(ξ) = C(Ω^-(ξ)) we may speak of z(T^n ξ). By (5.6), this is ζ(−n), where ζ is the sequence corresponding to z. Since, by the definition of E in A^-(ξ), E(z) is the average of the sequence ζ(−n), we may also write

(5.10)    E(z) = lim_{N→∞} (1/(N+1)) Σ_{n=0}^{N} z(T^n ξ).

The point ξ is thereby distinguished by the fact that the functional E is determined entirely from the values of a function at ξ and its translates T^n ξ. In the following definition we consider the analogous situation for an arbitrary abstract process.

DEFINITION 5.2. If X is an abstract process, A_X^- = C(Ω_X^-), the algebra of a one-sided version of X (for example, X itself), and ω ∈ Ω_X^-, then ω is a generic point of X if for every z ∈ A_X^-, the sequence {z(T^n ω)} has an average equal to E(z).

Thus if ξ(n) is a regular sequence then ξ is a generic point of X(ξ).
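As a numerical illustration of (5.10), the following Python sketch (the concrete sequence, the function z, and the finite horizon are assumptions made only for illustration) computes the Cesàro averages of a derived function z along the translates T^n ξ; for a regular ξ these averages settle down to E(z).

    # Sketch: Cesaro averages (1/(N+1)) * sum_{n=0..N} z(T^n xi) for a derived
    # function z that depends on the last two coordinates of the past.

    def z(past):
        # z = psi(x_{-1}, x_0) with psi(a, b) = a * b (illustrative choice)
        return past[-2] * past[-1]

    xi = [1 if n % 2 == 0 else -1 for n in range(5001)]   # here: the periodic sequence (-1)^n

    def cesaro_average(N):
        # T^n xi is the past ending n steps earlier; with xi stored left to right,
        # truncating n entries from the right plays the role of T^n xi.
        vals = [z(xi[: len(xi) - n]) for n in range(N + 1)]
        return sum(vals) / (N + 1)

    for N in (10, 100, 1000):
        print(N, cesaro_average(N))    # tends to E(z) = -1 for this xi and this z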
One immediate consequence of this definition is that the trans lates T^cd of a generic point of X lie dense in For if this were not the case we could find a non-negative z, vanishing at all the I^cd but not identically 0 . Then on the one hand E(z) > 0 since z f o and on the other N lim -J— V zCl^o)) = 0 N _> 00 N + 1 La n=o so that
cd
could not be a generic point.
For an ordinary process THEOREM 5.4.
cd
X
we have the following theorem.
is a generic point of
if the sequence (|(n) = ^ ( ^ o ^ r K o with X(|) identical to X.
X
is a
if and only sequence
PROOF. Note first that when is identified with a subset of A~ as in §2 . 2 where A is the range space of the x ^ then the point (dq is identified with the sequence |. It follows then that if g(n) is regular, a>0 is generic for X(|), and therefore for X if the two processes are identical. Suppose conversely that F(xA ) = P(a ). On the other hand N D (A) = 11m sup - J — 5
N
V X. (T11!) < 11m
^ ° ° N + 1 IS ' o
" N- > ”
— 3—
N y f U T 11!) = E(fk )
N + 1 i fe)
so that D (A)
oo
E(fk ) = P(A)
K
.
If A is open then D_ξ(A ∪ (Ω^-(ξ) − A)) = 1, where Ω^-(ξ) − A is closed. On the other hand, it is clear that for disjoint sets A_1 and A_2, D_ξ(A_1 ∪ A_2) ≤ D_ξ(A_1) + D_ξ(A_2), so that D_ξ(A) + D_ξ(Ω^-(ξ) − A) ≥ 1; and since D_ξ(Ω^-(ξ) − A) ≤ P(Ω^-(ξ) − A) = 1 − P(A), we conclude that D_ξ(A) ≥ P(A).

THEOREM 7.1. A regular sequence ξ(n) is topologically ergodic if and only if for every numerical derived sequence ζ(n) of ξ(n) and every ε > 0 there is an integer N such that

(7.1)    max(|ζ(ℓ)|, |ζ(ℓ − 1)|, ..., |ζ(ℓ − N)|) > ||ζ||_∞ − ε

for all ℓ but for a set of upper density < ε.
PROOF. Suppose first that |(n) is topologically ergodic; let {;(n)be a derived sequence and let e > o. Let zdenote the function on corresponding to the sequence ? (n) sothat £(n) = z^T11!) (Equation (5.6)). For each n > 0 define Ah = {CD € n_(i ) : max ( |z(to) |, |z(Tu>) |, ..., Iz CI^cd) I) < Ik lloo "
and le t
_ A= N
n > o
^
.
Ah is closed so that A will also be closed. Moreover if 0; from this we shall obtain a contradiction. First of all, z corresponds to a sequence in = £(- n). Take e < a and set
z(Tn O
r = (a) e ft”U ) where N holds for aa and T a Ta open so wwhich h ich
y^ a y
e
: max (|z(co)|,
|z(To>)|, ...,
A Q (|)
(§5 .1 )
|z(T^(co)| < 1 - e}
is is such such that that max max ( |£ (^ ) |,|£ (^ I£ (^- -1 )1|,) I>. .., •••> |I££ (^ - N ) |) > 1 - e all upper density density > a. a. But this would would meanthat thatthe the set of £ forfor
max ((|£(i)|> (^ - 1) I> •••> |£ ( O I> |£ \Z(£ • • •> |£ l£ (^ U - N) | < 1 - - ee which w h ich contradicts our choice of N.
has upper density density
THEOREM 7.2. A regular sequence ξ(n) is ergodic if and only if for every numerical derived sequence ζ(n) of ξ(n) and every ε > 0 there is an N such that

(7.2)    | (1/(N+1)) Σ_{j=0}^{N} ζ(ℓ − j) − E(ζ) | < ε

for all ℓ outside of a set of upper density < ε. (E(ζ) is the average of the sequence ζ(n).)

PROOF. Suppose ξ(n) is ergodic, ζ(n) a numerical derived sequence of ξ(n), and let z ∈ A^-(ξ) be the function on Ω^-(ξ) corresponding to ζ(n). Define

A_N = {ω ∈ Ω^-(ξ) : | (1/(N+1)) Σ_{j=0}^{N} z(T^j ω) − E(z) | ≥ ε}

and set

A = lim sup_{N→∞} A_N = {ω : ω ∈ infinitely many A_N}.
If cd e A then cd is clearly not a generic point for the process; hence A has measure 0, assuming X(|)is ergodic. Consequently P(a ^) ----> 0 so that there exists an N for which P(a ^) < e, for a preassigned b > o. By Lemma 7.1 since a ^ is closed, D^(a ^) < e and this means that the set of & for which (7.2) does not hold has upper density < e as was to be shown. Now assume that the condition of the theorem is fulfilled for all £(n) derived from g(n). From this we may deduce that If z e A “(|) there is a sequence (N^J , N ^ -- > such that
1—
(7.3)
Y
Tjz —
■;> E(z)
*k + 1 in the norm of
L (A”(g)).
For this choose
N^
such that
%
:
~— y s(^ + j ) - e (z ) + 1 j=o for l outside a set of upper density corresponding to z. Then Nn Lk> 2 y TJ’z - E(z)
■J— + 1 J=o
11m N-» 00 N
I
< 1
= lim— 3— y N- » N + 1
«>, (7 .3 ) holds. It is moreover easy to show that if (7.3) holds for a dense set of z € L 2 (A~U)) it holds for all of o L (A“(g)). Now suppose X(g) were not ergodic so that there would exist a non-trivial measurable set a In with Ta C A. The character istic function of this set would satisfy Tx A almost everywhere. Taking in ( 7 .3 ) would give x = E(z), a constant which shows A A must have measure 0 or 1 . This completes the proof. In Chapter 9 we shall need the following stronger form of the "only if" portion of the preceding theorem. We note that in the case of Theorem 7 . 1 the corresponding sharper form would have been trivial. THEOREM 7*3* If i(n) is an ergodic sequence and £ (n) a numerical derived sequence of g(n) then for any e > 0 there exists an N such that
-—
+ 1
for all density
y
s(i - j)
E(£ )
< e
J=o
N > Nq and all & outside a set of upper < s and independent of N.
PROOF. We may suppose that £ is real valued; by addition of an appropriate constant, to the terms £(n), the assertion of the theorem may be reduced to the following. If E(£ ) < 0 , there exists an Nq such that for all N > N , N £ (i - j) < 6N J= o
for all 4 outside a set of upper density < s . It will be convenient to consider this over the positive integers so that the desired Inequality takes the form N ^
£ it + j) < 5N
j =1
for z 0 and outside a set of upper density < e . Suppose that we have E(£ ) = - a < 0 and that ||(; = p. We decompose the positive in tegers into two sets R and S defined by k R = in : y
z
£(n+j)>^0
for some
£ (n + j) < 0
for all
k
j=o k S = In : \ I We observe that S sufficiently large
k
J=o
must be infinite for if it were finite, then for n, n*
Y,
^ 0
j=n for appropriate
n*
and we could not have N
i
. 1
We may also assume R Is infinite for otherwise there is nothing to prove. Now apply Theorem 7 . 2 to the sequence £(n); then for arbitrary &2 > 0 we can find an N 1 with
N1 • '
>0
for all l outside a set U of upper density < We now proceed to define a set S consisting of blocks of N 1 + 1 consecutive integers defined as follows. Let n^ be the smallest element of S not in U (if no such n 1 exists then S will be empty); the block of integers (n^ n 1 + 1 , ..., n 1 + N 1 ) is then taken to belong to S. If n 2 is the
least element of S n 1 + N 1 + 1 but not in U then (n 2 , n 2 + 1 , ..., n 2 + N 1 ) also belongs to S. Proceeding in this manner we obtain the set S with the property that if n e S but n / U then n e S. Now let N^. be a sequence of integers with N^. € S and N^ --- > °o0 This is possible because R is infinite. We also suppose that N^. is so large that thenumber of integers in U which are < N^ is < 2 S2 ^k* Letting a^ denote the number of integers of S that are < N^ we find by decomposing theseintegers into blocks that
Y
£(n) > -
( a + 51)ok - N , ( p - a
- 61)
.
ruS
n - (a + 5 , ) ^ - N,(p - a - 8 ,) - 252NkP
S
must
.
neSUS
n 0 j=n
•
It follows that the £ (n) for n 4 S U S and to blocks of consecutive terms whose sums are
n < Nk are decomposed in o . Hence
Nk -1
Y
5 ( n) > -
( a + 51 )crk - N, (p - a - 61 ) -
2
S2 Nk P
.
n=l
Dividing by
Nk
and letting -
Since Is an
&1
N1
a
k -- > «
> -(a+ 51 ) 1 k
we find that °k lim inf ooNk
— -
25 J
d
and could be chosen arbitrarily small we find that there and a corresponding S with ak lim inf — > 1 k -» 00 N^
e
so that the upper density (along the sequence N^) of the Integers not in S is < e . To show that this implies that the ordinary upper density (i.e., not only for N^) of the complement of S is < e we must show that wecan choose Nk in S with Nk+ 1 /Nk ----> 1 . Now ifthis were not the case it wouldfollow that £(n) could not be stochastic. Forsuppose that for a large N^, Nk + 1 > (1 + 5 1 )Nk where Nk + 1 is the next integer In S after N^. As we have shown
W
Y n=Nk + i
so that
5(n) >. 0
N,, t (n) >
I
) i(n) - - aNv > n=l
' (n) >- I
n=l
+ 5'
k+1
which could not he possible for large k. We conclude then that the comple ment of S has ordinary upper density < e . To complete the proof of the theorem we shall show that there is an Nq such that for l e S,
£(•« + j) < 8N
Y
.
j= 1 For this we recall that by the definition of for some m < N 1• But then
5,
if
1 e S
then
4 - m e S
N
z
j=-m
and N
Y
?(•« + j)
N^/8.
X-ergodicity.
THEOREM 7.4. A regular sequence |(n) is X-ergodic if and only if for every numerical derived sequence t(n) of |(n) and £ > o there is an N such that N (7.4)
r—
Y
e~2*ij*\(i - j:
< e
+ 1 J=o r-U for all
4
outside of a set of upper density
< e.
PROOF. The proof of this theorem proceeds exactly as the proof of Theorem 7.2 once two facts are established. The first is that if X(|) is x-ergodic then for almost all o> e fl”(|) N (7.5)
--lim N oo N +
V
1 j=o
e”2*1^'*T^z(o>) = 0
The second fact is that, conversely, if for every
(7.6)
2 — z € L (A~(g))
11m ----— V e-2*^ j^T^'z = 0 % - 1 £„
In the sense of wealc convergence In L2(A-(|)) for an appropriate sequence Nk , then X( |) is x -ergodic. The first follows from the fact ([3 ], P* 469) that the limit in (7 .5 ) does exist almost everywhere and if this limit oiri1 is w we have Tw = e w. By X -ergodicity w = 0 . The second assertion 2itiX is more elementary since a function satisfying Tw = e w has to be 0 if (7 .6 ) holds with z = w. The details of the proof now follow as for ordinary ergodicity. It might be pointed out that the criterion of Theorem 7 . 2 for ergodicity is, in a sense, a refinement of the condition that a sequence be stochastic. For whereas the latter merely requires the existence of an average for every derived sequence; ergodicity requires some uniformity in the approach to this average. This observation may help clarify the role of ergodicity in some of our results. Finally, we remark that the theorems just proven have the follow ing corollary: COROLLARY. A uniform limit of topologically ergodic, ergodic or X-ergodic sequences is again respectively topologically ergodic, ergodic or X -ergodic. 7 .2 .
Examples. Let us call a sequence a "random sequence" if it is generic for a random process, i.e., a process with independent vari ables. A random sequence is both ergodic, (and hence topologically ergodic) and x -ergodic for 0 < x < 1 . Both of these follow from the fact that if X is a random process, then for a dense subset of A^, the variables z and l^z will be independent for sufficiently large n. On the other P If Tk hand if Tn = e u, 0 < x < 1, then all translates of n are proportional to u, and this implies that u is constant if X = 0 and u = 0 for 0 < x < 1 , so that X is both ergodic and X -ergodic. If |(n) is an almost periodic sequence then |(n) is ergodic. To see this we apply the criterion of 7 . 2 and observe that a derived se quence of an almost periodic sequence is again almost periodic. Now if £(n) is almostperiodic, the proof of theexistence of an average for £(n) shows that this average is approached uniformly, i.e., N — —
y
" * 1 £b
- j> —
> Ms)
uniformly In & . It follows that for N sufficiently large the left hand side of (7 .2 ) may be made arbitrarily small, for all i without any ex ceptional set. On the other hand a non-constant almost periodic sequence cannot be X -ergodic for every x . For it follows readily from the criterion of Theorem 7.4 that if g(n) is x -ergodic then for any i
_ j) = 0
lim - i _ y N -> 00 n + 1 J=o
(7.7)
where £ (n) is a derived sequence of |(n). However if we consider (7.7) for £ (n) = g(n) this implies that the Bohr-Fourier coefficient of |(n) corresponding to e 2jtiX vanishes. Since this cannot happen for every X £ 0 unless g(n) is a constant our assertion follows. It is possible to construct sequences that are not ergodic by a method of superposition of two sequences that are generic for different processes. To make this precise let us consider a partition of the nega tive integers into infinitely many blocks of consecutive integers: ..., I_n > •••> 1^2’ *o where intervals I are arranged according to the natural order of the integers. Suppose that I has k^ integers and that
(i)
> 00
(ii)
k, - a ---- > a k,2n+l
(iii)
k, . - a ---- > s 2n-i
where
a+p=l,a,
p^o.
Now let ^(n) and l2(n) be two regular sequences; we shall form a new sequence |(n) from these by means of the partition {I_n ) . Namely, the last kQ terms of £2(n) are placed in order in the positions of IQ, the next k2 terms of £2(n) are placed in the positions of I_2, the next k^ in I_^ and so on. The last k 1 terms of ^(n) are placed in order in the positions of the next k^ in the positions of I and so on till all the I_2n are occuPieci "by en” tries of |2(n), and the I_2n+1 ^ t‘ tie errtries l-j(n ). Conditions (ii) and (iii) now imply that the entries of l2 (n) and of ^(n) occur in the ratio a : p. From this one can show the following: If ^(n) and l2(n) are both stochastic A-sequences then the new sequence, |(n),
will again be stochastic. If cp e C(A“ ) and £(n), £.,(n), £2(n) are the derived sequences of |(n), ^(n), £2(n) corresponding to cp then the averages of £,£.,> £2 are related by (7.8)
E(£ ) = dE(E2 ) + $E(£1 )
.
The sequence ξ(n) defined here will be termed an (α, β) superposition of ξ_1(n) and ξ_2(n).
P(A) = aP2 (A) + pP^A)
for any borel set a ( {- 1, 1}“. In particular ft“U ) must be all of {- 1, 1}” . Prom (7.9) it is clear that X(g) is not ergodic. For P and P2 are not identical and hence the points in {- 1 , 1}~ generic for X( ^ ) will be disjoint from those generic for X(|2 ). Referring to these two sets as A 1 and a2 we have from (7 .9 ), P(a1 ) = p, P(a2 ) = a. But clearly A 1 and a2 are Invariant under T so that there exist non trivial invariant subsets for X(g), and X(|) will not be an ergodic process. As regards the sequence |(n) itself we can deduce that |(n) is regular from the fact that it is stochastic and that fl~(|) = {- , 1 }“ whence | € ft“(|). We thus have constructed a regular sequence that is not ergodic. The sequence constructed above is however still topologically ergodic, as may easily be verified. A non-topologically-ergodic sequence may also be found by forming an (a, p) superposition of a random {■- 1 , 1 } sequence with a random {- 1 , 0 * 1}-sequence. That such a sequence cannot be topologically-ergodic may be verified either directly from the sequence, using Theorem 7 . 1 or by an analysis of the corresponding process and its sample space.
CHAPTER 2.
THE PREDICTION PROBLEMS FOR SEQUENCES
The problem of prediction to be treated here, as we have stated in the Introduction, is that of assigning to a sequence representing the past behavior of some variable a probability measure on the set of all possible futures of the variable. Our procedure at first will be to set down the basic properties that such a probability measure would be expected to possess. It will appear that in certain cases these properties uniquely determine the unknown measure; for such sequences the problem of prediction becomes trivial, in a sense. These "trivially" predictable sequences will be the object of our study in this and the succeeding three chapters. Subsequently we shall consider how a measure may be chosen for prediction when more than one such measure is admissible.

§8. The Support of the Prediction Measure

8.1. L-sequences. Let ξ(n) denote the given left-infinite sequence. Each possible future of ξ(n) is a right-infinite sequence; however, we shall think of it rather as a doubly-infinite sequence extending ξ(n). Thus the set on which the prediction measure will have to be defined is the set of all doubly-infinite sequences extending ξ(n).
Our first step will be to make an assumption restricting the set of doubly-infinite sequences that will be admitted to extend ξ(n). Such an assumption amounts to restricting the support of an allowable prediction measure. The purpose of the present restriction is to exclude doubly-infinite sequences in which the events of the future cannot be approximated by events of the past. In particular it guarantees that the future of a Λ-sequence will be a Λ-sequence.

DEFINITION 8.1. A doubly-infinite Λ-sequence ξ(n) is an L-sequence if for every numerical derived sequence ζ(n) of ξ(n) (Definition 3.3)

(8.1)    sup_{n ≤ 0} {|ζ(n)|} = sup_{−∞ < n < ∞} {|ζ(n)|}.
The initial "L" in this definition is intended to call to mind the theorem of Littlewood ([9]) to the effect that an n-body system con fined to a fixed region in space throughout the past will, with probability 1, remain in that region in the future. The property of being an L-sequence is an extension of this and may be described as the requirement that whatever has been true throughout the entire past remains true in the future. We shall see presently that Littlewood1s theorem extends to the statement that almost all sample sequences of a two-sided process are L-sequences. As a result it appears plausible to allow only L-sequences as the doubly-infinite sequences extending |(n). If the left-infinite sequence |(n) is a regular sequence then restricting its extensions to L-sequences has a particular significance: THEOREM 8.1. Let |(n) be a regular left-infinite A-sequence and suppose that !(n) in A^ extends i(n) : = |. Then \ is an L-sequence if and only if 1 € ft(t). PROOF. Suppose first that 1 € &U). By Theorem 2.4 this im plies that ^(T"*111!) € for all m. Now let £ (n) be a derived se quence of 1 (n). For each m, T”m £ will be a derived sequence of T”™! and p(T”m £) will be a derived sequence of p(T-nv£) € fl~(g). In each case the derived sequence corresponds to the same fxmction cp on A” . It follows then that the entries of the sequence p(T”m £) are values of cp at points of fl”( 0 and hence the supremum of the absolute values of these entries is dominated by the supremum of |cp| on ft~(|). Since the trans lates T1^ are dense in fl”(|) (Theorem 4.2), the latter agrees with the supremum of {|cp(T11^ ) |} = { |£(- n) |}. What we have just shown is that (8.2)
sup { |p(T"m O(n) |}< sup n < o n < o
(
U(n) |)
or (8-3)
sup £ |£(n) |} < sup CU(n)|} n < m n < o
and since the right hand side is evidently dominated by the left hand side if m > o, we must have equality. Letting m -- > » we obtain (8.1 ). Conversely suppose that !(n) is an L-sequence, extending |(n). To show that e ft(|)we apply the criterionof Theorem 2.4, namely, we show that for all m > 0, p(T~mK) e ft“(|).In doing so we shall make use of the metric on A which we denote by d(., .) . In order for
^(T"111!) to belong to ft”(g) it suffices that the translates T^g come arbitrarily close to p(T”m,|). For this we will show that the distances d(T^g(j), P(T"’ml') (j)) can be made arbitrarily small for any finite set of j, by choosing z appropriately. Namely, let £(n) be the derived sequence of T(n) defined by
f
(8 4)
£(n) = h
0
+
V,
V
1"1
d(|(j + n), p(T~m?)(j))
J=“-
Since p(T~mT)(j) = T(j + m) In general £(n) < 1 so that
it follows that when
sup |£ (n) | = 1 -°° 1(0)), which occurs infinitely often in £(n) by the requirement of regularity, will always be followed by the same element of A and this element will have to be |(1 ). PROOF. It is only necessary to prove that for some N the se quence (I(- N ), | ( - N + 1 ) , ..., |(o ) is always followed by the same element, since once this is shown, the condition that the extension of |(n) be an L-sequence guarantees that g(l) takes on this value. Assume then that no such N exists. We will showthat |(n) is not determin istic by exhibiting two distinct points in n~(|), and cd2, with the property that Ta^ = Tcd2 = £. Then and cd2 have preimages and t]2 in fl(i),i.e., pf^) = o>1, P(t]2) = a>2, and since cd1 £ cd2, t]1 f r)2 and also Ttj 1 £ Tt)2 since T is 1 - 1 on fl(g). But p(Ttj 1) = Tp(nn) = Tco^ = £ and p(Tt]2)= | so that £ will have two extensions in &(£)• We recall that the topology of A~ is given in terms of pointwise convergence so that for A discrete we shall have cd^ -- > cd if and only if cDk (i) = co(i) for each I, for sufficiently large k. Now, if no N exists with the required property there will be arbitrarily large blocks ({■(- N), g ( - N + 1), ..., g(o)) occurring in the past of |(n) followed sometimes by one and sometimes by another element of A. Since anything that occurs once does so infinitely often, we can find two ele ments of A, say a and b such that arbitrarily large blocks will occur followed sometimes by a and sometimes by b. If cd is a A-sequence and X e A, let (cd, \) denote the sequence ..., cd(— 1), cd(o), What we mi have just said implies that there is a sequence of T £ converging to ni
(£, a) and a sequence T £ converging to (£, b ). Hence (£, a) and (|, b) € ft”(|) and these have the properties claimed for cd1 and cd2 earlier. This proves the theorem. 9.2. Construction of Deterministic Sequences. The following theorem shows that the class of deterministic sequences is quite extensive. THEOREM 9.2. Any regular sequence |(n) is a derived •ksequence of some deterministic sequence | (n).
PROPF. Suppose that |(n) is a A-sequence; the space ft(g) * may then be imbedded in A^. Define A as the space of all right-infinite A-sequences with the property that they extend to doubly-infinite A-sequences in ftU). Since the shift operator T is invertible in ft(|) * it follows that A is closed under the operation S of translating se quences to the left. Now let !(n) be any extension of |(n) in fi(g), i.e., °|(n) = |(n) for n < o. From l(n) we form the A -sequence (9.1)
S*(n) = (l(n), |(n + 1), ...)
* * * Note that | (n + 1 ) =S U (n)) so that if | (n) is plausible that it willbe deterministic. In any case derived sequence of g (n).
n < o
.
regular it is g(n) is clearly
a
*
To show that | (n) is regular we remark that to do so it evi* dently suffices to show that every numerical derived sequence of | (n) is regular. Moreover by Theorem 5 . 2 it suffices to show this for a dense * / subset of these derived sequences. Denote by X = ( . . . ) a typical point of A . We obtain a dense set of derived sequences by taking functions cp on A^ depending only on finitely many coordinates, and for each coordinate \ depending only on finitely many j = 1 , 2, 3, * J ... . Applying a function cp of this type to g (n), we shall obtain a derived sequence which is also a derived sequence of some V
= (..., T(k - 2 ), g(k - 1 ), g(k)) .
It suffices therefore to prove that g1 is regular. If k < 0 , then |!(n) is a derived sequence of g(n), hence is itself regular. If k > 0 , g'(n) consists of finitely many entries adjoined to g(n) so it will at least be stochastic. Moreover X(g!) = X(|) so that in order to prove that gf(n) is regular it suffices by Theorem 5 . 1 to show that gf € fl“U). But g 1 = p(T~k|) and since 1 e ft(g) it follows that g! e ft~U) and hence g!(n) and therefore also g (n) are regular. -X-
To see that g (n) is deterministic we remark that the fact that |*(n + 1 ) = S(g*(n)) for all n implies that for all co* e ft“U*) and for all n, o>*(n + 1 ) = S(co*(n)). Now if l*(n) is an extension of g*(n) in fl(g*) then p(T“l*> e n“U*) and so 1*(1) = SU*(o)) = S U (°)) since 1 (n) agrees with g*(n) for n < 0 . It follows that 1 (1 ) is uniquely determined by g*(n) and similarly all the 'i (n) are * determined uniquely so that g (n) is deterministic.
§ 1 0 . Prediction Measures and Continuous Predictability 1 0 .1 . Prediction Measures. To widen the class of "predictable" sequences we shall have to introduce other hypotheses regarding the pre diction measure. The assumption made so far, that the doubly-infinite sequences to be considered should be L-sequences, has served to restrict the support of this measure to a subset of ft(|). In this section we make use of the probability measure induced by £(n) on ft(|) to further restrict the nature of the prediction measure. Essentially our assumption will be that the prediction measure behaves "something like" a conditional probability measure determined by the original probability measure on n(|) conditioned by the event that i has occurred (or, the
conditional probability with respect to the set C &(£)> P being the canonical map from flU) to Since the event {|} (or the subset p”1(|)) has probability o in general we cannot speak strictly of a conditional probability and what we shall do is to require that the prediction measure behave in a manner similar to the ordinary conditional probabilities. Stated precisely, we require that if the conditional expectation of a function on ftU) with respect to every set of non-zero measure that is "close to" p-1(£) is non-negative, then the prediction measure should also assign a non-negative value to the function. In the above condition we make use only of the E-algebra structure of A~(£) and A U ) without reference to the transformation T and the variables xn of the process X(£)- For this reason we shall state the formal definition of a prediction measure in terms of an arbitrary pair of E-algebras, A and B, with A C B. This will be useful because we shall later require a notion of predictability from one algebra to another when the algebras do not necessarily correspond to the past and future of a process. With the E-algebras A and B, A ( B, we consider the homo morphism spaces and and the induced map p of onto 0 ^. As usual the inclusion A ( B is intended to imply that the expectation functional on A is the restriction of the one on B, or, in terms of the probability measures P^(A ) = pb (P~1 (a)), a ( We therefore de note both P^ and P^ as P . In what follows we shall assume that A and B are separable, so that and are metrizable and the top ologies of and Og may be described in terms of convergence of se quences . The condition given for prediction measures can be stated as follows: DEFINITION 1 0 .1 . If
|
and
|± is a probability measure
6k
CHAPTER 2 *
on
fig
we sa y t h a t
h as th e
PREDICTION PROBLEMS FOR SEQUENCES
p.
i s a p r e d i c t i o n m easure a t
p ro p e rty th a t f o r
n o n -n e g a tiv e f u n c t io n E ( f g ) > 0,
th en
some n eigh b o rh o o d U
f eA
I
C^
and h a v in g su p p o rt
i f w henever
o f th e
in
U
g € B
p o in t £,
any
s a tis fie s
n( g) > 0.
Two co n seq u en ces o f t h i s d e f i n i t i o n a r e g iv e n in th e f o l lo w i n g lemma: LEMMA 1 0 . 1 .
If
p
i s a p r e d ic t i o n m easure a t
th e n th e su p p o rt o f f o r elem en ts PROOF.
F or
p
i s c o n ta in e d i n
|,
(I )
and
€ A, p( g) = g U ) •
th e f i r s t a s s e r t i o n o f th e lemma, l e t g 1
n e g a t iv e fu n c t io n in show t h a t
g
_ -j
B
p ( g ! ) = o.
v a n is h in g on th e c lo s e d s e t
F or a r b i t r a r i l y sm a ll
e > o
be
a
non
We must t h e r e w i l l be a
n eig h b o rh o o d U o f | in su ch t h a t g = e - g 1 i s n o n -n e g a tiv e in —1 —1 p” (U). F or o th e r w is e we w ould have g 1 ( ) > e f o r some co i n p" (U) f o r a r b i t r a r i l y sm a ll f o r w h ich in
U
and e
U
g 1 (to) > e
and we c o u ld th e n f i n d
an
w h ich i s im p o s s ib le s in c e
cd
w it h
P( cd) = |
g'Ccn) = 0.
S in c e
g > 0
th e h y p o th e s e s o f D e f i n i t i o n 10. 1 a re o b v io u s ly s a t i s f i e d f o r
U
so t h a t i f
p
is
a p r e d ic t i o n m easu re,
b e in g a r b i t r a r y t h i s im p lie s Now suppose
on
U )
lemma
\i( p(g* g* )) = =0 0
g
g
A
s in c e f of or r cocd e so so t h t haat t
p(g1) = o
and c o n s id e r g(a>) = g U ) -
n( g) > 0
ov
g
p(gf )
) = g U ) -
n( g) = g ( | ). p(Tg) h is= pgr (o |v ). e s th T he islemma. p r o v e s th e lemma.
Thus we se e t h a t th e c o n d it io n o f D e f i n i t i o n 10. 1 r e q u ir e s in p a r t i c u l a r t h a t th e p r e d ic t i o n m easure have i t s o f e x t e n s io n s o f
|
from
A
to
B.
su p p o rt in th e s e t
C o rresp o n d in g to t h i s c o n d it io n we
may d e fin e a form o f p r e d i c t a b i l i t y f o r se q u en ces p ro p e rty th a t th e re i s n itio n 10. 1.
p- 1 ( | )
£(n)
j u s t one p r e d i c t i o n m easure a t
t h a t h ave th e |
s a t is fy in g D e fi
T h is le a d s t o th e fo llo w in g , d e f i n i t i o n w h ich i s a g a in form u
l a t e d f o r a b s t r a c t E - a lg e b r a s :
DEFINITION 10.2. The E-algebra A is continuously predictable (c.p.) to B at ξ ∈ Ω_A if there is exactly one prediction measure at ξ. A will be said to be continuously predictable to B, without reference to a point in Ω_A, if it is c.p. at every point of Ω_A. A process X will be called continuously predictable if A_X^- is c.p. to A_X. Finally, a regular sequence ξ(n) is continuously predictable if A^-(ξ) is c.p. to A(ξ) at ξ.

We proceed to investigate the conditions for continuous predictability. For this we shall have to develop certain preliminary notions.
By th
1 0 .2 .
Preliminary Lemmas
DEFINITION 10.3. Let A be a separable E-algebra, Ω_A its homomorphism space, and let φ ∈ L(A) (Definition 1.2). The function φ is continuous at ω ∈ Ω_A with value a if for every ε > 0 there is a neighborhood U_ε of ω such that

(10.1)    |E(fφ) − aE(f)| ≤ εE(f)

for all non-negative functions f ∈ A having support in U_ε.
We note that if (1 0 . 1 ) holds for f € A with support in also holds for f e L(A) with support in U£ .
U£
it
One easily shows that if cp e A, i.e., cp i s a continuous func tion on in the ordinary sense, then cp is continuous in the sense of Definition 1 0 . 3 at every cd e 0 ^ with value cp(co). Moreover sums and products of functions continuous at cd are continuous at cd. For example if cp and \jr are continuous at cd with values a and p, then |E(fq>1 r) - apE(f)| < |E(fcp*) - aE(f*)| + |a||E(f*) - pE(f)|' which can be made arbitrarily small if (1 0 .1 ) applies to cp and \|r, with ff re placing f in the first term. LEMMA 1 0 .2 . If
n ) -- > a(co). Choose en > 0 with sn -- > 0 and find neighborhoods Un of con such that (1 0 . 1 ) Is satisfied for a = cd we may suppose that from some point on, Un C ue . Then if f has support in Un for large n we have simul taneously |E(fq>) - a(con )E(f) | < £nE(f) and |E(fcp) - a(cD)E(f)| < sE(f) whence |a(
so that
0,
p.' gives
n* ( - g + J i ( g ) + £ ) > o
\i'(g) = n(g).
This proves the theorem.
COROLLARY 1 . The function E(g|A) is continuous at | (in the sense of Definition 1 0 .3 ) for each g e B if and only if A is c.p. to B at PROOF. Forif E(g|A) is continuous at £ with value a(g) then the proof of the preceding theorem shows that for any prediction measure jj. at |, n(g) =a(g). Hence p. is unique. Conversely if A is c.p. to B at |then by the theorem every predicting sequence at £ converges over B and by Lemma1 2 .3 , E(g|A) is continuous at £. The foregoing corollary is the Reason for the term "continuous" predictability. COROLLARY 2 . If A is c.p. to Bat |, then the unique prediction measure at 5 is a limit over B of non-negative functions in A. PROOF. For a predicting sequence at | always exists as we have remarked in the proof of Lemma 1 0 .3 . By the theorem this predicting sequence must converge to the unique prediction measure over B. 1 0 ^. Continuous Predictability at Every Point. Inthe follow ing theorem we shall find conditions for A to be c.p. to B at every point, or simply, for A to be c.p. to B. If B is an E-algebra and M a subset of B we denote by M x the set of elements f of B satisfyE(fg) = 0 for all g e M.
THEOREM 10.2. A is c.p. to B if and only if B is the direct sum of A and Ax . If A is c.p. to B then the uniqueprediction measure \i at % de pends continuously on g e Moreovern^(g) = E(g|A)U), g € B, almost everywhere in if A is c.p. to B, and E(g|A) € A for all g e B if and only if A is c.p. to B. PROOF. The last statement is an immediate consequence of Corollary 1 to Theorem 10.1 and Lemma 10.2, since E(g|A) c A means that E(g|A) is continuous at every point of in the sense of Defi nition 1 0 .3 . We turn next to the first statement of the theorem. If A is c.p. to B and g € B we may write g = E(g|A) + [g - E(g |A) ] where E(g|A) € A as we have already shown. ThiS gives a decomposition of g into two functions belonging to A and Ax respectively. Conversely, suppose for each g, g = g1 + g2 with g1 e A, g2 € A"*". If f e A then E(fg2 ) = 0; hence E(fE(g2 |A)) = 0 and since f is an arbitrary ele ment of A E(g2 |A) = 0. It follows that E(g|A) = E(g1 |A) = g1 € A, or thatE(g|A) belongs to A for all g in B. Again, by what we have alreadyproven this implies that A is c.p. to B, so the firststate ment is proven. Combining Lemma 1 0 . 3 with Theorem 10.1 we find that when A is c.p. to B at | then n^(g) = E(g|A)U) which proves the remaining statements of the theorem. 10.5. Composition of PredictionMeasures. We now consider the situation arising when three E-algebras A, B, and C are given satisfy ing A C B C C. THEOREM 1 0 .3 . Let A, B, Cbe E-algebras, fi^, fig, ftc the corresponding homomorphism spaces, and suppose A C B C C . Denote by p the canonical map of fig onto and let | be a given point of fi^, its preimage in fig. If A is c.p. to C at g it is c.p. to B at g. If A is c.p. to B at g with pre diction measure *i, and if B is c.p. to C at all points of with the exception of a set of n-measure 0, then A is c.p. to C. In particular if A is c.p. to B and B is c.p. to C then A is c.p. to C. PROOF. For the firstpart of the theoremwe employ the criterion of Theorem 1 0 .1 . Namely, if A isc.p. to C at g then any predicting
§10.
sequence at c.p. to B.
PREDICTION MEASURES AND CONTINUOUS PREDICTABILITY
|
converges over C
and a fortiori over
B
so that
71
A
is
To prove the second part, let W ( "be a compact set such that at all points of W, B is c.p. to C, and assume that for a preassigned s > 0, p(p~1U ) - W) < e . At each cd e W let \±^ denote the unique prediction measure on ^ . Let g be a fixed element ofC . For any a) e W there will be a neighborhood U^ of m such that any non-negative f with support in U^ satisfies |E(fg) -
mJ
g)E(f)| < eE(f)
.
Since we have a covering of W by open sets U^ we may select a finite subcovering {U^., k = 1, 2,.. ., N} . We also construct a family of non negative functions {cp^} of B such that each cp^ has Its support in side an open set of {U^.} and such that ZgCpg < 1 everywhere and EgCpg = 1 in W. The construction of such a set of functions is similar to that of a partition of unity. Now set cp = z^cp^ and suppose that {fn } is a predicting sequence of A at |. We have (10.8) |E(fng) - E(fn^
has its
|E(fn °° and these limits are the prob abilities 11 (1 (1 ) = a^). In order to verify this, recall that | is a generic point of the process X(|) and hence the frequencies observed in |(n) are the measures of certain sets in and &U). In particular the fre quency of £(- N), ..., |(o) in the past of |(n) is the measure of the set in ft~U)defined by {x_N = §(- N), ..., xQ = |(o)}; the frequency of |(- N ), ..., £ ( 0 ), aj is the probability of the set in ft(£ ) defined bt = xo = denote the characteristic function the characteristic function of the quencies obtained by the procedure
X1 =
Now let •••' £ ( ° ) 1 of the former of these sets, X[aj] set {x1 = aj} . The relative fre in question are then
E (x [| (-N ),...,| (o )]x [a .])
(1 0 .1 1 )
J— E (X [|(-N ),...,|(0 )])
Since x[|(- N), ..., |(0)]/E(x[|(- N), ..., £(0 )]) is a prediction se quence at 1 , as is easily verified, the relative frequencies in (10.11) converge to the value n(x[aj]) = iiUO) = aj)• Hence continuous pre dictability ensures that the limits of the approximations in (10.11) ex ist and give the correct value for the prediction. It should be pointed out that for the continuously predictable case there exist many other procedures for obtaining the prediction measure corresponding to the fact that there exist numerous predicting sequences at a point. In other words we could "close down" upon the point 1 in many more ways. Namely, instead of looking at blocks of consecutive terms of the sequence £(n) we could also consider sets of non-consecutive terms and see where these have occurred before. That Is to say, for our Nth approximation we consider the set of
4
for which
CHAPTER 2.
PREDICTION PROBLEMS FOR SEQUENCES
- vN>1) - ?(- vK>1), | U
= i(- vn>2), where the sets of integers
(-
- vNj2)
%(l - vNjmn ) = l(" VN,J%)
2, ..., -
tend to the
set of negative integers (in an obvious manner) as N -- > 00. The prob ability \i(£ (1 ) = a •) is then approximated by the relative frequency of those of the foregoing i for which |(£ + 1 ) = aj. If A( |) is c.p. to A(|) at | then any such method yields the same prediction measure. It is interesting that there are sequences for which the general procedure just described does not work — hence the sequences are not continuously predictable — but the method of consecutive blocks described first does give a meaningful answer. The reason for this is not yet completely clear.
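A computational sketch of the method of consecutive blocks in Python; the alphabet, the sample sequence, and the block length N are illustrative assumptions. The Nth approximation to μ(ξ̂(1) = a_j) is the relative frequency, among earlier occurrences of the terminal block ξ(−N), ..., ξ(0), of the symbol a_j that immediately followed.

    # Sketch: prediction by relative frequencies of consecutive blocks (cf. (10.11)).
    from collections import Counter

    def block_prediction(xi, N):
        """Approximate prediction measure for the next symbol, computed from the
        relative frequencies of the symbols that followed earlier occurrences
        of the terminal block xi(-N), ..., xi(0)."""
        block = tuple(xi[-N - 1:])
        followers = Counter(
            xi[n + N + 1]
            for n in range(len(xi) - N - 2)
            if tuple(xi[n:n + N + 1]) == block
        )
        total = sum(followers.values())
        return {a: c / total for a, c in followers.items()} if total else {}

    xi = [0, 1] * 500 + [0]                # a periodic past ending in ..., 1, 0
    print(block_prediction(xi, 3))         # -> {1: 1.0}: the block 1,0,1,0 was always followed by 1

For a continuously predictable sequence these approximations converge, as discussed above, to the values of the unique prediction measure.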
CHAPTER 3.
EXAMPLES AND COUNTEREXAMPLES
In this chapter and the two following we investigate various classes of regular sequences for their predictability properties. In only a few cases can the terms of the sequences studied be displayed ex plicitly and most often we shall describe a sequence by the statement that it is generic for a process of one kind or another. In this chapter we consider examples of sequences representing the simplest processes and also the one class of regular sequences capable of explicit construction. §11 . Random and Markoff Sequences 11.1. Preliminaries. The most elementary processes from the point of view of prediction theory are processes with independent vari ables — which we call random processes — and Markoff processes. In fact the former may be described as the processes for which the prediction of the future is independent of the individual sequence representing the past, and the latter, as those processes for which the prediction depends only on the last entry of the sequence. (The precise meaning of this will become clear as we proceed.) Next in order of complexity are m-Markoff processes for which the prediction depends upon the last m entries. More precisely an m-Markoff process is one for which (11.1)
(11.1)        P(x_n ∈ A | x_{n-1}, x_{n-2}, ...) = P(x_n ∈ A | x_{n-1}, x_{n-2}, ..., x_{n-m}) .
DEFINITION 11.1. A left-infinite sequence ξ(n) will be called a random, Markoff, or m-Markoff sequence if it is regular and X(ξ) is respectively a random, Markoff, or m-Markoff process.

In all these cases the question of predictability is easily answered after establishing several preliminary lemmas. For the following lemma we define A_X^+ as the algebra of functions generated by the ψ(x_n) for n > 0, ψ ∈ C(A), where the x_n are the A-valued variables for the process X. Thus for a process X, A_X is the algebra generated by the subalgebras A_X^- and A_X^+.
LEMMA 11.1. For a process X, A_X^- is c.p. to A_X at ξ ∈ Ω^-(ξ) if and only if the functions E(z | A_X^-) in L(A_X^-) are continuous at ξ, in the sense of Definition 10.3, for all z ∈ A_X^+.

PROOF. This differs from Corollary 1 to Theorem 10.1 in that here we require the continuity of E(z | A_X^-) only for z ∈ A_X^+, rather than continuity for all z ∈ A_X. Clearly it suffices to establish the continuity of E(z | A_X^-) for z of this form. For the subalgebras A_X^- and A_X^+ generate A_X, which means that every function in A_X is a limit of sums of functions of the form z = z_1 z_2 with z_1 ∈ A_X^- and z_2 ∈ A_X^+. But E(z_1 z_2 | A_X^-) = z_1 E(z_2 | A_X^-), and hence continuity of E(z | A_X^-) for z ∈ A_X^+ really implies continuity for all of A_X.

LEMMA 11.2. For a process X, let T^{-1}A_X^- denote the subalgebra of A_X of those z satisfying Tz ∈ A_X^-. Then A_X^- is c.p. to A_X if and only if A_X^- is c.p. to T^{-1}A_X^-.

PROOF. If A_X^- is c.p. to A_X, then by the first part of Theorem 10.3 it is c.p. to T^{-1}A_X^-, since T^{-1}A_X^- ⊂ A_X. Suppose conversely that A_X^- is c.p. to T^{-1}A_X^-. Since T is an automorphism of A_X (as an E-algebra), it follows that for each n, T^{-n}A_X^- is c.p. to T^{-n-1}A_X^-. By the second part of Theorem 10.3 (i.e., transitivity of the c.p. relationship) it follows that A_X^- is c.p. to T^{-n}A_X^- for all n. Now by Theorem 2.1 the union of the T^{-n}A_X^- is dense in A_X. By Theorem 10.2 we have E(g | A_X^-) ∈ A_X^- for all g ∈ T^{-n}A_X^-, since A_X^- is c.p. to T^{-n}A_X^-, and hence E(g | A_X^-) ∈ A_X^- for all g in the union of the T^{-n}A_X^-. By Theorem 10.2 again, A_X^- is c.p. to A_X, as was to be shown.

COROLLARY. X is c.p. if and only if, for every ψ ∈ C(A), A the range of the x_n, E(ψ(x_1) | A_X^-) ∈ A_X^-.

PROOF. For the functions of the form ψ(x_1), together with those of A_X^-, generate T^{-1}A_X^-.
11.2. Continuous Predictability of Random, Markoff, and m-Markoff Sequences.

THEOREM 11.1. A random sequence is continuously predictable.

PROOF. This is an immediate consequence of Lemma 11.1, since for a random process E(z | A_X^-) = E(z) if z ∈ A_X^+.
Just as easily we prove

THEOREM 11.2. Every Markoff and, more generally, every m-Markoff sequence with discrete state space A is continuously predictable.

PROOF. For if z ∈ A^+(ξ) where ξ is m-Markoff, then E(z | A^-(ξ)) is a function of x_0, x_{-1}, ..., x_{-m+1}. Thus it is a function on the cartesian product A^m, and since A is discrete so is A^m, and any function on it is continuous.

For the general case of a Markoff sequence we consider the transformation R defined by a Markoff process on the set of continuous functions on the state space of the process. R is defined by

(11.2)        Rψ(x_0) = E(ψ(x_1) | A_X^-) ,

where, if ψ is continuous, Rψ will be a measurable function on A (with respect to the stationary measure on A). When the Markoff process is given in terms of transition probabilities p(λ, A_0), λ ∈ A, A_0 ⊂ A, we have

(11.3)        Rψ(λ) = ∫_A ψ(x) p(λ, dx) .

We now apply the corollary to Lemma 11.2 and obtain

THEOREM 11.3. A Markoff process X is c.p. if and only if the transformation R of (11.2) or (11.3) transforms continuous functions on A into continuous functions on A.
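For orientation, here is a small sketch of ours showing the operator R of (11.3) in the simplest situation, a finite state space, where the transition probabilities form a matrix and R is just matrix multiplication (so that continuity is automatic and the condition of Theorem 11.3 is trivially satisfied). The particular matrix is made up.

```python
import numpy as np

# Illustrative transition matrix on a two-point state space A = {0, 1}:
# P[a][b] = P(x_1 = b | x_0 = a).
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

def R(psi):
    """R psi(a) = sum_b psi(b) p(a, b) = E(psi(x_1) | x_0 = a)."""
    return P @ psi

psi = np.array([1.0, -1.0])   # a function on A, written as a vector
print(R(psi))                 # R psi, again a function on A
print(R(R(psi)))              # iterating R gives E(psi(x_n) | x_0 = a)
```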
With regard to Markoff sequences it follows that if R takes continuous functions into continuous functions (for example, if p(λ, ·) is continuous as a measure-valued function of λ), then any generic sequence for X is continuously predictable. Conversely, if Rψ is not always continuous, say x_0 ∈ A is a point of discontinuity, then it is easy to see that a generic sequence terminating in x_0 will not be c.p. When the process is given by transition probabilities p(λ, ·) and x_0 is a point of discontinuity of p(·, ·), it is not surprising that a sequence terminating in x_0 is not c.p., since the prediction measure is precisely p(x_0, ·), which cannot be evaluated from the process if x_0 is a point of discontinuity of p(λ, ·).

The case of m-Markoff sequences and processes need not be given
a separate treatment because, as is known ([3], Chapter II), an m-Markoff process defines an ordinary Markoff process which is in a certain sense equivalent to it. We make this precise in the following:

LEMMA 11.3. To any m-Markoff process X there corresponds a Markoff process Y with A_X = A_Y. To any m-Markoff sequence ξ(n) there is a Markoff sequence η(n) such that ξ(n) is a derived sequence of η(n) and η(n) is a derived sequence of ξ(n).

PROOF. Take y_n = (x_{n-m+1}, x_{n-m+2}, ..., x_n) ∈ A^m and η(n) = (ξ(n - m + 1), ξ(n - m + 2), ..., ξ(n)). It is clear that A_X = A_Y and that ξ(n) and η(n) are derived sequences of one another. The fact that Y is a Markoff process is evident from the fact that y_1 is determined by x_1 and y_0, so that the conditional distribution of y_1 given y_0, y_{-1}, ... depends upon y_0 and the conditional distribution of x_1 given y_0, y_{-1}, ..., which depends only on (x_0, x_{-1}, ..., x_{-m+1}) = y_0.
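The construction in this proof is mechanical, and the following sketch of ours (the names and the sample data are illustrative) carries it out: it passes from an m-Markoff sequence to the associated Markoff sequence of m-tuples and back.

```python
def to_markoff(xi, m):
    """eta(n) = (xi(n-m+1), ..., xi(n)): the derived Markoff sequence of m-tuples."""
    return [tuple(xi[i - m + 1:i + 1]) for i in range(m - 1, len(xi))]

def to_m_markoff(eta):
    """Recover xi(n) as the last coordinate of eta(n)."""
    return [y[-1] for y in eta]

xi = [0, 1, 1, 0, 1, 0, 0, 1]        # a finite stretch of an m-Markoff sequence
eta = to_markoff(xi, m=2)            # [(0, 1), (1, 1), (1, 0), ...]
assert to_m_markoff(eta) == xi[1:]   # each sequence is derived from the other
```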
It follows from this that an m-Markoff process will be c.p. if and only if the corresponding Markoff process is, and similarly for sequences. For A_X^- and A_Y^- are identical, as are A_X and A_Y. Moreover, in the isomorphism of A_X and A_Y the point ξ ∈ Ω^-(ξ) is sent into the point η ∈ Ω^-(η), so that A^-(ξ) is c.p. to A(ξ) at ξ if and only if A^-(η) is c.p. to A(η) at η.

§12. A Non-Continuously Predictable Sequence
12.1. Construction of the Sequence. In the preceding section we found that there exist Markoff sequences that are not continuously predictable. By Theorem 11.2 such a sequence must take its values in an infinite space. We now give an example of a two-valued regular sequence that is not continuously predictable. This will be all the more surprising since the sequence in question will be a derived sequence of a random sequence, which is trivially c.p. (Theorem 11.1). It follows that a derived sequence of a c.p. sequence need not be c.p.

THEOREM 12.1. Let η(n) be a generic sequence for a random {-1, 1} process Y where P(y_n = 1) = p, P(y_n = -1) = q, p + q = 1 and p ≠ 1/2. If ξ(n) is defined by ξ(n) = η(n)η(n - 1), then ξ(n) is not continuously predictable.

A sequence ξ(n) of this kind might arise in the
following way. We suppose that an infinite sequence of coin-tosses has been made and that, instead of recording each result as "heads" or "tails", the experimenter records only whether each toss gives the same or a different result when compared with the preceding one. The available data is then in the form of a two-valued sequence which is equivalent to the ξ(n) of the theorem.

PROOF. We shall first prove that the process Y = X(η) is an L-extension (Definition 2.6) of X(ξ), i.e., that A^-(η) ⊂ L(A^-(ξ)). By Theorem 4.3 we have that the variables x_n of X(ξ) are related to the variables of Y by x_n = y_n y_{n-1}. Consider now the elements z_n of A^-(η) defined by z_n = y_n y_0.

Suppose now that y_0, regarded as an element of L(A^-(ξ)), were continuous at ξ with value a. Let ε > 0 and U_ε be a neighborhood of ξ in Ω^-(ξ) for which
(12.1)        |E(f y_0) - aE(f)| < εE(f)

for all non-negative f ∈ A^-(ξ) with support in U_ε. Evidently the same
inequality will hold for all non-negative f in L(A^-(ξ)), e.g., in A^-(η), with support in U_ε. Now the inclusion A^-(ξ) ⊂ A^-(η) induces the canonical map from Ω^-(η) to Ω^-(ξ), which maps η onto ξ and also maps the point η' defined by η'(n) = -η(n) onto ξ. Since the canonical map is continuous, it follows that there is a neighborhood U'_ε of η in Ω^-(η) which is mapped into U_ε. Similarly there is a neighborhood U''_ε of η' mapped into U_ε, and hence any function of A^-(η) with support in U'_ε or in U''_ε has, as a function in L(A^-(ξ)), support in U_ε.

Now let f_n ∈ A^-(η) converge to δ_η over A^-(η), with supports shrinking to η. In particular the supports will eventually be interior to U'_ε, and for these f_n

|E(f_n y_0) - aE(f_n)| < εE(f_n) .

Letting n → ∞ we find |y_0(η) - a| ≤ ε, or |η(0) - a| ≤ ε. On the other hand we may also find a sequence f_n → δ_{η'} over A^-(η), with the supports of the f_n eventually interior to U''_ε. This gives |η'(0) - a| ≤ ε, and since η'(0) = -η(0) this will not be possible for ε sufficiently small. Hence y_0 is not continuous at ξ, and hence ξ(n) is not continuously predictable.

12.2. Application of the Prediction Procedure.
It is interesting that if we follow the procedure described in §10.6, we do obtain a definite prediction, whose interpretation will be given in Chapter 6. For example, to find the probability of x_1 = 1 given ξ, we take as an approximation the conditional probability
(12.2)        P((x_{-n}, ..., x_0, x_1) = (ξ(-n), ..., ξ(0), 1)) / P((x_{-n}, ..., x_0) = (ξ(-n), ..., ξ(0)))

and let n → ∞. These probabilities are to be computed from the frequencies of the corresponding events in the sequence ξ, which may be determined from the probability relations of the process X(ξ). Now (ξ(-n), ..., ξ(0)) arises from either of two sequences: (η(-n - 1), ..., η(0)) and (-η(-n - 1), ..., -η(0)), and the probability of the ξ-sequence is the sum of the probabilities of occurrence of the two η-sequences. Let ν(n) and μ(n) respectively represent the number of 1's and of -1's in (η(-n - 1), ..., η(0)). Then
P((x_{-n}, ..., x_0) = (ξ(-n), ..., ξ(0)))
        = P((y_{-n-1}, ..., y_0) = (η(-n - 1), ..., η(0))) + P((y_{-n-1}, ..., y_0) = (-η(-n - 1), ..., -η(0)))
        = p^{ν(n)} q^{μ(n)} + p^{μ(n)} q^{ν(n)} .
Now suppose that η(0) = 1. Then ξ(1) = 1 if and only if η(1) = 1, so that (ξ(-n), ..., ξ(0), 1) arises from the sequences (η(-n - 1), ..., η(0), 1) and (-η(-n - 1), ..., -η(0), -1), and therefore the numerator of (12.2) is p^{ν(n)+1} q^{μ(n)} + p^{μ(n)} q^{ν(n)+1}. Now by assumption η is generic for Y, so that ν(n)/n → p, μ(n)/n → q. Assume that p > q. The fraction in (12.2) becomes

(12.3)        [p^{ν(n)+1} q^{μ(n)} + p^{μ(n)} q^{ν(n)+1}] / [p^{ν(n)} q^{μ(n)} + p^{μ(n)} q^{ν(n)}] = [p^{ν(n)-μ(n)+1} + q^{ν(n)-μ(n)+1}] / [p^{ν(n)-μ(n)} + q^{ν(n)-μ(n)}] ,

and since p > q, ν(n) - μ(n) → ∞, so that (12.3) tends to p; similarly, if η(0) = -1 the conditional probability tends to q. In any case the conditional probabilities converge to a definite limit, and by this procedure a prediction measure may be obtained.

We observe that the same probabilities might be obtained in a different way. From the sequence ξ(n) we can construct the two possible sequences that could have given rise to it in the Y process. Notice that one of these sequences, η, is generic for Y, and since p ≠ q the remaining sequence η', in which 1 and -1 are interchanged, cannot be generic. Now suppose that in forming the prediction we take for granted that the generic sequence η is the one that "actually occurred." Suppose first that the last reading of this η is η(0) = 1. Then ξ(1) will be 1 if and only if η(1) is 1. Now this occurs with probability p, the η being a random sequence, and thus, supposing that η(n) is the sequence that did occur, we may take p as the probability that ξ(1) = 1. If η(0) = -1, then ξ(1) = 1 is equivalent to η(1) = -1, so that in this case the probability is q that ξ(1) = 1. Our previous results thus coincide with what we would obtain if we knew that η is the sequence that occurred (and not η'). This general idea will be the basis of statistical predictability.
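The computation above is easy to check by simulation. The sketch below is ours and the parameters are illustrative; it generates a long stretch of the random sequence η, forms ξ(n) = η(n)η(n - 1), and verifies that the frequency of ξ(k + 1) = 1 is near p after times k with η(k) = 1 and near q after times k with η(k) = -1, the limits obtained above.

```python
import random

random.seed(0)
p, n = 0.7, 200_000                       # illustrative value of p; q = 1 - p
eta = [1 if random.random() < p else -1 for _ in range(n)]
xi = [None] + [eta[k] * eta[k - 1] for k in range(1, n)]   # xi(k) = eta(k)*eta(k-1)

# Frequency of xi(k+1) = 1 among times k with eta(k) = 1, resp. eta(k) = -1.
hits = {1: [0, 0], -1: [0, 0]}            # eta(k) -> [count, count with xi(k+1) = 1]
for k in range(1, n - 1):
    h = hits[eta[k]]
    h[0] += 1
    h[1] += (xi[k + 1] == 1)

print("after eta(k) = +1:", hits[1][1] / hits[1][0], "(near p = 0.7)")
print("after eta(k) = -1:", hits[-1][1] / hits[-1][0], "(near q = 0.3)")
```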
12.3. Another Form of the Sequence. The example of this section shows that a derived sequence of a random sequence need not be continuously predictable. It is obvious however that if the derived sequence had the form

(12.4)        ξ(n) = ψ(η(n)) ,

where η(n) is random, then ξ(n) must be c.p. For if η(n) is random, so is ξ(n) in (12.4), and then Theorem 11.1 applies. The situation is quite different for Markoff sequences, for if η(n) is a Markoff sequence and ξ(n) has the form (12.4), then ξ(n) need not be a Markoff sequence. (In terms of processes, if the y_n form a Markoff process and x_n = ψ(y_n), the x_n need not form a Markoff process.) Accordingly one may inquire whether or not all derived sequences of a discrete Markoff sequence of the form (12.4) are c.p. The answer is that they are not, and this is also shown by the example of this section. Namely, form the composite sequence

        ζ(n) = (η(n), η(n - 1)) ;
ζ(n) is generic for a process Z with variables z_n = (y_n, y_{n-1}). Using the independence of the y_n it is easily shown that the z_n form a Markoff process. We note for later reference that the transition matrix of this process is

(12.5)
                 (-1,-1)   (-1, 1)   ( 1,-1)   ( 1, 1)
     (-1,-1)        q         0         p         0
     (-1, 1)        q         0         p         0
     ( 1,-1)        0         q         0         p
     ( 1, 1)        0         q         0         p

where the row gives the present state (y_n, y_{n-1}) and the column the following state (y_{n+1}, y_n).
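The matrix (12.5) can be reproduced mechanically from the independence of the y_n; the following sketch is ours (the value of p is arbitrary) and simply rebuilds the rows in the order of the display above.

```python
# Transition matrix of z_n = (y_n, y_{n-1}) for independent +/-1 variables
# with P(y = 1) = p; states ordered as in display (12.5).
p = 0.7                                   # illustrative value; q = 1 - p
q = 1 - p
states = [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def transition(z, w):
    """P(z_{n+1} = w | z_n = z); w must have the form (y_{n+1}, y_n)."""
    (yn, _), (yn1, yn_again) = z, w
    if yn_again != yn:                    # second coordinate of the new state
        return 0.0                        # must repeat the old first coordinate
    return p if yn1 == 1 else q

for z in states:
    print(z, [transition(z, w) for w in states])   # reproduces the rows of (12.5)
```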
Now define ψ by ψ(1, 1) = ψ(-1, -1) = 1, ψ(1, -1) = ψ(-1, 1) = -1. Then we have ξ(n) = ψ(ζ(n)) = η(n)η(n - 1). This shows that a derived sequence of a Markoff sequence of the form (12.4) need not be continuously predictable.

§13. A Class of Deterministic Sequences

13.1. 𝒟-Sequences. It was pointed out at the beginning of the chapter that not many regular sequences are capable of explicit construction. In fact the only regular sequences known whose general term can be explicitly displayed are (but for minor modifications) special instances of the following:

DEFINITION 13.1. We denote by 𝒟 the class of left-infinite sequences that are uniform limits of sequences whose general term is given
by

(13.1)        ξ(n) = Σ_{ν=1}^{k} c_ν e(p_ν(n)) ,

where e(t) = e^{2πit} and p_ν(t) is a polynomial with real coefficients. A left-infinite sequence belonging to 𝒟 will be referred to as a 𝒟-sequence.

The set 𝒟 is easily seen to form an algebra closed under the shift T, and containing the almost periodic sequences as a sub-algebra. Not every 𝒟-sequence is almost periodic since, for example, ξ(n) = cos n²α, α/π irrational, is a 𝒟-sequence but is not almost periodic. This can be seen by noting that the average of ξ(n)e^{2πinλ} for ξ(n) = cos n²α vanishes for all real λ (a fact to be proven later), which for almost periodic sequences would imply that ξ(n) ≡ 0. With regard to prediction, however, all 𝒟-sequences behave as almost periodic sequences, i.e., they are regular and deterministic. This is particularly instructive in view of the fact that a sequence such as cos n²α, for α/π irrational, behaves, on the one hand, like a random sequence from the point of view of linear prediction (since such a sequence has no periodic components), whereas the non-linear theory finds these sequences deterministic.

In investigating 𝒟-sequences we shall use the following:

LEMMA 13.1. Let ξ(n) be a 𝒟-sequence and let the ξ^(k) = T^{n_k}ξ be a set of translates of ξ(n). Then any pointwise limit of the ξ^(k) is again a 𝒟-sequence.
PROOF. The statement of the lemma is obvious if ξ(n) has the form (13.1), since a pointwise limit of translates of such a sequence again has the same form. Now suppose the sequences ξ_m(n) have the form (13.1) and that ξ_m(n) → ξ(n) uniformly. Also let ω(n) be a pointwise limit of the translates ξ^(k)(n). We wish to show that ω ∈ 𝒟. Now for some subsequence of the sequence {n_k} the translates T^{n_k}ξ_m will converge pointwise for all m, so let us suppose that this subsequence has been renumbered as {n_k}. Also let

lim_{k→∞} T^{n_k} ξ_m = ω_m .

We now have for each n

|ω_m(n) - ω(n)| = lim_{k→∞} |T^{n_k} ξ_m(n) - T^{n_k} ξ(n)| ≤ ‖ξ_m - ξ‖ ,

so that ‖ω_m - ω‖ ≤ ‖ξ_m - ξ‖. Since the ξ_m converge to ξ uniformly, ω_m → ω uniformly. But since the ω_m are 𝒟-sequences by our first observation, it follows that ω is also a 𝒟-sequence.
LEMMA 13.2. If p(t) is a real polynomial with at least one irrational coefficient, then the average of e(p(n)) exists and equals 0.

This lemma is a result of H. Weyl ([16]), to which we shall give an alternative proof in Chapter 7. The lemma is equivalent to the statement that the sequence e(p(n)) is equidistributed on the circle |z| = 1 if p(t) has an irrational coefficient, or to the statement that in every case e(p(n)) is either periodic or equidistributed. It is by this lemma that one deduces, as above, that all the Fourier coefficients of cos n²α vanish if α/π is irrational.

LEMMA 13.3. If G is a compact abelian group with identity e and if g ∈ G is an arbitrary element, then some subsequence of {g^n, n ≥ 1} converges to e.

PROOF. Let the closure of {g^n, n ≥ 1} in G be denoted G_1. G_1 is closed under multiplication by g, so that {g^n G_1} forms a monotonically decreasing family of closed subsets of G. Since G is compact,

G_2 = ∩_{n≥1} g^n G_1

is non-empty. Since g^{-1}(g^n G_1) = g^{n-1} G_1, it follows that G_2 is closed under multiplication by g^{-1}. Hence G_2 is closed under multiplication by g^{-n}, and therefore also by G_1^{-1}. Hence it is closed under multiplication by G_2^{-1}, so that G_2 is a group. Hence e ∈ G_2 ⊂ G_1.
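Lemma 13.2 lends itself to a quick numerical illustration. The following sketch is ours; the polynomial is an arbitrary choice with irrational leading coefficient, and the second average is one Fourier average of cos n²α of the kind discussed after Definition 13.1.

```python
import cmath, math

alpha = math.sqrt(2) * math.pi            # so that alpha/pi is irrational
e = lambda t: cmath.exp(2j * math.pi * t)
p = lambda n: (alpha / (2 * math.pi)) * n * n    # p(n) has irrational leading coefficient

def average(f, N):
    return sum(f(n) for n in range(1, N + 1)) / N

for N in (10**3, 10**4, 10**5):
    a1 = abs(average(lambda n: e(p(n)), N))                               # Lemma 13.2
    a2 = abs(average(lambda n: math.cos(n * n * alpha) * e(0.3 * n), N))  # Fourier average of cos(n^2 alpha)
    print(N, round(a1, 4), round(a2, 4))          # both drift toward 0
```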
The principal result of this section is

THEOREM 13.1. Every 𝒟-sequence is regular and deterministic.

PROOF. We first observe that the assertion that 𝒟-sequences are deterministic will follow from the fact that they are regular. Note first that if all 𝒟-sequences are regular, then it follows that a 𝒟-sequence that vanishes for all n < -k must vanish for all n. Hence if two 𝒟-sequences agree for all n < -k, their difference being a 𝒟-sequence, it must
vanish identically, so that the two sequences must be identical. Suppose now that ξ ∈ 𝒟 and that ξ is not deterministic. There then exist two sequences ξ̄_1 and ξ̄_2 in Ω(ξ) with P(ξ̄_1) = P(ξ̄_2) = ξ. If ξ̄_1 ≠ ξ̄_2 there will be a k, k > 0, for which ξ̄_1(k) ≠ ξ̄_2(k). This means that the sequences P(T^{-k} ξ̄_1) and P(T^{-k} ξ̄_2) are distinct. Denote these as ξ'_1 and ξ'_2. Since T^{-k} ξ̄_1 and T^{-k} ξ̄_2 are in Ω(ξ), it follows that ξ'_1 and ξ'_2 belong to Ω^-(ξ) and hence are (pointwise) limits of translates of ξ(n). By Lemma 13.1, ξ'_1 and ξ'_2 are 𝒟-sequences. On the other hand T^k ξ'_i = T^k P(T^{-k} ξ̄_i) = P(T^k T^{-k} ξ̄_i) = P(ξ̄_i) = ξ, so that ξ'_1(n) = ξ'_2(n) for n < -k. But then, as 𝒟-sequences agreeing for all n < -k, ξ'_1 and ξ'_2 would have to be identical throughout, contrary to assumption. Hence ξ(n) must be deterministic.
We turn now to the proof that the sequences of 𝒟 are regular. By Theorem 5.2 it suffices to prove that a dense subset of 𝒟 is regular, and hence we may restrict our attention to sequences of the form (13.1). Suppose then that

ξ(n) = Σ_{ν=1}^{k} c_ν e(p_ν(n)) .

Let G denote the k-dimensional torus group; that is, the elements of G are k-tuples (z_1, ..., z_k) with |z_j| = 1, j = 1, 2, ..., k. Using the exponentials defining ξ(n) we may define a G-sequence ζ(n) by setting

(13.2)        ζ(n) = (e(p_1(n)), e(p_2(n)), ..., e(p_k(n))) .

ξ(n) is then a derived sequence of ζ(n), and to show that ξ(n) is a regular sequence it suffices to show that ζ(n) is regular. Now it is easily shown that ζ(n) is a stochastic sequence. Namely, for this we must show that the averages of the sequences

ψ_1(ζ(n)) ψ_2(ζ(n - 1)) ⋯ ψ_r(ζ(n - r + 1))

exist for continuous functions ψ_j on G. Now the continuous functions on G can be approximated by linear combinations of the group characters on G, and these have the form ψ_{q_1 ⋯ q_k}(z_1, z_2, ..., z_k) = z_1^{q_1} z_2^{q_2} ⋯ z_k^{q_k}.
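For characters of this form the required averages reduce to averages of e(q_1 p_1(n) + ⋯ + q_k p_k(n)), which exist by Lemma 13.2 (or trivially, in the periodic case). A brief numerical sketch of ours, with two arbitrary polynomials standing in for the p_ν:

```python
import cmath, math

e = lambda t: cmath.exp(2j * math.pi * t)
p1 = lambda n: math.sqrt(2) * n * n        # illustrative polynomials: one with an
p2 = lambda n: 0.5 * n                     # irrational, one with a rational coefficient

def character_average(q1, q2, N):
    """Average over n <= N of the character z1^q1 z2^q2 along zeta(n) of (13.2)."""
    zeta = lambda n: (e(p1(n)), e(p2(n)))
    return sum(zeta(n)[0] ** q1 * zeta(n)[1] ** q2 for n in range(1, N + 1)) / N

for N in (10**3, 10**4, 10**5):
    print(N, abs(character_average(1, 2, N)))   # tends to 0: q1*p1 + q2*p2 has an irrational coefficient
```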
PROOF. We first show that r_V(x, x) = 1 for all x ∈ V. Since K_V(x, x) ≥ 1 we have r_V(x, x) ≥ 1. If K_V(x, x) > 1 then -x ∈ V; hence λx ∈ V for all real λ, so that by (*) x = 0, which is excluded, for r_V is only defined for x, y ≠ 0. We next prove the triangle inequality

(15.2)        r_V(x, z) ≥ r_V(x, y) r_V(y, z) .

For this it suffices to prove the analogous inequality for K_V. Now by (15.1), x - (K_V(x, y) - ε)y ∈ V, y - (K_V(y, z) - ε)z ∈ V. Multiplying the second by K_V(x, y) - ε and adding to the first, we find that x - [K_V(x, y) - ε][K_V(y, z) - ε]z ∈ V, so that K_V(x, z) ≥ [K_V(x, y) - ε][K_V(y, z) - ε] for every ε > 0, and from this the result follows.

To prove (i) we observe that r_V(x, y) = r_V(y, x); since r_V(x, x) = 1, then by (15.2), 1 ≥ r_V(x, y)². (ii) follows from the fact that K_V(λx, μy) = (λ/μ) K_V(x, y).

The "if" portion of (iii) has already been shown, so suppose that we are given r_V(x, y) = 1. Write K_V(x, y) = λ; then K_V(x, λy) = 1, so we may suppose to begin with that K_V(x, y) = 1 and therefore K_V(y, x) = 1. Then by (15.1)
x - (1 - ε)y ∈ V ,    y - (1 - ε)x ∈ V

for arbitrary ε > 0. Thus, multiplying by 1/ε,

y + (1/ε)(x - y) ∈ V ,    x - (1/ε)(x - y) ∈ V

for arbitrary ε > 0. Now 1/ε_1 - 1/ε_2 can take on any real value, so that adding the preceding elements of V (with ε = ε_1 in the first and ε = ε_2 in the second) we find x + y + λ(x - y) ∈ V for arbitrary λ, and by (*) x = y. Thus in general x = λy.

Finally, (iv) is immediate since if x - λy ∈ V, then Lx - λLy ∈ W, so that by (15.1) K_W(Lx, Ly) ≥ K_V(x, y).
Lemma 15.1 shows that if we identify points on the same ray of V, we obtain a topological space with metric given by -log r_V(x, y). (iv) shows moreover that the linear transformations of such a space will always be contractions in this metric.

As an example take D = R^m, m-dimensional Euclidean space, and V the set of m-tuples with no negative components. V clearly satisfies (*). Now suppose that x, y ∈ V, x = (x_1, x_2, ..., x_m), y = (y_1, y_2, ..., y_m). If y_i = 0 and x_i ≠ 0 for some i, then clearly y - λx will always have a negative component for λ > 0, so that K_V(y, x) = 0. It follows that unless the two vectors have the same vanishing components, r_V(x, y) = 0. Otherwise we have K_V(x, y) = inf_{x_i ≠ 0} y_i/x_i, so that

(15.3)        r_V(x, y) = inf_{x_i y_j ≠ 0} (x_j y_i)/(x_i y_j) .
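On the cone of vectors with positive components, (15.3) is immediate to compute. The sketch below is ours; it evaluates r_V and the projective metric -log r_V of the preceding remark, and checks properties (ii) and (15.2) on arbitrary positive vectors.

```python
import math, random

def r_V(x, y):
    """r_V(x, y) of (15.3) for vectors with positive components."""
    K = lambda u, v: min(v[i] / u[i] for i in range(len(u)))   # K_V(u, v) = inf v_i/u_i
    return K(x, y) * K(y, x)

def dist(x, y):
    """Projective metric -log r_V(x, y) on the rays of the positive cone."""
    return -math.log(r_V(x, y))

random.seed(1)
x, y, z = ([random.uniform(0.1, 1.0) for _ in range(4)] for _ in range(3))
assert abs(dist(x, [3 * t for t in y]) - dist(x, y)) < 1e-12    # property (ii)
assert r_V(x, z) >= r_V(x, y) * r_V(y, z) - 1e-12               # inequality (15.2)
print(dist(x, y))
```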
15.2. Projectively Bounded Transformations.

DEFINITION 15.2. A linear transformation L of a cone V ⊂ D into a cone W ⊂ D' is projectively bounded if, for all x, y ∈ V with Lx, Ly ≠ 0,

(15.4)        r_W(Lx, Ly) ≥ δ

for a fixed δ > 0. The largest δ for which (15.4) holds is the projective bound of L, δ = r(L).

In the finite-dimensional case, a transformation L : R^m → R^n taking the cone of non-negative vectors of R^m into the corresponding cone of R^n may be represented by a matrix with non-negative entries (L_{ij}). The condition that L is projectively bounded may be stated as follows: whenever L_{i_0 j_0} = 0, then either all L_{i j_0} = 0 or all L_{i_0 j} = 0. To see this, suppose first that this condition is fulfilled. Then for all i, j, r, s, either both or neither of L_{ir} L_{js} and L_{jr} L_{is} vanish. Therefore there will be a δ > 0 such that L_{ir} L_{js} ≥ δ L_{jr} L_{is} for all i, j, r, s. From this we obtain
(15.5)        Σ_{r,s} L_{ir} L_{js} x_r y_s ≥ δ Σ_{r,s} L_{jr} L_{is} x_r y_s ,
or, setting u = Lx, v = Ly, u_i v_j ≥ δ u_j v_i. It follows now from (15.3) that (15.4) holds, i.e., that L is projectively bounded. Conversely, if L is projectively bounded, (15.5) holds for every choice of non-negative {x_r}_1^m and {y_s}_1^m, so that in particular L_{ir} L_{js} ≥ δ L_{jr} L_{is}, and from this we obtain the above condition. Another form of this condition is that all the entries of (L_{ij}) should vanish but for those for which i ∈ I ⊂ {1, ..., m} and j ∈ J ⊂ {1, ..., n}, and for these L_{ij} > 0. In particular a matrix with all positive entries is projectively bounded.

The following lemma will be of importance to us.

LEMMA 15.2. If (15.4) holds for all x, y ∈ V, then

(15.6)        r_W(Lx, Ly) ≥ r_V(x, y) + [δ / (1 + δ r_V(x, y))] (1 - r_V(x, y)²) .
PROOF. Let K_ε(x, y) = K_V(x, y) - ε and r_ε(x, y) = K_ε(x, y) K_ε(y, x). Setting x_ε = x - K_ε(x, y)y, y_ε = y - K_ε(y, x)x, it follows that x_ε, y_ε ∈ V. By the definition of K_W and by (15.4), if δ' < δ there will be numbers λ, μ > 0 with λμ ≥ δ' for which

Lx_ε - λLy_ε ∈ W ,    Ly_ε - μLx_ε ∈ W .

These two may be rewritten:

Lx - [(λ + K_ε(x, y)) / (1 + λK_ε(y, x))] Ly ∈ W ,

Ly - [(μ + K_ε(y, x)) / (1 + μK_ε(x, y))] Lx ∈ W .

Hence

K_W(Lx, Ly) ≥ (λ + K_ε(x, y)) / (1 + λK_ε(y, x)) ,    K_W(Ly, Lx) ≥ (μ + K_ε(y, x)) / (1 + μK_ε(x, y)) ,

and

(15.7)        r_W(Lx, Ly) ≥ [λμ + r_ε(x, y) + λK_ε(y, x) + μK_ε(x, y)] / [1 + λμ r_ε(x, y) + λK_ε(y, x) + μK_ε(x, y)] .

Now for any positive numbers α, β, α', β', if α/β ≥ α'/β', then

α/β ≥ (α + α')/(β + β') ≥ α'/β' .

Since

r_W(Lx, Ly) ≤ 1 = [λK_ε(y, x) + μK_ε(x, y)] / [λK_ε(y, x) + μK_ε(x, y)] ,

it follows from (15.7) that

r_W(Lx, Ly) ≥ [λμ + r_ε(x, y)] / [1 + λμ r_ε(x, y)] .

Now letting ε → 0 and δ' → δ we find

(15.8)        r_W(Lx, Ly) ≥ [δ + r_V(x, y)] / [1 + δ r_V(x, y)] ,
and from this the assertion of the lemma follows readily. We notice that the expression on the right-hand side of (15.8) is simultaneously ≥ δ and ≥ r_V(x, y), thus yielding two inequalities that we already knew. Thus whenever (15.4) holds for a transformation L, a stronger inequality may be inferred.
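Lemma 15.2 says that applying a projectively bounded L pushes r toward 1, that is, contracts the projective metric. The following sketch is ours and purely illustrative: for a matrix with positive entries it computes the projective bound r(L) as the smallest cross-ratio L_{ir}L_{js}/(L_{jr}L_{is}), in line with the discussion around (15.5), and then checks the inequality (15.8) on random positive vectors.

```python
import random

def r_V(x, y):
    """(15.3) on the cone of positive vectors."""
    K = lambda u, v: min(v[i] / u[i] for i in range(len(u)))
    return K(x, y) * K(y, x)

def projective_bound(L):
    """r(L) for a matrix with positive entries: the smallest cross-ratio."""
    n, m = len(L), len(L[0])
    return min(L[i][r] * L[j][s] / (L[j][r] * L[i][s])
               for i in range(n) for j in range(n)
               for r in range(m) for s in range(m))

random.seed(2)
L = [[random.uniform(0.5, 2.0) for _ in range(3)] for _ in range(3)]
delta = projective_bound(L)
apply_L = lambda M, x: [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

for _ in range(5):
    x = [random.uniform(0.1, 1.0) for _ in range(3)]
    y = [random.uniform(0.1, 1.0) for _ in range(3)]
    lhs = r_V(apply_L(L, x), apply_L(L, y))
    rhs = (delta + r_V(x, y)) / (1 + delta * r_V(x, y))
    assert lhs >= rhs - 1e-12             # inequality (15.8)
print("projective bound r(L) =", delta)
```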
§16. A Sufficient Condition for Continuous Predictability

16.1. Automorphisms and Their Adjoints. The process Y in this section, and for most of the next, will be an ergodic Markoff process with finite state space. Also we take X to be a subprocess of Y defined by x_n = ψ(y_n), and we shall find conditions for A_X^- to be continuously predictable to A_X at a point ξ ∈ Ω^-(ξ). Since Y is c.p., these conditions will imply that ξ itself is a continuously predictable sequence.

Let M be the (finite) state space of Y and A that of X, so that ψ : M → A. We shall assume that every element of M actually occurs with non-zero probability. The transition matrix of Y will be written (p(b_i, b_j)), where b_i and b_j range over M. Ω^-(η) is imbedded in M^-, and in terms of the transition matrix of Y it is possible to determine just which sequences occur in Ω^-(η). Namely, ω ∈ Ω^-(η) if and only if p(ω(n), ω(n + 1)) > 0 for all n < 0. For if ω ∈ Ω^-(η), then the open set of Ω^-(η): {ω' : y_n(ω') = ω(n), y_{n+1}(ω') = ω(n + 1)} is non-empty, and if p(ω(n), ω(n + 1)) = 0 this set would have measure 0. Conversely, if all transitions in ω have positive probability, then the subset of Ω^-(η) given by {ω' : y_n(ω') = ω(n), y_{n+1}(ω') = ω(n + 1), ..., y_0(ω') = ω(0)} has positive probability and hence is non-empty. Therefore the intersection of these sets, i.e., the point ω, belongs to Ω^-(η).

We shall be dealing with both finite and infinite M-sequences of the form σ = (σ(q), σ(q + 1), ..., σ(0)). Here and in the future q will represent an integer < 0. In either case we call σ a possible sequence if p(σ(n), σ(n + 1)) > 0 for q ≤ n < 0.

But then Π_n(σ(n), σ(n + 1)) = p(σ(n), σ(n + 1)) > 0 if σ is possible, and hence the (b, b') entry of Π_q(ξ) Π_{q+1}(ξ) ⋯ Π_{-1}(ξ) is at least p(σ(q), σ(q + 1)) ⋯ p(σ(-1), σ(0)) > 0. Thus the lemma reduces to showing that for -q large, and b ∈ M_q(ξ), b' ∈ M_0(ξ), there is a possible sequence lying over ξ joining b to b'. Clearly it suffices to show, for a particular b', that eventually each element of M_q(ξ) is joined to it by a possible sequence lying over ξ, since if we go far enough this would be satisfied by all b' ∈ M_0(ξ) simultaneously. So we choose b' ∈ M_0(ξ) and let Λ_q be the set of b ∈ M_q(ξ) which cannot be joined to b' by a possible sequence lying over ξ. Also let Σ_q be the closed set of all η ∈ Ω^-(η) lying over ξ such that η(q) ∈ Λ_q. By the definition of M_q(ξ) it follows that if Λ_q is non-empty, so is Σ_q. We observe next that Σ_{q-1} ⊂ Σ_q. For suppose η ∈ Σ_{q-1}, so that η(q - 1) cannot be joined by a possible sequence over ξ to b'. Then η(q), at the next stage, cannot be joined to b' by a possible sequence over ξ, since if it could then η(q - 1) could also have been joined to b'. If we now suppose that the Λ_q are all non-empty, then the Σ_q would form a monotonically decreasing sequence of non-empty closed sets, so that there would be an η_0 ∈ Σ_q for all q. This will lead us to a contradiction. For let η' be the sequence of ψ^{-1}(ξ) terminating in b'. By the preceding lemma there is an η ∈ Ω^-(η) with η(0) = b' and η(n) = η_0(n) for n ≤ -N, for some N. But then η(n) ∈ Λ_n for n ≤ -N, and then passage from η(n) to b' over ξ would be impossible. This contradiction shows that the Λ_q are eventually empty, and this proves the lemma.
LEMMA 17.4. Let ξ' be another point of A^- and consider the corresponding sets M_q(ξ'), q < 0, and matrices Π_q(ξ'). Suppose that for a fixed q < 0 and k > 0, ξ' satisfies

(i) ξ'(n - k) = ξ(n) for q ≤ n ≤ 0,

(ii) M_{q-k}(ξ') ⊂ M_q(ξ).

Then

r(Π_{q-k}(ξ') Π_{q-k+1}(ξ') ⋯ Π_{-1-k}(ξ')) ≥ r(Π_q(ξ) Π_{q+1}(ξ) ⋯ Π_{-1}(ξ)) .

PROOF.
We observe first of all that (ii) implies

(17.2)        M_{n-k}(ξ') ⊂ M_n(ξ)

for all n with q ≤ n ≤ 0.

We can therefore write ψ^{-1}(ξ(q)) - M_q(ξ) = Λ_1 ∪ Λ_2, where Λ_1 consists of those elements of ψ^{-1}(ξ(q)) that cannot be continued to the right as a possible sequence lying over ξ(q), ξ(q + 1), ..., ξ(0), and Λ_2 consists of those elements not in Λ_1 which cannot be continued to the left as a possible sequence over ..., ξ(q - 2), ξ(q - 1), ξ(q). Now there is a bound to how far to the left an element of Λ_2 may be continued (as a possible sequence), for otherwise, as a simple compactness argument shows, there would be elements in Λ_2 that could be continued infinitely far to the left as a possible sequence, and this contradicts the definition of Λ_2. This shows that there is a sequence ξ(r), ξ(r + 1), ..., ξ(q), ..., ξ(0) such that any sequence η taking values in ψ^{-1}(ξ(j)) for r ≤ j ≤ 0 must satisfy η(q) ∈ M_q(ξ). For if not, η(q) belongs either to Λ_1 or to Λ_2. The former is excluded since η(q) is continuable to the right, and the latter will be impossible if r is sufficiently negative, since η(q) can be continued as far as η(r).

If ξ is a limit point of translates of ξ', then any neighborhood of ξ contains infinitely many translates T^n ξ'. By the definition of the topology in A^- this implies that the last |r| + 1 terms of infinitely many of the T^n ξ' agree with those of ξ; in other words, (ξ(r), ξ(r + 1), ..., ξ(0)) occurs infinitely often in ξ'. Let K denote the set of k for which (ξ'(r - k), ξ'(r - k + 1), ..., ξ'(-k)) = (ξ(r), ξ(r + 1), ..., ξ(0)). What we have just shown amounts to the statement that M_{q-k}(ξ') ⊂ M_q(ξ) for k ∈ K. We see then that for k ∈ K, and for q as before, the hypotheses of Lemma 17.4 are fulfilled. Hence
(17.3)        r(Π_{q-k}(ξ') Π_{q-k+1}(ξ') ⋯ Π_{-1-k}(ξ')) ≥ r(Π_q(ξ) Π_{q+1}(ξ) ⋯ Π_{-1}(ξ))

for infinitely many k. Now the right-hand side of (17.3) is positive, and equals some δ > 0. We thus find that each product Π_j(ξ') Π_{j+1}(ξ') ⋯ Π_{-1}(ξ') may be written as a product

M_j^{(1)} N_j^{(1)} M_j^{(2)} N_j^{(2)} ⋯ M_j^{(s_j)} N_j^{(s_j)} M_j^{(s_j+1)}

with r(N_j^{(i)}) ≥ δ and s_j → ∞ as j → -∞. From r(N_j^{(i)}) ≥ δ we deduce that also r(M_j^{(i)} N_j^{(i)}) ≥ δ, so that Π_j(ξ') Π_{j+1}(ξ') ⋯ Π_{-1}(ξ') has the form

R_j^{(1)} R_j^{(2)} ⋯ R_j^{(s_j)} R_j'
with r(R_j^{(i)}) ≥ δ. The corollary will therefore be proven if we establish that there exists a sequence ε_n(δ) → 0 such that whenever r(R^{(1)}), r(R^{(2)}), ..., r(R^{(n)}) ≥ δ, then r(R^{(1)} R^{(2)} ⋯ R^{(n)}) ≥ 1 - ε_n(δ). The latter assertion is an immediate consequence of Lemma 15.2. Namely, if r(R^{(2)} ⋯ R^{(n)}) ≥ 1 - ε_{n-1}(δ), then by (15.8) we also have

r(R^{(1)} R^{(2)} ⋯ R^{(n)}) ≥ [δ + 1 - ε_{n-1}(δ)] / [1 + δ(1 - ε_{n-1}(δ))] ,

so that the ε_n(δ) may be defined recursively, starting from ε_1(δ) = 1 - δ, and one verifies that ε_n(δ) → 0.
The preceding theorem shows that X is always continuously predictable. To show that this is so we must verify that Y is normal over X. Since Ω^-(η) = M^-, it follows that for arbitrary sequences σ_1 and σ_2 of the same length, having the same images under ψ and also beginning with the same element, T_{σ_1 σ_2} (§16) is a homeomorphism in G(Y|X). Now let η_1 and η_2 be any infinite M-sequences lying over the same point ξ
of A^-. Clearly for an appropriate σ_1, σ_2, T_{σ_1 σ_2} η_1 will come arbitrarily close to η_2. Hence a closed set containing η_1 that is invariant under G(Y|X) will also contain η_2. Hence Y is normal over X at each ξ, and so X is continuously predictable.

We note however that this case can never arise if Y is m-Markoff for m > 1, or if x_n depends upon more than one of the y_n. For in these cases the Markoff process is obtained by taking a vector process z_n = (y_n, y_{n-1}, ..., y_{n-m+1}), and the transition probability from (λ_0, λ_1, ..., λ_{m-1}) to (μ_0, μ_1, ..., μ_{m-1}) is always 0 unless λ_0 = μ_1, λ_1 = μ_2, ..., λ_{m-2} = μ_{m-1}. An example of this was given in §12.3.
CHAPTER 5.
STOCHASTIC SEMIGROUPS AND CONTINUOUS PREDICTABILITY
In the last chapter we dealt with the prediction problem for processes derived from certain elementary processes, ordinary and m-Markoff processes. In this chapter we shall develop a procedure for constructing a large variety of processes for which we shall be able to study the prediction problem. Our procedure will be of some interest in itself, and we shall show how it is also related to certain problems outside of prediction theory.

§18. Stochastic Semigroups

18.1. Definitions.

DEFINITION 18.1. A stochastic semigroup of order r is a semigroup G* having an identity e, together with a set of r elements a_1, a_2, ..., a_r generating G*, and a real-valued function F defined on G* satisfying

(i) F(e) = 1,

(ii) F(σ) ≥ 0 for each σ ∈ G* and F(a_i) > 0, i = 1, 2, ..., r,

(iii) Σ_{i=1}^r F(a_i σ) = Σ_{i=1}^r F(σ a_i) = F(σ) for each σ ∈ G*.
The connection between stationary processes and stochastic semigroups is easily seen. Suppose X is a process with values in a set A having r elements a_1, ..., a_r. We may form the (free) semigroup G* of all formal products of the a_i, with the empty product taken as the identity e. On G* define F for σ = σ(1)σ(2)⋯σ(n) by

(18.1)        F(σ) = P(x_1 = σ(1), x_2 = σ(2), ..., x_n = σ(n))

and take F(e) = 1. To show that this is a stochastic semigroup, only (iii) has to be verified. The equality

Σ_{i=1}^r F(σ a_i) = F(σ)

follows directly from the fact that P is a probability measure. The
remaining half follows from the stationarity of X:

Σ_{i=1}^r F(a_i σ) = Σ_{i=1}^r P(x_1 = a_i, x_2 = σ(1), ..., x_{n+1} = σ(n)) = P(x_2 = σ(1), ..., x_{n+1} = σ(n)) = P(x_1 = σ(1), ..., x_n = σ(n)) = F(σ) .
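Conditions (i)-(iii) are easily checked numerically for the word function (18.1) of a concrete process. The sketch below is ours; the two-state Markoff chain and its stationary distribution are illustrative, and only condition (iii) is tested, since (i) and (ii) are immediate.

```python
from itertools import product

# Illustrative two-state Markoff chain on A = {0, 1}; F(sigma) of (18.1) is explicit.
P = [[0.9, 0.1],
     [0.2, 0.8]]
pi = [2 / 3, 1 / 3]                       # stationary distribution: pi P = pi

def F(sigma):
    """F(sigma) = P(x_1 = sigma(1), ..., x_n = sigma(n)); F(e) = 1 for the empty word."""
    if not sigma:
        return 1.0
    prob = pi[sigma[0]]
    for a, b in zip(sigma, sigma[1:]):
        prob *= P[a][b]
    return prob

# Check condition (iii) on all words of length <= 3.
for n in range(4):
    for sigma in product((0, 1), repeat=n):
        left = sum(F((a,) + sigma) for a in (0, 1))
        right = sum(F(sigma + (a,)) for a in (0, 1))
        assert abs(left - F(sigma)) < 1e-12 and abs(right - F(sigma)) < 1e-12
print("condition (iii) holds for this chain")
```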
Conversely, suppose F is a given stochastic semigroup. We then define a process X with values in the set of generators (a_1, a_2, ..., a_r) by setting

P(x_1 = σ(1), x_2 = σ(2), ..., x_n = σ(n)) = F(σ(1)σ(2)⋯σ(n)) .

It is easily verified that this will be a process, the stationarity being a consequence of (iii). We shall not distinguish between isomorphic processes, and thus the particular nature of the range space of the x_n is irrelevant. It is however convenient for purposes of notation to identify the range of X here with the generators of G*.

DEFINITION 18.2. For an r-valued process X, 𝒢(X) will denote the associated stochastic semigroup of order r. For a stochastic semigroup