Die Grundlehren der mathematischen Wissenschaften in Einzeldarstellungen Band 150
M. Iosifescu • R. Theodorescu
Random Processes and Learning
Die Grundlehren der mathematischen Wissenschaften in Einzeldarstellungen mit besonderer Berücksichtigung der Anwendungsgebiete Band 150
Edited by J. L. Doob · E. Heinz · F. Hirzebruch · E. Hopf · H. Hopf · W. Maak · S. MacLane · W. Magnus · D. Mumford · M. M. Postnikov · F. K. Schmidt · D. S. Scott · K. Stein
Managing Editors: B. Eckmann and B. L. van der Waerden
Springer-Verlag New York Inc. 1969
Prof. Dr. Marius Iosifescu Academy of the Socialist Republic of Romania Centre of Mathematical Statistics, Bucharest
Prof. Dr. Radu Theodorescu Academy of the Socialist Republic of Romania Centre of Mathematical Statistics, Bucharest visiting professor at Laval University, Department of Mathematics, Quebec
Managing Editors:
Prof. Dr. B. Eckmann, Eidgenössische Technische Hochschule Zürich
Prof. Dr. B. L. van der Waerden, Mathematisches Institut der Universität Zürich
All rights reserved. No part of this book may be translated or reproduced in any form without written permission from Springer-Verlag. © by Springer-Verlag Berlin · Heidelberg 1969. Library of Congress Catalog Card Number 68-54828. Printed in Germany. Title No. 5133
Preface
The aim of the present monograph is two-fold: (a) to give a short account of the main results concerning the theory of random systems with complete connections, and (b) to describe the general learning model by means of random systems with complete connections.
The notion of chain with complete connections was introduced in probability theory by ONICESCU and MIHOC (1935a). These authors set out to define a very broad type of dependence which takes into account the whole history of the evolution and thus includes the Markovian one as a special case. In a series of papers from the period 1935-1937, ONICESCU and MIHOC developed the theory of these chains for the homogeneous case with a finite set of states from different points of view: ergodic behaviour, associated chain, limit laws. These results led to a chapter devoted to these chains, which ONICESCU and MIHOC included in their monograph published in 1937. Important contributions to the theory of chains with complete connections are due to DOEBLIN and FORTET and date from the period 1937-1940. They consist in the treatment of chains with an infinite history (the so-called chains of infinite order) and in the use of methods from functional analysis. After World War II, instrumental works in the development of the theory of chains with complete connections were those by IONESCU TULCEA and MARINESCU, who systematically used methods from functional analysis. Other authors also contributed to this theory; it suffices to look at the bibliographical references given at the end of this monograph. The theory of chains with complete connections, despite the numerous authors who contributed to it, nevertheless remains a Romanian creation¹⁾.
An important fact to be mentioned is the publication in 1955 of the monograph devoted to stochastic models for learning by BUSH and MOSTELLER. Although apparently unrelated to the theory of chains with complete connections, this monograph describes the most common stochastic models for learning met in practice. As was proved later, there is a close relationship between mathematical learning theory and the theory of chains with complete connections, in the sense that the model described by the subject-controlled events considered by BUSH and MOSTELLER (1955) reduces to the (Markov) chain associated with a linear chain with complete connections with a finite set of states. Obviously, the remaining models discussed by BUSH and MOSTELLER can also be reduced to schemes which somewhat generalize these chains. Among the works devoted to the application of the theory of chains with complete connections to mathematical learning theory (possibly the most important field of application of this theory), an important contribution is due to NORMAN (1966c, 1968a, c), whose results play an important part in the present monograph.
The monograph is divided into three chapters. The first two chapters are devoted to the general theory of random processes, especially of random systems with complete connections. In Chapter 3 this theory is applied to the general learning model. The book is intended for mathematicians with a good background in modern probability theory, as well as for applied scientists working in the field of learning.
The numbering used is the following: a.b.c.d.e, where a indicates the chapter, b the section, c the subsection, d the paragraph and e the subparagraph; definitions, theorems, lemmas and propositions are numbered a.b.c, where a indicates the chapter and b the section. We have made a sincere effort to provide every paragraph with appropriate bibliographical notes and comments; any inaccuracy or omission in assigning priorities is wholly unintentional and deeply regretted. Where necessary we have also inserted complements which supplement the theory developed. The bibliographical references given at the end of the book are exhaustive as far as Romanian contributions are concerned.
¹⁾ See, e.g., the monograph published by CIUCU and THEODORESCU (1960), which contains the results available up to the end of 1959; see also the expository paper by THEODORESCU (1960b).
Certain parts of this book were previously distributed, and we are indebted to many friends and colleagues who proposed improvements during this period. We are particularly indebted to Professor KLAUS KRICKEBERG for his careful reading of the manuscript and for making a number of valuable suggestions. We wish also to express our appreciation to Springer-Verlag for their most efficient handling of the publication of this book. December 1967 Bucharest
MARIUS IOSIFESCU
RADU THEODORESCU
Contents
Chapter 1. A study of random sequences via the dependence coefficient
1.1. The general case
1.1.1. The dependence coefficient
1.1.1.1. Borel-Cantelli type properties
1.1.1.2. The 0-1 law
1.1.1.3. Two auxiliary results
1.1.2. Generalizations of Bienaymé's equality
1.1.2.1. Some inequalities concerning the covariance
1.1.2.2. Applications to the variance of sums
1.1.3. Convergence of series
1.1.3.1. The a.s. convergence
1.1.3.2. The strong law of large numbers
1.1.4. The central limit theorem
1.1.4.1. The variance of sums
1.1.4.2. Different variants
1.1.5. The law of the iterated logarithm
1.1.5.1. Two auxiliary results
1.1.5.2. The main theorem
1.2. The Markovian case
1.2.1. The coefficient of ergodicity
1.2.1.1. Introductory definitions
1.2.1.2. Properties
1.2.1.3. The relationship to the independence coefficient
1.2.2. A lower bound for the variance
1.2.2.1. The main theorem
1.2.2.2. Auxiliary results
1.2.3. Asymptotic properties
1.2.3.1. Borel-Cantelli lemma and the 0-1 law
1.2.3.2. Generalizations of Bienaymé's equality
1.2.3.3. Convergence of series
1.2.3.4. The strong law of large numbers
1.2.3.5. The central limit theorem and the law of the iterated logarithm
Chapter 2. Random systems with complete connections
2.1. Ergodicity
2.1.1. Basic definitions
2.1.1.1. The concept of random system with complete connections
2.1.1.2. The associated Markov system
2.1.1.3. The associated operators
2.1.2. Different types of ergodicity
2.1.2.1. Definitions and auxiliary results
2.1.2.2. Uniform ergodicity in the weak sense
2.1.2.3. Uniform ergodicity for the homogeneous case
2.1.2.4. Uniform ergodicity in the strong sense
2.1.2.5. Application to multiple Markov chains
2.1.2.6. Application to the associated Markov system
2.1.3. An operator-theoretical approach
2.1.3.1. Mean and uniform ergodic theorems
2.1.3.2. Ergodic theorems for a special class of operators
2.1.3.3. Application to the associated Markov system
2.1.3.4. Application to the ergodicity of homogeneous random systems with complete connections
2.2. Asymptotic behaviour
2.2.1. Properties not supposing the ergodicity
2.2.1.1. Borel-Cantelli lemma and the 0-1 law
2.2.1.2. Convergence and the strong law of large numbers
2.2.2. Properties supposing the ergodicity
2.2.2.1. Basic results
2.2.2.2. The strong law of large numbers
2.2.2.3. The weak law of large numbers
2.2.2.4. The central limit theorem
2.2.2.5. The law of the iterated logarithm
2.2.2.6. Some nonparametric statistics
2.3. Special random systems with complete connections
2.3.1. OM-chains
2.3.1.1. Basic definitions
2.3.1.2. Examples
2.3.1.3. Ergodic theorems
2.3.1.4. The Monte Carlo simulation
2.3.1.5. The case of an arbitrary set of states
2.3.2. Chains of infinite order
2.3.2.1. Definition and several special cases
2.3.2.2. An existence theorem
2.3.2.3. The case of a finite set of states
2.3.3. Other examples
2.3.3.1. Partially observable sequences
2.3.3.2. Miscellanea
Chapter 3. Learning
3.1. Basic models
3.1.1. Introductory definitions and notions
3.1.1.1. Description of models
3.1.1.2. The simulation of models
3.1.2. Distance diminishing models
3.1.2.1. Description of the model
3.1.2.2. Theorems concerning states
3.1.2.3. Theorems concerning events
3.1.3. Finite state models
3.1.3.1. Introductory comments
3.1.3.2. Properties
3.2. Linear models
3.2.1. The (t+1)-operator model
3.2.1.1. Description of the model
3.2.1.2. The (m+1)²-operator model with reinforcement
3.2.1.3. The limiting distribution function
3.2.2. Experimenter-, subject- and experimenter-subject-controlled events
3.2.2.1. Experimenter-controlled events
3.2.2.2. Subject-controlled events
3.2.2.3. Experimenter-subject-controlled events
3.3. Nonlinear models
3.3.1. The beta model
3.3.1.1. Description of the model
3.3.1.2. Some auxiliary results
3.3.1.3. Properties
3.3.2. The simultaneous discrimination learning model
3.3.2.1. Description of the model
3.3.2.2. Properties
3.3.3. The fixed sample size model
3.3.3.1. Description of the model
3.3.3.2. Properties
Bibliography
Notation index
Author and subject index
CHAPTER 1
A study of random sequences via the dependence coefficient
This chapter is concerned with the study of sequences of dependent random variables by means of a dependence coefficient, following the classical approach for sequences of independent random variables. Section 1.1 deals with the general case and Section 1.2 with the Markovian one; in the latter, the results obtained in Section 1.1 are transcribed in terms of the well-known coefficients of ergodicity. At the same time, several results which have no analogues in the preceding theory are given.
1.1. The general case
1.1.1. The dependence coefficient
1.1.1.1. Borel-Cantelli type properties
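1.1.1.1.1. Let $(\Omega,\mathcal K,P)$ be a probability space and $\mathcal K_1$ and $\mathcal K_2$ two σ-algebras contained in the σ-algebra $\mathcal K$. Define the dependence coefficient of the σ-algebras $\mathcal K_1$ and $\mathcal K_2$ by the relation
$$\varphi(\mathcal K_1,\mathcal K_2)=\sup_{B\in\mathcal K_2}\ \operatorname*{ess\,sup}_{\omega\in\Omega}\big|P(B\mid\mathcal K_1)(\omega)-P(B)\big|.$$
Define also the independence coefficient of the σ-algebras $\mathcal K_1$ and $\mathcal K_2$ by the relation
$$\alpha(\mathcal K_1,\mathcal K_2)=1-\sup_{B\in\mathcal K_2}\Big(\operatorname*{ess\,sup}_{\omega\in\Omega}P(B\mid\mathcal K_1)(\omega)-\operatorname*{ess\,inf}_{\omega\in\Omega}P(B\mid\mathcal K_1)(\omega)\Big).\qquad(1.1.1)$$
It is easy to see that
$$0\le\alpha(\mathcal K_1,\mathcal K_2),\ \varphi(\mathcal K_1,\mathcal K_2)\le1,\qquad\tfrac12\big[1-\alpha(\mathcal K_1,\mathcal K_2)\big]\le\varphi(\mathcal K_1,\mathcal K_2)\le1-\alpha(\mathcal K_1,\mathcal K_2).\qquad(1.1.2)$$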
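For finite σ-algebras the suprema and essential suprema in (1.1.1) reduce to finite maxima, so both coefficients can be computed directly. The following Python sketch is an added illustration, not part of the original text. It assumes $\mathcal K_1=\sigma(X)$ and $\mathcal K_2=\sigma(Y)$ for discrete random variables $X$, $Y$ given by a joint table `joint[i][j]` $=P(X=i,Y=j)$, and the function names are hypothetical. It relies on the fact that, for a fixed atom $\{X=i\}$, the supremum over $B$ of $|P(B\mid X=i)-P(B)|$ is the total variation distance, i.e. half the $\ell_1$ distance between the two laws of $Y$.

```python
from itertools import combinations


def total_variation(u, v):
    """Half the l1 distance: equals sup over sets B of |u(B) - v(B)|."""
    return 0.5 * sum(abs(a - b) for a, b in zip(u, v))


def dependence_coefficients(joint):
    """Return (phi, alpha) of (1.1.1) for K1 = sigma(X), K2 = sigma(Y).

    joint[i][j] = P(X = i, Y = j). Atoms {X = i} of probability zero are
    discarded, which is what the essential sup/inf over omega amounts to.
    """
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    # Conditional laws P(Y = . | X = i) on the atoms of positive mass.
    cond = [[p / px[i] for p in row] for i, row in enumerate(joint) if px[i] > 0]
    # phi: worst deviation of a conditional law from the marginal law of Y.
    phi = max(total_variation(c, py) for c in cond)
    # 1 - alpha: largest oscillation of P(B | K1) over omega, maximised over B.
    osc = max((total_variation(u, v) for u, v in combinations(cond, 2)), default=0.0)
    return phi, 1.0 - osc
```

For `joint = [[0.4, 0.1], [0.1, 0.4]]` one gets φ = 0.3 and α = 0.4 (up to floating-point rounding). Here ½(1 − α) = φ, so the left-hand inequality in (1.1.2) is attained.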
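If $\mathcal K_1',\mathcal K_2'$ are σ-algebras contained in $\mathcal K$ such that $\mathcal K_1'\subset\mathcal K_1$, $\mathcal K_2'\subset\mathcal K_2$, then $\varphi(\mathcal K_1',\mathcal K_2')\le\varphi(\mathcal K_1,\mathcal K_2)$ and $\alpha(\mathcal K_1',\mathcal K_2')\ge\alpha(\mathcal K_1,\mathcal K_2)$.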
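$$\ldots\ge\varepsilon\Big)\le\frac{\delta}{1-\tau},$$
and hence
$$P\Big(\sup_{l>k}|S_{s+l}-S_{s+k}|\ge2\varepsilon\Big)\le\frac{\delta}{1-\tau}$$
for every $s>r(\varepsilon,\delta)$. Since $\delta$ can be taken arbitrarily small, we may write
$$\lim_{s\to\infty}P\Big(\bigcup_{l>k}\big\{|S_{s+l}-S_{s+k}|\ge2\varepsilon\big\}\Big)=0,$$
which concludes the proof. q.e.d.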
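We deduce from Theorem 1.1.12 the following
Corollary. Let the random variables $f_n$ be $\mathcal K_n$-measurable, $n\in\mathbb N^*$. Assume that there exists $n_0\in\mathbb N^*$ such that
$$\lim_{n\to\infty}\varphi\Big(\bigvee_{i=1}^{n}\mathcal K_i,\ \bigvee_{i=n+n_0}^{\infty}\mathcal K_i\Big)\ \ldots$$
Then the series $\sum_{n\in\mathbb N^*}f_n$ … absolutely a.s.
…
$$\ldots=\Phi(x)-\Phi(kx)=\frac{1}{\sqrt{2\pi}}\int_{kx}^{x}e^{-u^2/2}\,du$$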
and, obviously,
It is easy to see that for any $x\in\mathbb R$
$$|x|\,e^{-x^2/2}\le\frac{1}{\sqrt e},\qquad|x|\,e^{-k^2x^2/2}\le\frac{1}{k\sqrt e},$$
which concludes the proof. q.e.d.
1.1.4.1.2. In this Paragraph $(f_n)_{n\in\mathbb N^*}$ will denote a strictly stationary sequence of real random variables. We suppose that $Ef_1=0$. For $A\subset\mathbb N^*$ let $\mathcal K_A=\sigma\{(f_n)_{n\in A}\}$ be the σ-algebra generated by the family of random variables $(f_n)_{n\in A}$. In particular we shall write $A=[m,n]$ for $A=\{i: m\le i\le n\}$ and $A=[n,\infty)$ for $A=\{i: i\ge n\}$. Set
$$\varphi(n)=\sup_{r\in\mathbb N^*}\varphi\big(\mathcal K_{[1,r]},\mathcal K_{[r+n,\infty)}\big).$$
The sequence $(\varphi(n))_{n\in\mathbb N^*}$ is, obviously, nonincreasing. Let us consider the condition
$$\sum_{n\in\mathbb N^*}\varphi^{1/2}(n)<\infty.\qquad(1.1.10)$$
Proposition 1.1.20. If (1.1.10) holds and $Ef_1^2<\infty$, then the series
$$\sigma^2=Ef_1^2+2\sum_{n\in\mathbb N^*}Ef_1f_{n+1}$$
converges absolutely. We have $\sigma^2\ge0$ and
$$ES_n^2=n(\sigma^2+\rho_n)$$
with $\rho_n=o(1)$ as $n\to\infty$.
Proof. The absolute convergence follows from (1.1.10) and the inequality $|Ef_1f_{n+1}|\le2\varphi^{1/2}(n)\,Ef_1^2$, which is a consequence of Lemma 1.1.7. … where … as $n\to\infty$. We conclude that $\sigma^2\ge0$. q.e.d.
1.1.4.1.3. Let us set …
Lemma 1.1.21. If (1.1.10) holds, $\sigma\ne0$, and $E|f_1|^{2+\delta}<\infty$ for some $\delta>0$ …, then for every $\varepsilon>0$ there exist suitable $a_1$ and $k$ such that
(1.1.11) …
for all $n\in\mathbb N^*$. We have
$$\ldots\le E|S_n|^{2+\delta}+E|\bar S_n|^{2+\delta}+2E|S_n|^{1+\delta}|\bar S_n|+2E|S_n|\,|\bar S_n|^{1+\delta}\qquad(1.1.12)$$
for all $n\in\mathbb N^*$. The strict stationarity implies
$$E|\bar S_n|^{2+\delta}=E|S_n|^{2+\delta}=c_n.$$
By Lemma 1.1.7, first with $p=(2+\delta)(1+\delta)^{-1}$ and then with $p=2+\delta$, we have
$$E|S_n|^{1+\delta}|\bar S_n|\le\ldots$$
… $n\ge n_0$) we shall have
$$c_{2n}\le\Big(2+\frac{\varepsilon}{2}\Big)c_n+2a_1\big(ES_n^2\big)^{1+\delta/2}.$$
Further, there obviously exists $a_2\ (\ge2a_1)$ such that (1.1.15) is valid for $n<n_0$; thus (1.1.15) holds for all $n\in\mathbb N^*$. In virtue of (1.1.15), and taking into account Proposition 1.1.20, for every $r\in\mathbb N^*$
…
Therefore, for $\varepsilon$ sufficiently small there is a suitable $a_3$ such that
$$c_{2^r}\le(2+\varepsilon)^rc_1+a_3\big(ES_{2^{r-1}}^2\big)^{1+\delta/2}$$
for all $r\in\mathbb N^*$. It follows that
…
Thus, for a suitable $a_4$,
(1.1.16) …
for all $r\in\mathbb N^*$. Let $2^r\le n$ …
… uniformly with respect to $x\in\mathbb R$.
Proof. See IBRAGIMOV (1962).
Theorem 1.1.23 may be extended as follows. Let $\xi$ be a real-valued $\mathcal K_{[1,\infty)}$-measurable random variable. Then $\xi$ can be represented in the form $\xi=F[(f_n)_{n\in\mathbb N^*}]$, where $F$ is a $\mathcal B^{\mathbb N^*}$-measurable real-valued function defined on $\mathbb R^{\mathbb N^*}$. Consider the strictly stationary sequence $(g_n)_{n\in\mathbb N^*}$ with …
We have
Theorem 1.1.24. If $E\xi=0$, $E|\xi|^{2+\delta}<\infty$ for some $\delta>0$, …, then
$$\lim_{n\to\infty}P\Big(\frac{1}{\sigma\sqrt n}\sum_{i=1}^{n}g_i<x\Big)=\Phi(x)$$
[IBRAGIMOV (1959, 1962)].
4. If $(f_n)_{n\in\mathbb N^*}$ is only a wide-sense stationary sequence of random variables such that $|f_n|\le C$, $n\in\mathbb N^*$, and $\sum_{n\in\mathbb N^*}n^{\varepsilon}\varphi(n)<\infty$ for some $\varepsilon>0$, then a theorem corresponding to Theorem 1.1.22 may be proved. For this one must use a direct bound for $E|g_1|^3$ (namely $E|g_1|^3\le Cp\,Eg_1^2$) and choose $p$, $q$ and $\tau$ adequately [IOSIFESCU (1963d)].
1.1.5. The law of the iterated logarithm
1.1.5.1. Two auxiliary results
1.1.5.1.1. Here the sequence $(f_n)_{n\in\mathbb N^*}$ is supposed to satisfy the same conditions as those mentioned at the beginning of Subparagraph 1.1.4.1.2. First, we have
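We set $\log_2n=\log\log n$.
Proposition 1.1.25. Suppose that (1.1.10) holds, $\sigma\ne0$ and $E|f_1|^{2+\delta}<\infty$. If $0<a_n$ …, then
$$P(S_n>a_n\sigma\sqrt n)=\frac{1}{\sqrt{2\pi}\,a_n}\exp\Big(-\frac{a_n^2}{2}\Big)\big(1+o(1)\big).$$
Proof. According to Theorem 1.1.22 we may write
$$P(S_n>a_n\sigma\sqrt n)=\frac{1}{\sqrt{2\pi}}\int_{a_n}^{\infty}e^{-x^2/2}\,dx+O(n^{-\kappa}).$$
For every $t>0$ we have
$$\int_t^{\infty}e^{-x^2/2}\,dx=\frac1t\,e^{-t^2/2}\Big(1-\frac{\theta}{t^2}\Big),\qquad0<\theta<1.$$
Consequently
$$P(S_n>a_n\sigma\sqrt n)=\frac{1}{\sqrt{2\pi}\,a_n}\exp\Big(-\frac{a_n^2}{2}\Big)\big(1+o(1)\big)+O(n^{-\kappa}).$$
Under the condition concerning $a_n$ we have
$$O(n^{-\kappa})\,a_n\exp\Big(\frac{a_n^2}{2}\Big)=o(1),$$
which concludes the proof. q.e.d.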
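The proof above rests on the classical estimate for the Gaussian tail integral. As an added numerical check (not from the source), the following Python sketch evaluates $\int_t^\infty e^{-x^2/2}\,dx$ through the standard-library complementary error function, using $\int_t^\infty e^{-x^2/2}\,dx=\sqrt{\pi/2}\,\operatorname{erfc}(t/\sqrt2)$, and then solves for $\theta$. The helper names are hypothetical.

```python
import math


def tail_integral(t):
    """Integral of exp(-x**2 / 2) over [t, infinity), computed via erfc."""
    return math.sqrt(math.pi / 2.0) * math.erfc(t / math.sqrt(2.0))


def leading_term(t):
    """The approximation (1/t) * exp(-t**2 / 2) used for large t."""
    return math.exp(-t * t / 2.0) / t


def theta(t):
    """Solve tail_integral(t) = leading_term(t) * (1 - theta / t**2) for theta."""
    return t * t * (1.0 - tail_integral(t) / leading_term(t))


if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0, 4.0, 8.0):
        ratio = tail_integral(t) / leading_term(t)
        print(f"t={t:4.1f}  theta={theta(t):.4f}  tail/leading={ratio:.4f}")
```

In this run θ stays strictly between 0 and 1, and the ratio of the tail to the leading term approaches 1 as t grows. This is the factor (1 + o(1)) that appears once the argument tends to infinity.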
1.1.5.1.2. Further, let us set
$$S_n^*=\max_{1\le k\le n}S_k.$$
Proposition 1.1.26. Suppose that (1.1.10) holds and $E|f_1|^{2+\delta}<\infty$, $\varphi(1)<1$, $\sigma\ne0$. For every $c$ such that $\varphi(1)<c<1$ there exists $b=b(c)>0$ such that
$$P(S_n^*>\varepsilon)\le\big(c-\varphi(1)\big)^{-1}P\big(S_n>\varepsilon-b\sigma\sqrt n\big)$$
for arbitrary $\varepsilon>0$ and $n\in\mathbb N^*$.
Proof. According to Theorem 1.1.22 there exist $n_0\in\mathbb N^*$ and $b=b(c)>0$ such that $P(S_n>-b\sigma\sqrt n)\ge c$ for $n\ge n_0$. It is easy to see that $b$ may be chosen sufficiently large for the above inequality to hold for every $n\in\mathbb N^*$. It remains to apply Lemma 1.1.6′ with $a=-b\sigma\sqrt n$ and $x=\varepsilon-b\sigma\sqrt n$. q.e.d.
1.1.5.2. The main theorem
1.1.5.2.1. Now we shall give the law of the iterated logarithm.
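Theorem 1.1.27. If (1.1.10) holds and $E|f_1|^{2+\delta}<\infty$, $\sigma\ne0$, $\varphi(1)<1$, then
$$P\Big(\limsup_{n\to\infty}\frac{S_n}{\sigma\sqrt{2n\log_2n}}=1\Big)=1.$$
Proof. We use a classical approach due to A. N. KOLMOGOROV. Set $n_k=[d^{2k}]$ for $k\in\mathbb N^*$ with $d>1$. Without any loss of generality it may be supposed that $\sigma=1$.
1° Prove that for any $\varepsilon>0$ we have
$$P\big(S_n>(1+\varepsilon)\sqrt{2n\log_2n}\ \text{i.o.}\big)=0.$$
Let $0<\varepsilon'<\varepsilon$ and choose $d$ such that $(1+\varepsilon)/d>1+\varepsilon'$. It is easy to see that
$$P\big(S_n>(1+\varepsilon)\sqrt{2n\log_2n}\ \text{i.o.}\big)\le P\big(S_{n_k}^*>(1+\varepsilon)\sqrt{2n_{k-1}\log_2n_{k-1}}\ \text{i.o.}\big)\le P\big(S_{n_k}^*>(1+\varepsilon')\sqrt{2n_k\log_2n_k}\ \text{i.o.}\big).$$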
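Therefore, it suffices to show that
$$\sum_{k\in\mathbb N^*}P\big(S_{n_k}^*>(1+\varepsilon')\sqrt{2n_k\log_2n_k}\big)<\infty.$$
This convergence follows from the fact that for $0<\varepsilon''<\varepsilon'$ and for $k$ sufficiently large we have, by virtue of Propositions 1.1.26 and 1.1.25,
$$P\big(S_{n_k}^*>(1+\varepsilon')\sqrt{2n_k\log_2n_k}\big)\le\big(c-\varphi(1)\big)^{-1}P\big(S_{n_k}>(1+\varepsilon')\sqrt{2n_k\log_2n_k}-b\sqrt{n_k}\big)\le\big(c-\varphi(1)\big)^{-1}P\big(S_{n_k}>(1+\varepsilon'')\sqrt{2n_k\log_2n_k}\big)=o\big(k^{-(1+\varepsilon'')^2}\big).$$
Note that the proved result implies that
$$P\big(|S_n|>(1+\varepsilon)\sqrt{2n\log_2n}\ \text{i.o.}\big)=0$$
for any $\varepsilon>0$.
2° Prove that for any $\varepsilon>0$ we have
$$P\big(S_n>(1-\varepsilon)\sqrt{2n\log_2n}\ \text{i.o.}\big)=1.$$
Set $u_k^2=n_k(1-d^{-2})$, $v_k^2=2\log_2u_k^2$, and
$$A_k=\{S_{n_k}-S_{n_{k-1}}>(1-\varepsilon)u_kv_k\}$$
for $k\in\mathbb N^*$. According to Proposition 1.1.25 we have (we can obviously consider only the case $\varepsilon<1$)
$$P(A_k)=\frac{1}{(1-\varepsilon)\sqrt{2\pi}\,v_k}\,(\log u_k^2)^{-(1-\varepsilon)^2}\big(1+o(1)\big),$$
which is the general term of a divergent series. Lemma 1.1.2′¹⁾ then implies that $P(\limsup_{k\to\infty}A_k)=1$. Taking into account the remark made at the end of 1°, it follows that $P(\limsup_{k\to\infty}A_k\cap B_k)=1$, where
$$B_k=\big\{|S_{n_{k-1}}|\le2\sqrt{2n_{k-1}\log_2n_{k-1}}\big\}.$$
By choosing $d$ large enough that $(1-\varepsilon)(1-d^{-2})^{1/2}-2d^{-1}>1-\varepsilon'$ with $\varepsilon'>\varepsilon$, we obtain
$$1=P\Big(\limsup_{k\to\infty}A_k\cap B_k\Big)\le P\big(S_{n_k}>(1-\varepsilon')\sqrt{2n_k\log_2n_k}\ \text{i.o.}\big).$$
Theorem 1.1.27 is proved. q.e.d.
¹⁾ Lemma 1.1.2 might also be used by taking $A_k=\{S_{n_{2k}}-S_{n_{2k-1}}>(1-\varepsilon)u_{2k}v_{2k}\}$.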
1.1.5.2.2. Set now
$$\sigma_l^2=Ef_1^2+2\sum_{n\in\mathbb N^*}Ef_1f_{nl+1}.$$
Then we may state
Corollary 1. For each $l\in\mathbb N^*$ such that $\sigma_l\ne0$ and $\varphi(l)$ …
… $\varepsilon>0$ there exist $x',x''\in X$ such that the measures $P(x';\cdot)$ and $P(x'';\cdot)$ are concentrated on two sets $B_{x'}$ and $B_{x''}$ for which $P(x';B_{x'}\cap B_{x''})$ … According to (1.2.5) we may write …
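Then, Proposition 1.2.2 implies
…
Thus, we have proved that
$$\alpha(Q)=\inf_{i,j\in I}\sum_{k\in I}\min(q_{ik},q_{jk}).$$
Taking into account that
$$\min(q_{ik},q_{jk})=\frac{q_{ik}+q_{jk}-|q_{ik}-q_{jk}|}{2}$$
and that $\sum_{k\in I}q_{ik}=\sum_{k\in I}q_{jk}=1$, we obtain
$$\alpha(Q)=1-\frac12\sup_{i,j\in I}\sum_{k\in I}|q_{ik}-q_{jk}|.$$
q.e.d.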
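For a finite stochastic matrix, the two expressions for $\alpha(Q)$ obtained in this proof can be compared directly. The following Python sketch is an added illustration, not from the book. The function names are hypothetical, and the rows of `Q` are assumed to be probability vectors.

```python
from itertools import product


def ergodicity_coefficient(Q):
    """alpha(Q) as the minimum over row pairs (i, j) of sum_k min(q_ik, q_jk)."""
    n = len(Q)
    return min(sum(min(a, b) for a, b in zip(Q[i], Q[j]))
               for i, j in product(range(n), repeat=2))


def ergodicity_coefficient_l1(Q):
    """Equivalent form: 1 - (1/2) * max over row pairs of the l1 distance."""
    n = len(Q)
    return 1.0 - 0.5 * max(sum(abs(a - b) for a, b in zip(Q[i], Q[j]))
                           for i, j in product(range(n), repeat=2))


if __name__ == "__main__":
    Q = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
    print(ergodicity_coefficient(Q), ergodicity_coefficient_l1(Q))
```

For this `Q` both functions return 0.5, attained by the pair of rows 1 and 3. Any permutation matrix gives α = 0, and a matrix with identical rows gives α = 1.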
1.2.1.2.3. Let us consider the general case. Let $(X,\mathcal X)$, $(Y,\mathcal Y)$ and $(Z,\mathcal Z)$ be three measurable spaces. Consider a transition probability function $P'$ from $(X,\mathcal X)$ to $(Y,\mathcal Y)$ and a transition probability function $P''$ from $(Y,\mathcal Y)$ to $(Z,\mathcal Z)$. Let $P$ be the transition probability function from $(X,\mathcal X)$ to $(Z,\mathcal Z)$ defined by
$$P(x;C)=\int_YP'(x;dy)\,P''(y;C)$$
for $x\in X$, $C\in\mathcal Z$.
Proposition 1.2.4. We have
$$1-\alpha(P)\le\big(1-\alpha(P')\big)\big(1-\alpha(P'')\big).$$
Proof. The operators $\mathfrak P$, $\mathfrak P'$ and $\mathfrak P''$ associated with $P$, $P'$ and $P''$ respectively satisfy the relation $\mathfrak P=\mathfrak P''\mathfrak P'$, from which we obtain
$$\|\mathfrak P\|\le\|\mathfrak P''\|\,\|\mathfrak P'\|.$$
It remains to apply Proposition 1.2.2. q.e.d.
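Proposition 1.2.4 states that $1-\alpha$ is submultiplicative under composition of transition probability functions. On finite spaces composition is a matrix product, so the inequality can be spot-checked numerically. The following Python sketch is illustrative only: the matrix sizes, the random seed and the helper names are arbitrary choices, not taken from the text.

```python
import random


def alpha(P):
    """alpha(P) = 1 - (1/2) max_{i,j} sum_k |p_ik - p_jk| over the rows of P."""
    n = len(P)
    return 1.0 - 0.5 * max(sum(abs(a - b) for a, b in zip(P[i], P[j]))
                           for i in range(n) for j in range(n))


def compose(P1, P2):
    """Matrix form of P(x; C) = sum_y P1(x; {y}) P2(y; C)."""
    return [[sum(P1[i][y] * P2[y][k] for y in range(len(P2)))
             for k in range(len(P2[0]))] for i in range(len(P1))]


def random_stochastic(rows, cols, rng):
    """A random row-stochastic matrix (each row is a probability vector)."""
    matrix = []
    for _ in range(rows):
        weights = [rng.random() for _ in range(cols)]
        total = sum(weights)
        matrix.append([w / total for w in weights])
    return matrix


if __name__ == "__main__":
    rng = random.Random(0)
    P1 = random_stochastic(3, 4, rng)   # from a 3-point space to a 4-point space
    P2 = random_stochastic(4, 2, rng)   # from the 4-point space to a 2-point space
    lhs = 1.0 - alpha(compose(P1, P2))
    rhs = (1.0 - alpha(P1)) * (1.0 - alpha(P2))
    print(lhs, "<=", rhs)
```

The spaces need not have the same size. This matches the general setting $(X,\mathcal X)\to(Y,\mathcal Y)\to(Z,\mathcal Z)$ of the proposition.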
Notes and comments
Lemma 1.2.1 is due to DOBRUSIN (1956). Propositions 1.2.2-1.2.4 are due to IOSIFESCU (1967b).
Complements
Let $O(V,\mathcal V)$ be the Banach space of all real-valued bounded $\mathcal V$-measurable functions $\psi$ defined on $V$ with the norm
$$\|\psi\|=\operatorname{osc}\psi=\sup_{v\in V}\psi(v)-\inf_{v\in V}\psi(v).$$
With the transition probability function $P$ from $(X,\mathcal X)$ to $(Y,\mathcal Y)$ we associate an operator $\mathfrak P^*$ mapping $O(Y,\mathcal Y)$ into $O(X,\mathcal X)$ by setting
$$(\mathfrak P^*\psi)(x)=\int_YP(x;dy)\,\psi(y).$$
We have $\alpha(P)=1-\|\mathfrak P^*\|$ [IOSIFESCU (1967b)].
1.2.1.3. The relationship to the independence coefficient
1.2.1.3.1. Consider a sequence of measurable spaces $(X_i,\mathcal X_i)_{i\in\mathbb N}$ and for every $j\in\mathbb N$ a transition probability function ${}^jP$ from $(X_j,\mathcal X_j)$ to $(X_{j+1},\mathcal X_{j+1})$. By means of the ${}^jP$'s and a given $x\in X_0$ we can construct a probability space $(\Omega,\mathcal K,P_x)$ by taking
$$\Omega=\prod_{i\in\mathbb N^*}X_i,\qquad\mathcal K=\prod_{i\in\mathbb N^*}\mathcal X_i$$
and by setting
$$P_x(A)=\int_{A_1}{}^0P(x;dx_1)\int_{A_2}{}^1P(x_1;dx_2)\cdots\int_{A_n}{}^{n-1}P(x_{n-1};dx_n)$$
for $A=A_1\times\cdots\times A_n$, $A_i\in\mathcal X_i$, $1\le i\le n$ [the possibility of extending $P_x$ to the whole of $\mathcal K$ is assured by IONESCU TULCEA's (1949) theorem; see also LOÈVE (1963, p. 137)]. The sequence of random variables $(\xi_n)_{n\in\mathbb N^*}$ defined on $\Omega$ by $\xi_n(\omega)=x_n$ for $\omega=(x_n)_{n\in\mathbb N^*}$ will be said to be connected in a (nonhomogeneous) Markov chain with state spaces $(X_n,\mathcal X_n)$, $n\in\mathbb N^*$, transition probability functions ${}^jP$, $j\in\mathbb N$, and initial probability distribution concentrated at $x\in X_0$.
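When all the spaces $X_i$ are finite, the iterated integral defining $P_x(A_1\times\cdots\times A_n)$ becomes a finite sum over paths. It can then be evaluated by a forward recursion that carries the mass of the paths which have stayed in $A_1,\ldots,A_j$ so far. The following Python sketch illustrates this under that finiteness assumption. The encoding of the ${}^jP$ as nested lists and the function names are hypothetical. The sketch computes the probability both by recursion and by brute-force path enumeration.

```python
from itertools import product


def cylinder_probability(x, kernels, sets):
    """P_x(A_1 x ... x A_n) for finitely many finite state spaces.

    kernels[j][a][b] stands for jP(a; {b}), the probability of moving from
    state a of X_j to state b of X_{j+1}; sets[i] is the set A_{i+1}.
    """
    mass = {x: 1.0}                      # point mass at x in X_0
    for j, A in enumerate(sets):
        new_mass = {}
        for a, m in mass.items():
            for b, p in enumerate(kernels[j][a]):
                if b in A:               # keep only paths that stay in A_{j+1}
                    new_mass[b] = new_mass.get(b, 0.0) + m * p
        mass = new_mass
    return sum(mass.values())


def cylinder_probability_brute(x, kernels, sets):
    """Same quantity: sum of products of kernel entries over all paths."""
    total = 0.0
    for path in product(*(sorted(A) for A in sets)):
        prob, prev = 1.0, x
        for j, b in enumerate(path):
            prob *= kernels[j][prev][b]
            prev = b
        total += prob
    return total
```

The recursion costs one pass per coordinate, whereas enumeration grows with the product of the set sizes. When every $A_i$ is the whole space, both return 1.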
Obviously, for every $n\in\mathbb N^*$, the $X_n$-valued random variable $\xi_n$ is $\mathcal K_n$-measurable, where
$$\mathcal K_n=\Big\{A: A=\prod_{i\in\mathbb N^*}A_i\ \ldots\Big\}$$
for every $n,l\in\mathbb N^*$.
1.2.1.3.2. Define now as usual the $n$-step transition probability functions ${}^jP^n$ by
$${}^jP^n(x_j;A_{j+n})=\int_{X_{j+1}}{}^jP(x_j;dx_{j+1})\cdots\int_{X_{j+n-1}}{}^{j+n-2}P(x_{j+n-2};dx_{j+n-1})\int_{A_{j+n}}{}^{j+n-1}P(x_{j+n-1};dx_{j+n}),$$
where $x_j\in X_j$, $A_{j+n}\in\mathcal X_{j+n}$. We shall prove
Proposition 1.2.5. We have
$$\alpha\Big(\bigvee_{i=1}^{k}\mathcal K_i,\ \bigvee_{i=k+l}^{k+l+m-1}\mathcal K_i\Big)=\alpha({}^kP^l)$$
for every $k,l,m\in\mathbb N^*$.
Proof. According to (1.1.1) we have
…
where ${}^kP^l_m$ is the transition probability function from $(X_k,\mathcal X_k)$ to the product measurable space
$$\Big(\prod_{j=k+l}^{k+l+m-1}X_j,\ \prod_{j=k+l}^{k+l+m-1}\mathcal X_j\Big)$$
induced by the ${}^iP$'s. It is easy to verify that for $m>1$
$${}^kP^l_m(x_k;\cdot)=\int_{X_{k+l}}{}^kP^l(x_k;dx_{k+l})\ldots$$
… $x_k',x_k''\in X_k$. Therefore $\alpha({}^kP^l_m)\ge\alpha({}^kP^l)$. The converse inequality is obvious. q.e.d.
Proposition 1.2.6. We have
$$1-\alpha({}^kP^l)\le\prod_{i=k}^{k+l-1}\big(1-\alpha({}^iP)\big)$$
for every $k,l\in\mathbb N^*$.
Proof. The above inequality is an immediate consequence of Proposition 1.2.4. q.e.d.
Notes and comments
Proposition 1.2.5 is essentially due to UENO (1957).
1.2.2. A lower bound for the variance
1.2.2.1. The main theorem
1.2.2.1.1. Let us consider a probability space $(\Omega,\mathcal K,P)$. For a given σ-algebra $\mathcal Z\subset\mathcal K$ we denote by $E(f\mid\mathcal Z)$ the conditional expectation of the real-valued random variable $f$ with respect to $\mathcal Z$, and we set, as usual,
$$D(f\mid\mathcal Z)=E\{[f-E(f\mid\mathcal Z)]^2\mid\mathcal Z\}=E(f^2\mid\mathcal Z)-[E(f\mid\mathcal Z)]^2,$$
$$\bar D(f\mid\mathcal Z)=E[D(f\mid\mathcal Z)],\qquad\tilde D(f\mid\mathcal Z)=D[E(f\mid\mathcal Z)].$$
It is known that
$$Df=\bar D(f\mid\mathcal Z)+\tilde D(f\mid\mathcal Z).\qquad(1.2.6)$$
1.2.2.1.2. Now consider a sequence $(\varphi_n)_{n\in\mathbb N^*}$ of real-valued functions such that $\varphi_n$ is defined on $X_n$ and is $\mathcal X_n$-measurable. Set $f_n=\varphi_n\circ\xi_n$, $n\in\mathbb N^*$, $S_{k,n}=f_k+\cdots+f_{k+n-1}$, $S_{1,n}=S_n$.
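The decomposition (1.2.6) splits the variance into the mean of the conditional variance and the variance of the conditional expectation. For a σ-algebra $\mathcal Z$ generated by a finite partition, it can be checked in exact rational arithmetic. The following Python sketch does this with a hypothetical helper name, assuming every outcome has strictly positive probability. It uses `fractions.Fraction` so that the equality holds exactly rather than up to rounding.

```python
from fractions import Fraction


def variance_decomposition(probs, values, labels):
    """Return (Df, mean of D(f|Z), variance of E(f|Z)) for a finite space.

    probs[w] > 0 is the probability of outcome w, values[w] = f(w), and
    labels[w] names the atom of the partition generating Z that contains w.
    """
    total = sum(probs)
    mean = sum(p * v for p, v in zip(probs, values)) / total
    var_f = sum(p * (v - mean) ** 2 for p, v in zip(probs, values)) / total
    atoms = {}
    for p, v, z in zip(probs, values, labels):
        atoms.setdefault(z, []).append((p, v))
    mean_cond_var = Fraction(0)
    var_cond_mean = Fraction(0)
    for items in atoms.values():
        pz = sum(p for p, _ in items)
        mz = sum(p * v for p, v in items) / pz               # E(f | Z) on this atom
        vz = sum(p * (v - mz) ** 2 for p, v in items) / pz   # D(f | Z) on this atom
        mean_cond_var += pz / total * vz
        var_cond_mean += pz / total * (mz - mean) ** 2
    return var_f, mean_cond_var, var_cond_mean
```

For eight equally likely values 1, ..., 8 grouped into the atoms {1, 2}, {3, 4, 5} and {6, 7, 8}, the total variance is 21/4. It equals exactly the sum of the two other returned quantities.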
Let us denote
$$\alpha^{(n)}=\min_{1\le i\le n-1}\alpha({}^iP).$$
Our purpose in what follows is to prove
Theorem 1.2.7. If the random variables $f_k$, $1\le k\le n$, have finite variances, then¹⁾
$$D_xS_n\ge\frac{\alpha^{(n)}}{8}\sum_{k=1}^{n}D_xf_k.$$
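To see Theorem 1.2.7 at work, the following Python sketch computes $D_xS_n$ exactly for a homogeneous chain on a finite state space. It is an added illustration: the chain, the function names and the choice $f_k=g(\xi_k)$ for a fixed function $g$ are assumptions, not taken from the text. In the homogeneous case every ${}^iP$ equals $P$, so $\alpha^{(n)}=\alpha(P)$. The sketch returns both $D_xS_n$ and $\sum_kD_xf_k$, so the two sides of the inequality can be compared.

```python
def alpha(P):
    """Coefficient of ergodicity 1 - (1/2) max_{i,j} sum_k |p_ik - p_jk|."""
    m = len(P)
    return 1.0 - 0.5 * max(sum(abs(a - b) for a, b in zip(P[i], P[j]))
                           for i in range(m) for j in range(m))


def variance_of_sum(P, g, x, n):
    """Return (D_x S_n, sum_k D_x f_k) for f_k = g(xi_k), xi_0 = x, exactly."""
    m = len(P)
    mu = [1.0 if a == x else 0.0 for a in range(m)]
    laws = []                                    # laws[k - 1] = law of xi_k
    for _ in range(n):
        mu = [sum(mu[a] * P[a][b] for a in range(m)) for b in range(m)]
        laws.append(mu)
    means = [sum(law[a] * g[a] for a in range(m)) for law in laws]
    variances = [sum(law[a] * g[a] ** 2 for a in range(m)) - mk ** 2
                 for law, mk in zip(laws, means)]
    # h_pows[h] = P^h g, so that E f_j f_{j+h} = sum_a law_j(a) g(a) (P^h g)(a).
    h_pows = [list(g)]
    for _ in range(n - 1):
        prev = h_pows[-1]
        h_pows.append([sum(P[a][b] * prev[b] for b in range(m)) for a in range(m)])
    var_sum = sum(variances)
    for j in range(n):
        for k in range(j + 1, n):
            cross = sum(laws[j][a] * g[a] * h_pows[k - j][a] for a in range(m))
            var_sum += 2.0 * (cross - means[j] * means[k])
    return var_sum, sum(variances)


if __name__ == "__main__":
    P = [[0.1, 0.9], [0.8, 0.2]]                 # strongly alternating chain
    d_sn, total = variance_of_sum(P, [1.0, -1.0], 0, 30)
    print(d_sn, ">=", alpha(P) / 8.0 * total)
```

The alternating chain is chosen because negative correlations make $D_xS_n$ much smaller than $\sum_kD_xf_k$. It still stays above the bound $\alpha(P)/8$ times that sum.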
The proof will follow from a sequence of propositions.
1.2.2.2. Auxiliary results
1.2.2.2.1. Let us denote
$$\gamma_k=E(S_k\mid\mathcal K_k),\qquad\delta_k=S_k-\gamma_k.$$
Obviously, $E(\delta_k\mid\mathcal K_k)=0$. Without any loss of generality we may suppose that $Ef_k=0$, $1\le k\le n$. Set also $\bar D_k=\bar D(S_k\mid\mathcal K_k)=D\delta_k$, $\tilde D_k=\tilde D(S_k\mid\mathcal K_k)=D\gamma_k$. Therefore $DS_k=\bar D_k+\tilde D_k$.
¹⁾ For the sake of simplicity, we shall write in what follows $D$, $E$ instead of $D_x$, $E_x$.
Proposition 1.2.8. For any $1\le k$ … $\big[\ldots-E(\ldots\mid\mathcal K_i)\big]\big[\ldots-E(\ldots\mid\mathcal K_i)\big]\ldots\}=0$.
For, remark that the random variable … is $\bigvee_{i=k}^{\ldots}\mathcal K_i$-measurable; thus it suffices to show that
(1.2.7) …
By using the time-reversibility of the Markov property we obtain …; thus (1.2.7) holds. To close the proof it remains to show that … This follows from (1.2.8). q.e.d.
1.2.2.2.2. Now, consider two measurable spaces $(X,\mathcal X)$ and $(Y,\mathcal Y)$, a probability $\pi$ on $\mathcal X$, and a transition probability function $P$ from the first to the second. On the product σ-algebra $\mathcal X\times\mathcal Y$ we define the probability $p$ by setting
$$p(A\times B)=\int_A\pi(dx)P(x;B)$$
for $A\in\mathcal X$, $B\in\mathcal Y$. Consider also the σ-algebras $\tilde{\mathcal X}=\{A\times Y: A\in\mathcal X\}$, $\tilde{\mathcal Y}=\{X\times B: B\in\mathcal Y\}$.
These σ-algebras are isomorphic with the σ-algebras $\mathcal X$ and $\mathcal Y$. For $A\in\mathcal X$ and $B\in\mathcal Y$ the corresponding elements in $\tilde{\mathcal X}$ and $\tilde{\mathcal Y}$ will be denoted by $\tilde A=A\times Y$ and $\tilde B=X\times B$ respectively. Let $\xi$ be a real-valued random variable defined on the probability space $(X\times Y,\mathcal X\times\mathcal Y,p)$. Suppose that $\xi$ is $\tilde{\mathcal X}$-measurable and possesses a finite variance.
Proposition 1.2.9. We have
$$\bar D(\xi\mid\tilde{\mathcal Y})\ge\tfrac12\,\alpha(P)\,D\xi.$$
Proof. It suffices to consider the case of a simple random variable $\xi$. Let us denote by $m$ the median of $\xi$ and set $M=\{(x,y):\xi(x,y)-m\le0\}$. Obviously $M\in\tilde{\mathcal X}$ and $p(M)=\pi(M)\ge\frac12$. As is known, $E(\xi-m)^2\ge D\xi$, whence we have either $E[(\xi-m)^+]^2\ge\frac12D\xi$ or $E[(\xi-m)^-]^2\ge\frac12D\xi$. To make a choice let us suppose that
$$E[(\xi-m)^+]^2\ge\tfrac12D\xi.\qquad(1.2.9)$$
Denote by $a$ a fixed positive but arbitrary number less than $\frac12$. By (1.2.9) and the $\tilde{\mathcal X}$-measurability of $\xi$ it follows that there exists a partition of $(X\times Y)\setminus M$ into a finite number of disjoint sets $\tilde A_i$, $1\le i\le r$, belonging to $\tilde{\mathcal X}$, such that for some $t_i>m$
$$\sum_{i=1}^{r}\pi(A_i)(t_i-m)^2\ge aD\xi\qquad(1.2.10)$$
and $\xi(x,y)=t_i$ for $(x,y)\in\tilde A_i$. For every $A\in\mathcal X$ such that $\pi(A)>0$ let us consider the $A$-conditional probability on $\mathcal X$ defined as usual by
$$\pi_A(\cdot)=\frac{\pi(A\cap\cdot)}{\pi(A)}.$$
Consider also the probability $q_A$ on $\mathcal Y$ defined by
$$q_A(\cdot)=\int_X\pi_A(dx)P(x;\cdot)=\mathfrak P\pi_A(\cdot).\qquad(1.2.11)$$
Note that we can also write
$$q_A(B)=\frac{1}{\pi(A)}\int_A\pi(dx)P(x;B)=\frac{p(A\times B)}{\pi(A)}.\qquad(1.2.12)$$
Taking into account (1.2.11), Lemma 1.2.1 yields
$$\|q_{A'}-q_{A''}\|\le1-\alpha(P)\qquad(1.2.13)$$
for every $A',A''\in\mathcal X$ with $\pi(A'),\pi(A'')>0$. Finally, from (1.2.1) and (1.2.3) it follows that for every pair of sets $A',A''\in\mathcal X$ with $\pi(A'),\pi(A'')>0$ there exists a measure $v_{A'A''}$ on $\mathcal Y$ such that
$$v_{A'A''}(B)\le\min\big(q_{A'}(B),q_{A''}(B)\big)\qquad(1.2.14)$$
for all $B\in\mathcal Y$, and
$$\ldots(q_{A'},q_{A''})=v_{A'A''}(Y).\qquad(1.2.15)$$
From (1.2.2), (1.2.15) and (1.2.13) we deduce
$$v_{A'A''}(Y)\ge\alpha(P).\qquad(1.2.16)$$
Relations (1.2.14) and (1.2.12) imply
$$\pi(A_i)v_{A_iM}(B)\le\pi(A_i)q_{A_i}(B)=p(A_i\times B)\qquad(1.2.17)$$
for all $B\in\mathcal Y$, $1\le i\le r$. Relations (1.2.14), (1.2.12) and the inequality $\pi(M)\ge\frac12$ also imply
$$\sum_{i=1}^{r}\pi(A_i)v_{A_iM}(B)\le\sum_{i=1}^{r}\pi(A_i)q_M(B)\qquad(1.2.18)$$
for all $B\in\mathcal Y$. Now, consider a finite subalgebra $\mathcal B$ of $\mathcal Y$ and denote by $B_j$, $1\le j\le s$, a system of (disjoint) generators of $\mathcal B$. We shall prove that for every $B_j$, $1\le j\le s$, we have
$$p(X\times B_j)\,D(\xi\mid X\times B_j)\ge\sum_{i=1}^{r}\pi(A_i)v_{A_iM}(B_j)(t_i-m)^2.\qquad(1.2.19)$$
If $m\ne E(\xi\mid X\times B_j)$ …
$$\ldots+\big(1-\alpha^{(n)}\big)^{1/2}\ldots\sum_{k=1}^{n}D_xf_k.$$
Proof. It suffices to take into account Proposition 1.2.6. q.e.d.
Note that this Corollary and Theorem 1.2.7 give the striking double inequality
(1.2.22) …
This double inequality implies
Proposition 1.2.15. Suppose that (1.2.20) is verified with $n_0=1$. Then the series $\sum_{n\in\mathbb N^*}(f_n-E_xf_n)$ converges in quadratic mean iff the series $\sum_{n\in\mathbb N^*}D_xf_n$ converges.
1.2.3.2.2. Further, we can prove
Theorem 1.2.16. If the random variables $f_k$, $1\le k\le n$, are bounded, then …
Proof. The proof follows from (1.1.2), Proposition 1.2.5 and Theorem 1.1.11. q.e.d.
Corollary. If the random variables $f_k$, $1\le k\le n$, are bounded and $\alpha^{(n)}>0$, then …
Proof. It suffices to take into account Proposition 1.2.6. q.e.d.
Notes and comments
An inequality of the type of Theorem 1.2.14 was given by DOBRUSIN (1956). It is unknown whether the constants ½ and 8 in (1.2.22) can be improved.
1.2.3.3. Convergence of series
1.2.3.3.1. We have
Theorem 1.2.17. Suppose that (1.2.20) is verified with $n_0=1$. Then the series $\sum_{n\in\mathbb N^*}f_n$ converges in $P_x$-probability iff it converges $P_x$-a.s. If (1.2.20) is verified with $n_0>1$, then the series converges absolutely in $P_x$-probability iff it converges absolutely $P_x$-a.s.
Proof. The proof follows from (1.1.2), Proposition 1.2.5, Theorem 1.1.12 and its Corollary. q.e.d.
Theorem 1.2.18. Suppose that (1.2.20) is satisfied. If the random variables $f_n$, $n\in\mathbb N^*$, have finite variances and $\sum_{n\in\mathbb N^*}D_xf_n$ …
… $a)$, $n\in\mathbb N^*$, are convergent for at least one $a>0$.
Proof. It suffices to notice that this Corollary corresponds to Corollary 4 of Theorem 1.1.14. q.e.d.
Theorem 1.2.19. Suppose that (1.2.20) is verified with $n_0=1$. If the random variables $f_n$, $n\in\mathbb N^*$, are uniformly bounded and the series $\sum_{n\in\mathbb N^*}f_n$ converges $P_x$-a.s., then the series $\sum_{n\in\mathbb N^*}D_xf_n$ and $\sum_{n\in\mathbb N^*}E_xf_n$ are convergent.
Proof. Suppose on the contrary that the series $\sum_{n\in\mathbb N^*}D_xf_n$ is divergent. Since without any loss of generality we may suppose that $\inf_{k\in\mathbb N^*}\alpha({}^kP)\ge\alpha>0$, Theorem 1.2.7 implies that
$$\lim_{n\to\infty}D_xS_n=\infty.$$
Now, we shall use a consequence of a result by STATULEVIČIUS (1962). If $|f_k|\le M$, $k\in\mathbb N^*$, there exist two constants $H>0$, $0<$ …
… for at least one $a>0$ (therefore for all $a>0$).
Proof. The proof follows from Theorem 1.2.18 and Theorem 1.2.19 as in the case of independent random variables [see, e.g., LOÈVE (1963, p. 237)]. q.e.d.
Notes and comments
Theorem 1.2.17 is due to UENO (1957), Theorem 1.2.18 to COHN (1965c). Theorems 1.2.19 and 1.2.20 are due to IOSIFESCU (1967b).
1.2.3.4. The strong law of large numbers
1.2.3.4.1. We have
Theorem 1.2.21. Suppose that (1.2.20) is verified. If for a sequence … then $\frac{1}{a_n}\sum_{k=1}^{n}(f_k-E_xf_k)$ converges $P_x$-a.s. to 0 as $n\to\infty$.
Proof. The proof follows from Theorem 1.2.18 in the same way as Theorem 1.1.15 followed from Theorem 1.1.14. q.e.d.
Now, the analogue of Theorem 1.1.17.
Theorem 1.2.22. Suppose that (1.2.20) is verified and let the random variables $f_n$, $n\in\mathbb N^*$, be identically distributed. Under these conditions $\frac1n\sum_{i=1}^{n}f_i$ converges $P_x$-a.s. to a constant $a$ as $n\to\infty$ iff $E_x|f_1|<\infty$; in this case $a=E_xf_1$.
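Theorem 1.2.22 can be illustrated (though of course not proved) by a finite simulation. The following Python sketch is an added illustration, not part of the source. It takes a hypothetical two-state homogeneous chain with transition matrix $P=\begin{pmatrix}1-p&p\\q&1-q\end{pmatrix}$, $0<p,q<1$, so that $\alpha(P)>0$. The chain starts from its stationary law, so the $f_n=g(\xi_n)$ are identically distributed, and the sketch returns the empirical mean $\frac1n\sum_{i=1}^nf_i$.

```python
import random


def stationary_two_state(p, q):
    """Stationary law of the hypothetical chain P = [[1-p, p], [q, 1-q]]."""
    return (q / (p + q), p / (p + q))


def ergodic_average(p, q, g, n, seed=0):
    """Simulate xi_1, ..., xi_n started from the stationary law and
    return (1/n) * sum_k g(xi_k)."""
    rng = random.Random(seed)
    pi = stationary_two_state(p, q)
    # xi_1 ~ pi, so that the f_k = g(xi_k) are identically distributed.
    state = 0 if rng.random() < pi[0] else 1
    total = 0.0
    for _ in range(n):
        total += g[state]
        jump = p if state == 0 else q      # probability of leaving the current state
        if rng.random() < jump:
            state = 1 - state
    return total / n


if __name__ == "__main__":
    print(ergodic_average(0.3, 0.2, (0.0, 1.0), 200_000))   # E f_1 = 0.6 here
```

With p = 0.3, q = 0.2 and g = (0, 1), the stationary law is (0.4, 0.6), so $E f_1=0.6$. The printed average should be close to this value.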
Set … and suppose that …
Next, we have
Theorem 1.2.23. If
$$\sum_{n\in\mathbb N^*}(2^n\alpha_n)^{-2}\sum_{i=2^n}^{2^{n+1}-1}D_xf_i<\infty,\qquad(1.2.23)$$
then the random variable $\frac1n\sum_{k=1}^{n}(f_k-E_xf_k)$ converges $P_x$-a.s. to 0 as $n\to\infty$.
Proof. We shall verify conditions (1.1.5) and (1.1.6) of Theorem 1.1.18. It is easy to see that $1-\eta_n\ge\alpha_n$, …
Thus
…
and in virtue of (1.2.23) condition (1.1.5) will be satisfied. Then,
…
… if $n\ge j$.
We shall prove that for every $j\in\mathbb N^*$
$$\lim_{n\to\infty}\frac1n\sum_{k=1}^{n}f_k^{(j)}=0\quad P_x\text{-a.s.}\qquad(1.2.24)$$
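We use the following result [LOÈVE (1963, p. 387)]. Let $(Y_n)_{n\in\mathbb N^*}$ be a sequence of random variables such that $E(Y_n\mid Y_{n-1},\ldots,Y_1)=0$ and $EY_n^2\le M<\infty$ for all $n\in\mathbb N^*$. Then
$$\lim_{n\to\infty}\frac1n\sum_{k=1}^{n}Y_k=0\quad\text{a.s.}$$
To apply this, note that for $n>j$
$$E_x\big(f_n^{(j)}\mid f_{n-1}^{(j)},\ldots,f_1^{(j)}\big)=E_x\Big(E_x\big(f_n^{(j)}\mid\xi_{n-j},\xi_{n-j-1},\ldots,\xi_1\big)\,\Big|\,f_{n-1}^{(j)},\ldots,f_1^{(j)}\Big),$$
and that, since the $\xi_k$'s form a Markov chain,
$$E_x\big(f_n^{(j)}\mid\xi_{n-j},\ldots,\xi_1\big)=E_x\big(f_n^{(j)}\mid\xi_{n-j}\big)=0.$$
Further, $E_x\big(f_n^{(j)}\big)^2\le2(\sup|f|)^2$; thus (1.2.24) holds. To complete the proof of Theorem 1.2.24 write
$$f(\xi_n)-E_x\big(f(\xi_n)\mid\xi_{n-j}\big)=f_n^{(1)}+\cdots+f_n^{(j)},\qquad n>j.$$
Thus, by (1.2.24),
$$\lim_{n\to\infty}\frac1n\sum_{k=j+1}^{n}\Big[f(\xi_k)-E_x\big(f(\xi_k)\mid\xi_{k-j}\big)\Big]=0\quad P_x\text{-a.s.}$$
Or, neglecting at most $j$ terms,
$$\lim_{n\to\infty}\Big[\frac1n\sum_{k=1}^{n}f(\xi_k)-\frac1n\sum_{k=1}^{n}E_x\big(f(\xi_{j+k})\mid\xi_k\big)\Big]=0\quad P_x\text{-a.s.},$$
so that, for fixed $u\in\mathbb N^*$,
$$\lim_{n\to\infty}\Big[\frac1n\sum_{k=1}^{n}f(\xi_k)-\frac1u\sum_{j=1}^{u}\frac1n\sum_{k=1}^{n}E_x\big(f(\xi_{j+k})\mid\xi_k\big)\Big]=0\quad P_x\text{-a.s.}$$
By the uniform convergence of $U_nf$ to $E_\pi f$, for any $\varepsilon>0$ we may choose $u\in\mathbb N^*$ such that … For such a $u$ we have …, which completes the proof of the theorem. q.e.d.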
1. A study of random sequences via the dependence coefficient Notes and comments
Theorems 1.2.21 and 1.2.22 are due to COHN (1965 a, 1965 c). Theorem 1.2.23 is due to IOSIFESCU (1967 b). We note that similar results given by ROSENBLATT-ROTH (1964) are doubtful since they are based on an erroneous proof [IOSIFESCU (1967 d)]. Theorem 1.2.24 is due to BREIMAN (1960). A class of Markov chains admitting invariant probabilities, closely related to some stochastic models for learning, has been studied by DUBINS and FREEDMAN (1966).
1.2.3.5. The central limit theorem and the law of the iterated logarithm
1.2.3.5.1. Let us approach the homogeneous case, i.e. ${}^{n}P=P$, $n\in\mathbb{N}^*$. In this special case conditions such as (1.2.20), which appear in most of the preceding theorems, amount to
$$\alpha(P^{n_0})\ge\alpha>0.$$
Thus, the restatement of these theorems is obvious. Let us pass to the theorems from Subsections 1.1.4 and 1.1.5. According to Theorem 2.1.35, if $\alpha(P^{n_0})\ge\alpha$ for some $n_0\in\mathbb{N}^*$, then there exists a probability $\pi$ on $\mathscr{X}$ such that
$$|P^n(x;A)-\pi(A)|\le(1-\alpha)^{n/n_0-1}$$
for all $n\in\mathbb{N}^*$, $x\in X$, $A\in\mathscr{X}$. It is easy to see that $\pi$ is a stationary absolute probability distribution. This means that the sequence $(f_n)_{n\in\mathbb{N}^*}$ will be strictly stationary if the initial probability distribution of the Markov chain is $\pi$ (instead of the one concentrated at $x\in X$ as in Subparagraph 1.2.1.3.1). Theorem 1.1.22 leads to

Theorem 1.2.25. If $E_\pi|f_1|^{2+\delta}<\infty$, $\sigma_\pi\neq0$, then there exist two positive constants $v<1$ and $K$ such that
for any $a\in\mathbb{R}$, $n\in\mathbb{N}^*$.¹⁾
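The geometric convergence to $\pi$ can be checked on a finite example. The sketch assumes that $\alpha(P)$ is Dobrushin's coefficient, $\min_{i,j}\sum_k\min(p_{ik},p_{jk})=1-\sup|P(x;A)-P(y;A)|$. That identification with the text's $\alpha$ is our assumption. Using a hypothetical $3\times3$ matrix, the sketch verifies the classical bound $\sup_A|P^n(x;A)-\pi(A)|\le(1-\alpha)^n$ for $n_0=1$. This is at least as strong as the geometric bound quoted before Theorem 1.2.25.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def ergodicity_coefficient(P):
    """alpha(P) = min over pairs of rows (i, j) of sum_k min(P[i][k], P[j][k])."""
    n = len(P)
    return min(sum(min(P[i][k], P[j][k]) for k in range(n))
               for i in range(n) for j in range(n))

def stationary(P, iters=1000):
    """Stationary distribution by power iteration from the uniform law."""
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iters):
        mu = [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]
    return mu

def sup_deviation(Pn, pi):
    """max over x and A of |P^n(x; A) - pi(A)| = max_x (1/2) * sum_k |P^n(x, k) - pi_k|."""
    return max(0.5 * sum(abs(a - b) for a, b in zip(row, pi)) for row in Pn)

P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
alpha = ergodicity_coefficient(P)      # 0.7 for this matrix
pi = stationary(P)
Pn = P
for n in range(1, 11):
    print(n, sup_deviation(Pn, pi), (1 - alpha) ** n)
    Pn = mat_mul(Pn, P)
```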
>
The notations E1t,)EB}. Notes and comments
The associated Markov chain was discussed for the first time by ONICESCU and MIHOC (1936 a) and afterwards by FORTET (1938), IOSIFESCU and THEODORESCU (1961-1962) etc. Further, several authors used the associated chain mainly in applied problems.

Complements
1. An important problem is the investigation of the coefficients of ergodicity cx(rQ"). We do not know yet a nontrivial general result in this direction. It is clear that if, for example, there exists BE "If" and w ,W EW such that ur(w';x)I = 0 n--+oo
uniformly with respect to lEN*, w', w"E W, A< 0 EX(l); thus lim (Pi(A< 1>-fi(A< 0 ))=0 n--+ oo
uniformly with respect to ZEN*, A);A)I the supremum being taken over all t', t" ET, w', w'' E W, Aefl', then
xE Yn(A 0 ),
neN*
We note that (j'') implies that the sequence $(a_n'')_{n\in\mathbb{N}^*}$ is nonincreasing. Consider also a weaker condition than Condition FLS''$(A_0,v)$.

Condition FLS'$(A_0,v)$. Let $A_0\in\mathscr{X}^{(v)}$ and $v\in\mathbb{N}^*$; (j') there exists $\gamma>0$ such that ${}^{t}P_v(w;A_0)\ge\gamma$ for all $w\in W$, $t\in T$; (j'j') if we set
$$a_n'=\sup\left|{}^{t+r}P_v(u_r(w';x);A)-{}^{t+r}P_v(u_r(w'';x);A)\right|,$$
the supremum being taken over all $t\in T$, $w',w''\in W$, $x\in Y_n(A_0)$, $A\in\mathscr{X}^{(v)}$, then
$$\sum_{n\in\mathbb{N}^*}a_n'<\infty.$$
We have
Proposition 2.1.13. If Condition FLS'(A 0 , v) is verified, then
O, w' , w"EW., fior all lEN* , tET, xEYn(A 0 ), w',w"EW, A< 0 Ef!l"0>. It follows that for l ~ v Proposition 2.1.13 is proved. If l> v, we have t+rpl(w;AO>)
=
J t+rP,,(w;dxM) J x)-Pi0 (w"; A)I ~ 1-b
fior all [EN* ' w', w" E W., AEf!l'0 such that for every leN*, tET and every partition A~>uA~>=x,A~1>Ef!l'(l), i=l,2, we have either 'Pi 0 (w; AVl);?;c> for all WE W, or tpi0 (w; A~>);?;b for all wEW. For homogeneous random systems with complete connections, the above conditions reduce to Condition F(n 0 ). Let n0 eN*; there exists b>O such that for every lEN* and every partition A~> u A~>=x< 0 , A1°Ef!l'< 0, i= 1,2, we have either Pf 0 (w; A~l);?;b for all wEW, or Pi 0 ( w; A~>);?; )_tp;o(w"; A~>)I = 1rp;o(w'; A~>)_rp;o(w"; A~>)[, valid for every partition A~> uA~>=x0 >, Ai1>E~0 >, i=l,2. Now suppose that M'(n 0 ) is verified. Put
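$$f_l(A^{(l)})=\inf_{w\in W}P_l^{n_0}(w;A^{(l)}).$$
We have
$$\sup_{w\in W}P_l^{n_0}(w;A^{(l)})=\sup_{w\in W}\left[1-P_l^{n_0}(w;X^{(l)}-A^{(l)})\right]=1-f_l(X^{(l)}-A^{(l)}).$$
Condition M'$(n_0)$ implies
$$1-f_l(X^{(l)}-A^{(l)})-f_l(A^{(l)})\le1-\delta,$$
that is
$$f_l(A^{(l)})+f_l(X^{(l)}-A^{(l)})\ge\delta.$$
Therefore Condition F'$(n_0)$ will be verified, $\delta$ being replaced by $\delta/2$.
q.e.d.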
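Consider a finite homogeneous Markov chain, with $W=X$, $l=1$, and $P$ standing for the $n_0$-step matrix. Under these assumptions both conditions can be computed by brute force over all subsets of states. In the sketch below, `delta_M` is the largest $\delta$ allowed by the Markov-type condition $\sup|P(w';A)-P(w'';A)|\le1-\delta$. `delta_F` is the largest $\delta$ such that every partition $A_1\cup A_2$ has $\inf_wP(w;A_1)\ge\delta$ or $\inf_wP(w;A_2)\ge\delta$. The argument above guarantees $\delta_F\ge\delta_M/2$. The function names and the matrices are hypothetical.

```python
from itertools import product

def mass(row, mask, inside=True):
    """P(w; A) for the set A encoded by a 0/1 mask (or of its complement if inside=False)."""
    return sum(p for p, m in zip(row, mask) if bool(m) == inside)

def delta_M(P):
    """Largest delta with |P(w'; A) - P(w''; A)| <= 1 - delta for all w', w'', A."""
    worst = 0.0
    for mask in product((0, 1), repeat=len(P[0])):
        vals = [mass(row, mask) for row in P]
        worst = max(worst, max(vals) - min(vals))
    return 1.0 - worst

def delta_F(P):
    """Largest delta such that, for every partition A1 u A2 of the states,
    either min_w P(w; A1) >= delta or min_w P(w; A2) >= delta."""
    best = 1.0
    for mask in product((0, 1), repeat=len(P[0])):
        f1 = min(mass(row, mask, True) for row in P)
        f2 = min(mass(row, mask, False) for row in P)
        best = min(best, max(f1, f2))
    return best

P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
print(delta_M(P), delta_F(P))   # delta_F >= delta_M / 2, as in the proof above
```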
Notes and comments
The above approach follows IOSIFESCU (1963 e, 1965 e, 1966 b). Lemma 2.1.7 is essentially due to LAMPERTI and SUPPES (1959). Condition FLS$(A_0,v)$ was used in special cases by FORTET (1938) and LAMPERTI and SUPPES (1959). Conditions M$(n_0)$ and F$(n_0)$ are analogous to classical conditions used by MARKOV and FRÉCHET to investigate ergodic properties of Markov chains.

Complements
One may imagine also other types of uniform ergodicity, as well as one may define different types of uniform semi-ergodicity. We come, for example, to the uniform semi-ergodicity of Cesaro type if for any lEN* and wE W there exists a probability P?(w: •) on grm such that 1 lim n--+ex>
n
L" P{(w; A0>)=P?(w; A 0, it follows tp~+i(w;A)~oc'tp(wo;A) -
11.'
4
¹⁾ We notice that Condition $K_1'(A_0,k)$ implies (j') of Condition FLS'$(A_0,k+1)$ with $\gamma=\alpha$.
2. Random systems with complete connections
for all $t\in T$, $l\in\mathbb{N}^*$, $w\in W$, $A^{(l)}\in\mathscr{X}^{(l)}$ and a fixed $w_0\in W$. We have either ${}^{t}P_l(w_0;A^{(l)})\ge\frac12$ or ${}^{t}P_l(w_0;X^{(l)}-A^{(l)})\ge\frac12$; thus Condition F'$(k+1)$ is verified with $\delta=\alpha'/4$.
q.e.d.
Further, let us consider
Condition K~(A 0 ,k,µ,r). Let A 0 efl', keN and µ,reN*; there exist a family ('p,),eT of probabilities on Aon [r0 such that 'P:(w; A n A 0)~ cc,. 'p,.(Ae[r)I ~ -
Y
L a1
(2.1.10)
j~r
for all l>r, tET, w',w"EW, A< 0 e~< 0 . On the other hand we have for l> r and an arbitrary we W
'P 1+.u-i (w';x)~
J
~
XUl-l)xA
X
'Pr+µ- 1(w' ;dx(r+µ-1))
J
t+r+µ-1 Pz_,(u,(w';xE~m, where pis the greatest nonnegative integer such that Pov< n0 • Proof. The proof follows from Theorems 2.1.12 and 2.1.15.
q.e.d.
2.1.2.3.2. We shall now give sufficient conditions for uniform ergodicity for homogeneous random systems with complete connections corresponding to those considered in Subparagraph 2.1.2.2.2. Let us set {u{w;x):wEW,.xEX(fd} if k>O, "'1c = { w 1'f k =. 0 Condition K 1 (A 0 ,k). Let A 0 E~ and kEN; there exists a probability p on A 0 n~ and a number O such that P(w;A n A 0 );;::::E Yn(A 0 ), w', w"E W, l/lEBL,Ao(W, 'if'} Proof. We have
$$(U^h\psi)(w)=\int_XP_h(w;dx)\,\psi(u(w;x)).$$
As a consequence of the inequality (2.1.6) and Lemma 1.2.1 we obtain l(Uh t/J)(u(w'; x))I ~ (v- l)an oscl/1 + In
(2.1.11)
for all h)-P1 (A< 0 )1 ~ 1
3I P-
~
inf
[
k
(
=0
r) (1-yr-k+· 6v k
pov) =t flj(x; A< 0)
for every w=xEW, tET, l, nEN*, A(l>Eq-< 0 . The relation (2.1.1) implies t flj(x;AOl)
=
J m-k+ t
X
7*
1 (x;dy(k))r+r n;-'(y(kl; A(ll)
(2.1.12)
for $k\le r<n$. This relation generalizes the well-known Chapman-Kolmogorov equation.

2.1.2.5.2. Uniform ergodicity in the strong sense for the considered Markov chain comes to the existence of the probabilities $\Pi_l^\infty$ on $\mathscr{X}^{(l)}$ for every $l\in\mathbb{N}^*$ such that lim t llf (xe!£ 0 such that for every partition A\k>u A~k>=x, At> efif, i= 1,2, we have either or
Further, we consider Condition Mk(n 0 ). Let n0 eN*; there exists IJ>O such that l''n:o(x;A(k))-'"n:o(y(k);A(k))I ~ 1-/J
for every t', t" E T, x, y E x, A e !£,A('))_t"+nnNx E x;dz;A)I
~ k+no-1~r~n-1 inf
L a'!
)
1
i~r c)
+ (1-b{!,--
1
Proof The proof follows from (2.1.17), by making use of Lemma 1.2.1 and taking into account (2.1.14) and (2.1.16). By setting l]7(A 0')
=
11f(A0') =
inf
'llf(x; A 0>)
reT,xO•>eXCk)
suo
'llf(x;A< 1>),
teT,x(k1eXCkl
we obtain J17(A< 0 )-Uf(A 0 >)~(1-c5)[lli-'(A< 1>)-Ui-'(A 0 ')]
+
La~' j3r
for k + n0 - 1 ~ r < n and hence the existence of n~ and the domination in the above statement are easily deduced. q.e.d. 2.1.2.5.3. Let us pass to uniform ergodicity in the weak sense. The uniform ergodicity in the weak sense for the considered Markov chain comes to n---+ x,
uniformly with respect to $t\in T$, $x,y\in X$, $l\in\mathbb{N}^*$, $A\in\mathscr{X}^{(l)}$. Obviously, Condition FLS'$(X,1)$ is always verified.

Condition $F_k'(n_0)$. Let $n_0\in\mathbb{N}^*$; there exists $\delta>0$ such that for every partition $A_1^{(k)}\cup A_2^{(k)}=X^{(k)}$, $A_i^{(k)}\in\mathscr{X}^{(k)}$, $i=1,2$, and every $t\in T$ we have either
or
Condition M~(n 0 ). Let n0 EN*; there exists b>O such that ltfl~O(x(k);A(k))-'n:o(y;A(k))I ~ 1-c5
for all tE T.
2.1. Ergodicity
Proposition 2.1.33. Conditions $F_k'(n_0)$ and $M_k'(n_0)$ are equivalent.
Proof. The proof follows as in Proposition 2.1.14.
q.e.d.
Further, we have
Proposition 2.1.34. Condition $M_k'(n_0)$ implies Condition $M'(n_0+k)$.
Proof Indeed, by (2.1.12) we may write t II?+k(x;A)J r +no+k-1 llz(Z(k); A(I)),
XII, x,yEx), x, A~' 0 'E~(Lo>, i = 1,2, we have either
or
(Equivalently,
for every w', w" e W,
A(lol e~
i.e. real- or complex-valued.
consequently, we shall use the notations CL( W) = CL1 (W), m(t/t)=m(t/t; 1). First, we note that CL(W) C C(W). Indeed, for a fixed but arbitrary w0 E W, we have !t/t(w)-t/t(w0 )1 s u p - - - - - = aO; there exists an integer j(e) such that m(I/Jn)~m+E,
$j\ge j(\varepsilon)$; if $w',w''\in W$, $w'\neq w''$, then
$$\frac{|\psi(w')-\psi(w'')|}{d(w',w'')}=\lim_{j\to\infty}\frac{|\psi_{n_j}(w')-\psi_{n_j}(w'')|}{d(w',w'')}.$$
It follows that
$$\frac{|\psi(w')-\psi(w'')|}{d(w',w'')}\le m+\varepsilon,$$
and since $\varepsilon$ is arbitrary, we get (2.1.24). Let us come back to the completeness of $CL(W)$. From
$$\lim_{n,p\to\infty}\|\psi_n-\psi_p\|=0$$
we deduce
$$\lim_{n,p\to\infty}|\psi_n-\psi_p|=0$$
and (2.1.25). By (2.1.25) there exists a $\psi\in C(W)$ such that
$$\lim_{n\to\infty}|\psi_n-\psi|=0;$$
it remains to show that $\psi\in CL(W)$ and
$$\lim_{n\to\infty}\|\psi_n-\psi\|=0.$$
Let e > 0 and n(e) an integer such that m(t/Jr-t/Jn) n(e) being arbitrarily chosen, we have also lim m(t/Jr-t/1)=0,
r ➔ oo
so that the completeness of $CL(W)$ is proved. We notice also that $CL(W)$ is dense in $C(W)$. Further, we remark that the above considerations imply the validity of (ITM) for $\mathfrak{X}=C(W)$ and $\mathfrak{D}=CL(W)$. Thus, we have succeeded in isolating a special class of operators, namely the class $B(CL(W),C(W))$. For this class the following specialization of Theorem 2.1.40 is valid.

Theorem 2.1.52. Let $U$ be a linear operator on $CL(W)$ such that
(i) $|U|_{CL(W)}\le1$;
(ii) $U\in B(CL(W))$;
(iii) for some positive integer $k$ and positive constants $q<1$ and $Q$

Then
(a) there exist at most a finite number of eigenvalues $\lambda_i$, $1\le i\le p$, of modulus 1 of $U$, each of them of finite multiplicity, such that
$$U^n=\sum_{i=1}^{p}\lambda_i^nU_i+V^n,\quad n\in\mathbb{N}^*,$$
where $U_i\in B(CL(W))$, $1\le i\le p$;
(b) $V\in B(CL(W))$ and $\mathfrak{N}_{\lambda_i}=U_i(CL(W))$, $1\le i\le p$; $U_i^2=U_i$, $U_iU_j=0$ for $i\neq j$, $U_iV=VU_i=0$, $1\le i,j\le p$;
(c) there exist two positive constants $M$ and $h$ such that
$$\|V^n\|\le\frac{M}{(1+h)^n},\quad n\in\mathbb{N}^*.$$
Proof. Obvious.
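The seminorm $m(\psi)$ and the space $CL(W)$ in Theorem 2.1.52 can be explored numerically when $W$ is an interval of the real line with $d(w',w'')=|w'-w''|$. The sketch computes grid versions of $m(\psi)$ and of the norm $\|\psi\|=|\psi|+m(\psi)$. That form of the norm is our assumption; it is the usual choice in the Ionescu Tulcea-Marinescu setting. The grid, the test functions and the helper names are hypothetical.

```python
def lipschitz_seminorm_on_grid(psi, points):
    """Discrete version of m(psi) = sup_{w' != w''} |psi(w') - psi(w'')| / d(w', w'')
    for points on the real line with d(w', w'') = |w' - w''|."""
    pts = sorted(set(points))
    vals = [psi(w) for w in pts]
    # On a line every difference quotient is a weighted average of the quotients
    # of neighbouring points, so scanning adjacent pairs is enough.
    return max(abs(vals[i + 1] - vals[i]) / (pts[i + 1] - pts[i])
               for i in range(len(pts) - 1))

def cl_norm_on_grid(psi, points):
    """Assumed norm ||psi|| = |psi| + m(psi): sup norm plus Lipschitz seminorm."""
    return max(abs(psi(w)) for w in points) + lipschitz_seminorm_on_grid(psi, points)

grid = [i / 1000 for i in range(1001)]
print(lipschitz_seminorm_on_grid(lambda w: w * w, grid))         # about 1.999
print(lipschitz_seminorm_on_grid(lambda w: abs(w - 0.5), grid))  # about 1.0
print(cl_norm_on_grid(lambda w: w * w, grid))                    # about 2.999
```

In a general compact metric space all pairs of points would have to be examined. The adjacent-pair shortcut is specific to subsets of the line.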
2.1.3.2.4. Let us assume now that the elements of the homogeneous random system with complete connections $\{(W,\mathscr{W}),(X,\mathscr{X}),u,P\}$ satisfy
(I) $W$ is a compact metric space with respect to some distance $d$ and $\mathscr{W}$ is the class of Borel sets in $W$;
(II) $m(P(\cdot;A))\le a<\infty$ for all $A\in\mathscr{X}$;¹⁾
(III) $\mu(u(\cdot;x))\le1$ for all $x\in X$, where
$$\mu(u(\cdot;x))=\sup_{w'\neq w''}\frac{d(u(w';x),u(w'';x))}{d(w',w'')};$$
(IV) there exists $n_0\in\mathbb{N}^*$ and $A^{(n_0)}\in\mathscr{X}^{(n_0)}$ such that
A\;'ol c {x l
}
2
is either 1 or 0 provided that y¾. A similar result may be obtained for the more general expression
$$f_{n+1}(w)=\frac{1}{n+1}\sum_{j=0}^{n}\psi\left(a^{l_{j+1}}w\right),$$
where $(l_n)_{n\in\mathbb{N}^*}$ is an increasing sequence of nonnegative integers [FORTET (1940 a, b); for the special case $l_n=n(n-1)/2$, see KAC (1938)]. These results were obtained by making use of the linear operator
$$(U\psi)(w)=\frac1a\sum_{i=0}^{a-1}\psi\left(\frac{w+i}{a}\right).$$
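Iterating $U$ gives $U^n\psi(w)=a^{-n}\sum_{i=0}^{a^n-1}\psi\big((w+i)/a^n\big)$, a Riemann sum. Hence, for $\psi$ Lipschitz on $[0,1]$, $U^n\psi$ converges uniformly to $\int_0^1\psi(t)\,dt$ with error at most $m(\psi)a^{-n}$. This is a standard fact about $U$, the transfer operator of $w\mapsto aw \pmod 1$, and is not a statement taken from the text. The sketch checks it for the hypothetical choice $\psi(w)=w^2$.

```python
def apply_U(psi, a):
    """Return the function (U psi)(w) = (1/a) * sum_{i=0}^{a-1} psi((w + i) / a)."""
    def U_psi(w):
        return sum(psi((w + i) / a) for i in range(a)) / a
    return U_psi

def iterate_U(psi, a, n):
    """U^n psi; one evaluation costs a**n calls of psi, so keep a**n moderate."""
    for _ in range(n):
        psi = apply_U(psi, a)
    return psi

psi = lambda w: w * w            # Lipschitz on [0, 1] with m(psi) = 2
U10 = iterate_U(psi, 2, 10)
for w in (0.0, 0.25, 0.5, 1.0):
    print(w, U10(w))             # all within 2 / 2**10 of the integral 1/3
```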
2.1.3.3. Application to the associated Markov system

2.1.3.3.1. We shall continue to restrict ourselves to the homogeneous case. A basic question is the asymptotic behaviour of $Q^n(w;\cdot)$ as $n\to\infty$. It is necessary to make precise what is meant by "convergence" of a sequence $(p_n)_{n\in\mathbb{N}^*}$ of probabilities on $\mathscr{W}$ to a probability $p$ on $\mathscr{W}$. The appropriate notion is this: $p_n$ converges to $p$ if for any Borel set $B$ in $W$
$$p(\mathring{B})\le\liminf_{n\to\infty}p_n(B)\le\limsup_{n\to\infty}p_n(B)\le p(\bar{B}),$$
where $\mathring{B}$ is the interior and $\bar{B}$ the closure of $B$. Equivalently,
$$\lim_{n\to\infty}\int_W\psi(w)\,p_n(dw)=\int_W\psi(w)\,p(dw)$$
for every $\psi\in C(W)$, i.e. the weak convergence. Further, if $(Q^n)_{n\in\mathbb{N}^*}$ is a sequence of transition probability functions and $Q^\infty$ is a transition probability function, we shall say that $Q^n$ converges uniformly to $Q^\infty$ if for any Borel set $B$ in $W$ and any $\varepsilon>0$ there exists an integer $n_\varepsilon$ such that
$$Q^\infty(w;\mathring{B})-\varepsilon\le Q^n(w;B)\le Q^\infty(w;\bar{B})+\varepsilon$$
for all $n\ge n_\varepsilon$, $w\in W$. We shall prove

Theorem 2.1.54. For any homogeneous random system with complete connections satisfying (I)-(IV), the transition probability function $\frac1n\sum_{j=0}^{n-1}Q^j$ converges uniformly as $n\to\infty$ to a transition probability function $Q^\infty$, and for any Borel set $B$ in $W$, $Q^\infty(\cdot;B)\in CL(W)$. There exists a positive constant $c$ such that
$$\left\|\frac1n\sum_{j=0}^{n-1}U^j\psi-U^\infty\psi\right\|\le\frac{c}{n}\|\psi\|\tag{2.1.28}$$
for all $n\in\mathbb{N}^*$, $\psi\in CL(W)$, where
$$(U^\infty\psi)(w)=\int_W\psi(w')\,Q^\infty(w;dw').$$
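Theorem 2.1.54 is stated for Cesàro averages rather than powers because $U$ may have eigenvalues of modulus 1 other than 1. A finite-state analogue, which is our illustration and not part of the text, is the periodic chain on two states. There $P^j$ alternates between $I$ and $P$, so $P^n$ has no limit. Yet $\frac1n\sum_{j<n}P^j$ converges to the matrix with all entries $1/2$, with error at most $1/n$, the same $c/n$ rate as in (2.1.28).

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def cesaro_average(P, n):
    """(1/n) * sum_{j=0}^{n-1} P^j for a square stochastic matrix P."""
    m = len(P)
    power = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]   # P^0 = I
    acc = [[0.0] * m for _ in range(m)]
    for _ in range(n):
        acc = [[acc[i][j] + power[i][j] for j in range(m)] for i in range(m)]
        power = mat_mul(power, P)
    return [[x / n for x in row] for row in acc]

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

P = [[0.0, 1.0], [1.0, 0.0]]        # periodic: eigenvalues 1 and -1
limit = [[0.5, 0.5], [0.5, 0.5]]    # limit of the Cesaro averages
for n in (1, 2, 3, 10, 11):
    print(n, max_abs_diff(cesaro_average(P, n), limit))   # 0 for even n, 1/(2n) for odd n
```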
Proof. We prove firstly that there exists a transition probability function $Q^\infty$ from $(W,\mathscr{W})$ to itself such that (2.1.28) holds. Theorem 2.1.53 implies that
$$\|U^n\|\le\sum_{i=1}^{p}\|U_i\|+\|V^n\|\to\sum_{i=1}^{p}\|U_i\|$$
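as $n\to\infty$. Therefore there exists a constant $D<\infty$ such that $\|U^n\|\le D$ for all $n\in\mathbb{N}$. Let now
$$\bar U_n=\frac1n\sum_{j=0}^{n-1}U^j,\quad n\in\mathbb{N}^*.$$
Then, by Theorem 2.1.53,
$$\bar U_n=\frac1n(I-U^n)+\frac1n\sum_{j=1}^{n}U^j=\frac1n(I-U^n)+\sum_{i=1}^{p}\Big(\frac1n\sum_{j=1}^{n}\lambda_i^j\Big)U_i+\frac1n\sum_{j=1}^{n}V^j.$$
Therefore (we suppose that $\lambda_1=1$)
$$\bar U_n-U_1=\frac1n(I-U^n)+\frac1n\sum_{i=2}^{p}\frac{\lambda_i(1-\lambda_i^n)}{1-\lambda_i}U_i+\frac1n\sum_{j=1}^{n}V^j,$$
so that
$$\|\bar U_n-U_1\|\le\frac{c}{n},$$
where
$$c=(1+D)+2\sum_{i=2}^{p}\frac{\|U_i\|}{|1-\lambda_i|}+\frac{M}{h}.$$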
Thus, for any $\psi\in CL(W)$,
$$\|\bar U_n\psi-U_1\psi\|\le\|\bar U_n-U_1\|\,\|\psi\|\le\frac{c\|\psi\|}{n}\tag{2.1.29}$$
and, a fortiori,
$$\lim_{n\to\infty}|\bar U_n\psi-U_1\psi|=0\tag{2.1.30}$$
for all $\psi\in CL(W)$. Since $CL(W)$ is dense in $C(W)$ and $|\bar U_n|=1$ for $n\in\mathbb{N}^*$, it follows that (2.1.30) holds for all $\psi\in C(W)$, where $U_1$ has been extended (uniquely) to a bounded linear operator on $C(W)$. Since the operators $\bar U_n$ on $C(W)$ are all positive and preserve constants, (2.1.30) implies that the same is true of $U_1$. Thus, for any $w\in W$, $\psi\mapsto(U_1\psi)(w)$ is a positive linear functional on $C(W)$ with $(U_1\mathbf{1})(w)=1$, where $\mathbf{1}(w)\equiv1$. Hence, by the Riesz representation theorem, there exists a (unique) probability $Q^\infty(w;\cdot)$ on $\mathscr{W}$ such that
$$(U_1\psi)(w)=\int_W\psi(w')\,Q^\infty(w;dw')\tag{2.1.31}$$
for all $\psi\in C(W)$. In view of (2.1.31), (2.1.29) reduces to (2.1.28). That $Q^\infty$ is a transition probability function follows from the fact that $Q^\infty(\cdot;B)\in CL(W)$ for every Borel set $B$. This is obviously true if $B=W$. Suppose that $B$ is an open set such that its complement $B^c$ is not empty. For $n\in\mathbb{N}^*$, define $\chi_n\in CL(W)$ by
$$\chi_n(w)=\begin{cases}1 & \text{if } d(w,B^c)\ge\dfrac1n,\\[4pt] n\,d(w,B^c) & \text{if } d(w,B^c)\le\dfrac1n.\end{cases}$$
Then
$$\lim_{n\to\infty}\chi_n=\chi_B$$
and the convergence is monotonic. Therefore
$$\lim_{n\to\infty}(U_1\chi_n)(w)=\int_W\chi_B(w')\,Q^\infty(w;dw')=Q^\infty(w;B)\tag{2.1.32}$$
for all $w\in W$. By Theorem 2.1.53, $\mathfrak{N}_1=U_1(CL(W))$ is a finite dimensional subspace of $CL(W)$. Hence there exists a constant $J<\infty$ such that
$$m(\psi)\le J|\psi|$$
for all $\psi\in\mathfrak{N}_1$. Therefore
$$|(U_1\chi_n)(w_1)-(U_1\chi_n)(w_2)|\le m(U_1\chi_n)\,d(w_1,w_2)\le J|U_1\chi_n|\,d(w_1,w_2)\le J\,d(w_1,w_2)$$
for all $n\in\mathbb{N}^*$ and $w_1,w_2\in W$. Letting $n\to\infty$, we have
$$|Q^\infty(w_1;B)-Q^\infty(w_2;B)|\le J\,d(w_1,w_2).$$
If $B$ is an arbitrary Borel set, $w_1,w_2\in W$ and $\varepsilon>0$, the regularity of $Q^\infty(w_i;\cdot)$ ensures the existence of an open set $B_{i,\varepsilon}$ such that $B_{i,\varepsilon}\supset B$ and
$$Q^\infty(w_i;B_{i,\varepsilon})-Q^\infty(w_i;B)\le\varepsilon,$$

¹⁾ $d(B,B')=\inf_{w\in B,\,w'\in B'}d(w,w')$, $d(w,B)=d(\{w\},B)$.

Iosifescu/Theodorescu, Random Processes
$i=1,2$. Combination of these inequalities with the result just obtained above yields
$$|Q^\infty(w_1;B)-Q^\infty(w_2;B)|\le J\,d(w_1,w_2)+2\varepsilon$$
or, since $\varepsilon$ is arbitrary,
$$|Q^\infty(w_1;B)-Q^\infty(w_2;B)|\le J\,d(w_1,w_2).$$
Thus $Q^\infty(\cdot;B)\in CL(W)$ with $m(Q^\infty(\cdot;B))\le J$ for all Borel sets $B$ in $W$. To complete the proof it remains only to prove that the transition probability function $\frac1n\sum_{j=0}^{n-1}Q^j$ converges uniformly to $Q^\infty$. Denote
$$\bar Q_n=\frac1n\sum_{j=0}^{n-1}Q^j,\quad n\in\mathbb{N}^*.$$
Since
it suffices to show that if $B$ is open, then
for all $w\in W$ if $n$ is sufficiently large, while if $B$ is closed
for all $w\in W$ if $n$ is sufficiently large. The statement concerning closed sets follows from that concerning open sets by taking complements, so only open sets need be considered. There is no loss in generality in assuming $B\neq W$. By the definition of the $\chi_j$'s, $\chi_B(w)\ge\chi_j(w)$ for all $w\in W$, so
$$\bar Q_n(w;B)\ge(\bar U_n\chi_j)(w)$$
= Q oo
We shall prove two lemmas.
Lemma 2.1.55. For any homogeneous random system with complete connections satisfying (I) and (V), the operator $U$ has no eigenvalues of modulus 1 other than 1.

Proof.¹⁾ Suppose $|\lambda|=1$, $\lambda\neq1$, $\psi\not\equiv0$ and $\psi\in\mathfrak{N}_\lambda$. Since $|\psi|\in C(W)$, there exists $w_0\in W$ such that
$$|\psi(w_0)|=\max_{w\in W}|\psi(w)|=|\psi|.$$
Clearly $\psi(w_0)\neq0$. Now $(U^n\psi)(w_0)=\lambda^n\psi(w_0)$. Let $B_n=\{w:\psi(w)=\lambda^n\psi(w_0)\}$ for $n\in\mathbb{N}^*$. We have $Q^n(w_0;B_n)=1$ for all $n\in\mathbb{N}^*$. For, note that
$$\operatorname{Re}\frac{\psi(w)}{\lambda^n\psi(w_0)}<1,\quad w\in B_n^c.\tag{2.1.33}$$
By
$$(U^n\psi)(w)=\int_WQ^n(w;dw')\,\psi(w')$$
(see 2.1.3'), taking $w=w_0$, we obtain
$$\lambda^n\psi(w_0)\,Q^n(w_0;B_n^c)=\int_{B_n^c}Q^n(w_0;dw')\,\psi(w').\tag{2.1.34}$$

¹⁾ Similar arguments were previously used by JAMISON (1964).
Taking into account (2.1.33), if $Q^n(w_0;B_n^c)\neq0$, (2.1.34) implies $Q^n(w_0;B_n^c)<Q^n(w_0;B_n^c)$. Consequently, one must have $Q^n(w_0;B_n)=1$ for all $n\in\mathbb{N}^*$. Since $Q(w_0;B_1)=1$, $B_1\neq\emptyset$. Let $w_1\in B_1$. Then $\psi(w_1)=\lambda\psi(w_0)$, $(U^n\psi)(w_1)=\lambda^{n+1}\psi(w_0)$, and $Q^n(w_1;B_{n+1})=1$ for $n\in\mathbb{N}^*$. But $|\lambda^{n+1}-\lambda^n|=|\lambda-1|>0$. Since $\psi$ is uniformly continuous there exists $\delta>0$ such that $d(w',w'')$
for all w' E W. Thus so
This formula and a simple induction on v imply
for v~O. Thus
But any positive integer l can be represented as l=vn 0 +j for some v~O and O~ji+q
1)-
Thus
(2.2.11)
~6M
m+n-2
2
[
q
_L
1-=m
t:;+(n-1)_
L
J>m+q
]
t:j-i-k+1 •
Finally, we have according to (2.2.7'), (2.2.10) and (2.2.11)
+12(n-1)
L
BJ-i-Ht
+4 L,
J>m+11
If a= 0, the condition
L, en< oo
Bn]·
neN•
implies
neN•
for fixed k, and it suffices to take q = [n 112 ]. If a> 0, the condition L n" £ 11 < oo implies e" = O(n- 1 neN•
[
")
and
1 ]
elementary computations show that one can take q= n - 1 +" leads to the announced rate of convergence.
which q.e.d.
We deduce immediately the following
Corollary. If $\Psi$ in (2.2.5) is bounded and
$$\sum_{n\in\mathbb{N}^*}\varepsilon_n<\infty,$$
then for every $m\in\mathbb{N}^*$, $w\in W$, $a>\frac12$, the random variable
converges to 0 in $P_w$-quadratic mean as $n\to\infty$.
Proof. The proof follows from Theorem 2.2.19.
q.e.d.
Notes and comments
Theorems 2.2.16 and 2.2.19 are due to IOSIFESCU (1965 e, 1967 b).
2.2. Asymptotic behaviour
Complements
Suppose that $\varepsilon_n=O\!\left(e^{-\lambda n^{1/2}}\right)$ for some $\lambda>0$. Then for all $m\in\mathbb{N}^*$, $w\in W$
as
n ➔ oo,
where Pn=O
1 mf . {q+l - - + e-lVq+l }) ( -n + qeN• n
[Cmcu (1957 a, 1957 b)].
2.2.2.4. The central limit theorem

2.2.2.4.1. In what follows we shall suppose that (2.2.3) holds. First we shall investigate the connections between the asymptotic behaviours of suitable normed sums associated with the sequence $(f_n)_{n\in\mathbb{N}^*}$ under the assumption that the probability on $(\Omega,\mathscr{K})$ is either $P_w$ or $P_\alpha$. Let $(a_n)_{n\in\mathbb{N}^*}$ be a sequence of real numbers and $(b_n)_{n\in\mathbb{N}^*}$ be a sequence of positive numbers such that
$$\lim_{n\to\infty}b_n=\infty.\tag{2.2.12}$$
Let us denote
$$F_w^{(m,n)}(a)=P_w\left(\frac{S_{m,n}-a_n}{b_n}<a\right),\quad n\in\mathbb{N}^*,$$
for a fixed pEA. It follows that (n takes on the (m+ lt values 1 ~jr~m+ 1, 1 ~r~n, with the probabilities 7th·· .1Jp). For the homogeneous case
t/lJ
1
iJp),
...
2.3.1.1.5. Uniform ergodicity reduces for homogeneous simple OM-chains to the existence of the limits
which do not depend on the initial probability distribution $p\in\Delta$ (or $\Delta'$), uniformly with respect to $l\in\mathbb{N}^*$, $p\in\Delta$ (or $\Delta'$), $1\le i_1,\dots,i_l\le m+1$.

2.3.1.1.6. The concept of OM-chain can be extended in the sense that one may consider chains of a given multiplicity $l$. We arrive at the so-called multiple OM-chains; without entering into further details, we note only that for such chains the transition mappings depend not only on the preceding state and the preceding probability distribution, but on $l$ preceding states and $l$ preceding probability distributions. In Subparagraph 2.3.1.5.2 we consider, for example, multiple OM-chains of second order in the case of an arbitrary set of states.

¹⁾ Here we identify the sequences $(x_{j_1},\dots,x_{j_n})$ and $(j_1,\dots,j_n)$, $1\le j_r\le m+1$, $1\le r\le n$.

Notes and comments
The notion of chain with complete connections was introduced in probability theory by ONICESCU and MIHOC (1935 a). The term OM-chain was used for the first time by FORTET (1938). The associated Markov chain was introduced by ONICESCU and MIHOC (1936 a) in order to study the asymptotic behaviour of the corresponding OM-chain [see also FORTET (1938), IOSIFESCU and THEODORESCU (1961-1962)]. The idea of defining OM-chains for a continuous parameter by making use of differential equations is to be found in ONICESCU (1954, 1956); for multiple OM-chains, see THEODORESCU (1956). The same idea, referred to generalized random processes, has been applied by MARINESCU (1960), where the Gateaux differential replaces the usual operation of differentiation.

Complements
1. Let us consider a finite set $X=\{x_1,\dots,x_{m+1}\}$ and a family of $(m+1)\times(m+1)$ stochastic matrices ${}^{t}\Psi=({}^{t}\psi_{ij})$, $t\in\mathbb{N}$. Further, let $(V,\mathscr{V})$ be a measurable space, called the parameter space, and ${}^{t}\tau$, $t\in\mathbb{N}$, a family of $\mathscr{V}$-measurable mappings of $V$ into itself. Let us assume that ${}^{t}\Psi$, $t\in\mathbb{N}$, depends on $v\in V$. The parameter $v$ on trial $t+1$, i.e. $v(t+1)$, is given by
$$v(t+1)={}^{t}\tau\,v(t),\quad t\in\mathbb{N};$$
it follows that
where
$${}^{t}T={}^{t}\tau\circ\cdots\circ{}^{0}\tau.$$
A system $\{V,X,({}^{t}\Psi)_{t\in\mathbb{N}},({}^{t}\tau)_{t\in\mathbb{N}}\}$ of spaces and mappings is an OM-chain in the wide sense [THEODORESCU (1968)]; obviously the OM-chain is a special case. If we consider a $k$-person OM-chain in the wide sense, that is $k$ OM-chains in the wide sense for which the stochastic matrices depend on $k$ parameters, we get as a special case the $k$-person Markov process. For the properties of these processes and their application in economics, see JACOBS (1958).
2. Let us consider a homogeneous simple OM-chain and let us set
where
$$\omega_{i,j_1\dots j_r}(p^n)=\varphi_{i,j_1}(p^n)\cdots\varphi_{j_{r-1},j_r}(p^{n+r-1}),\quad n\in\mathbb{N}^*,$$
provided that the state $x_i$ occurred on trial $n$. This is the $n$-step entropy of a path of length $r$, provided that the state $x_i$ occurred on trial $n$. The corresponding expected entropy is m+l
W,.(p)=
L
1tir•-dp)H'in,iP)
ik= 1 1 ~k~n
and we have r-1
W,.(p) =
I
H~+k(P),
k=O
H~+s(p) = H~(p) + H!+,(p).
If lim W,.(p)=H'00 (p),
reN*,
n-+ oo
then H'"oo(p) =r H~(p).
Further, a Shannon type theorem can be proved [IOSIFESCU and THEODORESCU (1961)]. This result extends to OM-chains those obtained by HINCIN (1953) for simple Markov chains and those obtained by AMBARCUMJAN (1958) for multiple Markov chains.

2.3.1.2. Examples
2.3.1.2.1. Let us consider an initial urn $U_0$ containing $a_j$ balls of colour $j$, $1\le j\le m+1$, and denote by $a_j^n$, $1\le j\le m+1$, the structure of the urn $U_n$ ($a_j^0=a_j$, $1\le j\le m+1$), given by the following rule: if the structure of the urn $U_n$ was $a_j^n$, $1\le j\le m+1$, and on trial $n$ a ball of colour $i$ was drawn, then the structure of the urn $U_{n+1}$ is
$$a_j^{n+1}=f_{i,j}^n(a_1^n,\dots,a_{m+1}^n),$$
where the functions $f_{i,j}^n$, $1\le i,j\le m+1$, $n\in\mathbb{N}$, take values in $\mathbb{N}$. The conditional probabilities of drawing a ball of colour $j$ on trial $n$ are
and on trial n + 1
(2.3.3)
Further, if we denote by $D_j^n$, $1\le j\le m+1$, the algebraic complements of the elements of the last row in the determinant
we obtain and therefore the conditional probabilities (2.3.3) become
$$p_j^{n+1}=\frac{f_{i,j}^n(\lambda_nD_1^n,\dots,\lambda_nD_{m+1}^n)}{\sum_{k=1}^{m+1}f_{i,k}^n(\lambda_nD_1^n,\dots,\lambda_nD_{m+1}^n)},\quad1\le j\le m+1.\tag{2.3.4}$$
Assuming now that the $f_{i,j}^n$'s, $1\le i,j\le m+1$, $n\in\mathbb{N}$, are homogeneous of the same order, we deduce from (2.3.4)
$$p_j^{n+1}=\frac{f_{i,j}^n(D_1^n,\dots,D_{m+1}^n)}{\sum_{k=1}^{m+1}f_{i,k}^n(D_1^n,\dots,D_{m+1}^n)}=\varphi_{i,j}^n(p_1^n,\dots,p_{m+1}^n),$$
n.k respectively.
2. Let us consider an initial urn containing $a_1$ balls of colour 1 and $a_2$ balls of colour 2, $a_1+a_2=M$. Let us assume that on trial $n$ a ball was drawn; then this ball is removed and a ball of the opposite colour is put into the urn in its place. We get a homogeneous simple alternate linear OM-chain. An equivalent model is the following. Let us consider the sequence of urns with the following compositions:
$$(0,M),\ (1,M-1),\ \dots,\ (M-1,1),\ (M,0).$$
Obviously, the initial urn is located somewhere in this sequence. If the drawn ball is of colour 1 (2) the next drawing is performed from the foregoing (following) urn. Thus we get a circular system of urns [ONICESCU and MIHOC (1936 b)]. If we denote by $v_n$ the number of balls of colour 1 drawn during the first $n$ trials, then the urn will have after $n$ drawings the composition $(a_1-2v_n+n,\ a_2+2v_n-n)$. But
$$0\le a_1-2v_n+n\le a_1+a_2,\qquad0\le a_2+2v_n-n\le a_1+a_2,$$
from which we get
$$-\frac{a_2}{2n}\le\frac{v_n}{n}-\frac12\le\frac{a_1}{2n},$$
that is
$$\lim_{n\to\infty}\frac{v_n}{n}=\frac12.$$
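The two-sided bound on $v_n/n$ can be checked by simulation. The sketch below assumes that the drawn ball is exchanged for a ball of the opposite colour, so that after $n$ draws the composition is $(a_1-2v_n+n,\ a_2+2v_n-n)$. The initial composition and the seed are hypothetical.

```python
import random

def circular_urn(a1, a2, n, seed=0):
    """Simulate n draws from the circular urn in which the drawn ball is exchanged
    for a ball of the opposite colour. Returns [v_1, ..., v_n], where v_k is the
    number of colour-1 balls drawn during the first k trials."""
    rng = random.Random(seed)
    M, c1 = a1 + a2, a1              # c1 = current number of colour-1 balls
    v, history = 0, []
    for _ in range(n):
        if rng.random() < c1 / M:    # colour 1 drawn: one colour-1 ball becomes colour 2
            v += 1
            c1 -= 1
        else:                        # colour 2 drawn: one colour-2 ball becomes colour 1
            c1 += 1
        history.append(v)
    return history

a1, a2 = 3, 7
hist = circular_urn(a1, a2, 1000)
for k, v in enumerate(hist, start=1):
    # -a2/(2k) <= v_k/k - 1/2 <= a1/(2k), from the composition (a1 - 2v + k, a2 + 2v - k)
    assert -a2 / (2 * k) - 1e-12 <= v / k - 0.5 <= a1 / (2 * k) + 1e-12
print(hist[-1] / 1000)               # close to 1/2
```

The bound is deterministic: it holds along every path, not only with high probability. This matches the remark that the convergence here is in the classical sense of analysis.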
This relation highlights an interesting example of the law of large numbers; in this case the convergence is in the classical sense of analysis [see also CANTELLI (1935)]. It follows that
$$\lim_{n\to\infty}\frac{P_1(p)+\cdots+P_n(p)}{n}=\frac12$$
for any $p=i/M$, $0\le i\le M$. The circular system of urns considered above has also other interesting properties. For instance, if $a_1=a_2=a=2k$, then
$$\lim_{n\to\infty}P_{1/2}(v_{2n}-n=x)=2^{1-2a}\binom{2a}{a-2|x|}$$
for $|x|\le k$, $x\in\mathbb{Z}$, and lim
P112(V2n+l
-n=x) = 21-2a (
n-+oo
for lxl ~k, xEZ, x#O.
2a
a-21xl+l
)
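The limit law at even times can be checked exactly for a small urn by iterating the law of the number $c$ of colour-1 balls. After $2n$ draws, $c=a-2(v_{2n}-n)$. The sketch below uses the hypothetical value $a=8$ ($k=4$) and compares the distribution after 400 draws with $2^{1-2a}\binom{2a}{a-2|x|}$. For large $a$ it also evaluates the Stirling-type Gaussian approximation $\frac{2}{\sqrt{a\pi}}e^{-4x^2/a}$. Only the even-time formula is treated.

```python
from math import comb, exp, pi, sqrt

def step(dist, M):
    """One draw of the circular urn acting on the law of the colour-1 count c (0..M)."""
    new = [0.0] * (M + 1)
    for c, p in enumerate(dist):
        if c > 0:
            new[c - 1] += p * c / M          # colour 1 drawn: c decreases
        if c < M:
            new[c + 1] += p * (M - c) / M    # colour 2 drawn: c increases
    return new

def law_even_time(a, n):
    """P_{1/2}(v_{2n} - n = x), |x| <= a/2, for a1 = a2 = a with a even."""
    M = 2 * a
    dist = [0.0] * (M + 1)
    dist[a] = 1.0
    for _ in range(2 * n):
        dist = step(dist, M)
    k = a // 2
    # after 2n draws the colour-1 count is c = a - 2 (v_{2n} - n)
    return {x: dist[a - 2 * x] for x in range(-k, k + 1)}

def limit_law(a, x):
    """2^{1-2a} * C(2a, a - 2|x|), the limit stated in the text."""
    return comb(2 * a, a - 2 * abs(x)) / 2 ** (2 * a - 1)

def gaussian_approx(a, x):
    """Stirling-type approximation 2 / sqrt(a * pi) * exp(-4 x^2 / a)."""
    return 2 / sqrt(a * pi) * exp(-4 * x * x / a)

a = 8
law = law_even_time(a, 200)
for x in sorted(law):
    print(x, law[x], limit_law(a, x))
```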
For sufficiently large $a$, by making use of the Stirling formula, we get
$$2^{1-2a}\binom{2a}{a-2|x|}\approx\frac{2}{\sqrt{a\pi}}\,e^{-4x^2/a},$$
i.e. the Gaussian law. For small $a$, the limit law differs essentially from the Gaussian one [ONICESCU and MIHOC (1936 b)].

3. The Thurstone urn. It suffices to take
$$m=1,\quad d_{11}=\lambda_1,\quad d_{12}=0,\quad d_{21}=\lambda_2,\quad d_{22}=0.$$
This amounts to the fact that if on trial $n$ a ball of colour 1 (2) was drawn, then this ball is replaced and $\lambda_1$ ($\lambda_2$) balls of colour 1 are added to the urn. This urn has important applications in learning theory [THURSTONE (1930)].
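The Thurstone urn can be simulated with an additive rule $a_j^{n+1}=a_j^n+d_{ij}$. Here we assume, as the description above indicates, that $d_{ij}$ is the number of balls of colour $j$ added when colour $i$ is drawn. This is a minimal sketch. The initial counts, $\lambda_1=1$, $\lambda_2=2$ and the seed are hypothetical, and colours 1 and 2 are indexed 0 and 1 in the code.

```python
import random

def urn_trial(counts, d, rng):
    """One trial: draw colour i with probability counts[i] / sum(counts), put the ball
    back and add d[i][j] balls of colour j."""
    i = rng.choices(range(len(counts)), weights=counts)[0]
    return i, [c + d[i][j] for j, c in enumerate(counts)]

def thurstone_urn(a1, a2, lam1, lam2, n, seed=1):
    """Thurstone urn: d_11 = lam1, d_12 = 0, d_21 = lam2, d_22 = 0."""
    rng = random.Random(seed)
    d = [[lam1, 0], [lam2, 0]]
    counts, draws = [a1, a2], []
    for _ in range(n):
        i, counts = urn_trial(counts, d, rng)
        draws.append(i)
    return counts, draws

counts, draws = thurstone_urn(1, 1, 1, 2, 100)
print(counts, counts[0] / sum(counts))   # proportion of colour-1 balls after 100 trials
```

Colour-2 balls are never added, so the proportion of colour 1 can only drift upward. This is the behaviour that makes the urn a model of learning.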
4. The urn model described in Subparagraph 2.3.1.2.2 may be generalized by setting
$$a_j^{n+1}=a_j^n+d_{ij}^n,\quad1\le i,j\le m+1,$$
i.e. the quantities $d_{ij}^n$ vary with $n$. An important special case is

The Luce urn. It suffices to take $m=1$,
$d_{i2}^n=0$.
This amounts to the fact that if on trial n a ball of colour 1 (2) was drawn, then this ball is replaced while - 1- ai ( -1- a2) balls of colour 1 1 +/31 l+P2 are added to the urn. In other words, the number of balls of colour 1 added is proportional to the number already in the urn. In this case p"+ 1 l
-
JJ':
Pi +P;(I-pD
[LUCE (1959)]. This model is very important in learning theory; see Subsection 3.3.1.

2.3.1.3. Ergodic theorems
2.3.1.3.1. Let us introduce a metric $d$ in $\Delta'$ by setting
$$d(p,q)=\sum_{j=1}^{m}|p_j-q_j|.$$
We shall obtain an upper bound for the norm
$$\mu(f)=\sup_{\substack{p,q\in\Delta'\\p\neq q}}\frac{d(f(p),f(q))}{d(p,q)}$$
of a mapping $f=(f_1,\dots,f_m)$ of $\Delta'$ into itself.

Proposition 2.3.4. Suppose that the $f_j$'s, $1\le j\le m$, have bounded first order partial derivatives. Then
$$\mu(f)\le\max_{1\le k\le m}\sup_{p\in\Delta'}\sum_{j=1}^{m}\left|\frac{\partial f_j(p)}{\partial p_k}\right|.$$
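For $m=1$ the set $\Delta'$ reduces to $[0,1]$ and $d(p,q)=|p-q|$, so Proposition 2.3.4 says that $\mu(f)\le\sup_p|f'(p)|$. The sketch below checks this numerically for the two quadratic transition mappings used in the simulation of the alternate OM-chain later in this section. The grid and the random pairs are our choices. The bounds come out as 0.3 and 0.4, both below 1, so both mappings are contractions of $[0,1]$ in this metric.

```python
import random

def phi11(p):
    return 0.1 * p * p - 0.3 * p + 0.6

def dphi11(p):
    return 0.2 * p - 0.3

def phi21(p):
    return 0.1 * p * p + 0.2 * p + 0.3

def dphi21(p):
    return 0.2 * p + 0.2

def derivative_bound(fprime, grid):
    """m = 1 case of Proposition 2.3.4: sup over [0, 1] of |f'(p)|, taken on a grid."""
    return max(abs(fprime(p)) for p in grid)

def empirical_mu(f, pairs):
    """Largest observed ratio d(f(p), f(q)) / d(p, q) with d(p, q) = |p - q|."""
    return max(abs(f(p) - f(q)) / abs(p - q) for p, q in pairs)

grid = [i / 1000 for i in range(1001)]
rng = random.Random(0)
pairs = [(rng.random(), rng.random()) for _ in range(5000)]
pairs = [(p, q) for p, q in pairs if abs(p - q) > 1e-6]   # avoid cancellation noise
for f, fp in ((phi11, dphi11), (phi21, dphi21)):
    print(derivative_bound(fp, grid), empirical_mu(f, pairs))
```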
Proof. We have, for some 0 1. It should be also noted that by specialising the results of Paragraph 2.1.2.4 we could obtain sufficient conditions for uniform in the strong sense ergodicity for nonhomogeneous OM-chains. 2.3.1.3.3. Now consider the homogeneous case. Let .d' be endowed with an arbitrary metric d. We have
Theorem 2.3.6. Suppose that (j) there exists k 0 , 1 ~ k 0 ~ m+ 1, such that
m
L'P;,j(p) 0, c/>2,1 (p)> 0 (or for every pe [O, 1];
4>1.JP)0,
lcl~l,
d>0,
c+d>0
lal~l,
b x,
and (2.3.19) uniformly with respect to pe A. Proof. Relation (2.3.18) is a simple consequence of uniform ergodicity and a known Cesaro theorem. Further, we have EP (c2n,11·--11 > . .
1
2
n
'° E
=- -~ (n+l)2
1-0
n
'° '°
n-1
.v; . . + - -~ -~ EP~i.ji···11-k,J1·--11· r .( . . (n+l)2 1-0 k-t+l
P"'l,)1·--11
In virtue of uniform ergodicity of the considered OM-chain, we obtain n
L EP CL
n
1 ...
i,
L EP (i,ii-••ii ~ (n+ l)Pi~---i, + b
~
i=O
(2.3.20)
i=O
smce Next, 2
n-1
n
---2 " " (n+ l) i~Ok=f+1
Epr,.. 1•1···1·, r_k,1'1--·1·, ~ [P~ ·] 2 11 "'
1.,
"
·--Jr
A
A
2 + _t_ p~ . +--n+l 11 --· 11 (n+l) 2 '
(2.3.21) where A1 and A 2 are constants depending on b but not depending on n. By (2.3.20) and (2.3.21), we get Ep',,n,]l••·Jl r . . ~[P~ .]2+!!!._P~ ~ Jt· .. J! n+l Jt•··JI. + (n+1)2'
(2.3.22)
where B1 and B 2 are constants depending on b but not depending on n. Analogously, we have
(2.3.23) where C 1 and C 2 are constants depending on b but not depending on n. According to (2.3.22) and (2.3.23), we obtain the relation DP(,t ~ Pf +0,01, Pf-0,004 ~ z(i>,t ~ Pf +0,004, Pf-0,03 ~ 1 ~ Pf +0,03, Pf -0,01 ~ 1 ~ Pf +0,01, Pf -0,004 ~ 1 ~ Pf +0,004.
(1)
(2) (3) (4)
vt vt vt
(5) (6)
Summarizing the results, we arrive to Table 1
IA 1B
IIA IIB
(1)
(2)
(3)
31 98 20 62
779 378 702 215
982
(4)
(5)
(6)
972 560 827 221
Table 1 gives the subscript $i$ from which on all values $z_{(i),j}$ (respectively, $v_j^{(i)}$), $i\le j\le1000$, satisfy the corresponding inequalities in the paths A and B (IA and IB for the first OM-chain, IIA and IIB for the second one); the sign "-" indicates that no such $i$ exists. Table 2a (respectively 2b) gives the numbers of values $z_{(i),j}$ (respectively, $v_j^{(i)}$) which satisfy the inequalities (1)-(6) in different parts of the simulations (1-100, 101-200, ..., 901-1000 and 1-1000).
Table2a (6)
(5)
(4)
(3)
(2)
(1)
IA
I 1B
1-100 101-200 201-300 301-400 401-500 501-600 601-700 701-800 801-900 901-1000
89 100 100 100 100 100 100 100 100 100
73 100 100 100 100 100 100 100 100 100
31 68 100 100 60 2 1 21 100 100
9 20 10 25 100 100 100 100 100 100
10 6 43 18 0 0 0 0 0 0
4 0 0 0 0 74 58 68 55 21
22 42 90 98 35 0 0 0 0 29
5 0 0 14 34 96 100 100 100 100
9 3 0 2 15 0 0 0 0 0 0 40 0 11 0 1 0 0 0 0
3 2 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I- 1000
989
973
583
664
77
280
316
549
26
13
-----
-
IA I 1B IA I 1B IAIIB
IA I rn IA I 1B
55
2
---
Table 2b (2)
(1) -
1-100 101-200 201-300 301-400 401-500 501-600 601-700 701-800 801-900 901-1000 1-1000
(4)
(3)
(5)
(6)
IIA I 118 IIA I 11B IIAIIIB IIB I na IIAIIIB 1IAJ1rn 86
100 100 100 100 100 100
63 100 100 100 100 100 100 100 100 100
18 0 4 12 0 43 2 98 100 100
100 97 100 100 100 100 100 100 100
986
963
377
917
100 100 100
20
2 0 0 0 0 0 0 0
0 0 2
5 25 21 87 44 0 0 0 0
16 0 0 0 0
3 0 0 0 0 0 0
1 15 8 31 7 0 0
1 0 0 0 0 0
0
0
0
0 70 94 100
15 95 90 100 100 100 100 100 100 100
0 0
0 0
0 0 0
182
291
900
3
62
11
0
1 5 0 1 0 0 0 0 0
0 7
Table 3 summarizes the results concerning the simulation of the alternate OM-chain with the initial probability distribution $p_1=0{,}5$, $p_2=0{,}5$ and with the transition mappings
$$\varphi_{1,1}(p)=0{,}1p^2-0{,}3p+0{,}6,\qquad\varphi_{2,1}(p)=0{,}1p^2+0{,}2p+0{,}3.$$
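Here is a minimal simulation sketch of this alternate OM-chain. On each trial state 1 occurs with the current probability $p$ and state 2 with $1-p$. Then $p$ is updated by the transition mapping of the state that occurred. This reading of how the chain is run, the seed and the number of trials are our assumptions, and the sketch does not reproduce Table 3. The first mapping sends $[0,1]$ onto $[0.4,0.6]$ and the second onto $[0.3,0.6]$, so every updated probability stays in $[0.3,0.6]$.

```python
import random

def phi(i, p):
    """phi_{i,1}(p): probability of state 1 on the next trial, given that state i occurred."""
    if i == 1:
        return 0.1 * p * p - 0.3 * p + 0.6
    return 0.1 * p * p + 0.2 * p + 0.3

def simulate_alternate_om_chain(n, p1=0.5, seed=2024):
    """Return the states x_1..x_n (1 or 2) and the successive probabilities of state 1."""
    rng = random.Random(seed)
    p, states, probs = p1, [], []
    for _ in range(n):
        i = 1 if rng.random() < p else 2   # state 1 with probability p, state 2 with 1 - p
        states.append(i)
        p = phi(i, p)                      # new probability of state 1
        probs.append(p)
    return states, probs

states, probs = simulate_alternate_om_chain(1000)
print(sum(s == 1 for s in states) / len(states))   # relative frequency of state 1
```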
Table 3
25 50 75 100 25 50 75 200 25 50
15 300 25 50
15 400 25 50
15 500 25 50 75 600 25 50 75 700 25 50 75 800 25 50
15 900 25 50
15 1000
¹⁾ We adopt the following convention concerning the $a$-ary expansion of $u$ in ambiguous cases. The $a$-ary expansion of $u=1$ will be taken as $u=0{,}(a-1)(a-1)\dots$. In all other ambiguous cases, an expansion terminating in 0's will be preferred to one terminating in $(a-1)$'s. Thus in the decimal system, the expansion of $u=1$ will be $0{,}999\dots$ while the expansion of $u=\frac12$ will be $0{,}5000\dots$ rather than $0{,}4999\dots$.
We have

Theorem 2.3.9. Suppose that $Q(\cdot;x)$ are Borel measurable for all $x\in X$; suppose also that there exists a distribution function $G$, with $G(0-)=0$, $G(1)=1$, which satisfies the functional equation
$$G(u)=\sum_{x\in X}\int_0^{au-x}Q(v;x)\,dG(v),\quad u\in[0,1].\tag{2.3.28}$$
Then there exists a probability space $(\Omega,\mathscr{K},P)$ and an $X$-valued strictly stationary doubly infinite sequence $(\xi_t)_{t\in\mathbb{Z}}$ defined on $(\Omega,\mathscr{K},P)$ such that $P$-a.s.
$$P(\xi_t=x\mid\xi_{t-1},\xi_{t-2},\dots)=P(\dots,\xi_{t-2},\xi_{t-1};x)\tag{2.3.29}$$
for all xEX, tEZ. Proof Consider a real-valued Markov chain (r1r)rez on a suitable probability space (Q,%,P) with state space [O, 1], stationary absolute distribution given by G and transition probability functions given by
P('1, = x:u I17,_
1
ue[O,l].
=u)=Q(u;x),
(2.3.30)
In virtue of (2.3.28) these conditions are compatible. Define the function hon [0,1] by h(u) = first digit in a-ary expansion of u.
Now define the random variables~' by
er-
1
= h(r,r),
tE
z.
Obviously (~1),ez is a strictly stationary double infinite sequence on (Q,%,P) and . =1 ( " -~t-j)
p r/r=
~
jeN*
aJ
since P(rrt-i =ar,r-h(r,r))=l for all teZ. Also J:. J: • (" p ( ··•,",,t-2,i,,t-l,X)-Q ~
jeN*
~t-j. j - , X)
a
~r- er-
(2.3.31)
holds P-a.s. for all teZ. The only "path" (... , 2, 1 ) for which the two sides of (2.3.31) might be different are paths other than ( ... , (a - 1), (a -1 )) which contain only a finite number of components different from (a-1) and these can be shown to have P-probability 0. It follows from this that (2.3.29) holds. q.e.d.
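The construction in the proof can be traced numerically. The following sketch assumes $a=2$ and a hypothetical transition function $Q(u;1)=0{,}3+0{,}4u$, $Q(u;0)=1-Q(u;1)$ (not from the text). It runs the chain (2.3.30) from a fixed $\eta_0<1$ rather than from the stationary law $G$, sets $\xi_{t-1}=h(\eta_t)$, and checks the finite-horizon form $\eta_t=\sum_{j=1}^{t}\xi_{t-j}a^{-j}+\eta_0a^{-t}$ of the identity used in the proof. Exact fractions keep the check free of rounding error.

```python
import random
from fractions import Fraction


def first_digit(u, a):
    """h(u): first digit of the a-ary expansion of u in [0, 1], using the
    footnote convention (u = 1 -> a - 1; otherwise the 0-terminating expansion)."""
    return a - 1 if u == 1 else int(a * u)


def simulate_eta(q, a, eta0, steps, rng):
    """Simulate (2.3.30): from eta_{t-1} = u the chain jumps to (x + u) / a
    with probability q(u, x), x = 0, ..., a - 1."""
    etas = [Fraction(eta0)]
    for _ in range(steps):
        u = etas[-1]
        weights = [q(float(u), x) for x in range(a)]
        x = rng.choices(range(a), weights=weights)[0]
        etas.append((x + u) / a)
    return etas


def eta_from_digits(xi, eta0, t, a):
    """Right-hand side of eta_t = sum_{j=1}^{t} xi_{t-j} a^{-j} + eta_0 a^{-t}."""
    head = sum(Fraction(xi[t - j], a ** j) for j in range(1, t + 1))
    return head + Fraction(eta0) / a ** t


def q_example(u, x):
    # Hypothetical transition function for a = 2: Q(u; 1) = 0.3 + 0.4 u.
    p1 = 0.3 + 0.4 * u
    return p1 if x == 1 else 1.0 - p1


a = 2
rng = random.Random(0)
etas = simulate_eta(q_example, a, Fraction(1, 3), 20, rng)
xi = [first_digit(e, a) for e in etas[1:]]      # xi[t - 1] = h(eta_t)
print(xi)
print(all(etas[t] == eta_from_digits(xi, etas[0], t, a) for t in range(len(etas))))  # True
```

Starting below 1 keeps every $\eta_t$ in $[0,1)$, so the ambiguous case $u=1$ never arises and $h(\eta_t)$ recovers exactly the digit drawn at step $t$.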
2. Random systems with complete connections
We note that the Borel measurability of $Q(\cdot;x)$, $x\in X$, may be assured if we suppose that¹⁾ $\omega_n\to\omega$ implies $P(\omega_n;x)\to P(\omega;x)$ as $n\to\infty$. In fact, under this condition it follows that $Q(\cdot;x)$, $x\in X$, are continuous to the right for each $u\in[0,1]$ and continuous to the left except possibly at lattice points (that is, the points whose $a$-ary expansion terminates in an unbroken sequence of 0's). Thus $Q(\cdot;x)$, $x\in X$, belong to Baire class 1. In particular, this holds if Condition H below is satisfied.

2.3.2.3.3. We shall use the notation $(u=v)_n$ to mean that the first $n$ digits in the $a$-ary expansion of $u$ are the same as the first $n$ digits in the expansion of $v$. Define
$$h_n=\sup_{x,\,(u=v)_n}\lvert Q(u;x)-Q(v;x)\rvert,\qquad n\in N^*. \qquad (2.3.32)$$
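As an illustration of (2.3.32), the sketch below computes $h_n$ exactly for a transition function whose sections $u\mapsto Q(u;x)$ are continuous and monotone (an assumption made here, not in the text). Points with $(u=v)_n$ fill one cell $[ka^{-n},(k+1)a^{-n})$, the last cell also containing 1, so under these assumptions the supremum over a cell is the difference of $Q$ at the cell's endpoints. For the hypothetical $Q(u;1)=3/10+2u/5$ one gets $h_n=\tfrac25a^{-n}$, so $\sum_nh_n<\infty$.

```python
from fractions import Fraction


def h_n(q, a, n, states):
    """h_n of (2.3.32) for a transition function q(u, x) whose sections
    u -> q(u, x) are continuous and monotone on [0, 1] (assumed here).

    Points sharing their first n a-ary digits fill a cell
    [k a^-n, (k+1) a^-n); under these assumptions the supremum of
    |q(u, x) - q(v, x)| over a cell is the difference at its endpoints.
    """
    width = Fraction(1, a ** n)
    return max(
        abs(q((k + 1) * width, x) - q(k * width, x))
        for x in states
        for k in range(a ** n)
    )


def q_example(u, x):
    # Hypothetical two-state transition function (a = 2), not from the text.
    p1 = Fraction(3, 10) + Fraction(2, 5) * u
    return p1 if x == 1 else 1 - p1


print([str(h_n(q_example, 2, n, range(2))) for n in range(1, 5)])
# -> ['1/5', '1/10', '1/20', '1/40']
```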
We shall consider

Condition H. We have $\lim_{n\to\infty}h_n=0$ and
$$\sum_{n\in N^*}\prod_{k=1}^{n}(1-2ah_k)=\infty. \qquad (2.3.33)$$
We shall agree that any of the factors $(1-2ah_k)$ in (2.3.33) which is zero or negative will be replaced by 1. We note that the condition $\sum_{n\in N^*}h_n<\infty$ implies Condition H, but the converse may not be true (for example, Condition H is satisfied provided that $h_k\le(2ak)^{-1}$ for sufficiently large $k$). We have

Theorem 2.3.10. Let $G_n(v;u)$ be the $n$-step distribution function induced by the Markov transition law (2.3.30) starting from a given value $u$ (that is, $G_n(v;u)=P(\eta_n\le v\mid\eta_0=u)$). Suppose that Condition H is satisfied and that there exists $x_0\in X$ such that
(2.3.34)

1) If $\omega_n=(\dots,x^{(n)}_{-1},x^{(n)}_0)$ and $\omega=(\dots,x_{-1},x_0)$, then $\omega_n\to\omega$ will mean that for each $k\in -N$, $x^{(n)}_k=x_k$ for all $n$ sufficiently large.
Then (i) $G_n(v;u)$ is $C_1$-summable to a distribution function $G(v)$ which is independent of $u$. If $\sum_{n\in N^*}h_n<\infty$ holds (instead of Condition H), then the ordinary limit exists. In either case the limit is uniform in $u$; (ii) the distribution function $G$ either has a single discontinuity of magnitude 1 at one of the points $0,\ 1/(a-1),\ 2/(a-1),\dots,1$, or is continuous; (iii) the distribution function $G$ is the only stationary absolute distribution for the considered Markov transition law and satisfies (2.3.28).

Proof. (i) The method is related to an idea of DOEBLIN (1938), who proved the ergodic theorem for Markov chains with a finite number of states by considering two particles starting in different states, which move independently until they simultaneously occupy the same state, after which they merge. In order not to obscure the main idea by details, we sketch the proof for the case $a=2$. Since (2.3.34) holds, we can just as well take $x_0=0$. Then
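Let $(\tau_m)_{m\in N}$ be independent random variables uniformly distributed on $(0,1)$. Define sequences $(\eta'_m)_{m\in N}$ and $(\eta''_m)_{m\in N}$ as follows: $\eta'_0=u'$, $\eta''_0=u''$, $u',u''\in[0,1]$. Suppose $\eta'_m$ and $\eta''_m$ are determined. Then
$$\eta'_{m+1}=\begin{cases}\tfrac12\eta'_m & \text{for }\tau_m\le Q(\eta'_m;0),\\[2pt] \tfrac12+\tfrac12\eta'_m & \text{for }\tau_m>Q(\eta'_m;0),\end{cases}$$
while
$$\eta''_{m+1}=\begin{cases}\tfrac12\eta''_m & \text{for }\tau_m\le Q(\eta''_m;0),\\[2pt] \tfrac12+\tfrac12\eta''_m & \text{for }\tau_m>Q(\eta''_m;0).\end{cases}$$
It is easy to see that $(\eta'_m)_{m\in N^*}$ and $(\eta''_m)_{m\in N^*}$ are Markov chains with transition law given by (2.3.30). It is convenient to let $T'_m$ ($T''_m$) designate the transformation applied to $\eta'_m$ ($\eta''_m$). That is, $T'_m=i$ ($T''_m=i$) if
$$\eta'_{m+1}=\frac{i+\eta'_m}{2}\qquad\Bigl(\eta''_{m+1}=\frac{i+\eta''_m}{2}\Bigr),\qquad i=0,1.$$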
We have
1) A slight modification is necessary if $u'=1$ or $u''=1$.
which in turn implies
$$\lvert\eta'_{m+k+1}-\eta''_{m+k+1}\rvert\le 2^{-k}.$$
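The coupling used in the proof is easy to simulate. In the sketch below (case $a=2$, $x_0=0$, as in the text) both copies are driven by the same uniform $\tau_m$. The function $Q(u;0)=0{,}6-0{,}2u$ is a hypothetical choice, and the starting points are taken in $[0,1)$ to avoid the special case $u'=1$ or $u''=1$. Whenever $T'_m=T''_m$ the gap $\lvert\eta'_m-\eta''_m\rvert$ is halved exactly, which is the mechanism behind the bound above.

```python
import random
from fractions import Fraction


def coupled_step(u1, u2, tau, q0):
    """One step of the coupling (a = 2, x_0 = 0): both copies use the same
    uniform tau; a copy at u moves to u/2 if tau <= Q(u; 0), else to 1/2 + u/2."""
    t1 = 0 if tau <= q0(u1) else 1          # T'_m
    t2 = 0 if tau <= q0(u2) else 1          # T''_m
    return (t1 + u1) / 2, (t2 + u2) / 2, t1, t2


def run_coupling(u1, u2, steps, q0, seed=0):
    """Return [(eta'_m, eta''_m, T'_{m-1}, T''_{m-1}) for m = 0, ..., steps]."""
    rng = random.Random(seed)
    u1, u2 = Fraction(u1), Fraction(u2)
    history = [(u1, u2, None, None)]
    for _ in range(steps):
        u1, u2, t1, t2 = coupled_step(u1, u2, rng.random(), q0)
        history.append((u1, u2, t1, t2))
    return history


def q0_example(u):
    # Hypothetical Q(u; 0) = 0.6 - 0.2 u, not from the text.
    return 0.6 - 0.2 * float(u)


hist = run_coupling(0, Fraction(9, 10), 40, q0_example)
print(float(abs(hist[-1][0] - hist[-1][1])))   # final gap |eta'_40 - eta''_40|
```

When the two transformations differ, the new gap is still at most 1, so the gap can only stay large if disagreements keep recurring; Condition H controls how often that happens.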
It can be shown that these relations lead, for any $\varepsilon>0$, to
$$\lim_{n\to\infty}\frac1n\sum_{m=0}^{n-1}\mathrm{Prob}\bigl(\lvert\eta'_m-\eta''_m\rvert>\varepsilon\bigr)=0, \qquad (2.3.35)$$
uniformly in $u'$ and $u''$. A simple argument then shows that $G_n(v;u)$ is $C_1$-summable to a distribution function $G(v)$ which is independent of $u$. Moreover, the difference
…
tends to zero uniformly in $u\in[0,1]$ as $n\to\infty$ at all points of continuity of $G$. If the stronger condition $\sum_{n\in N^*}h_n<\infty$ holds instead of Condition H, we can replace (2.3.35) by the stronger statement $\lim_{m\to\infty}\mathrm{Prob}(\lvert\eta'_m-\eta''_m\rvert>\varepsilon)=0$. We then get actual convergence, rather than just $C_1$-summability, of the distributions to $G$. (ii) First let $x_0$ in (2.3.34) be 0. If $Q(0;0)=1$, it is clear that $G$ has a jump of magnitude 1 at $v=0$, and conversely. If $Q(0;0)<1$, $G$ is everywhere continuous. First, $G$ must be continuous at 0. For let $k,n\in N^*$, $k$ … The probability transition function $p$ together with an a priori distribution function for the initial value $a^{(0)}=a_0$ (or
1) Here $P^{(w)}$ and …, $n\in N^*$, are those constructed in Theorem 2.1.3. This is not an essential restriction; it is made here for the sake of simplicity.
more generally, for an initially fixed segment $(a^{(m)},x,$ …$)$, $n\in N^*$. If we introduce the notation $p_n=(p_n^1,\dots)$, $p_n^i=\mathrm{Prob}(a_n=i\mid x)$, …
$$Q^n(s;A)=P_s(S_n(\omega)\in A),\quad n\in N^*,\qquad Q^0(s;A)=\chi_A(s),$$
for all $A\in\mathcal S$.

3.1.2.2.2. Theorem 3.1.9 below shows what can be said about the asymptotic behaviour of the random sequence $(S_n)_{n\in N^*}$ for distance diminishing models with no further assumptions.

Theorem 3.1.9. For any distance diminishing model, the transition probability function
…
converges uniformly as $n\to\infty$ to a transition probability function $Q^\infty$, and for any Borel set $A$ in $S$, $Q^\infty(\cdot;A)\in CL(S)$. There exists a positive constant $c$ such that
II:
:t: E.t/,(S1)-(U) Q00 (• ;ds') 1 >.
s
Proof. The proof follows from Theorems 2.1.65 and 2.1.66. q.e.d.
We have also Theorem 3.1.16. If an absorbing distance diminishing model and an A c £0, there exist two positive constants v< 1 and K., such that
for all $s\in S$, $m,n\in N^*$, $x\in R$. (e) Denote by $\sigma_l^2(\psi)$ the series obtained by replacing, in the expression of $\sigma^2(\psi)$, the term $E_\infty\psi(E_1)\psi(E_{n+1})$ by $E_\infty\psi(E_1)\psi(E_{nl+1})$. If $l$ is sufficiently large (such that … $<1$) and $\sigma_l(\psi)>0$, then
$$P_s\Bigl(\limsup_{n\to\infty}\frac{\sum_{h=1}^{n}\psi(E_{(h-1)l+m})-nE_\infty\psi}{\sigma_l(\psi)\sqrt{2n\log\log n}}=1\Bigr)=1$$
for all $s\in S$, $m\in N^*$.
3.1. Basic models
Proof. The proof follows from Proposition 2.2.18 and Theorems 2.2.19, 2.2.24, 2.2.12 and 2.2.27. q.e.d. We note that we could also give other asymptotic properties by transcribing those given in Section 2.2.

Notes and comments

The theorems given here are due to NORMAN (1968a), except for Theorem 3.1.17.
3.1.3. Finite state models

3.1.3.1. Introductory comments

3.1.3.1.1. It is possible to develop a theory of finite state models that completely parallels the theory of distance diminishing models given in Subsection 3.1.2. This will not be done here, since the results concerning states obtained by these relatively complicated methods are, if anything, slightly inferior to those that can be obtained by applying the well-known theory of finite Markov chains to $(S_n)_{n\in N^*}$.

3.1.3.1.2. The natural analogue of (V), Theorem 3.1.10, for finite state models is
(V') … for any $s,s'\in S$ if $n$ is sufficiently large.
This is equivalent to
(V') the finite Markov chain $(S_n)_{n\in N^*}$ has a single ergodic set and this set is regular,
or to
(V') there exists $n_0\in N^*$ such that the coefficient of ergodicity of the $n_0$th power of the transition matrix $Q$ of the finite Markov chain $(S_n)_{n\in N^*}$ is positive.
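The last form of (V') can be checked mechanically for a given matrix. The sketch below assumes the usual (Dobrushin) ergodicity coefficient $\alpha(P)=1-\tfrac12\max_{i,k}\sum_j\lvert p_{ij}-p_{kj}\rvert$, which is positive exactly when every two rows of $P$ have a common positive entry; whether this is the precise coefficient of Section 2.1 is an assumption. The example matrix is hypothetical (row-stochastic, $Q_{ij}=Q(s_i;\{s_j\})$).

```python
import numpy as np


def ergodicity_coefficient(P):
    """1 - (1/2) max_{i,k} sum_j |P_ij - P_kj| (Dobrushin's coefficient)."""
    P = np.asarray(P, dtype=float)
    row_dist = np.abs(P[:, None, :] - P[None, :, :]).sum(axis=2)
    return 1.0 - 0.5 * row_dist.max()


def first_positive_power(Q, max_power=100, tol=1e-12):
    """Smallest n_0 <= max_power with ergodicity_coefficient(Q^n_0) > tol,
    or None if there is none (e.g. for a periodic chain)."""
    Q = np.asarray(Q, dtype=float)
    Qn = np.eye(len(Q))
    for n in range(1, max_power + 1):
        Qn = Qn @ Q
        if ergodicity_coefficient(Qn) > tol:
            return n
    return None


# Hypothetical 3-state chain with cycles of length 2 and 3 (hence aperiodic).
Q = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.5, 0.5, 0.0]]
print(first_positive_power(Q))                 # -> 3
print(first_positive_power([[0, 1], [1, 0]]))  # -> None (periodic chain)
```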
By analogy with Definition 3.1.11, we have

Definition 3.1.18. A finite state model is ergodic¹⁾ if it satisfies (V').

Notes and comments

The above description of finite state models is due to NORMAN (1968a).

1) $(S_n)_{n\in N^*}$ need not be an ergodic chain, since it may have transient states. If there happen not to be any transient states, the chain is ergodic and regular.
3. Learning
3.1.3.2. Properties
3.1.3.2.1. Theorem 3.1.19 is analogous to Theorem 3.1.10.

Theorem 3.1.19. For any ergodic finite state model there exist two constants $\alpha<1$ and $c$ and a probability $Q^\infty$ on $S$ such that
(3.1.10)
for all real-valued functions $\psi$ on $S$, $n\in N^*$, where
$$E_\infty\psi=\sum_{s\in S}\psi(s)\,Q^\infty(\{s\}). \qquad (3.1.11)$$
Proof. Let $t$ be the number of states, i.e. $S=\{s_1,\dots,s_t\}$. The transition matrix $Q=(Q_{ij})$ and the column vector $\psi^{*\prime}=(\psi^*_1,\dots,\psi^*_t)$ corresponding to $\psi$ are defined by
$$Q_{ij}=Q(s_i;\{s_j\}),\quad 1\le i,j\le t,\qquad \psi^*_i=\psi(s_i),\quad 1\le i\le t.$$
Then
$$E_{s_i}\psi(S_n)=(Q^n\psi^*)_i,\qquad 1\le i\le t,$$
for $n\in N^*$, where $(Q^n\psi^*)_i$ is the $i$th component of $Q^n\psi^*$. By Theorem 2.1.32 (or 2.1.35)¹⁾ there exists a stochastic matrix $A=(A_{ij})$ all of whose rows are the same, say $(a_1,\dots,a_t)$, and there are two positive constants $\alpha<1$ and $b$ such that
(3.1.12)
for all $n\in N^*$, $1\le i,j\le t$, where $(Q^n)_{ij}$ is the element of the matrix $Q^n$ situated in the $i$th row and the $j$th column. Let now $Q^\infty$ be the probability on $S$ with
$$Q^\infty(\{s_j\})=a_j,\qquad 1\le j\le t,$$
and let $E_\infty\psi$ be any coordinate of $A\psi^*$. Then (3.1.11) holds and
$$\lvert E_{s_i}\psi(S_n)-E_\infty\psi\rvert=\lvert(Q^n\psi^*)_i-(A\psi^*)_i\rvert=\Bigl\lvert\sum_{j=1}^{t}\bigl((Q^n)_{ij}-A_{ij}\bigr)\psi^*_j\Bigr\rvert\le\sum_{j=1}^{t}\lvert(Q^n)_{ij}-A_{ij}\rvert\,\lvert\psi(s_j)\rvert\le tb\alpha^n\lVert\psi\rVert,\qquad 1\le i\le t.$$
This gives (3.1.10) with $c=tb/\alpha$. q.e.d.

1) For a direct proof see, e.g., KEMENY and SNELL (1957), Chapter 4.
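For a concrete chain the quantities in the proof of Theorem 3.1.19 can be computed directly. The sketch below uses a hypothetical 3-state matrix and a hypothetical function $\psi$. It finds the common row $(a_1,\dots,a_t)$ of $A$ as the normalized left eigenvector of $Q$ for eigenvalue 1, and compares $\max_i\lvert E_{s_i}\psi(S_n)-E_\infty\psi\rvert$ with $t\max_{i,j}\lvert(Q^n)_{ij}-A_{ij}\rvert\,\lVert\psi\rVert$, $\lVert\psi\rVert=\max_s\lvert\psi(s)\rvert$, which bounds it by the chain of inequalities in the proof.

```python
import numpy as np


def stationary_row(Q):
    """Common row (a_1, ..., a_t) of A = lim Q^n for a regular chain:
    the left eigenvector of Q for eigenvalue 1, normalized to sum 1."""
    Q = np.asarray(Q, dtype=float)
    w, v = np.linalg.eig(Q.T)
    a = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return a / a.sum()


def convergence_check(Q, psi, n):
    """Return (max_i |E_{s_i} psi(S_n) - E_inf psi|, t * max_ij |(Q^n - A)_ij| * ||psi||)."""
    Q = np.asarray(Q, dtype=float)
    psi = np.asarray(psi, dtype=float)
    t = len(Q)
    a = stationary_row(Q)
    A = np.tile(a, (t, 1))                 # all rows equal to (a_1, ..., a_t)
    Qn = np.linalg.matrix_power(Q, n)
    e_inf = a @ psi                        # E_inf psi = sum_s psi(s) Q_inf({s})
    gap = np.abs(Qn @ psi - e_inf).max()   # (Q^n psi*)_i = E_{s_i} psi(S_n)
    bound = t * np.abs(Qn - A).max() * np.abs(psi).max()
    return gap, bound


Q = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.0]]
psi = [1.0, -2.0, 0.5]
print(stationary_row(Q))   # approximately [0.2, 0.4, 0.4]
for n in (5, 10, 20, 40):
    print(n, *convergence_check(Q, psi, n))
```

For this matrix the other eigenvalues have modulus $1/\sqrt2$, so both printed quantities decay roughly like $2^{-n/2}$, a concrete instance of the geometric rate $\alpha^n$.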
3.2. Linear models
3.1.3.2.2. The next theorem parallels Theorem 3.1.15.

Theorem 3.1.20. For any ergodic finite state model
…
for all $n,l\in N^*$, $A\subset E^{(l)}$, where
$$K_l(A^{(l)})=\sum_{i=1}^{t}K_l(s_i;A^{(l)})\,Q^\infty(\{s_i\}),$$
and $c$ and $\alpha$ are as in Theorem 3.1.19.
Proof. The proof follows from Theorem 3.1.15 (i.e. from Theorem 2.1.66) and Theorem 3.1.19. q.e.d.
We have also Theorem 3.1.21. All of the conclusions of Theorem 3.1.17 hold for any ergodic finite state model. Proof. This follows from Theorem 3.1.20.
q.e.d.
Notes and comments

The results mentioned here are due to NORMAN (1968a).
3.2. Linear models

3.2.1. The (t+1)-operator model

3.2.1.1. Description of the model
3.2.1.1.1. Suppose that the subject in a learning experiment is faced with a set of response classes (shortly, responses) $A_1,\dots,A_{m+1}$ on each trial, and that, following response $A_i$, $1\le i\le m+1$, one of the $t+1$ events $E_j$, $1\le j\le t+1$, occurs. The event probabilities are supposed to depend at most on the most recent response. Let $A_{i,n}$ and $E_{j,n}$ denote, respectively, the occurrence of $A_i$ and $E_j$ on trial $n$, and denote by ${}^n\pi_{ij}$ the conditional probability $P(E_{j,n}\mid A_{i,n})$. Such an N-model can be described by identifying $p_n$, the conditional probability vector of $A_{1,n},\dots,A_{m+1,n}$, with the state $S_n$ and by making the following stipulations:
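$$E=\{j\colon 1\le j\le t+1\}\qquad(j\text{ abbreviates }E_j),$$
$${}^nv_j(p)=v_n(p;j)={}^nA_j\,p,\qquad {}^n\kappa_j(p)=\kappa_n(p;j)=\sum_{i=1}^{m+1}p_i\,{}^n\pi_{ij},$$
for all $1\le j\le t+1$; here ${}^nA_j$ is an $(m+1)\times(m+1)$ stochastic matrix and ${}^n\Pi=({}^n\pi_{ij})$ an $(m+1)\times(t+1)$ matrix such that $0\le{}^n\pi_{ij}\le1$, $1\le i\le m+1$, $1\le j\le t+1$, and
$$\sum_{j=1}^{t+1}{}^n\pi_{ij}=1,\qquad 1\le i\le m+1.$$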
Let us examine closely the matrix ${}^nA_j$; for the sake of simplicity we shall omit the subscripts $n\in N$ and $1\le j\le t+1$, that is $A=(a_{ik})$; obviously,
$$\sum_{i=1}^{m+1}a_{ik}=1,\qquad 1\le k\le m+1.$$
For $m=1$, we then set
$$A=\begin{pmatrix}1-b & a\\ b & 1-a\end{pmatrix}.$$
If we now apply this event matrix operator to $p$, we get
$$Ap=\begin{pmatrix}(1-b)p_1+ap_2\\ bp_1+(1-a)p_2\end{pmatrix}.$$
Further, if we denote the first element of the foregoing probability vector by $Q(p_1)$ and the second element by $Q(p_2)$, then
$$Q(p_1)=(1-b)p_1+ap_2=p_1+a(1-p_1)-bp_1,\qquad Q(p_2)=bp_1+(1-a)p_2=p_2+b(1-p_2)-ap_2,$$
i.e. we obtain the so-called gain-loss form. The following two forms are also used:
$$Q(p_1)=\alpha p_1+a,\qquad -a\le\alpha\le1-a,\quad 0\le a\le1,$$
and
$$Q(p_1)=\alpha p_1+(1-\alpha)\gamma;$$
the first is the so-called slope-intercept form and the second one the fixed-point form, since the solution of the equation $Q(p_1)=p_1$ is exactly $p_1=\gamma$. We shall also use the form
$$Q(p_1)=(1-\theta)p_1+\theta\gamma,\qquad\theta=1-\alpha.$$
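The passage between the parametrizations for $m=1$ is elementary. The sketch below (function names are ours) takes the gain-loss parameters $a,b$ of $A$, returns the slope $\alpha=1-a-b$, the intercept $a$, the fixed point $\gamma=a/(a+b)$ and $\theta=1-\alpha=a+b$, and checks that all forms agree on a few values of $p_1$. When $a=b=0$ the operator is the identity and $\gamma$ is undefined, which the code reports as `None`.

```python
import math


def forms_from_gain_loss(a, b):
    """Given A = [[1-b, a], [b, 1-a]] with 0 <= a, b <= 1, return
    (alpha, intercept, gamma, theta) of the slope-intercept and fixed-point forms."""
    alpha = 1.0 - a - b                        # slope
    theta = 1.0 - alpha                        # = a + b
    gamma = a / theta if theta > 0 else None   # fixed point; undefined if A = I
    return alpha, a, gamma, theta


def q_gain_loss(p1, a, b):
    """Gain-loss form Q(p_1) = p_1 + a (1 - p_1) - b p_1."""
    return p1 + a * (1.0 - p1) - b * p1


a, b = 0.2, 0.3                                # hypothetical parameters
alpha, intercept, gamma, theta = forms_from_gain_loss(a, b)
for p1 in (0.0, 0.25, 0.5, 1.0):
    v = q_gain_loss(p1, a, b)
    assert math.isclose(v, alpha * p1 + intercept)              # slope-intercept
    assert math.isclose(v, alpha * p1 + (1 - alpha) * gamma)    # fixed-point
    assert math.isclose(v, (1 - theta) * p1 + theta * gamma)    # theta form
print(alpha, gamma, theta)   # approximately 0.5, 0.4, 0.5
```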
Using the fixed-point form, we can write
$$A=\alpha I+(1-\alpha)\Gamma,$$
where $I$ is the identity matrix and
$$\Gamma=\begin{pmatrix}\gamma & \gamma\\ 1-\gamma & 1-\gamma\end{pmatrix},$$
so that, returning to $n\in N$ and $1\le j\le t+1$,
$${}^nA_j={}^n\alpha_jI+(1-{}^n\alpha_j)\,{}^n\Gamma_j,$$
where ${}^n\boldsymbol\gamma_j=({}^n\gamma_j,\,1-{}^n\gamma_j)'$ is the common column of ${}^n\Gamma_j$.

3.2.1.1.2. We turn now to the general case $m>1$. We wish to introduce one further restriction, which is automatically fulfilled when $m=1$, namely: if $m+1$ classes of responses are initially defined and the experimenter later decides to treat any two classes in an identical manner, it should be possible to combine those two classes, thereby obtaining the same results that would have been obtained had only $m$ classes been defined initially. For the sake of simplicity, we consider the case $m=2$. We start with three response classes $A_1,A_2,A_3$ and a probability vector $p'=(p_1,p_2,p_3)$. We wish to combine classes $A_1$ and $A_2$ to form a new class, which we label $A_c$, and to represent the vector in the collapsed space by
$$Kp=\begin{pmatrix}p_c\\ p_3\end{pmatrix},\qquad p_c=p_1+p_2,$$
where, obviously,
$$K=\begin{pmatrix}1&1&0\\0&0&1\end{pmatrix}.$$
Taking into account that $A$ is a stochastic matrix, and after applying the collapsing matrix $K$, we get
$$KAp=\begin{pmatrix}(a_{11}+a_{21})p_1+(a_{12}+a_{22})p_2+(a_{13}+a_{23})p_3\\ a_{31}p_1+a_{32}p_2+a_{33}p_3\end{pmatrix}.$$
Since the elements of this probability vector are to be independent of the probability distribution within class $A_c$, the components of this vector must not depend on $p_1$ and $p_2$ individually but only on their sum $p_c$. Hence
$$a_{31}=a_{32}=a_3.$$
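If we now combine classes $A_1$ and $A_3$, and then classes $A_2$ and $A_3$, we obtain in a similar manner
$$a_{21}=a_{23}=a_2,\qquad a_{12}=a_{13}=a_1.$$
Thus
$$A=\begin{pmatrix}a_{11}&a_1&a_1\\ a_2&a_{22}&a_2\\ a_3&a_3&a_{33}\end{pmatrix}.$$
But $a_{11}+a_2+a_3=1$, $a_1+a_{22}+a_3=1$, $a_1+a_2+a_{33}=1$, so that
$$a_{11}-a_1=a_{22}-a_2=a_{33}-a_3=\alpha,$$
where $\alpha=1-a_1-a_2-a_3$. If we set $\gamma_k=a_k/(1-\alpha)$, $k=1,2,3$ (for $\alpha\ne1$), then
$$A=\alpha I+(1-\alpha)\Gamma,$$
where
$$\Gamma=\begin{pmatrix}\gamma_1&\gamma_1&\gamma_1\\ \gamma_2&\gamma_2&\gamma_2\\ \gamma_3&\gamma_3&\gamma_3\end{pmatrix}.$$
Since $\Gamma p=\boldsymbol\gamma$ for every probability vector $p$, where $\boldsymbol\gamma'=(\gamma_1,\gamma_2,\gamma_3)$, we obtain, finally, the expression
$$Ap=\alpha p+(1-\alpha)\boldsymbol\gamma$$
or
$$Ap=(1-\theta)p+\theta\boldsymbol\gamma,\qquad\theta=1-\alpha.$$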
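Returning to $n\in N$ and $1\le j\le t+1$, we can write
$${}^nv_j(p)={}^n\alpha_jp+(1-{}^n\alpha_j)\,{}^n\boldsymbol\gamma_j$$
or
$${}^nv_j(p)=(1-{}^n\theta_j)p+{}^n\theta_j\,{}^n\boldsymbol\gamma_j,\qquad{}^n\theta_j=1-{}^n\alpha_j,$$
which is valid, obviously, for any $m>1$. From the last form we deduce that the new value ${}^nv_{j,k}(p)$ of the probability for the $k$th response class $A_k$ is a linear function of $p_k$ and does not depend on $p_l$, $l\ne k$, i.e.
$${}^nv_{j,k}(p)={}^n\alpha_jp_k+(1-{}^n\alpha_j)\,{}^n\gamma_{j,k},$$
where ${}^n\gamma_{j,k}$ is the $k$th component of ${}^n\boldsymbol\gamma_j$.

3.2.1.1.3. Let us now examine, for the homogeneous case, the repeated application of a single event matrix operator, say $A_j$. By simple computation we deduce
$$A_j^r=\alpha_j^rI+(1-\alpha_j^r)\Gamma_j,\qquad r\in N^*,$$
so that
$$A_j^rp=\alpha_j^rp+(1-\alpha_j^r)\boldsymbol\gamma_j,$$
which yields that $A_j^rp\to\boldsymbol\gamma_j$ as $r\to\infty$ provided that $\lvert\alpha_j\rvert<1$. Finally, we notice that two different matrices, say $A_{j_1}$ and $A_{j_2}$, commute iff at least one of the following conditions is fulfilled: (a) $\alpha_{j_1}=1$; (b) $\alpha_{j_2}=1$; (c) $\boldsymbol\gamma_{j_1}=\boldsymbol\gamma_{j_2}$. This is a simple consequence of the equality
$$A_{j_1}A_{j_2}-A_{j_2}A_{j_1}=(1-\alpha_{j_1})(1-\alpha_{j_2})(\Gamma_{j_1}-\Gamma_{j_2}).$$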
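The identities of 3.2.1.1.2 and 3.2.1.1.3 can be checked numerically. The sketch below builds $A=\alpha I+(1-\alpha)\Gamma$ for $m=2$ with randomly drawn (hypothetical) probability vectors and verifies $Ap=\alpha p+(1-\alpha)\boldsymbol\gamma$, the formula for $A^r$, the limit $A^rp\to\boldsymbol\gamma$, the commutator identity, and the collapsing property that $KAp$ depends on $p$ only through $p_c=p_1+p_2$ and $p_3$.

```python
import numpy as np


def event_operator(alpha, gamma):
    """A = alpha I + (1 - alpha) Gamma, where every column of Gamma equals the
    probability vector gamma (columns of A sum to 1, as in the text)."""
    gamma = np.asarray(gamma, dtype=float)
    size = len(gamma)
    Gamma = np.tile(gamma[:, None], (1, size))
    return alpha * np.eye(size) + (1.0 - alpha) * Gamma, Gamma


rng = np.random.default_rng(1)
p = rng.dirichlet(np.ones(3))                  # a probability vector
g1, g2 = rng.dirichlet(np.ones(3)), rng.dirichlet(np.ones(3))
a1, a2 = 0.7, 0.4                              # hypothetical alpha_{j1}, alpha_{j2}
A1, G1 = event_operator(a1, g1)
A2, G2 = event_operator(a2, g2)

r = 5
checks = {
    "A p = alpha p + (1 - alpha) gamma": np.allclose(A1 @ p, a1 * p + (1 - a1) * g1),
    "A^r formula": np.allclose(np.linalg.matrix_power(A1, r),
                               a1 ** r * np.eye(3) + (1 - a1 ** r) * G1),
    "A^r p -> gamma": np.allclose(np.linalg.matrix_power(A1, 60) @ p, g1),
    "commutator": np.allclose(A1 @ A2 - A2 @ A1, (1 - a1) * (1 - a2) * (G1 - G2)),
}
# Collapsing A_1 and A_2: K A p must depend on p only through p_1 + p_2 and p_3.
K = np.array([[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
q = np.array([p[0] + p[1], 0.0, p[2]])         # same p_c and p_3 as p
checks["collapsing"] = np.allclose(K @ A1 @ p, K @ A1 @ q)
print(checks)
```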
3.2.1.1.4. It is now possible to introduce the following precise definition.

Definition 3.2.1. A $(t+1)$-operator model $((S,d),E,(v_n)_{n\in N},(\kappa_n)_{n\in N})$ is said to be linear if
$$S=\Bigl\{p'=(p_1,\dots,p_{m+1})\colon 0\le p_i\le1,\ 1\le i\le m+1,\ \sum_{i=1}^{m+1}p_i=1\Bigr\},$$
$${}^nv_j(p)={}^n\alpha_jp+(1-{}^n\alpha_j)\,{}^n\boldsymbol\gamma_j,\qquad{}^n\kappa_j(p)=\sum_{i=1}^{m+1}p_i\,{}^n\pi_{ij},$$
for $1\le j\le t+1$; here ${}^n\boldsymbol\gamma_j$ is a column probability vector. For the sake of simplicity we shall omit the term linear throughout this Section. If it is assumed that some of the $E_j$'s positively reinforce $A_i$ in the weak sense that they do not decrease the probability of $A_i$, we deduce that the corresponding ${}^n\boldsymbol\gamma_j$'s are equal to $e_i=(\delta_{il})_{1\le l\le m+1}$. We can, consequently, give

Definition 3.2.2. A linear $(t+1)$-operator model is said to be with reinforcement if all the ${}^n\boldsymbol\gamma_j$'s, $1\le j\le t+1$, $n\in N$, are probability vectors of the form $e_i$.
Further, let us assume that the subject in a learning experiment is faced with a set of response classes $A_1,\dots,A_{m+1}$ on each trial, and that, following response $A_i$, $1\le i\le m+1$, one of $r+1$ observable outcomes $O_{ik}$, $1\le k\le r+1$, occurs. The outcome probabilities are supposed to depend at most on the most recent response. Let $A_{i,n}$ and $O_{ik,n}$ denote, respectively, the occurrence of $A_i$ and $O_{ik}$ on trial $n$, and denote by ${}^n\pi_{ik}$ the conditional probability $P(O_{ik,n}\mid A_{i,n})$. Such a model can be described by identifying, as previously, $p_n$, the conditional probability vector of $A_{1,n},\dots,A_{m+1,n}$, with the state $S_n$, by identifying the response-outcome pair that occurs on trial $n$ with the event $E_n$, and by making the following stipulations:
$$S=\Bigl\{p'=(p_1,\dots,p_{m+1})\colon 0\le p_i\le1,\ 1\le i\le m+1,\ \sum_{i=1}^{m+1}p_i=1\Bigr\},\qquad d(p,q)=\Bigl(\sum_{i=1}^{m+1}(p_i-q_i)^2\Bigr)^{1/2},$$
$$E=\{(i,k)\colon 1\le i\le m+1,\ 1\le k\le r+1\}\qquad((i,k)\text{ abbreviates }(A_i,O_{ik})),$$
…
for all $1\le i\le m+1$, $1\le k\le r+1$, $n\in N$. This learning model is a variant of the $(t+1)$-operator model and we shall refer to it as the $(m+1)(r+1)$-operator model, that is $t+1=(m+1)\times(r+1)$. Clearly, it can also be interpreted as a BM-model; namely, it is the classical BM-model with experimenter-subject-controlled events.

Notes and comments
The form of the linear model presented in this Paragraph is due mainly to BUSH and MOSTELLER (1955) and NORMAN (1968a), though several aspects are new. Making use of chains of infinite order, in the sense of Complement I to Paragraph 2.3.2.1, LAMPERTI and SUPPES (1959, 1965) established several properties of linear models, notably asymptotic theorems. The results of LAMPERTI and SUPPES (1959) have been extended by BERT (1964, 1968).
3.2.1.2. The $(m+1)^2$-operator model with reinforcement
3.2.1.2.1. Let us consider the homogeneous $(m+1)^2$-operator model with reinforcement. That is, the subject in a learning experiment is faced with the set of response classes $A_1,\dots,A_{m+1}$ on each trial, and, following response $A_i$, $1\le i\le m+1$, one of $m+1$ observable outcomes $O_{ik}$, $1\le k\le m+1$, occurs. It is assumed that $O_{ik}$, $1\le i\le m+1$, positively reinforces $A_k$ in the weak sense. The outcome probabilities are supposed to depend at most on the most recent response. We conclude that this model can be described by making the following stipulations:
$$S=\Bigl\{p'=(p_1,\dots,p_{m+1})\colon 0\le p_i\le1,\ 1\le i\le m+1,\ \sum_{i=1}^{m+1}p_i=1\Bigr\},\qquad d(p,q)=\Bigl(\sum_{i=1}^{m+1}(p_i-q_i)^2\Bigr)^{1/2},$$
$$E=\{(i,k)\colon 1\le i,k\le m+1\},\qquad v_{ik}(p)=(1-\theta_{ik})p+\theta_{ik}e_k,\qquad\kappa_{ik}(p)=p_i\pi_{ik},$$
for all $1\le i,k\le m+1$. In this terminology, the above relations define a $(2m+1)(m+1)$-parameter family of $(m+1)^2$-operator models, one for each choice of $\theta_{ik}$, $1\le i,k\le m+1$, and $\pi_{ik}$, $1\le i\le m+1$, $1\le k\le m$. Since … and …, it is clear that any $(m+1)^2$-operator model satisfies all of the conditions of Definition 3.1.8 except perhaps (IV). For the sake of simplicity, we shall restrict ourselves throughout this Paragraph to four-operator models, i.e. $m=1$. In this case it is simpler to identify the conditional probability $p_n$ of $A_{1,n}$ with the state $S_n$, so that
$$S=[0,1],\qquad d(p,q)=\lvert p-q\rvert,\qquad E=\{(i,k)\colon 1\le i,k\le2\},$$
$$v_{ik}(p)=(1-\theta_{ik})p+\theta_{ik}\delta_{1k},\qquad\kappa_{ik}(p)=\bigl(p\,\delta_{i1}+(1-p)\,\delta_{i2}\bigr)\pi_{ik},$$
for all $1\le i,k\le2$. This amounts to the fact that … and …
3.2.1.2.2. The asymptotic behaviour of the sequence $(p_n)_{n\in N}$¹⁾ associated with the four-operator model depends critically on the number of absorbing states. Proposition 3.2.3 catalogues the absorbing states for such a model.

Proposition 3.2.3. The state 1 is absorbing iff $\theta_{12}=0$ or $\pi_{12}=0$. The state 0 is absorbing iff $\theta_{21}=0$ or $\pi_{21}=0$. A state $p\in(0,1)$ is absorbing iff for each $(i,k)\in E$, $\theta_{ik}=0$ or $\pi_{ik}=0$; in this case all states are absorbing, and the model is said to be trivial.

Proof. A state $p\in(0,1)$ is absorbing iff for any $(i,k)\in E$, either $v_{ik}(p)=p$ (in which case $\theta_{ik}=0$ and $v_{ik}(x)=x$) or $\kappa_{ik}(p)=0$ (in which case $\pi_{ik}=0$ and $\kappa_{ik}(x)=0$). The state 1 is absorbing iff $1-\theta_{12}=v_{12}(1)=1$ or $\pi_{12}=\kappa_{12}(1)=0$. Analogously for the state 0. q.e.d.
The next proposition tells which four-operator models satisfy (IV), Definition 3.1.8.

Proposition 3.2.4. A four-operator model is distance diminishing iff for each $i=1,2$ there exists some $k_i\in\{1,2\}$ such that $\theta_{ik_i}>0$ and $\pi_{ik_i}>0$.

Proof. Suppose that the condition given by the proposition is met. If $p>0$, then $\kappa_{1k_1}(p)=p\pi_{1k_1}>0$ and $\mu(v_{1k_1})=1-\theta_{1k_1}<1$. Similarly, if $p<1$, then $\kappa_{2k_2}(p)>0$ and $\mu(v_{2k_2})<1$. Thus (IV), Definition 3.1.8, is satisfied with $l=1$ for all states. Suppose that the condition fails. Then for some $i\in\{1,2\}$ and all $k\in\{1,2\}$, $\theta_{ik}=0$ or $\pi_{ik}=0$. Since the cases $i=1$ and $i=2$ can be treated similarly, only $i=1$ will be considered. It follows from Proposition 3.2.3, on taking $k=2$, that 1 is an absorbing state. Thus $\kappa_{m_1n_1,\dots,m_ln_l}(1)>0$ implies $m_t=1$ and $\pi_{1n_t}>0$, $1\le t\le l$. But then $\theta_{1n_t}=0$ for $1\le t\le l$ and $\mu(v_{m_1n_1,\dots,m_ln_l})=1$. So (IV), Definition 3.1.8, is not satisfied. q.e.d.
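Propositions 3.2.3 and 3.2.4 reduce to simple tests on the parameters. In the sketch below (our naming; index 0 stands for 1 and index 1 for 2 in the text, with `pi[i][1] = 1 - pi[i][0]`) one step of the four-operator model is also implemented, so that an absorbing state found by the test can be seen to stay fixed. The update is written as $p+\theta_{ik}(\delta_{1k}-p)$, algebraically equal to $(1-\theta_{ik})p+\theta_{ik}\delta_{1k}$, so that fixed points are reproduced without rounding error. The parameter values are hypothetical.

```python
import random


def absorbing_states(theta, pi):
    """Proposition 3.2.3: (list of absorbing boundary states, trivial model?)."""
    states = []
    if theta[0][1] == 0 or pi[0][1] == 0:     # theta_12 = 0 or pi_12 = 0
        states.append(1)
    if theta[1][0] == 0 or pi[1][0] == 0:     # theta_21 = 0 or pi_21 = 0
        states.append(0)
    trivial = all(theta[i][k] == 0 or pi[i][k] == 0 for i in range(2) for k in range(2))
    return states, trivial


def is_distance_diminishing(theta, pi):
    """Proposition 3.2.4: for each i some k with theta_ik > 0 and pi_ik > 0."""
    return all(any(theta[i][k] > 0 and pi[i][k] > 0 for k in range(2)) for i in range(2))


def step(p, theta, pi, rng):
    """One trial: A_1 with probability p (else A_2), then outcome O_ik with
    probability pi_ik, then p -> (1 - theta_ik) p + theta_ik * delta_{1k}."""
    i = 0 if rng.random() < p else 1
    k = 0 if rng.random() < pi[i][0] else 1
    target = 1.0 if k == 0 else 0.0
    return p + theta[i][k] * (target - p)


# Hypothetical parameters with theta_12 = 0, so that state 1 is absorbing.
theta = [[0.2, 0.0], [0.3, 0.1]]
pi = [[0.6, 0.4], [0.5, 0.5]]
print(absorbing_states(theta, pi), is_distance_diminishing(theta, pi))  # ([1], False) True
rng = random.Random(0)
p = 1.0
for _ in range(100):
    p = step(p, theta, pi, rng)
print(p)   # 1.0: the absorbing state is never left
```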
3.2.1.2.3. With one inconsequential exception, distance diminishing four-operator models are either ergodic or absorbing.

Theorem 3.2.5. If neither 0 nor 1 is absorbing for a four-operator model, then $\theta_{ik}>0$ and $\pi_{ik}>0$ for $i\ne k$, and the model is distance diminishing. Either (i) $\theta_{ik}=1$ and $\pi_{ik}=1$ if $i\ne k$, or (ii) the model is ergodic.

Proof. By Proposition 3.2.3, if neither 0 nor 1 is absorbing, then $\theta_{ik}>0$ and $\pi_{ik}>0$ for $i\ne k$, and the model is distance diminishing by Proposition 3.2.4.

1) $p_0=p$ is the initial probability of $A_1$.
Suppose $\pi_{21}<1$. Then, by considering first the case $p=0$, then $p>0$ and $\theta_{12}=1$, and finally $p>0$ and $\theta_{12}<1$, it is seen that $(1-\theta_{12})^np\in T_n(p)$ for all $n\in N^*$. Thus
…
as $n\to\infty$, and the model is ergodic according to Definition 3.1.11. By symmetry the same conclusion is valid if $\pi_{12}<1$. Suppose that $\theta_{12}<1$. Then $(1-\theta_{12})^np\in T_n(p)$ for all $p>0$, $n\in N^*$, and $(1-\theta_{12})^{n-1}\theta_{21}\in T_n(0)$ for all $n\in N^*$. Since both sequences tend to 0, ergodicity follows. The same conclusion follows by symmetry when $\theta_{21}<1$. Thus if (i) does not hold the model is ergodic. q.e.d.

The behaviour of the random sequence $(p_n)_{n\in N}$ when $\theta_{ik}=1$ and $\pi_{ik}=1$ for $i\ne k$ is completely transparent. Starting at $p$, the random sequence moves on its first step to 1 with probability $1-p$ and to 0 with probability $p$, and thereafter alternates between these two extreme states. This cyclic model is of no psychological interest and will be discussed no further. As a consequence of Theorem 3.2.5, all of the theorems of Paragraphs 3.1.2.2 and 3.1.2.3 for ergodic models are valid for noncyclic four-operator models without absorbing states.

Theorem 3.2.6. If a distance diminishing four-operator model has an absorbing state, then it is an absorbing model.
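The description of the cyclic case can be confirmed by simulation. With the hypothetical parameters below ($\theta_{12}=\theta_{21}=1$, $\pi_{12}=\pi_{21}=1$, hence $\pi_{11}=\pi_{22}=0$; the values of $\theta_{11},\theta_{22}$ play no role), the first step sends $p$ to 0 or 1 and every later step switches between the two extreme states. The step function repeats the four-operator update (index 0 standing for 1), and exact fractions are used so that the values 0 and 1 are hit exactly.

```python
import random
from fractions import Fraction


def step(p, theta, pi, rng):
    """Four-operator model: A_1 with probability p (else A_2), outcome O_ik with
    probability pi_ik, then p -> p + theta_ik * (delta_{1k} - p)."""
    i = 0 if rng.random() < p else 1
    k = 0 if rng.random() < pi[i][0] else 1
    target = 1 if k == 0 else 0
    return p + theta[i][k] * (target - p)


def cyclic_path(p0, steps, seed=0):
    """Path of the cyclic model theta_12 = theta_21 = 1, pi_12 = pi_21 = 1."""
    theta = [[Fraction(1, 2), 1], [1, Fraction(1, 2)]]  # theta_11, theta_22: irrelevant
    pi = [[0.0, 1.0], [1.0, 0.0]]                       # pi_11 = pi_22 = 0
    rng = random.Random(seed)
    path = [Fraction(p0)]
    for _ in range(steps):
        path.append(step(path[-1], theta, pi, rng))
    return path


print([str(x) for x in cyclic_path(Fraction(3, 10), 8)])
```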
Proof. The condition given by Proposition 3.2.4 for a four-operator model to be distance diminishing allows four possibilities. These are distinguished by the values of $k_i$, $i=1,2$:
A: $k_1=1$, $k_2=1$;  B: $k_1=2$, $k_2=2$;  C: $k_1=1$, $k_2=2$;  D: $k_1=2$, $k_2=1$.
Proposition 3.2.3 shows that D is inconsistent with the existence of absorbing states. Thus it remains to show that a model is absorbing under A, B or C if there are absorbing states. Under A, $1-(1-\theta_{21})^n(1-p)\in T_n(p)$ for all $n\in N^*$, $p\in[0,1]$, so $d(T_n(p),1)\le(1-\theta_{21})^n\to0$ as $n\to\infty$. This implies that 0 is not an absorbing state. By assumption, however, there exists at least one absorbing state, so 1 is absorbing. But then $\lim_{n\to\infty}d(T_n(p),1)=0$, $p\in[0,1]$,
implies that the model is absorbing. By symmetry the model is also absorbing under B. If 0 is not absorbing, $\pi_{21}>0$ and $\theta_{21}>0$ by Proposition 3.2.3. Thus, if C holds, A does also, and the model is absorbing. If C holds and 1 is not absorbing, the same conclusion follows by symmetry. Condition C implies that $(1-\theta_{22})^np\in T_n(p)$ for $p$ … $0$ is held fixed and $\theta_1$ is permitted to approach 0. We shall see that in this case the asymptotic behaviour of $F_{\theta_1,\theta_2,\kappa}$ differs radically from that obtained previously.

Lemma 3.2.12. If $\theta_2>0$, then
$$\int_Rp^2\,dF_{\theta_1,\theta_2,\kappa}(p)=O(\theta_1^2).$$
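Proof. Taking $\tau=0$ in (3.2.9), we obtain
(3.2.19)
where
$$V^*(p_n,\theta_1)=E(p_{n+1}-p_n\mid p_n)=\theta_1(1-p_n)\kappa(p_n)-\theta_2p_n(1-\kappa(p_n))$$
and
$$M^*(p_n,\theta_1)=E\bigl((p_{n+1}-p_n)^2\mid p_n\bigr)=\theta_1^2(1-p_n)^2\kappa(p_n)+\theta_2^2p_n^2(1-\kappa(p_n)).$$
Letting $n\to\infty$ in (3.2.19), we have
$$0=2\int_0^1p\,V^*(p,\theta_1)\,dF^*_{\theta_1}(p)+\int_0^1M^*(p,\theta_1)\,dF^*_{\theta_1}(p). \qquad (3.2.20)$$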
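Substituting $V^*$ and $M^*$ into (3.2.20) gives
$$\int_0^1\theta_2p^2(2-\theta_2)(1-\kappa(p))\,dF^*_{\theta_1}(p)=2\int_0^1p\,\theta_1(1-p)\kappa(p)\,dF^*_{\theta_1}(p)+\int_0^1\theta_1^2(1-p)^2\kappa(p)\,dF^*_{\theta_1}(p). \qquad (3.2.21)$$
Thus
(3.2.22)
from which Lemma 3.2.12 follows easily. q.e.d.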
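Lemma 3.2.12 can be made explicit when the reinforcement probability does not depend on $p$. The sketch below assumes $\kappa(p)\equiv\kappa$ (a special case chosen for illustration) and the dynamics implicit in $V^*$ and $M^*$: $p_{n+1}=p_n+\theta_1(1-p_n)$ with probability $\kappa$ and $p_{n+1}=(1-\theta_2)p_n$ otherwise. Solving the stationarity equations for the first two moments gives closed forms, and one sees that $\int p\,dF/\theta_1$ and $\int p^2\,dF/\theta_1^2$ converge as $\theta_1\to0$, in line with $\int p^2\,dF=O(\theta_1^2)$.

```python
def stationary_moments(theta1, theta2, kappa):
    """First two stationary moments for constant kappa: with probability kappa
    p -> p + theta1 (1 - p), otherwise p -> (1 - theta2) p."""
    m1 = kappa * theta1 / (kappa * theta1 + (1 - kappa) * theta2)
    num = kappa * (theta1 ** 2 + 2 * theta1 * (1 - theta1) * m1)
    den = 1 - kappa * (1 - theta1) ** 2 - (1 - kappa) * (1 - theta2) ** 2
    return m1, num / den


def small_theta1_limits(theta2, kappa):
    """Limits of m1/theta1 and m2/theta1**2 as theta1 -> 0, read off the formulas above."""
    l1 = kappa / ((1 - kappa) * theta2)
    l2 = (kappa * (1 - kappa) * theta2 + 2 * kappa ** 2) / (
        (1 - kappa) ** 2 * theta2 ** 2 * (2 - theta2))
    return l1, l2


theta2, kappa = 0.2, 0.5                       # hypothetical values
for theta1 in (1e-1, 1e-2, 1e-3, 1e-4):
    m1, m2 = stationary_moments(theta1, theta2, kappa)
    print(theta1, m1 / theta1, m2 / theta1 ** 2)
print(small_theta1_limits(theta2, kappa))      # approximately (5.0, 30.56)
```

The limit of $m_2/\theta_1^2$ coincides with the second moment of $\sum_{n\in N}(1-\theta_2)^nY_n$ with $Y_n$ independent and geometric with parameter $\kappa$, which is consistent with the limit law in Theorem 3.2.13.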
Theorem 3.2.13. For a homogeneous two-operator model with reinforcement, under the hypotheses of Lemma 3.2.12,
$$\lim_{\theta_1\to0}F_{\theta_1,\theta_2,\kappa}(\theta_1x)=\mathop{\ast}_{n\in N}G_{\kappa(0)}\bigl(x/(1-\theta_2)^n\bigr),$$
where $G_m(y)$ is the geometric distribution with saltus $(1-m)m^k$ at $y=k$, $k\in N$.

Proof. Paralleling the derivation of (3.2.12), we obtain
$$\int_Re^{ixt}\,dH_{\theta_1}(x)=\int_Re^{ixt}\,Y^*(\theta_1x,\theta_1)\,dH_{\theta_1}(x), \qquad (3.2.23)$$
where
$$Y^*(p,\theta_1)=E\bigl(\exp(i\theta_1^{-1}(p_{n+1}-p_n)t)\mid p_n=p\bigr)=\exp(it(1-p))\kappa(p)+\exp(-it\theta_1^{-1}\theta_2p)(1-\kappa(p)),$$
so that
$$Y^*(\theta_1x,\theta_1)=e^{it(1-\theta_1x)}\kappa(\theta_1x)+e^{-it\theta_2x}(1-\kappa(\theta_1x))=e^{it}\kappa(0)+e^{-it\theta_2x}(1-\kappa(0))+O(\theta_1x). \qquad (3.2.24)$$
Substituting (3.2.24) in (3.2.23) and noting that $\int_R\lvert x\rvert\,dH_{\theta_1}(x)$ is a bounded function of $\theta_1$ as a consequence of Lemma 3.2.12, we find that
$$\int_Re^{ixt}\,dH_{\theta_1}(x)=e^{it}\kappa(0)\int_Re^{ixt}\,dH_{\theta_1}(x)+(1-\kappa(0))\int_Re^{ix(1-\theta_2)t}\,dH_{\theta_1}(x)+O(\theta_1),$$
It follows that $H_{\theta_1}$ converges as $\theta_1\to0$ to the distribution function $H$ whose characteristic function $\phi_H$ satisfies the functional equation
$$\phi_H(t)=\frac{(1-\kappa(0))\,\phi_H((1-\theta_2)t)}{1-\kappa(0)e^{it}},$$
i.e.,
$$\phi_H(t)=\prod_{n\in N}\frac{1-\kappa(0)}{1-\kappa(0)e^{i(1-\theta_2)^nt}}.$$
Since $H$ is an infinite convolution of purely discrete distributions, it is either purely discrete or singular or absolutely continuous.

3.2.1.3.4. Let us now extend Theorems 3.2.11 and 3.2.13. Denote by $f$ a function defined on a rectangle D = [0, 1] × [0, …
for all pe[O, 1) and 0e(O,O
for all pE [O, 1),
(3.2.29)
f(p,O)=p
for all pe[O,l],
(3.2.30)
throughout in D,
(3.2.31)
ao f)
-f(p,0)>0
op
240 i)
for all 0e(O,c5], pe[O,1].
opf(p,0)0, 0 2 >0 and K, where
has a limiting distribution function F 61 , 62 ,",f as n-+oo (Theorem 2.1.36). The analogue of Theorem 3.2.13 is Theorem 3.2.14. If f satisfies (3.2.25) - (3.2.32), then for 0 2 > 0
,(a:,),
.1!!1. F 8 ,.,,••• J (;),
where
az
-oiM(-r,O) 0
2 (T
=
I
a2
4 -V(,,O)
1·
cpa0
and M(p,0)=u 2 (p,0)K(p)+u 2 (1-p, (0)(1-K(p)), V(p,0)
= K(p)u(p, 0)-(1-K(p))u(l - p, ( 0).
(3.2.38)
Proof In view of (3.2.36) the argument on Wat the beginning of the
c
proof of Lemma 3.2.10 can be applied directly to -
ae
existence and uniqueness of the root. of (3.2.37). I}
. d For hnear mo els, e.g.,
(;2 opo(J
f
(p, 8) = -1.
V(p,O) to yield the
Much as in the proof of Lemma 3.2.10 we obtain (3.2.10), where F6 ,, 6 ,K now is F6 ,, 6 ,K,f· Writing
where p* is between p and rand O is uniform in p, we obtain
-1
2
a2
(p-r) opo0 V(p*,O)dFo,{8,K,J(p)= 0(0).
But
a2
sup - - V(p,O)21Pn=P), i.e. by (3.2.38), Y(p,0,t) by (3.2.11), and G8 ,{ 8 ,K,f by (3.2.13), we have (3.2.12) and (3.2.14) just as in the proof of Theorem 3.2.11. Again we have 3
£.(1Pn+1-Pnl IPn=p)=0(0
3
)
uniformly in p. Substituting this and the expansions V(p,0)
= 0(p-,)
;: v(,,0)+02 0 0
o(l It should be noted that if k----+oo, we obtain as limiting probability distribution function of y the normal one; this follows as a special case of a more general result given at the end of Subpara_graph 3.2.1.3.2.
we obtain sin[2u(l
-r l/k) 2-(n- lJ/kJ
---------c------:--:-:-,------
2u(l-2-lt")2- 1, then for experimenter-controlled events one can calculate, e.g., the so-called marginal vector moments, and extend to these moments the properties given for m = 1 [BUSH and MOSTELLER ( 1955)].
Complements
The procedure described in Subparagraph 3.2.2.1.3 can be generalized to t > 1. If a 1 = · · · = at+ 1 = a, then the characteristic function f becomes
Assuming that y1 O,
Xn ➔ -
if $\pi\log\beta+(1-\pi)\log\gamma$ … The cases $c>0$ and $c<0$ follow immediately from (3.3.8), (3.3.9) and the equation (3.3.10). In the case $c=0$, note that $E_pY_i=0$. It is known [CHUNG and FUCHS (1951)] that the sums $X_n$ are then recurrent; that is, they repeatedly take on values arbitrarily close to any possible value. In particular, $X_n$ repeatedly takes on arbitrarily large and arbitrarily small values ($P_p$-a.s.), which, upon recalling (3.3.9), proves the second statement. The third statement is a consequence of the central limit theorem, which implies that for any $A$, $P_p(X_n>A)$ and $P_p(X_n<-A)$ both converge to one-half as $n\to\infty$. Again the assertion of the theorem follows from this fact and (3.3.10). q.e.d.
3.3.1.3.2. Let us now examine the case when the probability of reinforcement depends only on the immediately preceding response; we have what is called simple contingent reinforcement. Further, let us set $\pi_1=\pi_{11}$ and $\pi_2=\pi_{21}$, and let $\beta$ and $\gamma$ be specified as in (3.3.6). Using (3.3.9), define the random variable $X_n$ recursively (note that $\log\gamma$ appears first, since $\log\gamma>0$ and $\log\beta<0$, in order most directly to apply Proposition 3.3.4)
Xn+l
X" + logy { = Xn+log/J
with probability 0 throughout S' = {(p,p'): p>0, p' >0},
so to complete the verification of (IV), Definition 3.1.8, it suffices to show that if (p, p') ES' there exist a l ~ 2 and e1 , ••• , e1_ 1 such that ve •... e1-1(p,p')ES' and Ke1 ... e,_,(p,p')>0. If O0 and l'.4c 8 (0,p')eS', while if 00 and l'.4 8 (p,0)ES Finally l'.4c 8 (0, 0) has positive first and null second coordinate, so t'.4c 8 ,A 8 (0, 0)E S' and K.4 8 (r.4c 8 (0,0))>0. Since K.4c 8 (0,0)>0 the latter inequality implies KAcB,AB(Q,Q) >0. The above argument shows that for any (p, p')E S there exists a point sPP' E T 1 1 (p,p') n S'. Since l'.4w maps S' into S' it follows that 1 r~~~ds 11 P·) 'E Tn+I- i(p,p') for nEN, where z:~w is the nth iterate of r.1w• i.e. 1
1) No notational confusion is possible concerning the $n$th iterate.
3.3. Nonlinear models
$$v^0_{AW}(s)=s,\qquad v^{j+1}_{AW}=v_{AW}\circ v^j_{AW},\qquad j\in N.$$
Since for any $(q,q')\in S$ and $n\in N$,
$$v^n_{AW}(q,q')=\bigl(1-(1-q)\alpha_1^n,\ 1-(1-q')\alpha_3^n\bigr),$$
it follows that
$$d\bigl(T_{n+l-1}(p,p'),(1,1)\bigr)\le d\bigl(v^n_{AW}(s_{pp'}),(1,1)\bigr)\le(\alpha_1^{2n}+\alpha_3^{2n})^{1/2}\to0$$
as $n\to\infty$. Since $(1,1)$ is obviously an absorbing state, the verification of (VI), Theorem 3.1.13, is complete. q.e.d.

3.3.2.2.2. We can also prove
Theorem 3.3.13. For any discrimination model of type I,
$$\lim_{n\to\infty}P_n(A)=1$$
and … There exist two positive constants $\alpha<1$ and $c$ such that
IIE.(P~
1
(A)P; 2 (WIA))- l II~ c((vi + v~) 112 + l)cxn
for all real $v_1,v_2\ge1$, $n\in N^*$. The total number $B^*$ of B responses is finite $P_{p,p'}$-a.s. and $\lVert E_\cdot B^*\rVert<\infty$. If $\alpha_1=\alpha_2=1-\theta$ and $\alpha_3=\alpha_4=1-\theta'$, then … $1-p$ …
EO and cil>O for all 1 ~ j, f ~ 2 is ergodic.
Proof. It is clear that if $S_n<M$, then the sample on trial $n$ will contain elements conditioned to $A_2$ with positive probability. Given such a sample, $A_{2,n}$ will occur and be followed by $O_{21,n}$ with positive probability, and conditioning will be effective with positive probability. Thus $S_{n+1}>S_n$ with positive probability, and it follows that the state $M$ can be reached from any state. Thus there is only one ergodic set, and it contains $M$. Furthermore, if $S_n=M$, then $A_{1,n}$ occurs with probability 1, and $O_{11,n}$ follows with positive probability. So $S_{n+1}=M$ with positive probability, and the ergodic set is regular. q.e.d.
3.3.3.2.2. It follows from Theorems 3.1.21 and 3.3.15 that the conclusions of Theorem 3.1.17 are available for any fixed sample size model with $0<$ … for all $j$ and $f$. Letting $D$ be the subset $D=\{(i,j,f,l)\colon j=1\}$ of $E$, and $A_n=\chi_D(E_n)$, the conclusions of Theorem 3.1.17 include a law of large numbers, a central limit theorem and a law of the iterated logarithm for the number
$$\sum_{j=m}^{m+n-1}A_j$$
of $A_1$ responses in the $n$-trial block starting on trial $m$. A simple expression for $\sigma^2=\sigma_1^2(\chi_D)$ can be readily calculated for the pattern model ($v=1$) with equal $c_{jf}$ under noncontingent reinforcement. We have

Theorem 3.3.16. A fixed sample size model with $v=1$, … $>0$ is ergodic. For all $m\in N^*$ and $s\in S$, the random variable
$$\frac1n\sum_{j=m}^{m+n-1}A_j$$
converges as $n\to\infty$ to $\pi_{11}$ both in $P_s$-quadratic mean and $P_s$-a.s. There exist two positive constants $\rho<1$ and $K$ such that
…
for all $s\in S$, $m,n\in N^*$, $x\in R$, where
$$\sigma^2=\pi_{11}(1-\pi_{11})\Bigl(1+\frac{2(1-c)}{c}\Bigr).$$
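The value of $\sigma^2$ above is the series $\mathrm{Var}(A_1)+2\sum_{n\ge1}\mathrm{Cov}(A_1,A_{n+1})$ for the stationary sequence $(A_n)$, with the covariances $\pi_{11}(1-\pi_{11})(1-c)^n$ quoted from ATKINSON and ESTES in the proof. The sketch below (parameter values hypothetical) sums that series numerically, compares it with the closed form, which equals $\pi_{11}(1-\pi_{11})(2-c)/c$, and shows that $\mathrm{Var}\bigl(\sum_{j=1}^nA_j\bigr)/n$ approaches $\sigma^2$ as $n$ grows.

```python
def sigma2_closed_form(pi11, c):
    """sigma^2 = pi11 (1 - pi11) (1 + 2 (1 - c) / c) from Theorem 3.3.16."""
    return pi11 * (1 - pi11) * (1 + 2 * (1 - c) / c)


def sigma2_series(pi11, c, terms=10_000):
    """Var(A_1) + 2 * sum_{n >= 1} Cov(A_1, A_{n+1}), Cov = pi11 (1 - pi11) (1 - c)^n."""
    var = pi11 * (1 - pi11)
    return var + 2 * sum(var * (1 - c) ** n for n in range(1, terms + 1))


def block_variance(pi11, c, n):
    """Var(A_1 + ... + A_n) / n for a stationary sequence with these covariances."""
    var = pi11 * (1 - pi11)
    total = n * var + 2 * sum((n - k) * var * (1 - c) ** k for k in range(1, n))
    return total / n


pi11, c = 0.7, 0.3
print(sigma2_closed_form(pi11, c), sigma2_series(pi11, c))   # both approximately 1.19
for n in (10, 100, 1000):
    print(n, block_variance(pi11, c, n))
```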
If le N* is sufficiently large, then
Proof. Ergodicity follows from Theorem 3.3.15, so Theorem 3.1.17 is applicable. According to ATKINSON and ESTES [1963, pp. 172-176, especially formulas (37) and (41)], we have
$$E_\infty\chi_D=\pi_{11},\qquad E_\infty A_1A_{n+1}-(E_\infty\chi_D)^2=\pi_{11}(1-\pi_{11})(1-c)^n.$$
These formulas permit the computation of in (3.1.9) and of