Random Processes and Learning 3642461867, 9783642461866


English Pages 304 [322] Year 2012


Die Grundlehren der mathematischen Wissenschaften in Einzeldarstellungen mit besonderer Berücksichtigung der Anwendungsgebiete, Band 150

Edited by J. L. Doob · E. Heinz · F. Hirzebruch · E. Hopf · H. Hopf · W. Maak · S. MacLane · W. Magnus · D. Mumford · M. M. Postnikov · F. K. Schmidt · D. S. Scott · K. Stein

Managing editors: B. Eckmann and B. L. van der Waerden

M. Iosifescu · R. Theodorescu

Random Processes and Learning

Springer-Verlag New York Inc. 1969

Prof. Dr. Marius Iosifescu Academy of the Socialist Republic of Romania Centre of Mathematical Statistics, Bucharest

Prof. Dr. Radu Theodorescu Academy of the Socialist Republic of Romania Centre of Mathematical Statistics, Bucharest visiting professor at Laval University, Department of Mathematics, Quebec

Managing editors:

Prof. Dr. B. Eckmann, Eidgenössische Technische Hochschule Zürich

Prof. Dr. B. L. van der Waerden, Mathematisches Institut der Universität Zürich

All rights reserved. No part of this book may be translated or reproduced in any form without written permission from Springer-Verlag. © by Springer-Verlag Berlin · Heidelberg 1969. Library of Congress Catalog Card Number 68-54828. Printed in Germany. Title No. 5133

Preface

The aim of the present monograph is two-fold: (a) to give a short account of the main results concerning the theory of random systems with complete connections, and (b) to describe the general learning model by means of random systems with complete connections.

The notion of chain with complete connections was introduced in probability theory by ONICESCU and MIHOC (1935a). These authors set out to define a very broad type of dependence which takes into account the whole history of the evolution and thus includes the Markovian one as a special case. In a series of papers from the period 1935-1937, ONICESCU and MIHOC developed the theory of these chains for the homogeneous case with a finite set of states from different points of view: ergodic behaviour, associated chain, limit laws. These results led to a chapter devoted to these chains, inserted by ONICESCU and MIHOC in their monograph published in 1937.

Important contributions to the theory of chains with complete connections are due to DOEBLIN and FORTET and date from the period 1937-1940. They consist in the treatment of chains with an infinite history (the so-called chains of infinite order) and in the use of methods from functional analysis. After World War II, instrumental works in the development of the theory of chains with complete connections were those by IONESCU TULCEA and MARINESCU, who systematically used methods from functional analysis. Other authors also contributed to this theory; it suffices to look at the bibliographical references given at the end of this monograph. The theory of chains with complete connections, despite the numerous authors who contributed to it, nevertheless remains a Romanian creation 1).

1) See, e.g., the monograph published by CIUCU and THEODORESCU (1960), which contains the results available up to the end of 1959; see also the expository paper by THEODORESCU (1960b).

An important fact to be mentioned is the publication of the monograph devoted to stochastic models for learning by BUSH and MOSTELLER in 1955. Apparently having no relation to the theory of chains with complete connections, this monograph describes the most current stochastic models for learning to be met in practice. As proved later, there exists a tight relationship between mathematical learning theory and the theory of chains with complete connections, in the sense that the model described by the subject-controlled events considered by BUSH and MOSTELLER (1955) reduces to the (Markov) chain associated with a linear chain with complete connections with a finite set of states. Obviously, the remaining models discussed by BUSH and MOSTELLER can also be reduced to schemes somewhat generalizing these chains. Among the works devoted to the application of the theory of chains with complete connections to mathematical learning theory (possibly the most important field of application of this theory) an important contribution is due to NORMAN (1966c, 1968a,c), whose results play an important part in the present monograph.

The monograph is divided into three chapters. The first two chapters are devoted to the general theory of random processes, especially of random systems with complete connections. In Chapter 3 this theory is applied to the general learning model. It is intended for mathematicians with a good background in modern probability theory, as well as for applied researchers working in the field of learning.

The numbering used is the following: a.b.c.d.e, where a indicates the chapter, b the section, c the subsection, d the paragraph and e the subparagraph; definitions, theorems, lemmas and propositions are numbered a.b.c, where a indicates the chapter and b the section. We have made a sincere effort to provide every paragraph with appropriate bibliographical notes and comments; any inaccuracy or omission in assigning priorities is wholly unintentional and deeply regretted. Where necessary we have also inserted complements which supplement the theory developed. The bibliographical references mentioned at the end of the book are exhaustive as far as Romanian contributions are concerned.
Certain parts of this book were previously distributed, and we are indebted to many friends and colleagues who proposed improvements during this period. We are particularly indebted to Professor KLAUS KRICKEBERG for his careful reading of the manuscript and for making a number of valuable suggestions. We wish also to express our appreciation to Springer-Verlag for their most efficient handling of the publication of this book. December 1967 Bucharest

MARIUS IOSIFESCU

RADU THEODORESCU

Contents

Chapter 1. A study of random sequences via the dependence coefficient

1.1. The general case
1.1.1. The dependence coefficient
1.1.1.1. Borel-Cantelli type properties
1.1.1.2. The 0-1 law
1.1.1.3. Two auxiliary results
1.1.2. Generalizations of Bienaymé's equality
1.1.2.1. Some inequalities concerning the covariance
1.1.2.2. Applications to the variance of sums
1.1.3. Convergence of series
1.1.3.1. The a.s. convergence
1.1.3.2. The strong law of large numbers
1.1.4. The central limit theorem
1.1.4.1. The variance of sums
1.1.4.2. Different variants
1.1.5. The law of the iterated logarithm
1.1.5.1. Two auxiliary results
1.1.5.2. The main theorem

1.2. The Markovian case
1.2.1. The coefficient of ergodicity
1.2.1.1. Introductory definitions
1.2.1.2. Properties
1.2.1.3. The relationship to the independence coefficient
1.2.2. A lower bound for the variance
1.2.2.1. The main theorem
1.2.2.2. Auxiliary results
1.2.3. Asymptotic properties
1.2.3.1. Borel-Cantelli lemma and the 0-1 law
1.2.3.2. Generalizations of Bienaymé's equality
1.2.3.3. Convergence of series
1.2.3.4. The strong law of large numbers
1.2.3.5. The central limit theorem and the law of the iterated logarithm

Chapter 2. Random systems with complete connections

2.1. Ergodicity
2.1.1. Basic definitions
2.1.1.1. The concept of random system with complete connections
2.1.1.2. The associated Markov system
2.1.1.3. The associated operators
2.1.2. Different types of ergodicity
2.1.2.1. Definitions and auxiliary results
2.1.2.2. Uniform ergodicity in the weak sense
2.1.2.3. Uniform ergodicity for the homogeneous case
2.1.2.4. Uniform ergodicity in the strong sense
2.1.2.5. Application to multiple Markov chains
2.1.2.6. Application to the associated Markov system
2.1.3. An operator-theoretical approach
2.1.3.1. Mean and uniform ergodic theorems
2.1.3.2. Ergodic theorems for a special class of operators
2.1.3.3. Application to the associated Markov system
2.1.3.4. Application to the ergodicity of homogeneous random systems with complete connections

2.2. Asymptotic behaviour
2.2.1. Properties not supposing the ergodicity
2.2.1.1. Borel-Cantelli lemma and the 0-1 law
2.2.1.2. Convergence and the strong law of large numbers
2.2.2. Properties supposing the ergodicity
2.2.2.1. Basic results
2.2.2.2. The strong law of large numbers
2.2.2.3. The weak law of large numbers
2.2.2.4. The central limit theorem
2.2.2.5. The law of the iterated logarithm
2.2.2.6. Some nonparametric statistics

2.3. Special random systems with complete connections
2.3.1. OM-chains
2.3.1.1. Basic definitions
2.3.1.2. Examples
2.3.1.3. Ergodic theorems
2.3.1.4. The Monte Carlo simulation
2.3.1.5. The case of an arbitrary set of states
2.3.2. Chains of infinite order
2.3.2.1. Definition and several special cases
2.3.2.2. An existence theorem
2.3.2.3. The case of a finite set of states
2.3.3. Other examples
2.3.3.1. Partially observable sequences
2.3.3.2. Miscellanea

Chapter 3. Learning

3.1. Basic models
3.1.1. Introductory definitions and notions
3.1.1.1. Description of models
3.1.1.2. The simulation of models
3.1.2. Distance diminishing models
3.1.2.1. Description of the model
3.1.2.2. Theorems concerning states
3.1.2.3. Theorems concerning events
3.1.3. Finite state models
3.1.3.1. Introductory comments
3.1.3.2. Properties

3.2. Linear models
3.2.1. The (t+1)-operator model
3.2.1.1. Description of the model
3.2.1.2. The (m+1)²-operator model with reinforcement
3.2.1.3. The limiting distribution function
3.2.2. Experimenter-, subject- and experimenter-subject-controlled events
3.2.2.1. Experimenter-controlled events
3.2.2.2. Subject-controlled events
3.2.2.3. Experimenter-subject-controlled events

3.3. Nonlinear models
3.3.1. The beta model
3.3.1.1. Description of the model
3.3.1.2. Some auxiliary results
3.3.1.3. Properties
3.3.2. The simultaneous discrimination learning model
3.3.2.1. Description of the model
3.3.2.2. Properties
3.3.3. The fixed sample size model
3.3.3.1. Description of the model
3.3.3.2. Properties

Bibliography
Notation index
Author and subject index

CHAPTER 1

A study of random sequences via the dependence coefficient

This chapter is concerned with a study of sequences of dependent random variables by means of a dependence coefficient, following the classical approach used for sequences of independent random variables. Section 1.1 deals with the general case, whereas Section 1.2 deals with the Markovian one; there the results obtained in Section 1.1 are transcribed in terms of the well-known coefficients of ergodicity. At the same time, several results which have no analogues in the preceding theory are given.

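1.1. The general case

1.1.1. The dependence coefficient

1.1.1.1. Borel-Cantelli type properties

1.1.1.1.1. Let $(\Omega,\mathcal{K},P)$ be a probability space and $\mathcal{K}_1$ and $\mathcal{K}_2$ two $\sigma$-algebras contained in the $\sigma$-algebra $\mathcal{K}$. Define the dependence coefficient of the $\sigma$-algebras $\mathcal{K}_1$ and $\mathcal{K}_2$ by the relation

$$\varphi(\mathcal{K}_1,\mathcal{K}_2)=\sup_{B\in\mathcal{K}_2}\Big(\operatorname{ess\,sup}_{\omega\in\Omega}\,|P(B\mid\mathcal{K}_1)(\omega)-P(B)|\Big).$$

Define also the independence coefficient of the $\sigma$-algebras $\mathcal{K}_1$ and $\mathcal{K}_2$ by the relation

$$\alpha(\mathcal{K}_1,\mathcal{K}_2)=1-\sup_{B\in\mathcal{K}_2}\Big(\operatorname{ess\,sup}_{\omega\in\Omega}P(B\mid\mathcal{K}_1)(\omega)-\operatorname{ess\,inf}_{\omega\in\Omega}P(B\mid\mathcal{K}_1)(\omega)\Big). \tag{1.1.1}$$

It is easy to see that

$$0\le\alpha(\mathcal{K}_1,\mathcal{K}_2),\ \varphi(\mathcal{K}_1,\mathcal{K}_2)\le 1,\qquad \tfrac12\,[1-\alpha(\mathcal{K}_1,\mathcal{K}_2)]\le\varphi(\mathcal{K}_1,\mathcal{K}_2)\le 1-\alpha(\mathcal{K}_1,\mathcal{K}_2). \tag{1.1.2}$$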
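For finite $\sigma$-algebras the two coefficients can be evaluated by brute force. The following is only an illustrative sketch, under the assumption that $\mathcal{K}_1$ and $\mathcal{K}_2$ are generated by finite partitions and that the table `joint[i, j]` (a name chosen here, not in the text) holds the probability of the intersection of atom $i$ of $\mathcal{K}_1$ with atom $j$ of $\mathcal{K}_2$.

```python
from itertools import combinations

import numpy as np


def coefficients(joint):
    """Return (phi, alpha) for sigma-algebras generated by two finite partitions.

    joint[i, j] = P(atom i of K1 and atom j of K2); illustrative layout only.
    """
    joint = np.asarray(joint, dtype=float)
    row = joint.sum(axis=1)
    keep = row > 0                          # ess sup / ess inf ignore null atoms
    cond = joint[keep] / row[keep, None]    # P(atom j of K2 | atom i of K1)
    marg = joint.sum(axis=0)                # P(atom j of K2)
    m = joint.shape[1]
    phi, spread = 0.0, 0.0
    # Every B in K2 is a union of atoms; the empty set and Omega contribute 0.
    for r in range(1, m):
        for B in combinations(range(m), r):
            idx = list(B)
            c = cond[:, idx].sum(axis=1)    # P(B | K1) on each atom of K1
            phi = max(phi, float(np.max(np.abs(c - marg[idx].sum()))))
            spread = max(spread, float(c.max() - c.min()))
    return phi, 1.0 - spread


phi, alpha = coefficients([[0.3, 0.1], [0.1, 0.5]])
print(phi, alpha)
```

On this small table the double inequality (1.1.2) can be checked directly from the returned pair; for independent partitions the function returns $\varphi=0$ and $\alpha=1$.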


If %/,%{ are a-algebras contained in % such that ~ c % 2, then ef,(~',.Y4), r

!

~ s) ~ 1-

-r,

and hence P(suplSs+z-Ss+kl l>k

~ 2s) ~ 1 -

!

-r,

for every s > r(e, c5). Since c5 can be taken arbitrarily small, we may write

$$\lim_{s\to\infty}P\Big(\bigcup_{l>k}\{|S_{s+l}-S_{s+k}|\ge 2\varepsilon\}\Big)=0,$$

which concludes the proof. q.e.d.

We deduce from Theorem 1.1.12 the following Corollary. Let the random variables f be :.t:-measurable nEN*. Assume that there exists n0 EN* such that " " '

lim

n ➔ 00

Then the series absolutely a.s.

L neN*

(\/= ~' i =\j n+ i

1

no

… $=\Phi(x)-\Phi(kx)=\dfrac{1}{\sqrt{2\pi}}\displaystyle\int_{kx}^{x}e^{-u^{2}/2}\,du$

and, obviously, …

It is easy to see that for any $x\in R$

$$|x|\,e^{-x^{2}/2}\le\frac{1}{\sqrt{e}},\qquad |x|\,e^{-k^{2}x^{2}/2}\le\frac{1}{k\sqrt{e}},$$

which concludes the proof. q.e.d.
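The two elementary bounds used at the end of this proof follow from maximizing $|x|e^{-k^{2}x^{2}/2}$, whose maximum is attained at $|x|=1/k$. A small numerical sketch (the grid and the function name are choices made here for illustration) confirms the value $1/(k\sqrt{e})$:

```python
import numpy as np


def sup_abs_x_gauss(k, half_width=10.0, points=200001):
    """Grid approximation of sup_x |x| * exp(-(k*x)**2 / 2)."""
    x = np.linspace(-half_width, half_width, points)
    return float(np.max(np.abs(x) * np.exp(-((k * x) ** 2) / 2.0)))


for k in (1.0, 2.0, 5.0):
    print(k, sup_abs_x_gauss(k), 1.0 / (k * np.sqrt(np.e)))
```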

1.1.4.1.2. In this paragraph $(f_n)_{n\in N^*}$ will denote a strictly stationary sequence of real random variables. We suppose that $Ef_1=0$. For $A\subset N^*$ let $\mathcal{K}\{(f_n)_{n\in A}\}=\mathcal{K}_A$ be the $\sigma$-algebra generated by the family of random variables $(f_n)_{n\in A}$. In particular we shall write $A=[m,n]$ for $A=\{i: m\le i\le n\}$ and $A=[n,\infty)$ for $A=\{i: i\ge n\}$. Set

$$\varphi(n)=\sup_{r\in N^*}\varphi\big(\mathcal{K}_{[1,r]},\mathcal{K}_{[r+n,\infty)}\big).$$

The sequence $(\varphi(n))_{n\in N^*}$ is, obviously, nonincreasing. Let us consider the condition

$$\sum_{n\in N^*}\varphi^{1/2}(n)<\infty. \tag{1.1.10}$$

Proposition 1.1.20. If (1.1.10) holds and $Ef_1^2<\infty$, then the series

$$\sigma^2=Ef_1^2+2\sum_{n\in N^*}Ef_1f_{n+1}$$

converges absolutely. We have $\sigma^2\ge 0$ and

$$ES_n^2=n(\sigma^2+\rho_n)$$

with $\rho_n=o(1)$ as $n\to\infty$.

Proof. The absolute convergence follows from (1.1.10) and the inequality $|Ef_1f_{n+1}|\le 2\varphi^{1/2}(n)\,Ef_1^2$, which is a consequence of Lemma 1.1.7. … where … as $n\to\infty$. We conclude that $\sigma^2\ge 0$. q.e.d.

1.1.4.1.3. Let us set …
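As a numerical companion to Proposition 1.1.20, the sketch below evaluates $\sigma^2=Ef_1^2+2\sum_n Ef_1f_{n+1}$ and $ES_n^2/n$ exactly for a stationary finite Markov chain, taking $f_n=\psi(\xi_n)$ centred at its stationary mean. The chain, the function names and the truncation of the series at a finite lag are assumptions made here for illustration; they are not taken from the text.

```python
import numpy as np


def autocovariances(P, psi, n_lags):
    """covs[k] = E f_1 f_{k+1} for the stationary chain with matrix P, f = psi - mean."""
    P = np.asarray(P, dtype=float)
    psi = np.asarray(psi, dtype=float)
    evals, evecs = np.linalg.eig(P.T)
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    pi = pi / pi.sum()                      # stationary distribution
    f = psi - pi @ psi                      # centred so that E f_1 = 0
    covs, g = [], f.copy()
    for _ in range(n_lags + 1):
        covs.append(pi @ (f * g))           # E f_1 (P^k f)(xi_1)
        g = P @ g
    return np.array(covs)


def sigma2(covs):
    """Truncated series E f_1^2 + 2 * sum_n E f_1 f_{n+1}."""
    return covs[0] + 2.0 * covs[1:].sum()


def second_moment_of_sum(covs, n):
    """E S_n^2 = n c_0 + 2 sum_{j=1}^{n-1} (n - j) c_j (needs n - 1 <= len(covs) - 1)."""
    j = np.arange(1, n)
    return n * covs[0] + 2.0 * np.sum((n - j) * covs[j])


covs = autocovariances([[0.7, 0.3], [0.2, 0.8]], [0.0, 1.0], n_lags=200)
print(sigma2(covs), second_moment_of_sum(covs, 150) / 150)
```

For this two-state example the covariances decay geometrically, so the truncation error is negligible and $ES_n^2/n$ approaches $\sigma^2$ at rate $1/n$, in line with $\rho_n=o(1)$.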

Lemma 1.1.21. (l.1.10) holds, u#O, and El/11 2 H< 00 for some 0 0 there exist suitable a 1 and k such that (1.1.11)

for all nE N*. We have (1.1.12)

~ Elsn 1 + c} + 2

for all

nE N*.

Els 1 + n

2

c}

+ 2 Elsn 1 +{JI sn I + 2 Elsn 11 sn 1 + 1

1

c}


The strict stationarity implies

EISnl 2 H = EISnl 2 H =en• By Lemma 1.1.7 first with p = (2 + b)(l + b)- 1 and then with p = 2 + b we have 1+6

El Sn I1 HI Sn I~ 2[ n0 ) we shall have 6

C2

2 1 +2 n~(2+e2)cn+2a 1 (ESn) .

Further, there obviously exists a 2 (~2a 1 ) so that (1.1.15) is valid for n~n 0 ; thus (1.1.15) holds for all neN*. In virtue of (l.1.15), and taking into account Proposition 1.1.20, for every reN* 2 2 21 C2r~(2+r.rc1 +:)j : +p ) · j= 0 2 1 + 2 (j + P2r- I

+a2(ESir-l+¾(rf (

Therefore, for e sufficiently small there is a suitable a3 such that 2

C2r~(2+e)' C1 +a3(ES2r- 1)

for all

rE N*.

1

{>

+2

It follows that

Thus, for a suitable a 4 , (1.1.16)

for all rE N*. Let 2r~n(x)

uniformly with respect to xER. Proof. See

IBRAGIMOV ( 1962).

Theorem 1.1.23 may be extended as follows. Let ~ be a real valued random variable. Then ~ can be represented in the form ~=F[(/11 ) 11 eN•J, where Fis a g1N•-measurable real-valued function defined on RN•. Consider the strictly stationary sequence (g 11 )neN* with

. ~ 1 .oc 1-measurable

We have Theorem 1.1.24. If E~=O, E, 2 oo

lim

n ➔ oo [IBRAGIMOV

(1959, 1962)].

P(~ iro-s:

< x) = 4>(x).

< oo for some c5>0,

4. If $(f_n)_{n\in N^*}$ is only a wide-sense stationary sequence of random variables such that $|f_n|\le C$, $n\in N^*$, and $\sum_{n\in N^*}n^{\varepsilon}\varphi(n)<\infty$ for some $\varepsilon>0$, then a theorem corresponding to Theorem 1.1.22 may be proved. For this one must use a direct domination for $E|g_1|^3$ ($\le Cp\,Eg_1^2$) and choose $p$, $q$ and $\tau$ adequately [IOSIFESCU (1963d)].

1.1.5. The law of the iterated logarithm

1.1.5.1. Two auxiliary results

1.1.5.1.1. Here the sequence $(f_n)_{n\in N^*}$ is supposed to satisfy the same conditions as those mentioned at the beginning of Subparagraph 1.1.4.1.2. First, we have

Proposition 1.1.25. Suppose that (1.1.10) holds and u;i=O. If 0

Proof According to Theorem 1.1.22 we may write

For every $t>0$ we have

$$\int_t^{\infty}e^{-x^{2}/2}\,dx=\frac{1}{t}\,e^{-t^{2}/2}\Big(1-\frac{\theta}{t^{2}}\Big),$$

where $0<\theta<1$.
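The Gaussian tail identity just quoted can be checked numerically by solving it for $\theta$. The sketch below does this with the standard-library `math.erfc`, using the fact that $\int_t^\infty e^{-x^2/2}dx=\sqrt{\pi/2}\,\operatorname{erfc}(t/\sqrt2)$; the function name `theta` is an illustrative choice.

```python
import math


def theta(t):
    """Solve tail(t) = exp(-t^2/2)/t * (1 - theta/t^2) for theta, with t > 0."""
    tail = math.sqrt(math.pi / 2.0) * math.erfc(t / math.sqrt(2.0))
    return t * t * (1.0 - t * tail * math.exp(t * t / 2.0))


for t in (0.5, 1.0, 2.0, 5.0, 8.0):
    print(t, theta(t))
```

For moderate $t$ the computed value lies strictly between 0 and 1; for very large $t$ the subtraction loses precision in floating point, so the check is meant only for the range shown.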

Consequently

vk Ie-~

dx+O(n-•) =

a.,

1

>

We set log2 n=loglogn.

~a, exp(- al)

Elf1 12 H < oo,


Under the condition concerning an we have

O(n-")a,.exp (

1) =

o(l),

which concludes the proof.

q.e.d.

1.1.5.1.2. Further, let us set

$$S_n^*=\max_{1\le k\le n}S_k.$$

Proposition 1.1.26. Suppose that (1.1.10) holds and $E|f_1|^{2+\delta}<\infty$, $\varphi(1)<1$, $\sigma\ne 0$. For every $c$ such that $\varphi(1)<c<1$ there exists $b=b(c)>0$ such that

$$P(S_n^*>\varepsilon)\le\frac{1}{c-\varphi(1)}\,P\big(S_n>\varepsilon-b\sigma\sqrt{n}\big)$$

for arbitrary $\varepsilon>0$ and $n\in N^*$.

Proof. According to Theorem 1.1.22 there exist $n_0\in N^*$ and $b=b(c)>0$ such that $P(S_n>-b\sigma\sqrt{n})\ge c$ for $n\ge n_0$. It is easy to see that $b$ may be chosen sufficiently large such that the above inequality holds for every $n\in N^*$. It remains to apply Lemma 1.1.6' with $a=-b\sigma\sqrt{n}$ and $x=\varepsilon-b\sigma\sqrt{n}$. q.e.d.

1.1.5.2. The main theorem

1.1.5.2.1. Now we shall give the law of the iterated logarithm.

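Theorem 1.1.27. If (1.1.10) holds and $E|f_1|^{2+\delta}<\infty$, $\sigma\ne 0$, $\varphi(1)<1$, then

$$P\Big(\limsup_{n\to\infty}\frac{S_n}{\sigma\sqrt{2n\log_2 n}}=1\Big)=1.$$

Proof. We use a classical approach due to A. N. KOLMOGOROV. Set $n_k=[d^{2k}]$ for $k\in N^*$ with $d>1$. Without any loss of generality it may be supposed that $\sigma=1$.

1° Prove that for any $\varepsilon>0$ we have

$$P\big(S_n>(1+\varepsilon)\sqrt{2n\log_2 n}\ \text{i.o.}\big)=0.$$

Let $0<\varepsilon'<\varepsilon$ and choose $d$ such that $(1+\varepsilon)/d>1+\varepsilon'$. It is easy to see that

$$P\big(S_n>(1+\varepsilon)\sqrt{2n\log_2 n}\ \text{i.o.}\big)\le P\big(S^*_{n_k}>(1+\varepsilon)\sqrt{2n_{k-1}\log_2 n_{k-1}}\ \text{i.o.}\big)\le P\big(S^*_{n_k}>(1+\varepsilon')\sqrt{2n_k\log_2 n_k}\ \text{i.o.}\big).$$

Therefore, it suffices to show that

$$\sum_{k\in N^*}P\big(S^*_{n_k}>(1+\varepsilon')\sqrt{2n_k\log_2 n_k}\big)<\infty.$$

This convergence follows from the fact that for $0<\varepsilon''<\varepsilon'$ and for $k$ sufficiently large we have, by virtue of Propositions 1.1.26 and 1.1.25,

$$P\big(S^*_{n_k}>(1+\varepsilon')\sqrt{2n_k\log_2 n_k}\big)\le\frac{1}{c-\varphi(1)}P\big(S_{n_k}>(1+\varepsilon')\sqrt{2n_k\log_2 n_k}-b\sqrt{n_k}\big)\le\frac{1}{c-\varphi(1)}P\big(S_{n_k}>(1+\varepsilon'')\sqrt{2n_k\log_2 n_k}\big)=o\big(k^{-(1+\varepsilon'')^2}\big).$$

Note that the proved result implies that

$$P\big(|S_n|>(1+\varepsilon)\sqrt{2n\log_2 n}\ \text{i.o.}\big)=0$$

for any $\varepsilon>0$.

2° Prove that for any $\varepsilon>0$ we have

$$P\big(S_n>(1-\varepsilon)\sqrt{2n\log_2 n}\ \text{i.o.}\big)=1.$$

Set $u_k^2=n_k(1-d^{-2})$, $v_k^2=2\log_2 u_k^2$, and

$$A_k=\{S_{n_k}-S_{n_{k-1}}>(1-\varepsilon)u_kv_k\}\quad\text{for }k\in N^*.$$

According to Proposition 1.1.25 we have (we can obviously consider only the case $\varepsilon<1$)

$$P(A_k)=\frac{1}{\sqrt{2\pi}\,(1-\varepsilon)v_k}\,(\log u_k^2)^{-(1-\varepsilon)^2}\,(1+o(1)),$$

which is the general term of a divergent series. Lemma 1.1.2' 1) implies then that $P(\limsup_{k\to\infty}A_k)=1$. Taking into account the remark mentioned at the end of 1°, it results that $P(\limsup_{k\to\infty}A_k\cap B_k)=1$, where

$$B_k=\{|S_{n_{k-1}}|\le 2\sqrt{2n_{k-1}\log_2 n_{k-1}}\}.$$

By choosing $d$ sufficiently large in order that $(1-\varepsilon)(1-d^{-2})^{1/2}-2d^{-1}>1-\varepsilon'$ with $\varepsilon'>\varepsilon$ we obtain

$$1=P\big(\limsup_{k\to\infty}A_k\cap B_k\big)\le P\big(S_{n_k}>(1-\varepsilon')\sqrt{2n_k\log_2 n_k}\ \text{i.o.}\big).$$

Theorem 1.1.27 is proved. q.e.d.

1) Lemma 1.1.2 might be also used by taking $A_k=\{S_{n_{2k}}-S_{n_{2k-1}}>(1-\varepsilon)u_{2k}v_{2k}\}$.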
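Step 1° of this proof rests on a purely arithmetical fact about the blocks $n_k=[d^{2k}]$: once $(1+\varepsilon)/d>1+\varepsilon'$, the threshold at $n_{k-1}$ with constant $1+\varepsilon$ eventually dominates the threshold at $n_k$ with constant $1+\varepsilon'$. The sketch below checks this for one hypothetical choice of $\varepsilon$, $\varepsilon'$ and $d$ (values chosen here, not given in the text); the inequality is only claimed for $k$ large, so small $k$ are skipped.

```python
import math


def lil_threshold(n, c):
    """c * sqrt(2 n log log n), defined for n >= 3."""
    return c * math.sqrt(2.0 * n * math.log(math.log(n)))


def blocks(d, k_max):
    """n_k = [d^(2k)] for k = 1, ..., k_max."""
    return [int(d ** (2 * k)) for k in range(1, k_max + 1)]


eps, eps_prime, d = 0.5, 0.2, 1.2           # (1 + eps) / d = 1.25 > 1 + eps_prime
n = blocks(d, 60)                           # n[k - 1] holds n_k
for k in range(20, 61):
    lhs = lil_threshold(n[k - 2], 1.0 + eps)        # threshold at n_{k-1}
    rhs = lil_threshold(n[k - 1], 1.0 + eps_prime)  # threshold at n_k
    print(k, lhs >= rhs)
```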


1.1.5.2.2. Set now

$$\sigma_l^2=Ef_1^2+2\sum_{n\in N^*}Ef_1f_{nl+1}.$$

Then, we may state Corollary 1. For each lEN* such that a 1#0 and ¢(1) 0 there exist x', x" EX so that the measures P(x'; •) and P(x"; •) are concentrated at two sets Bx, and Bx" for which P(x';Bx· nBx,,)II. According to (1.2.5) we may write re1

Then, Proposition 1.2.2 implies

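Thus, we have proved that

$$\alpha(Q)=\inf_{i,j\in I}\sum_{k\in I}\min(q_{ik},q_{jk}).$$

Taking into account that

$$\min(q_{ik},q_{jk})=\frac{q_{ik}+q_{jk}-|q_{ik}-q_{jk}|}{2},$$

and that each row of $Q$ sums to 1, we obtain

$$\alpha(Q)=1-\frac12\sup_{i,j\in I}\sum_{k\in I}|q_{ik}-q_{jk}|.$$

q.e.d.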
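For a finite stochastic matrix both expressions for $\alpha(Q)$ are easy to evaluate and compare. The following sketch (function names are illustrative) computes the coefficient once through row-wise minima and once through row-wise absolute differences:

```python
import numpy as np


def alpha_min(Q):
    """alpha(Q) = min over row pairs (i, j) of sum_k min(q_ik, q_jk)."""
    Q = np.asarray(Q, dtype=float)
    overlaps = np.minimum(Q[:, None, :], Q[None, :, :]).sum(axis=2)
    return float(overlaps.min())


def alpha_l1(Q):
    """alpha(Q) = 1 - (1/2) max over row pairs of sum_k |q_ik - q_jk|."""
    Q = np.asarray(Q, dtype=float)
    dists = np.abs(Q[:, None, :] - Q[None, :, :]).sum(axis=2)
    return float(1.0 - 0.5 * dists.max())


Q = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.0, 0.1, 0.9]]
print(alpha_min(Q), alpha_l1(Q))
```

Here the smallest overlap occurs between the first and third rows, which share mass only on the middle state.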

1.2.1.2.3. Let us consider the general case. Let $(X,\mathcal{X})$, $(Y,\mathcal{Y})$ and $(Z,\mathcal{Z})$ be three measurable spaces. Consider a transition probability function $P'$ from $(X,\mathcal{X})$ to $(Y,\mathcal{Y})$ and a transition probability function $P''$ from $(Y,\mathcal{Y})$ to $(Z,\mathcal{Z})$. Let $P$ be the transition probability function from $(X,\mathcal{X})$ to $(Z,\mathcal{Z})$ defined by

$$P(x;C)=\int_Y P'(x;dy)\,P''(y;C)$$

for $x\in X$, $C\in\mathcal{Z}$.

Proposition 1.2.4. We have

$$1-\alpha(P)\le(1-\alpha(P'))(1-\alpha(P'')).$$

Proof. The operators $\mathcal{P}$, $\mathcal{P}'$ and $\mathcal{P}''$ associated with $P$, $P'$ and $P''$ respectively satisfy the relation $\mathcal{P}=\mathcal{P}''\mathcal{P}'$, from which we obtain

$$\|\mathcal{P}\|\le\|\mathcal{P}''\|\,\|\mathcal{P}'\|.$$

It remains to apply Proposition 1.2.2. q.e.d.
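In the finite case the composition of transition functions is a matrix product, so the inequality of Proposition 1.2.4 can be observed directly. The sketch below is illustrative only: the helper `alpha` uses the finite-state expression for the ergodicity coefficient, and the matrices are arbitrary examples chosen here.

```python
import numpy as np


def alpha(Q):
    """Ergodicity coefficient of a finite stochastic matrix (row-overlap form)."""
    Q = np.asarray(Q, dtype=float)
    return float(np.minimum(Q[:, None, :], Q[None, :, :]).sum(axis=2).min())


def contraction_gap(P1, P2):
    """Return (1 - alpha(P1 @ P2), (1 - alpha(P1)) * (1 - alpha(P2)))."""
    P1 = np.asarray(P1, dtype=float)
    P2 = np.asarray(P2, dtype=float)
    return 1.0 - alpha(P1 @ P2), (1.0 - alpha(P1)) * (1.0 - alpha(P2))


P1 = [[0.5, 0.5, 0.0], [0.2, 0.3, 0.5], [0.0, 0.1, 0.9]]   # from X to Y
P2 = [[0.6, 0.4], [0.1, 0.9], [0.3, 0.7]]                  # from Y to Z
print(contraction_gap(P1, P2))
```

The first returned number never exceeds the second, which is the finite-state form of the proposition.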

Notes and comments

Lemma 1.2.1 is due to IOSIFESCU (1967b). Propositions 1.2.2-1.2.4 are due to DOBRUŠIN (1956).

Complements

Let $O(V,\mathcal{V})$ be the Banach space of all real-valued bounded and $\mathcal{V}$-measurable functions $\psi$ defined on $V$ with the norm

$$\|\psi\|=\operatorname{osc}\psi=\sup_{v\in V}\psi(v)-\inf_{v\in V}\psi(v).$$

With the transition probability function $P$ from $(X,\mathcal{X})$ to $(Y,\mathcal{Y})$ we associate an operator $\mathcal{P}^*$ mapping $O(Y,\mathcal{Y})$ into $O(X,\mathcal{X})$ by setting

$$(\mathcal{P}^*\psi)(x)=\int_Y P(x;dy)\,\psi(y).$$

We have $\alpha(P)=1-\|\mathcal{P}^*\|$ [IOSIFESCU (1967b)].

1.2.1.3. The relationship to the independence coefficient

1.2.1.3.1. Consider a sequence of measurable spaces $(X_i,\mathcal{X}_i)_{i\in N}$ and for every $j\in N$ a transition probability function ${}^{j}P$ from $(X_j,\mathcal{X}_j)$ to $(X_{j+1},\mathcal{X}_{j+1})$. By means of the ${}^{j}P$'s and a given $x\in X_0$ we can construct a probability space $(\Omega,\mathcal{K},P_x)$ by taking

$$\Omega=\prod_{i\in N^*}X_i,\qquad\mathcal{K}=\prod_{i\in N^*}\mathcal{X}_i$$

and by setting

$$P_x(A)=\int_{A_1}{}^{0}P(x;dx_1)\int_{A_2}{}^{1}P(x_1;dx_2)\cdots\int_{A_n}{}^{n-1}P(x_{n-1};dx_n)$$

for $A=A_1\times\cdots\times A_n$, $A_i\in\mathcal{X}_i$, $1\le i\le n$ [the possibility of extending $P_x$ to the whole of $\mathcal{K}$ is assured by IONESCU TULCEA's (1949) theorem; see also LOÈVE (1963, p. 137)]. The sequence of random variables $(\xi_n)_{n\in N^*}$ defined on $\Omega$ by $\xi_n(\omega)=x_n$ for $\omega=(x_n)_{n\in N^*}$ will be said to be connected in a (nonhomogeneous) Markov chain with state spaces $(X_n,\mathcal{X}_n)$, $n\in N^*$, transition probability functions ${}^{j}P$, $j\in N$, and initial probability distribution concentrated at $x\in X_0$.
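For finite state spaces this construction reduces to multiplying step-dependent stochastic matrices. The sketch below is a minimal illustration under that assumption: `kernels[j]` (a name chosen here) plays the role of ${}^{j}P$, the starting point `x` is an index in $X_0$, and the two functions return a sampled path $(\xi_1,\dots,\xi_n)$ and the exact distribution of $\xi_n$.

```python
import numpy as np


def sample_path(x, kernels, rng):
    """Draw (xi_1, ..., xi_n) for the chain started at x with step kernels jP."""
    path, state = [], x
    for P in kernels:                       # kernels[j] maps X_j to X_{j+1}
        state = int(rng.choice(P.shape[1], p=P[state]))
        path.append(state)
    return path


def marginal(x, kernels):
    """Exact distribution of xi_n under P_x, as a product of the kernels."""
    dist = np.zeros(kernels[0].shape[0])
    dist[x] = 1.0                           # initial distribution concentrated at x
    for P in kernels:
        dist = dist @ P
    return dist


kernels = [np.array([[0.5, 0.5], [0.1, 0.9]]),
           np.array([[1.0, 0.0], [0.3, 0.7]])]
rng = np.random.default_rng(0)
print(sample_path(0, kernels, rng), marginal(0, kernels))
```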

Obviously, for every nEN*, the X,.-valued random variable ,,. is ~-measurable, where

~=

{A:A=

TT Ai}

ieN*

for every $n,l\in N^*$.

1.2.1.3.2. Define now as usual the $n$-step transition probability functions ${}^{j}P^{n}$ by

$${}^{j}P^{n}(x_j;A_{j+n})=\int_{X_{j+1}}{}^{j}P(x_j;dx_{j+1})\cdots\int_{X_{n+j-1}}{}^{n+j-2}P(x_{n+j-2};dx_{n+j-1})\int_{A_{j+n}}{}^{n+j-1}P(x_{n+j-1};dx_{n+j}),$$

where $x_j\in X_j$, $A_{j+n}\in\mathcal{X}_{j+n}$. We shall prove

Proposition 1.2.5. We have k

a(

i"!i ~,

k+l+m-1

i=y+i

)

~ =

1

a(kP )

for every k,l,mEN*. Proof According to (1.1.1) we have

where kp~ is the transition probability function from (Xk,~k) to the k+l+m-1

product measurable space

TI

(

k+l+m-1

TI

Xi,

j=k+l

)

~i

induced by the ip•s.

j=k+l

It is easy to verify that for m > 1 kp~(xk;•)

=

J kpl(xk;dxk+,) J Xk+I

k+lpm-1(xk+,;dy 1, x~,x;EX1c, Therefore o:(k P~) ~ o:(k pl).

The converse inequality is obvious. q.e.d.

Proposition 1.2.6. We have

$$1-\alpha({}^{k}P^{l})\le\prod_{i=k}^{k+l-1}\big(1-\alpha({}^{i}P)\big)$$

for every $k,l\in N^*$.

Proof. The above inequality is an immediate consequence of Proposition 1.2.4. q.e.d.

Notes and comments

Proposition 1.2.5 is essentially due to UENO (1957).

1.2.2. A lower bound for the variance

1.2.2.1. The main theorem

1.2.2.1.1. Let us consider a probability space $(\Omega,\mathcal{K},P)$. For a given $\sigma$-algebra $\mathcal{L}\subset\mathcal{K}$ we denote by $E(f\mid\mathcal{L})$ the conditional expectation of the real-valued random variable $f$ with respect to $\mathcal{L}$ and we set, as usual,

$$D(f\mid\mathcal{L})=E\{[f-E(f\mid\mathcal{L})]^2\mid\mathcal{L}\}=E(f^2\mid\mathcal{L})-[E(f\mid\mathcal{L})]^2,$$

$$\overline{D}(f\mid\mathcal{L})=E[D(f\mid\mathcal{L})],\qquad\underline{D}(f\mid\mathcal{L})=D[E(f\mid\mathcal{L})].$$

It is known that

$$Df=\overline{D}(f\mid\mathcal{L})+\underline{D}(f\mid\mathcal{L}). \tag{1.2.6}$$

1.2.2.1.2. Now consider a sequence $(\psi_n)_{n\in N^*}$ of real-valued functions such that $\psi_n$ is defined on $X_n$ and $\mathcal{X}_n$-measurable. Set $f_n=\psi_n\circ\xi_n$, $n\in N^*$, $S_{k,n}=f_k+\cdots+f_{k+n-1}$, $S_{1,n}=S_n$.
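The decomposition (1.2.6) of the variance into the mean conditional variance and the variance of the conditional mean can be verified exactly on a finite probability space. In the sketch below the sub-$\sigma$-algebra is assumed to be generated by a partition encoded as integer labels; the names `p`, `f` and `labels` are illustrative.

```python
import numpy as np


def variance_decomposition(p, f, labels):
    """Return (Df, E[D(f|L)], D[E(f|L)]) on a finite space; L is given by labels."""
    p = np.asarray(p, dtype=float)
    f = np.asarray(f, dtype=float)
    labels = np.asarray(labels)
    mean = p @ f
    total = p @ (f - mean) ** 2
    within, between = 0.0, 0.0
    for a in np.unique(labels):
        mask = labels == a
        pa = p[mask].sum()
        if pa == 0.0:
            continue                         # null atoms do not contribute
        cond_mean = (p[mask] @ f[mask]) / pa
        within += p[mask] @ (f[mask] - cond_mean) ** 2
        between += pa * (cond_mean - mean) ** 2
    return float(total), float(within), float(between)


print(variance_decomposition([0.1, 0.2, 0.3, 0.4], [1, 2, 3, 4], [0, 0, 1, 1]))
```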

Let us denote

$$\alpha^{(n)}=\min_{1\le i\le n-1}\alpha({}^{i}P).$$

Our purpose in what follows is to prove

Theorem 1.2.7. If the random variables $f_k$, $1\le k\le n$, have finite variances, then 1)

$$D_xS_n\ge\frac{\alpha^{(n)}}{8}\sum_{k=1}^{n}D_xf_k.$$

The proof will follow from a sequence of propositions.
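On a small finite chain the two sides of the inequality in Theorem 1.2.7 can be computed exactly by enumerating all paths. The sketch below assumes a common finite state space for all steps and uses the finite-state expression for the ergodicity coefficient; `kernels[j]` stands for ${}^{j}P$ and `psis[j]` for $\psi_{j+1}$, names introduced here only for illustration.

```python
from itertools import product

import numpy as np


def alpha(Q):
    """Ergodicity coefficient of a finite stochastic matrix."""
    Q = np.asarray(Q, dtype=float)
    return float(np.minimum(Q[:, None, :], Q[None, :, :]).sum(axis=2).min())


def variance_bound(x, kernels, psis):
    """Return (D_x S_n, alpha^(n)/8 * sum_k D_x f_k) by exact path enumeration."""
    n = len(kernels)                        # n >= 2 is assumed
    m = kernels[0].shape[1]
    ES = ES2 = 0.0
    Ef, Ef2 = np.zeros(n), np.zeros(n)
    for path in product(range(m), repeat=n):
        prob, prev = 1.0, x
        for j, s in enumerate(path):
            prob *= kernels[j][prev, s]
            prev = s
        vals = np.array([psis[j][s] for j, s in enumerate(path)])
        S = vals.sum()
        ES += prob * S
        ES2 += prob * S * S
        Ef += prob * vals
        Ef2 += prob * vals ** 2
    a_n = min(alpha(P) for P in kernels[1:])   # minimum over 1 <= i <= n - 1
    return ES2 - ES ** 2, a_n / 8.0 * float((Ef2 - Ef ** 2).sum())


P = np.array([[0.1, 0.9], [0.9, 0.1]])
print(variance_bound(0, [P] * 4, [np.array([0.0, 1.0])] * 4))
```

The example uses a strongly alternating chain, where the variance of the sum is much smaller than in the independent case but still stays above the lower bound.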

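1.2.2.2.1. Let us denote

$$\gamma_k=E(S_k\mid\mathcal{K}_k),\qquad\delta_k=S_k-\gamma_k.$$

Obviously, $E(\delta_k\mid\mathcal{K}_k)=0$. Without any loss of generality we may suppose that $Ef_k=0$, $1\le k\le n$. Set also

$$\overline{D}_k=\overline{D}(S_k\mid\mathcal{K}_k)=D\delta_k,\qquad\underline{D}_k=\underline{D}(S_k\mid\mathcal{K}_k)=D\gamma_k.$$

Therefore $DS_k=\overline{D}_k+\underline{D}_k$.

1) For the sake of simplicity, we shall write in the following $D$, $E$ instead of $D_x$, $E_x$.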

Proposition 1.2.8. For any 1 ~k,-E(li,lfi)] [r.+s.+ 1.,-.-E(r. +sk+ ,.,-.lfi)J /

,Y.f.} = o.

For, remark that the random variable

l

is

V ~-measurable; thus it suffices to show that i=k

(1.2.7) By using the time-reversibility of the Markov property we obtain

thus (1.2.7) holds. To close the proof it remains to show that

This follows from (1.2.8) q.e.d.

1.2.2.2.2. Now, consider two measurable spaces $(X,\mathcal{X})$ and $(Y,\mathcal{Y})$, a probability $\pi$ on $\mathcal{X}$, and a transition probability function $P$ from the first to the second. On the product $\sigma$-algebra $\mathcal{X}\times\mathcal{Y}$ we define the probability $p$ by setting

$$p(A\times B)=\int_A\pi(dx)\,P(x;B)$$

for $A\in\mathcal{X}$, $B\in\mathcal{Y}$. Consider also the $\sigma$-algebras

$$\tilde{\mathcal{X}}=\{A\times Y: A\in\mathcal{X}\},\qquad\tilde{\mathcal{Y}}=\{X\times B: B\in\mathcal{Y}\}.$$

These $\sigma$-algebras are isomorphic with the $\sigma$-algebras $\mathcal{X}$ and $\mathcal{Y}$. For $A\in\mathcal{X}$ and $B\in\mathcal{Y}$ the corresponding elements in $\tilde{\mathcal{X}}$ and $\tilde{\mathcal{Y}}$ will be denoted by $\tilde{A}=A\times Y$ and $\tilde{B}=X\times B$ respectively. Let $\xi$ be a real-valued random variable defined on the probability space $(X\times Y,\mathcal{X}\times\mathcal{Y},p)$. Suppose that $\xi$ is $\tilde{\mathcal{X}}$-measurable and possesses a finite variance.

Proposition 1.2.9. We have

D(elcffl) ~½cx(P)D e.

e.

Proof. It suffices to consider the case of a simple random variable $\xi$. Let us denote by $m$ the median of $\xi$ and set $M=\{(x,y):\xi(x,y)-m\le 0\}$. Obviously $M\in\tilde{\mathcal{X}}$ and $p(M)=\pi(M)\ge\frac12$. As is known, $E(\xi-m)^2\ge D\xi$, from which it follows that we have either $E[(\xi-m)^+]^2\ge\frac12D\xi$ or $E[(\xi-m)^-]^2\ge\frac12D\xi$. To make a choice let us suppose that

$$E[(\xi-m)^+]^2\ge\tfrac12D\xi. \tag{1.2.9}$$

Denote by $a$ a fixed but arbitrary positive number less than $\frac12$. By (1.2.9) and the $\tilde{\mathcal{X}}$-measurability of $\xi$ it follows that there exists a partition of $X\times Y\setminus M$ into a finite number of disjoint sets $\tilde{A}_i$, $1\le i\le r$, belonging to $\tilde{\mathcal{X}}$ such that for some $t_i\ge m$

$$\sum_{i=1}^{r}\pi(A_i)(t_i-m)^2\ge aD\xi \tag{1.2.10}$$

and $\xi(x,y)=t_i$ for $(x,y)\in\tilde{A}_i$. For every $A\in\mathcal{X}$ such that $\pi(A)>0$ let us consider the $A$-conditional probability on $\mathcal{X}$ defined as usual by

$$\pi_A(\cdot)=\frac{\pi(A\cap\cdot)}{\pi(A)}.$$

Consider also the probability $q_A$ on $\mathcal{Y}$ defined by

$$q_A(\cdot)=\int_X\pi_A(dx)\,P(x;\cdot)=\mathcal{P}\pi_A(\cdot). \tag{1.2.11}$$


Note that we can also write qA(B)

1

= n(A)

J

n(dx)P(x;B) =

p(AxB)

(1.2.12)

n(A) .

A

Taking into account (1.2.11), Lemma 1.2.1 yields llqA'-qA"II ~ 1-oc(P)

(1.2.13)

for every A', A"E~ with n(A'), n(A")>O. Finally, from (1.2.1) and (1.2.3) it follows that for every pair of sets A',A"E_q[ with n(A'), n(A")>O there exists a measure vA' A" on Cf.!J such that (1.2.14) for all BeCT.!f, and (1.2.15)

ti(qA',qA") =VA' A"(Y).

From (1.2.2), (1.2.15) and (1.2.13) we deduce vA'A"(Y) ~ oc(P).

(1.2.16)

Relations (1.2.14) and (1.2.12) imply n(Ai)vA,M(B) ~ n(AJqA,(B) = p(Ai x B)

(1.2.17)

for all BECT.!f, 1 ~ i ~ r. Relations (1.2.14), (1.2.12) and the inequality n(M) ~½ imply also r

I

r

n(A;)vA,M(B) ~

i=1

I

n(Ai)qM(B)

(1.2.18)

i=1

for all BECT.!f. Now, consider a finite subalgebra PA of Cf.!J and denote by Bi, l ~j~s, a system of (disjoint) generators of PA. We shall prove that for every Bi, 1 ~j~s, we have r

p(X x Bi)D(~IX x B) ~

L n(Ai)vA;M(Bi)(ti-m)

2

•

(l.2.19)

i=1

If m i=E(~IX x B)+(l-cx(n))1f2J

k=1

(n)

CX

n

L Dxfk•

k=1

Proof. It suffices to take into account Proposition 1.2.6.

q.e.d.

Note that this Corollary and Theorem 1.2.7 give the striking double inequality

(1.2.22)

This double inequality implies

Proposition 1.2.15. Suppose that (1.2.20) is verified with $n_0=1$. Then the series $\sum_{n\in N^*}(f_n-E_xf_n)$ converges in quadratic mean iff the series $\sum_{n\in N^*}D_xf_n$ converges.

1.2.3.2.2. Further, we can prove Theorem 1.2.16. If the random variables fk, 1 ~ k ~ n, are bounded, then

Proof. The proof follows from (1.1.2), Proposition 1.2.5 and Theorem 1.1.11. q.e.d.

Corollary. If the random variables fk, 1 ~ k ~ n, are bounded and a. > 0, then

Proof. It suffices to take into account Proposition 1.2.6.

q.e.d.

Notes and comments

An inequality of the type from Theorem 1.2.14 was given by DOBRUŠIN (1956). It is unknown whether the constants ½ and 8 in (1.2.22) may be improved.

1.2.3.3. Convergence of series

1.2.3.3.1. We have

Theorem 1.2.17. Suppose that (1.2.20) is verified with $n_0=1$. Then the series $\sum_{n\in N^*}f_n$ converges in $P_x$-probability iff it converges $P_x$-a.s.

If (1.2.20) is verified with n0 > 1, then the series converges absolutely in Px-probability iff it converges absolutely Px-a.s. Proof. The proof follows from (1.1.2). Proposition 1.2.5, Theorem 1.1.12 and its Corollary. q.e.d. Theorem 1.2.18. Suppose that ( 1.2.20) is satisfied. If the random variables fn, nEN*, have finite variances and Dxfn a), neN•

are convergent for at least one $a > 0$.

Proof. It suffices to notice that this Corollary corresponds to Corollary 4 of Theorem 1.1.14. q.e.d.

Theorem 1.2.19. Suppose that (1.2.20) is verified with $n_0 = 1$. If the random variables $f_n$, $n \in N^*$, are uniformly bounded and the series $\sum_{n \in N^*} f_n$ converges $P_x$-a.s., then the series $\sum_{n \in N^*} D_x f_n$ and $\sum_{n \in N^*} E_x f_n$ are convergent.

Proof. Suppose on the contrary that the series $\sum_{n \in N^*} D_x f_n$ is divergent. As without any loss of generality we may suppose that $\inf_{k \in N^*} \alpha({}^kP) \ge \alpha > 0$, Theorem 1.2.7 implies that

$$\lim_{n \to \infty} D_x S_n = \infty.$$


1.2. The Markovian case

Now, we shall use a consequence of a result by STATULEVICIUS (1962). If $|f_k| \le M$, $k \in N^*$, there exist two constants H>O, O 0 (therefore for all $a > 0$).

Proof. The proof follows from Theorem 1.2.18 and Theorem 1.2.19 as in the case of independent random variables [see, e.g., LOÈVE (1963, p. 237)]. q.e.d.

Notes and comments

Theorem 1.2.17 is due to UENO (1957), Theorem 1.2.18 to COHN (1965c). Theorems 1.2.19 and 1.2.20 are due to IOSIFESCU (1967b).


1.2.3.4. The strong law of large numbers

1.2.3.4.1. We have

Theorem 1.2.21. Suppose that (1.2.20) is verified. If for a sequence

then $\frac{1}{a_n} \sum_{k=1}^{n} (f_k - E_x f_k)$ converges $P_x$-a.s. to 0 as $n \to \infty$.

Proof. The proof follows from Theorem 1.2.18 in the same way as Theorem 1.1.15 followed from Theorem 1.1.14. q.e.d.

Now, the analogue of Theorem 1.1.17.

Theorem 1.2.22. Suppose that (1.2.20) is verified and let the random variables $f_n$, $n \in N^*$, be identically distributed. Under these conditions $\frac{1}{n} \sum_{i=1}^{n} f_i$ converges $P_x$-a.s. to a constant $a$ as $n \to \infty$ iff $E_x |f_1| < \infty$; in this case $a = E_x f_1$.

Set

and suppose that

Next, we have

Theorem 1.2.23. If

$$\sum_{n \in N^*} (2^n a_n)^{-2} \sum_{i=2^n}^{2^{n+1}-1} D_x f_i < \infty, \qquad (1.2.23)$$

then the random variable $\frac{1}{n} \sum_{k=1}^{n} (f_k - E_x f_k)$ converges $P_x$-a.s. to 0 as $n \to \infty$.

Proof We shall verify conditions ( 1.1.5) and ( 1.1.6) of Theorem 1.1.18. It is easy to see that 1-r,n~!Xn, ~•n

~8

:t; I.


Thus

22 "(1-ri,J

---2-,.--:--+--:-t_-1- - ~ - - - - 2 , , - • - I-_-1- -

L

(yn + 1)

DxJ;

L

(2nan)- 2

i=2"

Dx.!;

i=2"

and in virtue of (l.2.23) condition (l.1.5) will be satisfied. Then, 2n+ 1-1

2n+ 1-1

2- 2 n(rn+ 1)

L

(2"an)- 2

DxJ;

L

DxJ;

________ i=_2_"---,--- ~ _ _ _ _ _ _i_=_2"------,---:--~2n+ t-1 2"+ 1 -1 2 1 2 2 2

4(1-11n)e

-r

L

n(11n+ 1)

DXJ;

4.9- e -(2"r. j,

if n~j.

We shall prove that for every $j \in N^*$

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} f_k^{(j)} = 0 \quad P_x\text{-a.s.} \qquad (1.2.24)$$


We use the following result [LOÈVE (1963, p. 387)]. Let $(Y_n)_{n \in N^*}$ be a sequence of random variables such that $E(Y_n \mid Y_{n-1}, \ldots, Y_1) = 0$ and $E Y_n^2 \le M < \infty$ for all $n \in N^*$. Then

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} Y_k = 0 \quad \text{a.s.}$$

To apply this, note that for n > j Ex (J . f (i>);: ): ): )IJ n IJ,W n-1,···, 1 - £ JC (£JC (f (j) n I1:on-J,':on-j-1,··•,~1 n-1,···, J) 1,

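and that, since the $\xi_k$'s form a Markov chain,

$$E_x(f_n^{(j)} \mid \xi_{n-j}, \ldots, \xi_1) = E_x(f_n^{(j)} \mid \xi_{n-j}) = 0.$$

Further, $E_x (f_n^{(j)})^2 \le 2 (\sup |f|)^2$; thus (1.2.24) holds. To complete the proof of Theorem 1.2.24 write

$$f(\xi_n) - E_x(f(\xi_n) \mid \xi_{n-j}) = f_n^{(1)} + \cdots + f_n^{(j)}, \qquad n > j.$$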

Thus, by (1.2.24),

Or, neglecting at most k terms,

-;;1 k~/(c;k) n

!~~ 1

so that, for fixed

!~~

n

I

Px-a.s.,

uE N*

-;;1 J/(c;k) n

1

1

-;; k~l Ex{f(c;j+k)lc;k) = 0

1

" [ 1

U

JI

-;; j~l ; k~l (\:(f(~j+k)IO =

0

By the uniform convergence of $U^n f$ to $E_\pi f$, for any $\varepsilon > 0$ we may choose $u \in N^*$ such that

For such a $u$ we have

which completes the proof of the theorem.

q.e.d.

Notes and comments

Theorems 1.2.21 and 1.2.22 are due to COHN (1965a, 1965c). Theorem 1.2.23 is due to IOSIFESCU (1967b). We note that similar results given by ROSENBLATT-ROTH (1964) are doubtful since they are based on an erroneous proof [IOSIFESCU (1967d)]. Theorem 1.2.24 is due to BREIMAN (1960). A class of Markov chains admitting invariant probabilities, closely related to some stochastic models for learning, has been studied by DUBINS and FREEDMAN (1966).

1.2.3.5. The central limit theorem and the law of the iterated logarithm

1.2.3.5.1. Let us approach the homogeneous case, i.e. ${}^nP = P$, $n \in N^*$. In this special case conditions such as (1.2.20), which appear in most of the preceding theorems, amount to

$$\alpha(P^{n_0}) \ge \alpha > 0.$$

Thus, the restatement of these theorems is obvious. Let us pass to the theorems from Subsections 1.1.4 and 1.1.5. According to Theorem 2.1.35, if $\alpha(P^{n_0}) \ge \alpha$ for some $n_0 \in N^*$, then there exists a probability $\pi$ on $\mathcal{X}$ such that

$$|P^n(x; A) - \pi(A)| \le (1 - \alpha)^{\frac{n}{n_0} - 1}$$

for all $n \in N^*$, $x \in X$, $A \in \mathcal{X}$. It is easy to see that $\pi$ is a stationary absolute probability distribution. This means that the sequence $(f_n)_{n \in N^*}$ will be a strictly stationary one if the initial probability distribution of the Markov chain is $\pi$ (instead of that concentrated at $x \in X$ as in Subparagraph 1.2.1.3.1). Theorem 1.1.22 leads to

Theorem 1.2.25. If $E_\pi |f_1|^{2+\delta} < \infty$, $\sigma_\pi \ne 0$, then there exist two positive constants $v < 1$ and $K$ such that

for any $a \in R$, $n \in N^*$.
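The geometric bound $|P^n(x; A) - \pi(A)| \le (1 - \alpha)^{n/n_0 - 1}$ quoted above from Theorem 2.1.35 can be checked numerically on a finite state space. The following Python sketch does so for $n_0 = 1$. It assumes that $\alpha(P)$ is the ergodicity coefficient $1 - \sup |P(x'; A) - P(x''; A)|$, the supremum being taken over all states $x'$, $x''$ and all sets $A$; the matrix `P_EXAMPLE` and all function names are hypothetical choices made for this illustration only.

```python
def ergodicity_coefficient(P):
    """alpha(P) = 1 - sup_{x,y,A} |P(x;A) - P(y;A)| for a finite stochastic matrix
    (assumed definition; on a finite space the sup over A is half the L1 row distance)."""
    n = len(P)
    worst = 0.0
    for x in range(n):
        for y in range(n):
            tv = 0.5 * sum(abs(P[x][j] - P[y][j]) for j in range(n))
            worst = max(worst, tv)
    return 1.0 - worst


def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def stationary(P, iterations=2000):
    """Approximate the stationary distribution by iterating a row vector."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def check_geometric_bound(P, n_max=30):
    """Return alpha(P) and triples (n, sup_{x,A} |P^n(x;A) - pi(A)|, (1-alpha)^(n-1))."""
    size = len(P)
    alpha = ergodicity_coefficient(P)
    pi = stationary(P)
    Pn = [row[:] for row in P]  # holds P^n, starting with n = 1
    rows = []
    for n in range(1, n_max + 1):
        gap = max(0.5 * sum(abs(Pn[x][j] - pi[j]) for j in range(size))
                  for x in range(size))
        rows.append((n, gap, (1.0 - alpha) ** (n - 1)))
        Pn = mat_mul(Pn, P)
    return alpha, rows


# Hypothetical three-state transition matrix used only for illustration.
P_EXAMPLE = [[0.5, 0.3, 0.2],
             [0.2, 0.6, 0.2],
             [0.1, 0.3, 0.6]]
```

Calling `check_geometric_bound(P_EXAMPLE)` returns the coefficient together with the observed distance and the bound for each $n$, so that the two columns can be compared directly.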

The notations E1t,)EB}.

Notes and comments

The associated Markov chain was discussed for the first time by ONICESCU and MIHOC (1936a) and afterwards by FORTET (1938), IOSIFESCU and THEODORESCU (1961-1962) etc. Further, several authors used the associated chain mainly in applied problems.

Complements

1. An important problem is the investigation of the coefficients of ergodicity cx(rQ"). We do not know yet a nontrivial general result in this direction. It is clear that if, for example, there exists BE "If" and w ,W EW such that ur(w';x)I = 0 n--+oo

uniformly with respect to lEN*, w', w"E W, A< 0 EX(l); thus lim (Pi(A< 1>-fi(A< 0 ))=0 n--+ oo

uniformly with respect to ZEN*, A);A)I the supremum being taken over all t', t" ET, w', w'' E W, Aefl', then

xE Yn(A 0 ),

neN*

We note that

U" j")

implies that the sequence $(a_n'')_{n \in N^*}$ is nonincreasing. Consider also a weaker condition than Condition FLS''($A_0$, $\nu$).

Condition FLS'($A_0$, $\nu$). Let $A_0 \in \mathcal{X}$ and $\nu \in N^*$; (j') there exists $\gamma > 0$ such that ${}^tP_1(w; A_0) \ge \gamma$ for all $w \in W$, $t \in T$; (j'j') if we set

a~ =suplr+rP(ur(w'; x); A)-r+rP(ur(w"; x); A)I the supremum being taken over all te T, w', w" E W, xE Yn(A 0 ), AEfl, then

I

an< oo.

neN•

We have

Proposition 2.1.13. If Condition FLS'(A 0 , v) is verified, then

O, w' , w"EW., fior all lEN* , tET, xEYn(A 0 ), w',w"EW, A< 0 Ef!l"0>. It follows that for l ~ v Proposition 2.1.13 is proved. If l> v, we have t+rpl(w;AO>)

=

J t+rP,,(w;dxM) J x)-Pi0 (w"; A)I ~ 1-b

fior all [EN* ' w', w" E W., AEf!l'0 such that for every leN*, tET and every partition A~>uA~>=x,A~1>Ef!l'(l), i=l,2, we have either 'Pi 0 (w; AVl);?;c> for all WE W, or tpi0 (w; A~>);?;b for all wEW. For homogeneous random systems with complete connections, the above conditions reduce to Condition F(n 0 ). Let n0 eN*; there exists b>O such that for every lEN* and every partition A~> u A~>=x< 0 , A1°Ef!l'< 0, i= 1,2, we have either Pf 0 (w; A~l);?;b for all wEW, or Pi 0 ( w; A~>);?; )_tp;o(w"; A~>)I = 1rp;o(w'; A~>)_rp;o(w"; A~>)[, valid for every partition A~> uA~>=x0 >, Ai1>E~0 >, i=l,2. Now suppose that M'(n 0 ) is verified. Put

JHA< 1>) = inftPi 0 (w; A(l>). weW

We have sup rPf 0 (w; A) weW

= sup [1-tPi (w; X(l)-A)] = 1-fHX). 0

weW

Condition M'(n 0 ) implies 1-fHX(ll_A(l))-fHA(I))~ 1-b

that is

Therefore Condition F'($n_0$) will be verified, $\delta$ being replaced by $\delta/2$. q.e.d.

Notes and comments

The above approach follows IOSIFESCU (1963e, 1965e, 1966b). Lemma 2.1.7 is essentially due to LAMPERTI and SUPPES (1959). Condition FLS($A_0$, $\nu$) was used in special cases by FORTET (1938) and LAMPERTI and SUPPES (1959). Conditions M($n_0$) and F($n_0$) are analogous to classical conditions used by MARKOV and FRÉCHET to investigate ergodic properties of Markov chains.

Complements

One may imagine also other types of uniform ergodicity, as well as one may define different types of uniform semi-ergodicity. We come, for example, to the uniform semi-ergodicity of Cesaro type if for any lEN* and wE W there exists a probability P?(w: •) on grm such that 1 lim n--+ex>

n

L" P{(w; A0>)=P?(w; A 0, it follows tp~+i(w;A)~oc'tp(wo;A) -

11.'

4

11 We notice that Condition K~(A ,k) implies (j') of Condition FLS'(A ,k+ 1) 0 0 with y=cc.


2. Random systems with complete connections

for all tET, lEN*, wEW, A 0>efr and a fixed w0 EW. We have either rp< 0 (w 0;Am)~½ or 1p< 0 (w 0 ;x< 0 -A 0 >)~½; thus Condition F'(k+ 1) is verified with t,

= a.'.

q.e.d.

4 Further, let us consider

Condition K~(A 0 ,k,µ,r). Let A 0 efl', keN and µ,reN*; there exist a family ('p,),eT of probabilities on Aon [r0 such that 'P:(w; A n A 0)~ cc,. 'p,.(Ae[r)I ~ -

Y

L a1

(2.1.10)

j~r

for all l>r, tET, w',w"EW, A< 0 e~< 0 . On the other hand we have for l> r and an arbitrary we W

'P 1+.u-i (w';x)~

J

~

XUl-l)xA

X

'Pr+µ- 1(w' ;dx(r+µ-1))

J

t+r+µ-1 Pz_,(u,(w';xE~m, where pis the greatest nonnegative integer such that Pov< n0 • Proof. The proof follows from Theorems 2.1.12 and 2.1.15.

q.e.d.

2.1.2.3.2. We shall now give sufficient conditions for uniform ergodicity for homogeneous random systems with complete connections corresponding to those considered in Subparagraph 2.1.2.2.2. Let us set {u{w;x):wEW,.xEX(fd} if k>O, "'1c = { w 1'f k =. 0 Condition K 1 (A 0 ,k). Let A 0 E~ and kEN; there exists a probability p on A 0 n~ and a number O such that P(w;A n A 0 );;::::E Yn(A 0 ), w', w"E W, l/lEBL,Ao(W, 'if'} Proof. We have

J

(Uh i/J){w) = Ph(w; dx) l/l(u(w; x)). X

As a consequence of the inequality (2.1.6) and Lemma 1.2.1 we obtain l(Uh t/J)(u(w'; x))I ~ (v- l)an oscl/1 + In

(2.1.11)

for all h)-P1 (A< 0 )1 ~ 1

3I P-

~

inf

[

k

(

=0

r) (1-yr-k+· 6v k

pov) =t flj(x; A< 0)

for every w=xEW, tET, l, nEN*, A(l>Eq-< 0 . The relation (2.1.1) implies t flj(x;AOl)

=

J m-k+ t

X


1 (x;dy(k))r+r n;-'(y(kl; A(ll)

(2.1.12)


for $k \le r < n$. This relation generalizes the well-known Chapman-Kolmogorov equation.

2.1.2.5.2. Uniform ergodicity in the strong sense for the considered Markov chain comes to the existence of the probabilities fl? on !£0 > for every le N* such that lim t llf (xe!£ 0 such that for every partition A\k>u A~k>=x, At> efif, i= 1,2, we have either or

Further, we consider Condition Mk(n 0 ). Let n0 eN*; there exists IJ>O such that l''n:o(x;A(k))-'"n:o(y(k);A(k))I ~ 1-/J

for every t', t" E T, x, y E x, A e !£,A('))_t"+nnNx E x;dz;A)I

~ k+no-1~r~n-1 inf

L a'!

)

1

i~r c)

+ (1-b{!,--

1

Proof The proof follows from (2.1.17), by making use of Lemma 1.2.1 and taking into account (2.1.14) and (2.1.16). By setting l]7(A 0')

=

11f(A0') =

inf

'llf(x; A 0>)

reT,xO•>eXCk)

suo

'llf(x;A< 1>),

teT,x(k1eXCkl

we obtain J17(A< 0 )-Uf(A 0 >)~(1-c5)[lli-'(A< 1>)-Ui-'(A 0 ')]

+

La~' j3r

for k + n0 - 1 ~ r < n and hence the existence of n~ and the domination in the above statement are easily deduced. q.e.d. 2.1.2.5.3. Let us pass to uniform ergodicity in the weak sense. The uniform ergodicity in the weak sense for the considered Markov chain comes to n---+ x,

uniformly with respect to tE T, x,yEx, ZEN*, AE£l'" 0 >. Obviously, Condition FLS'(X, 1) is always verified. Condition F~(n 0 ). Let n0 EN*; there exists c5 > 0 such that for every partition A\k'uA~k>=x, Aik>E£l°, i=l,2, and every tET we have either or

Condition M~(n 0 ). Let n0 EN*; there exists b>O such that ltfl~O(x(k);A(k))-'n:o(y;A(k))I ~ 1-c5

for all tE T.

2.1. Ergodicity


Proposition 2.1.33. Conditions $F'_k(n_0)$ and $M'_k(n_0)$ are equivalent.

Proof. The proof follows as in Proposition 2.1.14.

q.e.d.

Further, we have Proposition 2.1.34. Condition M~(n 0 ) implies Condition M'(n 0 + k).

Proof Indeed, by (2.1.12) we may write t II?+k(x;A)J r +no+k-1 llz(Z(k); A(I)),

XII, x,yEx), x, A~' 0 'E~(Lo>, i = 1,2, we have either

or


(Equivalently,

for every w', w" e W,

A(lol e~

i.e. real- or complex-valued.


consequently, we shall use the notations CL( W) = CL1 (W), m(t/t)=m(t/t; 1). First, we note that CL(W) C C(W). Indeed, for a fixed but arbitrary w0 E W, we have !t/t(w)-t/t(w0 )1 s u p - - - - - = aO; there exists an integer j(e) such that m(I/Jn)~m+E,

j'~j(B);

if $w', w'' \in W$, $w' \ne w''$, then

$$\frac{|\psi(w') - \psi(w'')|}{d(w', w'')} = \lim_{j \to \infty} \frac{|\psi_{n_j}(w') - \psi_{n_j}(w'')|}{d(w', w'')}.$$

It follows that

$$\frac{|\psi(w') - \psi(w'')|}{d(w', w'')} \le m + \varepsilon,$$

and since $\varepsilon$ is arbitrary, we get (2.1.24). Let us come back to the completeness of $CL(W)$. From

$$\lim_{n, p \to \infty} \|\psi_n - \psi_p\| = 0$$

we deduce

$$\lim_{n, p \to \infty} |\psi_n - \psi_p| = 0$$

and

(2.1.25)

By (2.1.25) there exists a $\psi \in C(W)$ such that

$$\lim_{n \to \infty} |\psi_n - \psi| = 0;$$

it remains to show that $\psi \in CL(W)$ and

$$\lim_{n \to \infty} \|\psi_n - \psi\| = 0.$$

Let $\varepsilon > 0$ and $n(\varepsilon)$ an integer such that m(t/Jr-t/Jn) n(e) being arbitrarily chosen, we have also

$$\lim_{r \to \infty} m(\psi_r - \psi) = 0,$$

so that the completeness of $CL(W)$ is proved. We notice also that $CL(W)$ is dense in $C(W)$. Further, we remark that the above considerations imply the validity of (ITM.) for X=C(W) and ID=CL(W). Thus, we have succeeded in isolating a special class of operators, namely the class $B(CL(W), C(W))$. For this class the following specialization of Theorem 2.1.40 is valid.

Theorem 2.1.52. Let $U$ be a linear operator on $CL(W)$ such that

(i) $|U|_{CL(W)} \le 1$;

(ii) $U \in B(CL(W))$;

(iii) for some positive integer $k$ and positive constants $q < 1$ and $Q$

Then

(a) there exist at most a finite number of eigenvalues $\lambda_i$, $1 \le i \le p$, of modulus 1 of $U$, each of them of finite multiplicity, such that

$$U^n = \sum_{i=1}^{p} \lambda_i^n U_i + V^n, \qquad n \in N^*,$$

where $U_i \in B(CL(W))$, $1 \le i \le p$, $V \in B(CL(W))$ and $\mathfrak{N}_{\lambda_i} = U_i(CL(W))$, $1 \le i \le p$;

(b) $U_i^2 = U_i$, $U_i U_j = 0$, $i \ne j$, $U_i V = V U_i = 0$, $1 \le i, j \le p$;

(c) there exist two positive constants $M$ and $h$ such that

$$\|V^n\| \le \frac{M}{(1 + h)^n}, \qquad n \in N^*.$$

Proof. Obvious.

2.1.3.2.4. Let us assume now that the elements of the homogeneous random system with complete connections $\{(W, \mathcal{W}), (X, \mathcal{X}), u, P\}$ satisfy

(I) $W$ is a compact metric space with respect to some distance $d$ and $\mathcal{W}$ is the class of Borel sets in $W$;

(II) $m(P(\cdot; A)) \le a < \infty$ for all $A \in \mathcal{X}$;

(III) $\mu(u(\cdot; x)) \le 1$ for all $x \in X$, where

$$\mu(u(\cdot; x)) = \sup_{w' \ne w''} \frac{d(u(w'; x), u(w''; x))}{d(w', w'')};$$

(IV) there exists $n_0 \in N^*$ and A\;'0 >e ~(no> such that

A\;'ol c {x l

}

2

is either 1 or O provided that y¾. A similar result may be obtained for the more general expression l f..+1Cw) = - +1 n

II

L

l/l(a11 + 1 w),

J=O

where $(l_n)_{n \in N^*}$ is an increasing sequence of nonnegative integers [FORTET (1940a, b); for the special case $l_n = n(n-1)/2$, see KAC (1938)]. These results were obtained by making use of the linear operator

$$(U\psi)(w) = \frac{1}{a} \sum_{i=0}^{a-1} \psi\left(\frac{w + i}{a}\right).$$

2.1.3.3. Application to the associated Markov system

2.1.3.3.1. We shall continue to restrict ourselves to the homogeneous case. A basic question is the asymptotic behaviour of $Q^n(w; \cdot)$ as $n \to \infty$. It is necessary to make precise what is meant by "convergence" of a sequence $(p_n)_{n \in N^*}$ of probabilities on $\mathcal{W}$ to a probability $p$ on $\mathcal{W}$. The appropriate notion is this: $p_n$ converges to $p$ if for any Borel set $B$ in $W$

$$p(B^\circ) \le \liminf_{n \to \infty} p_n(B) \le \limsup_{n \to \infty} p_n(B) \le p(\bar{B}),$$

where $B^\circ$ is the interior and $\bar{B}$ the closure of $B$. Equivalently,

$$\lim_{n \to \infty} \int_W \psi(w)\, p_n(dw) = \int_W \psi(w)\, p(dw)$$

for every $\psi \in C(W)$, i.e. the weak convergence. Further, if $(Q_n)_{n \in N^*}$ is a sequence of transition probability functions and $Q^\infty$ is a transition probability function, we shall say that $Q_n$ converges uniformly to $Q^\infty$ if for any Borel set $B$ in $W$ and any $\varepsilon > 0$ there exists an integer $n_\varepsilon$ such that

$$Q^\infty(w; B^\circ) - \varepsilon \le Q_n(w; B) \le Q^\infty(w; \bar{B}) + \varepsilon$$

for all $n \ge n_\varepsilon$, $w \in W$. We shall prove

Theorem 2.1.54. For any homogeneous random system with complete connections satisfying (I)-(IV), the transition probability function $\frac{1}{n} \sum_{j=0}^{n-1} Q^j$ converges uniformly as $n \to \infty$ to a transition probability function $Q^\infty$ and for any Borel set $B$ in $W$, $Q^\infty(\cdot; B) \in CL(W)$. There exists a positive constant $c$ such that

$$\left\| \frac{1}{n} \sum_{j=0}^{n-1} U^j \psi - U^\infty \psi \right\| \le c\, \frac{\|\psi\|}{n} \qquad (2.1.28)$$

for all $n \in N^*$, $\psi \in CL(W)$, where

$$(U^\infty \psi)(w) = \int_W \psi(w')\, Q^\infty(w; dw').$$
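The role of the Cesàro average in Theorem 2.1.54 can be seen already on a finite state space. The following Python sketch is an illustration only: the two-state periodic matrix `Q_PERIODIC`, its limit `Q_LIMIT` and the function names are hypothetical, and no attempt is made to verify conditions (I)-(IV) for it. For this matrix the powers $Q^n$ oscillate and do not converge, while the averages $\frac{1}{n} \sum_{j=0}^{n-1} Q^j$ approach the limit at rate $O(1/n)$.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(size):
    """Identity matrix, i.e. the zeroth power Q^0."""
    return [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]


def cesaro_average(Q, n):
    """Return (1/n) * (Q^0 + Q^1 + ... + Q^(n-1))."""
    size = len(Q)
    total = [[0.0] * size for _ in range(size)]
    power = identity(size)
    for _ in range(n):
        for i in range(size):
            for j in range(size):
                total[i][j] += power[i][j]
        power = mat_mul(power, Q)
    return [[value / n for value in row] for row in total]


def sup_distance(A, B):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(a - b) for row_a, row_b in zip(A, B)
               for a, b in zip(row_a, row_b))


# Hypothetical periodic chain: the state flips at every step.
Q_PERIODIC = [[0.0, 1.0],
              [1.0, 0.0]]
# Its Cesaro limit: both states are visited half of the time.
Q_LIMIT = [[0.5, 0.5],
           [0.5, 0.5]]
```

For instance, `sup_distance(cesaro_average(Q_PERIODIC, n), Q_LIMIT)` stays below $1/(2n)$ for every $n$, which mirrors the form of the bound (2.1.28).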

Proof. We prove firstly that there exists a transition probability function $Q^\infty$ from $(W, \mathcal{W})$ to itself such that (2.1.28) holds. Theorem 2.1.53 implies that

$$\|U^n\| \le \sum_{i=1}^{p} \|U_i\| + \|V^n\| \to \sum_{i=1}^{p} \|U_i\|$$

as $n \to \infty$. Therefore there exists a constant $D < \infty$ such that $\|U^n\| \le D$ for all $n \in N$. Let now $\bar{U}_n = \frac{1}{n} \sum_{j=0}^{n-1} U^j$ for $n \in N^*$. Then, by Theorem 2.1.53,

$$\bar{U}_n = \frac{1}{n}(I - U^n) + \frac{1}{n} \sum_{j=1}^{n} U^j = \frac{1}{n}(I - U^n) + \frac{1}{n} \sum_{i=1}^{p} \sum_{j=1}^{n} \lambda_i^j U_i + \frac{1}{n} \sum_{j=1}^{n} V^j.$$

Therefore (we suppose that $\lambda_1 = 1$)

$$\bar{U}_n - U_1 = \frac{1}{n}(I - U^n) + \frac{1}{n} \sum_{i=2}^{p} \lambda_i \frac{1 - \lambda_i^n}{1 - \lambda_i} U_i + \frac{1}{n} \sum_{j=1}^{n} V^j,$$

so that

$$\|\bar{U}_n - U_1\| \le \frac{c}{n},$$

where

$$c = (1 + D) + 2 \sum_{i=2}^{p} \frac{\|U_i\|}{|1 - \lambda_i|} + \frac{M}{h}.$$

Thus, for any $\psi \in CL(W)$,

$$\|\bar{U}_n \psi - U_1 \psi\| \le \|\bar{U}_n - U_1\|\, \|\psi\| \le c\, \frac{\|\psi\|}{n} \qquad (2.1.29)$$

and, a fortiori,

$$\lim_{n \to \infty} |\bar{U}_n \psi - U_1 \psi| = 0 \qquad (2.1.30)$$

for all $\psi \in CL(W)$. Since $CL(W)$ is dense in $C(W)$ and $|\bar{U}_n| = 1$ for $n \in N^*$, it follows that (2.1.30) holds for all $\psi \in C(W)$, where $U_1$ has been extended (uniquely) to a bounded linear operator on $C(W)$. Since the operators $\bar{U}_n$ on $C(W)$ are all positive and preserve constants, (2.1.30) implies that the same is true of $U_1$. Thus, for any $w \in W$, $(U_1 \psi)(w)$ is a positive linear functional on $C(W)$ with $(U_1 \Gamma)(w) = 1$, where $\Gamma(w) \equiv 1$. Hence, by the Riesz representation theorem, there exists a (unique) probability $Q^\infty(w; \cdot)$ on $\mathcal{W}$ such that

$$(U_1 \psi)(w) = \int_W \psi(w')\, Q^\infty(w; dw') \qquad (2.1.31)$$


for all $\psi \in C(W)$. In view of (2.1.31), (2.1.29) reduces to (2.1.28). That $Q^\infty$ is a transition probability function follows from the fact that $Q^\infty(\cdot; B) \in CL(W)$ for every Borel set $B$. This is obviously true if $B = W$. Suppose that $B$ is an open set such that its complement $B^c$ is not empty. For $n \in N^*$, define $\chi_n \in CL(W)$ by

$$\chi_n(w) = \begin{cases} 1 & \text{if } d(w, B^c) \ge \frac{1}{n}, \\ n\, d(w, B^c) & \text{if } d(w, B^c) \le \frac{1}{n}. \end{cases}$$

Then

$$\lim_{n \to \infty} \chi_n(w) = \chi_B(w)$$

and the convergence is monotonic. Therefore

$$\lim_{n \to \infty} (U_1 \chi_n)(w) = \int_W \chi_B(w')\, Q^\infty(w; dw') = Q^\infty(w; B) \qquad (2.1.32)$$

for all $w \in W$. By Theorem 2.1.53, $\mathfrak{N}_1 = U_1(CL(W))$ is a finite dimensional subspace of $CL(W)$. Hence there exists a constant $J < \infty$ such that

$$m(\psi) \le J |\psi|$$

for all $\psi \in \mathfrak{N}_1$. Therefore

$$|(U_1 \chi_n)(w_1) - (U_1 \chi_n)(w_2)| \le m(U_1 \chi_n)\, d(w_1, w_2) \le J |U_1 \chi_n|\, d(w_1, w_2) \le J d(w_1, w_2)$$

for all $n \in N^*$ and $w_1, w_2 \in W$. Letting $n \to \infty$, we have

$$|Q^\infty(w_1; B) - Q^\infty(w_2; B)| \le J d(w_1, w_2).$$

If $B$ is an arbitrary Borel set, $w_1, w_2 \in W$ and $\varepsilon > 0$, the regularity of $Q^\infty(w_i; \cdot)$ insures the existence of an open set $B_{i,\varepsilon}$ such that $B_{i,\varepsilon} \supset B$ and

$$Q^\infty(w_i; B_{i,\varepsilon}) - Q^\infty(w_i; B) \le \varepsilon,$$

$i = 1, 2$. Combination of these inequalities with the result just obtained above yields

$$|Q^\infty(w_1; B) - Q^\infty(w_2; B)| \le J d(w_1, w_2) + 2\varepsilon$$

or, since $\varepsilon$ is arbitrary,

$$|Q^\infty(w_1; B) - Q^\infty(w_2; B)| \le J d(w_1, w_2).$$

1) $d(B, B') = \inf_{w \in B,\, w' \in B'} d(w, w')$, $d(w, B) = d(\{w\}, B)$.

Thus $Q^\infty(\cdot; B) \in CL(W)$ with $m(Q^\infty(\cdot; B)) \le J$ for all Borel sets $B$ in $W$. To complete the proof it remains only to prove that the transition probability function $\frac{1}{n} \sum_{j=0}^{n-1} Q^j$ converges uniformly to $Q^\infty$. Denote

$$\frac{1}{n} \sum_{j=0}^{n-1} Q^j = \bar{Q}_n, \qquad n \in N^*.$$

Since

it suffices to show that if $B$ is open, then

for all $w \in W$ if $n$ is sufficiently large, while if $B$ is closed

for all $w \in W$ if $n$ is sufficiently large. The statement concerning closed sets follows from that concerning open sets by taking complements, so only open sets need be considered. There is no loss in generality in assuming $B \ne W$. By the definition of the $\chi_j$'s, $\chi_B(w) \ge \chi_j(w)$ for all $w \in W$ so

Qn(w; B);::: (-0" x}(w)

= Q oo

We shall prove two lemmas.

Lemma 2.1.55. For any homogeneous random system with complete connections satisfying (I) and (V), the operator $U$ has no eigenvalues of modulus 1 other than 1.

Proof. (Similar arguments were previously used by JAMISON (1964).) Suppose $|\lambda| = 1$, $\lambda \ne 1$, $\psi \not\equiv 0$ and $\psi \in \mathfrak{N}_\lambda$. Since $|\psi| \in C(W)$, there exists a $w_0 \in W$ such that

$$|\psi(w_0)| = \max_{w \in W} |\psi(w)| = |\psi|.$$

Clearly $\psi(w_0) \ne 0$. Now $(U^n \psi)(w_0) = \lambda^n \psi(w_0)$. Let $B_n = \{w : \psi(w) = \lambda^n \psi(w_0)\}$ for $n \in N^*$. We have $Q^n(w_0; B_n) = 1$ for all $n \in N^*$. For, note that

$$\mathrm{Re}\, \frac{\psi(w)}{\lambda^n \psi(w_0)} < 1, \qquad w \in B_n^c. \qquad (2.1.33)$$

By

$$(U^n \psi)(w) = \int_W Q^n(w; dw')\, \psi(w')$$

(see 2.1.3') taking $w = w_0$, we obtain

$$\lambda^n \psi(w_0)\, Q^n(w_0; B_n^c) = \int_{B_n^c} Q^n(w_0; dw')\, \psi(w'). \qquad (2.1.34)$$

Taking into account (2.1.33), if $Q^n(w_0; B_n^c) \ne 0$, (2.1.34) implies

$$Q^n(w_0; B_n^c) < Q^n(w_0; B_n^c).$$

Consequently, one must have $Q^n(w_0; B_n) = 1$ for all $n \in N^*$. Since $Q(w_0; B_1) = 1$, $B_1 \ne \emptyset$. Let $w_1 \in B_1$. Then $\psi(w_1) = \lambda \psi(w_0)$, $(U^n \psi)(w_1) = \lambda^{n+1} \psi(w_0)$, and $Q^n(w_1; B_{n+1}) = 1$ for $n \in N^*$. But $|\lambda^{n+1} - \lambda^n| = |\lambda - 1| > 0$. Since $\psi$ is uniformly continuous there exists $\delta > 0$ such that d(w',w")

for all w' E W. Thus so

This formula and a simple induction on v imply

for v~O. Thus


But any positive integer l can be represented as l=vn 0 +j for some v~O and O~ji+q

1)-

Thus

(2.2.11)

~6M

m+n-2

2

[

q

_L

1-=m

t:;+(n-1)_

L

J>m+q

]

t:j-i-k+1 •


Finally, we have according to (2.2.7'), (2.2.10) and (2.2.11)

+12(n-1)

L

BJ-i-Ht

+4 L,

J>m+11

If $\alpha = 0$, the condition $\sum_{n \in N^*} \varepsilon_n < \infty$ implies

for fixed $k$, and it suffices to take $q = [n^{1/2}]$. If $\alpha > 0$, the condition $\sum_{n \in N^*} n^\alpha \varepsilon_n < \infty$ implies $\varepsilon_n = O(n^{-1-\alpha})$ and elementary computations show that one can take $q = [n^{\frac{1}{1+\alpha}}]$, which leads to the announced rate of convergence. q.e.d.

We deduce immediately the following

Corollary. If $\Psi$ in (2.2.5) is bounded and $\sum_{n \in N^*} \varepsilon_n < \infty$, then for every $m \in N^*$, $w \in W$, $a > \frac{1}{2}$, the random variable

converges to 0 in $P_w$-quadratic mean as $n \to \infty$.

Proof. The proof follows from Theorem 2.2.19.

q.e.d.

Notes and comments

Theorems 2.2.16 and 2.2.19 are due to IOSIFESCU (1965e, 1967b).

2.2. Asymptotic behaviour

Complements

Supposethat en=O(e-,tt/2i) for some A.>0. Thenforall mEN*,wEW

as

n ➔ oo,

where Pn=O

1 mf . {q+l - - + e-lVq+l }) ( -n + qeN• n

[Cmcu (1957 a, 1957 b)].

2.2.2.4. The central limit theorem

2.2.2.4.1. In what follows we shall suppose that (2.2.3) holds. First we shall investigate the connections between the asymptotic behaviours of suitably normed sums associated with the sequence $(f_n)_{n \in N^*}$ under the assumption that the probability on (Q,X) is either Pw or Pa.. Let $(a_n)_{n \in N^*}$ be a sequence of real numbers and $(b_n)_{n \in N^*}$ be a sequence of positive numbers such that

$$\lim_{n \to \infty} b_n = \infty. \qquad (2.2.12)$$

Let us denote

F.m.n>(a) = P.w (sm,n w b - an
,

nE N*'

for a fixed pEA. It follows that (n takes on the (m+ lt values 1 ~jr~m+ 1, 1 ~r~n, with the probabilities 7th·· .1Jp). For the homogeneous case

t/lJ

1

iJp),

...

2.3.1.1.5. Uniform ergodicity reduces for homogeneous simple OM-chains to the existence of the limits

which are not dependent on the initial probability distribution $p \in \Delta$ (or $\Delta'$), uniformly with respect to $l \in N^*$, $p \in \Delta$ (or $\Delta'$), $1 \le i_1, \ldots, i_l \le m+1$.

2.3.1.1.6. The concept of OM-chain can be extended in the sense that one may consider chains which are of a given multiplicity $l$. We arrive at the so-called multiple OM-chains; without entering into further details, we note only that for such chains the transition mappings are in fact dependent not only on the preceding state and preceding probability distribution, but on $l$ preceding states and $l$ preceding probability distributions. In Subparagraph 2.3.1.5.2 we consider, for example, multiple OM-chains of second order in case of an arbitrary set of states.

1) Here we identify the sequences $(x_{j_1}, \ldots, x_{j_n})$ and $(j_1, \ldots, j_n)$, $1 \le j_r \le m+1$, $1 \le r \le n$.

Notes and comments

The notion of chain with complete connections was introduced in probability theory by ONICESCU and MIHOC (1935a). The term OM-chain was used for the first time by FORTET (1938). The associated Markov chain was introduced by ONICESCU and MIHOC (1936a) in order to study the asymptotic behaviour of the corresponding OM-chain [see also FORTET (1938), IOSIFESCU and THEODORESCU (1961-1962)]. The idea of defining OM-chains for a continuous parameter by making use of differential equations is to be found in ONICESCU (1954, 1956); for multiple OM-chains, see THEODORESCU (1956). The same idea, referred to generalized random processes, has been applied by MARINESCU (1960), where the Gâteaux differential replaces the usual operation of differentiation.

Complements

1. Let us consider a finite set $X = \{x_1, \ldots, x_{m+1}\}$ and a family of $(m+1) \times (m+1)$ stochastic matrices ${}^tP = ({}^tp_{i,j})$, $t \in N$. Further, let $(V, \mathcal{V})$ be a measurable space, called the parameter space, and ${}^tT$, $t \in N$, a family of $\mathcal{V}$-measurable mappings of $V$ into itself. Let us assume that ${}^tP$, $t \in N$, is dependent on $v \in V$. The parameter $v$ on trial $t+1$, i.e. $v(t+1)$, is given by

$$v(t+1) = {}^tT\, v(t), \qquad t \in N;$$

it follows that

$$v(t+1) = T^{(t)} v(0),$$

where

$$T^{(t)} = {}^tT \circ \cdots \circ {}^0T.$$

A system $\{V, X, ({}^tP)_{t \in N}, ({}^tT)_{t \in N}\}$ of spaces and mappings is an OM-chain in the wide sense [THEODORESCU (1968)]; obviously the OM-chain is a special case. If we consider a $k$-person OM-chain in the wide sense, that is $k$ OM-chains in the wide sense for which the stochastic matrices depend on $k$ parameters, we get as special case the $k$-person Markov process. For the properties of these processes and their application in economics, see JACOBS (1958).


2. Let us consider a homogeneous simple OM-chain and let us set

where roi,ji .. J,.(pn)

= Y'i,ji (p") ... ,J,j,._ i,j,.(pn+r-1 ),

nE N*,

provided that on trial n occured xi. This is then-step entropy of a path of length r provided that on trial n occured xi. The corresponding expected entropy is m+l

W,.(p)=

L

1tir•-dp)H'in,iP)

ik= 1 1 ~k~n

and we have r-1

W,.(p) =

I

H~+k(P),

k=O

H~+s(p) = H~(p) + H!+,(p).

If lim W,.(p)=H'00 (p),

reN*,

n-+ oo

then H'"oo(p) =r H~(p).

Further, a Shannon type theorem can be proved [IOSIFESCU and THEODORESCU (1961)]. This result extends to OM-chains those obtained by HINCIN (1953) for simple Markov chains and those obtained by AMBARCUMJAN (1958) for multiple Markov chains.

2.3.1.2. Examples

2.3.1.2.1. Let us consider an initial urn $U_0$ containing $a_j$ balls of colour $j$, $1 \le j \le m+1$, and denote by $a_j^n$, $1 \le j \le m+1$, the structure of the urn $U_n$ ($a_j^0 = a_j$, $1 \le j \le m+1$), given by the following rule: if the structure of the urn $U_n$ was $a_j^n$, $1 \le j \le m+1$, and on trial $n$ a ball of colour $i$ was drawn, then the structure of the urn $U_{n+1}$ is

$$a_j^{n+1} = f_{i,j}^n(a_1^n, \ldots, a_{m+1}^n),$$

where the functions $f_{i,j}^n$, $1 \le i, j \le m+1$, $n \in N$, take values in $N$. The conditional probabilities of drawing a ball of colour $j$ on trial $n$ are

2.3. Special random systems with complete connections


and on trial n + 1

(2.3.3)

Further, if we denote by $D_j^n$, $1 \le j \le m+1$, the algebraic complements of the elements from the last row in the determinant

we obtain

and therefore the conditional probabilities (2.3.3) become

$$p_j^{n+1} = \frac{f_{i,j}^n(\lambda_n D_1^n, \ldots, \lambda_n D_{m+1}^n)}{\sum_{k=1}^{m+1} f_{i,k}^n(\lambda_n D_1^n, \ldots, \lambda_n D_{m+1}^n)}, \qquad 1 \le j \le m+1. \qquad (2.3.4)$$

Assuming now that the $f_{i,j}^n$'s, $1 \le i, j \le m+1$, $n \in N$, are homogeneous of the same order, we deduce from (2.3.4)

$$p_j^{n+1} = \frac{f_{i,j}^n(D_1^n, \ldots, D_{m+1}^n)}{\sum_{k=1}^{m+1} f_{i,k}^n(D_1^n, \ldots, D_{m+1}^n)} = \varphi_{i,j}^n(p_1^n, \ldots, p_{m+1}^n),$$

n.k respectively.


2. Let us consider an initial urn containing $a_1$ balls of colour 1 and $a_2$ balls of colour 2, $a_1 + a_2 = M$. Let us assume that on trial $n$ a ball was drawn; then this ball is replaced by a ball of the opposite colour. We get a homogeneous simple alternate linear OM-chain. An equivalent model is the following. Let us consider the sequence of urns with the following compositions:

$$(0, M), (1, M-1), \ldots, (M-1, 1), (M, 0).$$

Obviously, the initial urn is located somewhere in this sequence. If the drawn ball is of colour 1 (2) the next drawing is performed from the foregoing (following) urn. Thus we get a circular system of urns [ONICESCU and MIHOC (1936b)]. If we denote by $v_n$ the number of balls of colour 1 drawn during the first $n$ trials, then the urn will have after $n$ drawings the composition $(a_1 - 2v_n + n,\ a_2 + 2v_n - n)$. But

$$0 \le a_1 - 2v_n + n \le a_1 + a_2, \qquad 0 \le a_2 + 2v_n - n \le a_1 + a_2,$$

from which we get

$$-\frac{a_2}{2n} \le \frac{v_n}{n} - \frac{1}{2} \le \frac{a_1}{2n},$$

that is

$$\lim_{n \to \infty} \frac{v_n}{n} = \frac{1}{2}.$$
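The bounds just derived hold along every path, not merely with probability one, and this can be checked by simulation. The following Python sketch is an illustration under stated assumptions: it reads the rule as "the drawn ball is exchanged for one of the opposite colour", which is what the composition $(a_1 - 2v_n + n,\ a_2 + 2v_n - n)$ requires, and the function name, the seed and the use of Python's `random` module are hypothetical choices not taken from the text.

```python
import random


def simulate_circular_urn(a1, a2, n_trials, seed=0):
    """Simulate the urn and return the pairs (n, v_n), where v_n is the
    number of colour-1 balls drawn during the first n trials."""
    rng = random.Random(seed)
    total = a1 + a2          # M, constant along the path
    colour1 = a1             # current number of balls of colour 1
    v = 0
    history = []
    for n in range(1, n_trials + 1):
        if rng.random() < colour1 / total:
            # colour 1 drawn: it is exchanged for a ball of colour 2
            v += 1
            colour1 -= 1
        else:
            # colour 2 drawn: it is exchanged for a ball of colour 1
            colour1 += 1
        # composition after n drawings, as stated in the text
        assert colour1 == a1 - 2 * v + n
        history.append((n, v))
    return history


def within_bounds(a1, a2, history):
    """Check -a2/(2n) <= v_n/n - 1/2 <= a1/(2n) for every recorded n."""
    return all(-a2 / (2 * n) - 1e-12 <= v / n - 0.5 <= a1 / (2 * n) + 1e-12
               for n, v in history)
```

For example, `within_bounds(3, 7, simulate_circular_urn(3, 7, 5000))` checks the double inequality at every trial of one simulated path; whatever the seed, the deviation of $v_n/n$ from $\frac{1}{2}$ is at most $\max(a_1, a_2)/(2n)$, so that $v_n/n \to \frac{1}{2}$ along every path.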

This relation highlights an interesting example of the law of large numbers; in this case the convergence is in the classical sense of analysis [see also CANTELLI (1935)]. It follows that

$$\lim_{n \to \infty} \frac{P_1(p) + \cdots + P_n(p)}{n} = \frac{1}{2}$$

for any $p = \frac{i}{M}$, $0 \le i \le M$. The circular system of urns considered above has also other interesting properties. For instance, if $a_1 = a_2 = a = 2k$, then

$$\lim_{n \to \infty} P_{1/2}(v_{2n} - n = x) = 2^{1-2a} \binom{2a}{a - 2|x|}$$

for $|x| \le k$, $x \in Z$, and

$$\lim_{n \to \infty} P_{1/2}(v_{2n+1} - n = x) = 2^{1-2a} \binom{2a}{a - 2|x| + 1}$$

for $|x| \le k$, $x \in Z$, $x \ne 0$. For sufficiently large $a$, by making use of the Stirling formula, we get

$$2^{1-2a} \binom{2a}{a - 2|x|} \sim \frac{2}{\sqrt{a\pi}}\, e^{-\frac{4x^2}{a}},$$

i.e. the Gaussian law. For small $a$, the limit law differs essentially from the Gaussian one [ONICESCU and MIHOC (1936b)].

3. The Thurstone urn. It suffices to take

$$m = 1, \quad d_{11} = \lambda_1, \quad d_{12} = 0, \quad d_{21} = \lambda_2, \quad d_{22} = 0.$$

This amounts to the fact that if on trial $n$ a ball was drawn of colour 1 (2), then this ball is replaced while $\lambda_1$ ($\lambda_2$) balls of colour 1 are added to the urn. This urn has important applications in learning theory [THURSTONE (1930)].

4. The urn model described in Subparagraph 2.3.1.2.2 may be generalized by setting

1----1·---m+l a::::: a::::: ,

an+l_a"+d" j j i}• i.e. the quantities d:i vary with n. An important special case is The Luce urn. It suffices to take m= 1,

di.2=0.

This amounts to the fact that if on trial n a ball of colour 1 (2) was drawn, then this ball is replaced while - 1- ai ( -1- a2) balls of colour 1 1 +/31 l+P2 are added to the urn. In other words, the number of balls of colour 1 added is proportional to the number already in the urn. In this case p"+ 1 l

-

JJ':

Pi +P;(I-pD

[LUCE (1959)]. This model is very important in learning theory; see Subsection 3.3.1.

2.3.1.3. Ergodic theorems

2.3.1.3.1. Let us introduce a metric $d$ in $\Delta'$ by setting

$$d(p, q) = \sum_{j=1}^{m} |p_j - q_j|.$$

We shall obtain an upper bound for the norm

$$\mu(f) = \sup_{p, q \in \Delta',\ p \ne q} \frac{d(f(p), f(q))}{d(p, q)}$$

of a mapping $f = (f_1, \ldots, f_m)$ of $\Delta'$ into itself.

Proposition 2.3.4. Suppose that the $f_j$'s, $1 \le j \le m$, have first order bounded partial derivatives. Then

$$\mu(f) \le \max_{1 \le k \le m} \sup_{p \in \Delta'} \sum_{j=1}^{m} \left| \frac{\partial f_j(p)}{\partial p_k} \right|.$$
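The bound of Proposition 2.3.4 is the largest column sum of absolute partial derivatives, and for an affine mapping it can be compared directly with observed ratios $d(f(p), f(q))/d(p, q)$. The following Python sketch is only an illustration: it assumes that $\Delta'$ is the set of points with nonnegative coordinates summing to at most 1, and the matrix `A`, the vector `B` and all function names are hypothetical.

```python
import random

# Hypothetical affine map f(p) = A p + B on Delta' with m = 2; its partial
# derivatives are the constant entries of A, and it maps Delta' into itself.
A = [[0.5, 0.2],
     [0.1, 0.3]]
B = [0.1, 0.1]


def f(p):
    """Apply the affine map coordinate by coordinate."""
    return [sum(A[j][k] * p[k] for k in range(len(p))) + B[j]
            for j in range(len(A))]


def d(p, q):
    """The metric d(p, q) = sum_j |p_j - q_j| of the text."""
    return sum(abs(pj - qj) for pj, qj in zip(p, q))


def jacobian_bound(matrix):
    """max over k of sum over j of |df_j/dp_k| (largest column sum)."""
    m = len(matrix)
    return max(sum(abs(matrix[j][k]) for j in range(m)) for k in range(m))


def random_point(rng, m):
    """Rejection sampling from {p : p_j >= 0, sum_j p_j <= 1} (assumed Delta')."""
    while True:
        p = [rng.random() for _ in range(m)]
        if sum(p) <= 1.0:
            return p


def largest_observed_ratio(n_pairs=2000, seed=1):
    """Largest value of d(f(p), f(q)) / d(p, q) over random pairs."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(n_pairs):
        p, q = random_point(rng, 2), random_point(rng, 2)
        if d(p, q) > 1e-6:  # skip nearly coincident pairs
            best = max(best, d(f(p), f(q)) / d(p, q))
    return best
```

Here `jacobian_bound(A)` is the right-hand side of the Proposition for this particular map, and `largest_observed_ratio()` never exceeds it; pairs differing only in the first coordinate attain it exactly.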

Proof. We have, for some 0 1. It should also be noted that by specialising the results of Paragraph 2.1.2.4 we could obtain sufficient conditions for uniform ergodicity in the strong sense for nonhomogeneous OM-chains.

2.3.1.3.3. Now consider the homogeneous case. Let $\Delta'$ be endowed with an arbitrary metric $d$. We have

Theorem 2.3.6. Suppose that (j) there exists k 0 , 1 ~ k 0 ~ m+ 1, such that

m

L'P;,j(p) 0, c/>2,1 (p)> 0 (or for every pe [O, 1];

4>1.JP)0,

lcl~l,

d>0,

c+d>0

lal~l,

b x,

and (2.3.19) uniformly with respect to pe A. Proof. Relation (2.3.18) is a simple consequence of uniform ergodicity and a known Cesaro theorem. Further, we have EP (c2n,11·--11 > . .

1

2

n

'° E

=- -~ (n+l)2

1-0

n

'° '°

n-1

.v; . . + - -~ -~ EP~i.ji···11-k,J1·--11· r .( . . (n+l)2 1-0 k-t+l

P"'l,)1·--11

In virtue of uniform ergodicity of the considered OM-chain, we obtain n

L EP CL

n

1 ...

i,

L EP (i,ii-••ii ~ (n+ l)Pi~---i, + b

~

i=O

(2.3.20)

i=O

smce Next, 2

n-1

n

---2 " " (n+ l) i~Ok=f+1

Epr,.. 1•1···1·, r_k,1'1--·1·, ~ [P~ ·] 2 11 "'

1.,

"

·--Jr

A

A

2 + _t_ p~ . +--n+l 11 --· 11 (n+l) 2 '

(2.3.21) where A1 and A 2 are constants depending on b but not depending on n. By (2.3.20) and (2.3.21), we get Ep',,n,]l••·Jl r . . ~[P~ .]2+!!!._P~ ~ Jt· .. J! n+l Jt•··JI. + (n+1)2'

(2.3.22)

where B1 and B 2 are constants depending on b but not depending on n. Analogously, we have

(2.3.23) where C 1 and C 2 are constants depending on b but not depending on n. According to (2.3.22) and (2.3.23), we obtain the relation DP(,t ~ Pf +0,01, Pf-0,004 ~ z(i>,t ~ Pf +0,004, Pf-0,03 ~ 1 ~ Pf +0,03, Pf -0,01 ~ 1 ~ Pf +0,01, Pf -0,004 ~ 1 ~ Pf +0,004.

(1)

(2) (3) (4)

vt vt vt

(5) (6)

Summarizing the results, we arrive at Table 1.

IA 1B

IIA IIB

(1)

(2)

(3)

31 98 20 62

779 378 702 215

982

(4)

(5)

(6)

972 560 827 221

Table 1 gives the subscript i from which on all values of zw,i (respectively, vj, 1 ), i ≤ j ≤ 1000, verify the corresponding inequalities in the paths A, B (IA and IB for the first OM-chain and IIA and IIB for the second one); the sign - shows that such an i does not exist. Table 2a (respectively 2b) gives the numbers of values z(i).J and 1 which verify the inequalities (1)-(6) in different parts of the simulations (1-100, 101-200, ..., 901-1000 and 1-1000).

vt

2. Random systems with complete connections


Table2a (6)

(5)

(4)

(3)

(2)

(1)

IA

I 1B

1-100 101-200 201-300 301-400 401-500 501-600 601-700 701-800 801-900 901-1000

89 100 100 100 100 100 100 100 100 100

73 100 100 100 100 100 100 100 100 100

31 68 100 100 60 2 1 21 100 100

9 20 10 25 100 100 100 100 100 100

10 6 43 18 0 0 0 0 0 0

4 0 0 0 0 74 58 68 55 21

22 42 90 98 35 0 0 0 0 29

5 0 0 14 34 96 100 100 100 100

9 3 0 2 15 0 0 0 0 0 0 40 0 11 0 1 0 0 0 0

3 2 0 0 10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

I- 1000

989

973

583

664

77

280

316

549

26

13

-----

-

IA I 1B IA I 1B IAIIB

IA I rn IA I 1B

55

2

---

Table 2b (2)

(1) -

1-100 101-200 201-300 301-400 401-500 501-600 601-700 701-800 801-900 901-1000 1-1000

(4)

(3)

(5)

(6)

IIA I 118 IIA I 11B IIAIIIB IIB I na IIAIIIB 1IAJ1rn 86

100 100 100 100 100 100

63 100 100 100 100 100 100 100 100 100

18 0 4 12 0 43 2 98 100 100

100 97 100 100 100 100 100 100 100

986

963

377

917

100 100 100

20

2 0 0 0 0 0 0 0

0 0 2

5 25 21 87 44 0 0 0 0

16 0 0 0 0

3 0 0 0 0 0 0

1 15 8 31 7 0 0

1 0 0 0 0 0

0

0

0

0 70 94 100

15 95 90 100 100 100 100 100 100 100

0 0

0 0

0 0 0

182

291

900

3

62

11

0

1 5 0 1 0 0 0 0 0

0 7

Table 3 summarizes the results concerning the simulation of the alternate OM-chain with the initial probability distribution $p_1=0{,}5$, $p_2=0{,}5$ and with the transition mappings

$$\varphi_{1,1}(p)=0{,}1\,p^2-0{,}3\,p+0{,}6,\qquad \varphi_{2,1}(p)=0{,}1\,p^2+0{,}2\,p+0{,}3.$$


Table 3. [Rows indexed by n = 25, 50, 75, ..., 1000.]

We adopt the following convention concerning the a-ary expansion of u in ambiguous cases. The a-ary expansion of u = 1 will be taken as u = 0,(a-1)(a-1).... In all other ambiguous cases, an expansion terminating in 0's will be preferred to one terminating in (a-1)'s. Thus in the decimal system, the expansion of u = 1 will be 0,999... while the expansion of u = ½ will be 0,5000... rather than 0,4999....
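The convention can be made concrete with a short sketch. The function names `a_ary_digits` and `first_digit` are hypothetical helpers introduced for illustration; exact rational arithmetic is used so that terminating expansions are detected without rounding error.

```python
from fractions import Fraction

def a_ary_digits(u, a, n):
    """First n digits of the a-ary expansion of u in [0, 1].

    u = 1 is expanded as 0,(a-1)(a-1)...; in every other ambiguous case the
    expansion terminating in 0's is produced (this is what the floor rule gives).
    """
    u = Fraction(u)
    if not 0 <= u <= 1:
        raise ValueError("u must lie in [0, 1]")
    if u == 1:
        return [a - 1] * n
    digits = []
    for _ in range(n):
        u *= a
        digit = u.numerator // u.denominator  # integer part of a*u
        digits.append(digit)
        u -= digit
    return digits

def first_digit(u, a):
    """First digit of the a-ary expansion of u."""
    return a_ary_digits(u, a, 1)[0]
```

For instance, `a_ary_digits(Fraction(1, 2), 10, 4)` yields the digits 5, 0, 0, 0 and not 4, 9, 9, 9.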


We have

Theorem 2.3.9. Suppose that $Q(\cdot;x)$ are Borel measurable for all $x\in X$; suppose also that there exists a distribution function G, with $G(0-)=0$, $G(1)=1$, which satisfies the functional equation

$$G(u)=\sum_{x\in X}\int_0^{au-x}Q(v;x)\,dG(v),\qquad u\in[0,1]. \tag{2.3.28}$$

Then there exists a probability space $(\Omega,\mathcal{K},P)$ and an X-valued strictly stationary doubly infinite sequence $(\xi_t)_{t\in Z}$ defined on $(\Omega,\mathcal{K},P)$ such that P-a.s.

$$P(\xi_t=x\mid\xi_{t-1},\xi_{t-2},\dots)=P(\dots,\xi_{t-2},\xi_{t-1};x) \tag{2.3.29}$$

for all $x\in X$, $t\in Z$.

Proof. Consider a real-valued Markov chain $(\eta_t)_{t\in Z}$ on a suitable probability space $(\Omega,\mathcal{K},P)$ with state space [0,1], stationary absolute distribution given by G and transition probability functions given by

$$P\Big(\eta_t=\frac{x+u}{a}\ \Big|\ \eta_{t-1}=u\Big)=Q(u;x),\qquad u\in[0,1]. \tag{2.3.30}$$

In virtue of (2.3.28) these conditions are compatible. Define the function h on [0,1] by

h(u) = first digit in the a-ary expansion of u.

Now define the random variables $\xi_t$ by

$$\xi_{t-1}=h(\eta_t),\qquad t\in Z.$$

Obviously $(\xi_t)_{t\in Z}$ is a strictly stationary doubly infinite sequence on $(\Omega,\mathcal{K},P)$ and

$$P\Big(\eta_t=\sum_{j\in N^*}\frac{\xi_{t-j}}{a^j}\Big)=1$$

since $P(\eta_{t-1}=a\eta_t-h(\eta_t))=1$ for all $t\in Z$. Also

$$P(\dots,\xi_{t-2},\xi_{t-1};x)=Q\Big(\sum_{j\in N^*}\frac{\xi_{t-j}}{a^j};x\Big) \tag{2.3.31}$$

holds P-a.s. for all $t\in Z$. The only "paths" $(\dots,\xi_{t-2},\xi_{t-1})$ for which the two sides of (2.3.31) might be different are paths other than $(\dots,(a-1),(a-1))$ which contain only a finite number of components different from (a-1), and these can be shown to have P-probability 0. It follows from this that (2.3.29) holds. q.e.d.


We note that the Borel measurability of $Q(\cdot;x)$, $x\in X$, may be assured if we suppose that¹⁾ $\omega^n\to\omega$ implies $P(\omega^n;x)\to P(\omega;x)$ as $n\to\infty$. In fact under this condition it follows that $Q(\cdot;x)$, $x\in X$, are continuous to the right for each $u\in[0,1]$ and continuous to the left except possibly at lattice points (that is, the points whose a-ary expansion terminates in an unbroken sequence of 0's). Thus $Q(\cdot;x)$, $x\in X$, belong to Baire class 1. In particular, this holds if Condition H below is satisfied.

2.3.2.3.3. We shall use the notation

$$(u=v)_n$$

to mean that the first n digits in the a-ary expansion of u are the same as the first n in the expansion of v. Define

$$h_n=\sup_{x,\ (u=v)_n}|Q(u;x)-Q(v;x)|,\qquad n\in N^*. \tag{2.3.32}$$

We shall consider

Condition H. We have $\lim_{n\to\infty}h_n=0$ and

$$\sum_{m\in N^*}\ \prod_{k=1}^{m}\Big(1-\frac{a}{2}h_k\Big)=\infty. \tag{2.3.33}$$

We shall agree that any of the factors $\big(1-\frac{a}{2}h_k\big)$ in (2.3.33) which is zero or negative will be replaced by 1. We note that the condition $\sum_{n\in N^*}h_n<\infty$ implies Condition H but the converse may not be true (for example Condition H is satisfied provided we have $h_k\le2(ak)^{-1}$ for sufficiently large k).

We have

Theorem 2.3.10. Let $G_n(v;u)$ be the n-step distribution function induced by the Markov transition law (2.3.30) starting from a given value u (that is $G_n(v;u)=P(\eta_n\le v\mid\eta_0=u)$). Suppose that Condition H is satisfied and that there exists $x_0\in X$ such that

(2.3.34)

¹⁾ If $\omega^n=(\dots,x^{(n)}_{-1},x^{(n)}_0)$ and $\omega=(\dots,x_{-1},x_0)$, then $\omega^n\to\omega$ will mean that for each $k\in-N$, $x^{(n)}_k=x_k$ for all n sufficiently large.


Then

(i) $G_n(v;u)$ is $C_1$-summable to a distribution function G(v) which is independent of u. If $\sum_{n\in N^*}h_n<\infty$ holds (instead of Condition H) then the ordinary limit exists. In either case the limit is uniform in u;

(ii) the distribution function G either has a single discontinuity of magnitude 1 at one of the points 0, 1/(a-1), 2/(a-1), ..., 1 or is continuous;

(iii) the distribution function G is the only stationary absolute distribution for the considered Markov transition law and satisfies (2.3.28).

Proof. (i) The method is related to an idea of DOEBLIN (1938), who proved the ergodic theorem for Markov chains with a finite number of states by considering two particles starting in different states, which move independently until they simultaneously occupy the same state, after which they merge. In order not to obscure the main idea by details we sketch the proof for the case a = 2. Since (2.3.34) holds we can just as well take $x_0=0$. Then

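Let $(t_m)_{m\in N}$ be independent random variables uniformly distributed on (0,1). Define sequences $(\eta'_m)_{m\in N}$ and $(\eta''_m)_{m\in N}$ as follows: $\eta'_0=u'$, $\eta''_0=u''$, $u',u''\in[0,1]$. Suppose $\eta'_m$ and $\eta''_m$ are determined. Then

$$\eta'_{m+1}=\begin{cases}\tfrac12\eta'_m & \text{for } t_m\le Q(\eta'_m;0),\\[2pt] \tfrac12+\tfrac12\eta'_m & \text{for } t_m>Q(\eta'_m;0),\end{cases}$$

while

$$\eta''_{m+1}=\begin{cases}\tfrac12\eta''_m & \text{for } t_m\le Q(\eta''_m;0),\\[2pt] \tfrac12+\tfrac12\eta''_m & \text{for } t_m>Q(\eta''_m;0).\end{cases}$$

It is easy to see that $(\eta'_m)_{m\in N^*}$ and $(\eta''_m)_{m\in N^*}$ are Markov chains with transition law given by (2.3.30). It is convenient to let $T'_m$ $(T''_m)$ designate the transformation applied to $\eta'_m$ $(\eta''_m)$. That is, $T'_m=i$ $(T''_m=i)$ if

$$\eta'_{m+1}=\frac{i+\eta'_m}{2}\qquad\Big(\eta''_{m+1}=\frac{i+\eta''_m}{2}\Big),\qquad i=0,1.$$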

We have

0

A slight modification is necessary if u' = 1 or u" = 1.
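The coupling used in the proof (two chains for a = 2 driven by the same uniform variables) can be sketched as follows. This is an illustrative simulation only: the function name `coupled_chains`, the argument `q0` standing for Q(·; 0), and the seed are assumptions introduced here.

```python
import random

def coupled_chains(q0, u1, u2, n_steps, seed=0):
    """Run two chains (case a = 2) driven by the same uniforms t_m.

    q0(u) plays the role of Q(u; 0): each chain moves to u/2 when
    t_m <= q0(current value) and to 1/2 + u/2 otherwise.
    Returns the list of pairs (eta'_m, eta''_m) for m = 0, ..., n_steps.
    """
    rng = random.Random(seed)
    e1, e2 = u1, u2
    path = [(e1, e2)]
    for _ in range(n_steps):
        t = rng.random()  # the common uniform variable t_m
        e1 = e1 / 2 if t <= q0(e1) else 0.5 + e1 / 2
        e2 = e2 / 2 if t <= q0(e2) else 0.5 + e2 / 2
        path.append((e1, e2))
    return path
```

Whenever both chains receive the same transformation, their distance is halved. With a constant `q0` this happens at every step; with a state-dependent choice such as the hypothetical `lambda u: 0.3 + 0.4 * u` the two chains can occasionally take different branches, which is where the quantities $h_n$ enter the argument.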


which in turn implies

I'lm+k+l I

II

I ~2-" •

'lm+k+l ""

It can be shown that these relations lead for any $\varepsilon>0$ to

$$\lim_{n\to\infty}\frac1n\sum_{m=0}^{n-1}\mathrm{Prob}(|\eta'_m-\eta''_m|>\varepsilon)=0, \tag{2.3.35}$$

uniformly in u' and u''. A simple argument then shows that $G_n(v;u)$ is $C_1$-summable to a distribution function G(v) which is independent of u. Moreover, the difference

tends to zero uniformly in $u\in[0,1]$ as $n\to\infty$ at all points of continuity of G. If the stronger condition $\sum_{n\in N^*}h_n<\infty$ holds instead of Condition H, we can replace (2.3.35) by the stronger statement $\lim_{m\to\infty}\mathrm{Prob}(|\eta'_m-\eta''_m|>\varepsilon)=0$. We then get actual convergence, rather than just $C_1$-summability, of the distributions to G.

(ii) First let $x_0$ in (2.3.34) be 0. If Q(0;0) = 1 it is clear that G has a jump of magnitude 1 at v = 0, and conversely. If Q(0;0) < 1, G is everywhere continuous. First, G must be continuous at 0. For let k, nE N*, k ): a. The probability transition function p together with an a priori distribution function for the initial value d 0 l = a 0 ( or l}

ii

en,

Here PW and neN*, are those constructed in Theorem 2.1.3. This is not an essential restriction; it is made here for the sake of simplicity.


more generally, for a initially fixed segment (a(m), x, neN*. If we introduce the notation Pn=(p~, ... ,p!), ~=Prob(an=ilx),

a}) = P.(Sn(ro)E A),

Q"(s;A) =

nE N*,

Q0 (s;A) = X..t(s),

for all Ae9'. 3.1.2.2.2. Theorem 3.1.9 below shows what can be said about the asymptotic behaviour of the random sequence (Sn)neN" for distance diminishing models with no further assumptions.

Theorem 3.1.9. For any distance diminishing model, the transition probability function

converges uniformly as n--+ oo to a transition probability function Q 00 and for any Borel set A in S, Q-~(-;A)eCL(S). There exists a positive constant c such that

II:

:t: E.t/,(S1)-(U) Q00 (• ;ds') 1 >.

s

Proof The proof follows from Theorems 2.1.65 and 2.1.66.

q.e.d.

We have also Theorem 3.1.16. If an absorbing distance diminishing model and an A c £0, there exist two positive constants v< 1 and K., such that

for all seS, m,neN*, xeR. (e) Denote by o-f ( ti') the series obtained by replacing in the expression 2 of o- ( P) the term E Y. P(E i) 'l'(En + i) by E 00 'l'(E i) l/'(£" 1+ i). If l is sufficiently large (such that L(J,' < 1) and o-1(P)>0, then

- I (n ➔ oo Iim

P s

for all seS, meN*.

11

l/'(E(h-1)l+m)-n£ootp

=_!__ ---~---_-_-_-_-~

O"i(l/')V2nlog2n

)

= 1 =1


3.1. Basic models

Proof. The proof follows from Proposition 2.2.18, Theorems 2.2.19, 2.2.24, 2.2.12 and 2.2.27. q.e.d.

We note that we could also give other asymptotic properties by transcribing those given in Section 2.2.

Notes and comments

The theorems given here are due to NORMAN (1968a), except for Theorem 3.1.17.

3.1.3. Finite state models

3.1.3.1. Introductory comments

3.1.3.1.1. It is possible to develop a theory of finite state models that completely parallels the theory of distance diminishing models given in Subsection 3.1.2. This will not be done here, since the results concerning states obtained by these relatively complicated methods are, if anything, slightly inferior to those that can be obtained by applying the well known theory of finite Markov chains to $(S_n)_{n\in N^*}$.

3.1.3.1.2. The natural analogue of (V), Theorem 3.1.10, for finite state models is

(V') for any $s,s'\in S$ if n is sufficiently large.

This is equivalent to

(V') the finite Markov chain $(S_n)_{n\in N^*}$ has a single ergodic set and this set is regular

or to

(V') there exists $n_0\in N^*$ such that the coefficient of ergodicity of the $n_0$th power of the transition matrix Q of the finite Markov chain $(S_n)_{n\in N^*}$ is positive.
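The last formulation is easy to test on a given transition matrix. The sketch below assumes one common definition of the coefficient of ergodicity, namely the minimum over pairs of rows (i, j) of $\sum_k\min(p_{ik},p_{jk})$; this definition, and the helper names, are assumptions made for illustration and may differ in detail from the one used earlier in the text.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def ergodicity_coefficient(P):
    """min over row pairs (i, j) of sum_k min(P[i][k], P[j][k]) (assumed definition)."""
    n = len(P)
    return min(sum(min(P[i][k], P[j][k]) for k in range(n))
               for i in range(n) for j in range(n))

def first_positive_power(Q, max_power=50):
    """Smallest n0 <= max_power with positive coefficient for Q**n0, else None."""
    P = Q
    for n0 in range(1, max_power + 1):
        if ergodicity_coefficient(P) > 0:
            return n0
        P = mat_mul(P, Q)
    return None
```

A purely cyclic chain such as `[[0, 1], [1, 0]]` never attains a positive coefficient, so the function returns `None` for it within any finite search range.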

By analogy with Definition 3.1.11, we have

Definition 3.1.18. A finite state model is ergodic¹⁾ if it satisfies (V').

Notes and comments

The above description of finite state models is due to NORMAN (1968a).

¹⁾ $(S_n)_{n\in N^*}$ need not be an ergodic chain, since it may have transient states. If there happen not to be any transient states, the chain is ergodic and regular.


3. Learning

3.1.3.2. Properties

3.1.3.2.1. Theorem 3.1.19 is analogous to Theorem 3.1.10.

Theorem 3.1.19. For any ergodic finite state model there exist two constants $\alpha<1$ and c and a probability $Q^\infty$ on S such that

(3.1.10)

for all real-valued functions $\psi$ on S, $n\in N^*$, where

$$E_\infty\psi=\sum_{s\in S}\psi(s)\,Q^\infty(\{s\}). \tag{3.1.11}$$

Proof. Let t be the number of states, i.e. $S=\{s_1,\dots,s_t\}$. The transition matrix $Q=(Q_{ij})$ and the column vector $\psi^{*\prime}=(\psi^*_1,\dots,\psi^*_t)$ corresponding to $\psi$ are defined by

$$Q_{ij}=Q(s_i;\{s_j\}),\quad 1\le i,j\le t,\qquad \psi^*_i=\psi(s_i),\quad 1\le i\le t.$$

Then

$$E_{s_i}\psi(S_n)=(Q^n\psi^*)_i,$$

for $n\in N^*$, where $(Q^n\psi^*)_i$ is the ith component of $Q^n\psi^*$. By Theorem 2.1.32 (or 2.1.35)¹⁾ there exists a stochastic matrix $A=(A_{ij})$ all of whose rows are the same, say $(a_1,\dots,a_t)$, and there are two positive constants $\alpha<1$ and b such that

(3.1.12)

for all $n\in N^*$, $1\le i,j\le t$, where $(Q^n)_{ij}$ is the element of the matrix $Q^n$ situated in the ith row and the jth column. Let now $Q^\infty$ be the probability on S with

$$Q^\infty(\{s_j\})=a_j,\qquad 1\le j\le t,$$

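and let $E_\infty\psi$ be any coordinate of $A\psi^*$. Then (3.1.11) holds and

$$|E_{s_i}\psi(S_n)-E_\infty\psi|=|(Q^n\psi^*)_i-(A\psi^*)_i|=\Big|\sum_{j=1}^{t}\big((Q^n)_{ij}-A_{ij}\big)\psi^*_j\Big|\le\sum_{j=1}^{t}|(Q^n)_{ij}-A_{ij}|\,|\psi(s_j)|\le tb\,\alpha^n\|\psi\|,\qquad 1\le i\le t.$$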

This gives (3.1.10) with tb c=-.

q.e.d.

(j_

1>

For a direct proof see, e.g., KEMENY and SNELL (1957), Chapter 4.
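The geometric convergence of $Q^n$ to a matrix with identical rows, on which the proof rests, can be observed directly for a two-state chain. The sketch below is an illustration under stated assumptions: the parametrisation Q = [[1 - x, x], [y, 1 - y]] with 0 < x + y < 2 and the function names are introduced here and are not taken from the text.

```python
def two_state_power(x, y, n):
    """n-th power of Q = [[1 - x, x], [y, 1 - y]] by repeated multiplication."""
    Q = [[1 - x, x], [y, 1 - y]]
    P = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        P = [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
             for i in range(2)]
    return P

def limit_row(x, y):
    """Common row (a_1, a_2) of the limit matrix A for the two-state chain."""
    return (y / (x + y), x / (x + y))

def max_deviation(x, y, n):
    """max over i, j of |(Q^n)_ij - a_j|."""
    P = two_state_power(x, y, n)
    a = limit_row(x, y)
    return max(abs(P[i][j] - a[j]) for i in range(2) for j in range(2))
```

For this family the deviation equals $|1-x-y|^n\max(x,y)/(x+y)$, so the rate of convergence is $|1-x-y|$ and the constant multiplying the geometric factor can be taken equal to 1.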


3.2. Linear models

3.1.3.2.2. The next theorem parallels Theorem 3.1.15.

Theorem 3.1.20. For any ergodic finite state model

for all n, le N*, A c £Cl>, where t

K

1 (A0>) = L Kl (si; A0 >) Q

00

({si}),

i=l

and c and

IX

are as in Theorem 3.1.19.

Proof. The proof follows from Theorem 3.1.15 (i.e. from Theorem 2.1.66) and Theorem 3.1.19. q.e.d.

We have also Theorem 3.1.21. All of the conclusions of Theorem 3.1.17 hold for any ergodic finite state model. Proof. This follows from Theorem 3.1.20.

q.e.d.

Notes and comments

The results mentioned here are due to NORMAN (1968a).

3.2. Linear models

3.2.1. The (t + 1)-operator model

3.2.1.1. Description of the model

3.2.1.1.1. Suppose that the subject in a learning experiment is faced with a set of response classes (shortly, responses) $A_1,\dots,A_{m+1}$ on each trial, and that, following response $A_i$, $1\le i\le m+1$, one of the t + 1 events $E_j$, $1\le j\le t+1$, occurs. The event probabilities are supposed to depend at most on the most recent response. Let $A_{i,n}$ and $E_{j,n}$ denote, respectively, the occurrence of $A_i$ and $E_j$ on trial n and denote by ${}^n\pi_{ij}$ the conditional probability $P(E_{j,n}\mid A_{i,n})$. Such an N-model can be described by identifying $p_n$, the conditional probability vector of $A_{1,n},\dots,A_{m+1,n}$, with the state $S_n$ and by making the following stipulations:


E = {j: 1 ~j ~ t + 1}

Uabbreviates Ej),

nvip) =vn(p;j)="Ajp, m+l i=l

for all 1 ~j ~ t + l; here A 1is a (m + 1) x (m + 1) stochastic matrix and "ll=("ni) a (m+l)x(t+l) matrix such that O~"nij~l, l~i~m+l, 11

r+ 1

1 ~j~t+ 1, and

L "nii= 1, I ~c~m+ 1.

j- 1

Let us examine closely the matrix "Ai; in addition, for the sake of simplicity we shall omit the subscripts nE N and 1 ~j ~ t + 1, that is A= (a;k); obviously, m+l

L

aik=

1 ~k~m+ 1.

1,

i= 1

For m = 1, we set then

$$A=\begin{pmatrix}1-b & a\\ b & 1-a\end{pmatrix}.$$

If now we apply this event matrix operator to p, we get

$$Ap=\begin{pmatrix}(1-b)p_1+ap_2\\ bp_1+(1-a)p_2\end{pmatrix}.$$

Further, if we denote the first element of the foregoing probability vector as $Q(p_1)$ and the second element by $Q(p_2)$, then

$$Q(p_1)=(1-b)p_1+ap_2=p_1+a(1-p_1)-bp_1,$$
$$Q(p_2)=bp_1+(1-a)p_2=p_2+b(1-p_2)-ap_2,$$

i.e. we obtain the so-called gain-loss form. The following two forms are also used:

$$Q(p_1)=\alpha p_1+a,\qquad -a\le\alpha\le1-a,\quad 0\le a\le1,$$

and

$$Q(p_1)=\alpha p_1+(1-\alpha)\gamma;$$


the first is the so-called slope-intercept form and the second one the fixed-point form, since the solution of the equation

$$Q(p_1)=p_1$$

is exactly $p_1=\gamma$. We shall use also the form

$$Q(p_1)=(1-\theta)p_1+\theta\gamma,\qquad \theta=1-\alpha.$$

Using the fixed-point form, we can write

$$A=\alpha I+(1-\alpha)\Gamma,$$

where I is the identity matrix and

$$\Gamma=\begin{pmatrix}\gamma & \gamma\\ 1-\gamma & 1-\gamma\end{pmatrix},$$

so that, returning to $n\in N$ and $1\le j\le t+1$,

$${}^nv_j(p)={}^n\alpha_j\,p+(1-{}^n\alpha_j)\,{}^n\boldsymbol{\gamma}_j,$$

where

$${}^n\boldsymbol{\gamma}_j=\begin{pmatrix}{}^n\gamma_j\\ 1-{}^n\gamma_j\end{pmatrix}.$$

3.2.1.1.2. We turn now to the general case m > 1. We wish to introduce one further restriction which is automatically fulfilled when m = 1, namely: if m + 1 classes of responses are initially defined and if the experimenter later decides to treat any two classes in identical manner, it should be possible to combine those two classes, thereby obtaining the same results that would have been obtained had only m classes been defined initially. For the sake of simplicity, we consider the case m = 2. We start with three response classes $A_1$, $A_2$, $A_3$ and a probability vector $p'=(p_1,p_2,p_3)$. We wish to combine classes $A_1$ and $A_2$ to form a new class which we label $A_c$, and to represent the vector in the collapsed space by

$$Kp=\begin{pmatrix}p_c\\ p_3\end{pmatrix},\qquad p_c=p_1+p_2,$$

where, obviously,

$$K=\begin{pmatrix}1 & 1 & 0\\ 0 & 0 & 1\end{pmatrix}.$$


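Taking into account that A is a stochastic matrix, and after applying the collapsing matrix K, we get

$$KAp=\begin{pmatrix}(a_{11}+a_{21})p_1+(a_{12}+a_{22})p_2+(a_{13}+a_{23})p_3\\ a_{31}p_1+a_{32}p_2+a_{33}p_3\end{pmatrix}.$$

Since the elements of this probability vector are to be independent of the probability distribution within class $A_c$, the components of this vector must not depend on $p_1$ and $p_2$ individually but only on their sum $p_c$. Hence

$$a_{31}=a_{32}=a_3.$$

If we combine now classes $A_1$ and $A_3$ and then classes $A_2$ and $A_3$ we obtain in a similar manner

$$a_{21}=a_{23}=a_2,\qquad a_{12}=a_{13}=a_1.$$

Thus

$$A=\begin{pmatrix}a_{11} & a_1 & a_1\\ a_2 & a_{22} & a_2\\ a_3 & a_3 & a_{33}\end{pmatrix}.$$

But $a_{11}+a_2+a_3=1$, $a_1+a_{22}+a_3=1$, $a_1+a_2+a_{33}=1$, so that

$$a_{11}-a_1=a_{22}-a_2=a_{33}-a_3=\alpha,$$

where

$$\alpha=1-a_1-a_2-a_3.$$

If we set

$$\gamma_i=\frac{a_i}{1-\alpha},\qquad 1\le i\le3,$$

then

$$A=\alpha I+(1-\alpha)\Gamma,$$

where

$$\Gamma=\begin{pmatrix}\gamma_1 & \gamma_1 & \gamma_1\\ \gamma_2 & \gamma_2 & \gamma_2\\ \gamma_3 & \gamma_3 & \gamma_3\end{pmatrix}.$$

Since $\Gamma p=\boldsymbol{\gamma}$, with $\boldsymbol{\gamma}'=(\gamma_1,\gamma_2,\gamma_3)$, we obtain, finally, the expression

$$Ap=\alpha p+(1-\alpha)\boldsymbol{\gamma}$$

or

$$Ap=(1-\theta)p+\theta\boldsymbol{\gamma},\qquad \theta=1-\alpha.$$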

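Returning to $n\in N$ and $1\le j\le t+1$, we can write

$${}^nv_j(p)={}^n\alpha_j\,p+(1-{}^n\alpha_j)\,{}^n\boldsymbol{\gamma}_j$$

or

$${}^nv_j(p)=(1-{}^n\theta_j)\,p+{}^n\theta_j\,{}^n\boldsymbol{\gamma}_j,\qquad {}^n\theta_j=1-{}^n\alpha_j,$$

which is valid, obviously, for any m > 1. From the last form, we deduce that the new value ${}^nv_{j,k}(p)$ of the probability for the kth response class $A_k$ is a linear function of $p_k$ and does not depend on $p_l$, $l\ne k$, i.e.

$${}^nv_{j,k}(p)={}^n\alpha_j\,p_k+(1-{}^n\alpha_j)\,{}^n\gamma_{j,k},$$

where ${}^n\gamma_{j,k}$ is the kth component of ${}^n\boldsymbol{\gamma}_j$.

3.2.1.1.3. Let us examine now for the homogeneous case the repetitive application of a single event matrix operator, say $A_j$. By simple computation, we deduce

$$A_j^r=\alpha_j^r I+(1-\alpha_j^r)\Gamma_j,\qquad r\in N^*,$$

so that

$$A_j^r p=\alpha_j^r\,p+(1-\alpha_j^r)\boldsymbol{\gamma}_j,$$

which yields that $A_j^r p\to\boldsymbol{\gamma}_j$ as $r\to\infty$ provided that $|\alpha_j|<1$. Finally, we notice that two different matrices, say $A_{j_1}$ and $A_{j_2}$, commute iff at least one of the following conditions is fulfilled: (a) $\alpha_{j_1}=1$, (b) $\alpha_{j_2}=1$, (c) $\Gamma_{j_1}=\Gamma_{j_2}$; this is a simple consequence of the equality

$$A_{j_1}A_{j_2}-A_{j_2}A_{j_1}=(1-\alpha_{j_1})(1-\alpha_{j_2})(\Gamma_{j_1}-\Gamma_{j_2}).$$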
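The two facts of this Subparagraph, the closed form for repeated application of one operator and the commutator identity, can be verified numerically. In the sketch below an operator is represented by a pair (alpha, gamma) acting on probability vectors as p ↦ alpha·p + (1 - alpha)·gamma; the function names and the numerical values used in the usage note are illustrative assumptions.

```python
def apply_operator(alpha, gamma, p):
    """One application of A = alpha*I + (1 - alpha)*Gamma to a probability vector p."""
    return [alpha * pk + (1 - alpha) * gk for pk, gk in zip(p, gamma)]

def iterate(alpha, gamma, p, r):
    """A^r p computed by r successive applications."""
    for _ in range(r):
        p = apply_operator(alpha, gamma, p)
    return p

def closed_form(alpha, gamma, p, r):
    """alpha^r p + (1 - alpha^r) gamma."""
    ar = alpha ** r
    return [ar * pk + (1 - ar) * gk for pk, gk in zip(p, gamma)]

def commutator_on(p, op1, op2):
    """(A1 A2 - A2 A1) p, each operator given as a pair (alpha, gamma)."""
    (a1, g1), (a2, g2) = op1, op2
    left = apply_operator(a1, g1, apply_operator(a2, g2, p))
    right = apply_operator(a2, g2, apply_operator(a1, g1, p))
    return [x - y for x, y in zip(left, right)]
```

For example, with p = (0.2, 0.3, 0.5), alpha = 0.5 and gamma = (0.6, 0.3, 0.1), `iterate` and `closed_form` agree for every r, and the iterates approach gamma as r grows.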


3.2.1.1.4. It is now possible to introduce the following precise definition.

Definition 3.2.1. A (t+1)-operator model $((S,d),E,({}^nv)_{n\in N},({}^nK)_{n\in N})$ is said to be linear if

$$p'=(p_1,\dots,p_{m+1}),$$

$${}^nK_j(p)=\sum_{i=1}^{m+1}p_i\,{}^n\pi_{ij},$$

for $1\le j\le t+1$; here ${}^n\boldsymbol{\gamma}_j$ is a column probability vector.

For the sake of simplicity we shall omit the term linear throughout this Section. If it is assumed that some of the $E_j$'s positively reinforce $A_i$ in the weak sense that they do not decrease the probability of $A_i$, we deduce that the corresponding ${}^n\boldsymbol{\gamma}_j$'s are equal to $e_i=(\delta_{il})_{1\le l\le m+1}$. We can give, consequently,

Definition 3.2.2. A linear (t+1)-operator model is said to be with reinforcement if all the ${}^n\boldsymbol{\gamma}_j$'s, $1\le j\le t+1$, $n\in N$, are probability vectors of the form $e_i$.

Further, let us assume that the subject in a learning experiment is faced with a set of response classes $A_1,\dots,A_{m+1}$ on each trial, and that, following response $A_i$, $1\le i\le m+1$, one of r + 1 observable outcomes $O_{ik}$, $1\le k\le r+1$, occurs. The outcome probabilities are supposed to depend at most on the most recent response. Let $A_{i,n}$ and $O_{ik,n}$ denote, respectively, the occurrence of $A_i$ and $O_{ik}$ on trial n and denote by ${}^n\pi_{ik}$ the conditional probability $P(O_{ik,n}\mid A_{i,n})$. Such a model can be described by identifying, as previously, $p_n$, the conditional probability vector of $A_{1,n},\dots,A_{m+1,n}$, with the state $S_n$, by identifying the response-outcome pair that occurs on trial n with the event $E_n$ and by making the following stipulations:

$$S=\Big\{p'=(p_1,\dots,p_{m+1}):\ 0\le p_i\le1,\ 1\le i\le m+1,\ \sum_{i=1}^{m+1}p_i=1\Big\},$$

$$d(p,q)=\Big(\sum_{i=1}^{m+1}(p_i-q_i)^2\Big)^{1/2},$$

$$E=\{(i,k):\ 1\le i\le m+1,\ 1\le k\le r+1\},$$

((i,k) abbreviates $(A_i,O_{ik})$),

for all $1\le i\le m+1$, $1\le k\le r+1$, $n\in N$. This learning model is a variant of the (t+1)-operator model and we shall refer to it as the (m+1)(r+1)-operator model, that is t + 1 = (m+1) × (r+1). Clearly, it can be interpreted also as a BM-model, namely it is the classical BM-model with experimenter-subject-controlled events.

Notes and comments

The form of the linear model as presented in this Paragraph is due mainly to BUSH and MOSTELLER (1955) and NORMAN (1968a), though several aspects are new. By making use of chains of infinite order, in the sense of Complement I to Paragraph 2.3.2.1, LAMPERTI and SUPPES (1959, 1965) gave several properties of linear models, namely asymptotic theorems. The results of LAMPERTI and SUPPES (1959) have been extended by BERT (1964, 1968).

3.2.1.2. The (m+1)²-operator model with reinforcement

3.2.1.2.1. Let us consider the homogeneous (m+1)²-operator model with reinforcement. That is, the subject in a learning experiment is faced with the set of response classes $A_1,\dots,A_{m+1}$ on each trial, and, following response $A_i$, $1\le i\le m+1$, one of m + 1 observable outcomes $O_{ik}$, $1\le k\le m+1$, occurs. It is assumed that $O_{ik}$, $1\le i\le m+1$, positively reinforce $A_k$ in the weak sense. The outcome probabilities are supposed to depend at most on the most recent response. We conclude that this model can be described by making the following stipulations:

$$S=\Big\{p'=(p_1,\dots,p_{m+1}):\ 0\le p_i\le1,\ 1\le i\le m+1,\ \sum_{i=1}^{m+1}p_i=1\Big\},$$

$$d(p,q)=\Big(\sum_{i=1}^{m+1}(p_i-q_i)^2\Big)^{1/2},$$

$$E=\{(i,k):\ 1\le i,k\le m+1\},$$

for all $1\le i,k\le m+1$. In this terminology, the above relations define a (2m+1)(m+1)-parameter family of (m+1)²-operator models, one for each choice of $\theta_{ik}$, $1\le i,k\le m+1$, and $\pi_{ik}$, $1\le i\le m+1$, $1\le k\le m$. Since

and

it is clear that any (m+1)²-operator model satisfies all of the conditions of Definition 3.1.8 except perhaps (IV). For the sake of simplicity, we shall restrict ourselves throughout this Paragraph to four-operator models, i.e. m = 1. In this case it is simpler to identify the conditional probability $p_n$ of $A_{1,n}$ with the state $S_n$, so that

$$S=[0,1],\qquad d(p,q)=|p-q|,\qquad E=\{(i,k):\ 1\le i,k\le2\},$$

$$v_{ik}(p)=(1-\theta_{ik})p+\theta_{ik}\delta_{1k},$$

$$K_{ik}(p)=\big(p\,\delta_{i1}+(1-p)\,\delta_{i2}\big)\pi_{ik},$$

for all $1\le i,k\le2$. This amounts to the fact that

and


3.2.1.2.2. The asymptotic behaviour of the sequence $(p_n)_{n\in N}$¹⁾ associated with the four-operator model depends critically on the number of absorbing states. Proposition 3.2.3 catalogues the absorbing states for such a model.

Proposition 3.2.3. The state 1 is absorbing iff $\theta_{12}=0$ or $\pi_{12}=0$. The state 0 is absorbing iff $\theta_{21}=0$ or $\pi_{21}=0$. A state $p\in(0,1)$ is absorbing iff for each $(i,k)\in E$, $\theta_{ik}=0$ or $\pi_{ik}=0$; in this case all states are absorbing, and the model is said to be trivial.

Proof. A state $p\in(0,1)$ is absorbing iff for any $(i,k)\in E$, either $v_{ik}(p)=p$ (in which case $\theta_{ik}=0$ and $v_{ik}(x)\equiv x$) or $K_{ik}(p)=0$ (in which case $\pi_{ik}=0$ and $K_{ik}(x)\equiv0$). The state 1 is absorbing iff $1-\theta_{12}=v_{12}(1)=1$ or $\pi_{12}=K_{12}(1)=0$. Analogously, for the state 0. q.e.d.
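The criterion of Proposition 3.2.3 can be compared with a direct check of the definition. In the sketch below the parameters are stored in dictionaries keyed by (i, k); the names `v`, `K`, `EVENTS` and `is_absorbing`, the tolerance, and the numerical values in the usage note are assumptions made for illustration.

```python
EVENTS = [(1, 1), (1, 2), (2, 1), (2, 2)]

def v(i, k, p, theta):
    """v_ik(p) = (1 - theta_ik) p + theta_ik * delta_1k."""
    return (1 - theta[(i, k)]) * p + theta[(i, k)] * (1.0 if k == 1 else 0.0)

def K(i, k, p, pi):
    """K_ik(p) = (p delta_i1 + (1 - p) delta_i2) pi_ik."""
    return (p if i == 1 else 1 - p) * pi[(i, k)]

def is_absorbing(p, theta, pi, tol=1e-12):
    """p is absorbing when every event of positive probability at p leaves p unchanged."""
    return all(abs(v(i, k, p, theta) - p) <= tol
               for (i, k) in EVENTS if K(i, k, p, pi) > 0)
```

With, say, all $\theta_{ik}=0.2$ except $\theta_{12}=0$, and outcome probabilities $\pi_{11}=0.6$, $\pi_{12}=0.4$, $\pi_{21}=0.3$, $\pi_{22}=0.7$, the state 1 is reported as absorbing while the state 0 is not, in agreement with the proposition.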

The next proposition tells which four-operator models satisfy (IV), Definition 3.1.8.

Proposition 3.2.4. A four-operator model is distance diminishing iff for each i = 1, 2, there exists some $k_i$, $k_i=1,2$, such that $\theta_{ik_i}>0$ and $\pi_{ik_i}>0$.

Proof. Suppose that the condition given by the proposition is met. If p > 0, then $K_{1k_1}(p)=p\,\pi_{1k_1}>0$ and $\mu(v_{1k_1})=1-\theta_{1k_1}<1$. Similarly, if p < 1, then $K_{2k_2}(p)>0$ and $\mu(v_{2k_2})<1$. Thus (IV), Definition 3.1.8, is satisfied with l = 1 for all states.

Suppose that the condition fails. Then for some $i\in\{1,2\}$ and all $k\in\{1,2\}$, $\theta_{ik}=0$ or $\pi_{ik}=0$. Since the cases i = 1 and i = 2 can be treated similarly, only i = 1 will be considered. It follows from Proposition 3.2.3 on taking k = 2 that 1 is an absorbing state. Thus $K_{m_1n_1,\dots,m_ln_l}(1)>0$ implies $m_t=1$ and $\pi_{1n_t}>0$, $1\le t\le l$. But then $\theta_{1n_t}=0$ for $1\le t\le l$ and $\mu(v_{m_1n_1,\dots,m_ln_l})=1$. So (IV), Definition 3.1.8, is not satisfied. q.e.d.

3.2.1.2.3. With one inconsequential exception, distance diminishing four-operator models are either ergodic or absorbing.

Theorem 3.2.5. If neither 0 nor 1 is absorbing for a four-operator model, then $\theta_{ik}>0$ and $\pi_{ik}>0$ for $i\ne k$, and the model is distance diminishing. Either (i) $\theta_{ik}=1$ and $\pi_{ik}=1$ if $i\ne k$, or (ii) the model is ergodic.

Proof. By Proposition 3.2.3 if neither 0 nor 1 is absorbing, then $\theta_{ik}>0$ and $\pi_{ik}>0$ for $i\ne k$, and the model is distance diminishing by Proposition 3.2.4.

¹⁾ $p_0=p$ is the initial probability of $A_1$.


Suppose $\pi_{21}<1$. Then by considering first the case p = 0, then p > 0 and $\theta_{12}=1$, and finally p > 0 and $\theta_{12}<1$, it is seen that $(1-\theta_{12})^n p\in T_n(p)$ for all $n\in N^*$. Thus

as $n\to\infty$, and the model is ergodic according to Definition 3.1.11. By symmetry the same conclusion is valid if $\pi_{12}<1$. Suppose that $\theta_{12}<1$. Then $(1-\theta_{12})^n p\in T_n(p)$ for all p > 0, $n\in N^*$, and $(1-\theta_{12})^{n-1}\theta_{21}\in T_n(0)$ for all $n\in N^*$. Since both sequences tend to 0, ergodicity follows. The same conclusion follows by symmetry when $\theta_{21}<1$. Thus if (i) does not hold the model is ergodic. q.e.d.

The behaviour of the random sequence $(p_n)_{n\in N}$ when $\theta_{ik}=1$ and $\pi_{ik}=1$ for $i\ne k$ is completely transparent. Starting at p the random sequence moves on its first step to 1 with probability 1 - p and to 0 with probability p, and thereafter alternates between these two extreme states. This cyclic model is of no psychological interest and will be discussed no further.

As a consequence of Theorem 3.2.5 all of the theorems of Paragraphs 3.1.2.2 and 3.1.2.3 for ergodic models are valid for noncyclic four-operator models without absorbing states.

Theorem 3.2.6. If a distance diminishing four-operator model has an absorbing state, then it is an absorbing model.
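The cyclic behaviour described before Theorem 3.2.6 is easy to reproduce by simulating the four-operator model trial by trial. The sketch below is an illustration only; the function names `step` and `simulate`, the dictionary layout of the parameters and the seed are assumptions introduced here.

```python
import random

def step(p, theta, pi, rng):
    """One trial: response A_1 with probability p (else A_2), outcome O_ik with
    probability pi[(i, k)], then p is replaced by v_ik(p)."""
    i = 1 if rng.random() < p else 2
    k = 1 if rng.random() < pi[(i, 1)] else 2
    delta_1k = 1.0 if k == 1 else 0.0
    return (1 - theta[(i, k)]) * p + theta[(i, k)] * delta_1k

def simulate(p0, theta, pi, n, seed=0):
    """Path p_0, p_1, ..., p_n of the four-operator model."""
    rng = random.Random(seed)
    path = [p0]
    for _ in range(n):
        path.append(step(path[-1], theta, pi, rng))
    return path
```

With $\theta_{12}=\theta_{21}=1$ and $\pi_{12}=\pi_{21}=1$ the simulated path jumps to 0 or 1 on the first step and alternates between the two values from then on; for other parameter choices the same function gives sample paths of ergodic or absorbing models.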

Proof. The condition given by Proposition 3.2.4 for a four-operator model to be distance diminishing allows four possibilities. These are distinguished by the values of $k_i$, i = 1, 2:

A: $k_1=1$, $k_2=1$;   C: $k_1=1$, $k_2=2$;

B: $k_1=2$, $k_2=2$;   D: $k_1=2$, $k_2=1$.

Proposition 3.2.3 shows that D is inconsistent with the existence of absorbing states. Thus it remains to show that a model is absorbing under A, B or C if there are absorbing states. Under A, $1-(1-\theta_{21})^n(1-p)\in T_n(p)$ for all $n\in N^*$, $p\in[0,1]$, so $d(T_n(p),1)\le(1-\theta_{21})^n\to0$ as $n\to\infty$. This implies that 0 is not an absorbing state. By assumption, however, there exists at least one absorbing state, so 1 is absorbing. But then

$$\lim_{n\to\infty}d(T_n(p),1)=0,\qquad p\in[0,1],$$


implies that the model is absorbing. By symmetry the model is also absorbing under B. If 0 is not absorbing n 21 >0 and 021 >0 by Proposition 3.2.3. Thus, if C holds, A does also, and the model is absorbing. If C holds, and 1 is not absorbing, the same conclusion follows by symmetry. Condition C implies that (l-0 22 )"pETn(P) for p 0 is held fixed and 0 1 is permitted to approach 0. We shall see that in this case the asymptotic behaviour of F 91 , 82 ,K differs radically from that obtained previously. Lemma 3.2.12. If 02 >0, then

Jp

2

dF91,92,K(p) =O(0f).

R

Proof. Taking -r=0 in (3.2.9), we obtain

(3.2.19) where

V*(pn,01)=£.(pn+ 1 -PnlPn)= 01 (1-pn)K(pn)-02Pn(l -K(pn)) and

Letting

n ➔ oo

in (3.2.19), we have 1

0=2fr V*(p,01)dFt(p) 0

1

+ I M*(p,01)dFMp), 0

(3.2.20)

3. Learning

238 1

1

J 02 p 2 (2-0 2 )(1 -K(p))dF;1 (p)=2J p0i(l -p)K(p)dF;1 (p) 0

0

(3.2.21)

1

+ J0f (1-p) 2 ,c(p)dF: (p). 1

Thus

0

(3.2.22) q.e.d.

from which Lemma 3.2.12 follows easily.

Theorem 3.2.13. For a homogeneous two-operator model with reinforcement under the hypotheses of Lemma 3.2.12

lim F81 ,02 ,1e(0~ x) =

81 ➔ 0

* neN

G1e(O>((l _xO )"), 2

where Gm(Y) is the geometric distribution with saltus (l -m)mk at y=k, kEN. Proof. Paralleling the derivation of (3.2.12), we obtain

f eixtdH

81

(x)

= f eixt Y*(0 1 x,0 1 )dH81 (x),

R

(3.2.23)

R

Y*(p, 0i) = £.(exp(i01 1 (Pn+ 1 -pn)t)IPn= P)

= exp(it(l - p)),c(p) +exp( -it 0 1 1 02 p)(l - K(p))

so that

Y*(0. x,0i) = eit(l -B,x) K(01 x) + e-itBix(l - K(01 x))

=eit ,c(O)+e-it82x(l -K(0))+0(01 x).

(3.2.24)

Substituting (3.2.24) in (3.2.23) and noting that JlxldH 81 (x) R

is a bounded function of 0 1 as a consequence of Lemma 3.2.12, we find that

Jeixt dHeJx) =eit ,c(O)J eixt dH01(x) R

R

+(l-,c(O))J eix0- 92 >tdH81 (x)+O(01), R


It follows that H 81 converges as 0 1 -+-0 to the distribution function H whose characteristic function tf>u satisfies the functional equation (l-K(O))t/>H((l -02)t) 1-K(O)eit

i.e.,

0. Since H is an infinite convolution of purely discrete distributions, it is either purely discrete or singular or absolutely continuous. 3.2.1.3.4. Let us extend now Theorems 3.2.11 and 3.2.13. Denote by f a function defined on a rectangle D = [O, 1] x [O, p

for all pe[O, 1) and 0e(O,O

for all pE [O, 1),

(3.2.29)

f(p,O)=p

for all pe[O,l],

(3.2.30)

throughout in D,

(3.2.31)

ao f)

-f(p,0)>0

op

3. Learning

240 i)

for all 0e(O,c5], pe[O,1].

opf(p,0)0, 0 2 >0 and K, where

has a limiting distribution function F 61 , 62 ,",f as n-+oo (Theorem 2.1.36). The analogue of Theorem 3.2.13 is Theorem 3.2.14. If f satisfies (3.2.25) - (3.2.32), then for 0 2 > 0

,(a:,),

.1!!1. F 8 ,.,,••• J (;),

where

az

-oiM(-r,O) 0

2 (T

=

I

a2

4 -V(,,O)

1·

cpa0

and M(p,0)=u 2 (p,0)K(p)+u 2 (1-p, (0)(1-K(p)), V(p,0)

= K(p)u(p, 0)-(1-K(p))u(l - p, ( 0).

(3.2.38)

Proof In view of (3.2.36) the argument on Wat the beginning of the

c

proof of Lemma 3.2.10 can be applied directly to -

ae

existence and uniqueness of the root. of (3.2.37). I}

. d For hnear mo els, e.g.,

(;2 opo(J

f

(p, 8) = -1.

V(p,O) to yield the


Much as in the proof of Lemma 3.2.10 we obtain (3.2.10), where F6 ,, 6 ,K now is F6 ,, 6 ,K,f· Writing

where p* is between p and rand O is uniform in p, we obtain

-1

2

a2

(p-r) opo0 V(p*,O)dFo,{8,K,J(p)= 0(0).

But

a2

sup - - V(p,O)21Pn=P), i.e. by (3.2.38), Y(p,0,t) by (3.2.11), and G8 ,{ 8 ,K,f by (3.2.13), we have (3.2.12) and (3.2.14) just as in the proof of Theorem 3.2.11. Again we have 3

£.(1Pn+1-Pnl IPn=p)=0(0

3

)

uniformly in p. Substituting this and the expansions V(p,0)

= 0(p-,)

;: v(,,0)+02 0 0

o(l It should be noted that if k----+oo, we obtain as limiting probability distribution function of y the normal one; this follows as a special case of a more general result given at the end of Subpara_graph 3.2.1.3.2.


we obtain sin[2u(l

-r l/k) 2-(n- lJ/kJ

---------c------:--:-:-,------

2u(l-2-lt")2- 1, then for experimenter-controlled events one can calculate, e.g., the so-called marginal vector moments, and extend to these moments the properties given for m = 1 [BUSH and MOSTELLER ( 1955)].


Complements

The procedure described in Subparagraph 3.2.2.1.3 can be generalized to t > 1. If a 1 = · · · = at+ 1 = a, then the characteristic function f becomes

Assuming that y1 O,

Xn ➔ -

if nlog/J+(l-n)logy0, " 1 if c 0 and c < 0 follow immediately from (3.3.8), (3.3.9), and the equation (3.3.10). In the case c=O, note that EP Yi=O. It is known [CHUNG and FUCHS (1951)] that the sums X" are then recurrent; that is, they repeatedly take on values arbitrarily close to any possible value. In particular, X" takes on repeatedly arbitrarily large and arbitrarily small values (PP-a. s.), which upon recalling (3.3.9) proves the second statement. The third statement is a consequence of the central limit theorem, which implies that for any A, Pp(Xn > A) and PP(Xn< -A) both converge to one-half as n~c,J. Again the assertion of the theorem follows from this fact and (3.3.10). q.e.d.

3.3.1.3.2. Let us examine now the case when the probability of reinforcement depends only on the immediately preceding response; we have what is called simple contingent reinforcement. Further, let us set $\pi_1=\pi_{11}$ and $\pi_2=\pi_{21}$ and let $\beta$ and $\gamma$ be specified as in (3.3.6). Using (3.3.9), define the random variable $X_n$ recursively (note that $\log\gamma$ appears first, since $\log\gamma>0$ and $\log\beta<0$, in order most directly to apply Proposition 3.3.4)


$$X_{n+1} = \begin{cases} X_n + \log\gamma \\ X_n + \log\beta \end{cases}$$

with probability … 0 throughout $S' = \{(p,p') : p > 0,\ p' > 0\}$,

so to complete the verification of (IV), Definition 3.1.8, it suffices to show that if $(p,p') \notin S'$ there exist an $l \ge 2$ and $e_1, \dots, e_{l-1}$ such that $v_{e_1 \dots e_{l-1}}(p,p') \in S'$ and $K_{e_1 \dots e_{l-1}}(p,p') > 0$. If 0 … 0 and $v_{AcB}(0,p') \in S'$, while if 0 … 0 and $v_{AB}(p,0) \in S'$. Finally $v_{AcB}(0,0)$ has positive first and null second coordinate, so $v_{AcB,AB}(0,0) \in S'$ and $K_{AB}(v_{AcB}(0,0)) > 0$. Since $K_{AcB}(0,0) > 0$, the latter inequality implies $K_{AcB,AB}(0,0) > 0$.

The above argument shows that for any $(p,p') \in S$ there exists a point $s_{pp'} \in T_{l-1}(p,p') \cap S'$. Since $v_{AW}$ maps $S'$ into $S'$ it follows that $v^n_{AW}(s_{pp'}) \in T_{n+l-1}(p,p')$ for $n \in N$, where $v^n_{AW}$ is the $n$th iterate¹ of $v_{AW}$, i.e.

¹ No notational confusion is possible concerning the $n$th iterate.


$$v^0_{AW}(s) = s, \qquad v^{j+1}_{AW} = v_{AW} \circ v^j_{AW}, \qquad j \in N.$$

Since for any $(q,q') \in S$ and $n \in N$,

$$v^n_{AW}(q,q') = \left(1-(1-q)\alpha_1^n,\ 1-(1-q')\alpha_3^n\right),$$

it follows that

$$d\left(T_{n+l-1}(p,p'),\,(1,1)\right) \le d\left(v^n_{AW}(s_{pp'}),\,(1,1)\right) \le \left(\alpha_1^{2n}+\alpha_3^{2n}\right)^{1/2} \to 0$$

as $n \to \infty$. Since $(1,1)$ is obviously an absorbing state, the verification of (VI), Theorem 3.1.13, is complete. q.e.d.

3.3.2.2.2. We can prove also

Theorem 3.3.13. For any discrimination model of type I

$$\lim_{n\to\infty} P_n(A) = 1$$

and … There exist two positive constants $\alpha < 1$ and $c$ such that

$$\left\| E_{\cdot}\left(P_n^{v_1}(A)\,P_n^{v_2}(W \mid A)\right) - 1 \right\| \le c\left((v_1^2+v_2^2)^{1/2} + 1\right)\alpha^n$$

for all real $v_1, v_2 \ge 1$, $n \in N^*$. The total number $B^*$ of $B$ responses is finite $P_{p,p'}$-a.s. and $\|E_{\cdot} B^*\| < \infty$. If $\alpha_1 = \alpha_2 = 1-\theta$ and $\alpha_3 = \alpha_4 = 1-\theta'$, then

$1-p$ …

… 0 and $c_{jl} > 0$ for all $1 \le j, l \le 2$ is ergodic.


Proof. It is clear that if $S_n < M$, then the sample on trial $n$ will contain elements conditioned to $A_2$ with positive probability. Given such a sample, $A_{2,n}$ will occur and be followed by $O_{21,n}$ with positive probability, and conditioning will be effective with positive probability. Thus $S_{n+1} > S_n$ with positive probability, and it follows that the state $M$ can be reached from any state. Thus there is only one ergodic set, and it contains $M$. Furthermore, if $S_n = M$, then $A_{1,n}$ occurs with probability 1, and $O_{11,n}$ follows with positive probability. So $S_{n+1} = M$ with positive probability, and the ergodic set is regular. q.e.d.
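The two facts used in this proof (the top state can be reached from every state, and the top state can be followed by itself) can be checked mechanically on any finite transition matrix. The sketch below does this for a hypothetical four-state matrix `P`; its entries are illustrative only and are not transition probabilities derived from the fixed sample size model, and the function names are assumptions of the sketch.

```python
def reachable_from(P, start):
    """States reachable from `start` along positive-probability transitions."""
    seen, stack = {start}, [start]
    while stack:
        i = stack.pop()
        for j, p_ij in enumerate(P[i]):
            if p_ij > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen


def single_regular_ergodic_set(P, top):
    """True if `top` is reachable from every state and has a self-loop.

    Reachability of `top` from everywhere leaves room for only one
    ergodic set, which must contain `top`; a positive probability of
    staying at `top` makes that set aperiodic, hence regular.
    """
    reaches_top = all(top in reachable_from(P, i) for i in range(len(P)))
    return reaches_top and P[top][top] > 0


# Hypothetical chain on the states 0, 1, 2, 3 (so M = 3).
P = [
    [0.6, 0.4, 0.0, 0.0],
    [0.2, 0.5, 0.3, 0.0],
    [0.0, 0.2, 0.5, 0.3],
    [0.0, 0.0, 0.4, 0.6],
]
print(single_regular_ergodic_set(P, top=3))
print(sorted(reachable_from(P, 3)))  # the ergodic set containing the top state
```

The check is sufficient rather than necessary: a chain can be regular without a self-loop at the top state, but the self-loop is exactly the property the proof exhibits.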

3.3.3.2.2. It follows from Theorems 3.1.21 and 3.3.15 that the conclusions of Theorem 3.1.17 are available for any fixed sample size model with $0 < \pi_{jl}, c_{jl}$ for all $j$ and $l$. Letting $D$ be the subset

$$D = \{(i,j,l,k) : j = 1\}$$

of $E$, and $A_n = \chi_D(E_n)$, the conclusions of Theorem 3.1.17 include a law of large numbers, a central limit theorem and a law of the iterated logarithm for the number

$$\sum_{j=m}^{m+n-1} A_j$$

of $A_1$ responses in the $n$ trial block starting on trial $m$. A simple expression for $\sigma^2 = \sigma^2(\chi_D)$ can be readily calculated for the pattern model ($\nu = 1$) with equal $c_{jl}$ under noncontingent reinforcement. We have

Theorem 3.3.16. A fixed sample size model with $\nu = 1$, 0 … 0 is ergodic. For all $m \in N^*$ and $s \in S$, the random variable

$$\frac{1}{n}\sum_{j=m}^{m+n-1} A_j$$

converges as $n \to \infty$ to $\pi_{11}$ both in $P_s$-quadratic mean and $P_s$-a.s. There exist two positive constants $p < 1$ and $K$ such that

for all $s \in S$, $m, n \in N^*$, $x \in R$, where

$$\sigma^2 = \pi_{11}(1-\pi_{11})\left(1 + \frac{2(1-c)}{c}\right).$$
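The closed form for $\sigma^2$ can be cross-checked numerically. The sketch below assumes that the stationary lag-$n$ covariance of the response indicators has the geometric form $\pi_{11}(1-\pi_{11})(1-c)^n$, and that $\sigma^2$ is the lag-0 term plus twice the sum of the remaining lags; both are assumptions of this illustration, and the function names and parameter values are hypothetical.

```python
def autocovariance(pi11, c, lag):
    """Assumed stationary lag covariance: pi11*(1 - pi11)*(1 - c)**lag."""
    return pi11 * (1.0 - pi11) * (1.0 - c) ** lag


def sigma2_closed_form(pi11, c):
    """sigma^2 = pi11*(1 - pi11)*(1 + 2*(1 - c)/c)."""
    return pi11 * (1.0 - pi11) * (1.0 + 2.0 * (1.0 - c) / c)


def sigma2_from_series(pi11, c, max_lag=10_000):
    """Lag-0 covariance plus twice the (truncated) sum over positive lags."""
    tail = sum(autocovariance(pi11, c, k) for k in range(1, max_lag + 1))
    return autocovariance(pi11, c, 0) + 2.0 * tail


# Hypothetical parameter values.
print(sigma2_closed_form(0.7, 0.2), sigma2_from_series(0.7, 0.2))
```

With $c = 1$ every positive-lag term vanishes and the expression reduces to the Bernoulli variance $\pi_{11}(1-\pi_{11})$, which is a quick sanity check on the formula.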


If $l \in N^*$ is sufficiently large, then

Proof. Ergodicity follows from Theorem 3.3.15 so Theorem 3.1.17 is applicable. According to ATKINSON and ESTES [1963, pp. 172-176, especially formulas (37) and (41)] we have $E_\infty \chi_D = \pi_{11}$; $E_\infty A_1 A_{n+1} - (E_\infty \chi_D)^2 = \pi_{11}(1-\pi_{11})(1-c)^n$.

These formulas permit the computation of in (3.1.9) and of