363 72 5MB
English Pages [120] Year 2018
Stat 150 Stochastic Processes
Spring 2009
Lecture 1: To be determined Lecturer: Jim Pitman
• Stochastic Process: Random process, evolution over time/space. Especially models for sequences of random variables: – X0 , X1 , X2 , . . . – Time = {0, 1, 2, . . . } – Random vector (X0 , X1 , . . . , XN ) – (Xt , t ∈ {0, 1, . . . , N }) – (Xt , 0)
Time = [0, 1]
• In background always have – Probability space: (Ω, F, P) ∼ TRIPLE – Set Ω = “set of all possible outcomes” – Collection F of “events” = subsets of Ω S T – Assume F is closed under , , complement, and countable set operations. • If A1 , A2 , . . . is a sequence of events, we S S – consider A1 A2 A3 . . . is an event; T S – P Axiom: If the Ai ’s are disjoint, Ai Aj = ∅, i 6= j, then P( ∞ i=1 Ai ) = ∞ P(A ). i i=1 • Measurability. P{ω : X(ω) ≤ x} = F (x). F is called the cumulative distribution function of X. P(X ≤ x) = F (x). • Discrete value: List of values x0 , x1 , x2 . . . , often {0, 1, 2, 3, . . . }. P P(X = xi ) = pi . If ∞ i=0 Pi = 1, we say X has discrete distribution with probabilities (pi ) at values (xi ). P P PN E(X) = issue: i xi pi . Technical P i xi pi := limN →∞ i=0 xi pi , we need to assume the limit exists, i.e., ∞ |x |p < ∞. i i i=0 1
Lecture 1: To be determined
2
Other cases:
Density case:
F (x) = P(X ≤ x) Z x f (y)dy F (x) = −∞ Z ∞ xf (x)dx E(X) = −∞ Z ∞ g(x)f (x)dx E(X) =
Z provided
|g(x)|f (x)dx = E|g(X)| < ∞
−∞
• We start with na¨ıve ideas of E(X) defined by We often encounter situations where
P
R and .
0 ≤ Xn ↑ X 0 ≤ X1 ≤ X2 ≤ . . . ↑ X 0 ≤ E(Xn ) ↑ E(X)
Claim: X ≤ Y =⇒ E(X) ≤ E(Y ). Proof: Y = X + Y − X. E(Y ) = E(X) + E(Y − X) ≥ E(X). Monotone Convergence Theorem: If 0 ≤ Xn ≤↑ X, then 0 ≤ E(Xn ) ↑ E(X). • Conditional Probability P(A|B) =
P(AB) P(B)
provided P(B) = 0
Easily for a discrete random variable X we can define ( P = i yi dP(Y = yi |X = x) discrete R E(Y |X = x) = y ydP(Y = y|X = x) density
Suppose you know values of E(Y |X = x) for all x of X, how to find E(Y )? X E(Y ) = E(Y |X = x) ·P(X = x) | {z } x
= E(ϕ(x))
ϕ(x)
Lecture 1: To be determined
3
For a discrete X, E(Y |X) is a random variable, whereas E(Y |X = x) is a value. So E(Y ) = E(E(Y |X))
• Exercises with random sums X1 , X2 . . . is a sequence of independent and identically distributed random variables. Sn := X1 + · · · + Xn , N is a random variable. For simplicity, N ⊥ X1 , X2 , . . . Want a formula for E(XN ) = E(X1 +· · ·+XN ) = E(E(SN |N )). E(SN |N = n) = E(X1 + · · · Xn ) = nE(X1 ) =⇒ E(E(SN |N )) = E(N E(X1 )) = E(N )E(X1 ) Stopping time: Rule for deciding when to quit.
Stat 150 Stochastic Processes
Spring 2009
Lecture 2: Conditional Expectation Lecturer: Jim Pitman
• Some useful facts (assume all random variables here have finite mean square): • E(Y g(X)|X) = g(X)E(Y |X) • Y − E(Y |X) is orthogonal to E(Y |X), and orthogonal also to g(X) for every measurable function g. • Since E(Y |X) is a measurable function of X, this characterizes E(Y |X) as the orthogonal projection of Y onto the linear space of all square-integrable random variables of the form g(X) for some measurable function g. • Put another way g(X) = E(Y |X) minimizes the mean square prediction error E[(Y − g(X))2 ] over all measurable functions g.
! ! !
! ! !
! ! !
! ! !
! !
• Y " Residual • E(Y |X) ! ! !
! ! !
! ! !
! ! !
! !
g(X)
These facts can all be checked by computations as follows: Check orthogonality: E[(Y − E(Y |X))g(X)] = E(g(X)Y − g(X)E(Y |X)) = E(g(X)Y ) − E(g(X)E(Y |X)) = E(E(g(X)Y |X)) − E(g(X)E(Y |X)) = E(g(X)E(Y |X)) − E(g(X)E(Y |X)) =0 • Recall: Var(Y ) = E(Y − E(Y ))2 and Var(Y |X) = E([Y − E(Y |X)]2 |X). Claim: Var(Y ) = Var(E(Y |X)) + E(Var(Y |X))
1
Lecture 2: Conditional Expectation
2
Proof: Y = E(Y |X) + Y − E(Y |X) E(Y 2 ) = E(E(Y |X)2 ) + E([Y − E(Y |X)]2 ) + 0 = E(E(Y |X)2 ) + E(Var(Y |X)) E(Y 2 ) − (E(Y ))2 = E(E(Y |X)2 ) − (E(Y ))2 + E(Var(Y |X)) = E(E(Y |X)2 ) − (E(E(Y |X)))2 + E(Var(Y |X)) =⇒Var(Y ) = Var(E(Y |X)) + E(Var(Y |X)) • Exercise P. 84, 4.3 T is uniform on [0,1]. Given T , U is uniform on [0, T ]. What is P(U ≥ 1/2)? P(U ≥ 1/2) = E(E[1(U ≥ 1/2)|T ]) = E[P(U ≥ 1/2|T )] T − 1/2 1(T ≥ 1/2) =E T Z 1 t − 1/2 = dt t 1/2 • Random Sums: Random time T . Sn = X1 + · · · + Xn . Wants a formula for E(ST ) which allows that T might not be independent of X1 , X2 , . . . . Condition: For all n = 1, 2, . . . the event (T = n) is determined by X1 , X2 , . . . , Xn . Equivalently: (T ≤ n) is determined by X1 , X2 , . . . , Xn . Equivalently: (T > n) is determined by X1 , X2 , . . . , Xn . Equivalently: (T ≥ n) is determined by X1 , X2 , . . . , Xn−1 . Call such a T a stopping time relative to the sequence X1 , X2 , . . . Example: The first n (if any) such that Sn ≤ 0 or Sn ≥ b. Then (T = n) = (S1 ∈ (0, b), S2 ∈ (0, b), . . . , Sn−1 ∈ (0, b), Sn 6∈ (0, b)) is a function of S1 , . . . , Sn . • Wald’s identity: If T is a stopping time relative to X1 , X2 , . . . , which are i.i.d. and Sn := X1 + · · · + Xn , then E(ST ) = E(T )E(X1 ), provided E(T ) < ∞.
Lecture 2: Conditional Expectation
3
Sketch of proof: ST = X1 + · · · + XT = X1 1(T ≥ 1) + X2 1(T ≥ 2) + · · · E(ST ) = E(X1 1(T ≥ 1) + E(X2 1(T ≥ 2)) + · · · = E(X1 ) + E(X2 )P(T ≥ 2) + E(X3 )P(T ≥ 3) + · · · ∞ X = E(X1 ) P(T ≥ n) n=1
= E(X1 )E(T ) Key point is that for each n the event (T ≥ n) is determined by X1 , X2 , . . . , Xn−1 , hence is independent of Xn . This justifies the factorization E(Xn 1(T ≥ n) = E(Xn )E(1(T ≥ n)) = E(X1 )P(T ≥ n). It is also necessary to justify the swap of E and Σ. This is where E(T ) < ∞ must be used in a more careful argument. Note that if Xi ≥ 0 the swap is justified by monotone convergence. • Example. Hitting probabilities for simple symmetric random walk +C
0
−B
Sn = X1 + · · · Xn , Xi ∼ ±1 with probability 1/2,1/2. T = first n s.t. Sn = +C or − B. It is easy to see that E(T ) < ∞. Just consider successive blocks if indices of B+C B+C B+C ... length B + C. Wait until a block of length B + C with Xi = 1 for all i in the block. Geometric distribution of this upper bound on T =⇒ E(T ) < ∞ .
Lecture 2: Conditional Expectation
Let p+ = P(ST = +C) and p− := P(ST = −C). Then E(ST ) = E(T )E(X1 ) = E(T ) · 0 = 0 0 = p+ C − p− B 1 = p+ + p− B C =⇒ p+ = p− = B+C B+C
4
Stat 150 Stochastic Processes
Spring 2009
Lecture 3: Martingales and hitting probabilities for random walk Lecturer: Jim Pitman
• A sequence of random variables Sn , n ∈ {0, 1, 2, . . . }, is a martingale if 1) E|Sn | < ∞ for each n = 0, 1, 2, . . . 2) E(Sn+1 |(S0 , S1 , . . . , Sn )) = Sn E.g., Sn = S0 + X1 + · · · + Xn , where Xi are differences, Xi = Si − Si−1 . Observe that (Sn ) is a martingale if and only E(Xn+1 |S0 , S1 , . . . , Sn ) = 0. Then say (Xi ) is a martingale difference sequence. Easy example: let Xi be independent random variables (of each other, and of S0 ), where E(Xi ) ≡ 0 =⇒ Xi are martingale differences =⇒ Sn is a martingale. • Some general observations: If Sn is a martingale, then E(Sn ) ≡ E(S0 ) (Constant in n). Proof: By induction, suppose true for n, then E(Sn+1 ) = E(E(Sn+1 |(S0 , S1 , . . . , Sn ))) = E(Sn ) = E(S0 ) • Illustration: Fair coin tossing walk with absorption at barriers 0 and b. Process: Start at a. Run until the sum of ±1’s hits 0 or b, then freeze the value. – By construction, 0 ≤ Sn ≤ b =⇒ E|Sn | ≤ b < ∞. – Given S0 , S1 , . . . , Sn , with 0 < Si < b for 0 ≤ i ≤ n, we assume that ( Sn + 1 with probability 1/2 Sn+1 = Sn − 1 with probability 1/2 So E(Sn+1 |S0 , S1 , . . . , Sn with 0 < Si < b, 0 ≤ i ≤ n) 1 1 = (Sn + 1) + (Sn − 1) = Sn 2 2 whereas given S0 , S1 , . . . , Sn , with either Sn = 0 or Sn = b, then Sn+1 = Sn . So also E(Sn+1 |S0 , S1 , . . . , Sn with Si = 0 or b) = Sn 1
Lecture 3: Martingales and hitting probabilities for random walk
2
Since the only possible values of Sn are in the range from 0 to b, no matter what the value of Sn , E(Sn+1 |S0 , S1 , . . . , Sn ) = Sn So the fair random walk with absorbtion at barriers 0 and b is a martingale. • Revisit the Gambler’s ruin problem for a fair coin. Observe (Sn = b) =⇒ (Sn+1 = b) P(Sn = b) ≤ P(Sn+1 = b) P(Sn = b) is an increasing sequence which is bounded above by 1. Proof: We know E(Sn ) ≡ a. But a = E(Sn ) = 0 · E(Sn = 0) + b · P(Sn = b) +
b−1 X
i · P(Sn = i)
i=1
Pb−1 P(Sn = i) = P(T > n) → 0 as n → ∞, where T is the time Observe that i=1 to hit the barriers. Then a n a P(Sn = b) = − → as n → ∞ b b b • Gambler’s ruin for an unfair coin p ↑ q ↓, p + q = 1. Idea: Choose an increasing function g so that we make the game fair; that is, we will make an Mn = g(Sn ) so that (Mn ) is a MG. Look at rSn instead of Sn for some ratio r to be determined. Suppose you are given rS0 , . . . rSn , with 0 < Sn < b, then ( with prob p r Sn × r rSn+1 = 1 Sn with prob q r ×r Compute 1 E(rSn+1 |rS1 , . . . rSn ) = rSn (rp + q) = rSn r 1 q ⇐⇒ rp + q = 1 ⇐⇒ r = or r = 1. r p
So r = q/p implies rSn is a martingale. Again, the walk is stopped if it reaches the barriers at 0 or b, and given such values the martingale property for the next step is obvious. The martingale property gives: ErSn ≡ ra if S0 = a. b−1
X q q q q ( )a = ( )0 P(Sn = 0) + ( )b P(Sn = b) + ( )i P(Sn = i) p p p p i=1
Lecture 3: Martingales and hitting probabilities for random walk
Let Pup = limn→∞ P(Sn = b), 1 − Pup = limn→∞ P(Sn = 0), q q q ( )a = ( )0 (1 − Pup ) + ( )b Pup p p p (q/p)a − 1 =⇒ Pup = (q/p)b − 1
Case p > q, fix a and let b → ∞, q lim Pup = 1 − ( )a b→∞ p q a lim Pdown = ( ) b→∞ p
3
Stat 150 Stochastic Processes
Spring 2009
Lecture 4: Conditional Independence and Markov Chain Lecturer: Jim Pitman
1
Conditional Independence • Q: If X and Y were conditionally independent given λ, are X and Y independent? (Typically no.) Y • Write X ⊥ Z to indicate X and Z are conditionally independent given Y . Y Assume for simplicity all variables are discrete, then X ⊥ Z means P(X = x, Z = z|Y = y) = P(X = x|Y = y)P(Z = z|Y = y) for all possible values of x, y, and z. Y • Suppose X ⊥ Z, claim: P(Z = z|X = x, Y = y) = P (Z = z|Y = y). (Key point: The value of X is irrelevant.) Proof: From P(X = x, Z = z|Y = y) = P(X = x|Y = y)P(Z = z|Y = y), we get P(X = x, Z = z|Y = y) P(X = x|Y = y) P(X = x, Y = y, Z = z)/P(Y = y) = P(X = x, Y = y)/P (Y = y)
P(Z = z|Y = y) =
Also P(Z = z|X = x, Y = y) =
P(X = x, Y = y, Z = z) P(X = x, Y = y)
Y Y • Z ⊥ X is the same as X ⊥ Z. P(X = x|Z = z, Y = y) = P (X = x|Y = y), ∀x, y, z is a further extension. Y • Suppose X ⊥ Z, then E[g(Z)|X, Y ] = E[g(Z)|Y ]. Proof is by application of P(Z = z|X = x, Y = y) = P (Z = z|Y = y) 1
Lecture 4: Conditional Independence and Markov Chain
2
2
Markov Chain • Definition: A sequence of r.v.’s (X0 , X1 , . . . , Xn ) is called a Markov chain if Xm (present) for every 1 ≤ m < n, (X0 , X1 , . . . , Xm−1 )(past) ⊥ Xm+1 (future). In other words, at each stage, to describe the conditional distribution of Xm+1 given (X0 , X1 , . . . , Xm ), you just need to know Xm . Often it is assumed further that the conditional distribution of Xm+1 given Xm = x depends only on x, not on m. These are homogeneous transition probabilities. • MC’s with homogenous transition probabilities are specified by a matrix P . Various notations for P : P = Pij = P (i, j). Assuming discrete values for X0 , X1 , . . . , then P(Xm+1 |Xm = i, X0 = i0 , X1 = i1 , . . . , Xm−1 = im−1 ) = Pij = P (i, j) Write the unconditional probability: P(X0 = i0 , X1 = i1 , . . . , Xn = in ) = P(X0 = i0 )P (i0 , i1 ) · · · P (in−1 , in ) • Consider the matrix P (i, j). Fix i and sum on j, and get
P
j
P (i, j) = 1.
• Trivial case: IID ↔ P with all rows identical. • Notice that we need a state space of values S — one row and one column for each stat i ∈ S. • The textbook uses S = {0, 1, 2, . . . }. However, sometimes it is unnatural to have a state space of nonnegative integers. e.g. shuffling a deck of cards. State space = 52! orderings of the deck. • Transition diagram v.s. transition matrix • Examples of Markov chains: random walks of graphs, webpage rankings. • n-step transition probability of a MC Problem: Know P (i, j) = P(Xm+1 = j|Xm = i), ∀i, j, want to compute P(Xn = j|X0 = i) = P(Xm+n = j|Xm = i).
Lecture 4: Conditional Independence and Markov Chain
3
Method: For n = 2, P(X0 = i, X2 = j) P(Xo = i) P P(X0 = i, X1 = k, X2 = j) = k P(X0 = i) P P(X0 = i)P (i, k)P (k, j) = k P (X0 = i) X = P (i, k)P (k, j)
P(X2 = j|X0 = i) =
k
Notice that P(X2 = j|X0 = i) = P 2 (i, j). Claim: P(Xn = j|X0 = i) = P n (i, j). Proof: The case for n = 2 has been proved. Suppose true for n, then P(X0 = i, Xn+1 = j) P(Xo = i) P P(X0 = i, X1 = k, Xn+1 = j) = k P(X0 = i) P P(X0 = i)P (i, k)P n (k, j) = k P (X0 = i) X = P (i, k)P n (k, j) = P n+1 (i, j)
P(Xn+1 = j|X0 = i) =
k
• Action of matrix P on row vectors Suppose X0 has distribution λ on state space S. P(X0 = i) = λi , i ∈ S, P = P (i, j), i, j ∈ S, then X X P(X1 = j) = P(X0 = i, X1 = j) = λi P (i, j) = (λP )j i∈S
i∈S
Repeating this process, we get distribution of Xn is λP n . • Action of matrix P on column vectors Given a function f on the state space S, then X X E(f (X1 )|X0 = i) = f (j)P(X1 = j|X0 = i) = P (i, j)f (j) = (P f )i j
j
Stat 150 Stochastic Processes
Spring 2009
Lecture 5: Markov Chains and First Step Analysis Lecturer: Jim Pitman
1
Further Analysis of Markov Chains • Q: Suppose λ = row vector, f = column vector, and P is a probability transition matrix, then what is the meaning of λP n f ? Here λP n f is the common value of (λP n )f = λ(P n f ) for usual operations involving vectors and matrices. The above equality is elementary for finite state space. For countable state space it should be assumed e.g. that both λ and f areP non-negative. Assume that λ is a probability distribution, i.e. λi >= 0 and i λi = 1. Suppose then that X0 , X1 , . . . , Xn is a Markov chain and the distribution of X0 is λ: P(X0 = i) = λi . Then (λP n )j = P(Xn = j). So X λP n f = (λP n )j f (j) j
=
X
P(Xn = j)f (j)
j
= Ef (Xn ) So the answer is: λP n f = Ef (Xn ) for a Markov chain with initial distribution λ and transition probability matrix P . • Notice that the same conclusion is obtained by first recognizing that P n f (i) = E[f (Xn )|X0 = i]: λP n f = λ(P n f ) X = λi (P n f )i i
=
X
λi E[f (Xn )|X0 = i]
i
= E[E[f (Xn )|X0 ]] = Ef (Xn )
1
Lecture 5: Markov Chains and First Step Analysis
2
• Q: What sort of functions of a Markov chain give rise to Martingales? Say we have a MC X0 , X1 , . . . with transition probability matrix P . Seek a function h such that h(X0 ), h(X1 ), . . . is a Martingale. Key computation: E[h(Xn+1 )|X0 , X1 , . . . , Xn ] =E[h(Xn+1 )|Xn ] =(P h)(Xn ) where P h(i) =
X
P (i, j)h(j)
j
= E[h(X1 )|X0 = i] = E[h(Xn+1 )|Xn = i] Notice that if P h = h, then E[h(Xn+1 )|h(X0 ), . . . , h(Xn )] = E[E[h(Xn+1 )|X0 , . . . , Xn ]|h(X0 ), . . . , h(Xn )] = E[h(Xn )|h(X0 ), . . . , h(Xn )] = h(Xn ) Definition: A function h such that h = P h is called a harmonic function associated with the Markov matrix P , or a P -harmonic function for short. • Conclusion: If h is a P -harmonic function and (Xn ) is a Markov chain with transition matrix P , then (h(Xn )) is a martingale. You can also check the converse: if (h(Xn )) is a martingale, no matter what the initial distribution of X0 , then h is P -harmonic. • Note: A constant function h(i) ≡ c is harmonic for every transition matrix P , because every row of P sums to 1. • Example: A simple random walk with ±1, (1/2, 1/2) on {0, . . . , b}, and 0 and b absorbing: P (0, 0) = 1, P (b, b) = 1, P (i, i ± 1) = 1/2, 0 < i < b. Now describe the harmonic P function solutions h = (h0 , h1 , . . . , hb ) of h = P h. For 0 < i < b, h(i) = j P (i, j)h(j) = 0.5h(i − 1) + 0.5h(i + 1), h(0) = h(0), h(b) = h(b). Observe that h = P h ⇒ i → h(i) is a straight line. So if h(0) and h(b) are specified, get h(i) = h(0) + i
h(b) − h(0) . So there is just b
Lecture 5: Markov Chains and First Step Analysis
3
a two-dimensional subspace of harmonic functions. And a harmonic function is determined by its boundary values. Exercise: Extend the discussion to a p ↑, q ↓ walk with absorbtion at 0 and b for p 6= q. (Hint: Consider (q/p)i as in discussion of the Gambler’s ruin problem for p 6= q)
2
First Step Analysis
First step analysis is the general approach to computing probabilities or expectations of a functional of a Markov chain X0 , X1 , . . . by conditioning the first step X1 . The key fact underlying first step analysis is that if X0 , X1 , . . . is a MC with some initial state X0 = i and transition matrix P , then conditionally on X1 = j, the sequence X1 , X2 , X3 , . . . (with indices shifted by 1) is a Markov chain with X1 = j and transition matrix P . You can prove this by checking from first principles e.g. P(X2 = j2 , X3 = j3 , X4 = j4 |X1 = j) = P (j, j2 )P (j2 , j3 )P (j3 , j4 ) • Back to gambler’s ruin problem: Consider random walk X0 , X1 , . . . with ±1, (1/2, 1/2) on {0, . . . , b}. We want to calculate P(walk is absorbed at b | starts at X0 = a). • New approach: let hb (i) := P(reach b eventually | X0 = i). Now hb (0) = 0, hb (b) = 1, and for 0 < i < b, hb (i) =P (i, i + 1)P(reach b eventually | X1 = i + 1) + P (i, i − 1)P(reach b eventually | X1 = i − 1) =0.5 hb (i + 1) + 0.5 hb (i − 1) Observe that h = hb solves h = P h. So from the results of harmonic functions, hb (i) = i/b. • Example: Random walk on a grid inside square boundary. For the symmetric nearest neighbor random walk, with all four steps to nearest neighbors equally likely, and absorbtion on the boundary, a function is harmonic if its value at each state (i, j) in the interior of the grid is the average of its values at the four neighbors (i ± 1, j ± 1). This is a discrete analog of the notion of a harmonic function in classical analysis, which is a function in a two dimensional domain whose value at (x, y) is the average of its values around every circle centered at (x, y) which is inside the domain.
Lecture 6 : Markov Chains STAT 150 Spring 2006 Lecturer: Jim Pitman
Scribe: Alex Michalka
Markov Chains • Discrete time • Discrete (finite or countable) state space S • Process {Xn } • Homogenous transition probabilities • matrix P = {P (i, j); i, j ∈ S} P (i, j), the (i, j)th entry of the matrix P , represents the probability of moving to state j given that the chain is currently in state i. Markov Property: P(Xn+1 = in+1 |Xn = in , . . . , X0 = i0 ) = P (in , in+1 ) This means that the states of Xn−1 . . . X0 don’t matter. The transition probabilities only depend on the current state of the process. So, P(Xn+1 = in+1 |Xn = in , . . . , X0 = i0 ) = P(Xn+1 = in+1 |Xn = in ) = P (in , in+1 ) To calculate the probability of a path, multiply the desired transition probabilities: P(X0 = i0 , X1 = i1 , X2 = i2 , . . . , Xn = in ) = P (i0 , i1 ) · P (i1 , i2 ) · . . . · P (in−1 , in ) P Example: iid Sequence {Xn }, P(Xn = j) = p(j), and p(j) = 1. j
P (i, j) = j.
Example: Random Walk. S = Z (integers), Xn = i0 + D1 + D2 + . . . + Dn , where Di are iid, and P(Di = j) = p(j). P (i, j) = p(j − i). Example: Same random walk, but stops at p(j − i) 1 P (i, j) = 0 6-1
0. if i 6= 0; if i = j = 0; if i = 0, j 6= 0.
6-2
Lecture 6: Markov Chains
Example: Y0 , Y1 , Y2 . . . iid. P(Y = j) = p(j). Xn = max(Y0 , . . . , Yn ). if i ≤ j; p(j) P p(k) if i = j; P (i, j) = k≤i 0 if i > j. Matrix Properties: P (i, ·) = [P (i, j); j ∈ S], the ith row of the matrix P , is a row vector. Each row of P is a probability distribution on the state-space S, representing the probabilities of P transitions out of state i. Each row should sum to 1. ( P (i, j) = 1 ∀i ) j∈S
The column vectors [P (·, j); j ∈ S] represent the probability of moving into state j. Use the notation Pi = probability for a Markov Chain that started in state i (X0 = i). In our transition matrix notation, Pi (X1 = j) = P (i, j). n-step transition probabilities: Want to find Pi (Xn = j) for n = 1, 2, ... This is the probability that the Markov Chain is in state j after n steps, given that it started in state i. First, let n = 2. X Pi (X2 = k) = Pi (Xi = j, X2 = k) j∈S
=
X
P (i, j)P (j, k).
j∈S
This is simply matrix multiplication. So, Pi (X2 = k) = P 2 (i, k) We can generalize this fact for n-step probabilities, to get: Pi (Xn = k) = P n (i, k) Where P n (i, k) is the (i, k)th entry of P n , the transition matrix multiplied by itself n times. This is a handy formula, but as n gets large, P n gets increasingly difficult and time-consuming to compute. This motivates theory for large values of n. Suppose we have X0 ∼ µ, where µ is a probability distribution on S, so P(X0 = i) =
6-3
Lecture 6: Markov Chains
µ(i) for i ∈ S. We can find nth step probabilities by conditioning on X0 . X P(Xn = j) = P(X0 = i) · P(Xn = j|X0 = i) i∈S
=
X
µ(i)P n (i, j)
i∈S
= (µP n )j = the j th entry of µP n . Here, we are regarding the probability distribution µ on S as a vector indexed by i ∈ S. So, we have X1 ∼ µP, X2 ∼ µP 2 , . . . , Xn ∼ µP n . Note: To compute the distribution of Xn for a particular µ, it is not necessary to find P n (i, j) for all i, j, n. In fact, there are very few examples where P n (i, j) can be computed explicitly. Often, P has certain special initial distributions µ so that computing µP n is fairly simple. Example: Death and Immigration Process Xn = the number of individuals in the population at time (or generation) n. S = {0, 1, 2, . . .} Idea: between times n and n + 1, each of the Xn individuals dies with probability p, and the survivors contribute to generation Xn+1 . Also, immigrants arrive each generation, following a Poisson(λ) distribution. Theory: What is the transition matrix, P , for this process? To find it, we condition on the number of survivors to obtain: P (i, j) =
i∧j P
k=0
i j
(1 − p)k (p)i−k ·
e−λ λj−k (j−k)!
Here, i ∧ j = min{i, j}. This formula cannot be simplified in any useful way, but we can analyze the behavior for large n using knowledge of the Poisson/Binomial relationship. We start by considering a special initial distribution for X0 . Let X0 ∼ Poisson(λ0 ). Then the survivors of X0 have a Poisson(λ0 q) distribution, where q = 1−p. Since the number of immigrants each generation follows a Poisson(λ) distribution, independent of the number of people currently in the population, we have X1 ∼ Poisson(λ0 q + λ). We can repeat this logic to find: X2 ∼ Poisson((λ0 q + λ)q + λ) = Poisson(λ0 q 2 + λq + λ), and
6-4
Lecture 6: Markov Chains
X3 ∼ Poisson((λ0 q 2 + λq + λ)q + λ) = Poisson(λ0 q 3 + λq 2 + λq + λ). In general, Xn ∼ Poisson(λ0 q n + λ
n−1 P
q k ).
k=0
In this formula, λ0 q n represents the survivors from the initial population, λq n−1 represents the survivors from the first immigration, and so on, until λq represents the survivors from the previous immigration, and λ represents the immigrants in the current generation. Now we’ll look at what happens as n gets large. As we let n → ∞:
λ0 q n → 0,
and
λ0 q n + λ
n−1 P
qk →
k=0
λ 1−q
= λp .
So, no matter what λ0 , if X0 has Poisson distribution with mean λ0 , limn→∞ P(Xn = k) =
e−ν ν k , k!
where ν =
λ 1−q
= λp .
It is easy enough to show that this is true no matter what the distribution of X0 . The particular choice of initial distribution for X0 that is Poisson with mean λ0 = ν gives an invariant (also called stationary, or equilibrium, or steady-state) distribution of X0 . With λ0 = ν, we find that each Xn will follow a Poisson(ν) distribution. This −ν i initial distribution µ(i) = e i! ν is special because it has the property that P
µ(i)P (i, j) = µ(j) for all j ∈ S.
i∈S
Or, in matrix form, µP = µ. It can be shown that this µ is the unique stationary probability distribution for this Markov Chain.
Stat 150 Stochastic Processes
Spring 2009
Lecture 6: Markov Chains and First Step Analysis II Lecturer: Jim Pitman
1
Further Analysis of Markov Chain • In class last time: We found that if h = P h, then (1) E[h(Xn+1 ) | X0 , X1 , . . . , Xn ] = h(Xn ) ⇒ (2) E[h(Xn+1 ) | h(X0 ), h(X1 ), . . . , h(Xn )] = h(Xn ) ⇒ h(Xn ) is a MC. Why? This illustrates a conditional form of E[Y ] = E[E[Y |X]], which can be written more generically as (β)
E[Y |Z] = E[E[Y |X, Z]|Z]
To obtain (2), take Y = h(Xn+1 ), Z = (h(X0 ), h(X1 ), . . . , h(Xn )), X = (X0 , X1 , . . . , Xn ). Notice that (X, Z) = (X0 , X1 , . . . , Xn , h(X0 ), h(X1 ), . . . , h(Xn )) is the same information as (X0 , X1 , . . . , Xn ) since Z is a function of X. So E[h(Xn+1 ) | h(X0 ), h(X1 ), . . . , h(Xn )] =E[E[h(Xn+1 ) | X0 , X1 , . . . , Xn , h(X0 ), h(X1 ), . . . , h(Xn )] | h(X0 ), h(X1 ), . . . , h(Xn )] =E[E[h(Xn+1 )|X0 , X1 , . . . , Xn ] | h(X0 ), h(X1 ), . . . , h(Xn )] =E[h(Xn ) | h(X0 ), h(X1 ), . . . , h(Xn )] =h(Xn )
1
Lecture 6: Markov Chains and First Step Analysis II
2
• Let us derive (β) from the definition of E[Y |Z]: E[Y 1(Z = z)] P(Z = z) P E[ x Y 1(X = x, Z = z)] = P(Z = z) P E[Y 1(X = x, Z = z)] = x P(Z = z) P E[Y |X = x, Z = z]P(X = x|Z = z)P(Z = z) = x P(Z = z) X = E[Y |X = x, Z = z]P(X = x|Z = z)
E[Y |Z = z] =
x
= E[E[Y |X, Z = z]|Z = z] Therefore, E[Y |Z] = E[E[Y |X, Z]|Z]. P Note that the above argument uses 1 = P x Y 1(X = x). The switch of sum and E x 1(X = x) which implies Y = can be justified provided either Y ≥ 0 or E(|Y |) < ∞).
2
First Step Analysis — An Example • Consider a symmetric nearest neighbour random walk in a 2-D grid: 7
8
0.25 6
D
C
4 0.25 0.25
5
A
B
3
0.25
1
2
Suppose squares 1 to 8 form an absorbing boundary. Start the walk in Square A, what is the distribution of the exit point on the boundary? Let fj denote the probability of hitting Square j. Obvious symmetries: f1 = f5 , f2 = f6 , f3 = f7 , f4 = f8 .
Lecture 6: Markov Chains and First Step Analysis II
3
Less obvious: f2 = f3 . • Idea: Let TB denote the time of first hitting state B (if ever), T = time of first hitting some state i ∈ {1, 2, . . . , 8}. Observe that if XT ∈ {2, 3}, then we must get there via B, which implies TB < ∞. That f2 = f3 now follows from: • General fact (Strong Markov Property): For a MC X0 , X1 , . . . , with transition matrix P , a state B, if TB = first hitting time of B: ( first n : Xn = B TB = ∞
if any if none
Then (XTB , XTB +1 , . . . ) given TB < ∞ is a MC with transition matrix P and initial state B. • Let fij = Pi (XT = j) denote the probability, starting at state i, of exiting in square j. So fj = pAj . From the above symmetries, and the obvious fact that PA (T < ∞) = 1, we see 1 f1 + 2f2 + f4 = . 2 By conditioning on X1 , called first step analysis, 1 1 1 fA1 = 0 + 1 + fB1 + 4 4 4 1 1 1 f1 = + f2 + f2 4 4 4 1 1 1 fA2 = 0 + 0 + fB2 + 4 4 4 1 1 f2 = f1 + f4 4 4
1 fD1 4 (By symmetry) 1 fD2 4
Solving 1 2 1 1 1 f1 = + f2 + f2 4 4 4 1 1 f2 = f1 + f4 4 4
f1 + 2f2 + f4 =
yields f1 = 7/24, f2 = 1/12, f4 = 1/24. And we can obtain the distribution of other exit points by symmetry.
Stat 150 Stochastic Processes
Spring 2009
Lecture 7: Limits of Random Variables Lecturer: Jim Pitman
• Simplest case: pointwise limits. • Recall that formally a random variable X is a function of outcomes ω ∈ Ω: X:Ω→R ω → X(ω) A sequence of random variables (functions) Xn (ω) converges pointwise to a limit X(ω) means limn→∞ Xn (ω) = X(ω), ∀ω ∈ Ω. • Important example: Xn (ω) is ↑ as n ↑, that is, Xn (ω) ≤ Xn+1 (ω) ≤ . . . Pn For example: Xn (ω) = k=1 Yk (ω) for Yk (ω) ≥ 0. (e.g. if Yk = IAk is the indicator of some event Ak ; then Xn is the number of events that occur among A1 , . . . , An .) Since Xn (ω) is a sequence of nondecreasing random variables, by properties of realSnumbers, for every ω there is a pointwise limit X(ω) P∞ = limn→∞ Xn (ω) ∈ R {+∞}. In the example with indicators, X(ω) = k=1 IAk (ω). This is the total number of events which occur in the infinite sequence of events A1 , A2 , . . .. This number X(ω) might be finite or infinite, depending on ω. • If 0 ≤ Xn ↑ X as above, then the Monotone Convergence Theorem gives Pn 0 ≤ E[Xn ] ↑ E[X]. Applying to Xn = k=1 Yk for Yk ≥ 0 this justifies exchange of infinite sums and E for non-negative sequences of random variables: ! ∞ ∞ X X E Yk = E(Yk ). k=1
k=1
1
Lecture 7: Limits of Random Variables
2
X(w)
X_2(w)
X_1(w)
w
1
Law of Large Numbers • Suppose X, X1 , X2 , . . . are i.i.d. with E|X| < ∞, EX 2 < ∞, Sn := X1 + . . . Xn , Sn /n = the average of X1 , . . . , Xn , then E[Sn /n] = E[X], V ar(Sn /n) = V ar(X)/n → 0 as n → ∞. • Convergence in the Mean Square: E (Sn /n − E[X])2 = V ar(X)/n → 0 as n → ∞ This is a more subtle sense of convergence of Sn /n to limit E[X]. • Almost Sure Convergence: Suppose X, X1 , X2 , . . . are i.i.d., if E|X| < ∞, then P(ω : lim Sn (ω)/n = E(X)) = 1 n→∞
Note that there may be an exceptional set of points that do not satisfy lim Sn (ω)/n = n→∞
E(X), but that set has probability 0 under the current probability measure. • The Strong Law of Large Numbers (Kolmogorov): a.s.
Sn /n −→ E[X] as n → ∞ • The Weak Law of Large Numbers: SLLN ⇒ ∀ > 0, lim P(|Sn /n − E[X]| > ) = 0 n→∞
Lecture 7: Limits of Random Variables
3
A quick proof: By Chevbyshev’s inequality, P(|Sn /n − E[X]| > ) ≤
V ar(X) → 0. n2
• Two results about convergence of MGs (MG convergence theorem): (1) If (Xn , n ≥ 0) is a nonnegative MG, then Xn converges a.s. to a limit X. (2) If (Xn , n ≥ 0) is a MG which has sup E[Xn2 ] < ∞, then Xn converges both n
a.s. and in the mean square to some X. Hint of proof of (1): From the text, if X0 ≤ a, then P(supn Xn ≥ b) ≤ a/b.
? b
a
0
P(# of upcrossings of [a,b] ≥ 1) ≤ a/b P(# of upcrossings of [a,b] ≥ 2) ≤ (a/b)2 P(# of upcrossings of [a,b] ≥ k) ≤ (a/b)k Let Ua,b := total # of upcorssings of [a,b] by path of X0 , X1 , . . . , P(Ua,b ≥ k) ≤ (a/b)k
=⇒
P(Ua,b < ∞) = 1
So with probability one, for every 0 < a < b < ∞, the sequence Xn must eventually stop crossing the interval [a, b]. This leads to almost sure convergence. (Reference: see any graduate level textbook in probability, e.g. Durrett). • Example: For fair coin tossing, given X0 = b, Ta = first n : Xn = a, and 0 < a < b, prove that Pb (Ta < ∞) = 1.
Lecture 7: Limits of Random Variables
4
Proof: Pb (Ta < Tc ) =
c−b → 1 as c → ∞ c−a
(Recall gambler’s ruin problem)
but 1 ≥ P(Ta ≤ ∞) ≥ Pb (Ta < Tc ) ↑ 1, therefore, Pb (Ta < ∞) = 1.
c
b a 0
2
Mean first passage times by first step analysis • Consider a fair coin tossing walk that starts at a and runs to until first reaching either 0 or b. Problem: Find the expected time to reach 0 or b.
S0 = a S n = a + X1 + X2 + · · · + Xn T0,b : = first n : Sn = 0 or b Ea [T0,b ] : = E[T0,b |S0 = a] Method: Define ma,b = Ea [T0,b ]. Fix b and vary a. Condition on the first step: Ea (T0,b |S1 = s) = 1 + Es (T0,b ) ma,b = 1/2(1 + ma+1,b ) + 1/2(1 + ma−1,b ) mb,b = 0 m0,b = 0
Lecture 7: Limits of Random Variables
5
b
a+1 a a-1
0
Simplify the main equation to (ma+1,b + ma−1,b )/2 − ma,b = −1 which shows that a → ma,b is a concave function. The simplest concave function is a parabola. The function must vanish at a = 0 and a = b, which suggests the solution ma,b = ca(b − a) for some c > 0. Simple algebra shows this works for c = 1. The solution is obviously unique subject to the boundary conditions, hence ma,b = a(b − a). Note that the maximum is attained for a = b/2 if b is even and for a = (b ± 1)/2 if b is odd.
Stat 150 Stochastic Processes
Spring 2009
Lecture 8: First passage and occupation times for random walk Lecturer: Jim Pitman
1
First passage and occupation times for random walk • Gambler’s ruin problem on {0, 1, . . . , b} with P (a, a − 1) = P (a, a + 1) = 0 and b absorbing. As defined in last class,
1 2
and
T0,b = first n : Xn ∈ {0, b} ma,b := Ea (T0,b ) m0,b = mb,b = 0 • Solve this system of equations ma,b = 1 + 12 (ma+1,b + ma−1,b ). Recall harmonic equation h(a) = 12 h(a + 1) + 21 h(a − 1): h(a+1)
h(a-1)
a-1
a
a+1
Now graphically h(a) = 1 + 21 h(a + 1) + 12 h(a − 1) looks like this: a concave function whose value at a exceeds by 1 the average of values at a ± 1
1
Lecture 8: First passage and occupation times for random walk
2
h(a) h(a+1)
h(a-1) h(a+2)
a-1
a
a+1
a+2
Simplify the main equation to (ma+1,b + ma−1,b )/2 − ma,b = −1 which shows that a → ma,b is a concave function. The simplest concave function is a parabola, that is a quadratic function of a. The function must vanish at a = 0 and a = b, which suggests the solution ma,b = ca(b − a) for some c > 0. Simple algebra shows this works for c = 1, meaning that ma,b := a(b − a) solves the equations. The solution is obviously unique subject to the boundary conditions, hence Ea (T0,b ) = a(b − a). Note that the maximum is attained for a = b/2 if b is even and for a = (b ± 1)/2 if b is odd. • Exercise: Do this by a martingale argument. For Sn = a + X1 + · · · + Xn , with no barrier at 0 and b, and Xn = ±1 (1/2, 1/2), check that (Sn2 − n) is a MG. Now argue that this process stopped at time T0,b is still a martingale, and deduce the result. • Occupation times In the previous example, take a state j with 1 ≤ j ≤ b and calculate the expected number of visits to j before T0,b . Denote Nj := # times hitting j. Observe that
b−1 X
= T0,b . To make this true, we must agree that for j = a, we
j=1
count the visit at time 0. (e.g. 0 < a = 1 < b = 2, must hit 0 or b at time 1)
Lecture 8: First passage and occupation times for random walk
Formally, Nj :=
∞ X
3
1(Sn = j)
n=0 b
a
5 visits to j j
0
• Discuss the case j = a. Key observations: – At every return to a, start a fresh Markov chain from a. – If we let p = Pa (T0,b before returning to a) = P (a, b), we see Pa (Na = 1) = p Pa (Na = 2) = (1 − p)p Pa (Na = k) = (1 − p)k−1 p Then Ea (Na ) =
∞ X
k(1 − p)k−1 p = 1/p.
k=1
• Now find p = P (a, b). Method: condition on the first step. With probability 1/2 of going up, starting at a + 1: Pa+1 (hit b before a) = P1 (hit (b − a) before 0) =
1 b−a
Similarly, Pa−1 (hit 0 before a) = 1 − Pa−1 (hit a before 0) = 1 −
a−1 1 = a a
1 1 1 b 2a(b − a) Thus P (a, b) = ( + )= . So Ea (Na ) = . 2 a b−a 2a(b − a) b • Now consider the case 0 < j < a.
Lecture 8: First passage and occupation times for random walk
4
b
a
Two cases
j
0
Ea (Nj ) =Pa (hit j before b) · Ej (Nj ) + Pa (hit b before j) · Ea (Nj |hit b before j) b − a 2j(b − j) · , since (Ea (Nj |hit b before j) = 0) = b−j b (b − a) =2j , for 1 ≤ j ≤ a b The graph below shows that for a ≤ j ≤ b we must get Ea (Nj ) =
2a (b − j). b
E_a(N_j)
a
0
• Final Check: b−1 X
Ea (Nj ) =
j=1
Pb−1
j=1
a−1 X
=
a−1 X j=1
a X j=a
2j
j
Ea (Nj ) = Ea (T0,b ) = a(b − a)
Ea (Nj ) +
j=1
b
Ea (Nj ) +
b−1 X
Ea (Nj )
j=a+1
b−1 X b−a b−a a + 2a + 2(b − j) b b b j=a+1
2(b − a) (a − 1)(1 + a − 1) 2a 2a (b − a − 1)(b − a − 1 + 1) · + (b − a) + · b 2 b b 2 =a(b − a). =
Stat 150 Stochastic Processes
Spring 2009
Lecture 9: Waiting for patterns Lecturer: Jim Pitman
1
Waiting for patterns • Expected waiting time for patterns in Bernoulli trials Suppose X1 , X2 , . . . are independent coin tosses with P(Xi = H) = p, P(Xi = T ) = 1−p = q. Take a particular pattern of some finite length K, say HH . . . H} | {z K
or HH . . . H} T . Let | {z K−1
Tpat := first n s.t. (Xn−K+1 , Xn−K+2 , . . . , Xn ) = pattern | {z } K
You can try to find the distribution of Tpat , but you will find it very difficult. We can compute E[Tpat ] with our tools. Start with the case of pat = HH . . . H}. | {z K
• For K = 1,
P(TH = n) = q n−1 p, n = 1, 2, 3 . . . X 1 E(TH ) = nq n−1 p = p n
For a general K, notice that when the first T comes (e.g. HHHT , considering some K ≥ 4), we start the counting process again with no advantage. Let mK be the expected number of steps to get K heads in a row. Let TT be the time of the first tail. Observe: If TT > K, then THH...H = K; if TT = j ≤ K, then THH...H = j + (fresh copy of) THH...H .
1
Lecture 9: Waiting for patterns
2
Therefore E(THH...H |TT > K) = K and E(THH...H |TT = j) = j + mK . So condition mK on TT : mK =
K X
P(TT = j)(j + mK ) + P(TT > K)K
j=1
=
K X
pj−1 q(j + mk ) + pK K
j=1 K K X X j−1 =( p q)mK + ( pj−1 qj) + pK K j=1
j=1
= (1 − p )mK + E(TT 1(TT ≤ K)) + pK K K
Recall that E(X1A ) = E(X|A)P(A), and notice that E(TT ) = 1/q, E(TT |TT > K) = K + 1/q, so E(TT 1(TT ≤ K)) = E(TT ) − E(TT 1(TT > K)) 1 1 = − (K + )pK q q Therefore mK = (1 − pK )mK +
1 1 − (K + )pK + pK K q q
1 − pK 1 1 − p pK 1 + p + p2 + · · · + pK−1 = pK
=
or mK pK = 1 + p + p2 + · · · + pK−1 . • Try to understand equation mK pK = 1 + p + p2 + · · · + pK−1 . Idea: Imagine you are gambling and observe the sequence of H and T evolving. Each time you bet that the next K steps will be HH . . . H}. Scale to get $1 if | {z K
you win. Accounting: Suppose that K = 3.
Lecture 9: Waiting for patterns
3
$1 p$1 p2 $1
Return:
"
Cost:
"
H
T H H T T T H H
p3 + p3 +
"
! H
! ?
! ?
+ p3
Expected cost = $ pK mK Expected return = $1·(1 + p + · · · + pK−1 ) So by “Conservation of Fairness”, mK pK = 1 + p + p2 + · · · + pK−1 . Rigorously, this is a martingale argument, but details are beyond scope of this course. See ”A Martingale Approach to the Study of Occurrence of Sequence Patterns in Repeated Experiments” by Shuo-Yen Robert Li, Ann. Probab. Volume 8, Number 6 (1980), 1171-1176. • Try pattern = HT HT Return:
$1
"
Cost:
| "
p2 q 2 +
! H T
pq$1 ! H
T ? ? ?
+ p2 q 2
Again, make fairness argument: p2 q 2 mHTHT = 1 + pq ⇒ mHTHT =
1 + pq p2 q 2
• Try H H T T H H T H H H
H T T H
H H T T
T H H T
T T T H T T H H T T
H H T T H H T T H H T T H H T T H H T T H H T T H H T T H H T T
Then mHHTTHHTT =
1 + p2 q 2 . p4 q 4
! $1 !0 !0 !0
! p2 q 2 $1 !0 !0 !0
Lecture 9: Waiting for patterns
4
• To verify the method described above actually works for all patterns of length K = 2, make a M.C. with states {HH, TT, TH, TT} by tracking the last two states. HH HT TH TT HH p q 0 0 HT 0 0 p q TH p q 0 0 TT 0 0 p q We have a chain with states i ∈ {HH, HT, T H, T T }, and we want the mean first passage time to HT. Say we start at i = T T , and define mij = mean first passage time to (j = HT ) In general, for a M.C. with states i, j, . . . , mij = Ei ( time to reach j ) If i 6= j, mij =
X
P (i, k)(1 + mkj ) +
k6=j
=1+
X
1 · P (i, j)
k=j
X
P (i, k)mkj
k6=j
Remark: The derivation shows that j need not be an absorbing state. So we want mTT,HT = 1 + p mTH,HT + q mTT,HT mTH,HT = 1 + p mHH,HT mHH,HT = 1 + p mHH,HT 1 . In previous notation, pq this is just mHT , and the general formula is seen to be working. You can check similarly that the formula works for patterns of length 3 by considering a chain with 8 patterns as its states, and so on (in principle) for patterns of length K with 2K states.
Solve the system of equations and verify that mTT,HT =
Stat 150 Stochastic Processes
Spring 2009
Lecture 10: The fundamental matrix (Green function) Lecturer: Jim Pitman
1
The fundamental matrix (Green function) • Formulate for Markov chains with an absorbing boundary. (Applications to other chains can be made by suitable sets of states to make them absorbing.) Suppose the state space S = I ∪ B satisfies the following regularity condition: – Finite number total number of internal states I. – Finite or countably infinite set of boundary states B: each b ∈ B is absorbing, meaning that P (b, b) = 1. – Starting at any i ∈ I, there is some path to the boundary, say i → j1 → j2 → · · · → jn ∈ B with P (i, j1 )P (j1 , j2 ) · · · P (jn−1 , jn ) > 0. It’s easy to see that Pi (TB < ∞) = 1 for all i ∈ I: say |I| = N < ∞, no matter what the starting state i, within N + 1 steps, by avoiding any repeat, there must be a path as above from i to some jn ∈ B in n ≤ N steps. So Pi (TB ≤ N ) > 0 for each i ∈ I. Because I is finite, this implies := mini∈I Pi (TB ≤ N ) > 0. Then Pi (TB > N ) ≤ 1 − Pi (TB > 2N ) ≤ (1 − )2 .. . Pi (TB > kN ) ≤ (1 − )k =⇒ Pi (TB < ∞) = 1
(Condition on XN )
↓ 0 as k ↑ ∞
That is, if TB = ∞, then TB > KN for arbitrary K, so 0 ≤ Pi (TB = ∞) ≤ Pi (TB > KN ) ≤ (1 − )K → 0 as K → ∞. Therefore Pi (TB = ∞) = 0. Also, TB has finite expectation: Ei (TB ) ≤ N
∞ X
Pi (TB > KN ) ≤
K=0
∞ X
(1 − )K = −1
K=0
1
Lecture 10: The fundamental matrix (Green function)
2
• (Text Page 169) Assume the structure of the transition matrix P is as follows:
I
B
z}|{ z}|{ R | P = I {| Q B {| 0 I | Qij = Pij for i, j ∈ I and Rij = Pij for i ∈ I, j ∈ B. Note that Q is a square matrix, and I denotes an identity matrix, in this diagram indexed by b ∈ B, and later indexed by i ∈ I. Matrix R indexed by I × B is typically not square. • Key observation: Look at the mean number of hits on j before hitting B starting at some different i ∈ I. (Always count a hit at time 0) First: case j ∈ I Ei [
∞ X
1(Xn = j)] =
n=0
∞ X
Pi (Xn = j) =
n=0
j∈I
P n (i, j)
n=0
Observe that for i, j ∈ I, Wij := Ei Nj = X
∞ X
P∞
n=0
P n (i, j) < ∞. Also:
∞ X ∞ X X Pi (TB > n)] = Ei (TB ) ≤ −1 < ∞ Wi j = Ei [ 1(Xn = j)] = Ei [ n=0 j∈I
n=0
so W := (Wij , i, j ∈ I) is an I × I matrix with all entries finite. Claim: Pn (i, j) = Qn (i, j) for all i, j ∈ I. P n Wij = ∞ n=0 Qij for i, j ∈ I is a finite matrix. Consider the following analogy: P n 2 −1 Real numbers: w = ∞ n=0 q = 1 + q + q + · · · = (1 − q) P n 2 −1 Matrices: W = ∞ n=0 Q = I + Q + Q + · · · = (I − Q) How do we know I − Q is invertible? Let us prove that W really is an inverse of I − Q. Need to show: W (I − Q) = I, that is, W − W Q = I or W = I + W Q. Check: ∞ X W = Qn = I + Q + Q2 + · · · n=0
WQ =
∞ X
Qn = Q + Q2 + Q3 + · · ·
n=1
=⇒I + W Q = W • Summary: In this setting, matrix W is called the Fundamental Matrix, also the Green Matrix(or Function). We find that W = (I −Q)−1 . The meaning of entries of W is that Wij is the mean number of hits of state j ∈ I, starting from state i ∈ I.
Lecture 10: The fundamental matrix (Green function)
3
• Final points: (1) W and R encode all the hitting probabilities: Pi (XTB = b) =
∞ X
Pi (TB = n, Xn = b)
n=1
= = =
∞ X X n=1 j∈I ∞ X X
Pi (Xn−1 = j, Xn = b) Qn−1 ij Pjb
n=1 j∈I ∞ XX
(
Qn−1 ij )Pjb
j∈I n=1
=
X
Wij Pjb
j∈I
since R = P |I×B
= (W R)ib
(2) From Qn and R you pick up distribution of TB : X Pi (TB = n) = Pi (TB = n, Xn = b) b∈B
=
XX
Qn−1 ij Rjb
b∈B j∈I n−1
= (Q
R 1)i
(3) Exercise: Qn and R give you the joint distribution of TB and XTB . • (Compare page 241) Brief discussion of infinite state spaces with no boundary. P Example: Coin tossing walk, p ↑ q ↓ on Z, N0 := ∞ n=0 1(Xn = 0). Compute: E0 N0 = E0
∞ X
1(Xn = 0)
(= expected number of returns to 0)
n=0
=
∞ X
P n (0, 0)
n=0
=
∞ X
P 2m (0, 0)
m=0 ∞ X
2m m m = p q m m=0 ∞ X 2m = (1/2)2m (4pq)m m m=0
Lecture 10: The fundamental matrix (Green function)
4
√ Case p = q = q/2: recall Stirling’s formula n! ∼ (n/e)n 2πn. Then 2m c (1/2)2m ≈ √ m m 4pq = 1 ∞ X c √ = ∞. Then m n=0
Case p 6= q:
∞ X 2m 1 (1/2)2m (4pq)m ≤ 0 for some l.
• Theorem: If P is a regular transition matrix on a finite set, then there exists a unique probability distribution π on S such that πP = π (π is called the stationary, equilibrium, invariant, or steady state distribution) and limn→∞ Pλ (Xn = j) = πj . Remarks: – πj > 0, for all j. – Let Tj := first n ≥ 1 : Xn = j, then πj = –
1 . Ej Tj
πj = Ei (# of hits on j before Ti ). This can be proved by using previous πi techniques. 1
Lecture 17: Limit distributions for Markov Chains
2
– Take λj = 1(j = i), λP n (j) = P n (i, j) → πj . Start in state i, X0 = i. Discussion: If such limits limn→∞ λP n (j) exist for all j, say limn→∞ λP n (j) = πj , then πP = π. Proof: πj = lim λP n (j) n→∞ X = lim (λP n )i P (i, j) n→∞
=
X
i∈S
πi P (i, j)
i
= πP (j)
π is a probability distribution , πP = π
is a system of N + 1 equations with
N unknowns: N X
πi = 1
i=1 N X
πi P (i, j) = πj , for 1 ≤ j ≤ N
i=1
There is one too many equation because the P matrix has rows sum to 1. • Trick/Technique: Reversible equilibrium d
π is an equilibrium distribution if given X0 = π, then d
d
d
d
d
π = X0 = X1 = X 2 = · · · = Xn d
Sometimes (X0 , X1 ) = (X1 , X0 ); that is P(X0 = i, X1 = j) = P(X1 = i, X0 = j) = P(X0 = j, X1 = i) With πi = P(X0 = i), this becomes πi P (i, j) = πj P (j, i) This system of equations πi P (i, j) = πj P (j, i), i, j ∈ S is called system of reversible equilibrium equations, also called detailed balance equations. No tice that this system has N2 equations with N unknowns,
Lecture 17: Limit distributions for Markov Chains
3
P and there is still the additional constraint i πi = 1. Observe: If π solves REV EQ EQNS, then π solves the usual EQ EQNS: Suppose sum over i:
πi P (i, j) = πj P (j, i), for all i, j ∈ S X X (πP )j = πi P (i, j) = πj P (j, i) = πj =⇒ πP = π i
i
It is usually easy to tell if the REV EQ EQNS have a solution, and if so to evaluate it. Starting from the value πi any particular state i, the value of πj for every state j with P (i, j) > 0 is immediately determined. Continuing this way determines πk for every state k with can be reached from i in two steps, say via j with P (i, j)P (j, k) > 0, and so on. The only issue with this procedure is if two different paths from i to k lead force two different values for πk , in which case the REV EQ EQNS have no solution. • Equilibrium Mass Transfer Imagine mass πi located at state i. At each step, proportion P (i, j) of the mass at i is moved to j. Equilibrium means that after the transfer, the mass distribution is the same. Reversible equilibrium means this is achieved because for each pair of states i and j, the mass transferred from i to j equals the mass transferred from j to i. • Example Nearest neighbour R.W. on graph. This is really a meta-example, with large numbers of instances. Graph G on set S, G is a collection of edges {i, j} for i, j ∈ S. 1 1(i neighbour of j) P (i, j) = N (i) N (i) = # of neighbours of i
2
1
3
4
5
Easy fact: The measure (N (i), i ∈ S) is always a reversible equilibrium for RW on graph. N (i)P (i, j) = N (j)P (j, i), for all i, j ∈ S because these equations read either 0 = 0 if i is not a neighbour of j or 1 = 1 if i is a neighbour of j. For instance (π1 , π2 , π3 , π4 , π5 ) = (2/14, 3/14, 3/14, 4/14, 2/14) in the example above.
Stat 150 Stochastic Processes
Spring 2009
Lecture 18: Markov Chains: Examples Lecturer: Jim Pitman
• A nice collection of random walks on graphs is derived from random movement of a chess piece on a chess board. The state space of each walk is the set of 8 × 8 = 64 squares on the board:
8 7 6 5 4 3 2 1
a b c d e f g h
Each kind of chess piece at a state i on an otherwise empty chess board has some set of all states j to which it can allowably move. These states j are the neighbours of i in a graph whose vertices are the 64 squares of the board. Ignoring pawns, for each of king, queen, rook, bishop and knight, if j can be reached in one step from i, this move can be reversed to reach i from j. Note the pattern of black and white squares, which is important in the following discussion. The King interior 8 possible moves
' * % & ' !% $ '•% # %' % "' % ) ' (
corner 3 possible moves
edge 5 possible moves
!# # $ •# "
!$ $ % •$ # & & ' "&
1
Lecture 18: Markov Chains: Examples
2
In the graph, i ←→ j means that j is a king’s move from i, and i is a king’s move from j. Observe that N (i) := #( possible moves from i) ∈ {3, 5, 8}. From general discussion of random walk on a graph in previous lecture, the reversible N (i) where equilibrium distribution is πi = Σ X Σ := N (j) = (6 × 6) × 8 + (4 × 6) × 5 + 4 × 3 j
Question: Is this walk regular? Is it true that ∃m : P m (i, j) > 0 for all i, j? Yes. You can easily check this is so for m = 7. Rook and Queen Very similar treatment. Just change the values of N (i). Both walks are regular because P 2 is strictly positive: even the rook can get from any square to any other square in two moves. Bishop 32 white squares, 32 black squares. Bishop stays on squares of one colour.
# & #
! " ! #•! !# ! # ! % # $ Is this chain regular? No, because the bishop’s walk is reducible: there are two disjoint sets of states (B and W ). For each i, j ∈ B, ∃N ; P N (i, j) > 0. In fact, N = 3. Similarly for each i, j ∈ W, P 3 (i, j) > 0, hence P N (i, j) > 0 for all N ≥ 3. But B and W do not communicate.
B 1
!# "$
. B ..
32 33
. W ..
64
W
!# "$
The matrix is decomposed into two: B ←→ B, W ←→ W . If the bishop
Lecture 18: Markov Chains: Examples
3
starts on a black square, it moves as if a Markov chain with state space B, and the limit distribution is the equilibrium distribution for the random P P N (i) = j∈B N (j), with 32 walk on B, with πi = P as before, but now terms, rather than a sum over all 64 squares. This is typical of a Markov chain with two disjoint communicating classes of states. Knight
× ×
×
× •
×
×
× ×
Is it regular? (No) After an even number of steps n, P n (i, j) > 0 only if j is of the same color as i. No matrix power of P is strictly > 0 at all entries =⇒ not regular. • Periodicity Fix a state i. Look at the set of n : P n (i, i) > 0. For the knight’s move {n : P n (i, i) > 0} = {2, 4, 6, 8, . . . }. You cannot return in an odd number of steps because knight’s squares change colour B → W → B at each step. Definition: Say P is irreducible if for all states i, j, ∃n : P n (i, j) > 0. This condition is weaker than regular, which requires the n to work for all i, j. Deal with periodic chains: Suppose P is irreducible, say state i has period d ∈ {1, 2, 3, . . . } if the greatest common divisor of {n : P n (i, i) > 0} = d. E.g., d = 2 for each state i for the knight’s walk. Fact (Proof see Feller Vol.1): For an irreducible chain, every state i has the same period. – d = 1: called aperiodic – d = 2, 3, . . . : called periodic with period d In periodic case, ∃ d disjoint sets of states, C1 , . . . , Cd (Cyclically moving subclasses): P (i, j) > 0 only if i ∈ Cm , j ∈ Cm+1 (m + 1 mod d).
Lecture 18: Markov Chains: Examples
4
!# C1 "$ !# C2 "$
.. . !# C3 "$
Note If ∃i : P (i, i) > 0, then i has period 1. Then, assuming P is irreducible, all states have period 1, and the chain is aperiodic. Fact An irreducible, aperiodic chain on finite S is regular. • Death and immigration chain Population story. State space S = {0, 1, 2, . . . }. Let Xn ∈ S represent the number of individuals in some population at time n. Dynamics: Between times n and n + 1, – each individual present at time n dies with probability p and remains with probability q := 1 − p. – add an independent Poisson(λ) number of immigrants. Problem Describe limit behavior of Xn as n → ∞. Natural first step: write down the transition matrix. For i ≥ 0, j ≥ 0, condition on number of survivors: P (i, j) =
i X i k=0
k
q k pi−k e−λ
λj−k 1(k ≤ j) (j − k)!
Now try to solve the equations πP = π. Very difficult! • Idea: Suppose we start at state i = 0, X0 = 0, X1 ∼ Poi(λ) X2 ∼ Poi(λq + λ) by thinning and “+” rule for Poisson X3 ∼ Poi((λq + λ)q + λ) .. . 1 − qn n−1 Xn ∼ Poi(λ(1 + q + · · · + q )) = Poi λ → Poi(λ/p) 1−q
Lecture 18: Markov Chains: Examples
5
So given X0 = 0, we see Xn −→ Poi(λ/p). d
If we start with Xn > 0, it is clear from the dynamics that we can write Xn = Xn∗ + Yn Xn∗ = survivors of the initial population Yn = copy of the process given (Xn = 0)
where
Note that Xn∗ → 0,
Yn → Poi(λ/p)
Conclusion: Xn −→ Poi(λ/p) no matter what X0 is. d
• Ad hoc analysis — special properties of Poisson. General idea: What do you expect in the limit? Answer: stationary distribution πP = π. Here π is Poi(λ/p). Does πP = π? That is, does X0 ∼ Poi(λ/p) imply X1 ∼ Poi(λ/p) . . . ? Check: Say X0 ∼ Poi(µ), then by thinning and addition rules for Poisson, as before X1 ∼ Poi(µq + λ). So X1 = X0 ⇐⇒ µ = µq + λ ⇐⇒ µ = d
λ λ = 1−q p
Therefore, the unique value of µ which makes Poi(µ) invariant for this chain is µ = λ/p. In fact, this is the unique stationary distribution for the chain. This follows from the previous result that no matter what the distribution of X0 , the distribution of Xn converges to Poi(λ/p). Because if π was some other stationary distribution, if we started the chain with X0 ∼ π, then Xn ∼ π for every n, and the only way this can converge to Poi(λ/p) is if in fact π = Poi(λ/p).
Stat 150 Stochastic Processes
Spring 2009
Lecture 19: Stationary Markov Chains Lecturer: Jim Pitman
• Symmetry Ideas: General idea of symmetry: make a transformation, and something stays the same. In probability theory, the transformation may be conditioning, or some rearrangement of variables. What stays the same is the distribution of something. For a sequence of random variables X0 , X1 , X2 , . . . , various notions of symmetry: (1) Independence. X0 and X1 are independent. Distribution of X1 given X0 d does not involve X0 ; that is, (X1 |X0 ∈ A) = X1 . d
(2) Identical distribution. The Xn are identically distributed: X0 = Xn for every n. Of course IID =⇒ LLN / CLT. • Stationary: (X1 , X2 , . . . ) = (X0 , X1 , . . . ) By measure theory, this is the same as d
(shifting time by 1)
d
(X1 , X2 , . . . , Xn ) = (X0 , X1 , . . . , Xn−1 ), for all n. Obviously IID =⇒ Stationary. But the converse is not necessarily true. E.g., Xn = X0 for all n, for any non-constant random variable X0 . Another example: Stationary MC, i.e., start a MC with transition matrix P with initial distribution π. Then easily, the following are equivalent: πP = π d
X1 = X0 d
(X1 , X2 , . . . ) = (X0 , X1 , . . . ) . Otherwise put, (Xn ) is stationary means (X1 , X2 , . . . ) is a Markov chain with exactly the same distribution as (X0 , X1 , . . . ). • Other symmetries: d
– Cyclic symmetry: (X1 , X2 , . . . , Xn ) = (X2 , X3 , . . . , Xn , X1 ) | {z } | {z } n
n d
– Reversible: (Xn , Xn−1 , . . . , X1 ) = (X1 , X2 , . . . , Xn ) 1
Lecture 19: Stationary Markov Chains
2
d
– Exchangeable: (Xπ(1) , Xπ(2) , . . . , Xπ(n) ) = (X1 , X2 , . . . , Xn ) for every permutation π of (1, . . . , n). • Application of stationary idea to MC’s. Think about a sequence of RVs X (0 , X1 , . . . which is stationary. Take a set least n ≥ 1(if any) s.t. Xn ∈ A A in the state space. Let TA = ; that is, ∞ if no such n TA := least n ≥ 1 : 1A (Xn ) = 1. Consider the 1A (Xn ) process. E.g., 0 0 0 0 1 0 0 1 1 0 0 0 . . . To provide an informal notation: P(TA = n) = P(? |0 0 .{z . . 0 0} 1) n−1 zeros
P(TA ≥ n) = P(? |0 0 .{z . . 0 0} ?) n−1 zeros
P(X0 ∈ A, TA ≥ n) = P(1 |0 0 .{z . . 0 0} ?) n−1 zeros
Lemma: For every stationary sequence X0 , X1 , . . . , P(TA = n) = P(X0 ∈ A, TA ≥ n)
for n = 1, 2, 3, . . .
With the informal notation, the claim is: P(? 0| 0 .{z . . 0 0} 1) = P(1 |0 0 .{z . . 0 0} ?) n−1 zeros
n−1 zeros
Notice. This looks like reversibility, but reversibility of the sequence X0 , X1 , . . . , Xn is not being assumed. Only stationarity is required! Proof
P (? 0! 0 ."# . . 0 0$ 1)
P (1 !0 0 ."# . . 0 0$ ?)
n−1 zeros
+ P (? 0! 0 ."# . . 0 0$ 0)
n−1 zeros
stationarity
=
n−1 zeros
n−1 zeros
P (0 !0 0 ."# . . 0 0$ ?) n−1 zeros
! P (? 0! 0 ."# . . 0 0$ ?)
+
! =
P (? 0! 0 ."# . . 0 0$ ?) n−1 zeros
Lecture 19: Stationary Markov Chains
3
Take the identity above, sum over n = 1, 2, 3, . . . , P(TA < ∞) = =
∞ X n=1 ∞ X
PA (T = n) P(X0 ∈ A, TA ≥ n)
n=1 ∞ X
=E
1(X0 ∈ A, TA ≥ n)
n=1
= E(TA 1(X0 ∈ A)) This is Marc Kac’s identity: For every stationary sequence (Xn ), and every measurable subset A of the state space of (Xn ): P(TA < ∞) = E(TA 1(X0 ∈ A)) • Application to recurrence of Markov chains: Suppose that an irreducible transition matrix P on a countable space S has a stationary probability measure π, that is πP = π, with πi > 0 for some state i. Then – π is the unique stationary distribution for P – πj > 0, ∀j. – Ej Tj < ∞, ∀j. – Ej Tj =
1 , ∀j. πj
Proof: Apply Kac’s identity to A = {i}, with P = Pπ governing (Xn ) as d a MC with transition matrix P started with X0 = π. Kac
1 ≥ Pπ (Ti < ∞) = Eπ [Ti 1(X0 = i)] = πi Ei Ti Since πi > 0 this implies first Ei Ti < ∞, and hence Pi (Ti < ∞) = 1. Next, need to argue that Pj (Ti < ∞) = 1 for all states j. This uses irreducibility, which gives us an n such that P n (i, j) > 0, hence easily an m ≤ n such that it is possible to get from i to j in m steps without revisiting i on the way, i.e. Pi (Ti > m, Xm = j) > 0 But using the Markov property this makes Pj (Ti < ∞) = Pi (Ti < ∞ | Ti > m, Xm = j) = 1
Lecture 19: Stationary Markov Chains
Finally Pπ (Ti < ∞) =
X
4
πj Pj (Ti < ∞) =
j
X
πj = 1
j
and the Kac formula becomes 1 = πi Ei Ti as claimed. Lastly, it is easy that πj > 0 and hence 1 = πj Ej Tj because π = πP implies π = πP n and so X πj = πk P n (k, j) ≥ πi P n (i, j) > 0 k
for some n by irreducibility of P .
Stat 150 Stochastic Processes
Spring 2009
Lecture 20: Markov Chains: Examples Lecturer: Jim Pitman
• A nice formula: Ei (number of hits on j before Ti ) =
πj πi
Domain of truth:
– P is irreducible.
– π is an invariant measure: πP = π, πj ≥ 0 for all j, and Σ_j πj > 0.
– Either Σ_j πj < ∞ or (weaker) Pi(Ti < ∞) = 1 (recurrent).
Positive recurrent case: if Ei(Ti) < ∞, then π can be normalized to a probability measure. Null recurrent case: if Pi(Ti < ∞) = 1 but Ei(Ti) = ∞, then π cannot be normalized to a probability measure.
• The formula above is a refinement of the formula Ei(Ti) = 1/πi for a positive recurrent chain with invariant π with Σ_j πj = 1. To see this, observe that
Nij := number of hits on j before Ti = Σ_{n=0}^∞ 1(Xn = j, n < Ti)
Ti = Σ_{n=0}^∞ 1(n < Ti)
= Σ_{n=0}^∞ Σ_{j∈S} 1(Xn = j) 1(n < Ti)     [since Σ_{j∈S} 1(Xn = j) = 1]
= Σ_{j∈S} Σ_{n=0}^∞ 1(Xn = j, n < Ti)
= Σ_{j∈S} Nij
Then
Ei Ti = Ei Σ_j Nij = Σ_j Ei Nij = Σ_j πj/πi = 1/πi     with Σ_j πj = 1
• Strong Markov property: the chain refreshes at each visit to i ⟹ the numbers of visits to j in successive i-blocks are iid.
[Figure: a path of Xn divided into successive i-blocks (excursions between visits to i), with the visits to j within each block marked.]
Sequence Nij^(1), Nij^(2), Nij^(3), .... Each Nij^(k) is an independent copy of Nij = Nij^(1). So Ei Nij = expected # of j's in an i-block, and according to the law of large numbers:
(1/m) Σ_{k=1}^m Nij^(k) −→ Ei Nij
where the left side is the average # of visits to j in the first m i-blocks.
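One can estimate Ei Nij by chopping a simulated path into i-blocks and comparing with πj/πi. A minimal sketch, reusing the same hypothetical 3-state chain idea (numpy assumed):

import numpy as np

rng = np.random.default_rng(1)
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])

w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1))])
pi /= pi.sum()

i, j = 0, 2
x, count, block_counts = i, 0, []
for _ in range(500_000):
    x = rng.choice(3, p=P[x])
    if x == j:
        count += 1
    if x == i:                 # an i-block ends at each return to i
        block_counts.append(count)
        count = 0

print("mean visits to j per i-block:", np.mean(block_counts))
print("pi_j / pi_i                 :", pi[j] / pi[i])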
Push this a bit further:
Σ_{n=1}^N 1(Xn = j) = # of j's in the first N steps.
Σ_{n=1}^N 1(Xn = i) = # of i's in the first N steps.
Previous discussion: In the long run, (# of j's)/(# of visits to i) −→ Ei Nij:
(# of j's in the first N steps)/(# of i's in the first N steps) = Σ_{n=1}^N 1(Xn = j) / Σ_{n=1}^N 1(Xn = i) −→ Ei Nij
But also in the positive recurrent case (for simplicity, assume aperiodic/regular): Pi(Xn = j) −→ πj, so
Ei( (1/N) Σ_{n=1}^N 1(Xn = j) ) = (1/N) Σ_{n=1}^N P^n(i, j) −→ πj
because averages of a convergent sequence tend to the same limit. What's happening of course is
(1/N) Σ_{n=1}^N 1(Xn = j) −→ πj
(1/N) Σ_{n=1}^N 1(Xn = i) −→ πi
⟹ Σ_{n=1}^N 1(Xn = j) / Σ_{n=1}^N 1(Xn = i) −→ πj/πi
Hence Ei Nij = πj/πi.
• Example: p ↑ q ↓ walk on {0, 1, 2, ...} with partial reflection at 0 (from 0 the walk moves up to 1 with probability p and stays at 0 with probability q). Solving the detailed balance equations µn p = µ_{n+1} q gives the invariant measure µn = (p/q)^n. If p > q then Σ_n µn = ∞, which shows there is no invariant probability distribution for this chain. If p < q then Σ_n µn = 1/(1 − p/q) < ∞, which gives the unique invariant probability distribution
πn = (p/q)^n / Σ_{n=0}^∞ (p/q)^n = (p/q)^n (1 − p/q) ∼ Geometric(1 − p/q) on {0, 1, 2, ...}
From previous theory, existence of this invariant probability distribution proves that the chain is positive recurrent for p < q.
• Example: Compute E1 T0 by two methods and compare.
Method 1: First step analysis. Let mij = Ei Tj. From 0, m00 = 1 + p m10, and m00 = 1/π0 = 1/(1 − p/q), so
m10 = (m00 − 1)/p = (1/(1 − p/q) − 1)/p = 1/(q − p)
Method 2: We can use Wald's identity for the coin-tossing walk, without concern about the boundary condition at 0, to argue that m10 = 1/(q − p).
• Example: For any particular probability distribution p1, p2, ... on {1, 2, ...}, there is a MC and a state 0 whose return time T0 has P0(T0 = n) = pn.
[Figure: sawtooth path that jumps from 0 up to T0^(k) − 1 and then decreases by 1 each step back down to 0.]
Start with T0^(1), T0^(2), T0^(3), ... iid with probability distribution (p1, p2, ...).
P(0, j) = p_{j+1}, for j = 0, 1, 2, ...
P(1, j) = 1(j = 0)
P(2, j) = 1(j = 1)
...
P(i, j) = 1(j = i − 1), for i > 0
Describe the stationary distribution: Solve πP = π. Not reversible. Use
πj/π0 = E0(# of visits to j before T0)
Notice the chain dynamics: the # of hits on a state j ≥ 1 is either 1 or 0. It is 1 if and only if the initial jump from 0 is to j or greater. The probability of that is p_{j+1} + p_{j+2} + ⋯
⟹ πj/π0 = p_{j+1} + p_{j+2} + ⋯ = P0(T0 > j)
Check 1/π0 = E0(T0):
1/π0 = Σ_j πj/π0 = Σ_{j=0}^∞ P0(T0 > j) = E0(T0)
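A simulation sketch of this construction, using a hypothetical return-time distribution on {1, 2, 3, 4} (numpy assumed), confirms πj ∝ P0(T0 > j):

import numpy as np

rng = np.random.default_rng(2)

# Hypothetical return-time distribution: P(T0 = n) = p[n-1].
p = np.array([0.1, 0.4, 0.3, 0.2])

steps = 500_000
x, visits = 0, np.zeros(len(p))
for _ in range(steps):
    if x == 0:
        x = rng.choice(len(p), p=p)   # jump to j with probability p_{j+1}
    else:
        x -= 1                        # otherwise step down deterministically
    visits[x] += 1

emp = visits / steps
theory = np.array([p[j:].sum() for j in range(len(p))])   # P0(T0 > j)
theory /= theory.sum()
print("empirical pi:", np.round(emp, 4))
print("theory    pi:", np.round(theory, 4))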
Stat 150 Stochastic Processes
Spring 2009
Lecture 21: Poisson Processes Lecturer: Jim Pitman
• Poisson processes
General idea: Want to model a random scatter of points in some space.
– # of points is random
– locations are random
– want the model to be simple, flexible, and to serve as a building block
Variety of interpretations:
– Natural phenomena such as weather, earthquakes, meteorite impacts
– points represent discrete "events", "occurrences", "arrivals" in some continuous space of possibilities
– Continuous aspect: location/attributes of points in physical space/time
– Discrete aspect: counts of numbers of points in various regions of space/time, taking values in {0, 1, 2, 3, ...}
Abstractly: A general, abstract space S. Suitable subsets B of S for which we discuss counts, e.g. intervals, regions for which area can be defined, volumes in space, ... Generically N(B) should be interpreted as the number of occurrences/arrivals/events with attributes in B. Then by definition, if B1, B2, ..., Bn are disjoint subsets of S, then
N(B1 ∪ B2 ∪ ⋯ ∪ Bn) = N(B1) + ⋯ + N(Bn)
Simple model assumptions that lead to Poisson processes:
(1) Independence: If B1, B2, ..., Bn are disjoint subsets of S, then the counts N(Bi) are independent.
(2) No multiple occurrences. Example: we model #'s of accidents, not #'s of people killed in accidents, or numbers of vehicles involved in accidents.
It can be shown in great generality that these assumptions imply the Poisson model: N(B) ∼ Poisson(µ(B)) for some measure µ on subsets B of S. If µ is λ times some notion of length/area/volume on S, then the process is called homogeneous and λ is called the rate. In general, µ(B) is the expected number of points in B. In the homogeneous case, the rate λ is the expected number of points per unit of length/area/volume.
• Specific: Most basic study is for a Poisson arrival process on a time line [0, ∞) with arrival rate λ.
[Figure: arrival times marked × on the time axis, with an interval (a, b].]
(a, b] = interval of time; (a, b] ∪ (b, c] = (a, c].
N(a, b] := # of arrivals in (a, b] ∼ Poisson(λ(b − a)), where λ(b − a) = rate × length = E[# of arrivals].
Nt := N(0, t]. (Nt, t ≥ 0) is a continuous time counting process:
– only jumps by 1
– stationary independent increments
– for 0 ≤ s ≤ t, Nt − Ns =d N_{t−s} ∼ Poisson(λ(t − s))
Define Tr := W1 + W2 + ⋯ + Wr = inf{t : Nt = r}, with Wi := Ti − T_{i−1} and
0 = T0 < T1 < T2 < ⋯
[Figure: Nt as a right-continuous step function, jumping by 1 at the arrival times T1 < T2 < ⋯ < T5, with inter-arrival times W1, W2, ..., W5 between them.]
• Fact/Theorem: (Nt, t ≥ 0) is a Poisson process with rate λ ⇐⇒ W1, W2, W3, ... is a sequence of iid Exp(λ) random variables. For one direction:
(W1 > t) ⇐⇒ (Nt = 0), so P(W1 > t) = P(Nt = 0) = e^{−λt}
• Fundamental constructions: Marked Poisson point process. Assume the space of marks is [0,1] at first.
◦ Points arrive according to a PP(λ)
◦ Each point is assigned an independent uniform mark in [0,1]
Let the space of marks be split into two subsets of length p and q with p + q = 1.
[Figure: arrivals on the time axis with marks in [0,1]; points with marks in the p-part are labeled ⊙, points with marks in the q-part are labeled ⊗.]
Each point is a ⊙ with probability p and a ⊗ with probability q. Let
N⊙(t) := # of ⊙ up to t
N⊗(t) := # of ⊗ up to t
Then
N⊙(t) ∼ Poisson(λpt), N⊗(t) ∼ Poisson(λqt), N(t) = N⊙(t) + N⊗(t)
These relations hold because of the Poisson thinning property: if you have a Poisson number of individuals with mean µ, and you
– keep each individual with probability p
– discard each individual with probability q
then # kept and # discarded are independent Poissons with means µp and µq, respectively. Pushing this further, (N⊙(t), t ≥ 0) and (N⊗(t), t ≥ 0) are two independent homogeneous Poisson processes with rates λp and λq respectively.
Take a fixed region R in the (t, u) plane and count the number of (Ti, Ui) with (Ti, Ui) ∈ R. In the figure below there are two. Since the sum of independent Poissons is Poisson, the number of (Ti, Ui) in R has to be Poisson as well.
[Figure: points (Ti, Ui) in the (t, u) strip, with a region R containing two of them.]
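To make the thinning property concrete, here is a minimal simulation sketch: build the arrivals from iid exponential gaps, mark each with an independent uniform, and split. The parameters are illustrative choices, and numpy is assumed:

import numpy as np

rng = np.random.default_rng(3)
lam, p, horizon = 2.0, 0.3, 10_000.0

# Arrival times = cumulative sums of iid Exp(lam) gaps
# (generate a surplus, then truncate to the horizon).
gaps = rng.exponential(1 / lam, size=int(1.5 * lam * horizon))
times = np.cumsum(gaps)
times = times[times <= horizon]

# Independent uniform marks; split the stream at p.
marks = rng.uniform(size=times.size)
kept, dropped = times[marks < p], times[marks >= p]

print("rate of kept stream   :", kept.size / horizon, "vs lam*p =", lam * p)
print("rate of dropped stream:", dropped.size / horizon, "vs lam*q =", lam * (1 - p))

The two thinned streams are not just Poisson with the right rates; counts of the two streams in disjoint windows are also independent, which can be checked the same way.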
◦ Here we simply supposed the marks are uniform.
◦ Generalize to marks which are iid with density f(x):
◦ T1, T2, ... are the arrival times of a PP(λ)
◦ X1, X2, ... are iid with P(Xi ∈ dx) = f(x)dx
◦ Then (T1, X1), (T2, X2), ... are the points of a Poisson process in [0, ∞) × R with intensity measure λ dt f(x) dx
[Figure: marked points (Ti, Xi) in the (t, x) plane, with a region B.]
Here, NB ∼ Poisson with mean ∬_B λ dt f(x) dx.
• Practical example (Queueing model)
– Customers arrive according to a PP(λ).
– Each arriving customer stays in the system for a random time with density f(x), independently of everything else.
Find a formula for the distribution of the # of customers in the system at time t.
[Figure: arrivals at times T1, T2, T3 with service durations X1, X2, X3 drawn as intervals; a customer is in the system at time t iff Ti ≤ t < Ti + Xi.]
Mark each arrival by the time spent in the system. A customer arriving at time s is still in the system at time t iff its service time exceeds t − s, so the number in the system at time t is the count of marked points in a region with mean
µ = ∫_0^t λ ds ∫_{t−s}^∞ f(x) dx = ∫_0^t λ (1 − F(t − s)) ds = ∫_0^t λ (1 − F(s)) ds, where F(s) = ∫_0^s f(x) dx
The actual number in the system is Poisson with mean µ calculated above.
• Connection between PP and uniform order statistics. Construction of a PP with rate λ on [0,1]:
Step 1: Let N1 ∼ Poisson(λ). This will be the total number of points in [0,1].
Step 2: Given N1 = n, let U1, ..., Un be independent and uniform on [0,1]. Let the points of the PP be at U1, U2, ..., Un. Let
T1 = min_{1≤i≤n} Ui
Tk = min{Ui : Ui > T_{k−1}, 1 ≤ i ≤ n} for k > 1
We can only define T1, T2, ..., Tn = ordered values of U1, U2, ..., Un. Pick all the Tr's with Tr ≤ 1 this way.
Create (Nt, 0 ≤ t ≤ 1): Nt = #{i : Ui ≤ t} = Σ_{i=1}^∞ 1(N1 ≥ i, Ui ≤ t)
Claim: (Nt, 0 ≤ t ≤ 1) is the usual PP(λ). This means
◦ Nt ∼ Poisson(λt)
◦ N_{t1}, N_{t2} − N_{t1}, ..., N_{tn} − N_{t_{n−1}}, N1 − N_{tn} are independent for 0 ≤ t1 ≤ ⋯ ≤ tn ≤ 1
◦ Nt − Ns ∼ Poisson(λ(t − s))
Why? Nt ∼ Poisson(λt) by thinning. Independence also by thinning: assign the Poisson(λ) number N1 of points into the categories (0, t1], (t1, t2], ..., (tn, 1] with probabilities t1, t2 − t1, ..., 1 − tn. Poissonization of multinomials (with n = n1 + ⋯ + nk):
P(N_{t1} = n1, N_{t2} − N_{t1} = n2, ..., N1 − N_{t_{k−1}} = nk)
= P(N1 = n) P(N_{t1} = n1, N_{t2} − N_{t1} = n2, ..., N1 − N_{t_{k−1}} = nk | N1 = n)
= e^{−λ} (λ^n / n!) (n choose n1, n2, ..., nk) t1^{n1} (t2 − t1)^{n2} ⋯ (1 − t_{k−1})^{nk}
= e^{−λt1} ((λt1)^{n1}/n1!) · e^{−λ(t2−t1)} ((λ(t2 − t1))^{n2}/n2!) ⋯ e^{−λ(1−t_{k−1})} ((λ(1 − t_{k−1}))^{nk}/nk!)
Therefore N_{t1}, N_{t2} − N_{t1}, ..., N_{tn} − N_{t_{n−1}}, N1 − N_{tn} are independent Poisson as claimed.
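This order-statistics construction is easy to check numerically. A sketch (numpy assumed; λ and the grid points t1 < t2 are arbitrary choices):

import numpy as np

rng = np.random.default_rng(4)
lam, reps = 5.0, 20_000
t1, t2 = 0.3, 0.7
counts = np.zeros((reps, 3), dtype=int)

for r in range(reps):
    n = rng.poisson(lam)          # Step 1: total number of points in [0,1]
    u = rng.uniform(size=n)       # Step 2: place them uniformly
    counts[r] = [np.sum(u <= t1),
                 np.sum((u > t1) & (u <= t2)),
                 np.sum(u > t2)]

print("empirical means:", counts.mean(axis=0))
print("Poisson means  :", [lam * t1, lam * (t2 - t1), lam * (1 - t2)])
print("corr of first two counts:", np.corrcoef(counts[:, 0], counts[:, 1])[0, 1])  # ~ 0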
Stat 150 Stochastic Processes
Spring 2009
Lecture 22: Continuous Time Markov Chains Lecturer: Jim Pitman
• Continuous Time Markov Chain
Idea: Want a model for a process (Xt, t ≥ 0) with continuous time parameter t.
– State space a finite/countable set S
– Markov property (time homogeneous transition mechanism)
Strong parallels with discrete time. Expression of Markov: conditionally given that Xt = j, the past process (Xs, 0 ≤ s ≤ t) and the future (X_{t+v}, v ≥ 0) are conditionally independent, and the future is distributed as a copy of (Xv, v ≥ 0) given X0 = j.
Notations: Recall in discrete time P^n = n-step transition matrix; we need now the time-t transition matrix. Conceptual challenge: raising a matrix to a real power t such as t = 1/2, 1/3. Use the notation P(t) instead of P^t for the transition matrix over time t. Idea: P(t) is something like a t-th power: P(s)P(t) = P(s + t). This semigroup property of transition probability matrices follows easily from the Markov property:
Pij(s + t) := Pi(X_{s+t} = j) = Σ_k Pi(Xs = k, X_{s+t} = j) = Σ_k Pi(Xs = k) Pk(Xt = j)
• Basic examples: The Poisson process as a Markov chain. Let (Nt, t ≥ 0) be a Poisson process with N0 = 0 and rate λ.
[Figure: Nt as a step function with iid holding times S1, ..., S5 between the jumps.]
S1, S2, ... independent Exp(λ). Observe (Nt, t ≥ 0) is a MC with transition matrices
Pij(t) = P(N_{s+t} = j | Ns = i) = P(N_{s+t} − Ns = j − i | Ns = i) = P(N_{s+t} − Ns = j − i)   (since N_{s+t} − Ns ⊥ Ns)
= P0(Nt = j − i) = e^{−λt} (λt)^{j−i}/(j − i)! · 1(j ≥ i)
The independence property of the PP ⟹ (Nt, t ≥ 0) is a MC on state space {0, 1, ...} with transition matrix Pij(t) = e^{−λt} (λt)^{j−i}/(j − i)! · 1(j ≥ i).
Distinguish carefully two features of this process:
(1) The semi-group (P(s + t) = P(s)P(t)) of transition matrices P(t), t ≥ 0: the entries t → Pij(t) are very smooth (differentiable) functions of t.
[Figure: the smooth curves t ↦ Pii(t) = e^{−λt}, P_{i,i+1}(t), P_{i,i+2}(t).]
(2) The path of the process t → Nt is a step function.
[Figure: a sample step-function path of t ↦ Nt.]
Recall: Discrete time: P^{n+1} = P^n P = P P^n.
Continuous time: P(n + 1) = P(n)P(1) = P(1)P(n), and more generally P(t + h) = P(t)P(h) = P(h)P(t) with h small.
We should evaluate
d/dt P(t) := lim_{h→0} (P(t + h) − P(t))/h
We write this as a convenient abbreviation for d/dt Pij(t) = lim_{h→0} (Pij(t + h) − Pij(t))/h, for all i, j ∈ S.
d/dt P(t) = lim_{h→0} (P(t + h) − P(t))/h = lim_{h→0} P(t)(P(h) − I)/h = P(t) lim_{h→0} (P(h) − I)/h = P(t)A,
where A := lim_{h→0} (P(h) − I)/h = d/dt P(t)|_{t=0}
Note that d/dt P(t) = P(t)A = AP(t). These are the Kolmogorov forward and backward differential equations for a Markov semi-group. The matrix A is called the infinitesimal generator of the semi-group. It should be clear that A is rather important: if you know A, you can figure out P(t) by solving Kolmogorov's equations.
• Compute A for the PP:
A_{i,i} = −λ, A_{i,i+1} = λ, A_{i,i+2} = 0, A_{i,i+3} = 0, ...
A_{i,j} = −λ if j = i; λ if j = i + 1; 0 else
Note that
Σ_j Pij(t) = 1 ⟹ d/dt Σ_j Pij(t) = 0 ⟹ Σ_j d/dt Pij(t) = 0
Take t = 0 in this: Σ_j A_{ij} = 0.
Notice A_{ii} ≤ 0 for all i, for any transition semi-group P(t): since Pii(0) = 1 and Pii(t) ≤ 1, we get d/dt Pii(t)|_{t=0} ≤ 0.
Customary notations (see p. 396 of the text):
q_i := −A_{ii}
q_{ij} := A_{ij} ≥ 0 for all j ≠ i
q_i = Σ_{j≠i} q_{ij}
• Key idea: Start a continuous time chain in state i:
– it will hold there for an exponentially distributed sojourn time ∼ Exp(q_i), with mean 1/q_i;
– independent of the length of this sojourn time, when it leaves for the first time, it jumps to j with probability q_{ij}/q_i.
A =
[ −q_1   q_{12}  q_{13}  ⋯ ]
[ q_{21} −q_2    q_{23}  ⋯ ]
[ q_{31} q_{32}  −q_3    ⋯ ]
[   ⋮      ⋮       ⋱       ]
Now each row sums to 0: diagonal entries A_{ii} = −q_i ≤ 0, off-diagonal entries A_{ij} = q_{ij} ≥ 0, Σ_{j≠i} q_{ij} = q_i, so Σ_{j≠i} q_{ij}/q_i = 1.
Example: A Poisson process that starts in state i will
– hold in state i for Exp(λ) time, λ = q_i = −A_{ii}
– the only way to leave state i is by jumping to i + 1: q_{i,i+1}/q_i = λ/λ = 1
– for other states j, q_{ij}/q_i = 0/λ = 0
• Transition rate diagram: Mark the off-diagonal entries of A on arrows between states.
Two state MC: A = [ −λ  λ ; µ  −µ ]
!"2#$ ' % µ λ 1# ! & "$ ( (3)
Three state MC on a circle: Nt 0
λ
λ
2
1
λ General 3 state chain: 0 q01
q20 q02 2
q10 q21 q12
1
= Nt mod 3, Nt = PP(λ).
5
Lecture 21 : Continuous Time Markov Chains STAT 150 Spring 2006 Lecturer: Jim Pitman
Scribe: Stephen Bianchi
(These notes also include material from the subsequent guest lecture given by Ani Adhikari.)
Consider a continuous time stochastic process (Xt, t ≥ 0) taking values in the finite state space S = {0, 1, 2, ..., N}. Recall that in discrete time, given a transition matrix P = p(i, j) with i, j ∈ S, the n-step transition matrix is simply P^n = p^n(i, j), and the following relationship holds: P^n P^m = P^{n+m}, where p^n(i, j) is the probability that the process moves from state i to state j in n transitions. That is,
p^n(i, j) = Pi(Xn = j) = P(Xn = j | X0 = i) = P(X_{m+n} = j | Xm = i).
Now moving to continuous time, we say that the process (Xt, t ≥ 0) is a continuous time Markov chain if the following properties hold for all i, j ∈ S, t, s ≥ 0:
• Pt(i, j) ≥ 0,
• Σ_{j=0}^N Pt(i, j) = 1,
• P(X_{t+s} = j | X0 = i) = Σ_{k=0}^N P(X_{t+s} = j | Xs = k) P(Xs = k | X0 = i), or
P(X_{t+s} = j | X0 = i) = Σ_{k=0}^N Ps(i, k) Pt(k, j).
This last property can be written in matrix form as Ps+t = Ps Pt .
This is known as the Chapman-Kolmogorov equation (the semi-group property).
Example 21.1 (Poissonize a Discrete Time Markov Chain) Consider a Poisson process (Nt, t ≥ 0) with rate λ, and a jump chain with transition matrix P̂, which
makes jumps at the jump times of (Nt, t ≥ 0). Let Xt = Y_{Nt} (the value of the jump chain at time Nt), where Xt takes values in the discrete state space S. Assume (Y0, Y1, ...) and (Nt) are independent. Find Pt for the process Xt.
By definition Pt(i, j) = P(Xt = j | X0 = i). Condition on Nt to give
Pt(i, j) = Σ_{n=0}^∞ P(Nt = n) P̂^n(i, j)
= Σ_{n=0}^∞ (e^{−λt} (λt)^n / n!) P̂^n(i, j)
= e^{−λt} Σ_{n=0}^∞ ((λtP̂)^n / n!) (i, j)
= e^{−λt} e^{λtP̂}(i, j).
Dropping indices then gives
Pt = e^{λt(P̂ − I)}.
Pt is a paradigm example of a continuous time semi-group of transition matrices. It is easy to check that Pt Ps = P_{t+s}, since e^{A+B} = e^A e^B when A and B commute (here both exponents are multiples of P̂ − I).
21.1 Generalization
Let A be a matrix of transition rates, where q_{ij} = A(i, j) is the rate of transitions from state i to state j (i ≠ j), q_{ij} ≥ 0, and q_i = −A(i, i) is the total rate of transitions out of state i. Hence q_i = Σ_{j=0, j≠i}^N q_{ij}, and each row of A sums to 0. Also, since Pt is continuous and differentiable we have
q_{ij} = lim_{h→0} P_h(i, j)/h
q_i = lim_{h→0} (1 − P_h(i, i))/h.
In words, q_{ij} is the probability per unit time of transitioning from state i to state j in a small time increment h. The limit relations above can be expressed in matrix form:
d/dt Pt = lim_{h→0+} (P_{t+h} − Pt)/h = lim_{h→0+} (Pt P_h − Pt)/h = lim_{h→0+} Pt [P_h − I]/h
hence
d/dt Pt = Pt lim_{h→0+} (P_h − I)/h = Pt (d/dt Pt)|_{t=0} = Pt A = A Pt,
where A = (d/dt Pt)|_{t=0}.
Given the initial condition P0 = I, the solution to the equation
d/dt Pt = Pt A = A Pt
is Pt = e^{At}. Here d/dt Pt = Pt A is known as the Kolmogorov forward equations and d/dt Pt = A Pt is known as the Kolmogorov backward equations. We claim that (Pt, t ≥ 0) is a collection of transition matrices of some continuous time Markov chain. In the example given above we have
A = λ(P̂ − I), A(i, i) = λ(P̂(i, i) − 1), A(i, j) = λP̂(i, j).
Example 21.2 Consider the two state Markov chain with states {0, 1} and
A = [ −λ  λ ; µ  −µ ].
The sojourn times in states 0 and 1 are independent and exponentially distributed with parameters λ and µ, respectively. Find Pt for this chain.
First note that
d/dt Pt = A Pt =
[ −λPt(0,0) + λPt(1,0)    −λPt(0,1) + λPt(1,1) ]
[  µPt(0,0) − µPt(1,0)     µPt(0,1) − µPt(1,1) ]
Then
d/dt (Pt(0,0) − Pt(1,0)) = −(λ + µ)(Pt(0,0) − Pt(1,0)).
This equation has the form f′(t) = cf(t), and a solution is Pt(0,0) − Pt(1,0) = ce^{−(λ+µ)t}. Since the value at t = 0 is one (i.e., P0(0,0) − P0(1,0) = 1), we have Pt(0,0) − Pt(1,0) = e^{−(λ+µ)t}, which implies
d/dt Pt(0,0) = −λ(Pt(0,0) − Pt(1,0)) = −λe^{−(λ+µ)t}.
Then
Pt(0,0) = P0(0,0) + ∫_0^t (−λe^{−(λ+µ)s}) ds
= 1 + (λ/(λ+µ)) e^{−(λ+µ)t} − λ/(λ+µ)
= µ/(λ+µ) + (λ/(λ+µ)) e^{−(λ+µ)t}.
Likewise
Pt(1,0) = µ/(λ+µ) − (µ/(λ+µ)) e^{−(λ+µ)t}
Pt(0,1) = λ/(λ+µ) − (λ/(λ+µ)) e^{−(λ+µ)t}
Pt(1,1) = λ/(λ+µ) + (µ/(λ+µ)) e^{−(λ+µ)t}.
Notice that as t −→ ∞
Pt(0,0) = Pt(1,0) → µ/(λ+µ)
Pt(0,1) = Pt(1,1) → λ/(λ+µ).
These limits are independent of the initial state. The stationary distribution of the chain is
π0 = µ/(λ+µ), π1 = λ/(λ+µ).
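A minimal numerical check of this closed form against Pt = e^{At}, assuming scipy is available for the matrix exponential:

import numpy as np
from scipy.linalg import expm

lam, mu, t = 2.0, 3.0, 0.7
A = np.array([[-lam, lam],
              [mu, -mu]])

Pt = expm(A * t)   # solution of dP/dt = AP with P_0 = I

s = lam + mu
closed = np.array([
    [mu/s + (lam/s)*np.exp(-s*t), lam/s - (lam/s)*np.exp(-s*t)],
    [mu/s - (mu/s)*np.exp(-s*t),  lam/s + (mu/s)*np.exp(-s*t)],
])

print(np.allclose(Pt, closed))   # True: matches the formulas above
print(Pt)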
21.2 Jump Hold Description
Let (Xt, t ≥ 0) be a Markov chain with transition rate matrix A, such that A(i, j) ≥ 0 for i ≠ j and A(i, i) = −Σ_{j≠i} A(i, j). Let Hi be the holding time in state i (i.e., the process "holds" in state i for an amount of time Hi and then jumps to another state). The Markov property implies Pi(Hi > s + t | Hi > s) = Pi(Hi > t). We also recognize this as the memoryless property of the exponential distribution (fact: the only continuous distribution with the memoryless property is the exponential).
By the idea of competing risks we will see that Pi(Hi > t) = e^{A(i,i)t}. Imagine that each state j ≠ i has a random variable ξ_j ∼ Exp(A(i, j)). Think of a bell that rings for state j at time ξ_j, and from state i the process moves to the state whose bell rings first. Construct Hi = min_{j≠i} ξ_j; then
Pi(Hi > t) = P(∩_{j≠i} {ξ_j > t}) = Π_{j≠i} e^{−A(i,j)t} = e^{−(Σ_{j≠i} A(i,j))t} = e^{A(i,i)t},
where A(i, i) < 0.
What is the probability that the first transition out of state i is to state k? This is given by
P(min_{j≠i} ξ_j = ξ_k) = ∫_0^∞ P(ξ_k ∈ dt, all the others are > t)
= ∫_0^∞ A(i, k)e^{−A(i,k)t} e^{−(Σ_{j≠i,k} A(i,j))t} dt
= ∫_0^∞ A(i, k)e^{−(Σ_{j≠i} A(i,j))t} dt
= A(i, k)/Σ_{j≠i} A(i, j) = A(i, k)/(−A(i, i)).
Observe from this calculation that the following mechanisms are equivalent:
1. (a) competing random variables ξ_j ∼ Exp(A(i, j)) for each state j ≠ i
   (b) Hi = min_{j≠i} ξ_j
   (c) jump to the state j which attains the minimum
2. (a) Hi ∼ Exp(−A(i, i)) = Exp(Σ_{j≠i} A(i, j))
   (b) at time Hi jump to k with probability A(i, k)/(−A(i, i))
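The second mechanism is exactly how one simulates a continuous time chain in practice. A sketch of such a jump-hold simulator, with the two-state chain from Example 21.2 as an illustrative test (numpy assumed):

import numpy as np

rng = np.random.default_rng(5)

def simulate_ctmc(A, x0, t_end):
    """Jump-hold simulation: hold Exp(q_i), then jump to k w.p. A(i,k)/q_i."""
    t, x, path = 0.0, x0, [(0.0, x0)]
    while True:
        qi = -A[x, x]                     # total rate out of x
        t += rng.exponential(1 / qi)      # holding time ~ Exp(q_i)
        if t > t_end:
            return path
        probs = A[x].copy()
        probs[x] = 0.0
        x = rng.choice(len(probs), p=probs / qi)
        path.append((t, x))

lam, mu, t_end = 2.0, 3.0, 50_000.0
A = np.array([[-lam, lam], [mu, -mu]])
path = simulate_ctmc(A, 0, t_end)
times = np.array([t for t, _ in path] + [t_end])
states = np.array([x for _, x in path])
frac0 = np.sum(np.diff(times)[states == 0]) / t_end
print("long-run fraction of time in 0:", frac0, "vs pi_0 =", mu / (lam + mu))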
Example 21.3 (Hold Rates for a Poissonized Chain) Consider a general transition matrix P̂, where some of the diagonal entries may be greater than zero. So we have
Pt = e^{λt(P̂ − I)}
A = λ(P̂ − I)
A(i, i) = λ(P̂(i, i) − 1) = −λ(1 − P̂(i, i))
A(i, j) = λP̂(i, j).
Hence q_i = −A(i, i) = λ(1 − P̂(i, i)) = λΣ_{j≠i} P̂(i, j) = Σ_{j≠i} A(i, j) is the total rate of transitions out of state i, and A(i, j) = λP̂(i, j) is the rate of i to j transitions.
Let Hi = S1 + S2 + ⋯ + SG, where S1, S2, ... ∼ Exp(λ) are independent. Define G to be the number of steps of the P̂ chain until the process leaves state i. Then P(G = n) = P̂(i, i)^{n−1}(1 − P̂(i, i)). So G has a geometric distribution with parameter 1 − P̂(i, i), and Hi has an exponential distribution with parameter λ(1 − P̂(i, i)). It is easy to check the expectations, since E[Si] = 1/λ and E[G] = 1/(1 − P̂(i, i)); by Wald's identity
E[Hi] = E[S1]E[G] = (1/λ) · 1/(1 − P̂(i, i)) = 1/(λ(1 − P̂(i, i))).
21.3 Theorem
Theorem 21.4 (for a finite state irreducible chain) There is a unique probability distribution π so that
1. lim_{t→∞} Pt(i, j) = πj for all i.
2. π solves the "balance equations" πA = 0 (equivalently, πPt = π for all t).
3. πj is the long-run proportion of time spent in state j.
4. the expected return time is Ej[Tj] = 1/(q_j πj).
Proof: [for part (2)] Assume (1) is true.
P_{s+t}(i, j) = Σ_k Ps(i, k) Pt(k, j)
Letting s → ∞ we get
πj = Σ_k πk Pt(k, j)
Therefore, π = πPt for all t. Differentiating with respect to t then gives 0 = πA.
Look at the jth element of πA (which we get by multiplying π by the jth column of A). We see that
Σ_{i≠j} πi q_{ij} − πj q_j = 0
which gives
Σ_{i≠j} πi q_{ij} = πj q_j.
This is the reason these equations are referred to as balance equations. Computations can often be simplified if there is also detailed balance, which simply means that for all pairs i, j (i ≠ j) we have πi q_{ij} = πj q_{ji}. This is the condition of reversibility of the Markov chain.
Example 21.5 (Birth-Death Process) Consider a process that, from state i, can only move to state i + 1 or state i − 1. That is
q_{i,i+1} = λ_i, q_{i,i−1} = µ_i, q_{i,j} = 0 for all j ∉ {i + 1, i − 1}.
So λ_i is the birth rate and µ_i is the death rate. We try to find π so that
π0 q_{0,1} = π1 q_{1,0}
π1 q_{1,2} = π2 q_{2,1}
...
π_{i−1} q_{i−1,i} = πi q_{i,i−1}.
Then π0 λ0 = π1 µ1, so π1 = (λ0/µ1) π0. And π1 λ1 = π2 µ2, so π2 = (λ1/µ2) π1 = (λ0 λ1/(µ1 µ2)) π0.
Hence
πk = (λ0 λ1 ⋯ λ_{k−1} / (µ1 µ2 ⋯ µk)) π0,
and
π0 = 1 / (1 + Σ_{k=1}^∞ λ0 λ1 ⋯ λ_{k−1} / (µ1 µ2 ⋯ µk)).
Stat 150 Stochastic Processes
Spring 2009
Lecture 24: Queuing Models Lecturer: Jim Pitman
• Queueing Models
The simplest setup is a single server with some service rule: customers arrive, each takes a random time to process, and then the customer departs.
[Figure: arrivals queueing in front of a single server.]
Generically, X(t) = # of customers in the system (queue) at time t, and we model the evolution of X.
• M/M/1
– M = Markov: the first M is for the input stream, the second M for the service process
– 1 = # of servers
– Poisson arrivals at rate λ
– Independent exponential service times with rate µ: each customer that is served takes Exp(µ) time
– The service times T1, T2, ... are iid Exp(µ) random variables independent of the arrival stream
[Transition rate diagram: birth-death chain on {0, 1, 2, ...} with up-rate λ and down-rate µ from each state n ≥ 1.]
q_{ij} = rate of transition from i to j = off-diagonal elements of the A matrix
q_0 = λ = −A_{00}, q_{01} = λ, q_{0j} = 0 for all j ≠ 0, 1
q_1 = λ + µ = −A_{11}, q_{10} = µ, q_{12} = λ, q_{1j} = 0 for all j ≠ 0, 1, 2
• Condition for stability of the queue: λ < µ.
λ = µ is exceptional (null-recurrent chain). λ > µ gives linear growth (like a RW with upward drift: transient chain).
Stability means lim_{t→∞} P(Xt = j) = πj > 0 with Σ_j πj = 1.
We can calculate π. How? πPt = π for all t. Take d/dt and evaluate at 0 ⟹ πA = 0. (Note: we did not try to solve differential equations for Pt; in the M/M/1 queue Pt(i, j) is a nasty power series involving Bessel functions.)
Use q_i = −A_{ii}, q_{ij} = A_{ij} for i ≠ j, and Σ_i πi A_{ij} = 0 for all j. We get, for all j,
Σ_{i≠j} πi q_{ij} − πj q_j = 0 ⟹ Σ_{i≠j} πi q_{ij} = πj q_j
That is, at equilibrium, rate into j = rate out of j.
Fact: For a birth/death chain on the integers, the only possible equilibrium is reversible. So always look for a solution of the reversible equilibrium (detailed balance) equations:
π0 λ = π1 µ
π1 λ = π2 µ
...
⟹ πn/π_{n−1} = λ/µ ⟹ πn = π0 (λ/µ)^n
Use Σ_{n=0}^∞ πn = 1 to get
πn = (λ/µ)^n (1 − λ/µ), n = 0, 1, 2, ...
Check p. 549, (2.9), in the text. In the limit, the mean queue length is (λ/µ)/(1 − λ/µ) = λ/(µ − λ). If λ is close to µ, this is large. Fix λ; as µ ↓ λ, λ/(µ − λ) ↑ ∞.
• M/M/∞
– Input rate λ
– Unlimited number of servers, each operating at rate µ on one customer at a time
[Transition rate diagram: up-rate λ from each state n; down-rate nµ from state n.]
Use the fact that the minimum of independent Exp(µ1), Exp(µ2), ..., Exp(µn) variables is Exp(µ1 + µ2 + ⋯ + µn). Condition for stability? No matter what λ > 0, µ > 0, the system is stable. To see this, establish a reversible equilibrium probability distribution:
π0 λ = π1 µ
π1 λ = π2 · 2µ
π2 λ = π3 · 3µ
...
⟹ πn/π_{n−1} = λ/(nµ) ⟹ πn = π0 (λ/µ)^n (1/n!)
So π0 = e^{−λ/µ}. The equilibrium probability distribution is Poisson(λ/µ) and the mean queue length is λ/µ.
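Both stationary distributions are easy to verify by a jump-hold simulation. A sketch (numpy assumed; the rates are arbitrary choices with λ < µ for the M/M/1 case):

import numpy as np

rng = np.random.default_rng(6)

def occupancy(lam, mu_of_n, t_end):
    """Simulate a birth-death queue; return time-average occupancy per state."""
    t, n, occ = 0.0, 0, {}
    while t < t_end:
        up, down = lam, mu_of_n(n)
        rate = up + down
        hold = rng.exponential(1 / rate)
        occ[n] = occ.get(n, 0.0) + min(hold, t_end - t)
        t += hold
        n += 1 if rng.uniform() < up / rate else -1
    total = sum(occ.values())
    return {k: v / total for k, v in sorted(occ.items())}

lam, mu = 1.0, 2.0

# M/M/1: pi_0 should be 1 - lam/mu.
emp = occupancy(lam, lambda n: mu if n > 0 else 0.0, 200_000.0)
print("M/M/1   pi_0:", emp[0], "vs", 1 - lam / mu)

# M/M/inf: pi should be Poisson(lam/mu), so pi_0 = exp(-lam/mu).
emp = occupancy(lam, lambda n: n * mu, 200_000.0)
print("M/M/inf pi_0:", emp[0], "vs", np.exp(-lam / mu))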
Stat 150 Stochastic Processes
Spring 2009
Lecture 25: Renewal Theory Lecturer: Jim Pitman
• Renewal Process
– Model for points on the real line.
– Typically interpreted as times that something happens, e.g.
∗ arrival in a queue
∗ departure in a queue
∗ end of a busy cycle of a queue
∗ time that a component is replaced
• Many examples arise from more complex models where there is an increasing sequence of random times of interest, say 0 < W1 < W2 < ⋯, with Wn = X1 + X2 + ⋯ + Xn, where
– the Xi are strictly positive random variables (perhaps discrete, perhaps with a density)
– the Xi are independent and identically distributed copies of X := X1.
Language: Events/Arrivals/... are called Renewals.
Generic image: Lightbulb replacements:
– Xn = lifetime of the nth bulb
– Install a new bulb at time 0
– Leave it until it burns out; replace with a new bulb immediately
– Wn = time of replacement of the nth bulb.
Notation ready to roll from the PP setup. Introduce the counting process:
N(t) := # of renewals up to and including time t = largest n such that Wn ≤ t
Note N(0) := 0. The path t ↦ N(t) is by construction a right continuous non-decreasing step function, which jumps up by 1 at each of the times Wn for n ≥ 1. The sequence of renewal times (Wn) and the counting process N(t) are inverses of each other. So for instance:
(N(t) < n) = (Wn > t)
(N(t) ≥ n) = (Wn ≤ t)
which allows the distribution of N(t) for each t to be derived from knowledge of the distribution of Wn for each n, and vice versa. In principle this generalizes the relation between Poisson counts and exponential inter-arrival times. In practice, these distributions are not computationally tractable except in a small number of cases. But limiting results are available in great generality.
• Scope of theory
Interested in limit behaviours of various quantities/random variables. M(t) := E[N(t)]. How does M(t) behave for large t?
Example: If P(X > t) = e^{−λt} for λ > 0, then N(t) ∼ Poisson(λt). M(t) = E[N(t)] = λt, linear growth with t. Notice that λ = expected rate of renewals per unit time, which is realized also as a long run rate of arrivals per unit time. Recall E[X] = 1/λ:
one renewal per time 1/λ ←→ expect λ renewals per unit time
In general, we have the LLN: after a large number n of replacements, the time used up will be around E(Wn) = nE(X).
"
100 EX
Slope = 1/EX
u↑
→ Wn
100 EX
! t
By the LLN, EN(t) ∼ t/EX, where g(t) ∼ f(t) means lim_{t→∞} f(t)/g(t) = 1. This is called asymptotic equivalence.
Theorem: If EX < ∞, then lim_{t→∞} EN(t)/(t/EX) = 1. Usual notation: EN(t) ∼ t/EX.
Equivalently, E[N(t)/t] → 1/EX. This form is also true if EX = ∞, with 1/∞ = 0.
Finer result: If you avoid the discrete (lattice) case — meaning there is no ∆ > 0 with P(X is a multiple of ∆) = 1 (that case is handled by discrete renewal theory) — then M(t) := EN(t) satisfies M(t + h) − M(t) → h/EX for each fixed h > 0. This is D. Blackwell's renewal theorem.
• Now look at the residual life and age processes:
— residual life/excess life = γt
— age/current life = δt
— βt := γt + δt = total life of the component in use at time t
[Figure: renewal times on the time axis; at time t the age δt reaches back to the last renewal, the residual life γt reaches forward to the next renewal, and βt = δt + γt is the total life of the current component.]
• Example: Poisson process
– Excess life γt ∼ Exp(λ) for every t, by the memoryless property of exponential variables.
– Age δt. Notice 0 ≤ δt ≤ t; also (δt > s) ≡ (N(t) − N(t − s) = 0), where N(t) − N(t − s) ∼ Poisson(λs), provided 0 < s < t. So P(δt > s) = e^{−λs} for 0 ≤ s < t, and P(δt > t) = 0.
Clever representation: δt =d min(X, t), where X ∼ Exp(λ); so δt →d X ∼ Exp(λ) as t → ∞.
Notice: from the Poisson assumptions, γt and δt are independent.
γt =d X, δt →d X, and γt, δt independent ⟹ βt := γt + δt →d X1 + X2 = W2
where we know W2 ∼ Gamma(2, λ) has the Gamma(2, λ) density
f_{W2}(x) = (1/Γ(2)) λ² x e^{−λx} = x f_X(x)/E(X)
• Remarkably, this structure of size-biasing the lifetime density f_X(x) by a factor of x is completely general.
Theorem: Whatever the distribution of X with a density f_X(x), the limit distribution of βt as t → ∞ has density x f_X(x)/EX. This is a density because:
∫_0^∞ (x f_X(x)/EX) dx = EX/EX = 1
More:
(γt, δt) →d (UY, (1 − U)Y)
where P(Y ∈ dx) = x f_X(x)/EX dx and U is uniform on [0,1], independent of Y.
Exercise: Show this implies the limit density of γt at x is P(X > x)/EX, and that the same is true for δt. Note that this function of x is a probability density because
∫_0^∞ P(X > x) dx = ∫_0^∞ E[1(X > x)] dx = E ∫_0^∞ 1(X > x) dx = EX.
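The size-biasing (the "inspection paradox") is easy to see numerically: sample the total life βt of the component in use at a large fixed time t and compare its mean with EX and with the size-biased mean EX²/EX. A sketch, with Gamma lifetimes as an arbitrary choice (numpy assumed):

import numpy as np

rng = np.random.default_rng(7)
t, reps = 1000.0, 20_000
betas = []

for _ in range(reps):
    # Lifetimes X_i ~ Gamma(shape=2, scale=1): EX = 2, EX^2 = 6.
    x = rng.gamma(2.0, 1.0, size=2000)        # enough to cover time t
    w = np.cumsum(x)
    k = np.searchsorted(w, t)                 # renewal interval covering t
    betas.append(x[k])

print("mean of beta_t       :", np.mean(betas))
print("size-biased EX^2/EX  :", 6.0 / 2.0)    # = 3, versus EX = 2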
Stat 150 Stochastic Processes
Spring 2009
Lecture 26: Brownian Motion Lecturer: Jim Pitman
• Brownian Motion Idea: Some kind of continuous limit of random walk.
[Figure: lattice of possible walk positions on [0, 1], with N steps per unit time, step size ∆, time increment 1/N.]
Consider a random walk with independent ±∆ steps for some ∆, with probability 1/2 up and 1/2 down, and N steps per unit time. Select ∆ = ∆N so the value of the walk at time 1 has a limit distribution as N → ∞.
Background: Coin tossing walk with independent ±1 steps with probability 1/2 up, 1/2 down:
SN = X1 + ⋯ + XN, E SN = 0, E SN² = N
If instead we add ±∆N steps, we get SN ∆N and
E(SN ∆N)² = E(SN²) ∆N² = N ∆N² ≡ 1 if we take ∆N = 1/√N.
The value of our process at time 1 with this scaling is then
SN/√N →d Normal(0, 1) as N → ∞
according to the normal approximation to the binomial distribution. The same is true for any distribution of Xi with mean 0 and variance 1, by the Central Limit Theorem.
[Figure: the N-step scaled walk on [0, 1]; its value at time 1 is approximately N(0, 1).]
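The diffusive scaling is easy to see empirically: simulate many ±1 walks, scale by 1/√N, and compare with the standard normal. A small sketch (numpy assumed):

import numpy as np

rng = np.random.default_rng(8)
N, reps = 1000, 20_000

# Coin-tossing walks: S_N / sqrt(N) should be approximately N(0, 1).
steps = 2 * rng.integers(0, 2, size=(reps, N), dtype=np.int8) - 1
scaled = steps.sum(axis=1, dtype=np.int64) / np.sqrt(N)

print("mean:", scaled.mean())                       # ~ 0
print("var :", scaled.var())                        # ~ 1
print("P(scaled <= 1):", np.mean(scaled <= 1.0))    # ~ Phi(1) = 0.8413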
The scaled process at time t has value
S_⌊tN⌋/√N = (S_⌊tN⌋/√⌊tN⌋) · √(⌊tN⌋/N)
Fix 0 < t < 1 and let N → ∞:
S_⌊tN⌋/√⌊tN⌋ →d N(0, 1) and √(⌊tN⌋/N) → √t
So
S_⌊tN⌋/√N →d Normal(0, t)
the normal distribution with mean 0, variance t, and standard deviation √t. In the distributional limit as N → ∞, we pick up a process (Bt, t ≥ 0) with some nice properties:
– Bt ∼ Normal(0, t), EBt = 0, EBt² = t.
– For 0 < s < t, Bs and Bt − Bs are independent.
Bt = Bs + (Bt − Bs) ⟹ Bt − Bs ∼ Normal(0, t − s)
(Bt | Bs) ∼ Normal(Bs, t − s)
For times 0 < t1 < t2 < ⋯ < tn, the increments B_{t1}, B_{t2} − B_{t1}, ..., B_{tn} − B_{t_{n−1}} are independent Normal(0, ti − t_{i−1}) random variables for 1 ≤ i ≤ n, with t0 = 0. This defines the finite dimensional distributions of a stochastic process (Bt, t ≥ 0) called a Standard Brownian Motion or a Wiener Process.
• Theorem: (Norbert Wiener) It is possible to construct such a Brownian motion on a probability space (Ω, F, P), where P is a countably additive probability measure, and the path t → Bt(w) is continuous for all w ∈ Ω. More formally, we can take Ω = C[0, ∞) = continuous functions from [0, ∞) → R. Then for a function (also called a path) w = (w(t), t ≥ 0) ∈ Ω, the value of
Bt(w) is simply w(t). This is the canonical path space viewpoint. This way P is a probability measure on the space Ω of continuous functions. This P is called Wiener measure.
Note: It is customary to use the term Brownian motion only if the paths of B are continuous with probability one.
• Context: Brownian motion is an example of a Gaussian process (Gaussian = Normal). For an arbitrary index set I, a stochastic process (Xt, t ∈ I) is called Gaussian if
– for every selection of t1, t2, ..., tn ∈ I, the joint distribution of X_{t1}, X_{t2}, ..., X_{tn} is multivariate normal.
– Here multivariate normal
⇐⇒ Σ_i ai X_{ti} has a one-dimensional normal distribution for whatever choice of a1, ..., an
⇐⇒ X_{t1}, ..., X_{tn} can be constructed as n (typically different) linear combinations of n independent normal variables.
Easy fact: The FDDs of a Gaussian process are completely determined by its mean and covariance functions
µ(t) := E(Xt), σ(s, t) := E[(Xs − µ(s))(Xt − µ(t))]
Here the function σ is non-negative definite, meaning it satisfies the system of inequalities implied by Var(Σ_i ai X_{ti}) ≥ 0, whatever the choice of real numbers ai and indices ti.
Key consequences: If we have a process B̂ with the same mean and covariance function as a BM B, then the finite-dimensional distributions of B̂ and B are identical. If moreover the paths of B̂ are continuous, then B̂ is a BM.
Example: Start with (Bt, t ≥ 0) = BM, fix a number v ≥ 0, and look at (B_{v+t} − Bv, t ≥ 0). This is a new BM, independent of (Bs, 0 ≤ s < v).
[Figure: a BM path; after time v the increments (B_{v+t} − Bv, t ≥ 0) form a fresh BM.]
• Time inversion
Let B̂t := tB(1/t) for t > 0, and B̂0 := 0. Check that B̂ is a BM. Note first that B̂ is also a Gaussian process.
(1) Continuity of paths. Away from t = 0, this is clear. At t = 0, does tB(1/t) → 0 as t → 0? Compute Var(tB(1/t)) = t² Var(B(1/t)) = t² · (1/t) = t → 0. With some more care, it is possible to establish path continuity at 0 (convergence with probability one).
(2) Mean and covariances.
E(tB(1/t)) = tE(B(1/t)) = 0 = E(Bt)
Covariances: for 0 < s < t,
E(Bs Bt) = E(Bs(Bs + Bt − Bs)) = E(Bs²) + E(Bs(Bt − Bs)) = s + 0 = s
E(B̂s B̂t) = E(sB(1/s) · tB(1/t)) = st E(B(1/s)B(1/t)) = st · (1/t) = s, since 1/t < 1/s
• Example: Distribution of the maximum on [0, t]: Mt := sup_{0≤s≤t} Bs
Fact: Mt =d |Bt|, and P(Mt > x) = 2P(Bt > x), so
P(Mt ∈ dx) = (2/√(2πt)) e^{−x²/(2t)} dx, x > 0
[Figure: a BM path on [0, t] and its reflection after the first hitting time of level x.]
Sketch of proof: We have B =d −B. Then
P(Mt > x, Bt < x) = P(Mt > x, Bt > x) = P(Bt > x)
by a reflection argument. This uses B =d −B applied to (B(Tx + v) − x, v ≥ 0) instead of B (strong Markov property), where Tx is the first hitting time of x by B. Add the two terms and use P(Bt = x) = 0 to get P(Mt > x) = 2P(Bt > x).
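A discrete-walk simulation makes the reflection identity plausible: compare the running maximum of a scaled coin-tossing walk with |B1|. A sketch (numpy assumed):

import numpy as np

rng = np.random.default_rng(9)
N, reps = 1000, 20_000

steps = 2 * rng.integers(0, 2, size=(reps, N), dtype=np.int8) - 1
paths = np.cumsum(steps, axis=1, dtype=np.int64) / np.sqrt(N)

M = paths.max(axis=1)          # running maximum, approximates M_1
absB = np.abs(paths[:, -1])    # |B_1|

# M_1 and |B_1| should have (approximately) the same distribution.
for x in (0.5, 1.0, 1.5):
    print(x, np.mean(M > x), np.mean(absB > x))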
Stat 150 Stochastic Processes
Spring 2009
Lecture 27: Hitting Probabilities for Brownian Motion Lecturer: Jim Pitman
• The French mathematician Paul Lévy first showed that several random variables have the same distribution for a Brownian path:
(1) g1 := time of the last zero before 1 (see Problem 2.5 on page 497 of the text, where g1 = τ0)
(2) A+(1) = ∫_0^1 1(Bt > 0) dt, the amount of time spent > 0 up to time 1
(3) the time when the maximum of B on [0,1] is attained
(4) the time when the minimum of B on [0,1] is attained
All of these variables have the Beta(1/2, 1/2) distribution, also called the Arcsine Law on [0,1], with density at u ∈ [0, 1]
1/(π √u √(1 − u))
Surprising (at least for g1, but not for the others):
– symmetry about 1/2
– U shape
• Hitting probability for BM
First discuss BM started at x. This is x + Bt, t ≥ 0, for (Bt) the standard BM as before. Let Px govern BM started at x; then Px, x ∈ R fit together as the distributions of a Markov process indexed by its starting state x.
Problem: Look at BM started at a, 0 < a < b, run until T_{0,b}, when B first hits 0 or b. Then
Pa(B(T_{0,b}) = b) = Pa(B hits b before 0) = a/b.
[Figure: BM started at a, run until T_{0,b}, the first time it hits 0 or b.]
Idea: BM is the continuous limit of RW. We know this formula for the ±1, 1/2 up, 1/2 down RW, for a, b integers, 0 < a < b.
Look at the BM case for a and b integers. Easy: a = 1, b = 2. Recall hitting +1 before −1 has the same probability as hitting −1 before +1, so each is 1/2. Easy analysis shows P1(B(T_{0,3}) = 3) = 1/3.
Key Idea: After it first reaches 2, the BM starts afresh as if it starts at 2.
– There is a coin tossing walk embedded in the BM.
– Exploit this to derive the a/b formula for BM.
Next, suppose a, b are rational numbers. Work on multiples of c where a/c and b/c are integers. Then (a/c)/(b/c) = a/b.
Use the fact that the rational numbers are dense in the reals, and the obvious monotonicity in a and b, to get the result for real a, b.
• Martingales
Definition: (Mt, t ≥ 0) is called a martingale (MG) if
E(Mt | Ms, 0 ≤ s ≤ u) = Mu for 0 < u < t
or equivalently
E[Mt g(Ms, 0 ≤ s ≤ u)] = E[Mu g(Ms, 0 ≤ s ≤ u)] for 0 < u < t and all bounded measurable functions g.
Notice: (Bt, t ≥ 0) is a MG. Given Bs, 0 ≤ s ≤ u, Bt ∼ N(Bu, t − u) ⟹
E(Bt | Bs, 0 ≤ s ≤ u) = Bu, Var(Bt | Bs, 0 ≤ s ≤ u) = t − u
Key fact: Define the idea of a stopping time τ for (Mt): (τ ≤ t) is determined by (Ms, 0 ≤ s ≤ t) for all t ≥ 0.
Stopped process: M_{t∧τ} := Mt if t ≤ τ, Mτ if t > τ.
Theorem: If M is a MG with continuous paths and τ is a stopping time for M, then (M_{t∧τ}, t ≥ 0) is also a MG.
• Application: Take Mt = a + Bt for B a standard BM, and T := T_{0,b} = first t such that Mt = 0 or b. Then E[M_{t∧T}] ≡ a, so
a = E[M_{t∧T}] = 0 · P(M_T = 0, T ≤ t) + b · P(M_T = b, T ≤ t) + E(Mt 1(T > t))
As t → ∞ the middle term → b P(M_T = b) and the last term → 0
⟹ a = b · P(M_T = b) ⟹ P(M_T = b) = a/b as claimed.
• Introduce a drift: Look at Dt, a drifting BM started at a > 0: Dt = a + δt + Bt. Find Pa(D hits b before 0).
Parallel to the p ↑ q ↓ walk Sn, where we found a suitable MG: Mn := (q/p)^{Sn}.
Key idea: Look at
Mn = r^{Sn}/E(r^{Sn})
which works for any r. The good choice r = q/p makes E(r^{Sn}) = 1 for S0 = 0.
For continuous time/space, let r = e^θ; then look for the continuous analog with B instead of Sn:
Mt^{(θ)} = e^{θBt}/E(e^{θBt})
First,
E(e^{θBt}) = ∫_{−∞}^∞ e^{θx} (1/√(2πt)) e^{−x²/(2t)} dx = e^{θ²t/2}
Then
E[M_{t+s}^{(θ)} | Bu, 0 ≤ u ≤ t] = E(e^{θB_{t+s}} | Bu, 0 ≤ u ≤ t)/E(e^{θB_{t+s}})
= E(e^{θ(B_{t+s} − Bt)} e^{θBt} | Bu, 0 ≤ u ≤ t)/e^{θ²(t+s)/2}
= e^{θ²s/2} e^{θBt}/(e^{θ²t/2} e^{θ²s/2})
= Mt^{(θ)}
We have this for all θ!
Return to Dt = a + δt + Bt. Look at e^{θDt} = e^{θa} e^{θδt} e^{θBt}. Want θδ = −θ²/2 ⇔ θ = −2δ; then e^{θDt} = e^{θa} Mt^{(θ)}.
Conclusion: If Dt is a drifting BM, then (e^{−2δDt}, t ≥ 0) is a MG. The analog of Sn is the p ↑ q ↓ walk, and (q/p)^{Sn} is a MG.
Exercise: Use this MG to solve the problem of Pa(D hits b before 0).
Stat 150 Stochastic Processes
Spring 2009
Lecture 28: Brownian Bridge Lecturer: Jim Pitman
• From last time: Problem: Find the hitting probability for BM with drift δ, Dt := a + δt + Bt: find Pa(D hits b before 0).
Idea: Find a suitable MG. Found: e^{−2δDt} is a MG. Use this: Under Pa, start at a, so Ea e^{−2δD0} = e^{−2δa}. Let T = first time Dt hits 0 or b. Stop the MG Mt = e^{−2δDt} at time T and look at (M_{t∧T}, t ≥ 0). For simplicity take δ > 0; then 1 ≥ e^{−2δD_{t∧T}} ≥ e^{−2δb} > 0, so stopping at T gives a bounded MG with values in [0, 1]. The final value of the MG is
M_T = 1 · 1(hit 0 before b) + e^{−2δb} · 1(hit b before 0)
and P(hit 0) + P(hit b) = 1, so taking expectations,
e^{−2δa} = 1 · P(hit 0) + e^{−2δb} · P(hit b)
⟹ P(hit b) = (1 − e^{−2δa})/(1 − e^{−2δb})
Equivalently, starting at 0, P(Bt + δt ever reaches −a) = e^{−2δa}. Let M := −min_t(Bt + δt); then P(M ≥ a) = e^{−2δa} ⟹ M ∼ Exp(2δ). Note the memoryless property: P(M > a + b) = P(M > a)P(M > b). This can be understood in terms of the strong Markov property of the drifting BM at its first hitting time of −a: to get below −(a + b) the process must first get down to −a, and the process thereafter must get down a further amount b.
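These hitting probabilities are easy to check with a crude Euler discretization of the drifting BM. A sketch (numpy assumed; step size and parameters are arbitrary choices):

import numpy as np

rng = np.random.default_rng(10)
a, b, delta = 1.0, 3.0, 0.5
dt, reps = 1e-3, 2_000
hits_b = 0

for _ in range(reps):
    x = a
    while 0.0 < x < b:
        # Euler step for D_t = a + delta*t + B_t; the discretization
        # slightly biases the estimate, and a smaller dt reduces the bias.
        x += delta * dt + np.sqrt(dt) * rng.standard_normal()
    hits_b += x >= b

theory = (1 - np.exp(-2 * delta * a)) / (1 - np.exp(-2 * delta * b))
print("simulated P(hit b before 0):", hits_b / reps, " theory:", theory)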
• Brownian Bridge
This is a process obtained by conditioning on the value of B at a fixed time, say time 1. Look at (Bt, 0 ≤ t ≤ 1 | B1), where B0 = 0.
Key observation: E(Bt | B1) = tB1.
[Figure: a BM path from B0 = 0 to B1, with the chord tB1.]
To see this, write Bt = tB1 + (Bt − tB1), and observe that tB1 and Bt − tB1 are independent. This independence is due to the fact that B is a Gaussian process, so two linear combinations of values of B at fixed times are independent if and only if their covariance is zero, and we can compute:
E(B1(Bt − tB1)) = E(Bt B1 − tB1²) = t − t · 1 = 0
since E(Bs Bt) = s for s < t. Continuing in this vein, let us argue that the process (Bt − tB1, 0 ≤ t ≤ 1) and the random variable B1 are independent.
Proof: Let B̂(ti) := B_{ti} − ti B1, and take any linear combination:
E((Σ_i αi B̂(ti)) B1) = Σ_i αi E(B̂(ti) B1) = Σ_i αi · 0 = 0
This shows that every linear combination of values of B̂ at fixed times ti ∈ [0, 1] is independent of B1. It follows by measure theory (proof beyond the scope of this course) that B1 is independent of every finite-dimensional vector (B̂(ti), 0 ≤ t1 < ⋯ < tn ≤ 1), and hence that B1 is independent of the whole process (B̂(t), 0 ≤ t ≤ 1). This process B̂(t) := (Bt − tB1, 0 ≤ t ≤ 1) is called the Brownian bridge.
Notice: B̂(0) = B̂(1) = 0. Intuitively, B̂ is B conditioned on B1 = 0. Moreover, B conditioned on B1 = b is (B̂(t) + tb, 0 ≤ t ≤ 1). These assertions about conditioning on events of probability zero can be made rigorous either in terms of limiting statements about conditioning on e.g. B1 ∈ (b, b + ∆) and letting ∆ tend to 0, or by integration, similar to the interpretation of conditional distributions and densities.
• Properties of the bridge B̂.
(1) Continuous paths.
(2) Σ_i αi B̂(ti) = Σ_i αi (B(ti) − ti B(1)) = Σ_i αi B(ti) − (Σ_i αi ti) B(1) = some linear combination of B(·)
⟹ B̂ is a Gaussian process. As such it is determined by its mean and covariance functions: E(B̂(t)) = 0, and for 0 ≤ s < t ≤ 1,
E(B̂(s)B̂(t)) = E[(Bs − sB1)(Bt − tB1)]
= E(Bs Bt) − sE(Bt B1) − tE(Bs B1) + st E(B1²)
= s − st − st + st = s(1 − t)
Note the symmetry between s and 1 − t ⟹ (B̂(1 − t), 0 ≤ t ≤ 1) =d (B̂(t), 0 ≤ t ≤ 1).
(3) B̂ is an inhomogeneous Markov process. (Exercise: describe its transition rules, i.e. the conditional distribution of B̂t given B̂s for 0 < s < t.)
• Application to statistics
Idea: How to test if variables X1, X2, ... are like an independent sample from some continuous cumulative distribution function F? Empirical distribution: Fn(x) := fraction of Xi, 1 ≤ i ≤ n, with values ≤ x. Fn(x) is a random function of x which increases by 1/n at each data point. If the Xi's are IID(F), then Fn(x) → F(x) as n → ∞ by the LLN. In fact, sup_x |Fn(x) − F(x)| → 0 (Glivenko-Cantelli theorem).
Now Var(Fn(x)) = F(x)(1 − F(x))/n. So consider Dn := √n sup_x |Fn(x) − F(x)| as a test statistic for data distributed as F.
Key observation: Let Ui := F(Xi) ∼ U[0,1]. The distribution of Dn does not depend on F at all. To compute it, we may as well assume that F(x) = x, 0 ≤ x ≤ 1; that is, sampling from U[0,1]. Then
Dn = √n sup_u |Fn(u) − u|, where Fn(u) = (1/n) Σ_{i=1}^n 1(Ui ≤ u)
Idea: Look at the process Xn(t) := √n(Fn(t) − t), 0 ≤ t ≤ 1. You can easily check for 0 ≤ s ≤ t ≤ 1:
E(Xn(t)) = 0, Var(Xn(t)) = t(1 − t), E[Xn(s)Xn(t)] = s(1 − t)
We see (Xn(t), 0 ≤ t ≤ 1) →d (B̂(t), 0 ≤ t ≤ 1) in the sense of convergence in distribution of all finite-dimensional distributions. In fact, with care from linear combinations,
(Xn(t1), ..., Xn(tk)) →d (B̂(t1), ..., B̂(tk))
(centered and scaled discrete → Gaussian). Easy: For any choice of αi and ti, 1 ≤ i ≤ m, Σ_i αi Xn(ti) is a centered sum of n IID random variables, which → Normal by the usual CLT. And it can be shown that also
Dn := sup_t |Xn(t)| →d sup_t |B̂(t)|
The formula for the limiting distribution function is a theta function series which has been widely tabulated. The convergence in distribution to an explicit limit distribution is due to Kolmogorov and Smirnov in the 1930s. Identification of the limit in terms of the Brownian bridge was made by Doob in the 1940s. Rigorous derivation of a family of such limit results for various functionals of random walks converging to Brownian motion is due to Donsker in the 1950s (Donsker's theorem).
Theorem: (Dvoretzky-Erdős-Kakutani) With probability 1 the path of a BM is nowhere differentiable.
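To close the loop on the Kolmogorov-Smirnov discussion above, here is a sketch that simulates Dn for uniform samples and compares it with sup|B̂| computed from a discretized Brownian bridge (numpy assumed; the sample size and grid are arbitrary choices):

import numpy as np

rng = np.random.default_rng(11)
n, reps, grid = 200, 5_000, 1000

# D_n = sqrt(n) * sup_u |F_n(u) - u| for U[0,1] samples;
# the sup is attained at the order statistics.
u = np.sort(rng.uniform(size=(reps, n)), axis=1)
i = np.arange(1, n + 1)
dn = np.sqrt(n) * np.maximum(i / n - u, u - (i - 1) / n).max(axis=1)

# sup |Bhat| from a discretized bridge Bhat(t) = B(t) - t*B(1).
t = np.linspace(0.0, 1.0, grid + 1)
dB = rng.standard_normal((reps, grid)) * np.sqrt(1.0 / grid)
B = np.concatenate([np.zeros((reps, 1)), np.cumsum(dB, axis=1)], axis=1)
sup_bridge = np.abs(B - t * B[:, -1:]).max(axis=1)

print("median D_n         :", np.median(dn))
print("median sup|bridge| :", np.median(sup_bridge))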