Probability: A Graduate Course
Solutions to Problems
Allan Gut
Contents
1  Introductory Measure Theory
2  Random Variables
3  Inequalities
4  Characteristic Functions
5  Convergence
6  The Law of Large Numbers
7  The Central Limit Theorem
8  The Law of the Iterated Logarithm
9  Limit Theorems; Extensions and Generalizations
10 Martingales
September 13, 2007
Introductory Remarks

The present file contains solutions to the problems given at the end of each chapter in my book Probability: A Graduate Course, published in 2005 by Springer-Verlag. Almost all solutions are complete in the sense that so-called trivial or elementary, and/but tedious, calculations are commonly included. References to chapters, sections, theorems, formulas, problems, and so on, refer to the corresponding objects in the text-book. References to objects in this set of solutions can be distinguished by the first “digit”, which is the letter S set in boldface. A number of solutions are (sometimes variations or revisions of) solutions originally due to PhD students who have taken some graduate probability course of mine based on related books or, more recently, on early versions of the text-book. I wish to thank all of them, wherever they are, for this indirect assistance. Unfortunately I have not succeeded in making this a misprint-free book. A pdf-file with misprints and corrections can be found at http://www.math.uu.se/~allan/publ.html .
Uppsala Midsummer 2005
Allan Gut
Meanwhile a “corrected 2nd printing” of the book has appeared (spring 2007). The present set of solutions is an updated (i.e., corrected) version of the previously available one. Once again, thanks to all who have pointed out “misprints” as well as misprints, and to you who have provided me with better/more attractive/more elegant solutions. Uppsala September 2007
Allan Gut
1 Introductory Measure Theory

1. The first equality is pure definition of A Δ B. Using the definition of A \ B and standard rules for unions and intersections yields

{A ∪ B} \ {A ∩ B} = {A ∪ B} ∩ (A ∩ B)^c = {A ∩ (A ∩ B)^c} ∪ {B ∩ (A ∩ B)^c}
  = {A ∩ (A^c ∪ B^c)} ∪ {B ∩ (A^c ∪ B^c)} = (A ∩ B^c) ∪ (B ∩ A^c).

This establishes the first relation. As for the second one, by definition,

A^c Δ B^c = {A^c ∩ (B^c)^c} ∪ {B^c ∩ (A^c)^c} = {A^c ∩ B} ∪ {B^c ∩ A} = A Δ B.

Finally,

{A_1 ∪ A_2} Δ {B_1 ∪ B_2} = {(A_1 ∪ A_2) ∩ (B_1 ∪ B_2)^c} ∪ {(B_1 ∪ B_2) ∩ (A_1 ∪ A_2)^c}
  = {(A_1 ∪ A_2) ∩ B_1^c ∩ B_2^c} ∪ {(B_1 ∪ B_2) ∩ A_1^c ∩ A_2^c}
  ⊂ (A_1 ∩ B_1^c) ∪ (A_2 ∩ B_2^c) ∪ (B_1 ∩ A_1^c) ∪ (B_2 ∩ A_2^c)
  = (A_1 ∩ B_1^c) ∪ (B_1 ∩ A_1^c) ∪ (A_2 ∩ B_2^c) ∪ (B_2 ∩ A_2^c)
  = {A_1 Δ B_1} ∪ {A_2 Δ B_2}.

2. The first statement is true, because

lim sup_{n→∞} {A_n ∪ B_n} = ⋂_{n=1}^∞ ⋃_{k=n}^∞ (A_k ∪ B_k) = ⋂_{n=1}^∞ ( ⋃_{k=n}^∞ A_k ∪ ⋃_{k=n}^∞ B_k )
  = ⋂_{n=1}^∞ ⋃_{k=n}^∞ A_k ∪ ⋂_{n=1}^∞ ⋃_{k=n}^∞ B_k = lim sup_{n→∞} A_n ∪ lim sup_{n→∞} B_n.

In words: if x ∈ LHS, this means that x ∈ A_k ∪ B_k i.o., which is the same as x ∈ A_k i.o. or x ∈ B_k i.o., which, in turn, is the RHS.

Statement 2 is false, namely,

x ∈ LHS ⟹ x ∈ {A_k ∩ B_k} i.o.,
x ∈ RHS ⟹ {x ∈ A_k i.o.} ∩ {x ∈ B_k i.o.}.

If, say, x ∈ A_k for all odd k and x ∈ B_k for all even k (and for no other k), then x ∈ RHS, but x ∉ LHS. This shows that, in fact,

lim sup_{n→∞} {A_n ∩ B_n} ⊂ lim sup_{n→∞} A_n ∩ lim sup_{n→∞} B_n.

Statement 3 is false similarly; take complements or argue analogously. In this case,

lim inf_{n→∞} {A_n ∪ B_n} ⊃ lim inf_{n→∞} A_n ∪ lim inf_{n→∞} B_n.

To see that statement 4 is true, consider complements together with the first statement, or argue analogously:

lim inf_{n→∞} {A_n ∩ B_n} = ⋃_{n=1}^∞ ⋂_{k=n}^∞ (A_k ∩ B_k) = ⋃_{n=1}^∞ ( ⋂_{k=n}^∞ A_k ∩ ⋂_{k=n}^∞ B_k )
  = ⋃_{n=1}^∞ ⋂_{k=n}^∞ A_k ∩ ⋃_{n=1}^∞ ⋂_{k=n}^∞ B_k = lim inf_{n→∞} A_n ∩ lim inf_{n→∞} B_n.

In words: If x ∈ LHS, then x ∈ A_k ∩ B_k for all k ≥ n_0 for some n_0. If x ∈ RHS, then x belongs to A_k for all k ≥ n_A and to B_k for all k ≥ n_B, that is, x ∈ A_k and x ∈ B_k for all k ≥ n_0 = max{n_A, n_B}, that is, x ∈ A_k ∩ B_k for all k ≥ n_0 for some n_0.

Statement 5 is true, because, by joining statements 1 and 3 we obtain

A ∪ B = lim inf_{n→∞} A_n ∪ lim inf_{n→∞} B_n ⊂ lim inf_{n→∞} {A_n ∪ B_n}
  ⊂ lim sup_{n→∞} {A_n ∪ B_n} = lim sup_{n→∞} A_n ∪ lim sup_{n→∞} B_n = A ∪ B.
Statement 6 follows by joining Statements 2 and 4 similarly, or by taking complements in Statement 5.

3. Let A = ⋃_{k=1}^∞ A_k. If A ∈ ⋃_{k=1}^∞ I_{2k}, then, necessarily, A ∈ A_k for some k. However, there is no such A_k.

4. This follows from the fact that we also have

A_1 = B_1   and   A_k = B_k ∩ B_{k-1}^c   for 2 ≤ k ≤ n,

so that every union and intersection of sets A_k can be expressed in terms of sets B_k and vice versa.

5. (a): By assumption there exists a subsequence {n_k, k ≥ 1}, such that

P(A_{n_k}) > 1 − 1/2^k,   k = 1, 2, 3, . . . ,

which, via de Morgan, implies that

P( ⋂_k A_{n_k} ) = 1 − P( ⋃_k A_{n_k}^c ) ≥ 1 − Σ_{k=1}^∞ P(A_{n_k}^c) > 1 − Σ_{k=1}^∞ 1/2^k = 1 − 1 = 0.

(b): Let {A_n, n ≥ 1} be independent events such that P(A_n) = α for all n (e.g. some “coin-tossing” event). Then, for any subsequence {n_k, k ≥ 1},

P( ⋂_k A_{n_k} ) = lim_{m→∞} ∏_{k=1}^m P(A_{n_k}) = lim_{m→∞} α^m = 0.

(c): Since the sets are monotone the limit exists, and

P( ⋂_n A_n ) = lim_{n→∞} P(A_n) ≥ lim_{n→∞} α = α.

(d): Since the sets are monotone the limit exists, and

P( ⋃_n A_n ) = lim_{n→∞} P(A_n) ≤ lim_{n→∞} α = α.

6. There are four cases:

I{A}(ω) = 1,  I{B}(ω) = 1,  I{A ∩ B}(ω) = 1,
I{A}(ω) = 1,  I{B}(ω) = 0,  I{A ∩ B}(ω) = 0,
I{A}(ω) = 0,  I{B}(ω) = 1,  I{A ∩ B}(ω) = 0,
I{A}(ω) = 0,  I{B}(ω) = 0,  I{A ∩ B}(ω) = 0.

Everything follows from these relations.
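For readers who like to sanity-check such pointwise identities numerically, here is a minimal Python sketch (not part of the original solution). It enumerates the four cases of the table above and verifies the identity I{A ∩ B} = I{A} · I{B} = min(I{A}, I{B}), which is the kind of relation the table is meant to establish; the exact identities asked for in the problem are assumed here.

```python
from itertools import product

# Enumerate the four cases I{A}(w), I{B}(w) in {0, 1} and check the
# pointwise identities for the indicator of the intersection.
for ia, ib in product((0, 1), repeat=2):
    i_intersection = 1 if (ia == 1 and ib == 1) else 0   # I{A ∩ B}(w)
    assert i_intersection == ia * ib == min(ia, ib)
print("all four cases verified")
```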
7. Since A_n converges as n → ∞ iff lim sup_{n→∞} A_n = lim inf_{n→∞} A_n, and since two sets agree iff their indicator functions do, the claim is equivalent to proving that

I{lim sup_{n→∞} A_n} = I{lim inf_{n→∞} A_n}   ⇐⇒   lim sup_{n→∞} I{A_n} = lim inf_{n→∞} I{A_n},

which, in turn, amounts to proving that

I{sup_{n≥1} inf_{k≥n} A_k} = I{inf_{n≥1} sup_{k≥n} A_k}   ⇐⇒   sup_{n≥1} inf_{k≥n} I{A_k} = inf_{n≥1} sup_{k≥n} I{A_k}.   (S.1.1)

Now,

I{inf_{k≥n} A_k}(ω) = 1   ⇐⇒   ω ∈ inf_{k≥n} A_k = ⋂_{k≥n} A_k
  ⇐⇒   I{A_k}(ω) = 1 for all k ≥ n
  ⇐⇒   inf_{k≥n} I{A_k}(ω) = 1.

A similar argument shows that

I{sup_{k≥n} A_k}(ω) = 1   ⇐⇒   sup_{k≥n} I{A_k}(ω) = 1.

Joining the two shows that

I{sup_{n≥1} inf_{k≥n} A_k}(ω) = 1   ⇐⇒   sup_{n≥1} inf_{k≥n} I{A_k}(ω) = 1,
I{inf_{n≥1} sup_{k≥n} A_k}(ω) = 1   ⇐⇒   inf_{n≥1} sup_{k≥n} I{A_k}(ω) = 1.

Since indicator functions are equal to zero when they are not equal to 1, it follows that (S.1.1) holds and we are done.

8. We have

[0, a_n) ⊂ sup_n [0, a_n) = ⋃_n [0, a_n) = [0, sup_n a_n) ⊂ [0, 1]   for all n.

Let x ∈ [0, 1) be arbitrarily close to 1. Then x ∈ [0, a_n) for n large enough. Thus, [0, 1) ⊂ sup_n [0, a_n). On the other hand, 1 ∉ [0, a_n) for any n, so that 1 ∉ ⋃_n [0, a_n). This shows that sup_n [0, a_n) = [0, 1).

The same argument shows that sup_n [0, a_n] ⊂ [0, 1), and that 1 ∉ ⋃_n [0, a_n]. Since the semi-open set is contained in the closed one it also follows that sup_n [0, a_n] = [0, 1).

The second conclusion follows similarly;

[0, 1] ⊂ inf_n [0, b_n] = ⋂_n [0, b_n] = [0, inf_n b_n] ⊂ [0, b_n]   for all n.

By letting x > 1 be arbitrarily close to 1 it follows that x ∉ inf_n [0, b_n], so that inf_n [0, b_n] = [0, 1]. Since 1 ∈ [0, b_n) for all n we also have inf_n [0, b_n) = [0, 1].

9. Use induction: For n = 2 it is well known that P(A_1 ∪ A_2) ≤ P(A_1) + P(A_2).
Now, suppose that we are done up to n − 1, and set B_n = ⋃_{1≤k≤n} A_k. Then

P( ⋃_{k=1}^n A_k ) = P(B_{n−1} ∪ A_n) ≤ P(B_{n−1}) + P(A_n) ≤ Σ_{k=1}^{n−1} P(A_k) + P(A_n) = Σ_{k=1}^n P(A_k).

This proves the first relation. As for the second one, it is well known that

P(A_1 ∪ A_2) = P(A_1) + P(A_2) − P(A_1 ∩ A_2).

Once again, suppose we are done up to n − 1, let B_n be as before, and set, in addition,

C_n = B_{n−1} ∩ A_n = ⋃_{1≤k≤n−1} {A_k ∩ A_n}.

Then

P( ⋃_{k=1}^n A_k ) = P(B_{n−1} ∪ A_n) = P(B_{n−1}) + P(A_n) − P(C_n).   (S.1.2)

Now, P(B_{n−1}) is estimated from below via the induction hypothesis:

P(B_{n−1}) ≥ Σ_{k=1}^{n−1} P(A_k) − Σ_{1≤i<j≤n−1} P(A_i ∩ A_j),

…
2 Random Variables

… K) < ε, also by construction.

3. (a): Let (a, b] be an arbitrary interval on R⁺. Then

{|X| ∈ (a, b]} = {X ∈ [−b, −a)} ∪ {X ∈ (a, b]} ∈ F.

Since it suffices to check such intervals we are done.

(b): Let A be a non-measurable set, and set

X(ω) = 1, when ω ∈ A;   X(ω) = −1, when ω ∉ A.

Then X is not a random variable. However, P(|X| = 1) = 1, i.e. |X| is, indeed, a random variable.

4. (a): Since distributions are right-continuous with left-hand limits it follows that the limit equals

F(x) − F(x−) = P(X = x), if x ∈ J_F;   = 0, otherwise.

(b): If x ∈ J_F, then

0 < F(x) − F(x−) ≤ F(x + h) − F(x − h)   for every h > 0,

i.e. x ∈ supp(F).

(c): Suppose that x* is an isolated point. Then there exists A = (x* − δ, x* + δ) ∩ {x*}^c, such that A ∩ supp(F) = ∅. This implies that F(x + ε) − F(x − ε) = 0 for every x ∈ (x* − δ, x*) provided ε > 0 is sufficiently small, which, in turn, tells us that F, being non-decreasing, must be constant throughout the interval (x* − δ, x*). A completely analogous argument shows that F is also constant throughout the interval (x*, x* + δ). It follows that x* ∈ J_F.

(d): The support is closed since its complement is open. Namely, suppose that x ∉ supp(F), that is, that there exists h_0 > 0, such that

F(x + h_0) − F(x − h_0) = 0.   (S.2.1)

Since F is non-decreasing it follows that

F(x + h) − F(x − h) = 0   for all h < h_0 (h ≥ 0),

which implies that there exists an open interval I around x, such that (S.2.1) holds for all points in I. We have thus shown that for any x ∉ supp(F), there exists an open interval I, such that x ∈ I ⊂ (supp(F))^c, which proves that the complement of the support is open.

5. We have (summation should start at −∞)

Σ_{n=−∞}^∞ P(n < X ≤ n + m) = Σ_{n=−∞}^∞ Σ_{k=n+1}^{n+m} P(X = k) = Σ_{k=−∞}^∞ Σ_{n=k−m}^{k−1} P(X = k)
  = Σ_{k=−∞}^∞ m P(X = k) = m Σ_{k=−∞}^∞ P(X = k) = m.
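As an illustration (not part of the original solution), the identity Σ_n P(n < X ≤ n + m) = m is easy to check numerically for a concrete integer-valued distribution; the Poisson choice below is only an example.

```python
from math import exp, factorial

# Numerical check of sum_n P(n < X <= n+m) = m for an integer-valued X.
# Here X ~ Poisson(2.5) is just an illustrative choice.
lam, m = 2.5, 4
pmf = lambda k: exp(-lam) * lam**k / factorial(k) if k >= 0 else 0.0

total = sum(sum(pmf(k) for k in range(n + 1, n + m + 1)) for n in range(-50, 200))
print(total)   # ~ 4.0 = m (up to truncation of the outer sum)
```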
6. We have (note that integration is over R)

∫_{−∞}^∞ P(x < X ≤ x + a) dx = ∫_{−∞}^∞ ∫_x^{x+a} dF(y) dx = ∫_{−∞}^∞ ∫_{y−a}^y dx dF(y)
  = ∫_{−∞}^∞ a dF(y) = a ∫_{−∞}^∞ dF(y) = a.

7. For every fixed t, P(X_t ≠ 0) = P({t}) = 0, and, yet, sup_{0≤t≤1} X_t(ω) = X_ω(ω) = 1 a.s.

8. If the sum is convergent for some A, then P(X_n > A i.o.) = 0, by the first Borel-Cantelli lemma, which means that, for almost all ω, we have X_n(ω) ≤ A for all n ≥ n_0(ω), which is the same as sup_n X_n < ∞ a.s. If, on the other hand, the sum is divergent for all A, then, by independence and the second Borel-Cantelli lemma,

P(X_n > A i.o.) = 1   for all A,

which tells us that sup_n X_n = ∞ a.s.

9. For y > 0,

F_Y(y) = P(Y ≤ y) = P(e^X ≤ y) = P(X ≤ log y) = F_X(log y) = Φ( (log y − µ)/σ ),

so that, by differentiation (y > 0),

f_Y(y) = φ( (log y − µ)/σ ) · 1/(yσ) = (1/(√(2π) yσ)) exp{ −(log y − µ)²/(2σ²) }.

10. Let y ≥ 0. Then

F_Y(y) = P(Y ≤ y) = P(−θ log X ≤ y) = P(X ≥ e^{−y/θ}) = 1 − e^{−y/θ}.

Also, by differentiation (y > 0),

f_Y(y) = (1/θ) e^{−y/θ}.

11. Let T_k, k = 1, 2, . . . , 6, be the number of throws until “the next new face” appears, so that the total number of tosses equals T = T_1 + T_2 + · · · + T_6. Clearly, T_1 = 1. Next, T_2 = the number of throws required for the second “new face” to appear, which means that T_2 ∈ Fs(5/6). Similarly, T_3 ∈ Fs(4/6), T_4 ∈ Fs(3/6), T_5 ∈ Fs(2/6), and T_6 ∈ Fs(1/6). Thus,

E T_1 = 1,  E T_2 = 6/5,  E T_3 = 6/4,  E T_4 = 6/3,  E T_5 = 6/2,  E T_6 = 6/1,

and, hence,

E T = E(T_1 + T_2 + · · · + T_6) = E T_1 + E T_2 + · · · + E T_6 = 1 + 6/5 + 6/4 + 6/3 + 6/2 + 6 = 14.7.

♣ As a prelude to the next problem we note that, by summing backwards,

E T = 6 Σ_{k=1}^6 1/k.   (S.2.2)
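A quick Monte Carlo check of E T = 6 Σ_{k=1}^6 1/k = 14.7 is easy to set up; the following sketch is illustrative only.

```python
import random

# Monte Carlo check of E T = 6 * sum_{k=1}^{6} 1/k = 14.7 for the number of
# throws of a fair die needed to see all six faces.
def throws_until_all_faces(rng):
    seen, n = set(), 0
    while len(seen) < 6:
        seen.add(rng.randint(1, 6))
        n += 1
    return n

rng = random.Random(1)
sims = [throws_until_all_faces(rng) for _ in range(100_000)]
print(sum(sims) / len(sims))                 # close to 14.7
print(6 * sum(1 / k for k in range(1, 7)))   # 14.7 exactly
```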
12. This problem can be viewed as an extension of the previous one in the sense that it is equivalent to throwing a symmetric die with n faces until all of them have appeared at least once. In view of (S.2.2) this implies that we can immediately provide the answer (although it is advisable to check the details):

E T = n Σ_{k=1}^n 1/k.

For large values of n it is worth mentioning that the harmonic series grows logarithmically, viz., E T ≈ n(log n + γ) for n large, where γ = 0.5772 . . . denotes Euler’s constant.

13. Since

m_n^2 = (1/n) Σ_{k=1}^n ( X_k^2 − 2X_k X̄_n + (X̄_n)^2 ) = (1/n) ( Σ_{k=1}^n X_k^2 − 2 X̄_n Σ_{k=1}^n X_k + n(X̄_n)^2 )
  = (1/n) ( Σ_{k=1}^n X_k^2 − n(X̄_n)^2 ) = (1/n) Σ_{k=1}^n X_k^2 − (X̄_n)^2,

it follows that

E m_n^2 = (1/n) Σ_{k=1}^n E X_k^2 − E(X̄_n)^2 = σ^2 + µ^2 − (σ^2/n + µ^2) = σ^2 (1 − 1/n).

The computations become simpler if one assumes that µ = 0. This is no restriction. Do re-examine the above computations under this assumption. And let’s keep it for the variance.

Var m_n^2 = Var( (1/n) Σ_{k=1}^n X_k^2 ) + Var( (X̄_n)^2 ) − (2/n) Σ_{k=1}^n Cov( X_k^2, (X̄_n)^2 )
  = (1/n) ( E X^4 − (E X^2)^2 ) + E X̄_n^4 − ( E(X̄_n)^2 )^2 − 2 Cov( X_1^2, (X̄_n)^2 )
  = (µ_4 − σ^4)/n + E X̄_n^4 − σ^4/n^2 − 2 E X_1^2 (X̄_n)^2 + 2σ^2 · σ^2/n.   (S.2.3)

Now, by expanding ( Σ_{k=1}^n X_k )^4 via the binomial theorem, and by observing that terms with X_k occurring linearly will vanish due to independence and the assumption that E X = 0, we obtain

E X̄_n^4 = (1/n^4) ( Σ_{k=1}^n E X_k^4 + 6 Σ_{1≤i<j≤n} E X_i^2 E X_j^2 ) = µ_4/n^3 + 3(n−1)σ^4/n^3,

…

14. (a): … > 0. Using the hint we have (recall the assumption that E X = 0),

0 ≤ E(a_0 + a_1 X + a_2 X^2)^2 = a_0^2 + a_1^2 µ_2 + a_2^2 µ_4 + 2a_0 a_1 · 0 + 2a_0 a_2 µ_2 + 2a_1 a_2 µ_3
  = a_0^2 + a_1^2 µ_2 + a_2^2 µ_4 + 2a_0 a_2 µ_2 + 2a_1 a_2 µ_3.

Noticing that there is only one term involving µ_4 we must have a_2^2 = µ_2, so that the last expression equals

a_0^2 + a_1^2 µ_2 + µ_2 µ_4 + 2a_0 (µ_2)^{3/2} + 2a_1 (µ_2)^{1/2} µ_3
  = µ_2 µ_4 + ( a_0 + (µ_2)^{3/2} )^2 − µ_2^3 + ( a_1 (µ_2)^{1/2} + µ_3 )^2 − µ_3^2 = Det(A)

provided the expressions within parentheses vanish, that is, for

a_0 = −(µ_2)^{3/2}   and   a_1 = −(µ_2)^{−1/2} µ_3.

We have thus shown that

Det(A) = µ_2 µ_4 − µ_3^2 − µ_2^3 = E( (µ_2)^{1/2} X^2 − µ_3 (µ_2)^{−1/2} X − (µ_2)^{3/2} )^2 ≥ 0.

♣ For the reader who is acquainted with Chapter 3 it may be of interest to observe that

µ_3 = E X^3 = E(X^2 · X) = E(X^2 · X) − E X · E X^2 = Cov(X, X^2),

which, together with the fact that µ_2 = Var X (since E X = 0), permits a rewriting of the determinant as

Det(A) = Var(X) Var(X^2) − ( Cov(X, X^2) )^2 ≥ 0,

either by the Cauchy-Schwarz inequality or (which is the same) since |ρ_{X,X^2}| ≤ 1.
(b): We exploit a theorem on matrices to the effect that a matrix is non-negative definite if the determinant of every submatrix starting at the upper left corner is non-negative. In the present case we have

1 ≥ 0,   µ_2 ≥ 0,   µ_2 µ_4 − µ_3^2 − µ_2^3 ≥ 0,

the latter fact taken from (a). Alternatively, check that all eigenvalues are non-negative.

(c): Try the matrix

( 1    0    µ_2  µ_3 )
( 0    µ_2  µ_3  µ_4 )
( µ_2  µ_3  µ_4  µ_5 )
( µ_3  µ_4  µ_5  µ_6 ).
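The non-negative definiteness claimed in (b) and (c) is also easy to probe numerically. The sketch below (not taken from the problem) builds the analogous Hankel moment matrices for an Exp(1) variable, whose moments µ_k = k! are merely an illustrative choice, and checks that all eigenvalues are non-negative.

```python
import numpy as np

# The moment matrix with entries mu_{i+j} is E[v v'] with v = (1, X, X^2, ...)',
# hence nonnegative definite. Moments of Exp(1) are mu_k = k!.
mu = [1, 1, 2, 6, 24, 120, 720]
A3 = np.array([[mu[i + j] for j in range(3)] for i in range(3)])
A4 = np.array([[mu[i + j] for j in range(4)] for i in range(4)])
print(np.linalg.eigvalsh(A3))   # all eigenvalues >= 0
print(np.linalg.eigvalsh(A4))   # all eigenvalues >= 0
```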
15. Changing the order of integration we obtain

LHS = ∫_{−∞}^∞ ( ∫∫_{u≤x<v} dF(u, v) ) dx − ∫_{−∞}^∞ ( ∫∫_{v≤x<u} dF(u, v) ) dx
  = ∫∫_{u<v} ( ∫_u^v dx ) dF(u, v) − ∫∫_{v<u} ( ∫_v^u dx ) dF(u, v)
  = ∫∫_{u<v} (v − u) dF(u, v) − ∫∫_{v<u} (u − v) dF(u, v)
  = ∫∫_{u<v} (v − u) dF(u, v) + ∫∫_{v<u} (v − u) dF(u, v)
  = ∫_{−∞}^∞ ∫_{−∞}^∞ (v − u) dF(u, v) = E(Y − X) = RHS.
16. Since X and I{Y ∈ B} are independent,

E( X · I{Y ∈ B} ) = E X · E I{Y ∈ B} = E X · P(Y ∈ B).

17. (a) Since |Y_n| ≤ Σ_{k=1}^n |X_k|, it follows that

E Y_n ≤ E|Y_n| ≤ Σ_{k=1}^n E|X_k| < ∞.

(b) This follows from the fact that X_k ≤ Y_n for k = 1, 2, . . . , n.

(c) See (a).

(d) Suppose that X, X_1, X_2, . . . are independent, identically distributed random variables, such that P(X = 1) = p and P(X = −2) = 1 − p, 0 < p < 1. Then

P(|Y_n| = 2) = P(X_k = −2 for all k ≤ n) = (1 − p)^n,

so that

E|Y_n| = 2 · (1 − p)^n + 1 · (1 − (1 − p)^n) = 1 + (1 − p)^n,

whereas

E|X| = 2 · (1 − p) + 1 · p = 1 + (1 − p) > 1 + (1 − p)^n   whenever n ≥ 2.

18. We wish to prove that

E|Y|^r < ∞   ⇐⇒   ∃ Z ∈ L^r, such that |X_n| ≤ Z for all n   (r > 0).

If E|Y|^r < ∞, then we may choose Z = Y, since |X_n| ≤ Y for all n. Conversely, if the desired Z exists, then, since |X_n| ≤ Z for all n, it also follows that Y ≤ Z, and since Z ∈ L^r, we conclude that Y ∈ L^r.

19. (a): We have

n E( (1/X) I{X > n} ) = E( (n/X) I{X > n} ) ≤ E I{X > n} = P(X > n) → 0   as n → ∞.

(b): This one is a bit harder, because there is trouble at 0. Define

n* = inf{ n : P( (1/X) I{X > 0} > n ) < ε },
and let n > n*. Then

(1/n) E( (1/X) I{X > 1/n} ) = …

3 Inequalities

… = 2a(1 − F(a)) + 2 ∫_a^∞ (1 − F(x)) dx − E X + a − 2aP(X > a)
  = 2 ∫_a^∞ (1 − F(x)) dx − E X + a.
Differentiation with respect to a shows that the minimum is attained for a, such that F(a) = 1/2, that is, when a = med(X).

5. Noticing that, for any Borel set A, P(A) = E I{X ∈ A} = E(I{X ∈ A})^2, we obtain, via Cauchy’s inequality,

E X^2 P(X ≥ λE X) = E X^2 · E(I{X ≥ λE X})^2 ≥ ( E X I{X ≥ λE X} )^2
  = ( E X − E X I{X < λE X} )^2 ≥ ( E X − λE X )^2 = (1 − λ)^2 (E X)^2.

6. E X = 0,   Var X = 2px^2,   P(|X| ≥ x) = 2p,   Var X / x^2 = 2p.
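Assuming, as the displayed moments suggest, that the distribution in Problem 6 puts mass p at each of ±x and mass 1 − 2p at 0, the following snippet confirms numerically that Chebyshev's bound is attained at the level x.

```python
# Two-point-plus-atom distribution: P(X = x) = P(X = -x) = p, P(X = 0) = 1 - 2p.
p, x = 0.1, 3.0
var = 2 * p * x**2                # Var X (the mean is 0)
print(2 * p, var / x**2)          # P(|X| >= x) and the Chebyshev bound coincide
```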
7. (a): Let x > 0. Then x = E(x + E X − X) ≤ E(x + E X − X)I{x + E X − X > 0},
after which squaring and the Cauchy-Schwarz inequality together yield

x^2 ≤ E(x + E X − X)^2 · P(x + E X − X > 0) = (x^2 + σ^2) · P(X − E X < x),

from which we conclude that

P(X − E X ≥ x) ≤ σ^2 / (x^2 + σ^2).

The two-sided version follows by adding the tails.

(b): Suppose that P(X = x) = P(X = −x) = 1/2. Then

P(|X| ≥ x) = 1,   E X = 0,   σ^2 = x^2,   2σ^2/(x^2 + σ^2) = 1.

(Not very interesting.)

(c): Cantelli is better than Chebyshev whenever

2σ^2/(x^2 + σ^2) ≤ σ^2/x^2,

which reduces to σ^2 ≥ x^2 (the upper bounds coincide for x = σ).

(d): We know that

P(|X − E X| ≥ x) ≤ 2σ^2/(x^2 + σ^2),

which equals 1/2 when x = σ√3. The conclusion follows.

(e): Suppose that E|X|^r < ∞ for some r > 1, and set α_r = E|X − E X|^r (note that α_2 = σ^2). By Lemma 3.1.1 applied to g(x) = |x + c|^r, and the c_r-inequalities, we then obtain

P(X − E X ≥ x) ≤ E|X − E X + c|^r / (x + c)^r ≤ 2^{r−1} (α_r + c^r) / (x + c)^r.

Checking the minimum of the RHS yields c = (α_r/x)^{1/(r−1)} (which reduces to c = σ^2/x for r = 2), from which it follows that

P(X − E X ≥ x) ≤ 2^{r−1} α_r / ( x^{r/(r−1)} + α_r^{1/(r−1)} )^{r−1} ≤ 2^{r−1} α_r / (x^r + α_r),

where the last inequality follows from (a + b)^r ≥ a^r + b^r for a, b > 0, r ≥ 1. The two-sided inequality follows as in (a). Note also that, for r = 2, we rediscover (a), except for the factor 2^{r−1} = 2.

8. (a): This follows from Jensen’s inequality turned around, since log x is concave (i.e. − log x is convex).

(b, c): This follows from the fact that there is strict inequality in Jensen’s inequality if the corresponding function is strictly convex. The logarithm is strictly concave. In the degenerate case both members are equal to the same constant.
9. (a): We have

Q_{X+b}(h) = sup_x P(x ≤ X + b ≤ x + h) = sup_x P(x − b ≤ X ≤ x − b + h) = sup_x P(x ≤ X ≤ x + h) = Q_X(h).

In words: The interval that yields the maximal probability is translated if X is translated, but the value of the maximal probability does not change.

(b): The correct statements are

Q_X(mh) ≤ m Q_X(h),   (m = 1, 2, . . . , h > 0),
Q_X(ah) ≤ ([a] + 1) Q_X(h),   (a, h > 0).

To see this, let m ∈ N. Then

Q_X(mh) = sup_x P(x ≤ X ≤ x + mh) ≤ sup_x Σ_{k=1}^m P( x + (k − 1)h ≤ X ≤ x + kh )
  ≤ Σ_{k=1}^m sup_x P( x + (k − 1)h ≤ X ≤ x + kh ) = m Q_X(h).

For the general case this implies that

Q_X(ah) ≤ Q_X( ([a] + 1)h ) ≤ ([a] + 1) Q_X(h).

♣ In addition, one may show that Q_{aX}(h) = Q_X(h/|a|) for a ≠ 0.

(c): The final conclusion is a consequence of the fact that

Q_{X+Y}(h) = sup_x P(x ≤ X + Y ≤ x + h) = sup_x ∫_R ( F_X(x + h − y) − F_X(x − y) ) dF_Y(y)
  ≤ Q_X(h) ∫_{−∞}^∞ dF_Y(y) = Q_X(h).

The symmetric argument shows that we also have Q_{X+Y}(h) ≤ Q_Y(h). The conclusion follows.
4 Characteristic Functions

1. In view of the uniqueness theorem for, say, moment generating functions, it suffices to check that ψ_{Y_1−Y_2}(t) = ψ_{L(1)}(t) for all t. Toward this end,

RHS = ψ_{L(1)}(t) = ∫_{−∞}^∞ e^{tx} (1/2) e^{−|x|} dx = (1/2) ∫_{−∞}^0 e^{tx} e^x dx + (1/2) ∫_0^∞ e^{tx} e^{−x} dx
  = (1/2) ∫_0^∞ e^{−tx−x} dx + (1/2) ∫_0^∞ e^{tx−x} dx = (1/2) ( 1/(1 + t) + 1/(1 − t) ) = 1/(1 − t^2),

LHS = ψ_{Y_1−Y_2}(t) = ψ_{Y_1}(t) · ψ_{−Y_2}(t) = ψ_{Y_1}(t) · ψ_{Y_2}(−t) = 1/(1 − t) · 1/(1 + t) = 1/(1 − t^2).
2. For a characteristic function ϕ of a random variable X with mean 0, variance σ^2, and a finite third moment we know, from Theorem 4.4.2, that

ϕ(t) = 1 − t^2 σ^2 / 2 + (it)^3 E X^3 / 6 + o(|t|^3)   as t → 0.   (S.4.1)

Exploiting this yields

ϕ_{S_n}(t) = 1 − t^2 Var(S_n) / 2 + (it)^3 E(S_n^3) / 6 + o(|t|^3)   as t → 0,

where, of course, S_n = Σ_{k=1}^n X_k. On the other hand, by independence and (S.4.1) applied to each factor, we also have

ϕ_{S_n}(t) = ∏_{k=1}^n ϕ_{X_k}(t) = ∏_{k=1}^n ( 1 − t^2 Var(X_k) / 2 + (it)^3 E(X_k^3) / 6 + o(|t|^3) )   as t → 0.

The conclusion now follows by expanding the product along powers of t and by identifying the coefficients for t^3 with those from the first expression.

♣ In addition, comparing the coefficients of t^2 shows that Var S_n = Σ_{k=1}^n Var X_k.
3. By the uniqueness theorem for moment generating functions it suffices to show that ψ_{V_n} = ψ_{W_n}. Toward that end:

ψ_{V_n}(t) = ∫_0^∞ e^{tx} n ( F_X(x) )^{n−1} f_X(x) dx = n ∫_0^∞ e^{tx} (1 − e^{−x})^{n−1} e^{−x} dx
  = { set y = e^{−x} } = n ∫_0^1 y^{(1−t)−1} (1 − y)^{n−1} dy = n Γ(1 − t) Γ(n) / Γ(1 − t + n)
  = Γ(1 − t) Γ(n + 1) / Γ(1 − t + n) = { using Γ(s + 1) = sΓ(s) } = · · ·
  = Γ(1 − t) Γ(n + 1) / ( ∏_{k=1}^n (k − t) · Γ(1 − t) ) = n! / ∏_{k=1}^n (k − t),

and

ψ_{W_n}(t) = ∏_{k=1}^n ψ_{X_k/k}(t) = ∏_{k=1}^n ψ_{X_k}(t/k) = ∏_{k=1}^n 1/(1 − t/k) = n! / ∏_{k=1}^n (k − t).
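Since ψ_{V_n} = ψ_{W_n}, the maximum V_n of n i.i.d. standard exponentials and the weighted sum W_n = Σ_{k=1}^n X_k/k have the same distribution. A small Monte Carlo comparison of empirical quantiles (illustrative only, not part of the original solution) makes this concrete.

```python
import random

# Compare V_n = max(X_1,...,X_n) and W_n = sum_k X_k / k for i.i.d. Exp(1);
# the two should agree in distribution.
rng, n, N = random.Random(2), 5, 200_000
V = [max(rng.expovariate(1.0) for _ in range(n)) for _ in range(N)]
W = [sum(rng.expovariate(1.0) / k for k in range(1, n + 1)) for _ in range(N)]
V.sort(); W.sort()
for q in (0.1, 0.5, 0.9):                     # matching empirical quantiles
    print(round(V[int(q * N)], 3), round(W[int(q * N)], 3))
```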
4. (a): Set Z = X + Y. Then

P(Z ≤ z) = P(Z ≤ z | X = 1) · P(X = 1) + P(Z ≤ z | X = −1) · P(X = −1)
  = P(Y ≤ z − 1 | X = 1) · P(X = 1) + P(Y ≤ z + 1 | X = −1) · P(X = −1)
  = P(Y ≤ z − 1) · P(X = 1) + P(Y ≤ z + 1) · P(X = −1)
  = (z/2) · (1/2) + 1 · (1/2)   for 0 < z < 2,
  = 0 + ((z + 2)/2) · (1/2)   for −2 < z < 0,

that is, P(Z ≤ z) = (z + 2)/4 for −2 < z < 2. In addition, differentiation yields

f_Z(z) = 1/4 for −2 < z < 2,   and 0 otherwise.
(b): By independence we have ϕ_Z(t) = ϕ_X(t) · ϕ_Y(t). Inserting the expressions for the respective transforms yields

sin 2t / (2t) = cos t · sin t / t.

(c): Rearranging this produces the formula for the double angle of the sine function: sin 2t = 2 sin t cos t.

5. In view of the uniqueness theorem for characteristic functions it suffices to check the respective transforms. Recall that if X has characteristic function ϕ, then ϕ_{X^s} = |ϕ|^2. Thus,

ϕ_{(X+Y)^s}(t) = |ϕ_{X+Y}(t)|^2 = |ϕ_X(t) · ϕ_Y(t)|^2 = |ϕ_X(t)|^2 · |ϕ_Y(t)|^2 = ϕ_{X^s}(t) · ϕ_{Y^s}(t) = ϕ_{X^s+Y^s}(t).

6. Let X and Y be independent and identically distributed random variables. We then know that ϕ_{X−Y}(t) = |ϕ(t)|^2, where ϕ is the common characteristic function of X and Y; note that X − Y is the symmetrized version. We also know that

ϕ_{U(−1,1)}(t) = sin t / t.

Now, comparing the two transforms we observe that ϕ_{X−Y} is always non-negative, whereas ϕ_{U(−1,1)} also assumes negative values. Thus, no matter how we choose the distributions of X and Y we can never achieve equality between the two transforms for all t. The uniqueness theorem for characteristic functions therefore tells us that X − Y can never be U(−1, 1)-distributed.

7. (a) and (b): We note that ∫_R (1 + |t|) e^{−|t|} dt < ∞ (split into |t| ≤ 1 and |t| > 1 and estimate 1 + |t| ≤ 2 in the former interval, and 1 + |t| ≤ 2|t| in the latter). This means that if the function is, indeed, a characteristic function, the corresponding distribution is absolutely continuous. Therefore, the inversion formula and a change of variable t → −t on the negative half-axis yield

(1/2π) ∫_{−∞}^∞ e^{−ixt} (1 + |t|) e^{−|t|} dt = (1/2π) ∫_{−∞}^0 e^{−ixt} (1 − t) e^t dt + (1/2π) ∫_0^∞ e^{−ixt} (1 + t) e^{−t} dt
  = (1/2π) ∫_0^∞ e^{ixt} (1 + t) e^{−t} dt + (1/2π) ∫_0^∞ e^{−ixt} (1 + t) e^{−t} dt.

By identifying e^{−t} as the density of a standard exponential distribution and te^{−t} as the density of a Γ(2, 1)-distribution, we can “immediately” evaluate the integrals with the aid of the characteristic functions of these distributions (with x and t playing the role of each other). The computation thus continues:

  = (1/2π) ( 1/(1 − ix) + 1/(1 − ix)^2 + 1/(1 − i(−x)) + 1/(1 − i(−x))^2 )
  = (1/2π) ( 2/(1 + x^2) + 2(1 − x^2)/(1 + x^2)^2 ) = 2 / ( π(1 + x^2)^2 ).

The inversion formula produces a non-negative function whose total integral equals 1 (the value of the given function at t = 0), that is, the inversion formula produces the density of some random variable.
♣ One may, in fact, also check that

∫_{−∞}^∞ e^{itx} · 2/( π(1 + x^2)^2 ) dx = (1 + |t|) e^{−|t|}.
(c): Let t > 0. Differentiating twice yields

ϕ′(t) = e^{−t} − (1 + t) e^{−t} = −t e^{−t},   ϕ′′(t) = −e^{−t} + t e^{−t} = (t − 1) e^{−t},

and similarly for t < 0, from which it follows that mean and variance exist (Theorem 4.4.3), and, moreover, that

E X = ϕ′(0)/i = 0   and   Var X = E X^2 = −ϕ′′(0) = 1.

Note also that the characteristic function is real, which implies that the distribution is symmetric, so that the mean equals 0.

An alternative is to use series expansion. For t > 0 small (recall the symmetry),

ϕ(t) = ϕ(−t) = (1 + t) e^{−t} = (1 + t)( 1 − t + t^2/2 + o(t^2) ) = 1 − t^2 + t^2/2 + o(t^2) = 1 − t^2/2 + o(t^2),

from which we can read off that E X = 0 and E X^2 = 1.

♣ Had we continued the expansion we could have read off higher moments. In fact, since ϕ is infinitely differentiable, or, equivalently, since the series expansion can be continued indefinitely, it follows that all moments exist.
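As a numerical cross-check (not part of the original solution), the density 2/(π(1 + x²)²) obtained in (a) and (b) indeed integrates to 1 and has mean 0 and second moment 1, in agreement with (c). The sketch below assumes scipy is available for the quadrature.

```python
from math import pi, inf
from scipy.integrate import quad

# f is the density recovered by inversion from (1 + |t|) e^{-|t|}.
f = lambda x: 2 / (pi * (1 + x**2) ** 2)
print(quad(f, -inf, inf)[0])                      # ~ 1.0 (total mass)
print(quad(lambda x: x * f(x), -inf, inf)[0])     # ~ 0.0 (mean)
print(quad(lambda x: x**2 * f(x), -inf, inf)[0])  # ~ 1.0 (second moment)
```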
8. Trying if the moment generating function exists we obtain ψ(t) = 1 +
∞ n X t n=1
n!
n
EX =1+
∞ n X t n=1
n!
· c = 1 + c · (et − 1).
Rewriting this as 1−c+cet we can identify this as the moment generating function of a Be(c)-distributed random variable if 0 < c < 1 (and a random variable degenerate at 0 when c = 0, and at 1 when c = 1). So, what about c ∈ / [0, 1]? Example? Counterexample? Since even moments necessarily are non-negative there cannot be any solution for c negative. So what about c > 1? Checking the variance, c(1 − c), which must be non-negative, rules out the case c > 1 (as well as the case c < 0). 9. Trying if the moment generating function exists we have ψ(t) = 1 +
∞ n X t n=1
= 1+
n!
E Xn = 1 +
2 t/λ 3 1 − (t/λ)
∞ n X t n=1
∞
2n! 2 X t n = 1 + n! 3λn 3 λ ·
n=1
for t < λ.
This is the moment generating function of some random variable, and since moment generating functions uniquely determine the distribution the distribution is unique.
20
Chapter 4 Moreover, rewriting the expression as 1 2 t/λ 1 2 1 ψ(t) = + 1+ = + · , 3 3 1 − (t/λ) 3 3 1 − (t/λ) we can, in fact, identify the distribution as a (convex) mixture of a δ(0)-distribution and an Exp(1/λ)-distribution.
10. Suppose that ϕ is, indeed, a characteristic function. Since ϕ(h) − 2ϕ(0) + ϕ(−h) →0 h2
as
t → 0,
we conclude that ϕ(t) is twice differentiable at t = 0, and that ϕ00 (0) = 0, so that E X 2 = 0. The series expansion also tells us that E X 2 = 0. This means that X ∈ δ(0). However, in that case, ϕ ≡ 1. In other words, if ϕ is a characteristic function, then, necessarily, ϕ ≡ 1. 11. Let X be the random variable associated with the characteristic function ϕ. d
(a): ϕ2 is the characteristic function of X + Y , where Y = X, and X and Y are independent. √ (b): ϕ need not be a characteristic function. As a preliminary we know from Problem 4.11.6 that the U (−1, 1)-distibution can not be decomposed into the difference of two independent, identically distributed random variables. This is in particular true for the difference of two independent, identically distributed symmetric random d variables. Moreover, since Y = −Y if Y is symmetric it follows that the U (−1, 1)distibution can not be decomposed into the sum of two independent, identically distributed symmetric random variables. Returning to the general case we first observe that the characteristic function of the U (−1, 1)-distibution assumes negative values, which creates trouble for the square root. Therefore, consider the random variable X whose characteristic function equals ϕ(t) =
1 1 sin t + . 2 2 t
This distribution is a mixture of the δ(0)-distribution and U ∈ U (−1, 1); viz., ( 1 x+1 · , for − 1 < x < 0, 1 1 FX (x) = Fδ(0) (x) + FU (x) = 12 2 1 x+1 2 2 for 0 ≤ x < 1. 2 ·1+ 2 · 2 , √ ≥ −1/π it follows that ϕ(t) ≥ 0 and a square root, ϕ, is well defined. √ We now claim that ϕ is not a characteristic function, or, equivalently, that X cannot be decomposed into the sum of two independent, identically distributed, √ symmetric (since ϕ is real) random variables.
Since
sin t t
Suppose, to the contrary, that, indeed X = Y1 +Y2 , where Y1 and Y2 are independent, √ identically distributed random variables (with characteristic function ϕ), and let us compute moments. The odd moments of X are equal to 0 by symmetry. As for the even ones we have Z 1 1 1 1 ·0+ 1 · x2n dx = , 2 2 −1 2 2(2n + 1) E X 2n = E(Y + Y )2n . 1 2
21
Characteristic functions
Let Y denote a generic Y -variable. For n = 1 we now obtain (recall independence and the fact that odd moments are equal to 0), 1/6 = E(Y1 + Y2 )2 = E Y12 + E Y22 = 2E Y 2
=⇒
EY 2 = 1/12,
1/10 = E(Y1 + Y2 )4 = E Y14 + 6E Y12 E Y22 + E Y24 = 2E Y 4 + 6 E Y 2 = 2E Y 4 + 6/144
=⇒
2
EY 4 = 7/240,
1/14 = E(Y1 + Y2 )6 = E Y16 + 15E Y14 E Y22 + 15E Y12 E Y24 + E Y26 7 1 = 2E Y 6 + 30E Y 4 E Y 2 = 2E Y 6 + 30 · · =⇒ E Y 6 = −1/1344, 240 12 and a contradiction has been obtained, since even moments are positive. (It also turns out that E Y 8 < 0.) One alternative (which is equivalent) is to use Taylor expansion, the terms of which should have alternating signs. Using, for example, Maple one finds that p
ϕ(t) = 1 −
1 2 7 4 1 1 191 t + t + t6 − t8 − t10 + · · · , 24 5760 967680 154828800 24524881920
from which one can read off the (hypothetical) moments and note that the expansion does not have alternating signs. Another way is to appeal to analytic characteristic functions and their zeroes, but we refrain from such an argument. ♣ An alternative description of the random variable X is X = V · U, where V ∈ Be(1/2), and U ∈ U (−1, 1) as above, and U and V are independent; one checks that 1 1 ϕX (t) = E etV U = E E etV U | V = E et·0·U | V + E et·1·U | V 2 2 1 1 = + ϕU (t) = ϕ(t). 2 2
For a discrete example, a natural attempt would be to depart from the coin-tossing distribution whose characteristic function is cos t, that is, to try ϕ(t) =
1 1 + · cos t, 2 2
in order to obtain a non-negative characteristic function. However, using the double angle formula we have r p 1 1 + · cos t = cos2 (t/2) = cos(t/2), 2 2 which is a characteristic function (corresponding to the distribution which puts mass 1/2 at −1/2 and at 1/2). The following modification works: Let X be a random variable, such that P (X = 1) = P (X = −1) = p,
P (X = 0) = 1 − 2p,
so that ϕ(t) = (1 − 2p) + 2p · cos t.
where
0 < p < 1/2,
22
Chapter 4 √ Then ϕ must be (would be) the characteristic function of a discrete random variable Y (remember the convolution table Figure 4.4.3, page 170). More precisely, Y must be concentrated to the points ±1/2, and for the probabilities at ±1 to match we must have √ P (Y = 1/2) = P (Y = −1/2) = p, √ which, in turn, forces the total mass to be 2 p 6= 1, since p < 1/2. This shows that X is not decomposable into a sum of two independent, identially distributed √ random variables, and, hence, that ϕ is not a characteristic function.
12. Use Parseval’s relation: Let X ∈ C(0, 1), let Y1 , Y2 be coin-tossing random variables, and suppose that all random variables are independent. Then Z ∞ Z Z ∞ 1 ∞ (cos y)2 ϕX (y) dFY1 +Y2 (y) dy = fX (y) · ϕY1 +Y2 (y) dy = π −∞ 1 + y 2 −∞ −∞ 1 1 1 1 1 1 = ϕX (−2) + ϕX (0) + ϕX (2) = e−2 + + e−2 4 2 4 4 2 4 1 + e−2 = . 2 As for the second example, let X be as above, let Z ∈ L(1), and suppose that X and Z are independent. Then Z ∞ Z ∞ Z ∞ 1 dy = π f (y) · ϕ (y) dy = π fZ (y) · ϕX (y) dy X Z 2 2 −∞ (1 + y ) −∞ −∞ Z ∞ Z ∞ 1 −|y| −|y| π = π e ·e dy = π e−2y dy = . 2 −∞ 2 0 Finally, let U ∈ Tri(−1, 1), which means that U is distributed as the sum of two independent U (− 21 , 21 )-distributed random variables, so that sin(t/2) 2 2(1 − cos t) ϕU (t) = . = t/2 t2 Using the inversion formula (ϕU ∈ L1 ), we thus have Z ∞ 1 2(1 − cos t) e−ixt dt = 1 − |x|, 2π −∞ t2
−1 < x < 1.
A rewriting of this, changing x → −t and t → x, tells us that Z ∞ 1 − cos x eitx dt = 1 − |t|, −1 < t < 1. πx2 −∞ x This implies that 1−cos is the density of a random variable V , whose characteristic πx2 function equals 1 − |t| for −1 < t < 1, and 0 otherwise. Add X as above to the setting, and suppose that X and V are independent. Then Z Z ∞ Z ∞ 1 ∞ 1 − cos y −|y| f (y) · ϕ (y) dy = ϕV (y) · fX (y) dy e dy = V X π −∞ y2 −∞ −∞ Z 1 Z 1 1 2 1 1−y = dy = dy (1 − |y|) · π 1 + y2 π 0 1 + y2 −1 Z 2 1 1 y = − dy π 0 1 + y2 1 + y2 1 log 2 2 1 = arctan 1 − arctan 0 − log 2 − log 1 = − . π 2 2 π
23
Characteristic functions 13. I cannot solve this one because I do not know what you have invented.
14. Let X have characteristic function ϕ, suppose that F is the distribution function of Y , and that X and Y are independent. Then R 1 ϕ(tu) du, R0 ∞ ϕ(tu)e−u du, ϕX·Y (t) = R0∞ ϕ(tu) −∞ π(1+u2 ) du, R ∞ 0 ϕ(tu) dF (u) .
for Y ∈ U (0, 1) , for Y ∈ Exp(1) , for Y ∈ C(0, 1) ,
Q t 15. Based on the U (−1, 1)-distribution we have found that sint t = ∞ k=1 cos( 2k ); recall Section 4.2.2. For the U (−π, π)-distribution we obtain analogously (scaling) that ∞ πt Y sin(πt) = cos k . πt 2 k=1
For t = 1/2 this reduces to ∞ π Y 2 = cos k+1 . π 2 k=1
The formula for the double angle of the cosine function tells us that s π cos k+1 = 2
1 + cos( 2πk ) . 2
Putting k = 1 yields
cos
π 4
r =
1 + cos( π2 ) = 2
r
√ 1+0 2 = . 2 2
Putting k = 2 yields
cos
π 8
r =
1+
cos( π4 ) 2
√
s =
1+ 2
2 2
p =
2+ 2
√
2
.
And so on. Continuing this procedure produces the desired formula. ♣ Note also that Vieta’s formula can be used to approximate π.
16. Since |1 − eihx |2 = (1 − cos hx)2 + (sin hx)2 = 2(1 − cos hx), we obtain 2
|ϕ(t) − ϕ(t + h)|
2 2 itX itX i(t+h)X ihX = E e −Ee 1−e = E e 2 ≤ E 1 − eihX = 2E(1 − cos hX) = 2 1 − Re(ϕ(h) .
24
Chapter 4
17. Via a change of the order of integration we obtain 1 h
Z
h
0
Z Z ∞ 1 h (1 − cos tx) dFX (x) dt 1 − Re(ϕ(t) dt = h 0 −∞ Z ∞ Z h 1 (1 − cos tx) dt dFX (x) = −∞ h 0 Z ∞ Z sin hx sin hx = 1− dFX (x) ≥ 1− dFX (x) hx hx −∞ |x|≥1/h Z ≥ (1 − sin 1) dFX (x) = (1 − sin 1)P (|X| > 1/h) |x|≥1/h
1 ≥ 1 − sin(π/3) P (|X| > 1/h) ≥ P (|X| > 1/h). 7 18. We identify the expression as the characteristic function of a multivariate normal distribution, namely N (µ, Λ), where µ = 0, and
1
1 2
Λ=
1 2 − 21
2 1 2
− 12
1 2
.
4
19. Since the variable of interest is a linear combination of the components of X we know that the answer is a normal distribution, more precisely N (Bµ, BΛB0 ), where B is the transformation matrix, in this case a vector; B = (1, −2, 3). Inserting the given values yields 3 Bµ = (1, −2, 3) 4 = −14 −3 and
2 1 3 1 4 −2 −2 = · · · = 128 . BΛB0 = (1, −2, 3) 1 3 −2 8 3 ♣ In this particular case, when the answer is a one-dimensional distribution, it is, of course, also possible to determine the parameters the usual way: E(X1 − 2X2 + 3X3 ) = 3 − 2 · 4 + 3 · (−3) = −14, Var (X1 − 2X2 + 3X3 ) = Var X1 + 22 Var X2 + 32 Var X3 + 2 · (−2) · Cov (X1 , X2 ) +2 · 3 · Cov (X1 , X3 ) + 2 · (−6) · Cov (X2 , X3 ) = 2 + 16 + 72 − 4 + 18 + 24 = 128.
20. (a): We have P ({X ≤ x} ∩ {X > Y }) P (Y < X ≤ x) = P (X > Y ) 1/2 2 1 = 2 · P ({X ≤ x} ∩ {Y ≤ x}) = P (X ≤ x)2 = Φ(x) . 2
P (X ≤ x | X > Y ) =
In other words, the distribution of X given that X > Y is the same as the distribution of max{X, Y }.
25
Characteristic functions The conditional density is 2Φ(x)φ(x), so that, using partial integration, r Z ∞ Z ∞ 2 2 xΦ(x)φ(x) dx = E(X | X > Y ) = 2 Φ(x) · xe−x /2 dx π −∞ −∞ r h Z ∞ i ∞ 2 −x2 /2 −x2 /2 φ(x)e dx = + − Φ(x)e π −∞ −∞ r Z ∞ Z ∞ 2 2 1 −x2 1 1 − x √ √ p = e dx = e 2·(1/2) dx π −∞ 2π π −∞ 2π · (1/2) 1 = √ . π
The integral has been made into a normal integral that we know has the value 1. (b): The trick leading to a short solution is to observe that the sum and the difference are independent random variables: E(X + Y | X > Y ) = E(X + Y | X − Y > 0) = E(X + Y ) = 0. 21. Let X, X1 , X2 , . . . be independent random variables with characteristic function ϕ, let N ∈ Ge(p) be independent of X1 , X2 , . . . , and set Y = X1 + X2 + · + XN . An application of Theorem 4.9.1 then tells us that p . ϕY (t) = gN ϕX (t) = 1 − (1 − p)ϕ(t) In other words, ϕ˜ is, indeed, a characteristic function, namely of Y . 22. We are facing a sum of a random number of random variables, for which the results of Section 2.15 apply. (a): We use generating functions, but first some facts. ∞ X
∞ mn 1 X (mt)n emt − 1 gX (t) = t = = , n!(em − 1) em − 1 n! em − 1 n=1 n=1 n gN (t) = e−m + (1 − e−m )t . n
Thus, emt − 1 n = · · · = emn(t−1) = g Po(mn) (t). gY (t) = gN gX (t) = e−m + (1 − e−m ) m e −1 The uniqueness theorem for generating functions tells us that, indeed, Y ∈ Po(nm). (b): We begin with some facts about the mean and variance of X and N . EX =
∞ X n=1
∞
m X mn mem mn n = = , n!(em − 1) em − 1 n! em − 1 n=0
E X 2 = E X(X − 1) + X =
=
m2 em − 1
∞ X mn n=0
n!
Var X = E X 2 − E X EN
= n(1 − e−m ),
+
∞ X
n=1 mem
em − 1
n(n − 1) =
mn mem + n!(em − 1) em − 1
(m2 + m)em , em − 1
mem m2 em − , (em − 1)2 em − 1 Var N = n(1 − e−m )e−m .
2
= ··· =
26
Chapter 4 With the aid of these formulas we obtain EY Var Y
mem = · · · = mn, em − 1 = E N · Var X + (E X)2 · Var N mem m2 em mem 2 − (1 − e−m )e−m + m = n(1 − e−m ) (em − 1)2 em − 1 e −1 = · · · = mn. = E N · E X = n(1 − e−m ) ·
In other words, mean and variance are those of the relevant Poisson distribution. 23. Let Y -variables denote offspring. The basis for everything is the fact that X(2) = Y1 + Y2 + · · · + YX(1) , more generally, X(n) = Y1 + Y2 + · · · + YX(n−1) . (a): The first relation states that the number of grandchildren is the sum of the number of children of every child. The second relation is analogous. Both relations d fit the kind of “SN -sum” in Section 2.15. Recalling that Y = X(1) we thus conclude that g2 (t) = g1 (g1 (t)) = g(g(t)) and that gn (t) = gn−1 (g(t)), as desired. (b): Similarly, using Theorem 2.15.1, E X(2) = E X(1) · E Y = m · m = m2 , E X(n) = E X(n − 1) · E Y = E X(n − 1) · m2 = E X(n − 2) · m3 = · · · = mn . (c): Every individual in the first generation generates a new branching process of its own. All these such generated branching processes are independent. Let Z1 , Z2 , . . . denote the total progeny of these branching processes up to generation n − 1, including the new “founder”, i.e., the corresponding individual in the first generation. d Letting Z denote a generic random variable, we have, i.a., that Z = Tn−1 . The crucial feature now is the relation Tn = X(0) + Z1 + Z2 + · · · + ZX(1) . We next observe that X(0) = 1 and that the remaining sum is, once again, an “SN sum”. Translating these facts into generating functions, noticing that g1+U (t) = tgU (t) for any non-negative integer valued random variable U , we obtain gTn (t) = tgX(1) gZ (t) , which is the same as Gn (t) = tg(Gn−1 (t)). (d): The statement E X(n) = mn → 0 as n → ∞ indicates extinction, at least on average. So, for m < 1 the mean total progeny becomes E
∞ X n=0
∞ ∞ X X X(n) = E X(n) = mn = n=0
n=0
1 . 1−m
27
Convergence 24. For the shortest lifetime Y = min{X1 , X2 , . . . , XN } we have, for y > 0, P (Y > y) = = = =
∞ X n=0 ∞ X n=0 ∞ X n=0 ∞ X
P (Y > y | N = n) · P (N = n) P (min{X1 , X2 , . . . , Xn } > y | N = n) · P (N = n) P (min{X1 , X2 , . . . , Xn } > y) · P (N = n) ∞ X n n−1 1 − F (y) · p(1 − p) = e−nay · p(1 − p)n−1
n=1
n=1
= p · e−ay
∞ X
n e−ay (1 − p) =
n=0
pe−ay , 1 − e−ay (1 − p)
that is, FY (y) = 1 −
pe−ay 1 − e−ay = , 1 − e−ay (1 − p) 1 − e−ay (1 − p)
y > 0.
25. Glancing at the previous problem we obtain FY (y) = P (Y ≤ y) =
∞ X
P (Y ≤ y | N = n) · P (N = n)
n=0
= = =
∞ X n=0 ∞ X n=0 ∞ X
P (max{X1 , X2 , . . . , Xn } ≤ y | N = n) · P (N = n) P (max{X1 , X2 , . . . , Xn } ≤ y) · P (N = n) n F (y) · P (N = n) = gN F (x) .
n=0
♣ Note that we may rewrite the answer in Problem 4.11.24 as FY (y) = 1 − gN (1 − F (y)).
5
Convergence 1. Observe that Xn − a ∈ Exp(1), and that Zn = min{X1 − a, X2 − a, . . . , Xn − a} + a = Yn − a, where Yn = min{V1 , V2 , . . . , Vn } and V1 , V2 , . . . are i.i.d. standard exponential random variables. The problem is therefore equivalent to showing that p
Yn → 0. The claim now follows from the fact that, for any ε > 0, P (|Yn | > ε) = e−ε
n
= e−εn → 0
as
n → ∞.
28
Chapter 5 2. Exploiting facts from the Γ-distribution we have Z ∞ 1 1 x −x/k k2 1 k2 E e dx = = = , 3 2 3 Xk 2k 0 Γ(2) k 2k 2k Z ∞ 1 1 −x/k 1 k k = e dx = 3 = 2 , E 2 3 2k 0 k 2k 2k Xk 1 2 1 1 1 Var = 2, = − 2 Xk 2k 2k 4k n n X X 1 1 1 , E = Xk 2 k k=1 n X
Var
k=1
so that
k=1
1 Xk
n X 1 π2 1 = → · 4k 2 4 6
as
n → ∞,
k=1
n X 1 1 2 E →0 − Xk 2k
as
n, m → ∞,
k=m+1
since the sum of variances converges. We thus conclude that verges in square mean, and, hence, in probability as n → ∞.
Pn
1 k=1 ( Xk
−
1 2k )
con-
The conclusion now follows from the hint and Cram´er’s theorem; n n n X X 1 1 1 1 1 X 1 − log n = − + − log n . Xk 2 Xk 2k 2 k k=1
k=1
k=1
2
♣ We return to the L -convergence problem in Lemma 6.5.2.
3. In view of the continuity theorem for characteristic functions it suffices to check convergence of the latter: ϕXn /n (t)
= = →
λ λ+n n it/n λ+n e
λ 1− n + λ − neit/n λ λ = λ − it + no(t/n) n + λ − n 1 + it/n + o(t/n) λ = ϕ Exp(1/λ) (t) as n → ∞, λ − it
ϕXn (t/n) =
d
which thus tells us that Xn /n → Exp(1/λ)
as
=
n → ∞.
4. In view of the continuity theorem for characteristic functions it suffices to investigate the convergence problem for the latter. (a): We have ϕXn (t) =
p pn → = ϕ Ge(p) (t) 1 − (1 − pn )eit 1 − (1 − p)eit
as
n → ∞.
(b): This extends Problem 5.14.3. ϕXn /n (t)
= = →
d
pn 1 − (1 − pn )eit/n pn npn = np − it + no(t/n) 1 − (1 − pn ) 1 + it/n + o(t/n) n λ = ϕ Exp(1/λ) (t) as n → ∞, λ − it
ϕXn (t/n) =
so that Xn /n → Exp(1/λ) as n → ∞.
29
Convergence
5. Noticing that τ (t) ∈ Fs(pt ), and that pt → 0 as t → ∞, we obtain, with a glance at the solutions of Problems 5.14.3 and 5.14.4(b), pt 1 + ipt s + o(pt s) pt eipt s ϕpt τ (t) (s) = ϕτ (t) (pt s) = = 1 − (1 − pt )eipt s 1 − (1 − pt ) 1 + ipt s + o(pt s) 1 + ipt s + o(pt s) 1 = → = ϕ Exp(1) (s) as n → ∞. 1 − is + o(pt s) 1 − is An appeal to the continuity theorem for characteristic functions finishes the proof. 6. In view of to the continuity theorem for moment generating functions, say, it suffices to check the corresponding transforms: 1 1 ψXn (t) = exp tµn + t2 σn2 → exp tµ + t2 σ 2 = ψX (t) 2 2
as
n → ∞,
iff µn → µ and σn2 → σ 2 as n → ∞ (take logarithms and identify coefficients of the powers of t). 7. In view of to the continuity theorem for generating functions, say, it suffices to check the corresponding transforms: gXn (t) = exp{λn (t − 1)} → exp{λ(t − 1)} = gX (t)
as
n → ∞,
iff λn → λ as n → ∞. 8. We have ϕ Sn (t) n2
=
ϕ t/n
n 2
=
q ( 1 − |t| n 2−
|t| n ,
0, p → exp{− 2|t|}
as
for |t| ≤ n2 , otherwise,
n → ∞.
An application of the continuity theorem for characteristic functions tells us that converges in distribution as n → ∞.
Sn n2
♣ The limit distribution is a so-called asymmetric stable distribution with index 1/2. For more on stable distributions we refer to Chapter 9.
9. Suppose that Xn ∈ U (− n1 , n1 ), for n ≥ 1. Then Xn is absolutely continuous for p
d
every n, and since Xn → 0, and, hence, Xn → δ(0) as n → ∞, the limit is discrete (moreover, degenerate). To check this we simply observe that, for every ε > 0, P (|Xn | > ε) = 0
as soon as
1 n> . ε
For an example of the second kind, let Xn be uniformly distributed over the points d k n , k = 0, 1, . . . , n − 1. Then Xn → U (0, 1) as n → ∞. To see this we check the moment generating function (with the corresponding continuity theorem in mind): ψXn (t) =
n−1 X k=0
=
n n−1 1 X t/n k 1 1 − et/n = e = · n n n 1 − et/n k=0
tk/n 1
e
1 − et t/n et − 1 → = ψU (0,1) (t) t 1 − et/n t
as
n → ∞.
30
Chapter 5
10. It follows from the assumption that P (lim inf Xn < a < b < lim sup Xn i.o.) = 0 n→∞
for all
n→∞
a < b, a, b ∈ Q,
(S.5.1)
which, in view of the fact that the set of rationals is dense in R, implies convergence, since non-convergence would imply that P (lim inf Xn < lim sup Xn ) > 0, n→∞
n→∞
which, in turn, would contradict (S.5.1). 11. (i):
For any measurable, non-empty set A ⊂ supp (FX ), we have ∞ X
P (Xn ∈ A) = +∞,
n=1
since the sum consists of an infinity of identical terms. By the second Borel-Cantelli lemma we therefore conclude that P (Xn ∈ A i.o.) = 1
for every such set A.
Note, additionally, that convergence is a tail event, and that random variables which are tail measurable are constants by the Kolmogorov zero-one law. However, for any ε > 0 we have ∞ X
P (|Xn − a| > ε) = ∞
for any
a∈R
(a ∈ supp (FX )),
n=1
so that, by the second Borel-Cantelli lemma, P (|Xn − a| > ε i.o.) = 1 for all a ∈ supp (FX ). (ii):
Alternatively, since ∞ X
P (|X2n+1 − X2n | > ε) = ∞
for all
ε > 0,
n=1
the second Borel-Cantelli lemma tells us that the sequence {Xn } is not a.s. Cauchyconvergent, and, hence, not a.s. convergent either. (iii): Alternatively, since the distribution is non-dgenerate, there exist a, b ∈ R, a < b, such that P (X1 < a) > 0 and P (X1 > b) > 0, which implies that ∞ X
P (Xn < a) =
n=1
∞ X
P (Xn > b) = +∞,
n=1
so that, by the second Borel-Cantelli lemma, P (Xn < a i.o.) = P (Xn > b i.o.) = 1, and, hence, P (Xn < a i.o.
and P (Xn > b i.o.) = 1,
which, in view of the previous problem implies that Xn does not converge almost surely as n → ∞.
31
Convergence 12. Checking the generating function yields 1 1 1 n t − 1 t2 − 1 n 1 + gSn (t) = 1 − − 2 t 0 + t 1 + 2 t2 = 1 + n n n n n n2 → exp{t − 1} = g Po(1) (t) as n → ∞,
which, in view of the continuity theorem for generating functions, establishes the desired conclusion. P 13. Set Yn = nk=1 ak Xk , n ≥ 1. Then n n n n X o Y Y ϕYn (t) = ϕXk (ak t) = exp{−|ak t|} = exp − |ak | |t| , k=1
k=1
k=1
P which converges to a continuous function, which equals 0 at t = 0, iff ∞ n=1 |an | < ∞. By the continuity theorem for characteristic functions the limiting distribution exists, P∞ it is, in fact a rescaled Cauchy distibution; C(0, n=1 |an |). If the sum diverges, then ϕYn (t) converges to 1 for t = 0 and to 0 otherwise and therefore can not be a characteristic function. 14. Suppose first that µn = 0 for all n. A neccesary, but not sufficient, condition is L1 -boundedness. Now, Z ∞ x2 1 √ sup E|Xn | = sup |x| exp − 2 dx 2σn n n 2πσn −∞ r Z ∞ 1 2 2 √ σn |x| exp{−x /2} dx = sup σn . = sup π n n 2π −∞ This tells us that uniformly bounded variances, that is, sup σn2 ≤ C < ∞ n
is a necessary condition. The definition requires that we check the following quantity: Z n x2 o 1 √ E|Xn |I{|Xn | > a} = |x| exp − 2 dx 2σn 2πσn |x|>a r Z ∞ n a2 o 2σn 2 2 = √ x exp{−x /2} dx = σn exp − 2 π 2σn 2π a/σn n a2 o ≤ C 0 exp − 2 . 2σn The latter quantity converges to 0 as a → ∞ uniformly in n provided inf n σn > 0. For the case µn 6= 0 the necessary condition concerning L1 -boundedness becomes: Z ∞ n (x − µ )2 o 1 n √ sup E|Xn | = sup |x| exp − dx 2 2σ 2πσ n n n −∞ n Z ∞ x2 1 √ |σn x + µn | exp − = sup dx 2 n 2π −∞ r2 ≤ sup σn + |µn | < ∞. π n Adding the assumption supn |µn | < ∞ thus guarantees uniform integrability. Moreover, a violation of this condition entails E Xn2 = ∞, in which case the variance is not well defined.
32
Chapter 5
15. A necessary, but not sufficient, condition is L1 -boundedness: sup E|Xn | = sup λn = C < ∞. n
n
The definition requires that we check the following quantity: Z ∞ x exp{−x/λn } dx E|Xn |I{|Xn | > a} = E Xn I{Xn > a} = λn a a = exp{−a/λn } + exp{−a/λn } λn a = exp{−a/λn } 1 + . λn For this to converge to 0 as a → ∞ uniformly in n we require, in addition, that inf n λn > 0. 16. (a): Let X have characteristic function ϕ, and introduce, for 1 ≤ k ≤ n, n ≥ 1, P d Xk,n = X/ak,n . Finally, set F (x) = nk=1 pk,n FXk,n (x). The characteristic function of this distribution is ϕn (t) =
n X
pk,n ϕXk,n (t) =
k=1
n X
pk,n ϕX/ak,n (t) =
k=1
n X
pk,n ϕ(t/ak,n ).
k=1
This shows that ϕn is, indeed, a characteristic function. Moreover, ϕn is convex since the a-coefficients are strictly increasing. (b): Since convex functions can be obtained as limits of polygons, P´olya’s theorem follows from this fact, together with the continuity theorem for characteristic functions. R 17. We begin by modifying Problem 4.11.14 in order to show that R ϕ(t, u) dG(u) is a characteristic function. To see this, let Y have distribution function G, and let X be a random variable, such that ϕX|Y =u (t) = ϕ(t, u). Then Z ϕX (t) = E eitX = E E eitX | Y = ϕ(t, u) dG(u). R
Now, let X, Xk,n , 1 ≤ k ≤ n, n ≥ 1,Pbe independent, identically distributed random variables, let Z0 ∈ δ(0), set Zn = nk=1 Xk,n , n ≥ 1, and consider the distribution function n X 1 Fn (x) = FZ (x), n ≥ 1. k! k k=0
The characteristic function of this distribution is n n X X 1 (ϕX (t))k ϕ˜n (t) = ϕZk (t) = → ϕ(t) ˜ k! k! k=0
as
n → ∞,
k=0
which, in view of the continuity theorem for characteristic functions (note that ϕ(0) ˜ = 0 and that ϕ(t)is ˜ continuous) shows that ϕ(t) ˜ is a characteristic function – itPis the characteristic function corresponding to the distribution function 1 F (x) = ∞ k=0 k! FZk (x).
33
Convergence 18. Using Theorem 4.9.1 we obtain ϕSN (t)
=
gN ϕX (t) = exp m(
ma2 t2 1 − 1) = exp − 1 + a2 t2 1 + a2 t2
→ exp{−t2 } = ϕN (0,1/2) (t), d
so that SN → N (0, 1/2) in view of the continuity theorem for characteristic functions. 19. Since
Xn ∈ Exp(1), n! it seems reasonable to try to prove that Sn−1 p →0 n!
as
n → ∞,
(S.5.2)
after which an application of Cram´er’s theorem would conclude the proof. Toward this end we have S E Xn−1 E Sn−2 (n − 1)! (n − 2)! n−1 = + ≤ + (n − 2) · E n! n! n! n! n! 2 ≤ → 0 as n → ∞, n n−1 n−1 S X Var Xk X (k!)2 (n − 1)!)2 n−1 = = ≤ (n − 1) · Var n! (n!)2 (n!)2 (n!)2 k=1
≤
1 →0 n
k=1
as
n → ∞,
from which (S.5.2) follows. 20. (a): Noticing that Yn = Xn + (Yn − Xn ), and that Xn ∈ U (−1, 1), it remains to p prove that Yn − Xn → 0 as n → ∞, since an application of Cram´er’s theorem then d tells us that Yn → Y ∈ U (−1, 1) as n → ∞. Let ε < 1. Then P (|Yn − Xn | > ε) = P |Xn | > 1 − (b): As for the mean, Z E Yn = |x|≤1−(1/n)
1 1 = →0 n n
as
n → ∞.
1 1 x dx + nP (Yn = n) = 0 + n · = 1, 2 n
in other words, E Yn 6→ E Y = 0 as n → ∞. In order to check the variance we first compute the second moment: Z 1 2 1 E Yn2 = x dFXn + n2 P (Yn = n) ≥ n2 · = n → ∞ as 2 n |x|≤1−(1/n)
n → ∞,
which implies that Var Yn → ∞ as n → ∞ as well, in particular Var Yn 6→ Var Y = 1/3 as n → ∞. ♣ Note that the sequence of expected values converges (in fact, is constant!), but that the limit (constant) is not equal to the expected value of the limit.
34
Chapter 5 p
21. Since Xn → 1 as n → ∞ we know, via Cram´er’s theorem, that d
Yn → N (0, 1)
as
n → ∞.
Moreover, by independence, E Yn = E X · E Xn = 0 · 1 = 0, 1 1 Var Yn = E Yn2 = E X 2 · E Xn2 = 1 · 12 · 1 − + n2 · →∞ n n
as
n → ∞.
22. Everything follows from the fact that the L1 norm is dominated by the L2 norm (cf. Lyapounov’s inequality), viz., 2 kxk2 = |x1 |2 + |x2 |2 + · · · + |xd |2 ≥ |x1 | + |x2 | + · · · + |xd | . This means that a vector converges if and only if the components converge. As for convergence in probability this amounts to the statement that p
Xn → X
as
n→∞
⇐⇒
p
Xn(k) → X (k)
as
n→∞
for all
k.
The second conclusion follows similarly: If the sequence of random vectors converges in probability, then so do the components, which, by Theorem 5.10.3 implies that p (k) p h(Xn ) → h(X (k) ) as n → ∞ for every k, which, in turn implies that h(Xn ) → h(X) as n → ∞. 23. The short solution is based on the fact that minimum and maximum are continuous functions, so that the conclusion follows immediately from the continuous mapping theorem. Alternatively, let ε > 0. A direct computation along the lines of the proof of Cram´er’s theorem yields P (max{Un , Vn } ≤ x) = P ({max{Un , Vn } ≤ x} ∩ {|Vn − a| > ε}) +P ({max{Un , Vn } ≤ x} ∩ {|Vn − a| ≤ ε}) = p 1 + p2 . First of all, p1 ≤ P (|Vn − a| > ε) → 0
as
n → ∞,
so that it remains to investigate p2 . The bounds on Vn imply that ( 0, for x < a − ε, p2 = → P (U ≤ x) as P (Un ≤ x), for x > a + ε.
n → ∞.
The arbitrariness of ε concludes the proof; note that the case x = a is not, need not, and can not be checked – it is a discontinuity point of the limit. The arguments for the minimum are analogous (in this case it is more convenient to study the tails of the distributions): P (min{Un , Vn } > x) = P ({min{Un , Vn } > x} ∩ {|Vn − a| > ε}) +P ({min{Un , Vn } ≤ x} ∩ {|Vn − a| > ε}) = p 1 + p2 .
35
Convergence Once again, p1 ≤ P (|Vn − a| > ε) → 0 As for p2 , the bounds on Vn imply that ( P (Un > x), for x < a − ε, p2 = 0, for x > a + ε.
as
n → ∞.
→ P (U > x)
as
n → ∞.
The arbitrariness of ε concludes the proof. 24. We need the following extension of Theorem 5.6.1 as a prelude: Theorem S.5.1 Let X1 , X2 , . . . be a sequence of random variables. The following are equivalent: d
(a) Xn → X as n → ∞. (b) E h(Xn ) → E h(X) as n → ∞ for all h ∈ CB . (c) lim supn→∞ P (Xn ∈ F ) ≤ P (X ∈ F ) for all closed sets F . (d) lim inf n→∞ P (Xn ∈ G) ≥ P (X ∈ G) for all open sets G. (e) limn→∞ P (Xn ∈ A) = P (X ∈ A) for all P -continuity sets, i.e., for all sets A, such that P (X ∈ ∂A) = 0. Proof. (a) ⇐⇒ (b): See Theorem 5.6.1. (b) =⇒ (c): Let F be an arbitrary closed set, and let Fε be an ε-neigborhood of F ; Fε = {x ∈ F c : |x − y| < ε for some y ∈ F .} With the aid of a minor modification of Lemma A.9.4 (cf. also the proof of Theorem 5.6.1) we can approximate the indicator function I{F } with a (uniformly) continuous and bounded function h, such that I{F (x)} ≤ h(x) ≤ I{Fε (x)}, so that an application of (a) leads to lim sup P (Xn ∈ F ) = lim sup E I{Xn ∈ F ) ≤ lim sup E h(Xn ) = E h(X) n→∞
n→∞
n→∞
≤ E I{X ∈ Fε } = P (X ∈ Fε ). Letting ε & 0 establishes (b). (c) ⇐⇒ (d): Take complements; the complement of a closed set is open and vice versa. (c,d) =⇒ (e): Joining the two yields P (X ∈ A ∪ ∂A) ≥ lim sup P (Xn ∈ A ∪ ∂A) ≥ lim sup P (Xn ∈ A) n→∞
n→∞
≥ lim inf P (Xn ∈ A) ≥ lim inf P (Xn ∈ A r ∂A) n→∞
n→∞
≥ P (X ∈ A r ∂A). Now, if P (∂A) = 0 the extreme members coincide, which implies that P (Xn ∈ A) → P (A) as n → ∞, that is, (e) holds. ♣ The set Ao = A r ∂A is called the interior of A, and the set A¯ = A ∪ ∂A is called the closure of A.
36
Chapter 5 (e) =⇒ (a): By linearity it is no restriction to suppose that 0 ≤ h ≤ 1, say. The continuity of h then implies that P (∂{h(X) > x}) ≤ P (h(X) = x) = 0, so that (e), together with Theorem 2.12.1(i) and bounded convergence (Theorem 2.5.3 with a dominating random variable that is constant), tell us that Z
1
Z
1
P (Xn > x) dx →
E h(Xn ) = 0
P (Xn > x) dx = E h(X)
as
0
n → ∞. 2
We are now all set for the solution of the problem. The idea is to apply Theorem S.5.5(c) as follows: Let F be a closed set. Then, setting g¯F = {x : g(x) ∈ F } (= g −1 F ), lim sup P (g(Xn ) ∈ F ) = lim sup P (Xn ∈ g¯F ) ≤ lim sup P Xn ∈ g¯F ∪ ∂(¯ gF ) n→∞ n→∞ n→∞ ≤ P X ∈ g¯F ∪ ∂(¯ gF ) = P X ∈ g¯F ∪ ∂(¯ gF ) ∩ E c ≤ P (X ∈ g¯F ∩ E c ) + P (X ∈ ∂(¯ gF ) ∩ E c ) ≤ P (X ∈ g¯F ∩ E c ) + P (X ∈ E c ) ≤ P (X ∈ g¯F ) = P (g(X) ∈ F ). This shows that condition (c) of the theorem is fulfilled, and we are done. 25. Given X and X1 , X2 , . . . we have dL (Xn , X) = inf{ε > 0 : FX (x − ε) − ε ≤ FXn (x) ≤ FX (x + ε) + ε} for all x. d
Suppose first that Xn → X as n → ∞, and let ε > 0. Letting n → ∞ then yields lim sup dL (Xn , X) ≤ ε, n→∞
since FXn (x) → FX (x) as n → ∞ for x ∈ C(F ), and FX (x − ε) − ε ≤ FX (x) ≤ FX (x + ε) + ε if x ∈ / C(F ). This shows that dL (Xn , X) → 0 as n → ∞. Suppose, on the other hand, that dL (Xn , X) → 0 as n → ∞. This implies, in particular, that FX (x − ε) − ε ≤ lim inf FXn (x) ≤ lim sup FXn (x) ≤ FX (x + ε) + ε, n→∞
n→∞
for every x ∈ R and ε > 0, so that, letting ε & 0 tells us that FX (x−) ≤ lim inf FXn (x) ≤ lim sup FXn (x) ≤ FX (x) for all x, n→∞
n→∞
in particular, that FX (x) ≤ lim inf FXn (x) ≤ lim sup FXn (x) ≤ FX (x) for all x ∈ C(F ), n→∞
n→∞
d
and, hence, that Xn → X as n → ∞. ♣ It can be shown that dL thus defined defines a metric; please verify this fact.
37
The law of large numbers
6
The Law of Large Numbers 1. (a) Let ε > 0. Then n P (|Yn − x∞ | > ε) = P (Yn < x∞ − ε) = 1 − F (x∞ − ε) → 0
as
n → ∞.
(b): Since Yn is monotone in n we also have a.s. convergence (Theorem 5.3.5). p
(c): In complete analogy with (a) we have Zn → x−∞ as n → ∞ (or, else, consider −Xk for all k, and Zn = max{−X1 , −X2 , . . . , −Xn }). It then follows that the ratio converges in probability to the ratio of the limits. a.s.
(d): Analogous to (b) we have Zn → x−∞ as n → ∞, from which it follows that the ratio converges almost surely to the ratio of the limits. 2. The strong law tells us that n
1X a.s. Xk → E X1 = µ n k=1 n X
1 n
as
n → ∞,
a.s.
Xk2 → E X12 = σ 2 + µ2
as
n → ∞,
k=1
from which it follows that Pn X Pnk=1 k2 = X k=1 k
1 n 1 n
Pn Xk a.s. µ Pnk=1 2 → 2 σ + µ2 k=1 Xk
n → ∞.
as
3. (a): The usual rewriting, the strong law, and the fact that the function x2 is continuous (at µx ), together yield n
s2n,x
n 1 X 2 ¯ n )2 a.s. → 1 · E X 2 − (E X)2 = σx2 Xk − (X = n−1 n
as
n → ∞.
k=1
a.s.
The proof that s2n,y → σy2 as n → ∞ is, of course, the same. (b): We assume w.l.o.g. that µx = µy = 0. Since {Xk ·Yk , k ≥ 1} are independent, identically distributed random variables with mean E(X1 Y1 ) = ρσx σy , rewriting and observing that the square root is continuous, together yield rn =
1 n
Pn
k=1 Xk Yk
n−1 n
¯ n · Y¯n −X
q
s2n,x · s2n,x
→
ρσx σy − 0 · 0 =ρ 1 · σx · σy
as
n → ∞.
4. For every k we have
P (Yk > y) =
k Y j=1
P (Xj > y) = (1 − y)k ,
0 < y < 1,
38
Chapter 6 so that (Theorem 2.12.1) Z
Z
1
1 y k dy = , k+1 0 0 Z 1 Z 1 Z 1 k k = 2 y(1 − y) dy = 2 (1 − y) dy − 2 (1 − y)k+1 dy k
(1 − y) dy =
E Yk = E Yk2
1
0
0
0
2 2 2 − = , k+1 k+2 (k + 1)(k + 2) 1 2 k 2 = − . (k + 1)(k + 2) k+1 (k + 1)2 (k + 2)
= Var Yk =
Since Y1 , Y2 , . . . are not independent we also need covariances. Let i < j and consider U = Yi and V = min{Xi+1 , Xi+2 , . . . , Xj }, so that Yi = U , and Yj = min{U, V }, and note that U and V are independent. Thus, fU,V (u, v) = i(1 − u)i−1 · (j − i)(1 − v)j−i−1 ,
0 < u, v < 1,
from which it follows that Z
1Z v
E(Yi Yj ) = E(U · min{U, V }) = u2 · i(1 − u)i−1 · (j − i)(1 − v)j−i−1 dudv 0 0 Z 1Z 1 + uv · i(1 − u)i−1 · (j − i)(1 − v)j−i−1 dudv 0 v Z 1 2 2 v(1 − v)i+1 + (1 − (1 − v)i+2 )} = {−v 2 (1 − v)i − i + 1 (i + 1)(i + 2) 0 ×(j − i)(1 − v)j−i−1 dv Z 1n o 1 + v(1 − v)i + (1 − v)i+1 · v(j − i)(1 − v)j−i−1 dv i+1 0 Z 1n o 2 1 1 − (1 − v)i+2 − v(1 − v)i+1 = (i + 1)(i + 2) i+1 0
= =
×(j − i)(1 − v)j−i−1 dv 2 2(j − i) j−i − − (i + 1)(i + 2) (i + 1)(i + 2)(j + 2) (i + 1)(j + 1)(j + 2) i+j+2 , (i + 1)(j + 1)(j + 2)
so that Cov (Yi , Yj ) =
i+j+2 1 1 i − · = . (i + 1)(j + 1)(j + 2) i + 1 j + 1 (i + 1)(j + 1)(j + 2)
Collecting all facts we obtain
E
n X k=1
Yk =
n X k=1
1 = log(n + 1) − 1 + γ + o(1) k+1
as
n → ∞,
39
The law of large numbers and Var
n X
Yk
=
k=1
=
n X k=1 n X k=1
≤
X
Var Yk + 2
Cov (Yi , Yj )
1≤i ε) = +∞ for all ε > 0, n=1
an application of the second Borel-Cantelli lemma shows that P (|Xn | > ε i.o.) = 1, and, thus, that the sum can not converge almost surely. (b): This is immediate from the fact that P (|Xn | > ε i.o.) = 1. Alternatively, since convergence or not is a tail event and X1 , X2 , . . . are independent, it follows from the Kolmogorov zero-one law that the sum converges with probability 0 or 1, so that (b) follows from the fact that we have ruled out almost sure convergence in (a). ♣ All of this, of course, under the tacit assumption that the distribution is not the δ(0)distribution.
10. The strong law implies that n
1X a.s. Xk → µ n
as
n → ∞,
k=1
P from which the conclusion follows; note for example, that P ( nk=1 Xk >
nµ 2
i.o.) = 1.
11. (a): Lemma A.6.1 implies that Sn (ω) →0 n
for almost all ω
as
n → ∞.
(b,c): The same solution works with the aid of the second half of Lemma A.6.1; recall also Example A.6.1.
(d): By Minkowski’s inequality,
‖Sn/n‖_r ≤ (1/n) Σ_{k=1}^n ‖Xk‖_r,
and Lemma A.6.1 does it once again.
Chapter 6 p
(e): Using the hint we have P (|Xn | > ε) → 0 as n → ∞, so that Xn → 0 as n → ∞. Set mn = [log2 n], where log2 n is the logarithm to the base 2. Since P (Xk = 1, k = 1, 2, . . . , mn − 1) =
mX n −1 k=1
1 = 2mn − 1 ≤ n − 1, 2k
it is necessary for the event {Sn > n} to occur that Xk 6= 0 for at least one k ∈ [mn , n], viz., n n \ S [ n Xk = 0 Xk 6= 0 = 1 − P P >1 = P (Sn > n) ≥ P n k=mn
k=mn
= 1−
n Y
1−
k=mn
1 mn − 1 =1− →1 n n
as
n → ∞,
which shows that a weak law does not hold. P P∞ 2 12. If ∞ n=1 µn and n=1 Pσ∞n are convergent, Kolmogorov’s convergence criterion immediately tells us that n=1 Xn is a.s. convergent. P If, on the other hand, ∞ n=1 Xn is a.s. convergent, then the sum is also convergent in distribution, in particular, the sequence of characteristic functions converges; ϕPn
k=1
Xk (t)
=
n Y k=1
n n n X 1 X 2o σk ϕXk (t) = exp it µk − 2 k=1
1 → exp itµ − σ 2 2
as
k=1
n → ∞,
for some constants µ and σ 2 (convergence implies that real and imaginary parts must converge). An application of the continuity theorem for characteristic functions thus shows that the sums of the parameters converge, and that the three statements are all true. Note that, in fact, characteristic functions could have been used in the first part too, since distributional convergence implies almost sure convergence (Theorem 6.5.8). 13. According to thePKolmogorov three-series theorem,Pa necessary condition for the a.s. ∞ convergence of ∞ n=1 P (Xn > A) < ∞ for some n=1 Xn is, to begin with, that A > 0. With A = 1/2 this means that ∞ X
P (Xn 6= 0) =
n=1
∞ X
(1 − pn ) =
n=1
∞ X
qn < ∞.
(S.6.2)
n=1
To prove that this is also a sufficient condition, suppose that (S.6.2) holds. Then qn → 0 and, hence, pn → 1 as n → ∞, so that pn > 1/2 (for example) for n > n0 for some n0 . Checking expected values and variances therefore yields ∞ X n=1 ∞ X n=1
E Xn
∞ X X qn X qn = ≤C+ ≤C +2 qn < ∞, pn pn n>n n>n
Var (Xn ) =
n=1 ∞ X
n=1
0
0
X qn X qn ≤ C + ≤ C + 4 qn < ∞. p2n p2n n>n n>n 0
0
43
The law of large numbers This means that the three three-series sums all converge, and, hence, that converges almost surely. To summarize: ∞ X
∞ X
⇐⇒
Xn converges a.s
n=1
P∞
n=1 Xn
qn < ∞.
n=1
Notice also that
∞ X
qn ≤
n=1
∞ ∞ X X qn qn ≤ , pn p2n
n=1
n=1
so that, if the sum of the variance converges, then so do the others. If, on the other hand, (S.6.2) holds, it follows from above that X qn X X qn qn . ≤ 2 ≤ 4 p2n pn n>n n>n n>n 0
0
0
In other words, the three sums converge and diverge together. Finally, by monotonicity, a.s. convergence implies that E
∞ X
Xn
n=1
∞ X qn = pn
and Var
∞ X
n=1
Xn
n=1
∞ X qn . = p2n n=1
14. Since a.s. convergence is equivalent to distributional convergence (Theorem 6.5.8) it suffices, according to the continuity theorem for characteristic functions, to check the latter, which, in turn, reduces the problem to Problem 5.14.13:
ϕ_{Yn}(t) = Π_{k=1}^n ϕ_{Xk}(ak t) = Π_{k=1}^n exp{−|ak t|} = exp{−( Σ_{k=1}^n |ak| ) |t|},
which converges properly iff Σ_{k=1}^∞ |ak| < ∞.
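♣ A small simulation aside (mine, not the author's): for standard Cauchy summands the partial sums Σ_{k≤n} ak Xk visibly stabilize when the weights are absolutely summable and keep fluctuating otherwise. The particular weights ak = 1/k² versus ak = 1/k, the path length and the "spread" diagnostic are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 10**5
x = rng.standard_cauchy(N)              # one path of i.i.d. standard Cauchy variables
k = np.arange(1, N + 1)

summable = np.cumsum(x / k**2)          # a_k = 1/k^2: sum |a_k| < infinity
non_summable = np.cumsum(x / k)         # a_k = 1/k:   sum |a_k| = infinity

for name, path in [("a_k = 1/k^2", summable), ("a_k = 1/k  ", non_summable)]:
    tail = path[N // 2:]                # how much the second half of the path still moves
    print(name, "spread of the last half:", tail.max() - tail.min())
# The 1/k^2 path typically shows a spread several orders of magnitude smaller
# than the 1/k path, consistent with a.s. convergence iff sum |a_k| < infinity.
```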
15. (a): Due to independence and the fact that the mean is 0 we have, 4
E(Sn )
=
n X
E Xk4 + 6
X
E Xi2 E Xj2
1≤i ε ≤ = = C 2. n (n2 ε)2 n4 ε 2 n
(c): We wish to show that ∞ X n=1
P (Tn > n2 ε) < ∞,
(S.6.3)
44
Chapter 6 and lean on the first Borel-Cantelli lemma. Now, 2
P (Tn > n ε) ≤ ≤
E Tn2 ≤ (n2 ε)2
P(n+1)2
k=n2 +1
(n + 1)2 − n n4 ε 2
E(Sk − Sn2 )2 n4 ε 2
2 2
E X2
=
P(n+1)2 =
k=n2 +1
(k − n2 )E X 2 n4 ε 2
(2n + 1)2 E X 2 1 ≤ C 2, 4 2 n ε n
which establishes (S.6.3). (d): This follows from (c) (or, else, by analogous computations) and the fact that Vn ≤ Tn /n2 , so that P (Vn > ε) ≤ P (Tn > n2 ε) ≤ · · · ≤ C
1 . n2
(e): Let n2 < k ≤ (n + 1)2 . Then, |Sk | |S 2 | + Tn a.s. →0 ≤ n 2 k n
as
n → ∞,
because of (b) and (c). Alternatively, using (d), the statement that Vn > ε occurs only finitely often implies that the event {|Sn /n| > ε} also occurs only finitely often, since every Vn contains only finitely many averages. ♣ Observe that the subsequence method was needed in order to avoid the harmonic series, and, at the same time, how the Kahane-Hoffman-Jørgensen inequality beautifully avoids the harmonic series as well as working via subsequences in, e.g., the proof of the Hsu-Robbins-Erd˝ os strong law, Theorem 6.11.2.
16. Since, in the previous problem, ◦ we made the (unnecessary) assumption that the fourth moment exists, ◦ the computations involving moments remain true for uncorrelated sequences, ◦ only the first of the Borel-Cantelli lemmas was exploited, we can inherit the solution from there except for one change: The summands are not equidistributed here. This is taken care of as follows: sup E Xk4 ≤ M k
sup E Xk2 k
by assumption, q √ ≤ sup E Xk4 ≤ M by Lyapounov’s inequality. k
17. Let A denote the value of the integral, and let Ik, k ≥ 1, be indicator variables such that Ik = 1 if point k falls below the curve, and Ik = 0 otherwise. The indicators are independent, identically Be(A)-distributed random variables, and the number of points falling below the curve equals Un = Σ_{k=1}^n Ik (∈ Bin(n, A)). The conclusion therefore is immediate from the strong law of large numbers.
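♣ A minimal hit-or-miss Monte Carlo sketch of this argument (my own aside): Un/n estimates the area A below the curve, and the strong law drives the estimate to A. The test integrand f(x) = x² on the unit square (true area 1/3) and the sample sizes are assumptions chosen only for the demonstration.

```python
import numpy as np

def hit_or_miss(f, n, rng):
    """Estimate the area under f on [0,1] (with 0 <= f <= 1) by the fraction
    of uniform points in the unit square that fall below the curve."""
    x = rng.random(n)
    y = rng.random(n)
    indicators = y <= f(x)          # the I_k of the solution
    return indicators.mean()        # U_n / n

rng = np.random.default_rng(3)
f = lambda t: t**2                  # assumed test integrand, true area 1/3
for n in [10**2, 10**4, 10**6]:
    print(n, hit_or_miss(f, n, rng))    # should approach 1/3 as n grows
```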
18. (a): The procedure is to cut the remaining piece at a point which is uniformly distributed over the current length, which means that X1 ∈ U (0, 1),
X2 | X1 = x ∈ U (0, x),
...,
Xn | Xn−1 = x ∈ U (0, x),
....
Due to the linearity of the uniform distribution – if Z ∈ U (a, b), then cZ + d ∈ U (ca + d, cb + d) – we may rewrite the procedure as X1 ∈ U (0, 1),
Xn = Yn Xn−1 ,
n ≥ 2,
where Y1 , Y2 , . . . are independent, identically U (0, 1)-distributed random variables, which, in turn, by iteration, leads to Xn =
n Y
Yk .
k=1
(b): Taking logarithms transforms the product into a sum; log Xn = Σ_{k=1}^n log Yk, and letting
µ = E(log Y1) = ∫_0^1 log y dy = [y log y − y]_0^1 = −1,
an application of the strong law of large numbers yields
(log Xn)/n = (1/n) Σ_{k=1}^n log Yk → −1  a.s.  as n → ∞.
♣ After exponentiating we also have Xn^{1/n} → e^{−1} a.s. as n → ∞.
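♣ A short simulation aside (not from the book): generating Xn as a product of independent U(0, 1) variables, the quantity Xn^{1/n} should concentrate near e^{−1} ≈ 0.3679. The number of steps and replications below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
n, replications = 1000, 200

# Each row is one run of the scheme: X_n = Y_1 * ... * Y_n with Y_k ~ U(0,1).
# Work with logarithms to avoid underflow of the tiny product.
log_Xn = np.log(rng.random((replications, n))).sum(axis=1)
nth_roots = np.exp(log_Xn / n)                  # X_n^{1/n} for each run

print("mean of X_n^(1/n):", nth_roots.mean())   # should be close to exp(-1)
print("exp(-1)          :", np.exp(-1.0))
```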
(c): The first piece we throw away must also be U (0, 1)-distributed. After all, the pieces we keep and throw away are obtained the same way; pick a point at random and ... keep/throw away ... Computationally: If X1 is the first piece we keep, then 1 − X1 is the first piece we throw away, and F1−X1 (x) = P (1 − X1 ≤ x) = P (X1 ≥ 1 − x) = 1 − (1 − x) = x, 19. (a): By Chebyshev’s inequality, Pn Pn S Var S 2 n3 n n k=1 Var Xk k=1 k P 2 > ε ≤ 2 2 = = ≤ C →0 n (n ε) n4 ε 2 n4 ε 2 n4
0 < x < 1.
as
n → ∞.
(b): The stronger result follows the pattern of the solution of Problem 6.13.15 (with second moments). ∞ ∞ ∞ P 2 ∞ Pn2 S 2 2 X X X Var (Sn2 ) X nk=1 Var Xk n k=1 k P 2 2 > ε ≤ = = (n ) (n4 ε)2 n8 ε 2 n8
n=1
≤
n=1 ∞ X
n=1
n=1
∞ X 1 C 8 =C < ∞. n n2
n6
n=1
n=1
46
Chapter 6 With a modified notation of Problem 6.13.15 – Vn = maxn2 ≤k≤(n+1)2 |Sk /k 2 |, and Tn as before, we next obtain P(n+1)2 E(Sk − Sn2 )2 E T 2 2 P (Vn > ε) ≤ P (Tn > n4 ε) ≤ 4 n 2 ≤ k=n +1 8 2 (n ε) n ε P(n+1)2 2 P(n+1)2 Pk 2 2 2 (n + 1) − n j 2 +1 j 2 j=n j=n2 +1 k=n +1 = ≤ n8 ε 2 n8 ε 2 (2n + 1) (n + 1)6 − n6 (2n + 1) · n5 1 ≤ C ≤ C 2, ≤ C 8 8 n n n so that by summation and the first Borel-Cantelli lemma, P (Vn > ε i.o.) = 0. The conclusion now follows as in Problem 6.13.15(e). Alternatively, since E Xk = 0 and Var Xk = k 2 for k = 1, 2, . . ., it follows that ∞ X k=1
Var
X k k2
=
∞ X 1 < ∞, k2 k=1
so that, via the Kolmogorov convergence criterion (Theorem 6.5.2), ∞ X Xk k=1
k2
converges almost surely.
An application of the Kronecker lemma finishes the proof. 20. WLLN: We follow the previous solution: Pn Pn S 2β Var Sn n γ k=1 Var Xk k=1 k P γ >ε = P (|Sn | > n ε) ≤ γ 2 = = n (n ε) n2γ ε2 n2γ ε2 n2β+1 ≤ C 2γ → 0 as n → ∞, n provided β + 1/2 < γ. With the aid of the Marcinkiewicz-Zygmund inequalities (Theorem 3.8.1) it is also possible to prove laws of large numbers by assuming a finite moment of some order r ∈ (1, 2), and then make a trade off between the parameters β, γ, and r: Thus, let r ∈ (1, 2), and suppose that E|X|r < ∞. Then P P S Br nk=1 E|Xk |r Br nk=1 k rβ E|X|r E|Sn |r n P γ >ε ≤ = = n (nγ ε)r nrγ εr nrγ εr nrβ+1 ≤ C rγ → 0 as n → ∞, n provided β + 1/r < γ. SLLN: To prove a strong law we begin by checking two of the three sums of the Kolmogorov three-series theorem. Set
( Xk , Yk = 0,
for |Xk | ≤ k γ , otherwise,
for k = 1, 2, . . . .
47
The law of large numbers The first sum equals ∞ ∞ ∞ X X X X k P γ >1 = P (k β |X| > k γ ) = P (|X| > k γ−β ) < ∞, k k=1
k=1
k=1
provided E |X|1/(γ−β) < ∞ (we assume that γ > β). Under this moment assumption we now check the sum of the truncated variances. Following Lemma 6.6.1, with 1/(γ − β) ∈ (0, 2) here playing the role of r there, we obtain ∞ X k=1
Var
Y k γ k
≤ ≤
∞ X EY2 k=1 ∞ X
k k 2γ
=
∞ X E k 2β X 2 I{|k β X| ≤ k γ }
k 2γ
k=1
E X 2 I{|X| ≤ k γ−β } ≤ · · · ≤ C + CE|X|1/(γ−β) < ∞. 2(γ−β) k k=1
This means that
∞ X Yk − E Yk k=1
< ∞,
kγ
and, in view of Kronecker’s lemma, that n 1 X a.s. (Yk − E Yk ) → 0 γ n
as
n → ∞.
k=1
Next in line are the truncated means, and for these we copy the proof of the Marcinkiewicz-Zygmund strong law, Theorem 6.7.1: for 0 < γ − β < 1 check page 299, for γ − β = 1 check page 296, and for 1 < γ − β < 2 check page 300, to obtain E Yk → 0
n → ∞,
as
so that, by Lemma A.6.1, n 1 X E Yk → 0 nγ
as
n → ∞,
k=1
implying that n 1 X a.s. Yk → 0 nγ
as
n → ∞,
k=1
and, finally, by convergence equivalence (convergence of the first Kolmogorov threeseries sum), that n 1 X a.s. Xk → 0 γ n
as
n → ∞.
(S.6.4)
k=1
However, we can do a bit more. Namely, suppose that (S.6.4) holds. Then, following the proof of the converses in the strong laws, we obtain Xn a.s. →0 nγ
as
n → ∞,
48
Chapter 6 d
which, by the second Borel-Cantelli lemma, and the assumption that Xn = nβ X, tells us that ∞ ∞ X X ∞> P (|Xn | > nγ ) = P (|X| > nγ−β ), n=1
n=1
E|X|1/(γ−β)
so that < ∞, from which it follows that the mean is zero whenever finite via the strong law, cf. page 301. We have thus shown that, if E|X|1/(γ−β) < ∞, where 0 < γ − β < 2 and E X = 0 whenever finite (i.e. for γ − β ∈ [1, 2)), then the strong law (S.6.4) holds, and that if the strong law holds, then E|X|1/(γ−β) < ∞ and E X = 0 whenever finite. ♣ With β = 1 and γ = 1/r the result reduces to the Marcinkiewicz-Zygmund strong law, Theorem 6.7.1, and for β = 1, γ = 2 we rediscover the previous problem, however, with the weaker assumption of finite mean only.
21. We assume throughout w.l.o.g. that A = 1; this is only a matter of scaling. WLLN: Set, for 1 ≤ k ≤ n, ( ak Xk , Yk = 0,
for |Xk | ≤ n, otherwise.
We check the conditions of Theorem 6.3.3: n n 1 X 1 1 X Var Y ≤ E(ak Xk )2 I{|Xk | ≤ n} ≤ E X 2 I{|X| ≤ n} k 2 2 n n n k=1
k=1
n−1
≤
4X 2 + jP (|X| > j) → 0 n n
as
n → ∞.
j=1
The last inequality, and the convergence to 0 have been stolen from the proof of the Kolmogorov-Feller weak law, Theorem 6.4.1. As for the second condition, n X
P (|ak Xk | > n) ≤ nP (|X| > n) → 0
as
n → ∞.
as
n → ∞,
k=1
♣ Up to this point we have shown that n p 1X ak Xk − E(ak Xk I{|Xk | ≤ n}) → 0 n
(S.6.5)
k=1
without making use of the fact that the mean is finite, only that nP (|X| > n) → 0 as n → ∞ (that is, assumption (6.4.1)). We have thus, in fact, extended the KolmogorovFeller law to the present setting.
Finally, since E X = 0 (!), we obtain n 1 X E ak Xk I{|Xk | ≤ n} = n k=1
≤
n 1 X −E ak Xk I{|Xk | > n} n k=1
n 1X E|ak Xk |I{|Xk | > n} n k=1
≤ E|Xk |I{|Xk | > n} → 0 which, inserted into (S.6.5) finishes the proof.
as
n → ∞,
The law of large numbers
Alternatively, using characteristic functions, Theorem 4.4.2, and the boundedness of the weights, we obtain ϕ(ak t/n) = 1 +
ak it ak |t| A|t| ·0+o =1+o n n n
as
n → ∞,
so that ϕ1 n
(t) k=1 ak Xk
Pn
=
n Y k=1
A|t| n ϕ(ak t/n) = 1 + o →1 n
as
n → ∞.
An application of the continuity theorem for characteristic functions finishes the proof. SLLN: This time our truncation is ( ak Xk , for |Xk | ≤ k, Yk = 0, otherwise,
for k = 1, 2, . . . .
To prove a strong law we begin by checking two of the three conditions of the Kolmogorov three-series theorem. Parallelling the proof of the Kolmogorov strong law we have ∞ ∞ ∞ Y X X X k P >1 ≤ P (|Xk | > k) = P (|X| > k) < ∞, k k=1
k=1
∞ X k=1
Var
Y k
k
≤
k
k=1
∞ X EY2 k=1 ∞ X
≤
k=1
k2
=
∞ X E X 2 I{|Xk | ≤ k} k
k2
k=1
E X 2 I{|X| ≤ k} ≤ · · · ≤ C + CE|X| < ∞. k2
From here we conclude that ∞ X Yk − E Yk k=1
k
< ∞,
so that, by Kronecker’s lemma, n
1X a.s. (Yk − E Yk ) → 0 n
as
n → ∞.
k=1
Next, we must get rid of the truncated mean: |E Yk| = |− E ak Xk I{|Xk| > k}| ≤ E |Xk| I{|Xk| > k} = E |X| I{|X| > k} → 0
as
k → ∞,
so that (remember Lemma A.6.1), n
1X E Yk → 0 n
as
n → ∞,
k=1
which, together with (S.6.6), implies that n
1 X a.s. Yk → 0 n k=1
as
n → ∞,
(S.6.6)
50
Chapter 7 and, finally, by convergence equivalence (convergence of the first Kolmogorov threeseries sum), that n 1X a.s. ak Xk → 0 as n → ∞. n k=1
22. Here is a slight variation of Lemma A.6.1. Lemma S.6.1 Suppose that an ∈ R, n ≥ 1. If max1≤k≤n |ak | →0 n
an n
≤ ≤
n → ∞.
as
proof. Given an arbitrary ε > 0, we know that It follows that, for n > n0 , max1≤k≤n |ak | n
→ 0 as n → ∞, then
|an | n
< ε as soon as n > n0 = n0 (ε).
max1≤k≤n0 |ak | maxn0 ≤k≤n |ak | + n n max1≤k≤n0 |ak | |ak | C + max ≤ + ε, n0 ≤k≤n k n n
so that lim sup n→∞
max1≤k≤n |ak | ≤ ε, n 2
which does it, since ε can be made arbitrarily small. a.s.
Now, since E|X| < ∞ we know from Proposition 6.1.1 that Xnn → 0 as n → ∞, so a.s. that by Lemma S.6.1 it follows that Ynn → 0 as n → ∞. From the strong law we a.s. also know that Snn → µ as n → ∞, so that, finally Yn Yn /n a.s. 0 = → =0 Sn Sn /n µ
7
as
n → ∞.
The Central Limit Theorem 1. Suppose that
P∞
n=1 pn (1
− pn ) =
P∞
n=1 Var Xn
= +∞, and let r > 2. Then
E|X1 − E X1 |r = |0 − pk |r · (1 − pk ) + |1 − pk |r · pk = pk (1 − pk ) pr−1 + (1 − pk )r−1 ≤ pk (1 − pk ) pk + (1 − pk ) k = pk (1 − pk ), so that an application of Lyapounov’s condition yields Pn Pn r k=1 pk (1 − pk ) k=1 E|Xk − E Xk | β(n, r) = r/2 ≤ Pn r/2 Pn k=1 Var Xk k=1 pk (1 − pk ) n X 1−r/2 = pk (1 − pk ) → 0 as n → ∞. k=1
This shows that the central limit theorem holds in this case. If, on the other hand, the sum of the variances converges, we know from Remark 7.2.5 that asymptotic normality does not hold.
51
The central limit theorem 2. Suppose that
P∞
1 n=1 mn
= +∞. Then (
m
E|X1 |
m
= E|X1 − E X1 |
=
1 m+1 ,
when m is even, when m is odd.
0,
In order to apply Lyapounov’s condition with r = 4 we note, using the cr -inequality, Theorem 3.2.2, (with r = 4) that, 1 1 4 E|X1m − E X1m |4 ≤ 8 E X14m + (E|X1m |)4 ≤ 8 + 4m + 1 m+1 8 10 2 + = when m is even, ≤ m m m and that E|X1m − E X1m |4 = E X14m =
1 4m + 1
when m is odd,
and, hence, that E|X1m − E X1m |4 ≤
10 m
for all
m.
Moreover, 1 1 2 = − 2m + 1 m+1 ( 1 , ≤ 2m+1 m2 when m is even, 1 (2m + 1)(m + 1)2 ≥ 4(2m+1) ,
Var (X1m ) = E X12m − E X1m =
2
and Var (X1m ) = E X12m = so that
(
1 2m+1 , 1 4(2m+1)
≤ ≥
Var (X1m )
1 2m + 1
≥
when m is odd,
for all
4 3m ,
m.
With this in mind, the two bounds just obtained yield
β(n, 4) = Σ_{k=1}^n E|Xk^{mk} − E Xk^{mk}|^4 / ( Σ_{k=1}^n Var(Xk^{mk}) )^2 ≤ C ( Σ_{k=1}^n 1/mk ) / ( Σ_{k=1}^n 1/mk )^2 = C ( Σ_{k=1}^n 1/mk )^{−1} → 0  as n → ∞.
The central limit theorem thus holds in this case. P 1 If, on the other hand, ∞ n=1 mn < ∞, then our estimates for the variances tell us that the sum of the variances converges, and Remark 7.2.5 therefore that asymptotic normality does not hold. 3. We have E Xk = 0, and, for r > 0, E|Xk − E Xk |r = E|Xk |r = k rα . P Via Remark 7.2.5 we know that ∞ k=1 Var Xk = ∞ is necessary for the central limit theorem to hold, which, in the present case, implies that −2α ≥ 1, that is, that α ≥ −1/2.
52
Chapter 7 Therefore, suppose that this is the case, and check the Lyapounov condition: Pn Pn rα r k=1 k k=1 E|Xk − E Xk | β(n, r) = r/2 = Pn Pn 2α r k=1 k k=1 Var Xk rα+1 C n2α+1 = Cn1−r/2 , for α > −1/2, (n )r ∼ C n−r/2+1 , for α = −1/2, (log n)r/2 → 0
n → ∞.
as
Here we have used Lemma A.3.1 for ∼. 4. We have E Xk = 0, and E|Xk − E Xk |r = E|Xk |r =
2 k αr . r+1
Thus, ∞ X k=1
∞
2 X 2α Var Xk = k 3 k=1
( = ∞,
for α ≥ −1/2,
< ∞,
for α < −1/2,
that is, α ≥ −1/2 is necessary for the central limit theorem. Suppose in the following that α ≥ −1/2, and check Lyapounov’s condition with an eye on Lemma A.3.1: Pn β(n, r)
r k=1 E|Xk − E Xk | r/2 Pn k=1 Var Xk
= ( ∼ → 0
=
2 Pn αr k=1 k r+1 2 Pn 2α r k k=1 3
αr+1
C (nn2α+1 )r = n1−r/2 , n−r/2+1 (log n)r ,
as
for α > −1/2,
for α = −1/2,
n → ∞.
Thus, for α ≥ −1/2, Pn Xk d pPk=1 → N (0, 1) n 2α k=1 k
n → ∞,
as
which can be rewritten (Lemma A.3.1) as P (α + 12 ) nk=1 Xk d → N (0, 1) nα+(1/2) and as
Pn
k=1 Xk
log n
d
→ N (0, 1)
as
as
n → ∞,
n → ∞,
when α > −1/2,
when α = −1/2.
5. Noticing that n X k=1
Sk =
n X k X k=1 j=1
Xj =
n X n n X X 1 Xj = (n − (j − 1))Xj , j=1
k=j
j=1
53
The central limit theorem we first obtain E
n X
Sk
= 0
Sk
=
and
k=1
Var
n X
n X
(n + 1 − j)2
n X
j=1
k=1
Var Xj = σ 2
j=1
n X
j 2 ∼ σ2
j=1
n3 , 3
where we used Lemma A.3.1 in the last step. The natural guess thus is that no centering should be madep and that γ = 3/2 (more precisely, that, in fact, the correct normalization should be σ 2 n3 /3). Using these guesses we obtain, via characteristic functions – more conveniently, their logarithms –, and expansion around 0 (Section 4.4) log ϕPn Sk /n3/2 (t) = log ϕPnk=1 Sk (t/n3/2 ) = log ϕPnj=1 (n+1−j)Xj (t/n3/2 ) k=1 n n Y X = log ϕ(n+1−j)Xj (t/n3/2 ) = log ϕX (jt/n3/2 ) j=1
j=1
n X
1 jt 2 2 σ + r log 1 − j 2 n3/2 j=1
=
= 1−
n n σ 2 t2 X 2 X j + rj,n + Rn , 2n3 j=1
j=1
where rj,n , 1 ≤ j ≤ n, n ≥ 1, are the o( · )-terms stemming from the approximations of the individual characteristic functions, and Rn is the error stemming from the expansion of the logarithm. Since Pn
j=1 j n3 /3
2
→1
as
n → ∞,
(S.7.1)
we have shown, so far, that, log ϕPn
k=1
Sk
/n3/2
n X t2 σ 2 + o(1) + rj,n + Rn (t) = 1 − · 2 3
as
n → ∞, (S.7.2)
j=1
and it remains to take care of the remainders. Suppose, for a second, that we have shown that the remainders vanish asymptotically. It then would follow that log ϕPn
k=1
Sk
/n3/2
Sk
/n3/2
t2 σ 2 (t) → 1 − · 2 3
as
n → ∞,
or, equivalently, that ϕPn
k=1
t2 σ 2 3
(t) → e− 2 ·
as
n → ∞,
which, in view of the continuity theorem for characteristic functions, would tell us that Pn k=1 Sk d → N (0, σ 2 /3) as n → ∞, n3/2
54
Chapter 7 which, in turn, coincides exactly with the more precise guess above. Thus, to close the problem we therefore wish to show that n X
rj,n + Rn → 0
n → ∞.
as
(S.7.3)
j=1
From Section 4.4 we know that ϕX (t) = 1 − (S.7.1) and Lemma A.3.1) shows that n X
rj,n =
j=1
t2 2
+ o(t) as t → 0, which (remember
n n X jt 2 X jt 2 2 2 σ = o σ = o(1) o n3 n3
as
n → ∞.
j=1
j=1
This takes care of the first term in (S.7.3). Next, (Lemma A.3.1 again) n n n 2 t4 σ 4 X X 1 X 1 jt 2 2 4 2 ≤ σ + r j + rj,n j,n 2 2 n3/2 4n6 j=1 j=1 j=1 n n X j4 C C X j 2 t2 2 2 + o + o σ = n n3 n n6 j=1 j=1 C C as n → ∞, +o n n
|Rn | ≤ ≤ =
which takes care of the second term in (S.7.3), and we are done.
6. Let X1, X2, . . . be independent Po(1)-distributed random variables, and set Sn = Σ_{k=1}^n Xk, n ≥ 1. Then, by the central limit theorem,
LHS = P(Sn ≤ n) = P( (Sn − E Sn)/√(Var Sn) ≤ 0 ) → Φ(0) = 1/2 = RHS  as n → ∞.
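♣ A quick numerical aside (mine, assuming SciPy is available): P(Sn ≤ n) for Sn ∈ Po(n) should drift towards 1/2; the values of n below are arbitrary.

```python
from scipy.stats import poisson

# P(S_n <= n) for S_n ~ Po(n); the limit claimed above is 1/2.
for n in [1, 10, 100, 1000, 10000]:
    print(n, poisson.cdf(n, mu=n))
# The probabilities stay slightly above 1/2 for finite n and decrease slowly towards it.
```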
7. Let n ≥ 2. We have log Yn =
n n 1 X 1 X 1 √ log(1 + Xk ) = √ Xk − Xk2 + Rk 2 log n k=1 log n k=1
n 1 X 1 Xk − + Rk 2k log n k=1 Pn Pn Xk Rk log n + γ + o(1) k=1 √ √ − + √k=1 = log n 2 log n log n γ + o(1) 1 = Σ1 − log n − √ + Σ2 . 2 2 log n
=
√
Now, E
n X
Xk = 0,
Var
k=1 n X k=1
n X k=1
3
E|Xk − E Xk | =
n X k=1
Xk
n X 1 = ∼ log n k
E|Xk |3 =
k=1 n X k=1
as
k −3/2 ∼ Cn−1/2
n → ∞, as
n → ∞.
55
The central limit theorem Thus, Pn β(n, 3) =
− E Xk |3 n−1/2 →0 ∼ C 3/2 (log n)3/2 Var Σ1
k=1 E|Xk
so that
d
Σ1 → N (0, 1)
as
n → ∞,
n → ∞;
as
note that we, in fact, already know this from Problem 7.8.3. As for Σ2 we notice that the expansion of the logarithm is an alternating series, which implies that n X
|Rk | ≤
k=1
n X |Xk |3 k=1
so that
a.s.
3
Σ2 → 0
=
n X k −3/2
3
k=1
as
∼ Cn−1/2 ,
n → ∞.
Taking the remaining terms into account we have shown that 1p d log n → N (0, 1) as n → ∞, 2 √ √ 1/√log n , can be rewritten as which, upon noticing that 12 log n = log n √ 1/√log n d n log Yn · → N (0, 1) as n → ∞, log Yn +
that is,
n 1/√log n d √ Y n (1 + Xk ) → LN (0, 1)
as
n → ∞.
k=1
8. (a): Let t > 0. Since t → 0 later, we suppose that t < 1 already now. Z Z ∞ Z ∞ cos tx 1 − (1 − cos tx) itx 1 ϕ(t) = e dx = 2 dx = 2 dx 3 3 |x| x x3 |x|>1 1 1 Z 1/t Z ∞ 1 − cos tx 1 − cos tx = 1−2 dx − 2 dx 3 x x3 1/t 1 = 1 − 2I1 − 2I2 . Noticing that tx < 1 in the first integral, we have Z I1 =
1/t 1
− (1 −
1
=
(tx)2 2 + x3
O((tx)4 )
t2 dx = 2
Z
1/t
1
1 dx + O(1) x
t2 log 1/t + O(t2 ). 2
The second integral is Z
∞
I2 = 2 1/t
1 − cos tx dx ≤ 2 x3
Z
∞
1/t
1 dx = t2 . x3
Putting the pieces together, recalling symmetry, yields 1 ϕ(t) = 1 − t2 log + O(1) as |t|
t → 0.
Z 1
1/t
1 dx x3
Chapter 7 (b): Using (a) and the usual rules we obtain n p ϕSn /√n log n (t) = ϕ(t/ n log n) t2 √n log n t 2 n = 1− log +O n log n |t| n log n t 2 n t2 1 1 log n + log log n − log |t| + O = 1− n log n 2 2 n log n n 2 t 1 2 = 1− +o → e−t /2 = ϕN (0,1) (t) as n → ∞. 2n n An appeal to the continuity theorem for characteristic functions finishes the proof. ♣ Note that Var X1 does not exist. Asymptotic normality (although with a slightly stronger normalization) still holds, although with a slightly modified normalization. More about this can be found in Chapter 9.
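♣ The following simulation aside (mine, not the author's) illustrates the conclusion just proved: sampling from the symmetric density f(x) = 1/|x|³, |x| > 1, via the inverse transform |X| = U^{−1/2} with an independent random sign (a sampling recipe I add for the illustration), the normalized sums Sn/√(n log n) look roughly standard normal even though Var X1 = ∞. Because of the logarithmic normalization the convergence is slow, so at moderate n the agreement is only approximate.

```python
import numpy as np

rng = np.random.default_rng(6)

def sample_X(size):
    """Symmetric density f(x) = 1/|x|^3 for |x| > 1:
    |X| = U^(-1/2) by inverse transform, with an independent random sign."""
    magnitude = rng.random(size) ** -0.5
    sign = rng.choice([-1.0, 1.0], size=size)
    return sign * magnitude

n, replications = 10_000, 2_000
normalized = np.empty(replications)
for i in range(replications):                 # loop to keep memory modest
    normalized[i] = sample_X(n).sum() / np.sqrt(n * np.log(n))

# Distribution-level summaries are compared with N(0,1); moments are avoided,
# since Var X_1 (and hence Var S_n) is infinite for every n.
print("P(|Z| <= 1.96) ~", np.mean(np.abs(normalized) <= 1.96), "(normal: 0.95)")
q75, q25 = np.percentile(normalized, [75, 25])
print("interquartile range ~", q75 - q25, "(normal: about 1.35)")
```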
9. (a): By symmetry, for k = 1, 2, . . ., E Yk = 0, Var Yk = E
Yk2
Z =2 1
bk
x2 dx = 2 x3
Z 1
bk
1 dx = 2 log bk . x
(b): As for third moments (for example), 3
3
Z
bk
E|Yk − E Yk | = E|Yk | = 2
dx = 2(bk − 1), 1
so that
P 2 nk=1 (bk − 1) β(n, 3) = Pn . (2 k=1 log bk )3/2 √ For b1 = b2 = 1 and bk = k log k for k ≥ 3 our computations turn into E Y1 = E Y2 = Var Y1 = Var Y2 = 0, E Yk = 0,
Var Yk = log k + 2 log log k, √ E|Yk − E Yk |3 = 2( k log k − 1), k ≥ 3, n X E Yk = 0, Var
k=1 n X
n X Yk = (log k + 2 log log k) ∼ n log n
k=1 n X
E|Yk − E Yk |3 = 2
k=1
as
n → ∞,
k=3 n √ X 2 ( k log k − 1) ∼ 2 · n3/2 log n 3
as
n → ∞,
k=3
so that β(n, 3) ∼ C
n3/2 log n = C(log n)−1/2 → 0 (n log n)3/2
as
n → ∞.
This shows that Lyapounov’s condition is satisfied, and we conclude that Pn Y d √ k=1 k → N (0, 1) as n → ∞. n log n
57
The central limit theorem (c): It remains (Theorem 5.1.2) to show that tionally equivalent. However, ∞ X
P (Xk 6= Yk ) = 2 +
k=1
= 2+
∞ X k=3 ∞ X k=1
P (|Xk | >
Pn
k=1 Xk
and
√ k log k) = 2 + 2
Pn
k=1 Yk
∞ Z X
are distribu-
∞
√ k=3
k log k
1 dx x3
1 < ∞, k(log k)2
which, in fact, shows that X1 , X2 , . . . and Y1 , Y2 , . . . are convergence equivalent, in particular (Theorem 6.1.1), that Pn
P − nk=1 Yk a.s. →0 n log n
Xk k=1 √
as
n → ∞.
The desired conclusion follows. Alternatively, if, for each fixed n ≥ 3 we truncate X1 , X2 , . . . , Xn at bn = then (please check!) E
n X
Yk = 0,
Var
k=1 n X
n X
√
n log n,
Yk = n(log n + 2 log log n),
k=1
E|Yk − E Yk |3 = 2(n3/2 log n − n),
k=1
β(n, 3) ∼ 2(log n)−1/2 → 0
as
n → ∞,
so that
Pn Y d √ k=1 k → N (0, 1) as n → ∞. n log n P P In this case distributional equivalence of nk=1 Xk and nk=1 Yk is established as follows: P
n X
Xk 6=
k=1
n X
Yk
≤
k=1
n X
P (Xk 6= Yk ) = n · P (|X1 | >
√
n log n)
k=1
= n·
1 →0 n(log n)2
as
n → ∞,
and we are done. 10. Let X1 , X2 , . . . be independent, identically distributed random variables with mean 0 and variance 1 (this is no restriction) and partial sums Sn , n ≥ 1. Then S S √2n = √ n + 2n 2n
P2n 1 Sn 1 S0 k=n+1 Xk √ = √ · √ + √ √n , n 2n 2 2 n
d
where Sn0 = S2n − Sn = Sn , and where Sn and Sn0 are independent and identically distributed. Now, suppose, to the contrary, that convergence in probability holds in the central limit theorem. Then, letting N1 and N2 denote standard normal random variables,
58
Chapter 7 we have S √2n 2n Sn √ n 0 S √n n so that S0 √n n
(
p
→ N1 p
→ N1 p
→ N2
as
n → ∞,
as
n → ∞,
as
n → ∞,
p
→ N2 on the one hand, p √ → 2N1 − N1 on the other hand,
which implies that N2 =
√
2 − 1 N1
a.s.
But this is impossible, for example, since the variances of the quantities in the LHS and the RHS are different. 11. (a): Since at each step only a single row is involved, the proof for sequences carries over verbatim (except for notational changes). 2 (b): This is a matter of scaling. If E Xn,j = µn,j and Var Xn,j = σn,j one can introduce new random variables
Yn,j =
Xn,j − µn,j , sn
and (7.2.21) is satisfied anew. (c): Put Xn,j =
Xj −µj sn ,
1 ≤ j ≤ n, n ≥ 1.
12. We suppose w.l.o.g. that all expectations are 0 (otherwise, center, and replace An by 2An ). Checking the Lyapounov condition with r = 3 yields P Pn 2 3 An nk=1 E Xn,k An k=1 E|Xn,k | β(n, 3) = ≤ = → 0 as n → ∞. s3n s3n sn 13. We wish to show that the Lindeberg condition L2 is satisfied. This is achieved by modifying the proof of Lyapounov’s theorem (which amounted to showing that the Lyapounov condition implies L2 ). Toward that end, let ε > 0. Since, by assumption, x2 /g(x) & 0 as x → ∞, we obtain (parallelling the proof of Lyapounov’s theorem), L2 (n)
= ≤
n n Xk2 1 X 1 X 2 E X I{|X | > εs } ≤ E g(X ) · I{|Xk | > εsn } n k k k s2n s2n g(Xk ) k=1 k=1 P n ε2 nk=1 E g(Xk ) 1 (εsn )2 X E g(Xk )I{|Xk | > εsn } ≤ s2n g(εsn ) g(εsn ) k=1
→ 0
as
n → ∞.
♣ Do check the case g(x) = |x|r for r > 2.
14. Let X1 , X2 , . . . be independent, identically distributed random variables, distributed as X (and Y ), and set n
2 1 X Vn = √ Xk . 2n k=1
59
The central limit theorem We claim that d
Vn = X
for all
n,
(S.7.4)
and intend to prove this by induction on n. For n = 1 the statement reduces to the assumption, so suppose we are done up to n, and consider Vn+1 . Splitting at 2n yields Vn+1
1 =√ 2
P2n+1 P2n Xk V 0 + V 00 n Xk k=2 k=1 √ √ = n√ n . + 2n 2n 2
By noticing that 2n+1 − 2n = 2n , we obtain, via the induction hypothesis, that Vn0 and Vn00 are independent and distributed as X. This establishes (S.7.4). P d Now, let n → ∞. By the central limit theorem we know that √1n nk=1 Xk → N (0, 1) as n → ∞. Since this remains true for subsequences it follows that d
Vn → N (0, 1) d
n → ∞.
as
d
On the other hand, Vn = X, so that X → N (0, 1) as n → ∞, that is, X ∈ N (0, 1). An alternative is to solve the problem with the aid of characteristic functions: Set ϕ(t) = ϕX (t). Then, by assumption, √ √ √ 2 √ ϕ(t) = ϕ(X+Y )/√2 (t) = ϕ(X+Y ) (t/ 2) = ϕX (t/ 2) · ϕY (t/ 2) = ϕ(t/ 2) , which, after iteration and an application of Theorem 4.4.2, yields ϕ(t)
=
t2 2n √ 2n 1 t2 ϕX (t/ 2n ) = 1− √ +o √ 2 2n 2n 2 /2
→ e−t
= ϕN (0,1) (t)
as
n → ∞,
so that, in fact, ϕ = ϕN (0,1) . The uniqueness theorem for characteristic functions, finally, tells us that X ∈ N (0, 1). 15. (a): By independence and symmetry (ϕ(t) = ϕ(t)), ϕ(2t) = ϕ2X (t) = ϕ(X+Y )+(X−Y ) (t) = ϕX+Y (t) · ϕX−Y (t) = ϕX (t) · ϕY (t) · ϕX (t) · ϕ−Y (t) = (ϕ(t))4 . (b): Rewriting this as ϕ(t) = (ϕ(t/2))4 , iterating, and applying Theorem 4.4.2 yields √ 4n 42 4n ϕ(t/22 ) = · · · = ϕ(t/2n ) = ϕ(t/ 4n ) t2 4n 1 t2 2 = 1− · n +o n → e−t /2 = ϕN (0,1) (t) as 2 4 4
ϕ(t) =
n → ∞.
We have thus shown that ϕ = ϕN (0,1) , which, by the uniqueness theorem for characteristic functions, proves that X is standard normal. 16. (a): By independence, ϕ(2t) = ϕ2X (t) = ϕ(X+Y )+(X−Y ) (t) = ϕX+Y (t) · ϕX−Y (t) = ϕX (t) · ϕY (t) · ϕX (t) · ϕ−Y (t) = (ϕ(t))3 · ϕ(t).
60
Chapter 7 (b): If ϕ(t) = 0 for some t = 0 then, since ϕ(t) = (ϕ(t/2))3 · ϕ(t/2) = (ϕ(t/2))2 |ϕ(t/2)|2 , we must have ϕ(t/2) = 0 too. Continuing this procedure we find that ϕ(t) = 0 at tvalues arbitrarily close to 0, which, because of the uniform continuity of characteristic functions and the fact that ϕ(0) = 1, is impossible. Hence, ϕ(t) 6= 0 for all t. (c): Recalling (a), γ(2t) =
ϕ(2t) ϕ(2t)
=
(ϕ(t))3 · ϕ(t) (ϕ(t))3 · ϕ(t)
2 = γ(t) .
(d): We first note that |γ(t)| = 1 for all t, which means that γ is located on the unit circle, so that γ(t) = eiθ(t) for some function θ(t). Now, using (c), eiθ(2t) = ei2θ(t) , so that θ(2t) = 2θ(t)
for all t ∈ [0, 2π].
In addition, θ(0) = 0 since γ(0) = 1. Moreover, γ is continuous since ϕ is continuous. The proof of Lemma A.8.1, which departs exactly from these facts, then tells us that θ is linear – θ(t) = ct for some c – and it remains to show that c = 0. The fact that the mean equals 0, and the variance equals 1 (by assumption), implies that, for t small,
γ(t) =
eict = 1 + ict + · · · ,
ϕ(t) ϕ(t)
=
1−t2 /2+o(t2 ) 1+t2 /2+o(t2 )
= 1 − t2 /2 + · · · ,
so that, since there are no linear terms in the second alternative we must have c = 0. (e): We have shown in (d) that
ϕ(t) ϕ(t)
= 1 for all t, that is to say, that ϕ(t) = ϕ(t) for
all t, and, hence, that ϕ is real, and so (e) follows from this and (a). (f): For the solution of this one we refer to the previous problem. ♣ Since we have shown in (d) that ϕ is real it follows that the underlying distribution is symmetric, and we have reduced the problem to the previous one.
17. Let L(t) denote availability in the sense of “actual amount of time spent in the busy period during (0, t]”, so that L(t)/t is the relative time spent in the busy period. (a): Set τ (t) = min{n : Tn > t},
t ≥ 0.
Since Sτ (t) is the amount of time spent in the busy period at time Tτ (t) , it follows that L(t) ≤ Sτ (t) ≤ L(t) + Yτ (t) .
(S.7.5)
61
The central limit theorem
Now, from strong laws for randomly indexed sequences of random variables, Theorem 6.8.2, and renewal theory, Theorem 6.9.3, we know that τ (t) t Yτ (t) t Sτ (t) t
a.s.
→
a.s.
1 µy + µz
as
t → ∞,
→
0
=
Sτ (t) τ (t) a.s. 1 → µy · · τ (t) t µy + µz
as
t → ∞, as
t → ∞,
which, after dividing by t and letting t → ∞ in (S.7.5), shows that µy L(t) a.s. → t µy + µz
as
t → ∞.
♣ This is a reasonable result, since, for t large, Sτ (t) /t Sτ (t) L(t) µy = ≈ ≈ . t Tτ (t) Tτ (t) /t µy + µz
(b): Using the hint, the central limit theorem tells us that Pn (µy Zk − µz Yk ) d p k=1 → N (0, 1) as n → ∞, nVar (µy Zk − µz Yk ) which, together with asymptotics for τ (t) from a few lines above, and Anscombe’s theorem 6.3.2, shows that Pτ (t) y Zk − µz Yk ) d k=1 (µ q → N (0, 1) as n → ∞, 1 γ µx +µ t y where γ 2 = Var (µy Zk − µz Yk ) = µ2y σz2 + µ2z σy2 . Pτ (t) By adding and subtracting Sτ (t) = k=1 µy Yk in the numerator, the conclusion turns into µy Tτ (t) − (µy + µz )Sτ (t) d q → N (0, 1) as n → ∞. 1 γ µx +µ t y We now split the LHS in order to introduce L(t): LHS =
µy (Tτ (t) − t) µy t − (µy + µz )L(t) (µy + µz )(L(t) − Sτ (t) ) q q q + + . (S.7.6) 1 1 1 γ µx +µ t γ t γ t µx +µy µx +µy y
By Cram´er’s theorem (the symmetry of the normal distribution, and a little bit of reshuffling) the proof is complete if we can show that the extreme members in the RHS converge to 0 in probability (or almost surely) as t → ∞, and for this we omit irrelevant constants in the following. Via the sandwich argument and some other facts from Subsection 6.9.3, we have t < Tτ (t) Yτ (t) t Zτ (t) t
≤ a.s.
→
a.s.
→
t + Yτ (t) + Zτ (t) , 0
as
t → ∞,
0
as
t → ∞.
62
Chapter 7 By joining the first and third of these it follows that Tτ (t) − t a.s. √ →0 t
t → ∞,
as
which takes care of the first term in the RHS of (S.7.6), and by joining (S.7.5) and the second one, it follows that Sτ (t) − L(t) a.s. √ →0 t
t → ∞.
as
The proof is complete. 18. We use the Delta method with g(x) = xp , x > 0. Let µ and σ 2 denote mean and variance, respectively, of the summands, and note that g 0 (µ) = pµp−1 > 0, since µ > 0 (unless X is degenerate). Thus 2 √ Sn p d n − µp → N (0, σ 2 pµp−1 ). n For the special case p = 1/2 we may also argue as follows: r Sn √ Sn √ √ −µ 1 Sn − nµ ·q n − µ = nq n = √ √ √ n n Sn Sn | {z } n + µ n + µ {z } | d 2 →N (0,σ )
σ2 d → N 0, 4µ
as
p √ →1/(2 µ)
n → ∞.
The first arrow is the central limit theorem, the second one is the law of large numbers and the continuity of the square root. The last arrow is Cram´er’s theorem. 2 1 Note also that g 0 (µ) = 4µ as should be. 19. We use the Delta method with g(x) = H(x), 0 < x < 1. Then H 0 (x) = − log x − 1 + log(1 − x) + 1 = log
1 − x
, x for x 6= 1/2,
H 0 (x) = 0
for x = 1/2, H 0 (x) 6= 0 1 1 1 H 00 (x) = − − =− , x 1−x x(1 − x) H 00 (1/2) = −4. Thus, for p 6= 1/2 it follows that √
1 − p 2 d n H(Xn /n) − H(p) → N 0, p(1 − p) log p
as
n → ∞.
For p = 1/2 the limit distribution is a χ2 (1)-distribution, namely, d 1 1 n H(Xn /n) − H(1/2) → · (−4Z) 2 4
as
n → ∞,
or, equivalently, d 2n log 2 − H(Xn /n) → Z,
where
Z ∈ χ2 (1)
as
n → ∞.
63
The central limit theorem
d P 20. Note that Xn = nk=1 Yk , where Y1 , Y2 , . . . are independent, standard exponential random variables. We have Pn r (Yk − 1) Xn − n d n d k=1 √ √ · Pn → N (0, 1) as n → ∞. = Y n Xn k k=1 | {z } | {z } p
d
→N (0,1)
→1
Here p we have used the central limit theorem, the law of large numbers, the fact that 1/x is continuous at x = 1, and Cram´er’s theorem. 21. From Problem 6.13.3(a) we know that p
s2n → σ 2
n → ∞,
as
and from Problem 2.20.13 we know that n 2 µ − σ 4 2µ − 4σ 4 µ − 3σ 4 4 4 4 Var (s2n ) = · − + , n−1 n n2 n3 so that nVar (s2n ) → µ4 − σ 4
as
n → ∞,
with the aid of which we conclude that √ s2 − σ 2 p d n s2n − σ 2 = p n · nVar (s2n ) → N (0, µ4 − σ 4 ) 2 {z } | Var (s ) | {z n } p √ →
d
as
n → ∞,
µ4 −σ 4
→N (0,1)
via the central limit theorem, the continuity of the square root function and Cram´er’s theorem. 22. We assume w.l.o.g. that the means are equal to zero. In order to apply various limit theorems we begin by rewriting: Pn
√
X Y k=1 √ k k
−
Pn Xk k=1 √ n
n n−1 n sn,x
nrn =
· Y¯n
· sn,y
.
(S.7.7)
Now, from Problem 6.13.3 we know that p
p
s2n,x → σx2
and that s2n,y → σy2
as
n → ∞.
Moreover, since {Xk Yk , k ≥ 1} are independent, identically distributed random variables with mean E X1 Y1 = ρσx σy = 0, and variance Var (X1 Y1 ) = µ2xy , the central limit theorem tells us that Pn Xk Yk d k=1 √ → N (0, 1) as n → ∞. µxy n Putting all facts together we obtain — remember Cram´er’s theorem, Pn Pn Xk Yk X d k=1 √ √ k · Y¯n → N (0, µ2xy ) as n → ∞, − k=1 |{z} n n p | {z } | {z } →0 d
d
→N (0,µ2xy )
→N (0,1)
|
{z p
→0
for the numerator in (S.7.7).
}
64
Chapter 7 The denominator converges in probability to σx σy ; here we also exploit the continuity of the square root function. A final application of Cram´er’s theorem shows that √
µ2 1 d xy · N (0, µ2xy ) = N 0, 2 2 σx σy σx σy
d
nrn →
n → ∞.
as
23. (a): Setting Var X = E X 2 = σ 2 < ∞ we obtain, noticing the homogeneity of the denominator, and recalling Cram´er’s theorem, that Tn (2) = d
1 · q Pn
S √n n |{z}
d
2 k=1 Xk n
→N (0,σ 2 )
|
{z
→ N (0, 1)
n → ∞.
as
}
a.s.
→ 1/σ
(b): Similarly, n
( p1 − 21 )
Tn (p) = d
S √n n |{z}
1
·
Pn
|Xk n
k=1
→N (0,σ 2 )
|
a.s.
|p
σ2 d 1/p → N (0, kXk2 )
as
n → ∞.
p
{z
}
→ 1/kXkp
♣ Note that part (b) reduces to part (a) for p = 2. ♣ Note also that if E X = µ 6= 0, and E|X|p < ∞, then Tn (p) =
Sn a.s. n → p 1/p k=1 |Xk | n
Pn
µ =0 kX1 kp
as
n → ∞,
by the strong law of large numbers.
24. We assume w.l.o.g. that all expectations are 0, and replace the ususal Lindeberg condition L2 (7.2.2) with the following r-analog: n 1 X L2,r (n) = r E|Xj − µj |r I{|Xj | > εsn } → 0 sn
as
n → ∞.
(S.7.8)
j=1
Assuming this we obtain the following L1 -analog (recall (7.2.1)): E|Xj |r →0 1≤j≤n srn
L1,r (n) = max
as
n → ∞.
(S.7.9)
This is seen by modifying the argument from the proof of the central limit theorem: For any ε > 0, 1 1 E|Xj |r I{|Xj | ≤ εsn } + max r E|Xj |r I{|Xj | > εsn } r 1≤j≤n sn 1≤j≤n sn r ≤ ε + L2,r (n),
L1,r (n) ≤
max
so that lim supn→∞ L1,r (n) ≤ εr , which establishes the claim. Moreover, and similarly, n 1 X E|Xj |r → 0 srn j=1
as
n → ∞.
(S.7.10)
65
The central limit theorem Namely, LHS =
n n 1 X 1 X r E|X | I{|X | ≤ εs } + E|Xj |r I{|Xj | > εsn } j j n srn srn j=1
j=1
r
≤ ε + L2,r (n), so that lim supn→∞
1 srn
Pn
r j=1 E|Xj |
≤ εr , and the assertion follows.
Given this, one would like to prove that o n S r n ,n≥1 sn
is uniformly integrable.
However, estimating moments of sums with the aid of the Marcinkiewicz-Zygmund inequalities introduces unwanted powers of n in the numerator and necessitates assumptions on the boundedness of n/s2n which are automatic in the standard case of identical distributions. So, we have to proceed differently. Instead we prove moment convergence without proceeding via uniform integrability. As it turns out, even moments are easier to handle, so we confine ourselves to that case. For more we refer to the following articles: – Brown, B.M. (1969). Moments of a stopping time related to the central limit theorem. Ann. Math. Statist. 40, 1236-1249. – Brown, B.M. (1970). Characteristic functions, moments, and the central limit theorem. Ann. Math. Statist. 41, 658-664. – Brown, B.M. (1971). A note on convergence of moments. Ann. Math. Statist. 42, 777-779. Thus, suppose that r = 2k, an even integer, and let N ∈ N (0, 1). We wish to show that S 2k n E → E|N |2k = (2k − 1)!! as n → ∞. (S.7.11) sn We proceed by induction. For k = 1 there is nothing to prove. Suppose that the claim has been verified up to k − 1. We wish to verify it for k. By expanding Sn2k and independence, E Sn2k =
n X 2k X 2k j=1 m=1
=
2k 2
X n
m
2k−m E Sj−1 E Xjm =
2k−2 E Sj−1 E Xj2 +
j=1
= Σ 1 + Σ2 + Σ3 ;
2k X n X 2k m=1
2k−2 X m=3
2k m
m
X n j=1
2k−m E Sj−1 E Xjm
j=1 2k−m E Sj−1 E Xjm +
n X
E Xj2k
j=1
(S.7.12)
note that the terms corresponding to m = 1 and m = 2k − 1 vanish, since the means are 0. We now take care of the three sums separately – the first one will be the essential one, the others will be aymptotically negligible.
66
Chapter 7 Σ1 : We first observe, via the conjugate rule and the fact that sn %, that Z sn 2k 2kx2k−1 dx ≤ 2k(sn − sn−1 )s2k−1 s2k − s = n n n−1 sn−1
= k(s2n − s2n−1 )s2k−2 n−1
s 2k−1 n , sn−1
and, similarly, that 2k−2 2k 2 2 s2k n − sn−1 ≥ k(sn − sn−1 )sn−1
sn−1 , sn
so that, in view of (S.7.9), 2k−2 2k 2 2 s2k n − sn−1 = k(sn − sn−1 )sn−1 1 + o(1) , and, hence, by telescoping, k
n X
2k (s2j − s2j−1 )s2k−2 j−1 = C + sn (1 + o(1))
as
n → ∞.
(S.7.13)
j=1
With the induction hypothesis in mind we now insert this into the first sum. For n large, X n n X Sj−1 2k−2 2k−2 2k 2k−2 2 · sj−1 E Xj2 Σ1 = E Sj−1 E Xj = k(2k − 1) E sj−1 2 j=1
= C + k(2k − 1) E|N |2k−2 + = C + k(2k − 1) E|N |2k−2 +
j=1 n X 2k−2 o(1) · sj−1 E j=1 n X 2k−2 2 o(1) sj−1 (sj j=1
Xj2 − s2j−1 )
n X 2 2 = C + E|N |2k + o(1) k s2k−2 j−1 (sj − sj−1 ) j=1 2k
= C + E|N |
2k 2k + o(1) s2k n (1 + o(1)) = C + E|N | sn (1 + o(1)).
Σ2 : Since (Proposition 3.6.5) E|Si |p ≤ E|Sj |p
for 1 ≤ i ≤ j,
p ≥ 1,
we obtain, by Lyapounov’s inequality and (S.7.10), Σ2 ≤
2k−2 X m=3
≤
m=3
≤
j=1
2k−2 X
2k k
= Ck
n 2k X E|Sn |2k−m E|Xj |m m 2k m
E|Sn |2k−2
E|Xj |m
j=1
2k−2 X m=3
2k−2 X
n (2k−m)/(2k−2) X
Sn 2k−2 E sn
(2k−m)/(2k−2)
· s2k−m n
n X
E|Xj |m
j=1
(2k−m)/(2k−2) 2k−m m E|N |2k + o(1) · sn o sn
m=3
≤ Ck · 2k E|N |2k + o(1) o(s2k n )
as
n → ∞.
67
The law of the iterated logarithm Σ3 : By (S.7.10), Σ3 =
n X
E Xj2k = o s2k n
as
n → ∞.
j=1
Joining the estimates for the three sums, and dividing by s2k n shows that S 2k n = E|N |2k (1 + o(1)) as n → ∞. E sn For non-even integers we refer, once again, to the articles cited above. 25. (a): Under the assumption that the characteristic functions coincide in (−T, T ), Lemma 7.6.1 immediately reduces to sup |FU (x) − FV (x)| ≤ x
24A . πT
(b): The proof probably consists of a close examination and modification of the proof of the Berry-Esseen theorem.
8
The Law of the Iterated Logarithm
1. This follows from the Hartman-Wintner law of the iterated logarithm:
lim sup_{n→∞} |Sn|/(√n (log n)^β) = lim sup_{n→∞} [ |Sn|/√(2σ² n log log n) ] · [ √(2σ² n log log n)/(√n (log n)^β) ] = 1 · 0 = 0
a.s.
An alternative is to use law of large numbers methods. For β > 1/2, the easier case, Kolmogorov’s convergence criterion yields ∞ X
Var √
n=2
∞ X σ2 Xn = < ∞, n(log n)2β n(log n)β n=2
so that, since the mean is 0, we have ∞ X
√
n=2
Xn n(log n)β
converges a.s.,
which, via Kronecker’s lemma establishes the claim. 2. The subsequence {nk = k k , k ≥ 1} is sparse since k k kk 1 = · →0 k+1 k+1 k+1 (k + 1)
as
k → ∞.
The conclusion therefore follows from Theorem 8.7.2. Alternatively, we observe that S kk p
kk
log log(k k )
=p
S kk
kk
S kk
=p · log(k log k) k k log k
s
log k . log k + log log k
Checking for ε∗ we find that ∞ X n=3
log nk
−ε2 /2
=
∞ X n=3
k log k
−ε2 /2
( < ∞,
for ε >
= ∞,
for ε ≤
√ √
2, 2,
68
Chapter 8 √ so that ε∗ = 2. The conclusion therefore follows from the law of the iterated logarithm for sparse subsequences, Theorem 8.5.2, and the fact that the second factor in the RHS converges to 1 as k → ∞. The subsequence {nk = k α , k ≥ 1} is dense since k α kα = →1 (k + 1)α k+1
k → ∞.
as
The conclusion therefore follows from the law of the iterated logarithm for dense subsequences, Theorem 8.5.1. 3. Let N be the null set where the three almost sure statements (may) fail to hold, and pick ω ∈ / N . If Y (tk , ω) → a ∈ E as k → ∞ it follows that ξ(tk , ω)Y (tk , ω) + η(tk , ω) → a
as
k → ∞.
The arbitrariness of a ∈ E establishes the claim. ♣ Note that being outside the null set we have reduced the problem to a problem for real numbers.
4. (a): This one follows from the first conclusion in the Anscombe LIL, Theorem 8.7.4, since τ (t) a.s. 1 → t µ
as
t → ∞.
(S.8.1)
(b): So does this one; note that the conclusion of Problem 8.8.3 enters in the transition from (a) to (b). (c): Since the variance is finite we know from Proposition 6.1.1 that X a.s. √n → 0 n
as
n → ∞,
which, all the more, shows that √
Xn a.s. →0 n log log n
as
n → ∞,
which, in view of (S.8.1) – recall Theorem 6.8.1 – proves the conclusion. (c): The proof of the theorem is concluded by joining the above, the sandwich inequality, t < Sτ (t) ≤ t + Xτ (t) , and Problem 8.8.3. 5. We introduce the usual indicators, ( 1, if Xk is a record value, Ik = 0, otherwise. Since |Ik − 1/k| ≤ 1 for all k, we may apply the Kolmogorov LIL, Theorem 8.1.1, to obtain Pn 1 k=1 (Ik − k ) lim sup (lim inf ) q P = +1 (−1) a.s. Pn 1 n→∞ 1 n→∞ 2 nk=1 k1 1 − k1 log log (1 − ) k=1 k k
69
The law of the iterated logarithm Now, Pn
µ(n) − log n √ log n log log log n
− k1 ) Pn 1 1 1 1 k=1 k 1 − k log log k=1 k (1 − k ) sP Pn 1 n 1 1 1 k=1 k 1 − k log log k=1 k (1 − k ) × log n log log log n Pn 1 − log n , + √ k=1 k log n log log log n k=1 (Ik
=
qP n
from which the conclusion follows since (Theorem 2.17.2) E Var
n X k=1 n X
Ik Ik
n X 1 = ∼ log n k
=
k=1
k=1 n X k=1
as
n → ∞,
1 1 1− ∼ log n k k
as
n → ∞.
(1)
6. Since Mn ≥ Sn for all n it follows immediately that (1)
lim sup p
Mn
2σ 2 n log log n
n→∞
≥1
a.s.
In order to prove the opposite inequality we exploit the fact that lim supn→∞ Sn = +∞ and lim inf n→∞ Sn = −∞ (for example as a consequence of the HartmanWintner law), which implies that P (Mn(1) = Sn i.o.) = 1. Thus, for every ω outside a null set there exists a subsequence {nk , k ≥ 1} (depending on ω), such that (1) Mnk Snk p =p . 2 2 2σ nk log log nk 2σ nk log log nk In between these epochs the partial maxima stay constant, whereas the denominator decreases. It follows that the LHS attains a maximum precisely when Sn reaches a new record high, which, in turn, may or may not be a maximum for the RHS. This permits us to conclude that (1)
Mn Sn lim sup p ≤ lim sup p =1 n→∞ n→∞ 2σ 2 n log log n 2σ 2 n log log n
a.s.,
which establishes the following law of the iterated logarithm for the partial maxima: (1)
Mn
lim sup p
2σ 2 n log log n
n→∞
(2)
An analogous argument for Mn
=1
a.s.
=1
a.s.
yields (2)
Mn
lim sup p n→∞
2σ 2 n log log n
70
Chapter 9 Namely, in this case we have Mn(2) ≥ |Sn |, (2)
Mn
lim sup p n→∞
2σ 2 n log log n
≥1
a.s.,
P (Mn(2) = |Sn | i.o.) = 1, and a subsequence {mk , k ≥ 1} (depending on ω), such that (2)
Mmk p
2σ 2 mk log log mk
|Smk |
=p
2σ 2 mk log log mk
. (1)
From this point on the argument is verbatim the same as for Mn^{(1)}.
9
Limit Theorems; Extensions and Generalizations 1. For this problem we also need to know that non-negative stable distributions with index in (0, 1) possess a Laplace transform. In the present case this means that “we know” that LY (s) = E e−sY = exp{−csβ } for s > 0, where c is some positive constant. For more about Laplace transforms, see Feller’s Volume II, Chapter XIII, and for this kind of stable distributions, Section XIII.6. Given this additional information we check the characteristic function of XY 1/α . Since X is symmetric, and β ∈ (0, 1) is “the α of Y ”, we obtain ϕXY 1/α (t) = E exp{itXY 1/α } = E E(exp{itXY 1/α } | Y ) = E ϕX (tY 1/α ) | Y = E exp{−c|tY 1/α |α } = E exp{−c|t|α Y } = LY (c|t|α ) = exp − (c|t|α )β } = exp − cβ |t|αβ }, which due to the uniqueness of Laplace transforms finishes the proof. d
2. Since Sn = n1/α X1 it follows that d
log |Sn | =
1 log n + log |X1 |, α
so that, for any ε > 0, log |S | log |X | 1 1 n − >ε =P P >ε →0 log n α log n
as
n → ∞.
3. In view of the uniqueness theorem for characteristic functions it suffices to investigate the transform. 1 1 ϕXY (t) = E exp{itXY } = E E exp{itXY } | Y = E ϕX (t) + E ϕX (−t) 2 2 1 1 α α α = exp{−c|t| } + exp{−c| − t| } = exp{−c|t| } = ϕX (t). 2 2 Note also that, X being symmetric, 1 1 1 1 P (XY ≤ u) = P (X ≤ u) + P (−X ≤ u) = P (X ≤ u) + P (X ≤ u) 2 2 2 2 = P (X ≤ u). d
In other words, XY = X. ♣ This is, in fact, true for any symmetric random variable X.
71
Limit Theorems; Extensions and Generalizations
4. Since the variance is finite for α > 2 and for α = 2, γ > 1, the ordinary central limit theorem holds in those cases. Therefore, suppose in the following that these cases are not present. In order to apply Theorem 9.3.3 we check the truncated variance for α = 2, γ ≤ 1, and the tail probabilities for 0 < α < 2: Thus, suppose first that α = 2, γ ≤ 1. For x > e we then have Z x 1 dy U (x) = E X 2 I{|X| ≤ x} = 2c γ e y(log y) ( 2c log log x, for γ = 1, = 2c 1−γ −1 , for γ < 1, 1−γ · (log x) so that U ∈ SV, and the distribution belongs to the domain of attraction of the normal distribution. Next let 0 < α < 2. Then Z P (|X| > x) = 2c e
x
1 y α+1 (log y)γ
dy ∼ C
1 αxα (log x)γ
as
x → ∞,
which means that the tails are regularly varying with index −α, so that the distribution belongs to the domain of attraction of a stable law, which, moreover, is symmetric, since the original distribution is symmetric – note that P (X > x) P (X < −x) 1 = = . P (|X| > x) P (|X| > x) 2 5. (a): The continuity theorem for characteristic functions tells us that it suffices to check the transform: ϕPnk=1 ak Xk (t) =
n Y
ϕXk (ak t) =
k=1
n Y
exp{−c|ak t|α }
k=1
n
= exp − c
n X
n o n X o |ak |α |t|α , |ak t|α = exp − c
k=1
k=1
P α which converges to a continuous function, which equals 0 at t = 0, iff ∞ k=1 |ak | < ∞. By the continuity theorem for characteristic functions the limiting distribution exists, it is, in fact a rescaled stable distribution with index α. If the sum diverges, then ϕYn (t) converges to 1 for t = 0 and to 0 otherwise and therefore can not be a characteristic function. (b): Almost sure convergence follows from Theorem 5.5.8. 6. We suppose that the distribution is non-degenerate. Let random variables with superscript s denote symmetrized random variables (`a la Chapter 3). Since sums (and differences) of independent sequences that converge in distribution also converge in distribution (Theorem 5.11.2), it follows that Yns =
Sns Sn − bn Sn0 − bn d s = − →Y an an an
First, since supn |Sns | = +∞, it follows that an → ∞.
as
n → ∞.
72
Chapter 9 Secondly,
s Xs Sn+1 d = Yns + n+1 → Y s + 0 = Y s an an+1
since
s s Xn+1 p d X1 = →0 an+1 an+1
as
n → ∞,
as
n → ∞.
The convergence to types theorem (Theorem 9.2.1) therefore forces an+1 →1 an
n → ∞.
as
7. (a): Using Taylor expansion, the exponent equals Z
∞ itx
e 0
e−x −1 dx = x
Z
∞ ∞X
0
n=1
∞ X
=
n=1
which shows that nZ exp
∞
eitx − 1
0
∞
X (it)n (itx)n e−x dx = n! x n!
Z
n=1
∞ X
(it)n Γ(n) = n!
n=1
∞
xn−1 e−x dx
0
(it)n = − log(1 − it), n
e−x o 1 dx = exp{− log(1 − it)} = x 1 − it
as desired. 1 (b): Running the computations in (a) backwards, starting with 1−iλt , which amounts to replacing t by λt, followed by a change of variable, yields Z ∞ h i Z ∞ e−x e−y/λ iλtx − log(1 − iλt) = e −1 dx = y = λx = eity − 1 dy, x y 0 0
so that ϕ Exp(λ) (t) =
n 1 = exp 1 − iλt
Z 0
∞
e−x/λ o dx . eitx − 1 x
(c): Since ϕΓ(p,λ) (t) = (ϕ Exp(λ) (t))p it follows immediately, using (b), that ϕΓ(p,a) (t) = exp
∞
nZ 0
pe−x/λ o eitx − 1 dx . x
(d): Since, by definition, ϕΓ(p,1) (t) =
1 Γ(p)
Z 0
∞
eitx xp−1 e−x dx =
1 , (1 − it)p
it follows that ϕX (t) = exp
nZ
∞
o eitx − 1)xp−1 e−x dx = exp Γ(p) ϕΓ(p,1) (t) − 1 ,
0
which, in view of Theorem 4.9.1 and the uniqueness theorem for characteristic funcd tions, shows that X = Y1 + Y2 + · · · + YN , where N ∈ Po(Γ(p)), and X1 , X2 , . . . are independent Γ(p, 1)-distributed random variables, which, in addition, are independent of N .
73
Limit Theorems; Extensions and Generalizations 8. The probability function is pX (n) = pq n
for n = 0, 1, . . .
(q = 1 − p),
so that p = exp log(1 − q) − log(1 − qeit ) it 1 − qe n ∞ ∞ ∞ nX n X qn o q n X qeit o + eitn − 1 · = exp . = exp − n n n
ϕX (t) =
n=1
n=1
n=1
The canonical representation thus corresponds to the (canonical) measure that puts mass q n /n at the point n, n = 1, 2, . . .. 9. We confine ourselves to the simpler case when the variance is finite and nZ ∞ ϕ(t) = exp eitx − 1 dG(x)}. −∞
In Remark 9.4.2 it was mentioned that the class of inifinitely divisible distributions with finite variance coincides with limits of compound Poisson distributions. In the present case, let λ = G(+∞) − G(−∞), set F (x) = G(x)/λ (so that F is a distribution function), and rewrite the characteristic function as follows: n Z ∞ o n Z ∞ o ϕ(t) = exp λ eitx − 1 dF (x) = exp λ eitx dF (x) − 1 . −∞
−∞
Since the integral is the characteristic function of a random variable with distribution function F , we have expressed the random variable corresponding to the characteristic function ϕ as Z = Y1 + Y2 + · · · + YN , where N ∈ Po(λ) and Y1 , Y2 , . . . are independent, identically distributed random variables, which are independent of N and with distribution function F (Theorem 4.9.1). From this it is now clear that Z ≥ 0 if and only if F (0) = 0 (which, in turn, is equivalent to G(0) = 0). 10. Set Y = X1 + X2 + · · · + XN . Then (Theorem 4.9.1), on nλ ϕY (t) = gN ϕX1 (t) = exp{λ(ϕX1 (t) − 1)} = exp (ϕX1 (t) − 1) n n = g Po(λ/n) ϕX1 (t) for all n. Since the expression within parentheses is the characteristic function of the sum of a Po(λ/n)-distributed number of X-variables, the uniqueness theorem for characteristic functions tells us that Y can be decomposed into the sum of n such independent, identically distributed random variables, and that this is true for all n. d
Alternatively, we observe that N = N1 + N2 + · · · + Nn , where Nk ∈ Po(λ/n), 1 ≤ k ≤ n, and where, in addition, N1 , N2 , . . . , Nn are independent. This means that d Y = Z1 + Z2 + · · · + Zn , where Zk = Xk,1 + Xk,2 + · · · + Xk,Nk , and where all Xi,j are independent, and distributed as X1 , so that {Zk , 1 ≤ k ≤ n} are independent and identically distributed. We have thus decomposed Y probabilistically (that is, without resorting to transforms). This asserts the infinite divisibility of Y , since the decomposition is valid for all n.
74
Chapter 9
n 11. (a) P ( nY1/α ≤ x) = (F (xn1/α ))n = exp{−(xn1/α )−α )n} = exp{−x−α }.
(b) P (n1/α Yn ≤ x) = (F (xn−1/α ))n = exp{−(−xn−1/α )α )n} = exp{−(−x)α }. (c) P (Yn − log n ≤ x) = (F (x + log n))n = exp{−e−(x+log n) n} = exp{−e−x }. 12. Recalling Problem 4.11.25 (or re-deriving it) we have P (Y ≤ y) = gN FX (y) = exp{λ(1 − e−y/θ − 1)} = exp{−λe−y/θ },
y > 0.
The RHS is the distribution function of a Gumbel type distribution, which, however, has its support on all of R. Since P (Y < 0) = 0
P (Y = 0) = e−λ ,
and
the distribution of Y coincides with that of V + . 13. (a): Let Tn ∈ Po(µn ). Theorem 9.7.1 then yields d(Sn , Tn ) ≤
n n n X 1 − e−µn X 2 1 X 2 1 max pj pk = max pk . pk ≤ pk ≤ 1≤k≤n µn µn µn 1≤j≤n k=1
k=1
k=1
(b): Theorem 7.6.2 provides the following closeness to the normal distribution: Pn E|Xk − pk |3 0.7915 sup F Sn −µn (x) − Φ(x) ≤ 0.7915 k=1 3 , ≤ pPn s s n x n k=1 pk (1 − pk ) where the last inequality has been stolen from the solution of Problem 7.8.1. P For a comparison to be possible it is necessary that ∞ n=1 pn = ∞ (for a Poisson approximation this is not necessary, but for a normal approximation). Thus, if this is the case, then one has to compare, say, n 1 X 2 pk µn k=1
and
0.7915 , P µn − nk=1 p2k
q
in order to find out which of the approximations that is the better one. We leave the details of this to the reader, and confine ourselves to two examples. We have seen that Poisson approximation is not suitable if pk = p for all k, in which case the bounds are Normal approximation : Poisson approximation :
0.7655 1 p ·√ , n p(1 − p) p,
respectively. For records, we have pk = 1/k, and the upper bounds (Theorems 7.7.5 and 9.7.1, cf. also page 462), Normal approximation : Poisson approximation :
1.9 √ , log n π2 , 6 log n
respectively, in which case the Poisson approximation is better (as might be expected).
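♣ To make the comparison for records concrete, here is a tiny numerical aside (mine) tabulating the two upper bounds quoted above, 1.9/√(log n) for the normal approximation and π²/(6 log n) for the Poisson approximation, for a few values of n.

```python
import math

print(f"{'n':>10} {'normal bound':>14} {'Poisson bound':>14}")
for n in [10**2, 10**4, 10**6, 10**8]:
    normal_bound = 1.9 / math.sqrt(math.log(n))
    poisson_bound = math.pi**2 / (6 * math.log(n))
    print(f"{n:>10} {normal_bound:>14.4f} {poisson_bound:>14.4f}")
# For every n shown the Poisson bound is the smaller one, matching the remark above.
```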
75
Martingales
10
Martingales
P 1. Since Sk = kj=1 Xj for 1 ≤ k ≤ n and X1 = S1 , and Xk = Sk − Sk−1 for 2 ≤ k ≤ n, the conclusion follows from the fact that knowing the values of X1 , X2 , . . . , Xn is equivalent to knowing the values of S1 , S2 , . . . , Sn . 2. Add and subtract E(Y | G) and use smoothing; Lemma 10.1.1: Var Y
= E(Y − E Y )2 = E(Y − E(Y | G) + E(Y | G) − E Y )2 2 = E Y − E(Y | G) + E (Y | G) − E Y )2 +2E Y − E(Y | G) E(Y | G) − E Y 2 2 = E E Y − E(Y | G) | G + E E(Y | G) − E E(Y | G) +2E E E Y − E(Y | G) E(Y | G) − E Y |G = E Var (Y | G) + Var (E(Y | G)) +2E E E(Y | G) − E Y E Y − E(Y | G) | G | {z } =0
= E Var(Y | G) + Var(E(Y | G)).
3. Insertion as requested shows that we wish to verify the relation Var Xn+1 = E Var(Xn+1 | Fn) + Var(E(Xn+1 | Fn)) or, equivalently, that E Var(Xn+1 | Fn) = Var Xn+1 − Var Xn, that is, E Var(Xn+1 | Fn) = E X_{n+1}^2 − E X_n^2,
since martingales have constant expectations. Now, applying Theorem 10.1.4 with g(X) = E X0 , we have E Var (Xn+1 | Fn ) = E(Xn+1 − E X0 )2 − E(E(Xn+1 | Fn ) − E X0 )2 2 = E(Xn+1 − E X0 )2 − E(Xn − E X0 )2 = E Xn+1 − E Xn2 ,
once again, because of the constancy of expectations. ♣ In view of the orthogonality of the martingale increments (recall Remark 10.4.1) the last relation may be rewritten as E Var (Xn+1 | Fn ) = E(Xn+1 − Xn )2 = Var (Xn+1 − Xn ). Pn ♣ In particular, if Xn = k=1 Yk , where {Yk } are i.i.d. random variables with mean zero, and Fn = σ{X1 , X2 , . . . , Xn }, n ≥ 1, the relation turns into E Var (Xn+1 | Fn ) = Var Yn+1 . Checking the LHS, remembering that Xn given Xn is a (random) constant, and that Xn and Yn+1 are independent, yields Var (Xn+1 | Fn )
as requested.
= Var ((Xn + Yn+1 ) | Fn ) = Var (Xn | Fn ) + Var (Yn+1 | Fn ) + 2Cov ((Xn , Yn+1 ) | Fn ) = 0 + Var Yn+1 + 0 = Var Yn+1 ,
76
Chapter 10 4. Since, with the usual notation, X(n−1)
X(n) =
X
Yk ,
k=1
we note, as a preliminary, that E X(n) | X(n − 1) = X(n − 1)m, Var X(n) | X(n − 1) = X(n − 1)σ 2 . (a): Using the above and smoothing we obtain E X(n) = E E(X(n) | X(n − 1)) = mE X(n − 1) = · · · = mn . (b): Using (a) and Problem 10.17.2 we similarly obtain, Var X(n) = E Var (X(n) | X(n − 1)) + Var E(X(n) | X(n − 1)) = σ 2 E X(n − 1) + Var X(n − 1)E(Y ) = mn−1 σ 2 + m2 Var X(n − 1). (c): For n = 1 the result is obvious. Iteration/induction produces the desired result. Namely, suppose we are done up to level n − 1. Then, by (b), and the induction hypothesis, Var X(n) = mn−1 σ 2 + m2 Var X(n − 1) = mn−1 σ 2 + m2 σ 2 mn−2 + mn−1 + · · · + m2(n−2) = σ 2 mn−1 mn + · · · + m2(n−1) . 5. We first note that Y ∈ G and that E Y 2 = E E(X 2 | G) = E X 2
5. We first note that Y ∈ G and that E Y² = E E(X² | G) = E X² (and that E Y = E E(X | G) = E X), so that, by smoothing,

    E(X − Y)² = E X² − 2E XY + E Y² = 2E X² − 2E E(XY | G) = 2E X² − 2E(Y E(X | G)) = 2E X² − 2E Y² = 0,

which, in turn, shows that X − Y = 0 a.s., and, hence, that X = Y a.s.

6. The conclusion follows from Theorem 5.4.1. Namely, let a > 0. By the conditional triangle inequality and the defining relation,

    ∫_{{|E(X|G)| > a}} |E(X | G)| dP ≤ ∫_{{E(|X| | G) > a}} E(|X| | G) dP = ∫_{{E(|X| | G) > a}} |X| dP → 0  as a → ∞ uniformly in G,

since the sets over which we integrate are uniformly small:

    P(|E(X | G)| > a) ≤ E|E(X | G)| / a ≤ E E(|X| | G) / a = E|X| / a.
7. (a): Since (X, σ{X}), (X + Y, σ{X + Y}) is a martingale, it follows that (|X|, σ{X}), (|X + Y|, σ{X + Y}) is a submartingale, which, i.a., has non-decreasing expectations.
(b): The same argument applies, since (|X|^r, σ{X}), (|X + Y|^r, σ{X + Y}) is a submartingale; note that it is essential that r ≥ 1 (r = 1 is part (a)).

8. By symmetry, E(X | X + Y) = E(Y | X + Y), and by adding the two, and linearity,

    E(X | X + Y) + E(Y | X + Y) = E(X + Y | X + Y) = X + Y.

The conclusion follows.

9. For the first expression we use the lack of memory property of the exponential distribution in the second term of the RHS:

    E(V | min{V, t}) = E(V I{V ≤ t} | min{V, t}) + E(V I{V > t} | min{V, t})
                     = V E I{V ≤ t} + (t + 1) E I{V > t}
                     = V P(V ≤ t) + (t + 1) P(V > t)
                     = V(1 − e^{−t}) + (t + 1) e^{−t}.

As for the second expression,

    E(V | max{V, t}) = E(V I{V ≤ t} | max{V, t}) + E(V I{V > t} | max{V, t})
                     = E(V I{V ≤ t}) + V E I{V > t}
                     = ∫_0^t v e^{−v} dv + V P(V > t)
                     = 1 − e^{−t} − t e^{−t} + V e^{−t}.
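As a side note (not from the book), the lack-of-memory property used in Problem 9 — that a unit-mean exponential V satisfies E(V | V > t) = t + 1 — is easy to confirm by simulation; the sketch below is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
V = rng.exponential(scale=1.0, size=10**6)   # unit-mean exponential lifetimes
t = 1.5
print(V[V > t].mean(), t + 1.0)              # lack of memory: E(V | V > t) = t + 1
```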
10. (a): Since, in each round, the probability of continuing is 1/2 it follows that the duration of the game N ∈ Fs(1/2), and, hence, that E N = 1/(1/2) = 2.
(b): Let the amount of money spent be Z. Then, since the stake is doubled each time as long as the game continues,

    E Z = E(E(Z | N)) = E(Σ_{n=1}^N 2^{n−1}) = E(2^N − 1) = Σ_{n=1}^∞ 2^n · (1/2^n) − 1 = +∞.

(c): This portion is to be deleted; there is a mix-up with the St. Petersburg game in Section 6.4.
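As an illustration outside the original solution, the following sketch simulates the doubling game of Problem 10: the mean duration settles near 2, while the sample mean of the amount spent, 2^N − 1, keeps growing with the sample size instead of stabilising.

```python
import numpy as np

rng = np.random.default_rng(3)
for reps in [10**3, 10**5, 10**7]:
    N = rng.geometric(0.5, size=reps)      # duration of the game, Fs(1/2)
    spent = 2.0**N - 1                     # total stake when the game lasts N rounds
    print(reps, N.mean(), spent.mean())    # E N ≈ 2; the spend average does not settle
```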
11. (a): Let a_n, n ≥ 1, be reals. Since

    E(Y_{n+1} + a_{n+1} | F_n) = Y_n + c_n + a_{n+1} = Y_n + a_n + (c_n + a_{n+1} − a_n),

we must have c_n + a_{n+1} − a_n = 0 in order for {(Y_n + a_n, F_n), n ≥ 1} to be a martingale. Thus, with a_n = −Σ_{k=1}^{n−1} c_k,

    E(Y_{n+1} − Σ_{k=1}^{n} c_k | F_n) = Y_n + c_n − Σ_{k=1}^{n} c_k = Y_n − Σ_{k=1}^{n−1} c_k,

which verifies that

    {(Y_n − Σ_{k=1}^{n−1} c_k, F_n), n ≥ 1}

is a martingale.
♣ The argument is reminiscent of the proof of the Doob decomposition.
(b): Similarly:

    E(Y_{n+1} · a_{n+1} | F_n) = Y_n · c_n · a_{n+1} = Y_n · a_n · (c_n a_{n+1} / a_n),

so that, necessarily,

    c_n a_{n+1} / a_n = 1

in order for {(Y_n · a_n, F_n), n ≥ 1} to be a martingale. Thus, with a_n = 1/∏_{k=1}^{n−1} c_k,

    E(Y_{n+1} · 1/∏_{k=1}^{n} c_k | F_n) = Y_n · c_n · 1/∏_{k=1}^{n} c_k = Y_n · 1/∏_{k=1}^{n−1} c_k,

which verifies that

    {(Y_n / ∏_{k=1}^{n−1} c_k, F_n), n ≥ 1}

is a martingale.
12. From Section 10.3 we know that {Sn − n/2, n ≥ 1} is a martingale. Thus, E(Xn+1 | Fn ) = 2E(Sn+1 − (n + 1)/2 | Fn ) = 2(Sn − n/2) = Xn . ♣ The conclusion also follows from the fact that Xn = 2(Sn −n/2) and Problem 10.17.11.
13. (a): We argue in analogy with Example 10.3.3.

    E(S_{n+1}³ | F_n) = E((S_n + Y_{n+1})³ | F_n)
                      = E(S_n³ + 3S_n² Y_{n+1} + 3S_n Y_{n+1}² + Y_{n+1}³ | F_n)
                      = S_n³ + 3S_n² E Y_{n+1} + 3S_n E Y_{n+1}² + E Y_{n+1}³,

so that, by centering and setting µ = E Y, σ² = Var Y, and β³ = E(Y − E Y)³,

    E((S_{n+1} − (n + 1)µ)³ | F_n) = (S_n − nµ)³ + 3(S_n − nµ)σ² + β³.        (S.10.1)

With an eye on Problem 10.17.11 (and Example 10.3.3) one takes care of Z_n = 3(S_n − nµ)σ² + β³ by subtracting (n + 1)Z_{n+1} on the LHS of (S.10.1) to obtain

    E((S_{n+1} − (n + 1)µ)³ − 3(n + 1)(S_{n+1} − (n + 1)µ)σ² − (n + 1)β³ | F_n)
        = (S_n − nµ)³ − 3n(S_n − nµ)σ² − nβ³,

which establishes that

    {(S_n − nµ)³ − 3nσ²(S_n − nµ) − nβ³, n ≥ 1}

is a martingale.
(b): Following the pattern of (a) we first obtain

    E(S_{n+1}⁴ | F_n) = E((S_n + Y_{n+1})⁴ | F_n)
                      = E(S_n⁴ + 4S_n³ Y_{n+1} + 6S_n² Y_{n+1}² + 4S_n Y_{n+1}³ + Y_{n+1}⁴ | F_n)
                      = S_n⁴ + 4S_n³ E Y_{n+1} + 6S_n² E Y_{n+1}² + 4S_n E Y_{n+1}³ + E Y_{n+1}⁴,

so that, by centering and setting µ = E Y, σ² = Var Y, β³ = E(Y − E Y)³, and γ⁴ = E(Y − E Y)⁴,

    E((S_{n+1} − (n + 1)µ)⁴ | F_n) = (S_n − nµ)⁴ + 6(S_n − nµ)²σ² + 4(S_n − nµ)β³ + γ⁴.        (S.10.2)
Next, in order to take care of 6(S_n − nµ)²σ², we remember from Example 10.3.3 that {U_n = (S_n − nµ)² − nσ², n ≥ 1} is a martingale, so that with a glimpse into Problem 10.17.11, we subtract 6(n + 1)σ² U_{n+1} on the LHS of (S.10.2), and subtract and add 6nσ⁴ on the RHS, to obtain

    E((S_{n+1} − (n + 1)µ)⁴ − 6(n + 1)σ²((S_{n+1} − (n + 1)µ)² − (n + 1)σ²) | F_n)
        = (S_n − nµ)⁴ − 6nσ²((S_n − nµ)² − nσ²) + 6nσ⁴ + 4(S_n − nµ)β³ + γ⁴.

We finally subtract (n + 1)(4(S_{n+1} − (n + 1)µ)β³ + γ⁴) and 3(n + 1)nσ⁴ on both sides in the last expression in order to take care of 4(S_n − nµ)β³ + γ⁴ and 6nσ⁴, respectively:

    E((S_{n+1} − (n + 1)µ)⁴ − 6(n + 1)σ²((S_{n+1} − (n + 1)µ)² − (n + 1)σ²)
        − (n + 1)(4(S_{n+1} − (n + 1)µ)β³ + γ⁴) − 3(n + 1)nσ⁴ | F_n)
        = (S_n − nµ)⁴ − 6nσ²((S_n − nµ)² − nσ²) − n(4(S_n − nµ)β³ + γ⁴) − 3n(n − 1)σ⁴.

By rearranging somewhat this tells us that

    {(S_n − nµ)⁴ − 6nσ²(S_n − nµ)² − 4nβ³(S_n − nµ) + 3n(n + 1)σ⁴ − nγ⁴, n ≥ 1}

is a martingale.
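As a quick sanity check outside the book's solution, the cubic martingale from part (a) has constant (zero) expectation; the sketch below verifies this by Monte Carlo for a hypothetical exponential increment distribution.

```python
import numpy as np

rng = np.random.default_rng(4)
reps, n = 300_000, 10
# Hypothetical increments: exponential with mean 1, so mu = 1, sigma^2 = 1, beta^3 = 2.
mu, sigma2, beta3 = 1.0, 1.0, 2.0
Y = rng.exponential(1.0, size=(reps, n))
S = Y.cumsum(axis=1)

for k in (1, 5, 10):
    C = S[:, k - 1] - k * mu                          # centred partial sum S_k - k*mu
    M = C**3 - 3 * k * sigma2 * C - k * beta3         # martingale from Problem 13(a)
    print(k, M.mean())                                # all means ≈ 0 (up to Monte Carlo error)
```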
14. By joining Theorems 10.9.1 and 10.9.2 we obtain

    λP(max_{0≤k≤n} |X_k| > λ) ≤ λP(max_{0≤k≤n} X_k > λ) + λP(min_{0≤k≤n} X_k < −λ)
                              ≤ E X_n^+ + E X_n^+ − E X_0 ≤ 3 max_{0≤k≤n} E|X_k|.

For martingales and non-negative submartingales Theorem 10.9.1 alone provides the upper bound E|X_n| (= max_{0≤k≤n} E|X_k|, since, as was mentioned in the text, submartingales have non-decreasing expectations).

15. Since {(−X_n, F_n), n ≥ 1} is a submartingale, we know from the Doob decomposition that −X_n = M_n + A_n, where {(M_n, F_n), n ≥ 1} is a martingale and {A_n, n ≥ 1} an increasing process. This means that X_n = −M_n − A_n, where {(−M_n, F_n), n ≥ 1} is (also) a martingale, so that the suggested decomposition is obtained by setting V_n = −M_n for all n.
(b): From submartingale theory we know that

    E A_n = E(−X_n) − E M_n = E(−X_n) − E M_0 = E(−X_n) + E V_0 ≤ sup_n E(−X_n) + E V_0 = − inf_n E X_n + E V_0.

(c): To begin with we know that {A_n, n ≥ 1} is an increasing process so that A_n → A_∞ ≤ +∞ a.s. From (b) and Fatou's lemma it follows that

    E A_∞ ≤ lim inf_{n→∞} E A_n ≤ − inf_n E X_n + E V_0 < ∞,

which implies that P(A_∞ < ∞) = 1.
(d): Combining (a) and (c) yields

    X_n = V_n − E(A_∞ | F_n) + E(A_∞ | F_n) − A_n.

Since {V_n, n ≥ 1} and {E(A_∞ | F_n), n ≥ 1} are martingales, so is their difference, which takes care of the martingale part of the decomposition. As for the other one, we first assert that E(A_∞ | F_n) − A_n ∈ F_n for all n, since the first term is F_n-measurable, and the second one is predictable, that is, F_{n−1}-measurable. Secondly,

    Z_n = E(A_∞ | F_n) − A_n = E(A_∞ − A_n | F_n) ≥ 0.

Thirdly, by (anti)smoothing and predictability,

    E(E(A_∞ | F_{n+1}) − A_{n+1} | F_n) = E(E(A_∞ | F_{n+1}) | F_n) − E(A_{n+1} | F_n) = E(A_∞ | F_n) − A_{n+1} ≤ E(A_∞ | F_n) − A_n,

which shows that {E(A_∞ | F_n) − A_n, n ≥ 1} is a supermartingale. Finally, by monotone convergence – remember (b) and (c),

    E(E(A_∞ | F_n) − A_n) = E A_∞ − E A_n → 0  as n → ∞.

This completes the proof of the fact that {E(A_∞ | F_n) − A_n, n ≥ 1} is a potential, thereby establishing the desired decomposition.
(e): Uniqueness is a consequence of the uniqueness of the Doob decomposition.
16. Having a potential at hand yields

    E X_n = E M_n + E Z_n = E M_0 + E Z_n → E M_0 + 0 = E M_0  as n → ∞,

while, at the same time, E X_n → 0 as n → ∞, which forces E M_n = 0 for all n. One decomposition thus is to let the martingale part vanish. However, since the Riesz decomposition for supermartingales is unique this generates the decomposition, viz. X_n = E(A_∞ | F_n) − A_n.

17. (a): By construction, conditioning on F_{n−1} is equivalent to conditioning on Y_{n−1}, so that

    E(Y_n | F_{n−1}) = E(Y_n | Y_{n−1}) = Y_{n−1}/2.

(b): Remembering Problem 10.17.11(b) (if necessary) we find that

    E(2^n Y_n | F_{n−1}) = 2^n E(Y_n | Y_{n−1}) = 2^n · Y_{n−1}/2 = 2^{n−1} Y_{n−1},    n ≥ 1,   (Y_0 = 1),

which shows that setting X_n = 2^n Y_n for n ≥ 0 makes {(X_n, F_n), n ≥ 0} into a martingale.
(c): In Problem 6.13.18 we represented the length Y_n (X_n there) as a product of independent U(0, 1)-distributed random variables. This means that we can write

    Y_n = ∏_{k=1}^n Z_k,    n ≥ 1,
where, thus, Z_1, Z_2, . . ., are independent U(0, 1)-distributed random variables. Inserting this into our martingale we obtain

    X_n = 2^n Y_n = 2^n ∏_{k=1}^n Z_k = ∏_{k=1}^n (2Z_k),

which means that {(X_n, F_n), n ≥ 0} is a martingale of product type with mean 1 as described in Example 10.3.4 – note that 2Z_k ∈ U(0, 2) and therefore that E(2Z_k) = 1, k = 1, 2, . . ..
In Problem 6.13.18 we found that log Y_n/n → −1 a.s. as n → ∞, so that log X_n/n → log 2 − 1 < 0 a.s., which shows that X_n → 0 a.s. as n → ∞. Since the mean equals (and, hence, converges to) 1 as n → ∞ we do not have L¹-convergence.
♣ This also shows that the martingale is non-regular.
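Outside the book's solution, a short simulation makes the dichotomy in Problem 17 concrete: simulated paths of X_n = ∏(2Z_k) collapse towards 0, while the sample mean stays near 1, carried by increasingly rare large paths.

```python
import numpy as np

rng = np.random.default_rng(5)
reps, n = 200_000, 15
Z = rng.uniform(size=(reps, n))
X = np.cumprod(2.0 * Z, axis=1)        # X_k = prod_{j<=k} (2 Z_j), a mean-1 product martingale

print(np.median(X[:, -1]))             # ≈ exp(15*(log 2 - 1)) ≈ 0.01: paths head to 0
print(X[:, -1].mean())                 # ≈ 1 = E X_15
# For large n the sample mean becomes unstable: the expectation 1 is carried by
# ever rarer, ever larger paths -- the lack of uniform integrability in action.
```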
18. We first note that E Y = 1, so that {(X_n, F_n), n ≥ 1} is, indeed, a martingale of product type with mean 1. Now, let N_n = Card{k ≤ n : Y_k = 3/2}. Then

    X_n = 3^{N_n} / 2^n,    n = 1, 2, . . . ,
    log X_n / n = (N_n/n) log 3 − log 2.

Since N_n ∈ Bin(n, 1/2) we know that N_n/n → 1/2 a.s. as n → ∞, which implies that

    log X_n / n → (1/2) log 3 − log 2 = log(√3/2) < log 1 = 0    a.s. as n → ∞,
    log X_n → −∞    a.s. as n → ∞,
    X_n → 0         a.s. as n → ∞.

Finally, the martingale is non-regular, since X_n → 0 a.s. as n → ∞ and E X_n = 1 for all n (in particular, E X_n → 1 as n → ∞).

19. We know that {L_n, n ≥ 1} is a martingale of product type with mean 1. Taking logarithms we obtain a sum, Σ_{k=1}^n log(f(Y_k; θ_1)/f(Y_k; θ_2)), where the summands are independent, identically distributed random variables, however, with mean < 0, due to the strict concavity of the logarithm (Problem 3.10.8). The strong law of large numbers therefore tells us that

    log L_n / n → E log(f(Y_1; θ_1)/f(Y_1; θ_2)) < 0    a.s. as n → ∞,
from which it follows that log L_n → −∞ a.s. as n → ∞.
(b): This follows from (a) and the continuity of the exponential function.
(c): Since L_n → 0 a.s. as n → ∞ and E L_n = 1 for all n (in particular, E L_n → 1 as n → ∞), the likelihood martingale is non-regular, hence neither closable nor uniformly integrable.
20. The assumptions imply that

    E(X_{n+1} | F_n) = X_n        (the martingale property),
    E(X_{n+1} | F_n) = X_{n+1}    (predictability),

that is, X_{n+1} = X_n a.s., which, n being arbitrary, shows that X_n = X_0 a.s. for all n.

21. (a): This can be achieved by taking the logarithm of any of the martingales of product type with mean 1 that we have encountered in the text as well as in various problems. Alternatively, using the hint, let Y_1, Y_2, . . . be independent random variables defined by

    Y_n = −1         with probability (n² − 1)/n²,
    Y_n = n² − 1     with probability 1/n²,

and set X_n = Σ_{k=1}^n Y_k, n ≥ 1. Since E Y_n = 0 for all n it follows that {X_n, n ≥ 1} is a martingale. And since P(Y_n = n² − 1 i.o.) = 0 by the Borel–Cantelli lemma, we conclude that X_n → −∞ a.s. as n → ∞.
(b): Let {(X_n, F_n), n ≥ 0} be a martingale such that X_n → +∞ a.s. as n → ∞. For example, turn around the second example in the previous problem: Let Z_1, Z_2, . . ., be independent random variables, defined as

    Z_n = 1             with probability (n² − 1)/n²,
    Z_n = −(n² − 1)     with probability 1/n²,

set X_n = Σ_{k=1}^n Z_k, n ≥ 1, and note that E Z_n = 0 for all n. This implies that {(X_n, F_n), n ≥ 0} is a martingale. And since P(Z_n ≠ 1 i.o.) = 0 by the Borel–Cantelli lemma, we conclude that X_n → +∞ a.s. as n → ∞.
Now, consider {exp{−X_n}, n ≥ 1}. By convexity this is a non-negative submartingale, which, in view of the continuity of the exponential function, converges almost surely to 0 as n → ∞.
♣ Remember the proof of the convergence theorem, where one of the steps amounted to considering the exponential of a (non-negative L¹-bounded) martingale.
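As an aside not in the book, the construction in 21(a) is easy to watch numerically: the sketch below simulates paths of X_n = Σ Y_k and shows them drifting off to −∞ even though every E Y_n = 0.

```python
import numpy as np

rng = np.random.default_rng(6)
steps, paths = 50_000, 50
ns = np.arange(2, steps + 2)                 # index from n = 2 so that (n^2-1)/n^2 < 1
U = rng.random((paths, steps))
# Y_n = n^2 - 1 with probability 1/n^2 and Y_n = -1 otherwise, so E Y_n = 0.
Y = np.where(U < 1.0 / ns**2, ns**2 - 1.0, -1.0)
X = Y.cumsum(axis=1)

print(np.median(X[:, -1]))                   # of order -steps: the paths drift to -infinity
print((X[:, -1] < 0).mean())                 # close to 1; late huge jumps are very rare
```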
22. Almost sure convergence + uniform integrability imply convergence in L¹. Moreover, the submartingale property implies that expectations are non-decreasing. Thus,

    0 ≤ E X_n ↗ 0  as n → ∞,

which necessitates E X_n = 0 for all n, and hence, due to the non-negativity, that X_n = 0 a.s.
♣ The submartingale constructed in the previous problem thus is not (can not be) uniformly integrable.
23. It is not that hard to prove convergence in probability: Almost sure convergence and the assumption about domination imply that Y_n → Y_∞ in L¹ as well, as n → ∞. Now, by the triangle inequality,

    |E(Y_n | F_n) − E(Y_∞ | F_∞)| ≤ |E(Y_n | F_n) − E(Y_∞ | F_n)| + |E(Y_∞ | F_n) − E(Y_∞ | F_∞)|.        (S.10.3)
The second term converges to 0 almost surely and in L¹ by Theorem 10.12.4. As for the first term,

    E|E(Y_n | F_n) − E(Y_∞ | F_n)| = E|E(Y_n − Y_∞ | F_n)| ≤ E E(|Y_n − Y_∞| | F_n) = E|Y_n − Y_∞| → 0  as n → ∞,

that is, E(Y_n | F_n) → E(Y_∞ | F_∞) in L¹ and, hence, in probability as n → ∞.
In order to prove almost sure convergence we have to strengthen the arguments a bit for the first term in the decomposition in the RHS of (S.10.3); the second one has already been taken care of. Let ε > 0, m be arbitrary, n > m, and set

    V_m = sup_{n≥m} |Y_n − Y_∞|.

Then

    P(V_m > ε) = P(∪_{n≥m} {|Y_n − Y_∞| > ε}) → 0  as m → ∞,

since Y_n → Y_∞ a.s. as n → ∞, which shows that V_m → 0 in probability as m → ∞. Since, moreover, V_m is non-increasing as m → ∞, an appeal to Theorem 5.3.5 tells us that, in fact, V_m → 0 a.s. as m → ∞. Since V_m ≤ 2Z ∈ L¹ it follows, in addition, by dominated convergence, that V_m → 0 in L¹ as m → ∞.
Next we note that the first term in the RHS of (S.10.3) is dominated by E(V_m | F_n), viz.,

    |E(Y_n | F_n) − E(Y_∞ | F_n)| ≤ E(sup_{n≥m} |Y_n − Y_∞| | F_n) = E(V_m | F_n).

Since m is fixed it now follows from another application of Theorem 10.12.4 that

    E(V_m | F_n) → E(V_m | F_∞)  a.s. and in L¹ as n → ∞.

Recalling the fact that the second term in the RHS of (S.10.3) converges to 0 almost surely as n → ∞, it therefore follows that

    lim sup_{n→∞} |E(Y_n | F_n) − E(Y_∞ | F_∞)| ≤ E(V_m | F_∞)  a.s.        (S.10.4)
Finally, let, again, ε > 0. Recalling the fact that V_m → 0 in L¹ as m → ∞, we now conclude that

    P(E(V_m | F_∞) > ε) ≤ E|E(V_m | F_∞)| / ε = E V_m / ε → 0  as m → ∞.

This shows that E(V_m | F_∞) → 0 in probability as m → ∞, which, due to the monotonicity in m and Theorem 5.3.5, shows that E(V_m | F_∞) → 0 a.s. as m → ∞. This fact, and the arbitrariness of m in (S.10.4), finally, establish the conclusion.

24. Let {(X_n, F_n), n ≥ 0} denote our martingale, let A denote the uniform bound of the increments, and let τ be a stopping time with finite mean. According to Definition 10.15.1 we have to check that

    {(X_{τ∧n}, F_n), n ≥ 0}

is a regular martingale.
In order to do so we show that the martingale is uniformly integrable; cf. Theorem 10.12.1. Now, because of the boundedness of the increments (and the triangle inequality) we find that |X_{τ∧n}| ≤ A · (τ ∧ n + 1) ≤ A(τ + 1) ∈ L¹. We have thus found an integrable random variable that dominates the martingale sequence, which, in turn, implies that the sequence {X_{τ∧n}, n ≥ 0} is uniformly integrable; Theorem 5.4.4.
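As a small numerical aside (not part of the solution), regularity of the stopped martingale yields E X_τ = E X_0; the sketch below checks this for a hypothetical example fitting the hypotheses of Problem 24: a symmetric ±1 random walk (bounded increments) stopped when it first hits −3 or +5, a stopping time with finite mean.

```python
import numpy as np

rng = np.random.default_rng(7)
reps, a, b = 20_000, -3, 5
X_tau = np.zeros(reps)
for i in range(reps):
    x = 0
    while a < x < b:                       # tau = first exit time of (a, b); E tau = |a|*b < infinity
        x += rng.choice((-1, 1))
    X_tau[i] = x
print(X_tau.mean())                        # ≈ 0 = E X_0: optional stopping applies
```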
25. (a): This is the exponential martingale, Example 3.6, in the standard normal case.
(b): This follows since such martingales are of product type with mean 1, which, as we have seen in several problems, have the property that the averages of the logarithms converge to a negative constant almost surely, so that the logarithms converge to −∞ and the martingale itself converges to 0 almost surely.
(c): As for mean convergence,

    E(exp{S_n − n/2})^r = E exp{rS_n − nr/2} = ψ_{S_n}(r) · e^{−nr/2} = e^{nr²/2} · e^{−nr/2} = e^{nr(r−1)/2}
        → 0      as n → ∞, for r < 1,
        = 1      for r = 1,
        → +∞     as n → ∞, for r > 1.

26. (a): Theorems 10.15.7 and 10.14.3.
(b): Theorem 6.9.3 takes care of the first statement, from which the second one follows via Theorem 6.8.2(iv). Next, Theorem 10.14.3 tells us that

    E(Σ_{k=1}^{τ(t)} W_k) = E τ(t) · µ_w,

which, together with the sandwich inequality t < Σ_{k=1}^{τ(t)} W_k ≤ t + W_{τ(t)}, in turn, implies that

    1 < µ_w E τ(t) / t ≤ 1 + E W_{τ(t)} / t.

Almost sure convergence and Fatou's lemma then yield

    1 ≤ lim inf_{t→∞} µ_w E τ(t) / t.

For the opposite inequality, suppose first that the summands are bounded above by some constant, M, say. Borrowing from page 535 we then conclude that

    lim sup_{t→∞} µ_w E τ(t) / t ≤ 1 + lim sup_{t→∞} M/t = 1.

For the general case we recall from pages 535–536 that we reach the level t more rapidly for the unrestricted random walk than for the restricted one, and the third statement follows. Finally, the third statement and another application of Theorem 10.14.3 together yield

    E V(t)/t = E τ(t) · µ_z / t → µ_z/µ_w  as t → ∞.
(c): Direct insertion yields

    µ_w = E W_1 = E min{Y_1, a} = ∫_0^a x dF_{Y_1}(x) + a P(Y_1 > a),
    µ_z = E Z_1 = P(Y_1 ≤ a),

    τ(t)/t → 1 / (∫_0^a x dF_{Y_1}(x) + a P(Y_1 > a))                  a.s.  as t → ∞,
    V_{τ(t)}/t → P(Y_1 ≤ a) / (∫_0^a x dF_{Y_1}(x) + a P(Y_1 > a))     a.s.  as t → ∞,
    E τ(t)/t → 1 / (∫_0^a x dF_{Y_1}(x) + a P(Y_1 > a))                as t → ∞,
    E V_{τ(t)}/t → P(Y_1 ≤ a) / (∫_0^a x dF_{Y_1}(x) + a P(Y_1 > a))   as t → ∞.
27. In the exponential case, direct computations yield

    µ_w = λ(1 − e^{−a/λ}),
    µ_z = 1 − e^{−a/λ},

    τ(t)/t → 1 / (λ(1 − e^{−a/λ}))                                 a.s.  as t → ∞,
    V_{τ(t)}/t → (1 − e^{−a/λ}) / (λ(1 − e^{−a/λ})) = 1/λ          a.s.  as t → ∞,
    E τ(t)/t → 1 / (λ(1 − e^{−a/λ}))                               as t → ∞,
    E V_{τ(t)}/t → (1 − e^{−a/λ}) / (λ(1 − e^{−a/λ})) = 1/λ        as t → ∞.

(d): The lack of memory property implies that there is no point in replacing used components, since they are “reborn” whenever they are inspected and found to be “alive”.
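Purely as an illustration (not in the book), the exponential case of Problem 27 can be checked by simulating the truncated renewal process: with W_k = min{Y_k, a}, the number of inspections up to time t grows like t/(λ(1 − e^{−a/λ})). The parameter values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(8)
lam, a, t = 2.0, 1.0, 2000.0              # exponential mean lam, inspection interval a, horizon t
mu_w = lam * (1.0 - np.exp(-a / lam))     # E min{Y, a}

counts = []
for _ in range(200):
    total, n = 0.0, 0
    while total <= t:                     # tau(t): first time the partial sums of W exceed t
        total += min(rng.exponential(lam), a)
        n += 1
    counts.append(n)
print(np.mean(counts) / t, 1.0 / mu_w)    # both ≈ 1/(lam(1 - e^{-a/lam}))
```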
28. Letting Y_1, Y_2, . . . be martingale differences,

    E(X_{n+1}² − (S_{n+1}(X))² | F_n) = E((Σ_{k=1}^{n+1} Y_k)² − Σ_{k=1}^{n+1} Y_k² | F_n)
        = E(Σ_{i,j=1, i≠j}^{n+1} Y_i Y_j | F_n)
        = Σ_{i,j=1, i≠j}^{n} Y_i Y_j + 2E(X_n Y_{n+1} | F_n)
        = X_n² − (S_n(X))² + 2X_n E(Y_{n+1} | F_n) = X_n² − (S_n(X))²,

which verifies the martingale property of the first sequence (measurability and integrability are obvious). As for the second one,

    E(X_{n+1}² − (s_{n+1}(X))² | F_n) = E((X_n + Y_{n+1})² − (s_n(X))² − E(Y_{n+1}² | F_n) | F_n)
        = E(X_n² + Y_{n+1}² + 2X_n Y_{n+1} − (s_n(X))² − E(Y_{n+1}² | F_n) | F_n)
        = X_n² + E(Y_{n+1}² | F_n) + 2X_n E(Y_{n+1} | F_n) − (s_n(X))² − E(Y_{n+1}² | F_n)
        = X_n² − (s_n(X))².
Since the third sequence is the difference of two martingale sequences, it finally follows from what we have shown so far that the third sequence is itself a martingale sequence.
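As an illustration beyond the text, the first sequence in Problem 28 can be probed numerically with a hypothetical dependent martingale-difference sequence (independent fair signs, magnitudes depending on the past): the expectation of X_n² − Σ Y_k² stays at 0 for every n.

```python
import numpy as np

rng = np.random.default_rng(9)
reps, n = 500_000, 30
X = np.zeros(reps)
sumsq = np.zeros(reps)
for k in range(1, n + 1):
    # Martingale differences with dependence: Y_k = sign * (1 + |X_{k-1}|)/k,
    # where the sign is a fresh fair coin, so E(Y_k | F_{k-1}) = 0.
    sign = rng.choice((-1.0, 1.0), size=reps)
    Y = sign * (1.0 + np.abs(X)) / k
    X += Y
    sumsq += Y**2
    if k in (1, 10, 30):
        print(k, (X**2 - sumsq).mean())    # ≈ 0 for every k (up to Monte Carlo error)
```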
29. We first note that the independence assumption implies that E Y_k = 0 for all k in order for {(X_n, F_n), n ≥ 0} to be a martingale. Secondly, the conditional square function reduces to

    (s_n(X))² = Σ_{k=1}^n E(Y_k² | F_{k−1}) = Σ_{k=1}^n E Y_k² = Var X_n,

and the conclusions of Problem 10.17.28 turn into

    {(Σ_{k=1}^n Y_k)² − Σ_{k=1}^n Y_k², n ≥ 1}       is a martingale,
    {(Σ_{k=1}^n Y_k)² − Σ_{k=1}^n E Y_k², n ≥ 1}     is a martingale,
    {Σ_{k=1}^n (Y_k² − E Y_k²), n ≥ 1}               is a martingale.

In particular, the second case reduces to Example 10.3.3, and the third case reduces to Example 10.3.1.
30. The proof is obtained by “translating” the solution of Problem 10.17.23 to the present setting. Almost sure convergence and domination imply that Y_n → Y_{−∞} in L¹ as well, as n → −∞: By the triangle inequality,

    |E(Y_n | F_n) − E(Y_{−∞} | F_{−∞})| ≤ |E(Y_n | F_n) − E(Y_{−∞} | F_n)| + |E(Y_{−∞} | F_n) − E(Y_{−∞} | F_{−∞})|.        (S.10.5)

The second term in the RHS of (S.10.5) converges to 0 almost surely and in L¹ by Theorem 10.12.4. As for the first term,

    E|E(Y_n | F_n) − E(Y_{−∞} | F_n)| ≤ E E(|Y_n − Y_{−∞}| | F_n) = E|Y_n − Y_{−∞}| → 0  as n → −∞,

that is, E(Y_n | F_n) → E(Y_{−∞} | F_{−∞}) in L¹ and, hence, in probability as n → −∞.
In order to prove almost sure convergence we strengthen the arguments for the first term in the decomposition in the RHS of (S.10.5); the second one has already been taken care of. Let ε > 0, let m ∈ −N be arbitrary, and set, for n < m,

    V_m = sup_{n≤m} |Y_n − Y_{−∞}|.

Then,

    P(V_m > ε) = P(∪_{n≤m} {|Y_n − Y_{−∞}| > ε}) → 0  as m → −∞,
since Y_n → Y_{−∞} a.s. as n → −∞, which shows that V_m → 0 in probability as m → −∞. Since, moreover, V_m is non-increasing as m → −∞, an appeal to Theorem 5.3.5 tells us that V_m → 0 a.s. as m → −∞. Since V_m ≤ 2Z ∈ L¹ it follows, in addition, that V_m → 0 in L¹ as m → −∞.
Next we observe that

    |E(Y_n | F_n) − E(Y_{−∞} | F_n)| ≤ E(sup_{n≤m} |Y_n − Y_{−∞}| | F_n) = E(V_m | F_n).
Since m is fixed Theorem 10.12.4 tells us that

    E(V_m | F_n) → E(V_m | F_{−∞})  a.s. and in L¹ as n → −∞.

Recalling the fact that the second term in the RHS of (S.10.5) converges to 0 almost surely as n → −∞, it therefore follows that

    lim sup_{n→−∞} |E(Y_n | F_n) − E(Y_{−∞} | F_{−∞})| ≤ E(V_m | F_{−∞}).        (S.10.6)

Finally, for ε > 0, arbitrary,

    P(E(V_m | F_{−∞}) > ε) ≤ E|E(V_m | F_{−∞})| / ε = E V_m / ε → 0  as m → −∞,

since, as we have seen already, V_m → 0 in L¹ as m → −∞. This shows that E(V_m | F_{−∞}) → 0 in probability as m → −∞, which, due to the monotonicity in m and Theorem 5.3.5, shows that E(V_m | F_{−∞}) → 0 a.s. as m → −∞, and, hence, due to the arbitrariness of m, finally, finishes the proof.