130 38 920KB
English Pages 124 Year 2018
David A. Harville
Linear Models and the Relevant Distributions and Matrix Algebra
Contents
2 Matrix Algebra: a Primer
1
3 Random Vectors and Matrices
19
4 The General Linear Model
33
5 Estimation and Prediction: Classical Approach
41
6 Some Relevant Distributions and Their Properties
63
7 Confidence Intervals (or Sets) and Tests of Hypotheses
87
2 Matrix Algebra: a Primer
EXERCISE 1. Let A represent an M N matrix and B an N M matrix. Can the value of A C B0 be determined from the value of A0 C B (in the absence of any other information about A and B)? Describe your reasoning. Solution. Yes! In light of results (1.12) and (1.10), we have that .A0 C B/0 D .A0 /0 C B0 D A C B0:
Thus, to obtain the value of A C B0, it suffices to transpose the value of A0 C B. EXERCISE 2. Show that for any M N matrix A D faij g and N P matrix B D fbij g, .AB/0 D B0 A0 [thereby verifying result (1.13)].
Solution. Since (for k D 1; 2; : : : ; N ) the i kth element of the P N matrix B0 is the ki th element of B and the kj th element of the N M matrix A0 is the j kth element of A, it follows from the P very definition of a matrix product that the ij th element of the P M matrix B0 A0 is N kD1 bki ajk PN PN (i D 1; 2; : : : ; P ; j D 1; 2; : : : ; M ). And upon rewriting kD1 bki ajk as kD1 ajk bki , it is evident that the ij th element of B0 A0 equals the j i th element of AB. Moreover, the j i th element of AB is the ij th element of .AB/0 . Thus, each element of B0 A0 equals the corresponding element of .AB/0, and we conclude that .AB/0 D B0 A0 . EXERCISE 3. Let A D faij g and B D fbij g represent N N symmetric matrices. (a) Show that in the special case where N D 2, AB is symmetric if and only if b12 .a11 a12 .b11 b22 /. (b) Give a numerical example where AB is nonsymmetric. (c) Show that A and B commute if and only if AB is symmetric. Solution. (a) Suppose that N D 2. Then, a a12 b11 b12 a b C a12 b12 AB D 11 D 11 11 a12 a22 b12 b22 a12 b11 C a22 b12
a11 b12 C a12 b22 : a12 b12 C a22 b22
Thus, AB is symmetric if and only if a11 b12 C a12 b22 D a12 b11 C a22 b12 or, equivalently, if and only if b12 .a11
a22 / D a12 .b11
b22 /:
(b) Take a11 a12 2 1 b11 b12 2 0 D and D : a21 a22 1 1 b21 b22 0 1
a22 / D
2
Matrix Algebra: a Primer
And, for i; j D 3; 4; : : : ; N , take aij D bij D 0. Then, 0 4 1 0 ::: B2 1 0 : : : B B AB D B0 0 0 : : : B :: :: :: : : @: : : :
1 0 0C C 0C C: :: C :A
0 0 0 ::: 0
(c) Since A and B are symmetric,
.AB/0 D B0 A0 D BA:
Thus, BA D AB if and only if .AB/0 D AB, that is, if and only if AB is symmetric. EXERCISE 4. Let A represent an M N partitioned matrix comprising R rows and U columns of blocks, the ij th of which is an Mi Nj matrix Aij that (for some scalar cij ) is expressible as Aij D cij 1Mi 10Nj (a scalar multiple of an Mi Nj matrix of 1’s). Similarly, let B represent an N Q partitioned matrix comprising U rows and V columns of blocks, the ij th of which is an Ni Qj matrix Bij that (for some scalar dij ) is expressible as Bij D dij 1Ni 10Qj . Obtain (in as simple form as possible) the conditions that must be satisfied by the scalars cij (i D 1; 2; : : : ; R; j D 1; 2; : : : ; U ) and dij (i D 1; 2; : : : ; U ; j D 1; 2; : : : ; V ) in order for AB to equal a null matrix.
Solution. According to result (2.5), AB is expressible as a partitioned matrix comprising R rows and V columns of blocks, the ij th of which is U X Fij D Ai k Bkj :
Moreover,
kD1
Ai k Bkj D ci k 1Mi 10Nk dkj 1Nk10Qj D ci k dkj 1Mi10Nk1Nk10Qj D ci k dkj 1Mi .Nk /10Qj
D Nk ci k dkj 1Mi 10Qj :
Thus, AB D 0 if and only if for i D 1; : : : ; R and j D 1; : : : ; V ,
PU
kD1
Nk ci k dkj D 0.
EXERCISE 5. Show that for any M N matrix A and N M matrix B, tr.AB/ D tr.A0 B0 /:
Solution. Making use of results (3.3), (1.13), and (3.7), we find that tr.AB/ D trŒ.AB/0 D tr.B0 A0 / D tr.A0 B0 /: EXERCISE 6. Show that for any M N matrix A, N P matrix B, and P M matrix C, tr.ABC/ D tr.CAB/ D tr.BCA/ (i.e., the cyclic permutation of the 3 matrices in the product ABC does not affect the trace of the product). Solution. Making use of Lemma 2.3.1, we find that tr.ABC/ D trŒ.AB/C D trŒC.AB/ D tr.CAB/ and similarly that tr.CAB/ D trŒ.CA/B D trŒB.CA/ D tr.BCA/:
EXERCISE 7. Let A, B, and C represent square matrices of order N.
3
Matrix Algebra: a Primer
(a) Using the result of Exercise 5 (or otherwise), show that if A, B, and C are symmetric, then tr.ABC/ D tr.BAC/. (b) Show that [aside from special cases like that considered in Part (a)] tr.BAC/ is not necessarily equal to tr.ABC/. Solution. (a) Making use of the result of Exercise 5 and of result (1.13), we find that tr.ABC/ D trŒ.AB/0 C 0 D tr.B0 A0 C 0 /: Now, suppose that A, B, and C are symmetric. Then, B0 A0 C 0 D BAC, and we conclude that tr.ABC/ D tr.BAC/.
(b) Let A D diag.A ; 0/, B D diag.B ; 0/, and C D diag.C ; 0/, where 1 1 1 0 1 0 A D ; B D ; and C D : 0 0 1 0 0 1 1 1 Then, A B C D 0 and B A C D , and, observing that ABC D diag.A B C ; 0/ 1 1
and BAC D diag.B A C ; 0/ and making use of result (3.5), we find that
tr.BAC/ D tr.B A C / D 2 ¤ 0 D tr.A B C / D tr.ABC/: EXERCISE 8. Which of the following sets are linear spaces: (1) the set of all N N diagonal matrices, (2) the set of all N N upper triangular matrices, and (3) the set of all N N nonsymmetric matrices? Solution. Clearly, the sum of two N N diagonal matrices is a diagonal matrix, and the sum of two N N upper triangular matrices is upper triangular. Likewise, a scalar multiple of an N N diagonal matrix is a diagonal matrix, and a scalar multiple of an N N upper triangular matrix is upper triangular. However, the sum of two N N nonsymmetric matrices is not necessarily nonsymmetric. For example, if A is an N N nonsymmetic matrix, then A and A0 are also N N nonsymmetric matrices, yet the sums A C . A/ D 0 and A C A0 are symmetric. Also, the product of the scalar 0 and any N N nonsymmetric matrix is the N N null matrix, which is symmetric. Thus, the set of all N N diagonal matrices and the set of all N N upper triangular matrices are linear spaces, but the set of all N N nonsymmetric matrices is not a linear space. EXERCISE 9. Define
0
1 A D @2 1
2 1 1
and (for i D 1; 2; 3) let a0i represent the i th row of A.
1 1 2
1 0 1A; 1
(a) Show that the set fa01 ; a02 g is a basis for R.A/.
(b) Find rank.A/.
(c) Making use of the answer to Part (b) (or otherwise), find a basis for C.A/. Solution. (a) For any scalars k1 and k2 such that k1 a01 C k2 a02 D 0, we have that k1 C 2k2 D 0, 2k1 C k2 D 0, k2 k1 D 0, and k2 D 0, implying that k1 D k2 D 0. Thus, the set fa01 ; a02 g is linearly independent. Moreover, a03 D a02 a01 , so that every row of A belongs to sp.a01 ; a02 /, implying (in light of Lemma 2.4.2) that R.A/ sp.a01 ; a02 / and hence that fa01 ; a02 g spans R.A/. And we conclude that fa01 ; a02 g is a basis for R.A/. (b) Rank.A/ D 2, as is evident from the result of Part (a) upon observing that rank.A/ D dimŒR.A/.
4
Matrix Algebra: a Primer
(c) Any two of the four columns of A are linearly independent (as is easily verified). Moreover, dimŒC.A/ D rank.A/ D 2. And we conclude, on the basis of Theorem 2.4.11, that the set comprising any two of the columns of A is a basis for C.A/; in particular, the set comprising the first two columns of A is a basis for C.A/. EXERCISE 10. Let A1 ; A2 ; : : : ; AK represent matrices in a linear space V, and let U represent a subspace of V. Show that sp.A1 ; A2 ; : : : ; AK / U if and only if A1 ; A2 ; : : : ; AK are contained in U (thereby establishing what is essentially a generalization of Lemma 2.4.2). Solution. If sp.A1 ; A2 ; : : : ; AK / U, then, obviously, A1 ; A2 ; : : : ; AK are contained in U. Now (for purposes of establishing the converse), suppose that A1 ; A2 ; : : : ; AK are contained in U. And let A represent an arbitrary matrix in V. If A 2 sp.A1 ; A2 ; : : : ; AK /, then A is expressible as a linear combination of A1 ; A2 ; : : : ; AK , implying (since U is a linear space) that A 2 U. Thus, sp.A1 ; A2 ; : : : ; AK / U. EXERCISE 11. Let V represent a K-dimensional linear space of M N matrices (where K 1). Further, let fA1 ; A2 ; : : : ; AK g represent a basis for V, and, for arbitrary scalars x1 ; x2 ; : : : ; xK and P PK y1 ; y2 ; : : : ; yK , define A D K i D1 xi Ai and B D j D1 yj Aj . Show that AB D
K X
xi yi
i D1
for all choices of x1 ; x2 ; : : : ; xK and y1 ; y2 ; : : : ; yK if and only if the basis fA1 ; A2 ; : : : ; AK g is orthonormal. Solution. Making use of result (4.8) (and of basic properties of an inner product), we find that AB D D D D
K X
xi Ai
i D1
K X i D1
K X
i D1
yj Aj
j D1 K X yj Aj xi Ai j D1
xi
K hX
j D1
i D1
K X
K X
xi
K X
yj Aj
Ai
i
yj .Aj Ai /:
j D1
Now, suppose that the basis fA1 ; A2 ; : : : ; AK g is orthonormal. Then, Ai Ai D 1 and, for j ¤ i , P P Aj Ai D 0, implying that jKD1 yj .Aj Ai / D yi . Thus, A B D K i D1 xi yi . PK x y (for all choices of x1 ; x2 ; : : : ; xK and Conversely, suppose that A B D i D1 i i y1 ; y2 ; : : : ; yK ). Upon setting xs D 1 and y t D 1 (where s and t are integers between 1 and K, inclusive) and setting xi D 0 for i ¤ s and yj D 0 for j ¤ t, we find that A B D As A t P and that K i D1 xi yi equals 1 if t D s and equals 0 if t ¤ s. Thus, As As D 1 and, for t ¤ s, As A t D 0 (s D 1; 2; : : : ; K), so that the basis fA1 ; A2 ; : : : ; AK g is orthonormal.
EXERCISE 12. An N N matrix A is said to be involutory if A2 D I, that is, if A is invertible and is its own inverse. (a) Show that an N N matrix A is involutory if and only if .I A/.I C A/ D 0. a b (b) Show that a 2 2 matrix A D is involutory if and only if (1) a2 C bc D 1 and d D a c d or (2) b D c D 0 and d D a D ˙1.
5
Matrix Algebra: a Primer Solution. (a) Clearly, .I
A/.I C A/ D I
A C .I
A/A D I
ACA
A2 D I
A2:
Thus, .I
A/.I C A/ D 0 , I
A2 D 0 , A2 D I:
(b) Clearly, A2 D
a2 C bc ac C cd
ab C bd : bc C d 2
And if Condition (1) or (2) is satisfied, it is easy to see that A is involutory. Conversely, suppose that A is involutory. Then, ab D db and ac D dc, implying that d D a or b D c D 0. Moreover, a2 C bc D 1 and d 2 C bc D 1. Consequently, if d D a, Condition (1) is satisfied. Alternatively, if b D c D 0, then d 2 D a2 D 1, implying that d D a D ˙1 [in which case Condition (2) is satisfied] or that d D a D ˙1 [in which case Condition (1) is satisfied]. EXERCISE 13. Let A D faij g represent an M N matrix of full row rank. (a) Show that in the special case M D 1 (i.e., in the special case where A is an N -dimensional row vector), there exists an N -dimensional column vector b, N 1 elements of which are 0, that is a right inverse of A. (b) Generalize from Part (a) (to an arbitrary value of M ) by showing that there exists an N M matrix B, N M rows of which are null vectors, that is a right inverse of A. Solution. (a) Since A is of full row rank, it contains at least one nonzero element. Take k to be any integer between 1 and N , inclusive, such that a1k ¤ 0. And take b D fbj g to be the N -dimensional column vector whose kth element is 1=a1k and whose other N 1 elements equal 0. Then, Ab is the 1 1 matrix .c/, where N X cD a1j bj D a1k .1=a1k / D 1: j D1
Thus, b is a right inverse of A.
(b) Since A is of full row rank (i.e., of rank M ), it contains M linearly independent columns, say columns j1 ; j2 ; : : : ; jM (j1 < j2 < < jM ). Denote by A the M M submatrix of A obtained by striking out all of the columns of A except the j1 ; j2 ; : : : ; jM th columns (and observe that A is nonsingular). Now, take B to be the N M matrix whose j1 ; j2 ; : : : ; jM th rows are respectively the first through M th rows of A1 and whose other N M rows are null vectors. Then, letting aj represent the j th column of A and bj0 the j th row of B (j D 1; 2; : : : ; N ), we find that 0 0 1 b1 B b0 C B 2C AB D .a1 ; a2 ; : : : ; aN /B : C @ :: A b0N 0 0 1 bj1 N M B bj0 C X X B 2C D aj bj0 D ajs bj0 s D .aj1 ; aj2 ; : : : ; ajM /B : C D A A1 D I: @ :: A sD1 j D1 bj0 M Thus, B is a right inverse of A. EXERCISE 14. Provide an alternative verification of equality (6.10) by premultiplying or postT U multiplying the right side of the equality by and by confirming that the resultant product V W equals IM CN .
6
Matrix Algebra: a Primer
Solution. Let
T U T V W B12 , where B22
1
BD Then, B D
B11 B21
1
B11 D T T
1
CT 1
D I C UQ
C T 1 UQ Q 1V T
1
1
1
UQ
VT
VT
1
1
T
1
VT
CU
1
UQ
1
VT
C UQ
1
D
1
UQ Q 1
VT
1
C UQ
1
D 0;
Q
1
:
1
D IM ; B12 D T
B21 D V T D VT
1
T 1 1
1
CT
UQ
1
C VT
QQ
1
D VT 1 VT D 0; and
1
D VT
B22 D V
1
1
UQ
D VT
D QQ
1
T 1 1
UQ
UQ
1
1
VT
UQ
1
VT
1
1
1
1
VT
CWQ
CWQ
UQ CW
WQ
1
Q 1
1
VT
VT 1
1
1
1
D IN : Thus, B D IM CN . 0
1 2 0 4 EXERCISE 15. Let A D @3 5 6A. Use the results of Section 2.6 to show that A is nonsingular 4 2 12 T U 1 and to obtain A . (Hint. Partition A as A D , where T is a square matrix of order 2.) V W T U Solution. Partition A as A D , where T is a square matrix of order 2. Upon observing that V W T is lower triangular and that its diagonal elements are nonzero, it follows from Part (1) of Lemma 2.6.4 that T is nonsingular and that 1=2 0 0:5 0 1 T D D : .1=5/3.1=2/ 1=5 0:3 0:2 Now, defining Q as in Theorem 2.6.6, that is, as Q D W V T 1 U, we find that 4 0:5 0 D .12 8/ D .4/: Q D 12 .4; 2/ 0:3 0:2 6 And, based on Part (1) of Theorem 2.6.6, we conclude that A is nonsingular, and, upon observing
7
Matrix Algebra: a Primer that 1
1
D .0:25/; 0:5 0 Q 1 V T 1 D .0:25/.4; 2/ D .0:35; 0:1/; 0:3 0:2 0:5 0 4 0:5 T 1 UQ 1 D .0:25/ D ; and 0:3 0:2 6 0 Q
D .4/
T
1
CT
1
UQ
1
VT
we further conclude that A
1
1
DT D D D
1
UQ 1 Q Q 1 V T 1 0 0:5 C .4/.0:35; 0:1/ 0:2 0 0 0:7 0:2 C 0:2 0 0 0:2 ; 0:2
C T
0:5 0:3 0:5 0:3
1:2 0:3
0
1:2 D @ 0:3 0:35
1
0:2 0:2 0:1
1 0:5 0 A: 0:25
EXERCISE 16. Let T D ftij g represent an N N triangular matrix. Show that if T is orthogonal, then T is diagonal. If T is orthogonal, what can be inferred about the values of the diagonal elements t11 ; t22 ; : : : ; tNN of T ? Solution. Suppose that T is orthogonal, and consider the case where T is upper triangular (i.e., the case where tij D 0 for i > j D 1; 2; : : : ; N ). For j D 1; 2; : : : ; N , let tj D .t1j ; t2j ; : : : ; tNj /0 represent the j th column of T . Then, the ij th element of T 0 T equals ti0 tj . And, for i j D 1; 2; : : : ; N , N i X X ti0 tj D tki tkj D tki tkj : kD1
kD1
In particular, for j D 1; 2; : : : ; N , we have that
t10 tj D t11 t1j : 2 Thus, t11 D 1 and (for j > 1) t11 t1j D 0, implying that t11 D ˙1 and that (for j > 1) t1j D 0. We have established that, for i D 1,
ti i D ˙1
and
ti;i C1 D ti;i C2 D D tiN D 0:
(S.1)
Let us now show that result (S.1) holds for all i . To do so, let us proceed by mathematical induction. Suppose that result (S.1) holds for all i less than or equal to an arbitrary integer I (between 1 and N 1, inclusive). Then, for j D I C 1; I C 2; : : : ; N , tI0 C1 tj D
IX C1 kD1
tk;I C1 tkj D tI C1;I C1 tI C1;j :
Thus, tI2C1;I C1 D 1 and (for j > I C 1) tI C1;I C1 tI C1;j D 0, implying that tI C1;I C1 D ˙1 and that (for j > I C 1) tI C1;j D 0 and hence that result (S.1) holds for i D 1; : : : ; I; I C 1 (which completes the mathematical induction argument). And we conclude that T is diagonal and that its
8
Matrix Algebra: a Primer
diagonal elements equal ˙1—this is also the case when T is lower triangular, as can be established via a similar argument. EXERCISE 17. Let A represent an N N matrix. Show that for any N N nonsingular matrix B, B 1AB is idempotent if and only if A is idempotent. Solution. If A is idempotent, then .B 1AB/2 D B 1ABB 1AB D B 1A2 B D B 1AB and hence B 1AB is idempotent. Conversely, if B 1AB is idempotent, then A2 D AA D IN AIN AIN D BB 1ABB 1ABB
1
D BB 1ABB
1
D A;
so that A is idempotent. EXERCISE 18. Let x D fxi g and y D fyi g represent nonnull N -dimensional column vectors. Show that xy 0 is a scalar multiple of an idempotent matrix (i.e., that xy 0 D cA for some scalar c P and some idempotent matrix A) if and only if N i D1 xi yi ¤ 0 (i.e., if and only if x and y are not orthogonal with respect to the usual inner product). P Solution. Let k D N i D1 xi yi . And observe that .xy 0 /2 D xy 0 xy 0 D x.y 0 x/y 0 D kxy 0:
(S.2)
Now, suppose that xy 0 D cA for some scalar c and some idempotent matrix A. Then, c ¤ 0 and A ¤ 0, as is evident upon observing that the ij th element of xy 0 equals xi yj (i; j D 1; 2; : : : ; N ) and hence that xy 0 (like x and y) is nonnull. Moreover, .xy 0 /2 D cA.cA/ D c 2 A2 D c 2 A: Thus, .xy 0 /2 ¤ 0, and we conclude [on the basis of result (S.2)] that k ¤ 0. Conversely, suppose that k ¤ 0. And take c D k and A D .1=k/xy 0. Then, xy 0 D cA. Moreover, A2 D .1=k/xy 0 Œ.1=k/xy 0 D .1=k 2 /x.y 0 x/y 0 D .1=k 2 /kxy 0 D .1=k/xy 0 D A; and hence A is idempotent. EXERCISE 19. Let A represent a 4 N matrix of rank 2, and take b D fbi g to be a 4-dimensional column vector. Suppose that b1 D 1 and b2 D 0 and that two of the N columns of A are the vectors a1 D .5; 4; 3; 1/0 and a2 D .1; 2; 0; 1/0 . Determine for which values of b3 and b4 the linear system Ax D b (in x) is consistent.
Solution. According to Theorem 2.9.2, Ax D b is consistent if and only if b 2 C.A/. Moreover, C.A/ D sp.a1 ; a2 /, as is evident from Theorem 2.4.11 upon observing that neither a1 nor a2 is a scalar multiple of the other and hence that a1 and a2 are linearly independent. Thus, Ax D b is consistent if and only if b 2 sp.a1 ; a2 / or, equivalently, if and only if there exist scalars k1 and k2 such that b D k1 a1 C k2 a2 : The first and second elements of the vector k1a1 Ck2 a2 equal b1 (D 1) and b2 (D 0), respectively, if and only if k1 D 1=3 and k2 D 2=3, as is easily verified. And, for k1 D 1=3 and k2 D 2=3, both the third and fourth elements of k1 a1 C k2 a2 equal 1. Accordingly, there exist scalars k1 and k2 such that b D k1 a1 C k2 a2 if and only if b3 D b4 D 1, and hence Ax D b is consistent if and only if b3 D b4 D 1. EXERCISE 20. Let A represent an M N matrix. Show that for any generalized inverses G1 and
9
Matrix Algebra: a Primer
G2 of A and for any scalars w1 and w2 such that w1 Cw2 D 1, the linear combination w1 G1 Cw2 G2 is a generalized inverse of A. Solution. We find that A.w1 G1 C w2 G2 /A D w1 AG1 A C w2 AG2 A D w1 A C w2 A D .w1 C w2 /A D A: Thus, w1 G1 C w2 G2 is a generalized inverse of A. EXERCISE 21. Let A represent an N N matrix. (a) Using the result of Exercise 20 in combination with Corollary 2.10.11 (or otherwise), show that if A is symmetric, then A has a symmetric generalized inverse. (b) Show that if A is singular (i.e., of rank less than N ) and if N > 1, then (even if A is symmetric) A has a nonsymmetric generalized inverse. (Hint. Make use of the second part of Theorem 2.10.7.) Solution. (a) Suppose that A is symmetric. Then, according to Corollary 2.10.11, .A /0 is a generalized inverse of A, and hence it follows from the result of Exercise 20 that .1=2/A C .1=2/.A /0 is a generalized inverse of A. Moreover, Œ.1=2/A C .1=2/.A /0 0 D .1=2/.A /0 C .1=2/Œ.A /0 0
D .1=2/.A /0 C .1=2/A D .1=2/A C .1=2/.A /0:
Thus, .1=2/A C .1=2/.A /0 is a symmetric generalized inverse of A.
(b) Suppose that A is singular and that N > 1. And suppose further (for purposes of establishing a contradiction) that every generalized inverse of A is symmetric. Let G represent any generalized inverse of A. Since rank.A/ < N, G is not a left inverse of A (as is evident from Lemma 2.5.1). Thus, I GA ¤ 0, and hence, among the elements of I GA, is an element, say the i kth element, that differs from 0. Now, take j to be any of the first N positive integers 1; 2; : : : ; N other than i, and take T to be an N N matrix whose kj th element is nonzero and whose remaining N 2 1 elements equal 0. Then, the ij th element of .I GA/T equals the product of the i kth element of I GA and the kj th element of T and consequently is nonzero, while the j i th element of .I GA/T equals 0. This implies that the ij th element of G C .I GA/T differs from the j i th element (since, in keeping with our supposition, G is symmetric), so that G C .I GA/T is nonsymmetric. Moreover, G C .I GA/T is a generalized inverse of A, as is evident from the second part of Theorem 2.10.7. Accordingly, we have arrived at the sought-after contradiction (of the supposition that every generalized inverse of A is symmetric). And we conclude that A has a nonsymmetric generalized inverse. EXERCISE 22. Let A represent an M N matrix of rank N 1. And let x represent any nonnull vector in N.A/, that is, any N -dimensional nonnull column vector such that Ax D 0. Show that a matrix Z is a solution to the homogeneous linear system AZ D 0 (in an N P matrix Z) if and only if Z D xk0 for some P -dimensional row vector k0. Solution. Suppose that Z D xk0 for some P -dimensional row vector k0. Then, AZ D .Ax/k0 D 0k0 D 0: Conversely, suppose that AZ D 0. And denote the columns of Z by z1 , z2 ; : : : ; zP , respec tively. Then, clearly, Azj D 0 and hence zj 2 N.A/ (j D 1; 2; : : : ; P ). Now, upon observing (in light of Lemma 2.11.5) that dimŒN.A/ D N .N 1/ D 1, it follows from Theorem 2.4.11 that fxg is a basis for N.A/. Thus, zj D kj x for some scalar kj (j D 1; 2; : : : ; P ). And we conclude that Z D xk0 for k0 D .k1 ; k2 ; : : : ; kP /.
EXERCISE 23. Suppose that AX D B is a consistent linear system (in an N P matrix X).
10
Matrix Algebra: a Primer
(a) Show that if rank.A/ D N or rank.B/ D P, then, corresponding to any solution X to AX D B, there is a generalized inverse G of A such that X D GB. (b) Show that if rank.A/ < N and rank.B/ < P, then there exists a solution X to AX D B such that there is no generalized inverse G of A for which X D GB.
Solution. (a) Assume that rank.A/ D N. Then, according to Lemma 2.10.3, every generalized inverse of A is a left inverse. Thus, for every generalized inverse G of A, we have that X D IN X D GAX D GB: Alternatively, assume that rank.B/ D P. Then, according to Lemma 2.5.1, B has a left inverse, say L. Moreover, according to Theorem 2.11.7, X D A B C .I
A A/Y
for some matrix Y . Thus, X D A B C .I
A A/Y IP D A B C .I
A A/Y LB D ŒA C .I
A A/Y LB:
And A C .I A A/Y L is a generalized inverse of A, as is evident from the second part of Theorem 2.10.7. (b) Suppose that rank.A/ < N and rank.B/ < P. Then, there exist a nonnull vector, say r, in N.A/ and a nonnull vector, say s, in N.B/. For any generalized inverse G of A, we have that GBs D G.Bs/ D G 0 D 0:
(S.3)
Now, taking G0 to be any particular generalized inverse of A, define X D X0 C Z; where X0 D G0 B and Z D rs0. Then, AX D AX0 C AZ D B C .Ar/s0 D B C 0 s0 D B; so that X is a solution to AX D B. Moreover, X s D X0 s C Z s D G0 Bs C rs0 s D 0 C rs0 s D .s0 s/r; implying (since s0 s ¤ 0) that X s ¤ 0. And, based on result (S.3), we conclude that there is no generalized inverse G of A for which X D GA. EXERCISE 24. Show that a matrix A is symmetric and idempotent if and only if there exists a matrix X such that A D PX .
Solution. That A is symmetric and idempotent if there exists a matrix X such that A D PX is an immediate consequence of Parts (4) and (5) of Theorem 2.12.2. Conversely, suppose that A is symmetric and idempotent. Then, upon observing that A0 D A and A0 A D A2 D A, we find that PA D A.A0 A/ A0 D AA A D A: Thus, A D PX for X D A.
EXERCISE 25. Show that corresponding to any quadratic form x0 Ax (in an N -dimensional vector
11
Matrix Algebra: a Primer
x), there exists a unique lower triangular matrix B such that x0 Ax and x0 Bx are identically equal, and express the elements of B in terms of the elements of A. Solution. Let A D faij g represent the ij th element of A (i; j D 1; 2; : : : ; N ). For an N N matrix B D fbij g that is lower triangular (i.e., having bij D 0 for j > i D 1; 2; : : : ; N ), the conditions ai i D bi i and aij C aj i D bij C bj i (j ¤ i D 1; 2; : : : ; N ) of Lemma 2.13.1 are equivalent to the conditions ai i D bi i and aij C aj i D bij (j < i D 1; 2; : : : ; N ). Thus, it follows from the lemma that there exists a unique lower triangular matrix B such that x0 Ax and x0 Bx are identically equal, namely, the lower triangular matrix B D fbij g, where bi i D ai i and bij D aij Caj i (j < i D 1; 2; : : : ; N ). EXERCISE 26. Show, via an example, that the sum of two positive semidefinite matrices can be positive definite. 1 0 0 0 Solution. Consider the two N N matrices and . Clearly, both of these two 0 0 0 IN 1 matrices are positive semidefinite, however their sum is the N N matrix IN , which is positive definite. EXERCISE 27. Let A represent an N N symmetric nonnegative definite matrix (where N 2). Define A0 D A, and, for k D 1; 2; : : : ; N 1, take Qk to be an .N k C 1/ .N k C 1/ unit upper triangular matrix, Ak an .N k/ .N k/ matrix, and dk a scalar that satisfy the recursive relationship Q0k Ak 1 Qk D diag.dk ; Ak / (E.1) —Qk , Ak , and dk can be constructed by making use of Lemma 2.13.19 and by proceeding as in the proof of Theorem 2.13.20. (a) Indicate how Q1 ; Q2 ; : : : ; QN 1 ; A1 ; A2 ; : : : ; AN 1 , and d1 ; d2 ; : : : ; dN 1 could be used to form an N N unit upper triangular matrix Q and a diagonal matrix D such that Q0 AQ D D. 1 0 2 0 0 0 B0 4 2 4C C (which is a symmetric nonnegative definite matrix), determine (b) Taking A D B @0 2 1 2A 0 4 2 7 unit upper triangular matrices Q1 , Q2 , and Q3 , matrices A1 , A2 , and A3 , and scalars d1 , d2 , and d3 that satisfy the recursive relationship (E.1), and illustrate the procedure devised in response to Part (a) by using it to find a 4 4 unit upper triangular matrix Q and a diagonal matrix D such that Q0 AQ D D. Solution. (a) Take D to be the diagonal matrix D D diag.d1 ; d2 ; : : : ; dN 1 ; AN 1 /. And take Q to be the N N unit upper triangular matrix Q D P1 P2 PN 1 , where P1 D Q1 and where, for k D 2; 3; : : : ; N 1, Pk D diag.Ik 1 ; Qk /—that P1 P2 PN 1 is unit upper triangular is evident upon observing that a product of unit upper triangular matrices is itself unit upper triangular. Then, Q0 AQ D D: To see this, define, for k D 1; 2; : : : ; N 1, Bk D diag.Dk ; Ak /; where Dk D diag.d1 ; d2 ; : : : ; dk /. And observe that B1 D P10 AP1 and that, for k D 2; 3; : : : ; N 1, 1 0 Dk 1 0 0 Dk 1 0 dk 0 A D D Pk0 Bk 1 Pk : Bk D @ 0 0 A Q 0 Q k k 1 k 0 0 A k
12
Matrix Algebra: a Primer
Repeated substitution gives BN
1
D PN0
It remains only to observe that BN and
1
1
P20 P10 AP1 P2 PN
1
D Q0 AQ:
D D.
(b) Recursive relationship (E.1) can be satisfied for k D 1 simply by taking Q1 D I4 , d1 D 2, 0 1 4 2 4 1 2 A: A1 D @ 2 4 2 7
Relationship (E.1) can be satisfied for k D 2 by proceeding as in Case (1) of the proof of Theorem 0 1 2.13.20. Specifically, take 1 1=2 1 Q2 D @0 1 0A; 0 0 1 0 1 so that 4 0 0 Q02 A1 Q2 D @0 0 0A: 0 0 3 And take d2 D 4 and 0 0 A2 D : 0 3 Finally, relationship (E.1) can be satisfied for k D 3 by proceeding as in Case (2) of the proof of Theorem 2.13.20. Accordingly, take 1 0 Q3 D ; 0 1 in which case Q03 A3 Q3 D diag.0; 3/: And take d3 D 0 and A3 D .3/. Now, making use of the solution to Part (a), we find that Q0 AQ D D; with
and
0 1 B0 1 0 I2 0 Q D Q1 DB @0 0 Q2 0 Q3 0
1 0 0 0 1 1=2 1C C 0 1 0A 0 0 1
D D diag.d1 ; d2 ; d3 ; A3 / D diag.2; 4; 0; 3/:
EXERCISE 28. Let A D faij g represent an N N symmetric positive definite matrix, and let B D fbij g D A 1. Show that, for i D 1; 2; : : : ; N, bi i 1=ai i ; with equality holding if and only if aij D 0 for all j ¤ i .
Solution. Let U D .u1 ; U2 /, where u1 is the i th column of IN and U2 is the submatrix of IN obtained by striking out the i th column, and observe that U is orthogonal (so that U is nonsingular and U 1 D U 0 ). Define R D U 0AU and S D U 0 BU. And partition R and S as r11 r 0 s11 s0 RD and SD r R s S
13
Matrix Algebra: a Primer [where the dimensions of both R and S are .N 1/ .N 1/]. Then, 0
r D
r11 D u01 Au1 D ai i ;
u01 AU2
D .ai1 ; ai 2 ; : : : ; ai;i
and
(S.4)
1 ; ai;i C1 ; : : : ; ai;N 1 ; aiN /;
s11 D u01 Bu1 D bi i :
(S.5) (S.6)
It follows from Corollary 2.13.11 that R is positive definite, implying (in light of Corollary 2.13.13) that R is positive definite and hence (in light of Corollary 2.13.12) that R is invertible and R 1 is positive definite. Also, R
1
DU
1
A
1
.U
1 0
/ D U 0 BU D S:
Thus, in light of results (S.4) and (S.6), it follows from Theorem 2.13.32 and result (6.11) that ai i r 0 R 1 r > 0 and that bi i D .ai i r 0 R 1 r/ 1:
Moreover, r 0 R 1 r 0, with equality holding if and only if r D 0. Accordingly, we conclude that bi i 1=ai i , with equality holding if and only if r D 0 or equivalently [in light of result (S.5)] if and only if aij D 0 for j ¤ i . EXERCISE 29. Let
0
B B ADB @
a11 a21 a31 a41
a12 a22 a32 a42
a14 a24 a34 a44
a13 a23 a33 a43
1
C C C: A
(a) Write out all of the pairs that can be formed from the four “ boxed” elements of A. (b) Indicate which of the pairs from Part (a) are positive and which are negative. (c) Use formula (14.1) to compute the number of pairs from Part (a) that are negative, and check that the result of this computation is consistent with your answer to Part (b). Solution. (a) and (b) Pair a14 , a23 a14 , a31 a14 , a42 a23 , a31 a23 , a42 a31 , a42
“Sign”
C
(c) 4 .4; 3; 1; 2/ D 3 C 2 C 0 D 5 [or alternatively 4 .3; 4; 2; 1/ D 2 C 2 C 1 D 5]. EXERCISE 30. Obtain (in as simple form as possible) an expression for the determinant of each of the following two matrices: (1) an N N matrix A D faij g of the general form 0 1 0 ::: 0 0 a1N B 0 ::: 0 a2;N 1 a2N C B C B 0 a a3;N 1 a3N C 3;N 2 ADB C B :: :: :: C @ : : : A aN1 : : : aN;N 2 aN;N 1 aNN
14
Matrix Algebra: a Primer
(where aij D 0 for j D 1; 2; : : : ; N the general form 0
i ; i D 1; 2; : : : ; N 0 0 :: :
1 0 :: :
B B B BDB B @ 0 k0
0 1
1); (2) an N N matrix B D fbij g of
:::
1
0 0
::
: 0 k2 : : :
0 k1
1 kN
—a matrix of this general form is called a companion matrix.
1
C C C C C A
Solution. (1) Consider the expression for the determinant given by the sum (14.2) or (14.20). For a matrix A of the specified form, there is only one term of this sum that can be nonzero, namely, the term corresponding to j1 D N , j2 D N 1; : : : ; jN D 1. (To verify formally that this is the only term that can be nonzero, let j1 ; j2 ; : : : ; jN represent an arbitrary permutation of the first N positive integers, and suppose that a1j1 a2j2 aNjN ¤ 0 or, equivalently, that aiji ¤ 0 for i D 1; 2; : : : ; N . Then, it is clear that j1 D N and that if j1 D N , j2 D N 1; : : : ; ji 1 D N C1 .i 1/, then ji D N C1 i . We conclude, on the basis of mathematical induction, that j1 D N; j2 D N 1; : : : ; jN D 1.) Now, upon observing that N .N; N 1; : : : ; 1/ D .N
1/ C .N
2/ C C 1
and “recalling” that the sum of the first N 1 positive integers equals N.N jAj D . 1/N.N
1/=2
a1N a2;N
1
1/=2, we find that
aN1 :
(2) Consider the expression for the determinant given by the sum (14.2) or (14.20). For a matrix B of the specified form, there is only one term of this sum that can be nonzero, namely, the term corresponding to j1 D 2, j2 D 3, : : : ; jN 1 D N , jN D 1. Upon observing that N .2; 3; : : : ; N; 1/ D 1 C 1 C C 1 D N
1;
we conclude that jBj D . 1/N
1
b12 b23 bN
1;N bN1
D . 1/N
1 N 1
1
. k0 / D . 1/N k0 :
Alternatively, this expression could be derived by making use of result (14.14). Partitioning B B as B D , where c0 is the last row of B, it follows from result (14.14) and from formula (14.7) c0 (for the determinant of a triangular matrix) that ˇ 0ˇ ˇ ˇ ˇc ˇ ˇB ˇ jAj D ˇˇ 0 ˇˇ D . 1/.N 1/1 ˇˇ ˇˇ D . 1/N 1 . k0 /1N 1 D . 1/N k0 : B c Still another possibility is to apply formula (14.22). Taking T D IN 1 , W D . k0 /, and V D .k1 ; k2 ; : : : ; kN 1 /, we find that ˇ ˇ ˇ 0 Tˇ ˇ ˇ D . 1/.N 1/1 jT jjW j D . 1/N 1 1. k0 / D . 1/N k0 : jBj D ˇ W Vˇ
EXERCISE 31. Verify the part of result (14.16) that pertains to the postmultiplication of a matrix by a unit upper triangular matrix by showing that for any N N matrix A and any N N unit upper triangular matrix T , jAT j D jAj. Solution. Define Ti to be a matrix formed from IN by replacing the i th column of IN with the i th column of T (i D 1; 2; : : : ; N ). Then, T D TN TN
1
T1 :
(S.7)
15
Matrix Algebra: a Primer
[That T is expressible in this form can be established via a mathematical induction argument: let ti represent the i th column of T and ui the i th column of IN (i D 1; 2; : : : ; N ); and observe that TN D .u1 ; u2 ; : : : ; uN 1 ; tN / and that if TN TN 1 Ti C1 D .u1 ; u2 ; : : : ; ui ; ti C1 ; : : : ; tN 1 ; tN /, then TN TN 1 Ti D .TN TN 1 Ti C1 /Ti D .u1 ; u2 ; : : : ; ui 1 ; ti ; ti C1 ; : : : ; tN 1 ; tN /.] In light of result (S.7), AT is expressible as AT D ATN TN
1
T1 :
Now, define BN C1 D A, and Bi D ATN TN 1 Ti (i D N; N 1; : : : ; 2). Clearly, to show that jAT j D jAj, it suffices to show that, for i D N; N 1; : : : ; 1, the postmultiplication of Bi C1 by Ti does not alter the determinant of Bi C1 . Observe that the columns of Bi C1 Ti are the same as those of Bi C1 , except for the i th column of Bi C1 Ti , which consists of the i th column of Bi C1 plus scalar multiples of the first through .i 1/th columns of Bi C1 . Accordingly, it follows from Theorem 2.14.12 that jBi C1Ti j D jBi C1 j. We conclude that jAT j D jAj. EXERCISE 32. Show that for any N N matrix A and any N N nonsingular matrix C, jC
1
ACj D jAj:
Solution. Making use of results (14.25) and (14.28), we find that ACj D jC 1 jjAjjCj D .1=jCj/jAjjCj D jAj: a b EXERCISE 33. Let A D , where a, b, c, and d are scalars. c d jC
1
(a) Show that in the special case where A ispsymmetric (i.e., where c D b), A is nonnegative definite if and onlypif a 0, d 0, and jbj ad and is positive definite if and only if a > 0, d > 0, and jbj < ad . (b) Extend the result of Part (a) by showing that in the general case where A is not necessarily symmetric (i.e., where possibly c ¤ b), A is nonnegative definite if and only if a 0, dp 0, p and jb C cj=2 ad and is positive definite if and only if a > 0, d > 0, and jb C cj=2 < ad . [Hint. Take advantage of the result of Part (a).]
Solution. (a) Suppose that A is symmetric (i.e., that c D b). And, based on Corollary 2.13.14 and Lemma 2.14.21, observe that if A is nonnegative definite, then a 0, d 0, and jAj 0, and that if A is positive definite, then a > 0, d > 0, and jAj > 0. Moreover, jAj D ad
b2
[as is evident from result (14.4)], so that jAj > 0 , b 2 < ad: p Thus, if A is nonnegative definite, then a 0, d 0, and jbj ad , and if A is positive definite, p then a > 0, d > 0, and jbj < ad . Conversely, if a > 0, then A D U 0 DU; jAj 0
,
where UD
1 0
b 2 ad
b=a 1
and
and D D
a 0
d
0 : .b 2 =a/
p [in which case And based on Corollary 2.13.16, we conclude that if a > 0, d 0, and jbj adp d .b 2 =a/ 0], then A is nonnegative definite and that if a > 0, d > 0, and jbj < ad [in which
16
Matrix Algebra: a Primer
case d .b 2 =a/ > 0], then A is positive definite. It remains only to observe that if a D 0, d 0, p and jbj ad (in which case b D 0), then A D diag.0; d / and diag.0; d / is positive semidefinite.
(b) Suppose that A is not necessarily symmetric (i.e., that possibly c ¤ b). Then, according to Corollary 2.13.2, there is a unique symmetric matrix B such that the quadratic forms x0 Ax and x0 Bx (in x) are identically equal, namely, the matrix a .b C c/=2 B D .1=2/.A C A0 / D : .b C c/=2 d
Accordingly, A is nonnegative definite if and only if B is nonnegative definite and is positive definite if and only if B is positive definite. And upon applying the result of Part (a) to the p matrix B, we find that A is nonnegative definite if and only if a 0, dp 0, and jb C cj=2 ad and is positive definite if and only if a > 0, d > 0, and jb C cj=2 < ad . EXERCISE 34. Let A D faij g represent an N N symmetric matrix. And suppose that A is nonnegative definite (in which case its diagonal elements are nonnegative). By, for example, making use of the result of Part (a) of Exercise 33, show that, for j ¤ i D 1; 2; : : : ; N, p jaij j ai i ajj max.ai i ; ajj /; p with jaij j < ai i ajj if A is positive definite. ai i aij Solution. Observe that the 2 2 matrix is a principal submatrix of A and, accordingly, aj i ajj that it is symmetric and (in light of Corollary 2.13.13) nonnegative definite. And it is positive definite p if A is positive definite. Thus, it follows from Part (a) of Exercise 33 that jaij j ai i ajj , with p jaij j < ai i ajj if A is positive definite. Moreover, if ai i ajj , then p p ai i ajj ai i ai i D ai i D max.ai i ; ajj /I and similarly if ai i ajj , then p p ai i ajj ajj ajj D ajj D max.ai i ; ajj /:
EXERCISE 35. Let A D faij g represent an N N symmetric positive definite matrix. Show that det A
N Y
ai i ;
i D1
with equality holding if and only if A is diagonal. Q Solution. That det A D N i D1 ai i if A is diagonal is an immediate consequence of Corollary 2.14.2. Q Thus, it suffices to show that if A is not diagonal, then det A < N i D1 ai i . This is accomplished by mathematical induction. Consider a 2 2 symmetric matrix a a A D 11 12 a12 a22 that is not diagonal. (Every 1 1 matrix is diagonal.) Upon recalling formula (14.4), we find that jAj D a11 a22
2 a12 < a11 a22 :
Suppose now that, for every .N 1/ .N 1/ symmetric positive definite matrix that is not diagonal, the determinant of the matrix is (strictly) less than the product of its diagonal elements, and consider the determinant of an N N symmetric positive definite matrix A D faij g that is not diagonal (where N 3). Partition A as
17
Matrix Algebra: a Primer A a AD a0 aNN
[where A is of dimensions .N 1/ .N 1/]. And observe (in light of Corollary 2.13.13) that A is (symmetric) positive definite. Upon recalling (from Lemma 2.13.9) that a positive definite matrix is nonsingular, it follows from Theorem 2.14.22 that jAj D jA j.aNN
a0 A1 a/:
(S.8)
Moreover, it follows from Lemma 2.14.21 that jA j > 0 and from Theorem 2.13.32 that aNN a0 A1 a > 0. In the case where A is diagonal, we have (since A is not diagonal) that a ¤ 0, implying (since, in light of Corollary 2.13.12, A1 is positive definite) that a0 A1 a > 0 and hence that aNN > aNN a0 A1 a, so that [in light of result (S.8)] jAj < aNN jA j D aNN
N Y1 i D1
ai i D
N Y
ai i :
i D1
In the alternative case where A is not diagonal, we have that a0 A1 a 0, implying that aNN Q 1 aNN a0 A1 a, and we have (by supposition) that jA j < N i D1 ai i , so that jAj < .aNN
Thus, in either case, jAj
0 and E.z 2 / > 0. Clearly, jwzj wz jwzj, and it follows that E.jwzj/ E.wz/ E.jwzj/ and hence that jE.wz/j E.jwzj/. And to establish the inequality p p E.jwzj/ E.w 2 / E.z 2 /, it suffices to observe that this inequality is equivalent to the inequality p p jE.jwjjzj/j E.w 2 / E.z 2 /, obtained from inequality (2.13) by applying inequality (2.13) with jwj and jzj in place of w and z, respectively. Further, the inequality jE.wz/j E.jwzj/ holds as the equality E.wz/ D E.jwzj/ if and only if E.wz jwzj/ D 0 and hence if and only if wz 0 with probability 1. Similarly, the inequality jE.wz/j E.jwzj/ holds as the equality E.wz/ D E.jwzj/ if and only if E.wz C jwzj/ D 0 and E.jwzj/ p hence pif and only if wz 0 with probability p 1. And p equality holds in the inequalityıp 2 2 2 2 E.w / E.z / if and only if E.jwjjzj/ D E.w / E.z / and hence if and only if jzj E.z 2 / D ıp jwj E.w 2 / with probability 1 [as is evident from the conditions under which equality is attained in inequality (2.13) when that inequality is applied with jwj and jzj in place ofpw and z]. p Finally, 2 equality holds in both of the inequalities jE.wz/j E.jwzj/ and E.jwzj/ E.w / E.z 2 / if p p ıp ıp and only if jE.wz/j D E.w 2 / E.z 2 / and hence if and only if z E.z 2 / D w E.w 2 / with ıp ıp probability 1 or z E.z 2 / D w E.w 2 / with probability 1.
(b) Upon setting w D x E.x/ and z D y E.y/ in result (E.1), we obtain result (E.2). Moreover, upon setting w D x E.x/ and z D y E.y/ in the conditions under which equality is attained in the first and/or second of the two inequalities in result (E.1), we obtain the conditions under which equality is attained in the corresponding inequality or inequalities in result (E.2). Accordingly, when var.x/ D 0 or var.y/ D 0, both inequalities in result (E.2) hold as equalities. Now, suppose that var.x/ > 0 and var.y/ > 0. Then, the inequality jcov.x; y/j EŒjx E.x/jjy E.y/j holds as the equality cov.x; y/ D EŒjx E.x/jjy E.y/j if and only if Œx E.x/Œy E.y/ 0 with probability 1 and holds as the equality cov.x; y/ D EŒjx E.x/jjy E.y/j if and only if Œx E.x/Œy E.y/ 0 with probability 1. Further, the inequality EŒjx E.x/jjy E.y/j p p ıp ıp var.x/ var.y/ holds as an equality if and only if jy E.y/j var.y/ D jx E.x/j var.x/ with probability ıp in both of the inequalities in result (E.2) ıpif and only ıp1. And equality is attained if Œy E.y/ var.y/ D Œx E.x/ var.x/ with probability 1 or Œy E.y/ var.y/ D ıp Œx E.x/ var.x/ with probability 1. EXERCISE 3. Let x represent an N -dimensional random column vector and y a T -dimensional random column vector. And define x to be an R-dimensional subvector of x and y an S -dimensional subvector of y (where 1 R N and 1 S T ). Relate E.x / to E.x/, var.x / to var.x/, and cov.x ; y / and cov.y ; x / to cov.x; y/. Solution. The subvector x is obtained from x by striking out all except R elements (of x), say the
21
Random Vectors and Matrices
i1 ; i2 ; : : : ; iR th elements. Similarly, the subvector y is obtained from y by striking out all except S elements (of y), say the j1 ; j2 ; : : : ; jS th elements. Accordingly, E.x / is the R-dimensional subvector of E.x/ obtained by striking out all of the elements of E.x/ except the i1 ; i2 ; : : : ; iR th elements. And var.x / is the R R submatrix of var.x/ obtained by striking out all of the rows and columns of var.x/ except the i1 ; i2 ; : : : ; iR th rows and columns. Further, cov.x ; y / is the R S submatrix of cov.x; y/ obtained by striking out all of the rows and columns of cov.x; y/ except the i1 ; i2 ; : : : ; iR th rows and the j1 ; j2 ; : : : ; jS th columns; and cov.y ; x / is the S R submatrix of Œcov.x; y/0 obtained by striking out all of the rows and columns of Œcov.x; y/0 except the j1 ; j2 ; : : : ; jS th rows and i1 ; i2 ; : : : ; iR th columns. EXERCISE 4. Let x represent a random variable that is distributed symmetrically about 0 (so that x x); and suppose that the distribution of x is “nondegenerate” in the sense that there exists a nonnegative constant c such that 0 < Pr.x > c/ < 12 [and assume that E.x 2 / < 1]. Further, define y D jxj. (a) Show that cov.x; y/ D 0.
(b) Are x and y statistically independent? Why or why not? Solution. (a) Since x x, we have that cov.x; y/ D cov. x; j xj/. And upon observing that cov. x; j xj/ D cov. x; y/ D it follows that cov.x; y/ D
cov.x; y/;
cov.x; y/ and hence that cov.x; y/ D 0.
(b) The random variables x and y are statistically dependent. To formally verify that conclusion, let c represent a nonnegative constant such that 0 < Pr.x > c/ < 12 . And observe (in light of the symmetry about 0 of the distribution of x) that Pr.x < c/ D Pr. x < c/ D Pr.x > c/ and hence that Pr.y c/ D 1 Pr.x < c/ Pr.x > c/ D 1 2 Pr.x > c/ > 0: Thus, Pr.x > c; y c/ D 0 < Pr.x > c/ Pr.y c/; which confirms that x and y are statistically dependent. EXERCISE 5. Provide detailed verifications for (1) equality (2.39), (2) equality (2.45), and (3) equality (2.48). Solution. (1) By definition, N T X X cov c C ai xi ; k C bj yj i D1
j D1
nh
DE cC
N X
ai xi
i D1
N i X E cC ai xi i D1
h
kC In light of equality (1.5), we have that cC
N X i D1
ai xi
T X
bj yj
j D1
T io X : E kC bj yj
N N X X ai xi E cC ai xi D c C i D1
i D1
D
N X i D1
(S.2)
j D1
ai Œxi
h
cC
E.xi /
N X i D1
ai E.xi /
i (S.3)
22
Random Vectors and Matrices
and similarly that kC
T X
bj yj
j D1
T T X X bj Œyj E kC bj yj D
E.yj /:
(S.4)
j D1
j D1
And upon substituting expressions (S.3) and (S.4) into expression (S.2) and making further use of equality (1.5), we find that N N T nX X X ai Œxi cov c C ai xi ; k C bj yj D E i D1
E.xi /
i D1
j D1
DE D D
N X T nX
T X
bj Œyj
j D1
ai bj Œxi
E.xi /Œyj
o E.yj /
ai bj EfŒxi
E.xi /Œyj
E.yj /g
i D1 j D1
N X T X
o E.yj /
i D1 j D1
N X T X
ai bj cov.xi ; yj /:
i D1 j D1
(2) Let ci represent the i th element of c and a0i the i th row of A (i D 1; 2; : : : ; M ). Similarly, let kj represent the j th element of k, and bj0 the j th row of B (j D 1; 2; : : : ; S ). Then, the i th element of c C Ax is ci C a0i x, and the j th element of k C By is kj C bj0 y. Thus, the ij th element of the M S matrix cov.c C Ax; k C By/ is cov.ci C a0i x; kj C bj0 y/. And it follows from result (2.42) that cov.ci C a0i x; kj C bj0 y/ equals a0i cov.x; y/bj . Also, a0i cov.x; y/bj is the ij th element of the M S matrix Acov.x; y/B0, as is evident from result (2.2.12) upon observing that bj is the j th column of B0. We conclude that each element of cov.c C Ax; k C By/ equals the corresponding element of Acov.x; y/B0 and hence that cov.c C Ax; k C By/ D Acov.x; y/B0: (3) Let cp represent the pth element of c, and, for i D 1; 2; : : : ; N , let xpi represent the pth element of xi (p D 1; 2; : : : ; M ). Similarly, let kq represent the qth element of k, and, for j D 1; 2; : : : ; T , let yqj represent the qth element of yj (q D 1; 2; : : : ; S ). Then, the pth element of P PN PT PT cC N i D1 ai xi is cp C i D1 ai xpi , and the qth element of k C j D1 bj yj is kq C j D1 bj yqj . PN PT Thus, the pqth element of the M S matrix cov.c C i D1 ai xi ; k C j D1 bj yj / is cov.cp C PT PN PN i D1 ai xpi ; kq C i D1 ai xpi ; kq C j D1 bj yqj /. And it follows from result (2.39) that cov.cp C PT PN PT j D1 bj yqj / equals i D1 j D1 ai bj cov.xpi ; yqj /, which is the pqth element of the M S PN PT P matrix i D1 j D1 ai bj cov.xi ; yj /. We conclude that each element of cov.c C N i D1 ai xi ; k C PT PN PT b y / equals the corresponding element of a b cov.x ; y / and hence that i j j D1 j j i D1 j D1 i j N X T N T X X X cov c C ai bj cov.xi ; yj /: ai xi ; k C bj yj D i D1
j D1
i D1 j D1
EXERCISE 6. (a) Let x and y represent random variables. Show that cov.x; y/ can be determined from knowledge of var.x/, var.y/, and var.x C y/, and give a formula for doing so.
23
Random Vectors and Matrices
(b) Let x D .x1 ; x2 /0 and y D .y1 ; y2 /0 represent 2-dimensional random column vectors. Can cov.x; y/ be determined from knowledge of var.x/, var.y/, and var.x C y/? Why or why not? Solution. (a) As a special case of result (2.41), we have that var.x C y/ D var.x/ C var.y/ C 2 cov.x; y/: Thus, cov.x; y/ D Œvar.x C y/
var.x/
var.y/=2:
(b) Clearly, cov.x; y/ can be determined from knowledge of var.x/, var.y/, and var.x C y/ if and only if it can be determined from knowledge of var.x/, var.y/, and var.xCy/ var.x/ var.y/. And, as a special case of result (2.50), we have that var.x C y/ D var.x/ C var.y/ C cov.x; y/ C cov.y; x/ and hence that var.x C y/
Further,
cov.x; y/ C cov.y; x/ D
var.x/
var.y/ D cov.x; y/ C cov.y; x/:
2 cov.x1 ; y1 / cov.x1 ; y2 / C cov.x2 ; y1 / : cov.x1 ; y2 / C cov.x2 ; y1 / 2 cov.x2 ; y2 /
Now, observe that cov.x; y/ D Observe also that
cov.x1 ; y1 / cov.x1 ; y2 / : cov.x2 ; y1 / cov.x2 ; y2 /
var.x1 / cov.x1 ; x2 / var.y1 / cov.y1 ; y2 / var.x/ D and var.y/ D : cov.x2 ; x1 / var.x2 / cov.y2 ; y1 / var.y2 / Accordingly, the diagonal elements of cov.x; y/ and the average of the two off-diagonal elements of cov.x; y/ can be determined from knowledge of var.x/, var.y/, and var.xCy/—they are respectively equal to the diagonal elements and either off-diagonal element of .1=2/Œvar.xCy/ var.x/ var.y/. But the knowledge provided by var.x/, var.y/, and var.x C y/ is insufficient to “separate” one offdiagonal element of cov.x; y/ from the other, leading to the conclusion that cov.x; y/ cannot be completely determined from knowledge of var.x/, var.y/, and var.x C y/. EXERCISE 7. Let x represent an M -dimensional random column vector with mean vector and variance-covariance matrix †. Show that there exist M.M C 1/=2 linear combinations of the M elements of x such that can be determined from knowledge of the expected values of M of these linear combinations and † can be determined from knowledge of the [M.M C1/=2] variances of these linear combinations. Solution. Denote by xi the i th element of x and by i the i th element of , and let ij represent the ij th element of †. Further, for i D 1; 2; : : : ; M, define wi D xi and, for j > i D 1; 2; : : : ; M, define yij D xi C xj . Then, clearly, E.wi / D i and var.wi / D i i
.i D 1; 2; : : : ; M /;
and var.yij / D i i C jj C 2 ij
.j > i D 1; 2; : : : ; M /:
And it follows that i D E.wi / and i i D var.wi /
.i D 1; 2; : : : ; M /
and j i D ij D Œvar.yij /
var.wi /
var.wj /=2
.j > i D 1; 2; : : : ; M /:
24
Random Vectors and Matrices
Thus, can be determined from knowledge of the expected values of the M linear combinations w1 ; w2 ; : : : ; wM , and † can be determined from knowledge of the variances of w1 ; w2 ; : : : ; wM and the variances of the M.M 1/=2 additional linear combinations yij (j > i D 1; 2; : : : ; M )—note that M C ŒM.M 1/=2 D M.M C1/=2. EXERCISE 8. Let x and y represent random variables (whose expected values and variances exist), let V represent the variance-covariance matrix of the random vector .x; y/, and suppose that var.x/ > 0 and var.y/ > 0. (a) Show that if jV j D 0, then, for scalars a and b, .a; b/0 2 N.V / , b where c D
(
p
p var y D c a var x;
C1; when cov.x; y/ < 0, 1; when cov.x; y/ > 0.
(b) Use the result of Part (a) and the results of Section p 3.2epto devise an alternative proof ofpthe result (established in Section 3.2a) that cov.x; y/ D var x var y if and only if Œy E.y/= var y D p p p Œx E.x/= var x with probability 1 and that cov.x; y/ D var x var y if and only if p p Œy E.y/= var y D Œx E.x/= var x with probability 1. Solution. (a) Suppose that jV j D 0. Then, in light of result (2.54) [or result (2.14.4)], Œcov.x; y/2 D var.x/ var.y/; or, equivalently, so that
p p cov.x; y/ D ˙ var x var y; p var x p 0 c : var y 0 1
p var x p 0 1 V D var y 0 c
Thus, .a; b/0 2 N.V / if and only if p p var x p 0 1 c apvar x 0 D var y c 1 b var y 0 0 p p p p and hence since a var x c b var y D c .b var y c a var x/ if and only if p p b var y c a var x D 0 or, equivalently, if and only if (b) If Œy
p E.y/= var y D Œx
p p b var y D c a var x: p E.x/= var x with probability 1, then
cov.x; y/ D EfŒx
and, similarly, if Œy
E.x/Œy E.y/g p ıp D EfŒx E.x/2 var .y/ var .x/g p p D var .x/ var .y/I p p E.y/= var y D Œx E.x/= var x with probability 1, then cov.x; y/ D EfŒx
E.x/Œy
E.y/g p ıp D Ef Œx E.x/ var .y/ var .x/g p p var .x/ var .y/: D 2
25 p p Conversely, suppose that cov.x; y/ D var x var y. Then, jV j D 0 [as is evident from result (2.54) or result (2.14.4)]. Now, take a to be an arbitrary nonzero (nonrandom) scalar, and take p p b D a var x= var y. Then, it follows from the result of Part (a) that .a; b/0 2 N.V /, implying [in light of result (2.51)] that var.ax C by/ D 0. Moreover, y h p x i p var.ax C by/ D var a var x p var y var x y x D a2 var.x/ var p : p var y var x Thus, y x var p D 0; p var y var x Random Vectors and Matrices
and, consequently,
y p var y
y x p DE p var x var y
x p with probability 1 var x
or, equivalently, y E.y/ x E.x/ p D p with probability 1: var y var x p p p var x var y implies that Œy E.y/= var y D That cov.x; y/ D probability 1 can be established via a similar argument.
Œx
p E.x/= var x with
EXERCISE 9. Let x represent an N -dimensional random column vector (with elements whose expected values and variances exist). Show that (regardless of the rank of var x) there exist a nonrandom column vector c and an N N nonsingular nonrandom matrix A such that the random vector w, defined implicitly by x D c C A0 w, has mean 0 and a variance-covariance matrix of the form diag.I; 0/ [where diag.I; 0/ is to be regarded as including 0 and I as special cases]. Solution. Let V D var.x/ and K D rank.V /, and observe that x D c C A0 w
,
w D .A
1 0
/ .x
c/:
Take c D E.x/. Then, clearly, E.w/ D 0. Now, consider the condition var.w/ D diag.I; 0/. Observe that var.w/ D .A 1 /0 V A 1; and suppose that V is nonnull (in which case K 1)—in the degenerate special case where V D 0, take A to be IN (or any other N N nonsingular matrix). Then, proceeding as in Section 3.3b, take T to be any K N nonrandom matrix such that V D T 0 T , and observe that rank.T / D K—that such a matrix exists and is of rank K follows from Corollary 2.13.23. If K D N , take A D T, with the result that var.w/ D .T
1 0
/ VT
1
D .T T
1 0
/ TT
1
D I 0 I D I:
Alternatively, suppose that K < N , and consider the matrix .R; S/, where R is any right inverse of T and S is any N .N K/ matrix whose columns form a basis for N.V /—Lemma 2.11.5 implies that dimŒN.V / D N K. The matrix .R; S/ is nonsingular—to see this, take L to be a left inverse of S, and observe that T S D 0 (as is evident from Corollary 2.3.4 upon observing that T 0 T S D V S D 0), so that T TR TS IK 0 .R; S/ D D ; L LR LS LR IN K
26
Random Vectors and Matrices
implying (in light of Corollary 2.4.17 and Lemma 2.6.2) that I 0 rank .R; S/ rank K D N: LR IN K Further, since R0 V R D .T R/0 T R D I 0 I D I, we have that 0 R V R R0 V S I 0 .R; S/0 V .R; S/ D D : .V S/0 R S0 V S 0 0 Thus, to satisfy the condition var.w/ D diag.I; 0/, it suffices to take A D .R; S/ A 1 D .R; S/].
1
[in which case
EXERCISE 10. Establish the validity of result (5.11). Solution. For n D 1; 2; 3; : : : , p p 2n.2n 1/.2n 2/ 7 6 5 4 3 2 1 .2n/Š D 4n nŠ 2n.2n 2/ 6 4 2 2n 1 3 5 7 .2n 1/ p D : 2n Thus, to establish the validity of result (5.11), it suffices to establish the validity (for n D 0; 1; 2; : : :) p of the formula .2n/Š 1 : (S.5) nC 2 D 4n nŠ Let us proceed by mathematical induction. Formula (S.5) is valid for n D 0, as is evident from result (5.10). Now, let r represent an arbitrary one of the integers 1; 2; 3; : : : , and suppose that formula (S.5) is valid for n D r 1, in which case p Œ2.r 1/Š 1 1 : r 2 D r 1C 2 D r 1 4 .r 1/Š Then, making use of result (5.6), we find that r C 12 D r D r
1 2 1 2
C1
r
1 2
p .2r 1/.2r/Œ2.r 1/Š D 2 .2r/ 4r 1 .r 1/Š p .2r/Š D ; 4r rŠ
which establishes that formula (S.5) is valid for n D r and thereby completes the mathematicalinduction argument. EXERCISE 11. Let w D jzj, where z is a standard normal random variable. (a) Find a probability density function for the distribution of w. (b) Use the expression obtained in Part (a) (for the probability density function of the distribution of w) to derive formula (5.15) for E.w r /—in Section 3.5c, this formula is derived from the probability density function of the distribution of z. (c) Find E.w/ and var.w/. Solution. (a) Let G./ represent the (cumulative) distribution function of w, let f ./ represent the
27
Random Vectors and Matrices
probabiltiy density function of the standard-normal distribution, and denote by t an arbitrary scalar. For t > 0, we find that G.t/ D Pr.w t/ D Pr. t z t/ Z Z t f .s/ ds C D 0
0
f .s/ ds; t
R0 Rt and because t f .s/ ds D 0 f .s/ ds [as can be formally verified by making the change of variable u D s and recalling result (5.1)], it follows that Z tr Z t 2 2 f .s/ ds D G.t/ D 2 e s =2 ds: 0 0 Moreover, G.t/ D 0 for t < 0. And we conclude that the function g./, defined by 8r 2 ˆ 2 ˆ < e s =2 for s > 0, g.s/ D ˆ ˆ :0 for s 0,
is a probability density function for the distribution of w.
(b) Making use of result (5.9) [and defining g./ as in Part (a)], we find that Z 1 s r g.s/ ds E.w r / D 0 r Z 1 2 2 s r e s =2 ds D 0 r 2r r C1 D : 2 (c) Making use of result (5.16), we find that E.w/ D and var.w/ D E.w 2 /
r
2
ŒE.w/2 D 1
2 :
EXERCISE 12. Let x represent a random variable having mean and variance 2. Then, E.x 2 / D 2 C 2 [as is evident from result (2.3)]. Thus, the second moment of x depends on the distribution of x only through and 2. If the distribution of x is normal, then the third and higher moments of x also depend only on and 2. Taking the distribution of x to be normal, obtain explicit expressions for E.x 3 /, E.x 4 /, and, more generally, E.x r / (where r is an arbitrary positive integer). Solution. Take z to be a standard normal random variable. Then, E.x r / D EŒ. C z/r : And, based on the binomial theorem (e.g., Casella and Berger 2002, sec 3.2), we have that ! r X r r . C z/ D r k k z k k kD0
28
Random Vectors and Matrices
—interpret 00 as 1. Thus, letting r represent the largest even integer that does not exceed r (so that r D r if r is even, and r D r 1 if r is odd) and making use of results (5.17) and (5.18), we find that ! r X r E.x r / D r k k E.z k / k kD0 ! rX =2 r r 2s 2s E.z 2s / D 2s sD0 D r C
rX =2 sD1
r.r 1/.r 2/ .r 2s C 1/ r 2s.2s 2/.2s 4/ 6 4 2
2s
2s:
(S.6)
As special cases of result (S.6), we have that E.x 3 / D 3 C
32 3 2
2 2
D 3 C 3 2
and 43 4 2 2 4321 4 C 2 42 D 4 C 62 2 C 3 4:
E.x 4 / D 4 C
4 4
EXERCISE 13. Let x represent an N -dimensional random column vector whose distribution is N.; †/. Further, let R D rank.†/, and assume that † is nonnull (so that R 1). Show that there exist an R-dimensional nonrandom column vector c and an R N nonrandom matrix A such that c C Ax N.0; I/ (i.e., such that the distribution of c C Ax is R-variate standard normal).
Solution. Take to be an R N nonrandom matrix such that † D 0 , and observe that rank./ D R—that such a matrix exists and is of rank R follows from Corollary 2.13.23. Further, take ƒ to be a right inverse of —the existence of a right inverse follows from Lemma 2.5.1. Then, upon taking A D ƒ0 and c D ƒ0 , we find that A†A0 D ƒ0 0 .ƒ0 /0 D .ƒ/0 ƒ D I 0 I D I; c C A D ƒ0 C ƒ0 D 0:
and
And we conclude (on the basis of Theorem 3.5.1) that, for A D ƒ0 and c D ƒ0 , the distribution of c C Ax is N.0; I/. EXERCISE 14. Let x and y represent random variables, and suppose that x C y and x y are independently and normally distributed and have the same mean, say , and the same variance, say 2. Show that x and y are statistically independent, and determine the distribution of x and the distribution of y. Solution. Let u D x C y and w D x y. And observe (in light of Theorem 3.5.7) that the joint distribution of u and w is MVN. Observe also that
and that
x D 12 .u C w/
and
1 2 1 2
E.u/ C
E.x/ D var.x/ D
var.y/ D
E.y/ D
1 4 1 4
var.u/ C
var.u/ C
E.u/ 1 4 1 4
y D 21 .u 1 2 1 2
E.w/ D ;
E.w/ D 0;
var.w/ C var.w/
w/
1 2 1 2
cov.u; w/ D 21 2;
cov.u; w/ D 21 2;
29
Random Vectors and Matrices and
cov.x; y/ D
1 4
var.u/
1 4
cov.u; w/ C
1 4
cov.w; u/
1 4
var.w/ D 0:
Accordingly, it follows from Corollary 3.5.6 that x and y are statistically independent and from Theorem 3.5.1 that x N. ; 21 2 / and y N.0; 21 2 /. EXERCISE 15. Suppose that two or more random column vectors x1 ; x2 ; : : : ; xP are pairwise independent (i.e., xi and xj are statistically independent for j > i D 1; 2; : : : ; P ) and that the joint distribution of x1 ; x2 ; : : : ; xP is MVN. Is it necessarily the case that x1 ; x2 ; : : : ; xP are mutually independent? Why or why not? Solution. For j > i D 1; 2; : : : ; P , cov.xi ; xj / D 0, as is evident from Lemma 3.2.1 (upon observing that the independence of xi and xj implies the independence of each element of xi and each element xj ) and as is also evident from Corollary 3.5.5. Thus, it follows from Theorem 3.5.4 that x1 ; x2 , : : : ; xP are mutually independent. EXERCISE 16. Let x represent a random variable whose distribution is N.0; 1/, and define y D ux, where u is a discrete random variable that is distributed independently of x with Pr.u D 1/ D Pr.u D 1/ D 12 . (a) Show that y N.0; 1/.
(b) Show that cov.x; y/ D 0.
(c) Show that x and y are statistically dependent.
(d) Is the joint distribution of x and y bivariate normal? Why or why not? Solution. (a) We find that (for an arbitrary constant c) Pr.y c/ D Pr.y c; u D 1/ C Pr.y c; u D
D Pr.x c; u D 1/ C Pr. x c; u D
1/ 1/
D Pr.x c/ Pr.u D 1/ C Pr. x c/ Pr.u D 1/ D
1 2
Pr.x c/ C
1 2
Pr. x c/:
Moreover, upon observing that x N.0; 1/ and hence that x x, it is clear that Pr. x c/ D Pr.x c/. Thus, Pr.y c/ D Pr.x c/, implying that y x and hence that y N.0; 1/. (b) Making use of the statistical independence of u and x, we find that
cov.x; y/ D E.xy/ D E.ux 2 / D E.u/ E.x 2 / D 0.1/ D 0: (c) Clearly, jyj D jxj. This suggests that x and y are statistically dependent. To formally verify their statistical dependence, let c represent a (strictly) positive constant, and observe that Pr.jxj c/ > 0 and that Pr.jyj c/ D Pr.jxj c/ < 1. Then, Pr.jyj c; jxj c/ D Pr.jxj c/ > Pr.jxj c/ Pr.jyj c/; implying that jxj and jyj are statistically dependent. If x and y were statistically independent, then “any” function of x would be statistically independent of “any” function of y (e.g., Casella and Berger 2002, sec. 4.3; Bickel and Doksum 2001, app. A) and, in particular, jxj would be distributed independently of jyj. Thus, x and y are not statistically independent; they are statistically dependent. (d) The joint distribution of x and y is not bivariate normal. If the joint distribution of x and y were bivariate normal, then [in light of the result of Part (b)] it would follow from Corollary 3.5.5 that x and y would be statistically independent, in contradiction to what was established in Part (c). EXERCISE 17. Let x1 ; x2 ; : : : ; xK represent N -dimensional random column vectors, and suppose
30
Random Vectors and Matrices
that x1 ; x2 ; : : : ; xK are mutually independent and that (for i D 1; 2; : : : ; K) xi N.i ; †i /. Derive P (for arbitrary scalars a1 ; a2 ; : : : ; aK ) the distribution of the linear combination K i D1 ai xi .
Solution. Take x to be the (KN -dimensional) random column vector defined by x0 D 0 .x01 ; x02 ; : : : ; xK /, take to be the (KN -dimensional) column vector defined by 0 D 0 0 0 .1 ; 2 ; : : : ; K /, and define † D diag.†1 ; †2 ; : : : ; †K /. Then, according to Theorem 3.5.7, P x N.; †/. Moreover, K i D1 ai xi D Ax, where A D .a1 IN ; a2 IN ; : : : ; aK IN /. And upon applying Theorem 3.5.1, it follows that K X i D1
Clearly, A D
PK
i D1
ai xi N.A; A†A0 /:
ai i and A†A0 D K X i D1
PK
ai xi N
i D1
ai2 †i . Thus,
K X i D1
ai i ;
K X i D1
ai2 †i :
EXERCISE 18. Let x and y represent random variables whose joint distribution is bivariate normal. Further, let 1 D E.x/, 2 D E.y/, 12 D var x, 22 D var y, and D corr.x; y/ (where 1 0 and 2 0). Assuming that 1 > 0 , 2 > 0, and 1 < < 1, show that the conditional distribution of y given x is N Œ2 C 2 .x 1 /=1 ; 22 .1 2 /.
Solution. The assumption that 1 > 0, 2 > 0, and 1 < < 1 implies [in light, e.g., of result (5.33)] that the determinant of the variance-covariance matrix of the vector .x; y/ is nonzero (and, in fact, strictly positive) and hence (recalling Lemma 2.14.21) that this matrix is positive definite. Thus, it follows from Theorem 3.5.8 that the conditional distribution of y given x is normal with mean 2 C 2 1 .12 / 1 .x 1 / D 2 C 2 .x 1 /=1 and variance
22
2 1 .12 / 1 1 2 D 22 .1
2 /:
EXERCISE 19. Let x and y represent random variables whose joint distribution is bivariate normal. Further, let 12 D var x, 22 D var y, and 12 D cov.x; y/ (where 1 0 and 2 0). Describe (in as simple terms as possible) the marginal distributions of x and y and the conditional distributions of y given x and of x given y. Do so for each of the following two “degenerate” cases: (1) 12 D 0; and (2) 12 > 0, 22 > 0, and j12 j D 1 2 . Solution. Let 1 D E.x/ and 2 D E.y/.
(1) Suppose that 12 D 0. Then, according to result (2.12), 12 D 0, implying (in light of Corollary 3.5.5) that x and y are statistically independent. Further, x D 1 with probability 1 (both unconditionally and conditionally on y). And y N.2 ; 22 / or, in the special case where 22 D 0, y D 2 with probability 1 (both unconditionally and conditionally on x). (2) Suppose that 12 > 0, 22 > 0, and j12 j D 1 2 , and let D corr.x; y/. (And observe that D ˙1.) The marginal distribution of x is N.1 ; 12 /. Further, the conditional distribution of y given x is normal with mean 2 C 2 1 .12 /
1
.x
1 / D 2 C 2 .x
1 /=1
and variance 22
2 1 .12 / 1 1 2 D 22 .1
2 / D 0;
so that, conditional on x, y D 2 C 2 .x
1 /=1 with probability 1
31
Random Vectors and Matrices or, equivalently,
y
2 2
D
x
1 1
with probability 1:
Similarly, the marginal distribution of y is N.2 ; 22 /, and the conditional distribution of x given y is N Œ1 C 1 .y 2 /=2 ; 0, with the implication that, conditional on y, x D 1 C 1 .y or, equivalently,
x
1 1
D
y
2 /=2 with probability 1 2 2
with probability 1:
EXERCISE 20. Let x represent an N -dimensional random column vector, and take y to be the M -dimensional random column vector defined by y D c C Ax, where c is an M -dimensional nonrandom column vector and A an M N nonrandom matrix. (a) Express the moment generating function of the distribution of y in terms of the moment generating function of the distribution of x. (b) Use the result of Part (a) to show that if the distribution of x is N.; †/, then the moment generating function of the distribution of y is the same as that of the N.c C A; A†A0 / distribution, thereby (since distributions having the same moment generating function are identical) providing an alternative way of arriving at Theorem 3.5.1. Solution. Let m./ represent the moment generating function of the distribution of y. (a) For an arbitrary M -dimensional nonrandom column vector t, m.t/ D EŒexp.t 0 y/
D EŒexp.t 0 c C t 0 Ax/ D Efexp.t 0 c/ expŒ.A0 t/0 xg
D exp.t 0 c/ EfexpŒ.A0 t/0 xg D exp.t 0 c/ s.A0 t/;
where s./ is the moment generating function of the distribution of x. (b) Suppose that x N.; †/. Then, starting with the result of Part (a) and making use of formula (5.47), we find that (for an arbitrary t) m.t/ D exp.t 0 c/ expŒ.A0 t/0 C 21 .A0 t/0 †A0 t D expŒt 0 .c C A/ C 21 t 0 A†A0 t:
And upon substituting c C A for and A†A0 for † in formula (5.47), it is clear that m./ is the same as the moment generating function of the N.c C A; A†A0 / distribution.
4 The General Linear Model
EXERCISE 1. Verify formula (2.3). Solution. We have that ı.u/ D ˇ1 C where (for j D 2; 3; : : : ; P )
P X
ˇj ıj .u/;
j D2
ıj .u/ D .u
a/j 1:
And making use of basic results on differentiation (and employing a standard notation), we find that ( .j d k 1 ıj .u/ D k 1 du 0
2/ .j
1/.j
k C 1/.u
a/j
k
for j k, for j < k.
Thus, ı .k
1/
.u/ D D
P X
ˇj
j D2 P X
.j
d k 1 ıj .u/ duk 1 1/.j
j Dk
D .k
1/Šˇk C
2/ .j P X
.j
j DkC1
k C 1/.u 1/.j
a/j
2/ .j
k
ˇj
k C 1/.u
a/j
k
ˇj :
EXERCISE 2. Write out the elements of the vector ˇ, of the observed value of the vector y, and of the matrix X (in the model equation y D Xˇ C e) in an application of the G-M model to the cement data of Section 4.2 d. In doing so, regard the measurements of the heat that evolves during hardening as the data points, take C D 4, take u1 ; u2 , u3 , and u4 to be the respective amounts of tricalcium aluminate, tricalcium silicate, tetracalcium aluminoferrite, and ˇ-dicalcium silicate, and take ı.u/ to be of the form (2.11). Solution. The parameter vector ˇ is the .P D 5/-dimensional vector whose transpose is ˇ 0 D .ˇ1 ; ˇ2 ; ˇ3 ; ˇ4 ; ˇ5 /. And, assuming that the N D 13 data points are numbered 1 through 13 in the order in which they are listed in Table 4.2, the observed value of the vector y and the model matrix
34 X are
The General Linear Model 1 78:5 B 74:3 C C B B 104:3 C C B B 87:6 C C B B 95:9 C C B B 109:2 C C B C y DB B 102:7 C B 72:5 C C B B 93:1 C C B B 115:9 C C B B 83:8 C C B @ 113:3 A 109:4
0
0
and
1 B1 B B1 B B1 B B1 B B1 B XDB B1 B1 B B1 B B1 B B1 B @1 1
7 1 11 11 7 11 3 1 2 21 1 11 10
26 29 56 31 52 55 71 31 54 47 40 66 68
6 15 8 8 6 9 17 22 18 4 23 9 8
60 52 20 47 33 22 6 44 22 26 34 12 12
1
C C C C C C C C C C C: C C C C C C C C C A
EXERCISE 3. Write out the elements of the vector ˇ, of the observed value of the vector y, and of the matrix X (in the model equation y D Xˇ C e) in an application of the G-M model to the lettuce data of Section 4.2 e. In doing so, regard the yields of lettuce as the data points, take C D 3, take u1 ; u2 , and u3 to be the transformed amounts of Cu, Mo, and Fe, respectively, and take ı.u/ to be of the form (2.14). Solution. The parameter vector ˇ is the .P D 10/-dimensional vector whose transpose is ˇ 0 D .ˇ1 ; ˇ2 ; ˇ3 ; ˇ4 ; ˇ11 ; ˇ12 ; ˇ13 ; ˇ22 ; ˇ23 ; ˇ33 /: And, assuming that the N D 20 data points are numbered 1 through 20 in the order in which they are listed in Table 4.3, the observed value of the vector y and the model matrix X are 0 1 0 1 1 1 1 0:5 1 1 0:5 1 0:5 0:25 21:42 B1 1 1 0:5 1 1 0:5 1 0:5 0:25 C B 15:92 C B C B C B 1 1 0:5 1 1 0:5 1 0:5 0:25 C B 22:81 C B1 C B C B1 1 1 1 1 1 1 1 1 1 C B 14:90 C B C B C B1 1 1 0:5 1 1 0:5 1 0:5 0:25 C B 14:95 C B C B C B1 1 1 1 1 1 1 1 1 1 C B 7:83 C B C B C B1 1 1 1 1 1 1 1 1 1 C B 19:90 C B C B C B1 1 1 1 p1 1 1 1 1 1 C B 4:68 C B C p B C B1 4 8 B 0:20 C 0 0 p8 0 0 0 0 0 C B C p B C B1 4 8 B 17:65 C 0 0 8 0 0 p0 0 0 C B C B C p y DB C: 4 C and X D B 0 p 8 0 0 0 0 p8 0 0 C B1 B 18:16 C B C B 25:39 C B1 0 0 0 8 0 p0 C 0 4 8 p0 B C B C 4 B 11:99 C B1 0 0 p 8 0 0 0 0 0 p8 C B C B C B 7:37 C B1 0 0 0 0 0 8 C 0 0 48 B C B C B 22:22 C B1 0 0 0 0 0 0 0 0 0 C B C B C B 19:49 C B1 0 0 0 0 0 0 0 0 0 C B C B C B 22:76 C B1 0 0 0 0 0 0 0 0 0 C B C B C B 24:27 C B1 0 0 0 0 0 0 0 0 0 C B C B C @ 27:88 A @1 0 0 0 0 0 0 0 0 0 A 27:53 1 0 0 0 0 0 0 0 0 0 EXERCISE 4. Let y represent a random variable and u a C -dimensional random column vector such that the joint distribution of y and u is MVN (with a nonsingular variance-covariance matrix).
35
The General Linear Model And take z D fzj g to be a transformation (of u) of the form z D R0 Œu
E.u/;
where R is a nonsingular (nonrandom) matrix such that var.z/ D I—the existence of such a matrix follows from the results of Section 3.3b. Show that C
X E.y j u/ E.y/ p D corr.y; zj / zj : var y j D1 Solution. According to result (3.2) (or Theorem 3.5.8), E.y j u/ D E.y/ C cov.y; u/.var u/ Now, let T D R
1
, in which case .R0 /
D .R
Œu
E.u/:
1 0
/ D T 0 . Then,
u D E.u/ C T 0 z;
implying that and hence (since T
1
1
1
var u D T 0 .var z/T D T 0 T D R) that .var u/
1
DT
1
.T 0 /
1
1
DT
.T
1 0
/ D RR0 :
Thus, in light of formula (3.2.45), we have that cov.y; u/.var u/
1
Œu
E.u/ D cov.y; u/RR0 Œu
E.u/ D cov.y; z/ z:
And since var.zj / D 1 (for j D 1; 2; : : : ; C ), we conclude that E.y j u/ E.y/ 1 p Dp cov.y; u/.var u/ var y var y 1 Dp cov.y; z/ z var y D
C X
1
Œu
E.u/
corr.y; zj / zj :
j D1
EXERCISE 5. Let y represent a random variable and u D .u1 ; u2 ; : : : ; uC /0 a C -dimensional random column vector, assume that var.u/ is nonsingular, and suppose that E.y j u/ is expressible in the form E.y j u/ D ˇ1 C ˇ2 .u1
a1 / C ˇ3 .u2
a2 / C C ˇC C1 .uC
aC /;
where a1 ; a2 , : : : ; aC and ˇ1 ; ˇ2 ; ˇ3 , : : : ; ˇC C1 are nonrandom scalars. (a) Using the results of Section 3.4 (or otherwise), show that
and that
.ˇ2 ; ˇ3 ; : : : ; ˇC C1 / D cov.y; u/.var u/ 1; ˇ1 D E.y/ C
C C1 X
ˇj Œaj
1
E.uj
1 /;
j D2
in agreement with the results obtained in Section 4.3 (under the assumption that the joint distribution of y and u is MVN).
36
The General Linear Model
(b) Show that
EŒvar.y j u/ D var.y/
cov.y; u/.var u/
1
cov.u; y/:
Solution. (a) Using result (3.4.1), we find that E.y/ D EŒE.y j u/ D EŒˇ1 C ˇ2 .u1 a1 / C ˇ3 .u2 a2 / C C ˇC C1 .uC C C1 X D ˇ1 C ˇj ŒE.uj 1 / aj 1
aC /
j D2
and hence that
ˇ1 D E.y/
C C1 X
ˇj ŒE.uj
1/
aj
1
j D2
D E.y/ C
C C1 X
ˇj Œaj
1
E.uj
1 /:
j D2
Further, appylying result (3.4.3), observing that cov.y; u j u/ D 0 and E.u j u/ D u (with probability 1), reexpressing E.y j u/ in the form E.y j u/ D ˇ1
C C1 X
ˇj aj
1
j D2
C .ˇ2 ; ˇ3 ; : : : ; ˇC C1 / u;
and recalling formula (3.2.46), we determine that cov.y; u/ D EŒcov.y; u j u/ C covŒE.y j u/; E.u j u/ h D E.0/ C cov ˇ1
C C1 X
ˇj aj
j D2
1
C .ˇ2 ; ˇ3 ; : : : ; ˇC C1 /u; u
i
D .ˇ2 ; ˇ3 ; : : : ; ˇC C1 /.var u/: And it follows that .ˇ2 ; ˇ3 ; : : : ; ˇC C1 / D cov.y; u/.var u/ 1: (b) Using result (3.4.2) and formula (3.2.43) and applying the result of Part (a), we find that var.y/ D EŒvar.y j u/ C varŒE.y j u/ h D EŒvar.y j u/ C var ˇ1
C C1 X
ˇj aj
j D2
1 C .ˇ2 ; ˇ3 ; : : : ; ˇC C1 /u
i
D EŒvar.y j u/ C .ˇ2 ; ˇ3 ; : : : ; ˇC C1 /.var u/.ˇ2 ; ˇ3 ; : : : ; ˇC C1 /0
D EŒvar.y j u/ C cov.y; u/.var u/
implying that
D EŒvar.y j u/ C cov.y; u/.var u/ EŒvar.y j u/ D var.y/
1
.var u/.var u/
1
cov.u; y/;
cov.y; u/.var u/
1
1
Œcov.y; u/0
cov.u; y/:
EXERCISE 6. Suppose that (in conformance with the development in Section 4.4b) the residual effects in the general linear model have been partitioned into K mutually exclusive and exhaustive subsets or classes numbered 1; 2; : : : ; K. And for k D 1; 2; : : : ; K, write ek1 ; ek2 ; : : : ; ekNk for the residual effects in the kth class. Take ak (k D 1; 2; : : : ; K) and rks (k D 1; 2; : : : ; K; s D 1; 2; : : : ; Nk ) to be uncorrelated random variables, each with mean 0, such that var.ak / D k2 for / D k2 (s D 1; 2; : : : ; Nk ) for some strictly positive scalar some nonnegative scalar k and var.rks k . Consider the effect of taking the residual effects to be of the form
37
The General Linear Model eks D ak C rks ;
(E.1)
rather than of the form (4.26). Are there values of k2 and k2 for which the value of var.e/ is the same when the residual effects are taken to be of the form (E.1) as when they are taken to be of the form (4.26)? If so, what are those values; if not, why not? Solution. When the residual effects are taken to be of the form (E.1), we find that (for k D 1; 2; : : : ; K and s D 1; 2; : : : ; Nk ) var.eks / D k2 C k2; that (for k D 1; 2; : : : ; K and t ¤ s D 1; 2; : : : ; Nk )
cov.eks ; ekt / D k2;
and that (for k 0 ¤ k D 1; 2; : : : ; K, s D 1; 2; : : : ; Nk , and t D 1; 2; : : : ; Nk 0 ) cov.eks ; ek 0 t / D 0: For the value of var.e/ to be the same when the residual effects are taken to be of the form (E.1) as when they are taken to be of the form (4.26), it would [in light of results (4.30) and (4.31)] be necessary and sufficient that k2 C k2 D k2 C k2 C !k2 and 1 k2 D k2 !2 Nk 1 k or, equivalently, that
and
Nk
k2 D k2 C
Nk
k2 D k2
1 !2: Nk 1 k
1
!k2 (S.1)
1 ! 2, but not otherwise—if k2 < N 1 1 !k2, then Nk 1 k k condition (S.1) could only be satisfied by taking k2 (which is inherently nonnegative) to be negative.
These conditions can be satisfied if k2
EXERCISE 7. Develop a correlation structure for the residual effects in the general linear model that, in the application of the model to the shear-strength data (of Section 4.4 d), would allow for the possibility that steel aliquots chosen at random from those on hand on different dates may tend to be more alike when the intervening time is short than when it is long. Do so by making use of the results (in Section 4.4 e) on stationary first-order autoregressive processes. Solution. For k D 1; 2; : : : ; K and s D 1; 2; : : : ; Nk , let eks represent the sth of the residual effects associated with the kth class, and suppose that eks D ak C rks : Here, ak (k D 1; 2; : : : ; K) and rks (k D 1; 2; : : : ; K; s D 1; 2; : : : ; Nk ) are taken to be random variables, each with mean 0. Further, the ak ’s are taken to have a common variance 2 and the rks ’s a common variance 2, and the rks ’s are assumed to be uncorrelated with the ak ’s and with each other. However, instead of taking the ak ’s to be uncorrelated (with each other), they are assumed to be such that (for k 0 ¤ k D 1; 2; : : : ; K) corr.ak ; ak 0 / D jk
0
kj
for some nonrandom scalar in the interval 0 < 1 (as would be the case if a1 ; a2 ; : : : ; aK were members of a stationary first-order autoregressive process).
38
The General Linear Model The setting is such that (for k D 1; 2; : : : ; K and s D 1; 2; : : : ; Nk ) var.eks / D 2;
(S.2)
where 2 D 2 C 2. Further, for k D 1; 2; : : : ; K and t ¤ s D 1; 2; : : : ; Nk , cov.eks ; ekt / D var.ak / D 2 D 2;
(S.3)
where D 2 =. 2 C 2 /. And for k 0 ¤ k D 1; 2; : : : ; K, s D 1; 2; : : : ; Nk , and t D 1; 2; : : : ; Nk 0 , cov.eks ; ek 0 t / D cov.ak ; ak 0 / D 2 jk
0
kj
D 2 jk
0
kj
:
(S.4)
0 0 The vector e is expressible as e D .e01 ; e02 ; : : : ; eK / , where (for k D 1; 2; : : : ; K) ek D 0 .ek1 ; ek2 ; : : : ; ekNk / . Accordingly, 0 1 R11 R12 : : : R1K B R21 R22 : : : R2K C B C var.e/ D 2 B : (S.5) :: :: C; :: @ :: : : : A RK1 RK2 : : : RKK
where (for k; k 0 D 1; 2; : : : ; K) Rkk 0 is the Nk Nk 0 matrix whose stth element is corr.eks ; ek 0 t /. And in light of results (S.2), (S.3), and (S.4), we have (for k D 1; 2; : : : ; K) that Rkk D .1 and (for k 0 ¤ k D 1; 2; : : : ; K) that
/INk C 1Nk10Nk
Rkk 0 D jk
0
kj
1Nk10Nk0 :
In the case of the shear-strength data, K D 12 and 8 ˆ < 9 if k D 2; 3; 5; 6; 7; 8; 10, or 12, Nk D 11 if k D 9 or 11, ˆ : 12 if k D 1 or 4.
Taking var.e/ to be of the form (S.5) may (or may not) be preferable to taking var.e/ to be of the more-restrictive (block-diagonal) form (4.34). EXERCISE 8. Suppose (as in Section 4.4g) that the residual effects e1 ; e2 ; : : : ; eN in the general linear model correspond to locations in D-dimensional space, that these locations are represented by D-dimensional column vectors s1 ; s2 ; : : : ; sN of coordinates, and that S is a finite or infinite set of D-dimensional column vectors that includes s1 ; s2 ; : : : ; sN . Suppose further that e1 ; e2 ; : : : ; eN are expressible in the form (4.54) and that conditions (4.55) and (4.57) are applicable. And take ./ to be the function defined on the set H D fh 2 RD W h D s t for s; t 2 S g by .h/ D 2 C 2 Œ1 (a) Show that, for j ¤ i D 1; 2; : : : ; N, 1 2
var.ei
ej / D
—this result serves to establish the function (
.h/ D
K.h/:
.si sj /
./, defined by
.h/ if h ¤ 0, 0 if h D 0,
as what in spatial statistics is known as a semivariogram (e.g., Cressie 1993).
39
The General Linear Model
(b) Show that (1) .0/ D 2 ; that (2) . h/ D .h/ for h 2 H ; and that (3) PM PM x x .t t / 0 for every positive integer M, for all not-necessarily-distinct i j i D1 j D1 i j P vectors t1 ; t2 ; : : : ; tM in S , and for all scalars x1 ; x2 ; : : : ; xM such that M i D1 xi D 0.
Solution. (a) Making use of results (4.56) and (4.58), we find that for j ¤ i D 1; 2; : : : ; N, 1 2
var.ei
ej / D 21 Œvar.ei / D
D
var.ej /
1 2 Œ C 2 C 2 2 1 2 2 2 f2 C 2 Œ1 2 2
D C Œ1 D .si sj /
2 cov.ei ; ej /
C
2
K.si
K.si
2 2 K.si sj / sj /g
sj /
(b) In light of the three requisite properties of the function K./, we have that (1) .0/ D 2 C 2 Œ1
K.0/ D 2 C 2 .1
1/ D 2 I
that (2) for h 2 H , . h/ D 2 C 2 Œ1
K. h/ D 2 C 2 Œ1
K.h/ D
.h/I
and that (3) for every positive integer M, for any M not-necessarily-distinct vectors t1 ; t2 ; : : : ; tM P in S , and for any M scalars x1 ; x2 ; : : : ; xM such that M i D1 xi D 0, M X M X
i D1 j D1
xi xj .ti tj / D
M X M X
i D1 j D1
xi xj f 2 C 2 Œ1
D . 2 C 2 / D0 0:
2
M X i D1
M X M X
xi
M X
xj
K.ti tj /g
2
j D1
M X M X
xi xj K.ti tj /
i D1 j D1
xi xj K.ti tj /
i D1 j D1
EXERCISE 9. Suppose that the general linear model is applied to the example of Section 4.5b (in the way described in Section 4.5b). What is the form of the function ı.u/? Solution. Take C D 3, and define u1 , u2 , and u3 as in Section 4.5b. Then, ı.u/ is such that for u3 D s (D 1; 2; 3; 4), ı.u/ D ˇs1 C ˇs2 u1 C ˇs3 u2 C ˇs4 u21 C ˇs5 u22 C ˇs6 u1 u2 ;
where ˇs1 ; ˇs2 ; : : : ; ˇs6 are unknown parameters.
5 Estimation and Prediction: Classical Approach
EXERCISE 1. Take the context to be that of estimating parametric functions of the form 0ˇ from an N 1 observable random vector y that follows a G-M, Aitken, or general linear model. Verify (1) that linear combinations of estimable functions are estimable and (2) that linear combinations of nonestimable functions are not necessarily nonestimable. P Solution. (1) Let jKD1 cj .j0 ˇ/ represent an arbitrary linear combination of an arbitrary number K 0 of estimable linear functions 10 ˇ; 20 ˇ; : : : ; K ˇ. Then, for j D 1; 2; : : : ; K, there exists an N 1 P 0 0 vector of constants aj such that E.aj y/ D j ˇ. And letting a D jKD1 cj aj , we find that 0
E.a y/ D E Thus, upon observing that estimable function.
PK
K hX
cj .aj0 y/
j D1
0 j D1 cj .j ˇ/
D
i
D
K X
j D1
cj E.aj0 y/
D
K X
cj .j0 ˇ/:
j D1
0 PK 0 j D1 cj .j ˇ/ is an j D1 cj j ˇ, we conclude that
PK
(2) Take 10 ˇ to be any estimable function and 20 ˇ to be any nonestimable function. And define 3 D 1 C 2 . Then, 30 ˇ is a nonestimable function—as is evident upon observing that 20 ˇ D 30 ˇ 10 ˇ and hence that if 30 ˇ were estimable, then [in light of the result of Part (1)] 20 ˇ would be estimable. Moreover, 30 ˇ 20 ˇ D 10 ˇ. Thus, 30 ˇ 20 ˇ, which is a linear combination of 2 nonestimable functions, is estimable. EXERCISE 2. Take the context to be that of estimating parametric functions of the form 0ˇ from an N 1 observable random vector y that follows a G-M, Aitken, or general linear model. And let R D rank.X/. (a) Verify (1) that there exists a set of R linearly independent estimable functions; (2) that no set of estimable functions contains more than R linearly independent estimable functions; and (3) that if the model is not of full rank (i.e., if R < P ), then at least one and, in fact, at least P R of the individual parameters ˇ1 ; ˇ2 ; : : : ; ˇP are nonestimable. (b) Show that the j th of the individual parameters ˇ1 ; ˇ2 ; : : : ; ˇP is estimable if and only if the j th element of every vector in N.X/ equals 0 (j D 1; 2; : : : ; P ). Solution. (a) (1) Take 1 ; 2 ; : : : ; R to be any R P -dimensional column vectors whose transposes 0 are linearly independent rows of the model matrix X—that X contains R linearly 10 ; 20 ; : : : ; R independent rows follows from Theorem 2.4.19. Then, obviously, i0 2 R.X/ (i D 1; 2; : : : ; R). And 0 ˇ are R linearly independent in light of the results of Section 5.3, it follows that 10 ˇ; 20 ˇ; : : : ; R estimable functions. 0 (2) Let 10 ˇ; 20 ˇ; : : : ; K ˇ represent any K linearly independent estimable functions (where 0 are K is an arbitrary positive integer). Then, in light of the results of Section 5.3, 10 ; 20 ; : : : ; K linearly independent vectors in R.X/. And upon observing that dimŒR.X/ D R, it follows from Therorem 2.4.9 that K R.
(3) Suppose that R < P . Then, upon observing that the individual parameters ˇ1 ; ˇ2 ; : : : ; ˇP can be regarded as P linearly independent combinations of the elements of ˇ, it follows from the
42
Estimation and Prediction: Classical Approach
result of Part (2) that no more than R of the individual parameters are estimable (in which case, at least P R of them are nonestimable). (b) Clearly, ˇj D j0 ˇ, where j is the j th column of IP . Accordingly, it follows from the results of Section 5.3b that ˇj is estimable if and only if k0 j D 0
(S.1)
for every P 1 vector k in N.X/. Now, let kj represent the j th element of k. Then, upon observing that k0 j D kj ;
condition (S.1) is reexpressible as
kj D 0:
Thus, ˇj is estimable if and only if the j th element of every vector in N.X/ equals 0. EXERCISE 3. Show that for a parametric function of the form 0ˇ to be estimable from an N 1 observable random vector y that follows a G-M, Aitken, or general linear model, it is necessary and sufficient that rank.X0; / D rank.X/: Solution. It follows from the results of Section 5.3 that for 0ˇ to be estimable, it is necessary and sufficient that the linear system X0 a D (in a) be consistent. And upon observing (in light of Theorem 2.9.2) that the linear system X0 a D is consistent if and only if rank.X0; / D rank.X0 / and observing [in light of equality (2.4.1)] that rank.X0 / D rank.X/, it is clear that for 0ˇ to be estimable, it is necessary and sufficient that rank.X0; / D rank.X/. EXERCISE 4. Suppose that y is an N 1 observable random vector that follows a G-M, Aitken, Q where or general linear model. Further, take y to be any value of y, and consider the quantity 0 b, 0 is an arbitrary P 1 vector of constants and bQ is any solution to the linear system X Xb D X0 y Q then 0ˇ is an (in the P 1 vector b). Show that if 0 bQ is invariant to the choice of the solution b, estimable function. And discuss the implications of this result. Solution. Theorem 2.11.7 implies that a P 1 vector is a solution to X0 Xb D X0 y if and only if it is expressible in the form .X0 X/ X0 y C ŒI .X0 X/ X0 Xz for some P 1 vector z. Thus, 0 bQ is invariant to the choice of the solution bQ if and only if 0 .X0 X/ X0 y C 0 ŒI
.X0 X/ X0 Xz
is invariant to the choice of z, or equivalently (since one choice for z is z D 0) if and only if 0 ŒI
.X0 X/ X0 Xz D 0
for every P 1 vector z. Moreover, in light of Lemma 2.2.2, 0 ŒI
.X0 X/ X0 Xz D 0 for every z ) )
) )
0 ŒI .X0 X/ X0 X D 0 0 D 0 .X0 X/ X0 X
0 D a0 X for some N 1 vector a 0ˇ is an estimable function:
The primary implication of this result is that it is not feasible to extend the definition of a least squares estimator of an estimable parametric function of the form 0ˇ to a nonestimable parametric
Estimation and Prediction: Classical Approach
43
function of the form 0ˇ. Such an extension would result in a function of y that is not uniquely defined. The least squares estimator of an estimable parametric function 0ˇ is (by definition) the function of y whose value at y D y is 0 bQ (where bQ is an any solution to the linear system X0 Xb D X0 y). If 0ˇ Q If 0ˇ were nonestimable, is an estimable function, 0 bQ is invariant to the choice of the solution b. 0Q Q b would not be invariant to the choice of b. EXERCISE 5. Suppose that y is an N 1 observable random vector that follows a G-M, Aitken, or general linear model. And let a represent an arbitrary N 1 vector of constants. Show that a0 y is the least squares estimator of its expected value E.a0 y/ (i.e., of the parametric function a0 Xˇ) if and only if a 2 C.X/. Solution. Let `.y/ represent the least squares estimator of a0 Xˇ. And observe [in light of result (4.24)] that `.y/ D a0 X.X0 X/ X0 y
and [in light of result (4.35)] that
`.y/ D rQ 0 X0 y;
where rQ is any solution to the conjugate normal equations X0 Xr D X0 a. Either of these two representations can be used to establish that a0 y is the least squares estimator of its expected value if and only if a 2 C.X/. Now, suppose that a 2 C.X/. Then, a D Xk for some P 1 vector k. And in light of Part (2) of Theorem 2.12.2, we have that `.y/ D a0 X.X0 X/ X0 y D k0 X0 X.X0 X/ X0 y D k0 X0 y D a0 y: Or, alternatively, upon observing that X0 a D X0 Xk and hence that k is a solution to the conjugate normal equations, we have that `.y/ D k0 X0 y D a0 y: Conversely, suppose that `.y/ D a0 y (for every value of y). Then, a0 X.X0 X/ X0 D a0 and hence a D XŒ.X0 X/ 0 X0 a 2 C.X/. Or, alternatively, rQ 0 X0 D a0 and hence a D XQr 2 C.X/.
EXERCISE 6. Let U represent a subspace of the linear space RM of all M -dimensional column vectors. Verify that the set U? (comprising all M -dimensional column vectors that are orthogonal to U) is a linear space. Solution. The M 1 null vector 0 is a member of U?, and consequently U? is nonempty. Moreover, for any scalar k and for any M -dimensional column vector x in U?, we have that kx 2 U?, as is evident upon observing that for every M -dimensional column vector w in U, .kx/ w D k.x w/ D 0: Similarly, for any M -dimensional column vectors x and y in U?, we have that xCy 2 U?, as is evident upon observing that for every M -dimensional column vector w in U, .x C y/ w D .x w/ C .y w/ D 0 C 0 D 0: And it follows that U? is a linear space. EXERCISE 7. Let X represent an N P matrix. A P N matrix G is said to be a least squares generalized inverse of X if it is a generalized inverse of X (i.e., if XGX D X) and if, in addition, .XG/0 D XG (i.e., XG is symmetric). (a) Show that G is a least squares generalized inverse of X if and only if X0 XG D X0.
(b) Using Part (a) (or otherwise), establish the existence of a least squares generalized inverse of X.
44
Estimation and Prediction: Classical Approach
(c) Show that if G is a least squares generalized inverse of X, then, for any N Q matrix Y , the matrix GY is a solution to the linear system X0 XB D X0 Y (in the P Q matrix B). Solution. (a) Suppose that X0 XG D X0. Then, X0 XGX D X0 X, implying (in light of Corollary 2.3.4) that XGX D X. Moreover, XG D .X0 /0 G D .X0 XG/0 G D .XG/0 XG;
so that XG is symmetric. Thus, G is a least squares generalized inverse of X. Conversely, if G is a least squares generalized inverse of X, then X0 XG D X0 .XG/0 D .XGX/0 D X0: (b) In light of Part (a), it suffices to establish the existence of a matrix G such that X0 XG D X0. Moreover, according to Part (2) of Theorem 2.12.2, X0 XG D X0 for G D .X0 X/ X0. (c) Suppose that G is a least squares generalized inverse of X. Then, in light of Part (a), X0 XGY D X0 Y ;
so that GY is a solution to the linear system X0 XB D X0 Y . EXERCISE 8. Let A represent an M N matrix. An N M matrix H is said to be a minimum norm generalized inverse of A if it is a generalized inverse of A (i.e., if AHA D A) and if, in addition, .HA/0 D HA (i.e., HA is symmetric). (a) Show that H is a minimum norm generalized inverse of A if and only if H0 is a least squares generalized inverse of A0 (where least squares generalized inverse is as defined in Exercise 7). (b) Using the results of Exercise 7 (or otherwise), establish the existence of a minimum norm generalized inverse of A. (c) Show that if H is a minimum norm generalized inverse of A, then, for any vector b 2 C.A/, kxk attains its minimum value over the set fx W Ax D bg [comprising all solutions to the linear system Ax D b (in x)] uniquely at x D Hb (where kk denotes the usual norm). Solution. (a) Suppose that H0 is a least squares generalized inverse of A0. Then, A0 H0 A0 D A0 and .A0 H0 /0 D A0 H0, implying that and that
AHA D .A0 H0 A0 /0 D .A0 /0 D A
.HA/0 D A0 H0 D .A0 H0 /0 D HA:
Thus, H is a minimum norm generalized inverse of A. Conversely, suppose that H is a minimum norm generalized inverse of A. Then, and
A0 H0 A0 D .AHA/0 D A0; .A0 H0 /0 D Œ.HA/0 0 D .HA/0 D A0 H0:
And it follows that H0 is a least squares generalized inverse of A0.
(b) It follows from Part (a) that the transpose of any least squares generalized inverse of A0 is a minimum norm generalized inverse of A. Thus, the existence of a minimum norm generalized inverse of A follows from the existence of a least squares generalized inverse of A0 ; the existence of a least squares generalized inverse of A0 is a consequence of the result established in Part (b) of Exercise 7. (c) Suppose that H is a minimum norm generalized inverse of A. And let x represent an arbitrary solution to Ax D b [where b 2 C.A/]. Then, according to Theorem 2.11.7, x D Hb C .I
HA/z
45
Estimation and Prediction: Classical Approach for some vector z. Accordingly, we have that kxk2 D ŒHb C .I 2
HA/z0 ŒHb C .I
D kHbk C k.I
HA/z
2
HA/zk C 2.Hb/0 .I
HA/z:
Moreover, upon observing that b D Ar for some vector r, we find that .Hb/0 .I Thus,
HA/z D r 0 .HA/0 .I
HA/z D r 0 HA.I
kxk2 D kHbk2 C k.I
HA/z D r 0 H.A
AHA/z D 0:
HA/zk2;
so that kxk2 kHbk2 or, equivalently, kxk kHbk with equality holding if and only if k.I HA/zk D 0. Upon observing that k.I HA/zk D 0 if and only if .I HA/z D 0 and hence if and only if x D Hb (and observing also that Hb is a solution to Ax D b), we conclude that kxk attains its minimum value over the set fx W Ax D bg uniquely at x D Hb. EXERCISE 9. Let X represent an N P matrix, and let G represent a P N matrix that is subject to the following four conditions: (1) XGX D X; (2) GXG D G; (3) .XG/0 D XG; and (4) .GX/0 D GX. (a) Show that if a P P matrix H is a minimum norm generalized inverse of X0 X, then conditions (1)–(4) can be satisfied by taking G D HX0.
(b) Use Part (a) and the result of Part (b) of Exercise 8 (or other means) to establish the existence of a P N matrix G that satisfies conditions (1)–(4) and show that there is only one such matrix.
(c) Let XC represent the unique P N matrix G that satisfies conditions (1)–(4)—this matrix is customarily referred to as the Moore-Penrose inverse, and conditions (1)–(4) are customarily referred to as the Moore-Penrose conditions. Using Parts (a) and (b) and the results of Part (c) of Exercise 7 and Part (c) of Exercise 8 (or otherwise), show that XC y is a solution to the linear system X0 Xb D X0 y (in b) and that kbk attains its minimum value over the set fb W X0 Xb D X0 yg (comprising all solutions to the linear system) uniquely at b D XC y (where kk denotes the usual norm). Solution. (a) Suppose that H is a minimum norm generalized inverse of X0 X. Then, (1) X.HX0 /X D X, as is evident from Part (1) of Theorem 2.12.2, (2) .HX0 /X.HX0 / D HX0, as is evident from Part (2) of Theorem 2.12.2, (3) ŒX.HX0 /0 D XH0 X0 D X.HX0 /, as is evident, for example, from Part (3) of Theorem 2.12.2 and from Corollary 2.10.11, and (4) Œ.HX0 /X0 D .HX0 X/0 D HX0 X D .HX0 /X. (b) Upon observing [in light of Part (b) of Exercise 8] that there exists a minimum norm generalized inverse of X0 X, it follows from Part (a) (of the present exercise) that there exists a P N matrix G that satisfies conditions (1)–(4). Now, let G represent any particular P N matrix that satisfies conditions (1)–(4) [i.e., for which conditions (1)–(4) can be satisfied by setting G D G ]. Then, for any matrix G that satisfies conditions (1)–(4), G D GXG D G.XG/0 D GG 0 X0 D GG 0 .XG X/0 D GG 0 X0 .XG /0 D GG 0 X0 XG D G.XG/0 XG D GXGXG D GXG D GXG XG D .GX/0 .G X/0 G D X0 G 0 X0 G0 G D .XGX/0 G0 G D X0 G0 G D .G X/0 G D G XG D G :
Thus, there is only one P N matrix G that satisfies conditions (1)–(4).
(c) The Moore-Penrose inverse XC of X is a least squares generalized inverse of X. Accordingly,
46
Estimation and Prediction: Classical Approach
it follows from the result of Part (c) of Exercise 7 that XC y is a solution to the linear system X0 Xb D X0 y. Further, it follows from the results of Parts (a) and (b) (of the present exercise) that XC D HX0, where H is a minimum norm generalized inverse of X0 X. Thus, XC y D HX0 y. And based on the result of Part (c) of Exercise 8, we conclude that b attains its minimum value over the set fb W X0 Xb D X0 yg uniquely at b D XC y. EXERCISE 10. Consider further the alternative approach to the least squares computations, taking the formulation and the notation to be those of the final part of Section 5.4e. Q 1 C L2 hQ 2 , where hQ 2 is an arbitrary (P K)-dimensional column vector and hQ 1 is (a) Let bQ D L1 h Q is minimized by taking the solution to the linear system R1 h1 D z1 R2 hQ 2 . Show that kbk hQ 2 D ŒI C .R1 1 R2 /0 R1 1 R2
1
.R1 1 R2 /0 R1 1 z1 :
Do so by formulating this minimization problem as a least squares problem in which the role of R1 1 R2 R1 1 z1 y is played by the vector , the role of X is played by the matrix , and the I 0 Q 2. role of b is played by h (b) Let O1 representa P K matrix with orthonormal columns and T1 a K K upper triangular R01 D O1 T10 —the existence of a decomposition of this form can be estabmatrix such that R02 lished in much the same way as the existence of the QR decomposition (in which T1 would be lower triangular rather than upper triangular). Further, take O2 to be any P .P K/ matrix such that the P P matrix O defined by O D .O1 ; O2 / is orthogonal. T1 0 0 (1) Show that X D QT .LO/ , where T D . 0 0 (2) Showthaty Xb D Q1 .z1 T1 d1 / C Q2 z2 , where d D .LO/0 b and d is partitioned as d1 . dD d2 (3) Show that .y Xb/0 .y Xb/ D .z1 T1 d1 /0 .z1 T1 d1 / C z02 z2 . (4) Taking dQ 1 to be the solution to the linear system T1 d1 D z1 (in d1 ), show that .y Xb/0 .y Xb/ attains a minimum value of z02 z2 and that it does so at a value bQ of b if and dQ only if bQ D LO Q 1 for some .P K/ 1 vector dQ 2 . d2 (5) Letting bQ represent an arbitrary one of the values of b at which .y Xb/0 .y Xb/ attains a minimum value [and, as in Part (4), taking dQ 1 to be the solution to T1 d1 D z1 ], show that Q 2 (where kk denotes the usual norm) attains a minimum value of dQ 0 dQ and that it does kbk 1 1 dQ 1 Q so uniquely at b D LO . 0 Solution. (a) Clearly, Q 2 D bQ 0 bQ D kbk Moreover, hQ 1 D R1 1 z1
0 0 Q1 h hQ 1 hQ 1 hQ 1 0 Q0 Q Q0 Q L L Q 2 D h1 h1 C h2 h2 : Qh2 D hQ 2 Qh2 h
Q , so that R1 1 R2 h 2
Q 0 hQ Q /0 .R 1 z Q 0 hQ D .R 1 z R1 1 R2 hQ 2 / C h R1 1 R2 h hQ 01 hQ 1 C h 2 2 1 1 2 2 1 1 2 1 1 0 1 1 R1 z1 R1 R2 Q R1 z1 R1 R2 Q h2 : h2 D 0 I 0 I Q with respect to hQ 2 can be formulated as a least squares Accordingly, the problem of minimizing kbk
47 1 1 R1 z1 R1 R2 problem. More specifically, when the the roles of y, X, and b are played by , , and 0 I hQ 2 , respectively, it can be formulated as the (least squares) problem of minimizing .y Xb/0 .y Xb/ Q 2 at which kbk Q attains its minimum value with respect to b. Thus, an expression for the value of h can be obtained from the solution to the normal equations for this least squares problem, that is, from the solution to the linear system 1 0 1 1 0 1 R1 R2 R1 R2 Q R1 R2 R1 z1 h2 D : I I I 0 Estimation and Prediction: Classical Approach
That solution is
hQ 2 D ŒI C .R1 1 R2 /0 R1 1 R2 1.R1 1 R2 /0 R1 1 z1 : (b) (1) Clearly,
0 0 R1 .R1 ; R2 / D D T1 O10 D .T1 ; 0/O 0; R02 R1 R2 T1 0 RD D O 0 D T O 0: 0 0 0 0
so that
(S.2)
And upon replacing R in the expression X D QRL0 with expression (S.2), we obtain the expression X D QT .LO/0: (2) That y
Xb D Q1 .z1
T1 d1 / C Q2 z2 is clear upon observing that y
and that z
Xb D Q.z T d/ z1 T1 d1 : Td D z2
(S.3) (S.4)
(3) Making use of results (S.3) and (S.4), we find that .y
Xb/0 .y
Xb/ D .z
T d/0 Q0 Q.z
T d/
0
D .z T d/ .z T d/ D .z1 T1 d1 /0 .z1 T1 d1 / C z02 z2 :
(S.5)
(4) Expression (S.5) attains a minimum value of z02 z2 and does so at those values of d for which the first term of expression (S.5) equals 0 or, equivalently, at those values for which d1 D dQ 1 . Thus, it follows from the results of Part (3) that y Xb0 .y Xb/ attains a minimum value of z02 z2 and dQ that it does so at b D bQ if and only if bQ D LO Q 1 for some .P K/ 1 vector dQ 2 . d2 dQ Q (5) In light of Part (4), it suffices to take b D LO Q 1 , where dQ 2 is an arbitrary .P K/ 1 d2 vector. Thus, recalling that the product of orthogonal matrices is orthogonal, we find that 0 0 Q Q Q dQ 1 Q 2 D bQ 0 bQ D d1 .LO/0 .LO/ d1 D d1 D dQ 01 dQ 1 C dQ 02 dQ 2 : kbk Qd2 Qd2 Qd2 dQ 2 Upon observing that dQ 1 is uniquely determined—it is the unique solution to T1 d1 D z1 —and that dQ 2 is arbitrary, it is clear that dQ 01 dQ 1 C dQ 02 dQ 2 attains a minimum value of dQ 01 dQ 1 and that it does so Q 2 attains a minimum value of dQ 0 dQ and that it does so if and only if dQ 2 D 0. Weconclude that kbk 1 1 dQ 1 Q uniquely at b D LO . 0
48
Estimation and Prediction: Classical Approach
EXERCISE 11. Verify that the difference (6.14) is a nonnegative definite matrix and that it equals 0 if and only if c C A0 y D `.y/.
Solution. Clearly, E.A0 y/ D ƒ0ˇ; that is, A0 y is an unbiased estimator of ƒ0ˇ. And as a consequence, the MSE matrix of c C A0 y is EŒ.cCA0 y ƒ0ˇ/.cCA0 y ƒ0ˇ/0 D cc0 C EŒ.A0 y ƒ0ˇ/.A0 y ƒ0ˇ/0 C EŒc.A0 y ƒ0ˇ/0 C EfŒc.A0 y ƒ0ˇ/0 0 g D cc0 C var.A0 y/: Thus, EŒ.cCA0 y ƒ0ˇ/.cCA0 y ƒ0ˇ/0
EfŒ`.y/ ƒ0ˇŒ`.y/ ƒ0ˇ0 g D cc0 C var.A0 y/
varŒ`.y/:
(S.6)
Moreover, it follows from Theorem 5.6.1 that var.A0 y/ varŒ`.y/ is a nonnegative definite matrix. Accordingly, expression (S.6) is the sum of two nonnegative definite matrices and, consequently, is itself a nonnegative definite matrix. It remains to show that if the difference (6.14) equals 0, then c C A0 y D `.y/—that c C A0 y D `.y/ implies that the difference (6.14) equals 0 is obvious. Accordingly, suppose that the difference (6.14) equals 0. Further, let c1 ; c2 ; : : : ; cM represent the elements of c, and observe [in light of equality (S.6)] that var.A0 y/ varŒ`.y/ D cc0: (S.7) 2 Clearly, the diagonal elements of cc0 are c12 ; c22 ; : : : ; cM , and because var.A0 y/ varŒ`.y/ is nonnegative definite and because the diagonal elements of a nonnegative definite matrix are nonnegative, it follows from equality (S.7) that cj2 0 (j D 1; 2; : : : ; M ) and hence that c1 D c2 D D cM D 0 or, equivalently, c D 0. Thus, c C A0 y is a linear unbiased estimator of ƒ0ˇ, and var.c C A0 y/ varŒ`.y/ D var.A0 y/ varŒ`.y/ D 0:
And we conclude (on the basis of Theorem 5.6.1) that c C A0 y D `.y/.
EXERCISE 12. Suppose that y is an N 1 observable random vector that follows a G-M, Aitken, or general linear model. And let s.y/ represent any particular translation-equivariant estimator of an estimable linear combination 0ˇ of the elements of the parametric vector ˇ—e.g., s.y/ could be the least squares estimator of 0ˇ. Show that an estimator t.y/ of 0ˇ is translation equivariant if and only if t.y/ D s.y/ C d.y/ for some translation-invariant statistic d.y/. Solution. Suppose that
t.y/ D s.y/ C d.y/
for some translation-invariant statistic d.y/. Then, for every P 1 vector k (and for every value of y), t.y C Xk/ D s.y C Xk/ C d.y C Xk/ D s.y/ C 0 k C d.y/ D t.y/ C 0 k:
Thus, t.y/ is translation equivariant. Conversely, suppose that t.y/ is translation equivariant. And define d.y/ D t.y/ clearly, t.y/ D s.y/ C d.y/:
s.y/. Then,
49
Estimation and Prediction: Classical Approach
Moreover, d.y/ is translation invariant, as is evident upon observing that, for every P 1 vector k (and for every value of y), d.y C Xk/ D t.y C Xk/ D t.y/ C 0 k
D t.y/ s.y/ D d.y/:
s.y C Xk/ Œs.y/ C 0 k
EXERCISE 13. Suppose that y is an N 1 observable random vector that follows a G-M model. And let y 0Ay represent a quadratic unbiased nonnegative-definite estimator of 2, that is, a quadratic form in y whose matrix A is a symmetric nonnegative definite matrix of constants and whose expected value is 2. (a) Show that y 0Ay is translation invariant. (b) Suppose that the fourth-order moments of the distribution of the vector e D .e1 ; e2 ; : : : ; eN /0 are such that (for i; j; k; m D 1; 2; : : : ; N ) E.ei ej ek em / satisfies condition (7.38). For what choice of A is the variance of the quadratic unbiased nonnegative-definite estimator y 0Ay a minimum? Describe your reasoning. Solution. (a) As an application of formula (7.11), we have [as in result (7.57)] that E.y 0Ay/ D 2 tr.A/ C ˇ 0 X0AXˇ:
And because y 0Ay estimates 2 unbiasedly, it follows that ˇ 0 X0AXˇ D 0 for all ˇ [and that tr.A/ D 1] and hence (in light of Corollary 2.13.4) that X0AX D 0. Thus, based on Corollary 2.13.27, we conclude that AX D 0 and that, as a consequence, y 0Ay is translation invariant.
(b) It follows from Part (a) that every quadratic unbiased nonnegative-definite estimator of 2 is a quadratic unbiased translation-invariant estimator. And it follows from the results of Section 5.7d that the estimator of 2 that has minimum variance among quadratic unbiased translation-invariant estimators is the estimator O 2 D eQ 0 eQ =.N rank X/ [where eQ D .I PX /y]. Moreover, O 2 is expressible as 0 1 1 .I PX / p .I PX /y; O 2 D y 0 p N rank X N rank X which is a quadratic form (in y) whose matrix is (in light of Corollary 2.13.15) symmetric nonnegative definite. Thus, O 2 is a quadratic unbiased nonnegative-definite estimator. Since O 2 has minimum variance among quadratic unbiased translation-invariant estimators and since every quadratic unbiased nonnegative-definite estimator is a quadratic unbiased translation-invariant estimator, we conclude that O 2 has minimum variance among quadratic unbiased nonnegative-definite estimators. EXERCISE 14. Suppose that y is an N 1 observable random vector that follows a G-M model. Suppose further that the distribution of the vector e D .e1 ; e2 ; : : : ; eN /0 has third-order moments jkm D E.ej ek em / (j; k; m D 1; 2; : : : ; N ) and fourth-order moments ijkm D E.ei ej ek em / (i; j; k; m D 1; 2; : : : ; N ). And let A D faij g represent an N N symmetric matrix of constants. (a) Show that in the special case where the elements e1 ; e2 ; : : : ; eN of e are statistically independent, var.y 0Ay/ D a0 a C 4ˇ 0 X0Aƒ a C 2 4 tr.A2 / C 4 2 ˇ 0 X0A2 Xˇ;
(E.1)
where is the N N diagonal matrix whose i th diagonal element is i i i i 3 4, where ƒ is the N N diagonal matrix whose i th diagonal element is i i i , and where a is the N 1 vector whose elements are the diagonal elements a11 ; a22 ; : : : ; aNN of A. (b) Suppose that the elements e1 ; e2 ; : : : ; eN of e are statistically independent, that (for i D 1; 2; : : : ; N ) i i i i D (for some scalar ), and that all N of the diagonal elements of the
50
Estimation and Prediction: Classical Approach PX matrix are equal to each other. Show that the estimator O 2 D eQ 0 eQ =.N rank X/ [where eQ D .I PX /y] has minimum variance among all quadratic unbiased translation-invariant estimators of 2.
Solution. (a) Suppose that e1 ; e2 ; : : : ; eN are statistically independent. Then, unless m D k D j , jkm D 0. And ( 4 if j Di and mDk¤i , if kDi and mDj ¤i , or if mDi and kDj ¤i ,
ijkm D 0 unless j Di and mDk, kDi and mDj , and/or mDi and kDj . Thus, as an application of formula (7.17), we have that var.y 0Ay/ D .vec A/0 vec A C 4ˇ 0 X0Aƒ vec A C 2 4 tr.A2 / C 4 2 ˇ 0 X0A2 Xˇ; where is an N 2 N 2 matrix whose entry for the ij th row [row .i 1/N C j ] and kmth column [column .k 1/N C m] is i i i i 3 4 if m D k D j D i and is 0 otherwise and where ƒ is an N N 2 matrix whose entry for the j th row and kmth column [column .k 1/N C m] is jjj if m D k D j and is 0 otherwise. Moreover, .vec A/0 vec A D a0 a and ƒ vec A D ƒ a.
(b) Suppose that y 0Ay is a quadratic unbiased translation-invariant estimator of 2. Then, upon applying formula (E.1) (with , ƒ, and a being defined accordingly) and upon observing (in light of the results of Section 5.7d) that AX D 0, we find that var.y 0Ay/ D a0 a C 4ˇ 0 .AX/0 ƒ a C 2 4 tr.A2 / C 4 2 ˇ 0 X0AAXˇ D .
D .
1 .I N rank X of result (7.63), we find that Let R D A
and that
3 4 /a0 a C 0 C 2 4 tr.A2 / C 0
3 4 /a0 a C 2 4 tr.A2 /:
(S.8)
PX /. Then, proceeding in much the same way as in the derivation tr.R/ D 0 1 tr.A2 / D C tr.R0 R/: N rank X
(S.9) (S.10)
Now, let (for i; j D 1; 2; : : : ; N ) rij represent the ij th element of R, and take r to be the N 1 vector whose elements are the diagonal elements r11 ; r22 ; : : : ; rNN of R. Because (by supposition) each of the diagonal elements of the PX matrix has the same value, say c, we have that rDa
1 c 1: N rank X
Moreover, it follows from result (7.35) that cD Thus, r D a (S.9)] that
1 1 tr.PX / D rank X: N N
1 1 1, implying that a is expressible as a D 1 C r and hence [in light of result N N X N 1 X a0 a D 2 C 2 ri i C ri2i N N i i X 1 1 C 2 tr.R/ C ri2i D N N i X X 1 1 2 C0C C ri i D ri2i : (S.11) D N N i
i
51
Estimation and Prediction: Classical Approach
And upon replacing a0 a and tr.A2 / in expression (S.8) with expressions (S.11) and (S.10), we find that X 1 1 C C tr.R0 R/ ri2i C 2 4 var.y 0Ay/ D . 3 4 / N N rank X i X X X 1 1 1 D . 4 / C C ri2i C 2 4 rij2 ri2i N N rank X N i i; j i X X rank X 1 C C ri2i C 2 4 rij2 ; D . 4 / N N.N rank X/ i
i; j ¤i
which (upon observing that 4 ) leads to the conclusion that var.y 0Ay/ attains its minimum 1 value when R D 0 or, equivalently, when A D .I PX / (i.e., when y 0Ay D O 2 ). N rank X EXERCISE 15. Suppose that y is an N 1 observable random vector that follows a G-M model, and assume that the distribution of the vector e of residual effects is MVN. (a) Letting 0ˇ represent an estimable linear combination of the elements of the parametric vector ˇ, find a minimum-variance unbiased estimator of .0ˇ/2. (b) Find a minimum-variance unbiased estimator of 4. Solution. Recall (from Section 5.8) that X0 y and y 0 .I PX /y form a complete sufficient statistic and that, consequently, “any” function, say tŒX0 y; y 0 .I PX /y, of X0 y and y 0 .I PX /y is a minimumvariance unbiased estimator of EftŒX0 y; y 0 .I PX /yg. (a) Recall (from Section 5.4c) that the least squares estimator of 0ˇ is rQ 0 X0 y, where rQ is any solution to the conjugate normal equations X0 Xr D . Recall also that E.Qr 0 X0 y/ D 0ˇ
And observe that
and
var.Qr 0 X0 y/ D 2 rQ 0 :
EŒ.Qr 0 X0 y/2 D var.Qr 0 X0 y/ C ŒE.Qr 0 X0 y/2 D 2 rQ 0 C .0ˇ/2: Further, let O 2 D y 0 .I
PX /y=.N rank X/, and recall (from Section 5.7c) that E.O 2 / D 2. Thus, EŒ.Qr 0 X0 y/2 D E.O 2 rQ 0 / C .0ˇ/2
or, equivalently,
EŒ.Qr 0 X0 y/2
O 2 rQ 0 D .0ˇ/2:
We conclude that .Qr 0 X0 y/2 O 2 rQ 0 is an unbiased estimator of .0ˇ/2 and, upon observing that 0 0 2 2Q0 .Qr X y/ O r depends on the value of y only through the values of X0 y and y 0 .I PX /y (which form a complete sufficient statistic), that .Qr 0 X0 y/2 O 2 rQ 0 has minimum variance among 0 2 all unbiased estimators of . ˇ/ . (b) Making use of the results of Section 5.7c, we find that EfŒy 0 .I
PX /y2 g D varŒy 0 .I D 2 4 .N
D 4 .N
PX /y C fEŒy 0 .I
rank X/ C 4 .N
rank X/ŒN
PX /yg2 rank X/2
rank.X/ C 2:
Œy 0 .I PX /y2 . And since this estimator of 4 .N rank X/ŒN rank.X/C2 depends on the value of y only through the value of y 0 .I PX /y [which, in combination with X0 y, forms a complete sufficient statistic], it has minimum variance among all unbiased estimators of 4.
Thus, 4 is estimated unbiasedly by
52
Estimation and Prediction: Classical Approach
EXERCISE 16. Suppose that y is an N 1 observable random vector that follows a G-M model, and assume that the distribution of the vector e of residual effects is MVN. Show that if 2 were known, X0 y would be a complete sufficient statistic. Solution. Suppose that 2 were known. Proceeding in much the same way as in Section 5.8, let us begin by letting K D rank X and by observing that there exists an N K matrix, say W , whose columns form a basis for C.X/. Observe also that W D XR for some matrix R and that X D W S for some (K P ) matrix S (of rank K). Moreover, X0 y is expressible in terms of the (K 1) vector W 0 y, and, conversely, W 0 y is expressible in terms of X0 y; we have that X0 y D S0 W 0 y and that W 0 y D R0 X0 y. Thus, corresponding to any function g.X0 y/ of X0 y, there is a function , say g .W 0 y/, of W 0 y such that g .W 0 y/ D g.X0 y/ for every value of y; namely, the function g .W 0 y/ defined by g .W 0 y/ D g.S0 W 0 y/. Similary, corresponding to any function h.W 0 y/ of W 0 y, there is a function, say h .X0 y/, of X0 y such that h .X0 y/ D h.W 0 y/ for every value of y; namely, the function h .X0 y/ defined by h .X0 y/ D h.R0 X0 y/. Now, suppose that W 0 y were a complete sufficient statistic. Then, since W 0 y D R0 X0 y, X0 y would be a sufficient statistic. Moreover, if EŒg.X0 y/ D 0, then EŒg .W 0 y/ D 0, implying that PrŒg .W 0 y/ D 0 D 1 and hence that PrŒg.X0 y/ D 0 D 1. Thus, X0 y would be a complete statistic. Conversely, suppose that X0 y were a complete sufficient statistic. Then, since X0 y D S0 W 0 y, 0 W y would be a sufficient statistic. Moreover, if EŒh.W 0 y/ D 0, then EŒh .X0 y/ D 0, implying that PrŒh .X0 y/ D 0 D 1 and hence that PrŒh.W 0 y/ D 0 D 1. Thus, W 0 y would be a complete statistic. At this point, we have established that X0 y would be a complete sufficient statistic if and only if 0 W y would be a complete sufficient statistic. Thus, for purposes of verifying that X0 y would be a complete sufficient statistic, it suffices to consider the would-be sufficiency and completeness of the statistic W 0 y. In that regard, the probability density function of y, say f ./, is expressible as follows: 0 1 0 0 1 1 0 1 0 W y : (S.12) y y exp ˇ X Xˇ exp Sˇ exp f .y/ D 2 2 2 2 2 .2 2 /N=2 Based on the same well-known result on complete sufficient statistics for exponential families that was employed in Section 5.8, it follows from result (S.12) that (under the supposition of normality) W 0 y would be a complete sufficient statistic—to establish that the result on complete sufficient statistics for exponential families is applicable, it suffices to observe [in connection with expression (S.12)] that the (K 1) vector .1= 2 /Sˇ of parametric functions is such that, for any K 1 vector d, .1= 2 /Sˇ D d for some value of ˇ (as is evident upon noting that S contains K linearly independent columns). It remains only to observe that since W 0 y would be a complete sufficient statistic, X0 y would be a complete sufficient statistic. EXERCISE 17. Suppose that y is an N 1 observable random vector that follows a general linear model. Suppose further that the distribution of the vector e of residual effects is MVN or, more generally, that the distribution of e is known up to the value of the vector . And take h.y/ to be any (possibly vector-valued) translation-invariant statistic. (a) Show that if were known, h.y/ would be an ancillary statistic—for a definition of ancillarity, refer, e.g., to Casella and Berger (2002, def. 6.2.16) or to Lehmann and Casella (1998, p. 41). 
(b) Suppose that X0 y would be a complete sufficient statistic if were known. Show (1) that the least squares estimator of any estimable linear combination 0ˇ of the elements of the parametric vector ˇ has minimum variance among all unbiased estimators, (2) that any vector of least squares estimators of estimable linear combinations (of the elements of ˇ) is distributed independently of h.y/, and (3) (using the result of Exercise 12 or otherwise) that the least squares estimator of any estimable linear combination 0ˇ has minimum mean squared error among all translation-
53
Estimation and Prediction: Classical Approach
equivariant estimators. {Hint [for Part (2)]. Make use of Basu’s theorem—refer, e.g., to Lehmann and Casella (1998, p. 42) for a statement of Basu’s theorem.} Solution. (a) It was established in Section 5.7d that any translation-invariant statistic depends on the value of y only through the value of e. Thus, if were known, the distribution of h.y/ would not depend on the (unknown) parameters—the only (unknown) parameters would be the elements of ˇ. And it follows that h.y/ would be an ancillary statistic. (b) (1) Let `.y/ represent the least squares estimator of 0ˇ, and recall (from the results of Section 5.4c) that `.y/ is an unbiased estimator. And note that `.y/ and every other unbiased estimator of 0ˇ would be among the estimators that would be unbiased if were known. Note also that `.y/ D r 0 X0 y for some (P 1) vector r. Thus, `.y/ depends on the value of y only through the value of what would be a complete sufficient statistic if were known, and, consequently, `.y/ has minimum variance among unbiased estimators at the point . Moreover, this same line of reasoning is applicable and the same conclusion reached for every 2 ‚. (2) As noted in Section 5.7c, any (column) vector of least squares estimators of estimable linear combinations of the elements of ˇ is expressible as R0 X0 y for some matrix R. Since X0 y would be a complete sufficient statistic and [in light of Part (a)] h.y/ would be an ancillary statistic if were known, it follows from Basu’s theorem that X0 y is distributed independently of h.y/ when the distribution of e is that corresponding to the point . Moreover, this same line of reasoning is applicable and the same conclusion reached for every 2 ‚. And, upon observing that R0 X0 y depends on the value of y only through the value of X0 y, we conclude that R0 X0 y is distributed independently of h.y/.
(3) Let `.y/ represent the least squares estimator of 0ˇ, and recall (from the results of Section 5.5) that `.y/ is translation-equivariant (as well as unbiased). Further, let t.y/ represent an arbitrary translation-equivariant estimator of 0ˇ, and observe (based on the result of Exercise 12) that t.y/ D `.y/ C d.y/ for some translation-invariant statistic d.y/. Observe also [in light of Part (2)] that `.y/ and d.y/ are statistically independent. Thus, the mean squared error of t.y/ is expressible as EfŒt.y/ 0ˇ2 g D EfŒ`.y/ 0ˇ Cd.y/2g
D EfŒ`.y/ 0ˇ2g C 2 EŒ`.y/d.y/ D varŒ`.y/ C 2 EŒ`.y/ EŒd.y/ D varŒ`.y/ C 20ˇ EŒd.y/ D varŒ`.y/ C EfŒd.y/2 g varŒ`.y/;
2 EŒ0ˇd.y/ C EfŒd.y/2 g
20ˇ EŒd.y/ C EfŒd.y/2 g
20ˇ EŒd.y/ C EfŒd.y/2 g
(S.13)
with equality holding [in inequality (S.13)] if and only if d.y/ D 0 (with probability 1) or, equivalently, if and only if t.y/ D `.y/ (with probability 1). And it follows that `.y/ has minimum mean squared error among all translation-equivariant estimators of 0ˇ. EXERCISE 18. Suppose that y is an N 1 observable random vector that follows a G-M model. Suppose further that the distribution of the vector e of residual effects is MVN. And, letting eQ D y PX y, take Q 2 D eQ 0 eQ =N to be the ML estimator of 2 and O 2 D eQ 0 eQ =.N rank X/ to be the unbiased estimator. (a) Find the bias and the MSE of the ML estimator Q 2.
(b) Compare the MSE of the ML estimator Q 2 with that of the unbiased estimator O 2 : for which values of N and of rank X is the MSE of the ML estimator smaller than that of the unbiased estimator and for which values is it larger?
54
Estimation and Prediction: Classical Approach
Solution. Let K D rank X.
(a) In light of result (7.36), the bias of Q 2 is 0 eQ eQ N K E.Q 2 / 2 D E 2 D 2 N N
and, in light of result (7.42), the MSE of Q 2 is EŒ.Q 2
2 /2 D
4 Œ2.N N2
2 D
K/ C K 2 :
K 2 ; N
(S.14)
(b) Let D represent the difference between the MSE of O 2 and the MSE of Q 2. In light of results (7.40) and (S.14) [along with result (7.37)], we find that 2 4 4 DD Œ2.N K/ C K 2 N K N2 4 D 2 Œ2N 2 2.N K/2 K 2 .N K/ N .N K/ 4 D 2 Œ4NK 2K 2 K 2 .N K/ N .N K/ K 4 ŒK.K 2/ N.K 4/: D 2 N .N K/ Clearly, if K D 1, K D 2, K D 3, or K D 4, then D > 0 (regardless of the value of N ). If K.K 2/ K.K 2/ ,N D , or K 5, then D > 0, D D 0, or D < 0, depending on whether N < K 4 K 4 K.K 2/ 2K 2K 2K N > or, equivalently, on whether N K < ,N KD , or N K > . K 4 K 4 K 4 K 4 Thus, for K between 1 and 4, inclusive, the ML estimator has the smaller MSE. And, for K 5, whether the MSE of the ML estimator is smaller than, equal to, or larger than that of the unbiased estimator depends on whether N is smaller than, equal to, or larger than K.K 2/=.K 4/ or, equivalently, on whether N K is smaller than, equal to, or larger than 2K=.K 4/. Note that, for K > 4, 2K=.K 4/ is a decreasing function of K and that, for K > 12, 2K=.K 4/ < 3. The implication is that, for K 13, the unbiased estimator has the smaller MSE whenever N K 3 or, equivalently, whenever N K C 3. EXERCISE 19. Suppose that y is an N 1 observable random vector that follows a general linear model, that the distribution of the vector e of residual effects is MVN, and that the variancecovariance matrix V ./ of e is nonsingular (for all 2 ‚). And, letting K D N rank X, take R to be any N K matrix (of constants) of full column rank K such that X0 R D 0, and (as in Section 5.9b) define z D R0 y. Further, let w D s.z/, where s./ is a K 1 vector of real-valued functions that defines a one-to-one mapping of RK onto some set W. (a) Show that w is a maximal invariant. (b) Let f1 . I / represent the pdf of the distribution of z, and assume that s./ is such that the distribution of w has a pdf, say f2 . I /, that is obtainable from f1 . I / via an application of the basic formula (e.g., Bickel and Doksum 2001, sec. B.2) for a change of variables. And, taking L1 .I R0 y/ and L2 ŒI s.R0 y/ (where y denotes the observed value of y) to be the likelihood functions defined by L1 .I R0 y/ D f1 .R0 yI / and L2 ŒI s.R0 y/ D f2 Œs.R0 y/I , show that L1 .I R0 y/ and L2 ŒI s.R0 y/ differ from each other by no more than a multiplicative constant. Solution. (a) As discussed in Section 5.9b, z is a maximal invariant, and any (possibly vectorvalued) statistic that depends on y only through the value of a maximal invariant is translation
55
Estimation and Prediction: Classical Approach
invariant. Thus, w is translation invariant. Now, take y1 and y2 to be any pair of values of y such that s.R0 y2 / D s.R0 y1 /. Then, because the mapping defined by s./ is one-to-one, R0 y2 D R0 y1 . And because z is a maximal invariant, it follows that y2 D y1 C Xk for some vector k. Accordingly, we conclude that w is a maximal invariant. (b) Let J.z/ represent the determinant of the K K matrix whose i tth element is the partial derivative of the i th element of s.z/ with respect to the tth element of z. Then, L2 ŒI s.R0 y/ D f2 Œs.R0 y/I D f1 .R0 yI /jJ.R0 y/j
1
D jJ.R0 y/j
1
L1 .I R0 y/:
Thus, the two likelihood functions L1 .I R0 y/ and L2 ŒI s.R0 y/ differ from each other by no more than a multiplicative constant (which may depend on the observed value y of y). EXERCISE 20. Suppose that y is an N 1 observable random vector that follows a general linear model, that the distribution of the vector e of residual effects is MVN, and that the variancecovariance matrix V ./ of e is nonsingular (for all 2 ‚). Further, let z D R0 y, where R is any N .N rank X/ matrix (of constants) of full column rank N rank X such that X0 R D 0; and let u D X0 y, where X is any N .rank X/ matrix (of constants) whose columns form a basis for C.X/. And denote by y the observed value of y. 0 (a) Verify that the likelihood function that would result from regarding the observed value .X ; R/ y u of as the data vector differs by no more than a multiplicative constant from that obtained z by regarding the observed value y of y as the data vector.
(b) Let f0 . j I ˇ; / represent the pdf of the conditional distribution of u given z. And take L0 Œˇ; I .X ; R/0 y to be the function of ˇ and defined by L0 Œˇ; I .X ; R/0 y D f0 .X0 y j R0 yI ˇ; /. Show that L0 Œˇ; I .X ; R/0 y D .2/
.rank X/=2
jX0 X j
1
jX0 ŒV ./ 1 X j1=2 Q ˇ0 X0 ŒV ./ expf 21 Œˇ./
Q where ˇ./ is any solution to the linear system X0 ŒV ./ vector b).
1
1
Xb D X0 ŒV ./
Q XŒˇ./ ˇg; 1
y (in the P 1
(c) In connection with Part (b), show (1) that Q Œˇ./ ˇ0 X0 ŒV ./
1
Q XŒˇ./ ˇ
D .y
Xˇ/0 ŒV ./
1
XfX0 ŒV ./
1
Xg X0 ŒV ./ 1.y
Xˇ/
and (2) that the distribution of the random variable s defined by s D .y
Xˇ/0 ŒV ./
1
XfX0 ŒV ./
1
Xg X0 ŒV ./ 1.y
Xˇ/
does not depend on ˇ. Solution. (a) Let f1 . I ˇ; / represent the pdf of the distribution of y and f2 . I ˇ; / thepdf u u of the distribution of —the distribution of y is N ŒXˇ; V ./, and the distribution of z z is N Œ.X ; R/0 Xˇ; .X ; R/0 V ./.X ; R/. Then, the function L1 .ˇ; I y/ of ˇ and defined by L1 .ˇ; I y/ D f1 .yI ˇ; / is the likelihood function obtained by regarding the observed value y of y as the data vector. And, similarly, the function L2 Œˇ; I .X ; R/0 y of ˇ and defined by 0 L2 Œˇ; I .X ; R/0 y D f2 Œ.X ;R/ yI ˇ; is the likelihood function obtained by regarding the u as the data vector. Further, observed value .X ; R/0 y of z f1 .yI ˇ; / D jdet .X ; R/j f2 Œ.X ; R/0 yI ˇ; ;
56
Estimation and Prediction: Classical Approach
as can be verified directly from formula (3.5.32) for the pdf of a MVN distribution—clearly, N=2
.2/
jV ./j
1=2
Xˇ/0 ŒV ./
1 .y 2
expf
D jdet .X ; R/j .2/
1 Œ.X ; 2
expf
N=2
1
.y
Xˇ/g
0
j.X ; R/ V ./.X ; R/j
R/0 .y
1=2
Xˇ/0 Œ.X ; R/0 V ./.X ; R/
1
.X ; R/0 .y
Xˇ/g u —or simply by observing that D .X ; R/0 y and making use of standard results (e.g., Bickel z and Doksum 2001, sec. B.2) on a change of variables. Thus, L2 Œˇ; I .X ; R/0 y D f2 Œ.X ; R/0 yI ˇ;
D jdet .X ; R/j 1f1 .yI ˇ; / D jdet .X ; R/j
1
L1 .ˇ; I y/:
(S.15)
(b) Let f3 . I / represent the pdf of the (marginal) distribution of z, and take L3 .I R0 y/ to be the function of defined by L3 .I R0 y/ D f3 .R0 yI /. Further, define L1 .ˇ; I y/ and L2 Œˇ; I .X ; R/0 y as in the solution to Part (a). Then, making use of result (S.15), we find that L0 Œˇ; I .X ; R/0 y D
jdet .X ; R/j 1 L1 .ˇ; I y/ L2 Œˇ; I .X ; R/0 y D L3 .I R0 y/ L3 .I R0 y/
and hence [in light of result (9.26)] that L0 Œˇ; I .X ; R/0 y D
.2/
N=2
j.X ; R/0 V ./.X ; R/j
.2/
1=2
expf
.N rank X/=2 jR0 V ./Rj 1=2
expf
1 .y Xˇ/0 ŒV ./ 1 .y Xˇ/g 2 : 1 0 y RŒR0 V ./R 1 R0 yg 2
And in light of results (9.36), (9.32), and (9.40), it follows that L0 Œˇ; I .X ; R/0 y D .2/
D .2/
.rank X/=2
1
jX0 X j
expf
1 2 .y
jX0 ŒV ./
1
Xˇ/0 ŒV ./
X j1=2
1
.y Xˇ/
0 0 Q X ŒV ./ 1 yg C 21 y 0 ŒV ./ 1 y 21 Œˇ./ .rank X/=2 jX0 X j 1 jX0 ŒV ./ 1 X j1=2 Q Q expf 21 Œˇ./ ˇ0 X0 ŒV ./ 1 XŒˇ./ ˇg:
(c) (1) Clearly, Q Œˇ./ ˇ0 X0 ŒV ./
1
Q XŒˇ./ ˇ
D fX0 ŒV ./
1
Q XŒˇ./ ˇg0 fX0 ŒV ./
1
Xg X0 ŒV ./
1
Q XŒˇ./ ˇ:
Moreover, X0 ŒV ./
1
Q XŒˇ./ ˇ D X0 ŒV ./ 0
1 1
Q Xˇ./ X0 ŒV ./ 0
D X ŒV ./ y X ŒV ./
1
1
Xˇ
Xˇ D X0 ŒV ./ 1.y Xˇ/:
Thus, Q Œˇ./ ˇ0 X0 ŒV ./
1
Q XŒˇ./ ˇ D .y
Xˇ/0 ŒV ./
1
XfX0 ŒV ./
1
Xg X0 ŒV ./ 1.y
Xˇ/:
Estimation and Prediction: Classical Approach
57
(2) Clearly, the random variable s is a function of the random vector y Xˇ. Moreover, the distribution of y Xˇ is N Œ0; V ./, which does not depend on ˇ. Thus, the distribution of s does not depend on ˇ. EXERCISE 21. Suppose that z is an S 1 observable random vector and that z N.0; 2 I/, where is a (strictly) positive unknown parameter. (a) Show that z0 z is a complete sufficient statistic. (b) Take w.z/ to be the S -dimensional vector-valued statistic defined by w.z/ D .z0 z/ 1=2 z—w.z/ is defined for z ¤ 0 and hence with probability 1. Show that z0 z and w.z/ are statistically independent. (Hint. Make use of Basu’s theorem.) (c) Show that any estimator of 2 of the form z0 z=k (where k is a nonzero constant) is scale equivariant—an estimator, say t.z/, of 2 is to be regarded as scale equivariant if for every (strictly) positive scalar c (and for every nonnull value of z) t.cz/ D c 2 t.z/.
(d) Let t0 .z/ represent any particular scale-equivariant estimator of 2 such that t0 .z/ ¤ 0 for z ¤ 0. Show that an estimator t.z/ of 2 is scale equivariant if and only if, for some function u./ such that u.cz/ D u.z/ (for every strictly positive constant c and every nonnull value of z), t.z/ D u.z/t0 .z/ for z ¤ 0:
(E.2)
(e) Show that a function u.z/ of z is such that u.cz/ D u.z/ (for every strictly positive constant c and every nonnull value of z) if and only if u.z/ depends on the value of z only through w.z/ [where w.z/ is as defined in Part (b)]. (f) Show that the estimator z0 z=.S C2/ has minimum MSE among all scale-equivariant estimators of 2. Solution. (a) The distribution of z has a pdf, say f ./, that is expressible as follows: 1 0 S=2 S f .z/ D .2/ exp zz : 2 2
(S.16)
Based on the same well-known result on complete sufficient statistics for exponential families that was employed in Section 5.8, it follows from result (S.16) that z0 z is a complete sufficient statistic. (b) Clearly,
w.z/ D fŒ.1=/z0 Œ.1=/zg 1=2 .1=/z D wŒ.1=/z: And upon observing that the distribution of .1=/z [which is N.0; I/] does not depend on , it follows that w.z/ is an ancillary statistic. Since [according to Part (a)] z0 z is a complete sufficient statistic, we conclude (on the basis of Basu’s theorem) that z0 z and w.z/ are statistically independent. (c) Clearly, for every (strictly) positive scalar c (and for every value of z), .cz/0 .cz/=k D c z z=k. Thus, z0 z=k is a scale-equivariant estimator of 2. 2 0
(d) Suppose that, for some function u./ such that u.cz/ D u.z/ (for c > 0 and for z ¤ 0), t.z/ D u.z/t0 .z/ for z ¤ 0. Then, for c > 0 and z ¤ 0, t.cz/ D u.cz/t0 .cz/ D u.z/Œc 2 t0 .z/ D c 2 u.z/t0 .z/ D c 2 t.z/:
Thus, t.z/ is scale equivariant. Conversely, suppose that t.z/ is scale equivariant. Then, taking u./ to be the function defined (for z ¤ 0) by u.z/ D t.z/=t0 .z/, we find that (for c > 0 and z ¤ 0) u.cz/ D t.cz/=t0 .cz/ D c 2 t.z/=Œc 2 t0 .z/ D t.z/=t0 .z/ D u.z/: Moreover, u./ is such that (for z ¤ 0) t.z/ D u.z/t0 .z/.
(e) Suppose that there exists a function h./ such that u.z/ D hŒw.z/ (for every z ¤ 0). Then, for every c > 0 and every z ¤ 0, it is clear [upon observing that w.cz/ D w.z/] that u.cz/ D hŒw.cz/ D hŒw.z/ D u.z/:
58
Estimation and Prediction: Classical Approach
Conversely, suppose that u.cz/ D u.z/ for every c > 0 and every z ¤ 0. Then, for every nonnull value of z, u.z/ D uŒ.z0 z/ 1=2 z D uŒw.z/: (f) Let t.z/ represent an arbitrary scale-equivariant estimator of 2, or equivalently [in light of Parts (c), (d), and (e)] take t.z/ to be any estimator of 2 that (for z ¤ 0) is expressible in the form t.z/ D hŒw.z/ z0 z
(S.17)
for some function h./. The MSE of an estimator of the form (S.17) is E fhŒw.z/ z0 z 2 g2 :
Recall [from Part (b)] that z0 z and w.z/ are statistically independent, and consider the minimization of the MSE conditional on the value of w.z/. Upon employing essentially the same line of reasoning as in the derivation (in Section 5.7c) of the Hodges-Lehmann estimator, we find that the conditional MSE attains its minimum value when the value of hŒw.z/ is taken to be 1=.SC2/. And we conclude that the estimator z0 z=.S C2/ has minimum MSE (both conditionally and unconditionally) among all scale-equivariant estimators of 2. EXERCISE 22. Suppose that y is an N 1 observable random vector that follows a G-M model and that the distribution of the vector e of residual effects is MVN. Using the result of Part (f) of Exercise 21 (or otherwise), show that the Hodges-Lehmann estimator y 0 .I PX /y=ŒN rank.X/C2 has minimum MSE among all translation-invariant estimators of 2 that are scale equivariant—a translation-invariant estimator, say t.y/, of 2 is to be regarded as scale equivariant if t.cy/ D c 2 t.y/ for every (strictly) positive scalar c and for every nonnull value of y in N.X0 /. Solution. As discussed in Section 5.7d, the Hodges-Lehmann estimator is translation invariant. Further, upon observing that (for every scalar c and for every value of y) .cy/0 .I PX /.cy/ D c 2 y 0 .I PX /y, it is clear that the Hodges-Lehmann estimator is scale equivariant. Now, let z D R0 y, where R is any N .N rank X/ matrix (of constants) such that X0 R D 0, 0 R R D I, and RR0 D I PX —the existence of such a matrix was established in Part 2 of Section 5.9b. Then, in light of the results of the introductory part of Section 5.9b, z is a maximal invariant, and, as a consequence, any translation-invariant estimator of 2 is expressible as a function, say t.z/, of z. Moreover, if this estimator is scale equivariant, then, for every strictly positive scalar c and for every nonnull value of y in N.X0 /, tŒR0 .cy/ D c 2 t.R0 y/, implying [since the columns of R form a basis for N.X0 /] that, for every strictly positive scalar c and for every nonnull value of z, t.cz/ D t.cR0 Rz/ D tŒR0 .cRz/ D c 2 tŒR0 .Rz/ D c 2 t.R0 Rz/ D c 2 t.z/:
Also, the Hodges-Lehmann estimator is expressible as z0 z=ŒN rank.X/C2, and the distribution of z is N.0; 2 I/. Thus, upon regarding z as the data vector and applying the result of Part (f) of Exercise 21 (with S D N rank X), it follows that the Hodges-Lehmann estimator has minimum MSE among all translation-invariant estimators of 2 that are scale equivariant. EXERCISE 23. Let z D .z1 ; z2 ; : : : ; zM /0 represent an M -dimensional random (column) vector that has an absolutely continuous distribution with a pdf f ./. And suppose that for some (nonnegative) function g./ (of a single nonnegative variable), f .z/ / g.z0 z/ (in which R 1case the distribution of z is spherical). Show (for i D 1; 2; : : : ; M ) that E.zi2 / exists if and only if 0 s M C1 g.s 2 / ds < 1, R1 in which case 1 0 s M C1 g.s 2 / ds 2 R : var.zi / D E.zi / D M 01 s M 1 g.s 2 / ds R Solution. Clearly, RM g.z0 z/ d z < 1, and where c D
R
RM
f .z/ D c
1
g.z0 z/;
g.z0 z/ d z. Accordingly, it follows from the results of Section 5.9c that
59
Estimation and Prediction: Classical Approach R1 M 1 2 g.s / ds < 1 and that c is expressible in the form 0 s Z 1 2 M=2 cD s M 1 g.s 2 / ds: .M=2/ 0 Moreover, since [in light of result (9.48)] zj zi for j ¤ i , E.zi2 /
Z M M 1 X 2 1 1 1 X 2 0 E.zj / D E E.z z/ D zj D .z0 z/f .z/ d z D M M M M RM j D1 j D1 Z 1 D g .z0 z/ d z; M c RM
where g ./ is the (nonnegative) function (of a single nonnegative variable, say u) defined by R g .u/ D ug.u/. And in light of the results of Section 5.9c, RM g .z0 z/ d z < 1 if and only R1 R1 if 0 s M 1 g .s 2 / ds < 1 or, equivalently, if and only if 0 s M C1 g.s 2 / ds < 1, in which case Z 1 Z 1 2 M=2 2 M=2 s M 1 g .s 2 / ds D s M C1 g.s 2 / ds: M .M=2/ .M=2/ 0 0 R R 1 Thus, E.zi2 / exists if and only if 0 s M C1 g.s 2 / ds < 1, in which case R R1 1 0 s M C1 g.s 2 / ds 1 RM g .z0 z/ d z 2 R D E.zi / D : M c M 01 s M 1 g.s 2 / ds Z
g .z0 z/ d z D
It remains only to observe that if E.zi2 / exists [in which case E.zi / also exists], then [in light of result (9.46)] E.zi / D 0 and hence var.zi / D E.zi2 /. EXERCISE 24. Let z represent an N -dimensional random column vector, and let z represent an M -dimensional subvector of z (where M < N ). And suppose that the distributions of z and z are absolutely continuous with pdfs f ./ and f ./, respectively. Suppose also that there exist (nonnegative) functions g./ and g ./ (of a single nonnegative variable) such that (for every value of z) f .z/ D g.z0 z/ and (for every value of z ) f .z / D g .z0 z / (in which case the distributions of z and z are spherical). (a) Show that (for v 0)
.N M /=2 g .v/ D Œ.N M /=2
(b) Show that if N
Z
1
.u
v/Œ.N
M /=2 1
g.u/ du:
v
M D 2, then (for v > 0) g.v/ D
1 0 g .v/;
where g0 ./ is the derivative of g ./. Solution. (a) The result of Part (a) is an immediate consequence of result (9.67). (b) Suppose that N
M D 2. Then, since .1/ D 1, the result of Part (a) simplifies to Z 1 g.u/ du: g .v/ D v
And it follows that (for v > 0)
g0 .v/ D Œ g.v/ D and hence that
g.v/
60
Estimation and Prediction: Classical Approach g.v/ D
1 0 g .v/:
EXERCISE 25. Let y represent an N 1 random vector and w an M 1 random vector. Suppose that the second-order moments of the joint distribution of y and w exist, and adopt the following notation: y D E.y/, w D E.w/, Vy D var.y/, Vyw D cov.y; w/, and Vw D var.w/. Further, assume that Vy is nonsingular. 0 (a) Show that the matrix Vw Vyw Vy 1 Vyw EŒvar.w j y/ is nonnegative definite and that it equals 0 if and only if (for some nonrandom vector c and some nonrandom matrix A) E.w j y/ D cCA0 y (with probability 1). 0 (b) Show that the matrix varŒE.w j y/ Vyw Vy 1 Vyw is nonnegative definite and that it equals 0 if and only if (for some nonrandom vector c and some nonrandom matrix A) E.w j y/ D c C A0 y (with probability 1). 0 Solution. Let .y/ D C Vyw Vy 1 y, where D w
0 Vyw Vy 1 y .
(a) Making use of results (10.16) and (10.15), we find that Vw
0 Vyw Vy 1 Vyw
EŒvar.w j y/ D varŒ.y/
D varŒ.y/
w
EŒvar.w j y/
E.w j y/:
Clearly, varŒ.y/ E.w j y/ is nonnegative definite, and varŒ.y/ E.w j y/ D 0 if and only if E.w j y/ D .y/ (with probability 1). If E.w j y/ D .y/ (with probability 1), then it is obviously the case that (for some c and some A, specifically, for c D and A D Vy 1 Vyw ) E.w j y/ D c C A0 y (with probability 1). Conversely, if (for some c and some A) E.w j y/ D c C A0 y (with probability 1), then because .y/ is the (unique) best linear approximation to E.w j y/, .y/ D c C A0 y and hence E.w j y/ D .y/ (with probability 1). (b) Clearly, Vw D EŒvar.w j y/ C varŒE.w j y/ or, equivalently, Thus,
varŒE.w j y/
varŒE.w j y/ D Vw
EŒvar.w j y/:
0 Vyw Vy 1 Vyw D Vw
0 Vyw Vy 1 Vyw
EŒvar.w j y/;
so that Part (b) follows from the result of Part (a). EXERCISE 26. Let y represent an N 1 observable random vector and w an M 1 unobservable random vector. Suppose that the second-order moments of the joint distribution of y and w exist, and adopt the following notation: y D E.y/, w D E.w/, Vy D var.y/, Vyw D cov.y; w/, and Vw D var.w/. Assume that y , w , Vy , Vyw , and Vw are known. Further, define .y/ D 0 w C Vyw Vy .y y /, and take t.y/ to be an (M 1)-dimensional vector-valued function of the form t.y/ D c C A0 y, where c is a vector of constants and A is an N M matrix of constants. Extend various of the results of Section 5.10a (to the case where Vy may be singular) by using Theorem 3.5.11 to show (1) that .y/ is the best linear predictor of w in the sense that the difference between the matrix EfŒt.y/ wŒt.y/ w0 g and the matrix varŒ.y/ w [which is the MSE matrix of .y/] equals the matrix EfŒt.y/ .y/Œt.y/ .y/0 g, which is nonnegative definite and which equals 0 if and only if t.y/ D .y/ for every value of y such that y y 2 C.Vy /, (2) that 0 PrŒy y 2 C.Vy / D 1, and (3) that varŒ.y/ w D Vw Vyw Vy Vyw . Solution. (1) Clearly, .y/ is unbiased. It is also linear, as is evident upon observing that it is 0 0 Vy y . Vy y, where D w Vyw reexpressible as .y/ D C Vyw Now, decompose the difference between the vector-valued function t.y/ and the vector w into two components as follows:
61
Estimation and Prediction: Classical Approach w D Œt.y/
t.y/
.y/ C Œ.y/
w:
And upon observing [in light of Part (2) of Theorem 3.5.11] that covŒw .y/; y D 0 or, equivalently, that covŒy; .y/ w D 0 and hence that EfŒt.y/
w0 g D covŒt.y/
.y/Œ.y/
D .A0
we find that wŒt.y/
EfŒt.y/
D E fŒt.y/
w0 g
D EfŒt.y/
.y/ C Œ.y/
.y/; .y/
wgfŒt.y/
w D 0;
.y/0 C Œ.y/
.y/0 g C varŒ.y/
.y/Œt.y/
w
0 Vyw Vy / covŒy; .y/
w0 g
w:
(S.18)
It follows from result (S.18) that .y/ is a best linear predictor in the sense that the difference between the matrix EfŒt.y/ wŒt.y/ w0 g and the matrix varŒ.y/ w [which is the MSE matrix of .y/] equals the matrix EfŒt.y/ .y/Œt.y/ .y/0 g, which is nonnegative definite. It remains to show that EfŒt.y/ .y/Œt.y/ .y/0 g D 0 if and only if t.y/ D .y/ for every value of y such that y y 2 C.Vy /. Clearly, 0 t.y/ .y/ D c C A0 y C .A0 Vyw Vy /.y y /; EŒt.y/
varŒt.y/
.y/ D c C A0 y 0
.y/ D .A
0 Vyw Vy
; and
/Vy .A0
0 Vyw Vy /0:
And, making use of Corollary 2.13.27, we find that EfŒt.y/
.y/Œt.y/ .y/0 g D 0 , EŒt.y/ .y/fEŒt.y/
.y/g0 C varŒt.y/
.y/ D 0
, EŒt.y/ .y/ D 0 and varŒt.y/ .y/ D 0 0 Vy /Vy D 0: , c C A0 y D 0 and .A0 Vyw
0 0 Moreover, .A0 Vyw Vy /Vy D 0 if and only if .A0 Vyw Vy /.y y / D 0 for every value of y such that y y 2 C.Vy /. Thus, EfŒt.y/ .y/Œt.y/ .y/0 g D 0 if and only if t.y/ D .y/ for every value of y such that y y 2 C.Vy /.
(2) That PrŒy y 2 C.Vy / D 1 follows from Part (0) of Theorem 3.5.11.
(3) That varŒ.y/
w D Vw
0 Vyw Vy Vyw follows from Part (2) of Theorem 3.5.11.
EXERCISE 27. Suppose that y is an N 1 observable random vector that follows a G-M model, and take w to be an M 1 unobservable random vector whose value is to be predicted. Suppose further that E.w/ is of the form E.w/ D ƒ0ˇ (where ƒ is a matrix of known constants) and that cov.y; w/ 0 is of the form cov.y; w/ D 2 Hyw (where Hyw is a known matrix). Let D .ƒ0 Hyw X/ˇ, 0 Q Q Q denote by w.y/ an arbitrary predictor (of w), and define .y/ Q D w.y/ Hyw y. Verify that w.y/ is a translation-equivariant predictor (of w) if and only if .y/ Q is a translation-equivariant estimator of . Solution. For an arbitrary P 1 vector k (and for an arbitrary value of y), Q C Xk/ D w.y/ Q Q C Xk/ w.y C ƒ0 k , w.y
0 .y C Xk/ D .y/ Q C .ƒ0 Hyw
, .y Q C Xk/ D .y/ Q C .ƒ0
0 X/k Hyw
0 Hyw X/k:
Q Thus, w.y/ is a translation-equivariant predictor (of w) if and only if .y/ Q is a translation-equivariant estimator of .
6 Some Relevant Distributions and Their Properties
EXERCISE 1. Let x represent a random variable whose distribution is Ga.˛; ˇ/, and let c represent a (strictly) positive constant. Show that cx Ga.˛; cˇ/ [thereby verifying result (1.2)]. Solution. Let y D cx. And denote by f ./ the pdf of the Ga.˛; ˇ/ distribution and by h./ the pdf of the distribution of the random variable y. Upon observing that x D y=c and that dx=dy D 1=c and upon making use of standard results on a change of variable (e.g., Casella and Berger 2002, sec. 2.1), we find that, for 0 < y < 1, h.y/ D f .y=c/ .1=c/ D
1 .y=c/˛ .˛/ˇ ˛
1
y=.cˇ /
e
.1=c/ D
1 y˛ .˛/.cˇ/˛
1
e
y=.cˇ /
—for 1 < y 0, h.y/ D 0. Thus, the pdf h./ is identical to the pdf of the Ga.˛; cˇ/ distribution, so that y Ga.˛; cˇ/. EXERCISE 2. Let w represent a random variable whose distribution is Ga.˛; ˇ/, where ˛ is a (strictly positive) integer. Show that (for any strictly positive scalar t) Pr.w t/ D Pr.u ˛/; where u is a random variable whose distribution is Poisson with parameter t=ˇ [so that Pr.u D s/ D e t =ˇ .t=ˇ/ s=sŠ for s D 0; 1; 2; : : :]. Solution. The equality Pr.w t/ D Pr.u ˛/ can be verified by mathematical induction. For ˛ D 1, Pr.w t/ D D
Z
Z
D D1
D1
t 0
1 e ˇ
x=ˇ
(S.1)
dx
t =ˇ
e
y
dy
0
e
ˇyDt =ˇ ˇ
yˇ
yD0
e
t =ˇ
Pr.u D 0/
D Pr.u 1/ D Pr.u ˛/: Thus, equality (S.1) holds for ˛ D 1. Moreover, upon observing that d. ˇe we find that (for ˛ 2)
x=ˇ
/=dx D e
x=ˇ
and employing integration by parts,
64
Some Relevant Distributions and Their Properties t 1 x ˛ 1 e x=ˇ dx .˛ 1/Šˇ ˛ 0 Z t 1 ˛ 1 t =ˇ ˛ 1 0=ˇ D .˛ 1/x ˛ t . ˇe / 0 . ˇe / .˛ 1/Šˇ ˛ 0 Z t 1 1 ˛ 1 t =ˇ D x ˛ 2 e x=ˇ dx t e C .˛ 1/Šˇ ˛ 1 .˛ 2/Šˇ ˛ 1 0 D Pr.u D ˛ 1/ C Pr.w t/;
Z
Pr.w t/ D
2
. ˇe
x=ˇ
/ dx
(S.2)
where w is a random variable whose distribution is Ga.˛ 1; ˇ/. And upon observing that expression (S.2) equals Pr.u ˛/ C Pr.w t/ Pr.u ˛ 1/; it is clear that if equality (S.1) holds for ˛ D ˛ 0 1 (where ˛ 0 2), then it also holds for ˛ D ˛ 0. By mathematical induction, it follows that equality (S.1) holds for every value of ˛. EXERCISE 3. Let u and w represent random variables that are distributed independently as Be.˛; ı/ and Be.˛Cı; /, respectively. Show that uw Be.˛; Cı/.
Solution. Define x D uw and y D u. Let us find the pdf of the joint distribution of x and y. Denote by f . ; / the pdf of the joint distribution of u and w, and by h. ; / the pdf of the joint distribution of x and y. Further, denote by A the set f.u; w/ W 0 < u < 1; 0 < w < 1g, and by B the set f.x; y/ W 0 < x < y < 1g. For .u; w/ 2 A, f .u; w/ D
.˛ C ı/ ˛ u .˛/.ı/
1
.1
u/ı
1
.˛ C ı C / ˛Cı w .˛ C ı/./
1
.1
w/
1
I
for .u; w/ … A, f .u; w/ D 0. The mapping .u; w/ ! .uw; u/ defines a one-to-one transformation from the set A onto the set B. The inverse transformation is defined by the two equalities u D y and w D x=y. Clearly, @u=@x @u=@y 0 1 D 1=y: det D det @w=@x @w=@y 1=y x=y 2 Thus, for .x; y/ 2 B, h.x; y/ D
.˛ C ı C / ˛ y .˛/.ı/./
1
.1
y/ı
1
x ˛Cı 1 1 y
x y
1
1 I y
for .x; y/ … B, h.x; y/ D 0. It is now clear that the distribution of x is the distribution with pdf g./, where (for 0 < x < 1) Z 1 Z ı 1 .˛ C ı C / ˛ 1 1 x x 1 x h.x; y/ dy D g.x/ D x x 1 dy .˛/.ı/./ y y y2 x x —for x 0 and x 1, g.x/ D 0. And upon introducing the change of variable z D .1 x/ 1 Œ1 .x=y/ and upon observing that 1 z D .1 x/ 1 Œ.x=y/ x and that dz=dy D .1 x/ 1 .x=y 2 /, we find that (for 0 < x < 1) Z 1 .˛ C ı C / ˛ 1 g.x/ D z 1 .1 z/ı 1 dz x .1 x/Cı 1 .˛/.ı/./ 0 .˛ C ı C / ˛ 1 Cı 1 ./.ı/ D x .1 x/ .˛/.ı/./ . C ı/ 1 x ˛ 1 .1 x/Cı 1: D B.˛; Cı/
65
Some Relevant Distributions and Their Properties Thus, x Be.˛; Cı/. EXERCISE 4. Let x represent a random variable whose distribution is Be.˛; /. (a) Show that, for r > ˛, E.x r / D
(b) Show that E.x/ D
˛ ˛C
and
.˛ C r/.˛ C / : .˛/.˛ C C r/ var.x/ D
.˛ C
˛ : C C 1/
/2 .˛
Solution. (a) For r > ˛, Z 1 r sr E.x / D
1 s ˛ 1 .1 s/ 1 ds B.˛; / 0 Z 1 1 s ˛Cr 1 .1 s/ 1 ds D B.˛; / 0
D
.˛ C r/./.˛ C / .˛ C r/.˛ C / B.˛ C r; / D D : B.˛; / .˛ C C r/.˛/./ .˛/.˛ C C r/
(S.3)
(b) Upon setting r D 1 in expression (S.3), we find [in light of result (3.5.6)] that E.x/ D
.˛ C 1/ .˛ C / 1 ˛ D˛ D : .˛/ .˛ C C 1/ ˛C ˛C
And upon setting r D 2 in expression (S.3), we find [in light of result (1.23)] that E.x 2 / D
1 ˛.˛ C 1/ .˛ C 2/ .˛ C / D .˛ C 1/ ˛ D ; .˛/ .˛ C C 2/ .˛ C C 1/.˛ C / .˛ C /.˛ C C 1/
so that var.x/ D E.x 2 /
ŒE.x/2 D D D
˛.˛ C 1/ .˛ C /.˛ C C 1/
˛2 .˛ C /2
˛.˛ C 1/.˛ C / ˛ 2 .˛ C C 1/ .˛ C /2 .˛ C C 1/
˛.˛ C / ˛ 2 ˛ D : 2 2 .˛ C / .˛ C C 1/ .˛ C / .˛ C C 1/
EXERCISE 5. Take x1 ; x2 ; : : : ; xK to be K random variables whose joint distribution is PK Di.˛1 ; ˛2 ; : : : ; ˛K ; ˛KC1 I K/, define xKC1 D 1 kD1 xk , and let ˛ D ˛1 C C ˛K C ˛KC1 .
(a) Generalize the results of Part (a) of Exercise 4 by showing that, for r1 > ˛1 , : : : ; rK > and rKC1 > ˛KC1 , KC1 Y .˛k C rk / .˛/ rK rKC1 E x1r1 xK xKC1 D : PKC1 .˛k / ˛ C kD1 rk
˛K ,
kD1
(b) Generalize the results of Part (b) of Exercise 4 by showing that (for an arbitrary integer k between 1 and K C1, inclusive) ˛k .˛ ˛k / ˛k and var.xk / D 2 E.xk / D ˛ ˛ .˛ C 1/ and that (for any 2 distinct integers j and k between 1 and K C1, inclusive) ˛j ˛k : cov.xj ; xk / D 2 ˛ .˛ C 1/
66
Some Relevant Distributions and Their Properties
Solution. (a) For r1 >
˛1 , : : : ; rK >
rK rKC1 E x1r1 xK xKC1 Z Y K r skk 1 D S
kD1
˛K , and rKC1 > ˛KC1 ,
K X
sk
kD1
rKC1
K Y .˛/ ˛ sk k QKC1 .˛ / k kD1 kD1
1
1
K X
sk
kD1
˛KC1
1
d s;
P where s D .s1 ; s2 ; : : : ; sK /0 and S D fs W sk > 0 .k D 1; 2; : : : ; K/; K kD1 sk < 1g. Thus, QKC1 Z .˛/ rK rKC1 kD1 .˛k C rk / E x1r1 xK xKC1 D QKC1 h.s/ d s; PKC1 kD1 .˛k / ˛ C kD1 rk RK
where h./ is a function defined as follows: for s 2 S , PKC1 K Y ˛ Cr kD1 .˛k C rk / sk k k h.s/ D QKC1 kD1 .˛k C rk / kD1
1
1
K X
kD1
sk
˛KC1 CrKC1
1
and, for s … S , h.s/ D 0. And upon observingRthat h./ is the pdf of the Di.˛1 C r1 , : : : ; ˛K C rK , ˛KC1 C rKC1 I K/ distribution and hence that RK h.s/ d s D 1, we conclude that rK rKC1 E x1r1 xK xKC1 D
KC1 Y .˛k C rk / .˛/ : PKC1 ˛ C kD1 rk kD1 .˛k /
(S.4)
(b) Upon applying formula (S.4) and recalling result (3.5.6), we find that E.xk / D E x10 xk0
1 0 1 xk xkC1
0 D xKC1
1 ˛k .˛/ .˛k C 1/ D ˛k D : .˛ C 1/ .˛k / ˛ ˛
And upon applying formula (S.4) and recalling result (1.23), we find that 0 0 E xk2 D E x10 xk0 1 xk2 xkC1 xKC1 1 ˛k ˛k C 1 .˛/ .˛k C 2/ D .˛k C 1/ ˛k D ; D .˛ C 2/ .˛k / .˛ C 1/ ˛ ˛ ˛C1 so that 2 E xk var xk D E xk2 ˛k ˛k C 1 ˛k 2 D ˛ ˛C1 ˛ ˛k ˛k C 1 ˛k ˛k ˛.˛k C 1/ ˛k .˛ C 1/ ˛k .˛ ˛k / D D D 2 : ˛ ˛C1 ˛ ˛ ˛.˛ C 1/ ˛ .˛ C 1/ Similarly, 1 ˛j ˛k .˛/ .˛j C 1/ .˛k C 1/ D ˛j ˛k D ; E.xj xk / DD .˛ C 2/ .˛j / .˛k / .˛ C 1/ ˛ ˛.˛ C 1/ so that cov.xj ; xk / D E.xj xk / E.xj / E.xk / ˛j ˛k ˛ .˛ C 1/ ˛j ˛k ˛j ˛k D ˛j ˛k 2 D 2 : D ˛.˛ C 1/ ˛ ˛ ˛ .˛ C 1/ ˛ .˛ C 1/ EXERCISE 6. Verify that the function b./ defined by expression (1.48) is a pdf of the chi distribution with N degrees of freedom.
Some Relevant Distributions and Their Properties 67 p Solution. Let x D v, where v is a random variable whose distribution is 2 .N /. And recall that a pdf, say f ./, of the 2 .N / distribution is obtainable from expression (1.16), and observe that v D x 2 and that dv=dx D 2x. Then, using standard results on a change of variable, we find that a pdf, say b./, of the distribution of x is obtained by taking (for 0 < x < 1) 1 .N=2/ 2N=2
b.x/ D f .x 2 / 2x D
1
xN
1
e
x 2 =2
and, ultimately, by taking
b.x/ D
8 ˆ
0 .j D 1; : : : ; J; J C 1; : : : ; J C K/; jJD1 sj < 1
and S to be the set ˚ S D .u1 ; : : : ; uJ ; x1 ; : : : ; xK / W uj > 0 .j D 1; : : : ; J /; xk > 0 .k D 1; : : : ; K/; PJ PK j D1 uj < 1; kD1 xk < 1 : Clearly, the J C K equalities
uj D sj .j D 1; 2; : : : ; J /
and
xk D
1
sJ Ck PJ
j D1 sj
.k D 1; 2; : : : ; K/
define a one-to-one transformation from the set S onto the set S . The inverse transformation is the transformation defined by the J CK equalities PJ .k D 1; 2; : : : ; K/: sj D uj .j D 1; 2; : : : ; J / and sJ Ck D xk 1 j D1 uj
Further, the .J C K/ .J C K/ matrix whose j th row is (for j D 1; : : : ; J , J C 1, : : : ; J C K) .@sj =@u1 ; : : : ; @sj =@uJ ; @sj =@x1 ; : : : ; @sj =@xK / equals # " I 0 (S.5) ; PJ A 1 j D1 uj I
where A is the K J matrix whose kth row is xk 10. The determinant of the matrix (S.5) equals K PJ 1 , as is evident upon applying Theorem 2.14.14. j D1 uj Now, let f . ; : : : ; ; ; : : : ; / represent the pdf of the joint distribution of u1 , : : : ; uJ , x1 , : : : ; xK , and let h. ; : : : ; ; ; : : : ; / represent the pdf of the joint distribution of s1 ; : : : ; sJ ; sJ C1 ; : : : ; sJ CK . Then, making use of standard results on a change of variables and also making use of expression
68
Some Relevant Distributions and Their Properties
(1.38) for the pdf of a Dirichlet distribution, we find that, for .u1 ; : : : ; uJ ; x1 ; : : : ; xK / 2 S , f .u1 ; : : : ; uJ ; x1 ; : : : ; xK / .˛1 C C ˛J CK C ˛J CKC1 / D .˛1 / .˛J CK /.˛J CKC1 / J K Y Y ˛j 1 ˛ 1 uj 1 xk J Ck j D1
h
1
kD1 J X
uj
j D1
K X
kD1
J X
uj
j D1
xk 1
P K
kD1
J X
uj
j D1
.˛J Ck
i˛J CKC1
.˛1 C C ˛J CK C ˛J CKC1 / D .˛1 / .˛J CK /.˛J CKC1 / P J J KC1 ˛J Ck X Y kD1 ˛j 1 uj 0 1 uj j D1
j 0 D1
K Y
˛
xk J Ck
1
kD1
1
1/
1
1
J X
uj
j D1
K
1
K X
k 0 D1
xk 0
˛J CKC1
1
:
PKC1 Moreover, marginally, u1 ; : : : ; uJ have a Di ˛1 ; : : : ; ˛J ; kD1 ˛J Ck I J distribution, the pdf of P which, say g. ; : : : ; /, is [for u1 ; : : : ; uJ such that uj > 0 (j D 1; : : : ; J ) and jJD1 uj < 1] expressible as P J J KC1 ˛J Ck X 1 .˛1 C C ˛J CK C ˛J CKC1 / Y ˛j 1 kD1 : uj 0 1 uj g.u1 ; : : : ; uJ / D PKC1 .˛1 / .˛J / kD1 ˛J Ck j D1 j 0 D1
Thus, for .u1 ; : : : ; uJ ; x1 ; : : : ; xK / 2 S ,
f .u1 ; : : : ; uJ ; x1 ; : : : ; xK / .˛J C1 C C ˛J CK C ˛J CKC1 / D g.u1 ; : : : ; uJ / .˛J C1 / .˛J CK /.˛J CKC1 /
K Y
˛
xk J Ck
kD1
1
1
K X
k 0 D1
xk 0
˛J CKC1
1
:
And it follows that the conditional distribution of x1 ; x2 ; : : : ; xK given s1 ; s2 ; : : : ; sJ has the same pdf as the Di.˛J C1 ; : : : ; ˛J CK ; ˛J CKC1 I K/ distribution (and hence that these two distributions are the same). EXERCISE 8. Let z1 ; z2 ; : : : ; zM represent random variables whose joint distribution is absolutely P 2 continuous with a pdf f . ; ; : : : ; / of the form f .z1 ; z2 ; : : : ; zM / D g M z i D1 i [where g./ is a nonnegative function of a single nonnegative variable]. Verify that the function b./ defined by PM 2 1=2 . (Note. This expression (1.53) is a pdf of the distribution of the random variable i D1 zi exercise can be regarded as a more general version of Exercise 6.) P p 2 Solution. Let s D v, where v D M i D1 zi . And observe that a pdf, say r./, of the distribution of v is obtainable from expression (1.51), and observe also that v D s 2 and that dv=ds D 2s. Then, using standard results on a change of variable, we find that a pdf, say b./, of the distribution of s is obtained by taking (for 0 < s < 1) b.s/ D r.s 2 / 2s D
2 M=2 M s .M=2/
1
g.s 2 /
69
Some Relevant Distributions and Their Properties and, ultimately, by taking 8 M=2 ˆ < 2 sM b.s/ D .M=2/ ˆ : 0;
1
g.s 2 /;
for 0 < s < 1, elsewhere.
EXERCISE 9. Use the procedure described in Section 6.2a to construct a 6 6 orthogonal matrix whose first row is proportional to the vector .0; 3; 4; 2; 0; 1/. Solution. Formulas (2.2) and (2.3) can be used to construct a 4 4 orthogonal matrix Q whose first row is proportional to the vector . 3; 4; 2; 1/. Upon observing that . 3/2 D 9, . 3/2 C 42 D 25, . 3/2 C 42 C 22 D 29, and . 3/2 C 42 C 22 C . 1/2 D 30, the four rows of Q are found to be as follows: 3 4 2 1 1=2 30 . 3; 4; 2; 1/ D p ; p ; p ; p I 30 30 30 30 4 3 16 1=2 ; ; 0; 0 I . 3; 9=4; 0; 0/ D 9 .25/ 5 5 1=2 6 8 5 4 . 3; 4; 25=2; 0/ D p ; p ; p ; 0 I and 25 .29/ 5 29 5 29 29 p 1=2 3 4 2 1 29 Œ 3; 4; 2; 29=. 1/ D p ;p ;p ;p : 29 .30/ 870 870 870 30 Thus, the matrix 3 B0 p30 B B 4 B B0 B 5 B B 6 B0 p B B 5 29 B B B0 p 3 B 870 B B B1 0 @ 0 0 0
is a 6 6 orthogonal .0; 3; 4; 2; 0; 1/.
matrix
4 p 30 3 5 8 p 5 29 4 p 870 0 0 whose
2 p 30
1 1 0 p C 30 C C C 0 0 0 C C C C 5 p 0 0 C C 29 p C C 2 29 C p 0 p C 870 30 C C C 0 0 0 C A 0 1 0
first
row
is
proportional
to
the
vector
EXERCISE 10. Let x1 and x2 represent M -dimensional column vectors. (a) Use the results of Section 6.2a (pertaining to Helmert matrices) to show that if x02 x2 D x01 x1 , then there exist orthogonal matrices O1 and O2 such that O2 x2 D O1 x1 . (b) Use the result of Part (a) to devise an alternative proof of the “only if” part of Lemma 5.9.9.
Solution. (a) Suppose that x02 x2 D x01 x1 . And assume that both x1 and x2 are nonnull—if either x1 or x2 is null, then they are both null, in which case O2 x2 D O1 x1 for any M M matrices O1 and O2 (orthogonal or not). Then, it follows from the results of Section 6.2a that there exist an M M orthogonal matrix O1 whose first row is proportional to x01 and an M M orthogonal matrix O2 whose first row is proportional to x02 . Moreover, O2 x2 D Œ.x02 x2 /1=2; 0; 0; : : : ; 00 D Œ.x01 x1 /1=2; 0; 0; : : : ; 00 D O1 x1 :
70
Some Relevant Distributions and Their Properties
(b) Suppose [as in Part (a)] that x02 x2 D x01 x1 . And take O1 and O2 to be M M orthogonal matrices such that O2 x2 D O1 x1 —the existence of such matrices follows from Part (a). Then, to prove the “only if” part of Lemma 5.9.9, it suffices (upon recalling that the transpose of an orthogonal matrix equals its inverse and that the transposes and products of orthogonal matrices are orthogonal) to observe that x2 D O20 O1 x1 and that O20 O1 is orthogonal. EXERCISE 11. Let w represent a random variable whose distribution is 2 .N; /. Verify that the expressions for E.w/ and E.w 2 / provided by formula (2.36) are in agreement with those provided by results (2.33) and (2.34) [or, equivalently, by results (2.28) and (2.30)]. Solution. Upon applying formula (2.36) in the special case where r D 1, we find that ! 1 0 D N C ; E.w/ D 1 C N 0 in agreement with result (2.33) [and result (2.28)]. And upon applying formula (2.36) in the special case where r D 2, we find that ! ! 2 2 1 0 C .N C2/ E.w 2 / D 2 C N.N C2/ 1 0 D 2 C N.N C2/ C .N C2/2 D .N C2/.N C2/ C 2;
in agreement with result (2.34) [and result (2.30)]. EXERCISE 12. Let w1 and w2 represent random variables that are distributed independently as Ga.˛1 ; ˇ; ı1 / and Ga.˛2 ; ˇ; ı2 /, respectively, and define w D w1 C w2 . Derive the pdf of the distribution of w by starting with the pdf of the joint distribution of w1 and w2 and introducing a suitable change of variables. [Note. This derivation serves the purpose of verifying that w Ga.˛1 C˛2 ; ˇ; ı1 Cı2 / and (when coupled with a mathematical-induction argument) represents an alternative way of establishing Theorem 6.2.2 (and Theorem 6.2.1).] Solution. Define s D w1 =.w1 Cw2 /, and denote by h./ the pdf of the distribution of w1 and by b./ the pdf of the distribution of w2 . Then, 1 X ı1j e ı1 h.w1 / D hj .w1 /; jŠ j D0
where (for j D 0; 1; 2; : : :) hj ./ is the pdf of a Ga.˛1 Cj; ˇ/ distribution, and, similarly, b.w2 / D
1 X ı2k e ı2 bk .w2 /; kŠ
kD0
where (for k D 0; 1; 2; : : :) bk ./ is the pdf of a Ga.˛2 C k; ˇ/ distribution. And, proceeding in essentially the same way as in Section 6.2c [in the derivation of result (2.10)], we find that a pdf, say
71
Some Relevant Distributions and Their Properties f . ; /, of the joint distribution of w and s is obtained by taking, for w > 0 and 0 < s < 1, f .w; s/ D h.sw/ bŒ.1 s/w j wj Dw D D
1 1 X X ı2ke ı2 ı1j e ı1 hj .sw/ bk Œ.1 s/w jŠ kŠ
j D0
1 X
rD0 1 X
kD0
r X ı1j e ı1 ı r j e ı2 w hj .sw/ 2 br jŠ .r j /Š
j Œ.1
s/w
j D0
e
.ı1 Cı2 /
rD0
w ˛1 C˛2 Cr 1 e w=ˇ 1 j r ı ı j Š .r j /Š .˛1 Cj /.˛2 Cr j / 1 2
ˇ
r X
j D0
.˛1 C˛2 Cr/
j ˛1 Cj 1
s
.1 s/˛2 Cr
j 1
—for w 0 and for s such that s 0 or s 1, f .w; s/ D 0. Now, let q./ represent the pdf of the (marginal) distribution of w. Then, making use of result (1.10), we find that, for w > 0, Z 1 f .w; s/ ds q.w/ D 0
D D D
1 X
e
.ı1 Cı2 /
ˇ
.˛1 C˛2 Cr/
w
˛1 C˛2 Cr 1
e
w=ˇ
rD0 1 X
rD0 1 X
rD0
r X
j D0
e
.ı1 Cı2 /
ˇ
.ı1 Cı2 /r e rŠ
.˛1 C˛2 Cr/
.ı1 Cı2 /
w ˛1 C˛2 Cr
1
e
w=ˇ
1 ıj ır j Š .r j /Š .˛1 C˛2 Cr/ 1 2
j
.ı1 C ı2 /r rŠ .˛1 C˛2 Cr/
1 w ˛1 C˛2 Cr .˛1 C˛2 Cr/ ˇ ˛1 C˛2 Cr
1
e
w=ˇ
—for w 0, q.w/ D 0. Thus, q./ is the pdf of the Ga.˛1 C˛2 ; ˇ; ı1 Cı2 / distribution [implying that w Ga.˛1 C˛2 ; ˇ; ı1 Cı2 /]. EXERCISE 13. Let x D C z, where z is an N -dimensional random column vector that has an absolutely continuous spherical distribution and where is an N -dimensional nonrandom column vector. Verify that in the special case where z N.0; I/, the pdf q./ derived in Section 6.2h for the distribution of x0 x “simplifies to” (i.e., is reexpressible in the form of) the expression (2.15) given in Section 6.2c for the pdf of the noncentral chi-square distribution [with N degrees of freedom and with noncentrality parameter (D 0 )]. Solution. Adopt the notation employed in Section 6.2h. And suppose that z N.0; I/, in which case g.c/ D .2/ N=2 exp. c=2/ (for every nonnegative scalar c): Then, the pdf q./ is such that, for 0 < y < 1, q.y/ D
.N 1/=2 y .N=2/ Œ.N 1/=2
Further, upon expanding e y
1=2 s
1
.2/
N=2
e
y=2
e
=2
Z
1
.1 1
in the power series e y
1=2 s
D
1 X
j D0
.y 1=2 s/j =j Š ;
s 2 /.N
3/=2 y 1=2 s
e
ds:
(S.6)
72
Some Relevant Distributions and Their Properties
we find that Z Since
R1
Z
1
1
.1
2 .N 3/=2 y 1=2 s
s /
1
s j .1
s 2 /.N
1
.1
e
s 2 /.N
3/=2
ds D
1 X
.y
j D0
1=2 j
/ =j Š
Z
1
s j .1
s 2 /.N
3/=2
ds:
s 2 /.N
3/=2
ds
1
ds D 0 for j D 1; 3; 5; : : : , it follows that
3/=2 y 1=2 s
e
1
ds D D D
1 X
rD0 1 X
rD0 1 X
rD0
Z
.y/r=.2r/Š
t rC.1=2/
Z
rD0
Upon substituting expression (S.7) for that, for 0 < y < 1,
R1
q.y/ D y .N=2/
e
N=2
e
y=2
1
.1 s 2 /.N
=2
1 X
rD0
1 X .=2/r e D rŠ rD0
=2
1
1
1
.1
t/Œ.N
1/=2 1
dt
Œr C .1=2/Œ.N 1/=2 .y/r=.2r/Š Œ.N=2/ C r
1
2
s 2r .1
0
and hence [in light of result (3.5.11)] that Z 1 1 X 1=2 .1 s 2 /.N 3/=2 e y s ds D
1
1
.y 1=2 /2r=.2r/Š
1=2 .y/r : 4r rŠ Œ.N=2/ C r
3/=2 y 1=2 s
e
(S.7)
ds in expression (S.6), we find
.=2/r y r 2r rŠ Œ.N=2/ C r
1 y Œ.2rCN /=2 Œ.2r CN /=2 2.2rCN /=2
1
e
y=2
:
Moreover, q.y/ D 0 for 1 < y 0. Thus, q./ is reexpressible in the form of expression (2.15). EXERCISE 14. Let u and v represent random variables that are distributed independently as 2 .M / and 2 .N /, respectively. And define w D .u=M /=.v=N /. Devise an alternative derivation of the pdf of the SF .M; N / distribution by (1) deriving the pdf of the joint distribution of w and v and by (2) determining the pdf of the marginal distribution of w from the pdf of the joint distribution of w and v. Solution. (1) Let r D v. And observe that the two equalities w D .u=M /=.v=N / and r D v define a one-to-one transformation from the rectangular region fu; v W u > 0; v > 0g onto the rectangular region fw; r W w > 0; r > 0g: Observe also that the inverse transformation is defined by the two equalities u D .M=N /rw
We find that
@u=@w @u=@r det @v=@w @v=@r
and
v D r:
.M=N /r .M=N /w D det D .M=N /r: 0 1
Thus, letting g./ represent a pdf of the 2 .M / distribution and h./ a pdf of the 2 .N / distribution, we find that a pdf, say f . ; /, of the joint distribution of w and r (or, equivalently, of the joint distribution of w and v) is obtained by taking, for w > 0 and r > 0, f .w; r/ D gŒ.M=N /rw h.r/.M=N /r
73
Some Relevant Distributions and Their Properties
—for w and r such that w 0 or r 0, f .w; r/ D 0. And upon taking g./ and h./ to be the pdfs obtained by applying formula (1.16), we find that, for w > 0 and r > 0, 1 f .w; r/ D .M=N /M=2 r Œ.M CN /=2 1 w .M=2/ 1 e Œ1C.M=N /wr=2: .M=2/.N=2/2.M CN /=2 (2) A pdf, say f ./, of the marginal distribution of w is obtained from the joint distribution of Z 1 w and v by taking, for w > 0, f .w; r/ dr (S.8) f .w/ D 0
—for 1 < w 0, f .w/ D 0. And upon introducing the change of variable s D Œ1C.M=N /wr, we find that (for w > 0) Z 1 ˚ f w; Œ1 C .M=N /w 1s Œ1 C .M=N /w 1 ds f .w/ D Z0 1 1 .M=N /M=2 D .M=2/.N=2/2.M CN /=2 0 w .M=2/
D
1
Œ1 C .M=N /w
Œ.M CN /=2 .M=N /M=2 w .M=2/ .M=2/.N=2/ Z 1
0
1
.M CN /=2 Œ.M CN /=2 1
s
Œ1 C .M=N /w
e
s=2
ds
.M CN /=2
1 s Œ.M CN /=2 Œ.M CN /=22.M CN /=2
1
e
s=2
ds:
(S.9)
Moreover, the integrand of the integral in expression (S.9) is a pdf—it is the pdf of a 2 .M CN / distribution—and hence the integral equals 1. Thus, for w > 0, f .w/ D
Œ.M CN /=2 .M=N /M=2 w .M=2/ .M=2/.N=2/
1
Œ1 C .M=N /w
.M CN /=2
;
in agreement with result (3.12). ıp EXERCISE 15. Let t D z v=N , where z and v are random variables that are statistically independent with z N.0; 1/ and v 2 .N / [in which case t S t.N /]. (a) Starting with the pdf of the joint distribution of z and v, derive the pdf of the joint distribution of t and v. (b) Derive the pdf of the S t.N / distribution from the pdf of the joint distribution of t and v, thereby providing an alternative to the derivation given in Part 2 of Section 6.4a. ıp Solution. (a) Let w D v. And observe that the equalities t D z v=N and w D v define a one-toone transformation from the region defined by the inequalities 1 < z < 1 and 0 < v < 1 onto the region defined by the inequalities 1 < t < 1 and 0 < w < 1. Observe also that the inverse of this transformation is the transformation defined by the equalities z D .w=N /1=2 t Further,
and
v D w:
ˇ ˇ ˇ ˇ@z=@t @z=@w ˇ ˇ.w=N /1=2 .1=2/.N w/ ˇ ˇ ˇ ˇDˇ ˇ ˇ@v=@t @v=@w ˇ ˇ 0 1
1=2
ˇ t ˇˇ ˇ D .w=N /1=2: ˇ
Thus, denoting by b./ the pdf of the N.0; 1/ distribution and by d./ the pdf of the 2 .N / distribution and making use of standard results on a change of variables, the pdf of the joint distribution of t
74
Some Relevant Distributions and Their Properties
and w (and hence the pdf of the joint distribution of t and v) is the function f . ; / (of 2 variables) obtained by taking (for 1 < t < 1 and 0 < w < 1) f .t; w/ D bŒ.w=N /1=2 t d.w/ .w=N /1=2 D
1
.N=2/ 2.N C1/=2 1=2
N
1=2
w Œ.N C1/=2
1
e
wŒ1C.t 2 =N /=2
—for 1 < t < 1 and 1 < w 0, f .t; w/ D 0.
(b) Clearly, a pdf of the (marginal) distribution of t is given by the function h./ obtained by Z 1 taking (for all t) f .t; w/ dw: h.t/ D 0
Moreover, upon introducing the change of variable s D wŒ1C.t 2 =N /=2 and recalling (from Section 3.5b) the definition of the gamma function, we find that Z t 2 .N C1/=2 1 Œ.N C1/=2 1 s 1 1=2 1C N h.t/ D s e ds N .N=2/ 1=2 0 t 2 .N C1/=2 Œ.N C1/=2 1=2 ; 1 C N D N .N=2/ 1=2 in agreement with expression (4.8).
EXERCISE 16. Let t D .x1 C x2 /=jx1 x2 j, where x1 and x2 are random variables that are distributed independently and identically as N.; 2 / (with > 0). Show that t has a noncentral t distribution, and determine the values of the parameters (the degrees of freedom and the noncentrality parameter) of this distribution. ıp ı p 2 and y2 D .x1 x2 / 2 , and define x D .x1 ; x2 /0 and Solution. Let y1 D .x1 C x2 / 0 y D .y1 ; y2 / . Clearly, x N.1; 2 I/; and p 1 1 1 y D Ax; where A D : 2 1 1 Thus, y has a multivariate normal distribution with
E.y/ D A.1/ D
p
2= 0
!
and
var.y/ D A. 2 I/A0 D I: p And it follows that y1 N 2=; 1 and also that y2 is distributed independently of y1 as N.0; 1/ and hence that y22 is distributed independently of y1 as 2 .1/. Further, y y1 Dq 1 ; jy2 j y22 =1 p leading to the conclusion that t S t 1; 2= . tD
EXERCISE 17. Let t represent a random variable that has an S t.N; / distribution. And take r to be an arbitrary one of the integers 1; 2; : : : < N . Generalize expressions (4.38) and (4.39) [for E.t 1 / and E.t 2 /, respectively] by obtaining an expression for E.t r / (in terms of ). (Note. This exercise is closely related to Exercise 3.12.)
75
Some Relevant Distributions and Their Properties
Solution. Let x represent a random variable that is distributed as N.; 1/, and take z to be a random variable that is distributed as N.0; 1/. Then, clearly, E.x r / D EŒ. C z/r : And based on the binomial theorem (e.g., Casella and Berger 2002, sec. 3.2), we have that ! r X r r . C z/ D r k z k k kD0
0
—interpret 0 as 1. Thus, letting r represent the largest even integer that does not exceed r (so that r D r if r is even, and r D r 1 is r is odd) and making use of results (3.5.17) and (3.5.18), we find that ! r X r r r k E.z k / E.x / D k kD0 ! rX =2 r r 2s E.z 2s / D 2s sD0 D r C
rX =2 sD1
r.r 1/.r 2/ .r 2s C1/ r 2s.2s 2/.2s 4/ 6 4 2
2s
:
(S.10)
Further, upon substituting expression (S.10) for E.x r / in expression (4.36) or (4.37), we obtain the expression rX =2 r.r 1/.r 2/ .r 2s C1/ r 2s r r r=2 Œ.N r/=2 C E.t / D .N=2/ .N=2/ 2s.2s 2/.2s 4/ 6 4 2 sD1 or, in the special case where r is even, the expression E.t r / D
r=2 X N r=2 r.r 1/.r 2/ .r 2s C1/ r r C .N 2/.N 4/ .N r/ 2s.2s 2/.2s 4/ 6 4 2 sD1
:
2s
EXERCISE 18. Let t represent an M -dimensional random column vector that has an MV t.N; IM / distribution. And let w D t 0 t. Derive the pdf of the distribution of w in each of the following two ways: (1) as a special case of the pdf (1.51) and (2) by making use of the relationship (4.50). Solution. (1) In light of result (4.65), a pdf, say r./, of the distribution of w is obtainable as a special case of the pdf (1.51); it is obtainable upon taking the function g./ in expression (1.51) to be as follows: for 0 < w < 1, w .N CM /=2 Œ.N CM /=2 M=2 1 C N g.w/ D : N .N=2/ M=2 Thus, for 0 < w < 1, w .N CM /=2 M=2 Œ.N CM /=2 M=2 1 C N w .M=2/ 1 .M=2/ N .N=2/ M=2 .M=2/ 1 .N CM /=2 w w Œ.N CM /=2 1C N 1 D .N=2/.M=2/ N N
r.w/ D
—for 1 < w 0, r.w/ D 0.
76
Some Relevant Distributions and Their Properties
(2) Let u D w=M . Then, according to the relationship (4.50), u SF .M; N /. And upon observing that du=dw D 1=M , upon denoting the pdf of the SF .M; N / distribution by f ./ and recalling expression (3.12), and upon making use of standard results on a change of variable, we find that the distribution of w has as a pdf the function r./ obtained by taking, for 0 < w < 1, r.w/ D f .w=M /.1=M / w .M=2/ 1 M w Œ.M CN /=2 1C .M=N /M=2 D .M=2/.N=2/ M N M w .M=2/ 1 .N CM /=2 Œ.N CM /=2 w D 1C N 1 .N=2/.M=2/ N N
.M CN /=2
1 M
—for 1 < w 0, r.w/ D 0. EXERCISE 19. Let x represent an M -dimensional random column vector whose distribution has as a pdf a function f ./ that is expressible in the following form: for all x, Z 1 h.x j u/g.u/ du; f .x/ D 0
where g./ is the pdf of the distribution of a strictly positive random variable u and where (for every u) h. j u/ is the pdf of the N.0; u 1 IM / distribution. (a) Show that the distribution of x is spherical. (b) Show that the distribution of u can be chosen in such a way that f ./ is the pdf of the MVt.N; IM / distribution. Solution. (a) For all x (and for 0 < u < 1), h.x j u/ D .2/
M=2 M=2
u
e
.u=2/x0 x
:
(S.11)
Thus, h.x j u/ depends on the value of x only through x0 x and, consequently, f .x/ depends on the value of x only through x0 x. And it follows that the distribution of x is spherical. (b) Take u Ga.N=2; 2=N /. Then, for 0 < u < 1, g.u/ D
N N=2 u.N=2/ .N=2/ 2N=2
1
N u=2
e
:
And in light of result (S.11), we find that (for all x) f .x/ D D
Z
1 0
N N=2 uŒ.N CM /=2 .N=2/ 2.N CM /=2 M=2
Œ.N CM /=2 N .N=2/ M=2
M=2
Z
x0 x 1C N
1 0
1
e
.u=2/.N Cx0 x/
du
.N CM /=2
1 Œ.N CM /=2Œ2=.N C x0x/.N CM /=2 uŒ.N CM /=2
1
e
u=Œ2=.N Cx0 x/
du:
(S.12)
Moreover, the integrand of the integral in expression (S.12) is the pdf of a GaŒ.N CM /=2; 2=.NC x0 x/ distribution. Thus, for all x, f .x/ D
Œ.N CM /=2 N .N=2/ M=2
M=2
x0 x 1C N
.N CM /=2
;
77
Some Relevant Distributions and Their Properties and it follows that f ./ is the pdf of the MV t.N; IM / distribution. EXERCISE 20. Show that if condition (6.7) of Theorem 6.6.2 is replaced by the condition †.b C 2A/ 2 C.†A†/;
the theorem is still valid.
Solution. Condition (6.7) clearly implies that †.bC2A/ 2 C.†A†/. Thus, to show that Theorem 6.6.2 remains valid when condition (6.7) is replaced by the condition †.b C 2A/ 2 C.†A†/, it suffices to show that the condition †.b C 2A/ 2 C.†A†/, in combination with condition (6.6), implies condition (6.7). If †.b C 2A/ 2 C.†A†/, then †.b C 2A/ D †A†r for some column vector r and if, in addition, condition (6.6) is satisfied, then †A†.b C 2A/ D †A†A†r D †A†r D †.b C 2A/; so that condition (6.7) is satisfied. EXERCISE 21. Let x represent an M -dimensional random column vector that has an N.; †/ distribution (where † ¤ 0), and take G to be a symmetric generalized inverse of †. Show that x0 Gx 2 .rank †; 0 G/
if 2 C.†/ or G†G D G. [Note. A symmetric generalized inverse G is obtainable from a possibly nonsymmetric generalized inverse, say H, by taking G D 21 H C 21 H0 ; the condition G†G D G is the second of the so-called Moore-Penrose conditions—refer, e.g., to Harville (1997, chap. 20) for a discussion of the Moore-Penrose conditions.] Solution. Clearly, x0 Gx D 0 C 0 0 x C x0 Gx. And since †G† D †, †G†G† D †G† And if G†G D G, then
and
†.2G/ D †G†.2G/:
0 G D 0 G†G D 14 .2G/0 †.2G/:
Similarly, if 2 C.†/, then D †k for some column vector k and hence 0 G D 0 G†k D 0 G†G†k D 0 G†G D 14 .2G/0 †.2G/: Thus, upon observing that rank.†G†/ D rank †, it follows from Theorem 6.6.2 that x0 Gx 2 .rank †; 0 G/:
EXERCISE 22. Let z represent an N -dimensional random column vector. And suppose that the distribution of z is an absolutely continuous spherical distribution, so that the distribution of z has as a pdf a function f ./ such that (for all z) f .z/ D g.z0 z/, where g./ is a (nonnegative) function of a single nonnegative variable. Further, take z to be an M -dimensional subvector of z (where M < N ), and let v D z0 z . (a) Show that the distribution of v has as a pdf the function h./ defined as follows: for v > 0, Z 1 N=2 w Œ.N M /=2 1 g.vCw/ dwI v .M=2/ 1 h.v/ D .M=2/ Œ.N M /=2 0 for v 0, h.v/ D 0.
(b) Verify that in the special case where z N.0; IN /, h./ simplifies to the pdf of the 2 .M / distribution. Solution. (a) Let g ./ represent the (nonnegative) function of a single nonnegative variable obtained by taking (for v 0)
78
Some Relevant Distributions and Their Properties
1 .N M /=2 w Œ.N M /=2 1 g.vCw/ dw: Œ.N M /=2 0 Then, according to result (5.9.66), the distribution of z has as a pdf the function f ./ defined by taking (for all z ) f .z / D g .z0 z /. And based on the results of Section 6.1g, we conclude that the distribution of v has as a pdf the function h./ defined by taking, for v > 0,
Z
g .v/ D
h.v/ D D
M=2 .M=2/ v .M=2/
1
g .v/
N=2 v .M=2/ .M=2/ Œ.N M /=2
1
Z
1
w Œ.N
M /=2 1
g.vCw/ dw
0
and by taking, for v 0, h.v/ D 0.
(b) In the special case where z N.0; IN /, we have that (for v 0) g.v/ D .2/
and hence that (for v > 0) N=2 v .M=2/ h.v/ D .M=2/ Œ.N M /=2
1
e
N=2
v=2
e
.2/
v=2
N=2
Z
1
w Œ.N
M /=2 1
e
w=2
dw:
0
Moreover, upon changing the variable of integration from w to u D w=2, we find that Z 1 Z 1 w Œ.N M /=2 1 e w=2 dw D 2.N M /=2 uŒ.N M /=2 1 e u du 0
0
D2
.N M /=2
Œ.N M /=2:
Thus, in the special case where z N.0; IN /, we find that (for v > 0) h.v/ D
1 v .M=2/ .M=2/ 2M=2
1
e
v=2
;
so that (in that special case) h./ simplifies to the pdf of the 2 .M / distribution. EXERCISE 23. Let z D .z1 ; z2 ; : : : ; zM /0 represent an M -dimensional random (column) vector that has a spherical distribution. And take A to be an M M symmetric idempotent matrix of rank R (where R 1). (a) Starting from first principles (i.e., from the definition of a spherical distribution), use the results of P 2 Theorems 5.9.5 and 6.6.6 to show that (1) z0 Az R i D1 zi and [assuming that Pr.z ¤ 0/ D 1] P P M R 2 2 0 0 that (2) z Az=z z i D1 zi = i D1 zi .
(b) Provide an alternative “derivation” of results (1) and (2) of Part (a); do so by showing that (when z has an absolutely continuous spherical distribution) these two results can be obtained by applying Theorem 6.6.7 (and by making use of the results of Sections 6.1f and 6.1g). Solution. (a) According to Theorem 5.9.5, there exists an M R matrix Q1 such that A D Q1Q10 ; and, necessarily, this matrix is such that Q10 Q1 D IR (and hence is such that its columns are orthonormal). Now, take Q to be the M M matrix defined as follows: if R D M, take Q D Q1 ; if R < M, take Q D .Q1 ; Q2 /, where Q2 is an M .M R/ matrix whose columns consist of any M R vectors that, in combination with the R columns of Q1 , form an orthonormal basis for RM —the existence of such vectors follows from Theorem 6.6.6. Further, define y1 D Q10 z and y D Q 0 z, and observe that the elements of y1 are the first R elements of y. Observe also that Q is orthogonal and that, as a consequence, y z. Accordingly, we find that P 2 z0 Az D z0 Q1Q10 z D .Q10 z/0 Q10 z D y10 y1 R i D1 zi :
Some Relevant Distributions and Their Properties
79
And upon assuming that Pr.z ¤ 0/ D 1 and upon observing that, for z ¤ 0,
we find that
z0 Az z0 Q Q 0 z .Q10 z/0 Q10 z y0 y D 0 1 01 D D 10 1 ; 0 0 0 0 zz z QQ z .Q z/ Q z yy P R 2 z0 Az i D1 zi : P M 2 z0 z i D1 zi
(b) Assume that z has an absolutely continuous spherical distribution. Further, let y D .y1 ; y2 ; : : : ; yM /0 represent an M -dimensional random (column) vector that has an N.0; IM / distribution. Then, it follows from Theorem 6.6.7 that PR 2 y 0 Ay i D1 yi : P M 2 y 0y i D1 yi
And upon letting v D .v1 ; v2 ; : : : ; vM /0 D .z0 z/ 1=2 z and u D .u1 ; u2 ; : : : ; uM /0 D .y 0 y/ 1=2 y and observing (in light of the results of Sections 6.1f and 6.1g) that the vector z has the same distribution as the vector u—each of these 2 vectors is distributed uniformly on the surface of an M -dimensional unit sphere—we find that PR PR 2 2 PR PR z y 0 Ay z0 Az i D1 yi 0 0 2 2 D v Av u Au D 0 PM D i D1 ui i D1 vi D PiMD1 i : (S.13) 0 2 2 zz yy i D1 yi i D1 zi
It remains only to observe (in light of the results of Section 6.1g) that z0 z is distributed independently P 2 of v, implying that z0 z is distributed independently of v 0Av and of R i D1 vi , or equivalently of P P M 2 2 z0 Az=z0 z and of R i D1 zi = i D1 zi , and hence [in light of result (S.13)] that PR 2 P z z0 Az 2 0 0 0 z Az D .z z/ 0 .z z/ PiMD1 i D R i D1 zi : 2 zz z i D1 i
EXERCISE 24. Let A represent an N N symmetric matrix. And take Q to be an N N orthogonal matrix and D an N N diagonal matrix such that A D QDQ0 —the decomposition A D QDQ0 is the spectral decomposition, the existence and properties of which are established in Section 6.7a. Further, denote by d1 ; d2 ; : : : ; dN the diagonal elements of D (which are the not-necessarily-distinct eigenvalues of A), and taking D C to be the N N diagonal matrix whose i th diagonal element is diC, where diC D 0 if di D 0 and where diC D 1=di if di ¤ 0, define AC D QD C Q0. Show that (1) AACA D A (i.e., AC is a generalized inverse of A) and also that (2) ACAAC D AC, (3) AAC is symmetric, and (4) ACA is symmetric—as discussed, e.g., by Harville (1997, chap. 20), these four conditions are known as the Moore-Penrose conditions and they serve to determine a unique matrix AC that is known as the Moore-Penrose inverse. Solution. Observe that DD C D, D C DD C , DD C, and D C D are diagonal matrices. Observe also that (for i D 1; 2; : : : ; N ) the i th diagonal element of DD C D equals di and the i th diagonal element of D C DD C equals diC. Thus, (1) AACA D QDQ0 QD C Q0 QDQ0 D QDD C DQ0 D QDQ0 D A; (2) ACAAC D QD C Q0 QDQ0 QD C Q0 D QD C DD C Q0 D QD C Q0 D AC; (3) AAC D QDQ0 QD C Q0 D QDD C Q0 (a symmetric matrix); and (4) ACA D QD C Q0 QDQ0 D QD C DQ0 (a symmetric matrix). EXERCISE 25. Let † represent an N N symmetric nonnegative definite matrix, and take 1 to be a P1 N matrix and 2 a P2 N matrix such that † D 10 1 D 20 2 . Further, take A to be an N N symmetric matrix. And assuming that P2 P1 (as can be done without any essential loss
80
Some Relevant Distributions and Their Properties
of generality), show that the P2 not-necessarily-distinct eigenvalues of the P2 P2 matrix 2 A20 consist of the P1 not-necessarily-distinct eigenvalues of the P1 P1 matrix 1 A10 and of P2 P1 zeroes. (Hint. Make use of Corollary 6.4.2.) Solution. Let p./ represent the characteristic polynomial of the P2 P2 matrix 2 A20 . Then, making use of Corollary 6.4.2 (and of Corollary 2.14.6), we find that, for any nonzero scalar , p./ D j2 A20 D . /
P2
IP2 j
jIP2
D . /P2 jIN
D . /P2 jIN D . /
P2
.1=/2 A20 j
.1=/A20 2 j .1=/A†j
jIN
D . /P2 jIP1
.1=/A10 1 j
.1=/1 A10 j
D . /P2 Œ .1=/P1 j1 A10 D . 1/P2
P1 P2 P1
IP1 j
q./;
where q./ is the characteristic polynomial of the P1 P1 matrix 1 A10 . Based, for example, on the observation that two polynomials that are equal over some nondegenerate interval are equal everywhere, we conclude that, for every scalar , p./ D . 1/P2
P1 P2 P1
q./:
Thus, the P2 not-necessarily-distinct eigenvalues of 2 A20 [which coincide with the P2 notnecessarily-distinct roots of the polynomial p./] consist of the P1 not-necessarily-distinct eigenvalues of 1 A10 and of P2 P1 zeroes. EXERCISE 26. Let A represent an M M symmetric matrix and † an M M symmetric nonnegative definite matrix. Show that the condition †A†A† D †A† (which appears in Theorem 6.6.2) is equivalent to each of the following three conditions: (1) .A†/3 D .A†/2 ; (2) trŒ.A†/2 D trŒ.A†/3 D trŒ.A†/4 ; and (3) trŒ.A†/2 D tr.A†/ D rank.†A†/. Solution. Take to be any matrix (with M columns) such that † D 0 .
(1) That †A†A† D †A† ) .A†/3 D .A†/2 is obvious. Now, for purposes of establishing the converse, suppose that .A†/3 D .A†/2 or, equivalently, that A 0 A 0 A 0 D A 0 A 0 : Then, upon applying Corollary 2.3.4, we find that
implying that or, equivalently, that
A 0 A 0 A 0 D A 0 A 0; A 0 A 0 A 0 D A 0 A 0 .A 0 /0 .A 0 /A 0 D .A 0 /0 .A 0 /:
And upon making further use of Corollary 2.3.4, it follows that and hence that
A 0 A 0 D A 0 †A†A† D 0 A 0 A 0 D 0 A 0 D †A†:
81
Some Relevant Distributions and Their Properties
(2) Suppose that †A†A† D †A†. Then, clearly (in light of what has already been established), .A†/2 D .A†/3 D A†.A†/2 D A†.A†/3 D .A†/4; implying that trŒ.A†/2 D trŒ.A†/3 D trŒ.A†/4 . Conversely, suppose that trŒ.A†/2 D trŒ.A†/3 D trŒ.A†/4 :
(S.14)
And denote by P the number of rows in the matrix , and take O to be a P P orthogonal matrix such that O 0 A 0 O D diag.d1 ; d2 ; : : : ; dP / for some scalars d1 ; d2 ; : : : ; dP —the existence of such a P P orthogonal matrix follows from Theorem 6.7.4. Further, observe (in light of Lemma 2.3.1) that
and, similarly,
trŒ.A†/2 D tr.A 0 A 0 / D tr.A 0 OO 0 A 0 OO 0 / P 2 D trŒ.O 0 A 0 O/2 D P i D1 di trŒ.A†/3 D
PP
3 i D1 di
and
trŒ.A†/4 D
implying [in light of the supposition (S.14)] that PP PP PP 4 3 2 i D1 di i D1 di D i D1 di D and hence that
PP
i D1
di2 .1
(S.15)
PP
4 i D1 di ;
di /2 D 0:
Thus, either di D 0 or di D 1 (i D 1; 2; : : : ; P ); and upon observing that d1 , d2 , : : : ; dP are the notnecessarily-distinct eigenvalues of A 0, it follows from Theorem 6.7.7 that A 0 is idempotent, that is, A 0 A 0 D A 0; in which case †A†A† D 0 A 0 A 0 D 0 A 0 D †A†: (3) The condition †A†A† D †A† is reexpressible as 0 A 0 A 0 D 0 A 0 . Thus, in light of Corollary 2.3.4, †A†A† D †A†
,
A 0 A 0 D A 0;
that is, †A†A† D †A† if and only if A 0 is idempotent. Moreover, in light of Lemma 2.3.1, trŒ.A†/2 D tr.A 0 A 0 / and tr.A†/ D tr.A 0 /I and in light of Lemma 2.12.3, rank.†A†/ D rank.†A 0 D rank.A 0 /: Accordingly, to show that the condition †A†A† D †A† is equivalent to the condition trŒ.A†/2 D tr.A†/ D rank.†A†/, it suffices to show that A 0 is idempotent if and only if trŒ.A 0 /2 D tr.A 0 / D rank.A 0 /: (S.16)
That the idempotency of A 0 implies that condition (S.16) is satisfied is an obvious consequence of Corollary 2.8.3. Conversely, suppose that condition (S.16) is satisfied. And [as in Part (2)] denote by P the number of rows in the matrix , and take O to be a P P orthogonal matrix such that O 0 A 0 O D diag.d1 ; d2 ; : : : ; dP / for some scalars d1 ; d2 ; : : : ; dP (in which case d1 ; d2 ; : : : ; dP are the not-necessarily-distinct eigenvalues of A 0 ). Then, as in the case of result (S.15), P 2 trŒ.A 0 /2 D P i D1 di :
82
Some Relevant Distributions and Their Properties
Moreover, tr.A 0 / D
PP
i D1 di
rank.A 0 / D
and
as is evident from Theorem 6.7.5. Thus, PP P 2 1/2 D P i D1 .i W di ¤0/ .di i D1 di
2
0 2
D trŒ.A /
PP
i D1 di
C
0
PP
i D1 .i W di ¤0/
PP
i D1 .i W di ¤0/
1;
1
2 tr.A / C rank.A 0 / D 0;
leading to the conclusion that all of the nonzero eigenvalues of A 0 equal 1 and hence (in light of Theorem 6.7.7) that A 0 is idempotent. EXERCISE 27. Let z represent an M -dimensional random column vector that has an N.0; IM / distribution, and take q D c C b0 z C z0 Az, where c is a constant, b an M -dimensional column vector of constants, and A an M M (nonnull) symmetric matrix of constants. Further, denote by m./ the moment generating function of q. Provide an alternative derivation of the “sufficiency part” of Theorem 6.6.1 by showing that if A2 D A, b D Ab, and c D 14 b0 b, then, for every scalar t in some neighborhood of 0, m.t/ D m .t/, where m ./ is the moment generating function of a 2 .R; c/ distribution and where R D rank A D tr.A/. Solution. Proceeding as in the proof of the “necessity part” of Theorem 6.6.1, define K D rank A. And take O to be an M M orthogonal matrix such that O 0AO D diag.d1 ; d2 ; : : : ; dM /
for some scalars d1 ; d2 ; : : : ; dM —the existence of such an M M orthogonal matrix follows from Theorem 6.7.4—and define u D O 0 b. In light of Theorem 6.7.5, K of the scalars d1 ; d2 ; : : : ; dM (which are the not-necessarily-distinct eigenvalues of A) are nonzero, and the rest of them equal 0. Assume (without loss of generality) that it is the first K of the scalars d1 ; d2 ; : : : ; dM that are nonzero, so that dKC1 D dKC2 D D dM D 0—this assumption can always be satisfied by reordering d1 , d2 , : : : ; dM and the corresponding columns of O (as necessary). Further, letting D1 D diag.d1 ; d2 ; : : : ; dK / and partitioning O as O D .O1 ; O2 / (where the dimensions of O1 are M K), observe that A D O1 D1 O10 : Upon applying result (7.20), we find that, for every scalar t in some neighborhood S of 0, " !# K K M Y X u2i t2 X 1=2 2 m.t/ D .1 2tdi / exp tc C C ui ; 2 1 2tdi i D1 i D1 i DKC1
where u1 ; u2 ; : : : ; uM represent the elements of u. Now, suppose that A2 D A, b D Ab, and c D 14 b0 b. Then, in light of Theorem 6.7.7, d1 D d2 D D dK D 1, so that D1 D IK ; and in light of Theorem 6.7.5, K D tr.A/. Further, O2O20 b D .I
O1IO10 /b D .I
O1O10 /b D .I
implying that O20 b D 0 and hence that PM
2 i DKC1 ui
And
PK
i D1
Thus, for t 2 S ,
u2i D
PM
2 i D1 ui
m.t/ D .1 2t/ D .1 2t/
O1D1O10 /b D .I
A/b D b
b D 0;
D .O20 b/0 O20 b D 0:
D u0 u D .O 0 b/0 O 0 b D b0 b D 4c:
K=2 K=2
expŒtc C .t 2 =2/.4c/=.1 2t/ expŒtc=.1 2t/:
We conclude [in light of result (2.18)] that, for every scalar t in some neighborhood of 0,
Some Relevant Distributions and Their Properties
83
m.t/ D m .t/: EXERCISE 28. Let x represent an M -dimensional random column vector that has an N.0; †/ distribution, and denote by A an M M symmetric matrix of constants. Construct an example where M D 3 and where † and A are such that A† is not idempotent but are nevertheless such that x0 Ax has a chi-square distribution. 0 1 0 1 1 1 0 1 0 0 1 0A and A D @0 1 0A. Then, Solution. Take † D @1 0 0 1 0 0 1 0 1 0 1 0 0 0 1 1 0 0 0A ¤ @ 1 1 0A D A†; .A†/2 D @0 0 0 1 0 0 1 0 1 but 0 0 0 0 0A D †A†: †A†A† D @0 0 0 1 Moreover, tr.A†/ ŒD rank.†A†/ D 1. Thus, it follows from Theorem 6.6.2 that x0 Ax 2 .1/.
EXERCISE 29. Let x represent an M -dimensional random column vector that has an N.0; †/ distribution. Further, partition x and † as x1 †11 †12 xD and † D x2 †21 †22 (where the dimensions of †11 are the same as the dimension of x1 ). And take G1 to be a generalized inverse of †11 and G2 a generalized inverse of †22 . Show that x01 G1 x1 and x02 G2 x2 are distributed independently if and only if †12 D 0. Solution. Clearly, x01 G1 x1 D x01 G1 x1 and x02 G2 x2 D x02 G2 x2 , where G1 D .1=2/.G1 C G10 / and G2 D .1=2/.G2 C G20 /. Moreover, G1 is a symmetric generalized inverse of †11 and G2 a symmetric generalized inverse of †22 . Accordingly, it follows from Theorem 6.8.1 that x01 G1 x1 and x02 G2 x2 are distributed independently if and only if
†A1 †A2 † D 0; G1 0 0 0 where A1 D and A2 D . Thus, it suffices to show that †A1 †A2 † D 0 if and 0 0 0 G2 only if †12 D 0. We find that 0 †11 G1 †12 G2 †12 †11 G1 †12 G2 †22 : (S.17) †A1 †A2 † D 0 0 0 †12 G1 †12 G2 †22 †12 G1 †12 G2 †12 That †A1 †A2 † D 0 if †12 D 0 follows immediately from expression (S.17). Now, for purposes of establishing the converse, suppose that †A1 †A2 † D 0. Then, †11 G1 †12 G2 †22 D 0
(S.18)
[as is evident from expression (S.17)]. Moreover, since † is symmetric and nonnegative definite, † D 0 for some matrix , and upon partitioning as D .1 ; 2 / (where 1 has the same number of columns as †11 ), we find that †11 D 10 1 , †12 D 10 2 , and †22 D 20 2 . And in light of results (2.12.1) and (2.12.2), it follows that †11 G1 †12 G2 †22 D 10 1 G1 10 2 G2 20 2 D 10 2 D †12 ;
84
Some Relevant Distributions and Their Properties
leading [in light of equality (S.18)] to the conclusion that †12 D 0. EXERCISE 30. Let x represent an M -dimensional random column vector that has an N.; IM / distribution. And, for i D 1; 2; : : : ; K, take qi D ci C b0i x C x0 Ai x, where ci is a constant, bi an M dimensional column vector of constants, and Ai an M M symmetric matrix of constants. Further, denote by m. ; ; : : : ; / the moment generating function of the joint distribution of q1 ; q2 ; : : : ; qK . Provide an alternative derivation of the “sufficiency part” of Theorem 6.8.3 by showing that if, for j ¤ i D 1; 2; : : : ; K, Ai Aj D 0, Ai bj D 0, and b0i bj D 0, then there exist (strictly) positive scalars h1 ; h2 ; : : : ; hK such that, for any scalars t1 ; t2 ; : : : ; tK for which jt1 j < h1 , jt2 j < h2 , : : : ; jtK j < hK , m.t1 ; t2 ; : : : ; tK / D m.t1 ; 0; 0; : : : ; 0/ m.0; t2 ; 0; 0; : : : ; 0/ m.0; : : : ; 0; 0; tK /: Solution. For i D 1; 2; : : : ; K, let di D bi C 2Ai . And observe (in light of Lemma 6.5.2) that PK there exist (strictly) positive scalars h1 ; h2 ; : : : ; hK such that the matrix IM i D1 ti Ai is positive definite for any scalars t1 ; t2 ; : : : ; tK for which jt1 j < h1 , jt2 j < h2 , : : : ; jtK j < hK . Accordingly, it follows from result (5.12) or (5.16) that, for any scalars t1 ; t2 ; : : : ; tK for which jt1 j < h1 , jt2 j < h2 , : : : ; jtK j < hK , ˇ ˇ 1=2 P P 0 0 m.t1 ; t2 ; : : : ; tK / D ˇI 2 i ti Ai ˇ exp i ti .ci Cbi C Ai / 0 1P P P I 2 i ti Ai exp 12 i ti di i ti di : Now, suppose that, for j ¤ i D 1; 2; : : : ; K, Ai Aj D 0, Ai bj D 0, and b0i bj D 0. Then, Q P I 2 i ti Ai D i .I 2 ti Ai /;
as can be readily verified by mathematical induction. Moreover, for j ¤ i D 1, 2, : : : ; K, di D .I
2 tj Aj /di or, equivalently (provided jtj j < hj ), .I
2 tj Aj /
Thus, for t1 ; t2 ; : : : ; tK for which jt1 j < h1 , jt2 j < h2 , : : : ; jtK j < hK , Q P .I 2 j tj Aj / 1 di D j .I 2 tj Aj / 1 di D .I 2 ti Ai / so that
d0i .I 2
and (if s ¤ i )
d0s .I 2
P
j tj Aj /
P
j tj Aj /
1
1
di D di :
di ;
di D d0i .I 2 ti Ai / 1di
1
di D d0s .I 2 ti Ai / 1di D Œ.I 2 ti Ai / 1ds 0 di D d0s di D 0:
1
And it follows that
m.t1 ; t2 ; : : : ; tK / D
Q
i
jI 2 ti Ai j
1=2 Q i
expŒ ti .ci Cb0i C0 Ai Q i exp 21 ti2 d0i .I 2 ti Ai /
1
di
D m.t1 ; 0; 0; : : : ; 0/ m.0; t2 ; 0; 0; : : : ; 0/ m.0; : : : ; 0; 0; tK /: EXERCISE 31. Let x represent an M -dimensional random column vector that has an N.0; †/ distribution. And take A1 and A2 to be M M symmetric nonnegative definite matrices of constants. Show that the two quadratic forms x0 A1 x and x0 A2 x are statistically independent if and only if they are uncorrelated. Solution. That x0 A1 x and x0 A2 x are uncorrelated if they are statistically independent is obvious— refer to Lemma 3.2.1. Now, for purposes of establishing the converse, suppose that x0 A1 x and x0 A2 x
85
Some Relevant Distributions and Their Properties
are uncorrelated. Then, in light of result (5.7.19) or as an application of condition (8.24), we have that tr.A1 †A2 †/ D 0: (S.19) Moreover, A1 D R01 R1 and A2 D R02 R2 for some matrices R1 and R2 . And making use of Lemma 2.3.1, we find that tr.A1 †A2 †/ D tr.R01 R1 †R02 R2 †/ D tr.R2 †R01 R1 †R02 / D trŒ.R1 †R02 /0 R1 †R02 :
(S.20)
Together, results (S.19) and (S.20) imply (in light of Lemma 2.3.2) that
and it follows that
R1 †R02 D 0; A1 †A2 D R01 R1 †R02 R2 D 0
[and also that A2 †A1 D ŒA1 †A2 0 D 0], leading (in light of Theorem 6.8.1) to the conclusion that x0 A1 x and x0 A2 x are statistically independent. EXERCISE 32. Let x represent an M -dimensional random column vector that has an N.; IM / distribution. Show (by producing an example) that there exist quadratic forms x0A1 x and x0A2 x (where A1 and A2 are M M symmetric matrices of constants) that are uncorrelated for every 2 RM but that are not statistically independent for any 2 RM. Solution. Take A1 D diag.B1 ; 0/ and A2 D diag.B2 ; 0/, where 1 1 1 B1 D and B2 D 1 1 1
Then, where
1 : 1
A1 A2 D diag.C1 ; 0/; 0 2 C1 D B 1 B 2 D : 2 0
Thus, A1 A2 ¤ 0, implying (in light of Corollary 6.8.2) that x0A1 x and x0A2 x are not statistically independent for any 2 RM . Now, consider the covariance of x0A1 x and x0A2 x. Denoting by 1 ; 2 ; : : : ; M the elements of and applying formula (5.7.19), we find that cov.x0A1 x; x0A2 x/ D 2 tr.A1 A2 / C 40A1 A2 0 1 D0 C C1 1 2 2 D 2 2 1
2 1 2 D 0;
so that x0A1 x and x0A2 x are uncorrelated for every 2 RM .
7 Confidence Intervals (or Sets) and Tests of Hypotheses
EXERCISE 1. Take the context to be that of Section 7.1 (where a second-order G-M model is applied to the results of an experimental study of the yield of lettuce plants for purposes of making inferences about the response surface and various of its characteristics). Assume that the second-order regression coefficients ˇ11 , ˇ12 , ˇ13 , ˇ22 , ˇ23 , and ˇ33 are such that the matrix A is nonsingular. Assume also that the distribution of the vector e of residual effects in the G-M model is MVN (in O is nonsingular with probability 1). Show that the large-sample distribution which case the matrix A of the estimator uO 0 of the stationary point u0 of the response surface is MVN with mean vector u0 and variance-covariance matrix O 0 / A 1: .1=4/ A 1 var.Oa C 2Au (E.1)
Do so by applying standard results on multi-parameter maximum likelihood estimation—refer, e.g., to McCulloch, Searle, and Neuhaus (2008, sec. S.4) and Zacks (1971, chap. 5). O is nonsingular (with probability 1), and uO 0 Solution. As discussed in Section 7.1, the matrix A O 1 aO ] is the ML estimator of u0 . Define (column) vectors and of parameters as [D .1=2/A 0 1 0 1 follows: a u0 D @ˇ1 A and D @ˇ1 A ˇ2 ˇ2
[where ˇ2 D .ˇ11 ; ˇ12 ; ˇ13 ; ˇ22 ; ˇ23 ; ˇ33 /0 ]. Further, letting a1 , a2 , and a3 represent the columns of A, we find that 0 1 0 0 1 a1 u0 a1 0 0 0 @ A 0 A @ (S.1) a D 2Au0 D 2 u0 a2 D 2 diag.u0 ; u0 ; u0 / a2 D Hˇ2 ; a3 u00 a3 where
0
HD
2 B0 B B0 B B0 B 0 B 0 0 diag.u0 ; u0 ; u0 / B0 B0 B B0 B @0 0
0 1 0 1 0 0 0 0 0
0 0 1 0 0 0 1 0 0
0 0 0 0 2 0 0 0 0
0 0 0 0 0 1 0 1 0
1 0 0C C 0C C 0C C 0C C: 0C C 0C C 0A 2
And letting ˇO2 D .ˇO11 ; ˇO12 ; ˇO13 ; ˇO22 ; ˇO23 ; ˇO33 /0, the ML estimators of and are respectively the vectors O and O defined as follows: 0 1 0 1 aO uO 0 O D @ˇO1 A and O D @ˇO1 A: ˇO2 ˇO2
In light of standard results on multi-parameter ML estimation, it suffices to show that the variancecovariance matrix of the large-sample distribution of uO 0 is given by expression (E.1). The vector
88
Confidence Intervals (or Sets) and Tests of Hypotheses
can be regarded as a (vector-valued) function of the vector ; the Jacobian matrix of this function is the matrix 0 1 2A 0 H @ D @ 00 1 00 A @0 0 0 I
[as can be readily verified by using result (S.1) together with result (5.4.12)]. Accordingly, it follows from standard results on multi-parameter ML estimation that the variance-covariance matrix of the large-sample distribution of O is 10 1 @ @ O var./ : 0 @ @0 And upon applying Lemma 2.6.4, we find that 0 1 1 2A @ @ 00 D @0 0
1
0 1 0
1 1 1 2A H 0 A:
0 I
Thus, the variance-covariance matrix of the large-sample distribution of uO 0 is 0 1 1 1 1 O A 1; 0; 21 A 1H var./ A 1; 0; 21 A 1H D var A ; 0; 12 A 1H O 2 2 2 D var 21 A 1 I; 0; H O D .1=4/ A 1var.Oa HˇO2 / A 1: Moreover, analogous to the equality Hˇ2 D 2Au0 [from result (S.1)], we have that HˇO2 D
and hence that .1=4/ A 1var.Oa
HˇO2 / A
1
O 0 2Au
O 0 / A 1: D .1=4/ A 1var.Oa C 2Au
EXERCISE 2. Taking the context to be that of Section 7.2a (and adopting the same notation and Q 0 X0 y to have the same expected value under the terminology as in Section 7.2a), show that for R augmented G-M model as under the original G-M model, it is necessay (as well as sufficient) that X0 Z D 0. Q 0 X0 y has the same expected value under the augmented G-M model as under Solution. The vector R the original G-M model if and only if Q 0 X0 Z D 0 R [as is evident from expression (2.5)]. Thus, it suffices to show that Q 0 X0 Z D 0 ) X0 Z D 0 R
Q 0 X0 Z D 0. —clearly, X0 Z D 0 ) R 0 Q By definition, X XR D ƒ and R.ƒ0 / D R.X/, implying (in light of Theorem 2.12.2) that Q 0 X0 Z D R Q 0 X0 X.X0 X/ X0 Z D ƒ0 .X0 X/ X0 Z R and (in light of Lemma 2.4.3) that for some matrix L. Thus, so that
X D Lƒ0
Q 0 X0 Z D Lƒ0 .X0 X/ X0 Z D X.X0 X/ X0 Z D P Z; LR X Q 0 X0 Z D 0 ) LR Q 0 X0 Z D 0 , P Z D 0 R X
and hence [since (according to Theorem 2.12.2) X0 PX D X0 ]
Confidence Intervals (or Sets) and Tests of Hypotheses
89
Q 0 X0 Z D 0 ) X0 P Z D 0 , X0 Z D 0: R X EXERCISE 3. Adopting the same notation ı and terminology as in Section 7.2, consider the expected value of the usual estimator y 0 .I PX /y ŒN rank.X/ of the variance of the residual effects of the (original) G-M model y D Xˇ C e. How is the expected value of this estimator affected when the model equation is augmented via the inclusion of the additional “term” Z? That is, what is the expected value of this estimator when its expected value is determined under the augmented G-M model (rather than under the original G-M model)? Solution. Suppose that the N 1 observable random vector y follows the augmented G-M model y D Xˇ C Z C e. Then, E.y/ D Xˇ C Z and var.y/ D 2 I: And upon applying formula (5.7.11) for the expected value of a quadratic form, we find that 0 trŒ.I PX /. 2 I/C.Xˇ C Z/0 .I PX /.Xˇ CZ/ y .I PX /y D : E N rank.X/ N rank.X/ Moreover, in light of Lemma 2.8.4 and Theorem 2.12.2, trŒ.I PX /. 2 I/ D 2 tr.I PX / D 2 ŒN rank.PX / D 2 ŒN rank.X/ and Thus,
.Xˇ C Z/0 .I PX /.Xˇ CZ/ D 0 Z0 .I PX /Z: 0 Z0 .I PX /Z y 0 .I PX /y D 2 C : E N rank.X/ N rank.X/
EXERCISE 4. Adopting the same notation and terminology as in Sections 7.1 and 7.2, regard the lettuce yields as the observed values of the N .D 20/ elements of the random column vector y, and take the model to be the “reduced” model derived from the second-order G-M model (in the 3 variables Cu, Mo, and Fe) by deleting the four terms involving the variable Mo—such a model would be consistent with an assumption that Mo is “more-or-less inert,” i.e., has no discernible effect on the yield of lettuce. (a) Compute the values of the least squares estimators of the regression coefficients (ˇ1 , ˇ2 , ˇ4 , ˇ11 , ˇ13 , and ˇ33 ) of the reduced model, and determine the standard errors, estimated standard errors, and correlation matrix of these estimators. (b) Determine the expected values of the least squares estimators (of ˇ1 , ˇ2 , ˇ4 , ˇ11 , ˇ13 , and ˇ33 ) from Part (a) under the complete second-order G-M model (i.e., the model that includes the 4 terms involving the variable Mo), and determine (on the basis of the complete model) the estimated standard errors of these estimators. (c) Find four linearly independent linear combinations of the four deleted regression coefficients (ˇ3 , ˇ12 , ˇ22 , and ˇ23 ) that, under the complete second-order G-M model, would be estimable and whose least squares estimators would be uncorrelated, each with a standard error of ; and compute the values of the least squares estimators of these linearly independent linear combinations, and determine the estimated standard errors of the least squares estimators. Solution. (a) The residual sum of squares equals 139:4432, and N P D 14. Thus, the value of the ı usual unbiased estimator of 2 is 139:4432 14 D 9:96, the square root of which equals 3:16. And the least squares estimates, the standard errors, and the estimated standard errors are as follows:
90
Confidence Intervals (or Sets) and Tests of Hypotheses Coefficient of
Regression coefficient
Least squares estimate
1 u1 u3 u21 u1 u3 u23
ˇ1 ˇ2 ˇ4 ˇ11 ˇ13 ˇ33
23:68 4:57 0:11 5:18 1:29 5:20
Std. error of Estimated std. error the estimator of the estimator 0:333 0:279 0:317 0:265 0:462 0:271
1:05 0:88 1:00 0:84 1:46 0:86
The correlation matrix of the vector of least squares estimators [which is expressible as S 1.X0 X/ 1 S 1, where S is the diagonal matrix of order P D 6 whose first through P th diagonal elements are respectively the square roots of the first through P th diagonal elements of .X0 X/ 1 ] is 1 0 1:00 0:00 0:04 0:61 0:00 0:51 B 0:00 1:00 0:00 0:00 0:24 0:00C C B B 0:04 0:00 1:00 0:09 0:00 0:21C C: B B 0:61 0:00 0:09 1:00 0:00 0:17C C B @ 0:00 0:24 0:00 0:00 1:00 0:00A 0:51 0:00 0:21 0:17 0:00 1:00
(b) Let ˇQ1 , ˇQ2 , ˇQ4 , ˇQ11 , ˇQ13 , and ˇQ33 represent the least squares estimators (of ˇ1 , ˇ2 , ˇ4 , ˇ11 , ˇ13 , and ˇ33 ) from Part (a). Then, taking X to be the model matrix of the reduced second-order G-M Q D .X0 X/ 1 (corresponding to ƒ D I), we find that, under model and applying result (2.5) with R the complete second-order G-M model, E.ˇQ1 / D ˇ1 C 0:861ˇ22 ; E.ˇQ2 / D ˇ2 ; E.ˇQ4 / D ˇ4 C 0:114ˇ22 ; E.ˇQ11 / D ˇ11 0:126ˇ22 ;
E.ˇQ13 / D ˇ13 ; and E.ˇQ33 / D ˇ33 0:195ˇ22 :
Moreover, when the model is taken to be the complete second-order model, the value obtained for the usual estimator of 2 is (as determined in Section 7.2c) 10:89, the square root of which equals 3:30. Accordingly, the estimated value of is 4:6% higher than the estimated value (3:16) obtained when the model is taken to be the reduced second-order model. And (when determined under the complete second-order model) the estimated standard errors of ˇQ1 , ˇQ2 , ˇQ4 , ˇQ11 , ˇQ13 , and ˇQ33 are 1:10, 0:92, 1:05, 0:87, 1:53, and 0:89, respectively, which are 4:6% higher than those determined [in Part (a)] under the reduced model. (c) Take X to be the 20 6 matrix whose first through sixth columns are the columns of the model matrix of the complete second-order G-M model corresponding to the parameters ˇ1 , ˇ2 , ˇ4 , ˇ11 , ˇ13 , and ˇ33 , and take Z to be the 204 matrix whose first through fourth columns are the columns of the model matrix of the complete second-order model corresponding to the parameters ˇ3 , ˇ12 , ˇ22 , and ˇ23 . Further, take .X; Z/ D OU to be the QR decomposition of the partitioned matrix .X; Z/ (so that O is a 2010 matrix with orthonormal columns and U a 1010 upper triangular matrix with positive diagonal elements), define D .ˇ3 ; ˇ12 ; ˇ22 ; ˇ23 /0, and partition O as O D .O1 ; O2 / and U11 U12 U as U D (where O2 is of dimensions 20 4 and U22 of dimensions 4 4). Then, the 0 U22 elements of the vector U22 are linearly independent linear combinations of ˇ3 , ˇ12 , ˇ22 , and ˇ23 . And (when the model is taken to be the complete second-order model) these linear combinations
91
Confidence Intervals (or Sets) and Tests of Hypotheses
are estimable, their least squares estimators equal the (corresponding) elements of the vector O20 y (and are uncorrelated, each with a standard error of ), and the estimated standard errors of the least squares estimators equal the estimate of the standard deviation . The elements of U22 are found to be 3:696ˇ3 C 0:545ˇ23 ; 2:828ˇ12 ; 3:740ˇ22 ; and 2:165ˇ23 ; and their least squares estimates are found to be 2:68;
3:72;
2:74; and 1:43;
respectively; the estimated standard errors of the least squares estimators equal 3:30. EXERCISE 5. Suppose that y is an N 1 observable random vector that follows the G-M model. Show that Q E.y/ D XRS˛ C U; Q where R, S, U, ˛, and are as defined in Section 7.3a. Solution. Employing the same notation as in Section 7.3a and making use of result (3.17), we find that E.y/ D E.Oz/ 0 1 D O E.z/ ˛ Q U; L/ @ A D .XRS; 0 Q D XRS˛ C U: EXERCISE 6. Taking the context to be that of Section 7.3, adopting the notation employed therein, supposing that the distribution of the vector e of residual effects (in the G-M model) is MVN, and assuming that N > P C2, show that .˛ ˛.0/ /0 .˛ ˛.0/ / N P 1C EŒFQ .˛.0/ / D N P 2 M 2 N P . .0/ /0 C . .0/ / D 1C : N P 2 M 2 Q .0/ / SF ŒM ; N P ; .1= 2 /.˛ ˛.0/ /0 .˛ ˛.0/ /. Solution. According to result (3.46), F.˛ Thus, as an application of formula (6.3.31), we have that .˛ ˛.0/ /0 .˛ ˛.0/ / N P .0/ Q 1C : EŒF .˛ / D N P 2 M 2 And upon applying result (3.34) (with P D .0/ ) or result (3.48), we find that 1C
.˛
. ˛.0/ /0 .˛ ˛.0/ / D 1C M 2
.0/ /0 C . M 2
.0/ /
:
EXERCISE 7. Take the context to be that of Section 7.3, adopt the notation employed therein, and suppose that the distribution of the vector e of residual effects (in the G-M model) is MVN. For P 2 C.ƒ0 /, the distribution of F ./ P is obtainable (upon setting ˛P D S0 ) P from that of FQ .˛/: P in light of the relationship (3.32) and results (3.24) and (3.34), F ./ P SF ŒM ; N P ; .1= 2 /.P /0 C .P /: Provide an alternative derivation of the distribution of F ./ P by (1) taking b to be a P 1 vector such that P D ƒ0 b and establishing that F ./ P is expressible in the form
92
Confidence Intervals (or Sets) and Tests of Hypotheses F ./ P D
.1= 2 /.y Xb/0 P
Q XR
.1= 2 /.y
Xb/0 .I
.y Xb/=M
PX /.y Xb/=.N P /
and by (2) regarding .1= 2 /.y Xb/0 P Q .y Xb/ and .1= 2 /.y Xb/0 .I PX /.y Xb/ as quadratic XR forms (in y Xb) and making use of Corollaries 6.6.4 and 6.8.2. Solution. (1) By definition,
.O / P 0 C .O /=M P : 0 y .I PX /y=.N P /
F./ P D
And upon taking b to be a P 1 vector such that P D ƒ0 b and recalling (from Section 7.3a) that Q D ƒ, that O D R Q 0 X0 y, and that C D R Q 0 X0 XR Q ŒD .XR/ Q 0 XR], Q we find that X0 XR and hence that
Q 0 X0 y O P D R
Q 0b D R Q 0 X0 .y Xb/ .X0 XR/
Q Q 0 .y Xb/ D .y Xb/0 XRC .XR/
.O / P 0 C .O / P D .y
Xb/0 PXRQ .y Xb/:
Moreover, since .I PX /X D 0 and X0 .I PX / D 0 (as is evident from Theorem 2.12.2), y 0 .I PX /y D .y Xb/0 .I PX /.y Xb/: Thus, F ./ P D
.y Xb/0 P
Q XR
.y
Xb/0 .I
.y Xb/=M
PX /.y Xb/=.N P /
2
D
.1= /.y Xb/0 P
Q XR
.1= 2 /.y
Xb/0 .I
.y Xb/=M
PX /.y Xb/=.N P /
:
(S.2)
(2) We have that e N.0; 2 I/, so that y N.Xˇ; 2 I/ and hence y Xb N ŒX.ˇ b/; 2 I. Thus, upon regarding .1= 2 /.y Xb/0 P Q .y Xb/ and .1= 2 /.y Xb/0 .I PX /.y Xb/ as quadratic XR forms (in y Xb) and upon observing (in light of Theorem 2.12.2) that Q Q 0 X0 .I P / D 0; Œ.1= 2 /PXRQ . 2 I/.1= 2 /.I PX / D .1= 2 /XRC R X Œ.1= 2 /PXRQ . 2 I/.1= 2 /PXRQ D .1= 2 /PXRQ ; and
Œ.1= 2 /.I PX /. 2 I/.1= 2 /.I PX / D .1= 2 /.I PX /; it follows from Corollary 6.8.2 that .1= 2 /.y Xb/0 P Q .y Xb/ and .1= 2 /.y Xb/0 .I PX /.y Xb/ XR are statistically independent and from Corollary 6.6.4 that and
.1= 2 /.y Xb/0 PXRQ .y Xb/ 2 Œrank.PXRQ /; .1= 2 /.ˇ b/0 X0 PXRQ X.ˇ b/ .1= 2 /.y Xb/0 .I PX /.y Xb/ 2 Œrank.I PX /; .1= 2 /.ˇ b/0 X0 .I PX /X.ˇ b/:
Moreover, in light of Theorem 2.12.2, result (3.1), and Lemma 2.8.4, Q D M ; rank.PXRQ / D rank.XR/ rank.I PX / D N
rank.PX / D N
P ;
Q Q 0 X.ˇ b/ .1= 2 /.ˇ b/0 X0 PXRQ X.ˇ b/ D .1= 2 /.ˇ b/0 X0 XRC .XR/ D .1= 2 /.ˇ b/0 ƒC ƒ0 .ˇ b/
and
D .1= 2 /.P /0 C .P /; .1= 2 /.ˇ b/0 X0 .I PX /X.ˇ b/ D 0:
93
Confidence Intervals (or Sets) and Tests of Hypotheses And in combination with result (S.2), these results imply that F ./ P SF ŒM ; N P ; .1= 2 /.P /0 C .P /:
EXERCISE 8. Take the context to be that of Section 7.3, and adopt the notation employed therein. Taking the model to be the canonical form of the G-M model and taking the distribution of the vector of residual effects to be N.0; 2 I/, derive (in terms of the transformed vector z) the size- P likelihood ratio test of the null hypothesis HQ 0 W ˛ D ˛.0/ (versus the alternative hypothesis HQ 1 W ˛ ¤ ˛.0/ )— refer, e.g., to Casella and Berger (2002, sec. 8.2) for a discussion of likelihood ratio tests. Show that the size- P likelihood ratio test is identical to the size- P F test. 20 1 3 ˛ Solution. The distribution of z is N 4@ A; 2 I5. Accordingly, the likelihood function, say 0 `.˛; ; I z/, is expressible as follows: `.˛; ; I z/ D .2 2 /
N=2
expf Œ1=.2 2 /Œ.˛O ˛/0 .˛O ˛/ C .O /0.O / C d0dg:
Upon recalling the results of Section 5.9a, it becomes clear that `.˛; ; I z/ attains its maximum value (with respect to ˛, , and ) at ˛ D ˛, O D , O and D .d0 d=N /1=2 and that `.˛.0/; ; I z/ attains its maximum value (with respect to and ) at D O and D fŒ.˛O ˛.0/ /0 .˛O ˛.0/ / C d0 d=N g1=2. Thus, the likelihood ratio test statistic is max `.˛.0/; ; I z/
d0 d D max `.˛; ; I z/ .˛O ˛.0/ /0 .˛O ˛.0/ / C d0 d
;
˛; ;
.˛O ˛.0/ /0 .˛O ˛.0/ / D 1C d0 d
N=2 N=2
:
This statistic is a decreasing function of .˛O ˛.0/ /0 .˛O ˛.0/ /=d0 d and, since FQ .˛.0/ / D Œ.N P /=M .˛O ˛.0/ /0 .˛O ˛.0/ /=d0 d; a decreasing function of FQ .˛.0/ /. Thus, the size- P likelihood ratio test is identical to the size- P F test. EXERCISE 9. Verify result (3.67). Solution. Let us employ the same notation as in Sections 7.3a and 7.3b. Making use of equality (3.64) and recalling [from result (3.8)] that D W 0 ˛, we find that Thus,
FQ1 .˛/ D S0F1 .W 0˛/ D S0F1 ./:
2 3 FQ1 .˛/ 0 Q FQ1 .˛/ C UFQ2 ./ D XRSS Q O4FQ2 ./ 5 D XRS F1 ./ C UFQ2 ./: 0 Moreover, as noted earlier [and as is apparent from result (3.29)], SS0 is a generalized inverse of the matrix C. Q F1 ./ is invariant to the choice It remains only to establish that the value of the product XRC of the generalized inverse C . For any M 1 vector P and, in particular, for P D , F1 ./ P 2 C.ƒ0 / [which in light of result (3.6), is consistent with expression (3.63)], implying that F1 ./ D ƒ0 b for Q 0 X0 XR Q and that ƒ D X0 XR, Q we find that some (column) vector b. And upon recalling that C D R Q F1 ./ D XRC Q Q Q 0 Xb: XRC ƒ0 b D XRC .XR/
94
Confidence Intervals (or Sets) and Tests of Hypotheses
Q Q 0 is invariant to the choice of the generalized inverse C , as is evident upon Moreover, XRC .XR/ Q in place of X). applying Part (3) of Theorem 2.12.2 (with XR EXERCISE 10. Verify the equivalence of conditions (3.59) and (3.68) and the equivalence of conditions (3.61) and (3.69). Solution. Let P represent an arbitrary vector in C.ƒ0 /. And recalling [from result (3.6)] that C.ƒ0 / D C.W 0 /, observe that there exists an M1 vector ˛P such that P D W 0 ˛. P Moreover, since [according to result (3.6)] W S D I, ˛P D .W S/0 ˛P D S0 P and P D W 0 S0 : P Now, suppose that FQ1 .˛/ P D ˛, P as would be the case if condition (3.59) is satisfied. Then, FQ1 .S0 / P D 0 S , P so that [in light of relationship (3.63)] F1 ./ P D W 0FQ1 .S0 / P D W 0 S0 P D : P Conversely, let ˛P represent an arbitrary M 1 vector. Further, let P D W 0 ˛, P in which case P 2 C.W 0 / D C.ƒ0 /. And supposing that F1 ./ P D P [as would be the case if condition (3.68) is satisfied], we find that F1 .W 0 ˛/ P D W 0 ˛, P so that [in light of relationship (3.64)] 0 Q F1 .˛/ P D S F1 .W 0 ˛/ P D S0 W 0 ˛P D .W S/0 ˛P D ˛: P Thus, condition (3.59) implies condition (3.68) and vice versa; and we conclude that conditions (3.59) and (3.68) are equivalent. Moreover, since .0/ D W 0 ˛.0/, we conclude also that condition (3.61) implies condition (3.69), and vice versa, and hence that conditions (3.61) and (3.69) are equivalent. EXERCISE 11. Taking the context to be that of Section 7.3 and adopting the notation employed therein, show that, corresponding to any two choices S1 and S2 for the matrix S (i.e., any two M M matrices S1 and S2 such that T S1 and T S2 are orthogonal), there exists a unique M M matrix Q such that Q 2 D XRS Q 1 Q; XRS and show that this matrix is orthogonal. Solution. Let Q represent an arbitrary MM matrix. Then, recalling Corollary 2.3.4 and observing that Q 0 XR Q D C D T 0 T; .XR/ we find that Q 2 D XRS Q 1Q XRS (S.3) if and only if
Q 0 XRS Q 2 D .XR/ Q 0 XRS Q 1 Q; .XR/
or equivalently if and only if
T 0 T S2 D T 0 T S1 Q;
and hence if and only if T S2 D T S1 Q: And (since T S1 is orthogonal) it follows that equality (S.3) is satisfied uniquely by taking Q D .T S1 /0 T S2 and (since T S2 , like T S1 , is orthogonal) that Œ.T S1 /0 T S2 0 .T S1 /0 T S2 D .T S2 /0 T S1 .T S1 /0 T S2 D .T S2 /0 IT S2 D I [so that .T S1 /0 T S2 is orthogonal]. EXERCISE 12. Taking the context to be that of Section 7.3, adopting the notation employed therein, and making use of the results of Exercise 11 (or otherwise), verify that none of the groups G0 , G1 , G01 , G2 . .0/ /, G3 . .0/ /, and G4 (of transformations of y) introduced in the final three parts of Subsection b of Section 7.3 vary with the choice of the matrices S, U, and L.
Confidence Intervals (or Sets) and Tests of Hypotheses
95
Solution. Letting L1 and L2 represent N .N P / matrices and supposing that L1 is among the choices for the matrix L [whose columns form an orthonormal basis for N.X0 /], observe that , for L2 to be among the choices for L, it is necessary and sufficient that L2 D L1 M
for some orthogonal matrix M. And observe that Q 2 ; L2 /0 D NŒ.XRS Q 1 ; L 1 /0 NŒ.XRS
(S.4) (S.5)
for any two choices S1 and S2 for the matrix S (i.e., any two M M matrices S1 and S2 such that T S1 and T S2 are orthogonal) and for any two choices L1 and L2 for the matrix L [as is evident from the results of Exercise 11 and from result (S.4)]. Further, letting U1 and U2 represent N .P M / matrices and supposing that U1 is among the choices for the matrix U (whose columns form an Q L/0 ), observe that , for U2 to be among the choices for U, it is orthonormal basis for NŒ.XRS; necessary and sufficient that U2 D U1 A (S.6) for some orthogonal matrix A. That the groups G0 , G1 , and G01 do not vary with the choice of the matrices S, U, and L is apparent from result (3.72) upon observing [in light of result (3.1) and Corollary 2.4.17] that Q Q C.XRS/ D C.XR/ and upon observing [in light of results (S.5) and (S.6)] that C.U/ does not vary with the choice of U. Turning now to the group G2 . .0/ /, we find that T2 .y I .0/ / D O TQ2 .O 0 y I S0 .0/ / 2 0 .0/ 3 Q 0 X0 y S0 .0/ / S C k.S0 R 0 5 kU y D O4 0 kL y
0 .0/ 0 Q0 0 Q Q D XRSS C k XRSS .R X y .0/ / C kUU 0 y C k LL0 y: Q 0 X0 X), And since .0/ 2 C.ƒ0 / (and since ƒ0 D R Q 0 X0 X` .0/ D R for some vector `. Thus, 0 .0/ 0 Q0 0 0Q0 0 Q Q Q XRSS C k XRSS .R X y .0/ / D XRSS R X Œky C .1 k/X`:
(S.7) (S.8)
Moreover, it follows from the results of Exercise 11 that, corresponding to any two choices S1 and S2 for the matrix S, there exists an orthogonal matrix Q such that Q 2 D XRS Q 1 Q; XRS (S.9) in which case Q 2 S20 R Q 0 X0 D XRS Q 2 .XRS Q 2 /0 D XRS Q 1 QQ0 .XRS Q 1 /0 D XRS Q 1 S10 R Q 0 X0: XRS (S.10) Based on result (S.10) and on results (S.4), (S.5), and (S.6), it is clear that expression (S.8) and the quantities kUU 0 y and k LL0 y do not vary with the choice of the matrices S, U, and L, leading to the conclusion that T2 .y I .0/ / and hence the group G2 . .0/ / do not vary with that choice. Finally, consider the groups G3 . .0/ / and G4 . Clearly, T3 .y I .0/ / D O TQ3 .O 0 y I S0 .0/ / 2 0 .0/ 3 Q 0 X0 y S0 .0/ / S C P 0 .S0 R 0 5 Uy D O4 0 Ly 0 .0/ 0 0 Q0 0 Q Q D XRSS C XRSP S .R X y .0/ / C UU 0 y C LL0 y
(S.11)
96 and
Confidence Intervals (or Sets) and Tests of Hypotheses 0Q0 0 Q T4 .y/ D O TQ4 .O 0 y/ D XRSS R X y C UU 0 y C LB0 L0 y:
(S.12)
Moreover, in light of result (S.7), 0 .0/ 0 0 Q0 0 0Q0 0 0 Q Q Q Q Q 0 .y X`/: (S.13) XRSS C XRSP S .R X y .0/ / D XRSS R X X` C XRSP .XRS/
And (for the 2 choices S1 and S2 for the matrix S) Q 2 P 0 .XRS Q 2 /0 D XRS Q 1 .QP Q0 /0 .XRS Q 1 /0 XRS
(S.14)
[where Q is the orthogonal matrix that satisfies equality (S.9)]. In light of result (S.10) and results (S.4), (S.5), and (S.6), the first term of expression (S.13) and the quantities UU 0 y and LL0 y do not vary with the choice of the matrices S, U, and L. And as is apparent from result (S.14), the second term of expression (S.13) is affected by the choice of S, but the collection of values obtained for the second term as P ranges over all M M orthogonal matrices is unaffected—the mapping P ! QP Q0 is a 1-to-1 mapping of the set of all M M orthogonal matrices onto itself. Thus, the group G3 . .0/ / does not vary with the choice of S, U, and L. Similarly, the first two terms of expression (S.12) do not vary with the choice of S, U, and L [as is evident from result (S.10) and results (S.5) and (S.6)]. The last term [of expression (S.12)] varies with the choice of L; for the two choices L1 and L2 , L2 B0 L02 D L1 .MBM0 /0 L01 [where M is the orthogonal matrix that satisfies equality (S.4)]. However, the collection of matrices generated by the expression L1 .MBM0 /0 L01 as B ranges over all .N P / .N P / orthogonal matrices is the same as that generated by the expression L1 B0 L01 . Thus, the group G4 does not vary with the choice of S, U, and L. Q by expression (3.89) or EXERCISE 13. Consider the set AQıQ (of ıQ 0 ˛-values) defined (for ıQ 2 ) (3.90). Underlying this definition is an implicit assumption that (for any M1 vector tP ) the function Q D jıQ 0 tP j=.ıQ 0 ı/ Q 1=2, with domain fıQ 2 Q W ıQ ¤ 0g, attains a maximum value. Show (1) that this f .ı/ function has a supremum and (2) that if the set Q 1=2 ıg Q R D fıR 2 RM W 9 a nonnull vector ıQ in Q such that ıR D .ıQ 0 ı/ Q at which this function attains a maximum value. is closed, then there exists a nonnull vector in Q Solution. (1) For any M 1 nonnull vector ı, jıQ 0 tP j .tP 0 tP /1=2; 0 1=2 Q Q .ı ı/
as is evident from inequality (2.4.10) (which is a special case of the Cauchy-Schwarz inequality). Q is bounded and hence has a supremum—refer, e.g., to Bartle (1976, sec. 17) Thus, the function f .ı/ or to Bartle and Sherbert (2011). R represent the function defined (for ıR 2 RM ) as follows: g.ı/ R D ıR 0 tP . And let r.ı/ R (2) Let g.ı/ R R R R R R represent the restriction of the function g.ı/ to the set , and define (for ı 2 ) h.ı/ D jr.ı/j.
Since g./ is a linear function, it is continuous; and since a restriction of a continuous function is continuous and since the absolute value of a continuous function is continuous, r./ is continuous R is and hence h./ is continuous—refer, e.g., to Bartle (1976, secs. 20 and 21). Moreover, the set R has a norm of 1). bounded (as is evident upon observing that every vector in R is closed. Then, since R is bounded, it follows from the Heine-Borel Now, suppose that the set R is compact. And theorem (e.g., Bartle 1976, sec. 11; Bartle and Sherbert 2011, sec 11.2) that upon applying the extreme (maximum and minimum) value theorem (e.g., Bartle 1976, sec. 22), R contains a vector, say ıR , at which the function h./ attains a maximum value. we conclude that Q W ıQ ¤ 0g contains a vector, say ıQ , for which It remains only to observe that the set fıQ 2
97
Confidence Intervals (or Sets) and Tests of Hypotheses ıR D .ıQ0 ıQ /
1=2 Q
ı and that f ./ attains a maximum value at ıQ .
EXERCISE 14. Take the context to be that of Section 7.3, and adopt the notation employed therein. Further, let r D O c P , and for ıQ 2 RM, let Q APıQ D fP 2 R1 W P D ıQ 0 ˛; P ˛P 2 Ag;
where AQ is the set (3.121) or (3.122) and is expressible in the form jıP 0 .˛P ˛/j O M P Q Q A D ˛P 2 R W r for every nonnull ı 2 : P 1=2 .ıP 0 ı/ Q AP Q is identical to the set AQ Q defined by expression (3.123). Show that for ıQ 2 , Q AP Q is For ıQ … , ı ı ı Q identical to the set A Q defined by expression (3.89) or (3.90) or, equivalently, by the expression ı
AQıQ D fP 2 R1 W jP
Q 1=2 rg: ıQ 0 ˛/j O .ıQ 0 ı/
(E.2)
Q It Solution. Let P represent an arbitrary scalar, and let ıQ represent any nonnull M 1 vector in . P suffices to show that P is contained in the set (E.2) if and only if it is contained in the set AıQ —when ıQ D 0, both of these sets equal f0g. Q 1=2 r, Suppose that P 2 APıQ . Then, P D ıQ 0 ˛P for some M 1 vector ˛P such that jıQ 0 .˛P ˛/j=. O ıQ 0 ı/ 0 0 Q 1=2 Q Q in which case jP ı ˛j O .ı ı/ r, implying that P is contained in the set (E.2). Conversely, suppose that P is contained in the set (E.2). Then, Q 1=2 k P ıQ 0 ˛O D .ıQ 0 ı/ (S.15)
Q 1=2 ı. Q And observe [in light for some scalar k such that 0 jkj r. Now, let ˛P D ˛O C k.ıQ 0 ı/ of result (2.4.10) (which is a special case of the Cauchy-Schwarz inequality)] that for any nonnull P M 1 vector ı, jıP 0 .˛P ˛/j O Œ.˛P ˛/ O 0 .˛P ˛/ O 1=2 D jkj r; 0 1=2 P P .ı ı/ Q Observe also that so that ˛P 2 A. Q 1=2 k; ıQ 0 ˛P ıQ 0 ˛O D .ıQ 0 ı/ which in combination with result (S.15) implies that P D ıQ 0 ˛, P and leads to the conclusion that P 2 APıQ .
Q the matrix C, and the random vector t to be as deEXERCISE 15. Taking the sets and , fined in Section 7.3, supposing that the set fı 2 W ı 0 Cı ¤ 0g consists of a finite number of vectors ı1 ; ı2 ; : : : ; ıQ , and letting K represent a Q Q (correlation) matrix with ij th element .ıi0 Cıi / 1=2 .ıj0 Cıj / 1=2 ıi0 Cıj , show that jıQ 0 tj D max.ju1 j; ju2 j; : : : ; juQ j/; max Q 1=2 Q Q Q W ı¤0g .ıQ 0 ı/ fı2 where u1 ; u2 ; : : : ; uQ are the elements of a random vector u that has an MV t.N P; K/ distribution. Q Q D fıQ W ıDW Solution. Adopting the notation employed in Section 7.3, recalling that ı; ı2g (and that W is a matrix such that C D W 0 W ), and observing that [since .W ı/0 W ı D ı 0 Cı] W ı D 0 if and only if ı 0 Cı D 0, it follows from the supposition about the nature of the set that Q W ıQ ¤ 0g D fıQ1 ; ıQ2 ; : : : ; ıQQ g; fıQ 2
where (for i D 1; 2; : : : ; Q) ıQi D W ıi . Thus, max
Q Q Q W ı¤0g fı2
0 jıQQ tj jıQ20 tj jıQ10 tj jıQ 0 tj D max ; ; : : : ; 0 0 0 Q 1=2 .ıQ 0 ı/ .ıQ1 ıQ1 /1=2 .ıQ2 ıQ2 /1=2 .ıQQ ıQQ /1=2
D max.ju1 j; ju2 j; : : : ; juQ j/;
!
98
Confidence Intervals (or Sets) and Tests of Hypotheses
where (for i D 1; 2; : : : ; Q) ui D .ıQi0 ıQi / 1=2 ıQi0 t. Now, let u D .u1 ; u2 ; : : : ; uQ /0. And observe that where D D diag .ıQ10 ıQ1 / also that (by definition)
; .ıQ20 ıQ2 /
1=2
u D DB0 t;
0 Q ; : : : ; .ıQQ ıQ /
1=2
1=2
and B D ıQ1 , ıQ2 , : : : , ıQQ . Observe
1 t p z; v=.N P / where v is a random variable that is distributed as 2 .N P / and z is a random vector that is distributed independently of v as N.0; I/. Then, 1 u p DB0 z and DB0 z N.0; DB0 BD/: v=.N P / Further, for i; j .ıQi0 ıQi / 1=2 .ıQj0 ıQj /
.ıQi0 ıQi /
D 1; 2; : : : ; Q, the ij element of the Q Q matrix DB0 BD equals 1=2 Q 0 Q ıi ıj , so that DB0 BD is a correlation matrix, and 1=2
.ıQj0 ıQj /
1=2 Q 0 Q ıi ıj
D .ıi0 W 0 W ıi / D .ıi0 Cıi /
1=2
1=2
.ıj0 W 0 W ıj /
.ıj0 Cıj /
1=2 0 ıi W 0 W ıj
1=2 0 ıi Cıj :
It remains only to observe that u MV t.N P ; DB0 BD/.
Q c P , and t as in Section 7.3c [so that t is an M 1 random vector that EXERCISE 16. Define , has an MV t.N P ; IM / distribution]. Show that ıQ 0 t > c P P =2; Pr max Q 1=2 Q Q Q W ı¤0g .ıQ 0 ı/ fı2
with equality holding if and only if there exists a nonnull M 1 vector ıR (of norm 1) such that Q 1=2 ıQ D ıR for every nonnull vector ıQ in . Q .ıQ 0 ı/ Solution. Clearly, c P > 0, and [in light of result (3.96)]
P D Pr
D Pr
max
Q Q Q W ı¤0g fı2
max
Q Q Q W ı¤0g fı2
j ıQ 0 tj > c P Q 1=2 .ıQ 0 ı/
ıQ 0 t ıQ 0 t min > c P C Pr > c P Q 1=2 Q 1=2 Q Q Q W ı¤0g .ıQ 0 ı/ .ıQ 0 ı/ fı2 ıQ 0 t ıQ 0 t Pr max > c P and min > c P : Q 1=2 Q 1=2 Q Q Q Q Q W ı¤0g Q W ı¤0g .ıQ 0 ı/ .ıQ 0 ı/ fı2 fı2
Moreover, upon recalling result (3.145) and observing that t t, we find that ıQ 0 . t/ ıQ 0 t > c P , max > c P min Q 1=2 Q 1=2 Q Q Q Q Q W ı¤0g Q W ı¤0g .ıQ 0 ı/ .ıQ 0 ı/ fı2 fı2 and that max
Q Q Q W ı¤0g fı2
ıQ 0 . t/ ıQ 0 t max : Q 1=2 fı2 Q 1=2 Q Q Q W ı¤0g .ıQ 0 ı/ .ıQ 0 ı/
Thus,
P D 2 Pr
max
Q Q Q W ı¤0g fı2
ıQ 0 t > c P Q 1=2 .ıQ 0 ı/ Pr max Q Q Q W ı¤0g fı2
ıQ 0 t > c P and Q 1=2 .ıQ 0 ı/
min
Q Q Q W ı¤0g fı2
ıQ 0 t > c P : Q 1=2 .ıQ 0 ı/
99
Confidence Intervals (or Sets) and Tests of Hypotheses or, equivalently, Pr max
ıQ 0 t > c P Q 1=2 Q Q Q W ı¤0g .ıQ 0 ı/ fı2 D P =2 C Pr max
Q Q Q W ı¤0g fı2
so that Pr
max
ıQ 0 t > c P and Q 1=2 .ıQ 0 ı/
Q Q Q W ı¤0g fı2
ıQ 0 t > c P P =2; Q 1=2 .ıQ 0 ı/
with equality holding if and only if ıQ 0 t Pr max > c P and Q 1=2 Q Q Q W ı¤0g .ıQ 0 ı/ fı2 or, equivalently, if and only if ıQ 0 t Pr min < Q 1=2 Q Q Q W ı¤0g .ıQ 0 ı/ fı2
min
Q Q Q W ı¤0g fı2
ıQ 0 t > c P ; Q 1=2 .ıQ 0 ı/
min
ıQ 0 t > c P D 0 Q 1=2 .ıQ 0 ı/
max
ıQ 0 t > c P D 0: Q 1=2 .ıQ 0 ı/
Q Q Q W ı¤0g fı2
c P and
Q Q Q W ı¤0g fı2
Q Now, suppose that there exists a nonnull M 1 vector ıR such that .ıQ 0 ı/ Q Then, nonnull vector ıQ in . ıQ 0 t ıQ 0 t max D min D ıR 0 t: Q 1=2 Q 1=2 Q Q Q Q Q W ı¤0g Q W ı¤0g .ıQ 0 ı/ .ıQ 0 ı/ fı2 fı2 And it follows that Pr min
Q Q Q W ı¤0g fı2
ıQ 0 t < c P and Q 1=2 .ıQ 0 ı/
max
Q Q Q W ı¤0g fı2
1=2 Q
ı D ıR for every
ıQ 0 t > c P Q 1=2 .ıQ 0 ı/ D PrŒ ıR 0 t < c P and ıR 0 t > c P D 0:
Q 1=2 ıQ D Alternatively, suppose that there does not exist a nonnull M1 vector ıR such that .ıQ 0 ı/ Q Then, Q contains two nonnull vectors ıQ1 and ıQ2 such that ıR for every nonnull vector ıQ in . .ıQ10 ıQ1 / 1=2 ıQ1 ¤ .ıQ20 ıQ2 / 1=2 ıQ2 . And the joint distribution of the random variables .ıQ10 ıQ1 / 1=2 ıQ10 t and .ıQ20 ıQ2 / 1=2 ıQ20 t is bivariate t (with N P degrees of freedom); the correlation of these two random variables equals .ıQ10 ıQ1 / 1=2 .ıQ20 ıQ2 / 1=2 ıQ10 ıQ2 , the absolute value of which is [in light of the Cauchy-Schwarz inequality (2.4.10)] strictly less than 1. Thus, ıQ 0 t ıQ 0 t Pr min < c P and max > c P Q 1=2 Q 1=2 Q Q Q Q Q W ı¤0g Q W ı¤0g .ıQ 0 ı/ .ıQ 0 ı/ fı2 fı2 Q0 ı1 t ıQ20 t Pr min < c P ; .ıQ10 ıQ1 /1=2 .ıQ20 ıQ2 /1=2 Q0 ıQ20 t ı1 t > c ; and max
P .ıQ10 ıQ1 /1=2 .ıQ20 ıQ2 /1=2 ıQ10 t ıQ20 t D Pr < c and > c
P
P .ıQ10 ıQ1 /1=2 .ıQ20 ıQ2 /1=2 ıQ20 t ıQ10 t C Pr < c and > c
P
P .ıQ 0 ıQ /1=2 .ıQ 0 ıQ /1=2 2 2
> 0:
1 1
100
Confidence Intervals (or Sets) and Tests of Hypotheses
EXERCISE 17. (a) Letting E1 ; E2 ; : : : ; EL represent any events in a probability space and (for any event E) denoting by E the complement of E, verify the following (Bonferroni) inequality: PL Pr E1 \ E2 \ \ EL 1 i D1 Pr Ei :
(b) Take the context to be that of Section 7.3c, where y is an N 1 observable random vector that follows a G-M model with N P model matrix X of rank P and where D ƒ0ˇ is an M 1 vector of estimable linear combinations of the elements of ˇ (such that ƒ ¤ 0). Further, suppose that the distribution of the vector e of residual effects is N.0; 2 I/ (or is some other spherically symmetric distribution with mean vector 0 and variance-covariance matrix 2 I), let O represent the least squares estimator of , let C D ƒ0.X0 X/ ƒ, let ı1 ; ı2 ; : : : ; ıL represent M 1 vectors of constants such that (for i D 1; 2; : : : ; L) ıi0 Cıi > 0, let O represent the positive square root of the usual estimator of 2 (i.e., the estimator obtained upon dividing the residual sum of squares P by N P ), and let P1 ; P2 ; : : : ; PL represent positive scalars such that L i D1 Pi D P . And (for i D 1; 2; : : : ; L) denote by Ai .y/ a confidence interval for ıi0 with end points ıi0 O ˙ .ıi0 Cıi /1=2 O tN Pi =2 .N P /:
Use the result of Part (a) to show that PrŒıi0 2 Ai .y/ .i D 1; 2; : : : ; L/ 1 P
and hence that the intervals A1 .y/; A2 .y/; : : : ; AL .y/ are conservative in the sense that their probability of simultaneous coverage is greater than or equal to 1 —when P
P1 D P2 D D PL , the end points of interval Ai .y/ become ıi0 O ˙ .ıi0 Cıi /1=2 O tN P =.2L/ .N P /;
and the intervals A1 .y/; A2 .y/; : : : ; AL .y/ are referred to as Bonferroni t-intervals. Solution. (a) Since Pr E1 \E2 \ \EL D 1 Pr E1 \ E2 \ \ EL , it suffices to prove that Pr E1 \ E2 \ \ EL Pr E1 C Pr E2 C C Pr EL : (S.16) To establish the validity of inequality (S.16), let us employ mathematical induction. Clearly, E1 \ E2 D E1 [ E2 , and, consequently, Pr E1 \ E2 D Pr E1 [ E2 D Pr E1 C Pr E2 Pr E1 \ E2 Pr E1 C Pr E2 ;
(S.17)
which establishes the validity of inequality (S.16) when applied to any two events. To complete the induction argument, suppose that inequality (S.16) is valid when applied to any L 1 events. Then, upon applying inequality (S.17) (with E1 \E2 \ \EL 1 in place of E1 and EL in place of E2 ), we find that Pr E1 \ E2 \ \ EL D Pr .E1 \ E2 \ \ EL 1 / \ EL Pr E1 \ E2 \ \ EL 1 C Pr EL Pr E1 C Pr E2 C C Pr EL 1 C Pr EL :
And we conclude that inequality (S.16) is valid when applied to any L events (for every positive integer L 2). (b) For i D 1; 2; : : : ; L,
ıi0 … Ai .y/
,
jıi0 O ıi0 j > tN Pi =2 .N P / .ıi0 Cıi /1=2 O
Confidence Intervals (or Sets) and Tests of Hypotheses and
101
ıi0 O ıi0 S t.N P /: .ıi0 Cıi /1=2 O
Thus, as an application of the result of Part (a) [that where the event Ei consists of the values of y such that ıi0 2 Ai .y/], we have that PL 0 PrŒıi0 2 Ai .y/ .i D 1; 2; : : : ; L/ 1 i D1 PrŒıi … Ai .y/ 0 PL jıi O ıi0 j N > t .N P / D1 Pr
Pi =2 i D1 .ıi0 Cıi /1=2 O PL D1 i D1 Pi D1
P :
EXERCISE 18. Suppose that the data (of Section 4.2b) on the lethal dose of ouabain in cats are regarded as the observed values of the elements y1 ; y2 ; : : : ; yN of an N.D 41/-dimensional observable random vector y that follows a G-M model. Suppose further that (for i D 1; 2; : : : ; 41) E.yi / D ı.ui /, where u1 ; u2 ; : : : ; u41 are the values of the rate u of injection and where ı.u/ is the third-degree polynomial ı.u/ D ˇ1 C ˇ2 u C ˇ3 u2 C ˇ4 u3: And suppose that the distribution of the vector e of residual effects is N.0; 2 I/ (or is some other spherically symmetric distribution with mean vector 0 and variance-covariance matrix 2 I). (a) Compute the values of the least squares estimators ˇO1 ; ˇO2 ; ˇO3 , and ˇO4 of ˇ1 ; ˇ2 ; ˇ3 , and ˇ4 , respectively, and the value of the positive square root O of the usual unbiased estimator of 2 —it follows from the results of Section 5.3d that P .D rank X/ D P D 4, in which case N P D N P D 37, and that ˇ1 ; ˇ2 ; ˇ3 , and ˇ4 are estimable. (b) Find the values of tN:05 .37/ and Œ4FN:10 .4; 37/1=2, which would be needed if interval Iu.1/.y/ with end points (3.169) and interval Iu.2/.y/ with end points (3.170) (where in both cases P is taken to be :10) were used to construct confidence bands for the response surface ı.u/.
(c) By (for example) making use of the results in Liu’s (2011) Appendix E, compute Monte Carlo .3/ approximations to the constants c:10 and c:10 that would be needed if interval Iu .y/ with end .4/ points (3.171) and interval Iu .y/ with end points (3.172) were used to construct confidence bands for ı.u/; compute the approximations for the case where u is restricted to the interval 1 u 8, and (for purposes of comparison) also compute c:10 for the case where u is unrestricted. O (d) Plot (as a function of u) the value of the least squares estimator ı.u/ D ˇO1 C ˇO2 uC ˇO3 u2 C ˇO4 u3 and (taking P D :10) the values of the end points (3.169) and (3.170) of intervals Iu.1/.y/ and Iu.2/.y/ and the values of the approximations to the end points (3.171) and (3.172) of intervals Iu.3/.y/ and Iu.4/.y/ obtained upon replacing c:10 and c:10 with their Monte Carlo approximations—assume (for purposes of creating the plot and for approximating c:10 and c:10 ) that u is restricted to the interval 1 u 8.
Solution. (a) ˇO1 D 8:838, ˇO2 D 8:602, ˇO3 D 0:752, ˇO4 D 0:109, and O D 12:357. (b) tN:05 .37/ D 1:687 and Œ4FN:10 .4; 37/1=2 D 2:901.
(c) Let W represent a 4 4 upper triangular matrix such that .X0 X/ 1 D W 0 W [in which case .X0 X/ 1 D W 0 W is the Cholesky decomposition of .X0 X/ 1 ], take t to be a 4 1 random vector whose distribution is MV t.37; I4 /, and define (for u 2 R1 ) x.u/ D .1; u; u2; u3 /0 (so that ı.u/ D Œx.u/0 ˇ). Further, let (for u 2 R1 ) z.u/ D W x.u/, and define (for u 2 R1 ) functions g.u/ and h.u/ (of u) as follows: t 0 z.u/ and h.u/ D t 0 z.u/: g.u/ D fŒz.u/ 0 z.u/g1=2
102
Confidence Intervals (or Sets) and Tests of Hypotheses
By definition, c:10 is the upper 10% point of the distribution of the random variable maxfu W 1u8g jg.u/j or the random variable maxfu W u2R1 g jg.u/j (depending on whether or not u is restricted to the interval 1 u 8), and c:10 is the upper 10% point of the distribution of the random variable maxfu W 1u8g jh.u/j. To obtain a Monte Carlo approximation to c:10 , it is necessary to determine the value of maxfu W 1u8g jg.u/j or maxfu W u2R1 g jg.u/j for each of a large number of values of t. Simi larly, to obtain a Monte Carlo approximation to c:10 , it is necessary to determine the value of maxfu W 1u8g jh.u/j for each of a large number of values of t. Clearly, d g.u/ D du
fŒz.u/ 0 z.u/g1=2
d t 0 z.u/ du
Œt 0 z.u/ 12 fŒz.u/ 0 z.u/g
1=2 d Œz.u/
0
du
Œz.u/ 0 z.u/ 0 d t z.u/ 1 0 d Œz.u/ 0 z.u/ fŒz.u/ 0 z.u/g Œt z.u/ 2 du du : D fŒz.u/ 0 z.u/g3=2
And
z.u/
(S.18)
d h.u/ d t 0 z.u/ d x.u/ D D t0 W ; du du du
(S.19)
and [making use of result (5.4.10) along with the chain rule] d x.u/ d Œz.u/ 0 z.u/ D 2Œx.u/0 W 0 W : du du
(S.20)
Further,
d x.u/ D .0; 1; 2u; 3u2 /0: (S.21) du d g.u/ It follows from results (S.18), (S.19), and (S.19) that D 0 (or, equivalently, that du d Œ g.u/ D 0) if and only if du n h d x.u/ o d x.u/ i t 0 W x.u/ Œx.u/0 W 0 W D 0: (S.22) Œx.u/0 W 0 W x.u/ t 0 W du du
As is evident from result (S.21), the left side of equality (S.22) is a polynomial (in u), so that the function polyroot (which is part of R) can be used to determine the values of u that satisfy equality (S.22). Moreover, max
fu W 1u8g
jg.u/j D max jg.u/j and fu W u2Ug
max
fu W u2R1 g
jg.u/j D
max
fu W u2U0 g
jg.u/j;
(S.23)
where U is the finite set whose elements consist of 1 and 8 and of any values of u in the interval 1 u 8 that satisfy equality (S.22) and where U0 is the finite set whose elements consist of all of the values of u that satisfy equality (S.22). d Œ h.u/ d h.u/ D 0 (or, equivalently, D 0) if and only if Similarly, du du d x.u/ t0 W D 0: (S.24) du And the left side of equality (S.24) is a second-degree polynomial (in u), so that the values of u that satisfy equality (S.24) can be easily determined, and max
fu W 1u8g
jh.u/j D
max
fu W u2U g
jh.u/j;
(S.25)
103
Confidence Intervals (or Sets) and Tests of Hypotheses
where U is the finite set whose elements consist of 1 and 8 and of any values of u in the interval 1 u 8 that satisfy equality (S.24). Results (S.23) and (S.25) were used in obtaining Monte Carlo approximations to c:10 and c:10 . Monte Carlo approximations were determined from 599999 draws with the following results: : : c:10 D 2:5183 or c:10 D 2:5794;
depending on whether or not u is restricted to the interval 1 u 8, and : c:10 D 1:3516: (d)
δ ^ δ I (1) I (2)
90
I (3) I (4) 80
70
60
50
40
30
20
10
u 1
2
3
4
5
6
7
8
EXERCISE 19. Taking the setting to be that of the final four parts of Section 7.3b (and adopting the notation and terminology employed therein) and taking GQ 2 to be the group of transformations consisting of the totality of the groups GQ 2 .˛.0/ / (˛.0/ 2 RM ) and GQ 3 the group consisting of
104
Confidence Intervals (or Sets) and Tests of Hypotheses
the totality of the groups G̃_3(α^{(0)}) (α^{(0)} ∈ R^M), show that (1) if a confidence set Ã(z) for α is equivariant with respect to the groups G̃_0 and G̃_2(0), then it is equivariant with respect to the group G̃_2 and (2) if a confidence set Ã(z) for α is equivariant with respect to the groups G̃_0 and G̃_3(0), then it is equivariant with respect to the group G̃_3.
Solution. (1) Suppose that Ã(z) is equivariant with respect to the groups G̃_0 and G̃_2(0). Then, for every strictly positive scalar k (and every value of z),
Ã(kz) = Ã[T̃_2(z; 0)] = {ᾱ : ᾱ = kα̇, α̇ ∈ Ã(z)}.   (S.26)
And [in combination with the equivariance of Ã(z) with respect to the group G̃_0], result (S.26) implies that for every M×1 vector a and every strictly positive scalar k,
Ã[(kz_1 + a)', (kz_2)', (kz_3)']' = Ã[T̃_0(kz)] = {α̈ : α̈ = a + ᾱ, ᾱ ∈ Ã(kz)} = {α̈ : α̈ = a + kα̇, α̇ ∈ Ã(z)}.   (S.27)
Moreover, upon setting a = α^{(0)} − kα^{(0)} in result (S.27), we find that (for every M×1 vector α^{(0)} and every strictly positive scalar k)
Ã{[α^{(0)} + k(z_1 − α^{(0)})]', (kz_2)', (kz_3)'}' = {α̈ : α̈ = α^{(0)} + k(α̇ − α^{(0)}), α̇ ∈ Ã(z)}.
Thus, Ã(z) is equivariant with respect to the group G̃_2.
(2) That the equivariance of Ã(z) with respect to the groups G̃_0 and G̃_3(0) implies its equivariance with respect to the group G̃_3 can be established via an analogous argument.
EXERCISE 20. Taking the setting to be that of Section 7.4e (and adopting the assumption of normality and the notation and terminology employed therein), suppose that M = 1, and write α̂ for α̂, α for α, and α^{(0)} for α^{(0)} (scalar notation replacing the vector notation). Further, let φ̃(α̂, ω̂, d) represent the critical function of an arbitrary (possibly randomized) level-α̇ test of the null hypothesis H̃_0: α = α^{(0)} versus the alternative hypothesis H̃_1: α ≠ α^{(0)}, and let π̃(α, ω, σ) represent its power function [so that π̃(α, ω, σ) = E[φ̃(α̂, ω̂, d)]]. And define s = (α̂ − α^{(0)})/[(α̂ − α^{(0)})² + d'd]^{1/2} and w = (α̂ − α^{(0)})² + d'd, denote by φ̈(s, w, ω̂) the critical function of a level-α̇ test (of H̃_0 versus H̃_1) that depends on α̂, ω̂, and d only through the values of s, w, and ω̂, and write E_0 for the expectation operator E in the special case where α = α^{(0)}.
(a) Show that if the level-α̇ test with critical function φ̃(α̂, ω̂, d) is an unbiased test, then
π̃(α^{(0)}, ω, σ) = α̇ for all ω and σ   (E.3)
and
∂π̃(α, ω, σ)/∂α |_{α=α^{(0)}} = 0 for all ω and σ.   (E.4)
(b) Show that
∂π̃(α, ω, σ)/∂α = E[ ((α̂ − α)/σ²) φ̃(α̂, ω̂, d) ].   (E.5)
(c) Show that corresponding to the level-α̇ test (of H̃_0) with critical function φ̃(α̂, ω̂, d), there is a (level-α̇) test that depends on α̂, ω̂, and d only through the values of s, w, and ω̂ and that has the same power function as the test with critical function φ̃(α̂, ω̂, d).
(d) Show that when α = α^{(0)}, (1) w and ω̂ form a complete sufficient statistic and (2) s is statistically independent of w and ω̂ and has an absolutely continuous distribution, the pdf of which is the pdf h(·) given by result (6.4.7).
Q ˛; R w; /, (e) Show that when the critical function . O ; O d/ (of the level- P test) is of the form .s; O condition (E.3) is equivalent to the condition R w; / E0 Œ .s; O j w; O D P (wp1) (E.6) and condition (E.4) is equivalent to the condition R w; / E0 Œsw 1=2 .s; O j w; O D 0 (wp1).
(E.7)
(f) Using the generalized Neyman-Pearson lemma (Lehmann and Romano 2005b, sec. 3.6; Shao R w; / 2010, sec. 6.1.1), show that among critical functions of the form .s; O that satisfy (for any particular values of w and ) O the conditions R w; / R w; / E0 Œ .s; O j w; O D P and E0 Œsw 1=2 .s; O j w; O D 0; (E.8) R w; / the value of EŒ .s; O j w; O (at those particular values of w and ) O is maximized [for any particular value of ˛ (¤ ˛ .0/ ) and any particular values of and ] when the critical function is taken to be the critical function R .s; w; / O defined (for all s, w, and ) O as follows: ( 1; if s < c or s > c, R .s; w; / O D 0; if c s c,
where c is the upper 100(α̇/2)% point of the distribution with pdf h(·) [given by result (6.4.7)].
(g) Use the results of the preceding parts to conclude that among all level-α̇ tests of H̃_0 versus H̃_1 that are unbiased, the size-α̇ two-sided t test is a UMP test.
Solution. Let z = (α̂, ω̂', d')', and observe that (by definition)
π̃(α, ω, σ) = ∫_{R^N} φ̃(α̂, ω̂, d) f_1(α̂; α, σ) f_2(ω̂; ω, σ) f_3(d; σ) dz,   (S.28)
where f_1(·; α, σ) is the pdf of an N(α, σ²) distribution, f_2(·; ω, σ) the pdf of an N(ω, σ²I_{P−1}) distribution, and f_3(·; σ) the pdf of an N(0, σ²I_{N−P}) distribution. Observe also that
f_1(α̂; α, σ) = (2πσ²)^{−1/2} exp[−(α̂ − α)²/(2σ²)].   (S.29)
(a) To verify result (E.3), it suffices to observe that for any particular values of ω and σ, π̃(α, ω, σ) is a continuous function of α and hence if π̃(α^{(0)}, ω, σ) < α̇, then π̃(α^{(0)}+δ, ω, σ) < α̇ for some sufficiently small nonzero scalar δ, in which case the test [with critical function φ̃(α̂, ω̂, d)] would not be unbiased. And to verify result (E.4), it suffices to observe that if (for any particular values of ω and σ) the value of the partial derivative of π̃(α, ω, σ) (with respect to α) at α = α^{(0)} were either less than 0 or greater than 0, then [since π̃(α^{(0)}, ω, σ) ≤ α̇] π̃(α^{(0)}+δ, ω, σ) < α̇ for some sufficiently small strictly positive or strictly negative scalar δ, in which case the test would not be unbiased.
(b) Result (E.5) follows from result (S.28) upon observing [in light of result (S.29)] that
∂f_1(α̂; α, σ)/∂α = (2πσ²)^{−1/2} exp[−(α̂ − α)²/(2σ²)] ∂[−(α̂ − α)²/(2σ²)]/∂α = ((α̂ − α)/σ²) f_1(α̂; α, σ).
(c) The observable random variables s and w and the observable random vector ω̂ form a sufficient statistic, as can be readily verified upon observing that [as α ranges over R^1, ω over R^{P−1}, and σ over the interval (0, ∞)] the pdf of the joint distribution of α̂, ω̂, and d forms an exponential family. Thus, the conditional expectation E[φ̃(α̂, ω̂, d) | s, w, ω̂] does not depend on α, ω, or σ and hence (when regarded as a function of s, w, and ω̂) can serve as the critical function of a test of H̃_0. Moreover, this test is of level α̇ and in fact has the same power function as the test with critical function φ̃(α̂, ω̂, d), as is evident upon observing that
E{E[φ̃(α̂, ω̂, d) | s, w, ω̂]} = E[φ̃(α̂, ω̂, d)].
(d) Suppose that α = α^{(0)}. (1) That w and ω̂ form a complete sufficient statistic is evident
from the pdf of the joint distribution of α̂, ω̂, and d upon applying results [like those discussed by Lehmann and Romano (2005b, secs. 2.7 and 4.3)] on exponential families. (2) That s is statistically independent of w and ω̂ and has an absolutely continuous distribution with pdf h(·) is evident from the results of Section 6.4a (upon observing that α̂ and d are distributed independently of ω̂).
(e) That condition (E.3) is equivalent to condition (E.6) and condition (E.4) to condition (E.7) [when the critical function is of the form φ̈(s, w, ω̂)] is evident from the results of Parts (d) and (b) upon recalling the definition of completeness and upon observing that
E_0[φ̈(s, w, ω̂)] = E_0{E_0[φ̈(s, w, ω̂) | w, ω̂]},   (S.30)
that α̂ − α^{(0)} = s w^{1/2}, and that
E_0[s w^{1/2} φ̈(s, w, ω̂)] = E_0{E_0[s w^{1/2} φ̈(s, w, ω̂) | w, ω̂]}.   (S.31)
(f) When α = α^{(0)}, the conditional distribution of s given w and ω̂ is the absolutely continuous distribution with pdf h(·) [as is evident from the result of Part (d)]. Moreover, h(−s) = h(s) for "every" value of s. Thus,
E_0[φ̈*(s, w, ω̂) | w, ω̂] = ∫_{−∞}^{−c} h(s) ds + ∫_{c}^{∞} h(s) ds = 2 ∫_{c}^{∞} h(s) ds = α̇
and
E_0[s w^{1/2} φ̈*(s, w, ω̂) | w, ω̂] = ∫_{−∞}^{−c} s w^{1/2} h(s) ds + ∫_{c}^{∞} s w^{1/2} h(s) ds = −∫_{c}^{∞} s w^{1/2} h(s) ds + ∫_{c}^{∞} s w^{1/2} h(s) ds = 0.
When α ≠ α^{(0)}, the conditional distribution of s given w and ω̂ is an absolutely continuous distribution with a pdf q*(s | w) that is proportional (for any particular values of w and ω̂) to the pdf of the joint distribution of w and s; an expression for the latter pdf is obtainable from result (6.4.32). Together, results (6.4.7) and (6.4.32) imply that
q*(s | w) = k_0 e^{w^{1/2} s (α − α^{(0)})/σ²} h(s)
for some "constant" k_0. Moreover, there exist "constants" k_1 and k_2 such that
k_0 e^{w^{1/2} c (α − α^{(0)})/σ²} = k_1 + k_2 w^{1/2} c  and  k_0 e^{w^{1/2} (−c) (α − α^{(0)})/σ²} = k_1 + k_2 w^{1/2} (−c).
And these constants are such that
k_0 e^{w^{1/2} s (α − α^{(0)})/σ²} > k_1 + k_2 w^{1/2} s  ⇔  s < −c or s > c  ⇔  φ̈*(s, w, ω̂) = 1.
Thus, it follows from the generalized Neyman-Pearson lemma that E[φ̈(s, w, ω̂) | w, ω̂] achieves its maximum value [with respect to the choice of the function φ̈(·, ·, ·) from among those candidates that satisfy conditions (E.8)] when φ̈(·, ·, ·) is taken to be the function φ̈*(·, ·, ·).
(g) The size-α̇ two-sided t test of H̃_0 versus H̃_1 is equivalent to the test with critical function φ̈*(s, w, ω̂), as can be readily verified by making use of relationship (6.4.31). Moreover, among tests with a critical function of the form φ̈(s, w, ω̂) that satisfy conditions (E.6) and (E.7), the test with critical function φ̈*(s, w, ω̂) is a UMP test, as is evident from the result of Part (f). And in light of the results of Parts (c) and (e), it follows that the test with critical function φ̈*(s, w, ω̂) is a UMP test among all tests with a critical function φ̃(α̂, ω̂, d) that satisfies conditions (E.3) and (E.4). Thus, since the critical function φ̃(α̂, ω̂, d) of any level-α̇ unbiased test satisfies conditions (E.3) and (E.4) [as is evident from the result of Part (a)] and since the size-α̇ two-sided t test is an unbiased test, we conclude that among all level-α̇ unbiased tests, the size-α̇ two-sided t test is a UMP test.
EXERCISE 21. Taking the setting to be that of Section 7.4e and adopting the assumption of normality and the notation and terminology employed therein, let π̃(α, ω, σ) represent the power function of a size-α̇ similar test of H_0: τ = τ^{(0)} or H̃_0: α = α^{(0)} versus H_1: τ ≠ τ^{(0)} or H̃_1: α ≠ α^{(0)}. Show that min_{α∈S(α^{(0)},ρ)} π̃(α, ω, σ) attains its maximum value when the size-α̇ similar test is taken to be the size-α̇ F test.
Solution. Define π̃_min(ω, σ, ρ) = min_{α∈S(α^{(0)},ρ)} π̃(α, ω, σ). Then, clearly,
π̃_min(ω, σ, ρ) ≤ π̄(ω, σ, ρ).   (S.32)
Further, let π̃_F,min(ω, σ, ρ) and π̄_F(ω, σ, ρ) represent π̃_min(ω, σ, ρ) and π̄(ω, σ, ρ) in the special case where π̃(α, ω, σ) is the power function of the size-α̇ F test, and observe that
π̃_F,min(ω, σ, ρ) = π̄_F(ω, σ, ρ).   (S.33)
According to the main result of Section 7.4e,
π̄(ω, σ, ρ) ≤ π̄_F(ω, σ, ρ).   (S.34)
Together, results (S.32), (S.33), and (S.34) imply that
π̃_min(ω, σ, ρ) ≤ π̃_F,min(ω, σ, ρ).
Thus, π̃_min(ω, σ, ρ) attains its maximum value when the size-α̇ similar test is taken to be the size-α̇ F test.
EXERCISE 22. Taking the setting to be that of Section 7.4e and adopting the assumption of normality and the notation and terminology employed therein, let φ̃(α̂, ω̂, d) represent the critical function of an arbitrary size-α̇ test of the null hypothesis H̃_0: α = α^{(0)} versus the alternative hypothesis H̃_1: α ≠ α^{(0)}. Further, let π̃(·, ·, ·; φ̃) represent the power function of the test with critical function φ̃(·, ·, ·), so that π̃(α, ω, σ; φ̃) = E[φ̃(α̂, ω̂, d)]. And take π̃*(·, ·, ·) to be the function defined as follows:
π̃*(α, ω, σ) = sup_{φ̃} π̃(α, ω, σ; φ̃).
This function is called the envelope power function.
(a) Show that π̃*(α, ω, σ) depends on α only through the value of (α − α^{(0)})'(α − α^{(0)}).
(b) Let φ̃_F(α̂, ω̂, d) represent the critical function of the size-α̇ F test of H̃_0 versus H̃_1. And as a basis for evaluating the test with critical function φ̃(·, ·, ·), consider the use of the criterion
max_{α∈S(α^{(0)},ρ)} [π̃*(α, ω, σ) − π̃(α, ω, σ; φ̃)],   (E.9)
which reflects [for α ∈ S(α^{(0)}, ρ)] the extent to which the power function of the test deviates from the envelope power function. Using the result of Exercise 21 (or otherwise), show that the size-α̇ F test is the "most stringent" size-α̇ similar test in the sense that (for "every" value of ρ) the value attained by the quantity (E.9) when φ̃ = φ̃_F is a minimum among those attained when φ̃ is the critical function of some (size-α̇) similar test.
Solution. (a) Let α_1 and α_2 represent any two values of α such that (α_2 − α^{(0)})'(α_2 − α^{(0)}) = (α_1 − α^{(0)})'(α_1 − α^{(0)}). It suffices to show that corresponding to any critical function φ̃_1(·, ·, ·), there exists a second critical function φ̃_2(·, ·, ·) such that π̃(α_2, ω, σ; φ̃_2) = π̃(α_1, ω, σ; φ̃_1). According to Lemma 5.9.9, there exists an orthogonal matrix O such that α_2 − α^{(0)} = O(α_1 − α^{(0)}). Take φ̃_2(·, ·, ·) to be a critical function defined [in terms of φ̃_1(·, ·, ·)] as follows:
φ̃_2(α̂, ω̂, d) = φ̃_1[α^{(0)} + O'(α̂ − α^{(0)}), ω̂, d].
And observe that
π̃(α_2, ω, σ; φ̃_2) = ∫_{R^N} φ̃_2(α̂, ω̂, d) f_1(α̂; α_2, σ) f_2(ω̂; ω, σ) f_3(d; σ) dz
= ∫_{R^N} φ̃_1[α^{(0)} + O'(α̂ − α^{(0)}), ω̂, d] f_1[α^{(0)} + O'(α̂ − α^{(0)}); α_1, σ] f_2(ω̂; ω, σ) f_3(d; σ) dz.
Then, upon writing z* for the vector (α̂*', ω̂', d')', where α̂* = α^{(0)} + O'(α̂ − α^{(0)}), and upon making a change of variables from z to z*, we find that
π̃(α_2, ω, σ; φ̃_2) = ∫_{R^N} φ̃_1(α̂*, ω̂, d) f_1(α̂*; α_1, σ) f_2(ω̂; ω, σ) f_3(d; σ) dz* = π̃(α_1, ω, σ; φ̃_1).
(b) According to Part (a), π̃*(α, ω, σ) has the same value for all α ∈ S(α^{(0)}, ρ). And as is evident from Exercise 21, min_{α∈S(α^{(0)},ρ)} π̃(α, ω, σ; φ̃) attains its maximum value and hence π̃*(α, ω, σ) − min_{α∈S(α^{(0)},ρ)} π̃(α, ω, σ; φ̃) its minimum value [subject to the constraint that φ̃(·, ·, ·) is the critical function of a (size-α̇) similar test] when φ̃ = φ̃_F. Moreover,
max_{α∈S(α^{(0)},ρ)} [π̃*(α, ω, σ) − π̃(α, ω, σ; φ̃)] = π̃*(α, ω, σ) − min_{α∈S(α^{(0)},ρ)} π̃(α, ω, σ; φ̃).
Thus, the quantity (E.9) attains its minimum value [subject to the constraint that φ̃(·, ·, ·) is the critical function of a (size-α̇) similar test] when φ̃ = φ̃_F.
EXERCISE 23. Take the setting to be that of Section 7.5a (and adopt the assumption of normality and the notation and terminology employed therein). Show that among all tests of the null hypothesis H_0^+: τ ≤ τ^{(0)} or H̃_0^+: α ≤ α^{(0)} (versus the alternative hypothesis H_1^+: τ > τ^{(0)} or H̃_1^+: α > α^{(0)}) that are of level α̇ and that are unbiased, the size-α̇ one-sided t test is a UMP test. (Hint. Proceed stepwise as in Exercise 20.)
Solution. Let φ̃(α̂, ω̂, d) represent the critical function of an arbitrary level-α̇ unbiased (possibly randomized) test of the null hypothesis H_0^+: τ ≤ τ^{(0)} or H̃_0^+: α ≤ α^{(0)} (versus the alternative hypothesis H_1^+: τ > τ^{(0)} or H̃_1^+: α > α^{(0)}), and let π̃(α, ω, σ) represent its power function [so that
π̃(α, ω, σ) = E[φ̃(α̂, ω̂, d)]]. Then,
π̃(α^{(0)}, ω, σ) = α̇ for all ω and σ,   (S.35)
as is evident upon observing that for any particular values of ω and σ, π̃(α, ω, σ) is a continuous function of α, so that if π̃(α^{(0)}, ω, σ) < α̇ for some values of ω and σ, then π̃(α^{(0)}+δ, ω, σ) < α̇ for some sufficiently small strictly positive scalar δ (and for those values of ω and σ), in which case the test [with critical function φ̃(α̂, ω̂, d)] would not be unbiased.
Now, let s = (α̂ − α^{(0)})/[(α̂ − α^{(0)})² + d'd]^{1/2} and w = (α̂ − α^{(0)})² + d'd. And observe (in light of the discussion in the next-to-last part of Section 7.3a) that s, w, and ω̂ form a sufficient statistic. Thus, the conditional expectation E[φ̃(α̂, ω̂, d) | s, w, ω̂] does not depend on α, ω, or σ and hence (when regarded as a function of s, w, and ω̂) can serve as the critical function of a test of H_0^+ or H̃_0^+. Moreover,
E{E[φ̃(α̂, ω̂, d) | s, w, ω̂]} = E[φ̃(α̂, ω̂, d)],   (S.36)
so that the test with critical function E[φ̃(α̂, ω̂, d) | s, w, ω̂] has the same power function as the test with critical function φ̃(α̂, ω̂, d), implying in particular that [like the test with critical function φ̃(α̂, ω̂, d)] it is of level α̇ and is unbiased.
Let φ̈(s, w, ω̂) represent the critical function of a level-α̇ unbiased (possibly randomized) test of H_0^+ or H̃_0^+ (versus H_1^+ or H̃_1^+) that depends on α̂, ω̂, and d only through the values of s, w, and ω̂, and write E_0 for the expectation "operator" in the special case where α = α^{(0)}. In light of result (S.36), it suffices to show that the power of the size-α̇ one-sided t test of H_0^+ is uniformly (for every value of α that exceeds α^{(0)} and for all values of ω and σ) greater than or equal to E[φ̈(s, w, ω̂)] [for every choice of the critical function φ̈(·, ·, ·)]. And in light of result (S.35),
E_0[φ̈(s, w, ω̂)] = α̇ for all ω and σ.   (S.37)
Moreover, when α = α^{(0)}, w and ω̂ form a complete sufficient statistic [as is evident from the pdf of the joint distribution of α̂, ω̂, and d upon applying results on exponential families like those discussed by Lehmann and Romano (2005b, secs. 2.7 and 4.3)], so that upon recalling the definition of completeness and upon observing that
E_0[φ̈(s, w, ω̂)] = E_0{E_0[φ̈(s, w, ω̂) | w, ω̂]},
we find that condition (S.37) is equivalent to the condition
E_0[φ̈(s, w, ω̂) | w, ω̂] = α̇ (wp1).   (S.38)
Clearly, s and w are distributed independently of ω̂. Moreover, the joint distribution of s and w is absolutely continuous with a pdf that is obtainable from result (6.4.32). When α = α^{(0)}, s is distributed independently of w as well as ω̂ with a pdf h*(·) that is obtainable from result (6.4.7) simply by replacing N with N − P. When α ≠ α^{(0)}, the conditional distribution of s given w (and ω̂) is absolutely continuous with a pdf q*(s | w) that is proportional (for any particular values of w and ω̂) to the pdf of the joint distribution of s and w. Together, results (6.4.7) and (6.4.32) imply that
q*(s | w)/h*(s) = k e^{w^{1/2} s (α − α^{(0)})/σ²}
for some strictly positive scalar k that does not depend on s. For α > α^{(0)}, the ratio q*(s | w)/h*(s) is an increasing function of s, as is evident upon observing that
d[q*(s | w)/h*(s)]/ds = k w^{1/2} (α − α^{(0)}) σ^{−2} e^{w^{1/2} s (α − α^{(0)})/σ²}.
Thus, upon applying Theorem 7.4.1 (the Neyman-Pearson lemma) with X = s, θ = α, Θ = (−∞, ∞), and θ^{(0)} = α^{(0)} and upon observing that
s ≥ t̄_{α̇}(N−P)/{N−P + [t̄_{α̇}(N−P)]²}^{1/2}  ⇔  z ∈ C̃^+
[analogous to the equivalence (5.6)], we find that subject to the constraint E_0[φ̈(s, w, ω̂) | w, ω̂] = α̇, E[φ̈(s, w, ω̂) | w, ω̂] attains its maximum value for any specified value of α that exceeds α^{(0)} and for any specified values of ω and σ when the critical region of the test is taken to be the critical region C̃^+ or C^+ of the size-α̇ one-sided t test of H_0^+. And in light of the equivalence of conditions (S.37) and (S.38), it follows that subject to the constraint imposed on φ̈(s, w, ω̂) by condition (S.37), E[φ̈(s, w, ω̂)] attains its maximum value for any specified value of α that exceeds α^{(0)} and for any specified values of ω and σ when the critical region of the test is taken to be C̃^+ or C^+. Since there is no loss of generality in restricting attention to tests of H_0^+ or H̃_0^+ with critical functions of the form φ̈(s, w, ω̂), since any level-α̇ unbiased test of H_0^+ or H̃_0^+ with a critical function of the form φ̈(s, w, ω̂) satisfies condition (S.37), and since the size-α̇ one-sided t test of H_0^+ is an unbiased test, we conclude that among level-α̇ unbiased tests of H_0^+ or H̃_0^+ (versus H_1^+ or H̃_1^+), the size-α̇ one-sided t test of H_0^+ is a UMP test.
EXERCISE 24. (a) Let (for an arbitrary positive integer M) f_M(·) represent the pdf of a χ²(M) distribution. Show that (for 0 < x < ∞)
x f_M(x) = M f_{M+2}(x).
(b) Verify [by using Part (a) or by other means] result (6.22).
Solution. (a) Upon recalling result (6.1.16) and that (for any positive scalar α) Γ(α+1) = αΓ(α), we find that (for 0 < x < ∞)
x f_M(x) = {1/[Γ(M/2) 2^{M/2}]} x^{M/2} e^{−x/2}
= M {1/[Γ((M/2)+1) 2^{(M/2)+1}]} x^{M/2} e^{−x/2}
= M {1/[Γ((M+2)/2) 2^{(M+2)/2}]} x^{[(M+2)/2]−1} e^{−x/2}
= M f_{M+2}(x).
(b) It follows from Part (a) that (for 0 < u < ∞)
[u/(N−P)] g(u) = g*(u),
where g*(·) is the pdf of the χ²(N−P+2) distribution. Thus,
∫_{c_0}^{c_1} [u/(N−P) − 1] g(u) du = ∫_{c_0}^{c_1} [g*(u) − g(u)] du
= ∫_{0}^{c_1} [g*(u) − g(u)] du − ∫_{0}^{c_0} [g*(u) − g(u)] du
= G*(c_1) − G(c_1) − [G*(c_0) − G(c_0)].
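The Part (a) identity is easy to confirm numerically in R (dchisq gives the χ² pdf); the points at which it is checked below are arbitrary:

  # Numerical check of x f_M(x) = M f_{M+2}(x) at a few arbitrary points.
  x <- c(0.5, 3, 17); M <- 6
  x * dchisq(x, df = M) - M * dchisq(x, df = M + 2)   # zero up to rounding error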
EXERCISE 25. This exercise is to be regarded as a continuation of Exercise 18. Suppose (as in Exercise 18) that the data (of Section 4.2b) on the lethal dose of ouabain in cats are regarded as the observed values of the elements y_1, y_2, ..., y_N of an N(= 41)-dimensional observable random vector y that follows a G-M model. Suppose further that (for i = 1, 2, ..., 41) E(y_i) = δ(u_i), where u_1, u_2, ..., u_41 are the values of the rate u of injection and where δ(u) is the third-degree polynomial
δ(u) = β_1 + β_2 u + β_3 u² + β_4 u³.
And suppose that the distribution of the vector e of residual effects is N(0, σ²I).
(a) Determine for α̇ = 0.10 and also for α̇ = 0.05 (1) the value of the 100(1−α̇)% lower confidence bound for σ provided by the left end point of interval (6.2) and (2) the value of the 100(1−α̇)% upper confidence bound for σ provided by the right end point of interval (6.3).
(b) Obtain [via an implementation of interval (6.23)] a 90% two-sided strictly unbiased confidence interval for σ.
Solution. Let us adopt the notation employed in Section 7.6a. And observe that S (the residual sum of squares) equals 5650.109 and that N − P* = N − P = 37 [so that the estimator σ̂ of σ obtained by taking the square root of the usual (unbiased) estimator S/(N−P) of σ² is equal to 12.357].
(a) When α̇ = 0.10, χ̄²_{α̇} = 48.363 and χ̄²_{1−α̇} = 26.492; and when α̇ = 0.05, χ̄²_{α̇} = 52.192 and χ̄²_{1−α̇} = 24.075. Thus, (1) the value of the 90% lower confidence bound for σ is 10.809, and the value of the 95% lower confidence bound is 10.405; and (2) the value of the 90% upper confidence bound for σ is 14.604, and the value of the 95% upper confidence bound is 15.320.
(b) When α̇ = 0.10, the value α̇_1* of α̇_1 that is a solution to equation (6.19) is found to be 0.04205, and the corresponding value α̇_2* of α̇_2 is α̇_2* = α̇ − α̇_1* = 0.05795. And χ̄²_{.04205} = 53.088, and χ̄²_{1−.05795} = χ̄²_{.94205} = 24.546. Thus, upon setting S = 5650.109, α̇_1 = 0.04205, and α̇_2 = 0.05795 in the interval (6.23), we obtain as a 90% strictly unbiased confidence interval for σ the interval
10.316 ≤ σ ≤ 15.172.
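The Part (a) bounds can be reproduced in a few lines of R from S = 5650.109 and N − P = 37, with qchisq supplying the χ² quantiles:

  # Lower and upper confidence bounds for sigma [intervals (6.2) and (6.3)].
  S <- 5650.109; df <- 37
  sqrt(S / qchisq(c(0.90, 0.95), df))   # 90% and 95% lower bounds: 10.809, 10.405
  sqrt(S / qchisq(c(0.10, 0.05), df))   # 90% and 95% upper bounds: 14.604, 15.320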
EXERCISE 26. Take the setting to be that of the final part of Section 7.6c, and adopt the notation and terminology employed therein. In particular, take the canonical form of the G-M model to be that identified with the special case where M = P, so that α and α̂ are P-dimensional. Show that the (size-α̇) test of the null hypothesis H_0^+: σ ≤ σ_0 (versus the alternative hypothesis H_1^+: σ > σ_0) with critical region C^+ is UMP among all level-α̇ tests. Do so by carrying out the following steps.
(a) Let φ(T, α̂) represent the critical function of a level-α̇ test of H_0^+ versus H_1^+ [that depends on the vector d only through the value of T (= d'd/σ_0²)]. And let π(σ, α) represent the power function of the test with critical function φ(T, α̂). Further, let σ* represent any particular value of σ greater than σ_0, let α* represent any particular value of α, and denote by h(·; σ) the pdf of the distribution of T, by f(·; α, σ) the pdf of the distribution of α̂, and by s(·) the pdf of the N[α*, (σ*² − σ_0²)I] distribution. Show (1) that
∫_{R^P} π(σ_0, α) s(α) dα ≤ α̇   (E.10)
and (2) that
∫_{R^P} π(σ_0, α) s(α) dα = ∫_{R^P} ∫_{0}^{∞} φ(t, α̂) h(t; σ_0) f(α̂; α*, σ*) dt dα̂.   (E.11)
(b) By, for example, using a version of the Neyman-Pearson lemma like that stated by Casella and Berger (2002) in the form of their Theorem 8.3.12, show that among those choices for the critical function φ(T, α̂) for which the power function π(σ, ·) satisfies condition (E.10), π(σ*, α*) can be maximized by taking φ(T, α̂) to be the critical function φ*(T, α̂) defined as follows:
φ*(t, α̂) = 1 when t > χ̄²_{α̇}, and φ*(t, α̂) = 0 when t ≤ χ̄²_{α̇}.
(c) Use the results of Parts (a) and (b) to reach the desired conclusion, that is, to show that the test of H_0^+ (versus H_1^+) with critical region C^+ is UMP among all level-α̇ tests.
Solution. (a) (1) By definition, the test with critical function φ(T, α̂) is of level α̇, so that π(σ_0, α) ≤ α̇ for all α. And it follows that
∫_{R^P} π(σ_0, α) s(α) dα ≤ α̇.
(2) Clearly,
∫_{R^P} π(σ_0, α) s(α) dα = ∫_{R^P} ∫_{0}^{∞} φ(t, α̂) h(t; σ_0) [∫_{R^P} f(α̂; α, σ_0) s(α) dα] dt dα̂.
Moreover,
∫_{R^P} f(α̂; α, σ_0) s(α) dα = f(α̂; α*, σ*),
as is evident upon regarding α̂ and α as random vectors whose joint distribution is MVN with mean vector (α*', α*')' and variance-covariance matrix
[ σ*²I            (σ*² − σ_0²)I ]
[ (σ*² − σ_0²)I   (σ*² − σ_0²)I ];
when α̂ and α are so regarded, f(α̂; α, σ_0) is the pdf of the conditional distribution of α̂ given α and s(α) is the pdf of the marginal distribution of α, in which case ∫_{R^P} f(α̂; α, σ_0) s(α) dα equals the pdf of the marginal distribution of α̂.
(b) Clearly, the value π(σ*, α*) (when σ = σ* and α = α*) of the power function π(σ, α) of the test with critical function φ(T, α̂) is expressible as
π(σ*, α*) = ∫_{R^P} ∫_{0}^{∞} φ(t, α̂) h(t; σ*) f(α̂; α*, σ*) dt dα̂.
Moreover, the ratio h(t; σ*)/h(t; σ_0) is (for t > 0) a strictly increasing function of t, so that (for some strictly positive constant k) φ*(t, α̂) is reexpressible in the form
φ*(t, α̂) = 1 when h(t; σ*) f(α̂; α*, σ*) > k h(t; σ_0) f(α̂; α*, σ*), and φ*(t, α̂) = 0 when h(t; σ*) f(α̂; α*, σ*) ≤ k h(t; σ_0) f(α̂; α*, σ*).
And upon recalling result (E.11) and applying the Neyman-Pearson lemma, it follows that π(σ*, α*) can be maximized [subject to the constraint imposed on the choice of φ(T, α̂) by condition (E.10)] by taking φ(T, α̂) to be the critical function φ*(T, α̂).
(c) Recall (from the final part of Section 7.6c) that corresponding to any (possibly randomized) test of H_0^+ (versus H_1^+) there is a test with a critical function of the form φ(T, α̂) that has the same power function. And observe that the test of H_0^+ with critical region C^+ is identical to the test with critical function φ*(T, α̂). Accordingly, let us restrict attention to tests with a critical function of the form φ(T, α̂) and observe that (for purposes of showing that the test with critical region C^+ is UMP among all level-α̇ tests) there is no loss of generality in doing so.
Among tests with a critical function for which the power function π(σ, ·) satisfies condition (E.10), π(σ*, α*) is maximized by taking the critical function to be φ*(T, α̂) [as is evident from Part (b)]. Moreover, the set consisting of all tests that are of level α̇ is a subset of the set consisting of all tests with a power function that satisfies condition (E.10). Thus, since the test with critical function φ*(T, α̂) is of level α̇, we conclude that among all level-α̇ tests, π(σ*, α*) is maximized by taking the critical function of the test to be φ*(T, α̂). It remains only to observe that this conclusion does not depend on the choice of σ* or α*.
EXERCISE 27. Take the context to be that of Section 7.7a, and adopt the notation employed therein. Using Markov's inequality (e.g., Casella and Berger 2002, lemma 3.8.3; Bickel and Doksum 2001, sec. A.15) or otherwise, verify inequality (7.12), that is, the inequality
Pr(x_i > c for k or more values of i) ≤ (1/k) Σ_{i=1}^{M} Pr(x_i > c).
Solution. Let I(x) represent an indicator function defined (for x ∈ R^1) as follows: I(x) = 1 if x > c, and I(x) = 0 otherwise. Then, upon making use of Markov's inequality, we find that
Pr(x_i > c for k or more values of i) = Pr[I(x_i) = 1 for k or more values of i]
= Pr[Σ_{i=1}^{M} I(x_i) ≥ k]
≤ (1/k) E[Σ_{i=1}^{M} I(x_i)]
= (1/k) Σ_{i=1}^{M} E[I(x_i)]
= (1/k) Σ_{i=1}^{M} Pr(x_i > c).
EXERCISE 28.
(a) Letting X represent any random variable whose values are confined to the interval [0, 1] and letting γ (0 < γ < 1) represent a constant, show (1) that
E(X) ≤ γ Pr(X ≤ γ) + Pr(X > γ)   (E.12)
and then use inequality (E.12) along with Markov's inequality (e.g., Casella and Berger 2002, sec. 3.8) to show (2) that
[E(X) − γ]/(1 − γ) ≤ Pr(X > γ) ≤ E(X)/γ.   (E.13)
(b) Show that the requirement that the false discovery rate (FDR) satisfy condition (7.45) and the requirement that the false discovery proportion (FDP) satisfy condition (7.46) are related as follows: (1) if FDR ≤ δ, then Pr(FDP > γ) ≤ δ/γ; and (2) if Pr(FDP > γ) ≤ ε, then FDR ≤ γ + (1 − γ)ε.
Solution. (a) (1) Clearly,
E(X) = E(X | X ≤ γ) Pr(X ≤ γ) + E(X | X > γ) Pr(X > γ) ≤ γ Pr(X ≤ γ) + Pr(X > γ).
(2) That [E(X) − γ]/(1 − γ) ≤ Pr(X > γ) is evident from inequality (E.12) upon observing that Pr(X ≤ γ) = 1 − Pr(X > γ). That Pr(X > γ) ≤ E(X)/γ follows from Markov's inequality.
(b) (1) If FDR ≤ δ, then upon applying the inequality Pr(X > γ) ≤ E(X)/γ with X = FDP, we find that Pr(FDP > γ) ≤ FDR/γ ≤ δ/γ.
(2) If Pr(FDP > γ) ≤ ε, then upon applying the inequality [E(X) − γ]/(1 − γ) ≤ Pr(X > γ) (with X = FDP), we find that [FDR − γ]/(1 − γ) ≤ Pr(FDP > γ) and hence that
FDR ≤ γ + (1 − γ) Pr(FDP > γ) ≤ γ + (1 − γ)ε.
EXERCISE 29. Taking the setting to be that of Section 7.7 and adopting the terminology and notation employed therein, consider the use of a multiple-comparison procedure in testing (for every i ∈ I = {1, 2, ..., M}) the null hypothesis H_i^{(0)}: τ_i = τ_i^{(0)} versus the alternative hypothesis H_i^{(1)}: τ_i ≠ τ_i^{(0)} (or H_i^{(0)}: τ_i ≤ τ_i^{(0)} versus H_i^{(1)}: τ_i > τ_i^{(0)}). And denote by T the set of values of i ∈ I for which H_i^{(0)} is true and by F the set for which H_i^{(0)} is false. Further, denote by M_T the size of the set T and by X_T the number of values of i ∈ T for which H_i^{(0)} is rejected. Similarly, denote by M_F the size of the set F and by X_F the number of values of i ∈ F for which H_i^{(0)} is rejected. Show that
(a) in the special case where M_T = 0, FWER = FDR = 0;
(b) in the special case where M_T = M, FWER = FDR; and
(c) in the special case where 0 < M_T < M, FWER ≥ FDR, with equality holding if and only if Pr(X_T > 0 and X_F > 0) = 0.
Solution. By definition,
FDP = X_T/(X_T + X_F) if X_T + X_F > 0, and FDP = 0 if X_T + X_F = 0.
Thus,
FDP = 0 ⇔ X_T = 0;   (S.39)
FDP = 1 ⇔ X_T > 0 and X_F = 0;   (S.40)
and
0 < FDP < 1 ⇔ X_T > 0 and X_F > 0.   (S.41)
And in light of results (S.39), (S.40), and (S.41),
FWER = Pr(X_T > 0) = Pr(FDP > 0).   (S.42)
(a) Suppose that M_T = 0. Then, clearly, X_T = 0. And in light of result (S.39), FDP = 0. Thus,
FWER = Pr(X_T > 0) = 0, and FDR = E(FDP) = 0.
(b) and (c) Suppose that Pr(X_T > 0 and X_F > 0) = 0. Then, in light of result (S.41),
Pr(FDP = 0 or FDP = 1) = 1.
Thus,
FDR = E(FDP) = 0·Pr(FDP = 0) + 1·Pr(FDP = 1) = Pr(FDP = 1) = Pr(FDP > 0).
And [in light of result (S.42)] it follows that
FWER = FDR,   (S.43)
which upon observing that
M_T = M ⟹ M_F = 0 ⟹ X_F = 0 ⟹ Pr(X_T > 0 and X_F > 0) = 0
establishes Part (b).
Now, suppose that 0 < M_T < M and that Pr(X_T > 0 and X_F > 0) > 0. And observe that if X_F > 0, then
FDP ≤ M_T/(M_T + 1) < 1.
Then, upon recalling results (S.41) and (S.42), we find that
FDR = E(FDP) = 0·Pr(FDP = 0) + 1·Pr(FDP = 1) + E(FDP | X_T > 0 and X_F > 0) Pr(0 < FDP < 1)
< Pr(FDP = 1) + Pr(0 < FDP < 1) = Pr(FDP > 0) = FWER,
which [in combination with result (S.43)] establishes Part (c).
EXERCISE 30. (a) Let p̂_1, p̂_2, ..., p̂_t represent p-values [so that Pr(p̂_i ≤ u) ≤ u for i = 1, 2, ..., t and for every u ∈ (0, 1)]. Further, let p̂_{(j)} = p̂_{i_j} (j = 1, 2, ..., t), where i_1, i_2, ..., i_t is a permutation of the first t positive integers 1, 2, ..., t such that p̂_{i_1} ≤ p̂_{i_2} ≤ ··· ≤ p̂_{i_t}. And let s represent a positive integer such that s ≤ t, and let c_0, c_1, ..., c_s represent constants such that 0 = c_0 ≤ c_1 ≤ ··· ≤ c_s ≤ 1. Show that
Pr(p̂_{(j)} ≤ c_j for 1 or more values of j ∈ {1, 2, ..., s}) ≤ t Σ_{j=1}^{s} (c_j − c_{j−1})/j.   (E.14)
(b) Take the setting to be that of Section 7.7d, and adopt the notation and terminology employed therein. And suppose that the α_j's of the step-down multiple-comparison procedure for testing the null hypotheses H_1^{(0)}, H_2^{(0)}, ..., H_M^{(0)} are of the form
α_j = t̄_{([γj]+1)β̇/{2(M+[γj]+1−j)}}(N − P)  (j = 1, 2, ..., M)   (E.15)
[where β̇ ∈ (0, 1)]. (1) Show that if
Pr(|t_{u;T}| > t̄_{uβ̇/(2M_T)}(N − P) for 1 or more values of u ∈ {1, 2, ..., K}) ≤ α̇,   (E.16)
then the step-down procedure [with α_j's of the form (E.15)] is such that Pr(FDP > γ) ≤ α̇. (2) Reexpress the left side of inequality (E.16) in terms of the left side of inequality (E.14). (3) Use Part (a) to show that
Pr(|t_{u;T}| > t̄_{uβ̇/(2M_T)}(N − P) for 1 or more values of u ∈ {1, 2, ..., K}) ≤ β̇ Σ_{u=1}^{[γM]+1} 1/u.   (E.17)
(4) Show that the version of the step-down procedure [with α_j's of the form (E.15)] obtained upon setting β̇ = α̇/Σ_{u=1}^{[γM]+1} (1/u) is such that Pr(FDP > γ) ≤ α̇.
Solution. (a) Let j' represent the smallest value of j ∈ {1, 2, ..., s} for which p̂_{(j)} ≤ c_j — if p̂_{(j)} > c_j for j = 1, 2, ..., s, set j' = 0. Then, upon regarding j' as a random variable and observing that {j' = 1}, {j' = 2}, ..., {j' = s} are disjoint events, we find that
Pr(p̂_{(j)} ≤ c_j for 1 or more values of j ∈ {1, 2, ..., s}) = Pr(j' = k for some integer k between 1 and s, inclusive) = Σ_{k=1}^{s} Pr(j' = k).   (S.44)
Now, for purposes of obtaining an upper bound on Σ_{k=1}^{s} Pr(j' = k), let N_j represent the number of p-values that are less than or equal to c_j, and observe that (for j = 1, 2, ..., s)
Σ_{k=1}^{j} k Pr(j' = k) ≤ E(N_j) = Σ_{k=1}^{t} Pr(p̂_k ≤ c_j) ≤ Σ_{k=1}^{t} c_j = t c_j.   (S.45)
Then, upon multiplying both sides of inequality (S.45) by 1/[j(j+1)] for values of j < s and by 1/s for j = s (and upon summing over the s values of j), we obtain the inequality
Σ_{j=1}^{s−1} {1/[j(j+1)]} Σ_{k=1}^{j} k Pr(j' = k) + (1/s) Σ_{k=1}^{s} k Pr(j' = k) ≤ Σ_{j=1}^{s−1} t c_j/[j(j+1)] + t c_s/s.   (S.46)
Moreover, upon interchanging the order of summation [and observing that Σ_{j=k}^{s−1} 1/[j(j+1)] + 1/s = 1/k], the left side of inequality (S.46) is reexpressible as
Σ_{k=1}^{s} (1/k) k Pr(j' = k) = Σ_{k=1}^{s} Pr(j' = k).   (S.47)
And the right side of inequality (S.46) is reexpressible as
t Σ_{j=1}^{s−1} c_j [1/j − 1/(j+1)] + t c_s/s = t Σ_{j=1}^{s} (c_j − c_{j−1})/j.   (S.48)
Thus, inequality (S.46) is reexpressible in the form
Σ_{k=1}^{s} Pr(j' = k) ≤ t Σ_{j=1}^{s} (c_j − c_{j−1})/j
and hence [in light of result (S.44)] is reexpressible in the form of inequality (E.14).
(b) (1) In light of inequality (7.66), we find that [for α_j's of the form (E.15)]
Pr(|t_{u;T}| > t̄_{uβ̇/(2M_T)}(N−P) for 1 or more values of u ∈ {1, 2, ..., K})
≥ Pr(|t_{u;T}| > α_u for 1 or more values of u ∈ {1, 2, ..., K}).
Thus, if condition (E.16) is satisfied, condition (7.62) is also satisfied, in which case the step-down procedure is such that Pr(FDP > γ) ≤ α̇.
(2) Set t = M_T. And letting k_1, k_2, ..., k_{M_T} represent the elements of T, take (for j = 1, 2, ..., M_T) p̂_j to be the value of p for which t̄_{p/2}(N−P) = |t_{k_j}|. Further, set s = K, and take (for u = 1, 2, ..., K) c_u = uβ̇/M_T. Then,
Pr(|t_{u;T}| > t̄_{uβ̇/(2M_T)}(N−P) for 1 or more values of u ∈ {1, 2, ..., K}) = Pr(p̂_{(u)} ≤ c_u for 1 or more values of u ∈ {1, 2, ..., K}).   (S.49)
(3) Upon applying Part (a) to the right side of equality (S.49), we find that
Pr(|t_{u;T}| > t̄_{uβ̇/(2M_T)}(N−P) for 1 or more values of u ∈ {1, 2, ..., K})
≤ M_T [c_1 + Σ_{u=2}^{K} (c_u − c_{u−1})/u]
= β̇ {1 + Σ_{u=2}^{K} [u − (u−1)]/u}
= β̇ Σ_{u=1}^{K} 1/u
≤ β̇ Σ_{u=1}^{[γM]+1} 1/u.
(4) When β̇ = α̇/Σ_{u=1}^{[γM]+1} (1/u), the right side of inequality (E.17) equals α̇. Thus, when β̇ = α̇/Σ_{u=1}^{[γM]+1} (1/u), condition (E.16) is satisfied. And we conclude [on the basis of Part (1)] that upon setting β̇ = α̇/Σ_{u=1}^{[γM]+1} (1/u), the step-down procedure [with α_j's of the form (E.15)] is such that Pr(FDP > γ) ≤ α̇.
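For concreteness, the critical constants (E.15), with β̇ chosen as in Part (4), can be computed in R as follows (the values of M, γ, α̇, and the residual degrees of freedom below are illustrative only):

  # Step-down critical constants of (E.15), beta = alpha/sum_{u=1}^{[gamma*M]+1} 1/u.
  M <- 100; gamma <- 0.1; alpha <- 0.10; df <- 37   # illustrative values
  beta <- alpha / sum(1 / (1:(floor(gamma * M) + 1)))
  j <- 1:M
  k <- floor(gamma * j) + 1                          # [gamma*j] + 1
  alpha.j <- qt(1 - k * beta / (2 * (M + k - j)), df)
  head(alpha.j)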
EXERCISE 31. Take the setting to be that of Section 7.7e, and adopt the notation and terminology employed therein. And take α*_1, α*_2, ..., α*_M to be scalars defined implicitly (in terms of α'_1, α'_2, ..., α'_M) by the equalities
α'_k = Σ_{j=1}^{k} α*_j  (k = 1, 2, ..., M)   (E.18)
or explicitly as
α*_j = Pr(ᾱ_{j−1} ≥ |t| > ᾱ_j) for j = 2, 3, ..., M, and α*_1 = Pr(|t| > ᾱ_1),
where t ~ St(N − P).
(a) Show that the step-up procedure for testing the null hypotheses H_i^{(0)}: τ_i = τ_i^{(0)} (i = 1, 2, ..., M) is such that (1) the FDR is less than or equal to M_T Σ_{j=1}^{M} α*_j/j; (2) when M Σ_{j=1}^{M} α*_j/j < 1, the FDR is controlled at level M Σ_{j=1}^{M} α*_j/j (regardless of the identity of the set T); and (3) in the special case where (for j = 1, 2, ..., M) α'_j is of the form α'_j = jβ̇/M, the FDR is less than or equal to β̇ (M_T/M) Σ_{j=1}^{M} 1/j and can be controlled at level δ by taking β̇ = δ (Σ_{j=1}^{M} 1/j)^{−1}.
(b) The sum Σ_{j=1}^{M} 1/j is "tightly" bounded from above by the quantity
γ + log(M + 0.5) + [24(M + 0.5)²]^{−1},   (E.19)
where γ is the Euler-Mascheroni constant (e.g., Chen 2010) — to 10 significant digits, γ = 0.5772156649. Determine the value of Σ_{j=1}^{M} 1/j and the amount by which this value is exceeded by the value of expression (E.19). Do so for each of the following values of M: 5, 10, 50, 100, 500, 1,000, 5,000, 10,000, 20,000, and 50,000.
(c) What modifications are needed to extend the results encapsulated in Part (a) to the step-up procedure for testing the null hypotheses H_i^{(0)}: τ_i ≤ τ_i^{(0)} (i = 1, 2, ..., M)?
Solution. (a) (1) Starting with expression (7.91) and recalling that (for all i) the sets A_{1;i}, A_{2;i}, ..., A_{M;i} are mutually disjoint and that their union equals R^{M−1}, we find that
FDR = Σ_{i∈T} Σ_{k=1}^{M} (α'_k/k) Pr(t^{(−i)} ∈ A_{k;i} | |t_i| > ᾱ_k)
= Σ_{i∈T} Σ_{k=1}^{M} Σ_{j=1}^{k} α*_j (1/k) Pr(t^{(−i)} ∈ A_{k;i} | |t_i| > ᾱ_k)
= Σ_{i∈T} Σ_{j=1}^{M} α*_j Σ_{k=j}^{M} (1/k) Pr(t^{(−i)} ∈ A_{k;i} | |t_i| > ᾱ_k)
≤ Σ_{i∈T} Σ_{j=1}^{M} α*_j Σ_{k=j}^{M} (1/j) Pr(t^{(−i)} ∈ A_{k;i} | |t_i| > ᾱ_k)
≤ Σ_{i∈T} Σ_{j=1}^{M} α*_j Σ_{k=1}^{M} (1/j) Pr(t^{(−i)} ∈ A_{k;i} | |t_i| > ᾱ_k)
= Σ_{i∈T} Σ_{j=1}^{M} (α*_j/j) Σ_{k=1}^{M} Pr(t^{(−i)} ∈ A_{k;i} | |t_i| > ᾱ_k)
= Σ_{i∈T} Σ_{j=1}^{M} α*_j/j
= M_T Σ_{j=1}^{M} α*_j/j.
(2) Since M_T ≤ M, it follows from Part (1) that (when M Σ_{j=1}^{M} α*_j/j < 1) the FDR is controlled at level M Σ_{j=1}^{M} α*_j/j (regardless of the identity of the set T).
(3) Suppose that (for j = 1, 2, ..., M) α'_j is of the form α'_j = jβ̇/M. Then, α*_1 = α'_1 = β̇/M; and for j = 2, 3, ..., M,
α*_j = α'_j − α'_{j−1} = jβ̇/M − (j−1)β̇/M = β̇/M.
And it follows from Part (2) that [when β̇ < (Σ_{j=1}^{M} 1/j)^{−1}] the FDR is controlled at level β̇ Σ_{j=1}^{M} 1/j. Moreover, when β̇ = δ (Σ_{j=1}^{M} 1/j)^{−1}, β̇ Σ_{j=1}^{M} 1/j = δ.
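The thresholds α'_j = jβ̇/M are those of the Benjamini-Hochberg step-up procedure, and the adjusted level β̇ = δ(Σ_{j=1}^{M} 1/j)^{−1} of Part (3) yields the Benjamini-Yekutieli variant; both are available in base R through p.adjust. A minimal sketch with hypothetical p-values:

  # Step-up rejections via p.adjust; the p-values here are hypothetical.
  p <- c(0.0002, 0.0011, 0.0065, 0.021, 0.094, 0.41, 0.78)
  p.adjust(p, method = "BY") <= 0.05   # thresholds j*beta/M, beta = 0.05/sum(1/j)
  p.adjust(p, method = "BH") <= 0.05   # thresholds j*0.05/M (no harmonic correction)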
(b) Let Dev(M) represent the function of M whose values are those of the difference between expression (E.19) and the sum Σ_{j=1}^{M} 1/j. Then, the values of Σ_{j=1}^{M} 1/j and the values of Dev(M) corresponding to the various values of M are as follows:
  M      Σ_{j=1}^{M} 1/j   Dev(M)         M        Σ_{j=1}^{M} 1/j   Dev(M)
  5      2.28              < 10^{-5}      1,000    7.49              < 10^{-14}
  10     2.93              < 10^{-6}      5,000    9.09              < 10^{-14}
  50     4.50              < 10^{-8}      10,000   9.79              < 10^{-14}
  100    5.19              < 10^{-10}     20,000   10.48             < 10^{-14}
  500    6.79              < 10^{-12}     50,000   11.40             < 10^{-20}
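The entries are easily reproduced in R; note that digamma(1) = −γ, so −digamma(1) furnishes the Euler-Mascheroni constant:

  # Compare the harmonic sums with the bound (E.19).
  harm  <- function(M) sum(1 / (1:M))
  bound <- function(M) -digamma(1) + log(M + 0.5) + 1 / (24 * (M + 0.5)^2)
  M <- c(5, 10, 50, 100, 500, 1000)
  cbind(M, sum = sapply(M, harm), Dev = sapply(M, bound) - sapply(M, harm))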
(c) For j = 1, 2, ..., M, take α'_j = Pr(t > ᾱ_j), rather than α'_j = Pr(|t| > ᾱ_j). And take α*_1, α*_2, ..., α*_M to be the scalars redefined implicitly (via the redefined α'_j's) by the equalities (E.18) or explicitly as
α*_j = Pr(ᾱ_{j−1} ≥ t > ᾱ_j) for j = 2, 3, ..., M, and α*_1 = Pr(t > ᾱ_1).
EXERCISE 32. Take the setting to be that of Section 7.7, and adopt the terminology and notation employed therein. Further, for j = 1, 2, ..., M, let
α̇_j = t̄_{k_j β̇/[2(M+k_j−j)]}(N − P),
where [for some scalar γ (0 < γ < 1)] k_j = [γj] + 1, and let
α̈_j = t̄_{jβ̇/(2M)}(N − P).
And consider two stepwise multiple-comparison procedures for testing the null hypotheses H_1^{(0)}, H_2^{(0)}, ..., H_M^{(0)}: a stepwise procedure for which α_j is taken to be of the form α_j = α̇_j [as in Section 7.7d in devising a step-down procedure for controlling Pr(FDP > γ)] and a stepwise procedure for which α_j is taken to be of the form α_j = α̈_j (as in Section 7.7e in devising a step-up procedure for controlling the FDR). Show that (for j = 1, 2, ..., M) α̇_j ≥ α̈_j, with equality holding if and only if j ≤ 1/(1−γ) or j = M.
Solution. Clearly, it suffices to show that (for j = 1, 2, ..., M)
j/M − k_j/(M + k_j − j) ≥ 0,   (S.50)
with equality holding if and only if j ≤ 1/(1−γ) or j = M. And as can be readily verified,
j/M − k_j/(M + k_j − j) = (M − j)(j − k_j) / {M[M − (j − k_j)]}.   (S.51)
Moreover, for j = 2, 3, ..., M,
[γj] − 1 ≤ γj − 1 = γ(j−1) − (1−γ) < γ(j−1),
implying (since [γj] − 1 is an integer) that
[γj] − 1 ≤ [γ(j−1)],
or equivalently that
k_j − k_{j−1} ≤ 1.   (S.52)
And upon observing that 1 − k_1 = 0 and making repeated use of inequality (S.52), it follows that (for j = 1, 2, ..., M)
M − k_M ≥ j − k_j ≥ 0.   (S.53)
Together, results (S.51) and (S.53) validate inequality (S.50). As is evident from equality (S.51), equality holds in inequality (S.50) if and only if j = M or j − k_j = 0. Moreover,
j − k_j = 0 ⇔ [γj] = j − 1,
and (since both [γj] and j − 1 are integers and since γj < j)
[γj] = j − 1 ⇔ γj ≥ j − 1 ⇔ j ≤ 1/(1−γ).
Thus, equality holds in inequality (S.50) if and only if j ≤ 1/(1−γ) or j = M.
EXERCISE 33. Take the setting to be that of Section 7.7f, and adopt the terminology and notation employed therein. And consider a multiple-comparison procedure in which (for i = 1, 2, ..., M) the ith of the M null hypotheses H_1^{(0)}, H_2^{(0)}, ..., H_M^{(0)} is rejected if |t_i^{(0)}| > c, where c is a strictly positive constant. Further, recall that T is the subset of the set I = {1, 2, ..., M} such that i ∈ T if H_i^{(0)} is true, denote by R the subset of I such that i ∈ R if H_i^{(0)} is rejected, and (for i = 1, 2, ..., M) take X_i to be a random variable defined as follows:
X_i = 1 if |t_i^{(0)}| > c, and X_i = 0 if |t_i^{(0)}| ≤ c.
(a) Show that
E[(1/M) Σ_{i∈T} X_i] = (M_T/M) Pr(|t| > c) ≤ Pr(|t| > c),
where t ~ St(100).
(b) Based on the observation that [when (1/M) Σ_{i=1}^{M} X_i > 0]
FDP = [(1/M) Σ_{i∈T} X_i] / [(1/M) Σ_{i=1}^{M} X_i],
on the reasoning that for large M the quantity (1/M) Σ_{i=1}^{M} X_i can be regarded as a (strictly positive) constant, and on the result of Part (a), the quantity M_T Pr(|t| > c)/M_R can be regarded as an "estimator" of the FDR [= E(FDP)] and M Pr(|t| > c)/M_R can be regarded as an estimator of max_T FDR (Efron 2010, chap. 2) — if M_R = 0, take the estimate of the FDR or of max_T FDR to be 0. Consider the application to the prostate data of the multiple-comparison procedure in the case where c = ċ_{α̇}(k) and also in the case where c = t̄_{kα̇/(2M)}(100). Use the information provided by the entries in Table 7.5 to obtain an estimate of max_T FDR for each of these two cases. Do so for α̇ = .05, .10, and .20 and for k = 1, 5, 10, and 20.
Solution. (a) For i = 1, 2, ..., M, E(X_i) = Pr(|t_i^{(0)}| > c). And for i ∈ T, t_i^{(0)} ~ St(100). Thus,
E[(1/M) Σ_{i∈T} X_i] = (1/M) Σ_{i∈T} E(X_i) = (1/M) Σ_{i∈T} Pr(|t_i^{(0)}| > c) = (1/M) Σ_{i∈T} Pr(|t| > c) = (M_T/M) Pr(|t| > c) ≤ Pr(|t| > c).
(b) For the case where c = ċ_{α̇}(k) and the case where c = t̄ = t̄_{kα̇/(2M)}(100), we obtain the following estimates of max_T FDR:
            α̇ = .05                α̇ = .10                α̇ = .20
  k    c = ċ_{α̇}(k)   c = t̄    c = ċ_{α̇}(k)   c = t̄    c = ċ_{α̇}(k)   c = t̄
  1        .03          .03         .01          .01         .02          .02
  5        .06          .02         .07          .04         .07          .05
  10       .09          .04         .10          .05         .12          .06
  20       .16          .05         .16          .06         .18          .08

For example, when c = ċ_{α̇}(k), k = 10, and α̇ = .05, M Pr(|t| > c)/M_R = 6033 Pr(|t| > 3.42)/58 ≐ .09.
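The worked value in the example just given is easily verified in R (pt supplies the t cdf; M = 6033 and M_R = 58 are as in the example):

  # Estimate of max_T FDR when c = 3.42 (the case k = 10, alpha = .05).
  M <- 6033; M.R <- 58; c <- 3.42
  M * 2 * pt(-c, df = 100) / M.R   # approximately .09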
EXERCISE 34. Take the setting to be that of Part 6 of Section 7.8a (pertaining to the testing of H_0: w ∈ S_0 versus H_1: w ∈ S_1), and adopt the notation and terminology employed therein.
(a) Write p_0 for the random variable p_0(y), and denote by G_0(·) the cdf of the conditional distribution of p_0 given that w ∈ S_0. Further, take k and c to be the constants that appear in the definition of the critical function φ*(·), take k'' = [1 + (π_1/π_0)k]^{−1}, and take φ**(·) to be a critical function defined as follows:
φ**(y) = 1 when p_0(y) < k'', φ**(y) = c when p_0(y) = k'', and φ**(y) = 0 when p_0(y) > k''.
Show (1) that φ**(y) = φ*(y) when f(y) > 0, (2) that k'' equals the smallest scalar p' for which G_0(p') ≥ α̇, and (3) that
c = [α̇ − Pr(p_0 < k'' | w ∈ S_0)] / Pr(p_0 = k'' | w ∈ S_0) when Pr(p_0 = k'' | w ∈ S_0) > 0
— when Pr(p_0 = k'' | w ∈ S_0) = 0, c can be chosen arbitrarily.
(b) Show that if the joint distribution of w and y is MVN, then there exists a version of the critical function φ*(·) defined by equalities (8.25) and (8.26) for which φ*(y) depends on the value of y only through the value of w̃(y) = μ + V'_{yw} V_y^{−1} y (where μ = μ_w − V'_{yw} V_y^{−1} μ_y).
(c) Suppose that M = 1 and that S_0 = {w : ℓ ≤ w ≤ u}, where ℓ and u are (known) constants (with ℓ < u). Suppose also that the joint distribution of w and y is MVN and that v_{yw} ≠ 0. And letting w̃ = w̃(y) = μ + v'_{yw} V_y^{−1} y (with μ = μ_w − v'_{yw} V_y^{−1} μ_y) and ṽ = v_w − v'_{yw} V_y^{−1} v_{yw}, define
d = d(y) = F{[u − w̃(y)]/ṽ^{1/2}} − F{[ℓ − w̃(y)]/ṽ^{1/2}},
where F(·) is the cdf of the N(0, 1) distribution. Further, let
C_0 = {y ∈ R^N : d(y) < d̈},
where d̈ is the lower 100α̇% point of the distribution of the random variable d. Show that among all α̇-level tests of the null hypothesis H_0, the nonrandomized α̇-level test with critical region C_0 achieves maximum power.
Solution. (a) (1) Suppose that f(y) > 0. If p_0(y) > 0, then f_0(y) > 0, and it follows from result (8.27) that φ**(y) = φ*(y). If p_0(y) = 0 [in which case p_1(y) = 1], then p_0(y) < k'', f_0(y) = 0, and f_1(y) > 0, so that φ**(y) = 1 = φ*(y). Thus, in either case [i.e., whether p_0(y) > 0 or p_0(y) = 0], φ**(y) = φ*(y).
(2) and (3) Making use of result (8.27), we find that
α̇ = E[φ*(y) | w ∈ S_0] = ∫_{R^N} φ*(y) f_0(y) dy = ∫_{R^N} φ**(y) f_0(y) dy = Pr(p_0 < k'' | w ∈ S_0) + c Pr(p_0 = k'' | w ∈ S_0).   (S.54)
Thus,
G_0(k'') ≥ α̇ ≥ Pr(p_0 < k'' | w ∈ S_0) = G_0(k'') − Pr(p_0 = k'' | w ∈ S_0),
which implies that k'' equals the smallest scalar p' for which G_0(p') ≥ α̇. Moreover, it follows from equality (S.54) that
c = [α̇ − Pr(p_0 < k'' | w ∈ S_0)] / Pr(p_0 = k'' | w ∈ S_0) when Pr(p_0 = k'' | w ∈ S_0) > 0.
(b) Suppose that the joint distribution of w and y is MVN, in which case the N(μ_y, V_y) distribution is the marginal distribution of y and the N[w̃(y), V_w − V'_{yw} V_y^{−1} V_{yw}] distribution is a conditional distribution of w given y. Further, take the pdf f(·) of the marginal distribution of y to be such that f(y) > 0 for every value of y — clearly, this is consistent with the N(μ_y, V_y) distribution being the marginal distribution. And observe [in light of the result of Part (a)] that there exists a function, say φ_0(·), such that φ*(y) = φ_0[p_0(y)] for every value of y. Observe also that (for every value of y) we can take
p_0(y) = ∫_{S_0} g[w; w̃(y)] dw,
where g[·; w̃(y)] is the pdf of the N[w̃(y), V_w − V'_{yw} V_y^{−1} V_{yw}] distribution, in which case p_0(y) depends on y only through the value of w̃(y). It remains only to observe that if p_0(y) depends on y only through the value of w̃(y), then [since φ*(y) = φ_0[p_0(y)]] φ*(y) depends on y only through the value of w̃(y).
(c) When the joint distribution of w and y is MVN, the N[w̃(y), ṽ] distribution is a conditional distribution of w given y, and the marginal distribution of y is absolutely continuous with a pdf f(·) for which f(y) > 0 for every N×1 vector y [as was noted in the solution to Part (b)]. And upon applying the result of Part (a)-(1), we find that the critical function φ*(·) of a most-powerful α̇-level test is such that φ*(y) = φ**(y) for every value of y [where φ**(·) is as defined in Part (a)]. Moreover, when M = 1 and S_0 = {w : ℓ ≤ w ≤ u}, we find that (for every N×1 vector y)
w ∈ S_0 ⇔ ℓ ≤ w ≤ u ⇔ [ℓ − w̃(y)]/ṽ^{1/2} ≤ [w − w̃(y)]/ṽ^{1/2} ≤ [u − w̃(y)]/ṽ^{1/2}
and hence that we can take
p_0(y) = d(y).
Now, supposing that p_0(y) = d(y), consider the constants k'' and c. When the joint distribution of w and y is MVN and when v_{yw} ≠ 0, the conditional distribution of w̃ given that ℓ ≤ w ≤ u is absolutely continuous. Thus, the conditional distribution of d given that ℓ ≤ w ≤ u is absolutely continuous and hence [in light of the results of Part (a)] is such that k'' = d̈ and is such that c can be taken to be 0. It remains only to observe that (when c = 0) the test with critical function φ**(·) is identical to the nonrandomized α̇-level test with critical region C_0.
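To make the Part (c) construction concrete, here is a minimal R sketch of the statistic d(y); all inputs (mu.w, mu.y, v.yw, V.y, v.w, and the end points l and u) are illustrative placeholders, and the cutoff d̈ would in practice be obtained (e.g., by Monte Carlo) as the lower 100α̇% point of the distribution of d:

  # d(y) = F[(u - w.tilde)/v.tilde^{1/2}] - F[(l - w.tilde)/v.tilde^{1/2}].
  d.of.y <- function(y, mu.w, mu.y, v.yw, V.y, v.w, l, u) {
    a <- solve(V.y, v.yw)                   # V_y^{-1} v_yw
    w.tilde <- mu.w + sum(a * (y - mu.y))   # E(w | y)
    v.tilde <- v.w - sum(v.yw * a)          # var(w | y)
    pnorm((u - w.tilde) / sqrt(v.tilde)) - pnorm((l - w.tilde) / sqrt(v.tilde))
  }
  # Reject H0 when d(y) falls below the lower 100*alpha% point of d's
  # distribution (obtainable by simulating y ~ N(mu.y, V.y) and evaluating d.of.y).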