English Pages VII, 353 [350] Year 2020
Dale L. Zimmerman
Linear Model Theory Exercises and Solutions
Dale L. Zimmerman Department of Statistics and Actuarial Science University of Iowa Iowa City, IA, USA
ISBN 978-3-030-52073-1 ISBN 978-3-030-52074-8 (eBook) https://doi.org/10.1007/978-3-030-52074-8 Mathematics Subject Classification: 62J05, 62J10, 62F03, 62F10, 62F25 © Springer Nature Switzerland AG 2020 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
In loving gratitude to my late father, Dean Zimmerman, and my mother, Wendy Zimmerman
Contents
1 A Brief Introduction
2 Selected Matrix Algebra Topics and Results
3 Generalized Inverses and Solutions to Systems of Linear Equations
4 Moments of a Random Vector and of Linear and Quadratic Forms in a Random Vector
5 Types of Linear Models
6 Estimability
7 Least Squares Estimation for the Gauss–Markov Model
8 Least Squares Geometry and the Overall ANOVA
9 Least Squares Estimation and ANOVA for Partitioned Models
10 Constrained Least Squares Estimation and ANOVA
11 Best Linear Unbiased Estimation for the Aitken Model
12 Model Misspecification
13 Best Linear Unbiased Prediction
14 Distribution Theory
15 Inference for Estimable and Predictable Functions
16 Inference for Variance–Covariance Parameters
17 Empirical BLUE and BLUP
1
A Brief Introduction
This book contains 296 solved exercises on the theory of linear models. The exercises are taken from the author’s graduate-level textbook, Linear Model Theory: With Examples and Exercises, which was published by Springer in 2020. The exercises themselves have been restated, when necessary and feasible, to make them as comprehensible as possible independently of the textbook, but the solutions refer liberally to theorems and other results therein. They are arranged in chapters, the numbers and titles of which are identical to those of the chapters in the textbook that have exercises. Some of the exercises and solutions are short, while others have multiple parts and are quite lengthy. Some are proofs of theorems presented but not proved in the aforementioned textbook, but most are specializations of said theorems and other general results to specific linear models. In this respect they are quite similar to the textbook’s examples. A few of the exercises require the use of a computer, but none involve the analysis of actual data. The author is not aware of any other published set of solved exercises for a graduate-level course on the theory of linear models. It is hoped that students and instructors alike, possibly even those not using Linear Model Theory: With Examples and Exercises for their course, will find these exercises and solutions useful.
© Springer Nature Switzerland AG 2020 D. L. Zimmerman, Linear Model Theory, https://doi.org/10.1007/978-3-030-52074-8_1
2
Selected Matrix Algebra Topics and Results
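This chapter presents exercises on selected matrix algebra topics and results and provides solutions to those exercises.

Exercise 1 Let $\mathcal{V}_n$ represent a vector space, and let $\mathcal{S}_n$ represent a subspace of $\mathcal{V}_n$. Prove that the orthogonal complement of $\mathcal{S}_n$ (relative to $\mathcal{V}_n$) is a subspace of $\mathcal{V}_n$.

Solution By definition, the orthogonal complement of $\mathcal{S}_n$ (relative to $\mathcal{V}_n$) is $\mathcal{S}_n^{\perp} = \{\mathbf{v} \in \mathcal{V}_n : \mathbf{w}^T\mathbf{v} = 0 \text{ for all } \mathbf{w} \in \mathcal{S}_n\}$. Obviously $\mathbf{0} \in \mathcal{S}_n^{\perp}$, so $\mathcal{S}_n^{\perp}$ is nonempty. For any $\mathbf{v}_1, \mathbf{v}_2 \in \mathcal{S}_n^{\perp}$, $\mathbf{w}^T(\mathbf{v}_1 + \mathbf{v}_2) = \mathbf{w}^T\mathbf{v}_1 + \mathbf{w}^T\mathbf{v}_2 = 0$ for all $\mathbf{w} \in \mathcal{S}_n$, so $\mathbf{v}_1 + \mathbf{v}_2 \in \mathcal{S}_n^{\perp}$. Also, for any $\mathbf{v} \in \mathcal{S}_n^{\perp}$, $\mathbf{w}^T(c\mathbf{v}) = c\mathbf{w}^T\mathbf{v} = 0$ for any $c \in \mathbb{R}$ and all $\mathbf{w} \in \mathcal{S}_n$, so $c\mathbf{v} \in \mathcal{S}_n^{\perp}$. Therefore, $\mathcal{S}_n^{\perp}$ is a vector space, hence a subspace of $\mathcal{V}_n$.

Exercise 2 Let $\mathbf{A}$ represent an $n \times m$ matrix. Prove that $\mathcal{N}(\mathbf{A})$ is a subspace of $\mathbb{R}^m$.

Solution By definition, $\mathcal{N}(\mathbf{A}) = \{\mathbf{v} : \mathbf{A}\mathbf{v} = \mathbf{0}\} \subseteq \mathbb{R}^m$. Obviously $\mathbf{0} \in \mathcal{N}(\mathbf{A})$, so $\mathcal{N}(\mathbf{A})$ is nonempty. For any $\mathbf{v}_1, \mathbf{v}_2 \in \mathcal{N}(\mathbf{A})$, $\mathbf{A}(\mathbf{v}_1 + \mathbf{v}_2) = \mathbf{A}\mathbf{v}_1 + \mathbf{A}\mathbf{v}_2 = \mathbf{0}$, so $\mathbf{v}_1 + \mathbf{v}_2 \in \mathcal{N}(\mathbf{A})$. Also, for any $\mathbf{v} \in \mathcal{N}(\mathbf{A})$, $\mathbf{A}(c\mathbf{v}) = c\mathbf{A}\mathbf{v} = \mathbf{0}$ for any $c \in \mathbb{R}$, so $c\mathbf{v} \in \mathcal{N}(\mathbf{A})$. Therefore, $\mathcal{N}(\mathbf{A})$ is a vector space, hence a subspace of $\mathbb{R}^m$.

Exercise 3 Prove Theorem 2.8.2: For any matrix $\mathbf{A}$, $\operatorname{rank}(\mathbf{A}^T) = \operatorname{rank}(\mathbf{A})$.

Solution $\operatorname{rank}(\mathbf{A}^T) = \dim\{\mathcal{R}(\mathbf{A}^T)\} = \dim\{\mathcal{C}(\mathbf{A})\} = \operatorname{rank}(\mathbf{A})$.

Exercise 4 Prove Theorem 2.8.3: For any matrix $\mathbf{A}$ and any nonzero scalar $c$, $\operatorname{rank}(c\mathbf{A}) = \operatorname{rank}(\mathbf{A})$.

Solution $\operatorname{rank}(c\mathbf{A}) = \dim\{\mathcal{R}(c\mathbf{A})\} = \dim\{\mathcal{R}(\mathbf{A})\} = \operatorname{rank}(\mathbf{A})$.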
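Exercise 5 Prove Theorem 2.15.1: If $\mathbf{A}$ is a nonnegative definite matrix, then all of its diagonal elements are nonnegative; if $\mathbf{A}$ is positive definite, then all of its diagonal elements are positive.

Solution Denote the diagonal elements of the $n \times n$ matrix $\mathbf{A}$ by $a_{11}, \ldots, a_{nn}$. If $\mathbf{A}$ is nonnegative definite, then $\mathbf{x}^T\mathbf{A}\mathbf{x} \geq 0$ for all $\mathbf{x}$. Choosing $\mathbf{x} = \mathbf{u}_i$, the $i$th unit vector $(i = 1, \ldots, n)$, yields $a_{ii} \geq 0$ $(i = 1, \ldots, n)$. Alternatively, if $\mathbf{A}$ is positive definite, then $\mathbf{x}^T\mathbf{A}\mathbf{x} > 0$ for all $\mathbf{x} \neq \mathbf{0}$. Again choosing $\mathbf{x} = \mathbf{u}_i$ $(i = 1, \ldots, n)$ yields $a_{ii} > 0$ $(i = 1, \ldots, n)$.

Exercise 6 Prove Theorem 2.15.2: Let $c$ represent a positive scalar. If $\mathbf{A}$ is a positive definite matrix, then so is $c\mathbf{A}$; if $\mathbf{A}$ is positive semidefinite, then so is $c\mathbf{A}$.

Solution If $\mathbf{A}$ is positive definite, then $\mathbf{x}^T(c\mathbf{A})\mathbf{x} = c\,\mathbf{x}^T\mathbf{A}\mathbf{x} > 0$ for all $\mathbf{x} \neq \mathbf{0}$ [and trivially $\mathbf{0}^T(c\mathbf{A})\mathbf{0} = 0$], implying that $c\mathbf{A}$ is positive definite. Alternatively, if $\mathbf{A}$ is positive semidefinite, then $\mathbf{x}^T(c\mathbf{A})\mathbf{x} = c\,\mathbf{x}^T\mathbf{A}\mathbf{x} \geq 0$ for all $\mathbf{x}$ and $\mathbf{x}^T(c\mathbf{A})\mathbf{x} = c\,\mathbf{x}^T\mathbf{A}\mathbf{x} = 0$ for some $\mathbf{x} \neq \mathbf{0}$, implying that $c\mathbf{A}$ is positive semidefinite.

Exercise 7 Prove Theorem 2.15.3: Let $\mathbf{A}$ and $\mathbf{B}$ represent $n \times n$ matrices. If $\mathbf{A}$ and $\mathbf{B}$ are both nonnegative definite, then so is $\mathbf{A} + \mathbf{B}$; if either one is positive definite and the other is nonnegative definite, then $\mathbf{A} + \mathbf{B}$ is positive definite.

Solution If $\mathbf{A}$ and $\mathbf{B}$ are both nonnegative definite, then for any $\mathbf{x} \in \mathbb{R}^n$, $\mathbf{x}^T(\mathbf{A} + \mathbf{B})\mathbf{x} = \mathbf{x}^T\mathbf{A}\mathbf{x} + \mathbf{x}^T\mathbf{B}\mathbf{x} \geq 0$, implying that $\mathbf{A} + \mathbf{B}$ is nonnegative definite. Alternatively, suppose without loss of generality that $\mathbf{A}$ is positive definite and $\mathbf{B}$ is nonnegative definite. Then for any nonnull $\mathbf{x} \in \mathbb{R}^n$, $\mathbf{x}^T(\mathbf{A} + \mathbf{B})\mathbf{x} = \mathbf{x}^T\mathbf{A}\mathbf{x} + \mathbf{x}^T\mathbf{B}\mathbf{x} \geq \mathbf{x}^T\mathbf{A}\mathbf{x} > 0$, and $\mathbf{0}^T(\mathbf{A} + \mathbf{B})\mathbf{0} = 0$. Thus, $\mathbf{0}$ is the only value of $\mathbf{x}$ for which $\mathbf{x}^T(\mathbf{A} + \mathbf{B})\mathbf{x}$ equals 0, implying that $\mathbf{A} + \mathbf{B}$ is positive definite.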
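Exercise 8 Prove Theorem 2.15.4: The block diagonal matrix $\begin{pmatrix} \mathbf{A} & \mathbf{0} \\ \mathbf{0} & \mathbf{B} \end{pmatrix}$ is nonnegative definite if and only if both $\mathbf{A}$ and $\mathbf{B}$ are nonnegative definite and is positive definite if and only if both $\mathbf{A}$ and $\mathbf{B}$ are positive definite.

Solution First observe that for any vector $\mathbf{x}$ with dimension equal to the number of rows (or columns) of $\begin{pmatrix} \mathbf{A} & \mathbf{0} \\ \mathbf{0} & \mathbf{B} \end{pmatrix}$, and which is partitioned as $\mathbf{x} = \begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{pmatrix}$ where the dimension of $\mathbf{x}_1$ is equal to the number of rows (or columns) of $\mathbf{A}$, we have
\[
\mathbf{x}^T \begin{pmatrix} \mathbf{A} & \mathbf{0} \\ \mathbf{0} & \mathbf{B} \end{pmatrix} \mathbf{x}
= (\mathbf{x}_1^T, \mathbf{x}_2^T) \begin{pmatrix} \mathbf{A} & \mathbf{0} \\ \mathbf{0} & \mathbf{B} \end{pmatrix} \begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{pmatrix}
= \mathbf{x}_1^T\mathbf{A}\mathbf{x}_1 + \mathbf{x}_2^T\mathbf{B}\mathbf{x}_2 .
\]
Now, if $\mathbf{A}$ and $\mathbf{B}$ are nonnegative definite, then $\mathbf{x}_1^T\mathbf{A}\mathbf{x}_1 \geq 0$ for all $\mathbf{x}_1$ and $\mathbf{x}_2^T\mathbf{B}\mathbf{x}_2 \geq 0$ for all $\mathbf{x}_2$, implying (by the result displayed above) that $\mathbf{x}^T \begin{pmatrix} \mathbf{A} & \mathbf{0} \\ \mathbf{0} & \mathbf{B} \end{pmatrix} \mathbf{x} \geq 0$ for all $\mathbf{x}$, i.e., that $\begin{pmatrix} \mathbf{A} & \mathbf{0} \\ \mathbf{0} & \mathbf{B} \end{pmatrix}$ is nonnegative definite. Conversely, if $\begin{pmatrix} \mathbf{A} & \mathbf{0} \\ \mathbf{0} & \mathbf{B} \end{pmatrix}$ is nonnegative definite, then
\[
(\mathbf{x}_1^T, \mathbf{0}^T) \begin{pmatrix} \mathbf{A} & \mathbf{0} \\ \mathbf{0} & \mathbf{B} \end{pmatrix} \begin{pmatrix} \mathbf{x}_1 \\ \mathbf{0} \end{pmatrix} \geq 0 \text{ for all } \mathbf{x}_1
\quad\text{and}\quad
(\mathbf{0}^T, \mathbf{x}_2^T) \begin{pmatrix} \mathbf{A} & \mathbf{0} \\ \mathbf{0} & \mathbf{B} \end{pmatrix} \begin{pmatrix} \mathbf{0} \\ \mathbf{x}_2 \end{pmatrix} \geq 0 \text{ for all } \mathbf{x}_2,
\]
i.e., $\mathbf{x}_1^T\mathbf{A}\mathbf{x}_1 \geq 0$ for all $\mathbf{x}_1$ and $\mathbf{x}_2^T\mathbf{B}\mathbf{x}_2 \geq 0$ for all $\mathbf{x}_2$. Thus $\mathbf{A}$ and $\mathbf{B}$ are nonnegative definite.

On the other hand, if $\mathbf{A}$ and $\mathbf{B}$ are positive definite, then $\mathbf{x}_1^T\mathbf{A}\mathbf{x}_1 > 0$ for all $\mathbf{x}_1 \neq \mathbf{0}$ and $\mathbf{x}_2^T\mathbf{B}\mathbf{x}_2 > 0$ for all $\mathbf{x}_2 \neq \mathbf{0}$, implying (by the result displayed above and by the trivial results $\mathbf{0}^T\mathbf{A}\mathbf{0} = \mathbf{0}^T\mathbf{B}\mathbf{0} = 0$) that $\mathbf{x}^T \begin{pmatrix} \mathbf{A} & \mathbf{0} \\ \mathbf{0} & \mathbf{B} \end{pmatrix} \mathbf{x} > 0$ for all $\mathbf{x} \neq \mathbf{0}$, i.e., that $\begin{pmatrix} \mathbf{A} & \mathbf{0} \\ \mathbf{0} & \mathbf{B} \end{pmatrix}$ is positive definite. Conversely, if $\begin{pmatrix} \mathbf{A} & \mathbf{0} \\ \mathbf{0} & \mathbf{B} \end{pmatrix}$ is positive definite, then
\[
(\mathbf{x}_1^T, \mathbf{0}^T) \begin{pmatrix} \mathbf{A} & \mathbf{0} \\ \mathbf{0} & \mathbf{B} \end{pmatrix} \begin{pmatrix} \mathbf{x}_1 \\ \mathbf{0} \end{pmatrix} > 0 \text{ for all } \mathbf{x}_1 \neq \mathbf{0}
\quad\text{and}\quad
(\mathbf{0}^T, \mathbf{x}_2^T) \begin{pmatrix} \mathbf{A} & \mathbf{0} \\ \mathbf{0} & \mathbf{B} \end{pmatrix} \begin{pmatrix} \mathbf{0} \\ \mathbf{x}_2 \end{pmatrix} > 0 \text{ for all } \mathbf{x}_2 \neq \mathbf{0},
\]
i.e., $\mathbf{x}_1^T\mathbf{A}\mathbf{x}_1 > 0$ for all $\mathbf{x}_1 \neq \mathbf{0}$ and $\mathbf{x}_2^T\mathbf{B}\mathbf{x}_2 > 0$ for all $\mathbf{x}_2 \neq \mathbf{0}$. Thus $\mathbf{A}$ and $\mathbf{B}$ are positive definite.

Exercise 9 Prove Corollary 2.15.12.1: If $\mathbf{A}$ is an $n \times n$ nonnull nonnegative definite matrix and $\mathbf{C}$ is any matrix having $n$ columns, then $\mathbf{C}\mathbf{A}\mathbf{C}^T$ is nonnegative definite.

Solution By Theorem 2.15.12, a matrix $\mathbf{B}$ exists such that $\mathbf{A} = \mathbf{B}\mathbf{B}^T$. Thus $\mathbf{C}\mathbf{A}\mathbf{C}^T = \mathbf{C}\mathbf{B}\mathbf{B}^T\mathbf{C}^T = (\mathbf{B}^T\mathbf{C}^T)^T(\mathbf{B}^T\mathbf{C}^T)$, which is nonnegative definite by Theorem 2.15.9.

Exercise 10 Prove Theorem 2.15.13: Let $\mathbf{A}$ and $\mathbf{B}$ represent nonnegative definite matrices. Then $\operatorname{tr}(\mathbf{A}\mathbf{B}) \geq 0$, with equality if and only if $\mathbf{A}\mathbf{B} = \mathbf{0}$.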
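Solution By Theorem 2.15.12, symmetric matrices $\mathbf{A}^{1/2}$ and $\mathbf{B}^{1/2}$ exist such that $\mathbf{A}^{1/2}\mathbf{A}^{1/2} = \mathbf{A}$ and $\mathbf{B}^{1/2}\mathbf{B}^{1/2} = \mathbf{B}$. Then by Theorem 2.10.3,
\[
\operatorname{tr}(\mathbf{A}\mathbf{B})
= \operatorname{tr}(\mathbf{A}^{1/2}\mathbf{A}^{1/2}\mathbf{B}^{1/2}\mathbf{B}^{1/2})
= \operatorname{tr}(\mathbf{B}^{1/2}\mathbf{A}^{1/2}\mathbf{A}^{1/2}\mathbf{B}^{1/2})
= \operatorname{tr}[(\mathbf{A}^{1/2}\mathbf{B}^{1/2})^T(\mathbf{A}^{1/2}\mathbf{B}^{1/2})].
\]
Therefore, by Theorem 2.10.4, $\operatorname{tr}(\mathbf{A}\mathbf{B}) \geq 0$, with equality if and only if $\mathbf{A}^{1/2}\mathbf{B}^{1/2} = \mathbf{0}$. If $\mathbf{A}^{1/2}\mathbf{B}^{1/2} = \mathbf{0}$, then pre-multiplication by $\mathbf{A}^{1/2}$ and post-multiplication by $\mathbf{B}^{1/2}$, respectively, of this matrix equation yield $\mathbf{A}\mathbf{B} = \mathbf{0}$. Conversely, if $\mathbf{A}\mathbf{B} = \mathbf{0}$, then immediately $\operatorname{tr}(\mathbf{A}\mathbf{B}) = 0$.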
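As a small numerical illustration of Theorem 2.15.13, the following sketch evaluates tr(AB) for two pairs of nonnegative definite matrices. The matrices and the helper names `matmul` and `trace` are illustrative choices made here, not taken from the text; the second pair is picked so that AB = 0, which is the equality case.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def trace(X):
    """Sum of the diagonal elements of a square matrix."""
    return sum(X[i][i] for i in range(len(X)))


# Illustrative nonnegative definite matrices (assumed for this example).
A = [[2, 1], [1, 2]]      # positive definite
B = [[1, -1], [-1, 1]]    # positive semidefinite
trace_AB = trace(matmul(A, B))

# A pair for which AB = 0, so the trace must vanish.
A0 = [[1, 0], [0, 0]]
B0 = [[0, 0], [0, 1]]
product_A0B0 = matmul(A0, B0)
trace_A0B0 = trace(product_A0B0)

print(trace_AB, product_A0B0, trace_A0B0)
```

In the first pair the trace is strictly positive even though B is singular; the trace vanishes only when the product itself is the null matrix.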
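Exercise 11 Prove Theorem 2.18.1: If $\mathbf{A}_1, \mathbf{A}_2, \ldots, \mathbf{A}_k$ are nonsingular, then so is $\oplus_{i=1}^k \mathbf{A}_i$, and its inverse is $\oplus_{i=1}^k \mathbf{A}_i^{-1}$.

Solution
\[
\left(\oplus_{i=1}^k \mathbf{A}_i\right)\left(\oplus_{i=1}^k \mathbf{A}_i^{-1}\right)
= \begin{pmatrix}
\mathbf{A}_1 & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{A}_2 & \cdots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \mathbf{A}_k
\end{pmatrix}
\begin{pmatrix}
\mathbf{A}_1^{-1} & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{A}_2^{-1} & \cdots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \mathbf{A}_k^{-1}
\end{pmatrix}
= \begin{pmatrix}
\mathbf{I} & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{I} & \cdots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \mathbf{I}
\end{pmatrix}
= \mathbf{I}.
\]

Exercise 12 Prove Theorem 2.19.1: Let $\mathbf{x}$ represent an $n$-vector of variables.
(a) If $f(\mathbf{x}) = \mathbf{a}^T\mathbf{x}$ where $\mathbf{a}$ is an $n$-vector of constants, then $\nabla f = \frac{\partial(\mathbf{a}^T\mathbf{x})}{\partial \mathbf{x}} = \mathbf{a}$.
(b) If $f(\mathbf{x}) = \mathbf{x}^T\mathbf{A}\mathbf{x}$ where $\mathbf{A}$ is an $n \times n$ symmetric matrix of constants, then $\nabla f = \frac{\partial(\mathbf{x}^T\mathbf{A}\mathbf{x})}{\partial \mathbf{x}} = 2\mathbf{A}\mathbf{x}$.

Solution
(a)
\[
\nabla f = \frac{\partial(\mathbf{a}^T\mathbf{x})}{\partial \mathbf{x}}
= \begin{pmatrix}
\frac{\partial\left(\sum_{i=1}^n a_i x_i\right)}{\partial x_1} \\
\vdots \\
\frac{\partial\left(\sum_{i=1}^n a_i x_i\right)}{\partial x_n}
\end{pmatrix}
= \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix}
= \mathbf{a}.
\]
(b)
\[
\nabla f = \frac{\partial(\mathbf{x}^T\mathbf{A}\mathbf{x})}{\partial \mathbf{x}}
= \begin{pmatrix}
\frac{\partial\left(\sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j\right)}{\partial x_1} \\
\vdots \\
\frac{\partial\left(\sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j\right)}{\partial x_n}
\end{pmatrix}
= \begin{pmatrix}
2a_{11}x_1 + \sum_{j=2}^n a_{1j}x_j + \sum_{i=2}^n a_{i1}x_i \\
\vdots \\
2a_{nn}x_n + \sum_{j=1}^{n-1} a_{nj}x_j + \sum_{i=1}^{n-1} a_{in}x_i
\end{pmatrix}
= \begin{pmatrix}
2\sum_{j=1}^n a_{1j}x_j \\
\vdots \\
2\sum_{j=1}^n a_{nj}x_j
\end{pmatrix}
= 2\mathbf{A}\mathbf{x},
\]
where the next-to-last equality uses the symmetry of $\mathbf{A}$.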
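The identity in part (b) can be checked numerically. The sketch below compares 2Ax with a central finite-difference approximation of the gradient of the quadratic form; the matrix, the point, the step size and the helper names `quad_form` and `numerical_gradient` are all assumptions made for illustration.

```python
def quad_form(A, x):
    """Evaluate x^T A x for a square matrix A stored as a list of rows."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def numerical_gradient(f, x, h=1e-3):
    """Central finite-difference approximation to the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


# Illustrative symmetric matrix and evaluation point (assumed, not from the text).
A = [[2.0, 1.0], [1.0, 3.0]]
x = [1.0, -2.0]

# Closed form from Theorem 2.19.1(b): the gradient is 2Ax.
analytic = [2 * sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]
numeric = numerical_gradient(lambda v: quad_form(A, v), x)

print(analytic, numeric)
```

Because the function is quadratic, the central difference agrees with 2Ax up to floating-point rounding. If A were not symmetric, the gradient would instead be (A + A^T)x, which is why the theorem assumes symmetry.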
3
Generalized Inverses and Solutions to Systems of Linear Equations
This chapter presents exercises on generalized inverses and solutions to systems of linear equations, and provides solutions to those exercises.

Exercise 1 Find a generalized inverse of each of the following matrices.
(a) $c\mathbf{A}$, where $c \neq 0$ and $\mathbf{A}$ is an arbitrary matrix.
(b) $\mathbf{a}\mathbf{b}^T$, where $\mathbf{a}$ is a nonnull $m$-vector and $\mathbf{b}$ is a nonnull $n$-vector.
(c) $\mathbf{K}$, where $\mathbf{K}$ is nonnull and idempotent.
(d) $\mathbf{D} = \operatorname{diag}(d_1, \ldots, d_n)$ where $d_1, \ldots, d_n$ are arbitrary real numbers.
(e) $\mathbf{C}$, where $\mathbf{C}$ is a square matrix whose elements are equal to zero except possibly on the cross-diagonal stretching from the lower left element to the upper right element.
(f) $\mathbf{P}\mathbf{A}\mathbf{Q}$, where $\mathbf{A}$ is any nonnull $m \times n$ matrix and $\mathbf{P}$ and $\mathbf{Q}$ are $m \times m$ and $n \times n$ orthogonal matrices.
(g) $\mathbf{P}$, where $\mathbf{P}$ is an $m \times n$ matrix such that $\mathbf{P}^T\mathbf{P} = \mathbf{I}_n$.
(h) $\mathbf{J}_{m \times n}$, and characterize the collection of all generalized inverses of $\mathbf{J}_{m \times n}$.
(i) $a\mathbf{I}_n + b\mathbf{J}_n$, where $a$ and $b$ are nonzero scalars. (Hint: Consider the nonsingular and singular cases separately, and use Corollary 2.9.7.1 for the former.)
(j) $\mathbf{A} + \mathbf{b}\mathbf{d}^T$, where $\mathbf{A}$ is nonsingular and $\mathbf{A} + \mathbf{b}\mathbf{d}^T$ is singular. (Hint: The singularity of $\mathbf{A} + \mathbf{b}\mathbf{d}^T$ implies that $\mathbf{d}^T\mathbf{A}^{-1}\mathbf{b} = -1$ by Corollary 2.9.7.1.)
(k) $\mathbf{B}$, where $\mathbf{B} = \oplus_{i=1}^k \mathbf{B}_i$ and $\mathbf{B}_1, \ldots, \mathbf{B}_k$ are arbitrary matrices.
(l) $\mathbf{A} \otimes \mathbf{B}$, where $\mathbf{A}$ and $\mathbf{B}$ are arbitrary matrices.
Solution (a) Let G represent a generalized inverse of A. Then (1/c)G is a generalized inverse of cA because cA[(1/c)G]cA = cAGA = cA.
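(b) $\frac{1}{\|\mathbf{a}\|^2\|\mathbf{b}\|^2}\mathbf{b}\mathbf{a}^T$ is a generalized inverse of $\mathbf{a}\mathbf{b}^T$ because
\[
\mathbf{a}\mathbf{b}^T \left(\frac{1}{\|\mathbf{a}\|^2\|\mathbf{b}\|^2}\mathbf{b}\mathbf{a}^T\right) \mathbf{a}\mathbf{b}^T
= \frac{1}{\|\mathbf{a}\|^2\|\mathbf{b}\|^2}\,\mathbf{a}\,\|\mathbf{b}\|^2\,\|\mathbf{a}\|^2\,\mathbf{b}^T
= \mathbf{a}\mathbf{b}^T .
\]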
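(c) $\mathbf{K}$ is a generalized inverse of itself because $\mathbf{K}\mathbf{K}\mathbf{K} = (\mathbf{K}\mathbf{K})\mathbf{K} = \mathbf{K}\mathbf{K} = \mathbf{K}$. Another generalized inverse of $\mathbf{K}$ is $\mathbf{I}$ because $\mathbf{K}\mathbf{I}\mathbf{K} = \mathbf{K}\mathbf{K} = \mathbf{K}$.
(d) Let $\mathbf{C} = \operatorname{diag}(c_1, \ldots, c_n)$ where
\[
c_k = \begin{cases} 1/d_k, & \text{if } d_k \neq 0 \\ 0, & \text{if } d_k = 0 \end{cases}
\qquad \text{for } k = 1, \ldots, n.
\]
Then $\mathbf{C}$ is a generalized inverse of $\mathbf{D}$ because $\mathbf{D}\mathbf{C}\mathbf{D} = \operatorname{diag}(d_1, \ldots, d_n)\operatorname{diag}(c_1, \ldots, c_n)\operatorname{diag}(d_1, \ldots, d_n) = \operatorname{diag}(d_1c_1d_1, \ldots, d_nc_nd_n) = \mathbf{D}$.
(e) Let
\[
\mathbf{C} = \begin{pmatrix} & & c_1 \\ & \iddots & \\ c_n & & \end{pmatrix}
\quad\text{and}\quad
\mathbf{D} = \begin{pmatrix} & & d_n \\ & \iddots & \\ d_1 & & \end{pmatrix},
\]
where
\[
d_k = \begin{cases} 1/c_k, & \text{if } c_k \neq 0 \\ 0, & \text{if } c_k = 0 \end{cases}
\qquad \text{for } k = 1, \ldots, n.
\]
Then $\mathbf{D}$ is a generalized inverse of $\mathbf{C}$ because
\[
\mathbf{C}\mathbf{D}\mathbf{C}
= \begin{pmatrix} & & c_1 \\ & \iddots & \\ c_n & & \end{pmatrix}
\begin{pmatrix} & & d_n \\ & \iddots & \\ d_1 & & \end{pmatrix}
\begin{pmatrix} & & c_1 \\ & \iddots & \\ c_n & & \end{pmatrix}
= \begin{pmatrix} c_1d_1 & & \\ & \ddots & \\ & & c_nd_n \end{pmatrix}
\begin{pmatrix} & & c_1 \\ & \iddots & \\ c_n & & \end{pmatrix}
= \begin{pmatrix} & & c_1d_1c_1 \\ & \iddots & \\ c_nd_nc_n & & \end{pmatrix}
= \begin{pmatrix} & & c_1 \\ & \iddots & \\ c_n & & \end{pmatrix}.
\]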
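(f) Let $\mathbf{G}$ represent a generalized inverse of $\mathbf{A}$. Then $\mathbf{Q}^T\mathbf{G}\mathbf{P}^T$ is a generalized inverse of $\mathbf{P}\mathbf{A}\mathbf{Q}$ because $\mathbf{P}\mathbf{A}\mathbf{Q}(\mathbf{Q}^T\mathbf{G}\mathbf{P}^T)\mathbf{P}\mathbf{A}\mathbf{Q} = \mathbf{P}\mathbf{A}\mathbf{I}\mathbf{G}\mathbf{I}\mathbf{A}\mathbf{Q} = \mathbf{P}\mathbf{A}\mathbf{G}\mathbf{A}\mathbf{Q} = \mathbf{P}\mathbf{A}\mathbf{Q}$.
(g) $\mathbf{P}^T$ is a generalized inverse of $\mathbf{P}$ because $\mathbf{P}\mathbf{P}^T\mathbf{P} = \mathbf{P}(\mathbf{P}^T\mathbf{P}) = \mathbf{P}\mathbf{I}_n = \mathbf{P}$.
(h) $(1/mn)\mathbf{J}_{n \times m}$ is a generalized inverse of $\mathbf{J}_{m \times n}$ because
\[
\mathbf{J}_{m \times n}[(1/mn)\mathbf{J}_{n \times m}]\mathbf{J}_{m \times n}
= (1/mn)\mathbf{1}_m\mathbf{1}_n^T\mathbf{1}_n\mathbf{1}_m^T\mathbf{1}_m\mathbf{1}_n^T
= (1/mn)\mathbf{1}_m(nm)\mathbf{1}_n^T
= \mathbf{J}_{m \times n}.
\]
To characterize the entire collection of generalized inverses of $\mathbf{J}_{m \times n}$, let $\mathbf{G} = (g_{ij})$ represent an arbitrary member of the collection. Then $\mathbf{J}_{m \times n}\mathbf{G}\mathbf{J}_{m \times n} = \mathbf{J}_{m \times n}$, i.e., $\mathbf{1}_m\mathbf{1}_n^T\mathbf{G}\mathbf{1}_m\mathbf{1}_n^T = \mathbf{1}_m\mathbf{1}_n^T$, i.e., $\left(\sum_{i=1}^n\sum_{j=1}^m g_{ij}\right)\mathbf{1}_m\mathbf{1}_n^T = \mathbf{1}_m\mathbf{1}_n^T$, implying that $\sum_{i=1}^n\sum_{j=1}^m g_{ij} = 1$. Thus the collection of all generalized inverses of $\mathbf{J}_{m \times n}$ consists of all $n \times m$ matrices whose elements sum to one.
(i) $a\mathbf{I} + b\mathbf{J}_n = a\mathbf{I} + (b\mathbf{1}_n)\mathbf{1}_n^T$, which by Corollary 2.9.7.1 is nonsingular if $\mathbf{1}_n^T(a\mathbf{I})^{-1}(b\mathbf{1}_n) \neq -1$, i.e., if $b \neq -a/n$. In that case the corollary yields
\[
(a\mathbf{I} + b\mathbf{J}_n)^{-1}
= a^{-1}\mathbf{I} - [1 + \mathbf{1}_n^T(a^{-1}\mathbf{I})(b\mathbf{1}_n)]^{-1}(a^{-1}\mathbf{I})(b\mathbf{1}_n)\mathbf{1}_n^T(a^{-1}\mathbf{I})
= \frac{1}{a}\mathbf{I} - \frac{1}{1 + bn/a}\cdot\frac{b}{a^2}\mathbf{J}_n
= \frac{1}{a}\mathbf{I} - \frac{b}{a(a + bn)}\mathbf{J}_n .
\]
In the other case, i.e., if $b = -a/n$, then $a\mathbf{I} + b\mathbf{J}_n = a(\mathbf{I} - \frac{1}{n}\mathbf{J}_n)$, and because $\mathbf{I} - \frac{1}{n}\mathbf{J}_n$ is idempotent, $\frac{1}{a}(\mathbf{I} - \frac{1}{n}\mathbf{J}_n)$ is a generalized inverse by the solutions to parts (a) and (c) of this exercise. Another generalized inverse in this case is $(1/a)\mathbf{I}_n$.
(j) $\mathbf{A}^{-1}$ is a generalized inverse of $\mathbf{A} + \mathbf{b}\mathbf{d}^T$ because
\[
(\mathbf{A} + \mathbf{b}\mathbf{d}^T)\mathbf{A}^{-1}(\mathbf{A} + \mathbf{b}\mathbf{d}^T)
= \mathbf{A}\mathbf{A}^{-1}\mathbf{A} + \mathbf{b}\mathbf{d}^T\mathbf{A}^{-1}\mathbf{A} + \mathbf{A}\mathbf{A}^{-1}\mathbf{b}\mathbf{d}^T + \mathbf{b}\mathbf{d}^T\mathbf{A}^{-1}\mathbf{b}\mathbf{d}^T
= \mathbf{A} + \mathbf{b}\mathbf{d}^T + \mathbf{b}(1 + \mathbf{d}^T\mathbf{A}^{-1}\mathbf{b})\mathbf{d}^T
= \mathbf{A} + \mathbf{b}\mathbf{d}^T,
\]
where the last equality holds because $\mathbf{d}^T\mathbf{A}^{-1}\mathbf{b} = -1$.
(k) Let $\mathbf{B}_i^-$ represent a generalized inverse of $\mathbf{B}_i$ for $i = 1, \ldots, k$. Then $\oplus_{i=1}^k \mathbf{B}_i^-$ is a generalized inverse of $\mathbf{B}$ because $\left(\oplus_{i=1}^k \mathbf{B}_i\right)\left(\oplus_{i=1}^k \mathbf{B}_i^-\right)\left(\oplus_{i=1}^k \mathbf{B}_i\right) = \oplus_{i=1}^k \mathbf{B}_i\mathbf{B}_i^-\mathbf{B}_i = \oplus_{i=1}^k \mathbf{B}_i = \mathbf{B}$.
(l) Let $\mathbf{A}^-$ and $\mathbf{B}^-$ represent generalized inverses of $\mathbf{A}$ and $\mathbf{B}$, respectively. Then $\mathbf{A}^- \otimes \mathbf{B}^-$ is a generalized inverse of $\mathbf{A} \otimes \mathbf{B}$ because, using Theorem 2.17.5, $(\mathbf{A} \otimes \mathbf{B})(\mathbf{A}^- \otimes \mathbf{B}^-)(\mathbf{A} \otimes \mathbf{B}) = (\mathbf{A}\mathbf{A}^-\mathbf{A}) \otimes (\mathbf{B}\mathbf{B}^-\mathbf{B}) = \mathbf{A} \otimes \mathbf{B}$.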
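The characterization in part (h) lends itself to a quick exact check. The sketch below tests the defining condition AGA = A for a 2 × 3 matrix of ones against three candidate matrices; the dimensions, the candidates and the helper names `matmul` and `is_generalized_inverse` are illustrative assumptions. Exact rational arithmetic from the standard library avoids rounding issues.

```python
from fractions import Fraction


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def is_generalized_inverse(A, G):
    """Return True when G satisfies the defining condition A G A = A."""
    return matmul(matmul(A, G), A) == A


m, n = 2, 3
J = [[Fraction(1)] * n for _ in range(m)]            # J_{m x n}

# (1/mn) J_{n x m}: elements sum to one.
G_uniform = [[Fraction(1, m * n)] * m for _ in range(n)]

# A single unit element: elements also sum to one.
G_single = [[Fraction(0)] * m for _ in range(n)]
G_single[0][0] = Fraction(1)

# Elements sum to two, so this should fail the condition.
G_bad = [[Fraction(1, 3)] * m for _ in range(n)]

results = [is_generalized_inverse(J, G) for G in (G_uniform, G_single, G_bad)]
print(results)
```

The first two candidates pass and the third fails, consistent with the statement that the generalized inverses of the all-ones matrix are exactly the n × m matrices whose elements sum to one.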
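Exercise 2 Prove Theorem 3.1.1: Let $\mathbf{A}$ represent an arbitrary matrix of rank $r$, and let $\mathbf{P}$ and $\mathbf{Q}$ represent nonsingular matrices such that
\[
\mathbf{P}\mathbf{A}\mathbf{Q} = \begin{pmatrix} \mathbf{I}_r & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{pmatrix}.
\]
(Such matrices $\mathbf{P}$ and $\mathbf{Q}$ exist by Theorem 2.8.10.) Then the matrix
\[
\mathbf{G} = \mathbf{Q}\begin{pmatrix} \mathbf{I}_r & \mathbf{F} \\ \mathbf{H} & \mathbf{B} \end{pmatrix}\mathbf{P},
\]
where $\mathbf{F}$, $\mathbf{H}$, and $\mathbf{B}$ are arbitrary matrices of appropriate dimensions, is a generalized inverse of $\mathbf{A}$.

Solution Because $\mathbf{A} = \mathbf{P}^{-1}\begin{pmatrix} \mathbf{I}_r & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{pmatrix}\mathbf{Q}^{-1}$, we obtain
\begin{align*}
\mathbf{A}\mathbf{Q}\begin{pmatrix} \mathbf{I}_r & \mathbf{F} \\ \mathbf{H} & \mathbf{B} \end{pmatrix}\mathbf{P}\mathbf{A}
&= \mathbf{P}^{-1}\begin{pmatrix} \mathbf{I}_r & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{pmatrix}\mathbf{Q}^{-1}\mathbf{Q}\begin{pmatrix} \mathbf{I}_r & \mathbf{F} \\ \mathbf{H} & \mathbf{B} \end{pmatrix}\mathbf{P}\mathbf{P}^{-1}\begin{pmatrix} \mathbf{I}_r & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{pmatrix}\mathbf{Q}^{-1} \\
&= \mathbf{P}^{-1}\begin{pmatrix} \mathbf{I}_r & \mathbf{F} \\ \mathbf{0} & \mathbf{0} \end{pmatrix}\begin{pmatrix} \mathbf{I}_r & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{pmatrix}\mathbf{Q}^{-1} \\
&= \mathbf{P}^{-1}\begin{pmatrix} \mathbf{I}_r & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{pmatrix}\mathbf{Q}^{-1} \\
&= \mathbf{A},
\end{align*}
which establishes that $\mathbf{Q}\begin{pmatrix} \mathbf{I}_r & \mathbf{F} \\ \mathbf{H} & \mathbf{B} \end{pmatrix}\mathbf{P}$ is a generalized inverse of $\mathbf{A}$.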
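To see the construction of Theorem 3.1.1 at work, the sketch below applies it to the 2 × 2 matrix of ones, which has rank 1. The particular P and Q, the scalar blocks f, h and b, and the helper names are illustrative assumptions; with r = 1 and n = 2 each of F, H and B reduces to a single number.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


A = [[1, 1], [1, 1]]        # rank 1
P = [[1, 0], [-1, 1]]       # nonsingular row operation (assumed choice)
Q = [[1, -1], [0, 1]]       # nonsingular column operation (assumed choice)

# P A Q should be the canonical form with I_1 in the upper left corner.
canonical = matmul(matmul(P, A), Q)


def ginv_from_blocks(f, h, b):
    """Build G = Q [[1, f], [h, b]] P for scalar blocks f, h, b."""
    return matmul(matmul(Q, [[1, f], [h, b]]), P)


def is_generalized_inverse(A, G):
    """Return True when A G A = A."""
    return matmul(matmul(A, G), A) == A


choices = [(0, 0, 0), (0, 0, 1), (3, -2, 5), (-1, 4, 7)]
all_valid = all(is_generalized_inverse(A, ginv_from_blocks(f, h, b))
                for f, h, b in choices)
print(canonical, all_valid)
```

Every choice of the free blocks yields a generalized inverse. The choice (0, 0, 1) gives G = QP, which is nonsingular, and the choice (0, 0, 0) gives a singular G.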
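Exercise 3 Let $\mathbf{A}$ represent any square matrix.
(a) Prove that $\mathbf{A}$ has a nonsingular generalized inverse. (Hint: Use Theorem 3.1.1, which was stated in the previous exercise.)
(b) Let $\mathbf{G}$ represent a nonsingular generalized inverse of $\mathbf{A}$. Show, by giving a counterexample, that $\mathbf{G}^{-1}$ need not equal $\mathbf{A}$.

Solution
(a) Let $n$ be the number of rows (and columns) of $\mathbf{A}$, and let $\mathbf{P}$ and $\mathbf{Q}$ represent nonsingular matrices such that
\[
\mathbf{P}\mathbf{A}\mathbf{Q} = \begin{pmatrix} \mathbf{I}_r & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{pmatrix},
\]
where $r = \operatorname{rank}(\mathbf{A})$. By Theorem 3.1.1,
\[
\mathbf{Q}\begin{pmatrix} \mathbf{I}_r & \mathbf{F} \\ \mathbf{H} & \mathbf{B} \end{pmatrix}\mathbf{P},
\]
where $\mathbf{F}$, $\mathbf{H}$, and $\mathbf{B}$ are arbitrary matrices of appropriate dimensions, is a generalized inverse of $\mathbf{A}$. Take $\mathbf{B} = \mathbf{I}_{n-r}$, $\mathbf{H} = \mathbf{0}_{(n-r) \times r}$, and $\mathbf{F} = \mathbf{0}_{r \times (n-r)}$. Then this generalized inverse is $\mathbf{Q}\mathbf{P}$, which is nonsingular by Theorem 2.8.9.
(b) Take $\mathbf{A} = \mathbf{J}_2$, for which $(1/2)\mathbf{I}_2$ is a nonsingular generalized inverse [because $\mathbf{J}_2(1/2)\mathbf{I}_2\mathbf{J}_2 = (1/2) \cdot 2\mathbf{J}_2 = \mathbf{J}_2$]. But the inverse of this nonsingular generalized inverse is $2\mathbf{I}_2$, which does not equal $\mathbf{J}_2$.

Exercise 4 Show, by giving a counterexample, that if the system of equations $\mathbf{A}\mathbf{x} = \mathbf{b}$ is not consistent, then $\mathbf{A}\mathbf{A}^-\mathbf{b}$ need not equal $\mathbf{b}$.

Solution Consider the system of equations
\[
\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix},
\]
which clearly is not consistent. Here $\mathbf{A} = \mathbf{J}_2$, for which one generalized inverse is $\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$. Then
\[
\mathbf{A}\mathbf{A}^-\mathbf{b} = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \neq \mathbf{b}.
\]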
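The counterexample of Exercise 4 uses small integer matrices, so it can be reproduced exactly. The sketch below confirms that the stated matrix is a generalized inverse of the 2 × 2 matrix of ones and that the product with b is the null vector rather than b. The helper name `matmul` and the variable names are assumptions for illustration.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


A = [[1, 1], [1, 1]]          # J_2
A_ginv = [[1, 0], [0, 0]]     # the generalized inverse used in the solution
b = [[0], [1]]                # right-hand side of the inconsistent system

# Defining condition for a generalized inverse: A G A = A.
ginv_condition_holds = matmul(matmul(A, A_ginv), A) == A

# For a consistent system, A A^- b would reproduce b; here it does not.
AAb = matmul(matmul(A, A_ginv), b)
print(ginv_condition_holds, AAb)
```

The equality of AA⁻b and b is therefore a usable test for consistency: it fails here precisely because b is not in the column space of A.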
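Exercise 5 Let $\mathbf{X}$ represent a matrix such that $\mathbf{X}^T\mathbf{X}$ is nonsingular, and let $k$ represent an arbitrary real number. Consider generalized inverses of the matrix $\mathbf{I} + k\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$. For each $k \in \mathbb{R}$, determine $S_k$, where
\[
S_k = \{c \in \mathbb{R} : \mathbf{I} + c\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \text{ is a generalized inverse of } \mathbf{I} + k\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\}.
\]

Solution Because $\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ is idempotent,
\begin{align*}
&[\mathbf{I} + k\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T][\mathbf{I} + c\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T][\mathbf{I} + k\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T] \\
&\quad = [\mathbf{I} + (k + c + kc)\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T][\mathbf{I} + k\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T] \\
&\quad = \mathbf{I} + (2k + k^2 + k^2c + 2kc + c)\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T .
\end{align*}
Thus $\mathbf{I} + c\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ is a generalized inverse of $\mathbf{I} + k\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ if and only if $2k + k^2 + c(k + 1)^2 = k$, i.e., if and only if $c(k + 1)^2 = -k(k + 1)$. If $k = -1$, then $c$ is arbitrary; if $k \neq -1$, then $c = -k/(k + 1)$. So $S_k = \mathbb{R}$ if $k = -1$; $S_k = \{-k/(k + 1)\}$ otherwise.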
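The description of the set S_k can be spot-checked with exact arithmetic. The sketch below takes X to be a column of two ones, an illustrative assumption under which the projection matrix X(XᵀX)⁻¹Xᵀ equals one half times the 2 × 2 matrix of ones, and then tests several (k, c) pairs. The helper names `identity_plus` and `c_is_valid` are hypothetical.

```python
from fractions import Fraction


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


# Projection matrix X (X^T X)^{-1} X^T for the assumed X = (1, 1)^T.
P = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 2), Fraction(1, 2)]]


def identity_plus(t, P):
    """Return I + t P."""
    n = len(P)
    return [[(1 if i == j else 0) + t * P[i][j] for j in range(n)]
            for i in range(n)]


def c_is_valid(k, c):
    """True when I + cP is a generalized inverse of I + kP."""
    M = identity_plus(k, P)
    G = identity_plus(c, P)
    return matmul(matmul(M, G), M) == M


checks = {
    "k=2, c=-2/3": c_is_valid(2, Fraction(-2, 3)),      # c = -k/(k+1)
    "k=2, c=0": c_is_valid(2, 0),                       # any other c fails
    "k=-1, c=5": c_is_valid(-1, 5),                     # c arbitrary
    "k=-1, c=-7/2": c_is_valid(-1, Fraction(-7, 2)),    # c arbitrary
}
print(checks)
```

For k = 2 only c = −2/3 passes, while for k = −1 every tested c passes, matching the two cases of the solution.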
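Exercise 6 Let $\mathbf{A}$ represent any $m \times n$ matrix, and let $\mathbf{B}$ represent any $n \times q$ matrix. Prove that for any choices of generalized inverses $\mathbf{A}^-$ and $\mathbf{B}^-$, $\mathbf{B}^-\mathbf{A}^-$ is a generalized inverse of $\mathbf{A}\mathbf{B}$ if and only if $\mathbf{A}^-\mathbf{A}\mathbf{B}\mathbf{B}^-$ is idempotent.

Solution If $\mathbf{B}^-\mathbf{A}^-$ is a generalized inverse of $\mathbf{A}\mathbf{B}$, then $\mathbf{A}\mathbf{B}\mathbf{B}^-\mathbf{A}^-\mathbf{A}\mathbf{B} = \mathbf{A}\mathbf{B}$, implying further (upon pre-multiplying both sides of the matrix equation by $\mathbf{A}^-$ and post-multiplying by $\mathbf{B}^-$) that $\mathbf{A}^-\mathbf{A}\mathbf{B}\mathbf{B}^-\mathbf{A}^-\mathbf{A}\mathbf{B}\mathbf{B}^- = \mathbf{A}^-\mathbf{A}\mathbf{B}\mathbf{B}^-$, i.e., $\mathbf{A}^-\mathbf{A}\mathbf{B}\mathbf{B}^-$ is idempotent. Conversely, if $\mathbf{A}^-\mathbf{A}\mathbf{B}\mathbf{B}^-$ is idempotent, then $\mathbf{A}^-\mathbf{A}\mathbf{B}\mathbf{B}^-\mathbf{A}^-\mathbf{A}\mathbf{B}\mathbf{B}^- = \mathbf{A}^-\mathbf{A}\mathbf{B}\mathbf{B}^-$, implying further (upon pre-multiplying both sides of the matrix equation by $\mathbf{A}$ and post-multiplying by $\mathbf{B}$) that $\mathbf{A}\mathbf{B}\mathbf{B}^-\mathbf{A}^-\mathbf{A}\mathbf{B} = \mathbf{A}\mathbf{B}$, i.e., $\mathbf{B}^-\mathbf{A}^-$ is a generalized inverse of $\mathbf{A}\mathbf{B}$.

Exercise 7 Prove Theorem 3.3.5: For any $n \times m$ matrix $\mathbf{A}$ of rank $r$, $\mathbf{A}\mathbf{A}^-$ and $\mathbf{A}^-\mathbf{A}$ are idempotent matrices of rank $r$, and $\mathbf{I}_n - \mathbf{A}\mathbf{A}^-$ and $\mathbf{I}_m - \mathbf{A}^-\mathbf{A}$ are idempotent matrices of ranks $n - r$ and $m - r$, respectively.

Solution $(\mathbf{A}\mathbf{A}^-)(\mathbf{A}\mathbf{A}^-) = (\mathbf{A}\mathbf{A}^-\mathbf{A})\mathbf{A}^- = \mathbf{A}\mathbf{A}^-$ and $(\mathbf{A}^-\mathbf{A})(\mathbf{A}^-\mathbf{A}) = \mathbf{A}^-(\mathbf{A}\mathbf{A}^-\mathbf{A}) = \mathbf{A}^-\mathbf{A}$, which establishes that $\mathbf{A}\mathbf{A}^-$ and $\mathbf{A}^-\mathbf{A}$ are idempotent. That $\mathbf{I}_n - \mathbf{A}\mathbf{A}^-$ and $\mathbf{I}_m - \mathbf{A}^-\mathbf{A}$ likewise are idempotent follows immediately by Theorem 2.12.1. Now by Theorem 2.8.4, $\operatorname{rank}(\mathbf{A}) = \operatorname{rank}(\mathbf{A}\mathbf{A}^-\mathbf{A}) \leq \operatorname{rank}(\mathbf{A}\mathbf{A}^-) \leq \operatorname{rank}(\mathbf{A})$; thus $\operatorname{rank}(\mathbf{A}\mathbf{A}^-) = \operatorname{rank}(\mathbf{A}) = r$. By the idempotency of $\mathbf{I}_n - \mathbf{A}\mathbf{A}^-$ and $\mathbf{A}\mathbf{A}^-$ and by Theorems 2.12.2 and 2.10.1b, $\operatorname{rank}(\mathbf{I}_n - \mathbf{A}\mathbf{A}^-) = \operatorname{tr}(\mathbf{I}_n - \mathbf{A}\mathbf{A}^-) = \operatorname{tr}(\mathbf{I}_n) - \operatorname{tr}(\mathbf{A}\mathbf{A}^-) = n - \operatorname{rank}(\mathbf{A}\mathbf{A}^-) = n - r$. Similar arguments yield $\operatorname{rank}(\mathbf{A}^-\mathbf{A}) = r$ and $\operatorname{rank}(\mathbf{I}_m - \mathbf{A}^-\mathbf{A}) = m - r$.

Exercise 8 Prove Theorem 3.3.7: Let $\mathbf{M} = \begin{pmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{pmatrix}$ represent a partitioned matrix.
(a) If $\mathcal{C}(\mathbf{B})$ is a subspace of $\mathcal{C}(\mathbf{A})$ and $\mathcal{R}(\mathbf{C})$ is a subspace of $\mathcal{R}(\mathbf{A})$ (as would be the case, e.g., if $\mathbf{A}$ was nonsingular), then the partitioned matrix
\[
\begin{pmatrix} \mathbf{A}^- + \mathbf{A}^-\mathbf{B}\mathbf{Q}^-\mathbf{C}\mathbf{A}^- & -\mathbf{A}^-\mathbf{B}\mathbf{Q}^- \\ -\mathbf{Q}^-\mathbf{C}\mathbf{A}^- & \mathbf{Q}^- \end{pmatrix},
\]
where $\mathbf{Q} = \mathbf{D} - \mathbf{C}\mathbf{A}^-\mathbf{B}$, is a generalized inverse of $\mathbf{M}$.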
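(b) If $\mathcal{C}(\mathbf{C})$ is a subspace of $\mathcal{C}(\mathbf{D})$ and $\mathcal{R}(\mathbf{B})$ is a subspace of $\mathcal{R}(\mathbf{D})$ (as would be the case, e.g., if $\mathbf{D}$ was nonsingular), then
\[
\begin{pmatrix} \mathbf{P}^- & -\mathbf{P}^-\mathbf{B}\mathbf{D}^- \\ -\mathbf{D}^-\mathbf{C}\mathbf{P}^- & \mathbf{D}^- + \mathbf{D}^-\mathbf{C}\mathbf{P}^-\mathbf{B}\mathbf{D}^- \end{pmatrix},
\]
where $\mathbf{P} = \mathbf{A} - \mathbf{B}\mathbf{D}^-\mathbf{C}$, is a generalized inverse of $\mathbf{M}$.

Solution We prove part (a) only; the proof of part (b) is very similar. The given condition $\mathcal{C}(\mathbf{B}) \subseteq \mathcal{C}(\mathbf{A})$ implies that $\mathbf{B} = \mathbf{A}\mathbf{F}$ for some $\mathbf{F}$, or equivalently that $\mathbf{A}\mathbf{A}^-\mathbf{B} = \mathbf{B}$. Similarly, $\mathcal{R}(\mathbf{C}) \subseteq \mathcal{R}(\mathbf{A})$ implies that $\mathbf{C} = \mathbf{H}\mathbf{A}$ for some $\mathbf{H}$, or equivalently that $\mathbf{C}\mathbf{A}^-\mathbf{A} = \mathbf{C}$. Then
\begin{align*}
&\begin{pmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{pmatrix}
\begin{pmatrix} \mathbf{A}^- + \mathbf{A}^-\mathbf{B}\mathbf{Q}^-\mathbf{C}\mathbf{A}^- & -\mathbf{A}^-\mathbf{B}\mathbf{Q}^- \\ -\mathbf{Q}^-\mathbf{C}\mathbf{A}^- & \mathbf{Q}^- \end{pmatrix}
\begin{pmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{pmatrix} \\
&\quad = \begin{pmatrix} \mathbf{A}\mathbf{A}^- + \mathbf{A}\mathbf{A}^-\mathbf{B}\mathbf{Q}^-\mathbf{C}\mathbf{A}^- - \mathbf{B}\mathbf{Q}^-\mathbf{C}\mathbf{A}^- & -\mathbf{A}\mathbf{A}^-\mathbf{B}\mathbf{Q}^- + \mathbf{B}\mathbf{Q}^- \\ \mathbf{C}\mathbf{A}^- + \mathbf{C}\mathbf{A}^-\mathbf{B}\mathbf{Q}^-\mathbf{C}\mathbf{A}^- - \mathbf{D}\mathbf{Q}^-\mathbf{C}\mathbf{A}^- & -\mathbf{C}\mathbf{A}^-\mathbf{B}\mathbf{Q}^- + \mathbf{D}\mathbf{Q}^- \end{pmatrix}
\begin{pmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{pmatrix} \\
&\quad = \begin{pmatrix} \mathbf{K}_{11} & \mathbf{K}_{12} \\ \mathbf{K}_{21} & \mathbf{K}_{22} \end{pmatrix},
\end{align*}
say, where
\begin{align*}
\mathbf{K}_{11} &= \mathbf{A}\mathbf{A}^-\mathbf{A} + \mathbf{A}\mathbf{A}^-\mathbf{B}\mathbf{Q}^-\mathbf{C}\mathbf{A}^-\mathbf{A} - \mathbf{B}\mathbf{Q}^-\mathbf{C}\mathbf{A}^-\mathbf{A} - \mathbf{A}\mathbf{A}^-\mathbf{B}\mathbf{Q}^-\mathbf{C} + \mathbf{B}\mathbf{Q}^-\mathbf{C} \\
&= \mathbf{A} + \mathbf{B}\mathbf{Q}^-\mathbf{C} - \mathbf{B}\mathbf{Q}^-\mathbf{C} - \mathbf{B}\mathbf{Q}^-\mathbf{C} + \mathbf{B}\mathbf{Q}^-\mathbf{C} = \mathbf{A}, \\
\mathbf{K}_{12} &= \mathbf{A}\mathbf{A}^-\mathbf{B} + \mathbf{A}\mathbf{A}^-\mathbf{B}\mathbf{Q}^-\mathbf{C}\mathbf{A}^-\mathbf{B} - \mathbf{B}\mathbf{Q}^-\mathbf{C}\mathbf{A}^-\mathbf{B} - \mathbf{A}\mathbf{A}^-\mathbf{B}\mathbf{Q}^-\mathbf{D} + \mathbf{B}\mathbf{Q}^-\mathbf{D} \\
&= \mathbf{B} + \mathbf{B}\mathbf{Q}^-\mathbf{C}\mathbf{A}^-\mathbf{B} - \mathbf{B}\mathbf{Q}^-\mathbf{C}\mathbf{A}^-\mathbf{B} - \mathbf{B}\mathbf{Q}^-\mathbf{D} + \mathbf{B}\mathbf{Q}^-\mathbf{D} = \mathbf{B}, \\
\mathbf{K}_{21} &= \mathbf{C}\mathbf{A}^-\mathbf{A} + \mathbf{C}\mathbf{A}^-\mathbf{B}\mathbf{Q}^-\mathbf{C}\mathbf{A}^-\mathbf{A} - \mathbf{D}\mathbf{Q}^-\mathbf{C}\mathbf{A}^-\mathbf{A} - \mathbf{C}\mathbf{A}^-\mathbf{B}\mathbf{Q}^-\mathbf{C} + \mathbf{D}\mathbf{Q}^-\mathbf{C} \\
&= \mathbf{C} + \mathbf{C}\mathbf{A}^-\mathbf{B}\mathbf{Q}^-\mathbf{C} - \mathbf{D}\mathbf{Q}^-\mathbf{C} - \mathbf{C}\mathbf{A}^-\mathbf{B}\mathbf{Q}^-\mathbf{C} + \mathbf{D}\mathbf{Q}^-\mathbf{C} = \mathbf{C},
\end{align*}
and
\begin{align*}
\mathbf{K}_{22} &= \mathbf{C}\mathbf{A}^-\mathbf{B} + \mathbf{C}\mathbf{A}^-\mathbf{B}\mathbf{Q}^-\mathbf{C}\mathbf{A}^-\mathbf{B} - \mathbf{D}\mathbf{Q}^-\mathbf{C}\mathbf{A}^-\mathbf{B} - \mathbf{C}\mathbf{A}^-\mathbf{B}\mathbf{Q}^-\mathbf{D} + \mathbf{D}\mathbf{Q}^-\mathbf{D} \\
&= \mathbf{C}\mathbf{A}^-\mathbf{B} - (\mathbf{D} - \mathbf{C}\mathbf{A}^-\mathbf{B})\mathbf{Q}^-\mathbf{C}\mathbf{A}^-\mathbf{B} + (\mathbf{D} - \mathbf{C}\mathbf{A}^-\mathbf{B})\mathbf{Q}^-\mathbf{D} \\
&= \mathbf{C}\mathbf{A}^-\mathbf{B} + \mathbf{Q}\mathbf{Q}^-(\mathbf{D} - \mathbf{C}\mathbf{A}^-\mathbf{B}) \\
&= \mathbf{C}\mathbf{A}^-\mathbf{B} + \mathbf{Q} \\
&= \mathbf{D}.
\end{align*}

Exercise 9 Prove Theorem 3.3.10: Let $\mathbf{M} = \mathbf{A} + \mathbf{H}$, where $\mathcal{C}(\mathbf{H})$ and $\mathcal{R}(\mathbf{H})$ are subspaces of $\mathcal{C}(\mathbf{A})$ and $\mathcal{R}(\mathbf{A})$, respectively (as would be the case, e.g., if $\mathbf{A}$ was nonsingular), and take $\mathbf{B}$, $\mathbf{C}$, and $\mathbf{D}$ to represent any three matrices such that $\mathbf{H} = \mathbf{B}\mathbf{C}\mathbf{D}$. Then, the matrix $\mathbf{A}^- - \mathbf{A}^-\mathbf{B}\mathbf{C}(\mathbf{C} + \mathbf{C}\mathbf{D}\mathbf{A}^-\mathbf{B}\mathbf{C})^-\mathbf{C}\mathbf{D}\mathbf{A}^-$ is a generalized inverse of $\mathbf{M}$.

Solution Define $\mathbf{Q} = \mathbf{C} + \mathbf{C}\mathbf{D}\mathbf{A}^-\mathbf{B}\mathbf{C}$. Because $\mathcal{C}(\mathbf{H}) \subseteq \mathcal{C}(\mathbf{A})$ and $\mathcal{R}(\mathbf{H}) \subseteq \mathcal{R}(\mathbf{A})$, matrices $\mathbf{F}$ and $\mathbf{K}$ exist such that $\mathbf{H} = \mathbf{A}\mathbf{F}$ and $\mathbf{H} = \mathbf{K}\mathbf{A}$. It follows that $\mathbf{A}\mathbf{A}^-\mathbf{H} = \mathbf{A}\mathbf{A}^-\mathbf{A}\mathbf{F} = \mathbf{A}\mathbf{F} = \mathbf{H}$ and $\mathbf{H}\mathbf{A}^-\mathbf{A} = \mathbf{K}\mathbf{A}\mathbf{A}^-\mathbf{A} = \mathbf{K}\mathbf{A} = \mathbf{H}$. Therefore
\begin{align*}
(\mathbf{A} + \mathbf{H})(\mathbf{A}^- - \mathbf{A}^-\mathbf{B}\mathbf{C}\mathbf{Q}^-\mathbf{C}\mathbf{D}\mathbf{A}^-)(\mathbf{A} + \mathbf{H})
&= (\mathbf{A} + \mathbf{H})\mathbf{A}^-(\mathbf{A} + \mathbf{H}) - (\mathbf{A} + \mathbf{H})(\mathbf{A}^-\mathbf{B}\mathbf{C}\mathbf{Q}^-\mathbf{C}\mathbf{D}\mathbf{A}^-)(\mathbf{A} + \mathbf{H}) \\
&= \mathbf{A} + \mathbf{H} + \mathbf{H} + \mathbf{H}\mathbf{A}^-\mathbf{H} - \mathbf{R},
\end{align*}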
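where
\begin{align*}
\mathbf{R} &= \mathbf{A}\mathbf{A}^-\mathbf{B}\mathbf{C}\mathbf{Q}^-\mathbf{C}\mathbf{D}\mathbf{A}^-\mathbf{A} + \mathbf{A}\mathbf{A}^-\mathbf{B}\mathbf{C}\mathbf{Q}^-\mathbf{C}\mathbf{D}\mathbf{A}^-\mathbf{H} + \mathbf{H}\mathbf{A}^-\mathbf{B}\mathbf{C}\mathbf{Q}^-\mathbf{C}\mathbf{D}\mathbf{A}^-\mathbf{A} + \mathbf{H}\mathbf{A}^-\mathbf{B}\mathbf{C}\mathbf{Q}^-\mathbf{C}\mathbf{D}\mathbf{A}^-\mathbf{H} \\
&= \mathbf{A}\mathbf{A}^-\mathbf{B}\mathbf{C}\mathbf{Q}^-\mathbf{C}\mathbf{D}\mathbf{A}^-\mathbf{A} + \mathbf{A}\mathbf{A}^-\mathbf{B}\mathbf{C}\mathbf{Q}^-\mathbf{C}\mathbf{D}\mathbf{A}^-\mathbf{H}\mathbf{A}^-\mathbf{A} + \mathbf{A}\mathbf{A}^-\mathbf{H}\mathbf{A}^-\mathbf{B}\mathbf{C}\mathbf{Q}^-\mathbf{C}\mathbf{D}\mathbf{A}^-\mathbf{A} + \mathbf{A}\mathbf{A}^-\mathbf{H}\mathbf{A}^-\mathbf{B}\mathbf{C}\mathbf{Q}^-\mathbf{C}\mathbf{D}\mathbf{A}^-\mathbf{H}\mathbf{A}^-\mathbf{A} \\
&= \mathbf{A}\mathbf{A}^-\mathbf{B}\mathbf{C}\mathbf{Q}^-\mathbf{C}\mathbf{D}\mathbf{A}^-\mathbf{A} + \mathbf{A}\mathbf{A}^-\mathbf{B}\mathbf{C}\mathbf{Q}^-\mathbf{C}\mathbf{D}\mathbf{A}^-\mathbf{B}\mathbf{C}\mathbf{D}\mathbf{A}^-\mathbf{A} + \mathbf{A}\mathbf{A}^-\mathbf{B}\mathbf{C}\mathbf{D}\mathbf{A}^-\mathbf{B}\mathbf{C}\mathbf{Q}^-\mathbf{C}\mathbf{D}\mathbf{A}^-\mathbf{A} + \mathbf{A}\mathbf{A}^-\mathbf{B}\mathbf{C}\mathbf{D}\mathbf{A}^-\mathbf{B}\mathbf{C}\mathbf{Q}^-\mathbf{C}\mathbf{D}\mathbf{A}^-\mathbf{B}\mathbf{C}\mathbf{D}\mathbf{A}^-\mathbf{A} \\
&= \mathbf{A}\mathbf{A}^-\mathbf{B}[\mathbf{C}\mathbf{Q}^-\mathbf{C} + \mathbf{C}\mathbf{Q}^-\mathbf{C}\mathbf{D}\mathbf{A}^-\mathbf{B}\mathbf{C} + \mathbf{C}\mathbf{D}\mathbf{A}^-\mathbf{B}\mathbf{C}\mathbf{Q}^-\mathbf{C} + \mathbf{C}\mathbf{D}\mathbf{A}^-\mathbf{B}\mathbf{C}\mathbf{Q}^-\mathbf{C}\mathbf{D}\mathbf{A}^-\mathbf{B}\mathbf{C}]\mathbf{D}\mathbf{A}^-\mathbf{A} \\
&= \mathbf{A}\mathbf{A}^-\mathbf{B}[(\mathbf{C} + \mathbf{C}\mathbf{D}\mathbf{A}^-\mathbf{B}\mathbf{C})\mathbf{Q}^-(\mathbf{C} + \mathbf{C}\mathbf{D}\mathbf{A}^-\mathbf{B}\mathbf{C})]\mathbf{D}\mathbf{A}^-\mathbf{A} \\
&= \mathbf{A}\mathbf{A}^-\mathbf{B}(\mathbf{C} + \mathbf{C}\mathbf{D}\mathbf{A}^-\mathbf{B}\mathbf{C})\mathbf{D}\mathbf{A}^-\mathbf{A} \\
&= \mathbf{A}\mathbf{A}^-\mathbf{H}\mathbf{A}^-\mathbf{A} + \mathbf{A}\mathbf{A}^-\mathbf{H}\mathbf{A}^-\mathbf{H}\mathbf{A}^-\mathbf{A} \\
&= \mathbf{H} + \mathbf{H}\mathbf{A}^-\mathbf{H}.
\end{align*}
Here the next-to-last simplification of the bracketed term uses $\mathbf{Q}\mathbf{Q}^-\mathbf{Q} = \mathbf{Q}$ with $\mathbf{Q} = \mathbf{C} + \mathbf{C}\mathbf{D}\mathbf{A}^-\mathbf{B}\mathbf{C}$. Substituting this expression for $\mathbf{R}$ into the expression for $(\mathbf{A} + \mathbf{H})(\mathbf{A}^- - \mathbf{A}^-\mathbf{B}\mathbf{C}\mathbf{Q}^-\mathbf{C}\mathbf{D}\mathbf{A}^-)(\mathbf{A} + \mathbf{H})$, we obtain
\[
(\mathbf{A} + \mathbf{H})(\mathbf{A}^- - \mathbf{A}^-\mathbf{B}\mathbf{C}\mathbf{Q}^-\mathbf{C}\mathbf{D}\mathbf{A}^-)(\mathbf{A} + \mathbf{H}) = \mathbf{A} + \mathbf{H},
\]
which establishes that $\mathbf{A}^- - \mathbf{A}^-\mathbf{B}\mathbf{C}\mathbf{Q}^-\mathbf{C}\mathbf{D}\mathbf{A}^-$ is a generalized inverse of $\mathbf{A} + \mathbf{H}$.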
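Exercise 10 Prove Theorem 3.3.11: Let $\mathbf{A} = \begin{pmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{pmatrix}$, where $\mathbf{A}_{11}$ is of dimensions $n \times m$, represent a partitioned matrix and let $\mathbf{G} = \begin{pmatrix} \mathbf{G}_{11} & \mathbf{G}_{12} \\ \mathbf{G}_{21} & \mathbf{G}_{22} \end{pmatrix}$, where $\mathbf{G}_{11}$ is of dimensions $m \times n$, represent any generalized inverse of $\mathbf{A}$. If $\mathcal{R}(\mathbf{A}_{21}) \cap \mathcal{R}(\mathbf{A}_{11}) = \{\mathbf{0}\}$ and $\mathcal{C}(\mathbf{A}_{12}) \cap \mathcal{C}(\mathbf{A}_{11}) = \{\mathbf{0}\}$, then $\mathbf{G}_{11}$ is a generalized inverse of $\mathbf{A}_{11}$.

Solution
\begin{align*}
\begin{pmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{pmatrix}
&= \begin{pmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{pmatrix}
\begin{pmatrix} \mathbf{G}_{11} & \mathbf{G}_{12} \\ \mathbf{G}_{21} & \mathbf{G}_{22} \end{pmatrix}
\begin{pmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{pmatrix} \\
&= \begin{pmatrix} \mathbf{A}_{11}\mathbf{G}_{11} + \mathbf{A}_{12}\mathbf{G}_{21} & \mathbf{A}_{11}\mathbf{G}_{12} + \mathbf{A}_{12}\mathbf{G}_{22} \\ \mathbf{A}_{21}\mathbf{G}_{11} + \mathbf{A}_{22}\mathbf{G}_{21} & \mathbf{A}_{21}\mathbf{G}_{12} + \mathbf{A}_{22}\mathbf{G}_{22} \end{pmatrix}
\begin{pmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{pmatrix},
\end{align*}
implying (by considering only the upper left blocks of the block matrices on the two sides of the matrix equation) that
\[
\mathbf{A}_{11}\mathbf{G}_{11}\mathbf{A}_{11} + \mathbf{A}_{12}\mathbf{G}_{21}\mathbf{A}_{11} + \mathbf{A}_{11}\mathbf{G}_{12}\mathbf{A}_{21} + \mathbf{A}_{12}\mathbf{G}_{22}\mathbf{A}_{21} = \mathbf{A}_{11},
\]
i.e.,
\[
\mathbf{A}_{12}(\mathbf{G}_{21}\mathbf{A}_{11} + \mathbf{G}_{22}\mathbf{A}_{21}) = \mathbf{A}_{11}(\mathbf{I} - \mathbf{G}_{11}\mathbf{A}_{11} - \mathbf{G}_{12}\mathbf{A}_{21}).
\]
Because the column spaces of $\mathbf{A}_{12}$ and $\mathbf{A}_{11}$ are essentially disjoint, $\mathbf{A}_{11}(\mathbf{I} - \mathbf{G}_{11}\mathbf{A}_{11} - \mathbf{G}_{12}\mathbf{A}_{21}) = \mathbf{0}$, i.e., $(\mathbf{I} - \mathbf{A}_{11}\mathbf{G}_{11})\mathbf{A}_{11} = \mathbf{A}_{11}\mathbf{G}_{12}\mathbf{A}_{21}$. Because the row spaces of $\mathbf{A}_{11}$ and $\mathbf{A}_{21}$ are essentially disjoint, $(\mathbf{I} - \mathbf{A}_{11}\mathbf{G}_{11})\mathbf{A}_{11} = \mathbf{0}$, i.e., $\mathbf{A}_{11}\mathbf{G}_{11}\mathbf{A}_{11} = \mathbf{A}_{11}$, i.e., $\mathbf{G}_{11}$ is a generalized inverse of $\mathbf{A}_{11}$.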
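Exercise 11
(a) Prove Theorem 3.3.12: Let $\mathbf{A}$ and $\mathbf{B}$ represent two $m \times n$ matrices. If $\mathcal{C}(\mathbf{A}) \cap \mathcal{C}(\mathbf{B}) = \{\mathbf{0}\}$ and $\mathcal{R}(\mathbf{A}) \cap \mathcal{R}(\mathbf{B}) = \{\mathbf{0}\}$, then any generalized inverse of $\mathbf{A} + \mathbf{B}$ is also a generalized inverse of $\mathbf{A}$.
(b) Show that the converse of Theorem 3.3.12 is false by constructing a counterexample based on the matrices
\[
\mathbf{A} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
\quad\text{and}\quad
\mathbf{B} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.
\]

Solution
(a) $\mathbf{A} + \mathbf{B} = (\mathbf{A} + \mathbf{B})(\mathbf{A} + \mathbf{B})^-(\mathbf{A} + \mathbf{B})$, implying that
\[
\mathbf{A} + \mathbf{B} = \mathbf{A}(\mathbf{A} + \mathbf{B})^-\mathbf{A} + \mathbf{B}(\mathbf{A} + \mathbf{B})^-\mathbf{A} + \mathbf{A}(\mathbf{A} + \mathbf{B})^-\mathbf{B} + \mathbf{B}(\mathbf{A} + \mathbf{B})^-\mathbf{B},
\]
implying further that
\[
\mathbf{A} - \mathbf{A}(\mathbf{A} + \mathbf{B})^-\mathbf{A} - \mathbf{A}(\mathbf{A} + \mathbf{B})^-\mathbf{B} = \mathbf{B}(\mathbf{A} + \mathbf{B})^-\mathbf{A} + \mathbf{B}(\mathbf{A} + \mathbf{B})^-\mathbf{B} - \mathbf{B}.
\]
The columns of the matrix on the left of this last equality are elements of $\mathcal{C}(\mathbf{A})$, while those of the matrix on the right are elements of $\mathcal{C}(\mathbf{B})$. Because $\mathcal{C}(\mathbf{A}) \cap \mathcal{C}(\mathbf{B}) = \{\mathbf{0}\}$,
\[
\mathbf{A} - \mathbf{A}(\mathbf{A} + \mathbf{B})^-\mathbf{A} - \mathbf{A}(\mathbf{A} + \mathbf{B})^-\mathbf{B} = \mathbf{0},
\]
i.e., $\mathbf{A} - \mathbf{A}(\mathbf{A} + \mathbf{B})^-\mathbf{A} = \mathbf{A}(\mathbf{A} + \mathbf{B})^-\mathbf{B}$. The rows of the matrix on the left of this last equality are elements of $\mathcal{R}(\mathbf{A})$, while those of the matrix on the right are elements of $\mathcal{R}(\mathbf{B})$. Because $\mathcal{R}(\mathbf{A}) \cap \mathcal{R}(\mathbf{B}) = \{\mathbf{0}\}$, $\mathbf{A} - \mathbf{A}(\mathbf{A} + \mathbf{B})^-\mathbf{A} = \mathbf{0}$. Thus $\mathbf{A} = \mathbf{A}(\mathbf{A} + \mathbf{B})^-\mathbf{A}$, showing that any generalized inverse of $\mathbf{A} + \mathbf{B}$ is also a generalized inverse of $\mathbf{A}$. This proves the theorem.
(b) Let $\mathbf{A}$ and $\mathbf{B}$ be as defined above. Then, $\mathbf{A} + \mathbf{B} = \mathbf{I}_2$, which is nonsingular so its only generalized inverse is its ordinary inverse, $\mathbf{I}_2$. It is easily verified that $\mathbf{A}\mathbf{I}\mathbf{A} = \mathbf{A}$, hence $\mathbf{I}_2$ is also a generalized inverse of $\mathbf{A}$. However, $\mathcal{C}(\mathbf{A}) = \left\{c\begin{pmatrix} 1 \\ 0 \end{pmatrix} : c \in \mathbb{R}\right\} \neq \mathcal{C}(\mathbf{B}) = \left\{c\begin{pmatrix} 0 \\ 1 \end{pmatrix} : c \in \mathbb{R}\right\}$. Thus $\mathcal{C}(\mathbf{A}) \cap \mathcal{C}(\mathbf{B}) = \{\mathbf{0}\}$, and by the symmetry of $\mathbf{A}$ and $\mathbf{B}$, $\mathcal{R}(\mathbf{A}) \cap \mathcal{R}(\mathbf{B}) = \{\mathbf{0}\}$ also.

Exercise 12 Determine Moore–Penrose inverses of each of the matrices listed in Exercise 3.1 except those in parts (c) and (j).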
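Solution
(a) It is easy to show that $(1/c)\mathbf{A}^+$ satisfies the four Moore–Penrose conditions.
(b) It was shown in Exercise 3.1b that $\frac{1}{\|\mathbf{a}\|^2\|\mathbf{b}\|^2}\mathbf{b}\mathbf{a}^T$ is a generalized inverse of $\mathbf{a}\mathbf{b}^T$. Observe that
\[
\left(\frac{1}{\|\mathbf{a}\|^2\|\mathbf{b}\|^2}\mathbf{b}\mathbf{a}^T\right)\mathbf{a}\mathbf{b}^T\left(\frac{1}{\|\mathbf{a}\|^2\|\mathbf{b}\|^2}\mathbf{b}\mathbf{a}^T\right) = \frac{1}{\|\mathbf{a}\|^2\|\mathbf{b}\|^2}\mathbf{b}\mathbf{a}^T,
\]
so this generalized inverse is reflexive. Furthermore,
\[
\left(\frac{1}{\|\mathbf{a}\|^2\|\mathbf{b}\|^2}\mathbf{b}\mathbf{a}^T\right)\mathbf{a}\mathbf{b}^T = \frac{1}{\|\mathbf{b}\|^2}\mathbf{b}\mathbf{b}^T
\quad\text{and}\quad
\mathbf{a}\mathbf{b}^T\left(\frac{1}{\|\mathbf{a}\|^2\|\mathbf{b}\|^2}\mathbf{b}\mathbf{a}^T\right) = \frac{1}{\|\mathbf{a}\|^2}\mathbf{a}\mathbf{a}^T,
\]
which are both symmetric; thus $\frac{1}{\|\mathbf{a}\|^2\|\mathbf{b}\|^2}\mathbf{b}\mathbf{a}^T$ is the Moore–Penrose inverse.
(d) It was shown in Exercise 3.1d that $\mathbf{C} = \operatorname{diag}(c_1, \ldots, c_n)$, where
\[
c_k = \begin{cases} 1/d_k, & \text{if } d_k \neq 0 \\ 0, & \text{if } d_k = 0 \end{cases}
\]
(for $k = 1, \ldots, n$) is a generalized inverse of $\mathbf{D}$. It is easily verified that $\mathbf{C}\mathbf{D}\mathbf{C} = \mathbf{C}$ and that $\mathbf{C}\mathbf{D}$ and $\mathbf{D}\mathbf{C}$ are both diagonal and hence symmetric. Thus $\mathbf{C}$ is the Moore–Penrose inverse of $\mathbf{D}$.
(e) With the elements of $\mathbf{C}$ defined via
\[
\mathbf{C} = \begin{pmatrix} & & c_1 \\ & \iddots & \\ c_n & & \end{pmatrix},
\]
it was shown in Exercise 3.1e that
\[
\mathbf{D} = \begin{pmatrix} & & d_n \\ & \iddots & \\ d_1 & & \end{pmatrix},
\]
where
\[
d_k = \begin{cases} 1/c_k, & \text{if } c_k \neq 0 \\ 0, & \text{if } c_k = 0 \end{cases}
\]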
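(for $k = 1, \ldots, n$) is a generalized inverse of $\mathbf{C}$. It is easily verified that $\mathbf{D}\mathbf{C}\mathbf{D} = \mathbf{D}$ and that $\mathbf{D}\mathbf{C}$ and $\mathbf{C}\mathbf{D}$ are both diagonal and hence symmetric. Thus $\mathbf{D}$ is the Moore–Penrose inverse of $\mathbf{C}$.
(f) By Exercise 3.1f, $\mathbf{Q}^T\mathbf{A}^+\mathbf{P}^T$ is a generalized inverse of $\mathbf{P}\mathbf{A}\mathbf{Q}$ because $\mathbf{A}^+$ is a generalized inverse of $\mathbf{A}$. Furthermore, $(\mathbf{Q}^T\mathbf{A}^+\mathbf{P}^T)\mathbf{P}\mathbf{A}\mathbf{Q}(\mathbf{Q}^T\mathbf{A}^+\mathbf{P}^T) = \mathbf{Q}^T\mathbf{A}^+\mathbf{A}\mathbf{A}^+\mathbf{P}^T = \mathbf{Q}^T\mathbf{A}^+\mathbf{P}^T$. Finally observe that $(\mathbf{Q}^T\mathbf{A}^+\mathbf{P}^T)\mathbf{P}\mathbf{A}\mathbf{Q} = \mathbf{Q}^T\mathbf{A}^+\mathbf{A}\mathbf{Q}$ and $\mathbf{P}\mathbf{A}\mathbf{Q}(\mathbf{Q}^T\mathbf{A}^+\mathbf{P}^T) = \mathbf{P}\mathbf{A}\mathbf{A}^+\mathbf{P}^T$, which are both symmetric because $\mathbf{A}^+\mathbf{A}$ and $\mathbf{A}\mathbf{A}^+$ are symmetric. Thus $\mathbf{Q}^T\mathbf{A}^+\mathbf{P}^T$ is the Moore–Penrose inverse of $\mathbf{P}\mathbf{A}\mathbf{Q}$.
(g) It is easy to show that $\mathbf{P}^T$ satisfies the four Moore–Penrose conditions.
(h) It is easy to show that $(1/mn)\mathbf{J}_{n \times m}$ satisfies the four Moore–Penrose conditions.
(i) If $b \neq -a/n$, then $\frac{1}{a}\mathbf{I}_n - \frac{b}{a(a + bn)}\mathbf{J}_n$ is the ordinary inverse and hence the Moore–Penrose inverse. If $b = -a/n$, then the matrix under consideration is $a(\mathbf{I}_n - \frac{1}{n}\mathbf{J}_n)$. By Exercise 3.1i, $\frac{1}{a}(\mathbf{I}_n - \frac{1}{n}\mathbf{J}_n)$ is a generalized inverse of this matrix. Observe that
\[
\left[\frac{1}{a}\left(\mathbf{I}_n - \frac{1}{n}\mathbf{J}_n\right)\right] a\left(\mathbf{I}_n - \frac{1}{n}\mathbf{J}_n\right) \left[\frac{1}{a}\left(\mathbf{I}_n - \frac{1}{n}\mathbf{J}_n\right)\right] = \frac{1}{a}\left(\mathbf{I}_n - \frac{1}{n}\mathbf{J}_n\right).
\]
Finally, $[\frac{1}{a}(\mathbf{I}_n - \frac{1}{n}\mathbf{J}_n)]a(\mathbf{I}_n - \frac{1}{n}\mathbf{J}_n) = \mathbf{I}_n - \frac{1}{n}\mathbf{J}_n$ and $a(\mathbf{I}_n - \frac{1}{n}\mathbf{J}_n)[\frac{1}{a}(\mathbf{I}_n - \frac{1}{n}\mathbf{J}_n)] = \mathbf{I}_n - \frac{1}{n}\mathbf{J}_n$, which are both symmetric. Thus $\frac{1}{a}(\mathbf{I}_n - \frac{1}{n}\mathbf{J}_n)$ is the Moore–Penrose inverse in this case.
(k) It is easy to show that $\oplus_{i=1}^k \mathbf{B}_i^+$ satisfies the four Moore–Penrose conditions.
(l) It is easy to show, with the aid of Theorem 2.17.5, that $\mathbf{A}^+ \otimes \mathbf{B}^+$ satisfies the four Moore–Penrose conditions.

Exercise 13 Prove Theorem 3.4.2: For any matrix $\mathbf{A}$, $(\mathbf{A}^T)^+ = (\mathbf{A}^+)^T$.

Solution It suffices to show that $(\mathbf{A}^+)^T$ satisfies the four Moore–Penrose conditions for the Moore–Penrose inverse of $\mathbf{A}^T$. Using the fact that the transpose of a product is the product of the transposes in reverse order, and the fact that $\mathbf{A}^+$ satisfies the four Moore–Penrose conditions for the Moore–Penrose inverse of $\mathbf{A}$, we obtain:
(i) $\mathbf{A}^T(\mathbf{A}^+)^T\mathbf{A}^T = (\mathbf{A}\mathbf{A}^+\mathbf{A})^T = \mathbf{A}^T$,
(ii) $(\mathbf{A}^+)^T\mathbf{A}^T(\mathbf{A}^+)^T = (\mathbf{A}^+\mathbf{A}\mathbf{A}^+)^T = (\mathbf{A}^+)^T$,
(iii) $[(\mathbf{A}^+)^T\mathbf{A}^T]^T = \mathbf{A}\mathbf{A}^+ = (\mathbf{A}\mathbf{A}^+)^T = (\mathbf{A}^+)^T\mathbf{A}^T$,
(iv) $[\mathbf{A}^T(\mathbf{A}^+)^T]^T = \mathbf{A}^+\mathbf{A} = (\mathbf{A}^+\mathbf{A})^T = \mathbf{A}^T(\mathbf{A}^+)^T$.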
Exercise 14 Let A represent any matrix. Prove that rank(A+ ) = rank(A). Does this result hold for an arbitrary generalized inverse of A? Prove or give a counterexample.
Solution By the first two Moore–Penrose conditions for A+ and Theorem 2.8.4, rank(A+ ) = rank(A+ AA+ ) ≤ rank(A) = rank(AA+ A) ≤ rank(A+ ). Thus rank(A+ ) = rank(A). This result does not hold for an arbitrary generalized inverse of A. Consider, for example, A = J2 , for which 0.5I2 is a generalized inverse as noted in (3.1). But rank(J2 ) = 1, whereas rank(0.5I2 ) = 2.
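The rank comparison in Exercise 14 can be reproduced with a small exact rank routine. The sketch below computes ranks by Gaussian elimination over the rationals and compares the 2 × 2 matrix of ones with two of its generalized inverses: one half times the identity, and one quarter times the matrix of ones, which is the Moore–Penrose inverse according to Exercise 12(h) with m = n = 2. The function name `rank` and the variable names are assumptions for illustration.

```python
from fractions import Fraction


def rank(M):
    """Rank of a matrix (list of rows) via exact Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0
    for col in range(len(rows[0])):
        # Find a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate the column from every other row.
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


J2 = [[1, 1], [1, 1]]
half_identity = [[Fraction(1, 2), 0], [0, Fraction(1, 2)]]    # a generalized inverse
mp_inverse = [[Fraction(1, 4), Fraction(1, 4)],
              [Fraction(1, 4), Fraction(1, 4)]]               # (1/mn) J with m = n = 2

ranks = (rank(J2), rank(half_identity), rank(mp_inverse))
print(ranks)
```

The Moore–Penrose inverse shares the rank of the original matrix, while the other generalized inverse has strictly larger rank, in line with the solution.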
4
Moments of a Random Vector and of Linear and Quadratic Forms in a Random Vector
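This chapter presents exercises on moments of a random vector and linear and quadratic forms in a random vector and provides solutions to those exercises.

Exercise 1 Prove Theorem 4.1.1: A variance–covariance matrix is nonnegative definite. Conversely, any nonnegative definite matrix is the variance–covariance matrix of some random vector.

Solution Let $\mathbf{x}$ represent a random $k$-vector such that $\mathrm{E}(\mathbf{x}) = \boldsymbol{\mu}$, and let $\mathbf{a} \in \mathbb{R}^k$. Then $\mathbf{a}^T\operatorname{var}(\mathbf{x})\mathbf{a} = \mathbf{a}^T\mathrm{E}[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T]\mathbf{a} = \mathrm{E}[\mathbf{a}^T(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^T\mathbf{a}] = \mathrm{E}\{[\mathbf{a}^T(\mathbf{x} - \boldsymbol{\mu})]^2\} \geq 0$. Conversely, let $\mathbf{z}$ represent a vector of independent random variables with common variance one, so that $\operatorname{var}(\mathbf{z}) = \mathbf{I}$, and let $\mathbf{A}$ represent any nonnegative definite matrix. By Theorem 2.15.12, a nonnegative definite matrix $\mathbf{A}^{1/2}$ exists such that $\mathbf{A} = \mathbf{A}^{1/2}\mathbf{A}^{1/2}$. Then $\operatorname{var}(\mathbf{A}^{1/2}\mathbf{z}) = \mathbf{A}^{1/2}\mathbf{I}\mathbf{A}^{1/2} = \mathbf{A}$.

Exercise 2 Prove Theorem 4.1.2: If the variances of all variables in a random vector $\mathbf{x}$ exist, and those variables are pairwise independent, then $\operatorname{var}(\mathbf{x})$ is a diagonal matrix.

Solution Without loss of generality consider $\operatorname{cov}(x_1, x_2)$, where $x_1$ and $x_2$ are the first two elements of $\mathbf{x}$. Because $x_1$ and $x_2$ are independent, $\operatorname{cov}(x_1, x_2) = \mathrm{E}[(x_1 - \mathrm{E}(x_1))(x_2 - \mathrm{E}(x_2))] = \mathrm{E}[x_1 - \mathrm{E}(x_1)] \cdot \mathrm{E}[x_2 - \mathrm{E}(x_2)] = 0 \cdot 0 = 0$. Thus all off-diagonal elements of $\operatorname{var}(\mathbf{x})$ equal 0, i.e., $\operatorname{var}(\mathbf{x})$ is diagonal.

Exercise 3 Prove that the second patterned variance–covariance matrix described in Example 4.1-1, i.e.,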
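$$\boldsymbol{\Sigma} = \sigma^2 \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 & 1 \\ 1 & 2 & 2 & \cdots & 2 & 2 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 1 & 2 & 3 & \cdots & n-1 & n-1 \\ 1 & 2 & 3 & \cdots & n-1 & n \end{pmatrix},$$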
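where $\sigma^2 > 0$, is positive definite.

Solution First, note that $(1/\sigma^2)\boldsymbol{\Sigma}$ is nonnegative definite, as shown in Example 4.1-1. Now, let $\mathbf{x}$ represent any vector for which $\mathbf{x}^T[(1/\sigma^2)\boldsymbol{\Sigma}]\mathbf{x} = 0$. Then

$$\begin{aligned} \mathbf{x}^T[(1/\sigma^2)\boldsymbol{\Sigma}]\mathbf{x} &= \mathbf{x}^T\mathbf{1}_n\mathbf{1}_n^T\mathbf{x} + \mathbf{x}^T\begin{pmatrix} 0 \\ \mathbf{1}_{n-1} \end{pmatrix}\begin{pmatrix} 0 \\ \mathbf{1}_{n-1} \end{pmatrix}^T\mathbf{x} + \mathbf{x}^T\begin{pmatrix} \mathbf{0}_2 \\ \mathbf{1}_{n-2} \end{pmatrix}\begin{pmatrix} \mathbf{0}_2 \\ \mathbf{1}_{n-2} \end{pmatrix}^T\mathbf{x} + \cdots + \mathbf{x}^T\begin{pmatrix} \mathbf{0}_{n-1} \\ 1 \end{pmatrix}\begin{pmatrix} \mathbf{0}_{n-1} \\ 1 \end{pmatrix}^T\mathbf{x} \\ &= \left(\sum_{i=1}^n x_i\right)^2 + \left(\sum_{i=2}^n x_i\right)^2 + \left(\sum_{i=3}^n x_i\right)^2 + \cdots + x_n^2 \\ &= 0. \end{aligned}$$

This implies that

$$x_n = 0, \quad x_{n-1} + x_n = 0, \quad \ldots, \quad \sum_{i=2}^n x_i = 0, \quad \sum_{i=1}^n x_i = 0,$$

i.e., that $x_n = x_{n-1} = x_{n-2} = \cdots = x_1 = 0$, i.e., that $\mathbf{x} = \mathbf{0}$. Thus, $\boldsymbol{\Sigma}$ is positive definite.

Exercise 4 Prove that the third patterned variance–covariance matrix described in Example 4.1-1, i.e.,

$$\boldsymbol{\Sigma} = \left(\oplus_{i=1}^n \sqrt{\sigma_{ii}}\right) \begin{pmatrix} 1 & \rho_1 & \rho_1\rho_2 & \rho_1\rho_2\rho_3 & \cdots & \prod_{i=1}^{n-2}\rho_i & \prod_{i=1}^{n-1}\rho_i \\ & 1 & \rho_2 & \rho_2\rho_3 & \cdots & \prod_{i=2}^{n-2}\rho_i & \prod_{i=2}^{n-1}\rho_i \\ & & 1 & \rho_3 & \cdots & \prod_{i=3}^{n-2}\rho_i & \prod_{i=3}^{n-1}\rho_i \\ & & & 1 & \cdots & \prod_{i=4}^{n-2}\rho_i & \prod_{i=4}^{n-1}\rho_i \\ & & & & \ddots & \vdots & \vdots \\ & & & & & 1 & \rho_{n-1} \\ & & & & & & 1 \end{pmatrix} \left(\oplus_{i=1}^n \sqrt{\sigma_{ii}}\right),$$

where $\sigma_{ii} > 0$ and $-1 < \rho_i < 1$ for all $i$, is positive definite.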
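Solution For $k = 1, \ldots, n$, define the $k \times k$ matrix

$$\mathbf{R}_k = \begin{pmatrix} 1 & \rho_1 & \rho_1\rho_2 & \rho_1\rho_2\rho_3 & \cdots & \prod_{i=1}^{k-2}\rho_i & \prod_{i=1}^{k-1}\rho_i \\ & 1 & \rho_2 & \rho_2\rho_3 & \cdots & \prod_{i=2}^{k-2}\rho_i & \prod_{i=2}^{k-1}\rho_i \\ & & 1 & \rho_3 & \cdots & \prod_{i=3}^{k-2}\rho_i & \prod_{i=3}^{k-1}\rho_i \\ & & & 1 & \cdots & \prod_{i=4}^{k-2}\rho_i & \prod_{i=4}^{k-1}\rho_i \\ & & & & \ddots & \vdots & \vdots \\ & & & & & 1 & \rho_{k-1} \\ & & & & & & 1 \end{pmatrix}.$$

Observe that the third patterned variance–covariance matrix is equal to $\mathbf{R}_n$, apart from pre- and post-multiplication by a diagonal matrix with positive main diagonal elements. Therefore, it suffices to show that $\mathbf{R}_n$ is positive definite. Also observe that for $k = 2, \ldots, n$, $\mathbf{R}_k$ may be written in partitioned form as

$$\mathbf{R}_k = \begin{pmatrix} \mathbf{R}_{k-1} & \mathbf{r}_{k-1} \\ \mathbf{r}_{k-1}^T & 1 \end{pmatrix},$$

where $\mathbf{r}_{k-1} = \left(\prod_{i=1}^{k-1}\rho_i, \prod_{i=2}^{k-1}\rho_i, \ldots, \rho_{k-1}\right)^T$. By Theorem 2.15.11, $\mathbf{R}_n$ is positive definite if and only if $|\mathbf{R}_1| > 0, |\mathbf{R}_2| > 0, \ldots, |\mathbf{R}_n| > 0$. We will use “proof by induction” to show that all of these inequalities are satisfied. First observe that $|\mathbf{R}_1| = |1| = 1 > 0$. Thus, it suffices to show that if $|\mathbf{R}_k| > 0$, then $|\mathbf{R}_{k+1}| > 0$. So suppose that $|\mathbf{R}_k| > 0$. By Theorem 2.11.7,

$$|\mathbf{R}_{k+1}| = |\mathbf{R}_k|(1 - \mathbf{r}_k^T\mathbf{R}_k^{-1}\mathbf{r}_k).$$

Crucially, $\mathbf{r}_k$ is equal to $\rho_k$ times the last column of $\mathbf{R}_k$, implying that $\mathbf{R}_k^{-1}\mathbf{r}_k = \rho_k\mathbf{u}_k^{(k)}$ and $\mathbf{r}_k = \rho_k\begin{pmatrix} \mathbf{r}_{k-1} \\ 1 \end{pmatrix}$. Therefore

$$|\mathbf{R}_{k+1}| = |\mathbf{R}_k|\left[1 - \rho_k\left(\mathbf{r}_{k-1}^T, 1\right)\left(\rho_k\mathbf{u}_k^{(k)}\right)\right] = |\mathbf{R}_k|(1 - \rho_k^2) > 0$$

because $-1 < \rho_k < 1$.

Exercise 5 Prove Theorem 4.1.3: Suppose that the skewness matrix $\boldsymbol{\Lambda} = (\lambda_{ijk})$ of a random $n$-vector $\mathbf{x}$ exists. (a) If the elements of $\mathbf{x}$ are triple-wise independent (meaning that each trivariate marginal cdf of $\mathbf{x}$ factors into the product of its univariate marginal cdfs), then for some constants $\lambda_1, \ldots, \lambda_n$,

$$\lambda_{ijk} = \begin{cases} \lambda_i & \text{if } i = j = k, \\ 0 & \text{otherwise,} \end{cases}$$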
or equivalently, $\boldsymbol{\Lambda} = (\lambda_1\mathbf{u}_1\mathbf{u}_1^T, \lambda_2\mathbf{u}_2\mathbf{u}_2^T, \ldots, \lambda_n\mathbf{u}_n\mathbf{u}_n^T)$, where $\mathbf{u}_i$ is the $i$th unit $n$-vector; (b) If the distribution of $\mathbf{x}$ is symmetric in the sense that the distributions of $[\mathbf{x} - E(\mathbf{x})]$ and $-[\mathbf{x} - E(\mathbf{x})]$ are identical, then $\boldsymbol{\Lambda} = \mathbf{0}$.

Solution (a) For $i = j = k$, $\lambda_{ijk} = \lambda_{iii} \equiv \lambda_i$. For any other case of $i$, $j$, and $k$, at least one is not equal to the other two. Without loss of generality assume that $i \ne j$ and $i \ne k$. Then $\lambda_{ijk} = E[(x_i - \mu_i)] \cdot E[(x_j - \mu_j)(x_k - \mu_k)] = 0 \cdot \sigma_{jk} = 0$. (b) Because two random vectors having the same distribution (in this case $\mathbf{x} - \boldsymbol{\mu}$ and $-(\mathbf{x} - \boldsymbol{\mu})$) necessarily have the same moments,

$$\lambda_{ijk} = E[(x_i - \mu_i)(x_j - \mu_j)(x_k - \mu_k)] = E\{[-(x_i - \mu_i)][-(x_j - \mu_j)][-(x_k - \mu_k)]\} = -E[(x_i - \mu_i)(x_j - \mu_j)(x_k - \mu_k)] = -\lambda_{ijk}$$

for all $i, j, k$. Thus $\lambda_{ijk} = 0$ for all $i, j, k$, i.e., $\boldsymbol{\Lambda} = \mathbf{0}$.

Exercise 6 Prove Theorem 4.1.4: If the kurtosis matrix $\boldsymbol{\Gamma} = (\gamma_{ijkl})$ of a random $n$-vector $\mathbf{x}$ exists and the elements of $\mathbf{x}$ are quadruple-wise independent (meaning that each quadrivariate marginal cdf of $\mathbf{x}$ factors into the product of its univariate marginal cdfs), then for some constants $\gamma_1, \ldots, \gamma_n$,

$$\gamma_{ijkl} = \begin{cases} \gamma_i & \text{if } i = j = k = l, \\ \sigma_{ii}\sigma_{kk} & \text{if } i = j \ne k = l, \\ \sigma_{ii}\sigma_{jj} & \text{if } i = k \ne j = l \text{ or } i = l \ne j = k, \\ 0 & \text{otherwise,} \end{cases}$$

where $\operatorname{var}(\mathbf{x}) = (\sigma_{ij})$.

Solution If $i = j = k = l$, then $\gamma_{ijkl} = \gamma_{iiii} \equiv \gamma_i$. If $i = j \ne k = l$, then $\gamma_{ijkl} = \gamma_{iikk} = E[(x_i - \mu_i)^2(x_k - \mu_k)^2] = E[(x_i - \mu_i)^2]E[(x_k - \mu_k)^2] = \sigma_{ii}\sigma_{kk}$. By exactly the same argument, $\gamma_{ijkl} = \sigma_{ii}\sigma_{jj}$ if $i = k \ne j = l$ or $i = l \ne j = k$. For any other case, at least one of $i, j, k, l$ is not equal to any of the other three. Without loss of generality assume that $i \ne j$, $i \ne k$, and $i \ne l$. Then $\gamma_{ijkl} = E[(x_i - \mu_i)]E[(x_j - \mu_j)(x_k - \mu_k)(x_l - \mu_l)] = 0 \cdot \lambda_{jkl} = 0$.
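The pattern in Theorem 4.1.4 can be checked exactly on a small example. The sketch below assumes three fully independent discrete components with made-up supports and probabilities (full independence implies quadruple-wise independence); it computes every fourth-order central moment by enumerating the joint support and compares it with the case formula.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical marginals: (values, probabilities) of three independent components.
marginals = [
    ([0, 1], [F(1, 2), F(1, 2)]),
    ([0, 1, 3], [F(1, 2), F(1, 4), F(1, 4)]),
    ([-1, 2], [F(2, 3), F(1, 3)]),
]
n = len(marginals)
means = [sum(v * p for v, p in zip(vals, probs)) for vals, probs in marginals]

def central_moment(idx):
    """E[prod over i in idx of (x_i - mu_i)], by enumerating the joint support."""
    total = F(0)
    for combo in product(*[list(zip(v, p)) for v, p in marginals]):
        prob = F(1)
        for _, p in combo:
            prob *= p
        term = F(1)
        for i in idx:
            term *= combo[i][0] - means[i]
        total += prob * term
    return total

sigma = [central_moment((i, i)) for i in range(n)]            # variances sigma_ii
gamma = {q: central_moment(q) for q in product(range(n), repeat=4)}

def expected(i, j, k, l):
    """Right-hand side of the case formula in Theorem 4.1.4."""
    if i == j == k == l:
        return gamma[(i, i, i, i)]        # gamma_i is defined as gamma_iiii
    if i == j and k == l:
        return sigma[i] * sigma[k]
    if (i == k and j == l) or (i == l and j == k):
        return sigma[i] * sigma[j]
    return F(0)

pattern_holds = all(gamma[q] == expected(*q) for q in gamma)
```

Exact rational arithmetic is used so that the comparison is an equality check rather than a tolerance check.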
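Exercise 7 Prove Theorem 4.2.1: Let $\mathbf{x}$ represent a random $n$-vector with mean $\boldsymbol{\mu}$, let $\mathbf{A}$ represent a $t \times n$ matrix of constants, and let $\mathbf{a}$ represent a $t$-vector of constants. Then $E(\mathbf{A}\mathbf{x} + \mathbf{a}) = \mathbf{A}\boldsymbol{\mu} + \mathbf{a}$.

Solution Represent $\mathbf{A}$ by its rows as

$$\begin{pmatrix} \mathbf{a}_1^T \\ \vdots \\ \mathbf{a}_t^T \end{pmatrix}$$

and let $\mathbf{a} = (a_i)$. Then

$$E(\mathbf{A}\mathbf{x} + \mathbf{a}) = E\left[\begin{pmatrix} \mathbf{a}_1^T \\ \vdots \\ \mathbf{a}_t^T \end{pmatrix}\mathbf{x} + \begin{pmatrix} a_1 \\ \vdots \\ a_t \end{pmatrix}\right] = E\begin{pmatrix} \mathbf{a}_1^T\mathbf{x} + a_1 \\ \vdots \\ \mathbf{a}_t^T\mathbf{x} + a_t \end{pmatrix} = \begin{pmatrix} E(\mathbf{a}_1^T\mathbf{x} + a_1) \\ \vdots \\ E(\mathbf{a}_t^T\mathbf{x} + a_t) \end{pmatrix} = \begin{pmatrix} \mathbf{a}_1^T\boldsymbol{\mu} + a_1 \\ \vdots \\ \mathbf{a}_t^T\boldsymbol{\mu} + a_t \end{pmatrix} = \mathbf{A}\boldsymbol{\mu} + \mathbf{a}.$$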
Exercise 8 Prove Theorem 4.2.2: Let x and y represent random n-vectors, and let z and w represent random m-vectors. If var(x), var(y), var(z), and var(w) exist, then: (a) cov(x + y, z + w) = cov(x, z) + cov(x, w) + cov(y, z) + cov(y, w); (b) var(x + y) = var(x) + var(y) + cov(x, y) + [cov(x, y)]T . Solution (a) cov(x + y, z + w) = E[(x + y − E(x + y))(z + w − E(z + w))T ] = E[(x − E(x))(z − E(z))T + (x − E(x))(w − E(w))T +(y − E(y))(z − E(z))T +(y − E(y))(w − E(w))T ] = cov(x, z) + cov(x, w) + cov(y, z) + cov(y, w).
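(b) Using part (a), $\operatorname{var}(\mathbf{x} + \mathbf{y}) = \operatorname{cov}(\mathbf{x} + \mathbf{y}, \mathbf{x} + \mathbf{y}) = \operatorname{cov}(\mathbf{x}, \mathbf{x}) + \operatorname{cov}(\mathbf{x}, \mathbf{y}) + \operatorname{cov}(\mathbf{y}, \mathbf{x}) + \operatorname{cov}(\mathbf{y}, \mathbf{y}) = \operatorname{var}(\mathbf{x}) + \operatorname{var}(\mathbf{y}) + \operatorname{cov}(\mathbf{x}, \mathbf{y}) + [\operatorname{cov}(\mathbf{x}, \mathbf{y})]^T$.

Exercise 9 Prove Theorem 4.2.3: Let $\mathbf{x}$ and $\mathbf{z}$ represent a random $n$-vector and a random $m$-vector, respectively, with matrix of covariances $\boldsymbol{\Sigma}$; let $\mathbf{A}$ and $\mathbf{B}$ represent $t \times n$ and $u \times m$ matrices of constants, respectively; and let $\mathbf{a}$ and $\mathbf{b}$ represent a $t$-vector and $u$-vector of constants, respectively. Then $\operatorname{cov}(\mathbf{A}\mathbf{x} + \mathbf{a}, \mathbf{B}\mathbf{z} + \mathbf{b}) = \mathbf{A}\boldsymbol{\Sigma}\mathbf{B}^T$.

Solution

$$\begin{aligned} \operatorname{cov}(\mathbf{A}\mathbf{x} + \mathbf{a}, \mathbf{B}\mathbf{z} + \mathbf{b}) &= E[(\mathbf{A}\mathbf{x} + \mathbf{a} - E(\mathbf{A}\mathbf{x} + \mathbf{a}))(\mathbf{B}\mathbf{z} + \mathbf{b} - E(\mathbf{B}\mathbf{z} + \mathbf{b}))^T] \\ &= E[(\mathbf{A}\mathbf{x} + \mathbf{a} - \mathbf{A}E(\mathbf{x}) - \mathbf{a})(\mathbf{B}\mathbf{z} + \mathbf{b} - \mathbf{B}E(\mathbf{z}) - \mathbf{b})^T] \\ &= E[\mathbf{A}(\mathbf{x} - E(\mathbf{x}))(\mathbf{z} - E(\mathbf{z}))^T\mathbf{B}^T] \\ &= \mathbf{A}E[(\mathbf{x} - E(\mathbf{x}))(\mathbf{z} - E(\mathbf{z}))^T]\mathbf{B}^T \\ &= \mathbf{A}\boldsymbol{\Sigma}\mathbf{B}^T. \end{aligned}$$

Exercise 10 Let $x_1, \ldots, x_n$ be uncorrelated random variables with common variance $\sigma^2$. Determine $\operatorname{cov}(x_i - \bar{x}, x_j - \bar{x})$ for $i \le j = 1, \ldots, n$.

Solution $x_i - \bar{x} = \mathbf{u}_i^T\mathbf{x} - (1/n)\mathbf{1}^T\mathbf{x}$ and $x_j - \bar{x} = \mathbf{u}_j^T\mathbf{x} - (1/n)\mathbf{1}^T\mathbf{x}$. Thus

$$\begin{aligned} \operatorname{cov}(x_i - \bar{x}, x_j - \bar{x}) &= \operatorname{cov}\left[\left(\mathbf{u}_i - \tfrac{1}{n}\mathbf{1}\right)^T\mathbf{x}, \left(\mathbf{u}_j - \tfrac{1}{n}\mathbf{1}\right)^T\mathbf{x}\right] \\ &= \left(\mathbf{u}_i - \tfrac{1}{n}\mathbf{1}\right)^T(\sigma^2\mathbf{I})\left(\mathbf{u}_j - \tfrac{1}{n}\mathbf{1}\right) \\ &= \sigma^2\left(\mathbf{u}_i^T\mathbf{u}_j - \tfrac{1}{n}\mathbf{1}^T\mathbf{u}_j - \tfrac{1}{n}\mathbf{u}_i^T\mathbf{1} + \tfrac{1}{n^2}\mathbf{1}^T\mathbf{1}\right) \\ &= \begin{cases} \sigma^2\left(1 - \tfrac{1}{n}\right) & \text{if } i = j, \\ -\tfrac{\sigma^2}{n} & \text{if } i < j. \end{cases} \end{aligned}$$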
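The result of Exercise 10 can be confirmed entrywise. In the sketch below, the values of n and σ² are arbitrary illustrative choices; the matrix C stacks the coefficient vectors (u_i − (1/n)1)^T, so the covariance matrix of the deviations is C(σ²I)C^T.

```python
from fractions import Fraction as F

n = 5             # hypothetical sample size
sigma2 = F(3)     # hypothetical common variance

# Row i of C holds the coefficients of x_i - xbar, i.e. (u_i - (1/n) 1)^T.
C = [[F(int(i == j)) - F(1, n) for j in range(n)] for i in range(n)]

# cov(Cx) = C (sigma^2 I) C^T, written out entrywise.
cov = [[sigma2 * sum(C[i][k] * C[j][k] for k in range(n)) for j in range(n)]
       for i in range(n)]

# Compare every entry with the two-case formula from the solution.
matches = all(
    cov[i][j] == (sigma2 * (1 - F(1, n)) if i == j else -sigma2 / n)
    for i in range(n) for j in range(n)
)
```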
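Exercise 11 Let $\mathbf{x}$ be a random $n$-vector with mean $\boldsymbol{\mu}$ and positive definite variance–covariance matrix $\boldsymbol{\Sigma}$. Define $\boldsymbol{\Sigma}^{-1/2} = (\boldsymbol{\Sigma}^{1/2})^{-1}$.

(a) Show that $\boldsymbol{\Sigma}^{-1/2}(\mathbf{x} - \boldsymbol{\mu})$ has mean $\mathbf{0}$ and variance–covariance matrix $\mathbf{I}$.
(b) Determine $E[(\mathbf{x} - \boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu})]$ and $\operatorname{var}[(\mathbf{x} - \boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu})]$, assuming for the latter that the skewness matrix $\boldsymbol{\Lambda}$ and excess kurtosis matrix $\boldsymbol{\Omega}$ of $\mathbf{x}$ exist.

Solution
(a) By Theorem 4.2.1, $E[\boldsymbol{\Sigma}^{-1/2}(\mathbf{x} - \boldsymbol{\mu})] = E(\boldsymbol{\Sigma}^{-1/2}\mathbf{x} - \boldsymbol{\Sigma}^{-1/2}\boldsymbol{\mu}) = \boldsymbol{\Sigma}^{-1/2}\boldsymbol{\mu} - \boldsymbol{\Sigma}^{-1/2}\boldsymbol{\mu} = \mathbf{0}$. By Corollary 4.2.3.1 and the symmetry of $\boldsymbol{\Sigma}^{1/2}$, $\operatorname{var}[\boldsymbol{\Sigma}^{-1/2}(\mathbf{x} - \boldsymbol{\mu})] = \boldsymbol{\Sigma}^{-1/2}\boldsymbol{\Sigma}\boldsymbol{\Sigma}^{-1/2} = \mathbf{I}$.
(b) Using Theorem 4.2.4, $E[(\mathbf{x} - \boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu})] = \mathbf{0}^T\boldsymbol{\Sigma}^{-1}\mathbf{0} + \operatorname{tr}(\boldsymbol{\Sigma}^{-1}\boldsymbol{\Sigma}) = \operatorname{tr}(\mathbf{I}_n) = n$. Furthermore, using Theorem 4.2.6,

$$\begin{aligned} \operatorname{var}[(\mathbf{x} - \boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu})] &= [\operatorname{vec}(\boldsymbol{\Sigma}^{-1})]^T\boldsymbol{\Omega}[\operatorname{vec}(\boldsymbol{\Sigma}^{-1})] + 4 \cdot \mathbf{0}^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\Lambda}\operatorname{vec}(\boldsymbol{\Sigma}^{-1}) + 2\operatorname{tr}(\boldsymbol{\Sigma}^{-1}\boldsymbol{\Sigma}\boldsymbol{\Sigma}^{-1}\boldsymbol{\Sigma}) + 4 \cdot \mathbf{0}^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\Sigma}\boldsymbol{\Sigma}^{-1}\mathbf{0} \\ &= [\operatorname{vec}(\boldsymbol{\Sigma}^{-1})]^T\boldsymbol{\Omega}[\operatorname{vec}(\boldsymbol{\Sigma}^{-1})] + 2\operatorname{tr}(\mathbf{I}) \\ &= [\operatorname{vec}(\boldsymbol{\Sigma}^{-1})]^T\boldsymbol{\Omega}[\operatorname{vec}(\boldsymbol{\Sigma}^{-1})] + 2n. \end{aligned}$$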
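Exercise 12 Let $\mathbf{x}$ be a random $n$-vector with variance–covariance matrix $\sigma^2\mathbf{I}$, where $\sigma^2 > 0$, and let $\mathbf{Q}$ represent an $n \times n$ orthogonal matrix. Determine the variance–covariance matrix of $\mathbf{Q}\mathbf{x}$.

Solution Because $\mathbf{Q}$ is orthogonal, $\mathbf{Q}^T\mathbf{Q} = \mathbf{Q}\mathbf{Q}^T = \mathbf{I}$. Therefore, by Corollary 4.2.3.1, $\operatorname{var}(\mathbf{Q}\mathbf{x}) = \mathbf{Q}(\sigma^2\mathbf{I})\mathbf{Q}^T = \sigma^2\mathbf{I}$.

Exercise 13 Let $\mathbf{x} = \begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{pmatrix}$ have mean vector $\boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix}$ and variance–covariance matrix $\boldsymbol{\Sigma} = \begin{pmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{pmatrix}$, where $\mathbf{x}_1$ and $\boldsymbol{\mu}_1$ are $m$-vectors and $\boldsymbol{\Sigma}_{11}$ is $m \times m$. Suppose that $\boldsymbol{\Sigma}$ is positive definite, in which case both $\boldsymbol{\Sigma}_{22}$ and $\boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}$ are positive definite by Theorem 2.15.7a.

(a) Determine the mean vector and variance–covariance matrix of $\mathbf{x}_{1\cdot2} = \mathbf{x}_1 - \boldsymbol{\mu}_1 - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2)$.
(b) Show that $\operatorname{cov}(\mathbf{x}_1 - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\mathbf{x}_2, \mathbf{x}_2) = \mathbf{0}$.
(c) Determine $E\{\mathbf{x}_{1\cdot2}^T[\operatorname{var}(\mathbf{x}_{1\cdot2})]^{-1}\mathbf{x}_{1\cdot2}\}$.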
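Solution (a) By Theorem 4.2.1, $E[\mathbf{x}_1 - \boldsymbol{\mu}_1 - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2)] = E(\mathbf{x}_1) - \boldsymbol{\mu}_1 - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}[E(\mathbf{x}_2) - \boldsymbol{\mu}_2] = \mathbf{0}$. By Theorem 4.2.2 and Corollary 4.2.3.1,

$$\begin{aligned} \operatorname{var}[\mathbf{x}_1 - \boldsymbol{\mu}_1 - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2)] &= \operatorname{var}(\mathbf{x}_1) + \operatorname{var}(\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\mathbf{x}_2) - \operatorname{cov}(\mathbf{x}_1, \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\mathbf{x}_2) - [\operatorname{cov}(\mathbf{x}_1, \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\mathbf{x}_2)]^T \\ &= \boldsymbol{\Sigma}_{11} + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{22}(\boldsymbol{\Sigma}_{22}^{-1})^T\boldsymbol{\Sigma}_{21} - \boldsymbol{\Sigma}_{12}(\boldsymbol{\Sigma}_{22}^{-1})^T\boldsymbol{\Sigma}_{21} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21} \\ &= \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}. \end{aligned}$$

(b) By Corollary 4.2.3.1,

$$\begin{aligned} \operatorname{cov}(\mathbf{x}_1 - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\mathbf{x}_2, \mathbf{x}_2) &= \operatorname{cov}[(\mathbf{I}, -\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1})\mathbf{x}, (\mathbf{0}, \mathbf{I})\mathbf{x}] \\ &= (\mathbf{I}, -\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1})\begin{pmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{pmatrix}\begin{pmatrix} \mathbf{0} \\ \mathbf{I} \end{pmatrix} \\ &= \left(\boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21},\; \boldsymbol{\Sigma}_{12} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{22}\right)\begin{pmatrix} \mathbf{0} \\ \mathbf{I} \end{pmatrix} \\ &= \mathbf{0}. \end{aligned}$$

(c) By part (a) and Theorem 4.2.4, this expectation is equal to

$$\mathbf{0}^T(\boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21})^{-1}\mathbf{0} + \operatorname{tr}[(\boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21})^{-1}(\boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21})] = 0 + \operatorname{tr}(\mathbf{I}_m) = m.$$

Exercise 14 Suppose that observations $x_1, \ldots, x_n$ have common mean $\mu$, common variance $\sigma^2$, and common correlation $\rho$ among pairs, so that $\boldsymbol{\mu} = \mu\mathbf{1}_n$ and $\boldsymbol{\Sigma} = \sigma^2[(1 - \rho)\mathbf{I}_n + \rho\mathbf{J}_n]$ for $\rho \in [-1/(n - 1), 1]$. Determine $E(\bar{x})$, $\operatorname{var}(\bar{x})$, and $E(s^2)$.

Solution Let $\mathbf{x} = (x_1, \ldots, x_n)^T$, so that $E(\mathbf{x}) = \boldsymbol{\mu} = \mu\mathbf{1}_n$ and $\operatorname{var}(\mathbf{x}) = \sigma^2[(1 - \rho)\mathbf{I}_n + \rho\mathbf{J}_n]$. Then by Theorems 4.2.1 and 4.2.4 and Corollary 4.2.3.1,

$$E(\bar{x}) = E[(1/n)\mathbf{1}^T\mathbf{x}] = (1/n)\mathbf{1}^T(\mu\mathbf{1}) = \mu;$$

$$\operatorname{var}(\bar{x}) = \operatorname{var}[(1/n)\mathbf{1}^T\mathbf{x}] = (\sigma^2/n^2)\mathbf{1}^T[(1 - \rho)\mathbf{I}_n + \rho\mathbf{J}_n]\mathbf{1} = (\sigma^2/n^2)[(1 - \rho)n + \rho n^2] = (\sigma^2/n)[1 + \rho(n - 1)];$$

$$\begin{aligned} E(s^2) &= E\left\{\frac{1}{n-1}\mathbf{x}^T[\mathbf{I}_n - (1/n)\mathbf{J}_n]\mathbf{x}\right\} \\ &= \frac{1}{n-1}\{(\mu\mathbf{1})^T[\mathbf{I}_n - (1/n)\mathbf{J}_n](\mu\mathbf{1})\} + \frac{\sigma^2}{n-1}\operatorname{tr}\{[\mathbf{I}_n - (1/n)\mathbf{J}_n][(1 - \rho)\mathbf{I}_n + \rho\mathbf{J}_n]\} \\ &= 0 + \frac{\sigma^2}{n-1}\operatorname{tr}\{(1 - \rho)[\mathbf{I}_n - (1/n)\mathbf{J}_n]\} \\ &= \sigma^2(1 - \rho). \end{aligned}$$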
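The two closed forms in Exercise 14 can be reproduced from the matrix expressions used in the solution. The sketch below picks arbitrary values of n, μ, σ², and ρ (all hypothetical) and evaluates (1/n²)1ᵀΣ1 and μᵀAμ + tr(AΣ), with A = [I − (1/n)J]/(n − 1), in exact arithmetic.

```python
from fractions import Fraction as F

n, mu, sigma2, rho = 4, F(5), F(2), F(1, 3)   # hypothetical values

# Sigma = sigma^2 [(1 - rho) I + rho J]: sigma^2 on the diagonal, sigma^2 * rho elsewhere.
Sigma = [[sigma2 * (1 if i == j else rho) for j in range(n)] for i in range(n)]

# var(xbar) = (1/n^2) 1^T Sigma 1
var_xbar = sum(sum(row) for row in Sigma) / n**2

# s^2 = x^T A x with A = [I - (1/n) J] / (n - 1)
A = [[(F(int(i == j)) - F(1, n)) / (n - 1) for j in range(n)] for i in range(n)]

# E(x^T A x) = mu^T A mu + tr(A Sigma), with the mean vector equal to mu * 1
quad = mu * mu * sum(sum(row) for row in A)
trace = sum(A[i][k] * Sigma[k][i] for i in range(n) for k in range(n))
E_s2 = quad + trace

closed_forms_agree = (
    var_xbar == (sigma2 / n) * (1 + rho * (n - 1))
    and E_s2 == sigma2 * (1 - rho)
)
```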
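Exercise 15 Prove that if $\mathbf{x}$ is a random vector for which $\boldsymbol{\mu} = \mathbf{0}$ and $\boldsymbol{\Lambda} = \mathbf{0}$, then any linear form $\mathbf{b}^T\mathbf{x}$ is uncorrelated with any quadratic form $\mathbf{x}^T\mathbf{A}\mathbf{x}$.

Solution By Corollary 4.2.5.1, $\operatorname{cov}(\mathbf{b}^T\mathbf{x}, \mathbf{x}^T\mathbf{A}\mathbf{x}) = 2\mathbf{b}^T\boldsymbol{\Sigma}\mathbf{A}\mathbf{0} = 0$.

Exercise 16 Determine how Theorems 4.2.1 and 4.2.4–4.2.6 specialize when $\boldsymbol{\mu}$ is orthogonal to every row of $\mathbf{A}$, i.e., when $\mathbf{A}\boldsymbol{\mu} = \mathbf{0}$.

Solution When $\boldsymbol{\mu}$ is orthogonal to every row of $\mathbf{A}$, i.e., when $\mathbf{A}\boldsymbol{\mu} = \mathbf{0}$, then $E(\mathbf{A}\mathbf{x} + \mathbf{a}) = \mathbf{a}$, $E(\mathbf{x}^T\mathbf{A}\mathbf{x}) = \operatorname{tr}(\mathbf{A}\boldsymbol{\Sigma})$, $\operatorname{cov}(\mathbf{B}\mathbf{x}, \mathbf{x}^T\mathbf{A}\mathbf{x}) = \mathbf{B}\boldsymbol{\Lambda}\operatorname{vec}(\mathbf{A})$, and $\operatorname{cov}(\mathbf{x}^T\mathbf{A}\mathbf{x}, \mathbf{x}^T\mathbf{B}\mathbf{x}) = [\operatorname{vec}(\mathbf{A})]^T\boldsymbol{\Omega}\operatorname{vec}(\mathbf{B}) + 2\boldsymbol{\mu}^T\mathbf{B}\boldsymbol{\Lambda}\operatorname{vec}(\mathbf{A}) + 2\operatorname{tr}(\mathbf{A}\boldsymbol{\Sigma}\mathbf{B}\boldsymbol{\Sigma})$.

Exercise 17 Let $\mathbf{x}$ be a random $n$-vector with mean $\boldsymbol{\mu}$ and variance–covariance matrix $\boldsymbol{\Sigma}$ of rank $r$. Find $E(\mathbf{x}^T\boldsymbol{\Sigma}^{-}\mathbf{x})$ and simplify it as much as possible.

Solution By Theorems 4.2.4, 3.3.5, and 2.12.2, $E(\mathbf{x}^T\boldsymbol{\Sigma}^{-}\mathbf{x}) = \boldsymbol{\mu}^T\boldsymbol{\Sigma}^{-}\boldsymbol{\mu} + \operatorname{tr}(\boldsymbol{\Sigma}^{-}\boldsymbol{\Sigma}) = \boldsymbol{\mu}^T\boldsymbol{\Sigma}^{-}\boldsymbol{\mu} + r$.

Exercise 18 Determine the covariance between the sample mean and sample variance of observations whose joint distribution is symmetric with common mean $\mu$, variance–covariance matrix $\boldsymbol{\Sigma}$, and finite skewness matrix $\boldsymbol{\Lambda}$.

Solution By Theorem 4.1.3b and Corollary 4.2.5.1, $\operatorname{cov}(\bar{x}, s^2) = 2(1/n)\mathbf{1}^T\boldsymbol{\Sigma}\{[\mathbf{I} - (1/n)\mathbf{J}]/(n - 1)\}(\mu\mathbf{1}) = (2/n)\mathbf{1}^T\boldsymbol{\Sigma}\mathbf{0} = 0$.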
5
Types of Linear Models
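This chapter presents exercises on types of linear models, and provides solutions to those exercises.

Exercise 1 Give as concise a representation of the model matrix $\mathbf{X}$ as possible for a two-way main effects model when the data are balanced, and specialize further to the case $r = 1$.

Solution This model is a special case of (5.6) with $n_{ij} = r$. Thus, we have $n_{i\cdot} = mr$ and $n_{\cdot j} = qr$ for all $i, j$. Then by (5.7), the model matrix for a balanced two-way main effects model is given by

$$\mathbf{X} = \begin{pmatrix} \mathbf{1}_{mr} & \mathbf{1}_{mr} & \mathbf{0}_{mr} & \cdots & \mathbf{0}_{mr} & \oplus_{j=1}^m\mathbf{1}_r \\ \mathbf{1}_{mr} & \mathbf{0}_{mr} & \mathbf{1}_{mr} & \cdots & \mathbf{0}_{mr} & \oplus_{j=1}^m\mathbf{1}_r \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ \mathbf{1}_{mr} & \mathbf{0}_{mr} & \mathbf{0}_{mr} & \cdots & \mathbf{1}_{mr} & \oplus_{j=1}^m\mathbf{1}_r \end{pmatrix} = \left(\mathbf{1}_{qmr},\; \mathbf{I}_q \otimes \mathbf{1}_{mr},\; \mathbf{1}_q \otimes \mathbf{I}_m \otimes \mathbf{1}_r\right).$$

Furthermore, for the case that $r = 1$, we have $\mathbf{X} = \left(\mathbf{1}_{qm},\; \mathbf{I}_q \otimes \mathbf{1}_m,\; \mathbf{1}_q \otimes \mathbf{I}_m\right)$.

Exercise 2 Give a concise representation of the model matrix $\mathbf{X}$ for the cell-means representation of the two-way model with interaction. Is this $\mathbf{X}$ full rank? Specialize to the case of balanced data.

Solution $\mathbf{X} = \oplus_{i=1}^q \oplus_{j=1}^m \mathbf{1}_{n_{ij}}$, which has full rank $qm$. In the case of balanced data with $n_{ij} = r$ for all $i$ and $j$, $\mathbf{X} = \oplus_{i=1}^q \oplus_{j=1}^m \mathbf{1}_r = \mathbf{I}_{qm} \otimes \mathbf{1}_r$.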
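The Kronecker-product form of the balanced two-way main effects model matrix in Exercise 1 can be compared with a row-by-row construction. The sketch below uses small hypothetical values of q, m, and r and hand-written helpers (`kron`, `hstack`, `rank` are illustrative names, not library calls); it also checks that the rank is q + m − 1, the rank of the main effects model with no empty cells.

```python
from fractions import Fraction as F

def eye(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def ones(n):
    return [[1] for _ in range(n)]          # n x 1 column of ones

def kron(A, B):
    """Kronecker product of two lists-of-rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def hstack(*blocks):
    """Concatenate blocks with equal numbers of rows side by side."""
    return [sum((blk[i] for blk in blocks), []) for i in range(len(blocks[0]))]

def rank(A):
    """Rank by Gaussian elimination in exact rational arithmetic."""
    M = [[F(x) for x in row] for row in A]
    r = 0
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][c] / M[r][c]
            M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
    return r

q, m, r = 3, 2, 2   # hypothetical numbers of levels and replicates

# X = (1_{qmr}, I_q (x) 1_{mr}, 1_q (x) I_m (x) 1_r)
X_kron = hstack(
    ones(q * m * r),
    kron(eye(q), ones(m * r)),
    kron(kron(ones(q), eye(m)), ones(r)),
)

# Same matrix built directly from y_ijk = mu + alpha_i + gamma_j + e_ijk,
# with observations in lexicographic order of (i, j, k).
X_direct = [
    [1] + [int(i == a) for a in range(q)] + [int(j == b) for b in range(m)]
    for i in range(q) for j in range(m) for _ in range(r)
]

same_matrix = X_kron == X_direct
x_rank = rank(X_kron)
```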
© Springer Nature Switzerland AG 2020 D. L. Zimmerman, Linear Model Theory, https://doi.org/10.1007/978-3-030-52074-8_5
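Exercise 3 Give the model matrix $\mathbf{X}$ for a three-way main effects model in which each factor has two levels and each cell of the $2 \times 2 \times 2$ layout has exactly one observation, i.e., for the model

$$y_{ijk} = \mu + \alpha_i + \gamma_j + \delta_k + e_{ijk} \quad (i = 1, 2;\; j = 1, 2;\; k = 1, 2).$$

Solution

$$\mathbf{X} = \begin{pmatrix} 1 & 1 & 0 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{pmatrix} = \left(\mathbf{1}_8,\; \mathbf{I}_2 \otimes \mathbf{1}_4,\; \mathbf{1}_2 \otimes \mathbf{I}_2 \otimes \mathbf{1}_2,\; \mathbf{1}_4 \otimes \mathbf{I}_2\right).$$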
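Exercise 4 Give a concise representation of the model matrix $\mathbf{X}$ for the two-factor nested model, and specialize it to the case of balanced data.

Solution Assuming that the elements of $\mathbf{y}$ are ordered lexicographically,

$$\mathbf{X} = \begin{pmatrix} \mathbf{1}_{n_{1\cdot}} & \mathbf{1}_{n_{1\cdot}} & \mathbf{0}_{n_{1\cdot}} & \cdots & \mathbf{0}_{n_{1\cdot}} & \oplus_{j=1}^{m_1}\mathbf{1}_{n_{1j}} & & & \\ \mathbf{1}_{n_{2\cdot}} & \mathbf{0}_{n_{2\cdot}} & \mathbf{1}_{n_{2\cdot}} & \cdots & \mathbf{0}_{n_{2\cdot}} & & \oplus_{j=1}^{m_2}\mathbf{1}_{n_{2j}} & & \\ \vdots & \vdots & \vdots & \ddots & \vdots & & & \ddots & \\ \mathbf{1}_{n_{q\cdot}} & \mathbf{0}_{n_{q\cdot}} & \mathbf{0}_{n_{q\cdot}} & \cdots & \mathbf{1}_{n_{q\cdot}} & & & & \oplus_{j=1}^{m_q}\mathbf{1}_{n_{qj}} \end{pmatrix},$$

where blank blocks are zero, so that the final block of columns is $\oplus_{i=1}^q \oplus_{j=1}^{m_i}\mathbf{1}_{n_{ij}}$.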
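For balanced data, $m_i = m$ and $n_{ij} = r$ for all $i$ and $j$, so the model matrix may be simplified to $\mathbf{X} = \left(\mathbf{1}_{qmr},\; \mathbf{I}_q \otimes \mathbf{1}_{mr},\; \mathbf{I}_{qm} \otimes \mathbf{1}_r\right)$.

Exercise 5 Give a concise representation of the model matrix $\mathbf{X}$ for the balanced two-factor partially crossed model

$$y_{ijk} = \mu + \alpha_i - \gamma_j + e_{ijk} \quad (i \ne j = 1, \ldots, q;\; k = 1, \ldots, r).$$

Solution $\mathbf{X} = \left(\mathbf{1}_{rq(q-1)},\; \mathbf{I}_q \otimes \mathbf{1}_{r(q-1)},\; -\left(\mathbf{1}_r\mathbf{u}_j^{(q)T}\right)_{j \ne i = 1, \ldots, q}\right)$.

Exercise 6 Give a concise representation of the model matrix $\mathbf{X}$ for a one-factor, factor-specific-slope, analysis-of-covariance model

$$y_{ij} = \mu + \alpha_i + \gamma_i x_{ij} + e_{ij} \quad (i = 1, \ldots, q;\; j = 1, \ldots, n_i).$$

Solution $\mathbf{X} = \left(\mathbf{1}_{n_\cdot},\; \oplus_{i=1}^q\mathbf{1}_{n_i},\; \oplus_{i=1}^q\mathbf{x}_i\right)$, where $n_\cdot = \sum_{i=1}^q n_i$ and $\mathbf{x}_i = (x_{i1}, \ldots, x_{in_i})^T$.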
Exercise 7 Determine the variance–covariance matrix for each of the following two-factor mixed effects models:

(a) $y_{ijk} = \mu + \alpha_i + b_j + d_{ijk}$ $(i = 1, \ldots, q;\; j = 1, \ldots, m;\; k = 1, \ldots, n_{ij})$, where $E(b_j) = 0$ and $E(d_{ijk}) = 0$ for all $i, j, k$, the $b_j$’s and $d_{ijk}$’s are all uncorrelated, $\operatorname{var}(b_j) = \sigma_b^2$ for all $j$, and $\operatorname{var}(d_{ijk}) = \sigma^2$ for all $i, j, k$; with parameter space $\{\mu, \sigma_b^2, \sigma^2 : \mu \in \mathbb{R}, \sigma_b^2 > 0, \sigma^2 > 0\}$.

(b) $y_{ijk} = \mu + a_i + \gamma_j + d_{ijk}$ $(i = 1, \ldots, q;\; j = 1, \ldots, m;\; k = 1, \ldots, n_{ij})$, where $E(a_i) = 0$ and $E(d_{ijk}) = 0$ for all $i, j, k$, the $a_i$’s and $d_{ijk}$’s are all uncorrelated, $\operatorname{var}(a_i) = \sigma_a^2$ for all $i$, and $\operatorname{var}(d_{ijk}) = \sigma^2$ for all $i, j, k$; with parameter space $\{\mu, \sigma_a^2, \sigma^2 : \mu \in \mathbb{R}, \sigma_a^2 > 0, \sigma^2 > 0\}$.

(c) $y_{ijk} = \mu + \alpha_i + b_j + c_{ij} + d_{ijk}$ $(i = 1, \ldots, q;\; j = 1, \ldots, m;\; k = 1, \ldots, n_{ij})$, where $E(b_j) = E(c_{ij}) = 0$ and $E(d_{ijk}) = 0$ for all $i, j, k$, the $b_j$’s, $c_{ij}$’s, and $d_{ijk}$’s are all uncorrelated, $\operatorname{var}(b_j) = \sigma_b^2$ for all $j$, $\operatorname{var}(c_{ij}) = \sigma_c^2$ for all $i$ and $j$, and $\operatorname{var}(d_{ijk}) = \sigma^2$ for all $i, j, k$; with parameter space $\{\mu, \sigma_b^2, \sigma_c^2, \sigma^2 : \mu \in \mathbb{R}, \sigma_b^2 > 0, \sigma_c^2 > 0, \sigma^2 > 0\}$.

(d) $y_{ijk} = \mu + a_i + \gamma_j + c_{ij} + d_{ijk}$ $(i = 1, \ldots, q;\; j = 1, \ldots, m;\; k = 1, \ldots, n_{ij})$, where $E(a_i) = E(c_{ij}) = 0$ and $E(d_{ijk}) = 0$ for all $i, j, k$, the $a_i$’s, $c_{ij}$’s, and $d_{ijk}$’s are all uncorrelated, $\operatorname{var}(a_i) = \sigma_a^2$ for all $i$, $\operatorname{var}(c_{ij}) = \sigma_c^2$ for all $i$ and $j$, and $\operatorname{var}(d_{ijk}) = \sigma^2$ for all $i, j, k$; with parameter space $\{\mu, \sigma_a^2, \sigma_c^2, \sigma^2 : \mu \in \mathbb{R}, \sigma_a^2 > 0, \sigma_c^2 > 0, \sigma^2 > 0\}$.

(e) $y_{ijk} = \mu + a_i + b_j + c_{ij} + d_{ijk}$ $(i = 1, \ldots, q;\; j = 1, \ldots, m;\; k = 1, \ldots, n_{ij})$, where $E(a_i) = E(b_j) = E(c_{ij}) = 0$ and $E(d_{ijk}) = 0$ for all $i, j, k$, the $a_i$’s, $b_j$’s, $c_{ij}$’s, and $d_{ijk}$’s are all uncorrelated, $\operatorname{var}(a_i) = \sigma_a^2$ for all $i$, $\operatorname{var}(b_j) = \sigma_b^2$ for all $j$, $\operatorname{var}(c_{ij}) = \sigma_c^2$ for all $i$ and $j$, and $\operatorname{var}(d_{ijk}) = \sigma^2$ for all $i, j, k$; with parameter space $\{\mu, \sigma_a^2, \sigma_b^2, \sigma_c^2, \sigma^2 : \mu \in \mathbb{R}, \sigma_a^2 > 0, \sigma_b^2 > 0, \sigma_c^2 > 0, \sigma^2 > 0\}$.
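Solution (a) This model may be written as $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{b} + \mathbf{d}$, where

$$\mathbf{Z} = \begin{pmatrix} \oplus_{j=1}^m\mathbf{1}_{n_{1j}} \\ \oplus_{j=1}^m\mathbf{1}_{n_{2j}} \\ \vdots \\ \oplus_{j=1}^m\mathbf{1}_{n_{qj}} \end{pmatrix}, \quad \mathbf{b} = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix}, \quad \mathbf{d} = \begin{pmatrix} d_{111} \\ d_{112} \\ \vdots \\ d_{qmn_{qm}} \end{pmatrix}.$$

Then

$$\operatorname{var}(\mathbf{y}) = \mathbf{Z}\operatorname{var}(\mathbf{b})\mathbf{Z}^T + \operatorname{var}(\mathbf{d}) = \sigma_b^2\begin{pmatrix} \oplus_{j=1}^m\mathbf{J}_{n_{1j} \times n_{1j}} & \oplus_{j=1}^m\mathbf{J}_{n_{1j} \times n_{2j}} & \cdots & \oplus_{j=1}^m\mathbf{J}_{n_{1j} \times n_{qj}} \\ \oplus_{j=1}^m\mathbf{J}_{n_{2j} \times n_{1j}} & \oplus_{j=1}^m\mathbf{J}_{n_{2j} \times n_{2j}} & \cdots & \oplus_{j=1}^m\mathbf{J}_{n_{2j} \times n_{qj}} \\ \vdots & \vdots & \ddots & \vdots \\ \oplus_{j=1}^m\mathbf{J}_{n_{qj} \times n_{1j}} & \oplus_{j=1}^m\mathbf{J}_{n_{qj} \times n_{2j}} & \cdots & \oplus_{j=1}^m\mathbf{J}_{n_{qj} \times n_{qj}} \end{pmatrix} + \sigma^2\mathbf{I}_n.$$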
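(b) This model may be written as $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{b} + \mathbf{d}$, where

$$\mathbf{Z} = \oplus_{i=1}^q\mathbf{1}_{n_{i\cdot}}, \quad \mathbf{b} = (a_1, \ldots, a_q)^T, \quad \mathbf{d} = (d_{111}, \ldots, d_{qmn_{qm}})^T.$$

Then

$$\operatorname{var}(\mathbf{y}) = \mathbf{Z}\operatorname{var}(\mathbf{b})\mathbf{Z}^T + \operatorname{var}(\mathbf{d}) = \sigma_a^2 \oplus_{i=1}^q\mathbf{J}_{n_{i\cdot}} + \sigma^2\mathbf{I}_n.$$

(c) This model may be written as $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}_1\mathbf{b}_1 + \mathbf{Z}_2\mathbf{b}_2 + \mathbf{d}$, where

$$\mathbf{Z}_1 = \begin{pmatrix} \oplus_{j=1}^m\mathbf{1}_{n_{1j}} \\ \oplus_{j=1}^m\mathbf{1}_{n_{2j}} \\ \vdots \\ \oplus_{j=1}^m\mathbf{1}_{n_{qj}} \end{pmatrix}, \quad \mathbf{b}_1 = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix}, \quad \mathbf{Z}_2 = \oplus_{i=1}^q \oplus_{j=1}^m\mathbf{1}_{n_{ij}}, \quad \mathbf{b}_2 = \begin{pmatrix} c_{11} \\ \vdots \\ c_{qm} \end{pmatrix}, \quad \mathbf{d} = \begin{pmatrix} d_{111} \\ \vdots \\ d_{qmn_{qm}} \end{pmatrix}.$$

Then

$$\begin{aligned} \operatorname{var}(\mathbf{y}) &= \mathbf{Z}_1\operatorname{var}(\mathbf{b}_1)\mathbf{Z}_1^T + \mathbf{Z}_2\operatorname{var}(\mathbf{b}_2)\mathbf{Z}_2^T + \operatorname{var}(\mathbf{d}) \\ &= \sigma_b^2\begin{pmatrix} \oplus_{j=1}^m\mathbf{J}_{n_{1j} \times n_{1j}} & \oplus_{j=1}^m\mathbf{J}_{n_{1j} \times n_{2j}} & \cdots & \oplus_{j=1}^m\mathbf{J}_{n_{1j} \times n_{qj}} \\ \oplus_{j=1}^m\mathbf{J}_{n_{2j} \times n_{1j}} & \oplus_{j=1}^m\mathbf{J}_{n_{2j} \times n_{2j}} & \cdots & \oplus_{j=1}^m\mathbf{J}_{n_{2j} \times n_{qj}} \\ \vdots & \vdots & \ddots & \vdots \\ \oplus_{j=1}^m\mathbf{J}_{n_{qj} \times n_{1j}} & \oplus_{j=1}^m\mathbf{J}_{n_{qj} \times n_{2j}} & \cdots & \oplus_{j=1}^m\mathbf{J}_{n_{qj} \times n_{qj}} \end{pmatrix} + \sigma_c^2 \oplus_{i=1}^q \oplus_{j=1}^m\mathbf{J}_{n_{ij}} + \sigma^2\mathbf{I}_n. \end{aligned}$$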
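(d) This model may be written as $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}_1\mathbf{b}_1 + \mathbf{Z}_2\mathbf{b}_2 + \mathbf{d}$, where

$$\mathbf{Z}_1 = \oplus_{i=1}^q\mathbf{1}_{n_{i\cdot}}, \quad \mathbf{b}_1 = \begin{pmatrix} a_1 \\ \vdots \\ a_q \end{pmatrix}, \quad \mathbf{Z}_2 = \oplus_{i=1}^q \oplus_{j=1}^m\mathbf{1}_{n_{ij}}, \quad \mathbf{b}_2 = \begin{pmatrix} c_{11} \\ \vdots \\ c_{qm} \end{pmatrix}, \quad \mathbf{d} = \begin{pmatrix} d_{111} \\ \vdots \\ d_{qmn_{qm}} \end{pmatrix}.$$

Then

$$\operatorname{var}(\mathbf{y}) = \mathbf{Z}_1\operatorname{var}(\mathbf{b}_1)\mathbf{Z}_1^T + \mathbf{Z}_2\operatorname{var}(\mathbf{b}_2)\mathbf{Z}_2^T + \operatorname{var}(\mathbf{d}) = \sigma_a^2 \oplus_{i=1}^q\mathbf{J}_{n_{i\cdot}} + \sigma_c^2 \oplus_{i=1}^q \oplus_{j=1}^m\mathbf{J}_{n_{ij}} + \sigma^2\mathbf{I}_n.$$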
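(e) This model may be written as $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}_1\mathbf{b}_1 + \mathbf{Z}_2\mathbf{b}_2 + \mathbf{Z}_3\mathbf{b}_3 + \mathbf{d}$, where

$$\mathbf{Z}_1 = \oplus_{i=1}^q\mathbf{1}_{n_{i\cdot}}, \quad \mathbf{b}_1 = \begin{pmatrix} a_1 \\ \vdots \\ a_q \end{pmatrix}, \quad \mathbf{Z}_2 = \begin{pmatrix} \oplus_{j=1}^m\mathbf{1}_{n_{1j}} \\ \oplus_{j=1}^m\mathbf{1}_{n_{2j}} \\ \vdots \\ \oplus_{j=1}^m\mathbf{1}_{n_{qj}} \end{pmatrix}, \quad \mathbf{b}_2 = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix},$$

$$\mathbf{Z}_3 = \oplus_{i=1}^q \oplus_{j=1}^m\mathbf{1}_{n_{ij}}, \quad \mathbf{b}_3 = (c_{11}, \ldots, c_{qm})^T, \quad \mathbf{d} = (d_{111}, \ldots, d_{qmn_{qm}})^T.$$

Then

$$\begin{aligned} \operatorname{var}(\mathbf{y}) &= \mathbf{Z}_1\operatorname{var}(\mathbf{b}_1)\mathbf{Z}_1^T + \mathbf{Z}_2\operatorname{var}(\mathbf{b}_2)\mathbf{Z}_2^T + \mathbf{Z}_3\operatorname{var}(\mathbf{b}_3)\mathbf{Z}_3^T + \operatorname{var}(\mathbf{d}) \\ &= \sigma_a^2 \oplus_{i=1}^q\mathbf{J}_{n_{i\cdot}} + \sigma_b^2\begin{pmatrix} \oplus_{j=1}^m\mathbf{J}_{n_{1j} \times n_{1j}} & \oplus_{j=1}^m\mathbf{J}_{n_{1j} \times n_{2j}} & \cdots & \oplus_{j=1}^m\mathbf{J}_{n_{1j} \times n_{qj}} \\ \oplus_{j=1}^m\mathbf{J}_{n_{2j} \times n_{1j}} & \oplus_{j=1}^m\mathbf{J}_{n_{2j} \times n_{2j}} & \cdots & \oplus_{j=1}^m\mathbf{J}_{n_{2j} \times n_{qj}} \\ \vdots & \vdots & \ddots & \vdots \\ \oplus_{j=1}^m\mathbf{J}_{n_{qj} \times n_{1j}} & \oplus_{j=1}^m\mathbf{J}_{n_{qj} \times n_{2j}} & \cdots & \oplus_{j=1}^m\mathbf{J}_{n_{qj} \times n_{qj}} \end{pmatrix} + \sigma_c^2 \oplus_{i=1}^q \oplus_{j=1}^m\mathbf{J}_{n_{ij}} + \sigma^2\mathbf{I}_n. \end{aligned}$$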
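For balanced data the structure of var(y) in part (e) is easy to inspect numerically. The sketch below is an illustration under assumptions: it takes n_ij = r for all cells, so that Z1 = I_q ⊗ 1_mr, Z2 = 1_q ⊗ I_m ⊗ 1_r, and Z3 = I_qm ⊗ 1_r, and it uses made-up integer variance components. It then checks that the covariance between two observations adds σa² when they share a level of the first factor, σb² when they share a level of the second, σc² when they share a cell, and σ² when they are the same observation.

```python
from itertools import product

def eye(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def ones(n):
    return [[1] for _ in range(n)]          # n x 1 column of ones

def kron(A, B):
    """Kronecker product of two lists-of-rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def gram(Z):
    """Z Z^T for a list-of-rows matrix Z."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in Z] for ri in Z]

q, m, r = 2, 3, 2                 # hypothetical balanced layout
s2a, s2b, s2c, s2 = 5, 3, 2, 1    # hypothetical variance components

Z1 = kron(eye(q), ones(m * r))                  # random effects a_i
Z2 = kron(kron(ones(q), eye(m)), ones(r))       # random effects b_j
Z3 = kron(eye(q * m), ones(r))                  # random effects c_ij

n = q * m * r
G1, G2, G3 = gram(Z1), gram(Z2), gram(Z3)
V = [[s2a * G1[s][t] + s2b * G2[s][t] + s2c * G3[s][t] + s2 * int(s == t)
      for t in range(n)] for s in range(n)]

# Observations are in lexicographic order of (i, j, k).
labels = list(product(range(q), range(m), range(r)))
structure_ok = all(
    V[s][t] == (s2a * (i == i2) + s2b * (j == j2)
                + s2c * (i == i2 and j == j2)
                + s2 * ((i, j, k) == (i2, j2, k2)))
    for s, (i, j, k) in enumerate(labels)
    for t, (i2, j2, k2) in enumerate(labels)
)
```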
Exercise 8 Consider the stationary autoregressive model of order one, given by

$$y_i = \mu + \rho(y_{i-1} - \mu) + u_i \quad (i = 1, \ldots, n),$$

where $u_1, \ldots, u_n$ are uncorrelated random variables with common mean 0, $\operatorname{var}(u_1) = \sigma^2/(1 - \rho^2)$, $\operatorname{var}(u_i) = \sigma^2$ for $i = 2, \ldots, n$, and $y_0 \equiv \mu$, and the parameter space is $\{\mu, \rho, \sigma^2 : \mu \in \mathbb{R}, -1 < \rho < 1, \sigma^2 > 0\}$. Verify that this model has variance–covariance matrix given by

$$\frac{\sigma^2}{1 - \rho^2}\begin{pmatrix} 1 & \rho & \rho^2 & \rho^3 & \cdots & \rho^{n-1} \\ & 1 & \rho & \rho^2 & \cdots & \rho^{n-2} \\ & & 1 & \rho & \cdots & \rho^{n-3} \\ & & & 1 & \cdots & \rho^{n-4} \\ & & & & \ddots & \vdots \\ & & & & & 1 \end{pmatrix}.$$
Solution For a fixed i ∈ {1, . . . , n} let j ∈ {0, 1, . . . , n − 1}. Subtracting μ from both sides of the model equation and multiplying both sides of the resulting equation by yi−j − μ yields (yi−j − μ)(yi − μ) = ρ(yi−j − μ)(yi−1 − μ) + (yi−j − μ)ui . Taking expectations of both sides yields the recursive equation cov(yi−j , yi ) = ρcov(yi−j , yi−1 ) + σ 2 I{j =0} . Then writing σij for the (i, j )th element of the variance–covariance matrix, we obtain σ11 = σ 2 /(1 − ρ 2 ), σ12 = ρσ11 = ρσ 2 /(1 − ρ 2 ), σ22 = ρσ12 + σ 2 = ρ 2 σ 2 /(1 − ρ 2 ) + σ 2 = σ 2 /(1 − ρ 2 ), σ13 = ρσ12 = ρ 2 σ 2 /(1 − ρ 2 ), σ23 = ρσ22 = ρσ 2 /(1 − ρ 2 ), σ33 = ρσ23 + σ 2 = ρ 2 σ 2 /(1 − ρ 2 ) + σ 2 = σ 2 /(1 − ρ 2 ). The method of induction may be used to verify that the remaining elements of the variance–covariance matrix coincide with those given by (5.13).
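The recursion in this solution can be run for a small n and compared with the closed form σ²ρ^|i−j|/(1 − ρ²). The sketch below uses hypothetical values of n, ρ, and σ², indexes observations from 0, and takes the innovation variance for the first observation to be σ²/(1 − ρ²) as specified in the exercise.

```python
from fractions import Fraction as F

n, rho, sigma2 = 5, F(1, 2), F(3)   # hypothetical values

S = [[F(0)] * n for _ in range(n)]  # S[k][i] = cov(y_{k+1}, y_{i+1}), 0-based indices
for i in range(n):
    # Off-diagonal entries: cov(y_k, y_i) = rho * cov(y_k, y_{i-1}) for k < i.
    for k in range(i):
        S[k][i] = S[i][k] = rho * S[k][i - 1]
    # Diagonal entry: var(y_i) = rho * cov(y_i, y_{i-1}) + var(u_i).
    innovation_var = sigma2 / (1 - rho**2) if i == 0 else sigma2
    S[i][i] = (rho * S[i - 1][i] if i > 0 else 0) + innovation_var

closed_form = [[sigma2 / (1 - rho**2) * rho ** abs(i - j) for j in range(n)]
               for i in range(n)]
recursion_matches = S == closed_form
```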
Exercise 9 Determine the variance–covariance matrix for $\mathbf{y} = (y_1, y_2, \ldots, y_n)^T$ where

$$y_i = \mu + \sum_{j=1}^i u_j \quad (i = 1, \ldots, n),$$

and the $u_j$’s are uncorrelated random variables with mean 0 and variance $\sigma^2 > 0$.

Solution By Theorems 4.2.2 and 4.2.3,

$$\operatorname{cov}(y_i, y_k) = \operatorname{cov}\left(\mu + \sum_{j=1}^i u_j,\; \mu + \sum_{l=1}^k u_l\right) = \sum_{j=1}^i\sum_{l=1}^k\operatorname{cov}(u_j, u_l) = \sum_{j=1}^{\min(i,k)}\operatorname{var}(u_j) = \sigma^2\min(i, k).$$

Thus, the variance–covariance matrix is

$$\sigma^2\begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 2 & \cdots & 2 \\ 1 & 2 & 3 & \cdots & 3 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 2 & 3 & \cdots & n \end{pmatrix}.$$

Exercise 10 Determine the variance–covariance matrix for $\mathbf{y} = (y_1, y_2, y_3)^T$ where

$$y_i = \mu_i + \phi_{i-1}(y_{i-1} - \mu_{i-1}) + u_i \quad (i = 1, 2, 3),$$

$y_0 \equiv \mu_0$, and $u_1, u_2, u_3$ are uncorrelated random variables with common mean 0 and $\operatorname{var}(u_i) = \sigma_i^2$ for $i = 1, 2, 3$; the parameter space is $\{\mu_0, \mu_1, \mu_2, \mu_3, \phi_0, \phi_1, \phi_2, \sigma_1^2, \sigma_2^2, \sigma_3^2 : \mu_i \in \mathbb{R}$ for all $i$, $\phi_i \in \mathbb{R}$ for all $i$, and $\sigma_i^2 > 0$ for all $i\}$. This model is an extension of the first-order stationary autoregressive model (for three observations) called the first-order antedependence model.

Solution For a fixed $i \in \{1, 2, 3\}$ let $j \in \{0, 1, 2\}$. Subtracting $\mu_i$ from both sides of the model equation and multiplying both sides of the resulting equation by $y_{i-j} - \mu_{i-j}$ yields

$$(y_{i-j} - \mu_{i-j})(y_i - \mu_i) = \phi_{i-1}(y_{i-j} - \mu_{i-j})(y_{i-1} - \mu_{i-1}) + (y_{i-j} - \mu_{i-j})u_i.$$

Taking expectations of both sides yields the recursive equation

$$\operatorname{cov}(y_{i-j}, y_i) = \phi_{i-1}\operatorname{cov}(y_{i-j}, y_{i-1}) + \sigma_i^2 I_{\{j=0\}}.$$
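Thus, writing $\sigma_{ij}$ for the $(i, j)$th element of the variance matrix, we obtain

$$\begin{aligned} \sigma_{11} &= \sigma_1^2, \\ \sigma_{12} &= \phi_1\sigma_{11} = \phi_1\sigma_1^2, \\ \sigma_{22} &= \phi_1\sigma_{12} + \sigma_2^2 = \phi_1^2\sigma_1^2 + \sigma_2^2, \\ \sigma_{13} &= \phi_2\sigma_{12} = \phi_2\phi_1\sigma_1^2, \\ \sigma_{23} &= \phi_2\sigma_{22} = \phi_2\phi_1^2\sigma_1^2 + \phi_2\sigma_2^2, \\ \sigma_{33} &= \phi_2\sigma_{23} + \sigma_3^2 = \phi_2^2\phi_1^2\sigma_1^2 + \phi_2^2\sigma_2^2 + \sigma_3^2. \end{aligned}$$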
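An alternative route to the same six elements, not the one used in the solution, is to write the centered observations as a lower triangular transformation of the innovations: y₁ − μ₁ = u₁, y₂ − μ₂ = φ₁u₁ + u₂, and y₃ − μ₃ = φ₂φ₁u₁ + φ₂u₂ + u₃, so that var(y) = L D Lᵀ with D = diag(σ₁², σ₂², σ₃²). The sketch below assumes arbitrary numerical values for φ₁, φ₂, and the σᵢ² and compares L D Lᵀ with the closed-form elements.

```python
from fractions import Fraction as F

phi1, phi2 = F(1, 2), F(-2)        # hypothetical autoregressive coefficients
v1, v2, v3 = F(3), F(1), F(4)      # hypothetical innovation variances sigma_i^2

# y - mu = L u, obtained by unrolling the model equation with y_0 = mu_0.
L = [[1, 0, 0],
     [phi1, 1, 0],
     [phi1 * phi2, phi2, 1]]
D = [v1, v2, v3]

# var(y) = L D L^T, written out entrywise.
V = [[sum(L[i][k] * D[k] * L[j][k] for k in range(3)) for j in range(3)]
     for i in range(3)]

# Elements sigma_ij as given in the solution.
s11 = v1
s12 = phi1 * v1
s22 = phi1**2 * v1 + v2
s13 = phi2 * phi1 * v1
s23 = phi2 * phi1**2 * v1 + phi2 * v2
s33 = phi2**2 * phi1**2 * v1 + phi2**2 * v2 + v3
closed_form = [[s11, s12, s13],
               [s12, s22, s23],
               [s13, s23, s33]]

forms_agree = V == closed_form
```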
6
Estimability
This chapter presents exercises on estimability of linear functions of β in linear models, and provides solutions to those exercises. Exercise 1 A linear estimator t (y) = t0 + tT y of cT β associated with the model {y, Xβ} is said to be location equivariant if t (y + Xd) = t (y) + cT d for all d. (a) Prove that a linear estimator t0 + tT y of cT β is location equivariant if and only if tT X = cT . (b) Is a location equivariant estimator of cT β necessarily unbiased, or vice versa? Explain. Solution (a) t (y+Xd) = t0 +tT (y+Xd) = t0 +tT y+tT Xd, and t (y)+cT d = t0 +tT y+cT d. These two quantities are equal for all d if and only if tT Xd = cT d for all d, i.e., if and only if tT X = cT (by Theorem 2.1.1). (b) Comparing part (a) with Theorem 6.1.1, we see that a location equivariant estimator is not necessarily unbiased, but an unbiased estimator is necessarily location equivariant. Exercise 2 Prove Corollary 6.1.2.1: If cT1 β, cT2 β, . . . , cTk β are estimable under the model {y, Xβ}, then so is any linear combination of them; that is, the set of estimable functions under a given unconstrained model is a linear space. Solution Because cT1 β,. . . ,cTk β are estimable, cTi ∈ R(X) for all i. Because R(X) is a linear space, any linear combination of cT1 β,. . . ,cTk β is an element of R(X), hence estimable.
© Springer Nature Switzerland AG 2020 D. L. Zimmerman, Linear Model Theory, https://doi.org/10.1007/978-3-030-52074-8_6
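Exercise 3 Prove Corollary 6.1.2.2: The elements of $\mathbf{X}\boldsymbol{\beta}$ are estimable under the model $\{\mathbf{y}, \mathbf{X}\boldsymbol{\beta}\}$; in fact, those elements span the space of estimable functions for that model.

Solution The $i$th element of $\mathbf{X}\boldsymbol{\beta}$ can be expressed as $\mathbf{x}_i^T\boldsymbol{\beta}$, where $\mathbf{x}_i^T$ is the $i$th row of $\mathbf{X}$. Clearly, $\mathbf{x}_i^T \in \mathcal{R}(\mathbf{X})$, so the $i$th element of $\mathbf{X}\boldsymbol{\beta}$ is estimable. Because $\operatorname{span}(\mathbf{x}_1^T, \ldots, \mathbf{x}_n^T) = \mathcal{R}(\mathbf{X})$, the elements of $\mathbf{X}\boldsymbol{\beta}$ span the space of estimable functions.

Exercise 4 Prove Corollary 6.1.2.3: A set of $p^*$ [$= \operatorname{rank}(\mathbf{X})$] linearly independent estimable functions under the model $\{\mathbf{y}, \mathbf{X}\boldsymbol{\beta}\}$ exists such that any estimable function under that model can be written as a linear combination of functions in this set.

Solution Let $\mathbf{r}_1^T, \ldots, \mathbf{r}_{p^*}^T$ represent any basis for $\mathcal{R}(\mathbf{X})$. Then $\mathbf{r}_1^T\boldsymbol{\beta}, \ldots, \mathbf{r}_{p^*}^T\boldsymbol{\beta}$ satisfy the requirements of the definition of a basis for the set of all estimable functions.

Exercise 5 Prove Corollary 6.1.2.4: $\mathbf{c}^T\boldsymbol{\beta}$ is estimable for every vector $\mathbf{c}$ under the model $\{\mathbf{y}, \mathbf{X}\boldsymbol{\beta}\}$ if and only if $p^* = p$.

Solution $\mathbf{c}^T\boldsymbol{\beta}$ is estimable for all $\mathbf{c}^T \in \mathbb{R}^p$ if and only if $\mathcal{R}(\mathbf{X}) = \mathbb{R}^p$, i.e., if and only if $\mathcal{R}(\mathbf{X}) = \mathcal{R}(\mathbf{I}_p)$, i.e., if and only if $\operatorname{rank}(\mathbf{X}) = \operatorname{rank}(\mathbf{I}_p)$, i.e., if and only if $p^* = p$.

Exercise 6 Prove Corollary 6.1.2.5: $\mathbf{c}^T\boldsymbol{\beta}$ is estimable under the model $\{\mathbf{y}, \mathbf{X}\boldsymbol{\beta}\}$ if and only if $\mathbf{c} \perp \mathcal{N}(\mathbf{X})$.

Solution By Theorem 2.6.2, $\mathcal{R}(\mathbf{X})$ is the orthogonal complement of $\mathcal{N}(\mathbf{X})$. Thus $\mathbf{c}^T \in \mathcal{R}(\mathbf{X})$ if and only if $\mathbf{c} \perp \mathcal{N}(\mathbf{X})$.

Exercise 7 Prove Corollary 6.1.2.6: $\mathbf{c}^T\boldsymbol{\beta}$ is estimable under the model $\{\mathbf{y}, \mathbf{X}\boldsymbol{\beta}\}$ if and only if $\mathbf{c}^T \in \mathcal{R}(\mathbf{X}^T\mathbf{X})$.

Solution The result follows immediately from Theorems 6.1.2 and 6.1.3.

Exercise 8 For each of the following three questions, answer “yes” or “no.” If the answer is “yes,” give an example; if the answer is “no,” prove it. (a) Can the sum of an estimable function and a nonestimable function be estimable? (b) Can the sum of two nonestimable functions be estimable? (c) Can the sum of two linearly independent nonestimable functions be nonestimable?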
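Solution (a) No. Suppose that $\mathbf{c}_1^T\boldsymbol{\beta}$ is estimable and $\mathbf{c}_2^T\boldsymbol{\beta}$ is nonestimable. Suppose further that their sum, $\mathbf{c}_1^T\boldsymbol{\beta} + \mathbf{c}_2^T\boldsymbol{\beta}$, is estimable. Now, $\mathbf{c}_2^T\boldsymbol{\beta} = (\mathbf{c}_1^T\boldsymbol{\beta} + \mathbf{c}_2^T\boldsymbol{\beta}) - \mathbf{c}_1^T\boldsymbol{\beta}$, so by Corollary 6.1.2.1, $\mathbf{c}_2^T\boldsymbol{\beta}$ is estimable. But this contradicts the initial assumption that $\mathbf{c}_2^T\boldsymbol{\beta}$ is nonestimable. Thus $\mathbf{c}_1^T\boldsymbol{\beta} + \mathbf{c}_2^T\boldsymbol{\beta}$ cannot be estimable. (b) Yes. An example occurs in the one-factor model, under which both $\mu$ and $\alpha_1$ are nonestimable but their sum, $\mu + \alpha_1$, is estimable. (c) Yes. An example occurs in the two-way main effects model, under which both $\alpha_1$ and $\gamma_1$ are nonestimable and so is their sum.

Exercise 9 Determine $\mathcal{N}(\mathbf{X})$ for the one-factor model and for the two-way main effects model with no empty cells, and then use Corollary 6.1.2.5 to obtain the same characterizations of estimable functions given in Examples 6.1-2 and 6.1-3 for these two models.

Solution For the one-factor model, the matrix of distinct rows of $\mathbf{X}$ is $(\mathbf{1}_q, \mathbf{I}_q)$, which implies that

$$\begin{aligned} \mathcal{N}(\mathbf{X}) &= \{\mathbf{v} : (\mathbf{1}_q, \mathbf{I}_q)\mathbf{v} = \mathbf{0}\} \\ &= \{\mathbf{v} = (v_1, \mathbf{v}_2^T)^T : v_1\mathbf{1}_q + \mathbf{v}_2 = \mathbf{0}\} \\ &= \{\mathbf{v} = (v_1, \mathbf{v}_2^T)^T : \mathbf{v}_2 = -v_1\mathbf{1}_q\} \\ &= \left\{\mathbf{v} : \mathbf{v} = a\begin{pmatrix} -1 \\ \mathbf{1}_q \end{pmatrix},\; a \in \mathbb{R}\right\}. \end{aligned}$$

By Corollary 6.1.2.5, $\mathbf{c}^T\boldsymbol{\beta}$ [where $\mathbf{c} = (c_i)$] is estimable if and only if $\mathbf{c} \perp \mathcal{N}(\mathbf{X})$, i.e., if and only if $\mathbf{c}^T a\begin{pmatrix} -1 \\ \mathbf{1}_q \end{pmatrix} = 0$ for all $a \in \mathbb{R}$, i.e., if and only if $c_1 = \sum_{i=2}^{q+1} c_i$.

For the two-way main effects model with no empty cells, the matrix of distinct rows of $\mathbf{X}$ is $(\mathbf{1}_{qm}, \mathbf{I}_q \otimes \mathbf{1}_m, \mathbf{1}_q \otimes \mathbf{I}_m)$, which implies that $\mathcal{N}(\mathbf{X}) = \{\mathbf{v} : (\mathbf{1}_{qm}, \mathbf{I}_q \otimes \mathbf{1}_m, \mathbf{1}_q \otimes \mathbf{I}_m)\mathbf{v} = \mathbf{0}\}$. Let $\mathbf{v} = (v_1, \mathbf{v}_2^T, \mathbf{v}_3^T)^T$ with $v_1 \in \mathbb{R}$, $\mathbf{v}_2 \in \mathbb{R}^q$, and $\mathbf{v}_3 \in \mathbb{R}^m$. Also denote the elements of $\mathbf{v}_2$ by $(v_{21}, v_{22}, \ldots, v_{2q})^T$. Then $(\mathbf{1}_{qm}, \mathbf{I}_q \otimes \mathbf{1}_m, \mathbf{1}_q \otimes \mathbf{I}_m)\mathbf{v} = \mathbf{0}$ if and only if

$$v_1\mathbf{1}_m + v_{2i}\mathbf{1}_m + \mathbf{v}_3 = \mathbf{0} \quad \text{for all } i = 1, \ldots, q,$$

i.e., if and only if $v_{2i}\mathbf{1}_m = -v_1\mathbf{1}_m - \mathbf{v}_3$ for all $i = 1, \ldots, q$, i.e., if and only if $v_{21} = v_{22} = \cdots = v_{2q} \equiv b$, say, and $\mathbf{v}_3 = -(v_1 + b)\mathbf{1}_m$. Therefore, $\mathcal{N}(\mathbf{X}) = \{\mathbf{v} : \mathbf{v} = (a, b\mathbf{1}_q^T, -(a + b)\mathbf{1}_m^T)^T,\; a, b \in \mathbb{R}\}$.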
By Corollary 6.1.2.5, $\mathbf{c}^T\boldsymbol{\beta}$ [where $\mathbf{c} = (c_1, c_2, \ldots, c_{q+1}, c_{q+2}, \ldots, c_{q+m+1})^T$] is estimable if and only if $\mathbf{c} \perp \mathcal{N}(\mathbf{X})$, i.e., if and only if

$$\mathbf{c}^T\begin{pmatrix} a \\ b\mathbf{1}_q \\ -(a + b)\mathbf{1}_m \end{pmatrix} = 0 \quad \text{for all } a, b \in \mathbb{R},$$

i.e., if and only if

$$ac_1 + b\sum_{i=2}^{q+1} c_i - (a + b)\sum_{i=q+2}^{q+m+1} c_i = 0 \quad \text{for all } a, b \in \mathbb{R}. \tag{6.1}$$

Putting $a = 1$ and $b = -1$ yields

$$c_1 = \sum_{i=2}^{q+1} c_i,$$

and putting $a = 1$ and $b = 0$ yields

$$c_1 = \sum_{i=q+2}^{q+m+1} c_i.$$

And, it may be easily verified that if both of these last two equations hold, then so does Eq. (6.1). Therefore, $\mathbf{c}^T\boldsymbol{\beta}$ is estimable if and only if $c_1 = \sum_{i=2}^{q+1} c_i = \sum_{i=q+2}^{q+m+1} c_i$.

Exercise 10 Prove that a two-way layout that is row-connected is also column-connected, and vice versa.

Solution Suppose that the two-way layout is row-connected, and consider $\gamma_j - \gamma_{j'}$ where $j \ne j'$. Because $\gamma_j$ appears in the model for at least one observation and so does $\gamma_{j'}$, there exist $i$ and $i'$ such that the cell means $\mu + \alpha_i + \gamma_j$ and $\mu + \alpha_{i'} + \gamma_{j'}$ are estimable. Because the layout is row-connected, $\alpha_i - \alpha_{i'}$ is estimable. Consequently,

$$\gamma_j - \gamma_{j'} = (\mu + \alpha_i + \gamma_j) - (\mu + \alpha_{i'} + \gamma_{j'}) - (\alpha_i - \alpha_{i'}),$$

i.e., $\gamma_j - \gamma_{j'}$ is a linear combination of estimable functions, implying (by Corollary 6.1.2.1) that $\gamma_j - \gamma_{j'}$ is estimable. Because $j$ and $j'$ were arbitrary, the layout is column-connected. That column-connectedness implies row-connectedness may be shown similarly.
Exercise 11 Prove Theorem 6.1.4: Under the two-way main effects model for a $q \times m$ layout, each of the following four conditions implies the other three:

(a) the layout is connected;
(b) $\operatorname{rank}(\mathbf{X}) = q + m - 1$;
(c) $\mu + \alpha_i + \gamma_j$ is estimable for all $i$ and $j$;
(d) all $\alpha$-contrasts and $\gamma$-contrasts are estimable.

Solution (a) ⇒ (b) Because the layout is connected, by definition $\alpha_i - \alpha_{i'}$ is estimable for all $i$ and $i'$ and $\gamma_j - \gamma_{j'}$ is estimable for all $j$ and $j'$ under the main effects model. For $j = 1$, there exists $i \in \{1, 2, \ldots, q\}$ such that $\mu + \alpha_i + \gamma_1$ is estimable (otherwise all cells in the first column of the $q \times m$ layout would be empty, rendering it a $q \times (m - 1)$ design). Without loss of generality, assume that $\mu + \alpha_1 + \gamma_1$ is estimable. Then $\{\mu + \alpha_1 + \gamma_1, \alpha_2 - \alpha_1, \ldots, \alpha_q - \alpha_1, \gamma_2 - \gamma_1, \ldots, \gamma_m - \gamma_1\}$ is a set of $q + m - 1$ linearly independent estimable functions. Therefore $\operatorname{rank}(\mathbf{X}) \ge q + m - 1$. On the other hand, $\operatorname{rank}(\mathbf{X}) \le q + m - 1$ because the main effects model with no empty cells (with model matrix $\mathbf{X}^*$, say) has rank $q + m - 1$ and $\mathcal{R}(\mathbf{X}) \subseteq \mathcal{R}(\mathbf{X}^*)$ because $\mathbf{X}$ can be obtained by deleting rows corresponding to empty cells (if any) in $\mathbf{X}^*$. Thus, $\operatorname{rank}(\mathbf{X}) = q + m - 1$.

(b) ⇒ (c) If $\operatorname{rank}(\mathbf{X}) = q + m - 1$, then for a $q \times m$ layout main effects model with no empty cells (with model matrix $\mathbf{X}^*$), it is the case that $\mathcal{R}(\mathbf{X}) \subseteq \mathcal{R}(\mathbf{X}^*)$ and $\operatorname{rank}(\mathbf{X}) = q + m - 1 = \operatorname{rank}(\mathbf{X}^*)$. Therefore, $\mathcal{R}(\mathbf{X}) = \mathcal{R}(\mathbf{X}^*)$ by Theorem 2.8.5. Because $\mu + \alpha_i + \gamma_j$ is estimable for all $i$ and $j$ in the main effects model with no empty cells, by Theorem 6.1.2 it is also estimable in the main effects model with model matrix $\mathbf{X}$.

(c) ⇒ (d) Suppose that $\mu + \alpha_i + \gamma_j$ is estimable for all $i$ and $j$ under the main effects model. Then for any $\alpha$-contrast $\sum_{i=1}^q d_i\alpha_i$ where $\sum_{i=1}^q d_i = 0$, it is the case that

$$\sum_{i=1}^q d_i\alpha_i = \sum_{i=1}^q d_i(\mu + \alpha_i + \gamma_1),$$

which is a linear combination of estimable functions, hence estimable by Corollary 6.1.2.1. Similarly, for any $\gamma$-contrast $\sum_{j=1}^m g_j\gamma_j$ where $\sum_{j=1}^m g_j = 0$, it is the case that

$$\sum_{j=1}^m g_j\gamma_j = \sum_{j=1}^m g_j(\mu + \alpha_1 + \gamma_j),$$

which is a linear combination of estimable functions, hence estimable.
(d) ⇒ (a) If all $\alpha$-contrasts and $\gamma$-contrasts are estimable, then $\alpha_i - \alpha_{i'}$ is estimable for all $i$ and $i'$, and $\gamma_j - \gamma_{j'}$ is estimable for all $j$ and $j'$ because they are particular types of $\alpha$- and $\gamma$-contrasts. Thus by definition, the layout is connected.

Exercise 12 For a two-way main effects model with $q$ levels of Factor A and $m$ levels of Factor B, what is the smallest possible value of $\operatorname{rank}(\mathbf{X})$? Explain.

Solution $\operatorname{rank}(\mathbf{X}) = q + m - s$ where $s$ is the number of disconnected subarrays. Thus $\operatorname{rank}(\mathbf{X})$ is minimized by maximizing $s$. The maximum of $s$ is $\min(q, m)$, so the minimum of $\operatorname{rank}(\mathbf{X})$ is $q + m - \min(q, m) = \max(q, m)$.

Exercise 13 Consider the following two-way layout, where the number in each cell indicates how many observations are in that cell (a dot marks an empty cell):

                 Levels of B
Levels of A    1   2   3   4   5   6   7   8   9  10
     1         .   .   1   .   .   .   2   .   .   .
     2         .   .   .   .   .   .   .   .   1   .
     3         .   .   .   4   .   1   .   .   .   1
     4         .   .   .   .   2   .   .   .   .   2
     5         .   .   .   .   .   .   1   1   .   .
     6         .   .   .   .   1   .   .   .   .   .
     7         .   .   .   1   2   .   .   .   .   .
     8         .   1   .   .   .   .   .   .   3   .
     9         2   .   .   .   .   .   .   .   .   .

Consider the main effects model

$$y_{ijk} = \mu + \alpha_i + \gamma_j + e_{ijk} \quad (i = 1, \ldots, 9;\; j = 1, \ldots, 10)$$

for these observations (where $k$ indexes the observations, if any, within cell $(i, j)$ of the table).

(a) Which $\alpha$-contrasts and $\gamma$-contrasts are estimable?
(b) Give a basis for the set of all estimable functions.
(c) What is the rank of the associated model matrix $\mathbf{X}$?

Solution (a) The “3+e” procedure, followed by an appropriate rearrangement of rows and columns, results in the following set of disconnected subarrays:
Levels of A 3 4 6 7 1 5 2 8 9
Levels of B 4 5 6 4 e 1 e 2 e e 1 e 1 2 e
10 1 2 e e
3
7
8
1 e
2 1
e 1
2
9
e 1
1 3
1
2
From this α-contrasts are of the form 9i=1 di αi we determine that the estimable where i∈{3,4,6,7} di = di = d9 = 0 and that the i∈{1,5} di = 10i∈{2,8} estimable γ -contrasts are of the form j =1 gj γj where j ∈{4,5,6,10} gj = j ∈{3,7,8} gj = j ∈{2,9} gj = g1 = 0. (b) {μ + α3 + γ4 , α4 − α3 , α6 − α3 , α7 − α3 , γ5 − γ4 , γ6 − γ4 , γ10 − γ4 , μ + α1 + γ3 , α5 − α1 , γ7 − γ3 , γ8 − γ3 , μ + α2 + γ2 , α8 − α2 , γ9 − γ2 , μ + α9 + γ1 } (c) rank(X) = 15 by part (b) and Corollary 6.1.2.3. Exercise 14 Consider the following two-way layout, where the number in each cell indicates how many observations are in that cell:
Levels of A 1 2 3 4 5 6 7 8 9
Levels of B 1 2 3 1 3
4
5 4
6 1 1
7
8
1 1 1
2 2
1
1 3
2 1
1 1
Consider the main effects model

$$y_{ijk} = \mu + \alpha_i + \gamma_j + e_{ijk} \quad (i = 1, \ldots, 9;\ j = 1, \ldots, 8)$$

for these observations (where k indexes the observations, if any, within cell (i, j) of the table).
(a) Which α-contrasts and γ-contrasts are estimable?
(b) Give a basis for the set of all estimable functions.
(c) What is the rank of the associated model matrix X?
(d) Suppose that it was possible to take more observations in any cells of the table, but that each additional observation in cell (i, j) of the table will cost the investigator ($20 × i) + [$25 × (8 − j)]. How many more observations are necessary for all α-contrasts and all γ-contrasts to be estimable, and in which cells should those observations be taken to minimize the additional cost to the investigator?
Solution (a) The following two-way layout is obtained after applying the “3+e” procedure and appropriately rearranging rows and columns:
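                     Levels of B
Levels of A    1   5   6   3   4   8   2   7
     1         1   e   3   .   .   .   .   .
     2         4   1   1   .   .   .   .   .
     7         2   e   1   .   .   .   .   .
     9         3   e   e   .   .   .   .   .
     3         .   .   .   e   e   1   .   .
     6         .   .   .   2   1   1   .   .
     4         .   .   .   .   .   .   1   2
     5         .   .   .   .   .   .   1   e
     8         .   .   .   .   .   .   2   1

(A dot marks a cell lying outside every subarray.)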
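Thus, the estimable α-contrasts are $\sum_{i=1}^{9} d_i \alpha_i$ where $\sum_{i \in \{1,2,7,9\}} d_i = \sum_{i \in \{3,6\}} d_i = \sum_{i \in \{4,5,8\}} d_i = 0$, and the estimable γ-contrasts are $\sum_{j=1}^{8} g_j \gamma_j$ where $\sum_{j \in \{1,5,6\}} g_j = \sum_{j \in \{3,4,8\}} g_j = \sum_{j \in \{2,7\}} g_j = 0$.
(b) $\{\mu + \alpha_1 + \gamma_1,\ \alpha_2 - \alpha_1,\ \alpha_7 - \alpha_1,\ \alpha_9 - \alpha_1,\ \gamma_5 - \gamma_1,\ \gamma_6 - \gamma_1,\ \mu + \alpha_3 + \gamma_3,\ \alpha_6 - \alpha_3,\ \gamma_4 - \gamma_3,\ \gamma_8 - \gamma_3,\ \mu + \alpha_4 + \gamma_2,\ \alpha_5 - \alpha_4,\ \alpha_8 - \alpha_4,\ \gamma_7 - \gamma_2\}$.
(c) rank(X) = 14 by part (b) and Corollary 6.1.2.3.
(d) Two more observations will suffice, and the additional cost will be minimized by taking them in cells (1, 8) and (1, 7).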
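The counts used in Exercises 12 to 14 can be checked mechanically. The following is only a sketch, not part of the original solutions: it assumes the nonempty-cell positions as read from the rearranged layouts in the two solutions, it assumes every level of each factor has at least one observation, and the helper names `count_subarrays` and `cheapest_connecting_pair` are hypothetical. Rows and columns are treated as nodes of a graph and nonempty cells as edges, so each connected component is one disconnected subarray and rank(X) = q + m − s.

```python
from itertools import combinations

def count_subarrays(cells):
    """Number of disconnected subarrays, given the (i, j) nonempty cells."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for i, j in cells:
        parent[find(("A", i))] = find(("B", j))  # a cell links row i to column j
    return len({find(node) for node in list(parent)})

def cheapest_connecting_pair(cells, q, m):
    """Brute-force part (d) of Exercise 14: the two added cells of least cost
    that connect the layout, with cost 20*i + 25*(m - j) per added cell
    (the exercise's cost function when m = 8)."""
    def cost(cell):
        return 20 * cell[0] + 25 * (m - cell[1])

    empty = [(i, j) for i in range(1, q + 1) for j in range(1, m + 1)
             if (i, j) not in cells]
    connecting = [pair for pair in combinations(empty, 2)
                  if count_subarrays(cells + list(pair)) == 1]
    best = min(connecting, key=lambda pair: cost(pair[0]) + cost(pair[1]))
    return cost(best[0]) + cost(best[1]), best

# Assumed nonempty cells (positions only; the counts do not affect the rank).
EX13_CELLS = [(1, 3), (1, 7), (2, 9), (3, 4), (3, 5), (3, 10), (4, 6), (4, 10),
              (5, 7), (5, 8), (6, 4), (6, 6), (7, 5), (8, 2), (8, 9), (9, 1)]
EX14_CELLS = [(1, 1), (1, 6), (2, 1), (2, 5), (2, 6), (3, 8), (4, 2), (4, 7),
              (5, 2), (6, 3), (6, 4), (6, 8), (7, 1), (7, 6), (8, 2), (8, 7),
              (9, 1)]

if __name__ == "__main__":
    print(9 + 10 - count_subarrays(EX13_CELLS))     # 15
    print(9 + 8 - count_subarrays(EX14_CELLS))      # 14
    print(cheapest_connecting_pair(EX14_CELLS, 9, 8))  # (65, ((1, 7), (1, 8)))
```

The brute-force search is adequate here because only 55 cells are empty; for larger layouts a minimum-cost spanning argument over the subarrays would scale better.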
Exercise 15 Consider the two-way partially crossed model introduced in Example 5.1.4-1, i.e.,

$$y_{ijk} = \mu + \alpha_i - \alpha_j + e_{ijk} \quad (i \ne j = 1, \ldots, q;\ k = 1, \ldots, n_{ij})$$

for those combinations of i and j ($i \ne j$) for which $n_{ij} \ge 1$.
(a) For the case of no empty cells, what conditions must the elements of c satisfy for $c^T\beta$ to be estimable? In particular, are μ and all differences $\alpha_i - \alpha_j$ estimable? Determine the rank of the model matrix X for this case and find a basis for the set of all estimable functions.
(b) Now consider the case where all cells below the main diagonal of the two-way layout are empty, but no cells above the main diagonal are empty. How, if at all, would your answers to the same questions in part (a) change?
(c) Finally, consider the case where there are empty cells in arbitrary locations. Describe a method for determining which cell means are estimable. Under what conditions on the locations of the empty cells are all functions estimable that were estimable in part (a)?

Solution
(a) Let us agree to label the distinct rows of X by subscripts ij, where $j \ne i = 1, \ldots, q$. Then the matrix of distinct rows of X is given by $[1_{q(q-1)}, x_1, x_2, \ldots, x_q]$, where the ijth element of $x_k$ is equal to 1 if i = k, −1 if j = k, and 0 otherwise. Any linear combination $a^T X$ may therefore be expressed as $[a_{\cdot\cdot},\ a_{1\cdot} - a_{\cdot 1},\ a_{2\cdot} - a_{\cdot 2},\ \ldots,\ a_{q\cdot} - a_{\cdot q}]$, where $a^T = (a_{12}, a_{13}, \ldots, a_{1q}, a_{21}, a_{23}, a_{24}, \ldots, a_{2q}, a_{31}, \ldots, a_{q,q-1})$. Thus, $c^T\beta$ is estimable if and only if the coefficients on the $\alpha_i$'s sum to 0; there are no restrictions on the coefficient on μ. In particular, μ and all differences $\alpha_i - \alpha_j$ are estimable. Since $\sum_{k=1}^{q} x_k = 0$ but $1 \notin C(x_1, \ldots, x_q)$, the rank of X is q. A basis for the set of all estimable functions is $\{\mu,\ \alpha_1 - \alpha_2,\ \alpha_1 - \alpha_3,\ \ldots,\ \alpha_1 - \alpha_q\}$.
(b) If all cells below the main diagonal are empty, then the matrix of distinct rows of X is obtained from the one in part (a) by deleting those rows ij for which i > j. Let $X^* = (1_{q(q-1)/2}, x_1^*, \ldots, x_q^*)$ denote the resulting matrix. Then $a^T X^* = (a_{\cdot\cdot},\ a_{1\cdot},\ a_{2\cdot} - a_{12},\ a_{3\cdot} - (a_{13} + a_{23}),\ \ldots,\ -(a_{1q} + \cdots + a_{q-1,q}))$, where $a^T = (a_{12}, a_{13}, \ldots, a_{1q}, a_{23}, a_{24}, \ldots, a_{2q}, a_{34}, \ldots, a_{q-1,q})$. It is easily verified that the last q elements of $a^T X^*$ sum to 0 and that $1 \notin C(x_1^*, \ldots, x_q^*)$ as before, so the row space of $X^*$ is the same as that of X. Thus, the answers to part (a) do not change.
(c) Follow this procedure:
(i) Since $(\mu + \alpha_i - \alpha_j) = (\mu + \alpha_i - \alpha_{j'}) + (\mu + \alpha_{i'} - \alpha_j) - (\mu + \alpha_{i'} - \alpha_{j'})$, we can start with the 3+e procedure used to determine which cell means are estimable in the two-way main effects model, with one difference. Specifically, put an “o” (for “observation”) in each cell for which $n_{ij} > 0$.
Then examine the 2 × 2 subarray formed as the Cartesian product of any two rows and any two columns, provided that the subarray does not include a cell on the main diagonal. If any three of the four cells of this subarray are occupied by an observation but the fourth cell is empty, put an “e” (for “estimable”) in the empty cell. Repeat this procedure for every 2 × 2 subarray, putting an “e” in every empty fourth cell if the other three cells are occupied by either an “o” or an “e.”
(ii) Next, note that $[(\mu + \alpha_i - \alpha_j) + (\mu + \alpha_j - \alpha_i)]/2 = \mu$; thus, if, for any $i \ne j$, cell (i, j) and its reflection cell (j, i) are occupied by an “o” or “e,” then μ is estimable, and an “e” may be put in every cell which is the reflection of a cell occupied by an “o” or “e.”
(iii) Examine each 2 × 2 subarray formed as the Cartesian product of any two rows and any two columns such that exactly one of the four cells lies on the main diagonal. For such a subarray, subtracting the cell mean for the cell opposite to the one on the main diagonal from the sum of cell means for the other two cells yields μ. Thus, if the three off-diagonal cells for such an array are occupied by an “o” or an “e,” then μ is estimable, and an “e” may be put in every cell that is the reflection of a cell occupied by an “o” or “e.”
(iv) If, while carrying out steps (ii) and (iii), it was determined that μ is estimable, then re-examine every 2 × 2 subarray previously examined in step (iii), and if any two of the three off-diagonal cells are occupied by an “o” or an “e,” put an “e” in the remaining off-diagonal cell.
(v) Cycle through the previous four steps until an “e” may not be put in any more cells.
After carrying out this procedure, the only cell means that are estimable are those corresponding to cells occupied by an “o” or an “e.” If μ was not determined to be estimable in steps (ii) or (iii), then μ is not estimable and neither is any difference $\alpha_i - \alpha_{i'}$. If, however, μ was determined to be estimable, then all differences $\alpha_i - \alpha_{i'}$ corresponding to the estimable cell means are estimable.

Exercise 16 For a 3 × 3 layout, consider a two-way model with interaction, i.e.,

$$y_{ijk} = \mu + \alpha_i + \gamma_j + \xi_{ij} + e_{ijk} \quad (i = 1, 2, 3;\ j = 1, 2, 3;\ k = 1, \ldots, n_{ij})$$

where some of the nine cells may be empty.
(a) Give a basis for the interaction contrasts, $\psi_{iji'j'} \equiv \xi_{ij} - \xi_{ij'} - \xi_{i'j} + \xi_{i'j'}$, that are estimable if no cells are empty.
(b) If only one of the nine cells is empty, how many interaction contrasts are there in a basis for the estimable interaction contrasts? What is rank(X)? Does the answer depend on which cell is empty?
(c) If exactly two of the nine cells are empty, how many interaction contrasts are there in a basis for the estimable interaction contrasts? What is rank(X)? Does the answer depend on which cells are empty?
(d) What is the maximum number of empty cells possible for a 3 × 3 layout that has at least one estimable interaction contrast? What is rank(X) in this case?

Solution
(a) Any $\psi_{iji'j'} \equiv \xi_{ij} - \xi_{ij'} - \xi_{i'j} + \xi_{i'j'} = \mu_{ij} - \mu_{ij'} - \mu_{i'j} + \mu_{i'j'}$ is estimable if the four cell means included in its definition are estimable. Thus, if there are no empty cells, all $\psi_{iji'j'}$ for $i \ne i'$ and $j \ne j'$ are estimable. The nine distinct contrasts of this form, $\{\psi_{1122}, \psi_{1123}, \psi_{1132}, \psi_{1133}, \psi_{1223}, \psi_{1233}, \psi_{2132}, \psi_{2133}, \psi_{2233}\}$, span this set of functions but are not linearly independent; for example, $\psi_{1223} = \psi_{1123} - \psi_{1122}$. A basis is $\{\psi_{1122}, \psi_{1123}, \psi_{1132}, \psi_{1133}\}$, which contains (3 − 1)(3 − 1) = 4 functions. Because the nine cell means are estimable and linearly independent, rank(X) = 9.
(b) The location of the empty cell determines which interaction contrasts are estimable but does not determine the number of linearly independent interaction contrasts, because we can always perform row and column permutations to put the empty cell in any row and column we desire. So without loss of generality assume that the empty cell is (3, 3). Then the estimable contrasts of the form $\psi_{iji'j'}$ are $\psi_{1122}$, $\psi_{1123}$, $\psi_{1132}$, $\psi_{1223}$, and $\psi_{2132}$, of which only three are linearly independent because $\psi_{1223} = \psi_{1123} - \psi_{1122}$ and $\psi_{2132} = \psi_{1132} - \psi_{1122}$. Thus a basis for the estimable interaction contrasts is $\{\psi_{1122}, \psi_{1123}, \psi_{1132}\}$. The basis will change depending on which cell is empty, but its dimension is 3 regardless of which cell is empty. Because the eight nonempty cell means are estimable and linearly independent, rank(X) = 8.
(c) Either the two empty cells are in the same row or column, or they are in different rows and columns. The “same row” case is equivalent to the “same column” case. For this case, again by considering row and column permutations, without loss of generality we may assume cells (2, 3) and (3, 3) are empty. Then every estimable contrast of the form $\psi_{iji'j'}$ involves only columns 1 and 2, and a basis for the estimable interaction contrasts is $\{\psi_{1122}, \psi_{1132}\}$. On the other hand, if the empty cells lie in different rows and columns, which without loss of generality we may take to be cells (1, 1) and (3, 3), then a basis for the estimable interaction contrasts is $\{\psi_{1223}, \psi_{2132}\}$. In either case the basis will change depending on which two cells are empty, but its dimension is 2 regardless, and rank(X) = 7 because there are seven nonempty cells.
(d) No entire row and no entire column in the 3 × 3 layout can be empty. The maximum allowable number of empty cells for this to be true and for there to be at least one estimable interaction contrast is 4. Without loss of generality assume that cells (1, 3), (2, 3), (3, 1), and (3, 2) are empty. Then the only estimable interaction contrast (up to scalar multiples) is $\psi_{1122}$. The lone estimable contrast will change, of course, depending on which four cells are empty, but there is always one such contrast, and rank(X) = 5 because there are five nonempty cells.

Exercise 17 Consider the three-way main effects model

$$y_{ijkl} = \mu + \alpha_i + \gamma_j + \delta_k + e_{ijkl} \quad (i = 1, \ldots, q;\ j = 1, \ldots, m;\ k = 1, \ldots, s;\ l = 1, \ldots, n_{ijk}),$$

and suppose that there are no empty cells (i.e., $n_{ijk} > 0$ for all i, j, k).
(a) What conditions must the elements of c satisfy for $c^T\beta$ to be estimable?
(b) Find a basis for the set of all estimable functions.
(c) What is the rank of the model matrix X?
Solution
(a) The matrix of distinct rows of X is

$$\left(1_{qms},\ I_q \otimes 1_{ms},\ 1_q \otimes I_m \otimes 1_s,\ 1_{qm} \otimes I_s\right).$$

An arbitrary element of R(X) may therefore be expressed as $a^T \left(1_{qms},\ I_q \otimes 1_{ms},\ 1_q \otimes I_m \otimes 1_s,\ 1_{qm} \otimes I_s\right)$, or equivalently as $(a_{\cdot\cdot\cdot}, a_{1\cdot\cdot}, \ldots, a_{q\cdot\cdot}, a_{\cdot 1\cdot}, \ldots, a_{\cdot m\cdot}, a_{\cdot\cdot 1}, \ldots, a_{\cdot\cdot s})$, where $a^T = (a_{111}, a_{112}, \ldots, a_{11s}, a_{121}, \ldots, a_{12s}, \ldots, a_{qms})$. Thus, by Theorem 6.1.2, $c^T\beta$ is estimable if and only if the coefficients on the $\alpha_i$'s sum to the coefficient on μ and the coefficients on the $\gamma_j$'s and $\delta_k$'s do likewise.
(b) A basis for the set of all estimable functions is $\{\mu + \alpha_1 + \gamma_1 + \delta_1,\ \alpha_2 - \alpha_1,\ \alpha_3 - \alpha_1,\ \ldots,\ \alpha_q - \alpha_1,\ \gamma_2 - \gamma_1,\ \ldots,\ \gamma_m - \gamma_1,\ \delta_2 - \delta_1,\ \ldots,\ \delta_s - \delta_1\}$.
(c) By part (b) and Corollary 6.1.2.3, rank(X) = 1 + (q − 1) + (m − 1) + (s − 1) = q + m + s − 2.

Exercise 18 Consider a three-factor crossed classification in which Factors A, B, and C each have two levels, and suppose that there is exactly one observation in the following four (of the eight) cells: 111, 122, 212, 221. The other cells are empty. Consider the following main effects model for these observations:

$$y_{ijk} = \mu + \alpha_i + \gamma_j + \delta_k + e_{ijk},$$

where $(i, j, k) \in \{(1, 1, 1), (1, 2, 2), (2, 1, 2), (2, 2, 1)\}$.
(a) Which of the functions $\alpha_2 - \alpha_1$, $\gamma_2 - \gamma_1$, and $\delta_2 - \delta_1$ are estimable?
(b) Find a basis for the set of all estimable functions.
(c) What is the rank of the model matrix X?
Note This exercise and the next one are relevant to the issue of “confounding” in a $2^2$ factorial design in blocks of size two.

Solution
(a) All of them are estimable by Corollary 6.1.2.1 because:
• $-\tfrac{1}{2}(\mu + \alpha_1 + \gamma_1 + \delta_1) - \tfrac{1}{2}(\mu + \alpha_1 + \gamma_2 + \delta_2) + \tfrac{1}{2}(\mu + \alpha_2 + \gamma_1 + \delta_2) + \tfrac{1}{2}(\mu + \alpha_2 + \gamma_2 + \delta_1) = \alpha_2 - \alpha_1$;
• $-\tfrac{1}{2}(\mu + \alpha_1 + \gamma_1 + \delta_1) + \tfrac{1}{2}(\mu + \alpha_1 + \gamma_2 + \delta_2) - \tfrac{1}{2}(\mu + \alpha_2 + \gamma_1 + \delta_2) + \tfrac{1}{2}(\mu + \alpha_2 + \gamma_2 + \delta_1) = \gamma_2 - \gamma_1$;
• $-\tfrac{1}{2}(\mu + \alpha_1 + \gamma_1 + \delta_1) + \tfrac{1}{2}(\mu + \alpha_1 + \gamma_2 + \delta_2) + \tfrac{1}{2}(\mu + \alpha_2 + \gamma_1 + \delta_2) - \tfrac{1}{2}(\mu + \alpha_2 + \gamma_2 + \delta_1) = \delta_2 - \delta_1$.
Thus, if Factor C represents blocks, so that the layout represents a $2^2$ factorial design in (incomplete) blocks of size two, Factor A and Factor B are not confounded with blocks.
(b) A basis for the set of estimable functions is $\{\mu + \alpha_1 + \gamma_1 + \delta_1,\ \alpha_2 - \alpha_1,\ \gamma_2 - \gamma_1,\ \delta_2 - \delta_1\}$.
(c) $p^* = 4$.

Exercise 19 Consider a three-factor crossed classification in which Factors A, B, and C each have two levels, and suppose that there is exactly one observation in the following four (of the eight) cells: 111, 211, 122, 222. The other cells are empty. Consider the following main effects model for these observations:

$$y_{ijk} = \mu + \alpha_i + \gamma_j + \delta_k + e_{ijk},$$

where $(i, j, k) \in \{(1, 1, 1), (2, 1, 1), (1, 2, 2), (2, 2, 2)\}$.
(a) Which of the functions $\alpha_2 - \alpha_1$, $\gamma_2 - \gamma_1$, and $\delta_2 - \delta_1$ are estimable?
(b) Find a basis for the set of all estimable functions.
(c) What is the rank of the model matrix X?

Solution
(a) By Corollary 6.1.2.1, $\alpha_2 - \alpha_1$ is estimable because $-\tfrac{1}{2}(\mu + \alpha_1 + \gamma_1 + \delta_1) + \tfrac{1}{2}(\mu + \alpha_2 + \gamma_1 + \delta_1) - \tfrac{1}{2}(\mu + \alpha_1 + \gamma_2 + \delta_2) + \tfrac{1}{2}(\mu + \alpha_2 + \gamma_2 + \delta_2) = \alpha_2 - \alpha_1$. So is $\gamma_2 - \gamma_1 + \delta_2 - \delta_1$ because $-\tfrac{1}{2}(\mu + \alpha_1 + \gamma_1 + \delta_1) - \tfrac{1}{2}(\mu + \alpha_2 + \gamma_1 + \delta_1) + \tfrac{1}{2}(\mu + \alpha_1 + \gamma_2 + \delta_2) + \tfrac{1}{2}(\mu + \alpha_2 + \gamma_2 + \delta_2) = \gamma_2 - \gamma_1 + \delta_2 - \delta_1$. However, no linear combination of the same four functions yields either $\gamma_2 - \gamma_1$ or $\delta_2 - \delta_1$, so neither $\gamma_2 - \gamma_1$ nor $\delta_2 - \delta_1$ is estimable. If Factor C represents blocks, so that the layout represents a $2^2$ factorial design in (incomplete) blocks of size two, Factor B is confounded with blocks.
(b) A basis for the set of estimable functions is $\{\mu + \alpha_1 + \gamma_1 + \delta_1,\ \alpha_2 - \alpha_1,\ \gamma_2 - \gamma_1 + \delta_2 - \delta_1\}$.
(c) $p^* = 3$.

Exercise 20 Consider a three-factor crossed classification in which Factors A and B have two levels and Factor C has three levels (i.e., a 2 × 2 × 3 layout), but some cells may be empty. Consider the following main effects model for this situation:

$$y_{ijkl} = \mu + \alpha_i + \gamma_j + \delta_k + e_{ijkl},$$

where i = 1, 2; j = 1, 2; k = 1, 2, 3; and $l = 1, \ldots, n_{ijk}$ for the nonempty cells.
(a) What is the fewest number of observations that, if they are placed in appropriate cells, will make all functions of the form $\mu + \alpha_i + \gamma_j + \delta_k$ estimable?
(b) If all functions of the form $\mu + \alpha_i + \gamma_j + \delta_k$ are estimable, what is the rank of the model matrix?
(c) Let a represent the correct answer to part (a). If a observations are placed in any cells (one observation per cell), will all functions of the form $\mu + \alpha_i + \gamma_j + \delta_k$ necessarily be estimable? Explain.
(d) If a + 1 observations are placed in any cells (one observation per cell), will all functions of the form $\mu + \alpha_i + \gamma_j + \delta_k$ necessarily be estimable? Explain.
(e) Suppose there is exactly one observation in the following cells: 211, 122, 123, 221. The other cells are empty. Determine $p^*$ and find a basis for the set of all estimable functions.

Solution
(a) Let X be the model matrix for the case of this model/layout where no cell is empty. Then obviously all functions of the form $\mu + \alpha_i + \gamma_j + \delta_k$ are estimable, and by Exercise 6.17c the rank of X is 5. Now let $\tilde{X}$ be the model matrix for any 2 × 2 × 3 layout for which all functions of the form $\mu + \alpha_i + \gamma_j + \delta_k$ are estimable. Then $\mathrm{rank}(\tilde{X}) \ge 5$ because spanning the set of all functions of the form $\mu + \alpha_i + \gamma_j + \delta_k$ requires at least five vectors, and $\mathrm{rank}(\tilde{X}) \le 5$ because $R(\tilde{X}) \subseteq R(X)$. Hence $\mathrm{rank}(\tilde{X}) = 5$ and it follows that $\tilde{X}$ must have at least five rows, i.e., at least five observations are required. Now consider the particular case

$$\tilde{X} = \begin{pmatrix} 1 & 1 & 0 & 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 & 0 & 0 & 0 & 1 \end{pmatrix},$$

for which the corresponding design is depicted in the top panel of Fig. 6.1. Then it can be easily verified that all functions of the form $\mu + \alpha_i + \gamma_j + \delta_k$ are estimable. Therefore, it is possible to make all functions of the form $\mu + \alpha_i + \gamma_j + \delta_k$ estimable with five observations, but no fewer than 5.
(b) rank(X) = 5, as explained in part (a).
(c) No. Suppose you put observations in cells 111, 121, 212, 222, 213, as depicted in the left bottom panel in Fig. 6.1. Then the model matrix is

$$X = \begin{pmatrix} 1 & 1 & 0 & 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 & 0 & 0 & 1 \end{pmatrix}.$$
Fig. 6.1 Depictions of designs for Exercise 6.20 (top, bottom left, and bottom right panels; each panel has axes labelled α1, α2; γ1, γ2; and δ1, δ2, δ3). Open circles indicate empty cells; black circles indicate nonempty cells
Denoting the columns of X by $x_j$, j = 1, 2, …, 8, we observe that $x_4, x_5, x_6, x_7$ are linearly independent, and $x_1 = x_4 + x_5$, $x_2 = x_6$, $x_3 = x_4 + x_5 - x_6$, and $x_8 = x_4 + x_5 - x_6 - x_7$. Hence rank(X) = 4, indicating that not all functions of the form $\mu + \alpha_i + \gamma_j + \delta_k$ are estimable.
(d) No. Suppose you put observations in the same cells as in part (c), plus cell 223; this design is shown in the bottom right panel of Fig. 6.1. Then the model matrix is

$$X = \begin{pmatrix} 1 & 1 & 0 & 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 & 0 & 1 \end{pmatrix}.$$

Columns four through seven of X are linearly independent here just as they were in part (c), and the other four columns of X here may be expressed as the same linear combinations of columns of X given in part (c). Hence the rank of X is still 4, so not all functions of the form $\mu + \alpha_i + \gamma_j + \delta_k$ are estimable.
(e) Because the cell means of those four nonempty cells are estimable and linearly independent, $p^* = 4$ and a basis for the set of all estimable functions is $\{\mu + \alpha_2 + \gamma_1 + \delta_1,\ \mu + \alpha_1 + \gamma_2 + \delta_2,\ \mu + \alpha_1 + \gamma_2 + \delta_3,\ \mu + \alpha_2 + \gamma_2 + \delta_1\}$.

Exercise 21 A “Latin square” design is an experimental design with three partially crossed factors, called “Rows,” “Columns,” and “Treatments.” The number of levels of each factor is common across factors, i.e., if there are q levels of
Row, then there must be q levels of Column and q levels of Treatment. The first two factors are completely crossed, so a two-way table can be used to represent their combinations; but the Treatment factor is only partially crossed with the other two factors because there is only one observational unit for each Row×Column combination. The levels of Treatment are assigned to Row×Column combinations in such a way that each level occurs exactly once in each row and column (rather than once in each combination, as would be the case if Treatment was completely crossed with Row and Column). An example of a 3 × 3 “Latin square” design is depicted below:

2 3 1
1 2 3
3 1 2

The number displayed in each of the nine cells of this layout is the Treatment level (i.e., the Treatment label) which the observation in that cell received. For observations in a Latin square design with q treatments, consider the following model:

$$y_{ijk} = \mu + \alpha_i + \gamma_j + \tau_k + e_{ijk} \quad (i = 1, \ldots, q;\ j = 1, \ldots, q;\ k = 1, \ldots, q),$$

where $y_{ijk}$ is the observation in row i and column j (and k is the treatment assigned to that cell); μ is an overall effect; $\alpha_i$ is the effect of Row i; $\gamma_j$ is the effect of Column j; $\tau_k$ is the effect of Treatment k; and $e_{ijk}$ is the error corresponding to $y_{ijk}$. These errors are uncorrelated with mean 0 and variance $\sigma^2$.
(a) Show that $\tau_k - \tau_{k'}$ is estimable for all k and k′, and give an unbiased estimator for it.
(b) What condition(s) must the elements of c satisfy for $c^T\beta$ to be estimable?
(c) Find a basis for the set of all estimable functions.
(d) What is the rank of the model matrix X?

Solution
(a)
$$\tau_k - \tau_{k'} = \big[(\mu + \alpha_1 + \gamma_{j_1(k)} + \tau_k) - (\mu + \alpha_1 + \gamma_{j_1(k')} + \tau_{k'}) + (\mu + \alpha_2 + \gamma_{j_2(k)} + \tau_k) - (\mu + \alpha_2 + \gamma_{j_2(k')} + \tau_{k'}) + \cdots + (\mu + \alpha_q + \gamma_{j_q(k)} + \tau_k) - (\mu + \alpha_q + \gamma_{j_q(k')} + \tau_{k'})\big]/q,$$
where $\{j_1(k), j_2(k), \ldots, j_q(k)\}$ and $\{j_1(k'), j_2(k'), \ldots, j_q(k')\}$ are permutations of {1, 2, …, q} corresponding to the columns to which treatments k
and k′ were assigned in rows 1, 2, …, q. Because all of the cell means in this expression for $\tau_k - \tau_{k'}$ are estimable, every treatment difference is estimable. Obviously, the same linear combination of observations as the linear combination of cell means in $\tau_k - \tau_{k'}$, i.e.,

$$\big[(y_{1j_1(k)} - y_{1j_1(k')}) + (y_{2j_2(k)} - y_{2j_2(k')}) + \cdots + (y_{qj_q(k)} - y_{qj_q(k')})\big]/q = \bar{y}_{\cdot\cdot k} - \bar{y}_{\cdot\cdot k'},$$

is an unbiased estimator of $\tau_k - \tau_{k'}$.
(b) Given a Latin square with q treatments, let X be the corresponding model matrix and let S = {(i, j, k) : Treatment k is assigned to the cell in Row i and Column j}. Note that there are $q^2$ elements in S. Any linear combination of the rows of X may be written as $a^T X$ where $a \in \mathbb{R}^{q^2}$; let us agree to index the elements of a by $a_{ijk}$, where $(i, j, k) \in S$, arranged in lexicographic order. Then $a^T X = (a_{\cdot\cdot\cdot}, a_{1\cdot\cdot}, a_{2\cdot\cdot}, \ldots, a_{q\cdot\cdot}, a_{\cdot 1\cdot}, \ldots, a_{\cdot q\cdot}, a_{\cdot\cdot 1}, \ldots, a_{\cdot\cdot q})$. Therefore, $c^T\beta$ is estimable if and only if the coefficients on the $\alpha_i$'s sum to the coefficient on μ, and the coefficients on the $\gamma_j$'s and $\tau_k$'s do likewise.
(c) Let $j_1(1)$ be defined as in part (a). Consider the set of functions $\{\mu + \alpha_1 + \gamma_{j_1(1)} + \tau_1,\ \alpha_2 - \alpha_1,\ \ldots,\ \alpha_q - \alpha_1,\ \gamma_2 - \gamma_1,\ \ldots,\ \gamma_q - \gamma_1,\ \tau_2 - \tau_1,\ \ldots,\ \tau_q - \tau_1\}$. There are 3q − 2 functions in this set, all of which are estimable by part (b), and they are linearly independent. Furthermore, $R(X) \subseteq R(\tilde{X})$, where $\tilde{X}$ is the model matrix of a q × q × q three-way main effects model with no empty cells; and because $\mathrm{rank}(\tilde{X}) = 3q - 2$ by the result of Exercise 6.17c, any basis for the set of estimable functions in a Latin square with q treatments can contain no more than 3q − 2 functions. Thus, the set in question is a basis for the set of all estimable functions.
(d) By part (c) and Corollary 6.1.2.3, rank(X) = 3q − 2.

Exercise 22 An experiment is being planned that involves two continuous explanatory variables, $x_1$ and $x_2$, and a response variable y. The model relating y to the explanatory variables is

$$y_i = \beta_1 + \beta_2 x_{i1} + \beta_3 x_{i2} + e_i \quad (i = 1, \ldots, n).$$

Suppose that only three observations can be taken (i.e., n = 3), and that these three observations must be taken at distinct values of $(x_{i1}, x_{i2})$ such that $x_{i1}$ = 1, 2, or 3, and $x_{i2}$ = 1, 2, or 3 (see Fig. 6.2 for a depiction of the nine allowable $(x_{i1}, x_{i2})$-pairs, which are labelled as A, B, . . ., I for easier reference). A three-observation design can thus be represented by three letters: for example, ACI is the design consisting of the three points A, C, and I. For some of the three-observation designs, all parameters in the model’s mean structure ($\beta_1$, $\beta_2$, and $\beta_3$) are estimable. However, there are some three-observation designs for which not all of these parameters are estimable. List eight designs of the latter type, and justify your list.
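Fig. 6.2 Experimental layout for Exercise 6.22 (horizontal axis X1, vertical axis X2; points A, B, C lie at X2 = 1, points D, E, F at X2 = 2, and points G, H, I at X2 = 3, in each case at X1 = 1, 2, 3, respectively)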
Solution For all elements of β ($\beta_1$, $\beta_2$, and $\beta_3$) to be estimable, rank(X) must equal 3. But for designs ABC, DEF, GHI, GDA, HEB, and IFC, $X = (1, x_2, x_3)$ where either $x_2$ or $x_3$ is equal to $c1$ where $c \in \{1, 2, 3\}$, so rank(X) = 2 for these six designs. Furthermore, for design AEI, $x_2 = x_3 = (1, 2, 3)^T$, so rank(X) = 2 for this design. Finally, for design GEC, $x_2 = (1, 2, 3)^T = 4 \cdot 1 - x_3$, so rank(X) = 2 for this design also.

Exercise 23 Suppose that y follows a Gauss–Markov model in which

$$X = \begin{pmatrix} 1 & 1 & t \\ 1 & 2 & 2t \\ \vdots & \vdots & \vdots \\ 1 & t & t^2 \\ 1 & 1 & s \\ 1 & 2 & 2s \\ \vdots & \vdots & \vdots \\ 1 & s & s^2 \end{pmatrix}, \qquad \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}.$$
Here, t and s are integers such that t ≥ 2 and s ≥ 2. [Note: The corresponding model is called the “conditional linear model” and has been used for data that are informatively right-censored; see, e.g., Wu and Bailey (1989).]
(a) Determine which of the following linear functions of the elements of β are always estimable, with no further assumptions on t and s.
(i) $\beta_1$,
(ii) $\beta_1 + \beta_2 t + \beta_3 s$,
(iii) $\beta_2 + \beta_3 (s + t)/2$.
(b) Determine necessary and sufficient conditions on t and s for all linear functions of the elements of β to be estimable.

Solution
(a) (i) $\beta_1 = (1, 0, 0)\beta = [2(1, 1, t) - (1, 2, 2t)]\beta$, so $\beta_1$ is always estimable.
(ii) Let $d^T = (a^T, b^T) = (a_1, \ldots, a_t, b_1, \ldots, b_s)$. Then $d^T X = \left(\sum_{i=1}^{t} a_i + \sum_{j=1}^{s} b_j,\ \sum_{i=1}^{t} a_i i + \sum_{j=1}^{s} b_j j,\ t\sum_{i=1}^{t} a_i i + s\sum_{j=1}^{s} b_j j\right) \equiv (A_1,\ A_2 + A_3,\ tA_2 + sA_3)$, say. Now, $\beta_1 + \beta_2 t + \beta_3 s = c^T\beta$ where $c^T = (1, t, s)$. For this function to be estimable, there must exist $A_1, A_2, A_3$ such that $A_1 = 1$, $A_2 + A_3 = t$, and $tA_2 + sA_3 = s$. The last two equalities cannot simultaneously hold if $t = s \ne 1$. So $\beta_1 + \beta_2 t + \beta_3 s$ is not always estimable.
(iii) $\beta_2 + \beta_3 \frac{s+t}{2} = (0, 1, (s + t)/2)\beta = \{(1/2)[(1, 2, 2t) - (1, 1, t)] + (1/2)[(1, 2, 2s) - (1, 1, s)]\}\beta$, so it is always estimable.
(b) A necessary and sufficient condition is $s \ne t$. To show sufficiency, let $s \ne t$ and let us show that the unit vectors in three dimensions form a basis for R(X). It was shown in part (a) that $(1, 0, 0) \in R(X)$. Hence it is also the case that $(1, 1, t) - (1, 0, 0) = (0, 1, t) \in R(X)$ and $(1, 1, s) - (1, 0, 0) = (0, 1, s) \in R(X)$. Thus $[(0, 1, t) - (0, 1, s)]/(t - s) = (0, 0, 1) \in R(X)$ and $[(1/t)(0, 1, t) - (1/s)(0, 1, s)]/[(1/t) - (1/s)] = (0, 1, 0) \in R(X)$. It follows that $R(X) = \mathbb{R}^3$ and that all linear functions of β are estimable. To show necessity, suppose that s = t. Then {(1, 0, 0), (0, 1, t)} is a basis for R(X), implying that the dimensionality of R(X) equals 2 and hence that not all linear functions of β are estimable.
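The conclusions of Exercise 23 can be spot-checked for particular values of t and s. The sketch below is illustrative only and is not taken from the original solution: the helper names `rank`, `model_matrix`, and `is_estimable` are hypothetical, and exact rational arithmetic is used so that no numerical tolerance is needed. It relies on the fact that $c^T\beta$ is estimable if and only if appending $c^T$ as an extra row leaves the rank of X unchanged.

```python
from fractions import Fraction

def rank(rows):
    """Exact rank by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((k for k in range(r, len(m)) if m[k][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for k in range(r + 1, len(m)):
            factor = m[k][col] / m[r][col]
            m[k] = [a - factor * b for a, b in zip(m[k], m[r])]
        r += 1
    return r

def model_matrix(t, s):
    """Rows (1, i, i*t) for i = 1..t, then rows (1, j, j*s) for j = 1..s."""
    first = [[1, i, i * t] for i in range(1, t + 1)]
    second = [[1, j, j * s] for j in range(1, s + 1)]
    return first + second

def is_estimable(c, X):
    """c'beta is estimable iff c' lies in the row space of X."""
    return rank(X + [list(c)]) == rank(X)

if __name__ == "__main__":
    for t, s in [(3, 3), (2, 3)]:
        X = model_matrix(t, s)
        functions = {
            "(i)": (1, 0, 0),
            "(ii)": (1, t, s),
            "(iii)": (0, 1, Fraction(s + t, 2)),
        }
        print(t, s, rank(X), {k: is_estimable(c, X) for k, c in functions.items()})
```

With t = s = 3 the rank is 2 and only functions (i) and (iii) pass the check, whereas with t = 2 and s = 3 the rank is 3 and every function is estimable, in line with parts (a) and (b).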
Exercise 24 Consider three linear models labelled I, II, and III, which all have model matrix

$$X = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 \end{pmatrix}.$$
The three models differ only with respect to their parameter spaces for $\beta = [\beta_1, \beta_2, \beta_3, \beta_4]^T$, as indicated below:
Parameter space for Model I: $\{\beta : \beta \in \mathbb{R}^4\}$,
Parameter space for Model II: $\{\beta : \beta_1 + \beta_4 = 5\}$,
Parameter space for Model III: $\{\beta : \beta_1 + \beta_4 = 0\}$.
Under which, if any, of these models does an unbiased estimator of $\beta_2 + \beta_3$ exist which is of the form $t^T y$? Justify your answer.

Solution By Theorem 6.2.1, $t^T y$ is unbiased for $\beta_2 + \beta_3 = (0, 1, 1, 0)\beta$ if and only if a scalar g exists such that gh = 0 and $t^T X + gA = (0, 1, 1, 0)$. Under Model I, A = 0, so $t^T y$ is unbiased for $\beta_2 + \beta_3$ if and only if $t^T X = (0, 1, 1, 0)$. But $(0, 1, 1, 0) \notin R(X) = \mathrm{span}\{(1, 0, 1, 0), (0, 1, 0, 1)\}$, so $t^T y$ cannot be unbiased under Model I. Under Model II, A = (1, 0, 0, 1) and $t^T y$ is unbiased for $\beta_2 + \beta_3$ if and only if a scalar g exists such that $g \cdot 5 = 0$ and $t^T X + g(1, 0, 0, 1) = (0, 1, 1, 0)$, or equivalently, if and only if $t^T X = (0, 1, 1, 0)$. Thus, as previously, $t^T y$ cannot be unbiased under Model II. Under Model III, A = (1, 0, 0, 1) again, and $t^T y$ is unbiased for $\beta_2 + \beta_3 = (0, 1, 1, 0)\beta$ if and only if a scalar g exists such that $g \cdot 0 = 0$ and $t^T X + gA = (0, 1, 1, 0)$. Choosing g = −1 and $t^T = (1, 0, 0, 0, 0)$ satisfies the requirements, so an unbiased estimator of the form $t^T y$ exists under Model III.

Exercise 25 Consider two linear models of the form y = Xβ + e, where y is a 4-vector, β is the 3-vector $(\beta_1, \beta_2, \beta_3)^T$, and X is the 4 × 3 matrix

$$\begin{pmatrix} 1 & 3 & 5 \\ 0 & 4 & 4 \\ -2 & 6 & 2 \\ 4 & -3 & 5 \end{pmatrix}.$$

(Observe that the third column of X is equal to the second column plus twice the first column, but the first and the second columns are linearly independent, so the rank of X is 2.) In the first model, β is unconstrained in $\mathbb{R}^3$. In the second model, β is constrained to the subset of $\mathbb{R}^3$ for which $\beta_1 = \beta_2$.
(a) Under which of the two models is the function $\beta_2 + \beta_3$ estimable? (Your answer may be neither model, the unconstrained model only, the constrained model only, or both models.)
(b) Under which of the two models is the function $\beta_1 + \beta_3$ estimable? (Your answer may be neither model, the unconstrained model only, the constrained model only, or both models.)

Solution
(a) $\beta_2 + \beta_3 = (0, 1, 1)\beta$, which is estimable under both models because $(0, 1, 1) = (1/4)(0, 4, 4) \in R(X) \subseteq R\begin{pmatrix} X \\ A \end{pmatrix}$.
(b) $\beta_1 + \beta_3 = (1, 0, 1)\beta$, which is estimable under the constrained model but not under the unconstrained model. To see this, observe that the second and third rows of X are linearly independent and span R(X), and the only linear combination of these two rows yielding a vector whose first element is 1 and second element is 0 is $(3/4)(0, 4, 4) + (-1/2)(-2, 6, 2) = (1, 0, 2)$. So $(1, 0, 1) \notin R(X)$. However, $(1, 0, 1) = (1/4)(0, 4, 4) + (1, -1, 0)$, where the first vector on the right-hand side is an element of R(X) and the second vector on the right-hand side is an element of $R(A) = R(1, -1, 0)$. Thus $(1, 0, 1) \in R\begin{pmatrix} X \\ A \end{pmatrix}$.

Exercise 26 Prove Corollary 6.2.2.1: If $c_1^T\beta, c_2^T\beta, \ldots, c_k^T\beta$ are estimable under the model {y, Xβ : Aβ = h}, then so is any linear combination of them; that is, the set of estimable functions under a given constrained model is a linear space.

Solution Because $c_1^T\beta, \ldots, c_k^T\beta$ are estimable, $c_i^T \in R\begin{pmatrix} X \\ A \end{pmatrix}$ for all i. Because $R\begin{pmatrix} X \\ A \end{pmatrix}$ is a linear space, the coefficient vector of any linear combination of $c_1^T\beta, \ldots, c_k^T\beta$ is an element of $R\begin{pmatrix} X \\ A \end{pmatrix}$, hence the linear combination is estimable.

Exercise 27 Prove Corollary 6.2.2.2: The elements of Xβ are estimable under the model {y, Xβ : Aβ = h}.

Solution The ith element of Xβ can be expressed as $x_i^T\beta$, where $x_i^T$ is the ith row of X. Clearly, $x_i^T \in R\begin{pmatrix} X \\ A \end{pmatrix}$, so the ith element of Xβ is estimable.
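The row-space arguments in Exercise 25 can also be verified numerically. The sketch below is an illustration under stated assumptions rather than the book's method: it uses the fact, given in the exercise, that rank(X) = 2 with column relation 2·col1 + col2 − col3 = 0, so that R(X) consists of exactly those vectors orthogonal to (2, 1, −1). The names `NULL_VECTOR`, `dot`, and `estimable_unconstrained` are hypothetical.

```python
from fractions import Fraction

X = [(1, 3, 5), (0, 4, 4), (-2, 6, 2), (4, -3, 5)]
A = (1, -1, 0)            # the constraint beta1 = beta2 written as A beta = 0
NULL_VECTOR = (2, 1, -1)  # from the stated relation col3 = col2 + 2 * col1

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def estimable_unconstrained(c):
    """Since rank(X) = 2 and p = 3, c' is in R(X) iff c is orthogonal to NULL_VECTOR."""
    return dot(c, NULL_VECTOR) == 0

# Every row of X must be orthogonal to NULL_VECTOR for the test above to be valid.
rows_orthogonal = all(dot(row, NULL_VECTOR) == 0 for row in X)

# A is not orthogonal to NULL_VECTOR, so A lies outside R(X); stacking A under X
# raises the rank from 2 to 3, and every c'beta is estimable under the constraint.
constraint_adds_rank = dot(A, NULL_VECTOR) != 0

# The explicit combination used in part (b): (1/4) * (second row of X) + A.
combo = tuple(Fraction(1, 4) * x + a for x, a in zip(X[1], A))

if __name__ == "__main__":
    print(rows_orthogonal, constraint_adds_rank)
    print(estimable_unconstrained((0, 1, 1)), estimable_unconstrained((1, 0, 1)))
    print(combo == (1, 0, 1))
```

The orthogonality test is specific to this example, where the null space of X is one-dimensional; in general one would compare the rank of X with the rank of X augmented by $c^T$.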
Exercise 28 Prove Corollary 6.2.2.3: A set of $\mathrm{rank}\begin{pmatrix} X \\ A \end{pmatrix}$ linearly independent estimable functions under the model {y, Xβ : Aβ = h} exist such that any estimable function under that model can be written as a linear combination of functions in this set.

Solution Let $r_1^T, \ldots, r_s^T$ represent any basis for $R\begin{pmatrix} X \\ A \end{pmatrix}$, where necessarily $s = \mathrm{rank}\begin{pmatrix} X \\ A \end{pmatrix}$. Then $r_1^T\beta, \ldots, r_s^T\beta$ satisfy the requirements of the definition of a basis for the set of all estimable functions.

Exercise 29 Prove Corollary 6.2.2.4: $c^T\beta$ is estimable for every vector c under the model {y, Xβ : Aβ = h} if and only if $\mathrm{rank}\begin{pmatrix} X \\ A \end{pmatrix} = p$.

Solution $c^T\beta$ is estimable for all $c^T \in \mathbb{R}^p$ if and only if $R\begin{pmatrix} X \\ A \end{pmatrix} = \mathbb{R}^p$, i.e., if and only if $R\begin{pmatrix} X \\ A \end{pmatrix} = R(I_p)$, i.e., if and only if $\mathrm{rank}\begin{pmatrix} X \\ A \end{pmatrix} = \mathrm{rank}(I_p)$, i.e., if and only if $\mathrm{rank}\begin{pmatrix} X \\ A \end{pmatrix} = p$.

Exercise 30 Prove Corollary 6.2.2.5: $c^T\beta$ is estimable under the model {y, Xβ : Aβ = h} if and only if $c^T \in R\begin{pmatrix} X^T X \\ A \end{pmatrix}$.

Solution Corollary 6.2.2.5 follows immediately from Theorems 6.2.2 and 6.2.3.

Exercise 31 Consider the constrained model {y, Xβ : Aβ = h}.
(a) Suppose that a function $c^T\beta$ exists that is estimable under this model but is nonestimable under the unconstrained model {y, Xβ}. Under these circumstances, how do $R(X) \cap R(A)$, $R\begin{pmatrix} X \\ A \end{pmatrix}$, and R(X) compare to each other? Indicate which subset inclusions, if any, are strict.
(b) Again suppose that a function $c^T\beta$ exists that is estimable under this model but is nonestimable under the model {y, Xβ}. Under these circumstances, if a vector t exists such that $t^T y$ is an unbiased estimator of $c^T\beta$ under the constrained model, what can be said about A and h? Be as specific as possible.
Solution
(a) Because $c^T\beta$ is estimable under the constrained model, $c^T \in R\begin{pmatrix} X \\ A \end{pmatrix}$. And, because $c^T\beta$ is nonestimable under the unconstrained model, $c^T \notin R(X)$. Thus, $\{R(X) \cap R(A)\} \subseteq R(X) \subseteq R\begin{pmatrix} X \\ A \end{pmatrix}$, where the latter inclusion is strict.
(b) The situation described implies, by Theorem 6.2.1, that a vector g exists such that $g^T h = 0$, $t^T X + g^T A = c^T$, and $g^T A \notin R(X)$.
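The unbiasedness condition quoted from Theorem 6.2.1, which underlies both part (b) above and Exercise 24, can be checked by direct arithmetic. The following sketch uses the data of Exercise 24 with its single constraint row; the function names `row_combination` and `is_unbiased` are hypothetical, and the condition coded is the one stated in the solution to Exercise 24 (gh = 0 and $t^T X + gA = c^T$).

```python
X = [(1, 1, 1, 1), (0, 1, 0, 1), (1, 0, 1, 0), (1, 1, 1, 1), (1, 0, 1, 0)]
A = (1, 0, 0, 1)   # constraint row for beta1 + beta4 = h (Models II and III)
c = (0, 1, 1, 0)   # coefficients of beta2 + beta3

def row_combination(t, rows):
    """Return t'X as a tuple."""
    return tuple(sum(ti * row[k] for ti, row in zip(t, rows))
                 for k in range(len(rows[0])))

def is_unbiased(t, g, h):
    """Check g*h = 0 and t'X + g*A = c' for a scalar g and constraint value h."""
    tX = row_combination(t, X)
    return g * h == 0 and tuple(x + g * a for x, a in zip(tX, A)) == c

t = (1, 0, 0, 0, 0)

# Every row of X has equal first and third entries and equal second and fourth
# entries, so the same holds for every t'X; c violates this, so c' is not in R(X).
pattern_holds = all(r[0] == r[2] and r[1] == r[3] for r in X)

if __name__ == "__main__":
    print(is_unbiased(t, -1, 0))   # Model III (h = 0)
    print(is_unbiased(t, -1, 5))   # Model II (h = 5): g*h is not 0
    print(pattern_holds, c[0] == c[2])
```

Here g = −1 gives gA = (−1, 0, 0, −1), which is not in R(X), and gh = 0 only when h = 0, matching the conclusion of Exercise 31(b).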
Reference Wu, M. C. & Bailey, K. R. (1989). Estimation and comparison of changes in the presence of informative right censoring: conditional linear model. Biometrics, 45, 939–955.
7
Least Squares Estimation for the Gauss–Markov Model
This chapter presents exercises on least squares estimation for the Gauss–Markov model, and provides solutions to those exercises.

Exercise 1 The proof of the consistency of the normal equations (Theorem 7.1.1) relies upon an identity (Theorem 3.3.3c) involving a generalized inverse of $X^T X$. Provide an alternate proof of the consistency of the normal equations that does not rely on this identity, by showing that the normal equations are compatible.

Solution Let v represent any vector such that $v^T X^T X = 0^T$. Post-multiplication of both sides of this equation by v results in another equation $v^T X^T X v = 0$, implying (by Corollary 2.10.4.1) that Xv = 0, hence that $v^T X^T y = (Xv)^T y = 0$. This establishes that the normal equations are compatible, hence consistent by Theorem 3.2.1.

Exercise 2 For observations following a certain linear model, the normal equations are

$$\begin{pmatrix} 7 & -2 & -5 \\ -2 & 3 & -1 \\ -5 & -1 & 6 \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix} = \begin{pmatrix} 17 \\ 34 \\ -51 \end{pmatrix}.$$

Observe that the three rows of the coefficient matrix sum to $0_3^T$.
(a) Obtain two distinct solutions, say $\hat{\beta}_1$ and $\hat{\beta}_2$, to the normal equations.
(b) Characterize, as simply as possible, the collection of linear functions $c^T\beta = c_1\beta_1 + c_2\beta_2 + c_3\beta_3$ that are estimable.
(c) List one nonzero estimable function, $c_E^T\beta$, and one nonestimable function, $c_{NE}^T\beta$. (Give numerical entries for $c_E^T$ and $c_{NE}^T$.)
© Springer Nature Switzerland AG 2020 D. L. Zimmerman, Linear Model Theory, https://doi.org/10.1007/978-3-030-52074-8_7
(d) Determine numerically whether the least squares estimator of your estimable function from part (c) is the same for both of your solutions from part (a), i.e., determine whether $c_E^T\hat\beta_1 = c_E^T\hat\beta_2$. Similarly, determine whether $c_{NE}^T\hat\beta_1 = c_{NE}^T\hat\beta_2$. Which theorem does this exemplify?

Solution
(a) The coefficient matrix has dimensions 3 × 3 but rank 2, so by Theorems 3.2.3 and 3.3.6 two solutions to the normal equations are

$$\hat\beta_1 = \begin{pmatrix} 3/17 & 2/17 & 0 \\ 2/17 & 7/17 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 17 \\ 34 \\ -51 \end{pmatrix} = \begin{pmatrix} 7 \\ 16 \\ 0 \end{pmatrix} \quad\text{and}\quad \hat\beta_2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 6/17 & 1/17 \\ 0 & 1/17 & 3/17 \end{pmatrix} \begin{pmatrix} 17 \\ 34 \\ -51 \end{pmatrix} = \begin{pmatrix} 0 \\ 9 \\ -7 \end{pmatrix}.$$
(b) $\mathcal{R}(X) = \{c^T \in \mathbb{R}^3 : c^T 1 = 0\}$.
(c) $c_E^T\beta = \beta_1 - 2\beta_2 + \beta_3$, $c_{NE}^T\beta = \beta_1 + \beta_2 + \beta_3$.
(d) $c_E^T\hat\beta_1 = (1, -2, 1)(7, 16, 0)^T = -25 = (1, -2, 1)(0, 9, -7)^T = c_E^T\hat\beta_2$, but $c_{NE}^T\hat\beta_1 = (1, 1, 1)(7, 16, 0)^T = 23 \neq 2 = (1, 1, 1)(0, 9, -7)^T = c_{NE}^T\hat\beta_2$. This exemplifies Theorem 7.1.3.

Exercise 3 Consider a linear model for which

$$y = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \\ y_7 \\ y_8 \end{pmatrix}, \quad X = \begin{pmatrix} 1 & 1 & 1 & -1 \\ 1 & 1 & 1 & -1 \\ 1 & 1 & -1 & 1 \\ 1 & 1 & -1 & 1 \\ 1 & -1 & 1 & 1 \\ 1 & -1 & 1 & 1 \\ -1 & 1 & 1 & 1 \\ -1 & 1 & 1 & 1 \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{pmatrix}.$$
(a) Obtain the normal equations for this model and solve them. (b) Are all functions cT β estimable? Justify your answer. (c) Obtain the least squares estimator of β1 + β2 + β3 + β4 .
Solution
(a) The normal equations are

$$\begin{pmatrix} 8 & 0 & 0 & 0 \\ 0 & 8 & 0 & 0 \\ 0 & 0 & 8 & 0 \\ 0 & 0 & 0 & 8 \end{pmatrix} \beta = \begin{pmatrix} y_1 + y_2 + y_3 + y_4 + y_5 + y_6 - y_7 - y_8 \\ y_1 + y_2 + y_3 + y_4 - y_5 - y_6 + y_7 + y_8 \\ y_1 + y_2 - y_3 - y_4 + y_5 + y_6 + y_7 + y_8 \\ -y_1 - y_2 + y_3 + y_4 + y_5 + y_6 + y_7 + y_8 \end{pmatrix}$$

and their unique solution is

$$\hat\beta = \begin{pmatrix} (y_1 + y_2 + y_3 + y_4 + y_5 + y_6 - y_7 - y_8)/8 \\ (y_1 + y_2 + y_3 + y_4 - y_5 - y_6 + y_7 + y_8)/8 \\ (y_1 + y_2 - y_3 - y_4 + y_5 + y_6 + y_7 + y_8)/8 \\ (-y_1 - y_2 + y_3 + y_4 + y_5 + y_6 + y_7 + y_8)/8 \end{pmatrix}.$$

(b) Because $X^TX = 8I$ has full rank, all functions $c^T\beta$ are estimable by Corollary 6.1.2.4.
(c) $c^T\hat\beta = 1_4^T\hat\beta = (2y_1 + 2y_2 + 2y_3 + 2y_4 + 2y_5 + 2y_6 + 2y_7 + 2y_8)/8 = 2\bar{y}$.

Exercise 4 Prove Corollary 7.1.4.1: If $C^T\beta$ is estimable under the model $\{y, X\beta\}$ and $L^T$ is a matrix of constants having the same number of columns as $C^T$ has rows, then $L^TC^T\beta$ is estimable and its least squares estimator (associated with the specified model) is $L^T(C^T\hat\beta)$.

Solution Because $C^T\beta$ is estimable under the specified model, $C^T = A^TX$ for some $A$. Then $L^TC^T = L^TA^TX = (AL)^TX$, implying that every row of $L^TC^T$ is an element of $\mathcal{R}(X)$. Hence $L^TC^T\beta$ is estimable, and by Theorem 7.1.4 its least squares estimator is $L^TC^T(X^TX)^-X^Ty = L^T[C^T(X^TX)^-X^Ty] = L^TC^T\hat\beta$.

Exercise 5 Prove that under the Gauss–Markov model $\{y, X\beta, \sigma^2 I\}$, the least squares estimator of an estimable function $c^T\beta$ associated with that model has uniformly (in $\beta$ and $\sigma^2$) smaller mean squared error than any other linear location equivariant estimator of $c^T\beta$. (Refer back to Exercise 6.1 for the definition of a location equivariant estimator.)

Solution Let $t_0 + t^Ty$ represent any linear location equivariant estimator of $c^T\beta$ associated with the specified model. Then by Exercise 6.1, $t^TX = c^T$, implying that $t^Ty$ is unbiased for $c^T\beta$. Now,

$$\begin{aligned} \operatorname{MSE}(t_0 + t^Ty) &= \operatorname{var}(t_0 + t^Ty) + [E(t_0 + t^Ty) - c^T\beta]^2 \\ &= \operatorname{var}(t^Ty) + [t_0 + t^TX\beta - c^T\beta]^2 \\ &= \operatorname{var}(t^Ty) + t_0^2 \\ &\geq \operatorname{var}(c^T\hat\beta), \end{aligned}$$
where the inequality holds by Theorem 7.2.3, with equality holding if and only if $t_0 = 0$ and $t^Ty = c^T\hat\beta$ with probability one.

Exercise 6 Consider the model $\{y, X\beta\}$ with observations partitioned into two groups, so that the model may be written alternatively as $\left\{\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}, \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}\beta\right\}$, where $y_1$ is an $n_1$-vector. Suppose that $c^T \in \mathcal{R}(X_1)$ and $\mathcal{R}(X_1) \cap \mathcal{R}(X_2) = \{0\}$. Prove that the least squares estimator of $c^T\beta$ associated with the model $\{y_1, X_1\beta\}$ for the first $n_1$ observations is identical to the least squares estimator of $c^T\beta$ associated with the model for all the observations; i.e., the additional observations do not affect the least squares estimator.

Solution First observe that $c^T\beta$ is estimable under both models. Let $\hat\beta$ denote a solution to the normal equations for the model for all the observations. Then $X^TX\hat\beta = X^Ty$, i.e.,

$$\begin{pmatrix} X_1 \\ X_2 \end{pmatrix}^T \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \hat\beta = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix}^T \begin{pmatrix} y_1 \\ y_2 \end{pmatrix},$$

i.e., $X_1^TX_1\hat\beta + X_2^TX_2\hat\beta = X_1^Ty_1 + X_2^Ty_2$. Rearranging terms yields $X_1^TX_1\hat\beta - X_1^Ty_1 = -(X_2^TX_2\hat\beta - X_2^Ty_2)$, and then matrix transposition yields

$$(\hat\beta^TX_1^T - y_1^T)X_1 = -(\hat\beta^TX_2^T - y_2^T)X_2.$$

The vector on the left-hand side of this last system of equations belongs to $\mathcal{R}(X_1)$, while the vector on the right-hand side belongs to $\mathcal{R}(X_2)$. Because these two row spaces are essentially disjoint, both vectors must be null, implying in particular that $(\hat\beta^TX_1^T - y_1^T)X_1 = 0^T$, or equivalently that $X_1^TX_1\hat\beta = X_1^Ty_1$. Thus $\hat\beta$ also satisfies the normal equations for the model for only the first $n_1$ observations, implying that $c^T\hat\beta$ is the least squares estimator of $c^T\beta$ under both models.
Exercise 7 For an arbitrary model $\{y, X\beta\}$:
(a) Prove that $\bar{x}_1\beta_1 + \bar{x}_2\beta_2 + \cdots + \bar{x}_p\beta_p$ is estimable.
(b) Prove that if one of the columns of $X$ is a column of ones, then the least squares estimator of the function in part (a) is $\bar{y}$.

Solution
(a) $\bar{x}_1\beta_1 + \bar{x}_2\beta_2 + \cdots + \bar{x}_p\beta_p = c^T\beta$ with $c^T = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_p)$. But this $c^T = (1/n)1_n^TX$, which is an element of $\mathcal{R}(X)$, so $c^T\beta$ is estimable.
(b) Because $1_n \in \mathcal{C}(X)$, $1_n = Xa$ for some $p$-vector $a$. Then, using part (a) and Theorem 7.1.4, the least squares estimator of $c^T\beta$ is $(1/n)1^TX(X^TX)^-X^Ty = (1/n)a^TX^TX(X^TX)^-X^Ty = (1/n)a^TX^Ty = (1/n)1^Ty = \bar{y}$.

Exercise 8 For the centered simple linear regression model

$$y_i = \beta_1 + \beta_2(x_i - \bar{x}) + e_i \quad (i = 1, \ldots, n),$$
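obtain the least squares estimators of $\beta_1$ and $\beta_2$ and their variance–covariance matrix (under Gauss–Markov assumptions).

Solution

$$\begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} = [(1, x - \bar{x}1)^T(1, x - \bar{x}1)]^{-1}(1, x - \bar{x}1)^Ty = \begin{pmatrix} n & 0 \\ 0 & SXX \end{pmatrix}^{-1} \begin{pmatrix} \sum_{i=1}^n y_i \\ SXY \end{pmatrix} = \begin{pmatrix} \bar{y} \\ SXY/SXX \end{pmatrix},$$

and

$$\operatorname{var}\begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} = \sigma^2[(1, x - \bar{x}1)^T(1, x - \bar{x}1)]^{-1} = \sigma^2 \begin{pmatrix} 1/n & 0 \\ 0 & 1/SXX \end{pmatrix}.$$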
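The closed forms for the centered model can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the data values, the seed, and all variable names are made up for illustration and are not part of the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed, illustration only
n = 8
x = rng.normal(size=n)                  # made-up regressor values
y = 1.5 + 0.7 * x + rng.normal(scale=0.1, size=n)  # made-up responses

xc = x - x.mean()                       # centered regressor
X = np.column_stack([np.ones(n), xc])   # model matrix (1, x - xbar*1)

XtX = X.T @ X                           # should be diag(n, SXX)
beta_hat = np.linalg.solve(XtX, X.T @ y)

SXX = np.sum(xc ** 2)
SXY = np.sum(xc * y)
closed_form = np.array([y.mean(), SXY / SXX])   # (ybar, SXY/SXX)
cov_over_sigma2 = np.linalg.inv(XtX)            # var(beta_hat) / sigma^2
```

Centering makes the two columns of the model matrix orthogonal, which is why the coefficient matrix is diagonal and the two estimators are uncorrelated.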
Exercise 9 For the no-intercept simple linear regression model, obtain the least squares estimator of the slope and its variance (under Gauss–Markov assumptions).

Solution $\hat\beta = (x^Tx)^{-1}x^Ty = \sum_{i=1}^n x_iy_i \big/ \sum_{i=1}^n x_i^2$, and $\operatorname{var}(\hat\beta) = \sigma^2(x^Tx)^{-1} = \sigma^2 \big/ \sum_{i=1}^n x_i^2$.
Exercise 10 For the two-way main effects model with cell frequencies specified by the last 2 × 2 layout of Example 7.1-3, obtain the variance–covariance matrix (under Gauss–Markov assumptions) of the least squares estimators of: (a) the cell means; (b) the Factor A and B differences.
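Solution
(a)
$$\operatorname{var}\begin{pmatrix} \widehat{\mu + \alpha_1 + \gamma_1} \\ \widehat{\mu + \alpha_1 + \gamma_2} \\ \widehat{\mu + \alpha_2 + \gamma_1} \\ \widehat{\mu + \alpha_2 + \gamma_2} \end{pmatrix} = \sigma^2 \begin{pmatrix} 0.5 & 0 & 0 & -0.5 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ -0.5 & 1 & 1 & 2.5 \end{pmatrix}.$$

(b)
$$\operatorname{var}\begin{pmatrix} \widehat{\alpha_1 - \alpha_2} \\ \widehat{\gamma_1 - \gamma_2} \end{pmatrix} = \sigma^2 \begin{pmatrix} 1.5 & 0.5 \\ 0.5 & 1.5 \end{pmatrix}.$$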
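These two matrices can be reproduced numerically. The sketch below assumes that the layout in question has cell frequencies n11 = 2, n12 = 1, n21 = 1, n22 = 0; that reading is an assumption (Example 7.1-3 is not reproduced here), chosen because it is consistent with the variances above. The Moore–Penrose inverse stands in for an arbitrary generalized inverse, which is harmless because the functions involved are estimable.

```python
import numpy as np

# ASSUMED cell frequencies (hypothetical reading of the layout):
# n11 = 2, n12 = 1, n21 = 1, n22 = 0.
# Columns of X: mu, alpha1, alpha2, gamma1, gamma2.
X = np.array([
    [1, 1, 0, 1, 0],   # cell (1,1), first observation
    [1, 1, 0, 1, 0],   # cell (1,1), second observation
    [1, 1, 0, 0, 1],   # cell (1,2)
    [1, 0, 1, 1, 0],   # cell (2,1)
], dtype=float)

G = np.linalg.pinv(X.T @ X)   # one generalized inverse of X'X

# Rows are the coefficient vectors of the four cell means.
C_cells = np.array([
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 1],   # empty cell, still estimable in a connected layout
], dtype=float)
cov_cells = C_cells @ G @ C_cells.T      # var / sigma^2 of cell-mean estimators

# Rows are alpha1 - alpha2 and gamma1 - gamma2.
C_diffs = np.array([
    [0, 1, -1, 0, 0],
    [0, 0, 0, 1, -1],
], dtype=float)
cov_diffs = C_diffs @ G @ C_diffs.T      # var / sigma^2 of the differences
```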
Exercise 11 For the two-way main effects model with equal cell frequencies r, obtain the least squares estimators, and their variance–covariance matrix (under Gauss–Markov assumptions), of: (a) the cell means; (b) the Factor A and Factor B differences.
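Solution
(a) The model matrix is $X = (1_{qmr}, I_q \otimes 1_{mr}, 1_q \otimes I_m \otimes 1_r)$ and

$$X^TX = \begin{pmatrix} A & B \\ C & D \end{pmatrix},$$

where

$$A = \begin{pmatrix} qmr & mr1_q^T \\ mr1_q & mrI_q \end{pmatrix}, \quad B = \begin{pmatrix} qr1_m^T \\ rJ_{q \times m} \end{pmatrix}, \quad C = (qr1_m, rJ_{m \times q}), \quad D = qrI_m.$$

By Theorem 3.3.6, one generalized inverse of $A$ is

$$A^- = \begin{pmatrix} 0 & 0_q^T \\ 0_q & (1/mr)I_q \end{pmatrix}.$$

Because each column of $B$ is equal to $(1/m)$ times the first column of $A$, $\mathcal{C}(B) \subseteq \mathcal{C}(A)$, and similarly $\mathcal{R}(C) \subseteq \mathcal{R}(A)$. Let $Q = D - CA^-B = qrI_m - (qr/m)J_m$. One generalized inverse of $Q$ is $Q^- = (1/qr)I_m - (1/qmr)J_m$, and

$$BQ^- = \begin{pmatrix} qr1_m^T[(1/qr)I_m - (1/qmr)J_m] \\ rJ_{q \times m}[(1/qr)I_m - (1/qmr)J_m] \end{pmatrix} = 0.$$

Therefore, by Theorem 3.3.7, one generalized inverse of $X^TX$ is

$$\begin{pmatrix} A^- & 0 \\ 0 & Q^- \end{pmatrix}.$$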
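Finally, let $C^T = (1_{qm}, I_q \otimes 1_m, 1_q \otimes I_m)$ denote the matrix consisting of all distinct rows of $X$. Then the least squares estimator of the vector of cell means is

$$\begin{aligned} \begin{pmatrix} \widehat{\mu + \alpha_1 + \gamma_1} \\ \widehat{\mu + \alpha_1 + \gamma_2} \\ \vdots \\ \widehat{\mu + \alpha_q + \gamma_m} \end{pmatrix} &= C^T(X^TX)^-X^Ty \\ &= (1_{qm}, I_q \otimes 1_m, 1_q \otimes I_m) \begin{pmatrix} 0 & 0_q^T & 0_m^T \\ 0_q & (1/mr)I_q & 0 \\ 0_m & 0 & (1/qr)I_m - (1/qmr)J_m \end{pmatrix} (y_{\cdot\cdot\cdot}, y_{1\cdot\cdot}, \ldots, y_{q\cdot\cdot}, y_{\cdot 1\cdot}, \ldots, y_{\cdot m\cdot})^T \\ &= \left(0_{qm}, (1/mr)I_q \otimes 1_m, (1/qr)1_q \otimes [I_m - (1/m)J_m]\right) (y_{\cdot\cdot\cdot}, y_{1\cdot\cdot}, \ldots, y_{q\cdot\cdot}, y_{\cdot 1\cdot}, \ldots, y_{\cdot m\cdot})^T \\ &= \begin{pmatrix} \bar{y}_{1\cdot\cdot} + \bar{y}_{\cdot 1\cdot} - \bar{y}_{\cdot\cdot\cdot} \\ \bar{y}_{1\cdot\cdot} + \bar{y}_{\cdot 2\cdot} - \bar{y}_{\cdot\cdot\cdot} \\ \vdots \\ \bar{y}_{q\cdot\cdot} + \bar{y}_{\cdot m\cdot} - \bar{y}_{\cdot\cdot\cdot} \end{pmatrix}. \end{aligned}$$

By Theorem 7.2.2,

$$\begin{aligned} \operatorname{var}\begin{pmatrix} \bar{y}_{1\cdot\cdot} + \bar{y}_{\cdot 1\cdot} - \bar{y}_{\cdot\cdot\cdot} \\ \bar{y}_{1\cdot\cdot} + \bar{y}_{\cdot 2\cdot} - \bar{y}_{\cdot\cdot\cdot} \\ \vdots \\ \bar{y}_{q\cdot\cdot} + \bar{y}_{\cdot m\cdot} - \bar{y}_{\cdot\cdot\cdot} \end{pmatrix} &= \sigma^2 C^T(X^TX)^-C \\ &= \sigma^2 \left(0_{qm}, (1/mr)I_q \otimes 1_m, (1/qr)1_q \otimes [I_m - (1/m)J_m]\right) \begin{pmatrix} 1_{qm}^T \\ I_q \otimes 1_m^T \\ 1_q^T \otimes I_m \end{pmatrix} \\ &= \sigma^2\{(1/mr)(I_q \otimes J_m) + (1/qr)J_q \otimes [I_m - (1/m)J_m]\}. \end{aligned}$$

(b) By part (a) and Corollary 7.1.4.1, the least squares estimators of the Factor A differences and Factor B differences are

$$\widehat{\alpha_i - \alpha_{i'}} = \bar{y}_{i\cdot\cdot} - \bar{y}_{i'\cdot\cdot} \quad\text{and}\quad \widehat{\gamma_j - \gamma_{j'}} = \bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot j'\cdot},$$

for $i \neq i' = 1, \ldots, q$, $j \neq j' = 1, \ldots, m$. To compute the variance–covariance matrix of these estimators, we use an approach similar to that used in Example 7.2-2. First we obtain

$$\operatorname{cov}(\bar{y}_{i\cdot\cdot}, \bar{y}_{i'\cdot\cdot}) = \begin{cases} \sigma^2/mr & \text{if } i = i', \\ 0 & \text{if } i \neq i'; \end{cases} \qquad \operatorname{cov}(\bar{y}_{\cdot j\cdot}, \bar{y}_{\cdot j'\cdot}) = \begin{cases} \sigma^2/qr & \text{if } j = j', \\ 0 & \text{if } j \neq j'; \end{cases}$$

$$\operatorname{cov}(\bar{y}_{i\cdot\cdot}, \bar{y}_{\cdot j\cdot}) = \frac{\sigma^2}{qmr} \quad \text{for all } i, j.$$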
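Therefore, entries of the variance–covariance matrix obtained using Theorem 4.2.2a are as follows: for $i \leq s$, $i' > i = 1, \ldots, q$, and $s' > s = 1, \ldots, q$,

$$\begin{aligned} \operatorname{cov}(\bar{y}_{i\cdot\cdot} - \bar{y}_{i'\cdot\cdot}, \bar{y}_{s\cdot\cdot} - \bar{y}_{s'\cdot\cdot}) &= \operatorname{cov}(\bar{y}_{i\cdot\cdot}, \bar{y}_{s\cdot\cdot}) - \operatorname{cov}(\bar{y}_{i'\cdot\cdot}, \bar{y}_{s\cdot\cdot}) - \operatorname{cov}(\bar{y}_{i\cdot\cdot}, \bar{y}_{s'\cdot\cdot}) + \operatorname{cov}(\bar{y}_{i'\cdot\cdot}, \bar{y}_{s'\cdot\cdot}) \\ &= \begin{cases} 2\sigma^2/mr & \text{if } i = s \text{ and } i' = s', \\ \sigma^2/mr & \text{if } i = s \text{ and } i' \neq s', \text{ or } i \neq s \text{ and } i' = s', \\ -\sigma^2/mr & \text{if } i' = s, \\ 0 & \text{otherwise;} \end{cases} \end{aligned}$$

for $j \leq t$, $j' > j = 1, \ldots, m$, and $t' > t = 1, \ldots, m$,

$$\begin{aligned} \operatorname{cov}(\bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot j'\cdot}, \bar{y}_{\cdot t\cdot} - \bar{y}_{\cdot t'\cdot}) &= \operatorname{cov}(\bar{y}_{\cdot j\cdot}, \bar{y}_{\cdot t\cdot}) - \operatorname{cov}(\bar{y}_{\cdot j'\cdot}, \bar{y}_{\cdot t\cdot}) - \operatorname{cov}(\bar{y}_{\cdot j\cdot}, \bar{y}_{\cdot t'\cdot}) + \operatorname{cov}(\bar{y}_{\cdot j'\cdot}, \bar{y}_{\cdot t'\cdot}) \\ &= \begin{cases} 2\sigma^2/qr & \text{if } j = t \text{ and } j' = t', \\ \sigma^2/qr & \text{if } j = t \text{ and } j' \neq t', \text{ or } j \neq t \text{ and } j' = t', \\ -\sigma^2/qr & \text{if } j' = t, \\ 0 & \text{otherwise;} \end{cases} \end{aligned}$$

and for $i' > i = 1, \ldots, q$ and $j' > j = 1, \ldots, m$,

$$\begin{aligned} \operatorname{cov}(\bar{y}_{i\cdot\cdot} - \bar{y}_{i'\cdot\cdot}, \bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot j'\cdot}) &= \operatorname{cov}(\bar{y}_{i\cdot\cdot}, \bar{y}_{\cdot j\cdot}) - \operatorname{cov}(\bar{y}_{i'\cdot\cdot}, \bar{y}_{\cdot j\cdot}) - \operatorname{cov}(\bar{y}_{i\cdot\cdot}, \bar{y}_{\cdot j'\cdot}) + \operatorname{cov}(\bar{y}_{i'\cdot\cdot}, \bar{y}_{\cdot j'\cdot}) \\ &= \frac{\sigma^2}{qmr} - \frac{\sigma^2}{qmr} - \frac{\sigma^2}{qmr} + \frac{\sigma^2}{qmr} = 0. \end{aligned}$$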
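The Kronecker-product expression for the variance–covariance matrix of the estimated cell means in Exercise 11 can be compared with a direct computation. This is a minimal sketch under assumed small values q = 3, m = 2, r = 2 (chosen only for illustration), using NumPy's Moore–Penrose inverse as the generalized inverse.

```python
import numpy as np

q, m, r = 3, 2, 2   # illustrative sizes only


def ones(k):
    """Column vector of k ones."""
    return np.ones((k, 1))


# X = (1_{qmr}, I_q (x) 1_{mr}, 1_q (x) I_m (x) 1_r)
X = np.hstack([
    ones(q * m * r),
    np.kron(np.eye(q), ones(m * r)),
    np.kron(ones(q), np.kron(np.eye(m), ones(r))),
])

# C^T = (1_{qm}, I_q (x) 1_m, 1_q (x) I_m): the distinct rows of X
C_T = np.hstack([
    ones(q * m),
    np.kron(np.eye(q), ones(m)),
    np.kron(ones(q), np.eye(m)),
])

G = np.linalg.pinv(X.T @ X)
numeric = C_T @ G @ C_T.T                      # var / sigma^2, computed directly

Jm, Jq = np.ones((m, m)), np.ones((q, q))
formula = (np.kron(np.eye(q), Jm) / (m * r)
           + np.kron(Jq, np.eye(m) - Jm / m) / (q * r))

# var / sigma^2 of the estimator of alpha_1 - alpha_2; should be 2/(mr)
d = np.zeros(1 + q + m)
d[1], d[2] = 1.0, -1.0
var_alpha_diff = d @ G @ d
```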
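Exercise 12 Consider the connected 6 × 4 layout displayed in Example 6.1-4. Suppose that the model for the “data” in the occupied cells is the Gauss–Markov two-way main effects model

$$y_{ij} = \mu + \alpha_i + \gamma_j + e_{ij},$$

where $(i, j) \in \{(1, 1), (1, 2), (2, 1), (2, 3), (3, 1), (3, 4), (4, 2), (4, 3), (5, 2), (5, 4), (6, 3), (6, 4)\}$ (the occupied cells). Obtain specialized expressions for the least squares estimators of $\gamma_j - \gamma_{j'}$ ($j' > j = 1, 2, 3, 4$), and show that their variances are equal. The homoscedasticity of the variances of these differences is a nice property of balanced incomplete block designs such as this one.

Solution The model matrix is

$$X = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \end{pmatrix}$$

and the coefficient matrix of the normal equations is the symmetric matrix

$$X^TX = \begin{pmatrix} 12 & 2 & 2 & 2 & 2 & 2 & 2 & 3 & 3 & 3 & 3 \\ 2 & 2 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 \\ 2 & 0 & 2 & 0 & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\ 2 & 0 & 0 & 2 & 0 & 0 & 0 & 1 & 0 & 0 & 1 \\ 2 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 1 & 1 & 0 \\ 2 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 1 & 0 & 1 \\ 2 & 0 & 0 & 0 & 0 & 0 & 2 & 0 & 0 & 1 & 1 \\ 3 & 1 & 1 & 1 & 0 & 0 & 0 & 3 & 0 & 0 & 0 \\ 3 & 1 & 0 & 0 & 1 & 1 & 0 & 0 & 3 & 0 & 0 \\ 3 & 0 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 3 & 0 \\ 3 & 0 & 0 & 1 & 0 & 1 & 1 & 0 & 0 & 0 & 3 \end{pmatrix}.$$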
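Clearly $X^TX$ satisfies the conditions of Theorem 3.3.7a with $A$ in that theorem taken to be the upper left 7 × 7 block of $X^TX$. Furthermore, by Theorem 3.3.6 one generalized inverse of the matrix $A$ so defined is $\begin{pmatrix} 0 & 0_6^T \\ 0_6 & \frac{1}{2}I_6 \end{pmatrix}$. Thus, one generalized inverse of $X^TX$ is given by a matrix of the form specified in Theorem 3.3.7a, with

$$\begin{aligned} Q &= 3I_4 - \begin{pmatrix} 3 & 1 & 1 & 1 & 0 & 0 & 0 \\ 3 & 1 & 0 & 0 & 1 & 1 & 0 \\ 3 & 0 & 1 & 0 & 1 & 0 & 1 \\ 3 & 0 & 0 & 1 & 0 & 1 & 1 \end{pmatrix} \begin{pmatrix} 0 & 0_6^T \\ 0_6 & \frac{1}{2}I_6 \end{pmatrix} \begin{pmatrix} 3 & 3 & 3 & 3 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix} \\ &= 3I_4 - \frac{1}{2}\begin{pmatrix} 3 & 1 & 1 & 1 \\ 1 & 3 & 1 & 1 \\ 1 & 1 & 3 & 1 \\ 1 & 1 & 1 & 3 \end{pmatrix} = 2\left(I_4 - \frac{1}{4}J_4\right). \end{aligned}$$

Either by first principles or by the results of Exercise 3.1a, c, we find that one generalized inverse of $Q$ is

$$Q^- = \frac{1}{2}\left(I_4 - \frac{1}{4}J_4\right) = \begin{pmatrix} \frac{3}{8} & -\frac{1}{8} & -\frac{1}{8} & -\frac{1}{8} \\ -\frac{1}{8} & \frac{3}{8} & -\frac{1}{8} & -\frac{1}{8} \\ -\frac{1}{8} & -\frac{1}{8} & \frac{3}{8} & -\frac{1}{8} \\ -\frac{1}{8} & -\frac{1}{8} & -\frac{1}{8} & \frac{3}{8} \end{pmatrix}.$$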
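Using further notation from Theorem 3.3.7a,

$$Q^-CA^- = \frac{1}{2}\left(I_4 - \frac{1}{4}J_4\right) \begin{pmatrix} 3 & 1 & 1 & 1 & 0 & 0 & 0 \\ 3 & 1 & 0 & 0 & 1 & 1 & 0 \\ 3 & 0 & 1 & 0 & 1 & 0 & 1 \\ 3 & 0 & 0 & 1 & 0 & 1 & 1 \end{pmatrix} \begin{pmatrix} 0 & 0_6^T \\ 0_6 & \frac{1}{2}I_6 \end{pmatrix} = \begin{pmatrix} 0 & \frac{1}{8} & \frac{1}{8} & \frac{1}{8} & -\frac{1}{8} & -\frac{1}{8} & -\frac{1}{8} \\ 0 & \frac{1}{8} & -\frac{1}{8} & -\frac{1}{8} & \frac{1}{8} & \frac{1}{8} & -\frac{1}{8} \\ 0 & -\frac{1}{8} & \frac{1}{8} & -\frac{1}{8} & \frac{1}{8} & -\frac{1}{8} & \frac{1}{8} \\ 0 & -\frac{1}{8} & -\frac{1}{8} & \frac{1}{8} & -\frac{1}{8} & \frac{1}{8} & \frac{1}{8} \end{pmatrix}.$$

Thus, the last four elements of $(X^TX)^-X^Ty$ (which are the only elements needed to obtain least squares estimators of the $\gamma$-differences) are

$$\begin{pmatrix} -\frac{1}{8} & -\frac{1}{8} & -\frac{1}{8} & \frac{1}{8} & \frac{1}{8} & \frac{1}{8} & \frac{3}{8} & -\frac{1}{8} & -\frac{1}{8} & -\frac{1}{8} \\ -\frac{1}{8} & \frac{1}{8} & \frac{1}{8} & -\frac{1}{8} & -\frac{1}{8} & \frac{1}{8} & -\frac{1}{8} & \frac{3}{8} & -\frac{1}{8} & -\frac{1}{8} \\ \frac{1}{8} & -\frac{1}{8} & \frac{1}{8} & -\frac{1}{8} & \frac{1}{8} & -\frac{1}{8} & -\frac{1}{8} & -\frac{1}{8} & \frac{3}{8} & -\frac{1}{8} \\ \frac{1}{8} & \frac{1}{8} & -\frac{1}{8} & \frac{1}{8} & -\frac{1}{8} & -\frac{1}{8} & -\frac{1}{8} & -\frac{1}{8} & -\frac{1}{8} & \frac{3}{8} \end{pmatrix} (y_{1\cdot}, y_{2\cdot}, y_{3\cdot}, y_{4\cdot}, y_{5\cdot}, y_{6\cdot}, y_{\cdot 1}, y_{\cdot 2}, y_{\cdot 3}, y_{\cdot 4})^T.$$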
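The equal-variance property claimed in Exercise 12 can be checked numerically. The sketch below rebuilds the model matrix from the list of occupied cells and uses NumPy's Moore–Penrose inverse as the generalized inverse; because the γ-differences are estimable, the weights and variances it produces do not depend on that choice. The variable names are illustrative only.

```python
import itertools
import numpy as np

# Occupied cells (i, j) of the 6 x 4 layout, in the order listed in the exercise.
cells = [(1, 1), (1, 2), (2, 1), (2, 3), (3, 1), (3, 4),
         (4, 2), (4, 3), (5, 2), (5, 4), (6, 3), (6, 4)]

# Columns: mu, alpha_1..alpha_6, gamma_1..gamma_4.
X = np.zeros((len(cells), 11))
for row, (i, j) in enumerate(cells):
    X[row, 0] = 1.0
    X[row, i] = 1.0        # alpha_i sits in column i (1..6)
    X[row, 6 + j] = 1.0    # gamma_j sits in column 6 + j (7..10)

G = np.linalg.pinv(X.T @ X)

# var / sigma^2 of the estimator of gamma_j - gamma_j' for every pair j < j'
variances = {}
for j, jp in itertools.combinations(range(1, 5), 2):
    c = np.zeros(11)
    c[6 + j], c[6 + jp] = 1.0, -1.0
    variances[(j, jp)] = c @ G @ c

# Weights on the individual observations for the estimator of gamma_1 - gamma_2
c12 = np.zeros(11)
c12[7], c12[8] = 1.0, -1.0
weights_12 = c12 @ G @ X.T
```

Each entry of `variances` should equal 1 (that is, each variance is σ²), and `weights_12` should show weights of ±1/2 on the two observations in the block containing both levels 1 and 2, ±1/4 on eight others, and 0 on the block containing neither.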
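Thus, the least squares estimators of the $\gamma$-differences are

$$\begin{aligned} \hat\gamma_1 - \hat\gamma_2 &= \tfrac{1}{4}(y_{4\cdot} + y_{5\cdot}) - \tfrac{1}{4}(y_{2\cdot} + y_{3\cdot}) + \tfrac{1}{2}(y_{\cdot 1} - y_{\cdot 2}), \\ \hat\gamma_1 - \hat\gamma_3 &= \tfrac{1}{4}(y_{4\cdot} + y_{6\cdot}) - \tfrac{1}{4}(y_{1\cdot} + y_{3\cdot}) + \tfrac{1}{2}(y_{\cdot 1} - y_{\cdot 3}), \\ \hat\gamma_1 - \hat\gamma_4 &= \tfrac{1}{4}(y_{5\cdot} + y_{6\cdot}) - \tfrac{1}{4}(y_{1\cdot} + y_{2\cdot}) + \tfrac{1}{2}(y_{\cdot 1} - y_{\cdot 4}), \\ \hat\gamma_2 - \hat\gamma_3 &= \tfrac{1}{4}(y_{2\cdot} + y_{6\cdot}) - \tfrac{1}{4}(y_{1\cdot} + y_{5\cdot}) + \tfrac{1}{2}(y_{\cdot 2} - y_{\cdot 3}), \\ \hat\gamma_2 - \hat\gamma_4 &= \tfrac{1}{4}(y_{3\cdot} + y_{6\cdot}) - \tfrac{1}{4}(y_{1\cdot} + y_{4\cdot}) + \tfrac{1}{2}(y_{\cdot 2} - y_{\cdot 4}), \\ \hat\gamma_3 - \hat\gamma_4 &= \tfrac{1}{4}(y_{3\cdot} + y_{5\cdot}) - \tfrac{1}{4}(y_{2\cdot} + y_{4\cdot}) + \tfrac{1}{2}(y_{\cdot 3} - y_{\cdot 4}). \end{aligned}$$

Written in terms of the individual observations, the first of these estimators is

$$\hat\gamma_1 - \hat\gamma_2 = \tfrac{1}{2}(y_{11} - y_{12}) + \tfrac{1}{4}(y_{21} - y_{23} + y_{31} - y_{34} - y_{42} + y_{43} - y_{52} + y_{54}),$$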
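which has variance $(\tfrac{1}{2})^2(2\sigma^2) + (\tfrac{1}{4})^2(8\sigma^2) = \sigma^2$. The variances of the other five estimators may likewise be shown to equal $\sigma^2$.

Exercise 13 For the two-factor nested model, obtain the least squares estimators, and their variance–covariance matrix (under Gauss–Markov assumptions), of:
(a) the cell means;
(b) the Factor B differences (within levels of Factor A).

Solution
(a) For the full-rank reparameterized model

$$y = \breve{X} \begin{pmatrix} \mu + \alpha_1 + \gamma_{11} \\ \mu + \alpha_1 + \gamma_{12} \\ \vdots \\ \mu + \alpha_q + \gamma_{qm_q} \end{pmatrix} + e,$$

where $\breve{X} = \bigoplus_{i=1}^q \bigoplus_{j=1}^{m_i} 1_{n_{ij}}$, least squares estimators of the cell means (the elements of $\beta$ in this parameterization) may be obtained as follows:

$$\begin{aligned} \begin{pmatrix} \widehat{\mu + \alpha_1 + \gamma_{11}} \\ \widehat{\mu + \alpha_1 + \gamma_{12}} \\ \vdots \\ \widehat{\mu + \alpha_q + \gamma_{qm_q}} \end{pmatrix} &= \left[\left(\bigoplus_{i=1}^q \bigoplus_{j=1}^{m_i} 1_{n_{ij}}^T\right)\left(\bigoplus_{i=1}^q \bigoplus_{j=1}^{m_i} 1_{n_{ij}}\right)\right]^{-1} \left(\bigoplus_{i=1}^q \bigoplus_{j=1}^{m_i} 1_{n_{ij}}^T\right) y \\ &= \left[\bigoplus_{i=1}^q \bigoplus_{j=1}^{m_i} (1/n_{ij})\right] \begin{pmatrix} y_{11\cdot} \\ y_{12\cdot} \\ \vdots \\ y_{qm_q\cdot} \end{pmatrix} = \begin{pmatrix} \bar{y}_{11\cdot} \\ \bar{y}_{12\cdot} \\ \vdots \\ \bar{y}_{qm_q\cdot} \end{pmatrix}. \end{aligned}$$

From the elements of

$$\operatorname{var}\begin{pmatrix} \widehat{\mu + \alpha_1 + \gamma_{11}} \\ \widehat{\mu + \alpha_1 + \gamma_{12}} \\ \vdots \\ \widehat{\mu + \alpha_q + \gamma_{qm_q}} \end{pmatrix} = \sigma^2(\breve{X}^T\breve{X})^{-1} = \sigma^2\left[\bigoplus_{i=1}^q \bigoplus_{j=1}^{m_i} (1/n_{ij})\right],$$

we obtain

$$\operatorname{cov}(\bar{y}_{ij\cdot}, \bar{y}_{i'j'\cdot}) = \begin{cases} \sigma^2/n_{ij} & \text{if } i = i' \text{ and } j = j', \\ 0 & \text{otherwise.} \end{cases}$$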
(b) By part (a) and Corollary 7.1.4.1, the least squares estimators of $\{\gamma_{ij} - \gamma_{ij'} : i = 1, \ldots, q;\ j' > j = 1, \ldots, m_i\}$ are $\widehat{\gamma_{ij} - \gamma_{ij'}} = \bar{y}_{ij\cdot} - \bar{y}_{ij'\cdot}$. Furthermore, using Theorem 4.2.2a and the variances and covariances among the estimated cell means determined in part (a), entries of the variance–covariance matrix of the least squares estimators are as follows, for $i \leq s$, $j' > j = 1, \ldots, m_i$ and $t' > t = 1, \ldots, m_s$:

$$\begin{aligned} \operatorname{cov}(\bar{y}_{ij\cdot} - \bar{y}_{ij'\cdot}, \bar{y}_{st\cdot} - \bar{y}_{st'\cdot}) &= \operatorname{cov}(\bar{y}_{ij\cdot}, \bar{y}_{st\cdot}) - \operatorname{cov}(\bar{y}_{ij'\cdot}, \bar{y}_{st\cdot}) - \operatorname{cov}(\bar{y}_{ij\cdot}, \bar{y}_{st'\cdot}) + \operatorname{cov}(\bar{y}_{ij'\cdot}, \bar{y}_{st'\cdot}) \\ &= \begin{cases} \sigma^2\left(\frac{1}{n_{ij}} + \frac{1}{n_{ij'}}\right) & \text{if } i = s,\ j = t,\ j' = t', \\ \sigma^2/n_{ij} & \text{if } i = s,\ j = t,\ j' \neq t', \\ \sigma^2/n_{ij'} & \text{if } i = s,\ j \neq t,\ j' = t', \\ -\sigma^2/n_{ij} & \text{if } i = s \text{ and } j = t', \\ -\sigma^2/n_{ij'} & \text{if } i = s \text{ and } j' = t, \\ 0 & \text{otherwise.} \end{cases} \end{aligned}$$

Exercise 14 Consider the two-way partially crossed model introduced in Example 5.1.4-1, with one observation per cell, i.e.,

$$y_{ij} = \mu + \alpha_i - \alpha_j + e_{ij} \quad (i \neq j = 1, \ldots, q),$$

where the $e_{ij}$’s satisfy Gauss–Markov assumptions.
(a) Obtain expressions for the least squares estimators of those functions in the set $\{\mu, \alpha_i - \alpha_j : i \neq j\}$ that are estimable.
(b) Obtain expressions for the variances and covariances (under Gauss–Markov assumptions) of the least squares estimators obtained in part (a).

Solution
(a) Recall from Example 5.1.4-1 that $X = (1_{q(q-1)}, (v_{ij}^T)_{i \neq j = 1, \ldots, q})$, where $v_{ij} = u_i^{(q)} - u_j^{(q)}$. Now observe that $1_{q(q-1)}^T 1_{q(q-1)} = q(q-1)$,

$$1_{q(q-1)}^T \left[(u_i^{(q)} - u_j^{(q)})^T\right]_{i \neq j = 1, \ldots, q} = \left((1)(q-1) + (-1)(q-1), \ldots, (1)(q-1) + (-1)(q-1)\right)_{1 \times q} = 0_q^T,$$
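and

$$\begin{aligned} [(v_{ij}^T)_{i \neq j = 1, \ldots, q}]^T[(v_{ij}^T)_{i \neq j = 1, \ldots, q}] &= \sum_{i=1}^q \sum_{j \neq i} (u_i^{(q)} - u_j^{(q)})(u_i^{(q)} - u_j^{(q)})^T \\ &= (q-1)\sum_{i=1}^q u_i^{(q)}u_i^{(q)T} + (q-1)\sum_{j=1}^q u_j^{(q)}u_j^{(q)T} - \sum_{i=1}^q \sum_{j \neq i} (u_i^{(q)}u_j^{(q)T} + u_j^{(q)}u_i^{(q)T}) \\ &= 2qI_q - 2J_q = 2q[I_q - (1/q)J_q]. \end{aligned}$$

One generalized inverse of this last matrix is $(1/2q)I_q$, so by Theorem 3.3.7 one generalized inverse of $X^TX$ is

$$(X^TX)^- = \begin{pmatrix} 1/[q(q-1)] & 0_q^T \\ 0_q & (1/2q)I_q \end{pmatrix}.$$

Now

$$X^Ty = \begin{pmatrix} \sum_{i \neq j} y_{ij} \\ y_{1\cdot} - y_{\cdot 1} \\ y_{2\cdot} - y_{\cdot 2} \\ \vdots \\ y_{q\cdot} - y_{\cdot q} \end{pmatrix},$$

where $y_{i\cdot} = \sum_{j \neq i} y_{ij}$ and $y_{\cdot j} = \sum_{i \neq j} y_{ij}$. Thus, one solution to the normal equations is

$$\hat\beta = \begin{pmatrix} 1/[q(q-1)] & 0_q^T \\ 0_q & (1/2q)I_q \end{pmatrix} \begin{pmatrix} \sum_{i \neq j} y_{ij} \\ y_{1\cdot} - y_{\cdot 1} \\ y_{2\cdot} - y_{\cdot 2} \\ \vdots \\ y_{q\cdot} - y_{\cdot q} \end{pmatrix} = \begin{pmatrix} \bar{y}_{\cdot\cdot} \\ (y_{1\cdot} - y_{\cdot 1})/2q \\ (y_{2\cdot} - y_{\cdot 2})/2q \\ \vdots \\ (y_{q\cdot} - y_{\cdot q})/2q \end{pmatrix},$$

where $\bar{y}_{\cdot\cdot} = [1/q(q-1)]\sum_{i=1}^q \sum_{j \neq i} y_{ij}$. By the result of Exercise 6.15a, $\mu$ and $\alpha_i - \alpha_j$ are estimable ($j > i = 1, \ldots, q$), and by Theorem 7.1.4 their least squares estimators are

$$u_1^{(q+1)T}\hat\beta = \bar{y}_{\cdot\cdot}$$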
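and

$$(u_{i+1}^{(q+1)} - u_{j+1}^{(q+1)})^T\hat\beta = (y_{i\cdot} - y_{\cdot i} - y_{j\cdot} + y_{\cdot j})/2q.$$

(b) $\operatorname{var}(\bar{y}_{\cdot\cdot}) = \sigma^2/[q(q-1)]$,

$$\operatorname{var}[(y_{i\cdot} - y_{\cdot i} - y_{j\cdot} + y_{\cdot j})/2q] = \sigma^2 (u_{i+1}^{(q+1)} - u_{j+1}^{(q+1)})^T \begin{pmatrix} 1/[q(q-1)] & 0_q^T \\ 0_q & (1/2q)I_q \end{pmatrix} (u_{i+1}^{(q+1)} - u_{j+1}^{(q+1)}) = \sigma^2/q,$$

$$\operatorname{cov}[\bar{y}_{\cdot\cdot}, (y_{i\cdot} - y_{\cdot i} - y_{j\cdot} + y_{\cdot j})/2q] = \sigma^2 u_1^{(q+1)T} \begin{pmatrix} 1/[q(q-1)] & 0_q^T \\ 0_q & (1/2q)I_q \end{pmatrix} (u_{i+1}^{(q+1)} - u_{j+1}^{(q+1)}) = 0,$$

and for $i \leq s$, $j > i = 1, \ldots, q$, and $t > s = 1, \ldots, q$,

$$\begin{aligned} \operatorname{cov}[(y_{i\cdot} - y_{\cdot i} - y_{j\cdot} + y_{\cdot j})/2q,\ (y_{s\cdot} - y_{\cdot s} - y_{t\cdot} + y_{\cdot t})/2q] &= \sigma^2 (u_{i+1}^{(q+1)} - u_{j+1}^{(q+1)})^T \begin{pmatrix} 1/[q(q-1)] & 0_q^T \\ 0_q & (1/2q)I_q \end{pmatrix} (u_{s+1}^{(q+1)} - u_{t+1}^{(q+1)}) \\ &= \begin{cases} \sigma^2/q & \text{if } i = s \text{ and } j = t, \\ \sigma^2/2q & \text{if } i = s \text{ and } j \neq t, \text{ or } i \neq s \text{ and } j = t, \\ -\sigma^2/2q & \text{if } j = s, \\ 0 & \text{otherwise.} \end{cases} \end{aligned}$$

Exercise 15 For the Latin square design with $q$ treatments described in Exercise 6.21:
(a) Obtain the least squares estimators of the treatment differences $\tau_k - \tau_{k'}$, where $k \neq k' = 1, \ldots, q$.
(b) Obtain expressions for the variances and covariances (under Gauss–Markov assumptions) of those least squares estimators.

Solution
(a) As in the solution to Exercise 6.21a, for each $i = 1, \ldots, q$ let $\{j_1(k), j_2(k), \ldots, j_q(k)\}$ represent a permutation of $\{1, \ldots, q\}$ such that the $k$th treatment in row $i$ is assigned to column $j_i(k)$. Then the model matrix may be written as

$$X = (1_{q^2}, I_q \otimes 1_q, 1_q \otimes I_q, \tilde{I}),$$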
where $\tilde{I} = \begin{pmatrix} \tilde{I}_1 \\ \tilde{I}_2 \\ \vdots \\ \tilde{I}_q \end{pmatrix}$ consists of $q$ blocks where the columns of each block are a permutation of the columns of $I_q$, so that the $k$th column of $\tilde{I}_i$ is the unit $q$-vector $u_{j_i(k)}$. It is easy to verify that whatever the specific assignment of treatments to rows and columns,

$$X^TX = \begin{pmatrix} q^2 & q1_q^T & q1_q^T & q1_q^T \\ q1_q & qI_q & J_q & J_q \\ q1_q & J_q & qI_q & J_q \\ q1_q & J_q & J_q & qI_q \end{pmatrix} \equiv \begin{pmatrix} A & B \\ C & D \end{pmatrix},$$

say, where $A = q^2$, $B = q1_{3q}^T = C^T$, and $D = q\{I_3 \otimes [I_q - (1/q)J_q] + J_3 \otimes (1/q)J_q\}$. Observe that $D1_{3q} = (3q)1_{3q}$, so $\mathcal{C}(C) \subseteq \mathcal{C}(D)$ and $\mathcal{R}(B) \subseteq \mathcal{R}(D)$. Also, it is easily verified that one generalized inverse of $D$ is $D^- = (1/q)\{I_3 \otimes [I_q - (1/q)J_q] + (1/9)J_3 \otimes (1/q)J_q\}$. Then by Theorem 3.3.7, one generalized inverse of $X^TX$ is

$$(X^TX)^- = \begin{pmatrix} 0 & 0^T \\ 0 & D^- \end{pmatrix},$$

where we used the fact that $P \equiv A - BD^-C = 0$ so 0 is a generalized inverse of $P$. Now let $S = \{(i, j, k) : \text{treatment } k \text{ is assigned to cell } (i, j)\}$ and let $y = (y_{ijk} : (i, j, k) \in S)$ be ordered lexicographically. Then a solution to the normal equations is

$$\hat\beta = (X^TX)^-X^Ty = \begin{pmatrix} 0 & 0^T \\ 0 & (1/q)\{I_3 \otimes [I_q - (1/q)J_q] + (1/9)J_3 \otimes (1/q)J_q\} \end{pmatrix} (y_{\cdot\cdot\cdot}, y_{1\cdot\cdot}, \ldots, y_{q\cdot\cdot}, y_{\cdot 1\cdot}, \ldots, y_{\cdot q\cdot}, y_{\cdot\cdot 1}, \ldots, y_{\cdot\cdot q})^T$$
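$$= \left(0,\ \bar{y}_{1\cdot\cdot} - \tfrac{2}{3}\bar{y}_{\cdot\cdot\cdot},\ \ldots,\ \bar{y}_{q\cdot\cdot} - \tfrac{2}{3}\bar{y}_{\cdot\cdot\cdot},\ \bar{y}_{\cdot 1\cdot} - \tfrac{2}{3}\bar{y}_{\cdot\cdot\cdot},\ \ldots,\ \bar{y}_{\cdot q\cdot} - \tfrac{2}{3}\bar{y}_{\cdot\cdot\cdot},\ \bar{y}_{\cdot\cdot 1} - \tfrac{2}{3}\bar{y}_{\cdot\cdot\cdot},\ \ldots,\ \bar{y}_{\cdot\cdot q} - \tfrac{2}{3}\bar{y}_{\cdot\cdot\cdot}\right)^T.$$

It follows that the least squares estimator of $\tau_k - \tau_{k'}$ is $[\bar{y}_{\cdot\cdot k} - (2/3)\bar{y}_{\cdot\cdot\cdot}] - [\bar{y}_{\cdot\cdot k'} - (2/3)\bar{y}_{\cdot\cdot\cdot}] = \bar{y}_{\cdot\cdot k} - \bar{y}_{\cdot\cdot k'}$ ($k \neq k' = 1, \ldots, q$).

(b) For $k \leq l$, $k < k'$, and $l < l'$,

$$\operatorname{cov}(\bar{y}_{\cdot\cdot k} - \bar{y}_{\cdot\cdot k'}, \bar{y}_{\cdot\cdot l} - \bar{y}_{\cdot\cdot l'}) = \begin{cases} 2\sigma^2/q & \text{if } k = l \text{ and } k' = l', \\ \sigma^2/q & \text{if } k = l \text{ and } k' \neq l', \text{ or } k \neq l \text{ and } k' = l', \\ -\sigma^2/q & \text{if } k' = l, \\ 0 & \text{otherwise.} \end{cases}$$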
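The Latin square results can be illustrated on a small case. The sketch below assumes a cyclic 4 × 4 Latin square (treatment (i + j) mod q in row i, column j, with 0-based indices); that particular square is a choice made here for illustration, since the solution holds for any assignment. It checks that the least squares weights for a treatment difference are those of a difference of treatment means and that the variance is 2σ²/q.

```python
import numpy as np

q = 4   # illustrative size

# ASSUMED design: cyclic Latin square, treatment (i + j) mod q in cell (i, j).
rows, cols, trts = [], [], []
for i in range(q):
    for j in range(q):
        rows.append(i)
        cols.append(j)
        trts.append((i + j) % q)


def dummies(levels):
    """Indicator (dummy) columns for a list of 0-based factor levels."""
    return np.eye(q)[levels]


# Columns: mu, q row effects, q column effects, q treatment effects.
X = np.column_stack([np.ones(q * q), dummies(rows), dummies(cols), dummies(trts)])
G = np.linalg.pinv(X.T @ X)

# Coefficient vector for tau_1 - tau_2 (first two treatment columns).
c = np.zeros(1 + 3 * q)
c[1 + 2 * q], c[2 + 2 * q] = 1.0, -1.0

weights = c @ G @ X.T                 # weights on the individual observations
variance_over_sigma2 = c @ G @ c      # should equal 2/q

# Weights implied by the difference of the two treatment means.
trt_arr = np.array(trts)
expected_weights = (trt_arr == 0) / q - (trt_arr == 1) / q
```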
Exercise 16 Consider a simple linear regression model (with Gauss–Markov assumptions) for (x, y)-observations {(i, yi ) : i = 1, 2, 3, 4, 5}. (a) Give a nonmatrix expression for the least squares estimator, βˆ2 , of the slope, in terms of the yi ’s only. (b) Give the variance of βˆ2 as a multiple of σ 2 . (c) Show that each of the linear estimators γˆ = y2 − y1
and
ηˆ = (y5 − y2 )/3
is unbiased for the slope, and determine their variances. Are these variances larger, smaller, or equal to the variance of βˆ2 ? (d) Determine the bias and variance of cγˆ + (1 − c)ηˆ (where 0 ≤ c ≤ 1) as an estimator of the slope. (e) Find the choice of c that minimizes the variance in part (d), and determine this minimized variance. Is this variance larger, smaller, or equal to the variance of βˆ2 ?
Solution
(a) $\hat\beta_2 = \dfrac{\sum_{i=1}^5 (x_i - \bar{x})y_i}{\sum_{i=1}^5 (x_i - \bar{x})^2} = \dfrac{y_4 + 2y_5 - y_2 - 2y_1}{10}$.
(b) $\operatorname{var}(\hat\beta_2) = \sigma^2/10$.
(c) $E(y_2 - y_1) = \beta_1 + 2\beta_2 - (\beta_1 + \beta_2) = \beta_2$, $E[(y_5 - y_2)/3] = [\beta_1 + 5\beta_2 - (\beta_1 + 2\beta_2)]/3 = \beta_2$, $\operatorname{var}(y_2 - y_1) = \operatorname{var}(y_2) + \operatorname{var}(y_1) = 2\sigma^2$, $\operatorname{var}[(y_5 - y_2)/3] = [\operatorname{var}(y_5) + \operatorname{var}(y_2)]/9 = 2\sigma^2/9$. These variances are larger than $\operatorname{var}(\hat\beta_2)$.
(d) $E[c\hat\gamma + (1 - c)\hat\eta] = cE(\hat\gamma) + (1 - c)E(\hat\eta) = c\beta_2 + (1 - c)\beta_2 = \beta_2$, so the bias is 0. Because $c\hat\gamma + (1 - c)\hat\eta = c(y_2 - y_1) + [(1 - c)/3](y_5 - y_2) = [(1 - c)/3]y_5 + [(4c - 1)/3]y_2 - cy_1$, $\operatorname{var}(c\hat\gamma + (1 - c)\hat\eta) = \sigma^2\{[(1 - c)^2/9] + [(4c - 1)^2/9] + [9c^2/9]\} = (\sigma^2/9)(1 - 2c + c^2 + 16c^2 - 8c + 1 + 9c^2) = (\sigma^2/9)(26c^2 - 10c + 2)$.
(e) Define $f(c) = 26c^2 - 10c + 2$. Taking the first derivative of $f$ and setting it equal to 0 yields an equation whose solution is 5/26. The second derivative of $f$ is 52, which is positive, so the stationary point of $f$ at $c = 5/26$ is a point of minimum. The minimized variance is $\operatorname{var}[(5/26)\hat\gamma + (21/26)\hat\eta] = (\sigma^2/9)[26(5/26)^2 - 10(5/26) + 2] = 3\sigma^2/26$. This variance is larger than $\operatorname{var}(\hat\beta_2)$.

Exercise 17 Consider the cell-means parameterization of the two-way model with interaction, i.e.,

$$y_{ijk} = \mu_{ij} + e_{ijk} \quad (i = 1, \ldots, q;\ j = 1, \ldots, m;\ k = 1, \ldots, n_{ij}).$$

Suppose that none of the cells is empty. Under Gauss–Markov assumptions, obtain expressions for the variance of the least squares estimator of $\mu_{ij} - \mu_{ij'} - \mu_{i'j} + \mu_{i'j'}$ and the covariance between the least squares estimators of every pair $\mu_{ij} - \mu_{ij'} - \mu_{i'j} + \mu_{i'j'}$ and $\mu_{st} - \mu_{st'} - \mu_{s't} + \mu_{s't'}$ of such functions.

Solution We seek an expression for $\operatorname{cov}(\bar{y}_{ij} - \bar{y}_{ij'} - \bar{y}_{i'j} + \bar{y}_{i'j'},\ \bar{y}_{st} - \bar{y}_{st'} - \bar{y}_{s't} + \bar{y}_{s't'})$, where $i$ and $i'$ are distinct, $j$ and $j'$ are distinct, $s$ and $s'$ are distinct, and $t$ and $t'$ are distinct. Without loss of generality we may take $i \leq s$. The least squares estimator of each interaction contrast of the type described in this exercise corresponds to a 2 × 2 array of cells. The derivation is considerably less complicated and tedious if we systematize by the extent and nature of the overlap of these 2 × 2 arrays, rather than by the indices of the cells. With respect to the extent of overlap, there are four cases: no overlap, a one-cell overlap, a two-cell overlap, and a complete (four-cell) overlap; it is not possible for only three cells to overlap. An instance of each of the possibilities is depicted in Fig. 7.1. Suppose initially that $i < i'$, $j < j'$, $s < s'$, and $t < t'$.
Fig. 7.1 Four cases of overlap among 2 × 2 arrays in a two-way layout (panels (a), (b), (c), and (d))
• No overlap case: In this case, all eight means in the two contrasts are uncorrelated, so $\operatorname{cov}(\bar{y}_{ij} - \bar{y}_{ij'} - \bar{y}_{i'j} + \bar{y}_{i'j'},\ \bar{y}_{st} - \bar{y}_{st'} - \bar{y}_{s't} + \bar{y}_{s't'}) = 0$.
• One-cell overlap case: In order to deal with the one-cell overlap cases, let us agree to refer to the cells in each array as the top left (TL), top right (TR), bottom left (BL), and bottom right (BR) cells. By assumption, indices $ij$ and $st$ refer to the TL cells of the first and second arrays, respectively. Thus, the coefficients on the four cells in any interaction contrast are 1, −1, −1, 1 for the TL, TR, BL, and BR cells, respectively. Let TL/TL denote the situation in which the overlapping cell is the TL cell of both arrays, TL/TR the situation in which the overlapping cell is the TL cell of the first array but the TR cell of the second array, etc. Observe that the coefficients corresponding to the overlapping cell in the two interaction contrasts have the same sign if the overlapping cell is at the same corner or opposite corners in the two arrays; otherwise, those coefficients have opposite sign. Thus,

$$\operatorname{cov}(\bar{y}_{ij} - \bar{y}_{ij'} - \bar{y}_{i'j} + \bar{y}_{i'j'},\ \bar{y}_{st} - \bar{y}_{st'} - \bar{y}_{s't} + \bar{y}_{s't'}) = \begin{cases} \sigma^2\left(\dfrac{1}{n^*}\right) & \text{if the overlapping cell is TL/TL, TL/BR, TR/TR, TR/BL, BL/BL, BL/TR, BR/BR, BR/TL,} \\ -\sigma^2\left(\dfrac{1}{n^*}\right) & \text{otherwise,} \end{cases}$$

where $n^*$ is the frequency of the overlapping cell.
• Two-cell overlap case: In the case of a two-cell overlap, the two overlapping cells form either a row or a column in each array (they cannot lie in opposite corners of an array). We can refer to these cells as either the left column (L), right column (R), top row (T), or bottom row (B) of cells in each array. The coefficients corresponding to the overlapping cells in the first contrast have the same sign as the coefficients corresponding to those cells in the second contrast if and only if the overlapping cells are on the same side of both arrays, and have opposite signs otherwise. Thus,

$$\operatorname{cov}(\bar{y}_{ij} - \bar{y}_{ij'} - \bar{y}_{i'j} + \bar{y}_{i'j'},\ \bar{y}_{st} - \bar{y}_{st'} - \bar{y}_{s't} + \bar{y}_{s't'}) = \begin{cases} \sigma^2\left(\dfrac{1}{n^*} + \dfrac{1}{n^{**}}\right) & \text{if the overlapping cells are L/L, R/R, T/T, or B/B,} \\ -\sigma^2\left(\dfrac{1}{n^*} + \dfrac{1}{n^{**}}\right) & \text{otherwise,} \end{cases}$$

where $n^*$ and $n^{**}$ are the frequencies of the overlapping cells.
• Four-cell overlap case: In the case of a four-cell overlap, the covariance between the two estimated contrasts is merely the variance of (either) one of them:

$$\operatorname{cov}(\bar{y}_{ij} - \bar{y}_{ij'} - \bar{y}_{i'j} + \bar{y}_{i'j'},\ \bar{y}_{st} - \bar{y}_{st'} - \bar{y}_{s't} + \bar{y}_{s't'}) = \sigma^2\left(\frac{1}{n_{ij}} + \frac{1}{n_{ij'}} + \frac{1}{n_{i'j}} + \frac{1}{n_{i'j'}}\right).$$
Exercise 18 This exercise generalizes Theorem 7.2.3 (the Gauss–Markov theorem) to a situation in which one desires to estimate several estimable functions and say something about their joint optimality. Let $C^T\beta$ be a $k$-vector of estimable functions under the Gauss–Markov model $\{y, X\beta, \sigma^2 I\}$, and let $C^T\hat\beta$ be the $k$-vector of least squares estimators of those functions associated with that model. Furthermore, let $\mathbf{k} + B^Ty$ be any vector of linear unbiased estimators of $C^T\beta$. Prove the following results (the last two results follow from the first). For the last result, use the following lemma, which is essentially the same as Theorem 18.1.6 of Harville (1997): If $M$ is a positive definite matrix, $Q$ is a nonnegative definite matrix of the same dimensions as $M$, and $M - Q$ is nonnegative definite, then $|M - Q| \leq |M|$.
(a) $\operatorname{var}(\mathbf{k} + B^Ty) - \operatorname{var}(C^T\hat\beta)$ is nonnegative definite;
(b) $\operatorname{tr}[\operatorname{var}(\mathbf{k} + B^Ty)] \geq \operatorname{tr}[\operatorname{var}(C^T\hat\beta)]$;
(c) $|\operatorname{var}(\mathbf{k} + B^Ty)| \geq |\operatorname{var}(C^T\hat\beta)|$.

Solution
(a) For any $k$-vector $\ell$, $\ell^TC^T\beta$ is estimable (by Corollary 6.1.2.1), and $\ell^T\mathbf{k} + (B\ell)^Ty$ and $(C\ell)^T\hat\beta$ are linear unbiased estimators of it (if $\ell^T\mathbf{k} = 0$). Therefore, by Theorem 7.2.3,

$$\begin{aligned} 0 &\leq \operatorname{var}(\ell^T\mathbf{k} + \ell^TB^Ty) - \operatorname{var}(\ell^TC^T\hat\beta) \\ &= \ell^T[\operatorname{var}(B^Ty) - \operatorname{var}(C^T\hat\beta)]\ell \\ &= \ell^T[\operatorname{var}(\mathbf{k} + B^Ty) - \operatorname{var}(C^T\hat\beta)]\ell. \end{aligned}$$

The result follows from the definition of nonnegative definiteness.
(b) This follows immediately from the result of part (a) upon taking $\ell$ to equal each of the unit vectors, and then summing both sides of the corresponding inequalities over all such vectors.
(c) $\operatorname{var}(\mathbf{k} + B^Ty) = \sigma^2B^TB$ and, because $B^TX = C^T$ (due to the unbiasedness of $\mathbf{k} + B^Ty$),

$$\operatorname{var}(C^T\hat\beta) = \sigma^2C^T(X^TX)^-C = \sigma^2B^TX(X^TX)^-X^TB.$$

Thus, the rank of $\operatorname{var}(C^T\hat\beta)$ is less than or equal to the rank of $B$. Now consider the following two cases. If $|\operatorname{var}(\mathbf{k} + B^Ty)| = 0$, then $\operatorname{rank}(B) < k$ by Theorem 2.11.2, implying that $\operatorname{rank}[\operatorname{var}(C^T\hat\beta)] < k$ and hence that $|\operatorname{var}(C^T\hat\beta)| = 0$. Thus the result holds for this case. On the other hand, if $|\operatorname{var}(\mathbf{k} + B^Ty)| > 0$, then by substituting $\operatorname{var}(\mathbf{k} + B^Ty)$ and $[\operatorname{var}(\mathbf{k} + B^Ty) - \operatorname{var}(C^T\hat\beta)]$ for $M$ and $Q$, respectively, in the lemma (where $Q$ is nonnegative definite by the result from part (a)), we obtain the desired result for this case also.

Exercise 19 For a balanced two-factor crossed classification with two levels of each factor and $r$ observations per cell:
(a) Show that the model

$$y_{ijk} = \theta + (-1)^i\tau_1 + (-1)^j\tau_2 + e_{ijk} \quad (i = 1, 2;\ j = 1, 2;\ k = 1, \ldots, r)$$

is a reparameterization of the two-way main effects model given by (5.6).
(b) Determine how $\theta$, $\tau_1$, and $\tau_2$ are related to the parameters $\mu$, $\alpha_1$, $\alpha_2$, $\gamma_1$, and $\gamma_2$ for parameterization (5.6) of the two-way main effects model.
(c) What “nice” property does the model matrix for this reparameterization have? Does it still have this property if the data are unbalanced?
(d) Indicate how you might reparameterize a three-way main effects model with two levels for each factor and balanced data in a similar way, and give the corresponding model matrix.
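Solution
(a) The model matrices associated with parameterization (5.6) and the parameterization introduced in this exercise may be written, respectively, as

$$X = \begin{pmatrix} 1_r \otimes (1\ \ 1\ \ 0\ \ 1\ \ 0) \\ 1_r \otimes (1\ \ 1\ \ 0\ \ 0\ \ 1) \\ 1_r \otimes (1\ \ 0\ \ 1\ \ 1\ \ 0) \\ 1_r \otimes (1\ \ 0\ \ 1\ \ 0\ \ 1) \end{pmatrix} \quad\text{and}\quad \breve{X} = \begin{pmatrix} 1_r \otimes (1\ \ {-1}\ \ {-1}) \\ 1_r \otimes (1\ \ {-1}\ \ 1) \\ 1_r \otimes (1\ \ 1\ \ {-1}) \\ 1_r \otimes (1\ \ 1\ \ 1) \end{pmatrix}.$$

Let $X^*$ and $\breve{X}^*$ be the submatrices of unique rows of $X$ and $\breve{X}$, respectively. Then

$$\breve{X}^* = X^* \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \\ 0 & 0 & 1 \end{pmatrix} \quad\text{and}\quad X^* = \breve{X}^* \begin{pmatrix} 1 & 0.5 & 0.5 & 0.5 & 0.5 \\ 0 & -0.5 & 0.5 & 0 & 0 \\ 0 & 0 & 0 & -0.5 & 0.5 \end{pmatrix},$$

which establishes that $\mathcal{C}(X) = \mathcal{C}(\breve{X})$ and hence that the two models are reparameterizations of each other.
(b) $\begin{pmatrix} \theta \\ \tau_1 \\ \tau_2 \end{pmatrix} = \begin{pmatrix} \mu + (\alpha_1 + \alpha_2)/2 + (\gamma_1 + \gamma_2)/2 \\ (\alpha_2 - \alpha_1)/2 \\ (\gamma_2 - \gamma_1)/2 \end{pmatrix}$.
(c) The columns of $\breve{X}$ are orthogonal, so $\breve{X}^T\breve{X} = 4rI$. $\breve{X}$ does not retain this property if the data are unbalanced.
(d) The analogous reparameterization is

$$y_{ijkl} = \theta + (-1)^i\tau_1 + (-1)^j\tau_2 + (-1)^k\tau_3 + e_{ijkl} \quad (i = 1, 2;\ j = 1, 2;\ k = 1, 2;\ l = 1, \ldots, r)$$

and the corresponding model matrix is

$$\breve{X} = \begin{pmatrix} 1_r \otimes (1\ \ {-1}\ \ {-1}\ \ {-1}) \\ 1_r \otimes (1\ \ {-1}\ \ {-1}\ \ 1) \\ 1_r \otimes (1\ \ {-1}\ \ 1\ \ {-1}) \\ 1_r \otimes (1\ \ {-1}\ \ 1\ \ 1) \\ 1_r \otimes (1\ \ 1\ \ {-1}\ \ {-1}) \\ 1_r \otimes (1\ \ 1\ \ {-1}\ \ 1) \\ 1_r \otimes (1\ \ 1\ \ 1\ \ {-1}) \\ 1_r \otimes (1\ \ 1\ \ 1\ \ 1) \end{pmatrix}.$$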
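Exercise 20 Consider the second-order polynomial regression model
\[
y_i = \beta_1 + \beta_2 x_i + \beta_3 x_i^2 + e_i \quad (i = 1, \ldots, n)
\]
and the reparameterization
\[
y_i = \tau_1 + \tau_2(x_i - \bar{x}) + \tau_3(x_i - \bar{x})^2 + e_i \quad (i = 1, \ldots, n).
\]
The second model is obtained from the first by centering the $x_i$'s. Suppose that at least three of the $x_i$'s are distinct and that the errors satisfy Gauss–Markov assumptions.
(a) Let $\mathbf{X}$ and $\breve{\mathbf{X}}$ represent the model matrices corresponding to these two models. Determine the columns of $\mathbf{X}$ and $\breve{\mathbf{X}}$ and verify that the second model is a reparameterization of the first model.
(b) Let $\hat{\tau}_j$ denote the least squares estimators of $\tau_j$ ($j = 1, 2, 3$) in the model $\mathbf{y} = \breve{\mathbf{X}}\boldsymbol{\tau} + \mathbf{e}$. Suppose that $x_i = i$ ($i = 1, \ldots, n$). Determine $\mathrm{cov}(\hat{\tau}_1, \hat{\tau}_2)$ and $\mathrm{cov}(\hat{\tau}_2, \hat{\tau}_3)$.
(c) Determine the variance inflation factors for the regressors in the second model.
Solution
(a) $\mathbf{X} = (\mathbf{1}, \mathbf{x}_1, \mathbf{x}_2)$ where $\mathbf{x}_1 = (x_1, \ldots, x_n)^T$ and $\mathbf{x}_2 = (x_1^2, \ldots, x_n^2)^T$. Furthermore, $\breve{\mathbf{X}} = (\mathbf{1}, \mathbf{x}_1 - \bar{x}\mathbf{1}, \breve{\mathbf{x}}_2)$ where $\breve{\mathbf{x}}_2 = ((x_1 - \bar{x})^2, \ldots, (x_n - \bar{x})^2)^T = (x_1^2 - 2\bar{x}x_1 + \bar{x}^2, \ldots, x_n^2 - 2\bar{x}x_n + \bar{x}^2)^T$. Thus
\[
\breve{\mathbf{X}} = (\mathbf{1}, \mathbf{x}_1, \mathbf{x}_2) \begin{pmatrix} 1 & -\bar{x} & \bar{x}^2 \\ 0 & 1 & -2\bar{x} \\ 0 & 0 & 1 \end{pmatrix} = \mathbf{X}\mathbf{T},
\]
say, implying that $C(\breve{\mathbf{X}}) \subseteq C(\mathbf{X})$. Furthermore, $\mathbf{T}$ is nonsingular, so $C(\breve{\mathbf{X}}) = C(\mathbf{X})$ by Theorems 2.8.5 and 2.8.9. Thus the second model is a reparameterization of the first.
(b) Of course, $\sum_{i=1}^n (x_i - \bar{x}) = 0$; moreover, by the symmetry of the values of $x_i$ around their mean, $\sum_{i=1}^n (x_i - \bar{x})^3 = 0$. Thus,
\[
\mathrm{var}(\hat{\boldsymbol{\tau}}) = \sigma^2(\breve{\mathbf{X}}^T\breve{\mathbf{X}})^{-1}
= \sigma^2 \begin{pmatrix} n & \sum_{i=1}^n (x_i - \bar{x}) & \sum_{i=1}^n (x_i - \bar{x})^2 \\ \sum_{i=1}^n (x_i - \bar{x}) & \sum_{i=1}^n (x_i - \bar{x})^2 & \sum_{i=1}^n (x_i - \bar{x})^3 \\ \sum_{i=1}^n (x_i - \bar{x})^2 & \sum_{i=1}^n (x_i - \bar{x})^3 & \sum_{i=1}^n (x_i - \bar{x})^4 \end{pmatrix}^{-1}
= \sigma^2 \begin{pmatrix} n & 0 & \sum_{i=1}^n (x_i - \bar{x})^2 \\ 0 & \sum_{i=1}^n (x_i - \bar{x})^2 & 0 \\ \sum_{i=1}^n (x_i - \bar{x})^2 & 0 & \sum_{i=1}^n (x_i - \bar{x})^4 \end{pmatrix}^{-1}.
\]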
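Rearranging the elements of $\hat{\boldsymbol{\tau}}$, we obtain
\[
\mathrm{var}\begin{pmatrix} \hat{\tau}_2 \\ \hat{\tau}_1 \\ \hat{\tau}_3 \end{pmatrix}
= \sigma^2 \begin{pmatrix} \sum_{i=1}^n (x_i - \bar{x})^2 & 0 & 0 \\ 0 & n & \sum_{i=1}^n (x_i - \bar{x})^2 \\ 0 & \sum_{i=1}^n (x_i - \bar{x})^2 & \sum_{i=1}^n (x_i - \bar{x})^4 \end{pmatrix}^{-1}
= \sigma^2 \begin{pmatrix} 1/\sum_{i=1}^n (x_i - \bar{x})^2 & 0 & 0 \\ 0 & * & * \\ 0 & * & * \end{pmatrix},
\]
where we have written an asterisk for elements whose exact form is unimportant for the purposes of this exercise. Thus, $\mathrm{cov}(\hat{\tau}_1, \hat{\tau}_2) = \mathrm{cov}(\hat{\tau}_2, \hat{\tau}_3) = 0$.
(c) Using the notation introduced in Methodological Interlude #1,
\[
\mathrm{VIF}_2 = \left\{ 1 - \frac{\mathbf{a}^T\mathbf{A}^{-1}\mathbf{a}}{\sum_{i=1}^n [(x_i - \bar{x}) - (1/n)\sum_{j=1}^n (x_j - \bar{x})]^2} \right\}^{-1},
\]
where in this case
\[
\mathbf{a} = \sum_{i=1}^n (x_i - \bar{x})\Big[(x_i - \bar{x})^2 - (1/n)\sum_{j=1}^n (x_j - \bar{x})^2\Big] = \sum_{i=1}^n (x_i - \bar{x})^3 = 0.
\]
Thus $\mathrm{VIF}_2 = 1$. $\mathrm{VIF}_3$ is given by an analogous expression, for which some quantities are different but $\mathbf{a}$ is the same as it is for $\mathrm{VIF}_2$. Thus $\mathrm{VIF}_3 = 1$ also.
Exercise 21 Consider the Gauss–Markov model $\{\mathbf{y}, \mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I}\}$, and suppose that $\mathbf{X} = (x_{ij})$ ($i = 1, \ldots, n$; $j = 1, \ldots, p$) has full column rank. Let $\hat{\beta}_j$ be the least squares estimator of $\beta_j$ ($j = 1, \ldots, p$) associated with this model.
(a) Prove that
\[
\mathrm{var}(\hat{\beta}_j) \ge \frac{\sigma^2}{\sum_{i=1}^n x_{ij}^2} \quad \text{for all } j = 1, \ldots, p,
\]
with equality if and only if $\sum_{i=1}^n x_{ij}x_{ik} = 0$ for all $j \ne k = 1, \ldots, p$. (Hint: Without loss of generality take $j = 1$, and use Theorem 2.9.5.)
(b) Suppose that $n = 8$ and $p = 4$. Let $\mathcal{X}_c$ represent the set of all $8 \times 4$ model matrices $\mathbf{X} = (x_{ij})$ for which $\sum_{i=1}^8 x_{ij}^2 \le c$ for all $j = 1, \ldots, 4$. From part (a), it follows that if an $\mathbf{X} \in \mathcal{X}_c$ exists for which $\sum_{i=1}^8 x_{ij}x_{ik} = 0$ for all $j \ne k = 1, \ldots, 4$ and $\sum_{i=1}^8 x_{ij}^2 = c$ for $j = 1, \ldots, 4$, then $\mathbf{X}$ minimizes $\mathrm{var}(\hat{\beta}_j)$, for all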
$j = 1, \ldots, 4$, over $\mathcal{X}_c$. Use this fact to show that
\[
\mathbf{X} = \begin{pmatrix} 1 & -1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & 1 & -1 & 1 \\ 1 & 1 & 1 & -1 \\ 1 & 1 & 1 & 1 \end{pmatrix}
\]
minimizes $\mathrm{var}(\hat{\beta}_j)$ for all $j = 1, \ldots, 4$ among all $8 \times 4$ model matrices for which $-1 \le x_{ij} \le 1$ ($i = 1, \ldots, 8$ and $j = 1, \ldots, 4$).
(c) The model matrix $\mathbf{X}$ displayed in part (b) corresponds to what is called a $2^3$ factorial design, i.e., an experimental design for which there are three completely crossed experimental factors, each having two levels (coded as $-1$ and $1$), with one observation per combination of factor levels. According to part (b), a $2^3$ factorial design minimizes $\mathrm{var}(\hat{\beta}_j)$ (for all $j = 1, \ldots, 4$) among all $8 \times 4$ model matrices for which $x_{i1} \equiv 1$ and $-1 \le x_{ij} \le 1$ ($i = 1, \ldots, 8$ and $j = 2, 3, 4$), under the "first-order" model
\[
y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + e_i \quad (i = 1, \ldots, 8).
\]
Now consider adding seven additional combinations of the three quantitative explanatory variables to the eight combinations in a $2^3$ factorial design. These seven additional combinations are as listed in rows 9 through 15 of the following new, larger model matrix:
\[
\mathbf{X} = \begin{pmatrix} 1 & -1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & 1 & -1 & 1 \\ 1 & 1 & 1 & -1 \\ 1 & 1 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & -\alpha & 0 & 0 \\ 1 & \alpha & 0 & 0 \\ 1 & 0 & -\alpha & 0 \\ 1 & 0 & \alpha & 0 \\ 1 & 0 & 0 & -\alpha \\ 1 & 0 & 0 & \alpha \end{pmatrix}.
\]
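Here $\alpha$ is any positive number. The experimental design associated with this model matrix is known as a $2^3$ factorial central composite design. Use the results of parts (a) and (b) to show that this design minimizes $\mathrm{var}(\hat{\beta}_j)$ ($j = 2, 3, 4$) among all $15 \times 4$ model matrices for which $x_{i1} \equiv 1$ and $\sum_{i=1}^{15} x_{ij}^2 \le 8 + 2\alpha^2$ ($j = 2, 3, 4$), under the "first-order" model
\[
y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + e_i \quad (i = 1, \ldots, 15).
\]
Solution
(a) Partition $\mathbf{X} = (\mathbf{x}_1, \mathbf{X}_{-1})$, so that
\[
(\mathbf{X}^T\mathbf{X})^{-1} = \begin{pmatrix} a & \mathbf{b}^T \\ \mathbf{b} & \mathbf{C} \end{pmatrix}
\]
where, by Theorem 2.9.5,
\[
a = [\mathbf{x}_1^T\mathbf{x}_1 - \mathbf{x}_1^T\mathbf{X}_{-1}(\mathbf{X}_{-1}^T\mathbf{X}_{-1})^{-1}\mathbf{X}_{-1}^T\mathbf{x}_1]^{-1} \ge 1/(\mathbf{x}_1^T\mathbf{x}_1),
\]
with equality if and only if $\mathbf{x}_1^T\mathbf{X}_{-1} = \mathbf{0}^T$. That is, $a \ge 1/\sum_{i=1}^n x_{i1}^2$, with equality if and only if $\sum_{i=1}^n x_{i1}x_{ik} = 0$ for all $k = 2, \ldots, p$. Because $\mathrm{var}(\hat{\beta}_1) = \sigma^2 a$, the result holds for $j = 1$. Results for $j = 2, \ldots, p$ may be shown similarly.
(b) According to the result stated in this part, $\mathbf{X}$ minimizes $\mathrm{var}(\hat{\beta}_j)$, $j = 1, 2, 3, 4$, over all $8 \times 4$ matrices for which $\sum_{i=1}^8 x_{ij}^2 \le 8$ for $j = 1, 2, 3, 4$. Because $\{x_{ij} : -1 \le x_{ij} \le 1,\; i = 1, \ldots, 8 \text{ and } j = 1, 2, 3, 4\}$ is a subset of that set, $\mathbf{X}$ also minimizes $\mathrm{var}(\hat{\beta}_j)$, $j = 1, 2, 3, 4$, over this subset.
(c) By parts (a) and (b), this design minimizes $\mathrm{var}(\hat{\beta}_j)$ ($j = 2, 3, 4$) among all $15 \times 3$ model matrices for the model
\[
y_i = \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + e_i
\]
for which $\sum_{i=1}^{15} x_{ij}^2 \le 8 + 2\alpha^2$ ($j = 2, 3, 4$). Because the columns of the model matrix for this model are orthogonal to $\mathbf{1}_{15}$, this design also minimizes $\mathrm{var}(\hat{\beta}_j)$ ($j = 2, 3, 4$) under the model
\[
y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4} + e_i.
\]
Exercise 22 Consider the model $\{\mathbf{y}, \mathbf{X}\boldsymbol{\beta}\}$ and a reparameterization $\{\mathbf{y}, \breve{\mathbf{X}}\boldsymbol{\tau}\}$ of it. Let matrices $\mathbf{T}$ and $\mathbf{S}^T$ be defined such that $\breve{\mathbf{X}} = \mathbf{X}\mathbf{T}$ and $\mathbf{X} = \breve{\mathbf{X}}\mathbf{S}^T$. Show that the converse of Theorem 7.3.1 is false by letting
\[
\mathbf{X} = \begin{pmatrix} 1 & 2 & 1 & 2 \\ 1 & 2 & 0 & 0 \\ 1 & 2 & 0 & 0 \end{pmatrix}, \quad
\breve{\mathbf{X}} = \begin{pmatrix} 1 & 1 & 2 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix}, \quad
\mathbf{S}^T = \begin{pmatrix} 1 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{pmatrix},
\]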
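and finding a vector $\mathbf{b}$ such that $\mathbf{b}^T\mathbf{S}^T\boldsymbol{\beta}$ is estimable under the original model but $\mathbf{b}^T\boldsymbol{\tau}$ is not estimable under the reparameterized model.
Solution Let $\mathbf{b} = \mathbf{1}_3$. Then $\mathbf{b}^T\mathbf{S}^T = (1, 2, 1, 2) \in R(\mathbf{X})$, so $\mathbf{b}^T\mathbf{S}^T\boldsymbol{\beta}$ is estimable under the original model. However, $\mathbf{b}^T \notin R(\breve{\mathbf{X}})$, so $\mathbf{b}^T\boldsymbol{\tau}$ is not estimable under the reparameterized model.
Exercise 23 Prove Theorem 7.3.2: For a full-rank reparameterization of the model $\{\mathbf{y}, \mathbf{X}\boldsymbol{\beta}\}$, $R(\mathbf{S}^T) = R(\mathbf{X})$ and hence:
(a) $\mathrm{rank}(\mathbf{S}^T) = p^*$ and
(b) a function $\mathbf{c}^T\boldsymbol{\beta}$ is estimable if and only if $\mathbf{c}^T \in R(\mathbf{S}^T)$.
Solution Because $\mathbf{X} = \breve{\mathbf{X}}\mathbf{S}^T$, $R(\mathbf{X}) \subseteq R(\mathbf{S}^T)$. Furthermore,
\[
\mathrm{rank}(\mathbf{X}) = \mathrm{rank}(\breve{\mathbf{X}}\mathbf{S}^T) \le \mathrm{rank}(\mathbf{S}^T) \le \mathrm{rank}(\mathbf{X}),
\]
where the first inequality holds by Theorem 2.8.4 and the last inequality holds because $\mathbf{S}^T$ has $p^*$ rows. Thus $\mathrm{rank}(\mathbf{S}^T) = \mathrm{rank}(\mathbf{X})$, proving part (a). By Theorem 2.8.5, $R(\mathbf{S}^T) = R(\mathbf{X})$, so part (b) follows from Theorem 6.1.2.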
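The row-space claims in the solution to Exercise 22 can be confirmed by comparing ranks: a row vector lies in the row space of a matrix exactly when appending it as an extra row leaves the rank unchanged. The sketch below uses exact rational arithmetic; the helper names `rank` and `in_row_space` are illustrative, not from the text.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by exact Gauss-Jordan elimination."""
    M = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(rk, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[rk], M[pivot] = M[pivot], M[rk]
        for i in range(len(M)):
            if i != rk and M[i][col] != 0:
                f = M[i][col] / M[rk][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[rk])]
        rk += 1
        if rk == len(M):
            break
    return rk

def in_row_space(vec, rows):
    """True when vec is a linear combination of the given rows."""
    return rank(rows + [vec]) == rank(rows)

X = [[1, 2, 1, 2], [1, 2, 0, 0], [1, 2, 0, 0]]
X_breve = [[1, 1, 2], [1, 0, 0], [1, 0, 0]]
S_T = [[1, 2, 0, 0], [0, 0, 1, 2], [0, 0, 0, 0]]

# Confirm X = X_breve S^T, as required of the pair in the exercise.
product = [[sum(a * b for a, b in zip(row, col)) for col in zip(*S_T)] for row in X_breve]
x_factorizes = product == X

b = [1, 1, 1]
bS = [sum(b[i] * S_T[i][j] for i in range(3)) for j in range(4)]   # b^T S^T

print(x_factorizes, bS)
print(in_row_space(bS, X), in_row_space(b, X_breve))
```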
Reference Harville, D. A. (1997). Matrix algebra from a statistician’s perspective. New York: Springer-Verlag.
8
Least Squares Geometry and the Overall ANOVA
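This chapter presents exercises on least squares geometry and the overall analysis of variance, and provides solutions to those exercises.
© Springer Nature Switzerland AG 2020 D. L. Zimmerman, Linear Model Theory, https://doi.org/10.1007/978-3-030-52074-8_8
Exercise 1 Prove Corollary 8.1.1.1: If $\breve{\mathbf{X}}$ is any matrix for which $C(\breve{\mathbf{X}}) = C(\mathbf{X})$, then $\mathbf{P}_{\breve{X}} = \mathbf{P}_X$ and $\breve{\mathbf{X}}\hat{\boldsymbol{\tau}} = \mathbf{X}\hat{\boldsymbol{\beta}}$, where $\hat{\boldsymbol{\beta}}$ is any solution to $\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{X}^T\mathbf{y}$ and $\hat{\boldsymbol{\tau}}$ is any solution to $\breve{\mathbf{X}}^T\breve{\mathbf{X}}\boldsymbol{\tau} = \breve{\mathbf{X}}^T\mathbf{y}$.
Solution By Theorem 8.1.1, $\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^-\mathbf{X}^T\mathbf{y} = \mathbf{P}_X\mathbf{y}$ and $\hat{\mathbf{y}} = \breve{\mathbf{X}}\hat{\boldsymbol{\tau}} = \breve{\mathbf{X}}(\breve{\mathbf{X}}^T\breve{\mathbf{X}})^-\breve{\mathbf{X}}^T\mathbf{y} = \mathbf{P}_{\breve{X}}\mathbf{y}$. Thus $\mathbf{P}_X\mathbf{y} = \mathbf{P}_{\breve{X}}\mathbf{y}$ for all $\mathbf{y}$, implying (by Theorem 2.1.1) that $\mathbf{P}_X = \mathbf{P}_{\breve{X}}$.
Exercise 2 Prove Theorem 8.1.6c–e: Let $p_{ij}$ denote the $ij$th element of $\mathbf{P}_X$. Then:
(c) $p_{ii} \le 1/r_i$, where $r_i$ is the number of rows of $\mathbf{X}$ that are identical to the $i$th row.
If, in addition, $\mathbf{1}_n$ is one of the columns of $\mathbf{X}$, then:
(d) $p_{ii} \ge 1/n$;
(e) $\sum_{i=1}^n p_{ij} = \sum_{j=1}^n p_{ij} = 1$.
[Hint: To prove part (c), use Theorem 3.3.10 to show (without loss of generality) that the result holds for $p_{11}$ by representing $\mathbf{X}$ as $\begin{pmatrix} \mathbf{1}_{r_1 - 1}\mathbf{x}_1^T \\ \mathbf{X}_2 \end{pmatrix}$ where $\mathbf{x}_1^T$ is the first distinct row of $\mathbf{X}$ and $r_1$ is the number of replicates of that row, and $\mathbf{X}_2$ consists of the remaining rows of $\mathbf{X}$ including, as its first row, the last replicate of $\mathbf{x}_1^T$. To prove part (d), consider properties of $\mathbf{P}_X - (1/n)\mathbf{J}_n$.]
Solution According to Theorem 8.1.3b, c, $\mathbf{I} - \mathbf{P}_X$ is symmetric and idempotent. Thus, by Corollary 2.15.9.1 and Theorem 2.15.1, its diagonal elements $1 - p_{ii}$ ($i = 1, \ldots, n$) are nonnegative, or equivalently $p_{ii} \le 1$ ($i = 1, \ldots, n$). Now represent $\mathbf{X}$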
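as
\[
\mathbf{X} = \begin{pmatrix} \mathbf{1}_{r_1}\mathbf{x}_1^T \\ \mathbf{1}_{r_2}\mathbf{x}_2^T \\ \vdots \\ \mathbf{1}_{r_m}\mathbf{x}_m^T \end{pmatrix},
\]
where $\mathbf{x}_1^T, \ldots, \mathbf{x}_m^T$ are the distinct rows of $\mathbf{X}$ and $r_1, \ldots, r_m$ are the corresponding number of replicates of those rows. Without loss of generality, assume that $r_1 > 1$, and we will show that $p_{11} \le 1/r_1$. The general result can then be obtained by permuting the rows of $\mathbf{X}$. As suggested in the hint, define
\[
\mathbf{X}_2 = \begin{pmatrix} \mathbf{x}_1^T \\ \mathbf{1}_{r_2}\mathbf{x}_2^T \\ \vdots \\ \mathbf{1}_{r_m}\mathbf{x}_m^T \end{pmatrix}.
\]
Then
\[
\mathbf{X} = \begin{pmatrix} \mathbf{1}_{r_1 - 1}\mathbf{x}_1^T \\ \mathbf{X}_2 \end{pmatrix}
\]
and $\mathbf{X}^T\mathbf{X} = (r_1 - 1)\mathbf{x}_1\mathbf{x}_1^T + \mathbf{X}_2^T\mathbf{X}_2$. Now by Theorem 3.3.10 (with $\mathbf{X}_2^T\mathbf{X}_2$, $\mathbf{x}_1$, $r_1 - 1$, and $\mathbf{x}_1^T$ playing the roles of $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$, and $\mathbf{D}$ in the theorem) we obtain
\[
(\mathbf{X}^T\mathbf{X})^- = (\mathbf{X}_2^T\mathbf{X}_2)^- - (\mathbf{X}_2^T\mathbf{X}_2)^-\mathbf{x}_1(r_1 - 1)[(r_1 - 1) + (r_1 - 1)^2\mathbf{x}_1^T(\mathbf{X}_2^T\mathbf{X}_2)^-\mathbf{x}_1]^- \cdot (r_1 - 1)\mathbf{x}_1^T[(\mathbf{X}_2^T\mathbf{X}_2)^-]^T.
\]
Let $p_{11(-1)} = \mathbf{x}_1^T(\mathbf{X}_2^T\mathbf{X}_2)^-\mathbf{x}_1$ be the (1,1) element of $\mathbf{P}_{X_2}$. Then by pre- and post-multiplying the expression for $(\mathbf{X}^T\mathbf{X})^-$ above by $\mathbf{x}_1^T$ and $\mathbf{x}_1$, respectively, we obtain
\[
p_{11} = p_{11(-1)} - p_{11(-1)}^2(r_1 - 1)^2/[(r_1 - 1) + (r_1 - 1)^2 p_{11(-1)}]
= p_{11(-1)}/[1 + (r_1 - 1)p_{11(-1)}]
= 1/[(1/p_{11(-1)}) + r_1 - 1]
\le 1/r_1,
\]
where the inequality holds because $p_{11(-1)} \le 1$ which implies that $(1/p_{11(-1)}) - 1 \ge 0$. This proves part (c). Now suppose that $\mathbf{1}_n$ is one of the columns of $\mathbf{X}$. Then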
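by Theorem 8.1.2a(i), $\mathbf{P}_X\mathbf{1}_n = \mathbf{1}_n$, implying that $(1/n)\mathbf{J}_n\mathbf{P}_X = (1/n)\mathbf{1}_n\mathbf{1}_n^T\mathbf{P}_X = (1/n)\mathbf{1}_n\mathbf{1}_n^T = (1/n)\mathbf{J}_n = (1/n)\mathbf{P}_X\mathbf{J}_n$. Therefore,
\[
[\mathbf{P}_X - (1/n)\mathbf{J}_n][\mathbf{P}_X - (1/n)\mathbf{J}_n] = \mathbf{P}_X\mathbf{P}_X - (1/n)\mathbf{J}_n\mathbf{P}_X - (1/n)\mathbf{P}_X\mathbf{J}_n + (1/n^2)\mathbf{J}_n\mathbf{J}_n
= \mathbf{P}_X - (1/n)\mathbf{J}_n - (1/n)\mathbf{J}_n + (1/n)\mathbf{J}_n
= \mathbf{P}_X - (1/n)\mathbf{J}_n.
\]
Thus $\mathbf{P}_X - (1/n)\mathbf{J}_n$ is idempotent, and it is clearly symmetric as well; hence by Corollary 2.15.9.1, $\mathbf{P}_X - (1/n)\mathbf{J}_n$ is nonnegative definite. Thus, by Theorem 2.15.1 its diagonal elements $p_{ii} - (1/n)$ ($i = 1, \ldots, n$) are nonnegative, or equivalently $p_{ii} \ge 1/n$ ($i = 1, \ldots, n$). This proves part (d). For part (e), consideration of each element of the equalities $\mathbf{P}_X\mathbf{1}_n = \mathbf{1}_n$ and $\mathbf{1}_n^T\mathbf{P}_X = \mathbf{1}_n^T$ established above yields $\sum_{j=1}^n p_{ij} = 1$ for all $i = 1, \ldots, n$ and $\sum_{i=1}^n p_{ij} = 1$ for all $j = 1, \ldots, n$.
Exercise 3 Find $\mathrm{var}(\hat{\mathbf{e}} - \mathbf{e})$ and $\mathrm{cov}(\hat{\mathbf{e}} - \mathbf{e}, \mathbf{e})$ under the Gauss–Markov model $\{\mathbf{y}, \mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I}\}$.
Solution $\mathrm{var}(\hat{\mathbf{e}} - \mathbf{e}) = \mathrm{var}[(\mathbf{I} - \mathbf{P}_X)\mathbf{y} - \mathbf{e}] = \mathrm{var}[(\mathbf{I} - \mathbf{P}_X)\mathbf{e} - \mathbf{e}] = \mathrm{var}(-\mathbf{P}_X\mathbf{e}) = \mathbf{P}_X(\sigma^2\mathbf{I})\mathbf{P}_X^T = \sigma^2\mathbf{P}_X$, and $\mathrm{cov}(\hat{\mathbf{e}} - \mathbf{e}, \mathbf{e}) = \mathrm{cov}(\hat{\mathbf{e}}, \mathbf{e}) - \mathrm{var}(\mathbf{e}) = \mathrm{cov}[(\mathbf{I} - \mathbf{P}_X)\mathbf{e}, \mathbf{e}] - \sigma^2\mathbf{I} = (\mathbf{I} - \mathbf{P}_X)(\sigma^2\mathbf{I}) - \sigma^2\mathbf{I} = -\sigma^2\mathbf{P}_X$.
Exercise 4 Determine $\mathbf{P}_X$ for each of the following models:
(a) the no-intercept simple linear regression model;
(b) the two-way main effects model with balanced data;
(c) the two-way model with interaction and balanced data;
(d) the two-way partially crossed model introduced in Example 5.1.4-1, with one observation per cell;
(e) the two-factor nested model.
Solution
(a) $\mathbf{P}_X = \mathbf{x}(\mathbf{x}^T\mathbf{x})^{-1}\mathbf{x}^T = \left(1/\sum_{i=1}^n x_i^2\right)\mathbf{x}\mathbf{x}^T$.
(b) Recall that for this situation, $\mathbf{X} = (\mathbf{1}_{qmr}, \mathbf{I}_q \otimes \mathbf{1}_{mr}, \mathbf{1}_q \otimes \mathbf{I}_m \otimes \mathbf{1}_r)$. By the same development given in the solution to Exercise 7.11, we obtain the following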
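generalized inverse of $\mathbf{X}^T\mathbf{X}$:
\[
(\mathbf{X}^T\mathbf{X})^- = \begin{pmatrix} 0 & \mathbf{0}_q^T & \mathbf{0}^T \\ \mathbf{0}_q & (1/mr)\mathbf{I}_q & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & (1/qr)\mathbf{I}_m - (1/qmr)\mathbf{J}_m \end{pmatrix}.
\]
Thus,
\[
\mathbf{P}_X = (\mathbf{1}_{qmr}, \mathbf{I}_q \otimes \mathbf{1}_{mr}, \mathbf{1}_q \otimes \mathbf{I}_m \otimes \mathbf{1}_r)
\begin{pmatrix} 0 & \mathbf{0}_q^T & \mathbf{0}^T \\ \mathbf{0}_q & (1/mr)\mathbf{I}_q & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & (1/qr)\mathbf{I}_m - (1/qmr)\mathbf{J}_m \end{pmatrix}
\begin{pmatrix} \mathbf{1}_{qmr}^T \\ \mathbf{I}_q \otimes \mathbf{1}_{mr}^T \\ \mathbf{1}_q^T \otimes \mathbf{I}_m \otimes \mathbf{1}_r^T \end{pmatrix}
\]
\[
= \Big( (1/mr)\mathbf{I}_q \otimes \mathbf{1}_{mr}, \; (1/qr)\mathbf{1}_q \otimes [\mathbf{I}_m - (1/m)\mathbf{J}_m] \otimes \mathbf{1}_r \Big)
\begin{pmatrix} \mathbf{I}_q \otimes \mathbf{1}_{mr}^T \\ \mathbf{1}_q^T \otimes \mathbf{I}_m \otimes \mathbf{1}_r^T \end{pmatrix}
\]
\[
= (1/mr)\mathbf{I}_q \otimes \mathbf{J}_{mr} + (1/qr)\mathbf{J}_q \otimes [\mathbf{I}_m - (1/m)\mathbf{J}_m] \otimes \mathbf{J}_r.
\]
(c) The cell-means model
\[
y_{ijk} = \mu_{ij} + e_{ijk} \quad (i = 1, \ldots, q;\; j = 1, \ldots, m;\; k = 1, \ldots, r)
\]
is a reparameterization of the two-way model with interaction and balanced data. Because the model matrix for this model is $\mathbf{X} = \mathbf{I}_{qm} \otimes \mathbf{1}_r$, we obtain
\[
\mathbf{P}_X = (\mathbf{I}_{qm} \otimes \mathbf{1}_r)\big[(\mathbf{I}_{qm} \otimes \mathbf{1}_r)^T(\mathbf{I}_{qm} \otimes \mathbf{1}_r)\big]^-(\mathbf{I}_{qm} \otimes \mathbf{1}_r)^T
= (\mathbf{I}_{qm} \otimes \mathbf{1}_r)(r\mathbf{I}_{qm})^-(\mathbf{I}_{qm} \otimes \mathbf{1}_r^T)
= \mathbf{I}_{qm} \otimes (1/r)\mathbf{J}_r.
\]
(d) Recall from Example 5.1.4-1 that $\mathbf{X} = (\mathbf{1}_{q(q-1)}, (\mathbf{v}_{ij}^T)_{i \ne j = 1, \ldots, q})$, where $\mathbf{v}_{ij} = \mathbf{u}_i^{(q)} - \mathbf{u}_j^{(q)}$. Now observe that $\mathbf{1}_{q(q-1)}^T\mathbf{1}_{q(q-1)} = q(q - 1)$,
\[
\mathbf{1}_{q(q-1)}^T\big[(\mathbf{u}_i^{(q)} - \mathbf{u}_j^{(q)})^T\big]_{i \ne j = 1, \ldots, q} = \big( (1)(q - 1) + (-1)(q - 1), \ldots, (1)(q - 1) + (-1)(q - 1) \big)_{1 \times q} = \mathbf{0}_q^T,
\]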
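and
\[
[(\mathbf{v}_{ij}^T)_{i \ne j = 1, \ldots, q}]^T[(\mathbf{v}_{ij}^T)_{i \ne j = 1, \ldots, q}]
= \sum_{i=1}^q \sum_{j \ne i} (\mathbf{u}_i^{(q)} - \mathbf{u}_j^{(q)})(\mathbf{u}_i^{(q)} - \mathbf{u}_j^{(q)})^T
\]
\[
= (q - 1)\sum_{i=1}^q \mathbf{u}_i^{(q)}\mathbf{u}_i^{(q)T} + (q - 1)\sum_{j=1}^q \mathbf{u}_j^{(q)}\mathbf{u}_j^{(q)T} - \sum_{i=1}^q \sum_{j \ne i} (\mathbf{u}_i^{(q)}\mathbf{u}_j^{(q)T} + \mathbf{u}_j^{(q)}\mathbf{u}_i^{(q)T})
\]
\[
= 2q\mathbf{I}_q - 2\mathbf{J}_q = 2q[\mathbf{I}_q - (1/q)\mathbf{J}_q].
\]
One generalized inverse of this last matrix is $(1/2q)\mathbf{I}_q$, so by Theorem 3.3.7 one generalized inverse of $\mathbf{X}^T\mathbf{X}$ is
\[
(\mathbf{X}^T\mathbf{X})^- = \begin{pmatrix} 1/[q(q - 1)] & \mathbf{0}_q^T \\ \mathbf{0}_q & (1/2q)\mathbf{I}_q \end{pmatrix}.
\]
Then
\[
\mathbf{P}_X = \mathbf{X}(\mathbf{X}^T\mathbf{X})^-\mathbf{X}^T
= (\mathbf{1}_{q(q-1)}, (\mathbf{v}_{ij}^T)_{i \ne j = 1, \ldots, q})
\begin{pmatrix} 1/[q(q - 1)] & \mathbf{0}_q^T \\ \mathbf{0}_q & (1/2q)\mathbf{I}_q \end{pmatrix}
\begin{pmatrix} \mathbf{1}_{q(q-1)}^T \\ (\mathbf{v}_{ij})_{i \ne j = 1, \ldots, q} \end{pmatrix}
= \frac{1}{q(q - 1)}\mathbf{J}_{q(q-1)} + \mathbf{P}_2,
\]
where the rows and columns of $\mathbf{P}_2 = (p_{2,iji'j'})$ are indexed by double subscripts $ij$ and $i'j'$, respectively, and
\[
p_{2,iji'j'} = (1/2q) \cdot (\mathbf{u}_i^{(q)} - \mathbf{u}_j^{(q)})^T(\mathbf{u}_{i'}^{(q)} - \mathbf{u}_{j'}^{(q)})
= (1/2q) \cdot (\mathbf{u}_i^{(q)T}\mathbf{u}_{i'}^{(q)} - \mathbf{u}_i^{(q)T}\mathbf{u}_{j'}^{(q)} - \mathbf{u}_j^{(q)T}\mathbf{u}_{i'}^{(q)} + \mathbf{u}_j^{(q)T}\mathbf{u}_{j'}^{(q)})
\]
\[
= \begin{cases}
\frac{1}{q} & \text{if } i = i' \text{ and } j = j', \\
-\frac{1}{q} & \text{if } i = j' \text{ and } j = i', \\
\frac{1}{2q} & \text{if } i = i' \text{ and } j \ne j', \text{ or } i \ne i' \text{ and } j = j', \\
-\frac{1}{2q} & \text{if } i = j' \text{ and } j \ne i', \text{ or } j = i' \text{ and } i \ne j', \\
0 & \text{if } i \ne i',\; i \ne j',\; j \ne i', \text{ and } j \ne j'.
\end{cases}
\]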
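Thus $\mathbf{P}_X = (p_{iji'j'})$ where
\[
p_{iji'j'} = \begin{cases}
\frac{1}{q(q-1)} + \frac{1}{q} & \text{if } i = i' \text{ and } j = j', \\
\frac{1}{q(q-1)} - \frac{1}{q} & \text{if } i = j' \text{ and } j = i', \\
\frac{1}{q(q-1)} + \frac{1}{2q} & \text{if } i = i' \text{ and } j \ne j', \text{ or } i \ne i' \text{ and } j = j', \\
\frac{1}{q(q-1)} - \frac{1}{2q} & \text{if } i = j' \text{ and } j \ne i', \text{ or } j = i' \text{ and } i \ne j', \\
\frac{1}{q(q-1)} & \text{if } i \ne i',\; i \ne j',\; j \ne i', \text{ and } j \ne j'.
\end{cases}
\]
(e) By Exercise 5.4, the model matrix for the two-factor nested model may be written as $\mathbf{X} = (\mathbf{X}_1, \mathbf{X}_2, \mathbf{X}_3)$ where $\mathbf{X}_1 = \mathbf{1}$, $\mathbf{X}_2 = \oplus_{i=1}^q \mathbf{1}_{n_{i\cdot}}$, and $\mathbf{X}_3 = \oplus_{i=1}^q \oplus_{j=1}^{m_i} \mathbf{1}_{n_{ij}}$. Observe that $\mathbf{X}_1$ and every column of $\mathbf{X}_2$ can be written as a linear combination of the columns of $\mathbf{X}_3$, so $C(\mathbf{X}_3) = C(\mathbf{X})$. By Corollary 8.1.1.1,
\[
\mathbf{P}_X = \mathbf{P}_{X_3} = \big(\oplus_{i=1}^q \oplus_{j=1}^{m_i} \mathbf{1}_{n_{ij}}\big)\Big[\big(\oplus_{i=1}^q \oplus_{j=1}^{m_i} \mathbf{1}_{n_{ij}}\big)^T\big(\oplus_{i=1}^q \oplus_{j=1}^{m_i} \mathbf{1}_{n_{ij}}\big)\Big]^-\big(\oplus_{i=1}^q \oplus_{j=1}^{m_i} \mathbf{1}_{n_{ij}}\big)^T
\]
\[
= \big(\oplus_{i=1}^q \oplus_{j=1}^{m_i} \mathbf{1}_{n_{ij}}\big)\big(\oplus_{i=1}^q \oplus_{j=1}^{m_i} (1/n_{ij})\big)\big(\oplus_{i=1}^q \oplus_{j=1}^{m_i} \mathbf{1}_{n_{ij}}^T\big)
= \oplus_{i=1}^q \oplus_{j=1}^{m_i} (1/n_{ij})\mathbf{J}_{n_{ij}}.
\]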
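A closed form such as the one in part (b) can be sanity-checked without computing a generalized inverse: a matrix is the orthogonal projector onto $C(\mathbf{X})$ when it is symmetric and idempotent, leaves $\mathbf{X}$ unchanged, and has trace equal to the rank of $\mathbf{X}$. The sketch below does this for the balanced two-way main effects model; the sizes q = 2, m = 3, r = 2 and all helper names are assumptions chosen for illustration.

```python
from fractions import Fraction

def kron(A, B):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * a for a in row] for row in A]

def eye(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def ones(rows, cols):
    return [[1] * cols for _ in range(rows)]

q, m, r = 2, 3, 2   # hypothetical numbers of levels and replicates
n = q * m * r

# Closed form from part (b): (1/mr) I_q (x) J_mr + (1/qr) J_q (x) [I_m - (1/m) J_m] (x) J_r.
centering = add(eye(m), scale(Fraction(-1, m), ones(m, m)))
P = add(scale(Fraction(1, m * r), kron(eye(q), ones(m * r, m * r))),
        scale(Fraction(1, q * r), kron(kron(ones(q, q), centering), ones(r, r))))

# Model matrix X = (1_qmr, I_q (x) 1_mr, 1_q (x) I_m (x) 1_r), assembled row by row.
blocks = [ones(n, 1), kron(eye(q), ones(m * r, 1)), kron(kron(ones(q, 1), eye(m)), ones(r, 1))]
X = [sum((blk[i] for blk in blocks), []) for i in range(n)]

symmetric = P == [list(col) for col in zip(*P)]
idempotent = matmul(P, P) == P
fixes_X = matmul(P, X) == X
trace = sum(P[i][i] for i in range(n))

print(symmetric, idempotent, fixes_X, trace)
```

The trace equals $q + m - 1$, which matches the model rank used in the overall ANOVA for this model.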
Exercise 5 Determine the overall ANOVA, including a nonmatrix expression for the model sum of squares, for each of the following models:
(a) the no-intercept simple linear regression model;
(b) the two-way main effects model with balanced data;
(c) the two-way model with interaction and balanced data;
(d) the two-way partially crossed model introduced in Example 5.1.4-1, with one observation per cell;
(e) the two-factor nested model.
Solution
(a)
Source | Rank | Sum of squares
Model | $1$ | $\mathbf{y}^T\mathbf{x}(\mathbf{x}^T\mathbf{x})^{-1}\mathbf{x}^T\mathbf{y} = \left(\sum_{i=1}^n x_i y_i\right)^2 / \sum_{i=1}^n x_i^2$
Residual | $n - 1$ | By subtraction
Total | $n$ | $\mathbf{y}^T\mathbf{y}$
(b) Using the solution to Exercise 8.4b,
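\[
\mathbf{P}_X\mathbf{y} = \{(1/mr)\mathbf{I}_q \otimes \mathbf{J}_{mr} + (1/qr)\mathbf{J}_q \otimes [\mathbf{I}_m - (1/m)\mathbf{J}_m] \otimes \mathbf{J}_r\}\mathbf{y}
= \begin{pmatrix} \bar{y}_{1\cdot\cdot} \\ \bar{y}_{2\cdot\cdot} \\ \vdots \\ \bar{y}_{q\cdot\cdot} \end{pmatrix} \otimes \mathbf{1}_{mr}
+ \mathbf{1}_q \otimes \begin{pmatrix} \bar{y}_{\cdot 1\cdot} \\ \bar{y}_{\cdot 2\cdot} \\ \vdots \\ \bar{y}_{\cdot m\cdot} \end{pmatrix} \otimes \mathbf{1}_r
- \bar{y}_{\cdot\cdot\cdot}\mathbf{1}_{qmr}.
\]
Hence $\mathbf{y}^T\mathbf{P}_X\mathbf{y} = (\mathbf{P}_X\mathbf{y})^T\mathbf{P}_X\mathbf{y} = r\sum_{i=1}^q \sum_{j=1}^m (\bar{y}_{i\cdot\cdot} + \bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot\cdot})^2$. Therefore, the overall ANOVA is as follows:
Source | Rank | Sum of squares
Model | $q + m - 1$ | $r\sum_{i=1}^q \sum_{j=1}^m (\bar{y}_{i\cdot\cdot} + \bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot\cdot})^2$
Residual | $qmr - q - m + 1$ | By subtraction
Total | $qmr$ | $\mathbf{y}^T\mathbf{y}$
(c) Using the solution to Exercise 8.4c,
\[
\mathbf{P}_X\mathbf{y} = \big[\oplus_{i=1}^q \oplus_{j=1}^m (1/r)\mathbf{J}_r\big]\mathbf{y}
= \begin{pmatrix} \bar{y}_{11\cdot} \\ \bar{y}_{12\cdot} \\ \vdots \\ \bar{y}_{qm\cdot} \end{pmatrix} \otimes \mathbf{1}_r.
\]
Hence $\mathbf{y}^T\mathbf{P}_X\mathbf{y} = (\mathbf{P}_X\mathbf{y})^T\mathbf{P}_X\mathbf{y} = r\sum_{i=1}^q \sum_{j=1}^m \bar{y}_{ij\cdot}^2$. Therefore, the overall ANOVA is as follows:
Source | Rank | Sum of squares
Model | $qm$ | $r\sum_{i=1}^q \sum_{j=1}^m \bar{y}_{ij\cdot}^2$
Residual | $qm(r - 1)$ | By subtraction
Total | $qmr$ | $\mathbf{y}^T\mathbf{y}$
(d) Using the solution to Exercise 8.4d,
\[
\mathbf{P}_X\mathbf{y} = \big(\mathbf{1}_{q(q-1)}, (\mathbf{v}_{ij}^T)_{i \ne j = 1, \ldots, q}\big)
\begin{pmatrix} 1/[q(q - 1)] & \mathbf{0}_q^T \\ \mathbf{0}_q & (1/2q)\mathbf{I}_q \end{pmatrix}
\big(\mathbf{1}_{q(q-1)}, (\mathbf{v}_{ij}^T)_{i \ne j = 1, \ldots, q}\big)^T\mathbf{y}
\]
\[
= \Big( \{1/[q(q - 1)]\}\mathbf{1}_{q(q-1)}, \; (1/2q)(\mathbf{v}_{ij}^T)_{i \ne j = 1, \ldots, q} \Big)
\begin{pmatrix} y_{\cdot\cdot} \\ y_{1\cdot} - y_{\cdot 1} \\ y_{2\cdot} - y_{\cdot 2} \\ \vdots \\ y_{q\cdot} - y_{\cdot q} \end{pmatrix}
\]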
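\[
= \begin{pmatrix}
\bar{y}_{\cdot\cdot} + [(\bar{y}_{1\cdot} - \bar{y}_{\cdot 1})/2] - [(\bar{y}_{2\cdot} - \bar{y}_{\cdot 2})/2] \\
\bar{y}_{\cdot\cdot} + [(\bar{y}_{1\cdot} - \bar{y}_{\cdot 1})/2] - [(\bar{y}_{3\cdot} - \bar{y}_{\cdot 3})/2] \\
\vdots \\
\bar{y}_{\cdot\cdot} + [(\bar{y}_{q-1\cdot} - \bar{y}_{\cdot q-1})/2] - [(\bar{y}_{q\cdot} - \bar{y}_{\cdot q})/2]
\end{pmatrix}.
\]
Hence $\mathbf{y}^T\mathbf{P}_X\mathbf{y} = (\mathbf{P}_X\mathbf{y})^T\mathbf{P}_X\mathbf{y} = \sum_{i=1}^q \sum_{j \ne i} [\bar{y}_{\cdot\cdot} + (1/2)(\bar{y}_{i\cdot} - \bar{y}_{\cdot i}) - (1/2)(\bar{y}_{j\cdot} - \bar{y}_{\cdot j})]^2$. Therefore, the overall ANOVA is as follows:
Source | Rank | Sum of squares
Model | $q$ | $\sum_{i=1}^q \sum_{j \ne i} [\bar{y}_{\cdot\cdot} + (1/2)(\bar{y}_{i\cdot} - \bar{y}_{\cdot i}) - (1/2)(\bar{y}_{j\cdot} - \bar{y}_{\cdot j})]^2$
Residual | $q(q - 2)$ | By subtraction
Total | $q(q - 1)$ | $\mathbf{y}^T\mathbf{y}$
(e) Using the solution to Exercise 8.4e,
\[
\mathbf{P}_X\mathbf{y} = \big[\oplus_{i=1}^q \oplus_{j=1}^{m_i} (1/n_{ij})\mathbf{J}_{n_{ij}}\big]\mathbf{y} = \oplus_{i=1}^q \oplus_{j=1}^{m_i} \mathbf{1}_{n_{ij}}\bar{y}_{ij\cdot}.
\]
Hence
\[
\mathbf{y}^T\mathbf{P}_X\mathbf{y} = (\mathbf{P}_X\mathbf{y})^T(\mathbf{P}_X\mathbf{y}) = \big(\oplus_{i=1}^q \oplus_{j=1}^{m_i} \mathbf{1}_{n_{ij}}\bar{y}_{ij\cdot}\big)^T\big(\oplus_{i=1}^q \oplus_{j=1}^{m_i} \mathbf{1}_{n_{ij}}\bar{y}_{ij\cdot}\big) = \sum_{i=1}^q \sum_{j=1}^{m_i} n_{ij}\bar{y}_{ij\cdot}^2.
\]
Therefore, the overall ANOVA is as follows:
Source | Rank | Sum of squares
Model | $\sum_{i=1}^q m_i$ | $\sum_{i=1}^q \sum_{j=1}^{m_i} n_{ij}\bar{y}_{ij\cdot}^2$
Residual | $\sum_{i=1}^q \sum_{j=1}^{m_i} n_{ij} - \sum_{i=1}^q m_i$ | By subtraction
Total | $\sum_{i=1}^q \sum_{j=1}^{m_i} n_{ij}$ | $\mathbf{y}^T\mathbf{y}$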
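As a small numerical illustration of the overall ANOVA in part (a), the sketch below computes the model sum of squares from the nonmatrix expression, obtains the residual sum of squares by subtraction, and compares it with the sum of squared fitted residuals. The data values are hypothetical.

```python
from fractions import Fraction

x = [1, 2, 3, 4]   # hypothetical regressor values
y = [2, 3, 5, 8]   # hypothetical responses

sxy = sum(xi * yi for xi, yi in zip(x, y))
sxx = sum(xi * xi for xi in x)

# Overall ANOVA for the no-intercept simple linear regression model.
model_ss = Fraction(sxy ** 2, sxx)          # (sum x_i y_i)^2 / sum x_i^2, rank 1
total_ss = sum(yi * yi for yi in y)         # y'y, rank n
residual_ss = total_ss - model_ss           # by subtraction, rank n - 1

# Cross-check: residual SS computed directly from the fitted residuals.
beta_hat = Fraction(sxy, sxx)
residual_ss_direct = sum((yi - beta_hat * xi) ** 2 for xi, yi in zip(x, y))

print(model_ss, residual_ss, total_ss)
```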
Exercise 6 For the special case of a (Gauss–Markov) simple linear regression model with even sample size n and n/2 observations at each of x = −n and x = n, obtain the variances of the fitted residuals and the correlations among them.
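Solution In this situation, $\bar{x} = 0$ and $SXX = \sum_{i=1}^n n^2 = n^3$. Specializing the general expressions for the elements of $\mathbf{P}_X$ provided in Example 8.1-2, we obtain
\[
p_{ij} = \begin{cases}
\frac{1}{n} + \frac{n^2}{n^3} = \frac{2}{n} & \text{if } x_i = x_j, \\
\frac{1}{n} - \frac{n^2}{n^3} = 0 & \text{if } x_i = -x_j.
\end{cases}
\]
So by Theorem 8.1.7d, $\mathrm{var}(\hat{e}_i) = \sigma^2[1 - (2/n)]$ and for $i \ne j$,
\[
\mathrm{corr}(\hat{e}_i, \hat{e}_j) = \begin{cases}
\frac{-2/n}{1 - (2/n)} = \frac{-2}{n - 2} & \text{if } x_i = x_j, \\
0 & \text{if } x_i = -x_j.
\end{cases}
\]
Exercise 7 For the (Gauss–Markov) no-intercept simple linear regression model, obtain $\mathrm{var}(\hat{e}_i)$ and $\mathrm{corr}(\hat{e}_i, \hat{e}_j)$.
Solution $\mathbf{P}_x = \mathbf{x}(\mathbf{x}^T\mathbf{x})^{-1}\mathbf{x}^T = \left(1/\sum_{i=1}^n x_i^2\right)\mathbf{x}\mathbf{x}^T$, so by Theorem 8.1.7d, $\mathrm{var}(\hat{e}_i) = \sigma^2\left(1 - x_i^2/\sum_{j=1}^n x_j^2\right)$ and
\[
\mathrm{corr}(\hat{e}_i, \hat{e}_j) = \frac{-x_i x_j/\sum_{k=1}^n x_k^2}{\sqrt{\left(1 - x_i^2/\sum_{k=1}^n x_k^2\right)\left(1 - x_j^2/\sum_{k=1}^n x_k^2\right)}}.
\]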
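The values in the solution to Exercise 6 can be reproduced for a particular sample size. This sketch assumes n = 6 (any even n would do) and uses the standard simple linear regression expression $p_{ij} = 1/n + (x_i - \bar{x})(x_j - \bar{x})/SXX$ with $\bar{x} = 0$; the variable names are illustrative.

```python
from fractions import Fraction

n = 6  # hypothetical even sample size
x = [-n] * (n // 2) + [n] * (n // 2)

sxx = sum(xi * xi for xi in x)   # equals n**3 because x-bar is 0

# Hat matrix elements with x-bar = 0: p_ij = 1/n + x_i x_j / SXX.
P = [[Fraction(1, n) + Fraction(xi * xj, sxx) for xj in x] for xi in x]

def corr(i, j):
    # Every p_ii equals 2/n here, so sqrt((1 - p_ii)(1 - p_jj)) reduces to 1 - p_ii.
    return -P[i][j] / (1 - P[i][i])

variance_factor = 1 - P[0][0]     # var(e_hat_i) / sigma^2
same_side = corr(0, 1)            # two observations at x = -n
opposite_sides = corr(0, n - 1)   # one observation at x = -n, one at x = n

print(variance_factor, same_side, opposite_sides)
```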
Exercise 8 Consider the Gauss–Markov model {y, Xβ, σ 2 I}. Prove that if XT X = kI for some k > 0, and rows i and j of X are orthogonal, then corr(eˆi , eˆj ) = 0. Solution Let xTi represent row i of X. By Theorem 8.1.7d, cov(eˆi , eˆj ) = −σ 2 xTi (XT X)− xj = −σ 2 xTi (kI)−1 xj = −(σ 2 /k)xTi xj = (σ 2 /k) · 0 = 0. Exercise 9 Under the Gauss–Markov model {y, Xβ, σ 2 I} with n > p∗ and excess kurtosis matrix 0, find the estimator of σ 2 that minimizes the mean squared error within the class of estimators of the form yT (I − PX )y/k, where k > 0. How does the minimized mean squared error compare to the mean squared error of σˆ 2 specified in Theorem 8.2.3, which is 2σ 4 /(n − p∗ )? Solution By Theorem 4.2.4 and Corollary 4.2.6.2, E[yT (I−PX )y] = (Xβ)T (I−PX )Xβ + tr[(I−PX )(σ 2 I)] = 0+σ 2 tr(I−PX ) = σ 2 (n−p∗ ) and var[yT (I − PX )y] = 4(Xβ)T (I − PX )vec(I − PX ) + 2tr[(I − PX )(σ 2 I)(I − PX )(σ 2 I)] +4(Xβ)T (I − PX )(σ 2 I)(I − PX )Xβ = 2σ 4 (n − p ∗ ).
Therefore, writing $\mathrm{MSE}(\hat{\sigma}^2(k))$ for the mean squared error of $\hat{\sigma}^2(k) \equiv [\mathbf{y}^T(\mathbf{I} - \mathbf{P}_X)\mathbf{y}]/k$, we obtain
\[
\mathrm{MSE}(\hat{\sigma}^2(k)) = \mathrm{var}(\hat{\sigma}^2(k)) + [E(\hat{\sigma}^2(k)) - \sigma^2]^2
= \frac{1}{k^2}[2\sigma^4(n - p^*)] + \left[\frac{\sigma^2(n - p^*)}{k} - \sigma^2\right]^2
\]
\[
= \frac{\sigma^4}{k^2}[2(n - p^*) + (n - p^*)^2] - \frac{2\sigma^4}{k}(n - p^*) + \sigma^4
= \sigma^4[2(n - p^*)h^2 + (n - p^*)^2h^2 - 2(n - p^*)h + 1],
\]
where $h \equiv (1/k)$. Differentiating this last expression with respect to $h$ and setting the result equal to 0, we obtain a stationary point at $h = 1/(n - p^* + 2)$. Because the second derivative is positive, this point minimizes the mean squared error. Equivalently, $\mathrm{MSE}(\hat{\sigma}^2(k))$ is minimized at $k = n - p^* + 2$. Then,
\[
\mathrm{MSE}(\hat{\sigma}^2(k)) = \frac{1}{(n - p^* + 2)^2}[2\sigma^4(n - p^*)] + \left[\frac{\sigma^2(n - p^*)}{n - p^* + 2} - \sigma^2\right]^2
= \frac{\sigma^4}{(n - p^* + 2)^2}[2(n - p^*) + 4]
= \frac{2\sigma^4}{n - p^* + 2}.
\]
This mean squared error is smaller than that of $\hat{\sigma}^2$ by a multiplicative factor of $(n - p^*)/(n - p^* + 2)$.
Exercise 10 Define a quadratic estimator of $\sigma^2$ associated with the Gauss–Markov model $\{\mathbf{y}, \mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I}\}$ to be any quadratic form $\mathbf{y}^T\mathbf{A}\mathbf{y}$, where $\mathbf{A}$ is a positive definite matrix.
(a) Prove that a quadratic estimator $\mathbf{y}^T\mathbf{A}\mathbf{y}$ of $\sigma^2$ is unbiased under the specified Gauss–Markov model if and only if $\mathbf{A}\mathbf{X} = \mathbf{0}$ and $\mathrm{tr}(\mathbf{A}) = 1$. (An estimator $t(\mathbf{y})$ is said to be unbiased for $\sigma^2$ under a Gauss–Markov model if $E[t(\mathbf{y})] = \sigma^2$ for all $\boldsymbol{\beta} \in \mathbb{R}^p$ and all $\sigma^2 > 0$ under that model.)
(b) Prove the following extension of Theorem 8.2.2: Let $\mathbf{c}^T\hat{\boldsymbol{\beta}}$ be the least squares estimator of an estimable function associated with the model $\{\mathbf{y}, \mathbf{X}\boldsymbol{\beta}\}$, suppose that $n > p^*$, and let $\tilde{\sigma}^2$ be a quadratic unbiased estimator of $\sigma^2$ under the Gauss–Markov model $\{\mathbf{y}, \mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I}\}$. If $\mathbf{e}$ has skewness matrix $\mathbf{0}$, then $\mathrm{cov}(\mathbf{c}^T\hat{\boldsymbol{\beta}}, \tilde{\sigma}^2) = 0$.
(c) Determine as simple an expression as possible for the variance of a quadratic unbiased estimator of $\sigma^2$ under a Gauss–Markov model for which the skewness matrix of $\mathbf{e}$ equals $\mathbf{0}$ and the excess kurtosis matrix of $\mathbf{e}$ equals $\mathbf{0}$.
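Solution
(a) By Theorem 4.2.4, $E(\mathbf{y}^T\mathbf{A}\mathbf{y}) = (\mathbf{X}\boldsymbol{\beta})^T\mathbf{A}\mathbf{X}\boldsymbol{\beta} + \mathrm{tr}[\mathbf{A}(\sigma^2\mathbf{I})] = \boldsymbol{\beta}^T\mathbf{X}^T\mathbf{A}\mathbf{X}\boldsymbol{\beta} + \sigma^2\mathrm{tr}(\mathbf{A})$. Thus, $E(\mathbf{y}^T\mathbf{A}\mathbf{y}) = \sigma^2$ for all $\boldsymbol{\beta} \in \mathbb{R}^p$ and all $\sigma^2 > 0$ if and only if $\mathbf{X}^T\mathbf{A}\mathbf{X} = \mathbf{0}$ and $\mathrm{tr}(\mathbf{A}) = 1$. The condition $\mathbf{X}^T\mathbf{A}\mathbf{X} = \mathbf{0}$ may be re-expressed as $\mathbf{X}^T\mathbf{C}^T\mathbf{C}\mathbf{X} = \mathbf{0}$ (where $\mathbf{A} = \mathbf{C}^T\mathbf{C}$ by the nonnegative definiteness of $\mathbf{A}$), or equivalently as either $\mathbf{C}\mathbf{X} = \mathbf{0}$ or $\mathbf{A}\mathbf{X} = \mathbf{0}$.
(b) By Corollary 4.2.5.1 and part (a),
\[
\mathrm{cov}(\mathbf{c}^T\hat{\boldsymbol{\beta}}, \tilde{\sigma}^2) = 2\mathbf{c}^T(\mathbf{X}^T\mathbf{X})^-\mathbf{X}^T(\sigma^2\mathbf{I})\mathbf{A}\mathbf{X}\boldsymbol{\beta} = 0.
\]
(c) By Corollary 4.2.6.3,
\[
\mathrm{var}(\mathbf{y}^T\mathbf{A}\mathbf{y}) = 2\mathrm{tr}[\mathbf{A}(\sigma^2\mathbf{I})\mathbf{A}(\sigma^2\mathbf{I})] + 4(\mathbf{X}\boldsymbol{\beta})^T\mathbf{A}(\sigma^2\mathbf{I})\mathbf{A}\mathbf{X}\boldsymbol{\beta} = 2\sigma^4\mathrm{tr}(\mathbf{A}^2).
\]
Exercise 11 Verify the alternative expressions for the five influence diagnostics described in Methodological Interlude #3.
Solution
\[
\hat{e}_{i,-i} = y_i - \mathbf{x}_i^T\hat{\boldsymbol{\beta}}_{-i}
= y_i - \mathbf{x}_i^T\left[\hat{\boldsymbol{\beta}} - \left(\frac{\hat{e}_i}{1 - p_{ii}}\right)(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}_i\right]
= y_i - \mathbf{x}_i^T\hat{\boldsymbol{\beta}} + \left(\frac{\hat{e}_i}{1 - p_{ii}}\right)\mathbf{x}_i^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}_i
= \hat{e}_i + \frac{p_{ii}\hat{e}_i}{1 - p_{ii}}
= \frac{\hat{e}_i}{1 - p_{ii}},
\]
\[
\mathrm{DFBETAS}_{j,i} = \frac{\hat{\beta}_j - \hat{\beta}_{j,-i}}{\hat{\sigma}_{-i}\sqrt{[(\mathbf{X}^T\mathbf{X})^{-1}]_{jj}}}
= \left(\frac{\hat{e}_i}{1 - p_{ii}}\right)\frac{[(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}_i]_j}{\hat{\sigma}_{-i}\sqrt{[(\mathbf{X}^T\mathbf{X})^{-1}]_{jj}}},
\]
\[
\mathrm{DFFITS}_i = \frac{\hat{y}_i - \hat{y}_{i,-i}}{\hat{\sigma}_{-i}\sqrt{p_{ii}}}
= \frac{[\mathbf{X}(\hat{\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}}_{-i})]_i}{\hat{\sigma}_{-i}\sqrt{p_{ii}}}
= \frac{\hat{e}_i[\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}_i]_i}{(1 - p_{ii})\hat{\sigma}_{-i}\sqrt{p_{ii}}}
= \frac{\hat{e}_i\sqrt{p_{ii}}}{\hat{\sigma}_{-i}(1 - p_{ii})},
\]
\[
\text{Cook's } D_i = \frac{(\hat{\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}}_{-i})^T\mathbf{X}^T\mathbf{X}(\hat{\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}}_{-i})}{p\hat{\sigma}^2}
= \left(\frac{\hat{e}_i}{1 - p_{ii}}\right)^2\frac{\mathbf{x}_i^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}_i}{p\hat{\sigma}^2}
= \frac{\hat{e}_i^2 p_{ii}}{(1 - p_{ii})^2 p\hat{\sigma}^2},
\]
\[
\mathrm{COVRATIO}_i = \frac{|(\mathbf{X}_{-i}^T\mathbf{X}_{-i})^{-1}\hat{\sigma}_{-i}^2|}{|(\mathbf{X}^T\mathbf{X})^{-1}\hat{\sigma}^2|}
= \left(\frac{\hat{\sigma}_{-i}^2}{\hat{\sigma}^2}\right)^p\frac{|\mathbf{X}^T\mathbf{X}|}{|\mathbf{X}_{-i}^T\mathbf{X}_{-i}|}
= \left(\frac{\hat{\sigma}_{-i}^2}{\hat{\sigma}^2}\right)^p\frac{|\mathbf{X}^T\mathbf{X}|}{|\mathbf{X}^T\mathbf{X}|(1 - p_{ii})}
= \frac{1}{1 - p_{ii}}\left(\frac{\hat{\sigma}_{-i}^2}{\hat{\sigma}^2}\right)^p,
\]
where we used Theorem 2.11.8 to obtain the penultimate expression for $\mathrm{COVRATIO}_i$.
Exercise 12 Find $E(\hat{e}_{i,-i})$, $\mathrm{var}(\hat{e}_{i,-i})$, and $\mathrm{corr}(\hat{e}_{i,-i}, \hat{e}_{j,-j})$ (for $i \ne j$) under the Gauss–Markov model $\{\mathbf{y}, \mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I}\}$.
Solution Using Theorem 8.1.7b, d,
\[
E(\hat{e}_{i,-i}) = E\left(\frac{\hat{e}_i}{1 - p_{ii}}\right) = \frac{E(\hat{e}_i)}{1 - p_{ii}} = 0,
\]
\[
\mathrm{var}(\hat{e}_{i,-i}) = \mathrm{var}\left(\frac{\hat{e}_i}{1 - p_{ii}}\right) = \frac{\mathrm{var}(\hat{e}_i)}{(1 - p_{ii})^2} = \frac{\sigma^2}{1 - p_{ii}},
\]
and for $i \ne j$,
\[
\mathrm{corr}(\hat{e}_{i,-i}, \hat{e}_{j,-j}) = \mathrm{corr}\left(\frac{\hat{e}_i}{1 - p_{ii}}, \frac{\hat{e}_j}{1 - p_{jj}}\right)
= \frac{\mathrm{cov}\left(\frac{\hat{e}_i}{1 - p_{ii}}, \frac{\hat{e}_j}{1 - p_{jj}}\right)}{\sqrt{\mathrm{var}\left(\frac{\hat{e}_i}{1 - p_{ii}}\right)\mathrm{var}\left(\frac{\hat{e}_j}{1 - p_{jj}}\right)}}
= \frac{\left(\frac{1}{1 - p_{ii}}\right)\left(\frac{1}{1 - p_{jj}}\right)(-\sigma^2 p_{ij})}{\sqrt{\left(\frac{\sigma^2}{1 - p_{ii}}\right)\left(\frac{\sigma^2}{1 - p_{jj}}\right)}}
= \frac{-p_{ij}}{\sqrt{(1 - p_{ii})(1 - p_{jj})}}.
\]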
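The identity $\hat{e}_{i,-i} = \hat{e}_i/(1 - p_{ii})$ that underlies Exercises 11 and 12 can be checked by actually refitting the model with each case removed. The sketch below does so for a simple linear regression with hypothetical data, using the standard leverage expression $p_{ii} = 1/n + (x_i - \bar{x})^2/SXX$; the function name `fit` is illustrative.

```python
from fractions import Fraction

def fit(xs, ys):
    """Exact least squares intercept and slope for simple linear regression."""
    n = len(xs)
    xbar = Fraction(sum(xs), n)
    ybar = Fraction(sum(ys), n)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope, xbar, sxx

x = [0, 1, 2, 4]   # hypothetical data
y = [1, 3, 2, 6]

b0, b1, xbar, sxx = fit(x, y)
n = len(x)

deleted = []    # y_i minus the prediction from the fit that omits case i
shortcut = []   # e_hat_i / (1 - p_ii), computed from the full fit only
for i in range(n):
    e_i = y[i] - (b0 + b1 * x[i])
    p_ii = Fraction(1, n) + (x[i] - xbar) ** 2 / sxx
    a0, a1, _, _ = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    deleted.append(y[i] - (a0 + a1 * x[i]))
    shortcut.append(e_i / (1 - p_ii))

print(deleted)
print(shortcut)
```

Because the arithmetic is exact, the two lists agree element by element rather than merely to rounding error.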
9
Least Squares Estimation and ANOVA for Partitioned Models
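This chapter presents exercises on least squares estimation and ANOVA for partitioned linear models and provides solutions to those exercises.
Exercise 1 Prove Theorem 9.1.1: For the orthogonal projection matrices $\mathbf{P}_{12}$ and $\mathbf{P}_1$ corresponding to the ordered two-part model $\{\mathbf{y}, \mathbf{X}_1\boldsymbol{\beta}_1 + \mathbf{X}_2\boldsymbol{\beta}_2\}$ and its submodel $\{\mathbf{y}, \mathbf{X}_1\boldsymbol{\beta}_1\}$, respectively, the following results hold:
(a) $\mathbf{P}_{12}\mathbf{P}_1 = \mathbf{P}_1$ and $\mathbf{P}_1\mathbf{P}_{12} = \mathbf{P}_1$;
(b) $\mathbf{P}_1(\mathbf{P}_{12} - \mathbf{P}_1) = \mathbf{0}$;
(c) $\mathbf{P}_1(\mathbf{I} - \mathbf{P}_{12}) = \mathbf{0}$;
(d) $(\mathbf{P}_{12} - \mathbf{P}_1)(\mathbf{I} - \mathbf{P}_{12}) = \mathbf{0}$;
(e) $(\mathbf{P}_{12} - \mathbf{P}_1)(\mathbf{P}_{12} - \mathbf{P}_1) = \mathbf{P}_{12} - \mathbf{P}_1$;
(f) $\mathbf{P}_{12} - \mathbf{P}_1$ is symmetric and $\mathrm{rank}(\mathbf{P}_{12} - \mathbf{P}_1) = \mathrm{rank}(\mathbf{X}_1, \mathbf{X}_2) - \mathrm{rank}(\mathbf{X}_1)$.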
Solution P12 P1 = P12 X1 (XT1 X1 )− XT1 = X1 (XT1 X1 )− XT1 = P1 , upon which it follows that P1 = PT1 = (P12 P1 )T = PT1 PT12 = P1 P12 . This proves part (a). Then by part (a), P1 (P12 − P1 ) = P1 P12 − P1 P1 = P1 − P1 = 0, which proves part (b). Similarly, P1 (I − P12 ) = P1 − P1 P12 = P1 − P1 = 0, (P12 − P1 )(I − P12 ) = P12 −P1 −P12 P12 +P1 P12 = P12 −P1 −P12 +P1 = 0, and (P12 −P1 )(P12 −P1 ) = P12 P12 − P1 P12 − P12 P1 + P1 P1 = P12 − P1 − P1 + P1 = P12 − P1 , proving parts (c), (d), and (e). Theorem 2.7.1b implies that P12 − P1 is symmetric, whereas the idempotency of P12 , P1 , and their difference together with Theorem 2.12.2 imply that rank(P12 − P1 ) = tr(P12 − P1 ) = tr(P12 ) − tr(P1 ) = rank(X1 , X2 ) − rank(X1 ). This establishes part (f).
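The six properties in Theorem 9.1.1 can be confirmed on a small example. The sketch below assumes a hypothetical two-part model with an intercept block and one regressor, both of full column rank, so that an ordinary inverse can stand in for the generalized inverse; all helper names are illustrative.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse(A):
    """Exact inverse of a nonsingular matrix by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next(i for i in range(c, n) if M[i][c] != 0)
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for i in range(n):
            if i != c:
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[c])]
    return [row[n:] for row in M]

def projector(X):
    """Orthogonal projector X (X'X)^{-1} X' for a full-column-rank X."""
    Xt = transpose(X)
    return matmul(matmul(X, inverse(matmul(Xt, X))), Xt)

X1 = [[1], [1], [1], [1]]                   # hypothetical first block: intercept
X12 = [[1, 1], [1, 2], [1, 4], [1, 7]]      # (X1, X2) with a hypothetical regressor

P1 = projector(X1)
P12 = projector(X12)
D = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(P12, P1)]        # P12 - P1
I_minus_P12 = [[int(i == j) - P12[i][j] for j in range(4)] for i in range(4)]
zero = [[0] * 4 for _ in range(4)]

checks = {
    "a": matmul(P12, P1) == P1 and matmul(P1, P12) == P1,
    "b": matmul(P1, D) == zero,
    "c": matmul(P1, I_minus_P12) == zero,
    "d": matmul(D, I_minus_P12) == zero,
    "e": matmul(D, D) == D,
    # rank(X1, X2) - rank(X1) = 2 - 1 = 1, and rank equals trace for an idempotent matrix
    "f": D == transpose(D) and sum(D[i][i] for i in range(4)) == 1,
}
print(checks)
```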
© Springer Nature Switzerland AG 2020 D. L. Zimmerman, Linear Model Theory, https://doi.org/10.1007/978-3-030-52074-8_9
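Exercise 2 Prove Theorem 9.1.3: For the matrices $\mathbf{P}_{12\cdots j}$ ($j = 1, \ldots, k$) given by (9.4) and their successive differences, the following properties hold:
(a) $\mathbf{P}_{12\cdots j}$ is symmetric and idempotent, and $\mathrm{rank}(\mathbf{P}_{12\cdots j}) = \mathrm{rank}(\mathbf{X}_1, \ldots, \mathbf{X}_j)$ for $j = 1, \ldots, k$;
(b) $\mathbf{P}_{12\cdots j}\mathbf{X}_{j'} = \mathbf{X}_{j'}$ for $1 \le j' \le j$, $j = 1, \ldots, k$;
(c) $\mathbf{P}_{12\cdots j}\mathbf{P}_{12\cdots j'} = \mathbf{P}_{12\cdots j'}$ and $\mathbf{P}_{12\cdots j'}\mathbf{P}_{12\cdots j} = \mathbf{P}_{12\cdots j'}$ for $j' < j = 2, \ldots, k$;
(d) $\mathbf{P}_{12\cdots j}(\mathbf{I} - \mathbf{P}_{12\cdots k}) = \mathbf{0}$ for $j = 1, \ldots, k$ and $(\mathbf{P}_{12\cdots j} - \mathbf{P}_{12\cdots j-1})(\mathbf{I} - \mathbf{P}_{12\cdots k}) = \mathbf{0}$ for $j = 2, \ldots, k$;
(e) $\mathbf{P}_{12\cdots j-1}(\mathbf{P}_{12\cdots j} - \mathbf{P}_{12\cdots j-1}) = \mathbf{0}$ for $j = 2, \ldots, k$ and $(\mathbf{P}_{12\cdots j} - \mathbf{P}_{12\cdots j-1})(\mathbf{P}_{12\cdots j'} - \mathbf{P}_{12\cdots j'-1}) = \mathbf{0}$ for all $j \ne j'$;
(f) $(\mathbf{P}_{12\cdots j} - \mathbf{P}_{12\cdots j-1})(\mathbf{P}_{12\cdots j} - \mathbf{P}_{12\cdots j-1}) = \mathbf{P}_{12\cdots j} - \mathbf{P}_{12\cdots j-1}$ for $j = 2, \ldots, k$;
(g) $\mathbf{P}_{12\cdots j} - \mathbf{P}_{12\cdots j-1}$ is symmetric and $\mathrm{rank}(\mathbf{P}_{12\cdots j} - \mathbf{P}_{12\cdots j-1}) = \mathrm{rank}(\mathbf{X}_1, \ldots, \mathbf{X}_j) - \mathrm{rank}(\mathbf{X}_1, \ldots, \mathbf{X}_{j-1})$ for $j = 2, \ldots, k$;
(h) $\mathbf{P}_{12\cdots j} - \mathbf{P}_{12\cdots j-1}$ is the orthogonal projection matrix onto $C[(\mathbf{I} - \mathbf{P}_{12\cdots j-1})\mathbf{X}_{12\cdots j}]$, which is the orthogonal complement of $C(\mathbf{X}_{12\cdots j-1})$ relative to $C(\mathbf{X}_{12\cdots j})$ for $j = 2, \ldots, k$.
Solution Parts (a) and (b) are merely restatements of various parts of Theorem 8.1.2a as they apply to $(\mathbf{X}_1, \ldots, \mathbf{X}_j)$. Part (c) may be established using part (b) by observing that for $j' < j$,
\[
\mathbf{P}_{12\cdots j}\mathbf{P}_{12\cdots j'} = \mathbf{P}_{12\cdots j}(\mathbf{X}_1, \ldots, \mathbf{X}_{j'})[(\mathbf{X}_1, \ldots, \mathbf{X}_{j'})^T(\mathbf{X}_1, \ldots, \mathbf{X}_{j'})]^-(\mathbf{X}_1, \ldots, \mathbf{X}_{j'})^T
= (\mathbf{X}_1, \ldots, \mathbf{X}_{j'})[(\mathbf{X}_1, \ldots, \mathbf{X}_{j'})^T(\mathbf{X}_1, \ldots, \mathbf{X}_{j'})]^-(\mathbf{X}_1, \ldots, \mathbf{X}_{j'})^T
= \mathbf{P}_{12\cdots j'},
\]
upon which it follows that $\mathbf{P}_{12\cdots j'} = \mathbf{P}_{12\cdots j'}^T = (\mathbf{P}_{12\cdots j}\mathbf{P}_{12\cdots j'})^T = \mathbf{P}_{12\cdots j'}^T\mathbf{P}_{12\cdots j}^T = \mathbf{P}_{12\cdots j'}\mathbf{P}_{12\cdots j}$. Parts (d), (e), and (f) follow easily from part (c). We may obtain part (g) using part (a) and an argument very similar to that used to prove part (f) of Theorem 9.1.1. Finally, the proof of part (h) is exactly like that of Theorem 9.1.2, with $\mathbf{P}_{12\cdots j}$ and $\mathbf{P}_{12\cdots j-1}$, respectively, substituted for $\mathbf{P}_{12}$ and $\mathbf{P}_1$.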
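Exercise 3 For the ordered $k$-part Gauss–Markov model $\{\mathbf{y}, \sum_{l=1}^k \mathbf{X}_l\boldsymbol{\beta}_l, \sigma^2\mathbf{I}\}$, show that
\[
E[\mathbf{y}^T(\mathbf{P}_{12\cdots j} - \mathbf{P}_{12\cdots j-1})\mathbf{y}] = \Big(\sum_{l=j}^k \mathbf{X}_l\boldsymbol{\beta}_l\Big)^T(\mathbf{P}_{12\cdots j} - \mathbf{P}_{12\cdots j-1})\Big(\sum_{l=j}^k \mathbf{X}_l\boldsymbol{\beta}_l\Big) + \sigma^2[\mathrm{rank}(\mathbf{X}_1, \ldots, \mathbf{X}_j) - \mathrm{rank}(\mathbf{X}_1, \ldots, \mathbf{X}_{j-1})].
\]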
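Solution By Theorem 4.2.4,
\[
E[\mathbf{y}^T(\mathbf{P}_{12\cdots j} - \mathbf{P}_{12\cdots j-1})\mathbf{y}] = \Big(\sum_{l=1}^k \mathbf{X}_l\boldsymbol{\beta}_l\Big)^T(\mathbf{P}_{12\cdots j} - \mathbf{P}_{12\cdots j-1})\Big(\sum_{l=1}^k \mathbf{X}_l\boldsymbol{\beta}_l\Big) + \mathrm{tr}[(\mathbf{P}_{12\cdots j} - \mathbf{P}_{12\cdots j-1})(\sigma^2\mathbf{I})]
\]
\[
= \Big(\sum_{l=j}^k \mathbf{X}_l\boldsymbol{\beta}_l\Big)^T(\mathbf{P}_{12\cdots j} - \mathbf{P}_{12\cdots j-1})\Big(\sum_{l=j}^k \mathbf{X}_l\boldsymbol{\beta}_l\Big) + \sigma^2[\mathrm{rank}(\mathbf{X}_1, \ldots, \mathbf{X}_j) - \mathrm{rank}(\mathbf{X}_1, \ldots, \mathbf{X}_{j-1})]
\]
where for the last equality we used Theorem 9.1.3b, e.
Exercise 4 Verify the expressions for the expected mean squares in the corrected sequential ANOVA for the two-way main effects model with one observation per cell given in Example 9.3.4-3.
Solution
\[
\bar{y}_{i\cdot} = (1/m)\sum_{j=1}^m y_{ij} \;\Rightarrow\; E(\bar{y}_{i\cdot}) = (1/m)\sum_{j=1}^m (\mu + \alpha_i + \gamma_j) = \mu + \alpha_i + \bar{\gamma}.
\]
Similarly,
\[
\bar{y}_{\cdot j} = (1/q)\sum_{i=1}^q y_{ij} \;\Rightarrow\; E(\bar{y}_{\cdot j}) = (1/q)\sum_{i=1}^q (\mu + \alpha_i + \gamma_j) = \mu + \bar{\alpha} + \gamma_j
\]
and
\[
\bar{y}_{\cdot\cdot} = (1/qm)\sum_{i=1}^q \sum_{j=1}^m y_{ij} \;\Rightarrow\; E(\bar{y}_{\cdot\cdot}) = (1/qm)\sum_{i=1}^q \sum_{j=1}^m (\mu + \alpha_i + \gamma_j) = \mu + \bar{\alpha} + \bar{\gamma}.
\]
Thus,
\[
m\sum_{i=1}^q [E(\bar{y}_{i\cdot}) - E(\bar{y}_{\cdot\cdot})]^2 = m\sum_{i=1}^q (\alpha_i - \bar{\alpha})^2
\]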
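and
\[
q\sum_{j=1}^m [E(\bar{y}_{\cdot j}) - E(\bar{y}_{\cdot\cdot})]^2 = q\sum_{j=1}^m (\gamma_j - \bar{\gamma})^2,
\]
so the expected mean squares for Factor A and Factor B are, respectively,
\[
\sigma^2 + \frac{m}{q - 1}\sum_{i=1}^q (\alpha_i - \bar{\alpha})^2
\]
and
\[
\sigma^2 + \frac{q}{m - 1}\sum_{j=1}^m (\gamma_j - \bar{\gamma})^2.
\]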
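The substitution step in this solution (replacing each sample mean by its expectation) can be illustrated numerically. In the sketch below the parameter values are hypothetical; the code computes the terms added to $\sigma^2$ in the two expected mean squares, first from the expected cell responses and then from the closed-form expressions in $\alpha_i$ and $\gamma_j$.

```python
from fractions import Fraction

mu = 10                 # hypothetical parameter values
alpha = [1, 2, 6]
gamma = [0, 4]
q, m = len(alpha), len(gamma)

# Expected cell responses E(y_ij) = mu + alpha_i + gamma_j.
Ey = [[mu + a + g for g in gamma] for a in alpha]

row_means = [Fraction(sum(row), m) for row in Ey]
col_means = [Fraction(sum(Ey[i][j] for i in range(q)), q) for j in range(m)]
grand_mean = Fraction(sum(map(sum, Ey)), q * m)

# Terms added to sigma^2, obtained by substituting expectations for sample means.
factor_a = m * sum((rm - grand_mean) ** 2 for rm in row_means) / (q - 1)
factor_b = q * sum((cm - grand_mean) ** 2 for cm in col_means) / (m - 1)

# The same terms from the closed-form expressions.
alpha_bar = Fraction(sum(alpha), q)
gamma_bar = Fraction(sum(gamma), m)
factor_a_formula = m * sum((a - alpha_bar) ** 2 for a in alpha) / (q - 1)
factor_b_formula = q * sum((g - gamma_bar) ** 2 for g in gamma) / (m - 1)

print(factor_a, factor_a_formula)
print(factor_b, factor_b_formula)
```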
Exercise 5 Obtain the corrected sequential ANOVA for a balanced ($r$ observations per cell) two-way main effects model, with Factor A fitted first (but after the overall mean, of course). Give nonmatrix expressions for the sums of squares in this ANOVA table and for the corresponding expected mean squares (under a Gauss–Markov version of the model).
Solution By the result obtained in Example 9.3.3-2, the sum of squares for the Factor A effects in the corrected sequential ANOVA in which those effects are fitted first is $mr\sum_{i=1}^q (\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot})^2$. By the same result, the sum of squares for the Factor B effects in the corrected sequential ANOVA in which those effects are fitted first is $qr\sum_{j=1}^m (\bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot\cdot})^2$. Because equal cell frequencies are a special case of proportional frequencies, by the result obtained in Example 9.3.4-2 the sums of squares corresponding to the Factor A and Factor B effects are invariant to the order in which the two factors are fitted. Thus, the corrected sequential ANOVA is
Source | Rank | Sum of squares | Expected mean square
Factor A | $q - 1$ | $mr\sum_{i=1}^q (\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot})^2$ | $\sigma^2 + \frac{mr}{q-1}\sum_{i=1}^q (\alpha_i - \bar{\alpha})^2$
Factor B | $m - 1$ | $qr\sum_{j=1}^m (\bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot\cdot})^2$ | $\sigma^2 + \frac{qr}{m-1}\sum_{j=1}^m (\gamma_j - \bar{\gamma})^2$
Residual | $qmr - q - m + 1$ | By subtraction | $\sigma^2$
Corrected total | $qmr - 1$ | $\sum_{i=1}^q \sum_{j=1}^m \sum_{k=1}^r (y_{ijk} - \bar{y}_{\cdot\cdot\cdot})^2$ |
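Exercise 6 For the balanced ($r \ge 2$ observations per cell) two-way model with interaction:
(a) Obtain the corrected sequential ANOVA with Factor A fitted first (but after the overall mean, of course), then Factor B, and then the interaction effects. Give nonmatrix expressions for the sums of squares in this ANOVA table and for the corresponding expected mean squares (under a Gauss–Markov version of the model).
(b) Would the corrected sequential ANOVA corresponding to the ordered model in which the overall mean is fitted first, then Factor B, then Factor A, and then the interaction be the same (apart from order) as the sequential ANOVA of part (a)? Justify your answer.
Solution
(a) Clearly, the lines for Factor A, Factor B, and Corrected Total in this corrected sequential ANOVA coincide with those for the balanced two-way main effects model given in the solution to Exercise 9.5. The sum of squares for interaction effects may be expressed as $\mathbf{y}^T(\mathbf{P}_X - \mathbf{P}_{123})\mathbf{y}$, where $\mathbf{X} = (\mathbf{X}_1, \mathbf{X}_2, \mathbf{X}_3, \mathbf{X}_4) = (\mathbf{1}_{qmr}, \mathbf{I}_q \otimes \mathbf{1}_{mr}, \mathbf{1}_q \otimes \mathbf{I}_m \otimes \mathbf{1}_r, \mathbf{I}_{qm} \otimes \mathbf{1}_r)$ and $\mathbf{P}_{123}$ is the orthogonal projection matrix onto the column space of $(\mathbf{X}_1, \mathbf{X}_2, \mathbf{X}_3)$. Using results from the solutions to Exercises 8.5b and 8.5c, we obtain
\[
(\mathbf{P}_X - \mathbf{P}_{123})\mathbf{y} = \mathbf{P}_X\mathbf{y} - \mathbf{P}_{123}\mathbf{y}
= \begin{pmatrix} \bar{y}_{11\cdot} \\ \bar{y}_{12\cdot} \\ \vdots \\ \bar{y}_{qm\cdot} \end{pmatrix} \otimes \mathbf{1}_r
- \begin{pmatrix} \bar{y}_{1\cdot\cdot} \\ \bar{y}_{2\cdot\cdot} \\ \vdots \\ \bar{y}_{q\cdot\cdot} \end{pmatrix} \otimes \mathbf{1}_{mr}
- \mathbf{1}_q \otimes \begin{pmatrix} \bar{y}_{\cdot 1\cdot} \\ \bar{y}_{\cdot 2\cdot} \\ \vdots \\ \bar{y}_{\cdot m\cdot} \end{pmatrix} \otimes \mathbf{1}_r
+ \bar{y}_{\cdot\cdot\cdot}\mathbf{1}_{qmr}.
\]
Therefore,
\[
\mathbf{y}^T(\mathbf{P}_X - \mathbf{P}_{123})\mathbf{y} = [(\mathbf{P}_X - \mathbf{P}_{123})\mathbf{y}]^T(\mathbf{P}_X - \mathbf{P}_{123})\mathbf{y} = r\sum_{i=1}^q \sum_{j=1}^m (\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot} + \bar{y}_{\cdot\cdot\cdot})^2.
\]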
Also, the rank of PX −P123 is qm−(q +m−1) = qm−q −m+1 = (q −1)(m−1). Therefore, the corrected sequential ANOVA table is as follows: Source Factor A
Rank q −1
Factor B
m−1
Interaction Residual Corrected total
(q − 1)(m − 1) qm(r − 1) qmr − 1
Sum of squares q 2 i=1 (y¯i·· − y¯··· ) m qr j =1 (y¯·j · − y¯··· )2 q m r i=1 j =1 (y¯ij · − y¯i·· − y¯·j · + y¯··· )2 mr
By subtraction m r 2 j =1 k=1 (yij k − y¯··· ) i=1
q
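To obtain the expected mean squares, we use the same type of approach used in the solution to Exercise 9.4, obtaining
\[
E(\bar y_{ij\cdot}) = \mu + \alpha_i + \gamma_j + \xi_{ij}, \quad
E(\bar y_{i\cdot\cdot}) = \mu + \alpha_i + \bar\gamma + \bar\xi_{i\cdot}, \quad
E(\bar y_{\cdot j\cdot}) = \mu + \bar\alpha + \gamma_j + \bar\xi_{\cdot j}, \quad
E(\bar y_{\cdot\cdot\cdot}) = \mu + \bar\alpha + \bar\gamma + \bar\xi_{\cdot\cdot}.
\]
Substituting these expressions for the corresponding sample means in the sums of squares column, simplifying, dividing the result by the corresponding degrees of freedom, and finally adding $\sigma^2$, we obtain the following table of expected mean squares:
\[
\begin{array}{ll}
\text{Source} & \text{Expected mean square} \\ \hline
\text{Factor A} & \sigma^2 + \dfrac{mr}{q-1}\sum_{i=1}^{q}(\alpha_i - \bar\alpha + \bar\xi_{i\cdot} - \bar\xi_{\cdot\cdot})^2 \\
\text{Factor B} & \sigma^2 + \dfrac{qr}{m-1}\sum_{j=1}^{m}(\gamma_j - \bar\gamma + \bar\xi_{\cdot j} - \bar\xi_{\cdot\cdot})^2 \\
\text{Interaction} & \sigma^2 + \dfrac{r}{(q-1)(m-1)}\sum_{i=1}^{q}\sum_{j=1}^{m}(\xi_{ij} - \bar\xi_{i\cdot} - \bar\xi_{\cdot j} + \bar\xi_{\cdot\cdot})^2 \\
\text{Residual} & \sigma^2
\end{array}
\]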
(b) The only difference between this corrected sequential ANOVA and that given in part (a) is that the order of fitting Factor A and Factor B is interchanged. Thus, the answer to the question is yes because, as shown in Example 9.3.4-2, the two corrected sequential ANOVAs for a balanced two-way main effects model are identical apart from order, and the interaction, residual, and corrected total lines do not depend on the order in which the two factors are fitted.

Exercise 7 Obtain the corrected sequential ANOVA for a two-factor nested model in which the Factor A effects are fitted first (but after the overall mean, of course). Give nonmatrix expressions for the sums of squares in this ANOVA table and for the corresponding expected mean squares (under a Gauss–Markov version of the model).
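Solution Write the model as the ordered three-part model
\[
\mathbf{y} = \mathbf{X}_1\boldsymbol\beta_1 + \mathbf{X}_2\boldsymbol\beta_2 + \mathbf{X}_3\boldsymbol\beta_3 + \mathbf{e},
\]
where $\mathbf{X}_1 = \mathbf{1}_n$, $\mathbf{X}_2 = \oplus_{i=1}^{q}\mathbf{1}_{n_{i\cdot}}$ where $n_{i\cdot} = \sum_{j=1}^{m_i} n_{ij}$, $\mathbf{X}_3 = \oplus_{i=1}^{q}\oplus_{j=1}^{m_i}\mathbf{1}_{n_{ij}}$, $\boldsymbol\beta_1 = \mu$, $\boldsymbol\beta_2 = (\alpha_1, \ldots, \alpha_q)^T$, and $\boldsymbol\beta_3 = (\gamma_{11}, \gamma_{12}, \ldots, \gamma_{qm_q})^T$. Clearly $\mathbf{P}_1 = (1/n)\mathbf{J}_n$, so the corrected total sum of squares is
\[
\mathbf{y}^T[\mathbf{I} - (1/n)\mathbf{J}_n]\mathbf{y} = \sum_{i=1}^{q}\sum_{j=1}^{m_i}\sum_{k=1}^{n_{ij}}(y_{ijk} - \bar y_{\cdot\cdot\cdot})^2,
\]
with rank $n - 1$. Furthermore, because $C(\mathbf{X}_1) \subseteq C(\mathbf{X}_2)$,
\[
\mathbf{P}_{12} = \mathbf{X}_2(\mathbf{X}_2^T\mathbf{X}_2)^{-}\mathbf{X}_2^T
= (\oplus_{i=1}^{q}\mathbf{1}_{n_{i\cdot}})\left[(\oplus_{i=1}^{q}\mathbf{1}_{n_{i\cdot}})^T(\oplus_{i=1}^{q}\mathbf{1}_{n_{i\cdot}})\right]^{-}(\oplus_{i=1}^{q}\mathbf{1}_{n_{i\cdot}})^T
= \oplus_{i=1}^{q}(1/n_{i\cdot})\mathbf{J}_{n_{i\cdot}}.
\]
Thus,
\[
SS(\mathbf{X}_2|\mathbf{1}) = \mathbf{y}^T(\mathbf{P}_{12} - \mathbf{P}_1)\mathbf{y} = \mathbf{y}^T\left[\oplus_{i=1}^{q}(1/n_{i\cdot})\mathbf{J}_{n_{i\cdot}} - (1/n)\mathbf{J}_n\right]\mathbf{y} = \sum_{i=1}^{q} n_{i\cdot}(\bar y_{i\cdot\cdot} - \bar y_{\cdot\cdot\cdot})^2,
\]
with rank $q - 1$. Also, from Exercise 8.4e, $\mathbf{P}_{123} = \mathbf{P}_X = \oplus_{i=1}^{q}\oplus_{j=1}^{m_i}(1/n_{ij})\mathbf{J}_{n_{ij}}$. Thus,
\[
SS(\mathbf{X}_3|\mathbf{1}, \mathbf{X}_2) = \mathbf{y}^T(\mathbf{P}_{123} - \mathbf{P}_{12})\mathbf{y} = \mathbf{y}^T\left[\oplus_{i=1}^{q}\oplus_{j=1}^{m_i}(1/n_{ij})\mathbf{J}_{n_{ij}} - \oplus_{i=1}^{q}(1/n_{i\cdot})\mathbf{J}_{n_{i\cdot}}\right]\mathbf{y} = \sum_{i=1}^{q}\sum_{j=1}^{m_i} n_{ij}(\bar y_{ij\cdot} - \bar y_{i\cdot\cdot})^2,
\]
with rank $\sum_{i=1}^{q} m_i - q$. Finally, the residual sum of squares is
\[
\mathbf{y}^T(\mathbf{I} - \mathbf{P}_{123})\mathbf{y} = \mathbf{y}^T\left[\mathbf{I} - \oplus_{i=1}^{q}\oplus_{j=1}^{m_i}(1/n_{ij})\mathbf{J}_{n_{ij}}\right]\mathbf{y} = \sum_{i=1}^{q}\sum_{j=1}^{m_i}\sum_{k=1}^{n_{ij}}(y_{ijk} - \bar y_{ij\cdot})^2,
\]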
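with rank $n - \sum_{i=1}^{q} m_i$. Thus, the corrected sequential ANOVA table is
\[
\begin{array}{lll}
\text{Source} & \text{Rank} & \text{Sum of squares} \\ \hline
\text{Factor A} & q-1 & \sum_{i=1}^{q} n_{i\cdot}(\bar y_{i\cdot\cdot} - \bar y_{\cdot\cdot\cdot})^2 \\
\text{Factor B} & \sum_{i=1}^{q} m_i - q & \sum_{i=1}^{q}\sum_{j=1}^{m_i} n_{ij}(\bar y_{ij\cdot} - \bar y_{i\cdot\cdot})^2 \\
\text{Residual} & n - \sum_{i=1}^{q} m_i & \sum_{i=1}^{q}\sum_{j=1}^{m_i}\sum_{k=1}^{n_{ij}}(y_{ijk} - \bar y_{ij\cdot})^2 \\
\text{Corrected total} & n-1 & \sum_{i=1}^{q}\sum_{j=1}^{m_i}\sum_{k=1}^{n_{ij}}(y_{ijk} - \bar y_{\cdot\cdot\cdot})^2
\end{array}
\]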
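As a numerical illustration of this table, the sketch below computes the four sums of squares and their ranks for an unbalanced nested layout. It is only a sketch under stated assumptions: the function name `nested_anova` and the data are hypothetical and do not come from the text.

```python
def nested_anova(y):
    """y[i][j] holds the observations for level j of Factor B within level i of Factor A."""
    obs = [v for a in y for cell in a for v in cell]
    n, q = len(obs), len(y)
    n_cells = sum(len(a) for a in y)          # sum of the m_i
    grand = sum(obs) / n                      # ybar_...
    ss = {"A": 0.0, "B": 0.0, "Residual": 0.0,
          "Total": sum((v - grand) ** 2 for v in obs)}
    for a in y:
        a_obs = [v for cell in a for v in cell]
        a_mean = sum(a_obs) / len(a_obs)      # ybar_i..
        ss["A"] += len(a_obs) * (a_mean - grand) ** 2
        for cell in a:
            c_mean = sum(cell) / len(cell)    # ybar_ij.
            ss["B"] += len(cell) * (c_mean - a_mean) ** 2
            ss["Residual"] += sum((v - c_mean) ** 2 for v in cell)
    ranks = {"A": q - 1, "B": n_cells - q, "Residual": n - n_cells, "Total": n - 1}
    return ss, ranks


# Hypothetical unbalanced data: q = 2, m_1 = 2, m_2 = 3.
y = [[[1.0, 3.0], [5.0]], [[2.0, 4.0, 6.0], [8.0, 10.0], [3.0]]]
ss, ranks = nested_anova(y)
```

The three component sums of squares add to the corrected total, and the ranks add to $n - 1$, as the table requires.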
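Exercise 8 Suppose, in a $k$-part model, that $C(\mathbf{X}_j) \ne C(\mathbf{X}_{j'})$ for all $j \ne j'$. Prove that the sequential ANOVAs corresponding to all ordered $k$-part models are identical (apart from order of listing) if and only if $\mathbf{X}_j^T\mathbf{X}_{j'} = \mathbf{0}$ for all $j \ne j'$.

Solution Matrices $\mathbf{B}_1, \ldots, \mathbf{B}_k$ exist such that $\mathbf{X}_j^T\mathbf{X}_j\mathbf{B}_j = \mathbf{X}_j^T$ $(j = 1, \ldots, k)$, and $\mathbf{P}_j = \mathbf{X}_j\mathbf{B}_j$ is the orthogonal projection matrix onto $C(\mathbf{X}_j)$. Now suppose that $\mathbf{X}_j^T\mathbf{X}_{j'} = \mathbf{0}$ for all $j \ne j'$. Then,
\[
\begin{pmatrix}
\mathbf{X}_1^T\mathbf{X}_1 & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{X}_2^T\mathbf{X}_2 & \cdots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \mathbf{X}_k^T\mathbf{X}_k
\end{pmatrix}
\begin{pmatrix} \mathbf{B}_1 \\ \mathbf{B}_2 \\ \vdots \\ \mathbf{B}_k \end{pmatrix}
= \begin{pmatrix} \mathbf{X}_1^T \\ \mathbf{X}_2^T \\ \vdots \\ \mathbf{X}_k^T \end{pmatrix};
\]
that is, $\mathbf{B} \equiv (\mathbf{B}_1^T, \mathbf{B}_2^T, \ldots, \mathbf{B}_k^T)^T$ satisfies the matrix equation $\mathbf{X}^T\mathbf{X}\mathbf{B} = \mathbf{X}^T$, where $\mathbf{X} = (\mathbf{X}_1, \ldots, \mathbf{X}_k)$. Hence
\[
\mathbf{P}_{12\cdots k} = \mathbf{X}\mathbf{B} = \mathbf{X}_1\mathbf{B}_1 + \mathbf{X}_2\mathbf{B}_2 + \cdots + \mathbf{X}_k\mathbf{B}_k = \mathbf{P}_1 + \mathbf{P}_2 + \cdots + \mathbf{P}_k.
\]
By applying the same argument to every ordered submodel of order less than $k$, we find that the orthogonal projection matrix onto the column space of any subset of $\{\mathbf{X}_1, \ldots, \mathbf{X}_k\}$ is equal to the sum of the orthogonal projection matrices onto the column spaces of each element of the subset. Thus, letting $\{\pi_1, \pi_2, \ldots, \pi_k\}$ represent any permutation of the first $k$ positive integers, we have
\[
\mathbf{y}^T(\mathbf{P}_{\pi_1\pi_2} - \mathbf{P}_{\pi_1})\mathbf{y} = \mathbf{y}^T\mathbf{P}_{\pi_2}\mathbf{y}, \quad
\mathbf{y}^T(\mathbf{P}_{\pi_1\pi_2\pi_3} - \mathbf{P}_{\pi_1\pi_2})\mathbf{y} = \mathbf{y}^T\mathbf{P}_{\pi_3}\mathbf{y},
\]
and so on, for all $\mathbf{y}$. Then the sequential ANOVA corresponding to the ordered $k$-part model
\[
\mathbf{y} = \mathbf{X}_{\pi_1}\boldsymbol\beta_{\pi_1} + \mathbf{X}_{\pi_2}\boldsymbol\beta_{\pi_2} + \cdots + \mathbf{X}_{\pi_k}\boldsymbol\beta_{\pi_k} + \mathbf{e}
\]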
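is
\[
\begin{array}{lll}
\text{Source} & \text{Rank} & \text{Sum of squares} \\ \hline
\mathbf{X}_{\pi_1} & \text{rank}(\mathbf{X}_{\pi_1}) & \mathbf{y}^T\mathbf{P}_{\pi_1}\mathbf{y} \\
\mathbf{X}_{\pi_2}|\mathbf{X}_{\pi_1} & \text{rank}(\mathbf{X}_{\pi_1}, \mathbf{X}_{\pi_2}) - \text{rank}(\mathbf{X}_{\pi_1}) & \mathbf{y}^T\mathbf{P}_{\pi_2}\mathbf{y} \\
\mathbf{X}_{\pi_3}|\mathbf{X}_{\pi_1}, \mathbf{X}_{\pi_2} & \text{rank}(\mathbf{X}_{\pi_1}, \mathbf{X}_{\pi_2}, \mathbf{X}_{\pi_3}) - \text{rank}(\mathbf{X}_{\pi_1}, \mathbf{X}_{\pi_2}) & \mathbf{y}^T\mathbf{P}_{\pi_3}\mathbf{y} \\
\vdots & \vdots & \vdots \\
\mathbf{X}_{\pi_k}|\mathbf{X}_{\pi_1}, \ldots, \mathbf{X}_{\pi_{k-1}} & \text{rank}(\mathbf{X}) - \text{rank}(\mathbf{X}_{\pi_1}, \ldots, \mathbf{X}_{\pi_{k-1}}) & \mathbf{y}^T\mathbf{P}_{\pi_k}\mathbf{y} \\
\text{Residual} & n - \text{rank}(\mathbf{X}) & \mathbf{y}^T(\mathbf{I} - \mathbf{P}_X)\mathbf{y} \\
\text{Total} & n & \mathbf{y}^T\mathbf{y}
\end{array}
\]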
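Because the permutation was arbitrary, this establishes that the sequential ANOVAs corresponding to all $k!$ ordered $k$-part models are identical, apart from order of listing, under the given condition.

Conversely, suppose that the sequential ANOVAs corresponding to all $k!$ ordered $k$-part models are identical, apart from order of listing. Let $j$ and $j'$ be distinct integers between 1 and $k$, inclusive. By considering only those permutations for which $\pi_1 = j$ and $\pi_2 = j'$, we see that either $\mathbf{P}_j = \mathbf{P}_{j'}$ or $\mathbf{P}_j = \mathbf{P}_{jj'} - \mathbf{P}_{j'}$. If $\mathbf{P}_j = \mathbf{P}_{j'}$, then $C(\mathbf{X}_j) = C(\mathbf{X}_{j'})$, which contradicts the assumption. If, instead, $\mathbf{P}_j = \mathbf{P}_{jj'} - \mathbf{P}_{j'}$, then
\[
\mathbf{X}_j^T\mathbf{P}_j\mathbf{X}_{j'} = \mathbf{X}_j^T(\mathbf{P}_{jj'} - \mathbf{P}_{j'})\mathbf{X}_{j'} = \mathbf{X}_j^T(\mathbf{P}_{jj'}\mathbf{X}_{j'} - \mathbf{P}_{j'}\mathbf{X}_{j'}) = \mathbf{X}_j^T(\mathbf{X}_{j'} - \mathbf{X}_{j'}) = \mathbf{0},
\]
i.e., $\mathbf{X}_j^T\mathbf{X}_{j'} = \mathbf{0}$. Because $j$ and $j'$ were arbitrary, the result is established.

Exercise 9 Prove Theorem 9.3.2: Consider the two ordered three-part models
\[
\mathbf{y} = \mathbf{X}_1\boldsymbol\beta_1 + \mathbf{X}_2\boldsymbol\beta_2 + \mathbf{X}_3\boldsymbol\beta_3 + \mathbf{e}
\quad\text{and}\quad
\mathbf{y} = \mathbf{X}_1\boldsymbol\beta_1 + \mathbf{X}_3\boldsymbol\beta_3 + \mathbf{X}_2\boldsymbol\beta_2 + \mathbf{e},
\]
and suppose that $C[(\mathbf{I} - \mathbf{P}_1)\mathbf{X}_2] \ne C[(\mathbf{I} - \mathbf{P}_1)\mathbf{X}_3]$. A necessary and sufficient condition for the sequential ANOVAs of these models to be identical (apart from order of listing) is
\[
\mathbf{X}_2^T(\mathbf{I} - \mathbf{P}_1)\mathbf{X}_3 = \mathbf{0}.
\]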
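Solution If the two sequential ANOVAs are identical (apart from order of listing), then
\[
\begin{aligned}
SS(\mathbf{X}_3|\mathbf{X}_1, \mathbf{X}_2) = SS(\mathbf{X}_3|\mathbf{X}_1)
&\Rightarrow \mathbf{P}_{123} - \mathbf{P}_{12} = \mathbf{P}_{13} - \mathbf{P}_1 \\
&\Rightarrow \mathbf{X}_2^T(\mathbf{P}_{123} - \mathbf{P}_{12})\mathbf{X}_3 = \mathbf{X}_2^T(\mathbf{P}_{13} - \mathbf{P}_1)\mathbf{X}_3 \\
&\Rightarrow \mathbf{X}_2^T\mathbf{X}_3 - \mathbf{X}_2^T\mathbf{X}_3 = \mathbf{X}_2^T\mathbf{X}_3 - \mathbf{X}_2^T\mathbf{P}_1\mathbf{X}_3 \\
&\Rightarrow \mathbf{X}_2^T(\mathbf{I} - \mathbf{P}_1)\mathbf{X}_3 = \mathbf{0}.
\end{aligned}
\]
For the converse, suppose that $\mathbf{X}_2^T(\mathbf{I} - \mathbf{P}_1)\mathbf{X}_3 = \mathbf{0}$. We aim to show that $\mathbf{P}_{123} - \mathbf{P}_{12} = \mathbf{P}_{13} - \mathbf{P}_1$, or equivalently that $\mathbf{P}_{123} - \mathbf{P}_1 = (\mathbf{P}_{12} - \mathbf{P}_1) + (\mathbf{P}_{13} - \mathbf{P}_1)$. Recall from Theorem 9.1.2 that $\mathbf{P}_{12} - \mathbf{P}_1$ is the orthogonal projection matrix onto $C[(\mathbf{I} - \mathbf{P}_1)\mathbf{X}_2]$. Thus, $\mathbf{P}_{12} - \mathbf{P}_1 = (\mathbf{I} - \mathbf{P}_1)\mathbf{X}_2\dot{\mathbf{B}}_2$, where $\dot{\mathbf{B}}_2$ is any matrix satisfying $\mathbf{X}_2^T(\mathbf{I} - \mathbf{P}_1)\mathbf{X}_2\dot{\mathbf{B}}_2 = \mathbf{X}_2^T(\mathbf{I} - \mathbf{P}_1)$. Similarly, $\mathbf{P}_{13} - \mathbf{P}_1 = (\mathbf{I} - \mathbf{P}_1)\mathbf{X}_3\dot{\mathbf{B}}_3$, where $\dot{\mathbf{B}}_3$ is any matrix satisfying $\mathbf{X}_3^T(\mathbf{I} - \mathbf{P}_1)\mathbf{X}_3\dot{\mathbf{B}}_3 = \mathbf{X}_3^T(\mathbf{I} - \mathbf{P}_1)$. By similar reasoning,
\[
\mathbf{P}_{123} - \mathbf{P}_1 = (\mathbf{I} - \mathbf{P}_1)(\mathbf{X}_2, \mathbf{X}_3)\begin{pmatrix} \ddot{\mathbf{B}}_2 \\ \ddot{\mathbf{B}}_3 \end{pmatrix},
\]
where $\begin{pmatrix} \ddot{\mathbf{B}}_2 \\ \ddot{\mathbf{B}}_3 \end{pmatrix}$ is any matrix satisfying
\[
(\mathbf{X}_2, \mathbf{X}_3)^T(\mathbf{I} - \mathbf{P}_1)(\mathbf{X}_2, \mathbf{X}_3)\begin{pmatrix} \ddot{\mathbf{B}}_2 \\ \ddot{\mathbf{B}}_3 \end{pmatrix} = (\mathbf{X}_2, \mathbf{X}_3)^T(\mathbf{I} - \mathbf{P}_1).
\]
Expanding the last system of equations above yields
\[
\begin{pmatrix}
\mathbf{X}_2^T(\mathbf{I} - \mathbf{P}_1)\mathbf{X}_2 & \mathbf{X}_2^T(\mathbf{I} - \mathbf{P}_1)\mathbf{X}_3 \\
\mathbf{X}_3^T(\mathbf{I} - \mathbf{P}_1)\mathbf{X}_2 & \mathbf{X}_3^T(\mathbf{I} - \mathbf{P}_1)\mathbf{X}_3
\end{pmatrix}
\begin{pmatrix} \ddot{\mathbf{B}}_2 \\ \ddot{\mathbf{B}}_3 \end{pmatrix}
= \begin{pmatrix} \mathbf{X}_2^T(\mathbf{I} - \mathbf{P}_1) \\ \mathbf{X}_3^T(\mathbf{I} - \mathbf{P}_1) \end{pmatrix},
\]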
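but by hypothesis the coefficient matrix is block diagonal, so these equations separate as follows:
\[
\mathbf{X}_2^T(\mathbf{I} - \mathbf{P}_1)\mathbf{X}_2\ddot{\mathbf{B}}_2 = \mathbf{X}_2^T(\mathbf{I} - \mathbf{P}_1), \qquad
\mathbf{X}_3^T(\mathbf{I} - \mathbf{P}_1)\mathbf{X}_3\ddot{\mathbf{B}}_3 = \mathbf{X}_3^T(\mathbf{I} - \mathbf{P}_1).
\]
But this is the same system of equations as the system defining $\dot{\mathbf{B}}_2$ and $\dot{\mathbf{B}}_3$; hence,
\[
\mathbf{P}_{123} - \mathbf{P}_1 = (\mathbf{I} - \mathbf{P}_1)(\mathbf{X}_2, \mathbf{X}_3)\begin{pmatrix} \dot{\mathbf{B}}_2 \\ \dot{\mathbf{B}}_3 \end{pmatrix}
= (\mathbf{I} - \mathbf{P}_1)\mathbf{X}_2\dot{\mathbf{B}}_2 + (\mathbf{I} - \mathbf{P}_1)\mathbf{X}_3\dot{\mathbf{B}}_3
= (\mathbf{P}_{12} - \mathbf{P}_1) + (\mathbf{P}_{13} - \mathbf{P}_1).
\]

Exercise 10 Consider the linear model specified in Exercise 7.3.
(a) Obtain the overall ANOVA (Source, Rank, and Sum of squares) for this model. Give nonmatrix expressions for the model and total sums of squares (the residual sum of squares may be obtained by subtraction).
(b) Obtain the sequential ANOVA (Source, Rank, and Sum of squares) for the ordered two-part model $\{\mathbf{y}, \mathbf{X}_1\boldsymbol\beta_1 + \mathbf{X}_2\boldsymbol\beta_2\}$, where $\mathbf{X}_1$ is the submatrix of $\mathbf{X}$ consisting of its first two columns and $\mathbf{X}_2$ is the submatrix of $\mathbf{X}$ consisting of its last two columns. (Again, give nonmatrix expressions for all sums of squares except the residual sum of squares.)
(c) Would the sequential ANOVA for the ordered two-part model $\{\mathbf{y}, \mathbf{X}_2\boldsymbol\beta_2 + \mathbf{X}_1\boldsymbol\beta_1\}$ be identical to the ANOVA you obtained in part (b), apart from order of listing? Justify your answer.

Solution
(a) Here, using expressions for $\hat{\boldsymbol\beta}$, $\mathbf{X}$, and $\mathbf{y}$ from Exercise 7.3,
\[
\begin{aligned}
\text{Model SS} = \hat{\boldsymbol\beta}^T\mathbf{X}^T\mathbf{y}
&= (y_1 + y_2 + y_3 + y_4 + y_5 + y_6 - y_7 - y_8)^2/8 \\
&\quad + (y_1 + y_2 + y_3 + y_4 - y_5 - y_6 + y_7 + y_8)^2/8 \\
&\quad + (y_1 + y_2 - y_3 - y_4 + y_5 + y_6 + y_7 + y_8)^2/8 \\
&\quad + (-y_1 - y_2 + y_3 + y_4 + y_5 + y_6 + y_7 + y_8)^2/8.
\end{aligned}
\]
So the ANOVA table is
\[
\begin{array}{lll}
\text{Source} & \text{Rank} & \text{Sum of squares} \\ \hline
\text{Model} & 4 & \text{Model SS} \\
\text{Residual} & 4 & \text{By subtraction} \\
\text{Total} & 8 & \sum_{i=1}^{8} y_i^2
\end{array}
\]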
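(b)
\[
\begin{array}{lll}
\text{Source} & \text{Rank} & \text{Sum of squares} \\ \hline
\mathbf{X}_1 & 2 & (y_1 + y_2 + y_3 + y_4 + y_5 + y_6 - y_7 - y_8)^2/8 \\
 & & \quad + (y_1 + y_2 + y_3 + y_4 - y_5 - y_6 + y_7 + y_8)^2/8 \\
\mathbf{X}_2|\mathbf{X}_1 & 2 & (y_1 + y_2 - y_3 - y_4 + y_5 + y_6 + y_7 + y_8)^2/8 \\
 & & \quad + (-y_1 - y_2 + y_3 + y_4 + y_5 + y_6 + y_7 + y_8)^2/8 \\
\text{Residual} & 4 & \text{By subtraction} \\
\text{Total} & 8 & \sum_{i=1}^{8} y_i^2
\end{array}
\]
(c) Yes, by Theorem 9.3.1, because
\[
\mathbf{X}_1^T\mathbf{X}_2 =
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & -1 & -1 \\
1 & 1 & 1 & 1 & -1 & -1 & 1 & 1
\end{pmatrix}
\begin{pmatrix}
1 & -1 \\ 1 & -1 \\ -1 & 1 \\ -1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1
\end{pmatrix}
= \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.
\]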
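The orthogonality used in part (c) is easy to confirm by direct computation. The sketch below, which is an illustration rather than part of the original solution, takes the four columns of $\mathbf{X}$ as read off from the Model SS expression in part (a); the response vector `y` is hypothetical.

```python
# Columns of X, read off from the four squared linear combinations in part (a).
x = [
    [1, 1, 1, 1, 1, 1, -1, -1],
    [1, 1, 1, 1, -1, -1, 1, 1],
    [1, 1, -1, -1, 1, 1, 1, 1],
    [-1, -1, 1, 1, 1, 1, 1, 1],
]


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


# Gram matrix X^T X; its off-diagonal entries include the block X_1^T X_2.
gram = [[dot(a, b) for b in x] for a in x]


def column_ss(y):
    """With mutually orthogonal columns, column k contributes (x_k^T y)^2 / (x_k^T x_k)."""
    return [dot(c, y) ** 2 / dot(c, c) for c in x]


y = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]  # hypothetical data
ss = column_ss(y)
ss_x1 = ss[0] + ss[1]           # line "X1" of the table
ss_x2_given_x1 = ss[2] + ss[3]  # line "X2 | X1"; equals SS(X2) because X_1^T X_2 = 0
residual = dot(y, y) - ss_x1 - ss_x2_given_x1
```

Since every column's contribution is computed without reference to the others, interchanging $\mathbf{X}_1$ and $\mathbf{X}_2$ only reorders the lines of the table.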
Exercise 11 Consider the two-way partially crossed model introduced in Example 5.1.4-1, with one observation per cell, i.e.,
\[
y_{ij} = \mu + \alpha_i - \alpha_j + e_{ij} \quad (i \ne j = 1, \ldots, q).
\]
(a) Would the sequential ANOVAs corresponding to the ordered two-part models in which the overall mean is fitted first and last be identical (apart from order of listing)? Justify your answer.
(b) Obtain the sequential ANOVA (Source, Rank, and Sum of squares) corresponding to the ordered two-part version of this model in which the overall mean is fitted first. Give nonmatrix expressions for the sums of squares and the corresponding expected mean squares (under a Gauss–Markov version of the model).
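Solution
(a) Recall from Example 5.1.4-1 that $\mathbf{X} = (\mathbf{X}_1, \mathbf{X}_2)$, where $\mathbf{X}_1 = \mathbf{1}_{q(q-1)}$ and $\mathbf{X}_2 = (\mathbf{v}_{ij}^T)_{i \ne j = 1, \ldots, q}$, where $\mathbf{v}_{ij} = \mathbf{u}_i^{(q)} - \mathbf{u}_j^{(q)}$. Now,
\[
\mathbf{X}_1^T\mathbf{X}_2 = \mathbf{1}_{q(q-1)}^T\left[(\mathbf{u}_i^{(q)} - \mathbf{u}_j^{(q)})^T\right]_{i \ne j = 1, \ldots, q}
= \big((1)(q-1) + (-1)(q-1),\ (1)(q-1) + (-1)(q-1),\ \ldots,\ (1)(q-1) + (-1)(q-1)\big)_{1 \times q} = \mathbf{0}_q^T.
\]
Thus, by Theorem 9.3.1, the two ANOVAs would be identical (apart from order of listing).
(b) By part (a) and the argument used to prove the sufficiency of Theorem 9.3.1, $\mathbf{P}_X = \mathbf{P}_1 + \mathbf{P}_2$, where $\mathbf{P}_1 = [1/q(q-1)]\mathbf{J}_{q(q-1)}$ and $\mathbf{P}_2$ is the orthogonal projection matrix onto $C[(\mathbf{v}_{ij}^T)_{i \ne j = 1, \ldots, q}]$. Recall from Exercise 8.4d that $\mathbf{P}_X = (p_{ij,i'j'})$, where the elements of the $q(q-1) \times q(q-1)$ matrix $\mathbf{P}_X$ are ordered by row index $ij$ and column index $i'j'$ and
\[
p_{ij,i'j'} = \begin{cases}
\frac{1}{q(q-1)} + \frac{1}{q} & \text{if } i = i' \text{ and } j = j', \\
\frac{1}{q(q-1)} - \frac{1}{q} & \text{if } i = j' \text{ and } j = i', \\
\frac{1}{q(q-1)} + \frac{1}{2q} & \text{if } i = i' \text{ and } j \ne j', \text{ or } i \ne i' \text{ and } j = j', \\
\frac{1}{q(q-1)} - \frac{1}{2q} & \text{if } i = j' \text{ and } j \ne i', \text{ or } j = i' \text{ and } i \ne j', \\
\frac{1}{q(q-1)} & \text{if } i \ne i',\ i \ne j',\ j \ne i', \text{ and } j \ne j'.
\end{cases}
\]
Thus, $\mathbf{P}_2 = (p_{2,ij,i'j'})$, where
\[
p_{2,ij,i'j'} = \begin{cases}
1/q & \text{if } i = i' \text{ and } j = j', \\
-1/q & \text{if } i = j' \text{ and } j = i', \\
1/2q & \text{if } i = i' \text{ and } j \ne j', \text{ or } i \ne i' \text{ and } j = j', \\
-1/2q & \text{if } i = j' \text{ and } j \ne i', \text{ or } j = i' \text{ and } i \ne j', \\
0 & \text{if } i \ne i',\ i \ne j',\ j \ne i', \text{ and } j \ne j'.
\end{cases}
\]
Thus,
\[
\mathbf{y}^T\mathbf{P}_2\mathbf{y} = (1/2q)\Bigg[2\sum_{i \ne j} y_{ij}^2 - 2\sum_{i \ne j} y_{ij}y_{ji}
+ \sum_{\substack{i,j,j':\ i \ne j,\\ i \ne j',\ j' \ne j}} y_{ij}y_{ij'}
+ \sum_{\substack{i,j,i':\ i \ne j,\\ i' \ne j,\ i' \ne i}} y_{ij}y_{i'j}
- \Bigg(\sum_{\substack{i,j,i':\ i \ne j,\\ i' \ne i,\ i' \ne j}} y_{ij}y_{i'i}
+ \sum_{\substack{i,j,j':\ i \ne j,\\ j' \ne j,\ j' \ne i}} y_{ij}y_{jj'}\Bigg)\Bigg].
\]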
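Furthermore, $\text{rank}(\mathbf{P}_2) = \text{tr}(\mathbf{P}_2) = q(q-1)(1/q) = q - 1$. Thus, the sequential ANOVA table is as follows:
\[
\begin{array}{lll}
\text{Source} & \text{Rank} & \text{Sum of squares} \\ \hline
\mathbf{1} & 1 & q(q-1)\bar y_{\cdot\cdot}^2 \\
(\mathbf{v}_{ij})_{i \ne j} & q-1 & \mathbf{y}^T\mathbf{P}_2\mathbf{y} \\
\text{Residual} & q(q-2) & \text{By subtraction} \\
\text{Total} & q(q-1) & \sum_{i=1}^{n} y_i^2
\end{array}
\]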
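The entries of $\mathbf{P}_2$ listed above can be written compactly as $(1/2q)\,\mathbf{v}_{ij}^T\mathbf{v}_{i'j'}$, and the sketch below uses that form to check, for $q = 4$, that $\mathbf{P}_2$ is idempotent with trace $q - 1$ and annihilates $\mathbf{1}$. It also compares the quadratic form with the compact expression $(1/2q)\sum_{i}(y_{i\cdot} - y_{\cdot i})^2$; that expression is not stated in the text and is offered here only as an equivalent form under the assumption $\mathbf{P}_2 = (1/2q)\mathbf{X}_2\mathbf{X}_2^T$. The data are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

q = 4
cells = list(permutations(range(q), 2))  # ordered pairs (i, j) with i != j
n = len(cells)                           # q(q - 1)


def p2_entry(a, b):
    """(1/2q) v_ij^T v_i'j', which reproduces the five cases listed above."""
    (i, j), (k, l) = a, b
    return Fraction((i == k) - (i == l) - (j == k) + (j == l), 2 * q)


P2 = [[p2_entry(a, b) for b in cells] for a in cells]
P2_squared = [[sum(P2[r][t] * P2[t][c] for t in range(n)) for c in range(n)]
              for r in range(n)]
trace = sum(P2[r][r] for r in range(n))
row_sums = [sum(row) for row in P2]      # P2 1 = 0, so P1 P2 = 0

# Hypothetical responses y_ij, one per cell.
y = {(i, j): Fraction(3 * i - j * j + i * j) for (i, j) in cells}
quad_form = sum(P2[r][c] * y[cells[r]] * y[cells[c]]
                for r in range(n) for c in range(n))
closed_form = sum((sum(y[(i, j)] for j in range(q) if j != i)
                   - sum(y[(j, i)] for j in range(q) if j != i)) ** 2
                  for i in range(q)) / (2 * q)
```

Exact rational arithmetic is used so that idempotency can be checked with equality rather than a tolerance.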
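Exercise 12 Consider the two corrected sequential ANOVAs for the analysis-of-covariance model having a single factor of classification and a single quantitative variable, which was described in Sect. 9.4.
(a) Obtain nonmatrix expressions for the expected mean squares (under a Gauss–Markov version of the model) in both ANOVAs.
(b) Obtain a necessary and sufficient condition for the two ANOVAs to be identical (apart from order of listing).

Solution
(a)
\[
\bar y_{i\cdot} = (1/n_i)\sum_{j=1}^{n_i} y_{ij} \ \Rightarrow\ E(\bar y_{i\cdot}) = (1/n_i)\sum_{j=1}^{n_i}(\mu + \alpha_i + \gamma z_{ij}) = \mu + \alpha_i + \gamma\bar z_{i\cdot}
\]
and
\[
\bar y_{\cdot\cdot} = (1/n)\sum_{i=1}^{q}\sum_{j=1}^{n_i} y_{ij} \ \Rightarrow\ E(\bar y_{\cdot\cdot}) = (1/n)\sum_{i=1}^{q}\sum_{j=1}^{n_i}(\mu + \alpha_i + \gamma z_{ij}) = \mu + \sum_{i=1}^{q}(n_i/n)\alpha_i + \gamma\bar z_{\cdot\cdot},
\]
from which we obtain
\[
\begin{aligned}
E(\bar y_{i\cdot}) - E(\bar y_{\cdot\cdot}) &= \alpha_i - \sum_{k=1}^{q}(n_k/n)\alpha_k + \gamma(\bar z_{i\cdot} - \bar z_{\cdot\cdot}), \\
E(y_{ij}) - E(\bar y_{i\cdot}) &= \gamma(z_{ij} - \bar z_{i\cdot}), \\
E(y_{ij}) - E(\bar y_{\cdot\cdot}) &= \mu + \alpha_i + \gamma z_{ij} - \Big[\mu + \sum_{k=1}^{q}(n_k/n)\alpha_k + \gamma\bar z_{\cdot\cdot}\Big] = \alpha_i - \sum_{k=1}^{q}(n_k/n)\alpha_k + \gamma(z_{ij} - \bar z_{\cdot\cdot}).
\end{aligned}
\]
Thus in the first corrected ANOVA,
\[
\text{EMS(Classes)} = \sigma^2 + \frac{1}{q-1}\sum_{i=1}^{q} n_i\Big[\alpha_i - \sum_{k=1}^{q}(n_k/n)\alpha_k + \gamma(\bar z_{i\cdot} - \bar z_{\cdot\cdot})\Big]^2
\]
and
\[
\text{EMS}(z|\text{Classes}) = \sigma^2 + \gamma^2\sum_{i=1}^{q}\sum_{j=1}^{n_i}(z_{ij} - \bar z_{i\cdot})^2.
\]
In the second corrected ANOVA,
\[
\text{EMS}(z) = \sigma^2 + \frac{\Big\{\sum_{i=1}^{q}\sum_{j=1}^{n_i}\big[\alpha_i - \sum_{k=1}^{q}(n_k/n)\alpha_k\big](z_{ij} - \bar z_{\cdot\cdot}) + \gamma\sum_{i=1}^{q}\sum_{j=1}^{n_i}(z_{ij} - \bar z_{\cdot\cdot})^2\Big\}^2}{\sum_{i=1}^{q}\sum_{j=1}^{n_i}(z_{ij} - \bar z_{\cdot\cdot})^2}
\]
and
\[
\begin{aligned}
\text{EMS(Classes}|z) = \sigma^2 + \frac{1}{q-1}\Bigg\{
&\sum_{i=1}^{q} n_i\Big[\alpha_i - \sum_{k=1}^{q}(n_k/n)\alpha_k + \gamma(\bar z_{i\cdot} - \bar z_{\cdot\cdot})\Big]^2
+ \gamma^2\Big[\sum_{i=1}^{q}\sum_{j=1}^{n_i}(z_{ij} - \bar z_{i\cdot})^2\Big] \\
&- \frac{\Big\{\sum_{i=1}^{q}\sum_{j=1}^{n_i}\big[\alpha_i - \sum_{k=1}^{q}(n_k/n)\alpha_k\big](z_{ij} - \bar z_{\cdot\cdot}) + \gamma\sum_{i=1}^{q}\sum_{j=1}^{n_i}(z_{ij} - \bar z_{\cdot\cdot})^2\Big\}^2}{\sum_{i=1}^{q}\sum_{j=1}^{n_i}(z_{ij} - \bar z_{\cdot\cdot})^2}\Bigg\}.
\end{aligned}
\]
(b) Observe that
\[
\mathbf{X}_2^T[\mathbf{I} - (1/n)\mathbf{J}]\mathbf{X}_3 = \big(\oplus_{i=1}^{q}\mathbf{1}_{n_i}^T\big)[\mathbf{I} - (1/n)\mathbf{J}]\mathbf{z}
= \begin{pmatrix} \sum_{j=1}^{n_1}(z_{1j} - \bar z_{\cdot\cdot}) \\ \vdots \\ \sum_{j=1}^{n_q}(z_{qj} - \bar z_{\cdot\cdot}) \end{pmatrix}.
\]
This vector, which is the left-hand side of (9.13) as it specializes to this model, will equal $\mathbf{0}$ if and only if $\bar z_{i\cdot} = \bar z_{\cdot\cdot}$ $(i = 1, \ldots, q)$. So (9.13), the necessary and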
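sufficient condition for the two ANOVAs to be identical (by Theorem 9.3.2), is satisfied if and only if $\bar z_{i\cdot} = \bar z_{\cdot\cdot}$ $(i = 1, \ldots, q)$.

Exercise 13 Consider the analysis-of-covariance model having a single factor of classification with $q$ levels and a single quantitative variable, i.e.,
\[
y_{ij} = \mu + \alpha_i + \gamma z_{ij} + e_{ij} \quad (i = 1, \ldots, q;\ j = 1, \ldots, n_i),
\]
where the $e_{ij}$'s satisfy Gauss–Markov assumptions. Assume that $n_i \ge 2$ and $z_{i1} \ne z_{i2}$ for all $i = 1, \ldots, q$. In Sect. 9.4 it was shown that one solution to the normal equations for this model is $\hat\mu = 0$, $\hat\alpha_i = \bar y_{i\cdot} - \hat\gamma\bar z_{i\cdot}$ $(i = 1, \ldots, q)$, and
\[
\hat\gamma = \frac{\sum_{i=1}^{q}\sum_{j=1}^{n_i}(z_{ij} - \bar z_{i\cdot})y_{ij}}{\sum_{i=1}^{q}\sum_{j=1}^{n_i}(z_{ij} - \bar z_{i\cdot})^2}.
\]
Thus, the least squares estimator of $\mu + \alpha_i$ is $\bar y_{i\cdot} - \hat\gamma\bar z_{i\cdot}$ $(i = 1, \ldots, q)$. Obtain nonmatrix expressions for:
(a) $\text{var}(\hat\gamma)$
(b) $\text{var}(\bar y_{i\cdot} - \hat\gamma\bar z_{i\cdot})$ $(i = 1, \ldots, q)$
(c) $\text{cov}(\bar y_{i\cdot} - \hat\gamma\bar z_{i\cdot},\ \bar y_{i'\cdot} - \hat\gamma\bar z_{i'\cdot})$ $(i' > i = 1, \ldots, q)$

Solution
(a)
\[
\text{var}(\hat\gamma) = \frac{\sum_{i=1}^{q}\sum_{j=1}^{n_i}(z_{ij} - \bar z_{i\cdot})^2\sigma^2}{\big[\sum_{i=1}^{q}\sum_{j=1}^{n_i}(z_{ij} - \bar z_{i\cdot})^2\big]^2} = \frac{\sigma^2}{\sum_{i=1}^{q}\sum_{j=1}^{n_i}(z_{ij} - \bar z_{i\cdot})^2} = \frac{\sigma^2}{SZZ}, \text{ say.}
\]
(b) Without loss of generality, consider the case $i = 1$. We obtain
\[
\begin{aligned}
\text{cov}(\bar y_{1\cdot}, \hat\gamma\bar z_{1\cdot})
&= \text{cov}\big[\big((1/n_1)\mathbf{1}_{n_1}^T, \mathbf{0}_{n-n_1}^T\big)\mathbf{y},\ (1/SZZ)\big(\mathbf{z}_1^T - \bar z_{1\cdot}\mathbf{1}_{n_1}^T, \ldots, \mathbf{z}_q^T - \bar z_{q\cdot}\mathbf{1}_{n_q}^T\big)\mathbf{y}\big]\bar z_{1\cdot} \\
&= \sigma^2\big[(1/n_1)(1/SZZ)\mathbf{1}_{n_1}^T(\mathbf{z}_1 - \bar z_{1\cdot}\mathbf{1}_{n_1})\big]\bar z_{1\cdot} = 0.
\end{aligned}
\]
Hence $\text{var}(\bar y_{1\cdot} - \hat\gamma\bar z_{1\cdot}) = \text{var}(\bar y_{1\cdot}) + \bar z_{1\cdot}^2\,\text{var}(\hat\gamma) = (\sigma^2/n_1) + \bar z_{1\cdot}^2\sigma^2/SZZ$.
(c) By Theorem 4.2.2a and part (b) of this exercise,
\[
\begin{aligned}
\text{cov}(\bar y_{i\cdot} - \hat\gamma\bar z_{i\cdot},\ \bar y_{i'\cdot} - \hat\gamma\bar z_{i'\cdot})
&= \text{cov}(\bar y_{i\cdot}, \bar y_{i'\cdot}) - \bar z_{i\cdot}\,\text{cov}(\hat\gamma, \bar y_{i'\cdot}) - \bar z_{i'\cdot}\,\text{cov}(\bar y_{i\cdot}, \hat\gamma) + \bar z_{i\cdot}\bar z_{i'\cdot}\,\text{var}(\hat\gamma) \\
&= 0 - 0 - 0 + \bar z_{i\cdot}\bar z_{i'\cdot}\sigma^2/SZZ = (\bar z_{i\cdot}\bar z_{i'\cdot}/SZZ)\sigma^2.
\end{aligned}
\]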
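The results of parts (b) and (c) can be checked by writing each estimator as a linear combination $\mathbf{a}^T\mathbf{y}$, so that under Gauss–Markov assumptions its variance is $\sigma^2\mathbf{a}^T\mathbf{a}$ and the covariance of two such estimators is $\sigma^2\mathbf{a}^T\mathbf{b}$. The sketch below does this in exact arithmetic; the covariate values and the helper names `coefficients` and `inner` are hypothetical.

```python
from fractions import Fraction

z = [[1, 2, 4], [0, 3], [2, 5, 5, 8]]   # hypothetical covariate values z_ij
n = [len(g) for g in z]
zbar = [Fraction(sum(g), len(g)) for g in z]
szz = sum((v - zb) ** 2 for g, zb in zip(z, zbar) for v in g)   # SZZ


def coefficients(i):
    """Vector a with ybar_i. - gamma_hat * zbar_i. = a^T y (observations in group order)."""
    return [(Fraction(1, n[i]) if k == i else 0) - zbar[i] * (v - zbar[k]) / szz
            for k, g in enumerate(z) for v in g]


def inner(a, b):
    return sum(u * v for u, v in zip(a, b))


a0, a1 = coefficients(0), coefficients(1)
var_factor = inner(a0, a0)   # var(ybar_1. - gamma_hat zbar_1.) / sigma^2
cov_factor = inner(a0, a1)   # covariance of the first two estimators / sigma^2
```

The two factors agree with $1/n_1 + \bar z_{1\cdot}^2/SZZ$ and $\bar z_{1\cdot}\bar z_{2\cdot}/SZZ$, respectively.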
Exercise 14 Consider the one-factor, factor-specific-slope analysis-of-covariance model
\[
y_{ij} = \mu + \alpha_i + \gamma_i(z_{ij} - \bar z_{i\cdot}) + e_{ij} \quad (i = 1, \ldots, q;\ j = 1, \ldots, n_i)
\]
and assume that $z_{i1} \ne z_{i2}$ for all $i$.
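(a) Obtain the least squares estimators of $\mu + \alpha_i$ and $\gamma_i$ $(i = 1, \ldots, q)$.
(b) Obtain the corrected sequential ANOVA corresponding to the ordered version of this model in which $\mathbf{Z}\boldsymbol\gamma$ appears first (but after the overall mean, of course), where $\mathbf{Z} = \oplus_{i=1}^{q}(\mathbf{z}_i - \bar z_{i\cdot}\mathbf{1}_{n_i})$ and $\mathbf{z}_i = (z_{i1}, z_{i2}, \ldots, z_{in_i})^T$. Assume that $\mathbf{z}_i$ has at least two distinct elements for each $i$. Give nonmatrix expressions for the sums of squares in the ANOVA table.

Solution
(a) This model can be written in two-part form as follows:
\[
\mathbf{y} = \mathbf{X}\boldsymbol\beta + \mathbf{Z}\boldsymbol\gamma + \mathbf{e},
\]
where $\mathbf{X} = (\mathbf{1}_n, \oplus_{i=1}^{q}\mathbf{1}_{n_i})$, $\mathbf{Z} = \oplus_{i=1}^{q}(\mathbf{z}_i - \bar z_{i\cdot}\mathbf{1}_{n_i})$, $\boldsymbol\beta = (\mu, \alpha_1, \ldots, \alpha_q)^T$, and $\boldsymbol\gamma = (\gamma_1, \ldots, \gamma_q)^T$. Because
\[
\mathbf{X}^T\mathbf{Z} = \begin{pmatrix} \mathbf{1}_n^T \\ \oplus_{i=1}^{q}\mathbf{1}_{n_i}^T \end{pmatrix}\oplus_{i=1}^{q}(\mathbf{z}_i - \bar z_{i\cdot}\mathbf{1}_{n_i})
= \begin{pmatrix}
z_{1\cdot} - n_1\bar z_{1\cdot} & z_{2\cdot} - n_2\bar z_{2\cdot} & \cdots & z_{q\cdot} - n_q\bar z_{q\cdot} \\
z_{1\cdot} - n_1\bar z_{1\cdot} & 0 & \cdots & 0 \\
0 & z_{2\cdot} - n_2\bar z_{2\cdot} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & z_{q\cdot} - n_q\bar z_{q\cdot}
\end{pmatrix} = \mathbf{0},
\]
the corresponding two-part normal equations are
\[
\begin{pmatrix} n & \mathbf{n}^T \\ \mathbf{n} & \oplus_{i=1}^{q} n_i \end{pmatrix}\boldsymbol\beta + \mathbf{0}\boldsymbol\gamma = \begin{pmatrix} y_{\cdot\cdot} \\ y_{1\cdot} \\ \vdots \\ y_{q\cdot} \end{pmatrix},
\qquad
\mathbf{0}\boldsymbol\beta + \Big[\oplus_{i=1}^{q}\sum_{j=1}^{n_i}(z_{ij} - \bar z_{i\cdot})^2\Big]\boldsymbol\gamma = \begin{pmatrix} \sum_{j=1}^{n_1}(z_{1j} - \bar z_{1\cdot})y_{1j} \\ \vdots \\ \sum_{j=1}^{n_q}(z_{qj} - \bar z_{q\cdot})y_{qj} \end{pmatrix},
\]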
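where $\mathbf{n} = (n_1, \ldots, n_q)^T$. A solution for $\boldsymbol\gamma$ can be easily obtained from the "bottom" subset of equations:
\[
\hat{\boldsymbol\gamma} = \begin{pmatrix} \hat\gamma_1 \\ \vdots \\ \hat\gamma_q \end{pmatrix}
= \begin{pmatrix} \dfrac{\sum_{j=1}^{n_1}(z_{1j} - \bar z_{1\cdot})y_{1j}}{\sum_{j=1}^{n_1}(z_{1j} - \bar z_{1\cdot})^2} \\ \vdots \\ \dfrac{\sum_{j=1}^{n_q}(z_{qj} - \bar z_{q\cdot})y_{qj}}{\sum_{j=1}^{n_q}(z_{qj} - \bar z_{q\cdot})^2} \end{pmatrix}.
\]
Because one generalized inverse of the coefficient matrix of $\boldsymbol\beta$ in the "top" subset of equations is
\[
\begin{pmatrix} 0 & \mathbf{0}^T \\ \mathbf{0} & \oplus_{i=1}^{q}(1/n_i) \end{pmatrix},
\]
one solution for $\boldsymbol\beta$ is
\[
\hat{\boldsymbol\beta} = \begin{pmatrix} \hat\mu \\ \hat\alpha_1 \\ \vdots \\ \hat\alpha_q \end{pmatrix}
= \begin{pmatrix} 0 & \mathbf{0}^T \\ \mathbf{0} & \oplus_{i=1}^{q}(1/n_i) \end{pmatrix}\begin{pmatrix} y_{\cdot\cdot} \\ y_{1\cdot} \\ \vdots \\ y_{q\cdot} \end{pmatrix}
= \begin{pmatrix} 0 \\ \bar y_{1\cdot} \\ \vdots \\ \bar y_{q\cdot} \end{pmatrix}.
\]
Hence, the least squares estimator of $(\mu + \alpha_1, \ldots, \mu + \alpha_q)^T$ is
\[
\begin{pmatrix} \widehat{\mu + \alpha_1} \\ \vdots \\ \widehat{\mu + \alpha_q} \end{pmatrix} = \begin{pmatrix} \bar y_{1\cdot} \\ \vdots \\ \bar y_{q\cdot} \end{pmatrix}.
\]
(b) The corrected total sum of squares is
\[
\mathbf{y}^T\Big(\mathbf{I} - \frac{1}{n}\mathbf{J}_n\Big)\mathbf{y} = \sum_{i=1}^{q}\sum_{j=1}^{n_i}(y_{ij} - \bar y_{\cdot\cdot})^2
\]
with rank $n - 1$, and
\[
SS(\mathbf{1}) = \mathbf{y}^T\Big(\frac{1}{n}\mathbf{J}_n\Big)\mathbf{y} = n\bar y_{\cdot\cdot}^2.
\]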
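Now consider the orthogonal projection of $\mathbf{y}$ onto $C(\mathbf{1}, \mathbf{Z})$. The two-part normal equations corresponding to this submodel are
\[
\begin{pmatrix} n & \mathbf{0}^T \\ \mathbf{0} & \oplus_{i=1}^{q}\sum_{j=1}^{n_i}(z_{ij} - \bar z_{i\cdot})^2 \end{pmatrix}\begin{pmatrix} \mu \\ \boldsymbol\gamma \end{pmatrix} = \begin{pmatrix} \mathbf{1}_n^T\mathbf{y} \\ \mathbf{Z}^T\mathbf{y} \end{pmatrix}.
\]
Because the coefficient matrix is diagonal and nonsingular, the unique solution is
\[
\hat\mu = \bar y_{\cdot\cdot}, \qquad
\hat{\boldsymbol\gamma} = \begin{pmatrix} \hat\gamma_1 \\ \vdots \\ \hat\gamma_q \end{pmatrix}
= \begin{pmatrix} \dfrac{\sum_{j=1}^{n_1}(z_{1j} - \bar z_{1\cdot})y_{1j}}{\sum_{j=1}^{n_1}(z_{1j} - \bar z_{1\cdot})^2} \\ \vdots \\ \dfrac{\sum_{j=1}^{n_q}(z_{qj} - \bar z_{q\cdot})y_{qj}}{\sum_{j=1}^{n_q}(z_{qj} - \bar z_{q\cdot})^2} \end{pmatrix}.
\]
Therefore,
\[
SS(\mathbf{1}, \mathbf{Z}) = (\hat\mu, \hat{\boldsymbol\gamma}^T)\begin{pmatrix} \mathbf{1}_n^T\mathbf{y} \\ \mathbf{Z}^T\mathbf{y} \end{pmatrix} = \hat\mu\mathbf{1}_n^T\mathbf{y} + \hat{\boldsymbol\gamma}^T\mathbf{Z}^T\mathbf{y}
= n\bar y_{\cdot\cdot}^2 + \sum_{i=1}^{q}\frac{\big[\sum_{j=1}^{n_i}(z_{ij} - \bar z_{i\cdot})y_{ij}\big]^2}{\sum_{j=1}^{n_i}(z_{ij} - \bar z_{i\cdot})^2}
\]
and thus,
\[
SS(\mathbf{Z}\,|\,\mathbf{1}) = SS(\mathbf{1}, \mathbf{Z}) - SS(\mathbf{1}) = \sum_{i=1}^{q}\frac{\big[\sum_{j=1}^{n_i}(z_{ij} - \bar z_{i\cdot})y_{ij}\big]^2}{\sum_{j=1}^{n_i}(z_{ij} - \bar z_{i\cdot})^2}
\]
with rank $(q + 1) - 1 = q$. Denote $\mathbf{X}^* = \oplus_{i=1}^{q}\mathbf{1}_{n_i}$.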
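Now,
\[
\begin{aligned}
SS(\mathbf{X}^*, \mathbf{Z}, \mathbf{1}) &= (\hat{\boldsymbol\beta}^T, \hat{\boldsymbol\gamma}^T)\begin{pmatrix} \mathbf{X}^T \\ \mathbf{Z}^T \end{pmatrix}\mathbf{y} = \hat{\boldsymbol\beta}^T\mathbf{X}^T\mathbf{y} + \hat{\boldsymbol\gamma}^T\mathbf{Z}^T\mathbf{y} \\
&= (0, \bar y_{1\cdot}, \ldots, \bar y_{q\cdot})\begin{pmatrix} y_{\cdot\cdot} \\ y_{1\cdot} \\ \vdots \\ y_{q\cdot} \end{pmatrix}
+ \hat{\boldsymbol\gamma}^T\begin{pmatrix} \sum_{j=1}^{n_1}(z_{1j} - \bar z_{1\cdot})y_{1j} \\ \vdots \\ \sum_{j=1}^{n_q}(z_{qj} - \bar z_{q\cdot})y_{qj} \end{pmatrix} \\
&= \sum_{i=1}^{q} n_i\bar y_{i\cdot}^2 + \sum_{i=1}^{q}\frac{\big[\sum_{j=1}^{n_i}(z_{ij} - \bar z_{i\cdot})y_{ij}\big]^2}{\sum_{j=1}^{n_i}(z_{ij} - \bar z_{i\cdot})^2}.
\end{aligned}
\]
Therefore,
\[
\begin{aligned}
SS(\mathbf{X}^*\,|\,\mathbf{Z}, \mathbf{1}) &= SS(\mathbf{X}^*, \mathbf{Z}, \mathbf{1}) - SS(\mathbf{1}, \mathbf{Z}) \\
&= \Bigg(\sum_{i=1}^{q} n_i\bar y_{i\cdot}^2 + \sum_{i=1}^{q}\frac{\big[\sum_{j=1}^{n_i}(z_{ij} - \bar z_{i\cdot})y_{ij}\big]^2}{\sum_{j=1}^{n_i}(z_{ij} - \bar z_{i\cdot})^2}\Bigg)
- \Bigg(n\bar y_{\cdot\cdot}^2 + \sum_{i=1}^{q}\frac{\big[\sum_{j=1}^{n_i}(z_{ij} - \bar z_{i\cdot})y_{ij}\big]^2}{\sum_{j=1}^{n_i}(z_{ij} - \bar z_{i\cdot})^2}\Bigg) \\
&= \sum_{i=1}^{q} n_i\bar y_{i\cdot}^2 - n\bar y_{\cdot\cdot}^2
\end{aligned}
\]
with rank $2q - q - 1 = q - 1$. Finally, the corrected sequential ANOVA is as follows:
\[
\begin{array}{lll}
\text{Source} & \text{Rank} & \text{Sum of squares} \\ \hline
\mathbf{Z} & q & SS(\mathbf{Z}\,|\,\mathbf{1}) \\
\text{Classes}\,|\,\mathbf{Z} & q-1 & SS(\mathbf{X}^*\,|\,\mathbf{Z}, \mathbf{1}) \\
\text{Residual} & n-2q & \text{By subtraction} \\
\text{Corrected Total} & n-1 & \sum_{i=1}^{q}\sum_{j=1}^{n_i}(y_{ij} - \bar y_{\cdot\cdot})^2
\end{array}
\]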
Exercise 15 Consider the balanced two-way main effects analysis-of-covariance model with one observation per cell, i.e., yij = μ + αi + γj + ξ zij + eij
(i = 1, . . . , q; j = 1, . . . , m).
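(a) Obtain expressions for the least squares estimators of $\alpha_i - \alpha_{i'}$, $\gamma_j - \gamma_{j'}$, and $\xi$.
(b) Obtain the corrected sequential ANOVAs corresponding to the two ordered four-part models in which $\mathbf{1}\mu$ appears first and $\mathbf{z}\xi$ appears second.

Solution
(a) To obtain a solution to the normal equations, let us first formulate the model as a special case of the two-part model
\[
\mathbf{y} = \mathbf{X}_1\boldsymbol\beta_1 + \mathbf{X}_2\boldsymbol\beta_2 + \mathbf{e},
\]
where $\mathbf{X}_1 = (\mathbf{1}_{qm}, \mathbf{I}_q \otimes \mathbf{1}_m, \mathbf{1}_q \otimes \mathbf{I}_m)$, $\boldsymbol\beta_1 = (\mu, \alpha_1, \ldots, \alpha_q, \gamma_1, \ldots, \gamma_m)^T$, $\mathbf{X}_2 = \mathbf{z} = (z_{11}, z_{12}, \ldots, z_{qm})^T$, and $\boldsymbol\beta_2 = \xi$. The two-part normal equations corresponding to this model are easily shown to be
\[
\begin{pmatrix} mq & m\mathbf{1}_q^T & q\mathbf{1}_m^T \\ m\mathbf{1}_q & m\mathbf{I}_q & \mathbf{J}_{q \times m} \\ q\mathbf{1}_m & \mathbf{J}_{m \times q} & q\mathbf{I}_m \end{pmatrix}\boldsymbol\beta_1
+ (z_{\cdot\cdot}, z_{1\cdot}, \ldots, z_{q\cdot}, z_{\cdot 1}, \ldots, z_{\cdot m})^T\xi
= (y_{\cdot\cdot}, y_{1\cdot}, \ldots, y_{q\cdot}, y_{\cdot 1}, \ldots, y_{\cdot m})^T,
\]
\[
(z_{\cdot\cdot}, z_{1\cdot}, \ldots, z_{q\cdot}, z_{\cdot 1}, \ldots, z_{\cdot m})\boldsymbol\beta_1 + \sum_{i=1}^{q}\sum_{j=1}^{m} z_{ij}^2\,\xi = \sum_{i=1}^{q}\sum_{j=1}^{m} z_{ij}y_{ij}.
\]
Next observe that
\[
\begin{aligned}
\mathbf{P}_1 &= \mathbf{X}_1(\mathbf{X}_1^T\mathbf{X}_1)^{-}\mathbf{X}_1^T \\
&= (\mathbf{1}_{qm}, \mathbf{I}_q \otimes \mathbf{1}_m, \mathbf{1}_q \otimes \mathbf{I}_m)
\begin{pmatrix} mq & m\mathbf{1}_q^T & q\mathbf{1}_m^T \\ m\mathbf{1}_q & m\mathbf{I}_q & \mathbf{J}_{q \times m} \\ q\mathbf{1}_m & \mathbf{J}_{m \times q} & q\mathbf{I}_m \end{pmatrix}^{-}
(\mathbf{1}_{qm}, \mathbf{I}_q \otimes \mathbf{1}_m, \mathbf{1}_q \otimes \mathbf{I}_m)^T \\
&= (\mathbf{1}_{qm}, \mathbf{I}_q \otimes \mathbf{1}_m, \mathbf{1}_q \otimes \mathbf{I}_m)
\begin{pmatrix} 0 & \mathbf{0}_q^T & \mathbf{0}_m^T \\ \mathbf{0}_q & (1/m)\mathbf{I}_q & \mathbf{0}_{q \times m} \\ \mathbf{0}_m & \mathbf{0}_{m \times q} & (1/q)\mathbf{I}_m - (1/qm)\mathbf{J}_m \end{pmatrix}
\begin{pmatrix} \mathbf{1}_{qm}^T \\ \mathbf{I}_q \otimes \mathbf{1}_m^T \\ \mathbf{1}_q^T \otimes \mathbf{I}_m \end{pmatrix} \\
&= \big(\mathbf{0}_{qm},\ \mathbf{I}_q \otimes (1/m)\mathbf{1}_m,\ \mathbf{1}_q \otimes [(1/q)\mathbf{I}_m - (1/qm)\mathbf{J}_m]\big)
\begin{pmatrix} \mathbf{1}_{qm}^T \\ \mathbf{I}_q \otimes \mathbf{1}_m^T \\ \mathbf{1}_q^T \otimes \mathbf{I}_m \end{pmatrix} \\
&= \mathbf{I}_q \otimes (1/m)\mathbf{J}_m + \mathbf{J}_q \otimes [(1/q)\mathbf{I}_m - (1/qm)\mathbf{J}_m].
\end{aligned}
\]
Thus, the reduced normal equation for $\xi$ is
\[
\mathbf{z}^T\{\mathbf{I}_{qm} - [\mathbf{I}_q \otimes (1/m)\mathbf{J}_m] - \mathbf{J}_q \otimes [(1/q)\mathbf{I}_m - (1/qm)\mathbf{J}_m]\}\mathbf{z}\,\xi
= \mathbf{z}^T\{\mathbf{I}_{qm} - [\mathbf{I}_q \otimes (1/m)\mathbf{J}_m] - \mathbf{J}_q \otimes [(1/q)\mathbf{I}_m - (1/qm)\mathbf{J}_m]\}\mathbf{y}.
\]
The unique solution to this equation (which is also the least squares estimator of $\xi$) is
\[
\begin{aligned}
\hat\xi &= \big[\mathbf{z}^T\{\mathbf{I}_{qm} - [\mathbf{I}_q \otimes (1/m)\mathbf{J}_m] - \mathbf{J}_q \otimes [(1/q)\mathbf{I}_m - (1/qm)\mathbf{J}_m]\}\mathbf{z}\big]^{-1}
\mathbf{z}^T\{\mathbf{I}_{qm} - [\mathbf{I}_q \otimes (1/m)\mathbf{J}_m] - \mathbf{J}_q \otimes [(1/q)\mathbf{I}_m - (1/qm)\mathbf{J}_m]\}\mathbf{y} \\
&= \frac{\sum_{i=1}^{q}\sum_{j=1}^{m}(z_{ij} - \bar z_{i\cdot} - \bar z_{\cdot j} + \bar z_{\cdot\cdot})(y_{ij} - \bar y_{i\cdot} - \bar y_{\cdot j} + \bar y_{\cdot\cdot})}{\sum_{i=1}^{q}\sum_{j=1}^{m}(z_{ij} - \bar z_{i\cdot} - \bar z_{\cdot j} + \bar z_{\cdot\cdot})^2}.
\end{aligned}
\]
Back-solving for $\boldsymbol\beta_1$, we obtain
\[
\begin{aligned}
\hat{\boldsymbol\beta}_1 = (\hat\mu, \hat\alpha_1, \ldots, \hat\alpha_q, \hat\gamma_1, \ldots, \hat\gamma_m)^T
&= \begin{pmatrix} mq & m\mathbf{1}_q^T & q\mathbf{1}_m^T \\ m\mathbf{1}_q & m\mathbf{I}_q & \mathbf{J}_{q \times m} \\ q\mathbf{1}_m & \mathbf{J}_{m \times q} & q\mathbf{I}_m \end{pmatrix}^{-}
\big[(y_{\cdot\cdot}, y_{1\cdot}, \ldots, y_{q\cdot}, y_{\cdot 1}, \ldots, y_{\cdot m})^T - (z_{\cdot\cdot}, z_{1\cdot}, \ldots, z_{q\cdot}, z_{\cdot 1}, \ldots, z_{\cdot m})^T\hat\xi\big] \\
&= \begin{pmatrix} 0 & \mathbf{0}_q^T & \mathbf{0}_m^T \\ \mathbf{0}_q & (1/m)\mathbf{I}_q & \mathbf{0}_{q \times m} \\ \mathbf{0}_m & \mathbf{0}_{m \times q} & (1/q)\mathbf{I}_m - (1/qm)\mathbf{J}_m \end{pmatrix}
\begin{pmatrix} y_{\cdot\cdot} - \hat\xi z_{\cdot\cdot} \\ y_{1\cdot} - \hat\xi z_{1\cdot} \\ \vdots \\ y_{q\cdot} - \hat\xi z_{q\cdot} \\ y_{\cdot 1} - \hat\xi z_{\cdot 1} \\ \vdots \\ y_{\cdot m} - \hat\xi z_{\cdot m} \end{pmatrix}
= \begin{pmatrix} 0 \\ \bar y_{1\cdot} - \hat\xi\bar z_{1\cdot} \\ \vdots \\ \bar y_{q\cdot} - \hat\xi\bar z_{q\cdot} \\ \bar y_{\cdot 1} - \hat\xi\bar z_{\cdot 1} - (\bar y_{\cdot\cdot} - \hat\xi\bar z_{\cdot\cdot}) \\ \vdots \\ \bar y_{\cdot m} - \hat\xi\bar z_{\cdot m} - (\bar y_{\cdot\cdot} - \hat\xi\bar z_{\cdot\cdot}) \end{pmatrix}.
\end{aligned}
\]
So the least squares estimator of $\alpha_i - \alpha_{i'}$ is $\bar y_{i\cdot} - \bar y_{i'\cdot} - \hat\xi(\bar z_{i\cdot} - \bar z_{i'\cdot})$ $(i \ne i' = 1, \ldots, q)$, and the least squares estimator of $\gamma_j - \gamma_{j'}$ is $\bar y_{\cdot j} - \bar y_{\cdot j'} - \hat\xi(\bar z_{\cdot j} - \bar z_{\cdot j'})$ $(j \ne j' = 1, \ldots, m)$.
(b) First consider the ordered four-part model in which $\mathbf{1}\mu$ appears first, $\mathbf{z}\xi$ appears second, $\mathbf{X}_3\boldsymbol\alpha$ appears third, and $\mathbf{X}_4\boldsymbol\gamma$ appears last, where $\mathbf{X}_3 = \mathbf{I}_q \otimes \mathbf{1}_m$, $\boldsymbol\alpha = (\alpha_1, \ldots, \alpha_q)^T$, $\mathbf{X}_4 = \mathbf{1}_q \otimes \mathbf{I}_m$, and $\boldsymbol\gamma = (\gamma_1, \ldots, \gamma_m)^T$. (Here the Factor A effects are fit prior to the Factor B effects.) For this model, we may exploit results for the second corrected sequential ANOVA of the one-way analysis of covariance model presented in Sect. 9.4 to obtain
\[
SS(\mathbf{z}|\mathbf{1}) = \frac{\big[\sum_{i=1}^{q}\sum_{j=1}^{m}(y_{ij} - \bar y_{\cdot\cdot})(z_{ij} - \bar z_{\cdot\cdot})\big]^2}{\sum_{i=1}^{q}\sum_{j=1}^{m}(z_{ij} - \bar z_{\cdot\cdot})^2}
\]
and
\[
SS(\text{A Classes}|\mathbf{z}, \mathbf{1}) = \sum_{i=1}^{q} m(\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2
+ \frac{\big[\sum_{i=1}^{q}\sum_{j=1}^{m}(z_{ij} - \bar z_{i\cdot})(y_{ij} - \bar y_{i\cdot})\big]^2}{\sum_{i=1}^{q}\sum_{j=1}^{m}(z_{ij} - \bar z_{i\cdot})^2}
- \frac{\big[\sum_{i=1}^{q}\sum_{j=1}^{m}(y_{ij} - \bar y_{\cdot\cdot})(z_{ij} - \bar z_{\cdot\cdot})\big]^2}{\sum_{i=1}^{q}\sum_{j=1}^{m}(z_{ij} - \bar z_{\cdot\cdot})^2}.
\]
Here the middle term is the sum of squares for the covariate adjusted for Factor A only; its slope estimate is the one-way (within Factor A classes) estimate from Sect. 9.4, which generally differs from $\hat\xi$. To obtain $SS(\text{B Classes}|\text{A Classes}, \mathbf{z}, \mathbf{1})$, we may obtain the Model sum of squares for the overall ANOVA and subtract $SS(\mathbf{1}) = qm\bar y_{\cdot\cdot}^2$, $SS(\mathbf{z}|\mathbf{1})$, and $SS(\text{A Classes}|\mathbf{z}, \mathbf{1})$ from it. The former is, using results from part (a),
\[
\begin{aligned}
\hat{\boldsymbol\beta}^T\mathbf{X}^T\mathbf{y} &= \hat{\boldsymbol\beta}_1^T\mathbf{X}_1^T\mathbf{y} + \hat\xi\mathbf{z}^T\mathbf{y} \\
&= \sum_{i=1}^{q}(\bar y_{i\cdot} - \hat\xi\bar z_{i\cdot})y_{i\cdot} + \sum_{j=1}^{m}[\bar y_{\cdot j} - \hat\xi\bar z_{\cdot j} - (\bar y_{\cdot\cdot} - \hat\xi\bar z_{\cdot\cdot})]y_{\cdot j} + \hat\xi\sum_{i=1}^{q}\sum_{j=1}^{m} z_{ij}y_{ij} \\
&= qm\bar y_{\cdot\cdot}^2 + \sum_{i=1}^{q} m(\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2 + \sum_{j=1}^{m} q(\bar y_{\cdot j} - \bar y_{\cdot\cdot})^2
+ \hat\xi\sum_{i=1}^{q}\sum_{j=1}^{m}(z_{ij} - \bar z_{i\cdot} - \bar z_{\cdot j} + \bar z_{\cdot\cdot})(y_{ij} - \bar y_{i\cdot} - \bar y_{\cdot j} + \bar y_{\cdot\cdot}).
\end{aligned}
\]
Thus, the corrected sequential ANOVA for this model is
\[
\begin{array}{lll}
\text{Source} & \text{Rank} & \text{Sum of squares} \\ \hline
\mathbf{z} & 1 & \dfrac{\big[\sum_{i=1}^{q}\sum_{j=1}^{m}(y_{ij} - \bar y_{\cdot\cdot})(z_{ij} - \bar z_{\cdot\cdot})\big]^2}{\sum_{i=1}^{q}\sum_{j=1}^{m}(z_{ij} - \bar z_{\cdot\cdot})^2} \\
\text{A classes}|\mathbf{z} & q-1 & \sum_{i=1}^{q} m(\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2 + \dfrac{\big[\sum_{i=1}^{q}\sum_{j=1}^{m}(z_{ij} - \bar z_{i\cdot})(y_{ij} - \bar y_{i\cdot})\big]^2}{\sum_{i=1}^{q}\sum_{j=1}^{m}(z_{ij} - \bar z_{i\cdot})^2} - \dfrac{\big[\sum_{i=1}^{q}\sum_{j=1}^{m}(y_{ij} - \bar y_{\cdot\cdot})(z_{ij} - \bar z_{\cdot\cdot})\big]^2}{\sum_{i=1}^{q}\sum_{j=1}^{m}(z_{ij} - \bar z_{\cdot\cdot})^2} \\
\text{B classes}|\text{A classes}, \mathbf{z} & m-1 & \sum_{j=1}^{m} q(\bar y_{\cdot j} - \bar y_{\cdot\cdot})^2 + \hat\xi\sum_{i=1}^{q}\sum_{j=1}^{m}(z_{ij} - \bar z_{i\cdot} - \bar z_{\cdot j} + \bar z_{\cdot\cdot})(y_{ij} - \bar y_{i\cdot} - \bar y_{\cdot j} + \bar y_{\cdot\cdot}) - \dfrac{\big[\sum_{i=1}^{q}\sum_{j=1}^{m}(z_{ij} - \bar z_{i\cdot})(y_{ij} - \bar y_{i\cdot})\big]^2}{\sum_{i=1}^{q}\sum_{j=1}^{m}(z_{ij} - \bar z_{i\cdot})^2} \\
\text{Residual} & qm-q-m & \text{By subtraction} \\
\text{Corrected total} & qm-1 & \sum_{i=1}^{q}\sum_{j=1}^{m}(y_{ij} - \bar y_{\cdot\cdot})^2
\end{array}
\]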
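By the "symmetry" of the situation with Factor A and Factor B effects, the corrected sequential ANOVA for the model in which $\mathbf{1}\mu$ appears first, $\mathbf{z}\xi$ appears second, $\mathbf{X}_4\boldsymbol\gamma$ appears third, and $\mathbf{X}_3\boldsymbol\alpha$ appears last is as follows:
\[
\begin{array}{lll}
\text{Source} & \text{Rank} & \text{Sum of squares} \\ \hline
\mathbf{z} & 1 & \dfrac{\big[\sum_{i=1}^{q}\sum_{j=1}^{m}(y_{ij} - \bar y_{\cdot\cdot})(z_{ij} - \bar z_{\cdot\cdot})\big]^2}{\sum_{i=1}^{q}\sum_{j=1}^{m}(z_{ij} - \bar z_{\cdot\cdot})^2} \\
\text{B classes}|\mathbf{z} & m-1 & \sum_{j=1}^{m} q(\bar y_{\cdot j} - \bar y_{\cdot\cdot})^2 + \dfrac{\big[\sum_{i=1}^{q}\sum_{j=1}^{m}(z_{ij} - \bar z_{\cdot j})(y_{ij} - \bar y_{\cdot j})\big]^2}{\sum_{i=1}^{q}\sum_{j=1}^{m}(z_{ij} - \bar z_{\cdot j})^2} - \dfrac{\big[\sum_{i=1}^{q}\sum_{j=1}^{m}(y_{ij} - \bar y_{\cdot\cdot})(z_{ij} - \bar z_{\cdot\cdot})\big]^2}{\sum_{i=1}^{q}\sum_{j=1}^{m}(z_{ij} - \bar z_{\cdot\cdot})^2} \\
\text{A classes}|\text{B classes}, \mathbf{z} & q-1 & \sum_{i=1}^{q} m(\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2 + \hat\xi\sum_{i=1}^{q}\sum_{j=1}^{m}(z_{ij} - \bar z_{i\cdot} - \bar z_{\cdot j} + \bar z_{\cdot\cdot})(y_{ij} - \bar y_{i\cdot} - \bar y_{\cdot j} + \bar y_{\cdot\cdot}) - \dfrac{\big[\sum_{i=1}^{q}\sum_{j=1}^{m}(z_{ij} - \bar z_{\cdot j})(y_{ij} - \bar y_{\cdot j})\big]^2}{\sum_{i=1}^{q}\sum_{j=1}^{m}(z_{ij} - \bar z_{\cdot j})^2} \\
\text{Residual} & qm-q-m & \text{By subtraction} \\
\text{Corrected total} & qm-1 & \sum_{i=1}^{q}\sum_{j=1}^{m}(y_{ij} - \bar y_{\cdot\cdot})^2
\end{array}
\]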
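Closed-form sequential sums of squares such as those in part (b) can be compared against a direct projection computation. The sketch below is an illustration under stated assumptions, not the author's method: it obtains the sequential sums of squares for the order $\mathbf{1}, \mathbf{z}$, Factor A, Factor B by Gram–Schmidt orthogonalization in exact rational arithmetic, and compares them with $SS(\mathbf{z}|\mathbf{1})$, $SS_A + S_{zy\cdot A}^2/S_{zz\cdot A} - SS(\mathbf{z}|\mathbf{1})$, and $SS_B + S_{zy\cdot AB}^2/S_{zz\cdot AB} - S_{zy\cdot A}^2/S_{zz\cdot A}$, where the subscripts denote deviations from Factor A class means and from the additive two-way fit. The helper names and the data are hypothetical.

```python
from fractions import Fraction


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def sequential_ss(blocks, y):
    """Sequential SS for blocks of columns fitted in order (Gram-Schmidt, exact)."""
    basis, out = [], []
    for block in blocks:
        ss = Fraction(0)
        for col in block:
            w = [Fraction(v) for v in col]
            for b in basis:
                c = dot(w, b) / dot(b, b)
                w = [wi - c * bi for wi, bi in zip(w, b)]
            if dot(w, w) != 0:               # skip columns already in the span
                basis.append(w)
                ss += dot(w, y) ** 2 / dot(w, w)
        out.append(ss)
    return out


q, m = 3, 3
y = [[3, 5, 9], [4, 1, 8], [10, 2, 6]]       # hypothetical responses y_ij
z = [[1, 2, 4], [2, 0, 3], [5, 1, 1]]        # hypothetical covariates z_ij
cells = [(i, j) for i in range(q) for j in range(m)]
yv = [Fraction(y[i][j]) for i, j in cells]
zv = [Fraction(z[i][j]) for i, j in cells]
one = [1] * (q * m)
A = [[int(i == a) for i, j in cells] for a in range(q)]
B = [[int(j == b) for i, j in cells] for b in range(m)]


def centered(t, rows=False, cols=False):
    """Deviations of t_ij from the grand mean, the row means, or the additive fit."""
    g = Fraction(sum(map(sum, t)), q * m)
    r = [Fraction(sum(t[i]), m) for i in range(q)]
    c = [Fraction(sum(t[i][j] for i in range(q)), q) for j in range(m)]
    if rows and cols:
        return [t[i][j] - r[i] - c[j] + g for i, j in cells]
    if rows:
        return [t[i][j] - r[i] for i, j in cells]
    return [t[i][j] - g for i, j in cells]


def ratio(u, v):
    return dot(u, v) ** 2 / dot(u, u)


ybar = Fraction(sum(yv), q * m)
ss_A = m * sum((Fraction(sum(y[i]), m) - ybar) ** 2 for i in range(q))
ss_B = q * sum((Fraction(sum(y[i][j] for i in range(q)), q) - ybar) ** 2
               for j in range(m))
ss_z = ratio(centered(z), centered(y))
ss_z_A = ratio(centered(z, rows=True), centered(y, rows=True))
ss_z_AB = ratio(centered(z, True, True), centered(y, True, True))
closed = [ss_z, ss_A + ss_z_A - ss_z, ss_B + ss_z_AB - ss_z_A]
numeric = sequential_ss([[one], [zv], A, B], yv)[1:]
```

Passing the blocks in the order `[[one], [zv], B, A]` gives the corresponding check for the second ordered model.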
Exercise 16 Consider an analysis-of-covariance extension of the model in Exercise 9.6, written in cell-means form, as
\[
y_{ijk} = \mu_{ij} + \gamma z_{ijk} + e_{ijk} \quad (i = 1, \ldots, q;\ j = 1, \ldots, m;\ k = 1, \ldots, r).
\]
(a) Give the reduced normal equation for $\gamma$ (in either matrix or nonmatrix form), and assuming that the inverse of its coefficient "matrix" exists, obtain the unique solution to the reduced normal equation in nonmatrix form.
(b) Back-solve to obtain a solution for the $\mu_{ij}$'s.
(c) Using the overall ANOVA from Exercise 9.15 as a starting point, give the sequential ANOVA (sums of squares and ranks of the corresponding matrices), uncorrected for the mean, for the ordered two-part model $\{\mathbf{y}, \mathbf{X}_1\boldsymbol\beta_1 + \mathbf{z}\gamma\}$, where $\mathbf{X}_1$ is the model matrix without the covariate and $\mathbf{z}$ is the vector of covariates.

Solution
(a)
\[
\mathbf{z}^T\Big[\mathbf{I}_{qmr} - \frac{1}{r}(\mathbf{I}_{qm} \otimes \mathbf{J}_r)\Big]\mathbf{z}\,\gamma = \mathbf{z}^T\Big[\mathbf{I}_{qmr} - \frac{1}{r}(\mathbf{I}_{qm} \otimes \mathbf{J}_r)\Big]\mathbf{y},
\]
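so
\[
\hat\gamma = \frac{\sum_{i=1}^{q}\sum_{j=1}^{m}\sum_{k=1}^{r} z_{ijk}(y_{ijk} - \bar y_{ij\cdot})}{\sum_{i=1}^{q}\sum_{j=1}^{m}\sum_{k=1}^{r}(z_{ijk} - \bar z_{ij\cdot})^2}.
\]
(b) Observe that $\mathbf{X}^T\mathbf{X} = r\mathbf{I}_{qm}$, so the "top" subset of equations in the two-part normal equations is
\[
r\begin{pmatrix} \mu_{11} \\ \mu_{12} \\ \vdots \\ \mu_{qm} \end{pmatrix} + \begin{pmatrix} z_{11\cdot} \\ z_{12\cdot} \\ \vdots \\ z_{qm\cdot} \end{pmatrix}\gamma = \begin{pmatrix} y_{11\cdot} \\ y_{12\cdot} \\ \vdots \\ y_{qm\cdot} \end{pmatrix}.
\]
Back-solving yields $\hat\mu_{ij} = \bar y_{ij\cdot} - \hat\gamma\bar z_{ij\cdot}$ $(i = 1, \ldots, q;\ j = 1, \ldots, m)$.
(c)
\[
\begin{array}{lll}
\text{Source} & \text{Rank} & \text{Sum of squares} \\ \hline
\mathbf{X}_1 & qm & r\sum_{i=1}^{q}\sum_{j=1}^{m}\bar y_{ij\cdot}^2 \\
\mathbf{z}|\mathbf{X}_1 & 1 & \hat\gamma\sum_{i=1}^{q}\sum_{j=1}^{m}\sum_{k=1}^{r} z_{ijk}(y_{ijk} - \bar y_{ij\cdot}) \\
\text{Residual} & qm(r-1) - 1 & \text{By subtraction} \\
\text{Total} & qmr & \sum_{i=1}^{q}\sum_{j=1}^{m}\sum_{k=1}^{r} y_{ijk}^2
\end{array}
\]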
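Exercise 17 Prove Theorem 9.5.1: For the $n \times n$ matrix $\bar{\mathbf{J}}$ defined in Sect. 9.5, the following results hold:
(a) $\bar{\mathbf{J}}$ is symmetric and idempotent, and $\text{rank}(\bar{\mathbf{J}}) = m$;
(b) $\mathbf{I} - \bar{\mathbf{J}}$ is symmetric and idempotent, and $\text{rank}(\mathbf{I} - \bar{\mathbf{J}}) = n - m$;
(c) $\bar{\mathbf{J}}\mathbf{X} = \mathbf{X}$;
(d) $\bar{\mathbf{J}}\mathbf{P}_X = \mathbf{P}_X$;
(e) $(\mathbf{I} - \bar{\mathbf{J}})(\bar{\mathbf{J}} - \mathbf{P}_X) = \mathbf{0}$;
(f) $\bar{\mathbf{J}} - \mathbf{P}_X$ is symmetric and idempotent, and $\text{rank}(\bar{\mathbf{J}} - \mathbf{P}_X) = m - p^*$.

Solution The symmetry of $\bar{\mathbf{J}} = \oplus_{i=1}^{m}(1/n_i)\mathbf{J}_{n_i}$ follows immediately from that of each $\mathbf{J}_{n_i}$, and the idempotency holds because
\[
\bar{\mathbf{J}}\bar{\mathbf{J}} = \big[\oplus_{i=1}^{m}(1/n_i)\mathbf{J}_{n_i}\big]\big[\oplus_{i=1}^{m}(1/n_i)\mathbf{J}_{n_i}\big] = \oplus_{i=1}^{m}(1/n_i)\mathbf{J}_{n_i}(1/n_i)\mathbf{J}_{n_i} = \oplus_{i=1}^{m}(1/n_i)\mathbf{J}_{n_i} = \bar{\mathbf{J}}.
\]
Thus, by Theorem 2.12.2, $\text{rank}(\bar{\mathbf{J}}) = \text{tr}(\bar{\mathbf{J}}) = \sum_{i=1}^{m}(1/n_i)n_i = m$. This proves part (a). The symmetry and idempotency of $\mathbf{I} - \bar{\mathbf{J}}$ follow from that of $\bar{\mathbf{J}}$ (the latter by Theorem 2.12.1), so again by Theorem 2.12.2, $\text{rank}(\mathbf{I} - \bar{\mathbf{J}}) = \text{tr}(\mathbf{I} - \bar{\mathbf{J}}) = \text{tr}(\mathbf{I}) - \text{tr}(\bar{\mathbf{J}}) = n - m$, establishing part (b). As for part (c),
\[
\bar{\mathbf{J}}\mathbf{X} = \big[\oplus_{i=1}^{m}(1/n_i)\mathbf{J}_{n_i}\big]\begin{pmatrix} \mathbf{1}_{n_1}\mathbf{x}_1^T \\ \vdots \\ \mathbf{1}_{n_m}\mathbf{x}_m^T \end{pmatrix}
= \begin{pmatrix} (1/n_1)\mathbf{J}_{n_1}\mathbf{1}_{n_1}\mathbf{x}_1^T \\ \vdots \\ (1/n_m)\mathbf{J}_{n_m}\mathbf{1}_{n_m}\mathbf{x}_m^T \end{pmatrix}
= \begin{pmatrix} \mathbf{1}_{n_1}\mathbf{x}_1^T \\ \vdots \\ \mathbf{1}_{n_m}\mathbf{x}_m^T \end{pmatrix} = \mathbf{X}.
\]
By part (c), $\bar{\mathbf{J}}\mathbf{P}_X = \bar{\mathbf{J}}\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-}\mathbf{X}^T = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-}\mathbf{X}^T = \mathbf{P}_X$, proving part (d). By parts (a) and (d), $(\mathbf{I} - \bar{\mathbf{J}})(\bar{\mathbf{J}} - \mathbf{P}_X) = \mathbf{I}\bar{\mathbf{J}} - \mathbf{I}\mathbf{P}_X - \bar{\mathbf{J}}\bar{\mathbf{J}} + \bar{\mathbf{J}}\mathbf{P}_X = \bar{\mathbf{J}} - \mathbf{P}_X - \bar{\mathbf{J}} + \mathbf{P}_X = \mathbf{0}$, proving part (e). As for part (f), the symmetry of $\bar{\mathbf{J}} - \mathbf{P}_X$ follows easily from that of $\bar{\mathbf{J}}$ (part (a)) and $\mathbf{P}_X$. Also, $(\bar{\mathbf{J}} - \mathbf{P}_X)(\bar{\mathbf{J}} - \mathbf{P}_X) = \bar{\mathbf{J}}\bar{\mathbf{J}} - \bar{\mathbf{J}}\mathbf{P}_X - \mathbf{P}_X\bar{\mathbf{J}} + \mathbf{P}_X\mathbf{P}_X = \bar{\mathbf{J}} - \mathbf{P}_X - \mathbf{P}_X + \mathbf{P}_X = \bar{\mathbf{J}} - \mathbf{P}_X$, where we have used parts (a) and (d) and the fact that $\mathbf{P}_X\bar{\mathbf{J}} = (\bar{\mathbf{J}}\mathbf{P}_X)^T = \mathbf{P}_X^T = \mathbf{P}_X$. Thus $\bar{\mathbf{J}} - \mathbf{P}_X$ is symmetric and idempotent, so by Theorem 2.12.2 once more, $\text{rank}(\bar{\mathbf{J}} - \mathbf{P}_X) = \text{tr}(\bar{\mathbf{J}} - \mathbf{P}_X) = \text{tr}(\bar{\mathbf{J}}) - \text{tr}(\mathbf{P}_X) = \text{rank}(\bar{\mathbf{J}}) - \text{rank}(\mathbf{P}_X) = m - p^*$.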
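Parts (a) through (c) can be illustrated on a small example. The sketch below builds $\bar{\mathbf{J}}$ and a model matrix whose $i$th distinct row $\mathbf{x}_i^T$ is replicated $n_i$ times, then checks symmetry, idempotency, the trace, and $\bar{\mathbf{J}}\mathbf{X} = \mathbf{X}$. The replicate counts and rows are hypothetical, and the layout of $\mathbf{X}$ is the one used in the proof of part (c).

```python
from fractions import Fraction

n_i = [2, 3, 1]                        # hypothetical replicate counts n_1, ..., n_m
x_rows = [[1, 0], [1, 1], [1, 2]]      # hypothetical distinct rows x_i^T
n = sum(n_i)

# group[r] is the index i of the distinct row to which observation r belongs.
group = [g for g, size in enumerate(n_i) for _ in range(size)]

# J-bar is the direct sum of the blocks (1/n_i) J_{n_i}.
Jbar = [[Fraction(1, n_i[group[r]]) if group[r] == group[c] else Fraction(0)
         for c in range(n)] for r in range(n)]
X = [x_rows[group[r]] for r in range(n)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


Jbar_transposed = [list(col) for col in zip(*Jbar)]
Jbar_squared = matmul(Jbar, Jbar)
trace = sum(Jbar[r][r] for r in range(n))
JX = matmul(Jbar, X)
```

Each row of `JX` is the average of the rows of `X` within the same replicate group, which is why replicated rows are reproduced exactly.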
Constrained Least Squares Estimation and ANOVA
10
This chapter presents exercises on least squares estimation and ANOVA for constrained linear models and provides solutions to those exercises.

Exercise 1 Consider the constrained one-factor models featured in Example 10.1-1.
(a) Under the model with constraint $\alpha_q = h$, verify that (10.2) is the constrained least squares estimator of an arbitrary estimable function.
(b) Under the model with constraint $\sum_{i=1}^{q}\alpha_i = h$, obtain a general expression for the constrained least squares estimator of an arbitrary estimable function and verify that the estimators of level means and differences coincide with their counterparts under the unconstrained model.
(c) Under the model with constraint $\mu + \alpha_1 = h$, verify that (10.3) is the constrained least squares estimator of an arbitrary estimable function. Also verify the expressions for the constrained least squares estimators of estimable functions and for $\text{var}(\mathbf{C}^T\breve{\boldsymbol\beta})$ that immediately follow (10.3).

Solution
(a) Using Theorem 2.9.5b and the fact that $n - \mathbf{n}_{-q}^T(\oplus_{i=1}^{q-1}n_i^{-1})\mathbf{n}_{-q} = n_q$, the inverse of the upper left $q \times q$ block of the constrained normal equations coefficient matrix displayed in the second paragraph of Example 10.1-1 may be expressed as
\[
\begin{pmatrix} n & \mathbf{n}_{-q}^T \\ \mathbf{n}_{-q} & \oplus_{i=1}^{q-1}n_i \end{pmatrix}^{-1}
= \begin{pmatrix} n_q^{-1} & -n_q^{-1}\mathbf{1}_{q-1}^T \\ -n_q^{-1}\mathbf{1}_{q-1} & \oplus_{i=1}^{q-1}n_i^{-1} + n_q^{-1}\mathbf{J}_{q-1} \end{pmatrix}.
\]

© Springer Nature Switzerland AG 2020 D. L. Zimmerman, Linear Model Theory, https://doi.org/10.1007/978-3-030-52074-8_10
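The Schur complement of that same block of the constrained normal equations coefficient matrix is, as noted in the example, nonsingular and equal to $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, and its inverse is equal to itself (as is easily verified). Therefore, using Theorem 2.9.5a the inverse of the constrained normal equations coefficient matrix has lower right $2 \times 2$ block $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, upper right $q \times 2$ block
\[
-\begin{pmatrix} n_q^{-1} & -n_q^{-1}\mathbf{1}_{q-1}^T \\ -n_q^{-1}\mathbf{1}_{q-1} & \oplus_{i=1}^{q-1}n_i^{-1} + n_q^{-1}\mathbf{J}_{q-1} \end{pmatrix}
\begin{pmatrix} n_q & 0 \\ \mathbf{0}_{q-1} & \mathbf{0}_{q-1} \end{pmatrix}
\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
= \begin{pmatrix} 0 & -1 \\ \mathbf{0}_{q-1} & \mathbf{1}_{q-1} \end{pmatrix},
\]
lower left $2 \times q$ block $\begin{pmatrix} 0 & \mathbf{0}_{q-1}^T \\ -1 & \mathbf{1}_{q-1}^T \end{pmatrix}$ (by the symmetry of the coefficient matrix), and upper left $q \times q$ block
\[
\begin{aligned}
&\begin{pmatrix} n_q^{-1} & -n_q^{-1}\mathbf{1}_{q-1}^T \\ -n_q^{-1}\mathbf{1}_{q-1} & \oplus_{i=1}^{q-1}n_i^{-1} + n_q^{-1}\mathbf{J}_{q-1} \end{pmatrix}
+ \begin{pmatrix} 0 & 1 \\ \mathbf{0}_{q-1} & -\mathbf{1}_{q-1} \end{pmatrix}
\begin{pmatrix} n_q & \mathbf{0}_{q-1}^T \\ 0 & \mathbf{0}_{q-1}^T \end{pmatrix}
\begin{pmatrix} n_q^{-1} & -n_q^{-1}\mathbf{1}_{q-1}^T \\ -n_q^{-1}\mathbf{1}_{q-1} & \oplus_{i=1}^{q-1}n_i^{-1} + n_q^{-1}\mathbf{J}_{q-1} \end{pmatrix} \\
&\quad = \begin{pmatrix} n_q^{-1} & -n_q^{-1}\mathbf{1}_{q-1}^T \\ -n_q^{-1}\mathbf{1}_{q-1} & \oplus_{i=1}^{q-1}n_i^{-1} + n_q^{-1}\mathbf{J}_{q-1} \end{pmatrix}.
\end{aligned}
\]
Putting these blocks together, the inverse of the constrained normal equations coefficient matrix is
\[
\begin{pmatrix}
n_q^{-1} & -n_q^{-1}\mathbf{1}_{q-1}^T & 0 & -1 \\
-n_q^{-1}\mathbf{1}_{q-1} & \oplus_{i=1}^{q-1}n_i^{-1} + n_q^{-1}\mathbf{J}_{q-1} & \mathbf{0}_{q-1} & \mathbf{1}_{q-1} \\
0 & \mathbf{0}_{q-1}^T & 0 & 1 \\
-1 & \mathbf{1}_{q-1}^T & 1 & 0
\end{pmatrix}.
\]
Extracting $\mathbf{G}_{11}$ and $\mathbf{G}_{12}$ as the upper left $(q+1) \times (q+1)$ and upper right $(q+1) \times 1$ blocks, respectively, of this last matrix, we may use Theorem 10.1.4 to write the constrained least squares estimator of an estimable (under the model with constraint $\alpha_q = h$) function $\mathbf{c}^T\boldsymbol\beta$ as
\[
\begin{aligned}
\mathbf{c}^T\breve{\boldsymbol\beta} &= \mathbf{c}^T(\mathbf{G}_{11}\mathbf{X}^T\mathbf{y} + \mathbf{G}_{12}h) \\
&= \mathbf{c}^T\left[\begin{pmatrix} n_q^{-1} & -n_q^{-1}\mathbf{1}_{q-1}^T & 0 \\ -n_q^{-1}\mathbf{1}_{q-1} & \oplus_{i=1}^{q-1}n_i^{-1} + n_q^{-1}\mathbf{J}_{q-1} & \mathbf{0}_{q-1} \\ 0 & \mathbf{0}_{q-1}^T & 0 \end{pmatrix}
\begin{pmatrix} y_{\cdot\cdot} \\ y_{1\cdot} \\ \vdots \\ y_{q\cdot} \end{pmatrix}
+ \begin{pmatrix} -1 \\ \mathbf{1}_{q-1} \\ 1 \end{pmatrix}h\right] \\
&= \mathbf{c}^T\left[\begin{pmatrix} n_q^{-1}\big(y_{\cdot\cdot} - \sum_{i=1}^{q-1}y_{i\cdot}\big) \\ -n_q^{-1}y_{\cdot\cdot}\mathbf{1}_{q-1} + (\bar y_{1\cdot}, \ldots, \bar y_{q-1\cdot})^T + n_q^{-1}\big(\sum_{i=1}^{q-1}y_{i\cdot}\big)\mathbf{1}_{q-1} \\ 0 \end{pmatrix}
+ \begin{pmatrix} -1 \\ \mathbf{1}_{q-1} \\ 1 \end{pmatrix}h\right] \\
&= \mathbf{c}^T\begin{pmatrix} \bar y_{q\cdot} - h \\ \bar y_{1\cdot} - \bar y_{q\cdot} + h \\ \bar y_{2\cdot} - \bar y_{q\cdot} + h \\ \vdots \\ \bar y_{q-1\cdot} - \bar y_{q\cdot} + h \\ h \end{pmatrix}.
\end{aligned}
\]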
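The block inverse assembled in part (a) can be verified numerically. The sketch below builds the constrained normal equations coefficient matrix for the constraint $\alpha_q = h$, with the unknowns ordered as $(\mu, \alpha_1, \ldots, \alpha_q, \lambda)$, fills in the claimed inverse block by block, and multiplies the two. It then applies the inverse to the right-hand side to recover the constrained estimator. The group sizes, totals, and value of $h$ are hypothetical; $\lambda$ denotes the Lagrange multiplier, a label assumed here for the last unknown.

```python
from fractions import Fraction as F

n = [2, 3, 4]                 # hypothetical group sizes n_1, ..., n_q
totals = [6, 15, 8]           # hypothetical group totals y_1., ..., y_q.
h = 1                         # hypothetical constraint value
q, nq, N = len(n), n[-1], sum(n)
size = q + 2

# Coefficient matrix [[X'X, a], [a', 0]] with a = (0, ..., 0, 1)' for alpha_q = h.
K = [[F(0)] * size for _ in range(size)]
K[0][0] = F(N)
for i, ni in enumerate(n, start=1):
    K[0][i] = K[i][0] = K[i][i] = F(ni)
K[q][q + 1] = K[q + 1][q] = F(1)

# Claimed inverse, filled in from the blocks derived above.
Kinv = [[F(0)] * size for _ in range(size)]
Kinv[0][0] = F(1, nq)
for i in range(1, q):
    Kinv[0][i] = Kinv[i][0] = F(-1, nq)
    for j in range(1, q):
        Kinv[i][j] = F(1, nq) + (F(1, n[i - 1]) if i == j else 0)
    Kinv[i][q + 1] = Kinv[q + 1][i] = F(1)
Kinv[0][q + 1] = Kinv[q + 1][0] = F(-1)
Kinv[q][q + 1] = Kinv[q + 1][q] = F(1)

product = [[sum(K[r][t] * Kinv[t][c] for t in range(size)) for c in range(size)]
           for r in range(size)]
identity = [[F(int(r == c)) for c in range(size)] for r in range(size)]

# Constrained estimator: first q + 1 entries of Kinv times (X'y, h).
rhs = [sum(totals)] + totals + [h]
beta = [sum(Kinv[r][c] * rhs[c] for c in range(size)) for r in range(q + 1)]
means = [F(t, ni) for t, ni in zip(totals, n)]
expected = [means[-1] - h] + [mi - means[-1] + h for mi in means[:-1]] + [F(h)]
```

With these inputs the level means are 3, 5, and 2, so the estimator is $(1, 2, 4, 1)^T$, in agreement with the final display of part (a).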
(b) In this case the coefficient matrix of the constrained normal equations is ⎛
⎞ n nT 0 ⎝ n ⊕q ni 1q ⎠ , i=1 0 1Tq 0 which is nonsingular because ni=1 αi is nonestimable under the unconstrained model. The upper left scalar element and the lower right (q + 1) × (q + 1) submatrix of this matrix are nonsingular also. In particular, by Theorem 2.9.5a the inverse of that submatrix is
−1 q ⊕i=1 ni 1q 1Tq 0 q q q q − m−1 ⊕i=1 n−1 1q 1Tq ⊕i=1 n−1 m−1 ⊕i=1 n−1 1q ⊕i=1 n−1 i i i i = q m−1 1Tq ⊕i=1 n−1 −m−1 i
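where $m = \sum_{i=1}^{q} n_i^{-1}$. Applying Theorem 2.9.5b with $P$ in that theorem taken here to be
\[
n - (\mathbf{n}^T, 0)
\begin{pmatrix}
\oplus_{i=1}^{q} n_i & \mathbf{1}_q \\
\mathbf{1}_q^T & 0
\end{pmatrix}^{-1}
\begin{pmatrix} \mathbf{n} \\ 0 \end{pmatrix}
= q^2/m,
\]
we find, after tedious calculations, that the inverse of the constrained normal equations coefficient matrix is
\[
\begin{pmatrix}
n & \mathbf{n}^T & 0 \\
\mathbf{n} & \oplus_{i=1}^{q} n_i & \mathbf{1}_q \\
0 & \mathbf{1}_q^T & 0
\end{pmatrix}^{-1}
=
\begin{pmatrix}
m_{11} & m_{12}^T \\
m_{12} & M_{22}
\end{pmatrix},
\]
where $m_{11} = m/q^2$,
\[
m_{12}^T = \Bigl( -(m/q^2)\mathbf{1}_q^T + (1/q)\mathbf{1}_q^T\bigl(\oplus_{i=1}^{q} n_i^{-1}\bigr),\; -1/q \Bigr),
\]
and
\[
M_{22} =
\begin{pmatrix}
\oplus_{i=1}^{q} n_i^{-1} + (m/q^2)\mathbf{1}_q\mathbf{1}_q^T - (1/q)\mathbf{1}_q\mathbf{1}_q^T\bigl(\oplus_{i=1}^{q} n_i^{-1}\bigr) - (1/q)\bigl(\oplus_{i=1}^{q} n_i^{-1}\bigr)\mathbf{1}_q\mathbf{1}_q^T & (1/q)\mathbf{1}_q \\
(1/q)\mathbf{1}_q^T & 0
\end{pmatrix}.
\]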
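Then,
\[
\begin{aligned}
G_{11}X^Ty &=
\begin{pmatrix}
m/q^2 & -(m/q^2)\mathbf{1}_q^T + (1/q)\mathbf{1}_q^T\bigl(\oplus_{i=1}^{q} n_i^{-1}\bigr) \\
-(m/q^2)\mathbf{1}_q + (1/q)\bigl(\oplus_{i=1}^{q} n_i^{-1}\bigr)\mathbf{1}_q &
\oplus_{i=1}^{q} n_i^{-1} + (m/q^2)\mathbf{1}_q\mathbf{1}_q^T - (1/q)\mathbf{1}_q\mathbf{1}_q^T\bigl(\oplus_{i=1}^{q} n_i^{-1}\bigr) - (1/q)\bigl(\oplus_{i=1}^{q} n_i^{-1}\bigr)\mathbf{1}_q\mathbf{1}_q^T
\end{pmatrix}
\begin{pmatrix} y_{\cdot\cdot} \\ y_{1\cdot} \\ \vdots \\ y_{q\cdot} \end{pmatrix} \\
&=
\begin{pmatrix}
(1/q)\sum_{j=1}^{q}\bar{y}_{j\cdot} \\
\bar{y}_{1\cdot} - (1/q)\sum_{j=1}^{q}\bar{y}_{j\cdot} \\
\vdots \\
\bar{y}_{q\cdot} - (1/q)\sum_{j=1}^{q}\bar{y}_{j\cdot}
\end{pmatrix}
\end{aligned}
\]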
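and
\[
G_{12}h = \begin{pmatrix} -(1/q) \\ (1/q)\mathbf{1}_q \end{pmatrix} h.
\]
Therefore, a general expression for the constrained least squares estimator of an estimable (under the model with constraint $\sum_{i=1}^{q}\alpha_i = h$) function $c^T\beta$ is
\[
c^T
\begin{pmatrix}
(1/q)\sum_{j=1}^{q}\bar{y}_{j\cdot} - (h/q) \\
\bar{y}_{1\cdot} - (1/q)\sum_{j=1}^{q}\bar{y}_{j\cdot} + (h/q) \\
\vdots \\
\bar{y}_{q\cdot} - (1/q)\sum_{j=1}^{q}\bar{y}_{j\cdot} + (h/q)
\end{pmatrix}.
\]
Constrained least squares estimators of $\mu$ and $\alpha_i$ are $(1/q)\sum_{j=1}^{q}\bar{y}_{j\cdot} - (1/q)h$ and $\bar{y}_{i\cdot} - (1/q)\sum_{j=1}^{q}\bar{y}_{j\cdot} + (1/q)h$, respectively. Constrained least squares estimators of $\mu + \alpha_i$ and $\alpha_i - \alpha_{i'}$ are $\bar{y}_{i\cdot}$ and $\bar{y}_{i\cdot} - \bar{y}_{i'\cdot}$, respectively, which coincide with their counterparts under the unconstrained model.

(c) By Theorem 3.3.7, one generalized inverse of the constrained normal equations coefficient matrix has lower right scalar element $-n_1$, upper right $(q+1)\times 1$ block
\[
-\begin{pmatrix} 0 & \mathbf{0}_q^T \\ \mathbf{0}_q & \oplus_{i=1}^{q} n_i^{-1} \end{pmatrix}
\begin{pmatrix} 1 \\ u_1 \end{pmatrix}(-n_1)
= \begin{pmatrix} 0 \\ 1 \\ \mathbf{0}_{q-1} \end{pmatrix},
\]
lower left $1\times(q+1)$ block
\[
-(-n_1)(1, u_1^T)
\begin{pmatrix} 0 & \mathbf{0}_q^T \\ \mathbf{0}_q & \oplus_{i=1}^{q} n_i^{-1} \end{pmatrix}
= (0, 1, \mathbf{0}_{q-1}^T),
\]
and upper left $(q+1)\times(q+1)$ block
\[
\begin{pmatrix} 0 & \mathbf{0}_q^T \\ \mathbf{0}_q & \oplus_{i=1}^{q} n_i^{-1} \end{pmatrix}
+ \begin{pmatrix} 0 & \mathbf{0}_q^T \\ \mathbf{0}_q & \oplus_{i=1}^{q} n_i^{-1} \end{pmatrix}
\begin{pmatrix} 1 \\ u_1 \end{pmatrix}(-n_1)(1, u_1^T)
\begin{pmatrix} 0 & \mathbf{0}_q^T \\ \mathbf{0}_q & \oplus_{i=1}^{q} n_i^{-1} \end{pmatrix}
= \begin{pmatrix} 0_{2\times 2} & 0_{2\times(q-1)} \\ 0_{(q-1)\times 2} & \oplus_{i=2}^{q} n_i^{-1} \end{pmatrix}.
\]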
Therefore, one solution to the constrained normal equations is
\[
\begin{pmatrix}
0 & 0 & \mathbf{0}_{q-1}^T & 0 \\
0 & 0 & \mathbf{0}_{q-1}^T & 1 \\
\mathbf{0}_{q-1} & \mathbf{0}_{q-1} & \oplus_{i=2}^{q} n_i^{-1} & \mathbf{0}_{q-1} \\
0 & 1 & \mathbf{0}_{q-1}^T & -n_1
\end{pmatrix}
\begin{pmatrix} y_{\cdot\cdot} \\ y_{1\cdot} \\ y_{2\cdot} \\ \vdots \\ y_{q\cdot} \\ h \end{pmatrix}
=
\begin{pmatrix} 0 \\ h \\ \bar{y}_{2\cdot} \\ \vdots \\ \bar{y}_{q\cdot} \\ y_{1\cdot} - n_1 h \end{pmatrix},
\]
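so the constrained least squares estimator of an estimable (under the model with constraint $\mu + \alpha_1 = h$) function $c^T\beta$ is
\[
c^T \begin{pmatrix} 0 \\ h \\ \bar{y}_{2\cdot} \\ \vdots \\ \bar{y}_{q\cdot} \end{pmatrix}.
\]
Thus, the constrained least squares estimators of $\mu + \alpha_1$, $\mu + \alpha_i$ $(i = 2, \ldots, q)$, $\alpha_i - \alpha_1$ $(i = 2, \ldots, q)$, and $\alpha_i - \alpha_{i'}$ $(i > i' = 2, \ldots, q)$ are, respectively, $h$, $\bar{y}_{i\cdot}$, $\bar{y}_{i\cdot} - h$, and $\bar{y}_{i\cdot} - \bar{y}_{i'\cdot}$. Finally, for $C^T = (\mathbf{1}_q, I_q)$,
\[
\operatorname{var}(C^T\breve{\beta}) = \sigma^2 (\mathbf{1}_q, I_q)
\begin{pmatrix}
0 & 0 & \mathbf{0}_{q-1}^T \\
0 & 0 & \mathbf{0}_{q-1}^T \\
\mathbf{0}_{q-1} & \mathbf{0}_{q-1} & \oplus_{i=2}^{q} n_i^{-1}
\end{pmatrix}
\begin{pmatrix} \mathbf{1}_q^T \\ I_q \end{pmatrix}
= \sigma^2
\begin{pmatrix}
0 & \mathbf{0}_{q-1}^T \\
\mathbf{0}_{q-1} & \oplus_{i=2}^{q} n_i^{-1}
\end{pmatrix}.
\]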
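The closed-form estimators in part (b) can be checked numerically. The sketch below is only an illustration, not part of the original solution: the data in `groups`, the value of `h`, and the helper `solve` are all hypothetical. It builds the constrained normal equations for the constraint $\sum_i \alpha_i = h$, solves them exactly with rational arithmetic, and keeps the group means for comparison with the stated formulas.

```python
from fractions import Fraction


def solve(M, rhs):
    """Solve M x = rhs by Gauss-Jordan elimination in exact rational arithmetic.

    Hypothetical helper for this illustration; assumes M is square and nonsingular.
    """
    size = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(size):
        # Find a row with a nonzero pivot and move it into place.
        pivot = next(r for r in range(col, size) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(size):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


# Hypothetical unbalanced one-factor data with q = 3 groups.
groups = [[3, 5], [4, 8, 9], [10]]
h = Fraction(6)  # hypothetical right-hand side of the constraint sum(alpha_i) = h

q = len(groups)
n_i = [len(g) for g in groups]
totals = [sum(g) for g in groups]
means = [Fraction(t, k) for t, k in zip(totals, n_i)]

# Coefficient matrix for the unknowns (mu, alpha_1, ..., alpha_q, lambda).
M = [[sum(n_i)] + n_i + [0]]
for i in range(q):
    M.append([n_i[i]] + [n_i[i] if j == i else 0 for j in range(q)] + [1])
M.append([0] + [1] * q + [0])
rhs = [sum(totals)] + totals + [h]

solution = solve(M, rhs)
mu, alpha, lam = solution[0], solution[1:q + 1], solution[-1]
mean_of_means = sum(means) / q
```

Under these assumptions `mu` should equal `mean_of_means - h / q`, each `alpha[i]` should equal `means[i] - mean_of_means + h / q`, and the Lagrange multiplier `lam` should be zero, as expected when the constraint is nonestimable under the unconstrained model.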
Exercise 2 Consider the one-factor model. For this model, the normal equations may be written without using matrix and vector notation as
\[
n\mu + \sum_{i=1}^{q} n_i\alpha_i = y_{\cdot\cdot},
\]
\[
n_i\mu + n_i\alpha_i = y_{i\cdot} \quad (i = 1, \ldots, q).
\]
Using the approach described in Example 7.1-2, it may be shown that one solution to these equations is $\hat{\beta} = (0, \bar{y}_{1\cdot}, \bar{y}_{2\cdot}, \ldots, \bar{y}_{q\cdot})^T$. The following five functions might be of interest: $\mu$, $\alpha_1$, $\alpha_1 - \alpha_2$, $\mu + \frac{1}{q}\sum_{i=1}^{q}\alpha_i$, and $\mu + \frac{1}{n}\sum_{i=1}^{q} n_i\alpha_i$.
(a) For a model without constraints, which of these five functions are estimable and which are not? Give the BLUEs for those that are estimable.
(b) For each function that is nonestimable under a model without constraints, show that it is estimable under the model with the constraint $\alpha_q = 0$ and give its BLUE under this model.
(c) For each function that is estimable under the model without constraints, show that its least squares estimator in terms of the elements of $\hat{\beta}$, and its constrained least squares estimator under the model with constraint $\alpha_q = 0$, are identical.

Solution
(a) $\alpha_1 - \alpha_2$, $\mu + \frac{1}{q}\sum_{i=1}^{q}\alpha_i$, and $\mu + \frac{1}{n}\sum_{i=1}^{q} n_i\alpha_i$ are estimable; $\mu$ and $\alpha_1$ are not. Using the given solution, BLUEs for the estimable functions are, respectively, $\bar{y}_{1\cdot} - \bar{y}_{2\cdot}$, $(1/q)\sum_{i=1}^{q}\bar{y}_{i\cdot}$, and $(1/n)\sum_{i=1}^{q} n_i\bar{y}_{i\cdot} = (1/n)\sum_{i=1}^{q}\sum_{j=1}^{n_i} y_{ij} = \bar{y}_{\cdot\cdot}$.
(b) The constraint $\alpha_q = 0$ may be written as $A\beta = h$, where $A = (\mathbf{0}_q^T, 1)$ (and $h = 0$). Then,
\[
\mu = (1, \mathbf{0}_{q-1}^T, 1)\beta - (\mathbf{0}_q^T, 1)\beta.
\]
In this expression, the first row vector multiplying $\beta$ is an element of $R(X)$, and the second such row vector is an element of $R(A)$. Thus $\mu = c^T\beta$ where $c^T \in R\begin{pmatrix} X \\ A \end{pmatrix}$, so it is estimable under the constrained model. From the equation $n_q\mu + n_q\alpha_q = y_{q\cdot}$ (the next-to-last equation in the constrained normal equations), we obtain $\breve{\mu} = \bar{y}_{q\cdot}$, and from the $q - 1$ equations
\[
n_i\mu + n_i\alpha_i = y_{i\cdot} \quad (i = 1, \ldots, q-1)
\]
we obtain $\bar{y}_{q\cdot} + \alpha_i = \bar{y}_{i\cdot}$, i.e., $\breve{\alpha}_i = \bar{y}_{i\cdot} - \bar{y}_{q\cdot}$ $(i = 1, \ldots, q-1)$. Next consider $\alpha_1$, which may be expressed as
\[
\alpha_1 = (1, 1, \mathbf{0}_{q-1}^T)\beta - (1, \mathbf{0}_{q-1}^T, 1)\beta + (\mathbf{0}_q^T, 1)\beta;
\]
the first two row vectors multiplying $\beta$ are elements of $R(X)$, and the third is an element of $R(A)$. Thus $\alpha_1$ is estimable under the constrained model, and its BLUE, using the previously stated solution to the constrained normal equations, is $\breve{\alpha}_1 = \bar{y}_{1\cdot} - \bar{y}_{q\cdot}$.
(c) The least squares estimator of $\alpha_1 - \alpha_2$ in terms of $\hat{\beta}$ is
\[
(0, 1, -1, \mathbf{0}_{q-2}^T)(0, \bar{y}_{1\cdot}, \bar{y}_{2\cdot}, \ldots, \bar{y}_{q\cdot})^T = \bar{y}_{1\cdot} - \bar{y}_{2\cdot}.
\]
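Its constrained least squares estimator in terms of the solution to the constrained normal equations obtained in part (b) is
\[
(0, 1, -1, \mathbf{0}_{q-2}^T)(\bar{y}_{q\cdot}, \bar{y}_{1\cdot} - \bar{y}_{q\cdot}, \bar{y}_{2\cdot} - \bar{y}_{q\cdot}, \ldots, \bar{y}_{q-1\cdot} - \bar{y}_{q\cdot}, 0)^T = (\bar{y}_{1\cdot} - \bar{y}_{q\cdot}) - (\bar{y}_{2\cdot} - \bar{y}_{q\cdot}) = \bar{y}_{1\cdot} - \bar{y}_{2\cdot}.
\]
The least squares estimator of $\mu + (1/q)\sum_{i=1}^{q}\alpha_i$ in terms of $\hat{\beta}$ is
\[
(1, (1/q)\mathbf{1}_q^T)\hat{\beta} = (1/q)\sum_{i=1}^{q}\bar{y}_{i\cdot}.
\]
Its constrained least squares estimator in terms of the solution to the constrained normal equations obtained in part (b) is
\[
(1, (1/q)\mathbf{1}_q^T)(\bar{y}_{q\cdot}, \bar{y}_{1\cdot} - \bar{y}_{q\cdot}, \bar{y}_{2\cdot} - \bar{y}_{q\cdot}, \ldots, \bar{y}_{q-1\cdot} - \bar{y}_{q\cdot}, 0)^T = (1/q)\sum_{i=1}^{q}\bar{y}_{i\cdot}.
\]
The least squares estimator of $\mu + (1/n)\sum_{i=1}^{q} n_i\alpha_i$ in terms of $\hat{\beta}$ is
\[
(1, (n_1/n), (n_2/n), \ldots, (n_q/n))\hat{\beta} = (1/n)\sum_{i=1}^{q} n_i\bar{y}_{i\cdot} = \bar{y}_{\cdot\cdot}.
\]
Its constrained least squares estimator in terms of the solution to the constrained normal equations obtained in part (b) is
\[
\begin{aligned}
(1, (n_1/n), \ldots, (n_q/n))&(\bar{y}_{q\cdot}, \bar{y}_{1\cdot} - \bar{y}_{q\cdot}, \ldots, \bar{y}_{q-1\cdot} - \bar{y}_{q\cdot}, 0)^T \\
&= \bar{y}_{q\cdot} + \sum_{i=1}^{q-1}(n_i/n)(\bar{y}_{i\cdot} - \bar{y}_{q\cdot}) \\
&= \bar{y}_{q\cdot} + (1/n)\sum_{i=1}^{q-1} n_i\bar{y}_{i\cdot} - (1/n)\bar{y}_{q\cdot}\sum_{i=1}^{q-1} n_i \\
&= \bar{y}_{q\cdot} + (1/n)\sum_{i=1}^{q-1} n_i\bar{y}_{i\cdot} - [(n - n_q)/n]\bar{y}_{q\cdot} \\
&= \bar{y}_{q\cdot} + (1/n)\sum_{i=1}^{q-1}\sum_{j=1}^{n_i} y_{ij} - \bar{y}_{q\cdot} + (1/n)\sum_{j=1}^{n_q} y_{qj} \\
&= \bar{y}_{\cdot\cdot}.
\end{aligned}
\]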
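The agreement shown in part (c) can be illustrated numerically. The following sketch uses hypothetical data (`groups`) and a hypothetical helper `estimate`; it evaluates each estimable function at the unconstrained solution $\hat{\beta}$ and at the solution of the constrained normal equations from part (b), using exact rational arithmetic.

```python
from fractions import Fraction

# Hypothetical unbalanced one-factor data with q = 3 groups.
groups = [[3, 5], [4, 8, 9], [10]]
q = len(groups)
n_i = [len(g) for g in groups]
n = sum(n_i)
means = [Fraction(sum(g), len(g)) for g in groups]

# Unconstrained solution (0, ybar_1, ..., ybar_q) and the constrained solution
# (ybar_q, ybar_1 - ybar_q, ..., ybar_{q-1} - ybar_q, 0) under alpha_q = 0.
beta_hat = [Fraction(0)] + means
beta_breve = [means[-1]] + [m - means[-1] for m in means]


def estimate(c, beta):
    """Return c^T beta for coefficient list c (hypothetical helper)."""
    return sum(ci * bi for ci, bi in zip(c, beta))


# Coefficient vectors of the three estimable functions.
functions = {
    "alpha1 - alpha2": [0, 1, -1] + [0] * (q - 2),
    "mu + (1/q) sum alpha_i": [1] + [Fraction(1, q)] * q,
    "mu + (1/n) sum n_i alpha_i": [1] + [Fraction(k, n) for k in n_i],
}
results = {
    name: (estimate(c, beta_hat), estimate(c, beta_breve))
    for name, c in functions.items()
}

# For contrast: mu is nonestimable without constraints, so the two solutions
# need not agree on it.
mu_hat = estimate([1] + [0] * q, beta_hat)
mu_breve = estimate([1] + [0] * q, beta_breve)
```

Each pair in `results` should contain two equal values, whereas `mu_hat` and `mu_breve` differ for these data, which is consistent with $\mu$ being nonestimable under the unconstrained model.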
Exercise 3 Consider the two-way main effects model, with exactly one observation per cell.
(a) Write out the normal equations for this model without using matrix and vector notation.
(b) Show that one solution to the normal equations is given by $\hat{\mu} = \bar{y}_{\cdot\cdot}$, $\hat{\alpha}_i = \bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}$ $(i = 1, \ldots, q)$, $\hat{\gamma}_j = \bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot}$ $(j = 1, \ldots, m)$.
(c) Solve the constrained normal equations obtained by augmenting the model with an appropriate number of jointly nonestimable pseudo-constraints. List your pseudo-constraints explicitly and explain why they are jointly nonestimable.
(d) The following five functions might be of interest: $\mu$, $\alpha_1$, $\gamma_1 - \gamma_2$, $\alpha_1 - \gamma_2$, and $\mu + \frac{1}{q}\sum_{i=1}^{q}\alpha_i + \frac{1}{m}\sum_{j=1}^{m}\gamma_j$.
(i) For a model without constraints, which of these five functions are estimable and which are not? Give the BLUEs for those that are estimable.
(ii) For each function that is nonestimable under a model without constraints, show that it is estimable under the model with the pseudo-constraints you gave in part (c) and give its BLUE under this constrained model.
(iii) For each function that is estimable under a model without constraints, show that its least squares estimator in terms of the elements of the solution to the normal equations given in part (b) is identical to its constrained least squares estimator under the model with the constraints you gave in part (c).

Solution
(a) The model matrix is $X = (\mathbf{1}_{qm}, I_q \otimes \mathbf{1}_m, \mathbf{1}_q \otimes I_m)$, so the normal equations are
\[
(\mathbf{1}_{qm}, I_q \otimes \mathbf{1}_m, \mathbf{1}_q \otimes I_m)^T(\mathbf{1}_{qm}, I_q \otimes \mathbf{1}_m, \mathbf{1}_q \otimes I_m)\beta = (\mathbf{1}_{qm}, I_q \otimes \mathbf{1}_m, \mathbf{1}_q \otimes I_m)^T y,
\]
i.e.,
\[
qm\mu + m\sum_{i=1}^{q}\alpha_i + q\sum_{j=1}^{m}\gamma_j = y_{\cdot\cdot},
\]
\[
m\mu + m\alpha_i + \sum_{j=1}^{m}\gamma_j = y_{i\cdot} \quad (i = 1, \ldots, q),
\]
\[
q\mu + \sum_{i=1}^{q}\alpha_i + q\gamma_j = y_{\cdot j} \quad (j = 1, \ldots, m).
\]
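(b) Substituting the purported solution for the parameters on the left-hand side of the normal equations results in
\[
qm\bar{y}_{\cdot\cdot} + m\sum_{i=1}^{q}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}) + q\sum_{j=1}^{m}(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot}) = y_{\cdot\cdot},
\]
\[
m\bar{y}_{\cdot\cdot} + m(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}) + \sum_{j=1}^{m}(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot}) = y_{i\cdot} \quad (i = 1, \ldots, q),
\]
\[
q\bar{y}_{\cdot\cdot} + \sum_{i=1}^{q}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}) + q(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot}) = y_{\cdot j} \quad (j = 1, \ldots, m),
\]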
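demonstrating that the purported solution does indeed solve those equations.
(c) Recall from Example 6.1-3 that $\operatorname{rank}(X) = q + m - 1$ while the number of parameters is $q + m + 1$. Therefore, two jointly nonestimable pseudo-constraints are needed. Constraints $\mu = 0$ and $\alpha_1 = 0$ are suitable; they are nonestimable because the coefficients on the $\alpha_i$'s do not sum to the coefficient on $\mu$ (as illustrated in Example 6.1-3). The corresponding unique solution is as follows:
\[
\breve{\mu} = 0, \quad \breve{\alpha}_1 = 0, \quad \breve{\alpha}_i = \bar{y}_{i\cdot} - \bar{y}_{1\cdot} \ (i = 2, \ldots, q), \quad \breve{\gamma}_j = \bar{y}_{\cdot j} + \bar{y}_{1\cdot} - \bar{y}_{\cdot\cdot} \ (j = 1, \ldots, m).
\]
(d) (i) $\gamma_1 - \gamma_2$ and $\mu + (1/q)\sum_{i=1}^{q}\alpha_i + (1/m)\sum_{j=1}^{m}\gamma_j$ are estimable; $\mu$, $\alpha_1$, and $\alpha_1 - \gamma_2$ are nonestimable. Using the solution from part (b), BLUEs of the estimable functions are $[(\bar{y}_{\cdot 1} - \bar{y}_{\cdot\cdot}) - (\bar{y}_{\cdot 2} - \bar{y}_{\cdot\cdot})] = \bar{y}_{\cdot 1} - \bar{y}_{\cdot 2}$ and $\bar{y}_{\cdot\cdot} + (1/q)\sum_{i=1}^{q}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}) + (1/m)\sum_{j=1}^{m}(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot}) = \bar{y}_{\cdot\cdot}$, respectively.
(ii) $A = \begin{pmatrix} 1 & \mathbf{0}_q^T & \mathbf{0}_m^T \\ 0 & u_1^{(q)T} & \mathbf{0}_m^T \end{pmatrix}$, so $\mu = (1, \mathbf{0}_q^T, \mathbf{0}_m^T)\beta$ and $\alpha_1 = (0, u_1^{(q)T}, \mathbf{0}_m^T)\beta$ plainly are estimable under the constrained model. So too is $\alpha_1 - \gamma_2 = \mu + 2\alpha_1 - (\mu + \alpha_1 + \gamma_2)$. Using the solution from part (c), their BLUEs are $0$, $0$, and $-(\bar{y}_{\cdot 2} + \bar{y}_{1\cdot} - \bar{y}_{\cdot\cdot})$, respectively.
(iii) Using the solution to the constrained normal equations from part (c), the BLUE of $\gamma_1 - \gamma_2$ is $[(\bar{y}_{\cdot 1} + \bar{y}_{1\cdot} - \bar{y}_{\cdot\cdot}) - (\bar{y}_{\cdot 2} + \bar{y}_{1\cdot} - \bar{y}_{\cdot\cdot})] = \bar{y}_{\cdot 1} - \bar{y}_{\cdot 2}$; and the BLUE of $\mu + (1/q)\sum_{i=1}^{q}\alpha_i + (1/m)\sum_{j=1}^{m}\gamma_j$ is $0 + (1/q)\sum_{i=2}^{q}(\bar{y}_{i\cdot} - \bar{y}_{1\cdot}) + (1/m)\sum_{j=1}^{m}(\bar{y}_{\cdot j} + \bar{y}_{1\cdot} - \bar{y}_{\cdot\cdot}) = \bar{y}_{\cdot\cdot}$.

Exercise 4 Consider the two-factor nested model.
(a) Write out the normal equations for this model without using matrix and vector notation.
(b) Show that one solution to the normal equations is given by $\hat{\mu} = \bar{y}_{\cdot\cdot\cdot}$, $\hat{\alpha}_i = \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot}$ $(i = 1, \ldots, q)$, $\hat{\gamma}_{ij} = \bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot}$ $(i = 1, \ldots, q;\ j = 1, \ldots, m_i)$.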
(c) Solve the constrained normal equations obtained by augmenting the model with an appropriate number of jointly nonestimable pseudo-constraints. List your pseudo-constraints explicitly and explain why they are jointly nonestimable.
(d) The following six functions might be of interest: $\mu$, $\alpha_1$, $\mu + \alpha_1$, $\gamma_{11} - \gamma_{12}$, $\mu + \alpha_1 + (1/m_1)\sum_{j=1}^{m_1}\gamma_{1j}$, and $\mu + (1/m_\cdot)\sum_{i=1}^{q} m_i\alpha_i + (1/m_\cdot)\sum_{i=1}^{q}\sum_{j=1}^{m_i}\gamma_{ij}$.
(i) For a model without constraints, which of these six functions are estimable and which are not? Give the BLUEs for those that are estimable.
(ii) For each function that is nonestimable under a model without constraints, show that it is estimable under the model with the pseudo-constraints you gave in part (c) and give its BLUE under this constrained model.
(iii) For each function that is estimable under a model without constraints, show that its least squares estimator in terms of the elements of the solution to the normal equations given in part (b) is identical to its constrained least squares estimator under the model with the constraints you gave in part (c).

Solution
(a) The normal equations are
\[
n_{\cdot\cdot}\mu + \sum_{i=1}^{q} n_{i\cdot}\alpha_i + \sum_{i=1}^{q}\sum_{j=1}^{m_i} n_{ij}\gamma_{ij} = y_{\cdot\cdot\cdot},
\]
\[
n_{i\cdot}\mu + n_{i\cdot}\alpha_i + \sum_{j=1}^{m_i} n_{ij}\gamma_{ij} = y_{i\cdot\cdot} \quad (i = 1, \ldots, q),
\]
\[
n_{ij}\mu + n_{ij}\alpha_i + n_{ij}\gamma_{ij} = y_{ij\cdot} \quad (i = 1, \ldots, q;\ j = 1, \ldots, m_i).
\]
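(b) Substituting the purported solution for the parameters in the left-hand side of the normal equations results in
\[
n_{\cdot\cdot}\bar{y}_{\cdot\cdot\cdot} + \sum_{i=1}^{q} n_{i\cdot}(\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot}) + \sum_{i=1}^{q}\sum_{j=1}^{m_i} n_{ij}(\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot}) = y_{\cdot\cdot\cdot},
\]
\[
n_{i\cdot}\bar{y}_{\cdot\cdot\cdot} + n_{i\cdot}(\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot}) + \sum_{j=1}^{m_i} n_{ij}(\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot}) = y_{i\cdot\cdot} \quad (i = 1, \ldots, q),
\]
\[
n_{ij}\bar{y}_{\cdot\cdot\cdot} + n_{ij}(\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot}) + n_{ij}(\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot}) = y_{ij\cdot} \quad (i = 1, \ldots, q;\ j = 1, \ldots, m_i),
\]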
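demonstrating that the purported solution does indeed solve those equations.
(c) Recall from Example 6.1-6 that $\operatorname{rank}(X) = \sum_{i=1}^{q} m_i$, while the number of parameters is $q + 1 + \sum_{i=1}^{q} m_i$. So $q + 1$ jointly nonestimable pseudo-constraints are needed. Constraints $\mu = 0$ and $\alpha_i = 0$ $(i = 1, \ldots, q)$ are suitable, yielding the unique solution $\breve{\mu} = 0$, $\breve{\alpha}_i = 0$, $\breve{\gamma}_{ij} = \bar{y}_{ij\cdot}$ for all $i$ and $j$.
(d) (i) $\gamma_{11} - \gamma_{12}$, $\mu + \alpha_1 + (1/m_1)\sum_{j=1}^{m_1}\gamma_{1j}$, and $\mu + (1/m_\cdot)\sum_{i=1}^{q} m_i\alpha_i + (1/m_\cdot)\sum_{i=1}^{q}\sum_{j=1}^{m_i}\gamma_{ij}$ are estimable; $\mu$, $\alpha_1$, and $\mu + \alpha_1$ are not estimable. Using the solution from part (b), BLUEs of the estimable functions are
\[
(\bar{y}_{11\cdot} - \bar{y}_{1\cdot\cdot}) - (\bar{y}_{12\cdot} - \bar{y}_{1\cdot\cdot}) = \bar{y}_{11\cdot} - \bar{y}_{12\cdot},
\]
\[
\bar{y}_{\cdot\cdot\cdot} + (\bar{y}_{1\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot}) + (1/m_1)\sum_{j=1}^{m_1}(\bar{y}_{1j\cdot} - \bar{y}_{1\cdot\cdot}) = (1/m_1)\sum_{j=1}^{m_1}\bar{y}_{1j\cdot},
\]
and
\[
\bar{y}_{\cdot\cdot\cdot} + (1/m_\cdot)\sum_{i=1}^{q} m_i(\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot}) + (1/m_\cdot)\sum_{i=1}^{q}\sum_{j=1}^{m_i}(\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot}) = (1/m_\cdot)\sum_{i=1}^{q}\sum_{j=1}^{m_i}\bar{y}_{ij\cdot},
\]
respectively.
(ii) $A = \begin{pmatrix} 1 & \mathbf{0}_q^T & \mathbf{0}_{m_\cdot}^T \\ \mathbf{0}_q & I_q & 0_{q\times m_\cdot} \end{pmatrix}$, so $\mu = (1, \mathbf{0}_q^T, \mathbf{0}_{m_\cdot}^T)\beta$ and $\alpha_1 = (0, u_1^{(q)T}, \mathbf{0}_{m_\cdot}^T)\beta$ plainly are estimable under the constrained model. So too is $\mu + \alpha_1 = (1, u_1^{(q)T}, \mathbf{0}_{m_\cdot}^T)\beta$. Using the solution from part (c), their BLUEs are all 0.
(iii) Using the solution to the constrained normal equations from part (c), the BLUE of $\gamma_{11} - \gamma_{12}$ is $\bar{y}_{11\cdot} - \bar{y}_{12\cdot}$; the BLUE of $\mu + \alpha_1 + (1/m_1)\sum_{j=1}^{m_1}\gamma_{1j}$ is $0 + 0 + (1/m_1)\sum_{j=1}^{m_1}\bar{y}_{1j\cdot} = (1/m_1)\sum_{j=1}^{m_1}\bar{y}_{1j\cdot}$; and the BLUE of $\mu + (1/m_\cdot)\sum_{i=1}^{q} m_i\alpha_i + (1/m_\cdot)\sum_{i=1}^{q}\sum_{j=1}^{m_i}\gamma_{ij}$ is $0 + (1/m_\cdot)\sum_{i=1}^{q} m_i \cdot 0 + (1/m_\cdot)\sum_{i=1}^{q}\sum_{j=1}^{m_i}\bar{y}_{ij\cdot} = (1/m_\cdot)\sum_{i=1}^{q}\sum_{j=1}^{m_i}\bar{y}_{ij\cdot}$.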
Exercise 5 Consider the one-factor model
\[
y_{ij} = \mu + \alpha_i + e_{ij} \quad (i = 1, 2;\ j = 1, \ldots, r)
\]
with the constraint $\alpha_1 + \alpha_2 = 24$.
(a) Write out the constrained normal equations without using matrix and vector notation.
(b) Obtain the unique solution $\breve{\mu}$, $\breve{\alpha}_1$, and $\breve{\alpha}_2$ to the constrained normal equations.
(c) Is $\breve{\alpha}_1 - \breve{\alpha}_2$ identical to the least squares estimator of $\alpha_1 - \alpha_2$ under the corresponding unconstrained model? If you answer yes, verify your answer. If you answer no, give a reason why not.

Solution
(a)
\[
\begin{aligned}
2r\mu + r\alpha_1 + r\alpha_2 &= y_{\cdot\cdot} \\
r\mu + r\alpha_1 &= y_{1\cdot} \\
r\mu + r\alpha_2 &= y_{2\cdot} \\
\alpha_1 + \alpha_2 &= 24
\end{aligned}
\]
(b) The second and third equations imply that $\alpha_1 - \alpha_2 = (y_{1\cdot} - y_{2\cdot})/r = \bar{y}_{1\cdot} - \bar{y}_{2\cdot}$. This derived equation and the constraint equation yield $2\alpha_1 = \bar{y}_{1\cdot} - \bar{y}_{2\cdot} + 24$, which finally yields the solution
\[
\breve{\alpha}_1 = (1/2)(\bar{y}_{1\cdot} - \bar{y}_{2\cdot}) + 12, \qquad \breve{\alpha}_2 = (1/2)(\bar{y}_{2\cdot} - \bar{y}_{1\cdot}) + 12,
\]
\[
\breve{\mu} = \bar{y}_{1\cdot} - \breve{\alpha}_1 = (1/2)(\bar{y}_{1\cdot} + \bar{y}_{2\cdot}) - 12 = \bar{y}_{\cdot\cdot} - 12.
\]
(c) Yes, because the constraint $\alpha_1 + \alpha_2 = 24$ is nonestimable under the unconstrained model. The constrained least squares estimator of $\alpha_1 - \alpha_2$ is
\[
\breve{\alpha}_1 - \breve{\alpha}_2 = (1/2)(\bar{y}_{1\cdot} - \bar{y}_{2\cdot}) + 12 - [(1/2)(\bar{y}_{2\cdot} - \bar{y}_{1\cdot}) + 12] = \bar{y}_{1\cdot} - \bar{y}_{2\cdot},
\]
which agrees with the least squares estimator of this function associated with the unconstrained model.

Exercise 6 Prove Theorem 10.1.6: Under the constrained Gauss–Markov model $\{y, X\beta, \sigma^2 I : A\beta = h\}$, the variance of the constrained least squares estimator of an estimable function $c^T\beta$ associated with the model $\{y, X\beta : A\beta = h\}$ is uniformly (in $\beta$ and $\sigma^2$) less than that of any other linear unbiased estimator of $c^T\beta$.

Solution Because $c^T\beta$ is estimable under the constrained model, by Theorem 6.2.2 a $p$-vector $a_1$ and a $q$-vector $a_2$ exist such that $c^T = a_1^T X^T X + a_2^T A$. Let $c^T\breve{\beta}$ and $t_0 + t^T y$ be the constrained least squares estimator and any linear unbiased estimator, respectively, of $c^T\beta$ under the constrained model. Then (by Theorem 6.2.1) a $q$-vector $g$ exists such that $t_0 = g^T h$ and $t^T X + g^T A = c^T$. Consequently, by Theorem 4.2.2b,
\[
\begin{aligned}
\operatorname{var}(t_0 + t^T y) &= \operatorname{var}(t^T y) \\
&= \operatorname{var}[c^T\breve{\beta} + (t^T y - c^T\breve{\beta})] \\
&= \operatorname{var}(c^T\breve{\beta}) + \operatorname{var}(t^T y - c^T\breve{\beta}) + 2\operatorname{cov}(c^T\breve{\beta}, t^T y - c^T\breve{\beta}),
\end{aligned}
\]
where, by Corollary 4.2.3.1 and various parts of Theorem 3.3.8 (with $X^T X$ and $A^T$ here playing the roles of $A$ and $B$ in the theorem),
\[
\begin{aligned}
\operatorname{cov}(c^T\breve{\beta}, t^T y - c^T\breve{\beta})
&= \operatorname{cov}\{c^T(G_{11}X^T y + G_{12}h),\ t^T y - c^T(G_{11}X^T y + G_{12}h)\} \\
&= c^T G_{11}X^T(\sigma^2 I)(t - XG_{11}^T c) \\
&= \sigma^2 c^T G_{11}X^T(t - XG_{11}^T c) \\
&= \sigma^2(a_1^T X^T X + a_2^T A)G_{11}[X^T t - X^T XG_{11}^T(X^T Xa_1 + A^T a_2)] \\
&= \sigma^2(a_1^T X^T X + a_2^T A)G_{11}[(c - A^T g) - X^T XG_{11}^T X^T Xa_1 - X^T XG_{11}^T A^T a_2] \\
&= \sigma^2(a_1^T X^T X + a_2^T A)G_{11}[c - A^T g - (X^T Xa_1 + A^T G_{22}Aa_1)] \\
&= \sigma^2(a_1^T X^T X + a_2^T A)G_{11}[c - A^T g - (c - A^T a_2 + A^T G_{22}Aa_1)] \\
&= 0.
\end{aligned}
\]
Thus,
\[
\operatorname{var}(t_0 + t^T y) = \operatorname{var}(c^T\breve{\beta}) + \operatorname{var}(t^T y - c^T\breve{\beta}) \ge \operatorname{var}(c^T\breve{\beta}),
\]
and equality holds if and only if $\operatorname{var}(t^T y - c^T\breve{\beta}) = 0$, i.e., if and only if $\sigma^2(t^T - c^T G_{11}X^T)(t^T - c^T G_{11}X^T)^T = 0$, i.e., if and only if $t^T = c^T G_{11}X^T$, i.e., if and only if $t^T y = c^T G_{11}X^T y$ for all $y$, i.e., if and only if $t_0 + t^T y = g^T h + c^T G_{11}X^T y$ for all $y$, i.e., if and only if $t_0 + t^T y = c^T G_{12}h + c^T G_{11}X^T y = c^T\breve{\beta}$ for all $y$ because, by various parts of Theorem 3.3.8,
\[
\begin{aligned}
g^T A &= c^T - t^T X \\
&= c^T - c^T G_{11}X^T X \\
&= c^T - (a_1^T X^T X + a_2^T A)G_{11}X^T X \\
&= c^T - a_1^T X^T XG_{11}X^T X \\
&= c^T - a_1^T(X^T X + A^T G_{22}A) \\
&= c^T - a_1^T X^T X(I - G_{12}A) \\
&= c^T - (c^T - a_2^T A)(I - G_{12}A) \\
&= c^T - (c^T - a_2^T A - c^T G_{12}A + a_2^T AG_{12}A) \\
&= c^T G_{12}A,
\end{aligned}
\]
implying that $g^T h = c^T G_{12}h$.

Exercise 7 Prove Theorem 10.1.10: Define $\breve{y} = X\breve{\beta}$ and $\breve{e} = y - X\breve{\beta}$, where $X\breve{\beta}$ is given by (10.6). Then, under the constrained Gauss–Markov model $\{y, X\beta, \sigma^2 I : A\beta = h\}$:
(a) $E(\breve{y}) = X\beta$;
(b) $E(\breve{e}) = 0$;
(c) $\operatorname{var}(\breve{y}) = \sigma^2 P_{X(I - A^- A)}$;
(d) $\operatorname{var}(\breve{e}) = \sigma^2(I - P_{X(I - A^- A)})$; and
(e) $\operatorname{cov}(\breve{y}, \breve{e}) = 0$.
Solution Because $X\beta$ is estimable under the constrained model, $X\breve{\beta}$ is its constrained least squares estimator. By Theorem 10.1.4, $X\breve{\beta}$ is unbiased for $X\beta$. Thus $E(\breve{y}) = E(X\breve{\beta}) = X\beta$, which proves part (a). Then $E(\breve{e}) = E(y - X\breve{\beta}) = X\beta - X\beta = 0$. This establishes part (b). Finally, by the symmetry and idempotency of $P_{X(I - A^- A)}$ (Theorem 10.1.8),
\[
\operatorname{var}(\breve{y}) = \operatorname{var}(P_{X(I - A^- A)}y) = P_{X(I - A^- A)}(\sigma^2 I)P_{X(I - A^- A)}^T = \sigma^2 P_{X(I - A^- A)},
\]
\[
\operatorname{var}(\breve{e}) = \operatorname{var}[(I - P_{X(I - A^- A)})y] = (I - P_{X(I - A^- A)})(\sigma^2 I)(I - P_{X(I - A^- A)})^T = \sigma^2(I - P_{X(I - A^- A)}),
\]
and
\[
\operatorname{cov}(\breve{y}, \breve{e}) = \operatorname{cov}[P_{X(I - A^- A)}y, (I - P_{X(I - A^- A)})y] = P_{X(I - A^- A)}(\sigma^2 I)(I - P_{X(I - A^- A)}) = \sigma^2(P_{X(I - A^- A)} - P_{X(I - A^- A)}) = 0.
\]
This establishes the remaining parts of the theorem.

Exercise 8 Under the constrained Gauss–Markov linear model
\[
y = X\beta + e, \quad A\beta = h,
\]
consider the estimation of an estimable function $c^T\beta$.
(a) Show that the least squares estimator of $c^T\beta$ associated with the corresponding unconstrained model is unbiased under the constrained model.
(b) Theorem 10.1.6, in tandem with the unbiasedness of the least squares estimator shown in part (a), reveals that the variance of the least squares estimator of $c^T\beta$ under the constrained Gauss–Markov model is at least as large as the variance of the constrained least squares estimator of $c^T\beta$ under that model. For the case in which the constraints are estimable under the corresponding unconstrained model, obtain an expression for the amount by which the former exceeds the latter and verify that this exceedance is nonnegative.
(c) Suppose that $n > \operatorname{rank}(X)$, and let $\hat{\sigma}^2$ denote the residual mean square associated with the corresponding unconstrained model, i.e., $\hat{\sigma}^2 = y^T(I - P_X)y/[n - \operatorname{rank}(X)]$. Is $\hat{\sigma}^2$ unbiased under the constrained model?

Solution
(a) As a special case of Theorem 6.2.1, the least squares estimator $c^T(X^T X)^- X^T y$ of $c^T\beta$ is unbiased if and only if a vector $g$ exists such that $g^T h = 0$ and $c^T(X^T X)^- X^T X + g^T A = c^T$, i.e., because $c^T \in R(X)$, if and only if a vector $g$ exists such that $g^T h = 0$ and $c^T + g^T A = c^T$. Because $g = 0$ is such a vector, the least squares estimator is unbiased.
(b) By Theorems 7.2.2 and 10.1.5, $\operatorname{var}(c^T\hat{\beta}) = \sigma^2 c^T(X^T X)^- c$ and $\operatorname{var}(c^T\breve{\beta}) = \sigma^2 c^T G_{11}c$, where $G_{11}$ is the upper left $p \times p$ block of any generalized inverse $\begin{pmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{pmatrix}$ of $\begin{pmatrix} X^T X & A^T \\ A & 0 \end{pmatrix}$. If the constraints are estimable under the unconstrained model, then as shown in Sect. 10.3, one generalized inverse of $\begin{pmatrix} X^T X & A^T \\ A & 0 \end{pmatrix}$ has as its upper left $p \times p$ block the matrix $(X^T X)^- - (X^T X)^- A^T[A(X^T X)^- A^T]^- A(X^T X)^-$. So, in this case
\[
\begin{aligned}
\operatorname{var}(c^T\breve{\beta}) &= \sigma^2 c^T(X^T X)^- c - \sigma^2 c^T(X^T X)^- A^T[A(X^T X)^- A^T]^- A(X^T X)^- c \\
&= \operatorname{var}(c^T\hat{\beta}) - \sigma^2 c^T(X^T X)^- A^T[A(X^T X)^- A^T]^- A(X^T X)^- c.
\end{aligned}
\]
Let $a$ represent an $n$-vector such that $a^T X = c^T$, and let $M$ represent a matrix such that $MX = A$. Then, $\operatorname{var}(c^T\hat{\beta})$ exceeds $\operatorname{var}(c^T\breve{\beta})$ by the amount
\[
\begin{aligned}
\sigma^2 c^T(X^T X)^- A^T[A(X^T X)^- A^T]^- A(X^T X)^- c
&= \sigma^2 a^T X(X^T X)^- X^T M^T[MX(X^T X)^- X^T M^T]^- MX(X^T X)^- X^T a \\
&= \sigma^2 a^T(MP_X)^T[(MP_X)(MP_X)^T]^-(MP_X)a \\
&= \sigma^2 a^T P_{(MP_X)^T}a,
\end{aligned}
\]
which is nonnegative because $P_{(MP_X)^T}$ is nonnegative definite by Corollary 2.15.9.1.
(c) Yes, $\hat{\sigma}^2$ is unbiased under the constrained model. This may be shown in exactly the same manner that it was shown in the proof of Theorem 8.2.1 for an unconstrained model.

Exercise 9 Prove Theorem 10.4.2b: Consider a constrained linear model $\{y, X\beta : A\beta = h\}$ for which the constraints are jointly nonestimable under the corresponding unconstrained model, and let $(\breve{\beta}^T, \breve{\lambda}^T)^T$ be any solution to the constrained normal equations for the constrained model. Then $XG_{11}X^T = P_X$, $XG_{12}h = 0$, and $h^T G_{22}h = 0$. (Hint: Use Theorems 3.3.11 and 3.3.8b.)

Solution By Theorem 3.3.11, with $\begin{pmatrix} X^T X & A^T \\ A & 0 \end{pmatrix}$ playing the role of $\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$ in that theorem, $G_{11}$ is a generalized inverse of $X^T X$. It follows immediately that $XG_{11}X^T = P_X$. Next, by Theorem 3.3.8b, $X^T XG_{12}A = A^T G_{21}X^T X$. The rows of the matrix on the left-hand side of this matrix equation are elements of $R(A)$, and the rows of the
matrix on the right-hand side are elements of $R(X)$. Because $R(X) \cap R(A) = \{0\}$, all elements of these two matrices are zero. In particular $X^T XG_{12}A = 0$, implying (by Theorem 3.3.2) that $XG_{12}A = 0$, implying finally (upon post-multiplication of both sides of this last matrix equation by $\beta$) that $XG_{12}h = 0$. Finally, because (as just shown) $X^T XG_{12}A = 0$, by Theorem 3.3.8b $A^T G_{22}A = 0$. Pre- and post-multiplication of this last matrix equation by $\beta^T$ and $\beta$, respectively, yields $h^T G_{22}h = 0$.

Exercise 10 For the case of estimable constraints:
(a) Obtain expression (10.14) for the constrained least squares estimator of an estimable function $c^T\beta$.
(b) Show that $P_{X(I - A^- A)} = P_X - P_{X(X^T X)^- A^T}$.
(c) Show that $(P_X - P_{X(X^T X)^- A^T})P_{X(X^T X)^- A^T} = 0$, $(P_X - P_{X(X^T X)^- A^T})(I - P_X) = 0$, and $P_{X(X^T X)^- A^T}(I - P_X) = 0$.
(d) Obtain expressions (10.16) and (10.17) for the Constrained Model and Constrained Residual second-order polynomial functions, respectively.

Solution
(a)
\[
\begin{aligned}
c^T\breve{\beta} &= c^T(G_{11}X^T y + G_{12}h) \\
&= c^T\{(X^T X)^- - (X^T X)^- A^T[A(X^T X)^- A^T]^- A(X^T X)^-\}X^T y + c^T(X^T X)^- A^T[A(X^T X)^- A^T]^- h \\
&= c^T(X^T X)^- X^T y - c^T(X^T X)^- A^T[A(X^T X)^- A^T]^- A(X^T X)^- X^T y + c^T(X^T X)^- A^T[A(X^T X)^- A^T]^- h \\
&= c^T\hat{\beta} - c^T(X^T X)^- A^T[A(X^T X)^- A^T]^-(A\hat{\beta} - h).
\end{aligned}
\]
(b) Let $M$ represent a matrix such that $MX = A$ (such a matrix exists because the constraints are estimable). Then
\[
\begin{aligned}
P_{X(I - A^- A)} &= XG_{11}X^T \\
&= X\{(X^T X)^- - (X^T X)^- A^T[A(X^T X)^- A^T]^- A(X^T X)^-\}X^T \\
&= P_X - X(X^T X)^- A^T[MX(X^T X)^- A^T]^- A(X^T X)^- X^T \\
&= P_X - X(X^T X)^- A^T[MX((X^T X)^-)^T X^T X(X^T X)^- A^T]^- A(X^T X)^- X^T \\
&= P_X - X(X^T X)^- A^T[(X(X^T X)^- A^T)^T X(X^T X)^- A^T]^-[X(X^T X)^- A^T]^T \\
&= P_X - P_{X(X^T X)^- A^T},
\end{aligned}
\]
where we used Theorem 3.3.3b to obtain the fourth equality.
(c) Observe that $P_X - P_{X(X^T X)^- A^T}$, $P_{X(X^T X)^- A^T}$, and $I - P_X$ are symmetric. Also, because $C(X(X^T X)^- A^T) \subseteq C(X)$, we find that $P_{X(X^T X)^- A^T}P_X = P_{X(X^T X)^- A^T}$ and $P_{X(X^T X)^- A^T}(I - P_X) = 0$. Furthermore, using these results we obtain
\[
(P_X - P_{X(X^T X)^- A^T})P_{X(X^T X)^- A^T} = P_{X(X^T X)^- A^T} - P_{X(X^T X)^- A^T} = 0
\]
and
\[
(P_X - P_{X(X^T X)^- A^T})(I - P_X) = P_X(I - P_X) - P_{X(X^T X)^- A^T}(I - P_X) = 0.
\]
(d) As noted in the derivation of (10.7), $h^T G_{21}X^T y = h^T G_{12}^T X^T y = y^T XG_{12}h$. Thus, the Constrained Model second-order polynomial may be written as
\[
\begin{aligned}
y^T XG_{11}X^T y &+ h^T G_{21}X^T y + y^T XG_{12}h + h^T G_{22}h \\
&= y^T X(X^T X)^- X^T y - y^T X(X^T X)^- A^T[A(X^T X)^- A^T]^- A(X^T X)^- X^T y \\
&\quad + h^T[A(X^T X)^- A^T]^- A(X^T X)^- X^T y + y^T X(X^T X)^- A^T[A(X^T X)^- A^T]^- h - h^T[A(X^T X)^- A^T]^- h \\
&= y^T P_X y - \hat{\beta}^T A^T[A(X^T X)^- A^T]^- A\hat{\beta} + h^T[A(X^T X)^- A^T]^- A\hat{\beta} \\
&\quad + \hat{\beta}^T A^T[A(X^T X)^- A^T]^- h - h^T[A(X^T X)^- A^T]^- h \\
&= y^T P_X y - (A\hat{\beta} - h)^T[A(X^T X)^- A^T]^-(A\hat{\beta} - h).
\end{aligned}
\]
Then,
\[
\begin{aligned}
Q(\breve{\beta}) &= y^T y - \{y^T P_X y - (A\hat{\beta} - h)^T[A(X^T X)^- A^T]^-(A\hat{\beta} - h)\} \\
&= y^T(I - P_X)y + (A\hat{\beta} - h)^T[A(X^T X)^- A^T]^-(A\hat{\beta} - h).
\end{aligned}
\]

Exercise 11 Consider a constrained version of the simple linear regression model
\[
y_i = \beta_1 + \beta_2 x_i + e_i \quad (i = 1, \ldots, n)
\]
where the constraint is that the line $\beta_1 + \beta_2 x$ passes through a known point $(a, b)$.
(a) Give the constrained normal equations for this model in matrix form, with numbers given, where possible, for the elements of the matrices and vectors involved.
(b) Give simplified expressions for the constrained least squares estimators of $\beta_1$ and $\beta_2$.
(c) Give an expression for the constrained residual mean square in terms of a solution to the constrained normal equations (and other quantities).

Solution
(a) The constrained normal equations are
\[
\begin{pmatrix}
n & \sum_{i=1}^{n} x_i & 1 \\
\sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 & a \\
1 & a & 0
\end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \lambda \end{pmatrix}
=
\begin{pmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_i y_i \\ b \end{pmatrix}.
\]
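(b) A solution to the constrained normal equations in part (a) may be obtained by inverting the coefficient matrix of those equations, but a less messy approach is to multiply the first equation by $a$ and subtract it from the second equation, yielding the equation
\[
\beta_1\sum_{i=1}^{n}(x_i - a) + \beta_2\sum_{i=1}^{n} x_i(x_i - a) = \sum_{i=1}^{n}(x_i - a)y_i. \tag{10.1}
\]
Then multiplying the third of the constrained normal equations by $\sum_{i=1}^{n}(x_i - a)$ and subtracting it from (10.1) yields the solution
\[
\breve{\beta}_2 = \frac{\sum_{i=1}^{n}(x_i - a)(y_i - b)}{\sum_{i=1}^{n}(x_i - a)^2}, \qquad
\breve{\beta}_1 = b - a\breve{\beta}_2, \qquad
\breve{\lambda} = \sum_{i=1}^{n} y_i - n\breve{\beta}_1 - \breve{\beta}_2\sum_{i=1}^{n} x_i.
\]
(c) $p_A^* = \operatorname{rank}\begin{pmatrix} X \\ A \end{pmatrix} - \operatorname{rank}(A) = 2 - 1 = 1$. Hence
\[
\breve{\sigma}^2 = \frac{y^T y - \breve{\beta}^T X^T y - \breve{\lambda}b}{n - p_A^*}
= \frac{\sum_{i=1}^{n} y_i^2 - \breve{\beta}_1\sum_{i=1}^{n} y_i - \breve{\beta}_2\sum_{i=1}^{n} x_i y_i - \breve{\lambda}b}{n - 1}.
\]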
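As a numerical sanity check on parts (b) and (c), the sketch below plugs the closed-form estimators into the three constrained normal equations of part (a) and compares the numerator of the constrained residual mean square with the directly computed residual sum of squares. The data `x`, `y` and the point `(a, b)` are hypothetical values chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical data and a hypothetical known point (a, b) on the line.
x = [1, 2, 4, 7]
y = [2, 3, 3, 8]
a, b = Fraction(3), Fraction(5)
n = len(x)

# Closed-form constrained least squares estimators from part (b).
beta2 = sum((xi - a) * (yi - b) for xi, yi in zip(x, y)) / sum((xi - a) ** 2 for xi in x)
beta1 = b - a * beta2
lam = sum(y) - n * beta1 - beta2 * sum(x)

sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_xx = sum(xi * xi for xi in x)

# Left-hand side minus right-hand side of each constrained normal equation.
residual_eq1 = n * beta1 + sum(x) * beta2 + lam - sum(y)
residual_eq2 = sum(x) * beta1 + sum_xx * beta2 + a * lam - sum_xy
residual_eq3 = beta1 + a * beta2 - b

# Constrained residual mean square from part (c), with n - p_A^* = n - 1.
sigma2 = (sum(yi * yi for yi in y) - beta1 * sum(y) - beta2 * sum_xy - lam * b) / (n - 1)

# Residual sum of squares computed directly from the fitted constrained line.
rss = sum((yi - beta1 - beta2 * xi) ** 2 for xi, yi in zip(x, y))
```

All three residuals should be exactly zero, and `sigma2 * (n - 1)` should equal `rss`, since exact rational arithmetic is used throughout.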
Exercise 12 Consider a constrained version of the mean-centered second-order polynomial regression model
\[
y_i = \beta_1 + \beta_2(x_i - \bar{x}) + \beta_3(x_i - \bar{x})^2 + e_i \quad (i = 1, \ldots, 5)
\]
where $x_i = i$, and the constraint is that the first derivative of the second-order polynomial is equal to 0 at $x = a$ where $a$ is known.
(a) Give the constrained normal equations for this model in matrix form, with numbers given, where possible, for the elements of the matrices and vectors involved.
(b) Give nonmatrix expressions for the constrained least squares estimators of $\beta_1$, $\beta_2$, and $\beta_3$.
(c) Give a nonmatrix expression for the constrained residual mean square in terms of a solution to the constrained normal equations (and other quantities).

Solution
(a) The constrained normal equations are
\[
\begin{pmatrix}
n & 0 & \sum_{i=1}^{n}(x_i - \bar{x})^2 & 0 \\
0 & \sum_{i=1}^{n}(x_i - \bar{x})^2 & \sum_{i=1}^{n}(x_i - \bar{x})^3 & 1 \\
\sum_{i=1}^{n}(x_i - \bar{x})^2 & \sum_{i=1}^{n}(x_i - \bar{x})^3 & \sum_{i=1}^{n}(x_i - \bar{x})^4 & 2(a - \bar{x}) \\
0 & 1 & 2(a - \bar{x}) & 0
\end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \lambda \end{pmatrix}
=
\begin{pmatrix}
\sum_{i=1}^{n} y_i \\
\sum_{i=1}^{n}(x_i - \bar{x})y_i \\
\sum_{i=1}^{n}(x_i - \bar{x})^2 y_i \\
0
\end{pmatrix}.
\]
Using the specified values of $x_i$ $(i = 1, \ldots, 5)$, these equations can be written as
\[
\begin{pmatrix}
5 & 0 & 10 & 0 \\
0 & 10 & 0 & 1 \\
10 & 0 & 34 & 2a - 6 \\
0 & 1 & 2a - 6 & 0
\end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \lambda \end{pmatrix}
=
\begin{pmatrix}
\sum_{i=1}^{5} y_i \\
2y_5 + y_4 - y_2 - 2y_1 \\
4y_5 + y_4 + y_2 + 4y_1 \\
0
\end{pmatrix}.
\]
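(b) Multiplying the second equation by $2a - 6$ and subtracting the result from the third equation yield
\[
10\beta_1 - 10(2a - 6)\beta_2 + 34\beta_3 = (4y_5 + y_4 + y_2 + 4y_1) - (2y_5 + y_4 - y_2 - 2y_1)(2a - 6).
\]
Subtracting twice the first equation from the equation just derived yields
\[
-10(2a - 6)\beta_2 + 14\beta_3 = (2y_5 - y_4 - 2y_3 - y_2 + 2y_1) - (2y_5 + y_4 - y_2 - 2y_1)(2a - 6),
\]
where the term $-2y_3$ arises because $y_3$ appears in $\sum_{i=1}^{5} y_i$ but has coefficient zero in $\sum_{i=1}^{5}(x_i - \bar{x})^2 y_i$. Adding $10(2a - 6)$ times the fourth equation to the equation just derived yields
\[
[14 + 10(2a - 6)^2]\beta_3 = (2y_5 - y_4 - 2y_3 - y_2 + 2y_1) - (2y_5 + y_4 - y_2 - 2y_1)(2a - 6).
\]
Therefore,
\[
\breve{\beta}_3 = \frac{(2y_5 - y_4 - 2y_3 - y_2 + 2y_1) - (2y_5 + y_4 - y_2 - 2y_1)(2a - 6)}{14 + 10(2a - 6)^2},
\]
\[
\breve{\beta}_2 = -(2a - 6)\breve{\beta}_3, \qquad \breve{\beta}_1 = \bar{y} - 2\breve{\beta}_3.
\]
(c) $p_A^* = \operatorname{rank}\begin{pmatrix} X \\ A \end{pmatrix} - \operatorname{rank}(A) = 3 - 1 = 2$, so $n - p_A^* = 5 - 2 = 3$ and
\[
\breve{\sigma}^2 = \frac{\sum_{i=1}^{5} y_i^2 - \breve{\beta}_1\sum_{i=1}^{5} y_i - \breve{\beta}_2(2y_5 + y_4 - y_2 - 2y_1) - \breve{\beta}_3(4y_5 + y_4 + y_2 + 4y_1)}{3}.
\]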
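The elimination in part (b) can be cross-checked by solving the $4 \times 4$ system from part (a) directly. The sketch below is an illustration under stated assumptions: the responses `y`, the value of `a`, and the helper `solve` are hypothetical. It compares the direct solution with the values obtained by carrying out the elimination steps of part (b), and compares the numerator of the constrained residual mean square with the residual sum of squares.

```python
from fractions import Fraction


def solve(M, rhs):
    """Solve M x = rhs by Gauss-Jordan elimination in exact rational arithmetic.

    Hypothetical helper for this illustration; assumes M is square and nonsingular.
    """
    size = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(size):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [p - factor * s for p, s in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


x = [1, 2, 3, 4, 5]          # x_i = i, so the mean of x is 3
y = [1, 3, 4, 4, 2]          # hypothetical responses
a = Fraction(7, 2)           # hypothetical location of the zero derivative
d = 2 * a - 6                # the quantity 2(a - xbar)

sum_y = sum(y)
lin = sum((xi - 3) * yi for xi, yi in zip(x, y))        # sum (x_i - xbar) y_i
sq = sum((xi - 3) ** 2 * yi for xi, yi in zip(x, y))    # sum (x_i - xbar)^2 y_i

# Direct solution of the constrained normal equations from part (a).
M = [[5, 0, 10, 0],
     [0, 10, 0, 1],
     [10, 0, 34, d],
     [0, 1, d, 0]]
beta1, beta2, beta3, lam = solve(M, [sum_y, lin, sq, 0])

# Elimination route of part (b): (third equation) - d * (second equation)
# - 2 * (first equation) + 10 * d * (fourth equation).
beta3_elim = ((sq - 2 * sum_y) - lin * d) / (14 + 10 * d ** 2)
beta2_elim = -d * beta3_elim
beta1_elim = Fraction(sum_y, 5) - 2 * beta3_elim

# Constrained residual mean square from part (c), and the direct residual sum of squares.
sigma2 = (sum(yi * yi for yi in y) - beta1 * sum_y - beta2 * lin - beta3 * sq) / 3
rss = sum((yi - beta1 - beta2 * (xi - 3) - beta3 * (xi - 3) ** 2) ** 2
          for xi, yi in zip(x, y))
```

Under these assumptions the direct and elimination-based estimates should coincide, the fitted quadratic should satisfy $\breve{\beta}_2 + (2a - 6)\breve{\beta}_3 = 0$, and `3 * sigma2` should equal `rss`.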
Exercise 13 Recall from Sect. 10.4 that one can augment an unconstrained model of rank $p^*$ with $p - p^*$ jointly nonestimable "pseudo-constraints" so as to obtain a unique solution to the constrained normal equations, from which the least squares estimators of all functions that are estimable under an unconstrained model may be obtained. Extend this idea to a constrained linear model $y = X\beta + e$, where $\beta$ satisfies $q_1$ real, linearly independent, consistent constraints $A_1\beta = h_1$, so as to obtain a unique solution to constrained normal equations corresponding to this constrained model augmented by pseudo-constraints $A_2\beta = h_2$, from which constrained least squares estimators of all functions that are estimable under the original constrained model may be obtained.

Solution Suppose that we impose $q_2$ linearly independent pseudo-constraints $A_2\beta = h_2$ upon the model with $q_1$ linearly independent real constraints $A_1\beta = h_1$ and that the pseudo-constraints are jointly nonestimable under the model with the true constraints, or equivalently
\[
R\begin{pmatrix} X \\ A_1 \end{pmatrix} \cap R(A_2) = \{0\}.
\]
By Corollary 10.1.7.1,
\[
\begin{aligned}
\operatorname{rank}\begin{pmatrix} X^T X & A_1^T & A_2^T \\ A_1 & 0 & 0 \\ A_2 & 0 & 0 \end{pmatrix}
&= \operatorname{rank}\begin{pmatrix} X \\ A_1 \\ A_2 \end{pmatrix} + \operatorname{rank}\begin{pmatrix} A_1 \\ A_2 \end{pmatrix} \\
&= \operatorname{rank}\begin{pmatrix} X \\ A_1 \end{pmatrix} + \operatorname{rank}(A_2) + \operatorname{rank}\begin{pmatrix} A_1 \\ A_2 \end{pmatrix} \\
&= \operatorname{rank}\begin{pmatrix} X \\ A_1 \end{pmatrix} + q_2 + (q_1 + q_2).
\end{aligned}
\]
Thus, if we choose the number of pseudo-constraints to be $q_2 = p - \operatorname{rank}\begin{pmatrix} X \\ A_1 \end{pmatrix}$, then
\[
\operatorname{rank}\begin{pmatrix} X^T X & A_1^T & A_2^T \\ A_1 & 0 & 0 \\ A_2 & 0 & 0 \end{pmatrix} = p + q_1 + q_2,
\]
i.e., $\begin{pmatrix} X^T X & A_1^T & A_2^T \\ A_1 & 0 & 0 \\ A_2 & 0 & 0 \end{pmatrix}$ has full rank and is therefore invertible. Thus, if we augment the model with real constraints by $p - \operatorname{rank}\begin{pmatrix} X \\ A_1 \end{pmatrix}$ pseudo-constraints that are jointly nonestimable under the model with the true constraints, then a unique solution to the constrained normal equations corresponding to the augmented model is given by
\[
\begin{pmatrix} X^T X & A_1^T & A_2^T \\ A_1 & 0 & 0 \\ A_2 & 0 & 0 \end{pmatrix}^{-1}
\begin{pmatrix} X^T y \\ h_1 \\ h_2 \end{pmatrix}.
\]
Constrained least squares estimators of all functions that are estimable under the original constrained model may be obtained using this solution.
11 Best Linear Unbiased Estimation for the Aitken Model

This chapter presents exercises on best linear unbiased estimation for the Aitken model and provides solutions to those exercises.

Exercise 1 Prove Theorem 11.1.8: Under the positive definite Aitken model $\{y, X\beta, \sigma^2 W\}$:
(a) $E(\tilde{y}) = X\beta$;
(b) $E(\tilde{e}) = 0$;
(c) $\operatorname{var}(\tilde{y}) = \sigma^2 X(X^T W^{-1}X)^- X^T$;
(d) $\operatorname{var}(\tilde{e}) = \sigma^2[W - X(X^T W^{-1}X)^- X^T]$;
(e) $\operatorname{cov}(\tilde{y}, \tilde{e}) = 0$.
The expressions in parts (c) and (d) are invariant to the choice of generalized inverse of $X^T W^{-1}X$.

Solution
(a) $E(\tilde{y}) = E(\tilde{P}_X y) = \tilde{P}_X X\beta = X\beta$ by Theorem 11.1.7b.
(b) $E(\tilde{e}) = E[(I - \tilde{P}_X)y] = (I - \tilde{P}_X)X\beta = X\beta - \tilde{P}_X X\beta = 0$, again by Theorem 11.1.7b.
(c)
\[
\begin{aligned}
\operatorname{var}(\tilde{y}) &= \operatorname{var}(\tilde{P}_X y) = \tilde{P}_X(\sigma^2 W)\tilde{P}_X^T \\
&= \sigma^2 X(X^T W^{-1}X)^- X^T W^{-1}WW^{-1}X[(X^T W^{-1}X)^-]^T X^T \\
&= \sigma^2 X[(X^T W^{-1}X)^-]^T X^T \\
&= \sigma^2 X(X^T W^{-1}X)^- X^T
\end{aligned}
\]
by Theorem 11.1.6a, b.

© Springer Nature Switzerland AG 2020 D. L. Zimmerman, Linear Model Theory, https://doi.org/10.1007/978-3-030-52074-8_11
(d)
\[
\begin{aligned}
\operatorname{var}(\tilde{e}) &= \operatorname{var}[(I - \tilde{P}_X)y] = (I - \tilde{P}_X)(\sigma^2 W)(I - \tilde{P}_X)^T \\
&= \sigma^2(W - \tilde{P}_X W - W\tilde{P}_X^T + \tilde{P}_X W\tilde{P}_X^T) \\
&= \sigma^2\{W - X(X^T W^{-1}X)^- X^T - X[(X^T W^{-1}X)^-]^T X^T + X(X^T W^{-1}X)^- X^T W^{-1}X[(X^T W^{-1}X)^-]^T X^T\} \\
&= \sigma^2[W - X(X^T W^{-1}X)^- X^T]
\end{aligned}
\]
by Theorem 11.1.6a, b.
(e)
\[
\begin{aligned}
\operatorname{cov}(\tilde{y}, \tilde{e}) &= \operatorname{cov}[\tilde{P}_X y, (I - \tilde{P}_X)y] = \tilde{P}_X(\sigma^2 W)(I - \tilde{P}_X)^T \\
&= \sigma^2 X(X^T W^{-1}X)^- X^T\{I - W^{-1}X[(X^T W^{-1}X)^-]^T X^T\} \\
&= \sigma^2 X(X^T W^{-1}X)^-\{X^T - X^T W^{-1}X[(X^T W^{-1}X)^-]^T X^T\} \\
&= 0,
\end{aligned}
\]
where for the last equality we used Theorem 11.1.6a.

Exercise 2 Prove Theorem 11.1.9b: Under the positive definite Aitken model $\{y, X\beta, \sigma^2 W\}$, $E[y^T(W^{-1} - \tilde{P}_X^T W^{-1}\tilde{P}_X)y] = (n - p^*)\sigma^2$.

Solution By Theorems 4.2.4, 11.1.7b, 3.3.5, 2.8.8, and 2.8.9,
\[
\begin{aligned}
E[y^T(W^{-1} - \tilde{P}_X^T W^{-1}\tilde{P}_X)y] &= (X\beta)^T(W^{-1} - \tilde{P}_X^T W^{-1}\tilde{P}_X)X\beta + \operatorname{tr}[(W^{-1} - \tilde{P}_X^T W^{-1}\tilde{P}_X)(\sigma^2 W)] \\
&= 0 + \sigma^2\{\operatorname{tr}(I_n) - \operatorname{tr}[X^T W^{-1}X(X^T W^{-1}X)^-]\} \\
&= \sigma^2[n - \operatorname{rank}(X^T W^{-1}X)] \\
&= \sigma^2[n - \operatorname{rank}(W^{-\frac{1}{2}}X)] \\
&= \sigma^2(n - p^*).
\end{aligned}
\]

Exercise 3 Prove Theorem 11.1.10: Let $C^T\beta$ be an estimable vector, and let $C^T\tilde{\beta}$ be its generalized least squares estimator associated with the positive definite Aitken model $\{y, X\beta, \sigma^2 W\}$. Suppose that $n > p^*$, and let $\tilde{\sigma}^2$ be the generalized residual mean square for the same model.
(a) If the skewness matrix of $y$ is null, then $\mathrm{cov}(C^T\tilde{\beta}, \tilde{\sigma}^2) = 0$ under the specified model;
(b) If the skewness and excess kurtosis matrices of $y$ are null, then $\mathrm{var}(\tilde{\sigma}^2) = 2\sigma^4/(n - p^*)$ under the specified model.

Solution By Corollary 4.2.5.1,
\[
\begin{aligned}
\mathrm{cov}(C^T\tilde{\beta}, \tilde{\sigma}^2) &= 2C^T (X^T W^{-1} X)^{-} X^T W^{-1} (\sigma^2 W)[W^{-1} - W^{-1} X(X^T W^{-1} X)^{-} X^T W^{-1}] X\beta/(n - p^*) \\
&= 2C^T (X^T W^{-1} X)^{-} X^T W^{-1} (\sigma^2 W)[W^{-1} X - W^{-1} X(X^T W^{-1} X)^{-} X^T W^{-1} X]\beta/(n - p^*) \\
&= 0,
\end{aligned}
\]
where the last equality follows from Theorem 11.1.6a. This establishes part (a). For part (b), write $B = W^{-1} - W^{-1} X(X^T W^{-1} X)^{-} X^T W^{-1}$. By Corollary 4.2.6.3,
\[
\begin{aligned}
\mathrm{var}(\tilde{\sigma}^2) &= \{2\,\mathrm{tr}[B(\sigma^2 W)B(\sigma^2 W)] + 4(X\beta)^T B(\sigma^2 W)B X\beta\}/(n - p^*)^2 \\
&= [2\sigma^4/(n - p^*)^2]\,\mathrm{tr}(BW) \\
&= [2\sigma^4/(n - p^*)^2]\,\mathrm{tr}[I - W^{-1} X(X^T W^{-1} X)^{-} X^T] \\
&= [2\sigma^4/(n - p^*)^2]\{\mathrm{tr}(I_n) - \mathrm{tr}[(X^T W^{-1} X)(X^T W^{-1} X)^{-}]\} \\
&= 2\sigma^4/(n - p^*),
\end{aligned}
\]
where we used Theorems 11.1.6d and 3.3.5 for the second and fifth equalities, respectively.

Exercise 4 Obtain $\tilde{\beta}$ (the generalized least squares estimator of $\beta$) and its variance under an Aitken model for which $X = 1_n$ and $W = (1 - \rho)I_n + \rho J_n$, where $\rho$ is a known scalar satisfying $-\frac{1}{n-1} < \rho < 1$. How does $\mathrm{var}(\tilde{\beta})$ compare to $\sigma^2/n$?
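Solution Using expression (2.1) for the inverse of a nonsingular compound symmetric matrix, we obtain
\[
\begin{aligned}
\tilde{\beta} &= (X^T W^{-1} X)^{-1} X^T W^{-1} y \\
&= \left\{1_n^T \left[\frac{1}{1-\rho} I_n - \frac{\rho}{(1-\rho)(1-\rho+n\rho)} J_n\right] 1_n\right\}^{-1} 1_n^T \left[\frac{1}{1-\rho} I_n - \frac{\rho}{(1-\rho)(1-\rho+n\rho)} J_n\right] y \\
&= \left[\frac{n}{1-\rho} - \frac{n^2\rho}{(1-\rho)(1-\rho+n\rho)}\right]^{-1} \left[\frac{y_\cdot}{1-\rho} - \frac{n\rho y_\cdot}{(1-\rho)(1-\rho+n\rho)}\right] \\
&= \left[\frac{n(1-\rho+n\rho) - n^2\rho}{(1-\rho)(1-\rho+n\rho)}\right]^{-1} \left[\frac{y_\cdot(1-\rho+n\rho) - n\rho y_\cdot}{(1-\rho)(1-\rho+n\rho)}\right] \\
&= \bar{y}.
\end{aligned}
\]
Also, using an expression obtained above,
\[
\mathrm{var}(\tilde{\beta}) = \sigma^2 (X^T W^{-1} X)^{-1} = \sigma^2 \left[\frac{n(1-\rho+n\rho) - n^2\rho}{(1-\rho)(1-\rho+n\rho)}\right]^{-1} = \sigma^2 \left(\frac{1-\rho+n\rho}{n}\right) = \sigma^2 \left[\frac{1}{n} + \frac{\rho(n-1)}{n}\right],
\]
which differs from $\sigma^2/n$ by an amount $\sigma^2\rho(n-1)/n$.

Exercise 5 Obtain $\tilde{\beta}$ (the generalized least squares estimator of $\beta$) and its variance under an Aitken model for which $X = 1_n$ and $W = I + aa^T$, where $a$ is a known $n$-vector whose elements sum to 0. How does $\mathrm{var}(\tilde{\beta})$ compare to $\sigma^2/n$?

Solution Using Corollary 2.9.7.1 and the fact that $1_n^T a = 0$, we obtain
\[
\tilde{\beta} = (X^T W^{-1} X)^{-1} X^T W^{-1} y = \{1_n^T [I_n - (1 + a^T a)^{-1} aa^T] 1_n\}^{-1} 1_n^T [I_n - (1 + a^T a)^{-1} aa^T] y = (n - 0)^{-1}(y_\cdot - 0) = \bar{y}.
\]
Furthermore, $\mathrm{var}(\tilde{\beta}) = \sigma^2 (X^T W^{-1} X)^{-1} = \sigma^2 \{1_n^T [I_n - (1 + a^T a)^{-1} aa^T] 1_n\}^{-1} = \sigma^2/n$.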
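The closed forms in Exercises 4 and 5 can be spot-checked with exact rational arithmetic. The sketch below is only an illustration: the data vector `y`, the value of `rho`, and the vector `a` are hypothetical choices, and the helper names are not from the text. It confirms, for one instance of each model, that the stated inverse of W is correct and that the GLS estimator reduces to the sample mean.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def identity(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]

def gls_for_ones(W_inv, y):
    """For X = 1_n, return (beta_tilde, var(beta_tilde) / sigma^2)."""
    n = len(y)
    # Column sums of W^{-1} are the entries of 1_n^T W^{-1}.
    col_sums = [sum(W_inv[i][j] for i in range(n)) for j in range(n)]
    denom = sum(col_sums)  # 1_n^T W^{-1} 1_n
    beta = sum(c * yj for c, yj in zip(col_sums, y)) / denom
    return beta, 1 / denom

n = 5
y = [F(3), F(1), F(4), F(1), F(5)]    # hypothetical data, sample mean 14/5
I_n = identity(n)

# Exercise 4: W = (1 - rho) I + rho J, with the closed-form inverse used in the solution.
rho = F(1, 4)                          # hypothetical value inside (-1/(n-1), 1)
W4 = [[(1 - rho) * I_n[i][j] + rho for j in range(n)] for i in range(n)]
k = rho / ((1 - rho) * (1 - rho + n * rho))
W4_inv = [[I_n[i][j] / (1 - rho) - k for j in range(n)] for i in range(n)]
assert matmul(W4, W4_inv) == I_n
beta4, var4 = gls_for_ones(W4_inv, y)

# Exercise 5: W = I + a a^T, where the elements of a sum to 0.
a = [F(1), F(-2), F(0), F(3), F(-2)]   # hypothetical vector with zero sum
aTa = sum(ai * ai for ai in a)
W5 = [[I_n[i][j] + a[i] * a[j] for j in range(n)] for i in range(n)]
W5_inv = [[I_n[i][j] - a[i] * a[j] / (1 + aTa) for j in range(n)] for i in range(n)]
assert matmul(W5, W5_inv) == I_n
beta5, var5 = gls_for_ones(W5_inv, y)

print(beta4, var4)  # estimate and var/sigma^2 under compound symmetry
print(beta5, var5)  # estimate and var/sigma^2 under W = I + a a^T
```

With these particular inputs, `var4` equals (1 − ρ + nρ)/n and `var5` equals 1/n, in line with the two solutions.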
Exercise 6 Consider the model
\[
y_i = \beta x_i + e_i \quad (i = 1, \ldots, n)
\]
where $x_i = i$ for all $i$ and $(e_1, \ldots, e_n)^T$ has mean 0 and positive definite variance–covariance matrix
\[
\sigma^2 W = \sigma^2 \begin{pmatrix}
1 & 1 & 1 & \cdots & 1 \\
1 & 2 & 2 & \cdots & 2 \\
1 & 2 & 3 & \cdots & 3 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & 2 & 3 & \cdots & n
\end{pmatrix}.
\]
Obtain specialized expressions for the BLUE of $\beta$ and its variance. (Hint: Observe that the model matrix is one of the columns of $W$, which makes it possible to compute the BLUE without obtaining an expression for $W^{-1}$.)

Solution Letting $w_n$ denote the last column of $W$, we find that
\[
\tilde{\beta} = (x^T W^{-1} x)^{-1} x^T W^{-1} y = (w_n^T W^{-1} w_n)^{-1} w_n^T W^{-1} y = [(1, 2, \ldots, n) u_n^{(n)}]^{-1} u_n^{(n)T} y = y_n/n.
\]
Furthermore, $\mathrm{var}(\tilde{\beta}) = \mathrm{var}(y_n/n) = (1/n)^2 (\sigma^2 n) = \sigma^2/n$.

Exercise 7 Consider the problem of best linear unbiased estimation of estimable linear functions of $\beta$ under the constrained positive definite Aitken model $\{y, X\beta, \sigma^2 W : A\beta = h\}$.
(a) Derive the system of equations that must be solved to obtain the BLUE (called the constrained generalized least squares estimator) under this model.
(b) Generalize Theorems 10.1.4, 10.1.5, and 10.1.12 to give explicit expressions for the constrained generalized least squares estimator of a vector $C^T\beta$ of estimable functions and its variance–covariance matrix, and for an unbiased estimator of $\sigma^2$, in terms of a generalized inverse of the coefficient matrix for the system of equations derived in part (a).
(c) If the constraints are jointly nonestimable, will the constrained generalized least squares estimators of all functions that are estimable under the constrained
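model coincide with the (unconstrained) generalized least squares estimators of those functions? Explain.

Solution
(a) The Lagrangian function for the problem of minimizing $Q_W(\beta) \equiv (y - X\beta)^T W^{-1} (y - X\beta)$ with respect to $\beta$, subject to the constraints $A\beta = h$, is
\[
L(\beta, \lambda) = (y - X\beta)^T W^{-1} (y - X\beta) + 2\lambda^T (A\beta - h),
\]
where $2\lambda$ is a vector of Lagrange multipliers. Then,
\[
\frac{\partial L}{\partial \beta} = -2X^T W^{-1} y + 2X^T W^{-1} X\beta + 2A^T \lambda, \qquad \frac{\partial L}{\partial \lambda} = 2(A\beta - h).
\]
Equating these partial derivatives to 0 yields the system of equations
\[
\begin{pmatrix} X^T W^{-1} X & A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} \beta \\ \lambda \end{pmatrix} = \begin{pmatrix} X^T W^{-1} y \\ h \end{pmatrix}.
\]
(b) Let $\begin{pmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{pmatrix}$ represent any generalized inverse of the coefficient matrix of the system of equations derived in part (a), and let $\begin{pmatrix} \breve{\beta} \\ \breve{\lambda} \end{pmatrix}$ be any solution to that system. Then, proceeding as in the proof of Theorem 10.1.4, we obtain
\[
c^T \breve{\beta} = (c^T, 0^T) \begin{pmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{pmatrix} \begin{pmatrix} X^T W^{-1} y \\ h \end{pmatrix} = c^T (G_{11} X^T W^{-1} y + G_{12} h).
\]
Very similar arguments to those used in the proofs of Theorems 10.1.5 and 10.1.12 then establish that $\mathrm{var}(C^T\breve{\beta}) = \sigma^2 C^T G_{11} C$ and that $\breve{\sigma}^2 = (y^T W^{-1} y - \breve{\beta}^T X^T W^{-1} y - \breve{\lambda}^T h)/(n - p_A^*)$ is an unbiased estimator of $\sigma^2$, where $p_A^*$ is defined exactly as it was in Theorem 10.1.8.
(c) Yes. Observe that the "top" subset of the system of equations derived in part (a) can be rearranged as
\[
X^T W^{-1} (X\breve{\beta} - y) = -A^T \breve{\lambda},
\]
or equivalently as
\[
(X\breve{\beta} - y)^T W^{-1} X = -\breve{\lambda}^T A.
\]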
If the constraints are jointly nonestimable, then $\mathcal{R}(X) \cap \mathcal{R}(A) = \{0\}$, so both sides of the system of equations equal $0^T$, yielding the system $X^T W^{-1} X\breve{\beta} = X^T W^{-1} y$. Thus $\breve{\beta}$ satisfies the (unconstrained) Aitken equations, implying that the constrained generalized least squares estimators of all functions that are estimable under a model with jointly nonestimable constraints will coincide with the (unconstrained) generalized least squares estimators of those functions.

Exercise 8 Consider an Aitken model in which
\[
X = \begin{pmatrix} 1 \\ 1 \\ -2 \end{pmatrix} \quad \text{and} \quad W = \begin{pmatrix} 1 & -0.5 & -0.5 \\ -0.5 & 1 & -0.5 \\ -0.5 & -0.5 & 1 \end{pmatrix}.
\]
Observe that the columns of $W$ sum to 0; thus $W$ is singular.
(a) Show that $t^T y$ and $\ell^T y$, where $t = (\frac{1}{6}, \frac{1}{6}, -\frac{1}{3})^T$ and $\ell = (\frac{1}{2}, \frac{1}{2}, 0)^T$, are BLUEs of $\beta$, thus establishing that there is not a unique BLUE of $\beta$ under this model.
(b) Using Theorem 3.3.7, find a generalized inverse of $\begin{pmatrix} W & X \\ X^T & 0 \end{pmatrix}$ and use Theorem 11.2.3a to characterize the collection of all BLUEs of $\beta$.

Solution
(a) $t^T X = (\frac{1}{6}, \frac{1}{6}, -\frac{1}{3})(1, 1, -2)^T = 1$ and
\[
Wt = \begin{pmatrix} 1 & -0.5 & -0.5 \\ -0.5 & 1 & -0.5 \\ -0.5 & -0.5 & 1 \end{pmatrix} \begin{pmatrix} \frac{1}{6} \\ \frac{1}{6} \\ -\frac{1}{3} \end{pmatrix} = (1/4) \begin{pmatrix} 1 \\ 1 \\ -2 \end{pmatrix},
\]
so by Corollary 11.2.2.1, $t^T y$ is a BLUE of $\beta$. Similarly, $\ell^T X = (\frac{1}{2}, \frac{1}{2}, 0)(1, 1, -2)^T = 1$ and
\[
W\ell = \begin{pmatrix} 1 & -0.5 & -0.5 \\ -0.5 & 1 & -0.5 \\ -0.5 & -0.5 & 1 \end{pmatrix} \begin{pmatrix} \frac{1}{2} \\ \frac{1}{2} \\ 0 \end{pmatrix} = (1/4) \begin{pmatrix} 1 \\ 1 \\ -2 \end{pmatrix},
\]
so $\ell^T y$ is also a BLUE of $\beta$.
(b) Observe that
\[
\begin{pmatrix} 1 \\ 1 \\ -2 \end{pmatrix} = \begin{pmatrix} 1 \\ -0.5 \\ -0.5 \end{pmatrix} + \begin{pmatrix} -0.5 \\ 1 \\ -0.5 \end{pmatrix} - \begin{pmatrix} -0.5 \\ -0.5 \\ 1 \end{pmatrix},
\]
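implying that $\mathcal{C}(X) \subseteq \mathcal{C}(W)$ and, by the symmetry of $W$, that $\mathcal{R}(X^T) \subseteq \mathcal{R}(W)$. Thus, by Theorem 3.3.7, one generalized inverse of $\begin{pmatrix} W & X \\ X^T & 0 \end{pmatrix}$ is
\[
\begin{pmatrix}
\frac{5}{9} & -\frac{1}{9} & \frac{2}{9} & \frac{1}{6} \\
-\frac{1}{9} & \frac{5}{9} & \frac{2}{9} & \frac{1}{6} \\
\frac{2}{9} & \frac{2}{9} & \frac{2}{9} & -\frac{1}{3} \\
\frac{1}{6} & \frac{1}{6} & -\frac{1}{3} & -\frac{1}{4}
\end{pmatrix} \equiv \begin{pmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{pmatrix},
\]
where $G_{11}$ is $3 \times 3$. It may be verified that
\[
G_{12} = \begin{pmatrix} \frac{1}{6} \\ \frac{1}{6} \\ -\frac{1}{3} \end{pmatrix}, \qquad I_3 - G_{11} W - G_{12} X^T = (1/3) J_3, \qquad \text{and} \qquad G_{11} X = 0_3.
\]
Then by Theorem 11.2.3a, the set of all BLUEs of $\beta$ is
\[
\left\{ \tilde{t}^T y : \tilde{t} = \begin{pmatrix} \frac{1}{6} \\ \frac{1}{6} \\ -\frac{1}{3} \end{pmatrix} + ((1/3) J_3, 0_3) z,\ z \in \mathbb{R}^4 \right\} = \left\{ \tilde{t}^T y : \tilde{t} = \begin{pmatrix} \frac{1}{6} \\ \frac{1}{6} \\ -\frac{1}{3} \end{pmatrix} + z 1_3,\ z \in \mathbb{R} \right\}.
\]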
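The numerical claims in this solution lend themselves to an exact check. The following sketch uses only the matrices given in Exercise 8; the helper names and the particular values of z tried are illustrative assumptions. It verifies that the displayed 4 × 4 matrix is a generalized inverse of the bordered matrix and that every vector t + z·1 tested satisfies the two conditions checked in part (a).

```python
from fractions import Fraction as F

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

h = F(1, 2)
W = [[F(1), -h, -h], [-h, F(1), -h], [-h, -h, F(1)]]
X = [[F(1)], [F(1)], [F(-2)]]
t = [[F(1, 6)], [F(1, 6)], [F(-1, 3)]]

# Bordered matrix M = [[W, X], [X^T, 0]] and the generalized inverse G quoted above.
M = [W[i] + X[i] for i in range(3)] + [[X[i][0] for i in range(3)] + [F(0)]]
G = [
    [F(5, 9), F(-1, 9), F(2, 9), F(1, 6)],
    [F(-1, 9), F(5, 9), F(2, 9), F(1, 6)],
    [F(2, 9), F(2, 9), F(2, 9), F(-1, 3)],
    [F(1, 6), F(1, 6), F(-1, 3), F(-1, 4)],
]
is_generalized_inverse = matmul(matmul(M, G), M) == M

def satisfies_blue_conditions(z):
    """Check t_z = t + z * 1_3: t_z^T X = 1 and W t_z = (1/4) X."""
    tz = [[t[i][0] + z] for i in range(3)]
    unbiased = sum(tz[i][0] * X[i][0] for i in range(3)) == 1
    in_column_space = matmul(W, tz) == [[X[i][0] / 4] for i in range(3)]
    return unbiased and in_column_space

# z = 0 gives t, and z = 1/3 gives the second vector from part (a).
checks = [satisfies_blue_conditions(F(z)) for z in (0, F(1, 3), -2, 7)]
print(is_generalized_inverse, checks)
```

Because W·1 = 0 and 1ᵀX = 0, adding any multiple of the vector of ones to t leaves both conditions intact, which is why the set of BLUEs is a one-parameter family.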
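Exercise 9 Prove Theorem 11.2.4: Let $C^T\beta$ be an estimable vector, and let $\tilde{t}^T y$ be a BLUE of it under the Aitken model $\{y, X\beta, \sigma^2 W\}$. Suppose that $\mathrm{rank}(W, X) > \mathrm{rank}(X)$.
(a) If the skewness matrix of $y$ is null, then $\mathrm{cov}(\tilde{t}^T y, \tilde{\sigma}^2) = 0$;
(b) If the skewness and excess kurtosis matrices of $y$ are null, then
\[
\mathrm{var}(\tilde{\sigma}^2) = \frac{2\sigma^4}{\mathrm{rank}(W, X) - \mathrm{rank}(X)}.
\]

Solution
(a) By Corollary 4.2.5.1,
\[
\begin{aligned}
\mathrm{cov}(\tilde{t}^T y, \tilde{\sigma}^2) &= \mathrm{cov}\left\{ \left[ c^T G_{12}^T + z^T \begin{pmatrix} I_n - W G_{11}^T - X G_{12}^T \\ -X^T G_{11}^T \end{pmatrix} \right] y,\ y^T G_{11} y/[\mathrm{rank}(W, X) - \mathrm{rank}(X)] \right\} \\
&= 2\left[ c^T G_{12}^T + z^T \begin{pmatrix} I_n - W G_{11}^T - X G_{12}^T \\ -X^T G_{11}^T \end{pmatrix} \right] (\sigma^2 W) G_{11} X\beta/[\mathrm{rank}(W, X) - \mathrm{rank}(X)] \\
&= 0,
\end{aligned}
\]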
where for the last equality we used Theorem 3.3.8c (with W and X here playing the roles of A and B in the theorem). (b) For ease of writing, put s = [rank(W, X) − rank(X)]. By Corollary 4.2.6.3, var(σ˜ 2 ) = 2tr[(1/s)G11 (σ 2 W)(1/s)G11 (σ 2 W)] + 4(Xβ)T (1/s)G11 (σ 2 W)(1/s)G11 Xβ = (2σ 4 /s 2 )tr(G11 WG11 W) = (2σ 4 /s 2 )tr[G11 (W + XG22 XT )] = (2σ 4 /s 2 )tr(G11 W − G11 WG12 XT ) = (2σ 4 /s 2 )[tr(WG11 ) − tr(G12 XT G11 W)] = (2σ 4 /s 2 )[rank(W, X) − rank(X)] = 2σ 4 /[rank(W, X) − rank(X)],
where for the second, third, fourth, and sixth equalities we used parts (c), (d), (b), (c), and (e) (in that order) of Theorem 3.3.8 (with W and X here playing the roles of A and B in the theorem).

Exercise 10 Results on constrained least squares estimation under the Gauss–Markov model $\{y, X\beta, \sigma^2 I : A\beta = h\}$ that were established in Chap. 10 may be used to obtain results for best linear unbiased estimation under the unconstrained Aitken model $\{y, X\beta, \sigma^2 W\}$, as follows. Consider the constrained Gauss–Markov model $\{y, X\beta, \sigma^2 I : A\beta = h\}$. Regard the vector $h$ on the right-hand side of the system of constraint equations as a vector of $q$ "pseudo-observations" and append it to the vector of actual observations $y$, and then consider the unconstrained Aitken model
\[
\begin{pmatrix} y \\ h \end{pmatrix} = \begin{pmatrix} X \\ A \end{pmatrix} \beta + \begin{pmatrix} e \\ 0 \end{pmatrix}, \qquad \mathrm{var}\begin{pmatrix} e \\ 0 \end{pmatrix} = \sigma^2 \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix}.
\]
Prove that the constrained least squares estimator of a function $c^T\beta$ that is estimable under the constrained Gauss–Markov model is a BLUE of $c^T\beta$ under the corresponding unconstrained Aitken model.

Solution First observe (by Theorems 6.1.2 and 6.2.2) that the necessary and sufficient condition for $c^T\beta$ to be estimable under either model is $c^T \in \mathcal{R}\begin{pmatrix} X \\ A \end{pmatrix}$, so if $c^T\beta$ is estimable under either model then it is estimable under the other. Let $\begin{pmatrix} G_{11}^* & G_{12}^* \\ G_{21}^* & G_{22}^* \end{pmatrix}$, where $G_{11}^*$ is $p \times p$, represent an arbitrary generalized inverse of $\begin{pmatrix} X^T X & A^T \\ A & 0 \end{pmatrix}$, and let $\begin{pmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{pmatrix}$, where $G_{11}$ is $(n + q) \times (n + q)$, represent an arbitrary generalized inverse of
\[
S = \begin{pmatrix} I_n & 0 & X \\ 0 & 0 & A \\ X^T & A^T & 0 \end{pmatrix}.
\]
Then form the matrix
\[
R = \begin{pmatrix}
I - X G_{11}^{*T} X^T & -X G_{12}^* & X G_{11}^{*T} \\
-G_{12}^{*T} X^T & -G_{22}^* & G_{12}^{*T} \\
G_{211} & G_{212} & G_{22}
\end{pmatrix},
\]
where $(G_{211}, G_{212}) = G_{21}$. It suffices to show that $R$ is a generalized inverse of $S$, for if that is so, then by Theorem 11.2.3a, $c^T \begin{pmatrix} X G_{11}^{*T} \\ G_{12}^{*T} \end{pmatrix}^T \begin{pmatrix} y \\ h \end{pmatrix}$, or equivalently $c^T (G_{11}^* X^T y + G_{12}^* h)$, is a BLUE of $c^T\beta$. By Theorem 10.1.4, this would establish that the constrained least squares estimator is a BLUE under the Aitken model. To show that $R$ is a generalized inverse of $S$, it is helpful to first document a few results. Observe that
\[
\begin{aligned}
0 &= X^T X - X^T X G_{11}^* X^T X + A^T G_{22}^* A \\
&= X^T X - X^T X G_{11}^* X^T X - X^T X G_{12}^* A \\
&= X^T X - X^T X G_{11}^{*T} X^T X - A^T G_{12}^{*T} X^T X,
\end{aligned}
\]
where we used Theorem 3.3.8d, b for the first two equalities and ordinary matrix transposition for the third. The third equality implies that
\[
X^T - X^T X G_{11}^{*T} X^T - A^T G_{12}^{*T} X^T = 0 \tag{11.1}
\]
by Theorem 3.3.2. Similarly, by Theorem 3.3.8f,
\[
0 = X^T X - X^T X G_{11}^{*T} X^T X + A^T G_{22}^* A
\]
and via similar manipulations we obtain
\[
X^T - X^T X G_{11}^* X^T - A^T G_{12}^{*T} X^T = 0,
\]
or equivalently
\[
X - X G_{11}^{*T} X^T X - X G_{12}^* A = 0. \tag{11.2}
\]
Also, note that $A G_{11}^* X^T X = 0$ by Theorem 3.3.8c, implying (by Theorem 3.3.2 and matrix transposition) that
\[
X G_{11}^{*T} A^T = 0. \tag{11.3}
\]
Let
\[
SRS = \begin{pmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{pmatrix}.
\]
Then,
\[
\begin{aligned}
M_{11} &= \begin{pmatrix} I - X G_{11}^{*T} X^T & 0 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} X G_{11}^{*T} X^T & X G_{11}^{*T} A^T \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} X \\ A \end{pmatrix} G_{21} \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} X \\ A \end{pmatrix} G_{22} (X^T, A^T) \\
&= \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix},
\end{aligned}
\]
where we used Theorem 3.3.8b and (11.3). Furthermore, by (11.2) and Theorem 3.3.8a,
\[
M_{12} = \begin{pmatrix} X - X G_{11}^{*T} X^T X - X G_{12}^* A \\ 0 \end{pmatrix} + \begin{pmatrix} X \\ A \end{pmatrix} G_{21} \begin{pmatrix} X \\ A \end{pmatrix} = \begin{pmatrix} X \\ A \end{pmatrix},
\]
and by (11.1) and Theorem 3.3.8c, a,
\[
M_{21} = (X^T X G_{11}^{*T} X^T + A^T G_{12}^{*T} X^T,\ X^T X G_{11}^{*T} A^T + A^T G_{12}^{*T} A^T) = (X^T, A^T).
\]
Finally, by Theorem 3.3.8b,
\[
M_{22} = -X^T X G_{12}^* A - A^T G_{22}^* A = -(-A^T G_{22}^* A) - A^T G_{22}^* A = 0.
\]
Thus $SRS = S$.

Exercise 11 Consider the same simple linear regression setting described in Example 11.3-1.
(a) Assuming that the elements of $W$ are such that $W$ is nonnegative definite, determine additional conditions on those elements that are necessary and sufficient for the ordinary least squares estimator of the slope to be a BLUE of that parameter. (Express your conditions as a set of equations that the $w_{ij}$'s must satisfy, for example, $w_{11} + w_{12} = 3w_{22}$, etc.)
(b) Assuming that the elements of $W$ are such that $W$ is nonnegative definite, determine additional conditions on those elements that are necessary and sufficient for the ordinary least squares estimator of the intercept to be a BLUE of that parameter.
(c) Assuming that the elements of $W$ are such that $W$ is nonnegative definite, determine additional conditions on those elements that are necessary and sufficient for the ordinary least squares estimator of every estimable function to be a BLUE of that function. (Hint: Combine your results from parts (a) and (b).)
(d) Give numerical entries of a positive definite matrix $W \neq I$ for which:
(i) the ordinary least squares estimator of the slope, but not of the intercept, is a BLUE;
(ii) the ordinary least squares estimator of the intercept, but not of the slope, is a BLUE;
(iii) the ordinary least squares estimators of the slope and intercept are BLUEs.

Solution
(a) By Theorem 11.3.1, the necessary and sufficient condition is that
\[
\begin{pmatrix} w_{11} & w_{12} & w_{13} \\ w_{12} & w_{22} & w_{23} \\ w_{13} & w_{23} & w_{33} \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}^{-1} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} a - b \\ a \\ a + b \end{pmatrix}
\]
for some $a, b \in \mathbb{R}$, or equivalently that
\[
\begin{pmatrix} -w_{11} + w_{13} \\ -w_{12} + w_{23} \\ -w_{13} + w_{33} \end{pmatrix} = \begin{pmatrix} a - b \\ a \\ a + b \end{pmatrix}
\]
for some $a, b \in \mathbb{R}$. This may be re-expressed as
\[
(-w_{12} + w_{23}) - (-w_{11} + w_{13}) = (-w_{13} + w_{33}) - (-w_{12} + w_{23}),
\]
or equivalently as $w_{11} - w_{33} = 2(w_{12} - w_{23})$.
(b) By Theorem 11.3.1, the necessary and sufficient condition is that
\[
\begin{pmatrix} w_{11} & w_{12} & w_{13} \\ w_{12} & w_{22} & w_{23} \\ w_{13} & w_{23} & w_{33} \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 1 & 0 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}^{-1} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} a - b \\ a \\ a + b \end{pmatrix}
\]
for some $a, b \in \mathbb{R}$, or equivalently that
\[
\begin{pmatrix} w_{11} + w_{12} + w_{13} \\ w_{12} + w_{22} + w_{23} \\ w_{13} + w_{23} + w_{33} \end{pmatrix} = \begin{pmatrix} a - b \\ a \\ a + b \end{pmatrix}
\]
for some $a, b \in \mathbb{R}$. This may be re-expressed as
\[
(w_{13} + w_{23} + w_{33}) - (w_{12} + w_{22} + w_{23}) = (w_{12} + w_{22} + w_{23}) - (w_{11} + w_{12} + w_{13}),
\]
or equivalently as $w_{11} + w_{33} + 2w_{13} = w_{12} + w_{23} + 2w_{22}$.
(c) Every linear combination of the intercept and slope is estimable, and every estimable function is a linear combination of the intercept and slope. Thus, the least squares estimator of every estimable function will be a BLUE of that function if and only if the least squares estimators of the slope and intercept are BLUEs. By the results of parts (a) and (b), this occurs if and only if $w_{11} - w_{33} = 2(w_{12} - w_{23})$ and $w_{11} + w_{33} + 2w_{13} = w_{12} + w_{23} + 2w_{22}$, which cannot be simplified further.
(d) (i) A suitable W-matrix is the one featured in Example 11.4-1, i.e.,
\[
W = \begin{pmatrix} 1 & 0.5 & 0.25 \\ 0.5 & 1 & 0.5 \\ 0.25 & 0.5 & 1 \end{pmatrix}.
\]
(ii) A suitable W-matrix is
\[
W = \begin{pmatrix} 1.5 & 0 & 0.5 \\ 0 & 2 & 0 \\ 0.5 & 0 & 1.5 \end{pmatrix},
\]
which is diagonally dominant so it is positive definite by Theorem 2.15.10.
(iii) A suitable W-matrix is
\[
W = \begin{pmatrix} 1 & 0.5 & 0.5 \\ 0.5 & 1 & 0.5 \\ 0.5 & 0.5 & 1 \end{pmatrix},
\]
which we know to be positive definite by the results of Example 4.1.

Exercise 12 Determine the most general form that the variance–covariance matrix of an Aitken one-factor model can have in order for the ordinary least squares estimator of every estimable function under the model to be a BLUE of that function.

Solution Corollary 11.3.1.2 may be used to address this issue. Recall from Example 8.1-3 that for this model, $P_X = \oplus_{i=1}^{q} (1/n_i) J_{n_i}$. Now write $W$ in the following partitioned form:
\[
W = \begin{pmatrix}
W_{11} & W_{12} & \cdots & W_{1q} \\
W_{21} & W_{22} & \cdots & W_{2q} \\
\vdots & \vdots & \ddots & \vdots \\
W_{q1} & W_{q2} & \cdots & W_{qq}
\end{pmatrix},
\]
where $W_{ij}$ has dimensions $n_i \times n_j$ for $i, j = 1, 2, \ldots, q$. Since $W$ is nonnegative definite and hence symmetric, we have $W_{ij} = W_{ji}^T$.
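By Corollary 11.3.1.2, ordinary least squares estimators of all estimable functions are BLUEs of those functions if and only if $WP_X$ is symmetric. Because
\[
WP_X = \begin{pmatrix}
\frac{1}{n_1} W_{11} J_{n_1} & \frac{1}{n_2} W_{12} J_{n_2} & \cdots & \frac{1}{n_q} W_{1q} J_{n_q} \\
\frac{1}{n_1} W_{21} J_{n_1} & \frac{1}{n_2} W_{22} J_{n_2} & \cdots & \frac{1}{n_q} W_{2q} J_{n_q} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{1}{n_1} W_{q1} J_{n_1} & \frac{1}{n_2} W_{q2} J_{n_2} & \cdots & \frac{1}{n_q} W_{qq} J_{n_q}
\end{pmatrix},
\]
we conclude that $WP_X$ is symmetric if and only if $\frac{1}{n_i} J_{n_i} W_{ij} = \frac{1}{n_j} W_{ij} J_{n_j}$ for $i, j = 1, 2, \ldots, q$. Now fix one particular pair $(i, j) \in \{(s, t) : s, t \in \{1, 2, \ldots, q\}\}$, let
\[
W_{ij} = \begin{pmatrix}
w_{11} & w_{12} & \cdots & w_{1n_j} \\
w_{21} & w_{22} & \cdots & w_{2n_j} \\
\vdots & \vdots & \ddots & \vdots \\
w_{n_i 1} & w_{n_i 2} & \cdots & w_{n_i n_j}
\end{pmatrix},
\]
and define $\bar{w}_{k\cdot} = \frac{1}{n_j} \sum_{s=1}^{n_j} w_{ks}$ to be the $k$th row mean and $\bar{w}_{\cdot l} = \frac{1}{n_i} \sum_{s=1}^{n_i} w_{sl}$ to be the $l$th column mean, for $k = 1, \ldots, n_i$ and $l = 1, \ldots, n_j$. Then,
\[
\frac{1}{n_i} J_{n_i} W_{ij} = \frac{1}{n_i} 1_{n_i} 1_{n_i}^T W_{ij} = \begin{pmatrix}
\bar{w}_{\cdot 1} & \bar{w}_{\cdot 2} & \cdots & \bar{w}_{\cdot n_j} \\
\bar{w}_{\cdot 1} & \bar{w}_{\cdot 2} & \cdots & \bar{w}_{\cdot n_j} \\
\vdots & \vdots & & \vdots \\
\bar{w}_{\cdot 1} & \bar{w}_{\cdot 2} & \cdots & \bar{w}_{\cdot n_j}
\end{pmatrix},
\qquad
\frac{1}{n_j} W_{ij} J_{n_j} = \frac{1}{n_j} W_{ij} 1_{n_j} 1_{n_j}^T = \begin{pmatrix}
\bar{w}_{1\cdot} & \bar{w}_{1\cdot} & \cdots & \bar{w}_{1\cdot} \\
\bar{w}_{2\cdot} & \bar{w}_{2\cdot} & \cdots & \bar{w}_{2\cdot} \\
\vdots & \vdots & & \vdots \\
\bar{w}_{n_i\cdot} & \bar{w}_{n_i\cdot} & \cdots & \bar{w}_{n_i\cdot}
\end{pmatrix}.
\]
By comparing the elements in the two matrices above, we conclude that $\frac{1}{n_i} J_{n_i} W_{ij} = \frac{1}{n_j} W_{ij} J_{n_j}$ if and only if
\[
\bar{w}_{1\cdot} = \bar{w}_{2\cdot} = \cdots = \bar{w}_{n_i\cdot} = \bar{w}_{\cdot 1} = \bar{w}_{\cdot 2} = \cdots = \bar{w}_{\cdot n_j},
\]
i.e., if and only if every row and every column of $W_{ij}$ have the same mean. Since the pair $(i, j)$ is arbitrary, we obtain the following necessary and sufficient conditions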
on $W$ in order for the ordinary least squares estimators to be BLUEs:
1. $W$ is nonnegative definite.
2. When $W$ is partitioned as in this solution, for each $i, j = 1, 2, \ldots, q$ every row and every column of $W_{ij}$ have the same mean.

Exercise 13 Consider a $k$-part general mixed linear model
\[
y = X_1\beta_1 + X_2\beta_2 + \cdots + X_k\beta_k + e
\]
for which $V(\theta) = \theta_0 I + \theta_1 X_1 X_1^T + \theta_2 X_2 X_2^T + \cdots + \theta_k X_k X_k^T$, where $\theta = (\theta_0, \theta_1, \theta_2, \ldots, \theta_k)^T$ is an unknown parameter belonging to the subset of $\mathbb{R}^{k+1}$ within which $V(\theta)$ is positive definite for every $\theta$.
(a) Show that the ordinary least squares estimator of every estimable function under this model is a BLUE of that function.
(b) For the special case of a two-way main effects model with two levels of Factor A, three levels of Factor B, and one observation per cell, determine numerical entries of a $V(\theta) \neq I$ for which the ordinary least squares estimator of every estimable function is a BLUE. (Hint: Use part (a).)
(c) For the special case of a two-factor nested model with two levels of Factor A, two levels of Factor B within each level of Factor A, and two observations per cell, determine numerical entries of a $V(\theta) \neq I$ for which the ordinary least squares estimator of every estimable function is a BLUE.

Solution
(a) Using Theorem 9.1.3b (twice), we find that
\[
\begin{aligned}
V(\theta) P_X &= (\theta_0 I + \theta_1 X_1 X_1^T + \theta_2 X_2 X_2^T + \cdots + \theta_k X_k X_k^T) P_X \\
&= \theta_0 P_X + \theta_1 X_1 X_1^T + \theta_2 X_2 X_2^T + \cdots + \theta_k X_k X_k^T \\
&= P_X (\theta_0 I + \theta_1 X_1 X_1^T + \theta_2 X_2 X_2^T + \cdots + \theta_k X_k X_k^T) \\
&= P_X V(\theta).
\end{aligned}
\]
The desired result follows by Corollary 11.3.1.2.
(b) There are a variety of correct answers, but one of them is obtained using the three-part decomposition of $X$ given in Example 6.1-3, i.e.,
\[
X = (X_1, X_2, X_3) = (1_6,\ I_2 \otimes 1_3,\ 1_2 \otimes I_3)
\]
and setting $\theta_0 = 6$, $\theta_1 = \theta_2 = \theta_3 = 1$. This yields
\[
\begin{aligned}
V &= 6I_6 + 1_6 1_6^T + (I_2 \otimes 1_3)(I_2 \otimes 1_3)^T + (1_2 \otimes I_3)(1_2 \otimes I_3)^T \\
&= 6I_6 + J_6 + (I_2 \otimes J_3) + (J_2 \otimes I_3) \\
&= \begin{pmatrix}
9 & 2 & 2 & 2 & 1 & 1 \\
2 & 9 & 2 & 1 & 2 & 1 \\
2 & 2 & 9 & 1 & 1 & 2 \\
2 & 1 & 1 & 9 & 2 & 2 \\
1 & 2 & 1 & 2 & 9 & 2 \\
1 & 1 & 2 & 2 & 2 & 9
\end{pmatrix},
\end{aligned}
\]
which is symmetric, has positive diagonal elements, and is diagonally dominant; thus by Theorem 2.15.10 it is positive definite.
(c) Again there are many correct answers, but one of them is obtained by the three-part decomposition of $X$ obtained in Exercise 5.4, i.e.,
\[
X = (X_1, X_2, X_3) = (1_8,\ I_2 \otimes 1_4,\ I_4 \otimes 1_2)
\]
and setting $\theta_0 = 9$, $\theta_1 = \theta_2 = \theta_3 = 1$. This yields
\[
\begin{aligned}
V &= 9I_8 + 1_8 1_8^T + (I_2 \otimes 1_4)(I_2 \otimes 1_4)^T + (I_4 \otimes 1_2)(I_4 \otimes 1_2)^T \\
&= 9I_8 + J_8 + (I_2 \otimes J_4) + (I_4 \otimes J_2) \\
&= \begin{pmatrix}
12 & 3 & 2 & 2 & 1 & 1 & 1 & 1 \\
3 & 12 & 2 & 2 & 1 & 1 & 1 & 1 \\
2 & 2 & 12 & 3 & 1 & 1 & 1 & 1 \\
2 & 2 & 3 & 12 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 12 & 3 & 2 & 2 \\
1 & 1 & 1 & 1 & 3 & 12 & 2 & 2 \\
1 & 1 & 1 & 1 & 2 & 2 & 12 & 3 \\
1 & 1 & 1 & 1 & 2 & 2 & 3 & 12
\end{pmatrix},
\end{aligned}
\]
which is symmetric, has positive diagonal elements, and is diagonally dominant; thus by Theorem 2.15.10 it is positive definite.

Exercise 14 Consider the positive definite Aitken model $\{y, X\beta, \sigma^2 W\}$, and suppose that $n > p^*$.
(a) Prove that the ordinary least squares estimator and generalized least squares estimator of every estimable function are equal if and only if $P_X = \tilde{P}_X$.
(b) Define $\hat{\sigma}^2 = y^T (I - P_X) y/(n - p^*)$; note that this would be the residual mean square under a Gauss–Markov model. Also define
\[
\tilde{\sigma}^2 = y^T [W^{-1} - W^{-1} X(X^T W^{-1} X)^{-} X^T W^{-1}] y/(n - p^*),
\]
which is the generalized residual mean square. Suppose that the ordinary least squares estimator and generalized least squares estimator of every estimable function are equal. Prove that σˆ 2 = σ˜ 2 (for all y) if and only if W(I − PX ) = I − PX . (c) Suppose that W = I + PX APX for some n × n matrix A such that W is positive definite. Prove that in this case the ordinary least squares estimator and generalized least squares estimator of every estimable function are equal and σˆ 2 = σ˜ 2 (for all y). Solution (a) By Theorems 7.1.4 and 11.1.1, the ordinary least squares estimator and generalized least squares estimator of every estimable function are equal if and only if cT (XT X)− XT y = cT (XT W−1 X)− XT W−1 y for all y and all cT ∈ R(X), i.e., by Theorems 2.1.1 and 6.1.2, if and only if aT X(XT X)− XT = aT X(XT W−1 X)− XT W−1 for all a, i.e., by another application of Theorem 2.1.1, if and only if X(XT X)− XT = X(XT W−1 X)− XT W−1 , i.e., if and only if PX = P˜ X . (b) σˆ 2 = σ˜ 2 (for all y) if and only if yT (I − PX )y = yT [W−1 − W−1 X(XT W−1 X)− XT W−1 ]y for all y, i.e. (by the uniqueness assertion in Theorem 2.14.1), if and only if I − PX = W−1 − W−1 X(XT W−1 X)− XT W−1 , i.e., if and only if I − PX = W−1 − W−1 P˜ X . By the given assumption, this last condition may be re-expressed, using part (a), as I − PX = W−1 − W−1 PX or equivalently (by pre-multiplication of both sides by W) as W(I − PX ) = I − PX . (c) WPX = (I + PX APX )PX = PX + PX APX = PX (I + PX APX ) = PX W, implying (by Corollary 11.3.1.2) that the least squares estimator and generalized least squares estimator of every estimable function are equal. Furthermore, by Theorem 8.1.4, W(I − PX ) = (I + PX APX )(I − PX ) = I − PX + PX APX (I − PX ) = I − PX . Thus by part (b), σˆ 2 = σ˜ 2 for all y.
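Part (c) of Exercise 14 can be illustrated on a small case. The sketch below assumes a one-factor layout with group sizes 2 and 1, so that the projection matrix is the direct sum (1/2)J_2 ⊕ J_1 recalled in Exercise 12. The matrix `A` is a hypothetical symmetric, diagonally dominant choice and the helper names are not from the text. It checks exactly that W = I + P_X A P_X commutes with P_X and leaves I − P_X unchanged.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

I3 = [[F(int(i == j)) for j in range(3)] for i in range(3)]

# Assumed layout: one factor, group sizes (2, 1), so P_X = (1/2) J_2 (+) J_1.
P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 2), F(1, 2), F(0)],
     [F(0), F(0), F(1)]]

# Hypothetical symmetric A; it is diagonally dominant, so W below is positive definite.
A = [[F(2), F(1), F(0)],
     [F(1), F(3), F(1)],
     [F(0), F(1), F(4)]]

W = add(I3, matmul(matmul(P, A), P))          # W = I + P_X A P_X

commutes = matmul(W, P) == matmul(P, W)       # W P_X = P_X W (OLS and GLS agree)
fixes_residual_space = matmul(W, sub(I3, P)) == sub(I3, P)  # W(I - P_X) = I - P_X
print(commutes, fixes_residual_space)
```

Both checks rely only on P_X being symmetric and idempotent, so the same outcome is expected for any other design and any other A of this form.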
12
Model Misspecification
This chapter presents exercises on the effects of misspecifying the linear model and provides solutions to those exercises.

Exercise 1 Consider the underspecified mean structure scenario, and suppose that $c_1^T\beta_1$ is estimable under the correct model and $\mathrm{rank}[(I - P_1)X_2] = p_2$. Prove that $c_1^T\hat{\beta}_1$ and $c_1^T\hat{\beta}_1^{(U)}$ coincide if and only if $c_1^T (X_1^T X_1)^{-} X_1^T X_2 = 0^T$.

Solution First note that $c_1^T\hat{\beta}_1^{(U)} = c_1^T (X_1^T X_1)^{-} X_1^T y$. By Theorem 3.3.7a, the upper left $p_1 \times p_1$ and upper right $p_1 \times p_2$ blocks of one generalized inverse of the coefficient matrix of the two-part normal equations corresponding to the correct model are
\[
(X_1^T X_1)^{-} + (X_1^T X_1)^{-} X_1^T X_2 [X_2^T (I - P_1) X_2]^{-} X_2^T X_1 (X_1^T X_1)^{-}
\]
and
\[
-(X_1^T X_1)^{-} X_1^T X_2 [X_2^T (I - P_1) X_2]^{-},
\]
respectively. Thus,
\[
\begin{aligned}
c_1^T\hat{\beta}_1 &= c_1^T (X_1^T X_1)^{-} X_1^T y + c_1^T (X_1^T X_1)^{-} X_1^T X_2 [X_2^T (I - P_1) X_2]^{-} X_2^T X_1 (X_1^T X_1)^{-} X_1^T y \\
&\quad - c_1^T (X_1^T X_1)^{-} X_1^T X_2 [X_2^T (I - P_1) X_2]^{-} X_2^T y.
\end{aligned}
\]
© Springer Nature Switzerland AG 2020 D. L. Zimmerman, Linear Model Theory, https://doi.org/10.1007/978-3-030-52074-8_12
So if $c_1^T (X_1^T X_1)^{-} X_1^T X_2 = 0^T$, then $c_1^T\hat{\beta}_1 = c_1^T\hat{\beta}_1^{(U)}$. Conversely, if $c_1^T\hat{\beta}_1 = c_1^T\hat{\beta}_1^{(U)}$, then
\[
c_1^T (X_1^T X_1)^{-} X_1^T X_2 [X_2^T (I - P_1) X_2]^{-} X_2^T X_1 (X_1^T X_1)^{-} X_1^T y - c_1^T (X_1^T X_1)^{-} X_1^T X_2 [X_2^T (I - P_1) X_2]^{-} X_2^T y = 0
\]
for all $y$, implying that
\[
c_1^T (X_1^T X_1)^{-} X_1^T X_2 [X_2^T (I - P_1) X_2]^{-} X_2^T (I - P_1) = 0^T.
\]
Post-multiplying by $X_2$ and letting $a$ represent any vector such that $c_1^T = a^T (I - P_2) X_1$, we obtain
\[
\begin{aligned}
0^T &= a^T (I - P_2) P_1 X_2 [X_2^T (I - P_1) X_2]^{-} X_2^T (I - P_1) X_2 \\
&= a^T (I - P_2)[(I - P_1) X_2][X_2^T (I - P_1) X_2]^{-} X_2^T (I - P_1) X_2 \\
&= a^T (I - P_2)[(I - P_1) X_2] \\
&= a^T (I - P_2) P_1 X_2 \\
&= a^T (I - P_2) X_1 (X_1^T X_1)^{-} X_1^T X_2 \\
&= c_1^T (X_1^T X_1)^{-} X_1^T X_2,
\end{aligned}
\]
where we used Theorem 3.3.3a [with $(I - P_1) X_2$ playing the role of $A$ in the theorem] for the third equality.

Exercise 2 Consider an underspecified mean structure scenario, where both models satisfy Gauss–Markov assumptions. Suppose that $c_1^T\beta_1$ is estimable under the underspecified model and that $c_2^T\beta_2$ is estimable under the correct model. Prove that $c_1^T\hat{\beta}_1^{(U)}$ and $c_2^T\hat{\beta}_2$ are uncorrelated under the correct model.
Solution By the given estimability conditions, $c_1^T = a_1^T X_1$ for some $a_1$, and $c_2^T = a_2^T (I - P_1) X_2$ for some $a_2$. Thus we may write $c_1^T\hat{\beta}_1^{(U)}$ as $a_1^T X_1 (X_1^T X_1)^{-} X_1^T y = a_1^T P_1 y$, and using Theorem 3.3.7 we have
\[
\begin{aligned}
c_2^T\hat{\beta}_2 &= a_2^T (I - P_1) X_2 \left( -[X_2^T (I - P_1) X_2]^{-} X_2^T X_1 (X_1^T X_1)^{-},\ [X_2^T (I - P_1) X_2]^{-} \right) \begin{pmatrix} X_1^T y \\ X_2^T y \end{pmatrix} \\
&= a_2^T (I - P_1) X_2 \{-[X_2^T (I - P_1) X_2]^{-} X_2^T P_1 y + [X_2^T (I - P_1) X_2]^{-} X_2^T y\} \\
&= a_2^T (I - P_1) X_2 [X_2^T (I - P_1) X_2]^{-} X_2^T (I - P_1) y \\
&= a_2^T (P_{12} - P_1) y,
\end{aligned}
\]
where we used Theorem 9.1.2 for the last equality. Then, by Theorems 4.2.3 and 9.1.1b,
\[
\mathrm{cov}[a_1^T P_1 y,\ a_2^T (P_{12} - P_1) y] = a_1^T P_1 (\sigma^2 I)(P_{12} - P_1) a_2 = 0.
\]

Exercise 3 Suppose that the Gauss–Markov no-intercept simple linear regression model is fit to some data, but the correct model is the Gauss–Markov simple linear regression model.
(a) Determine the bias of $\hat{\beta}_2^{(U)}$, and determine necessary and sufficient conditions for the bias to equal 0.
(b) Determine necessary and sufficient conditions for the mean squared error of $\hat{\beta}_2^{(U)}$ to be less than or equal to the mean squared error of the least squares estimator of $\beta_2$ under the correct model.

Solution
(a) By Theorem 12.1.2a, the bias is $(n\bar{x}/\sum_{i=1}^{n} x_i^2)\beta_1$, which is equal to 0 (for all $\beta_1$) if and only if $\bar{x} = 0$.
(b) By Theorem 12.1.4b, $\mathrm{MSE}(\hat{\beta}_2^{(U)}) \leq \mathrm{MSE}(\hat{\beta}_2)$ if and only if
\[
\left[\left(n\bar{x}\Big/\sum_{i=1}^{n} x_i^2\right)\beta_1\right]^2 \leq \sigma^2 \left(n\bar{x}\Big/\sum_{i=1}^{n} x_i^2\right) \{1^T [I - x(x^T x)^{-1} x^T] 1\}^{-1} \left(n\bar{x}\Big/\sum_{i=1}^{n} x_i^2\right),
\]
i.e., if and only if $\beta_1^2 \leq \sigma^2 \left(n - n^2\bar{x}^2/\sum_{i=1}^{n} x_i^2\right)^{-1}$, i.e., if and only if $\beta_1^2 \leq \sigma^2 \sum_{i=1}^{n} x_i^2/(n\,SXX)$, i.e., if and only if
\[
\frac{|\beta_1|}{\sqrt{\mathrm{var}(\hat{\beta}_1)}} \leq 1.
\]
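The threshold in part (b) can be illustrated with exact arithmetic. The sketch below is a hypothetical example: the design points, the values of β₁ and σ² are made up, and `compare_mse` is not a name from the text. It relies on two standard facts, that the no-intercept estimator Σxᵢyᵢ/Σxᵢ² has variance σ²/Σxᵢ² and that the least squares slope under the correct model has variance σ²/SXX.

```python
from fractions import Fraction as F

def compare_mse(x, beta1, sigma2):
    """Return (bias, MSE of no-intercept slope, MSE of correct-model slope,
    variance of the correct-model intercept estimator)."""
    n = len(x)
    sum_x2 = sum(xi * xi for xi in x)
    xbar = F(sum(x)) / n
    sxx = sum_x2 - n * xbar ** 2
    bias = n * xbar / sum_x2 * beta1            # bias from part (a)
    mse_under = F(sigma2) / sum_x2 + bias ** 2  # variance + squared bias
    mse_correct = F(sigma2) / sxx               # unbiased, so MSE = variance
    var_intercept = F(sigma2) * sum_x2 / (n * sxx)
    return bias, mse_under, mse_correct, var_intercept

x = [1, 2, 3, 4]  # hypothetical design points (mean 5/2, SXX = 5)

# beta1^2 = 1 is below var(intercept estimator) = 3/2: the underspecified fit has smaller MSE.
small = compare_mse(x, beta1=1, sigma2=1)
# beta1^2 = 4 is above 3/2: the correct-model estimator has smaller MSE.
large = compare_mse(x, beta1=2, sigma2=1)
print(small)
print(large)
```

In both runs the comparison of the two mean squared errors agrees with the comparison of β₁² against the variance of the intercept estimator, as the solution states.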
Exercise 4 Consider a situation in which the true model for three observations is the two-way main effects Gauss–Markov model
\[
y_{ij} = \mu + \alpha_i + \gamma_j + e_{ij}, \qquad (i, j) \in \{(1, 1), (1, 2), (2, 1)\}
\]
but the model fitted to the observations is the (underspecified) one-factor Gauss–Markov model
\[
y_{ij} = \mu + \alpha_i + e_{ij}, \qquad (i, j) \in \{(1, 1), (1, 2), (2, 1)\}.
\]
Recall that the true model for such observations was considered in Examples 7.1-3 and 7.2-3.
(a) Is μ + α1 estimable under the true model? Regardless of the answer to that question, obtain its least squares estimator under the fitted model. Also obtain the expectation, variance, and mean squared error of that estimator under the true model. (b) Is α1 − α2 estimable under the true model? Regardless of the answer to that question, obtain its least squares estimator under the fitted model. Also obtain the expectation, variance, and mean squared error of that estimator under the true model. (c) Obtain the least squares estimator of α1 − α2 under the true model, and obtain its expectation, variance, and mean squared error under the true model. (d) Compare the variances of the estimators of α1 − α2 obtained in parts (b) and (c). Which theorem does this exemplify? (e) Obtain an explicit condition on γ1 , γ2 , and σ 2 for the mean squared error of the estimator of α1 − α2 defined in part (b) to be smaller than the mean squared error of the estimator of α1 − α2 defined in part (c). Solution (a) No, μ + α1 is not estimable under the correct model. But from Example 6.1-2 it is estimable under the underspecified model, and from Example 7.1-2 the estimator obtained by applying least squares to the underspecified model is y¯1· = (y11 + y12 )/2. By Theorem 12.1.2 (or by first principles), this estimator has mean μ + α1 + (γ1 + γ2 )/2, variance σ 2 /2, and mean squared error (σ 2 /2) + (γ1 + γ2 )2 /4. (b) Yes, α1 − α2 is estimable under the correct model by Theorem 6.1.4 because the two-way layout is connected. From Example 7.1-2, the estimator of α1 − α2 obtained by applying least squares to the underspecified model is y¯1· − y21 . By Theorem 12.1.2 (or by first principles), this estimator has expectation α1 − α2 − (γ1 − γ2 )/2, variance 3σ 2 /2, and mean squared error (3σ 2 /2) + (γ1 − γ2 )2 /4. 
(c) From Example 7.1-3, the least squares estimator of α1 − α2 under the correct model is y11 − y21 , which has expectation α1 − α2 , variance 2σ 2 , and mean squared error 2σ 2 . (d) The variance of the estimator obtained in part (b) is 25% smaller than the variance of the estimator obtained in part (c). This exemplifies Theorem 12.1.4a. (e) The estimator obtained in part (b) has smaller MSE than the estimator obtained in part (c) if and only if (3σ 2 /2) + (γ1 − γ2 )2 /4 ≤ 2σ 2 , i.e., if and only if (γ1 − γ2 )2 ≤ 2σ 2 . Exercise 5 Consider a situation in which the true model for three observations is the one-factor Gauss–Markov model yij = μ + αi + eij ,
$(i, j) \in \{(1, 1), (1, 2), (2, 1)\}$
but the model fitted to the observations is the (overspecified) two-way main effects Gauss–Markov model $y_{ij} = \mu + \alpha_i + \gamma_j + e_{ij}$,
$(i, j) \in \{(1, 1), (1, 2), (2, 1)\}$.
Recall that the fitted model for such observations was considered in Examples 7.1-3 and 7.2-3.
(a) Obtain the least squares estimator of $\alpha_1 - \alpha_2$ under the fitted model, and obtain the expectation, variance, and mean squared error of that estimator under the true model.
(b) Obtain the least squares estimator of $\alpha_1 - \alpha_2$ under the true model, and obtain the expectation, variance, and mean squared error of that estimator under the true model.
(c) Which estimator has larger mean squared error? Which theorem does this exemplify?

Solution
(a) The least squares estimator of $\alpha_1 - \alpha_2$ under the fitted (overspecified) model is $y_{11} - y_{21}$. Under the correct model, this estimator has expectation $\alpha_1 - \alpha_2$, variance $2\sigma^2$, and mean squared error $2\sigma^2$.
(b) The least squares estimator of $\alpha_1 - \alpha_2$ under the correct model is $\bar{y}_{1\cdot} - y_{21}$. Under the correct model, this estimator has expectation $\alpha_1 - \alpha_2$, variance $3\sigma^2/2$, and mean squared error $3\sigma^2/2$.
(c) The estimator obtained in part (a) has larger mean squared error, exemplifying Theorem 12.1.7b.

Exercise 6 Prove Theorem 12.1.5b, c: In the underspecified mean structure scenario, if $n > \mathrm{rank}(X_1)$ then:
(b) if the skewness matrix of $y$ is null, then $\mathrm{cov}(c_1^T\hat{\beta}_1^{(U)}, \hat{\sigma}_U^2) = 0$; and
(c) if the skewness and excess kurtosis matrices of $y$ are null, then
\[
\mathrm{var}(\hat{\sigma}_U^2) = \frac{2\sigma^4}{n - \mathrm{rank}(X_1)} + \frac{4\sigma^2 \beta_2^T X_2^T (I - P_1) X_2 \beta_2}{[n - \mathrm{rank}(X_1)]^2}.
\]

Solution By Corollary 4.2.5.1,
\[
\begin{aligned}
\mathrm{cov}(c_1^T\hat{\beta}_1^{(U)}, \hat{\sigma}_U^2) &= \mathrm{cov}\left[ c_1^T (X_1^T X_1)^{-} X_1^T y,\ \frac{y^T (I - P_1) y}{n - \mathrm{rank}(X_1)} \right] \\
&= 2c_1^T (X_1^T X_1)^{-} X_1^T (\sigma^2 I)(I - P_1)(X_1\beta_1 + X_2\beta_2)/[n - \mathrm{rank}(X_1)] \\
&= 2\sigma^2 c_1^T (X_1^T X_1)^{-} (X_1^T - X_1^T P_1)(X_1\beta_1 + X_2\beta_2)/[n - \mathrm{rank}(X_1)] \\
&= 0
\end{aligned}
\]
because XT1 P1 = XT1 . This proves part (b) of the theorem. Next, by Corollary 4.2.6.3, var(σˆ U2 ) =
2tr[(I − P1 )(σ 2 I)(I − P1 )(σ 2 I)] [n − rank(X1 )]2 +
4(X1 β 1 + X2 β 2 )T (I − P1 )(σ 2 I)(I − P1 )(X1 β 1 + X2 β 2 ) [n − rank(X1 )]2
=
4σ 2 β T2 XT2 (I − P1 )X2 β 2 2σ 4 tr(I − P1 ) + [n − rank(X1 )]2 [n − rank(X1 )]2
=
4σ 2 β T2 XT2 (I − P1 )X2 β 2 2σ 4 + n − rank(X1 ) [n − rank(X1 )]2
where we used Theorem 8.1.3b for the second equality and Theorems 2.12.2 and 8.1.3b, d for the third equality. This proves part (c) of the theorem.

Exercise 7 In the underspecified simple linear regression settings considered in Example 12.1.1-1 and Exercise 12.3, obtain the bias and variance of $\hat\sigma_U^2$ and compare the latter to the variance of $\hat\sigma^2$ (assuming that $n>2$ and that the skewness and excess kurtosis matrices of $\mathbf{y}$ are null).

Solution First consider the setting of Example 12.1.1-1. By Theorem 12.1.5a, the bias of $\hat\sigma_U^2$ is $\beta_2\mathbf{x}^T(\mathbf{I}-\frac{1}{n}\mathbf{J}_n)\mathbf{x}\beta_2/(n-1)=\beta_2^2\,SXX/(n-1)$. By Theorem 12.1.5c,
\[
\operatorname{var}(\hat\sigma_U^2)=\frac{2\sigma^4}{n-1}+\frac{4\sigma^2\beta_2\mathbf{x}^T(\mathbf{I}-\frac{1}{n}\mathbf{J}_n)\mathbf{x}\beta_2}{(n-1)^2}
=\frac{2\sigma^4}{n-1}+\frac{4\sigma^2\beta_2^2\,SXX}{(n-1)^2}.
\]
By Theorem 8.2.3, $\operatorname{var}(\hat\sigma^2)=2\sigma^4/(n-2)$. Therefore, $\operatorname{var}(\hat\sigma_U^2)\ge\operatorname{var}(\hat\sigma^2)$ if and only if
\[
\frac{2\sigma^4}{n-1}+\frac{4\sigma^2\beta_2^2\,SXX}{(n-1)^2}\ge\frac{2\sigma^4}{n-2},
\]
i.e., if and only if
\[
2\beta_2^2\,SXX\ge\frac{n-1}{n-2}\,\sigma^2.
\]
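Next consider the setting of Exercise 12.3. By Theorem 12.1.5a, the bias of $\hat\sigma_U^2$ is $\beta_1\mathbf{1}^T[\mathbf{I}-\mathbf{x}(\mathbf{x}^T\mathbf{x})^{-1}\mathbf{x}^T]\mathbf{1}\beta_1/(n-1)=\beta_1^2\,n\,SXX/[(n-1)\sum_{i=1}^n x_i^2]$, and by Theorem 12.1.5c,
\[
\operatorname{var}(\hat\sigma_U^2)=\frac{2\sigma^4}{n-1}+\frac{4\sigma^2\beta_1^2\,n\,SXX}{(n-1)^2\sum_{i=1}^n x_i^2}.
\]
Therefore, $\operatorname{var}(\hat\sigma_U^2)\ge\operatorname{var}(\hat\sigma^2)$ if and only if
\[
\frac{2\sigma^4}{n-1}+\frac{4\sigma^2\beta_1^2\,n\,SXX}{(n-1)^2\sum_{i=1}^n x_i^2}\ge\frac{2\sigma^4}{n-2},
\]
i.e., if and only if
\[
2\beta_1^2\,n\,SXX\ge\frac{n-1}{n-2}\,\sigma^2\sum_{i=1}^n x_i^2.
\]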
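The comparison for the first setting (omitted slope) can be checked numerically. The following is a minimal sketch, not part of the original solution; the function names and the sample inputs are illustrative assumptions, and the formulas are simply those displayed in the solution to Exercise 7.

```python
def sxx(x):
    """Corrected sum of squares of the regressor values."""
    xbar = sum(x) / len(x)
    return sum((xi - xbar) ** 2 for xi in x)

def var_sigma2_under(x, beta2, sigma2):
    """var of the residual mean square when the slope term is wrongly omitted."""
    n = len(x)
    return 2 * sigma2**2 / (n - 1) + 4 * sigma2 * beta2**2 * sxx(x) / (n - 1) ** 2

def var_sigma2_correct(n, sigma2):
    """var of the residual mean square under the correctly specified model."""
    return 2 * sigma2**2 / (n - 2)

def threshold_holds(x, beta2, sigma2):
    """The simplified condition 2*beta2^2*SXX >= (n-1)/(n-2)*sigma2."""
    n = len(x)
    return 2 * beta2**2 * sxx(x) >= (n - 1) / (n - 2) * sigma2

# Hypothetical inputs: a small slope and a large slope.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
for beta2 in (0.1, 1.0):
    larger = var_sigma2_under(x, beta2, 1.0) >= var_sigma2_correct(len(x), 1.0)
    print(beta2, larger, threshold_holds(x, beta2, 1.0))
```

For each slope value the direct variance comparison and the simplified threshold condition should agree.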
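Exercise 8 Prove Theorem 12.1.8: In the overspecified mean structure scenario, suppose that $\mathbf{X}_1^T\mathbf{X}_2=\mathbf{0}$ and that $\mathbf{c}_1^T\boldsymbol\beta_1$ is estimable. Then $\mathbf{c}_1^T\hat{\boldsymbol\beta}_1^{(O)}$ and $\mathbf{c}_1^T\hat{\boldsymbol\beta}_1$ coincide.

Solution If $\mathbf{X}_1^T\mathbf{X}_2=\mathbf{0}$, then
\[
\mathbf{c}_1^T\hat{\boldsymbol\beta}_1^{(O)}
=(\mathbf{c}_1^T,\mathbf{0}^T)
\begin{pmatrix}\mathbf{X}_1^T\mathbf{X}_1&\mathbf{0}\\ \mathbf{0}&\mathbf{X}_2^T\mathbf{X}_2\end{pmatrix}^{-}
\begin{pmatrix}\mathbf{X}_1^T\mathbf{y}\\ \mathbf{X}_2^T\mathbf{y}\end{pmatrix}
=\mathbf{c}_1^T(\mathbf{X}_1^T\mathbf{X}_1)^{-}\mathbf{X}_1^T\mathbf{y}
=\mathbf{c}_1^T\hat{\boldsymbol\beta}_1,
\]
where Theorem 3.3.7a was used for the second equality.

Exercise 9 Suppose that the Gauss–Markov simple linear regression model is fit to some data, but the correct model is the no-intercept version of the same model. Determine the mean squared error of $\hat\beta_2^{(O)}$ and compare it to that of $\hat\beta_2$.

Solution By Theorem 12.1.6b,
\begin{align*}
\operatorname{MSE}(\hat\beta_2^{(O)})
&=\frac{\sigma^2}{\sum_{i=1}^n x_i^2}+\frac{\sigma^2}{\left(\sum_{i=1}^n x_i^2\right)^2}\left(\sum_{i=1}^n x_i\right)^2\{\mathbf{1}^T[\mathbf{I}-\mathbf{x}(\mathbf{x}^T\mathbf{x})^{-1}\mathbf{x}^T]\mathbf{1}\}^{-1}\\
&=\frac{\sigma^2}{\sum_{i=1}^n x_i^2}+\frac{\sigma^2}{\left(\sum_{i=1}^n x_i^2\right)^2}\left(\sum_{i=1}^n x_i\right)^2\left[n-\frac{\left(\sum_{i=1}^n x_i\right)^2}{\sum_{i=1}^n x_i^2}\right]^{-1}\\
&=\frac{\sigma^2}{\sum_{i=1}^n x_i^2}\left(1+\frac{n\bar{x}^2}{SXX}\right).
\end{align*}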
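Also, $\operatorname{MSE}(\hat\beta_2)=\operatorname{var}(\hat\beta_2)=\sigma^2/\sum_{i=1}^n x_i^2$. Thus, $\operatorname{MSE}(\hat\beta_2^{(O)})\ge\operatorname{MSE}(\hat\beta_2)$, with equality if and only if $\bar{x}=0$.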
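As a supplementary step (not part of the original solution), the comparison can be made explicit with the identity $\sum_{i=1}^n x_i^2=SXX+n\bar{x}^2$, which also recovers the familiar slope variance $\sigma^2/SXX$ of the model with an intercept:

```latex
\[
\operatorname{MSE}(\hat\beta_2^{(O)})
=\frac{\sigma^2}{\sum_{i=1}^n x_i^2}\cdot\frac{SXX+n\bar{x}^2}{SXX}
=\frac{\sigma^2}{SXX},
\qquad
\frac{\operatorname{MSE}(\hat\beta_2^{(O)})}{\operatorname{MSE}(\hat\beta_2)}
=\frac{\sum_{i=1}^n x_i^2}{SXX}
=1+\frac{n\bar{x}^2}{SXX}\ \ge 1 .
\]
```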
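Exercise 10 Prove Theorem 12.1.9: In the overspecified mean structure scenario, if $n>\operatorname{rank}(\mathbf{X}_1,\mathbf{X}_2)$ then:
(a) $E(\hat\sigma_O^2)=\sigma^2$;
(b) if the skewness matrix of $\mathbf{y}$ is null, then $\operatorname{cov}(\mathbf{c}_1^T\hat{\boldsymbol\beta}_1^{(O)},\hat\sigma_O^2)=0$; and
(c) if the skewness and excess kurtosis matrices of $\mathbf{y}$ are null, then $\operatorname{var}(\hat\sigma_O^2)=2\sigma^4/[n-\operatorname{rank}(\mathbf{X}_1,\mathbf{X}_2)]$.

Solution Write $r=\operatorname{rank}(\mathbf{X}_1,\mathbf{X}_2)$. By Theorem 4.2.4,
\begin{align*}
E(\hat\sigma_O^2)&=E\{\mathbf{y}^T(\mathbf{I}-\mathbf{P}_{12})\mathbf{y}/(n-r)\}\\
&=\{\boldsymbol\beta_1^T\mathbf{X}_1^T(\mathbf{I}-\mathbf{P}_{12})\mathbf{X}_1\boldsymbol\beta_1+\operatorname{tr}[(\mathbf{I}-\mathbf{P}_{12})(\sigma^2\mathbf{I})]\}/(n-r)\\
&=\{0+\sigma^2(n-r)\}/(n-r)\\
&=\sigma^2.
\end{align*}
Next, by Corollary 4.2.5.1,
\[
\operatorname{cov}(\mathbf{c}_1^T\hat{\boldsymbol\beta}_1^{(O)},\hat\sigma_O^2)=2\mathbf{b}^T(\sigma^2\mathbf{I})\{(\mathbf{I}-\mathbf{P}_{12})/(n-r)\}\mathbf{X}_1\boldsymbol\beta_1=0,
\]
where the exact form of $\mathbf{b}$ is unimportant because $(\mathbf{I}-\mathbf{P}_{12})\mathbf{X}_1=\mathbf{0}$. Last, by Corollary 4.2.6.3,
\begin{align*}
\operatorname{var}(\hat\sigma_O^2)
&=2\operatorname{tr}\!\left[\left(\frac{\mathbf{I}-\mathbf{P}_{12}}{n-r}\right)(\sigma^2\mathbf{I})\left(\frac{\mathbf{I}-\mathbf{P}_{12}}{n-r}\right)(\sigma^2\mathbf{I})\right]
+4\boldsymbol\beta_1^T\mathbf{X}_1^T\left(\frac{\mathbf{I}-\mathbf{P}_{12}}{n-r}\right)(\sigma^2\mathbf{I})\left(\frac{\mathbf{I}-\mathbf{P}_{12}}{n-r}\right)\mathbf{X}_1\boldsymbol\beta_1\\
&=\frac{2\sigma^4}{n-\operatorname{rank}(\mathbf{X}_1,\mathbf{X}_2)}.
\end{align*}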
Exercise 11 In the overspecified simple linear regression settings considered in Example 12.1.2-1 and Exercise 12.9, obtain the variance of σˆ O2 (assuming that n > 2 and that the skewness and excess kurtosis matrices of y are null) and compare it to the variance of σˆ 2 . Solution By Theorem 12.1.9c, the variance of σˆ O2 is 2σ 4 /(n − 2) in both Example 12.1.2-1 and Exercise 12.9. The variance of σˆ 2 in both cases is 2σ 4 /(n − 1).
Thus, in both cases the variance of $\hat\sigma_O^2$ exceeds that of $\hat\sigma^2$ by the amount $2\sigma^4\left(\frac{1}{n-2}-\frac{1}{n-1}\right)=\frac{2\sigma^4}{(n-1)(n-2)}$.
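Exercise 12 Prove Theorem 12.2.1a: In the misspecified variance–covariance structure scenario, let $\mathbf{c}^T\boldsymbol\beta$ be an estimable function. Then $\mathbf{c}^T\tilde{\boldsymbol\beta}_I$ and $\tilde{\mathbf{t}}_C^T\mathbf{y}$ coincide if and only if $\mathbf{W}_2\mathbf{W}_1^{-1}\mathbf{X}(\mathbf{X}^T\mathbf{W}_1^{-1}\mathbf{X})^{-}\mathbf{c}\in\mathcal{C}(\mathbf{X})$.

Solution Note that $\mathbf{c}^T\tilde{\boldsymbol\beta}_I=\mathbf{c}^T(\mathbf{X}^T\mathbf{W}_1^{-1}\mathbf{X})^{-}\mathbf{X}^T\mathbf{W}_1^{-1}\mathbf{y}$, invariant to the choice of generalized inverse of $\mathbf{X}^T\mathbf{W}_1^{-1}\mathbf{X}$, and that $\mathbf{c}^T=\mathbf{a}^T\mathbf{X}$ for some $\mathbf{a}$. Therefore, by Corollary 11.2.2.2, $\mathbf{c}^T\tilde{\boldsymbol\beta}_I$ is the BLUE of $\mathbf{c}^T\boldsymbol\beta$ if and only if $\mathbf{W}_2\mathbf{W}_1^{-1}\mathbf{X}[(\mathbf{X}^T\mathbf{W}_1^{-1}\mathbf{X})^{-}]^T\mathbf{X}^T\mathbf{a}\in\mathcal{C}(\mathbf{X})$, or equivalently (by Theorem 11.1.6b and Corollary 3.3.1.1) if and only if $\mathbf{W}_2\mathbf{W}_1^{-1}\mathbf{X}(\mathbf{X}^T\mathbf{W}_1^{-1}\mathbf{X})^{-}\mathbf{c}\in\mathcal{C}(\mathbf{X})$.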
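Exercise 13 Consider a misspecified variance–covariance structure scenario with $\mathbf{W}_1=\mathbf{I}$, in which case $\tilde\sigma_I^2$ is the ordinary residual mean square.
(a) Show that $0\le E(\tilde\sigma_I^2)\le\frac{n}{n-p^*}\bar\sigma^2$, where $\bar\sigma^2=(1/n)\sum_{i=1}^n\sigma^2w_{2,ii}$ and $w_{2,ii}$ is the $i$th main diagonal element of $\mathbf{W}_2$. (Hint: Use Theorem 2.15.13.)
(b) Prove the following lemma, and then use it to show that the lower bound in part (a) is attained if and only if $\mathbf{W}_2=\mathbf{P}_X\mathbf{B}\mathbf{P}_X$ for some nonnegative definite matrix $\mathbf{B}$ and that the upper bound is attained if and only if $\mathbf{W}_2=(\mathbf{I}-\mathbf{P}_X)\mathbf{C}(\mathbf{I}-\mathbf{P}_X)$ for some nonnegative definite matrix $\mathbf{C}$. (Hint: To prove the lemma, use Theorems 3.2.3 and 2.15.9.)

Lemma Let $\mathbf{A}$ and $\mathbf{Q}$ represent an $m\times n$ matrix and an $n\times n$ nonnegative definite matrix, respectively, and let $\mathbf{A}^{-}$ represent any generalized inverse of $\mathbf{A}$. Then $\mathbf{A}\mathbf{Q}=\mathbf{0}$ if and only if $\mathbf{Q}=(\mathbf{I}-\mathbf{A}^{-}\mathbf{A})\mathbf{B}(\mathbf{I}-\mathbf{A}^{-}\mathbf{A})^T$ for some $n\times n$ nonnegative definite matrix $\mathbf{B}$.

(Note: This exercise was inspired by results from Dufour [1986].)

Solution
(a) By Theorem 4.2.4,
\[
E(\tilde\sigma_I^2)=E[\mathbf{y}^T(\mathbf{I}-\mathbf{P}_X)\mathbf{y}/(n-p^*)]=\frac{\sigma^2}{n-p^*}\operatorname{tr}[(\mathbf{I}-\mathbf{P}_X)\mathbf{W}_2].
\]
But $\operatorname{tr}[(\mathbf{I}-\mathbf{P}_X)\mathbf{W}_2]\ge 0$ by Theorem 2.15.13 because both $\mathbf{I}-\mathbf{P}_X$ and $\mathbf{W}_2$ are nonnegative definite. This establishes the lower bound. Note further that, by the same theorem,
\[
\operatorname{tr}[(\mathbf{I}-\mathbf{P}_X)\mathbf{W}_2]=\operatorname{tr}(\mathbf{W}_2)-\operatorname{tr}(\mathbf{P}_X\mathbf{W}_2)\le\operatorname{tr}(\mathbf{W}_2)
\]
because both $\mathbf{P}_X$ and $\mathbf{W}_2$ are nonnegative definite. Thus,
\[
E(\tilde\sigma_I^2)\le\frac{\sigma^2}{n-p^*}\operatorname{tr}(\mathbf{W}_2)=\frac{n}{n-p^*}\bar\sigma^2.
\]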
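(b) The lower bound is attained if and only if $\operatorname{tr}[(\mathbf{I}-\mathbf{P}_X)\mathbf{W}_2]=0$ or equivalently, by Theorem 2.15.13, if and only if $(\mathbf{I}-\mathbf{P}_X)\mathbf{W}_2=\mathbf{0}$. Applying the lemma with $\mathbf{A}=\mathbf{I}-\mathbf{P}_X$ and $\mathbf{Q}=\mathbf{W}_2$ (and noting that the idempotent matrix $\mathbf{I}-\mathbf{P}_X$ is a generalized inverse of itself), we find that $\mathbf{W}_2=[\mathbf{I}-(\mathbf{I}-\mathbf{P}_X)^{-}(\mathbf{I}-\mathbf{P}_X)]\mathbf{B}[\mathbf{I}-(\mathbf{I}-\mathbf{P}_X)^{-}(\mathbf{I}-\mathbf{P}_X)]^T=\mathbf{P}_X\mathbf{B}\mathbf{P}_X$ for some $n\times n$ nonnegative definite matrix $\mathbf{B}$. Similarly, the upper bound is attained if and only if $\operatorname{tr}(\mathbf{P}_X\mathbf{W}_2)=0$ or equivalently, by Theorem 2.15.13, if and only if $\mathbf{P}_X\mathbf{W}_2=\mathbf{0}$. Applying the lemma with $\mathbf{A}=\mathbf{P}_X$ and $\mathbf{Q}=\mathbf{W}_2$, we find that $\mathbf{W}_2=(\mathbf{I}-\mathbf{P}_X^{-}\mathbf{P}_X)\mathbf{C}(\mathbf{I}-\mathbf{P}_X^{-}\mathbf{P}_X)^T=(\mathbf{I}-\mathbf{P}_X)\mathbf{C}(\mathbf{I}-\mathbf{P}_X)$ for some $n\times n$ nonnegative definite matrix $\mathbf{C}$.

Now we prove the lemma. Proving sufficiency is trivial since $\mathbf{A}(\mathbf{I}-\mathbf{A}^{-}\mathbf{A})=\mathbf{0}$. To prove necessity, suppose that $\mathbf{A}\mathbf{Q}=\mathbf{0}$ and let $\mathbf{Q}^{1/2}$ be the nonnegative definite square root of $\mathbf{Q}$ given by Definition 2.15.2. Then $\mathbf{A}\mathbf{Q}^{1/2}\mathbf{Q}^{1/2}=\mathbf{0}$, implying by Theorem 3.3.2 that $\mathbf{A}\mathbf{Q}^{1/2}=\mathbf{0}$. This last matrix equation implies, by Theorem 3.2.3, that $\mathbf{Q}^{1/2}=(\mathbf{I}-\mathbf{A}^{-}\mathbf{A})\mathbf{Z}$ for some matrix $\mathbf{Z}$. Thus, $\mathbf{Q}=(\mathbf{I}-\mathbf{A}^{-}\mathbf{A})\mathbf{Z}\mathbf{Z}^T(\mathbf{I}-\mathbf{A}^{-}\mathbf{A})^T=(\mathbf{I}-\mathbf{A}^{-}\mathbf{A})\mathbf{B}(\mathbf{I}-\mathbf{A}^{-}\mathbf{A})^T$, where $\mathbf{B}$ $(=\mathbf{Z}\mathbf{Z}^T)$ is nonnegative definite by Theorem 2.15.9.

Exercise 14 Consider the Aitken model $\{\mathbf{y},\mathbf{X}\boldsymbol\beta,\sigma^2[\mathbf{I}+\mathbf{P}_X\mathbf{A}\mathbf{P}_X+(\mathbf{I}-\mathbf{P}_X)\mathbf{B}(\mathbf{I}-\mathbf{P}_X)]\}$, where $\mathbf{B}$ is a specified $n\times n$ nonnegative definite matrix and $\mathbf{A}$ is a specified $n\times n$ matrix such that $\operatorname{var}(\mathbf{e})$ is nonnegative definite. Suppose that $n>p^*$.
(a) Let $\tilde\sigma_I^2=\mathbf{y}^T(\mathbf{I}-\mathbf{P}_X)\mathbf{y}/(n-p^*)$. Obtain a simplified expression for the bias of $\tilde\sigma_I^2$ under this model, and determine necessary and sufficient conditions on $\mathbf{A}$ and $\mathbf{B}$ for this bias to equal 0.
(b) Suppose that the skewness and excess kurtosis matrices of $\mathbf{y}$ are null. Obtain a simplified expression for $\operatorname{var}(\tilde\sigma_I^2)$. How does $\operatorname{var}(\tilde\sigma_I^2)$ compare to $\operatorname{var}(\tilde\sigma_C^2)$?

Solution
(a) By Theorem 4.2.4,
\begin{align*}
E(\tilde\sigma_I^2)&=E[\mathbf{y}^T(\mathbf{I}-\mathbf{P}_X)\mathbf{y}/(n-p^*)]\\
&=0+\operatorname{tr}\{(\mathbf{I}-\mathbf{P}_X)\sigma^2[\mathbf{I}+\mathbf{P}_X\mathbf{A}\mathbf{P}_X+(\mathbf{I}-\mathbf{P}_X)\mathbf{B}(\mathbf{I}-\mathbf{P}_X)]\}/(n-p^*)\\
&=\sigma^2\{1+\operatorname{tr}[(\mathbf{I}-\mathbf{P}_X)\mathbf{B}]/(n-p^*)\}.
\end{align*}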
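Thus, $\tilde\sigma_I^2$ is unbiased if and only if $\mathbf{B}$ is such that $\operatorname{tr}[(\mathbf{I}-\mathbf{P}_X)\mathbf{B}]=0$; there are no conditions on $\mathbf{A}$. Furthermore, by Theorem 2.15.13, $\operatorname{tr}[(\mathbf{I}-\mathbf{P}_X)\mathbf{B}]=0$ if and only if $(\mathbf{I}-\mathbf{P}_X)\mathbf{B}=\mathbf{0}$.
(b) Write $\mathbf{V}=\mathbf{I}+\mathbf{P}_X\mathbf{A}\mathbf{P}_X+(\mathbf{I}-\mathbf{P}_X)\mathbf{B}(\mathbf{I}-\mathbf{P}_X)$. By Corollary 4.2.6.3,
\begin{align*}
\operatorname{var}(\tilde\sigma_I^2)
&=2\operatorname{tr}\{(\mathbf{I}-\mathbf{P}_X)\sigma^2\mathbf{V}(\mathbf{I}-\mathbf{P}_X)\sigma^2\mathbf{V}\}/(n-p^*)^2+0\\
&=\frac{2\sigma^4\operatorname{tr}[(\mathbf{I}-\mathbf{P}_X)+2(\mathbf{I}-\mathbf{P}_X)\mathbf{B}(\mathbf{I}-\mathbf{P}_X)+(\mathbf{I}-\mathbf{P}_X)\mathbf{B}(\mathbf{I}-\mathbf{P}_X)\mathbf{B}(\mathbf{I}-\mathbf{P}_X)]}{(n-p^*)^2}\\
&=\frac{2\sigma^4}{n-p^*}+\frac{2\sigma^4\operatorname{tr}[2(\mathbf{I}-\mathbf{P}_X)\mathbf{B}(\mathbf{I}-\mathbf{P}_X)+(\mathbf{I}-\mathbf{P}_X)\mathbf{B}(\mathbf{I}-\mathbf{P}_X)\mathbf{B}(\mathbf{I}-\mathbf{P}_X)]}{(n-p^*)^2}.
\end{align*}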
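Note that $\operatorname{var}(\tilde\sigma_C^2)=2\sigma^4/(n-p^*)$, and $(\mathbf{I}-\mathbf{P}_X)\mathbf{B}(\mathbf{I}-\mathbf{P}_X)$ and $(\mathbf{I}-\mathbf{P}_X)\mathbf{B}(\mathbf{I}-\mathbf{P}_X)\mathbf{B}(\mathbf{I}-\mathbf{P}_X)$ are nonnegative definite because $\mathbf{B}$ and $\mathbf{I}-\mathbf{P}_X$ are nonnegative definite, so by Theorem 2.15.1, $\operatorname{var}(\tilde\sigma_I^2)\ge\operatorname{var}(\tilde\sigma_C^2)$.

Exercise 15 Consider a special case of the misspecified variance–covariance structure scenario in which $\mathbf{W}_1=\mathbf{I}$ and $\mathbf{W}_2$ is a nonnegative definite matrix such that $\operatorname{tr}(\mathbf{W}_2)=n$ but is otherwise arbitrary.
(a) Show that $\tilde\sigma_I^2\equiv\mathbf{y}^T(\mathbf{I}-\mathbf{P}_X)\mathbf{y}/(n-p^*)$ is an unbiased estimator of $\sigma^2$ under the correct model if and only if $\operatorname{tr}(\mathbf{P}_X\mathbf{W}_2)=p^*$.
(b) As an even more special case, suppose that
\[
\mathbf{X}=\begin{pmatrix}1&1\\1&-1\\1&1\\1&-1\end{pmatrix}
\quad\text{and}\quad
\mathbf{W}_2=\begin{pmatrix}1&\rho&0&0\\ \rho&1&\rho&0\\ 0&\rho&1&\rho\\ 0&0&\rho&1\end{pmatrix}.
\]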
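Here $\rho$ is a specified real number for which $\mathbf{W}_2$ is nonnegative definite. Using part (a), determine whether $\tilde\sigma_I^2$ is an unbiased estimator of $\sigma^2$ in this case.

Solution
(a) By Theorem 12.2.1d,
\[
E(\tilde\sigma_I^2)=\sigma^2\,\frac{\operatorname{tr}[(\mathbf{I}-\mathbf{P}_X)\mathbf{W}_2]}{n-p^*}=\sigma^2\,\frac{n-\operatorname{tr}(\mathbf{P}_X\mathbf{W}_2)}{n-p^*},
\]
so $\tilde\sigma_I^2$ is unbiased if and only if $\operatorname{tr}(\mathbf{P}_X\mathbf{W}_2)=p^*$.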
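The criterion in part (a) is easy to check numerically for the matrices given in the statement of part (b). The sketch below is illustrative only: the helper names are assumptions, and the values of $\rho$ are arbitrary choices for which $\mathbf{W}_2$ is nonnegative definite.

```python
# X as given in the statement of part (b).
X = [[1, 1], [1, -1], [1, 1], [1, -1]]

def w2(rho):
    """4 x 4 tridiagonal matrix with ones on the diagonal and rho next to it."""
    return [[1.0 if i == j else rho if abs(i - j) == 1 else 0.0
             for j in range(4)] for i in range(4)]

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def trace_px_w2(rho):
    """tr(P_X W_2) for the X above."""
    # For this X, X^T X = 4 I_2, so P_X = X X^T / 4.
    xt = [list(col) for col in zip(*X)]
    px = [[v / 4.0 for v in row] for row in matmul(X, xt)]
    pw = matmul(px, w2(rho))
    return sum(pw[i][i] for i in range(4))

for rho in (-0.5, 0.0, 0.3, 0.6):
    print(rho, trace_px_w2(rho))
```

Each printed trace should equal $p^*=2$ regardless of $\rho$, which is the condition for unbiasedness.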
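(b) Here
\[
\mathbf{P}_X=\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-}\mathbf{X}^T
=\begin{pmatrix}1&1\\1&-1\\1&1\\1&-1\end{pmatrix}
\begin{pmatrix}4&0\\0&4\end{pmatrix}^{-1}
\begin{pmatrix}1&1&1&1\\1&-1&1&-1\end{pmatrix}
=\frac{1}{4}\begin{pmatrix}2&0&2&0\\0&2&0&2\\2&0&2&0\\0&2&0&2\end{pmatrix},
\]
so
\[
\mathbf{P}_X\mathbf{W}_2
=\frac{1}{4}\begin{pmatrix}2&0&2&0\\0&2&0&2\\2&0&2&0\\0&2&0&2\end{pmatrix}
\begin{pmatrix}1&\rho&0&0\\ \rho&1&\rho&0\\ 0&\rho&1&\rho\\ 0&0&\rho&1\end{pmatrix}
=\frac{1}{2}\begin{pmatrix}1&2\rho&1&\rho\\ \rho&1&2\rho&1\\ 1&2\rho&1&\rho\\ \rho&1&2\rho&1\end{pmatrix},
\]
whose main diagonal elements all equal 1/2, so it has trace equal to 2. Because $p^*=2$, $\tilde\sigma_I^2$ is unbiased for $\sigma^2$.

Exercise 16 Consider a misspecified variance–covariance structure scenario in which the correct model is a heteroscedastic no-intercept linear regression model $\{\mathbf{y},\mathbf{x}\beta,\sigma^2\operatorname{diag}(x_1^2,x_2^2,\ldots,x_n^2)\}$ and the incorrect model is its Gauss–Markov counterpart $\{\mathbf{y},\mathbf{x}\beta,\sigma^2\mathbf{I}\}$.
(a) Obtain specialized expressions for $\operatorname{var}(\tilde\beta_I)$ and $\operatorname{var}(\tilde\beta_C)$.
(b) Suppose that $x_i=100/(101-i)$ $(i=1,\ldots,100)$. Evaluate the expression for $\operatorname{var}(\tilde\beta_I)$ obtained in part (a) for each $n=1,\ldots,100$. (You should write a short computer program to do this.) Are you surprised by the behavior of $\operatorname{var}(\tilde\beta_I)$ as the sample size increases? Explain.
(Note: This exercise was inspired by results from Meng and Xie [2014].)

Solution
(a) Specializing the expressions in Theorem 12.2.1c, we obtain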
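\[
\operatorname{var}(\tilde\beta_I)=\sigma^2\left(\sum_{i=1}^n x_i^2\right)^{-1}\mathbf{x}^T[\operatorname{diag}(x_1^2,x_2^2,\ldots,x_n^2)]\mathbf{x}\left(\sum_{i=1}^n x_i^2\right)^{-1}
=\frac{\sigma^2\sum_{i=1}^n x_i^4}{\left(\sum_{i=1}^n x_i^2\right)^2}
\]
and
\[
\operatorname{var}(\tilde\beta_C)=\sigma^2\{\mathbf{x}^T[\operatorname{diag}(1/x_1^2,1/x_2^2,\ldots,1/x_n^2)]\mathbf{x}\}^{-1}=\sigma^2/n.
\]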
(b) It turns out that $\operatorname{var}(\tilde\beta_I)$ decreases monotonically as $n$ increases from 1 to 64, but surprisingly, $\operatorname{var}(\tilde\beta_I)$ increases monotonically as $n$ increases from 65 to 100. Partial results are listed in the following table:

  n     var(β̃_I)
  1     1.000000
  2     0.500051
  3     0.333424
  4     0.250129
  5     0.200167
  ⋮     ⋮
  63    0.021441
  64    0.021436
  65    0.021453
  66    0.021492
  ⋮     ⋮
  98    0.133746
  99    0.204172
  100   0.404883
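The "short computer program" requested in part (b) might look like the following. This is one possible sketch rather than the author's program; it takes $\sigma^2=1$ as an assumption, and the function name is illustrative.

```python
def var_beta_I(n, sigma2=1.0):
    """sigma^2 * sum(x_i^4) / (sum(x_i^2))^2 with x_i = 100/(101 - i), i = 1..n."""
    x = [100.0 / (101 - i) for i in range(1, n + 1)]
    s2 = sum(v**2 for v in x)
    s4 = sum(v**4 for v in x)
    return sigma2 * s4 / s2**2

values = {n: var_beta_I(n) for n in range(1, 101)}
n_min = min(values, key=values.get)  # sample size with the smallest variance

for n in (1, 2, 3, 4, 5, 63, 64, 65, 66, 98, 99, 100):
    print(f"{n:3d}  {values[n]:.6f}")
print("variance is smallest at n =", n_min)
```

The printed rows can be compared with the partial table above. The variance eventually rises because the later observations have much larger $x_i$, hence much larger error variances, yet ordinary least squares gives them the most weight.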
Exercise 17 Consider a misspecified variance–covariance structure scenario in which the incorrect model is $\{\mathbf{y},\mathbf{1}\mu,\sigma^2\mathbf{I}\}$ and the correct model has the same mean structure but a variance–covariance matrix given by (5.13). Obtain a general expression for the efficiency (the ratio of variances) of the (incorrect) least squares estimator of $\mu$ relative to the (correct) generalized least squares estimator, and evaluate it for $n=10$ and $\rho=-0.9,-0.7,-0.5,\ldots,0.9$.

Solution Specializing one of the expressions in Theorem 12.2.1c, we obtain
\[
\operatorname{var}(\tilde\mu_I)=\sigma^2(1/n)\mathbf{1}^T\mathbf{W}\mathbf{1}(1/n)=\frac{\sigma^2}{n^2(1-\rho^2)}\,[n+2(n-1)\rho+2(n-2)\rho^2+\cdots+2\rho^{n-1}],
\]
and from Example 11.1-3 we have
\[
\operatorname{var}(\tilde\mu_C)=\frac{\sigma^2}{(1-\rho)[2+(n-2)(1-\rho)]}.
\]
Thus the efficiency of $\tilde\mu_I$ relative to $\tilde\mu_C$ is
\[
\mathrm{Eff}=\frac{\operatorname{var}(\tilde\mu_C)}{\operatorname{var}(\tilde\mu_I)}=\frac{n^2(1+\rho)}{[n-(n-2)\rho][n+2(n-1)\rho+2(n-2)\rho^2+\cdots+2\rho^{n-1}]}.
\]
When $n=10$ and $\rho=-0.9,-0.7,\ldots,0.7,0.9$, we obtain the following values of Eff:

  ρ      Eff
  −0.9   0.6831
  −0.7   0.8603
  −0.5   0.9455
  −0.3   0.9835
  −0.1   0.9983
  0.1    0.9984
  0.3    0.9861
  0.5    0.9614
  0.7    0.9299
  0.9    0.9326
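The efficiency values can be reproduced directly from the closed-form expression for Eff. The function below is a minimal sketch with an illustrative name; it evaluates exactly the formula displayed in the solution.

```python
def efficiency(n, rho):
    """Eff = n^2 (1 + rho) / {[n - (n-2) rho] [n + 2 sum_{k=1}^{n-1} (n-k) rho^k]}."""
    s = n + 2 * sum((n - k) * rho**k for k in range(1, n))
    return n**2 * (1 + rho) / ((n - (n - 2) * rho) * s)

rhos = [round(-0.9 + 0.2 * j, 1) for j in range(10)]  # -0.9, -0.7, ..., 0.9
for rho in rhos:
    print(f"{rho:5.1f}  {efficiency(10, rho):.4f}")
```

Note that the efficiency is not monotone in $|\rho|$: for $n=10$ it is slightly higher at $\rho=0.9$ than at $\rho=0.7$.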
References
Dufour, J. (1986). Bias of $S^2$ in linear regressions with dependent errors. The American Statistician, 40, 284–285.
Meng, X. & Xie, X. (2014). I got more data, my model is more refined, but my estimator is getting worse! Am I just dumb? Econometric Reviews, 33, 218–250.
13
Best Linear Unbiased Prediction
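This chapter presents exercises on best linear unbiased prediction under a linear model, and provides solutions to those exercises.

Exercise 1 Prove Theorem 13.1.2: A function $\tau=\mathbf{c}^T\boldsymbol\beta+u$ is predictable under the model $\left\{\begin{pmatrix}\mathbf{y}\\u\end{pmatrix},\begin{pmatrix}\mathbf{X}\boldsymbol\beta\\0\end{pmatrix}\right\}$ if and only if $\mathbf{c}^T\boldsymbol\beta$ is estimable under the model $\{\mathbf{y},\mathbf{X}\boldsymbol\beta\}$, i.e., if and only if $\mathbf{c}^T\in\mathcal{R}(\mathbf{X})$.

Solution If $\mathbf{c}^T\in\mathcal{R}(\mathbf{X})$, then an $n$-vector $\mathbf{a}$ exists such that $\mathbf{a}^T\mathbf{X}=\mathbf{c}^T$, implying further that $E(\mathbf{a}^T\mathbf{y})=\mathbf{a}^T\mathbf{X}\boldsymbol\beta=\mathbf{c}^T\boldsymbol\beta=E(\mathbf{c}^T\boldsymbol\beta+u)$ for all $\boldsymbol\beta$. Thus, $\mathbf{c}^T\boldsymbol\beta+u$ is predictable. Conversely, if $\mathbf{c}^T\boldsymbol\beta+u$ is predictable, then by definition and Theorem 13.1.1 there exists an unbiased predictor of $\mathbf{c}^T\boldsymbol\beta+u$ of the form $\mathbf{t}^T\mathbf{y}$, where $\mathbf{t}^T\mathbf{X}=\mathbf{c}^T$. Thus, $\mathbf{c}^T\in\mathcal{R}(\mathbf{X})$.

Exercise 2 Suppose that $n>p^*$. Find a nonnegative definite matrix $\begin{pmatrix}\mathbf{W}&\mathbf{k}\\ \mathbf{k}^T&h\end{pmatrix}$ for which the BLUP equations for a predictable function $\mathbf{c}^T\boldsymbol\beta+u$ are not consistent. (Hint: Take $\mathbf{W}=\mathbf{X}\mathbf{X}^T$ and determine a suitable $\mathbf{k}$.)

Solution Taking $\mathbf{W}=\mathbf{X}\mathbf{X}^T$, the BLUP equations for $\mathbf{c}^T\boldsymbol\beta+u$ are
\[
\begin{pmatrix}\mathbf{X}\mathbf{X}^T&\mathbf{X}\\ \mathbf{X}^T&\mathbf{0}\end{pmatrix}
\begin{pmatrix}\mathbf{t}\\ \boldsymbol\lambda\end{pmatrix}
=\begin{pmatrix}\mathbf{k}\\ \mathbf{c}\end{pmatrix}.
\]
In particular, the "top" subset of BLUP equations is $\mathbf{X}(\mathbf{X}^T\mathbf{t}+\boldsymbol\lambda)=\mathbf{k}$, which has a solution if and only if $\mathbf{k}\in\mathcal{C}(\mathbf{X})$. Therefore, those equations will not have a solution if $\mathbf{k}$ is taken to be any nonnull vector in $\mathcal{C}(\mathbf{I}-\mathbf{P}_X)$. Such a vector exists because $n>p^*$. The value of $h$ is irrelevant, so it can be taken to equal 1.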
© Springer Nature Switzerland AG 2020 D. L. Zimmerman, Linear Model Theory, https://doi.org/10.1007/978-3-030-52074-8_13
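Exercise 3 Prove Corollary 13.2.1.1: Under the prediction-extended positive definite Aitken model described in Theorem 13.2.1, if $\mathbf{C}^T\boldsymbol\beta+\mathbf{u}$ is a vector of predictable functions and $\mathbf{L}^T$ is a matrix of constants having the same number of columns as $\mathbf{C}^T$ has rows, then $\mathbf{L}^T(\mathbf{C}^T\boldsymbol\beta+\mathbf{u})$ is predictable and its BLUP is $\mathbf{L}^T(\mathbf{C}^T\tilde{\boldsymbol\beta}+\tilde{\mathbf{u}})$.

Solution Each row of $\mathbf{L}^T\mathbf{C}^T$ is an element of $\mathcal{R}(\mathbf{X})$, $E(\mathbf{L}^T\mathbf{u})=\mathbf{0}$, and
\begin{align*}
\operatorname{var}\begin{pmatrix}\mathbf{y}\\ \mathbf{L}^T\mathbf{u}\end{pmatrix}
&=\operatorname{var}\left[\begin{pmatrix}\mathbf{I}&\mathbf{0}\\ \mathbf{0}&\mathbf{L}^T\end{pmatrix}\begin{pmatrix}\mathbf{y}\\ \mathbf{u}\end{pmatrix}\right]\\
&=\sigma^2\begin{pmatrix}\mathbf{I}&\mathbf{0}\\ \mathbf{0}&\mathbf{L}^T\end{pmatrix}\begin{pmatrix}\mathbf{W}&\mathbf{K}\\ \mathbf{K}^T&\mathbf{H}\end{pmatrix}\begin{pmatrix}\mathbf{I}&\mathbf{0}\\ \mathbf{0}&\mathbf{L}\end{pmatrix}\\
&=\sigma^2\begin{pmatrix}\mathbf{W}&\mathbf{K}\mathbf{L}\\ (\mathbf{K}\mathbf{L})^T&\mathbf{L}^T\mathbf{H}\mathbf{L}\end{pmatrix},
\end{align*}
which is nonnegative definite by Corollary 2.15.12.1. Therefore, $\mathbf{L}^T(\mathbf{C}^T\boldsymbol\beta+\mathbf{u})$ is predictable by Theorem 13.1.2, and by Theorem 13.2.1 its BLUP is $\mathbf{L}^T\mathbf{C}^T\tilde{\boldsymbol\beta}+(\mathbf{K}\mathbf{L})^T\mathbf{E}\mathbf{y}=\mathbf{L}^T(\mathbf{C}^T\tilde{\boldsymbol\beta}+\tilde{\mathbf{u}})$.

Exercise 4 Prove Theorem 13.2.3: Under a prediction-extended positive definite Aitken model for which $n>p^*$ and the skewness matrix of the joint distribution of $\mathbf{y}$ and $\mathbf{u}$ is null, $\operatorname{cov}(\tilde{\boldsymbol\tau}-\boldsymbol\tau,\tilde\sigma^2)=\mathbf{0}$.

Solution By Corollary 4.2.5.1, $\operatorname{cov}(\tilde{\boldsymbol\tau}-\boldsymbol\tau,\tilde\sigma^2)$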
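\begin{align*}
&=\operatorname{cov}\left[\left(\mathbf{C}^T(\mathbf{X}^T\mathbf{W}^{-1}\mathbf{X})^{-}\mathbf{X}^T\mathbf{W}^{-1}+\mathbf{K}^T\mathbf{E},\ -\mathbf{I}_s\right)\begin{pmatrix}\mathbf{y}\\ \mathbf{u}\end{pmatrix},\ \begin{pmatrix}\mathbf{y}\\ \mathbf{u}\end{pmatrix}^T\begin{pmatrix}\mathbf{E}&\mathbf{0}\\ \mathbf{0}&\mathbf{0}\end{pmatrix}\begin{pmatrix}\mathbf{y}\\ \mathbf{u}\end{pmatrix}\Big/(n-p^*)\right]\\
&=2\left(\mathbf{C}^T(\mathbf{X}^T\mathbf{W}^{-1}\mathbf{X})^{-}\mathbf{X}^T\mathbf{W}^{-1}+\mathbf{K}^T\mathbf{E},\ -\mathbf{I}_s\right)\sigma^2\begin{pmatrix}\mathbf{W}&\mathbf{K}\\ \mathbf{K}^T&\mathbf{H}\end{pmatrix}\begin{pmatrix}\mathbf{E}&\mathbf{0}\\ \mathbf{0}&\mathbf{0}\end{pmatrix}\begin{pmatrix}\mathbf{X}\boldsymbol\beta\\ \mathbf{0}\end{pmatrix}\Big/(n-p^*)\\
&=\mathbf{0},
\end{align*}
where the last equality follows from Theorem 11.1.6d.

Exercise 5 Prove Theorem 13.3.1a: For the prediction-extended Aitken model
\[
\left\{\begin{pmatrix}\mathbf{y}\\ \mathbf{u}\end{pmatrix},\ \begin{pmatrix}\mathbf{X}\boldsymbol\beta\\ \mathbf{0}\end{pmatrix},\ \sigma^2\begin{pmatrix}\mathbf{W}&\mathbf{K}\\ \mathbf{K}^T&\mathbf{H}\end{pmatrix}\right\},
\]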
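let $\begin{pmatrix}\mathbf{G}_{11}&\mathbf{G}_{12}\\ \mathbf{G}_{21}&\mathbf{G}_{22}\end{pmatrix}$ represent a generalized inverse of $\begin{pmatrix}\mathbf{W}&\mathbf{X}\\ \mathbf{X}^T&\mathbf{0}\end{pmatrix}$, let $\boldsymbol\tau=\mathbf{C}^T\boldsymbol\beta+\mathbf{u}$ be an $s$-vector of predictable functions, and suppose that $\mathcal{C}(\mathbf{K})\subseteq\mathcal{C}(\mathbf{W},\mathbf{X})$. Then the collection of BLUPs of $\boldsymbol\tau$ consists of all quantities of the form $\tilde{\mathbf{T}}^T\mathbf{y}$, where
\[
\tilde{\mathbf{T}}=\mathbf{G}_{12}\mathbf{C}+\mathbf{G}_{11}\mathbf{K}+(\mathbf{I}_n-\mathbf{G}_{11}\mathbf{W}-\mathbf{G}_{12}\mathbf{X}^T,\ -\mathbf{G}_{11}\mathbf{X})\mathbf{Z},
\]
where $\mathbf{Z}$ is an arbitrary $(n+p)\times s$ matrix. In particular, $(\mathbf{C}^T\mathbf{G}_{12}^T+\mathbf{K}^T\mathbf{G}_{11}^T)\mathbf{y}$ is a BLUP of $\boldsymbol\tau$.

Solution Because $\mathbf{C}^T\boldsymbol\beta+\mathbf{u}$ is predictable and $\mathcal{C}(\mathbf{K})\subseteq\mathcal{C}(\mathbf{W},\mathbf{X})$, the BLUP equations for each element of $\mathbf{C}^T\boldsymbol\beta+\mathbf{u}$ are consistent (Theorem 13.1.3). Therefore, by Theorem 3.2.3, all solutions to the BLUP equations for all elements of $\mathbf{C}^T\boldsymbol\beta+\mathbf{u}$ are given by
\[
\begin{pmatrix}\mathbf{G}_{11}&\mathbf{G}_{12}\\ \mathbf{G}_{21}&\mathbf{G}_{22}\end{pmatrix}\begin{pmatrix}\mathbf{K}\\ \mathbf{C}\end{pmatrix}
+\left[\mathbf{I}-\begin{pmatrix}\mathbf{G}_{11}&\mathbf{G}_{12}\\ \mathbf{G}_{21}&\mathbf{G}_{22}\end{pmatrix}\begin{pmatrix}\mathbf{W}&\mathbf{X}\\ \mathbf{X}^T&\mathbf{0}\end{pmatrix}\right]\mathbf{Z},
\]
where $\mathbf{Z}$ ranges throughout the space of $(n+p)\times s$ matrices. The matrix of first $n$-component subvectors of such a solution is
\[
\tilde{\mathbf{T}}=\mathbf{G}_{11}\mathbf{K}+\mathbf{G}_{12}\mathbf{C}+(\mathbf{I}_n-\mathbf{G}_{11}\mathbf{W}-\mathbf{G}_{12}\mathbf{X}^T,\ -\mathbf{G}_{11}\mathbf{X})\mathbf{Z}.
\]
Thus all BLUPs of $\mathbf{C}^T\boldsymbol\beta+\mathbf{u}$ may be written as $\tilde{\mathbf{T}}^T\mathbf{y}$ for matrices $\tilde{\mathbf{T}}$ of this form. One BLUP in particular, obtained by setting $\mathbf{Z}=\mathbf{0}$, is $(\mathbf{G}_{11}\mathbf{K}+\mathbf{G}_{12}\mathbf{C})^T\mathbf{y}$, i.e., $(\mathbf{C}^T\mathbf{G}_{12}^T+\mathbf{K}^T\mathbf{G}_{11}^T)\mathbf{y}$.

Exercise 6 Prove the invariance result in Theorem 13.3.1b: For the prediction-extended Aitken model $\left\{\begin{pmatrix}\mathbf{y}\\ \mathbf{u}\end{pmatrix},\begin{pmatrix}\mathbf{X}\boldsymbol\beta\\ \mathbf{0}\end{pmatrix},\sigma^2\begin{pmatrix}\mathbf{W}&\mathbf{K}\\ \mathbf{K}^T&\mathbf{H}\end{pmatrix}\right\}$, let $\begin{pmatrix}\mathbf{G}_{11}&\mathbf{G}_{12}\\ \mathbf{G}_{21}&\mathbf{G}_{22}\end{pmatrix}$ represent a generalized inverse of $\begin{pmatrix}\mathbf{W}&\mathbf{X}\\ \mathbf{X}^T&\mathbf{0}\end{pmatrix}$, let $\boldsymbol\tau=\mathbf{C}^T\boldsymbol\beta+\mathbf{u}$ be an $s$-vector of predictable functions, and suppose that $\mathcal{C}(\mathbf{K})\subseteq\mathcal{C}(\mathbf{W},\mathbf{X})$. Then:
(b) the variance–covariance matrix of prediction errors associated with a vector of BLUPs $\tilde{\boldsymbol\tau}=(\mathbf{C}^T\mathbf{G}_{12}^T+\mathbf{K}^T\mathbf{G}_{11}^T)\mathbf{y}$ of a vector of predictable functions $\boldsymbol\tau=\mathbf{C}^T\boldsymbol\beta+\mathbf{u}$ is
\[
\operatorname{var}(\tilde{\boldsymbol\tau}-\boldsymbol\tau)=\sigma^2\left[\mathbf{H}-(\mathbf{K}^T,\mathbf{C}^T)\begin{pmatrix}\mathbf{G}_{11}&\mathbf{G}_{12}\\ \mathbf{G}_{21}&\mathbf{G}_{22}\end{pmatrix}\begin{pmatrix}\mathbf{K}\\ \mathbf{C}\end{pmatrix}\right],
\]
invariant to the choice of generalized inverse.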
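Solution It suffices to show that $(\mathbf{K}^T,\mathbf{C}^T)\begin{pmatrix}\mathbf{G}_{11}&\mathbf{G}_{12}\\ \mathbf{G}_{21}&\mathbf{G}_{22}\end{pmatrix}\begin{pmatrix}\mathbf{K}\\ \mathbf{C}\end{pmatrix}$, i.e., $\mathbf{K}^T\mathbf{G}_{11}\mathbf{K}+\mathbf{K}^T\mathbf{G}_{12}\mathbf{C}+\mathbf{C}^T\mathbf{G}_{21}\mathbf{K}+\mathbf{C}^T\mathbf{G}_{22}\mathbf{C}$, is invariant to the choice of generalized inverse. Now, by Theorem 3.3.8f (with $\mathbf{W}$ and $\mathbf{X}$ here playing the roles of $\mathbf{A}$ and $\mathbf{B}$ in the theorem), all four of the following matrices have this invariance: $\mathbf{W}\mathbf{G}_{11}\mathbf{W}$, $\mathbf{W}\mathbf{G}_{12}\mathbf{X}^T$, $\mathbf{X}\mathbf{G}_{21}\mathbf{W}$, and $\mathbf{X}\mathbf{G}_{22}\mathbf{X}^T$. We consider each of these four matrices in turn. Using Theorem 3.3.8c and defining $\mathbf{F}$ and $\mathbf{L}$ as in the proof of the earlier part of Theorem 13.2.1b, we obtain
\begin{align*}
\mathbf{K}^T\mathbf{G}_{11}\mathbf{K}&=(\mathbf{F}^T\mathbf{W}+\mathbf{L}^T\mathbf{X}^T)\mathbf{G}_{11}(\mathbf{W}\mathbf{F}+\mathbf{X}\mathbf{L})\\
&=\mathbf{F}^T(\mathbf{W}\mathbf{G}_{11}\mathbf{W})\mathbf{F}+\mathbf{L}^T(\mathbf{X}^T\mathbf{G}_{11}\mathbf{W})\mathbf{F}+\mathbf{F}^T(\mathbf{W}\mathbf{G}_{11}\mathbf{X})\mathbf{L}+\mathbf{L}^T(\mathbf{X}^T\mathbf{G}_{11}\mathbf{X})\mathbf{L}\\
&=\mathbf{F}^T(\mathbf{W}\mathbf{G}_{11}\mathbf{W})\mathbf{F},
\end{align*}
which establishes that $\mathbf{K}^T\mathbf{G}_{11}\mathbf{K}$ has the desired invariance. Using Theorem 3.3.8a, we obtain
\begin{align*}
\mathbf{K}^T\mathbf{G}_{12}\mathbf{C}&=(\mathbf{F}^T\mathbf{W}+\mathbf{L}^T\mathbf{X}^T)\mathbf{G}_{12}\mathbf{X}^T\mathbf{A}\\
&=\mathbf{F}^T\mathbf{W}\mathbf{G}_{12}\mathbf{X}^T\mathbf{A}+\mathbf{L}^T\mathbf{X}^T\mathbf{G}_{12}\mathbf{X}^T\mathbf{A}\\
&=\mathbf{F}^T(\mathbf{W}\mathbf{G}_{12}\mathbf{X}^T)\mathbf{A}+\mathbf{L}^T\mathbf{X}^T\mathbf{A},
\end{align*}
likewise establishing that $\mathbf{K}^T\mathbf{G}_{12}\mathbf{C}$ has the desired invariance. Similarly,
\[
\mathbf{C}^T\mathbf{G}_{21}\mathbf{K}=\mathbf{A}^T\mathbf{X}\mathbf{G}_{21}(\mathbf{W}\mathbf{F}+\mathbf{X}\mathbf{L})=\mathbf{A}^T(\mathbf{X}\mathbf{G}_{21}\mathbf{W})\mathbf{F}+\mathbf{A}^T\mathbf{X}\mathbf{L}
\quad\text{and}\quad
\mathbf{C}^T\mathbf{G}_{22}\mathbf{C}=\mathbf{A}^T(\mathbf{X}\mathbf{G}_{22}\mathbf{X}^T)\mathbf{A}
\]
have the desired invariance.

Exercise 7 In this exercise, you are to consider using the BLUE $\mathbf{c}^T\tilde{\boldsymbol\beta}$ of $\mathbf{c}^T\boldsymbol\beta$ as a predictor of a predictable $\tau=\mathbf{c}^T\boldsymbol\beta+u$ associated with the prediction-extended positive definite Aitken model $\left\{\begin{pmatrix}\mathbf{y}\\ u\end{pmatrix},\begin{pmatrix}\mathbf{X}\boldsymbol\beta\\ 0\end{pmatrix},\sigma^2\begin{pmatrix}\mathbf{W}&\mathbf{k}\\ \mathbf{k}^T&h\end{pmatrix}\right\}$. Let us call this predictor the "BLUE-predictor."
(a) Show that the BLUE-predictor is a linear unbiased predictor of $\tau$.
(b) Obtain an expression for the variance of the prediction error $\mathbf{c}^T\tilde{\boldsymbol\beta}-\tau$ corresponding to the BLUE-predictor.
(c) Because [from part (a)] the BLUE-predictor is a linear unbiased predictor, its prediction error variance [obtained in part (b)] must be at least as large as the prediction error variance of the BLUP of $\tau$. Give an expression for how much larger it is, i.e., give an expression for $\operatorname{var}(\mathbf{c}^T\tilde{\boldsymbol\beta}-\tau)-\operatorname{var}(\tilde\tau-\tau)$.
(d) Determine a necessary and sufficient condition for your answer to part (c) to equal 0. Your condition should take the form $\mathbf{k}\in\mathcal{S}$, where $\mathcal{S}$ is a certain set of vectors. Thus you will have established a necessary and sufficient condition for the BLUE of $\mathbf{c}^T\boldsymbol\beta$ to also be the BLUP of $\mathbf{c}^T\boldsymbol\beta+u$.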
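Solution
(a) $\mathbf{c}^T\tilde{\boldsymbol\beta}=\mathbf{c}^T(\mathbf{X}^T\mathbf{W}^{-1}\mathbf{X})^{-}\mathbf{X}^T\mathbf{W}^{-1}\mathbf{y}$, so $\mathbf{c}^T\tilde{\boldsymbol\beta}$ is a linear predictor. It is also unbiased because $E(\mathbf{c}^T\tilde{\boldsymbol\beta})=\mathbf{c}^T\boldsymbol\beta=\mathbf{c}^T\boldsymbol\beta+E(u)=E(\tau)$.
(b)
\begin{align*}
\operatorname{var}(\mathbf{c}^T\tilde{\boldsymbol\beta}-\tau)&=\operatorname{var}(\mathbf{c}^T\tilde{\boldsymbol\beta})+\operatorname{var}(\tau)-2\operatorname{cov}(\mathbf{c}^T\tilde{\boldsymbol\beta},\tau)\\
&=\sigma^2\mathbf{c}^T(\mathbf{X}^T\mathbf{W}^{-1}\mathbf{X})^{-}\mathbf{c}+\sigma^2h-2\sigma^2\mathbf{c}^T(\mathbf{X}^T\mathbf{W}^{-1}\mathbf{X})^{-}\mathbf{X}^T\mathbf{W}^{-1}\mathbf{k}.
\end{align*}
(c) Using Theorem 13.2.3a,
\begin{align*}
\operatorname{var}(\mathbf{c}^T\tilde{\boldsymbol\beta}-\tau)-\operatorname{var}(\tilde\tau-\tau)
&=\sigma^2[\mathbf{c}^T(\mathbf{X}^T\mathbf{W}^{-1}\mathbf{X})^{-}\mathbf{c}+h-2\mathbf{c}^T(\mathbf{X}^T\mathbf{W}^{-1}\mathbf{X})^{-}\mathbf{X}^T\mathbf{W}^{-1}\mathbf{k}]\\
&\quad-\sigma^2[h-\mathbf{k}^T\mathbf{W}^{-1}\mathbf{k}+(\mathbf{c}^T-\mathbf{k}^T\mathbf{W}^{-1}\mathbf{X})(\mathbf{X}^T\mathbf{W}^{-1}\mathbf{X})^{-}(\mathbf{c}-\mathbf{X}^T\mathbf{W}^{-1}\mathbf{k})]\\
&=\sigma^2[\mathbf{k}^T\mathbf{W}^{-1}\mathbf{k}-\mathbf{k}^T\mathbf{W}^{-1}\mathbf{X}(\mathbf{X}^T\mathbf{W}^{-1}\mathbf{X})^{-}\mathbf{X}^T\mathbf{W}^{-1}\mathbf{k}].
\end{align*}
(d) The difference in the answer to part (c) equals 0 if and only if $\mathbf{k}^T[\mathbf{W}^{-1}-\mathbf{W}^{-1}\mathbf{X}(\mathbf{X}^T\mathbf{W}^{-1}\mathbf{X})^{-}\mathbf{X}^T\mathbf{W}^{-1}]\mathbf{k}=0$, i.e., if and only if $(\mathbf{W}^{-1/2}\mathbf{k})^T\{\mathbf{I}-(\mathbf{W}^{-1/2}\mathbf{X})[(\mathbf{W}^{-1/2}\mathbf{X})^T(\mathbf{W}^{-1/2}\mathbf{X})]^{-}(\mathbf{W}^{-1/2}\mathbf{X})^T\}\mathbf{W}^{-1/2}\mathbf{k}=0$, i.e., if and only if $\mathbf{W}^{-1/2}\mathbf{k}\in\mathcal{C}(\mathbf{W}^{-1/2}\mathbf{X})$, i.e., if and only if $\mathbf{k}\in\mathcal{C}(\mathbf{X})$.

Exercise 8 Verify all results for the spatial prediction problem considered in Example 13.2-3. Then, obtain the BLUP of $y_7$ and the corresponding prediction error variance after making each one of the following modifications to the spatial configuration or model. For each modification, compare the weights corresponding to the elements of $\mathbf{y}$ with the weights in Example 13.2-3, and use your intuition to explain the notable differences. (Note: It is expected that you will use a computer to do this exercise.)
(a) Suppose that $y_7$ is to be observed at the grid point in the second row (from the bottom) of the first column (from the left).
(b) Suppose that the $(i,j)$th element of $\mathbf{W}$ is equal to $\exp(-d_{ij}/4)$.
(c) Suppose that the $(i,j)$th element of $\mathbf{W}$ is equal to 1 if $i=j$, or $0.5\exp(-d_{ij}/2)$ otherwise.

Solution First we establish a coordinate system with the origin placed at the bottom left corner and 1 unit length being the distance between two closest grid points on each axis. In addition, we label the observations from left to right, and from bottom to top if two observations are in the same column. In this way, the locations for each observation (including $y_7$) are given in the following table: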
  Observation   y1      y2      y3      y4      y5      y6      y7
  Location      (0, 0)  (1, 1)  (2, 3)  (3, 0)  (3, 2)  (4, 1)  (2, 1)

By definition, $\mathbf{W}=(w_{ij})$ with $w_{ij}=\exp(-d_{ij}/2)$ and $d_{ij}=\sqrt{(s_i-s_j)^2+(t_i-t_j)^2}$, where observation $i$ is at $(s_i,t_i)$ and observation $j$ is at $(s_j,t_j)$. Numerical entries of $\mathbf{W}$ are shown as follows:
\[
\mathbf{W}=\begin{pmatrix}
1.000&0.493&0.165&0.223&0.165&0.127&0.327\\
0.493&1.000&0.327&0.327&0.327&0.223&0.607\\
0.165&0.327&1.000&0.206&0.493&0.243&0.368\\
0.223&0.327&0.206&1.000&0.368&0.493&0.493\\
0.165&0.327&0.493&0.368&1.000&0.493&0.493\\
0.127&0.223&0.243&0.493&0.493&1.000&0.368\\
0.327&0.607&0.368&0.493&0.493&0.368&1.000
\end{pmatrix}.
\]
Now consider the model
\[
\begin{pmatrix}\mathbf{y}\\ y_7\end{pmatrix}=\mu\mathbf{1}_7+\begin{pmatrix}\mathbf{e}\\ e_7\end{pmatrix},
\qquad
\operatorname{var}\begin{pmatrix}\mathbf{e}\\ e_7\end{pmatrix}=\sigma^2\mathbf{W}=\sigma^2\begin{pmatrix}\mathbf{W}_1&\mathbf{k}\\ \mathbf{k}^T&1\end{pmatrix},
\]
where
\[
\mathbf{W}_1=\begin{pmatrix}
1.000&0.493&0.165&0.223&0.165&0.127\\
0.493&1.000&0.327&0.327&0.327&0.223\\
0.165&0.327&1.000&0.206&0.493&0.243\\
0.223&0.327&0.206&1.000&0.368&0.493\\
0.165&0.327&0.493&0.368&1.000&0.493\\
0.127&0.223&0.243&0.493&0.493&1.000
\end{pmatrix},
\qquad
\mathbf{k}^T=\begin{pmatrix}0.327&0.607&0.368&0.493&0.493&0.368\end{pmatrix}.
\]
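Then, the BLUP of $y_7=1\mu+e_7$ is
\begin{align*}
\tilde{y}_7&=[\mathbf{c}^T(\mathbf{X}^T\mathbf{W}_1^{-1}\mathbf{X})^{-}\mathbf{X}^T\mathbf{W}_1^{-1}+\mathbf{k}^T\mathbf{W}_1^{-1}-\mathbf{k}^T\mathbf{W}_1^{-1}\mathbf{X}(\mathbf{X}^T\mathbf{W}_1^{-1}\mathbf{X})^{-}\mathbf{X}^T\mathbf{W}_1^{-1}]\mathbf{y}\\
&=(0.017,\ 0.422,\ 0.065,\ 0.247,\ 0.218,\ 0.030)\,\mathbf{y}\\
&=0.017y_1+0.422y_2+0.065y_3+0.247y_4+0.218y_5+0.030y_6.
\end{align*}
The BLUP's prediction error variance is
\begin{align*}
\operatorname{var}(\tilde{y}_7-y_7)&=\sigma^2[1-\mathbf{k}^T\mathbf{W}_1^{-1}\mathbf{k}+(\mathbf{c}^T-\mathbf{k}^T\mathbf{W}_1^{-1}\mathbf{X})(\mathbf{X}^T\mathbf{W}_1^{-1}\mathbf{X})^{-}(\mathbf{c}^T-\mathbf{k}^T\mathbf{W}_1^{-1}\mathbf{X})^T]\\
&=0.478\sigma^2.
\end{align*}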
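Since the exercise expects a computer to be used, here is one way the BLUP weights and prediction error variance might be computed. It is a sketch under stated assumptions: it uses only the Python standard library, the helper names (`solve`, `blup_weights`, `corr`) are illustrative, and it relies on $\mathbf{X}=\mathbf{1}$ and $c=1$ as in this example.

```python
import math

# Locations from the table above: y1..y6 are observed, y7 is to be predicted.
sites = [(0, 0), (1, 1), (2, 3), (3, 0), (3, 2), (4, 1)]
target = (2, 1)

def corr(p, q):
    """w_ij = exp(-d_ij / 2), with d_ij the Euclidean distance."""
    return math.exp(-math.hypot(p[0] - q[0], p[1] - q[1]) / 2.0)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def blup_weights(sites, target, corr):
    """Weights t (so the BLUP is t'y) and prediction error variance / sigma^2."""
    w1 = [[corr(p, q) for q in sites] for p in sites]
    k = [corr(p, target) for p in sites]
    a = solve(w1, k)                   # W1^{-1} k
    b = solve(w1, [1.0] * len(sites))  # W1^{-1} 1
    denom = sum(b)                     # 1' W1^{-1} 1
    resid = 1.0 - sum(a)               # c - k' W1^{-1} 1, with c = 1
    weights = [ai + bi * resid / denom for ai, bi in zip(a, b)]
    pev = 1.0 - sum(ki * ai for ki, ai in zip(k, a)) + resid**2 / denom
    return weights, pev

weights, pev = blup_weights(sites, target, corr)
print([round(w, 3) for w in weights], round(pev, 3))
```

The printed weights and variance can be compared with the values reported above. Parts (a) to (c) can be handled the same way by changing `target` or passing a different `corr` function.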
In general, the closer the spatial location corresponding to an element of $\mathbf{y}$ is to that of $y_7$, the higher the corresponding weight will be (for example, $y_2$ has the highest weight). This result matches the intuition that closer observations are more highly correlated, so it is reasonable to put more weight on closer observations in order to have a good prediction. It can also be observed that, even though two observations ($y_4$ and $y_5$) are the same distance from $y_7$, their weights are close but not necessarily the same.

(a) We preserve the coordinate system and notation of the original example, but the location of $y_7$ is now $(0,1)$. Repeating the calculations, we obtain the following quantities:
\[
\mathbf{W}=\begin{pmatrix}
1.000&0.493&0.165&0.223&0.165&0.127&0.607\\
0.493&1.000&0.327&0.327&0.327&0.223&0.607\\
0.165&0.327&1.000&0.206&0.493&0.243&0.243\\
0.223&0.327&0.206&1.000&0.368&0.493&0.206\\
0.165&0.327&0.493&0.368&1.000&0.493&0.206\\
0.127&0.223&0.243&0.493&0.493&1.000&0.135\\
0.607&0.607&0.243&0.206&0.206&0.135&1.000
\end{pmatrix},
\]
\[
\mathbf{W}_1=\begin{pmatrix}
1.000&0.493&0.165&0.223&0.165&0.127\\
0.493&1.000&0.327&0.327&0.327&0.223\\
0.165&0.327&1.000&0.206&0.493&0.243\\
0.223&0.327&0.206&1.000&0.368&0.493\\
0.165&0.327&0.493&0.368&1.000&0.493\\
0.127&0.223&0.243&0.493&0.493&1.000
\end{pmatrix},
\qquad
\mathbf{k}^T=\begin{pmatrix}0.607&0.607&0.243&0.206&0.206&0.135\end{pmatrix}.
\]
Therefore, the BLUP of $y_7=1\mu+e_7$ is
\begin{align*}
\tilde{y}_7&=[\mathbf{c}^T(\mathbf{X}^T\mathbf{W}_1^{-1}\mathbf{X})^{-}\mathbf{X}^T\mathbf{W}_1^{-1}+\mathbf{k}^T\mathbf{W}_1^{-1}-\mathbf{k}^T\mathbf{W}_1^{-1}\mathbf{X}(\mathbf{X}^T\mathbf{W}_1^{-1}\mathbf{X})^{-}\mathbf{X}^T\mathbf{W}_1^{-1}]\mathbf{y}\\
&=(0.453,\ 0.414,\ 0.094,\ 0.005,\ 0.004,\ 0.029)\,\mathbf{y}\\
&=0.453y_1+0.414y_2+0.094y_3+0.005y_4+0.004y_5+0.029y_6,
\end{align*}
and the BLUP's prediction error variance is
\begin{align*}
\operatorname{var}(\tilde{y}_7-y_7)&=\sigma^2[1-\mathbf{k}^T\mathbf{W}_1^{-1}\mathbf{k}+(\mathbf{c}^T-\mathbf{k}^T\mathbf{W}_1^{-1}\mathbf{X})(\mathbf{X}^T\mathbf{W}_1^{-1}\mathbf{X})^{-}(\mathbf{c}^T-\mathbf{k}^T\mathbf{W}_1^{-1}\mathbf{X})^T]\\
&=0.517\sigma^2.
\end{align*}
Now, the weights of $y_1$ and $y_2$ in the BLUP are significantly higher than the others, whereas in the original example the weights of $y_2$, $y_4$, and $y_5$ are significantly higher. This corresponds to the fact that $y_7$'s location is now much closer to those of $y_1$ and $y_2$ than to the locations of the other observations, while in the original example $y_7$'s location is much closer to those of $y_2$, $y_4$, and $y_5$ than to the locations of the other observations. Another interesting observation is that even though $y_6$'s location is farthest from $y_7$'s, the weight for $y_6$ is not the smallest one.

(b) We preserve the coordinate system and notation of the original example, but now $w_{ij}=\exp(-d_{ij}/4)$. Repeating the calculations, we obtain the following quantities:
\[
\mathbf{W}=\begin{pmatrix}
1.000&0.702&0.406&0.472&0.406&0.357&0.572\\
0.702&1.000&0.572&0.572&0.572&0.472&0.779\\
0.406&0.572&1.000&0.454&0.702&0.493&0.607\\
0.472&0.572&0.454&1.000&0.607&0.702&0.702\\
0.406&0.572&0.702&0.607&1.000&0.702&0.702\\
0.357&0.472&0.493&0.702&0.702&1.000&0.607\\
0.572&0.779&0.607&0.702&0.702&0.607&1.000
\end{pmatrix},
\]
\[
\mathbf{W}_1=\begin{pmatrix}
1.000&0.702&0.406&0.472&0.406&0.357\\
0.702&1.000&0.572&0.572&0.572&0.472\\
0.406&0.572&1.000&0.454&0.702&0.493\\
0.472&0.572&0.454&1.000&0.607&0.702\\
0.406&0.572&0.702&0.607&1.000&0.702\\
0.357&0.472&0.493&0.702&0.702&1.000
\end{pmatrix},
\qquad
\mathbf{k}^T=\begin{pmatrix}0.572&0.779&0.607&0.702&0.702&0.607\end{pmatrix}.
\]
Therefore, the BLUP of $y_7=1\mu+e_7$ is
\begin{align*}
\tilde{y}_7&=[\mathbf{c}^T(\mathbf{X}^T\mathbf{W}_1^{-1}\mathbf{X})^{-}\mathbf{X}^T\mathbf{W}_1^{-1}+\mathbf{k}^T\mathbf{W}_1^{-1}-\mathbf{k}^T\mathbf{W}_1^{-1}\mathbf{X}(\mathbf{X}^T\mathbf{W}_1^{-1}\mathbf{X})^{-}\mathbf{X}^T\mathbf{W}_1^{-1}]\mathbf{y}\\
&=(0.001,\ 0.45,\ 0.049,\ 0.261,\ 0.23,\ 0.01)\,\mathbf{y}\\
&=0.001y_1+0.45y_2+0.049y_3+0.261y_4+0.23y_5+0.01y_6,
\end{align*}
and the BLUP's prediction error variance is
\begin{align*}
\operatorname{var}(\tilde{y}_7-y_7)&=\sigma^2[1-\mathbf{k}^T\mathbf{W}_1^{-1}\mathbf{k}+(\mathbf{c}^T-\mathbf{k}^T\mathbf{W}_1^{-1}\mathbf{X})(\mathbf{X}^T\mathbf{W}_1^{-1}\mathbf{X})^{-}(\mathbf{c}^T-\mathbf{k}^T\mathbf{W}_1^{-1}\mathbf{X})^T]\\
&=0.254\sigma^2.
\end{align*}
The weights in the BLUP corresponding to the elements of $\mathbf{y}$ are very similar to those in the original example. This corresponds to the fact that the locations of the observations are unchanged. However, the BLUP's prediction error variance is much smaller than it was in the original example. An intuitive explanation is that for the new variance–covariance matrix of $\mathbf{y}$, the correlations between any two observations are larger than those in the original example. With higher correlation, the prediction is more precise and thus the prediction error variance is smaller.

(c) We preserve the coordinate system and notation of the original example, but now $w_{ij}=1$ if $i=j$, and $w_{ij}=0.5\exp(-d_{ij}/2)$ otherwise. Repeating the calculations, we obtain the following quantities:
\[
\mathbf{W}=\begin{pmatrix}
1.000&0.247&0.082&0.112&0.082&0.064&0.163\\
0.247&1.000&0.163&0.163&0.163&0.112&0.303\\
0.082&0.163&1.000&0.103&0.247&0.122&0.184\\
0.112&0.163&0.103&1.000&0.184&0.247&0.247\\
0.082&0.163&0.247&0.184&1.000&0.247&0.247\\
0.064&0.112&0.122&0.247&0.247&1.000&0.184\\
0.163&0.303&0.184&0.247&0.247&0.184&1.000
\end{pmatrix},
\]
\[
\mathbf{W}_1=\begin{pmatrix}
1.000&0.247&0.082&0.112&0.082&0.064\\
0.247&1.000&0.163&0.163&0.163&0.112\\
0.082&0.163&1.000&0.103&0.247&0.122\\
0.112&0.163&0.103&1.000&0.184&0.247\\
0.082&0.163&0.247&0.184&1.000&0.247\\
0.064&0.112&0.122&0.247&0.247&1.000
\end{pmatrix},
\qquad
\mathbf{k}^T=\begin{pmatrix}0.163&0.303&0.184&0.247&0.247&0.184\end{pmatrix}.
\]
Therefore, the BLUP of $y_7=1\mu+e_7$ is
\begin{align*}
\tilde{y}_7&=[\mathbf{c}^T(\mathbf{X}^T\mathbf{W}_1^{-1}\mathbf{X})^{-}\mathbf{X}^T\mathbf{W}_1^{-1}+\mathbf{k}^T\mathbf{W}_1^{-1}-\mathbf{k}^T\mathbf{W}_1^{-1}\mathbf{X}(\mathbf{X}^T\mathbf{W}_1^{-1}\mathbf{X})^{-}\mathbf{X}^T\mathbf{W}_1^{-1}]\mathbf{y}\\
&=(0.124,\ 0.256,\ 0.133,\ 0.194,\ 0.175,\ 0.119)\,\mathbf{y}\\
&=0.124y_1+0.256y_2+0.133y_3+0.194y_4+0.175y_5+0.119y_6,
\end{align*}
and the BLUP's prediction error variance is
\begin{align*}
\operatorname{var}(\tilde{y}_7-y_7)&=\sigma^2[1-\mathbf{k}^T\mathbf{W}_1^{-1}\mathbf{k}+(\mathbf{c}^T-\mathbf{k}^T\mathbf{W}_1^{-1}\mathbf{X})(\mathbf{X}^T\mathbf{W}_1^{-1}\mathbf{X})^{-}(\mathbf{c}^T-\mathbf{k}^T\mathbf{W}_1^{-1}\mathbf{X})^T]\\
&=0.843\sigma^2.
\end{align*}
Now, the weights corresponding to the elements of $\mathbf{y}$ do not vary as much as they did in the original example, even though the relative locations are unchanged. This is because the new variance–covariance matrix results in all observations being relatively weakly correlated, and the differences in correlations are relatively small. Furthermore, due to the weak correlation between observations, it is not surprising to see that the BLUP's prediction error variance is much higher than it was in the original example.

Exercise 9 Observe that in Example 13.2-3 and Exercise 13.8, the "weights" (the coefficients on the elements of $\mathbf{y}$ in the expression for the BLUP) sum to one (apart from roundoff error). Explain why this is so.

Solution Let $\mathbf{t}^T\mathbf{y}$ represent the BLUP. The BLUP is an unbiased predictor, so by Theorem 13.1.1 $\mathbf{t}^T\mathbf{X}=\mathbf{c}^T$, which in this problem is $\mathbf{t}^T\mathbf{1}=1$ (i.e., the weights sum to 1) because $\mathbf{X}=\mathbf{1}$ and $c=1$.

Exercise 10 Suppose that observations $(x_i,y_i)$ follow the simple linear regression model
\[
y_i=\beta_1+\beta_2x_i+e_i\quad(i=1,\ldots,n),
\]
where $n\ge 3$. Consider the problem of predicting an unobserved $y$-value, $y_{n+1}$, corresponding to a specified $x$-value, $x_{n+1}$. Assume that $y_{n+1}$ follows the same basic model as the observed responses; that is, the joint model for $y_{n+1}$ and the observed responses can be written as
\[
y_i=\beta_1+\beta_2x_i+e_i\quad(i=1,\ldots,n,n+1),
\]
where the $e_i$'s satisfy Gauss–Markov assumptions with common variance $\sigma^2>0$. Example 13.2-1 established that the best linear unbiased predictor of $y_{n+1}$ under this model is $\tilde{y}_{n+1}=\hat\beta_1+\hat\beta_2x_{n+1}$ and its mean squared prediction error (MSPE) is $\sigma^2[1+(1/n)+(x_{n+1}-\bar{x})^2/SXX]$. Although $\tilde{y}_{n+1}$ has smallest MSPE among all unbiased linear predictors of $y_{n+1}$, biased linear predictors of $y_{n+1}$ exist that may have smaller MSPE than $\tilde{y}_{n+1}$. One such predictor is $\bar{y}$, the sample mean of the $y_i$'s. For $x_{n+1}\ne\bar{x}$, show that $\bar{y}$ has smaller MSPE than $\tilde{y}_{n+1}$ if and only if $\beta_2^2\,SXX/\sigma^2<1$.
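Solution The mean squared prediction error of $\bar{y}$ is
\begin{align*}
E[(\bar{y}-y_{n+1})^2]&=\operatorname{var}(\bar{y}-y_{n+1})+[E(\bar{y})-E(y_{n+1})]^2\\
&=\operatorname{var}(\bar{y})+\operatorname{var}(y_{n+1})-2\operatorname{cov}(\bar{y},y_{n+1})+[\beta_1+\beta_2\bar{x}-(\beta_1+\beta_2x_{n+1})]^2\\
&=\sigma^2[1+(1/n)]+\beta_2^2(x_{n+1}-\bar{x})^2.
\end{align*}
Thus the mean squared prediction error of $\bar{y}$ is less than that of $\tilde{y}_{n+1}$ if and only if
\[
\sigma^2[1+(1/n)]+\beta_2^2(x_{n+1}-\bar{x})^2<\sigma^2[1+(1/n)+(x_{n+1}-\bar{x})^2/SXX],
\]
i.e., if and only if $\beta_2^2\,SXX/\sigma^2<1$.

Exercise 11 Obtain the expressions for $\operatorname{var}(\tilde{b}-b)$ and its unbiased estimator given in Example 13.4.2-1.

Solution Specializing the expressions in Theorem 13.4.2b, c using $\mathbf{C}=0$, $\mathbf{F}=1$, $\mathbf{X}=\mathbf{0}$, $\mathbf{Z}=\mathbf{z}$, $\mathbf{G}=\sigma_b^2/\sigma^2\equiv\psi$, and $\mathbf{R}=\mathbf{I}$, we obtain
\[
\operatorname{var}(\tilde{b}-b)=\sigma^2\,1(\psi^{-1}+\mathbf{z}^T\mathbf{z})^{-1}1+0=\frac{\sigma^2}{\psi^{-1}+\sum_{i=1}^n z_i^2}.
\]
An unbiased estimator of $\operatorname{var}(\tilde{b}-b)$ may be obtained by replacing $\sigma^2$ in the numerator of this expression with
\[
\tilde\sigma^2=\mathbf{y}^T[\mathbf{I}-(\psi^{-1}+\mathbf{z}^T\mathbf{z})^{-1}\mathbf{z}\mathbf{z}^T]\mathbf{y}/(n-0)
=\left[\sum_{i=1}^n y_i^2-\frac{\left(\sum_{i=1}^n z_iy_i\right)^2}{\psi^{-1}+\sum_{i=1}^n z_i^2}\right]\Big/ n.
\]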
Exercise 12 Consider the random no-intercept simple linear regression model
\[
y_i=bz_i+d_i\quad(i=1,\ldots,n),
\]
where $b,d_1,d_2,\ldots,d_n$ are uncorrelated zero-mean random variables such that $\operatorname{var}(b)=\sigma_b^2>0$ and $\operatorname{var}(d_i)=\sigma^2>0$ for all $i$. Let $\psi=\sigma_b^2/\sigma^2$ and suppose that $\psi$ is known. Let $z$ represent a specified real number.
(a) Write out the mixed-model equations in as simple a form as possible, and solve them.
(b) Give a nonmatrix expression, simplified as much as possible, for the BLUP of $bz$.
(c) Give a nonmatrix expression, simplified as much as possible, for the prediction error variance of the BLUP of $bz$.
Solution
(a) $\left(\psi^{-1}+\sum_{i=1}^n z_i^2\right)b=\sum_{i=1}^n z_iy_i$, with solution $\dot{b}=\dfrac{\sum_{i=1}^n z_iy_i}{\psi^{-1}+\sum_{i=1}^n z_i^2}$.
(b) $\dot{b}z$.
(c) $\dfrac{\sigma^2z^2}{\psi^{-1}+\sum_{i=1}^n z_i^2}$.
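The answer to part (c) can be checked against a direct variance calculation. The sketch below is illustrative (function names and sample numbers are assumptions); the direct calculation uses $\dot{b}-b=\left(-\psi^{-1}b+\sum_i z_id_i\right)/D$ with $D=\psi^{-1}+\sum_i z_i^2$, together with $\operatorname{var}(b)=\psi\sigma^2$ and $\operatorname{var}(d_i)=\sigma^2$.

```python
def blup_slope(z, y, psi):
    """Solution of the single mixed-model equation: sum(z*y) / (1/psi + sum(z^2))."""
    d = 1.0 / psi + sum(zi * zi for zi in z)
    return sum(zi * yi for zi, yi in zip(z, y)) / d

def pev_formula(z, z0, psi, sigma2=1.0):
    """Prediction error variance of the BLUP of b*z0, as given in part (c)."""
    return sigma2 * z0**2 / (1.0 / psi + sum(zi * zi for zi in z))

def pev_direct(z, z0, psi, sigma2=1.0):
    """Same quantity, computed from the variance of the error b_dot - b."""
    szz = sum(zi * zi for zi in z)
    d = 1.0 / psi + szz
    # var(-b/psi) = psi*sigma2/psi^2 and var(sum z_i d_i) = sigma2 * sum(z_i^2)
    var_err = (psi * sigma2 / psi**2 + sigma2 * szz) / d**2
    return z0**2 * var_err

# Hypothetical data, for illustration only.
z = [1.0, 2.0, 3.0]
print(blup_slope(z, [1.0, 2.0, 3.0], 0.5))
print(pev_formula(z, 2.0, 0.5), pev_direct(z, 2.0, 0.5))
```

The two variance computations should agree for any inputs, since $\psi^{-1}+\sum_i z_i^2=D$ cancels one factor of $D$ in the denominator.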
Exercise 13 Consider the following random-slope simple linear regression model:
\[
y_i=\beta+bz_i+d_i\quad(i=1,\ldots,n),
\]
where $\beta$ is an unknown parameter, $b$ is a zero-mean random variable with variance $\sigma_b^2>0$, and the $d_i$'s are uncorrelated (with each other and with $b$) zero-mean random variables with common variance $\sigma^2>0$. Let $\psi=\sigma_b^2/\sigma^2$ and suppose that $\psi$ is known. The model equation may be written in matrix form as $\mathbf{y}=\beta\mathbf{1}+b\mathbf{z}+\mathbf{d}$.
(a) Determine $\sigma^2\mathbf{W}\equiv\operatorname{var}(\mathbf{y})$.
(b) Write out the mixed-model equations in as simple a form as possible and solve them.
(c) Consider predicting the predictable function $\tau\equiv\beta+bz$, where $z$ is a specified real number. Give an expression for the BLUP $\tilde\tau$ of $\tau$ in terms of a solution $(\dot\beta,\dot{b})^T$ to the mixed-model equations.
(d) Give a nonmatrix expression for $\tilde\sigma^2$, the generalized residual mean square, in terms of a solution to the mixed-model equations mentioned in part (c).
(e) Obtain a nonmatrix expression for the prediction error variance associated with $\tilde\tau$.

Solution
(a) $\operatorname{var}(\mathbf{y})=\operatorname{var}(\beta\mathbf{1}+b\mathbf{z}+\mathbf{d})=\operatorname{var}(b\mathbf{z})+\operatorname{var}(\mathbf{d})=\sigma_b^2\mathbf{z}\mathbf{z}^T+\sigma^2\mathbf{I}=\sigma^2(\psi\mathbf{z}\mathbf{z}^T+\mathbf{I})$.
(b) The mixed-model equations simplify to
\[
\begin{pmatrix}n&\sum_{i=1}^n z_i\\ \sum_{i=1}^n z_i&\sum_{i=1}^n z_i^2+\psi^{-1}\end{pmatrix}
\begin{pmatrix}\beta\\ b\end{pmatrix}
=\begin{pmatrix}\sum_{i=1}^n y_i\\ \sum_{i=1}^n z_iy_i\end{pmatrix},
\]
from which we obtain the solution
\[
\dot{b}=\frac{\sum_{i=1}^n(z_i-\bar{z})y_i}{\sum_{i=1}^n(z_i-\bar{z})^2+\psi^{-1}},\qquad\dot\beta=\bar{y}-\dot{b}\bar{z}.
\]
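(c) By Corollary 13.4.3.1, the BLUP of $\tau \equiv \beta + bz$ is $\tilde\tau = \dot\beta + \dot b z$.
(d) By Corollary 13.4.3.2,
\[ \tilde\sigma^2 = \frac{\mathbf{y}^T\mathbf{y} - \dot\beta\mathbf{1}^T\mathbf{y} - \dot b\mathbf{z}^T\mathbf{y}}{n-1} = \frac{\sum_{i=1}^n y_i (y_i - \dot\beta - \dot b z_i)}{n-1}. \]
(e) By Theorem 13.4.4,
\[ \mathrm{var}(\tilde\tau - \tau) = \sigma^2 (1, z)\, \frac{1}{n\left(\sum_{i=1}^n z_i^2 + \psi^{-1}\right) - \left(\sum_{i=1}^n z_i\right)^2} \begin{pmatrix} \sum_{i=1}^n z_i^2 + \psi^{-1} & -\sum_{i=1}^n z_i \\ -\sum_{i=1}^n z_i & n \end{pmatrix} \begin{pmatrix} 1 \\ z \end{pmatrix} \]
\[ = \sigma^2\, \frac{\sum_{i=1}^n z_i^2 + \psi^{-1} - 2z\sum_{i=1}^n z_i + n z^2}{n\left(\sum_{i=1}^n z_i^2 + \psi^{-1}\right) - \left(\sum_{i=1}^n z_i\right)^2}. \]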
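As a numerical sanity check of parts (b) and (e) of Exercise 13, the sketch below (assuming NumPy and arbitrary simulated data; all function names are hypothetical) solves the $2 \times 2$ mixed-model equations directly, compares the result with the closed-form solution, and compares the nonmatrix prediction error variance with the quadratic form in the inverse coefficient matrix.

```python
import numpy as np

def solve_mme_random_slope(z, y, psi):
    """Solve the 2x2 mixed-model equations for y_i = beta + b z_i + d_i."""
    z = np.asarray(z, dtype=float)
    y = np.asarray(y, dtype=float)
    n = z.size
    coef = np.array([[n, z.sum()],
                     [z.sum(), z @ z + 1.0 / psi]])
    rhs = np.array([y.sum(), z @ y])
    beta_dot, b_dot = np.linalg.solve(coef, rhs)
    return coef, beta_dot, b_dot

def closed_form_random_slope(z, y, psi):
    """Closed-form solution in terms of centered z values."""
    z = np.asarray(z, dtype=float)
    y = np.asarray(y, dtype=float)
    zc = z - z.mean()
    b_dot = (zc @ y) / (zc @ zc + 1.0 / psi)
    beta_dot = y.mean() - b_dot * z.mean()
    return beta_dot, b_dot

def pev_factor(z, psi, z0):
    """Nonmatrix form of var(tau_tilde - tau) / sigma^2 at the point z0."""
    z = np.asarray(z, dtype=float)
    n = z.size
    num = z @ z + 1.0 / psi - 2.0 * z0 * z.sum() + n * z0 ** 2
    den = n * (z @ z + 1.0 / psi) - z.sum() ** 2
    return num / den

# Arbitrary illustrative data (not from the text).
rng = np.random.default_rng(1)
z_obs, y_obs = rng.normal(size=8), rng.normal(size=8)
psi, z0 = 0.5, 1.3
coef, beta_dot, b_dot = solve_mme_random_slope(z_obs, y_obs, psi)
```

The same pattern applies to Exercises 14 and 15 after changing the coefficient matrix accordingly.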
Exercise 14 Consider the following random-intercept simple linear regression model:
\[ y_i = b + \beta x_i + d_i \quad (i = 1, \ldots, n), \]
where $\beta$ is an unknown parameter, $b$ is a zero-mean random variable with variance $\sigma_b^2 > 0$, and the $d_i$'s are uncorrelated (with each other and with $b$) zero-mean random variables with common variance $\sigma^2 > 0$. Let $\psi = \sigma_b^2/\sigma^2$ and suppose that $\psi$ is known. The model equation may be written in matrix form as $\mathbf{y} = b\mathbf{1} + \beta\mathbf{x} + \mathbf{d}$.
(a) Determine $\sigma^2\mathbf{W} \equiv \mathrm{var}(\mathbf{y})$.
(b) Write out the mixed-model equations in as simple a form as possible and solve them.
(c) Consider predicting the predictable function $\tau \equiv b + \beta x$, where $x$ is a specified real number. Give an expression for the BLUP $\tilde\tau$ of $\tau$ in terms of a solution $(\dot\beta, \dot b)^T$ to the mixed-model equations.
(d) Give a nonmatrix expression for $\tilde\sigma^2$, the generalized residual mean square, in terms of a solution to the mixed-model equations mentioned in part (c).
(e) Obtain a nonmatrix expression for the prediction error variance associated with $\tilde\tau$.
Solution
(a) $\mathrm{var}(\mathbf{y}) = \mathrm{var}(b\mathbf{1} + \beta\mathbf{x} + \mathbf{d}) = \mathrm{var}(b\mathbf{1}) + \mathrm{var}(\mathbf{d}) = \sigma_b^2\mathbf{1}\mathbf{1}^T + \sigma^2\mathbf{I} = \sigma^2(\psi\mathbf{1}\mathbf{1}^T + \mathbf{I})$.
(b) The mixed-model equations simplify to
\[ \begin{pmatrix} \sum_{i=1}^n x_i^2 & \sum_{i=1}^n x_i \\ \sum_{i=1}^n x_i & n + \psi^{-1} \end{pmatrix} \begin{pmatrix} \beta \\ b \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^n x_i y_i \\ \sum_{i=1}^n y_i \end{pmatrix}, \]
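from which we obtain the solution
\[ \dot\beta = \frac{\sum_{i=1}^n x_i y_i - \dfrac{\left(\sum_{i=1}^n x_i\right)\left(\sum_{i=1}^n y_i\right)}{n + \psi^{-1}}}{\sum_{i=1}^n x_i^2 - \dfrac{\left(\sum_{i=1}^n x_i\right)^2}{n + \psi^{-1}}}, \qquad \dot b = \frac{\sum_{i=1}^n y_i - \dot\beta\sum_{i=1}^n x_i}{n + \psi^{-1}}. \]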
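(c) By Corollary 13.4.3.1, the BLUP of $\tau \equiv b + \beta x$ is $\tilde\tau = \dot b + \dot\beta x$.
(d) By Corollary 13.4.3.2,
\[ \tilde\sigma^2 = \frac{\mathbf{y}^T\mathbf{y} - \dot\beta\mathbf{x}^T\mathbf{y} - \dot b\mathbf{1}^T\mathbf{y}}{n-1} = \frac{\sum_{i=1}^n y_i (y_i - \dot b - \dot\beta x_i)}{n-1}. \]
(e) By Theorem 13.4.4,
\[ \mathrm{var}(\tilde\tau - \tau) = \sigma^2 (x, 1)\, \frac{1}{(n + \psi^{-1})\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2} \begin{pmatrix} n + \psi^{-1} & -\sum_{i=1}^n x_i \\ -\sum_{i=1}^n x_i & \sum_{i=1}^n x_i^2 \end{pmatrix} \begin{pmatrix} x \\ 1 \end{pmatrix} \]
\[ = \sigma^2\, \frac{x^2 (n + \psi^{-1}) - 2x\sum_{i=1}^n x_i + \sum_{i=1}^n x_i^2}{(n + \psi^{-1})\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2}. \]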
Exercise 15 Consider the following random simple linear regression model:
\[ y_i = b_1 + b_2 z_i + d_i \quad (i = 1, \ldots, n), \]
where $b_1$ and $b_2$ are uncorrelated zero-mean random variables with variances $\sigma_{b_1}^2 > 0$ and $\sigma_{b_2}^2 > 0$, respectively, and the $d_i$'s are uncorrelated (with each other and with $b_1$ and $b_2$) zero-mean random variables with common variance $\sigma^2 > 0$. Let $\psi_1 = \sigma_{b_1}^2/\sigma^2$ and $\psi_2 = \sigma_{b_2}^2/\sigma^2$, and suppose that $\psi_1$ and $\psi_2$ are known. The model equation may be written in matrix form as $\mathbf{y} = b_1\mathbf{1} + b_2\mathbf{z} + \mathbf{d}$.
(a) Determine $\sigma^2\mathbf{W} \equiv \mathrm{var}(\mathbf{y})$.
(b) Write out the mixed-model equations in as simple a form as possible and solve them.
(c) Consider predicting the predictable function $\tau \equiv b_1 + b_2 z$, where $z$ is a specified real number. Give an expression for the BLUP $\tilde\tau$ of $\tau$ in terms of a solution $(\dot b_1, \dot b_2)^T$ to the mixed-model equations.
(d) Give a nonmatrix expression for $\tilde\sigma^2$, the generalized residual mean square, in terms of a solution to the mixed-model equations mentioned in part (c).
(e) Obtain a nonmatrix expression for the prediction error variance associated with $\tilde\tau$.
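Solution
(a) $\mathrm{var}(\mathbf{y}) = \mathrm{var}(b_1\mathbf{1} + b_2\mathbf{z} + \mathbf{d}) = \mathrm{var}(b_1\mathbf{1}) + \mathrm{var}(b_2\mathbf{z}) + \mathrm{var}(\mathbf{d}) = \sigma_{b_1}^2\mathbf{1}\mathbf{1}^T + \sigma_{b_2}^2\mathbf{z}\mathbf{z}^T + \sigma^2\mathbf{I} = \sigma^2(\psi_1\mathbf{1}\mathbf{1}^T + \psi_2\mathbf{z}\mathbf{z}^T + \mathbf{I})$.
(b) The mixed-model equations simplify to
\[ \begin{pmatrix} n + \psi_1^{-1} & \sum_{i=1}^n z_i \\ \sum_{i=1}^n z_i & \sum_{i=1}^n z_i^2 + \psi_2^{-1} \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^n y_i \\ \sum_{i=1}^n z_i y_i \end{pmatrix}, \]
from which we obtain the solution
\[ \dot b_2 = \frac{\sum_{i=1}^n z_i y_i - \dfrac{\left(\sum_{i=1}^n z_i\right)\left(\sum_{i=1}^n y_i\right)}{n + \psi_1^{-1}}}{\sum_{i=1}^n z_i^2 + \psi_2^{-1} - \dfrac{\left(\sum_{i=1}^n z_i\right)^2}{n + \psi_1^{-1}}}, \qquad \dot b_1 = \frac{\sum_{i=1}^n y_i - \dot b_2\sum_{i=1}^n z_i}{n + \psi_1^{-1}}. \]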
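(c) By Corollary 13.4.3.1, the BLUP of $\tau \equiv b_1 + b_2 z$ is $\tilde\tau = \dot b_1 + \dot b_2 z$.
(d) By Corollary 13.4.3.2,
\[ \tilde\sigma^2 = \frac{\mathbf{y}^T\mathbf{y} - (\dot b_1\mathbf{1}^T + \dot b_2\mathbf{z}^T)\mathbf{y}}{n} = \frac{\sum_{i=1}^n y_i (y_i - \dot b_1 - \dot b_2 z_i)}{n}. \]
(e) By Theorem 13.4.4,
\[ \mathrm{var}(\tilde\tau - \tau) = \sigma^2 (1, z)\, \frac{1}{(n + \psi_1^{-1})\left(\sum_{i=1}^n z_i^2 + \psi_2^{-1}\right) - \left(\sum_{i=1}^n z_i\right)^2} \begin{pmatrix} \sum_{i=1}^n z_i^2 + \psi_2^{-1} & -\sum_{i=1}^n z_i \\ -\sum_{i=1}^n z_i & n + \psi_1^{-1} \end{pmatrix} \begin{pmatrix} 1 \\ z \end{pmatrix} \]
\[ = \sigma^2\, \frac{\sum_{i=1}^n z_i^2 + \psi_2^{-1} - 2z\sum_{i=1}^n z_i + (n + \psi_1^{-1}) z^2}{(n + \psi_1^{-1})\left(\sum_{i=1}^n z_i^2 + \psi_2^{-1}\right) - \left(\sum_{i=1}^n z_i\right)^2}. \]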
Exercise 16 For the balanced one-factor random effects model considered in Example 13.4.2-2:
(a) Obtain specialized expressions for the variances and covariances of the prediction errors, $\{\widetilde{b_i - b_{i'}} - (b_i - b_{i'}) : i' > i = 1, \ldots, q\}$.
(b) Obtain a specialized expression for the generalized residual mean square, $\tilde\sigma^2$.
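Solution
(a) First observe that
\[ \widetilde{b_i - b_{i'}} - (b_i - b_{i'}) = \frac{r\psi}{r\psi + 1}(\bar y_{i\cdot} - \bar y_{i'\cdot}) - (b_i - b_{i'}) \]
\[ = \frac{r\psi}{r\psi + 1}\left[(b_i + \bar d_{i\cdot}) - (b_{i'} + \bar d_{i'\cdot})\right] - (b_i - b_{i'}) \]
\[ = \frac{-1}{r\psi + 1}(b_i - b_{i'}) + \frac{r\psi}{r\psi + 1}(\bar d_{i\cdot} - \bar d_{i'\cdot}). \]
Hence for $i \le j$, $i' > i = 1, \ldots, q$, and $j' > j = 1, \ldots, q$,
\[ \mathrm{cov}\left[\widetilde{b_i - b_{i'}} - (b_i - b_{i'}),\ \widetilde{b_j - b_{j'}} - (b_j - b_{j'})\right] \]
\[ = \frac{1}{(r\psi + 1)^2}\,\mathrm{cov}\left[(-1)(b_i - b_{i'}) + r\psi(\bar d_{i\cdot} - \bar d_{i'\cdot}),\ (-1)(b_j - b_{j'}) + r\psi(\bar d_{j\cdot} - \bar d_{j'\cdot})\right] \]
\[ = \frac{1}{(r\psi + 1)^2}\,\mathrm{cov}\left[(b_i - b_{i'}), (b_j - b_{j'})\right] + \frac{r^2\psi^2}{(r\psi + 1)^2}\,\mathrm{cov}\left[(\bar d_{i\cdot} - \bar d_{i'\cdot}), (\bar d_{j\cdot} - \bar d_{j'\cdot})\right] \]
\[ = \begin{cases} \dfrac{2\sigma_b^2}{(r\psi+1)^2} + \dfrac{r^2\psi^2}{(r\psi+1)^2}\cdot\dfrac{2\sigma^2}{r} = \dfrac{2\psi\sigma^2}{r\psi+1} & \text{if } i = j \text{ and } i' = j', \\[2ex] \dfrac{\sigma_b^2}{(r\psi+1)^2} + \dfrac{r^2\psi^2}{(r\psi+1)^2}\cdot\dfrac{\sigma^2}{r} = \dfrac{\psi\sigma^2}{r\psi+1} & \text{if } i = j \text{ and } i' \ne j', \text{ or } i \ne j \text{ and } i' = j', \\[2ex] \dfrac{-\sigma_b^2}{(r\psi+1)^2} + \dfrac{r^2\psi^2}{(r\psi+1)^2}\cdot\dfrac{(-\sigma^2)}{r} = -\dfrac{\psi\sigma^2}{r\psi+1} & \text{if } i' = j, \\[2ex] 0 & \text{otherwise.} \end{cases} \]
(b) By Theorem 13.4.2c, an unbiased estimator of $\sigma^2$ is
\[ \tilde\sigma^2 = (\mathbf{y} - \bar y_{\cdot\cdot}\mathbf{1}_{qr})^T\left[\mathbf{I} - \left(\oplus_{i=1}^q \mathbf{1}_r\right)(\psi^{-1}\mathbf{I}_q + r\mathbf{I}_q)^{-1}\left(\oplus_{i=1}^q \mathbf{1}_r^T\right)\right](\mathbf{y} - \bar y_{\cdot\cdot}\mathbf{1}_{qr})\Big/(n-1) \]
\[ = \left[\sum_{i=1}^q\sum_{j=1}^r (y_{ij} - \bar y_{\cdot\cdot})^2 - \frac{r\psi}{r\psi + 1}\sum_{i=1}^q r(\bar y_{i\cdot} - \bar y_{\cdot\cdot})^2\right]\Big/(n-1). \]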
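Exercise 17 Consider a two-way layout with two rows and two columns, and one observation in three of the four cells, labelled as $y_{11}$, $y_{12}$, and $y_{21}$ as depicted in the sketch below:
\[ \begin{array}{|c|c|} \hline y_{11} & y_{12} \\ \hline y_{21} & \\ \hline \end{array} \]
No response is observed in the other cell, but we would like to use the existing data to predict such a response, which we label as $y_{22}$ (the response, not its predictor). Obtain simplified expressions for the BLUP of $y_{22}$ and its mean squared prediction error, under each of the following three models. Note: The following suggestions may make your work easier in this exercise:
• For part (a), reparameterize the model in such a way that the parameter vector is given by
\[ \begin{pmatrix} \mu + \alpha_2 + \gamma_2 \\ \gamma_1 - \gamma_2 \\ \alpha_1 - \alpha_2 \end{pmatrix}. \]
For part (b), reparameterize the model in such a way that the parameter vector is given by
\[ \begin{pmatrix} \mu + \alpha_2 \\ \alpha_1 - \alpha_2 \end{pmatrix}. \]
• The following matrix inverses may be useful:
\[ \begin{pmatrix} 3 & 2 & 2 \\ 2 & 2 & 1 \\ 2 & 1 & 2 \end{pmatrix}^{-1} = \begin{pmatrix} 3 & -2 & -2 \\ -2 & 2 & 1 \\ -2 & 1 & 2 \end{pmatrix}, \]
\[ \begin{pmatrix} 2 & 0 & 1 \\ 0 & 2 & 0 \\ 1 & 0 & 2 \end{pmatrix}^{-1} = \begin{pmatrix} \frac{2}{3} & 0 & -\frac{1}{3} \\ 0 & \frac{1}{2} & 0 \\ -\frac{1}{3} & 0 & \frac{2}{3} \end{pmatrix}, \]
\[ \begin{pmatrix} 3 & 1 & 1 \\ 1 & 3 & 0 \\ 1 & 0 & 3 \end{pmatrix}^{-1} = \begin{pmatrix} \frac{3}{7} & -\frac{1}{7} & -\frac{1}{7} \\ -\frac{1}{7} & \frac{8}{21} & \frac{1}{21} \\ -\frac{1}{7} & \frac{1}{21} & \frac{8}{21} \end{pmatrix}. \]
(a) The Gauss–Markov two-way main effects model
\[ y_{ij} = \mu + \alpha_i + \gamma_j + e_{ij}, \qquad E\begin{pmatrix} e_{11} \\ e_{12} \\ e_{21} \\ e_{22} \end{pmatrix} = \mathbf{0}, \qquad \mathrm{var}\begin{pmatrix} e_{11} \\ e_{12} \\ e_{21} \\ e_{22} \end{pmatrix} = \sigma^2\mathbf{I}. \]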
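(b) A mixed two-way main effects model
\[ y_{ij} = \mu + \alpha_i + b_j + d_{ij}, \qquad E\begin{pmatrix} b_1 \\ b_2 \\ d_{11} \\ d_{12} \\ d_{21} \\ d_{22} \end{pmatrix} = \mathbf{0}, \qquad \mathrm{var}\begin{pmatrix} b_1 \\ b_2 \\ d_{11} \\ d_{12} \\ d_{21} \\ d_{22} \end{pmatrix} = \begin{pmatrix} \sigma_b^2\mathbf{I}_2 & \mathbf{0} \\ \mathbf{0} & \sigma^2\mathbf{I}_4 \end{pmatrix}, \]
where $\sigma_b^2/\sigma^2 = 1$.
(c) A random two-way main effects model
\[ y_{ij} = \mu + a_i + b_j + d_{ij}, \qquad E\begin{pmatrix} a_1 \\ a_2 \\ b_1 \\ b_2 \\ d_{11} \\ d_{12} \\ d_{21} \\ d_{22} \end{pmatrix} = \mathbf{0}, \qquad \mathrm{var}\begin{pmatrix} a_1 \\ a_2 \\ b_1 \\ b_2 \\ d_{11} \\ d_{12} \\ d_{21} \\ d_{22} \end{pmatrix} = \begin{pmatrix} \sigma_a^2\mathbf{I}_2 & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \sigma_b^2\mathbf{I}_2 & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \sigma^2\mathbf{I}_4 \end{pmatrix}, \]
where $\sigma_b^2/\sigma^2 = \sigma_a^2/\sigma^2 = 1$.
Solution
(a) After reparameterizing as suggested and putting $\mathbf{y} = (y_{11}, y_{12}, y_{21})^T$, we have
\[ \mathbf{X} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}, \]
and $\mathbf{W} = \mathbf{I}_3$, $\mathbf{k} = \mathbf{0}_3$, and $h = 1$. The BLUE of $\mu + \alpha_2 + \gamma_2$ is the first element of
\[ \tilde{\boldsymbol\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} = \begin{pmatrix} 3 & 2 & 2 \\ 2 & 2 & 1 \\ 2 & 1 & 2 \end{pmatrix}^{-1}\begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}\begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \end{pmatrix} = \begin{pmatrix} 3 & -2 & -2 \\ -2 & 2 & 1 \\ -2 & 1 & 2 \end{pmatrix}\begin{pmatrix} 1 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}\begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \end{pmatrix}, \]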
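which is $y_{12} + y_{21} - y_{11}$, and this is also the BLUP of $y_{22}$ in this case (because $\mathbf{k} = \mathbf{0}$). The mean squared prediction error of this BLUP is $\mathrm{var}(y_{12} + y_{21} - y_{11} - y_{22}) = 4\sigma^2$.
(b) After reparameterizing as suggested, we have
\[ \mathbf{X} = \begin{pmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \mathbf{W} = \begin{pmatrix} 2 & 0 & 1 \\ 0 & 2 & 0 \\ 1 & 0 & 2 \end{pmatrix}, \qquad \mathbf{k} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \qquad h = 2. \]
The BLUE of $\boldsymbol\beta$ is
\[ \tilde{\boldsymbol\beta} = (\mathbf{X}^T\mathbf{W}^{-1}\mathbf{X})^{-1}\mathbf{X}^T\mathbf{W}^{-1}\mathbf{y} \]
\[ = \left[\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \end{pmatrix}\begin{pmatrix} 2/3 & 0 & -1/3 \\ 0 & 1/2 & 0 \\ -1/3 & 0 & 2/3 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 0 \end{pmatrix}\right]^{-1}\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \end{pmatrix}\begin{pmatrix} 2/3 & 0 & -1/3 \\ 0 & 1/2 & 0 \\ -1/3 & 0 & 2/3 \end{pmatrix}\begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \end{pmatrix} \]
\[ = \begin{pmatrix} 7/6 & 5/6 \\ 5/6 & 7/6 \end{pmatrix}^{-1}\begin{pmatrix} 1/3 & 1/2 & 1/3 \\ 2/3 & 1/2 & -1/3 \end{pmatrix}\begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \end{pmatrix} = \begin{pmatrix} (-y_{11} + y_{12} + 4y_{21})/4 \\ (3y_{11} + y_{12} - 4y_{21})/4 \end{pmatrix}. \]
Then, the BLUP of $y_{22}$ is
\[ \tilde y_{22} = (1, 0)\tilde{\boldsymbol\beta} + \mathbf{k}^T\mathbf{W}^{-1}(\mathbf{y} - \mathbf{X}\tilde{\boldsymbol\beta}) \]
\[ = [(-y_{11} + y_{12} + 4y_{21})/4] + (0, 1, 0)\begin{pmatrix} 2/3 & 0 & -1/3 \\ 0 & 1/2 & 0 \\ -1/3 & 0 & 2/3 \end{pmatrix}\left[\begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \end{pmatrix} - \begin{pmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} (-y_{11} + y_{12} + 4y_{21})/4 \\ (3y_{11} + y_{12} - 4y_{21})/4 \end{pmatrix}\right] \]
\[ = [(-y_{11} + y_{12} + 4y_{21})/4] + (y_{12}/2) - (1/2)[(-y_{11} + y_{12} + 4y_{21})/4] - (1/2)[(3y_{11} + y_{12} - 4y_{21})/4] \]
\[ = (-1/2)y_{11} + (1/2)y_{12} + y_{21}. \]
The mean squared prediction error of $\tilde y_{22}$ is
\[ \mathrm{var}[(-1/2)y_{11} + (1/2)y_{12} + y_{21} - y_{22}] = \sigma^2 (-1/2,\ 1/2,\ 1,\ -1)\begin{pmatrix} 2 & 0 & 1 & 0 \\ 0 & 2 & 0 & 1 \\ 1 & 0 & 2 & 0 \\ 0 & 1 & 0 & 2 \end{pmatrix}\begin{pmatrix} -1/2 \\ 1/2 \\ 1 \\ -1 \end{pmatrix} = 3\sigma^2. \]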
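(c) We have
\[ \mathbf{W} = \begin{pmatrix} 3 & 1 & 1 \\ 1 & 3 & 0 \\ 1 & 0 & 3 \end{pmatrix}, \qquad \mathbf{X} = \mathbf{1}_3, \qquad \mathbf{k} = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}, \qquad h = 3. \]
The BLUE of $\beta$ ($\equiv \mu$) is
\[ \tilde\beta = \left[\mathbf{1}_3^T\begin{pmatrix} 3/7 & -1/7 & -1/7 \\ -1/7 & 8/21 & 1/21 \\ -1/7 & 1/21 & 8/21 \end{pmatrix}\mathbf{1}_3\right]^{-1}\mathbf{1}_3^T\begin{pmatrix} 3/7 & -1/7 & -1/7 \\ -1/7 & 8/21 & 1/21 \\ -1/7 & 1/21 & 8/21 \end{pmatrix}\begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \end{pmatrix} \]
\[ = (7/5)\,(1/7,\ 2/7,\ 2/7)\begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \end{pmatrix} = (1/5)y_{11} + (2/5)y_{12} + (2/5)y_{21}. \]
Then, the BLUP of $y_{22}$ is
\[ \tilde y_{22} = (1/5)y_{11} + (2/5)y_{12} + (2/5)y_{21} + (0, 1, 1)\begin{pmatrix} 3/7 & -1/7 & -1/7 \\ -1/7 & 8/21 & 1/21 \\ -1/7 & 1/21 & 8/21 \end{pmatrix}\left[\begin{pmatrix} y_{11} \\ y_{12} \\ y_{21} \end{pmatrix} - \mathbf{1}_3[(1/5)y_{11} + (2/5)y_{12} + (2/5)y_{21}]\right] \]
\[ = (1/5)y_{11} + (2/5)y_{12} + (2/5)y_{21} + (-2/7)y_{11} + (3/7)y_{12} + (3/7)y_{21} - (4/7)[(1/5)y_{11} + (2/5)y_{12} + (2/5)y_{21}] \]
\[ = (-1/5)y_{11} + (3/5)y_{12} + (3/5)y_{21}. \]
The mean squared prediction error of $\tilde y_{22}$ is
\[ \mathrm{var}[(-1/5)y_{11} + (3/5)y_{12} + (3/5)y_{21} - y_{22}] = \sigma^2 (-1/5,\ 3/5,\ 3/5,\ -1)\begin{pmatrix} 3 & 1 & 1 & 0 \\ 1 & 3 & 0 & 1 \\ 1 & 0 & 3 & 1 \\ 0 & 1 & 1 & 3 \end{pmatrix}\begin{pmatrix} -1/5 \\ 3/5 \\ 3/5 \\ -1 \end{pmatrix} = (12/5)\sigma^2. \]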
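The three answers to Exercise 17 can be reproduced numerically. The sketch below assumes NumPy; the helper name `blup_weights` and the variable names are illustrative only. It computes the weight vector $\mathbf{a}$ for which the BLUP is $\mathbf{a}^T\mathbf{y}$, using $\tilde y_{22} = \mathbf{x}^T\tilde{\boldsymbol\beta} + \mathbf{k}^T\mathbf{W}^{-1}(\mathbf{y} - \mathbf{X}\tilde{\boldsymbol\beta})$, and the mean squared prediction error divided by $\sigma^2$, namely $\mathbf{a}^T\mathbf{W}\mathbf{a} - 2\mathbf{a}^T\mathbf{k} + h$.

```python
import numpy as np

def blup_weights(X, W, k, h, x_new):
    """Return (a, mspe) where the BLUP is a @ y and mspe is the MSPE / sigma^2."""
    Wi = np.linalg.inv(W)
    # B maps y to the generalized least squares estimate: beta_tilde = B @ y
    B = np.linalg.solve(X.T @ Wi @ X, X.T @ Wi)
    a = x_new @ B + k @ Wi @ (np.eye(len(k)) - X @ B)
    mspe = a @ W @ a - 2.0 * a @ k + h
    return a, mspe

# (a) Gauss-Markov model, reparameterized as suggested in the exercise
X_a = np.array([[1., 1., 1.], [1., 0., 1.], [1., 1., 0.]])
a_a, mspe_a = blup_weights(X_a, np.eye(3), np.zeros(3), 1.0, np.array([1., 0., 0.]))

# (b) mixed model with sigma_b^2 / sigma^2 = 1
X_b = np.array([[1., 1.], [1., 1.], [1., 0.]])
W_b = np.array([[2., 0., 1.], [0., 2., 0.], [1., 0., 2.]])
a_b, mspe_b = blup_weights(X_b, W_b, np.array([0., 1., 0.]), 2.0, np.array([1., 0.]))

# (c) random model with both variance ratios equal to 1
X_c = np.ones((3, 1))
W_c = np.array([[3., 1., 1.], [1., 3., 0.], [1., 0., 3.]])
a_c, mspe_c = blup_weights(X_c, W_c, np.array([0., 1., 1.]), 3.0, np.array([1.]))
```

Each weight vector is ordered as $(y_{11}, y_{12}, y_{21})$, so `a_b`, for instance, should match the coefficients $(-1/2, 1/2, 1)$ found in part (b).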
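Exercise 18 Consider the following mixed linear model for two observations, $y_1$ and $y_2$:
\[ y_1 = 2\beta + b + d_1, \qquad y_2 = \beta + 3b + d_2, \]
where $\beta$ is a fixed unknown (and unrestricted) parameter, and $b, d_1, d_2$ are independent random variables with zero means and variances $\sigma_b^2 = \mathrm{var}(b)$ and $\sigma^2 = \mathrm{var}(d_1) = \mathrm{var}(d_2)$. Suppose that $\sigma_b^2/\sigma^2 = 1/2$.
(a) Write this model in the matrix form $\mathbf{y} = \mathbf{X}\beta + \mathbf{Z}b + \mathbf{d}$ and give an expression for $\mathrm{var}(\mathbf{y})$ in which $\sigma^2$ is the only unknown parameter.
(b) Compute the BLUE of $\beta$ and the BLUP of $b$.
Solution
(a) $\mathbf{y} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}\beta + \begin{pmatrix} 1 \\ 3 \end{pmatrix} b + \mathbf{d}$, and
\[ \mathrm{var}(\mathbf{y}) = \mathrm{var}(\mathbf{Z}b + \mathbf{d}) = \sigma_b^2\mathbf{Z}\mathbf{Z}^T + \sigma^2\mathbf{I} = \sigma^2\left[(1/2)\begin{pmatrix} 1 \\ 3 \end{pmatrix}(1, 3) + \mathbf{I}\right] = \sigma^2\begin{pmatrix} 3/2 & 3/2 \\ 3/2 & 11/2 \end{pmatrix}. \]
(b) The mixed-model equations are
\[ \begin{pmatrix} 5 & 5 \\ 5 & 12 \end{pmatrix}\begin{pmatrix} \beta \\ b \end{pmatrix} = \begin{pmatrix} 2y_1 + y_2 \\ y_1 + 3y_2 \end{pmatrix}. \]
Their unique solution, which yields the BLUE of $\beta$ and the BLUP of $b$, is
\[ \begin{pmatrix} \dot\beta \\ \dot b \end{pmatrix} = \begin{pmatrix} 12/35 & -5/35 \\ -5/35 & 5/35 \end{pmatrix}\begin{pmatrix} 2y_1 + y_2 \\ y_1 + 3y_2 \end{pmatrix} = \begin{pmatrix} (19/35)y_1 - (3/35)y_2 \\ (-1/7)y_1 + (2/7)y_2 \end{pmatrix}. \]
Exercise 19 Consider the balanced mixed two-way main effects model
\[ y_{ijk} = \mu + \alpha_i + b_j + d_{ijk} \quad (i = 1, \ldots, q;\ j = 1, \ldots, m;\ k = 1, \ldots, r), \]
where $\mu, \alpha_1, \ldots, \alpha_q$ are unknown parameters, the $b_j$'s are uncorrelated zero-mean random variables with common variance $\sigma_b^2 > 0$, and the $d_{ijk}$'s are uncorrelated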
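(with each other and with the $b_j$'s) zero-mean random variables with common variance $\sigma^2 > 0$. Let $\psi = \sigma_b^2/\sigma^2$ and suppose that $\psi$ is known.
(a) One solution to the Aitken equations for this model is $\tilde{\boldsymbol\beta} = (0, \bar y_{1\cdot\cdot}, \ldots, \bar y_{q\cdot\cdot})^T$. Using this solution, obtain specialized expressions for the BLUEs of $\mu + \alpha_i$ and $\alpha_i - \alpha_{i'}$ ($i' > i = 1, \ldots, q$).
(b) Obtain a specialized expression for the variance–covariance matrix of the BLUEs of $\mu + \alpha_i$, and do likewise for the BLUEs of $\alpha_i - \alpha_{i'}$ ($i' > i = 1, \ldots, q$).
(c) Obtain specialized expressions for the BLUPs of $\mu + \alpha_i + b_j$ ($i = 1, \ldots, q$; $j = 1, \ldots, m$) and $b_j - b_{j'}$ ($j' > j = 1, \ldots, m$).
(d) Obtain a specialized expression for the variance–covariance matrix of the BLUPs of $\mu + \alpha_i + b_j$, and do likewise for the BLUPs of $b_j - b_{j'}$ ($i = 1, \ldots, q$; $j' > j = 1, \ldots, m$).
Solution
(a) The BLUE of any estimable function $\mathbf{c}^T\boldsymbol\beta$ is $\mathbf{c}^T\tilde{\boldsymbol\beta}$. Thus, the BLUE of $\mu + \alpha_i$ is $\widetilde{\mu + \alpha_i} = \bar y_{i\cdot\cdot}$ ($i = 1, \ldots, q$), and the BLUE of $\alpha_i - \alpha_{i'}$ is $\widetilde{\alpha_i - \alpha_{i'}} = \bar y_{i\cdot\cdot} - \bar y_{i'\cdot\cdot}$ ($i' > i = 1, \ldots, q$).
(b) $\bar y_{i\cdot\cdot} = \frac{1}{mr}\sum_{j=1}^m\sum_{k=1}^r y_{ijk} = \frac{1}{mr}\sum_{j=1}^m\sum_{k=1}^r (\mu + \alpha_i + b_j + d_{ijk}) = \mu + \alpha_i + \bar b + \bar d_{i\cdot\cdot}$. Therefore,
\[ \mathrm{cov}(\widetilde{\mu + \alpha_i}, \widetilde{\mu + \alpha_{i'}}) = \mathrm{cov}(\bar y_{i\cdot\cdot}, \bar y_{i'\cdot\cdot}) = \mathrm{cov}(\bar b + \bar d_{i\cdot\cdot}, \bar b + \bar d_{i'\cdot\cdot}) = \mathrm{var}(\bar b) + \mathrm{cov}(\bar d_{i\cdot\cdot}, \bar d_{i'\cdot\cdot}) = \begin{cases} \dfrac{\sigma_b^2}{m} + \dfrac{\sigma^2}{mr} & \text{if } i = i', \\[1.5ex] \dfrac{\sigma_b^2}{m} & \text{if } i \ne i'. \end{cases} \]
And, for $i \le s$, $i' > i = 1, \ldots, q$, and $s' > s = 1, \ldots, q$,
\[ \mathrm{cov}(\widetilde{\alpha_i - \alpha_{i'}}, \widetilde{\alpha_s - \alpha_{s'}}) = \mathrm{cov}(\bar y_{i\cdot\cdot} - \bar y_{i'\cdot\cdot}, \bar y_{s\cdot\cdot} - \bar y_{s'\cdot\cdot}) \]
\[ = \mathrm{cov}(\bar y_{i\cdot\cdot}, \bar y_{s\cdot\cdot}) - \mathrm{cov}(\bar y_{i'\cdot\cdot}, \bar y_{s\cdot\cdot}) - \mathrm{cov}(\bar y_{i\cdot\cdot}, \bar y_{s'\cdot\cdot}) + \mathrm{cov}(\bar y_{i'\cdot\cdot}, \bar y_{s'\cdot\cdot}) \]
\[ = \begin{cases} \dfrac{2\sigma^2}{mr} & \text{if } i = s \text{ and } i' = s', \\[1.5ex] \dfrac{\sigma^2}{mr} & \text{if } i = s \text{ and } i' \ne s', \text{ or } i \ne s \text{ and } i' = s', \\[1.5ex] -\dfrac{\sigma^2}{mr} & \text{if } i' = s, \\[1.5ex] 0 & \text{otherwise.} \end{cases} \]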
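(c) By Theorem 13.4.2a,
\[ \tilde{\mathbf{b}} = (\mathbf{G}^{-1} + \mathbf{Z}^T\mathbf{Z})^{-1}\mathbf{Z}^T(\mathbf{y} - \mathbf{X}\tilde{\boldsymbol\beta}) = \frac{qr\psi}{1 + qr\psi}\begin{pmatrix} \bar y_{\cdot 1\cdot} - \bar y_{\cdot\cdot\cdot} \\ \vdots \\ \bar y_{\cdot m\cdot} - \bar y_{\cdot\cdot\cdot} \end{pmatrix}. \]
Therefore, the BLUP of $\mu + \alpha_i + b_j$ is
\[ \widetilde{\mu + \alpha_i + b_j} = \bar y_{i\cdot\cdot} + \frac{qr\psi}{1 + qr\psi}(\bar y_{\cdot j\cdot} - \bar y_{\cdot\cdot\cdot}) \quad (i = 1, \ldots, q;\ j = 1, \ldots, m) \]
and the BLUP of $b_j - b_{j'}$ is
\[ \widetilde{b_j - b_{j'}} = \frac{qr\psi}{1 + qr\psi}(\bar y_{\cdot j\cdot} - \bar y_{\cdot j'\cdot}) \quad (j' > j = 1, \ldots, m). \]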
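(d) Note that $\bar y_{\cdot j\cdot} = \mu + \bar\alpha + b_j + \bar d_{\cdot j\cdot}$ and $\bar y_{\cdot\cdot\cdot} = \mu + \bar\alpha + \bar b + \bar d_{\cdot\cdot\cdot}$, so
\[ \mathrm{cov}(\bar y_{\cdot j\cdot}, \bar y_{\cdot j'\cdot}) = \mathrm{cov}(b_j, b_{j'}) + \mathrm{cov}(\bar d_{\cdot j\cdot}, \bar d_{\cdot j'\cdot}) = \begin{cases} \sigma_b^2 + \dfrac{\sigma^2}{qr} & \text{if } j = j', \\[1.5ex] 0 & \text{if } j \ne j', \end{cases} \]
\[ \mathrm{var}(\bar y_{\cdot\cdot\cdot}) = \mathrm{var}(\bar b + \bar d_{\cdot\cdot\cdot}) = \frac{\sigma_b^2}{m} + \frac{\sigma^2}{qmr}, \]
\[ \mathrm{cov}(\bar y_{i\cdot\cdot}, \bar y_{\cdot j\cdot}) = \mathrm{cov}(\bar b, b_j) + \mathrm{cov}(\bar d_{i\cdot\cdot}, \bar d_{\cdot j\cdot}) = \frac{\sigma_b^2}{m} + \frac{\sigma^2}{qmr} \quad \text{for } i = 1, \ldots, q;\ j = 1, \ldots, m, \]
\[ \mathrm{cov}(\bar y_{i\cdot\cdot}, \bar y_{\cdot\cdot\cdot}) = \mathrm{cov}(\bar b, \bar b) + \mathrm{cov}(\bar d_{i\cdot\cdot}, \bar d_{\cdot\cdot\cdot}) = \frac{\sigma_b^2}{m} + \frac{\sigma^2}{qmr} \quad \text{for } i = 1, \ldots, q, \]
\[ \mathrm{cov}(\bar y_{\cdot j\cdot}, \bar y_{\cdot\cdot\cdot}) = \mathrm{cov}(b_j, \bar b) + \mathrm{cov}(\bar d_{\cdot j\cdot}, \bar d_{\cdot\cdot\cdot}) = \frac{\sigma_b^2}{m} + \frac{\sigma^2}{qmr} \quad \text{for } j = 1, \ldots, m. \]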
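Thus,
\[ \mathrm{cov}(\widetilde{\mu + \alpha_i + b_j}, \widetilde{\mu + \alpha_{i'} + b_{j'}}) = \mathrm{cov}\left(\bar y_{i\cdot\cdot} + \frac{qr\psi}{1 + qr\psi}\bar y_{\cdot j\cdot} - \frac{qr\psi}{1 + qr\psi}\bar y_{\cdot\cdot\cdot},\ \bar y_{i'\cdot\cdot} + \frac{qr\psi}{1 + qr\psi}\bar y_{\cdot j'\cdot} - \frac{qr\psi}{1 + qr\psi}\bar y_{\cdot\cdot\cdot}\right) \]
\[ = \begin{cases} \dfrac{\sigma_b^2}{m} + \dfrac{\sigma^2}{mr} + \left(\dfrac{qr\psi}{1 + qr\psi}\right)^2 (m-1)\left(\dfrac{\sigma_b^2}{m} + \dfrac{\sigma^2}{qmr}\right) & \text{if } i = i' \text{ and } j = j', \\[2ex] \dfrac{\sigma_b^2}{m} + \dfrac{\sigma^2}{mr} - \left(\dfrac{qr\psi}{1 + qr\psi}\right)^2 \left(\dfrac{\sigma_b^2}{m} + \dfrac{\sigma^2}{qmr}\right) & \text{if } i = i' \text{ and } j \ne j', \\[2ex] \dfrac{\sigma_b^2}{m} + \left(\dfrac{qr\psi}{1 + qr\psi}\right)^2 (m-1)\left(\dfrac{\sigma_b^2}{m} + \dfrac{\sigma^2}{qmr}\right) & \text{if } i \ne i' \text{ and } j = j', \\[2ex] \dfrac{\sigma_b^2}{m} - \left(\dfrac{qr\psi}{1 + qr\psi}\right)^2 \left(\dfrac{\sigma_b^2}{m} + \dfrac{\sigma^2}{qmr}\right) & \text{if } i \ne i' \text{ and } j \ne j', \end{cases} \]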
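and for $j \le t$, $j' > j = 1, \ldots, m$, and $t' > t = 1, \ldots, m$,
\[ \mathrm{cov}(\widetilde{b_j - b_{j'}}, \widetilde{b_t - b_{t'}}) = \left(\frac{qr\psi}{1 + qr\psi}\right)^2 \mathrm{cov}(\bar y_{\cdot j\cdot} - \bar y_{\cdot j'\cdot}, \bar y_{\cdot t\cdot} - \bar y_{\cdot t'\cdot}) \]
\[ = \begin{cases} 2\left(\dfrac{qr\psi}{1 + qr\psi}\right)^2\left(\sigma_b^2 + \dfrac{\sigma^2}{qr}\right) & \text{if } j = t \text{ and } j' = t', \\[2ex] \left(\dfrac{qr\psi}{1 + qr\psi}\right)^2\left(\sigma_b^2 + \dfrac{\sigma^2}{qr}\right) & \text{if } j = t \text{ and } j' \ne t', \text{ or } j \ne t \text{ and } j' = t', \\[2ex] -\left(\dfrac{qr\psi}{1 + qr\psi}\right)^2\left(\sigma_b^2 + \dfrac{\sigma^2}{qr}\right) & \text{if } j' = t, \\[2ex] 0 & \text{otherwise.} \end{cases} \]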
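Exercise 20 Consider a random two-way partially crossed model, analogous to the fixed-effects model introduced in Example 5.1.4-1, with one observation per cell, i.e.,
\[ y_{ij} = \mu + b_i - b_j + d_{ij} \quad (i \ne j = 1, \ldots, q), \]
where $\mu$ is an unknown parameter, the $b_i$'s are uncorrelated zero-mean random variables with common variance $\sigma_b^2 > 0$, and the $d_{ij}$'s are uncorrelated (with each other and with the $b_i$'s) zero-mean random variables with common variance $\sigma^2 > 0$. Let $\psi = \sigma_b^2/\sigma^2$ and suppose that $\psi$ is known.
(a) This model is a special case of the general mixed linear model $\mathbf{y} = \mathbf{X}\boldsymbol\beta + \mathbf{Z}\mathbf{b} + \mathbf{d}$, where $\mathrm{var}(\mathbf{d}) = \sigma^2\mathbf{I}$ and $\mathrm{var}(\mathbf{b}) = \sigma^2\mathbf{G}$ for some positive definite matrix $\mathbf{G}$. Write out the elements of the matrices $\mathbf{X}$, $\mathbf{Z}$, and $\mathbf{G}$ for this case.
(b) Write out the mixed-model equations for this model and solve them.
(c) Determine $\mathbf{W} \equiv (1/\sigma^2)\mathrm{var}(\mathbf{y})$ and determine $\mathbf{W}^{-1}$.
(d) Give expressions, in as simple a form as possible, for the BLUPs of $b_i - b_j$ ($j > i = 1, \ldots, q$) and for the variance–covariance matrix of the prediction errors associated with these BLUPs. Specialize that variance–covariance matrix for the case $q = 4$.
Solution
(a) $\mathbf{X} = \mathbf{1}_{q(q-1)}$, $\mathbf{Z} = (\mathbf{v}_{ij}^T)_{i \ne j = 1, \ldots, q}$, $\mathbf{G} = \psi\mathbf{I}_q$, where $\mathbf{v}_{ij} = \mathbf{u}_i^{(q)} - \mathbf{u}_j^{(q)}$ and $\psi = \sigma_b^2/\sigma^2$.
(b) Here $\mathbf{X}^T\mathbf{X} = \mathbf{1}_{q(q-1)}^T\mathbf{1}_{q(q-1)} = q(q-1)$, $\mathbf{X}^T\mathbf{Z} = \mathbf{1}_{q(q-1)}^T(\mathbf{v}_{ij}^T)_{i \ne j = 1, \ldots, q} = \mathbf{0}_q^T$ (because each column of $(\mathbf{v}_{ij}^T)_{i \ne j = 1, \ldots, q}$ has $q - 1$ ones and $q - 1$ minus ones), $\mathbf{G} = \psi\mathbf{I}_q$, and $\mathbf{Z}^T\mathbf{Z} = 2q\mathbf{I}_q - 2\mathbf{J}_q$. Furthermore, $\mathbf{X}^T\mathbf{y} = \mathbf{1}_{q(q-1)}^T\mathbf{y} = \sum_{i,j: i \ne j} y_{ij}$ and $\mathbf{Z}^T\mathbf{y} = \left(\sum_{i: i \ne j}(y_{ij} - y_{ji})\right)_{j = 1, \ldots, q}$. So the mixed-model equations may be written as
\[ \begin{pmatrix} q(q-1) & \mathbf{0}_q^T \\ \mathbf{0}_q & (\psi^{-1} + 2q)\mathbf{I}_q - 2\mathbf{J}_q \end{pmatrix}\begin{pmatrix} \mu \\ \mathbf{b} \end{pmatrix} = \begin{pmatrix} \sum_{i,j: i \ne j} y_{ij} \\ \left(\sum_{i: i \ne j}(y_{ij} - y_{ji})\right)_{j = 1, \ldots, q} \end{pmatrix}, \]
with solution
\[ \dot\mu = \frac{\sum_{i,j: i \ne j} y_{ij}}{q(q-1)} \]
and
\[ \dot{\mathbf{b}} = [(\psi^{-1} + 2q)\mathbf{I}_q - 2\mathbf{J}_q]^{-1}\left(\sum_{i: i \ne j}(y_{ij} - y_{ji})\right)_{j = 1, \ldots, q} \]
\[ = \left[(\psi^{-1} + 2q)^{-1}\mathbf{I}_q + \frac{2}{(\psi^{-1} + 2q)\psi^{-1}}\mathbf{J}_q\right]\left(\sum_{i: i \ne j}(y_{ij} - y_{ji})\right)_{j = 1, \ldots, q} \]
\[ = (\psi^{-1} + 2q)^{-1}\left(\sum_{i: i \ne j}(y_{ij} - y_{ji})\right)_{j = 1, \ldots, q}, \]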
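where we used expression (2.1) for the inverse of a nonsingular compound symmetric matrix.
(c) $\mathrm{var}(\mathbf{y}) = \sigma^2(\psi\mathbf{Z}\mathbf{Z}^T + \mathbf{I}_{q(q-1)})$. Thus $\mathbf{W} = \psi\mathbf{Z}\mathbf{Z}^T + \mathbf{I}_{q(q-1)}$, where $(\mathbf{Z}\mathbf{Z}^T)_{ij,st}$, for $i \le s$, $j > i = 1, \ldots, q$, and $t > s = 1, \ldots, q$, is given by
\[ (\mathbf{Z}\mathbf{Z}^T)_{ij,st} = \mathbf{v}_{ij}^T\mathbf{v}_{st} = (\mathbf{u}_i^{(q)} - \mathbf{u}_j^{(q)})^T(\mathbf{u}_s^{(q)} - \mathbf{u}_t^{(q)}) = \mathbf{u}_i^{(q)T}\mathbf{u}_s^{(q)} - \mathbf{u}_j^{(q)T}\mathbf{u}_s^{(q)} - \mathbf{u}_i^{(q)T}\mathbf{u}_t^{(q)} + \mathbf{u}_j^{(q)T}\mathbf{u}_t^{(q)} \]
\[ = \begin{cases} 2 & \text{if } i = s \text{ and } j = t, \\ 1 & \text{if } i = s \text{ and } j \ne t, \text{ or } i \ne s \text{ and } j = t, \\ -1 & \text{if } j = s, \\ 0 & \text{otherwise.} \end{cases} \]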
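(The remaining elements are determined by symmetry.) Using the computational formulae for $\mathbf{W}^{-1}$, we obtain
\[ \mathbf{G}^{-1} + \mathbf{Z}^T\mathbf{Z} = (\psi^{-1} + 2q)\mathbf{I}_q - 2\mathbf{J}_q, \]
\[ (\mathbf{G}^{-1} + \mathbf{Z}^T\mathbf{Z})^{-1} = (\psi^{-1} + 2q)^{-1}\mathbf{I}_q + \frac{2}{(\psi^{-1} + 2q)\psi^{-1}}\mathbf{J}_q, \]
\[ \mathbf{W}^{-1} = (\mathbf{I} + \mathbf{Z}\mathbf{G}\mathbf{Z}^T)^{-1} = \mathbf{I} - \mathbf{Z}(\mathbf{G}^{-1} + \mathbf{Z}^T\mathbf{Z})^{-1}\mathbf{Z}^T = \mathbf{I} - (\psi^{-1} + 2q)^{-1}\mathbf{Z}\mathbf{Z}^T \]
because $\mathbf{Z}\mathbf{J} = \mathbf{0}$ and $\mathbf{J}\mathbf{Z}^T = \mathbf{0}$.
(d) Ignoring terms involving the fixed effects,
\[ \tilde{\mathbf{b}} - \mathbf{b} = (\psi^{-1} + 2q)^{-1}\mathbf{Z}^T(\mathbf{Z}\mathbf{b} + \mathbf{d}) - \mathbf{b} = (\psi^{-1} + 2q)^{-1}[(2q\mathbf{I}_q - 2\mathbf{J}_q)\mathbf{b} - (\psi^{-1} + 2q)\mathbf{b} + \mathbf{Z}^T\mathbf{d}] = (\psi^{-1} + 2q)^{-1}[(-\psi^{-1}\mathbf{I}_q - 2\mathbf{J}_q)\mathbf{b} + \mathbf{Z}^T\mathbf{d}]. \]
Thus
\[ \mathrm{var}(\tilde{\mathbf{b}} - \mathbf{b}) = (\psi^{-1} + 2q)^{-2}[\sigma^2\psi(\psi^{-1}\mathbf{I}_q + 2\mathbf{J}_q)(\psi^{-1}\mathbf{I}_q + 2\mathbf{J}_q) + \mathbf{Z}^T(\sigma^2\mathbf{I}_{q(q-1)})\mathbf{Z}] \]
\[ = \sigma^2(\psi^{-1} + 2q)^{-2}[\psi^{-1}\mathbf{I}_q + 4(1 + q\psi)\mathbf{J}_q + (2q\mathbf{I}_q - 2\mathbf{J}_q)] \]
\[ = \sigma^2(\psi^{-1} + 2q)^{-2}[(\psi^{-1} + 2q)\mathbf{I}_q + (2 + 4q\psi)\mathbf{J}_q] \]
and
\[ \mathrm{var}(\mathbf{Z}\tilde{\mathbf{b}} - \mathbf{Z}\mathbf{b}) = \mathbf{Z}\,\mathrm{var}(\tilde{\mathbf{b}} - \mathbf{b})\,\mathbf{Z}^T = \sigma^2(\psi^{-1} + 2q)^{-2}(\psi^{-1} + 2q)\mathbf{Z}\mathbf{Z}^T = \frac{\sigma^2}{\psi^{-1} + 2q}\mathbf{Z}\mathbf{Z}^T. \]
So, in the special case $q = 4$, with rows (and columns) ordered as 12, 13, 14, 23, 24, 34, we have
\[ \mathrm{var}(\mathbf{Z}\tilde{\mathbf{b}} - \mathbf{Z}\mathbf{b}) = \frac{\sigma^2}{\psi^{-1} + 8}\begin{pmatrix} 2 & 1 & 1 & -1 & -1 & 0 \\ 1 & 2 & 1 & 1 & 0 & -1 \\ 1 & 1 & 2 & 0 & 1 & 1 \\ -1 & 1 & 0 & 2 & 1 & -1 \\ -1 & 0 & 1 & 1 & 2 & 1 \\ 0 & -1 & 1 & -1 & 1 & 2 \end{pmatrix}. \]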
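The $q = 4$ matrix can be regenerated mechanically. In the sketch below (assuming NumPy; the helper names are illustrative), `pair_contrast_matrix` builds the rows $\mathbf{u}_i - \mathbf{u}_j$ for $j > i$, and `prediction_error_cov` evaluates $(\mathbf{G}^{-1} + \mathbf{Z}^T\mathbf{Z})^{-1}$ with the full design over all ordered pairs $i \ne j$. Treating that inverse as $\mathrm{var}(\tilde{\mathbf{b}} - \mathbf{b})/\sigma^2$ relies on the block-diagonal form of the coefficient matrix found in part (b).

```python
import numpy as np
from itertools import combinations

def pair_contrast_matrix(q):
    """Rows u_i - u_j for all pairs j > i, ordered 12, 13, ..., (q-1)q."""
    pairs = list(combinations(range(q), 2))
    V = np.zeros((len(pairs), q))
    for row, (i, j) in enumerate(pairs):
        V[row, i] = 1.0
        V[row, j] = -1.0
    return pairs, V

def prediction_error_cov(q, psi):
    """(G^{-1} + Z^T Z)^{-1}, with one row u_i - u_j for every ordered pair i != j."""
    eye = np.eye(q)
    Z = np.array([eye[i] - eye[j] for i in range(q) for j in range(q) if i != j])
    return np.linalg.inv(eye / psi + Z.T @ Z)

q, psi = 4, 0.8            # psi is an arbitrary illustrative value
pairs, V = pair_contrast_matrix(q)
VVt = V @ V.T              # the 6 x 6 pattern displayed above
cov_pairs = V @ prediction_error_cov(q, psi) @ V.T   # should equal VVt / (1/psi + 2q)
```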
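Exercise 21 Consider the balanced mixed two-factor nested model
\[ y_{ijk} = \mu + \alpha_i + b_{ij} + d_{ijk} \quad (i = 1, \ldots, q;\ j = 1, \ldots, m;\ k = 1, \ldots, r), \]
where $\mu, \alpha_1, \ldots, \alpha_q$ are unknown parameters, the $b_{ij}$'s are uncorrelated zero-mean random variables with common variance $\sigma_b^2 > 0$, and the $d_{ijk}$'s are uncorrelated (with each other and with the $b_{ij}$'s) zero-mean random variables with common variance $\sigma^2 > 0$. Let $\psi = \sigma_b^2/\sigma^2$ and suppose that $\psi$ is known.
(a) This model is a special case of the general mixed linear model $\mathbf{y} = \mathbf{X}\boldsymbol\beta + \mathbf{Z}\mathbf{b} + \mathbf{d}$, where $\mathrm{var}(\mathbf{d}) = \sigma^2\mathbf{I}$ and $\mathrm{var}(\mathbf{b}) = \sigma^2\mathbf{G}$ for some positive definite matrix $\mathbf{G}$. Specialize the matrices $\mathbf{X}$, $\mathbf{Z}$, and $\mathbf{G}$ for this model.
(b) Is $\alpha_1 - \alpha_2 + b_{31}$ (assuming that $q \ge 3$) a predictable function? Explain why or why not.
(c) Specialize the mixed-model equations for this model.
(d) Obtain specialized expressions for $\mathbf{W} = (1/\sigma^2)\mathrm{var}(\mathbf{y})$ and $\mathbf{W}^{-1}$.
(e) It can be shown that one solution to the Aitken equations corresponding to this model is $\tilde{\boldsymbol\beta} = (0, \bar y_{1\cdot\cdot}, \ldots, \bar y_{q\cdot\cdot})$. Using this fact, give a specialized expression for the BLUP of $\mu + \alpha_i + b_{ij}$.
(f) Obtain a specialized expression for the prediction error variance of the BLUP of $\mu + \alpha_i + b_{ij}$.
Solution
(a)
\[ \mathbf{X} = (\mathbf{1}_{qmr},\ \mathbf{I}_q \otimes \mathbf{1}_{mr}), \quad \boldsymbol\beta = \begin{pmatrix} \mu \\ \alpha_1 \\ \vdots \\ \alpha_q \end{pmatrix}, \quad \mathbf{Z} = \mathbf{I}_{qm} \otimes \mathbf{1}_r, \quad \mathbf{b} = \begin{pmatrix} b_{11} \\ b_{12} \\ \vdots \\ b_{qm} \end{pmatrix}, \quad \mathbf{G} = (\sigma_b^2/\sigma^2)\mathbf{I}_{qm}. \]
(b) Yes, by Theorem 13.1.2, because $\alpha_1 - \alpha_2$ is estimable.
(c)
\[ \begin{pmatrix} qmr & mr\mathbf{1}_q^T & r\mathbf{1}_{qm}^T \\ mr\mathbf{1}_q & mr\mathbf{I}_q & r\mathbf{I}_q \otimes \mathbf{1}_m^T \\ r\mathbf{1}_{qm} & r\mathbf{I}_q \otimes \mathbf{1}_m & \left(r + \dfrac{\sigma^2}{\sigma_b^2}\right)\mathbf{I}_{qm} \end{pmatrix}\begin{pmatrix} \mu \\ \alpha_1 \\ \vdots \\ \alpha_q \\ b_{11} \\ \vdots \\ b_{qm} \end{pmatrix} = \begin{pmatrix} y_{\cdot\cdot\cdot} \\ y_{1\cdot\cdot} \\ \vdots \\ y_{q\cdot\cdot} \\ y_{11\cdot} \\ \vdots \\ y_{qm\cdot} \end{pmatrix}. \]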
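(d) $\mathrm{var}(\mathbf{y}) = \mathbf{Z}\,\mathrm{var}(\mathbf{b})\,\mathbf{Z}^T + \mathrm{var}(\mathbf{d}) = (\mathbf{I}_{qm} \otimes \mathbf{1}_r)(\sigma_b^2\mathbf{I}_{qm})(\mathbf{I}_{qm} \otimes \mathbf{1}_r^T) + \sigma^2\mathbf{I}_{qmr} = \sigma_b^2(\mathbf{I}_{qm} \otimes \mathbf{J}_r) + \sigma^2(\mathbf{I}_{qm} \otimes \mathbf{I}_r) = \sigma^2\,\mathbf{I}_{qm} \otimes (\mathbf{I}_r + \psi\mathbf{J}_r)$, where $\psi = \sigma_b^2/\sigma^2$. Thus $\mathbf{W} = (1/\sigma^2)\mathrm{var}(\mathbf{y}) = \mathbf{I}_{qm} \otimes (\mathbf{I}_r + \psi\mathbf{J}_r)$. From Example 2.9-1, we find that
\[ \mathbf{W}^{-1} = \mathbf{I}_{qm}^{-1} \otimes (\mathbf{I}_r + \psi\mathbf{J}_r)^{-1} = \mathbf{I}_{qm} \otimes \left(\mathbf{I}_r - \frac{\psi}{1 + r\psi}\mathbf{J}_r\right). \]
(e) Using the given solution to the Aitken equations, $\widetilde{\mu + \alpha_i} = \bar y_{i\cdot\cdot}$. Furthermore,
\[ \tilde b_{ij} = \mathbf{u}_{(i-1)q+j}^{(qm)T}\mathbf{G}\mathbf{Z}^T(\mathbf{I} + \mathbf{Z}\mathbf{G}\mathbf{Z}^T)^{-1}(\mathbf{y} - \mathbf{X}\tilde{\boldsymbol\beta}) \]
\[ = \mathbf{u}_{(i-1)q+j}^{(qm)T}(\psi\mathbf{I}_{qm})(\mathbf{I}_{qm} \otimes \mathbf{1}_r^T)\left[\mathbf{I}_{qm} \otimes \left(\mathbf{I}_r - \frac{\psi}{1 + r\psi}\mathbf{J}_r\right)\right](\mathbf{y} - \mathbf{X}\tilde{\boldsymbol\beta}) \]
\[ = \psi\,\mathbf{u}_{(i-1)q+j}^{(qm)T}\left[\mathbf{I}_{qm} \otimes \left(\mathbf{1}_r^T - \frac{r\psi}{1 + r\psi}\mathbf{1}_r^T\right)\right](\mathbf{y} - \mathbf{X}\tilde{\boldsymbol\beta}) \]
\[ = \frac{\psi}{1 + r\psi}\,\mathbf{u}_{(i-1)q+j}^{(qm)T}(\mathbf{I}_{qm} \otimes \mathbf{1}_r^T)(\mathbf{y} - \mathbf{X}\tilde{\boldsymbol\beta}) \]
\[ = \frac{r\psi}{1 + r\psi}(\bar y_{ij\cdot} - \bar y_{i\cdot\cdot}). \]
Thus, the BLUP of $\mu + \alpha_i + b_{ij}$ is $\bar y_{i\cdot\cdot} + \frac{r\psi}{1 + r\psi}(\bar y_{ij\cdot} - \bar y_{i\cdot\cdot}) = \frac{r\psi}{1 + r\psi}\bar y_{ij\cdot} + \frac{1}{1 + r\psi}\bar y_{i\cdot\cdot}$.
(f) Because $\bar y_{i\cdot\cdot} = \mu + \alpha_i + \bar b_{i\cdot} + \bar d_{i\cdot\cdot}$ and $\bar y_{ij\cdot} = \mu + \alpha_i + b_{ij} + \bar d_{ij\cdot}$, we obtain
\[ \mathrm{var}(\bar y_{i\cdot\cdot}) = \frac{\sigma_b^2}{m} + \frac{\sigma^2}{mr}, \qquad \mathrm{var}(\bar y_{ij\cdot}) = \sigma_b^2 + \frac{\sigma^2}{r}, \]
\[ \mathrm{cov}(\bar y_{i\cdot\cdot}, \bar y_{ij\cdot}) = \mathrm{cov}(\bar b_{i\cdot}, b_{ij}) + \mathrm{cov}(\bar d_{i\cdot\cdot}, \bar d_{ij\cdot}) = \frac{\sigma_b^2}{m} + \frac{\sigma^2}{mr}, \]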
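\[ \mathrm{cov}(\bar y_{i\cdot\cdot}, b_{ij}) = \frac{\sigma_b^2}{m}, \qquad \mathrm{cov}(\bar y_{ij\cdot}, b_{ij}) = \sigma_b^2. \]
Consequently, the prediction error variance associated with the BLUP of $\mu + \alpha_i + b_{ij}$ is
\[ \mathrm{var}\left(\frac{1}{1 + r\psi}\bar y_{i\cdot\cdot} + \frac{r\psi}{1 + r\psi}\bar y_{ij\cdot} - b_{ij}\right) \]
\[ = \left(\frac{1}{1 + r\psi}\right)^2\mathrm{var}(\bar y_{i\cdot\cdot}) + \left(\frac{r\psi}{1 + r\psi}\right)^2\mathrm{var}(\bar y_{ij\cdot}) + \mathrm{var}(b_{ij}) + \frac{2r\psi}{(1 + r\psi)^2}\mathrm{cov}(\bar y_{i\cdot\cdot}, \bar y_{ij\cdot}) - \frac{2}{1 + r\psi}\mathrm{cov}(\bar y_{i\cdot\cdot}, b_{ij}) - \frac{2r\psi}{1 + r\psi}\mathrm{cov}(\bar y_{ij\cdot}, b_{ij}) \]
\[ = \frac{1}{(1 + r\psi)^2}\left(\frac{\sigma_b^2}{m} + \frac{\sigma^2}{mr}\right) + \frac{(r\psi)^2}{(1 + r\psi)^2}\left(\sigma_b^2 + \frac{\sigma^2}{r}\right) + \sigma_b^2 + \frac{2r\psi}{(1 + r\psi)^2}\left(\frac{\sigma_b^2}{m} + \frac{\sigma^2}{mr}\right) - \frac{2}{1 + r\psi}\cdot\frac{\sigma_b^2}{m} - \frac{2r\psi}{1 + r\psi}\sigma_b^2 \]
\[ = \frac{m - 1}{m(1 + r\psi)^2}\sigma_b^2 + \frac{1 + mr^2\psi^2 + 2r\psi}{mr(1 + r\psi)^2}\sigma^2. \]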
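The closed form in part (f) can be compared with a direct matrix computation. The sketch below assumes NumPy and observations ordered with $k$ varying fastest, then $j$, then $i$ (so that $\mathbf{Z} = \mathbf{I}_{qm} \otimes \mathbf{1}_r$). The function names and the numerical values are illustrative only.

```python
import numpy as np

def nested_pev_closed_form(m, r, sigma2_b, sigma2):
    """Closed-form prediction error variance from part (f)."""
    psi = sigma2_b / sigma2
    d = (1.0 + r * psi) ** 2
    return ((m - 1) / (m * d)) * sigma2_b \
        + ((1.0 + m * r**2 * psi**2 + 2.0 * r * psi) / (m * r * d)) * sigma2

def nested_pev_matrix(q, m, r, sigma2_b, sigma2, i=0, j=0):
    """var(a @ y - b_ij), where a @ y is the BLUP of mu + alpha_i + b_ij."""
    psi = sigma2_b / sigma2
    n = q * m * r
    Z = np.kron(np.eye(q * m), np.ones((r, 1)))       # I_{qm} kron 1_r
    V = sigma2_b * Z @ Z.T + sigma2 * np.eye(n)      # var(y)
    w = r * psi / (1.0 + r * psi)                    # weight on ybar_ij.
    a = np.zeros(n)
    a[i * m * r:(i + 1) * m * r] += (1.0 - w) / (m * r)   # (1 - w) * ybar_i..
    cell = i * m + j
    a[cell * r:(cell + 1) * r] += w / r                   # w * ybar_ij.
    e = np.zeros(q * m)
    e[cell] = 1.0
    cov_y_b = sigma2_b * Z @ e                       # cov(y, b_ij)
    return a @ V @ a - 2.0 * a @ cov_y_b + sigma2_b

# Arbitrary illustrative dimensions and variance components.
pev_formula = nested_pev_closed_form(m=4, r=2, sigma2_b=1.5, sigma2=2.0)
pev_direct = nested_pev_matrix(q=3, m=4, r=2, sigma2_b=1.5, sigma2=2.0, i=1, j=2)
```

The fixed effects drop out of the prediction error because the weights on the observations in group $i$ sum to one.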
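Exercise 22 Consider a mixed effects model with known $\psi$, where $\mathbf{G} = \mathbf{G}(\psi)$ and $\mathbf{R} = \mathbf{R}(\psi)$ are positive definite. Let $\mathbf{B}$ be the unique $q \times q$ positive definite matrix such that $\mathbf{B}^T\mathbf{B} = \mathbf{G}^{-1}$ (such a matrix exists by Theorem 2.15.12). Suppose that we transform this model by pre-multiplying all of its terms by $\mathbf{R}^{-1/2}$, then augment the transformed model with the model $\mathbf{y}_* = \mathbf{B}\mathbf{b} + \mathbf{h}$, where $\mathbf{y}_* = \mathbf{0}_q$ and $\mathbf{h}$ is a random $q$-vector satisfying $E(\mathbf{h}) = \mathbf{0}$ and $\mathrm{var}(\mathbf{h}) = \sigma^2\mathbf{I}$. Show that any solution to the normal equations for this transformed/augmented model is also a solution to the mixed-model equations associated with the original mixed effects model, thus establishing that any computer software that can perform ordinary least squares regression can be coerced into obtaining BLUPs under a mixed effects model.
Solution The transformed/augmented model is
\[ \begin{pmatrix} \mathbf{R}^{-1/2}\mathbf{y} \\ \mathbf{y}_* \end{pmatrix} = \begin{pmatrix} \mathbf{R}^{-1/2}\mathbf{X} & \mathbf{R}^{-1/2}\mathbf{Z} \\ \mathbf{0} & \mathbf{B} \end{pmatrix}\begin{pmatrix} \boldsymbol\beta \\ \mathbf{b} \end{pmatrix} + \begin{pmatrix} \mathbf{R}^{-1/2}\mathbf{d} \\ \mathbf{h} \end{pmatrix}, \]
where
\[ E\begin{pmatrix} \mathbf{R}^{-1/2}\mathbf{d} \\ \mathbf{h} \end{pmatrix} = \mathbf{0}_{n+q}, \qquad \mathrm{var}\begin{pmatrix} \mathbf{R}^{-1/2}\mathbf{d} \\ \mathbf{h} \end{pmatrix} = \sigma^2\mathbf{I}_{n+q}. \]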
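The augmentation device of Exercise 22 can be tried out numerically. The following is a minimal sketch, assuming NumPy, simulated inputs, and a full-column-rank $\mathbf{X}$ so that the solution is unique; the function names `blup_via_ols` and `solve_mme` are hypothetical. It fits the transformed/augmented model by ordinary least squares and compares the result with a direct solution of the mixed-model equations.

```python
import numpy as np

def sym_power(A, power):
    """Power of a symmetric positive definite matrix via its eigendecomposition."""
    evals, evecs = np.linalg.eigh(A)
    return evecs @ np.diag(evals ** power) @ evecs.T

def blup_via_ols(X, Z, y, G, R):
    """Ordinary least squares on the transformed/augmented model."""
    p, q = X.shape[1], Z.shape[1]
    R_inv_half = sym_power(R, -0.5)
    B = sym_power(G, -0.5)                     # symmetric p.d., B^T B = G^{-1}
    design = np.block([[R_inv_half @ X, R_inv_half @ Z],
                       [np.zeros((q, p)), B]])
    response = np.concatenate([R_inv_half @ y, np.zeros(q)])   # y_* = 0_q
    sol, *_ = np.linalg.lstsq(design, response, rcond=None)
    return sol[:p], sol[p:]

def solve_mme(X, Z, y, G, R):
    """Direct solution of the mixed-model equations."""
    p = X.shape[1]
    Ri = np.linalg.inv(R)
    coef = np.block([[X.T @ Ri @ X, X.T @ Ri @ Z],
                     [Z.T @ Ri @ X, Z.T @ Ri @ Z + np.linalg.inv(G)]])
    rhs = np.concatenate([X.T @ Ri @ y, Z.T @ Ri @ y])
    sol = np.linalg.solve(coef, rhs)
    return sol[:p], sol[p:]

# Arbitrary illustrative inputs (not from the text).
rng = np.random.default_rng(2)
n = 8
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, 3))
y = rng.normal(size=n)
G = np.diag([0.5, 1.0, 2.0])
A = rng.normal(size=(n, n))
R = A @ A.T + n * np.eye(n)                    # a positive definite R

beta_ols, b_ols = blup_via_ols(X, Z, y, G, R)
beta_mme, b_mme = solve_mme(X, Z, y, G, R)
```

When $\mathbf{X}$ is not of full column rank, `lstsq` returns one particular (minimum-norm) solution of the normal equations, which by the exercise is still a solution of the mixed-model equations.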
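The normal equations corresponding to this model are
\[ \begin{pmatrix} \mathbf{R}^{-1/2}\mathbf{X} & \mathbf{R}^{-1/2}\mathbf{Z} \\ \mathbf{0} & \mathbf{B} \end{pmatrix}^T\begin{pmatrix} \mathbf{R}^{-1/2}\mathbf{X} & \mathbf{R}^{-1/2}\mathbf{Z} \\ \mathbf{0} & \mathbf{B} \end{pmatrix}\begin{pmatrix} \boldsymbol\beta \\ \mathbf{b} \end{pmatrix} = \begin{pmatrix} \mathbf{R}^{-1/2}\mathbf{X} & \mathbf{R}^{-1/2}\mathbf{Z} \\ \mathbf{0} & \mathbf{B} \end{pmatrix}^T\begin{pmatrix} \mathbf{R}^{-1/2}\mathbf{y} \\ \mathbf{y}_* \end{pmatrix}, \]
i.e.,
\[ \begin{pmatrix} \mathbf{X}^T\mathbf{R}^{-1/2} & \mathbf{0}^T \\ \mathbf{Z}^T\mathbf{R}^{-1/2} & \mathbf{B}^T \end{pmatrix}\begin{pmatrix} \mathbf{R}^{-1/2}\mathbf{X} & \mathbf{R}^{-1/2}\mathbf{Z} \\ \mathbf{0} & \mathbf{B} \end{pmatrix}\begin{pmatrix} \boldsymbol\beta \\ \mathbf{b} \end{pmatrix} = \begin{pmatrix} \mathbf{X}^T\mathbf{R}^{-1/2} & \mathbf{0}^T \\ \mathbf{Z}^T\mathbf{R}^{-1/2} & \mathbf{B}^T \end{pmatrix}\begin{pmatrix} \mathbf{R}^{-1/2}\mathbf{y} \\ \mathbf{0} \end{pmatrix}, \]
i.e.,
\[ \begin{pmatrix} \mathbf{X}^T\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}^T\mathbf{R}^{-1}\mathbf{Z} \\ \mathbf{Z}^T\mathbf{R}^{-1}\mathbf{X} & \mathbf{Z}^T\mathbf{R}^{-1}\mathbf{Z} + \mathbf{G}^{-1} \end{pmatrix}\begin{pmatrix} \boldsymbol\beta \\ \mathbf{b} \end{pmatrix} = \begin{pmatrix} \mathbf{X}^T\mathbf{R}^{-1}\mathbf{y} \\ \mathbf{Z}^T\mathbf{R}^{-1}\mathbf{y} \end{pmatrix}. \]
This last system of equations is the mixed-model equations associated with the original mixed effects model; thus, any solution to the normal equations for the transformed/augmented model is also a solution to the mixed-model equations associated with the original mixed effects model.
Exercise 23 Prove Theorem 13.4.4: Under the positive definite mixed effects model with known $\psi$, let $\dot{\boldsymbol\beta}$ and $\dot{\mathbf{b}}$ be the components of any solution to the mixed-model equations; let $\boldsymbol\tau = \mathbf{C}^T\boldsymbol\beta + \mathbf{F}^T\mathbf{b}$ represent a vector of predictable linear functions; and let $\tilde{\boldsymbol\tau} = \mathbf{C}^T\dot{\boldsymbol\beta} + \mathbf{F}^T\dot{\mathbf{b}}$ be the BLUP of $\boldsymbol\tau$. Then
\[ \mathrm{var}(\tilde{\boldsymbol\tau} - \boldsymbol\tau) = \sigma^2\begin{pmatrix} \mathbf{C} \\ \mathbf{F} \end{pmatrix}^T\begin{pmatrix} \mathbf{X}^T\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}^T\mathbf{R}^{-1}\mathbf{Z} \\ \mathbf{Z}^T\mathbf{R}^{-1}\mathbf{X} & \mathbf{G}^{-1} + \mathbf{Z}^T\mathbf{R}^{-1}\mathbf{Z} \end{pmatrix}^{-}\begin{pmatrix} \mathbf{C} \\ \mathbf{F} \end{pmatrix}, \]
invariant to the choice of generalized inverse of the coefficient matrix of the mixed-model equations. (Hint: For the invariance, use Theorem 3.2.1.)
Solution Observe that the lower right block $\mathbf{S} \equiv \mathbf{G}^{-1} + \mathbf{Z}^T\mathbf{R}^{-1}\mathbf{Z}$ of the coefficient matrix of the mixed-model equations is nonsingular, implying by Theorem 3.3.7b that
\[ \begin{pmatrix} \mathbf{M}^{-} & -\mathbf{M}^{-}\mathbf{X}^T\mathbf{R}^{-1}\mathbf{Z}\mathbf{S}^{-1} \\ -\mathbf{S}^{-1}\mathbf{Z}^T\mathbf{R}^{-1}\mathbf{X}\mathbf{M}^{-} & \mathbf{S}^{-1} + \mathbf{S}^{-1}\mathbf{Z}^T\mathbf{R}^{-1}\mathbf{X}\mathbf{M}^{-}\mathbf{X}^T\mathbf{R}^{-1}\mathbf{Z}\mathbf{S}^{-1} \end{pmatrix}, \]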
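where $\mathbf{M} = \mathbf{X}^T(\mathbf{R}^{-1} - \mathbf{R}^{-1}\mathbf{Z}\mathbf{S}^{-1}\mathbf{Z}^T\mathbf{R}^{-1})\mathbf{X}$, is a generalized inverse of the coefficient matrix. Denote this particular generalized inverse by $\mathbf{T}$. Pre-multiplication and post-multiplication of $\mathbf{T}$ by $(\mathbf{C}^T, \mathbf{F}^T)$ and its transpose, respectively, yields
\[ \mathbf{C}^T\mathbf{M}^{-}\mathbf{C} - \mathbf{C}^T\mathbf{M}^{-}\mathbf{X}^T\mathbf{R}^{-1}\mathbf{Z}\mathbf{S}^{-1}\mathbf{F} - \mathbf{F}^T\mathbf{S}^{-1}\mathbf{Z}^T\mathbf{R}^{-1}\mathbf{X}\mathbf{M}^{-}\mathbf{C} + \mathbf{F}^T\mathbf{S}^{-1}\mathbf{F} + \mathbf{F}^T\mathbf{S}^{-1}\mathbf{Z}^T\mathbf{R}^{-1}\mathbf{X}\mathbf{M}^{-}\mathbf{X}^T\mathbf{R}^{-1}\mathbf{Z}\mathbf{S}^{-1}\mathbf{F} \]
\[ = \mathbf{F}^T\mathbf{S}^{-1}\mathbf{F} + (\mathbf{C}^T - \mathbf{F}^T\mathbf{S}^{-1}\mathbf{Z}^T\mathbf{R}^{-1}\mathbf{X})[\mathbf{X}^T(\mathbf{R}^{-1} - \mathbf{R}^{-1}\mathbf{Z}\mathbf{S}^{-1}\mathbf{Z}^T\mathbf{R}^{-1})\mathbf{X}]^{-}(\mathbf{C}^T - \mathbf{F}^T\mathbf{S}^{-1}\mathbf{Z}^T\mathbf{R}^{-1}\mathbf{X})^T, \]
which matches the expression for $(1/\sigma^2)\mathrm{var}(\tilde{\boldsymbol\tau} - \boldsymbol\tau)$ given in Theorem 13.4.2b. Now let $\mathbf{A}$ denote the coefficient matrix of the mixed-model equations, and let $\mathbf{B}$ denote an arbitrary generalized inverse of $\mathbf{A}$. By Theorem 3.1.2, $\mathbf{B} = \mathbf{T} + \mathbf{U} - \mathbf{T}\mathbf{A}\mathbf{U}\mathbf{A}\mathbf{T}$ for some matrix $\mathbf{U}$ of appropriate dimensions. We obtain
\[ \mathbf{T}\mathbf{A} = \begin{pmatrix} \mathbf{M}^{-} & -\mathbf{M}^{-}\mathbf{X}^T\mathbf{R}^{-1}\mathbf{Z}\mathbf{S}^{-1} \\ -\mathbf{S}^{-1}\mathbf{Z}^T\mathbf{R}^{-1}\mathbf{X}\mathbf{M}^{-} & \mathbf{S}^{-1} + \mathbf{S}^{-1}\mathbf{Z}^T\mathbf{R}^{-1}\mathbf{X}\mathbf{M}^{-}\mathbf{X}^T\mathbf{R}^{-1}\mathbf{Z}\mathbf{S}^{-1} \end{pmatrix}\begin{pmatrix} \mathbf{X}^T\mathbf{R}^{-1}\mathbf{X} & \mathbf{X}^T\mathbf{R}^{-1}\mathbf{Z} \\ \mathbf{Z}^T\mathbf{R}^{-1}\mathbf{X} & \mathbf{G}^{-1} + \mathbf{Z}^T\mathbf{R}^{-1}\mathbf{Z} \end{pmatrix} = \begin{pmatrix} \mathbf{M}^{-}\mathbf{M} & \mathbf{0} \\ \mathbf{0} & \mathbf{I} \end{pmatrix} \]
because $\mathbf{X}(\mathbf{I} - \mathbf{M}^{-}\mathbf{M}) = \mathbf{0}$ by Theorem 11.1.6a. Similarly, we obtain
\[ \mathbf{A}\mathbf{T} = \begin{pmatrix} \mathbf{M}\mathbf{M}^{-} & \mathbf{0} \\ \mathbf{0} & \mathbf{I} \end{pmatrix} \]
because $(\mathbf{I} - \mathbf{M}\mathbf{M}^{-})\mathbf{X}^T = \mathbf{0}$. Therefore, upon partitioning $\mathbf{T}$ and $\mathbf{U}$ appropriately, we have
\[ \mathbf{B} = \begin{pmatrix} \mathbf{T}_{11} & \mathbf{T}_{12} \\ \mathbf{T}_{21} & \mathbf{T}_{22} \end{pmatrix} + \begin{pmatrix} \mathbf{U}_{11} & \mathbf{U}_{12} \\ \mathbf{U}_{21} & \mathbf{U}_{22} \end{pmatrix} - \begin{pmatrix} \mathbf{M}^{-}\mathbf{M} & \mathbf{0} \\ \mathbf{S}^{-1}\mathbf{Z}^T\mathbf{R}^{-1}\mathbf{X}(\mathbf{I} - \mathbf{M}^{-}\mathbf{M}) & \mathbf{I} \end{pmatrix}\begin{pmatrix} \mathbf{U}_{11} & \mathbf{U}_{12} \\ \mathbf{U}_{21} & \mathbf{U}_{22} \end{pmatrix}\begin{pmatrix} \mathbf{M}\mathbf{M}^{-} & (\mathbf{I} - \mathbf{M}\mathbf{M}^{-})\mathbf{X}^T\mathbf{R}^{-1}\mathbf{Z}\mathbf{S}^{-1} \\ \mathbf{0} & \mathbf{I} \end{pmatrix} \]
\[ = \begin{pmatrix} \mathbf{T}_{11} + \mathbf{U}_{11} - \mathbf{M}^{-}\mathbf{M}\mathbf{U}_{11}\mathbf{M}\mathbf{M}^{-} & \mathbf{T}_{12} + \mathbf{U}_{12} - \mathbf{M}^{-}\mathbf{M}\mathbf{U}_{12} \\ \mathbf{T}_{21} + \mathbf{U}_{21} - \mathbf{U}_{21}\mathbf{M}\mathbf{M}^{-} & \mathbf{T}_{22} \end{pmatrix}. \]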
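Finally, the predictability of $\mathbf{C}^T\boldsymbol\beta + \mathbf{F}^T\mathbf{b}$ implies that $\mathbf{C}^T = \mathbf{A}^T\mathbf{X}$ for some matrix $\mathbf{A}$, so
\[ (\mathbf{C}^T, \mathbf{F}^T)\,\mathbf{B}\begin{pmatrix} \mathbf{C} \\ \mathbf{F} \end{pmatrix} = (\mathbf{A}^T\mathbf{X}, \mathbf{F}^T)\begin{pmatrix} \mathbf{T}_{11} + \mathbf{U}_{11} - \mathbf{M}^{-}\mathbf{M}\mathbf{U}_{11}\mathbf{M}\mathbf{M}^{-} & \mathbf{T}_{12} + \mathbf{U}_{12} - \mathbf{M}^{-}\mathbf{M}\mathbf{U}_{12} \\ \mathbf{T}_{21} + \mathbf{U}_{21} - \mathbf{U}_{21}\mathbf{M}\mathbf{M}^{-} & \mathbf{T}_{22} \end{pmatrix}\begin{pmatrix} \mathbf{X}^T\mathbf{A} \\ \mathbf{F} \end{pmatrix} \]
\[ = \mathbf{A}^T\mathbf{X}\mathbf{T}_{11}\mathbf{X}^T\mathbf{A} + \mathbf{A}^T\mathbf{X}(\mathbf{U}_{11} - \mathbf{M}^{-}\mathbf{M}\mathbf{U}_{11}\mathbf{M}\mathbf{M}^{-})\mathbf{X}^T\mathbf{A} + \mathbf{A}^T\mathbf{X}\mathbf{T}_{12}\mathbf{F} + \mathbf{A}^T\mathbf{X}(\mathbf{U}_{12} - \mathbf{M}^{-}\mathbf{M}\mathbf{U}_{12})\mathbf{F} \]
\[ \quad + \mathbf{F}^T\mathbf{T}_{21}\mathbf{X}^T\mathbf{A} + \mathbf{F}^T(\mathbf{U}_{21} - \mathbf{U}_{21}\mathbf{M}\mathbf{M}^{-})\mathbf{X}^T\mathbf{A} + \mathbf{F}^T\mathbf{T}_{22}\mathbf{F} \]
\[ = \mathbf{C}^T\mathbf{T}_{11}\mathbf{C} + \mathbf{C}^T\mathbf{T}_{12}\mathbf{F} + \mathbf{F}^T\mathbf{T}_{21}\mathbf{C} + \mathbf{F}^T\mathbf{T}_{22}\mathbf{F}. \]
This establishes the claimed invariance.
Exercise 24 For Example 13.4.5-1:
(a) Verify the given expressions for $\mathbf{P}_1$, $\mathbf{P}_\alpha$, $\mathbf{P}_Z$, $\mathbf{P}_\gamma$, and $\mathbf{P}_\xi$.
(b) Verify all the expressions for the pairwise products of $\mathbf{P}_1$, $\mathbf{P}_\alpha$, $\mathbf{P}_Z$, $\mathbf{P}_\gamma$, and $\mathbf{P}_\xi$.
(c) Verify that the matrices of the six quadratic forms in decomposition (13.12) are idempotent.
(d) Verify the given expressions for the expected mean squares.
Solution
(a) Making extensive use of Theorems 2.17.4 and 2.17.5 and the fact that $\mathbf{A}^{-} \otimes \mathbf{B}^{-}$ is a generalized inverse of $\mathbf{A} \otimes \mathbf{B}$ (cf. Exercise 3.1l), we obtain
$\mathbf{P}_1 = \mathbf{1}_{qrm}(\mathbf{1}_{qrm}^T\mathbf{1}_{qrm})^{-}\mathbf{1}_{qrm}^T = (1/qrm)\mathbf{J}_{qrm}$,
$\mathbf{P}_\alpha = (\mathbf{I}_q \otimes \mathbf{1}_{rm})[(\mathbf{I}_q \otimes \mathbf{1}_{rm})^T(\mathbf{I}_q \otimes \mathbf{1}_{rm})]^{-}(\mathbf{I}_q \otimes \mathbf{1}_{rm})^T = (\mathbf{I}_q \otimes \mathbf{1}_{rm})(\mathbf{I}_q \otimes rm)^{-}(\mathbf{I}_q \otimes \mathbf{1}_{rm})^T = (1/rm)(\mathbf{I}_q \otimes \mathbf{J}_{rm})$,
$\mathbf{P}_Z = (\mathbf{I}_{qr} \otimes \mathbf{1}_m)[(\mathbf{I}_{qr} \otimes \mathbf{1}_m)^T(\mathbf{I}_{qr} \otimes \mathbf{1}_m)]^{-}(\mathbf{I}_{qr} \otimes \mathbf{1}_m)^T = (\mathbf{I}_{qr} \otimes \mathbf{1}_m)(\mathbf{I}_{qr} \otimes m)^{-}(\mathbf{I}_{qr} \otimes \mathbf{1}_m)^T = (1/m)(\mathbf{I}_{qr} \otimes \mathbf{J}_m)$,
$\mathbf{P}_\gamma = (\mathbf{1}_{qr} \otimes \mathbf{I}_m)[(\mathbf{1}_{qr} \otimes \mathbf{I}_m)^T(\mathbf{1}_{qr} \otimes \mathbf{I}_m)]^{-}(\mathbf{1}_{qr} \otimes \mathbf{I}_m)^T = (\mathbf{1}_{qr} \otimes \mathbf{I}_m)(qr \otimes \mathbf{I}_m)^{-}(\mathbf{1}_{qr} \otimes \mathbf{I}_m)^T = (1/qr)(\mathbf{J}_{qr} \otimes \mathbf{I}_m)$,
$\mathbf{P}_\xi = (\mathbf{I}_q \otimes \mathbf{1}_r \otimes \mathbf{I}_m)[(\mathbf{I}_q \otimes \mathbf{1}_r \otimes \mathbf{I}_m)^T(\mathbf{I}_q \otimes \mathbf{1}_r \otimes \mathbf{I}_m)]^{-}(\mathbf{I}_q \otimes \mathbf{1}_r \otimes \mathbf{I}_m)^T = (\mathbf{I}_q \otimes \mathbf{1}_r \otimes \mathbf{I}_m)(\mathbf{I}_q \otimes r \otimes \mathbf{I}_m)^{-}(\mathbf{I}_q \otimes \mathbf{1}_r \otimes \mathbf{I}_m)^T = (1/r)(\mathbf{I}_q \otimes \mathbf{J}_r \otimes \mathbf{I}_m)$.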
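(b) Using Theorem 2.17.5 repeatedly, we obtain
$\mathbf{P}_\alpha\mathbf{P}_1 = (1/rm)(\mathbf{I}_q \otimes \mathbf{J}_{rm})(1/qrm)\mathbf{J}_{qrm} = (1/qr^2m^2)(\mathbf{I}_q \otimes \mathbf{J}_{rm})(\mathbf{J}_q \otimes \mathbf{J}_{rm}) = (1/qr^2m^2)(\mathbf{J}_q \otimes rm\mathbf{J}_{rm}) = (1/qrm)\mathbf{J}_{qrm} = \mathbf{P}_1$,
$\mathbf{P}_Z\mathbf{P}_1 = (1/m)(\mathbf{I}_{qr} \otimes \mathbf{J}_m)(1/qrm)(\mathbf{J}_{qr} \otimes \mathbf{J}_m) = (1/qrm^2)(\mathbf{J}_{qr} \otimes m\mathbf{J}_m) = (1/qrm)\mathbf{J}_{qrm} = \mathbf{P}_1$,
$\mathbf{P}_\gamma\mathbf{P}_1 = (1/qr)(\mathbf{J}_{qr} \otimes \mathbf{I}_m)(1/qrm)(\mathbf{J}_{qr} \otimes \mathbf{J}_m) = (1/q^2r^2m)(qr\mathbf{J}_{qr} \otimes \mathbf{J}_m) = (1/qrm)\mathbf{J}_{qrm} = \mathbf{P}_1$,
$\mathbf{P}_\xi\mathbf{P}_1 = (1/r)(\mathbf{I}_q \otimes \mathbf{J}_r \otimes \mathbf{I}_m)(1/qrm)(\mathbf{J}_q \otimes \mathbf{J}_r \otimes \mathbf{J}_m) = (1/qr^2m)(\mathbf{J}_q \otimes r\mathbf{J}_r \otimes \mathbf{J}_m) = (1/qrm)\mathbf{J}_{qrm} = \mathbf{P}_1$,
$\mathbf{P}_\alpha\mathbf{P}_Z = (1/rm)(\mathbf{I}_q \otimes \mathbf{J}_{rm})(1/m)(\mathbf{I}_{qr} \otimes \mathbf{J}_m) = (1/rm^2)(\mathbf{I}_q \otimes \mathbf{J}_r \otimes \mathbf{J}_m)(\mathbf{I}_q \otimes \mathbf{I}_r \otimes \mathbf{J}_m) = (1/rm)(\mathbf{I}_q \otimes \mathbf{J}_{rm}) = \mathbf{P}_\alpha$,
$\mathbf{P}_\alpha\mathbf{P}_\gamma = (1/rm)(\mathbf{I}_q \otimes \mathbf{J}_{rm})(1/qr)(\mathbf{J}_{qr} \otimes \mathbf{I}_m) = (1/qr^2m)(\mathbf{I}_q \otimes \mathbf{J}_r \otimes \mathbf{J}_m)(\mathbf{J}_q \otimes \mathbf{J}_r \otimes \mathbf{I}_m) = (1/qrm)\mathbf{J}_{qrm} = \mathbf{P}_1$,
$\mathbf{P}_\alpha\mathbf{P}_\xi = (1/rm)(\mathbf{I}_q \otimes \mathbf{J}_{rm})(1/r)(\mathbf{I}_q \otimes \mathbf{J}_r \otimes \mathbf{I}_m) = (1/r^2m)(\mathbf{I}_q \otimes \mathbf{J}_r \otimes \mathbf{J}_m)(\mathbf{I}_q \otimes \mathbf{J}_r \otimes \mathbf{I}_m) = (1/rm)(\mathbf{I}_q \otimes \mathbf{J}_{rm}) = \mathbf{P}_\alpha$,
$\mathbf{P}_Z\mathbf{P}_\gamma = (1/m)(\mathbf{I}_{qr} \otimes \mathbf{J}_m)(1/qr)(\mathbf{J}_{qr} \otimes \mathbf{I}_m) = (1/qrm)\mathbf{J}_{qrm} = \mathbf{P}_1$,
$\mathbf{P}_Z\mathbf{P}_\xi = (1/m)(\mathbf{I}_{qr} \otimes \mathbf{J}_m)(1/r)(\mathbf{I}_q \otimes \mathbf{J}_r \otimes \mathbf{I}_m) = (1/rm)(\mathbf{I}_q \otimes \mathbf{I}_r \otimes \mathbf{J}_m)(\mathbf{I}_q \otimes \mathbf{J}_r \otimes \mathbf{I}_m) = (1/rm)(\mathbf{I}_q \otimes \mathbf{J}_{rm}) = \mathbf{P}_\alpha$,
$\mathbf{P}_\gamma\mathbf{P}_\xi = (1/qr)(\mathbf{J}_{qr} \otimes \mathbf{I}_m)(1/r)(\mathbf{I}_q \otimes \mathbf{J}_r \otimes \mathbf{I}_m) = (1/qr^2)(\mathbf{J}_q \otimes \mathbf{J}_r \otimes \mathbf{I}_m)(\mathbf{I}_q \otimes \mathbf{J}_r \otimes \mathbf{I}_m) = (1/qr)(\mathbf{J}_{qr} \otimes \mathbf{I}_m) = \mathbf{P}_\gamma$.
(c) We know already that $\mathbf{P}_1$ is idempotent. Using the results from part (b), we obtain
$(\mathbf{P}_\alpha - \mathbf{P}_1)(\mathbf{P}_\alpha - \mathbf{P}_1) = \mathbf{P}_\alpha - \mathbf{P}_1 - \mathbf{P}_1 + \mathbf{P}_1 = \mathbf{P}_\alpha - \mathbf{P}_1$,
$(\mathbf{P}_Z - \mathbf{P}_\alpha)(\mathbf{P}_Z - \mathbf{P}_\alpha) = \mathbf{P}_Z - \mathbf{P}_\alpha - \mathbf{P}_\alpha + \mathbf{P}_\alpha = \mathbf{P}_Z - \mathbf{P}_\alpha$,
$(\mathbf{P}_\gamma - \mathbf{P}_1)(\mathbf{P}_\gamma - \mathbf{P}_1) = \mathbf{P}_\gamma - \mathbf{P}_1 - \mathbf{P}_1 + \mathbf{P}_1 = \mathbf{P}_\gamma - \mathbf{P}_1$,
$(\mathbf{P}_\xi - \mathbf{P}_\alpha - \mathbf{P}_\gamma + \mathbf{P}_1)(\mathbf{P}_\xi - \mathbf{P}_\alpha - \mathbf{P}_\gamma + \mathbf{P}_1) = \mathbf{P}_\xi + \mathbf{P}_\alpha + \mathbf{P}_\gamma + \mathbf{P}_1 - 2\mathbf{P}_\alpha - 2\mathbf{P}_\gamma + 2\mathbf{P}_1 + 2\mathbf{P}_1 - 2\mathbf{P}_1 - 2\mathbf{P}_1 = \mathbf{P}_\xi - \mathbf{P}_\alpha - \mathbf{P}_\gamma + \mathbf{P}_1$,
$(\mathbf{I} - \mathbf{P}_Z + \mathbf{P}_\alpha - \mathbf{P}_\xi)(\mathbf{I} - \mathbf{P}_Z + \mathbf{P}_\alpha - \mathbf{P}_\xi) = \mathbf{I} + \mathbf{P}_Z + \mathbf{P}_\alpha + \mathbf{P}_\xi - 2\mathbf{P}_Z + 2\mathbf{P}_\alpha - 2\mathbf{P}_\xi - 2\mathbf{P}_\alpha + 2\mathbf{P}_\alpha - 2\mathbf{P}_\alpha = \mathbf{I} - \mathbf{P}_Z + \mathbf{P}_\alpha - \mathbf{P}_\xi$.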
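The projection-matrix identities in parts (a) to (c) are easy to confirm numerically for small dimensions. The sketch below assumes NumPy and arbitrary small values of $q$, $r$, and $m$; the function name `split_plot_projectors` is illustrative. It builds the five matrices from their Kronecker-product forms so that the pairwise products and the idempotency of the six quadratic-form matrices can be checked.

```python
import numpy as np

def split_plot_projectors(q, r, m):
    """Kronecker-product forms of P_1, P_alpha, P_Z, P_gamma, P_xi."""
    I = np.eye
    J = lambda k: np.ones((k, k))
    P1 = J(q * r * m) / (q * r * m)
    Pa = np.kron(I(q), J(r * m)) / (r * m)
    Pz = np.kron(I(q * r), J(m)) / m
    Pg = np.kron(J(q * r), I(m)) / (q * r)
    Px = np.kron(I(q), np.kron(J(r), I(m))) / r
    return P1, Pa, Pz, Pg, Px

q, r, m = 2, 3, 2          # arbitrary small dimensions
P1, Pa, Pz, Pg, Px = split_plot_projectors(q, r, m)
identity = np.eye(q * r * m)

# Matrices of the six quadratic forms in the decomposition
forms = [P1, Pa - P1, Pz - Pa, Pg - P1, Px - Pa - Pg + P1, identity - Pz + Pa - Px]
ranks = [round(float(np.trace(F))) for F in forms]   # trace = rank for idempotent matrices
```

For idempotent matrices the trace equals the rank, so `ranks` gives the degrees of freedom $1$, $q - 1$, $q(r - 1)$, $m - 1$, $(q - 1)(m - 1)$, and $q(r - 1)(m - 1)$.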
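(d) First observe that
\[ E(\bar y_{\cdot\cdot\cdot}) = \frac{1}{qrm}\sum_{i=1}^q\sum_{j=1}^r\sum_{k=1}^m (\mu + \alpha_i + \gamma_k + \xi_{ik}) = \mu + \bar\alpha_\cdot + \bar\gamma_\cdot + \bar\xi_{\cdot\cdot}, \]
\[ E(\bar y_{i\cdot\cdot}) = \frac{1}{rm}\sum_{j=1}^r\sum_{k=1}^m (\mu + \alpha_i + \gamma_k + \xi_{ik}) = \mu + \alpha_i + \bar\gamma_\cdot + \bar\xi_{i\cdot}, \]
\[ E(\bar y_{ij\cdot}) = \frac{1}{m}\sum_{k=1}^m (\mu + \alpha_i + \gamma_k + \xi_{ik}) = \mu + \alpha_i + \bar\gamma_\cdot + \bar\xi_{i\cdot}, \]
\[ E(\bar y_{\cdot\cdot k}) = \frac{1}{qr}\sum_{i=1}^q\sum_{j=1}^r (\mu + \alpha_i + \gamma_k + \xi_{ik}) = \mu + \bar\alpha_\cdot + \gamma_k + \bar\xi_{\cdot k}, \]
\[ E(\bar y_{i\cdot k}) = \frac{1}{r}\sum_{j=1}^r (\mu + \alpha_i + \gamma_k + \xi_{ik}) = \mu + \alpha_i + \gamma_k + \xi_{ik}. \]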
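Then
\[ E\left[rm\sum_{i=1}^q (\bar y_{i\cdot\cdot} - \bar y_{\cdot\cdot\cdot})^2\right] = rm\sum_{i=1}^q [E(\bar y_{i\cdot\cdot}) - E(\bar y_{\cdot\cdot\cdot})]^2 + \mathrm{tr}\{[(1/rm)(\mathbf{I}_q \otimes \mathbf{J}_{rm}) - (1/qrm)\mathbf{J}_{qrm}]\,\sigma^2[\mathbf{I}_{qrm} + (\sigma_b^2/\sigma^2)(\mathbf{I}_{qr} \otimes \mathbf{J}_m)]\} \]
\[ = rm\sum_{i=1}^q [(\alpha_i - \bar\alpha_\cdot) + (\bar\xi_{i\cdot} - \bar\xi_{\cdot\cdot})]^2 + \sigma^2\,\mathrm{tr}[(1/rm)(\mathbf{I}_q \otimes \mathbf{J}_{rm}) - (1/qrm)\mathbf{J}_{qrm}] + \sigma_b^2\,\mathrm{tr}[(1/rm)(\mathbf{I}_q \otimes \mathbf{J}_r \otimes m\mathbf{J}_m) - (1/qrm)(\mathbf{J}_{qr} \otimes m\mathbf{J}_m)] \]
\[ = rm\sum_{i=1}^q [(\alpha_i - \bar\alpha_\cdot) + (\bar\xi_{i\cdot} - \bar\xi_{\cdot\cdot})]^2 + (q - 1)(\sigma^2 + m\sigma_b^2), \]
\[ E\left[m\sum_{i=1}^q\sum_{j=1}^r (\bar y_{ij\cdot} - \bar y_{i\cdot\cdot})^2\right] = m\sum_{i=1}^q\sum_{j=1}^r [E(\bar y_{ij\cdot}) - E(\bar y_{i\cdot\cdot})]^2 + \mathrm{tr}\{[(1/m)(\mathbf{I}_{qr} \otimes \mathbf{J}_m) - (1/rm)(\mathbf{I}_q \otimes \mathbf{J}_{rm})]\,\sigma^2[\mathbf{I}_{qrm} + (\sigma_b^2/\sigma^2)(\mathbf{I}_{qr} \otimes \mathbf{J}_m)]\} \]
\[ = 0 + \sigma^2\,\mathrm{tr}[(1/m)(\mathbf{I}_{qr} \otimes \mathbf{J}_m) - (1/rm)(\mathbf{I}_q \otimes \mathbf{J}_{rm})] + \sigma_b^2\,\mathrm{tr}[(1/m)(\mathbf{I}_{qr} \otimes m\mathbf{J}_m) - (1/rm)(\mathbf{I}_q \otimes \mathbf{J}_r \otimes m\mathbf{J}_m)] \]
\[ = q(r - 1)(\sigma^2 + m\sigma_b^2), \]
\[ E\left[qr\sum_{k=1}^m (\bar y_{\cdot\cdot k} - \bar y_{\cdot\cdot\cdot})^2\right] = qr\sum_{k=1}^m [E(\bar y_{\cdot\cdot k}) - E(\bar y_{\cdot\cdot\cdot})]^2 + \mathrm{tr}\{[(1/qr)(\mathbf{J}_{qr} \otimes \mathbf{I}_m) - (1/qrm)\mathbf{J}_{qrm}]\,\sigma^2[\mathbf{I}_{qrm} + (\sigma_b^2/\sigma^2)(\mathbf{I}_{qr} \otimes \mathbf{J}_m)]\} \]
\[ = qr\sum_{k=1}^m [(\gamma_k - \bar\gamma_\cdot) + (\bar\xi_{\cdot k} - \bar\xi_{\cdot\cdot})]^2 + \sigma^2\,\mathrm{tr}[(1/qr)(\mathbf{J}_{qr} \otimes \mathbf{I}_m) - (1/qrm)\mathbf{J}_{qrm}] + \sigma_b^2\,\mathrm{tr}[(1/qr)(\mathbf{J}_{qr} \otimes \mathbf{J}_m) - (1/qrm)(\mathbf{J}_{qr} \otimes m\mathbf{J}_m)] \]
\[ = qr\sum_{k=1}^m [(\gamma_k - \bar\gamma_\cdot) + (\bar\xi_{\cdot k} - \bar\xi_{\cdot\cdot})]^2 + (m - 1)\sigma^2, \]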
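\[ E\left[r\sum_{i=1}^q\sum_{k=1}^m (\bar y_{i\cdot k} - \bar y_{i\cdot\cdot} - \bar y_{\cdot\cdot k} + \bar y_{\cdot\cdot\cdot})^2\right] = r\sum_{i=1}^q\sum_{k=1}^m [E(\bar y_{i\cdot k}) - E(\bar y_{i\cdot\cdot}) - E(\bar y_{\cdot\cdot k}) + E(\bar y_{\cdot\cdot\cdot})]^2 \]
\[ \quad + \mathrm{tr}\{[(1/r)(\mathbf{I}_q \otimes \mathbf{J}_r \otimes \mathbf{I}_m) - (1/rm)(\mathbf{I}_q \otimes \mathbf{J}_{rm}) - (1/qr)(\mathbf{J}_{qr} \otimes \mathbf{I}_m) + (1/qrm)\mathbf{J}_{qrm}]\,\sigma^2[\mathbf{I}_{qrm} + (\sigma_b^2/\sigma^2)(\mathbf{I}_{qr} \otimes \mathbf{J}_m)]\} \]
\[ = r\sum_{i=1}^q\sum_{k=1}^m (\xi_{ik} - \bar\xi_{i\cdot} - \bar\xi_{\cdot k} + \bar\xi_{\cdot\cdot})^2 + \sigma^2\,\mathrm{tr}[(1/r)(\mathbf{I}_q \otimes \mathbf{J}_r \otimes \mathbf{I}_m) - (1/rm)(\mathbf{I}_q \otimes \mathbf{J}_{rm}) - (1/qr)(\mathbf{J}_{qr} \otimes \mathbf{I}_m) + (1/qrm)\mathbf{J}_{qrm}] \]
\[ \quad + \sigma_b^2\,\mathrm{tr}[(1/r)(\mathbf{I}_q \otimes \mathbf{J}_r \otimes \mathbf{J}_m) - (1/rm)(\mathbf{I}_q \otimes \mathbf{J}_r \otimes m\mathbf{J}_m) - (1/qr)(\mathbf{J}_{qr} \otimes \mathbf{J}_m) + (1/qrm)(\mathbf{J}_{qr} \otimes m\mathbf{J}_m)] \]
\[ = r\sum_{i=1}^q\sum_{k=1}^m (\xi_{ik} - \bar\xi_{i\cdot} - \bar\xi_{\cdot k} + \bar\xi_{\cdot\cdot})^2 + \sigma^2(qm - q - m + 1), \]
\[ E\left[\sum_{i=1}^q\sum_{j=1}^r\sum_{k=1}^m (y_{ijk} - \bar y_{ij\cdot} - \bar y_{i\cdot k} + \bar y_{i\cdot\cdot})^2\right] = \sum_{i=1}^q\sum_{j=1}^r\sum_{k=1}^m [E(y_{ijk}) - E(\bar y_{ij\cdot}) - E(\bar y_{i\cdot k}) + E(\bar y_{i\cdot\cdot})]^2 \]
\[ \quad + \mathrm{tr}\{[\mathbf{I} - (1/m)(\mathbf{I}_{qr} \otimes \mathbf{J}_m) + (1/rm)(\mathbf{I}_q \otimes \mathbf{J}_{rm}) - (1/r)(\mathbf{I}_q \otimes \mathbf{J}_r \otimes \mathbf{I}_m)]\,\sigma^2[\mathbf{I}_{qrm} + (\sigma_b^2/\sigma^2)(\mathbf{I}_{qr} \otimes \mathbf{J}_m)]\} \]
\[ = 0 + \sigma^2\,\mathrm{tr}[\mathbf{I} - (1/m)(\mathbf{I}_{qr} \otimes \mathbf{J}_m) + (1/rm)(\mathbf{I}_q \otimes \mathbf{J}_{rm}) - (1/r)(\mathbf{I}_q \otimes \mathbf{J}_r \otimes \mathbf{I}_m)] \]
\[ \quad + \sigma_b^2\,\mathrm{tr}[(\mathbf{I}_{qr} \otimes \mathbf{J}_m) - (1/m)(\mathbf{I}_{qr} \otimes m\mathbf{J}_m) + (1/rm)(\mathbf{I}_q \otimes \mathbf{J}_r \otimes m\mathbf{J}_m) - (1/r)(\mathbf{I}_q \otimes \mathbf{J}_r \otimes \mathbf{J}_m)] \]
\[ = \sigma^2(qrm - qr + q - qm). \]
The expected mean squares listed in Example 13.4.5-1 may be obtained by dividing each of the expected sums of squares given above by the corresponding rank.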
Exercise 25 For the split-plot model considered in Example 13.4.5-1, obtain simplified expressions for: (a) (b) (c) (d)
var(y¯i·· − y¯i ·· ) (i = i ); var(y¯··k − y¯··k ) (k = k ); var(y¯i·k − y¯i·k ) (k = k ); var(y¯i·k − y¯i ·k ) (i = i ).
Solution (a) Note that 1 (μ + αi + bij + γk + ξik + dij k ) = μ + αi + b¯i· + γ¯· + ξ¯i· + d¯i·· . rm r
y¯i·· =
m
j =1 k=1
It follows that for i = i , var(y¯i·· − y¯i ·· ) = var[(b¯i· + d¯i·· ) − (b¯i · + d¯i ·· )] = var(b¯i· ) + var(b¯i · ) + var(d¯i·· ) + var(d¯i ·· ) =
2 (mσb2 + σ 2 ). rm
(b) Note that y¯··k
q r 1 = (μ + αi + bij + γk + ξik + dij k ) = μ + α¯ · + b¯·· + γk + ξ¯·k + d¯··k . qr i=1 j =1
It follows that for k = k , var(y¯··k − y¯··k ) = var[(d¯··k ) − (d¯··k )] = var(d¯··k ) + var(d¯··k ) =
2σ 2 . qr
(c) Note that 1 (μ + αi + bij + γk + ξik + dij k ) = μ + αi + b¯i· + γk + ξik + d¯i·k . r r
y¯i·k =
j =1
222
13 Best Linear Unbiased Prediction
It follows that for k = k , var(y¯i·k − y¯i·k ) = var[(b¯i· + d¯i·k ) − (b¯i· + d¯i·k )] = var(d¯i·k ) + var(d¯i·k ) =
2σ 2 . r
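(d) Note that
$$\bar{y}_{i\cdot k}=\frac{1}{r}\sum_{j=1}^{r}(\mu+\alpha_i+b_{ij}+\gamma_k+\xi_{ik}+d_{ijk})=\mu+\alpha_i+\bar{b}_{i\cdot}+\gamma_k+\xi_{ik}+\bar{d}_{i\cdot k}.$$
It follows that for $i\ne i'$,
$$\mathrm{var}(\bar{y}_{i\cdot k}-\bar{y}_{i'\cdot k})=\mathrm{var}[(\bar{b}_{i\cdot}+\bar{d}_{i\cdot k})-(\bar{b}_{i'\cdot}+\bar{d}_{i'\cdot k})]=\mathrm{var}(\bar{b}_{i\cdot})+\mathrm{var}(\bar{b}_{i'\cdot})+\mathrm{var}(\bar{d}_{i\cdot k})+\mathrm{var}(\bar{d}_{i'\cdot k})=\frac{2(\sigma_b^2+\sigma^2)}{r}.$$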
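The four variances in Exercise 25 can be cross-checked by writing each mean difference as a linear combination of the observations and evaluating the corresponding quadratic form in var(y). The sketch below assumes NumPy, hypothetical values of q, r, m and of the variance components, and an ordering of y with i outermost and k innermost; `unit`, `mean_coef`, and `var_diff` are illustrative helper names, not from the text.

```python
import numpy as np

q, r, m = 3, 4, 2               # hypothetical design sizes
sigma2_b, sigma2 = 1.7, 0.6     # hypothetical variance components

# var(y) = sigma_b^2 (I_qr kron J_m) + sigma^2 I_qrm, with y ordered as (i, j, k)
V = sigma2_b * np.kron(np.eye(q * r), np.ones((m, m))) + sigma2 * np.eye(q * r * m)

def unit(n, idx):
    """idx-th unit vector of length n."""
    e = np.zeros(n)
    e[idx] = 1.0
    return e

def mean_coef(i=None, k=None):
    """Coefficients of a mean of y: always averaged over j, and over i or k when left as None."""
    a = unit(q, i) if i is not None else np.ones(q) / q
    b = np.ones(r) / r
    c = unit(m, k) if k is not None else np.ones(m) / m
    return np.kron(np.kron(a, b), c)

def var_diff(c1, c2):
    """Variance of the difference of two linear combinations of y."""
    d = c1 - c2
    return d @ V @ d

var_a = var_diff(mean_coef(i=0), mean_coef(i=1))
var_b = var_diff(mean_coef(k=0), mean_coef(k=1))
var_c = var_diff(mean_coef(i=0, k=0), mean_coef(i=0, k=1))
var_d = var_diff(mean_coef(i=0, k=0), mean_coef(i=1, k=0))
print(var_a, var_b, var_c, var_d)
```

Only the random terms enter var(y), so the fixed effects need not be specified for this check.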
14 Distribution Theory
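This chapter presents exercises on distribution theory relevant to linear models, and provides solutions to those exercises.

Exercise 1 Let $\mathbf{x}$ represent an $n$-dimensional random vector with mean vector $\boldsymbol{\mu}$ and variance–covariance matrix $\boldsymbol{\Sigma}$. Use Theorem 14.1.1 to show that $\mathbf{x}$ has an $n$-variate normal distribution if and only if, for every $n$-vector of constants $\mathbf{a}$, $\mathbf{a}^T\mathbf{x}$ has a (univariate) normal distribution.

Solution Suppose that $\mathbf{a}^T\mathbf{x}$ has a (univariate) normal distribution for every $n$-vector of constants $\mathbf{a}$. Note that $E(\mathbf{a}^T\mathbf{x})=\mathbf{a}^T\boldsymbol{\mu}$ and $\mathrm{var}(\mathbf{a}^T\mathbf{x})=\mathbf{a}^T\boldsymbol{\Sigma}\mathbf{a}$. Then, denoting the mgf of $\mathbf{a}^T\mathbf{x}$ by $m_{\mathbf{a}^T\mathbf{x}}(t)$ and denoting the mgf of $\mathbf{x}$, if it exists, by $m^*_{\mathbf{x}}(\mathbf{t})$, we find that for all $\mathbf{a}$,
$$m^*_{\mathbf{x}}(\mathbf{a})=E[\exp(\mathbf{a}^T\mathbf{x})]=m_{\mathbf{a}^T\mathbf{x}}(1)=\exp(\mathbf{a}^T\boldsymbol{\mu}+\tfrac{1}{2}\mathbf{a}^T\boldsymbol{\Sigma}\mathbf{a}).$$
It follows by Theorems 14.1.1 and 14.1.4 that $\mathbf{x}\sim N(\boldsymbol{\mu},\boldsymbol{\Sigma})$. Conversely, suppose that $\mathbf{x}\sim N(\boldsymbol{\mu},\boldsymbol{\Sigma})$. Then denoting the mgf of $\mathbf{x}$ by $m^*_{\mathbf{x}}(\mathbf{t})$ and denoting the mgf of $\mathbf{a}^T\mathbf{x}$, if it exists, by $m_{\mathbf{a}^T\mathbf{x}}(t)$, we find that for all $t$,
$$m_{\mathbf{a}^T\mathbf{x}}(t)=E[\exp(t\mathbf{a}^T\mathbf{x})]=E\{\exp[(t\mathbf{a})^T\mathbf{x}]\}=m^*_{\mathbf{x}}(t\mathbf{a})=\exp[(t\mathbf{a})^T\boldsymbol{\mu}+\tfrac{1}{2}(t\mathbf{a})^T\boldsymbol{\Sigma}(t\mathbf{a})]=\exp[t(\mathbf{a}^T\boldsymbol{\mu})+\tfrac{1}{2}t^2(\mathbf{a}^T\boldsymbol{\Sigma}\mathbf{a})].$$
It follows from Theorems 14.1.1 and 14.1.4 that $\mathbf{a}^T\mathbf{x}\sim N(\mathbf{a}^T\boldsymbol{\mu},\mathbf{a}^T\boldsymbol{\Sigma}\mathbf{a})$.

Exercise 2 Use Theorem 14.1.3 to construct an alternate proof of Theorem 4.2.4 for the special case in which $\mathbf{x}\sim N(\boldsymbol{\mu},\boldsymbol{\Sigma})$ where $\boldsymbol{\Sigma}$ is positive definite.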
© Springer Nature Switzerland AG 2020 D. L. Zimmerman, Linear Model Theory, https://doi.org/10.1007/978-3-030-52074-8_14
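Solution
$$E(\mathbf{x}^T\mathbf{A}\mathbf{x})=\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\mathbf{x}^T\mathbf{A}\mathbf{x}\,(2\pi)^{-n/2}|\boldsymbol{\Sigma}|^{-\frac{1}{2}}\exp\{-[\mathbf{x}^T\boldsymbol{\Sigma}^{-1}\mathbf{x}-2\boldsymbol{\mu}^T\boldsymbol{\Sigma}^{-1}\mathbf{x}+\boldsymbol{\mu}^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}]/2\}\,d\mathbf{x}$$
$$=(2\pi)^{-n/2}|\boldsymbol{\Sigma}|^{-\frac{1}{2}}\,\tfrac{1}{2}\,\pi^{n/2}\bigl(\tfrac{1}{2}\bigr)^{-n/2}|\boldsymbol{\Sigma}|^{\frac{1}{2}}\bigl[\mathrm{tr}(2\mathbf{A}\boldsymbol{\Sigma})-0+\tfrac{1}{2}(-\boldsymbol{\mu}^T\boldsymbol{\Sigma}^{-1})(2\boldsymbol{\Sigma})\mathbf{A}(2\boldsymbol{\Sigma})(-\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu})\bigr]$$
$$\quad\times\exp\bigl[\tfrac{1}{4}(-\boldsymbol{\mu}^T\boldsymbol{\Sigma}^{-1})(2\boldsymbol{\Sigma})(-\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu})-\tfrac{1}{2}\boldsymbol{\mu}^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}\bigr]$$
$$=\mathrm{tr}(\mathbf{A}\boldsymbol{\Sigma})+\boldsymbol{\mu}^T\mathbf{A}\boldsymbol{\mu},$$
where the next-to-last equality results from putting $a_0=0$, $\mathbf{a}=\mathbf{0}$, $\mathbf{A}=\mathbf{A}$, $b_0=\tfrac{1}{2}\boldsymbol{\mu}^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$, $\mathbf{b}=-\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$, and $\mathbf{B}=\tfrac{1}{2}\boldsymbol{\Sigma}^{-1}$ into Theorem 14.1.3.

Exercise 3 Suppose that $\mathbf{y}$ follows a normal positive definite Aitken model $\{\mathbf{y},\mathbf{X}\boldsymbol{\beta},\sigma^2\mathbf{W}\}$, and that $\mathbf{X}$ has full column rank. Use the "Factorization Theorem" [e.g., Theorem 6.2.6 of Casella and Berger (2002)] to show that the maximum likelihood estimators of $\boldsymbol{\beta}$ and $\sigma^2$ derived in Example 14.1-1 are sufficient statistics for those parameters.

Solution By Theorem 14.1.5, the likelihood function is
$$L(\boldsymbol{\beta},\sigma^2)=(2\pi)^{-n/2}|\sigma^2\mathbf{W}|^{-\frac{1}{2}}\exp[-(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})^T(\sigma^2\mathbf{W})^{-1}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})/2]$$
$$=(2\pi)^{-n/2}(\sigma^2)^{-n/2}|\mathbf{W}|^{-\frac{1}{2}}\exp[-(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})^T\mathbf{W}^{-1}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})/2\sigma^2],$$
where we have used Theorem 2.11.4. By the same argument used in Theorem 7.1.2 for the ordinary residual sum of squares function, for the generalized residual sum of squares function we have
$$(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})^T\mathbf{W}^{-1}(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})=(\mathbf{y}-\mathbf{X}\tilde{\boldsymbol{\beta}})^T\mathbf{W}^{-1}(\mathbf{y}-\mathbf{X}\tilde{\boldsymbol{\beta}})+(\tilde{\boldsymbol{\beta}}-\boldsymbol{\beta})^T\mathbf{X}^T\mathbf{W}^{-1}\mathbf{X}(\tilde{\boldsymbol{\beta}}-\boldsymbol{\beta}).$$
So we may rewrite the likelihood function as
$$L(\boldsymbol{\beta},\sigma^2)=(2\pi)^{-n/2}|\mathbf{W}|^{-\frac{1}{2}}(\sigma^2)^{-n/2}\exp\{-[(\mathbf{y}-\mathbf{X}\tilde{\boldsymbol{\beta}})^T\mathbf{W}^{-1}(\mathbf{y}-\mathbf{X}\tilde{\boldsymbol{\beta}})+(\tilde{\boldsymbol{\beta}}-\boldsymbol{\beta})^T\mathbf{X}^T\mathbf{W}^{-1}\mathbf{X}(\tilde{\boldsymbol{\beta}}-\boldsymbol{\beta})]/2\sigma^2\}$$
$$=(2\pi)^{-n/2}|\mathbf{W}|^{-\frac{1}{2}}(\sigma^2)^{-n/2}\exp\{-(n/2\sigma^2)[\bar{\sigma}^2+(\tilde{\boldsymbol{\beta}}-\boldsymbol{\beta})^T\mathbf{X}^T\mathbf{W}^{-1}\mathbf{X}(\tilde{\boldsymbol{\beta}}-\boldsymbol{\beta})/n]\},$$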
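where $\bar{\sigma}^2=(\mathbf{y}-\mathbf{X}\tilde{\boldsymbol{\beta}})^T\mathbf{W}^{-1}(\mathbf{y}-\mathbf{X}\tilde{\boldsymbol{\beta}})/n$. This shows that the likelihood function can be factored into a product of two functions, $(2\pi)^{-n/2}|\mathbf{W}|^{-\frac{1}{2}}$ and $(\sigma^2)^{-n/2}\exp\{-(n/2\sigma^2)[\bar{\sigma}^2+(\tilde{\boldsymbol{\beta}}-\boldsymbol{\beta})^T\mathbf{X}^T\mathbf{W}^{-1}\mathbf{X}(\tilde{\boldsymbol{\beta}}-\boldsymbol{\beta})/n]\}$, such that the first function does not depend on $(\boldsymbol{\beta},\sigma^2)$ and the second function, which does depend on $(\boldsymbol{\beta},\sigma^2)$, depends on $\mathbf{y}$ only through $(\tilde{\boldsymbol{\beta}},\bar{\sigma}^2)$. By the factorization theorem, this establishes that the maximum likelihood estimators comprise a sufficient statistic for $(\boldsymbol{\beta},\sigma^2)$.

Exercise 4 Suppose that $\mathbf{y}$ follows the normal constrained Gauss–Markov model $\{\mathbf{y},\mathbf{X}\boldsymbol{\beta},\sigma^2\mathbf{I}:\mathbf{A}\boldsymbol{\beta}=\mathbf{h}\}$. Let $\mathbf{c}^T\boldsymbol{\beta}$ be an estimable function under this model. Obtain maximum likelihood estimators of $\mathbf{c}^T\boldsymbol{\beta}$ and $\sigma^2$.

Solution The log-likelihood function is
$$\log L(\boldsymbol{\beta},\sigma^2)=-\frac{n}{2}\log 2\pi-\frac{n}{2}\log\sigma^2-\frac{Q(\boldsymbol{\beta})}{2\sigma^2},$$
where $Q(\boldsymbol{\beta})=(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})^T(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})$ is the residual sum of squares function. Recall from Definition 10.1.3 of the constrained least squares estimator of $\mathbf{c}^T\boldsymbol{\beta}$ that $Q(\boldsymbol{\beta})$ is minimized, over all $\boldsymbol{\beta}$ satisfying the constraints $\mathbf{A}\boldsymbol{\beta}=\mathbf{h}$, at $\breve{\boldsymbol{\beta}}$, where $\breve{\boldsymbol{\beta}}$ is the first $p$-component subvector of a solution to the constrained normal equations
$$\begin{pmatrix}\mathbf{X}^T\mathbf{X}&\mathbf{A}^T\\ \mathbf{A}&\mathbf{0}\end{pmatrix}\begin{pmatrix}\boldsymbol{\beta}\\ \boldsymbol{\lambda}\end{pmatrix}=\begin{pmatrix}\mathbf{X}^T\mathbf{y}\\ \mathbf{h}\end{pmatrix}.$$
Because these equations are free of $\sigma^2$, for each fixed value of $\sigma^2$, $\log L(\boldsymbol{\beta},\sigma^2)$ is maximized with respect to $\boldsymbol{\beta}$, among those $\boldsymbol{\beta}$ that satisfy the constraints, at $\breve{\boldsymbol{\beta}}$. It then remains to maximize
$$\log L(\breve{\boldsymbol{\beta}},\sigma^2)=-\frac{n}{2}\log 2\pi-\frac{n}{2}\log\sigma^2-\frac{Q(\breve{\boldsymbol{\beta}})}{2\sigma^2}$$
with respect to $\sigma^2$. Taking the first derivative with respect to $\sigma^2$ yields
$$\frac{\partial\log L(\breve{\boldsymbol{\beta}},\sigma^2)}{\partial\sigma^2}=-\frac{n}{2\sigma^2}+\frac{Q(\breve{\boldsymbol{\beta}})}{2\sigma^4}=\frac{n}{2\sigma^4}(\ddot{\sigma}^2-\sigma^2),$$
where $\ddot{\sigma}^2=Q(\breve{\boldsymbol{\beta}})/n$. Furthermore,
$$\frac{\partial\log L(\breve{\boldsymbol{\beta}},\sigma^2)}{\partial\sigma^2}\begin{cases}>0&\text{for }\sigma^2<\ddot{\sigma}^2,\\ =0&\text{for }\sigma^2=\ddot{\sigma}^2,\\ <0&\text{for }\sigma^2>\ddot{\sigma}^2,\end{cases}$$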
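so that $L(\boldsymbol{\beta},\sigma^2)$ attains a global maximum at $(\boldsymbol{\beta},\sigma^2)=(\breve{\boldsymbol{\beta}},\ddot{\sigma}^2)$ [unless $Q(\breve{\boldsymbol{\beta}})=0$, in which case $L(\boldsymbol{\beta},\sigma^2)$ does not have a maximum; however, this is an event of probability 0]. Thus, the maximum likelihood estimators of $\mathbf{c}^T\boldsymbol{\beta}$ and $\sigma^2$ are $\mathbf{c}^T\breve{\boldsymbol{\beta}}$ and $Q(\breve{\boldsymbol{\beta}})/n$, respectively.

Exercise 5 Prove Theorem 14.1.8: Suppose that $\mathbf{x}\sim N_n(\boldsymbol{\mu},\boldsymbol{\Sigma})$, and partition $\mathbf{x}$, $\boldsymbol{\mu}$, and $\boldsymbol{\Sigma}$ conformably as
$$\mathbf{x}=\begin{pmatrix}\mathbf{x}_1\\ \vdots\\ \mathbf{x}_m\end{pmatrix},\qquad\boldsymbol{\mu}=\begin{pmatrix}\boldsymbol{\mu}_1\\ \vdots\\ \boldsymbol{\mu}_m\end{pmatrix},\qquad\boldsymbol{\Sigma}=\begin{pmatrix}\boldsymbol{\Sigma}_{11}&\cdots&\boldsymbol{\Sigma}_{1m}\\ \vdots&\ddots&\vdots\\ \boldsymbol{\Sigma}_{m1}&\cdots&\boldsymbol{\Sigma}_{mm}\end{pmatrix}.$$
Then, $\mathbf{x}_1,\ldots,\mathbf{x}_m$ are mutually independent if and only if $\boldsymbol{\Sigma}_{ij}=\mathbf{0}$ for $j\ne i=1,\ldots,m$.

Solution Let $m_{\mathbf{x}}(\mathbf{t}),m_{\mathbf{x}_1}(\mathbf{t}_1),\ldots,m_{\mathbf{x}_m}(\mathbf{t}_m)$ denote the mgfs of $\mathbf{x},\mathbf{x}_1,\ldots,\mathbf{x}_m$, where $\mathbf{t}=(\mathbf{t}_1^T,\ldots,\mathbf{t}_m^T)^T$. Suppose that $\boldsymbol{\Sigma}_{ij}=\mathbf{0}$ for all $j\ne i=1,\ldots,m$. Then for all $\mathbf{t}\in\mathbb{R}^n$,
$$m_{\mathbf{x}}(\mathbf{t})=\exp(\mathbf{t}^T\boldsymbol{\mu}+\tfrac{1}{2}\mathbf{t}^T\boldsymbol{\Sigma}\mathbf{t})=\exp\Bigl(\sum_{i=1}^{m}\mathbf{t}_i^T\boldsymbol{\mu}_i+\tfrac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\mathbf{t}_i^T\boldsymbol{\Sigma}_{ij}\mathbf{t}_j\Bigr)$$
$$=\exp(\mathbf{t}_1^T\boldsymbol{\mu}_1+\tfrac{1}{2}\mathbf{t}_1^T\boldsymbol{\Sigma}_{11}\mathbf{t}_1)\cdots\exp(\mathbf{t}_m^T\boldsymbol{\mu}_m+\tfrac{1}{2}\mathbf{t}_m^T\boldsymbol{\Sigma}_{mm}\mathbf{t}_m)=\prod_{i=1}^{m}m_{\mathbf{x}_i}(\mathbf{t}_i),$$
where we used Theorem 14.1.4 for the first and last equalities. Thus, by Theorem 14.1.2, $\mathbf{x}_1,\ldots,\mathbf{x}_m$ are mutually independent. Conversely, suppose that $\mathbf{x}_1,\ldots,\mathbf{x}_m$ are mutually independent. Then for $j\ne i=1,\ldots,m$,
$$\boldsymbol{\Sigma}_{ij}=E[(\mathbf{x}_i-\boldsymbol{\mu}_i)(\mathbf{x}_j-\boldsymbol{\mu}_j)^T]=E(\mathbf{x}_i-\boldsymbol{\mu}_i)E(\mathbf{x}_j-\boldsymbol{\mu}_j)^T=\mathbf{0}.$$

Exercise 6 Derive the distributions specified by (14.5) and (14.6) in Example 14.1-3.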
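Solution The BLUP is unbiased by definition so $E(\mathbf{C}^T\tilde{\boldsymbol{\beta}}+\tilde{\mathbf{u}})=\mathbf{C}^T\boldsymbol{\beta}$. Furthermore, using results from the proof of Theorem 13.2.2,
$$\mathrm{var}(\mathbf{C}^T\tilde{\boldsymbol{\beta}}+\tilde{\mathbf{u}})=\mathrm{var}(\mathbf{C}^T\tilde{\boldsymbol{\beta}})+\mathrm{var}(\tilde{\mathbf{u}})+\mathrm{cov}(\mathbf{C}^T\tilde{\boldsymbol{\beta}},\tilde{\mathbf{u}})+[\mathrm{cov}(\mathbf{C}^T\tilde{\boldsymbol{\beta}},\tilde{\mathbf{u}})]^T$$
$$=\sigma^2\mathbf{C}^T(\mathbf{X}^T\mathbf{W}^{-1}\mathbf{X})^{-}\mathbf{C}+\sigma^2[\mathbf{K}^T\mathbf{W}^{-1}\mathbf{K}-\mathbf{K}^T\mathbf{W}^{-1}\mathbf{X}(\mathbf{X}^T\mathbf{W}^{-1}\mathbf{X})^{-}\mathbf{X}^T\mathbf{W}^{-1}\mathbf{K}]+\mathbf{0}+\mathbf{0}^T,$$
which is easily seen to match (14.5). Furthermore, $E(\tilde{\boldsymbol{\tau}}-\boldsymbol{\tau})=E(\tilde{\boldsymbol{\tau}})-E(\boldsymbol{\tau})=\mathbf{C}^T\boldsymbol{\beta}-\mathbf{C}^T\boldsymbol{\beta}=\mathbf{0}$; $E(\tilde{\mathbf{e}})=\mathbf{0}$ (by Theorem 11.1.8b); $\mathrm{var}(\tilde{\boldsymbol{\tau}}-\boldsymbol{\tau})=\sigma^2\mathbf{Q}$ by Theorem 13.2.2; $\mathrm{var}(\tilde{\mathbf{e}})=\sigma^2[\mathbf{W}-\mathbf{X}(\mathbf{X}^T\mathbf{W}^{-1}\mathbf{X})^{-}\mathbf{X}^T]$ by Theorem 11.1.8d; and, letting $\mathbf{A}$ represent any matrix such that $\mathbf{A}^T\mathbf{X}=\mathbf{C}^T$,
$$\mathrm{cov}(\tilde{\boldsymbol{\tau}}-\boldsymbol{\tau},\tilde{\mathbf{e}})=\mathrm{cov}(\mathbf{C}^T\tilde{\boldsymbol{\beta}}+\mathbf{K}^T\mathbf{W}^{-1}\tilde{\mathbf{e}}-\mathbf{u},\tilde{\mathbf{e}})$$
$$=\mathrm{cov}(\mathbf{A}^T\tilde{\mathbf{y}},\tilde{\mathbf{e}})+\mathrm{cov}(\mathbf{K}^T\mathbf{W}^{-1}\tilde{\mathbf{e}},\tilde{\mathbf{e}})-\mathrm{cov}[\mathbf{u},(\mathbf{I}-\tilde{\mathbf{P}}_{\mathbf{X}})\mathbf{y}]$$
$$=\mathbf{0}+\sigma^2\mathbf{K}^T\mathbf{W}^{-1}[\mathbf{W}-\mathbf{X}(\mathbf{X}^T\mathbf{W}^{-1}\mathbf{X})^{-}\mathbf{X}^T]-\sigma^2\mathbf{K}^T(\mathbf{I}-\tilde{\mathbf{P}}_{\mathbf{X}})^T$$
$$=\sigma^2[\mathbf{K}^T-\mathbf{K}^T\mathbf{W}^{-1}\mathbf{X}(\mathbf{X}^T\mathbf{W}^{-1}\mathbf{X})^{-}\mathbf{X}^T]-\sigma^2[\mathbf{K}^T-\mathbf{K}^T\mathbf{W}^{-1}\mathbf{X}(\mathbf{X}^T\mathbf{W}^{-1}\mathbf{X})^{-}\mathbf{X}^T]=\mathbf{0},$$
where the third equality holds by Theorem 11.1.8d, e.

Exercise 7 Suppose that $\mathbf{x}\sim N_4(\boldsymbol{\mu},\sigma^2\mathbf{W})$, where $\boldsymbol{\mu}=(1,2,3,4)^T$ and $\mathbf{W}=\tfrac{1}{2}\mathbf{I}_4+\tfrac{1}{2}\mathbf{J}_4$. Find the conditional distribution of $\mathbf{A}\mathbf{x}$ given that $\mathbf{B}\mathbf{x}=\mathbf{c}$, where
$$\mathbf{A}=\begin{pmatrix}2&0&-1&-1\end{pmatrix},\qquad\mathbf{B}=\begin{pmatrix}0&0&2&-2\\ 1&1&1&1\end{pmatrix},\qquad\mathbf{c}=\begin{pmatrix}2\\ 2\end{pmatrix}.$$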
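Solution Because $\begin{pmatrix}\mathbf{A}\mathbf{x}\\ \mathbf{B}\mathbf{x}\end{pmatrix}=\begin{pmatrix}\mathbf{A}\\ \mathbf{B}\end{pmatrix}\mathbf{x}$, by Theorem 14.1.6 the joint distribution of $\mathbf{A}\mathbf{x}$ and $\mathbf{B}\mathbf{x}$ is normal with mean
$$\begin{pmatrix}\mathbf{A}\\ \mathbf{B}\end{pmatrix}\boldsymbol{\mu}=\begin{pmatrix}2&0&-1&-1\\ 0&0&2&-2\\ 1&1&1&1\end{pmatrix}\begin{pmatrix}1\\ 2\\ 3\\ 4\end{pmatrix}=\begin{pmatrix}-5\\ -2\\ 10\end{pmatrix}$$
and variance–covariance matrix
$$\begin{pmatrix}\mathbf{A}\\ \mathbf{B}\end{pmatrix}\sigma^2\bigl(\tfrac{1}{2}\mathbf{I}_4+\tfrac{1}{2}\mathbf{J}_4\bigr)\begin{pmatrix}\mathbf{A}^T&\mathbf{B}^T\end{pmatrix}=\frac{\sigma^2}{2}\begin{pmatrix}\mathbf{A}\mathbf{A}^T&\mathbf{A}\mathbf{B}^T\\ \mathbf{B}\mathbf{A}^T&\mathbf{B}\mathbf{B}^T+\mathbf{B}\mathbf{J}\mathbf{B}^T\end{pmatrix}=\sigma^2\begin{pmatrix}3&0&0\\ 0&4&0\\ 0&0&10\end{pmatrix}$$
because $\mathbf{A}\mathbf{J}_4=\mathbf{0}_4^T$. By Theorem 14.1.8, $\mathbf{A}\mathbf{x}$ and $\mathbf{B}\mathbf{x}$ are independent; therefore, the conditional distribution of $\mathbf{A}\mathbf{x}$ given that $\mathbf{B}\mathbf{x}=\mathbf{c}$ is the same as the marginal distribution of $\mathbf{A}\mathbf{x}$, which by Theorem 14.1.7 is $N(-5,3\sigma^2)$.

Exercise 8 Suppose that $(y_1,y_2,y_3)^T$ satisfies the stationary autoregressive model of order one described in Example 5.2.4-1 with $n=3$, and assume that the joint distribution of $(y_1,y_2,y_3)^T$ is trivariate normal. Determine the conditional distribution of $(y_1,y_3)^T$ given that $y_2=1$. What can you conclude about the dependence of $y_1$ and $y_3$, conditional on $y_2$?

Solution We have
$$\begin{pmatrix}y_1\\ y_2\\ y_3\end{pmatrix}\sim N\left(\mu\mathbf{1}_3,\ \frac{\sigma^2}{1-\rho^2}\begin{pmatrix}1&\rho&\rho^2\\ \rho&1&\rho\\ \rho^2&\rho&1\end{pmatrix}\right),$$
which upon rearrangement may be re-expressed as
$$\begin{pmatrix}y_1\\ y_3\\ y_2\end{pmatrix}\sim N\left(\mu\mathbf{1}_3,\ \frac{\sigma^2}{1-\rho^2}\begin{pmatrix}1&\rho^2&\rho\\ \rho^2&1&\rho\\ \rho&\rho&1\end{pmatrix}\right).$$
Thus, by Theorem 14.1.9, the conditional distribution of $(y_1,y_3)^T$ given that $y_2=1$ is bivariate normal with mean
$$\mu\mathbf{1}_2+\begin{pmatrix}\rho\\ \rho\end{pmatrix}(1-\mu)=\begin{pmatrix}\rho+(1-\rho)\mu\\ \rho+(1-\rho)\mu\end{pmatrix}$$
and variance–covariance matrix
$$\frac{\sigma^2}{1-\rho^2}\begin{pmatrix}1&\rho^2\\ \rho^2&1\end{pmatrix}-\frac{\sigma^2}{1-\rho^2}\begin{pmatrix}\rho\\ \rho\end{pmatrix}\begin{pmatrix}\rho&\rho\end{pmatrix}=\sigma^2\mathbf{I}_2.$$
Because the off-diagonal elements of this variance–covariance matrix equal 0, we conclude, by Theorem 14.1.8, that $y_1$ and $y_3$ are conditionally independent given $y_2$.

Exercise 9 Let $\mathbf{x}=(x_i)$ be a random $n$-vector whose distribution is multivariate normal with mean vector $\boldsymbol{\mu}=(\mu_i)$ and variance–covariance matrix $\boldsymbol{\Sigma}=(\sigma_{ij})$. Show, by differentiating the moment generating function, that
$$E[(x_i-\mu_i)(x_j-\mu_j)(x_k-\mu_k)(x_\ell-\mu_\ell)]=\sigma_{ij}\sigma_{k\ell}+\sigma_{ik}\sigma_{j\ell}+\sigma_{i\ell}\sigma_{jk}$$
for $i,j,k,\ell=1,\ldots,n$.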
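Solution By Theorem 14.1.6, $\mathbf{x}-\boldsymbol{\mu}\sim N(\mathbf{0},\boldsymbol{\Sigma})$, hence by Theorem 14.1.4, $m_{\mathbf{x}-\boldsymbol{\mu}}(\mathbf{t})=\exp(\mathbf{t}^T\boldsymbol{\Sigma}\mathbf{t}/2)$. Now first observe that
$$\frac{\partial}{\partial t_i}\Bigl(\frac{\mathbf{t}^T\boldsymbol{\Sigma}\mathbf{t}}{2}\Bigr)=\frac{1}{2}\frac{\partial}{\partial t_i}\Bigl(\sum_{a=1}^{n}\sigma_{aa}t_a^2+2\sum_{a=1}^{n}\sum_{b>a}\sigma_{ab}t_at_b\Bigr)=\frac{1}{2}\Bigl(2\sigma_{ii}t_i+2\sum_{b\ne i}\sigma_{ib}t_b\Bigr)=\sum_{b=1}^{n}\sigma_{ib}t_b.$$
Thus, writing $e(\mathbf{t})=\exp(\mathbf{t}^T\boldsymbol{\Sigma}\mathbf{t}/2)$,
$$\frac{\partial m_{\mathbf{x}-\boldsymbol{\mu}}(\mathbf{t})}{\partial t_i}=e(\mathbf{t})\frac{\partial}{\partial t_i}\Bigl(\frac{\mathbf{t}^T\boldsymbol{\Sigma}\mathbf{t}}{2}\Bigr)=e(\mathbf{t})\sum_{a=1}^{n}\sigma_{ia}t_a,$$
$$\frac{\partial^2m_{\mathbf{x}-\boldsymbol{\mu}}(\mathbf{t})}{\partial t_i\partial t_j}=\frac{\partial}{\partial t_j}\Bigl[e(\mathbf{t})\sum_{a=1}^{n}\sigma_{ia}t_a\Bigr]=e(\mathbf{t})\Bigl(\sum_{a=1}^{n}\sigma_{ia}t_a\Bigr)\Bigl(\sum_{b=1}^{n}\sigma_{jb}t_b\Bigr)+\sigma_{ij}e(\mathbf{t}),$$
$$\frac{\partial^3m_{\mathbf{x}-\boldsymbol{\mu}}(\mathbf{t})}{\partial t_i\partial t_j\partial t_k}=e(\mathbf{t})\Bigl(\sum_{a}\sigma_{ia}t_a\Bigr)\Bigl(\sum_{b}\sigma_{jb}t_b\Bigr)\Bigl(\sum_{c}\sigma_{kc}t_c\Bigr)+e(\mathbf{t})\sigma_{ik}\sum_{b}\sigma_{jb}t_b+e(\mathbf{t})\sigma_{jk}\sum_{a}\sigma_{ia}t_a+e(\mathbf{t})\sigma_{ij}\sum_{c}\sigma_{kc}t_c,$$
$$\frac{\partial^4m_{\mathbf{x}-\boldsymbol{\mu}}(\mathbf{t})}{\partial t_i\partial t_j\partial t_k\partial t_\ell}=e(\mathbf{t})\Bigl(\sum_{a}\sigma_{ia}t_a\Bigr)\Bigl(\sum_{b}\sigma_{jb}t_b\Bigr)\Bigl(\sum_{c}\sigma_{kc}t_c\Bigr)\Bigl(\sum_{d}\sigma_{\ell d}t_d\Bigr)$$
$$\quad+e(\mathbf{t})\sigma_{i\ell}\Bigl(\sum_{b}\sigma_{jb}t_b\Bigr)\Bigl(\sum_{c}\sigma_{kc}t_c\Bigr)+e(\mathbf{t})\sigma_{j\ell}\Bigl(\sum_{a}\sigma_{ia}t_a\Bigr)\Bigl(\sum_{c}\sigma_{kc}t_c\Bigr)+e(\mathbf{t})\sigma_{k\ell}\Bigl(\sum_{a}\sigma_{ia}t_a\Bigr)\Bigl(\sum_{b}\sigma_{jb}t_b\Bigr)$$
$$\quad+e(\mathbf{t})\sigma_{ik}\Bigl(\sum_{b}\sigma_{jb}t_b\Bigr)\Bigl(\sum_{d}\sigma_{\ell d}t_d\Bigr)+e(\mathbf{t})\sigma_{jk}\Bigl(\sum_{a}\sigma_{ia}t_a\Bigr)\Bigl(\sum_{d}\sigma_{\ell d}t_d\Bigr)+e(\mathbf{t})\sigma_{ij}\Bigl(\sum_{c}\sigma_{kc}t_c\Bigr)\Bigl(\sum_{d}\sigma_{\ell d}t_d\Bigr)$$
$$\quad+e(\mathbf{t})\sigma_{ik}\sigma_{j\ell}+e(\mathbf{t})\sigma_{jk}\sigma_{i\ell}+e(\mathbf{t})\sigma_{ij}\sigma_{k\ell},$$
where all sums run from 1 to $n$. Finally,
$$E[(x_i-\mu_i)(x_j-\mu_j)(x_k-\mu_k)(x_\ell-\mu_\ell)]=\frac{\partial^4m_{\mathbf{x}-\boldsymbol{\mu}}(\mathbf{t})}{\partial t_i\partial t_j\partial t_k\partial t_\ell}\bigg|_{\mathbf{t}=\mathbf{0}}=\sigma_{ij}\sigma_{k\ell}+\sigma_{ik}\sigma_{j\ell}+\sigma_{i\ell}\sigma_{jk}.$$

Exercise 10 Let $u$ and $v$ be random variables whose joint distribution is $N_2(\mathbf{0},\boldsymbol{\Sigma})$, where $\mathrm{rank}(\boldsymbol{\Sigma})=2$. Show that $\mathrm{corr}(u^2,v^2)=[\mathrm{corr}(u,v)]^2$.

Solution Write $\sigma_{ij}$ $(i,j=1,2)$ for the elements of $\boldsymbol{\Sigma}$ and let $\mathbf{x}=\begin{pmatrix}u\\ v\end{pmatrix}$. Now
$$u^2=\mathbf{x}^T\begin{pmatrix}1&0\\ 0&0\end{pmatrix}\mathbf{x}\quad\text{and}\quad v^2=\mathbf{x}^T\begin{pmatrix}0&0\\ 0&1\end{pmatrix}\mathbf{x},$$
so by Corollary 4.2.6.3,
$$\mathrm{cov}(u^2,v^2)=2\,\mathrm{tr}\left[\begin{pmatrix}1&0\\ 0&0\end{pmatrix}\begin{pmatrix}\sigma_{11}&\sigma_{12}\\ \sigma_{21}&\sigma_{22}\end{pmatrix}\begin{pmatrix}0&0\\ 0&1\end{pmatrix}\begin{pmatrix}\sigma_{11}&\sigma_{12}\\ \sigma_{21}&\sigma_{22}\end{pmatrix}\right]=2\sigma_{12}^2.$$
Similar calculations yield $\mathrm{var}(u^2)=2\sigma_{11}^2$ and $\mathrm{var}(v^2)=2\sigma_{22}^2$. Therefore,
$$\mathrm{corr}(u^2,v^2)=\frac{\mathrm{cov}(u^2,v^2)}{\sqrt{\mathrm{var}(u^2)\mathrm{var}(v^2)}}=\frac{2\sigma_{12}^2}{\sqrt{4\sigma_{11}^2\sigma_{22}^2}}=\frac{\sigma_{12}^2}{\sigma_{11}\sigma_{22}}=[\mathrm{corr}(u,v)]^2.$$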
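The identity in Exercise 10 can be checked numerically from the covariance formula for quadratic forms in a mean-zero normal vector, cov(xᵀM₁x, xᵀM₂x) = 2tr(M₁ΣM₂Σ). The sketch below assumes NumPy and a hypothetical positive definite Σ; `quad_cov` is an illustrative helper name, not from the text.

```python
import numpy as np

Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.5]])   # hypothetical positive definite matrix
A = np.diag([1.0, 0.0])          # u^2 = x'Ax
B = np.diag([0.0, 1.0])          # v^2 = x'Bx

def quad_cov(M1, M2, S):
    """cov(x'M1x, x'M2x) for x ~ N(0, S) with symmetric M1 and M2."""
    return 2.0 * np.trace(M1 @ S @ M2 @ S)

corr_sq_forms = quad_cov(A, B, Sigma) / np.sqrt(
    quad_cov(A, A, Sigma) * quad_cov(B, B, Sigma))
corr_uv = Sigma[0, 1] / np.sqrt(Sigma[0, 0] * Sigma[1, 1])
print(corr_sq_forms, corr_uv ** 2)
```

The two printed values should agree, whatever positive definite Σ is supplied.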
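Exercise 11 Suppose that $\mathbf{x}\sim N_n(\mathbf{a},\sigma^2\mathbf{I})$ where $\mathbf{a}=(a_i)$ is an $n$-vector of constants such that $\mathbf{a}^T\mathbf{a}=1$. Let $\mathbf{b}$ represent another $n$-vector such that $\mathbf{b}^T\mathbf{b}=1$ and $\mathbf{a}^T\mathbf{b}=0$. Define $\mathbf{P}_a=\mathbf{a}\mathbf{a}^T/(\mathbf{a}^T\mathbf{a})=\mathbf{a}\mathbf{a}^T$ and $\mathbf{P}_b=\mathbf{b}\mathbf{b}^T/(\mathbf{b}^T\mathbf{b})=\mathbf{b}\mathbf{b}^T$.
(a) Find the distribution of $\mathbf{b}^T\mathbf{x}$.
(b) Find the distribution of $\mathbf{P}_a\mathbf{x}$.
(c) Find the distribution of $\mathbf{P}_b\mathbf{x}$.
(d) Find the conditional distribution of $x_1$ (the first element of $\mathbf{x}$) given that $\mathbf{a}^T\mathbf{x}=0$. Assume that $n\ge 2$.

Solution
(a) By Theorem 14.1.6, $\mathbf{b}^T\mathbf{x}\sim N(\mathbf{b}^T\mathbf{a},\mathbf{b}^T(\sigma^2\mathbf{I})\mathbf{b})$, i.e., $N(0,\sigma^2)$.
(b) By Theorem 14.1.6, $\mathbf{P}_a\mathbf{x}\sim N_n(\mathbf{P}_a\mathbf{a},\mathbf{P}_a(\sigma^2\mathbf{I})\mathbf{P}_a)$, i.e., $N_n(\mathbf{a},\sigma^2\mathbf{P}_a)$.
(c) By Theorem 14.1.6, $\mathbf{P}_b\mathbf{x}\sim N_n(\mathbf{P}_b\mathbf{a},\mathbf{P}_b(\sigma^2\mathbf{I})\mathbf{P}_b)$, i.e., $N_n(\mathbf{0},\sigma^2\mathbf{P}_b)$.
(d) By Theorem 14.1.6, $\begin{pmatrix}x_1\\ \mathbf{a}^T\mathbf{x}\end{pmatrix}=\begin{pmatrix}\mathbf{u}_1^T\\ \mathbf{a}^T\end{pmatrix}\mathbf{x}$ (where $\mathbf{u}_1$ is the first unit vector) has a bivariate normal distribution with mean vector $\begin{pmatrix}\mathbf{u}_1^T\\ \mathbf{a}^T\end{pmatrix}\mathbf{a}=\begin{pmatrix}a_1\\ 1\end{pmatrix}$ and variance–covariance matrix $\begin{pmatrix}\mathbf{u}_1^T\\ \mathbf{a}^T\end{pmatrix}(\sigma^2\mathbf{I})\begin{pmatrix}\mathbf{u}_1&\mathbf{a}\end{pmatrix}=\sigma^2\begin{pmatrix}1&a_1\\ a_1&1\end{pmatrix}$. Then, by Theorem 14.1.9, the conditional distribution of $x_1$ given that $\mathbf{a}^T\mathbf{x}=0$ is normal with mean $a_1+a_1(1)^{-1}(0-1)=0$ and variance $\sigma^2[1-a_1(1)^{-1}a_1]=\sigma^2(1-a_1^2)$.

Exercise 12 Use the mgf of a $\chi^2(\nu,\lambda)$ distribution to obtain its mean and variance.

Solution $m_u(t)=(1-2t)^{-\nu/2}\exp\bigl(\frac{2t\lambda}{1-2t}\bigr)$ for $t<\tfrac{1}{2}$, so
$$\frac{d}{dt}m_u(t)=-\frac{\nu}{2}\,\frac{(-2)}{(1-2t)^{\frac{\nu}{2}+1}}\exp\Bigl(\frac{2t\lambda}{1-2t}\Bigr)+\frac{1}{(1-2t)^{\frac{\nu}{2}}}\exp\Bigl(\frac{2t\lambda}{1-2t}\Bigr)\Bigl[\frac{2\lambda}{1-2t}-\frac{2t\lambda(-2)}{(1-2t)^2}\Bigr]$$
$$=\exp\Bigl(\frac{2t\lambda}{1-2t}\Bigr)\Bigl[\frac{\nu+2\lambda}{(1-2t)^{\frac{\nu}{2}+1}}+\frac{4t\lambda}{(1-2t)^{\frac{\nu}{2}+2}}\Bigr]$$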
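and
$$\frac{d^2}{dt^2}m_u(t)=\frac{1}{1-2t}\Bigl(2\lambda+\frac{4t\lambda}{1-2t}\Bigr)\exp\Bigl(\frac{2t\lambda}{1-2t}\Bigr)\Bigl[\frac{\nu+2\lambda}{(1-2t)^{\frac{\nu}{2}+1}}+\frac{4t\lambda}{(1-2t)^{\frac{\nu}{2}+2}}\Bigr]$$
$$\quad+\exp\Bigl(\frac{2t\lambda}{1-2t}\Bigr)\Bigl[\frac{(\nu+2\lambda)[-(\frac{\nu}{2}+1)](-2)+4\lambda}{(1-2t)^{\frac{\nu}{2}+2}}+\frac{4t\lambda[-(\frac{\nu}{2}+2)](-2)}{(1-2t)^{\frac{\nu}{2}+3}}\Bigr]$$
$$=\frac{1}{1-2t}\Bigl(2\lambda+\frac{4t\lambda}{1-2t}\Bigr)\exp\Bigl(\frac{2t\lambda}{1-2t}\Bigr)\Bigl[\frac{\nu+2\lambda}{(1-2t)^{\frac{\nu}{2}+1}}+\frac{4t\lambda}{(1-2t)^{\frac{\nu}{2}+2}}\Bigr]$$
$$\quad+\exp\Bigl(\frac{2t\lambda}{1-2t}\Bigr)\Bigl[\frac{(\nu+2\lambda)(\nu+2)+4\lambda}{(1-2t)^{\frac{\nu}{2}+2}}+\frac{4t\lambda(\nu+4)}{(1-2t)^{\frac{\nu}{2}+3}}\Bigr].$$
Then
$$E(u)=\frac{d}{dt}m_u(t)\Big|_{t=0}=\nu+2\lambda,$$
$$\mathrm{var}(u)=\frac{d^2}{dt^2}m_u(t)\Big|_{t=0}-[E(u)]^2=(2\lambda)(\nu+2\lambda)+(\nu+2\lambda)(\nu+2)+4\lambda-(\nu^2+4\lambda\nu+4\lambda^2)=2\nu+8\lambda.$$

Exercise 13 Suppose that $\mathbf{x}\sim N_n(\boldsymbol{\mu},\boldsymbol{\Sigma})$, where $\mathrm{rank}(\boldsymbol{\Sigma})=n$. Let $\mathbf{A}$ represent a nonnull nonnegative definite matrix of constants such that $\mathbf{A}\boldsymbol{\Sigma}$ is not idempotent. It has been proposed that the distribution of $\mathbf{x}^T\mathbf{A}\mathbf{x}$ may be approximated by a multiple of a central chi-square distribution, specifically by $c\chi^2(f)$ where $c$ and $f$ are chosen so that $\mathbf{x}^T\mathbf{A}\mathbf{x}$ and $c\chi^2(f)$ have the same mean and variance.
(a) Determine expressions for the appropriate $c$ and $f$.
(b) Under the normal Gauss–Markov model $\{\mathbf{y},\mathbf{X}\boldsymbol{\beta},\sigma^2\mathbf{I}\}$, suppose that $\mathbf{L}\ne\mathbf{P}_{\mathbf{X}}$ is a matrix such that $E(\mathbf{L}\mathbf{y})=\mathbf{X}\boldsymbol{\beta}$. Define $\breve{\mathbf{e}}=(\mathbf{I}-\mathbf{L})\mathbf{y}$. Using part (a), determine an approximation to the distribution of $\breve{\mathbf{e}}^T\breve{\mathbf{e}}$.

Solution
(a) Using the proposed approximation and expressions in Sect. 14.2 for the mean and variance of a central chi-square distribution, we obtain approximating equations
$$E(\mathbf{x}^T\mathbf{A}\mathbf{x})\doteq E[c\chi^2(f)]=cf\quad\text{and}\quad\mathrm{var}(\mathbf{x}^T\mathbf{A}\mathbf{x})\doteq\mathrm{var}[c\chi^2(f)]=2c^2f.$$
Using Theorem 4.2.4 and solving, we obtain
$$c=\frac{\mathrm{var}(\mathbf{x}^T\mathbf{A}\mathbf{x})}{2E(\mathbf{x}^T\mathbf{A}\mathbf{x})}=\frac{2\boldsymbol{\mu}^T\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}\boldsymbol{\mu}+\mathrm{tr}(\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}\boldsymbol{\Sigma})}{\boldsymbol{\mu}^T\mathbf{A}\boldsymbol{\mu}+\mathrm{tr}(\mathbf{A}\boldsymbol{\Sigma})},\qquad f=\frac{E(\mathbf{x}^T\mathbf{A}\mathbf{x})}{c}=\frac{[\boldsymbol{\mu}^T\mathbf{A}\boldsymbol{\mu}+\mathrm{tr}(\mathbf{A}\boldsymbol{\Sigma})]^2}{2\boldsymbol{\mu}^T\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}\boldsymbol{\mu}+\mathrm{tr}(\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}\boldsymbol{\Sigma})}.$$
(b) Because $\mathbf{L}\mathbf{y}$ is unbiased for $\mathbf{X}\boldsymbol{\beta}$, $E(\mathbf{L}\mathbf{y})=\mathbf{X}\boldsymbol{\beta}$ for all $\boldsymbol{\beta}$, implying that $\mathbf{L}\mathbf{X}\boldsymbol{\beta}=\mathbf{X}\boldsymbol{\beta}$ for all $\boldsymbol{\beta}$, or equivalently that $\mathbf{L}\mathbf{X}=\mathbf{X}$. Note that $\breve{\mathbf{e}}^T\breve{\mathbf{e}}=\mathbf{y}^T(\mathbf{I}-\mathbf{L})^T(\mathbf{I}-\mathbf{L})\mathbf{y}\equiv\mathbf{y}^T\mathbf{A}\mathbf{y}$, say, where $\mathbf{A}=(\mathbf{I}-\mathbf{L})^T(\mathbf{I}-\mathbf{L})$. By part (a), applied with $\mathbf{y}$ in place of $\mathbf{x}$, $\mathbf{y}^T\mathbf{A}\mathbf{y}$ is approximately distributed as $c\chi^2(f)$ where
$$c=\frac{2(\mathbf{X}\boldsymbol{\beta})^T(\mathbf{I}-\mathbf{L})^T(\mathbf{I}-\mathbf{L})\sigma^2\mathbf{I}(\mathbf{I}-\mathbf{L})^T(\mathbf{I}-\mathbf{L})\mathbf{X}\boldsymbol{\beta}+\mathrm{tr}[(\mathbf{I}-\mathbf{L})^T(\mathbf{I}-\mathbf{L})\sigma^2\mathbf{I}(\mathbf{I}-\mathbf{L})^T(\mathbf{I}-\mathbf{L})\sigma^2\mathbf{I}]}{(\mathbf{X}\boldsymbol{\beta})^T(\mathbf{I}-\mathbf{L})^T(\mathbf{I}-\mathbf{L})\mathbf{X}\boldsymbol{\beta}+\mathrm{tr}[(\mathbf{I}-\mathbf{L})^T(\mathbf{I}-\mathbf{L})\sigma^2\mathbf{I}]}=\frac{\sigma^2\,\mathrm{tr}[(\mathbf{I}-\mathbf{L})^T(\mathbf{I}-\mathbf{L})(\mathbf{I}-\mathbf{L})^T(\mathbf{I}-\mathbf{L})]}{\mathrm{tr}[(\mathbf{I}-\mathbf{L})^T(\mathbf{I}-\mathbf{L})]}$$
and
$$f=\frac{\{(\mathbf{X}\boldsymbol{\beta})^T(\mathbf{I}-\mathbf{L})^T(\mathbf{I}-\mathbf{L})\mathbf{X}\boldsymbol{\beta}+\mathrm{tr}[(\mathbf{I}-\mathbf{L})^T(\mathbf{I}-\mathbf{L})\sigma^2\mathbf{I}]\}^2}{2(\mathbf{X}\boldsymbol{\beta})^T(\mathbf{I}-\mathbf{L})^T(\mathbf{I}-\mathbf{L})\sigma^2\mathbf{I}(\mathbf{I}-\mathbf{L})^T(\mathbf{I}-\mathbf{L})\mathbf{X}\boldsymbol{\beta}+\mathrm{tr}[(\mathbf{I}-\mathbf{L})^T(\mathbf{I}-\mathbf{L})\sigma^2\mathbf{I}(\mathbf{I}-\mathbf{L})^T(\mathbf{I}-\mathbf{L})\sigma^2\mathbf{I}]}=\frac{\{\mathrm{tr}[(\mathbf{I}-\mathbf{L})^T(\mathbf{I}-\mathbf{L})]\}^2}{\mathrm{tr}[(\mathbf{I}-\mathbf{L})^T(\mathbf{I}-\mathbf{L})(\mathbf{I}-\mathbf{L})^T(\mathbf{I}-\mathbf{L})]}.$$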
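The moment-matching constants of Exercise 13 are easy to compute for a given case. The sketch below assumes NumPy; the function name `chi2_moment_match`, the random design matrix, and the weights are hypothetical. The matrix L is one convenient choice satisfying LX = X with L ≠ P_X (a weighted least squares projector), not one prescribed by the text.

```python
import numpy as np

def chi2_moment_match(A, mu, Sigma):
    """Return (c, f) so that c*chi2(f) matches the mean and variance of x'Ax, x ~ N(mu, Sigma)."""
    AS = A @ Sigma
    mean = mu @ A @ mu + np.trace(AS)
    var = 2.0 * np.trace(AS @ AS) + 4.0 * (mu @ AS @ A @ mu)
    c = var / (2.0 * mean)
    f = mean / c
    return c, f

rng = np.random.default_rng(0)
n, p = 8, 3
X = rng.normal(size=(n, p))                    # hypothetical full-rank design
beta = np.array([1.0, -2.0, 0.5])
sigma2 = 1.3

# Hypothetical L with LX = X but L != P_X: an oblique (weighted) projector
W_inv = np.diag(1.0 / np.arange(1.0, n + 1.0))
L = X @ np.linalg.solve(X.T @ W_inv @ X, X.T @ W_inv)
R = np.eye(n) - L
A = R.T @ R

c, f = chi2_moment_match(A, X @ beta, sigma2 * np.eye(n))
c_closed = sigma2 * np.trace(A @ A) / np.trace(A)   # part (b) expression for c
f_closed = np.trace(A) ** 2 / np.trace(A @ A)       # part (b) expression for f
print(c, c_closed, f, f_closed)
```

Because (I − L)Xβ = 0, the terms involving β drop out and the general formulas of part (a) reduce to the trace ratios of part (b).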
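Exercise 14 Suppose that the conditions of part (a) of Corollary 14.2.6.1 hold, i.e., suppose that $\mathbf{x}\sim N_n(\boldsymbol{\mu},\boldsymbol{\Sigma})$ where $\mathrm{rank}(\boldsymbol{\Sigma})=n$ and that $\mathbf{A}$ is a nonnull symmetric $n\times n$ matrix of constants such that $\mathbf{A}\boldsymbol{\Sigma}$ is idempotent. Prove that $\mathbf{A}$ is nonnegative definite.

Solution Because $\mathbf{A}\boldsymbol{\Sigma}$ is idempotent, $\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}\boldsymbol{\Sigma}=\mathbf{A}\boldsymbol{\Sigma}$. Post-multiplying both sides of this matrix equation by $\boldsymbol{\Sigma}^{-1}$ yields $\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}=\mathbf{A}$. Therefore, for any $n$-vector $\mathbf{x}$, $\mathbf{x}^T\mathbf{A}\mathbf{x}=\mathbf{x}^T\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}\mathbf{x}=\mathbf{x}^T\mathbf{A}^T\boldsymbol{\Sigma}\mathbf{A}\mathbf{x}=(\mathbf{A}\mathbf{x})^T\boldsymbol{\Sigma}(\mathbf{A}\mathbf{x})\ge 0$ because $\boldsymbol{\Sigma}$ is positive definite.

Exercise 15 Consider the special case of Theorem 14.3.1 in which there are only two functions of interest and they are quadratic forms, i.e., $w_1=\mathbf{x}^T\mathbf{A}_1\mathbf{x}$ and $w_2=\mathbf{x}^T\mathbf{A}_2\mathbf{x}$. Show that, in the following cases, necessary and sufficient conditions for these two quadratic forms to be distributed independently are:
(a) $\boldsymbol{\Sigma}\mathbf{A}_1\boldsymbol{\Sigma}\mathbf{A}_2\boldsymbol{\Sigma}=\mathbf{0}$, $\boldsymbol{\Sigma}\mathbf{A}_1\boldsymbol{\Sigma}\mathbf{A}_2\boldsymbol{\mu}=\mathbf{0}$, $\boldsymbol{\Sigma}\mathbf{A}_2\boldsymbol{\Sigma}\mathbf{A}_1\boldsymbol{\mu}=\mathbf{0}$, and $\boldsymbol{\mu}^T\mathbf{A}_1\boldsymbol{\Sigma}\mathbf{A}_2\boldsymbol{\mu}=0$, when there are no restrictions on $\mathbf{A}_1$ and $\mathbf{A}_2$ beyond those stated in the theorem.
(b) $\mathbf{A}_1\boldsymbol{\Sigma}\mathbf{A}_2\boldsymbol{\Sigma}=\mathbf{0}$ and $\mathbf{A}_1\boldsymbol{\Sigma}\mathbf{A}_2\boldsymbol{\mu}=\mathbf{0}$ when $\mathbf{A}_1$ is nonnegative definite.
(c) $\mathbf{A}_1\boldsymbol{\Sigma}\mathbf{A}_2=\mathbf{0}$ when $\mathbf{A}_1$ and $\mathbf{A}_2$ are nonnegative definite.

Solution
(a) These conditions are trivial specializations of the conditions of Theorem 14.3.1 to the case in which $k=2$ and $\boldsymbol{\ell}_1=\boldsymbol{\ell}_2=\mathbf{0}$.
(b) If $\mathbf{A}_1\boldsymbol{\Sigma}\mathbf{A}_2\boldsymbol{\Sigma}=\mathbf{0}$ and $\mathbf{A}_1\boldsymbol{\Sigma}\mathbf{A}_2\boldsymbol{\mu}=\mathbf{0}$, then $\boldsymbol{\Sigma}\mathbf{A}_1\boldsymbol{\Sigma}\mathbf{A}_2\boldsymbol{\Sigma}=\boldsymbol{\Sigma}(\mathbf{A}_1\boldsymbol{\Sigma}\mathbf{A}_2\boldsymbol{\Sigma})=\boldsymbol{\Sigma}\mathbf{0}=\mathbf{0}$; $\boldsymbol{\Sigma}\mathbf{A}_1\boldsymbol{\Sigma}\mathbf{A}_2\boldsymbol{\mu}=\boldsymbol{\Sigma}(\mathbf{A}_1\boldsymbol{\Sigma}\mathbf{A}_2\boldsymbol{\mu})=\boldsymbol{\Sigma}\mathbf{0}=\mathbf{0}$; $\boldsymbol{\Sigma}\mathbf{A}_2\boldsymbol{\Sigma}\mathbf{A}_1\boldsymbol{\mu}=(\mathbf{A}_1\boldsymbol{\Sigma}\mathbf{A}_2\boldsymbol{\Sigma})^T\boldsymbol{\mu}=\mathbf{0}\boldsymbol{\mu}=\mathbf{0}$; and $\boldsymbol{\mu}^T\mathbf{A}_1\boldsymbol{\Sigma}\mathbf{A}_2\boldsymbol{\mu}=\boldsymbol{\mu}^T(\mathbf{A}_1\boldsymbol{\Sigma}\mathbf{A}_2\boldsymbol{\mu})=\boldsymbol{\mu}^T\mathbf{0}=0$. Thus the conditions of part (a) are satisfied so $w_1$ and $w_2$ are independent. Conversely, if $w_1$ and $w_2$ are independent and $\mathbf{A}_1$ is nonnegative definite, then the conditions of part (a) are satisfied and they may be written as $\boldsymbol{\Sigma}\mathbf{A}_1\boldsymbol{\Sigma}\mathbf{A}_2\boldsymbol{\Sigma}=(\mathbf{A}_1^{\frac{1}{2}}\boldsymbol{\Sigma})^T(\mathbf{A}_1^{\frac{1}{2}}\boldsymbol{\Sigma})\mathbf{A}_2\boldsymbol{\Sigma}=\mathbf{0}$ and $\boldsymbol{\Sigma}\mathbf{A}_1\boldsymbol{\Sigma}\mathbf{A}_2\boldsymbol{\mu}=(\mathbf{A}_1^{\frac{1}{2}}\boldsymbol{\Sigma})^T(\mathbf{A}_1^{\frac{1}{2}}\boldsymbol{\Sigma})\mathbf{A}_2\boldsymbol{\mu}=\mathbf{0}$, where $\mathbf{A}_1^{\frac{1}{2}}$ is the unique nonnegative definite square root of $\mathbf{A}_1$. Thus, by Theorem 3.3.2, $\mathbf{A}_1^{\frac{1}{2}}\boldsymbol{\Sigma}\mathbf{A}_2\boldsymbol{\Sigma}=\mathbf{0}$ and $\mathbf{A}_1^{\frac{1}{2}}\boldsymbol{\Sigma}\mathbf{A}_2\boldsymbol{\mu}=\mathbf{0}$. Pre-multiplying both equalities by $\mathbf{A}_1^{\frac{1}{2}}$ yields $\mathbf{A}_1\boldsymbol{\Sigma}\mathbf{A}_2\boldsymbol{\Sigma}=\mathbf{0}$ and $\mathbf{A}_1\boldsymbol{\Sigma}\mathbf{A}_2\boldsymbol{\mu}=\mathbf{0}$.
(c) If $\mathbf{A}_1\boldsymbol{\Sigma}\mathbf{A}_2=\mathbf{0}$, then $\mathbf{A}_1\boldsymbol{\Sigma}\mathbf{A}_2\boldsymbol{\Sigma}=(\mathbf{A}_1\boldsymbol{\Sigma}\mathbf{A}_2)\boldsymbol{\Sigma}=\mathbf{0}\boldsymbol{\Sigma}=\mathbf{0}$ and $\mathbf{A}_1\boldsymbol{\Sigma}\mathbf{A}_2\boldsymbol{\mu}=(\mathbf{A}_1\boldsymbol{\Sigma}\mathbf{A}_2)\boldsymbol{\mu}=\mathbf{0}\boldsymbol{\mu}=\mathbf{0}$. Thus the conditions of part (b) are satisfied so $w_1$ and $w_2$ are independent. Conversely, if $w_1$ and $w_2$ are independent and $\mathbf{A}_1$ and $\mathbf{A}_2$ are nonnegative definite, then the conditions of part (b) are satisfied, the first of which may be written as $\mathbf{A}_1\boldsymbol{\Sigma}(\mathbf{A}_2^{\frac{1}{2}})^T(\mathbf{A}_2^{\frac{1}{2}})\boldsymbol{\Sigma}=\mathbf{0}$, where $\mathbf{A}_2^{\frac{1}{2}}$ is the unique nonnegative definite square root of $\mathbf{A}_2$. This implies, by Theorem 3.3.2, that $\mathbf{A}_1\boldsymbol{\Sigma}\mathbf{A}_2^{\frac{1}{2}}=\mathbf{0}$. Post-multiplication of this equality by $\mathbf{A}_2^{\frac{1}{2}}$ yields $\mathbf{A}_1\boldsymbol{\Sigma}\mathbf{A}_2=\mathbf{0}$.

Exercise 16 Suppose that $\mathbf{x}_1$, $\mathbf{x}_2$, and $\mathbf{x}_3$ are independent and identically distributed as $N_n(\boldsymbol{\mu},\boldsymbol{\Sigma})$, where $\mathrm{rank}(\boldsymbol{\Sigma})=n$. Define $\mathbf{Q}=(\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3)\mathbf{A}(\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3)^T$, where $\mathbf{A}$ is a symmetric idempotent $3\times 3$ matrix of constants. Let $r=\mathrm{rank}(\mathbf{A})$ and let $\mathbf{c}$ represent a nonzero $n$-vector of constants. Determine the distribution of $\mathbf{c}^T\mathbf{Q}\mathbf{c}/(\mathbf{c}^T\boldsymbol{\Sigma}\mathbf{c})$. [Hint: First obtain the distribution of $(\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3)^T\mathbf{c}$.]

Solution By the given conditions, $\mathbf{c}^T\mathbf{x}_1$, $\mathbf{c}^T\mathbf{x}_2$, and $\mathbf{c}^T\mathbf{x}_3$ are independent and identically distributed as $N(\mathbf{c}^T\boldsymbol{\mu},\mathbf{c}^T\boldsymbol{\Sigma}\mathbf{c})$, implying that
$$(\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3)^T\mathbf{c}\sim N_3[(\mathbf{c}^T\boldsymbol{\mu})\mathbf{1}_3,\ \mathrm{diag}(\mathbf{c}^T\boldsymbol{\Sigma}\mathbf{c},\mathbf{c}^T\boldsymbol{\Sigma}\mathbf{c},\mathbf{c}^T\boldsymbol{\Sigma}\mathbf{c})].$$
Partially standardizing, we have
$$(\mathbf{c}^T\boldsymbol{\Sigma}\mathbf{c})^{-\frac{1}{2}}(\mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3)^T\mathbf{c}\sim N_3[(\mathbf{c}^T\boldsymbol{\Sigma}\mathbf{c})^{-\frac{1}{2}}(\mathbf{c}^T\boldsymbol{\mu})\mathbf{1}_3,\ \mathbf{I}_3].$$
Because $\mathbf{A}\mathbf{I}_3=\mathbf{A}$ is idempotent, by Corollary 14.2.6.1 we have
$$\mathbf{c}^T\mathbf{Q}\mathbf{c}/(\mathbf{c}^T\boldsymbol{\Sigma}\mathbf{c})\sim\chi^2\bigl[r,\ \tfrac{1}{2}(\mathbf{c}^T\boldsymbol{\Sigma}\mathbf{c})^{-1}(\mathbf{c}^T\boldsymbol{\mu})^2\mathbf{1}_3^T\mathbf{A}\mathbf{1}_3\bigr].$$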
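Exercise 17 Suppose that $\mathbf{x}\sim N_n(\boldsymbol{\mu},\boldsymbol{\Sigma})$, where $\mathrm{rank}(\boldsymbol{\Sigma})=n$, and partition $\mathbf{x}$, $\boldsymbol{\mu}$, and $\boldsymbol{\Sigma}$ conformably as follows:
$$\mathbf{x}=\begin{pmatrix}\mathbf{x}_1\\ \mathbf{x}_2\end{pmatrix},\qquad\boldsymbol{\mu}=\begin{pmatrix}\boldsymbol{\mu}_1\\ \boldsymbol{\mu}_2\end{pmatrix},\qquad\boldsymbol{\Sigma}=\begin{pmatrix}\boldsymbol{\Sigma}_{11}&\boldsymbol{\Sigma}_{12}\\ \boldsymbol{\Sigma}_{21}&\boldsymbol{\Sigma}_{22}\end{pmatrix},$$
where $\mathbf{x}_1$ is $n_1\times 1$ and $\mathbf{x}_2$ is $n_2\times 1$.
(a) Obtain the distribution of $Q_1=(\mathbf{x}_1-\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\mathbf{x}_2)^T(\boldsymbol{\Sigma}_{11}-\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21})^{-1}(\mathbf{x}_1-\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\mathbf{x}_2)$.
(b) Obtain the distribution of $Q_2=\mathbf{x}_2^T\boldsymbol{\Sigma}_{22}^{-1}\mathbf{x}_2$.
(c) Suppose that $\boldsymbol{\mu}_2=\mathbf{0}$. Obtain the distribution of $Q_1/Q_2$ (suitably scaled).

Solution
(a) Observe that
$$\mathbf{x}_1-\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\mathbf{x}_2=\begin{pmatrix}\mathbf{I}_{n_1}&-\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\end{pmatrix}\begin{pmatrix}\mathbf{x}_1\\ \mathbf{x}_2\end{pmatrix}.$$
Thus, by Theorem 14.1.6, $\mathbf{x}_1-\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\mathbf{x}_2$ is normally distributed with mean vector
$$\begin{pmatrix}\mathbf{I}_{n_1}&-\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\end{pmatrix}\begin{pmatrix}\boldsymbol{\mu}_1\\ \boldsymbol{\mu}_2\end{pmatrix}=\boldsymbol{\mu}_1-\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\mu}_2$$
and variance–covariance matrix
$$\begin{pmatrix}\mathbf{I}_{n_1}&-\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\end{pmatrix}\begin{pmatrix}\boldsymbol{\Sigma}_{11}&\boldsymbol{\Sigma}_{12}\\ \boldsymbol{\Sigma}_{21}&\boldsymbol{\Sigma}_{22}\end{pmatrix}\begin{pmatrix}\mathbf{I}_{n_1}\\ -\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}\end{pmatrix}=\boldsymbol{\Sigma}_{11}-\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}.$$
Because $\boldsymbol{\Sigma}$ is positive definite, the $n_1\times n_1$ matrix $\boldsymbol{\Sigma}_{11}-\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}$ is also positive definite by Theorem 2.15.7a. So by Theorem 14.2.4,
$$Q_1\sim\chi^2\bigl(n_1,\ (\boldsymbol{\mu}_1-\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\mu}_2)^T(\boldsymbol{\Sigma}_{11}-\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21})^{-1}(\boldsymbol{\mu}_1-\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\mu}_2)/2\bigr).$$
(b) $\mathbf{x}_2\sim N(\boldsymbol{\mu}_2,\boldsymbol{\Sigma}_{22})$ by Theorem 14.1.7. By the positive definiteness of $\boldsymbol{\Sigma}$ and Theorem 2.15.7a, $\boldsymbol{\Sigma}_{22}$ is positive definite, so its rank is $n_2$. By Theorem 14.2.4, $Q_2\sim\chi^2(n_2,\boldsymbol{\mu}_2^T\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\mu}_2/2)$.
(c) Because
$$\begin{pmatrix}\mathbf{x}_1-\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\mathbf{x}_2\\ \mathbf{x}_2\end{pmatrix}=\begin{pmatrix}\mathbf{I}&-\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\\ \mathbf{0}&\mathbf{I}\end{pmatrix}\begin{pmatrix}\mathbf{x}_1\\ \mathbf{x}_2\end{pmatrix},$$
the joint distribution of $\mathbf{x}_1-\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\mathbf{x}_2$ and $\mathbf{x}_2$ is multivariate normal, and their matrix of covariances is
$$\begin{pmatrix}\mathbf{I}&-\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\end{pmatrix}\begin{pmatrix}\boldsymbol{\Sigma}_{11}&\boldsymbol{\Sigma}_{12}\\ \boldsymbol{\Sigma}_{21}&\boldsymbol{\Sigma}_{22}\end{pmatrix}\begin{pmatrix}\mathbf{0}\\ \mathbf{I}\end{pmatrix}=\mathbf{0}.$$
Thus $\mathbf{x}_1-\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\mathbf{x}_2$ and $\mathbf{x}_2$ are independent by Theorem 14.1.8. Moreover, when $\boldsymbol{\mu}_2=\mathbf{0}$, the results of parts (a) and (b) specialize as follows: $Q_1\sim\chi^2(n_1,\boldsymbol{\mu}_1^T(\boldsymbol{\Sigma}_{11}-\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21})^{-1}\boldsymbol{\mu}_1/2)$ and $Q_2\sim\chi^2(n_2)$. Thus, by Definition 14.5.1,
$$\frac{Q_1/n_1}{Q_2/n_2}\sim F\bigl(n_1,n_2,\ \boldsymbol{\mu}_1^T(\boldsymbol{\Sigma}_{11}-\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21})^{-1}\boldsymbol{\mu}_1/2\bigr).$$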
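The two matrix facts used in Exercise 17, that the transformed vector has the Schur complement as its variance and is uncorrelated with x₂, can be confirmed numerically. This is a minimal sketch, assuming NumPy and a hypothetical randomly generated positive definite Σ; the names `T`, `V`, and `schur` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2 = 2, 3
G = rng.normal(size=(n1 + n2, n1 + n2))
Sigma = G @ G.T + np.eye(n1 + n2)       # hypothetical positive definite matrix

S11, S12 = Sigma[:n1, :n1], Sigma[:n1, n1:]
S21, S22 = Sigma[n1:, :n1], Sigma[n1:, n1:]

# T maps x = (x1, x2) to (x1 - S12 S22^{-1} x2, x2)
T = np.block([[np.eye(n1), -S12 @ np.linalg.inv(S22)],
              [np.zeros((n2, n1)), np.eye(n2)]])
V = T @ Sigma @ T.T                     # variance-covariance matrix of the transformed vector

schur = S11 - S12 @ np.linalg.solve(S22, S21)
print(np.round(V, 10))
```

The upper-left block of `V` should equal the Schur complement, the lower-right block should equal Σ₂₂, and the off-diagonal blocks should be zero up to rounding.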
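Exercise 18 Suppose that $\mathbf{x}\sim N_n(\mathbf{0},\mathbf{I})$ and that $\mathbf{A}$ and $\mathbf{B}$ are $n\times n$ symmetric idempotent matrices of ranks $p$ and $q$, respectively, where $p>q$. Show, by completing the following three steps, that $\mathbf{x}^T(\mathbf{A}-\mathbf{B})\mathbf{x}\sim\chi^2(p-q)$ if $\mathrm{corr}(\mathbf{x}^T\mathbf{A}\mathbf{x},\mathbf{x}^T\mathbf{B}\mathbf{x})=\sqrt{q/p}$.
(a) Show that $\mathrm{tr}[(\mathbf{I}-\mathbf{A})\mathbf{B}]=0$ if and only if $\mathrm{corr}(\mathbf{x}^T\mathbf{A}\mathbf{x},\mathbf{x}^T\mathbf{B}\mathbf{x})=\sqrt{q/p}$.
(b) Show that $(\mathbf{I}-\mathbf{A})\mathbf{B}=\mathbf{0}$ if and only if $\mathrm{tr}[(\mathbf{I}-\mathbf{A})\mathbf{B}]=0$.
(c) Show that $\mathbf{x}^T(\mathbf{A}-\mathbf{B})\mathbf{x}\sim\chi^2(p-q)$ if $(\mathbf{I}-\mathbf{A})\mathbf{B}=\mathbf{0}$.

Solution
(a) By Corollary 4.2.6.3,
$$\mathrm{cov}(\mathbf{x}^T\mathbf{A}\mathbf{x},\mathbf{x}^T\mathbf{B}\mathbf{x})=2\,\mathrm{tr}(\mathbf{A}\mathbf{I}\mathbf{B}\mathbf{I})+4\cdot\mathbf{0}^T\mathbf{A}\mathbf{I}\mathbf{B}\mathbf{0}=2\,\mathrm{tr}(\mathbf{A}\mathbf{B}),$$
$$\mathrm{var}(\mathbf{x}^T\mathbf{A}\mathbf{x})=2\,\mathrm{tr}(\mathbf{A}\mathbf{I}\mathbf{A}\mathbf{I})+4\cdot\mathbf{0}^T\mathbf{A}\mathbf{I}\mathbf{A}\mathbf{0}=2\,\mathrm{tr}(\mathbf{A})=2p,$$
$$\mathrm{var}(\mathbf{x}^T\mathbf{B}\mathbf{x})=2\,\mathrm{tr}(\mathbf{B}\mathbf{I}\mathbf{B}\mathbf{I})+4\cdot\mathbf{0}^T\mathbf{B}\mathbf{I}\mathbf{B}\mathbf{0}=2\,\mathrm{tr}(\mathbf{B})=2q.$$
So $\mathrm{corr}(\mathbf{x}^T\mathbf{A}\mathbf{x},\mathbf{x}^T\mathbf{B}\mathbf{x})=2\,\mathrm{tr}(\mathbf{A}\mathbf{B})/\sqrt{4pq}=\mathrm{tr}(\mathbf{A}\mathbf{B})/\sqrt{pq}$, which is equal to $\sqrt{q/p}$ if and only if $\mathrm{tr}(\mathbf{A}\mathbf{B})=q=\mathrm{rank}(\mathbf{B})$, i.e., if and only if $\mathrm{tr}(\mathbf{A}\mathbf{B})=\mathrm{tr}(\mathbf{B})$, i.e., if and only if $\mathrm{tr}[(\mathbf{I}-\mathbf{A})\mathbf{B}]=0$.
(b) If $(\mathbf{I}-\mathbf{A})\mathbf{B}=\mathbf{0}$, then $\mathrm{tr}[(\mathbf{I}-\mathbf{A})\mathbf{B}]=0$ trivially. Conversely, if $\mathrm{tr}[(\mathbf{I}-\mathbf{A})\mathbf{B}]=0$, then $\mathrm{tr}\{[(\mathbf{I}-\mathbf{A})\mathbf{B}]^T[(\mathbf{I}-\mathbf{A})\mathbf{B}]\}=\mathrm{tr}[\mathbf{B}(\mathbf{I}-\mathbf{A})\mathbf{B}]=\mathrm{tr}[(\mathbf{I}-\mathbf{A})\mathbf{B}\mathbf{B}]=\mathrm{tr}[(\mathbf{I}-\mathbf{A})\mathbf{B}]=0$. By Theorem 2.10.4, $(\mathbf{I}-\mathbf{A})\mathbf{B}=\mathbf{0}$.
(c) If $(\mathbf{I}-\mathbf{A})\mathbf{B}=\mathbf{0}$ then $\mathbf{B}=\mathbf{A}\mathbf{B}$, so $\mathbf{B}\mathbf{A}=\mathbf{B}^T\mathbf{A}^T=(\mathbf{A}\mathbf{B})^T=\mathbf{B}^T=\mathbf{B}$. Then $(\mathbf{A}-\mathbf{B})(\mathbf{A}-\mathbf{B})=\mathbf{A}\mathbf{A}-\mathbf{B}\mathbf{A}-\mathbf{A}\mathbf{B}+\mathbf{B}\mathbf{B}=\mathbf{A}-\mathbf{B}-\mathbf{B}+\mathbf{B}=\mathbf{A}-\mathbf{B}$, i.e., $\mathbf{A}-\mathbf{B}$ is idempotent. Furthermore, $\mathrm{rank}(\mathbf{A}-\mathbf{B})=\mathrm{tr}(\mathbf{A}-\mathbf{B})=\mathrm{tr}(\mathbf{A})-\mathrm{tr}(\mathbf{B})=p-q$. Thus, by Corollary 14.2.6.1, $\mathbf{x}^T(\mathbf{A}-\mathbf{B})\mathbf{x}\sim\chi^2(p-q)$.

Exercise 19 Suppose that $\mathbf{x}\sim N_n(\boldsymbol{\mu},\boldsymbol{\Sigma})$, where $\mathrm{rank}(\boldsymbol{\Sigma})=n$. Partition $\mathbf{x}$, $\boldsymbol{\mu}$, and $\boldsymbol{\Sigma}$ conformably as
$$\begin{pmatrix}\mathbf{x}_1\\ \mathbf{x}_2\end{pmatrix},\qquad\begin{pmatrix}\boldsymbol{\mu}_1\\ \boldsymbol{\mu}_2\end{pmatrix},\qquad\text{and}\qquad\begin{pmatrix}\boldsymbol{\Sigma}_{11}&\boldsymbol{\Sigma}_{12}\\ \boldsymbol{\Sigma}_{21}&\boldsymbol{\Sigma}_{22}\end{pmatrix},$$
where $\mathbf{x}_1$ is $n_1\times 1$ and $\mathbf{x}_2$ is $n_2\times 1$. Let $\mathbf{A}$ and $\mathbf{B}$ represent symmetric $n_1\times n_1$ and $n_2\times n_2$ matrices of constants, respectively. Determine a necessary and sufficient condition for $\mathbf{x}_1^T\mathbf{A}\mathbf{x}_1$ and $\mathbf{x}_2^T\mathbf{B}\mathbf{x}_2$ to be independent.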
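Solution By Corollary 14.3.1.2a,
$$\mathbf{x}_1^T\mathbf{A}\mathbf{x}_1=\begin{pmatrix}\mathbf{x}_1\\ \mathbf{x}_2\end{pmatrix}^T\begin{pmatrix}\mathbf{A}&\mathbf{0}\\ \mathbf{0}&\mathbf{0}\end{pmatrix}\begin{pmatrix}\mathbf{x}_1\\ \mathbf{x}_2\end{pmatrix}\quad\text{and}\quad\mathbf{x}_2^T\mathbf{B}\mathbf{x}_2=\begin{pmatrix}\mathbf{x}_1\\ \mathbf{x}_2\end{pmatrix}^T\begin{pmatrix}\mathbf{0}&\mathbf{0}\\ \mathbf{0}&\mathbf{B}\end{pmatrix}\begin{pmatrix}\mathbf{x}_1\\ \mathbf{x}_2\end{pmatrix}$$
are independent if and only if
$$\begin{pmatrix}\mathbf{A}&\mathbf{0}\\ \mathbf{0}&\mathbf{0}\end{pmatrix}\begin{pmatrix}\boldsymbol{\Sigma}_{11}&\boldsymbol{\Sigma}_{12}\\ \boldsymbol{\Sigma}_{21}&\boldsymbol{\Sigma}_{22}\end{pmatrix}\begin{pmatrix}\mathbf{0}&\mathbf{0}\\ \mathbf{0}&\mathbf{B}\end{pmatrix}=\mathbf{0},$$
i.e., if and only if $\mathbf{A}\boldsymbol{\Sigma}_{12}\mathbf{B}=\mathbf{0}$.

Exercise 20 Suppose that $\mathbf{x}\sim N_n(\mathbf{a},\sigma^2\mathbf{I})$ where $\mathbf{a}$ is a nonzero $n$-vector of constants. Let $\mathbf{b}$ represent another nonzero $n$-vector. Define $\mathbf{P}_a=\mathbf{a}\mathbf{a}^T/(\mathbf{a}^T\mathbf{a})$ and $\mathbf{P}_b=\mathbf{b}\mathbf{b}^T/(\mathbf{b}^T\mathbf{b})$.
(a) Determine the distribution of $\mathbf{x}^T\mathbf{P}_a\mathbf{x}/\sigma^2$. (Simplify as much as possible here and in all parts of this exercise.)
(b) Determine the distribution of $\mathbf{x}^T\mathbf{P}_b\mathbf{x}/\sigma^2$.
(c) Determine, in as simple a form as possible, a necessary and sufficient condition for the two quadratic forms in parts (a) and (b) to be independent.
(d) Under the condition in part (c), can the expressions for the parameters in either or both of the distributions in parts (a) and (b) be simplified further? If so, how?

Solution
(a) Note that $\frac{1}{\sigma}\mathbf{x}\sim N(\frac{1}{\sigma}\mathbf{a},\mathbf{I})$ and that $\mathbf{P}_a\mathbf{I}=\mathbf{P}_a$ is idempotent with $\mathrm{rank}(\mathbf{P}_a)=1$. Then by part (a) of Corollary 14.2.6.1, $\mathbf{x}^T\mathbf{P}_a\mathbf{x}/\sigma^2=(\frac{1}{\sigma}\mathbf{x})^T\mathbf{P}_a(\frac{1}{\sigma}\mathbf{x})\sim\chi^2(1,\frac{1}{2}(\frac{1}{\sigma}\mathbf{a})^T\mathbf{P}_a(\frac{1}{\sigma}\mathbf{a}))$, i.e., $\chi^2(1,\frac{1}{2\sigma^2}\mathbf{a}^T\mathbf{a})$.
(b) Note that $\mathbf{P}_b\mathbf{I}=\mathbf{P}_b$ is idempotent with $\mathrm{rank}(\mathbf{P}_b)=1$. Then by part (a) of Corollary 14.2.6.1, $\mathbf{x}^T\mathbf{P}_b\mathbf{x}/\sigma^2=(\frac{1}{\sigma}\mathbf{x})^T\mathbf{P}_b(\frac{1}{\sigma}\mathbf{x})\sim\chi^2(1,\frac{1}{2}(\frac{1}{\sigma}\mathbf{a})^T\mathbf{P}_b(\frac{1}{\sigma}\mathbf{a}))$, i.e., $\chi^2(1,\frac{1}{2\sigma^2}\mathbf{a}^T\mathbf{P}_b\mathbf{a})$.
(c) By Corollary 14.3.1.2a, the two quadratic forms in parts (a) and (b) are independent if and only if $\mathbf{P}_a\mathbf{I}\mathbf{P}_b=\mathbf{0}$, i.e., if and only if $\mathbf{a}\mathbf{a}^T\mathbf{b}\mathbf{b}^T/(\mathbf{a}^T\mathbf{a}\,\mathbf{b}^T\mathbf{b})=\mathbf{0}$, i.e., if and only if $\mathbf{a}\mathbf{a}^T\mathbf{b}\mathbf{b}^T=\mathbf{0}$. This last condition holds if $\mathbf{a}^T\mathbf{b}=0$, and it implies (by pre-multiplication and post-multiplication of both sides by $\mathbf{a}^T$ and $\mathbf{b}$, respectively, and then dividing both sides by the nonzero scalars $\mathbf{a}^T\mathbf{a}$ and $\mathbf{b}^T\mathbf{b}$) that $\mathbf{a}^T\mathbf{b}=0$. Thus a necessary and sufficient condition is that $\mathbf{a}^T\mathbf{b}=0$, i.e., that $\mathbf{a}$ and $\mathbf{b}$ are orthogonal.
(d) If $\mathbf{a}^T\mathbf{b}=0$, the noncentrality parameter of the distribution of $\mathbf{x}^T\mathbf{P}_b\mathbf{x}/\sigma^2$ simplifies to 0 because $\mathbf{a}^T\mathbf{P}_b\mathbf{a}=\mathbf{a}^T(1/\mathbf{b}^T\mathbf{b})\mathbf{b}\mathbf{b}^T\mathbf{a}=0$. The noncentrality parameter of the distribution of $\mathbf{x}^T\mathbf{P}_a\mathbf{x}/\sigma^2$ does not simplify further.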
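The orthogonality condition of Exercise 20(c) and the simplification in part (d) can be illustrated with concrete vectors. The sketch below assumes NumPy; the vectors a, b and the value of σ² are hypothetical, and `proj` is an illustrative helper name.

```python
import numpy as np

def proj(v):
    """Orthogonal projection matrix onto the span of the vector v."""
    v = np.asarray(v, dtype=float)
    return np.outer(v, v) / (v @ v)

a = np.array([1.0, 2.0, -1.0])
b_orth = np.array([1.0, 0.0, 1.0])   # a'b = 0
b_not = np.array([1.0, 1.0, 0.0])    # a'b = 3
sigma2 = 2.0                         # hypothetical variance

prod_orth = proj(a) @ proj(b_orth)   # P_a P_b when a and b are orthogonal
prod_not = proj(a) @ proj(b_not)     # P_a P_b otherwise

ncp_a = a @ proj(a) @ a / (2 * sigma2)            # noncentrality in part (a)
ncp_b_orth = a @ proj(b_orth) @ a / (2 * sigma2)  # noncentrality in part (b) when a'b = 0
print(np.allclose(prod_orth, 0.0), np.allclose(prod_not, 0.0), ncp_a, ncp_b_orth)
```

Only the orthogonal pair gives a zero product, and in that case the noncentrality parameter in part (b) vanishes while the one in part (a) stays at aᵀa/(2σ²).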
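Exercise 21 Suppose that $\mathbf{y}$ follows the normal Aitken model $\{\mathbf{y},\mathbf{X}\boldsymbol{\beta},\sigma^2\mathbf{W}\}$. Let $\mathbf{A}$ represent any $n\times n$ nonnegative definite matrix such that $\mathbf{y}^T\mathbf{A}\mathbf{y}$ is an unbiased estimator of $\sigma^2$.
(a) Show that $\mathbf{y}^T\mathbf{A}\mathbf{y}$ is uncorrelated with every linear function $\mathbf{a}^T\mathbf{y}$.
(b) Suppose further that $\mathbf{a}^T\mathbf{y}$ is a BLUE of its expectation. Show that $\mathbf{y}^T\mathbf{A}\mathbf{y}$ and $\mathbf{a}^T\mathbf{y}$ are distributed independently.
(c) The unbiasedness of $\mathbf{y}^T\mathbf{A}\mathbf{y}$ for $\sigma^2$ implies that the main diagonal elements of $\mathbf{A}\mathbf{W}$ satisfy a certain property. Give this property. If $\mathbf{W}=k\mathbf{I}+(1-k)\mathbf{J}$ for $k\in[0,1]$, what must $k$ equal (in terms of $\mathbf{A}$) in order for this property to be satisfied?

Solution
(a) By Theorem 4.2.4, the unbiasedness of $\mathbf{y}^T\mathbf{A}\mathbf{y}$ for $\sigma^2$ implies that
$$\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{A}\mathbf{X}\boldsymbol{\beta}+\sigma^2\,\mathrm{tr}(\mathbf{A}\mathbf{W})=\sigma^2\quad\text{for all }\boldsymbol{\beta}\text{ and all }\sigma^2>0.$$
Putting $\boldsymbol{\beta}=\mathbf{0}_p$ into this equation yields $\mathrm{tr}(\mathbf{A}\mathbf{W})=1$, and back-substituting this into the same equation yields the result
$$\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{A}\mathbf{X}\boldsymbol{\beta}=0\quad\text{for all }\boldsymbol{\beta}.$$
Writing $\mathbf{A}$ as $\mathbf{A}^{\frac{1}{2}}\mathbf{A}^{\frac{1}{2}}$, this last result can be rewritten as $\boldsymbol{\beta}^T\mathbf{X}^T\mathbf{A}^{\frac{1}{2}}\mathbf{A}^{\frac{1}{2}}\mathbf{X}\boldsymbol{\beta}=0$ for all $\boldsymbol{\beta}$, implying that $\mathbf{A}^{\frac{1}{2}}\mathbf{X}\boldsymbol{\beta}=\mathbf{0}$ for all $\boldsymbol{\beta}$, implying further that $\mathbf{A}^{\frac{1}{2}}\mathbf{X}=\mathbf{0}$, implying still further (by pre-multiplying both sides of the equality by $\mathbf{A}^{\frac{1}{2}}$) that $\mathbf{A}\mathbf{X}=\mathbf{0}$. Then, by Corollary 4.2.5.1,
$$\mathrm{cov}(\mathbf{a}^T\mathbf{y},\mathbf{y}^T\mathbf{A}\mathbf{y})=2\mathbf{a}^T(\sigma^2\mathbf{W})\mathbf{A}(\mathbf{X}\boldsymbol{\beta})=2\sigma^2\mathbf{a}^T\mathbf{W}\mathbf{0}\boldsymbol{\beta}=0,$$
so $\mathbf{y}^T\mathbf{A}\mathbf{y}$ is uncorrelated with every linear function $\mathbf{a}^T\mathbf{y}$.
(b) If $\mathbf{a}^T\mathbf{y}$ is a BLUE of its expectation, then by Corollary 11.2.2.2 $\mathbf{W}\mathbf{a}=\mathbf{X}\mathbf{q}$ for some $p$-vector $\mathbf{q}$. Thus $\mathbf{A}(\sigma^2\mathbf{W})\mathbf{a}=\sigma^2\mathbf{A}\mathbf{W}\mathbf{a}=\sigma^2\mathbf{A}\mathbf{X}\mathbf{q}=\mathbf{0}$, where we used the fact, shown in part (a), that $\mathbf{A}\mathbf{X}=\mathbf{0}$. The independence of $\mathbf{y}^T\mathbf{A}\mathbf{y}$ and $\mathbf{a}^T\mathbf{y}$ follows immediately from Corollary 14.3.1.2b.
(c) The unbiasedness of $\mathbf{y}^T\mathbf{A}\mathbf{y}$ for $\sigma^2$ implies, as shown in part (a), that $\mathrm{tr}(\mathbf{A}\mathbf{W})=1$, i.e., that the main diagonal elements of $\mathbf{A}\mathbf{W}$ sum to one. In the special case $\mathbf{W}=k\mathbf{I}+(1-k)\mathbf{J}$, we have
$$1=\mathrm{tr}\{\mathbf{A}[k\mathbf{I}+(1-k)\mathbf{J}]\}=k\,\mathrm{tr}(\mathbf{A})+(1-k)\,\mathrm{tr}(\mathbf{A}\mathbf{J}),$$
implying that
$$k=\frac{1-\mathrm{tr}(\mathbf{A}\mathbf{J})}{\mathrm{tr}(\mathbf{A})-\mathrm{tr}(\mathbf{A}\mathbf{J})}.$$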
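The expression for k in Exercise 21(c) can be tried on a concrete case. The sketch below assumes NumPy; the function name `solve_k` and the particular matrix A (twice the centering matrix divided by n − 1, which satisfies AX = 0 when X is a column of ones) are hypothetical choices made only for illustration.

```python
import numpy as np

def solve_k(A):
    """k making tr(AW) = 1 when W = kI + (1 - k)J; requires tr(A) != tr(AJ)."""
    n = A.shape[0]
    tr_A = np.trace(A)
    tr_AJ = np.trace(A @ np.ones((n, n)))
    return (1.0 - tr_AJ) / (tr_A - tr_AJ)

n = 5
# Hypothetical nonnegative definite A with tr(A) = 2 and tr(AJ) = 0
A = 2.0 * (np.eye(n) - np.ones((n, n)) / n) / (n - 1)

k = solve_k(A)
W = k * np.eye(n) + (1.0 - k) * np.ones((n, n))
trace_AW = np.trace(A @ W)   # should equal 1 for the computed k
print(k, trace_AW)
```

For this A the formula gives k = 1/2, and substituting back confirms that the diagonal elements of AW sum to one.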
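Exercise 22 Suppose that $\mathbf{x}\sim N_n(\mathbf{1}\mu,\sigma^2[(1-\rho)\mathbf{I}+\rho\mathbf{J}])$ where $n\ge 2$, $-\infty<\mu<\infty$, $\sigma^2>0$, and $0<\rho<1$. Define $\mathbf{A}=\frac{1}{n}\mathbf{J}$ and $\mathbf{B}=\frac{1}{n-1}(\mathbf{I}-\frac{1}{n}\mathbf{J})$.
(a) Determine $\mathrm{var}(\mathbf{x}^T\mathbf{A}\mathbf{x})$.
(b) Determine $\mathrm{var}(\mathbf{x}^T\mathbf{B}\mathbf{x})$.
(c) Are $\mathbf{x}^T\mathbf{A}\mathbf{x}$ and $\mathbf{x}^T\mathbf{B}\mathbf{x}$ independent? Explain.
(d) Suppose that $n$ is even. Let $\mathbf{x}_1$ represent the vector consisting of the first $n/2$ elements of $\mathbf{x}$ and let $\mathbf{x}_2$ represent the vector consisting of the remaining $n/2$ elements of $\mathbf{x}$. Determine the distribution of $\mathbf{x}_1-\mathbf{x}_2$.
(e) Suppose that $n=3$, in which case $\mathbf{x}=(x_1,x_2,x_3)^T$. Determine the conditional distribution of $(x_1,x_2)^T$ given that $x_3=1$.

Solution
(a) First observe that $\frac{1}{n}\mathbf{J}\sigma^2[(1-\rho)\mathbf{I}+\rho\mathbf{J}]=\frac{\sigma^2}{n}[(1-\rho)\mathbf{J}+\rho n\mathbf{J}]=\frac{\sigma^2}{n}(1-\rho+\rho n)\mathbf{J}$. By Corollary 4.2.6.3,
$$\mathrm{var}(\mathbf{x}^T\mathbf{A}\mathbf{x})=2\,\mathrm{tr}\Bigl\{\tfrac{1}{n}\mathbf{J}\sigma^2[(1-\rho)\mathbf{I}+\rho\mathbf{J}]\tfrac{1}{n}\mathbf{J}\sigma^2[(1-\rho)\mathbf{I}+\rho\mathbf{J}]\Bigr\}+4(\mu\mathbf{1})^T\tfrac{1}{n}\mathbf{J}\sigma^2[(1-\rho)\mathbf{I}+\rho\mathbf{J}]\tfrac{1}{n}\mathbf{J}(\mu\mathbf{1})$$
$$=2\frac{\sigma^4}{n^2}(1-\rho+\rho n)^2\,\mathrm{tr}(\mathbf{J}\mathbf{J})+4\mu^2\frac{\sigma^2}{n^2}(1-\rho+\rho n)\mathbf{1}^T\mathbf{J}\mathbf{J}\mathbf{1}$$
$$=2\sigma^4(1-\rho+\rho n)^2+4n\mu^2\sigma^2(1-\rho+\rho n).$$
(b) First observe that $\frac{1}{n-1}(\mathbf{I}-\frac{1}{n}\mathbf{J})\sigma^2[(1-\rho)\mathbf{I}+\rho\mathbf{J}]=\frac{\sigma^2(1-\rho)}{n-1}(\mathbf{I}-\frac{1}{n}\mathbf{J})$. By Corollary 4.2.6.3 again,
$$\mathrm{var}(\mathbf{x}^T\mathbf{B}\mathbf{x})=2\,\mathrm{tr}\Bigl\{\tfrac{1}{n-1}(\mathbf{I}-\tfrac{1}{n}\mathbf{J})\sigma^2[(1-\rho)\mathbf{I}+\rho\mathbf{J}]\tfrac{1}{n-1}(\mathbf{I}-\tfrac{1}{n}\mathbf{J})\sigma^2[(1-\rho)\mathbf{I}+\rho\mathbf{J}]\Bigr\}$$
$$\quad+4(\mu\mathbf{1})^T\tfrac{1}{n-1}(\mathbf{I}-\tfrac{1}{n}\mathbf{J})\sigma^2[(1-\rho)\mathbf{I}+\rho\mathbf{J}]\tfrac{1}{n-1}(\mathbf{I}-\tfrac{1}{n}\mathbf{J})(\mu\mathbf{1})$$
$$=\frac{2\sigma^4(1-\rho)^2}{(n-1)^2}\,\mathrm{tr}\bigl[(\mathbf{I}-\tfrac{1}{n}\mathbf{J})(\mathbf{I}-\tfrac{1}{n}\mathbf{J})\bigr]+\frac{4\mu^2\sigma^2(1-\rho)}{(n-1)^2}\mathbf{1}^T(\mathbf{I}-\tfrac{1}{n}\mathbf{J})(\mathbf{I}-\tfrac{1}{n}\mathbf{J})\mathbf{1}=\frac{2\sigma^4(1-\rho)^2}{n-1}.$$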
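(c) Using the result established at the outset of the solution to part (a), we observe that
$$\tfrac{1}{n}\mathbf{J}\sigma^2[(1-\rho)\mathbf{I}+\rho\mathbf{J}]\tfrac{1}{n-1}(\mathbf{I}-\tfrac{1}{n}\mathbf{J})=\frac{\sigma^2(1-\rho+\rho n)}{n(n-1)}\mathbf{J}(\mathbf{I}-\tfrac{1}{n}\mathbf{J})=\mathbf{0}.$$
Therefore, by Corollary 14.3.1.2a, $\mathbf{x}^T\mathbf{A}\mathbf{x}$ and $\mathbf{x}^T\mathbf{B}\mathbf{x}$ are independent.
(d) Observe that $\mathbf{x}_1-\mathbf{x}_2=\begin{pmatrix}\mathbf{I}_{n/2}&-\mathbf{I}_{n/2}\end{pmatrix}\begin{pmatrix}\mathbf{x}_1\\ \mathbf{x}_2\end{pmatrix}$. Therefore, by Theorem 14.1.6, $\mathbf{x}_1-\mathbf{x}_2$ has a normal distribution with mean vector
$$\begin{pmatrix}\mathbf{I}_{n/2}&-\mathbf{I}_{n/2}\end{pmatrix}\begin{pmatrix}\mu\mathbf{1}_{n/2}\\ \mu\mathbf{1}_{n/2}\end{pmatrix}=\mu\mathbf{1}_{n/2}-\mu\mathbf{1}_{n/2}=\mathbf{0}$$
and variance–covariance matrix
$$\begin{pmatrix}\mathbf{I}_{n/2}&-\mathbf{I}_{n/2}\end{pmatrix}\begin{pmatrix}\sigma^2[(1-\rho)\mathbf{I}_{n/2}+\rho\mathbf{J}_{n/2}]&\sigma^2\rho\mathbf{J}_{n/2}\\ \sigma^2\rho\mathbf{J}_{n/2}&\sigma^2[(1-\rho)\mathbf{I}_{n/2}+\rho\mathbf{J}_{n/2}]\end{pmatrix}\begin{pmatrix}\mathbf{I}_{n/2}\\ -\mathbf{I}_{n/2}\end{pmatrix}$$
$$=\sigma^2[(1-\rho)\mathbf{I}_{n/2}+\rho\mathbf{J}_{n/2}]+\sigma^2[(1-\rho)\mathbf{I}_{n/2}+\rho\mathbf{J}_{n/2}]-2\sigma^2\rho\mathbf{J}_{n/2}=2\sigma^2(1-\rho)\mathbf{I}_{n/2}.$$
(e) By Theorem 14.1.9, the conditional distribution of $(x_1,x_2)^T$, given that $x_3=1$, is normal with mean vector
$$\mu\mathbf{1}_2+\sigma^2\rho\mathbf{1}_2(\sigma^2)^{-1}(1-\mu)=[\mu+\rho(1-\mu)]\mathbf{1}_2$$
and variance–covariance matrix
$$\sigma^2[(1-\rho)\mathbf{I}_2+\rho\mathbf{J}_2]-\sigma^2\rho\mathbf{1}_2(\sigma^2)^{-1}\sigma^2\rho\mathbf{1}_2^T=\sigma^2[(1-\rho)\mathbf{I}_2+(\rho-\rho^2)\mathbf{J}_2].$$

Exercise 23 Suppose that $\mathbf{x}\sim N_n(\boldsymbol{\mu},c\mathbf{I}+\mathbf{b}\mathbf{1}_n^T+\mathbf{1}_n\mathbf{b}^T)$, where $c\mathbf{I}+\mathbf{b}\mathbf{1}_n^T+\mathbf{1}_n\mathbf{b}^T$ is positive definite.
(a) Determine the distribution of $\sum_{i=1}^{n}(x_i-\bar{x})^2$ (suitably scaled), and fully specify the parameters of that distribution.
(b) Determine, in as simple a form as possible, a necessary and sufficient condition on $\mathbf{b}$ for $\sum_{i=1}^{n}(x_i-\bar{x})^2$ and $\bar{x}^2$ to be independent.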
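Solution
(a) Let $\boldsymbol{\Sigma}\equiv c\mathbf{I}+\mathbf{b}\mathbf{1}_n^T+\mathbf{1}_n\mathbf{b}^T$. Then $\sum_{i=1}^{n}(x_i-\bar{x})^2=\mathbf{x}^T(\mathbf{I}-\frac{1}{n}\mathbf{J})\mathbf{x}\equiv\mathbf{x}^T\mathbf{A}\mathbf{x}$, say, where
$$\mathbf{A}\boldsymbol{\Sigma}=(\mathbf{I}-\tfrac{1}{n}\mathbf{J})(c\mathbf{I}+\mathbf{b}\mathbf{1}_n^T+\mathbf{1}_n\mathbf{b}^T)=c(\mathbf{I}-\tfrac{1}{n}\mathbf{J})(\mathbf{I}+\tfrac{1}{c}\mathbf{b}\mathbf{1}_n^T).$$
Now,
$$\bigl(\tfrac{1}{c}\mathbf{A}\boldsymbol{\Sigma}\bigr)\bigl(\tfrac{1}{c}\mathbf{A}\boldsymbol{\Sigma}\bigr)=(\mathbf{I}-\tfrac{1}{n}\mathbf{J})[(\mathbf{I}+\tfrac{1}{c}\mathbf{b}\mathbf{1}_n^T)(\mathbf{I}-\tfrac{1}{n}\mathbf{J})](\mathbf{I}+\tfrac{1}{c}\mathbf{b}\mathbf{1}_n^T)$$
$$=(\mathbf{I}-\tfrac{1}{n}\mathbf{J})(\mathbf{I}-\tfrac{1}{n}\mathbf{J})(\mathbf{I}+\tfrac{1}{c}\mathbf{b}\mathbf{1}_n^T)=(\mathbf{I}-\tfrac{1}{n}\mathbf{J})(\mathbf{I}+\tfrac{1}{c}\mathbf{b}\mathbf{1}_n^T).$$
Thus, by Corollary 14.2.6.1, $\frac{1}{c}\sum_{i=1}^{n}(x_i-\bar{x})^2\sim\chi^2(df,NCP)$ where
$$df=\mathrm{rank}\bigl(\tfrac{1}{c}\mathbf{A}\bigr)=\mathrm{rank}(\mathbf{I}-\tfrac{1}{n}\mathbf{J})=n-1,$$
$$NCP=\frac{1}{2c}\sum_{i=1}^{n}[E(x_i)-E(\bar{x})]^2=\frac{1}{2c}\sum_{i=1}^{n}(\mu_i-\bar{\mu})^2,$$
where $\boldsymbol{\mu}=(\mu_i)$.
(b) $\bar{x}^2=(\frac{1}{n}\mathbf{1}^T\mathbf{x})^2=\frac{1}{n^2}\mathbf{x}^T\mathbf{J}\mathbf{x}$. Thus, by Corollary 14.3.1.2a, $\sum_{i=1}^{n}(x_i-\bar{x})^2$ and $\bar{x}^2$ are independent if and only if $(\mathbf{I}-\frac{1}{n}\mathbf{J})(\mathbf{I}+\frac{1}{c}\mathbf{b}\mathbf{1}_n^T)\mathbf{J}=\mathbf{0}$, i.e., if and only if $(\mathbf{I}-\frac{1}{n}\mathbf{J})\mathbf{b}\mathbf{1}_n^T\mathbf{J}=\mathbf{0}$, i.e., if and only if $n(\mathbf{I}-\frac{1}{n}\mathbf{1}\mathbf{1}^T)\mathbf{b}\mathbf{1}_n^T=\mathbf{0}$, i.e., if and only if $\mathbf{b}=a\mathbf{1}$ for some $a\in\mathbb{R}$.

Exercise 24 Suppose that
$$\begin{pmatrix}\mathbf{x}_1\\ \mathbf{x}_2\end{pmatrix}\sim N\left(\begin{pmatrix}\boldsymbol{\mu}_1\\ \boldsymbol{\mu}_2\end{pmatrix},\begin{pmatrix}\boldsymbol{\Sigma}_{11}&\boldsymbol{\Sigma}_{12}\\ \boldsymbol{\Sigma}_{21}&\boldsymbol{\Sigma}_{22}\end{pmatrix}\right),$$
where $\mathbf{x}_1$ and $\boldsymbol{\mu}_1$ are $n_1$-vectors, $\mathbf{x}_2$ and $\boldsymbol{\mu}_2$ are $n_2$-vectors, $\boldsymbol{\Sigma}_{11}$ is $n_1\times n_1$, and $\boldsymbol{\Sigma}$ is nonsingular. Let $\mathbf{A}$ represent an $n_1\times n_1$ symmetric matrix of constants and let $\mathbf{B}$ represent a matrix of constants having $n_2$ columns.
(a) Determine, in as simple a form as possible, a necessary and sufficient condition for $\mathbf{x}_1^T\mathbf{A}\mathbf{x}_1$ and $\mathbf{B}\mathbf{x}_2$ to be uncorrelated.
(b) Determine, in as simple a form as possible, a necessary and sufficient condition for $\mathbf{x}_1^T\mathbf{A}\mathbf{x}_1$ and $\mathbf{B}\mathbf{x}_2$ to be independent.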
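(c) Using your answer to part (a), give a necessary and sufficient condition for $\mathbf{x}_1^T\mathbf{x}_1$ and $\mathbf{x}_2$ to be uncorrelated.
(d) Using your answer to part (b), give a necessary and sufficient condition for $\mathbf{x}_1^T\mathbf{x}_1$ and $\mathbf{x}_2$ to be independent.
(e) Consider the special case in which $\boldsymbol{\Sigma}=(1-\rho)\mathbf{I}+\rho\mathbf{J}$, where $-\frac{1}{n-1}<\rho<1$. Can $\mathbf{x}_1^T\mathbf{x}_1$ and $\mathbf{x}_2$ be uncorrelated? Can $\mathbf{x}_1^T\mathbf{x}_1$ and $\mathbf{x}_2$ be independent? If the answer to either or both of these questions is yes, give a necessary and sufficient condition for the result to hold.

Solution
(a) By Corollary 4.2.5.1, $\mathbf{x}_1^T\mathbf{A}\mathbf{x}_1$ and $\mathbf{B}\mathbf{x}_2$ are uncorrelated if and only if
$$2\begin{pmatrix}\mathbf{0}&\mathbf{B}\end{pmatrix}\begin{pmatrix}\boldsymbol{\Sigma}_{11}&\boldsymbol{\Sigma}_{12}\\ \boldsymbol{\Sigma}_{21}&\boldsymbol{\Sigma}_{22}\end{pmatrix}\begin{pmatrix}\mathbf{A}&\mathbf{0}\\ \mathbf{0}&\mathbf{0}\end{pmatrix}\begin{pmatrix}\boldsymbol{\mu}_1\\ \boldsymbol{\mu}_2\end{pmatrix}=\mathbf{0},$$
i.e., if and only if $\mathbf{B}\boldsymbol{\Sigma}_{21}\mathbf{A}\boldsymbol{\mu}_1=\mathbf{0}$.
(b) By Corollary 14.3.1.2b, $\mathbf{x}_1^T\mathbf{A}\mathbf{x}_1$ and $\mathbf{B}\mathbf{x}_2$ are independent if and only if
$$\begin{pmatrix}\mathbf{A}&\mathbf{0}\\ \mathbf{0}&\mathbf{0}\end{pmatrix}\begin{pmatrix}\boldsymbol{\Sigma}_{11}&\boldsymbol{\Sigma}_{12}\\ \boldsymbol{\Sigma}_{21}&\boldsymbol{\Sigma}_{22}\end{pmatrix}\begin{pmatrix}\mathbf{0}&\mathbf{B}\end{pmatrix}^T=\mathbf{0},$$
i.e., if and only if $\mathbf{A}\boldsymbol{\Sigma}_{12}\mathbf{B}^T=\mathbf{0}$, or equivalently, if and only if $\mathbf{B}\boldsymbol{\Sigma}_{21}\mathbf{A}=\mathbf{0}$.
(c) By part (a) with $\mathbf{A}=\mathbf{I}_{n_1}$ and $\mathbf{B}=\mathbf{I}_{n_2}$, $\mathbf{x}_1^T\mathbf{x}_1$ and $\mathbf{x}_2$ are uncorrelated if and only if $\boldsymbol{\Sigma}_{21}\boldsymbol{\mu}_1=\mathbf{0}$.
(d) By part (b) with $\mathbf{A}=\mathbf{I}_{n_1}$ and $\mathbf{B}=\mathbf{I}_{n_2}$, $\mathbf{x}_1^T\mathbf{x}_1$ and $\mathbf{x}_2$ are independent if and only if $\boldsymbol{\Sigma}_{12}=\mathbf{0}$.
(e) If $\boldsymbol{\Sigma}=(1-\rho)\mathbf{I}+\rho\mathbf{J}$, where $0<\rho<1$, then the necessary and sufficient condition for uncorrelatedness in part (c) may be written as $\rho\mathbf{J}_{n_2\times n_1}\boldsymbol{\mu}_1=\mathbf{0}$, or equivalently as $\mathbf{1}_{n_1}^T\boldsymbol{\mu}_1=0$. However, the necessary and sufficient condition for independence in part (d) is not satisfied because $\rho\mathbf{J}_{n_1\times n_2}\ne\mathbf{0}$.

Exercise 25 Suppose that $\mathbf{x}\sim N_n(\boldsymbol{\mu},\boldsymbol{\Sigma})$, where $\mathrm{rank}(\boldsymbol{\Sigma})=n$. Let $\mathbf{A}$ represent a nonnull symmetric idempotent matrix such that $\mathbf{A}\boldsymbol{\Sigma}=k\mathbf{A}$ for some $k\ne 0$. Show that:
(a) $\mathbf{x}^T\mathbf{A}\mathbf{x}/k\sim\chi^2(\nu,\lambda)$ for some $\nu$ and $\lambda$, and determine $\nu$ and $\lambda$.
(b) $\mathbf{x}^T\mathbf{A}\mathbf{x}$ and $\mathbf{x}^T(\mathbf{I}-\mathbf{A})\mathbf{x}$ are independent.
(c) $\mathbf{A}\mathbf{x}$ and $(\mathbf{I}-\mathbf{A})\mathbf{x}$ are independent.
(d) $k=\mathrm{tr}(\mathbf{A}\boldsymbol{\Sigma}\mathbf{A})/\mathrm{rank}(\mathbf{A})$.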
Solution (a) Observe that ( k1 A) = ( k1 )(kA) = A, which is idempotent. Thus, by Corollary 14.2.6.1, xT Ax/k = xT ( k1 A)x ∼ χ 2 (df, N CP ) where df = 1 T μ Aμ. rank[( k1 A)(kA)] = rank(A) and NCP = 12 μT ( k1 A)μ = 2k (b) A(I − A) = kA(I − A) = k(A − A) = 0 because A is idempotent. Thus, by Corollary 14.3.1.2a, xT Ax and xT (I − A)x are independent. (c) As shown in the solution to part (b), A(I − A) = 0. Thus, by Corollary 14.3.1.2c, Ax and (I − A)x are independent. (d) tr(AA) = tr[(kA)A] = ktr(A) = krank(A) because A is idempotent. Exercise 26 Suppose that x ∼ Nn (μ1, ), where μ is an unknown parameter, = I + CCT where C is known, C = 0, and CT 1 = 0. Let u1 = xT −1 x and let ui = xT Ai x (i = 2, 3) where A2 and A3 are nonnull symmetric matrices. (a) Prove or disprove: There is no nonnull symmetric matrix A2 for which u1 and u2 are independent. (b) Find two nonnull symmetric matrices A2 and A3 such that u2 and u3 are independent. (c) Determine which, if either, of u2 and u3 [as you defined them in part (b)] have noncentral chi-square distributions. If either of them does have such a distribution, give the parameters of the distribution in as simple a form as possible. Solution (a) By Corollary 14.3.1.2a, u1 and u2 are independent if and only if −1 A2 = 0, i.e., if and only if A2 = 0. But A2 = 0 by hypothesis, so u1 and u2 cannot be independent. (b) By Corollary 14.3.1.2a, u1 and u2 are independent if and only if A2 (I + CCT )A3 = 0, i.e., if and only if A2 A3 + A2 CCT A3 = 0. The second summand on the left-hand side of this equality is equal to 0 if we take A2 = I − PC , and then we can make the first summand equal to 0 by taking A3 = PC . (c) A2 = (I − PC )(I + CCT ) = I − PC which is idempotent, so by Corollary 14.2.6.1 u2 ∼ χ 2 [n − rank(C), N CP ] where N CP = 12 (μ1)T (I − PC )(μ1) = (n/2)μ2 because CT 1 = 0. Furthermore, A3 = PC (I + CCT ) = PC + CCT , which generally is not idempotent because (PC + CCT )(PC + CCT ) = PC + CCT + CCT + CCT CCT . 
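Thus $u_3$ generally does not have a chi-square distribution.

Exercise 27 Suppose that the model $\{\mathbf{y}, \mathbf{X}\boldsymbol{\beta}\}$ is fit to $n > \mathrm{rank}(\mathbf{X})$ observations by ordinary least squares. Let $\mathbf{c}^T\boldsymbol{\beta}$ be an estimable function under the model, and let $\hat{\boldsymbol{\beta}}$ be a solution to the normal equations. Furthermore, let $\hat{\sigma}^2$ denote the residual mean square from the fit, i.e., $\hat{\sigma}^2 = \mathbf{y}^T(\mathbf{I} - \mathbf{P}_X)\mathbf{y}/[n - \mathrm{rank}(\mathbf{X})]$. Now, suppose that the distribution of $\mathbf{y}$ is $\mathrm{N}(\mathbf{X}\boldsymbol{\beta}, \boldsymbol{\Sigma})$ where $\boldsymbol{\beta}$ is an unknown vector of parameters and $\boldsymbol{\Sigma}$ is a specified positive definite matrix having one of the following four forms:

• Form 1: $\boldsymbol{\Sigma} = \sigma^2\mathbf{I}$.
• Form 2: $\boldsymbol{\Sigma} = \sigma^2(\mathbf{I} + \mathbf{P}_X\mathbf{A}\mathbf{P}_X)$ for an arbitrary $n \times n$ matrix $\mathbf{A}$.
• Form 3: $\boldsymbol{\Sigma} = \sigma^2[\mathbf{I} + (\mathbf{I} - \mathbf{P}_X)\mathbf{B}(\mathbf{I} - \mathbf{P}_X)]$ for an arbitrary $n \times n$ matrix $\mathbf{B}$.
• Form 4: $\boldsymbol{\Sigma} = \sigma^2[\mathbf{I} + \mathbf{P}_X\mathbf{A}\mathbf{P}_X + (\mathbf{I} - \mathbf{P}_X)\mathbf{B}(\mathbf{I} - \mathbf{P}_X)]$ for arbitrary $n \times n$ matrices $\mathbf{A}$ and $\mathbf{B}$.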
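(a) For which of the four forms of $\boldsymbol{\Sigma}$, if any, will $\mathbf{c}^T\hat{\boldsymbol{\beta}}$ and the fitted residuals vector $\hat{\mathbf{e}}$ be independent? Justify your answer.

(b) For which of the four forms of $\boldsymbol{\Sigma}$, if any, will $[n - \mathrm{rank}(\mathbf{X})]\hat{\sigma}^2/\sigma^2$ have a noncentral chi-square distribution? Justify your answer, and for those forms for which the quantity does have a noncentral chi-square distribution, give the parameters of the distribution.

(c) Suppose that the ordinary least squares estimator of every estimable function $\mathbf{c}^T\boldsymbol{\beta}$ is a BLUE of that function. Will $\mathbf{c}^T\hat{\boldsymbol{\beta}}$ and $\hat{\mathbf{e}}$ be independent for every estimable function $\mathbf{c}^T\boldsymbol{\beta}$? Will $[n - \mathrm{rank}(\mathbf{X})]\hat{\sigma}^2/\sigma^2$ have a noncentral chi-square distribution? Justify your answers.

Solution

(a) Let $\mathbf{a}$ represent a vector such that $\mathbf{c}^T = \mathbf{a}^T\mathbf{X}$ (such a vector exists because $\mathbf{c}^T\boldsymbol{\beta}$ is estimable). Then $\mathbf{c}^T\hat{\boldsymbol{\beta}} = \mathbf{a}^T\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{a}^T\mathbf{P}_X\mathbf{y}$, and $\hat{\mathbf{e}} = (\mathbf{I} - \mathbf{P}_X)\mathbf{y}$. So $\mathbf{c}^T\hat{\boldsymbol{\beta}}$ and $\hat{\mathbf{e}}$ will be independent if and only if $\mathbf{a}^T\mathbf{P}_X\boldsymbol{\Sigma}(\mathbf{I} - \mathbf{P}_X) = \mathbf{0}^T$.

For Form 1, $\mathbf{a}^T\mathbf{P}_X(\sigma^2\mathbf{I})(\mathbf{I} - \mathbf{P}_X) = \sigma^2\mathbf{a}^T\mathbf{P}_X(\mathbf{I} - \mathbf{P}_X) = \mathbf{0}^T$.
For Form 2, $\mathbf{a}^T\mathbf{P}_X[\sigma^2(\mathbf{I} + \mathbf{P}_X\mathbf{A}\mathbf{P}_X)](\mathbf{I} - \mathbf{P}_X) = \sigma^2\mathbf{a}^T\mathbf{P}_X(\mathbf{I} - \mathbf{P}_X) = \mathbf{0}^T$.
For Form 3, $\mathbf{a}^T\mathbf{P}_X\sigma^2[\mathbf{I} + (\mathbf{I} - \mathbf{P}_X)\mathbf{B}(\mathbf{I} - \mathbf{P}_X)](\mathbf{I} - \mathbf{P}_X) = \mathbf{0}^T$.
For Form 4, $\mathbf{a}^T\mathbf{P}_X\sigma^2[\mathbf{I} + \mathbf{P}_X\mathbf{A}\mathbf{P}_X + (\mathbf{I} - \mathbf{P}_X)\mathbf{B}(\mathbf{I} - \mathbf{P}_X)](\mathbf{I} - \mathbf{P}_X) = \mathbf{0}^T$.

Therefore, $\mathbf{c}^T\hat{\boldsymbol{\beta}}$ and $\hat{\mathbf{e}}$ will be independent for all four forms of $\boldsymbol{\Sigma}$.

(b) $[n - \mathrm{rank}(\mathbf{X})]\hat{\sigma}^2/\sigma^2$ will have a noncentral $\chi^2$ distribution if and only if $(1/\sigma^2)(\mathbf{I} - \mathbf{P}_X)\boldsymbol{\Sigma}$ is idempotent.

For Form 1, $(1/\sigma^2)(\mathbf{I} - \mathbf{P}_X)\sigma^2\mathbf{I} = \mathbf{I} - \mathbf{P}_X$, which is idempotent.
For Form 2, $(1/\sigma^2)(\mathbf{I} - \mathbf{P}_X)\sigma^2(\mathbf{I} + \mathbf{P}_X\mathbf{A}\mathbf{P}_X) = \mathbf{I} - \mathbf{P}_X$, which is idempotent.
For Form 3, $(1/\sigma^2)(\mathbf{I} - \mathbf{P}_X)\sigma^2[\mathbf{I} + (\mathbf{I} - \mathbf{P}_X)\mathbf{B}(\mathbf{I} - \mathbf{P}_X)] = (\mathbf{I} - \mathbf{P}_X) + (\mathbf{I} - \mathbf{P}_X)\mathbf{B}(\mathbf{I} - \mathbf{P}_X)$, which is generally not idempotent.
For Form 4, $(1/\sigma^2)(\mathbf{I} - \mathbf{P}_X)\sigma^2[\mathbf{I} + \mathbf{P}_X\mathbf{A}\mathbf{P}_X + (\mathbf{I} - \mathbf{P}_X)\mathbf{B}(\mathbf{I} - \mathbf{P}_X)] = (\mathbf{I} - \mathbf{P}_X) + (\mathbf{I} - \mathbf{P}_X)\mathbf{B}(\mathbf{I} - \mathbf{P}_X)$, which is generally not idempotent.

Thus, $[n - \mathrm{rank}(\mathbf{X})]\hat{\sigma}^2/\sigma^2$ will have a noncentral $\chi^2$ distribution only for Forms 1 and 2. For both of those forms, $[n - \mathrm{rank}(\mathbf{X})]\hat{\sigma}^2/\sigma^2$ will have a $\chi^2[\mathrm{rank}(\mathbf{I} - \mathbf{P}_X), \boldsymbol{\beta}^T\mathbf{X}^T(\mathbf{I} - \mathbf{P}_X)\mathbf{X}\boldsymbol{\beta}/2]$ distribution, i.e., $\chi^2[n - \mathrm{rank}(\mathbf{X})]$.

(c) By Theorem 11.3.2, $\boldsymbol{\Sigma}$ has Form 4. Thus, $\mathbf{c}^T\hat{\boldsymbol{\beta}}$ and $\hat{\mathbf{e}}$ will be independent for every estimable function $\mathbf{c}^T\boldsymbol{\beta}$ by part (a), but $[n - \mathrm{rank}(\mathbf{X})]\hat{\sigma}^2/\sigma^2$ will not necessarily have a noncentral $\chi^2$ distribution.

Exercise 28 Suppose that $w \sim \mathrm{F}(\nu_1, \nu_2, \lambda)$, where $\nu_2 > 2$. Find $E(w)$.

Solution By Definition 14.5.1, $w$ has the same distribution as $\frac{u_1/\nu_1}{u_2/\nu_2}$ where $u_1$ and $u_2$ are distributed independently as $\chi^2(\nu_1, \lambda)$ and $\chi^2(\nu_2)$, respectively. Thus, by Theorem 14.2.7 and the discussion immediately after Corollary 14.2.3.1,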
$$E(w) = E\left(\frac{u_1/\nu_1}{u_2/\nu_2}\right) = (\nu_2/\nu_1)\,E(u_1)\,E\left(\frac{1}{u_2}\right) = \frac{\nu_2[1 + (2\lambda/\nu_1)]}{\nu_2 - 2}.$$
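The closed form above can be checked numerically. The following is a minimal sketch, not part of the original solution: it assumes the noncentrality convention used throughout this chapter ($\lambda = \frac{1}{2}\boldsymbol{\mu}^T\boldsymbol{\mu}$, so that $E(u_1) = \nu_1 + 2\lambda$), and the function names and the parameter values $\nu_1 = 3$, $\nu_2 = 10$, $\lambda = 1.5$ are illustrative choices only.

```python
import math
import random


def noncentral_f_mean(nu1, nu2, lam):
    """E(w) for w ~ F(nu1, nu2, lam), with lam = (1/2) mu'mu; needs nu2 > 2."""
    if nu2 <= 2:
        raise ValueError("the mean exists only for nu2 > 2")
    return nu2 * (1 + 2 * lam / nu1) / (nu2 - 2)


def draw_noncentral_f(nu1, nu2, lam, rng):
    """One draw of (u1/nu1)/(u2/nu2), u1 ~ chi2(nu1, lam), u2 ~ chi2(nu2)."""
    # Put the whole mean shift sqrt(2*lam) on the first of nu1 unit normals,
    # so that (1/2) * (sum of squared means) equals lam.
    u1 = (rng.gauss(0.0, 1.0) + math.sqrt(2 * lam)) ** 2
    u1 += sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu1 - 1))
    u2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu2))
    return (u1 / nu1) / (u2 / nu2)


rng = random.Random(2020)
nu1, nu2, lam = 3, 10, 1.5
draws = [draw_noncentral_f(nu1, nu2, lam, rng) for _ in range(40000)]
mc_mean = sum(draws) / len(draws)
exact_mean = noncentral_f_mean(nu1, nu2, lam)
print(exact_mean, mc_mean)
```

Note that some software parameterizes the noncentral F by $2\lambda$ rather than $\lambda$, so the noncentrality argument should be checked before comparing against a library routine.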

Exercise 29 Prove Theorem 14.5.1: If $t \sim \mathrm{t}(\nu, \mu)$, then $t^2 \sim \mathrm{F}(1, \nu, \mu^2/2)$.

Solution If $t \sim \mathrm{t}(\nu, \mu)$, then by definition $t \sim \frac{z}{\sqrt{u/\nu}}$ where $z \sim \mathrm{N}(\mu, 1)$, $u \sim \chi^2(\nu)$, and $z$ and $u$ are independent. By Theorem 14.2.3, $z^2 \sim \chi^2(1, \mu^2/2)$. Thus $t^2 \sim \frac{z^2}{u/\nu}$ where $z^2 \sim \chi^2(1, \mu^2/2)$, $u \sim \chi^2(\nu)$, and $z^2$ and $u$ are independent, implying that $t^2 \sim \mathrm{F}(1, \nu, \mu^2/2)$ by Definition 14.5.1.
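
Exercise 30 Prove Theorem 14.5.5: Consider the prediction of a vector of $s$ predictable functions $\boldsymbol{\tau} = \mathbf{C}^T\boldsymbol{\beta} + \mathbf{u}$ under the normal prediction-extended positive definite Aitken model
$$\left\{ \begin{pmatrix} \mathbf{y} \\ \mathbf{u} \end{pmatrix}, \begin{pmatrix} \mathbf{X}\boldsymbol{\beta} \\ \mathbf{0} \end{pmatrix}, \sigma^2 \begin{pmatrix} \mathbf{W} & \mathbf{K} \\ \mathbf{K}^T & \mathbf{H} \end{pmatrix} \right\}.$$
Recall from Theorem 13.2.1 that $\tilde{\boldsymbol{\tau}} = \mathbf{C}^T\tilde{\boldsymbol{\beta}} + \mathbf{K}^T\mathbf{E}\mathbf{y}$ is the vector of BLUPs of the elements of $\boldsymbol{\tau}$, where $\tilde{\boldsymbol{\beta}}$ is any solution to the Aitken equations and $\mathbf{E} = \mathbf{W}^{-1} - \mathbf{W}^{-1}\mathbf{X}(\mathbf{X}^T\mathbf{W}^{-1}\mathbf{X})^{-}\mathbf{X}^T\mathbf{W}^{-1}$, and that $\mathrm{var}(\tilde{\boldsymbol{\tau}} - \boldsymbol{\tau}) = \sigma^2\mathbf{Q}$, where an expression for $\mathbf{Q}$ was given in Theorem 13.2.2a. If $\mathbf{Q}$ is positive definite (as is the case under either of the conditions of Theorem 13.2.2b), then
$$\frac{(\tilde{\boldsymbol{\tau}} - \boldsymbol{\tau})^T\mathbf{Q}^{-1}(\tilde{\boldsymbol{\tau}} - \boldsymbol{\tau})}{s\tilde{\sigma}^2} \sim \mathrm{F}(s, n - p^*).$$
If $s = 1$, then
$$\frac{\tilde{\tau} - \tau}{\tilde{\sigma}\sqrt{q}} \sim \mathrm{t}(n - p^*),$$
where $q$ is the scalar version of $\mathbf{Q}$.

Solution First, by (14.6) $\tilde{\boldsymbol{\tau}} - \boldsymbol{\tau} \sim \mathrm{N}(\mathbf{0}, \sigma^2\mathbf{Q})$, so by Theorem 14.2.4 $(\tilde{\boldsymbol{\tau}} - \boldsymbol{\tau})^T(\sigma^2\mathbf{Q})^{-1}(\tilde{\boldsymbol{\tau}} - \boldsymbol{\tau}) \sim \chi^2(s)$. Second, as noted in Example 14.2-3, $(n - p^*)\tilde{\sigma}^2/\sigma^2 \sim \chi^2(n - p^*)$. Third, as noted in Example 14.1-3, $\tilde{\boldsymbol{\tau}} - \boldsymbol{\tau}$ and $\tilde{\mathbf{e}}$ are independent, implying that $\tilde{\boldsymbol{\tau}} - \boldsymbol{\tau}$ and $\tilde{\mathbf{e}}^T\mathbf{W}^{-1}\tilde{\mathbf{e}}/(n - p^*) = \tilde{\sigma}^2$ are independent. Therefore,
$$\frac{(\tilde{\boldsymbol{\tau}} - \boldsymbol{\tau})^T(\sigma^2\mathbf{Q})^{-1}(\tilde{\boldsymbol{\tau}} - \boldsymbol{\tau})/s}{\left[(n - p^*)\tilde{\sigma}^2/\sigma^2\right]/(n - p^*)} = \frac{(\tilde{\boldsymbol{\tau}} - \boldsymbol{\tau})^T\mathbf{Q}^{-1}(\tilde{\boldsymbol{\tau}} - \boldsymbol{\tau})}{s\tilde{\sigma}^2} \sim \mathrm{F}(s, n - p^*).$$
When $s = 1$, this result simplifies to
$$\frac{\tilde{\tau} - \tau}{\tilde{\sigma}\sqrt{q}} \sim \mathrm{t}(n - p^*)$$
by Theorem 14.5.1.

Exercise 31 Let $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$ be orthonormal nonrandom nonnull $n$-vectors. Suppose that $\mathbf{x} \sim \mathrm{N}_n(\boldsymbol{\mu}_1, \sigma^2\mathbf{I})$. Determine the distribution of $(\mathbf{x}^T\boldsymbol{\mu}_1\boldsymbol{\mu}_1^T\mathbf{x})/(\mathbf{x}^T\boldsymbol{\mu}_2\boldsymbol{\mu}_2^T\mathbf{x})$.

Solution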
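$$\left(\tfrac{1}{\sigma^2}\boldsymbol{\mu}_1\boldsymbol{\mu}_1^T\right)(\sigma^2\mathbf{I})\left(\tfrac{1}{\sigma^2}\boldsymbol{\mu}_1\boldsymbol{\mu}_1^T\right)(\sigma^2\mathbf{I}) = \boldsymbol{\mu}_1(\boldsymbol{\mu}_1^T\boldsymbol{\mu}_1)\boldsymbol{\mu}_1^T = \boldsymbol{\mu}_1\boldsymbol{\mu}_1^T = \left(\tfrac{1}{\sigma^2}\boldsymbol{\mu}_1\boldsymbol{\mu}_1^T\right)(\sigma^2\mathbf{I}),$$
i.e., $(\frac{1}{\sigma^2}\boldsymbol{\mu}_1\boldsymbol{\mu}_1^T)(\sigma^2\mathbf{I})$ is idempotent. Thus by Corollary 14.2.6.1, $\mathbf{x}^T(\frac{1}{\sigma^2}\boldsymbol{\mu}_1\boldsymbol{\mu}_1^T)\mathbf{x} \sim \chi^2(\mathrm{df}, \mathrm{NCP})$ where $\mathrm{df} = \mathrm{rank}(\boldsymbol{\mu}_1\boldsymbol{\mu}_1^T) = 1$ and $\mathrm{NCP} = \frac{1}{2}\boldsymbol{\mu}_1^T(\frac{1}{\sigma^2}\boldsymbol{\mu}_1\boldsymbol{\mu}_1^T)\boldsymbol{\mu}_1 = \frac{1}{2\sigma^2}$. Furthermore,
$$\left(\tfrac{1}{\sigma^2}\boldsymbol{\mu}_2\boldsymbol{\mu}_2^T\right)(\sigma^2\mathbf{I})\left(\tfrac{1}{\sigma^2}\boldsymbol{\mu}_2\boldsymbol{\mu}_2^T\right)(\sigma^2\mathbf{I}) = \boldsymbol{\mu}_2(\boldsymbol{\mu}_2^T\boldsymbol{\mu}_2)\boldsymbol{\mu}_2^T = \boldsymbol{\mu}_2\boldsymbol{\mu}_2^T = \left(\tfrac{1}{\sigma^2}\boldsymbol{\mu}_2\boldsymbol{\mu}_2^T\right)(\sigma^2\mathbf{I}),$$
i.e., $(\frac{1}{\sigma^2}\boldsymbol{\mu}_2\boldsymbol{\mu}_2^T)(\sigma^2\mathbf{I})$ is idempotent. Thus by Corollary 14.2.6.1 again, $\mathbf{x}^T(\frac{1}{\sigma^2}\boldsymbol{\mu}_2\boldsymbol{\mu}_2^T)\mathbf{x} \sim \chi^2(\mathrm{df}, \mathrm{NCP})$ where $\mathrm{df} = \mathrm{rank}(\boldsymbol{\mu}_2\boldsymbol{\mu}_2^T) = 1$ and $\mathrm{NCP} = \frac{1}{2}\boldsymbol{\mu}_1^T(\frac{1}{\sigma^2}\boldsymbol{\mu}_2\boldsymbol{\mu}_2^T)\boldsymbol{\mu}_1 = 0$ by the orthogonality of $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$. Also, by Corollary 14.3.1.2a, $\mathbf{x}^T(\frac{1}{\sigma^2}\boldsymbol{\mu}_1\boldsymbol{\mu}_1^T)\mathbf{x}$ and $\mathbf{x}^T(\frac{1}{\sigma^2}\boldsymbol{\mu}_2\boldsymbol{\mu}_2^T)\mathbf{x}$ are independent because $\boldsymbol{\mu}_1\boldsymbol{\mu}_1^T(\sigma^2\mathbf{I})\boldsymbol{\mu}_2\boldsymbol{\mu}_2^T = \sigma^2\boldsymbol{\mu}_1(\boldsymbol{\mu}_1^T\boldsymbol{\mu}_2)\boldsymbol{\mu}_2^T = \mathbf{0}$. Therefore, by Definition 14.5.1 (the factors $1/\sigma^2$ cancel in the ratio),
$$\frac{\mathbf{x}^T\boldsymbol{\mu}_1\boldsymbol{\mu}_1^T\mathbf{x}}{\mathbf{x}^T\boldsymbol{\mu}_2\boldsymbol{\mu}_2^T\mathbf{x}} \sim \mathrm{F}\left(1, 1, \frac{1}{2\sigma^2}\right).$$

Exercise 32 Suppose that $\mathbf{y}$ follows the normal Gauss–Markov model $\{\mathbf{y}, \mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{I}\}$. Let $\mathbf{c}^T\boldsymbol{\beta}$ be an estimable function, and let $\mathbf{a}^T\mathbf{y}$ be a linear unbiased estimator (not necessarily the least squares estimator) of $\mathbf{c}^T\boldsymbol{\beta}$. Let $\hat{\sigma}^2 = \mathbf{y}^T(\mathbf{I} - \mathbf{P}_X)\mathbf{y}/(n - p^*)$ be the residual mean square and let $w$ represent a positive constant. Consider the quantity
$$t^* = \frac{\mathbf{a}^T\mathbf{y} - \mathbf{c}^T\boldsymbol{\beta}}{\hat{\sigma}w}.$$

(a) What general conditions on $\mathbf{a}$ and $w$ are sufficient for $t^*$ to be distributed as a central t random variable with $n - p^*$ degrees of freedom? Express the conditions in as simple a form as possible.

(b) If $\mathbf{a}^T\mathbf{y}$ was not unbiased but everything else was as specified above, and if the conditions found in part (a) held, what would be the distribution of $t^*$? Be as specific as possible.

(c) Repeat your answers to parts (a) and (b), but this time supposing that $\mathbf{y}$ follows a positive definite Aitken model (with variance–covariance matrix $\sigma^2\mathbf{W}$) and with $\tilde{\sigma}^2 = \mathbf{y}^T\mathbf{W}^{-1}(\mathbf{I} - \tilde{\mathbf{P}}_X)\mathbf{y}/(n - p^*)$ in place of $\hat{\sigma}^2$.

Solution

(a) By Theorems 14.1.6 and 6.1.1, $\mathbf{a}^T\mathbf{y} \sim \mathrm{N}(\mathbf{a}^T\mathbf{X}\boldsymbol{\beta}, \mathbf{a}^T\sigma^2\mathbf{I}\mathbf{a}) = \mathrm{N}(\mathbf{c}^T\boldsymbol{\beta}, \sigma^2\mathbf{a}^T\mathbf{a})$ because $\mathbf{a}^T\mathbf{y}$ is unbiased for $\mathbf{c}^T\boldsymbol{\beta}$. This implies that $\mathbf{a}^T\mathbf{y} - \mathbf{c}^T\boldsymbol{\beta} \sim \mathrm{N}(0, \sigma^2\mathbf{a}^T\mathbf{a})$, so if $w = (\mathbf{a}^T\mathbf{a})^{1/2}$, then $(\mathbf{a}^T\mathbf{y} - \mathbf{c}^T\boldsymbol{\beta})/(\sigma w) \sim \mathrm{N}(0, 1)$. Also, $(n - p^*)\hat{\sigma}^2/\sigma^2 \sim \chi^2(n - p^*)$ as usual. For the independence of $(\mathbf{a}^T\mathbf{y} - \mathbf{c}^T\boldsymbol{\beta})/(\sigma w)$ and $(n - p^*)\hat{\sigma}^2/\sigma^2$ to hold, a sufficient condition is (by Corollary 14.3.1.2b) $(\mathbf{I} - \mathbf{P}_X)\mathbf{I}\mathbf{a} = \mathbf{0}$ or equivalently $\mathbf{a} \in \mathcal{C}(\mathbf{X})$. So sufficient conditions on $\mathbf{a}$ and $w$ for $t^*$ to be distributed as a central t random variable with $n - p^*$ degrees of freedom are (1) $\mathbf{a} \in \mathcal{C}(\mathbf{X})$ and (2) $w = (\mathbf{a}^T\mathbf{a})^{1/2}$.

(b) In this case
$$t^* \sim \mathrm{t}\left(n - p^*, \frac{\mathbf{a}^T\mathbf{X}\boldsymbol{\beta} - \mathbf{c}^T\boldsymbol{\beta}}{\sigma(\mathbf{a}^T\mathbf{a})^{1/2}}\right).$$

(c) By arguments similar to those used in the solution to part (a), sufficient conditions on $\mathbf{a}$ and $w$ for $t^*$ to be distributed as a central t random variable with $n - p^*$ degrees of freedom are (1) $\mathbf{W}^{-1}(\mathbf{I} - \tilde{\mathbf{P}}_X)\mathbf{W}\mathbf{a} = \mathbf{0}$ or equivalently $(\mathbf{I} - \tilde{\mathbf{P}}_X^T)\mathbf{a} = \mathbf{0}$ and (2) $w = (\mathbf{a}^T\mathbf{W}\mathbf{a})^{1/2}$. If these conditions hold but $\mathbf{a}^T\mathbf{y}$ is not unbiased, then
$$t^* \sim \mathrm{t}\left(n - p^*, \frac{\mathbf{a}^T\mathbf{X}\boldsymbol{\beta} - \mathbf{c}^T\boldsymbol{\beta}}{\sigma(\mathbf{a}^T\mathbf{W}\mathbf{a})^{1/2}}\right).$$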
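
Exercise 33 Consider the normal prediction-extended positive definite Aitken model
$$\left\{ \begin{pmatrix} \mathbf{y} \\ \mathbf{u} \end{pmatrix}, \begin{pmatrix} \mathbf{X}\boldsymbol{\beta} \\ \mathbf{0} \end{pmatrix}, \sigma^2 \begin{pmatrix} \mathbf{W} & \mathbf{K} \\ \mathbf{K}^T & \mathbf{H} \end{pmatrix} \right\}.$$
Let $\tilde{\boldsymbol{\beta}}$ be a solution to the Aitken equations; let $\boldsymbol{\tau} = \mathbf{C}^T\boldsymbol{\beta} + \mathbf{u}$ be a vector of $s$ predictable functions; let $\tilde{\boldsymbol{\tau}}$ be the BLUP of $\boldsymbol{\tau}$; let $\tilde{\sigma}^2$ be the generalized residual mean square, i.e., $\tilde{\sigma}^2 = (\mathbf{y} - \mathbf{X}\tilde{\boldsymbol{\beta}})^T\mathbf{W}^{-1}(\mathbf{y} - \mathbf{X}\tilde{\boldsymbol{\beta}})/(n - p^*)$; and let $\mathbf{Q} = (1/\sigma^2)\mathrm{var}(\tilde{\boldsymbol{\tau}} - \boldsymbol{\tau})$. Assume that $\mathbf{Q}$ and $\begin{pmatrix} \mathbf{W} & \mathbf{K} \\ \mathbf{K}^T & \mathbf{H} \end{pmatrix}$ are positive definite. By Theorem 14.5.5,
$$F \equiv \frac{(\tilde{\boldsymbol{\tau}} - \boldsymbol{\tau})^T\mathbf{Q}^{-1}(\tilde{\boldsymbol{\tau}} - \boldsymbol{\tau})}{s\tilde{\sigma}^2} \sim \mathrm{F}(s, n - p^*).$$
Suppose that $\mathbf{L}^T\mathbf{y}$ is some other linear unbiased predictor of $\boldsymbol{\tau}$; that is, $\mathbf{L}^T\mathbf{y}$ is a LUP but not the BLUP of $\boldsymbol{\tau}$. A quantity $F^*$ analogous to $F$ may be obtained by replacing $\tilde{\boldsymbol{\tau}}$ with $\mathbf{L}^T\mathbf{y}$ and replacing $\mathbf{Q}$ with $\mathbf{L}^T\mathbf{W}\mathbf{L} + \mathbf{H} - \mathbf{L}^T\mathbf{K} - \mathbf{K}^T\mathbf{L}$ in the definition of $F$, i.e.,
$$F^* \equiv \frac{(\mathbf{L}^T\mathbf{y} - \boldsymbol{\tau})^T(\mathbf{L}^T\mathbf{W}\mathbf{L} + \mathbf{H} - \mathbf{L}^T\mathbf{K} - \mathbf{K}^T\mathbf{L})^{-1}(\mathbf{L}^T\mathbf{y} - \boldsymbol{\tau})}{s\tilde{\sigma}^2}.$$
Does $F^*$ have a central F-distribution in general? If so, verify it; if not, explain why not and give a necessary and sufficient condition on $\mathbf{L}^T$ for $F^*$ to have a central F-distribution.

Solution Because $\mathbf{L}^T\mathbf{y}$ is a LUP, $\mathbf{L}^T\mathbf{X} = \mathbf{C}^T$, implying further that $\mathbf{L}^T\mathbf{y} - \boldsymbol{\tau} \sim \mathrm{N}(\mathbf{0}, \sigma^2(\mathbf{L}^T\mathbf{W}\mathbf{L} + \mathbf{H} - \mathbf{L}^T\mathbf{K} - \mathbf{K}^T\mathbf{L}))$. Also, of course, $(n - p^*)\tilde{\sigma}^2/\sigma^2 \sim \chi^2(n - p^*)$. Thus two of the three requirements are satisfied for $F^*$ to have a central F-distribution. However, the third requirement, namely the independence of $\mathbf{L}^T\mathbf{y} - \boldsymbol{\tau}$ and $\tilde{\sigma}^2$, is not satisfied in general. By Corollary 14.3.1.2b, a necessary and sufficient condition for that independence is
$$\begin{pmatrix} \mathbf{L}^T & -\mathbf{I} \end{pmatrix} \begin{pmatrix} \mathbf{W} & \mathbf{K} \\ \mathbf{K}^T & \mathbf{H} \end{pmatrix} \begin{pmatrix} \mathbf{E} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{pmatrix} = \mathbf{0},$$
i.e., $(\mathbf{L}^T\mathbf{W} - \mathbf{K}^T)\mathbf{E} = \mathbf{0}$, i.e., $\mathbf{L}^T\mathbf{W}\mathbf{E} = \mathbf{K}^T\mathbf{E}$.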
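
Exercise 34 Consider the normal Aitken model $\{\mathbf{y}, \mathbf{X}\boldsymbol{\beta}, \sigma^2\mathbf{W}\}$, for which point estimation of an estimable function $\mathbf{c}^T\boldsymbol{\beta}$ and the residual variance $\sigma^2$ were presented in Sect. 11.2. Suppose that $\mathrm{rank}(\mathbf{W}, \mathbf{X}) > \mathrm{rank}(\mathbf{X})$. Let $\begin{pmatrix} \tilde{\mathbf{t}} \\ \tilde{\boldsymbol{\lambda}} \end{pmatrix}$ be any solution to the BLUE equations
$$\begin{pmatrix} \mathbf{W} & \mathbf{X} \\ \mathbf{X}^T & \mathbf{0} \end{pmatrix} \begin{pmatrix} \mathbf{t} \\ \boldsymbol{\lambda} \end{pmatrix} = \begin{pmatrix} \mathbf{0} \\ \mathbf{c} \end{pmatrix}$$
for $\mathbf{c}^T\boldsymbol{\beta}$, and let $\tilde{\sigma}^2 = \mathbf{y}^T\mathbf{G}_{11}\mathbf{y}/[\mathrm{rank}(\mathbf{W}, \mathbf{X}) - \mathrm{rank}(\mathbf{X})]$, where $\mathbf{G}_{11}$ is the upper left $n \times n$ block of any symmetric generalized inverse $\begin{pmatrix} \mathbf{G}_{11} & \mathbf{G}_{12} \\ \mathbf{G}_{21} & \mathbf{G}_{22} \end{pmatrix}$ of the coefficient matrix of the BLUE equations. Show that
$$\frac{\tilde{\mathbf{t}}^T\mathbf{y}}{\tilde{\sigma}\sqrt{\tilde{\mathbf{t}}^T\mathbf{W}\tilde{\mathbf{t}}}}$$
has a noncentral t distribution with degrees of freedom equal to $\mathrm{rank}(\mathbf{W}, \mathbf{X}) - \mathrm{rank}(\mathbf{X})$ and noncentrality parameter $\frac{\mathbf{c}^T\boldsymbol{\beta}}{\sigma\sqrt{\tilde{\mathbf{t}}^T\mathbf{W}\tilde{\mathbf{t}}}}$.

Solution Because the BLUE equations arise by solving the problem of minimizing $\mathrm{var}(\mathbf{t}^T\mathbf{y})$ among all unbiased estimators, $\tilde{\mathbf{t}}^T\mathbf{y}$ is unbiased and has variance $\sigma^2\tilde{\mathbf{t}}^T\mathbf{W}\tilde{\mathbf{t}}$. Thus $\tilde{\mathbf{t}}^T\mathbf{y} \sim \mathrm{N}(\mathbf{c}^T\boldsymbol{\beta}, \sigma^2\tilde{\mathbf{t}}^T\mathbf{W}\tilde{\mathbf{t}})$. Partial standardization yields
$$\frac{\tilde{\mathbf{t}}^T\mathbf{y}}{\sigma\sqrt{\tilde{\mathbf{t}}^T\mathbf{W}\tilde{\mathbf{t}}}} \sim \mathrm{N}\left(\frac{\mathbf{c}^T\boldsymbol{\beta}}{\sigma\sqrt{\tilde{\mathbf{t}}^T\mathbf{W}\tilde{\mathbf{t}}}}, 1\right).$$
Now,
$$\frac{[\mathrm{rank}(\mathbf{W}, \mathbf{X}) - \mathrm{rank}(\mathbf{X})]\tilde{\sigma}^2}{\sigma^2} = [(1/\sigma)\mathbf{y}]^T\mathbf{G}_{11}[(1/\sigma)\mathbf{y}].$$
Consider the conditions of Theorem 14.2.6 as they may apply to the quadratic form on the right-hand side of this last equation. We find that
$$\mathbf{W}\mathbf{G}_{11}\mathbf{W}\mathbf{G}_{11}\mathbf{W} = \mathbf{W}\mathbf{G}_{11}(\mathbf{W} - \mathbf{X}\mathbf{G}_{22}\mathbf{X}^T) = \mathbf{W}\mathbf{G}_{11}\mathbf{W}$$
by Theorem 3.3.8c, d (with $\mathbf{W}$ and $\mathbf{X}$ here playing the roles of $\mathbf{A}$ and $\mathbf{B}$ in the theorem);
$$\mathbf{W}\mathbf{G}_{11}\mathbf{W}\mathbf{G}_{11}[(1/\sigma)\mathbf{X}\boldsymbol{\beta}] = (\mathbf{W} - \mathbf{X}\mathbf{G}_{22}\mathbf{X}^T)\mathbf{G}_{11}\mathbf{X}\boldsymbol{\beta}(1/\sigma) = \mathbf{W}\mathbf{G}_{11}[(1/\sigma)\mathbf{X}\boldsymbol{\beta}],$$
also by Theorem 3.3.8c, d; and
$$[(1/\sigma)\mathbf{X}\boldsymbol{\beta}]^T\mathbf{G}_{11}\mathbf{W}\mathbf{G}_{11}[(1/\sigma)\mathbf{X}\boldsymbol{\beta}] = 0 = [(1/\sigma)\mathbf{X}\boldsymbol{\beta}]^T\mathbf{G}_{11}[(1/\sigma)\mathbf{X}\boldsymbol{\beta}]$$
yet again by Theorem 3.3.8c. Thus all three conditions of Theorem 14.2.6 are satisfied, implying that
$$\frac{[\mathrm{rank}(\mathbf{W}, \mathbf{X}) - \mathrm{rank}(\mathbf{X})]\tilde{\sigma}^2}{\sigma^2} \sim \chi^2(\mathrm{tr}(\mathbf{G}_{11}\mathbf{W}), 0),$$
or equivalently (by Theorem 3.3.8c, e), $\chi^2[\mathrm{rank}(\mathbf{W}, \mathbf{X}) - \mathrm{rank}(\mathbf{X})]$.

Next consider the conditions of Theorem 14.3.1 as they apply here, with $\mathbf{G}_{11}$, $\mathbf{0}_{n \times n}$, $\mathbf{0}_n$, and $\tilde{\mathbf{t}}$ playing the roles of $\mathbf{A}_1$, $\mathbf{A}_2$, $\boldsymbol{\ell}_1$, and $\boldsymbol{\ell}_2$ in the theorem. We find that the first of the conditions in Theorem 14.3.1 is satisfied trivially for both $(i, j) = (1, 2)$ and $(i, j) = (2, 1)$; the second is satisfied when $(i, j) = (1, 2)$ because $\mathbf{W}\mathbf{G}_{11}\mathbf{W}\tilde{\mathbf{t}} = -\mathbf{W}\mathbf{G}_{11}\mathbf{X}\tilde{\boldsymbol{\lambda}} = -\mathbf{0}_{n \times p}\tilde{\boldsymbol{\lambda}} = \mathbf{0}$ by Theorem 3.3.8c, and is satisfied trivially when $(i, j) = (2, 1)$; and the third is satisfied for both $(i, j) = (1, 2)$ and $(i, j) = (2, 1)$ because $\tilde{\mathbf{t}}^T\mathbf{W}\mathbf{G}_{11}[(1/\sigma)\mathbf{X}\boldsymbol{\beta}] = 0$ by Theorem 3.3.8c. Thus, $\frac{\tilde{\mathbf{t}}^T\mathbf{y}}{\sigma\sqrt{\tilde{\mathbf{t}}^T\mathbf{W}\tilde{\mathbf{t}}}}$ and $[\mathrm{rank}(\mathbf{W}, \mathbf{X}) - \mathrm{rank}(\mathbf{X})]\tilde{\sigma}^2/\sigma^2$ are independent. It follows immediately from Definition 14.5.2 that
$$\frac{\tilde{\mathbf{t}}^T\mathbf{y}}{\tilde{\sigma}\sqrt{\tilde{\mathbf{t}}^T\mathbf{W}\tilde{\mathbf{t}}}} = \frac{\tilde{\mathbf{t}}^T\mathbf{y}\big/\left(\sigma\sqrt{\tilde{\mathbf{t}}^T\mathbf{W}\tilde{\mathbf{t}}}\right)}{\sqrt{\left\{[\mathrm{rank}(\mathbf{W}, \mathbf{X}) - \mathrm{rank}(\mathbf{X})]\tilde{\sigma}^2/\sigma^2\right\}\big/[\mathrm{rank}(\mathbf{W}, \mathbf{X}) - \mathrm{rank}(\mathbf{X})]}}$$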
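has the specified distribution.

Exercise 35 Let
$$\mathbf{M} = \begin{pmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{pmatrix}$$
represent a random matrix whose elements $m_{11}, m_{12}, m_{21}, m_{22}$ are independent $\mathrm{N}(0, 1)$ random variables.

(a) Show that $m_{11} - m_{22}$, $m_{12} + m_{21}$, and $m_{12} - m_{21}$ are mutually independent, and determine their distributions.

(b) Show that $\Pr[\{\mathrm{tr}(\mathbf{M})\}^2 - 4|\mathbf{M}| > 0] = \Pr(W > \frac{1}{2})$, where $W \sim \mathrm{F}(2, 1)$. [Hint: You may find it helpful to recall that $(u + v)^2 - (u - v)^2 = 4uv$ for all real numbers $u$ and $v$.]

Solution

(a) Because $(m_{11}, m_{12}, m_{21}, m_{22})^T \sim \mathrm{N}(\mathbf{0}_4, \mathbf{I}_4)$ and
$$\begin{pmatrix} m_{11} - m_{22} \\ m_{12} + m_{21} \\ m_{12} - m_{21} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & -1 & 0 \end{pmatrix} \begin{pmatrix} m_{11} \\ m_{12} \\ m_{21} \\ m_{22} \end{pmatrix},$$
by Theorem 14.1.6
$$\begin{pmatrix} m_{11} - m_{22} \\ m_{12} + m_{21} \\ m_{12} - m_{21} \end{pmatrix} \sim \mathrm{N}(\mathbf{0}_3, \boldsymbol{\Sigma}),$$
where
$$\boldsymbol{\Sigma} = \begin{pmatrix} 1 & 0 & 0 & -1 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & -1 & 0 \end{pmatrix} \mathbf{I} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & -1 \\ -1 & 0 & 0 \end{pmatrix} = 2\mathbf{I}_3.$$
Thus, by Theorem 14.1.8, $m_{11} - m_{22}$, $m_{12} + m_{21}$, and $m_{12} - m_{21}$ are independent, and by Theorem 14.1.7 each of them has a $\mathrm{N}(0, 2)$ distribution.

(b)
$$\{\mathrm{tr}(\mathbf{M})\}^2 - 4|\mathbf{M}| = (m_{11} + m_{22})^2 - 4(m_{11}m_{22} - m_{12}m_{21}) = m_{11}^2 + m_{22}^2 - 2m_{11}m_{22} + 4m_{12}m_{21} = (m_{11} - m_{22})^2 + (m_{12} + m_{21})^2 - (m_{12} - m_{21})^2.$$
By the distributional result in part (a), the three summands in this last expression are independent and each, divided by 2, has a $\chi^2(1)$ distribution. Thus,
$$\begin{aligned}
\Pr[\{\mathrm{tr}(\mathbf{M})\}^2 - 4|\mathbf{M}| > 0] &= \Pr[(m_{11} - m_{22})^2 + (m_{12} + m_{21})^2 - (m_{12} - m_{21})^2 > 0] \\
&= \Pr\left[\frac{(m_{11} - m_{22})^2}{2} + \frac{(m_{12} + m_{21})^2}{2} > \frac{(m_{12} - m_{21})^2}{2}\right] \\
&= \Pr(U > V) \\
&= \Pr\left(\frac{U/2}{V} > \frac{1}{2}\right) \\
&= \Pr\left(W > \frac{1}{2}\right),
\end{aligned}$$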
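where $U \sim \chi^2(2)$, $V \sim \chi^2(1)$, $U$ and $V$ are independent, and $W \sim \mathrm{F}(2, 1)$ by Theorem 14.2.8 and Definition 14.5.1.

Exercise 36 Using the results of Example 14.4-2 show that the ratio of the Factor A mean square to the whole-plot error mean square, the ratio of the Factor B mean square to the split-plot error mean square, and the ratio of the AB interaction mean square to the split-plot error mean square have noncentral F distributions, and give the parameters of those distributions.

Solution The ratio of the Factor A mean square to the whole-plot error mean square is
$$\frac{rm\sum_{i=1}^{q}(\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot})^2/(q - 1)}{m\sum_{i=1}^{q}\sum_{j=1}^{r}(\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot})^2/[q(r - 1)]} = \frac{\dfrac{rm\sum_{i=1}^{q}(\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot})^2}{\sigma^2 + m\sigma_b^2}\Big/(q - 1)}{\dfrac{m\sum_{i=1}^{q}\sum_{j=1}^{r}(\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot})^2}{\sigma^2 + m\sigma_b^2}\Big/[q(r - 1)]}$$
$$\sim \mathrm{F}\left(q - 1,\; q(r - 1),\; \frac{rm}{2(q - 1)(\sigma^2 + m\sigma_b^2)}\sum_{i=1}^{q}[(\alpha_i - \bar{\alpha}_{\cdot}) + (\bar{\xi}_{i\cdot} - \bar{\xi}_{\cdot\cdot})]^2\right).$$

The ratio of the Factor B mean square to the split-plot error mean square is
$$\frac{qr\sum_{k=1}^{m}(\bar{y}_{\cdot\cdot k} - \bar{y}_{\cdot\cdot\cdot})^2/(m - 1)}{\sum_{i=1}^{q}\sum_{j=1}^{r}\sum_{k=1}^{m}(y_{ijk} - \bar{y}_{ij\cdot} - \bar{y}_{i\cdot k} + \bar{y}_{i\cdot\cdot})^2/[q(r - 1)(m - 1)]} = \frac{\dfrac{qr}{\sigma^2}\sum_{k=1}^{m}(\bar{y}_{\cdot\cdot k} - \bar{y}_{\cdot\cdot\cdot})^2\Big/(m - 1)}{\dfrac{1}{\sigma^2}\sum_{i=1}^{q}\sum_{j=1}^{r}\sum_{k=1}^{m}(y_{ijk} - \bar{y}_{ij\cdot} - \bar{y}_{i\cdot k} + \bar{y}_{i\cdot\cdot})^2\Big/[q(r - 1)(m - 1)]}$$
$$\sim \mathrm{F}\left(m - 1,\; q(r - 1)(m - 1),\; \frac{qr}{2(m - 1)\sigma^2}\sum_{k=1}^{m}[(\gamma_k - \bar{\gamma}_{\cdot}) + (\bar{\xi}_{\cdot k} - \bar{\xi}_{\cdot\cdot})]^2\right).$$

The ratio of the AB interaction mean square to the split-plot error mean square is
$$\frac{r\sum_{i=1}^{q}\sum_{k=1}^{m}(\bar{y}_{i\cdot k} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot k} + \bar{y}_{\cdot\cdot\cdot})^2/[(q - 1)(m - 1)]}{\sum_{i=1}^{q}\sum_{j=1}^{r}\sum_{k=1}^{m}(y_{ijk} - \bar{y}_{ij\cdot} - \bar{y}_{i\cdot k} + \bar{y}_{i\cdot\cdot})^2/[q(r - 1)(m - 1)]} = \frac{\dfrac{r}{\sigma^2}\sum_{i=1}^{q}\sum_{k=1}^{m}(\bar{y}_{i\cdot k} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot k} + \bar{y}_{\cdot\cdot\cdot})^2\Big/[(q - 1)(m - 1)]}{\dfrac{1}{\sigma^2}\sum_{i=1}^{q}\sum_{j=1}^{r}\sum_{k=1}^{m}(y_{ijk} - \bar{y}_{ij\cdot} - \bar{y}_{i\cdot k} + \bar{y}_{i\cdot\cdot})^2\Big/[q(r - 1)(m - 1)]}$$
$$\sim \mathrm{F}[(q - 1)(m - 1),\; q(r - 1)(m - 1)].$$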
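Returning briefly to Exercise 35(b): the event $\{\mathrm{tr}(\mathbf{M})\}^2 - 4|\mathbf{M}| > 0$ is the event that $\mathbf{M}$ has real, distinct eigenvalues, so the probability can be checked by simulation. The sketch below is not part of the original solution. The closed form it compares against, $1/\sqrt{2}$, comes from a standard chi-square calculation added here as an assumption-free side remark: since $U \sim \chi^2(2)$ has $\Pr(U > v) = e^{-v/2}$, $\Pr(U > V) = E(e^{-V/2}) = (1 + 1)^{-1/2}$ by the moment generating function of $\chi^2(1)$. The function name and the number of draws are illustrative.

```python
import math
import random


def has_real_eigenvalues(m11, m12, m21, m22):
    """True when {tr(M)}^2 - 4|M| > 0, i.e. real and distinct eigenvalues."""
    trace = m11 + m22
    det = m11 * m22 - m12 * m21
    return trace ** 2 - 4 * det > 0


rng = random.Random(35)
n_draws = 200000
hits = sum(
    has_real_eigenvalues(*(rng.gauss(0.0, 1.0) for _ in range(4)))
    for _ in range(n_draws)
)
estimate = hits / n_draws
# Pr(U > V) = E[exp(-V/2)] with V ~ chi2(1), which equals 2 ** (-1/2).
closed_form = 1 / math.sqrt(2)
print(estimate, closed_form)
```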
Reference Casella, G. & Berger, R. L. (2002). Statistical inference (2nd ed.). Pacific Grove, CA: Duxbury.
15 Inference for Estimable and Predictable Functions
This chapter presents exercises on inference for estimable and predictable functions in linear models and provides solutions to those exercises.

Exercise 1 Consider the normal Gauss–Markov simple linear regression model for $n$ observations, and suppose that in addition to the $n$ responses that follow that model, there are $m$ responses $y_1^*, y_2^*, \ldots, y_m^*$, all taken at an unknown value of $x$, say $x^*$, that also follow that model. Using the observation-pairs $(x_1, y_1), \ldots, (x_n, y_n)$ and the additional $y$-observations, an estimate of the unknown $x^*$ can be obtained. One estimator of $x^*$ that has been proposed is
$$\hat{x}^* = \frac{\bar{y}^* - \hat{\beta}_1}{\hat{\beta}_2},$$
where $\bar{y}^* = \frac{1}{m}\sum_{i=1}^{m} y_i^*$. This estimator is derived by equating $\bar{y}^*$ to $\hat{\beta}_1 + \hat{\beta}_2 x^*$ (because their expectations are both $\beta_1 + \beta_2 x^*$) and then solving for $x^*$. (Note that the estimator is not well defined when $\hat{\beta}_2 = 0$, but this is an event of probability zero.)

(a) Determine the distribution of $W \equiv \bar{y}^* - \hat{\beta}_1 - \hat{\beta}_2 x^*$.

(b) The random variable $W/(\hat{\sigma}c)$, where $c$ is a nonrandom quantity, has a certain well-known distribution. Determine this distribution and the value of $c$.

(c) Let $0 < \xi < 1$. Based on your solution to part (b), derive an exact $100(1 - \xi)\%$ confidence interval for $x^*$ whose endpoints depend on the unknown $x^*$ and then approximate the endpoints (by substituting $\hat{x}^*$ for $x^*$) to obtain an approximate $100(1 - \xi)\%$ confidence interval for $x^*$. Is this confidence interval symmetric about $\hat{x}^*$?
© Springer Nature Switzerland AG 2020 D. L. Zimmerman, Linear Model Theory, https://doi.org/10.1007/978-3-030-52074-8_15
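Solution

(a) $\bar{y}^* \sim \mathrm{N}(\beta_1 + \beta_2 x^*, \sigma^2/m)$, and with the aid of Example 7.2-1 it is easily shown that $\hat{\beta}_1 + \hat{\beta}_2 x^* \sim \mathrm{N}\left(\beta_1 + \beta_2 x^*, \sigma^2\left[\frac{1}{n} + \frac{(x^* - \bar{x})^2}{SXX}\right]\right)$. Furthermore, $\bar{y}^*$ and $\hat{\beta}_1 + \hat{\beta}_2 x^*$ are independent because $y_1^*, \ldots, y_m^*$ are independent of $y_1, \ldots, y_n$. Therefore,
$$W \sim \mathrm{N}\left(0, \sigma^2\left[\frac{1}{m} + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{SXX}\right]\right).$$

(b) $W$ and $\hat{\sigma}^2$ are independent because $(\hat{\beta}_1, \hat{\beta}_2, \bar{y}^*)^T$ and $\hat{\sigma}^2$ are independent. Therefore, letting $c = \sqrt{\frac{1}{m} + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{SXX}}$, we have
$$\frac{W}{\hat{\sigma}c} = \frac{W/(\sigma c)}{\sqrt{(n - 2)\hat{\sigma}^2/[\sigma^2(n - 2)]}} \sim \mathrm{t}(n - 2).$$

(c) By part (b),
$$\begin{aligned}
1 - \xi &= \Pr\left(-t_{\xi/2,n-2} \le \frac{\bar{y}^* - \hat{\beta}_1 - \hat{\beta}_2 x^*}{\hat{\sigma}c} \le t_{\xi/2,n-2}\right) \\
&= \Pr\left(\hat{\beta}_1 - \bar{y}^* - \hat{\sigma}c\,t_{\xi/2,n-2} \le -\hat{\beta}_2 x^* \le \hat{\beta}_1 - \bar{y}^* + \hat{\sigma}c\,t_{\xi/2,n-2}\right) \\
&= \Pr\left(\frac{\bar{y}^* - \hat{\beta}_1 - \hat{\sigma}c\,t_{\xi/2,n-2}}{\hat{\beta}_2} \le x^* \le \frac{\bar{y}^* - \hat{\beta}_1 + \hat{\sigma}c\,t_{\xi/2,n-2}}{\hat{\beta}_2} \,\Big|\, \hat{\beta}_2 > 0\right)\cdot\Pr(\hat{\beta}_2 > 0) \\
&\quad + \Pr\left(\frac{\bar{y}^* - \hat{\beta}_1 + \hat{\sigma}c\,t_{\xi/2,n-2}}{\hat{\beta}_2} \le x^* \le \frac{\bar{y}^* - \hat{\beta}_1 - \hat{\sigma}c\,t_{\xi/2,n-2}}{\hat{\beta}_2} \,\Big|\, \hat{\beta}_2 < 0\right)\cdot\Pr(\hat{\beta}_2 < 0) \\
&= \Pr\left(\hat{x}^* - \frac{\hat{\sigma}c\,t_{\xi/2,n-2}}{\hat{\beta}_2} \le x^* \le \hat{x}^* + \frac{\hat{\sigma}c\,t_{\xi/2,n-2}}{\hat{\beta}_2} \,\Big|\, \hat{\beta}_2 > 0\right)\cdot\Pr(\hat{\beta}_2 > 0) \\
&\quad + \Pr\left(\hat{x}^* + \frac{\hat{\sigma}c\,t_{\xi/2,n-2}}{\hat{\beta}_2} \le x^* \le \hat{x}^* - \frac{\hat{\sigma}c\,t_{\xi/2,n-2}}{\hat{\beta}_2} \,\Big|\, \hat{\beta}_2 < 0\right)\cdot\Pr(\hat{\beta}_2 < 0).
\end{aligned}$$
(When $\hat{\beta}_2 < 0$, dividing by $\hat{\beta}_2$ reverses the inequalities, so the roles of the two endpoints are interchanged; the interval as a set is the same in both cases.) Thus,
$$\hat{x}^* \pm \frac{\hat{\sigma}c\,t_{\xi/2,n-2}}{\hat{\beta}_2}$$
is an exact $100(1 - \xi)\%$ confidence interval for $x^*$. Replacing $c$ with $\hat{c} = \sqrt{\frac{1}{m} + \frac{1}{n} + \frac{(\hat{x}^* - \bar{x})^2}{SXX}}$ yields an approximate $100(1 - \xi)\%$ confidence interval that is symmetric about $\hat{x}^*$.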
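The approximate interval of part (c) is simple to compute from summary statistics. The following is a minimal sketch under stated assumptions: the function name, the toy data, and the critical value $t_{0.025,3} = 3.182$ (read from a t table and passed in by hand) are all illustrative and do not come from the exercise.

```python
import math


def calibration_interval(x, y, y_star, t_crit):
    """Return (x_hat, lower, upper) for x*, using the plug-in quantity c-hat."""
    n, m = len(x), len(y_star)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx                      # least squares slope
    b1 = y_bar - b2 * x_bar             # least squares intercept
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(rss / (n - 2))
    x_hat = (sum(y_star) / m - b1) / b2
    c_hat = math.sqrt(1 / m + 1 / n + (x_hat - x_bar) ** 2 / sxx)
    # abs(b2) keeps lower <= upper whatever the sign of the slope.
    half_width = sigma_hat * c_hat * t_crit / abs(b2)
    return x_hat, x_hat - half_width, x_hat + half_width


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
y_star = [7.0, 7.4]
# t_crit is the upper 0.025 point of t(3), an assumed table value.
x_hat, lower, upper = calibration_interval(x, y, y_star, t_crit=3.182)
print(x_hat, lower, upper)
```

The interval is symmetric about $\hat{x}^*$ by construction, which matches the conclusion of part (c).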

Exercise 2 Consider the normal Gauss–Markov quadratic regression model
$$y_i = \beta_1 + \beta_2 x_i + \beta_3 x_i^2 + e_i \quad (i = 1, \ldots, n),$$
where it is known that $\beta_3 \neq 0$. Assume that there are at least three distinct $x_i$'s, so that $\mathbf{X}$ has full column rank. Let $x_m$ be the value of $x$ where the quadratic function $\mu(x) = \beta_1 + \beta_2 x + \beta_3 x^2$ is minimized or maximized (such an $x$-value exists because $\beta_3 \neq 0$).

(a) Verify that $x_m = -\beta_2/(2\beta_3)$.

(b) Define $\tau = \beta_2 + 2\beta_3 x_m$ $(= 0)$ and $\hat{\tau} = \hat{\beta}_2 + 2\hat{\beta}_3 x_m$, where $(\hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3)$ are the least squares estimators of $(\beta_1, \beta_2, \beta_3)$. Determine the distribution of $\hat{\tau}$ in terms of $x_m$, $\sigma^2$, and $\{c_{ij}\}$, where $c_{ij}$ is the $ij$th element of $(\mathbf{X}^T\mathbf{X})^{-1}$.

(c) Based partly on your answer to part (b), determine a function of $\hat{\tau}$ and $\hat{\sigma}^2$ that has an F distribution, and give the parameters of this F distribution.

(d) Let $\xi \in (0, 1)$. Use the result from part (c) to find a quadratic function of $x_m$, say $q(x_m)$, such that $\Pr[q(x_m) \le 0] = 1 - \xi$. When is this confidence set an interval?

Solution

(a) The stationary point of $\mu(x)$ occurs at the value of $x$ where $\frac{d}{dx}(\beta_1 + \beta_2 x + \beta_3 x^2) = 0$, i.e., where $\beta_2 + 2\beta_3 x = 0$, i.e., at $x_m = -\beta_2/(2\beta_3)$.

(b) $\hat{\beta}_2 + 2\hat{\beta}_3 x_m = \mathbf{c}^T\hat{\boldsymbol{\beta}}$, where $\mathbf{c}^T = (0, 1, 2x_m)$. Thus $\hat{\beta}_2 + 2\hat{\beta}_3 x_m$ has a normal distribution with mean $\beta_2 + 2\beta_3 x_m = 0$ and variance
$$\sigma^2 \begin{pmatrix} 0 & 1 & 2x_m \end{pmatrix} (\mathbf{X}^T\mathbf{X})^{-1} \begin{pmatrix} 0 \\ 1 \\ 2x_m \end{pmatrix} = \sigma^2(c_{22} + 4x_m c_{23} + 4x_m^2 c_{33}).$$

(c) $(n - 3)\hat{\sigma}^2/\sigma^2 \sim \chi^2(n - 3)$, and $\hat{\tau}$ and $\hat{\sigma}^2$ are independent, so
$$\frac{\hat{\tau}^2}{\hat{\sigma}^2(c_{22} + 4x_m c_{23} + 4x_m^2 c_{33})} \sim \mathrm{F}(1, n - 3).$$

(d) By the result in part (c),
$$\begin{aligned}
1 - \xi &= \Pr\left[\frac{(\hat{\beta}_2 + 2\hat{\beta}_3 x_m)^2}{c_{22} + 4x_m c_{23} + 4x_m^2 c_{33}} \le \hat{\sigma}^2 F_{\xi,1,n-3}\right] \\
&= \Pr[(4\hat{\beta}_3^2 - 4c_{33}\hat{\sigma}^2 F_{\xi,1,n-3})x_m^2 + (4\hat{\beta}_2\hat{\beta}_3 - 4c_{23}\hat{\sigma}^2 F_{\xi,1,n-3})x_m + (\hat{\beta}_2^2 - c_{22}\hat{\sigma}^2 F_{\xi,1,n-3}) \le 0].
\end{aligned}$$
Thus $q(x_m) = ax_m^2 + bx_m + c$, where $a = 4\hat{\beta}_3^2 - 4c_{33}\hat{\sigma}^2 F_{\xi,1,n-3}$, $b = 4\hat{\beta}_2\hat{\beta}_3 - 4c_{23}\hat{\sigma}^2 F_{\xi,1,n-3}$, and $c = \hat{\beta}_2^2 - c_{22}\hat{\sigma}^2 F_{\xi,1,n-3}$. The set $\{x_m : q(x_m) \le 0\}$ is a $100(1 - \xi)\%$ confidence set, but it is not necessarily an interval. It is an interval if $a > 0$ and $b^2 - 4ac > 0$.
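To make the last remark concrete, the sketch below classifies the set $\{x_m : q(x_m) \le 0\}$ from the signs of $a$ and the discriminant. It is an illustration only: the function name, the labels it returns, and all numeric inputs (including the stand-in critical value 4.0 for $F_{\xi,1,n-3}$) are hypothetical, not values from the exercise.

```python
import math


def turning_point_confidence_set(b2, b3, c22, c23, c33, s2, f_crit):
    """Describe {x : q(x) <= 0} for q(x) = a x^2 + b x + c from part (d)."""
    a = 4 * b3 ** 2 - 4 * c33 * s2 * f_crit
    b = 4 * b2 * b3 - 4 * c23 * s2 * f_crit
    c = b2 ** 2 - c22 * s2 * f_crit
    disc = b ** 2 - 4 * a * c
    if a > 0 and disc > 0:
        # q opens upward: the set is the closed interval between the roots.
        root = math.sqrt(disc)
        return ("interval", (-b - root) / (2 * a), (-b + root) / (2 * a))
    if a < 0 and disc > 0:
        # q opens downward: the set is everything outside the two roots.
        root = math.sqrt(disc)
        lo, hi = sorted(((-b - root) / (2 * a), (-b + root) / (2 * a)))
        return ("two rays", lo, hi)
    if a < 0:
        # q opens downward and never rises above zero.
        return ("whole line", -math.inf, math.inf)
    return ("other", None, None)  # boundary cases such as a == 0


# Hypothetical estimates: slope terms b2, b3, elements of (X'X)^{-1}, and s2.
kind, lower, upper = turning_point_confidence_set(
    b2=-4.0, b3=1.0, c22=0.5, c23=-0.1, c33=0.05, s2=1.0, f_crit=4.0
)
print(kind, lower, upper)
```

In this example the point estimate $-\hat{\beta}_2/(2\hat{\beta}_3) = 2$ lies inside the reported interval, as it must, because $q$ evaluated at the point estimate is negative.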

Exercise 3 Consider the normal Gauss–Markov simple linear regression model
$$y_i = \beta_1 + \beta_2 x_i + e_i \quad (i = 1, \ldots, n).$$
In Example 12.1.1-1, it was shown that the mean squared error for the least squares estimator of the intercept from the fit of an (underspecified) intercept-only model is smaller than the mean squared error for the least squares estimator of the intercept from fitting the full simple linear regression model if and only if $\beta_2^2/[\sigma^2/SXX] \le 1$. Consequently, it may be desirable to test the hypotheses
$$H_0: \frac{\beta_2^2}{\sigma^2/SXX} \le 1 \quad \text{versus} \quad H_a: \frac{\beta_2^2}{\sigma^2/SXX} > 1.$$
A natural test statistic for this hypothesis test is
$$\frac{\hat{\beta}_2^2}{\hat{\sigma}^2/SXX}.$$
Determine the distribution of this test statistic when $\frac{\beta_2^2}{\sigma^2/SXX} = 1$.

Solution $\hat{\beta}_2 \sim \mathrm{N}(\beta_2, \sigma^2/SXX)$, implying that
$$\frac{\hat{\beta}_2^2}{\sigma^2/SXX} \sim \chi^2\left(1, \frac{\beta_2^2}{2\sigma^2/SXX}\right).$$
Also, we know that $(n - 2)\hat{\sigma}^2/\sigma^2$ is distributed as $\chi^2(n - 2)$, independently of $\hat{\beta}_2$. Thus,
$$\frac{\dfrac{\hat{\beta}_2^2}{\sigma^2/SXX}\Big/1}{\dfrac{(n - 2)\hat{\sigma}^2}{\sigma^2}\Big/(n - 2)} = \frac{\hat{\beta}_2^2}{\hat{\sigma}^2/SXX} \sim \mathrm{F}\left(1, n - 2, \frac{\beta_2^2}{2\sigma^2/SXX}\right).$$
When $\frac{\beta_2^2}{\sigma^2/SXX} = 1$, this distribution specializes to $\mathrm{F}(1, n - 2, \frac{1}{2})$.

Exercise 4 By completing the three parts of this exercise, verify the claim made in Example 15.2-3, i.e., that the power of the F-test for $H_0: \beta_2 = 0$ versus $H_a: \beta_2 \neq 0$ in normal simple linear regression, when the explanatory variable is restricted to the interval $[a, b]$ and $n$ is even, is maximized by setting half of the $x$-values equal to $a$ and the other half equal to $b$.

(a) Show that $\sum_{i=1}^{n}(x_i - \bar{x})^2 \le \sum_{i=1}^{n}\left(x_i - \frac{a+b}{2}\right)^2$. [Hint: Add and subtract $\frac{a+b}{2}$ within $\sum_{i=1}^{n}(x_i - \bar{x})^2$.]

(b) Show that $\sum_{i=1}^{n}\left(x_i - \frac{a+b}{2}\right)^2 \le \frac{n(b-a)^2}{4}$.

(c) Show that $\sum_{i=1}^{n}\left(x_i - \frac{a+b}{2}\right)^2 = \frac{n(b-a)^2}{4}$ for the design that sets half of the $x$-values equal to $a$ and the other half equal to $b$.

Solution

(a)
$$\begin{aligned}
\sum_{i=1}^{n}(x_i - \bar{x})^2 &= \sum_{i=1}^{n}\left[\left(x_i - \frac{a+b}{2}\right) - \left(\bar{x} - \frac{a+b}{2}\right)\right]^2 \\
&= \sum_{i=1}^{n}\left(x_i - \frac{a+b}{2}\right)^2 + n\left(\bar{x} - \frac{a+b}{2}\right)^2 - 2\left(\bar{x} - \frac{a+b}{2}\right)\sum_{i=1}^{n}\left(x_i - \frac{a+b}{2}\right) \\
&= \sum_{i=1}^{n}\left(x_i - \frac{a+b}{2}\right)^2 - n\left(\bar{x} - \frac{a+b}{2}\right)^2 \\
&\le \sum_{i=1}^{n}\left(x_i - \frac{a+b}{2}\right)^2.
\end{aligned}$$

(b) Because $x_i - a \ge 0$ and $x_i - b \le 0$, $|(x_i - a) + (x_i - b)| \le (x_i - a) - (x_i - b) = b - a$. Therefore,
$$\sum_{i=1}^{n}\left(x_i - \frac{a+b}{2}\right)^2 = \sum_{i=1}^{n}(1/4)[(x_i - a) + (x_i - b)]^2 \le \frac{n(b - a)^2}{4}.$$

(c) If half of the $x$-values are equal to $a$ and the other half are equal to $b$, then $\bar{x} = \frac{a+b}{2}$, implying further that
$$\sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n}\left(x_i - \frac{a+b}{2}\right)^2 = \frac{n}{2}\left(a - \frac{a+b}{2}\right)^2 + \frac{n}{2}\left(b - \frac{a+b}{2}\right)^2 = \frac{n(b - a)^2}{4}.$$
Thus, by parts (a) and (b), the design with half of the $x$-values equal to $a$ and the other half equal to $b$ maximizes $\sum_{i=1}^{n}(x_i - \bar{x})^2$.
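The bound and the design that attains it can be illustrated numerically. The sketch below is not part of the original solution; the interval endpoints, the sample size, and the use of uniformly drawn competing designs are arbitrary illustrative choices.

```python
import random


def sxx(xs):
    """Corrected sum of squares, sum of (x_i - x_bar)^2, for a design."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs)


a, b, n = 2.0, 5.0, 10
bound = n * (b - a) ** 2 / 4                      # upper bound from part (b)
endpoint_design = [a] * (n // 2) + [b] * (n // 2)  # design from part (c)

rng = random.Random(4)
# Competing designs: n points drawn uniformly on [a, b], 1000 times.
random_designs = [[rng.uniform(a, b) for _ in range(n)] for _ in range(1000)]
largest_random_sxx = max(sxx(d) for d in random_designs)
print(sxx(endpoint_design), bound, largest_random_sxx)
```

No randomly drawn design exceeds the bound, while the endpoint design attains it exactly.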
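
Exercise 5 Consider the following modification of the normal Gauss–Markov model, which is called the normal Gauss–Markov variance shift outlier model. For this model, the model equation $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$ can be written as
$$\begin{pmatrix} y_1 \\ \mathbf{y}_{-1} \end{pmatrix} = \begin{pmatrix} \mathbf{x}_1^T \\ \mathbf{X}_{-1} \end{pmatrix}\boldsymbol{\beta} + \begin{pmatrix} e_1 \\ \mathbf{e}_{-1} \end{pmatrix},$$
where
$$\begin{pmatrix} e_1 \\ \mathbf{e}_{-1} \end{pmatrix} \sim \mathrm{N}_n\left(\mathbf{0}, \mathrm{diag}\{\sigma^2 + \sigma_\delta^2, \sigma^2, \sigma^2, \ldots, \sigma^2\}\right).$$
Here, $\sigma_\delta^2$ is an unknown nonnegative scalar parameter. Assume that $\mathrm{rank}(\mathbf{X}_{-1}) = \mathrm{rank}(\mathbf{X})$ $(= p^*)$, i.e., the rank of $\mathbf{X}$ is not affected by deleting its first row, and define $\hat{e}_{1,-1}$, $\hat{\boldsymbol{\beta}}_{-1}$, $\hat{\sigma}_{-1}^2$, and $p_{11,-1}$ as in Methodological Interlude #9.

(a) Determine the distributions of $\hat{e}_{1,-1}$ and $(n - p^* - 1)\hat{\sigma}_{-1}^2/\sigma^2$.

(b) Determine the distribution of
$$\frac{\hat{e}_{1,-1}}{\hat{\sigma}_{-1}\sqrt{1 + p_{11,-1}}}.$$

(c) Explain how the hypotheses
$$H_0: \sigma_\delta^2 = 0 \quad \text{vs} \quad H_a: \sigma_\delta^2 > 0$$
could be tested using the statistic in part (b), and explain how the value of $p_{11,-1}$ affects the power of the test.

Solution

(a) Because $\hat{\boldsymbol{\beta}}_{-1}$ is unbiased for $\boldsymbol{\beta}$ and does not depend on $y_1$, $E(y_1 - \mathbf{x}_1^T\hat{\boldsymbol{\beta}}_{-1}) = \mathbf{x}_1^T\boldsymbol{\beta} - \mathbf{x}_1^T\boldsymbol{\beta} = 0$ and
$$\begin{aligned}
\mathrm{var}(y_1 - \mathbf{x}_1^T\hat{\boldsymbol{\beta}}_{-1}) &= \mathrm{var}(y_1) + \mathbf{x}_1^T(\mathbf{X}_{-1}^T\mathbf{X}_{-1})^{-1}\mathbf{X}_{-1}^T(\sigma^2\mathbf{I})\mathbf{X}_{-1}(\mathbf{X}_{-1}^T\mathbf{X}_{-1})^{-1}\mathbf{x}_1 \\
&= \sigma^2 + \sigma_\delta^2 + \sigma^2\mathbf{x}_1^T(\mathbf{X}_{-1}^T\mathbf{X}_{-1})^{-1}\mathbf{x}_1 \\
&= \sigma^2(1 + p_{11,-1}) + \sigma_\delta^2.
\end{aligned}$$
Thus $\hat{e}_{1,-1} \sim \mathrm{N}(0, \sigma^2(1 + p_{11,-1}) + \sigma_\delta^2)$. Furthermore, $(n - p^* - 1)\hat{\sigma}_{-1}^2/\sigma^2 \sim \chi^2(n - p^* - 1)$ by applying (14.8) to the model without the first observation.

(b) Because
$$\begin{pmatrix} 0 & \mathbf{0}^T \\ \mathbf{0} & \mathbf{I} - \mathbf{P}_{X_{-1}} \end{pmatrix} \begin{pmatrix} \sigma^2 + \sigma_\delta^2 & \mathbf{0}^T \\ \mathbf{0} & \sigma^2\mathbf{I} \end{pmatrix} \begin{pmatrix} 1 \\ -\mathbf{X}_{-1}(\mathbf{X}_{-1}^T\mathbf{X}_{-1})^{-1}\mathbf{x}_1 \end{pmatrix} = \begin{pmatrix} 0 \\ \mathbf{0} \end{pmatrix},$$
$\hat{e}_{1,-1}$ and $(n - p^* - 1)\hat{\sigma}_{-1}^2/\sigma^2$ are independent by Corollary 14.3.1.2b. Thus,
$$\frac{\hat{e}_{1,-1}\Big/\sqrt{\sigma^2(1 + p_{11,-1}) + \sigma_\delta^2}}{\sqrt{(n - p^* - 1)\hat{\sigma}_{-1}^2/[\sigma^2(n - p^* - 1)]}} = \frac{\hat{e}_{1,-1}}{\hat{\sigma}_{-1}\sqrt{1 + p_{11,-1} + \sigma_\delta^2/\sigma^2}} \sim \mathrm{t}(n - p^* - 1),$$
implying that
$$\frac{\hat{e}_{1,-1}}{\hat{\sigma}_{-1}\sqrt{1 + p_{11,-1}}} \sim \sqrt{1 + \frac{\sigma_\delta^2}{\sigma^2(1 + p_{11,-1})}}\;\mathrm{t}(n - p^* - 1).$$

(c) Under $H_0$, $\frac{\hat{e}_{1,-1}}{\hat{\sigma}_{-1}\sqrt{1 + p_{11,-1}}} \sim \mathrm{t}(n - p^* - 1)$, so a size-$\xi$ test of $H_0$ versus $H_a$ is to reject $H_0$ if and only if
$$\frac{|\hat{e}_{1,-1}|}{\hat{\sigma}_{-1}\sqrt{1 + p_{11,-1}}} > t_{\xi/2,n-p^*-1}.$$
The power of this test clearly increases as $\sqrt{1 + \frac{\sigma_\delta^2}{\sigma^2(1 + p_{11,-1})}}$ increases, so the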
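power increases as $p_{11,-1}$ decreases.

Exercise 6 Consider the normal positive definite Aitken model $\mathbf{y} = \mathbf{1}_n\mu + \mathbf{e}$, where $n \ge 2$, $\mathbf{e} \sim \mathrm{N}_n(\mathbf{0}, \sigma^2\mathbf{W})$, and $\mathbf{W} = (1 - \rho)\mathbf{I}_n + \rho\mathbf{J}_n$. Here, $\sigma^2$ is unknown but $\rho \in \left(-\frac{1}{n-1}, 1\right)$ is a known constant. Let $\hat{\sigma}^2$ represent the sample variance of the responses, i.e., $\hat{\sigma}^2 = \sum_{i=1}^{n}(y_i - \bar{y})^2/(n - 1)$.

(a) Determine the joint distribution of $\bar{y}$ and $\frac{(n-1)\hat{\sigma}^2}{\sigma^2(1-\rho)}$.

(b) Using results from part (a), derive a size-$\xi$ test for testing $H_0: \mu = 0$ versus $H_a: \mu \neq 0$.

(c) For the test you derived in part (b), describe how the magnitude of $\rho$ affects the power.

(d) A statistician wants to test the null hypothesis $H_0: \mu = 0$ versus the alternative hypothesis $H_a: \mu > 0$ at the $\xi$ level of significance. However, rather than using the appropriate test obtained in part (b), (s)he wishes to ignore the correlations among the observations and use a standard size-$\xi$ t test. That is, (s)he will reject $H_0$ if and only if
$$\frac{\bar{y}}{\hat{\sigma}/\sqrt{n}} > t_{\xi,n-1}.$$
The size of this test is not necessarily equal to its nominal value, $\xi$. What is the actual size of the test, and how does it compare to $\xi$?

Solution

(a) First observe that $\bar{y} = (1/n)\mathbf{1}_n^T\mathbf{y}$ is normally distributed with mean $(1/n)\mathbf{1}_n^T(\mathbf{1}_n\mu) = \mu$ and variance $(1/n)\mathbf{1}^T(\sigma^2\mathbf{W})(1/n)\mathbf{1} = (\sigma^2/n^2)\mathbf{1}^T[(1 - \rho)\mathbf{I} + \rho\mathbf{J}]\mathbf{1} = (\sigma^2/n)[1 + (n - 1)\rho]$. Furthermore,
$$\frac{(n - 1)\hat{\sigma}^2}{\sigma^2(1 - \rho)} = (1/\sigma^2)\,\mathbf{y}^T\left[\frac{1}{1 - \rho}\left(\mathbf{I}_n - \frac{1}{n}\mathbf{J}_n\right)\right]\mathbf{y}.$$
Since $\left[\frac{1}{1-\rho}(\mathbf{I}_n - \frac{1}{n}\mathbf{J}_n)\right]\mathbf{W} = \left[\frac{1}{1-\rho}(\mathbf{I}_n - \frac{1}{n}\mathbf{J}_n)\right][(1 - \rho)\mathbf{I}_n + \rho\mathbf{J}_n] = \mathbf{I}_n - \frac{1}{n}\mathbf{J}_n$ is idempotent, it follows from Corollary 14.2.6.1 that $\frac{(n-1)\hat{\sigma}^2}{\sigma^2(1-\rho)} \sim \chi^2(\mathrm{df}, \mathrm{NCP})$, where $\mathrm{df} = \mathrm{rank}\left[\frac{1}{1-\rho}(\mathbf{I}_n - \frac{1}{n}\mathbf{J}_n)\right] = n - 1$ and $\mathrm{NCP} = \frac{1}{2}(\mu/\sigma)^2\mathbf{1}_n^T\left[\frac{1}{1-\rho}(\mathbf{I}_n - \frac{1}{n}\mathbf{J}_n)\right]\mathbf{1}_n = 0$. Also, $\bar{y}$ and $\frac{(n-1)\hat{\sigma}^2}{\sigma^2(1-\rho)}$ are independent, because $\mathbf{1}_n^T\mathbf{W}(\mathbf{I}_n - \frac{1}{n}\mathbf{J}_n) = [1 + (n - 1)\rho]\mathbf{1}_n^T(\mathbf{I}_n - \frac{1}{n}\mathbf{J}_n) = \mathbf{0}^T$.

(b) By the distributional results obtained in the solution to part (a),
$$\frac{\bar{y}}{(\sigma/\sqrt{n})\sqrt{1 + (n - 1)\rho}} \sim \mathrm{N}\left(\frac{\mu}{(\sigma/\sqrt{n})\sqrt{1 + (n - 1)\rho}}, 1\right)$$
and
$$\frac{\dfrac{\bar{y}}{(\sigma/\sqrt{n})\sqrt{1 + (n - 1)\rho}}}{\sqrt{\dfrac{(n - 1)\hat{\sigma}^2}{\sigma^2(1 - \rho)}\Big/(n - 1)}} = \frac{\bar{y}}{\hat{\sigma}/\sqrt{n}}\cdot\frac{\sqrt{1 - \rho}}{\sqrt{1 + (n - 1)\rho}} \sim \mathrm{t}\left(n - 1, \frac{\mu}{(\sigma/\sqrt{n})\sqrt{1 + (n - 1)\rho}}\right).$$
Thus, a size-$\xi$ test of $H_0: \mu = 0$ versus $H_a: \mu \neq 0$ is to reject $H_0$ if and only if
$$\frac{|\bar{y}|}{\hat{\sigma}/\sqrt{n}}\cdot\frac{\sqrt{1 - \rho}}{\sqrt{1 + (n - 1)\rho}} > t_{\xi/2,n-1}.$$

(c) As $\rho$ increases from $-\frac{1}{n-1}$ to 1, the absolute value of the noncentrality parameter of the t distribution decreases, hence the power of the test in part (b) decreases.

(d) By part (b), the null distribution of $\frac{\bar{y}}{\hat{\sigma}/\sqrt{n}}$ is that of $\sqrt{\frac{1 + (n-1)\rho}{1 - \rho}}\,T$, where $T \sim \mathrm{t}(n - 1)$. It follows that the actual size of the standard size-$\xi$ t test is
$$\Pr\left(\frac{\bar{y}}{\hat{\sigma}/\sqrt{n}} > t_{\xi,n-1}\right) = \Pr\left(\sqrt{\frac{1 + (n - 1)\rho}{1 - \rho}}\,T > t_{\xi,n-1}\right) = \Pr\left(T > t_{\xi,n-1}\sqrt{\frac{1 - \rho}{1 + (n - 1)\rho}}\right).$$
Finally, it is easy to verify that $\sqrt{\frac{1-\rho}{1+(n-1)\rho}} > 1$ if $\rho \in \left(-\frac{1}{n-1}, 0\right)$; $\sqrt{\frac{1-\rho}{1+(n-1)\rho}} = 1$ if $\rho = 0$; and $\sqrt{\frac{1-\rho}{1+(n-1)\rho}} < 1$ if $\rho \in (0, 1)$. Hence the actual size satisfies
$$\Pr\left(\frac{\bar{y}}{\hat{\sigma}/\sqrt{n}} > t_{\xi,n-1}\right) \begin{cases} < \xi & \text{if } \rho \in \left(-\frac{1}{n-1}, 0\right), \\ = \xi & \text{if } \rho = 0, \\ > \xi & \text{if } \rho \in (0, 1). \end{cases}$$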
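The size distortion in part (d) can be seen in a small simulation. The sketch below is illustrative and makes several assumptions not in the exercise: it fixes $n = 5$, $\rho = 0.5$, $\sigma = 1$, and $\xi = 0.05$; it takes $t_{0.05,4} = 2.132$ from a t table; and it generates equicorrelated errors with a shared-component construction that is valid only for $0 \le \rho < 1$. The function names are hypothetical.

```python
import math
import random


def t_statistic(ys):
    """Usual one-sample t statistic, y_bar / (sigma_hat / sqrt(n))."""
    n = len(ys)
    y_bar = sum(ys) / n
    s2 = sum((y - y_bar) ** 2 for y in ys) / (n - 1)
    return y_bar / math.sqrt(s2 / n)


def rejection_rates(n, rho, t_crit, n_sims, rng):
    """Monte Carlo sizes of the naive and rho-adjusted one-sided tests (mu = 0)."""
    adjust = math.sqrt((1 - rho) / (1 + (n - 1) * rho))
    naive = adjusted = 0
    for _ in range(n_sims):
        shared = rng.gauss(0.0, 1.0)
        # A shared component gives var(y_i) = 1 and corr(y_i, y_j) = rho.
        ys = [math.sqrt(rho) * shared + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0)
              for _ in range(n)]
        t = t_statistic(ys)
        naive += t > t_crit
        adjusted += t * adjust > t_crit
    return naive / n_sims, adjusted / n_sims


rng = random.Random(6)
# 2.132 is the upper 0.05 point of t(4), an assumed table value.
naive_size, adjusted_size = rejection_rates(n=5, rho=0.5, t_crit=2.132,
                                            n_sims=20000, rng=rng)
print(naive_size, adjusted_size)
```

With positive $\rho$ the naive test rejects far more often than the nominal 5%, while the statistic rescaled by $\sqrt{(1-\rho)/(1+(n-1)\rho)}$ holds its size, in line with the case $\rho \in (0, 1)$ above.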
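
Exercise 7 Consider a situation where observations are taken on some outcome in two groups (e.g., a control group and an intervention group). An investigator wishes to determine if exactly the same linear model applies to the two groups. Specifically, suppose that the model for $n_1$ observations in the first group is
$$\mathbf{y}_1 = \mathbf{X}_1\boldsymbol{\beta}_1 + \mathbf{e}_1$$
and the model for $n_2$ observations from the second group is
$$\mathbf{y}_2 = \mathbf{X}_2\boldsymbol{\beta}_2 + \mathbf{e}_2,$$
where $\mathbf{X}_1$ and $\mathbf{X}_2$ are model matrices whose $p$ columns (each) correspond to the same $p$ explanatory variables, and $\mathbf{e}_1$ and $\mathbf{e}_2$ are independent and normally distributed random vectors with mean $\mathbf{0}_{n_i}$ and variance–covariance matrix $\sigma^2\mathbf{I}_{n_i}$. Assume that $n_1 + n_2 > 2p$.

(a) Let $\mathbf{C}^T$ be an $s \times p$ matrix such that $\mathbf{C}^T\boldsymbol{\beta}_1$ is estimable under the first model and $\mathbf{C}^T\boldsymbol{\beta}_2$ is estimable under the second model. Show that the two sets of data can be combined into a single linear model of the form $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$ such that the null hypothesis $H_0: \mathbf{C}^T\boldsymbol{\beta}_1 = \mathbf{C}^T\boldsymbol{\beta}_2$ is testable under this model and can be expressed as $H_0: \mathbf{B}^T\boldsymbol{\beta} = \mathbf{0}$ for a suitably chosen matrix $\mathbf{B}$.

(b) By specializing Theorem 15.2.1, give the size-$\xi$ F-test (in as simple a form as possible) for testing the null hypothesis of part (a) versus the alternative $H_a: \mathbf{C}^T\boldsymbol{\beta}_1 \neq \mathbf{C}^T\boldsymbol{\beta}_2$.

Solution

(a) The two models may be combined as
$$\begin{pmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \end{pmatrix} = \begin{pmatrix} \mathbf{X}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{X}_2 \end{pmatrix}\begin{pmatrix} \boldsymbol{\beta}_1 \\ \boldsymbol{\beta}_2 \end{pmatrix} + \begin{pmatrix} \mathbf{e}_1 \\ \mathbf{e}_2 \end{pmatrix}.$$
The null hypothesis $H_0: \mathbf{C}^T\boldsymbol{\beta}_1 = \mathbf{C}^T\boldsymbol{\beta}_2$ may be written as $H_0: \mathbf{B}^T\boldsymbol{\beta} = \mathbf{0}$, where $\mathbf{B}^T = (\mathbf{C}^T, -\mathbf{C}^T)$. Furthermore, by the given estimability conditions, $\mathcal{R}(\mathbf{C}^T) \subseteq \mathcal{R}(\mathbf{X}_1)$ and $\mathcal{R}(\mathbf{C}^T) \subseteq \mathcal{R}(\mathbf{X}_2)$, implying that $\mathcal{R}(\mathbf{B}^T) = \mathcal{R}(\mathbf{C}^T, -\mathbf{C}^T) \subseteq \mathcal{R}\begin{pmatrix} \mathbf{X}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{X}_2 \end{pmatrix} = \mathcal{R}(\mathbf{X})$. Thus the null hypothesis is testable under the single model.

(b) We may write an arbitrary solution to the normal equations as
$$\hat{\boldsymbol{\beta}} = \begin{pmatrix} \hat{\boldsymbol{\beta}}_1 \\ \hat{\boldsymbol{\beta}}_2 \end{pmatrix} = (\mathbf{X}^T\mathbf{X})^{-}\mathbf{X}^T\mathbf{y} = \begin{pmatrix} \mathbf{X}_1^T\mathbf{X}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{X}_2^T\mathbf{X}_2 \end{pmatrix}^{-}\begin{pmatrix} \mathbf{X}_1^T\mathbf{y}_1 \\ \mathbf{X}_2^T\mathbf{y}_2 \end{pmatrix} = \begin{pmatrix} (\mathbf{X}_1^T\mathbf{X}_1)^{-}\mathbf{X}_1^T\mathbf{y}_1 \\ (\mathbf{X}_2^T\mathbf{X}_2)^{-}\mathbf{X}_2^T\mathbf{y}_2 \end{pmatrix}.$$
Then $\mathbf{B}^T\hat{\boldsymbol{\beta}} = \mathbf{C}^T\hat{\boldsymbol{\beta}}_1 - \mathbf{C}^T\hat{\boldsymbol{\beta}}_2$, and
$$\mathrm{var}(\mathbf{B}^T\hat{\boldsymbol{\beta}}) = \sigma^2\mathbf{B}^T(\mathbf{X}^T\mathbf{X})^{-}\mathbf{B} = \sigma^2(\mathbf{C}^T, -\mathbf{C}^T)\begin{pmatrix} (\mathbf{X}_1^T\mathbf{X}_1)^{-} & \mathbf{0} \\ \mathbf{0} & (\mathbf{X}_2^T\mathbf{X}_2)^{-} \end{pmatrix}\begin{pmatrix} \mathbf{C} \\ -\mathbf{C} \end{pmatrix} = \sigma^2\mathbf{C}^T[(\mathbf{X}_1^T\mathbf{X}_1)^{-} + (\mathbf{X}_2^T\mathbf{X}_2)^{-}]\mathbf{C}.$$
So, by Theorem 15.2.1 the size-$\xi$ F-test of the specified hypotheses rejects $H_0$ if and only if
$$\frac{(\mathbf{C}^T\hat{\boldsymbol{\beta}}_1 - \mathbf{C}^T\hat{\boldsymbol{\beta}}_2)^T\{\mathbf{C}^T[(\mathbf{X}_1^T\mathbf{X}_1)^{-} + (\mathbf{X}_2^T\mathbf{X}_2)^{-}]\mathbf{C}\}^{-1}(\mathbf{C}^T\hat{\boldsymbol{\beta}}_1 - \mathbf{C}^T\hat{\boldsymbol{\beta}}_2)}{s\hat{\sigma}^2} > F_{\xi,s,n_1+n_2-2p},$$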
where $\hat{\sigma}^2 = [(\mathbf{y}_1 - \mathbf{X}_1\hat{\boldsymbol{\beta}}_1)^T(\mathbf{y}_1 - \mathbf{X}_1\hat{\boldsymbol{\beta}}_1) + (\mathbf{y}_2 - \mathbf{X}_2\hat{\boldsymbol{\beta}}_2)^T(\mathbf{y}_2 - \mathbf{X}_2\hat{\boldsymbol{\beta}}_2)]/(n_1 + n_2 - 2p)$.

Exercise 8 Consider a situation in which data are available on a pair of variables $(x, y)$ from individuals that belong to two groups. Let $\{(x_{1i}, y_{1i}) : i = 1, \ldots, n_1\}$ and $\{(x_{2i}, y_{2i}) : i = 1, \ldots, n_2\}$, respectively, denote the data from these two groups. Assume that the following model holds:
$$y_{1i} = \beta_{11} + \beta_{21}(x_{1i} - \bar{x}_1) + e_{1i} \quad (i = 1, \ldots, n_1)$$
$$y_{2i} = \beta_{12} + \beta_{22}(x_{2i} - \bar{x}_2) + e_{2i} \quad (i = 1, \ldots, n_2),$$
where the $e_{1i}$'s and $e_{2i}$'s are independent $\mathrm{N}(0, \sigma^2)$ random variables, $n_1 \ge 2$, $n_2 \ge 2$, $x_{11} \neq x_{12}$, and $x_{21} \neq x_{22}$. Furthermore, define the following notation:
$$\bar{x}_1 = n_1^{-1}\sum_{i=1}^{n_1} x_{1i}, \quad \bar{x}_2 = n_2^{-1}\sum_{i=1}^{n_2} x_{2i}, \quad \bar{y}_1 = n_1^{-1}\sum_{i=1}^{n_1} y_{1i}, \quad \bar{y}_2 = n_2^{-1}\sum_{i=1}^{n_2} y_{2i},$$
$$SXX_1 = \sum_{i=1}^{n_1}(x_{1i} - \bar{x}_1)^2, \quad SXX_2 = \sum_{i=1}^{n_2}(x_{2i} - \bar{x}_2)^2,$$
$$SXY_1 = \sum_{i=1}^{n_1}(x_{1i} - \bar{x}_1)(y_{1i} - \bar{y}_1), \quad SXY_2 = \sum_{i=1}^{n_2}(x_{2i} - \bar{x}_2)(y_{2i} - \bar{y}_2).$$
Observe that this model can be written in matrix notation as the unordered 2-part model
$$\mathbf{y} = \begin{pmatrix} \mathbf{1}_{n_1} & \mathbf{0}_{n_1} \\ \mathbf{0}_{n_2} & \mathbf{1}_{n_2} \end{pmatrix}\begin{pmatrix} \beta_{11} \\ \beta_{12} \end{pmatrix} + \begin{pmatrix} \mathbf{x}_1 - \bar{x}_1\mathbf{1}_{n_1} & \mathbf{0} \\ \mathbf{0} & \mathbf{x}_2 - \bar{x}_2\mathbf{1}_{n_2} \end{pmatrix}\begin{pmatrix} \beta_{21} \\ \beta_{22} \end{pmatrix} + \mathbf{e}.$$

(a) Give nonmatrix expressions for the BLUEs of $\beta_{11}$, $\beta_{12}$, $\beta_{21}$, and $\beta_{22}$, and for the residual mean square, in terms of the quantities defined above. Also give an expression for the variance–covariance matrix of the BLUEs.

(b) Give the two-part ANOVA table for the ordered two-part model that coincides with the unordered two-part model written above. Give nonmatrix expressions for the degrees of freedom and sums of squares. Determine the distributions of the (suitably scaled) sums of squares in this table, giving nonmatrix expressions for any noncentrality parameters.

(c) Give a size-$\xi$ test for equal intercepts, i.e., for $H_0: \beta_{11} = \beta_{12}$ versus $H_a: \beta_{11} \neq \beta_{12}$.

(d) Give a size-$\xi$ test for equal slopes (or "parallelism"), i.e., for $H_0: \beta_{21} = \beta_{22}$ versus $H_a: \beta_{21} \neq \beta_{22}$.

(e) Give a size-$\xi$ test for identical lines, i.e., for $H_0: \beta_{11} = \beta_{12}, \beta_{21} = \beta_{22}$ versus $H_a$: either $\beta_{11} \neq \beta_{12}$ or $\beta_{21} \neq \beta_{22}$.

(f) Let $c$ represent a specified real number and let $0 < \xi < 1$. Give a size-$\xi$ test of the null hypothesis that the two groups' regression lines intersect at $x = c$ and that the slope for the first group is the negative of the slope for the second group versus the alternative hypothesis that at least one of these statements is false.

(g) Suppose that we desire to predict the value of a new observation $y_{1,n+1}$ from the first group corresponding to the value $x_{1,n+1}$ of $x_1$, and we also desire to predict the value of a new observation $y_{2,n+2}$ from the second group corresponding to the value $x_{2,n+2}$ of $x_2$. Give nonmatrix expressions for $100(1 - \xi)\%$ prediction intervals for these two new observations, and give an expression for a $100(1 - \xi)\%$ prediction ellipsoid for them. (In the latter expression, you may use vectors and matrices but if so, give nonmatrix expressions for each element of them.)
266
15 Inference for Estimable and Predictable Functions
(h) Give confidence bands for the two lines for which the simultaneous coverage probability (for both lines and for all x ∈ R) is at least 1 − ξ . Solution 0 1n1 0 x1 − x¯1 1n1 , where x1 = (x11 , . . . , x1n1 )T and (a) X = 0 x2 − x¯2 1n2 0 1n2 x2 = (x21 , . . . , x2n2 )T , XT X = diag(n1 , n2 , SXX 1 , SXX 2 ), and XT y = (n1 y¯1 , n2 y¯2 , SXY 1 , SXY 2 )T . Therefore, the BLUEs of β11 , β12 , β21 , and β22 are given by y¯1 , y¯2 , SXY 1 /SXX 1 , and SXY 2 /SXX 2 , respectively. Furthermore,
ˆ ˆ T (y − Xβ) (y − Xβ) n1 + n2 − rank(X) n 2 n1 2 2 2 2 i=1 (y1i − y¯1 ) + i=1 (y2i − y¯2 ) − (SXY 1 /SXX 1 ) − (SXY 2 /SXX 2 ) = n1 + n2 − 4
σˆ 2 =
and the variance–covariance matrix of the BLUEs is σ 2 (XT X)−1 σ 2 diag(1/n1 , 1/n2 , 1/SXX 1 , 1/SXX 2 ).
=
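As a numerical companion to part (a), the following is a minimal sketch of how the BLUEs and the residual mean square could be computed from raw data. The function name `fit_two_lines` and the data are made up for illustration and are not part of the exercise.

```python
def fit_two_lines(x1, y1, x2, y2):
    """BLUEs and residual mean square for two separately centered lines."""

    def summaries(x, y):
        n = len(x)
        xbar = sum(x) / n
        ybar = sum(y) / n
        sxx = sum((xi - xbar) ** 2 for xi in x)
        sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
        syy = sum((yi - ybar) ** 2 for yi in y)
        return n, ybar, sxx, sxy, syy

    n1, ybar1, sxx1, sxy1, syy1 = summaries(x1, y1)
    n2, ybar2, sxx2, sxy2, syy2 = summaries(x2, y2)

    # BLUEs of (beta11, beta12, beta21, beta22): group means and group slopes.
    blues = (ybar1, ybar2, sxy1 / sxx1, sxy2 / sxx2)

    # Residual sum of squares, then the residual mean square on n1 + n2 - 4 df.
    rss = syy1 + syy2 - sxy1 ** 2 / sxx1 - sxy2 ** 2 / sxx2
    sigma2_hat = rss / (n1 + n2 - 4)

    # Diagonal of (X'X)^{-1}; multiply by sigma^2 for the variances of the BLUEs.
    var_factors = (1 / n1, 1 / n2, 1 / sxx1, 1 / sxx2)
    return blues, sigma2_hat, var_factors


# Made-up data: group 1 lies exactly on a line, group 2 does not.
blues, sigma2_hat, var_factors = fit_two_lines(
    [0, 1, 2, 3], [1, 3, 5, 7], [0, 1, 2, 3], [1, 0, 2, 1]
)
print(blues, sigma2_hat, var_factors)
```

Because $\mathbf{X}^T\mathbf{X}$ is diagonal, no matrix inversion is needed; each estimate depends only on its own group's summary statistics, while $\hat{\sigma}^2$ pools the residuals of both groups.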
(b)
Source                       df                Sum of squares
Intercepts                   2                 $n_1\bar{y}_1^2 + n_2\bar{y}_2^2$
$x_1, x_2$ | Intercepts      2                 $(SXY_1^2/SXX_1) + (SXY_2^2/SXX_2)$
Residual                     $n_1 + n_2 - 4$   By subtraction
Total                        $n_1 + n_2$       $\sum_{i=1}^{n_1} y_{1i}^2 + \sum_{i=1}^{n_2} y_{2i}^2$

The scaled sums of squares are distributed as

\[ \frac{n_1\bar{y}_1^2 + n_2\bar{y}_2^2}{\sigma^2} \sim \chi^2\!\left(2,\ \frac{n_1\beta_{11}^2 + n_2\beta_{12}^2}{2\sigma^2}\right), \]

\[ \frac{(SXY_1^2/SXX_1) + (SXY_2^2/SXX_2)}{\sigma^2} \sim \chi^2\!\left(2,\ \frac{\beta_{21}^2 SXX_1 + \beta_{22}^2 SXX_2}{2\sigma^2}\right), \]

\[ \frac{(n_1 + n_2 - 4)\hat{\sigma}^2}{\sigma^2} \sim \chi^2(n_1 + n_2 - 4). \]

(c) Using the results from part (a), a size-$\xi$ t test of the given hypotheses is to reject $H_0$ if and only if

\[ \frac{|\bar{y}_1 - \bar{y}_2|}{\hat{\sigma}\sqrt{(1/n_1) + (1/n_2)}} \geq t_{\xi/2,\,n_1+n_2-4}. \]
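(d) Using the results from part (a), a size-$\xi$ t test of the given hypotheses is to reject $H_0$ if and only if

\[ \frac{|\hat{\beta}_{21} - \hat{\beta}_{22}|}{\hat{\sigma}\sqrt{(1/SXX_1) + (1/SXX_2)}} \geq t_{\xi/2,\,n_1+n_2-4}. \]

(e) This null hypothesis may be written as $H_0: \mathbf{C}^T\boldsymbol{\beta} = \mathbf{0}$, where

\[ \mathbf{C}^T = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 \end{pmatrix}, \]

and it is clearly testable. Furthermore,

\[ \mathbf{C}^T\hat{\boldsymbol{\beta}} = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 \end{pmatrix} \begin{pmatrix} \bar{y}_1 \\ \bar{y}_2 \\ SXY_1/SXX_1 \\ SXY_2/SXX_2 \end{pmatrix} = \begin{pmatrix} \bar{y}_1 - \bar{y}_2 \\ (SXY_1/SXX_1) - (SXY_2/SXX_2) \end{pmatrix} \]

and

\[ \mathbf{C}^T(\mathbf{X}^T\mathbf{X})^{-}\mathbf{C} = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & -1 \end{pmatrix} [\mathrm{diag}(n_1, n_2, SXX_1, SXX_2)]^{-1} \begin{pmatrix} 1 & 0 \\ -1 & 0 \\ 0 & 1 \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} \frac{1}{n_1} + \frac{1}{n_2} & 0 \\ 0 & \frac{1}{SXX_1} + \frac{1}{SXX_2} \end{pmatrix}. \]

By Theorem 15.2.1, a size-$\xi$ test of this null hypothesis versus the alternative $H_a: \mathbf{C}^T\boldsymbol{\beta} \neq \mathbf{0}$ is to reject the null hypothesis if and only if

\[ \frac{(\mathbf{C}^T\hat{\boldsymbol{\beta}})^T[\mathbf{C}^T(\mathbf{X}^T\mathbf{X})^{-}\mathbf{C}]^{-1}\mathbf{C}^T\hat{\boldsymbol{\beta}}}{2\hat{\sigma}^2} \geq F_{\xi,2,n_1+n_2-4}, \]

i.e., if and only if

\[ \frac{(\bar{y}_1 - \bar{y}_2)^2}{(1/n_1) + (1/n_2)} + \frac{[(SXY_1/SXX_1) - (SXY_2/SXX_2)]^2}{(1/SXX_1) + (1/SXX_2)} \geq 2\hat{\sigma}^2 F_{\xi,2,n_1+n_2-4}. \]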
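The nonmatrix form of the test in part (e) is easy to evaluate from summary statistics. The sketch below assumes the summaries are already available; the helper name `identical_lines_f` and the numbers fed to it are hypothetical. The critical value $F_{\xi,2,n_1+n_2-4}$ is not computed here and would come from a table or a statistics library.

```python
def identical_lines_f(n1, n2, ybar1, ybar2, slope1, slope2, sxx1, sxx2, sigma2_hat):
    """F statistic for H0: equal intercepts and equal slopes (part (e))."""
    # Contribution of the intercept contrast beta11 - beta12.
    intercept_term = (ybar1 - ybar2) ** 2 / (1 / n1 + 1 / n2)
    # Contribution of the slope contrast beta21 - beta22.
    slope_term = (slope1 - slope2) ** 2 / (1 / sxx1 + 1 / sxx2)
    # Divide by s * sigma2_hat with s = 2 contrasts.
    return (intercept_term + slope_term) / (2 * sigma2_hat)


# Hypothetical summaries: n1 = n2 = 4, SXX1 = SXX2 = 5, sigma2_hat = 0.45.
f_stat = identical_lines_f(4, 4, 4.0, 1.0, 2.0, 0.2, 5.0, 5.0, 0.45)
print(f_stat)  # compare with F_{xi, 2, n1 + n2 - 4}
```

The two contrasts enter additively because $\mathbf{C}^T(\mathbf{X}^T\mathbf{X})^{-}\mathbf{C}$ is diagonal, so no 2 × 2 inverse has to be formed.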
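(f) This null hypothesis may be written as $H_0: \mathbf{C}^T\boldsymbol{\beta} = \mathbf{0}$, where

\[ \mathbf{C}^T = \begin{pmatrix} 1 & -1 & c - \bar{x}_1 & \bar{x}_2 - c \\ 0 & 0 & 1 & 1 \end{pmatrix}, \]

and it is clearly testable. Furthermore,

\[ \mathbf{C}^T\hat{\boldsymbol{\beta}} = \begin{pmatrix} 1 & -1 & c - \bar{x}_1 & \bar{x}_2 - c \\ 0 & 0 & 1 & 1 \end{pmatrix} \begin{pmatrix} \bar{y}_1 \\ \bar{y}_2 \\ SXY_1/SXX_1 \\ SXY_2/SXX_2 \end{pmatrix} = \begin{pmatrix} \bar{y}_1 - \bar{y}_2 + (c - \bar{x}_1)(SXY_1/SXX_1) - (c - \bar{x}_2)(SXY_2/SXX_2) \\ (SXY_1/SXX_1) + (SXY_2/SXX_2) \end{pmatrix} \]

and

\[ \mathbf{C}^T(\mathbf{X}^T\mathbf{X})^{-}\mathbf{C} = \begin{pmatrix} 1 & -1 & c - \bar{x}_1 & \bar{x}_2 - c \\ 0 & 0 & 1 & 1 \end{pmatrix} [\mathrm{diag}(n_1, n_2, SXX_1, SXX_2)]^{-1} \begin{pmatrix} 1 & 0 \\ -1 & 0 \\ c - \bar{x}_1 & 1 \\ \bar{x}_2 - c & 1 \end{pmatrix} = \begin{pmatrix} \frac{1}{n_1} + \frac{1}{n_2} + \frac{(c - \bar{x}_1)^2}{SXX_1} + \frac{(c - \bar{x}_2)^2}{SXX_2} & \frac{c - \bar{x}_1}{SXX_1} - \frac{c - \bar{x}_2}{SXX_2} \\ \frac{c - \bar{x}_1}{SXX_1} - \frac{c - \bar{x}_2}{SXX_2} & \frac{1}{SXX_1} + \frac{1}{SXX_2} \end{pmatrix}. \]

By Theorem 15.2.1, a size-$\xi$ test of this null hypothesis versus the alternative $H_a: \mathbf{C}^T\boldsymbol{\beta} \neq \mathbf{0}$ is to reject the null hypothesis if and only if

\[ \frac{(\mathbf{C}^T\hat{\boldsymbol{\beta}})^T[\mathbf{C}^T(\mathbf{X}^T\mathbf{X})^{-}\mathbf{C}]^{-1}\mathbf{C}^T\hat{\boldsymbol{\beta}}}{2\hat{\sigma}^2} \geq F_{\xi,2,n_1+n_2-4}. \]

(g) Define $\tau_1 = \mathbf{c}_1^T\boldsymbol{\beta} + e_{1,n+1}$ and $\tau_2 = \mathbf{c}_2^T\boldsymbol{\beta} + e_{2,n+2}$, where

\[ \mathbf{c}_1^T = (1,\ 0,\ x_{1,n+1} - \bar{x}_1,\ 0), \qquad \mathbf{c}_2^T = (0,\ 1,\ 0,\ x_{2,n+2} - \bar{x}_2). \]

The BLUPs of $\tau_1$ and $\tau_2$ are $\tilde{\tau}_1 = \hat{\beta}_{11} + \hat{\beta}_{21}(x_{1,n+1} - \bar{x}_1)$ and $\tilde{\tau}_2 = \hat{\beta}_{12} + \hat{\beta}_{22}(x_{2,n+2} - \bar{x}_2)$, respectively, and their prediction error variances are

\[ \sigma^2\left[1 + \frac{1}{n_1} + \frac{(x_{1,n+1} - \bar{x}_1)^2}{SXX_1}\right] \equiv \sigma^2 Q_1 \quad \text{and} \quad \sigma^2\left[1 + \frac{1}{n_2} + \frac{(x_{2,n+2} - \bar{x}_2)^2}{SXX_2}\right] \equiv \sigma^2 Q_2. \]

Therefore, a $100(1-\xi)\%$ prediction interval for $\tau_1$ is given by

\[ \hat{\beta}_{11} + \hat{\beta}_{21}(x_{1,n+1} - \bar{x}_1) \pm t_{\xi/2,\,n_1+n_2-4}\,\hat{\sigma}\sqrt{Q_1} \]

and a $100(1-\xi)\%$ prediction interval for $\tau_2$ is given by

\[ \hat{\beta}_{12} + \hat{\beta}_{22}(x_{2,n+2} - \bar{x}_2) \pm t_{\xi/2,\,n_1+n_2-4}\,\hat{\sigma}\sqrt{Q_2}. \]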
Because $\mathrm{cov}[\hat{\beta}_{11} + \hat{\beta}_{21}(x_{1,n+1} - \bar{x}_1) - \tau_1,\ \hat{\beta}_{12} + \hat{\beta}_{22}(x_{2,n+2} - \bar{x}_2) - \tau_2] = 0$, a $100(1-\xi)\%$ prediction ellipsoid for $(\tau_1, \tau_2)$ is given by

\[ \left\{(\tau_1, \tau_2) : \frac{(\tilde{\tau}_1 - \tau_1)^2}{Q_1} + \frac{(\tilde{\tau}_2 - \tau_2)^2}{Q_2} \leq 2\hat{\sigma}^2 F_{\xi,2,n_1+n_2-4}\right\}. \]

(h)
\[ \left\{[\hat{\beta}_{11} + \hat{\beta}_{21}(x - \bar{x}_1)] \pm \sqrt{4F_{\xi,4,n_1+n_2-4}}\ \hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{(x - \bar{x}_1)^2}{SXX_1}} \ \text{ for all } x \in \mathbb{R}\right\} \]

and

\[ \left\{[\hat{\beta}_{12} + \hat{\beta}_{22}(x - \bar{x}_2)] \pm \sqrt{4F_{\xi,4,n_1+n_2-4}}\ \hat{\sigma}\sqrt{\frac{1}{n_2} + \frac{(x - \bar{x}_2)^2}{SXX_2}} \ \text{ for all } x \in \mathbb{R}\right\}. \]

Exercise 9 Consider a situation similar to that described in the previous exercise, except that the two lines are known to be parallel. Thus the model is

\[ y_{1i} = \beta_{11} + \beta_2(x_{1i} - \bar{x}_1) + e_{1i} \quad (i = 1, \ldots, n_1) \]
\[ y_{2i} = \beta_{12} + \beta_2(x_{2i} - \bar{x}_2) + e_{2i} \quad (i = 1, \ldots, n_2), \]

where again the $e_{1i}$'s and $e_{2i}$'s are independent $N(0, \sigma^2)$ random variables, $n_1 \geq 2$, $n_2 \geq 2$, $x_{11} \neq x_{12}$, and $x_{21} \neq x_{22}$. Adopt the same notation as in the previous exercise, but also define the following:

\[ SXX_{12} = SXX_1 + SXX_2, \qquad SXY_{12} = SXY_1 + SXY_2. \]

Observe that this model can be written in matrix notation as the unordered 2-part model

\[ \mathbf{y} = \begin{pmatrix} \mathbf{1}_{n_1} & \mathbf{0}_{n_1} \\ \mathbf{0}_{n_2} & \mathbf{1}_{n_2} \end{pmatrix} \begin{pmatrix} \beta_{11} \\ \beta_{12} \end{pmatrix} + \begin{pmatrix} \mathbf{x}_1 - \bar{x}_1\mathbf{1}_{n_1} \\ \mathbf{x}_2 - \bar{x}_2\mathbf{1}_{n_2} \end{pmatrix} \beta_2 + \mathbf{e}. \]

(a) Give nonmatrix expressions for the BLUEs of $\beta_{11}$, $\beta_{12}$, and $\beta_2$, and for the residual mean square, in terms of the quantities defined above. Also give an expression for the variance–covariance matrix of the BLUEs.
(b) Give the two-part ANOVA table for the ordered two-part model that coincides with the unordered two-part model written above. Give nonmatrix expressions for the degrees of freedom and sums of squares. Determine the distributions of the (suitably scaled) sums of squares in this table, giving nonmatrix expressions for any noncentrality parameters.
(c) Suppose that we wish to estimate the distance, $\delta$, between the lines of the two groups, defined as the amount by which the line for Group 1 exceeds the line for Group 2. Verify that $\delta = \beta_{11} - \beta_{12} - \beta_2(\bar{x}_1 - \bar{x}_2)$ and obtain a $100(1-\xi)\%$ confidence interval for $\delta$.
(d) Obtain a size-$\xi$ t test for $H_0: \delta = 0$ versus $H_a: \delta \neq 0$. (This is a test for identical lines, given parallelism.)
(e) Suppose that enough resources (time, money, etc.) were available to take a total of 20 observations (i.e., $n_1 + n_2 = 20$). From a design standpoint, how would you choose $n_1$, $n_2$, the $x_{1i}$'s and the $x_{2i}$'s in order to minimize the width of the confidence interval in part (c)?
(f) Give a size-$\xi$ test for $H_0: \beta_2 = 0$ versus $H_a: \beta_2 \neq 0$.
(g) Obtain the size-$\xi$ F-test of $H_0: (\delta, \beta_2)^T = \mathbf{0}$ versus $H_a$: not $H_0$.
(h) Let $\xi \in (0, 1)$. Give an expression for a $100(1-\xi)\%$ confidence ellipsoid for $\boldsymbol{\beta} \equiv (\beta_{11}, \beta_{12}, \beta_2)^T$.
(i) Let $x_0$ represent a specified x-value. Give confidence intervals for $\beta_{11} + \beta_2 x_0$ and $\beta_{12} + \beta_2 x_0$ whose simultaneous coverage probability is exactly $1 - \xi$.
(j) Give confidence bands for the two parallel lines whose simultaneous coverage probability (for both lines and for all $x \in \mathbb{R}$) is at least $1 - \xi$.

Solution
(a) $\mathbf{X}^T\mathbf{X} = \mathrm{diag}(n_1, n_2, SXX_{12})$ and $\mathbf{X}^T\mathbf{y} = (n_1\bar{y}_1, n_2\bar{y}_2, SXY_{12})^T$. Therefore, the BLUEs of $\beta_{11}$, $\beta_{12}$, and $\beta_2$ are given by $\bar{y}_1$, $\bar{y}_2$, and $SXY_{12}/SXX_{12}$, respectively. Furthermore,

\[ \hat{\sigma}^2 = \frac{\sum_{i=1}^{n_1}(y_{1i} - \bar{y}_1)^2 + \sum_{i=1}^{n_2}(y_{2i} - \bar{y}_2)^2 - (SXY_{12}^2/SXX_{12})}{n_1 + n_2 - 3} \]

and the variance–covariance matrix of the BLUEs is $\sigma^2\,\mathrm{diag}(1/n_1, 1/n_2, 1/SXX_{12})$.

(b)
Source              df                Sum of squares
Intercepts          2                 $n_1\bar{y}_1^2 + n_2\bar{y}_2^2$
$x$ | Intercepts    1                 $SXY_{12}^2/SXX_{12}$
Residual            $n_1 + n_2 - 3$   By subtraction
Total               $n_1 + n_2$       $\sum_{i=1}^{n_1} y_{1i}^2 + \sum_{i=1}^{n_2} y_{2i}^2$

The scaled sums of squares are distributed as

\[ \frac{n_1\bar{y}_1^2 + n_2\bar{y}_2^2}{\sigma^2} \sim \chi^2\!\left(2,\ \frac{n_1\beta_{11}^2 + n_2\beta_{12}^2}{2\sigma^2}\right), \qquad \frac{SXY_{12}^2}{\sigma^2 SXX_{12}} \sim \chi^2\!\left(1,\ \frac{\beta_2^2 SXX_{12}}{2\sigma^2}\right), \]

and

\[ \frac{(n_1 + n_2 - 3)\hat{\sigma}^2}{\sigma^2} \sim \chi^2(n_1 + n_2 - 3). \]

(c) For any $x \in \mathbb{R}$, $\delta = \beta_{11} + \beta_2(x - \bar{x}_1) - [\beta_{12} + \beta_2(x - \bar{x}_2)] = \beta_{11} - \beta_{12} - \beta_2(\bar{x}_1 - \bar{x}_2)$. The ordinary least squares estimator of $\delta$ is $\hat{\delta} = \hat{\beta}_{11} - \hat{\beta}_{12} - \hat{\beta}_2(\bar{x}_1 - \bar{x}_2)$, and

\[ \hat{\delta} \pm t_{\xi/2,\,n_1+n_2-3}\,\hat{\sigma}\sqrt{(1/n_1) + (1/n_2) + (\bar{x}_1 - \bar{x}_2)^2/SXX_{12}} \]

is a $100(1-\xi)\%$ confidence interval for $\delta$.

(d) Reject $H_0$ if and only if

\[ \frac{|\hat{\delta}|}{\hat{\sigma}\sqrt{(1/n_1) + (1/n_2) + (\bar{x}_1 - \bar{x}_2)^2/SXX_{12}}} > t_{\xi/2,\,n_1+n_2-3}. \]
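The estimator $\hat{\delta}$, the standard-error factor in the interval of part (c), and the t statistic of part (d) can be sketched as follows. This is an illustrative sketch only: the function name `parallel_lines_delta` and the data are made up, and the critical value $t_{\xi/2,n_1+n_2-3}$ is assumed to be supplied from a table or a statistics library.

```python
import math


def parallel_lines_delta(x1, y1, x2, y2):
    """delta_hat, residual mean square, s.e. factor and t statistic (parts (c)-(d))."""
    n1, n2 = len(x1), len(x2)
    xbar1, xbar2 = sum(x1) / n1, sum(x2) / n2
    ybar1, ybar2 = sum(y1) / n1, sum(y2) / n2

    # Pooled sums of squares and cross-products: SXX12 and SXY12.
    sxx12 = sum((x - xbar1) ** 2 for x in x1) + sum((x - xbar2) ** 2 for x in x2)
    sxy12 = sum((x - xbar1) * (y - ybar1) for x, y in zip(x1, y1)) + sum(
        (x - xbar2) * (y - ybar2) for x, y in zip(x2, y2)
    )
    syy = sum((y - ybar1) ** 2 for y in y1) + sum((y - ybar2) ** 2 for y in y2)

    slope = sxy12 / sxx12                       # BLUE of the common slope beta2
    sigma2_hat = (syy - sxy12 ** 2 / sxx12) / (n1 + n2 - 3)
    delta_hat = ybar1 - ybar2 - slope * (xbar1 - xbar2)
    se_factor = math.sqrt(1 / n1 + 1 / n2 + (xbar1 - xbar2) ** 2 / sxx12)
    t_stat = delta_hat / (math.sqrt(sigma2_hat) * se_factor)
    return delta_hat, sigma2_hat, se_factor, t_stat


# Made-up data for two groups of four observations each.
delta_hat, sigma2_hat, se_factor, t_stat = parallel_lines_delta(
    [0, 1, 2, 3], [1, 3, 5, 7], [2, 3, 4, 5], [0, 2, 4, 7]
)
print(delta_hat, sigma2_hat, se_factor, t_stat)
```

The interval is then `delta_hat` plus or minus the critical value times `math.sqrt(sigma2_hat) * se_factor`, and $H_0$ is rejected when `abs(t_stat)` exceeds the critical value.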
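(e) To minimize the width of the confidence interval obtained in part (c) subject to $n_1 + n_2 = 20$, we should minimize both of $(1/n_1) + (1/n_2)$ and $(\bar{x}_1 - \bar{x}_2)^2$; the first of these is minimized at $n_1 = n_2 = 10$, and the latter is minimized when $\bar{x}_1 = \bar{x}_2$.

(f) Reject $H_0$ if and only if

\[ \frac{|\hat{\beta}_2|}{\hat{\sigma}\sqrt{1/SXX_{12}}} > t_{\xi/2,\,n_1+n_2-3}. \]

(g) This null hypothesis may be written as $H_0: \mathbf{C}^T\boldsymbol{\beta} = \mathbf{0}$, where

\[ \mathbf{C}^T = \begin{pmatrix} 1 & -1 & -(\bar{x}_1 - \bar{x}_2) \\ 0 & 0 & 1 \end{pmatrix}, \]

and it is clearly testable. Furthermore,

\[ \mathbf{C}^T\hat{\boldsymbol{\beta}} = \begin{pmatrix} 1 & -1 & -(\bar{x}_1 - \bar{x}_2) \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \bar{y}_1 \\ \bar{y}_2 \\ SXY_{12}/SXX_{12} \end{pmatrix} = \begin{pmatrix} \bar{y}_1 - \bar{y}_2 - (\bar{x}_1 - \bar{x}_2)(SXY_{12}/SXX_{12}) \\ SXY_{12}/SXX_{12} \end{pmatrix} \]

and

\[ \mathbf{C}^T(\mathbf{X}^T\mathbf{X})^{-}\mathbf{C} = \begin{pmatrix} 1 & -1 & -(\bar{x}_1 - \bar{x}_2) \\ 0 & 0 & 1 \end{pmatrix} [\mathrm{diag}(n_1, n_2, SXX_{12})]^{-1} \begin{pmatrix} 1 & 0 \\ -1 & 0 \\ -(\bar{x}_1 - \bar{x}_2) & 1 \end{pmatrix} = \begin{pmatrix} \frac{1}{n_1} + \frac{1}{n_2} + \frac{(\bar{x}_1 - \bar{x}_2)^2}{SXX_{12}} & -\frac{\bar{x}_1 - \bar{x}_2}{SXX_{12}} \\ -\frac{\bar{x}_1 - \bar{x}_2}{SXX_{12}} & \frac{1}{SXX_{12}} \end{pmatrix}. \]

By Theorem 15.2.1, a size-$\xi$ test of this null hypothesis versus the alternative $H_a: \mathbf{C}^T\boldsymbol{\beta} \neq \mathbf{0}$ is to reject the null hypothesis if and only if

\[ \frac{(\mathbf{C}^T\hat{\boldsymbol{\beta}})^T[\mathbf{C}^T(\mathbf{X}^T\mathbf{X})^{-}\mathbf{C}]^{-1}\mathbf{C}^T\hat{\boldsymbol{\beta}}}{2\hat{\sigma}^2} \geq F_{\xi,2,n_1+n_2-3}. \]

(h) By (15.2) and the diagonality of $\mathbf{X}^T\mathbf{X}$, a $100(1-\xi)\%$ confidence ellipsoid for $\boldsymbol{\beta} = (\beta_{11}, \beta_{12}, \beta_2)^T$ is given by

\[ \{(\beta_{11}, \beta_{12}, \beta_2) : (\hat{\beta}_{11} - \beta_{11})^2 n_1 + (\hat{\beta}_{12} - \beta_{12})^2 n_2 + (\hat{\beta}_2 - \beta_2)^2 SXX_{12} \leq 3\hat{\sigma}^2 F_{\xi,3,n_1+n_2-3}\}. \]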
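(i) The $100(1-\xi)\%$ multivariate t simultaneous confidence intervals for $\beta_{11} + \beta_2 x_0$ and $\beta_{12} + \beta_2 x_0$ are given by

\[ (\hat{\beta}_{11} + \hat{\beta}_2 x_0) \pm t_{\xi/2,2,n_1+n_2-3,\mathbf{R}}\ \hat{\sigma}\sqrt{(1/n_1) + x_0^2/SXX_{12}} \]

and

\[ (\hat{\beta}_{12} + \hat{\beta}_2 x_0) \pm t_{\xi/2,2,n_1+n_2-3,\mathbf{R}}\ \hat{\sigma}\sqrt{(1/n_2) + x_0^2/SXX_{12}}, \]

where

\[ \mathbf{R} = \begin{pmatrix} 1 & \dfrac{x_0^2/SXX_{12}}{\sqrt{(1/n_1) + (x_0^2/SXX_{12})}\sqrt{(1/n_2) + (x_0^2/SXX_{12})}} \\ \dfrac{x_0^2/SXX_{12}}{\sqrt{(1/n_1) + (x_0^2/SXX_{12})}\sqrt{(1/n_2) + (x_0^2/SXX_{12})}} & 1 \end{pmatrix}. \]

(j)
\[ \left\{[\hat{\beta}_{11} + \hat{\beta}_2(x - \bar{x}_1)] \pm \sqrt{3F_{\xi,3,n_1+n_2-3}}\ \hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{(x - \bar{x}_1)^2}{SXX_{12}}} \ \text{ for all } x \in \mathbb{R}\right\} \]

and

\[ \left\{[\hat{\beta}_{12} + \hat{\beta}_2(x - \bar{x}_2)] \pm \sqrt{3F_{\xi,3,n_1+n_2-3}}\ \hat{\sigma}\sqrt{\frac{1}{n_2} + \frac{(x - \bar{x}_2)^2}{SXX_{12}}} \ \text{ for all } x \in \mathbb{R}\right\}. \]

Exercise 10 Consider the normal Gauss–Markov two-way main effects model with exactly one observation per cell, which has model equation

\[ y_{ij} = \mu + \alpha_i + \gamma_j + e_{ij} \quad (i = 1, \ldots, q;\ j = 1, \ldots, m). \]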
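Define

\[ f = \sum_{i=1}^{q}\sum_{j=1}^{m} b_{ij}(y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot}) = \mathbf{b}^T(\mathbf{I} - \mathbf{P}_X)\mathbf{y}, \]

where $\{b_{ij}\}$ is a set of real numbers, not all equal to 0, such that $\sum_{i=1}^{q} b_{ij} = 0$ for all $j$ and $\sum_{j=1}^{m} b_{ij} = 0$ for all $i$.

(a) Show that

\[ \frac{f^2}{\sigma^2\mathbf{b}^T\mathbf{b}} \quad \text{and} \quad (1/\sigma^2)\sum_{i=1}^{q}\sum_{j=1}^{m}(y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot})^2 - \frac{f^2}{\sigma^2\mathbf{b}^T\mathbf{b}} \]

are jointly distributed as independent $\chi^2(1)$ and $\chi^2(qm - q - m)$ random variables under this model.
(b) Let $\mathbf{y}_I$ represent the $q$-vector whose $i$th element is $\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}$, let $\mathbf{y}_{II}$ represent the $m$-vector whose $j$th element is $\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot}$, and let $\mathbf{y}_{III}$ represent the $qm$-vector whose $ij$th element is $y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot}$. Show that $\mathbf{y}_I$, $\mathbf{y}_{II}$, and $\mathbf{y}_{III}$ are independent under this model.
(c) Now suppose that we condition on $\mathbf{y}_I$ and $\mathbf{y}_{II}$, and consider the expansion of the model to

\[ y_{ij} = \mu + \alpha_i + \gamma_j + \lambda(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot}) + e_{ij} \quad (i = 1, \ldots, q;\ j = 1, \ldots, m). \]

Using part (a), show that conditional on $\mathbf{y}_I$ and $\mathbf{y}_{II}$,

\[ \frac{\left[\sum_{i=1}^{q}\sum_{j=1}^{m}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})y_{ij}\right]^2}{\sigma^2\sum_{i=1}^{q}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2\sum_{j=1}^{m}(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})^2} \]

and

\[ (1/\sigma^2)\sum_{i=1}^{q}\sum_{j=1}^{m}(y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot})^2 - \frac{\left[\sum_{i=1}^{q}\sum_{j=1}^{m}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})y_{ij}\right]^2}{\sigma^2\sum_{i=1}^{q}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2\sum_{j=1}^{m}(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})^2} \]

are jointly distributed as independent $\chi^2(1, \nu)$ and $\chi^2(qm - q - m)$ random variables, and determine $\nu$. [Hint: First observe that $\sum_{i=1}^{q}\sum_{j=1}^{m}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})(y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot}) = \sum_{i=1}^{q}\sum_{j=1}^{m}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})y_{ij}$.]
(d) Consider testing $H_0: \lambda = 0$ versus $H_a: \lambda \neq 0$ in the conditional model specified in part (c). Show that the noncentrality parameter of the chi-square distribution of the first quantity displayed above is equal to 0 under $H_0$, and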
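then use parts (b) and (c) to determine the unconditional joint distribution of both quantities displayed above under $H_0$.
(e) Using part (d), obtain a size-$\xi$ test of $H_0: \lambda = 0$ versus $H_a: \lambda \neq 0$ in the unconditional statistical model (not a linear model according to our definition) given by

\[ y_{ij} = \mu + \alpha_i + \gamma_j + \lambda(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot}) + e_{ij}, \]

where the $e_{ij}$'s are independent $N(0, \sigma^2)$ random variables.

Note: A nonzero $\lambda$ in the model defined in part (e) implies that the effects of the two crossed factors are not additive. Consequently, the test ultimately obtained in part (e) tests for nonadditivity of a particular form. The test and its derivation (which is essentially equivalent to the steps in this exercise) are due to Tukey (1949), and the test is commonly referred to as "Tukey's one-degree-of-freedom test for nonadditivity."

Solution
(a) By the given properties of $\mathbf{b}$, we obtain

\[ \mathbf{b}^T\mathbf{X} = \mathbf{b}^T(\mathbf{1}_{qm},\ \mathbf{I}_q \otimes \mathbf{1}_m,\ \mathbf{1}_q \otimes \mathbf{I}_m) = (b_{\cdot\cdot}, b_{1\cdot}, b_{2\cdot}, \ldots, b_{q\cdot}, b_{\cdot 1}, b_{\cdot 2}, \ldots, b_{\cdot m}) = \mathbf{0}^T_{1+q+m}. \]

Thus, $\mathbf{P}_X\mathbf{b} = \mathbf{0}$ and $f = \mathbf{b}^T\mathbf{y}$. Now, the two quantities whose joint distribution we seek may be written as $\mathbf{y}^T\mathbf{A}_1\mathbf{y}/\sigma^2$ and $\mathbf{y}^T\mathbf{A}_2\mathbf{y}/\sigma^2$, where

\[ \mathbf{A}_1 = \frac{1}{\mathbf{b}^T\mathbf{b}}\mathbf{b}\mathbf{b}^T, \qquad \mathbf{A}_2 = \mathbf{I} - \mathbf{P}_X - \frac{1}{\mathbf{b}^T\mathbf{b}}\mathbf{b}\mathbf{b}^T. \]

Observe that $\mathbf{A}_1$ and $\mathbf{A}_2$ sum to $\mathbf{I} - \mathbf{P}_X$, which is idempotent; that $\mathbf{A}_1$ is obviously idempotent; and that $\mathbf{A}_2$ is idempotent because

\[ \left(\frac{1}{\mathbf{b}^T\mathbf{b}}\mathbf{b}\mathbf{b}^T\right)(\mathbf{I} - \mathbf{P}_X) = \frac{1}{\mathbf{b}^T\mathbf{b}}\mathbf{b}\mathbf{b}^T. \]

Therefore, by Theorem 14.4.1 (Cochran's theorem), the two quantities of interest are independent; the distribution of the first quantity is chi-square with one degree of freedom and noncentrality parameter

\[ (1/2\sigma^2)\boldsymbol{\beta}^T\mathbf{X}^T\left(\frac{1}{\mathbf{b}^T\mathbf{b}}\mathbf{b}\mathbf{b}^T\right)\mathbf{X}\boldsymbol{\beta} = 0; \]

and the distribution of the second quantity is chi-square with $qm - q - m$ degrees of freedom and noncentrality parameter

\[ (1/2\sigma^2)\boldsymbol{\beta}^T\mathbf{X}^T\left(\mathbf{I} - \mathbf{P}_X - \frac{1}{\mathbf{b}^T\mathbf{b}}\mathbf{b}\mathbf{b}^T\right)\mathbf{X}\boldsymbol{\beta} = 0. \]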
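(b) The independence of $\mathbf{y}_I$ and $\mathbf{y}_{II}$ will be demonstrated; independence of the other two pairs of vectors is shown very similarly. Observe that

\[ \mathbf{y}_I = [(\mathbf{I}_q \otimes (1/m)\mathbf{J}_m) - ((1/q)\mathbf{J}_q \otimes (1/m)\mathbf{J}_m)]\mathbf{y}, \qquad \mathbf{y}_{II} = [((1/q)\mathbf{J}_q \otimes \mathbf{I}_m) - ((1/q)\mathbf{J}_q \otimes (1/m)\mathbf{J}_m)]\mathbf{y}. \]

Also observe that

\[ [(\mathbf{I}_q \otimes (1/m)\mathbf{J}_m) - ((1/q)\mathbf{J}_q \otimes (1/m)\mathbf{J}_m)][((1/q)\mathbf{J}_q \otimes \mathbf{I}_m) - ((1/q)\mathbf{J}_q \otimes (1/m)\mathbf{J}_m)] \]
\[ = ((1/q)\mathbf{J}_q \otimes (1/m)\mathbf{J}_m) - ((1/q)\mathbf{J}_q \otimes (1/m)\mathbf{J}_m) - ((1/q)\mathbf{J}_q \otimes (1/m)\mathbf{J}_m) + ((1/q)\mathbf{J}_q \otimes (1/m)\mathbf{J}_m) = \mathbf{0}. \]

By Corollary 14.1.8.1, $\mathbf{y}_I$ and $\mathbf{y}_{II}$ are independent.

(c) The equality in the hint is satisfied because

\[ \sum_{i=1}^{q}\sum_{j=1}^{m}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})\bar{y}_{i\cdot} = \sum_{i=1}^{q}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})\bar{y}_{i\cdot}\sum_{j=1}^{m}(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot}) = 0, \]
\[ \sum_{i=1}^{q}\sum_{j=1}^{m}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})\bar{y}_{\cdot j} = \sum_{j=1}^{m}(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})\bar{y}_{\cdot j}\sum_{i=1}^{q}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}) = 0, \]
\[ \sum_{i=1}^{q}\sum_{j=1}^{m}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})\bar{y}_{\cdot\cdot} = \bar{y}_{\cdot\cdot}\sum_{i=1}^{q}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})\sum_{j=1}^{m}(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot}) = 0. \]

Similarly,

\[ \sum_{i=1}^{q}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot}) = (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})\sum_{i=1}^{q}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}) = 0 \quad \text{for all } j \]

and

\[ \sum_{j=1}^{m}(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot}) = (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})\sum_{j=1}^{m}(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot}) = 0 \quad \text{for all } i, \]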
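so that $(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})$ satisfies the requirements of the $b_{ij}$'s specified in part (a) (the condition that they are not all equal to 0 is satisfied with probability one). Moreover, by the independence established in part (b), the conditional distribution of $\mathbf{y}_{III}$, given $\mathbf{y}_I$ and $\mathbf{y}_{II}$, is identical to its unconditional distribution. Thus, by part (a), the conditional distributions, given $\mathbf{y}_I$ and $\mathbf{y}_{II}$, of the two quantities we seek are independent chi-square distributions with the same degrees of freedom as specified in part (a), but possibly different noncentrality parameters [because the mean structure of this model is different from that of the model in part (a)]. To obtain the new noncentrality parameters, let us write $\mathbf{b} = \mathbf{y}_I \otimes \mathbf{y}_{II}$. Now let $\mathbf{P}_X$ represent the orthogonal projection matrix onto the column space of this model, and let $\mathbf{P}_{X_{ME}}$ represent the orthogonal projection matrix onto the column space of the main effects model considered in parts (a) and (b). For the current model, which is conditioned on $\mathbf{y}_I$ and $\mathbf{y}_{II}$, $\mathbf{X} = (\mathbf{1}_{qm},\ \mathbf{I}_q \otimes \mathbf{1}_m,\ \mathbf{1}_q \otimes \mathbf{I}_m,\ \mathbf{b})$. It was shown in the solution to part (a) that the last column of this matrix is orthogonal to the remaining columns, so

\[ \mathbf{P}_X = \mathbf{P}_{X_{ME}} + \frac{1}{\mathbf{b}^T\mathbf{b}}\mathbf{b}\mathbf{b}^T. \]

Thus $\mathbf{b}^T\mathbf{X} = (\mathbf{b}^T\mathbf{X}_{ME},\ \mathbf{b}^T\mathbf{b}) = (\mathbf{0}^T_{1+q+m},\ \mathbf{b}^T\mathbf{b})$, yielding

\[ \nu = (1/2\sigma^2)\boldsymbol{\beta}^T\mathbf{X}^T\left(\frac{1}{\mathbf{b}^T\mathbf{b}}\mathbf{b}\mathbf{b}^T\right)\mathbf{X}\boldsymbol{\beta} = (1/2\sigma^2)\lambda^2\mathbf{b}^T\mathbf{b}. \]

The noncentrality parameter of the other chi-square distribution is

\[ (1/2\sigma^2)\boldsymbol{\beta}^T\mathbf{X}^T\left(\mathbf{P}_X - \mathbf{P}_{X_{ME}} - \frac{1}{\mathbf{b}^T\mathbf{b}}\mathbf{b}\mathbf{b}^T\right)\mathbf{X}\boldsymbol{\beta} = (1/2\sigma^2)\boldsymbol{\beta}^T\mathbf{X}^T\left(\frac{1}{\mathbf{b}^T\mathbf{b}}\mathbf{b}\mathbf{b}^T - \frac{1}{\mathbf{b}^T\mathbf{b}}\mathbf{b}\mathbf{b}^T\right)\mathbf{X}\boldsymbol{\beta} = 0. \]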
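The two quantities of part (c), without the unknown factor $1/\sigma^2$, can be computed directly from a $q \times m$ table. The sketch below is illustrative only: the function name `tukey_nonadditivity` and the data are made up. It also checks the identity in the hint numerically and forms a ratio in which the second quantity is divided by its degrees of freedom, $qm - q - m$.

```python
def tukey_nonadditivity(y):
    """Sums of squares behind Tukey's one-degree-of-freedom test for a q x m table."""
    q, m = len(y), len(y[0])
    grand = sum(sum(row) for row in y) / (q * m)
    # Row and column mean deviations: the elements of y_I and y_II.
    row_dev = [sum(y[i]) / m - grand for i in range(q)]
    col_dev = [sum(y[i][j] for i in range(q)) / q - grand for j in range(m)]
    # Interaction residuals y_ij - ybar_i. - ybar_.j + ybar_.. (elements of y_III).
    resid = [[y[i][j] - grand - row_dev[i] - col_dev[j] for j in range(m)]
             for i in range(q)]
    # b_ij = (ybar_i. - ybar_..)(ybar_.j - ybar_..), i.e. b = y_I (x) y_II.
    b = [[row_dev[i] * col_dev[j] for j in range(m)] for i in range(q)]

    cells = [(i, j) for i in range(q) for j in range(m)]
    f_from_resid = sum(b[i][j] * resid[i][j] for i, j in cells)
    f_from_raw = sum(b[i][j] * y[i][j] for i, j in cells)  # equal, by the hint
    btb = sum(b[i][j] ** 2 for i, j in cells)

    ss_nonadd = f_from_raw ** 2 / btb                       # first quantity times sigma^2
    ss_remainder = sum(resid[i][j] ** 2 for i, j in cells) - ss_nonadd
    df_remainder = q * m - q - m
    f_ratio = ss_nonadd / (ss_remainder / df_remainder)
    return f_from_resid, f_from_raw, ss_nonadd, ss_remainder, f_ratio


# Made-up 3 x 3 table with visibly non-additive cell values.
table = [[1.0, 2.0, 4.0], [2.0, 4.0, 7.0], [3.0, 7.0, 13.0]]
f_resid, f_raw, ss_nonadd, ss_remainder, f_ratio = tukey_nonadditivity(table)
print(f_resid, f_raw, ss_nonadd, ss_remainder, f_ratio)
```

Because $\sigma^2$ cancels in the ratio, `f_ratio` is a statistic; under $H_0: \lambda = 0$ it would be compared with $F_{\xi,1,qm-q-m}$, obtained from a table or a statistics library.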
(d) Under $H_0$, $\nu = 0$ and the joint conditional distribution of the two quantities displayed in part (c) does not depend on the values conditioned upon. Thus, under $H_0$, the joint unconditional distribution of those quantities is the same as their joint conditional distribution.

(e) Let $F$ denote $(qm - q - m)$ times the ratio of the first quantity to the second quantity, and observe that $\sigma^2$ cancels in this ratio so that $F$ is a statistic. By part (d), under $H_0$ the unconditional distribution of $F$ is $F(1, qm - q - m)$. Thus, a size-$\xi$ test of $H_0$ versus $H_a$ is to reject $H_0$ if and only if $F > F_{\xi,1,qm-q-m}$.

Exercise 11 Prove Facts 2, 3, and 4 listed in Sect. 15.3.

Solution Define $E_2 = \{\omega : A(\omega) \leq \tau(\omega) \leq B(\omega)\}$ and $F_2 = \{\omega : c^+A(\omega) + c^-B(\omega) \leq c\tau(\omega) \leq c^+B(\omega) + c^-A(\omega) \text{ for all } c \in \mathbb{R}\}$. If $\omega \in E_2$, then $A(\omega) \leq \tau(\omega) \leq B(\omega)$ and multiplication by any real number $c$ yields $cA(\omega) \leq c\tau(\omega) \leq cB(\omega)$ if $c \geq 0$ and $cB(\omega) \leq c\tau(\omega) \leq cA(\omega)$ otherwise, or equivalently, $c^+A(\omega) + c^-B(\omega) \leq c\tau(\omega) \leq c^+B(\omega) + c^-A(\omega)$. Thus $\omega \in F_2$. Conversely, if $\omega \in F_2$, then consideration of the single interval corresponding to $c = 1$ reveals that $\omega \in E_2$. Thus $E_2 = F_2$, hence $\Pr(E_2) = \Pr(F_2)$. This proves Fact 2.

Define $E_3 = \{\omega : A_i(\omega) \leq \tau_i(\omega) \leq B_i(\omega) \text{ for } i = 1, \ldots, k\}$ and $F_3 = \{\omega : \sum_{i=1}^{k} A_i(\omega) \leq \sum_{i=1}^{k}\tau_i(\omega) \leq \sum_{i=1}^{k} B_i(\omega)\}$. If $\omega \in E_3$, then $A_i(\omega) \leq \tau_i(\omega) \leq B_i(\omega)$ for $i = 1, \ldots, k$, and by summing these inequalities we obtain the interval defined in $F_3$. Thus $\omega \in F_3$, implying that $E_3 \subseteq F_3$, implying further that $\Pr(E_3) \leq \Pr(F_3)$. This proves Fact 3.

Define $F_4 = \{\omega : \sum_{i=1}^{k}[c_i^+A_i(\omega) + c_i^-B_i(\omega)] \leq \sum_{i=1}^{k} c_i\tau_i(\omega) \leq \sum_{i=1}^{k}[c_i^+B_i(\omega) + c_i^-A_i(\omega)] \text{ for all } c_i \in \mathbb{R}\}$. If $\omega \in E_3$, then $A_i(\omega) \leq \tau_i(\omega) \leq B_i(\omega)$ for $i = 1, \ldots, k$, and multiplication of the $i$th of these intervals by the real number $c_i$ and summing yields $\sum_{i=1}^{k}[c_i^+A_i(\omega) + c_i^-B_i(\omega)] \leq \sum_{i=1}^{k} c_i\tau_i(\omega) \leq \sum_{i=1}^{k}[c_i^+B_i(\omega) + c_i^-A_i(\omega)]$. Thus $\omega \in F_4$. Conversely, if $\omega \in F_4$, then consideration of the $k$ intervals corresponding to $\{c_1, \ldots, c_k\} = \{1, 0, \ldots, 0\}$, $\{c_1, \ldots, c_k\} = \{0, 1, \ldots, 0\}$, ..., $\{c_1, \ldots, c_k\} = \{0, 0, \ldots, 1\}$ reveals that $\omega \in E_3$. Thus $E_3 = F_4$, hence $\Pr(E_3) = \Pr(F_4)$. This proves Fact 4.

Exercise 12 Prove (15.13), i.e., prove that for any $\xi \in (0, 1)$ and any $s^* > 0$, $\nu > 0$, and $l > 0$,

\[ s^*F_{\xi,s^*,\nu} < (s^* + l)F_{\xi,s^*+l,\nu}. \]

[Hint: Consider three independent random variables $U \sim \chi^2(s^*)$, $V \sim \chi^2(l)$, and $W \sim \chi^2(\nu)$.]

Solution Let $U \sim \chi^2(s^*)$, $V \sim \chi^2(l)$, and $W \sim \chi^2(\nu)$ be independent. Then by Definition 14.5.1,

\[ \frac{U/s^*}{W/\nu} \sim F(s^*, \nu) \quad \text{and} \quad \frac{(U + V)/(s^* + l)}{W/\nu} \sim F(s^* + l, \nu), \]

implying that

\[ \frac{\nu U}{W} \sim s^*F(s^*, \nu) \quad \text{and} \quad \frac{\nu(U + V)}{W} \sim (s^* + l)F(s^* + l, \nu). \]

The latter distributional result implies that

\[ \Pr\left[\frac{\nu(U + V)}{W} \leq (s^* + l)F_{\xi,s^*+l,\nu}\right] = 1 - \xi. \]

Now observe that

\[ \frac{\nu U}{W} \leq \frac{\nu U}{W} + \frac{\nu V}{W} = \frac{\nu(U + V)}{W}, \]

where the inequality holds because $\nu V/W > 0$ with probability one. Therefore,

\[ \Pr\left[\frac{\nu(U + V)}{W} \leq s^*F_{\xi,s^*,\nu}\right] < \Pr\left[\frac{\nu U}{W} \leq s^*F_{\xi,s^*,\nu}\right] = 1 - \xi. \]

Therefore, it must be the case that $s^*F_{\xi,s^*,\nu} < (s^* + l)F_{\xi,s^*+l,\nu}$.

Exercise 13 Prove (15.14), i.e., prove that for any $\xi \in (0, 1)$ and any $s > 0$ and $\nu > 0$,

\[ t_{\xi/2,s,\nu,\mathbf{I}} \leq \sqrt{sF_{\xi,s,\nu}}. \]
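Solution Let $\mathbf{x} = (x_i) \sim N_s(\mathbf{0}, \mathbf{I})$ and $w \sim \chi^2(\nu)$, and suppose that $\mathbf{x}$ and $w$ are independent. Define $t_i = x_i/\sqrt{w/\nu}$ and $\mathbf{t} = (t_i)$. Then $\mathbf{t} \sim t(s, \nu, \mathbf{I})$. Also, $\sum_{i=1}^{s} x_i^2 \sim \chi^2(s)$, and $\sum_{i=1}^{s} x_i^2$ and $w$ are independent, implying that

\[ \frac{\sum_{i=1}^{s} x_i^2/s}{w/\nu} \sim F(s, \nu), \quad \text{or equivalently} \quad \sum_{i=1}^{s} t_i^2 = \frac{\sum_{i=1}^{s} x_i^2}{w/\nu} \sim sF(s, \nu). \]

Thus,

\[ \Pr\left(\sum_{i=1}^{s} t_i^2 \leq sF_{\xi,s,\nu}\right) = 1 - \xi. \]

But because $\max_i t_i^2 \leq \sum_{i=1}^{s} t_i^2$,

\[ \Pr\left(\sum_{i=1}^{s} t_i^2 \leq t^2_{\xi/2,s,\nu,\mathbf{I}}\right) \leq \Pr\left(\max_i t_i^2 \leq t^2_{\xi/2,s,\nu,\mathbf{I}}\right) = 1 - \xi. \]

Therefore, it must be the case that $t^2_{\xi/2,s,\nu,\mathbf{I}} \leq sF_{\xi,s,\nu}$, or equivalently $t_{\xi/2,s,\nu,\mathbf{I}} \leq \sqrt{sF_{\xi,s,\nu}}$.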
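The pathwise inequality $\max_i t_i^2 \leq \sum_i t_i^2$ that drives this proof can be illustrated by simulation. The following is only a Monte Carlo sketch under assumed settings ($s = 3$, $\nu = 10$, 20000 draws, a fixed seed); the helper names are hypothetical, and the empirical quantiles are rough stand-ins for $t^2_{\xi/2,s,\nu,\mathbf{I}}$ and $sF_{\xi,s,\nu}$, not exact values.

```python
import random


def simulate_max_and_sum(s=3, nu=10, n_draws=20000, seed=0):
    """Draw t = x / sqrt(w / nu) and record max_i t_i^2 and sum_i t_i^2."""
    rng = random.Random(seed)
    max_sq, sum_sq = [], []
    for _ in range(n_draws):
        x = [rng.gauss(0.0, 1.0) for _ in range(s)]              # x ~ N_s(0, I)
        w = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))     # w ~ chi-square(nu)
        t_sq = [xi * xi / (w / nu) for xi in x]
        max_sq.append(max(t_sq))
        sum_sq.append(sum(t_sq))
    return max_sq, sum_sq


def empirical_quantile(values, p):
    """Simple order-statistic estimate of the p-th quantile."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(p * len(ordered)))]


xi = 0.05
max_sq, sum_sq = simulate_max_and_sum()
q_max = empirical_quantile(max_sq, 1 - xi)   # estimates t^2_{xi/2, s, nu, I}
q_sum = empirical_quantile(sum_sq, 1 - xi)   # estimates s * F_{xi, s, nu}
print(q_max, q_sum)
```

Since every simulated maximum is no larger than the corresponding sum, the empirical quantiles are ordered in the same way as the exact critical values in (15.14).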
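Exercise 14 Prove Theorem 15.3.7: Under the normal prediction-extended positive definite Aitken model

\[ \left\{\begin{pmatrix} \mathbf{y} \\ \mathbf{u} \end{pmatrix},\ \begin{pmatrix} \mathbf{X}\boldsymbol{\beta} \\ \mathbf{0} \end{pmatrix},\ \sigma^2\begin{pmatrix} \mathbf{W} & \mathbf{K} \\ \mathbf{K}^T & \mathbf{H} \end{pmatrix}\right\} \]

with $n > p^*$, if $\mathbf{C}^T\boldsymbol{\beta} + \mathbf{u}$ is an $s$-vector of predictable functions and $\mathbf{Q}$ is positive definite (as is the case under either condition of Theorem 13.2.2b), then the probability of simultaneous coverage of the infinite collection of intervals for $\{\mathbf{a}^T(\mathbf{C}^T\boldsymbol{\beta} + \mathbf{u}) : \mathbf{a} \in \mathbb{R}^s\}$ given by

\[ \mathbf{a}^T(\mathbf{C}^T\tilde{\boldsymbol{\beta}} + \tilde{\mathbf{u}}) \pm \sqrt{sF_{\xi,s,n-p^*}\,\tilde{\sigma}^2(\mathbf{a}^T\mathbf{Q}\mathbf{a})} \quad \text{for all } \mathbf{a} \]

is $1 - \xi$.

Solution By Corollary 2.16.1.1,

\[ \frac{[(\mathbf{C}^T\tilde{\boldsymbol{\beta}} + \tilde{\mathbf{u}}) - (\mathbf{C}^T\boldsymbol{\beta} + \mathbf{u})]^T\mathbf{Q}^{-1}[(\mathbf{C}^T\tilde{\boldsymbol{\beta}} + \tilde{\mathbf{u}}) - (\mathbf{C}^T\boldsymbol{\beta} + \mathbf{u})]}{s\tilde{\sigma}^2} = \frac{1}{s\tilde{\sigma}^2}\max_{\mathbf{a} \neq \mathbf{0}}\frac{\{\mathbf{a}^T[(\mathbf{C}^T\tilde{\boldsymbol{\beta}} + \tilde{\mathbf{u}}) - (\mathbf{C}^T\boldsymbol{\beta} + \mathbf{u})]\}^2}{\mathbf{a}^T\mathbf{Q}\mathbf{a}}. \]

Thus by Theorem 14.5.5,

\[ 1 - \xi = \Pr\left(\frac{[(\mathbf{C}^T\tilde{\boldsymbol{\beta}} + \tilde{\mathbf{u}}) - (\mathbf{C}^T\boldsymbol{\beta} + \mathbf{u})]^T\mathbf{Q}^{-1}[(\mathbf{C}^T\tilde{\boldsymbol{\beta}} + \tilde{\mathbf{u}}) - (\mathbf{C}^T\boldsymbol{\beta} + \mathbf{u})]}{s\tilde{\sigma}^2} \leq F_{\xi,s,n-p^*}\right) \]
\[ = \Pr\left(\frac{1}{s\tilde{\sigma}^2}\frac{\{\mathbf{a}^T[(\mathbf{C}^T\tilde{\boldsymbol{\beta}} + \tilde{\mathbf{u}}) - (\mathbf{C}^T\boldsymbol{\beta} + \mathbf{u})]\}^2}{\mathbf{a}^T\mathbf{Q}\mathbf{a}} \leq F_{\xi,s,n-p^*} \text{ for all } \mathbf{a} \neq \mathbf{0}\right) \]
\[ = \Pr\left(\mathbf{a}^T(\mathbf{C}^T\tilde{\boldsymbol{\beta}} + \tilde{\mathbf{u}}) - \sqrt{sF_{\xi,s,n-p^*}\,\tilde{\sigma}^2(\mathbf{a}^T\mathbf{Q}\mathbf{a})} \leq \mathbf{a}^T(\mathbf{C}^T\boldsymbol{\beta} + \mathbf{u}) \leq \mathbf{a}^T(\mathbf{C}^T\tilde{\boldsymbol{\beta}} + \tilde{\mathbf{u}}) + \sqrt{sF_{\xi,s,n-p^*}\,\tilde{\sigma}^2(\mathbf{a}^T\mathbf{Q}\mathbf{a})} \text{ for all } \mathbf{a} \in \mathbb{R}^s\right). \]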
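Exercise 15 Consider the mixed linear model for two observations specified in Exercise 13.18, and suppose that the joint distribution of $b$, $d_1$, and $d_2$ is multivariate normal. Let $0 < \xi < 1$.
(a) Obtain a $100(1-\xi)\%$ confidence interval for $\beta$.
(b) Obtain a $100(1-\xi)\%$ prediction interval for $b$.
(c) Obtain intervals for $\beta$ and $b$ whose simultaneous coverage probability is exactly $1 - \xi$.

Solution
(a) By (15.6) in Theorem 15.1.2, a $100(1-\xi)\%$ confidence interval for $\beta$ is given by

\[ \tilde{\beta} \pm t_{\xi/2,1}\,\tilde{\sigma}\sqrt{Q_\beta}, \]

where (by Theorem 13.4.4)

\[ Q_\beta = \begin{pmatrix} 1 & 0 \end{pmatrix}\begin{pmatrix} 5 & 5 \\ 5 & 12 \end{pmatrix}^{-1}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \frac{12}{35}. \]

(b) Again by (15.6) in Theorem 15.1.2, a $100(1-\xi)\%$ prediction interval for $b$ is given by

\[ \tilde{b} \pm t_{\xi/2,1}\,\tilde{\sigma}\sqrt{Q_b}, \]

where (by Theorem 13.4.4)

\[ Q_b = \begin{pmatrix} 0 & 1 \end{pmatrix}\begin{pmatrix} 5 & 5 \\ 5 & 12 \end{pmatrix}^{-1}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \frac{1}{7}. \]

(c) By the discussion immediately following (15.15), $100(1-\xi)\%$ multivariate t simultaneous confidence/prediction intervals for $\beta$ and $b$ are given by

\[ \tilde{\beta} \pm t_{\xi/2,2,1,\mathbf{R}}\,\tilde{\sigma}\sqrt{\frac{12}{35}} \quad \text{and} \quad \tilde{b} \pm t_{\xi/2,2,1,\mathbf{R}}\,\tilde{\sigma}\sqrt{\frac{1}{7}}, \]

where, using Theorem 13.4.4 once more,

\[ \mathbf{R} = \begin{pmatrix} 1 & \mathrm{corr}(\tilde{\beta}, \tilde{b} - b) \\ \mathrm{corr}(\tilde{\beta}, \tilde{b} - b) & 1 \end{pmatrix} \quad \text{and} \quad \mathrm{corr}(\tilde{\beta}, \tilde{b} - b) = \frac{-5/35}{\sqrt{(12/35)(1/7)}} = -\sqrt{\frac{5}{12}}. \]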
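The arithmetic behind $Q_\beta$, $Q_b$, and the correlation can be checked exactly with rational arithmetic. This sketch assumes that the 2 × 2 matrix being inverted has rows (5, 5) and (5, 12), as read from the displays above; that reading is consistent with the stated values 12/35, 1/7, and −5/35.

```python
import math
from fractions import Fraction

# Assumed matrix from the displays above (see the lead-in for the caveat).
m = [[Fraction(5), Fraction(5)], [Fraction(5), Fraction(12)]]

# Closed-form inverse of a 2 x 2 matrix.
det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
inv = [[m[1][1] / det, -m[0][1] / det],
       [-m[1][0] / det, m[0][0] / det]]

q_beta = inv[0][0]     # (1, 0) M^{-1} (1, 0)^T
q_b = inv[1][1]        # (0, 1) M^{-1} (0, 1)^T
q_cross = inv[0][1]    # off-diagonal element, the covariance factor

corr = float(q_cross) / math.sqrt(float(q_beta * q_b))
print(q_beta, q_b, q_cross, corr)
```

The printed fractions are exact, so any discrepancy with the values in the text would point to a misread matrix entry rather than rounding.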
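Exercise 16 Generalize all four types of simultaneous confidence intervals presented in Sect. 15.3 (Bonferroni, Scheffé, multivariate t, Tukey) to be suitable for use under a normal positive definite Aitken model.

Solution Let $\mathbf{M} = (m_{ii'}) = \mathbf{C}^T(\mathbf{X}^T\mathbf{W}^{-1}\mathbf{X})^{-}\mathbf{C}$. By Theorem 15.1.1, the $100(1-\xi)\%$ Bonferroni intervals and the $100(1-\xi)\%$ Scheffé intervals are

\[ \mathbf{c}_i^T\tilde{\boldsymbol{\beta}} \pm t_{\xi/(2s),n-p^*}\,\tilde{\sigma}\sqrt{m_{ii}} \quad (i = 1, \ldots, s) \]

and

\[ \mathbf{c}^T\tilde{\boldsymbol{\beta}} \pm \sqrt{s^*F_{\xi,s^*,n-p^*}}\ \tilde{\sigma}\sqrt{\mathbf{c}^T(\mathbf{X}^T\mathbf{W}^{-1}\mathbf{X})^{-}\mathbf{c}} \quad \text{for all } \mathbf{c}^T \in \mathcal{R}(\mathbf{C}^T). \]

Because the vector with $i$th element

\[ \frac{\mathbf{c}_i^T\tilde{\boldsymbol{\beta}} - \mathbf{c}_i^T\boldsymbol{\beta}}{\tilde{\sigma}\sqrt{m_{ii}}} = \frac{(\mathbf{c}_i^T\tilde{\boldsymbol{\beta}} - \mathbf{c}_i^T\boldsymbol{\beta})/(\sigma\sqrt{m_{ii}})}{\sqrt{\dfrac{(n-p^*)\tilde{\sigma}^2/\sigma^2}{n-p^*}}} \]

is distributed as $t(s, n-p^*, \mathbf{R})$, where

\[ \mathbf{R} = \mathrm{var}\left(\frac{\mathbf{c}_1^T\tilde{\boldsymbol{\beta}}}{\sigma\sqrt{m_{11}}}, \ldots, \frac{\mathbf{c}_s^T\tilde{\boldsymbol{\beta}}}{\sigma\sqrt{m_{ss}}}\right)^T, \]

the $100(1-\xi)\%$ multivariate t intervals are

\[ \mathbf{c}_i^T\tilde{\boldsymbol{\beta}} \pm t_{\xi/2,s,n-p^*,\mathbf{R}}\,\tilde{\sigma}\sqrt{m_{ii}} \quad (i = 1, \ldots, s). \]

Letting $\tilde{\psi}_1, \ldots, \tilde{\psi}_k$ denote the generalized least squares estimators of the estimable functions $\psi_1, \ldots, \psi_k$, and assuming that $\mathrm{var}(\tilde{\psi}_1, \ldots, \tilde{\psi}_k)^T = c^2\sigma^2\mathbf{I}$ for some $c^2 > 0$, we find that the $100(1-\xi)\%$ Tukey intervals for all pairwise differences among the $\psi_i$'s are

\[ (\tilde{\psi}_i - \tilde{\psi}_j) \pm c\,\tilde{\sigma}\,q^*_{\xi,k,n-p^*}. \]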
Exercise 17 Obtain specialized expressions for $\mathbf{R}$ for the multivariate t-based simultaneous confidence and simultaneous prediction intervals presented in Example 15.3-1.

Solution For the multivariate t simultaneous confidence intervals for $\beta_1$ and $\beta_2$,

\[ \mathbf{R} = \begin{pmatrix} 1 & -\dfrac{\bar{x}}{\sqrt{\bar{x}^2 + (SXX/n)}} \\ -\dfrac{\bar{x}}{\sqrt{\bar{x}^2 + (SXX/n)}} & 1 \end{pmatrix}. \]

Next consider the multivariate t-based simultaneous confidence intervals for $(\beta_1 + \beta_2 x_{n+1}, \ldots, \beta_1 + \beta_2 x_{n+s})$. In this case,

\[ \mathbf{R} = \mathrm{var}\left(\frac{\hat{\beta}_1 + \hat{\beta}_2 x_{n+1}}{\sigma\sqrt{(1/n) + (x_{n+1} - \bar{x})^2/SXX}}, \ldots, \frac{\hat{\beta}_1 + \hat{\beta}_2 x_{n+s}}{\sigma\sqrt{(1/n) + (x_{n+s} - \bar{x})^2/SXX}}\right)^T. \]

The $(i, j)$th element of this $\mathbf{R}$ is equal to 1 if $i = j$ and is equal to

\[ \mathrm{cov}\left(\frac{\hat{\beta}_1 + \hat{\beta}_2 x_{n+i}}{\sigma\sqrt{(1/n) + (x_{n+i} - \bar{x})^2/SXX}},\ \frac{\hat{\beta}_1 + \hat{\beta}_2 x_{n+j}}{\sigma\sqrt{(1/n) + (x_{n+j} - \bar{x})^2/SXX}}\right) = \frac{(1/n) + (x_{n+i} - \bar{x})(x_{n+j} - \bar{x})/SXX}{\sqrt{(1/n) + (x_{n+i} - \bar{x})^2/SXX}\sqrt{(1/n) + (x_{n+j} - \bar{x})^2/SXX}} \]

if $i \neq j$. Finally, consider the multivariate t-based simultaneous prediction intervals for $(y_{n+1}, \ldots, y_{n+s})$. In this case,

\[ \mathbf{R} = \mathrm{var}\left(\frac{\hat{\beta}_1 + \hat{\beta}_2 x_{n+1} - e_{n+1}}{\sigma\sqrt{1 + (1/n) + (x_{n+1} - \bar{x})^2/SXX}}, \ldots, \frac{\hat{\beta}_1 + \hat{\beta}_2 x_{n+s} - e_{n+s}}{\sigma\sqrt{1 + (1/n) + (x_{n+s} - \bar{x})^2/SXX}}\right)^T. \]

The $(i, j)$th element of this $\mathbf{R}$ is equal to 1 if $i = j$ and is equal to

\[ \mathrm{cov}\left(\frac{\hat{\beta}_1 + \hat{\beta}_2 x_{n+i} - e_{n+i}}{\sigma\sqrt{1 + (1/n) + (x_{n+i} - \bar{x})^2/SXX}},\ \frac{\hat{\beta}_1 + \hat{\beta}_2 x_{n+j} - e_{n+j}}{\sigma\sqrt{1 + (1/n) + (x_{n+j} - \bar{x})^2/SXX}}\right) = \frac{(1/n) + (x_{n+i} - \bar{x})(x_{n+j} - \bar{x})/SXX}{\sqrt{1 + (1/n) + (x_{n+i} - \bar{x})^2/SXX}\sqrt{1 + (1/n) + (x_{n+j} - \bar{x})^2/SXX}} \]

if $i \neq j$.

Exercise 18 Extend the expressions for simultaneous confidence intervals presented in Example 15.3-2 to the case of unbalanced data, if applicable.

Solution The $100(1-\xi)\%$ Bonferroni simultaneous confidence intervals are

\[ (\bar{y}_{i\cdot} - \bar{y}_{i'\cdot}) \pm t_{\xi/[q(q-1)],n-q}\ \hat{\sigma}\sqrt{\frac{1}{n_i} + \frac{1}{n_{i'}}}; \]

the $100(1-\xi)\%$ multivariate t simultaneous confidence intervals are

\[ (\bar{y}_{i\cdot} - \bar{y}_{i'\cdot}) \pm t_{\xi/2,q(q-1)/2,n-q,\mathbf{R}}\ \hat{\sigma}\sqrt{\frac{1}{n_i} + \frac{1}{n_{i'}}}, \]

where

\[ \mathbf{R} = \mathrm{var}\left(\frac{\bar{y}_{1\cdot} - \bar{y}_{2\cdot}}{\sigma\sqrt{(1/n_1) + (1/n_2)}},\ \frac{\bar{y}_{1\cdot} - \bar{y}_{3\cdot}}{\sigma\sqrt{(1/n_1) + (1/n_3)}},\ \ldots,\ \frac{\bar{y}_{(q-1)\cdot} - \bar{y}_{q\cdot}}{\sigma\sqrt{(1/n_{q-1}) + (1/n_q)}}\right)^T; \]

and the $100(1-\xi)\%$ Scheffé simultaneous confidence intervals are

\[ (\bar{y}_{i\cdot} - \bar{y}_{i'\cdot}) \pm \sqrt{(q-1)F_{\xi,q-1,n-q}}\ \hat{\sigma}\sqrt{\frac{1}{n_i} + \frac{1}{n_{i'}}} \]

for $i' > i = 1, \ldots, q$. Tukey's method is not applicable.

Exercise 19 Show that the simultaneous coverage probability of the classical confidence band given by (15.17) is $1 - \xi$ despite the restriction that the first element of $\mathbf{c}$ is equal to 1.

Solution Consider the normal full-rank Gauss–Markov regression model in its centered form, which can be written in the two equivalent forms

\[ y_i = (1, \mathbf{x}_{i,-1}^T)\begin{pmatrix} \beta_1 \\ \boldsymbol{\beta}_{-1} \end{pmatrix} + e_i \quad (i = 1, \ldots, n) \]
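and

\[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e} = (\mathbf{1}_n, \mathbf{X}_{-1})\begin{pmatrix} \beta_1 \\ \boldsymbol{\beta}_{-1} \end{pmatrix} + \mathbf{e}, \]

where $\mathbf{x}_{i,-1}^T = (x_{i2} - \bar{x}_2, x_{i3} - \bar{x}_3, \ldots, x_{ip} - \bar{x}_p)$ and

\[ \mathbf{X}_{-1} = \begin{pmatrix} \mathbf{x}_{1,-1}^T \\ \vdots \\ \mathbf{x}_{n,-1}^T \end{pmatrix}. \]

Observe that

\[ \mathbf{X}^T\mathbf{X} = \begin{pmatrix} n & \mathbf{0}_{p-1}^T \\ \mathbf{0}_{p-1} & \mathbf{X}_{-1}^T\mathbf{X}_{-1} \end{pmatrix}. \]

Write $\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}$ for the stacked vector $\begin{pmatrix} \hat{\beta}_1 - \beta_1 \\ \hat{\boldsymbol{\beta}}_{-1} - \boldsymbol{\beta}_{-1} \end{pmatrix}$. By Corollary 2.16.1.1, the maximum of

\[ \frac{[\mathbf{c}^T(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})]^2}{\mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{c}} \]

over all $\mathbf{c} = (c_i) \neq \mathbf{0}$ is attained at only those $\mathbf{c}$ that are proportional to $\mathbf{X}^T\mathbf{X}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})$, i.e., only those $\mathbf{c}$ for which

\[ \mathbf{c} = c\begin{pmatrix} n(\hat{\beta}_1 - \beta_1) \\ \mathbf{X}_{-1}^T\mathbf{X}_{-1}(\hat{\boldsymbol{\beta}}_{-1} - \boldsymbol{\beta}_{-1}) \end{pmatrix} \]

for some $c \neq 0$. This establishes that, at the maximum, $c_1 \neq 0$ with probability one. Therefore, with probability one,

\[ \max_{\mathbf{c} \neq \mathbf{0}}\frac{[\mathbf{c}^T(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})]^2}{\mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{c}} = \max_{c_1 \neq 0,\ \mathbf{c}_2 \in \mathbb{R}^{p-1}}\frac{[(c_1, \mathbf{c}_2^T)(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})]^2}{(c_1, \mathbf{c}_2^T)(\mathbf{X}^T\mathbf{X})^{-1}\begin{pmatrix} c_1 \\ \mathbf{c}_2 \end{pmatrix}} \]
\[ = \max_{c_1 \neq 0,\ \mathbf{c}_2 \in \mathbb{R}^{p-1}}\frac{\{[1, (1/c_1)\mathbf{c}_2^T](\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})\}^2}{[1, (1/c_1)\mathbf{c}_2^T](\mathbf{X}^T\mathbf{X})^{-1}\begin{pmatrix} 1 \\ (1/c_1)\mathbf{c}_2 \end{pmatrix}} = \max_{\mathbf{x}_{-1} \in \mathbb{R}^{p-1}}\frac{[(1, \mathbf{x}_{-1}^T)(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})]^2}{(1, \mathbf{x}_{-1}^T)(\mathbf{X}^T\mathbf{X})^{-1}\begin{pmatrix} 1 \\ \mathbf{x}_{-1} \end{pmatrix}}. \]

Thus

\[ 1 - \xi = \Pr\left(\mathbf{c}^T\hat{\boldsymbol{\beta}} - \sqrt{pF_{\xi,p,n-p}\,\hat{\sigma}^2\mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{c}} \leq \mathbf{c}^T\boldsymbol{\beta} \leq \mathbf{c}^T\hat{\boldsymbol{\beta}} + \sqrt{pF_{\xi,p,n-p}\,\hat{\sigma}^2\mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{c}} \text{ for all } \mathbf{c} \in \mathbb{R}^p\right) \]
\[ = \Pr\left((\hat{\beta}_1 + \mathbf{x}_{-1}^T\hat{\boldsymbol{\beta}}_{-1}) - \sqrt{pF_{\xi,p,n-p}\,\hat{\sigma}^2(1, \mathbf{x}_{-1}^T)(\mathbf{X}^T\mathbf{X})^{-1}\begin{pmatrix} 1 \\ \mathbf{x}_{-1} \end{pmatrix}} \leq \beta_1 + \mathbf{x}_{-1}^T\boldsymbol{\beta}_{-1} \right. \]
\[ \left. \leq (\hat{\beta}_1 + \mathbf{x}_{-1}^T\hat{\boldsymbol{\beta}}_{-1}) + \sqrt{pF_{\xi,p,n-p}\,\hat{\sigma}^2(1, \mathbf{x}_{-1}^T)(\mathbf{X}^T\mathbf{X})^{-1}\begin{pmatrix} 1 \\ \mathbf{x}_{-1} \end{pmatrix}} \text{ for all } \mathbf{x}_{-1} \in \mathbb{R}^{p-1}\right). \]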
Exercise 20 Consider the normal random-slope, fixed-intercept simple linear regression model

\[ y_i = \beta + bz_i + d_i \quad (i = 1, \ldots, n), \]

described previously in more detail in Exercise 13.13, where $\mathrm{var}(\mathbf{y})$, among other quantities, was determined. The model equation may be written in matrix form as $\mathbf{y} = \beta\mathbf{1} + b\mathbf{z} + \mathbf{d}$.
(a) Consider the sums of squares that arise in the following ordered two-part mixed-model ANOVA:

Source      df        Sum of squares
z           1         $\mathbf{y}^T\mathbf{P}_z\mathbf{y}$
1 | z       1         $\mathbf{y}^T(\mathbf{P}_{1,z} - \mathbf{P}_z)\mathbf{y}$
Residual    $n - 2$   $\mathbf{y}^T(\mathbf{I} - \mathbf{P}_{1,z})\mathbf{y}$

Here $\mathbf{P}_z = \mathbf{z}\mathbf{z}^T/(\sum_{i=1}^{n} z_i^2)$. Determine which two of these three sums of squares, when divided by $\sigma^2$, have noncentral chi-square distributions under the mixed simple linear regression model defined above. For those two, give nonmatrix expressions for the parameters of the distributions.
(b) It is possible to use the sums of squares from the ANOVA above to test $H_0: \beta = 0$ versus $H_a: \beta \neq 0$ by an F-test. Derive the appropriate F-statistic and give its distribution under each of $H_a$ and $H_0$.
(c) Let $\boldsymbol{\tau} = (\beta, b)^T$ and let $\tilde{\boldsymbol{\tau}} = (\tilde{\beta}, \tilde{b})^T$ be the BLUP of $\boldsymbol{\tau}$. Let $\sigma^2\mathbf{Q} = \mathrm{var}(\tilde{\boldsymbol{\tau}} - \boldsymbol{\tau}) = \sigma^2\begin{pmatrix} q_{11} & q_{12} \\ q_{12} & q_{22} \end{pmatrix}$ and assume that this matrix is positive definite. Starting with the probability statement

\[ 1 - \xi = \Pr\left[\frac{(\tilde{\boldsymbol{\tau}} - \boldsymbol{\tau})^T\mathbf{Q}^{-1}(\tilde{\boldsymbol{\tau}} - \boldsymbol{\tau})}{s\tilde{\sigma}^2} \leq F_{\xi,s,n-p^*}\right], \]

for a particular choice of $s$ and $p^*$, Scheffé's method can be used to derive a $100(1-\xi)\%$ simultaneous prediction band for $\{\beta + bz : z \in \mathbb{R}\}$, where $\xi \in (0, 1)$. This band is of the form $(\tilde{\beta} + \tilde{b}z) \pm g(z)$, for some function $g(z)$. Determine $g(z)$; you may leave your answer in terms of $q_{11}$, $q_{12}$, and $q_{22}$ (and other quantities) but determine numerical values for $s$ and $p^*$.
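Solution
(a) According to the solution to Exercise 13.13, $\mathrm{var}(\mathbf{y}) = \sigma^2[\mathbf{I} + (\sigma_b^2/\sigma^2)\mathbf{z}\mathbf{z}^T]$. Now,

\[ (1/\sigma^2)\mathbf{P}_z\mathrm{var}(\mathbf{y}) = \left(\sum_{i=1}^{n} z_i^2\right)^{-1}\mathbf{z}\mathbf{z}^T[\mathbf{I} + (\sigma_b^2/\sigma^2)\mathbf{z}\mathbf{z}^T] = \left[\left(\sum_{i=1}^{n} z_i^2\right)^{-1} + (\sigma_b^2/\sigma^2)\right]\mathbf{z}\mathbf{z}^T, \]

which is not idempotent. But

\[ (1/\sigma^2)(\mathbf{P}_{1,z} - \mathbf{P}_z)\mathrm{var}(\mathbf{y}) = (\mathbf{P}_{1,z} - \mathbf{P}_z)[\mathbf{I} + (\sigma_b^2/\sigma^2)\mathbf{z}\mathbf{z}^T] = \mathbf{P}_{1,z} - \mathbf{P}_z, \]

which is idempotent, and

\[ (1/\sigma^2)(\mathbf{I} - \mathbf{P}_{1,z})\mathrm{var}(\mathbf{y}) = (\mathbf{I} - \mathbf{P}_{1,z})[\mathbf{I} + (\sigma_b^2/\sigma^2)\mathbf{z}\mathbf{z}^T] = \mathbf{I} - \mathbf{P}_{1,z}, \]

which is also idempotent. Therefore, $(1/\sigma^2)\mathbf{y}^T(\mathbf{P}_{1,z} - \mathbf{P}_z)\mathbf{y} \sim \chi^2(1, NCP_1)$, where

\[ NCP_1 = (\beta\mathbf{1})^T(\mathbf{P}_{1,z} - \mathbf{P}_z)(\beta\mathbf{1})/(2\sigma^2) = \frac{\beta^2 n\,SZZ}{2\sigma^2\sum_{i=1}^{n} z_i^2}, \]

and $(1/\sigma^2)\mathbf{y}^T(\mathbf{I} - \mathbf{P}_{1,z})\mathbf{y} \sim \chi^2(n - 2, NCP_2)$, where

\[ NCP_2 = (\beta\mathbf{1})^T(\mathbf{I} - \mathbf{P}_{1,z})(\beta\mathbf{1})/(2\sigma^2) = 0. \]

(b) The two quadratic forms in part (a) that have chi-square distributions are independent because

\[ (1/\sigma^2)(\mathbf{P}_{1,z} - \mathbf{P}_z)\mathrm{var}(\mathbf{y})(1/\sigma^2)(\mathbf{I} - \mathbf{P}_{1,z}) = (1/\sigma^2)(\mathbf{P}_{1,z} - \mathbf{P}_z)(\mathbf{I} - \mathbf{P}_{1,z}) = (1/\sigma^2)(\mathbf{P}_{1,z} - \mathbf{P}_z - \mathbf{P}_{1,z} + \mathbf{P}_z) = \mathbf{0}. \]

Therefore,

\[ F \equiv \frac{\mathbf{y}^T(\mathbf{P}_{1,z} - \mathbf{P}_z)\mathbf{y}/\sigma^2}{\mathbf{y}^T(\mathbf{I} - \mathbf{P}_{1,z})\mathbf{y}/[(n-2)\sigma^2]} = \frac{\mathbf{y}^T(\mathbf{P}_{1,z} - \mathbf{P}_z)\mathbf{y}}{\mathbf{y}^T(\mathbf{I} - \mathbf{P}_{1,z})\mathbf{y}/(n-2)} \sim F\left(1,\ n-2,\ \frac{\beta^2 n\,SZZ}{2\sigma^2\sum_{i=1}^{n} z_i^2}\right). \]

Under $H_0$, $F \sim F(1, n-2)$, and under $H_a$, $F \sim F\left(1,\ n-2,\ \frac{\beta^2 n\,SZZ}{2\sigma^2\sum_{i=1}^{n} z_i^2}\right)$.

(c) Here $s = 2$ and $p^* = 1$. Thus, by Corollary 2.16.1.1,

\[ 1 - \xi = \Pr\left[\frac{(\tilde{\boldsymbol{\tau}} - \boldsymbol{\tau})^T\mathbf{Q}^{-1}(\tilde{\boldsymbol{\tau}} - \boldsymbol{\tau})}{2\tilde{\sigma}^2} \leq F_{\xi,2,n-1}\right] = \Pr\left[\frac{1}{2\tilde{\sigma}^2}\frac{[\mathbf{a}^T(\tilde{\boldsymbol{\tau}} - \boldsymbol{\tau})]^2}{\mathbf{a}^T\mathbf{Q}\mathbf{a}} \leq F_{\xi,2,n-1} \text{ for all } \mathbf{a} \neq \mathbf{0}\right] \]
\[ = \Pr\left[\mathbf{a}^T\tilde{\boldsymbol{\tau}} - \sqrt{2F_{\xi,2,n-1}\,\tilde{\sigma}^2(\mathbf{a}^T\mathbf{Q}\mathbf{a})} \leq \mathbf{a}^T\boldsymbol{\tau} \leq \mathbf{a}^T\tilde{\boldsymbol{\tau}} + \sqrt{2F_{\xi,2,n-1}\,\tilde{\sigma}^2(\mathbf{a}^T\mathbf{Q}\mathbf{a})} \text{ for all } \mathbf{a}\right]. \]

Thus

\[ \Pr[(\tilde{\beta} + \tilde{b}z) - g(z) \leq \beta + bz \leq (\tilde{\beta} + \tilde{b}z) + g(z) \text{ for all } z] = 1 - \xi, \]

where $g(z) = \sqrt{2F_{\xi,2,n-1}\,\tilde{\sigma}^2(q_{11} + 2q_{12}z + q_{22}z^2)}$. The same argument as that used in the solution of Exercise 15.19 establishes that the simultaneous coverage probability is exactly (rather than at least) $1 - \xi$.

Exercise 21 Consider the normal Gauss–Markov simple linear regression model and suppose that $n = 22$ and $x_1 = x_2 = 5$, $x_3 = x_4 = 6$, ..., $x_{21} = x_{22} = 15$.
(a) Determine the expected squared lengths of the Bonferroni and multivariate t 95% simultaneous confidence intervals for $\beta_1$ and $\beta_2$. (Expressions for these intervals may be found in Example 15.3-1.)
(b) Let $x_{n+1} = 5$, $x_{n+2} = 10$, and $x_{n+3} = 15$, and suppose that the original responses and the responses corresponding to these three "new" values of $x$ follow the normal Gauss–Markov prediction-extended simple linear regression model. Give an expression for the multivariate t 95% simultaneous prediction intervals for $\{\beta_1 + \beta_2 x_{n+i} + e_{n+i} : i = 1, 2, 3\}$. Determine how much wider (proportionally) these intervals are than the multivariate t 95% simultaneous confidence intervals for $\{\beta_1 + \beta_2 x_{n+i} : i = 1, 2, 3\}$.
(c) Obtain the 95% trapezoidal (actually it is a parallelogram) confidence band described in Example 15.3.3 for the line over the bounded interval $[a = 5, b = 15]$, and compare the expected squared length of any interval in that band to the expected squared length of the intervals in the 95% classical confidence band at $x = 5$, $x = 10$, and $x = 15$.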
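Solution (a) In this scenario, $\bar x = 10$, $SXX = 220$, and (according to Example 7.2-1) $\mathrm{corr}(\hat\beta_1, \hat\beta_2) = -\bar x/\sqrt{\bar x^2 + SXX/n} = -10/\sqrt{110}$. The expected squared lengths of the Bonferroni and multivariate t 95% simultaneous confidence intervals for $\beta_1$ and $\beta_2$ are listed in the following table:

| Intervals | $\beta_1$ | $\beta_2$ |
|---|---|---|
| Bonferroni | $4t^2_{0.0125,20}\,\sigma^2\left(\frac{1}{n} + \frac{\bar x^2}{SXX}\right) = 11.743\sigma^2$ | $4t^2_{0.0125,20}\,\sigma^2/SXX = 0.10675\sigma^2$ |
| Multivariate t | $4t^2_{0.025,2,20,\mathbf{R}}\,\sigma^2\left(\frac{1}{n} + \frac{\bar x^2}{SXX}\right) = 9.749\sigma^2$ | $4t^2_{0.025,2,20,\mathbf{R}}\,\sigma^2/SXX = 0.08863\sigma^2$ |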
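To obtain the results for the multivariate t intervals, we used the fact that, according to Example 15.3-1,
\[
\mathbf{R} = \begin{pmatrix} 1 & \mathrm{corr}(\hat\beta_1,\hat\beta_2) \\ \mathrm{corr}(\hat\beta_1,\hat\beta_2) & 1 \end{pmatrix}
= \begin{pmatrix} 1 & -10/\sqrt{110} \\ -10/\sqrt{110} & 1 \end{pmatrix}.
\]
Then, using the qmvt function in the mvtnorm package of R with this $\mathbf{R}$ matrix, we obtain $t_{0.025,2,20,\mathbf{R}} = 2.207874$.

(b) The multivariate t 95% simultaneous prediction intervals for $y_{n+1}, y_{n+2}, y_{n+3}$ are given by
\[
(\hat\beta_1 + \hat\beta_2 x_{n+i}) \pm t_{0.025,3,n-2,\mathbf{R}}\,\hat\sigma\sqrt{1 + \frac{1}{n} + \frac{(x_{n+i}-\bar x)^2}{SXX}} \qquad (i = 1, 2, 3),
\]
where
\[
\mathbf{R} = \mathrm{var}\!\left[\left(\frac{\hat\beta_1 + \hat\beta_2 x_{n+1} - y_{n+1}}{\sigma\sqrt{q_{11}}},\ \frac{\hat\beta_1 + \hat\beta_2 x_{n+2} - y_{n+2}}{\sigma\sqrt{q_{22}}},\ \frac{\hat\beta_1 + \hat\beta_2 x_{n+3} - y_{n+3}}{\sigma\sqrt{q_{33}}}\right)^T\right] = (r_{ij}),
\]
\[
r_{ij} = \frac{1}{\sqrt{q_{ii}q_{jj}}}\left[\frac{1}{n} + \frac{(x_{n+i}-\bar x)(x_{n+j}-\bar x)}{SXX}\right] \quad \text{for } i \ne j = 1, 2, 3,
\]
and $q_{ii} = 1 + \frac{1}{n} + \frac{(x_{n+i}-\bar x)^2}{SXX}$ for $i = 1, 2, 3$. Furthermore, the multivariate t 95% simultaneous confidence intervals for $\{\beta_1 + \beta_2 x_{n+i} : i = 1, 2, 3\}$ are given by
\[
(\hat\beta_1 + \hat\beta_2 x_{n+i}) \pm t_{0.025,3,n-2,\mathbf{R}^*}\,\hat\sigma\sqrt{\frac{1}{n} + \frac{(x_{n+i}-\bar x)^2}{SXX}} \qquad (i = 1, 2, 3),
\]
where
\[
\mathbf{R}^* = \mathrm{var}\!\left[\left(\frac{\hat\beta_1 + \hat\beta_2 x_{n+1}}{\sigma\sqrt{m_{11}}},\ \frac{\hat\beta_1 + \hat\beta_2 x_{n+2}}{\sigma\sqrt{m_{22}}},\ \frac{\hat\beta_1 + \hat\beta_2 x_{n+3}}{\sigma\sqrt{m_{33}}}\right)^T\right] = (r^*_{ij}),
\]
\[
r^*_{ij} = \frac{1}{\sqrt{m_{ii}m_{jj}}}\left[\frac{1}{n} + \frac{(x_{n+i}-\bar x)(x_{n+j}-\bar x)}{SXX}\right] \quad \text{for } i \ne j = 1, 2, 3,
\]
and $m_{ii} = \frac{1}{n} + \frac{(x_{n+i}-\bar x)^2}{SXX}$ for $i = 1, 2, 3$. Finally, the ratio of the width of the multivariate t 95% simultaneous prediction intervals for $y_{n+1}, y_{n+2}, y_{n+3}$ to the width of the confidence intervals for $\beta_1 + \beta_2 x_{n+i}$ $(i = 1, 2, 3)$ is
\[
\frac{2t_{0.025,3,n-2,\mathbf{R}}\,\hat\sigma\sqrt{1 + \frac{1}{n} + \frac{(x_{n+i}-\bar x)^2}{SXX}}}{2t_{0.025,3,n-2,\mathbf{R}^*}\,\hat\sigma\sqrt{\frac{1}{n} + \frac{(x_{n+i}-\bar x)^2}{SXX}}}
= \frac{t_{0.025,3,20,\mathbf{R}}\sqrt{1 + \frac{1}{22} + \frac{(x_{n+i}-10)^2}{220}}}{t_{0.025,3,20,\mathbf{R}^*}\sqrt{\frac{1}{22} + \frac{(x_{n+i}-10)^2}{220}}}
= \begin{cases} 2.770 & \text{if } i = 1 \text{ or } i = 3, \\ 4.920 & \text{if } i = 2. \end{cases}
\]

(c) Because $\bar x = 10$ lies at the midpoint of $[a, b] = [5, 15]$, the trapezoidal confidence region is actually a parallelogram. Therefore, the expected squared length of the interval in the 95% trapezoidal confidence band corresponding to any $x \in [5, 15]$ is
\[
\left(2t_{0.025,2,20,\mathbf{R}}\sqrt{\frac{1}{22} + \frac{25}{220}}\right)^2\sigma^2 = 4t^2_{0.025,2,20,\mathbf{R}}\left(\frac{1}{22} + \frac{25}{220}\right)\sigma^2,
\]
where $\mathbf{R}$ is a $2 \times 2$ correlation matrix with off-diagonal element equal to
\[
\frac{m_{ab}}{\sqrt{m_{aa}m_{bb}}} = \frac{\frac{1}{22} + (5-10)(15-10)/220}{\sqrt{\left[\frac{1}{22} + \frac{(5-10)^2}{220}\right]\left[\frac{1}{22} + \frac{(15-10)^2}{220}\right]}} = -\frac{3}{7}.
\]
We find that $t_{0.025,2,20,\mathbf{R}} = 2.38808$, so the expected squared length is $4(2.38808)^2[(1/22) + (25/220)]\sigma^2 = 3.6291\sigma^2$. For the classical 95% confidence band, the expected squared length of the interval for the line's ordinate at $x$ is
\[
\left[2\sqrt{2F_{0.05,2,20}\,\sigma^2\left(\frac{1}{22} + \frac{(x-10)^2}{SXX}\right)}\right]^2 = 8F_{0.05,2,20}\,\sigma^2\left[\frac{1}{22} + \frac{(x-10)^2}{SXX}\right].
\]
When $x = 5, 10, 15$, this is equal to $4.445\sigma^2$, $1.270\sigma^2$, and $4.445\sigma^2$, respectively.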
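The design quantities used throughout this solution can be checked with a few lines of exact arithmetic. The sketch below is only an illustration: the variable and function names are hypothetical, and the two multivariate t critical values are copied from the text (the solution obtained them with qmvt in R) rather than recomputed.

```python
import math
from fractions import Fraction

# Design of Exercise 21: x = 5, 5, 6, 6, ..., 15, 15, so n = 22.
x = [Fraction(v) for v in range(5, 16) for _ in range(2)]
n = len(x)
xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

# corr(beta1_hat, beta2_hat) = -xbar / sqrt(xbar^2 + SXX/n)
corr_b1_b2 = -float(xbar) / math.sqrt(float(xbar ** 2 + sxx / n))


def m(u, v):
    """(1/n) + (u - xbar)(v - xbar)/SXX: covariance of the fitted
    ordinates at u and v, divided by sigma^2."""
    return Fraction(1, n) + (Fraction(u) - xbar) * (Fraction(v) - xbar) / sxx


# Off-diagonal element of R for the band over [5, 15] in part (c).
rho_ab = float(m(5, 15)) / math.sqrt(float(m(5, 5) * m(15, 15)))

# Critical values quoted in the solution (taken as given, not recomputed).
t_mvt = 2.207874   # t_{0.025,2,20,R} used in part (a)
t_band = 2.38808   # t_{0.025,2,20,R} used in part (c)

# Expected squared lengths divided by sigma^2.
len_sq_beta1 = 4 * t_mvt ** 2 * float(m(0, 0))   # m(0, 0) = 1/n + xbar^2/SXX
len_sq_beta2 = 4 * t_mvt ** 2 / float(sxx)
len_sq_band = 4 * t_band ** 2 * float(m(5, 5))

print(n, xbar, sxx)
print(round(corr_b1_b2, 6), round(rho_ab, 6))
print(round(len_sq_beta1, 3), round(len_sq_beta2, 5), round(len_sq_band, 4))
```

Using `Fraction` keeps $\bar x$, $SXX$, and the $m_{ab}$ terms exact, so only the square roots and the quoted critical values introduce rounding.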
Exercise 22 Consider the two-way main effects model with equal cell frequencies, i.e.,
\[
y_{ijk} = \mu + \alpha_i + \gamma_j + e_{ijk} \qquad (i = 1,\dots,q;\ j = 1,\dots,m;\ k = 1,\dots,r).
\]
Here $\mu$, the $\alpha_i$'s, and the $\gamma_j$'s are unknown parameters, while the $e_{ijk}$'s are independent normally distributed random variables having zero means and common variance $\sigma^2$. Let $0 < \xi < 1$.

(a) Obtain an expression for a $100(1-\xi)\%$ "one-at-a-time" (symmetric) confidence interval for $\alpha_i - \alpha_{i'}$ $(i' > i = 1,\dots,q)$. Obtain the analogous expression for a $100(1-\xi)\%$ one-at-a-time confidence interval for $\gamma_j - \gamma_{j'}$ $(j' > j = 1,\dots,m)$.

(b) Obtain an expression for a $100(1-\xi)\%$ confidence ellipsoid for the subset of Factor A differences $\{\alpha_1 - \alpha_i : i = 2,\dots,q\}$; likewise, obtain an expression for a $100(1-\xi)\%$ confidence ellipsoid for the subset of Factor B differences $\{\gamma_1 - \gamma_j : j = 2,\dots,m\}$. Your expressions can involve vectors and matrices provided that you give nonmatrix expressions for each of the elements of those vectors and matrices.

(c) Use each of the Bonferroni, Scheffé, and multivariate t methods to obtain confidence intervals for all $[q(q-1)/2] + [m(m-1)/2]$ differences $\alpha_i - \alpha_{i'}$ $(i' > i = 1,\dots,q)$ and $\gamma_j - \gamma_{j'}$ $(j' > j = 1,\dots,m)$ such that the probability of simultaneous coverage is at least $1-\xi$.

(d) Discuss why Tukey's method cannot be used directly to obtain simultaneous confidence intervals for $\alpha_i - \alpha_{i'}$ $(i' > i = 1,\dots,q)$ and $\gamma_j - \gamma_{j'}$ $(j' > j = 1,\dots,m)$ such that the probability of simultaneous coverage is at least $1-\xi$. Show, however, that Tukey's method could be used directly to obtain confidence intervals for $\alpha_i - \alpha_{i'}$ $(i' > i = 1,\dots,q)$ such that the probability of simultaneous coverage is at least $1-\xi_1$, and again to obtain confidence intervals for $\gamma_j - \gamma_{j'}$ $(j' > j = 1,\dots,m)$ such that the probability of simultaneous coverage is at least $1-\xi_2$; then combine these via the Bonferroni inequality to get intervals for the union of all of these differences whose simultaneous coverage probability is at least $1-\xi_1-\xi_2$.

(e) For each of the four sets of intervals for the $\alpha$-differences and $\gamma$-differences obtained in parts (a) and (c), determine $E(L^2)/\sigma^2$, where $L$ represents the length of each interval in the set.

(f) Compute $[E(L^2)/\sigma^2]^{1/2}$ for each of the four sets of confidence intervals when $q = 4$, $m = 5$, $r = 1$, and $\xi = .05$, using $\mathbf{I}$ in place of $\mathbf{R}$ for the multivariate t intervals. Which of the three sets of simultaneous confidence intervals would you recommend in this case, assuming that your interest is in only the Factor A differences and Factor B differences?
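Solution (a) According to the solution to Exercise 7.11, for $i \le s$, $i' > i = 1,\dots,q$, and $s' > s = 1,\dots,q$,
\[
\mathrm{cov}(\bar y_{i\cdot\cdot} - \bar y_{i'\cdot\cdot},\ \bar y_{s\cdot\cdot} - \bar y_{s'\cdot\cdot}) =
\begin{cases}
\dfrac{2\sigma^2}{mr} & \text{if } i = s \text{ and } i' = s', \\[1ex]
\dfrac{\sigma^2}{mr} & \text{if } i = s \text{ and } i' \ne s', \text{ or } i \ne s \text{ and } i' = s', \\[1ex]
-\dfrac{\sigma^2}{mr} & \text{if } i' = s, \\[1ex]
0 & \text{otherwise;}
\end{cases}
\]
for $j \le t$, $j' > j = 1,\dots,m$, and $t' > t = 1,\dots,m$,
\[
\mathrm{cov}(\bar y_{\cdot j\cdot} - \bar y_{\cdot j'\cdot},\ \bar y_{\cdot t\cdot} - \bar y_{\cdot t'\cdot}) =
\begin{cases}
\dfrac{2\sigma^2}{qr} & \text{if } j = t \text{ and } j' = t', \\[1ex]
\dfrac{\sigma^2}{qr} & \text{if } j = t \text{ and } j' \ne t', \text{ or } j \ne t \text{ and } j' = t', \\[1ex]
-\dfrac{\sigma^2}{qr} & \text{if } j' = t, \\[1ex]
0 & \text{otherwise;}
\end{cases}
\]
and for $i' > i = 1,\dots,q$ and $j' > j = 1,\dots,m$,
\[
\mathrm{cov}(\bar y_{i\cdot\cdot} - \bar y_{i'\cdot\cdot},\ \bar y_{\cdot j\cdot} - \bar y_{\cdot j'\cdot}) = 0.
\]
Therefore, using (15.1),
\[
(\bar y_{i\cdot\cdot} - \bar y_{i'\cdot\cdot}) \pm t_{\xi/2,\,qmr-q-m+1}\,\hat\sigma\sqrt{\frac{2}{mr}}
\]
is a $100(1-\xi)\%$ confidence interval for $\alpha_i - \alpha_{i'}$ $(i' > i = 1,\dots,q)$. Similarly,
\[
(\bar y_{\cdot j\cdot} - \bar y_{\cdot j'\cdot}) \pm t_{\xi/2,\,qmr-q-m+1}\,\hat\sigma\sqrt{\frac{2}{qr}}
\]
is a $100(1-\xi)\%$ confidence interval for $\gamma_j - \gamma_{j'}$ $(j' > j = 1,\dots,m)$.

(b) Define the $(q-1) \times (1+q+m)$ matrix $\mathbf{C}_1^T$ as follows:
\[
\mathbf{C}_1^T = \begin{pmatrix}
0 & 1 & -1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & -1 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & & & & & & & & \vdots \\
0 & 1 & 0 & 0 & \cdots & -1 & 0 & \cdots & 0
\end{pmatrix}.
\]
Then $\mathbf{C}_1^T\boldsymbol\beta = (\alpha_1 - \alpha_2, \alpha_1 - \alpha_3, \dots, \alpha_1 - \alpha_q)^T$ and $\mathbf{C}_1^T\hat{\boldsymbol\beta} = (\bar y_{1\cdot\cdot} - \bar y_{2\cdot\cdot}, \bar y_{1\cdot\cdot} - \bar y_{3\cdot\cdot}, \dots, \bar y_{1\cdot\cdot} - \bar y_{q\cdot\cdot})^T$. Therefore, by Theorem 15.1.1, a $100(1-\xi)\%$ confidence ellipsoid for $\{\alpha_1 - \alpha_j : j = 2,\dots,q\}$ is given by
\[
\{\boldsymbol\beta : (\mathbf{C}_1^T\hat{\boldsymbol\beta} - \mathbf{C}_1^T\boldsymbol\beta)^T[\mathbf{C}_1^T(\mathbf{X}^T\mathbf{X})^-\mathbf{C}_1]^{-1}(\mathbf{C}_1^T\hat{\boldsymbol\beta} - \mathbf{C}_1^T\boldsymbol\beta) \le (q-1)\hat\sigma^2F_{\xi,\,q-1,\,qmr-q-m+1}\}.
\]
Observe that the $(i, i')$th element of $\mathbf{C}_1^T(\mathbf{X}^T\mathbf{X})^-\mathbf{C}_1$ is
\[
\frac{1}{\sigma^2}\,\mathrm{cov}(\bar y_{1\cdot\cdot} - \bar y_{(i+1)\cdot\cdot},\ \bar y_{1\cdot\cdot} - \bar y_{(i'+1)\cdot\cdot}) =
\begin{cases} 2/mr & \text{if } i = i', \\ 1/mr & \text{if } i \ne i', \end{cases}
\]
for $i, i' = 1,\dots,q-1$. Similarly, define the $(m-1) \times (1+q+m)$ matrix $\mathbf{C}_2^T$ as follows:
\[
\mathbf{C}_2^T = \begin{pmatrix}
0 & 0 & \cdots & 0 & 1 & -1 & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 & 1 & 0 & -1 & \cdots & 0 \\
\vdots & & & & & & & & \vdots \\
0 & 0 & \cdots & 0 & 1 & 0 & 0 & \cdots & -1
\end{pmatrix}.
\]
Then $\mathbf{C}_2^T\boldsymbol\beta = (\gamma_1 - \gamma_2, \gamma_1 - \gamma_3, \dots, \gamma_1 - \gamma_m)^T$ and $\mathbf{C}_2^T\hat{\boldsymbol\beta} = (\bar y_{\cdot 1\cdot} - \bar y_{\cdot 2\cdot}, \bar y_{\cdot 1\cdot} - \bar y_{\cdot 3\cdot}, \dots, \bar y_{\cdot 1\cdot} - \bar y_{\cdot m\cdot})^T$. Therefore, by Theorem 15.1.1, a $100(1-\xi)\%$ confidence ellipsoid for $\{\gamma_1 - \gamma_j : j = 2,\dots,m\}$ is given by
\[
\{\boldsymbol\beta : (\mathbf{C}_2^T\hat{\boldsymbol\beta} - \mathbf{C}_2^T\boldsymbol\beta)^T[\mathbf{C}_2^T(\mathbf{X}^T\mathbf{X})^-\mathbf{C}_2]^{-1}(\mathbf{C}_2^T\hat{\boldsymbol\beta} - \mathbf{C}_2^T\boldsymbol\beta) \le (m-1)\hat\sigma^2F_{\xi,\,m-1,\,qmr-q-m+1}\}.
\]
Observe that the $(j, j')$th element of $\mathbf{C}_2^T(\mathbf{X}^T\mathbf{X})^-\mathbf{C}_2$ is
\[
\frac{1}{\sigma^2}\,\mathrm{cov}(\bar y_{\cdot 1\cdot} - \bar y_{\cdot(j+1)\cdot},\ \bar y_{\cdot 1\cdot} - \bar y_{\cdot(j'+1)\cdot}) =
\begin{cases} 2/qr & \text{if } j = j', \\ 1/qr & \text{if } j \ne j', \end{cases}
\]
for $j, j' = 1,\dots,m-1$.

(c) The Bonferroni simultaneous confidence intervals for $\alpha_i - \alpha_{i'}$ $(i' > i = 1,\dots,q)$ and $\gamma_j - \gamma_{j'}$ $(j' > j = 1,\dots,m)$ are
\[
(\bar y_{i\cdot\cdot} - \bar y_{i'\cdot\cdot}) \pm t_{\xi/[q(q-1)+m(m-1)],\,qmr-q-m+1}\,\hat\sigma\sqrt{\frac{2}{mr}}
\]
and
\[
(\bar y_{\cdot j\cdot} - \bar y_{\cdot j'\cdot}) \pm t_{\xi/[q(q-1)+m(m-1)],\,qmr-q-m+1}\,\hat\sigma\sqrt{\frac{2}{qr}}.
\]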
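The covariance pattern in part (a), the number of intervals entering the Bonferroni adjustment, and the dimension of the space spanned by all of the differences can be checked numerically. The following is a small sketch under stated assumptions: the helper names `rank` and `diff_row` are hypothetical, the sizes $q = 4$ and $m = 5$ are those of part (f), and parameters are ordered as $(\mu, \alpha_1, \dots, \alpha_q, \gamma_1, \dots, \gamma_m)$.

```python
from fractions import Fraction
from itertools import combinations


def rank(rows):
    """Rank of a list of equal-length rows, by exact Gaussian elimination."""
    mat = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    ncols = len(mat[0]) if mat else 0
    for col in range(ncols):
        pivot = next((i for i in range(rk, len(mat)) if mat[i][col] != 0), None)
        if pivot is None:
            continue
        mat[rk], mat[pivot] = mat[pivot], mat[rk]
        for i in range(len(mat)):
            if i != rk and mat[i][col] != 0:
                factor = mat[i][col] / mat[rk][col]
                mat[i] = [a - factor * b for a, b in zip(mat[i], mat[rk])]
        rk += 1
    return rk


def diff_row(offset, i, j, total):
    """Coefficient vector of (parameter offset+i) minus (parameter offset+j)."""
    row = [0] * total
    row[offset + i] = 1
    row[offset + j] = -1
    return row


q, m = 4, 5                      # illustrative sizes, as in part (f)
total = 1 + q + m                # (mu, alpha_1..alpha_q, gamma_1..gamma_m)
alpha_rows = [diff_row(1, i, j, total) for i, j in combinations(range(q), 2)]
gamma_rows = [diff_row(1 + q, i, j, total) for i, j in combinations(range(m), 2)]
all_rows = alpha_rows + gamma_rows

n_intervals = len(all_rows)      # q(q-1)/2 + m(m-1)/2
span_dim = rank(all_rows)        # dimension used by the Scheffe method

# The level means ybar_i.. are uncorrelated with common variance sigma^2/(mr),
# so cov of two alpha-differences, in units of sigma^2/(mr), is a dot product.
cov_units = {
    sum(a * b for a, b in zip(u, v)) for u in alpha_rows for v in alpha_rows
}

print(n_intervals, span_dim, sorted(cov_units))
```

The set `cov_units` collects every value the covariance can take (in units of $\sigma^2/mr$), which corresponds to the four cases of the piecewise formula in part (a).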
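For the Scheffé method, take $\mathbf{C}$ to be the matrix whose rows are the coefficient vectors of the successive differences $\alpha_i - \alpha_{i+1}$ $(i = 1,\dots,q-1)$ and $\gamma_j - \gamma_{j+1}$ $(j = 1,\dots,m-1)$, i.e.,
\[
\mathbf{C} = \begin{pmatrix} \mathbf{0}_{q-1} & \mathbf{D}_q & \mathbf{0} \\ \mathbf{0}_{m-1} & \mathbf{0} & \mathbf{D}_m \end{pmatrix},
\qquad
\mathbf{D}_k = \begin{pmatrix}
1 & -1 & 0 & \cdots & 0 & 0 \\
0 & 1 & -1 & \cdots & 0 & 0 \\
\vdots & & & & & \vdots \\
0 & 0 & 0 & \cdots & 1 & -1
\end{pmatrix}_{(k-1)\times k}.
\]
Observe that $\mathrm{rank}(\mathbf{C}) = q + m - 2$. Thus, the Scheffé simultaneous confidence intervals for $\alpha_i - \alpha_{i'}$ $(i' > i = 1,\dots,q)$ and $\gamma_j - \gamma_{j'}$ $(j' > j = 1,\dots,m)$ are
\[
(\bar y_{i\cdot\cdot} - \bar y_{i'\cdot\cdot}) \pm \hat\sigma\sqrt{\frac{2}{mr}}\sqrt{(q+m-2)F_{\xi,\,q+m-2,\,qmr-q-m+1}}
\]
and
\[
(\bar y_{\cdot j\cdot} - \bar y_{\cdot j'\cdot}) \pm \hat\sigma\sqrt{\frac{2}{qr}}\sqrt{(q+m-2)F_{\xi,\,q+m-2,\,qmr-q-m+1}}.
\]
The multivariate t simultaneous confidence intervals for $\alpha_i - \alpha_{i'}$ $(i' > i = 1,\dots,q)$ and $\gamma_j - \gamma_{j'}$ $(j' > j = 1,\dots,m)$ (jointly) are
\[
(\bar y_{i\cdot\cdot} - \bar y_{i'\cdot\cdot}) \pm t_{\xi/2,\,[q(q-1)+m(m-1)]/2,\,qmr-q-m+1,\,\mathbf{R}}\,\hat\sigma\sqrt{\frac{2}{mr}}
\]
and
\[
(\bar y_{\cdot j\cdot} - \bar y_{\cdot j'\cdot}) \pm t_{\xi/2,\,[q(q-1)+m(m-1)]/2,\,qmr-q-m+1,\,\mathbf{R}}\,\hat\sigma\sqrt{\frac{2}{qr}},
\]
respectively, where
\[
\mathbf{R} = \mathrm{var}\!\left[\left(\frac{\bar y_{1\cdot\cdot} - \bar y_{2\cdot\cdot}}{\sigma\sqrt{2/mr}}, \dots, \frac{\bar y_{(q-1)\cdot\cdot} - \bar y_{q\cdot\cdot}}{\sigma\sqrt{2/mr}},\ \frac{\bar y_{\cdot 1\cdot} - \bar y_{\cdot 2\cdot}}{\sigma\sqrt{2/qr}}, \dots, \frac{\bar y_{\cdot(m-1)\cdot} - \bar y_{\cdot m\cdot}}{\sigma\sqrt{2/qr}}\right)^T\right].
\]

(d) Because the differences $\alpha_i - \alpha_{i'}$ $(i' > i = 1,\dots,q)$ and $\gamma_j - \gamma_{j'}$ $(j' > j = 1,\dots,m)$ cannot be represented as all possible pairwise differences of a set of linearly independent estimable functions, Tukey's method cannot be applied directly. However, observe that $\psi_i \equiv \mu + \alpha_i + \frac{1}{m}\sum_{j=1}^m \gamma_j$ $(i = 1,\dots,q)$ are estimable and $\alpha_i - \alpha_{i'}$ $(i' > i = 1,\dots,q)$ are all possible differences among $\psi_1,\dots,\psi_q$. The least squares estimator of $\psi_i$ is
\[
\hat\psi_i = \bar y_{\cdots} + (\bar y_{i\cdot\cdot} - \bar y_{\cdots}) + \frac{1}{m}\sum_{j=1}^m(\bar y_{\cdot j\cdot} - \bar y_{\cdots}) = \bar y_{i\cdot\cdot} \qquad (i = 1,\dots,q)
\]
and
\[
\mathrm{var}[(\hat\psi_1,\dots,\hat\psi_q)^T] = \frac{\sigma^2}{mr}\,\mathbf{I}.
\]
Therefore, the $100(1-\xi_1)\%$ simultaneous Tukey intervals for $\alpha_i - \alpha_{i'}$ $(i' > i = 1,\dots,q)$ are given by
\[
(\bar y_{i\cdot\cdot} - \bar y_{i'\cdot\cdot}) \pm q^*_{\xi_1,\,q,\,qmr-q-m+1}\,\hat\sigma\sqrt{1/mr}.
\]
By a very similar argument, the $100(1-\xi_2)\%$ simultaneous Tukey intervals for $\gamma_j - \gamma_{j'}$ $(j' > j = 1,\dots,m)$ are given by
\[
(\bar y_{\cdot j\cdot} - \bar y_{\cdot j'\cdot}) \pm q^*_{\xi_2,\,m,\,qmr-q-m+1}\,\hat\sigma\sqrt{1/qr}.
\]
Finally, let
\[
E_1 = \{(\alpha_i - \alpha_{i'}) \in (\bar y_{i\cdot\cdot} - \bar y_{i'\cdot\cdot}) \pm q^*_{\xi_1,\,q,\,qmr-q-m+1}\,\hat\sigma\sqrt{1/mr} \ \ (i' > i = 1,\dots,q)\},
\]
\[
E_2 = \{(\gamma_j - \gamma_{j'}) \in (\bar y_{\cdot j\cdot} - \bar y_{\cdot j'\cdot}) \pm q^*_{\xi_2,\,m,\,qmr-q-m+1}\,\hat\sigma\sqrt{1/qr} \ \ (j' > j = 1,\dots,m)\}.
\]
By the Bonferroni inequality, $\Pr(E_1 \cap E_2) \ge 1 - \xi_1 - \xi_2$.

(e)

| Intervals | $E(L^2)/\sigma^2$ for $\alpha$-difference | $E(L^2)/\sigma^2$ for $\gamma$-difference |
|---|---|---|
| One-at-a-time | $(8/mr)\,t^2_{\xi/2,\,qmr-q-m+1}$ | $(8/qr)\,t^2_{\xi/2,\,qmr-q-m+1}$ |
| Bonferroni | $(8/mr)\,t^2_{\xi/[q(q-1)+m(m-1)],\,qmr-q-m+1}$ | $(8/qr)\,t^2_{\xi/[q(q-1)+m(m-1)],\,qmr-q-m+1}$ |
| Scheffé | $(8/mr)(q+m-2)F_{\xi,\,q+m-2,\,qmr-q-m+1}$ | $(8/qr)(q+m-2)F_{\xi,\,q+m-2,\,qmr-q-m+1}$ |
| Multivariate t | $(8/mr)\,t^2_{\xi/2,\,[q(q-1)+m(m-1)]/2,\,qmr-q-m+1,\,\mathbf{R}}$ | $(8/qr)\,t^2_{\xi/2,\,[q(q-1)+m(m-1)]/2,\,qmr-q-m+1,\,\mathbf{R}}$ |

(f)

| Intervals | $[E(L^2)/\sigma^2]^{1/2}$ for $\alpha$-difference | $[E(L^2)/\sigma^2]^{1/2}$ for $\gamma$-difference |
|---|---|---|
| One-at-a-time | 2.756 | 3.081 |
| Bonferroni | 4.660 | 5.210 |
| Scheffé | 5.712 | 6.386 |
| Multivariate t (using $\mathbf{R} = \mathbf{I}$) | 4.517 | 5.050 |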
The multivariate t intervals are recommended, despite using $\mathbf{I}$ in place of $\mathbf{R}$, because they have the shortest expected squared length among the three sets of simultaneous confidence intervals.

Exercise 23 Consider data in a $3 \times 3$ layout that follow a normal Gauss–Markov two-way model with interaction, and suppose that each cell contains exactly $r$ observations. Thus, the model is
\[
y_{ijk} = \mu + \alpha_i + \gamma_j + \xi_{ij} + e_{ijk} \qquad (i = 1,\dots,3;\ j = 1,\dots,3;\ k = 1,\dots,r),
\]
where the $e_{ijk}$'s are independent $N(0, \sigma^2)$ random variables.

(a) List the "essentially different" interaction contrasts $\xi_{ij} - \xi_{ij'} - \xi_{i'j} + \xi_{i'j'}$ within this layout and their ordinary least squares estimators. (A list of interaction contrasts is "essentially different" if no contrast in the list is a scalar multiple of another.)

(b) Give expressions for one-at-a-time $100(1-\xi)\%$ confidence intervals for the essentially different interaction contrasts.

(c) Use each of the Bonferroni, Scheffé, and multivariate t methods to obtain confidence intervals for the essentially different interaction contrasts that have probability of simultaneous coverage at least $1-\xi$. For the multivariate t method, give the numerical entries of $\mathbf{R}$.

(d) Explain why Tukey's method cannot be used to obtain simultaneous confidence intervals for the essentially different interaction contrasts that have probability of simultaneous coverage at least $1-\xi$.

(e) For each of the four sets of intervals for the essentially different interaction contrasts obtained in parts (b) and (c), determine $E(L^2)/\sigma^2$, where $L$ represents the length of each interval in the set.

(f) Compute $[E(L^2)/\sigma^2]^{1/2}$ for each of the four sets of confidence intervals when $\xi = 0.05$ and $r = 3$. Which of the three sets of simultaneous confidence intervals would you recommend in this case?

Solution (a) There are nine essentially different interaction contrasts, as follows:
\[
\begin{array}{lll}
\xi_{11} - \xi_{12} - \xi_{21} + \xi_{22}, & \xi_{11} - \xi_{13} - \xi_{21} + \xi_{23}, & \xi_{11} - \xi_{12} - \xi_{31} + \xi_{32}, \\
\xi_{11} - \xi_{13} - \xi_{31} + \xi_{33}, & \xi_{12} - \xi_{13} - \xi_{22} + \xi_{23}, & \xi_{12} - \xi_{13} - \xi_{32} + \xi_{33}, \\
\xi_{21} - \xi_{22} - \xi_{31} + \xi_{32}, & \xi_{21} - \xi_{23} - \xi_{31} + \xi_{33}, & \xi_{22} - \xi_{23} - \xi_{32} + \xi_{33}.
\end{array}
\]
The ordinary least squares estimator of $\xi_{ij} - \xi_{ij'} - \xi_{i'j} + \xi_{i'j'}$ is $\hat\tau_{iji'j'} \equiv \bar y_{ij\cdot} - \bar y_{ij'\cdot} - \bar y_{i'j\cdot} + \bar y_{i'j'\cdot}$.

(b) One-at-a-time $100(1-\xi)\%$ confidence intervals for the essentially different interaction contrasts are given by
\[
\hat\tau_{iji'j'} \pm t_{\xi/2,\,9(r-1)}\,\hat\sigma\sqrt{4/r}.
\]

(c) The Bonferroni intervals are given by
\[
\hat\tau_{iji'j'} \pm t_{\xi/18,\,9(r-1)}\,\hat\sigma\sqrt{4/r}.
\]
To determine the Scheffé intervals, we must find a basis for the essentially different interaction contrasts. The first four interaction contrasts in the list given in the solution to part (a) are linearly independent because $\xi_{22}$, $\xi_{23}$, $\xi_{32}$, and $\xi_{33}$ each appear only once among those four contrasts. Furthermore, each of the other five contrasts can be written as a linear combination of the first four, as follows:
\[
\begin{aligned}
\xi_{12} - \xi_{13} - \xi_{22} + \xi_{23} &= (\xi_{11} - \xi_{13} - \xi_{21} + \xi_{23}) - (\xi_{11} - \xi_{12} - \xi_{21} + \xi_{22}), \\
\xi_{12} - \xi_{13} - \xi_{32} + \xi_{33} &= (\xi_{11} - \xi_{13} - \xi_{31} + \xi_{33}) - (\xi_{11} - \xi_{12} - \xi_{31} + \xi_{32}), \\
\xi_{21} - \xi_{22} - \xi_{31} + \xi_{32} &= (\xi_{11} - \xi_{12} - \xi_{31} + \xi_{32}) - (\xi_{11} - \xi_{12} - \xi_{21} + \xi_{22}), \\
\xi_{21} - \xi_{23} - \xi_{31} + \xi_{33} &= (\xi_{11} - \xi_{13} - \xi_{31} + \xi_{33}) - (\xi_{11} - \xi_{13} - \xi_{21} + \xi_{23}), \\
\xi_{22} - \xi_{23} - \xi_{32} + \xi_{33} &= (\xi_{11} - \xi_{13} - \xi_{31} + \xi_{33}) - (\xi_{11} - \xi_{12} - \xi_{31} + \xi_{32}) \\
&\quad - (\xi_{11} - \xi_{13} - \xi_{21} + \xi_{23}) + (\xi_{11} - \xi_{12} - \xi_{21} + \xi_{22}).
\end{aligned}
\]
Thus the number of vectors in a basis for all nine essentially different interaction contrasts is four, and the Scheffé intervals are given by
\[
\hat\tau_{iji'j'} \pm \sqrt{4F_{\xi,\,4,\,9(r-1)}}\,\hat\sigma\sqrt{4/r}.
\]
The multivariate t intervals are given by
\[
\hat\tau_{iji'j'} \pm t_{\xi/2,\,9,\,9(r-1),\,\mathbf{R}}\,\hat\sigma\sqrt{4/r},
\]
where
\[
\mathbf{R} = \begin{pmatrix}
1.00 & 0.50 & 0.50 & 0.25 & 0.50 & 0.25 & 0.50 & 0.25 & 0.25 \\
0.50 & 1.00 & 0.25 & 0.50 & 0.50 & 0.25 & 0.25 & 0.50 & 0.25 \\
0.50 & 0.25 & 1.00 & 0.50 & 0.25 & 0.50 & 0.50 & 0.25 & 0.25 \\
0.25 & 0.50 & 0.50 & 1.00 & 0.25 & 0.50 & 0.25 & 0.50 & 0.25 \\
0.50 & 0.50 & 0.25 & 0.25 & 1.00 & 0.50 & 0.25 & 0.25 & 0.50 \\
0.25 & 0.25 & 0.50 & 0.50 & 0.50 & 1.00 & 0.50 & 0.25 & 0.50 \\
0.50 & 0.25 & 0.50 & 0.25 & 0.25 & 0.50 & 1.00 & 0.50 & 0.50 \\
0.25 & 0.50 & 0.25 & 0.50 & 0.25 & 0.25 & 0.50 & 1.00 & 0.50 \\
0.25 & 0.25 & 0.25 & 0.25 & 0.50 & 0.50 & 0.50 & 0.50 & 1.00
\end{pmatrix}.
\]

(d) Tukey's method is not applicable because it is not possible to write the essentially different interaction contrasts as all pairwise differences of estimable functions whose least squares estimators have a variance–covariance matrix equal to a scalar multiple of an identity matrix.
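The basis count behind the Scheffé multiplier in part (c) can be confirmed by exact elimination. The sketch below is illustrative only: the helper names `rank` and `contrast` are hypothetical, and each contrast is represented by its coefficients on the nine interaction effects $\xi_{11}, \xi_{12}, \dots, \xi_{33}$ in row-major order, listed in the same order as in part (a).

```python
from fractions import Fraction


def rank(rows):
    """Rank of a list of equal-length rows, by exact Gaussian elimination."""
    mat = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    for col in range(len(mat[0])):
        pivot = next((i for i in range(rk, len(mat)) if mat[i][col] != 0), None)
        if pivot is None:
            continue
        mat[rk], mat[pivot] = mat[pivot], mat[rk]
        for i in range(len(mat)):
            if i != rk and mat[i][col] != 0:
                factor = mat[i][col] / mat[rk][col]
                mat[i] = [a - factor * b for a, b in zip(mat[i], mat[rk])]
        rk += 1
    return rk


def contrast(i, ip, j, jp):
    """Coefficients of xi_ij - xi_ij' - xi_i'j + xi_i'j' (1-based indices)."""
    c = [0] * 9
    c[3 * (i - 1) + (j - 1)] += 1
    c[3 * (i - 1) + (jp - 1)] -= 1
    c[3 * (ip - 1) + (j - 1)] -= 1
    c[3 * (ip - 1) + (jp - 1)] += 1
    return c


# The nine contrasts in the order listed in the solution to part (a),
# given as (i, i', j, j').
order = [
    (1, 2, 1, 2), (1, 2, 1, 3), (1, 3, 1, 2), (1, 3, 1, 3), (1, 2, 2, 3),
    (1, 3, 2, 3), (2, 3, 1, 2), (2, 3, 1, 3), (2, 3, 2, 3),
]
contrasts = [contrast(*idx) for idx in order]

rank_all = rank(contrasts)         # dimension of the span of all nine
rank_first4 = rank(contrasts[:4])  # the first four are a candidate basis

# Last identity in the text: c9 = c4 - c3 - c2 + c1.
c1, c2, c3, c4 = contrasts[:4]
combo = [d - c - b + a for a, b, c, d in zip(c1, c2, c3, c4)]

print(rank_all, rank_first4, combo == contrasts[8])
```

Because the arithmetic uses `Fraction`, the rank is exact rather than subject to floating-point tolerance.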
(e)

| Intervals | Expected squared length$/\sigma^2$ |
|---|---|
| One-at-a-time | $(16/r)\,t^2_{\xi/2,\,9(r-1)}$ |
| Bonferroni | $(16/r)\,t^2_{\xi/18,\,9(r-1)}$ |
| Scheffé | $(64/r)\,F_{\xi,\,4,\,9(r-1)}$ |
| Multivariate t | $(16/r)\,t^2_{\xi/2,\,9,\,9(r-1),\,\mathbf{R}}$ |

(f)

| Intervals | $[E(L^2)/\sigma^2]^{1/2}$ |
|---|---|
| One-at-a-time | 4.85 |
| Bonferroni | 7.27 |
| Scheffé | 7.90 |
| Multivariate t | 6.96 |

The multivariate t intervals are recommended because they have the shortest expected squared length among the three sets of simultaneous confidence intervals.

Exercise 24 Consider the two-factor nested model
\[
y_{ijk} = \mu + \alpha_i + \gamma_{ij} + e_{ijk} \qquad (i = 1,\dots,q;\ j = 1,\dots,m_i;\ k = 1,\dots,n_{ij}),
\]
where the $e_{ijk}$'s are independent $N(0, \sigma^2)$ random variables. Suppose that $100(1-\xi)\%$ simultaneous confidence intervals for all of the differences $\gamma_{ij} - \gamma_{ij'}$ $(i = 1,\dots,q;\ j' > j = 1,\dots,m_i)$ are desired.

(a) Give $100(1-\xi)\%$ one-at-a-time confidence intervals for $\gamma_{ij} - \gamma_{ij'}$ $(j' > j = 1,\dots,m_i;\ i = 1,\dots,q)$.

(b) For each of the Bonferroni, Scheffé, and multivariate t approaches to this problem, the desired confidence intervals can be expressed as
\[
(\hat\gamma_{ij} - \hat\gamma_{ij'}) \pm a\,\hat\sigma\sqrt{v_{ijj'}},
\]
where $\hat\gamma_{ij} - \hat\gamma_{ij'}$ is the least squares estimator of $\gamma_{ij} - \gamma_{ij'}$, $\hat\sigma^2$ is the residual mean square, $v_{ijj'} = \mathrm{var}(\hat\gamma_{ij} - \hat\gamma_{ij'})/\sigma^2$, and $a$ is a percentage point from an appropriate distribution. For each of the three approaches, give $a$ (indexed by its tail probability, degree(s) of freedom, and any other parameters used to index the appropriate distribution). Note: You need not give expressions for the elements of the correlation matrix $\mathbf{R}$ for the multivariate t method.

(c) Consider the case $q = 2$ and suppose that $n_{ij}$ is constant (equal to $r$) across $j$. Although Tukey's method is not directly applicable to this problem, it can be
used to obtain $100(1-\xi_1)\%$ simultaneous confidence intervals for $\gamma_{1j} - \gamma_{1j'}$ $(j' > j = 1,\dots,m_1)$ and then used a second time to obtain $100(1-\xi_2)\%$ simultaneous confidence intervals for $\gamma_{2j} - \gamma_{2j'}$ $(j' > j = 1,\dots,m_2)$. If $\xi_1$ and $\xi_2$ are chosen appropriately, these intervals can then be combined, using Bonferroni's inequality, to obtain a set of intervals for all of the differences $\gamma_{ij} - \gamma_{ij'}$ $(j' > j)$ whose simultaneous coverage probability is at least $1-\xi_1-\xi_2$. Give appropriate cutoff point(s) for this final set of intervals. (Note: The cutoff point for some of the intervals in this final set may not be the same as the cutoff point for other intervals in the set. Be sure to indicate the intervals to which each cutoff point corresponds.)

(d) For each of the four sets of intervals for the $\gamma$-differences obtained in parts (a) and (b), determine an expression for $E(L^2/\sigma^2)$, where $L$ represents the length of each interval in the set. Do likewise for the method described in part (c), with $\xi_1 = \xi_2 = \xi/2$. For the multivariate t approach, replace $\mathbf{R}$ with $\mathbf{I}$ to obtain conservative intervals.

(e) Evaluate $[E(L^2)/\sigma^2]^{1/2}$ for each of the four sets of confidence intervals when $q = 2$, $m_1 = m_2 = 5$, $n_{ij} = 2$ for all $i$ and $j$, and $\xi = .05$. Based on these results, which set of simultaneous confidence intervals would you recommend in this case?

Solution (a) According to the solution to Exercise 7.13, for $i \le s$, $j \le t$, $j' > j = 1,\dots,m_i$, $t' > t = 1,\dots,m_s$,
\[
\mathrm{cov}(\bar y_{ij\cdot} - \bar y_{ij'\cdot},\ \bar y_{st\cdot} - \bar y_{st'\cdot}) =
\begin{cases}
\sigma^2\left(\dfrac{1}{n_{ij}} + \dfrac{1}{n_{ij'}}\right) & \text{if } i = s,\ j = t,\ j' = t', \\[1ex]
\dfrac{\sigma^2}{n_{ij}} & \text{if } i = s,\ j = t,\ j' \ne t', \\[1ex]
\dfrac{\sigma^2}{n_{ij'}} & \text{if } i = s,\ j \ne t,\ j' = t', \\[1ex]
-\dfrac{\sigma^2}{n_{ij}} & \text{if } i = s \text{ and } j = t', \\[1ex]
-\dfrac{\sigma^2}{n_{ij'}} & \text{if } i = s \text{ and } j' = t, \\[1ex]
0 & \text{otherwise.}
\end{cases}
\]
Therefore, using (15.1),
\[
(\bar y_{ij\cdot} - \bar y_{ij'\cdot}) \pm t_{\xi/2,\,n-\sum_{i=1}^q m_i}\,\hat\sigma\sqrt{\frac{1}{n_{ij}} + \frac{1}{n_{ij'}}}
\]
is a $100(1-\xi)\%$ confidence interval for $\gamma_{ij} - \gamma_{ij'}$ $(i = 1,\dots,q;\ j' > j = 1,\dots,m_i)$.

(b) For the Bonferroni approach,
\[
a = t_{\xi/\sum_{i=1}^q m_i(m_i-1),\ n-\sum_{i=1}^q m_i}.
\]
For the Scheffé approach, observe that for fixed $i$ the $m_i(m_i-1)/2$ differences $\{\gamma_{ij} - \gamma_{ij'} : j' > j = 1,\dots,m_i\}$ are linearly dependent, but any basis for them consists of $m_i - 1$ linearly independent differences, and for different $i$ these bases are linearly independent. It follows that for the Scheffé approach,
\[
a = \sqrt{\sum_{i=1}^q (m_i - 1)\,F_{\xi,\ \sum_{i=1}^q (m_i-1),\ n-\sum_{i=1}^q m_i}}.
\]
For the multivariate t approach,
\[
a = t_{\xi/2,\ \sum_{i=1}^q m_i(m_i-1)/2,\ n-\sum_{i=1}^q m_i,\ \mathbf{R}},
\]
where
\[
\mathbf{R} = \mathrm{var}\!\left[\left(\frac{\bar y_{11\cdot} - \bar y_{12\cdot}}{\sigma\sqrt{(1/n_{11}) + (1/n_{12})}}, \dots, \frac{\bar y_{11\cdot} - \bar y_{1m_1\cdot}}{\sigma\sqrt{(1/n_{11}) + (1/n_{1m_1})}},\ \frac{\bar y_{21\cdot} - \bar y_{22\cdot}}{\sigma\sqrt{(1/n_{21}) + (1/n_{22})}}, \dots, \frac{\bar y_{q,(m_q-1)\cdot} - \bar y_{qm_q\cdot}}{\sigma\sqrt{(1/n_{q,m_q-1}) + (1/n_{qm_q})}}\right)^T\right].
\]

(c) $\{(\gamma_{1j} - \gamma_{1j'}) : j' > j = 1,\dots,m_1\}$ can be represented as all pairwise differences of the linearly independent estimable functions $\psi_{11}, \psi_{12}, \dots, \psi_{1m_1}$, where $\psi_{1j} = \mu + \alpha_1 + \gamma_{1j}$ $(j = 1,\dots,m_1)$. The least squares estimator of $\psi_{1j}$ is $\bar y_{1j\cdot}$ and
\[
\mathrm{var}[(\bar y_{11\cdot},\dots,\bar y_{1m_1\cdot})^T] = \frac{\sigma^2}{r}\,\mathbf{I}.
\]
Therefore, the $100(1-\xi_1)\%$ simultaneous Tukey intervals for $\gamma_{1j} - \gamma_{1j'}$ $(j' > j = 1,\dots,m_1)$ are given by
\[
(\bar y_{1j\cdot} - \bar y_{1j'\cdot}) \pm q^*_{\xi_1,\ m_1-1,\ n-\sum_{i=1}^q m_i}\,\hat\sigma\sqrt{1/r} \qquad (j' > j = 1,\dots,m_1).
\]
By a very similar argument, the $100(1-\xi_2)\%$ simultaneous Tukey intervals for $\gamma_{2j} - \gamma_{2j'}$ $(j' > j = 1,\dots,m_2)$ are given by
\[
(\bar y_{2j\cdot} - \bar y_{2j'\cdot}) \pm q^*_{\xi_2,\ m_2-1,\ n-\sum_{i=1}^q m_i}\,\hat\sigma\sqrt{1/r} \qquad (j' > j = 1,\dots,m_2).
\]
Finally, let
\[
E_1 = \{(\gamma_{1j} - \gamma_{1j'}) \in (\bar y_{1j\cdot} - \bar y_{1j'\cdot}) \pm q^*_{\xi_1,\ m_1-1,\ n-\sum_{i=1}^q m_i}\,\hat\sigma\sqrt{1/r} \ \ (j' > j = 1,\dots,m_1)\},
\]
\[
E_2 = \{(\gamma_{2j} - \gamma_{2j'}) \in (\bar y_{2j\cdot} - \bar y_{2j'\cdot}) \pm q^*_{\xi_2,\ m_2-1,\ n-\sum_{i=1}^q m_i}\,\hat\sigma\sqrt{1/r} \ \ (j' > j = 1,\dots,m_2)\}.
\]
By the Bonferroni inequality, $\Pr(E_1 \cap E_2) \ge 1 - \xi_1 - \xi_2$.
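The indices attached to the percentage points in parts (a) through (c) are easy to get wrong by hand, so it may help to tabulate them for a given layout. The following sketch is an illustration only; the function name `interval_indices` and its dictionary keys are hypothetical, and it computes only the tail probabilities and degrees of freedom, not the percentage points themselves.

```python
def interval_indices(m_sizes, n_total, xi):
    """Index bookkeeping for the nested-model intervals.

    m_sizes : list of m_i (number of nested levels within each level i)
    n_total : total number of observations n
    xi      : nominal simultaneous non-coverage probability
    """
    n_pairs = sum(mi * (mi - 1) // 2 for mi in m_sizes)  # number of differences
    return {
        "n_intervals": n_pairs,
        "residual_df": n_total - sum(m_sizes),           # n - sum of m_i
        "one_at_a_time_tail": xi / 2,
        "bonferroni_tail": xi / (2 * n_pairs),           # xi / sum m_i(m_i - 1)
        "scheffe_numerator_df": sum(mi - 1 for mi in m_sizes),
        "mvt_dimension": n_pairs,
    }


# Layout of part (e): q = 2, m_1 = m_2 = 5, n_ij = 2, so n = 20.
info = interval_indices([5, 5], n_total=20, xi=0.05)
for key, value in info.items():
    print(key, value)
```

For the layout of part (e) this gives 20 differences, 10 residual degrees of freedom, and 8 numerator degrees of freedom for the Scheffé multiplier.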
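(d)

| Intervals | $E(L^2)/\sigma^2$ |
|---|---|
| One-at-a-time | $4t^2_{\xi/2,\ n-\sum_{i=1}^q m_i}\left(\frac{1}{n_{ij}} + \frac{1}{n_{ij'}}\right)$ |
| Bonferroni | $4t^2_{\xi/\sum_{i=1}^q m_i(m_i-1),\ n-\sum_{i=1}^q m_i}\left(\frac{1}{n_{ij}} + \frac{1}{n_{ij'}}\right)$ |
| Scheffé | $4\left[\sum_{i=1}^q (m_i-1)\right]F_{\xi,\ \sum_{i=1}^q (m_i-1),\ n-\sum_{i=1}^q m_i}\left(\frac{1}{n_{ij}} + \frac{1}{n_{ij'}}\right)$ |
| Multivariate t (with $\mathbf{I}$ in place of $\mathbf{R}$) | $4t^2_{\xi/2,\ \sum_{i=1}^q m_i(m_i-1)/2,\ n-\sum_{i=1}^q m_i,\ \mathbf{I}}\left(\frac{1}{n_{ij}} + \frac{1}{n_{ij'}}\right)$ |
| Tukey (with $q = 2$ and $n_{ij} \equiv r$) | $4q^{*2}_{\xi/2,\ m_i-1,\ n-(m_1+m_2)}\left(\frac{1}{r}\right)$ |

(e)

| Intervals | $[E(L^2)/\sigma^2]^{1/2}$ |
|---|---|
| One-at-a-time | 4.456 |
| Bonferroni | 8.009 |
| Scheffé | 9.914 |
| Multivariate t (with $\mathbf{I}$ in place of $\mathbf{R}$) | 7.645 |
| Tukey | 6.990 |

The Tukey intervals are recommended because they have the shortest expected squared length among the sets of simultaneous confidence intervals.

Exercise 25 Under the normal Gauss–Markov two-way partially crossed model introduced in Example 5.1.4-1 with one observation per cell:

(a) Give $100(1-\xi)\%$ one-at-a-time confidence intervals for $\alpha_i - \alpha_j$ and $\mu + \alpha_i - \alpha_j$.

(b) Give $100(1-\xi)\%$ Bonferroni, Scheffé, and multivariate t simultaneous confidence intervals for $\{\alpha_i - \alpha_j : j > i = 1,\dots,q\}$.

Solution (a) Using results from the solution to Exercise 7.14, we find that a $100(1-\xi)\%$ confidence interval for $\alpha_i - \alpha_j$ is given by
\[
\frac{y_{i\cdot} - y_{\cdot i} - y_{j\cdot} + y_{\cdot j}}{2q} \pm t_{\xi/2,\,q(q-2)}\,\hat\sigma\sqrt{\frac{1}{q}},
\]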
and that a $100(1-\xi)\%$ confidence interval for $\mu + \alpha_i - \alpha_j$ is given by
\[
\bar y_{\cdot\cdot} + \frac{y_{i\cdot} - y_{\cdot i} - y_{j\cdot} + y_{\cdot j}}{2q} \pm t_{\xi/2,\,q(q-2)}\,\hat\sigma\sqrt{\frac{1}{q-1}}
\]
because $\frac{1}{q(q-1)} + \frac{1}{q} = \frac{1}{q-1}$.

(b) The $100(1-\xi)\%$ Bonferroni intervals for all differences $\alpha_i - \alpha_j$ $(j > i = 1,\dots,q)$ may be obtained from the one-at-a-time intervals for the same quantities by replacing $t_{\xi/2,\,q(q-2)}$ with $t_{\xi/[(q-1)q],\,q(q-2)}$; the Scheffé intervals may be obtained by replacing the same quantity with $\sqrt{(q-1)F_{\xi,\,q-1,\,q(q-2)}}$; and the multivariate t intervals may be obtained by replacing the same quantity with $t_{\xi/2,\,(q-1)q/2,\,q(q-2),\,\mathbf{R}}$, where, using the solution to Exercise 7.14b, the $(ij, st)$th element of $\mathbf{R}$ $(i \le s$, $j > i = 1,\dots,q$, $t > s = 1,\dots,q)$ is given by
\[
\mathrm{corr}\!\left(\frac{y_{i\cdot} - y_{\cdot i} - y_{j\cdot} + y_{\cdot j}}{2q},\ \frac{y_{s\cdot} - y_{\cdot s} - y_{t\cdot} + y_{\cdot t}}{2q}\right) =
\begin{cases}
1 & \text{if } i = s \text{ and } j = t, \\
0.5 & \text{if } i = s \text{ and } j \ne t, \text{ or } i \ne s \text{ and } j = t, \\
-0.5 & \text{if } j = s, \\
0 & \text{otherwise.}
\end{cases}
\]

Exercise 26 Under the normal Gauss–Markov Latin square model with $q$ treatments introduced in Exercise 6.21:

(a) Give $100(1-\xi)\%$ one-at-a-time confidence intervals for $\tau_k - \tau_{k'}$ $(k' > k = 1,\dots,q)$.

(b) Give $100(1-\xi)\%$ Bonferroni, Scheffé, and multivariate t simultaneous confidence intervals for $\{\tau_k - \tau_{k'} : k' > k = 1,\dots,q\}$.

Solution (a) Using results from the solution to Exercise 7.15, we find that a $100(1-\xi)\%$ confidence interval for $\tau_k - \tau_{k'}$ is given by
\[
\bar y_{\cdot\cdot k} - \bar y_{\cdot\cdot k'} \pm t_{\xi/2,\,(q-1)(q-2)}\,\hat\sigma\sqrt{\frac{2}{q}}.
\]

(b) The $100(1-\xi)\%$ Bonferroni intervals for all differences $\tau_k - \tau_{k'}$ may be obtained from the one-at-a-time intervals for the same quantities by replacing $t_{\xi/2,\,(q-1)(q-2)}$ with $t_{\xi/[(q-1)q],\,(q-1)(q-2)}$; the Scheffé intervals may be obtained by replacing the same quantity with $\sqrt{(q-1)F_{\xi,\,q-1,\,(q-1)(q-2)}}$; and the multivariate t intervals may be obtained by replacing the same quantity with
$t_{\xi/2,\,(q-1)q/2,\,(q-1)(q-2),\,\mathbf{R}}$, where the $(kk', ll')$th element of $\mathbf{R}$ (for $k \le l$, $k' > k = 1,\dots,q$, $l' > l = 1,\dots,q$) is given by
\[
\mathrm{corr}(\bar y_{\cdot\cdot k} - \bar y_{\cdot\cdot k'},\ \bar y_{\cdot\cdot l} - \bar y_{\cdot\cdot l'}) =
\begin{cases}
1 & \text{if } k = l \text{ and } k' = l', \\
0.5 & \text{if } k = l \text{ and } k' \ne l', \text{ or } k \ne l \text{ and } k' = l', \\
-0.5 & \text{if } k' = l, \\
0 & \text{otherwise.}
\end{cases}
\]

Exercise 27 Consider the normal split-plot model introduced in Example 13.4.5-1, but with $\psi \equiv \sigma_b^2/\sigma^2 > 0$ known, and recall the expressions obtained for variances of various mean differences obtained in Exercise 13.25. Let $\tilde\sigma^2$ denote the generalized residual mean square.

(a) Obtain $100(1-\xi)\%$ one-at-a-time confidence intervals for the functions in the sets $\{(\alpha_i - \alpha_{i'}) + (\bar\xi_{i\cdot} - \bar\xi_{i'\cdot}) : i' > i = 1,\dots,q\}$ and $\{(\gamma_k - \gamma_{k'}) + (\bar\xi_{\cdot k} - \bar\xi_{\cdot k'}) : k' > k = 1,\dots,m\}$. [Note that $\bar y_{i\cdot\cdot} - \bar y_{i'\cdot\cdot}$ is the least squares estimator of $(\alpha_i - \alpha_{i'}) + (\bar\xi_{i\cdot} - \bar\xi_{i'\cdot})$, and $\bar y_{\cdot\cdot k} - \bar y_{\cdot\cdot k'}$ is the least squares estimator of $(\gamma_k - \gamma_{k'}) + (\bar\xi_{\cdot k} - \bar\xi_{\cdot k'})$.]

(b) Using the Bonferroni method, obtain a set of confidence intervals whose simultaneous coverage probability for all of the estimable functions listed in part (a) is at least $1-\xi$.

(c) Using the Scheffé method, obtain a set of confidence intervals whose simultaneous coverage probability for the functions $\{(\alpha_i - \alpha_{i'}) + (\bar\xi_{i\cdot} - \bar\xi_{i'\cdot}) : i' > i = 1,\dots,q\}$ and all linear combinations of those functions is $1-\xi$.

Solution (a) Using various results in Example 13.4.5-1 and Exercise 13.25, we find that $100(1-\xi)\%$ one-at-a-time confidence intervals for the functions in the set $\{(\alpha_i - \alpha_{i'}) + (\bar\xi_{i\cdot} - \bar\xi_{i'\cdot}) : i' > i = 1,\dots,q\}$ are given by
\[
(\bar y_{i\cdot\cdot} - \bar y_{i'\cdot\cdot}) \pm t_{\xi/2,\,q(r-1)(m-1)}\,\tilde\sigma\sqrt{\frac{2}{rm}(m\psi + 1)}.
\]
Similarly, $100(1-\xi)\%$ one-at-a-time confidence intervals for the functions in the set $\{(\gamma_k - \gamma_{k'}) + (\bar\xi_{\cdot k} - \bar\xi_{\cdot k'}) : k' > k = 1,\dots,m\}$ are given by
\[
(\bar y_{\cdot\cdot k} - \bar y_{\cdot\cdot k'}) \pm t_{\xi/2,\,q(r-1)(m-1)}\,\tilde\sigma\sqrt{\frac{2}{qr}}.
\]

(b) The estimable functions listed in part (a) number $[q(q-1) + m(m-1)]/2$ in total. In order to obtain simultaneous confidence intervals for all of them using the Bonferroni method, it suffices to replace the t-multiplier $t_{\xi/2,\,q(r-1)(m-1)}$ used in part (a) by $t_{\xi/(2s),\,q(r-1)(m-1)}$, where $s = [q(q-1) + m(m-1)]/2$.
(c) Note that a basis for the set of estimable functions $\{(\alpha_i - \alpha_{i'}) + (\bar\xi_{i\cdot} - \bar\xi_{i'\cdot}) : i' > i = 1,\dots,q\}$ is given by the $q-1$ linearly independent estimable functions $\{(\alpha_i - \alpha_1) + (\bar\xi_{i\cdot} - \bar\xi_{1\cdot}) : i = 2,\dots,q\}$. Hence, the $100(1-\xi)\%$ Scheffé intervals are
\[
\mathbf{c}^T\hat{\boldsymbol\beta} \pm \sqrt{(q-1)F_{\xi,\,q-1,\,q(r-1)(m-1)}\,\tilde\sigma^2\,\{\mathbf{c}^T(\mathbf{X}^T\mathbf{X})^-\mathbf{c}\}},
\]
where $\mathbf{c}^T\boldsymbol\beta$ is any linear combination of functions in the set $\{(\alpha_i - \alpha_{i'}) + (\bar\xi_{i\cdot} - \bar\xi_{i'\cdot}) : i' > i = 1,\dots,q\}$.

Exercise 28 Consider the normal Gauss–Markov no-intercept simple linear regression model
\[
y_i = \beta x_i + e_i \qquad (i = 1,\dots,n),
\]
where $n \ge 2$.

(a) Find the Scheffé-based $100(1-\xi)\%$ confidence band for the regression line, i.e., for $\{\beta x : x \in \mathbb{R}\}$.

(b) Describe the behavior of this band as a function of $x$ (for example, where is it narrowest and how narrow is it at its narrowest point?). Compare this to the behavior of the Scheffé-based $100(1-\xi)\%$ confidence band for the regression line in normal simple linear regression.

Solution (a) $\left\{\hat\beta x \pm t_{\xi/2,\,n-1}\,\hat\sigma\,|x|\Big/\sqrt{\sum_{i=1}^n x_i^2} \ \text{ for all } x \in \mathbb{R}\right\}$.

(b) This band is narrowest at $x = 0$, where its width is equal to 0. Its width is a monotone increasing function of $|x|$. These behaviors contrast with those of the band for the regression line in simple linear regression with an intercept, which is narrowest at $x = \bar x$, where its width is $\sqrt{2F_{\xi,2,n-2}\,\hat\sigma^2/n}$, and whose width is a monotone increasing function of $|x - \bar x|$.
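The contrast drawn in part (b) depends only on how the standard error of the fitted line varies with $x$. The sketch below compares these two shape factors; it is an illustration with hypothetical function names and made-up $x$ values, and it deliberately omits the critical values and $\hat\sigma$, which multiply the shape factor but do not depend on $x$.

```python
import math


def no_intercept_shape(x, xs):
    """|x| / sqrt(sum of x_i^2): standard error of beta_hat * x over sigma."""
    return abs(x) / math.sqrt(sum(v * v for v in xs))


def with_intercept_shape(x, xs):
    """sqrt(1/n + (x - xbar)^2 / SXX): standard error of the fitted ordinate
    over sigma in simple linear regression with an intercept."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((v - xbar) ** 2 for v in xs)
    return math.sqrt(1.0 / n + (x - xbar) ** 2 / sxx)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up design points, for illustration
grid = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

for x in grid:
    print(x, round(no_intercept_shape(x, xs), 4), round(with_intercept_shape(x, xs), 4))
```

On this grid the no-intercept factor vanishes at $x = 0$ and grows with $|x|$, while the with-intercept factor is smallest at $x = \bar x$, where it equals $\sqrt{1/n}$.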
Exercise 29 Consider the normal prediction-extended Gauss–Markov full-rank multiple regression model
\[
y_i = \mathbf{x}_i^T\boldsymbol\beta + e_i \qquad (i = 1,\dots,n+s),
\]
where $\mathbf{x}_1,\dots,\mathbf{x}_n$ are known $p$-vectors of explanatory variables (possibly including an intercept) but $\mathbf{x}_{n+1},\dots,\mathbf{x}_{n+s}$ $(s \ge 1)$ are $p$-vectors of the same explanatory variables that cannot be ascertained prior to actually observing $y_{n+1},\dots,y_{n+s}$ and must therefore be treated as unknown. This exercise considers the problem of
obtaining prediction intervals for $y_{n+1},\dots,y_{n+s}$ that have simultaneous coverage probability at least $1-\xi$, where $0 < \xi < 1$, in this scenario.

(a) Apply the Scheffé method to show that one solution to the problem consists of intervals of the form
\[
\left\{\mathbf{x}_{n+i}^T\hat{\boldsymbol\beta} \pm \hat\sigma\sqrt{(p+s)F_{\xi,\,p+s,\,n-p}\left[\mathbf{x}_{n+i}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}_{n+i} + 1\right]} \ \text{ for all } \mathbf{x}_{n+i} \in \mathbb{R}^p \text{ and all } i = 1,\dots,s\right\}.
\]

(b) Another possibility is to decompose $y_{n+i}$ into its two components $\mathbf{x}_{n+i}^T\boldsymbol\beta$ and $e_{n+i}$, obtain confidence or prediction intervals separately for each of these components by the Scheffé method, and then combine them via the Bonferroni inequality to achieve the desired simultaneous coverage probability. Verify that the resulting intervals are of the form
\[
\left\{\mathbf{x}_{n+i}^T\hat{\boldsymbol\beta} \pm \hat\sigma[a(\mathbf{x}_{n+i}) + b] \ \text{ for all } \mathbf{x}_{n+i} \in \mathbb{R}^p \text{ and all } i = 1,\dots,s\right\},
\]
where $a(\mathbf{x}_{n+i}) = \sqrt{pF_{\xi^*,\,p,\,n-p}\,\mathbf{x}_{n+i}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}_{n+i}}$ and $b = \sqrt{sF_{\xi-\xi^*,\,s,\,n-p}}$, and $0 < \xi^* < \xi$.

(c) Still another possibility is based on the fact that $(1/\hat\sigma)\mathbf{e}_+$, where $\mathbf{e}_+ = (e_{n+1},\dots,e_{n+s})^T$, has a certain multivariate t distribution under the assumed model. Prove this fact and use it to obtain a set of intervals for $y_{n+1}(\mathbf{x}_{n+1}),\dots,y_{n+s}(\mathbf{x}_{n+s})$ of the form
\[
\left\{\mathbf{x}_{n+i}^T\hat{\boldsymbol\beta} \pm \hat\sigma[a(\mathbf{x}_{n+i}) + c] \ \text{ for all } \mathbf{x}_{n+i} \in \mathbb{R}^p \text{ and all } i = 1,\dots,s\right\}
\]
whose simultaneous coverage probability is at least $1-\xi$, where $c$ is a quantity you should determine.

(d) Are the intervals obtained in part (b) uniformly narrower than the intervals obtained in part (c) or vice versa? Explain.

(e) The assumption that $e_1,\dots,e_{n+s}$ are mutually independent can be relaxed somewhat without affecting the validity of the intervals obtained in part (c). In precisely what manner can this independence assumption be relaxed?

Note: The approaches described in parts (a) and (b) were proposed by Carlstein (1986), and the approach described in part (c) was proposed by Zimmerman (1987).
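Solution (a) By Corollary 2.16.1.1,
\[
\frac{1}{(p+s)\hat\sigma^2}\max_{\mathbf{a}\ne\mathbf{0}}\left\{\frac{\left[\mathbf{a}^T\begin{pmatrix}\hat{\boldsymbol\beta} - \boldsymbol\beta \\ \mathbf{e}_+ - \mathbf{0}_s\end{pmatrix}\right]^2}{\mathbf{a}^T\begin{pmatrix}(\mathbf{X}^T\mathbf{X})^{-1} & \mathbf{0} \\ \mathbf{0} & \mathbf{I}_s\end{pmatrix}\mathbf{a}}\right\}
= \frac{\begin{pmatrix}\hat{\boldsymbol\beta} - \boldsymbol\beta \\ \mathbf{e}_+\end{pmatrix}^T\begin{pmatrix}\mathbf{X}^T\mathbf{X} & \mathbf{0} \\ \mathbf{0} & \mathbf{I}_s\end{pmatrix}\begin{pmatrix}\hat{\boldsymbol\beta} - \boldsymbol\beta \\ \mathbf{e}_+\end{pmatrix}}{(p+s)\hat\sigma^2}
\sim F(p+s,\ n-p),
\]
where the distributional result holds because
\[
\frac{1}{\sigma^2}\begin{pmatrix}\hat{\boldsymbol\beta} - \boldsymbol\beta \\ \mathbf{e}_+\end{pmatrix}^T\begin{pmatrix}\mathbf{X}^T\mathbf{X} & \mathbf{0} \\ \mathbf{0} & \mathbf{I}_s\end{pmatrix}\begin{pmatrix}\hat{\boldsymbol\beta} - \boldsymbol\beta \\ \mathbf{e}_+\end{pmatrix} \sim \chi^2(p+s)
\]
and $\begin{pmatrix}\hat{\boldsymbol\beta} \\ \mathbf{e}_+\end{pmatrix}$ is independent of $\hat\sigma^2$. Consideration of the subset of vectors $\mathbf{a} \in \mathbb{R}^{p+s}$ such that $\mathbf{a}^T \in \{(\mathbf{x}_{n+i}^T, \mathbf{u}_i^T) : \mathbf{x}_{n+i} \in \mathbb{R}^p,\ i = 1,\dots,s\}$ (where $\mathbf{u}_i$ is the $i$th unit $s$-vector) yields the specified prediction intervals, which have simultaneous coverage probability at least $1-\xi$.

(b) By Corollary 2.16.1.1,
\[
\frac{1}{p\hat\sigma^2}\max_{\mathbf{a}\ne\mathbf{0}}\left\{\frac{[\mathbf{a}^T(\hat{\boldsymbol\beta} - \boldsymbol\beta)]^2}{\mathbf{a}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{a}}\right\} = \frac{(\hat{\boldsymbol\beta} - \boldsymbol\beta)^T\mathbf{X}^T\mathbf{X}(\hat{\boldsymbol\beta} - \boldsymbol\beta)}{p\hat\sigma^2} \sim F(p,\ n-p)
\]
and
\[
\frac{1}{s\hat\sigma^2}\max_{\mathbf{b}\ne\mathbf{0}}\left\{\frac{[\mathbf{b}^T(\mathbf{e}_+ - \mathbf{0}_s)]^2}{\mathbf{b}^T\mathbf{I}^{-1}\mathbf{b}}\right\} = \frac{\mathbf{e}_+^T\mathbf{e}_+}{s\hat\sigma^2} \sim F(s,\ n-p).
\]
Rewriting $s$ choices of $\mathbf{a} \in \mathbb{R}^p$ as $\mathbf{x}_{n+1},\dots,\mathbf{x}_{n+s}$ and consideration of the subset of vectors $\mathbf{b} \in \mathbb{R}^s$ given by $\mathbf{b} \in \{\mathbf{u}_i : i = 1,\dots,s\}$ yield the specified separate intervals for $\{\mathbf{x}_{n+i}^T\boldsymbol\beta : \mathbf{x}_{n+i} \in \mathbb{R}^p,\ i = 1,\dots,s\}$ and the elements of $\mathbf{e}_+$, which have simultaneous coverage probability at least $1-\xi^*$ and at least $1-\xi+\xi^*$, respectively. Applying Bonferroni's inequality yields the final set of intervals, which have simultaneous coverage probability at least $1-\xi$.

(c) $(1/\sigma)\mathbf{e}_+ \sim N(\mathbf{0}, \mathbf{I})$ and, of course, $(n-p)\hat\sigma^2/\sigma^2$ is distributed independently as $\chi^2(n-p)$. Therefore,
\[
(1/\hat\sigma)\mathbf{e}_+ = \frac{(1/\sigma)\mathbf{e}_+}{\sqrt{\dfrac{(n-p)\hat\sigma^2}{\sigma^2(n-p)}}} \sim t(s,\ n-p,\ \mathbf{I}).
\]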
Consequently, for $\xi^* \in (0, \xi)$ we have
\[
1 - \xi + \xi^* = \Pr(|e_{n+i}| \le \hat\sigma\,t_{(\xi-\xi^*)/2,\,s,\,n-p,\,\mathbf{I}} \ \text{ for all } i = 1,\dots,s).
\]
By the same argument used in the solution to part (b), we obtain
\[
\Pr(y_{n+i} \in \mathbf{x}_{n+i}^T\hat{\boldsymbol\beta} \pm \hat\sigma[a(\mathbf{x}_{n+i}) + c] \ \text{ for all } \mathbf{x}_{n+i} \in \mathbb{R}^p \text{ and all } i = 1,\dots,s) \ge 1-\xi,
\]
where $c = t_{(\xi-\xi^*)/2,\,s,\,n-p,\,\mathbf{I}}$.

(d) By (15.14), $t_{(\xi-\xi^*)/2,\,s,\,n-p,\,\mathbf{I}} \le \sqrt{sF_{\xi-\xi^*,\,s,\,n-p}}$, so the intervals obtained in part (c) are uniformly narrower than the intervals obtained in part (b).

(e) The condition that $e_{n+1}, e_{n+2}, \dots, e_{n+s}$ are independent can be relaxed to allow arbitrary dependence among these variables.

Exercise 30 Consider the one-factor random effects model with balanced data
\[
y_{ij} = \mu + b_i + d_{ij} \qquad (i = 1,\dots,q;\ j = 1,\dots,r),
\]
described previously in more detail in Example 13.4.2-2. Recall from that example that the BLUP of $b_i - b_{i'}$ $(i \ne i')$ is
\[
\widetilde{b_i - b_{i'}} = \frac{r\psi}{r\psi + 1}(\bar y_{i\cdot} - \bar y_{i'\cdot}).
\]
Let $\tilde\sigma^2$ be the generalized residual mean square and suppose that the joint distribution of the $b_i$'s and $d_{ij}$'s is multivariate normal.

(a) Give $100(1-\xi)\%$ one-at-a-time prediction intervals for $b_i - b_{i'}$ $(i' > i = 1,\dots,q)$.

(b) Obtain multivariate t-based prediction intervals for all $q(q-1)/2$ pairwise differences $b_i - b_{i'}$ $(i' > i = 1,\dots,q)$ whose simultaneous coverage probability is $1-\xi$. Determine the elements of the appropriate matrix $\mathbf{R}$.

(c) Explain why Tukey's method is not directly applicable to the problem of obtaining simultaneous prediction intervals for all $q(q-1)/2$ pairwise differences $b_i - b_{i'}$ $(i' > i = 1,\dots,q)$.

Solution (a) According to the solution to Exercise 13.16a,
\[
\mathrm{var}[\widetilde{b_i - b_{i'}} - (b_i - b_{i'})] = \frac{2\psi\sigma^2}{r\psi + 1}.
\]
Thus, by Theorem 15.1.2,
\[
\widetilde{b_i - b_{i'}} \pm t_{\xi/2,qr-1}\,\tilde\sigma\sqrt{\frac{2\psi}{r\psi + 1}}
\]
is a $100(1-\xi)\%$ prediction interval for $b_i - b_{i'}$.

(b) Multivariate t-based prediction intervals for all $q(q-1)/2$ pairwise differences $b_i - b_{i'}$ ($i' > i = 1, \ldots, q$) whose simultaneous coverage probability is $1 - \xi$ are given by
\[
\widetilde{b_i - b_{i'}} \pm t_{\xi/2,q(q-1)/2,qr-1,R}\,\tilde\sigma\sqrt{\frac{2\psi}{r\psi + 1}},
\]
where
\[
R = \left(r_{(i,i'),(j,j')} : i' > i = 1, \ldots, q;\ j' > j = 1, \ldots, q\right)
= \mathrm{var}\left(\frac{\widetilde{b_1 - b_2} - (b_1 - b_2)}{\sigma\sqrt{2\psi/(r\psi+1)}}, \frac{\widetilde{b_1 - b_3} - (b_1 - b_3)}{\sigma\sqrt{2\psi/(r\psi+1)}}, \ldots, \frac{\widetilde{b_{q-1} - b_q} - (b_{q-1} - b_q)}{\sigma\sqrt{2\psi/(r\psi+1)}}\right).
\]
Using the variance–covariance matrix of the prediction errors obtained in Exercise 13.16a, we find that for $i \le j$, $i' > i = 1, \ldots, q$, $j' > j = 1, \ldots, q$,
\[
r_{(i,i'),(j,j')} =
\begin{cases}
1 & \text{if } i = j \text{ and } i' = j', \\
1/2 & \text{if } i = j \text{ and } i' \neq j', \text{ or } i \neq j \text{ and } i' = j', \\
-1/2 & \text{if } i' = j, \\
0 & \text{otherwise.}
\end{cases}
\]

(c) Tukey's method is not directly applicable because the variance–covariance matrix of the $(\tilde b_i - b_i)$'s, from which the $[\widetilde{b_i - b_{i'}} - (b_i - b_{i'})]$'s ($i < i'$) are obtained as all possible pairwise differences, is not a scalar multiple of the identity matrix. To see this, first observe from Example 13.4.2-2 that
\begin{align*}
\tilde b_i - b_i &= \frac{r\psi}{r\psi + 1}(\bar y_{i\cdot} - \bar y_{\cdot\cdot}) - b_i \\
&= \frac{r\psi}{r\psi + 1}[(b_i + \bar d_{i\cdot}) - (\bar b_\cdot + \bar d_{\cdot\cdot})] - b_i \\
&= \frac{r\psi}{r\psi + 1}[(b_i - \bar b_\cdot) + (\bar d_{i\cdot} - \bar d_{\cdot\cdot})] - b_i.
\end{align*}
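Hence for $i' > i = 1, \ldots, q$,
\begin{align*}
\mathrm{cov}(\tilde b_i - b_i, \tilde b_{i'} - b_{i'})
&= \mathrm{cov}\left(\frac{r\psi}{r\psi+1}[(b_i - \bar b_\cdot) + (\bar d_{i\cdot} - \bar d_{\cdot\cdot})] - b_i,\ \frac{r\psi}{r\psi+1}[(b_{i'} - \bar b_\cdot) + (\bar d_{i'\cdot} - \bar d_{\cdot\cdot})] - b_{i'}\right) \\
&= \left(\frac{r\psi}{r\psi+1}\right)^2 \mathrm{cov}[(b_i - \bar b_\cdot), (b_{i'} - \bar b_\cdot)] + \left(\frac{r\psi}{r\psi+1}\right)^2 \mathrm{cov}[(\bar d_{i\cdot} - \bar d_{\cdot\cdot}), (\bar d_{i'\cdot} - \bar d_{\cdot\cdot})] \\
&\quad + \frac{r\psi}{r\psi+1}\,\mathrm{cov}[(b_i - \bar b_\cdot), -b_{i'}] + \frac{r\psi}{r\psi+1}\,\mathrm{cov}[-b_i, (b_{i'} - \bar b_\cdot)] \\
&= \left(\frac{r\psi}{r\psi+1}\right)^2\left(\frac{-\sigma_b^2}{q}\right) + \left(\frac{r\psi}{r\psi+1}\right)^2\left(\frac{-\sigma^2}{qr}\right) + \frac{r\psi}{r\psi+1}\left(\frac{2\sigma_b^2}{q}\right) \\
&= \frac{r\psi(r\psi + 2)\sigma_b^2 - r\psi^2\sigma^2}{q(r\psi + 1)^2} \\
&\neq 0.
\end{align*}

Exercise 31 Let $a_1, \ldots, a_k$ represent real numbers. It can be shown that $|a_i - a_{i'}| \le 1$ for all $i$ and $i'$ if and only if $|\sum_{i=1}^k c_i a_i| \le \frac{1}{2}\sum_{i=1}^k |c_i|$ for all $c_i$ such that $\sum_{i=1}^k c_i = 0$.
(a) Using this result, extend Tukey's method for obtaining $100(1-\xi)\%$ simultaneous confidence intervals for all possible differences among $k$ linearly independent estimable functions $\psi_1, \ldots, \psi_k$, to all functions of the form $\sum_{i=1}^k d_i\psi_i$ with $\sum_{i=1}^k d_i = 0$ (i.e., to all contrasts).
(b) Specialize the intervals obtained in part (a) to the case of a balanced one-factor model.

Solution (a) The Tukey intervals satisfy
\[
1 - \xi = \Pr[|(\hat\psi_i - \hat\psi_{i'}) - (\psi_i - \psi_{i'})| \le c\hat\sigma q^*_{\xi,k,n-p^*} \text{ for all } i' > i = 1, \ldots, k].
\]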
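Using the result given in this exercise, we may manipulate the event in the probability statement above as follows:
\begin{align*}
1 - \xi &= \Pr[|(\hat\psi_i - \hat\psi_{i'}) - (\psi_i - \psi_{i'})| \le c\hat\sigma q^*_{\xi,k,n-p^*} \text{ for all } i' > i = 1, \ldots, k] \\
&= \Pr\left[\left|\frac{(\hat\psi_i - \psi_i) - (\hat\psi_{i'} - \psi_{i'})}{c\hat\sigma q^*_{\xi,k,n-p^*}}\right| \le 1 \text{ for all } i' > i = 1, \ldots, k\right] \\
&= \Pr\left[\left|\sum_{i=1}^k d_i(\hat\psi_i - \psi_i)\right| \le \frac{1}{2}c\hat\sigma q^*_{\xi,k,n-p^*}\sum_{i=1}^k |d_i| \text{ for all } d_i \text{ such that } \sum_{i=1}^k d_i = 0\right] \\
&= \Pr\left[\sum_{i=1}^k d_i\hat\psi_i - \frac{1}{2}c\hat\sigma q^*_{\xi,k,n-p^*}\sum_{i=1}^k |d_i| \le \sum_{i=1}^k d_i\psi_i \le \sum_{i=1}^k d_i\hat\psi_i + \frac{1}{2}c\hat\sigma q^*_{\xi,k,n-p^*}\sum_{i=1}^k |d_i| \text{ for all } d_i \text{ such that } \sum_{i=1}^k d_i = 0\right].
\end{align*}
Thus, $100(1-\xi)\%$ simultaneous confidence intervals for all contrasts among the $\psi_i$'s are given by
\[
\sum_{i=1}^k d_i\hat\psi_i \pm \frac{1}{2}c\hat\sigma q^*_{\xi,k,n-p^*}\sum_{i=1}^k |d_i|.
\]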
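The contrast intervals above are simple to evaluate once the studentized-range cutoff is known. The following is a minimal sketch, not part of the original solution: the function name and argument names are hypothetical, and the cutoff $q^*_{\xi,k,n-p^*}$ is assumed to be supplied by the caller (for example from a table) rather than computed here.

```python
def tukey_contrast_interval(psi_hat, d, c, sigma_hat, q_star):
    """Interval sum(d_i * psi_hat_i) +/- 0.5 * c * sigma_hat * q_star * sum(|d_i|).

    psi_hat   : estimates of the k estimable functions
    d         : contrast coefficients (must sum to zero)
    c         : the constant c of the Tukey intervals
    sigma_hat : square root of the residual mean square
    q_star    : studentized-range cutoff, assumed to be looked up by the caller
    """
    if abs(sum(d)) > 1e-12:
        raise ValueError("coefficients must sum to zero (a contrast)")
    centre = sum(di * pi for di, pi in zip(d, psi_hat))
    half_width = 0.5 * c * sigma_hat * q_star * sum(abs(di) for di in d)
    return centre - half_width, centre + half_width


# Hypothetical numbers, for illustration only.
lower, upper = tukey_contrast_interval(
    psi_hat=[10.0, 12.0, 15.0], d=[1.0, -1.0, 0.0],
    c=0.5, sigma_hat=2.0, q_star=3.5,
)
```

For a pairwise difference, $\sum|d_i| = 2$, so the half-width reduces to $c\hat\sigma q^*$, the half-width of the ordinary Tukey interval.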
(b) In the case of a balanced one-factor model, we have, from Example 15.3-2, $\psi_i = \mu + \alpha_i$, $\hat\psi_i = \bar y_{i\cdot}$, $c = \sqrt{1/r}$, $k = q$, and $n - p^* = q(r-1)$. Thus, intervals of the type obtained in part (a) specialize to
\[
\sum_{i=1}^q d_i\bar y_{i\cdot} \pm \frac{1}{2}\hat\sigma\sqrt{\frac{1}{r}}\, q^*_{\xi,q,q(r-1)}\sum_{i=1}^q |d_i|.
\]
These are $100(1-\xi)\%$ simultaneous confidence intervals for all contrasts among the $\alpha_i$'s.

Exercise 32 Consider the normal Gauss–Markov model $\{y, X\beta, \sigma^2 I\}$, and suppose that $X$ is $n \times 3$ (where $n > 3$) and $\mathrm{rank}(X) = 3$. Let $(X^TX)^{-1} = (c_{ij})$, $\beta = (\beta_j)$, $\hat\beta = (\hat\beta_j) = (X^TX)^{-1}X^Ty$, and $\hat\sigma^2 = (y - X\hat\beta)^T(y - X\hat\beta)/(n-3)$.
(a) Each of the Bonferroni, Scheffé, and multivariate t methods can be used to obtain a set of intervals for $\beta_1$, $\beta_2$, $\beta_3$, $\beta_1+\beta_2$, $\beta_1+\beta_3$, $\beta_2+\beta_3$, and $\beta_1+\beta_2+\beta_3$ whose simultaneous coverage probability is equal to, or larger than, $1-\xi$ (where $0 < \xi < 1$). Give the set of intervals corresponding to each method.
(b) Obtain another set of intervals for the same functions listed in part (a) whose simultaneous coverage probability is equal to $1-\xi$, but which are constructed by adding the endpoints of $100(1-\xi)\%$ multivariate t intervals for $\beta_1$, $\beta_2$, and $\beta_3$ only.
(c) Obtain a $100(1-\xi)\%$ simultaneous confidence band for $\{\beta_1x_1 + \beta_2x_2\}$, i.e., an infinite collection of intervals $\{(L(x_1,x_2), U(x_1,x_2)) : (x_1,x_2) \in \mathbb{R}^2\}$ such that
\[
\Pr[\beta_1x_1 + \beta_2x_2 \in (L(x_1,x_2), U(x_1,x_2)) \text{ for all } (x_1,x_2) \in \mathbb{R}^2] = 1 - \xi.
\]

Solution (a) The Bonferroni intervals are given by
\[
\hat\beta_j \pm t_{\xi/14,n-3}\,\hat\sigma\sqrt{c_{jj}} \quad (j = 1, 2, 3),
\]
\[
(\hat\beta_j + \hat\beta_k) \pm t_{\xi/14,n-3}\,\hat\sigma\sqrt{c_{jj} + c_{kk} + 2c_{jk}} \quad (k > j = 1, 2, 3),
\]
and
\[
(\hat\beta_1 + \hat\beta_2 + \hat\beta_3) \pm t_{\xi/14,n-3}\,\hat\sigma\sqrt{c_{11} + c_{22} + c_{33} + 2c_{12} + 2c_{13} + 2c_{23}}.
\]
The Scheffé and multivariate t intervals are identical except that $t_{\xi/14,n-3}$ is replaced by $\sqrt{3F_{\xi,3,n-3}}$ or $t_{\xi/2,7,n-3,R}$, respectively, where
\[
R = \mathrm{var}\left(\frac{\hat\beta_1}{\sigma\sqrt{c_{11}}}, \frac{\hat\beta_2}{\sigma\sqrt{c_{22}}}, \frac{\hat\beta_3}{\sigma\sqrt{c_{33}}}, \frac{\hat\beta_1+\hat\beta_2}{\sigma\sqrt{c_{11}+c_{22}+2c_{12}}}, \frac{\hat\beta_1+\hat\beta_3}{\sigma\sqrt{c_{11}+c_{33}+2c_{13}}}, \frac{\hat\beta_2+\hat\beta_3}{\sigma\sqrt{c_{22}+c_{33}+2c_{23}}}, \frac{\hat\beta_1+\hat\beta_2+\hat\beta_3}{\sigma\sqrt{c_{11}+c_{22}+c_{33}+2c_{12}+2c_{13}+2c_{23}}}\right)^T.
\]
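Each of the seven square-root terms in part (a) is $\sqrt{a^T(X^TX)^{-1}a}$ for a coefficient vector $a$ of zeros and ones. A minimal sketch of that bookkeeping follows; the names `se_factor`, `COMBOS`, and `bonferroni_tail` are hypothetical, and the matrix `C` is made-up illustrative data, not taken from the exercise.

```python
import math


def se_factor(a, C):
    """Return sqrt(a' C a), where C = (X'X)^{-1} = (c_ij) is given as nested lists."""
    k = len(a)
    return math.sqrt(sum(a[i] * C[i][j] * a[j] for i in range(k) for j in range(k)))


# Coefficient vectors of the seven functions listed in part (a).
COMBOS = [(1, 0, 0), (0, 1, 0), (0, 0, 1),
          (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)]


def bonferroni_tail(xi, m=len(COMBOS)):
    """Upper-tail probability for two-sided Bonferroni intervals: xi / (2m)."""
    return xi / (2 * m)


# Hypothetical (X'X)^{-1}, symmetric and positive definite.
C = [[2.0, 0.5, 0.1],
     [0.5, 1.0, 0.2],
     [0.1, 0.2, 3.0]]
factors = [se_factor(a, C) for a in COMBOS]
```

With seven functions, `bonferroni_tail(xi)` returns $\xi/14$, which is the subscript of the t cutoff in the Bonferroni intervals.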
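(b) $100(1-\xi)\%$ multivariate t intervals for $\beta_1$, $\beta_2$, and $\beta_3$ only are given by
\[
\hat\beta_j \pm t_{\xi/2,3,n-3,R^*}\,\hat\sigma\sqrt{c_{jj}} \quad (j = 1, 2, 3),
\]
where $R^*$ is the upper left $3 \times 3$ block of $R$. Then, using Fact 4,
\begin{align*}
1 - \xi = \Pr\Big\{ & \hat\beta_j - t_{\xi/2,3,n-3,R^*}\hat\sigma\sqrt{c_{jj}} \le \beta_j \le \hat\beta_j + t_{\xi/2,3,n-3,R^*}\hat\sigma\sqrt{c_{jj}} \quad (j = 1, 2, 3); \\
& (\hat\beta_j + \hat\beta_k) - t_{\xi/2,3,n-3,R^*}\hat\sigma(\sqrt{c_{jj}} + \sqrt{c_{kk}}) \le \beta_j + \beta_k \le (\hat\beta_j + \hat\beta_k) + t_{\xi/2,3,n-3,R^*}\hat\sigma(\sqrt{c_{jj}} + \sqrt{c_{kk}}) \quad (k > j = 1, 2, 3); \\
& (\hat\beta_1 + \hat\beta_2 + \hat\beta_3) - t_{\xi/2,3,n-3,R^*}\hat\sigma(\sqrt{c_{11}} + \sqrt{c_{22}} + \sqrt{c_{33}}) \le \beta_1 + \beta_2 + \beta_3 \\
& \qquad \le (\hat\beta_1 + \hat\beta_2 + \hat\beta_3) + t_{\xi/2,3,n-3,R^*}\hat\sigma(\sqrt{c_{11}} + \sqrt{c_{22}} + \sqrt{c_{33}}) \Big\}.
\end{align*}

(c) The desired confidence band is
\[
(\hat\beta_1x_1 + \hat\beta_2x_2) \pm \sqrt{2F_{\xi,2,n-3}}\,\hat\sigma\sqrt{x_1^2c_{11} + x_2^2c_{22} + 2x_1x_2c_{12}} \quad \text{for all } (x_1, x_2) \in \mathbb{R}^2.
\]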
Exercise 33 Consider the normal Gauss–Markov model $\{y, X\beta, \sigma^2 I\}$, let $\tau_i = c_i^T\beta$ ($i = 1, \ldots, q$), and let $\tau_{q+1} = \sum_{i=1}^q \tau_i = (\sum_{i=1}^q c_i)^T\beta$. Give the interval for $\tau_{q+1}$ belonging to each of the following sets of simultaneous confidence intervals:
(a) $100(1-\xi)\%$ Scheffé intervals that include intervals for $\tau_1, \ldots, \tau_q$.
(b) The infinite collection of all intervals generated by taking linear combinations of the $100(1-\xi)\%$ multivariate t intervals for $\tau_1, \ldots, \tau_q$.
(c) The finite set of multivariate t intervals for $\tau_1, \ldots, \tau_q, \tau_{q+1}$ that have exact simultaneous coverage probability equal to $1 - \xi$.

Solution
(a) $\left(\sum_{i=1}^q c_i\right)^T\hat\beta \pm \hat\sigma\sqrt{q^*F_{\xi,q^*,n-p^*}\left(\sum_{i=1}^q c_i\right)^T(X^TX)^-\left(\sum_{i=1}^q c_i\right)}$, where
\[
q^* = \mathrm{rank}\begin{pmatrix} c_1^T \\ \vdots \\ c_q^T \end{pmatrix}.
\]
(b) $\left(\sum_{i=1}^q c_i\right)^T\hat\beta \pm t_{\xi/2,q,n-p^*,R}\,\hat\sigma\sum_{i=1}^q\sqrt{c_i^T(X^TX)^-c_i}$, where
\[
R = \mathrm{var}\left(\frac{c_1^T\hat\beta}{\sigma\sqrt{c_1^T(X^TX)^-c_1}}, \ldots, \frac{c_q^T\hat\beta}{\sigma\sqrt{c_q^T(X^TX)^-c_q}}\right)^T.
\]
(c) $\left(\sum_{i=1}^q c_i\right)^T\hat\beta \pm t_{\xi/2,q+1,n-p^*,R^*}\,\hat\sigma\sqrt{\left(\sum_{i=1}^q c_i\right)^T(X^TX)^-\left(\sum_{i=1}^q c_i\right)}$, where $c_{q+1} = \sum_{i=1}^q c_i$ and
\[
R^* = \mathrm{var}\left(\frac{c_1^T\hat\beta}{\sigma\sqrt{c_1^T(X^TX)^-c_1}}, \ldots, \frac{c_{q+1}^T\hat\beta}{\sigma\sqrt{c_{q+1}^T(X^TX)^-c_{q+1}}}\right)^T.
\]

Exercise 34 Consider the normal Gauss–Markov no-intercept simple linear regression model
\[
y_i = \beta x_i + e_i \quad (i = 1, \ldots, n),
\]
where $n \ge 2$. Let $y_{n+1}$ and $y_{n+2}$ represent responses not yet observed at x-values $x_{n+1}$ and $x_{n+2}$, respectively, and suppose that
\[
y_i = \beta x_i + e_i \quad (i = n+1, n+2),
\]
where
\[
\begin{pmatrix} e_{n+1} \\ e_{n+2} \end{pmatrix} \sim N_2\left(0_2,\ \sigma^2\begin{pmatrix} x_{n+1}^2 & 0 \\ 0 & x_{n+2}^2 \end{pmatrix}\right),
\]
and $(e_{n+1}, e_{n+2})^T$ and $(e_1, e_2, \ldots, e_n)^T$ are independent.
(a) Obtain the vector of BLUPs of $(y_{n+1}, y_{n+2})^T$, giving each element of the vector by an expression free of matrices.
(b) Obtain the variance–covariance matrix of prediction errors corresponding to the vector of BLUPs you obtained in part (a). Again, give expressions for the elements of this matrix that do not involve matrices.
(c) Using the results of parts (a) and (b), obtain multivariate-t prediction intervals for $(y_{n+1}, y_{n+2})^T$ having simultaneous coverage probability $1 - \xi$ (where $0 < \xi < 1$).
(d) Obtain a $100(1-\xi)\%$ prediction interval for $(y_{n+1} + y_{n+2})/2$.

Solution (a) By Theorem 13.2.1, the BLUP of $\begin{pmatrix} y_{n+1} \\ y_{n+2} \end{pmatrix}$ is
\[
\begin{pmatrix} x_{n+1} \\ x_{n+2} \end{pmatrix}\hat\beta = \begin{pmatrix} x_{n+1}\sum_{i=1}^n x_iy_i / \sum_{i=1}^n x_i^2 \\ x_{n+2}\sum_{i=1}^n x_iy_i / \sum_{i=1}^n x_i^2 \end{pmatrix}.
\]
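(b) By Theorem 13.2.2, the variance–covariance matrix of the BLUP's prediction errors is
\[
\sigma^2\left[\begin{pmatrix} x_{n+1} \\ x_{n+2} \end{pmatrix}\left(\sum_{i=1}^n x_i^2\right)^{-1}\begin{pmatrix} x_{n+1} \\ x_{n+2} \end{pmatrix}^T + \begin{pmatrix} x_{n+1}^2 & 0 \\ 0 & x_{n+2}^2 \end{pmatrix}\right]
= \sigma^2\begin{pmatrix} x_{n+1}^2[1 + (1/\sum_{i=1}^n x_i^2)] & x_{n+1}x_{n+2}/\sum_{i=1}^n x_i^2 \\ x_{n+1}x_{n+2}/\sum_{i=1}^n x_i^2 & x_{n+2}^2[1 + (1/\sum_{i=1}^n x_i^2)] \end{pmatrix}.
\]

(c) $100(1-\xi)\%$ multivariate t prediction intervals for $y_{n+1}$ and $y_{n+2}$ are given by
\[
x_{n+1}\hat\beta \pm t_{\xi/2,2,n-1,R}\,\hat\sigma\sqrt{x_{n+1}^2\left[1 + \left(1/\sum_{i=1}^n x_i^2\right)\right]}
\]
and
\[
x_{n+2}\hat\beta \pm t_{\xi/2,2,n-1,R}\,\hat\sigma\sqrt{x_{n+2}^2\left[1 + \left(1/\sum_{i=1}^n x_i^2\right)\right]},
\]
where $\hat\sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^n (y_i - \hat\beta x_i)^2}$ and the off-diagonal element of $R$, obtained by standardizing the matrix in part (b), is
\[
\frac{x_{n+1}x_{n+2}/\sum_{i=1}^n x_i^2}{\sqrt{x_{n+1}^2[1 + (1/\sum_{i=1}^n x_i^2)]\; x_{n+2}^2[1 + (1/\sum_{i=1}^n x_i^2)]}}.
\]

(d) By Corollary 13.2.1.1, the BLUP of $(y_{n+1} + y_{n+2})/2$ is $(x_{n+1} + x_{n+2})\hat\beta/2$, and using the solution to part (b) we find that its prediction error variance is
\[
\sigma^2\left\{(x_{n+1}^2 + x_{n+2}^2)\left[1 + \left(1/\sum_{i=1}^n x_i^2\right)\right] + \left(2x_{n+1}x_{n+2}/\sum_{i=1}^n x_i^2\right)\right\}/4 \equiv \sigma^2 Q.
\]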
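The nonmatrix expressions in parts (a), (b), and (d) can be checked numerically. The sketch below is illustrative only: the function name and the toy data are hypothetical, and the returned covariance entries are expressed in units of $\sigma^2$ (that is, the matrix of part (b) divided by $\sigma^2$).

```python
def blup_and_error_cov(x, y, x_new):
    """No-intercept regression: BLUPs at two new x-values and their
    prediction-error covariance (in units of sigma^2), plus the scalar Q."""
    sxx = sum(xi * xi for xi in x)                        # sum of x_i^2
    beta_hat = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    a, b = x_new
    blups = [a * beta_hat, b * beta_hat]
    # Future errors have variances sigma^2 * x_{n+i}^2, hence the a*a and b*b terms.
    cov = [[a * a * (1 + 1 / sxx), a * b / sxx],
           [a * b / sxx, b * b * (1 + 1 / sxx)]]
    # Variance (over sigma^2) of the prediction error of the average of the two responses.
    q_value = (cov[0][0] + cov[1][1] + 2 * cov[0][1]) / 4
    return beta_hat, blups, cov, q_value


# Toy data, assumed for illustration.
beta_hat, blups, cov, q_value = blup_and_error_cov([1.0, 2.0], [2.0, 4.0], (1.0, 2.0))
```

Here `q_value` plays the role of $Q$, computed as one quarter of the sum of all four entries of the scaled covariance matrix, which agrees with the displayed expression.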
Therefore, the desired prediction interval is given by
\[
(x_{n+1} + x_{n+2})\hat\beta/2 \pm t_{\xi/2,n-1}\,\hat\sigma\sqrt{Q}.
\]

Exercise 35 Suppose that observations $\{(x_i, y_i) : i = 1, \ldots, n\}$ follow the normal Gauss–Markov simple linear regression model
\[
y_i = \beta_1 + \beta_2x_i + e_i,
\]
where $n \ge 3$. Let $\hat\beta_1$ and $\hat\beta_2$ be the ordinary least squares estimators of $\beta_1$ and $\beta_2$, respectively; let $\hat\sigma^2$ be the usual residual mean square; let $\bar x = (\sum_{i=1}^n x_i)/n$; and let $SXX = \sum_{i=1}^n (x_i - \bar x)^2$. Recall that in this setting,
\[
(\hat\beta_1 + \hat\beta_2x_{n+1}) \pm t_{\xi/2,n-2}\,\hat\sigma\sqrt{1 + \frac{1}{n} + \frac{(x_{n+1} - \bar x)^2}{SXX}}
\]
is a $100(1-\xi)\%$ prediction interval for an unobserved y-value to be taken at a specified x-value $x_{n+1}$. Suppose that it is desired to predict the values of three unobserved y-values, say $y_{n+1}$, $y_{n+2}$, and $y_{n+3}$, which are all to be taken at the same known x-value, say $x^*$. Assume that the unobserved values of y follow the same model as the observed data; that is,
\[
y_i = \beta_1 + \beta_2x^* + e_i \quad (i = n+1, n+2, n+3),
\]
where $e_{n+1}, e_{n+2}, e_{n+3}$ are independent $N(0, \sigma^2)$ random variables and are independent of $e_1, \ldots, e_n$.
(a) Give expressions for Bonferroni prediction intervals for $y_{n+1}, y_{n+2}, y_{n+3}$ whose simultaneous coverage probability is at least $1 - \xi$.
(b) Give expressions for Scheffé prediction intervals for $y_{n+1}, y_{n+2}, y_{n+3}$ whose simultaneous coverage probability is at least $1 - \xi$. (Note: The three requested intervals are part of an infinite collection of intervals, but give expressions for just those three.)
(c) Give expressions for multivariate t prediction intervals for $y_{n+1}, y_{n+2}, y_{n+3}$ whose simultaneous coverage probability is exactly $1 - \xi$. Note: The off-diagonal elements of the correlation matrix $R$ referenced by the multivariate t quantiles in your prediction intervals are all equal to each other; give an expression for this common correlation coefficient.
(d) Give expressions for Bonferroni prediction intervals for all pairwise differences among $y_{n+1}, y_{n+2}, y_{n+3}$ whose simultaneous coverage probability is at least $1 - \xi$.
(e) Give expressions for multivariate t prediction intervals for all pairwise differences among $y_{n+1}, y_{n+2}, y_{n+3}$ whose simultaneous coverage probability is exactly $1 - \xi$. Give expressions for the off-diagonal elements of the correlation matrix $R$ referenced by the multivariate t quantiles in your prediction intervals.
(f) Explain why Tukey's method is applicable to the problem of obtaining simultaneous prediction intervals for all pairwise differences among $y_{n+1}, y_{n+2}, y_{n+3}$ and obtain such intervals.

Solution (a) With three intervals, the Bonferroni cutoff is $t_{\xi/6,n-2}$, so each of the three intervals is given by
\[
(\hat\beta_1 + \hat\beta_2x^*) \pm t_{\xi/6,n-2}\,\hat\sigma\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar x)^2}{SXX}}.
\]

(b) Using Theorem 15.3.7, we find that each of the three intervals is given by
\[
(\hat\beta_1 + \hat\beta_2x^*) \pm \hat\sigma\sqrt{3F_{\xi,3,n-2}\left(1 + \frac{1}{n} + \frac{(x^* - \bar x)^2}{SXX}\right)}.
\]

(c) Each of the three intervals is given by
\[
(\hat\beta_1 + \hat\beta_2x^*) \pm t_{\xi/2,3,n-2,R}\,\hat\sigma\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar x)^2}{SXX}},
\]
where $R = \mathrm{corr}\begin{pmatrix} \hat y_{n+1} - y_{n+1} \\ \hat y_{n+2} - y_{n+2} \\ \hat y_{n+3} - y_{n+3} \end{pmatrix}$, which has common off-diagonal element (correlation coefficient) equal to
\[
\frac{\dfrac{1}{n} + \dfrac{(x^* - \bar x)^2}{SXX}}{1 + \dfrac{1}{n} + \dfrac{(x^* - \bar x)^2}{SXX}}.
\]
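The common correlation in part (c) depends on the design only through $h = 1/n + (x^* - \bar x)^2/SXX$ and equals $h/(1+h)$. A small sketch, with hypothetical function names and made-up x-values, illustrates how it grows as $x^*$ moves away from $\bar x$:

```python
def leverage_term(x, x_star):
    """h = 1/n + (x* - xbar)^2 / SXX for simple linear regression with intercept."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return 1 / n + (x_star - xbar) ** 2 / sxx


def common_correlation(x, x_star):
    """Correlation between prediction errors of two future responses taken at x*."""
    h = leverage_term(x, x_star)
    return h / (1 + h)


x_obs = [0.0, 1.0, 2.0, 3.0, 4.0]              # hypothetical design points
rho_centre = common_correlation(x_obs, 2.0)    # x* at the mean of the x's
rho_edge = common_correlation(x_obs, 4.0)      # x* at the edge of the design
```

The shared estimation error $\hat\beta_1 + \hat\beta_2x^* - (\beta_1 + \beta_2x^*)$ is the only source of dependence, which is why the correlation is always strictly between 0 and 1.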
(d) Because $\hat y_{n+1} = \hat y_{n+2} = \hat y_{n+3} = \hat\beta_1 + \hat\beta_2x^*$, the BLUPs of $y_{n+1} - y_{n+2}$, $y_{n+1} - y_{n+3}$, and $y_{n+2} - y_{n+3}$ are all equal to 0 and their prediction error variances are all equal to
\[
\mathrm{var}\{[(\hat\beta_1 + \hat\beta_2x^*) - y_{n+1}] - [(\hat\beta_1 + \hat\beta_2x^*) - y_{n+2}]\} = 2\sigma^2.
\]
Therefore, each of the desired intervals is $0 \pm t_{\xi/6,n-2}\,\hat\sigma\sqrt{2}$.

(e) Each of the desired intervals is $0 \pm t_{\xi/2,3,n-2,R^*}\,\hat\sigma\sqrt{2}$, where
\[
R^* = \mathrm{corr}\begin{pmatrix} \hat y_{n+1} - \hat y_{n+2} - (y_{n+1} - y_{n+2}) \\ \hat y_{n+1} - \hat y_{n+3} - (y_{n+1} - y_{n+3}) \\ \hat y_{n+2} - \hat y_{n+3} - (y_{n+2} - y_{n+3}) \end{pmatrix}
= \mathrm{corr}\begin{pmatrix} y_{n+2} - y_{n+1} \\ y_{n+3} - y_{n+1} \\ y_{n+3} - y_{n+2} \end{pmatrix}
= \begin{pmatrix} 1 & 0.5 & -0.5 \\ 0.5 & 1 & 0.5 \\ -0.5 & 0.5 & 1 \end{pmatrix}.
\]

(f) Tukey's method is applicable because $\hat y_{n+i} - \hat y_{n+j} - (y_{n+i} - y_{n+j}) = y_{n+j} - y_{n+i}$ for $j \neq i = 1, 2, 3$, which comprise all pairwise differences among the predictable functions $y_{n+1}, y_{n+2}, y_{n+3}$, and
\[
\mathrm{var}\begin{pmatrix} y_{n+1} \\ y_{n+2} \\ y_{n+3} \end{pmatrix} = c^2\sigma^2 I, \quad \text{where } c = 1.
\]
Therefore, by Theorem 15.3.6 the Tukey simultaneous confidence intervals for all pairwise differences among $y_{n+1}, y_{n+2}, y_{n+3}$ are given by
\[
(\hat y_{n+j} - \hat y_{n+i}) \pm \hat\sigma q^*_{\xi,3,n-2} \quad (i \neq j = 1, 2, 3).
\]

Exercise 36 Consider the normal Gauss–Markov one-factor model with balanced data:
\[
y_{ij} = \mu + \alpha_i + e_{ij} \quad (i = 1, \ldots, q;\ j = 1, \ldots, r).
\]
Recall from Example 15.3-2 that a $100(1-\xi)\%$ confidence interval for a single level difference $\alpha_i - \alpha_{i'}$ ($i' > i = 1, \ldots, q$) is given by
\[
(\bar y_{i\cdot} - \bar y_{i'\cdot}) \pm t_{\xi/2,q(r-1)}\,\hat\sigma\sqrt{2/r},
\]
where $\hat\sigma^2 = \sum_{i=1}^q\sum_{j=1}^r (y_{ij} - \bar y_{i\cdot})^2/[q(r-1)]$. Also recall that in the same example, several solutions were given to the problem of obtaining confidence intervals for the level differences $\{\alpha_i - \alpha_{i'} : i' > i = 1, \ldots, q\}$ whose simultaneous coverage probability is at least $1 - \xi$. Now, however, consider a slightly different problem under the same model. Suppose that the factors represent treatments, one of which is a "control" treatment or placebo, and that the investigator has considerably less interest in estimating differences involving the control treatment than in estimating differences not involving the control treatment. That is, letting $\alpha_1$ correspond to the control treatment, the investigator has less interest in $\alpha_1 - \alpha_2, \alpha_1 - \alpha_3, \ldots, \alpha_1 - \alpha_q$ than in $\{\alpha_i - \alpha_{i'} : i' > i = 2, \ldots, q\}$. Thus, rather than using any of the simultaneous confidence intervals obtained in Example 15.3-2, the investigator decides to use "new" confidence intervals for the treatment differences $\{\alpha_i - \alpha_{i'} : i' > i = 1, \ldots, q\}$ that satisfy the following requirements:
(I) The new intervals for $\{\alpha_i - \alpha_{i'} : i' > i = 1, \ldots, q\}$, like the intervals in Example 15.3-2, have simultaneous coverage probability at least $1 - \xi$.
(II) Each of the new intervals for $\{\alpha_i - \alpha_{i'} : i' > i = 2, \ldots, q\}$ obtained by a given method is narrower than the interval in Example 15.3-2 for the same treatment difference, obtained using the same method.
(III) Each of the new intervals for $\alpha_1 - \alpha_2, \alpha_1 - \alpha_3, \ldots, \alpha_1 - \alpha_q$ is no more than twice as wide as each of the new intervals for $\{\alpha_i - \alpha_{i'} : i' > i = 2, \ldots, q\}$.
(IV) All of the new intervals for $\{\alpha_i - \alpha_{i'} : i' > i = 2, \ldots, q\}$ are of equal width and as narrow as possible, subject to the first three rules.

(a) If it is possible to use the Scheffé method to obtain new intervals that satisfy the prescribed rules, give such a solution. Otherwise, explain why it is not possible.
(b) If it is possible to use the multivariate t method to obtain new intervals that satisfy the prescribed rules, give such a solution. Otherwise, explain why it is not possible.
Note: You may not use the Bonferroni method in formulating solutions, but you may use the notion of linear combinations of intervals.

Solution (a) It is not possible. The original Scheffé intervals
\[
(\bar y_{i\cdot} - \bar y_{i'\cdot}) \pm \sqrt{(q-1)F_{\xi,q-1,q(r-1)}}\,\hat\sigma\sqrt{2/r} \quad (i' > i = 1, \ldots, q)
\]
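were obtained by taking
\[
C^T = \begin{pmatrix}
0 & 1 & -1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & -1 & 0 & \cdots & 0 \\
0 & 1 & 0 & 0 & -1 & \cdots & 0 \\
\vdots & & & & & & \vdots \\
0 & 1 & 0 & 0 & 0 & \cdots & -1
\end{pmatrix}.
\]
To satisfy Requirement II we must eliminate at least one row from $C^T$, but then the $c^T$ of $c^T\beta$ corresponding to some of the level differences will not be an element of $\mathcal{R}(C^T)$.

(b) Start with the multivariate t intervals for $\alpha_i - \alpha_{i'}$ ($i' > i = 2, \ldots, q$) and $\alpha_1 - \alpha_2$ given by
\[
(\bar y_{i\cdot} - \bar y_{i'\cdot}) \pm t_{\xi/2,(q-1)(q-2)/2+1,q(r-1),R}\,\hat\sigma\sqrt{2/r} \quad \text{if } i' > i = 2, \ldots, q,
\]
\[
(\bar y_{1\cdot} - \bar y_{2\cdot}) \pm t_{\xi/2,(q-1)(q-2)/2+1,q(r-1),R}\,\hat\sigma\sqrt{2/r},
\]
where
\[
R = \mathrm{var}\left(\frac{\bar y_{1\cdot} - \bar y_{2\cdot}}{\sigma\sqrt{2/r}}, \frac{\bar y_{2\cdot} - \bar y_{3\cdot}}{\sigma\sqrt{2/r}}, \frac{\bar y_{2\cdot} - \bar y_{4\cdot}}{\sigma\sqrt{2/r}}, \ldots, \frac{\bar y_{(q-1)\cdot} - \bar y_{q\cdot}}{\sigma\sqrt{2/r}}\right)^T.
\]
These intervals satisfy Requirements I and II. Now observe that $\alpha_1 - \alpha_{i'} = (\alpha_1 - \alpha_2) + (\alpha_2 - \alpha_{i'})$ for $i' = 3, \ldots, q$. Therefore, we can get intervals for $\{\alpha_1 - \alpha_{i'} : i' = 3, \ldots, q\}$ from the intervals just given above via the extension of the multivariate t method to linear combinations of estimable functions. The resulting intervals are
\[
(\bar y_{1\cdot} - \bar y_{i'\cdot}) \pm 2t_{\xi/2,(q-1)(q-2)/2+1,q(r-1),R}\,\hat\sigma\sqrt{2/r} \quad (i' = 3, \ldots, q).
\]
Together, this last set of intervals and the multivariate t intervals listed above satisfy all the requirements.

Exercise 37 Consider a normal prediction-extended linear model
\[
\begin{pmatrix} y \\ y_{n+1} \\ y_{n+2} \end{pmatrix} = \begin{pmatrix} X \\ x_{n+1}^T \\ x_{n+2}^T \end{pmatrix}\beta + \begin{pmatrix} e \\ e_{n+1} \\ e_{n+2} \end{pmatrix},
\]
where $\begin{pmatrix} e \\ e_{n+1} \\ e_{n+2} \end{pmatrix}$ satisfies Gauss–Markov assumptions except that $\mathrm{var}(e_{n+1}) = 2\sigma^2$ and $\mathrm{var}(e_{n+2}) = 3\sigma^2$.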
(a) Give a $100(1-\xi)\%$ prediction interval for $y_{n+1}$.
(b) Give a $100(1-\xi)\%$ prediction interval for $y_{n+2}$.
(c) Give the $100(1-\xi)\%$ Bonferroni simultaneous prediction intervals for $y_{n+1}$ and $y_{n+2}$.
(d) The intervals you obtained in part (c) do not have the same width. Indicate how the Bonferroni method could be used to obtain prediction intervals for $y_{n+1}$ and $y_{n+2}$ that have the same width yet have simultaneous coverage probability at least $1 - \xi$.
(e) Obtain the $100(1-\xi)\%$ multivariate t simultaneous prediction intervals for $y_{n+1}$ and $y_{n+2}$.

Solution
(a) $x_{n+1}^T\hat\beta \pm t_{\xi/2,n-p^*}\,\hat\sigma\sqrt{x_{n+1}^T(X^TX)^-x_{n+1} + 2}$.
(b) $x_{n+2}^T\hat\beta \pm t_{\xi/2,n-p^*}\,\hat\sigma\sqrt{x_{n+2}^T(X^TX)^-x_{n+2} + 3}$.
(c) $x_{n+1}^T\hat\beta \pm t_{\xi/4,n-p^*}\,\hat\sigma\sqrt{x_{n+1}^T(X^TX)^-x_{n+1} + 2}$ and $x_{n+2}^T\hat\beta \pm t_{\xi/4,n-p^*}\,\hat\sigma\sqrt{x_{n+2}^T(X^TX)^-x_{n+2} + 3}$.
(d) To obtain Bonferroni-based intervals that have the same width yet have simultaneous coverage probability at least $1 - \xi$, we must find $\xi_1$ and $\xi_2$ that satisfy $\xi_1 + \xi_2 = \xi$ and
\[
\frac{t_{\xi_2/2,n-p^*}}{t_{\xi_1/2,n-p^*}} = \sqrt{\frac{x_{n+1}^T(X^TX)^-x_{n+1} + 2}{x_{n+2}^T(X^TX)^-x_{n+2} + 3}}.
\]
A solution can be found "by trial and error."
(e) The two intervals are
\[
x_{n+1}^T\hat\beta \pm t_{\xi/2,2,n-p^*,R}\,\hat\sigma\sqrt{x_{n+1}^T(X^TX)^-x_{n+1} + 2}
\]
and
\[
x_{n+2}^T\hat\beta \pm t_{\xi/2,2,n-p^*,R}\,\hat\sigma\sqrt{x_{n+2}^T(X^TX)^-x_{n+2} + 3},
\]
where $R = \begin{pmatrix} 1 & g \\ g & 1 \end{pmatrix}$ and
\[
g = \mathrm{corr}\left(x_{n+1}^T\hat\beta - y_{n+1},\ x_{n+2}^T\hat\beta - y_{n+2}\right) = \frac{x_{n+1}^T(X^TX)^-x_{n+2}}{\sqrt{x_{n+1}^T(X^TX)^-x_{n+1} + 2}\,\sqrt{x_{n+2}^T(X^TX)^-x_{n+2} + 3}}.
\]
Exercise 38 Consider the normal Gauss–Markov one-factor analysis-of-covariance model for balanced data,
\[
y_{ij} = \mu + \alpha_i + \gamma x_{ij} + e_{ij} \quad (i = 1, \ldots, q,\ j = 1, \ldots, r),
\]
where $r \ge 2$ and $x_{ij} \neq x_{ij'}$ for $j \neq j'$ and $i = 1, \ldots, q$. Let $y = (y_{11}, y_{12}, \ldots, y_{qr})^T$, let $X$ be the corresponding model matrix, and let $\beta = (\mu, \alpha_1, \ldots, \alpha_q, \gamma)^T$. Let $\hat\beta$ be a solution to the normal equations and let $\hat\sigma^2$ be the residual mean square. Furthermore, let $c_0$, $c_i$, and $c_{ix}$ be such that
\[
c_0^T\beta = \gamma, \quad c_i^T\beta = \mu + \alpha_i, \quad \text{and} \quad c_{ix}^T\beta = \mu + \alpha_i + \gamma x \quad \text{where } -\infty < x < \infty.
\]
Also let $c_{ii'} = c_i - c_{i'}$ for $i' > i = 1, \ldots, q$. Let $0 < \xi < 1$. For each of the following four parts, give a confidence interval or a set of confidence intervals that satisfy the stated criteria. Express these confidence intervals in terms of the following quantities: $X$, $\hat\beta$, $\hat\sigma$, $\sigma$, $c_0$, $c_i$, $c_{ii'}$, $c_{ix}$, $\xi$, $q$, $r$, and appropriate cutoff points from an appropriate distribution.
(a) A $100(1-\xi)\%$ confidence interval for $\gamma$.
(b) A set of confidence intervals for $\{\mu + \alpha_i : i = 1, \ldots, q\}$ whose simultaneous coverage probability is exactly $1 - \xi$.
(c) A set of confidence intervals for $\{\mu + \alpha_i + \gamma x : i = 1, \ldots, q \text{ and all } x \in (-\infty, \infty)\}$ whose simultaneous coverage probability is at least $1 - \xi$.
(d) A set of confidence intervals for $\{\alpha_i - \alpha_{i'} : i' > i = 1, \ldots, q\}$ whose simultaneous coverage probability is exactly $1 - \xi$.

Solution
(a) $c_0^T\hat\beta \pm t_{\xi/2,q(r-1)-1}\,\hat\sigma\sqrt{c_0^T(X^TX)^-c_0}$.
(b) $\left\{c_i^T\hat\beta \pm t_{\xi/2,q,q(r-1)-1,R}\,\hat\sigma\sqrt{c_i^T(X^TX)^-c_i} : i = 1, \ldots, q\right\}$, where
\[
R = \mathrm{var}\left(\frac{c_1^T\hat\beta}{\sigma\sqrt{c_1^T(X^TX)^-c_1}}, \ldots, \frac{c_q^T\hat\beta}{\sigma\sqrt{c_q^T(X^TX)^-c_q}}\right)^T.
\]
(c) $\mathrm{rank}(X) = q + 1$, so the desired intervals are
\[
\left\{c_{ix}^T\hat\beta \pm \sqrt{(q+1)F_{\xi,q+1,q(r-1)-1}}\,\hat\sigma\sqrt{c_{ix}^T(X^TX)^-c_{ix}} \text{ for all } i = 1, \ldots, q \text{ and all } x \in \mathbb{R}\right\}.
\]
(d) $\left\{c_{ii'}^T\hat\beta \pm t_{\xi/2,q(q-1)/2,q(r-1)-1,R^*}\,\hat\sigma\sqrt{c_{ii'}^T(X^TX)^-c_{ii'}} \text{ for all } i' > i = 1, \ldots, q\right\}$, where
\[
R^* = \mathrm{var}\left(\frac{c_{12}^T\hat\beta}{\sigma\sqrt{c_{12}^T(X^TX)^-c_{12}}}, \frac{c_{13}^T\hat\beta}{\sigma\sqrt{c_{13}^T(X^TX)^-c_{13}}}, \ldots, \frac{c_{q-1,q}^T\hat\beta}{\sigma\sqrt{c_{q-1,q}^T(X^TX)^-c_{q-1,q}}}\right)^T.
\]

Exercise 39 Consider a situation in which a response variable, $y$, and a single explanatory variable, $x$, are measured on $n$ subjects. Suppose that the response for each subject having a nonnegative value of $x$ is related to $x$ through a simple linear regression model without an intercept, and the response for each subject having a negative value of $x$ is also related to $x$ through a simple linear regression without an intercept; however, the slopes of the two regression models are possibly different. That is, assume that the observations follow the model
\[
y_i = \begin{cases} \beta_1x_i + e_i & \text{if } x_i < 0, \\ \beta_2x_i + e_i & \text{if } x_i \ge 0, \end{cases}
\]
for $i = 1, \ldots, n$. Suppose further that the $e_i$'s are independent $N(0, \sigma^2)$ variables; that $x_i \neq 0$ for all $i$; and that there is at least one $x_i$ less than 0 and at least one $x_i$ greater than 0. Let $n_1$ denote the number of subjects whose x-value is less than 0, and let $n_2 = n - n_1$. Finally, let $0 < \xi < 1$.
(a) Give expressions for the elements of the model matrix $X$. (Note: To make things easier, assume that the elements of the response vector $y$ are arranged in such a way that the first $n_1$ elements of $y$ correspond to those subjects whose x-value is less than 0.) Also obtain nonmatrix expressions for the least squares estimators of $\beta_1$ and $\beta_2$, and obtain the variance–covariance matrix of those two estimators. Finally, obtain a nonmatrix expression for $\hat\sigma^2$.
(b) Give confidence intervals for $\beta_1$ and $\beta_2$ whose simultaneous coverage probability is exactly $1 - \xi$.
(c) Using the Scheffé method, obtain a $100(1-\xi)\%$ simultaneous confidence band for $E(y)$; that is, obtain expressions (in as simple a form as possible) for functions $a_1(x)$, $b_1(x)$, $a_2(x)$, and $b_2(x)$ such that
\[
\Pr[a_1(x) \le \beta_1x \le b_1(x) \text{ for all } x < 0 \text{ and } a_2(x) \le \beta_2x \le b_2(x) \text{ for all } x \ge 0] \ge 1 - \xi.
\]
(d) Consider the confidence band obtained in part (c). Under what circumstances will the band's width at $x$ equal the width at $-x$ (for all $x$)?
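Solution
(a) $X = \begin{pmatrix} \mathbf{x}_1 & 0 \\ 0 & \mathbf{x}_2 \end{pmatrix}$, where $\mathbf{x}_1 = (x_1, x_2, \ldots, x_{n_1})^T$ and $\mathbf{x}_2 = (x_{n_1+1}, x_{n_1+2}, \ldots, x_{n_1+n_2})^T$,
\[
\hat\beta = \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix}
= \begin{pmatrix} \sum_{i=1}^{n_1} x_i^2 & 0 \\ 0 & \sum_{i=n_1+1}^{n_1+n_2} x_i^2 \end{pmatrix}^{-1}\begin{pmatrix} \sum_{i=1}^{n_1} x_iy_i \\ \sum_{i=n_1+1}^{n_1+n_2} x_iy_i \end{pmatrix}
= \begin{pmatrix} \sum_{i=1}^{n_1} x_iy_i / \sum_{i=1}^{n_1} x_i^2 \\ \sum_{i=n_1+1}^{n_1+n_2} x_iy_i / \sum_{i=n_1+1}^{n_1+n_2} x_i^2 \end{pmatrix},
\]
and
\[
\mathrm{var}(\hat\beta) = \sigma^2\begin{pmatrix} 1/\sum_{i=1}^{n_1} x_i^2 & 0 \\ 0 & 1/\sum_{i=n_1+1}^{n_1+n_2} x_i^2 \end{pmatrix}.
\]
Finally, because $\mathrm{rank}(X) = 2$, the residual mean square is
\[
\hat\sigma^2 = \left[\sum_{i=1}^{n_1} (y_i - \hat\beta_1x_i)^2 + \sum_{i=n_1+1}^{n_1+n_2} (y_i - \hat\beta_2x_i)^2\right]\Big/(n_1 + n_2 - 2).
\]

(b) $\hat\beta_1 \pm t_{\xi/2,2,n-2,R}\,\hat\sigma\big/\sqrt{\sum_{i=1}^{n_1} x_i^2}$ and $\hat\beta_2 \pm t_{\xi/2,2,n-2,R}\,\hat\sigma\big/\sqrt{\sum_{i=n_1+1}^{n_1+n_2} x_i^2}$, where
\[
R = \mathrm{var}\left(\frac{\hat\beta_1}{\sigma/\sqrt{\sum_{i=1}^{n_1} x_i^2}}, \frac{\hat\beta_2}{\sigma/\sqrt{\sum_{i=n_1+1}^{n_1+n_2} x_i^2}}\right)^T = I.
\]

(c) Take $C^T = \begin{pmatrix} -x & 0 \\ 0 & x \end{pmatrix}$, where $x > 0$, and observe that $\mathrm{rank}(C^T) = 2$. Thus, the Scheffé intervals are
\[
\left\{c^T\hat\beta \pm \sqrt{2F_{\xi,2,n_1+n_2-2}}\,\hat\sigma\sqrt{c^T(X^TX)^-c} \text{ for all } c^T \in \mathcal{R}(C^T)\right\},
\]
i.e.,
\[
\left\{\hat\beta_1x \pm \sqrt{2F_{\xi,2,n_1+n_2-2}}\,\hat\sigma\sqrt{\frac{x^2}{\sum_{i=1}^{n_1} x_i^2}} \text{ for all } x < 0,\quad \hat\beta_2x \pm \sqrt{2F_{\xi,2,n_1+n_2-2}}\,\hat\sigma\sqrt{\frac{x^2}{\sum_{i=n_1+1}^{n_1+n_2} x_i^2}} \text{ for all } x \ge 0\right\}.
\]

(d) From examination of the intervals derived in part (c), the width at $x$ is equal to the width at $-x$ if and only if $\sum_{i=1}^{n_1} x_i^2 = \sum_{i=n_1+1}^{n_1+n_2} x_i^2$.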
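The nonmatrix estimators of part (a) and the width condition of part (d) can be sketched as follows. This is an illustration under stated assumptions, not the book's code: the function names and data are hypothetical, and the residual mean square is taken to divide by $n - 2$, matching the degrees of freedom used in the t and F cutoffs of parts (b) and (c).

```python
import math


def fit_two_slopes(x, y):
    """Least squares fit of y = b1*x (x < 0) and y = b2*x (x >= 0), no intercepts."""
    neg = [(xi, yi) for xi, yi in zip(x, y) if xi < 0]
    pos = [(xi, yi) for xi, yi in zip(x, y) if xi >= 0]
    s1 = sum(xi * xi for xi, _ in neg)          # sum of x_i^2 over negative x's
    s2 = sum(xi * xi for xi, _ in pos)          # sum of x_i^2 over nonnegative x's
    b1 = sum(xi * yi for xi, yi in neg) / s1
    b2 = sum(xi * yi for xi, yi in pos) / s2
    rss = (sum((yi - b1 * xi) ** 2 for xi, yi in neg)
           + sum((yi - b2 * xi) ** 2 for xi, yi in pos))
    sigma2_hat = rss / (len(x) - 2)             # assumed divisor: n - rank(X) = n - 2
    return b1, b2, s1, s2, sigma2_hat


def width_ratio(s1, s2):
    """Scheffe band width at -x divided by width at x (x > 0): sqrt(s2 / s1)."""
    return math.sqrt(s2 / s1)


# Hypothetical data with symmetric x-values, so the band is symmetric in width.
b1, b2, s1, s2, sigma2_hat = fit_two_slopes([-2.0, -1.0, 1.0, 2.0],
                                            [-4.0, -2.0, 3.0, 7.0])
```

A ratio of 1 from `width_ratio` corresponds exactly to the equality of the two sums of squares stated in part (d).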
Exercise 40 Consider a full-rank normal Gauss–Markov analysis-of-covariance model in a setting where there are one or more factors of classification and one or more regression variables. Let $q$ represent the number of combinations of the factor levels. Suppose further that at each such combination the model is a regression model with the same number, $p_c$, of regression variables (including the intercept in this number); see part (c) for two examples. For $i = 1, \ldots, q$ let $x_i$ represent an arbitrary $p_c$-vector of the regression variables (including the intercept) for the $i$th combination of classificatory explanatory variables, and let $\beta_i$ and $\hat\beta_i$ represent the corresponding vectors of regression coefficients and their least squares estimators from the fit of the complete model, respectively. Note that an arbitrary $p$-vector $x$ of the explanatory variables consists of $x_i$ for some $i$, padded with zeroes in appropriate places.
(a) For $i = 1, \ldots, q$ let $G_{ii}$ represent the submatrix of $(X^TX)^{-1}$ obtained by deleting the rows and columns that correspond to the elements of $x$ excluded to form $x_i$. Show that
\[
\Pr\left[x_i^T\beta_i \in x_i^T\hat\beta_i \pm \hat\sigma\sqrt{p_cF_{\xi/q,p_c,n-p}\,x_i^TG_{ii}x_i} \text{ for all } x_i \in \mathbb{R}^{p_c} \text{ and all } i = 1, \ldots, q\right] \ge 1 - \xi.
\]
(Hint: First apply Scheffé's method to obtain a confidence band for $x_i^T\beta_i$ for fixed $i \in \{1, \ldots, q\}$.)
(b) Obtain an expression for the ratio of the width of the interval for $x_i^T\beta_i$ in the collection of $100(1-\xi)\%$ simultaneous confidence intervals determined in part (a), to the width of the interval for the same $x_i^T\beta_i$ in the collection of standard $100(1-\xi)\%$ Scheffé confidence intervals
\[
x^T\hat\beta \pm \hat\sigma\sqrt{pF_{\xi,p,n-p}\,x^T(X^TX)^{-1}x}, \quad \text{for all } x \in \mathbb{R}^p.
\]
(c) For each of the following special cases of models, evaluate the ratio obtained in part (b) when $\xi = .05$:
(i) The three-group simple linear regression model
\[
y_{ij} = \beta_{i1} + \beta_{i2}x_{ij} + e_{ij} \quad (i = 1, 2, 3;\ j = 1, \ldots, 10).
\]
(ii) The three-group common-slope simple linear regression model
\[
y_{ij} = \beta_{i1} + \beta_2x_{ij} + e_{ij} \quad (i = 1, 2, 3;\ j = 1, \ldots, 10).
\]
Note: This exercise was inspired by Lane and Dumouchel (1994).
Solution (a) For the $i$th combination of the classificatory explanatory variables, $x^T\beta = x_i^T\beta_i$ and $x^T(X^TX)^{-1}x = x_i^TG_{ii}x_i$. Thus, the $100(1-\xi/q)\%$ Scheffé-based confidence band for $x_i^T\beta_i$ is given by
\[
x_i^T\hat\beta_i \pm \hat\sigma\sqrt{p_cF_{\xi/q,p_c,n-p}\,x_i^TG_{ii}x_i} \quad \text{for all } x_i \in \mathbb{R}^{p_c}.
\]
Applying the Bonferroni inequality to these intervals yields the specified intervals for $x_i^T\beta_i$ for all $x_i \in \mathbb{R}^{p_c}$ and all $i = 1, \ldots, q$ whose simultaneous coverage probability is at least $1 - \xi$.

(b) $\sqrt{\dfrac{p_cF_{\xi/q,p_c,n-p}}{pF_{\xi,p,n-p}}}$.

(c) (i) $\sqrt{\dfrac{2F_{.05/3,2,24}}{6F_{.05,6,24}}} = 0.8053$.
(ii) $\sqrt{\dfrac{2F_{.05/3,2,26}}{4F_{.05,4,26}}} = 0.9367$.
References
Carlstein, E. (1986). Simultaneous confidence regions for predictions. The American Statistician, 40, 277–279.
Lane, T. P. & Dumouchel, W. H. (1994). Simultaneous confidence intervals in multiple regression. The American Statistician, 48, 315–321.
Tukey, J. W. (1949). One degree of freedom for non-additivity. Biometrics, 5, 232–242.
Zimmerman, D. L. (1987). Simultaneous confidence regions for predictions based on the multivariate t distribution. The American Statistician, 41, 247.
16 Inference for Variance–Covariance Parameters
This chapter presents exercises on inference for the variance–covariance parameters of a linear model and provides solutions to those exercises.

Exercise 1 Consider a two-way layout with two rows and two columns. Suppose that there are two observations, labelled as $y_{111}$ and $y_{112}$, in the upper left cell; one observation, labelled as $y_{121}$, in the upper right cell; and one observation, labelled as $y_{211}$, in the lower left cell, as in the fourth layout displayed in Example 7.1-3 (the lower right cell is empty). Suppose that the observations follow the normal two-way mixed main effects model
\[
y_{ijk} = \mu + \alpha_i + b_j + d_{ijk} \quad (i, j, k) \in \{(1,1,1), (1,1,2), (1,2,1), (2,1,1)\},
\]
where
\[
E\begin{pmatrix} b_1 \\ b_2 \\ d_{111} \\ d_{112} \\ d_{121} \\ d_{211} \end{pmatrix} = 0
\quad \text{and} \quad
\mathrm{var}\begin{pmatrix} b_1 \\ b_2 \\ d_{111} \\ d_{112} \\ d_{121} \\ d_{211} \end{pmatrix} = \begin{pmatrix} \sigma_b^2I_2 & 0 \\ 0 & \sigma^2I_4 \end{pmatrix}.
\]
Here, the variance components $\sigma_b^2 \ge 0$ and $\sigma^2 > 0$ are unknown.
(a) Obtain quadratic unbiased estimators of the variance components.
(b) Let $\xi \in (0, 1)$. Obtain a $100(1-\xi)\%$ confidence interval for $\psi \equiv \sigma_b^2/\sigma^2$, which depends on the data only through $y_{111} - y_{112}$ and $y_{121} - (y_{111} + y_{112})/2$. (Hint: First obtain the joint distribution of these two linear functions of the data, and then obtain the joint distribution of their squares, suitably scaled.)
© Springer Nature Switzerland AG 2020 D. L. Zimmerman, Linear Model Theory, https://doi.org/10.1007/978-3-030-52074-8_16
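Solution (a) $E[(y_{111} - y_{112})^2] = \mathrm{var}(y_{111} - y_{112}) + [E(y_{111} - y_{112})]^2 = 2\sigma^2$, so $(y_{111} - y_{112})^2/2$ is a quadratic unbiased estimator of $\sigma^2$. Also, $E[(y_{111} - y_{121})^2] = \mathrm{var}(y_{111} - y_{121}) + [E(y_{111} - y_{121})]^2 = \mathrm{var}(b_1 + d_{111} - b_2 - d_{121}) = 2\sigma_b^2 + 2\sigma^2$. Thus, $\frac{1}{2}(y_{111} - y_{121})^2 - \frac{1}{2}(y_{111} - y_{112})^2$ is a quadratic unbiased estimator of $\sigma_b^2$.

(b) The joint distribution of $\begin{pmatrix} y_{111} - y_{112} \\ y_{121} - (y_{111} + y_{112})/2 \end{pmatrix}$ is bivariate normal with mean vector $0_2$ and variance–covariance matrix $\begin{pmatrix} 2\sigma^2 & 0 \\ 0 & 2\sigma_b^2 + \frac{3}{2}\sigma^2 \end{pmatrix}$ because $\mathrm{var}(y_{111} - y_{112}) = 2\sigma^2$ according to part (a),
\[
\mathrm{var}[y_{121} - (y_{111} + y_{112})/2] = \mathrm{var}[b_2 + d_{121} - b_1 - (d_{111} + d_{112})/2] = 2\sigma_b^2 + \tfrac{3}{2}\sigma^2,
\]
and
\begin{align*}
\mathrm{cov}[y_{111} - y_{112},\ y_{121} - (y_{111} + y_{112})/2]
&= \mathrm{cov}(y_{111}, y_{121}) - \mathrm{cov}(y_{112}, y_{121}) - \mathrm{cov}[y_{111}, (y_{111} + y_{112})/2] + \mathrm{cov}[y_{112}, (y_{111} + y_{112})/2] \\
&= 0 - 0 - \mathrm{cov}[b_1 + d_{111},\ b_1 + (d_{111} + d_{112})/2] + \mathrm{cov}[b_1 + d_{112},\ b_1 + (d_{111} + d_{112})/2] \\
&= -(\sigma_b^2 + \tfrac{1}{2}\sigma^2) + \sigma_b^2 + \tfrac{1}{2}\sigma^2 = 0.
\end{align*}
Next define $S_1^2 = \frac{1}{2}(y_{111} - y_{112})^2$ and $S_2^2 = \frac{1}{2}[y_{121} - (y_{111} + y_{112})/2]^2$. Then,
\[
\frac{S_1^2}{\sigma^2} \sim \chi^2(1) \quad \text{and} \quad \frac{S_2^2}{\sigma_b^2 + \frac{3}{4}\sigma^2} \sim \chi^2(1),
\]
and these two random variables are independent because $y_{111} - y_{112}$ and $y_{121} - (y_{111} + y_{112})/2$ are independent by Theorem 14.1.8. Now define
\[
U = \frac{S_2^2}{S_1^2} \quad \text{and} \quad U^* = \frac{S_2^2/(\sigma_b^2 + \frac{3}{4}\sigma^2)}{S_1^2/\sigma^2} = \frac{\sigma^2}{\sigma_b^2 + \frac{3}{4}\sigma^2}\,U.
\]
$U^* \sim F(1, 1)$ by Definition 14.5.1, so
\begin{align*}
1 - \xi &= \Pr(F_{1-(\xi/2),1,1} \le U^* \le F_{\xi/2,1,1}) \\
&= \Pr\left(F_{1-(\xi/2),1,1} \le \frac{\sigma^2}{\sigma_b^2 + \frac{3}{4}\sigma^2}\,U \le F_{\xi/2,1,1}\right) \\
&= \Pr\left(F_{1-(\xi/2),1,1} \le \frac{1}{\psi + \frac{3}{4}}\,U \le F_{\xi/2,1,1}\right) \\
&= \Pr\left(\frac{U}{F_{\xi/2,1,1}} - \frac{3}{4} \le \psi \le \frac{U}{F_{1-(\xi/2),1,1}} - \frac{3}{4}\right).
\end{align*}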
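The final probability statement brackets $\psi$ between $U/F_{\xi/2,1,1} - 3/4$ and $U/F_{1-(\xi/2),1,1} - 3/4$. Because an $F(1,1)$ variable is the square of a $t(1)$ (standard Cauchy) variable, its upper cutoffs have the closed form $F_{\alpha,1,1} = \tan^2[\pi(1-\alpha)/2]$, so the endpoints can be sketched without statistical tables. The function names and the sample values below are hypothetical, and negative endpoints are truncated at zero.

```python
import math


def f11_upper(alpha):
    """Upper-alpha point of F(1,1), using F(1,1) = (standard Cauchy)^2."""
    return math.tan(math.pi * (1 - alpha) / 2) ** 2


def psi_interval(y111, y112, y121, xi):
    """Endpoints U/F - 3/4 for psi = sigma_b^2 / sigma^2, truncated at zero.

    Requires y111 != y112 so that S1^2 > 0.
    """
    s1_sq = 0.5 * (y111 - y112) ** 2
    s2_sq = 0.5 * (y121 - (y111 + y112) / 2) ** 2
    u = s2_sq / s1_sq
    lower = u / f11_upper(xi / 2) - 0.75
    upper = u / f11_upper(1 - xi / 2) - 0.75
    return max(lower, 0.0), max(upper, 0.0)


# Hypothetical observations, for illustration only.
lower, upper = psi_interval(y111=1.0, y112=3.0, y121=10.0, xi=0.5)
```

With only one degree of freedom in both numerator and denominator, the cutoffs are extreme (for instance, `f11_upper(0.025)` is roughly 648), so intervals of this kind are typically very wide.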
Thus,
\[
\left[\frac{U}{F_{\xi/2,1,1}} - \frac{3}{4},\ \frac{U}{F_{1-(\xi/2),1,1}} - \frac{3}{4}\right]
\]
is a $100(1-\xi)\%$ confidence interval for $\psi$. If any endpoints of the interval are negative, they can be set equal to zero.

Exercise 2 Consider the normal two-way main effects components-of-variance model with only one observation per cell,
\[
y_{ij} = \mu + a_i + b_j + d_{ij} \quad (i = 1, \ldots, q;\ j = 1, \ldots, m)
\]
where ai ∼ N(0, σa2 ), bj ∼ N(0, σb2 ), and dij ∼ N(0, σ 2 ) for all i and j , and where {ai }, {bj }, and {dij } are mutually independent. The corrected two-part mixed-model ANOVA table (with Factor A fitted first) is given below, with an additional column giving the distributions of suitably scaled mean squares under the model. Source Factor A
df q −1
Sum of squares Mean square q m i=1 (y¯i· − y¯·· )2 S12 m Factor B m−1 q j =1 (y¯·j − y¯·· )2 S22 q m 2 Residual (q − 1)(m − 1) S32 j =1 (yij − y¯i· − y¯·j + y¯·· ) i=1 q m 2 Total qm − 1 j =1 (yij − y¯·· ) i=1
Here, f1 S12 ∼ χ 2 (q − 1), f2 S22 ∼ χ 2 (m − 1), and f3 S32 ∼ χ 2 [(q − 1)(m − 1)], where f1 =
q −1 , σ 2 + mσa2
f2 =
m−1 , + qσb2
σ2
f3 =
(q − 1)(m − 1) . σ2
Furthermore, S12 , S22 , and S32 are mutually independent. Let ξ ∈ (0, 1). (a) Verify the distributions given in the last column of the table, and verify that S12 , S22 , and S32 are mutually independent. (b) Obtain minimum variance quadratic unbiased estimators of the variance components. (c) Obtain a 100(1 − ξ )% confidence interval for σ 2 + qσb2 . (d) Obtain a 100(1 − ξ )% confidence interval for σb2 /σ 2 . (e) Obtain a confidence interval for (σa2 − σb2 )/σ 2 which has coverage probability at least 1 − ξ . [Hint: first combine the interval from part (d) with a similarly constructed interval for σa2 /σ 2 via the Bonferroni inequality.]
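(f) Use Satterthwaite's method to obtain an approximate $100(1-\xi)\%$ confidence interval for $\sigma_a^2$.
(g) Obtain size-$\xi$ tests of $H_0: \sigma_a^2 = 0$ versus $H_a: \sigma_a^2 > 0$ and $H_0: \sigma_b^2 = 0$ versus $H_a: \sigma_b^2 > 0$.

Solution

(a) Here $f_1 S_1^2 = y^T A_1 y$, $f_2 S_2^2 = y^T A_2 y$, and $f_3 S_3^2 = y^T A_3 y$, where
\begin{align*}
A_1 &= \frac{1}{\sigma^2 + m\sigma_a^2}\left[(I_q - \tfrac{1}{q}J_q) \otimes \tfrac{1}{m}J_m\right], \\
A_2 &= \frac{1}{\sigma^2 + q\sigma_b^2}\left[\tfrac{1}{q}J_q \otimes (I_m - \tfrac{1}{m}J_m)\right], \\
A_3 &= \frac{1}{\sigma^2}\left[(I_q - \tfrac{1}{q}J_q) \otimes (I_m - \tfrac{1}{m}J_m)\right].
\end{align*}
Here the expression for $A_1$ was obtained by exploiting the similarity of this model up to and including the first random effect to the one-factor random model of Example 16.1.4-1; the expression for $A_2$ was obtained by symmetry; and the expression for $A_3$ was obtained by subtraction of the first two sums of squares from the corrected total sum of squares followed by the repeated use of Theorem 2.17.2, i.e.,
\begin{align*}
& y^T\{(I_q \otimes I_m) - (\tfrac{1}{q}J_q \otimes \tfrac{1}{m}J_m) - [(I_q - \tfrac{1}{q}J_q) \otimes \tfrac{1}{m}J_m] - [\tfrac{1}{q}J_q \otimes (I_m - \tfrac{1}{m}J_m)]\}y \\
&\quad = y^T\{(I_q \otimes I_m) - (I_q \otimes \tfrac{1}{m}J_m) - [\tfrac{1}{q}J_q \otimes (I_m - \tfrac{1}{m}J_m)]\}y \\
&\quad = y^T\{[I_q \otimes (I_m - \tfrac{1}{m}J_m)] - [\tfrac{1}{q}J_q \otimes (I_m - \tfrac{1}{m}J_m)]\}y \\
&\quad = y^T[(I_q - \tfrac{1}{q}J_q) \otimes (I_m - \tfrac{1}{m}J_m)]y.
\end{align*}
Furthermore, $\Sigma \equiv \mathrm{var}(y) = \sigma_a^2(I_q \otimes J_m) + \sigma_b^2(J_q \otimes I_m) + \sigma^2(I_q \otimes I_m)$. Now,
\begin{align*}
A_1\Sigma &= \frac{1}{\sigma^2 + m\sigma_a^2}\{\sigma_a^2[(I_q - \tfrac{1}{q}J_q) \otimes J_m] + \sigma_b^2[0_{q\times q} \otimes \tfrac{1}{m}J_m] + \sigma^2[(I_q - \tfrac{1}{q}J_q) \otimes \tfrac{1}{m}J_m]\} \\
&= (I_q - \tfrac{1}{q}J_q) \otimes \tfrac{1}{m}J_m, \\
A_2\Sigma &= \frac{1}{\sigma^2 + q\sigma_b^2}\{\sigma_a^2[\tfrac{1}{q}J_q \otimes 0_{m\times m}] + \sigma_b^2[J_q \otimes (I_m - \tfrac{1}{m}J_m)] + \sigma^2[\tfrac{1}{q}J_q \otimes (I_m - \tfrac{1}{m}J_m)]\} \\
&= \tfrac{1}{q}J_q \otimes (I_m - \tfrac{1}{m}J_m), \\
A_3\Sigma &= \frac{1}{\sigma^2}\{\sigma_a^2[(I_q - \tfrac{1}{q}J_q) \otimes 0_{m\times m}] + \sigma_b^2[0_{q\times q} \otimes (I_m - \tfrac{1}{m}J_m)] + \sigma^2[(I_q - \tfrac{1}{q}J_q) \otimes (I_m - \tfrac{1}{m}J_m)]\} \\
&= (I_q - \tfrac{1}{q}J_q) \otimes (I_m - \tfrac{1}{m}J_m).
\end{align*}
Observe that $A_1\Sigma$, $A_2\Sigma$, and $A_3\Sigma$ are idempotent (because each term in each Kronecker product is idempotent). Also observe that $\mathrm{rank}(A_1) = q - 1$, $\mathrm{rank}(A_2) = m - 1$, and $\mathrm{rank}(A_3) = (q-1)(m-1)$ by Theorem 2.17.8, and that $A_i 1\mu = \mu A_i(1_q \otimes 1_m) = 0$ for $i = 1, 2, 3$. Thus, by Corollary 14.2.6.1, $f_1 S_1^2 \sim \chi^2(q-1)$, $f_2 S_2^2 \sim \chi^2(m-1)$, and $f_3 S_3^2 \sim \chi^2[(q-1)(m-1)]$. Moreover,
\begin{align*}
A_1\Sigma A_2 &= \frac{1}{\sigma^2 + q\sigma_b^2}(0_{q\times q} \otimes 0_{m\times m}) = 0, \\
A_1\Sigma A_3 &= \frac{1}{\sigma^2}[(I_q - \tfrac{1}{q}J_q) \otimes 0_{m\times m}] = 0, \\
A_2\Sigma A_3 &= \frac{1}{\sigma^2}[0_{q\times q} \otimes (I_m - \tfrac{1}{m}J_m)] = 0,
\end{align*}
so by Corollary 14.3.1.1 $S_1^2$, $S_2^2$, and $S_3^2$ are independent.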
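The matrix identities used in part (a) can be spot-checked numerically. The following is a minimal Python sketch, not taken from the text, that uses exact rational arithmetic. The dimensions q = 3 and m = 4, the variance-component values, and all helper names are arbitrary choices made for illustration.

```python
from fractions import Fraction

def eye(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def ones(n):
    return [[Fraction(1)] * n for _ in range(n)]

def scale(c, X):
    return [[c * x for x in row] for row in X]

def lin(a, X, b, Y):
    """Return a*X + b*Y for equally sized matrices."""
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def kron(X, Y):
    """Kronecker product: block (i, j) of the result is X[i][j] * Y."""
    return [[x * y for x in rx for y in ry] for rx in X for ry in Y]

q, m = 3, 4                                            # illustrative dimensions
s2, s2a, s2b = Fraction(2), Fraction(3), Fraction(5)   # sigma^2, sigma_a^2, sigma_b^2 (arbitrary)

Iq, Im, Jq, Jm = eye(q), eye(m), ones(q), ones(m)
Cq = lin(1, Iq, Fraction(-1, q), Jq)                   # I_q - (1/q) J_q
Cm = lin(1, Im, Fraction(-1, m), Jm)                   # I_m - (1/m) J_m
Pq, Pm = scale(Fraction(1, q), Jq), scale(Fraction(1, m), Jm)

A = [scale(1 / (s2 + m * s2a), kron(Cq, Pm)),
     scale(1 / (s2 + q * s2b), kron(Pq, Cm)),
     scale(1 / s2, kron(Cq, Cm))]
Sigma = lin(1, lin(s2a, kron(Iq, Jm), s2b, kron(Jq, Im)), s2, kron(Iq, Im))

AS = [matmul(Ai, Sigma) for Ai in A]
idempotent = [matmul(X, X) == X for X in AS]
traces = [sum(X[i][i] for i in range(q * m)) for X in AS]   # trace = rank for idempotent matrices
cross_zero = [all(v == 0 for row in matmul(AS[i], A[j]) for v in row)
              for i in range(3) for j in range(i + 1, 3)]
print(idempotent, traces, cross_zero)
```

The traces should equal the degrees of freedom q - 1, m - 1, and (q - 1)(m - 1), which gives a quick consistency check on the ranks quoted above.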
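(b) Using the distributional results from part (a), $E\left[\frac{(q-1)S_1^2}{\sigma^2 + m\sigma_a^2}\right] = q - 1$ so $E(S_1^2) = \sigma^2 + m\sigma_a^2$; $E\left[\frac{(m-1)S_2^2}{\sigma^2 + q\sigma_b^2}\right] = m - 1$ so $E(S_2^2) = \sigma^2 + q\sigma_b^2$; and $E\left[\frac{(q-1)(m-1)S_3^2}{\sigma^2}\right] = (q-1)(m-1)$, so $E(S_3^2) = \sigma^2$. Thus, minimum variance quadratic unbiased estimators are
\[
\hat{\sigma}^2 = S_3^2, \quad \hat{\sigma}_a^2 = (S_1^2 - S_3^2)/m, \quad \hat{\sigma}_b^2 = (S_2^2 - S_3^2)/q.
\]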
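As an illustration of part (b), the mean squares and the resulting estimators can be computed directly from a q × m data table. This is a minimal Python sketch under the model of this exercise; the function name and the small data set are hypothetical.

```python
def two_way_estimates(y):
    """y: list of q rows, each with m entries (one observation per cell)."""
    q, m = len(y), len(y[0])
    grand = sum(map(sum, y)) / (q * m)
    row_means = [sum(row) / m for row in y]
    col_means = [sum(y[i][j] for i in range(q)) / q for j in range(m)]
    ss_a = m * sum((r - grand) ** 2 for r in row_means)
    ss_b = q * sum((c - grand) ** 2 for c in col_means)
    ss_res = sum((y[i][j] - row_means[i] - col_means[j] + grand) ** 2
                 for i in range(q) for j in range(m))
    s1 = ss_a / (q - 1)                    # S_1^2
    s2 = ss_b / (m - 1)                    # S_2^2
    s3 = ss_res / ((q - 1) * (m - 1))      # S_3^2
    return {"S1": s1, "S2": s2, "S3": s3,
            "sigma2": s3,
            "sigma2_a": (s1 - s3) / m,
            "sigma2_b": (s2 - s3) / q}

# Hypothetical 2 x 2 table, small enough to verify by hand.
result = two_way_estimates([[1.0, 2.0], [3.0, 6.0]])
print(result)
```

For this table the sums of squares are 9, 4, and 1, so the estimators are 1 for $\sigma^2$, 4 for $\sigma_a^2$, and 1.5 for $\sigma_b^2$. The estimators of $\sigma_a^2$ and $\sigma_b^2$ can be negative for other data.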
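(c)
\begin{align*}
1 - \xi &= \Pr\left(\chi^2_{1-(\xi/2),m-1} \le \frac{(m-1)S_2^2}{\sigma^2 + q\sigma_b^2} \le \chi^2_{\xi/2,m-1}\right) \\
&= \Pr\left(\frac{(m-1)S_2^2}{\chi^2_{\xi/2,m-1}} \le \sigma^2 + q\sigma_b^2 \le \frac{(m-1)S_2^2}{\chi^2_{1-(\xi/2),m-1}}\right).
\end{align*}
Therefore,
\[
\left[\frac{(m-1)S_2^2}{\chi^2_{\xi/2,m-1}},\ \frac{(m-1)S_2^2}{\chi^2_{1-(\xi/2),m-1}}\right]
\]
is a $100(1-\xi)\%$ confidence interval for $\sigma^2 + q\sigma_b^2$.

(d) Define
\[
U = \frac{S_2^2}{S_3^2}, \quad U^* = \frac{S_2^2/(\sigma^2 + q\sigma_b^2)}{S_3^2/\sigma^2}, \quad \text{and} \quad \psi = \frac{\sigma_b^2}{\sigma^2},
\]
and observe that $U^* \sim F[m-1, (q-1)(m-1)]$. Therefore,
\begin{align*}
1 - \xi &= \Pr\left(F_{1-(\xi/2),m-1,(q-1)(m-1)} \le \frac{\sigma^2}{\sigma^2 + q\sigma_b^2}\, U \le F_{\xi/2,m-1,(q-1)(m-1)}\right) \\
&= \Pr\left(F_{1-(\xi/2),m-1,(q-1)(m-1)} \le \frac{1}{1 + q\psi}\, U \le F_{\xi/2,m-1,(q-1)(m-1)}\right) \\
&= \Pr\left(\frac{U}{F_{\xi/2,m-1,(q-1)(m-1)}} \le 1 + q\psi \le \frac{U}{F_{1-(\xi/2),m-1,(q-1)(m-1)}}\right) \\
&= \Pr\left(\frac{U/F_{\xi/2,m-1,(q-1)(m-1)} - 1}{q} \le \psi \le \frac{U/F_{1-(\xi/2),m-1,(q-1)(m-1)} - 1}{q}\right).
\end{align*}
Therefore,
\[
\left[\frac{U/F_{\xi/2,m-1,(q-1)(m-1)} - 1}{q},\ \frac{U/F_{1-(\xi/2),m-1,(q-1)(m-1)} - 1}{q}\right]
\]
is a $100(1-\xi)\%$ confidence interval for $\sigma_b^2/\sigma^2$. If any endpoints of the interval are negative, they can be set equal to zero.

(e) Denote the interval obtained by replacing $\xi/2$ with $\xi/4$ in part (d) by $[L_1, R_1]$. A similar interval could be constructed for $\sigma_a^2/\sigma^2$; denote that interval by $[L_2, R_2]$.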
Let $A$ denote the event $L_1 \le \sigma_b^2/\sigma^2 \le R_1$ and $L_2 \le \sigma_a^2/\sigma^2 \le R_2$, which is equivalent to the event $-R_1 \le -\sigma_b^2/\sigma^2 \le -L_1$ and $L_2 \le \sigma_a^2/\sigma^2 \le R_2$. Then, by the Bonferroni inequality, $\Pr(A) \ge 1 - \xi$. Now by Fact 15.3,
\[
\Pr(A) \le \Pr\left(L_2 - R_1 \le \frac{\sigma_a^2 - \sigma_b^2}{\sigma^2} \le R_2 - L_1\right).
\]
Thus, $[L_2 - R_1, R_2 - L_1]$ is an interval for $(\sigma_a^2 - \sigma_b^2)/\sigma^2$, which has coverage probability at least $1 - \xi$.

(f) In the notation used to describe Satterthwaite's approximation, let $U_1 = n_1 S_1^2/\alpha_1^2$, where $n_1 = q - 1$ and $\alpha_1^2 = \sigma^2 + m\sigma_a^2$. Similarly, let $U_3 = n_3 S_3^2/\alpha_3^2$, where $n_3 = (q-1)(m-1)$ and $\alpha_3^2 = \sigma^2$. We want a confidence interval for $\sigma_a^2 = \frac{1}{m}(\sigma^2 + m\sigma_a^2) - \frac{\sigma^2}{m}$, so we let $c_1 = \frac{1}{m}$ and $c_3 = -\frac{1}{m}$. Then, specializing (16.4) and (16.5), an approximate $100(1-\xi)\%$ confidence interval for $\sigma_a^2$ is given by
\[
\left[\frac{\hat{t}(S_1^2 - S_3^2)}{m\chi^2_{\xi/2,\hat{t}}},\ \frac{\hat{t}(S_1^2 - S_3^2)}{m\chi^2_{1-(\xi/2),\hat{t}}}\right]
\]
where
\[
\hat{t} = \frac{(\frac{1}{m}S_1^2 - \frac{1}{m}S_3^2)^2}{\frac{(S_1^2/m)^2}{q-1} + \frac{(-S_3^2/m)^2}{(q-1)(m-1)}} = \frac{(S_1^2 - S_3^2)^2}{\frac{S_1^4}{q-1} + \frac{S_3^4}{(q-1)(m-1)}}.
\]

(g) $\frac{S_1^2/(\sigma^2 + m\sigma_a^2)}{S_3^2/\sigma^2} \sim F[q-1, (q-1)(m-1)]$ so $\frac{S_1^2}{S_3^2} \sim \frac{\sigma^2 + m\sigma_a^2}{\sigma^2} F[q-1, (q-1)(m-1)]$. Thus, a size-$\xi$ test of $H_0: \sigma_a^2 = 0$ versus $H_a: \sigma_a^2 > 0$ is to reject $H_0$ if and only if $\frac{S_1^2}{S_3^2} > F_{\xi,q-1,(q-1)(m-1)}$. Similarly, $\frac{S_2^2/(\sigma^2 + q\sigma_b^2)}{S_3^2/\sigma^2} \sim F[m-1, (q-1)(m-1)]$ so $\frac{S_2^2}{S_3^2} \sim \frac{\sigma^2 + q\sigma_b^2}{\sigma^2} F[m-1, (q-1)(m-1)]$. Thus, a size-$\xi$ test of $H_0: \sigma_b^2 = 0$ versus $H_a: \sigma_b^2 > 0$ is to reject $H_0$ if and only if $\frac{S_2^2}{S_3^2} > F_{\xi,m-1,(q-1)(m-1)}$.
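The point estimate and the estimated degrees of freedom used in part (f) can be sketched in a few lines of Python. The helper below is a generic version of Satterthwaite's formula for a linear combination of independent mean squares; its name and the numerical mean squares are hypothetical. It does not compute the chi-square percentage points, which for non-integer degrees of freedom would need a statistical library.

```python
def satterthwaite(coefs, mean_squares, dfs):
    """Return (sum_i c_i S_i^2, t_hat), where
    t_hat = (sum_i c_i S_i^2)^2 / sum_i (c_i S_i^2)^2 / n_i."""
    terms = [c * s for c, s in zip(coefs, mean_squares)]
    estimate = sum(terms)
    t_hat = estimate ** 2 / sum(t ** 2 / n for t, n in zip(terms, dfs))
    return estimate, t_hat

# Part (f): sigma_a^2 = (1/m)(sigma^2 + m sigma_a^2) - (1/m) sigma^2.
q, m = 5, 4
S1, S3 = 12.0, 2.0                      # hypothetical values of S_1^2 and S_3^2
est, t_hat = satterthwaite([1 / m, -1 / m], [S1, S3],
                           [q - 1, (q - 1) * (m - 1)])
print(est, t_hat)
```

With these inputs the estimate is $(S_1^2 - S_3^2)/m = 2.5$, and $\hat{t}$ agrees with the simplified expression $(S_1^2 - S_3^2)^2/[S_1^4/(q-1) + S_3^4/((q-1)(m-1))]$, because the factor $1/m^2$ cancels.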
Exercise 3 Consider the normal two-way components-of-variance model with interaction and balanced data:
\[
y_{ijk} = \mu + a_i + b_j + c_{ij} + d_{ijk} \quad (i = 1, \ldots, q;\ j = 1, \ldots, m;\ k = 1, \ldots, r)
\]
where $a_i \sim N(0, \sigma_a^2)$, $b_j \sim N(0, \sigma_b^2)$, $c_{ij} \sim N(0, \sigma_c^2)$, and $d_{ijk} \sim N(0, \sigma^2)$ for all $i$, $j$, and $k$, and where $\{a_i\}$, $\{b_j\}$, $\{c_{ij}\}$, and $\{d_{ijk}\}$ are mutually independent. The corrected three-part mixed-model ANOVA table (corresponding to fitting the interaction terms after the main effects) is given below, with an additional column giving the distributions of suitably scaled mean squares under the model.

Source        df              Sum of squares                                                                                                              Mean square
Factor A      $q-1$           $mr\sum_{i=1}^{q}(\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot})^2$                                                     $S_1^2$
Factor B      $m-1$           $qr\sum_{j=1}^{m}(\bar{y}_{\cdot j\cdot} - \bar{y}_{\cdot\cdot\cdot})^2$                                                    $S_2^2$
Interaction   $(q-1)(m-1)$    $r\sum_{i=1}^{q}\sum_{j=1}^{m}(\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot j\cdot} + \bar{y}_{\cdot\cdot\cdot})^2$   $S_3^2$
Residual      $qm(r-1)$       $\sum_{i=1}^{q}\sum_{j=1}^{m}\sum_{k=1}^{r}(y_{ijk} - \bar{y}_{ij\cdot})^2$                                                 $S_4^2$
Total         $qmr-1$         $\sum_{i=1}^{q}\sum_{j=1}^{m}\sum_{k=1}^{r}(y_{ijk} - \bar{y}_{\cdot\cdot\cdot})^2$

Here $f_1 S_1^2 \sim \chi^2(q-1)$, $f_2 S_2^2 \sim \chi^2(m-1)$, $f_3 S_3^2 \sim \chi^2[(q-1)(m-1)]$, $f_4 S_4^2 \sim \chi^2[qm(r-1)]$, where
\[
f_1 = \frac{q-1}{\sigma^2 + r\sigma_c^2 + mr\sigma_a^2}, \quad f_2 = \frac{m-1}{\sigma^2 + r\sigma_c^2 + qr\sigma_b^2}, \quad f_3 = \frac{(q-1)(m-1)}{\sigma^2 + r\sigma_c^2}, \quad f_4 = \frac{qm(r-1)}{\sigma^2}.
\]
Furthermore, $S_1^2$, $S_2^2$, $S_3^2$, and $S_4^2$ are mutually independent. Let $\xi \in (0, 1)$.
(a) Verify the distributions given in the last column of the table, and verify that $S_1^2$, $S_2^2$, $S_3^2$, and $S_4^2$ are mutually independent.
(b) Obtain minimum variance quadratic unbiased estimators of the variance components.
(c) Obtain a $100(1-\xi)\%$ confidence interval for $\frac{\sigma_b^2}{\sigma^2 + r\sigma_c^2}$. (Hint: consider the distribution of $S_2^2/S_3^2$, suitably scaled.)
(d) Obtain a $100(1-\xi)\%$ confidence interval for $\frac{\sigma_a^2}{\sigma^2 + r\sigma_c^2}$. [Hint: exploit the similarity with part (c).]
(e) Using the results of parts (c) and (d), obtain a confidence interval for $\frac{\sigma_a^2 + \sigma_b^2}{\sigma^2 + r\sigma_c^2}$, which has coverage probability at least $1 - \xi$.
(f) Obtain size-$\xi$ tests of $H_0: \sigma_a^2 = 0$ versus $H_a: \sigma_a^2 > 0$, $H_0: \sigma_b^2 = 0$ versus $H_a: \sigma_b^2 > 0$, and $H_0: \sigma_c^2 = 0$ versus $H_a: \sigma_c^2 > 0$.

Solution

(a) Here $f_1 S_1^2 = y^T A_1 y$, $f_2 S_2^2 = y^T A_2 y$, $f_3 S_3^2 = y^T A_3 y$, and $f_4 S_4^2 = y^T A_4 y$, where
\begin{align*}
A_1 &= \frac{1}{\sigma^2 + r\sigma_c^2 + mr\sigma_a^2}\left[(I_q - \tfrac{1}{q}J_q) \otimes \tfrac{1}{m}J_m \otimes \tfrac{1}{r}J_r\right], \\
A_2 &= \frac{1}{\sigma^2 + r\sigma_c^2 + qr\sigma_b^2}\left[\tfrac{1}{q}J_q \otimes (I_m - \tfrac{1}{m}J_m) \otimes \tfrac{1}{r}J_r\right], \\
A_3 &= \frac{1}{\sigma^2 + r\sigma_c^2}\left[(I_q - \tfrac{1}{q}J_q) \otimes (I_m - \tfrac{1}{m}J_m) \otimes \tfrac{1}{r}J_r\right], \\
A_4 &= \frac{1}{\sigma^2}\left[I_q \otimes I_m \otimes (I_r - \tfrac{1}{r}J_r)\right].
\end{align*}
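Here the expressions for $A_1$, $A_2$, and $A_3$ were obtained by extending expressions for matrices labelled with the same symbols in the solution to Exercise 16.2 to account for replication; obtaining the expression for $A_4$ is trivial. Furthermore,
\[
\Sigma \equiv \mathrm{var}(y) = \sigma_a^2(I_q \otimes J_m \otimes J_r) + \sigma_b^2(J_q \otimes I_m \otimes J_r) + \sigma_c^2(I_q \otimes I_m \otimes J_r) + \sigma^2(I_q \otimes I_m \otimes I_r).
\]
Now,
\begin{align*}
A_1\Sigma &= \frac{1}{\sigma^2 + r\sigma_c^2 + mr\sigma_a^2}\{\sigma_a^2[(I_q - \tfrac{1}{q}J_q) \otimes J_m \otimes J_r] + \sigma_b^2[0_{q\times q} \otimes \tfrac{1}{m}J_m \otimes J_r] \\
&\qquad + \sigma_c^2[(I_q - \tfrac{1}{q}J_q) \otimes \tfrac{1}{m}J_m \otimes J_r] + \sigma^2[(I_q - \tfrac{1}{q}J_q) \otimes \tfrac{1}{m}J_m \otimes \tfrac{1}{r}J_r]\} \\
&= (I_q - \tfrac{1}{q}J_q) \otimes \tfrac{1}{m}J_m \otimes \tfrac{1}{r}J_r, \\
A_2\Sigma &= \frac{1}{\sigma^2 + r\sigma_c^2 + qr\sigma_b^2}\{\sigma_a^2[\tfrac{1}{q}J_q \otimes 0_{m\times m} \otimes J_r] + \sigma_b^2[J_q \otimes (I_m - \tfrac{1}{m}J_m) \otimes J_r] \\
&\qquad + \sigma_c^2[\tfrac{1}{q}J_q \otimes (I_m - \tfrac{1}{m}J_m) \otimes J_r] + \sigma^2[\tfrac{1}{q}J_q \otimes (I_m - \tfrac{1}{m}J_m) \otimes \tfrac{1}{r}J_r]\} \\
&= \tfrac{1}{q}J_q \otimes (I_m - \tfrac{1}{m}J_m) \otimes \tfrac{1}{r}J_r, \\
A_3\Sigma &= \frac{1}{\sigma^2 + r\sigma_c^2}\{\sigma_a^2[(I_q - \tfrac{1}{q}J_q) \otimes 0_{m\times m} \otimes J_r] + \sigma_b^2[0_{q\times q} \otimes (I_m - \tfrac{1}{m}J_m) \otimes J_r] \\
&\qquad + \sigma_c^2[(I_q - \tfrac{1}{q}J_q) \otimes (I_m - \tfrac{1}{m}J_m) \otimes J_r] + \sigma^2[(I_q - \tfrac{1}{q}J_q) \otimes (I_m - \tfrac{1}{m}J_m) \otimes \tfrac{1}{r}J_r]\} \\
&= (I_q - \tfrac{1}{q}J_q) \otimes (I_m - \tfrac{1}{m}J_m) \otimes \tfrac{1}{r}J_r, \\
A_4\Sigma &= \frac{1}{\sigma^2}\{\sigma_a^2[I_q \otimes J_m \otimes 0_{r\times r}] + \sigma_b^2[J_q \otimes I_m \otimes 0_{r\times r}] + \sigma_c^2[I_q \otimes I_m \otimes 0_{r\times r}] \\
&\qquad + \sigma^2[I_q \otimes I_m \otimes (I_r - \tfrac{1}{r}J_r)]\} \\
&= I_q \otimes I_m \otimes (I_r - \tfrac{1}{r}J_r).
\end{align*}
Observe that $A_1\Sigma$, $A_2\Sigma$, $A_3\Sigma$, and $A_4\Sigma$ are idempotent (because each term in each Kronecker product is idempotent). Also observe that $\mathrm{rank}(A_1) = q - 1$, $\mathrm{rank}(A_2) = m - 1$, $\mathrm{rank}(A_3) = (q-1)(m-1)$, and $\mathrm{rank}(A_4) = qm(r-1)$ by Theorem 2.17.8, and that $A_i 1\mu = \mu A_i(1_q \otimes 1_m \otimes 1_r) = 0$ for $i = 1, 2, 3, 4$. Thus, by Corollary 14.2.6.1, $f_1 S_1^2 \sim \chi^2(q-1)$, $f_2 S_2^2 \sim \chi^2(m-1)$, $f_3 S_3^2 \sim \chi^2[(q-1)(m-1)]$, and $f_4 S_4^2 \sim \chi^2[qm(r-1)]$. Moreover,
\begin{align*}
A_1\Sigma A_2 &= \frac{1}{\sigma^2 + r\sigma_c^2 + qr\sigma_b^2}(0_{q\times q} \otimes 0_{m\times m} \otimes \tfrac{1}{r}J_r) = 0, \\
A_1\Sigma A_3 &= \frac{1}{\sigma^2 + r\sigma_c^2}[(I_q - \tfrac{1}{q}J_q) \otimes 0_{m\times m} \otimes \tfrac{1}{r}J_r] = 0, \\
A_1\Sigma A_4 &= \frac{1}{\sigma^2}[(I_q - \tfrac{1}{q}J_q) \otimes \tfrac{1}{m}J_m \otimes 0_{r\times r}] = 0, \\
A_2\Sigma A_3 &= \frac{1}{\sigma^2 + r\sigma_c^2}[0_{q\times q} \otimes (I_m - \tfrac{1}{m}J_m) \otimes \tfrac{1}{r}J_r] = 0, \\
A_2\Sigma A_4 &= \frac{1}{\sigma^2}[\tfrac{1}{q}J_q \otimes (I_m - \tfrac{1}{m}J_m) \otimes 0_{r\times r}] = 0, \\
A_3\Sigma A_4 &= \frac{1}{\sigma^2}[(I_q - \tfrac{1}{q}J_q) \otimes (I_m - \tfrac{1}{m}J_m) \otimes 0_{r\times r}] = 0,
\end{align*}
so by Corollary 14.3.1.1 $S_1^2$, $S_2^2$, $S_3^2$, and $S_4^2$ are independent.

(b) Using the distributional results from part (a), $E\left[\frac{(q-1)S_1^2}{\sigma^2 + r\sigma_c^2 + mr\sigma_a^2}\right] = q - 1$ so $E(S_1^2) = \sigma^2 + r\sigma_c^2 + mr\sigma_a^2$; $E\left[\frac{(m-1)S_2^2}{\sigma^2 + r\sigma_c^2 + qr\sigma_b^2}\right] = m - 1$ so $E(S_2^2) = \sigma^2 + r\sigma_c^2 + qr\sigma_b^2$; $E\left[\frac{(q-1)(m-1)S_3^2}{\sigma^2 + r\sigma_c^2}\right] = (q-1)(m-1)$ so $E(S_3^2) = \sigma^2 + r\sigma_c^2$; and $E\left[\frac{qm(r-1)S_4^2}{\sigma^2}\right] = qm(r-1)$ so $E(S_4^2) = \sigma^2$. Thus, minimum variance quadratic unbiased estimators are
\[
\hat{\sigma}^2 = S_4^2, \quad \hat{\sigma}_c^2 = \frac{S_3^2 - S_4^2}{r}, \quad \hat{\sigma}_b^2 = \frac{S_2^2 - S_3^2}{qr}, \quad \hat{\sigma}_a^2 = \frac{S_1^2 - S_3^2}{mr}.
\]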
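The expected mean squares in part (b) can be checked by simulation. The sketch below is an illustrative Monte Carlo check in Python, not part of the original solution. It assumes $\mu = 0$ (the mean squares do not depend on $\mu$), unit variance components, and the arbitrary dimensions q = 4, m = 3, r = 2; all function names are hypothetical.

```python
import random

def mean_squares(y, q, m, r):
    """ANOVA mean squares (S1^2, S2^2, S3^2, S4^2) for balanced data y[i][j][k]."""
    cell = [[sum(y[i][j]) / r for j in range(m)] for i in range(q)]
    a = [sum(cell[i]) / m for i in range(q)]
    b = [sum(cell[i][j] for i in range(q)) / q for j in range(m)]
    g = sum(a) / q
    s1 = m * r * sum((ai - g) ** 2 for ai in a) / (q - 1)
    s2 = q * r * sum((bj - g) ** 2 for bj in b) / (m - 1)
    s3 = r * sum((cell[i][j] - a[i] - b[j] + g) ** 2
                 for i in range(q) for j in range(m)) / ((q - 1) * (m - 1))
    s4 = sum((y[i][j][k] - cell[i][j]) ** 2
             for i in range(q) for j in range(m) for k in range(r)) / (q * m * (r - 1))
    return s1, s2, s3, s4

def simulate(q, m, r, sa, sb, sc, s, reps, seed=1):
    """Average the mean squares over simulated data sets; sa, sb, sc, s are standard deviations."""
    rng = random.Random(seed)
    totals = [0.0] * 4
    for _ in range(reps):
        a = [rng.gauss(0, sa) for _ in range(q)]
        b = [rng.gauss(0, sb) for _ in range(m)]
        y = []
        for i in range(q):
            row = []
            for j in range(m):
                c = rng.gauss(0, sc)          # interaction effect c_ij
                row.append([a[i] + b[j] + c + rng.gauss(0, s) for _ in range(r)])
            y.append(row)
        for idx, val in enumerate(mean_squares(y, q, m, r)):
            totals[idx] += val
    return [t / reps for t in totals]

q, m, r = 4, 3, 2
avg = simulate(q, m, r, sa=1.0, sb=1.0, sc=1.0, s=1.0, reps=20000)
# E(S1^2), E(S2^2), E(S3^2), E(S4^2) with all variance components equal to 1.
expected = [1 + r + m * r, 1 + r + q * r, 1 + r, 1]
print(avg, expected)
```

The simulated averages should be close to 9, 11, 3, and 1, up to Monte Carlo error, which is largest for the mean squares with few degrees of freedom.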
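(c) Define
\[
U = \frac{S_2^2}{S_3^2}, \quad U^* = \frac{S_2^2/(\sigma^2 + r\sigma_c^2 + qr\sigma_b^2)}{S_3^2/(\sigma^2 + r\sigma_c^2)}, \quad \text{and} \quad \psi_1 = \frac{\sigma_b^2}{\sigma^2 + r\sigma_c^2}.
\]
Observe that $U^* \sim F[m-1, (q-1)(m-1)]$. Therefore,
\begin{align*}
1 - \xi &= \Pr\left(F_{1-(\xi/2),m-1,(q-1)(m-1)} \le \frac{\sigma^2 + r\sigma_c^2}{\sigma^2 + r\sigma_c^2 + qr\sigma_b^2}\, U \le F_{\xi/2,m-1,(q-1)(m-1)}\right) \\
&= \Pr\left(F_{1-(\xi/2),m-1,(q-1)(m-1)} \le \frac{1}{1 + qr\psi_1}\, U \le F_{\xi/2,m-1,(q-1)(m-1)}\right) \\
&= \Pr\left(\frac{(U/F_{\xi/2,m-1,(q-1)(m-1)}) - 1}{qr} \le \psi_1 \le \frac{(U/F_{1-(\xi/2),m-1,(q-1)(m-1)}) - 1}{qr}\right).
\end{align*}
So
\[
\left[\frac{(U/F_{\xi/2,m-1,(q-1)(m-1)}) - 1}{qr},\ \frac{(U/F_{1-(\xi/2),m-1,(q-1)(m-1)}) - 1}{qr}\right]
\]
is a $100(1-\xi)\%$ confidence interval for $\sigma_b^2/(\sigma^2 + r\sigma_c^2)$. If any endpoints of the interval are negative, they can be set equal to zero.

(d) Define
\[
V = \frac{S_1^2}{S_3^2}, \quad V^* = \frac{S_1^2/(\sigma^2 + r\sigma_c^2 + mr\sigma_a^2)}{S_3^2/(\sigma^2 + r\sigma_c^2)}, \quad \text{and} \quad \psi_2 = \frac{\sigma_a^2}{\sigma^2 + r\sigma_c^2}.
\]
Observe that $V^* \sim F[q-1, (q-1)(m-1)]$. By the symmetry,
\[
\left[\frac{(V/F_{\xi/2,q-1,(q-1)(m-1)}) - 1}{mr},\ \frac{(V/F_{1-(\xi/2),q-1,(q-1)(m-1)}) - 1}{mr}\right]
\]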
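is a $100(1-\xi)\%$ confidence interval for $\sigma_a^2/(\sigma^2 + r\sigma_c^2)$. If any of the endpoints of the interval are negative, they can be set equal to zero.

(e) With $U = S_2^2/S_3^2$ and $V = S_1^2/S_3^2$ as in parts (c) and (d), define
\begin{align*}
L_1 &= \frac{(U/F_{\xi/4,m-1,(q-1)(m-1)}) - 1}{qr}, & R_1 &= \frac{(U/F_{1-(\xi/4),m-1,(q-1)(m-1)}) - 1}{qr}, \\
L_2 &= \frac{(V/F_{\xi/4,q-1,(q-1)(m-1)}) - 1}{mr}, & R_2 &= \frac{(V/F_{1-(\xi/4),q-1,(q-1)(m-1)}) - 1}{mr}.
\end{align*}
Then $\Pr\left(L_1 \le \frac{\sigma_b^2}{\sigma^2 + r\sigma_c^2} \le R_1\right) = 1 - (\xi/2)$ and $\Pr\left(L_2 \le \frac{\sigma_a^2}{\sigma^2 + r\sigma_c^2} \le R_2\right) = 1 - (\xi/2)$; furthermore, by the Bonferroni inequality,
\[
\Pr\left(L_1 \le \frac{\sigma_b^2}{\sigma^2 + r\sigma_c^2} \le R_1 \text{ and } L_2 \le \frac{\sigma_a^2}{\sigma^2 + r\sigma_c^2} \le R_2\right) \ge 1 - \xi.
\]
Now, if $L_1 \le \frac{\sigma_b^2}{\sigma^2 + r\sigma_c^2} \le R_1$ and $L_2 \le \frac{\sigma_a^2}{\sigma^2 + r\sigma_c^2} \le R_2$, then $L_1 + L_2 \le \frac{\sigma_a^2 + \sigma_b^2}{\sigma^2 + r\sigma_c^2} \le R_1 + R_2$. By Fact 15.3 it follows that
\[
\Pr\left(L_1 + L_2 \le \frac{\sigma_a^2 + \sigma_b^2}{\sigma^2 + r\sigma_c^2} \le R_1 + R_2\right) \ge 1 - \xi.
\]

(f) $\frac{S_1^2/(\sigma^2 + r\sigma_c^2 + mr\sigma_a^2)}{S_3^2/(\sigma^2 + r\sigma_c^2)} \sim F[q-1, (q-1)(m-1)]$ so $\frac{S_1^2}{S_3^2} \sim \frac{\sigma^2 + r\sigma_c^2 + mr\sigma_a^2}{\sigma^2 + r\sigma_c^2} F[q-1, (q-1)(m-1)]$. Thus, a size-$\xi$ test of $H_0: \sigma_a^2 = 0$ versus $H_a: \sigma_a^2 > 0$ is to reject $H_0$ if and only if $\frac{S_1^2}{S_3^2} > F_{\xi,q-1,(q-1)(m-1)}$. Similarly, $\frac{S_2^2/(\sigma^2 + r\sigma_c^2 + qr\sigma_b^2)}{S_3^2/(\sigma^2 + r\sigma_c^2)} \sim F[m-1, (q-1)(m-1)]$ so $\frac{S_2^2}{S_3^2} \sim \frac{\sigma^2 + r\sigma_c^2 + qr\sigma_b^2}{\sigma^2 + r\sigma_c^2} F[m-1, (q-1)(m-1)]$. Thus, a size-$\xi$ test of $H_0: \sigma_b^2 = 0$ versus $H_a: \sigma_b^2 > 0$ is to reject $H_0$ if and only if $\frac{S_2^2}{S_3^2} > F_{\xi,m-1,(q-1)(m-1)}$. Finally, $\frac{S_3^2/(\sigma^2 + r\sigma_c^2)}{S_4^2/\sigma^2} \sim F[(q-1)(m-1), qm(r-1)]$ so $\frac{S_3^2}{S_4^2} \sim \frac{\sigma^2 + r\sigma_c^2}{\sigma^2} F[(q-1)(m-1), qm(r-1)]$. Thus, a size-$\xi$ test of $H_0: \sigma_c^2 = 0$ versus $H_a: \sigma_c^2 > 0$ is to reject $H_0$ if and only if $\frac{S_3^2}{S_4^2} > F_{\xi,(q-1)(m-1),qm(r-1)}$.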
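As a short worked step (added here for illustration, using only the distributional result quoted in part (f)), the rejection probability of the test for $\sigma_c^2$ can be written in terms of a central $F$ variable. Let $F \sim F[(q-1)(m-1), qm(r-1)]$. Then
\[
\Pr\left(\frac{S_3^2}{S_4^2} > F_{\xi,(q-1)(m-1),qm(r-1)}\right)
= \Pr\left(F > \frac{\sigma^2}{\sigma^2 + r\sigma_c^2}\, F_{\xi,(q-1)(m-1),qm(r-1)}\right).
\]
When $\sigma_c^2 = 0$ the multiplier equals 1 and the probability is exactly $\xi$, which confirms the size of the test; as $\sigma_c^2/\sigma^2$ increases the multiplier decreases toward 0, so the rejection probability increases toward 1. The same argument applies to the other two tests with the corresponding variance ratios.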
Exercise 4 Consider the normal two-factor nested components-of-variance model with balanced data:
\[
y_{ijk} = \mu + a_i + b_{ij} + d_{ijk} \quad (i = 1, \ldots, q;\ j = 1, \ldots, m;\ k = 1, \ldots, r)
\]
where $a_i \sim N(0, \sigma_a^2)$, $b_{ij} \sim N(0, \sigma_b^2)$, and $d_{ijk} \sim N(0, \sigma^2)$ for all $i$, $j$, and $k$, and where $\{a_i\}$, $\{b_{ij}\}$, and $\{d_{ijk}\}$ are mutually independent. The corrected two-part mixed-model ANOVA table (with Factor A fitted first) is given below, with an additional column giving the distributions of suitably scaled mean squares under the model.

Source              df           Sum of squares                                                                        Mean square
Factor A            $q-1$        $mr\sum_{i=1}^{q}(\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot})^2$               $S_1^2$
Factor B within A   $q(m-1)$     $r\sum_{i=1}^{q}\sum_{j=1}^{m}(\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot})^2$          $S_2^2$
Residual            $qm(r-1)$    $\sum_{i=1}^{q}\sum_{j=1}^{m}\sum_{k=1}^{r}(y_{ijk} - \bar{y}_{ij\cdot})^2$           $S_3^2$
Total               $qmr-1$      $\sum_{i=1}^{q}\sum_{j=1}^{m}\sum_{k=1}^{r}(y_{ijk} - \bar{y}_{\cdot\cdot\cdot})^2$

Here $f_1 S_1^2 \sim \chi^2(q-1)$, $f_2 S_2^2 \sim \chi^2[q(m-1)]$, and $f_3 S_3^2 \sim \chi^2[qm(r-1)]$, where
\[
f_1 = \frac{q-1}{\sigma^2 + r\sigma_b^2 + mr\sigma_a^2}, \quad f_2 = \frac{q(m-1)}{\sigma^2 + r\sigma_b^2}, \quad f_3 = \frac{qm(r-1)}{\sigma^2}.
\]
Furthermore, $S_1^2$, $S_2^2$, and $S_3^2$ are mutually independent. Let $\xi \in (0, 1)$.
(a) Verify the distributions given in the last column of the table, and verify that $S_1^2$, $S_2^2$, and $S_3^2$ are mutually independent.
(b) Obtain minimum variance quadratic unbiased estimators of the variance components.
(c) Obtain a $100(1-\xi)\%$ confidence interval for $\sigma^2$.
(d) Obtain a $100(1-\xi)\%$ confidence interval for $\frac{\sigma_b^2 + m\sigma_a^2}{\sigma^2}$.
(e) Use Satterthwaite's method to obtain an approximate $100(1-\xi)\%$ confidence interval for $\sigma_a^2$.
(f) Obtain size-$\xi$ tests of $H_0: \sigma_a^2 = 0$ versus $H_a: \sigma_a^2 > 0$ and $H_0: \sigma_b^2 = 0$ versus $H_a: \sigma_b^2 > 0$.

Solution

(a) Here $f_1 S_1^2 = y^T A_1 y$, $f_2 S_2^2 = y^T A_2 y$, and $f_3 S_3^2 = y^T A_3 y$, where
\begin{align*}
A_1 &= \frac{1}{\sigma^2 + r\sigma_b^2 + mr\sigma_a^2}\left[(I_q - \tfrac{1}{q}J_q) \otimes \tfrac{1}{m}J_m \otimes \tfrac{1}{r}J_r\right], \\
A_2 &= \frac{1}{\sigma^2 + r\sigma_b^2}\left[I_q \otimes (I_m - \tfrac{1}{m}J_m) \otimes \tfrac{1}{r}J_r\right], \\
A_3 &= \frac{1}{\sigma^2}\left[I_q \otimes I_m \otimes (I_r - \tfrac{1}{r}J_r)\right],
\end{align*}
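and $\Sigma \equiv \mathrm{var}(y) = \sigma_a^2(I_q \otimes J_m \otimes J_r) + \sigma_b^2(I_q \otimes I_m \otimes J_r) + \sigma^2(I_q \otimes I_m \otimes I_r)$. Now,
\begin{align*}
A_1\Sigma &= \frac{1}{\sigma^2 + r\sigma_b^2 + mr\sigma_a^2}\{\sigma_a^2[(I_q - \tfrac{1}{q}J_q) \otimes J_m \otimes J_r] + \sigma_b^2[(I_q - \tfrac{1}{q}J_q) \otimes \tfrac{1}{m}J_m \otimes J_r] \\
&\qquad + \sigma^2[(I_q - \tfrac{1}{q}J_q) \otimes \tfrac{1}{m}J_m \otimes \tfrac{1}{r}J_r]\} \\
&= (I_q - \tfrac{1}{q}J_q) \otimes \tfrac{1}{m}J_m \otimes \tfrac{1}{r}J_r, \\
A_2\Sigma &= \frac{1}{\sigma^2 + r\sigma_b^2}\{\sigma_a^2[I_q \otimes 0_{m\times m} \otimes J_r] + \sigma_b^2[I_q \otimes (I_m - \tfrac{1}{m}J_m) \otimes J_r] + \sigma^2[I_q \otimes (I_m - \tfrac{1}{m}J_m) \otimes \tfrac{1}{r}J_r]\} \\
&= I_q \otimes (I_m - \tfrac{1}{m}J_m) \otimes \tfrac{1}{r}J_r, \\
A_3\Sigma &= \frac{1}{\sigma^2}\{\sigma_a^2[I_q \otimes J_m \otimes 0_{r\times r}] + \sigma_b^2[I_q \otimes I_m \otimes 0_{r\times r}] + \sigma^2[I_q \otimes I_m \otimes (I_r - \tfrac{1}{r}J_r)]\} \\
&= I_q \otimes I_m \otimes (I_r - \tfrac{1}{r}J_r).
\end{align*}
Observe that $A_1\Sigma$, $A_2\Sigma$, and $A_3\Sigma$ are idempotent (because each term in each Kronecker product is idempotent). Also observe that $\mathrm{rank}(A_1) = q - 1$, $\mathrm{rank}(A_2) = q(m-1)$, and $\mathrm{rank}(A_3) = qm(r-1)$ by Theorem 2.17.8, and that $A_i 1\mu = \mu A_i(1_q \otimes 1_m \otimes 1_r) = 0$ for $i = 1, 2, 3$. Thus, by Corollary 14.2.6.1, $f_1 S_1^2 \sim \chi^2(q-1)$, $f_2 S_2^2 \sim \chi^2[q(m-1)]$, and $f_3 S_3^2 \sim \chi^2[qm(r-1)]$. Moreover,
\begin{align*}
A_1\Sigma A_2 &= \frac{1}{\sigma^2 + r\sigma_b^2}[(I_q - \tfrac{1}{q}J_q) \otimes 0_{m\times m} \otimes \tfrac{1}{r}J_r] = 0, \\
A_1\Sigma A_3 &= \frac{1}{\sigma^2}[(I_q - \tfrac{1}{q}J_q) \otimes \tfrac{1}{m}J_m \otimes 0_{r\times r}] = 0, \\
A_2\Sigma A_3 &= \frac{1}{\sigma^2}[I_q \otimes (I_m - \tfrac{1}{m}J_m) \otimes 0_{r\times r}] = 0,
\end{align*}
so by Corollary 14.3.1.1 $S_1^2$, $S_2^2$, and $S_3^2$ are independent.

(b) $E\left[\frac{qm(r-1)}{\sigma^2} S_3^2\right] = qm(r-1)$, so $\hat{\sigma}^2 = S_3^2$; $E\left[\frac{q(m-1)}{\sigma^2 + r\sigma_b^2} S_2^2\right] = q(m-1)$, so $\hat{\sigma}^2 + r\hat{\sigma}_b^2 = S_2^2$, i.e., $\hat{\sigma}_b^2 = (S_2^2 - S_3^2)/r$; $E\left[\frac{q-1}{\sigma^2 + r\sigma_b^2 + mr\sigma_a^2} S_1^2\right] = q - 1$, so $\hat{\sigma}^2 + r\hat{\sigma}_b^2 + mr\hat{\sigma}_a^2 = S_1^2$, i.e., $\hat{\sigma}_a^2 = (S_1^2 - S_2^2)/mr$.

(c)
\begin{align*}
1 - \xi &= \Pr\left(\chi^2_{1-(\xi/2),qm(r-1)} \le \frac{qm(r-1)S_3^2}{\sigma^2} \le \chi^2_{\xi/2,qm(r-1)}\right) \\
&= \Pr\left(\frac{qm(r-1)S_3^2}{\chi^2_{\xi/2,qm(r-1)}} \le \sigma^2 \le \frac{qm(r-1)S_3^2}{\chi^2_{1-(\xi/2),qm(r-1)}}\right),
\end{align*}
so
\[
\left[\frac{qm(r-1)S_3^2}{\chi^2_{\xi/2,qm(r-1)}},\ \frac{qm(r-1)S_3^2}{\chi^2_{1-(\xi/2),qm(r-1)}}\right]
\]
is a $100(1-\xi)\%$ confidence interval for $\sigma^2$.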
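The interval in part (c) needs upper percentage points of a chi-square distribution. The following is a minimal Python sketch that uses only the standard library: it evaluates the chi-square cdf through the series for the regularized lower incomplete gamma function and finds the percentage point by bisection. The function names and the numbers in the example are hypothetical, and in practice a statistical library routine would normally be used instead.

```python
import math

def chi2_cdf(x, df):
    """Pr(X <= x) for X ~ chi-square(df), via the series for the
    regularized lower incomplete gamma function (fine for moderate x)."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while abs(term) > 1e-16 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def chi2_upper_point(p, df):
    """Value x with Pr(X > x) = p, found by bisection."""
    lo, hi = 0.0, max(1.0, float(df))
    while 1.0 - chi2_cdf(hi, df) > p:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if 1.0 - chi2_cdf(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def sigma2_interval(S3, q, m, r, xi=0.05):
    """Interval of part (c): [n S3 / chi2_{xi/2,n}, n S3 / chi2_{1-xi/2,n}], n = qm(r-1)."""
    n = q * m * (r - 1)
    return (n * S3 / chi2_upper_point(xi / 2, n),
            n * S3 / chi2_upper_point(1 - xi / 2, n))

# Hypothetical residual mean square S_3^2 = 2.0 with q = 3, m = 2, r = 3.
lower, upper = sigma2_interval(2.0, 3, 2, 3)
print(lower, upper)
```

A quick sanity check is the case of 2 degrees of freedom, where the upper percentage point has the closed form $-2\log p$.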
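(d) Define
\[
U = \frac{S_1^2}{S_3^2}, \quad U^* = \frac{S_1^2/(\sigma^2 + r\sigma_b^2 + mr\sigma_a^2)}{S_3^2/\sigma^2}, \quad \text{and} \quad \psi = \frac{\sigma_b^2 + m\sigma_a^2}{\sigma^2}.
\]
Observe that $U^* \sim F[q-1, qm(r-1)]$. Therefore,
\begin{align*}
1 - \xi &= \Pr\left(F_{1-(\xi/2),q-1,qm(r-1)} \le \frac{\sigma^2}{\sigma^2 + r\sigma_b^2 + mr\sigma_a^2}\, U \le F_{\xi/2,q-1,qm(r-1)}\right) \\
&= \Pr\left(\frac{1}{F_{\xi/2,q-1,qm(r-1)}} \le \frac{1 + r\psi}{U} \le \frac{1}{F_{1-(\xi/2),q-1,qm(r-1)}}\right) \\
&= \Pr\left(\frac{(U/F_{\xi/2,q-1,qm(r-1)}) - 1}{r} \le \psi \le \frac{(U/F_{1-(\xi/2),q-1,qm(r-1)}) - 1}{r}\right).
\end{align*}
Therefore,
\[
\left[\frac{(U/F_{\xi/2,q-1,qm(r-1)}) - 1}{r},\ \frac{(U/F_{1-(\xi/2),q-1,qm(r-1)}) - 1}{r}\right]
\]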
is a $100(1-\xi)\%$ confidence interval for $\frac{\sigma_b^2 + m\sigma_a^2}{\sigma^2}$. If any endpoints of the interval are negative, they can be set equal to zero.

(e) $\sigma_a^2 = [(\sigma^2 + r\sigma_b^2 + mr\sigma_a^2) - (\sigma^2 + r\sigma_b^2)]/mr \equiv (\alpha_1^2/mr) - (\alpha_2^2/mr)$. Thus, by Satterthwaite's method an approximate $100(1-\xi)\%$ confidence interval for $\sigma_a^2$ is given by
\[
\left[\frac{\hat{t}\left(\frac{S_1^2}{mr} - \frac{S_2^2}{mr}\right)}{\chi^2_{\xi/2,\hat{t}}},\ \frac{\hat{t}\left(\frac{S_1^2}{mr} - \frac{S_2^2}{mr}\right)}{\chi^2_{1-(\xi/2),\hat{t}}}\right]
\]
where
\[
\hat{t} = \frac{(S_1^2 - S_2^2)^2}{\frac{S_1^4}{q-1} + \frac{S_2^4}{q(m-1)}}.
\]

(f) $\frac{S_1^2/(\sigma^2 + r\sigma_b^2 + mr\sigma_a^2)}{S_2^2/(\sigma^2 + r\sigma_b^2)} \sim F[q-1, q(m-1)]$ so $\frac{S_1^2}{S_2^2} \sim \frac{\sigma^2 + r\sigma_b^2 + mr\sigma_a^2}{\sigma^2 + r\sigma_b^2} F[q-1, q(m-1)]$. Thus, a size-$\xi$ test of $H_0: \sigma_a^2 = 0$ versus $H_a: \sigma_a^2 > 0$ is to reject $H_0$ if and only if $\frac{S_1^2}{S_2^2} > F_{\xi,q-1,q(m-1)}$. Similarly, $\frac{S_2^2/(\sigma^2 + r\sigma_b^2)}{S_3^2/\sigma^2} \sim F[q(m-1), qm(r-1)]$ so $\frac{S_2^2}{S_3^2} \sim \frac{\sigma^2 + r\sigma_b^2}{\sigma^2} F[q(m-1), qm(r-1)]$. Thus, a size-$\xi$ test of $H_0: \sigma_b^2 = 0$ versus $H_a: \sigma_b^2 > 0$ is to reject $H_0$ if and only if $\frac{S_2^2}{S_3^2} > F_{\xi,q(m-1),qm(r-1)}$.
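To make parts (b) and (f) concrete, the following Python sketch computes the three mean squares of the nested ANOVA table, the estimators from part (b), and the two test statistics from part (f) for a balanced data array. The function name and the tiny data set are hypothetical. The $F$ critical values are not computed here.

```python
def nested_anova(y):
    """y[i][j][k]: level i of A, level j of B within A, replicate k (balanced)."""
    q, m, r = len(y), len(y[0]), len(y[0][0])
    cell = [[sum(y[i][j]) / r for j in range(m)] for i in range(q)]
    a = [sum(cell[i]) / m for i in range(q)]
    g = sum(a) / q
    s1 = m * r * sum((ai - g) ** 2 for ai in a) / (q - 1)
    s2 = r * sum((cell[i][j] - a[i]) ** 2
                 for i in range(q) for j in range(m)) / (q * (m - 1))
    s3 = sum((y[i][j][k] - cell[i][j]) ** 2
             for i in range(q) for j in range(m) for k in range(r)) / (q * m * (r - 1))
    return {"S1": s1, "S2": s2, "S3": s3,
            "F_A": s1 / s2,                     # compare with F_{xi, q-1, q(m-1)}
            "F_B": s2 / s3,                     # compare with F_{xi, q(m-1), qm(r-1)}
            "sigma2": s3,
            "sigma2_b": (s2 - s3) / r,
            "sigma2_a": (s1 - s2) / (m * r)}

# Hypothetical data with q = m = r = 2, small enough to check by hand.
out = nested_anova([[[1.0, 3.0], [4.0, 6.0]],
                    [[7.0, 9.0], [12.0, 14.0]]])
print(out)
```

For these data the sums of squares are 98, 34, and 8 (adding to the corrected total of 140), so $S_1^2 = 98$, $S_2^2 = 17$, and $S_3^2 = 2$. Note that the test for $\sigma_a^2$ uses $S_2^2$, not $S_3^2$, in the denominator.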
Exercise 5 In a certain components-of-variance model, there are four unknown variance components: $\sigma^2$, $\sigma_1^2$, $\sigma_2^2$, and $\sigma_3^2$. Suppose that four quadratic forms $S_i^2$ ($i = 1, 2, 3, 4$) whose expectations are linearly independent linear combinations of the variance components are obtained from an ANOVA table. Furthermore, suppose that $S_1^2$, $S_2^2$, $S_3^2$, and $S_4^2$ are mutually independent and that the following distributional results hold:
\begin{align*}
\frac{f_1 S_1^2}{\sigma^2 + \sigma_1^2 + \sigma_2^2} &\sim \chi^2(f_1), \\
\frac{f_2 S_2^2}{\sigma^2 + \sigma_1^2 + \sigma_3^2} &\sim \chi^2(f_2), \\
\frac{f_3 S_3^2}{\sigma^2 + \sigma_2^2 + \sigma_3^2} &\sim \chi^2(f_3), \\
\frac{f_4 S_4^2}{\sigma^2} &\sim \chi^2(f_4).
\end{align*}
Let $\xi \in (0, 1)$.
(a) Obtain quadratic unbiased estimators of the four variance components.
(b) Obtain a $100(1-\xi)\%$ confidence interval for $\sigma^2 + \sigma_1^2 + \sigma_2^2$.
(c) Obtain a confidence interval for $\sigma_2^2 - \sigma_3^2$, which has coverage probability at least $1 - \xi$. (Hint: Using the Bonferroni inequality, combine the interval from part [b] with a similarly constructed interval for another quantity.)
(d) Use Satterthwaite's method to obtain an approximate $100(1-\xi)\%$ confidence interval for $\sigma_1^2$.

Solution

(a)
\[
E(S_1^2) = \sigma^2 + \sigma_1^2 + \sigma_2^2, \quad E(S_2^2) = \sigma^2 + \sigma_1^2 + \sigma_3^2, \quad E(S_3^2) = \sigma^2 + \sigma_2^2 + \sigma_3^2, \quad E(S_4^2) = \sigma^2.
\]
So quadratic unbiased estimators of the variance components can be obtained by solving the equations:
\[
S_1^2 = \sigma^2 + \sigma_1^2 + \sigma_2^2, \quad S_2^2 = \sigma^2 + \sigma_1^2 + \sigma_3^2, \quad S_3^2 = \sigma^2 + \sigma_2^2 + \sigma_3^2, \quad S_4^2 = \sigma^2.
\]
A solution will satisfy
\begin{align*}
\hat{\sigma}^2 &= S_4^2, \\
\hat{\sigma}_1^2 + \hat{\sigma}_2^2 &= S_1^2 - S_4^2, \\
\hat{\sigma}_2^2 - \hat{\sigma}_3^2 &= S_1^2 - S_2^2, \\
\hat{\sigma}_2^2 + \hat{\sigma}_3^2 &= S_3^2 - S_4^2, \\
\hat{\sigma}_1^2 - \hat{\sigma}_2^2 &= S_2^2 - S_3^2.
\end{align*}
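The first of these equations yields the estimator of $\sigma^2$; the second and fifth equations yield $\hat{\sigma}_1^2 = (S_1^2 + S_2^2 - S_3^2 - S_4^2)/2$; the third and fourth equations yield $\hat{\sigma}_2^2 = (S_1^2 - S_2^2 + S_3^2 - S_4^2)/2$; and back-solving yields $\hat{\sigma}_3^2 = (-S_1^2 + S_2^2 + S_3^2 - S_4^2)/2$.

(b)
\[
1 - \xi = \Pr\left(\chi^2_{1-(\xi/2),f_1} \le \frac{f_1 S_1^2}{\sigma^2 + \sigma_1^2 + \sigma_2^2} \le \chi^2_{\xi/2,f_1}\right) = \Pr\left(L_\xi \le \sigma^2 + \sigma_1^2 + \sigma_2^2 \le R_\xi\right)
\]
where $L_\xi = \frac{f_1 S_1^2}{\chi^2_{\xi/2,f_1}}$ and $R_\xi = \frac{f_1 S_1^2}{\chi^2_{1-(\xi/2),f_1}}$. Thus $[L_\xi, R_\xi]$ is a $100(1-\xi)\%$ confidence interval for $\sigma^2 + \sigma_1^2 + \sigma_2^2$.

(c) Using the same method as in part (b), we obtain
\[
1 - \xi = \Pr\left(L_\xi^* \le \sigma^2 + \sigma_1^2 + \sigma_3^2 \le R_\xi^*\right)
\]
where $L_\xi^* = \frac{f_2 S_2^2}{\chi^2_{\xi/2,f_2}}$ and $R_\xi^* = \frac{f_2 S_2^2}{\chi^2_{1-(\xi/2),f_2}}$. Thus, by Bonferroni's inequality,
\begin{align*}
1 - \xi &\le \Pr(L_{\xi/2} \le \sigma^2 + \sigma_1^2 + \sigma_2^2 \le R_{\xi/2} \text{ and } L_{\xi/2}^* \le \sigma^2 + \sigma_1^2 + \sigma_3^2 \le R_{\xi/2}^*) \\
&= \Pr(L_{\xi/2} \le \sigma^2 + \sigma_1^2 + \sigma_2^2 \le R_{\xi/2} \text{ and } -R_{\xi/2}^* \le -\sigma^2 - \sigma_1^2 - \sigma_3^2 \le -L_{\xi/2}^*),
\end{align*}
implying further (by Fact 15.3) that $\Pr(L_{\xi/2} - R_{\xi/2}^* \le \sigma_2^2 - \sigma_3^2 \le R_{\xi/2} - L_{\xi/2}^*) \ge 1 - \xi$.

(d) $\sigma_1^2 = (1/2)(\sigma^2 + \sigma_1^2 + \sigma_2^2) + (1/2)(\sigma^2 + \sigma_1^2 + \sigma_3^2) - (1/2)(\sigma^2 + \sigma_2^2 + \sigma_3^2) - (1/2)\sigma^2$. Thus, the Satterthwaite-method approximate $100(1-\xi)\%$ confidence interval for $\sigma_1^2$ is given by
\[
\left[\frac{(1/2)\hat{t}(S_1^2 + S_2^2 - S_3^2 - S_4^2)}{\chi^2_{\xi/2,\hat{t}}},\ \frac{(1/2)\hat{t}(S_1^2 + S_2^2 - S_3^2 - S_4^2)}{\chi^2_{1-(\xi/2),\hat{t}}}\right]
\]
where
\[
\hat{t} = \frac{(S_1^2 + S_2^2 - S_3^2 - S_4^2)^2}{(S_1^4/f_1) + (S_2^4/f_2) + (S_3^4/f_3) + (S_4^4/f_4)}.
\]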
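The estimators of part (a) and the estimated degrees of freedom of part (d) can be computed together, as in the following minimal Python sketch. The function name and the numerical inputs are hypothetical; the chi-square percentage points needed for the interval itself are not computed.

```python
def exercise5_estimates(S, f):
    """S = (S_1^2, S_2^2, S_3^2, S_4^2); f = (f_1, f_2, f_3, f_4)."""
    S1, S2, S3, S4 = S
    est = {
        "sigma2": S4,
        "sigma2_1": (S1 + S2 - S3 - S4) / 2,
        "sigma2_2": (S1 - S2 + S3 - S4) / 2,
        "sigma2_3": (-S1 + S2 + S3 - S4) / 2,
    }
    # Satterthwaite degrees of freedom for the estimator of sigma_1^2.
    num = (S1 + S2 - S3 - S4) ** 2
    den = sum(s ** 2 / n for s, n in zip(S, f))
    est["t_hat"] = num / den
    return est

# Hypothetical mean squares and degrees of freedom.
res = exercise5_estimates((10.0, 8.0, 6.0, 2.0), (4, 4, 4, 8))
print(res)
```

With these inputs the estimates are 2, 5, 3, and 1 for $\sigma^2$, $\sigma_1^2$, $\sigma_2^2$, and $\sigma_3^2$, and substituting them back reproduces the four mean squares 10, 8, 6, and 2.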
If $S_1^2 + S_2^2 - S_3^2 - S_4^2 < 0$, then set both endpoints equal to 0.

Exercise 6 Consider once again the normal one-factor components-of-variance model with balanced data that was considered in Examples 16.1.3-2 and 16.1.4-1. Recall that $\hat{\sigma}_b^2 = (S_2^2 - S_3^2)/r$ is the minimum variance quadratic unbiased estimator of $\sigma_b^2$ under this model. Suppose that $\sigma_b^2/\sigma^2 = 1$.
(a) Express $\Pr(\hat{\sigma}_b^2 < 0)$ as $\Pr(F < c)$, where $F$ is a random variable having an $F(\nu_1, \nu_2)$ distribution and $c$ is a constant. Determine $\nu_1$, $\nu_2$, and $c$ in as simple a form as possible.
(b) Using the pf function in R, obtain a numerical value for $\Pr(\hat{\sigma}_b^2 < 0)$ when $q = 5$ and $r = 4$.

Solution

(a) By Definition 14.5.1,
\begin{align*}
\Pr(\hat{\sigma}_b^2 < 0) &= \Pr[(S_2^2 - S_3^2)/r < 0] = \Pr(S_2^2 < S_3^2) = \Pr(S_2^2/S_3^2 < 1) \\
&= \Pr\left(\frac{S_2^2/(r\sigma_b^2 + \sigma^2)}{S_3^2/\sigma^2} < \frac{\sigma^2}{r\sigma_b^2 + \sigma^2}\right) = \Pr(F < c)
\end{align*}
where $F \sim F[q-1, q(r-1)]$ and
\[
c = \frac{\sigma^2}{r\sigma_b^2 + \sigma^2} = \frac{1}{r(\sigma_b^2/\sigma^2) + 1} = \frac{1}{r+1}.
\]
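The probability in part (a) can also be evaluated without a statistical package whenever the numerator degrees of freedom $q - 1$ are even, because the $F$ cdf then reduces to a finite sum. The Python sketch below illustrates this; it is not the R `pf` call that the exercise asks for, and the function names are hypothetical. It uses the identity $\Pr(F \le f) = I_x(\nu_1/2, \nu_2/2)$ with $x = \nu_1 f/(\nu_1 f + \nu_2)$, together with the finite-sum form of the regularized incomplete beta function for an integer first argument.

```python
def f_cdf_even_numerator(f, nu1, nu2):
    """Pr(F <= f) for F ~ F(nu1, nu2), valid when nu1 is a positive even integer.

    Uses I_x(a, b) = 1 - (1 - x)^b * sum_{j=0}^{a-1} [Gamma(b+j) / (Gamma(b) j!)] x^j
    with a = nu1/2, b = nu2/2 and x = nu1 f / (nu1 f + nu2).
    """
    if nu1 <= 0 or nu1 % 2 != 0:
        raise ValueError("nu1 must be a positive even integer for this closed form")
    a, b = nu1 // 2, nu2 / 2.0
    x = nu1 * f / (nu1 * f + nu2)
    coef, tail = 1.0, 0.0
    for j in range(a):
        tail += coef * x ** j
        coef *= (b + j) / (j + 1)       # next coefficient Gamma(b+j+1) / (Gamma(b) (j+1)!)
    return 1.0 - (1.0 - x) ** b * tail

def prob_negative_estimate(q, r, ratio=1.0):
    """Pr(sigma_b^2-hat < 0) = Pr(F < c), c = 1 / (r * ratio + 1), ratio = sigma_b^2 / sigma^2."""
    c = 1.0 / (r * ratio + 1.0)
    return f_cdf_even_numerator(c, q - 1, q * (r - 1))

p = prob_negative_estimate(5, 4)        # Pr[F(4, 15) < 1/5]
print(round(p, 4))
```

For $q = 5$ and $r = 4$ this gives a value of about 0.0656, in line with the figure reported in part (b).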
(b) Using the pf function in R, we find, by specializing the result of part (a) to the case $q = 5$ and $r = 4$, that
\[
\Pr(\hat{\sigma}_b^2 < 0) = \Pr[F(4, 15) < 1/5] \doteq 0.066.
\]

Exercise 7 Consider once again the normal one-factor components-of-variance model with balanced data that was considered in Examples 16.1.3-2 and 16.1.4-1. Simulate 10,000 response vectors $y$ that follow the special case of such a model in which $q = 4$, $r = 5$, and $\sigma_b^2 = \sigma^2 = 1$, and obtain the following:
(a) the proportion of simulated vectors for which the quadratic unbiased estimator of $\sigma_b^2$ given in Example 16.1.3-2 is negative;
(b) the average width and empirical coverage probability of the approximate 95% confidence interval for $\sigma_b^2$ derived in Example 16.1.4-1 using Satterthwaite's approach;
(c) the average width and empirical coverage probability of the approximate 95% confidence intervals for $\sigma_b^2$ derived in Example 16.1.4-1 using Williams' approach.
Based on your results for parts (b) and (c), which of the two approaches for obtaining an approximate 95% confidence interval for $\sigma_b^2$ performs best? (Note: In parts [b] and [c], if any endpoints of the intervals are negative, you should modify your code to set them equal to 0.)

Solution

(a) Relevant R code and the results obtained using that code are as follows:
library(magic)
set.seed(10)
q