267 99 11MB
English Pages [237] Year 2020
N. R. Mohan Madhyastha S. Ravi A. S. Praveena
A First Course in Linear Models and Design of Experiments
A First Course in Linear Models and Design of Experiments
N. R. Mohan Madhyastha S. Ravi A. S. Praveena •
•
A First Course in Linear Models and Design of Experiments
123
N. R. Mohan Madhyastha Department of Studies in Statistics University of Mysore Mysuru, India
S. Ravi Department of Studies in Statistics University of Mysore Mysuru, India
A. S. Praveena Department of Studies in Statistics University of Mysore Mysuru, India
ISBN 978-981-15-8658-3 ISBN 978-981-15-8659-0 https://doi.org/10.1007/978-981-15-8659-0
(eBook)
Mathematics Subject Classification: 62J10, 62F03, 62F10, 62F25, 62H10, 62J05, 62J15, 62J99, 62K10, 62K15, 62K99 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
Preface
While writing any book on any topic, the first question that has to be confronted is: Why one more book on this topic? Our justification: we have taught the material in the book for several years to students of Master of Science degree program in Statistics at the University of Mysore. A student with only a basic knowledge of Linear Algebra, Probability Theory and Statistics, and wanting to understand the basic concepts of Linear Models, Linear Estimation, Testing of Linear Hypotheses, basics of Design and Analysis of Experiments and the standard Models, will find this book useful. The book is targeted at beginners. Proofs are given in detail to help the uninitiated. We expect the reader to get motivated by the basics in the book and refer advanced literature on the topics to learn more. This book is intended as a leisurely bridge to advanced topics. With a surge in interest in Data Science, Big Data and whatnot, we wish that this rigorous treatment will kindle interest in the reader to explore advanced topics. The first two chapters consist of the basic theory of Linear Models covering estimability, Gauss-Markov theorem, confidence interval estimation and testing of linear hypotheses. The later chapters consist of the general theory of design and analysis of general complete/incomplete Block Designs, Completely Randomized Design, Randomized Block Design, Balanced Incomplete Block Design, Partially Balanced Incomplete Block Design, general Row-Column Designs with Latin Square Design and Youden Square Design as particular cases, symmetric Factorial Experiments with factors at two/three levels including Partial and Complete Confounding, Missing Plot Technique, Analysis of Covariance Models, Split-Plot and Split-Block Designs. The material covered in these chapters should give a fairly good idea of the design and analysis of statistical experiments. Every chapter ends with some exercises which are intended as practice exercises to understand the theory discussed in the chapter. The first exercise of every chapter encourages readers to provide missing steps, or deliberately left out steps, in some of the discussions in the chapter. Exercises in the first two chapters and some in later
v
vi
Preface
chapters are original and the data accompanying many exercises may be from other books on this subject. Since the references to these could not be ascertained, we take this opportunity to gratefully acknowledge all the authors and publishers for some of the numerical data used here. Since the book is intended as a first course, historical references and citations are not given, even for the quotes at the beginning of every chapter except for the names of the authors of the quotes. R-codes are given at the end of every chapter after Exercises. These should help the reader to explore the material in R, an open-source software. The book ends with Bibliography which contains a list of books for further reading, and a subject index. The approach used is algebraic and is aimed at a beginner who has some exposure to Linear Algebra, Probability Theory and Statistics. The material can be covered in a one semester course. Before retiring from active service in the year 2003 as Professor of Statistics, Prof. N. R. Mohan had taught the material here for more than twenty five years to several students of the M.Sc. Statistics program of the University of Mysore. Subsequently, the undersigned, his student, has taught this subject for more than twenty years. Though we started writing this book several years ago, after finishing drafts of a few chapters, the book did not see the light of the day. Upon the insistence of Professor Mohan’s wife, this compilation has been done for wider dissemination. My sincere gratitude to Mrs. Geeta Mohan for pushing me towards completing the book and to Mrs. Padmashri Ambekar, daughter of Prof. N. R. Mohan, for help with the publication. My sincere thanks to Dr. A. S. Praveena for the R-codes in the book and for all help received while preparing this book for publication and to Mr. Shamim Ahmad, Senior Editor, Springer India, for facilitating the process of publication, and for tolerating my many e-mail queries. I fully own responsibility for any errors and omissions. I will be grateful for your comments and criticisms. Mysuru, India June 2020
S. Ravi
Contents
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
1 1 2 7 13 15 16 16
2 Linear Hypotheses and their Tests . . . . . . . . . . . . . . . . . . 2.1 Linear Hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Likelihood Ratio Test of a Linear Hypothesis . . . . . . . 2.3 Gauss–Markov Models and Linear Hypotheses . . . . . . 2.4 Confidence Intervals and Confidence Ellipsoids . . . . . . 2.5 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.7 R-Codes on Linear Estimation and, Linear Hypotheses and their Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
19 20 21 24 32 36 36
........
37
1 Linear Estimation . . . . . . . . . . . . . . . . . . . . . . . . . 1.1 Gauss–Markov Model . . . . . . . . . . . . . . . . . . . 1.2 Estimability . . . . . . . . . . . . . . . . . . . . . . . . . . 1.3 Least Squares Estimate . . . . . . . . . . . . . . . . . . 1.4 Best Linear Unbiased Estimates . . . . . . . . . . . . 1.5 Linear Estimation with Correlated Observations 1.6 Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
3 Block Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1 General Block Design . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1.1 Rank of the Block Design Model . . . . . . . . . . . . 3.1.2 Estimability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.1.3 Least Squares Estimates . . . . . . . . . . . . . . . . . . . 3.1.4 Best Estimates of elpf’s . . . . . . . . . . . . . . . . . . . 3.1.5 Tests of Hypotheses . . . . . . . . . . . . . . . . . . . . . . 3.1.6 Anova Tables for a Block Design . . . . . . . . . . . . 3.1.7 Anova Table for RBD . . . . . . . . . . . . . . . . . . . . 3.1.8 Some Criteria for Classification of Block Designs
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
43 44 45 46 50 51 52 55 65 65
vii
viii
Contents
3.2 Balanced Incomplete Block Design . . . . . . . 3.2.1 Estimability . . . . . . . . . . . . . . . . . . . 3.2.2 Least Squares Estimates . . . . . . . . . . 3.2.3 Best Estimates . . . . . . . . . . . . . . . . . 3.2.4 Tests of Hypotheses . . . . . . . . . . . . . 3.2.5 Recovery of Inter-Block Information . 3.3 Partially Balanced Incomplete Block Design 3.3.1 Estimability, Least Squares Estimates 3.3.2 Blue’s and their Variances . . . . . . . . 3.3.3 Tests of Hypotheses . . . . . . . . . . . . . 3.3.4 Efficiency Factor of a Block Design . 3.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 3.5 R-Codes on Block Designs . . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
68 70 71 71 72 73 76 83 85 86 87 87 90
4 Row-Column Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1 General Row-Column Design . . . . . . . . . . . . . . . . . . 4.1.1 Rank of the Row-Column Design Model . . . . 4.1.2 Estimability . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1.3 Least Squares Estimates . . . . . . . . . . . . . . . . . 4.1.4 Blue’s and their Variances . . . . . . . . . . . . . . . 4.1.5 Tests of Hypotheses . . . . . . . . . . . . . . . . . . . . 4.1.6 Anova Table for Testing Ha in a Row-Column 4.2 Latin Square Design . . . . . . . . . . . . . . . . . . . . . . . . . 4.2.1 Anova Table for LSD . . . . . . . . . . . . . . . . . . 4.3 Youden Square Design . . . . . . . . . . . . . . . . . . . . . . . 4.3.1 Anova Table for Testing Ha in YSD . . . . . . . . 4.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.5 R-Codes on Row-Column Designs . . . . . . . . . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
105 105 107 108 112 113 114 118 119 121 121 124 125 126
. . . . . .
. . . . . .
. . . . . .
131 132 132 133 135 135
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
Design . . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
. . . . . .
5 Factorial Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.1 2M -Factorial Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.1.1 Factorial Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.1.2 Properties of Vectors Associated with Factorial Effects 5.1.3 Best Estimates of Factorial Effects . . . . . . . . . . . . . . . 5.1.4 Testing the Significance of Factorial Effects . . . . . . . . 5.1.5 Total of the Sums of Squares Associated with Testing the Significance of Factorial Effects . . . . . . . . . . . . . . 5.1.6 Anova Table for Testing the Significance of Factorial Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.1.7 Yates’ Algorithm to Obtain the Factorial Effect Totals . 5.2 Completely Confounded 2M -Factorial Experiment . . . . . . . . . 5.2.1 Rank of C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2.2 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2.3 Least Squares Estimates . . . . . . . . . . . . . . . . . . . . . . .
. . . 136 . . . . . .
. . . . . .
. . . . . .
136 137 137 138 138 139
Contents
5.3
5.4
5.5
5.6 5.7
ix
5.2.4 Best Estimates of Estimable Factorial Effects . . . . . . 5.2.5 Testing the Significance of Unconfounded Factorial Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.2.6 Total of Sums of Squares Associated with Testing the Significance of Unconfounded Factorial Effects . . 5.2.7 Anova Table for Testing the Significance of Unconfounded Factorial Effects . . . . . . . . . . . . . . Partially Confounded 2M -Factorial Experiment . . . . . . . . . . . 5.3.1 Rank of C . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3.2 Best Estimates of Factorial Effects . . . . . . . . . . . . . . 5.3.3 Testing the Significance of Factorial Effects . . . . . . . 5.3.4 Total of Sums of Squares Associated with Testing the Significance of Factorial Effects . . . . . . . . . . . . . 5.3.5 Anova Table for Testing the Significance of Factorial Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.3.6 A g-Inverse of C . . . . . . . . . . . . . . . . . . . . . . . . . . 3M -Factorial Experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4.1 Factorial Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4.2 Linear/Quadratic Components of Factorial Effects . . . 5.4.3 Best Estimates of the Components . . . . . . . . . . . . . . 5.4.4 Testing the Significance of the Components . . . . . . . 5.4.5 Total of Sums of Squares Associated with Testing the Significance of the Components . . . . . . . . . . . . . 5.4.6 Anova Table for Testing the Significance of the Components . . . . . . . . . . . . . . . . . . . . . . . . . 5.4.7 Divisors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.4.8 Extended Yates’ Algorithm to Obtain the Component Totals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Completely Confounded 3M -Factorial Experiment . . . . . . . . 5.5.1 Best Estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.5.2 Testing of Hypotheses . . . . . . . . . . . . . . . . . . . . . . . 5.5.3 Anova Table for Testing the Significance of Unconfounded Factorial Effects . . . . . . . . . . . . . . Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . R-Codes on Factorial Experiments . . . . . . . . . . . . . . . . . . . .
6 Analysis of Covariance . . . . . . . . . . . . . . . . . . . . . . . . . . 6.1 General Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.1.1 Least Squares Estimates . . . . . . . . . . . . . . . . . 6.1.2 Testing the Relevance of the Ancova Model . . 6.2 Illustrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2.1 Ancova Table for Testing Ha in CRD . . . . . . 6.2.2 Ancova Table for Testing Ha and Hb in RBD
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
. . . . 139 . . . . 139 . . . . 140 . . . . .
. . . . .
. . . . .
. . . . .
141 141 142 142 144
. . . . 145 . . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
146 146 147 147 148 150 150
. . . . 151 . . . . 152 . . . . 152 . . . .
. . . .
. . . .
. . . .
153 153 156 156
. . . . 157 . . . . 157 . . . . 159 . . . . . . .
. . . . . . .
. . . . . . .
. . . . . . .
165 166 166 167 167 169 172
x
Contents
6.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173 6.4 R-Codes on Analysis of Covariance . . . . . . . . . . . . . . . . . . . . . . 175 . . . 183 . . . 183 . . . 185
7 Missing Plot Technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.1 Substitution for Missing Observations . . . . . . . . . . . . . . . . . . 7.2 Implications of Substitution . . . . . . . . . . . . . . . . . . . . . . . . . . 7.2.1 Missing Plot Technique in RBD with One Missing Observation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.2.2 Anova Table for Testing Ha and Hb in RBD with One Missing Observation . . . . . . . . . . . . . . . . . . 7.2.3 Efficiency Factor of RBD with Single Observation Missing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.2.4 Missing Plot Technique in LSD with One Observation Missing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.2.5 Anova Table for Testing Ha ; Hb ; and Hc in LSD with One Missing Observation . . . . . . . . . . . . . . . . . . 7.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.4 R-Codes on Missing Plot Technique . . . . . . . . . . . . . . . . . . .
. . . 189 . . . 189 . . . 190
8 Split-Plot and Split-Block Designs . . . . . . . . . . . . . . . . . . . 8.1 Split-Plot Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.1.1 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.1.2 Rank, Estimability, and Least Squares Estimates 8.1.3 Testing of Hypotheses . . . . . . . . . . . . . . . . . . . 8.1.4 Anova Table for a Split-Plot Design . . . . . . . . . 8.2 Split-Block Design . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.2.1 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.2.2 Rank, Estimability, Least Squares Estimates . . . 8.2.3 Testing of Hypotheses . . . . . . . . . . . . . . . . . . . 8.2.4 Anova Table for a Split-Block Design . . . . . . . 8.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8.4 R-Codes on Split-Plot and Split-Block Designs . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . . . . . . . . . . .
. . . 186 . . . 187 . . . 188 . . . 188
. . . . . . . . . . . . .
. . . . . . . . . . . . .
197 197 198 201 204 208 209 209 213 216 219 220 220
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225 Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
About the Authors
N. R. Mohan Madhyastha is a former Professor of Statistics at the Department of Studies in Statistics, University of Mysore, India. His areas of interest include probability theory, distribution theory, probability theory on metric spaces, stochastic processes, linear models, and design and analysis of experiments. His research articles have been published in several journals of repute. He earned his Ph.D. and M.Sc. in Statistics from the University of Mysore, where he later served for more than 30 years. S. Ravi is Professor of Statistics at the Department of Studies in Statistics, University of Mysore, India. Earlier, he served as Lecturer in Statistics at the Department of Statistics, University of Mumbai, India, during 1994–97. He earned his Ph.D. in Statistics in 1992 under the supervision of Prof. N. R. Mohan Madhyastha with the thesis titled “Contributions to Extreme Value Theory”. With over 35 research articles published in several journals of repute, Prof. Ravi has supervised 8 students to receive their Ph.D. degrees. His areas of research include probability theory, distribution theory, stochastic processes, reliability theory, linear models, regression analysis, design and analysis of experiments, demography, and computational statistics. A. S. Praveena is Assistant Professor (under the UGC-Faculty Recharge Program) at the Department of Studies in Statistics, University of Mysore, India. She completed her Ph.D. in Statistics from the University of Mysore under the supervision of Prof. S. Ravi. Her research articles have been published in peer reviewed journals of repute. She has 13 years of experience teaching undergraduate and postgraduate students and has presented several R demonstrations in workshops and faculty development programs. She received an award for best poster presentation at the 103rd Indian Science Congress held in the year 2016.
xi
Abbreviations and Notations1
aob A⊗B A′ A−1 |A| (A : a) AB [AB] In M k := Y adj Anova Ancova BðsÞ BIBD blue C C(A) CRD Cor Cov DF Δc diag(…) E(Y )
Hadamard product ða1 b1 . . .an bn Þ0 of n 1 vectors a and b Kronecker product of matrices A and B Transpose of A Inverse of matrix A Determinant of A Matrix A augmented with vector/matrix a A is a subset of B Factorial effect total of factorial effect AB Column vector of n entries, all equal to 1 M choose k, for integers M and k is defined as Y follows adjusted Analysis of Variance Analysis of Covariance Subspace of dimension s Balanced Incomplete Block Design best linear unbiased estimator C -matrix or Information matrix Column space of matrix A Completely Randomized Design Correlation coefficient Covariance Degrees of freedom Delta matrix of order c equal to Ic 1c Ic I0c diagonal matrix with entries … Expectation of random variable/vector Y
1
All definitions appear italicized. All vectors are column vectors. xiii
xiv
elpf exp g-inverse A− Ip iff Ji lpf LSD max min MS MSB MSE MSTr Nðw; r2 IÞ PBIBD Rp RBD SS SSB SSC SSE SSR SST SSTr sup SV unadj V (Y ) YSD
Abbreviations and Notations
estimable linear parametric function Exponential function Generalized inverse of A Identity matrix of order p if and only if ith column of Identity matrix linear parametric function Latin Square Design maximum minimum Mean Squares Mean Squares for Blocks Mean Squares for Error Mean Squares for Treatments Multivariate normal distribution with mean vector w and dispersion matrix r2 I Partially Balanced Incomplete Block Design p-dimensional Euclidean space Randomized Block Design Sum of Squares Sum of Squares for Blocks Sum of Squares for Columns Sum of Squares for Error Sum of Squares for Rows Sum of Squares for Total Sum of Squares for Treatments supremum Sources of Variation unadjusted Variance of random variable/vector Y Youden Square Design
Chapter 1
Linear Estimation
Everything should always be made as simple as possible, but not simpler – A. Einstein
In many modeling problems, a response variable is modeled as a function of one or more independent or explanatory variables. The linear function of the explanatory variables along with a random error term has been found useful and applicable to many problems, for example, the weight of a newborn human baby as a function of the circumference of her head or shoulder, the price of crude oil as a function of currency exchange rates, the length of elongation of a weighing spring as a function of the loaded weight, and whatnot. Many such and similar problems are modeled using a linear model. In fact, all linear regression models are full-rank linear models. This chapter discusses linear estimation in a linear model. The subsequent chapter discusses the other aspect of such modeling, which is linear hypotheses. The material discussed in the first two chapters are applied to specific models in later chapters. R-codes for the topics discussed in this chapter are given at the end of Chap. 2.
1.1 Gauss–Markov Model Consider an n × 1 random vector Y = (Y1 . . . Yn ) of random variables Y1 , . . . , Yn , with expectation E(Y ) := (E(Y1 ) . . . E(Yn )) = Aθ,
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 N. R. Mohan Madhyastha et al., A First Course in Linear Models and Design of Experiments, https://doi.org/10.1007/978-981-15-8659-0_1
1
2
1 Linear Estimation
and dispersion matrix V (Y ) := E((Y − E(Y ))(Y − E(Y )) ) = σ 2 In , where θ = θ1 . . . θ p is a p × 1 vector of real-valued parameters θ1 , . . . , θ p , which are unknown, A is an n × p matrix of known entries called the design matrix, σ 2 > 0 is unknown. Throughout, we shall denote the identity matrix of order n by In or by I if its order is clear from the context, and the transpose of B, a vector or a matrix, by B . Note that the expected value of Yi is a known linear function of the same set of p unknown parameters θ1 , . . . , θ p , i = 1, . . . , n, Y1 , . . . , Yn are pairwise uncorrelated and that each Yi has the same unknown variance σ 2 > 0. The last property is referred to as homoscedasticity, meaning equal variances, the opposite of heteroscedasticity. Note also that if we write Y = Aθ + , where = (1 . . . n ) is assumed to be the random error, then E() = 0 and, V () = σ 2 I, called the error variance. In this chapter, we will have no more assumptions about the probability distribution of Y. However, we assume n ≥ p throughout the book. The discussion about models where n < p is outside the scope of this book. The triple Y, Aθ, σ 2 I is called a Gauss–Markov model or a linear model. Observe that the design matrix specifies the model completely. By the rank of the model, we mean the rank of the design matrix A. The model is called a full-rank model if Rank(A) = p; otherwise, it is called a less-than-full-rank model. We shall reserve the symbol s to denote the rank of A so that 1 ≤ s ≤ p ≤ n. The model given above is the natural model for the random variables associated with many experiments, and the parameters θ1 , . . . , θ p represent the unknown quantities of interest to the experimenter. In most cases, the very purpose of conducting the experiment is to obtain estimates of and confidence intervals for linear functions of θ1 , . . . , θ p , and also to test hypotheses involving these parameters. In this chapter, we discuss the problem of estimation. Our search for the best estimators of linear functions of the parameters θ1 , . . . , θ p , best in a sense to be clarified later, will be confined to the class of estimators which are linear functions of Y1 , . . . , Yn . With the minimal assumptions that we have on the distribution of Y, we will not be able to admit nonlinear estimators to compete. In what follows, y = (y1 . . . yn ) denotes a realization of Y.
1.2 Estimability A linear parametric function (lpf) a θ = a1 θ1 + · · · + a p θ p , a = (a1 . . . a p ) ∈ R p , the p-dimensional Euclidean space, is said to be estimable if there exists a linear function c Y = c1 Y1 + · · · + cn Yn of Y1 , . . . , Yn , such that E(c Y ) = a θ for all θ = (θ1 . . . θ p ) ∈ R p , for some c = (c1 . . . cn ) ∈ Rn . In other words, a θ is estimable if c Y is an unbiased estimator of a θ. In this case, a θ is called an estimable linear parametric function (elpf) and c Y is called a linear unbiased
1.2 Estimability
3
estimator of a θ. As is evident from the examples that follow, not every lpf need be estimable in a Gauss–Markov model. The following theorem, which can be stated in several equivalent forms, gives a criterion for an lpf to be estimable. Theorem 1.2.1 A necessary and sufficient condition for an lpf a θ = a1 θ1 + · · · + a p θ p to be estimable is that Rank(A) = Rank(A : a),
(1.2.1)
where (A : a) denotes the matrix A augmented with a. Proof If a θ is estimable, by definition, there exists a c Y = c1 Y1 + · · · + cn Yn such that a θ = E(c Y ) = c E(Y ) = c Aθ for all θ ∈ R p . Hence A c = a and this implies (1.2.1). Conversely, if (1.2.1) holds, then A c = a for some vector c ∈ Rn , and c Y is unbiased for a θ. Hence a θ is estimable. Corollary 1.2.2 An lpf a θ is estimable iff Rank(A A) = Rank(A A : a).
(1.2.2)
Proof If a θ is estimable, then (1.2.1) holds. Since Rank(A A) ≤ Rank(A A : a) A 0n×1 = Rank(A : a) 0 1 ≤ Rank(A : a) = Rank(A) = Rank(A A), it follows that (1.2.2) holds. Conversely, if (1.2.2) holds, then there exists a vector d such that A Ad = a. Then the linear estimator d A Y is unbiased for a θ since E d A Y = d A E(Y ) = d A Aθ = a θ and a θ is estimable. Remark 1.2.3 The conditions (1.2.1) and (1.2.2) are, respectively, equivalent to a ∈ C(A ) and a ∈ C(A A), where C(B) denotes the column space of the matrix B. Remark 1.2.4 The conditions (1.2.1) and (1.2.2) are necessary and sufficient for the consistency of the linear equations A c = a and A Ad = a, respectively. Corollary 1.2.5 If a1 θ = a11 θ1 + · · · + a1 p θ p and a2 θ = a21 θ1 + · · · + a2 p θ p are estimable and c1 and c2 are scalars, then c1 a1 θ + c2 a2 θ is also estimable. The proof follows easily from the definition of estimability (see Exercise 1.1). Lemma 1.2.6 Every lpf in a Gauss–Markov model is estimable iff it is a full-rank model.
4
1 Linear Estimation
Proof Suppose a θ is estimable for every a ∈ R p . Then θ1 , . . . , θ p are estimable. Write θi = Ji θ, i = 1, . . . , p, where Ji denotes the ith column of I p . By Corollary 1.2.2, p ≥ Rank(A A) = Rank(A A : J1 : · · · : J p ) ≥ Rank(J1 . . . J p ) = Rank(I p ) = p and hence Rank(A A) = Rank(A) = p, proving the necessity. If Rank(A) = p, then Rank(A A : a) = p for every a ∈ R p and a θ is estimable. Remark 1.2.7 Aθ and A Aθ are estimable, that is, every component of these two vectors is an elpf. The claims follow easily from the definition of an elpf since E(Y ) = Aθ and E(A Y ) = A Aθ. Remark 1.2.8 Let V = b ∈ R p : A Ab = 0 . Then V is a vector space of dimension p − s and is orthogonal to C(A A). It is easy to see that if b ∈ V, then b θ is not estimable. However, if b θ is not estimable, it does not follow that b ∈ V. Let B p×( p−s) denote the matrix whose columns constitute a basis for the vector space V in Remark 1.2.8. Note that A AB = 0 and AB = 0. The following theorem gives yet another criterion for the estimability of an lpf. Theorem 1.2.9 Let B p×( p−s) be a matrix such that AB = 0 and Rank(B) = p − s. Then an lpf a θ is estimable iff a B = 0. Proof If a θ is estimable, then by Remark 1.2.3, a ∈ C(A A). This implies that a b = 0 for any b ∈ V and hence a B = 0. Suppose now that a B = 0. Then a ∈ C(A A) and a θ is estimable by Remark 1.2.3. Note that the columns of the matrix B in Theorem 1.2.9 constitute a basis for the vector space V in Remark 1.2.8 and hence B is not unique. In fact, if B ∗ is any nonsingular matrix of order p − s and B1 = B B ∗ , then a B = 0 iff a B1 = 0. Theorem 1.2.9 is particularly useful when we want to examine several lpf’s for estimability in a given Gauss–Markov model. Note that when compared to the verification of condition (1.2.1) or (1.2.2) for estimability, the verification of the condition a B = 0 is trivial. Our attempt will be to find a B satisfying the conditions of Theorem 1.2.9 for each Gauss–Markov model that we introduce later. In the following remark, a method of finding the B matrix is described. Remark 1.2.10 Let (A A)− be a generalized inverse (g-inverse) of A A, that is, A A = A A(A A)− A A. It follows from the definition of a g-inverse that − − A and I p − (A A) Since s = Rank(A A) = Rank (A A A−are idempotent. A) A − AA ≤ A) = A A(A A) Rank (A A) A A ≤ Rank(A A), we have Rank(A − − − Rank (A A) A A . Hence Rank I −(A A) A A = trace I −(A A) A A = p p p − trace (A A)− A A = p − s. Further, (A A) I p − (A A)− A A = 0. So one can choose the matrix B as the one obtained by deleting all the s linearly dependent columns in I p − (A A)− A A. Linear parametric functions a1 θ, . . . , am θ are said to be independent (orthogonal) if the associated vectors a1 , . . . , am are linearly independent (orthogonal), where ai = ai1 . . . ai p , i = 1, . . . , m.
1.2 Estimability
5
Lemma 1.2.11 In any given set of elpf’s, the maximum number of independent elpf’s is not more than s. Proof If possible, let a1 θ, . . . , am θ be independent elpf’s with m > s. By Theorem 1.2.1, s = Rank(A) = Rank A : a1 : · · · : am ) ≥ Rank (a1 . . . am ) = m. This contradiction establishes the claim. Note that, by Remark 1.2.7, Aθ has n elpf’s of which only s are independent. An lpf a θ is called a contrast or a comparison in θ if a1 + · · · + a p = 0. Example 1.2.12 Let Y1 , Y2 , Y3 , Y4 , and Y5 be pairwise uncorrelated random variables with common variance σ 2 > 0 and expectations given by E(Y1 ) = θ1 − θ2 + θ4 , E(Y2 ) = θ1 + θ2 + θ3 , E(Y3 ) = 2θ2 + θ3 − θ4 , E(Y4 ) = θ1 + 3θ2 + 2θ3 − θ4 , E(Y5 ) = 2θ1 + θ3 + θ4 . In this model, n = 5, p = 4, and the design matrix is ⎛
1 ⎜1 ⎜ A=⎜ ⎜0 ⎝1 2
−1 1 2 3 0
0 1 1 2 1
⎞ 1 0⎟ ⎟ −1⎟ ⎟. −1⎠ 1
We shall obtain a criterion for a θ = a1 θ1 + a2 θ2 + a3 θ3 + a4 θ4 to be estimable in this model as well as the rank of the model, simultaneously. We use the fact that the elementary transformations will not alter the rank of a matrix. We have ⎞ 1 1 0 1 2 a1 ⎜−1 1 2 3 0 a2 ⎟ ⎟ Rank ⎜ ⎝ 0 1 1 2 1 a3 ⎠ 1 0 −1 −1 1 a4 ⎞ ⎛ 1 1 0 1 2 a1 ⎜ 0 2 2 4 2 a1 + a2 ⎟ ⎟ Rank ⎜ ⎠ ⎝ 0 1 1 2 1 a3 0 1 1 2 1 a1 − a4 ⎛ ⎞ 1 1 0 1 2 a1 ⎜ 0 2 2 4 2 a1 + a2 ⎟ ⎟ Rank ⎜ ⎝ 0 0 0 0 0 a1 + a2 − 2a3 ⎠ 0 0 0 0 0 (a1 + a2 ) − 2(a1 − a4 ) Rank(A) ⎛
Rank(A : a) =
=
= =
6
1 Linear Estimation
= 2 iff a1 + a2 − 2a3 = 0 and a1 − a2 − 2a4 = 0. By Theorem 1.2.1, a θ is estimable iff a1 + a2 − 2a3 = 0 and a1 − a2 − 2a4 = 0, 1 1 −2 0 that is, a B = 0 with B = . Note that AB = 0 and B is the 1 −1 0 −2 matrix satisfying the conditions of Theorem 1.2.9. By postmultiplying B by the 1/2 1/2 nonsingular matrix ,we get another pair of equivalent conditions on a 1/2 −1/2 as a1 − a3 − a4 = 0 and a2 − a3 + a4 = 0. Example 1.2.13 Let Y = Y11 . . . Y1n 1 Y21 . . . Y2n 2 . . . Yv1 . . . Yvn v = Y1 Y2 . . . Yv with Yi = Yi1 . . . Yini , and Yi j , j = 1, . . . , n i , i = 1, . . . , v, be pairwise uncorrelated random variables with common variance σ 2 > 0 and expectations E(Yi j ) = μ + αi . We write E(Y ) = Aθ with θ = (μ α ) , α = (α1 . . . αv ), and ⎛
In 1 In 1 On 1 ×1 ⎜In 2 On 2 ×1 In 2 A=⎜ ⎝ . . . In v On v ×1 On v ×1
⎞ . . On 1 ×1 . . On 2 ×1 ⎟ ⎟, .. . ⎠ . . In v
where Im or I, if m is clear from the context, denotes the m-component column vector with each component equal to 1 and Om×n denotes the null matrix of order m × n which will be written as 0 if m = 1 = n or if m and n are clear from the context. In this model, n = n 1 + · · · + n v , p = v + 1, and ⎛
n ⎜n 1 ⎜ s = Rank(A A) = Rank ⎜ ⎜n 2 ⎝. nv
n1 n1 0 . 0
n2 0 n2 . 0
. . 0 . .
. . . . .
⎞ nv 0⎟ ⎟ n I N ⎟ 0 ⎟ = Rank = Rank(N ) = v, NI N .⎠ nv
since the first row (column) of A A is the sum of the remaining rows (columns), where N = diag(n 1 , . . . , n v ) is the diagonal matrix. To obtain a criterion for the estimability of a θ = a0 μ + a1 α = a0 μ + a11 α1 + · · · + a1v αv , we look for conditions on a = (a0 a1 ) that satisfy (1.2.2). Recalling that the first column of A A is the sum of the remaining columns and that multiplication by a nonsingular matrix does not alter the rank, we get
1.2 Estimability
7
n I N a0 N I N a1 I N a0 Rank N a1 I N a0 −1 Iv Rank 0v×1 Iv N a1 0 −a0 + I a1 Rank N a1 Rank(N ) Rank(A A)
Rank A A : a = Rank = = = = =
v a1i or iff −a0 + I a1 = 0. Thus a θ = a0 μ + a1 α is estimable iff a0 = i=1 a B = 0, where B = (−1 Iv ) . Note that AB = 0. In this model, note that none of the individual parameters μ, α1 , . . . , αv , is estimable and that an lpf a1 α of α alone is estimable iff I a1 = 0, that is, a1 α is a contrast in α. Remark 1.2.14 We have the above model for the random variables associated with experiments employing a Completely Randomized Design, abbreviated as CRD. In CRD, v treatments are randomly allocated to n homogeneous plots, grouped into v groups, such that the ith treatment is allotted to all the plots in the ith group with n i plots. We will discuss designs in later chapters.
1.3 Least Squares Estimate We have seen in the last section that in a Gauss–Markov model, not every lpf need be estimable unless the model is a full-rank model. Since unbiasedness is one of the criteria which is insisted upon almost always, we will search for the best estimates of elpf’s only, best in some sense to be made precise later. Further, as mentioned earlier, this search will be confined to the class of linear estimates only. For this purpose, we need the least squares estimate of θ. As mentioned in Sect. 1.1, y = (y1 . . . yn ) is a realization of Y. Any value of ˆ θ, say θˆ = θ(y), for which S(y, θ) = (y − Aθ) (y − Aθ)
(1.3.1)
is least, is called a least squares estimate of θ. Differentiating S(y, θ) partially with respect to θ1 , . . . , θ p and equating the derivatives to zero, we get what is known as the normal equation, written as the matrix equation A Aθ = A y.
(1.3.2)
8
1 Linear Estimation
We claim that the normal equation (1.3.2) is consistent, that is, Rank(A A) = Rank(A A : A y). This follows since Rank(A A) ≤ Rank(A A : A y) = Rank(A (A : y)) ≤ Rank(A ) = Rank(A A). Thus, the normal equation (1.3.2) is guaranteed to have at least one solution. When the Gauss–Markov model is a full-rank model, A A is nonsingular and (1.3.2) has the unique solution given by θˆ = (A A)−1 A y. Otherwise, it has infinite solutions. To get a solution of (1.3.2) in this case, we can proceed as follows. We delete all those p − s linearly dependent rows of A A and the corresponding entries in A y. We replace the deleted rows with any p − s rows so that the resulting matrix is nonsingular and we replace the corresponding entries in A y with zeroes. The new matrix equation so obtained will then have a unique solution θˆ = (A A)− A y, where (A A)− is a g-inverse of A A. Lemma 1.3.1 Any solution of the normal equation (1.3.2) is a least squares estimate of θ. Proof Let θˆ be a solution of (1.3.2). Then ˆ + A(θˆ − θ) (y − Aθ) ˆ + A(θˆ − θ) S(y, θ) = (y − Aθ) ˆ (y − Aθ) ˆ + 2(θˆ − θ) A (y − Aθ) ˆ + (θˆ − θ) A A(θˆ − θ) = (y − Aθ) ˆ (y − Aθ) ˆ + (θˆ − θ) A A(θˆ − θ) = (y − Aθ) ˆ (y − Aθ) ˆ ≥ (y − Aθ) ˆ = S(y, θ), ˆ = (θˆ − θ) (A y − A Aθ) ˆ = 0 and (θˆ − θ) A A(θˆ − θ) ≥ 0. ˆ A (y − Aθ) since (θ−θ) ˆ any solution of the normal equation. Hence S(y, θ) is least at θ = θ,
In view of this lemma, we will use the phrases ‘a least squares estimate of θ’ and ‘a solution of the normal equation (1.3.2)’ interchangeably. Whenever it is convenient, we will call the ith equation in (1.3.2) as the equation corresponding to θi and the ith column (row) of A A as that of θi , i = 1, . . . , p. Remark 1.3.2 Let the Gauss–Markov model be less-than-full-rank model and let θˆ and θ˜ be any two distinct solutions of the normal equation (1.3.2). Then A A(θˆ − ˆ − (y − Aθ) ˜ (y − Aθ) ˜ = (y y − θˆ A y) − (y y − ˜ = 0 and (y − Aθ) ˆ (y − Aθ) θ) ˆ A y = (θ˜ − θ) ˆ A Aθˆ = 0. Therefore, the value of S(y, θ) at any ˜θ A y) = (θ˜ − θ) solution of the normal equation (1.3.2) is the same, as it should be. ˆ = y y − θˆ A y If θˆ is any solution of the normal equation (1.3.2), then S(y, θ) is called the residual sum of squares or the error sum of squares or sum of squares for error (SSE).
1.3 Least Squares Estimate
9
Let (A A)− be a g-inverse of A A, the usual inverse if s = p. Then θˆ = ˆ (A A)− A y is a solution of the normal equation (1.3.2). Substituting this in S(y, θ), we get
ˆ = y y − y A(A A)− A y SSE = S(y, θ) = y I − A(A A)− A y where
= y M y, M = I − A(A A)− A .
(1.3.3) (1.3.4)
The vector y − Aθˆ = M y is called the residual vector. Lemma 1.3.3 The matrix M in (1.3.4) possesses the following properties: (i) M A = 0, (ii) M = M = M 2 , and (iii) Rank(M) = n − s. Proof Let G = M A = A(I − (A A)− A A). Then G G = (I − (A A)− A A) A A (I − (A A)− A A) = 0 using the definition of g-inverse of A A. Therefore, G is a null matrix and M A = 0. Now M 2 = M − M A(A A)− A = M and so M is idempotent. Further, M M = (I − A(A A)− A )M = M = M since M M is symmetric. Thus M is symmetric. It was shown in Remark 1.2.10 that (A A)− A A is idempotent and has rank s. This and a property of trace of a matrix gives Rank(M) = trace(M) = n − trace A(A A)− A − = n − trace (A A) A A = n − Rank (A A)− A A = n − s. Remark 1.3.4 By Lemma 1.3.3(i), we have E(MY ) = 0. ˆ θ) The ratio S(y, = SSE is called the mean squares for error (MSE). We will n−s n−s show in the next lemma that MSE is an unbiased estimate of σ 2 .
Lemma 1.3.5 An unbiased estimate of σ 2 is MSE =
ˆ S(Y,θ) n−s
= SSE . n−s
ˆ ) of the normal equation (1.3.2). Proof To prove the lemma, consider a solution θ(Y From (1.3.3) and Lemma 1.3.3, we have ˆ )) = Y MY = (Y − Aθ) M (Y − Aθ) = trace M (Y − Aθ) (Y − Aθ) . S(Y, θ(Y In the last step, we have used a property of trace of a matrix. So ˆ ))) = trace(E(M (Y − Aθ) (Y − Aθ) )) E(S(Y, θ(Y
10
1 Linear Estimation
= trace M E((Y − Aθ) (Y − Aθ) ) = trace σ 2 M = σ 2 Rank(M) = (n − s)σ 2 using Lemma 1.3.3 and a property of trace of a matrix once again. The claim follows from this. Example 1.3.6 Consider the Gauss–Markov model in Example 1.2.12. Let y = (y1 y2 y3 y4 y5 ) be an observation on Y. We have ⎛
7 ⎜ 3 A A = ⎜ ⎝5 2
3 15 9 −6
5 9 7 −2
⎞ ⎛ ⎞ ⎛ ⎞ x1 2 y1 + y2 + y4 + 2y5 ⎟ ⎟ ⎜ ⎜ −y + y + 2y + 3y x −6⎟ 1 2 3 4 ⎟ = ⎜ 2 ⎟ = x, say. ⎟, and A y = ⎜ ⎝ y2 + y3 + 2y4 + y5 ⎠ ⎝x3 ⎠ −2⎠ y1 − y3 − y4 + y5 x4 4
The normal equation A Aθˆ = A y can be written explicitly as 7θˆ1 + 3θˆ2 + 5θˆ3 + 2θˆ4 = x1 , 3θˆ1 + 15θˆ2 + 9θˆ3 − 6θˆ4 = x2 , 5θˆ1 + 9θˆ2 + 7θˆ3 − 2θˆ4 = x3 , 2θˆ1 − 6θˆ2 − 2θˆ3 + 4θˆ4 = x4 . Since s = 2 < p, the above normal equation has infinite solutions. To get a solution, we have to delete two dependent equations. Observe that the sum of the first and the second equation is twice the third equation and, the first equation minus the second equation is twice the fourth equation. In this case, therefore, any two equations can be declared as dependent and can be deleted. First, let us delete the last two equations and replace them with θˆ3 = 0
and
θˆ4 = 0,
to get the four new equations as 7θˆ1 + 3θˆ2 = x1 , 3θˆ1 + 15θˆ2 = x2 , θˆ3 = 0, θˆ4 = 0,
1.3 Least Squares Estimate
11
⎛
7 ⎜3 ⎜ or in the matrix form as ⎝ 0 0
3 15 0 0
0 0 1 0
⎞⎛ˆ ⎞ ⎛ ⎞ θ1 x1 0 ⎜ˆ ⎟ ⎜ ⎟ ⎟ 0 ⎟ ⎜θ 2 ⎟ ⎜ x 2 ⎟ . Notice that the matrix above ⎜ ⎟= 0⎠ ⎝θˆ3 ⎠ ⎝ 0 ⎠ 1 0 θˆ4
is nonsingular. Solving the two equations in θˆ1 and θˆ2 above, we get θˆ1 = 1 +7x 2 and θˆ2 = −3x96 and hence
θˆ =
5x1 −x2 ⎟ ⎜ −3x32 ⎜ 1 +7x2 ⎟ ⎜ 96 ⎟
⎝
0 0
⎛
⎞ 1 − 32 0 0 ⎛x ⎞ 1 ⎟⎜ ⎟ 7 x 0 0 ⎟ 2 − − ⎟ 96 = ⎟⎜ ⎝x3 ⎠ = (A A) x = (A A) A y, ⎠ ⎠ ⎝ 0 0 00 x4 0 0 00 ⎞
⎛
5x1 −x2 32
5 32 ⎜ 1 ⎜− 32 ⎜
where ⎛
5 32 ⎜− 1 ⎜ 32
(A A)− = ⎝ 0 0
1 − 32 00 7 96
0 0 0 0 0
⎞
0⎟ ⎟ 0⎠ 0
is a g−inverse of A A. To get another least squares estimate of θ, let us delete the second and the third equations in the normal equation above and replace them with θˆ1 = 0
θˆ3 = 0,
and
to get the four new equations as 3θˆ2∗ + 2θˆ4∗ = x1 , −6θˆ2∗ + 4θˆ4∗ = x4 , θˆ∗ = 0, 1
θˆ3∗ = 0, ⎛ 0 ⎜1 ⎜ or in the matrix form as ⎝ 0 0 two equations
3 0 0 −6
0 0 1 0
⎞ ⎛ ˆ∗ ⎞ ⎛ ⎞ θ1 x1 2 ⎜ ˆ∗ ⎟ ⎜ ⎟ θ2 ⎟ ⎜ 0 ⎟ 0⎟ ⎟⎜ . Notice that the choice of the ⎜ ⎟= 0⎠ ⎝θˆ3∗ ⎠ ⎝ 0 ⎠ x4 4 θˆ∗
θˆ1 = 0
4
and
θˆ3 = 0
12
1 Linear Estimation
has given us the above nonsingular matrix. Solving the two equations in θˆ2∗ and θˆ4∗ above, we get θˆ2∗ = 2x112−x4 and θˆ4∗ = 2x18+x4 and hence ⎛
0
⎞
⎛
00 ⎜ 2x1 −x4 ⎟ ⎜ 1 0 ∗ 12 ⎟ ⎜6 θˆ = ⎜ ⎝ 0 ⎠ = ⎝0 0 2x1 +x4 1 0 4 8
⎞⎛ ⎞ 0 0 x1 1 ⎟⎜ ⎟ 0 − 12 x ⎟ ⎜ 2 ⎟ = (A A)+ x = (A A)+ A y, 0 0 ⎠ ⎝x3 ⎠ x4 0 18
where ⎛
00 1 ⎜ 0 6 (A A)+ = ⎜ ⎝0 0 1 0 4
⎞ 0 0 1 ⎟ 0 − 12 ⎟ 0 0 ⎠ 0 18
is another g-inverse of A A. Note that the g-inverse (A A)+ is not symmetric even though A A is symmetric. The SSE in this model is SSE = y y − θˆ A y = y y − θˆ∗ A y = y y −
1 15x12 − 6x1 x2 + 7x22 , 96
where x1 = y1 + y2 + y4 + 2y5 and x2 = −y1 + y2 + 2y3 + 3y4 . Example1.3.7 Here we will consider the Gauss–Markov model in Example 1.2.13. Let y = y11 . . .y1n 1 y21 . . . y2n 2 . . . yv1 . . . yvn v be an observation on Y. In this n I N model, A A = and A y = (y.. y1. . . . yv. ) , where n = n 1 + · · · + NI N v n i v n v = I N I, N = diag(n 1 , . . . , n v ), y.. = i=1 and yi. = j=1 yi j = i=1 yi. n i y , i = 1, . . . , v. Let y = . . . y . Then the normal equation A Aθˆ = (y ) ∗. 1. v. j=1 i j A y can be written as n μˆ + I N αˆ = y.. , N Iμˆ + N αˆ = y∗. . Note that y.. = I y∗. and the top equation above corresponding to μ is dependent as it is the sum of the remaining α equations. Since s = v and p = v + 1, we need to add an equation upon deleting the top equation. We can take μˆ = 0 to get αˆ = N −1 y∗. . Then 0 01×v y.. μˆ 0 = = (A A)− A y, θˆ = = 0v×1 N −1 y∗. αˆ N −1 y∗.
0 01×v where (A A) = 0v×1 N −1
−
is a g-inverse of A A. The SSE in this case is
1.3 Least Squares Estimate
SSE = y y − θˆ A y = y y − y∗. N −1 y∗. =
13 nv v i=1 j=1
yi2j −
v yi.2 . n i=1 i
Remark 1.3.8 Estimability can be quickly understood by solving Exercise 1.2 and understanding the Gauss–Markov theorem in the next section.
1.4 Best Linear Unbiased Estimates Let a θ be an lpf in a Gauss–Markov model. By definition, it is nonestimable if it does not have a linear unbiased estimator. Suppose now that a θ is an elpf. If it has two linear unbiased estimators c1 Y and c2 Y, then λc1 Y + (1 − λ)c2 Y is also a linear unbiased estimator of a θ for every real number λ and hence a θ has infinite number of linear unbiased estimators. Thus unbiasedness alone will not always get us a unique estimator for an elpf. Therefore, we have to impose a second criterion in the hope of getting a unique estimator. It turns out that the estimator having the minimum variance in the class of linear unbiased estimators is unique. A linear function c Y of Y is said to be the best linear unbiased estimator (blue) of an elpf a θ if c Y is unbiased for a θ and has the least variance among all such linear unbiased estimators. The following celebrated Gauss–Markov theorem claims that the blue exists for every elpf in any Gauss–Markov model. Theorem 1.4.1 (Gauss–Markov Theorem) Let a θ be an elpf in a Gauss–Markov model (Y, Aθ, σ 2 I ). The blue of a θ is a θˆ where θˆ is a least squares estimate of θ, that is, a solution of the normal equation (1.3.2). The variance of the blue is a (A A)− aσ 2 , where (A A)− is a g-inverse of A A. The blue is unique. Remark 1.4.2 As we have observed in Sect. 1.3, when s = p, θˆ = (A A)−1 A y ˆ When s < p also, a θˆ remains the same no matis unique and so is a θ. ter which solution of the normal equation (1.3.2) is used. Further, the variance a (A A)− aσ 2 also remains the same whatever be the g-inverse (A A)− . To establish these facts, let θˆ and θˆ∗ be two solutions of the normal equation d such that (1.3.2). Since a θ is estimable, by Remark thereexists a vector 1.2.4, A Ad = a. Then a θˆ − a θˆ∗ = d A A θˆ − θˆ∗ = d A y − A y = 0. Let now (A A)− and (A A)+ be two g-inverses of A A so that A A(A A)± A A = A A. Then a (A A)− a − a (A A)+ a = d A A (A A)− − (A A)+ A Ad = 0. Proof of Theorem 1.4.1 Since a θ is estimable, by Remark 1.2.4, A Ad = a for ˆ )) = some vector d. Using this, we write a θˆ = d A Aθˆ = d A Y and E(a θ(Y E(d A Y ) = d A Aθ = a θ. Hence a θ is a linear unbiased estimate. Let c Y be an unbiased estimator of a θ so that a = A c. Its variance is V (c Y ) = 2 ˆ )) = E c Y − c Aθ = E c (Y − Aθ)(Y − Aθ) c = σ 2 c c and hence V (a θ(Y
14
1 Linear Estimation
V (d A Y ) = σ 2 d A Ad = σ 2 a (A A)− a. Since A c = a, we have V (c Y ) − ˆ )) = σ 2 c c − d A Ad = σ 2 (c − Ad) (c − Ad) ≥ 0 and, the equality V (a θ(Y ˆ ) has the least variance among all the linear unbiased holds iff c = Ad. Thus a θ(Y estimators of a θ and is unique. The following lemma gives the covariance between the blue’s of two elpf’s. Lemma 1.4.3 Let a θ and a ∗ θ be two elpf’s in a Gauss–Markov model. Then ˆ ), a ∗ θ(Y ˆ )) = a (A A)− a ∗ σ 2 = a ∗ (A A)− aσ 2 , where (A A)− is a gCov(a θ(Y inverse of A A and Cov denotes covariance. Proof By Remark 1.2.4, there exist vectors d and d ∗ such that A Ad = a and A Ad ∗ = a ∗ . Now ˆ ), d ∗ A Aθ(Y ˆ ) ˆ ), a ∗ θ(Y ˆ ) = Cov d A Aθ(Y Cov a θ(Y = Cov d A Y, d ∗ A Y by (1.3.2) = d A E (Y − Aθ)(Y − Aθ) Ad ∗ = d A V (Y )Ad ∗ = d A Ad ∗ σ 2 = d ∗ A Adσ 2 = d A A(A A)− A Ad ∗ σ 2 = a (A A)− a ∗ σ 2 = a ∗ (A A)− aσ 2 .
The corollary below follows trivially from Lemma 1.4.3. Corollary 1.4.4 The correlation coefficient between the blue’s of elpf’s a θ and a ∗ θ is a (A A)− a ∗ ˆ ), a ∗ θ(Y ˆ )) = √ Cor (a θ(Y , (a (A A)− a)(a ∗ (A A)− a ∗ ) where Cor denotes correlation coefficient. Remark 1.4.5 Using steps as in Remark 1.4.2, one can easily show that the covariance in Lemma 1.4.3 is the same whichever g-inverse of A A is used. Further, the covariance and hence the correlation coefficient above are zero iff a (A A)− a ∗ = 0 (see Exercise 1.1). Example 1.4.6 Let us consider once again the Gauss–Markov model in Example 1.2.12. It was shown there that a θ = a1 θ1 + a2 θ2 + a3 θ3 + a4 θ4 is estimable iff
1.4 Best Linear Unbiased Estimates
a1 + a2 = 2a3 and a1 − a2 = 2a4 .
15
(1.4.1)
The lpf’s a θ = θ1 + θ2 + θ3 and a ∗ θ = θ1 − 3θ2 − θ3 + 2θ4 are estimable. Using the least squares estimate θˆ of θ derived in Example 1.3.6, the best estimates of a θ and a ∗ θ are 1 3x1 + x2 = (y1 + 2y2 + y3 + 3y4 + 3y5 ) and 24 12 x1 − x2 1 a ∗ θˆ = = (y1 − y3 − y4 + y5 ) . 4 2 a θˆ =
Using the g-inverse (A A)± derived in Example 1.3.6, the variances of the best ˆ )) = a (A A)− aσ 2 = σ2 and V (a ∗ θ(Y ˆ )) = estimators are, respectively, V (a θ(Y 6 2 ˆ ), σ . By Lemma 1.4.3, the covariance between the best estimators is Cov(a θ(Y ∗ ˆ − ∗ 2 a θ(Y )) = a (A A) a σ = 0. The two estimators are uncorrelated. Let now a θ be estimable in this model so that (1.4.1) holds. By Gauss–Markov theorem, its blue is 5x1 −x2 −3x1 + 7x2 1 {(15a1 −a2 ) x1 +(−3a1 + 7a2 ) x2 }, +a2 = a θˆ = a1 32 96 96 (1.4.2) where θˆ is as in Example 1.3.6. The variance of the blue of a θ is ˆ )) = a (A A)− aσ 2 = 1 (15a 2 − 6a1 a2 + 7a 2 )σ 2 , V (a θ(Y 1 2 36
(1.4.3)
where (A A)− is as given in Example 1.3.6. Suppose now that a ∗ θ is another elpf in the model. Then the covariance between the blue’s of a θ and a ∗ θ is ˆ ), a ∗ θ(Y ˆ )) = a (A A)− a ∗ σ 2 Cov(a θ(Y 1 (15a1 a1∗ − 3a1 a2∗ − 3a1∗ a2 + 7a2 a2∗ )σ 2 . (1.4.4) = 96 Observe that while (1.4.1) helps us to quickly check the estimability of an lpf in this model, the ready-made formulae (1.4.2)–(1.4.4) enable us to derive the best estimates, their variances and covariances of elpf’s, without going through the steps all over again in each case.
1.5 Linear Estimation with Correlated Observations In a Gauss–Markov model Y, Aθ, σ 2 I , we have assumed that Y1 , . . . , Yn are pairwise uncorrelated. Suppose now that they are correlated with known correlation coefficients and all other assumptions remain the same. We can write the model as
16
1 Linear Estimation
the triplet Y, Aθ, σ 2 , where is a known positive definite matrix called the correlation matrix of Y. For the purpose of reference, let us call this model as the ‘correlated model’. The correlated model can be reduced to a Gauss–Markov model by means of a transformation as follows. Since is positive definite, there exists a nonsingular matrix G such that = GG . Let Z = G −1 Y. Then E(Z ) = G −1 Aθ and V (Z ) = G −1 (G )−1 σ 2 = σ 2 I. Thus, Z , G −1 Aθ, σ 2 I is a Gauss–Markov model with the design matrix G −1 A = A∗ , say. Note that Rank(A) = Rank(A∗ ). All the results derived in the previous sections are applicable to the ‘correlated model’ with A replaced by A∗ and Y by G −1 Y. Note that Theorem 1.2.1 holds as it is since Rank(A) = Rank (A : a) ⇔ Rank(G −1 A) = Rank(A (G )−1 : a). The normal equation for the ‘correlated model’ takes the form A −1 Aθˆ = A −1 y
(1.5.1)
and the SSE can be written as y −1 y − θˆ A −1 y. It is not difficult to show that the solution(s) of (1.5.1) is (are) the value(s) of θ for which (y − Aθ) −1 (y − Aθ) is the least (see Exercise 1.1).
1.6 Comments Let Y, Aθ, σ 2 I be a Gauss–Markov model. The vector space C(A) with dimension s is known as the estimation space, and the space c ∈ Rn : A c = 0 which is orthogonal to the estimation space is known as the error space. Obviously, the dimension of the error space is n − s. Note that, in view of Lemma 1.3.3, the error space is C(M) with M as defined in (1.3.4). Note also that if c is in the error space, then E(c Y ) = 0. Further, if b belongs to the estimation space and c to the error space, then b Y and c Y are uncorrelated since b and c are orthogonal.
1.7 Exercises Exercise 1.1 Provide the proof of Corollary 1.2.5, the missing steps in Remark 1.4.5 and in Sect. 1.5. Exercise 1.2 With the notations used in this chapter, show that the following are equivalent:
1.7 Exercises
(1) (2) (3) (4)
17
a θˆ is unique for all solutions θˆ of the normal equation. a ∈ C(A A) ⇔ a ∈ C(A ). There exists a linear function c Y such that E(c Y ) = a θ for all θ. a θˆ is linear in Y and unbiased for a θ.
Exercise 1.3 A Gauss–Markov model has the design matrix A given by ⎛
1 ⎜1 ⎜ A=⎜ ⎜0 ⎝1 2
−1 1 2 3 0
−1 0 1 1 −1
⎞ 1 1⎟ ⎟ 0⎟ ⎟. 1⎠ 2
Find the rank of the model. Derive a criterion for the estimability of an lpf. Obtain a least squares estimate of the parameter vector, a g-inverse of A A and the associated M matrix. Derive the expression for MSE. Exercise 1.4 Answer the questions as in the previous exercise for the following models: (i) E(Y1 ) = θ1 − θ2 + θ3 − θ4 ; E(Y2 ) = θ1 + θ2 + 3θ3 − θ4 ; E(Y3 ) = θ2 + θ3 ; E(Y4 ) = θ1 + 2θ3 − θ4 ; E(Y5 ) = θ1 − 2θ2 − θ4 . (ii) E(Y1 ) = θ1 + θ2 − θ3 ; E(Y2 ) = θ1 + θ3 ; E(Y3 ) = 2θ1 + θ2 ; E(Y4 ) = −θ2 + 2θ3 . Exercise 1.5 Y, Aθ, σ 2 I is a Gauss–Markov model with E(Y1 ) = θ1 + θ2 , E(Y2 ) = θ2 + θ3 , E(Y3 ) = θ1 + 2θ2 + θ3 , and E(Y4 ) = θ1 − θ3 . Show that an lpf is estimable iff it is of the form b1 (θ1 + θ2 ) + b2 (θ2 + θ3 ) for some real numbers b1 and b2 . Obtain the blue’s of θ1 + 2θ2 + θ3 and θ1 − θ3 , their variances, and the correlation coefficient between them. Exercise 1.6 Let Y, Aθ, σ 2 I be a Gauss–Markov model. Show that an lpf a θ is estimable iff I p − (A A)− (A A) = 0, where (A A)− is any g-inverse of A A. Exercise 1.7 In a Gauss–Markov model Y, Aθ, σ 2 I with rank s, let a1 θ, . . . , as θ be a collection of s independent elpf’s. Show that an lpf a θ is estimable iff it is of the form a θ = λ1 a1 θ + · · · + λs as θ for some real numbers λ1 , . . . , λs . Exercise 1.8 If T0 is the blue of an elpf a θ in a Gauss–Markov model Y, Aθ, σ 2 I , of a θ, then show that the correand if T1 is any other linear unbiased estimator lation coefficient between T0 and T1 is
V (T0 ) . V (T1 )
Exercise 1.9 With reference to a Gauss–Markov model (Y, Aθ, σ 2 I , show that every linear function of θ is estimable iff Aθ = Aφ implies that θ = φ.
Chapter 2
Linear Hypotheses and their Tests
Statistical thinking will one day be as necessary a qualification for efficient citizenship as the ability to read and write —H. G. Wells
The two important statistical aspects of modeling are estimation and testing of statistical hypotheses. As a sequel to the previous chapter on linear estimation, questions such as what the statistical hypotheses that can be tested in a linear model are and what the test procedures are, naturally arise. This chapter answers such questions, for example, comparison of two or more treatments/methods occur often in real-life problems such as comparing the effect of two or more drugs for curing a particular medical condition, comparing two or more diets on the performance of athletes/sportspersons in a particular sporting event, comparing two or more training methods, and whatnot. In such problems, it is of primary interest to rule out the possibility that all the treatments/methods have the same effect on the outcome of interest. This can be achieved by testing a hypothesis that the treatments/methods have the same effect on the desired outcome. Experiments can be designed wherein such hypotheses of interest can be statistically tested. In this chapter, after discussing linear hypotheses in a general multivariate normal setup, applications of the results to linear hypotheses in a linear model are discussed. These are further used to obtain confidence intervals, confidence ellipsoids, and simultaneous confidence intervals for elpf’s.
c The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 N. R. Mohan Madhyastha et al., A First Course in Linear Models and Design of Experiments, https://doi.org/10.1007/978-981-15-8659-0_2
19
20
2 Linear Hypotheses and their Tests
2.1 Linear Hypotheses Consider a random vector Y = (Y1 . . . Yn ) with expectation E(Y ) = ξ = (ξ1 . . . ξn ) . If each ξi can take any real number as its value, then ξ ∈ Rn . There are situations where the values of ξi ’s are restricted in such a way that ξ belongs to a subspace B(s) of Rn of dimension s < n. Here, we will be concerned with such situations only. So let ξ ∈ B(s), 0 < s < n. By a linear hypothesis, we mean the hypothesis H : ξ ∈ B0 (s − q), where B0 (s − q) is a subspace of B(s) with dimension s − q for some q, 0 < q ≤ s, and we say H is of rank q. The alternative is that ξ ∈ / B0 (s − q). As we will see later, a variety of hypotheses that are being tested in the analysis of experimental data happen to be linear hypotheses. We assume that Y1 , . . . , Yn are independent normal random variables with common variance σ 2 > 0 which is unknown. Note that Y then has multivariate normal distribution with E(Y ) = ξ and V (Y ) = σ 2 I, which we indicate by writing Y ∼ N (ξ, σ 2 I ). Thus the setup we have is as follows: Y ∼ N (ξ, σ 2 I ), ξ ∈ B(s), and H : ξ ∈ B0 (s − q) ⊂ B(s) ⊂ Rn . (2.1.1) Since B0 (s − q) ⊂ B(s), we can always find a basis for B(s) consisting of vectors c1 , . . . , cs−q , cs−q+1 , . . . , cs such that c1 , . . . , cs−q span B0 (s − q). Hence, a priori, ξ has the representation ξ = b1 c1 + · · · + bs cs for some scalars b1 , . . . , bs , and under H, it has the representation ξ = d1 c1 + · · · + ds−q cs−q , for some scalars d1 , . . . , ds−q . We will derive the likelihood ratio test for the hypothesis H in the setup (2.1.1). Before we proceed to derive the likelihood ratio test, we will show that the hypothesis H in (2.1.1) can be presented in a simple form by making an orthogonal , . . . , cs for transformation of Y. We first choose an orthonormal basis c q+1 B0 (s − q) and then extend it to c1 , . . . , cq , cq+1 , . . . , cs , an orthonormal basis for B(s). This is possible since B0 (s − q) ⊂ B(s). Finally, we extend it to an orthonormal basis {c1 , . . . , cs , cs+1 , . . . , cn } for Rn . Let C = (c1 .. . cn ). Then C is an orthogonal matrix. Write C1 = c1 . . . cq , C2 = cq+1 . . . cs , and C3 = (cs+1 . . . cn ). Let Z = CY. Then E(Z ) = Cξ = η, say, and V (Z ) = σ 2 CC = σ 2 I. Thus Z 1 , . . . , Z n are independent normal random variables with common variance σ 2 and E(Z i ) = ηi , i = 1, . . . , n. Hence Z ∼ N (η, σ 2 I ). Note that η = Cξ = (C1 ξ) (C2 ξ) (C3 ξ) . Since ξ ∈ B(s) a priori, and the columns of (C1 : C2 ) constitute a basis for B(s), there exist vectors a1 and a2 with q and s − q components, respectively, such that ξ = C1 a1 + C2 a2 . Therefore, if ξ ∈ B(s) then C3 ξ = 0, that is, ηs+1 = · · · = ηn = 0. Further, ξ ∈ B0 (s − q) under H, and the columns of C2 constitute a basis for B0 (s − q). Thus, under H, there exists a (s − q)-component vector b such that ξ = C2 b. Therefore, if H is true, then C1 ξ = 0, that is, η1 = · · · = ηq = 0.
Thus we see that, in the orthogonal setup, (2.1.1) can be written as the following canonical setup:

Z ∼ N(η, σ² I), ηs+1 = · · · = ηn = 0, and H∗ : η1 = · · · = ηq = 0.   (2.1.2)
2.2 Likelihood Ratio Test of a Linear Hypothesis

We consider the setup in (2.1.1). The following theorem gives the likelihood ratio test of the hypothesis H.

Theorem 2.2.1 Let Y have multivariate normal distribution with E(Y) = ξ and V(Y) = σ² I, where ξ ∈ B(s), an s-dimensional subspace of Rn. Under the hypothesis H of rank q, let ξ ∈ B0(s − q), where B0(s − q) is an (s − q)-dimensional subspace of B(s). Given an observation y on Y, the likelihood ratio test rejects H at a chosen level of significance ω if

l(y) = [ (1/q) { min_{ξ∈B0(s−q)} (y − ξ)′(y − ξ) − min_{ξ∈B(s)} (y − ξ)′(y − ξ) } ] / [ (1/(n − s)) min_{ξ∈B(s)} (y − ξ)′(y − ξ) ] > F0 = F0(ω; q, n − s),   (2.2.1)
where F0 is the (1 − ω)-quantile of the F-distribution with q and n − s degrees of freedom, that is, P(X ≤ F0) = 1 − ω, the random variable X having the F-distribution with q and n − s degrees of freedom.

Proof First we will derive the likelihood ratio test of the hypothesis H∗ in (2.1.2). Let z = C′y so that z is a realization of Z. Given z, the likelihood function is

L(η1, . . . , ηs, σ²; z) = (1/(√(2π) σ)ⁿ) exp{ −(1/(2σ²)) (z − η)′(z − η) }
                      = (1/(√(2π) σ)ⁿ) exp{ −(1/(2σ²)) [ Σ_{i=1}^{s} (zi − ηi)² + Σ_{i=s+1}^{n} zi² ] },

ηi ∈ R, i = 1, . . . , s, 0 < σ² < ∞. The likelihood ratio test statistic λ is

λ(z) = sup_{ηq+1,...,ηs,σ²} L(0, . . . , 0, ηq+1, . . . , ηs, σ²; z) / sup_{η1,...,ηs,σ²} L(η1, . . . , ηq, ηq+1, . . . , ηs, σ²; z).
We get the denominator of λ by substituting the maximum likelihood estimates of η1, . . . , ηs and σ² in L(η1, . . . , ηs, σ²; z). It is easy to show (see Exercise 2.1) that the maximum likelihood estimate of ηi is η̂i = zi, i = 1, . . . , s, and that of σ² is σ̂² = (1/n) Σ_{i=s+1}^{n} zi². Substituting these estimates in L, we get the denominator as (1/(√(2π) σ̂))ⁿ exp(−n/2). Similarly (see Exercise 2.1), substituting the maximum likelihood estimates η̃i = zi, i = q + 1, . . . , s, and σ̃² = (1/n) [ Σ_{i=1}^{q} zi² + Σ_{i=s+1}^{n} zi² ] in L(0, . . . , 0, ηq+1, . . . , ηs, σ²; z), we get the numerator of λ as (1/(√(2π) σ̃))ⁿ exp(−n/2). Therefore,

λ(z) = ( σ̂² / σ̃² )^{n/2} = [ Σ_{i=s+1}^{n} zi² / ( Σ_{i=1}^{q} zi² + Σ_{i=s+1}^{n} zi² ) ]^{n/2}.
The likelihood ratio test of H∗ rejects it iff λ(z) < c, where c is to be chosen in such a way that the resulting test has the level of significance ω. The critical region λ(z) < c is equivalent to the region

Σ_{i=1}^{q} zi² / Σ_{i=s+1}^{n} zi² > c1 = (1/c^{2/n}) − 1.

For convenience, we present the critical region as

[ Σ_{i=1}^{q} zi² / q ] / [ Σ_{i=s+1}^{n} zi² / (n − s) ] > c∗.   (2.2.2)
It is this constant c∗ that will be chosen such that the resulting test has the level of significance ω. To determine c∗, we have to find the distribution of [ Σ_{i=1}^{q} Zi² / q ] / [ Σ_{i=s+1}^{n} Zi² / (n − s) ] when H∗ is true. But the distribution of this is the same as that of [ Σ_{i=1}^{q} Zi² / (qσ²) ] / [ Σ_{i=s+1}^{n} Zi² / ((n − s)σ²) ] = (W1/q) / (W2/(n − s)), where W1 = Σ_{i=1}^{q} Zi²/σ² and W2 = Σ_{i=s+1}^{n} Zi²/σ². Note that, whether H∗ is true or not, Zi/σ, i = s + 1, s + 2, . . . , n, are independent standard normal variables and hence W2 has Chi-square distribution with n − s degrees of freedom. However, if H∗ is true, then Zi/σ, i = 1, . . . , q, are independent standard normal random variables. So W1 has Chi-square distribution with q degrees of freedom under H∗. Therefore, under H∗, [ Σ_{i=1}^{q} Zi² / q ] / [ Σ_{i=s+1}^{n} Zi² / (n − s) ] = k(Z), say, has the F-distribution with q and n − s degrees of freedom. Since c∗ has to satisfy the condition P(k(Z) > c∗) = ω, we find that

c∗ = F0(ω; q, n − s).   (2.2.3)
Now to complete the proof, we need to show that the test statistic k(z) is the same as l(y) in (2.2.1). It is easy to see that

Σ_{i=s+1}^{n} zi² = min_{η1,...,ηs} { Σ_{i=1}^{s} (zi − ηi)² + Σ_{i=s+1}^{n} zi² }
                 = min_{η} (z − η)′(z − η)
                 = min_{ξ∈B(s)} (y − ξ)′(y − ξ),   (2.2.4)

and

Σ_{i=1}^{q} zi² + Σ_{i=s+1}^{n} zi² = min_{ηq+1,...,ηs} { Σ_{i=1}^{q} zi² + Σ_{i=q+1}^{s} (zi − ηi)² + Σ_{i=s+1}^{n} zi² }
                                   = min_{η : η1=···=ηq=0} (z − η)′(z − η)
                                   = min_{ξ∈B0(s−q)} (y − ξ)′(y − ξ).

Substituting these in k(z), we get l(y).
The l(y) in (2.2.1) is called the likelihood ratio test statistic.

Example 2.2.2 Let Y = (Y1 Y2 Y3 Y4)′ be a vector of independent normal random variables with common variance σ², and let E(Y) belong to the vector space B(2) spanned by the vectors (1 0 1 1)′ and (0 1 −1 1)′. Let the hypothesis H to be tested be that E(Y) belongs to the vector space B0(1) spanned by (1 1 0 2)′. Since this vector is the sum of the two basis vectors spanning B(2), B0(1) is a subspace of B(2). In this case, n = 4, s = 2, and q = 1. We shall obtain the likelihood ratio test of H using Theorem 2.2.1. We need to compute the test statistic in (2.2.1). Let y = (y1 y2 y3 y4)′ be an observation on Y. Since E(Y) belongs to the vector space B(2) spanned by (1 0 1 1)′ and (0 1 −1 1)′, it has the representation E(Y) = r1(1 0 1 1)′ + r2(0 1 −1 1)′ = (r1 r2 (r1 − r2) (r1 + r2))′ for some real numbers r1 and r2. Hence

min_{E(Y)∈B(2)} (y − E(Y))′(y − E(Y)) = min_{r1,r2∈R} { Σ_{i=1}^{2} (yi − ri)² + (y3 − r1 + r2)² + (y4 − r1 − r2)² }
                                     = min_{r1,r2∈R} f(r1, r2) = f(r̂1, r̂2),

where f(r1, r2) = Σ_{i=1}^{2} (yi − ri)² + (y3 − r1 + r2)² + (y4 − r1 − r2)² and r̂1 and r̂2 are the values of r1 and r2 at which f is the least. Differentiating f partially with respect to r1 and r2, equating the derivatives to zeroes and solving the two equations, we get r̂1 = (y1 + y3 + y4)/3 and r̂2 = (y2 − y3 + y4)/3. That these are the values at which f is the least has been shown in Chap. 1. So

f(r̂1, r̂2) = (1/9) { (x1 + x2)² + (x1 − x2)² + x1² + x2² } = (x1² + x2²)/3,

where x1 = y1 − y2 − y3 and x2 = y1 + y2 − y4. Under H, E(Y) has the representation E(Y) = (r1 r1 0 2r1)′ for some real number r1. Hence
min_{E(Y)∈B0(1)} (y − E(Y))′(y − E(Y)) = min_{r1∈R} { Σ_{i=1}^{2} (yi − r1)² + y3² + (y4 − 2r1)² }
                                       = min_{r1∈R} f0(r1) = f0(r̃1),

where f0(r1) = Σ_{i=1}^{2} (yi − r1)² + y3² + (y4 − 2r1)² and r̃1 is the value of r1 at which f0 is the least. Proceeding as in the previous case (see Exercise 2.1), we can show that r̃1 = (y1 + y2 + 2y4)/6 and hence

f0(r̃1) = (1/6) ( 3x1² + 2x2² + 9y3² + 6x1y3 ).

Substituting these in the test statistic l(y) in (2.2.1), we get

l(y) = (y1 − y2 + 2y3)² / { (y1 − y2 − y3)² + (y1 + y2 − y4)² }.

The hypothesis H is rejected at a chosen level of significance ω if l(y) > F0(ω; 1, 2).
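As a quick numerical check of Example 2.2.2, the following R sketch (not part of the text; the observation vector y below is made up purely for illustration) computes l(y) both from the two restricted minimizations in (2.2.1) and from the closed-form expression obtained above.

# Likelihood ratio test of Example 2.2.2 (illustrative sketch)
y  <- c(1.0, 2.0, 3.0, 4.0)                   # an arbitrary observation on Y
B  <- cbind(c(1, 0, 1, 1), c(0, 1, -1, 1))    # basis of B(2)
B0 <- matrix(c(1, 1, 0, 2), ncol = 1)         # basis of B0(1)
n <- 4; s <- 2; q <- 1
rss <- function(X, y) {                       # minimum of (y - xi)'(y - xi) over C(X)
  sum(lm.fit(X, y)$residuals^2)
}
num <- (rss(B0, y) - rss(B, y)) / q
den <- rss(B, y) / (n - s)
l1  <- num / den
l2  <- (y[1] - y[2] + 2 * y[3])^2 /
       ((y[1] - y[2] - y[3])^2 + (y[1] + y[2] - y[4])^2)   # closed form from the example
c(l1, l2, qf(0.95, q, n - s))                 # l1 and l2 agree; reject H if they exceed the quantile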
2.3 Gauss–Markov Models and Linear Hypotheses

Let (Y, Aθ, σ² I) be a Gauss–Markov model. Observe that the joint distribution of Y1, . . . , Yn was not needed in Chap. 1. But here we will assume that Y has multivariate normal distribution. As a consequence, Y1, . . . , Yn will then be independent normal random variables. It may be remembered that we will always have this additional assumption in the Gauss–Markov model whenever we propose tests of hypotheses. The model may be written as Y ∼ N(Aθ, σ² I). The following theorem shows that any linear hypothesis in a Gauss–Markov model is equivalent to a specified number of independent elpf's being equal to zero. Recall that E(Y) belongs to the vector space C(A) and that C(A) has dimension s = Rank(A).

Theorem 2.3.1 A hypothesis H in a Gauss–Markov model (Y, Aθ, σ² I) with rank s is a linear hypothesis of rank q, 0 < q ≤ s, iff there exist q independent elpf's which are equal to zero under H.

Proof Assume, without loss of generality, that the first s columns of A are linearly independent. Write A = (A(1) : A(2)) and θ = (θ(1)′ θ(2)′)′, where A(1) is of order n × s and θ(1) = (θ1 . . . θs)′. Since the columns of A(2), of order n × (p − s), are linearly dependent on the columns of A(1), there exists a matrix G of order s × (p − s) such that A(2) = A(1)G. Then Aθ = A(1)(θ(1) + Gθ(2)) = A(1)θ̃, where θ̃ = θ(1) + Gθ(2). Note that (Y, A(1)θ̃, σ² I) is a full-rank Gauss–Markov model and hence each component of θ̃ is estimable. Since θ̃ = (Is : G)θ, the components of θ̃ are independent elpf's. Let A∗, of order n × (s − q), denote the matrix whose columns constitute a basis for the (s − q)-dimensional subspace of C(A) to which E(Y) belongs under H. Then, under H, E(Y) = A∗θ̄ for some (s − q)-component vector θ̄. Since C(A∗) is a subspace of C(A) = C(A(1)), there exists an s × (s − q) matrix D1 of rank s − q such that A∗ = A(1)D1. Now choose an s × q matrix D2 such that D = (D1 : D2) is nonsingular. Let Ã = A(1)D. Then C(A(1)) = C(Ã) and

E(Y) = A(1)θ̃ = ÃD⁻¹θ̃ = A(1)D1(D̃1θ̃) + A(1)D2(D̃2θ̃),

where D⁻¹ = (D̃1′ D̃2′)′ and D̃1 is a matrix of order (s − q) × s. Let θ∗ = D̃1θ̃, θ∗∗ = D̃2θ̃, and A∗∗ = A(1)D2. Then E(Y) = A∗θ∗ + A∗∗θ∗∗ and E(Y) ∈ C(A∗) iff θ∗∗ = D̃2θ̃ = 0. Since D̃2θ̃ is a vector of q independent elpf's, the claim follows.

Remark 2.3.2 Under a linear hypothesis H of rank q, we can write E(Y) = A∗θ∗ for some vector θ∗, where C(A∗) is a subspace of C(A) with dimension s − q. The model (Y, A∗θ∗, σ² I), which we will call the reduced model under H, is a Gauss–Markov model with rank s − q. Observe that the reduced model in the proof above is a full-rank model since p = s − q by the choice of A∗. However, in general, a reduced model need not be a full-rank model. This is because the imposition of a linear hypothesis H in the original model (Y, Aθ, σ² I) with rank s can lead to E(Y) = A∗θ∗ with the number of columns of A∗ more than the rank of A∗. In view of Theorem 2.3.1, it would be useful to have two versions of the likelihood ratio test of a linear hypothesis H, one for the case where the reduced model under H is available, and the other for the case where H itself is that some independent elpf's are equal to zero. The following theorem gives the likelihood ratio test of a linear hypothesis H making use of the reduced model.

Theorem 2.3.3 Let (Y, Aθ, σ² I) be a Gauss–Markov model with rank s and Y1, . . . , Yn be independent normal random variables. Let H be a linear hypothesis so that the reduced model under H is (Y, A∗θ∗, σ² I) with rank s − q for some q, 0 < q ≤ s. Given an observation y on Y, the likelihood ratio test of H rejects it at a chosen level of significance ω if

(1/q) ( θ̂′A′y − θ̂∗′A∗′y ) / MSE > F0(ω; q, n − s),   (2.3.1)

where θ̂ and θ̂∗ are the solutions of the normal equations A′Aθ̂ = A′y and A∗′A∗θ̂∗ = A∗′y, respectively, MSE = SSE/(n − s) = (y′y − θ̂′A′y)/(n − s), and F0(ω; q, n − s) is the upper ω-quantile of the F-distribution with q and n − s degrees of freedom.
Proof Write E(Y) = ξ and note that ξ ∈ C(A) a priori and ξ ∈ C(A∗) under H. We apply Theorem 2.2.1 with B(s) = C(A) and B0(s − q) = C(A∗). To get the test statistic l(y) in (2.2.1), we note that

min_{ξ∈B(s)} (y − ξ)′(y − ξ) = min_{θ} (y − Aθ)′(y − Aθ) = (y − Aθ̂)′(y − Aθ̂) = y′y − θ̂′A′y,   (2.3.2)

and

min_{ξ∈B0(s−q)} (y − ξ)′(y − ξ) = min_{θ∗} (y − A∗θ∗)′(y − A∗θ∗) = (y − A∗θ̂∗)′(y − A∗θ̂∗) = y′y − θ̂∗′A∗′y.

Substituting these in l(y), we get (2.3.1) from (2.2.1).
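The computation in Theorem 2.3.3 is easy to automate. The R sketch below is not from the text: lr.test.reduced is a hypothetical helper, and A, Astar and y are placeholders to be supplied by the user; a generalized inverse is used so that A need not have full column rank. For Example 2.3.5 below, A would be the 5 × 4 design matrix of Example 1.2.12 and Astar the single column (1 3 2 5 4)′.

# Likelihood ratio (F) test of a linear hypothesis via the reduced model, Eq. (2.3.1)
library(MASS)                                            # for ginv()
lr.test.reduced <- function(A, Astar, y, omega = 0.05) {
  n <- length(y)
  s <- qr(A)$rank
  q <- s - qr(Astar)$rank                                # q = Rank(A) - Rank(A*)
  theta.hat  <- ginv(t(A) %*% A) %*% t(A) %*% y          # a solution of A'A theta = A'y
  theta.star <- ginv(t(Astar) %*% Astar) %*% t(Astar) %*% y
  SSE  <- sum(y * y) - t(theta.hat) %*% t(A) %*% y
  MSE  <- SSE / (n - s)
  Fobs <- (t(theta.hat) %*% t(A) %*% y - t(theta.star) %*% t(Astar) %*% y) / (q * MSE)
  list(F = as.numeric(Fobs), F0 = qf(1 - omega, q, n - s),
       reject = as.numeric(Fobs) > qf(1 - omega, q, n - s))
}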
Remark 2.3.4 The least squares estimate θ̂ of θ is, in fact, the maximum likelihood estimate since Y has multivariate normal distribution. The numerator of the test statistic in (2.3.1) is

(1/q) {SSE in the reduced model under H − SSE in the original model},

and q = Rank(A) − Rank(A∗).

Example 2.3.5 Consider the Gauss–Markov model in Example 1.2.12. Assume that Y has 5-variate normal distribution. In this example, n = 5 and s = 2. Let H be the hypothesis that E(Y) = A∗θ∗ for some real θ∗ and A∗ = (1 3 2 5 4)′. Since A∗ = 3A3 + A4, where A3 and A4 denote the third and fourth columns of A, C(A∗) is a subspace of C(A). The dimension of C(A∗) is 1. So H is a linear hypothesis and q = Rank(A) − Rank(A∗) = 1. We apply Theorem 2.3.3 to get the likelihood ratio test of H. Let y be an observation on Y. A least squares estimate of θ as given in Example 1.3.6 is θ̂ = ((5x1 − x2)/32  (−3x1 + 7x2)/96  0  0)′, where x = A′y, x1 = y1 + y2 + y4 + 2y5, and x2 = −y1 + y2 + 2y3 + 3y4. Hence θ̂′A′y = ((5x1 − x2)/32) x1 + ((−3x1 + 7x2)/96) x2 = (1/96)(15x1² − 6x1x2 + 7x2²). To get θ̂∗, we solve the normal equation A∗′A∗θ̂∗ = A∗′y for the reduced model. The normal equation is 55θ̂∗ = y1 + 3y2 + 2y3 + 5y4 + 4y5 = 2x1 + x2, so that θ̂∗ = (2x1 + x2)/55 and θ̂∗′A∗′y = (2x1 + x2)²/55. The numerator of the likelihood ratio test statistic in (2.3.1) is

(1/q) ( θ̂′A′y − θ̂∗′A∗′y ) = (21x1 − 17x2)² / (96 × 55),

and hence the test statistic is

[ (21x1 − 17x2)² / (32 × 55) ] / [ y′y − (15x1² − 6x1x2 + 7x2²)/96 ].
The hypothesis H is rejected at level of significance ω if the above is greater than F0 (ω; 1, 3). Remark 2.3.6 (i) According to Theorem 2.3.1, there exists an elpf which is equal to zero under H in the above example. To find this elpf, we write the design matrix as A = (A1 A2 A3 A4 ) where Ai denotes the ith column of A, i = 1, 2, 3, 4. Note that A1 = A3 + A4 , A2 = A3 − A4 and the columns A3 and A4 are linearly independent. Hence E(Y ) = Aθ = θ1 A1 + θ2 A2 + θ3 A3 + θ4 A4 = (θ1 + θ2 + θ3 )A3 + (θ1 − θ2 + θ4 )A4 . Under H, we have E(Y ) = θ∗ (3A3 + A4 ) for some θ∗ and the original model will reduce to the model under H iff θ1 + θ2 + θ3 = 3(θ1 − θ2 + θ4 ), that is, iff 2θ1 − 4θ2 − θ3 + 3θ4 = 0. According to the criterion given in Example 1.2.12, 2θ1 − 4θ2 − θ3 + 3θ4 is estimable. Thus, H is equivalent to 2θ1 − 4θ2 − θ3 + 3θ4 = 0. (ii) Let λ1 θ = 2θ1 − 4θ2 − θ3 + 3θ4 . If H1 : θ1 = θ2 = θ3 = θ4 , or H2 : θ1 = 0, θ2 = 0, θ3 = 3θ4 , holds in the above example, then also E(Y ) reduces ¯ Note that H1 or H2 implies λ θ = 0, but to (1 3 2 5 4) θ¯ for some θ. 1 λ1 θ = 0 does not imply H1 or H2 . In fact, in each case, we can find, in several ways, two independent nonestimable lpf’s λ2 θ and λ3 θ, independent of λ1 θ, such that H1 or H2 is equivalent to λ1 θ = 0, λ2 θ = 0, and λ3 θ = 0. For example, H1 is equivalent to λ1 θ = 0, θ1 − θ2 = 0, θ1 − θ3 = 0 and H2 is equivalent to λ1 θ = 0, θ1 − θ2 = 0, θ1 − θ3 + 3θ4 = 0. Observe that, under the superfluous conditions λ2 θ = 0 = λ3 θ alone, E(Y ) still belongs to C(A) and not to a subspace. Strictly speaking, H1 and H2 above are linear hypotheses because under H1 or H2 , E(Y ) belongs to a subspace of dimension 1. To distinguish such hypotheses from H : λ1 θ = 0, a linear hypothesis a1 θ = 0, . . . , aq θ = 0 will be called estimable linear hypothesis if all the lpf’s a1 θ, . . . aq θ are estimable. Example 2.3.7 Consider the one-way classification model introduced in Example 1.2.13. We assume that Yi j , j = 1, . . . , n i , i = 1, . . . , v, are independent normal random variables. Note that the design matrix A can be written as A = (In : A1 ) where ⎛ ⎞ In 1 0 . . 0 ⎜ 0 In 2 . . 0 ⎟ ⎜ ⎟ ⎟ A1 = ⎜ ⎜ . . .. . ⎟ ⎝ . . .. . ⎠ 0 . . . In v
and E(Y) = μIn + A1α. Here α = (α1 . . . αv)′. Since A1Iv = In, E(Y) belongs to C(A1). As the columns of A1 are orthogonal, C(A1) has dimension v. Thus, in this model, B(s) = C(A1) with s = v. Let Hα : α1 = · · · = αv. Under Hα, E(Y) = μIn + A1Ivα1 = (μ + α1)In = A∗θ∗, where A∗ = In and θ∗ = μ + α1. Thus, under Hα, E(Y) belongs to the vector space B0(s − q) spanned by the vector In. Hence s − q = 1 and q = v − 1. So Hα is a linear hypothesis of rank v − 1. With y as an observation on Y, a least squares estimate of θ, as given in Example 1.3.7, is θ̂ = (μ̂ α̂′)′ = (0 (N⁻¹y∗.)′)′, where N = diag(n1, . . . , nv) and y∗. = (y1. . . . yv.)′. Hence θ̂′A′y = y∗.′N⁻¹y∗. = Σ_{i=1}^{v} yi.²/ni, where A′y = (y.. y∗.′)′ and y.. = Iv′y∗.. The normal equation for the reduced model under Hα is A∗′A∗θ̂∗ = A∗′y, which simplifies to nθ̂∗ = y... We get θ̂∗ = y../n and θ̂∗′A∗′y = y..²/n. We use Theorem 2.3.3 to test Hα. The numerator of the test statistic, as given in (2.3.1), is (1/(v − 1)) { Σ_{i=1}^{v} yi.²/ni − y..²/n }. From Example 1.3.7, we have SSE = Σ_{i=1}^{v} Σ_{j=1}^{ni} yij² − Σ_{i=1}^{v} yi.²/ni and the denominator of the test statistic is MSE = SSE/(n − v). Therefore, Hα : α1 = · · · = αv is rejected at a chosen level of significance ω if

[ (1/(v − 1)) { Σ_{i=1}^{v} yi.²/ni − y..²/n } ] / MSE > F0(ω; v − 1, n − v).   (2.3.3)
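The critical region (2.3.3) is the familiar one-way analysis of variance F-test. The small R sketch below is not part of the text and uses made-up yields and group sizes purely to show the computation; the same F value is produced by anova(lm(yield ~ treat)).

# One-way classification: test of H_alpha via (2.3.3) (illustrative sketch)
yield <- c(12.1, 11.8, 13.0, 10.5, 10.9, 11.2, 12.7, 13.3, 12.9, 13.1)   # made-up data
treat <- factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3))                         # v = 3 treatments
n  <- length(yield); v <- nlevels(treat)
Ti <- tapply(yield, treat, sum)                  # treatment totals y_i.
ni <- tapply(yield, treat, length)
G  <- sum(yield)                                 # grand total y..
SSTr <- sum(Ti^2 / ni) - G^2 / n                 # numerator sum of squares in (2.3.3)
SSE  <- sum(yield^2) - sum(Ti^2 / ni)
Fobs <- (SSTr / (v - 1)) / (SSE / (n - v))
c(Fobs, qf(0.95, v - 1, n - v))                  # reject H_alpha if Fobs exceeds the quantile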
Remark 2.3.8 As mentioned in Remark 1.2.14, the above model is associated with a Completely Randomized Design (CRD). The integer n = n 1 + · · · + n v denotes the number of plots used and v denotes the number of treatments. The parameters α1 , . . . , αv , denote the effects of the treatments. A CRD is recommended for an experiment if the plots available are homogeneous, that is, the expected yields from all these plots, without applying treatments, are the same. The parameter μ may be interpreted as this common expected yield. The hypothesis Hα is that all the v treatments have the same effect. The numerator of the test statistic in (2.3.3) is called the treatment mean squares. Remark 2.3.9 Let H ∗ : α1 = · · · = αv = 0. Under H ∗ , we have E(Y ) = μ In and hence belongs to the same vector space to which E(Y ) belongs under Hα . However, H ∗ is equivalent to H : α1 = 0. Observe that the superfluous condition consists of a nonestimable lpf equal to zero. The next theorem is the counterpart of Theorem 2.3.3 in that the linear hypothesis is stated as q independent elpf’s equal to zero. Theorem 2.3.10 Let (Y, Aθ, σ 2 I ) be a Gauss–Markov model with rank s and Y1 , . . . , Yn be independent normal random variables. Let H : θ = 0, where θ is a vector of q independent elpf’s for some q, 0 < q ≤ s. Given an observation y on Y, the likelihood ratio test of H rejects it at a chosen level of significance ω if
2.3
Gauss–Markov Models and Linear Hypotheses 1 ˆ ( θ) q
− −1 ˆ (A A) ( θ) MSE
29
> F0 (ω; q, n − s),
(2.3.4)
where θˆ is a least squares estimate of θ, (A A)− is a g-inverse of A A and 1 (y y − θˆ A y). MSE = n−s Proof We write Aθ = ξ. Since θ is estimable, by Corollary 1.2.2, there exists a matrix W p×q such that A AW = . Since q = Rank() = Rank(A AW ) ≤ Rank(AW ) ≤ Rank(W ) ≤ q, it follows that q = Rank(AW ) = Rank(W ). Now H : θ = 0 ⇐⇒ W A ξ = 0. We choose an orthonormal basis c1 , . . . , cq for C(AW ) and extend it to an orthonormal basis c1 , . . . , cq , cq+1 , . . . , cs for C(A). This is possible since C(AW ) ⊂ C(A). Finally, we extend this to an orthonormal basis {c1 , . . . , cs , cs+1 , . . . , cn } for Rn so that C = (c1 . . . cn ) is an orthogonal matrix. Now we transform Y to Z by Z = CY. Then E(Z ) = Cξ = η, say. Arguing as in Sect. 2.1 with B(s) = C(A) (see Exercise 2.1), we conclude that ηs+1 = · · · = ηn = 0. Let C1 = (c1 . . . cq ). Since the columns of AW also constitute a basis for C(AW ), there exists a nonsingular matrix Dq×q such that C1 = AW D. Hence H : W A ξ = 0 ⇐⇒ D W A ξ = 0 ⇐⇒ C1 ξ = 0 ⇐⇒ η1 = · · · = ηq = 0. Note that the setup is the same as that in (2.1.2) and the critical region of the likelihood ratio test of H is given by (2.2.2). 1 n z2. With z = C y, the denominator of the test statistic in (2.2.2) is n−s i=s+1 i Using (2.2.4) and (2.3.2), we get this as MSE = SSE = 1 y y − θˆ A y . Using n−s
n−s
A Aθˆ = A y, the numerator of the test statistic is q 1 2 1 C y C1 y zi = q i=1 q 1
1 y AW D D W A y q 1 ˆ W A Aθ D D W A Aθˆ = q 1 ˆ θ D D θˆ . = q =
(2.3.5)
30
2 Linear Hypotheses and their Tests
Since C1 C1 = Iq = D W A AW D, we get (D D )−1 = W A AW = W = (A A)− . Hence −1 ˆ − q θˆ (A A) θ
1 2 z = . q i=1 i q Therefore, from (2.2.3), the critical region in (2.2.2) is the same as the one in (2.3.4). Remark 2.3.11 The condition that θ is a vector of q independent elpf’s in the hypothesis H in Theorem 2.3.10 is not a restriction. To see this, suppose that θ is a vector of q elpf’s of which q1 (< q) are independent. Writing θ = ((1 θ) (2 θ) ) , we assume, without loss of generality, that 1 θ is a vector of q1 independent elpf’s. Then there exists a matrix B(q−q1 )×q1 such that 2 θ = B1 θ. Thus θ = 0 iff 1 θ = 0. To illustrate the application of Theorem 2.3.10 vis-a-vis Theorem 2.3.1, we consider the Gauss–Markov model in Example 1.2.12 once again. Example 2.3.12 We consider the model in Example 1.2.12 and assume that Y has 5-variate normal distribution. In Remark 2.3.6 (i), we have shown that the hypothesis H in Example 2.3.5 is equivalent to 2θ1 − 4θ2 − θ3 + 3θ4 = 0. Writing θ = 2θ1 − 4θ2 − θ3 + 3θ4 , we observe that = (2 − 4 − 1 3) . We will use Theorem 2.3.10 to test the hypothesis θ = 0, given an observation y on Y. The denominator of the test statistic in (2.3.4) is computed in Example 2.3.5 and is equal to 1 1 2 2 yy− 15x1 − 6x1 x2 + 7x2 , MSE = 3 96 where x1 = y1 + y2 + y4 + 2y5 and x2 = −y1 + y2 + 2y3 + 3y4 . To get the numerator of the test statistic, note that q = 1 and that a g-inverse (A A)− of A A is available in Example 1.3.6. Using this, we get (A A)− = 55 . Using the least squares estimate of θ again from Example 1.3.6, we get 24 −17x2 . Substituting these, we get the numerator of the test statistic as θˆ = 21x148 (21x1 −17x2 )2 and the test statistic obviously as the one obtained in Example 2.3.5. 96×55 As another illustration, we consider the one-way classification model of Example 1.2.13. Example 2.3.13 Let the model, assumptions, and the notations be as in Example 2.3.7. Let 1 α be a vector of v − 1 independent contrasts in α and H : 1 α = 0. It is shown in Example 1.2.13 that the contrasts in α are estimable and hence H is a linear hypothesis. An easy way to obtain the test of H is to show that H : 1 α = 0 ⇐⇒ Hα : α1 = · · · = αv , and claim that the test of Hα obtained in Example 2.3.7 is the
2.3
Gauss–Markov Models and Linear Hypotheses
31
test of H as well. Now α1 = · · · = αv =⇒ 1 α = 0 is trivial since α = a I for some scalar a, and, by definition, 1 I = 0. Also, the maximum number of independent contrasts is v − 1, and α1 = · · · = αv is equivalent to v − 1 orthogonal contrasts α1 + · · · + αi−1 − (i − 1)αi , i = 2, . . . , v, equal to 0. Hence the components of 1 α are linear combinations of these orthogonal contrasts. Therefore, 1 α = 0 implies that α1 = · · · = αv . We will obtain the likelihood ratio test of H by applying Theorem 2.3.10. The denominator of the test statistic in (2.3.4) is MSE and is available in Example 2.3.7. So we will get the 1v×(v−1) ) . Then θ = numerator. Note that q = v − 1. Define = (01×(v−1) 1 α where θ = (μ α ) . A least squares estimate of θ is θˆ = (0 (N −1 y∗. ) ) , where N = diag(n 1 , .. . , n v ) and y∗. = (y1. . . . yv. ) and a g-inverse of A A is 0 01×v . Both these quantities are available in Example 1.3.7. The (A A)− = 0v×1 N −1 numerator of the test statistic is −1 −1 1 αˆ 1 N 1 1 αˆ v−1 −1 −1 y∗. N −1 1 1 N −1 1 1 N y∗. y By∗. = = ∗. , v−1 v−1 −1 −1 where we write B = N −1 1 1 N −1 1 1 N . Note that 1 (B − N −1 ) = 0. Since 1 I = 0 and Rank(1 ) = v − 1, we get B − N −1 = Iv Iv diag(a1 , . . . , av ) for some scalars a1 , . . . , av . Since Iv 1 = 0, we have I N B = 0. Hence 0 = I N B = I Iv + N I I diag(a1 , . . . , av ) = I + n I diag(a1 , . . . , av ). So ai = − n1 , i = 1, . . . , v, and B = N −1 − statistic is 1 v−1
y∗. N −1 y∗.
1 − y..2 n
1 I I . n v v
The numerator of the test
v
y2 1 y..2 i. = − , v − 1 i=1 n i n
which is the same as in (2.3.3). A general version of the hypothesis H in Theorem 2.3.10 is H † : θ = b, where b is given. Though H † is not a linear hypothesis, we can still get a test of this hypothesis using Theorem 2.3.10 as we show below. Note that θ = b is consistent for any b since Rank() = q. Theorem 2.3.14 Let (Y, Aθ, σ 2 I ) be a Gauss–Markov model with rank s and Y1 , . . . , Yn be independent normal random variables. Let θ be a vector of q independent elpf’s for some q, 0 < q ≤ s. Given an observation y on Y, the likelihood ratio test of H † : θ = b rejects it at a chosen level of significance ω if
32
2 Linear Hypotheses and their Tests 1 ( θˆ q
−1 ( θˆ − b) − b) (A A)− MSE
> F0 (ω; q, n − s),
(2.3.6)
where θˆ is a least squares estimate of θ, (A A)− is a g-inverse of A A, and 1 (y y − θˆ A y). MSE = n−s Proof Since the equation θ = b is consistent, we can find a p-component vector u such that u = b. Let w = Au, θ∗ = θ − u, Y ∗ = Y − w, and y ∗ = y − w. Then E(Y ∗ ) = Aθ∗, V (Y ∗ ) = σ 2 I, and θ = θ∗ + b. So the hypothesis H † takes the form H †† : θ∗ = 0 in the Gauss–Markov model (Y ∗, Aθ∗ , σ 2 I ). By Theorem 2.3.10, the likelihood ratio test rejects H †† at a chosen level of significance ω if 1 q
θˆ∗
−1 ∗ θˆ (A A)− MSE
> F0 (ω; q, n − s),
(2.3.7)
where θˆ∗ is a solution of the normal equation A Aθˆ∗ = A y ∗ . It is easy to see that θˆ is a solution of A Aθˆ = A y iff θˆ∗ = θˆ − u is a solution of A Aθˆ∗ = A y ∗ (see Exercise 2.1). Substituting θˆ∗ = θˆ − u in (2.3.7), we get the critical region in (2.3.6) since u = b.
2.4 Confidence Intervals and Confidence Ellipsoids Let Y, Aθ, σ 2 I be a Gauss–Markov model with rank s and Y have multivariate normal distribution. For a chosen ω, 0 < ω < 1, we will obtain the 100(1 − ω)%-confidence interval for an elpf a θ and the 100(1 − ω)%-confidence ellipsoid for a vector θ of q independent elpf’s, where 2 ≤ q ≤ s. We will also obtain simultaneous confidence intervals for all elpf’s a θ at level 1 − ω. Other methods including nonparametric methods may give narrower confidence intervals in some instances and the interested reader is urged to explore these further in the literature. In preparation, we prove the following theorem. Theorem 2.4.1 Let (Y, Aθ, σ 2 I ) be a Gauss–Markov model with rank s and Y have multivariate normal distribution. If θ is a vector of q independent elpf’s ˆ ) is independent of SSE(Y ), where A Aθˆ = for some q, 0 < q ≤ s, then θ(Y A Y and SSE(Y ) = Y Y − θˆ A Y. Proof We transform Y to Z by Z = CY where C is the orthogonal matrix employed in Theorem 2.3.10. If C1 = (c1 . . . cq ), then, as in the proof of Theorem 2.3.10, we have C1 = AW D for some nonsingular matrix Dq×q , where ˆ ). Hence θ(Y ˆ )= A AW = . So C1 Y = D W A Y = D W A Aθˆ = D θ(Y −1 −1 (D ) C Y = (D ) Z (1) , where we write C Y = Z (1) = (Z 1 . . . Z q ) . Thus θˆ
2.4
Confidence Intervals and Confidence Ellipsoids
33
is a linear function of Z 1 , . . . , Z q . It is shown in the proof of Theorem 2.3.10 that
n Z i2 . The claim follows since Z 1 , . . . , Z n are independent. SSE(Y ) = i=s+1 ˆ ) has q-variate normal Remark 2.4.2 Under the assumption of normality, θ(Y distribution with mean vector θ and dispersion matrix (A A)− σ 2 . By a well-known property of multivariate normal distribution, −1 1 ˆ − ˆ ) − θ θ (A A) θ(Y ) − θ(Y σ2 has Chi-square distribution with q degrees of freedom, where (A A)− is a ginverse of A A. In view of this and Theorem 2.4.1, 1 q
ˆ ) − θ θ(Y
−1 ˆ ) − θ θ(Y (A A)− (2.4.1)
SSE(Y ) n−s
has F-distribution with q and n − s degrees of freedom since SSE(Y ) has Chisquare distribution with n − s degrees of freedom. Let a θ be an elpf. By Remark 2.4.2,
ˆ )−a θ √a θ(Y a (A A)− aσ
bution and using Theorem 2.4.1, we claim that √
has standard normal distriˆ )−a θ a θ(Y
a (A A)− a MSE(Y )
has Student’s
t-distribution with n − s degrees of freedom. Then P
ω ˆ ) − aθ | | a θ(Y ;n − s ≤ t0 =1−ω √ − √ 2 a (A A) a MSE(Y )
(2.4.2)
and hence ω ω √ √ ; n − s , a θˆ + a (A A)− a MSE t0 ;n − s a θˆ − a (A A)− a MSE t0 2 2
(2.4.3)
is a confidence interval for a θ with confidence level 1 − ω, where t0 ω2 ; n − s is the 1 − ω2 -quantile of the Student’s t-distribution with n − s degrees of freedom. Now let θ be a vector of q independent elpf’s, where 2 ≤ q ≤ s. The critical region for testing the hypothesis θ = b at level of significance ω is given by (2.3.6) in Theorem 2.3.14. Substituting b = θ in (2.3.6) and replacing y by Y, we observe that the left side of the inequality (2.3.6) is the same as (2.4.1). Therefore the probability that the random variable in (2.4.1) is less than or equal to F0 (ω; q, n − s) is 1 − ω. Thus a 100(1 − ω)%-confidence ellipsoid for θ is 1 q
−1 θˆ − θ (A A)− θˆ − θ MSE
≤ F0 (ω; q, n − s).
(2.4.4)
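For a single elpf a′θ, the interval (2.4.3) is straightforward to compute once a least squares estimate and a g-inverse of A′A are available. The R sketch below is not from the text: ci.elpf is a hypothetical helper, A, y and a are placeholders to be supplied by the user, and a must correspond to an estimable a′θ for the interval to be meaningful.

# Confidence interval (2.4.3) for an estimable a'theta (illustrative sketch)
library(MASS)                                   # for ginv()
ci.elpf <- function(A, y, a, omega = 0.05) {
  n <- length(y); s <- qr(A)$rank
  AAg   <- ginv(t(A) %*% A)                     # a g-inverse of A'A
  theta <- AAg %*% t(A) %*% y                   # a least squares estimate
  MSE   <- as.numeric((sum(y * y) - t(theta) %*% t(A) %*% y) / (n - s))
  est   <- as.numeric(t(a) %*% theta)           # blue of a'theta
  half  <- qt(1 - omega / 2, n - s) * sqrt(as.numeric(t(a) %*% AAg %*% a) * MSE)
  c(lower = est - half, upper = est + half)
}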
34
2 Linear Hypotheses and their Tests
The intervals in (2.4.3) for different elpf’s satisfy (2.4.2). However, there is no guarantee that they satisfy the condition ˆ ) − aθ | | a θ(Y ≤c =1−ω P sup √ − √ a (A A) a MSE(Y ) a∈C(A )
(2.4.5)
with c = t0 (1 − ω2 ; n − s). The next lemma gives the interval that satisfies (2.4.5) with an appropriate c and such an interval is called a simultaneous confidence interval. Lemma 2.4.3 Let (Y, Aθ, σ 2 I ) be a Gauss–Markov model with rank s and Y have multivariate normal distribution. Let y be an observation on Y. Then the simultaneous confidence interval for all elpf’s a θ at level 1 − ω is given by ˆ a θ− s F0 (ω; s, n−s)a (A A)− aMSE, a θˆ + s F0 (ω; s, n−s)a (A A)− aMSE , (2.4.6) where (A A)− is a g-inverse of A A, θˆ is a least squares estimate of θ, and θˆ A y MSE = y y− . n−s Proof We transform Y to Z by Z = CY where C = (C1 : C2 ) = (c1 . . . cs cs+1 . . . cn ) is an orthogonal matrix such that the columns of C1 = (c1 . . . cs ) generate C(A). Then Z ∼ N (C Aθ, σ 2 I ). Let θ be a vector of s independent elpf’s. By Corollary 1.2.2, there exists a matrix W p×s with rank s such that A AW = . Note that C(A) = C(AW ) = C(C1 ) since Rank(AW ) = s, as in the proof of Theorem 2.3.10 with q = s. Hence there exists a nonsingular matrix Ds×s such that C1 = AW D. We have 0) since C2 C1 = 0 and C(A) = E(Z ) = C Aθ = ((C1 Aθ) (C2 Aθ) ) = (η(1) C(C1 ). Further, η(1) = C1 Aθ = D W A Aθ = D θ = 0 θ, where 0 = D. Necessarily, 0 θ is a vector of s independent elpf’s. An lpf a θ is estimable iff a = 0 u for some s-component vector u (see Exercise 1.7). Hence a θ = u η(1) for some u. Thus we get all elpf’s as u ranges over Rs and we will find the simultaneous confidence interval for u η(1) . Write Z = ((C1 Y ) (C2 Y ) ) = ((Z (1) ) (Z (2) ) ) so that E(Z (1) ) = η(1) and u (Z (1) −η(1) ) √ has hence E(u Z (1) ) = u η(1) and V (u Z (1) ) = u uσ 2 . Therefore, σ u u u (Z (1) −η(1) ) has Stu√ √ u u MSE
n 1 2 degrees of freedom, where MSE= n−s i=s+1 Z i 2 (u (Z (1) −η(1) ))
standard normal distribution and in view of Theorem 2.4.1, dent’s t-distribution with n − s
as in the proof of Theorem 2.4.1. Equivalently, with 1 and n − s degrees of freedom. Let w = Cauchy–Schwarz inequality, (w w ∗ )2 =
u u MSE √u and u u
has F-distribution w = Z (1) − η(1) . By ∗
u (Z (1) −η(1) )(Z (1) −η(1) ) u ≤ w ww ∗ w ∗ = (Z (1) − η(1) ) (Z (1) − η(1) ), uu
2.4
Confidence Intervals and Confidence Ellipsoids
35
for all u ∈ Rs . Note that the bound above is attained for u = Z (1) − η(1) . Hence 2 u (Z (1) − η(1) ) (Z (1) − η(1) ) (Z (1) − η(1) ) = . sup u uMSE MSE u∈Rs
(2.4.7)
Now, we rewrite (2.4.7) in terms of Y and θ. The right side of (2.4.7) is equal to (Y −Aθ) C1 C1 (Y −Aθ) , where C1 C1 = AW D D W A since C1 = AW D. Also, Is = MSE(Y )
C1 C1 = D W A AW D and hence (D D )−1 = W A AW = (A A)− . Therefore, the right side of (2.4.7) is (Y − Aθ) AW ( (A A)− )−1 W A (Y − Aθ) MSE(Y ) ˆ ˆ ) − θ θ(Y ) − θ ( (A A)− )−1 θ(Y . = MSE(Y )
(2.4.8)
To get the left side of (2.4.7), note that u η(1) = a θ and u Z (1) = u C1 Y = ˆ ) = u D θˆ − u θ(Y ˆ ) = a θ(Y ˆ ). Here, we have u D W A Y = u D W A Aθ(Y 0 ˆ −1 used the normal equation A Aθ(Y ) = A Y. Since (D D ) = (A A)− , we get u u = u D (D D )−1 Du = u D (A A)− Du = u 0 (A A)− 0 u = a (A A)− a. Hence the left side of (2.4.7) is ˆ ) − a θ)2 (a θ(Y . − a∈C(A ) a (A A) aMSE
(2.4.9)
sup
Using (2.4.8) and (2.4.9), we write (2.4.7) as ˆ ) − a θ)2 1 (a θ(Y sup − = s a∈C(A ) a (A A) aMSE(Y ) ˆ ) − θ ( (A A)− )−1 θ(Y ˆ ) − θ θ(Y sMSE(Y )
.
(2.4.10)
As the right side in (2.4.10) is the same as (2.4.1) with q = s, by Remark 2.4.2, it has F-distribution with s and n − s degrees of freedom. Hence
ˆ ) − a θ)2 (a θ(Y P sup − ≤ s F0 (ω; s, n − s) = 1 − ω, a∈C(A ) a (A A) aMSE(Y ) that is, √ ˆ ) − a θ |≤ s F0 (ω; s, n − s)a (A A)− aMSE(Y )∀a ∈ C(A ) = 1 − ω. P | a θ(Y This provides the simultaneous confidence interval for all elpf’s a θ.
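The simultaneous interval (2.4.6) differs from the individual interval (2.4.3) only in that the t-quantile is replaced by the multiplier √(s F0(ω; s, n − s)). The following R sketch is not from the text; A, y and a are placeholders, and a must correspond to an estimable a′θ.

# Simultaneous (Scheffe-type) interval (2.4.6) for an estimable a'theta (illustrative sketch)
library(MASS)
sci.elpf <- function(A, y, a, omega = 0.05) {
  n <- length(y); s <- qr(A)$rank
  AAg   <- ginv(t(A) %*% A)
  theta <- AAg %*% t(A) %*% y
  MSE   <- as.numeric((sum(y * y) - t(theta) %*% t(A) %*% y) / (n - s))
  est   <- as.numeric(t(a) %*% theta)
  half  <- sqrt(s * qf(1 - omega, s, n - s) * as.numeric(t(a) %*% AAg %*% a) * MSE)
  c(lower = est - half, upper = est + half)
}
# Applied over any collection of estimable a's, all such intervals hold jointly with level 1 - omega.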
36
2 Linear Hypotheses and their Tests
2.5 Comments Let Y, Aθ, σ 2 I be a Gauss–Markov model with rank s and Y have multivari ate normal distribution. Let = a θ : a θ is estimable } denote the space of all elpf’s in the model. Note that is closed under scalar multiplication and addition. Any set of s independent elpf’s will generate . Let 1 be a subspace of generated by a set of q (< s) independent elpf’s. Let 1 θ denote a vector of q independent elpf’s which generate 1 . Lemma 2.4.3 gives simultaneous confidence interval for all a θ belonging to . Suppose one wants simultaneous confidence interval only for those a θ belonging to the subspace 1 . Replacing θ by 1 θ and modifying the proof of Lemma 2.4.3, one can obtain the simultaneous confidence interval for all a θ ∈ 1 (see Exercise 2.1). It turns out that the constant s F0 (ω; s, n − s) in the confidence interval will now be q F0 (ω; q, n − s).
2.6 Exercises Exercise 2.1 Provide the missing steps in the proof of Theorem 2.2.1, Example 2.2.2, the proofs of Theorem 2.3.10 and Theorem 2.3.14 and in Sect. 2.5. Exercise 2.2 Let (Y, Aθ, σ 2 I ) be a Gauss–Markov model with rank s and Y have multivariate normal distribution. Show that the numerator of the test statistic ˆ for testing H : θ = 0 is θ As y , where θ is a vector of s independent elpf’s, θˆ is a least squares estimate of θ, and y is an observation on Y. Exercise 2.3 Consider the one-way classification model of Example 2.3.7. (i) Obtain the likelihood ratio tests for testing the following hypotheses: (a) H1 : α1 = · · · = αu , 2 ≤ u ≤ v. (b) H2 : α1 = · · · = αu 1 and αu 2 = · · · = αv , 2 ≤ u 1 < u 2 ≤ v − 1. (ii) Verify if the hypothesis H is a linear hypothesis given that H : α1 + · · · + αu − uαu+1 = 0, u = 1, . . . , v − 1. Derive the test procedure for testing H if it is a linear hypothesis and compare the test statistic and the test with those for testing the hypothesis Hα . (iii) Find a vector 1 α of v − 1 independent contrasts in α such that the numer ˆ 2 1 v−1 (λi α) ator of the test statistic for testing H : 1 α = 0 is v−1 , where i=1 ci 1 = (λ1 . . . λv−1 ). Find the constants c1 , . . . , cv−1 . (iv) Obtain the test procedure for testing H : a1 α = 0, where a1 α is a treatment contrast. Exercise 2.4 Y1 , . . . , Y5 are independent normal random variables with the same variance and expectations given by
2.6
Exercises
37
E(Y1 ) = θ1 + θ3 + θ4 , E(Y2 ) = θ1 + θ2 + 2θ3 , E(Y3 ) = θ1 − θ2 + 2θ4 , E(Y4 ) = θ2 + θ3 − θ4 , E(Y5 ) = 2θ1 + θ2 + 3θ3 + θ4 . (i) Obtain the likelihood ratio test of the hypothesis H : b1 (θ1 + θ3 + θ4 ) + b2 (θ2 + θ3 − θ4 ) = 0, where b1 and b2 are real numbers. (ii) Obtain simultaneous confidence interval for all elpf’s at 95% level. Exercise 2.5 Y1 , . . . , Y6 are independent normal random variables with a common variance and expectations given by E(Y1 ) = θ1 − θ3 + θ4 + θ5 , E(Y2 ) = θ1 − θ2 + θ3 − θ4 + θ5 , E(Y3 ) = θ2 − θ3 + θ4 − θ5 , E(Y4 ) = θ2 − 2θ3 + 2θ4 , E(Y5 ) = 2θ1 − θ2 + 2θ5 , E(Y6 ) = 2θ1 − θ3 + θ4 + θ5 . Obtain the likelihood ratio test of the hypothesis H : θ1 + θ3 + θ4 − θ5 = 0, θ2 + θ3 + θ4 − 3θ5 = 0. Exercise 2.6 Write R-codes for simultaneous confidence intervals.
2.7 R-Codes on Linear Estimation and Linear Hypotheses and their Tests

Example 2.7.1 This illustration is for topics discussed in Chaps. 1 and 2. We consider Examples 1.2.12, 1.3.6, 1.4.6, 2.3.5, and 2.3.12, with y = (−1.1, 1.3, 2.6, 1.9, 1.2)′. Codes are given to find the least squares estimates of θ, check the estimability of an lpf a′θ, find the variance of the blue of a′θ, find the covariance and the correlation between the blue's of a1′θ and a2′θ, and to test the hypotheses H0 : θ1 = . . . = θp and H0 : λ′θ = 0.
> > > > > +
rm(list=ls()); n > > > > >
[,1] [,2] [,3] [,4] 1 -1 0 1 1 1 1 0 0 2 1 -1 1 3 2 -1 2 0 1 1
A1 theta a1 aug Ra1 Ra1 [1] 2 > + + + + + + + + + + + + + +
if(s!=Ra1){ print("The lpf associated with a1 is not estimable"); }else{ print("The lpf associated with a1 is estimable"); M >
t1 F0 (ω; q1 , n − v − b + t),
(3.1.23)
where MSE is as given in (3.1.16) and βˆ is as in (3.1.8). Proof We use once again Theorem 2.3.10. Let = 0q 1 ×1 0q 1 ×v 2 . Then − θ = 2 β and H˜ : θ = 0. Substituting this , the g-inverse A A given in (3.1.8) and q1 in place of q in (2.3.4), we get the critical region in (3.1.23). Recall that s = n − v − b + t. Recall that we obtained the reduced normal equation (3.1.7) in α by substituting β in the normal equation (3.1.6) in θ. By eliminating α we will get the reduced normal equation in β. To get this, consider the normal equation (3.1.6) once again. Add μˆ = 0 as before. From the second equation, we get αˆ = R −1 T − N βˆ . Substituting this in the third equation, we get D βˆ = B − N R −1 T = P, say,
(3.1.24)
where D = K − N R −1 N . The equation (3.1.24) is the reduced normal equation in β. If D − is a g-inverse of D, then a solution of the equation (3.1.24) is βˆ = D − P. Substituting this in α, ˆ we get αˆ = R −1 T − N D − P . Thus, we get the following least squares estimate of θ: ⎛ ⎞ μˆ θˆ = ⎝αˆ ⎠ βˆ ⎛ = ⎝R
−1
⎞ 0 T − N D− P ⎠ D− P
(3.1.25)
= ( 0 0 0 ; 0 R⁻¹ + R⁻¹N D⁻N′R⁻¹ −R⁻¹N D⁻ ; 0 −D⁻N′R⁻¹ D⁻ ) (y.. T′ B′)′ = (A′A)⁺A′y.

We thus get another g-inverse (A′A)⁺ of A′A, in terms of D⁻. The vector P = B − N′R⁻¹T in (3.1.24) is called the vector of adjusted block totals. It is not difficult to show (see Exercise 3.1) that E(P(Y)) = Dβ
and
V (P(Y )) = Dσ 2 .
Note that Ib P = Ib D βˆ = 0. Remark 3.1.21 Using the least squares estimate θˆ in (3.1.26), we can write the critical region in (3.1.23) in a form similar to that in (3.1.22). Let = + given in 0q 1 ×1 0q 1 ×v 2 . Substituting this and the g-inverse A A (3.1.26) in the critical region (2.3.4), we get the critical region 1 q1
−1 2 βˆ 2 D − 2 2 βˆ MSE
> F0 (ω; q1 , n − v − b + t),
(3.1.26)
since q = q1 and s = n − v − b + t. The following lemma gives some relationships between quantities involving the matrices C and D.
Lemma 3.1.22 With Q(Y ) = T (Y ) − N K −1 B(Y ), we have (a) C R −1 N = N K −1 D, (b) Cov (Q(Y ), B(Y )) = 0, (c) Cov (P(Y ), T (Y )) = 0, (d) Cov (Q(Y ), P(Y )) = −σ 2 C R −1 N = −σ 2 N K −1 D. Proof (a)By definition, C R −1 N = R − N K −1 N R −1 N =N − N K −1 N R −1 N = N K −1 K − N R −1 N = N K −1 D. We write T (Y ) = A1 Y, B(Y ) = A2 Y, Q(Y ) = A1 Y − N K −1 A2 Y = F1 Y, and P(Y ) = A2 Y − N R −1 A1 Y = F2 Y where F1 = A1 − A2 K −1 N and F2 = A2 − A1 R −1 N . (b) We have Cov (Q(Y ), B(Y )) = E F1 (Y − Aθ) (Y − Aθ) A2 = σ 2 F1 A2 = σ 2 A1 A2 − N K −1 A2 A2 = 0 in view of (3.1.3). (c) We have Cov (P(Y ), T (Y )) = E F2 (Y − Aθ) (Y − Aθ) A1 = σ 2 F2 A1 = σ 2 A2 A1 − N R −1 A1 A1 =0
in view of (3.1.3).
(d) We have Cov (Q(Y ), P(Y )) = E F1 (Y − Aθ) (Y − Aθ) F2 = σ 2 F1 F2 = σ 2 A1 A2 − N K −1 A2 A2 − A1 A1 R −1 N + N K −1 A2 A1 R −1 N = −σ 2 N − N K −1 N R −1 N = −σ 2 R − N K −1 N R −1 N = −σ 2 C R −1 N = −σ 2 N K −1 D, using (a).
Example 3.1.23 Given below is the design of an experiment with five treatments labeled 1, 2, 3, 4, 5 and four blocks labeled 1, 2, 3, 4, the treatments being listed plot-wise within each block, along with the incidence matrix N.

Block 1: 2, 4;   Block 2: 1, 3, 5, 1;   Block 3: 4, 2, 2;   Block 4: 3, 5, 1.

N =
( 0 2 0 1 )
( 1 0 2 0 )
( 0 1 0 1 )
( 1 0 1 0 )
( 0 1 0 1 ).
We have R = diag(3, 3, 2, 2, 2) and K = diag(2, 4, 3, 3). The information matrix is ⎛ ⎞ 5/3 0 −5/6 0 −5/6 ⎜ 0 7/6 0 −7/6 0 ⎟ ⎜ ⎟ −1 ⎜ −5/6 0 17/12 0 −7/12⎟ C = R − NK N = ⎜ ⎟, ⎝ 0 −7/6 0 7/6 0 ⎠ −5/6 0 −7/12 0 17/12 Rank(C) = 3 and the By Theorem 3.1.3, a θ = a0 μ + 5design is disconnected. 5 a1i αi + 4j=1 a2 j β j is estimable iff (i) a0 = i=1 a1i = a1 α + a2 β = a0 μ + i=1 4 −1 a2 = 3. Let j=1 a2 j and (ii) Rank C : a1 − N K ⎞ ⎛ a11 − a222 + a324 ⎜a12 − a21 + 2a23 ⎟ ⎜ 3 ⎟ a222 −1 a24 ⎟ λ = a1 − N K a2 = ⎜ ⎜ a13 − a4 + a3 ⎟. ⎝ a14 − 21 + 23 ⎠ 3 2 a15 − a422 + a324 ⎛
Then Rank (C : λ) =
=
=
=
⎞ 5/3 0 −5/6 0 −5/6 λ1 ⎜ 0 7/6 0 −7/6 0 λ2 ⎟ ⎜ ⎟ ⎜ Rank ⎜−5/6 0 17/12 0 −7/12 λ3 ⎟ ⎟ ⎝ 0 −7/6 0 7/6 0 λ4 ⎠ −5/6 0 −7/12 0 17/12 λ5 ⎛ ⎞ 5/3 0 −5/6 0 −5/6 λ1 ⎜ 0 7/6 0 −7/6 0 λ2 ⎟ ⎜ ⎟ ⎜ 0 2 0 −2 2λ3 + λ1 ⎟ Rank ⎜ 0 ⎟ ⎝ 0 −7/6 0 7/6 0 λ4 ⎠ 0 0 −2 0 2 2λ5 + λ1 ⎛ ⎞ 5/3 0 −5/6 0 −5/6 λ1 ⎜ 0 7/6 0 −7/6 0 ⎟ λ2 ⎜ ⎟ ⎜ ⎟ 2 0 −2 2λ3 + λ1 Rank ⎜ 0 0 ⎟ ⎝ 0 0 ⎠ 0 0 0 λ2 + λ4 0 0 0 0 0 2λ1 + 2λ3 + 2λ5 Rank(C) = 3,
iff λ2 + λ4 = 0 and λ1 + λ3 + λ5 = 0; that is, iff a12 + a14 = a21 + a23 and a11 + a13 + a15 = a22 + a24 . Therefore, a θ is estimable iff (i) a0 = a11 + a12 + a13 + a14 + a15 , (ii) a12 + a14 = a21 + a23 , and (iii) a11 + a13 + a15 = a22 + a24 . In particular, a1 α is estimable iff a12 + a14 = a11 + a13 + a15 = 0 and a2 β is estimable iff a21 + a23 = a22 + a24 = 0. Thus α2 − α4 and any contrast in α1 , α3 , and α5 are estimable. The only block contrasts that are estimable are
β1 − β3 , β2 − β4 , and their linear combinations. Let the vector of observations be denoted by y = (y11 y12 y21 y22 y23 y24 y31 y32 y33 y41 y42 y43 ) . The vector of treatment totals is T = (T1 T2 T3 T4 T5 ) , where T1 = y21 + y24 + y43 , T2 = y11 + y32 + y33 , T3 = y22 + y41 , T4 = y12 + y31 , and T5 = y23 + y42 . The vector of block totals is B = (y1. y2. y3. y4. ). We will now obtain a least squares estimate of θ = μ α β , where α = (α1 α2 α3 α4 α5 ) and β = (β1 β2 β3 β4 ) . We need to solve the reduced normal equation in α, C αˆ = Q, where ⎛
y4. ⎞ 3 ⎟ + 2y33. ⎟ ⎟ + y34. ⎟ ⎟ + y33. ⎠ + y34.
T1 − y22. +
⎜ y ⎜T2 − 21. ⎜ y2. −1 Q = T − N K B = ⎜ T3 − ⎜ 4 ⎝ T4 − y1. 2 T5 − y42.
⎛
⎞ Q1 ⎜ Q2⎟ ⎜ ⎟ ⎟ =⎜ ⎜ Q 3 ⎟. ⎝ Q4⎠ Q5
Written explicitly, the reduced normal equation in α is 5 5 5 αˆ1 − αˆ3 − αˆ5 = Q 1 , 3 6 6 7 7 αˆ2 − αˆ4 = Q 2 , 6 6 5 17 7 − αˆ1 + αˆ3 − αˆ5 = Q 3 , 6 12 12 7 7 − αˆ2 + αˆ4 = Q 4 , 6 6 5 7 17 − αˆ1 − αˆ3 + αˆ5 = Q 5 . 6 12 12 Notice that Q 2 + Q 4 = 0 = Q 1 + Q 3 + Q 5 , and the corresponding sums of the left sides are also zeroes. Therefore, we can delete any one of second and fourth equation and any one of first, third, and fifth equations. We shall delete the fourth and the fifth equation and get a solution for αˆ from the first three equations. We have to add two independent equations to get a solution. We shall add the equations αˆ 3 = 0 and αˆ 4 = 0. The equations that we have to solve: 5 5 αˆ 1 − αˆ 5 = Q 1 , 3 6 7 αˆ 2 = Q 2 , 6 5 7 − αˆ 1 − αˆ 5 = Q 3 . 6 12 The solutions: αˆ 1 =
7 1 Q1 − Q3, 20 2
αˆ 2 =
6 Q2, 7
1 αˆ5 = − Q 1 − Q 3 . 2
⎞ ⎛ 7 Q 1 − 21 Q 3 20 6 ⎟ ⎜ 0 ⎜ Q 2 ⎟ ⎜ ⎜ 7 ⎟=⎜ 0 αˆ = ⎜ 0 ⎟ ⎜ ⎜ ⎠ ⎝ 0 ⎝ 0 1 − 2 Q1 − Q3 − 21 ⎛7
Hence
20
0 − 21 6 0 7 0 0 0 0 0 −1
0 0 0 0 0
⎞ 0 0⎟ ⎟ − 0⎟ ⎟ Q = C Q. ⎠ 0 0
A least squares estimate of β as given in (3.1.8) is βˆ = K −1 B − N C − Q . Since 6 7
α2 − α4
y11 −y12 2
estimable, its best estimate is αˆ 2 − αˆ 4 = 67 Q 2 = − 2y31 −y332 −y33 . From (3.1.12), the variance of the best estimator of is
α2 − α4 is equal to 67 σ 2 . In this model, n = 12, v = 5, b = 4, and t = 2. From (3.1.16), we have MSE k j 2 −1 K B 17 2 = y y−αˆ Q−B , where y y = 4j=1 l=1 y jl , αˆ Q = 20 Q 1 + 67 Q 22 + Q 23 + 5 y2 y2 y2 y2 Q 1 Q 3 since Q 1 + Q 3 + Q 5 = 0, and B K −1 B = 4j=1 k j.j = 21. + 42. + 33. + y4.2 . 3
The critical region for testing Hα : α1 = α2 = α3 = α4 = α5 is given by (3.1.18), that is, αˆ Q > F0 (ω; 3, 5). (3.1.27) 3 MSE Since Hα is not an estimable linear hypothesis, the critical region (3.1.27) is, in fact, for testing H : Cα = 0, that is, H : 2α1 = α3 + α5 , 12α3 = 7α4 + 5α5 . Example 3.1.24 (Randomized Block Design model) A random assignment of v treatments to vr plots, grouped into r blocks of v plots each, is called a Randomized Block Design, abbreviated as RBD, if every treatment appears once in each block. Since every treatment appears once in every block, we will call that plot in every block as the ith plot to which treatment i is assigned. Let y ji denote the yield from the ith plot in the jth block and Y ji denote the random variable on which y ji is an observation, j = 1, . . . , r, i = 1, . . . , v. Then the model (3.1.2) takes the simple form E Y ji = μ + αi + β j
(3.1.28)
since f jl(i) = 1 iff l = i. Note that n = vr, b = r, k1 = · · · = kb = v, r1 = · · · = rv = r , and n i j = 1 for all i and j. Therefore the incidence matrix is N = I I Iv Ir , R = r Iv , K = v Ir , and C = r Iv − vr Iv Iv = r v . Here, v = Iv − vv v . By definition, v is symmetric, idempotent, singular, and has rank v − 1 since Rank(v ) = trace(v ) (see Exercise 3.1). The RBD is connected since Rank(v )
= v − 1. Since each block gives a replication on each treatment, a block is also called a replicate. Writing (3.1.28) as E (Y ) = μ I + A1 α + A2 β = Aθ, we note that
⎛ ⎞ Iv ⎜ Iv ⎟ ⎜ ⎟ ⎜.⎟ ⎟ A1 = ⎜ ⎜.⎟ ⎜ ⎟ ⎝.⎠ Iv
and
⎛ Iv ⎜0 ⎜ ⎜. A2 = ⎜ ⎜. ⎜ ⎝. 0
0 Iv . . . 0
... ... ... ... ... ...
(3.1.29) ⎞ 0 0⎟ ⎟ .⎟ ⎟. .⎟ ⎟ .⎠ Iv
By Theorem 3.1.9, an lpf a θ = a0 μ + a1 α + a2 β is estimable iff a0 = Iv a1 = Ir a2 . In particular, an lpf a1 α (a2 β) of treatment (block) effects alone is estimable iff it is a contrast. To get a least squares estimate of θ, we solve the reduced normal equation (3.1.7). In this case, we write y.∗ = T = (y.1 . . . y.v ) , y∗. = B = (y1. . . . yr. ) , and Iv Ir y∗. Iv Iv y.∗ = y.∗ − = v y.∗ , Q = T − N K −1 B = y.∗ − v v v y ji , and y.. = rj=1 since I y = Ir y∗. = y.. , where y.i = rj=1 y ji , y j. = i=1 v v .∗ i=1 y ji . Equation (3.1.7) takes the form r αˆ −
r Iv Iv αˆ = v y.∗ . v
Since Rank(C) = v − 1, we need to add only one equation to get a unique solution. Adding the equation Iv αˆ = 0, we get αˆ =
v y.∗ = C − Q, r
where C− =
v . r
Substituting this in (a) of (3.1.6), we get y∗. . βˆ = v Thus
(3.1.30)
⎞⎛ ⎞ 0 0 0 y.. − θˆ = ⎝ vry.∗ ⎠ = ⎝0 r v 0 ⎠ ⎝ y.∗ ⎠ = A A A y. y∗. y∗. 0 0 Ivr v ⎛
0
⎞
⎛
(3.1.31)
The blue of an elpf a θ = a0 μ + a1 α + a2 β is a θˆ =
a y∗. a0 y.. a1 y.∗ + 2 − . r v vr
(3.1.32)
Substituting (3.1.30) in (3.1.11), we get a a a2 a2 a02 1 1 ˆ V a θ(Y ) = + − σ2 . r v vr
(3.1.33)
In particular, from (3.1.32), the blue of a1 α is a1 αˆ =
a1 y.∗ . r
(3.1.34)
From (3.1.33), its variance is a a1 V a1 α(Y ˆ ) = 1 σ2 . r
(3.1.35)
Again from (3.1.32), the blue of a2 β is a y∗. a2 βˆ = 2 , v
(3.1.36)
ˆ ) = a2 a2 σ 2 . V a2 β(Y v
(3.1.37)
and from (3.1.33), its variance is
Let a θ = a0 μ + a1 α + a2 β and a ∗ θ = a0∗ μ + a1∗ α + a2∗ β be two elpf’s. Upon substituting for C − from (3.1.30) in (3.1.14), we get ∗ ∗ ∗ ˆ ), a ∗ θ(Y ˆ ) = a1 a1 + a2 a2 − a0 a0 σ 2 . Cov a θ(Y r v vr
(3.1.38)
In particular, we get the following from (3.1.38): a a∗ ˆ ), a1∗ α(Y ˆ ) = 1 1 σ2 ; Cov a1 α(Y r a a∗ 2 2 2 ˆ ∗ ˆ Cov a2 β(Y ), a2 β(Y ) = σ ; v
(3.1.39) (3.1.40)
ˆ ) = 0. and Cov a1 α(Y ˆ ), a2∗ β(Y
(3.1.41)
Since RBD is connected, the hypotheses Hα and Hβ are estimable. We shall now obtain the critical regions for testing of Hα and Hβ from (3.1.18) and (3.1.20). From (3.1.16), the MSE is y y y y y y − .∗ rv .∗ − ∗.v ∗. y y − αˆ Q − B K −1 B = = MSE =
n−v−b+t
y vr y −
(v − 1)(r − 1)
y y.∗ y y v .∗ − ∗. vr ∗. r
(v − 1)(r − 1)
,
where y vr y =
v r j=1 i=1
y.∗ v y.∗ r
=
y 2ji −
y..2 = SST, vr
v y.i2 y2 − .. = SSTr, r vr i=1
r y 2j. y∗. r y∗. y2 and = − .. = SSB, v v vr j=1
SST being the total SS. Thus SSE = SST − SSTr − SSB. From the critical region (3.1.18), the numerator of the test statistic for testing Hα is y v y.∗ SSTr αˆ Q = .∗ = = MSTr, v−t r (v − 1) v−1 where MSTr stands for mean squares (MS) for treatments. Hence the level-ω critical region for testing Hα is MSTr > F0 (ω; v − 1, (v − 1)(r − 1)) . MSE
(3.1.42)
From the critical region (3.1.20), the numerator of the test statistic for testing Hβ is αˆ Q + B K −1 B − T R −1 T 1 y.∗ y.∗ y2 y y∗. y y.∗ = − .. + ∗. − .∗ b−t r −1 r vr v r SSB = r −1 = MSB,
where MSB denotes the MS for blocks. Hence the level-ω critical region for testing Hβ is MSB > F0 (ω; r − 1, (v − 1)(r − 1)). (3.1.43) MSE The analysis of variance table associated with an RBD is given below.
3.1.7 Anova Table for RBD
Sources of variation   Degrees of freedom    SS     MS     F-Ratio
Blocks                 r − 1                 SSB    MSB    MSB/MSE
Treatments             v − 1                 SSTr   MSTr   MSTr/MSE
Error                  (r − 1)(v − 1)        SSE    MSE    –
Total                  vr − 1                SST    –      –
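The table can be reproduced in R with a two-way additive model. The sketch below is not from the text; the yield values are invented purely to show the layout for v = 4 treatments in r = 3 blocks.

# RBD analysis of variance (illustrative sketch, made-up data)
v <- 4; r <- 3
yield <- c(20.1, 22.3, 19.8, 21.0,      # block 1
           18.9, 21.7, 19.2, 20.4,      # block 2
           19.5, 22.0, 18.8, 20.9)      # block 3
treat <- factor(rep(1:v, times = r))
block <- factor(rep(1:r, each  = v))
anova(lm(yield ~ block + treat))        # Blocks, Treatments and Error lines as in the table
# MSTr/MSE and MSB/MSE are compared with F0(omega; v-1, (v-1)(r-1)) and
# F0(omega; r-1, (v-1)(r-1)), e.g. qf(0.95, v - 1, (v - 1) * (r - 1))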
3.1.8 Some Criteria for Classification of Block Designs Consider a block design with incidence matrix N . A block is said to be an incomplete block if it does not contain all the treatments; otherwise, it is said to be a complete block. A block design is called an incomplete block design if there exists at least one incomplete block; otherwise, it is called a complete block design. A block design is said to be variance balanced if the variance of the blue of every normalized estimable treatment contrast is the same. Here, normalized contrast means that the norm of the vector associated with the contrast is equal to 1. It follows from the definition that the variance of the blue of every elementary estimable treatment contrast is the same in a variance balanced block design. The following theorem gives a criterion for a block design to be variance balanced. Theorem 3.1.25 A block design is variance balanced iff γC is idempotent for some γ > 0, where C is the information matrix of the block design. Proof Suppose that γC is idempotent for some γ > 0. Let a1 α be a normalized estimable treatment contrast. Since (γC)2 = γC, we have C − = γ Iv . Since a1 α is estimable, by (i) of Corollary 3.1.4, there exists a vector d such − . Hence d = C a = γa . Now V a α(Y ˆ ) = V d C α(Y ˆ ) = that Cd = a 1 1 1 1 V d Q(Y ) = d Cdσ 2 = d a1 σ 2 = γa1 a1 σ 2 = γσ 2 since C αˆ = Q and V (Q(Y )) = Cσ 2 . The sufficiency of the condition follows.
Conversely, suppose that the blue of every normalized treatment contrast has the same variance δσ 2 . Since Cα is estimable by (i) of Corollary 3.1.4, d Cα is a normalis an estimable treatment contrast for every vector d and √dd Cα C2d
ˆ ) ) ized estimable treatment contrast. Its blue is d√Cd α(Y = √d dQ(Y and has variance C2d C2d ) 2 2 V √d Q(Y = dd CCd Hence d C − δC 2 d = 0 for every d. This 2 d σ = δσ . 2 dC d
implies that C = δC 2 and hence δC is idempotent.
The following theorem shows that the C-matrix of a connected and variance balanced block design cannot be arbitrary. Theorem 3.1.26 A connected block design is variance balanced iff C = δv for some δ > 0. Proof Suppose that C = δv for some δ > 0. Then Cδ = v is idempotent and the block design is variance balanced by Theorem 3.1.25. Conversely, suppose that the block design is variance balanced and connected. Again by Theorem 3.1.25, C = γC 2 for some γ > 0. Therefore, C (Iv − γC) = 0. Since Rank(C) = v − 1 and CIv = 0, the solution space is spanned by Iv . Therefore, for some diagonal matrix C ∗ , we must have Iv − γC = Iv Iv C ∗ . Premultiplying both sides by Iv , we get Iv = vIv C ∗ and hence C ∗ = v1 Iv as I I Iv C = 0. Substituting this, we get Iv − γC = vv v and hence C = γ1 v . ˆ )) = 0 for any ˆ ), a2 β(Y A block design is said to be orthogonal if Cov(a1 α(Y estimable treatment contrast a1 α and any estimable block contrast a2 β. The following theorem gives a criterion for a block design to be orthogonal. Theorem 3.1.27 A block design is orthogonal iff R −1 is a g-inverse of C or equivalently, R −1 C (or C R −1 ) is idempotent. Proof Suppose that the block design is orthogonal. Since Cα and Dβ are estimable, a1 Cα and a2 Dβ are estimable treatment and block contrasts for every a1 ∈ Rv and a2 ∈ Rb . Then, for any a1 ∈ Rv and a2 ∈ Rb , using C αˆ = Q by ˆ = Cov a Q, a P = ˆ a2 D β) (3.1.7) and D βˆ = P by (3.1.24), 0 = Cov(a1 C α, 1 2 of Lemma 3.1.22. Hence C R −1 N = a1 Cov (Q, P) a2 = −a1 C R −1 N a2 σ 2 by (d) 0. Now C R −1 C = C R −1 R − N K −1 N = C − C R −1 N K −1 N = C and R −1 is a g-inverse of C or R −1 C (or C R −1 ) is idempotent. Suppose now that R −1 is a g-inverse of C. Then C = C R −1 C = R − N K −1 N R −1 C = C − N K −1 N R −1 C. This implies that N K −1 N R −1 C = 0 which, in turn, implies that −1 −1 −1 N R C = 0. CR N K Hence C R −1 N = 0. Now, let a1 α and a2 β be any estimable treatment and block contrasts, respectively, so that Cd1 = a1 and Dd2 = a2 for some d1 and d2 . Then
ˆ ˆ = Cov(d C α, Cov(a1 α, ˆ a2 β) = d1 Cov(Q, P)d2 = 1 ˆ d2 D β) = Cov d1 Q, d2 P −1 2 −d1 C R N d2 σ = 0 using (d) of Lemma 3.1.22, and the block design is orthogonal. A block design with incidence matrix N is said to be equireplicate if R = r Iv for some integer r, proper if K = k Ib for some integer k, and binary if the elements of N are either 0 or 1. Note that an equireplicate block design with r replicates for each treatment is orthogonal iff Cr is idempotent. The following theorem gives a criterion for a connected block design to be orthogonal. Theorem 3.1.28 A connected block design is orthogonal iff N = N is the incidence matrix of the block design.
RIv Ib K n
, where
Proof It is enough to show that when the block design is connected, then the criterion for orthogonality in Theorem 3.1.27 is equivalent to the one in the present theorem. Suppose that the block design is connected and orthogonal so that R −1 C is idempotent by Theorem 3.1.27. As in the proof of Theorem 3.1.27, R −1 C idempotent implies that C R −1 N = 0. Since Rank(C) = v − 1, arguing as in the proof of Theorem 3.1.26 (see Exercise 3.1), we assert that there exists a matrix C ∗ = diag (g1 , . . . , gb ) such that R −1 N = Iv Ib C ∗ . Comparing the (i, j)th entry n of the left side with that of the right side, we get riij = g j for all i and j. From this, we get k j = ng j or g j = j = 1, . . . , b. So N = Now let N =
RIv Ib K n
RIv Ib K n
kj n
and hence n i j =
ri k j n
, i = 1, . . . , v and
.
. Then, since Ib K Ib = n,
R −1 C = Iv − R −1 N K −1 N = Iv − R −1
RIv Ib K −1 K Ib Iv R Iv Iv R K = Iv − . n n n
It is easy to check (see Exercise 3.1) that Iv − is idempotent.
Iv Iv R n
is idempotent and hence R −1 C
Note that in a connected and orthogonal block design, no entry of N can be zero. Example 3.1.29 An RBD with v treatments and r blocks has the incidence matrix N = Iv Ir , R = r Iv , K = v Ir , and C = r v . Thus it is a binary, equireplicate, proper, connected, complete block design. Further, it is variance balanced and orthogonal (see Exercise 3.1). Example 3.1.30 Consider the block design whose incidence matrix is
⎛
1 ⎜1 ⎜ N =⎜ ⎜0 ⎝0 0
⎞ 0 0⎟ ⎟ 1⎟ ⎟. 1⎠ 1
20 2 0 . Hence C = and Rank(C) = 3. The 03 0 3 block design is binary, equireplicate, but neither connected nor proper. Further, it is an incomplete block design. However, by Theorem 3.1.25, it is variance balanced since C 2 = C. It is also orthogonal by Theorem 3.1.27 since R −1 C = C = C 2 . Then R = I5 and K =
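Criteria like those in Theorems 3.1.25 and 3.1.27 are easy to check numerically from the incidence matrix. The following R sketch, not part of the text, computes C and checks idempotency for the design of Example 3.1.30; any incidence matrix N can be substituted.

# Properties of a block design from its incidence matrix (illustrative sketch)
N <- rbind(c(1, 0), c(1, 0), c(0, 1), c(0, 1), c(0, 1))   # Example 3.1.30
R <- diag(rowSums(N)); K <- diag(colSums(N))
C <- R - N %*% solve(K) %*% t(N)                          # information matrix
D <- K - t(N) %*% solve(R) %*% N
idem <- function(M) all(abs(M %*% M - M) < 1e-10)
qr(C)$rank                     # equals v - 1 iff the design is connected (here it is 3, not 4)
idem(C)                        # TRUE here: gamma*C idempotent with gamma = 1, so variance balanced
idem(solve(R) %*% C)           # TRUE here: the design is orthogonal (Theorem 3.1.27)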
3.2 Balanced Incomplete Block Design

A random assignment of v treatments to plots in b blocks of k plots each (k < v) is called a Balanced Incomplete Block Design, abbreviated as BIBD, if (i) every treatment appears only once in r blocks and (ii) any two treatments appear together in λ blocks. The integers v, b, r, k, and λ are called the parameters of the BIBD. Let N denote the incidence matrix of this block design. Note that nij = 0 or 1 for all i and j and that the BIBD is binary. Further, R = r Iv and K = k Ib so that a BIBD is equireplicate and proper. By definition, we have

Σ_{j=1}^{b} nij ni′j = r if i = i′, and λ if i ≠ i′;   (3.2.1)
and hence the incidence matrix of a BIBD satisfies the relation N N = (r − λ)Iv + λ Iv Iv .
(3.2.2)
Note that the relationship (3.2.2) can be used to check whether a given block design is a BIBD or not. The following theorem shows that the parameters of a BIBD are related. The relations are only necessary conditions for a BIBD but not sufficient. Theorem 3.2.1 The parameters v, b, r, k, and λ of a BIBD satisfy the following relationships: (i) vr = bk; (ii) r (k − 1) = λ(v − 1); (iii) b ≥ v ⇔ (iv) b ≥ v + r − k. Proof (i) Note that Iv (N Ib ) = Iv RIv = vr = Iv N Ib = Ib K Ib = bk.
(ii) Since Iv′N N′Iv = Iv′{(r − λ)Iv + λ Iv Iv′}Iv = (r − λ)v + λv², and also Iv′N N′Iv = (Ib′K)(K Ib) = bk² = vrk, we get r + λ(v − 1) = rk and the claim follows.
(iii) Using the properties of determinants (adding all the columns to the first column and then subtracting the first row from each of the remaining rows), we have

| N N′ | = {r + λ(v − 1)} (r − λ)^{v−1}.

Since k < v, λ = r(k − 1)/(v − 1) < r, so that | N N′ | ≠ 0 and Rank(N N′) = v = Rank(N) ≤ b.
(iv) From (i), we have b/v = r/k = (b − r)/(v − k), and hence (iii) and (iv) are equivalent.

The inequality (iii) is known as Fisher's inequality and is equivalent to r ≥ k in view of (i). The condition k < v and the relationship (ii) together imply that r > λ. A BIBD is said to be symmetrical if b = v. Given a symmetrical BIBD, there exist two BIBD's called the derived BIBD and the residual BIBD, and these are out of the scope of the discussion here. We give below two characterizations of a symmetrical BIBD.

Theorem 3.2.2 A BIBD is symmetrical iff b = v + r − k.

Proof Since the necessity part follows trivially from (i) of Theorem 3.2.1, we give the proof of the sufficiency part. Now b = v + r − k =⇒ b = v + bk/v − k =⇒ b(v − k)/v = v − k =⇒ b = v since v − k > 0.

Theorem 3.2.3 A BIBD is symmetrical iff N N′ = N′N.
Proof The sufficiency is trivial since N N' is a square matrix of order v and N' N is a square matrix of order b. Now let the BIBD be symmetrical. Then, from (3.2.2),

N N' N = (r − λ) N + λ 1_v 1_v' N = (r − λ) N + λ N 1_v 1_v' = N {(r − λ) I_v + λ 1_v 1_v'},

since 1_v 1_v' N = k 1_v 1_v' = r 1_v 1_v' = N 1_v 1_v'. Since | N N' | = | N |^2 ≠ 0, N is nonsingular. Premultiplying the above by N^{-1}, we get N' N = (r − λ) I_v + λ 1_v 1_v' = N N'.

Note that in a symmetrical BIBD, any two blocks have λ treatments in common.

Theorem 3.2.4 In a symmetrical BIBD with an even number of treatments, r − λ must be a perfect square.

Proof Since N is a square matrix, we have, as in the proof of (iii) of Theorem 3.2.1,

| N N' | = | N |^2 = {r + λ(v − 1)} (r − λ)^{v−1} = r^2 (r − λ)^{v−1},

using the property that r(k − 1) = λ(v − 1) and r = k. Hence | N | = ± r (r − λ)^{(v−1)/2}. Since | N | is an integer, (r − λ)^{(v−1)/2} must be an integer; when v is even, v − 1 is odd, and this is possible iff r − λ is a perfect square.

Using (3.2.2) and Theorem 3.2.1, the information matrix of the BIBD is

C = R − N K^{-1} N' = r I_v − (1/k) {(r − λ) I_v + λ 1_v 1_v'} = (λv/k) (I_v − (1/v) 1_v 1_v').   (3.2.3)

Therefore Rank(C) = v − 1 and the BIBD is connected. Since (k/λv) C is idempotent, by Theorem 3.1.25, the BIBD is variance balanced. However, since R^{-1} C = (λv/rk) (I_v − (1/v) 1_v 1_v') is not idempotent, by Theorem 3.1.27, the BIBD is not orthogonal.
3.2.1 Estimability In spite of the nice properties of BIBD, the model (3.1.2) does not simplify as it did in the case of RBD. Since the BIBD is connected, by Theorem 3.1.9, an lpf a θ = a0 μ + a1 α + a2 β is estimable iff a0 = Iv a1 = Ib a2 . By Corollary 3.1.10, an lpf a1 α (a2 β) is estimable iff it is a treatment (block) contrast.
3.2.2 Least Squares Estimates

We will now obtain a least squares estimate of θ. Recall that μ̂ = 0, β̂ = K^{-1}(B − N' α̂), where α̂ is a solution of C α̂ = Q, the reduced normal equation, and Q = T − N K^{-1} B. From (3.2.3), the reduced normal equation is

(λv/k) (I_v − (1/v) 1_v 1_v') α̂ = Q = T − (1/k) N B.   (3.2.4)

Since Rank(C) = v − 1, to get a solution of (3.2.4), we add the equation 1_v' α̂ = 0. Then

α̂ = (k/λv) Q.   (3.2.5)

From (3.2.5), a g-inverse of the C-matrix of a BIBD is

C^- = (k/λv) I_v.   (3.2.6)

Hence β̂ = (1/k) (B − (k/λv) N' Q) and

θ̂ = ( 0, (k/λv) Q, (1/k) B − (1/λv) N' Q )'
  = ( 0   0'            0'
      0   (k/λv) I_v    −(1/λv) N
      0   −(1/λv) N'    (1/k) I_b + (1/kλv) N' N ) (y.., T', B')' = (A'A)^- A'y.   (3.2.7)
3.2.3 Best Estimates

Let a'θ = a_0 μ + a_1' α + a_2' β be estimable. Then the blue of a'θ given in (3.1.10) simplifies to

a'θ̂ = (k/λv) (a_1 − (1/k) N a_2)' Q + (1/k) a_2' B,   (3.2.8)

with variance given in (3.1.11) simplifying to

V(a'θ̂(Y)) = { (k/λv) (a_1 − (1/k) N a_2)'(a_1 − (1/k) N a_2) + (1/k) a_2' a_2 } σ^2.   (3.2.9)

In particular, the blue of a treatment contrast a_1' α is

a_1' α̂ = (k/λv) a_1' Q,   (3.2.10)
with variance

V(a_1' α̂(Y)) = (k/λv) a_1' a_1 σ^2.   (3.2.11)

The blue of a block contrast a_2' β is

a_2' β̂ = (1/k) a_2' B − (1/λv) a_2' N' Q,   (3.2.12)

with variance

V(a_2' β̂(Y)) = { (1/k) a_2' a_2 + (1/kλv) a_2' N' N a_2 } σ^2.   (3.2.13)

From (3.2.11), it follows that the variances of the blue's of normalized treatment contrasts are all equal to (k/λv) σ^2, and this proves once again that a BIBD is variance balanced. However, the variances of the blue's of normalized block contrasts are not all equal unless the BIBD is symmetrical, in which case (3.2.13) simplifies to

V(a_2' β̂(Y)) = (k/λv) a_2' a_2 σ^2 = (k/λv) σ^2 if a_2' a_2 = 1.   (3.2.14)
3.2.4 Tests of Hypotheses

The MSE given in (3.1.16) simplifies to

MSE = SSE/(vr − v − b + 1) = { y'y − (k/λv) Q'Q − (1/k) B'B } / (vr − v − b + 1).   (3.2.15)

The level-ω critical region for testing Hα given in (3.1.18) takes the form

{ k Q'Q / (λv(v − 1)) } / MSE > F_0(ω; v − 1, vr − v − b + 1),   (3.2.16)

and the level-ω critical region for testing Hβ given in (3.1.20) takes the form

{ (k/λv) Q'Q + (1/k) B'B − (1/r) T'T } / { (b − 1) MSE } > F_0(ω; b − 1, vr − v − b + 1).   (3.2.17)

The twin Anova tables, Tables 3.1 and 3.2, can be used to present the computations by taking t = 1, n = vr, SSTr(adj) = (k/λv) Q'Q, and SSB(adj) = (k/λv) Q'Q + (1/k) B'B − (1/r) T'T.
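The intra-block computations (3.2.5) and (3.2.15)–(3.2.17) are easily programmed. The following R sketch is ours and not part of the book; the function name bibd_intrablock and its argument layout are assumptions, with the yields expected in a vector y and companion factors treat and block.

# A minimal sketch, assuming a BIBD with parameters v, b, r, k, lambda.
bibd_intrablock <- function(y, treat, block, v, b, r, k, lambda) {
  N   <- table(treat, block)                  # incidence matrix (factor levels assumed aligned)
  Tt  <- tapply(y, treat, sum)                # treatment totals T
  B   <- tapply(y, block, sum)                # block totals B
  Q   <- Tt - as.vector(N %*% B) / k          # Q = T - N K^{-1} B
  alpha.hat <- k * Q / (lambda * v)           # least squares estimate (3.2.5)
  SSTr.adj  <- k * sum(Q^2) / (lambda * v)
  SSB.adj   <- SSTr.adj + sum(B^2) / k - sum(Tt^2) / r
  SSE <- sum(y^2) - SSTr.adj - sum(B^2) / k   # as in (3.2.15)
  dfe <- v * r - v - b + 1
  MSE <- SSE / dfe
  F.alpha <- (SSTr.adj / (v - 1)) / MSE       # statistic of (3.2.16)
  F.beta  <- (SSB.adj / (b - 1)) / MSE        # statistic of (3.2.17)
  list(alpha.hat = alpha.hat, MSE = MSE,
       F.alpha = F.alpha, p.alpha = pf(F.alpha, v - 1, dfe, lower.tail = FALSE),
       F.beta  = F.beta,  p.beta  = pf(F.beta,  b - 1, dfe, lower.tail = FALSE))
}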
Let Λ_1' α be a vector of q (≤ v − 1) independent treatment contrasts and let Ĥ: Λ_1' α = 0. By Theorem 3.1.19, the level-ω critical region for testing Ĥ is

(k/λv) Q' Λ_1 (Λ_1' Λ_1)^{-1} Λ_1' Q / (q MSE) > F_0(ω; q, vr − v − b + 1).   (3.2.18)

Let Λ_2' β be a vector of q_1 (≤ b − 1) independent block contrasts and let H̃: Λ_2' β = 0. By Theorem 3.1.20, the level-ω critical region for testing H̃ is

k (Λ_2' β̂)' { Λ_2' Λ_2 + (1/λv) Λ_2' N' N Λ_2 }^{-1} (Λ_2' β̂) / (q_1 MSE) > F_0(ω; q_1, vr − v − b + 1),   (3.2.19)

where, from (3.2.7), β̂ = (1/k) B − (1/λv) N' Q.
Example 3.2.5 The following is the plan of an experiment with 7 treatments labeled 1 to 7 and 7 blocks (columns) labeled 1 to 7:

Block 1: (3, 5, 6); Block 2: (4, 6, 7); Block 3: (5, 7, 1); Block 4: (6, 1, 2); Block 5: (2, 3, 7); Block 6: (1, 3, 4); Block 7: (2, 4, 5).

We will examine whether this is a BIBD using the relationship (3.2.2). The incidence matrix of the block design is

N = ( 0 0 1 1 0 1 0
      0 0 0 1 1 0 1
      1 0 0 0 1 1 0
      0 1 0 0 0 1 1
      1 0 1 0 0 0 1
      1 1 0 1 0 0 0
      0 1 1 0 1 0 0 ),

and

N N' = 2 I_7 + 1_7 1_7',

a 7 × 7 matrix with diagonal entries 3 and off-diagonal entries 1.
The block design is a symmetrical BIBD with v = b = 7, r = k = 3, and λ = 1. Remark 3.2.6 There are several methods to construct a BIBD and tables are available for parameter values v, b, r, k, λ for which a BIBD exists since a BIBD may not exist for all possible parameter values. The interested reader is urged to look into the literature for more information on the construction of a BIBD.
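The verification in Example 3.2.5 can also be done in R. The sketch below is ours, not the book's; it builds N from the plan and checks the relationship (3.2.2).

# A minimal sketch for Example 3.2.5.
blocks <- list(c(3,5,6), c(4,6,7), c(5,7,1), c(6,1,2), c(2,3,7), c(1,3,4), c(2,4,5))
v   <- 7
N   <- sapply(blocks, function(blk) as.integer(1:v %in% blk))   # incidence matrix
NNt <- N %*% t(N)
all(diag(NNt) == 3)             # r = 3
all(NNt[upper.tri(NNt)] == 1)   # lambda = 1, so N N' = 2 I_7 + 1_7 1_7' and (3.2.2) holds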
3.2.5 Recovery of Inter-Block Information In the block design models discussed in earlier sections, the treatment effects and the block effects were assumed to be deterministic and are called fixed effects. The
analysis of models with fixed effects is called intra-block analysis. Since the treatments are assigned randomly to incomplete blocks in an incomplete block design, the inter-block information is recovered by assuming the block effects to be random variables. Recovery of such information by treating the block effects as random is called recovery of inter-block information and is discussed here. We discuss recovery of inter-block information in a block design and specialize this for a BIBD. We consider the block design model

Y_jl = μ + Σ_{i=1}^v f_jl^(i) α_i + β_j + ε_jl,  l = 1, …, k_j,  j = 1, …, b,

where we assume that β_j is a random variable with E(β_j) = 0 and V(β_j) = σ_b^2, V(ε_jl) = σ^2 > 0, Cov(β_j, ε_j'l') = 0, and Cov(ε_jl, ε_j'l') = 0, j, j' = 1, …, b, l = 1, …, k_j, l' = 1, …, k_j'. Let Y_*. = (Y_1. … Y_b.)' denote the random vector of block totals. We have

E(Y_j.) = E( Σ_{l=1}^{k_j} Y_jl ) = Σ_{l=1}^{k_j} E(Y_jl) = k_j μ + Σ_{l=1}^{k_j} Σ_{i=1}^v f_jl^(i) α_i = k_j μ + Σ_{i=1}^v α_i Σ_{l=1}^{k_j} f_jl^(i) = k_j μ + Σ_{i=1}^v n_ij α_i,  j = 1, …, b,
so that E(Y_*.) = Ã θ̃, where Ã = (K 1_b  N') and θ̃ = (μ, α')'. Also, V(Y_*.) = (σ^2 + σ_b^2) I_b = σ*^2 I_b, say. Thus (Y_*., Ã θ̃, σ*^2 I_b) is a Gauss–Markov model with rank equal to

Rank(Ã) = Rank(Ã'Ã) = Rank ( Σ_{j=1}^b k_j^2   1_b' K N'
                              N K 1_b            N N' ) = Rank(N N') = Rank(N) = v

if the block design is a BIBD, since 1_v' times the second (block) row is equal to the first row (see Exercise 3.1). We observe that this linear model is a less-than-full-rank model, as the number of parameters is v + 1 and the rank of N N' is at most v. We also note that inter-block analysis is considered only when b ≥ v. In the remainder of this section, we assume that the block design is a BIBD with b > v, which is needed for the testing of hypotheses.

Lemma 3.2.7 An lpf a'θ̃ = a_0 μ + a_1' α is estimable iff a_0 = 1_v' a_1.

Proof The lpf a'θ̃ = a_0 μ + a_1' α is estimable iff

v = Rank(Ã'Ã : a) = Rank ( bk^2     kr 1_v'   a_0
                            kr 1_v   N N'      a_1 ),

which is true iff a_0 = 1_v' a_1, since 1_v' (kr 1_v  N N') = (bk^2  kr 1_v'), completing the proof.

Corollary 3.2.8 An lpf a_1' α is estimable iff it is a treatment contrast.

The Corollary follows from the lemma by putting a_0 = 0.
The normal equation is Ã'Ã θ̂̃ = Ã'y_*., which becomes

bk^2 μ̂ + rk 1_v' α̂ = k y..,
kr 1_v μ̂ + N N' α̂ = N y_*..

Adding μ̂ = 0, we get α̂ = (N N')^{-1} N y_*., and

θ̂̃ = ( 0, ((N N')^{-1} N y_*.)' )' = ( 0   0'
                                        0   (N N')^{-1} ) ( k y.., (N y_*.)' )' = (Ã'Ã)^- Ã'y_*..

If a_1'α is a treatment contrast, then it is estimable and its blue is a_1'α̂ = a_1'(N N')^{-1} N y_*..

We wish to test the hypothesis Hα: α_1 = … = α_v. The reduced model under Hα is

E(Y_*.) = k 1_b μ + N' 1_v α_1 = k 1_b (μ + α_1) = k 1_b μ* = A* θ*,  say,

where A* = k 1_b and θ* = μ*. From the normal equation A*'A* θ̂* = A*'y_*., we get bk^2 μ̂* = k y.., so that μ̂* = y../(bk). Also, we have θ̂̃' Ã'y_*. = α̂' N y_*. = y_*.' N' (N N')^{-1} N y_*., MSE = SSE/(b − v), where SSE = y_*.' y_*. − y_*.' N' (N N')^{-1} N y_*., and the numerator of the likelihood ratio test statistic is equal to

(1/(v − 1)) { θ̂̃' Ã'y_*. − θ̂*' A*'y_*. } = (1/(v − 1)) { y_*.' N' (N N')^{-1} N y_*. − y..^2/b } = SSTr/(v − 1) = MSTr.

Therefore, the likelihood ratio test rejects Hα at level ω if MSTr/MSE > F_0(ω; v − 1, b − v).

Remark 3.2.9 Note that an unbiased estimator of the error variance σ*^2 = σ^2 + σ_b^2 in this model is the MSE. An estimate of σ_b^2 is the difference between this MSE and the MSE of the BIBD given in (3.2.15), provided this estimate is positive. With a distributional assumption on the block effects, one can discuss the relevance of the assumption that the block effects are random by testing the hypothesis σ_b^2 = 0, a significance test. The interested reader is referred to the literature on the recovery of inter-block information, random effect models, and variance component estimation for more details. This completes the discussion on the recovery of inter-block information.
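The inter-block computations above can be summarized in a few lines of R. The following sketch is ours and not part of the book; the function name interblock_bibd is an assumption, ystar is the vector of block totals, and N is the v × b incidence matrix of a BIBD with b > v.

# A minimal sketch of the inter-block analysis of Sect. 3.2.5.
interblock_bibd <- function(ystar, N) {
  v   <- nrow(N); b <- ncol(N)
  NNt <- N %*% t(N)
  alpha.hat <- solve(NNt, N %*% ystar)                  # (N N')^{-1} N y*
  fit  <- drop(t(N %*% ystar) %*% alpha.hat)            # y*' N' (N N')^{-1} N y*
  SSE  <- sum(ystar^2) - fit
  SSTr <- fit - sum(ystar)^2 / b
  MSTr <- SSTr / (v - 1); MSE <- SSE / (b - v)
  c(F = MSTr / MSE, p.value = pf(MSTr / MSE, v - 1, b - v, lower.tail = FALSE))
}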
3.3 Partially Balanced Incomplete Block Design

A random assignment of v treatments to plots in b blocks of k plots each (k < v) is said to be a Partially Balanced Incomplete Block Design, abbreviated as PBIBD, if it satisfies the following:
(i) Every treatment appears exactly once in each of r blocks.
(ii) With respect to any treatment i, the remaining treatments can be divided into m (≥ 2) groups such that every treatment of the uth group appears together with treatment i in exactly λ_u blocks and there are n_u treatments in the uth group, u = 1, …, m, i = 1, …, v. The integers n_1, …, n_m; λ_1, …, λ_m remain the same whatever be i. The treatments of the uth group are called u-associates of treatment i.
(iii) If treatments i and i' are u-associates, then the number of treatments common between the x-associates of i and the y-associates of i' is the same for any pair (i, i') of u-associates and is denoted by p^u_xy, x, y = 1, …, m.

Such a block design is called an m-associate class PBIBD. Let P^u_{m×m} = ((p^u_xy)), u = 1, …, m. Then v, b, r, k, λ_1, …, λ_m, n_1, …, n_m, P^1, …, P^m are called the parameters of an m-associate class PBIBD. Let N = ((n_ij)) denote the incidence matrix of a PBIBD. Define m matrices G_{v×v}(u), u = 1, …, m, called association matrices, as follows:

G(u) = ((g_ii'(u))) = (g_1(u) … g_v(u)),   (3.3.1)

where the (i, i')th entry of G(u) is g_ii'(u) = 1 if i and i' are u-associates, and 0 otherwise, i, i' = 1, …, v, and g_j(u) denotes the jth column of G(u), j = 1, …, v. The following lemma is trivial (see Exercise 3.1).

Lemma 3.3.1 (i) G(u) is symmetric. (ii) The diagonal entries of G(u) are equal to 0. (iii) In any row (column) of G(u), exactly n_u entries are equal to 1 and the rest are equal to 0. (iv) 1_v 1_v' = I_v + Σ_{x=1}^m G(x).

Lemma 3.3.2 We have N N' = r I_v + Σ_{x=1}^m λ_x G(x).

Proof From conditions (i) and (ii) in the definition of a PBIBD, the (i, i')th entry of N N' is equal to Σ_{j=1}^b n_ij n_i'j, which is equal to
r if i = i'; λ_1 if i and i' are first associates; λ_2 if i and i' are second associates; …; λ_m if i and i' are mth associates.
The claim follows from this, since λ_u appears in N N' exactly in the same places as 1 appears in G(u), u = 1, …, m.

Lemma 3.3.3 We have
(i) G^2(u) = n_u I_v + Σ_{x=1}^m p^x_uu G(x), u = 1, …, m;
(ii) G(u) G(w) = Σ_{x=1}^m p^x_uw G(x), u, w = 1, …, m, u ≠ w.

Proof (i) The (i, i')th entry of G^2(u) is g_i(u)' g_i'(u) = Σ_{l=1}^v g_il(u) g_i'l(u) = n_u if i = i', since it counts the number of u-associates of treatment i. If i ≠ i', it is equal to the number of treatments which are u-associates of i as well as of i', that is, the number of treatments common between the u-associates of i and the u-associates of i'. Hence, by condition (iii) of the definition of a PBIBD, we have

(G^2(u))_{ii'} = n_u if i = i'; p^1_uu if i and i' are first associates; p^2_uu if i and i' are second associates; …; p^m_uu if i and i' are mth associates.

The claim follows from this.

(ii) The (i, i')th entry of G(u) G(w), u ≠ w, is

(G(u) G(w))_{ii'} = g_i(u)' g_i'(w) = Σ_{l=1}^v g_il(u) g_i'l(w).

Note that g_il(u) g_i'l(w) = 1 iff the treatment l is a u-associate of treatment i and also a w-associate of treatment i'. Hence (G(u) G(w))_{ii'} gives the number of such treatments l. Therefore, for u ≠ w, (G(u) G(w))_{ii'} = 0 if i = i'; and for i ≠ i' it is equal to the number of treatments which are u-associates of i as well as w-associates of i', that is, the number of treatments common between the u-associates of i and the w-associates of i', which in turn equals p^x_uw if i and i' are x-associates, x = 1, …, m. Observe that p^x_uw appears in G(u) G(w) in exactly the same positions where 1 appears in G(x). So G(u) G(w) = Σ_{x=1}^m p^x_uw G(x), u ≠ w.
Remark 3.3.4 Observe that Lemma 3.3.2 holds iff the conditions (i) and (ii) in the definition of PBIBD hold, while Lemma 3.3.3 holds iff condition (iii) in the definition holds. Therefore, these two lemmata prove useful to verify whether a block design is a PBIBD or not.

The following theorem gives the relationships between the parameters; the conditions are only necessary conditions which any PBIBD has to satisfy, not sufficient ones.

Theorem 3.3.5 The parameters of a PBIBD satisfy the following relationships. For u, w, y ∈ {1, …, m},
(i) vr = bk;
(ii) r(k − 1) = Σ_{x=1}^m λ_x n_x;
(iii) Σ_{x=1}^m n_x = v − 1;
(iv) p^y_uw = p^y_wu;
(v) Σ_{x=1}^m p^w_ux = n_u − 1 if u = w, and n_u if u ≠ w;
(vi) n_u p^u_wy = n_w p^w_uy.
Proof (i) The proof is the same as that of (i) in Theorem 3.2.1.

(ii) Using Lemma 3.3.2, we have

1_v' N N' 1_v = v ( r + Σ_{x=1}^m λ_x n_x ) = (1_v' N)(N' 1_v) = k 1_b' N' 1_v = rkv,

and the claim follows. The proofs of (iii) and (iv) follow from the definition.

(v) From (iv) of Lemma 3.3.1, for fixed u ∈ {1, …, m},

G(u) Σ_{x=1}^m G(x) = G(u) (1_v 1_v' − I_v) = n_u 1_v 1_v' − G(u).   (3.3.2)

Again, from (i) and (ii) of Lemma 3.3.3,

G(u) Σ_{x=1}^m G(x) = Σ_{x≠u} G(u) G(x) + G^2(u) = Σ_{x≠u} Σ_{t=1}^m p^t_ux G(t) + n_u I_v + Σ_{t=1}^m p^t_uu G(t) = Σ_{x=1}^m Σ_{t=1}^m p^t_ux G(t) + n_u I_v.   (3.3.3)

Equating (3.3.2) and (3.3.3), we get

n_u (1_v 1_v' − I_v) = Σ_{x=1}^m Σ_{t=1}^m p^t_ux G(t) + G(u).

Using (iv) of Lemma 3.3.1 once again, we get

n_u Σ_{t=1}^m G(t) = Σ_{x=1}^m Σ_{t=1}^m p^t_ux G(t) + G(u),

and hence

Σ_{t≠u} ( n_u − Σ_{x=1}^m p^t_ux ) G(t) + ( n_u − 1 − Σ_{x=1}^m p^u_ux ) G(u) = 0.   (3.3.4)

Defining τ(w) = n_u − Σ_{x=1}^m p^w_ux if w ≠ u, and τ(u) = n_u − 1 − Σ_{x=1}^m p^u_ux, (3.3.4) can be written as Σ_{t=1}^m τ(t) G(t) = 0.
From this we have τ(t) = 0, t = 1, …, m, since the nonzero entries of G(1), …, G(m) occupy disjoint positions, and the claim follows.

(vi) For fixed u, w, y ∈ {1, …, m} and a fixed treatment i, we evaluate the product g_i(w)' G(y) g_i(u). Using the notation in (3.3.1), we write

g_i(w)' G(y) g_i(u) = Σ_{l=1}^v g_i(w)' g_l(y) g_il(u).   (3.3.5)

Define (i, l) = t if the treatments i and l are t-associates, t = 1, …, m; note that (i, i) is not defined. Then (3.3.5) can be written as

g_i(w)' G(y) g_i(u) = Σ_{l=1}^v p^(i,l)_wy g_il(u).

Note that g_il(u) = 1 iff (i, l) = u and Σ_{l=1}^v g_il(u) = n_u. Hence g_i(w)' G(y) g_i(u) = n_u p^u_wy = g_i(u)' G(y) g_i(w) = n_w p^w_uy.

We now give an example of a PBIBD.

Example 3.3.6 The following is the layout of an experiment with 6 treatments labeled 1 to 6 arranged in 6 blocks labeled 1 to 6:

Block 1: (1, 2, 4, 5); Block 2: (2, 3, 5, 6); Block 3: (1, 3, 4, 6); Block 4: (1, 2, 4, 5); Block 5: (2, 3, 5, 6); Block 6: (1, 3, 4, 6).
The incidence matrix N of the design is

N = ( 1 0 1 1 0 1
      1 1 0 1 1 0
      0 1 1 0 1 1
      1 0 1 1 0 1
      1 1 0 1 1 0
      0 1 1 0 1 1 )

and

N N' = ( 4 2 2 4 2 2
         2 4 2 2 4 2
         2 2 4 2 2 4
         4 2 2 4 2 2
         2 4 2 2 4 2
         2 2 4 2 2 4 ) = 4 I_6 + 2 G(1) + 4 G(2),

where

G(1) = ( 0 1 1 0 1 1
         1 0 1 1 0 1
         1 1 0 1 1 0
         0 1 1 0 1 1
         1 0 1 1 0 1
         1 1 0 1 1 0 )   and   G(2) = ( 0 0 0 1 0 0
                                         0 0 0 0 1 0
                                         0 0 0 0 0 1
                                         1 0 0 0 0 0
                                         0 1 0 0 0 0
                                         0 0 1 0 0 0 ).

Here v = 6 = b, r = 4 = k, and the block design satisfies the first two conditions to be satisfied by a PBIBD. It has two associate classes, with m = 2, n_1 = 4, λ_1 = 2, n_2 = 1, and λ_2 = 4. Further,

G^2(1) = 4 I_6 + 2 G(1) + 4 G(2),  G^2(2) = I_6 = 1·I_6 + 0·G(1) + 0·G(2),  and  G(1) G(2) = G(1) = 1·G(1) + 0·G(2),

so the block design also satisfies the third condition to be satisfied by a PBIBD. The block design is a 2-associate class PBIBD. The remaining parameters are identified using Lemma 3.3.3. We have

P^1 = ( 2 1
        1 0 )   and   P^2 = ( 4 0
                              0 0 ).
In this PBIBD, each treatment has four first associates and one second associate. The table below gives the first and second associates of each treatment.
Association scheme of the PBIBD in Example 3.3.6

Treatment i   First associates (λ_1 = 2)   Second associate (λ_2 = 4)
1             2, 3, 5, 6                   4
2             1, 3, 4, 6                   5
3             1, 2, 4, 5                   6
4             2, 3, 5, 6                   1
5             1, 3, 4, 6                   2
6             1, 2, 4, 5                   3
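The association matrices of Example 3.3.6 (asked for again in Exercise 3.8) can be generated and checked in R. The sketch below is ours, not the book's; the associate classes are coded directly from the scheme in the table above.

# A minimal sketch for Example 3.3.6.
blocks <- list(c(1,2,4,5), c(2,3,5,6), c(1,3,4,6), c(1,2,4,5), c(2,3,5,6), c(1,3,4,6))
v   <- 6
N   <- sapply(blocks, function(blk) as.integer(1:v %in% blk))
NNt <- N %*% t(N)
G2  <- outer(1:v, 1:v, function(i, j) as.integer(abs(i - j) == 3))  # second associates 1-4, 2-5, 3-6
G1  <- matrix(1, v, v) - diag(v) - G2                               # first associates: all the rest
all(NNt == 4 * diag(v) + 2 * G1 + 4 * G2)        # Lemma 3.3.2: r = 4, lambda_1 = 2, lambda_2 = 4
all(G1 %*% G1 == 4 * diag(v) + 2 * G1 + 4 * G2)  # Lemma 3.3.3(i): n_1 = 4, p^1_11 = 2, p^2_11 = 4
all(G1 %*% G2 == G1)                             # Lemma 3.3.3(ii): p^1_12 = 1, p^2_12 = 0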
Remark 3.3.7 Using Lemma 3.3.2 (see Exercise 3.1), we can show that the C-matrix of an m-associate class PBIBD with the parameters v, b, r, k, λ_1, …, λ_m, n_1, …, n_m, P^1, …, P^m, is

C = (r(k − 1)/k) I_v − (1/k) Σ_{u=1}^m λ_u G(u).   (3.3.6)
We note that a PBIBD is a binary, equireplicate and proper block design. It is, in general, not possible to find the rank of C in (3.3.6) and hence not possible to say whether a PBIBD is connected or not. In fact, a PBIBD, in general, need not be connected, as the following example of a 2-associate class PBIBD shows.

Example 3.3.8 Consider the following block design with four treatments labeled 1 to 4 arranged in four blocks labeled 1 to 4:

Block 1: (1, 2); Block 2: (1, 2); Block 3: (3, 4); Block 4: (3, 4).

This is a binary, equireplicate and proper block design with v = 4 = b, r = 2 = k. The incidence matrix of the design is

N = ( 1 1 0 0
      1 1 0 0
      0 0 1 1
      0 0 1 1 ),   and   N N' = ( 2 2 0 0
                                  2 2 0 0
                                  0 0 2 2
                                  0 0 2 2 ) = 2 I_4 + 0·G(1) + 2·G(2),

where

G(1) = ( 0 0 1 1
         0 0 1 1
         1 1 0 0
         1 1 0 0 )   and   G(2) = ( 0 1 0 0
                                    1 0 0 0
                                    0 0 0 1
                                    0 0 1 0 ).
Then

G^2(1) = ( 2 2 0 0
           2 2 0 0
           0 0 2 2
           0 0 2 2 ) = 2 I_4 + 0·G(1) + 2·G(2),

G^2(2) = I_4 = 1·I_4 + 0·G(1) + 0·G(2),  and  G(1) G(2) = ( 0 0 1 1
                                                            0 0 1 1
                                                            1 1 0 0
                                                            1 1 0 0 ) = 1·G(1) + 0·G(2).

From Lemmata 3.3.2 and 3.3.3, we identify this as a 2-associate class PBIBD with

λ_1 = 0, λ_2 = 1, n_1 = 2, n_2 = 1,  P^1 = ( 0 1
                                             1 0 )   and   P^2 = ( 2 0
                                                                   0 0 ).

Since R = 2 I_4 and K = 2 I_4, the C-matrix of this PBIBD is

C = 2 ( I_2 − (1/2) 1_2 1_2'          0
        0          I_2 − (1/2) 1_2 1_2' )

and has rank 2. The PBIBD is disconnected. This example has something more to reveal. Note that (1/2) C is idempotent and hence, by Theorem 3.1.25, the design is balanced. Further, by Theorem 3.1.27, it is orthogonal too.

The following theorem gives a criterion for a 2-associate class PBIBD to be connected. We assume, without loss of generality, that λ_1 < λ_2.

Theorem 3.3.9 A two-associate class PBIBD is connected iff λ_1 + p^1_22 > 0.
Proof By (ii) of Theorem 3.1.11, a block design is connected iff all the elementary treatment contrasts are estimable. The C-matrix of a 2-associate class PBIBD is

C = r I_v − (1/k) N N' = (r(k − 1)/k) I_v − (λ_1/k) G(1) − (λ_2/k) G(2).

Denote by S_u(i) the set of all u-associates of treatment i, u = 1, 2; i = 1, …, v. Suppose first that λ_1 > 0. Fix a treatment, say i. Then i appears with every member of S_1(i) and of S_2(i) in at least one block. Therefore, α_i − α_i' is estimable, where i' is any other treatment.

Suppose now that λ_1 = 0 and p^1_22 > 0. Again fix a treatment i. Since every member of S_2(i) appears together with i in λ_2 blocks, α_i − α_i' is estimable for every i' ∈ S_2(i). Since p^1_22 > 0, every member of S_1(i) has at least one second associate which is also a second associate of i, and, in fact, there are exactly p^1_22 of these. Let i' ∈ S_1(i) and i_1 ∈ S_2(i) ∩ S_2(i'). Then α_i − α_{i_1} and α_{i'} − α_{i_1} are estimable and hence α_i − α_i' is estimable for every i' ∈ S_1(i). This establishes sufficiency.
To prove necessity, let λ_1 = p^1_22 = 0. Then N N' = r I_v + λ_2 G(2), and G(1) G(2) = p^1_12 G(1) + p^2_12 G(2) = n_2 G(1), since p^1_21 + p^1_22 = n_2 and n_2 p^2_12 = n_1 p^1_22 = 0. The C-matrix is

C = (r(k − 1)/k) I_v − (λ_1/k) G(1) − (λ_2/k) G(2) = (λ_2 n_2/k) I_v − (λ_2/k) G(2) = −(λ_2/k) {G(2) − n_2 I_v},

using (ii) of Theorem 3.3.5. Therefore, Rank(C) = Rank{G(2) − n_2 I_v}. Since G(1) {G(2) − n_2 I_v} = n_2 G(1) − n_2 G(1) = 0, we have G(1) C = 0. Therefore, Rank(C) + Rank(G(1)) ≤ v. Since Rank(G(1)) ≥ 2, we must have Rank(C) ≤ v − 2. Therefore, the PBIBD cannot be connected.
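The connectedness criterion of Theorem 3.3.9 can be cross-checked against the rank of the C-matrix. The following R sketch is ours, not the book's, and uses the disconnected design of Example 3.3.8.

# A minimal sketch for Example 3.3.8.
N <- cbind(c(1,1,0,0), c(1,1,0,0), c(0,0,1,1), c(0,0,1,1))   # incidence matrix
v <- 4; r <- 2; k <- 2
lambda1 <- 0; p1.22 <- 0                       # parameters identified in Example 3.3.8
C <- r * diag(v) - N %*% t(N) / k
c(theorem.3.3.9 = lambda1 + p1.22 > 0,         # FALSE
  rank.check    = qr(C)$rank == v - 1)         # FALSE: the design is not connected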
3.3.1 Estimability, Least Squares Estimates

Throughout the rest of this section, we will consider only a 2-associate class PBIBD and assume that it is connected. With the model as in (3.1.2), the criteria for estimability of lpf's given in Sect. 3.2 cannot be simplified in the case of a PBIBD, as the C-matrix does not have a simple form like that of a BIBD. However, we can get an explicit solution of the reduced normal equation (3.1.7) and thereby get a least squares estimate of θ. We need the following lemma, which can be obtained by premultiplying both sides of (i) and (ii) of Lemma 3.3.3, with m = 2, by J_i', where J_i is the ith column of I_v.

Lemma 3.3.10 For any treatment i and u, w = 1, 2, we have

g_i(u)' G(w) = n_u J_i' + p^1_uu g_i(1)' + p^2_uu g_i(2)'  if w = u,
g_i(u)' G(w) = p^1_uw g_i(1)' + p^2_uw g_i(2)'             if w ≠ u.
Lemma 3.3.11 A solution of the reduced normal equation C α̂ = Q is

α̂ = (k/Δ) (Δ_4 Q − Δ_2 G(1) Q),   (3.3.7)

where C is as in (3.3.6) with m = 2, Q is as defined in (3.1.7), Δ = Δ_1 Δ_4 − Δ_2 Δ_3, Δ_1 = r(k − 1) + λ_2, Δ_2 = λ_2 − λ_1, Δ_3 = (λ_2 − λ_1) p^2_12, and Δ_4 = r(k − 1) − λ_1 (p^1_11 − p^2_11) − λ_2 (p^1_12 − p^2_12).

Proof The reduced normal equation is

(r(k − 1)/k) α̂ − (λ_1/k) G(1) α̂ − (λ_2/k) G(2) α̂ = Q.   (3.3.8)
Fix a treatment i. Premultiplying both sides by J_i', g_i(1)' and g_i(2)' successively, we get the following three equations:

(r(k − 1)/k) α̂_i − (λ_1/k) g_i(1)' α̂ − (λ_2/k) g_i(2)' α̂ = Q_i,   (3.3.9)
(r(k − 1)/k) g_i(1)' α̂ − (λ_1/k) g_i(1)' G(1) α̂ − (λ_2/k) g_i(1)' G(2) α̂ = g_i(1)' Q,   (3.3.10)
(r(k − 1)/k) g_i(2)' α̂ − (λ_1/k) g_i(2)' G(1) α̂ − (λ_2/k) g_i(2)' G(2) α̂ = g_i(2)' Q.   (3.3.11)

Note that the sum of the left sides and the sum of the right sides of the above three equations are zeroes. Hence we can delete one of the equations, and we delete (3.3.11). By Lemma 3.3.10, we get

g_i(1)' G(1) = n_1 J_i' + p^1_11 g_i(1)' + p^2_11 g_i(2)',
g_i(1)' G(2) = p^1_12 g_i(1)' + p^2_12 g_i(2)'.

Let η̂_i(1) = g_i(1)' α̂, η̂_i(2) = g_i(2)' α̂, and Q_i1 = g_i(1)' Q. Observe that α̂_i + η̂_i(1) + η̂_i(2) = 1_v' α̂, since J_i + g_i(1) + g_i(2) = 1_v. The equations (3.3.9) and (3.3.10) can be written as

r(k − 1) α̂_i − λ_1 η̂_i(1) − λ_2 η̂_i(2) = k Q_i,   (3.3.12)
−λ_1 n_1 α̂_i + d_1 η̂_i(1) − d_2 η̂_i(2) = k Q_i1,   (3.3.13)

where

d_1 = r(k − 1) − λ_1 p^1_11 − λ_2 p^1_12  and  d_2 = λ_1 p^2_11 + λ_2 p^2_12.   (3.3.14)

To get a unique solution, we add the equation

α̂_i + η̂_i(1) + η̂_i(2) = 0.   (3.3.15)

Using (3.3.15), we eliminate η̂_i(2) from (3.3.12) and (3.3.13) and get

(r(k − 1) + λ_2) α̂_i + (λ_2 − λ_1) η̂_i(1) = k Q_i,   (3.3.16)
(d_2 − λ_1 n_1) α̂_i + (d_1 + d_2) η̂_i(1) = k Q_i1.   (3.3.17)

Let

Δ_1 = r(k − 1) + λ_2,  Δ_2 = λ_2 − λ_1,
Δ_3 = d_2 − λ_1 n_1 = λ_1 p^2_11 + λ_2 p^2_12 − λ_1 (p^2_11 + p^2_12) = (λ_2 − λ_1) p^2_12,  and
Δ_4 = d_1 + d_2 = r(k − 1) − λ_1 (p^1_11 − p^2_11) − λ_2 (p^1_12 − p^2_12).   (3.3.18)

Substituting (3.3.18) in (3.3.16) and (3.3.17) and solving for α̂_i, we get

α̂_i = k {Δ_4 Q_i − Δ_2 Q_i1} / (Δ_1 Δ_4 − Δ_2 Δ_3).   (3.3.19)

Since i is arbitrary, we get

α̂ = (k/Δ) {Δ_4 Q − Δ_2 G(1) Q},   (3.3.20)

where Δ = Δ_1 Δ_4 − Δ_2 Δ_3.
Corollary 3.3.12 A g-inverse of the C-matrix in (3.3.6) with m = 2 is

C^- = (k/Δ) {Δ_4 I_v − Δ_2 G(1)},   (3.3.21)

where Δ, Δ_2, and Δ_4 are as given in (3.3.18). Using the above corollary and (3.1.8), one can get a g-inverse of A'A, where A is the design matrix of the PBIBD model.

Remark 3.3.13 The procedure to solve the equation C α̂ = Q in the general case is straightforward, provided the PBIBD is connected. The idea is to reduce the v equations in C α̂ = Q to m + 1 independent equations in α̂_i and m other variables. We get m + 1 equations from C α̂ = Q by premultiplying both sides by J_i', g_i(1)', …, g_i(m)'. The resulting equations are, however, dependent, as the sums of both sides of these equations are zeroes. Hence one of the equations, say the last equation, is deleted. The remaining m equations are then converted into equations in the variables α̂_i, η̂_i(1), …, η̂_i(m) using the version of Lemma 3.3.10 for general m. The addition of the equation α̂_i + η̂_i(1) + ⋯ + η̂_i(m) = 0 then gives m + 1 equations in m + 1 variables having a unique solution.
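The quantities Δ_1, …, Δ_4 and the solution (3.3.20) are easy to compute once the parameters of a connected two-associate class PBIBD are known. The following R sketch is ours and not part of the book; the function name pbibd_solution is an assumption, and P1, P2 stand for the matrices P^1, P^2.

# A minimal sketch implementing Lemma 3.3.11 and Corollary 3.3.12.
pbibd_solution <- function(Q, G1, r, k, lambda1, lambda2, P1, P2) {
  D1 <- r * (k - 1) + lambda2
  D2 <- lambda2 - lambda1
  D3 <- (lambda2 - lambda1) * P2[1, 2]                        # Delta_3 = (lambda_2 - lambda_1) p^2_12
  D4 <- r * (k - 1) - lambda1 * (P1[1, 1] - P2[1, 1]) -
        lambda2 * (P1[1, 2] - P2[1, 2])
  D  <- D1 * D4 - D2 * D3
  Cminus    <- (k / D) * (D4 * diag(nrow(G1)) - D2 * G1)      # g-inverse (3.3.21)
  alpha.hat <- (k / D) * (D4 * Q - D2 * as.vector(G1 %*% Q))  # solution (3.3.20)
  list(Delta = c(D1 = D1, D2 = D2, D3 = D3, D4 = D4, D = D),
       Cminus = Cminus, alpha.hat = alpha.hat)
}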
3.3.2 Blue's and their Variances

Let a_1' α be a treatment contrast. Its blue is

a_1' α̂ = a_1' C^- Q = (k/Δ) {Δ_4 a_1' Q − Δ_2 a_1' G(1) Q}.   (3.3.22)

By (3.1.12), we have

V(a_1' α̂(Y)) = a_1' C^- a_1 σ^2 = (k/Δ) {Δ_4 a_1' a_1 − Δ_2 a_1' G(1) a_1} σ^2.   (3.3.23)

In particular, the variance of the blue of an elementary treatment contrast α_i − α_i', i ≠ i', i, i' = 1, …, v, is equal to 2k(Δ_4 + Δ_2) σ^2/Δ if i and i' are first associates, and is equal to 2k Δ_4 σ^2/Δ if i and i' are second associates. Note that the variance of the blue of an elementary treatment contrast is not the same for every pair of treatments, unlike in an RBD or a BIBD.
3.3.3 Tests of Hypotheses

By (3.1.16), the MSE is

MSE = { y'y − Q' C^- Q − (1/k) B'B } / (vr − v − b + 1),   (3.3.24)

where

Q' C^- Q = (k/Δ) {Δ_4 Q'Q − Δ_2 Q' G(1) Q}.   (3.3.25)

The level-ω critical region for testing Hα given in (3.1.18) takes the form

(k/Δ) {Δ_4 Q'Q − Δ_2 Q' G(1) Q} / {(v − 1) MSE} > F_0,   (3.3.26)

where F_0 = F_0(ω; v − 1, vr − v − b + 1), and the level-ω critical region for testing Hβ given in (3.1.20) will now be

[ (k/Δ) {Δ_4 Q'Q − Δ_2 Q' G(1) Q} + (1/k) B'B − (1/r) T'T ] / {(b − 1) MSE} > F_1,   (3.3.27)

where F_1 = F_0(ω; b − 1, vr − v − b + 1). As in the case of a BIBD, here also we can present the computations in the twin analysis of variance tables, Tables 3.1 and 3.2, by taking n = vr, t = 1, SSTr(adj) = (k/Δ) {Δ_4 Q'Q − Δ_2 Q' G(1) Q}, and SSB(adj) = (k/Δ) {Δ_4 Q'Q − Δ_2 Q' G(1) Q} + (1/k) B'B − (1/r) T'T.

Theorems 3.1.19 and 3.1.20 can be used for tests of the hypotheses Λ_1' α = 0 and Λ_2' β = 0, respectively, where Λ_1' α (Λ_2' β) is a vector of q independent treatment (block) contrasts. As the expressions for the critical regions do not simplify, we do not give them here.
3.3.4 Efficiency Factor of a Block Design

For defining the efficiency factor of a block design, the RBD is taken as the standard, and the block design whose efficiency is compared with the RBD is assumed to be connected. The Efficiency Factor E of a block design is defined as the ratio of 2σ^2/r̄ to the average variance of the blue's of all the elementary treatment contrasts in the block design, where r̄ = (r_1 + ⋯ + r_v)/v, r_i denoting the number of replications of treatment i in the block design, i = 1, …, v.

For a BIBD, E = (2σ^2/r) / (2kσ^2/λv) = λv/(rk) < 1, since λ(v − 1) = r(k − 1), which implies that λv = rk − (r − λ) and r > λ.

For a 2-associate class, connected PBIBD, since there are v(v − 1)/2 distinct elementary treatment contrasts, vn_1/2 pairs of treatments i and i' which are first associates, and vn_2/2 pairs of treatments i and i' which are second associates, the average variance of the blue's of elementary treatment contrasts is equal to 2kσ^2 {n_1 Δ_2 + (v − 1) Δ_4} / {(v − 1) Δ}. So the Efficiency Factor is E = (v − 1) Δ / [rk {n_1 Δ_2 + (v − 1) Δ_4}] < 1.

Remark 3.3.14 In the definition above, note that the average variance of a block design is compared with the average variance for an RBD. In the numerator, r̄ is taken because the comparison is meaningful only if the two designs have the same number of plots r_1 + ⋯ + r_v. Further, in the definition above, it is assumed that the per-plot variance σ^2 is the same for both designs.

Remark 3.3.15 The BIBD and the two-associate class PBIBD are particular cases of cyclic designs and group divisible designs. These are block designs studied in the literature for their desirable properties including, but not limited to, flexibility, existence, ease of representation, and modeling. We refer the reader to the literature for more information on cyclic designs and group divisible designs.
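As a quick illustration of the formulas above, the efficiency factors of the designs met earlier in this chapter can be computed as follows. The R sketch is ours, not the book's; the Δ-values used for the PBIBD are those obtained by applying Lemma 3.3.11 to the parameters of Example 3.3.6.

# A minimal sketch: efficiency factors.
eff_bibd  <- function(v, r, k, lambda) lambda * v / (r * k)
eff_pbibd <- function(v, r, k, n1, D2, D4, D) (v - 1) * D / (r * k * (n1 * D2 + (v - 1) * D4))
eff_bibd(v = 7, r = 3, k = 3, lambda = 1)                         # Example 3.2.5: 7/9
eff_pbibd(v = 6, r = 4, k = 4, n1 = 4, D2 = 2, D4 = 12, D = 192)  # Example 3.3.6: about 0.88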
3.4 Exercises Exercise 3.1 Prove the properties of the matrix v in Example 3.1.24. Provide the proofs of Remark 3.1.6, Lemmata 3.1.15, 3.1.16 and 3.3.1, and missing steps in the proofs of Corollary 3.1.4, Theorem 3.1.7, Corollary 3.1.8, Theorem 3.1.11, the discussion preceding Remark 3.1.21, Theorem 3.1.28, Example 3.1.29, Sect. 3.2.5, and in Remark 3.3.7 regarding the C-matrix of an m-associate class PBIBD. Exercise 3.2 The layout with yields of an experiment employing a block design is given below with numbers in parentheses indicating the yields and the numbers outside denoting the treatments: (a) Write the model with assumptions. (b) Find rank of the model.
(c) Obtain criteria under which an lpf, a treatment contrast, and a block contrast are estimable. (d) Classify the design. (e) If α_i denotes the ith treatment effect, obtain the blue of α_1 − 2α_3 + α_5 and α_2 − 2α_4 + α_6, and estimate their variances and covariances, after verifying if these are estimable. (f) Verify if the following are linear hypotheses and test at 5% level of significance: (i) Hα: α_1 = … = α_v; (ii) Hβ: β_1 = … = β_b; and (iii) H_0: α_1 − 2α_3 + α_5 = 0, α_2 − 2α_4 + α_6 = 0.

Block 1: 1(10.3), 5(8.9), 5(6.5)
Block 2: 2(9.9), 4(11.2), 4(9.1)
Block 3: 1(7.8), 3(12.5), 6(8.3)
Block 4: 2(10.9), 2(13.5), 4(8.7), 6(10.6), 6(9.1)
Block 5: 3(7.3), 5(8.2), 3(9.7), 1(11.5)
Exercise 3.3 The plan and yields from a block design experiment are given below, with numbers in parentheses indicating the yields and the numbers outside denoting the treatments.

Block 1: 2(10.3), 5(14.2), 3(9.8), 5(11.4)
Block 2: 3(8.9), 4(12.3), 1(13.3)
Block 3: 5(13.5), 3(10.1), 2(11.3), 2(10.9), 4(12.5)
Block 4: 1(12.8), 1(13.0), 4(12.7)

(a) Show that the design is connected. (b) Find the blue's of all the elementary treatment contrasts. (c) Examine whether all the treatments have the same effect on yields. (d) Was blocking necessary in this experiment?
Exercise 3.4 Identify the design and analyze the following data, where yields are given in parentheses.

Block 1: 1(7.5), 2(6.4), 3(7.6)
Block 2: 1(6.6), 2(6.0), 5(7.8)
Block 3: 1(5.4), 4(7.8), 5(8.7)
Block 4: 2(6.6), 3(7.0), 4(7.3)
Block 5: 3(7.5), 4(8.7), 5(7.3)
Block 6: 1(7.6), 2(6.9), 4(6.8)
Block 7: 1(6.5), 3(8.2), 4(7.6)
Block 8: 1(8.0), 3(7.7), 5(7.2)
Block 9: 2(7.2), 3(8.4), 5(8.3)
Block 10: 2(8.3), 4(7.0), 5(8.7)
Exercise 3.5 The following data give the outcome of an experiment, with treatments in parentheses:

Block 1: 14.3(2), 12.8(1), 17.9(5)
Block 2: 19.0(6), 13.6(2), 9.5(1)
Block 3: 10.6(1), 20.3(3), 17.5(4)
Block 4: 12.4(1), 20.3(6), 15.3(3)
Block 5: 20.3(5), 18.1(4), 13.5(1)
Block 6: 21.2(2), 22.0(3), 17.6(4)
Block 7: 20.9(3), 21.3(2), 24.3(5)
Block 8: 20.2(4), 23.1(6), 20.7(2)
Block 9: 26.0(5), 21.1(3), 26.5(6)
Block 10: 28.9(4), 21.3(6), 31.0(5)
Identify the design and analyze. Find the efficiency factor, if applicable. Exercise 3.6 The layout with yields given below are from a varietal trial with varieties in parentheses. Blocks 1 2 3 4 5
Varieties and yields 32(1) 45(2) 37(3) 40(4) 88(6) 15(1) 43(7) 85(5) 68(6) 66(8) 56(4) 21(9) 53(2) 94(5) 13(9) 12(10) 18(10) 60(8) 35(3) 49(3)
(a) Identify the design. (b) Prepare the Anova table and test for the equality of variety effects. (c) Obtain an estimate of the difference between the effects of the first two varieties and calculate its standard error. (d) Compute the efficiency factor of the design.

Exercise 3.7 The following table gives the data from a PBIBD.

Block 1: 54(3), 56(8), 53(4)
Block 2: 35(2), 36(7), 40(4)
Block 3: 48(1), 42(7), 43(5)
Block 4: 46(7), 56(8), 59(9)
Block 5: 61(4), 61(5), 54(6)
Block 6: 62(3), 53(9), 48(5)
Block 7: 54(1), 59(8), 62(6)
Block 8: 45(2), 46(9), 42(6)
Block 9: 31(1), 28(2), 25(3)
Analyze the data and find the efficiency factor of the design, after verifying that the design is, in fact, a PBIBD.
Exercise 3.8 Write R-codes to print the two Anova tables for testing Hα and Hβ , separately, as in Example 3.5.2, for finding elementary treatment contrasts in a GBD, for (b) of Exercise 3.3 and to generate the association matrices in Example 3.3.6.
3.5 R-Codes on Block Designs Example 3.5.1 This example illustrates Example 3.1.23, with y= (9.3, 11.2, 9.8, 10.4, 8.9, 11.3, 12.3, 12.5, 9.1, 10.3, 10.7, 12.5) . Codes in this and in the next example are given to find least squares estimates of α and β, to test the hypotheses Hα : α1 = . . . = αv , Hβ : β1 = . . . = βb and for classification of the block design.
> > > > + >
rm(list=ls()); block > > > > > > > >
0 1 0 1 0
2 0 1 0 1
0 2 0 1 0
1 0 1 0 1
T1