Multivariate Techniques
Multivariate Techniques: An Example Based Approach By
Muhammad Qaiser Shahbaz, Saman Hanif Shahbaz and Muhammad Hanif
Cambridge Scholars Publishing
Multivariate Techniques: An Example Based Approach
By Muhammad Qaiser Shahbaz, Saman Hanif Shahbaz and Muhammad Hanif
This book first published 2019
Cambridge Scholars Publishing
Lady Stephenson Library, Newcastle upon Tyne, NE6 2PA, UK
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
Copyright © 2019 by Muhammad Qaiser Shahbaz, Saman Hanif Shahbaz and Muhammad Hanif
All rights for this book reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.
ISBN (10): 1-5275-4011-1
ISBN (13): 978-1-5275-4011-8
To My Sisters
Muhammad Qaiser Shahbaz

To My Sons: Khizar, Yousuf and Mussa
Saman Hanif Shahbaz

To My Parents
Muhammad Hanif
CONTENTS
Preface .................................................... ix
Chapter One .................................................. 1
Introduction
Chapter Two ................................................. 33
The Multivariate Normal Distribution
Chapter Three ............................................... 73
The Wishart and Related Distributions
Chapter Four ................................................ 85
Inference about Mean Vectors
Chapter Five ............................................... 119
Inference about Covariance Matrices
Chapter Six ................................................ 137
The Multivariate Analysis of Variance
Chapter Seven .............................................. 169
Multivariate Regression Analysis
Chapter Eight .............................................. 207
Canonical Correlation Analysis
Chapter Nine ............................................... 229
Principal Components and Factor Analysis
References ................................................. 263
PREFACE
Multivariate analysis has tremendous applications in many areas of life and deals with several random variables simultaneously. The data which arise in practical life are multivariate in nature and hence require specialized techniques for decision making. Multivariate analyses are studied in two contexts, namely multivariate distributions and multivariate techniques, and each context has its own requirements. Multivariate distributions are mainly studied theoretically, whereas multivariate techniques are studied using real data. In this book we have given a detailed insight into multivariate distributions and multivariate techniques. The opening chapter of the book provides an introduction to multivariate distribution theory and popular multivariate measures. This chapter also provides a brief review of matrix algebra. One of the most popular multivariate distributions, the multivariate normal distribution, is discussed in detail in Chapter 2. Some other popular multivariate distributions are discussed in Chapter 3; these include the Wishart distribution, the multivariate Beta distribution and Wilks' distribution. Chapter 4 and Chapter 5 contain two popular multivariate inferences, namely inferences about population mean vectors and about population covariance matrices. The multivariate counterpart of Student's t statistic, Hotelling's T² statistic, is also discussed in Chapter 4. The multivariate extension of the popular Analysis of Variance (ANOVA), the multivariate ANOVA, is given in Chapter 6. This chapter contains both one way and two way multivariate ANOVA and also provides methods to conduct inference about several mean vectors. Chapter 7 of the book is dedicated to another popular multivariate technique, multivariate regression analysis. This chapter provides methods of estimation and hypothesis testing in the case of multivariate regression analysis. The chapter also covers another popular multivariate technique, Seemingly Unrelated Regression (SUR) models, and contains methods of estimation and hypothesis testing for such models. Chapter 8 and Chapter 9 are dedicated to multivariate techniques which do not have a univariate counterpart. Chapter 8 is reserved for a detailed discussion of one of these techniques, Canonical Correlation Analysis (CCA). Various versions of the technique have been discussed
including simple, partial, part and bi-partial canonical correlations. Methods of computing these correlations and variates are discussed in Chapter 8, alongside the inferences for these variants of canonical correlation. Chapter 9 of the book is dedicated to two popular dimension reduction techniques, Principal Component Analysis (PCA) and Factor Analysis (FA). Methods of computing principal components and factors are discussed alongside the procedures to conduct inferences in these techniques. The book provides sufficient numerical examples to understand the various concepts, alongside R codes for these examples. Finally, we would like to thank our colleagues and students for critical comments throughout the compilation of this book. Author one would like to thank his elder brother Prof. Dr. Tariq Bhatti for his continuous support and guidance in his academic and personal life. Authors one and two would also like to thank the Department of Statistics, King Abdulaziz University for providing excellent facilities to compile this book, whereas the third author is thankful to the National College of Business Administration & Economics, Lahore for providing a good atmosphere to work on this book.
Muhammad Qaiser Shahbaz, Saman Hanif Shahbaz and Muhammad Hanif
CHAPTER ONE INTRODUCTION
1.1 Introduction
Data always arise in scientific and social studies as an input to the study. Studies are conducted to achieve certain objectives. In some cases the objectives are such that data are required on a single variable only, for example a study related to the life length of an electronic component. When the data are available on a single variable, univariate analyses are very helpful. For example, we can use the popular t-test to test the hypothesis that the average life length of an electronic component is equal to 2 years. In certain situations data on several variables are available; such data are called multivariate data. If these variables are mutually independent then each variable can be studied separately, but this is a rare case. In most situations the variables are dependent and hence cannot be studied individually by using a univariate analysis of each variable separately. In such situations we need techniques which help us in studying data on several variables; these are called multivariate techniques. Further, the analysis of multivariate data is called multivariate analysis. In this book we have discussed some popular multivariate techniques to analyze multivariate data depending upon various objectives. We have discussed these techniques in two ways, namely the theory of the techniques and their applications to real data. In some cases the multivariate analyses are a straight extension of univariate analyses, and in other cases they stand alone and do not have any univariate counterpart. We have discussed both types of techniques in this book. Multivariate analysis requires certain specialized terms and notations which we will discuss in this chapter. These terms and notations are useful in understanding and applying multivariate analysis. We start with a basic cornerstone of multivariate analysis, known as the data matrix, in the following.
1.2 The Data Matrix
Data are a key input to statistical analysis. In univariate analysis the data are collected from a set of respondents on a single variable and are usually presented in the form of a column vector. These vectors are then analyzed with respect to certain underlying objectives. In multivariate analysis the data are collected from a set of respondents on a set of variables and are usually arranged in the form of a matrix. The matrix which contains the multivariate data is called the data matrix and has the following description:

X_(p×n) = [ x_11  x_12  ⋯  x_1j  ⋯  x_1n
            x_21  x_22  ⋯  x_2j  ⋯  x_2n
              ⋮      ⋮        ⋮        ⋮
            x_i1  x_i2  ⋯  x_ij  ⋯  x_in
              ⋮      ⋮        ⋮        ⋮
            x_p1  x_p2  ⋯  x_pj  ⋯  x_pn ]
In the above representation of the data matrix each column vector x_j is the collection of information on all variables for the jth respondent, and hence is the jth observation in a multivariate data set. The data matrix is the key sample information in multivariate analysis, as it is needed to compute almost all of the multivariate measures. We will see this in the coming chapters. We know that in univariate analysis a study can be done by using the data or by using the distribution of the underlying data. A study using data is usually considered a sample based study, but it often happens that the study is conducted by using the underlying probability distribution from which the data have been drawn. Such studies are conducted on the basis of some random variables. In multivariate analysis this concept is extended to a collection of several random variables. This collection of random variables is usually represented in the form of a vector and is called a random vector, which is discussed below.
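As a small illustration (the variables and numbers below are hypothetical, not from the text), a data matrix with p = 3 variables recorded on n = 4 respondents can be assembled in R by binding the respondent vectors as columns:

# Hypothetical observation vectors: one column per respondent,
# rows are the p = 3 variables measured on each respondent.
x1 <- c(172, 68, 41)   # respondent 1
x2 <- c(165, 59, 37)   # respondent 2
x3 <- c(181, 77, 44)   # respondent 3
x4 <- c(158, 52, 36)   # respondent 4

# p x n data matrix: column j holds the jth multivariate observation
X <- cbind(x1, x2, x3, x4)
dim(X)   # 3 4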
1.2.1 The Random Vector
In univariate analysis a random variable plays a very important role as it provides information about the probability distribution of some phenomenon. A random variable X always has some distribution function F(x), which provides all the information about that random variable and is known as a univariate distribution. In univariate analysis the sample is drawn from some univariate distribution and is studied based upon the underlying objectives. In multivariate analysis the concept of a single random variable is extended to the case of several random variables having some joint distribution. In multivariate analysis we have a collection of p random variables X_1, X_2, ..., X_p, which can be collected in the form of a column vector x as
x = [X_1, X_2, ..., X_p]′.
The above vector is called a random vector and its ith component X_i is a random variable. Every random vector has a certain joint distribution function given as F(x_1, x_2, ..., x_p) or F(x). The distribution of a random vector is called a multivariate distribution. The random vector is key to all inferences in multivariate theory. The random vector is easily extended to the case of a random matrix, a (p × p) matrix X such that all of its entries are random variables. The distribution of a random matrix is called a matrix distribution. Multivariate distributions require certain notations which appear frequently. In the following we will discuss some notations which appear in multivariate analysis.
1.3 Notations of Bivariate and Multivariate Distributions Some popular notations which appear in multivariate analysis are discussed in the following.
1.3.1 The Joint Distributions
The univariate distribution is the probability distribution of a single random variable. The concept is easily extended to the case of two, and then to several, random variables. We will discuss both in the following. The joint density function of two continuous random variables X_1 and X_2 is denoted by f(x_1, x_2), so that f(x_1, x_2) ≥ 0 and
∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x_1, x_2) dx_1 dx_2 = 1.
The joint distribution function, in the case of two random variables, is obtained as
F(x_1, x_2) = ∫_{−∞}^{x_2} ∫_{−∞}^{x_1} f(t_1, t_2) dt_1 dt_2.   (1.3.1)
The function F(x_1, x_2) is a distribution function if it satisfies the following conditions:
F(+∞, +∞) = 1,  F(−∞, x_2) = 0,  F(x_1, −∞) = 0,
and for every a_1 < a_2 and b_1 < b_2 the following inequality holds:
F(a_2, b_2) + F(a_1, b_1) − F(a_1, b_2) − F(a_2, b_1) ≥ 0.
The joint distribution function is a useful function which provides all the information about the two random variables. The joint density function of two random variables is easily obtained from the joint distribution function by differentiation; that is, the joint density function of two random variables X_1 and X_2 is obtained from the joint distribution function as
∂²F(x_1, x_2) / ∂x_1 ∂x_2 = f(x_1, x_2).   (1.3.2)
The joint density function is a useful function and provides the basis to compute joint probabilities for two random variables. Specifically, the probability that the random variables X_1 and X_2 belong to some region E is computed as
P{(X_1, X_2) ∈ E} = ∫∫_E f(x_1, x_2) dx_1 dx_2.   (1.3.3)
The joint distribution also provides the basis for computation of the joint moments of two random variables, which are obtained as
μ′_{r_1, r_2} = E(X_1^{r_1} X_2^{r_2}) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x_1^{r_1} x_2^{r_2} f(x_1, x_2) dx_1 dx_2.   (1.3.4)
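As a small illustration of (1.3.4) (not from the text), a joint moment can be evaluated by numerical integration; the density f(x_1, x_2) = x_1 + x_2 on the unit square is a hypothetical example chosen only for this sketch:

# E(X1^r X2^s) for the hypothetical density f(x1, x2) = x1 + x2 on (0,1) x (0,1)
joint_moment <- function(r, s) {
  inner <- function(x2) {
    sapply(x2, function(b)
      integrate(function(x1) x1^r * b^s * (x1 + b), 0, 1)$value)
  }
  integrate(inner, 0, 1)$value
}
joint_moment(1, 0)   # E(X1)    = 7/12
joint_moment(1, 1)   # E(X1 X2) = 1/3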
The concept of a joint distribution is easily extended to the case of several random variables; the density for several random variables X_1, X_2, ..., X_p is denoted by f(x_1, x_2, ..., x_p), such that f(x_1, x_2, ..., x_p) ≥ 0 and
∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} f(x_1, ..., x_p) dx_1 ⋯ dx_p = 1.
The joint density function of several random variables provides the basis for the joint distribution function of p random variables, which is given as
F(x_1, ..., x_p) = ∫_{−∞}^{x_p} ⋯ ∫_{−∞}^{x_1} f(t_1, ..., t_p) dt_1 ⋯ dt_p.   (1.3.5)
The joint distribution function of several random variables satisfies
F(+∞, ..., +∞) = 1  and  F(x_1, ..., x_{i−1}, −∞, x_{i+1}, ..., x_p) = 0.
The joint density function of several random variables is directly obtained from the joint distribution function by differentiation as
∂^p F(x_1, ..., x_p) / ∂x_1 ⋯ ∂x_p = f(x_1, ..., x_p).   (1.3.6)
The joint density function of several random variables can be used to compute the joint probability for the occurrence of several random variables. Specifically, the probability that the random variables belong to some measurable set A is computed, by using the joint density function, as
P{(X_1, ..., X_p) ∈ A} = ∫⋯∫_A f(x_1, ..., x_p) dx_1 ⋯ dx_p.   (1.3.7)
The joint density function of several random variables is useful in computing the product moments for several random variables. Specifically, the product moments are computed by using
(1.3.18)
Specifically, E(X_1 | x_2) = μ_{1|2} and E(X_2 | x_1) = μ_{2|1} are called the conditional means of X_1 given X_2 = x_2 and of X_2 given X_1 = x_1, respectively. The conditional means are sometimes called the regression functions. The conditional central moments are defined as
E \( Xl - Ji'i2)"
IX2l ~ [(Xl - Ji'i2)" f( Xl X2 )dxl ;
and
E \( X 2
-
Ji2')" IXI
I
l~ [(
(1.3.19)
x 2 - Ji2')" f (X2 I Xl) dx 2 . (1320)
The quantities E{(X_1 − μ_{1|2})² | X_2} = σ_{11|2} and E{(X_2 − μ_{2|1})² | X_1} = σ_{22|1} are called the conditional or partial variances of X_1 given X_2 = x_2 and of X_2 given X_1 = x_1, respectively.
When we have the joint distribution of several random variables as f(x_1, ..., x_p), then several conditional distributions can be defined. Specifically, the conditional distribution of one set of variables, say X_1, X_2, ..., X_q, given the remaining set of variables, is obtained as
f(x_1, ..., x_q | x_{q+1}, ..., x_p) = f(x_1, ..., x_p) / f(x_{q+1}, ..., x_p).   (1.3.21)
The conditional means, conditional or partial variances and conditional or partial covariances can be computed from (1.3.19). The partial variances and covariances provide the basis for computation of the partial correlation coefficients.
Example 1.1: The joint distribution of two random variables X_1 and X_2 is f(x_1, x_2) = x_1 x_2 ...
... (xyz + yz + zx);  x + y + z < 3. Compute the mean vector, covariance matrix and correlation matrix. Also find the conditional mean vector and covariance matrix of (X, Y) given Z = 1, alongside the partial correlation coefficient.
Solution: The joint moments are readily obtained as
μ′_{r,s,t} = E(X^r Y^s Z^t) = (16/81) ∫_0^3 ∫_0^{3−z} ∫_0^{3−z−y} x^r y^s z^t (xyz + yz + zx) dx dy dz
= [Γ(r+1)Γ(s+1)Γ(t+2) / Γ(r+s+t+7)] 16 × 3^{r+s+t+1} [(r² + 2t + 15) + s(t + s + 11) + r(5s + t + 11)].
The marginal moments for the random variables X, Y and Z are
μ′_r = E(X^r) = [Γ(r+1)/Γ(r+7)] 16 × 3^{r+1} {r(r + 11) + 15},
μ′_s = E(Y^s) = [Γ(s+1)/Γ(s+7)] 16 × 3^{s+1} {s(s + 11) + 15},
μ′_t = E(Z^t) = [Γ(t+2)/Γ(t+7)] 16 × 3^{t+1} {2t + 15}.
The product moments are
μ′_{r,s} = E(X^r Y^s) = [Γ(r+1)Γ(s+1)/Γ(r+s+7)] 16 × 3^{r+s+1} {(r² + 15) + s(s + 11) + r(5s + 11)},
μ′_{r,t} = E(X^r Z^t) = [Γ(r+1)Γ(t+2)/Γ(r+t+7)] 16 × 3^{r+t+1} {r(r + t + 11) + 2t + 15},
μ′_{s,t} = E(Y^s Z^t) = [Γ(s+1)Γ(t+2)/Γ(s+t+7)] 16 × 3^{s+t+1} {s(s + t + 11) + 2t + 15}.
The means and variances of random variables X, Y and Z are
μ_x = 27/35,  μ_y = 27/35,  μ_z = 34/35,
σ_xx = 1389/4900,  σ_yy = 1389/4900,  σ_zz = 1361/4900.
The covariances between the possible pairs are
σ_xy = E(XY) − E(X)E(Y) = −606/4900,
σ_xz = E(XZ) − E(X)E(Z) = −522/4900,
σ_yz = E(YZ) − E(Y)E(Z) = −522/4900.
The mean vector and covariance matrix are therefore
μ = [μ_x  μ_y  μ_z]′ = (1/35) [27  27  34]′
Σ = [ σ_xx  σ_xy  σ_xz        = (1/4900) [ 1389  −606  −522
       ·    σ_yy  σ_yz                      ·    1389  −522
       ·     ·    σ_zz ]                    ·     ·    1361 ].
The correlation matrix is
ρ = [ ρ_xx  ρ_xy  ρ_xz        = [ 1.000  −0.436  −0.380
       ·    ρ_yy  ρ_yz             ·      1.000  −0.380
       ·     ·    ρ_zz ]           ·       ·      1.000 ].
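The correlation matrix can be recovered in R directly from the covariance matrix just obtained; a minimal sketch:

# Covariance matrix of (X, Y, Z) from Example 1.1
Sigma <- matrix(c(1389, -606, -522,
                  -606, 1389, -522,
                  -522, -522, 1361) / 4900, nrow = 3, byrow = TRUE)
round(cov2cor(Sigma), 3)   # reproduces the correlations -0.436 and -0.380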
In order to obtain the conditional distribution of (X, Y) given Z = 1 we will first obtain the marginal distribution of Z, which is given as
f_Z(z) = (16/81) ∫_0^{3−z} ∫_0^{3−z−y} (xyz + yz + zx) dx dy = (2/243) z (z − 3)³ (z − 11);  0 < z < 3.
Hence the conditional distribution of (X, Y) given Z = 1 is
f_{X,Y|Z=1}(x, y) = f(x, y, 1) / f_Z(1) = (3/10)(x + y + xy);  x, y > 0,  x + y < 2. Using the above result in (2.3.10) we have
E(X_i X_h) = ∂²M_x(t)/∂t_i ∂t_h |_{t=0}
= [σ_ih M_x(t) + (μ_i + t_i σ_ii + Σ_{k≠i} t_k σ_ik)(μ_h + t_h σ_hh + Σ_{k≠h} t_k σ_kh) M_x(t)] |_{t=0}
= σ_ih + μ_i μ_h.   (2.3.11)
Now using (2.3.6), (2.3.9) and (2.3.11), the variance of the ith variable and the covariance between the ith and hth variables are obtained as under:
Var(X_i) = E(X_i²) − [E(X_i)]² = σ_ii,
Cov(X_i, X_h) = E(X_i X_h) − E(X_i)E(X_h) = σ_ih.
The central moment generating function is easily obtained below:
M_{x−μ}(t) = E(e^{t′(x−μ)}) = exp(½ t′Σt).   (2.3.12)
The central moments can be easily obtained from (2.3.12).
2.4 Quadratic Form and its Distribution
The density function of a random vector x having the multivariate normal distribution is
f(x) = (2π)^{−p/2} |Σ|^{−1/2} exp[−½ (x − μ)′ Σ^{−1} (x − μ)].
The quadratic form in the exponent is given as
Q = (x − μ)′ Σ^{−1} (x − μ).
The quadratic form provides all the information about the multivariate normal distribution. The quadratic form is a scalar quantity and is a random variable, so it has some univariate distribution. In the following we will obtain the probability distribution of the quadratic form. We know that under the transformation z = C^{−1}(x − μ), with C′Σ^{−1}C = I, z is N_p(0, I) and the components of z are independent standard normal variates. Also, under the given transformation the quadratic form becomes
Q = (x − μ)′ Σ^{−1} (x − μ) = (Cz)′ Σ^{−1} (Cz) = z′C′Σ^{−1}Cz = z′z = Σ_{i=1}^p z_i².   (2.4.1)
From (2.4.1) we can see that the quadratic form has been written as a sum of squares of independent standard normal variates and hence has the χ²-distribution with p degrees of freedom. The quadratic form is also very useful in characterizing the multivariate normal distribution, as μ and Σ can be completely identified from it. This can be readily justified as below. The quadratic form is given as
Q = (x − μ)′ Σ^{−1} (x − μ) = x′Σ^{−1}x + μ′Σ^{−1}μ − 2μ′Σ^{−1}x.   (2.4.2)
Differentiating (2.4.2) with respect to x, we have
∂Q/∂x = 2Σ^{−1}x − 2Σ^{−1}μ.
Also, ∂Q/∂x = 0 gives
2Σ^{−1}x − 2Σ^{−1}μ = 0  ⇒  x = μ.   (2.4.3)
From (2.4.3) we can readily see that the mean vector is the solution of ∂Q/∂x = 0 for x. Again, from (2.4.2) we can see that the quadratic form can be written as
Q = Σ_i Σ_h σ^{ih} (x_i − μ_i)(x_h − μ_h),   (2.4.4)
where σ^{ih} is the (i, h)th entry of Σ^{−1}. From (2.4.4) we can readily see that σ^{ii} is the coefficient of x_i² in the expansion of the quadratic form, whereas the coefficient of x_i x_h is 2σ^{ih}.
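The χ²_p distribution of the quadratic form can be checked by simulation; a rough sketch, assuming the mvtnorm package is available and using arbitrary illustrative values of μ and Σ (not from the book):

library(mvtnorm)   # assumed available; provides rmvnorm()

p     <- 3
mu    <- c(1, 0, 2)
Sigma <- matrix(c(4, 1, 1,
                  1, 2, 0,
                  1, 0, 3), nrow = 3, byrow = TRUE)   # hypothetical covariance

x <- rmvnorm(10000, mean = mu, sigma = Sigma)
# Q = (x - mu)' Sigma^{-1} (x - mu) for every simulated vector
Q <- mahalanobis(x, center = mu, cov = Sigma)
mean(Q); var(Q)   # should be close to p and 2p for a chi-square with p df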
Example 2.1: The quadratic form in the exponent of a trivariate normal distribution is
Q = (3/2)x_1² + (1/2)x_2² + (5/2)x_3² − x_1x_2 − 3x_1x_3 + x_2x_3 + x_1 − x_2 − 5x_3 + 9/2.
Identify the mean vector and the covariance matrix.
Solution: We know that the mean vector is the solution of ∂Q/∂x = 0. Now we have
∂Q/∂x_1 = 3x_1 − x_2 − 3x_3 + 1
∂Q/∂x_2 = −x_1 + x_2 + x_3 − 1
∂Q/∂x_3 = −3x_1 + x_2 + 5x_3 − 5.
Setting the above three derivatives to zero we have
3x_1 − x_2 − 3x_3 + 1 = 0
−x_1 + x_2 + x_3 − 1 = 0
−3x_1 + x_2 + 5x_3 − 5 = 0.
Now, solving the above three equations simultaneously, we have x = μ = [2  1  2]′. Again, equating the coefficients of the square and cross-product terms in the quadratic form, we have
Σ^{−1} = [ 3/2  −1/2  −3/2      or   Σ = [ 2  1  1
           −1/2   1/2   1/2                 1  3  0
           −3/2   1/2   5/2 ]               1  0  1 ].
Example 2.2: If the random vector x has N_p(μ, Σ) and A is some matrix, then find the mean of Y = x′Ax.
Solution: Clearly the random variable Y is univariate. Now
E(x′Ax) = E{tr(x′Ax)} = E{tr(Axx′)} = tr{A E(xx′)} = tr{A(Σ + μμ′)},  as  Cov(x) = E(xx′) − μμ′.
43
2.5 Some Other Properties The multivariate normal distribution has certain properties which make it one of the most important multivariate distributions. In the following we will discuss some more properties of the multivariate normal distribution.
2.5.1 Distribution of Linear Combination The linear combinations arise in many areas of statistics. The distribution of linear combination depends upon the distribution of random
variables from which fhe linear combination is formed. In the following we will obtain the distribution of linear combinations of a multinonnal random vector x.
Let
x is Np (J.I,1:). Define the
linear combination of x as Y ~ Ax + b.
We will give the distribution of y in the following. The density of x is
I( x) ~ (27r f
P 12
I1:I-
1I2
exp [
-~(x- J.I/ 1:-
Under the transformation y ~ Ax + b we have the jacobbian of transformation is density of y ~ Ax + b is
1
(
x- J.I)]'
x = A-I (y -
b) and hence
IJI ~ lOx / Byl ~ modIAI- 1 .
Now the
I(Y) ~ (2nfPI211:1-1I2 exp [ -~( A- 1y - A- 1 b- J.I Y1:- 1
x( A- y - A- b_ J.I l]IAI1
1
1
~ (2n f PI211:1-1I2 exp [ _~\A-l (y - AJ.I- b )ll 1:- 1 X\A -1 (y - AJ.I- b)l ]IAAr1l2
44
Chapter Two
~ (2ff fPl21ALAf12 exp [ -~{y- (Afl+ b )}I (ALAr x{y-(Afl+b)}]
(251)
From (2.5.1) we can readily see that it is the density function of a (AJl + b) and
multivariate nonnal distribution with mean vector
covariance matrix Al:A/. Hence we see that if x is Np
(J.l,l:)
then
y~Ax+b has Np (Afl+ b,A1:A / ). Example 2.3: The random vector x has N3 (,..,1:) where J.l and 1: are gIVen as
What is the joint distribution of y,
~ 4X, + X3 - 2X2 and Y2 Also obtain correlation coefficient between Yt and f2.
Solution: The given linear combinations are Y1 3X, - 2X3 . Now we have
~ ~4~, +X3 -2X2 =>[~]~[4 Y2 - 3X, - 2X3
Y2
3
=
~
3X, - 2X3.
4.1:'1 + X3 - 21:'2 and Yz
-2 ~2][~: 0
=>y=Ax
X3
Clearly y has N2 (fly; 1: y ) where fly and 1: yare obtained below
fl
y
~Afl ~[43 -2o
=
The Multivariate Normal Distribution
45
-2
o
~ [74 ~~] Finally Pn
I 2
~ 25/ ~(74)(24) ~ 0.593. 2.5.2 Marginal Distributions
The marginal distributions are useful in studying the properties of subset of random vectors. In the following we will discuss the marginal distribution of subset of random vector having the multivariate normal distribution. The marginal distribution is obtained below. Let x has Np (J.1, E) and consider the following partitioning of X. J.1 and E X~
Xl] ; J.1~ [J.1I] and E [ x2 J.12
~
[El1 E12]
E21 E22 We will obtain the marginal distribution of Xl as under. Now Xl ~ IXI + OX2 ~ Ax where A = [I 0]. Using result of linear combinations. the distribution of Xl is N q (AJ.1,AEA / ) . Now
AJ.1~[1 0][;:]~J.l, AEAI
~[I
So the distribution of
O][El1 E21 Xl
E12][I]~El1 E22
is N q (J.1I' E
0
l1 ). Similarly. we can show that x 2
is N p _ q (J.l2 ,1: 22 )· The above result states that any subset of the vector X having multivariate normal distribution also has the multivariate normal distribution. This argument is much stronger argument than the corresponding result in case of bivariate normal distribution. This result also states that every component of the random vector X having Np (J.1,E) has
46
Chapter Two
the normal distribution with mean and variance obtained from the corresponding entries of p.t and 1: ; that is Xi has N (Pi; (iii)
2.5.3
.
Independence of Normal Variates
We know that two random variables Xl andX1 having bivariate normal distribution are independent if their covariance is zero. In the following we will extend this result to the case of two random vectors Xl and X2 having joint multivariate normal distribution. Suppose the random vector Xhas Np (J.I.E) and suppose that x. J.I and
1: are partitioned as X~
Xl] 'J.I~ [,..,] [X ' J.lz z
andE~
[El1 E12] 1:21
1:22
.
We will show that the vectors Xl and X2 are independent if E12 this consider the density of multivariate normal distribution as
~
0 . For
I( x) ~ (27r rP121EI-1I2 exp [ -~(x- J.I)I E- l (x- J.I)]
~ (27rrPI2IEI-1I2 exp ( -~Q J where Q~(X-J.l/E-l(X-J.l) Now if E12 ~O then IEI~IEl1IIE221 and E- l is
IlE-~l
l
E- ~ l
o
Under this condition, we have
Q~[(XI -J.ll)1
(X2 -J.l 2n
lE~;
~ (Xl - J.l l )1 E;; (Xl - J.l l ) + (X2 - J.l 2 )1 E~i (X2 - J.l 2 ); and the joint density becomes
I( x"x 2 ) ~ (27r rq121El1l-1I2 exp [ -~(Xl _,..,)1 E;; (Xl -,..,)]
The Multivariate Normal Distribution
-(p-Q)/21 x ( 27r ) 1:2 1-112 exp
47
[1-"2 (
-1 ( X2 - 112 )1 1:22 X2 - 112 )]
~ nq (Xl I 1l,,1:11 ) x n(p_q) (X2 I 1l 2 , 1:22 ).
(2.5.2)
From (2.5.2) we can readily see that the joint density has been written as the prcxluct of the marginal densities of vectors Xl and X2 and so the random vectors Xl and x 2 are independent. The present result is again a generalization of the result of bivariate case where we say that "If Xl and X
2
has the bivariate normal distribution with density
then Xl and X 2 are independent if p ~ 0 or if (t/Vt)l ~ exp( it l fl )q>( t/Vt )ifl + 2Vtq! (eVt )exp( ie fl) So
E(X)~!~¢X(t)1 I
at
t~O
~ ~exp (ie fl )q>( t/Vt ) ifl + 2Vtq>1 (eVt )exp( it fl l
1
~ ~ ifl = fl as q>( 0) ~ l.
to
(2.10.3)
I
Again the central characteristic function of x is given as ¢X- E(X)
(t) ~ exp( -ie fl )¢x (t) ~ q>( eVt).
The covariance matrix of x is obtained as
1 82
C( X) ~ f atat l
!
¢X- E(X)
(t) t=o
!
q>(
eVt ) ~ 2Vtq>1 (eVt )
:~I ¢X- E(X) (t) ~
!
\2Vtq>1
¢X-E(X)
(t) ~
(eVt )l ~ 2V q>1 (eVt) + 4Vtq>11 (eVt leV.
So (2.10.4) Equations (2.10.3) and (2.10.4) provide general results for mean vector and covariance matrix for any Elliptical distribution.
CHAPTER THREE THE WISHART DISTRIBUTION
The Wishart Distribution is an important multivariate distribution and arises in several cases. The distribution arises in multivariate distribution theory in two contexts, namely as a stand alone distribution and as a sampling distribution when sampling is done from a multivariate normal distribution. The Wishart distribution has been obtained by Fisher (1915) for bivariate case and by Wishart (1928) for multivariate case. The Wishart
distribution is natural extension of classical %2 - distribution. In this chapter we will discuss, in detail, about the Wishart Distribution and some related distributions.
3.1 Definition In univariate analysis it is well known result that when sample is available from a normal population with zero mean and the sampling distribution of
(J2
variance, then
z = L~=lX~, is
;(2 -
distribution with n degrees of freedom and scale parameter
(J2.
Another definition of X 2 - distribution can be given as the distribution of sum of squares of standard normal variates. The Wishart distribution is obtained as an extension of this result and arises in sampling from a multivariate nonnal population or as sum of square and cross product of multivariate normal variates. The distribution also arise as the distribution of sum of products of multivariate normal variates. The distribution is defined below.
74
Chapter Three
Definition 3.1 Wishart Distribution: Let
Xl' X Z,······, xn
be independently
and identically distributed random vectors each having Np
(0, E), then the
random matrix W defined as:
W= L~~IX;X~
(311)
is said to have the central Wishart Distribution with n degrees offreedom
and scale matrix 1: and is denoted as Wp ( n, 1:). The distribution is said to be non-central if each of x; in (3.1. 1) has N p
(,..,
E) and in that case it
is written as Wp ( n, 1:, M) where M is non-centerality parameter.
If each of X; in (3.1.1) is Nl (0,0"2 reduces to
(J2
l then the Wishart distribution
%~ and hence can be seen as multivariate extension of %2 -
distribution with n degrees of freedom. The density function of Wishart random variable, as derived by Wishart (1928), is given as
fw
(W) = 2P'/2r p (~/2)IEln/2Iw(-;-1 exp { -~tr( E-IWl}; (3.1.2)
where
r p (n/2)
is multivariate gamma function given as (3.1.3)
The Wishart distribution can also be defined as the distribution of W = TTl where T is a lower triangular matrix such that each Ti] is distributed as
N ( 0, CTiJ ); 1 S; j < is; p and each
(n - i + 1)
r;i2
is distributed as (Ji~;(2 with
degrees of freedom and elements of T are independent from
each other. Further, if a random sample of size n is available from
N p (;.1 f( x) -- r(~]r(~f -x , x ,m,n_. (3.4.2) We know that if X and Yare independent Gamma random variables with same scale parameter then Beta distribution is the distribution of
U ~ X / (X + Y). We will discuss the multivariate Beta distribution as extension of this result as below.
Suppose W, has Wp
(n, I)
and Wz has Wp
(m, I)
and
both are
independently distributed from each other The density function ofW, is
fw, (W,) ~ c(p,n )IW,I±Cn-p -,) exp { -~tr( W, )}; and the density function ofWz is
fw z (Wz ) ~ c(p,m )IWzl±Cm-p -,) exp { -~tr(Wz )}; where
c(p,n) ~ {2 pn /zr p (n/2)f'. The multivariate Beta distribution is the distribution of random matrix U defined as
,
,
U ~ (W, + Wzp: Wz(W, + Wzp. We derive the distribution in the following. The joint density ofW, and Wz is
_ I I.'C n-p -,) IWzzI.'C m-p -,) !w"wz(W"Wz)-c(p,n)c(p,m)W,z xexp{
-~tr(w, + W z )}.
Now making the transformation W ~ W, + W z, the joint density ofW, and Wz is; Jacobian oftransfonnation being 1;
80
Chapter Three
~
Now making the transformation V
W zW zW
Z
~
and W
W, the
I
Jacobian of transformation is IWI,(P+I) . Now the joint density ofU and W IS
I
I
I
iu,w (U, W) ~ c(p,n )c(p,m )IWI,("-P-I) II - UI,("- P-I) IUI,(m-p-I)
xIWI~(m-p-I) exp{ -~tr(w) }IWI~(P+I) ~
c(p,n )c(p,m) II-VzI~(n-p-I) IVzI~(m-p-I) c(p,n+m) xc(p,n + m)IWI~(n+m-p-I) exp { -~tr(W)}.
Integrating out W, the density of U is
fu (V) ~ c(p,n )c(p,m) c(p,n+m)
1
1-
VI~(n-p-,) IVI~(m-p-,);
(3.4.3)
for II - VI > 0 and IVI > 0 and zero otherwise. The density (3.4.3) is density function of multivariate Beta random matrix. We can see that for P ~ 1, the density (3.4.3) reduces to the density (3.4.2). The multivariate Beta distribution given in (3.4.3) can be associated with univariate Beta distribution as below. The random matrix U having a multivariate Beta distribution can be written as U = TTl where T is a lower triangular matrix having independently distributed components. The component Beta distribution with parameters
(m - i + 1)
and
ti7
has univariate
(n + i-I).
This result
is useful in deriving distribution of ratio of detenninant of two Wishart random matrices in the following.
The Wishart Distribution
81
3.5 Distribution of Ratio of Wishart Detenninants In section (3.4), we have seen that the distribution of determinant of Wishart matrix is the distribution of products of X 2
-
variates. Another
important distribution which appears in context of multivariate analysis is the distribution of ratio of detenninants of Wishart matrices. This distribution has been obtained by Wilks (1932) and we give this distribution in the following.
Wp(n,I)
Suppose W, has
Wp(m,I)
and W 2 has
and both are
independent from each other The random variable U defined as
U~
IW,I IW, + W 2
1
of particular interest in multivariate inference. We will discuss the distribution of U in the following. IS
IW,I can be written as
We know that
where
1;,2
has X2 - distribution with
(n - i + 1)
IW + W
since WI and W2 are independent so
21
1
degrees offreedom. Now can be written as
IW, + W ~ IHH/I ~ Ir~,H,;; 21
where
Hi; has X2 - distribution with ( n + m - i + 1) degrees of freedom.
So the random variable U can be written as ratio of product of independent
X2 - variates, that is
TI ,-,-I-,',--' -"'-= p
TI ~
TIP 1=1
or
p
p 1=1
H2
TI
2 P 1;1 1=1
H2
JJ
JJ
2
2
X( n-I+l) 2
X( n+m- I+l) _
T2
TI
p 1=1
X( n-I+l) 2.
X(n- I+l)
r([n-i+l]/2,2)
+
2
Xm
n~'r([n-i+l]/2, 2)+r(m/2,2)
82
Chapter Three
(3.5.1)
r((n-i+l)/2 ,2) IS parameter (n - i + 1)/2 and
where
r((n-i+l)/2,2) !, (x) ~ x
Gamma random variable with shape scale parameter 2.
The density of
is
1
21"- ' +I)/2 r [(n-i+l)/2]
We also know that if Xis
r( a,2)
and Yis
X I"- ' +I)/2-l e -X/2
r(b,2)
.
x> 0
,.
then the density of V
defined as
V~~
X+Y'
IS
r (v) ~ Jv
q a+ b) V"-l (l_v)b-I· O< V < 1 qa)qb) "
that is Vhas a Beta distribution. So the density of U ii
( u)~
r[(n+m-i+I)/2]
=
X u /( X ii + 1'; )) is
ul"- ,+l)/Z-l(l_u)m/z-l ;O mv=matrix(c(527.74,54.69,25.13), + nr ow=3,byrow=TRU E )
> cm=matrix(c(5691.34,600.51,217.25,600.51,126.05,23.37,
+ 217.25,23.37,23.11),nrow=3,byrow=TRUE)
> tv=matrix(c(550,70,25),nrow=3,byrow=TRUE)
> dif=mv-tv;ss=85
> t0=ss*(t(dif)%*%solve(cm)%*%dif)
> fstat=((ss-3)/((ss-1)*3))*t0
> t0;fstat
[1,] 241.7829
[1,] 78.67538
Example 4.2: A random sample of 42 from a bivariate normal population with covariance matrix
Σ = [ 32.25  −14.36 ; −14.36  24.86 ]
provided x̄ = [124.55  168.52]′. Can we conclude that the population mean vector is different from [110  150]′?
Solution: We test the hypothesis as under.
1. The null and alternative hypotheses are
H_0: μ = [110  150]′  vs  H_1: μ ≠ [110  150]′.
2. Level of significance is α = 0.05.
3. Test statistic is Q_0 = n(x̄ − μ_0)′ Σ^{−1} (x̄ − μ_0).
4. Computations: Necessary computations are given below.
x̄ − μ_0 = [124.55  168.52]′ − [110  150]′ = [14.55  18.52]′
Q_0 = 42 [14.55  18.52] [ 32.25  −14.36 ; −14.36  24.86 ]^{−1} [14.55 ; 18.52]
or Q_0 = 1697.1.
5. Critical region is Q_0 ≥ χ²_{α,p} or Q_0 ≥ χ²_{0.05,2} or Q_0 ≥ 5.99.
6. Conclusion: The null hypothesis is rejected. We conclude that the population mean vector is different from [110  150]′.
~
T02
T02 -
statistic is given as
n(x - J.lo i S-1 (x - J.lo),
and has a central F-distribution under H 0 : f1 = f1v . If the hypothesis is not true then this statistic has a non-central F-distribution. Further, the statistic has a central distribution if test value J.lo is replaced with the true population value J.I. That is
n(x-J.l)/ S-I(X-J.l) has (n-l)p F n- p
p,
n- , p
and hence can be used to construct the joint confidence region for true value of population mean vector. Specifically a 100(1 - a)% joint confidence region is obtained as
p[n(x-J.l)/ S-I(x_J.l)):~I~P Fa,p,n_p ]~I-a, and is given as
n(x-J.l)/ S-I(x_J.l)«n-l)p Fa,p,n_p' n-p
(4.2.15)
If population covariance matrix is known then a 100(1 - a)% confidence region for population mean vector is given as -
n ( X-J.I
)/ ",-1 ( -
""
)
2
X-J.I -s,Xa,p'
joint
(4.2.16)
Chapter Four
94
The plot of joint confidence region can be easily constructed by using (4.2.15). The points for ith axis to plot the joint confidence region are obtained as
IT (n-l)p xbJ/l,
(
n n- p
,
) Fa ,p,"_p/3"
(4.2.17)
/3,
where /I, is ith largest eigen value of Sand
IS
eigen vector
corresponding to ~.. When population covariance matrix is known then
eigen values and eigen vector of E are used in (4.2.17). Example 4.3: The mean vector and covariance matrix for a sample of size
42 are given below
_x~ [0.564] andS= [0.0144 0.0117] . 0.603 0.0146 Construct 95% joint confidence region for the true population mean vector.
Solution: The 100(1 - a)% joint confidence region for population mean vector is
-
n (x-Jl
)/S-l(-x-Jl )«n-l)PFa pn- p ' n- P
"
Now
2(42-1) 42-2
(n-l)p F n-p
_
----'---'-Fo 05 2 42-2 -
-'-----'-"-- a ,p ,n - p
. "
6.62.
Hence the joint confidence region is:
42[0.564- JiJ. 0.603 - Jlz
[0.0144 0.0117][0.564- JiJ.] " 6.62 0.0146 0.603- Jlz
or
JiJ.)2 + 8409.576( 0.603 - Jlz )2 -13724. 844( 0.564 - JiJ.)( 0.603 - Jlz)" 6.62. 8526. 756( 0.564 -
The joint confidence region can be used to test the hypothesis about population mean vector. For example if we want to test
Inference about Mean Vectors
H 0 : Jl = [0.562
95
0.589r then we will use these values in above region
to obtain
8526.756( 0.564- 0.562)2 + 8409.576( 0.603- 0.589)2 -13724.844( 0.564- 0.562)( 0.603- 0.589)
X~.05,3(4)/2 or Qo
;:>
X~.05,6 or Qo
;:>
12.59.
6. Conclusion: We can see that the computed value of the test statistic does not exceeds the critical value, hence the null hypothesis is not
rejected. We conclude that population covariance matrix is not different from the given matrix. The R commands for above example is given below > sigmaO=matrix(c(4,3,2,3,6,5,2,5,lO), + nrow=3,byrow=TRU E ) > s=matrix(c(3.42,2.60,1.89,2.60,8.00,6.51,
Inference about Covariance Matrices
123
+1.89, 6.51, 9.62),nrow=3,byrow=TRUE)
> n=20;pro=s > qO~(n-1)*(log(det(sigmaO))- log(det(s))+ + tr(pro)-p)
> qO [lJ
3.637404
5.1.1 Some Special Hypotheses We have derived the test statistic for testing Ho : 1: = 1:0 and is given in (5.1.5). Sometimes we are interested in testing special pattern in S on the basis of a given random sample from N p
(J.l, 1:) and in that case the statistic
(5.1.5) is accordingly modified. We will now give some special hypotheses which represent certain pattern of 1: alongside their test statistics in the following. These special hypotheses are discussed by Krishnaiah and Lee (1980). The first hypothesis for testing special pattern in 1: is test of equal correlation structure and is given as
1
P
p
Ho : 1: ~ (J"2 p i p p
p
1
(5.1.6) The hypothesis (5.1.6) can be tested by using the statistic
-l( ) 6(p_l)(p2 p(p+l)2(2p-3) J p-4)
Q -o
n-l -
+
InA
.
(5.1.7)
jO
where
~
II. 1
and
lSi
()P S2 (l-rt- 1 {1+(p-l)r}
(1 J
1 ,"P ,"P 2 P S=-~)ii; r ~ ( 1) 2 L..Ji=lL..Jt"oi Sit· p p- S P i=l
(5.1.8)
Chapter Five
124
The statistic (5.1.7), under hypothesis (5.1.6), has X
2
-
distribution with
f~{p(p+l)/2-2} degrees of freedom. The second hypothesis about special pattern of 1: is to test for pairwise independence of all the p variables. The hypothesis to be tested in this case is
Ho :1: ~
O"n
0
0
0
0"22
0
0
O"pp
0
(5.1.9)
The hypothesis (5.1.9) can be tested by using the statistic
Qo ~-2[1-
2P+ll]lnA 6
(5.110)
2'
where
(5.111) The statistic (5.111), under hypothesis (5.1.9), has X2 - distribution with f~ p(p-l)/2 degrees of freedom.
The third hypothesis to test special pattern in S is to test whether all variables are pairwise independent and has common vanance. Specifically the hypothesis to be tested is
Ho :1:~0"2I.
(5.112)
The hypothesis (5.1.12) can be tested by using the statistic
Qo ~
-2[1- 2p26+pnp+ 2]lnA
3'
(5.113)
where
(5.114) The statistic (5.114), under hypothesis (5.112), has X2 - distribution with f~(p+2)(p-l)/2 degrees of freedom.
Inference about Covariance Matrices
125
5.2 Testing Equality of Several Covariance Matrices Often we are interested to see whether several multivariate normal populations have same covariance matrices. This hypothesis is of interest in several other tests. We now derive the test statistic for testing of this hypothesis by using the likelihood ratio criterion as discussed by Box (1949). The statistic is derived below. Suppose q independent samples are available from multivariate normal populations with different mean vectors and covariance matrices. For simplicity let X k1 , Xk2 , .. , Xknk is random sample of size nk from
Np (J..lk,l:k) and using these samples we want to test (5.2.1)
We will use the likelihood ratio criterion to develop test statistic for testing (5.2.1). The likelihood ratio is given as
A ~ sup{L( e I x):
e E eo}
rnaxL( OJ)
sup{L(el x): eE e}
rnaxL(O)
with usual notations. The likelihood function now is product of several likelihood functions. Now the density function of kth population is:
f( x k ) ~ (2". rPlzl1:lllZ exp [ -~( x k - Ilk / 1:;:1 (Xk - Ilk)
J
and hence the likelihood function for kth sample is
L( Ok ) ~ L( xkPxw
~ TI~:J 2". r
p12
· - ·'X lmk ; Ilk' 1:k ) ~
l1:l
llZ
exp [
TI~:/( Xq ;llk,1: k )
-~h; - Ilk )/1:;:1 (X k; - Ilk)]
~ (2". r nkPI2I1:lnkl2 exp [ -~ L ~:1 (Xq The likelihood function for all q samples is:
L( 0) ~ TI~~l( Ok) ~ TI~~1 (2". r"kPlzl1:lnklz x exp [
1
Ilk )1 1:;:1 (Xq - Ilk )
-~ L ;~J x k; - Ilk Y1:;:1 (X k; - Ilk )]
126
Chapter Five
~ (2nT"Pi2 TI~~111:lnki2 x exp [ -
~ L ~~1L ~:1 (Xk; - Ilk )/ 1:;:1(Xk; - Ilk) ];
where n ~ L~~1nk . The maximum likelihood estimates of Ilk and 1:k are
, _-
J.lk-Xk
- 1 '" nk ( - )( - )/ and"" ,i..;k---;;L...J =l xq-x k xq-x k J
1 Ak
_ ---;;
k
and hence the maximum of
L( 0)
k
is
L( Q) ~ (27r f"Pi2 TI~~1IAk/nlnki2 xexp [
~ (27r f
npi2
",n, ( -"21 ",q L.k~1L.;=1nk xij -
_)/
-1(
x k Ak
xij -
_x k )]
TI~1IAk/nlnki2
I
{q
I nk xij xexp --tr Lnk L(
2
k=1
-
xk )( xij - xk )/ A;:1 }J
J=l
~ (27r f"Pi2 TI~~1IAk/nlnki2 exp [ -~tr{L~~1nkAkA;:1}]
~ (27r f"Pi2 TI~~1IAk/nlnki2 exp [ -~{L~~1nktr( Ip ))]
[1 ]
n p L (0') ~ (27r )-n i2 TI Hq 1Ak / nkI- k12 exp -"2np .
Again the overall likelihood function under the hypothesis (5.2.1) is
L( OJ) ~ TI~~1L( OJk ) ~ TI~~1 (27r
P pi2 l1:l- nki2
(5.2.2)
Inference about Covariance Matrices
127
The maximum likelihood estimates of Ilk and 1: for this likelihood function are
,
J.lk =
-X and'" .:... = k
l",q
",", (
-)(
-;; L.tk=1 L.t )=1 Xk:i - X k
_)1 = -;;lA.
X k) - X k
The maximum of L ( (])) is therefore
L( m) ~ (2". r"Pi2IAfnl-ni2 exp [ -~ L~=IL~:J Xq - Xk)1
x(A/nt(xq-,uk)] ~ (2". r"Pi2IAfnl-"/2 exp [ -~ ntr
xlL!=IL~:J xk)-,uk)( Xk}- Xk)I A - 1lJ ~ ( 2". r npi2 IAf nl-ni2 exp [ -~ ntr (AA -1) ] ~ (2". rnpi2IA/nl-ni2 exp [ -~ntr(I)]
1
~(2"'r"Pi2I~I-nI2 exp [ -~np
(5.2.3)
Using (5.2.2) and (5.2.3), the likelihood ratio is
A~
(2nF PI2 1A/ nl-"/2 exp( -np/2) 12 P12 (2nF TI!JAk/nkl "k exp( -np/2)
or (5.2.4) Under the hypothesis (5.2.1), the statistic Qo ~ -2ln( A) has X distribution with
f
2
-
~ p (p + 1) ( q - 1)/2 degrees offreedom. If we use the
unbiased estimates of 1: k and 1: defined as
128
Chapter Five
and
"q
1 "n, ( S = (n _ q) L...J k=l L...J;=1 Xq = (
_ )( XkJ - _X k )/
- Xk
1 )L:~=,h-l)Sk n-q
then Box (1949) has shown that the statistic Qo =MC; with
(5.2.5) (5.2.6)
and
C=Ihas X
2
-
)-'J.
["q
2p2+3p-l (n -1)-'-(n6(p+l)(q-l) L..k=' k q,
distribution with
f
(5.2.7)
= p (p + 1)( q -1 )/2 degrees of freedom.
Example 5.2: Independent random samples of sizes 25. 30 and 25. from three bivariate nonnal populations, yield following covariance matrices 1 [78 S, = 24
27] 1 [31 61 ;S2 = 29
8] 1 [30 12 and S3 = 24
12] 8'
T est at 5% level that covariance matrices of three populations are equal. Solution: We test the hypothesis as under I. The null and alternative hypotheses are
H O :I:,=I: 2 =I: 3 vs HI : Atleast two covariance matrices differ 2. Level of significance is a. = 0.05. 3. Test statistic is Qo = Me where
M
=(n-k)lnISI-
L:!=,(nk
-1)lnISkl; and
Inference about Covariance Matrices
2p2+3p-1 C~l- 6(p+1)(q-1)
129
["q (n -1 )-1 - (n-q)-IJ ; L..k=1
k
4. Computations: Necessary computations are given below
InlS11 ~ 1.945 ; InlS21 ~ -1.004; InlS31 ~ -1.792 S ~ _l_"q
n-q
L...Jk=l
(n
k
-l)S
k
~ [1.805
0.610] 1.052
InISI ~ 0.423
M ~(n-k)lnISI- L:!=I(nk -l)lnIS k l; ~ (80- 3)( 0.423)- {( 24 x 1.945) - (29 x 1.004) - (24 x 1.792)} ~
58.015.
Also
C~l- 2p2+3p-1
6(p+1)(q-1)
["q L..k=1
(n _l)-I_(n_ k
)-IJ.
q,
~ 1- 2x 22 + 3 x 2-1[24- 1 + 29-1 + 24-1-(80-3r'J 6(2+ 1)(3-1) ~ 0.962 => Qo ~ (58.015)( 0.962) ~ 55.976.
5. Critical region is Qo
;:c-
X~.05.[2(2+1)(3-1)J/2 or Qo
;:c-
X~.05.6 or
Qo;:C- 12.59. 6. Conclusion: The null hypothesis is rejected. We conclude that population covariance matrices are not equal. The R commands for above example is given below > sl=matrix(c(78,27,27,61), + nrow=2,byrow=TRUE)/24 > s2=matrix(c(31,8,8,12), + nrow=2,byrow=TRUE)/29 > s3=matrix(c(30,12,12,8), + nrow=2,byrow=TRUE)/24 > nl=2S;n2=30;n3=2S;nc=nl+n2+n3;p=2;q=3
> >
sc~((nl-l)*sl+(n2-1)*s2+(n3-1)*s3)/(nc-q) fl~((nl-l)*log(det(sl)))+
+( (n2-1)*log(det(s2)))+
130
Chapter Five
+( (n3-1)*log(det(s3)))
>
f2~(1/(n1-1))+(1/(n2-1))+(1/(n2-1))-
+(1/ (nc-q))
> bm~(nc-q)*log(det(sc) )-f1 > bc~l-( (2*p A 2+3*p-1)/(6*(p+1)*(q-1)) )*f2 > qO=bm*bc > bm;bc;qO [l,J 58.01252 [l,J 0.9647392 [l,J 55.96695
5.3 Testing Independence of Sets of Variates In chapter 2, we have seen that two multivariate nonnal random vectors
are independent iff the matrix of their covariances is a zero matrix. This concept can be easily extended to the case of several multivariate random
vectors. The independence of sets of variates is useful in several other multivariate procedures like multivariate regression and canonical correlation analysis. The independence of sets of variates can be tested by using the sample covariance matrix computed from a random sample from multivariate normal population. Specifically. the independence of sets of variates is tested by considering following partitioning of the population covariance matrix
1:~
for
a
x
[Xl
=
1:11
1:)2 1:22
1:13 1:23
1:21
1: 2q
1:31
1:32
1:33
1: 3q
1:q)
1:q2
1:q3
1: qq
random
X2
vector Xq
x
J
1:)q
which
has
(5.3.1)
been
partitioned
as
with Xc containing pc variables. The
independence of sets of variates Xl' Xl'
.. , Xq
can be tested by testing the
hypothesis that population covariance matrix given in (5.4.1) is a block diagonal matrix. Specifically, the independence of sets of variates can be tested by testing the null hypothesis
Inference about Covariance Matrices
1:11
0
o
1:22
o
0
o o
131
o o o
(5.3.2)
1: qq
0
We now develop the procedure for testing the hypothesis (5.3.2) in the following as obtained by Box (1949). Suppose a random sample of size n is available from N p (J.l, 1:) with density function
f(x) ~ (271r/211:1-1I2 exp [ -~(x- JlY 1:-' (x- Jl) Suppose
further
that
the
x~ [x, x
Xq ]
2
random
vector
,s
1
partitioned
as
and density of kth subset is
f(x k) ~ (271rPI211:kkl-1I2 exp [ -~(Xk -
S;~ (Xk - Ilk)l
Jlk Y
We will use the likelihood ratio given as A ~ sup{L(
e x): e E 8 0 } sup {L (e x) : e E 8} I
I
rnaxL( (j}) rnaxL( 0)
to derive a test procedure for testing the hypothesis (5.3.2). Now, the likelihood function for a sample of size n under entire sample space is
L( 0) ~ L( x"x 2,·' ·,x n ; ,..,1:) ~ ITJ( X;;,.., 1: ) ~
TI ;~, n
(
271 )-PI211-1I2 1: exp
~ (271 )-nPI21 1: l-nl2 exp
[l( -2'
x; - Jl )1 1:-'( x; - Jl )]
[1-2' ",n
)1, (x; - Jl)] .
L.. ;~, (x; - Jl 1:-
The maximuru likelihood estimates of Jl and 1: for above likelihood function are
_)1 = -nlA.
n lL: J.l, = -x an d'" .:... = = ( x· - x) ( x· - x n )1} }
So the maximuru of L (0) is
132
Chapter Five
L( Q) ~ (27r f"PI2IA/nl-nl2 exp [ -~ L~~l( Xl - X)I x(A/nr'(xl-x)]
~ (27rfnpl2lA/nl-nl2 exp [ -~tr\L~~I( Xl -X)( Xl - X)I A-I)] ~
(27T )-"P121 A/ n l-nl2 exp
[1 ] -"2np
(5.3.3)
Again the likelihood function under the null hypothesis is
L( OJ) ~ L( xl ,X Z ,···,x n ;J1,I:') ~ ITJ( X l ;J1,I:')
~ Il~~l (27T fPIZII:'I-lIZ exp [ -~( Xl - flY (I:' ~ (27T f
r (Xl - fl)]
PIZ Il~JI:klnlz
~ L ~~IL ~~l Xkl - flk YI:;~ ( Xkl - Jik )]. The maximum likelihood estimates of flk and I:kk are x exp [ -
,
J.lk
-
= xk
f. and ':"'kk
and hence the maximum of
(
_)1 = -;;1 A
1", n ( - )( = -;; L...J )=1 X k) - X k X k) - X k
L( (])) q
is
L(OJ ) ~ (27T f"Pl2 IIIAkk /nl-nl2 k=l
k=l
kk
Inference about Covariance Matrices
[1 ]
TIq IAkk I n1-"/2 exp -2 np
~ ( 2J[ )-"PI2 k~l
133
(5.3.4)
Using (5.3.3) and (5.3.4), the likelihood ratio is
A~
npll
q
-nil
(2J[r TIkJAkklnl exp(-npI2) (2J[) "PI21A1nl "/2 exp(_npI2)
IAI nl"/2 IAI"/2 q IA Inl"/2 TI q IA TI k=l
kk
k=l
kk
1"/2
If we use the unbiased estimates of 1: then the likelihood ratio is given as
ISI"/2
lSi
TI~JSklI2
TI~JSkkl
A~
Using the scale factor
c-1 ~ 1
( P 1
12/ n-l
Z3+ 3Z 2];
with Z, ~ p' - L~~lP: and 2/~Z2;Box (1949) has shown that the statistic
Qo ~-(n-l)C-lln(A); has X
2
-
(5.3.5)
distribution with! degrees of freedom. So the hypothesis (5.3.2)
can be rejected at a level of significance if Qo ;:> X!,f . We can also use correlation matrices to test independence of sets of variates. In this case the likelihood ratio is given as:
A~
---,I--,RI_ TI~JRkkl
The test statistic (5.3.5) remains same. A special case of test of independence is when all Pk ~ 1. In this case the hypothesis (5.3.2) becomes
Chapter Five
134
0"11
0
0
0
0
0"22
0
0
0
0
0"33
0
0
0
0
O"gq
:1:~1:' ~
Ho
(5.3.6)
The likelihood ratio for testing (5.3.6) is
A~
lSi
TI!=l
IRI; Skk
and the test statistic is given as
~[(n-q)
Qo and has X
2
-
(2 P+5)]ln(A); (5.3.7) 6 distribution with p( p -1 )/2 degrees of freedom.
Example 5.3: A random sample of size 100 yielded following covariance matrix
43.253
-l.912
-l.l23
-0.737
l.01O
-2.246
-9.953
3l.326
-l.604
7.252
-l.052
2.776
-3.660
18.514
5.741
-2.181
2.617
-l.363
77.575
10.537
-4.277
-5.724
30.034
-2.767
5.381
44.173
2.103
S~
57.650 T est at 5% level that sets (Xl. X 2 ) ; (X,. X 4 ) and (X,. X,. X 7 ) are independent Solution: We test the hypothesis as under I. The null and alternative hypotheses are
Ho
:1:~ [
1:11
0
0
0
1:22
0
o
0
1:33
Inference about Covariance Matrices
vs H, : 1: 7c
1:11
0
0
0
1:22
0
135
[
o 0 1:33 2. Level of significance is a ~ 0.05. 3. Test statistic is Qo ~ -( n -l)C'ln( A) where and C-' ~ 1
1 (
121 n-1 P2:3 +32:2].
4. Computations: We have
S 11
~ [43.253 -1.912].S ~ [18.514 31.326 '
22
30.034 -2.767 44.173
and S33 ~
[
5.741 ] 77.575
5.381] 2.103 57.650
Now
IS,I ~ 1351.293; IS21 ~ 1403.233; IS31 ~ 74567.785 ; lSi ~ 1.1657 x 1011 lSi 1.1657 x 1011
A
TI~JSkk I
~ 0.824
(135 1.3)(1403.3)(74567.785)
Also
C'
~
1
1 (
121 n-1 P2:3 +32:2] 2:3 ~ p3 - ~~, p~ ~ 73 _ ( 23 + 23 + 33 ) ~ 300 2: 2
~p2_qk=l
p2k
~72_(22+22+32)~32·/~.!.2: , 2 2 ~16
_,_ (2 x 300+3 x 32) So C - 1 12(16)( 100 -1)
0.963
Finally Qo ~ -(100 -1)( 0.963)ln( 0.824) ~ 18.432.
Chapter Five
136
5. Critical region is Qo ;:> X~.05,16 or Qo;:> 7.962. 6. Conclusion: The null hypothesis is rejected. We conclude that sets of variates are dependent. Following is set of R commands for above example > cQvx=read.table("d:/MS.c3v",header=T, + sep=", ")
> n ~100;p~7;p1~2;p2~2;p3~3 > sc=as.matrix(covx, nrow = 7,byrow=TRUE) > sl~matrixlcI43,253,-1,912,-1,912,31,3261, + nrow=2,byrow=TRU E )
>
s2~matrixlcI18,514,5,741,5,741,77,5751
+, nr ow=2,byrow=TRU E ) > s3=matrix(c(30.034,-2.767,S.381, + -2,767,44,173, 2,103,5,381,2,103, + 57.670),nrow=3,byrow=TRUE)
> ds1~detls11;ds2~detls21;ds3~detls31 > dsc~detlscl > lem~dsc/lds1*ds2*ds31 > f1~pA3-lp1A3+p2A3+p3A31 > f2~pA2-lp1A2+p2A2+p3A21 > f~f2/2;fc~1- 1 12*f1+3*f21 / 112*f* In-11 1 1 > qO~-ln-11*fc*loglleml > qO [lJ
18,4315
CHAPTER SIX THE MULTIVARIATE ANALYSIS OF VARIANCE
The Analysis of Variance (ANOVA), derived by Fisher (1921), is a popular technique that has widespread applications in many areas of statistics. The technique provides a basis to draw inferences in many situations. It is the baseline of analysis in designed experiments, where the interest is to study the effect of one or more treatments. The technique is based upon partitioning the total variation of the data into various components associated with the treatments and/or combinations of treatments. These components then provide the basis for testing the significance of treatment effects. ANOVA is based upon the assumption that multiple samples are available from normal populations with the same variance. When several samples are available from several multivariate normal populations and the interest is to test the effect of some treatment, like a teaching method or some medicine, then a natural extension of ANOVA is required. A popular extension of univariate ANOVA is the Multivariate Analysis of Variance (MANOVA), which we will discuss in this chapter. MANOVA is basically an extension of ANOVA to vector variates, but the core application remains the same. We will first give a brief description of univariate ANOVA and will then discuss the multivariate version of the technique.
6.1 Univariate Analysis of Variance
Univariate Analysis of Variance (ANOVA) is a technique that is used to test treatment effects when a single sample from each of several normal populations, or several samples from one normal population, are available. The technique assumes that the population variances of the several populations are equal. Several versions of this technique are available, but we discuss two simple cases, known as One Way ANOVA and Two Way ANOVA, in the following.
Chapter Six
138
6.1.1 The One Way ANOVA The One Way ANOVA is simplest of Analysis of Variance techniques and is also known as one variable of classification Analysis of Variance. In one way analysis of variance we assume that independent samples are available from q nonnal populations having different means and equal
variance; that the several samples are available from
N(,ut;o-2).
Alternatively in this technique we assume thatjth observation in kth sample is generated by the linear model X)k =j1+Tt +5 jk ; k=1,2, .. q; j=1,2, ··,nk" (6.1.1) The hypothesis of interest in one way analysis of variance is that all group effects are zero; that is the hypothesis is H 0: Tk ~ O/orall k. which is equivalent to testing the hypothesis of equality of several population means given as Ho :!l,. ~ Ji. 2 ~ ... ~ Ji. q (6.1.2) The one way analysis of variance calls for partitioning of variation of data into its components as under
-x) ~ (Xk -X)+( X;k -Xk ), x n -1 L !=1 L ~:1 is mean of all (X;k
where
=
X)k
n
=
L !=1 n
k
observations
and xk is mean of kth sample. Using above partitioning; the total variation of sample is partitioned into its components as
L :~1L ;:1 (X k; - X)' ~ L :~1L ;:1 (Xk - X)2 + L :~1L ;:1 (X;k - Xk l' (6.1.3) or
SST where
~
SSE + SSE,
_"q ,,". (x _)2
SST - ~ k=l~
j=l
kj -
X
is Total Sum a/Square
SSE
~ L!~l nk (Xk - X)2 is Between Group Sum 0/ Square
SSE
=
L !=1 L ~:1 (x
kj -
xk t
is Within Group Sum ofSquare.
Under the assumption of normality it is well known result that each of above sum of squares follow
(52
X2
-
distribution with (n - I). (q - I) and (n - q)
The Multivariate Analysis of Variance
139
degrees of freedom respectively and are independent of each other The hypothesis (6.1.2) can therefore be tested by using the statistic
F
~ IL!~lnJ'k-x)2l/(q-l)
\L!~lL~:J X}k - xS l/(n - q)
a
_ SSB/(q-l) _ MSB - SSE/(n-q) - MSE·
(6.1 A)
The sum of squares given in identity (6.1.3) can be computed by using following alternative computing formulae SST ~
"q ,, ". (x - X)2 ~ "q ,, ". X2 - nx2. ~ k=l ~ j=l
nk xk2
~ k=l ~ j=l
k.;
k.;
nx- 2 .,
SSB
~ ",q
SSE
~ L!~lL ~:JX}k - xS ~ L!~l (nk -1 )S;
L.tk=l
-
,
The results of One Way ANOVA can be summarized as in the table below d.C
SOY Between
q-l
SS SSE
Within
n-q
SSE
Total
n-l
F-ratio MSB ~ SSB/(k-l) Fa -MSE/MSE MS
MSE ~ SSE/( n-k) SST
The statistic (6.1.4); under the hypothesis (6.1.2); has anF-distribution with (q - 1) and (n - q) degrees of freedom and hence the hypothesis (6.1.2) can be rejected at a level of significance if Fa '2 F UI}-l, n _ q. Example 6.1: Following data represent scores of randomly selected students from four colleges in a competitive examination:
College 1 74 68 81 62 69
College 2 72 86 72 77 69 83
College 3 57 76 83 87
College 4 67 61 73 74 81 86 84
140
Chapter Six
Can we say that average score of students of these four colleges is same in this competitive examination? Solution: We test the hypothesis as under. I. The null and alternative hypotheses are
Ho
~ 1-'2 ~ 1-'3 ~ 1-'4
:1-',
H, : I-'k " I-'m for at/east one pair. 2. Level of significance is a ~ 0.05 .
. . . F
3. Test statIstIc
IS
MSE MSE
0 = -_.
4. Computations are shown below Colle~e
Colle~e
1
2
Colle~e
74 68 81 62 69
72 86 72 77 69 83
57 76 83 87
Total nk
354 5
459 6
xk
70.80
76.50
Now
SST
SSE =
2
2
243 4
1581 22 -n
60.75
75.00
71.86 = x
2
2
2
=
(74 + 68 + .--+ 86 + 84
~
115283 -113616.41
",q L.tk=l
Xk2
-
4
67 61 73 74 81 86 84 525 7
L!=lL::IX~ _riX
=
Colle~e
3
~
) - ( 22x
74.59')
1666.59
n.i(2
~ (5 X 70.80 2 + --- + 7 X 75.00 2 ) - (22x 74.59 2 )
= 114313.95 - 113616.41 = 697.54. The results are summarized in the table below
The Multivariate Analysis of Variance
SOY Between Colleges
d.f. 3
SS 697.54
MS 232.513
Within Colleges Total
18 21
969.05 1666.59
53.836
141
F-ratio Fa -4.319
5. Critical Region is Fo 2:: F U{} -l,n _ q or Fo 2:: FO.05 ,3,18 or Fo 2:: 3.16. 6. Conclusion: The null hypothesis is rejected. We conclude that average score of four colleges in this competitive examination differ significantly.
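Example 6.1 can also be run directly in R with aov(); a sketch using the scores as printed above (the College 3 column as printed does not reproduce the book's column total of 243, so the output may differ slightly from the book's table):

score <- c(74, 68, 81, 62, 69,
           72, 86, 72, 77, 69, 83,
           57, 76, 83, 87,
           67, 61, 73, 74, 81, 86, 84)
college <- factor(rep(1:4, times = c(5, 6, 4, 7)))

fit <- aov(score ~ college)   # one-way ANOVA
summary(fit)                  # F statistic and p-value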
6.1.2 Two Way ANOVA The one way ANOVA is a useful technique for testing treatment effects when independent samples are collected completely at random from several groups. Often the randomness of a sample is restricted under some other criteria; say blocking. In this case the one way ANOV A is not a suitable method to compare the treatment effects. The technique known as Two Way ANOVA is a useful technique which can be used to test treatment effects in situations where randomness within treatment is restricted under some criteria. The two way ANOVA is a powerful technique which can be used to analyze many types of designed experiments which includes Randomized Complete Block Designs, Two Factor Factorial Designs and many others. We will now discuss two way ANOVA as a method to analyze a Randomized Complete Block Design under two situations which include the case when we have single observation per treatment and block combination and the case when we have more than one; but equal; number of observations per treatment and block combination. The two cases are discussed below. In a Randomized Complete Block Design with single observation per treatment and block combination it is assumed that the (J. k) th observation; observation for kth treatment in}th block; has been generated from the linear model X}k =Jl+jJ} +rk +8}k;j=1,2, .. ,a;k=1,2, .. ,b (6.1.5) where Sjk are N( 0;
(52) . The model (6.1.5) is popularly known as model
for Randomized Complete Block Design without Interaction. The model (6.1.5) can be used to test two hypotheses simultaneously which are HOI: Treatment Effects are same or HOI : Tk = 0 .
Chapter Six
142
H02: Block Effects are same or H02 : fJ}
=0 .
The two way ANOV A technique provides basis for testing of above two hypotheses by partitioning total variation of data into various components as under
,," "b (x j k_X)2 =b"" (xJ_X)2 +a"b (x _X)2 L..J ;=1L.. k=l ·· L..;=1 .. L.. k = l k ..
"",,b( ___ )2 ; j=lL.. k=l x jk - X I - X k + X
+ L..
X: k is mean for treatment k and 'X':'.
where xi" is mean for block);
of all observations. In equation (6.1.6) SST
=
L ~=1 L :=1 (X jk - X
SSB
=
b"" (x) L.. j=l
SSTr SSE
=
- X)2
a L~=l ("X: k
IS
mean
is Total Sum of Squares
is Block Sum a/Squares
_"X:.)2 is Treatment Sum a/Squares and
,," "b (x
= L..j=lL..k=l
t
(6.1.6)
_ _ +X_)2
jk - x I -Xk
is Error Sum
0/
Squares. Under the assumption of nonnality each of the above sum of squares has
X 2 - distribution with (ab - I), (a - I), (b - I) and (a - I)(b - I) degrees of freedom respectively and are independent from each other The hypothesis related to treatment effects can therefore be tested by using the (J'2
statistic
aL::Jx:k _x:)2 /(b-I) 01 L: ;~1 L::~1 x). - X: + X:)' I( a -1)( b -I )
F,
=
(X}k -
k
SSTr/(b-l) SSE/(a-l)(b-l)
(6.1.7)
which under Hal has an F-distribution with (b - I) and (a - I)(b - I) degrees of freedom. The hypothesis related to block effects can be tested by usmg
p; =
bL~JXI -xN(a-l)
02
L~=lL:=l(xrXI -xk+xN(a-l)(b-l)
The Multivariate Analysis of Variance
143
SSE/(b-l)
(6.1.8)
SSE/(a-l)(b-l)
which under HOI has anF-distribution with (a - I) and (a - I)(b - I) degrees of freedom. The hypotheses HOI and H02 are therefore rejected at a level of significance
if
F OI ;:> Fa,(b-l),("-l)(b-l)
and
F o2 ;:> Fa,(,-l),(,-l)(b-l)
respectively. For computational purpose we use following formulae to compute various sum of squares
SST -- "" L.J j=l " L.Jbk=l (x jk SSB ~ b"" (x j .- .X L.J j=l
SSTr
=
"Vb (-
a L...Jk=l
X. k -
X
-..
)2 - "" L.J -
)2 ~ b"" x L.J
- )2
x..
j=l
=
"b
L.J
j=l 2
j
nx
_
"Vb -2
a L...Jk=l X. k
k=l
-
x2
jk-
nx 2 ; n ~ ab
2
-2
nx..
SSE ~ SST - SSE - SSTr The results of two way ANOVA can be summarized as below d,[, SOY SS b-I SST, Treatments a -I Block SSB (a-I)(b-I) Error SSE n-I Total SST We have dIscussed the appl!catlOn of two way ANOVA wIthout mteracllon to analyze a Randomized Complete Block Design when one observation per treatment and block combination is available. In some situations we have more than one observation per treatment and block combination in a Randomized Complete Block Design. In such situations the analysis of the design can be carried out by using the technique known as Two Way ANOVA with Interaction. \\Then more than one observations are available in a Randomized Complete Block Design then we assume that the lth observation injth block under treatment k is given by the model X jk1
= j1 + /3j + Tk + rjk + 5 jk1
k~I,2, "b;j~I,2, ",a;l~1,2,
",n;
(6.1.9)
where Sjk are N( O;a" ). The above model can be used to test following hypotheses simultaneously HOI: Treatment Effects are same or HOI: r k = 0 .
Ho2 : Block Effects are same or H02 : fJ}
~
0.
144
Chapter Six
H03: Interection Effects are same or H03 : r}k
=0
Above hypotheses can be tested by partitioning total variation of data into various components as under:
L:~lL:~lL;~lhl-X)' ~bnL:~JxJ -x)' +anL:~Jxk _X)2 'V" 'Vb (-
+n L..Jj=lL..Jk=l
_
_
X jk -Xj -Xk
+
_)2 'V" 'Vb 'V" ( _ )2 X + L..Jj=lL..Jk=lL..JI=l x jk1 -x , (6.1.10)
SST
~
SSB + SSTr + SS(B x Tr) + SSE.
Each of the sum of squares in (6.1.10) follow
di
-distribution with
appropriate degrees of freedom. The results of two way ANOVA with interaction can be summarized in the table as below· SOY d.f. SS Treatments b-l SSTr Block a-I SSB (a-l)(b-l) Interaction SS(BxT,) ab(n - 1) Error SSE Total abn- 1 SST The hypotheses HOI, H02 and H03 are tested by usmg followmg stallsllcs:
1';" =
SSTr/(b-l) . SSE/ab(n-l) ' Fa2
=
SSB/(a-l) SSE/ab(n-l)
sst B x Tr )/( a-l)(b-l)
and
Fa, =
SSE/ab(n-l)
.
Each of the above statistics has F-distribution with appropriate degrees of freedom. Example 6.2: Four treatments are tested in five randomized blocks and
£o11 owmg d at a was 0 bta me d B-1 B-2 B-3 B-4 B-5 Total
xk
T-l
T-2
T-3
T-4
Total
x).
25 26 30 26 28 135 27.00
29 30 35 36 36 166 33.20
35 39 36 37 32 179 35.80
37 38 41 41 35 192 38.40
126 133 142 140 672
31.50 33.25 35.50 35.00 32.74 x = 33.60
The Multivariate Analysis of Variance
145
T est significance of treatments and blocks. Solution: The significance of treatments and blocks is tested below: I. The hypotheses to be tested are HOI: TreatmentMeans are Equal H02 : Block Effects are Same. 2. Level of significance is a ~ 0.05.
F
3. Test statistic is
=
01
MSTr . F MSE' 02
MSE MSE·
4. Computations are shown below:
B-1 B-2 B-3 B-4 B-5 Total
xk
T-l
T-2
T-3
T-4
Total
x).
25 26 30 26 28 135 27.00
29 30 35 36 36 166 33.20
35 39 36 37 32 179 35.80
37 38 41 41 35 192 38.40
126 133 142 140 672
31.50 33.25 35.50 35.00 32.74
a
SST
=
b
LLx~k
x = 33.60
_nx2
)=1 k=1
=(25 2 +29 2 + ... +41 2 +35 2 )-(20x33.60 2 ) = 474.80 ",b
-2
SSTr = a L...Jk=1 X. k
-2
-
nx..
= 5( 27.0 2 + 33.22 + 35.8 2 + 38.42 ) - (20 X 33.60 2 ) ~
358.00
SSE = bL~=IX~
_nx2
~ 4( 3l.502 + ... + 32.75 2 )- (20 X 33.60 2 ) ~
43.30
SSE = SST -SSTr-SSE = 73.50.
146
Chapter Six
The results are summarized below:
SoY Treatment Block Error Total
d.f. 3 4 12 19
SS 358 43.3 73.5 474.8
MS 119.333 10.825 6.125
F-ratio Fo! ~ 19.483 F02 - 1.767
5. Critical regions are: FOl 2:: FO.05,3,12 or FOl 2:: 3.490 and F02 2:: FO.05,4,12 or Fa2:> 3.259. 6. Conclusion: We reject the hypothesis about treatments and accept the hypothesis about blocks. It is concluded that treatment means differ significantly whereas the block effects are same. We now discuss the Multivariate ANOVA in the following.
6.2 The Multivariate Analysis of Variance The Multivariate Analysis a/Variance is extension of univariate analysis of variance to vector variates. Just like univariate ANOVA, the Multivariate ANOV A (MANOVA) partitions total sum of squares and cross products into
various components; associated with various treatments; and then those components are used to perfonn tests of significance about various treatments. The technique is based upon the assumption that independent
random samples are available from multivariate normal populations with same covariance matrices and the observations can be generated under various models just like (6.1.1) and (6.1.5). We will discuss two simple cases of Multivariate ANOVA in the following. as discussed by Timm (2002).
6.3 One Way Multivariate ANOVA The simplest of Multivariate ANOVA is the case when total sum of squares and cross products is divided into only two components; one associated with the treatment under consideration and one associated with random error. The technique in this case is known as One Way A1ANOVA is also called one vector classification ANOVA. In this technique it is assumed that the observation vectors can be generated from the model x}k~J.I+Tk+E}k;k~I.2 •...• q;j~I,2,
...• nk
(6.3.1)
The Multivariate Analysis of Variance
where
E)k
147
are i.i.d as Np (0; 1:), J.l is overall mean effect and
T k
is effect
of kth treatment or group. In one way MANOVA the total sum of squares and cross products is partitioned into components associated with T k and E)k
as below. Consider
(X)k -
i)
= ( X)k -
i k + i k - i) and hence the total
sum of squares and cross products is given as
L !~1 L :~1 (x
x) (X X Y ~ L !~IL :~1 x + x x)( }k -
}k -
(X}k -
k -
k
X}k -
xk + xk - xY
~ L~IL~:IICxk - x)+ (X}k -Xk )l\(xk -x) +( X}k -xk)V ~
L ~~IL ;:JXk - X)(Xk - xi +"q ,,"k (x -x )(x _X)I L....J k=lL....J j=l
)k
k
)k
k
I
L :~lL ~:1 (Xk - x)( X}k - X k ) + L ~~IL ;:1 (X}k - Xk)(Xk - xi· +
Now the third and fourth sum vanishes as
L ::1 (X)k -
Xk )
=
0 and hence
the partitioning of total sum of squares and cross products in one way MANOVAis
"q
,,"k (x jk - x)(x jk - X)I ~
L....J k=lL....J j=l
+
L
!=lL
"q
+ L....J
Or In (6.3.2)
~:J x jk
-
"q
,,"k (xk - x)(xk - X)I
L....J k=lL....J j=l
xk )( X jk
I -
Xk
)
""k (x - X )(x - X )1
k=lL....J j=l
SSCP(T)
jk
~
k
jk
k
SSCP(W) + SSCP(E)
SSCP(T)~ST ~L~~IL;:JX}k-X)(X}k-X)1 Squares and Cross Product Matrix,
IS
(6.3.2) Total Sum of
148
Chapter Six
SSCP(W) = SB
= L!~l nJxk - X)(Xk - X)I
is Between Group Sum of
Squares and Cross Product Matrix and SSCP (E)
= Sw = L ~~1 L ~:J x Jk
xk )( X
-
Jk -
xk )
I
is Error Sum of
Squares and Cross Product Matrix. Since in one way 1.1ANOVA it is assumed that samples are drawn from
multivariate nonnal distribution so each of the sum of squares and cross product matrix has Wishart distribution. Specifically the Total Sum of Squares and Cross Product matrix; ST; has Wp ( n -1, 1: ) , Between Groups Sum of Square and Cross Product matrix; SB; has Wp ( q -1, E) and Within Groups Sum of Squares and Cross Product matrix; Sw; has Wp (n - q,E); where n
= L !=1 nk
The ratio (6.3.3)
therefore; has Wilks (1946) Lembda distribution. Rao (1948) has shown that the statistic
_[ms- P(q-l)/2+1] 1_1',11, F. x --:-c-o p(q-l) f,lIJ has
an
F-distribution
approximate
{ms- p(q-l)/2+1}
(6.3.4)
p( q-l)
with
and
degrees of freedom; where
+q
P m=(n-l)--and s 2
[p2(q_l)2_ 4 ]
2
=
2
2·
P +(q-l) -5
The statistic (6.3.4) can; therefore; be used to test significance of treatment effects in one way MANOVA. The results of one way MANOVA can be summarized in the table as below
SoY Between Groups
q-l
SSCP SB
Within Groups Total
n-q n-l
Sw ST
d.f.
A A
=
ISwl/ISw + SBI
The Multivariate Analysis of Variance
149
We have following alternative computing formulae for various sum of squares and cross prcxluct matrices.
,,". (x - x)(x - X)I ~"q ,,".
S T ~"q ~ k=l ~
j=l
Jk
~ k=l ~ j=l
Jk
"q __
"~ = "" (- -) (- _)1 = ~k=1 YltXkXkI ~k=l Ylt Xk - X Xk - X
X
Jk
Xl
Jk
-
nxx l
_I
nxx
Sw ~L:01L~:1(Xjk-Xk)(Xjk-XkY ~L:olh-l)Sk where Sk is sample covariance matrix for kth group. One popular use of one way MANOVA is to test the hypothesis of equality of several population mean vectors. In the following we will show that the likelihood ratio criterion for testing equality of several mean vectors yield (6.3.3) as a statistic for testing hypothesis about equality of several mean vectors.
6.3.1 Testing Equality of Several Mean Vectors Suppose that independent random samples are available from 1. 2, ... , q and by using these samples we want to test the
Np (J.lk;1:); k~ hypothesis
Ho :J.l1 ~J.l2
~"'~J.lq
(6.3.5)
We will use the likelihood ratio criterion to test the above hypothesis. Now likelihood ratio is given as
11. ~ sup{L(B Ix) BE 8 o } sup {L (B
I
x) : B E
8}
maxL( w)
L(w)
maxL(O)
L(O )'
(4.1.1)
where notations have their usual meanings. Now since the kth population is
Np (J.I,;1:)
so its density function is
l
f (Xk ) ~ (2ff pI211:1- 1I2 exp [ - ~(Xk - J.lk )/1: -1 (Xk - J.lk )
r
and hence the likelihood function under entire sample space is
L(O) ~ IT:~IL( Ok) ~ IT:~n~:/( Xjk ;J.lk,1:) ~
IT :~n ~:l (2" r P1211:r 1/2 xex p [ -~(x jk - 11k Y1:- (x jk - 11k)] 1
150
Chapter Six
~
(2J!" f"PI21:!;1-"/2 xex p [
-~ L:~IL;:I(X;k - I'k Y :!;-1 (X;k - I'k)];
where n ~ L~~lnk . The maximum likelihood estimates of Ilk and 1: for above likelihood function are
, _-
Jlk-Xk
"'"k (
'" - 1 '" q an d ":";--L...Jt=lL.t =l j
n
- )(
Xkj-X k
- )1 _ 1 S
Xkj-X k
--
n
W.
Using these estimates we have
L( b) ~ (2Jr r"PI21~Sw 1-"12 -~ L:~lL~:l( X;k - X Ys;i; (X;k - xk)] nl2 ~ (2Jrr"'/21~ Sw lxexp [
k
xexp[-~trlL~~lL~:l(
X jk
-xk )( X jk -Xk)1 s~l]
~(2Jrr"'/21~Swl-nI2 exp [ -~tr(Ip)]
1
~(2Jrr"'/21~Swl-nI2 exp[-~np
(6.3.6)
Again the likelihood function under the null hypothesis is
L( cu) ~ IT:~l(CUk) ~ IT:~n;:/( X;k;I',:!;) 1/2 ~ IT :~IIT ;:1 (2J!" f PI21:!;1xex p [
~
-~(X;k -I'Y:!;-I(X;k
q (2J!")-"PI2 1:!;1-"/2exP [_Jc'V 'V"k 2 ~k=lL..
j=l
X:!;-I(X;k -I')l
-1')] (x
jk
_
,...
)1
The Multivariate Analysis of Variance
151
The maximum likelihood estimate of J.l and 1: for above likelihood function are
A_I n
"q
"n,
fL=X = - L..k~lL..j~lXjk
and
t=..!:.n L~-lL~~J +SB). - X kj - X)( X kj - X)I =..!:.(Sw n Using above estimates in L( m) we have L( m) = (271"r"'I2I~( Sw + SBf/2 x exp [ -
%L ~~lL ~:J x XY(Sw + SB r' (X x) ] Jk -
Jk -
= ( 271") -",12 i( Sw + SB) / {nl2
xexp [ -%trIL~~lL ~:l (x Jk - x)( X Jk X(Sw
np12
=
(271"r
=
(271" r "P
12
+sBf1lJ
I(Sw +SB)/nl-n12 exp [ -~tr(Ip)]
or
=
1
I( Sw + SB )/nl-n12 exp [ -~ np
Using (6.3.6) and (6.3.7) in (4.1.1) we have /I..
X)I
(271" r"PI2 1( SW + SB )/ nl-nl2 exp( - np/2) (271" r"PI2ISw /nl-nI2 exp( - np/2) nl2 ISwl
(6.3.7)
152
Chapter Six
which is exactly (6.3.3). The statistic (6.3.4) can therefore be used to test the hypothesis (6.3.5). Some special cases and associated F-statistic for testing (6.3.5) are given in the table below q
p
F-ratio
df
Any
2
[n- q -1]x 1-.JA q-1 .JA
2(q-1),2(n-q-1)
2
Any
3
Any
[n-
;-l]x l~A
[n- p -2]x 1-.JA .JA p
p,(n-p-1) 2p,2(n- p-1)
The hypothesis
Ho : J.l 1 = J.l 2 = ...... = J.l q can be tested by using any of the statistic from above table or by using (6.3.4). Example 6.3: Independent random samples of sizes 12. 10. 8 and 13 were selected from four different schools and information about student's height; in inches; and weight; in Ib's; was obtained. Following mean vectors and covariance matrices were computed from resulting samples Xl
58.2]
= [ 127.3
;X2
[59.8]
= 12l.4
;X3
[56.7]
= 120.9
;X4 -
[58.7] 124.5
S =[2.134 3.331]. S =[3.363 2.674] I 6.014 ' 2 5.224 S = [1.485 2.065]. S = [2.752 4.l74] 3 7.884 ' 4 9.813 Can we conclude that the average height and weight of students in four schools are equal?
Solution: We test the hypothesis in the following. 1. The null and alternative hypotheses are H 0: J.ll
= J.l2 = J.l3 = J.l4
The Multivariate Analysis of Variance
153
HI: Atleast two mean vectors differ 2. Level of significance is a ~ 0.05. 3. Since p = 2, therefore test statistic is
Fa
~
n- q -l] 1-,fA I . where A [ q-l x vA
4. Necessary computations are given below
S
L:
-
w -
q
k=l
(n
k
125.25] 286.11
-1 ) S - [97.16 k 125.25
ISwl ~ 12110.885 _
l"q
_
[58.444]
Now x ~ -;;- L..k=l nkxk ~ 123.891
~ [146876.084 311349.712]
nxx/
660002.914
"q __ /~ [146920.37
311349.71] 660280.81
and L..k=l nkxkxk
~ [44.286 -0.002]
So" '1l
and
277. 896 '
SW+SB ~ [ 141.45
Hence A
~ ISw I
ISW+SBI
125.25] 564.01
;ISw +SBI ~ 64091.652.
12110.885 64091.652
0.189.
Finally
F ~[n-q-l]xl-,fA o q -1 -,fA---'A~
~ [43- 4-1] x 1-.J0i89 ~ 4-1
vO.189
16.469.
5. Critical region is Fa 2:: F a,2(q -1),2(n -q-l) or Fa 2:: F a,6,76 or Fa 2:: 2.220.
154
Chapter Six
6. Conclusion: We reject the null hypothesis and conclude that students in these four schools differ significantly with respect to average height and average weight FollowingR commands will perform above analysis
> > > > > >
n1~12;n2~10;n3~8;n4~13;g~4
ml=matrix(c(58.2,127.3),nrow=2,byrow=TRUE) m2=matrix(c(59.8,121.4),nrow=2,byrow=TRUE) m3=matrix(c(56.7,120.9),nrow=2,byrow=TRUE) m4=matrix(c(58.7,124.5),nrow=2,byrow=TRUE) sl~matrix(c(2.134,3.331,3.331,6.014),
+ nrow=2,byrow=TRUE)
>
s2~matrix(c(3.363,2.674,2.674,5.224),
+ nr ow=2,byrow=TRUE )
>
s3~matrix(c(1.485,2.065,2.065,7.884),
+ nr ow=2,byrow=TRUE )
>
s4~matrix(c(2.752,4.174,4.174,9.813),
+ nr ow=2,byrow=TRUE )
> cn=nl+n2+n3+n4 > cm=(nl*ml+n2*m2+n3*m3+n4*m4)/cn > s w~(( n1 -1)*sl+( n 2-1)*s2+( n 3-1)* * s3+( n4 -1)*s4)
> > >
sb~( n1 *(m1%*%t(m1) )+ n 2*(m2%*%t(m2))+ + n3*(m3%*%t(m3))+n4*(m4%*%t(m4)))+ (c n *(cm%*%t(cm)))
st~s w +sb;lem~det(s w )/det(st) fO~((cn-g-1)/(g-1))*
+ ((l-sqrt(lem) )/sqrt(lem))
> lem;fO [lJ [lJ
0.1889728 16.47153
If we have raw data then following R commands will perform one way Multivariate ANOVA
> datal=read. table (lid: / MA.NOVA . csv", + header=T, sep=", ")
> attach (datal) > A=as.factor(id) > data.ma=manova(cbind(yl,y2)~A) > summary(data.ma,"Wilk s")
The Multivariate Analysis of Variance
155
6.3.2 Testing Equality of Components of Mean Vectors In previous section we have seen that how one way 1.1ANOVA can be used to conduct test of significance regarding equality of several mean vectors. We have seen that the test statistic
_[ms- P(q-l)/2+1]
Fo -
p(q-l)
1_/,,11,. 11.1/,
x,
(6.3.4)
provide basis for testing of the hypothesis H ° :fl, = fl2 = ...... = fl4 under multivariate normality. Once the hypothesis of equality of several mean vectors is rejected we can test the equality of components of mean vectors by using suitable entries of matrices Sw and SB. Recall that the matrix SB is matrix of between group sum of squares and cross products and hence the ith diagonal element of SB is between group sum of squares for ith variable in the analysis; that is S E" = SSE, . Again the matrix Sw is matrix of within group sum of squares and cross product matrix and hence ith diagonal element of Sw is error sum of squares for ith variable in the analysis; that is Sw" = SSE, . Hence the test of hypothesis
HOi: f.1n
= f.1n = ... =
Pig
can be conducted by using the statistic
Fa.
SSBj(q-l) . SSEj(n-q)'
(6.3.6)
which under Ho, has an F-distribution with (q - 1) and (n - q) degrees of freedom. Example 6.4: Consider the data of Example 6.3. Can we conclude that students in four schools have equal averages on two variables seprately? Solution: LetX, is Height andX2 is Weight, then the hypotheses to be tested are H 01 :,ull = ,u12 = ,u13 = ,u14 and H 02 :,u21 = ,u22 = ,u23 From Example 6.3 we have S= B
44.286 [
=
,u24·
-0.002] [ 97.l6 125.25] andS= 277.896 W 125.25 286.l1
So SSE, = 44.286, SSE 2 = 277.896, SSE, = 97.16 and SSE2 = 286.11.
156
Chapter Six
Necessary computations are shown below
SoV Schools
Variable Heizht Weight Heir;ht Weizht Height Weir;ht
Error
Total
d.f. 3 3 39 39 42 42
SS 44.286 277.896 97.16 286.11 141.446 564.006
MS 14.762 92.632 2.491 7.336
F- ratio Fa! ~ 5.926 Fa2 - 12.627
The critical region is Fa 2:: F a,(q -l),(n _ q) or Fa 2:: F 0.,3,39 or Fa 2:: 2.845. Since both computed values are larger than the critical value therefore the two hypotheses are rejected. We may conclude that the students in these four schools differ significantly with respect to their height as well as weight
6.3.3 Confidence Interval for Pairwise Difference Between Com ponents of Mean Vectors When the hypothesis of equality of mean vectors is rejected then we can test the pairwise difference between components of mean vectors by constructing the confidence intervals for difference between means. Specifically we are interested to construct the confidence interval for the difference (}lik - ,uih ) where f.i!k is ith component of J.lk and J.1!h is ith
component of estimate of
fl h .
The interval can be constructed by noting then an
(f.1i k - ,uih)
can be readily written as
(X:k - X;h)
with an
estimate of the variance given as
Var(x.k-X.h)~ Uk + ~J( ::hq} where Sw" is ith diagonal element of Sw. The 100(1 - a)% confidence interval for the difference (Pik - Jlih ) is therefore given as: ( XIk -
xIh )+- t a l m,( n - q )
where m=pq(q-l)/2
_I +_1 (n nh k
J(~J; n- q
(6.3.7)
The Multivariate Analysis of Variance
157
Example 6.4: Consider data of Example 6.3 and construct the confidence interval for true difference between average height and average weight
between fhird and fourth school. Solution: We are interested to construct confidence intervals for differences
('u'3 -'u'4) and
(,1123 - ,1124) . We know that fhe 100(1 - a)% confidence
interval for the difference ()lik - )lih ) is r;---~---,-
(X,k-X,h)±11 (_) (_I+_IJ(Sw" am , n q n n n _ q J. k
From example 6.3 we know that p
~
h
~
2, n3
m ~ (2x 4 x 3)/2 ~ 12 and la/m.ia-g) =
~ 13, n ~ 3.059 .
8, n4
10.05/12.39
~
43 and q ~ 4 so
Also from example 6.3 we have
"3
56.7] [ 120.9 ;
=
"4
=
[58.7] 124.5 ; Sw
[ 97.16 125.25] 125.25 286.11
=
97.16 and SW22 ~ 286.11. Now the interval for ('u'3 -'u'4) is So S Wll
~
( -X13
-
X 14) ± t
I (_) a m, n q
( - I + -I n) n
4
J( -n _-q J SWll
or
(56.7-58.7)±3.059
(.!:.+~J(97.16J 8 13
43-4
or
-2.0± 2.17 or -4.17 1:_,~[0.342 13
--{).316] 0.368
so
where 110 is identity matrix of order 10. Further
Multivariate Regression Analysis
201
---{).3161 1O ]. 0.3681 10 The generalized least square estimator of
Jl
IS
~ ~ (D/Q-1D r'D /Q-l y . Now we have 3.421
Din-1D
133.079
85.526
-3.158
-60.000
-90.000
-103.263
5216.763
3340.316
-122.842
-2349.474
-3513.789
-4042.105
2156.632
-78.947
-1515.421
-2257.579
-2586.947
3.684
70.000
105.000
120.473
1366.842
2009.737
2297.105
3006.684
3445.842
=
3970.474
and
-31.473 -1211.158 DI !rly =
-779.973 91.052 1732.158 2595.000 2973.421
So
34.813 0.482 0.465
p~(D/!rlDr'D/!rly= 63.571 0.377 0.121 0.196 The seemingly ill1Telated regression models are therefore Yl ~ 34.813 + 0.4821:"1 + 0.456X2 and Y2 ~ 63.571 + 0.377X3 + 0.121X4 + 0.196Xj .
202
Chapter Seven
The fitted models can be used to estimate the predicted values. FollowingR commands will perform above analysis > datasur=read.table("d:/SUR2c.csv", + header=T, sep=", ")
> attach (datasur) > n ~10;k1~3;k2~4 > yyl=as.matrix(datasur[,c(" y l") ] + , nr ow= n ,byrow=TRUE )
> yy2=as.matrix(datasur[,c(" y 2") ] + , nrow= n ,byrow=TRUE )
> xxl=as.matrix(datasur[,c("cl", "xl", "x2")] + , nrow=n,byrow=TRU E )
> xx2=as.matrix(datasur[,c("c2", "x3", "x4", > > > > > > > > > >
+ "xS")],nrow=n,byrow=TRUE ) al=matrix(O,n,k2);a2=matrix(O,n,kl) txl=cbind(xxl,al);tx2=cbind(a2,xx2) my~rbind(yy1,yy2);mx~rbind(tx1,tx2) id1~0.342*diag(n);id2~-0.316*diag(n) d3~0.368*diag(n)
oml=cbind(idl,id2);om2=cbind(id2,id3) ome=rbind(oml,om2) fl=t(mx)%*%ome%*%mx; f2=t(mx)%*%ome%*%my beta~solve(f1)%*%f2
beta
0.4789412 34.9688584 0.4640131 63.7221401 0.3747539 0.1208055 0.1931064
[,1 J
7.3.3 Properties of Estimates The seemingly ill1Telated regression model, in stacked fOnTI, is given in
(7.3.4) as y
~
DJl + E,
(7.3.4)
with E(E)~O and COV(E)~!!. where !!~1:®I. The assumptions on f:
provide following assumptions on y
E(Y) ~ DJl and Cov(y) ~!! The generalized least square estimator of
Jl
is given in (7.3.6) as
Multivariate Regression Analysis
P ~ (D/n-ID
203
f D/n-ly
~ (DI (I:®ItD
r
DI (I:®It y.
( 7.3.6)
In the following we have discussed some properties of estimator (7.3.6). 1.
The estimator f3 is an unbiased estimator of In order to prove this we consider (7.3.6) as
Il.
p~(D/n-IDfD/n-ly . Now using y =
DJl + E,
P ~ (D/n-ID
in above we have
f D/n-l (DIl + E)
~ (D/n-ID fD/n-IDP+ (D/n-IDfD/n-IE ~ Il+ (D/n-ID( Din-IE. Applying expectation we have
E(p) ~ E[1l +(D/n-ID ~ Il+ (D/n-ID
r Din-IE]
r D/n-IE( E)
~f3asE(E)~O.
So the estimator is unbiased. 2.
The covariance matrix of
Jl
is
We have
Cov(p) ~ E[(P - f3 )(p - f3
n
From (7.3.8). we have
P-Il ~(D/n-IDfD/n-IE So
(7.3.8)
204
Chapter Seven
or
Now using the fact that E
(EEl) ~ g
Cov(i3) ~ (DI g-ID ~ (D/g-1D
r r
we have
DI g-lgg-lD(D I g-ID
D/g-1D(D/g-1D
r
~(D/!rlDr' ~(D/(L®Ir'Dr,
r (7.3.9)
as required. It is to be noted that the feasible generalized estimator Jl is neither unbiased
nor its covariance matrix is as given in (7.3.9).
7.3.4 Inference in Seemingly Unrelated Regression Model The seemingly ill1Telated regression model, in stacked form, in given in (7.3.4) and the feasible generalized least square estimator of parameter Jl is given in (7.3.7). An important task in seemingly unrelated regression model is to test significance of the model parameters. A possible test of hypothesis H a : RJl ~ c. where R is a (J x K) matrix of coefficients and c is a (J x I) matrix of test values. The hypothesis Ha : RJl ~ c can be tested by using the statistic
(R~-Cn R(D/g-1DrRJ(R~-c)l
"I g-l,,/( np _ K)
where
e~ y- DJl.
(7.3.10)
The statistics Fa, under Ha, has an approximate F-
distribution with J and (np -
K)
degrees of freedom. For fairly large
sample sizes, an alternative statistic for testing Ho : RJ3
=
c is
Multivariate Regression Analysis
205
(7.3.11) which has approximate X
2
-
distribution with J degrees of freedom.
Various choices of R enable us to conduct various tests of significance about parameters of the model. For example if we want to test joint significance of all the predictors then R is a (( K -
p) x K)
matrix
gIVen as Om) Om
Om)
1m)
0
z
(mzxm j )
Om
z
0
hxmz)
0,,\
Om)
0
I mz
Om
Om
0
z
z
h~mp) h~mp)
R~
where 0mi is a
O(mixm h
)
(m
j
xl) vector of zeros, Imi is identity matrix of order i,
is an (m j x mh
)
matrix of zeros and mi is number of predictors
in ith regression model. Also, c is a (( K significance of the model.
p) xl)
vector of zeros in testing
CHAPTER EIGHT CANONICAL CORRELATION ANALYSIS
In previous chapter we have discussed multivariate regression analysis which deals with studying dependence of a set of response variables on a set of predictors. In section 7.2, we have discussed some measures which provide strength of dependence between two set of variables. The measures discussed in that section are based upon the assumption that one set of variables is random and the other set of variables is fixed. Often, it happen that we have two sets of random variables and we want to see the strength of interdependence between these two sets. Recall that the correlation coefficient is a popular measure that can be used to decide about the strength of interdependence between two random variables. When we have two sets of random variables then we need an extension of correlation coefficient which provides strength of interdependence between these two sets. In this chapter we will discuss a popular analysis which is used to study the strength of interdependence between two sets of variables, known as Canonical Correlation Analysis (CCA). Canonical correlation analysis provides us methods to compute strength of interdependence between two sets of variables, known as canonical correlation coefficient. In canonical correlation analysis we obtain representatives of sets of variates, known as canonical variates, and obtain correlation coefficient between these canonical variates. We discuss the concept of canonical correlation in the following.
8.1 Canonical Correlation and Canonical Variates Suppose we have two sets of variates yl = [~Yz Xl =
[Xl
Xl
XqJ
YpJ
and
which are measured on same set of
sampling units. The canonical correlation measures the strength of interdependence between two sets of variables y and x. In section (7.2) we have seen that a measure of overall association between two sets of variates IS gIVen as
Chapter Eight
208
R;.x = IS~;SyxS~Sxy I= TI;=1~2, where s
=
rnin(p, q) and r;2 is ith eigen value of S~;SyxS~;SXY. We have
seen that this measure assumes very small values and hence is not a useful measure of interdependence between two sets. The eigen values
r;2 provide
meaningful measure of interdependence between the sets y and x. The square roots of eigen values rI, rz, ... , rs are called the canonical correlations between y and x. The best overall measure of interdependence between two sets y and x is the square root of largest eigen value ri.
The canonical correlations can also be viewed as simple correlations between representatives of sets y and x. For this, suppose u = a/y and v = b/x, then the ith canonical correlation r ) is simple correlation between u and v. The representatives u and v are called canonical variates. In the following we give a formal derivation for canonical correlation and canonical variates.
8.2 Obtaining Canonical Correlations and Canonical Variates We will derive canonical correlations and canonical variates when population measures are available. The expression for sample measure can be easily written. Suppose y and x are two sets of variates with
Cov(y)= 1:11 ,Cov(x) = 1:22 and Cov(y,x)=1:12 Suppose further that the variables are standardized so that above matrices contain correlation coefficients between variables of different sets. Let u = a/y and v = b/x be linear corn bination of two sets with
Var(U) = Var( aly ) = a lCov(y)a = a/~la Var(V) = Var((31 x) = (31 Cov( x)(3 = (3/1:22(3 and
Canonical Correlation Analysis
209
Cov(U, V) ~ Cov( a/y,/3/x) ~ a/Cov(y,X)/3 ~ a/E12/3 Now squared correlation between u and v is p
2 ~
[Cov(U,V)J' Var(U)Var(V)
(a/E12/3)'
(8.2.1)
(a/Ella )(/3/E22/3)
The squared correlation (8.2.1), depends upon various choices of the coefficient vectors a and f3. Infact, there are s ~ min(p, q) squared correlations depending upon various choices of a and J3 and our aim is to maximize the squared correlation (8.2.1). The squared correlation is maximized when Var(U) ~ Var(V) ~ 1. This condition is equivalent to
a/Ella ~ f3/E22f3 ~ 1
(8.2.2)
that is we want to maximize a/E12f3 under the condition (8.2.2). The function to be maximized is (8.2.3)
where '" and 5 are Lagrangian multipliers. Differentiating (8.2.3) wrt a and p and equating to zero, we have
(a/E12/3 )E12/3 ~ ",Ella,
(8.2.4)
)E 21 a = SE 22 f3·
(825)
and (a/E12f3
Premultiplying (8.2.4) with a and (8.2.5) with
'" ~ 5
~ (a/E12f3
r.
~
we have
Using this value in (8.2.4) and (8.2.5) we have ",1I2 E12 /3 = ",Ella and ",1i2 E21a = ",E 22 /3 or (8.2.6)
and
E 2l a - ",1I2 E22 /3
=
0.
Writing above equations in matrix fonn we have
(8.2.7)
Chapter Eight
210
-/I,1I21:
l
11
1:21
Above equation have a non-trivial solution if ~O
or
or
11:;;1:121:~;1:21 - /1,11 ~ 0
(828)
From (8.2.8), we can see that the squared canonical correlation ;1., =
(a 1: l
12
J3
r
IS an eigen value of has
1;:> .It, ;:>
A., ;:> . - - ;:> /1"
;:>
s ~rnin(p,q)
o.
1:~;1:121:;;1:21.
In general
elgen
such
values
that
Further, we can see that the eigen values of
I:,~; 1:12 1:;;1:21 and I:,;~ 1:21 I:,~; 1:12 are same and hence squared canonical correlations can be obtained as eigen values of any of these two matrices.
The positive square root.Ji; is the canonical correlation between Uh and Vh;
h
=
1, 2, ... , s. The largest canonical correlation is given as the positive
square root of largest eigen value
.[i;
and so on. We will now discuss a
method to obtain the coefficient vectors a and From (8.2.6), we have
/1,1121:1213 = /l,1: 11 a
~
in the following.
(8.2.9)
From (8.2.7), we have
13 = /I,-1/21:~;1:21a .
(8.2.10)
Using (8.2.10) in (8.2.9), we have 1:12 I:,;~ 1:21
ex =
),1: 11 ex
or (8.2.11)
Canonical Correlation Analysis
211
From (8.2.11), we can see that a is an eigen vector of E~;E12E~iE2l subject to a Ell a ~ 1. Once a is obtained, the value of ~ is obtained by using (8.2.10). When the sample data is available then the squared canonical correlation coefficient is given as l
(a /S12 bf
2_ u
r,
-
(a/S11a)(b/S22b)'
and is obtained as the positive square root of eigen value of S~;S12S;;S21. The coefficient vector a is given as the solution of
(S~;S12S~is21 -;n)a ~ o.
(8.2.12)
The coefficient vector b is computed as b ~ /l-I/2S~iS2la . Example 8.1: Consider following correlation matrix for yl ~ [~ Xl
~ [Xl
Y2] and
X2]
l.0
0.4 0.5 0.6 l.0
E~
0.3 0.4 l.0
0.2 l.0
Obtain canonical correlations and canonical variates between sets y and x. Solution: We first partition E as
l.0
0.4
0.5
0.6
l.0
0.3
0.4
l.0
0.2
E~
~[E11 E12] E2l
E22
l.0 We know that the squared canonical correlations are given as the eigen values of 1:~;1:121:;;1:21 . Now
212
Chapter Eight
1
[0.4524 0.5238] 0.1190 0.1905
1
[0.3958
1:~1 1:12 ~ and
0.2292]
1:~21:21 ~ 0.5208 0.3542 ' so
1:-11: 1:- 11: 11 12 22 21
=
[0.4519 0.2892] 0.1463 0.0947 .
Now
1:-11: 1:-11: 12 22 21
_A.II~I°.4519-A. 0.1463
1 11
I
0.2892 0.0947 _ ~
~;1.2 - 0.5466;1.+ 0.0005 The characteristic equation is, therefore
;1.2 - 0.5466;1.+ 0.0005 ~ 0 The solution of above equation is Al
~
0.5457 and 1.2
~
0.0009. The two
canonical correlations are
PI ~.p:; ~ ~rO.--:547:5-=-7 - 0.7387
P2 ~
ji; ~ ')0.0009 -
0.0300 .
We now obtain the canonical variates corresponding to Al = 0.5457. For this, we know that the coefficient vector a l is given as the solution of
(1:~;1:121:~i1:21 - ..V)u1 ~ 0 subject to
Now
U
/
1:11 u ~ 1.
(1:~;1:121:;~1:21 - ~I )u 1 = 0 gives 1:~;1:121:;~1:21al 0.4519 [ 0.1463
0.2892][a11] 0.0947 a 21
=
~al
or
~ 0.5457[a11 ] a 21
or
0.4519a11 + 0.2892a 21
~
0.5457a11
(i)
0.1463a11 + 0.0947a 21
~
0.5457a 21 ·
(ii)
and
Canonical Correlation Analysis l
The constraint is a l: l1 a
1[
=1
213
or
l.0
a 21 0.4 or (iii) Also from (i) we have a21
~
0.3243a11. Using this value in (iii) we have
a 121 + 0.8a11 (0.3243a11 ) + (0.3243a11 )2 ~ 1 or
l.3646a121 ~ 1 => a 11 ~ 0.8560. Also a21 ~ 0.3243a11 gives a21 ~ 0.2776 so a l ~ [0.8560 coefficient vector R
-
PI -
0.27761. The
Pis obtained as
,-112~-1~
/'1
~22 ~2I a l
~ (0.5457r1l2 [0.3958
0.2292][0.8560] 0.5208 0.3542 0.2776
~ [0.5447] 0.7366
The first pair of canonical variate is, therefore
U1 ~ a;y ~ 0.8560~ + 0.2776Y2 and
V;
~
P1I X
~
0. 5447X1 + 0.7366X2
The second pair of canonical variates can be obtained in similar way.
8.3 Inference about Canonical Correlations Canonical correlations provide information about strength of interdependence between sets of variates. Assuming normality of two sets, the sample canonical correlations can be used to conduct test of significance about population canonical correlations. A simple test of significance about canonical correlations is to see whether all of the population canonical correlations are equal to zero, that is to test the significance of HO:P1~P2~·-----~P'~0
(8.3.1)
The test of hypothesis (8.3.1) is equivalent to testing independence of two sets of variates y and x as discussed in section (5.3). The hypothesis (8.3.1) can be tested by using the statistic
Chapter Eight
214
F
~
o
1- Alit d' 0 ~2 Alit d" o
(8.3.2)
~l
where
AO ~ II;~J 1-
,,2),
dfr ~ pq,
dfz ~ wt- pq/2+ 1, w~ n-(p+q+3)/2, t 2 ~ (p2q' -4 )j(p2 +q' -5),
and
r;2
is ith squared eigen value of
is rejected if Fo ;:> Fa,dfr,df
2
1:-;: 1:12 I:,;~ 1:21 . The hypothesis
H0
The hypothesis (8.3.1) can also be tested by
.
usmg
x~ ~-[n-~(p+q+3)]lnAo, 2
which has an approximate X - distribution with pq degrees of freedom. If the hypothesis (8.3.1) is rejected then significance of a specific number of canonical correlations can be tested sequentially, that is if hypothesis (8.3.1) is rejected then we can test the hypothesis that the first canonical correlation is different from zero and all subsequent canonical
correlations are equal to zero. Symbolically, we want to test the hypothesis HOI: PI 7c 0;P2 ~ P3 ~. _ .. - - ~ p, ~ O. (8.3.3) If hypothesis (8.3.3) is rejected then we can test the hypothesis H02 :Pl 7c 0,P2 7c0;P3 ~ P4 ~ " .. " ~ p, ~ 0, and in general we can test the hypothesis HOk : PI 7c 0" -Pk 7c O;Pk+1 ~" .. " ~ p, ~ 0 . The hypothesis (8.3.4) can be tested by using the statistic
F
-
Ok -
1- AlItk df2 (k) k AlItk d-f' ' k
~l(k)
where
Ak
~II;~k+1(1-,,2), dJ;(k) ~(p-k)(q-k),
dJ;(k) ~wtk-(p-k)(q-k)/2+1,
(8.3.4)
(835)
Canonical Correlation Analysis
215
1
(p_k)2(q_k)2_ 4
2
(p-k) +(q-k) -5
w~n--(p+q+3), t2~
2
2.
The statistic (8.3.5), under H ok , has an approximate F -distribution with
(dJ;(k)' dI2(k)) degrees of freedom. The hypothesis (8.3.4) is rejected if
Fat ~ Fa,dfi ,d!2 . We can also use following
X
2
-
approximation for testing
of (8.3.4)
X~k ~-[(n-k)-~(P+ q+3)]lnA k , which has an approximate X2 - distribution with (p -
(8.3.6)
k)( q - k)
degrees
of freedom. Example 8.2: Following is covariance matrix of seven standardized
variables (Y"Y2 ,Y3 1.00
s~
)
and (X"X 2 ,X3 ,X4
)
0.56
O.~
1.00
OA7 OA6 OAI 0.54 0.27 1.00 OA8 0.58 0.61 0.18
0.86
1.00
0.83
o.~
o.~
0.70
0.81
0.17
1.00
0.68
0.19
1.00
0.12 1.00
Test significance of various canonical correlations, assuming a sample of size 25. Solution: We first compute the canonical correlations. For this we first see that 1.00 0.70 0.81 0.17 1.00 0.56 0.66 1.00 0.68 0.19 Sl1 ~ 1.00 OA7 , S22 ~ 1.00 0.12 1.00 1.00
[
and
1 [
Chapter Eight
216
S12
~
[
0.86
0.83
0.46
0.41
0.94 0.54
0.26] 0.27.
0.48
0.58
0.61
0.18
Now
S~;s12s~is21 ~
[
0.976
0.528
0.006
0.045
-0.015
-0.013
0.603] -0.002 , 0.044
so
603 0. -0.002, .
1
0.044- A. The detenninential equation
.i3 + 1.065.i2
-
IS;;S12S;~S21 -
ill
=
0 gives
0.095.i+ 0.002 ~ O.
The roots of above equation; the squared canonical correlations; are
li2 ~ 0.969, r22 ~ 0.063 and r32 ~ 0.033 . We, now, test the hypothesis about significance of canonical correlations. We first test the significance of all canonical correlations.
1. The null hypothesis is H 0 : PI ~ P2 ~ P3 ~ 0 . 2. Level of significance to be used is a ~ 0.05. 3. Test statistic is
df2 where Ao -_ Fa -_ 1- A~t lit Ao dfr
II'. (1- r; ,~1
2)
,
and
dfr
~
pq,
df2 ~ wt- pq/2+ 1,
w~n-(p+q+3)/2,
/2
~(p2q2 _4)/(p2 +q2
4. Computations: Necessary calculations are given below
Now
-5),
Canonical Correlation Analysis
dJ;
217
~ pq~(3)(4)~12
w ~ n - (p + q + 3 )/2 ~ 25 - (3 + 4 + 3 )/2 ~ 20
t2~ p2q2_4 ~32x42_4 p2+q2_5 ~
dfz
32 +4 2 _5
7=>t~2.646
1 3x4 wt-- pq+ 1 ~ (20)(2.646)--+ 1 ~ 48
2
2
Finally
F
o
~
1- Alit df 1- 0.028112646 48 0 2~ xA~t dJ; 0.028 112646 12
~
11 450 . .
5. The critical region is Fo 2:: FO.05,12,48 or Fo 2:: 1.960. 6. Conclusion: Since the computed value of Fo exceeds the critical
value. we reject the hypothesis and concludes that all the canonical correlations are not equal to zero. We now test the significance of second and third canonical correlation, assuming that the first canonical correlation is different from zero. The
hypothesis is tested below. 1. The null hypothesis is Ho : PI
ofc
0; P 2 ~ P3 ~ 0 .
2. Level of significance to be used is a
~
0.05 .
3. Test statistic is
_ 1- A:ltl df2 _, 2 lit whereA I - TI _ (1- r, ), Al I dJ; ,-2
111 -2 (p-l) +(q-l) -5 2 +3 -5 df2(1) ~ wl1 -
(p-l)( q-l)/2+ 1
~ (20)(2)-(2 x 3)/2+ 1 ~ 38
Finally F o
1- Alltl df2(1) 1 A lit! dJ, HI
"1(1)
112
1- 0.906 0.906112
x 38 ~ 0.320 6
5. The critical region is Fa 2:: FO.05,6;38 or Fa 2:: 2.349. 6. Conclusion: Since the computed value of Fa does not exceed the critical value, we do not reject the hypothesis and concludes that all the canonical correlations, after first, are equal to zero.
8.4 Partial, Part and Bi-partial Canonical Correlations Canonical correlation analysis provides a method to see the strength of interdependence between two sets of variates. Often it happen that we have more than two sets of variates and our interest is to obtain canonical correlation between two sets after removing the effect of other sets of variates. In such situations, we can extend the concept of canonical correlations to parlial, part and hi-partial canonical correlation analysis. These situations are an extension of univariate partial, part and bi-partial correlation analyses. We discuss these in the following. The partial canonical correlation analysis is extension of partial correlation and is based upon the conditional covariance or correlation matrix of two sets of variates after removing the effect a third set of variates as discussed by Roy (1957) and Rao (1969). The partial canonical correlation analysis can be illustrated by assuming that we have three sets of variates y = 1, x = 2 and Z =3, containing p, q and r variables respectively, with covariance matrix
E
~
[
E11 E21
E12
E13
E22
E 23
~31
~32
~33
·
We assume that all the variables are standardized so that above matrix contains correlation coefficients for various sets. Now assume that the effect
Canonical Correlation Analysis
219
of set z is removed from the sets y and x and we have following conditional covariance matrix for sets y and x after removing the effect of set z
1: ·3
~ [1: 11 .3
1: 12.3] 1: 22 .3
1: 21.3
~ 11:11 - 1:131:~i1:31
1:12 - 1:131:~i1:32
1:231:~i1:31
1:22 - 1:231:~i1:32
l1:21 -
(8.4.1)
The canonical correlations based upon the conditional covariance matrix (8.4.1) are called the partial canonical correlations, that is the squared partial canonical correlations are obtained as solution of following detenninential equation
11:~;31:1231:~i31:213 - A311 ~ 0
(8.4.2)
The coefficient vector, u. 3 , for partial canonical variate u. 3 is obtained as a solution of
1:;:.31:12.31:;~.31:21.3U.3 under the constraint
=
U:31:11.3U.3 =
A: 3U. 3 , 1 . The coefficient vector, 13.3 ' for partial
canonical variate V.3 is obtained by using 13.3 = A:;1121:;~.31:21.3U.3 . The sample partial canonical correlations and partial canonical variates are easily obtained by using sample covariance or correlations matrices in (8.4.1). The sample partial canonical correlations can be used to test the significance of population partial canonical correlations, just like the simple canonical correlations are used to test significance of population canonical correlations. Specifically, the hypothesis
P1.3
0, ... , Pk.3 7:- 0; P(k+l).3 = ... = Ps.3 where s ~ min(p, q), can be tested by using the statistic HOk :
F
-
Ok -
7:-
1- AlItk dI2(k) k.3 lit
Ak;
dJ;(k)
'
=
°
(8.4.3)
(8.4.4)
where
Ak] ~TI;~k+1(1-,,23),k~o,1,2, __ ,s dJ;(k) ~ (p - k)( q - k), dI2(k) ~ wlk - (p - k)( q - k )/2 + 1,
Chapter Eight
220
1 2 w~(n-r)--(p+q+3), t ~
2 The statistic (8.4.4), under
HOb
(p_k)2(q_k)2_ 4 2
2·
(p-k) +(q-k) -5
has an approximate F-distribution with
(dJ;(k)' dI2(k)) degrees of freedom. The hypothesis (8.4.3) is rejected if Fot ~ Fa,dfi,dfz . We can also use following X2
-
approximation for testing
of (8.4.3)
X~k ~ -[( n -
)-~(p + q+ 3) ]lnA
(8.4.5)
which has approximate X2 - distribution with (p -
k)( q - k) degrees of
r- k
k,
freedom. The part canonical correlation appears when the effect of set z is removed only from one set of variables, say x. In such situations, the canonical correlations are computed by using following covariance matrix 1: 12 .3 ] 1: '
1:11
1:1(2.3)
~ [ 1:
21.3
(8.4.6)
22.3
that is the squared part canonical correlations are obtained by solving following deterrninential equation
11:~;1:12.31:~;.31:213 -
-'\2.3)11
~ o.
(8.4.7)
The coefficient vector U(Z.3) for part canonical variate U(Z.3) is obtained by solving
1:~;1:12.31:~;.31:213a(2.3) ~ -'\2.3)U(23)' under the constraint U(23)1:11U(2.3) ~ 1. The coefficient vector, /3(2.3)' for partial canonical variate V(Z.3) is obtained by using R
.:1,-1121:-1 1:
P(2.3) = -12.3)
22.3
21.3 U (Z.3)·
In partial canonical correlation, the effect of same set of variates is removed from two sets. Often it happen that the effects of different sets of variates are removed from two sets of variates and canonical correlations are computed from these purified sets. The canonical correlations computed by using such sets of variates are called hi-partial canonical correlations
Canonical Correlation Analysis
221
and are discussed by Timm and Carlson (1976). The concept ofbi-partial canonical correlations is illustrated by assuming that we have four sets of variates y = 1, x = 2, ZI = 3 and Zz = 4 ,containingp, q, r andh variables respectively, such that the joint covariance matrix of these four sets is
E~
[
~1
E12
~3
E14
1:21
1:22
1: 23
1:24
E31
E 3z
E33
~4
1:41 1:42 1: 43 1: 44 Now if the effects ofzl are removed from set y and effects of Z2 are removed from x then the conditional covariance matrix for set y and x has the form
E
-
(1.3)(Z.4) -
[EE
11 .3
'"
E.]
E
22.4
where 1:12 -1:131:~~1:32 -1:141:~!1:42 + 1:131:~~1:341:~!1:42 . (8.4.9) The squared bi-partial canonical correlations are obtained in usual way by using the covariance matrix (8.4.8), that is the squared bi-partial canonical correlations are given by solution of following determinential equation 1:",
=
IE~;3E.E~i4E: - ~L3)(z4)II ~ 0 .
(8.4.10)
The coefficient vector, u(1.3)(Z.4)' for partial canonical variate U(1.3)(2.4) is obtained as a solution of
E~;3E.E~i4E:u(13)(Z4) ~ ~13)(Z4)U(13)(Z4)' under the constraint U(13)(Z4)E l1 .3 U(13)(Z.4) ~ 1. The coefficient vector, (3(1.3)(Z.4)' for partial canonical variate v(1.3)(Z.4) is obtained by using
R
.rllz
P(1.3)(Z.4) = -V.3)(Z.4)
E- 1 EI ZZ.4
,u(1.3)(Z.4)·
The sample bi-partial canonical correlations are obtained by solving following determinential equation
IS~;3S.S~i4S: - ~13)(z4)II ~ 0
(8.4.11)
Chapter Eight
222
where
~~~'-~3~~-~4S~Sa+~~~S~~. The sample bi-partial canonical correlations can be used to test the significance of population bi-partial canonical correlations. Specifically, the hypothesis P l (1.3)(,.4) 7c 0, .. ,Pk(1.3)('.4) 7c 0 HOk :
where s
~
(8.4.12)
P(k+l)(1.3)('.4) ~ .. ~ p,(1.3)(,.4) ~ 0
min(p, q), can be tested by using the statistic lItk
_ 1- A k (1.3)(,.4) FOk -
lItk
A k(1.3)(,.4)
dlf '(k) d"
where
Au
~ TI;~k+111- "(13)('4)
dlr(k) ~
w~
(8.4.13)
"r(k)
l, k ~ 0,1,2, .. .
(p - k)( q - k), df,(k)
{n -
~ wlk -
,S.
(p - k)( q - k )/2 + 1,
1 max ( r,h )}- -(p+ q+ 3),
2
,_ (p-k)'(q-k)'-4 . (p-k)' + (q-k)'-5
t -
The statistic (8.4.13), under HOk, has an approximate F-distribution with (dJ;(k)' 4f2(k)) degrees of freedom. The hypothesis (8.4.3) is rejected if
Fak
;:>
Fa ,d')1' d')2 . We can also use following X' - approximation for testing
of (8.4.12)
X~k ~-[{n-max(r,h)-k}-~(P+q+3)]lnAk' which has approximate
X' -
distribution with (p -
k) ( q - k)
(8.4.14)
degrees of
freedom. Example 8.3: Following sample covariance matrix of three sets of variates
(Yl, Y,), (Xl, X,) and (Zl, Z,) is based upon a sample of size 25.
Canonical Correlation Analysis
l.00
223
0.80 0.45 0.20 0.22 0.20 l.00
0.31 0.32 0.38 0.27 l.00
S~
0.66 0.16 l.00
0.37
0.08 0.04 l.00
0.60 l.00
Compute the partial canonical correlations and canonical variates between sets (Y1. Y2) and (Xl. X 2) after removing the effects of set (Zl. Z2). Solution: The squared partial canonical correlations are given as the solution of
IS;;.3S12.3S;~.3S21.3 - A:3II =
0,
where S11.3 = S11 -
S13S~~S31'
S12.3 = S12 -
S13S~~S32
S21.3 = S~2.3 and S22.3 = S22 - S23S~~S32 . Now, for given covariance matrix we have
S
~ [l.00 0.80] S ~ [l.00 0.66] l.00 '
11
l.00
22
_ [l.00 0.60] / [0.45 0.20] l.00 ' S12 ~ S21 ~ 0.31 0.32
S33 -
~S
S 13
/
~
31
[0.22 0.20] 0.38 0.27
~S
mdS 23
/
~
32
[0.16 0.37] 0.08 0.04
Also
0.712] 0.853 -1 S22 - S23 S 33 S 32
S22.3
~
S
~ S
12.3
/ 21.3
~ S
- S 12
13
~
[0.857
1 S- S ~ 33 32
0.651] 0.994
[0.386 0.183] 0.231 0.290 .
Chapter Eight
224
Further
S-1 S S-1 S 11.3 12.3 ZZ.3 Z1.3
~ [0.376
0.014] -0.221 0.092
so
S-1 S
S-1 S
11.3
22.3
12.3
- A., I
The detenninential equation 'z
[0.376 - A.3. -0.221
~
21.3·3
0.014 ] 0.092- A.3 .
IS~;.3S12.3S~;.3SZ1.3 - .i,I1 ~ 0
'
A3 - 0.468A 3 + 0.038
gives
O.
~
Solving above equation, the squared partial canonical correlations are "2
~
..1,.3
Ii.3
~
"2
~
0.363 and ..s.3
rZ .3 ~ 0.105
The partial canonical correlations are
Ii.3
~
0.602 and rZ.3
~
0.324 .
We now obtain the partial canonical variates corresponding to ..1,.3 ~ 0.363 . For this, we know that the coefficient vector a1.3 is given as the solution
of
(S~;3S123S~;3SZ1.3 - ~3I )U1.3 ~ 0 subj ect to
Now
a:'
3 S11.3 a l .3 =
1.
(S~;3S123S~;3SZ1.3 - ~3I )U1.3 ~ 0
gives
S-1 S S-1 S U ~ /'1.3 5. U1.3 11.3 12.3 22.3 21.3 1.3 or
0.376 0.014]["11.3] ~ 0363["11.3] [ -0.221 0.092 a Z1.3 . a Z1.3 or
0.376a11.3 + 0.014a21 .3
~
0.363a11.3
(i)
and
-0.221a11.3 + 0.092a 21 .3
~
0.363aZ1.3 .
The constraint is af.3SU.3{X1.3 = 1 or
a 1.31[0.944 0.712]["11.3] Z 0.853 a Z1.3
~1
(ii)
Canonical Correlation Analysis
225
or
0.944",;1.3 + l.424"'11.3"'21.3 + 0.853"'i1.3 ~ 1.
(iii)
Also from (i) we have "'21.3 ~ -0.929"11.3. Using this value in (iii) we have
0.357 ",,21.2 ~ 1 => "'11.3 ~ l.674. Also "'21.3 ~ -0.929"'11.3 gives "'21.3 ~ -l.555 so
",i, ~[l.674 The coefficient vector Now
-1.555]
p.) is obtained by using lil.3
s-' s
_ [ 0.616 22.3 21.3 - -0.219
=
~~iI2S;~.3S21.3a1.3.
0.096] 0.229'
so
131.3
~ (0.363 r 1l2 [
0.616 -0.219
0.096] [ 1.674 ] 0.229 -1.555
~[
1.466 ] . -1.201
The first pair of partial canonical variate is, therefore
U13 ~ U;3Y ~ l.674~ -1.555Y2 and
v;, ~ f3;,x ~ l.466X, -l.201X2 The second pair of partial canonical variates can be obtained in similar way.
8.S Properties and Uses of Canonical Correlations Canonical correlation provide us the strength of interdependence between two sets of variates. The canonical correlations and canonical variates have certain properties and uses which are given below. 1.
2.
The canonical correlations are invariant under the change of origin and scale. This property of canonical correlation is extension of popular property of simple correlation coefficients. This property emphases that if unit of measurement is changed, the canonical correlations will remain unchanged. This property further emphasized that if variables are standardized, the canonical correlations will be unchanged. Since the first canonical correlation is maximum correlation between linear functions of sets y and x, the first canonical correlation will
Chapter Eight
226
(1'; ,X)) .This
exceed simple correlation coefficient between any pairs
3.
property further says that the first canonical correlation will exceed multiple correlation coefficient between any 1'; and combined effect ofx's The canonical variates are directly related with the multiple regression analysis. Recall that the multivariate regression coefficient matrix for regression of set y on set x (corrected for their means) is computed as
-,
,
Bl
u= i
SxxSxy. Once the coefficient vector a i for canonical variate
=
aiy is computed, the coefficient vector hi for canonical variate
1\ can be computed as hi for canonical variate 1\
=
a i for canonical variate where
B; is
=
~ai . Similarly if coefficient vector hi
bix is computed first, the coefficient vector
u i
can be computed by using a j
B;b J '
regression coefficient matrix for regression of mean .
" ...
corrected x on mean corrected y gIVen as Bl 4.
=
-1
=
S yy S yx
.
The test of significance in multivariate regression analysis can be conducted by using the canonical correlations. Recall that the Wilks statistic for testing H 0 : Bl = 0 in multivariate regression analysis is based upon the Wilks statistic given in (7.2.12) as
A~
lEI IE+HI
Iy/y Iy/y
B/X/yl
-nyll
This statistic can be based upon the squared canonical correlations as (8.5.1)
5.
where r;2 is ith squared sample canonical correlation coefficient. The test of significance for set of variables in multivariate regression can also be based upon the canonical correlations. This means that the hypothesis H 0 : B2 ~ 0 in partitioned model
y
~
X,B, + X 2 B 2 + E,
where Xl contains
(q -
h) predictors and X 2 contains h predictors.
Recall, that the test of hypothesis H Wilks statistic given in (7.2.19) as
0 :
B2
~
0
can be based upon
Canonical Correlation Analysis
~ Iy/y -B/X/yl ~ Af
A q
Iy/y -
B;X;yl
A,'
227
(8.5.2)
where A f , for full model, is given in (7.2.12) and A" for reduced model, is given as
~
A I
lEI ~ Iy/y - B;X;yl IE+HI ly/Y-ny/l·
By analogy with (8.5.1), the Wilks statistic for reduced model can be written as
A, ~ n:~1(I-c/2), where
2 GJ
is sample
(8.5.3)
squared canonical correlations between
(~'Y2' .. ,Yp) and (X"X 2, .. ,Xq_h ) and t~rnin(p,q-h) The Wilks statistic (8.5.1) can be written as
~ TI:~J 1- ,,2 ).
A q
6.
n:~Jl-c/2)
(8.5.4)
The subset selection in canonical correlations can be done by using the methods discussed in multivariate regression analysis.
CHAPTER NINE PRINCIPAL COMPONENT AND FACTOR ANALYSIS
9.1 Introduction In previous chapters we have discussed multivariate techniques that have direct univariate counterparts, that is the techniques which are considered as extension of some univariate techniques. For example the multivariate regression analysis discussed in Chapter 7 is an extension of popular univariate regression analysis. We now discuss some multivariate techniques that do not have their univariate counterpart. In this chapter we will discuss two popular multivariate techniques known as principal component and factor analysis. These two techniques aim at obtaining smaller number of components or factors which can be used as representative of actual variables reducing dimension of the data and hence known as dimension reduction techniques. We discuss these techniques in the following.
9.2 Principal Component Analysis Principal Component Analysis (PCA) is a multivariate technique which based upon reducing the dimension of a multivariate data set. The technique is based upon transforming a set of correlated variables IS
[Xl
x
=
y
~ [Y,
Xl
Y,
J J
Xp yp
into a new set of uncorrelated variables
with the purpose that first few of the transformed
variables y will account for maximum possible variation of the original data. The transformed variables, called Principal Components (PC's), are obtained as a linear combination of original variables, that is, the ith PC is
y.
~ a"X, + a"X, + ., .. ,.+ a,pXp ~ a/x; i ~ 1, 2, ... ,p(9.1.1)
Chapter Nine
230
where a i
=
[ail
a i2
a ip
J
is the coefficient vector which is to
be computed under certain conditions. The PC's are obtained in decreasing order of importance, that is the first principal component accounts for highest variation in the data. The second principal component accounts for second highest variation in the data and so on. Since the PCA is a transformation technique so the variables considered in the analysis are on equal footing, that is, none of the variables are considered as dependent or independent. The PCA is based on the assumption to transform actual set of variables into new set of uncorrelated variables therefore it is recommended not to perform PCA on set of variables which have low correlations. A simple way to decide about suitability of data for PCA is to run Test of Sphericity on given data and proceed with PCA if test provides a significant result The classical discussions on principal components can be found in Hotelling (1933) and Rao (1964). In the following we derive a procedure to obtain PC's for a set of variables x.
9.2.1 Computing the Principal Components The principal components are obtained as a transformation of original variables into a new set of uncorrelated variables. The new variables are obtained in decreasing order of importance such that the first principal component accounts for highest variation of the data. This simply means that the principal components are obtained such that their variances are in decreasing order. Now if we have a set of variables x with covariance matrix 1: then variance of first principal component Y1 =
Var(~)~Var(a;x)~a;I:a"
ai x
IS
gIVen as
(9.2.1)
Where a l is the coefficient vector such that (9.2.2)
ai a
The constraint l = 1 ensures that the transformations are orthogonal. Now to obtain first principal component we need to maximize Var{Y1) subject to constraint (9.2.2). The function to be maximized is therefore
rf1 ~a;I:a,-/l.(a;a,-1) Now differentiating (9.2.3) wrt a, we have
(9.2.3)
Principal Component and Factor Analysis
231
a¢1 ~ 21:u, - 2Au, .
au,
Equating above equation to zero we have 1:u, - Au, ~ 0 => 1:u, ~ Au, .
(9.2.4)
Using (9.2.4) in (9.2.1) we have
Var(J;) = Auiul => Var(J;) = A as uiul
=
1.
Further, for a unique and non-trivial solution of 1:u, -Au, ~ 0 or (1:-AI)U, ~ 0,
(9.2.5) (9.2.6)
For u, ' the matrix (1: - AI) should be singular, that is we must have
(9.2.7) From (9.2.7) we can see that A is an eigen value of 1: . Since 1: is a (p x p) positive definite matrix therefore it will have p eigen values. Suppose that the eigen values are arranged in descending order such that A, ;:> A, ;:> ...... ;:> Ap > 0, then from (9.2.5) we can say that Var(Y,) ~ A,. Using eigen value of 1: in (9.2.6) we can see that the coefficient vector a, is an eigen vector corresponding to eigen value AI.
z= ui x
Again consider the second principal component as Y
such that
Var(Y2 ) ~ ui1:u 2 and COV(~'Y2) ~ 0 Now
COV(~'Y2) ~ Cov( u;x,uix) ~ u;1:u 2 . Using the fact that 1:a 2 ~ Aa 2 we have COV(~'Y2) ~ AU;U 2
u;u
COV(~'Y2)~0 implies that
2
.
So
~O as A>O. Now for second
principal component, the function to be maximized is ¢2
~ ui1:u 2
A( uiu
-
2
-1) - 28uiu,
(9.2.8)
Differentiating (9.2.8) wrt u 2 we have
a¢2 _
- - - 21:a 2
aa 2
-
2Aa 2
-
2Sa, .
Pre-multiplying with ui and equating to zero we have
u;1:u 2 -2Au;U 2 -28u;u, ~0=>8~0,
(9.2.9)
232
Chapter Nine
as a,I Ea 2 ~ a,I a 2 ~ 0 and a,I a, ~ 1. Using this value in (7.2.9) and equating to zero we have Ea 2 - Aa 2 ~ 0 => (E - AI)a 2 ~ 0 (9.2.10) For a non-trivial solution the matrix
(E - AI)
should be singular, that is
we must have IE - All ~ 0 . We see again that A is an eigen value of E. This time we choose A, to be the second largest eigen value. Using this in (9.2.10) we have (E-~I)a2 ~O=>Ea2 ~~a2'
that is coefficient vector Uz is an eigen vector of 1: corresponding to second largest eigen value Az. In general the coefficient vector, ai, for ith principal component is an eigen vector of 1: corresponding to ith largest eigen value AI.
Remark 9.1: The computation of principal components can be based either on the covariance matrix \Sigma or the correlation matrix \rho, but since the actual variables may be measured on different scales it is preferred to compute the principal components from the correlation matrix \rho.
Remark 9.2: When we have sample data the principal components are computed by using either the sample covariance matrix S or the sample correlation matrix R.
9.2.2 Proportion of Variation Explained by PC's
The ith principal component is Y_i = a_i'x with Var(Y_i) = \lambda_i, where \lambda_i is the ith eigen value of \Sigma or \rho. Also Cov(Y_i, Y_k) = 0 for i != k. Now writing \Gamma = [a_1  a_2  ...  a_p], with \Gamma'\Gamma = I and \Lambda = diag(\lambda_i), the vector of principal components y can be written as y = \Gamma'x. Also the set of equations \Sigma a_i = \lambda_i a_i can be written as \Sigma\Gamma = \Gamma\Lambda. Now consider
y = \Gamma'x  =>  Cov(y) = Cov(\Gamma'x) = \Gamma' \Sigma \Gamma.
Now using the fact that \Sigma\Gamma = \Gamma\Lambda we have
Cov(y) = \Gamma'\Gamma\Lambda = \Lambda,  as  \Gamma'\Gamma = I.
Further, since Cov(y) = \Gamma' \Sigma \Gamma, we have \Gamma' \Sigma \Gamma = \Lambda. Now
tr(\Lambda) = tr(\Gamma' \Sigma \Gamma) = tr(\Sigma \Gamma \Gamma') = tr(\Sigma)
=>  \sum_{i=1}^{p} \lambda_i = \sum_{i=1}^{p} \sigma_{ii}.
The proportion of variation explained by the ith principal component is therefore
Pr(Y_i) = \lambda_i / \sum_{j=1}^{p} \lambda_j.  (9.2.11)
Example 9.1: Consider the covariance matrix of two variables
\Sigma = [ 1   4
           4  25 ].
Obtain the principal components of x.
Solution: We know that the coefficient vectors for the principal components are given as the eigen vectors of the covariance or correlation matrix. We, therefore, first compute the eigen values of the covariance matrix by using |\Sigma - \lambda I| = 0. Now
|\Sigma - \lambda I| = | 1-\lambda      4
                            4       25-\lambda | = \lambda^2 - 26\lambda + 9.
So the polynomial equation is \lambda^2 - 26\lambda + 9 = 0. The solution is \lambda_1 = 25.649 and \lambda_2 = 0.351. Now the eigen vector corresponding to \lambda_1 is obtained by using \Sigma a_1 = \lambda_1 a_1, so we have
[ 1   4 ] [a_{11}]  =  25.649 [a_{11}]
[ 4  25 ] [a_{12}]            [a_{12}]
or
a_{11} + 4 a_{12} = 25.649 a_{11}   (i)
4 a_{11} + 25 a_{12} = 25.649 a_{12}   (ii)
From (i) we have a_{11} = 0.162 a_{12}. Now using a_1'a_1 = 1, or a_{11}^2 + a_{12}^2 = 1, we have a_{12} = 0.987. Also
a_{11} = (0.162)(0.987)  =>  a_{11} = 0.160.
So a_1 = [0.160  0.987]' and the first principal component is
Y_1 = 0.160 X_1 + 0.987 X_2.
Similarly, the second principal component is
Y_2 = -0.987 X_1 + 0.160 X_2.
The proportion of variation explained by the two principal components is
Pr(Y_1) = \lambda_1 / (\lambda_1 + \lambda_2) = 25.649 / 26 = 0.9865
Pr(Y_2) = \lambda_2 / (\lambda_1 + \lambda_2) = 0.351 / 26 = 0.0135.
We see that only one principal component is sufficient to provide the information contained in these two variables.
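The computations of Example 9.1 can be reproduced in R; a minimal sketch (the eigen vectors returned by eigen() may differ in sign from the hand computation):
# Covariance matrix of Example 9.1
Sigma <- matrix(c(1, 4,
                  4, 25), nrow = 2, byrow = TRUE)
# Eigen values and eigen vectors give the PC variances and coefficient vectors
e <- eigen(Sigma)
e$values                     # 25.649  0.351
e$vectors                    # columns are a_1 and a_2 (possibly with flipped signs)
# Proportion of variation explained by each principal component
e$values / sum(e$values)     # 0.9865  0.0135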
9.2.3 Correlation between Principal Components and Actual Variables
The principal components of a random vector x can be computed by using its covariance matrix \Sigma or correlation matrix \rho. In case of sample data the estimates of \Sigma or \rho can be used. The ith principal component of the vector x is Y_i = a_i'x, where a_i is the eigen vector of \Sigma corresponding to the ith largest eigen value. When the principal components of a set of variables x are computed, we may be interested to see the effect of the individual variables on the principal components. This can be done by computing the correlation between the actual variables and the principal components. In the following we obtain this correlation coefficient.
Let Y_i = a_i'x be the ith principal component of x and suppose we want to compute its correlation with the kth variable, X_k. The correlation coefficient between Y_i and X_k is
\rho_{Y_i X_k} = Cov(Y_i, X_k) / ( \sqrt{Var(Y_i)} \sqrt{Var(X_k)} ),
where Var(Y_i) = \lambda_i and Var(X_k) = \sigma_{kk}. In order to obtain Cov(Y_i, X_k) we first see that the variable X_k can be written as a linear combination of x as X_k = l_k'x, where l_k = [0  0  ...  1  ...  0]' has 1 in the kth position. Now
Cov(Y_i, X_k) = Cov(X_k, Y_i) = Cov(l_k'x, a_i'x) = l_k' Cov(x) a_i = l_k' \Sigma a_i.
Now using the fact that \Sigma a_i = \lambda_i a_i, we have
Cov(Y_i, X_k) = \lambda_i l_k'a_i = \lambda_i a_{ki},
where a_{ki} is the kth element of a_i. The correlation coefficient between the ith principal component and the kth variable is, therefore,
\rho_{Y_i X_k} = \lambda_i a_{ki} / ( \sqrt{\lambda_i} \sqrt{\sigma_{kk}} ) = a_{ki} \sqrt{\lambda_i} / \sqrt{\sigma_{kk}}.  (9.2.12)
A variable X_k is of larger importance for the principal component Y_i if it has a higher correlation coefficient as compared with the other variables. When the principal components are computed from the correlation matrix \rho then, since the diagonal elements of \rho are unity, the correlation coefficient between the ith principal component and the kth variable is given as
\rho_{Y_i X_k} = a_{ki} \sqrt{\lambda_i} / \sqrt{\rho_{kk}} = a_{ki} \sqrt{\lambda_i},  (9.2.13)
where \lambda_i is the ith eigen value of \rho and a_{ki} is the kth entry of the ith eigen vector of \rho. When we have sample data the values in (9.2.12) and (9.2.13) are replaced with their estimates.
Example 9.2: Consider the following covariance matrix of three variables
\Sigma = [ 6  5  4
           5  6  5
           4  5  6 ].
Obtain the principal components of x and obtain the correlation coefficients between the actual variables and the principal components.
Solution: We know that the coefficient vectors for the principal components are given as the eigen vectors of the covariance or correlation matrix. We, therefore, first compute the eigen values of the covariance matrix by using |\Sigma - \lambda I| = 0. Now
|\Sigma - \lambda I| = | 6-\lambda      5         4
                            5       6-\lambda     5
                            4           5      6-\lambda |,
so the polynomial equation is \lambda^3 - 18\lambda^2 + 42\lambda - 20 = 0. The solution is \lambda_1 = 15.349, \lambda_2 = 2.000 and \lambda_3 = 0.652. Now the eigen vector corresponding to \lambda_1 is obtained by using \Sigma a_1 = \lambda_1 a_1, so we have
[ 6  5  4 ] [a_{11}]            [a_{11}]
[ 5  6  5 ] [a_{12}]  = 15.349  [a_{12}]
[ 4  5  6 ] [a_{13}]            [a_{13}].
Solving the above equations under the constraint a_1'a_1 = 1, we have a_1 = [0.564  0.603  0.564]'. The first principal component is, therefore,
Y_1 = 0.564 X_1 + 0.603 X_2 + 0.564 X_3.
Similarly, the other two principal components are
Y_2 = 0.707 X_1 - 0.707 X_3
Y_3 = -0.427 X_1 + 0.797 X_2 - 0.427 X_3.
Now, the correlation coefficient between the ith principal component and the kth actual variable is given as
\rho_{Y_i X_k} = a_{ki} \sqrt{\lambda_i} / \sqrt{\sigma_{kk}},
so the correlation coefficients between the first principal component and the actual variables are
\rho_{Y_1 X_1} = a_{11} \sqrt{\lambda_1} / \sqrt{\sigma_{11}} = 0.564 \sqrt{15.349} / \sqrt{6} = 0.902,
\rho_{Y_1 X_2} = a_{21} \sqrt{\lambda_1} / \sqrt{\sigma_{22}} = 0.603 \sqrt{15.349} / \sqrt{6} = 0.964,
and
\rho_{Y_1 X_3} = a_{31} \sqrt{\lambda_1} / \sqrt{\sigma_{33}} = 0.564 \sqrt{15.349} / \sqrt{6} = 0.902.
We can see that the variables X_1 and X_3 are of equal importance for the first principal component whereas the variable X_2 has a somewhat higher correlation with it. The correlation coefficients between the other principal components and the variables can be computed in the same way.
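A minimal R sketch for Example 9.2, computing the eigen decomposition and the correlations (9.2.12) between the principal components and the variables (eigen vector signs may be flipped relative to the hand computation):
# Covariance matrix of Example 9.2
Sigma <- matrix(c(6, 5, 4,
                  5, 6, 5,
                  4, 5, 6), nrow = 3, byrow = TRUE)
e <- eigen(Sigma)
lambda <- e$values           # 15.349  2.000  0.652
A <- e$vectors               # columns are the coefficient vectors a_i
# Correlation between the ith PC and the kth variable: a_ki * sqrt(lambda_i) / sqrt(sigma_kk)
corr <- (A %*% diag(sqrt(lambda))) / sqrt(diag(Sigma))
round(corr[, 1], 3)          # first column: 0.902  0.964  0.902 (up to sign)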
9.2.4 Principal Components from Special Matrices
The principal components are linear functions of the actual variables and are obtained in decreasing order of importance. The principal components can be obtained from the covariance or correlation matrix of the actual variables. If the covariance or correlation matrix exhibits some specialized pattern then the principal components also show a specialized pattern. One situation that can arise as a special pattern is that a specific variable is uncorrelated with the rest of the variables in the data. If this happens then that variable will ultimately appear as a principal component on its own. Another special situation that may arise is that all the variables have equal pairwise correlation coefficients, that is, we have an equi-correlation matrix. If this sort of matrix is available then the principal components also have a specialized pattern, as discussed below. Suppose we have an equi-correlation matrix of the form
\rho = [  1    \rho   \rho  ...  \rho
         \rho    1    \rho  ...  \rho
         \rho  \rho     1   ...  \rho
          .      .      .          .
         \rho  \rho   \rho  ...    1  ];
then it is easy to show that the first eigen value of \rho is \lambda_1 = 1 + (p-1)\rho and all the other eigen values are equal to (1 - \rho). The eigen vector corresponding to the first eigen value \lambda_1 is
a_1 = (1/\sqrt{p}) [1  1  ...  1]'
and hence the first principal component is
Y_1 = a_1'x = (1/\sqrt{p}) \sum_{i=1}^{p} X_i.
This shows that if we have an equi-correlation matrix then all the variables have equal importance in the first principal component.
Example 9.3: Consider the following covariance matrix of four variables
\rho = [ 1.0  0.7  0.7  0.7
         0.7  1.0  0.7  0.7
         0.7  0.7  1.0  0.7
         0.7  0.7  0.7  1.0 ].
Obtain the first principal component of x and compute the proportion of variation explained by it.
Solution: The given matrix is an equi-correlation matrix, so the first eigen value of this matrix is
\lambda_1 = 1 + (p-1)\rho = 1 + (4-1)(0.7) = 3.1.
The first principal component is
Y_1 = (1/\sqrt{p}) \sum_{i=1}^{4} X_i = (1/2) \sum_{i=1}^{4} X_i.
The proportion of variation explained by the first principal component is
Pr(Y_1) = \lambda_1 / tr(\rho) = 3.1 / 4.0 = 77.5%.
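The equi-correlation result of Example 9.3 can be verified numerically; a minimal sketch in R:
# Equi-correlation matrix of Example 9.3 (p = 4, rho = 0.7)
p <- 4; rho <- 0.7
R <- matrix(rho, p, p); diag(R) <- 1
e <- eigen(R)
e$values                      # 3.1  0.3  0.3  0.3, i.e. 1+(p-1)*rho and (1-rho) repeated
e$vectors[, 1]                # proportional to rep(1, p)/sqrt(p), up to sign
e$values[1] / sum(e$values)   # 0.775, proportion explained by the first PC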
9.2.5 Mean Corrected Component Scores
We have seen that the principal components are linear transformations of the variables and the coefficient vector of a principal component is given as an eigen vector of the covariance or correlation matrix of the actual variables. When information about the mean vector is also given then the principal components are to be corrected for this mean. These corrected principal components are called the mean corrected components and the ith mean corrected principal component is obtained as
Y_i = a_i'( x - \mu ).
If the principal components are obtained from a sample, then the mean corrected principal component is given as
Y_i = a_i'( x - \bar{x} ).
The values of the mean corrected principal components can be computed for each sample observation and these values are known as the mean corrected component scores.
Example 9.4: Consider the following mean vector and covariance matrix
\mu = [ 12      and    \Sigma = [ 6  5
        17 ]                      5  8 ].
Obtain the mean corrected principal components.
Solution: We know that the coefficient vectors for the principal components are given as the eigen vectors of the covariance or correlation matrix. We, therefore, first compute the eigen values of the covariance matrix by using |\Sigma - \lambda I| = 0. Now
|\Sigma - \lambda I| = | 6-\lambda     5
                            5      8-\lambda | = \lambda^2 - 14\lambda + 23.
So the polynomial equation is \lambda^2 - 14\lambda + 23 = 0. The solution is \lambda_1 = 12.099 and \lambda_2 = 1.901. Now the eigen vector corresponding to \lambda_1 = 12.099 is obtained by using \Sigma a_1 = \lambda_1 a_1, so we have
[ 6  5 ] [a_{11}]  =  12.099 [a_{11}]
[ 5  8 ] [a_{12}]            [a_{12}].
Solving the above equations under the constraint a_1'a_1 = 1, we have a_1 = [0.634  0.773]'. Similarly, the eigen vector corresponding to \lambda_2 = 1.901 is a_2 = [-0.773  0.634]'. The two mean corrected principal components are
Y_1 = a_1'(x - \mu) = 0.634(X_1 - 12) + 0.773(X_2 - 17) = 0.634 X_1 + 0.773 X_2 - 20.749
and
Y_2 = a_2'(x - \mu) = -0.773(X_1 - 12) + 0.634(X_2 - 17) = -0.773 X_1 + 0.634 X_2 - 1.502.
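The mean corrected component scores can be computed in R; a minimal sketch for Example 9.4, where the two observations are hypothetical and used only to illustrate the scoring step (eigen() may also flip the signs of the coefficient vectors):
# Mean vector and covariance matrix of Example 9.4
mu    <- c(12, 17)
Sigma <- matrix(c(6, 5,
                  5, 8), nrow = 2, byrow = TRUE)
A <- eigen(Sigma)$vectors          # columns are the coefficient vectors
# Two hypothetical observations (rows), made up purely for illustration
X <- rbind(c(14, 18),
           c(10, 15))
# Mean corrected component scores: y_i = a_i'(x - mu) for each observation
scores <- sweep(X, 2, mu) %*% A
scores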
9.2.6 Number of Components to be Retained
One primary purpose of principal components is to obtain orthogonal linear transformations of the actual variables in decreasing order of importance such that the first few principal components provide the maximum possible information about the actual variables. It is an important decision in principal component analysis to decide about the number of principal components k (k << p), out of a total of p components, that one has to retain for any subsequent analysis. Various methods are available that can be used to decide about the number of principal components to be retained after the analysis and these are discussed in the following.
1. The first and simplest criterion to decide about the number of components to be retained is based upon the cumulative proportion of variation accounted for by the first k principal components. This cumulative proportion is computed as
Pr(k) = \sum_{i=1}^{k} \lambda_i / \sum_{i=1}^{p} \lambda_i ;  k = 1, 2, ..., p.
We choose the value of k such that the cumulative proportion of variation is anywhere between 70% and 80%.
2. The arithmetic mean of the eigen values provides a basis to decide about the number of components to be retained. The arithmetic mean of the eigen values is given as
\bar{\lambda} = (1/p) \sum_{i=1}^{p} \lambda_i,
and that value of k is chosen such that \lambda_k > \bar{\lambda}.
3. The geometric mean of the eigen values is also used as a cut point to decide about the number of components to be retained. The geometric mean of the eigen values is computed as
\bar{\lambda}_g = ( \prod_{i=1}^{p} \lambda_i )^{1/p},
and that value of k is chosen such that \lambda_k > \bar{\lambda}_g.
4. A formal test of significance can also be conducted to decide about the number of principal components to be retained. A popular test is based on testing the equality of the last k population eigen values. The hypothesis to be tested is
H_{0k}: \lambda_{p-k+1} = \lambda_{p-k+2} = ... = \lambda_p ;  k = 2, 3, ..., p,
and is tested by using the statistic, due to James (1969),
Q_k = ( n - (2p+11)/6 ) ( k ln \bar{\lambda}_k - \sum_{i=p-k+1}^{p} ln \lambda_i ),  (9.2.14)
where \bar{\lambda}_k = (1/k) \sum_{i=p-k+1}^{p} \lambda_i. The statistic Q_k, under H_{0k}, has a chi-square distribution with v = (k-1)(k+2)/2 degrees of freedom. We start by testing the equality of the last two eigen values, that is we start with the hypothesis H_{02}: \lambda_{p-1} = \lambda_p, and if this hypothesis is accepted then we test the hypothesis about the equality of the last three eigen values, that is H_{03}: \lambda_{p-2} = \lambda_{p-1} = \lambda_p. We continue this procedure until the hypothesis of equality of eigen values is rejected; if H_{0k} is the first hypothesis to be rejected then the first p - k + 1 principal components are retained.
Example 9.5: The following covariance matrix is obtained from a sample of size 60:
S = [ 0.370  0.602  0.149  0.044   0.107   0.209
      0.602  2.629  0.801  0.666   0.103   0.377
      0.149  0.801  0.458  0.011  -0.013   0.120
      0.044  0.666  0.011  1.474   0.252  -0.054
      0.107  0.103 -0.013  0.252   0.488  -0.036
      0.209  0.377  0.120 -0.054  -0.036   0.324 ].
Use different methods to decide about the number of components to be retained.
Solution: The eigen values of S are obtained as the solution of |S - \lambda I| = 0 and are
\lambda_1 = 3.3234, \lambda_2 = 1.3743, \lambda_3 = 0.4761, \lambda_4 = 0.3247, \lambda_5 = 0.1565, \lambda_6 = 0.0879.
Now we use different criteria to decide about the number of components to be retained.
We first use the proportion of variation criterion. The cumulative proportion of variation accounted for by the components is given in the table below.
Eigen Value   Proportion of Variation   Cumulative Proportion
3.3234        0.579                     0.579
1.3743        0.239                     0.818
0.4761        0.083                     0.901
0.3247        0.057                     0.957
0.1565        0.027                     0.985
0.0879        0.015                     1.000
From the above table we can see that the first two components account for about 82% of the variation in the data and hence we retain two components.
Now to use the second criterion we first compute the mean of the eigen values:
\bar{\lambda} = (1/p) \sum_{i=1}^{p} \lambda_i = 0.957.
We retain only those components which have eigen values above 0.957. We see that only two eigen values are above 0.957 and hence this criterion also suggests retaining only two components.
Next, we compute the geometric mean of the eigen values as
\bar{\lambda}_g = ( \prod_{i=1}^{p} \lambda_i )^{1/p} = 0.462.
Now we see that three eigen values are above 0.462 and hence this criterion suggests that we should retain three components.
Finally we use the test of significance and test the hypotheses about the equality of eigen values sequentially. We start with testing the hypothesis H_{02}: \lambda_{p-1} = \lambda_p. For this hypothesis we have k = 2 and hence the test statistic is
Q_2 = ( n - (2p+11)/6 ) ( k ln \bar{\lambda}_2 - \sum_{i=p-1}^{p} ln \lambda_i ),
where
\bar{\lambda}_2 = (1/k) \sum_{i=p-1}^{p} \lambda_i = 0.1222.
Also \sum_{i=p-1}^{p} ln \lambda_i = -4.286. So
Q_2 = ( 60 - (2 x 6 + 11)/6 ) ( 2 x ln 0.1222 + 4.286 ) = 4.595.
The critical value is \chi^2_{\alpha, (k-1)(k+2)/2} or \chi^2_{0.05, 2} or 5.99. We see that the hypothesis is accepted and we conclude that the last two eigen values are equal. We next test the equality of the last three eigen values by using
Q_3 = ( n - (2p+11)/6 ) ( k ln \bar{\lambda}_3 - \sum_{i=p-2}^{p} ln \lambda_i ),
where
\bar{\lambda}_3 = (1/k) \sum_{i=p-2}^{p} \lambda_i = 0.1897.
Also \sum_{i=p-2}^{p} ln \lambda_i = -5.4111. So
Q_3 = ( 60 - (2 x 6 + 11)/6 ) ( 3 x ln 0.1897 + 5.4111 ) = 23.824.
The critical value is \chi^2_{\alpha, (k-1)(k+2)/2} or \chi^2_{0.05, 5} or 11.07. We see that the hypothesis is rejected and we conclude that the last three eigen values are not equal. This criterion, therefore, suggests retaining four components.
Following R code can be used to compute test statistic Qk for all values of k.
s <- matrix(c(0.370, 0.602, 0.149, 0.044, 0.107, 0.209,
              0.602, 2.629, 0.801, 0.666, 0.103, 0.377,
              0.149, 0.801, 0.458, 0.011,-0.013, 0.120,
              0.044, 0.666, 0.011, 1.474, 0.252,-0.054,
              0.107, 0.103,-0.013, 0.252, 0.488,-0.036,
              0.209, 0.377, 0.120,-0.054,-0.036, 0.324),
            6, 6, byrow = TRUE)
ev <- eigen(s)
lembda <- ev$values
p <- 6; n <- 60
k1 <- c(2, 3, 4, 5, 6)
q <- numeric(length(k1))             # Q_k statistics
for (k in k1) {
  i1 <- (p - k + 1):p                # indices of the last k eigen values
  meanlem <- mean(lembda[i1])        # arithmetic mean of the last k eigen values
  sumlog  <- sum(log(lembda[i1]))    # sum of their logarithms
  q[k - 1] <- (n - (2 * p + 11) / 6) * (k * log(meanlem) - sumlog)
}
q
# 4.612599  23.789316  43.996115  123.809961  245.441020
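The remaining criteria of Section 9.2.6 can be obtained from the same eigen values; a minimal sketch, reusing the objects s and lembda created above:
cumsum(lembda) / sum(lembda)            # cumulative proportion: 0.579 0.818 0.901 ...
mean(lembda)                            # arithmetic mean, 0.957
exp(mean(log(lembda)))                  # geometric mean, 0.462
sum(lembda > mean(lembda))              # 2 components by the arithmetic mean rule
sum(lembda > exp(mean(log(lembda))))    # 3 components by the geometric mean rule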
9.3 The Factor Analysis
The principal component analysis provides an orthogonal transformation of a multivariate system to a new multivariate system by partitioning the total variance of the responses into successively smaller portions. If the first few components account for most of the total variance, and if they can be interpreted meaningfully, the new system is considered more useful. In multivariate studies the covariances provide an insight into the strength of association between pairs of variables and hence we need a technique for explaining the covariances of the responses. Although principal component analysis accomplishes this to some degree through factorization of the covariance matrix, it is still merely a transformation rather than the result of a fundamental model for the covariance structure. The technique possesses other shortcomings, such as that the forms of the components are not invariant under changes in the scales and that no rational criterion exists for deciding when a sufficient proportion of the variance has been accounted for by the principal components.
Factor analysis, abbreviated as FA, is based upon a model in which the observed vector is partitioned into an unobserved systematic part and an unobserved error part. The components of the error vector are considered uncorrelated or independent while the systematic part is taken as a linear combination of a relatively small number of unobserved factor variables. The analysis separates the effects of the factors, which are of basic interest, from the errors. Factor analysis thus derives new variables, called factors, which give a better understanding of the data. Since factor analysis is based upon a proper statistical model it is more concerned with explaining the covariance structure of the variables rather than explaining the variances. Any variance which is unexplained by the common factors can be described by residual error terms. In factor analysis like variables are grouped together to form the factors and then the extracted factors are used as representatives of the actual variables.
Most applications of factor analysis have been in psychology and the social sciences. As one sociological example, suppose that information is collected from a wide range of people about their occupation, type of education, whether or not they own their own home, and so on. Then one might ask if the concept of social class is multidimensional or if it is possible to construct a single index of class from the data; in other words, to find if there is a single underlying factor.
9.3.1 The Factor Analysis Model
In factor analysis it is assumed that the actual variables can be represented as linear combinations of unobserved variables, the common factors, and unobserved errors, the specific factors, and hence each variable can be expressed as a model containing these unobserved variables. It is also assumed that the number of unobserved common factors is much smaller than the number of actual variables. The model is popularly known as the factor analysis model and is given below.
Suppose that the actual variables are X_1, X_2, ..., X_p, the unobserved common factors are f_1, f_2, ..., f_m, with m << p, and the errors are e_1, e_2, ..., e_p. The factor analysis model for the ith variable, assuming that E(X_i) = 0, is
X_i = \lambda_{i1} f_1 + \lambda_{i2} f_2 + ... + \lambda_{im} f_m + e_i = \sum_{j=1}^{m} \lambda_{ij} f_j + e_i ;  i = 1, 2, ..., p.  (9.3.1)
The quantities \lambda_{ij} in (9.3.1) are called factor loadings. The factor analysis model (9.3.1) has certain assumptions which are given below.
1. The common factors have zero mean and unit variance, that is E(f_j) = 0 and Var(f_j) = 1.
2. The common factors are uncorrelated, that is Cov(f_j, f_h) = 0 for j != h.
3. The specific factors have zero mean and possibly different variances, that is E(e_i) = 0 and Var(e_i) = \Psi_i.
4. The specific factors are uncorrelated with each other, that is Cov(e_i, e_k) = 0 for i != k.
5. The common factors and specific factors are uncorrelated, that is Cov(e_i, f_j) = 0.
The factor analysis model (9.3.1) for the random vector x is immediately written as
x_{(p x 1)} = \Lambda_{(p x m)} f_{(m x 1)} + e_{(p x 1)},  (9.3.2)
where \Lambda is the loading matrix given as
\Lambda_{(p x m)} = [ \lambda_{11}  \lambda_{12}  ...  \lambda_{1m}
                      \lambda_{21}  \lambda_{22}  ...  \lambda_{2m}
                       ...
                      \lambda_{p1}  \lambda_{p2}  ...  \lambda_{pm} ],
f_{(m x 1)} = [f_1  f_2  ...  f_m]' is the vector of common factors and e_{(p x 1)} = [e_1  e_2  ...  e_p]' is the vector of specific factors. The assumptions of the orthogonal factor model can be compactly written as below.
1. E(f) = 0 and Cov(f) = E(ff') = I_m.
2. E(e) = 0 and Cov(e) = E(ee') = \Psi = diag(\Psi_i).
3. Cov(e, f) = E(ef') = 0_{(p x m)}.
The factor analysis model (9.3.2) and the associated assumptions are very useful in obtaining the covariance structure of the actual variables in terms of the common factors and specific factors. The covariance structure is given in the following.
9.3.2 Covariance Structure for the Factor Model
The m factor model is given in (9.3.2) as
x_{(p x 1)} = \Lambda_{(p x m)} f_{(m x 1)} + e_{(p x 1)},
with E(x) = 0. Now we have
Cov(x) = \Sigma = E(xx') = E[ (\Lambda f + e)(\Lambda f + e)' ] = E[ \Lambda ff' \Lambda' + \Lambda f e' + e f' \Lambda' + ee' ]
       = \Lambda E(ff') \Lambda' + \Lambda E(fe') + E(ef') \Lambda' + E(ee')
or
\Sigma = \Lambda \Lambda' + \Psi.  (9.3.3)
The covariance structure given in (9.3.3) is useful in obtaining the factor loadings by using the covariance matrix of the variables. From this equation we can see that the orthogonal factor model exactly explains the covariances of the variables, that is the covariances depend only upon the factor loadings and not on the errors. This can also be illustrated by computing the variance of an individual variable and the covariance between any pair of variables as below.
We have
X_i = \lambda_{i1} f_1 + \lambda_{i2} f_2 + ... + \lambda_{im} f_m + e_i = \sum_{j=1}^{m} \lambda_{ij} f_j + e_i.
Since E(X_i) = 0, therefore
Var(X_i) = \sigma_{ii} = E(X_i^2) = E[ ( \sum_{j=1}^{m} \lambda_{ij} f_j + e_i )^2 ]
   = \sum_{j=1}^{m} \lambda_{ij}^2 E(f_j^2) + \sum_{j=1}^{m} \sum_{h != j} \lambda_{ij} \lambda_{ih} E(f_j f_h) + E(e_i^2) + 2 \sum_{j=1}^{m} \lambda_{ij} E(f_j e_i),  (9.3.4)
which, under the model assumptions, reduces to Var(X_i) = \sum_{j=1}^{m} \lambda_{ij}^2 + \Psi_i. Again
Cov(X_i, X_k) = \sigma_{ik} = E(X_i X_k) = E[ ( \sum_{j=1}^{m} \lambda_{ij} f_j + e_i )( \sum_{j=1}^{m} \lambda_{kj} f_j + e_k ) ]
   = \sum_{j=1}^{m} \lambda_{ij} \lambda_{kj} E(f_j^2) + \sum_{j=1}^{m} \sum_{h != j} \lambda_{ij} \lambda_{kh} E(f_j f_h) + \sum_{j=1}^{m} \lambda_{ij} E(f_j e_k) + \sum_{j=1}^{m} \lambda_{kj} E(f_j e_i) + E(e_i e_k),  (9.3.5)
which reduces to \sigma_{ik} = \sum_{j=1}^{m} \lambda_{ij} \lambda_{kj}. From (9.3.5) we can see that the covariance depends only upon the factor loadings and not upon the error variances. The entry h_i = \sum_{j=1}^{m} \lambda_{ij}^2 in the decomposition of \sigma_{ii} is called the commonality and this represents the part of the variance explained by the common factors. The commonalities are the
diagonal entries of \Lambda\Lambda'. The proportion of variance of the ith variable explained by the common factors is therefore
h_i / \sigma_{ii}.  (9.3.6)
The proportion of variance of the ith variable explained by the jth common factor is simply \lambda_{ij}^2 / \sigma_{ii}. The proportion of total variance explained by the common factors is
\sum_{i=1}^{p} h_i / \sum_{i=1}^{p} \sigma_{ii} = \sum_{i=1}^{p} h_i / tr(\Sigma).  (9.3.7)
The proportion of variation explained by the specific factors is easily computed. The proportion of variation explained by the common factors is a useful measure to decide about the number of factors; we will discuss this later in this chapter. The covariance structure given in (9.3.3) provides a relationship between the covariances of the actual variables, the common factors and the specific factors. If the variables are standardized then \Sigma is actually the correlation matrix of the variables and (9.3.3) can be written as
\rho = \Lambda\Lambda' + \Psi.
Further, if sample data are used then the covariance structure for the sample covariance matrix is analogously written, from (9.3.3), as
S = \hat{\Lambda}\hat{\Lambda}' + \hat{\Psi}.  (9.3.8)
The structure for the sample correlation matrix is easily written.
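As an illustration of (9.3.6) and (9.3.7), the sketch below computes the commonalities and the proportions of variance explained from an assumed loading matrix and specific variances; all numbers are made up for illustration only:
# Hypothetical loadings (p = 4 variables, m = 2 factors) and specific variances
Lambda <- matrix(c(0.9, 0.1,
                   0.8, 0.3,
                   0.2, 0.7,
                   0.1, 0.8), nrow = 4, byrow = TRUE)
Psi    <- c(0.18, 0.27, 0.47, 0.35)
Sigma <- Lambda %*% t(Lambda) + diag(Psi)   # covariance structure (9.3.3)
h     <- rowSums(Lambda^2)                  # commonalities, diagonal of Lambda Lambda'
h / diag(Sigma)                             # proportion explained per variable, (9.3.6)
sum(h) / sum(diag(Sigma))                   # proportion of total variance, (9.3.7)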
The covariance structure given in (9.3.3) or (9.3.8) can be used to decide about a suitable value for the number of factors to be extracted. In factor analysis the entries of the covariance matrix \Sigma or S are all known and altogether there are p(p+1)/2 of them. In order to fit the factor analysis model we need to obtain the pm entries of \Lambda and the p entries of \Psi, that is altogether we are required to obtain p(m+1) quantities; the factor analysis model can therefore be fitted if
p(m+1) <= p(p+1)/2
or
m <= (p-1)/2.  (9.3.9)
From (9.3.9) we can see that the value of m should be much smaller than p for fitting of the factor analysis model.
Example 9.6: Consider the following correlation matrix of three variables
\rho = [ 1.00  0.83  0.78
         0.83  1.00  0.67
         0.78  0.67  1.00 ].
Obtain a one factor solution for the model x = \Lambda f + e.
Solution: Here we have p = 3 and m = 1. The factor analysis model for the ith variable is
X_i = \sum_{j=1}^{m} \lambda_{ij} f_j + e_i ;  i = 1, 2, ..., p,
which for m = 1 is
X_i = \lambda_{i1} f_1 + e_i ;  i = 1, 2, ..., p.
The covariance or correlation structure for the orthogonal factor model is \rho = \Lambda\Lambda' + \Psi where, for m = 1, the loading matrix is \Lambda = [\lambda_{11}  \lambda_{21}  \lambda_{31}]'. Now from \rho = \Lambda\Lambda' + \Psi we have, for m = 1,
\rho = [ \lambda_{11}^2 + \Psi_1     \lambda_{11}\lambda_{21}     \lambda_{11}\lambda_{31}
         \lambda_{11}\lambda_{21}    \lambda_{21}^2 + \Psi_2      \lambda_{21}\lambda_{31}
         \lambda_{11}\lambda_{31}    \lambda_{21}\lambda_{31}     \lambda_{31}^2 + \Psi_3 ].
Now comparing this with the given matrix we have
\lambda_{11}^2 + \Psi_1 = 1.00 ; \lambda_{11}\lambda_{21} = 0.83 ; \lambda_{11}\lambda_{31} = 0.78
\lambda_{21}^2 + \Psi_2 = 1.00 ; \lambda_{31}^2 + \Psi_3 = 1.00 ; \lambda_{21}\lambda_{31} = 0.67.
Solving the above equations simultaneously, we have
\lambda_{11} = 0.983 ; \lambda_{21} = 0.844 ; \lambda_{31} = 0.794
\Psi_1 = 0.034 ; \Psi_2 = 0.287 ; \Psi_3 = 0.370.
The one factor solution for the given correlation matrix is, therefore,
X_1 = 0.983 f_1 + e_1
X_2 = 0.844 f_1 + e_2
X_3 = 0.794 f_1 + e_3.
Also the commonalities are
\lambda_{11}^2 = 0.966 ; \lambda_{21}^2 = 0.712 ; \lambda_{31}^2 = 0.630.
The total commonality is \sum_{i=1}^{3} \lambda_{i1}^2 = 2.308. The proportion of variance explained by the common factor is, therefore, 2.308 / 3 = 76.9%.
From the above example we see that, for a one factor model with 3 variables, we need to obtain the values of 6 unknowns. The number of unknowns increases rapidly with increase in p and m. For example, for p = 6 and m = 2 we have to obtain the values of p(m+1) = 6(2+1) = 18 unknowns from p(p+1)/2 = 6(6+1)/2 = 21 available equations. The solution of such a large number of equations is usually done by using computer software.
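For a one factor model with three variables the off-diagonal equations of Example 9.6 can be solved directly; a minimal sketch in R of that computation:
# Correlation matrix of Example 9.6
rho <- matrix(c(1.00, 0.83, 0.78,
                0.83, 1.00, 0.67,
                0.78, 0.67, 1.00), nrow = 3, byrow = TRUE)
# lambda_11 * lambda_21 = 0.83, lambda_11 * lambda_31 = 0.78, lambda_21 * lambda_31 = 0.67,
# so lambda_11^2 = (0.83 * 0.78) / 0.67, and the other loadings follow
l1 <- sqrt(rho[1, 2] * rho[1, 3] / rho[2, 3])
l2 <- rho[1, 2] / l1
l3 <- rho[1, 3] / l1
lambda <- c(l1, l2, l3)            # 0.983  0.844  0.794
psi    <- 1 - lambda^2             # specific variances 0.034  0.287  0.370
sum(lambda^2) / 3                  # proportion explained by the common factor, 0.769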
9.3.3 Correlation between Variables and Factors
Factor analysis is a useful technique for reducing the dimension of the data. The technique tries to obtain a smaller number of factors, by grouping actual variables, which account for the maximum possible variation of the data. The grouping of variables is done on the basis of likeness, that is, variables which are close to each other are grouped together. Alternatively, we can say that a variable is grouped into that factor with which it has maximum correlation, that is, the grouping is done on the basis of the correlation between a variable and a factor. We, therefore, need to compute the correlation between a variable and the factors to decide about the grouping of that specific variable. In the following we compute the correlation coefficient between a variable and a factor. The m factor model is given in (9.3.1) as
X_i = \sum_{j=1}^{m} \lambda_{ij} f_j + e_i ;  i = 1, 2, ..., p.
For the sake of simplicity we assume that the variables are standardized and hence the covariance between a variable and a factor is equal to the correlation coefficient. Now
Cov(X_i, f_j) = E(X_i f_j) = E[ ( \lambda_{i1} f_1 + ... + \lambda_{ij} f_j + ... + \lambda_{im} f_m + e_i ) f_j ]
   = \lambda_{i1} E(f_1 f_j) + ... + \lambda_{ij} E(f_j^2) + ... + \lambda_{im} E(f_m f_j) + E(e_i f_j)
=> Cov(X_i, f_j) = \lambda_{ij}.
From this we can see that the factor loadings \lambda_{ij} are actually the correlation coefficients between the variables and the factors, and hence the variable X_i is grouped in factor f_j if \lambda_{ij} > \lambda_{ih} ; h = 1, 2, ..., m and h != j.
The covariance between X_i and f_j can be extended to the random vector x and the vector of common factors f. Specifically, the covariance between x and f is
Cov(x, f) = E(xf') = E[ (\Lambda f + e) f' ] = E( \Lambda ff' + ef' ) = \Lambda E(ff') + E(ef') = \Lambda.  (9.3.10)
From this result we can see that the loading matrix \Lambda is actually the matrix containing the correlation coefficients between the variables and the factors. Each column of \Lambda contains the correlation coefficients of the actual variables with a specific factor.
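The relation (9.3.10) can be checked by simulation; a minimal sketch in R with hypothetical loadings and specific variances (all numbers below are made up for illustration):
# Simulation check of (9.3.10): cov(x, f) should be close to Lambda
set.seed(1)
n <- 100000; m <- 2; p <- 3
Lambda <- matrix(c(0.9, 0.2,
                   0.7, 0.5,
                   0.1, 0.8), nrow = p, byrow = TRUE)
Psi <- c(0.15, 0.26, 0.35)
f <- matrix(rnorm(n * m), n, m)                       # common factors, unit variance
e <- matrix(rnorm(n * p), n, p) %*% diag(sqrt(Psi))   # specific factors
x <- f %*% t(Lambda) + e                              # factor model x = Lambda f + e
round(cov(x, f), 2)                                   # approximately equal to Lambda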
9.3.4 Factor Rotation
The fitting of the orthogonal factor model, given in (9.3.2), requires obtaining the entries of the loading matrix \Lambda and the entries of the specific variance matrix \Psi. Often it happens that the entries of \Lambda and/or \Psi are not as per the assumptions of the factor model and the solution cannot be treated as an admissible solution. For example, consider the following correlation matrix of three variables
\rho = [ 1.0  0.9  0.7
         0.9  1.0  0.4
         0.7  0.4  1.0 ].
The covariance or correlation structure for the orthogonal factor model is \rho = \Lambda\Lambda' + \Psi where, for m = 1, the loading matrix is \Lambda = [\lambda_{11}  \lambda_{21}  \lambda_{31}]'. Now from \rho = \Lambda\Lambda' + \Psi we have, for m = 1,
\rho = [ \lambda_{11}^2 + \Psi_1     \lambda_{11}\lambda_{21}     \lambda_{11}\lambda_{31}
         \lambda_{11}\lambda_{21}    \lambda_{21}^2 + \Psi_2      \lambda_{21}\lambda_{31}
         \lambda_{11}\lambda_{31}    \lambda_{21}\lambda_{31}     \lambda_{31}^2 + \Psi_3 ].
Now comparing this with the given matrix we have
\lambda_{11}^2 + \Psi_1 = 1.0 ; \lambda_{11}\lambda_{21} = 0.9 ; \lambda_{11}\lambda_{31} = 0.7
\lambda_{21}^2 + \Psi_2 = 1.0 ; \lambda_{31}^2 + \Psi_3 = 1.0 ; \lambda_{21}\lambda_{31} = 0.4.
Solving the above equations simultaneously, we have
\lambda_{11} = 1.255 ; \lambda_{21} = 0.717 ; \lambda_{31} = 0.558
\Psi_1 = -0.575 ; \Psi_2 = 0.486 ; \Psi_3 = 0.689.
From the above we can see that \Psi_1 < 0, which is against the underlying assumption, as the \Psi's are the variances of specific factors and cannot be negative. The solution is, therefore, not admissible. In the above example we have seen that sometimes it is not possible to obtain an admissible solution even for a single factor model. When situations like this arise the factors are to be rotated, that is, we rotate the factor axes. The factor rotation can be done by multiplying the actual factor loadings with some orthogonal matrix. For this, suppose B is an orthogonal matrix such that BB' = B'B = I and consider the factor model as
x = \Lambda f + e = \Lambda B B' f + e = (\Lambda B)(B'f) + e = \Lambda^* f^* + e,
where \Lambda^* = \Lambda B and f^* = B'f. The new factors f^* = B'f are called the rotated factors and the new loadings \Lambda^* = \Lambda B are called the rotated loadings. We can observe that
E(f^*) = E(B'f) = 0
and
Cov(f^*) = Cov(B'f) = B' Cov(f) B = B'B = I,
that is, the rotated factors possess the properties of the actual factor model. The covariance structure for the rotated factor model is easily obtained by considering the covariance structure of the actual model:
\Sigma = \Lambda\Lambda' + \Psi = \Lambda B B' \Lambda' + \Psi = (\Lambda B)(\Lambda B)' + \Psi = \Lambda^* \Lambda^{*'} + \Psi.  (9.3.11)
From this we can see that although the rotated factor loadings \Lambda^* differ from the actual factor loadings \Lambda, they provide the same covariance matrix \Sigma, which means that the factor model solution is not unique, as different choices of the orthogonal matrix B will provide different rotated solutions.
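The non-uniqueness can be seen numerically; a minimal sketch in R, using a hypothetical loading matrix and an arbitrary rotation angle for the orthogonal matrix B:
# Rotation leaves Lambda Lambda' (and hence Sigma) unchanged
Lambda <- matrix(c(0.9, 0.2,
                   0.7, 0.5,
                   0.1, 0.8), nrow = 3, byrow = TRUE)
theta <- pi / 6
B <- matrix(c(cos(theta), -sin(theta),
              sin(theta),  cos(theta)), 2, 2)     # orthogonal: B %*% t(B) = I
Lambda_star <- Lambda %*% B                       # rotated loadings
max(abs(Lambda %*% t(Lambda) - Lambda_star %*% t(Lambda_star)))   # essentially 0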
9.3.5 Principal Factor Analysis
In the previous sub-section we have seen that the solution of the factor analysis model lies in obtaining the entries of the loading matrix \Lambda and the specific variance matrix \Psi. We have seen that obtaining the entries of \Lambda is not an easy task and we have to solve a large number of equations simultaneously. Various methods are available that can be used to obtain the entries of \Lambda and hence a solution for the factor analysis model. One popular method of obtaining the entries of \Lambda is based upon the eigen values and eigen vectors of the covariance matrix \Sigma or S and is known as the principal factor solution. When the factor analysis model is fitted by using this method the analysis is called principal factor analysis. The method is described in the following for the sample covariance matrix S.
We know that the covariance matrix S can be generated from its eigen values and eigen vectors by using
S = \sum_{i=1}^{p} \theta_i c_i c_i',
where \theta_i is the ith largest eigen value and c_i is the corresponding eigen vector of S. The above decomposition can also be written as
S = C D C',  (9.3.12)
where C is an orthogonal matrix whose columns are the normalized eigen vectors of S and D is a diagonal matrix such that D = diag(\theta_i). Since D is diagonal we can write D = D^{1/2} D^{1/2}, where
D^{1/2} = diag( \sqrt{\theta_1}, \sqrt{\theta_2}, ..., \sqrt{\theta_p} ).
Using this representation, we can write (9.3.12) as
S = C D^{1/2} D^{1/2} C' = ( C D^{1/2} )( C D^{1/2} )'.  (9.3.13)
Comparing this equation with the covariance structure of the orthogonal factor model given as
S = \Lambda\Lambda' + \Psi,  (9.3.14)
we can see that \Lambda = C D^{1/2} and \Psi = 0, that is we have a p factor solution for the covariance matrix, as C D^{1/2} is of order (p x p). In factor analysis we want to obtain an m factor solution with m << p, so we can choose the m largest eigen values of S to define the sub-matrix
D_1^{1/2} = diag( \sqrt{\theta_1}, \sqrt{\theta_2}, ..., \sqrt{\theta_m} )
and define C_1 to be a (p x m) matrix whose columns are the eigen vectors corresponding to the eigen values \theta_1 >= \theta_2 >= ... >= \theta_m. In doing so, the representation (9.3.13) is approximated by
S ~ ( C_1 D_1^{1/2} )( C_1 D_1^{1/2} )'.
Now equating this with (9.3.14), we have \Lambda = C_1 D_1^{1/2} and \Psi = S - \Lambda\Lambda', and hence we have the solution for the factor analysis model. This method is very useful in obtaining a solution for the factor analysis model as we need not worry about the number of factors to be extracted. We illustrate this procedure for p = 5 and m = 2 as under. Since p = 5 and m = 2, we have
\Lambda = [ \lambda_{11}  \lambda_{12}
            \lambda_{21}  \lambda_{22}
            \lambda_{31}  \lambda_{32}
            \lambda_{41}  \lambda_{42}
            \lambda_{51}  \lambda_{52} ],
C_1 = [ c_{11}  c_{12}
        c_{21}  c_{22}
        c_{31}  c_{32}
        c_{41}  c_{42}
        c_{51}  c_{52} ]
and
D_1^{1/2} = [ \sqrt{\theta_1}        0
                     0        \sqrt{\theta_2} ].
Now \Lambda = C_1 D_1^{1/2} gives
\Lambda = [ \sqrt{\theta_1} c_{11}   \sqrt{\theta_2} c_{12}
            \sqrt{\theta_1} c_{21}   \sqrt{\theta_2} c_{22}
            \sqrt{\theta_1} c_{31}   \sqrt{\theta_2} c_{32}
            \sqrt{\theta_1} c_{41}   \sqrt{\theta_2} c_{42}
            \sqrt{\theta_1} c_{51}   \sqrt{\theta_2} c_{52} ],
that is, the entries of \Lambda can be easily obtained by equating them with the corresponding entries of C_1 D_1^{1/2}. Once the entries of \Lambda are obtained, the commonalities and the proportion of variance can be easily computed.
Example 9.7: Consider the following covariance matrix
S = [  5.834  -1.666   2.761   1.886
      -1.666   9.717  -2.560  -0.312
       2.761  -2.560   4.287  -0.739
       1.886  -0.312  -0.739  14.277 ].
Use principal factor analysis to obtain a two factor solution for this covariance matrix.
Solution: The principal factor solution is obtained by using the eigen values and eigen vectors of S. The eigen values of S are obtained from |S - \theta I| = 0 and are
\theta_1 = 14.777, \theta_2 = 11.731, \theta_3 = 5.811 and \theta_4 = 1.796.
The eigen vectors corresponding to the first two eigen values of S are
c_1 = [0.241  -0.156  0.034  0.957]'  and  c_2 = [0.351  -0.799  0.428  -0.234]'.
Now for a two factor solution we have
D_1 = [ 14.777     0         =>  D_1^{1/2} = [ 3.844    0
           0     11.731 ]                        0     3.425 ]
and
C_1 = [ 0.241   0.351
       -0.156  -0.799
        0.034   0.428
        0.957  -0.234 ],
so
\Lambda = C_1 D_1^{1/2} = [ 0.928   1.204
                           -0.599  -2.736
                            0.131   1.467
                            3.680  -0.801 ].
So the two factor solution is
X_1 = 0.928 f_1 + 1.204 f_2 + e_1
X_2 = -0.599 f_1 - 2.736 f_2 + e_2
X_3 = 0.131 f_1 + 1.467 f_2 + e_3
X_4 = 3.680 f_1 - 0.801 f_2 + e_4.
Further,
\Lambda\Lambda' = [  2.304  -3.846   1.883   2.445
                    -3.846   7.849  -4.090  -0.013
                     1.883  -4.090   2.166  -0.694
                     2.445  -0.013  -0.694  14.176 ].
The commonalities are, therefore, h_1 = 2.304, h_2 = 7.849, h_3 = 2.166 and h_4 = 14.176. The total commonality is
h = h_1 + h_2 + h_3 + h_4 = 26.495.
Also
tr(S) = 5.834 + 9.717 + 4.287 + 14.277 = 34.115.
The proportion of variance explained by the common factors is, therefore,
Pr(f) = h / tr(S) = 26.495 / 34.115 = 77.66%.
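The principal factor solution of Example 9.7 can be reproduced in R; a minimal sketch (the signs of the eigen vectors, and hence of the loadings, may differ from the hand computation):
S <- matrix(c( 5.834, -1.666,  2.761,  1.886,
              -1.666,  9.717, -2.560, -0.312,
               2.761, -2.560,  4.287, -0.739,
               1.886, -0.312, -0.739, 14.277), nrow = 4, byrow = TRUE)
e  <- eigen(S)
m  <- 2
C1 <- e$vectors[, 1:m]
D1 <- diag(sqrt(e$values[1:m]))
Lambda <- C1 %*% D1              # loading matrix of the two factor solution
h      <- rowSums(Lambda^2)      # commonalities
Psi    <- diag(S) - h            # specific variances
sum(h) / sum(diag(S))            # proportion of variance explained by the common factors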
Further, by using the diagonal entries of S and \Lambda\Lambda', we have the specific variances as \Psi_i = s_{ii} - h_i.