Statistics. Volume 2, Multiple Variable Analysis [2] 9781536151237, 1536151238


247 32 7MB

English Pages [482] Year 2019

Report DMCA / Copyright

DOWNLOAD PDF FILE

Table of contents :
Contents
Preface
Chapter 1
A Correlation Factor
Abstract
1. Introduction
2. A Covariance
3. A Correlation Factor
4. A Rank Correlation
5. A Pseudo Correlation and a Partial Correlation Factor
6. Correlation and Causation
Summary
Chapter 2
A Probability Distribution for a Correlation Factor
Abstract
1. Introduction
2. Probability Distribution for Two Variables
3. The Probability Distribution for Multi Variables
4. Probability Distribution Function for Sample Correlation Factor
5. Approximated Normal Distribution for Converted Variable
6. Prediction of Correlation Factor
7. Prediction with More Rigorous Average Model
8. Testing of Correlation Factor
9. Testing of Two Correlation Factor Difference
Summary
Chapter 3
Regression Analysis
Abstract
1. Introduction
2. Path Diagram
3. Regression Equation for a Population
4. Different Expression for Correlation Factor
5. Forced Regression
6. Accuracy of the Regression
7. Regression for Sample
8. Residual Error and Leverage Ratio
9. Prediction of Coefficients of a Regression Line
10. Estimation of Error Range of Regression Line
11. Logarithmic Regression
12. Power Law Regression
13. Regression for an Arbitrarily Function
14. Logistic Regression
15. Matrix Expression for Regression
Summary
Chapter 4
Multiple Regression Analysis
Abstract
1. Introduction
2. Regression with Two Explanatory Variables
2.1. Regression Line
2.2. Accuracy of the Regression
2.3. Residual Error and Leverage Ratio
2.4. Prediction of Coefficients of a Regression Line
2.5. Estimation of a Regression Line
2.6. Selection of Explanatory Variables
3. Regression with More Than Two Explanatory Variables
3.1. Derivation of Regression Line
3.2. Accuracy of the Regression
3.3. Residual Error and Leverage Ratio
3.4. Prediction of Coefficients of a Regression Line for Multi Variables
3.5. Estimation of a Regression Line
3.6. The Relationship between Multiple Coefficient and Correlation Factors
3.7. Partial Correlation Factor
3.8. Selection of Explanatory Variable
4. Matrix Expression for Multiple Regression
Summary
Chapter 5
Structural Equation Modeling
Abstract
1. Introduction
2. Path Diagram
3. Basic Equation
Summary
Chapter 6
Fundamentals of a Bayes’ Theorem
Abstract
1. Introduction
2. A Bayes’ Theorem
3. Similar Famous Examples
3.1. Three Prisoners
3.2. A Monty Hall Problem
4. Bayesian Updating
4.1. Medical Check
4.2. Double Medical Checks
4.3. Two Kinds of Medical Checks
4.4. A Punctual Person Problem
4.5. A Spam Mail Problem
4.6. A Vase Problem
4.7. Extension of Vase Problem
4.8. Comments on Prior Probability
4.9. Mathematical Study for Bayesian Updating
5. A Bayesian Network
5.1. A Child Node and a Parent Node
5.2. A Bayesian Network with Multiple Hierarchy
Summary
Chapter 7
A Bayes’ Theorem for Predicting Population Parameters
Abstract
1. Introduction
2. A Maximum Likelihood Method
2.1. Example for a Maximum Likelihood Method
2.2. Number of Rabbits on a Hill
2.3. Decompression Sickness
3. Prediction of Macro Parameters of a Set with Conjugate Probability Functions
3.1. Theoretical Framework
3.2. Beta/Binominal (Beta)/Beta Distribution
3.3. Dirichlet/Dirichlet (Multinomial)/Dirichlet Distribution
3.4. Gamma/Poisson (Gamma)/Gamma Distribution
3.5. Normal /Normal/Normal Distribution
3.6. Inverse Gamma/Normal (Inverse Gamma)/Inverse Gamma Distribution
4. Probability Distribution with Two Parameters
4.1. MCMC Method (A Gibbs Sampling Method)
4.2. MCMC Method (A MH Method)
4.3. Posterior Probability Distribution with Many Parameters
4.4. Hierarchical Bayesian Theory
Summary
Beta/Binominal (Beta)/Beta Distribution
Dirichlet/Multinomial (Dirichlet)/Dirichlet Distribution
Gamma/Poisson(Gamma)/Gamma Distribution
Normal/Normal/Normal Distribution
Inverse Gamma/Normal (Inverse Gamma)/Inverse Gamma Distribution
Chapter 8
Analysis of Variances
Abstract
1. Introduction
2. Correlation Ratio (One Way Analysis)
3. Testing of the Dependence
4. The Relationship between Various Variances
5. Multiple Comparisons
6. Analysis of Variance (Two Way Analysis with Non-Repeated Data)
7. Analysis of Variance (Two Way Analysis with Repeated Data)
Summary
One Way Analysis
Chapter 9
Analysis of Variances Using an Orthogonal Table
Abstract
1. Introduction
2. Two Factor Analysis
3. Three Factor Analysis without Repetition
3.1. No Interaction
3.2. The Other Factors
3.3. Interaction
4. Three Factor Analysis with Repetition
Summary
Chapter 10
Principal Component Analysis
Abstract
1. Introduction
2. Principal Component Analysis with Two Variables
3. Principal Component Analysis with Multi Variables
4. Example for Principal Component Analysis (Score Evaluation)
5. Multiple Regression Using Principal Component Analysis
Summary
Chapter 11
Factor Analysis
Abstract
1. Introduction
2. One Factor Analysis
2.1. Basic Equation for One Variable Factor Analysis
2.2. Factor Load
2.3. Factor Point and Independent Load
3. Two-Factor Analysis
3.1. Basic Equation
3.2. Factor Load for Two-Factor Variables
3.3. Uncertainty of Load Factor and Rotation
3.4. Factor Point and Independent Load with Two-Factors
4. General Factor Number
4.1. Description of the Variables
4.2. Basic Equation
4.3. Factor Load
4.4. Uncertainty of Load Factor and Rotation
4.5. Factor Point and Independent Load with General Number-Factors
5. Summary
5.1. Notation of Variables
5.2. Basic Equation
5.3. Normalization of Variables
5.4. Correlation Factor Matrix
5.5. Factor Loading
5.6. Valimax Rotation
5.7. Factor Point
5.8. Independent Factor
5.9. Appreciation of the Results
Chapter 12
Cluster Analysis
Abstract
1. Introduction
2. Simple Example for Clustering
3. The Relationship between Arithmetic Distance and Variance
4. Various Distance
5. Chain Effect
6. Ward Method for Two Variables
7. Ward Method with Many Variables
8. Example of Ward Method with N Variables
Step 1: Sum of Square
Step 2: Sum of the Variance for Whole Combinations
Step 3: First Cluster Formation
Step 4: Sum of Square for Second Time
Step 5: Second Cluster Formation
Summary
Chapter 13
Discriminant Analysis
Abstract
1. Introduction
2. Discriminant Analysis with One Variable
3. Discriminant Analysis with Two Variables
4. Discriminant Analysis for Multi Variables
Summary
References
About the Author
Index
Blank Page
Recommend Papers

Statistics. Volume 2, Multiple Variable Analysis [2]
 9781536151237, 1536151238

  • 0 0 0
  • Like this paper and download? You can publish your own PDF file online for free in a few minutes! Sign Up
File loading please wait...
Citation preview

MATHEMATICS RESEARCH DEVELOPMENTS

STATISTICS VOLUME 2 MULTIPLE VARIABLE ANALYSIS

No part of this digital document may be reproduced, stored in a retrieval system or transmitted in any form or by any means. The publisher has taken reasonable care in the preparation of this digital document, but makes no expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of information contained herein. This digital document is sold with the clear understanding that the publisher is not engaged in rendering legal, medical or any other professional services.

MATHEMATICS RESEARCH DEVELOPMENTS Additional books and e-books in this series can be found on Nova’s website under the Series tab.

MATHEMATICS RESEARCH DEVELOPMENTS

STATISTICS VOLUME 2 MULTIPLE VARIABLE ANALYSIS

KUNIHIRO SUZUKI

Copyright © 2019 by Nova Science Publishers, Inc. All rights reserved. No part of this book may be reproduced, stored in a retrieval system or transmitted in any form or by any means: electronic, electrostatic, magnetic, tape, mechanical photocopying, recording or otherwise without the written permission of the Publisher. We have partnered with Copyright Clearance Center to make it easy for you to obtain permissions to reuse content from this publication. Simply navigate to this publication’s page on Nova’s website and locate the “Get Permission” button below the title description. This button is linked directly to the title’s permission page on copyright.com. Alternatively, you can visit copyright.com and search by title, ISBN, or ISSN. For further questions about using the service on copyright.com, please contact: Copyright Clearance Center Phone: +1-(978) 750-8400 Fax: +1-(978) 750-4470 E-mail: [email protected]. NOTICE TO THE READER The Publisher has taken reasonable care in the preparation of this book, but makes no expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of information contained in this book. The Publisher shall not be liable for any special, consequential, or exemplary damages resulting, in whole or in part, from the readers’ use of, or reliance upon, this material. Any parts of this book based on government reports are so indicated and copyright is claimed for those parts to the extent applicable to compilations of such works. Independent verification should be sought for any data, advice or recommendations contained in this book. In addition, no responsibility is assumed by the publisher for any injury and/or damage to persons or property arising from any methods, products, instructions, ideas or otherwise contained in this publication. This publication is designed to provide accurate and authoritative information with regard to the subject matter covered herein. It is sold with the clear understanding that the Publisher is not engaged in rendering legal or any other professional services. If legal or any other expert assistance is required, the services of a competent person should be sought. FROM A DECLARATION OF PARTICIPANTS JOINTLY ADOPTED BY A COMMITTEE OF THE AMERICAN BAR ASSOCIATION AND A COMMITTEE OF PUBLISHERS. Additional color graphics may be available in the e-book version of this book.

Library of Congress Cataloging-in-Publication Data ISBN:  H%RRN

Published by Nova Science Publishers, Inc. † New York

CONTENTS Preface

vii

Chapter 1

A Correlation Factor

1

Chapter 2

A Probability Distribution for a Correlation Factor

17

Chapter 3

Regression Analysis

69

Chapter 4

Multiple Regression Analysis

117

Chapter 5

Structural Equation Modeling

169

Chapter 6

Fundamentals of a Bayes’ Theorem

179

Chapter 7

A Bayes’ Theorem for Predicting Population Parameters

223

Chapter 8

Analysis of Variances

277

Chapter 9

Analysis of Variances Using an Orthogonal Table

305

Chapter 10

Principal Component Analysis

339

Chapter 11

Factor Analysis

361

Chapter 12

Cluster Analysis

431

Chapter 13

Discriminant Analysis

453

References

461

About the Author

463

Index

465

Nova Related Publications

469

PREFACE We utilize statistics when we evaluate TV program rating, predict a result of voting, prepare stock, predict the amount of sales, and evaluate the effectiveness of medical treatments. We want to predict the results not on the base of personal experience or images, but on the base of the corresponding data. The accuracy of the prediction depends on the data and related theories. It is easy to show input and output data associated with a model without understanding it. However, the models themselves are not perfect, because they contain assumptions and approximations in general. Therefore, the application of the model to the data should be careful. We should know what model we should apply to the data, what are assumed in the model, and what we can state based on the results of the models. Let us consider a coin toss, for example. When we perform a coin toss, we obtain a head or a tail. If we try the coin toss three times, we may obtain the results of two heads and one tail. Therefore, the probability that we obtain for heads is 2/3, and the one that we obtain for tails is 1/3. This is a fact and we need not to discuss this any further. It is important to notice that the probability (2/3) of getting a head is limited to this trial. Therefore, we can never say that the probability that we obtain for heads with this coin is 2/3, in which we state general characteristics of the coin. If we perform the coin toss trial 400 times and obtain heads 300 times, we may be able to state that the probability of obtaining a head is 2/3 as the characteristics of the coin. What we can state based on the obtained data depends on the sample number. Statistics gives us a clear guideline under which we can state something is based on the data with corresponding error ranges. Mathematics used in statistics is not so easy. It may be a tough work to acquire the related techniques. Fortunately, the software development makes it easy to obtain results. Therefore, many members who are not specialists in mathematics can do statistical analysis with these softwares. However, it is important to understand the meaning of the model, that is, why some certain variables are introduced and what they express, and what we can state based on the results. Therefore, understanding mathematics related to the models is invoked to appreciate the results.

Kunihiro Suzuki

viii

In this book, we treat models from fundamental ones to advanced ones without skipping their derivation processes as possible as I can. We can then clearly understand the assumptions and approximations used in the models, and hence understand the limitation of the models. We also cover almost all the subjects in statistics since they are all related to each other, and the mathematical treatments used in a model are frequently used in the other ones. We have many good practical and theoretical books on statistics [1]-[10]. However, these books are oriented to special direction: fundamental, mathematical, or special subjects. I want to add one more, which treats fundamental and advanced models from the beginnings to the advanced ones with a self contained style. I also aim to connect theories to practical subjects. This book consists of three volumes:   

The first volume treats the fundamentals of statistics. The second volume treats multiple variable analysis. The third volume treats categorical and time dependent data analysis

This volume 2 treats multiple variable analysis. We frequently treat many variables in the real world. We need to study relationship between these many variables. First of all, we treat correlation and regression for two variables, and it is extended to multiple variables. We also treat various techniques to treat many variables in this volume. The readers can understand various treatments of multiple variables. We treat the following subjects. (Chapter 1 to 4) Up to here, we treat statistics of one variable. We then study correlation and regression for two variables. The probability function for the correlation function is also discussed, and it is extended to multiple regression analysis. (Chapter 5) We treat data which consist of only explanatory variables. We assume an objective variable based on the explanatory variables. The analysist can assume the image variables freely, and he can discuss the introduced variables quantitatively based on the data. This process is possible with the structural equation modeling. We briefly explain the outline of the model.

Preface

ix

(Chapter 6 and 7) We treat a Bayes’ theorem which is the extension of conditional probability, and is vital to analyze problems in real world. The Bayes’ theorem enables us to specify the probability of the cause of the event, and we can increase the accuracy of the probability by updating it using additional data. It also predict the two subject which are parts of vast network. Further, we can predict population prametervalues or the related distribution of a average, a variance, and a ratio using obtained data. We further treat vey complex data using a hierarchical Bayesian theory. (Chapter 8 and 9) We treat a variance analysis and also treat it using an orthogonal table which enables us to evaluate the relationship between one variable and many parameter variables. Performing the analysis, we can evaluate the effectiveness of treatment considering the data scattering. (Chapter 10 to 13) We then treat standard multi variable analysis of principal component analysis, factor analysis, clustering analysis, and discriminant analysis. In principal component analysis, we have no objective variable, and we set one kind of an objective variable that clarify the difference between each members. The other important role of the analysis is that it can be used to lap the variables which have high correlation factors. A high correlation factor induces unstable results in multi variable analysis, which can be overcome by the proposed method associated with the principal component analysis. In factor analysis, we also have no objective variable. We generate variables that explain the obtained results. Cluster analysis gives us to make groups that resemble each other. In discriminal analysis, we have two groups. We decide that a new sample belongs to which group. We do hope that the readers can understand the meaning of the models in statistics and techniques to reach the final results. I think it is not easy to do so. However, I also believe that this book helps one to accomplish it with time and efforts. I tried to derive any model from the beginning for all subjects although many of them are not complete. It would be very appreciated if you point out any comments and suggestions to my analysis.

Kunihiro Suzuki

Chapter 1

A CORRELATION FACTOR ABSTRACT Analysis of a relationship between two variables is an important subject in statistics. The relationship is expressed with a covariance, and it is extended to a non-scale parameter of correlation factor. The correlation factor has a value range between -1 and 1, and it expresses the strength of the relationship between two variables. The ranking data have a special data form, and the corresponding correlation factor has its special form and is called as a Spearman’s correlation factor. We also discuss a pseudo correlation factor where the other variable makes as if two variables have a relationship each other.

Keywords: covariance, correlation factor, ranking data, Spearman’s correlation factor, pseudo correlation

1. INTRODUCTION We guess that smoking influences the lifespans based on our feeling. We want to clarify the relationship quantitatively. We obtain a data set of smoking or no-smoking and their lifespans. Using the data, we should obtain a parameter that expresses the significance of a relationship between smoking and the lifespan, and want to decide whether the two items have a relationship or not. It is important to study the relationship between the two variables quantitatively. We focus on the two quantitative data and study to quantify the relationship in this chapter.

Kunihiro Suzuki

2

2. A COVARIANCE We consider the case where we obtain various kinds of data from one sample. Table 1 shows concrete data of scores of mathematics, physics, and language for 20 members. Table 1. Data for scores of mathematics, physics, and language Mathematics 52 63 34 43 56 67 35 53 46 79 75 65 51 72 49 69 55 61 41 34

Physics 61 52 22 21 60 70 29 50 39 78 70 59 56 79 48 73 57 68 42 36

Language 100 66 64 98 86 35 43 47 65 43 72 55 75 82 81 99 57 81 75 63

100

100

80

80

Score of language

Score of physics

Member ID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

60 40 20 0

0

20 40 60 80 Score of mathematics (a)

100

60 40 20 0

0

20 40 60 80 Score of mathematics (b)

100

Figure 1. Scatter plots. (a) Sore of mathematics vs score of physics. (b) Score of mathematics vs score of language.

A Correlation Factor

3

We can see the relationship between scores of mathematics and physics or between scores of mathematics and language by obtaining scatter plots as shown in Figure 1 (a) and (b). We can qualitatively say that the member who gets a high score of mathematics is apt to get a high score of physics, and the member who gets a low score of mathematics is apt to get a low score of physics. We can also say that the score of mathematics and that of language have no clear relationship. We want to express the above discussions quantitatively. We consider the relationship between scores of mathematics and physics, or between scores of mathematics and language. The average of mathematics is given by N

x  E  X  

x

i

i 1

N

(1)

where we express the score of mathematics for each member as xi , and N is the number of members, and N is 20 in this case. The average of physics or language is given by N

 y  E Y  

y i 1

i

N

(2)

where we express the score of them for each member as yi . The variance of mathematics is given by N

 xx  2  E  X   x    2





 x   

2

x

i 1

N

(3)

The variance of physics or language is given by

 y    N

2  yy  2  E Y   y   





2

y

i 1

N

(4)

Kunihiro Suzuki

4

 2  2     2 Note that we express the variances as  xx and yy instead of  x and y to extend them to a covariance which is defined soon after. In the two variables, we newly introduce a covariance, which is given by 2

 xy  2  E  X   x  Y   y 

 x    y    N



x

i 1

y

N

(5)

In the above discussion, we implicitly assume that the set is a population. We assume that the sample size is n . A sample average of a variable x is denoted as x and is given by n

x

x

i

i 1

n

(6)

A sample average of

y

is given by

n

y

y

i

i 1

n

(7)

An unbiased variance of x is given by n

sxx    2

 x  x 

2

i 1

n 1

(8)

A sample variance of x is given by n

S xx    2

x  x 

2

i 1

n

An unbiased variance of

(9)

y

is given by

A Correlation Factor n

s yy

 2



 y  y 

2

i 1

n 1

(10)

A sample variance of n

S yy    2

5

 y  y 

y

is given by

2

i 1

n

(11)

We can also evaluate the variances given by n

sxy    2

  x  x  y  y  i 1

n 1

(12)

n

S xy    2

  x  x  y  y  i 1

n

(13)

We study characteristics of the covariance graphically as shown in Figure 2. The covariance is the product of deviations of two variables with respect to each average. The deviation is positive in the first and third quadrant planes, and negative in the second and fourth quadrant planes. Therefore, if the data are only on the first and third quadrants, the product is always positive, and hence sum becomes large. If the data is scattered on all quadrant planes, the product sign is not constant, and some terms vanish each other, and hence the sum becomes small. The covariance between scores of mathematics and physics is 132.45, and that between scores of mathematics and language is 40.88. Therefore, the magnitude of the covariance expresses the strength of the relationship qualitatively. Let us consider the 100 point examination, which corresponds to Table 1. We can convert these scores to 50 point one simply by divided by 2. The corresponding scatter plots are shown in Figure 3. The relationship must be the same as that of Figure 1(a). However, the corresponding covariance is 40.88, which is much smaller than that of 100 points. Since we only change the scale in both figures, we expect the same significance of the relationship. Therefore, we want to obtain a parameter that expresses the strength of relationship independent of the data scaling, which is not the covariance.

Score of physics - y

40

-

+

20 0 -20 -40

-

+

-40 -20 0 20 40 Score of mathematics - x (a)

Score of language - y

Kunihiro Suzuki

6

40

-

+

+

-

20 0 -20 -40

-40 -20 0 20 40 Score of mathematica - x (b)

Figure 2. Scattered plots of deviations with respect to each variable’s average. (a) Sore of mathematics vs score of physics. (b) Score of mathematics vs score of language.

Score of physics

50 40 30 20 10 0

0

10 20 30 40 Score of mathematica

50

Figure 3. Scatter plots under 50 points.

3. A CORRELATION FACTOR A correlation factor

 xy 

 xy

is introduced as

 xy  2  xx  2  yy  2

(14)

A Correlation Factor

7

We can also define a sample correlation factor given by sxy 

rxy 

sxx 

2

2

s yy 

2

S xy 



S xx 

2

2

S yy 

2

(15)

  0.94

We obtain the same correlation factor of xy expected independent of the scale. We evaluate a following expectation given by



for Figure 1(a) and Figure 3 as is



2 E    X   x   Y   y     2 2  E  2  X   x   2  X   x  Y   y   Y   y    

  2 xx   2 xy    yy  2

2

2

 2  xy 2  2    xx   2  2     yy     xx   2 2   xy 2    xy 2   2  2    xx     2     2     yy        xx   xx     2

  xx

where

  

2

 xy    2  2    xy 2     yy   0  2    xx   xx  2

 2

(16)

2

 is an arbitrary constant. Since the formula has a quadratic form, we insist on

    2 xy

 xx 2

2

    yy 0 2

(17)

Therefore, we obtain

    2 xy

2

 xx 2 yy 2

   2 xy     2  2  xx yy

2

    2 1 xy  

Finally, the range of the correlation factor is given by

(18)

Kunihiro Suzuki

8

1   xy  1

(19)

If there is no correlation, that is, two variables are independent of each other,  xy is 0. If the correlation is significant, the absolute value of  xy approaches to 1. The above theory is for a population set. The sample correlation factor rxy is simply given by rxy 

s xy 

2

s xx   s yy  2

2

(20)

This is also expressed with sample variances given by rxy 

sxy 

2

sxx   s yy  2

2



S xy 

2

S xx   S yy  2

2

(21)

4. A RANK CORRELATION There is a special data form where we rank the team. If we evaluate m teams, the corresponding data is from 1 to m . We want to obtain the relationship between each member’s evaluation. The theory is exactly the same for the correlation factor. However, the data form is special, and we can derive a special form of a correlation factor, which is called as a Spearman’s correlation factor. Let us derive the correlation factor. Table 2 shows the ranking of each member to the football teams. We want to know the resemblance of each member, which is evaluated by the correlation factor. The correlation factor is simply evaluated by the procedure shown in this chapter, which is shown in Table 3. It is clear that member 1 resembles with member 3. Table 2. Ranking of each member to teams Member ID m1 m2 m3

Team A

B 2 4 1

C 3 5 3

D 5 1 4

E 1 6 2

F 6 3 6

4 2 5

A Correlation Factor

9

Table 3. Correlation factor of each member m1

m2 m3 -0.771429 0.8857143 -0.6

m1 m2 m3

We denote the data of the member 1 as a1 , a2 ,

b1 , b2 ,

, an , and the data of the member 2 as

, bn . The correlation factor is evaluated as n

a

r

i

i 1

n

a i 1

i





 a  bi  b

 a

2

 b  b  n

i 1

2

i

(22)

The averages for a and b are the same and is decided by the team number, and is given by

a b 

n 1 2

(23)

A variance is also decided only by the team number, and is given by n

a i 1

i

n



 a    bi  b 2

i 1



2

n 1  n 1  n 1   1    2    3    2 2 2       2

n  n 1  i   2  i 1 

2

 

 2n  1 n  n  1 6

2

  n  1

2

n  n  1

2 n  n  1 4n  2  6n  6  3n  3 n  n  1 n  1 12

n 1   n   2  

2

n n  n 1   i 2   n  1  i  n    2  i 1 i 1



1

n

 n  1

2

4

12

(24)

Kunihiro Suzuki

10 We then evaluate a covariance. Further, we obtain

 ai  bi 

    a  a   b  b 

  ai  a   bi  b 

2

2

i

2

2

i



 2  ai  a  bi  b







(25)

1 2  ai  bi  2

(26)

 2  ai  a   2  ai  a  bi  b 2

Therefore, we obtain

 ai  a   bi  b    ai  a 

2



Finally, we obtain the Spearman’s correlation factor form as n

r

a

a  2

i

i 1

n

a i 1

i

1 n 2  ai  bi   2 i 1  a

n

6  ai  bi 

1

2

2

i 1

n  n  1 n  1

(27)

This gives the same results in Table 3. The maximum value for correlation factor is 1, and hence we set n

1 1

6  ai  bi 

2

i 1

n  n  1 n  1

(28)

This leads to

ai  bi The minimum value of the correlation factor is -1, and hence we set

(29)

A Correlation Factor n

1  1 

6  ai  bi 

11

2

i 1

n  n  1 n  1

(30)

This leads to n

a  b  i 1

i

2

i

1  n  n  1 n  1 6

(31)

We can easily guess this is the case when one data is 1,2, ,n , and corresponding the other data is n, n  1, ,1 , respectively. The corresponding term is n

 i  n  i  1 i 1

n

n

n

i 1

i 1

i 1

 n i   i 2   i 1 1 1  n n  n  1  n  n  1 2n  1  n  n  1 2 6 2 1 1    n  n  1  n   2n  1  1 2 3   1  n  n  1 n  2  6

(32)

This agrees with Eq. (31)。 Random can be expressed by r  0 , this expressed by n

0 1

6  ai  bi 

2

i 1

n  n  1 n  1

(33)

This leads to n

a  b  i 1

i

i

2

1  n  n  1 n  1 6

(34)

Kunihiro Suzuki

12

5. A PSEUDO CORRELATION AND A PARTIAL CORRELATION FACTOR r

Even when the correlation factor xy for two variables of x and y is large, we cannot definitely say that there is a causal relationship. Let us consider the case of the number of sea accident and amount of ice cream sales. The back ground of the relationship is the temperature. If the temperature increases, the number of people who go to the sea increases, and the number of the accident increases, and the amount of ice cream sales may increase. The relationship between the accident number and the ice cream sales is related to the same one parameter of the temperature. The second example is the relationship between annual income and running time for 50 m. The income increases with age, and the time increase with the age. We then obtain the relationship between the income and the time thorough the parameter of the age. The above feature can be evaluated as the followings. When there is a third variable

u , and x and y both have correlation relationships,

we then obtain a strong correlation factor between x and y even if there is no correlation between them. The correlation under this condition is called as pseudo correlation. We want to obtain the correlation eliminating the influence of the variable The regression relationship between x and

S xx  2

Xi  x 

 2 Suu

u is given by

rxu  ui  u  (35)

Similarly, the regression relationship between y and

Yi  y 

 2 S yy  2 Suu

u.

u is given by

ryu  ui  u  (36)

The residuals for x and y associated with the

u are given by

exi  xi  X i  xi  x 

S xx 2 Suu 2

rxu  ui  u  (37)

A Correlation Factor

13

eyi  yi  Yi  yi  y 

S yy 2 Suu 2

ryu  ui  u  (38)

u . The correlation factor between these variables is the one eliminating the influence of u and is called as a causal These are the variables eliminating the influence of

correlation factor. The causal correlation factor is denoted as

rxy ,u 

rxy ,u

, and is given by

Sex2ey Sex2ex Sey2ey

(39)

where

Sexi2e xi  

  S xx 2 S xx 2 1 n     x  x  r u  u x  x  r u  u   i    i  2  xu i  2  xu i n i 1     Suu Suu    2 S xx 2 1 n 1 S xx  2 n 1 n 2 2 x  x  r u  u  2 r i    i  n S 2 xu    x  x ui  u   2  xu n i 1 i n i 1 i  1 Suu uu

 S xx 2 

S xx 2 2  2 r S  2S xx 2 rxu2  2  xz uu Suu

 S xx 2 1  rxz2 

(40) Similarly, we obtain

Seyi2e yi  S yy 2 1  ryu2  On the other hand, we obtain

(41)

Kunihiro Suzuki

14 Sexi e yi  2

2 2   S yy  S xx  1 n     x  x  r u  u y  y  ryu  ui  u     i xu  i i 2 2  n i 1   Suu  Suu    

S yy 2 1 n 1 n    xi  x  yi  y     xi  x  ryu  ui  u  2 n i 1 n i 1 Suu  

  2   S  2 S xx 2 1 n 1 n  S xx yy    y  y r x  x  r u  u r u  u     2 yu  i     i   2 xu  i  n   2  xu i n i 1  Suu i 1  S Suu  uu   S yy 2

 S xy 2  ryz

 2

S xx 2

S xz 2  rxz

 2

S zz

S yz 2  rxz ryz

S zz

S xx 2 S yy 2  2

S zz

S zz 2

 S xy   S yy  S xx  rxu ryu  S yy  S xx  ryu rxu  S xx  S yy  rxu ryu 2

2

2

2

2

2

2

 S xy   S yy  S xx  rxu ryu 2

2

2

(42)

Finally, we obtain

rxy ,u  

 2  2  2 S xy  S yy S xx rxu ryu

1  r  S   1  r  S   2 xu

2 xx

2 yu

2 yy

rxy  rxu ryu

1  r 1  r  2 xu

2 yu

(43)

The other try is to take data considering the stratification. For example, if we take data considering the age, we take correlation selecting the limited age range with no relation between the income and running time. The above means the followings. If we have some doubt about the correlation and have some image of the third parameter that should be the cause of the correlation, we should take data for the limited range of the third parameter. This means that we should take data of two variables of x and y with the limited range of the third parameter of u where we can approximate that the value of u is constant. We can then evaluate the real relationship between x and y .

6. CORRELATION AND CAUSATION We should be careful about that a high correlation factor does not directly mean causation. On the other hand, if two variables have strong causation relationship, we can

A Correlation Factor

15

always expect a high correlation factor. Therefore, a high correlation is a requirement, but is not a sufficient condition. For example, there is implicitly the other variable, and two variables relate strongly to this implicit variable. We then observe a strong correlation. However, this strong correlation is related to the implicit variable. The relationship between two variables is summarized in Figure 4. The key process is whether there is an implicit variable.

Figure 4. Summary of the relationship between correlation and causation.

SUMMARY To summarize the results in this chapter, the covariance between variable given by

 xy  2 

x

and y is

1 N   x  x   y   y  N i 1

This expresses the relationship between two variables. This can be extended to the correlation factor given by

 xy 

 xy  2  xx  2  yy  2

Kunihiro Suzuki

16

 2  yy2   xx where and are variances of variables x and y given by

 xx 2 

1 N 2   x  x  N i 1

 yy  2 

2 1 N  y  y  N i 1

The correlation factor is given by

1   xy  1 When the data are order numbers, the related correlation factor is called as a Spearman’s factor and is given by n

r

a i 1

a  2

i

n

a

i

i 1

n

1

1 n 2   ai  bi  2 i 1  a

6  ai  bi 

2

2

i 1

n  n  1 n  1

If the correlation factor is influenced by the third variable correlation factor eliminating the influence of factor, and is given by.

rxy ,u 

rxy  rxu ryu

1  r 1  r  2 xu

2 yu

u , we want to know the

u and is called as a causal correlation

Chapter 2

A PROBABILITY DISTRIBUTION FOR A CORRELATION FACTOR ABSTRACT We derive a distribution function for the correlation factor, and the distribution function with a zero correlation factor is obtained as the special case of it. Using the distribution function, we evaluate the prediction range of the correlation factor, and the validity of the relationship.

Keywords: covariance, correlation factor, prediction, testing

1. INTRODUCTION We can easily obtain a correlation factor with one set of data. However, it should have an error range, which we have no idea up to here. The accuracy of the correlation factor should depend on the sample number. We want to evaluate the error range for the correlation factor. We therefore study a probability distribution function for a correlation factor. We can then predict the error range of the correlation factor. We perform matrix operation in the analysis here. The basics of matrix operation are described in Chapter 15 of volume 3.

2. PROBABILITY DISTRIBUTION FOR TWO VARIABLES We consider a probability distribution for two variables which have relation to each other.

Kunihiro Suzuki

18

We first consider two independent normalized variables z1 and z2 , which follow standard normal distributions. The corresponding joint distribution is given by f  z1 , z2  dz1dz2  f  z1  f  z2  dz1dz2 

1



z12 2

1



z2 2 2 dz dz 1 2

e e 2 2 1  1   exp   z12  z2 2  dz1dz2 2 2  





(1)

We then consider two variables with a linear connection of z1 and z2 , which is given by  x1  az1  bz2   x2  cz1  dz2

(2)

where

ad  bc  0

(3)

Eq. (2) can be expressed with a matrix form as  x1   a    x2   c

b   z1    d   z2 

(4)

We then obtain  z1   a b      z2   c d  d  D  c

1

 x1     x2  b   x1    a  x2 

(5)

where D is a determinant of the matrix, given by D

1 ad  bc

(6)

A Probability Distribution for a Correlation Factor

19

We then have  z1  dDx1  bDx2   z2  cDx1  aDx2

(7)

The corresponding Jacobian (see Appendix 1-5 of volume 3) is given by

J 



z1 x1

z1 x2

z2 x1

z2 x2

dD bD cD aD

 adD 2  bcD 2  D

(8)

Therefore, we obtain f  z1 , z2  dz1dz2  f  x1 , x2  dx1dx2   dx  bx 2   cx  ax 2  1 2 1 2  exp   D 2  dx1dx2 2 2     D

(9)

We can evaluate averages and variances of x1 and x2 as

E  x1   E  az1  bz2   aE  z1   bE  z2   0

(10)

E  x2   E cz1  dz2   cE  z1   dE  z2   0

(11)

2 E  x12   E  az1  bz2    

 2

(12)

 a  b  1 2

2

2 E  x22   E  cz1  dz2    

 2

 c  d  2 2

2

(13)

Kunihiro Suzuki

20

Cov  x1 , x2   E  x1 x2   E  x1  E  x1   E  az1  bz2  cz1  dz2    acE  z12   bdE  z2 2    ad  bc  E  z1 z2 

(14)

 ac  bd  12  2

The correlation factor is given by

 

  12 2

2 2 1   2 

(15)

ac  bd a 2  b2 c2  d 2

We then obtain

1 2 



2 2  2  2  1  2    12  21 2 2  1  2 

2 2 2  2   1  2    12 



 2  2

a 



1  2 2





 b 2 c 2  d 2   ac  bd 

(16)

2

12 2 2

2 ad  bc    2 2 1  2 

The determinant is then expressed by

D 

1  ad  bc

1

1        2

2

1

2 2

We preform calculation in the exponential term of

(17)

f  x1 , x2 

as

A Probability Distribution for a Correlation Factor



21

 dx1  bx2 2   cx1  ax2 2 D 2 2









1    d 2  c 2 x12  2  bd  ca  x1 x2  b 2  a 2 x22  D 2  2 1 2 1 2 2 2     2  x12  2   1  2  x1 x2   1  x2 2  2 2     2  1  2 1   2







1

2 1 2



  2 x1 x2 x2 2    2 x1  2    2   2  2  2   2    1 2  1 2 



f  x1 , x2 

We then have

f  x1 , x2  

(18)

1 2

1    2

as  1 exp   2 2 2 1 2 1  2  





 2 x1 x2 x22  x1  2    2    2 2 2 1  2   2  1

     

(19)

This is the probability distribution for two variables that follow a normal distribution

 0,   

 0,   

2

2 2

with and , and the correlation factor between the two variables is  . We need to generalize this for a certain averages instead of 0. Therefore, we introduce two variables as 1

u1  x1  1  u2  x2  2

(20)

 ,     ,      These two variables follow standard distributions with and . The 2

2

1

1

2

2

Jacobian is not changed, and hence

f  x1 , x2  dx1dx2  f  u1 , u2  du1du2

(21)

We then obtain a probability distribution as

f  u1 , u2  

 1 exp    2 1 2  2  2 2 1   1  2  1

2









2 2     u1  1   2   u1  1  u2  2    u1  2    2    2 2 2  2    1  2  1  

(22)

Kunihiro Suzuki

22

u1 and u2 are the dummy variables, and we can change them to any notation, and

hence we use common variables x and y , and obtain

f  x, y  dxdy 

 1  exp    2  2 2 2 1  2 1    xx  yy  1

2











 

 2   x  x   2  x  x  y   y  y   y   2  2  2  yy2   xx  xx  yy 



2

   dxdy   

(23) This is the probability distribution for two variables that follow normal distributions

 ,     ,      with and . The correlation factor for the two variables is  . 2

1

2

2

1

2

The probability associated with x  y plane region is given by

P



A

f  x, y dxdy

(24)

where A is the region in the x  y plane.

3. THE PROBABILITY DISTRIBUTION FOR MULTI VARIABLES We can extend the probability distribution for multi variables. We consider m variables. The variables can be expressed as a form of vector as  x1    x x 2       xm 

(25)

The corresponding probability distribution can be expressed by

f  x1 , x2 , , xm  

where

1 m

 2  2

 D2  exp     2  

(26)

A Probability Distribution for a Correlation Factor   11 2  12 2   2   22 2    21     2   2 m2  m1

 1m2    2 2m     2   mm 

23

(27)

and D is given by

D 2   x1  1 , x2  2 ,

  11 2  12 2  21 2   22 2 , xm  m      m1 2  m 2 2 

   xi  i   x j   j   m

m

 1m 2   x1  1     2 m 2   x2  2 

     mm 2   xm  m 

(28)

ij  2 

i 1 j 1

where

 k 

 1k    k       2      k    m 

  11 2  21 2      m1 2 

(29)

 12 2

 1m 2    111  12 2

 1m2 

 22 2

 2  22 2  2 m 2    21

 2 2m 

 m 2 2 

 

   mm  2     2     m1

 m 22

1



(30)

   2   mm 

We do not treat this distribution here, but use it in Chapter 4 where multi-regression is discussed.

4. PROBABILITY DISTRIBUTION FUNCTION FOR SAMPLE CORRELATION FACTOR We pick up n pair data from a population set. The obtained data are expressed by

 x1, y1  ,  x2 , y2  , ,  xn , yn  , and the sample correlation factor r

is given by

Kunihiro Suzuki

24 r



1

1 2 2     n S xx S yy

n

  x  x  y i

1   2   2   n S xx S yy 1

i

 y

i 1

n

 i 1

(31)

 xi yi  x y   

where   S xx  2

  S yy  2

1 n

1 n

n

 x

i

 x

2

i 1

(32)

n

 y  y 

2

i

i 1

(33)

We start with the probability distribution for correlation factor. The simultaneous probability distribution for

f  x, y  

 1  exp   2  2 2 2 1  2  2  xx  yy 1    1





 x, y  , and modify it for the sample

 x, y  is given by 

 

2  2  x  x  y   y x  y   x  x    2      2  2  yy2  xx  yy xx 



2

     

(34)

    2 where  is the correlation factor of the population, xx and yy are the variance of 2

x

and y , respectively. We perform variable conversions as t

u

x  x  2  xx

(35)

y  y

 yy2 

The corresponding Jacobian is evaluated as follows.

(36)

A Probability Distribution for a Correlation Factor

25

x  2   xx t

(37)

x 0 u

(38)

y 0 t

(39)

y 2   yy  u

(40)

We then have   x, y   t, u 



 2  xx

0

0

 yy2 

(41)

      xx  yy 2

2

The probability distribution for simultaneous variable pair  t , u  is then given by f  x, y  dxdy  f  t , u  dtdu 

 1 exp    2  2 2  2 1 2 2  xx  yy 1   



 1 exp    2 1 2 2 1   2 

1

1







t

2



t

2

  2  2  2  tu  u 2   xx  yy dtdu  



  2  tu  u 2  dtdu  



(42)

We further introduce a variable given by

v

u  t 1 2

and convert the variable pair from  t , u  to  t , v  . The corresponding Jacobian is given by

(43)

Kunihiro Suzuki

26 t 1 t

(44)

t 1  v v t 1

 

(45)

 1 2 1 2





u 0 t

(46)

u  1 2 v

(47)

and we obtain  t, u   t, v 



1 

1 2



(48)

1 2

0

 1 2

The term in the exponential in Eq. (42) is evaluated as t 2  2  tu  u 2   u   t    2t 2  t 2 2









 1   2 v2  1   2 t 2

(49)

We then obtain a probability distribution for variable pair of  t , v  as f  t , u  dtdu  g  t , v  dtdv 

 v2 t 2  exp     1   2 dtdv 2 1   2  2 2



 v2  1  t2  exp    exp    dtdv 2  2  2  2

1

1

(50)

A Probability Distribution for a Correlation Factor

27

Consequently, v and t follow a standard normal distributions independently. We convert each sample data as xi   x

ti 

 2  xx

ui 

(51)

yi   y

 yy2 

(52)

We further convert ui to vi as

vi 

ui   ti 1 2

(53)

We know that

 ti , vi  follows a standard normal distributions independently.

We need to express r with  ti , vi  .

r is a sample correlation factor for the pair xi , yi . It is also equal to the sample correlation factor for the pair ti , ui . n

 t

i

t

 ui  u 

i 1

r

n

 t

i

i 1

t

(54)



2

n

 u

i

u

2

i 1

We only know that t and v follow standard distributions independently, but do not know for u . Therefore, we need to convert u to v in Eq. (53). We do not start with Eq. (54) directly, but start with a variable of q given by

q

n

v

i

2

(55)

i 1

2 Since vi follows a standard normal distribution, q follows a  distribution with a freedom of n . Substituting Eq. (53) into Eq. (55), we obtain

Kunihiro Suzuki

28    

2

2

 2  ui ti   2ti 2

 u  t i  i q 2  i 1  1   n





n

 u

1 1 2

i

(56)

i 1



We want to relate q to r . Therefore, we investigate last terms in Eq. (56) with respect to the corresponding deviations as n

u

2

i

n

 u



u u

i

i 1

2

i 1 n

  u



 u   2  ui  u  u  u 2   2

i

i 1 n

n

n

i 1

i 1

i 1

2   ui  u   2u   ui  u    u 2



n

 u



i

 u   nu 2 2

i 1

n



ti ui 

i 1

n

 t

i

(57)

 ui  u  u 

t t

i 1



n

 t  t u  u   t u  u   u t i

i

i

i 1



i

 t   t u 

n

 t  t u  u   nt u i

i

i 1

n

t

i

2



i 1

n

 t

t t

i

(58)

2

i 1



n

 t

i

t

 2  2  ti  t  t

i 1



n

 t

i

t

i 1



n

 t

i



2

 2t

n

 t

i

i 1

t

 t 2 

t 

(59)

n

t

2

i 1

2  nu 2

i 1

Substituting Eqs. (57), (58), and (59) into Eq. (56), we obtain

A Probability Distribution for a Correlation Factor

q

 u  t i  i 2  1   i 1  n



   

2

2

 2  ui ti   2ti 2

n

1 1 2

 u



1 1 2

  n   n 2 2    ui  u   nu   2    ti  t   i 1   i 1



1 1 2

n  n 2   ui  u   2   ti  t  i 1 i 1



i

i 1













1  nu 2  2  nt u   2 nt 2 1 2  q0  q1

q 

1 1 2

n

 u

i

2

 2  ui ti   2ti 2



 i 1

n

n



i 1



 







1    ui  u 2  2   ti  t 2  1   i 1  i 1 1  nu 2  2  nt u   2 nt 2 1 2  q0  q1





 ui  u    2   ti  t 2 

n   n 1    ui  u 2  nu 2   2    ti  t 2  1     i 1   i 1



(60)





 ui  u   nt u    2   ti  t 2  nt 2  



i 1

n

29

n





 i 1



n

n



i 1

 

  

(61)

 ui  u    2   ti  t 2 







 ui  u   nt u    2   ti  t 2  nt 2  



where n n  2  u  u  2    ti  t  ui  u    2   i  i 1  i 1 n    ti  t  ui  u    1  2 2 2 i 1   T  U  2 TU TU 1 2       1  U 2  2  rTU   2T 2 1 2

q0 



1 1 2



n

 t

i

i 1

t



2 

 







n u 2  2 t u   2 t 2 2 1  n   u   t 2 1 2

q1 



(62)

 (63)

Kunihiro Suzuki

30 We define U and T as n

u  u 

U

2

(64)

i

i 1

n

t  t 

T

2

(65)

i

i 1

q1 is also expressed as q1  

n u   t 1 2

n 1  1 2  n

1   n  1  n





 ui   ti   

 u  t i  i  1 2 

 v 

2

   

2

(66)

2

2

i

1 Since vi follows a standard normal distribution,

n

n

v

i

i 1

follows a standard normal

 distribution. distribution. Therefore, q1 follows a 2

We analyze q0 next. We introduce a variable n

 t

i



 t  vi

i 1

T

(67)

and analyze it before doing analysis of q0 . Substituting Eq. (53) into Eq. (67), we obtain

A Probability Distribution for a Correlation Factor n

 t  t 

 ui   ti 

i



1 2

i 1

T n



31

 t  t  u  u    t i

1

i

i

i 1

1 2

 t  

T

n  n  ti  t  ui  u   ti  t  1  i 1 i 1    T T 1 2    n    ti  t  ui  u    2 1 T   i 1   U UT T  1 2       1   rU  T  1 2







2

      



(68)

We define q2 as q2   2 



1 r 2U 2  2r UT   2T 2 1 2

 (69)

Finally, q0 is related to q2 as q0 









1 1 U 2  2  rTU   2T 2  r 2U 2  2r UT   2T 2  q2 2 2 1  1 

1 r2 2 U  q2 1 2  q2  q3



(70)

where q3 is defined as q3 

1  r2 2 U 1 2

(71)

Kunihiro Suzuki

32

Therefore, q can be expressed by the sum of them as q  q0  q1

(72)

 q1  q2  q3

Since q1 follows a  2 distribution with a freedom of 1, q2  q3 follows a  2 distribution with a freedom of n  1 . We then analyze q2 and q3 separately. Let us consider  for fixed t of t0 , which is given by n

 t  t  v i



i

i 1

T n

 t  t  v i



i

i 1

n

 t  t 

2

i

i 1

 t0  t 



n

  vi i 1

n  t0  t 1 n

2

n

v

i

i 1

(73)

This follows a standard normal distribution. Therefore,

q2  v, t 0    2

follows a

2

q  v, t 0  distribution with a freedom of 1. Since Eq. (73) does not include t0 , 2 follows a

 2 distribution with a freedom of 1 for any t . We can conclude that q2 follows a  2  2 distribution with a distribution with a freedom of 1. Consequently, q3 follows a freedom of n  2 . We further consider q4 as

q4  t   T 2 Inspecting Eq. (65), this follows a

(74)

 2 distribution with a freedom of n  1 .

A Probability Distribution for a Correlation Factor

33

Finally, we obtain three independent variables q2 , q3 , and q4 whose corresponding probability distributions we know. We introduce three variables below instead of using q2 , q3 , and q4 . We consider a variable

 as

  1



1 

(75)

 rU  T 

2

Since q2   follows a  distribution with a freedom of 1,  follows a standard normal distribution, and the corresponding probability distribution is given by 2

2

 2  exp    2  2  1

f   

(76)

Next, we introduce a variable  

1 q3 2



1

2 1 2



1  r U 2

(77)

2

Since q3 follows a  2 distribution with a freedom of n  2 , we obtain f  q3  

1 n2  2 2 

n2  2   

q3

n4 2

 q  exp   3   2

(78)

Therefore, we can evaluate a probability distribution for  as follows. d 

1 dq3 2

We then have

(79)

Kunihiro Suzuki

34 f  q3  dq3  f    d  



1 n2  2 2 

n2  2   

1  n2    2 

 2 

n4 2

n4 2

exp      2d 

(80)

exp     d 

Finally, we introduce a variable of 1 q4 2 1  T2 2



(81)

Since q4 follows a  2 distribution with a freedom of n  1 , we obtain a probability distribution for  as f   d 



1 n 1  2 2 

n 1  2   

1   n 1    2 

n 3 2

 2 

n 3 2

exp     2d 

(82) exp    d

Therefore, the simultaneous three variable probability distribution is given by

f   ,  ,  



1 2

e



2 2

n4  2 e 

n 3  2 e

 n  2   n 1      2   2 

1 n  2   n 1  2      2   2 

n  4 n 3  2 2

(83)    exp         2  2

We convert this probability distribution to the one for  r , T ,U  . We evaluated the partial differentials below.

A Probability Distribution for a Correlation Factor   rU  T   1  2 r r 1  1



(84)

U

1 2

  rU  T   1  T T 1 2 1



1 2

1 1 2

(86)

r

 1    1 r2 U 2  2  r 2 1   r 





(85)



  rU  T   1  2 U U 1  

35





1

1    2

rU



 1    1  r2 U 2  2  T 2 1   T 



(87)

2







(88)

0  1    1 r2 U 2  2  U 2 1   U 











(89)



1  1  r2 U 1 2  1  2  T r 2 r 0

(90)

 1  2  T T 2 T T

(91)

 1   T2 U 2 U 0

(92)

Kunihiro Suzuki

36

We then obtain corresponding Jacobian as 1    ,  , 

  r , T ,U 

1   

2



U

1 rU 2 1 2

1 

2



3 2 2

1  

1 2

r



T 1

1



1 1 r2 U 1 2 0

0

0



1

(93)

TU 2

Therefore, we obtain a probability distribution with respect to a variable set of  r , T ,U  as f  r , T ,U  

1

1  

 1  2 1 2 



T2    2

1  n  2   n 1 2      2   2 

TU 2

3 2 2



 1 r2 U 2   





n 3

 2    1  exp    2 1 2  1  TU 2 3





1   

2 2





1 n4 2 2

1 n 3 2 2

  T 2  2r TU    1  n  2   n 1 2      2   2 

U



2

1  r  2

n4 2 2

1  

n4 2

U n4

T n 3

 1  exp    2 1 2 





n4 2



U

2

  T 2  2r TU   



1 1 1  n  2   n  1  n 7 2     2 2 1 2  2   2 



 1  exp    2 1 2 





U

2

1  r  2



  T 2  2r TU   



(94) n 1 2

n4 2

T n  2U n  2

A Probability Distribution for a Correlation Factor

37

T and U have value ranges between 0 and ∞. We further convert variables of T and U

to  and  given by   T   e 2     2 U   e 

(95)

Modifying these equations, we obtain   TU    U e   T 

(96)

The corresponding partial differentials are evaluated as 

T 1  e2  2 

(97)



T 1  e2  2

(98)



 U 1  e 2  2 

(99)



 U 1  e 2  2

(100)

The corresponding Jacobian is given by

 T ,U    ,  

1 

2  1 2 





1 e2 2

e2 e



 2







 1 e 2 2







 1 1 2 1  e2 e 2  e e2 2 2 2  2  1 1   4 4 1  2

1

(101)

Kunihiro Suzuki

38

Therefore, the term Q U , T  including T ,U is given by  1 Q T ,U   T n  2U n  2 exp    2 1 2 



n2



U

2

  T 2  2r TU   



n2

 1 exp    2 1 2       n  2 exp   cosh   r    2   1  

    e2  

   

    e 2  

   











 e    e   2r  

(102)



 changes from 0 to ∞, and  changes from -∞ to ∞. Therefore, the corresponding integration is given by 



Q T ,U dTdU 



  1   n2    exp   1   2  cosh   r  d d  2    0 



   n  1 1   2



n 1



(103)

 d  n 1 0  cosh   r   1

We set a variable t

  cosh   r   1 2

(104)

and obtain 

    n2   exp   1   2  cosh   r  d 0  

 1 2      cosh   r  

n2

 1 2     cosh   r   

n 1





0





0

t n  2 exp  t 

1 2 dt cosh   r 

t n  2 exp  t dt

 1 2     n  1    cosh   r  

n 1

(105) Since cosh  is an even function, and hence we change the integration region from 0 to ∞, and multiply it by 2 and obtain

A Probability Distribution for a Correlation Factor 1 1 1  n  2   n  1  n 7 2     2 2 1 2  2   2 

f r  





  n  1 1   2



n 1



n4 2



1  d  n 1 0  cosh   r  

  n  1



1  r  2

n 1 2

1

 n  2   n  1  2n 3    2   2 

 

(106)

n 1 2

1    1  r  2

39

2

n4 2



1  d  n 1 0  cosh   r  

Since the term in Eq. (106) is performed further as   n  1  n  2   n  1  n 3   2     n  2 , n2     n  2  2   2 

(107)

Eq. (106) is reduced to   n  1

f r  

 n  2   n  1  2 n 3    2   2 

 

  n  1





1

 n2



n 1 2

1    1  r     n  2 2

n 1 2

1    1  r  2

2

n4 2

n 1 2

1    1  r  2

2

n4 2

2

n4 2



1  d  n 1 0  cosh   r  



1  d  n 1 0  cosh   r  

(108)



1  d  n 1 0  cosh   r  

We further perform integration in Eq. (108) as follows. We introduce a variable N as

n 1  N

(109)

The term is then expanded as 1

 cosh   r  

n 1



1

 cosh   r   N



1 cosh N 



1 cosh N 

 r  1   cosh    



m 0

N

N  N  1

(110)

 N  m  1  r  m m!

cosh m 

Kunihiro Suzuki

40

where we utilize a Taylor series below. f  x   1  x 

N

 f 1 

f x

x x 0

1 2 f 2! x 2

 x2  x 0

1 3 f 3! x3

 x3  x 0

1 1  1  Nx  N  N  1 x 2  N  N  1 x 3  2! 3!  N  N  1  N  m  1 m  x m! m 0

(111)



where x

r cosh 

(112)

in this derivation process. Therefore, we can focus on the integration of the form given by 

1  du  0 cosh k u

(113)

Performing a variable conversion as x

1 cosh u

(114)

we then obtain dx 

 sinh u du cosh 2 u

 cosh 2 u  1 du cosh 2 u 1   x 2 2  1du x



  x 1  x 2 du

(115)

A Probability Distribution for a Correlation Factor

41

The integration of Eq. (113) is then reduced to  1 1 k  du   dx   x k 0 cosh u 1 x 1  x 2 0

1

 x k 1  dx 0 1  x 2

(116)

Further, we convert the variable as

x y

(117)

and obtain

dx 1  dy 2 y

(118)

Therefore, the integration is reduced to k 1

1

1  y 2 1  x k 1 dx  dy    1 y 2 y 0 1  x 2 0

1   2

1

0



k

y2

1

1 y

dy

1 k 1 B , 2  2 2 

k    1 2  2  k 1    2 

(119)

where  is a Gamma function and B is a Beta function (see Appendix 1-2 of volume 3). The integration is then given by

Kunihiro Suzuki

42 

1  d  n 1 0  cosh   r   

 1  N 0 cosh  









m 0

N  N  1





N  N  1



cosh m 

m!

 N  m  1

 r  m  



1

0 cosh N  m 

d d

 r  m

1 2

 N m     2   N  m 1   2  

r 

1 2

 N   N 1     N m  2  2  2     N  m   N  m 1  2     2  2   

 N  m  1 m!

m 0



 N  m  1  r  m

m!

m 0 

N  N  1

1  N   N 1     2  2  



m 0

N  N  1

 N  m  1 m!

m

(120)

Since we have relationships given by  N   N 1 2 N     N  2 ! 2 2 2    

(121)

 N  m   N  m  1  1 N m     N  m  2 2  2   

(122)

the term in Eq. (120) is reduced to

1 2

 N   N 1     N m  2  2  2     N  m   N  m 1  2       2  2   



2 N   N  2 ! 2  N  m  1 2     1 N  m 22    N  m  2 

 2m

2

m

 N m 2  2    N  2 !    N  m  N m 2  2    N  2 !   N  m  1!

(123)

A Probability Distribution for a Correlation Factor

43

Therefore, we substitute this into Eq. (120), we obtain 

1  d  n 1 0  cosh   r   



1  N   N 1     2  2  



N  N  1

m!

m 0



 N  m  1

r 

m

 N m 2  2    N  2 !   N  m  1!

2m

1  N   N 1     2  2  



1   r  m 2m   N  2 ! m! m 0

 N m 2    2   N  1!

  r   2m 1    m!  N   N  1  m 0     2  2  m



 N m 2    2   N  1

  r   2m 1    m!  n  1   n  2  m 0     2   2  m



(124)

 n 1 m  2   2    n  2

Therefore, the probability distribution for sample correlation factor r is given by f r  



n2

 n2



n 1 2

1    1  r  2

2

n 1 2 2

n4 2

n4 r2 2

1   1   n 1 2

1    1  r   2

2

n4 2

 n 1  n  2     2   2 

 





1  d  n 1 0  cosh   r  





m 0

  r   2m 1   m!  n  1   n  2  m 0   2  2    



m

 n 1 m  2   2    n  2

(125)

 2r   m  2  n  1  m  m!

 

2

 

This is a rigorous model for a probability distribution for correlation factor. However, it is rather complicated and is difficult to use. Therefore, we consider a variable conversion

Kunihiro Suzuki

44

and derive an approximated normal distribution, and do range analysis with the approximated normal distribution.

5. APPROXIMATED NORMAL DISTRIBUTION FOR CONVERTED VARIABLE We convert the variable r to  as 1

1 r 

  ln   2 1 r 

(126)

This can be expanded as 1 r   1 r  1 1  r  r3  r5  3 5 1 2

  ln 





 2k  1r 1

(127)

2 k 1

k 1

This changes a variable range from  to  when r changes from -1 to +1 as shown in Figure 1. Let us consider the region for r  0 . When r is less than0.5,  is almost equal to r , and then it deviate from r and become much larger than r when r approaches to 1, which is the required feature. 3 2



1 r

0 -1 -2 -3 -1

Figure 1. The relationship between

-0.5

0 r

0.5

 and sample correlation factor r .

1

A Probability Distribution for a Correlation Factor

45

We then derive a probability distribution for  . We solve Eq. (126) with respect to r and obtain r  tanh 

(128)

Each increment is related to each other as dr 

1 d cosh 2 

(129)

Therefore, the corresponding probability distribution is given by

f   

1    2

n 1 2

 n 1  n  2       2   2 

1  tanh    2

n4 2

cosh 2 





 2  tanh  m  2  n  1  m   

m!

m 0

2

 

(130)

Probability density

12 10 8 6 4

n = 100

f(r) f() f(r) f() f(r) f() =0

 = 0.4

 = 0.8

2 0 -0.5

0

Figure 2. Probability distribution function for

0.5 r, 

r

and

1

1.5

.

Figure 2 shows a probability distribution for r and  with a sample number of 100. f r 

becomes narrow and asymmetrical with increasing  , which is due to the fact the sample correlation factor has a upper limit of 1. On the other hand, f   has the almost identical shape and move parallel. f   and f r 

becomes same as the absolute value of  decreases.

In the small  , we can expect a small r . We can then approximate  as

Kunihiro Suzuki

46 1 3

1 5

  r  r3  r5  r

(131)

Therefore, f  r  and f   are the same.

5 4 f()

n = 100

 = 0.0 = 0.4 = 0.8

3 2 1 0

-0.4

-0.2

0  - 

0.2

Figure 3. The probability distribution with respect to the deviation of

0.4

.

We study the average of the  . We guess that the average of  can be obtained by substituting  into Eq. (126) as  

1 1   ln   2 1  

(132)

Therefore, we replot by changing the lateral axis in Figure 2 from  to    , which is shown in Figure 3. All data for   0,0.4,0.8 become almost the same. Therefore, the average is well expressed by Eq. (132), and all the profiles are the same if we plot them with a lateral axis of    . Next, we consider the variance of the profile. When we use a lateral axis as    , the profiles do not have  dependence. Therefore, we set   0 . In this case, f    f  r  , and the probability distribution f  t    2 converted from f  r  is a t distribution with a freedom of n  2 . The variance t  n  2 of

the t distribution with a freedom of n  2 is given by

A Probability Distribution for a Correlation Factor  t n 2  2

n2 n4

47 (133)

We re-express the relationship between t and r as r

t  n2

(134)

1  r2

This is not so simple relationship, but we approximate that r is small, and obtain t  n  2r

(135)

We then have r

t

(136)

n2

In this case, the variance for r expressed by  r 2  

1



n2



2

 r2n  2

is given by

 t2n 2 

1 n2 n2 n4 1  n4



(137)

This is the approximated one. When n is large, we can expect that this treatment is accurate. However, we may not expect accurate one for small n . Therefore, we propose that we modify the model of Eq. (137) and change the variable as   2 

1 n 

(138)

This model includes a parameter  . We decide the value by comparing the calculated results. In the standard text book,  is set at 3. The model and numerical data agree well with

  2.5 as shown in Figure 4.

Kunihiro Suzuki

48 0.8

n=5 =2

0.7

f()

 = 2.5

0.6

=3

0.5 =4

0.4 0.3

-0.4

-0.2

0  (a)

0.2

0.4

1.2 n = 10 =2  = 2.5

1.1

f()

=3

1.0

=4

0.9 -0.3 -0.2 -0.1

0  (b)

Figure 4. Comparison of analytical models with various

0.1

0.2

0.3

 and numerical data.

4.0  = 2.5

n=5 n = 10 n = 50 n = 100 Gauss

f()

3.0 2.0 1.0 0.0 -2

-1

0 

1

2

Figure 5. Comparison between analytical model and numerical model for various n.

A Probability Distribution for a Correlation Factor

49

Figure 5 compares an analytical model with   2.5 and a numerical model for various n . An analytical model well reproduces the numerical data for any n . Therefore, the probability distribution for  can be approximated with a normal distribution as

f   



1    2

n 1 2

1  tanh    2

n4 2





 n 1  n  2  cosh 2  m 0     2   2  2          1 ln  1      2 1     1 exp     1 2   2 n  2.5   n  2.5  

 

 2  tanh  m  2  n  1  m  m!

 

2

 

(139) When we perform the conversion given by

1 1     ln   2 1   z 1 n  2.5

(140)

z follows a standard normal distribution as

 z2  f  z  exp    2  2 1

(141)

We finally evaluate the normal distribution approximation for   0 . We use a sample number n  5 where the accuracy is sensitive. The analytical model reproduces the numerical data with

  2.5 as shown in Figure 6.

Kunihiro Suzuki

50

= 1e-8 = 0.4 = 0.8 Gauss( = 3) Gauss( = 2.5)

0.7 n=5

0.6

f()

0.5 0.4 0.3 0.2 0.1 0.0 -2

-1

0



1

2

3

Figure 6. Comparison of numerical and analytical model for various  at the sample number of 5. The parameters for the analytical model are set at 3 and 2.5.

6. PREDICTION OF CORRELATION FACTOR We showed that the converted variable follows a normal distribution and we further obtained approximated averages and variances. Therefore, related prediction range is given by 1  1  rxy ln  2  1  rxy

   zP  

1  1    1  1  rxy  ln    ln  n  2.5 2  1    2  1  rxy 1

   zP  

1 n  2.5

(142)

We then obtain the following form min    max

(143)

where max is evaluated from the equation of 1  1  max ln  2  1  max

 1  1  rxy   ln   2  1  rxy

 1   zP  n  2.5 

(144)

A Probability Distribution for a Correlation Factor

51

We set GH as

1  1  rxy GH  ln  2  1  rxy

   zP  

1 n  2.5

(145)

and obtain max as a function of GH as below.

 1   max  ln    2GH  1   max  1   max  e 2GH 1   max 1   max  1   max  e 2GH

1  e   2GH

 max 

max

 e2GH  1

e 2GH  1 e 2GH  1

(146)

Similarly, we obtain min as

min

e2GL  1  2G e L 1

(147)

where

1  1  rxy GL  ln  2  1  rxy

   zP  

1 n  2.5

(148)

Figure 7 shows the prediction range for a sample correlation factor. The range is very wide, and hence we need a sample number at least 50 to ensure the range is less than 0.5.

Kunihiro Suzuki

52

1.0

xy

0.5

n=5 10 20 40 60 100

0.0 -0.5 -1.0 -1.0

n = 100

-0.5

0.0 rxy (a)

60

0.5

5 10 20 40

1.0

1.0 n = 100 400

0.5

 xy

1000

0.0 100 400

-0.5 -1.0 -1.0

n = 1000

-0.5

0.0 rxy (b)

0.5

1.0

Figure 7. Prediction range for sample correlation factor. (a) n  100 , (b) n  100 .

Let us study Eq. (143) in more detail. When n is sufficiently large, we can neglect the second term in

GH

, and it is reduced

to

1  1  rxy GH  ln  2  1  rxy

   

Then the max is also reduced to

(149)

A Probability Distribution for a Correlation Factor

 max 

 1 rxy ln   1 rxy 

   

 1 rxy ln   1 r e  xy

   

e

1  rxy 1  rxy  1  rxy 1  rxy 

53

1 1

1 1

2rxy 2

 rxy

(150)

Then, the sample correlation factor becomes that of population. When n is not so sufficiently large, we can approximate as follows.   1  rxy exp  2GH   exp  ln    1  rxy   1  rxy   exp  zP 1  rxy  1  rxy   1  z P 1  rxy 

  2    zP  n  2.5     n  2.5  2

(151)

  n  2.5  2

Therefore, we obtain

 max

1  rxy  2  1  zP   1 1  rxy  n  2.5   1  rxy  2  1  zP   1 1  rxy  n  2.5  1  rxy rxy  zP n  2.5  1  rxy 1  zP n  2.5

Similarly, we obtain an approximated min as

(152)

Kunihiro Suzuki

54

 min 

rxy  z P 1  zP

1  rxy n  2.5 1  rxy n  2.5

(153)

The approximated model is compared with the rigorous one as shown in Figure 8. The agreement is not good for n less than 50, but the simple model does work for n larger than 100. The accuracy of the model is related to the Taylor expansion, and it is expressed by

n

3  4 z 2p

(154)

We decide the prediction probability as 0.95, and corresponding Eq. (154) is reduced to n

zp

3  4 1.962  18.4

(155)

This well explains the results in Figure 8.

1.0

xy

0.5

n = 20 (Rigorous) n = 40 (Rigorous) n = 100 (Rigorous) n = 200 (Rigorous) n = 20 (Simple) n = 40 (Simple) n = 100 (Simple) n = 200 (Simple)

0.0 -0.5 -1.0 -1.0

is 1.96. Therefore,

-0.5

0.0 rxy

0.5

Figure 8. Comparison between simple and rigorous correlation factor models.

1.0

A Probability Distribution for a Correlation Factor

55

7. PREDICTION WITH MORE RIGOROUS AVERAGE MODEL Let us inspect the Figure 6 in more detail. The analytical normal distribution deviates from the numerical data for large  . The analytical models shift left slightly with increasing  . This is due to the inaccurate approximation of the model for an average. This is can be improved by using a model given by  

1 1  ln  2 1 

    2  n  1

(156)

Figure 9 compares the modified analytical model and numerical data, where both agree well for various  . Finally, we express a probability distribution function as

f   





1 2



n 1 2

 

1  tanh 2 



n4 2





 2  tanh  m  2  n  1  m   

m!  n 1  n  2  cosh 2  m 0     2   2  2            1 ln  1           2  1    2  n  1    1   exp   1 2   2   n  2.5 n  2.5    

 

2

 

(157)

 = 1e-8 = 0.4  = 0.8 Gauss( = 0.25)

0.7 n=5

0.6

f()

0.5 0.4 0.3 0.2 0.1 0.0 -2

-1

0



1

2

3

Figure 9. Comparison with analytical data with modified average model with numerical data.

Kunihiro Suzuki

56

Therefore, we use a normal distribution with an average of

1 1  ln  2 1 

    2  n  1 ,

and a

1

standard deviation of n2.5 . Consequently, we set the normalized variable as

z

1  1  rxy ln  2  1  rxy

 1 1    1     ln      2  1    2  n  1  1 n  2.5

(158)

Then, it follows a standard normal distribution given by f  z 

 z2 exp   2  2 1

  

(159)

Based on the above analysis, we predict the range of correlation factor. First of all, we want to obtain the central value of  denoted as cent , which is

r

commonly regarded as xy . We can improve the accuracy as follows. cent should satisfy the relationship given by

cent 1  1  cent  1  1  rxy ln   ln   2  1  cent  2  n  1 2  1  rxy

   

(160)

If there is no the second term in the left side of Eq. (160), we obtain

cent  rxy

(161)

We construct a function given by

1 1    1  1  rxy fcent     ln   ln   2  1    2  n  1 2  1  rxy

   

(162)

A Probability Distribution for a Correlation Factor

57

cent is a root of the equation given by

fcent  cent   0

(163)

r We can expect this cent is not far from the xy from Figure 6 and Figure 9. Therefore, we can expect

fcent     f  rxy   fcent '  rxy    rxy 

(164)

Therefore, we obtain

0  f  rxy   f '  rxy  cent  rxy 

(165)

Therefore, we obtain cent as

cent  rxy 

fcent  rxy 

fcent '  rxy 

(166)

where

 

f cent rxy  

 

f cent ' rxy  

 1  rxy 1 ln ln   1  rxy 2 

 rxy 1  1  rxy  ln    2  n  1 2  1  rxy  

   

rxy

2  n  1

(167)

1 1 1  1    2 1  rxy 1  rxy  2  n  1 1 1  2 1  rxy 2  n  1

Therefore, we obtain

(168)

Kunihiro Suzuki

58

cent

1 2  n  1  rxy  rxy 1 1  1  rxy2 2  n  1



1 1  rxy2 1 1  2 1  rxy 2  n  1

rxy

(169)

Next, we evaluate the range of the correlation factor range. The related estimated range is expressed by 1  1  rxy ln  2  1  rxy

 1 1 1    1  1  rxy  ln   ln    zP   n  2.5 2  1    2  n  1 2  1  rxy 

 1   zP  n  2.5 

(170)

Form this equation, we want to obtain the final form below. min    max

(171)

We cannot obtain analytical forms for min and max from the Eq. (170). Therefore, we show approximated analytical forms and a numerical procedure to obtain rigorous one.  2 n  1  We first neglect a correlation term of     , and set max , min under this treatment as max 0 , min 0 .

max 0 can be evaluated from the equation given by 1  1  max 0  1  1  rxy ln    ln  2  1  max 0  2  1  rxy

   zP  

1 n  2.5

(172)

We introduce a variable as 1  1  rxy GH  ln  2  1  rxy

   zP  

1 n  2.5

(173)

A Probability Distribution for a Correlation Factor

59

We then have an analytical form for max 0 as below.

 1   max 0  ln    2GH  1   max 0  1   max 0  e 2 GH 1   max 0 1   max 0  1   max 0  e 2GH

1  e   2 GH

 max 0 

max 0

e

2 GH

(174)

1

e 2 GH  1 e 2 GH  1

Similarly, we obtain

min 0

e2GL  1  2G e L 1

(175)

where

1  1  rxy GL  ln  2  1  rxy

   zP  

1 n  2.5

(176)

Next, we include the correlation factor. max can be evaluated from the equation given by 1  1  max ln  2  1  max

 max 1  1  rxy  ln    2  n  1 2  1  rxy

   zP  

1 n  2.5

(177)

We construct a function given by 1 1    1  1  rxy f     ln   ln   2  1    2  n  1 2  1  rxy

   zP  

1 n  2.5

(178)

Kunihiro Suzuki

60

max is a root of the equation given by

f  max   0

(179)

We can expect that this max is not far from the max 0 from Figure 6 and Figure 9. Therefore, we can expect

f     f  max 0   f '  max 0    max 0 

(180)

We then obtain

0  f  max 0   f '  max 0  max  max 0 

(181)

Therefore, we obtain max as

max  max 0 

f   max 0 

(182)

f '   max 0 

where f  max 0   

max 0 1  1  max 0  1  1  rxy ln   ln   2  1  max 0  2  n  1 2  1  rxy



1 n  2.5

max 0

2  n  1

f '   max 0  

   zP  

(183)

 1 1 1 1    2 1   max 0 1   max 0  2  n  1 1 1   max 0 2

Therefore, we obtain



1 2  n  1

(184)

A Probability Distribution for a Correlation Factor 1 2  n  1

 max   max 0 

1 2 1   max 0



1 2  n  1

61

 max 0

1 

2 1   max 0

1



2 1   max 0

1 2  n  1

 max 0 (185)

Similarly, we can obtain min . min can be evaluated from the equation given by

min 1  1  min  1  1  rxy ln   ln   2  1  min  2  n  1 2  1  rxy

 1   zP  n  2.5 

(186)

We introduce a function for min given by 1 1    1  1  rxy g     ln   ln   2  1    2  n  1 2  1  rxy

   zP  

1 n  2.5

(187)

min is given by

 min   min 0 

g   min 0 

(188)

g '   min 0 

where g '   min 0  

1 1   min 0 2



1 2  n  1

(189)

We then obtain 1

min 

2 1   min 0

1 2 1   min 0



1 2  n  1

min 0

(190)

Kunihiro Suzuki

62

8. TESTING OF CORRELATION FACTOR The sample correlation factor is not 0 even when correlation factor of the population is 0. We can test it setting   0 in Eq. (125), and obtain  n 1    2  1 r2 n2     2 



f r  



n4 2

(191)

We introduce a variable t  n2

r

(192)

1  r2

The corresponding increment is 1  r2  r

2r

2 1  r 2 dr 1 r2

dt  n  2

1  r2  r2  n2  n2

1 r2 1  r2 1 3 2 2

1  r 

(193) dr

dr

The relationship between t and r is modified from Eq. (192) as





t 2 1  r 2   n  2 r 2

(194)

2 We solve Eq. (194) with respect to r , and obtain

r2 

t2  n  2  t 2

Therefore, the probability distribution for r is given by

(195)

A Probability Distribution for a Correlation Factor

63

 n 1  n4   2  1  r 2 2 dr f  r  dr  n2     2   n 1  n4  1  2  1 r2 2  dt n2 n 2    3  2  1  r2 2



















 n 1  n 1   2  1  r 2 2 dt n  2   n  2     2  n 1  n 1    2 1 t2  2  1    dt n  2    n  2    n  2   t 2     2  n 1  n 1   1 n2  2  2     dt n  2    n  2    n  2   t 2     2 



1



n 1







 2  n 1      1 1  2    dt t2  n  2   n  2     1   2  n2  n 1 n 1    2 1  2  1  t  2 dt   n2 n  2   n  2      2    n  2  1   n  2  1    2 2 2 1   1  t  dt   n2 n  2   n  2      2 

(196)

Therefore, the probability distribution function f  r  is converted to

f t  

  n  2  1   n  2  1   2  2 2  1 t   1   n2 n  2   n  2     2  

(197)

This is a t distribution with a freedom of n  2 . When we obtain sample correlation factor r from n samples, we evaluate t given by t  n2

r 1  r2

(198)

Kunihiro Suzuki

64

We decide a prediction probability P and obtain the corresponding P point of tP with a t distribution with a freedom of n  2 , and compare this with Eq.(198), and perform a test as  t  t p   t  t p

Two variables do not dependent each other Two variables depend on each other

(199)

9. TESTING OF TWO CORRELATION FACTOR DIFFERENCE We want to test that the two groups correlation factors are the same or not. We treat two groups A and B and corresponding correlation factors and sample numbers are rA and rB , and nA and nB . We introduce variables given by

1  1  rA  z A  ln   2  1  rA 

(200)

1  1  rB  zB  ln   2  1  rB 

(201)

The average of z A and z B are 0. The corresponding variances are given by

 A  

1 nA  2.5

(202)

 B  

1 nB  2.5

(203)

2

2

We introduce a variable as Z  z A  zB 1  1  rA  1  1  rB   ln    ln   2  1  rA  2  1  rB  1  1  rA 1  rB   ln   2  1  rB 1  rA 

(204)

A Probability Distribution for a Correlation Factor

65

The rA  rB is identical to Z  0 . The average of

Z is zero, and variance   2 is

  2    A    B  2

2

(205)

Therefore, we can evaluate corresponding z P for a normal distribution and we can judge whether the correlation factor is the same if the following is valid.

Z  zp

(206)

SUMMARY The probability distribution function of two variables is given by

f  x1 , x2  

 1 exp   2 1 2 2 2 1   2 1  2   1

2









 2 2   x1  2  x1 x2  x2     2  2  2   2    2 1 2  1  

This can be extended to multi variables given by f  x1 , x2 ,

, xm  

1

 2 

m 2

 D2  exp     2  

where   11 2   2     21     2  m1

and

 12 2

 1m2 

 2  22

 2 2m 

 m 22

D is given by



   2   mm 

Kunihiro Suzuki

66

  11 2  12 2  21 2   22 2 , xm  m      m1 2  m 2 2 

D 2   x1  1 , x2  2 ,

 1m 2   x1  1     2 m 2   x2  2 

     mm 2   xm  m 

The probability distribution for a sample correlation factor is given by n 1 2

1    1  r  f r   2

2

n4 2

 n 1  n  2     2   2 



 





m 0

 2r   m  2  n  1  m  m!

 

2

 

Introducing a variable  , the distribution function is converted to 1 r 

1

  ln   2 1 r  Performing numerical study, we show that the average and variance of this function are given by  

1 1  ln  2 1 

  2  

   2 n   1 

1 n  2.5

Therefore, the distribution is approximately expressed by

f   



    exp    2   2 2  2   1



2

   

The central correlation factor in the population cent is given by

cent 

1 1  rxy2 1 1  2 1  rxy 2  n  1

rxy

A Probability Distribution for a Correlation Factor We then obtain an error range as

min    max where

1

max 

2 1   max 0

1 2 1   max 0



1 2  n  1

max 0

1

min 

2 1   min 0

1 2 1   min 0

1  2  n  1

max 0

e2GH  1  2G e H 1

min 0

e2GL  1  2G e L 1

1  1  rxy GH  ln  2  1  rxy 1  1  rxy GL  ln  2  1  rxy

   zP  

min 0

1 n  2.5

 1   zP  n  2.5 

The correlation factor distribution for   0 is given by  n 1    2  1 r2 f r   n2     2 





n4 2

67

Kunihiro Suzuki

68 Introducing a variable t  n2

r 1  r2

We modify the distribution as

  n  2  1   n  2  1    2 2 1   1  t  2 f t     n  2   n  2   n  2     2  This is a t distribution with a freedom of n  2 . We can hence compare this with the P point t P with a freedom of n  2 .

Chapter 3

REGRESSION ANALYSIS ABSTRACT We studied the significance of relationship between two variables by evaluating a correlation factor in the previous two chapters. We then predict the objective variable value for given explanatory variable values, which is called as the regression analysis. We also evaluate the accuracy of the regression line, and predictive a range of regression and objective value. We further evaluate the predictive range of factors of regression line, the predictive range of regression, and the range of object values. We first focus on the linear regression to express the relationship, and then extend it to various functions.

Keywords: regression line, linear function, logarithmic regression, power law function, multi-power law function, leverage ratio, logistic curve

1. INTRODUCTION We can evaluate two variables such as smoking and life expectancy. We can obtain a corresponding correlation factor. The correlation factor value expresses the strength of the relationship between the two variables. We further want to predict the life expectancy of a person who takes one box tobacco per a day. We should express the relationship between two variables quantitatively, which we treat in this chapter. The relationship is expressed by a line in general, and it is called as regression line. Further, the relationship is not expressed with only a line, but may be with an arbitrary function. We perform matrix operation for the analysis in this chapter. The matrix operation is summarized in Chapter 15 of volume 3.

Kunihiro Suzuki

70

2. PATH DIAGRAM It is convenient to express the relationship between variables qualitatively. A path diagram is sometimes used to do it. The evaluated variables are expressed by a square, and the correlation is expressed by an arrowed arch line, and the objective and explanatory variables are connected by an arrowed line where the arrow is terminated to the objected variable. The error is expressed by a circle. Therefore, the objective variable y and the explanatory variable x are expressed as shown in Figure 1.

Figure 1. Path diagram for regression.

3. REGRESSION EQUATION FOR A POPULATION First, we consider a regression for a population, where we obtain N sets of data  x1 , y1  ,  x2 , y2  , ,  xN , yN  . We obtain a scattered plot as shown in the previous chapter. We want to obtain a corresponding theory that expresses relationship between the two variables. Table 1. Relationship between secondary industry number and detached house number Prefecture ID 1 2 3 4 5 6 7 8 9 10 11 12

Scondary industry number 630 1050 350 820 1400 1300 145 740 1200 200 450 1500

Detached house number 680 1300 400 1240 1300 1500 195 1050 1250 320 630 1750

Regression Analysis

71

We obtain the relationship between a secondary industry number and a detached house number as shown in Table 1. Figure 2 shows the scattered plot of the Table 1. We can estimate the correlation factor as shown in the previous chapter, and it is given by

 xy  0.959

(1)

That is, we have a strong relationship between the secondary industry number and the detached house number.

Detached house number

2000

1800 y = 1.0245x + 132.55

1600 1400 1200 1000 800 600 400 200 0 0

500

1000

1500

2000

Secondary industry number Figure 2. Dependence of detached house number on secondary industry number.

We assume that the detached house number data yi is related to the secondary industry data xi as yi  a0  a1 xi  ei  Yi  ei

(2)

where a0 and a1 are a constant number independent on data i . ei is the error associated with the data number i . Yi is the regression value and is given by Yi  a0  a1 xi

(3)





2 N 0,  e  We assume that the ei follows the normal distribution with , where

Kunihiro Suzuki

72 1 N 2  yi  Yi   N i 1

 e 2 

(4)

and it is independent on variable xi , that is Cov  xi , ei   0

(5)

We show how we can derive the regression line. The deviation from the regression line and data ei is expressed by

ei  yi  Yi

 yi   a0  a1 xi 

(6)

  2 A variance of ei is denoted as e and is given by 1 2  e1  e22   eN 2  N 2 2 1  y1   a0  a1 x1     y2   a0  a1 x2    N

 e 2 



  y N   a0  a1 xN 

2



(7)

  2 We decide variables a0 and a1 so that e has the minimum value. We first differentiate  e   1  a0 a0 N 2



1 N

 e 2  with respect to a0 and set it to be zero as follows

N

  y   a i

i 1

0

N

 2  y   a i

i 1

1  2  N

N

y i 1

i

0

 a1

 a1 xi  

2

 a1 xi  

1 N

 2  y  a1  x  a0 

N

x i 1

i

  a0  

(8)

0

where

y 

1 N  yi N i 1

(9)

Regression Analysis

x 

1 N  xi N i 1

73 (10)

Therefore, we obtain

 y  a0  a1 x

Differentiating

(11)

 e 2  with respect to a1 , we obtain

2  e   1 N  yi   a0  a1 xi     a1 a1 N i 1 2



1 N  2  yi   a0  a1 xi  xi N i 1 

(12)

0

Substituting Eq. (11) of a0 into Eq. (12), we obtain 1 N  xi  yi   a0  a1 xi  N i 1  1 N   xi  yi   a1 xi   y  a1  x   N i 1 1 N   xi  yi   y   a1  xi   x    0 N i 1

(13)

It should also be noted that 1 N  x  yi   y   a1  xi  x   0 N i 1 

(14)

Subtracting Eq. (14) from Eq. (13), we obtain 1 N   xi  x   yi   y   a1  xi  x   0 N i 1

We then obtain

(15)

Kunihiro Suzuki

74 a1 

 xy 2 

(16)

 xx 2 

where

 xx 2 

 xy 2

1 N 2  xi  x   N i 1

1 N   yi   y   xi  x  N i 1

Eq. (16) can be expressed as a function of a correlation factor

a1 

(17)

(18)

 xy

as

 xy 2  xx 2

  xy   xy

 xx 2 yy 2  xy 2  xy 2

 xx 2

(19)

 yy 2  xx 2

where

 xy 

 xy 2

(20)

 xx 2 yy 2

a0 can be determined from Eq. (11) as

a0   y  a1 x   y   xy

 yy 2  2

 xx

x

Finally, we obtain a regression line as

(21)

Regression Analysis

Y   y   xy   xy

 yy 2  xx 2

 yy 2  xx 2

 x   xy

 yy 2  xx 2

75

x

(22)

 x  x    y

This can be modified as a normalized form as Y  y

 yy  2

 x x   xy     2 xx 

   

(23)

Eq. (22) can then be expressed with a normalized variable as

zY   xy z x

(24)

where zY 

zx 

Y  y

 yy  2

(25)

x  x

 xx  2

(26)

Now, we can evaluate the assumption for e . The average of e is denoted as e and can be evaluated as 1 N   yi  Yi  N i 1 1 N    yi   a0  a1 xi   N i 1 1 N    yi   y   a1  xi   x   N i 1 0

e 

This is zero as was assumed. The covariance between x and e is given by

(27)

Kunihiro Suzuki

76

1 N   xi   x ei N i 1 1 N    xi   x   yi   a0  a1 xi   N i 1 1 N    xi   x   yi   y   a1  xi   x   N i 1

Cov  x, e 

  xy   a1 xx  2

 2

2

  xy 

 xy 2  2

 xx

 xx 2

0

(28)

Therefore, both variables are independent as is assumed. The variance of e can be also evaluated as  e 2 e12  e2 2   eN 2 N 1 2 2 2   y1  a0  a1 x1    y2  a0  a1 x2     yn  a0  a1 xN  N 2 2 2 1  y1   y   a1  x1   x     y2   y   a1  x2   x     yn   y   a1  xn   x         N 2 2 2 1   y1   y    y2   y     y N   y     N 2a1  y1   y   x1   x    y2   y   x2   x     y N   y   xN   x     N  2 a 2 2 2  1  x1   x    x2   x     xN   x    N 









  yy    2a1 xy    a12 xx  2

  yy

 2

2

2

 yy  2  xx  2

  yy    2 yy    xy 2

  yy 

2

2



2 2

 xy xy

 2

   2  yy   xy   xx  2    2   xx 

 xy  2  xx  2  yy  2

  yy    xy 2 2

1    2

xy

(29)

4. DIFFERENT EXPRESSION FOR CORRELATION FACTOR Let us consider the correlation between yi and the regression line data denoted by Yi , where

Regression Analysis

Yi 

 yy  2  xx  2

77

 xy  xi   x    y

(30)

The average and variance of Y are given by

Y   y

 YY 2 

(31)

 yy  2  xx

 2

 xy2  xx 2 (32)

The correlation between yi and Yi is then given by

1 N   yi   y Yi   y  N i 1  2  yy 2 YY

   2  1 N yy  y    x            i y   2 xy i x y y  N i 1   xx  

 yy 2  yy 

 2

 xx  2

 xy

 xx

 2

 xy2  xx 2

1 N   yi   y   xi   x  N i 1

 yy  2  xx

 yy  2

 2

 xy  yy  2 xx 2

1   yi   y   xi   x  N i 1 N



 yy  2 xx 2

  xy

(33)

It should be noted that the correlation between y and regression line data Y is equal to the correlation factor between y and x .

5. FORCED REGRESSION In some cases, we want to set some restrictions, that is, we sometimes want to set a constant boundary value at 0 for x  0 . In this special case, we can obtain a1 by setting a0  0 in Eq. (12) and obtain

Kunihiro Suzuki

78  e   1 N 2   yi  a1 xi   a1 a1 N i 1 2



1 N  2  yi  a1 xi  xi N i 1

(34)

0

This can be modified as 1 N   yi  a1 xi  xi N i 1 1 N    yi   y   y  a1  xi   x   x   xi   x   x  N i 1 1 N    yi   y    y  a1  xi   x   a1 x   xi   x  N i 1 1 N    yi   y    y  a1  xi   x   a1 x   x N i 1

(35)

  xy   a1 xx    y  x  a1 x 2 2

2

0

Therefore, we obtain

a1 

 xy 2   y  x  xx 2   x 2  xy

 

 xx 2 yy 2  2

 xy

 xy 2   y  x

(36)

 xx 2   x 2  xy  xx 2 yy 2   y  x  xx 2   x 2

We then obtain a corresponding regression equation as

Y

 xy  xx 2  yy  2   x  y  xx  2   x 2

x (37)

In general, we need not to set a0 as 0, but set is as a given value, and the corresponding a1 is then given by

Regression Analysis

79

 xy  xx  2  yy  2   x  y  a0  x

a1 



 2 xx

 x2



(38)

and the regression equation is given by

Y

 xy  xx  2  yy  2   x  y  a0  x



 2 xx

 x2



x  a0

(39)

6. ACCURACY OF THE REGRESSION The determination coefficient below is defined and frequently used to evaluate the accuracy of the regression, which is given by R2 

 r 2   yy 2 

(40)

 2 where  r is the variance of Y given by

 r 2 

2 1 N Yi   y    N i 1

(41)

This is the ratio of the variance of the regression line to the variance of y . If whole data are on the regression line, that is, the regression is perfect,

 yy2 

 2 is equal to  r , and

R2  1 . On the other hand, when the deviation of data from the regression line is significant, the determinant ratio is expected to become small. We further modify the variance of y related to the regression value as 1 N 1  N 1  N 1  N

 2  yy 

 y

 y 

 y

 Yi  Yi   y 

N

i

i 1 N

i

i 1 N

 y

i

i 1 N

e i 1

2 i



  e    r 2

 Yi 

2

1 N

2

2

1  N

2

 Y    i

i 1

 Y    N

i 1

i

(42)

N

y

2

y

2

1 N

2

1 2 N

N

 y i 1

N

 e a i 1

i

0

i

 Yi  Yi   y 

 a1 xi 

Kunihiro Suzuki

80

Therefore, the determination coefficient is reduced to

 r 2 R   2  e   r 2 2

1

 e 2  yy  2

(43)

Substituting Eq. (29)into Eq. (43), we obtain

R2  1    xy

 yy  2  yy

 2

1    2

xy

2

(44)

Consequently, the determination coefficient is the square of the correlation factor, and has a value range between 0 and 1. From Eqs. (43) and (44), we can easily evaluate

 e 2 as

 e 2  1   xy2  yy 2

(45)

7. REGRESSION FOR SAMPLE We can perform similar analysis for the sample data. We assume n samples. The regression line can be expressed by

Yˆ  aˆ0  aˆ1 x

(46)

The form of the regression line is the same as the one for population. However, the coefficients aˆ0 and aˆ1 depend on extracted samples, and are given by

aˆ1 

S yy 

2

S xx 

2

rxy

(47)

Regression Analysis

81

a0   y  a1  x y

S yy 

2

S xx 

2

rxy x

(48)

Finally, we obtain a regression line for sample data as

Yy

S yy 

2

S xx 

2

rxy  x  x 

(49)

where sample averages and sample variances are given by x

1  xi n

(50)

y

1  yi n

(51)

  S xx  2

  S yy  2

S xy   2

x

 x

i

2

(52)

n

 y

i

 y

2

(53)

n

  x  x  y i

i

 y

n

S xy 

(54)

2

rxy 

S xx  S yy  2

2

(55)

As in the population, the sample correlation factor is expressed by a different way as 1   yi   y Yi   y  rxy  n  2  2 S yy SYY

(56)

Kunihiro Suzuki

82

The determinant factor for the sample data is given by R2 

Sr 

2

S yy 

1

2

Se

2

S yy 

2

(57)

where Sr   2

 y

i

 y Yi  y  n

(58)

and it can be proved as the similar way as population as

S yy   Sr   Se 2

2

2

(59)

The variances are ones for sample data, and hence we replace them by unbiased ones as n R

*2

1

e n

T

Se

2

 2

(60)

S yy

a0 and a1 are used for Se 2 and hence freedom is decreased by two, and the corresponding freedom e is given e  n  2

y

(61)  2

is used in S yy , and hence the freedom is decreased by one, and the corresponding freedom T is given T  n  1

R*2 is called as a degree of freedom adjusted coefficient of determination.

(62)

Regression Analysis

83

8. RESIDUAL ERROR AND LEVERAGE RATIO We study a residual error and a leverage ratio to evaluate the accuracy of the regression. The residual error is given by

ek  yk  yˆk

(63)

The normalized one is then given by

ek

zek 

se

(64)

2

This follows a standard normal distribution with

N  0,12 

. We can set a certain critical

e'

value for k , and evaluate whether it is outlier or not. We can evaluate the error of the regression on the different stand point of view, called

h

it as a leverage ratio, and denote it as kk . This leverage ratio expresses how each data contribute to the regression line, and is evaluated as follows. The predictive value for

k -th sample is given by

Yk  aˆ0  aˆ1 xk

 y  aˆ1  xk  x 

We substitute the expression of

(65)

ˆ1 given by, a

S xy  2

aˆ1 

S xx  2

into Eq. (66), and obtain

(66)

Kunihiro Suzuki

84 S xy  2

Yk  y 

S xx  2

 xk  x 

1 n   xi  x  yi  y  1 n n   yi  i 1  xk  x  2 n i 1 S xx  n



1 n  yi  n i 1

 x

i

i 1

 x  yi

nS xx  2

(67)

 xk  x 

This can be expressed with a general form as Yk  hk1 y1  hk 2 y2 

 hkk yk 

 hkn yn

(68)

Comparing Eq. (67) with (68), we obtain hkk as

hkk 

1  xk  x   2 n nS xx 

2

(69)

hkk is the variation of the predicted value when yk changed by 1. The predicted value should be determined by the whole data, and the variation should be small to obtain an accurate regression. The leverage ratio increases with increasing the deviation of x from the sample

x , and it is determined only with xi independent on yi . Therefore, the data far from the average of x influences much to the regression. This means that we should average of

not use the data much far from the x The leverage ratio range can be evaluated below. It is obvious that the minimum value is given by

hkk 

1 n

(70)

We then consider the maximum value. We set xi  x  X i . hkk must have the maximum value for h11 of hnn . We assume hnn as the maximum value, which does not eliminate generality. We have the relationship given by

Regression Analysis n

X i 1

i

0

85 (71)

Extracting the term X n form Eq.(71), we obtain n 1

Xn   Xi

(72)

i 1

We then have 1  hnn  1 

1  xn  x   2 n nS xx 

2



1  2  n  1 S xx 2   xn  x   2 nS xx  



1   n  1 S xx 2  X n 2  2 nS xx  

n   X i2   1  2 i 1    2  n  1  Xn  n nS xx     n 1   X i2  X n2   1  2 i 1   2  n  1  Xn  n nS xx    





1  n  1 n 1 2 X n 2   Xi  n  2  nS xx   n i 1  1 nS xx  2

(73)

2   n 1    n 1   Xi   n 1   X i 2   i n1  1    0 n  i 1    

Therefore, we have hnn  1

(74)

Consequently, we obtain

1  hkk  1 n Sum of the leverage ratio is given by

(75)

Kunihiro Suzuki

86 n

h k 1

n x  x 1   k  2 nS xx k 1 n k 1 n

kk



2

  S xx 2

1

(76)

  S xx 2

2

Therefore, the average value for hkk is denoted as hkk

hkk 

2 n

(77)

Therefore, we use a critical value for hkk as

hkk    hkk

(78)

  2 is frequently used, and evaluate whether it is outlier or not. From the above analysis, we can evaluate the error of the regression by zek or hkk . Merging these parameters, we also evaluate t value of the residual error given by zek

t

(79)

1  hkk

This includes the both criticisms of parameters of

zek

or hkk , and we can set a certain

value for t , and evaluate whether it is outlier or not. When hkk approaches to its maximum value of 1, the denominator approaches to 0, and t becomes large. Therefore, the corresponding data is apt to be regarded as an outlier.

9. PREDICTION OF COEFFICIENTS OF A REGRESSION LINE When we extract samples, the corresponding regression line varies, that is, the factors

aˆ 0

and

aˆ1

vary depending on the extracted samples. We study averages and variances of

aˆ 0

and

aˆ1

in this section.

Regression Analysis

87

We assume that the population set has a regression line given by

yi  a0  a1 xi  ei

where

a0

and

a1

for i  1,2, , n

(80)

are established values.

 0,  . xi

ei

is independent on each other and follow a

 2

normal distribution with (80) is then only

ei

is a given variable. The probability variable in Eq.

.

We consider the case where we obtain n set of data and

aˆ1

 xi , yi  . The corresponding aˆ 0

are given by S xy  2

aˆ1 

S xx  2

n



x i 1

i

 x  yi  y  nS xx  2

n



xi  x

i 1

nS xx 

yi

n

xi  x

a

 i 1

2

nS xx  2

0

 a1 xi  ei

 (81)

aˆ0  y  aˆ1 x  n xi  x  1 n y  y x  i    2 i   n i 1  i 1 nS xx  n  1 x x      i  2 x  yi  nS xx  i 1  n n  1 x x      i  2 x   a0  a1 xi  ei  nS xx  i 1  n



We notice that only

ei

 (82)

is a probability variable, and evaluate the expected value as

Kunihiro Suzuki

88 n

E  aˆ1    i 1

xi  x  2

nS xx n

 a1 

n

 a0  a1 xi    i 1

xi  x

i 1

nS xx 

xi

n

xi  x

 xi  x 

 a1  i 1

2

 2

nS xx

xi  x nS xx  2

n

for  i 1

E ei 

xi  x nS xx 

 a1

2

x 0

(83)

E  aˆ0   E  y  aˆ1 x   y  E  aˆ1  x  y  a1 x  a0

(84)

 n x x  V  aˆ1   V   i  2 yi   i 1 nS xx  2

x x    i  2  V  yi    i 1  nS xx  n

2

x x    i  2  V  ei    i 1  nS xx  n

2

x x 2    i  2   e    i 1  nS xx  n



 e 2 2 nS xx 

(85)

 n 1 x x V  aˆ0   V     i  2  i 1  n nS xx n  1 x x     i  2  nS xx i 1  n

  x  yi      2

 x  V  yi    2

n  1 x x      i  2 x  V  ei   nS xx  i 1  n   2 n  x  x  2   2 x x 1    2  2 i2  2 x  i x 2 e 2 n S xx i 1 n n 2 S xx    2 1 x  2     2   e   n nS  xx  

 

(86)

Regression Analysis n   n x x 1 x x Cov  aˆ1 , aˆ0   Cov   i  2 yi ,    i  2  nS xx i 1  n  i 1 nS xx  n x x 1 x x    E   i  2   i  2 x  yi2   i 1 nS xx  n nS xx  

89

  x  yi    

xi  x  1 xi  x   x  E  yi2  2  2   nS xx     i 1 nS xx  n n x x 1 x x    i  2   i  2 x  E ei2   nS xx  i 1 nS xx  n n x x 1 x x  2   i  2   i  2 x   e   nS xx  i 1 nS xx  n n x x x x 2   i  2  i  2 x e  nS xx i 1 nS xx n





x  2

nS xx

 e 2

(87)



We assume that 1 and normalized variables given by

z

aˆi  ai

aˆ 0

follow normal distributions, and hence assume the

for i  0,1

V  ai 

(88)

This follows a standard normal distribution. Therefore, the predictive range is given by

aˆ1  z p

 e 2 nS xx  2

 a1  aˆ1  z p

1 x2 aˆ0  z p    n nS  2 xx 

 e 2 nS xx  2

  2   e  a0  aˆ0  z p 

(89) 1 x2    2  n nS xx

  2

  2   e 

(90)

s  2

In the practical cases, we do not know e , and hence use e instead. Then, the normalized variables are assumed to follow a t distribution function with a freedom of

n  2 . Therefore, the predictive range is given by

Kunihiro Suzuki

90 aˆ1  t p  n  2 

se

2

nS xx  2

1 x2 aˆ0  t p    n nS  2 xx 

 a1  aˆ1  t p  n  2 

  2  se  a0  aˆ0  t p 

se

2

nS xx  2

1 x2    2  n nS xx

(91)   2  se 

(92)

10. ESTIMATION OF ERROR RANGE OF REGRESSION LINE We can estimate the error range of value of a regression line. The expected regression value for value of

x0

is denoted as Y0 .

We assume that we obtain n set of data given by

 xi , yi  . The corresponding regression value is

Yˆ0  aˆ0  aˆ1 x0

(93)

The expected value is given by E Yˆ0   E  aˆ0   E  aˆ1  x0  a0  a1 x0  Y0

The variance of

(94)

Yˆ0 is given by

V Yˆ0   V  aˆ0  aˆ1 x0 

 V  aˆ0   2Cov  aˆ0 , aˆ1  x0  V  aˆ1  x02 1 x2 x 2  2 2x x 2 2     2   e   0 2  e   0 2  e   n nS  nS xx nS xx xx   2 1 x  x   2    0  2   e  nS xx   n

We assume that it follows a normal distribution and hence the predicted value of is given by

(95)

Y0

Regression Analysis  1  x0  x 2   2 ˆ   e  Yo  Yˆ0  z p Y0  z p   2 nS xx    n

If we use

se 2

instead of

91

 1  x0  x 2   2   e 2 nS xx    n

(96)

 e 2 for the variance, we need to use a t distribution with a

freedom of n  2 , and the predictive range is given by  1  x  x 2  2 Yˆ0  t p  n  2    0  2  se   Y0  Yˆ0  t p  n  2  nS xx   n

 1  x0  x 2   2    se 2 nS xx    n

(97)

When an explanatory variable x changes from x0 to x1 , the corresponding variation ˆ ˆ of objective variable Yˆ from Y0 to Y1 . The variation of the objective variable is then given by

Yˆ  Yˆ1  Yˆ0

(98)

Considering the error range, the total variation

D is given by

  1  x  x 2  2    D  Yˆ1  t p  n  2    1  2  se    Yˆ0  t p  n  2  nS xx   n      ˆ ˆ  Y1  Y0  t p  n  2       Yˆ  t p  n  2   

 1  x0  x 2   2     se  2 nS xx    n 

 1  x0  x 2   2   1  x1  x 2   2  se     se    2 2 nS xx   nS xx    n  n 

 1  x0  x 2   2   1  x1  x 2   2    se  s     e 2 2 nS xx   nS xx    n  n 

(99)

If this is the positive, we can judge that the objective variable certainly changes. Therefore, we can evaluate this as below.   Yˆ  t p  n  2    

 1  x0  x 2   2   1  x1  x 2   2   se     se    2 2   nS xx  nS xx   n  n  

(100)

Kunihiro Suzuki

92

Next, we consider that the predictive value of

y

instead of the regression value of

Y for a given x of x0 . We predict yi  ao  a1 xi  ei instead of yi  ao  a1 xi . Therefore, the variance is given by

 1  x0  x 2  1    2 2  nS xx   n

The predictive range for

(101)

y0

is given by

 1  x0  x 2   2 ˆ   e  yo  Yˆ0  z p Y0  z p 1    2 n nS   xx

se 2

If we use a freedom of

instead of

 e 2

 1  x0  x 2   2 1   e 2 nS xx    n

(102)

for the variance, we need to use a t distribution with

n  2 , and the predictive range is given by

 1  x0  x 2   2 ˆ  se  yo  Yˆ0  t p  n  2  Y0  t p  n  2  1   2 nS xx    n

 1  x0  x 2   2 1    se 2 nS xx    n

(103) When we obtain the data set  x, y  , we can evaluate corresponding expected objective value as well as its error range for the x . We can compare these values with given y , and can judge whether it is reasonable or not. Let us consider an example. Table 2 shows scores of mathematics and physics of 40 members, where the score of mathematics is denoted as x and that of physics as y . The corresponding average and sample variances are given by

x  52.38 y  72.78

(104)

(105)

Regression Analysis

93

S xx   213.38

(106)

S yy   176.52

(107)

S xy   170.26

(108)

2

2

2

The sample correlation factor is given by

rxy  0.88

(109)

The center value for the correlation factor is given by

cent  0.87

which is a little smaller than

(110)

rxy

. The correlation factor range is predicted as

0.78    0.93

(111)

The regression line is given by

Yˆ  30.98  0.80x

(112)

The determinant factor and one with considering the freedom are given by R 2  0.77

(113)

R *2  0.76

(114)

The predicted regression and objective values are shown in Figure 3. The error range increases with the distance from the average location to the x increases. The original data are almost within the range of predicted objected values.

Kunihiro Suzuki

94

Table 2. Score of mathematics and physics of 40 members ID Mathematics 1 34 2 56 3 64 4 34 5 62 6 52 7 79 8 50 9 46 10 49 11 42 12 60 13 39 14 45 15 40 16 42 17 65 18 41 19 50 20 54

Physics 57 78 76 64 78 82 91 67 64 58 62 72 63 65 68 73 82 67 66 80

ID Mathematics 21 49 22 35 23 59 24 27 25 77 26 24 27 58 28 47 29 56 30 75 31 68 32 70 33 51 34 69 35 30 36 38 37 85 38 65 39 46 40 62

Physics 68 49 87 56 96 49 76 63 78 96 100 76 72 98 57 68 95 74 56 84

150 Data Center value Regression y value

y

100

50

0

0

20

40

x

60

80

100

Figure 3. Predicted regression and objective values. The original data is also shown.

Regression Analysis

95

11. LOGARITHMIC REGRESSION We assume the linear regression up to here. However, there are various cases where an appropriate fitted function is not a linear line. When the target function is an exponential one, given by

y  beax

(115)

Taking logarithms for both sides of Eq. (115), we obtain ln y  ax  ln b

(116)

Converting variables as

u  x  v  ln y

(117)

We can apply the linear regression. When

y

is proportional to power law of x , that is, it is given by

y  bx a

(118)

Taking logarithms for both sides of Eq.(118), we obtain ln y  a ln x  ln b

(119)

Converting variables as

u  ln x  v  ln y We can apply the linear regression.

(120)

Kunihiro Suzuki

96

12. POWER LAW REGRESSION We assume the data are expressed with a power law equation given by

y  a0  a1 x  a2 x 2 

 am x m

(121)

The error is given by 1 2  e1  e22   eN 2  N 1 N    yi   a0  a1 xi  a2 xi 2  N i 1

 e 2 

 am xi

m



(122) 2

We then evaluate

 e            0, e  0, e  0, , e  0 a0 a1 a2 am 2

2

2

2

(123)

This gives us m  1 equations, and we can determine  a0 , a1 , a2 , , am  as follows.

 e  2 N     yi   a0  a1 xi  a2 xi 2  a0 N i 1 2

 am xi m   0

(124)

This gives



a0  y  a1 x  a1 x 2 

 am x k



(125)

We also have  e  2 N     yi   a0  a1 xi  a2 xi 2  ak N i 1 2

 am xi m  xik

 

 

2 N    yi  y   a1  xi  x   a2 xi 2  x 2  N i 1  2 N     yi  y    a1  xi  x   a2 xi 2  x 2   N i 1  0



 am  xi m  x m   xik  



 am  xi m  x m    xik  x k  

(126)



Regression Analysis

97

We then obtain  K x11   K x 21    K xm1

K x1m  a1   K y1      K x 22  a2   K y 2           K xmm  am   K ym 

K x12 K x 22 K xm 2

(127)

where



 x  x 

(128)



(129)

K xkl 

1 N k xi  x k  N i 1

K yk 

1 N   yi  y  xik  xk N i 1

l i

l



13. REGRESSION FOR AN ARBITRARILY FUNCTION We further treat an arbitrarily function with m parameters denoted as

f  a1 , a2 ,

, am , x 

 e 2  is expressed by

The error

 e 2 

.

1 N   yi  f  a1 , a2 , N i 1 

, am , xi 

2

(130)

Partial differentiating this with respect to a1 , we obtain  e  1  a1 N 2

N

 2  y i 1

i

 f  a1 , a2 ,

, am , xi  

f  a1 , a2 ,

, am , xi 

a1

0

(131)

Arranging this equation, we obtain 1 N

N

  y i 1

i

 f  a1 , a2 ,

, am , xi  

f  a1 , a2 ,

We set the left side of Eq. (132) as

, am , xi 

a1 g1  a1 , a2 ,

0

, am 

, that is,

(132)

Kunihiro Suzuki

98 g1  a1 , a2 ,

, am  

1 N

N

  y

i

i 1

 f  a1 , a2 ,

, am , xi  

f  a1 , a2 ,

, am , xi 

a1

(133)

Performing similar analysis for the other parameters, we obtain

 g1  a1 , a2 , , am   0   g 2  a1 , a2 , , am   0   g a , a , , a   0 m  m 1 2

(134)

where g k  a1 , a2 ,

, am  

1 N

N

  y

i

i 1

 f  a1 , a2 ,

, am , xi  

f  a1 , a2 ,

, am , xi 

ak

(135)

The parameters  a1 , a2 , , am  can be solved numerically as follows. a  ,a  ,  We set initial value set as 0

0

1

2

, am

0

.

a   , a   , 1

The deviation from the updating value



 a1 , a2 ,

1

, am 

1

1

2

 is given by

, am   a1   a1  , a2   a2  , , am   am 1

0

1

0

1

0



(136)

Let us consider the equation of gk  a1 , a2 , , am   0 , which we want to obtain the corresponding set of  a1 , a2 , , am  . However, we do not know the set. We assume that the

a   , a   , 1

tentative set

1

1

2

0  g k  a1 , a2 ,



1

 is close to the one, then

, am 

 g k a1  , a2  , 0

, am 

0

, am

0

  ga

k

1

where

a1 

g k a2  a2



g k am am

(137)

Regression Analysis g k  a1 , a2 ,

99

, am 

al 

1 N

  f  a1 , a2 , , am , xi  f  a1 , a2 , , am , xi    yi  f  a1 , a2 , al ak i 1   N

 

, am , xi  

 2 f  a1 , a2 , al ak

, am , xi      

(138) Therefore, we obtain gk g a1  k a2  a1 a2



gk 0 0 0 am   gk a1  , a2  , , am  am





(139)

We then obtain g1 g1  g1  0  0  0  a a1  a a2   a am   g1 a1 , a2 , , am 2 m  1 g 2 g 2 g 2 0 0 0 a1  a2   am   g 2 a1  , a2  , , am    a  a  a  1 2 m   g m g m  g m  0  0  0  a a1  a a2   a am   g m a1 , a2 , , am 2 m  1











(140)



We then obtain the below matrix.  g1  a  1  g 2   a1    g m  a  1

g1 a2 g 2 a2 g m a2

g1    g a  0 , a2 0 , , am 0 am   a1   1 1 g 2     0 0 0  g a   , a2  , , am   a am   2    2 1         am     g m a1 0 , a2 0 , , am 0 g m   am 

 

   





(141)

   

We then have  g1  a 1  a1     g   2  a2    a 1        am   g  m  a  1

g1 a2 g 2 a2 g m a2

1

g1  am    g1 a1 0 , a2 0 , , am  0  g 2    0  0  0   g a , a2 , , am am   2 1      0  0  0 g m    g m a1 , a2 , , am am 

 

   





   

(142)

Kunihiro Suzuki

100

The updating value is then given by a11  a1 0  a1  1 0 a2  a2   a2    1  0 am  am  am

(143)

If the following hold abs  a1    c  abs  a2    c   abs  a    m c 

(144)

the updating value is the ones we need. If they do not hold, we set a1 0  a11   0 1 a2  a2      0 1 am  am

(145)

We perform this cycle while Eq. (144) becomes valid.

14. LOGISTIC REGRESSION We apply the above procedure to a logistic curve, which is frequently used. The linear regression line is expressed as y  ax  b

(146)

The objective value y increases monotonically with increasing an explanatory value of x . We want to consider a two levels categorical objective variable such as yes or no, good or bad, or so on. Then we regard y as the probability for the categorical data and modify the left hand of Eq. (146), and express it as  y  ln    ax  b 1 y 

(147)

Regression Analysis

101

When we consider the case of yes or no, y corresponds to the probability of yes, and 1-y corresponds to the one of no. Eq. (147) can be solved with respect to y as y

1 1  exp  ax  b 

(148)

1  1  k exp  ax 

where k  eb

(149)

Y has a value range between 0 and 1, and saturates when it approaches to 1. We want to extend this form to have any positive value, and add one more parameter as f  a1 , a2 , a3 ; x  

a1 1  a2 exp  a3 x 

(150)

This is the logistic curve. In many cases, the dependence of the objective variable on explanatory variable is not simple, that is, even when the dependence is clear, it saturate for certain large explanatory variable values. A logistic curve is frequently used to express the saturation of phenomenon as mentioned above. We can easily extend this to include various explanatory variables given by

f  a1 , a2 , a3i ; xi  

a1 1  a2 exp    a31 x1  a32 x2 

 a3m xm  

(151)

In this analysis, we focus on one variable, and use Eq. (150). The error

 e 2 

 e 2  for the logistic curve is expressed by

2 1 N  yi  f  a1 , a2 , a3 ; x   N i 1

Partial differentiating this with respect to a , we obtain

(152)

Kunihiro Suzuki

102

2 f  a1 , a2 , a3 ; xi   e  1 N   2  yi  f  a1 , a2 , a3 ; xi  a1 N i 1 a1

1 N 1   2  yi  f  a1 , a2 , a3 ; xi  0 N i 1 1  a2 exp  a3 xi 

(153)

Arranging this equation, we obtain

1 N 1  yi  f  a1 , a2 , a3 ; xi  0  N i 1 1  a2 exp  a3 xi  We set the left side of Eq. (132) as ga1  a1 , a2 , a3  

g a1  a1 , a2 , a3 

(154)

, that is,

 a1 1 N  1 0  yi    N i 1  1  a2 exp  a3 xi   1  a2 exp  a3 xi 

(155)

Performing a similar analysis for the other parameters, we obtain  a1 exp  a3 xi  a1 1 N   yi    N i 1  1  a2 exp  a3 xi   1  a2 exp  a3 xi  2  

g a2  a1 , a2 , a3  

g a3  a1 , a2 , a3  

 a1a2 xi exp  a3 xi  a1 1 N   yi    N i 1  1  a2 exp  a3 xi   1  a2 exp  a3 xi  2  

(156)

(157)

We then have g a1  a1 , a2 , a3  a1    a1 N   1 N  1 1        yi 2  a1  N i 1  1  a2 exp  a3 xi   N i 1  1  a2 exp  a3 xi         N 1 1   N i 1 1  a2 exp  a3 xi   2  



(158)

Regression Analysis

103

g a1  a1 , a2 , a3  a2    a1 N   1 N  1 1   y     i  2  a2  N i 1  1  a2 exp  a3 xi   N i 1  1  a2 exp  a3 xi         exp  a3 xi   a1 N  2 1  a2 exp  a3 xi   exp  a3 xi   1 N         yi 4 N i 1  1  a2 exp  a3 xi  2  N i 1     1  a exp  a x   2 3 i          2a N   exp  a3 xi  exp  a3 xi  1 N  1       yi   N i 1  1  a2 exp  a3 xi  2  N i 1  1  a2 exp  a3 xi  3        



(159)

g a1  a1 , a2 , a3  a3    a1 N   1 N  1 1    y     i  a3  N i 1  1  a2 exp  a3 xi   N i 1  1  a2 exp  a3 xi  2       a2 xi exp  a3 xi   a1 N  2a2 xi 1  a2 exp  a3 xi   exp  a3 xi   1 N         yi 4 N i 1  1  a2 exp  a3 xi   2  N i 1     1  a exp  a x  3 i  2        xi exp  a3 xi   2a1a2 N  xi exp  a3 xi   a N     2   yi  N i 1  1  a2 exp  a3 xi  2  N i 1  1  a2 exp  a3 xi   3        



(160)

g a2  a1 , a2 , a3  a1  a1 exp  a3 xi   1 N  a12 exp  a3 xi      1 N      y   i a1  N i 1  1  a2 exp  a3 xi   2  N i 1  1  a2 exp  a3 xi   3            1 N  2a exp  a x   exp  a3 xi  1 N  1 3 i        yi 2 N i 1  1  a2 exp  a3 xi    N i 1  1  a2 exp  a3 xi   3        





 2a N   exp  a3 xi  exp  a3 xi  1 N  1    yi    N i 1  1  a2 exp  a3 xi   2  N i 1  1  a2 exp  a3 xi   3        

(161)

Kunihiro Suzuki

104 g a2  a1 , a2 , a3  a2

 N  a1 exp  a3 xi   1 N  a12 exp  a3 xi    1        yi 2 3   N i 1  1  a2 exp  a3 xi    N i 1  1  a2 exp  a3 xi     1 N  2a1 exp  2a3 xi  1  a2 exp  a3 xi       yi 4 N i 1   1  a2 exp  a3 xi     2 2 1 N  3a1 exp  2a3 xi  1  a2 exp  a3 xi      6 N i 1   1  a2 exp  a3 xi      3a 2 N   exp  2a3 xi  exp  2a3 xi  2a N    1     1   yi 3 4 N i 1  1  a2 exp  a3 xi    N i 1  1  a2 exp  a3 xi          



 a2

(162)

g a2  a1 , a2 , a3  a3  a1 exp  a3 xi   1 N  a12 exp  a3 xi      1 N        yi a3  N i 1  1  a2 exp  a3 xi   2  N i 1  1  a2 exp  a3 xi   3            2 N     a x exp  a x 1  a exp  a x  2 a a x exp  2 a x  3 i   2  3 i   3 i  1  a2 exp  a3 xi   1 1 i 1 2 i    yi 4 N i 1   1  a2 exp  a3 xi     3 2 2 2 1 N  a1 xi exp  a3 xi  1  a2 exp  a3 xi    3a1 a2 xi exp  2a3 xi  1  a2 exp  a3 xi      6 N i 1   1  a2 exp  a3 xi    



(163)

xi exp  a3 xi   2a1a2 N  xi exp  2a3 xi   a1 N    yi  yi   2 N i 1  1  a2 exp  a3 xi    N i 1  1  a2 exp  a3 xi   3             2 N  2 N xi exp  a3 xi  xi exp  2a3 xi  a   3a1 a2     1  N i 1  1  a2 exp   a3 xi   3  N i 1  1  a2 exp   a3 xi  4       



g a3  a1 , a2 , a3  a1  a1a2 xi exp  a3 xi   1 N  a12 a2 xi exp  a3 xi      1 N        yi a1  N i 1  1  a2 exp  a3 xi   2  N i 1  1  a2 exp  a3 xi   3           a2 xi exp  a3 xi   1 N  2a1a2 xi exp  a3 xi   1 N         yi N i 1  1  a2 exp  a3 xi   2  N i 1  1  a2 exp  a3 xi   3        





a2 N

N



  y i 1

i



 2a a  1 2 2 N 1  a2 exp  a3 xi    xi exp  a3 xi 

  3 i 1 1  a exp   a x    2 3 i    N



 

xi exp  a3 xi 

(164)

Regression Analysis

105

g a3  a1 , a2 , a3  a2  a1a2 xi exp  a3 xi   1 N  a12 a2 xi exp  a3 xi      1 N        yi a2  N i 1  1  a2 exp  a3 xi   2  N i 1  1  a2 exp  a3 xi  3           2 1 N  a1 xi exp  a3 xi  1  a2 exp  a3 xi    2a1a2 xi exp  2a3 xi  1  a2 exp  a3 xi      yi 4 N i 1   1  a2 exp  a3 xi    



(165)

3 2 2 2 1 N  a1 xi exp  a3 xi  1  a2 exp  a3 xi    3a1 a2 xi exp  2a3 xi  1  a2 exp  a3 xi     6 N i 1   1  a2 exp  a3 xi         N xi exp  a3 xi  xi exp  2a3 xi  a N   2a1a2   yi   1   yi N i 1  1  a2 exp  a3 xi   2  N i 1  1  a2 exp  a3 xi   3            2 N  2 N  x exp  a x x exp  2 a x     a i 3 i i 3 i   3a1 a2     1  N i 1  1  a2 exp  a3 xi   3  N i 1  1  a2 exp  a3 xi  4       

g a3  a1 , a2 , a3  a3  a1a2 xi exp  a3 xi   1 N  a12 a2 xi exp  a3 xi      1 N         yi a3  N i 1  1  a2 exp  a3 xi   2  N i 1  1  a2 exp  a3 xi   3           2 2 2 2 1 N  a1a2 xi exp  a3 xi  1  a2 exp  a3 xi    2a1a2 xi xp  2a3 xi  1  a2 exp  a3 xi       yi 4 N i 1   1  a2 exp  a3 xi     3 2 N   a 2 a x 2 exp   a x  1  a exp   a x    3a 2 a 2 x 2 exp  2a x  1  a exp   a x    1 1 2 i 3 i  2 3 i  1 2 i 3 i  2 3 i     6 N i 1     1  a exp  a x   2 3 i     2 2   2 N  N  x exp  a x x xp  2 a x  3 i   2a1a2   3 i  aa i   1 2   yi   yi i N i 1  1  a2 exp  a3 xi   2  N i 1  1  a2 exp  a3 xi   3          

a12 a2 N



 3a 2 a 2  1 2 N i 1 1  a exp   a x    2 3 i    N

 

xi2 exp  a3 xi 

3



(166)

  i 1 1  a exp   a x    2 3 i    N

 

xi2 exp  2a3 xi 

4

The incremental variation is given by  g1  a1  a1      g 2  a2    a  a   1  3  g  3  a1

g1 a2 g 2 a2 g3 a2

1

g1   a3    g1 a1 0 , a2 0 , a3 0  g 2    0  0  0   g 2 a1 , a2 , a3 a3     0  0 0 g3    g3 a1 , a2 , a3  a3 

  

The updating value is then given by

      

(167)

Kunihiro Suzuki

106 a11  a1 0  a1  1 0 a2  a2   a2    1  0 am  am  am

(168)

If the following hold abs  a1    c  abs  a2    c   abs  a    m c 

(169)

the updating value is the ones we need. If they do not hold, we set a1 0  a11   0 1 a2  a2      0 1 am  am

(170)

We perform this cycle while Eq. (144) becomes valid. We can set initial conditions roughly as follows. We can see the saturated value inspecting the data and set it as a10 . Selecting two points of  x1 , y1  , and  x2 , y2  , we obtain y1 

a10 1  a20 exp  a30 x1 

(171)

y2 

a10 1  a20 exp  a30 x2 

(172)

We then modify them as follows. a10  1  a20 exp  a30 x1  y1

(173)

Regression Analysis a10  1  a20 exp  a30 x2  y2

107 (174)

Therefore, we obtain

a  ln  10  1  ln a20  a30 x1  y1 

(175)

a  ln  10  1  ln a20  a30 x2  y2 

(176)

We then obtain  a10   1  y1 1   a30  ln x2  x1  a10   1   y2  a  ln a20  ln  10  1  a30 x1 y  1  a  a  x2 x1  ln  10  1  ln  10  1 x2  x1  y1  x2  x1  y2 

(177)

(178)

x2

 a10  x2  x1  1  y   ln  1 x1

 a10  x2  x1  1   y2 

Therefore, we obtain x2

 a10  x2  x1  1  y  a20   1 x1  a10  x2  x1  1   y2 

(179)

Kunihiro Suzuki

108

Let us consider the data given in Table 3. The value of y saturate with increasing x as shown in Figure 4. Table 3. Data for logistic curve



y 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

3 5 9 14 23 35 55 65 80 90 91 94 97 99 100

100 80 ypredict

y

60 40

a1 = 99.4 a2 = 76.2 a3 = 0.63

20 0 0

5

10

x Figure 4. Logistic cured with regression one.

15

Regression Analysis

109

We set a10 as 100, and use the data set of  5, 23 and 10,90  . We then have  a10   1  y 1  a30  ln  1 x2  x1  a10   1   y2   100   1  1 23   ln  10  5  100   1   90   0.68

(180)

x2

 a10  x2  x1  1  y  a20   1 x1  a10  x2  x1  1   y2  10

 100 10 5  1  23   5

(181)

 100 10 5  1   90   100.9

Using the initial conditions above, we perform a regression procedure shown in this section, and obtain the regression curve with factors of

 a1  99.4  a2  76.2  a  0.63  3

(182)

The data and a regression cure agree well.

15. MATRIX EXPRESSION FOR REGRESSION The regression can be expressed with a matrix form. This gives no new information. However, the expression is simple and one can use the method alternatively. The objective and an explanatory variables are related as follows.

Kunihiro Suzuki

110

yi  a0  a1  xi  x   ei

(183)

where we have n samples. Therefore, all data are related as  y1  a0  a1  x1  x   e1   y2  a0  a1  x2  x   e2   y  a  a x  x  e 0 1 n 1  n

(184)

Therefore, the regression analysis is to obtain the factors a0 , a1 . We introduce matrixes and vectors as  y1    y Y  2      yn 

(185)

1 x1  x    1 x2  x   X      1 xn  x 

(186)

a  β 0  a1 

(187)

 e1    e e 2      en 

(188)

Using the matrixes and vectors, the corresponding equation is expressed by

Y  Xβ  e The total sum of the square of error Q is then expressed by

(189)

Regression Analysis

111

Q  eT e  Y  Xβ  Y  Xβ  T

 Y T  βT X T  Y  Xβ 

(190)

 Y Y  Y Xβ  β X Y  β X Xβ T

T

T

T

T

T

 Y T Y  2βT X T Y  βT X T Xβ

We obtain the optimum value by minimizing Q . This can be done by vector differentiating (see Appendix 1-10 of volume 3) with respect to β it given by Q   T T  2  βT X T Y    β X Xβ  β β β  2 X T Y  2 X T Xβ 0

(191)

Therefore, we obtain β   X T X  X TY 1

(192)

SUMMARY To summarize the results in this chapter— The regression line is given by y  y

 yy  2

 x x   xy     2 xx 

   

If we want to force the regression to start from  0, a0  , the regression line is given by

y

 xy  xx  2  yy  2   x  y  a0  x



 2 xx

 x2



x  a0

The accuracy of regression is evaluated by

Kunihiro Suzuki

112 R2 

 r 2   xy2  yy 2

 2 where  r is the variance of Y given by

 r 2 

2 1 N Yi   y    N i 1

The error range for each factors are given by

aˆ1  t p  n  2 

se

2

nS xx  2

 a1  aˆ1  t p  n  2 

1 x2  2 aˆ0  t p    2  se   a0  aˆ0  t p  n nS  xx  

se

2

nS xx  2

1 x 2   2  s   2   e n nS xx  

The predictive value for the regression Y0 is given by  1  x0  x 2   2 ˆ  se  Yo  Yˆ0  t p  n  2  Y0  t p  n  2    2 nS xx    n

 1  x0  x 2   2    se 2 nS xx    n

The predictive y value y0 is given by  1  x0  x 2   2 ˆ  se  yo  Yˆ0  t p  n  2  Y0  t p  n  2  1   2 nS xx    n

where

Yˆ0  aˆ0  aˆ1 x0 When the target function is an exponential one, given by

y  beax

 1  x0  x 2   2 1    se 2 nS xx    n

Regression Analysis

113

We converting variables as u  x  v  ln y

We can apply the linear regression to the variables. When y is proportional to power law of x , that is, it is given by

y  bx a We converting variables as u  ln x  v  ln y

We can apply the linear regression to the variables. We assume the data are expressed with a power law equation given by

y  a0  a1 x  a2 x 2 

 am x m

The factors are given by  a1   K x11     a2    K x 21        am   K xm1

K x1m   K x 22    K xmm 

K x12 K x 22 K xm 2

1

 K y1     K y2       K ym 

where



 x  x 

K xkl 

1 N k  xi  xk N i 1

K yk 

1 N  yi  y  xik  xk  N i 1



l i

l



We assume an arbitrarily function having many factors is given by f  a1 , a2 , , am , x  We define the following function

Kunihiro Suzuki

114 g1  a1 , a2 ,

, am  

 g1  a 1  a1      g 2  a2    a    1     am   g  m  a  1

g1 a2 g 2 a2 g m a2

1 N

N

  y i 1

i

 f  a1 , a2 ,

, am , xi  

 

   





a11  a1 0  a1  1 0 a2  a2   a2    1  0 am  am  am

We repeat this process until the followings hold. abs  a1    c  abs  a2    c   abs  a    m c 

We apply this procedure to the logistic curve. The residual error is evaluated by zek 1  hkk

where hkk is the leverage ratio given by 1 x  x hkk   k  2 n nS xx

2

a1

1

g1  am    g1 a1 0 , a2 0 , , am  0  g 2    0  0  0   g a , a2 , , am am   2 1      0  0  0 g m    g m a1 , a2 , , am am 

The updating value is then given by

t

f  a1 , a2 ,

   

, am , xi 

Regression Analysis The regression can be expressed using a matrix form as β   X T X  X TY 1

where yi  a0  a1  xi  x   ei

The matrixes and vectors are defined as  y1    y Y  2      yn  1 x1  x    1 x2  x  X      1 xn  x  a  β 0  a1 

 e1    e e 2      en 

115

Chapter 4

MULTIPLE REGRESSION ANALYSIS ABSTRACT The objective variable is influenced by many kinds of variables in general. We therefore extend the analysis of a regression with one explanatory variable to multiple ones. We predict a theoretical prediction for given multi parameters with their error range. We also discuss the reality of the correlation relationship where the correlation factor is influenced by the other variables. The pure correlation without the influenced factor is also discussed. We also evaluate the effectiveness of the explanatory variables.

Keywords: regression, multiple regression, objective variable, explanatory variable, multicollinearity, partial correlation factor

1. INTRODUCTION We treated one explanatory variable in the previous chapter, and studied the relationship between an explanatory variable and one objective variable. However, the objective variable is influenced by various parameters. For example, the price of a house is influenced by location, house age, site area, etc. Therefore, we need to treat multi explanatory variables to predict the objective variable robustly. We first treat two explanatory variables, and then treat ones of more than two. We treat a sample regression in this chapter. The regression for population can be easily derived by changing sample averages and sample variances to the population’s ones as shown in the previous chapter. We perform matrix operation in this chapter. The basics of matrix operation are described in Chapter 15 of volume 3.

Kunihiro Suzuki

118

2. REGRESSION WITH TWO EXPLANATORY VARIABLES We treat one objective variable y , and two explanatory variables x1 and x2 in this section for simplicity. We derive concrete forms for the two variables and extend it to the one for multiple ones later.

2.1. Regression Line The objective variable yi and two explanatory variables xi1 and xi 2 in population can be related to yi  a0  a1 xi1  a2 xi 2  ei

(1)

where a0 , a1 , and a2 are constant factors. We pick up n set data from the population, and want to predict the characteristics of the population. We can decide the sample factors a0 , a1 , and a2 from the sample data, which is not the same as the ones for population, and hence we denote them as aˆ0 , aˆ1 , and aˆ2 . The sample data can be expressed by y1  aˆ0  aˆ1 x11  aˆ2 x12  e1 y2  aˆ0  aˆ1 x21  aˆ2 x22  e2 yn  aˆ0  aˆ1 xn1  aˆ2 xn 2  en

(2)

ei is called as residual error. A sample regression value Yi is expressed by Yi  aˆ0  aˆ1 xi1  aˆ2 xi 2

(3)

ei can then be expressed by ei  yi  Yi  yi   aˆ0  aˆ1 xi1  aˆ2 xi 2 

(4)  2

S The variance of ei is denoted as e , and is given by

Multiple Regression Analysis Se   2

119

2 1 n  yi   aˆ0  aˆ1 xi1  aˆ2 xi 2   n i 1

(5)

 2 We decide aˆ0 , aˆ1 , and aˆ2 so that Se has the minimum value.  2 Partial differentiating Se with respect to aˆ0 and set 0 for them, we obtain

Se  1 n  2   yi   aˆ0  aˆ1 xi1  aˆ2 xi 2   0 aˆ0 n i 1 2

(6)

We then obtain 1 n 1 n 1 n 1 n ˆ ˆ ˆ ˆ ˆ ˆ   y  a  a x  a x  y  a  a x  a xi 2    i 0 1 i1 2 i 2  n   i1 2 n  i 0 1 n i 1  n i 1 i 1 i 1  y  aˆ0  aˆ1 x1  aˆ2 x2 0

(7)

Se  with respect to aˆ1 and aˆ2 , and set 0 for them, we obtain 2

Partial differentiating

Se  1 n  2  xi1  yi   aˆ0  aˆ1 xi1  aˆ2 xi 2    0 aˆ1 n i 1

(8)

Se  1 n  2  xi 2  yi   aˆ0  aˆ1 xi1  aˆ2 xi 2    0 aˆ2 n i 1

(9)

2

2

Substituting Eq. (7) into Eqs. (8) and (9), we obtain 1 n  xi1  yi  y   aˆ1  xi1  x1   aˆ2  xi 2  x2   0 n i 1 

(10)

1 n  xi 2  yi  y   aˆ1  xi1  x1   aˆ2  xi 2  x2   0 n i 1 

(11)

Note that the followings are valid.

Kunihiro Suzuki

120

1 n  x1  yi  y   aˆ1  xi1  x1   aˆ2  xi 2  x2   0 n i 1 

(12)

1 n  x2  yi  y   aˆ1  xi1  x1   aˆ2  xi 2  x2   0 n i 1 

(13)

Therefore, we obtain 1 n   xi1  x1   yi  y   aˆ1  xi1  x1   aˆ2  xi 2  x2   S1y2  aˆ1S11 2  aˆ2 S12 2 n i 1 0

(14)

1 n   xi 2  x2   yi  y   aˆ1  xi1  x1   aˆ2  xi 2  x2   S2 2y  aˆ1S21 2  aˆ2 S22 2 n i 1 0

(15)

where S11  

1 n 2  xi1  x1   n i 1

(16)

  S22 

1 n 2  xi 2  x2   n i 1

(17)

2

2

S12   S21  2

2



1 n   xi1  x1  xi 2  x2  n i 1

(18)

S yy  

1 n 2   yi  y  n i 1

(19)

S1y  

1 n   xi1  x1  yi  y  n i 1

(20)

S2 y 

1 n   xi 2  x2  yi  y  n i 1

(21)

2

2

2

Multiple Regression Analysis

121

Eqs. (14) and (15) are expressed with a matrix form as   2 S12    aˆ1   S1 y         2 2 S22    aˆ2   S2 y 

 S11 2  2  S   21

2

(22)

Therefore, we can obtain aˆ1 and aˆ2 as    aˆ1   S11    ˆ    2  a2   S12 2

 S 11 2   21 2 S   

1

 S1y2   2   S    2y   2 12 2 S     S1 y   2     22 2 S     S2 y 

S12    2 S22   2

(23)

where   S11 2  2 22 2 S     S12 

 S 11 2  21 2 S 

S

12 2 

S12    2 S22   2

1

(24)

Once we obtain the coefficients aˆ1 and aˆ2 from Eq. (23), we can obtain aˆ0 from Eq. (7) as aˆ0  y  aˆ1 x1  aˆ2 x2

(25)

2.2. Accuracy of the Regression The accuracy of the regression can be evaluated similarly to the case of one explanatory variable by a determinant factor given by R2 

where

Sr

2

S yy  2

 r 2  is the variance of

(26)

Y

given by

Kunihiro Suzuki

122 Sr   2

2 1 n Yi   y    n i 1

(27)

This is the ratio of the variance of the regression line to the variance of

y

. If whole

 2

data are on the regression line, that is, the regression is perfect,

S yy

is equal to

2 Sr  ,

and

R 1. On the other hand, when the deviation of data from the regression line is significant, the determinant ratio is expected to become small. We further modify the variance of y can also be modified in the previous chapter as 2

1 n 2  yi  y   n i 1 1 n 2    yi  Yi  Yi  y  n i 1 1 n 1 n 1 n 2 2    yi  Yi    Yi  y   2   yi  Yi Yi  y  n i 1 n i 1 n i 1 n n n 1 1 1 2   ei2   Yi  y   2  ei  a0  a1 xi1  a2 xi 2  n i 1 n i 1 n i 1

S yy   2

 Se   Sr 2

(28)

2

where we utilize Cov e, x1   Cov e, x2   0

(29)

Therefore, the determination coefficient is reduced to

R2 

Sr

2

Se   Sr 2

1

Se

2

2

S yy 

2

(30)

Consequently, the determination coefficient is the square of the correlation factor, and has a value range between 0 and 1. The multiple correlation factor rmult is defined as the correlation between yi and Yi , and is given by

Multiple Regression Analysis

 y n

rmult 

i

i 1

 y n

i

i 1



 y Yi  y

y



  Y  y  n

2

123

2

i

i 1

(31)

We can modify the denominator of (31) as Se   2

1 n 2   yi  Yi  n i 1

2 1 n  yi  y   Yi  y    n i 1 1 n 1 n 1 n 2 2    yi  y    Yi  y   2   yi  y Yi  y  n i 1 n i 1 n i 1 n 1 2 2  S yy   S R   2   yi  y Yi  y  n i 1



(32)

The cross term in Eq. (32) is performed as





1 n 1  yi  y Yi  y   S yy 2  S R 2  Se 2  n i 1 2 1 2 2 2 2   S yy   S R   S yy   S R    2





 S R  2

(33)

Then, the multiple correlation factor is given by S R  2

rmult 

S yy  S R  2

2

S R  2



S yy  2

(34)

2 The square of rmult is the determinant of the multi regression, and is denoted as R and is given by

SR 

R 

  S yy 2

  S yy  Se 2

2

2



  S yy 2

2

1

Se

2

  S yy 2

(35)

Kunihiro Suzuki

124

Substituting Eq. (34) into Eq. (35), we can express the determinant factor with a multiple correlation factor as 2 R2  rmult

(36)

The variances used in Eq. (35) are related to the samples, and hence the determinant with unbiased variances are also defined as n R

*2

1

e n

T

Se

2

(37)

 2

S yy

 2 Three variables a0 , a1 , and a2 are used in Se , and hence the freedom of it e is given

by e  n  3

(38)

 2 y is used in S yy , and hence the freedom of it T is given by

T  n  1

(39)

R*2 is called as a degree of freedom adjusted coefficient of determination for a multiple regression. 2

R*2 can also be related to R as R*2  1 

T 1  R2  e

(40)

2.3. Residual Error and Leverage Ratio We evaluate the accuracy of the regression. The residual error can be evaluated as ek  yk  Yk

(41)

Multiple Regression Analysis

125

This is then normalized by the biased variance and is given by

zek 

ek se

(42)

2

se  is given by 2

where

se   2

n

e

Se

2

(43)

This follows a standard normal distribution with Further, it is modified using leverage ratio

tk 

zek

N  0,12 

.

hkk as (44)

1  hkk

This can be used for judging the outlier or not. We should consider the leverage ratio for a multiple regression, which is shown below. The regression data can be expressed by a leverage ratio as Yk  hk1 y1  hk 2 y2 

 hkk yk 

 hkn yn

(45)

On the other hand, Yk is given by Yk  aˆ0  aˆ1 xk 1  aˆ2 xk 2

 y   aˆ1 x1  aˆ2 x2   aˆ1 xk1  aˆ2 xk 2  y  aˆ1  xk1  x1   aˆ2  xk 2  x2 

(46)

 aˆ   y   xk1  x1 , xk 2  x2   1   aˆ2 

Each term is evaluated below. y

1 n  yk n i 1

(47)

Kunihiro Suzuki

126

 S 11 2

 aˆ   a2 

 xk1  x1 , xk 2  x2   ˆ1    xk1  x1 , xk 2  x2  

S

21 2 

 S 11 2   xk1  x1 , xk 2  x2   21 2 S   

 S 11 2   xk1  x1 , xk 2  x2   21 2 S   

  S1y2     2  22 2 S     S2 y  1 n   xi1  x1  yi  y   12 2     S n i 1   n 22 2  S     1    xi 2  x2  yi  y    n i 1  S

12 2 

(48)

1 n   n   xi1  x1 yi   S i 1   n 22 2  S     1 x  x y     i2 2 i   n i 1  12 2 

Focusing on the k -th term, we obtain hkk 

1 Dk2  n n

(49)

where  S 11 2 S 12 2   xk1  x1  xk 2  x2   12 2   S   S 22 2   xk 2  x2    11 2  12 2   xk1  x1  S   xk 2  x2  S     xk1  x     x  x  S 12 2   x  x  S 22 2  k2 2  k1 1 

Dk2   xk1  x1   xk1  x1

  xk1  x1  S 2

11 2 

 2  xk1  x1  xk 2  x2  S

12 2 

(50)

  xk 2  x2  S 2

22  2 

This Dk is called as Mahalanobis’ distance. Sum of the leverage ratio is given by n

1 n Dk2  k 1 n k 1 n n 2 2 1 ij 2  1    xki  xi   xkj  x j  S   n k 1 i 1 j 1 n

 hkk   k 1

 n     xki  xi   xkj  x j    S ij  2  1    k 1 n  i 1 i 1    2

2

2

2

 1   Sij  S i 1 i 1

1 2

2

ij  2 

(51)

Multiple Regression Analysis

127

The average leverage ratio is then given by

hkk 

3 n

(52)

Therefore, we use a critical value for hkk as

hkk    hkk

(53)

  2 is frequently used, and we evaluate whether it is outlier or not using the value.

When x1 and x2 are independent on each other, the inverse matrix is reduced to

 S 11 2  12 2 S   

1

  S11 2 S12 2    2 22 2  2  S     S12  S22   1  0    2 S11    1   0  2   S22   S

12 2 

(54)

and we obtain

Dk2 

 xk1  x1  S11  2

2



 xk 2  x2 

2

(55)

S22  2

The corresponding leverage ratio is given by 1  xk1  x1   xk 2  x2    2 2 n nS11  nS22  2

hkk 

2

(56)

Therefore, we can simply add the terms in the single regression. There are interactions between the explanatory variables, and the expression becomes more complicated in general, and we need to use the corresponding models.

Kunihiro Suzuki

128

2.4. Prediction of Coefficients of a Regression Line When we extract samples, the corresponding regression line varies, that is, the factors

aˆ 0 aˆ1 aˆ ˆ aˆ aˆ , , and 2 vary. We study averages and variances of 0 , a1 , and 2 in this section. We assume the pollution data are given by

yi  a0  a1 xi1  a2 xi 2  ei

where

 xi1 , xi 2 

population.

ei

for i  1,2, , n

is the given variable,

a0

,

(57)

a1

, and a2 are established values for

is independent on each other and follows a normal distribution with

0,    . The probability variable in Eq. (57) is then only e . 2

e

i

When we get sample data evaluated as

 xi1 , xi 2 , yi 

for i  1,2, , n . The coefficients are

  S1y2    22 2     2   S S  2y   S 11 2 S1y2  S 12 2 S2 2y     21 2  2  S S  S 22 2 S  2  1y 2y  

   aˆ1   S   ˆ   21 2  a2   S 11 2

S

12 2 

(58)

Each coefficient is expanded as

aˆ1  S 11 2 S1y2  S 12 2 S2 2y 2

 S p 1

1 p 2

S py 2



 1 n  2 1 p 2 xip  x p   yi   S  n i 1  p 1 



 1 n  2 1 p 2 xip  x p    a0  a1 xi1  a2 xi 2  ei    S  n i 1  p 1 

(59)

Multiple Regression Analysis

129

aˆ2  S 21 2 S1y2  S 22 2 S2 2y 2

 S

2 p 2

p 1

S py 2



 1 n  2 2 p  2 xip  x p   yi   S  n i 1  p 1 



 1 n  2 2 p  2 xip  x p    a0  a1 xi1  a2 xi 2  ei    S  n i 1  p 1 

(60)

aˆ0  y   aˆ1 x1  aˆ2 x2  

  1 n 1 n  2 1 p 2 1 n  2 2 p 2 y  x S x  x y  xip  x p  yi     1  x2  S    i ip p  i n i 1 n i 1  p 1 n i 1  p 1  



2 2  1 n  1 p 2 2 p 2 1  x S x  x  x xip  x p   yi      1 ip p 2S n i 1  p 1 p 1 



2 2  1 n  jp  2  xip  x p   yi  1   x j  S  n i 1  j 1 p 1 



2 2  1 n  jp  2  xip  x p    a0  a1 xi1  a2 xi 2  ei   1   x j  S  n i 1  j 1 p 1 

We notice that only

E  aˆ1  

ei

is a probability variable, and evaluate the expected value as

 1 n  2 1 p 2 xip  x p    a0  a1 xi1  a2 xi 2  ei    S  n i 1  p 1 

 a0

 1 n  2 1 p 2 xip  x p     S  n i 1  p 1 

 a1

 1 n  2 1 p 2 xip  x p  xi1   S  n i 1  p 1 

 a2

 1 n  2 1 p 2 xip  x p  xi 2   S  n i 1  p 1 



(61)

 1 n  2 1 p 2 xip  x p  ei   S  n i 1  p 1  2

 a1  S p 1

 a1

1 p 2

2

S p1  a2  S 2

p 1

1 p 2

S p 2 2

(62)

Kunihiro Suzuki

130 where 2

S

ip  2 

p 1

S pj    ij

E  aˆ2  

2

(63)

 1 n  2 2 p 2 xip  x p    a0  a1 xi1  a2 xi 2  ei    S  n i 1  p 1 

 a0

 1 n  2 2 p 2   S  xip  x p  n i 1  p 1 

 a1

 1 n  2 2 p 2   S  xip  x p xi1 n i 1  p 1 

 a2

 1 n  2 2 p 2 xip  x p  xi 2   S  n i 1  p 1 



(64)

 1 n  2 2 p 2 xip  x p  ei   S  n i 1  p 1  2

 a1  S p 1

2 p 2

2

S p1  a2  S 2

p 1

2 p 2

S p 2 2

 a2

E  aˆ0  

2 2  1 n  jp  2   xip  x p   a0  a1 xi1  a2 xi 2  ei  1   x j  S  n i 1  j 1 p 1 

 a0  a1

2 2  1 n  jp  2  xip  x p  xi1  1   x j  S  n i 1  j 1 p 1 

 a2

2 2  1 n  jp  2  1  x xip  x p  xi 2     jS n i 1  j 1 p 1 



a0

2 2  1 n  jp  2  xip  x p    1   x j  S  n i 1  j 1 p 1 

(65)

2 2  1 n  jp  2  xip  x p ei  1   x j  S  n i 1  j 1 p 1 

2 2  1 n  jp  2  1  x xip  x p      jS  n i 1  j 1 p 1 

n  2 n  2   1p 2 2p 2  a0  a0   x1  S    xip  x p    a0   x2  S    xip  x p   i 1  p 1 i 1  p 1   2 n 2 n     1p 2 2p 2  a0  a0  x1  S     xip  x p    a0  x2  S     xip  x p   i 1 i 1  p 1   p 1   a0

(66)

Multiple Regression Analysis

a1

131

2 2  1 n  jp  2   xip  x p xi1 1   x j  S  n i 1  j 1 p 1 

 a1 x1  a1

  1 n  2 1 p 2 1 n  2 2p 2 xip  x p  xi1  a1   x2  S    xip  x p  xi1   x1  S  n i 1  p 1 n i 1  p 1   2

 a1 x1  a1 x1  S

1 p 2

p 1

 a1 x1  a1 x1 S

11 2 

2

S p1  a1 x2  S 2

2 p 2

p 1

(67)

S p1 2

S11  2

0

a2

2 2  1 n  jp  2   xip  x p xi 2 1   x j  S  n i 1  j 1 p 1  2

 a2 x2  a2 x1  S

1 p 2

p 1 2

 a2 x2  a2 x1  S

n

i 1

1 p 2

p 1

 a2 x2  a2 x2 S

 x

22 2 

ip

 x p xi 2  a2 x2  S 2

p 1

2

S p 2  a2 x2  S 2

2 p 2

2 p 2

p 1

x n

i 1

ip

 x p xi 2

(68)

S p 2 2

 2

S22

 a2 x2  a2 x2 0 2 2  1 n  jp  2  xip  x p ei  0  1   x j  S  n i 1  j 1 p 1 

(69)

Therefore, we obtain E  aˆ0   a0

(70)

We then evaluate variances of factors. 2

n   1 2 1p 2 V  aˆ1      S    xip  x p   V  ei  n i 1  p 1  2

1 1 n  11 2  S  xi1  x1   S 12 2  xi 2  x2  V ei  n n i 1  1 1 n  11 2 2 2  S  xi1  x1   2S 12 2 S 11 2  xi1  x1  xi 2  x2   S 12 2   n n i 1  2 1  11 2 2  2 12 2 11 2 2 12 2 2  2  S S11  2S   S   S12   S   S22 s  e n  1 11 2 11 2 2 12 2 12 2   2  2   2  S 11 2  S 12 2  S 12 2  S 22   S    S   S11   S   S21 S    se n









S

11 2 

n





se

2







 x 2

i2

2 2  x2  se  

(71)

Kunihiro Suzuki

132 2

n   1 2 2p 2 V  aˆ2      S    xip  x p   V  e n i 1  p 1  2



1 1 n  21 2  S  xi1  x1   S 22 2  xi 2  x2  se 2 n n i 1 







1 1 n  21 2 2 2 S  xi 2  x2   2S 21 2 S 22 2  xi1  x1  xi 2  x2   S 22 2   n n i 1  2 1  21 2 2  2 21 2 22 2 2 22 2 2  2  S S22  2S   S   S12   S   S22 s   e n 1 21 2 21 2 2 22 2 2 22 2 21 2 2 22 2 2   2   S    S   S2 2   S   S12    S    S   S12   S   S22   se n







S

22 2 

n



se



 x 2

i2

2

2 2  x2   se  

(72)



2

2 2 1 n    jp 2 V  aˆ0   V   1   x j  S    xip  x p    a0  a1 xi1  a2 xi 2  ei    n i 1  j 1 p 1   2  n 1   2 2  jp 2     1   x j  S    xip  x p    V  e   i 1  n  j 1 p 1    2 2   1 1  n     1   x j  S j1 2  xi1  x1   S j 2 2  xi 2  x2     V e      n n  i 1   j 1     n 2 11   11 2  12 2    1  x1  S  xi1  x1   S  xi 2  x2   x2  S 21 2  xi1  x1   S 22 2  xi 2  x2    se 2 n n  i 1  2    11 2  xi1  x1   S 12 2  xi 2  x2    x2  S 21 2  xi1  x1   S 22 2  xi 2  x2     1 1  n 1  x1  S       2     se n n  i 1 2 x  S 11 2 x  x  S 12 2 x  x   x  S 21 2 x  x  S 22 2 x  x      i1 1   i 2 2  2   i1 1   i 2 2    1   2 2    2 11 2  xi1  x1   S 12 2  xi 2  x2   x22  S 21 2  xi1  x1   S 22 2  xi 2  x2     2 1 1  n   x1  S  1    se   n n  i 1  2 x x  S 11 2  x  x   S 12 2  x  x    S 21 2  x  x   S 22 2  x  x        1 2  i1 1 i2 2  i1 1 i2 2        2   2  11 2 2 2 2 12 2  11 2  12  2    x S x  x  S x  x  2 S S x  x x  x          1   i1 1 i2 2 i1 1 i2 2       2 1 1 1  n  2  21 2 2  2 2 22 2  21 2  22  2      xi1  x1   S  xi 2  x2   2S S  xi1  x1  xi 2  x2    se 2    x2  S    n n n  i 1       S 11 2 S 21 2  xi1  x1 2  S 11 2  S 22 2  xi1  x1  xi 2  x2     2 x1 x2         S 12 2 S 21 2  xi 2  x2  xi1  x1   S 12 2 S 22 2  xi 2  x2  xi 2  x2       





 

 

















      2 2   x 2  S 11 2 S  2   S 12 2 S  2   2S 11 2  S 12 2  S  2     1  11 22 12          2 2 1 1   22      x2 2  S 21 2 S11 2   S 22 2  S22  2S 21 2  S 22 2  S12 2     se 2     n n      S 11 2  S 21 2 S11 2   S 22 2 S12 2           2 x x   1 2    S 12 2  S 21 2 S  2  S 22 2 S  2     21 22        

















1 1 11 2 22 2 12 2  2     x12  S     x2 2  S     2 x1 x2 S     se   n n   1 1  2  11 2   2 22 2  12  2  21 2 2   x1 x2  S    x1  S   x2  S  S      se     n n  

2 2 1 jl  2    2  1   x j xl S  se n  j 1 l 1 



1  1   x1 n 

 S 11 2 x2   21 2 S   

S S

12 2  22 2 

  x1    2     se   x2   

(73)

Multiple Regression Analysis

133

where we utilize the following relationship in the last derivation step.

S

12 2

S

21 2

(74)

The covariance between aˆ1 and aˆ2 is given by

Cov  aˆ1 , aˆ2   

  2 2 p ' 2    1 1  n  2 1 p 2 xip '  x p '  V e     S  xip  x p    S n n  i 1  p 1   p '1   1 1  n  11 2  2 12 2  21 2  22 2    S  xi1  x1   S  xi 2  x2   S  xi1  x1   S  xi 2  x2   se n n  i 1 

2 11 2  21 2  11 2  22 2  1 1 n  S S  xi1  x1   S S  xi1  x1  xi 2  x2    2 se   n n i 1   S 12 2 S 21 2  x  x  x  x   S 12 2 S 22 2  x  x 2  i2 2 i1 1 i2 2   11 2 21 2 2 22 2 2               1  S  S S11  S S12    2  se n   S 12 2  S 21 2 S  2  S 22 2 S  2   21 22    

1 12 2 2  S   se  n (75) Similarly, we obtain

Cov  aˆ2 , aˆ1  

1 21 2  2 S se n

(76)

V  aˆ j , aˆl  is generally expressed by

Consequently, the

V  aˆ j 

Cov  aˆ j , aˆl  

1 jl  2  2 S se n

The covariance between

and

aˆ0 and aˆ1 is given by

(77)

Kunihiro Suzuki

134 Cov  aˆ0 , aˆ1   

2 2   2 1p 2  11 n  jp  2  xip  x p     S    xip  x p   V  e   1   x j  S  n n i 1  j 1 p 1   p 1  2  11 n  j1 2   xi1  x1   S j 2 2  xi 2  x2    S 11 2  xi1  x1   S 12 2  xi 2  x2  se 2 1   x j  S  n n i 1  j 1 

  11 2  xi1  x1   S 12 2  xi 2  x2    1 1 n 1  x1  S    11 2  xi1  x1   S 12 2  xi 2  x2  se 2  S  n n i 1  x  S 21 2  x  x   S 22 2  x  x     i1 1 i2 2   2  11 2  12 2      xi1  x1   S  xi 2  x2   11 2 1 1 n  x1  S   xi1  x1   S 12 2  xi 2  x2  se 2   S  n n i 1  x  S 21 2  x  x   S 22 2   x  x     i1 1 i2 2   2  2 11 2 11 2 12 2 11 2         x S S x  x  S S x  i1 1   i 2  x2   xi1  x1    1   21 2 11 2  xi1  x1 2  S 22 2  S 11 2   xi 2  x2  xi1  x1    1 1 n  x2  S S    2    se  2 11 2 12 2 12 2 12 2         n n i 1  x  S S  xi1  x1  xi 2  x2   S S  xi 2  x2    1     x  S 21 2 S 12 2  x  x  x  x   S 22 2 S 12 2   x  x 2   i1 1 i2 2 i2 2  2    11 2 11 2 2 12 2 11 2 2              x1  S S S11  S S S 21        x  S 21 2 S 11 2  S  2   S 22 2  S 11 2  S  2    11 21   1 2  2    se 11 2 12 2 2 12 2 12 2 2 n x S   S   S    S   S   S     1 12 22    21 2  12 2   2  22 2  12  2   2   x2  S S S12  S S S 22      1 11 2 12 2 2   x1 S    x2 S   se  n 1 11 2 12 2  x   S   S    1 n  x2 











(78) Similarly, we obtain





1 21 2 22 2 2 x1 S    x2 S   se  n 1 21 2 22 2  x  2   S   S    1  se  n  x2 

Cov  aˆ0 , aˆ2   





(79)

Consequently, the Cov  aˆ0 , aˆ j  is generally expressed by

1 Cov aˆ0 , aˆ j    n

2

x S l

l 1

jl  2 

2 se 

(80)

Multiple Regression Analysis



We assume that 1 and normalized variables. The parameter given by

t

aˆi  ai

V  ai 

aˆ 0

135

follow normal distributions, and hence assume the

for i  0,1, 2 (81)

follows a t distribution function with a freedom of e , where e  n  3

(82)

Therefore, the predictive range is given by

aˆi  t p e ; P V  aˆi   ai  aˆi  t p e ; P V aˆi 

(83)

2.5. Estimation of a Regression Line

x , x , y  We assume that we obtain n set of data i1 i 2 i . The corresponding regression value is given by Y  aˆ0  aˆ1 x1  aˆ2 x2

(84)

The expected value is given by E Y   E  aˆ0   E  aˆ1  x1  E  aˆ2  x2  a0  a1 x1  a2 x2

The variance of Y is given by

(85)

Kunihiro Suzuki

136 V Y   V  aˆ0  aˆ1 x1  aˆ2 x2  2

2

2

 V  aˆ0   2 x j Cov  aˆ0 , aˆ j    x j xl Cov  aˆ j , aˆl  j 1

j 1 l 1



2 2 2 1 2 2 1 2 2 jl  2    2  jl  2    2  jl  2    2  1   x j xl S  se    x j  xl S  se    x j xl S  se n n  j 1 l 1 n  j 1 l 1 j 1 l 1   



2 2 1 jl  2    2  1     x j xl  2 x j xl  x j xl  S  se n j 1 l 1 

(86)

Let us consider the term below. 2

2

 2 x x S j 1 l 1 2

jl  2 

j l

2

  x j xl S

jl  2 

j 1 l 1

2

2

  x j xl S

jl  2 

j 1 l 1

(87)

The second term is modified as 2

2

 x j xl S

jl  2 

j 1 l 1

2

2

  xl x j S

lj  2 

j 1 l 1 2

2

  xl x j S

jl  2 

j 1 l 1

(88)

where we utilize the relationship of S

lj  2

S

jl  2

(89)

Therefore, we obtain 2

2

 2 x x S j 1 l 1

j l

jl  2 

   x j xl  xl x j  S 2

2

jl  2 

j 1 l 1

Substituting Eq. (90) into Eq. (86), we obtain

(90)

Multiple Regression Analysis V Y  

137

2 2 1 jl  2    2  1    x j xl  2 x j xl  x j xl  S  se n  j 1 l 1 



2 2 1 jl  2    2  1    x j xl  x j xl  xl x j  x j xl  S  se n  j 1 l 1 



2 2 1 jl  2    2  1     x j  x j   xl  xl  S  se n  j 1 l 1 

1 1   x1  x1 n  1 2  1  D 2  se  n



 S 11 2 x2  x2   21 2 S   

  x1  x1    2    se 22 2 S     x2  x2   S

12 2 

(91)

where D2   x1  x1

 S 11 2 x2  x2   21 2 S   

S S

12 2 22 2

  x1  x1     x2  x2  

(92)

Therefore, the variable given by t 

Y  E Y  V Y 

 aˆ0  aˆ1 x1  aˆ2 x2    a0  a1 x1  a2 x2   1 D 2   2    se n  n

(93)

follows a t distribution. We can then estimate the value range of the regression given by  1 D2   2  1 D2   2 aˆ0  aˆ1 x1  aˆ2 x2  t p e ; P     se  Y  aˆ0  aˆ1 x1  aˆ2 x2  t p e ; P     se n n  n n 

(94) The objective variable range is given by y  aˆ0  aˆ1 x1  aˆ2 x2  e

and the corresponding range is given by

(95)

Kunihiro Suzuki

138

 1 D 2   2  1 D 2   2 ˆ ˆ ˆ aˆ0  aˆ1 x1  aˆ2 x2  t e , P  1   s  y  a  a x  a x  t  , P    e 1    se 0 1 1 2 2 e  n n   n n 

(96)

2.6. Selection of Explanatory Variables We used two explanatory variables. However, we want to know how each variable contributes to express the subject variable. We want to know the validity of each explanatory variable. One difficulty exists that with increasing number of explanatory variable, the determinant factor increases. Therefore, we cannot use the determinant factor to evaluate the validity. We start with regression without explanatory variable, and denote it as model 0. The regression is given by Model 0 : Yi  y

(97) Se M 0  2

The corresponding variance Se M 0  2

is given by

1 n 1 n 2 2  yi  Yi     yi  y   S yy 2  n i 1 n i 1

(98)

In the next step, we evaluate the validity of x1 and x2 , and denote the model as model 1. The regression using explanatory variable is given by xl

Yi  a0  a1 xil1

(99)

The corresponding variance Se2M 1 is given by Se M 1  2





2 1 n 1 n 2  yi  Yi     yi  a0  a1 xil1   n i 1 n i 1

Then the variable

(100)

Multiple Regression Analysis nS    F 

2 e M 0

 nSe M 1

1

2

nS

  

 2 e M 1

e M 0

 e M 1

139



(101)

e M 1

follows F distribution with freedom and are given by



F e M 0  e M 1 ,e M 1



where

e M 0

and

e M 1

are the

e M 0  n  1

(102)

e M 1  n  2

(103)

We can judge the validity of the explanatory variable as

 

 F1  F   e M 1 , e M 1 e M 0     F1  F e M 0  e M 1 , e M 1 

 

valid

(104) invalid

We evaluate F1 for x1 and x2 , that is, l1  1,2 , and evaluate the corresponding F1 . If both F1 for l1  1,2 are invalid, we use the model 0 and the process is end. If at least one of the F1 is valid, we select the larger one, and the model 1 regression is determined as

Yi  a0  a1 xil1

(105)

where l1  1 or 2

(106)

depending on the value of the corresponding F1 . We then evaluate the second variable and denote it as model 2. The corresponding regression is given by

Yi  a0  a1 xil1  a2 xil2

(107)

Kunihiro Suzuki

140

The corresponding variance is given by 2 Se M 2





2 1 n 1 n  2    yi  Yi     yi  a0  a1 xil1  a2 xil2  n i 1 n i 1

(108)

Then the variable

F2

nS    

2 e M 1

 nSe M 2 2

  

nSe M 2

e M 1

 e M 2



(109)

2

e M 2

follows F distribution with freedom and are given by



F e M 1  e M 2 ,e M 2



where

e M 1

and

e M 2

are the

e M 1  n  2

(110)

e M 2  n  3

(111)

We can judge the validity of the explanatory variable as

 

 F2  F   e M 2 , e M 2 e  M 1    F2  F e M 1  e M 2 , e M 2 

 

valid

(112) invalid

If both F2 are invalid, we use the model 1 and the process is end. If F2 is valid, we use two variables as the explanatory ones.

3. REGRESSION WITH MORE THAN TWO EXPLANATORY VARIABLES We move to an analysis where the explanatory variable number is bigger than two and we denote the number as m . The forms for the multiple regressions are basically derived in the previous section. We then neglect some parts of the derivation in this section.

Multiple Regression Analysis

141

3.1. Derivation of Regression Line We pick up n set data from the population, and want to predict the characteristics of the population. We can decide the sample factors a0 , a1 , a2 ,

, am from the sample data, which

is not the same as the ones for the population, and hence we denote them as aˆ0 , aˆ1 ,

,

and aˆm a. The sample data can be expressed by

y1  aˆ0  aˆ1 x11  aˆ2 x12   aˆm x1m  e1 y2  aˆ0  aˆ1 x21  aˆ2 x22   aˆm x2 m  e2 (113)

yn  aˆ0  aˆ1 xn1  aˆ2 xn 2 

where

 aˆm xnp  en

ei is a residual error. A sample regression value Yi is expressed by

Yi  aˆ0  aˆ1 xi1  aˆ2 xi 2 

 aˆm xim

(114)

ei can then be expressed by

ei  yi  Yi  yi   aˆ0  aˆ1 xi1  aˆ2 xi 2 

 aˆm xim 

(115)

 2

S The variance of ei is denoted as e , and is given by Se   2

1 n   yi   aˆ0  aˆ1 xi1  aˆ2 xi 2  n i 1 

We decide aˆ0 , aˆ1 , Partial differentiating them, we obtain

 aˆm xim 

2

(116)  2

S , and aˆm so that e

Se

2

has the minimum value.

with respect to aˆ0 , aˆ1 ,

, and aˆm and setting 0 for

Kunihiro Suzuki

142

 S11 2   2  S21    S  2  m1

 2 2 S1m   aˆ1   S1 y       2  2 S2 m   aˆ2   S2 y            2    aˆ p    2   Smm     S py 

S12  2

  S22 2

Sm 2 2

(117)

Therefore, we can obtain the coefficients as    aˆ1   S11  ˆ   2  a2    S21        aˆ p   Sm 21 2

 S 11 2  21 2 S      S m1 2 

S12 

S1m   2 S2 m     2  Smm 

2

2

S22  2

Sm 2 2

S S S

1

 S1y2   2   S2 y       S  2   py 

  S1y2     2  2m 2 S     S2 y      mm  2     2   S   S py 

12 2 

S

22 2 

m 2 2 

1m  2 

(118)

where  S 11 2  21 2 S      S m1 2 

S S S

12 2  22 2 

m 2 2 

After obtaining aˆ1 ,

  S11 2    2 2m 2 S     S21    mm 2     2 S   Sm1

S

1m  2 

S12  2

  S22 2

Sm 2 2

S1m   2 S2 m     2  Smm  2

1

(119)

, aˆm , we can decide aˆ0 as

aˆ0  y  aˆ1 x1  aˆ2 x2 

 aˆm xm

(120)

3.2. Accuracy of the Regression The accuracy of the regression can be evaluated similarly to the case of two explanatory variables by determinant factor given by

Multiple Regression Analysis R2 

Sr

143

2

S yy  2

(121)

 2 where Sr is the variance of Y given by

Sr   2

1 n 2 Yi  y   n i 1

(122)

The variance of y has the same expression as the one with two explanatory variables given by

S yy   Se   Sr 2

2

2

(123)

Therefore, the determination coefficient is reduced to R2 

S r

2

Se   S r 2

1

Se

2

2

S yy 

2

(124)

Consequently, the determination coefficient is the square of the correlation factor, and has a value range between 0 and 1. The multiple correlation factor rmult is defined as the correlation between yi and Yi , and is given by

 y n

rmult 

 y n

i 1



i

i 1

S S

i



 y Yi  y

y



  Y  y  2

n

i 1

2

i

 2 R  2 yy

We can modify the denominator of (31) as

(125)

Kunihiro Suzuki

144 Se   2

1 n 2   yi  Yi  n i 1

2 1 n   yi  y   Yi  y  n i 1  1 n 1 n 1 n 2 2    yi  y    Yi  y   2   yi  y Yi  y  n i 1 n i 1 n i 1 n 1 2  2  S yy  S R   2   yi  y Yi  y  n i 1



(126)

The cross term in Eq. (32) is performed as





1 n 1   yi  y Yi  y   2 S yy 2  SR 2  Se 2 n i 1 1 2 2 2 2   S yy   S R   S yy   S R     2 2  S R 





(127)

Then, the multiple correlation factor is also the same form with two explanatory variables given by S R  2

rmult 

S yy  S R  2

2

S R  2



S yy  2

(128)

2 The square of rmult is the determinant of the multi regression, and is denoted as R and is given by

SR 

R 

  S yy 2

  S yy  Se 2

2

2



  S yy 2

2

1

Se

2

  S yy 2

(129)

Substituting Eq. (34) into Eq. (35), we can express the determinant factor with multiple correlation factor as 2 R2  rmult

(130)

Multiple Regression Analysis

145

The variances used in Eq. (35) are related to the samples, and hence the determinant with unbiased variances are also defined as

n R*2  1 

e n

T

Se

2

 2

(131)

S yy

where

e  n   m  1

(132)

T  n  1

(133)

and

The validity of the regression can be evaluated with a parameter as

F

sr

se

2

(134)

2

where

sr  

nSr  m

se  

nSe  n   m  1

2

2

2

(135)

2

(136)

We also evaluate the critical F value of Fp  m, n   m  1  , and judge as   F  Fp  m, n   m  1   invalid    F  Fp  m, n   m  1   valid

(137)

Kunihiro Suzuki

146

3.3. Residual Error and Leverage Ratio We evaluate the accuracy of the regression. The residual error can be evaluated as

ek  yk  Yk

(138)

This is then normalized by the biased variance and is given by

zek 

ek se

(139)

2

se  is given by 2

where

se   2

n

e

Se

2

(140)

This follows a standard normal distribution with Further, it is modified as

tk 

N  0,12 

.

zek

(141)

1  hkk

This can be used for judging the outlier or not. We should consider the leverage ratio hkk for a multiple regression. The regression data can be expressed by leverage ratio as

Yk  hk1 y1  hk 2 y2 

 hkk yk 

On the other hand, Yk is given by

 hkn yn

(142)

Multiple Regression Analysis

Yk  aˆ0  aˆ1 xk1  aˆ2 xk 2   y   aˆ1 x1  aˆ2 x2 

 aˆm xkm

 aˆm xm   aˆ1 xk1  aˆ2 xk 2 

 y  aˆ1  xk1  x1   aˆ2  xk 2  x2    y   xk1  x1

147

 aˆm xkm

 aˆm  xkm  xm 

 aˆ1  ˆ  a xkm  xm   2       aˆm 

xk 2  x2

(143)

Each term is evaluated below.

1 n y   yk n i 1

 xk1  x1

  xk1  x1

  xk1  x1

  xk1  x1

  xk1  x1

xk 2  x2

xk 2  x2

xk 2  x2

xk 2  x2

xk 2  x2

(144)

xkm

 aˆ1  ˆ  a  xm   2       aˆm 

 S11 2   2 S xkm  xm   21   S  2  m1  S 11 2  21 2 S   xkm  xm     S m1 2   S 11 2  21 2 S   xkm  xm     S m1 2 

 S 11 2  21 2 S   xkm  xm     S m1 2 

S12  2

 2

S22

Sm 2 2

S S S

S S S

S S S

12 2  22 2 

m 2 2 

12 2  22 2 

m 2 2 

12 2  22 2 

m 2 2 

S1m   2 S2 m     2  Smm  2

1

 S1y2   2   S2 y       S  2   py 

  S1y2     2    S2 y  S      2 mm 2 S     S py  1 n     xi1  x1  yi  y   n 1m 2  S     i 1  1 n  2 m 2     xi 2  x2  yi  y   S   n i 1    mm  2     S  1 n   n   xim  xm  yi  y    i 1  1 n     xi1  x1 yi  n 1m  2  i  1   S  1 n  2 m 2     xi 2  x2 yi  S n   i 1    mm 2  S     1 n   x  x y   m i   n  im  i 1 

S

1m  2 

2 m 2

(145)

Kunihiro Suzuki

148

Focusing on the k -th term, we obtain hkk 

1 Dk2  n n

(146)

where

Dk2   xk1  x1

 S 11 2  21 2 S   xkm  xm     S m1 2 

xk 2  x2

S S S

12  2  22  2 

m 2 2 

1m  2 

 x  x    k1 1    xk 2  x2  S     mm 2 S     xkm  xm 

S

2 m 2

(147)

D2

This k is called as square of Mahalanobis’ distance. Sum of the leverage ratio is given by n 1 n Dk2 h     kk k 1 k 1 n k 1 n n

1

1 n p p  xki  xi   xkj  x j  S ij  2  n k 1 i 1 j 1

 n  p p    xki  xi   xkj  x j    S ij  2  1    k 1 n  i 1 i 1    p

p

 1   Sij  S 2

(148)

ij  2 

i 1 i 1

Since we obtain

S  S    2

ij

ij 2

1  0    0

Therefore, we obtain

0

0

0   0  1

(149)

Multiple Regression Analysis n

h k 1

kk

m

m

 1   Sij  S 2

149

ij  2 

(150)

i 1 j 1

1 m

The average leverage ratio is then given by

hkk 

m 1 n

(151)

Therefore, we use a critical value for hkk as

hkk    hkk

(152)

  2 is frequently used, and evaluate whether it is outlier or not.

3.4. Prediction of Coefficients of a Regression Line for Multi Variables

aˆ aˆ aˆ 2

We can simply extend the range of the variables of 0 , 1 , variables forms. We assume the pollution set has a regression line is given by

yi  a0  a1 xi1  a2 xi 2  x ,x , where  i1 i 2

, xim 

values for population.

ei

 am xim  ei

for i  1,2, , n

are the given variables,

, aˆm from the two

(153)

a0 a1 a2 ,

,

,

,

, am are established

is independent on each other and follows a normal distribution

0,     e with . The probability variable in Eq. (57) is then only . 2

i

Extending the forms for two variables, we can evaluate the averages and variables of the factors as the followings. The averages of the factors are given by E  aˆk   ak

for k  0,1, 2,

,m

The variances of the factors are given by

(154)

Kunihiro Suzuki

150 V  aˆk  

S

kk  2

n

se

2

for k  1, 2, , m

  1 V  aˆ0   1   x1 n  

(155)

 S 11 2  21 2 S   xm     S m1 2 

x2

S S S

12 2  22 2 

m 2 2 

  x   1  2 m 2     x2    2 S     se    mm 2 S     xm  

S

1m  2 

(156)

The covariance between aˆ j and aˆl are given by Cov  aˆ j , aˆl  

1 jl  2  2 S se n

for j , l  1, 2,

,m

(157)

aˆ ˆ The covariance between a0 and j are given by

Cov  aˆ0 , aˆ j   



1 j1 2 S n

S

j 2 2 

a a a2

We assume that 0 , 1 , that the normalized variables

t

aˆi  ai

V  ai 

,

S

jm 2 



 x1     x2  s  2   e    xm 

(158)

, am follow normal distributions, and hence assume

for i  0,1, 2 (159)

follows a t distribution function with a freedom of e , where e  n   m  1

(160)

Therefore, the predictive range is given by

aˆi  t p e ; P V  aˆi   ai  aˆi  t p e ; P V aˆi 

(161)

Multiple Regression Analysis

151

3.5. Estimation of a Regression Line

x , x , We assume that we obtain n set of data i1 i 2 regression value is given by Y  aˆ0  aˆ1 x1  aˆ2 x2 

, xim , yi 

. The corresponding

 aˆm xm

(162)

The expected value is given by E Y   E  aˆ0   E  aˆ1  x1  E  aˆ2  x2   a0  a1 x1  a2 x2 

 E  aˆm  xm

 am xm

(163)

The variance of Y is given by V Y   V  aˆ0  aˆ1 x1  aˆ2 x2    1  1   x1  x1 n   1 2  1  D 2  se  n

x2  x2

 aˆm xm   S 11 2  21 2 S   xm  xm     S m1 2 

S S S

12 2  22 2 

m 2 2 

  x  x  1  1    x2  x2    2 S s   e    mm 2 S     xm  xm  

S

1m  2 

2 m 2

(164)

where

D 2   x1  x1

x2  x2

 S 11 2  21 2 S   xm  xm     S m1 2 

S S S

12 2  22 2 

m 2 2 

1m  2 

 x  x  1  1    x2  x2  S     mm 2 S     xm  xm 

S

2 m 2

(165)

We can then estimate the value range of the regression. The regression range is given by

aˆ0  aˆ1 x1  aˆ2 x2 

 1 D 2   2  aˆm xm  t p e ; P     se n  n

(166)

Y  aˆ0  aˆ1 x1  aˆ2 x2 

 1 D 2   2  aˆm xm  t p e ; P     se n  n

Kunihiro Suzuki

152

The objective variable range is given by y  aˆ0  aˆ1 x1  aˆ2 x2 

 aˆm xm  e

(167)

and the corresponding range is given by

aˆ0  aˆ1 x1  aˆ2 x2 

 1 D 2   2  aˆm xm  t e , P  1    se n n  

 y

(168)

aˆ0  aˆ1 x1  aˆ2 x2 

 1 D 2   2  aˆm xm  t e , P  1    se n n  

3.6. The Relationship between Multiple Coefficient and Correlation Factors We study the relationship between multiple factors and correlation factors. We focus on the two variables x1 and x2 for simplicity, and also assume that the variables are normalized. The factors a1 , a2 are related to the correlation factors as 1   r12

r12  a1   r1       1  a2   r2 

(169)

Therefore, the factors are solved as  a1   1 r12      a2   r12 1  1   1 r 2 12   r12  2  1  r12

1

 r1  r12 r2   1 r 2  12    r2  r12 r1   2   1  r12 

 r1     r2  r12  1  r12 2   r1    r2  1  1  r12 2 



(170)

Multiple Regression Analysis

153

We then obtain 1  r12 a1  r1

(171)

1  r12 2

1  r12 a2  r2

r2 r1

r1 r2

(172)

1  r12 2

We assume that r1  r2 which do not induce loosing generality since the form is symmetrical. If there is no correlation between two variables, that is, r12  0 , we obtain a1  r1

(173)

a2  r2

(174)

The factors are equal to the corresponding correlation factors as are expected.

2 r1 = 0.8

a1, a2

r2 = 0.5

1

a1

0

a2

-1 -2

0

0.5 r12

1

Figure 1. Relationship between multiple factor and the correlation factors.

Figure 1 shows the relationship between the coefficients and correlation factors with respect to the correlation factor between two variables.

Kunihiro Suzuki

154

The variable a1 with large r1 decreases a little with increasing r12 , and increases significantly for larger r12 . On the other hand, the variable a2 with small r2 decreases little with decreasing r12 , and decreases significantly for larger r12 . Differentiating a1 with respect to r12 , we obtain

a1  r1 r12 



 r2 r  1  r12 2   1  r12 2   2r12   r1 r1  

1  r 

2 2

12

r2

1  r 

2 2

12

(175)

 2  r1  r12  2 r12  1 r2  

Setting 0 for Eq. (175), we obtain r122  2

r1 r12  1  0 r2

(176)

Solving this equation we obtain 2

r12 

r  r1   1  1 r2  r2 

(177)

Since r12 is smaller than 1, we select the root with negative sign root before the root, and then it is reduced to 2

r  r r12  1   1   1 r2  r2  1  2 r  r1   1  1 r2  r2 

(178)

a1 has the minimum value for the above r12 , and then increases with further increasing.

On the other hand, differentiating a2 with respect to r12 , we obtain

Multiple Regression Analysis r122  2

r2 r12  1  0 r1

155 (179)

Solving this with respect to r12 , we obtain 2

r  r r12  2   2   1 r1  r1 

(180)

Since r2  r1 , the term in the root become negative, and hence, we have no root. This means that the factor decreases monotonically as is expected from the figure. Inspecting the above, we can expect multiple factors not so far from the correlation factor within this minimum r12 . In this region, we can roughly image that the both factors are smaller than the independent correlation factors due to the interaction between the explanatory variables. However, they are significantly different from the ones with exceeding the minimum value of r12 , and the feature of the multiple factor is complicated. We cannot have a clear image for the value of the factors under this strong interaction between explanatory variables. It hence should be minded that we should use independent variables for the explanatory as possible as we can.

3.7. Partial Correlation Factor We can extend the partial correlation factor which is discussed in the previous chapter to the multiple variables. We consider an objective variable y and m kinds of explanatory variables

x1 , x2 ,

, xm .

Among the explanatory variables, we consider the correlation between y and x1 . In this case, both y and x1 have relationship between the explanatory variables. Neglecting a variable x1 , we can evaluate the regression relationship of y as

Yk  c0  c2 xk 2  c3 xk 3  where

 cm xkm

(181)

Kunihiro Suzuki

156    c2   S22     2  c3    S32       2  cm   Sm 2 2

  S23 2

  S33 2

Sm 3 2

We can decide c2 , c3 ,

S2 m   2 S3 m     2  Smm  2

1

 S2 2y   2   S3 y       S  2   my 

, cm using Eq. (182). After obtaining c2 , c3 ,

(182) , cm , we can

decide c0 from the equation below. y  c0  c2 x2  c3 x3 

 cm xm

(183)

On the other hand, we regard x1 as a subject variable, and it is expressed by

X k1  d0  d2 xk 2  d3 xk 3 

 dm xkm

(184)

where    d 2   S22     2  d3    S32        d p   Sm 22 2

d2 , d3 ,

  S23 2

S33  2

Sm 3 2

S2 m   2 S3 m     2  Smm  2

1

 2   S21   2   S31       S  2   m1 

(185)

, dm can be decided from the equation. d 0 can be evaluate as

x1  d0  d2 x2  d3 x3 

 dm xm

(186)

We then introduce the following two variables given by

uk  yk  Yk

(187)

vk1  xk1  X k1

(188)

These variables can be regarded as ones eliminating the influence of the other explanatory variables. We can then evaluate the followings.

Multiple Regression Analysis

u

v1 

1 n  uk n k 1

(189)

1 n  vk1 n k 1

(190)

1 n 2  uk  u   n k 1

(191)

1 n 2    vk1  v1  n k 1

(192)

Suu 2 

 2

Sv1v1

157

Suv 1  2

1 n  uk  u  vk1  v1  n k 1

Therefore, the corresponding partial correlation factor r1 y , 2,3,

r1 y , 2,3,

,m



(193) ,m

can be evaluated as

Suv 21 Sv12v1 Suu 2

(194)

We can evaluate the partial correlation factors for other explanatory variables.

3.8. Selection of Explanatory Variable We used many explanatory variables. However, some of them may not contribute to express the subject variable. Furthermore, if the interaction between explanatory variables is significant, the determinant of the matrix becomes close to zero, and the related elements in the inverse matrix become unstable or we cannot obtain results with the related numerical error, which is called as multicollinearity. One difficulty exists that the determinant factor increases with increasing number of explanatory variables. Therefore, we cannot use the determinant factor to evaluate the validity for the regression.

Kunihiro Suzuki

158

We start with a regression without any explanatory variable, and denote it as model 0. The regression is given by Model 0 : Yi  y

(195) Se M 0  2

The corresponding variance Se M 0  2

is given by

1 n 1 n 2 2 y  Y     yi  y   S yy 2   i i n i 1 n i 1

(196)

x In the next step, we evaluate the validity of x1 , x2 , and m , and denote the model as model 1.

The regression using explanatory variable is given by xl

Yi  a0  a1 xil1

(197) Se M 1 2

The corresponding variance Se M 1  2

is given by





2 1 n 1 n 2  yi  Yi     yi  a0  a1 xil1   n i 1 n i 1

(198)

Then the variable nS    F 

2 e M 0

1

 nSe M 1 2

  

2 nSe M 1

e M 0

 e M 1



(199)

e M 1

follows F distribution with freedom and are given by



F e M 0  e M 1 ,e M 1

 where 

e M 0 

and

e M 1

are the

e M 0  n  1

(200)

e M 1  n  2

(201)

Multiple Regression Analysis

159

We can judge the validity of the explanatory variable as

 

 F1  F   e M 1 , e M 1 e M 0     F1  F e M 0  e M 1 , e M 1 

 

valid

(202) invalid

We evaluate F1 for x1 and x2 , that is, l1  1,2, , m , and evaluate the corresponding F1 .

If both F1 are invalid, we use the model 0 and the process is end. If at least one of the F1 is valid, we select the maximum one, and the model 1 regression is determined as

Yi  a0  a1 xil1

(203)

Where l1 correspond to the selected variable in the above step. We then evaluate the second variable denote it as model 2. The second variables are residual ones where the first one is selected in the above step. The corresponding regression is given by

Yi  a0  a1 xil1  a2 xil2

(204)

The related variance is given by Se M 2  2





2 1 n 1 n 2  yi  Yi     yi  a0  a1 xil1  a2 xil2   n i 1 n i 1

(205)

Then the variable

F2 

 nS  

2 e M 1

 nSe M 2 2

nS

  

 2 e M 2

e M 1

 e M 2



(206)

e M 2

follows F distribution with freedom and are given by



F e M 1  e M 2 ,e M 2



where

e M 1

and

e M 2

are the

Kunihiro Suzuki

160

e M 1  n  2

(207)

e M 2  n  3

(208)

We can judge the validity of the explanatory variable as

 

 F2  F   e M 2 , e M 2 e  M 1    F2  F e M 1  e M 2 , e M 2 

 

valid

(209) invalid

If both F2 are invalid, we use the model 1 and the process is end. If at least one of the F2 is valid, we select the maximum one, and the model 2 regression is determined as

Yi  a0  a1 xil1  a2 xil2

(210)

where l 2 correspond to the selected variable in the above step. We repeat the process until all F are invalid, or check all variables.

4. MATRIX EXPRESSION FOR MULTIPLE REGRESSION The multiple regression can be expressed with a matrix form. This gives no new information. However, the expression is simple and one can use the method alternatively. The objective and p kinds of explanatory variables are related as follows.

yi  a0  a1  xi1  x1   a2  xi 2  x2  

 a p  xip  x p   ei

(211)

where we have n samples. Therefore, all data are related as  y1  a0  a1  x11  x1   a2  x12  x2      y2  a0  a1  x21  x1   a2  x22  x2        yn  a0  a1  xn1  x1   a2  xn 2  x2  

 a p  x1 p  x p   e1  a p  x2 p  x p   e2  a p  xnp  x p   e1

(212)

Multiple Regression Analysis

161

We introduce matrixes and vectors as  y1    y Y  2      yn 

(213)

1 x11  x1  1 x21  x1 X    1 xn1  x1

x12  x2 x22  x2 xn 2  x2

x1 p  x p   x2 p  x p    xnp  x p 

(214)

 a0    a1 β       ap 

(215)

 e1    e e 2      en 

(216)

Using the matrixes and vectors, the corresponding equation is expressed by Y  Xβ  e

(217)

The total sum of the square of error Q is then expressed by Q  eT e  Y  Xβ  Y  Xβ  T

 Y T  βT X T  Y  Xβ 

(218)

 Y T Y  Y T Xβ  βT X T Y  βT X T Xβ  Y T Y  2βT X T Y  βT X T Xβ

We obtain the optimum value by minimizing Q . This can be done by vector differentiating (see Appendix 1-10 of volume 3) with respect to β it given by

Kunihiro Suzuki

162

Q   T T  2  βT X T Y    β X Xβ  β β β  2 X T Y  2 X T Xβ 0

(219)

Therefore, we obtain β   X T X  X TY 1

(220)

SUMMARY To summarize the results in this chapter— We assume that the regression is expressed with m explanatory variables given by Y  a0  a1 X1  a2 X 2 

 am X m

The coefficients are given by  2

 aˆ1   S11  ˆ    2  a2    S21       2  aˆ p   Sm 1

S12  2

S22  2

Sm 2 2

S1m   2 S2 m     2  Smm  2

1

 S1y2   2   S2 y       S  2   py 

The accuracy of the regression can be evaluated with a determinant factor given by R2 

Sr

2

S yy  2

  S yy 2

where

and

2 Sr 

is the variance of Y given by

S yy  

1 n 2  yi  y   n i 1

Sr  

1 n 2 Yi  y   n i 1

2

2

Multiple Regression Analysis The validity of the regression can be evaluated with a parameter F given by

F

sr

se

2 2

where sr  

nSr  m

se  

nSe  n   m  1

2

2

2

2

 2

The variance Se Se   2

is given by

1 n 2  yi  Yi   n i 1

We also evaluate the critical

F

value of Fp  m, n   m  1  , and judge as

 F  Fp  m, n   m  1   invalid   F  Fp  m, n   m  1   valid

The residual error for each data can be evaluated as tk 

zek 1  hkk

where zek 

ek

hkk 

1 Dk2  n n

se

2

163

Kunihiro Suzuki

164 and

Dk2   xk1  x1

This

xk 2  x2

 S 11 2  21 2 S   xkm  xm     S m1 2 

S S S

12  2  22  2 

m 2 2 

1m  2 

 x  x    k1 1  2m 2 S     xk 2  x2      mm 2 S     xkm  xm 

S

Dk2 is called as square of Mahalanobis’ distance.

The error range of regression line are given by

aˆ0  aˆ1 x1  aˆ2 x2 

 1 D 2   2  aˆm xm  t p e ; P     se n  n

Y  aˆ0  aˆ1 x1  aˆ2 x2 

 1 D 2   2  aˆm xm  t p e ; P     se n  n

The error range of objective values are given by

aˆ0  aˆ1 x1  aˆ2 x2 

 1 D 2   2  aˆm xm  t e , P  1    se n n  

 y aˆ0  aˆ1 x1  aˆ2 x2 

 1 D 2   2  aˆm xm  t e , P  1    se n n  

The partial correlation factor, which is the pure correlation factor between factor 1 and r the objective variable y, 1 y , 2,3, , m  can be evaluated as

r1 y , 2,3,

,m



Suv 21 Sv12v1 Suu 2

where the variables u , v is the explanatory and objective variable eliminating the influence to the other variables given by

Multiple Regression Analysis

165

uk  yk  Yk vk1  xk1  X k1 Yk

and X k are regression variables and they are expressed by the other explanatory variables given by

Yk  c0  c2 xk 2  c3 xk 3 

 cm xkm

X k1  d0  d2 xk 2  d3 xk 3 

 dm xkm

The corresponding coefficients are determined by the standard regression analysis and is given by    c2   S22     2  c3    S32       2  cm   Sm 2 2

  S23 2

  S33 2

Sm 3 2

S2 m   2 S3 m     2  Smm  2

1

 S2 2y   2   S3 y       S  2   my 

We can decide c0 as y  c0  c2 x2  c3 x3     d 2   S22     2  d3    S32        d p   Sm 22 2

  S23 2

S33  2

Sm 3 2

 cm xm

S2 m   2 S3 m     2  Smm  2

1

 2   S21   2   S31       S  2   m1 

We can decide d 0 as x1  d0  d2 x2  d3 x3 

 dm xm

We start with the regression without explanatory variable, and denote it as model 0. The regression is given by

Kunihiro Suzuki

166 Model 0 : Yi  y Se M 0  2

The corresponding variance Se M 0  2

is given by

1 n 1 n 2 2  yi  Yi     yi  y   S yy 2  n i 1 n i 1

In the next step, we pick up one explanatory variable and denote the model as model 1. The regression using explanatory variable is given by xl

Yi  a0  a1 xil1 Se M 1 2

The corresponding variance Se M 1  2

is given by





2 1 n 1 n  2 y  Y  yi  a0  a1 xil1      i i  n i 1 n i 1

Then the variable

nS    F 

2 e M 0

1

 nSe M 1 2

  

2 nSe M 1

e M 0

 e M 1



e M 1

follows F distribution with freedom and are given by



F e M 0  e M 1 ,e M 1



where

e M 0  n  1 e M 1  n  2 We can judge the validity of the explanatory variable as

e M 0

and

e M 1

are the

Multiple Regression Analysis

 

 F1  F   e M 1 , e M 1 e M 0     F1  F e M 0  e M 1 , e M 1 

 

167

valid invalid

We select the variable that hold above evaluation and the maximum one. We repeat this process as follows. We validate the explanatory variables.

Fi 1

nS     nS         2 e Mi

2 e Mi 1

2 nSe Mi 1

e Mi 

 e Mi 1



e MI 1

follows F distribution with freedom and are given by



F e Mi   e Mi 1 ,e Mi 1

 where 

e  Mi 

and

e Mi 1

e Mi   n  i  1

e Mi 1  n  i  2 We select the variable that has the maximum value holding the equations. The evaluation is all invalid, the selection of the variable finish. The multiple regression can be expressed using a matrix form as β   X T X  X TY 1

where

yi  a0  a1  xi1  x1   a2  xi 2  x2  

 a p  xip  x p   ei

The matrixes and vectors are defined as  y1    y Y  2      yn 

are the

Kunihiro Suzuki

168 1 x11  x1  1 x21  x1 X    1 xn1  x1  a0    a1 β       ap   e1    e e 2      en 

x12  x2 x22  x2 xn 2  x2

x1 p  x p   x2 p  x p    xnp  x p 

Chapter 5

STRUCTURAL EQUATION MODELING ABSTRACT Latent variables are introduced and the relationship between the variables and obtained variable data are related quantitatively in structural equation modeling (SEM). Therefore, SEM enables us to draw a systematic total relationship structure of the subject. We start with k kinds of observed variables, and propose a path diagram which shows the relationship of the observed variables and their cause or results introducing latent variables. We evaluate the variances and covariances using the observed variable data, and they are also expressed with parameters related to the latent variables. We then determine the parameters so that the deviation between two kinds of expressions for variances and covariances is the minimum.

Keywords: structural equation modeling, observed variable, latent variable, path diagram

1. INTRODUCTION Structural equation modeling (SEM) shows a cause of the obtained data introducing variables which depends on an analysist’s image. Therefore, it is vital to search the cause of the obtained data, and SEM is intensively investigated these days. It is developed in psychology field, where a doctor must search the origin of the action, observation, or medical examination of the patient. This technology is also applied to marketing field and the accommodating fields are extended to more and more. On the other hand, this technology depends on the analysists’ image, and we have not unique results with this technology. We need to understand the meaning of the analytical process although it is quite easy to handle.

170

Kunihiro Suzuki

In this chapter, we focus on the basis of the technology. The application of it should be referred with the other books.

2. PATH DIAGRAM A path diagram is used in a structural equation modeling. Figure 1 shows the elements of figure used in a path diagram: Correlation, arrow from cause and results, observed variable, latent variable, and error. The data structure is described using these elements.

Figure 1. Figure elements used in a path diagram.

Figure 2. Example of a path diagram.

Structural Equation Modeling

171

3. BASIC EQUATION We form a path diagram using the elements as shown in Figure 2. We investigate the corresponding basic equation. We have the following equations from the path diagram, which are given by. x1i   fi  e1i

(1)

x2i   fi  e2i

(2)

x3i   gi  e3i

(3)

x4i   gi  e4i

(4)

gi   fi  egi

(5)

where i is the data ID and we assume n data, that is, i  1,2, , n . The variance for a variable x1i is defined S11   2

1 n 2  x1i  x1   n i 1

(6)

We can evaluate this variance using the observed data. This variance can also be expressed with the latent variable as 1 n 2   x1i  x1  n i 1 1 n 2    fi  e1i  x1  n i 1

S11   2

(7)

Further, we obtain from Eq. (1) as 1 n  x1i n i 1 1 n 1 n    fi   e1i n i 1 n i 1

x1 

  f  e1

(8)

Kunihiro Suzuki

172 where f 

e1 

1 n  fi n i 1

(9)

1 n  e1i n i 1

(10)

Substituting these into Eq. (7), we obtain 1 n 2  fi  e1i  x1   n i 1 2 1 n    fi  f   e1i  e1   n i 1 2 1 n 1 n   2  fi  f  2  fi  f n i 1 n i 1

S11   2











 e

1i

 e1  

1 n 2  e1i  e1   n i 1

  2 S ff   Se1  2

2

(11)

We assume that the correlation between latent variable and error is zero. We can obtain other variables with similar way as   S22   2 S ff   Se2 

(12)

  S33    2 S gg  Se3 

(13)

    S44   2 S gg  Se4 

(14)

2

2

2

2

2

2

2

2

2

  S gg   2 S ff   Seg  2

2

2

(15)

Latent variable g can be expressed with the other latent variable f . Therefore, we  2

 2

 2 eliminate S gg from S33 and S44 as

Structural Equation Modeling     S33   2 S gg  Se3  2

2

173

2





  2  2 S ff   Seg   Se3  2

2

2

  2 2 S ff    2 Seg   Se3  2

2

    S44   2 S gg  Se4  2

2

2

(16)

2





  2  2 S ff   Seg   Se4  2

2

2

  2 2 S ff    2 Seg   Se4  2

2

2

(17)

Next, we consider covariances as shown below. 1 n   x1i  x1  x2i  x2  n i 1 1 n    fi  e1i  x1   fi  e2i  x2  n i 1 1 n     fi  f   e1i  e1     fi  f   e2i  e 2  n i 1 2 1 n    fi  f n i 1

S12   2













  S ff  2

(18)

1 n   x1i  x1  x3i  x3  n i 1 1 n    fi  e1i  x1   gi  e3i  x3  n i 1 1 n    fi  f   e1i  e1     gi  g    e3i  e3   n i 1

S13   2

 







 g

1 n  fi  f n i 1

i

 g

(19)

Substituting gi of Eq. (5) into Eq. (19), we further obtain 2







 

1 n  fi  f  gi  g  n i 1 1 n    fi  f  fi  f   egi  eg   n i 1

S13   

  S ff 



2

(20)

Kunihiro Suzuki

174

1 n   x1i  x1  x4i  x4  n i 1 1 n    fi  e1i  x1  gi  e4i  x4  n i 1 1 n    fi  f   e1i  e1     gi  g    e4i  e4   n i 1 1 n    fi  f  gi  g  n i 1

S14   2









  S ff  2

(21)

1 n   x2i  x2  x3i  x3  n i 1 1 n     fi  e2i  x2   gi  e3i  x3  n i 1 1 n     fi  f   e2i  e2     gi  g    e3i  e3  n i 1 1 n    fi  f  fi  f   egi  eg  n i 1

  S23  2







 



  S ff  2

(22)

1 n   x2i  x2  x4i  x4  n i 1 1 n     fi  e2i  x2  gi  e4i  x4  n i 1 1 n     fi  f   e2i  e2     gi  g    e4i  e4   n i 1 1 n    fi  f  gi  g  n i 1 1 n    fi  f  fi  f   egi  eg   n i 1

  S24  2

  S ff 











 



2

(23)

Structural Equation Modeling

175

1 n   x3i  x3  x4i  x4  n i 1 1 n     gi  e3i  x3  gi  e4i  x4  n i 1 1 n     gi  g    e3i  e3     gi  g    e4i  e4   n i 1 1 n     gi  g  gi  g  n i 1

S34   2

    S gg 2



   2 S ff   Seg  2

2



  S ff    Seg  2

2

(24)

We can evaluate all Sij

2

using the observed data, and they are also expressed with

latent variables and related to parameters. The deviation between the observed one and theoretical one is denoted as Q and is expressed with

    S      S    S      2  S     S     2  S     S            S     S   S    S      2  S      S     S                   S    S   S  S     2

2 2 2 2 2 2 2 2 2 Q   S11    2 S ff   Se1    2  S12    S ff    2  S13    S ff    2  S14    S ff     2

2

2 22

2

2 33

2

2

2 ff

2

2 eg

2 e3

2 44

2

2

2 ff

2

2 eg

2 e4

2 ff

2 e2

2 23

2 ff

2

2

2

2 34

2 24

2 ff

2 ff

2 eg

2

2

2

2

(25) We impose that the parameters should be determined so that Q is the minimum. The parameters we should decide are as follows.  ,  ,  ,  ,  , Se 2 , Se 2 , Se 2 , Se 2 , Se 2 , S ff2 1

2

3

4

g

(26)

We express the parameters as symbolically as  . Partial differentiate Q with respect to  , we impose that it is zero. Q 0 

(27)

Kunihiro Suzuki

176

Therefore, we obtain 11 equations for 11 parameters. The solution for the parameters can be evaluated simply as below. We assume that the kinds of observed variable as k . In this case, we have x1 , x2 , x3 , x4 ,and k is 4. We have variances with the kinds of

1 k  k  1 2

(28)

In this case we have variables of number of

1 1 k  k  1   4  5  10 2 2

(29)

We express the number of parameters as p . The following should be held to decide parameters uniquely.

1 p  k  k  1 2

(30)

In this case, p  11 , and hence the condition is invalid, and we cannot decide parameters in the above case. When the imposed condition is not valid, we have two ways. One way is to decide the parameter values so that the imposed condition becomes valid, and decide the other parameters. The other one is to change the model.

SUMMARY To summarize the results in this chapter— We evaluate the variances and covariances of k kinds of observed data. That is, we obtain Sij  based on the observed data. 2

We also evaluate the Sij

2

including parameters assumed in the path diagram. We

denote them as aij . We then evaluate a parameter associated with the deviation as

Structural Equation Modeling



Q   Sij   aij i, j

2



177

2

Denoting parameters included in aij as  , we impose that Q 0 

We have number of n   equations which is the number of the parameters in the assumed path diagram. We set the number of observed variables as k . The followings condition must be held to decide parameters.

1 n    k  k  1 2 When the imposed condition is not valid, we have two ways. One way is to decide the parameter values so that the imposed condition becomes valid, and decide the other parameters. The other one is to change the model.

Chapter 6

FUNDAMENTALS OF A BAYES’ THEOREM ABSTRACT We assume that an event is related to some causes. When we obtain some event, the Bayes’ theorem enables us to specify the probability of the cause of the event. The related cause must be a certain one, and hence the related probability must be 1. However, we do not know which is the real cause, and predict it based on the obtained data. We can make the value of the probability close to 1 or 0 using Bayesian updating based on the obtained data. Many evens are related to each other in many cases. The events are expressed by nodes, and the relationship is expressed by a network connected by lines with arrows, which is called as a Bayesian network. All nodes have their conditional probabilities. A relationship between any two nodes in the network can be evaluated quantitatively using the conditional probabilities.

Keywords: Bayes’ theorem, conditional probability, Bayesian updating, priori probability, posterior probability, Bayesian network, parent node, child node

1. INTRODUCTION A Bayes’ theorem is based on the conditional probability, which is well established in standard statistics. Therefore, the Bayes’ theorem is not so different from the standard statistics in that standpoint of view. In the standard statistics, we basically obtain data associated with causes and predict results based on the data. The Bayes’ theorem discusses the subjects in the opposite direction with incomplete data set. We obtain the results and predict the related cause in the Bayes’ theorem. Further, even when the prediction is not so accurate at the beginning, it is improved by updating related data. We can get vast data and accumulate them easily these days. Therefore, the Bayesian theory utilizing updating data

Kunihiro Suzuki

180

plays an important role in these days. We further treat the case where many events are related to each other, which can be analyzed with a Bayesian network.

2. A BAYES’ THEOREM Let us consider an example of a medical check as shown in Figure 1. When we check a sick people, we obtain a result of positive reaction with a ratio of 98%. The event of the positive reaction is denoted as E , and the corresponding probability is denoted as P  E   0.98

(1)

Note that we consider only sick people. The high percentage is good itself, but it is not enough for the medical check. It is important that it does not react to non-sick people. Therefore, we assume the probability where the non-sick people react with the medical check as P  E   0.05

Figure 1. Reaction ratio of sick and non-sick people.

(2)

Fundamentals of a Bayes’ Theorem

181

The sick people are denoted as A and the non-sick people as B . The probability of 98% of Eq. (1) is the one among the sick people, and hence the notation should be modified to express the situation clearly. We then denote Eq. (1) as

P  E A  0.98

(3)

The right side of A in Eq. (3) expresses the population for the probability where event E occurs. P EB The probability for the non-sick people is then denoted as   and is expressed as

P  E B   0.05 This expresses that the event

(4) E

occurs for the non-sick people

The medical check is required to be high for Therefore, this example is an expected one.

P  E A

B

.

and to be low for

P E B

.

We further need a ratio of the sick people to the total one, which is denoted as P  A , and assume it to be 5% here. Therefore, we have P  A  0.05

(5)

There are only sick or non-sick people, and hence the ratio of the non-sick people is given by P  B   1  P  A  0.95

(6)

Strictly speaking, P  A means the probability that even A occurs among the whole events. The whole events are A and A in this case. We denote the whole case as  . Therefore, P  A should be expressed with

P  A  P  A  

(7)

We frequently ignore this  . This is implicit assumption and simplifies the expression. We then have all information associated with this subject. The ratio that people obtain positive reaction in the medical check is given by

Kunihiro Suzuki

182

P  E   P  E A P  A  P  E B  P  B   0.98  0.05  0.05  0.95  0.0965

(8)

Note that this total ratio is as low as about 10%. However, this ratio is not so interesting. This simply means that the sick people ratio is low. We study the probability that the Bayes’ theorem treats. We select one person from a set, and perform the medical check, and obtain the positive reaction. What is the probability that the person is really sick? The probability is denoted as   . The corresponding situation is shown in Figure 2. The probability is the ratio of the positively reacted people in A to the total reacted people, and we do not care about the non-reacted people. We discuss the situation under where we obtain the positive reaction. The ratio of the positively reacted people in A is given by P AE

P  E A P  A 

(9)

The ratio of the positively reacted people in B is given by

P  E B P  B

(10)

Therefore, the target probability is evaluated as P A E 

P  E A P  A

P  E A P  A  P  E B  P  B 

0.98  0.05 0.98  0.05  0.05  0.95  0.51



(11)

This is called as Bayes’ probability, which expresses the probability of the cause for obtained result. This may be rather lower value than we expect since P  E A  0.98 . The appreciation of the result is shown below. The positively reacted people constitution is schematically shown in Figure 2. The evaluated ratio is the reacted sick people to the total reacted people. Although the reaction ratio for the non-sick people is low, the number of the non-sick people is much larger than the sick people. Therefore, the number of the reacted people among non-sick people is comparable with the number of the reacted people among the sick people. Consequently, we have a rather low value ratio.

Fundamentals of a Bayes’ Theorem

183

Figure 2. Reaction ratio of sick and non-sick people.

We can easily extend this theory to a general case where the causes associated with the obtained event are many.

m kinds of subjects are denoted as Ai  i  1, 2, , m . When we have an event

E

, the

probability that the cause of the event Ai is expressed by P  Ai E    

P  E Ai  P  Ai  PE

P  E Ai  P  Ai 

P  E A1  P  A1   P  E A2  P  A2  

 P  E Am  P  Am 

P  E Ai  P  Ai 

 PE A P A  m

j 1

j

j

(12)

This is called as a Bayes’ theorem.

3. SIMILAR FAMOUS EXAMPLES There are other two famous examples of the Bayes’ theorem, which we introduce here briefly.

Kunihiro Suzuki

184

3.1. Three Prisoners There are three prisoners, A , B , and C . One of them is decided to be pardoned although the selection is at random and we do not know who he is beforehand. After the decision, the jailer knows who is pardoned. The prisoner A asks the jailer to notice the unpardoned one among the prisoners B and C . The jailer answers to the prisoner A

that the prisoner B is unpardoned.

The prisoner A thought that the probability that he is pardoned was 1 3 before he heard the jailer’s answer. The prisoner A then heard that the prisoner B is unpardoned and hence the prisoners A or C is pardoned. Therefore, the prisoner A thought that the probability that he (the prisoner A ) is pardoned is increased from 1 3 to 1 2 . In this problem, we assume that the jailer is honest and tells a prisoner name B or C with the same probability of 1 2 when the prisoner A is pardoned. We study whether the above discussion is valid or not. The events are denoted as follows. A: The prisoner A is pardoned. B: The prisoner B is pardoned. C: The prisoner C is pardoned. S A : The jailer talks that the prisoner A is not pardoned. S B : The jailer talks that the prisoner B is not pardoned. SC : The jailer talks that the prisoner C is not pardoned.

The probability that each member is pardoned is the same and is given by P  A  P  B   P  C  

1 3

(13)

We assume that the jailer is honest and does not tell a lie as mentioned before, and hence

P  SB B   0

(14)

Fundamentals of a Bayes’ Theorem

P  SB C   1

185

(15)

Let us consider the case where the prisoner A is pardoned. We then have P  S B A 

1 2

(16)

P  SC A  

1 2

(17)

The probability P  A SB  

P  A SB 

is given by P  S B A P  A

P  S B A P  A  P  S B B  P  B   P  S B C  P C 

1 1  2 3  1 1 1 1   0   1 2 3 3 3 1  3

(18)

which is not 1 2 . Therefore, the probability that the prisoner A is pardoned is invariant.

3.2. A Monty Hall Problem We treated the Monty Hall problem in Chapter 2 of volume 1, and study again the problem in the standpoint of the Bayes’ theorem. We explain the problem again. There are three rooms where a car is in one room and goats are in the other rooms. The doors are shut at the beginning. We call the room and the door where a car is in as winning ones. The guest selects one room first and does not check whether it is a winning one or not, that is, the door is not opened. Monty Hall checks the other rooms. There must be one nonwinning room at least, that is, there is a room where a goat is in. Monty Hall deletes the non-winning room, that is, he opens the door where a goat is in. The problem is whether the guest should change the selected room or keep his first selected room. The door that the guest selects is denoted as A and the door that Monty Hall opens is denoted as C , and the rest door is denoted as B .

Kunihiro Suzuki

186

The probability that the door A is the winning one is denoted as P  A , the probability that the door B is the winning one is denoted as P  B  , and the probability that the door C is the winning one is denoted as P  C  . The event that Monty Hall opens the door is denoted as D . The important probability is P  D A

P  D A

P DB and   .

is the probability that Monty Hall opens the door C under the condition that

the door A is the winning one. In this case, non-winning door may be B or C , and hence the probability that Monty Hall opens door C is 1 2 . Therefore, we obtain P  D A  P D B

1 2

(19)

is the probability that Monty Hall open the door C under the condition that the

door C is the winning one. In this case, non-winning door must be C , and hence the probability that Monty Hall opens door C is 1 . Therefore, we obtain

P D B  1

(20)

Finally we can evaluate the probability where the guest does not change the room as P  A D 

P  D A P  A

P  D A P  A  P  D B  P  B 

1 1  2 3  1 1 1   1 2 3 3 1  3

(21)

If the guest changes the selected room from A to B , the corresponding probability is given by

Fundamentals of a Bayes’ Theorem P  B D 



187

P  D B P  B

P  D A P  A  P  D B  P  B  1

1 3

1 1 1   1 2 3 3 2  3

(22)

Therefore, the guest should change the room from A to B . What is the difference between the three prisoner problem and the Monty Hall problem. One can change the decision in Monty Hall problem after he gets information, while one cannot change it in the three prisoner problem. If the prisoner A can be changed to prisoner B , the situation becomes identical to the Monty Hall problem.

4. BAYESIAN UPDATING The probabilities shown up to here may be against our intuition. However, one of our targets is specify the cause related to the event. For example, the resulted probability for medical check was about 0.5 even when we obtain positive reaction. The value of the probability may be interesting. However, what can we do with the probability of 0.5? We need to obtain extreme probability values close to 1 or 0 to do something further. Bayesian updating is a procedure to make the probability values to the extreme ones.

4.1. Medical Check Let us consider the situation of the sick people example in more detail. We obtain positive reaction E with the medical check, and the probability for the person to be sick is given by P A E 

P  E A P  A

P  E A P  A  P  E B  P  B 

0.98  0.05 0.98  0.05  0.05  0.95  0.51



(23)

Kunihiro Suzuki

188

We can also evaluate the probability for the person to be non-sick after we obtain event E as PB E 

P  E B P  B

P  E A P  A  P  E B  P  B 

0.05  0.95 0.98  0.05  0.05  0.95  0.49



(24)

We do not know the detail of the checked person before the medical check. Therefore, we use a general probability values for P  A and P  B  . However, we now know that his medical check result is positive. Therefore, we can regard P  A and P  B  for the person as

PA E

, and

PB E

, respectively. We can then regard

P  A E   P  A

(25)

P  B E   P  B

(26)

The probabilities of P  A  0.05 and P  B   0.95 before the medical check are called as

priori

probabilities,

P  B E   P  B   0.49

and

the

probabilities

P  A E   P  A   0.51

and

are called as posterior probabilities, which is shown in Figure 3.

Figure 3. Priori and posterior probability where we have a positive reaction in the medical check.

Fundamentals of a Bayes’ Theorem

189

4.2. Double Medical Checks We cannot do clearly with posterior probability of Eqs. (25) and (26) since they are not so extreme values. We try the same medical check again, and assume that we have again the positive reaction. It is not the same as the first one. We use posterior probabilities in this case. Let us consider the situation of the sick people example in more detail. We obtain positive reaction E with the medical check, and the probability for the person to be sick is given by P A E 

P  E A P  A

P  E A P  A  P  E B  P  B 

0.98  0.51 0.98  0.51  0.05  0.49  0.95



E

(27)

We can also evaluate the probability for the person to be non-sick after we obtain event as PB E 

P  E B P  B

P  E A P  A  P  E B  P  B 

0.05  0.49 0.98  0.51  0.05  0.49  0.05



(28)

Therefore, the person should perform further process as shown in Figure 4.

Figure 4. Priori and posterior probability after we have positive reaction in the second medical check.

Kunihiro Suzuki

190

4.3. Two Kinds of Medical Checks We consider two kinds of medical checks here. The first one is alternatively rough estimation, and the second accurate one, which is shown in Figure 5. The positive reaction probabilities for the test 1 is denoted as P1  E B   0.1

, and the positive reaction probabilities for the test 2 as

P1  E B   0.03

. We assume the prior probabilities of

P  A  0.05

and

P1  E A   0.8

and

P2  E A   0.98

P  B   0.95

and

.

Figure 5. Two kinds of medical check. The test 1 is relatively rough check compared with the test 2 since the related probabilities are not so extreme.

We perform the test 1 medical check first. If we obtain negative reaction, we can evaluate the posterior probabilities as





P1 A E 







P1 E A P  A 







P1 E A P  A  P1 E B P  B 

0.2  0.05 0.2  0.05  0.90  0.95  0.01







P1 B E 







P1 E B P  B 





(29)



P1 E A P  A  P1 E B P  B 

0.90  0.95 0.2  0.05  0.90  0.95  0.99



(30)

Fundamentals of a Bayes’ Theorem

191

Therefore, we can convincingly judge that the person is not sick, and finish the medical check. The corresponding probabilities update feature is shown in Figure 6.

Figure 6. Priori and posterior probability after we have negative reaction in the test1 medical check.

We then consider a case where we obtain positive reaction in the test 1 medical check. The corresponding posterior probabilities are given by P1  A E  

P1  E A P  A

P1  E A P  A  P1  E B  P  B 

0.80  0.05 0.80  0.05  0.10  0.95  0.30



P1  B E  

(31)

P1  E B  P  B 

P1  E A P  A  P1  E B  P  B 

0.10  0.95 0.80  0.05  0.10  0.95  0.70



(32)

The corresponding Bayesian updating process is shown in Figure 7. We cannot judge clearly with these posterior probabilities, and perform the test 2 medical check.

Figure 7. Priori and posterior probability after we have positive reaction in the test 1 medical check.

Kunihiro Suzuki

192

We use Eqs. (31) and (32) as the priori probabilities for the test 2. If we obtain positive reaction in this medical check, the posterior probabilities are evaluated as follows. P2  A E  

P2  E A P  A

P2  E A P  A  P2  E B  P  B 

0.98  0.30 0.98  0.30  0.03  0.70  0.93



P2  B E  

(33)

P2  E B  P  B 

P2  E A P  A  P2  E B  P  B 

0.03  0.70 0.98  0.30  0.03  0.70  0.07



(34)

Therefore, the person is probably sick. If we obtain negative reaction in this medical check, the posterior probabilities are evaluated as follows.





P2 A E 







P2 E A P  A 







P2 E A P  A   P2 E B P  B 

0.02  0.30 0.02  0.30  0.97  0.70  0.01







P2 B E 







P2 E B P  B 





(35)



P2 E A P  A   P2 E B P  B 

0.97  0.70 0.02  0.30  0.97  0.70  0.99



(36)

Therefore, the person is probably non-sick.

4.4. A Punctual Person Problem We can treat some ambiguous problem using the procedure shown in the previous section. The theory is exactly the same as the one in the previous section, but we should define the ambiguous problem by making some assumptions.

Fundamentals of a Bayes’ Theorem

193

Let us consider the case for a punctual person, where we want to decide whether a person is punctual or not punctual. The ambiguous point in this problem is that a punctual person is not clearly defined in general. We feel that a punctual person is apt to be on time with high probability and not punctual people are not apt to keep a time. The probability is not 1 or 0, but some values between 0 and 1 for both cases. We need to define punctual quantitatively, although it is rather difficult. We form two groups: One group consists of persons who are thought to be punctual, and the other group consists of persons who are thought to be not punctual. We then gather data of on time or off time and finally get the data such as shown in Table 1. This is a definition of punctual or not punctual. That is,

Punctual:

On time probability 

Not punctual:

4 1 ;Off time probability  5 5

On time probability 

1 3 ;Off time probability  4 4

Table 1. Definition of punctual or not punctual Character

Off time( D1 )

On time( D2 )

Not Punctual ( H1 )

3 4

1 4

Punctual ( H 2 )

1 5

4 5

We investigate a person S whether he is punctual or not. The unpunctual case is denoted as H1 , and punctual case as H 2 . The off-time is denoted as D1 and on time as D2 , which is also shown in Table 1. S’ s score in recent five meetings are D2 , D2 , D1 , D1 , D1 . Step 0 We set the initial value for a member S. We have an impression that he is not punctual, and hence set the initial one as

P  H1   0.6

(37)

P  H 2   0.4

(38)

Kunihiro Suzuki

194

Step 1 Next we re-evaluate S considering the obtained data. In the first meeting, he was on time, and hence, we have

P  H1 D2  

P  D2 H1  P  H1 

P  D2 H1  P  H1   P  D2 H 2  P  H 2 

1  0.6 4  1 4  0.6   0.4 4 5  0.319 P  H 2 D2  

(39)

P  D2 H 2  P  H 2 

P  D2 H1  P  H1   P  D2 H 2  P  H 2 

4  0.4 5  1 4  0.6   0.4 4 5  0.681

(40)

Therefore, the updating P  H1  and P  H 2  are

P  H1   0.319

(41)

P  H 2   0.681

(42)

Step 2 He was also on time in the second meeting, and we have

P  H1 D2  

P  D2 H1  P  H1 

P  D2 H1  P  H1   P  D2 H 2  P  H 2 

1  0.310 4  1 4  0.310   0.681 4 5  0.125

(43)

Fundamentals of a Bayes’ Theorem

P  H 2 D2  

195

P  D2 H 2  P  H 2 

P  D2 H1  P  H1   P  D2 H 2  P  H 2 

4  0.681 5  1 4  0.310   0.681 4 5  0.875

(44)

Therefore, the updating P  H1  and P  H 2  are

P  H1   0.125

(45)

P  H 2   0.875

(46)

Step 3 He was off time in the third meeting, and we have

P  H1 D1  

P  D1 H1  P  H1 

P  D1 H1  P  H1   P  D1 H 2  P  H 2 

3  0.125 4  3 1  0.125   0.875 4 5  0.349 P  H 2 D1  

(47)

P  D1 H 2  P  H 2 

P  D1 H1  P  H1   P  D1 H 2  P  H 2 

1  0.875 5  3 1  0.125   0.875 4 5  0.651 Therefore, the updating P  H1  and P  H 2  are

(48)

Kunihiro Suzuki

196

P  H1   0.349

(49)

P  H 2   0.651

(50)

Step 4 He was off time in the fourth meeting, and we have

P  H1 D1  

P  D1 H1  P  H1 

P  D1 H1  P  H1   P  D1 H 2  P  H 2 

3  0.349 4  3 1  0.349   0.651 4 5  0.668 P  H 2 D1  

(51)

P  D1 H 2  P  H 2 

P  D1 H1  P  H1   P  D1 H 2  P  H 2 

1  0.651 5  3 1  0.349   0.651 4 5  0.332

(52)

Therefore, the updating P  H1  and P  H 2  are

P  H1   0.668

(53)

P  H 2   0.332

(54)

Step 5 He was off time in the fifth meeting, and we have

Fundamentals of a Bayes’ Theorem

P  H1 D1  

P  D1 H1  P  H1 

P  D1 H1  P  H1   P  D1 H 2  P  H 2 

3  0.668 4  3 1  0.668   0.332 4 5  0.883 P  H 2 D1  

197

(55)

P  D1 H 2  P  H 2 

P  D1 H1  P  H1   P  D1 H 2  P  H 2 

1  0.332 5  3 1  0.668   0.332 4 5  0.117

(56)

Therefore, the updating P  H1  and P  H 2  are

P  H1   0.883

(57)

P  H 2   0.117

(58)

Therefore, we are convincing that he is not punctual based on the data. It should be noted that the results depend on the definition of the punctual people. However, we can treat such ambiguous issue quantitatively using the procedure.

4.5. A Spam Mail Problem We receive many mails every day. In the huge mails, we have sometimes spam mails. We want to divide the spam mail definitely by evaluating words in the mail. This is one famous example that the Bayesian updating procedure accommodates. We make a dictionary where key words and related probabilities are listed as word 1, 2, 3,・・・, where three of which are shown in Table 2.

Kunihiro Suzuki

198

Table 2. Dictionary for evaluating spam mails Word W01 W02 W03

Spam rs1 rs2 rs3

Non-spam rns1 rns2 rns3

We denote the event of a spam mail as S , and a non-spam mail as S . We assume that we find words 1, 2, and 3 in a mail, and want to evaluate whether the mail is spam or not. We need to define the initial probability for that

PS 

 

and

PS 

. We assume that we know

P S  1  rs 0

P  S   rs 0

. Therefore, we can assume that . We find the word 1, and the corresponding probability is given by

P  S W 01  





P S W 01  

P W 01 S  P  S 

P W 01 S  P  S   P W 01 S  P  S  rs1rs 0  PS  rs1rs 0  rns1 1  rs 0 

(59)

P W 01 S  P  S 

P W 01 S  P  S   P W 01 S  P  S  rns1 1  rs 0 

rs1rs 0  rns1 1  rs 0 

 PS  (60)

We find the word 2, and the corresponding probability is given by

P  S W 02  

P W 02 S  P  S 

P W 02 S  P  S   P W 02 S  P  S 

rs1rs 0 rs1rs 0  rns1 1  rs 0   rns1 1  rs 0  rs1rs 0 rs 2  rns 2 rs1rs 0  rns1 1  rs 0  rs1rs 0  rns1 1  rs 0  rs 2



rs 2 rs1rs 0  PS  rs 2 rs1rs 0  rns 2 rns1 1  rs 0 

(61)

Fundamentals of a Bayes’ Theorem





P S W 02 

199

P W 02 S  P  S 

P W 02 S  P  S   P W 02 S  P  S 

rns1 1  rs 0  rs1rs 0  rns1 1  rs 0   rns1 1  rs 0  rs1rs 0 rs 2  rns 2 rs1rs 0  rns1 1  rs 0  rs1rs 0  rns1 1  rs 0  rns 2



rns 2 rns1 1  rs 0 

rs 2 rs1rs 0  rns 2 rns1 1  rs 0 

 PS  (62)

We finally find the word 3, and corresponding probabilities are given by

P  S W 03 

P W 03 S  P  S 

P W 03 S  P  S   P W 03 S  P  S 

rs 2 rs1rs 0 rs 2 rs1rs 0  rns 2 rns1 1  rs 0   rns 2 rns1 1  rs 0  rs 2 rs1rs 0 rs 3  rns 3 rs 2 rs1rs 0  rns 2 rns1 1  rs 0  rs 2 rs1rs 0  rns 2 rns1 1  rs 0  rs 3







P S W 03 

rs 3rs 2 rs1rs 0  PS  rs 3rs 2 rs1rs 0  rns 3rns 2 rns1 1  rs 0 

(63)

P W 03 S  P  S 

P W 03 S  P  S   P W 03 S  P  S 

rns 2 rns1 1  rs 0  rs 2 rs1rs 0  rns 2 rns1 1  rs 0   rns 2 rns1 1  rs 0  rs 2 rs1rs 0 rs 3  rns 3 rs 2 rs1rs 0  rns 2 rns1 1  rs 0  rs 2 rs1rs 0  rns 2 rns1 1  rs 0  rns 3



rns 3 rns 2 rns1 1  rs 0 

rs 3 rs 2 rs1rs 0  rns 3rns 2 rns1 1  rs 0 

 PS  (64)

We can easily extend this to the general case where we find m kinds of keywords in a mail. The probabilities that the mail is spam or non-spam are given by

Kunihiro Suzuki

200 m

PS  

rs 0  rsi i 1

m

m

i 1

i 1

rs 0  rsi  1  rs 0   rnsi

(65)

m

PS  

1  rs 0   rnsi i 1

m

m

i 1

i 1

rs 0  rsi  1  rs 0   rnsi

(66)

It should be noted that we neglect the correlation between each word in the above analysis.

4.6. A Vase Problem We consider a vase problem, which is frequently treated and is convenient to extend the formula. We know the two kinds of vases where black and gray balls are in, and we know the each ratio of black ball number to the total one. We select one vase where we do not know which one it is. We take a ball from the vase and return it. Getting the data, we want to guess which vase we use.

Figure 8. A vase problem with two kinds of balls.

Let us consider two vases A and B as shown in Figure 8. The vase A contains balls with a ratio of gray to black is 4:1, and B with a ratio of 2:3. We express the event of

Fundamentals of a Bayes’ Theorem

201

obtaining a black ball as E and a gray balls as E , and hence the probabilities are expressed as

P  E A 

1 5

(67)

P  E B 

3 5

(68)

These probabilities characterize the vases. The probabilities that we obtain a black ball are given by P A E 

PB E 

P  E A P  A

P  E A P  A  P  E B  P  B 

(69)

P  E B P  B

P  E A P  A  P  E B  P  B 

(70)

The probabilities that we obtain a gray ball are given by





P AE 





P BE 









P E A P  A





(71)



(72)

P E A P  A  P E B P  B 









P E B P  B



P E A P  A  P E B P  B 

We select one type of a vase and cannot distinguish which vase it is by its appearance. We do not know which vase it is. Therefore, one of the probability P  A or P  B  is 1 and the other is 0. We pick up one ball and return it to the vase, and perform five times and obtain the results of EEEEE . Therefore, the number of black balls is larger and hence we can guess that the vase B is more plausible. We want to evaluate the probability more quantitatively. The procedures are as follows.

Kunihiro Suzuki

202

Step 0 If we have some information associated with the number of vase A and B , and hence its ratio, we can use them as the initial probability for P  A and P  B  . If we do not have any information associated with the ratio, we assume the probabilities as P  A  P  B  

1 2

(73)

Step 1 We obtain a gray ball are given by





P AE 







 E  in the first trial, and hence the corresponding probabilities 

P E A P  A





P E A P  A  P E B P  B 

4 1  5 2  4 1 2 1    5 2 5 2  0.667





P BE 







(74)



P E B P  B





P E A P  A  P E B P  B 

2 1  5 2  4 1 2 1    5 2 5 2  0.333

(75)

We regard these as P  A and P  B  , that is,

 

 

 P  A   P A E  0.667    P  B   P B E  0.333

(76)

Step 2 We obtain a black ball  E  in the second trial, and hence the corresponding probabilities are given by

Fundamentals of a Bayes’ Theorem P A E 

203

P  E A P  A

P  E A P  A  P  E B  P  B 

1  0.667 5  1 3  0.667   0.333 5 5  0.4

PB E 

(77)

P  E B P  B

P  E A P  A  P  E B  P  B 

2  0.333 5  4 2  0.667   0.333 5 5  0.6

(78)

We regard these as P  A and P  B  , that is,  P  A  P  A E   0.4     P  B   P  B E   0.6

(79)

Step 3 We obtain a black ball  E  in the third trial, and hence the corresponding probabilities are given by P A E 

P  E A P  A

P  E A P  A  P  E B  P  B 

1  0.4 5  1 3  0.4   0.6 5 5  0.182

(80)

Kunihiro Suzuki

204 PB E 

P  E B P  B

P  E A P  A  P  E B  P  B 

3  0.6 5  1 3  0.4   0.6 5 5  0.818

(81)

We regard these as P  A and P  B  , that is,  P  A  P  A E   0.182     P  B   P  B E   0.818

Step 4 We obtain a gray ball are given by





P AE 

 

(82)

 E  in the third trial, and hence the corresponding probabilities

 

P E A P  A

 

P E A P  A  P E B P  B 

4  0.182 5  4 2  0.182   0.818 5 5  0.308





P BE 

 

 

(83)

P E B P  B

 

P E A P  A  P E B P  B 

2  0.818 5  4 2  0.182   0.818 5 5  0.692

We regard these as P  A and P  B  , that is,

(84)

Fundamentals of a Bayes’ Theorem

 

205

 

 P  A   P A E  0.308   P B  P B E  0.692    

(85)

Step 5 We obtain a black ball  E  in the third trial, and hence the corresponding probabilities are given by P A E 

P  E A P  A

P  E A P  A  P  E B  P  B 

1  0.308 5  1 3  0.308   0.692 5 5  0.129

PB E 

(86)

P  E B P  B

P  E A P  A  P  E B  P  B 

3  0.692 5  0.871



(87)

We regard these as P  A and P  B  , that is,  P  A  P  A E   0.182     P  B   P  B E   0.818

(88)

The final probability of P  A is much smaller than P  B  , and hence we guess the vase is type B . When we update the probability, the accuracy of the probability improves, which is shown later. We treat two types of vase A and B here. However, we can easily extend it to many kinds of Ai where i is 1, 2, ・・・, m. The events are expressed by the conditional probability of



P E j Ai



Ej

. We need to know

(89)

Kunihiro Suzuki

206

E

In this analysis, j is not limited to E or E as is the case for the example. We can use any number kinds of the event, but need to know the corresponding conditional probabilities of Eq. (89). We can set initial probabilities as P  Ai  

1 m

(90)

If we obtain an event of E1 , the corresponding probability under the event E1 are given by P  Ai E1  

P  E1 Ai  P  Ai 

 P  E1 Ai  P  Ai  m

 P  Ai 

i 1

(91)

We can update the probabilities of P  Ai  using this equation.

Figure 9. A vase problem with three kinds of balls.

4.7. Extension of Vase Problem We can easily extend the vase problem, where we can increase the number of the vase, and kinds of balls. We increase the vase number from 2 to 3 denoted as A1 , A2 , and A3 , and kind of balls form black and gray to black, gray, and white ones denoted as E1 , E2 , and E3 , which is shown in Figure 9. We characterize the vases as the ratio of each ball, which is given by

Fundamentals of a Bayes’ Theorem

207

P  E1 A1  

1 2 2 ; P  E2 A1   ; P  E3 A1   5 5 5

(92)

P  E1 A2  

3 1 1 ; P  E2 A2   ; P  E3 A3   5 5 5

(93)

P  E1 A3  

2 0 3 ; P  E2 A3   ; P  E3 A3   5 5 5

(94)

If we have some information associated with the initial condition for P  Ai  , we should use them. If we do not have any information associated with the ratio, we assume the identical probabilities given by P  A1   P  A2   P  A3  

1 3

(95)

If we obtain a certain ball which is expressed as Ei , then we can evaluate posterior probabilities as P  A1 Ei  

P  A2 Ei  

P  A3 Ei  

P  Ei A1 

P  Ei A1  P  A1   P  Ei A2  P  A2   P  Ei A3  P  A3 

(96)

P  Ei A2 

P  Ei A1  P  A1   P  Ei A2  P  A2   P  Ei A3  P  A3 

(97)

P  Ei A3 

P  Ei A1  P  A1   P  Ei A2  P  A2   P  Ei A3  P  A3 

(98)

We can easily extend the number of the vase to m and kinds number of balls to l . We can then evaluate the posterior probabilities as P  Ak Ei  

P  Ei Ak 

 PE m

j 1

i

  

Aj P Aj

P  Aj  where i  1,2, , l . Note that we need to know the initial priori probabilities .

(99)

Kunihiro Suzuki

208

4.8. Comments on Prior Probability In the Bayesian updating, we need to know the prior probabilities. Let us consider it using the two kinds of medical check example. We use the same probability associated with the medical check, but two different sets of prior probabilities. We assume that we obtain positive reactions for the test 1 and the test 2. The related posterior probabilities are given by P  A EE  

P  B EE  

P2  E A P1  E A P0  A

P2  E A P1  E A P0  A  P2  E B  P1  E B  P0  B 

(100)

P2  E B  P1  E B  P0  B 

P2  E A P1  E A P0  A  P2  E B  P1  E B  P0  B 

(101)

We use two different prior probability sets of P0  A  0.05 and P0  B   0.95 , and and P0  B   0.5 . The related posterior profanities for sick are 0.93 and 1.00, respectively. The values are different, but there is no difference in the standpoint of the judge for next doing subjects. Therefore, the initial prior probabilities are not so important if we repeat the check many times. If we know the prior probabilities, we use it, and use certain ones if we do not know them in detail, and the identical one divided by the number of the cause m if we do not about it anything. P0  A  0.5

4.9. Mathematical Study for Bayesian Updating We observed that we obtain proper probability when we repeat Bayesian updating. We study the corresponding mathematics here. We consider the vase problem. A vase A contains black and gray balls and the ratio of black ball to the total is p1 . We have many other vases of which ratios are different from p1 . We do not know the vase from which we take balls although it is the vase A . When we perform N trials, we can expect p1 N black balls and 1  p1  N gray balls. What we should do is to prove the probability

has the maximum value for the vase A . The probability that we take a black ball is denoted as p in general. The probability that after the N times trial f is given by

Fundamentals of a Bayes’ Theorem f  p Np1 1  p 

209

N 1 p1 

(102)

We differentiate this equation with respect to p and set it to 0, and obtain

df N 1 p1  N 1 p1  1  Np1 p Np1 1 1  p   N 1  p1  p Np1 1  p  dp  Np Np1 1 1  p 

N 1 p1  1

 p1 1  p   p 1  p1  

 Np Np1 1 1  p 

N 1 p1  1

 p1  p 

0

(103)

Therefore, we obtain the maximum value for

p  p1

(104)

Figure 10 shows the dependence of on probability on p for various trial numbers N and p1 : p1  0.2 for (a) and p1  0.5 for (b). The probability has the maximum value for

p1 , and it is limited to p1 with increasing N . The features are the same for p1  0.2 and 0.5.Therefore, we can expect one significant probability value compared with the others and can state clearly which vase it is with increasing updating numbers. 1.0

1.0 N = 10

0.6

0.8 N = 100

0.4

N = 50

0.6

N = 100

0.4 0.2

0.2 0.0 0.0

N = 10

N = 50

f(p)/f(p1)

f(p)/f(p1)

0.8

p1 = 0.5

p1 = 0.2

0.5 p (a)

1.0

0.0 0.0

0.5 p (b)

1.0

Figure 10. Dependence of probability on the probability for taking a black ball (a) p1  0.2 , (b) p1  0.5 .

Kunihiro Suzuki

210

5. A BAYESIAN NETWORK Many events are related to each other and some events are causes of the other events in a real world. Further, the cause events are also sometimes the results of other events. Consequently, the related events can be expressed by a network. The event in the network is denoted as a node, and a conditional probability is assigned to each node, which is called as the Bayesian network. Using the Bayesian network, we can predict the quantitative probability between any two events in the network.

Figure 11. Schematic figure for medical check.

5.1. A Child Node and a Parent Node Let us consider first the previous example of a medical check as shown in Figure 11. We denote the sick or non-sick as an event of

Ai  A  1

, and if the person is non-sick, we denote

negative reaction event as

Ci  C  1

Ci

Ai

. If the person is sick, we denote

Ai  A  0

. We denote the positive or

. If the person obtains positive reaction, we denote

, and if the person obtains negative reaction, we denote

The event event

Ci

Ai

Ai

can be regarded as the cause of the event

. The event

Ai

Ci

Ci  C  0

and

Ci

is also called as the parent event for the event

is also called as the child event for event nodes in the network.

Ai

. The events

Ai

.

is the result of the

Ci

and

, and the event

Ci

are called as

Fundamentals of a Bayes’ Theorem

211

We also need the related probabilities of the nodes, which are given by

P  A  0.05

(105)

P  A   1  P  A  0.95

(106)

P  C A   0.98

(107)

P  C A   0.05

(108)

A

C

We regard this as a Bayesian network. We regard i as a cause of event i . Therefore, the corresponding network is expressed as shown in Figure 12. We note that the subscript i in Ai and i in Ci are independent each other. We use this expression for simplicity, and this is applied to the other figures.

Figure 12. A Bayesian network for the medical check.

Kunihiro Suzuki

212

P  Ci 

are not described in the figure. However, we can evaluate them as follows.

P C   P C A  P  A   P C A P  A   P  C A  1  P  A    P  C A  P  A   0.05  1  0.05  0.98  0.05  0.0965

(109)

P C   1  P C   1  0.0965  0.9305

We want to obtain the conditional probability

P A C 

(110) P A C

and

  , which is given by

P AC

P  C A P  A P C 

0.98  0.05 0.0965  0.508



Figure 13. A Bayesian network with one hierarchy and one child node and two parents nodes.

(111)

Fundamentals of a Bayes’ Theorem

213

We extend the Bayesian network of one hierarchy and one child node and two parents nodes as shown in Figure 13. The nodes are denoted as Ai , Bi , and Ci . Ai and Bi are causes of Ci . Ci is called as a child node for nodes Ai and Bi , and nodes Ai and Bi are called as parents nodes for the node Ai . The events Bi , and Ci are regarded as the causes of the result Ai . When the event related to Ai occurs, we express it as Ai  A  1 . When it does not occur, we express it as Ai  A  0 . Summarizing above, we can express the status of each node as

 A 1 Ai   A  0

(112)

B 1 Bi   B  0

(113)

C  1 Ci   C  0

(114)

We assign probabilities to each node as follows. The data associated with the node Ai is given by

P  A  0.05

(115)

and

P  A   1  P  A  0.95

(116)

The data associated with the node Bi is given by

P  B   0.03

(117)

Kunihiro Suzuki

214 and

P  B   1 P  B  0.97

(118)

The data associated with the node C should be P  C Ai events of Ai and Bi .



Bi

Bi 

which is related to the

 is then given by



Bi  1  P  C Ai

P C Ai

P  C Ai

Bi 

(119)

The above is shown in in Figure 13 Using the above fundamental data, we then evaluate the probabilities below. P C   P C A

B  P  A  P  B   P C A

B P  A P  B

 P  C A B  P  A P  B   P C A B  P  A P  B   0.01 1  0.05   1  0.03  0.30  1  0.05   0.03 0.80  0.05  1  0.03  0.95  0.05  0.03  0.058

(120)

We implicitly assume that the events P  Ai

Ai

and

B j   P  Ai  P  B j 

Bj

are independent and use

(121)

We further obtain P C   1  P C   0.942

(122)

We also evaluate conditional probabilities. We first evaluate P A C 

PA C

, which is given by

P  C A P  A P C 

(123)

Fundamentals of a Bayes’ Theorem

215

P C A We know P  C  and P  A , but do not know which can be expanded as

P  C A  P  C A B  P  B   P  C A B  P  B   0.80  1  0.03  0.95  0.03  0.805

(124)

Therefore, we obtain

P A C 

P  C A P  A P C 

0.805  0.05 0.058  0.694



(125)

We can obtain the other conditional probabilities with a similar way as

PB C 

P C B  P  B  P C 

 P C A  

B  P  A   P  C A B  P  A   P  B  P C 

0.30  0.95  0.95  0.05  0.03 0.058

 0.665

(126)

We then automatically obtain





P A C  1 P  A C   0.306



(127)



P B C  1 P  B C   0.335

(128)

Kunihiro Suzuki

216 We can further obtain

P A C 





P C A P  A P C 

1  P  C A   P  A    P C  

1  0.805  0.05

0.942  0.010

PB C 



(129)



P C B P B P C 

1  P  C B   P  B    P C  1   P  C B  

A  P  A   P C B P C 

A  P  A    P  B  

1   0.30  0.95  0.95  0.05    0.03  0.942  0.021

(130)

We can then automatically obtain





P A C  1 P  A C   0.09



(131)



P B C  1 P  B C   0.979

(132)

Fundamentals of a Bayes’ Theorem

217

Figure 14. A Bayesian network which is added node Di to the one in Figure 13.

5.2. A Bayesian Network with Multiple Hierarchy We considered one hierarchy in the previous section, where there is one child node and its two parents nodes, that is, there is only one child node. We then extend the network by adding a node child node for nodes

Ai

and

Since we add the node is given by

Di

Bi

Di

as shown Figure 14. Node

, but it is also a child node for node

Di

Ci

is a

.

, we need to know the related probability; we assume which

P  D C   0.01

(133)

P  D C   0.70

(134)

We can easily evaluate P  D  as P  D   P  D C  P C   P  D C  P C   0.01 0.942  0.70  0.058  0.050

(135)

Kunihiro Suzuki

218

We also need a conditional probability such as P  A D 

Ai  A

Therefore,

, which can be evaluated as

P  D A P  A P  D

(136)

In this equation, the unknown term is only we fix

P  A D

and

P  D A

Di  D

P  D A

. The route is

. The problem is how we treat the

Ci

Ai  Ci  Di

, which can be

, and

C or C .

can be expanded as





P  D A  P  D C  P  C A  P  C   P  D C  P C A P C   P  D C  P  C A P  C   P  D C  1  P  C A   P C 

(137)

P C A where also depend on Bi , we do not know its explicit expression, but can obtain as follows. P  C A  P  C A

B  P  B   P C A

B P B

 0.80  1  0.03  0.95  0.03  0.8045

Therefore, we obtain

(138)

P  D A

as

P  D A  P  D C  P  C A  P  C   P  D C  1  P  C A   P C   0.70  0.8045  0.058  0.01 1  0.8045   1  0.058   0.0345

(139)

This can be generally expressed as

P  D A   P  D Ci  P  Ci A  P  Ci  i

We can easily extend this procedure to any network system.

(140)

Fundamentals of a Bayes’ Theorem

We consider the network shown in Figure 15, where we add a node conditional probability P A F  

PA F 

219

Fi

. The

can be evaluated as below.

P  F A P  A PF 

(141)

where





P  A F    P  F Di  P  Di C j  P C j A P  Di  P C j  i, j

(142)

One problem for the long Bayesian network is that the number of term increases significantly as shown in Eq. (142).

Figure 15. A Bayesian network, which is added node Fi to the one in Figure 14.

We treat one more example as shown in Figure 16. The target conditional probability is

P A G

. The corresponding routes are

Ai  Ci  Di  Gi  Fi  Gi and Ai  Ci  Ei  Gi  Fi  Gi . We can then evaluate it

as

Kunihiro Suzuki

220

   P G F  P  F E  P  E

 C  P  C A  P  F  P  E P  C 

P  A G    P  G Fi  P  Fi D j  P D j Ck P  Ck A  P  Fi  P  D j P Ck  i , j ,k

i

i , j ,k

i

j

j

k

k

i

j

k

(143)

where F depends on both D and E , and hence the related term is expanded as P  Fi D j   P  Fi D j

E  P  E   P  Fi D j

E  P E 

(144)

P  Fi E j   P  Fi E j

D  P  D   P  Fi E j

D P D

(145)

C depends on both

A

P  Ck A   P  Ck A

and B , and hence the related term is expanded as B  P  B   P  Ck A

Figure 16. A Bayesian network, which has various routs.

B P B

(146)

Fundamentals of a Bayes’ Theorem

221

SUMMARY To summarize the results in this chapter: We consider m kinds of events Ai  i  1, 2, , m . When we have an event E , the probability for the related cause of Ai is expressed by P  Ai E  

P  Ai  P  E Ai 

P  A1  P  E A1   P  A2  P  E A2  

 P  Am  P  E Am 

We select one cause or member k . The corresponding probability P  k  is 1 and the other probability is 0. However, we do not know what is the cause or who is the one, that is, we do not know which i is k . We update the P  Ai  with obtaining the results of E . We have some kinds of E which are denoted as Ea and Eb for example and we know the conditional probability

P  Ea Ai 

and

P  Eb Ai 

. The resultant events Ea , Eb ,

reflect

k. We assume that we obtain events Ea , Ea , Eb . Based on the results, we update the P  Ai 

as

P  Ai Ea  

P  Ai Ea  

P  Ai Eb  

P  Ai  P  Ea Ai 

P  A1  P  Ea A1   P  A2  P  Ea A2  

P  Ai  P  Ea Ai 

P  A1  P  Ea A1   P  A2  P  Ea A2  

P  Ai  P  Eb Ai 

P  A1  P  Eb A1   P  A2  P  Eb A2  

 P  Am  P  Ea Am 

 P  Am  P  Ea Am 

 P  Am  P  Eb Am 

 P  Ai 

 P  Ai 

 P  Ai 

We can expect that the predominant  i  is the required k . A Bayesian network expresses the relationship between causes and results graphically. The probability for the event is related to the events in the previous step where the pervious event is one or many. We can evaluate probabilities related to any two event in the network. P A

Chapter 7

A BAYES’ THEOREM FOR PREDICTING POPULATION PARAMETERS ABSTRACT We predict a probability distribution associated with macro parameters of a set, where we use priori/likelihood/posterior distributions. We show first the likelihood distributions and multiply it to a priori distribution, and the resultant posterior distribution gives the information associated with the macro parameters of the population set. We show analytical approach first, where conjugate distributions are recommended to use. We then show numerical approach of Marcov Chain Monte Carlo (MCMC) method. In the MCMC method, there are two prominent ways of Gibbs sampling and Metropolis-Hastings algorithm. This process is extended to the procedures with various parameters in likelihood function which is constrained and controlled by the priori distribution function, which is called as a hierarchical Bayesian theory.

Keywords: Bayes’ theorem, conditional probability, priori probability, likelihood function, posterior probability; binomial distribution, beta distribution, beta function, multinomial distribution, dirichlet distribution, normal function, inverse gamma distribution, gamma function, Markov chain Monte Carlo, Gibbs sampling, Metropolis-Hastings algorithm, hierarchical Bayesian theory

1. INTRODUCTION Figure 1 shows the schematic targets of the Bayesian theory up to this chapter and the one of this chapter. In the Bayesian updating, we should know the probability of a certain cause. For example, we need to specify the ratio of black balls in a vase. However, we want to specify the ratio itself without assuming the ratio. In general, we want to predict macro

Kunihiro Suzuki

224

ammeters of a population set. We show analytical and numerical models to decide the parameters in this chapter.

Figure 1. The target of Bayesian updating is to identify the vase kinds. The target of the theory in this chapter is to predict the black ball ratio in the vase.

2. A MAXIMUM LIKELIHOOD METHOD A method to predict macro parameters of a population set consists of likelihood and priori distributions. We first discuss the likelihood function. The maximum likelihood method is utilized to predict parameters of a population set in general. We treat the macro parameter of the population set as a variable, and evaluate the probability based on obtained data. We impose that the parameter value gives the maximum probability. We start with this maximum likelihood method.

2.1. Example for a Maximum Likelihood Method We show two examples for a maximum likelihood method here.

2.2. Number of Rabbits on a Hill We want to know a total number of rabbits on a hill. We assume that the total number

 which is unknown and should be evaluated as follows. We get a  15 rabbits, mark, and release them. After a while, we get n  30 rabbits, and found x  5 marked ones among them.

of the rabbits is

The corresponding probability that we get x marked rabbit can be evaluated as

A Bayes’ Theorem for Predicting Population Parameters f  x 

15 Cx

  15 C30 x  C30

225

(1)

This is the probability distribution for a variable x . However, we obtain x  5 , and

 . Therefore, we regard that the function of Eq. (1) expresses the probability with respect to  , and hence it is denoted as the unknown variable is the total number of rabbit

L   

15 C5

  15 C305  C30

(2)

Figure 2 shows the dependence of the probability on  . It has a peak for 89.5. We regard that the situation occurs with the maximum probability, and hence we guess that there are about 89.5 rabbits on the hill. The corresponding distribution is shown in the figure. It should be noted that the form of Eq. (2) is the one for x , and hence it is

 . Therefore, the integration of Figure 2 deviates L  from 1. Therefore, the rigorous expression for   is given by normalized with respect to x not to

L   

15 C5

  15 C305  C30

(3)

If we do not care about the normalization and focus on the peak position, Eq. (2) is OK. 0.25 0.20

L

0.15 0.10 0.05 0.00 40

50

60

70

80

90 100 110 120 130 140 150

 Figure 2. Dependence of L associated with a rabbit number on

.

Kunihiro Suzuki

226

2.3. Decompression Sickness Decompression sickness is a fatal problem for divers. We want to know the probability where a diver feels a decompression problem as a function of water depth. The data we can get are water depths and related feelings associated with the decompression. If the diver feels a problem, the data is YES and NO for vice versa. We can then obtain data such as Table 1. Table 1. Data of water depth and the divers feelings associated with decompression

Member ID Depth (m) 1 34 2 34 3 34 4 34 5 35 6 35 7 36 8 36 9 37 10 37 11 38 12 38

Result YES NO NO NO NO NO YES NO YES YES YES YES

Next, we should define a probability where a diver feels a decompression problem. The probability should be small with small water depth, increase with the increasing depth, and then saturate. We therefore assume the form of the probability as f  x,   

x x 

(4)

where x is the water depth.  corresponds to the one where 50 % divers feel decompression problems. On the other hand, the probability where a diver does not feel a decompression problem is given by

1  f  x, 

(5)

A Bayes’ Theorem for Predicting Population Parameters That is, the data YES is related to

f  x

, and the data NO is related to

227 1  f  x

.

The probability where we obtain the data on Table 1 is denoted as L   and is given by L    f  34,   1  f  34,    1  f  35,   3

2

 f  36,   1  f  36,     f  37,     f  38,   2

Figure 3 shows the dependence of L on probability is given by f  x 

.

L

2

has a peek for

(6)

  35.5 . Therefore, the

x x  35.5

(7)

0.000290

0.000285 0.000280 0.000275

L

0.000270 0.000265 0.000260 0.000255 0.000250

0.000245 0.000240 30

35

40

45

50

Depth (m) Figure 3. Dependence of L associated with decompression sickness on depth.

3. PREDICTION OF MACRO PARAMETERS OF A SET WITH CONJUGATE PROBABILITY FUNCTIONS 3.1. Theoretical Framework We studied Bayesian updating in the previous chapter, where we evaluate the probability for the subject, and eventually decide which is the cause for the result. We treat a different situation in this chapter. We want to know macro parameters of a set based on extracted data and our experience. The prominent macro parameters for the set are average, variance, or ratio. We show how we can evaluate them in this chapter.

Kunihiro Suzuki

228

The procedure is similar to the Bayesian updating discussed in the previous chapter. However, we should use probability distributions instead of probabilities. We re-consider the Bayes’ theorem which is given by

P  H D 

PD H  PH  P  D

(8)

where H is the cause (Hypothesis), D is the result (Data) caused by H . We regard that H is decided by the macroscopic parameter of the set such as average. Therefore, H can be regarded as the macroscopic parameter for the set, and we denote it as

 . We modify the Bayes’ theory as P  H D 

We call we obtain

PD H  PH 

  

P  D

   D  

f  D      P  D

(9)

as the priori probability distribution which expresses the probability that

 . f  D   is the likelihood probability distribution where we have the data

 .   D  is the posterior probability distribution which

under the macroscopic value of is the product of express Eq. (9) as

f D  

and

D

  

.

P  D

is independent of

 and we neglect it and

  D   Kf  D     

(10)

where we can decide K so that the integration of the probability with respect to  is one. In the above theory, there are some ambiguous points. We do not know accurately how   we define   , which depends on the person’s image or experience. We use a simple one where we can easily obtain

  D 

with used

  

and

f D  

, and impose that the

f D     D    form of the all distributions of   , , and are the same. That is, the product of two same form functions is the same as the two functions. There are limited functions which have such characteristics. We show such combinations as shown in Table

  D    2. We basically know the likelihood function, and require that   and are the same type distribution functions, which is called as conjugate functions. If we regard the

A Bayes’ Theorem for Predicting Population Parameters

229

likelihood function as one with a variable of  , the corresponding function is different, which is shown in the bracket in the table. In those cases, all the type of the distributions of

  

,

f D  

, and

  D 

are the same.

Table 2. The combinations of priori, likelihood, and posterior functions

f D  

   Beta function

Binominal function (Beta function) Multinomial distribution (Dirichlet distribution) Normal function (Normal function) Normal function (Inverse Gamma function) Poisson function (Gamma function)

Dirichlet distribution Normal function Inverse Gamma function Gamma function

  D  Beta function Dirichlet distribution Normal function Inverse Gamma function Gamma function

3.2. Beta/Binominal (Beta)/Beta Distribution We study an example for the Bernoulli trails and evaluate the population ratio. We obtain the two kinds of events 1 or 0. event 1 for each event.

 corresponds to the probability that we obtain the

We assume the probability  where we obtain x times target event of 1 among n trials. The corresponding probability with respect to x is a binominal function and is given by

f  x   n Cx x 1   

n 1

(11)

We then convert this probability distribution function to the one for probability to obtain

 under the data

Dx

, which is the likelihood function denoted as

f D  

. Since

 for the likelihood function, it should be changed so that the total probability with respect to  is one. The factor is the factor n Cx does not include the variable parameter denoted as K . We then obtain the likelihood function as

Kunihiro Suzuki

230 f  D    K x 1   

We can decide is given by





0

K

n 1

(12)

by imposing that the total integration of the probability is one, which

f  D  d  K





0

 x 1   

n 1

d

1

(13)

We then obtain the corresponding likelihood function, which is the Beta distribution Be and is given by

f  D    Be  L ,  L   1

  L 1 1    L  B  L ,  L 

(14)

where

L  x  1

(15)

L  n  x  1

(16)

B  ,  

is a Beta function given by

B  ,       1 1    1

0

 1

d

(17)

The Beta distribution has parameters of  and  . B  , We use a Beta distribution function e  0 0  as the priori probability distribution, which is given by     Be  0 ,  0  

1  1  0 1 1    0 B  0 ,  0 

(18)

A Bayes’ Theorem for Predicting Population Parameters

231

There is not a clear reason to decide the parameters of  0 and  0 . Further, it is not so easy to decide the parameters. It is convenient to decide the corresponding average and standard deviation and convert them to  0 and  0 . The relationship between the average and variance and parameters given by

0 

 02 

0

 0  0

(19)

 0 0

0  0  10  0 2

(20)

We then obtain 0 

0 

02 1  0   0  02

(21)

 1  0  02  2 1  0   0  0   0 

(22)

We obtain a posterior distribution as

  D   f  D         L 1 1    

 0  L 1 1

 L 1

 0 1 1   

1       0

0 1

1

L 1

(23)

Finally, we obtain   D  



*

1

1   



B *,  *



*

1

(24)

where

 *  0   L  1

(25)

Kunihiro Suzuki

232

 *  0   L  1

(26)

The average and variance of  

 2 

 denoted as  and  2 are given by

*   * *



(27)

 * * *



 * 1 *  *



2

(28)

3.3. Dirichlet/Dirichlet (Multinomial)/Dirichlet Distribution The binomial distribution is extended to a multinomial distribution where the event is more than two. In general, we need to consider multiple events of more than two kinds. Let us consider m kinds of events and a corresponding probability for

P  X1  x1 , X 2  x2 , , X m  xm  f  x1 , x2 ,

, xm  

is given by

n! p x1 p x2 x1 ! x2 ! xm !

p xm

(29)

where p1  p2 

 pm  1

(30)

x1  x2 

 xm  n

(31)

We then convert this probability distribution to the one for the probability to obtain i

f  D i  D   x1 x2 xm  under the data , which is the likelihood function denoted as . We then obtain the corresponding likelihood function, which is the Dirichlet distribution given by f 1 , 2 ,

, m D  

  

 1    2 

  m 

1 1 2 1

2 1

 m

m 1

(32)

A Bayes’ Theorem for Predicting Population Parameters

233

where

i  xi  1

(33)

We use a Dirichlet distribution as the priori probability distribution, which is given by  1 , 2 ,

, m  

  0 

 10    20 

  m 0 

1

 2

10 1

 m

20 1

m 0 1

(34)

We obtain a posterior distribution as

 1 ,2 , ,m D   1 12 1

 1

m

2 1

1 10 1 1

2

 110 1220 1

m 1

 1

m

2  20 1

m

m 0 1

 1

m  m 0 1

(35)

Finally, we obtain  1 , 2 ,

, m D  

 

* 1

  * 

    * 2

 

* m



1 1 2 * 1

* 2 1

 m

* m 1

(36)

where

 i*   i   i 0  1

(37)

3.4. Gamma/Poisson (Gamma)/Gamma Distribution We also study an example for the Bernoulli trials and do not evaluate the population ratio, but the average number that the target event occurs where the probability that one event occurs is quite small. In this situation, we can regard that the likelihood function follows the Poisson distribution. We assume that we obtain an event x times target events among n trials, and the probability that we obtain a target event is as small as p . The corresponding probability f x with respect to x   is a Poisson distribution given by

f  x 

 x  e x!

(38)

Kunihiro Suzuki

234 where

 is the average target event number and is given by

  np

(39)

We then convert this probability distribution function to the one for the probability to obtain the average target event number function denoted as

f D  

 under the data of

Dx

, which is the likelihood

. Since the factor x! does not include the variable parameter

 for the likelihood function, it should be changed so that the total probability with respect to  is one. That is, we denote the factor as K . We then obtain the likelihood function as f  D    K x e We can decide is given by

K

(40) by imposing that the total integration of the probability is one, which





0

0

 f  D  d  K 

 x e d

1

(41)

We then obtain the corresponding likelihood function. It is a Gamma distribution and is given by

f D   

 n 1e  L

L

  nL 

(42)

where nL  x  1

(43)

L  1

(44)

  n

is a Gamma function given by 

  n     n1e d 0

(45)

A Bayes’ Theorem for Predicting Population Parameters We usually use a Gamma distribution which is given by    

  

235

as the priori probability distribution,

0n  n 1e    0

0

  n0 

(46)

It is not so easy to decide the priori probability parameters of  0 and  0 . It is convenient to decide the corresponding average and standard deviation and convert them to  0 and  0 . The relationship between the average and variance and parameters given by

0 

 02 

n0

0

(47)

n0

02

(48)

We then obtain

0  02

(49)

02 n0  2 0

(50)

0 

We obtain the posterior probability distribution as

  D   f  D        nL 1e L  n0 1e 0 

n0  nL 1 1  0  L 

e

Finally, we obtain the posterior distribution as

(51)

Kunihiro Suzuki

236

  D  

 n 1e   *

*

 

 n*

(52)

where

n*  n0  nL  1

(53)

 *  0  L

(54)

The average and variance are given by



2 

n*

*

(55)

n*  *2

(56)

3.5. Normal /Normal/Normal Distribution If we obtain a data set of x1 , x2 , , xn , we want to predict the average of the set. We only know that the data follow the normal distributions. The corresponding probability that we obtain D  x1, x2 , , xn is given by f  D 

  x   2  1   x   2  exp  1 2  exp   2 2  2 2   2 2   2 2 1

  x   2  exp   n 2  2   2 2 1

(57) 2 We assume that we know the variance  , but do not know the average. We therefore

replace  as f  D 

 . Then the probability is expressed as   x   2  1   x   2  exp   1 2  exp   2 2  2 2   2 2   2 2 1

  x   2  exp   n 2  2   2 2 1

(58)

A Bayes’ Theorem for Predicting Population Parameters

237

We then convert this probability distribution function to the one for probability to obtain the average

f D  

 under the data

D

, which is the likelihood function denoted as

 for the likelihood function, it should be changed so that the total probability with respect to  is . Since the factor 1

2 2 does not include the variable parameter

one. That is, we denote the factor as K and obtain the likelihood function as   x1   2    x2   2  f  D    K exp    exp    2 2  2 2      n  1 2  K exp   2   xi      2 i 1 

  xn   2  exp    2 2   

n  1  n   K exp   2   xi 2  2  xi  n 2   i 1   2  i 1  n 1 n   n   K exp   2 2 x   2  2 x  exp   2  xi 2   2   2 n i 1 









 n 1 n  2  n   K exp   2   x   x 2  exp   2  xi 2  2  2  n   i 1        x 2   nx 2   n 1 n 2  K exp   2  xi  exp  2  exp    2  2 n i 1   2   2   n       x 2   K 'exp    2  2   n 

(59)

where

 nx 2   n 1 n  K '  K exp   2  xi 2  exp  2   2 n i 1   2  We can decide the which is given by

K'

(60)

by imposing that the total integration of the probability is one,



 f  D  d  1 0

(61)

Kunihiro Suzuki

238

Therefore, we obtain a corresponding likelihood function as

f D   

   L 2  exp   2 L2   2 L2  1

(62)

where

 L2 

2 n

L  x 

(63) 1 n  xi n i 1

(64)

We then assume the priori distribution of

   

   0 2  1 exp    2 02  2 0  

(65)

The corresponding posterior distribution is given by   D   f  D         0 2      L 2   exp   exp    2 02  2 L2      2   0  L    2    2     0  L   1 1     2   2     0 L    exp            1    2  1  1    2 2    L   0  

(66)

Therefore, we obtain the posterior distribution as   D  

    * 2   exp    2 *2  2 *   1

(67)

A Bayes’ Theorem for Predicting Population Parameters where 1 1 1  2 2 *2  0  L  0

*  



2 0



239

(68)

 L  *2   L2 

(69)

3.6. Inverse Gamma/Normal (Inverse Gamma)/Inverse Gamma Distribution In the previous section, we studied how to predict the average of a set, and we assumed that we know the variance of the set. We study a case where we do not know the variance of a set and know the average  . If we obtain a data set of x1 , x2 , , xn , we want to predict the variance of the set. We only know that the data follow the normal distributions. The corresponding probability that we obtain D  x1, x2 , , xn is given by f  D 

  x   2  1   x   2  exp   1 2  exp   2 2  2 2   2 2   2 2 1

  x   2  exp   n 2  2   2 2 (70) 1

2 We regard that we know the average  , but do not know the variance  . We 2 therefore replace  as

f  D 

  

  

  

  

 . Then the probability is expressed as

  x   2  1   x   2    x   2  1 1 exp   1 exp   2 exp   n    2 2 2 2 2   2      n 2  n    xi     1   n2 i 1    exp   2 2      n  2 2  n   xi  2 xi    1   n2 i 1    exp   2 2      n   n  2  2 x    xi 2  n n     1  1   1  i 1 2   exp     2 2      n   2 2 2  n n   n    x   x    xi 1  1   2 1 1 i 1  exp     2  2    









(71)

Kunihiro Suzuki

240 where x

1 n  xi n i 1

(72)

Therefore, we obtain a corresponding likelihood function as

   exp   L       nL 

 n

L 1

f D   

(73)

where

nL 

n 1 2

(74)

1

n



L  n    x   x 2    xi 2   i 1  2  2

(75)

This is the inverse Gamma distribution. 2 We assume a priori distribution for variance  given by

0

   

 0     

0 n   n 1 Exp   0

  n0 

(76)

It is not so easy to decide the priori probability parameters of n0 and 0 . It is convenient to decide the corresponding average and standard deviation and convert them to n0 and 0 . The relationships between the average and variance and parameters are given by

0 

0 n0  1

(77)

A Bayes’ Theorem for Predicting Population Parameters

 02 

241

0 2

 n0  1  n0  2  2

(78)

We then obtain

n0  2 

02  02

(79)



0  0 1  

02    02 

(80)

The corresponding posterior distribution is given by

  D   f  D      L n   n L





  exp   L     nL 

L 1

 n0  nL 1 1

    n0  n0 1 Exp   0   0        n0 

   L  exp   0   

(81)

Therefore, we obtain the posterior distribution as *

  D  

 *     

  n 1 exp  

 

 n*

(82)

where

n*  n0  nL  1

(83)

 *  0  L

(84)

Kunihiro Suzuki

242

The average and variance are given by

* n*  1



 *2 

(85)

*

n

 n

1

*

2

*

2



(86)

4. PROBABILITY DISTRIBUTION WITH TWO PARAMETERS In the previous section, we show how to decide one parameter in a posterior probability distribution. In the analysis, we select a convenient distribution to obtain a rather simple posterior probability distribution with one parameter, which is called as conjugate distributions. However, we want to treat distributions with many parameters in general. We need to obtain the procedure to obtain many parameters. We show the treatment for two parameters in this section. We will extend the treatment for many parameters of more than two in the later section. We treat two parameters of an average 1 and a variance 2 . The posterior distribution for the average is given by

    * 2  1   1 D   exp   *2 *   2 2   1

(87)

where 1



*2



1



2 0

 0

*  



 L2 

2 0





1

 L2

(88)

 L  *2   L2 

(89)

2 n

(90)

L  x 

A Bayes’ Theorem for Predicting Population Parameters

243

1 n  xi n i 1

(91)

The posterior distribution for the variance is given by

 *    2 

  n 1 exp   *

  2 D  

  n* 

(92)

where

n*  n0  nL  1

(93)

 *  0  L

(94)

nL 

n 1 2

1

(95) n



L  n 1  x   x 2    xi 2   i 1  2  2

The corresponding schematic distribution is shown in Figure 4.

Figure 4. Probability distribution with two parameters of 1 and 2 .

(96)

Kunihiro Suzuki

244

We need to decide the parameters of 1 and 2 where they are constant numbers in one equation although they are variables in the other equations. We show various methods to decide them in the following sections. To perform concrete calculation, we use a data set as shown in Table 3. The data 2 number is 30, and the sample average x and unbiased variance s are given by

30

x   xi i 1

 5.11

(97)

1 30 2  xi  x   30  1 i 1  4.77

s2 

(98)

where xi is data value in the table. Table 3. Data for evaluating two macro parameters of a set D ata ID 1 2 3 4 5 6 7 8 9 10 Average Variance

D ata 6.0 10.0 7.6 3.5 1.4 2.5 5.6 3.0 2.2 5.0

D ata ID 11 12 13 14 15 16 17 18 19 20

D ata 3.3 7.6 5.8 6.7 2.8 4.8 6.3 5.3 5.4 3.3

D ata ID 21 22 23 24 25 26 27 28 29 30

D ata 3.4 3.8 3.3 5.7 6.3 8.4 4.6 2.8 7.9 8.9

5.11 4.77

4.1. MCMC Method (A Gibbs Sampling Method) A Gibbs sampling method gives us the parameter values with numerical iterative method by generating a random number related to the posterior probability distribution. We write again the posterior distribution for average given by

A Bayes’ Theorem for Predicting Population Parameters



 0

 1 D, 2



    * 2  1 1   exp   *  2 *2  2  

245

(99)

where 1

 *2



1

 02

 0

*  



 L2 

2 0





1

 L2

(100)

 L  *2   L2 

(101)

2 n

L  x 

(102) 1 n  xi n i 1

(103)

We should set the parameters as below.

0  x  5.11

s2 n 4.77  30  0.16

(104)

 02 

(105)

L  x  5.11 We should also set 2 for evaluating 1 . We set its initial value as

(106)

Kunihiro Suzuki

246

 2 0  s 2  4.77

(107)

Using the distribution function, we generate a random number for 1 which is denoted as

1 0 and is expressed by





11  rand  Inv _  1 D, 2 0  

 5.00



(108)

Eq.(108) expresses the random number based on the probability function of

 1 D, 2 0

 . The procedure to obtain the random number based on a function is

described in Chapter 14 of volume 3.

Figure 5. Probability distribution with two parameters of 1 and 2 . (a) 2 is set and 1 is generated. (b) 1 is set and 2 is generated.

A Bayes’ Theorem for Predicting Population Parameters

247

We can generate the random number associated with a certain probability distribution P   P  using a uniform random number as follows. We integrate   , and form a function F   as 

F     P   d

(109)

0

We generate the uniform random number, and obtain a corresponding shown in Figure 6.

 , which is

Figure 6. Random number generation related to the probability distribution of P   using inverse function method.

The posterior probability distribution for the variance is then given by *





  2 D,11 

 *    2 

 2  n 1 exp     n* 

(110)

where

n*  n0  nL  1

(111)

 *  0  L

(112)

nL 

n 1 2

(113)

Kunihiro Suzuki

248

1

n



L  n 1  x   x 2    xi 2   i 1  2  2

n We should set 0 and

(114)

0 . We assume that an average of the variance 0 as

0  s 2

(115)

and a variance of the variance as

 02  s 4

(116)

Therefore, we obtain

n0  2 

02  02

3

(117)



0  0 1 

02    02 

  4.77  2  9.54

(118)

We can further obtain

n 1 2 30  1 2  14

nL 

(119)

n*  n0  nL  1  3  14  1  18

(120)

A Bayes’ Theorem for Predicting Population Parameters n

x i 1

i

2

249

 920.6 (121)

n 1   2 2 L  n 1  x   x   xi 2    2 i 1 





1   1 n 1  x 2  





2

n   x 2    xi 2   i 1 



1 2 30  5.07  5.11  5.112   920.6   2  457.769



(122)

Using the distribution function, we generate a random number for 2 based on the distribution function and denote it as

 21 which is given by





 21  rand  Inv _   2 D,11  

 4.19

(123)



 1 D, 2  1



 1

We then move to the updated where 2 is used. We repeat the above process many times. In the initial stage of the cycle, the results are influenced by the initial values. Therefore; we ignore the first 1000 data for example. The other data express the distribution 2 for  and  . We then perform the Gibbs sampling 20000 cycle times, and delete the first 1000 cycle times data, and obtain the results as shown in Figure 7. The corresponding average and variance values are given by

  1  5.11    2  4.61

(124)

250

Kunihiro Suzuki

(a)

(b) Figure 7. Gibbs sampling. (a) Average, (b) Variance.

4.2. MCMC Method (A MH Method) In the previous section, we obtained the parameters of posterior probability distributions by generating random numbers related to posterior distributions. However, the posterior distribution changes every time, and we should evaluate functions to generate the random number related to the posterior probability distribution. A Metropolis-Hastings (MH) algorithm method gives us almost the same results without using the generation of random numbers related to the posterior probability distribution. We utilize detailed balance conditions as follows.

A Bayes’ Theorem for Predicting Population Parameters We define a probability distribution

P  xi 

251

where we have a status of xi . We also

W x  xi 1  define a transition function  i , that expresses the transition probability where the status changes form xi to xi 1 . The detailed balance condition requires the followings.

P  xi W  xi  xi 1   P  xi 1 W  xi 1  xi 

(125)

P x P x If the probability  i  is larger than  i 1  , the transition from the status xi to the status xi 1 is small and vice versa. In the MH algorithm, the above process is expressed

simply. The transition occurs significantly when transition occurs little when detail.

P  xi 

is larger than

P  xi 

is smaller than

P  xi 1 

P  xi 1 

, and the

, which is shown below in more

Figure 8. Transition probability W . The length qualitatively expresses the strength of the transition. (a) The probability gradient is positive. (b) The probability gradient is negative.

Kunihiro Suzuki

252

Figure 8 shows the schematic transition probability. Two cases are shown, one is the probability gradient is positive (a), and negative (b). We select a next step using random number, and hence xi 1 have a chance of both sides of the location of xi . We decide whether the transition occurs or not. We define the ratio r given by

r

P  xi 1  P  xi 

(126)

When r is larger than 1, we always change the status from xi to xi 1 . When r is smaller than 1, we generate a random number q which is uniform one with the range of

 0,1 . We then compare r with q .

When r is larger than q , we change the status from xi to xi 1 . When r is smaller than q , we do not change the status. The above process ensures that we stay long in the status where the corresponding probability is high and vice versa. We utilize the above process in the MH algorithm shown below. We use the same posterior probability distribution given by



 0

 0

 1 D, 2





   0   * 1 1  exp   *  2 *2 2 

where we set the initial values for



2

   

(127)

1 0  and  2 0  as

1 0  x  5.11

(128)

 2 0  s 2  4.77 The related parameters are given by

(129)

A Bayes’ Theorem for Predicting Population Parameters 1





*2

1



2 0

 0

*  



2 0

1





253

 L2

(130)

 L  *2   L2 

(131)

0  x  5.11

(132)

s2 n 4.77  30  0.16

 02 

(133)

L  x  5.11

(134)

 2 0  n  2 0   30

(135)

 L2 

Using the parameters above, we can evaluate the posterior probability associated with the average of Eq. (127). The posterior distribution associated with the variance is given by



 0

 0

  2 D,1



   2 0

 *  exp    0    2  *  n

 n* 1

 

(136)

where

n*  n0  nL  1

(137)

  0  L

(138)

*

Kunihiro Suzuki

254

nL 

n 1 2

1

(139)



L  n  1 0  x 2 



2

n   x 2    xi 2   i 1 

(140)

We should set n0 and 0 . We assume that an average of the variance 0 as

0  s 2

(141)

and a variance of the variance as

 02  s 4

(142)

Therefore, we obtain

n0  2 

02  02

3

(143)



0  0 1 

02    02 

  4.77  2  9.54

(144)

We can further obtain

n 1 2 30  1 2  14

nL 

(145)

A Bayes’ Theorem for Predicting Population Parameters

255

n*  n0  nL  1  3  14  1  18 n

x

2

L 

1    0 n 1  x 2  

i 1

i



(146)

 920.6

(147)









2

n   x 2    xi 2   i 1 





2 1 30  1 0  5.11  5.112   920.6   2

(148)

Using the parameters above, we can evaluate the distribution function associated with the variance. We then generate increments of the parameters using random numbers related to

 N  0,   N 0,  normal distributions of , and    . 1 and   are variances for the average and variance, respectively. We set the steps as 1

  1

1 0  

(149)

1

 2 0    2

(150)

2

1

and  are constant numbers and we use 10 for this case. We then generate the tentative next values as

1t  1 0  rand  Inv _ N  0,    

(151)

 2t   2 0  rand  Inv _ N  0,   

(152)

1

2

We then evaluate the two ratios given by

Kunihiro Suzuki

256

p1 



 1t D, 2 0



 1 0 D, 2 0



  2t D,1 0

p2 

If

p1



  2 0 D,1 0

and

p 2



(153)





(154)

are larger than 1, we set

11  1t

(155)

21  2t

(156)

when that is,

p1

or

p 2

q  rand  rand 

q



with

is smaller than 1, we generate a uniform random number denoted as q ,



(157)

 means that it is a uniform random number between 0 and 1. We then compare

p1

and

1   1  1t  1  0  1  1

  1   t  1   2   2t  1  0     1  0  2  2

p 2

, and decide next step parameters as follows.

for p1  q for p1  q

(158)

for p2  q for p2  q (159)

A Bayes’ Theorem for Predicting Population Parameters

257

We repeat the process many times. Figure 9 shows the schematic parameter generation in the MH algorithm method.

Figure 9. Parameter generation for a MH algorithm method.

In the initial stage of the cycle, the results are influenced by the initial values for  2 and  . Therefore, we ignore the first 1000 data for example. The other data express the

distribution for  and  . The corresponding average and variance values are given by 2

 1  5.19   2  4.76 Figure 11 shows the algorithm flow for the MH method for one parameter.

Figure 10. A Metropolis-Hastings (MH) algorithm method. (a) Average, (b) Variance.

(160)

258

Kunihiro Suzuki

Figure 11. Metropolis-Hastings algorithm flow.

4.3. Posterior Probability Distribution with Many Parameters We can easily extend the above procedures described in the previous section to many parameters of more than two. The l parameters are denoted as 1 ,2 , ,l . We need to know the posterior probability distribution for each parameter where the other parameters are included as constant numbers. The posterior distributions are then expressed as

A Bayes’ Theorem for Predicting Population Parameters

 1 D, 2 ,3 ,

,l 

  2 D,1 ,3 ,

,l 

 i D,1 , 2 ,

,i 1 ,i 1 ,

 l D,1 , 2 ,

,l 1 

259

,l 

(161)

In the Gibbs sampling method, we use the same posterior probability distributions, and

 ,  , , l

also set initial value for 2 3 number for the parameter values as

 1  1  1  2    1 i     1 l  

where

as

20 ,30 , ,l0 . We then generate the random

 

 



,i 11 ,i 1 0 ,



,l 11  

 rand  Inv _  1 D, 2 0  ,3 0  , ,l  0   rand  Inv _   2 D,11 ,3 0  , ,l  0    rand  Inv _  i D,11 , 21 ,   rand  Inv _  i D,11 , 21 , 

rand  Inv _    



,l  0  



means the generating random number for

(162)

 based on the

  function   . We then repeat the process. In the MH MC method, we use the same posterior probability distributions, and also

set initial value for

1 ,2 ,3 , ,l as 10 ,20 ,30 , ,l0 . We should also set the step for

 , , ,

l each parameters as 1  2 . We generate tentative parameters as

1t  1 0  rand  Inv _ N  0,     1

(163)

Kunihiro Suzuki

260

 2t   2 0  rand  Inv _ N  0,   

(164)

lt  l 0  rand  Inv _ N  0,    

(165)

2

l

We then evaluate the ratios given by



p1 

pl 

,l  0

 1 0 D, 2 0 ,3 0 ,

,l  0





p2 



 1t D, 2 0 ,3 0 ,

,l  0

  2 0 D,1 0 ,3 0 ,

,l  0



 lt D,1 0 , 2 0 ,



 l

 0

 0

 0

D, 1 ,  2 ,

, l 1 0 



(167)



0

, l 1

(166)



  2t D,1 0 ,3 0 ,







(168)

q   as 1

We further generate a random number

q1  rand 



(169)

which is a uniform random number between 0 and 1. We decide the updated parameters as below. We consider i-th parameter in general. If the i-th parameter is larger than 1, we update the value as

i1  it

When

q   , and set it when it is larger 1

is smaller than 1, we then compare it with

q   , and set i 1

than

pi

(170)

0

q   . That is, it is summarized as 1

when it is smaller than

A Bayes’ Theorem for Predicting Population Parameters 1

i

 it  0    i

261

for pi  1 or pi  q 1 for pi  q 1

(171)

We repeat the above process and obtain 1 , 2 ,3 , ,l . In the MH MC method, we need not posterior probability distributions of each parameter, but it is sufficient that we have one posterior probability distribution that includes whole parameters of 1 , 2 ,3 , ,l as

g   D;1 ,2 ,3 , ,l 

(172)

We can then regard the posterior probability distribution for each parameter as  1 D, 2 ,3 ,

,l   K1 g  D;1 , 2 ,3 ,

,l 

  2 D,1 ,3 ,

,l   K 2 g  D;1 , 2 ,3 ,

,l 

 i D,1 , 2 ,

,i 1 ,i 1 ,

 l D,1 , 2 ,

,l 1   Kl g  D;1 , 2 ,3 ,

,l   Ki g  D;1 , 2 ,3 ,

,l 

,l 

(173)

where the factor Ki is the normalized factor which is decided as we regard the parameter i as a variable while the others as constant numbers. However, we do not care about the value of it when we evaluate the ratio. We then apply the same procedures.

4.4. Hierarchical Bayesian Theory The shape of the posterior distribution is limited to the function we use for likelihood and priori distributions. We can improve the flexibility for the shape using a hierarchical Bayesian theory. We consider a case where 20 members perform an examination as shown in Figure 12. The examination consists of 10 problems, and the members select an answer in each problem. The score of each member is the number of the correct answer, which is denoted as xi for a member

i . Therefore, the value of xi is then between 0 and 10.

Figure 12 shows the assumed obtained data. The average 0 and the standard deviation  0 of the examination are given by

Kunihiro Suzuki

262

0  5.85

(174)

 0  3.81

(175) 7 Data Normal Binominal HBayes

Frequency

6 5 4 3 2 1 0

0

2

4

Score

6

8

10

Figure 12. Score distribution and expected value.

If we assume that the member number of the scores follows the normal distribution, and the member number of score k is given by

N k  

  k  0 2 1 exp    2 0 2 2 0 

   

(176)

This can be modified as

  k  0  2   exp    2 0 2    N  k   20  10   i  0  2   exp     2 0 2  i 0  

(177)

The score distribution is far from the bell shape ones, which cannot be reproduced with the normal distribution as shown in Figure 12. We try to explain the distribution with a hierarchical Bayes theory using a binominal distribution.

A Bayes’ Theorem for Predicting Population Parameters

263

The probability that a member obtain a correct answer for each problem is assumed to be

 , and we have a corresponding the likelihood function given by f  D    n Cx1 x1 1   

n  x1 n

Cx2 x2 1   

n  x2 n

Cx20 x20 1   

  X 1   

n  x20

1020  X

(178)

where 20

X   xi i 1

 117

(179)

The peak value of the likelihood function can be evaluated from

f  D   

  X 1         X 1 1   

200  X

200  X 1

   X 1      200  X  

0

(180)

We therefore obtain the peak value of the likelihood function at

X 200  0.585

0 

(181)

The expected number for score k is then given by

N  k   20  10 Ck0 k 1  0 

10  k

(182)

Figure 12 compares the data with the evaluated values. The expected values deviate significantly. It is obvious that we apply the same probability 0 to any member. This treatment is valid if it is such kind of coin toss with the similar coin. This treatment cannot explain the data such as the figure.

Kunihiro Suzuki

264

We assume that the peculiar shape of the distribution of Figure 12 comes from the various skill of the person. How can we include the different skills? We regard that each member comes from his own group with different macro parameters of

 , which is shown schematically in Figure 13.

Figure 13. Schematic treatment for a hierarchical Bayes theory. We regard that each data comes from different group and each group is controlled by the priori distribution.

We can add the variety of the shape of the distribution by changing Therefore, we modify the likelihood function as

f  D 1 ,  2 ,

,  20 

 n C x11x1 1  1   1x1 1  1 

 in each trial.

n  x1

n  x1 n

C x2 2 x2 1   2 

 2 x 1   2  2

n  x2

n  x2 n

C x205 x20 1   20 

5 x 1   20  20

n  x20

n  x20

(183)

It should be noted that we now have undecided parameter number of 20 from 1. We can do anything with this method. However, we add some constraint to the functions. The distribution of i is controlled by the priori distribution. A parameter i defined in the range of 0 and 1, and it is frequently converted as

   ln  i      i  1  i  We then have

(184)

A Bayes’ Theorem for Predicting Population Parameters

i 

1 1  exp      i 

265

(185)

Figure 14 shows the relationship between  and

 with various  .  increases

monotonically with increasing  and decreases with increasing  . Note that the definition ,  range for  is  , which is available for a normal distribution.

1.0 0.8

 = -1 =0 =1



0.6 0.4 0.2 0.0 -10 Figure 14. Relationship between  and

-5



0 

with various

5

10

.

We assume the normal distribution for  i as

  i   

 2  1 exp   i 2  2  2 

(186)

where  is also an undecided parameter. We introduced a parameter  and it is assumed to be constant although the value is not decided at present. Therefore, we do not define probability distribution for  . Therefore, we assume the priori distribution for  i as  2  1  2  1 exp   1 2  exp   2 2  2  2  2  2 

The posterior distribution is given by

 2  1 exp   202  2  2 

(187)

Kunihiro Suzuki

266

 2  1 exp   1 2  2  2   2  1 n x  10 C x2 2 x2 1   2  2 exp   2 2  2  2  x1 10 C x11 1  1 

n  x1

 10 C x20 20 x20 1   20 

n  x20

 2  1 exp   202  2  2 

(188)

The probability associated with the score k is evaluated as P  k   10 Ck k 1   

nk

 2  1 exp   2  2  2  10  k

k

1 1      10 Ck  1          1  e 1  e    

Figure 15 (a) shows the dependence of

 2  1 exp   2  2  2 

Pk 

(189)

on  with two standard deviations of

 . The peak position of  increases with increasing k , and the profile becomes broad with increasing  . P k Figure 15 (b) shows the dependence of   on  with various  . The profile is symmetrical with respect to k with   0 , and the probability with higher k is larger with positive  , and vice versa. 0.06

=0

k : from 0 to 10 step 2

0.08

=3 = 6

0.05

=3

k : from 0 to 10 step 2

=0 =1  = -1

k=0

Probability

Probability

0.06

0.04

k = 10

0.03 0.02

0.04 k=0

k = 10

0.02

0.01 0.00 -10

-5

0  (a)

5

10

0.00 -10

-5

0  (b)

5

Figure 15. Dependence of probability for number k on  . (a) Two standard deviations  . (b) Various

.

10

A Bayes’ Theorem for Predicting Population Parameters

267

Neglecting the factors which are not related to the parameters, we obtain the function given by 20

L  ,     Li i 1 20

 i xi 1  i 

n  xi

i 1

 2  exp   1 2    2  1

20

  1  exp      i  

 xi

i 1

1  exp     i  

 n  xi 

 2  exp   1 2    2 

(190)

 2  exp   1 2    2 

(191)

1

where Li   i ,  ,    i xi 1  i 

n  xi

 2  exp   1 2    2  1

 1  exp      i 

 xi

1  exp     i 

 n  xi 

1

We first decide parameters that do not depend on member, which are  and  . We eliminate parameters of  i as









L1   1 ,  ,  d  1  L2  2 ,  ,  d  2 

 Q1   ,   Q2   ,  

Q20   ,  







L20  20 ,  ,  d  20 (192)

We perform the logarithm for the equation given by

G  ln Q1   ,     ln Q2   ,    

 ln Q20   ,  

(193)

We can obtain the values for  and  that maximize G using a solver in a software such as Excel in Micro Soft, and they are denoded as  0 and  0 . In this case, we obtain

  0  0.86   0  3.08

(194)

Kunihiro Suzuki

268

We can then perform the MHMC method and obtain the parameter values of  i as shown in Table 4 where we evaluated 5000 cycle times data and select the last 4000 data. The expected score values are evaluated as

N  k   20 

1 20 10 k k  10 Cki 1  i  20 i 1

(195)

which is shown in Figure 12. The expected values well reproduces the data. Table 4. Evaluated parameter values Data ID γ θ Score

1 2 -3.22 -4.87 0.09 0.02 1 0

3 4 3.80 -1.22 0.99 0.41 10 4

5 3.07 0.98 10

6 1.99 0.95 10

7 8 9 3.64 -0.28 -1.30 0.99 0.64 0.39 10 6 4

10 11 2.42 -2.76 0.96 0.13 10 1

12 13 14 1.32 -3.93 -0.97 0.90 0.04 0.47 9 0 5

15 16 17 4.92 -0.10 -2.80 1.00 0.68 0.13 10 7 1

18 19 2.57 -2.06 0.97 0.23 9 2

20 0.59 0.81 8

SUMMARY To summarize the results in this chapter— We obtain data x , and the population parameter is denoted as  . We modify the Bayes’ theory as

P  H D 

PD H  PH  P  D

   D  

f  D      P  D

 Kf  D     

  where K is a normalized factor. We call   as the priori probability distribution which

f D  

depends on a macroscopic parameter or constant.

where we have data D under the macroscopic value of probability distribution. There are four convenient combinations for conjugate functions, and

  D 

  

are the as follows.

,

is the likelihood probability

 .   D  is the posterior f D  

, which is called as

A Bayes’ Theorem for Predicting Population Parameters

269

Beta/Binominal (Beta)/Beta Distribution We assume the probability where we obtain x target events among n trials, and the corresponding likelihood function is given by f D   

 1

  L 1 1    L   L ,  L 

where

L  x  1

L  n  x  1 The priori probability distribution for this case is given by

    Be  0 , 0  

1  1  0 1 1    0 B  0 , 0 

The average and variance for the priori distributions are related to the parameters as 0 

02 1  0   0  02

0 

 1  0  02  2 1  0   0  0   0 

The posterior function is given by

  D  



*

1

1   

 * 1

B  * ,  * 

where

 *  0   L 1

Kunihiro Suzuki

270

 *  0   L  1 The average and variance of

 

 2 

 are denoted as  and  2 are given by

* *  *



 * * *



  * 1  *   *



2

Dirichlet/Multinomial (Dirichlet)/Dirichlet Distribution

m

In general, we need to consider multiple events of more than two kinds. Let us consider kinds of events and the corresponding probability for

P  X1  x1 , X 2  x2 , , X m  xm  f  x1 , x2 ,

, xm  

is given by

n! p x1 p x2 x1 ! x2 ! xm !

p xm

where p1  p2 

 pm  1

x1  x2 

 xm  n

The corresponding likelihood function is given by f 1 , 2 ,

, m D  

  

 1    2 

  m 

1 1 2 1

2 1

 m

m 1

where

i  xi  1 The priori probability distribution for this case is given by

A Bayes’ Theorem for Predicting Population Parameters  1 , 2 ,

, m  

  0 

 10    20 

  m 0 

1

 2

10 1

20 1

 m

271

m 0 1

Finally, we obtain the posterior distribution as  1 , 2 ,

, m D  

 

* 1

  * 

    * 2

 

* m



1 1 2 * 1

* 2 1

 m

* m 1

where

 i*   i   i 0  1

Gamma/Poisson(Gamma)/Gamma Distribution We assume that we obtain x times target events among n , and the average number we obtain is given by

  np where p is the probability that we obtain a target event in each trial and assumed to be small. The corresponding likelihood function is given by

f D   

 n 1e L

  nL 

where nL  x  1

The priori distribution    

  

is given by

0n  n 1e    0

0

  n0 

The average and variance for the priori distributions are related to the parameters as

Kunihiro Suzuki

272

0 

0  02

02 n0  2 0 The posterior distribution is given by

  D  

 n 1e  *

*

 

 n*

where

n*  n0  nL  1

 *  0  1

Normal/Normal/Normal Distribution If we obtained n data from a set as x1 , x2 , is given by

f D   

   L 2  exp   2 L2   2 L2  1

where

 L2 

2 n

L  x 

1 n  xi n i 1

The priori distribution is given by

, xn , the corresponding likelihood function

A Bayes’ Theorem for Predicting Population Parameters

273

   0 2  1 exp    2 02  2 0  

   

The corresponding posterior distribution is given by

    * 2     D   exp    2 *2  2 *   1

where 1



*2



1



 0

*  





2 0

2 0



1

 L2

 L  *2   L2 

Inverse Gamma/Normal (Inverse Gamma)/Inverse Gamma Distribution We extract a data set from a set as x1 ,x2 , ,xn , and the corresponding likelihood distribution is given by

 n

   exp   L       nL 

L 1

f D   

where

nL 

n 1 2

1

n



L  n    x   x 2    xi 2   i 1  2  2

The priori distribution is given by

Kunihiro Suzuki

274

0

   

 0     

0 n   n 1 Exp   0

  n0 

The average and variance for the priori distributions are related to the parameters as

n0  2 

02  02 

0  0 1  

02    02 

The posterior distribution is given by

 *     

  n 1 exp   *

  D  

 

 n*

where

n*  n0  nL  1

 *  0  L We can easily extend these procedures to obtain many macro parameters of a population set. We need to obtain posterior l distribution functions with l parameters, which is denoted as

 1 D, 2 ,3 ,

,l 

  2 D,1 ,3 ,

,l 

 i D,1 , 2 ,

,i 1 ,i 1 ,

 l D,1 , 2 ,

,l 1 

,l 

A Bayes’ Theorem for Predicting Population Parameters

275

In the Gibbs sampling method, the parameter values are updated by generating corresponding random numbers as



i  k 1  rand  Inv _  i D,1 k 1 , 2 k 1 , 

,  i 1 k 1 ,  i 1 k  ,



,l  k  ,i  

In the MH method, the parameter values are updated as follows. We generate  it as

it  i k   rand  Inv _ N 0,    i

We then evaluate the ratio

pi 



 it D,1 k  , 2 k  , ,i 1 k  ,i 1 k  , ,l  k 





 i  k  D,1 k  , 2 k  , ,i 1 k  ,i 1 k  , ,l  k 

We also generate uniform random number

q k 1  rand 

q



k 1

as



We then update the parameter as

 k 1

i

it  k   i

for pi  1 or pi  q  k 1 for pi  q  k 1

We can extend this theory to use various parameters in a likelihood function which is controlled with priori function, which is called as a hierarchical Bayes’ theory.

Chapter 8

ANALYSIS OF VARIANCES ABSTRACT We sometimes want to improve a certain objective value by trying various treatments. For example, we treat various kinds of fertilizer to improve harvest. We probably judge the effectiveness of the treatment by comparing corresponding average of the objective values of amount of harvest. However, there should be a scattering in the data values in each treatment. Even when we have a better average data, some data in the best treatment are less than the one in the other treatment. We show how we decide the effectiveness of the treatment considering the data scattering.

Keywords: analysis of variance, correlation ratio, interaction, studentized range distribution

1. INTRODUCTION In agriculture, we want to harvest grain much, and try various treatments and want to clarify effectiveness of various treatments. One example factor is kinds of fertilizers. We denote the factor as A and kinds of fertilizers as A j . We want to know the effectiveness of various kinds of fertilizers. We obtain various harvest data for each fertilizer. Although the average values have clear difference among the fertilizers, they are usually scattered in each fertilizers’ group. Analysis of variance is developed to decide validity of the effectiveness. Further, we need to consider more factors for the harvest such as temperature and humidity. Therefore, we need to include various factors in the analysis of the variances. We start with one factor, and then extend two factors.

Kunihiro Suzuki

278

2. CORRELATION RATIO (ONE WAY ANALYSIS) We study the dependence of three kinds of fertilizer A1 , A2 , and A3 on a grain harvest. We call the kinds of fertilizer as levels in general. We use 1 kg seed for each and harvest grain and check the weight with unit of kg. The data are shown in the Table 1. Table 1. Dependence of harvest on kinds of fertilizer

Average

A1 5.44 7.66 7.55 4.33 5.33 4.55 6.33 5.98 5.90

A:Fertilizer A2 7.33 6.32 6.76 5.33 5.89 6.33 5.44 8.54 6.49

A3 9.44 7.33 7.66 6.43 7.45 6.86 9.78 6.34 7.66

The data number for each level are given by

nA1  8, nA2  8, nA3  8

(1)

n  nA1  nA2  nA3  24

(2)

The corresponding averages are given by nA1

1  A1  nA1

x

1  A2  nA2

nA2

x

1 nA3

x

A  3

i 1

i 1

iA1

(3)

iA2

 6.49

(4)

nA3

i 1

 5.90

iA3

 7.66

(5)

Analysis of Variances

279

The total average is given by nA3 nA2  nA1    xiA1   xiA2   xiA3   6.68 i 1 i 1  i 1 

1  nA1  nA2  nA3

(6)

The average harvested weight was 5.90 kg for fertilizer A1 , 6.49 kg for fertilizer A2 , and 7.66 kg for fertilizer A3 . It seems that fertilizer A3 is most effective. Fertilizer A1 seems to be less effective. However, there is a datum for fertilizer A1 that is larger than the one for fertilizer A3 . How can we judge the effectiveness of levels statistically? We judge the variance associated with the level versus the variance associated with each data within each level. If the former one is larger than the latter one, we judge the level change is effective, and vice versa. Let us express the above discussion quantitatively. We evaluate the variance associated with the effectiveness of the level, which is 2

denoted as S ex , and is given by

S  2 ex



nA1  A1  



2



 nA2  A2  



2



 nA3  A3  

nA1  nA2  nA3



2

 0.53

(7)

We then consider the variance associated with the group in each level. It is denoted as 2 in

S , and is given by nA1

Sin2 

 i 1

xiA1   A1



2

nA2



  xiA2   A2 i 1



2

nA1  nA2  nA3

nA3



  xiA3   A3 i 1



2

 1.27

(8)

The correlation ratio  2 is given by 2 

Sex2  0.30 Sin2  Sex2

(9)

This indicates the significance of the dependence. If it is large, the effectiveness of changing fertilizer is large. It is also obvious that the correlation ratio is less than 1. However, we do not use this correlation ratio for the final effectiveness judge. The procedure is shown in the next section.

Kunihiro Suzuki

280

3. TESTING OF THE DEPENDENCE 2 We consider the freedom of S ex . We denote it as

ex . The variable in the variance is

 A1 ,  A2 ,  A3 . However, it should satisfy Eq. (6). Therefore, the freedom is the

three of

number of fertilizer kinds minus 1. Therefore the freedom

ex

is given by

ex  p  1  3  1  2

(10)

where p is the number of fertilizer kinds and is 3 in this case. Therefore, the unbiased 2

variance sex is given by sex2 

n

ex

Sex2  6.45

We consider the freedom of Sin2 . We denote it as

in . Each group has nA , nA , nA 1

2

3

data, but each group has their sample averages and the freedom is decreased by 1 for each group. Therefore, the total freedom is given by in   nA  1   nB  1   nC  1 n p  24  3  21

(11) 2

The corresponding unbiased variance is given by sin sin2 

n

in

Sin2  1.44

(12)

Finally, the ratio of the unbiased variance is denoted by F and is given by F

sex2  4.46 sin2

(13)

Analysis of Variances This follows the F distribution with a freedom of

281

ex ,in  . The P

point for the F

distribution with the predictive probability of 95% is denoted as FC ex , in , 0.95 , and is FC ex , in ,0.95  3.47

(14)

which is shown in Figure 1. Therefore,

F  FC , and hence the dependence is valid.

Figure 1. F distribution testing.

4. THE RELATIONSHIP BETWEEN VARIOUS VARIANCES We treat two kinds of variances in the previous section. We can also define the total variance with respect to the total average. We investigate the relationship between the variances. We assume that the level number is three and denoted as A , B , and C , and each level has the same data number of n . The total variance S 2 is constituted by summing up the deviation of each data with respect to the total average, and is given by nA1

Stot2 

 x i 1

iA1



nA2

   x 2

i 1

iA2



nA3

   x 2

i 1

iA3





2

n

Let us consider the first term of numerator of Eq. (15), which is modified as

(15)

Kunihiro Suzuki

282

 nA

i 1

xiA1  



2

nA1



  xiA1   A1   A1   i 1 nA1









  xiA1   A1 i 1 nA1

  xiA1   A1 i 1

2

nA1

2



 2 xiA1   A1 i 1

2





 nA1  A1  



 

A1

nA1





     A1   i 1



2

2

(16)

Similar analysis is performed for the other terms, and we obtain nA1



2 Stot 

i 1

xiA1   A1



2

nA2



  xiA2   A2 i 1

n





nA1  A1  



2



 nA2  A2  



2



2

nA3



  xiA3   A3 i 1



 nA3  A3  





2

2

n

S S 2 ex

2 in

(17)

Therefore, the total variance is the sum of the variance associated with various levels and one associated with each level. Therefore, the correlation ratio is expressed by

2 

Sex2 Sex2  2 Sin2  Sex2 Stot

(18)

5. MULTIPLE COMPARISONS In the analysis comparisons, we test whether whole averages are the same or not. When we judge whole averages are not the same, we want to know which one is different. One way to test it is to select one pair and judge the difference as shown in the previous section. We test each pair setting a prediction probability. The total results are not related to the setting prediction probability since they are not independent. We want to test them all pairs with a certain prediction probability. Turkey-Kramer multiple comparisons procedure is frequently used and we introduce it here. We compare the average of level i and j , and obtain the absolute value of the difference of  Ai   Aj , and form a normalized value given by

Analysis of Variances

zij 

A  A i

283

j

sin2  1 1     2  nAi nAj 

(19)

We compare the normalized value z to the studentized range distribution critical value of q  p, n  p, P  , where p is the level number, and n is the total sample number. The corresponding table for q  r, n  p, P  with P  0.95 and P  0.99 are shown in Table 2. The other simple way to evaluate the difference is the one with zi 

A   i

2 in

s 2

 1 1     n A  i n

(20)

We may be able to compare this normalized value to q  p, n  p, P  . We need to interpolate the value of q  r , n  r , P  from the table. Especially, the interpolation associated with n  r is important. For simplicity, we set k n p

(21)

The row number before the final is 120 in the table. If k is less than 120, we can easily interpolate the value of q  p, k , P  as q  p, k , P   q  p, i , P  

q  p, k , P   q  p, i , P 

 q  p, j , P   q  p, i , P   q  p, j , P   q  p, i , P  

(22)

Where i, j are the both sides of the row number that holds ik j

(23)

The difficulty exists in that the final row number is infinity and the number before the final is 120 in general. Therefore, we need to study how to interpolate where k  120 .

Kunihiro Suzuki

284 We introduce variables as 

1   k  120  ln   120 1 

(24)

Table 2. Studentized range distribution table for prediction probabilities of 0.95 and 0.99

Freedom of numerator

One sided P = 0.95 2 17.97 6.09 4.50 3.93 3.64 3.46 3.34 3.26 3.20 3.15 3.11 3.08 3.06 3.03 3.01 3.00 2.98 2.97 2.96 2.95 2.92 2.89 2.86 2.83 2.80 2.77

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 24 30 40 60 120 ∞

3 26.98 8.33 5.91 5.04 4.60 4.34 4.17 4.04 3.95 3.88 3.82 3.77 3.74 3.70 3.67 3.65 3.63 3.61 3.59 3.58 3.53 3.49 3.44 3.40 3.36 3.31

4 32.82 9.80 6.83 5.76 5.22 4.90 4.68 4.53 4.42 4.33 4.26 4.20 4.15 4.11 4.08 4.05 4.02 4.00 3.98 3.96 3.90 3.85 3.79 3.74 3.69 3.63

5 37.08 10.88 7.50 6.29 5.67 5.31 5.06 4.89 4.76 4.65 4.57 4.51 4.45 4.41 4.37 4.33 4.30 4.28 4.25 4.23 4.17 4.10 4.04 3.98 3.92 3.86

6 40.41 11.74 8.04 6.71 6.03 5.63 5.36 5.17 5.02 4.91 4.82 4.75 4.69 4.64 4.60 4.56 4.52 4.50 4.47 4.45 4.37 4.30 4.23 4.16 4.10 4.03

7 43.12 12.44 8.48 7.05 6.33 5.90 5.61 5.40 5.24 5.12 5.03 4.95 4.89 4.83 4.78 4.74 4.71 4.67 4.65 4.62 4.54 4.46 4.39 4.31 4.24 4.17

8 45.40 13.03 8.85 7.35 6.58 6.12 5.82 5.60 5.43 5.31 5.20 5.12 5.05 4.99 4.94 4.90 4.86 4.82 4.79 4.77 4.68 4.60 4.52 4.44 4.36 4.29

9 43.36 13.54 9.18 7.60 6.80 6.32 6.00 5.77 5.60 5.46 5.35 5.27 5.19 5.13 5.08 5.03 4.99 4.96 4.92 4.90 4.81 4.72 4.64 4.55 4.47 4.39

Freedom of denominator 10 11 12 49.07 59.59 51.96 13.99 14.39 14.75 9.46 9.72 9.95 7.83 8.03 8.21 7.00 7.17 7.32 6.49 6.65 6.79 6.16 6.30 6.43 5.92 6.05 6.18 5.74 5.87 5.98 5.60 5.72 5.83 5.49 5.61 5.71 5.40 5.51 5.62 5.32 5.43 5.53 5.25 5.36 5.46 5.20 5.31 5.40 5.15 5.26 5.35 5.11 5.21 5.31 5.07 5.17 5.27 5.04 4.14 5.23 5.01 5.11 5.20 4.92 5.01 5.10 4.82 4.92 5.00 4.74 4.82 4.90 4.65 4.73 4.81 4.56 4.64 4.71 4.47 4.55 4.62

13 53.20 15.08 10.15 8.37 7.47 6.92 6.55 6.29 6.09 5.93 5.81 5.71 5.63 5.55 5.49 5.44 5.39 5.35 5.32 5.28 5.18 5.08 4.98 4.88 4.78 4.68

14 54.33 15.38 10.35 8.53 7.60 7.03 6.66 6.39 6.19 6.03 5.90 5.80 5.71 5.64 5.57 5.52 5.47 5.43 5.39 5.36 5.25 5.15 5.04 4.94 4.84 4.74

15 55.36 15.65 10.53 8.66 7.72 7.14 6.76 6.48 6.28 6.11 5.98 5.88 5.79 5.71 5.65 5.59 5.54 5.50 5.46 5.43 5.32 5.21 5.11 5.00 4.90 4.80

16 56.32 15.91 10.61 8.79 7.83 7.24 6.85 6.57 6.36 6.20 6.06 5.95 5.86 5.79 5.72 5.66 5.61 5.57 5.53 5.49 5.38 5.27 5.16 5.06 4.95 4.85

17 57.22 16.14 10.84 8.91 7.93 7.34 6.94 6.65 6.44 6.27 6.13 6.02 5.93 5.85 5.79 5.73 5.68 5.63 5.59 5.55 5.44 5.33 5.22 5.11 5.00 4.89

4 164.30 22.29 12.17 9.17 7.80 7.03 6.54 6.20 5.96 5.77 5.62 5.50 5.40 5.32 5.25 5.19 5.14 5.09 5.05 5.02 4.91 4.80 4.70 4.60 4.50 4.40

5 185.60 24.72 13.33 9.96 8.42 7.56 7.01 6.63 6.35 6.14 5.97 5.84 5.73 5.63 5.56 5.49 5.43 5.38 5.33 5.29 5.17 5.05 4.93 4.82 4.71 4.60

6 202.20 26.63 14.24 10.58 8.91 7.97 7.37 6.96 6.66 6.43 6.25 6.10 5.98 5.88 5.80 5.72 5.66 5.60 5.55 5.51 5.37 5.24 5.11 4.99 4.87 4.76

7 215.80 28.20 15.00 11.10 9.32 8.32 7.68 7.24 6.92 6.67 6.48 6.32 6.19 6.09 5.99 5.92 5.85 5.79 5.74 5.69 5.54 5.40 5.27 5.13 5.01 4.88

8 227.20 29.53 15.64 11.55 9.67 8.61 7.94 7.47 7.13 6.87 6.67 6.51 6.37 6.26 6.16 6.08 6.01 5.94 5.89 5.84 5.69 5.54 5.39 5.25 5.12 4.99

9 237.00 30.68 16.20 11.93 9.97 8.87 8.17 7.68 7.32 7.06 6.84 6.67 6.53 6.41 6.31 6.22 6.15 6.08 6.02 5.97 5.81 5.65 5.50 5.36 5.21 5.08

Freedom of denominator 10 11 12 245.60 253.20 260.00 31.69 32.59 33.40 16.69 17.13 17.53 12.27 12.57 12.84 10.24 10.48 10.70 9.10 9.30 9.49 8.37 8.55 8.71 7.86 8.03 8.18 7.50 7.65 7.78 7.21 7.36 7.49 6.99 7.13 7.25 6.81 6.94 7.06 6.67 6.79 6.90 6.54 6.66 6.77 6.44 6.56 6.66 6.35 6.46 6.56 6.27 6.38 6.48 6.20 6.31 6.41 6.14 6.25 6.34 6.09 6.19 6.29 5.92 6.02 6.11 5.76 5.85 5.93 5.60 5.69 5.76 5.45 5.53 5.60 5.30 5.38 5.44 5.16 5.23 5.29

13 266.20 34.13 17.89 13.09 10.89 9.65 8.86 8.31 7.91 7.60 7.36 7.17 7.01 6.87 6.76 6.66 6.57 6.50 6.43 6.37 6.19 6.01 5.84 5.67 5.51 5.35

14 271.80 34.81 18.22 13.32 11.08 9.81 9.00 8.44 8.03 7.71 7.47 7.26 7.10 6.96 6.85 6.74 6.66 6.58 6.51 6.45 6.26 6.08 5.90 5.73 5.56 5.40

15 277.00 35.43 18.52 13.53 11.24 9.95 9.12 8.55 8.13 7.81 7.56 7.36 7.19 7.05 6.93 6.82 6.73 6.66 6.59 6.52 6.33 6.14 5.96 5.79 5.61 5.45

16 281.80 36.00 18.81 13.73 11.40 10.08 9.24 8.66 8.23 7.91 7.65 7.44 7.27 7.13 7.00 6.90 6.81 6.73 6.65 6.59 6.39 6.20 6.02 5.84 5.66 5.49

17 286.30 36.53 19.07 13.91 11.55 10.21 9.35 8.76 8.33 7.99 7.73 7.52 7.35 7.20 7.07 6.97 6.87 6.79 6.72 6.65 6.45 6.26 6.07 5.89 5.71 5.54

Freedom of numerator

One sided P = 0.99

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 24 30 40 60 120 ∞

2 90.03 14.04 8.26 6.51 5.70 5.24 4.95 4.75 4.60 4.48 4.39 4.32 4.26 4.21 4.17 4.13 4.10 4.07 4.05 4.02 3.96 3.89 3.83 3.76 3.70 3.64

3 135.00 19.02 10.62 8.12 6.98 6.33 5.92 5.64 5.43 5.27 5.15 5.04 4.96 4.90 4.84 4.79 4.74 4.70 4.67 4.64 4.55 4.46 4.37 4.28 4.20 4.12

The  is related to the value of q  p, k , P  as 

q  p, k , P   q  p,120 P 

q  p, , P   q  p,120, P 

(25)

Analysis of Variances

285

We can evaluate  from the thirst relationship as 

k  120 120

(26)

We can then obtain 1     ln   1 

(27)

Solving this equation with respect to  , we obtain



e  1 e  1

(28)

Comparing Eq. (28) with Eq. (25), we obtain an interpolate value as q  p, k , P   q  p,120, P  

e  1 q  p, , P   q  p,120, P  e  1

(29)

6. ANALYSIS OF VARIANCE (TWO WAY ANALYSIS WITH NON-REPEATED DATA) We consider two factors of A and B . For example, factor A corresponds to the kinds of fertilizers, and factor B corresponds to the temperature. We assume the factor A has four levels of A1, A2 , A3 , A4 , denoted as nA , and the factor B has three levels of B1 , B2 , B3 denoted as nB , where nA  4 and nB  3 . We assume that the data is only one, that is, non-repeated data as shown in Table 3. The total data number n is given by

n  nA  nB

(30)

In this case, each data xij can be expressed by the deviation from the average, and is given by

Kunihiro Suzuki

286



 



xij     Aj    Bi    eij

(31)

This can be modified as



 



xij     Aj    Bi    eij

(32)

The corresponding freedom relationship is given by

tot  A  B  e

(33)

where

tot  n  1  12  1  11

(34)

 A  nA  1  4 1 3

(35)

B  nB  1  3 1 2

(36)

We can obtain the freedom associated with the error and it is given by

e  tot   A  B    n  1   nA  1  nB  1  n   nA  nB   1  12   4  3  1 6

The discussion is almost the same as that for one-way analysis.

(37)

Analysis of Variances

287

We evaluate the total average  and evaluate the average associated with factor A denoted as  A j where j  1, 2,3, 4 and factor B denoted as  Bi where i  1,2,3 , which are shown in Table 3. Table 3. Two-factors cross data table with no repeat

A1 B

A A3

A2

B1 B2 B3 μAj DμAj

7.12 8.63 7.99 7.91 -2.23

Total average

10.15

8.33 8.15 8.55 8.34 -1.80

μBi

A4 9.12 10.98 11.79 10.63 0.48

11.21 15.43 14.45 13.70 3.55

DμBi 8.95 10.80 10.70

-1.20 0.65 0.55

We consider the effect of a factor A where 4 levels exist. We can evaluate the associated variance as nA

S A2  



nB   Aj   j 1



2

n 1 nA

 nA

j 1

Aj





2

(38)

1 2 2 2 2  2.23   1.80    0.48   3.55   4  5.26



The freedom is level number 4 -1 =3. Therefore, the unbiased variance is given by s A2 

n

A

S A2

12  5.26 3  21.08



(39)

We next evaluate the effect of a factor B where three levels exist. We evaluate three deviations of each level average with respect to the total average, given by

 Bi   Bi  

(40)

Kunihiro Suzuki

288

The corresponding variance can be evaluated as nBi

S B2  



nA   Bi   i 1



2

n 1 nB

nBi

 i 1

Bi





2

(41)

1 2 2 2   1.20    0.65    0.55     3  0.72 2 The unbiased variance s B is then given by

sB2 

n

B

S B2

12  0.72 2  4.34



(42)

The error is given by



 



eij  xij      Aj     Bi     

(43)

which is shown in Table 4. Table 4. Error data

A

B

A1 0.41 0.06 -0.47

B1 B2 B3 2

The corresponding variance Se is given by

A2 1.19 -0.85 -0.34

A3 -0.31 -0.30 0.61

A4 -1.29 1.08 0.20

Analysis of Variances Se2 

1 3 4 2  eij n i 1 j 1

289

(44)

 0.50

The freedom of e is given by

e  tot  aLex  bLex    n  1   na  1  nb  1  n   na  nb   1  12   4  3  1 6

(45)

Therefore, the unbiased variance is given by se2 

n

e

Se2

12  0.50 6  1.01



(46)

The F associated with a factor A is given by FA 

s A2 21.08   20.87 se2 1.01

(47)

The critical value for FAc is given by

FAc  3,6,0.95  4.76 Therefore, FA is larger than

(48)

FAc , and hence there is dependence of a factor B .

The F value associate with the factor B is given by FB 

sB2 4.34   4.29 se2 1.01

The critical value for F is given by

(49)

Kunihiro Suzuki

290

FBc  2,6,0.95  5.14 Therefore, F is smaller than

(50)

FBc , and hence there is no clear dependence on the factor

B.

7. ANALYSIS OF VARIANCE (TWO WAY ANALYSIS WITH REPEATED DATA) We consider two factors of A and B , and the factor A has four levels of A1, A2 , A3 , A4 denoted as nA , and the factor B has three levels of B1 , B2 , B3 denoted as nB , where nA  4 and nB  3 . In the previous section, we assumed a one set data. That is, there is

only one data for the level combination of Aj Bi . We assume that the data are two sets, that is, repeated data as shown in Table 5. There are two data for Aj Bi and are denoted as

xij _ s

and s  1,2 .

The total data number n is given by n  nA  nB  ns  3 4  2  24

(51)

The total average  is evaluated as



1 ns nA nB  xij _ s n s 1 j 1 i 1

 10.07

(52)

We can then evaluate each data deviation xij _ s from the total average, and it is given by

xij _ s  xij _ s   which is shown in Table 6.

(53)

Analysis of Variances

291

Table 5. Two-factors cross data table with repeat of data 1 and 2 Data 1

B

A1

B1 B2 B3

7.55 8.12 8.97

Total mean

A A3

A2 8.98 6.96 10.95

Data 2

A4 8.55 11.89 12.78

11.02 11.89 16.55

B

A1

A A3

A2

B1 B2 B3

6.11 9.01 8.79

8.78 9.65 9.21

A4 8.44 9.86 12.01

9.89 13.22 12.47

10.07

Table 6. Deviation of each data with respect to the total average Data 1 deviation B

A1

B1 B2 B3

A A3

A2 -2.52 -1.95 -1.10

-1.09 -3.11 0.88

Data 2 deviation

A4 -1.52 1.82 2.71

0.95 1.82 6.48

B

A1

A A3

A2

B1 B2 B3

-3.96 -1.06 -1.28

-1.29 -0.42 -0.86

A4 -1.63 -0.21 1.94

-0.18 3.15 2.40

We form a table averaged the data 1 and 2, and it is given by

xij 



1 xij _1  xij _1 2



(54)

which is shown in Table 7. We regard that we have two sets of the same data. Table 7. Average of data 1 and 2 Average of data 1,2 B

B1 B2 B3 μAj DμAj

A1

A A3

A2 6.83 8.57 8.88 8.09 -1.98

8.88 8.31 10.08 9.09 -0.98

μBi

μBi

A4 8.50 10.88 12.40 10.59 0.52

10.46 12.56 14.51 12.51 2.44

8.67 10.08 11.47

-1.40 0.01 1.40

The averages are evaluated as  Aj 

 Bi 

1 nB

nB

x i 1

ij

1 nA  xij nA j 1

(55)

(56)

Kunihiro Suzuki

292

and the average deviation can be evaluated as

 Aj   Aj  

(57)

Bi  Bi  

(58)

2 Let us consider the effectiveness of a factor A . The related sum of square S Aex is given

by

S

2 Aex

2  nA    Ai    nB  2 i 1   n 2  nA    Ai    nB  2 i 1   2nA nB









 nA



i 1

Ai





2

nA

 1.98   0.98   0.52    2.44   2

2

2

2

4

 2.77

(59)

The related freedom Aex is the level number of 4 minus 1, and is given by

Aex  nA  1  4 1 3

(60)

2 Therefore, the corresponding unbiased variance s Aex is given by

2 s Aex 

n

2 S Aex

 Aex 24   2.77 3  22.17

(61)

Analysis of Variances

293

2 Let us consider the effectiveness of the factor B . The related sum of square SBex is

given by

2 S Bex

2  nB    Bi    nA  2 i 1   n 2  nB    Bi    nA  2 i 1   2nA nB









 nB



i 1

Bi





2

nB

 1.40    0.01  1.40   2

2

2

3

 1.31

(62)

The related freedom Bex is the level number of 3 minus 1, and is given by

Bex  nB  1  3 1 2

(63)

2 Therefore, the corresponding unbiased variance sBex is given by

2 sBex 

n

bLex

2 S Bex

24  1.31 2  15.69



(64)

Table 8. Deviation of the data with respect the average for each level Data 1

B

B1 B2 B3

A1

A A3

A2 0.72 -0.45 0.09

0.10 -1.35 0.87

Data 2

A4 0.05 1.02 0.39

0.57 -0.66 2.04

B

B1 B2 B3

A1

A A3

A2 -0.72 0.45 -0.09

-0.10 1.35 -0.87

A4 -0.06 -1.02 -0.39

-0.57 0.67 -2.04

Kunihiro Suzuki

294

We obtain two sets of data and the both data for the same levels of A j Bi are different in general. This difference is called as a pure error and is given by

epure _ ij _ s  xij _ s  xij

(65)

It should be noted that

epure _ ij _ 2  epure _ ij _1

(66)

which is shown in Table 8. 2 We can evaluate the variance associated with the difference Se _ pure and is given by

2 1  nB nA   e pure _ ij _1   2 n j i  1  2 2 2 2   0.72    0.10    0.05    0.57    24 2 2 2 2   0.45    1.35   1.02    0.66    



Se2_ pure 





(67)



2 2 2 2   0.09    0.87    0.39    2.04    2    0.78

Let us consider the corresponding freedom e _ pure . The data number is given by n  na  nb  2

(68)

We use averages of na  nb kinds. Therefore, we have a freedom of

e _ pure  na  nb  2  na  nb  na  nb   2  1  43  12 2 Therefore, the corresponding unbiased variation se _ pure is given by

(69)

Analysis of Variances se2_ pure 

n

e _ pure

295

Se2_ pure

24  0.78 12  1.57



(70)

The deviation of each data from the total average eij _ s is given by



 



eij _ s  xij _ s      Aj     Bi     

(71)

What is the difference between the deviation and the pure error plus effectiveness of factors A and B means? If the effectiveness of the factors A and B is independent, the error is equal to the pure error. Therefore, the difference is related to the interaction between the two factors. The difference is called with an interaction, and is given by einteract _ ij _ s  eij _ s  e pure _ ij _ s





 

 

 xij _ s      Aj     Bi     xij _ s  xij  



 





 xij      Aj     Bi     

(72)

Note that this does not depend on s . This is evaluated as shown in Table 9. The 2 corresponding variance Sinteract is given by

2 Sinteract 



2   0.14   1.20   2

2

24

  0.41   0.61 2

2

  0.36 (73)

Let us consider the related freedom interact . We obtain the equations associated with a freedom given by

tot  aLex  bLex  interact  e _ pure Therefore, we obtain

(74)

Kunihiro Suzuki

296

interact  tot   Aex  Bex  e _ pure   n  1   nA  1  nB  1  nA  nB   ns  1   23   3  2  12  6

(75)

2 The corresponding unbiased variance sinteract is given by

2 sinteract 

n

interact

2 Sinteract

24  0.36 6  1.45



(76) Table 9. Interactions

Data 1

B

A1

B1 B2 B3

A A3

A2 0.14 0.47 -0.61

1.20 -0.79 -0.41

Data 2

A4 -0.69 0.28 0.41

-0.65 0.04 0.61

B

B1 B2 B3

A1

A A3

A2 -1.84 -1.51 -2.59

0.22 -1.77 -1.39

A4 -0.17 0.80 0.93

1.79 2.48 3.04

Effectiveness of each factor can be evaluated with corresponding F values. The effectiveness of the factor A can be evaluated as

FA 

2 sAex

se2_ pure

 14.14 (77)

The critical F value Fac is given by FAc  F  Aex , e , P   F  3,12,0.95   3.49

(78)

Therefore, the factor A is effective. The effectiveness of the factor B can be evaluated as

FB 

2 sBex

se2_ pure

 10.01 (79)

Analysis of Variances

297

The critical F value FBc is given by FBc  F Bex , e , P   F  2,12,0.95   3.89

(80)

Therefore, the factor B is effective. The effectiveness of the interaction can be evaluated as

Finteract 

2 sinteract  0.92 se2_ pure

(81)

The critical F value Finteractc is given by Finteractc  F interact , e , P   F  6,6,0.95   3.00

(82)

Therefore, the interaction is not effective.

SUMMARY This chapter is summarized in the following.

One Way Analysis 2

The effectiveness of the level is evaluated with S ex given by

S  2 ex



nA1  A1  



2



 nA2  A2  



2



 nA3  A3  



2

nA1  nA2  nA3 2

The scattering of the data is expressed with S in , and it is given by

Kunihiro Suzuki

298 nA1

Sin2 

 x

iA1

i 1

  A1

nA2

   x 2

i 1

iA2

  A2

nA3

   x 2

i 1

iA3

  A3



2

nA1  nA2  nA3

The correlation ratio  2 is given by 2 

Sex2 S  Sex2 2 in

This is between 0 and 1, and the effectiveness of the factor can be regarded as significant with larger  2 . We form an unbiased variable as sex2 

n

ex

Sex2

where

ex  p p

is the level number.

sin2 

n

in

Sin2

where

in  n  p Finally, the ratio of the unbiased variance is denoted by F and it is given by F

sex2 sin2

This follows the F distribution with a freedom of

ex ,in  . The P

distribution is denoted as FC ex , in , P  . If F  Fc we regard that the factor is valid, and vice versa.

point for the F

Analysis of Variances

299

The effectiveness between each level can be evaluated as

 Ai   Aj sin2  1 1     2  nAi nAj  If this value is larger than the studentized range distribution table value of q  r , n  r , P  , we judge that the difference is effective. The other simple way to evaluate the difference is the one with

zi 

A   i

sin2  1 1     2  nAi n 

We may be able to compare absolute value of this with z p for a normal distribution. Two way analysis without repeated data We consider two factors of

A:

A1, A2 , A3 , A4 and B : B1 , B2 , B3 , where nA  4 and

nB  3 . The total data number n is given by

n  nA  nB In this case, each data xij can be expressed by the deviation from the average, and is given by



 



xij     Aj    Bi    eij The various variances are given by

 nA

2 S Aex 

i 1

Ai

nA





2

Kunihiro Suzuki

300

 nB

2 S Bex 

Se2 

i 1

Bi





2

nB

1 nB nA 2  eij n i 1 j 1

The various freedoms are given by

tot  n  1 A  nA  1 B  nB  1 The freedom associated with error is given by

e  tot  A  B    n  1   nA  1  nB  1  n   nA  nB   1 Therefore, the unbiased variances are given by sA2 

sB2 

se2 

n

A n

B n

e

S A2

SB2

Se2

The F associated with the factor A is given by FA 

s A2 se2

Analysis of Variances

301

This is compared with the F critical value of FAc A , c , P  s B2 se2

FB 

This is compared with the F critical value of FBc B , c , P  

Two way analysis with repeated data

We consider two factors of

A : A1 , A2 , , AnA and B : B1 , B2 , , BnB , and we have ns

set. The total data number n is given by n  nA  nB  ns

The total average is given by



1 ns nA nB  xij _ s n s 1 j 1 i 1

Each data deviation xij _ s from the total average is given by

xij _ s  xij _ s   The average data for each level is given by

xij 



1 xij _1  xij _1 2



The averages are evaluated as  Aj 

1 nB

nB

x i 1

ij

Kunihiro Suzuki

302

 Bi 

1 nA  xij nA j 1

and the average deviations can be evaluated as

 Aj   Aj   Bi  Bi  

 nA

2 S Aex 

i 1



Ai



2

nA

Aex  nA  1 2 Therefore, the corresponding unbiased variance s Aex is given by

2 sAex 

n

Aex

2 S Aex

 nB

2 S Bex 

i 1



Bi



2

nB

Bex  nB  1 2 The corresponding unbiased variance sBex is given by

  sBex  2

n

bLex

  S Bex 2

A pure error is given by

epure _ ij _ s  xij _ s  xij

Analysis of Variances Se2_ pure 

303

2 1  nB nA ns  e pure _ ij _ s  nA nB ns  j i s 





The deviation of each data from the total average eij _ s is given by



 



eij _ s  xij _ s      Aj     Bi     

The difference associated with interaction is given by einteract _ ij _ s  eij _ s  e pure _ ij _ s



 



 xij      Aj     Bi     

2 Sinteract 



nA

1 nA nB

ns

nB

  e

1 nA nB ns

interact _ ij _ s

j

i

nB

ij

j

i

2

s

x nA





 



     Aj     Bi     

2

interact  tot  Aex  Bex  e _ pure 

 n  1   nA  1  nB  1  nA  nB   ns  1 

2 sinteract 

FA 

n

interact

2 Sinteract

2 sAex

se2_ pure

The critical F value Fac is given by

FAc  F Aex ,e , P  Therefore, the factor A is effective. The effectiveness of the factor B can be evaluated as

Kunihiro Suzuki

304

FB 

2 sBex

se2_ pure

The critical F value FBc is given by

FBc  F Bex ,e , P  Therefore, the factor B is effective. The effectiveness of the interaction can be evaluated as

Finteract 

2 sinteract se2_ pure

The critical F value Finteractc is given by

Finteractc  F interact ,e , P 

Chapter 9

ANALYSIS OF VARIANCES USING AN ORTHOGONAL TABLE ABSTRACT We show a variable analysis using an orthogonal table which enables us to treat factors more than two. Although the level number for each factor is limited to two, many factors can be assigned to columns in the orthogonal table and we can also evaluate interaction of targeted combination, and obtain easily the effectiveness of each factor.

Keywords: analysis of variance, orthogonal table, interaction

1. INTRODUCTION We studied variable analysis with two factors in the previous chapter. However, we have a case to treat factors more than two. For example, when we treat 4 factors with two 4 levels of each, we need the data number at least of 2  16 . The factor number a with m level number m requires the minimum data number of a . The number of levels we should try to obtain a result increases significantly with increasing a and m . Therefore, we need some procedure to treat such many factors, which is called as experimental design using an orthogonal table. We focus on the two levels with many factors in this chapter.

Kunihiro Suzuki

306

2. TWO FACTOR ANALYSIS We start with two factors where each level number is two. The factor is donated as A and B , and the levels for each factors are denoted as A1 , A2 and B1 , B2 . The corresponding data are shown in Table 1. The related data for given

factors are denoted as yij , where i, j  1, 2 . This table is converted as a data list for a given combination of factors shown in Table 2. The combination is also expressed as the two columns as shown in Table 3. Finally, we convert the notation 1 to -1, and 2 to 1, and obtain Table 4. We regard each column as vector, and the dot product of the two vectors is given by  1   1  1 1 1 1    1  1  1  1  0 1   1

(1)

Therefore, the two vectors are orthogonal to each other, which is the reason why the table is called as orthogonal. Table 1. Factors, levels, and related data

B

A

A1 A2

B1 y11 y21

B2 y12 y22

Table 2. Combination of factors, and related data

Combination A1B1 A1B2 A2B1 A2B2

Data y11 y12 y21 y22

Analysis of Variances Using an Orthogonal Table

307

Table 3. Two column expression for factors, and related data A

B 1 1 2 2

1 2 1 2

Data y11 y12 y21 y22

Table 4. Two column expression for factors, and related data A

B -1 -1 1 1

-1 1 -1 1

Data y11 y12 y21 y22

We assume that the data yij is composed of many factors: the contribution of a factor Ai denoted as i , a factor B j denoted as  j , interaction between i and  j denoted as

 ij , an independent constant number denoted as

 , and an error denoted as

eij . yij is

hence expressed by

yij    i   j   ij  eij

(2)

The number of yij is denoted as n , and in this case, it is given by n  22  4

(3)

We impose the restrictions to the variables as

1  2  0

(4)

1  2  0

(5)

 11   12  0

(6)

Kunihiro Suzuki

308

 21   22  0

(7)

 11   21  0

(8)

 12   22  0

(9)

The above treatment does not vanish generality. Therefore, we decrease the variables number as

1  2  

(10)

1  2  

(11)

 11    21    21   22   

(12)

Each data is then given by

y11           e11

(13)

y12           e12

(14)

y21           e21

(15)

y22           e21

(16)

We should guess the values of  ,  ,  , and   from the obtained data yij . The predicted value can be determined so that the error is the minimum. The sum of error Qe2 can be evaluated as

Analysis of Variances Using an Orthogonal Table

309

Qe2  e112  e122  e212  e222

  y  y  y

                                

 y11            12

21

22

2

2

2

2

(17)

Differentiating Eq. (17) with respect to  and set it to 0, we obtain

Qe2  2 y11            





 2  y 2  y

                    

2 y12            21

22

 2  y11  y12  y21  y22   4ˆ  0

(18)

We then obtain

ˆ 

1  y11  y12  y21  y22  n

(19)

where we use hat to express that it is the predicted value. We utilize this notation in the followings. Differentiating Eq. (17) with respect to  and set it to 0, we obtain Qe2  2 y11            

 2  y 2  y 2  y

12

21

22

                                 

 2  y11  y12  y21  y22   4ˆ  0

(20)

Kunihiro Suzuki

310 We then obtain

ˆ 

1   y11  y12  y21  y22  n

(21)

Differentiating Eq. (17) with respect to  and set it to 0, we obtain

Qe2  2 y11            

 2  y 2  y 2  y

12

21

22

                                 

 2  y11  y12  y21  y22   4ˆ    0

(22)

We then obtain

1 ˆ    y11  y12  y21  y22  n

(23)

Differentiating Eq. (17) with respect to   and set it to 0, we obtain Qe2  2 y11              





 2  y 2  y

                      

2 y12            21

22

 2  y11  y12  y21  y22   4     0

We then obtain

(24)

Analysis of Variances Using an Orthogonal Table

1 n

311

    y11  y12  y21  y22  (25)

We define the variables below QA  y11  y12

(26)

QA  y21  y22

(27)

QB  y12  y22

(28)

QB  y11  y21

(29)

Q   y11  y22

(30)

Q   y12 +y21

(31)

The total average  is given

  ˆ 

1  y11  y12  y21  y22  n

(32)

This is also expressed by



1 Q   Q   n

(33)

where  is a dummy variable and    ,  ,  . The average associated with each factor of levels are expressed by

  

1 Q  n 2  

(34)

Kunihiro Suzuki

312

  

1 Q  n 2  

(35)

Let us consider the variance associated with a factor A denoted by S A2 , which is evaluated as 2 2 n  A       A        S A2  2 n

(36)

This can be further reduced to 2 2 n  A       A        S A2  2 n 2 2        Q   1   QA  1 1 A        QA   QA      QA   QA     2   n  n   n  n      2       2       2 2 1   1  1       QA   QA       Q A   Q A     2   n  n   1 2  2  QA   QA   n

(37)

The other variances are evaluated similar way as

SB2 

SC2 

1 2 Q  QB   2  B n

(38)

1 2 Q  QB   2  B n

(39)

2 S AB 

1 2 Q  QAB   2  AB  n

We can modify the table related to the above equations.

(40)

Analysis of Variances Using an Orthogonal Table

313

First, we can add column associated with AB as in Table 5. The final table can be formed using this data as shown Table 6. Table 5. Two column expression for factors adding interaction, and related data

A

B -1 -1 1 1

AB -1 1 -1 1

1 -1 -1 1

Data y11 y12 y21 y22

Table 6. Two column expression for factors adding interaction, and related data A

Factor Level

B

Ay11 y12 QA-

Data Sum

A+ y21 y22 QA+ S(2)A

Variance

By11 y21 QB-

(AB) B+ y12 y22 QB+

S(2)B

(AB)y12 y21 Q(AB)-

(AB)+ y11 y22 Q(AB)+

S(2)(AB)

The freedom is all 1, that is,

A  B  AB  1

(41)

and the unbiased variance is the same as the variance. The unbiased variances are denoted as small s character and are given by sA2 

n

A

S A2 , sB2 

n

B

2 SB2 , sAB 

n

AB

2 S AB

(42)

We cannot separate the variance associated with error. However, if the interaction can be ignored, that is    0 , the related variance can be regarded as the one for error. Therefore, 2 se2  sAB

We can then test the effectiveness the factor using a F distribution as

(43)

Kunihiro Suzuki

314

FA 

FB 

s A2 se2

(44)

sB2 se2

(45)

The critical value for the F distribution Fcrit is given by

Fcrit  F 1,1,0.95

(46)

3. THREE FACTOR ANALYSIS WITHOUT REPETITION We extend the analysis in the previous section to three factors of A , B , and C . We assume that the data yijk is expressed by the contribution of the factor Ai denoted as i , the factor B j denoted as  j , the factor Ck denoted as  k , the interaction between each factor denoted as  ij ,  ik ,    jk ,  ijk , the independent constant number denoted as  , and the error denoted as eijk . The number of yijk is denoted as n , and n  23  8

(47)

and it is expressed as

yijk    i   j   k   ij   ik     jk   ijk  eijk

(48)

We impose restrictions to the variables as

1  2  0

(49)

1  2  0

(50)

Analysis of Variances Using an Orthogonal Table

315

1   2  0

(51)

 11   12  0

(52)

 21   22  0

(53)

 11   21  0

(54)

 12   22  0

(55)

 11   12  0

(56)

 21   22  0

(57)

 11   21  0

(58)

 12   22  0

(59)

  11    12  0

(60)

  21    22  0

(61)

  11    21  0

(62)

  12    22  0

(63)

These restrictions do not vanish the generality. We assume that s1 : i  j  even  s2 : i  j  odd 

(64)

Kunihiro Suzuki

316 and impose

 s 1   s 2  0

(65)

 s 1   s 2  0

(66)

 s 1   s 1  0

(67)

 s 2   s 2  0

(68)

1

1

2

2

1

2

1

2

Eqs. (65)-(68) can be summarized as

 u   u 1

0

2

(69)

where u1 : i  j  k  even  u2 : i  j  k  odd 

(70)

Therefore, we decrease the variables number as 1  2  

(71)

1  2  

(72)

 1   2  

(73)

 11    21    21   22   

(74)

 11    21    21   22   

(75)

  11     21     21    22    

(76)

Analysis of Variances Using an Orthogonal Table

 u

1

   u    2

317

(77)

Each data yijk is then expressed by

y111                       e111

(78)

y112                       e112

(79)

y121                       e121

(80)

y122                       e122

(81)

y211                       e211

(82)

y212                       e212

(83)

y221                       e221

(84)

y222                       e222

(85)

Based on the above analysis, we can construct an orthogonal table which is called as

L8  27  . This can accommodate the case with factors up to 6. 8 in the notation of L8  27  3 comes from 2 and 3 comes from the number of character A, B, C . 2 in the notation of

L8  27  comes from the level number. 7 in the notation of L8  27  comes from the column number. The column number is assigned to the character as

A  20  1: first column B  21  2 : second column C  22  4 : fourth column

Kunihiro Suzuki

318

This is related to the direct effects of each factor. The other column is related to AB, AC, BC, ABC as below. AB  1  2  3 AC  1  4  5 BC  2  4  6 ABC  1  2  4  7

We can generate the yijk in order as shown in the table. The level of column 1(A), 2(B), and 4(C) are directly related to the subscript of yijk , where it is -1 when the subscript is 1 and 1 when the subscript is 2. This can also be appreciated that it is -1 when the subscript is odd number and 1 when the subscript is even number. The level for clum3 (AB) is determined from the sum of i  j is an odd or even number. The other column levels are also determined with the same way.

 

Table 7. Orthogonal table L8 27 1

8 Factor

2 -1 -1 -1 -1 1 1 1

Combination No

Column No. 1 2 3 4 5 6 7

1 A

3 -1 -1 1 1 -1 -1 1

4 1 1 -1 -1 -1 -1 1

1 B

1 AB

5 -1 1 -1 1 -1 1 -1

6 1 -1 1 -1 -1 1 -1

1 C

7 1 -1 -1 1 1 -1 -1

-1 1 1 -1 1 -1 -1

1

1 ABC

1 AC

BC

Data Value y111 2.3 y112 3.4 y121 4.5 y122 5.6 y211 7.5 y212 8.9 y221 9.7 y222

8.9

Let us perform dot product of any combination of columns. For example levels of column number 1 and 2. We perform dot product of columns 1 and 2 are 1   1  1   1 1 1 1 1 1 1 1 1    1  1  1  1  1  1  1  1 1   1  1    1   0

(86)

We can verify this result for any combination of columns. This expresses the orthogonality of the table.

Analysis of Variances Using an Orthogonal Table

319

We should guess the value of  ,  ,  ,  , and   ,   ,    , and   from the obtained data yijk . The predicted value can be determined so that the error is the minimum. The error Qe can be evaluated as 2

2

2

Qe2   eijk2 i 1 j 1 k 1

  y  y  y  y  y  y  y

                                                                                                                                                               

 y111                       

2

2

112

2

121

2

122

2

211

2

212

2

221

2

222

(87)

Differentiating Eq. (87) with respect to  and set it to 0, we obtain Qe2  2 y111                        





 2  y 2  y 2  y 2  y 2  y 2  y

                                                                                                                                          

2 y112                        121

122

211

212

221

222

 2  y111  y112  y121  y122  y211  y212  y221  y222   8  0

(88)

We then obtain

ˆ 

1  y111  y112  y121  y122  y211  y212  y221  y222  n

(89)

Kunihiro Suzuki

320

Differentiating Eq. (87) with respect to  and set it to 0, we obtain Qe2  2 y111                        

 2  y 2  y 2  y 2  y 2  y 2  y 2  y

112

121

122

211

212

221

222

                                                                                                                                                               

 2  y111  y112  y121  y122  y211  y212  y221  y222   8  0

(90)

We then obtain

ˆ 

1   y111  y112  y121  y122  y211  y212  y221  y222  n

(91)

Similarly, we obtain

1 ˆ    y111  y112  y121  y122  y211  y212  y221  y222  n

(92)

1   y111  y112  y121  y122  y211  y212  y221  y222  n

(93)

1  y111  y112  y121  y122  y211  y212  y221  y222  n

(94)

1  y111  y112  y121  y122  y211  y212  y221  y222  n

(95)

1  y111  y112  y121  y122  y211  y212  y221  y222  n

(96)

ˆ 

 

 

 

Analysis of Variances Using an Orthogonal Table

 

1   y111  y112  y121  y122  y211  y212  y221  y222  n

321

(97)

We define the variables below QA  y211  y212  y221  y222

(98)

QA  y111  y112  y121  y122

(99)

QB  y121  y122  y221  y222

(100)

QB  y111  y112  y211  y212

(101)

QC   y112  y122  y212  y222

(102)

QC   y111  y121  y211  y221

(103)

Q   y111  y112  y221  y222

(104)

Q   y121  y122  y211  y212

(105)

Q   y111  y121  y212  y222

(106)

Q   y112  y122  y211  y221

(107)

Q   y111  y122  y211  y222

(108)

Q   y112  y121  y212  y221

(109)

Q   y112  y121  y211  y222

(110)

Q   y111  y122  y212  y221

(111)

Kunihiro Suzuki

322 The total average  is given



1  y111  y112  y121  y122  y211  y212  y221  y222  n

(112)

This is also expressed by 

1 Q   Q   n

(113)

where  is a dummy variable and    ,  , ,  . The average associated with each factor of levels are expressed by   

  

1 Q  n 2  

(114)

1 Q  n 2  

(115)

Let us consider the variance associated with a factor A denoted by S A2 , which is evaluated as 2 2 n  A       A        S A2  2 n

(116)

This can be further reduced to 2 2 n        A      A   S A2  2 n 2 2           Q 1 Q 1 1     A    QA   QA      A    QA   QA     2   n  n   n  n      2       2       2 2 1   1  1       QA   QA       Q A   Q A     2   n  n  



1 2  QA   QA   n2

(117)

Analysis of Variances Using an Orthogonal Table

323

The other variances are evaluated with the similar way as

SB2 

SC2 

1 2 Q  QB   2  B n

(118)

1 2 Q  QB   2  B n

(119)

2 S AB 

2 S AC 

2 S BC 

1 2 Q  QAB   2  AB  n

(120)

1 2 Q  QAC   2  AC  n

(121)

1 2 Q  QBC   2  BC  n

(122)

2 S ABC 

1 2 Q  QABC   2  ABC  n

(123)

The freedom of each variance is all 1 and their variances are the same as the variances given by sA2 

n

A

2 sAB 

S A2 , sB2 

n

B

SB2 , sc2 

n

C

SC2

n 2 2 n 2 2 n 2 S ,s  S ,s  S AB AB AC AC AC BC BC BC

2 s ABC 

n 2 S ABC  ABC

(124)

(125)

(126)

Based on the above process, we can form the Table 8, and evaluated value is shown in Table 9.

Kunihiro Suzuki

324

 

Table 8. Orthogonal table L8 27 1 A

Column No. Factor Level

2 B

3 AB

4 C

5 AC

6 BC

7 ABC

-1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 y111 y211 y111 y121 y121 y111 y111 y112 y112 y111 y112 y111 y111 y112 y112 y212 y112 y122 y122 y112 y121 y122 y122 y121 y121 y122 y122 y121 y121 y221 y211 y221 y211 y221 y211 y212 y211 y212 y212 y211 y212 y211 y122 y222 y212 y222 y212 y222 y221 y222 y221 y222 y221 y222 y221 y222 QA- QA+ QB- QB+ Q(AB)- Q(AB)+ QC- QC+ Q(AC)- Q(AC)+ Q(BC)- Q(BC)+ Q(ABC)-Q(ABC)+

Data

Sum

S(2)A

Variance

S(2)B

S(2)AB

S(2)C

S(2)AC

S(2)BC

S(2)ABC

 

Table 9. Values for Orthogonal table L8 27 Column No. Factor Level

1 A

2 B

3 AB

-1 1 -1 1 -1 1 2.3 7.5 2.3 4.5 4.5 2.3 3.4 8.9 3.4 5.6 5.6 3.4 Data 4.5 9.7 7.5 9.7 7.5 9.7 5.6 8.9 8.9 8.9 8.9 8.9 Sum 15.8 35 22.1 28.7 26.5 24.3 5.76 0.68 0.08 Variance 46.08 5.445 0.605 Unbiased variance

4 C

5 AC

6 BC

-1 1 -1 1 -1 1 2.3 3.4 3.4 2.3 3.4 2.3 4.5 5.6 5.6 4.5 4.5 5.6 7.5 8.9 7.5 8.9 8.9 7.5 9.7 8.9 9.7 8.9 9.7 8.9 24 26.8 26.2 24.6 26.5 24.3 0.12 0.04 0.08 0.98 0.32 0.605

7 ABC -1 1 2.3 3.4 5.6 4.5 8.9 7.5 9.7 8.9 26.5 24.3 0.08 0.605

We can utilize this table more than its original meaning. The table is for three factors with full interaction. Therefore, we have no data associated with error. In the practical usage, we do not consider the whole interaction.

3.1. No Interaction If we do not consider any interaction, the direct effect is expressed with columns 1, 2, and 4. We regard the other column as the error one, and the corresponding variance Se2 is given by 2 2 2 2 Se2  S AB  S AC  SBC  S ABC  0.27

(127)

The corresponding freedom is the sum of the each variance and is e  4 in this case, and the unbiased error variance is given by se2 

n 2 Se  0.58 e

(128)

Analysis of Variances Using an Orthogonal Table

325

The each factor effect validity is evaluated as

s A2 46.08 FA  2   86.33 se 0.53

FB 

FC 

(129)

sB2 5.45   10.20 se2 0.53

(130)

sC2 0.98   1.84 se2 0.53

(131)

The corresponding critical F value is the same for all factors Fcrit as

Fcrit  F 1,4,0.95  7.71

(132)

Therefore, factors A and B are effective, and factor C is not effective.

3.2. The Other Factors Although the table is for three factors, we can utilize it for more factors if we neglect some interaction factors. Let us consider that we do not consider no interaction. We want to consider 4 factors, that is, factors A , B , C , and D . We can select any columns for factor D among the columns 3, 5, 6, and 7. We assign column 5 (AC) for the factor D , and set the experimental condition based on the Table 7. We assume that we obtain the same value for simplicity. 2 sD2  sAC

(133)

The variance associated with error is then given by 2 2 2 Se2  S AB  SBC  S ABC  0.23

(134)

Kunihiro Suzuki

326

The corresponding freedom is the sum of the each variance and is e  3 in this case, and the unbiased error variance is given by se2 

n

e

Se2  0.61

(135)

The each factor effect validity is evaluated as FA 

FB 

FC 

FD 

s A2 46.08   76.2 se2 0.61

(136)

sB2 5.45   9.00 se2 0.61

(137)

sC2 0.98   1.62 se2 0.61

(138)

sC2 0.32   0.53 se2 0.61

(139)

The corresponding critical F value is the same for all factors Fcrit as

Fcrit  F 1,3,0.95  10.13

(140)

Therefore, factors A is effective, and factors B , C , and D are not effective. We need one column for error at least. Therefore, we can add four factors as maximum.

3.3. Interaction We can consider 3 interactions as maximum, for factors A , B , and C . If we consider the interaction, we need to use the corresponding column. We consider the interaction of

AC , and the other factor D . We use the column 5 for the interaction of AC , and assign the column 3 for the factor D . The other columns of 6 and 7 are the ones for error.

Analysis of Variances Using an Orthogonal Table 2 2 Se2  SBC  S ABC  0.15

327 (141)

The corresponding freedom e  2 , and the unbiased variance is se2 

FA 

FB 

FC 

n

e

Se2  0.61

(142)

s A2 46.08   76.2 se2 0.61

(143)

sB2 5.45   9.00 se2 0.61

(144)

sC2 0.98   1.62 se2 0.61

(145)

2 s AC 0.32 FAC  2   0.07 se 0.61

FD 

sC2 0.32   0.53 se2 0.61

(146)

(147)

The corresponding critical F value is the same for all factors Fcrit as

Fcrit  F 1,2,0.95  18.51

(148)

Therefore, only factor A is effective.

4. THREE FACTOR ANALYSIS WITH REPETITION We extend the analysis in the previous section to the data with repetition, where we have r data for one given condition. The data is denoted as yijk _ r . The number of yijk is denoted as n , and is given by

Kunihiro Suzuki

328 n  23  r

(149)

We use r  3 here, and hence n  24 . However, the analysis procedure is applicable for any number of r . The orthogonal table is shown in Table 10. There are three data for the one same condition. 7 Table 10. Orthogonal table L8  2  with repeated data

1

2 -1 -1 -1 -1 1 1 1 1

Combination No

Column No. 1 2 3 4 5 6 7 8 Factor

A

3 -1 -1 1 1 -1 -1 1 1

B

1 1 -1 -1 -1 -1 1 1 AB

4

5 -1 1 -1 1 -1 1 -1 1

C

1 -1 1 -1 -1 1 -1 1 AC

6

7 1 -1 -1 1 1 -1 -1 1

BC

-1 1 1 -1 1 -1 -1 1 ABC

Data1 Data2 Data3 y111_1 y111_2 y111_3 y112_1 y112_2 y112_3 y121_1 y121_2 y121_3 y122_1 y122_2 y122_3 y211_1 y211_2 y211_3 y212_1 y212_2 y212_3 y221_1 y221_2 y221_3 y222_1 y222_2 y222_3

Therefore, we have additional variation associated error. The average associated with one condition is given by

ijk 

1 r

r

y

ijk _ p

p 1

(150)

The corresponding 8 averages are shown in Table 11. Table 11. Averages of repeated data Mean Data1 Data2 Data3 y111_1 y111_2 y111_3 (y111_1+y111_2+y111_3)/3 y112_1 y112_2 y112_3 (y112_1+y112_2+y112_3)/3 y121_1 y122_1 y211_1 y212_1 y221_1 y222_1

y121_2 y122_2 y211_2 y212_2 y221_2 y222_2

y121_3 y122_3 y211_3 y212_3 y221_3 y222_3

(y121_1+y121_2+y121_3)/3 (y122_1+y122_2+y122_3)/3 (y211_1+y211_2+y211_3)/3 (y212_1+y212_2+y212_3)/3 (y221_1+y221_2+y221_3)/3 (y222_1+y222_2+y222_3)/3

2 We can obtain variance associated with this Sein as

Analysis of Variances Using an Orthogonal Table 2 Sein 

1 n

2

2

2

3

  y

ijk _ r

 ijk

i 1 j 1 k 1 r 1



329

2

(151)

The freedom of the variance ein is given by

ein  2  2  2   r  1  2  2  2   3  1  16

(152)

The unbiased variance is given by 2 sein 

n 2 Sein ein

(153)

After we obtain the data shown in the Table 10, we construct the table. We simply add the data to the corresponding columns.

 

Table 12. Orthogonal table L8 27 Column No. Factor Level

Data1

Data2

Data3

Sum Variance Freedom

1 A -1 y111_1 y112_1 y121_1 y122_1 y111_2 y112_2 y121_2 y122_2 y111_3 y112_3 y121_3 y122_3 QA-

2 B 1 y211_1 y212_1 y221_1 y222_1 y211_2 y212_2 y221_2 y222_2 y211_3 y212_3 y221_3 y222_3 QA+

S(2)A 1

-1 y111_1 y112_1 y211_1 y212_1 y111_2 y112_2 y211_2 y212_2 y111_3 y112_3 y211_3 y212_3 QB-

3 AB 1 y121_1 y122_1 y221_1 y222_1 y121_2 y122_2 y221_2 y222_2 y121_3 y122_3 y221_3 y222_3 QB+

S(2)B 1

We define the variables below

-1 y121_1 y122_1 y211_1 y212_1 y121_2 y122_2 y211_2 y212_2 y121_3 y122_3 y211_3 y212_3 Q(AB)-

1 y111_1 y112_1 y221_1 y222_1 y111_2 y112_2 y221_2 y222_2 y111_3 y112_3 y221_3 y222_3 Q(AB)+

S(2)AB 1

4 C -1 y111_1 y121_1 y211_1 y221_1 y111_2 y121_2 y211_2 y221_2 y111_3 y121_3 y211_3 y221_3 QC-

5 AC 1 y112_1 y122_1 y212_1 y222_1 y112_2 y122_2 y212_2 y222_2 y112_3 y122_3 y212_3 y222_3 QC+

S(2)C 1

-1 y112_1 y122_1 y211_1 y221_1 y112_2 y122_2 y211_2 y221_2 y112_3 y122_3 y211_3 y221_3 Q(AC)-

1 y111_1 y121_1 y212_1 y222_1 y111_2 y121_2 y212_2 y222_2 y111_3 y121_3 y212_3 y222_3 Q(AC)+

S(2)AC 1

6 BC -1 y112_1 y121_1 y212_1 y221_1 y112_2 y121_2 y212_2 y221_2 y112_3 y121_3 y212_3 y221_3 Q(BC)-

7 ABC

1 -1 1 y111_1 y111_1 y112_1 y122_1 y122_1 y121_1 y211_1 y212_1 y211_1 y222_1 y221_1 y222_1 y111_2 y111_2 y112_2 y122_2 y122_2 y121_2 y211_2 y212_2 y211_2 y222_2 y221_2 y222_2 y111_3 y111_3 y112_3 y122_3 y122_3 y121_3 y211_3 y212_3 y211_3 y222_3 y221_3 y222_3 Q(BC)+ Q(ABC)-Q(ABC)+

S(2)BC 1

S(2)ABC 1

Kunihiro Suzuki

330



3

QA   y211_ r  y212 _ r  y221_ r  y222 _ r r 1



3

QA   y111_ r  y112 _ r  y121_ r  y122 _ r r 1

QB    y121_ r  y122 _ r  y221_ r  y222 _ r r 1

3



QB    y111_ r  y112 _ r  y211_ r  y212 _ r r 1

3



3



QC    y111_ r  y121_ r  y211_ r  y221_ r r 1

3

(155)



(157)



3

(159)















Q    y121_ r  y122 _ r  y211_ r  y212 _ r r 1

3

Q    y111_ r  y121_ r  y212 _ r  y222 _ r r 1

3

Q    y112 _ r  y122 _ r  y211_ r  y221_ r r 1

3



Q    y111_ r  y122 _ r  y211_ r  y222 _ r r 1

3



Q    y112 _ r  y121_ r  y212 _ r  y221_ r r 1

(158)



Q    y111_ r  y112 _ r  y221_ r  y222 _ r r 1

(156)



QC    y112 _ r  y122 _ r  y212 _ r  y222 _ r r 1

(154)





3







(160)

(161)

(162)

(163)



(164)

(165)

Analysis of Variances Using an Orthogonal Table 3









Q    y112 _ r  y121_ r  y211_ r  y222 _ r r 1

3

Q    y111_ r  y122 _ r  y212 _ r  y221_ r r 1

331

(166)

(167)

The total average  is given





1 3  y111_ r  y112 _ r  y121_ r  y122 _ r  y211_ r  y212 _ r  y221_ r  y222 _ r n r 1



(168)

This is also expressed by



1 Q   Q   n

(169)

where  is a dummy variable and    ,  , ,  . The average associated with each factor of levels are expressed by

  

  

1 Q  n 2  

(170)

1 Q  n 2  

(171)

Let us consider the variance associated with a factor A denoted by S A2 , which is evaluated as 2 2 n         A      A 2 S  n 2 A

This can be further reduced to

(172)

Kunihiro Suzuki

332

2 2 n        A      A   2 S  n 2 2        Q   1   QA  1 1 A        QA   QA      QA   QA     2   n  n   n  n      2       2       2 A

1   1  1      QA   QA       Q A   Q A    2   n  n  2



2

  

1 2  QA   QA   n2

(173)

The other variances are evaluated with a similar way as

SB2 

SC2 

1 2 Q  QB   2  B n

(174)

1 2 Q  QB   2  B n

(175)

2 S AB 

2 S AC 

2 S BC 

1 2 Q  QAB   2  AB  n

(176)

1 2 Q  QAC   2  AC  n

(177)

1 2 Q  QBC   2  BC  n

(178)

2 S ABC 

1 2 Q  QABC   2  ABC  n

(179)

The freedom of each variance is all 1 and their variances are the same as the variances given by

Analysis of Variances Using an Orthogonal Table sA2 

2 sAB 

n

A n

n

B

2 2 S AB , sAC 

AB

 s ABC  2

S A2 , sB2 

n

 ABC

SB2 , sc2 

n

AC

n

C

333

SC2

2 2 S AC , sBC 

(180) n

BC

2 SBC

(181)

  S ABC 2

(182)

If we neglect the some interactions, we should regard them as the extrinsic error. For 2

example, we neglect the interaction BC and ABC , the corresponding variance Se ' is given by 2 2 Se2'  SBC  S ABC

(183)

The corresponding freedom e ' is given by

e '  BC  ABC

(184)

The total variance associated with the error is given by 2 Se2  Sein  Se2'

(185)

The corresponding freedom e is given by

e  ein  e '

(186)

The unbiased variance is then given by se2 

n 2 Se e

The F values are given by

(187)

Kunihiro Suzuki

334

FA 

FB 

FC 

FAB 

s A2 se2

(188)

sB2 se2

(189)

sC2 se2

(190)

2 s AB se2

(191)

2 s AC FAC  2 se

(192)

The corresponding critical F value is the same for all factors Fcrit as Fcrit  F 1,e ,0.95

(193)

SUMMARY To summarize the results in this chapter:

 

In the L8 27 table, we have 7 columns and 8 rows, and basically we consider three factors of A, B, C without repeated data. The column number is assigned to the characters as

A  20  1: first column B  21  2 : second column C  22  4 : fourth column This is related to the direct effects of each factor. The other column is related to AB, AC, BC, ABC as below.

Analysis of Variances Using an Orthogonal Table

335

AB  1  2  3 AC  1  4  5 BC  2  4  6 ABC  1  2  4  7

We have 8 data denoted as yijk where i corresponds to A , j corresponds to B , and

k corresponds to C , and we have y111 , y112 , y121 , , y222 . We focus on the corresponding number related to the column. For example, the column is assigned to AB , we form i  j , and assign -1 or 1 for the row in each column depending on that i  j is odd or even. y111  1  1  2  1 y112  1  1  2  1 y121  1  2  3  1 y222  2  2  4  1

We denote each column as  , and denote each column with -1 as  and 1 as  . We denote the sum of the signed column as Q  or Q  . The corresponding variance associated with a factor  is given by S   2



1 Q  Q n2



2

The freedom of each factor  is all 1. Therefore, the corresponding unbiased variance is given by

s2 

nS2



 nS2

The variance associated with the error corresponds to the one we do not focus on. We denote it as  . Therefore, the variance associated with the error is given by Se2   S2 

The unbiased variance is given by

Kunihiro Suzuki

336

se2 

nSe2 n

where n is the number of factors we do not focus on. We evaluate

s2

F 

se2





We compare this with F 1, n , P . We then treat repeated data. The repeated number is denoted as r . The data number

n then becomes n  23  r . Each variance associated with the factor has the same expression as



1 Q  Q n2

S2 



2

The unbiased variance has also the same expression given by

s2 

nS2



 nS2

In this case, we have r data for given i, j , k , that is, each data are expressed with

yijk _ r and we can define averages for each level as ijk 

1 r  yijk _ p r p 1

We can then obtain the corresponding variance as 2 Sein 

2 1 2 2 2 r yijk _ p  ijk    n i 1 j 1 k 1 p 1

The corresponding freedom ein is given by

Analysis of Variances Using an Orthogonal Table

ein  23  r  1 We regard that the total error variance is given by 2 Se2  Sein   S2



The corresponding freedom is given by e  ein  n

Therefore, the corresponding unbiased variance is given by se2 

nS e2

e

We evaluate

F 

s2 se2

We compare this with F 1,e , P  .

337

Chapter 10

PRINCIPAL COMPONENT ANALYSIS ABSTRACT We evaluate one subject from various points of view, and want to evaluate each subject totally. Principal component analysis enables us to evaluate it. We compose variables by linearly combining the original variables. In the principal component analysis, the variables are composed so that the variance has the maximum value. We can judge that the value of the first component expresses the total score. We show the procedure to compose the variables. The method is applied to the regression where the explanatory variables have strong interaction. The variables are composed by the principal component analysis. We can then perform the regression using the composed variable, which gives us stable results.

Keywords: principal analysis, principal component, eigenvalue, eigenvector; regression

Figure 1. Path diagram for principal analysis.

Kunihiro Suzuki

340

1. INTRODUCTION We want to evaluate a subject from some different point of views in many cases. It is a rarely case that one subject is superior to the other subjects in all points. Therefore, we cannot judge which is better totally considering all points of views. In the principal component analysis, we construct composite variables, and decide which is better by simply evaluating the first composite variable. The schematic path diagram is shown in Figure 1. In the figure, we use p kinds of variables, and we theoretically have the same number kinds of principal variables. There is the other usage of principal component analysis. As we mentioned in Chapter 4 of volume 2 associated with multiple regression, the multiple regression factor coefficient becomes very complicated, unstable, and inaccurate when the interaction between explanatory variables are significant. In that case, we can construct a composite variable which consists of the explanatory variables with strong interaction. Using the composite variable, we can apply the multiple regression analysis, which makes the results simple, stable, and clear. We treat two variables first and extend the analysis to the multiple ones. We perform matrix operation in this chapter. The basics of the matrix operation are described in Chapter 15 of volume 3.

2. PRINCIPAL COMPONENT ANALYSIS WITH TWO VARIABLES We consider principal analysis with two variables

n , and each data are denoted as

xi1 xi 2 i  1, 2, ,

x j  j  1, 2

, n

. The sample number is

which is shown in Table 1.

Table 1. Data form for two variable principal analysis ID

x1

x1

1

x11

x11

2

x21

x12

i

xi1

xi 2

n

xn1

xn 2

Principal Component Analysis

The average and unbiased variance of

x1

341

  s x 1 1 are denoted as and , respectively. 2

s  x x Those of 2 are denoted as 2 and 2 , respectively. These parameters are given by 2

x1 

1 n  xi1 n i 1 1 n 2  xi1  x1   n  1 i 1

s1   2

x2 

(1)

(2)

1 n  xi 2 n i 1

(3)

1 n 2  xi 2  x2   n  1 i 1

s2   2

(4)

The correlation factor between r12 

x1

and

x2

1 n  x1  x1  x2  x2   2 2 n  1 i 1 s  s  1

r is denoted as 12 and is given by (5)

2

First of all, we normalize the variables as u1 

u2 

x1  x1 s1

2

x2  x2 s2  2

(6)

(7)

The normalized variables hold below. u1  u2  0 n

u i 1

2 i1

 n 1

(8)

(9)

Kunihiro Suzuki

342 n

u i 1

2

i2

 n 1

n

u u i 1

i1 i 2

  n  1 rx1 x2

(10)

(11)

We construct a composite variable z below. z  a1u1  a2u2

(12)

We impose that the unbiased variance of z has the maximum value to determine the 2 factors a1 and a2 . The unbiased variance sz is evaluated as 1 n 2  zi  z   n  1 i 1 1 n 2   zi n  1 i 1 1 n 2   a1ui1  a2ui 2   n  1 i 1

sz   2



(13)

n n 1  2 n 2 2 2  a1  ui1  2a1a2  ui1ui 2  a2  ui 2  n  1  i 1 i 1 i 1 

 a12  a22  2a1a2 rx1 x2

This value increases with increasing a1 and a2 . Therefore, we need a certain restriction to determine the value of a1 and a2 . It is given by a12  a22  1

(14)

Consequently, the problem is defined as follows. We determine the factors of a1 and a2 so that the unbiased variance has the maximum value under the restriction of Eq. (14). We apply Lagrange’s method of undetermined multipliers to solve this problem. We introduce a Lagrange function given by

f  a1 , a2 ,    a12  a22  2a1a2 r12    a12  a22  1

(15)

Principal Component Analysis

343

Partial differentiating Eq. (15) with respect to a1 and set it to 0, we obtain f  a1 , a2 ,    2a1  2a2 r12  2a1  0 a1

(16)

Arranging this, we obtain a1  r12 a2  a1

(17)

Partial differentiating Eq. (15) with respect to a1 and set it to 0, we obtain f  a1 , a2 ,    2a2  2a2 r12  2a2  0 a2

(18)

Arranging this, we obtain a2  r12 a1  a2

(19)

Eqs. (17) and (19) are expressed by a matrix form as 1   r12

r12   a1   a1        1   a2   a2 

(20)

This can also be expressed by

Ra  a

(21)

where 1 R  r12

r12   1

(22)

a  a  1  a2 

(23)

aT   a1 , a2 

(24)

Kunihiro Suzuki

344 Let us consider the meaning of 1  r12

 . Multiplying Eq. (24) to Eq. (20), we obtain

r12   a1   a1       a1 , a2     1   a2   a2 

 a1 , a2  

(25)

Expanding this, we obtain  1 r12  a1  left side   a1 , a2      r12 1  a2  a  r a    a1 , a2   1 12 2   r12 a1  a2   a12  r12 a1a2  r12 a1a2  a22

(26)

 a   a  2r12 a1a2 2 1

 sz

2 2

2

a  right side   a1 , a2    1   a2     a12  a22 

(27)



Therefore, we obtain

  sz 2

(28)

 is equal to the variance of z . Large  means a large variance. This  is called as an eigenvalue. Since our target for the principal analysis is to construct a variable that makes clear the difference of each member, the large variance is the expected one. Therefore, the large  is our requested one, and we denote the maximum one as first eigenvalue, and second, and so on. The eigenvalue  can be evaluated as follows. Modifying Eq. (20), we obtain r12   a1  1       0 1     a2   r12

This has roots only when it holds below.

(29)

Principal Component Analysis 1  r12 0 r12 1 

345 (30)

This is reduced to

1   

2

 r122  0

(31)

Therefore, we have two roots given by

1  1  r12

(32)

1  1  rx1x 2

(33)

We can obtain a set of  1 2  for each  . Substituting   1  r12 into (20), we obtain a ,a

a1  r12 a2  1  r12  a1

(34)

a2  r12 a1  1  r12  a2

(35)

These two equations are reduced to the same result of a1  a2

(36)

Imposing the restriction of Eq. (14), we obtain 2a12  1

(37)

Therefore, we obtain

a1  

1 2

We adopt a positive factor, and the composite number z1 is given by

(38)

Kunihiro Suzuki

346

z1 

1 2

u1 

1 2

u2

(39)

Similarly, we obtain for   1  r12 as a1  r12 a2  1  r12  a1

(40)

a2  rx1x2 a1  1  r12  a2

(41)

This is reduced to a1  a2

(42)

Considering the restriction of Eq. (14), we obtain 2a12  1

(43)

Therefore, we obtain

1

a1  

(44)

2

We adopt the positive one, and the composite number z2 is given by

z2 

1 2

u1 

1 2

u2

(45)

In the case of r12  0 , z1 is called as the first principal component, and z2 is the second principal component. In the case of r12  0 , z2 is called as the first principal component, and z1 is the second principal component. Assuming r12  0 , we summarize the results. The first principal component eigenvalue and eigenvector are given by

1  1  r12

(46)

Principal Component Analysis

a  1

     

347

1   2 1   2

(47)

The second principal component eigenvalue and eigenvector are given by

2  1  r12

a

 2

(48)

 1    2    1    2 

(49)

The normalized composite values are given as bellows. The first principal component is given by zi   1

1 2

ui1 

1 2

(50)

ui 2

The second principal component is given by

zi   2

1 2

ui1 

1 2

ui 2

(51)

The contribution of i component tot the total is evaluated as

 2

i

 2

s1  s2



i 1  r12  1  r12



i

(52)

2

The total sum of the eigenvalue is equal to the variable number in general.

3. PRINCIPAL COMPONENT ANALYSIS WITH MULTI VARIABLES We extend the analysis of two variables in the previous section to m variables

x1 , x2 , , xm

, where m is bigger than two. The sample data number is assumed to be n .

Therefore, the data are denoted as

xi1 , xi 2 , , xim i  1, 2,

, n

.

Kunihiro Suzuki

348

Normalizing these variables, we obtain u1 

x1  x1 s1

2

x2  x2

, u2 

s2

2

,up 

,

xp  xp sp 

(53)

2

The averages and variances of the variables are 0 and 1 for all. We construct a composited variable given by z  a1u1  a2u2 

 amum

(54)

The corresponding unbiased variance is given by 1 n 2  zi  z   n  1 i 1 1 n 2   zi n  1 i 1 1 n    a1ui1  a2ui 2  n  1 i 1

sz   2

 a12  a22 

 am uim 

2

 am2

2a1a2 r12  2a1a3 r13 

 2a1am r1m

2a2 a3 r23  2a2 a4 r24 

 2a2 am r2 m

2a3 a4 r34  2a3 a5 r35 

 2a3 am r2 m

(55)

2am 1am rm 1, m

The constrain is expressed by a12  a22 

 am2  1

(56)

Lagrange function given by f  a12  a22 

 am2

2a1a2 r12  2a1a3 r13 

 2a1am r1m

2a2 a3 r23  2a2 a4 r24 

 2a2 am r2 m

2a3 a4 r34  2a3 a5 r35 

 2a3 am r2 m

2am 1am rm 1, m   a12  a22 

 am2  1

(57)

Principal Component Analysis

349

Partial differentiating Eq. (57) with respect to a1 and set it 0, we obtain f  2a1  2a2 r12  2a3r13  a1

 2am r1m  2 a1  0

(58)

Partial differentiating Eq. (57) with respect to a2 and set it 0, we obtain f  2a1r12  2a2  2a3 r23  2a4 r24  a2

 2am r2 m  2 a2  0

(59)

Repeating the same process to the am , and we finally obtain the results with a matrix form, which is given by  1 r12   r21 1    rm1 rm 2

r1m  a1   a1      r2 m  a2  a   2          1  am   am 

(60)

We obtain m kinds of eigenvalues from 1   r12 r21 1   rm1

r1m r2 m

0

(61)

1 

rm 2

We show briefly how we can derive eigenvalues and eigenvectors. We assume that

1  2 

 m

(62)

i corresponds to the i  th principal component. The corresponding eigenvector can be evaluated from  1 r12   r21 1    rm1 rm 2

r1m  a1   a1      r2 m  a2  a  i  2          1  am   am 

(63)

Kunihiro Suzuki

350

We cannot obtain the value of the elements of the eigenvector but the ratio of each element. We consider the ratio with respect to a1 , and set the other factor as ai  i a1

(64)

We can evaluate i from Eq. (63), where a1 is not determined. From the constraint of Eq. (54), we can decide a1 as   m2 am2  1   22 

a12   22 a12 

  m2  a12

1

(65)

We select positive one and obtain

a1 

1 1   2 2

(66)

 m2

Therefore, we obtain z

1 1   2 2

 m2

u1  2u2 

 mum 

(67)

The contribution ratio is given by i

(68)

m

In the principal component analysis, we usually consider up to the second component. It is due to the fact that we can utilize a two-dimensional plot using them. The validity of only using two principal components can be evaluated by the sum of the component given by 1  2 m

(69)

The critical value for the accuracy is not clear. However, we can roughly estimate if the value is bigger than 0.8 or not for judging the validity of the analysis. The data are plotted on the two dimensional plane with the first and second principal component axes.

Principal Component Analysis

351

Each component variable for j  1, 2, , m is plotted as

 a  , a    a  , a   1 1

1

2

1 2

2 2

(70)

 a  , a   1 m

2 m

The each member is plotted as

 z  , z   1

i

2

(71)

i

where

zi   a1 ui1  a2 ui 2

(72)

zi   a1 ui1  a2 ui 2

(73)

1

1

2

2

1

2

We treat the first and second components identically up to here. However, it should be weighted depending on the eigenvalue. We should remind that the eigenvalue has the meaning of the variance, and hence it is plausible to weight the square root of the eigenvalue which corresponds to the standard deviation. That is, we multiply first component, and

2

1

to the

to the second component.

d j We can then define the distance between two members i and denoted as ij , and it is given by dij 



1 zi1  1 z j1

  2

2 zi 2  2 z j2



2

(74)

4. EXAMPLE FOR PRINCIPAL COMPONENT ANALYSIS (SCORE EVALUATION) We apply principal component analysis to the score evaluation. We obtain the score data as shown in Table 2.

Kunihiro Suzuki

352

Table 2. Score of subjects

Member ID 1 2 3 4 5 6 7 8 9 10 Mean Stdev.

Japanese x1 86 71 42 62 96 39 50 78 51 89 66.4 20.5

English x2 79 75 43 58 97 33 53 66 44 92 64.0 21.6

Mathematics x3 67 78 39 98 61 45 64 52 76 93 67.3 19.4

Science x4 68 84 44 95 63 50 72 47 72 91 68.6 18.0

Table 3. Normalized scores and corresponding first and second principal components Member ID 1 2 3 4 5 6 7 8 9 10

Japanese u1 0.954 0.224 -1.188 -0.214 1.441 -1.334 -0.798 0.565 -0.750 1.100

English Mathematics u2 u3 0.696 -0.015 0.510 0.552 -0.974 -1.461 -0.278 1.585 1.531 -0.325 -1.438 -1.151 -0.510 -0.170 0.093 -0.790 -0.928 0.449 1.299 1.327

Science u4 -0.033 0.857 -1.368 1.469 -0.312 -1.035 0.189 -1.202 0.189 1.246

First component 0.796 1.073 -2.493 1.283 1.165 -2.479 -0.643 -0.671 -0.518 2.488

Second component 0.857 -0.348 0.321 -1.765 1.802 -0.297 -0.678 1.342 -1.148 -0.086

The normalized values are shown in Table 3. The correlation matrix is then evaluated as 0.967 0.376 0.311   1   0.967 1 0.415 0.398  R  0.376 0.415 1 0.972    1   0.311 0.398 0.972

(75)

The corresponding eigenvalues can be evaluated with a standard matrix operation shown in Chapter 15 of volume 3, and are given by

Principal Component Analysis

353

1  2.721, 2  1.222, 3  0.052, 4  0.005

(76)

The eigenvectors are given by z1  0.487u1  0.511u2  0.508u3  0.493u4

(77)

z2  0.527u1  0.474u2  0.481u3  0.516u4

(78)

z3  0.499u1  0.539u2  0.504u3  0.455u4

(79)

z4  0.485u1  0.474u2  0.506u3  0.533u4

(80)

The contribution of each principal component is given by 1 4

 0.680,

2 4

 0.306,

3 4

 0.013,

4 4

 0.001

(81)

The contribution summing up to the second component is 98% and is more than 80%, and we can focus on the two components neglecting the other ones. The first and second components are shown in Table 3. The first and second component for each subject is given by Japanese: 0.487, 0.527  English: 0.511, 0.474 

(82)

Mathematics: 0.508, 0.481 Science: 0.493, 0.516 

We treat the first and second component identically above. The weighted ones depending on the eigenvalue can also be evaluated. That is, we multiply 1 first component, and 2 by

 1.222

 2.721

to the

to the second component. The resultant data are given

Japanese: 0.803, 0.583 English: 0.843, 0.524  Mathematics: 0.838, 0.532  Science: 0.813, 0.570 

(83)

Kunihiro Suzuki

354

Let us inspect the first component z1 , where the each factor is almost the same. Therefore, it means simple summation. Let us inspect the second component z2 . u1 and u2 are the almost the same, and u3 and u4 are the almost the same with the negative sign. The group u1 and u2 corresponds to

Japanese and English, and the group u3 and u4 corresponds to mathematics and science. Therefore, this expresses the quantitative course / qualitative course. The naming is not straight forward. We should do it by inspecting the value of factors. We further weight each score as shown in Table 4, and the corresponding twodimensional plot is shown in Figure 2. Table 4. Principal components and weighted principal components

Member ID

First component

1 2 3 4 5 6 7 8 9 10

Second component

0.796 1.073 -2.493 1.283 1.165 -2.479 -0.643 -0.671 -0.518 2.488

0.857 -0.348 0.321 -1.765 1.802 -0.297 -0.678 1.342 -1.148 -0.086

Weighted first component 1.313 1.770 -4.113 2.116 1.922 -4.090 -1.060 -1.107 -0.854 4.104

Weighted second component 0.948 -0.385 0.355 -1.951 1.992 -0.328 -0.750 1.483 -1.270 -0.095

Weighted second component and factor loading

5

-5

4

3 No.5

2 No.8 Jap. Eng.

1

No.1

No.3

No.10

0 -4 No.6

-3

-2

-1

0 No.7 -1 No.9

1 2 Math. No.2 Sci.

-2

3

4

5

No.4

-3 -4

-5

Weighted first principal component and factor loading

Figure 2. Weighted principal components and factor loadings.

We first simply inspect the horizontal direction which corresponds to the first principal component. No. 10 has high score, and the second group is No. 4 and No. 5, and the last group is No. 3 and No. 6.

Principal Component Analysis

355

We then focus on the vertical direction which corresponds to the second principal component. If it is positive it is qualitative characteristics and negative it is quantitative characteristics. Let us compare the second group No. 4 and 5 based on the second principal component, where both have almost the same first principal component. No. 4 has a quantitative character, and No. 5 has a quality character. We can also see that No. 10 is an all-rounder. The distance in the plane of Figure 2 corresponds to the resemblance of each member. The distances are summarized in Table 5. No. 1 resembles to No. 5, and No.2 to No.1, and so on. Table 5. Distance between each member

No.1 No.2 No.3 No.4 No.5 No.6 No.7 No.8 No.9 No.10

No.1 No.2 No.3 No.4 No.5 No.6 No.7 No.8 No.9 No.10 1.5 5.8 3.2 1.3 5.9 3.1 2.6 3.3 3.1 1.5 6.3 1.7 2.5 6.2 3.0 3.6 2.9 2.5 5.8 6.3 7.0 6.6 0.7 3.4 3.4 3.8 8.7 3.2 1.7 7.0 4.2 6.8 3.6 5.0 3.2 2.9 1.3 2.5 6.6 4.2 6.8 4.3 3.2 4.5 3.2 5.9 6.2 0.7 6.8 6.8 3.2 3.7 3.6 8.6 3.1 3.0 3.4 3.6 4.3 3.2 2.4 0.6 5.5 2.6 3.6 3.4 5.0 3.2 3.7 2.4 2.9 5.7 3.3 2.9 3.8 3.2 4.5 3.6 0.6 2.9 5.4 3.1 2.5 8.7 2.9 3.2 8.6 5.5 5.7 5.4

5. MULTIPLE REGRESSION USING PRINCIPAL COMPONENT ANALYSIS In the multiple regression analysis, the analysis becomes unstable and inaccurate when the interaction between explanatory variables is significant as shown in Chapter 4 of volume 2. In that case, we can utilize the principal analysis to overcome the problem. Let us consider the case where the interaction between two explanatory variables x1 and

x2 is significant. We construct a composite variable given by z  a1u1  a2u2

(84)

u1 and u2 are normalized variables. We then obtain the first principal component

where as

zi   1

1 2

ui1 

1 2

ui 2

(85)

Kunihiro Suzuki

356

We should evaluate the contribution of the component given by 1

(86)

2

This should be more than 0.8. We assume that the contribution is more than 0.8 and continue the analysis below. The average of the first principal component is zero. The variance  z12 

 z12 can be evaluated below.

1 n 2  zi1 n  1 i 1 2

1 n  1 1   ui1  ui 2    n  1 i 1  2 2  n 1 1    ui12  ui 22  2ui1ui 2  2 n  1 i 1  1  r12

(87)

The correlation factor between objective variables and the first principal component is given by 1 n vi zi   n  1 i 1   2 1

ryz1 

z1

 

1 n  1 1  vi  ui1  ui 2   1  r12 n  1 i 1  2 2  1

1 2 1  r12 

r

y1

(88)

 ry 2 

When r12 is zero, it becomes average of each correlation factor, that is, the two variables becomes one variable. The regression is given by v  ryz1 z   1

 

1 2 1  r12  ry1  ry 2 2 1  r12

r

y1

1  1   ry 2   u1  u2  2 2  

 u1  u2 

(89)

Principal Component Analysis

357

Therefore, the regression can be expressed by the average correlation factor, which is decreased by the interaction between the two explanatory variables. We can easily extend this procedure to multi variables. We can obtain the first component variable that consists of m variables where the interaction is significant. That is, m

z   ai  ui 1

(90)

i 1

We then analyze the multiple regression analysis using this lumped variable as well as the other variables.

SUMMARY To summarize the results in this chapter— We normalize the data as

uij  j  1,2, , p; i  1,2, , n  We then obtain the matrix relationship given by  1 r12   r21 1    rm1 rm 2

r1m  a1   a1      r2 m  a2  a   2          1  am   am 

where rij is the correlation factor between factor i and j . We can obtain eigenvalues and eigenvectors given by



1 ; a1  k11 , k21 ,

, k p  1





, k p 



, k p

2 ; a 2  k1 2 , k2 2 ,

2



・・・ p ; a p  k1 p , k2 p ,

p



Kunihiro Suzuki

358 We then express the data as

zi   k1 ui1  k2 ui 2 

 k p uip

zi   k1 ui1  k2 ui 2 

 k p uip

1

1

2

1

2

2

1

2

・・・

zi   k1 ui1  k2 ui 2  p

p

p

 k p uip p

This is only the different expression of the same data. However, the difference exits where we focus on only the first two components. The effectiveness of the two component selection can be evaluated as 1  2   2  1 1  2    p p

 z  , z   1

i

2

i

Each item can be plotted in the two dimensional plane as    Item 2 :  k   , k   

Item 1: k1  , k1



1

2

1 2

2

2

Item p : k p  , k p  1

2



The score of each member is plotted in the two-dimensional plane as



1 zi1 , 2 zi 2



We can judge the goodness of the member by evaluating the first term of  2

z characteristics from the second term of 1 i . The closeness of each member can be evaluated from the distance given by

1 zi1

, and

Principal Component Analysis d zij 



1 zi1  1 z j1

  2

2 zi 2  2 z j2



359

2

The each item is plotted in the two-dimensional plane as



1 k j1 , 2 k j 2



The closeness of each item can be evaluated from the distance given by dkij 



1 ki1  1 k j1

  2

2 ki 2  2 k j2



2

We can also evaluate the closeness of the member and the item from the distance given by d zkij 



1 zi1  1 k j1

  2

2 zi 2  2 k j 2



2

When the multiple regression suffers high interaction between explanatory variables, we perform the principal component analysis and for a composite variable given by m

z   ai  ui 1

i 1

We then perform one variable regression for each pair.

Chapter 11

FACTOR ANALYSIS ABSTRACT We obtain various data for many factors. We assume the output data comes from some few fundamental factors. Factor analysis gives us the procedure to decide the factors, where the factor number is less than the parameter numbers in general. Whole data are then expressed by the fewer factors, which are regarded as causes of the original data.

Keywords: factor analysis, factor point, factor loading, independent load, Valimax method, rotation matrix

1. INTRODUCTION We obtain various kinds of data. We assume that there are some common causes for all data, and each data can be reproduced by the cause factors. The procedure to evaluate the common cause is called a factor analysis. There are some confusion between principal component analysis and factor analysis, and we utilize the principal component analysis in the factor analysis. However, both analyses are quite different as explained below. In the principal component analysis, we extract the same kind number of the variable as that of the original variables, but focus on only the first principal component and treat it as though it is the target variable of the original variables, which is shown in Figure 1 (a). In the factor analysis, we want to extract variables which are regarded as the cause for the all original variables. We set the number of the variable for the cause from the beginning, and determine the variables. There is no order between the extracted variables. The schematic path diagram is shown in Figure 1 (b).

362

Kunihiro Suzuki

Consequently, in principal component analysis, and factor analysis, the focused variable number is fewer than the original variable number, and the analysis is quite similar. However, the role of the extracted variables is quite different. The ones for the principal analysis are objective one for the obtained data, and the ones for the factor analysis are the explanatory variables.

Figure 1. Schematic path diagram. (a) Principal component analysis, (b) Factor analysis.

In this chapter, we start with one factor, and then treat two factors, and finally treat arbitrarily factor numbers. We repeat almost the same step three times, where some steps

Factor Analysis

363

are added associated with the number. Performing the steps, we may be able to obtain a clear image of the factor analysis. We perform matrix operation in this chapter. The basic matrix operation is described in Chapter 15 of volume 3.

2. ONE FACTOR ANALYSIS We focus on one variable, that is, we assume that only one variable contributes to the whole original data.

2.1. Basic Equation for One Variable Factor Analysis We assume that one member has three kinds of scores of x, y , z . We also assume that the three variables are all normalized.

a , a , a  We denote f as the factor point, and factor loadings as x y z , and independent

loads as

 e , e , e  . Then the original variables are expressed by

 x    y  z  

x

y

z

 ax   ex      f  a y    ey  a  e   z  z

(1)

We denote each data using a subscript i , and they are expressed by

 xi   ax   eix         yi   fi  a y    eiy  a  e  z   z   iz   i

(2)

a , a , a  The factor loadings x y z are independent of the member, but depend on x, y , z . The factor point f depends on the member, but independent of x, y , z . The independent e ,e ,e

variables x y z depend on both x, y , z and the member. The terms on right side of Eq. (2) are all unknown variables. We need some restrictions on them to decide the terms.

Kunihiro Suzuki

364

We assume that a factor point f is normalized, that is E f   0

(3)

V  f  1

(4)

The average of the independent load is zero, that is,

E  e   0

(5)

where   x, y , z . The covariance between f , e are zero, that is,

Cov  f , e   0 Cov e , e   0

(6)

for   

(7)

We define the vectors below.

 f1    f f  2      fn 

(8)

 e1x    e ex   2 x       enx 

(9)

 e1 y    e2 y ey         eny 

(10)

Factor Analysis  e1z    e ez   2 z       enz 

 x1  X   y1 z  1

365

(11)

xn   yn  zn 

x2 y2 z2

(12)

 ax    A   ay  a   z

(13)

F   f1

f2

 e1x     e1 y e  1z

e2 x e2 y e2 z

fn 

(14)

enx   eny  enz 

(15)

Eq. (1) can then be expressed by X  AF  

(16)

T Multiplying X to the Eq. (16) from the right side, and expanding it, we obtain

 x1  T XX   y1 z  1

x xn   1  x yn   2  zn    xn

x2 y2 z2

 nS xx    nS xy  nS  xz

nS xy nS yy nS yz

1   n  rxy r  xz

rxy 1 ryz

y1 y2 yn

z1   z2    zn 

nS xz   nS yz  nS zz  rxz   ryz  1 

(17)

Kunihiro Suzuki

366 T

where X is the transverse matrix of X . Similarly, we multiply transverse matrix to Eq. (16) from the right side, we obtain

 AF    AF      AF     F T AT  T  T

 AFF T AT  AF T  F T AT  T  A  FF T  AT  A  F T    F T  AT  T T

(18)

We evaluate the terms in Eq. (18) as below. n

FF T   f i 2 i 1

n

(19)

F T   f1

 e1x  e2 x fn      enx

f2

    fi eix  fi eiy i 1  i 1   0 0 0 n

n

eny

n

fe i 1

e1z   e2 z    enz 

e1 y e2 y

i iz

  

(20)

Substituting Eqs. (17), (19), and (20) into Eq. (16), we obtain 1  1 T XX   rxy n r  xz

rxz   1 ryz   AAT  T n 1 

rxy 1 ryz

(21)

Evaluating the terms in the right side of Eq.(21), we obtain

 ax    AA   a y   ax a   z T

ay

az  (22)

Factor Analysis  e1x 1 T 1    e1 y n n  e1z

e2 x e2 y e2 z

 e1x enx    e2 x eny    enz    enx

e1 y e2 y eny

367

e1z   e2 z    enz 

n n  n 2  eix eiy  eix eiz     eix i 1 i 1  i 1  n n  1 n    eiy eix  eiy 2  eiy eiz  n  i 1 i 1 i 1  n n n   2   eiz eix  eiz eiy  eiz  i 1 i 1  i 1  n 1  2 0 0   n  eix  i 1    1 n 2  0 eiy 0   n i 1    1 n 2 0  0  eiz  n i 1   Se 2 0 0   x  2   0 Sey  0     2   0 0 S ez  

(23)

Substituting Eqs. (22) and (23) into Eq. (21), we obtain 1   rxy r  xz

rxy 1 ryz

rxz   ax     ryz    a y   ax 1   az 

ay

 Se 2  x az    0   0 

0 Sey  2

0

0   0   2 Sez  

(24)

This is the basic equation for one factor analysis. Let us compare the case for principal component analysis, where the covariance matrix is expanded as follows. 1   rxy r  xz

rxy 1 ryz

rxz   ryz   1v1v1T  2 v2 v2T  3v3v3T 1 

The right side terms are all determined clearly.

(25)

Kunihiro Suzuki

368 If we can assume

 ax    a   a y   1 v1v1T a   z

(26)

 Se 2  x  0   0 

(27)

0 Sey  2

0

0   0   2 v2 v2T  3v3v3T  2 Sez  

The first principal analysis and factor analysis become identical. It is valid when the first principal component dominates all. However, it is not true in general. As we mentioned, we can determine whole terms of Eq. (25) in the principal component analysis. On the other hand, we cannot solve the terms in Eq. (24) shown below in general.   2  ax   Sex     ay  ,  0 a   z  0 

0 Sey  2

0

0   0   2 Sez  

(28)

We solve Eq. (24) with the aid of the principal component analysis, which is shown in the next section.

2.2. Factor Load We can expand the covariance matrix as below as shown in Chapter 10 for the principal component analysis. 1   rxy r  xz

rxy 1 ryz

rxz   ryz   1v1v1T  2 v2 v2T  3v3v3T 1  



1 v1



1 v1

  T

2 v2



2 v2

  T

3 v3



3 v3



T

(29)

Factor Analysis We regards the first term



1 v1



369



T

1 v1 this as factor load in the factor analysis, that

is,

1   rxy r  xz

rxy 1 ryz

rxz   ryz   1 



1_1 v1_1



1_1 v1_1



T

 Se 2  x  0   0 

0 Sey  2

0

0   0   2 Sez  

(30)

Therefore, we approximate the factor load as

 ax     a y   1_1 v1_1 a   z

(31)

This is not the final form factor loadings, but is the first approximated one. The first approximated independent load can be evaluated as   Se 2_1 0 0  1 x    2  0 Sey _1 0    rxy    2   rxz  0 0 Sez _1  

rxy 1 ryz

rxz   ryz   1 



1_1 v1_1



1_1 v1_1



T

(32)

The non-diagonal elements on the right side of Eq. (32) are not 0 in general. Therefore, Eq. (32) is invalid from the rigorous point of view. However, we ignore the non-diagonal elements and regard the diagonal elements on both sides are the same. We then form the new covariance matrix given by  1  Se 2_1 rxy rxz  x   2 R1   rxy 1  Sey _1 ryz     2   r ryz 1  Sez _1   xz

We evaluate the eigenvalue and eigenvectors of the matrix, and evaluate below.

(33)

Kunihiro Suzuki

370  Se 2_ 2  x  0   0  1    rxy r  xz

0   S 0   2 0 Sez _ 2  rxy rxz   1 ryz   1_ 2 v1_ 2 ryz 1  0

 2 ey _ 2





1_ 2 v1_ 2



T

(34)

We evaluate the difference between the approximated independent factors given by    Se 2_ 2  Se 2_1 x  x   2  2     Sey _ 2  Sey _1    S  2  S  2  ez _1   ez _ 2

(35)

If the value below is less than a certain value 



2 1   2  2  2  2  Se 2_ 2  Se 2_1    Se 2_ 2  Se 2_1 S  S  e _ 2 e _1 x x y y z z      3 



(36)

we regard the component is the load factor given by

a  1_ m v1_ m

(37)

If the value is more than a certain value, we construct the updated covariance matrix given by 1  Se 2_ 2 rxy rxz  x   2 R2   rxy 1  Sey _ 2 ryz     2   r r 1  S xz yz e _ 2  z 

(38)

We evaluate the eigenvalue and eigenvector of it and the updated one denote as

1_ 3 , v1_ 3 . We can then evaluate the updated independent matrix given by

Factor Analysis  Se 2_ 3  x  0   0  1    rxy r  xz

0   2 Sey _ 3 0   2 0 Sez _ 3  rxy rxz   1 ryz   1_ 3 v1_ 3 ryz 1 

371

0





1_ 3 v1_ 3



T

(39)

We further evaluate the updated difference of the independent element matrix below.  Se 2_ 3  Se 2_ 2  x  x  2 2      Sey _ 3  Sey _ 2     S  2  S  2  ez _ 2   ez _ 3

(40)

and evaluate below. 



2 2 2 1   2 2 2 2 2 2 Sex _ 3  Sex _ 2    Sey _ 3  Sey _ 2    Sez _ 3  Sez _ 2   3



(41)

We repeat this process until is less than the certain critical value. When we obtain this value after the m times process, we obtain the load factor as

a  1_ m v1_ m

(42)

The corresponding diagonal elements are given by d x2  1  Sex _ m , d y2  1  Sey _ m , d z2  1  Sez _ m 2

2

2

(43)

which we utilize later.

2.3. Factor Point and Independent Load We review the process up to here. The basic equation for one variable factor analysis is given by

Kunihiro Suzuki

372

 xi   ax   eix         yi   fi  a y    eiy  a  e  z   z   iz   i

(44)

We decide a factor load, and the variances of ex , ey , ez . We further need to evaluate fi and ex , ey , ez themselves. We assume that x, y , z are commonly expressed with f . This means that the predictive

fˆ value of f is denoted as i and it can be expressed with the linear sum of x, y , z given by fˆi  hfx xi  hfy yi  hfz zi

(45)

The deviation from the true value fi is expressed by n



Q f   f i  fˆi i 1



2

   fi   h fx xi  h fy yi  h fz zi   n

2

i 1

(46)

We decide the factors h fx , h fy , h fz so that Q f has the minimum value. Partial differentiate Eq. (46) with respect to h fx , and set it to 0, we obtain

Q f h fx



 h fx

  f   h n

i 1

i

x  h fy yi  h fz zi  

2

fx i

 2  fi   h fx xi  h fy yi  h fz zi  xi  0 n

i 1

(47)

We then obtain

h n

i 1

fx xi xi  h fy yi xi  h fz zi xi    f i xi n

i 1

The left hand side of Eq. (48) is further modified as

(48)

Factor Analysis

h

x x  h fy yi xi  h fz zi xi   h fx  xi xi  h fy  yi xi  h fz  zi xi

n

i 1

373

fx i i

n

n

n

i 1

i 1

i 1

 n  h fx rxx  h fy rxy  h fz rxz 

(49)

The right side of Eq. (48) is further modified as n

n

 f x   f a i 1

i i

i

i 1

x

f i  bx g i  exi 

 nax

(50)

Therefore, we obtain

h fx rxx  h fy rxy  h fz rxz  ax

Similarly, we obtain with respect to Q f h fy



 h fx

  f   h n

i 1

i

(51)

h fy

x  h fy yi  h fz zi  

as 2

fx i

 2  fi   h fx xi  h fy yi  h fz zi   yi  0 n

i 1

(52)

We then obtain

h fx rxy  h fy ryy  h fz ryz  ay

Similarly, we obtain with respect to

h fx rzx  h fy ryz  h fz rzz  az

(53)

h fz

as

(54)

Summarizing these, we obtain  rxx   rxy r  zx

rxy ryy ryz

rzx  h fx   ax      ryz  h fy    a y     rzz   h fz   az 

(55)

Kunihiro Suzuki

374 We then obtain  h fx   rxx     h fy    rxy h  r  fz   zx

rxy ryy ryz

rzx   ryz  rzz 

1

 ax     ay  a   z

(56)

Using the h fx , h fy , h fz , the factor point is given by

fˆi  hfx xi  hfy yi  hfz zi

(57)

Independent load can be evaluated as  exi   xi   ax        e  y  f i  ay   yi   i  a  e   z   z  zi   i 

(58)

3. TWO-FACTOR ANALYSIS 3.1. Basic Equation We treat a two-factor analysis with three normalized variables x, y , z in this section. The one factor point is denoted as f , and the other as g . The factor load associated with f is denoted as  ax , ay , az  , and the factor load associated with g is denoted as

b , b , b  . The independent load is denoted as  e , e , e  . The variable is then expressed x

y

z

x

y

z

as  x  ax   bx   ex          y  f a  g    y  by    ey  z a  b  e     z  z  z

(59)

Expressing each data as i , we have  xi   ax   bx   eix           yi   fi  a y   gi  by    eiy  a  b  e  z   z  z   iz   i

(60)

The factor loadings

Factor Analysis

375

 a , a , a  and b , b , b 

are independent of the member, but

x

y

z

x

y

z

depend on x, y , z . The factor point f and g depends on the member, but independent on x, y , z . The independent variables

ex , ey , ez

depend on both x, y , z and member.

The terms on right side of Eq. (60) are all unknown variables. We need some restrictions on them to decide the terms We assume that factor points f and g are normalized, that is E f   0

(61)

V  f  1

(62)

Eg  0

(63)

V g  1

(64)

The average of the independent load is zero, that is, E  e   0

(65)

where   x, y , z . The covariance between f , g , e are zero, that is, Cov  f , g   0

(66)

Cov  f , e   0

(67)

Cov  g , e   0

(68)

Cov e , e   0

for   

We define the vectors below.

(69)

Kunihiro Suzuki

376  f1    f f  2      fn 

(70)

 g1    g g  2      gn 

(71)

 e1x    e ex   2 x       enx 

(72)

 e1 y    e2 y ey         eny 

(73)

 e1z    e ez   2 z       enz 

(74)

 x1  X   y1 z  1

x2 y2 z2

 ax bx    A   a y by  a b  z   z

xn   yn  zn 

(75)

(76)

Factor Analysis f F  1  g1

f2 g2

 e1x     e1 y e  1z

e2 x e2 y e2 z

fn   gn 

377

(77)

enx   eny  enz 

(78)

Eq. (59) can then be expressed by

X  AF  

(79)

The final form is exactly the same as the one for one factor analysis. T Multiplying X to the Eq. (90) from the right side, and expanding it, we obtain

 x1  T XX   y1 z  1

x xn   1  x yn   2  zn    xn

x2 y2 z2

 nS xx    nS xy  nS  xz

nS xy nS yy nS yz

1   n  rxy r  xz

rxy 1 ryz

y1 y2 yn

z1   z2    zn 

nS xz   nS yz  nS zz  rxz   ryz  1 

(80)

T

where X is the transverse matrix of X . Similarly, we multiply transverse matrix to Eq. (16) from the right side, we obtain

 AF    AF      AF     F T AT  T  T

 AFF T AT  AF T  F T AT  T  A  FF T  AT  A  F T    F T  AT  T T

We evaluate the terms in Eq. (81) as below.

(81)

Kunihiro Suzuki

378

f FF T   1  g1

f2 g2

 n 2   fi i 1  n    gi fi  i 1 1 0  n  0 1

f F T   1  g1

 f1  fn   f2  gn     fn n  fi gi   i 1  n 2  gi   i 1 

g1   g2    gn 

(82)  e1x  f n   e2 x  gn    enx

f2 g2

   fi eix  fi eiy i 1 i 1  n n    gi eix  gi eiy i 1  i 1  0 0 0    0 0 0 n

n

e1 y e2 y eny

e1z   e2 z    enz 

  i 1  n  gi eiz   i 1  n

fe

i iz

(83)

Therefore, we obtain 1  1 T XX   rxy n r  xz

rxy 1 ryz

rxz   1 ryz   AAT  T n 1 

(84)

T We further expand the term AAT and 1 n   as

 ax bx     ax a y az  AAT   a y by     a b   bx by bz  z   z  ax ax  bx bx ax a y  bx by    a y ax  by bx a y a y  by by a a b b a a b b z x z y z y  z x

ax az  bx bz   a y az  by bz  az az  bz bz 

 ax ax ax a y ax az   bx bx bx by bx bz        a y ax a y a y a y az    by bx by by by bz  a a a a a a  b b b b b b  z y z z  z y z z   z x  z x  ax   bx        a y   ax a y az    by   bx by bz  a  b   z  z

(85)

Factor Analysis  e1x 1 T 1    e1 y n n  e1z

e2 x e2 y e2 z

 e1x enx    e2 x eny    enz    enx

e1 y e2 y eny

379

e1z   e2 z    enz 

n n  n 2  eix eiy  eix eiz     eix i 1 i 1  i 1  n n n   1 2    eiy eix  eiy eiy eiz   n  i 1 i 1 i 1  n n  n  2   eiz eix  eiz eiy  eiz  i 1 i 1  i 1  n 1  2 0 0   n  eix  i 1    1 n 2  0 eiy 0   n i 1    1 n 2 0  0  eiz  n i 1   Se 2 0 0   x  2   0 Sey  0     2   0 0 S ez  

(86)

Finally, we obtain 1   rxy r  xz

rxy 1 ryz

rxz   ax     ryz    a y   ax 1   az 

ay

 bx    az    by   bx b   z

by

 Se 2  x bz    0   0 

0 Sey  2

0

0   0   2 Sez  

(87)

This is the basic equation for two factor analysis. Let us compare the case for principal component analysis, where the covariance matrix is expanded as follows. 1   rxy r  xz

rxy 1 ryz

rxz   ryz   1v1v1T  2 v2 v2T  3v3v3T 1 

(88)

The right side terms are all determined clearly. We utilize this equation for principal component analysis as shown in the one factor analysis.

Kunihiro Suzuki

380

One subject is added for two factor analysis. We will finally obtain X , A, F ,  in the end, which satisfy

X  AF  

(89)

However, in the mathematical stand point of view, Eq. (89) is identical to

X  ATT 1F  

(90)

We assume that T is the rotation matrix, Eq. (90) can be modified as X  AT   T    F  

(91)

where   A '  AT    '   F  T    F

(92)

Therefore, if A snd F are the root of Eq. (89), A' , F ' are also the root of it. We will discuss this uncertainty later.

3.2. Factor Load for Two-Factor Variables We can expand the covariance matrix as below as shown in the one factor analysis given by 1   rxy r  xz

rxy 1 ryz

rxz   ryz   1v1v1T  2 v2 v2T  3v3v3T 1  



1 v1



We regards the first term factor analysis, that is,

1 v1



  T

1 v1



2 v2

1 v1



2 v2

  T

  T

2 v2



3 v3

2 v2





3 v3



T

(93)

T

as a factor load in the

Factor Analysis 1   rxy r  xz

rxz   ryz   1 

rxy 1 ryz



1_1 v1_1



1_1 v1_1

  T

381

2 _1 v2 _1



2 _1 v2 _1



T

 Se 2  x  0   0 

0 Sey  2

0

0   0   2 Sez  

(94)

Therefore, we approximate the factor load as

 ax   bx       a y   1_1 v1_1 ,  by   2 _1 v2 _1 a  b   z  z

(95)

This is not the final form factor loadings, but is the first approximated one. The first approximated independent load can be evaluated as   Se 2_1  x  0   0 

0  2

Sey _1 0

0  1   0    rxy  r 2   xz Sez _1 

rxy 1 ryz

rxz   ryz     1 



1_1 v1_1



1_1 v1_1

  T

2 _1 v2 _1





T 2 _1 v2 _1 



(96)

The non-diagonal elements on the right side of Eq. (96) are not 0 in general. Therefore, Eq. (96)is invalid from the rigorous point of view. However, we ignore the non-diagonal elements and regard the diagonal element on both sides are the same. We then form the new covariance matrix given by  1  Se 2_1 rxy rxz  x   2 R1   rxy 1  Sey _1 ryz     2   r r 1  S xz yz ez _1  

(97)

We evaluate the eigenvalue and eigenvectors of the matrix, and evaluate below.  Se 2_ 2  x  0   0  1    rxy r  xz

0   2 Sey _ 2 0   2 0 Sez _ 2  rxy rxz   1 ryz   1_ 2 v1_ 2 ryz 1  0





1_ 2 v1_ 2



T

(98)

Kunihiro Suzuki

382

We evaluate the difference between the approximated independent factors given by    Se 2_ 2  Se 2_1 x  x  2 2    Sey _ 2  Sey _1     S  2  S  2  ez _1   ez _ 2

(99)

If the value below is less than a certain value 



2 2 1   2 2  2    Se 2_ 2  Se 2_1    Se 2_ 2  Se 2_1 Sex _ 2  Sex _1 y y z z       3



(100)

We regard the component is the load factor given by

a  1_ 2 v1_ 2 , b  2 _ 2 v2 _ 2

(101)

If the value is more than a certain value, we construct the updated covariance matrix given by 1  Se 2_ 2 rxy rxz  x   2 R2   rxy 1  Sey _ 2 ryz     2   r r 1  S xz yz e _ 2  z 

(102)

We evaluate the eigenvalue and eigenvector of it and the updated one denote as

1_ 3 , v1_ 3 , 2 _ 3 , v2 _ 3 . We can then evaluate the updated independent matrix given by  Se 2_ 3  x  0   0  1    rxy r  xz

0   2 Sey _ 3 0   2 0 Sez _ 3  rxy rxz   1 ryz    1_ 3 v1_ 3  ryz 1  0





1_ 3 v1_ 3

  T

2 _ 3 v2 _ 3





T 2 _ 3 v2 _ 3 



(103)

Factor Analysis

383

We further evaluate the updated difference of the independent element matrix below.  Se 2_ 3  Se 2_ 2  x  x   2  2     Sey _ 3  Sey _ 2    S  2  S  2  ez _ 2   ez _ 3

(104)

and evaluate below. 



2 2 2 1   2 2 2 2 2 2 Sex _ 3  Sex _ 2    Sey _ 3  Sey _ 2    Sez _ 3  Sez _ 2   3



(105)

We repeat this process until it is less than the certain value. When we obtain this value after the m times process, we obtain the load factor as

a  1_ m v1_ m , b  2 _ m v2 _ m

(106)

The corresponding diagonal elements are given by d x2  1  Sex _ m , d y2  1  Sey _ m , d z2  1  Sez _ m 2

2

2

(107)

which we utilize later.

3.3. Uncertainty of Load Factor and Rotation There is uncertainty in load factors as we mentioned before. We solve the uncertainty in this section. We assume that we have factor loads a x , a y , a z and bx , by , bz using the showed procedure, and obtain factor points fi , gi  i  1, 2,

, n  . The procedure to obtain the

factor points will be shown later. We then obtain

 xi   ax     yi    a y  z  a  i  z

bx   eix    fi    by      eiy  g bz   i   eiz 

(108)

Kunihiro Suzuki

384

We rotate f , g with an angle of



, and the factor point

 fˆ , gˆ  , and both are related to i

 f i , gi 

is mapped to

i

 f i   cos     gi    sin 

sin    f i    cos    gi 

(109)

 f i   cos     gi   sin 

 sin    f i    cos    gi 

(110)

Therefore, we obtain  ax bx   ax bx     fi     cos   sin    fi   a y by   g    a y by   sin  cos    g   i   a b  i   a b  z  z   z  z  ax cos   bx sin  ax sin   bx cos     f    a y cos   by sin  a y sin   by cos    i   a cos   b sin  a sin   b cos    gi  z z z  z   ax bx    f    a y by  i  g    i  az bz 

(111)

where

 ax  ax cos   bx sin   a y  a y cos   by sin    az  az cos   bz sin 

(112)

bx  ax sin   bx cos   by  a y sin   by cos    bz  az sin   bz cos 

(113)

The above operation is the product of load factor matrix denoted as T   and we obtain

A

and rotation matrix

Factor Analysis

385

AT    A

(114)

We can select any rotation as the stand point of root. How, we can select the angle? We select the angle so that factor points have strong relationship to the factor load. We introduce a Valimax method, where the elements in A has the maximum and minimum values. We introduce a variable 2

2 1 1  1 2 2   V    a   2    a      b    3   x, y , z 3   x , y , z  3   x, y , z 

 

2

2   1  b  32    x, y , z

 

2

  

2

(115)

and evaluate the angle that givens the maximum V , which is called as a crude Valimax method. We can evaluate the angle that maximize V numerically. However, we can obtain its analytical one as shown below. We consider the first and third terms on the right side of Eq. (115), and modify them below. 2

  

2 b   a          x, y, z x, y, z 

2

2

2

 

2  a sin   b cos  2    a cos   b sin           x, y,z x, y,z

  a 

4

cos 4   4a 3b cos3  sin   6a 2b 2 cos 2  sin 2   4a b 3 cos  sin 3   b 4 sin 4  

4

sin 4   4a 3b sin 3  cos   6a 2b 2 sin 2  cos 2   4a b 3 sin  cos 3   b 4 cos 4  

4

 b 4  cos 4 

 x, y,z

  a



  x, y,z



  a 

2

 x, y,z

  a b  a b  cos  sin 

4

3

3

3

  x, y,z

12



  x, y,z

a 2b 2 cos 2  sin 2 

  a b  a b  cos  sin 

4

3

3

3



 x, y,z



  a

  x, y,z

4

 b 4  sin 4 

(116)

Kunihiro Suzuki

386

We then consider the second term of the right side of Eq. (115), and modify it as below.

 2    a     x , y , z 

2

 2     a cos   b sin      x , y , z 

2

      a 2 cos 2   2a b cos  sin   b 2 sin 2      x , y , z  

  a 

2

cos 2   2a b cos  sin   b 2 sin 2  

2

cos 2   2a b cos  sin   b 2 sin 2  

 x, y, z



  a   x, y, z

2

     a 2  a 2  cos 4    x, y, z    x, y, z    2   a b  a 2  cos3  sin    x, y, z    x, y, z       b 2  a 2  cos 2  sin 2     x, y, z   x, y, z    2   a 2  a b  cos3  sin    x, y, z    x, y, z    4   a b  a b  cos 2  sin 2    x, y, z    x, y, z    2   b 2  a b  cos  sin 3     x, y, z   x, y, z       a 2  b 2  cos 2  sin 2    x, y, z    x, y, z    2   a b  b 2  cos  sin 3    x, y, z    x, y, z       b 2  b 2  sin 4     x, y, z   x, y, z  Rearranging them with respect to the same term associated with

(117)

 , we obtain

Factor Analysis  2    a     x , y , z  

387

2

2

     a 2  cos 4     x, y , z    4   a 2  a b  cos3  sin    x , y , z   x , y , z   2         2   a 2  b 2   4   a b   cos 2  sin 2      x , y , z   x , y , z     x, y,z     4   b 2  a b  cos  sin 3     x, y, z   x, y , z  2

     b 2  sin 4     x, y, z 

   b   x , y , z



  2

(118)

2



 2      a sin   b cos      x , y , z  

2

      a 2 sin 2   2a b sin  cos   b 2 cos 2      x , y , z  

  a

2

sin 2   2a b sin  cos   b 2 cos 2  

  a

2

sin 2   2a b sin  cos   b 2 cos 2  

  x, y, z



  x, y, z

     a 2  a 2  sin 4    x, y, z    x, y, z    2   a b  a 2  sin 3  cos    x, y, z    x, y, z       b 2  a 2  sin 2  cos 2    x, y,z    x, y, z    2   a 2  a b  sin 3  cos    x, y, z    x, y, z    4   a b  a b  sin 2  cos 2    x, y, z    x, y, z    2   b 2  a b  sin  cos 3    x, y,z    x, y, z       a 2  b 2  sin 2  cos 2    x, y, z    x, y, z    2   a b  b 2  sin  cos 3    x, y, z    x, y, z       b 2  b 2  cos 4    x, y,z    x, y, z 

2

(119)

Kunihiro Suzuki

388

We then consider the fourth term of the right side of Eq. (115), and modify it as below. Rearranging them with respect to the same term associated with 2    b    x , y , z 

 

 , we obtain

2

2

     a 2  sin 4     x, y ,z    4   a b  a 2  sin 3  cos   x, y,z    x, y ,z      2   a 2  b 2   4   a b  a b  sin 2  cos 2   x, y ,z    x, y ,z   x, y ,z     x, y ,z    4   b 2  a b  sin  cos3     x, y ,z   x, y ,z  2

     b 2  cos 4     x, y ,z 

(120)

Summarizing the second and fourth terms of the right side of Eq. (115), we obtain 2

2

2   2    a      b    x , y , z    x , y , z  2 2         a 2     b 2   cos 4     x , y , z     x , y , z      4   a 2   b 2   a b   cos3  sin    x, y, z    x , y , z    x , y , z   2     2 2 2  4   a  b   2   a b   cos  sin 2     x , y , z   x , y , z     x , y , z      4   a 2   b 2   a b  cos  sin 3    x, y, z    x , y , z     x, y, z 2 2         a 2     b 2   sin 4     x , y , z     x , y , z  

 

(121)

Factor Analysis

389

Summarizing all, we obtain

3V 2

    1    a 2     b  3   x, y , z      x, y, z 

2

2 2  1    a  b      3   x , y , z   x, y, z   2 2     1   4 4 2 2     a  b     a     b    cos 4  3    x , y , z     x , y , z      x , y , z       1  4    a 3b  a b 3      a 2  b 2    a b    cos3  sin  3    x , y , z    x , y , z      x , y , z 2      2 1   2 2 2 2 4 3  a b    a  b   2   a b    cos  sin 2  3    x , y , z   x , y , z     x , y , z       x , y , z      1  4    a 3b  a b 3      a 2  b 2    a b   cos  sin 3  3    x , y , z      x, y , z   x , y , z 2 2     1   4 4 2 2     a  b     a     b    sin 4  3    x , y , z     x , y , z      x , y , z   K1 cos 4   K 2 cos3  sin   K 3 cos 2  sin 2   K 2 cos  sin 3   K1 sin 4  (122) 2

2

 

 

2

where

K1 

  a   x, y, z

   1   b     a 2     b 2  3    x , y , z     x, y, z   2

4

4

2

  

(123)

    1    K 2  4    a 3b  a b 3      a 2  b 2    a b    3    x , y , z    x , y , z       x , y , z 

(124)

2      1   2 2 2 2 K3  4 3  a b    a  b   2   a b    3    x , y , z   x , y , z     x , y , z       x , y , z 

(125)

We obtain the condition where V has the local maximum as below.

Kunihiro Suzuki

390

  3V    4 K1 cos3  sin 

 K 2  3cos 2  sin 2   cos 4    K 3  2 cos  sin 3   2 cos3  sin    K 2   sin 4   3cos 2  sin 2   4 K1 sin 3  cos 

 K 2 cos 4    4 K1  2 K 3  cos3  sin  6 K 2 cos 2  sin 2    4 K1  2 K 3  cos  sin 3   K 2 sin 4   K 2 cos 4   4 K 4 cos3  sin   6 K 2 cos 2  sin 2   4 K 4 cos  sin 3   K 2 sin 4  0 (126) where

1 K 4  K1  K3 2

(127)

Dividing Eq. (126) by cos 4  , we obtain

1 4

K4 K tan   6 tan 2   4 4 tan 3   tan 4   0 K2 K2

(128)

Arranging it, we obtain

4  tan   tan 3   K2   tan  4  K 4 1  6 tan 2   tan 4 

(129)

where we utilize a lemma given by

tan  4  

4  tan   tan 3   1  6 tan 2   tan 4 

(130)

Factor Analysis

391

We can then obtain

K  4  tan 1  2   K4 

(131)

We impose that the angle limitation is



 2

 4 

 2

(132)

We assume that 0 is the one root of it. 0   can become the other root as shown in Figure 2.

Figure 2. The two angles that provide the same value of tangent.

Therefore, we obtain two roots for that. One corresponds to the local maximum and the other local minimum. We then select the local maximum. Further differentiate Eq. (126) with respect to

 , we obtain

Kunihiro Suzuki

392

 2  3V   2  K 2  4 cos3  sin   4 K 4  3cos 2  sin 2   cos 4   6 K 2  2 cos  sin 3   2 cos3  sin   4 K 4   sin 4   3cos 2  sin 2    K 2  4 cos  sin 3    K 2  4 cos3  sin   12 cos  sin 3   12 cos3  sin   4 cos  sin 3   4 K 4  3cos 2  sin 2   cos 4   sin 4   3cos 2  sin 2    16 K 2  cos  sin 3   cos3  sin   4 K 4  cos 4   sin 4   6 cos 2  sin 2    16 K 2 cos  sin   sin 2   cos 2   2 2 4 K 4  cos 2   sin 2     2 cos  sin       4  K 4 cos 4  K 2 sin 4   0

(133)

Therefore, the condition for the local maximum is given by

K 4 cos 4  K 2 sin 4  0

(134)

We have the relationship of

K2 sin 4  tan  4   K4 cos 4

(135)

We then obtain

cos 4 

K4 sin 4 K2

Substituting Eq. (136) into Eq. (134), we obtain

(136)

Factor Analysis

K22  K42 sin 4  0 K2

Multiplying

K2 2 /  K2 2  K4 2   0

393

(137)

to Eq. (137), we obtain

K 2 sin 4  0

(138)

This is the condition for the local maximum. If the evaluated 4 do not hold Eq. (138), we change the angle to

4  4  

(139)

We select the sign of  so that the angle satisfy



 4

 

 4

(140)

We consider the various cases and establish the evaluation process. When



 8

K 2 sin 4  0

 

is held, this ensures the below.

 8

(141)

We adopt the angle as it is. When

K 2 sin 4  0

is not held, we then consider the second condition.

When sin 4  0 is held, 4 exist in the first quadrant. When we adopt the negative sign, we obtain

  4  

 2

Therefore, corresponding angle region is

(142)

Kunihiro Suzuki

394



 4

  

 8

(143)

This is in the acceptable region. When we adopt positive singe, we obtain

  4 

3 2

(144)

Therefore, the corresponding angle region is

 3   4 8

(145)

This is the outside of the acceptable region. We then consider the case of sin 4  0 . This means that 4 exist in the fourth quadrant region. When we adopt negative sign, we obtain



3  4   2

(146)

Therefore, corresponding angle region is



3     8 4

(147)

This is the outside of the acceptable region. When we adopt positive sign, we obtain

  4   2

(148)

Therefore, corresponding angle region is

 8

 

 4

(149)

Factor Analysis

395

This is in the acceptable region. Table 1 summarizes the above procedure. Table 1. Conditions for determine angle Condition 1

Condition 2

4

K 2 sin 4  0

None

None

K 2 sin 4  0

conversion

Range of



sin 4  0

4  4  

sin 4  0

4  4  



 8

 8

 4



 

 8

  

 

 8

 4

A crude Valimax method, which we explain up to here, is inadequate when the factor 2 d2  1  Se2 is large. When d is large, the absolute value of factor loads are also apt to

become large. Consequently, the rotation angle is dominantly determined by the factor load with large instead of whole data. Since we want to evaluate the angle related to the whole data, the result with the crude Valimax method may be unexpected one. Therefore, we

V by changing the elements to  a d  ,  b d  . 2

2

modify

Therefore, we evaluate 2

2

2

2

2  a  2   a   1 1   S       2      3   x , y , z  d   3   x , y , z  d      

2  b  2   b   1 1         2      3   x , y , z  d   3   x , y , z  d      

(150)

and decide the  which gives the maximum S . This is called as a criteria Valimax method. The corresponding factors are given by

tan  4  

where

G2 G4

(151)

Kunihiro Suzuki

396

2 2 2 2  a 4  b 4  1           a b G1                             x , y , z  d   d   3    x , y , z  d     x , y , z  d      

(152)

  a 3  b   a  b 3  1   a 2  b 2   a  b         G2  4                                 x, y , z  d   d   d  d   3   x, y , z  d   d    x, y , z  d  d    (153) 2 2 2 2 2   a  b  1 a  b   a  b       G3  4 3                   2           d d 3 d d d d   x , y , z      x , y , z      x, y , z         x , y , z            

(154)

1 G4  G1  G3 2

(155)

G2 sin 4  0

(156)

3.4. Factor Point and Independent Load with Two-Factors We review the process up to here. The basic equation for two-variable factor analysis is given by

 xi   ax   bx   exi           yi   fi  a y   gi  by    eyi  a  b  e  z   z  z   zi   i

(157)

We decide the factor load, and variances of ex , ey , ez . We further need to evaluate fi , gi and ex , ey , ez themselves. We assume that x, y , z are commonly expressed with f and g . This means that the predictive value of

fˆi ˆ and can be expressed with the linear sum of gi

fˆi  hfx xi  hfy yi  hfz zi

x, y , z

given by

(158)

Factor Analysis

gˆ i  hgx xi  hgy yi  hgz zi

397

(159)

We focus on fi first. This is the same analysis for one-factor analysis and we obtain  h fx   rxx     h fy    rxy h  r  fz   zx

rxy ryy ryz

rzx   ryz  rzz 

1

 ax     ay  a   z

(160)

Using the h fx , h fy , h fz , the factor point is given by

fˆi  hfx xi  hfy yi  hfz zi

(161)

We can perform the same analysis for g , and obtain  hgx   rxx     hgy    rxy h  r  gz   zx

rxy ryy ryz

rzx   ryz  rzz 

1

 bx     by  b   z

(162)

Using the hgx , hgy , hgz , the factor point is given by

gˆ i  hgx xi  hgy yi  hgz zi

(163)

Summarizing above, we can express as  h fx   h fy h  fz

hgx   rxx   hgy    rxy hgz   rzx

rxy ryy ryz

rzx   ryz  rzz 

1

 ax   ay a  z

bx   by  bz 

(164)

Independent factor is expressed by  exi   xi    ax   bx             eyi    yi    fi  a y   gi  by    b  e   z   a   z   zi   i    z 

(165)

Kunihiro Suzuki

398

This can be further expressed for any member i as

 ex1   ex 2    exn

ey1 ey 2 eyn

ez1   x1  ez 2   x2      ezn   xn

y1 y2 yn

z1   f1   z2   f 2      zn   f n

g1   g 2   ax   bx  gn 

ay by

az   bz  (166)

4. GENERAL FACTOR NUMBER We expand the analysis to any number of factors in this section. We treat

q

p

variables,

factors, and n data here. The restriction is

q p

(167)

The analysis is basically the same as the ones for two-factor analysis. We repeat the process again here without skipping to understand the derivation process clearly.

4.1. Description of the Variables We treat various kinds of variables, and hence clarify the definition in the start point. The kind number of variables is p , and each variable has data number of n . The probability variables are denoted as

X1 , X 2 , , X p

(168)

We express all of them symbolically as X for   1,2,

,p

Each data is expressed by

(169)

Factor Analysis

399

X 1  x11 , x21 , , xi1 , , xn1 X 2  x12 , x22 , , xi 2 , , xn 2 X p  x1 p , x2 p , , xip , , xnp

(170)

We express all of them symbolically as

X i

(171)

We sometimes express 2 variable symbolically as

X , X 

(172)

There are q kinds of factor point denoted by

f1 , f 2 , , f q

(173)

We express this symbolically as

fk for k  1, 2,

,q

(174)

When we express that for each data, we then express the factor points as

f1 : f11 , f 21 ,

, f n1

f 2 : f12 , f 22 ,

, fn2

f q : f1q , f 2 q ,

, f nq

(175)

We express this symbolically as

fik When we express two factor point symbolically as

(176)

Kunihiro Suzuki

400

f k , fl

(177)

We have q kinds of factor loads expressed by

a1 , a2 , , aq

(178)

Each factor load has p kinds of element and is expressed by  a11    a21 a1         a p1   a12    a22  a2        ap2   a1q    a2 q aq         a pq 

(179)

We express the factor load symbolically as

ak

(180)

The corresponding elements are also expressed symbolically as

ak : a k

(181)

Independent factor has p kinds and each has n data, which is denoted as

e1 , e2 , , e p It can be expressed symbolically as

(182)

Factor Analysis e for   1,2,

,p

401 (183)

We can express each independent factor related to data as e1 : e11 , e21 ,

, en1

e2 : e12 , e22 ,

, en 2

e p : e1 p , e2 p ,

, enp

(184)

It can be expressed symbolically as ei

(185)

When we express two independent factors symbolically as e , e

(186)

We can obtain eigenvalues of p kinds given by

1 , 2 , ,  p

(187)

We utilize the factor number of them given by

1 , 2 , , q

(188)

4.2. Basic Equation We assume normalized p variable of x1 , x2 , , x p and q kinds of factor of

f1 , f 2 , , f q . The factor load associated with the factor point f1 given by

a

11

by

a21

a p1  . In general, the factor load associated with the factor point f k is given

Kunihiro Suzuki

402  a1k    a2 k ak         a pk 

(189)

Therefore, the data are expressed by

 x1   a11   a12         x2   f  a21   f  a22     1  2         xp   a p1   ap2 

 a1q   e1      a2 q   e2    fq           a pq   e p 

(190)

Focusing on data i , which correspond to the all data for member i , we express it as  xi1   a11   a12         xi 2   f  a21   f  a22   i1     i2          xip   a p1   ap2 

 a1q   ei1      a2 q   ei 2    fiq           a pq   eip 

for i  1, 2,

,n

(191)

Note that a k is independent of each member i , factor point is independent of x , but depend on i . ei depends both on i and x . To solve Eq. (191), we impose some restrictions below. We assume that the factor point f k is normalized, that is, E  fk   0

(192)

V  fk   1

(193)

The average of independent factor is zero, that is,

E ex   0 where   1, 2, , p . The covariance between f k and e is zero, that is,

(194)

Factor Analysis

403

Cov  f k , fl    kl

(195)

Cov  f k , e   0

(196)

Cov e , e   

(197)

We define the vectors below.  f11    f f1   21       f n1 

(198)

 f12    f f 2   22       fn2 

(199)

・・・  f1q    f2q   fq       f nq 

(200)

 e11    e e1   21       en1 

(201)

 e12    e e2   22       en 2 

(202)

Kunihiro Suzuki

404  e1 p    e2 p ep         enp 

 x11  x12 X     x1 p  a11  a21 A    a p1

 f11  f12 F     f1q

 e11  e12     e1 p

(203)

x21 x22 x2 p a12 a22 ap2

f 21 f 22 f2q

e21 e22 e2 p

xn1   xn 2    xnp  a1q   a2 q    a pq 

(204)

(205)

f n1   fn2    f nq 

(206)

en1   en 2    enp 

(207)

Then the data are expressed with a matrix form as

X  AF   T Multiplying X to the Eq. (90) from the right side, and expanding it, we obtain

(208)

Factor Analysis

 x11 x21  x12 x22 T XX      x1 p x2 p  nS11 2 nS12 2  2    2  nS21 nS22    nS  2 nS  2 p2  p1  r11  r21  n    rp1

xn1  x11  xn 2  x21   xnp   xn1 2 nS1 p    2 nS2 p     2  nS pp 

x12 x22 xn 2

r1 p   r2 p    rpp 

r12 r22 rp 2

405

x1 p   x2 p    xnp 

(209)

T

where X is the transverse matrix of X . Since the variables are normalized, the covariance is equal to the correlation factor and hence we obtain

  r  S 2

(210)

r  1 for   1,2, , p

(211)

Similarly, we multiply the transverse matrix to Eq. (16) from the right side, we obtain

 AF    AF      AF     F T AT  T  T

 AFF T AT  AF T  F T AT  T  A  FF T  AT  A  F T    F T  AT  T T

We evaluate the terms in Eq. (81) as below.

(212)

Kunihiro Suzuki

406  f11 f 21  f12 f 22 FF T      f1q f 2 q  n 2   fi1  i 1  n  f f   i 1 i 2 i1   n    fiq fi1  i 1 1 0  0 1  n   0 0

 f11 f 21  f12 f 22 T F      f1q f 2 q  n   fi1ei1  i 1  n  f e   i 1 i 2 i1   n    fiq ei1  i 1 0 0  0 0    0 0

f n1  f11  f n 2  f 21   f nq   f n1 n

f i 1

n

i 1

2 i2

n

i 1

fn2  f  i 1  n  fi 2 fiq   i 1    n 2  fiq   i 1  n

f

i1 i 2

f f

f1q   f2q    f nq 

f12 f 22

f

iq i 2

f

i1 iq

0  0   1

(213) f n1  e11 e12  f n 2  e21 e22   f nq   en1 en 2

n

f i 1

i 1

e

i2 i2

n

f i 1

 e  i 1  n  fi 2 eiq   i 1    n f iq eiq   i1  n

e

i1 i 2

n

f

e1q   e2 q    enq 

e

iq i 2

f

i1 iq

0  0   0

(214)

Therefore, we obtain 1 0  0 1 1 XX T    n  0 0

0  0 1  AAT  T  n  1

(215)

Factor Analysis

407

T We further expand the term AAT and 1 n   as

a1q  a11 a21 a p1   a11 a12    a a a a a a 21 22 2 q  12 22 p2  AAT          a pq  a1q a2 q a pq   a p1 a p 2  a112  a12 2   a1q 2 a11a21  a12 a22   a1q a2 q  a a  a a   a a a212  a22 2   a2 q 2 22 12 2 q 1q   21 11   a a  a a   a a a p1a21  a p 2 a22   a pq a2 q p 2 12 pq 1q  p1 11 2  a11 a11a21 a11a p1    2 a a a a 21 21a p1    21 11     2  a a a p1   p1 11 a p1a21  a12 2 a12 a22 a12 a p 2    2 a a a a 22 22 a p 2    22 12     2  a a ap2   p 2 12 a p 2 a22   a1q 2  a a   2 q 1q   a a  pq 1q   a11

  a12

a11a p1  a12 a p 2  a21a p1  a22 a p 2  a p12  a p 2 2 

 a1q a pq    a2 q a pq    2   a pq 

a1q a pq   a2 q a pq    2  a pq a2 q a pq   a11    a21 a21 a p1         a p1   a12    a22  a22 ap2        ap2  a1q a2 q aa2 q 2

   a1q

   a1k

a2 q

q

k 1

a2 k

 a1q    a2 q  a pq        a pq   a1k    a2 k  a pk       a  pk 

(216)

Kunihiro Suzuki

408  e11  1 T 1  e12   n n   e1 p

e21 e22 e2 p

 1 n 2  n  ei1  i 1 1 n  e e   n i 1 i 2 i1   n 1 e e  n  ip i1  i 1  Se 2  1  0     0ix  1

0 Se2  2

0

en1  e11 e12 e1 p    en 2  e21 e22 e2 p      enp  enp   en1 en 2 1 n 1 n  ei1ei 2   ei1eip  n i 1 n i 1   1 n 2 1 n ei 2 ei 2 eip    n i 1 n i 1    n n 1 1 2   eip ei 2  eip  n i 1 n i 1 0   0    2  Sep  

(217)

Finally, we obtain 1   r21    rp1

r12 1 rp 2

r1 p   r2 p  q   a1k   k 1  1 

a2 k

  2  a1k   Se1   a2 k   0 a pk           a pk   0ix1 

0 Se2  2

0

0   0     2  Se p 

(218)

This is the basic equation for the factor analysis.

4.3. Factor Load We can expand the covariance matrix as below as shown in the one factor analysis given by  1 r12   r21 1    rp1 rp 2

r1 p   r2 p    v v T  2 v2 v2T   111  1 

  p v p v pT

(219)

Factor Analysis We regards the first term



1 v1



1 v1

  T

409

2 v2



2 v2



T







q vq



q vq



T

as

factor load in the factor analysis, that is, 1   r21    rp1

r1 p   r2 p     1 

r12 1 rp 2



1 v1

 Se 2  1  0     0 



1 v1

  T

2 v2



2 v2



T







q vq



q vq

0   0     2  Se p 

0 Se2  2

0



T

(220)

This is not the final form factor loadings, but is the first approximated one. The first approximated independent load can be evaluated as

a1  1 v1 a2  2 v2 aq  q vq

(221)

We regard these as first approximated ones, that is, we denote them as

1_1 , v1_1 , 2 _1 , v2 _1 , , q _1 , vq _1   Se 2_1  1  0     0  1  r21     rp1





0  Se2 _1 2

0 r12 1 rp 2

1_1 v1_1



(222)

0   0     2  Se p _1  r1 p   r2 p    1 

1_1 v1_1

  T

2 _1 v2 _1



2 _1 v2 _1



T







q _1 vq _1



q _1 vq _1



T

(223)

Kunihiro Suzuki

410

The non-diagonal elements on the right side of Eq. (96) are not 0 in general. Therefore, Eq. (96) is invalid from the rigorous point of view. However, we ignore the non-diagonal elements and regard the diagonal element on both sides are the same. We then form the new covariance matrix given by  1  Se 2_1 r12 1  2  r 1  Se2 _1 R1   21    rp1 rp 2 

  r2 p     2  1  Se p _1  r1 p

(224)

We evaluate the eigenvalue and eigenvectors of the matrix, and evaluate below.

1_ 2 , v1_ 2 , 2 _ 2 , v2 _ 2 , , q _ 2 , vq _ 2

(225)

The diagonal elements can then be evaluated as  Se 2_ 2 0  1 2  0 Se2 _ 2     0 0   1 r12  r21 1     rp1 rp 2 



1_ 2 v1_ 2



0   0     2  Se p _ 2  r1 p   r2 p    1 

1_ 2 v1_ 2

  T

2 _ 2 v2 _ 2



2 _ 2 v2 _ 2



T







q _ 2 vq _ 2



q _ 2 vq _ 2



T

(226) We evaluate the





 below.

1   2  2  2   2  2  2 S  S  S  S  e _ 2 e _1 e _ 2 e 1 1 2 2 _1    q 

2 2    Seq _ 2  Seq _1 

2



If it is smaller than the setting value of c , we regard the factor load as

(227)

Factor Analysis

411

a1  1_ 2 v1_ 2 a2  2 _ 2 v2 _ 2 aq  q _ 2 vq _ 2

(228)

If   c , we construct updated covariance matrix given by 1  Se 2_ 2 r12 1  2  r21 1  Se2 _ 2 R2      rp1 rp 2 

  r2 p     2  1  Se p _ 2  r1 p

(229)

and evaluate the eigenvalues and eigenvectors given by

1_ 3 , v1_ 3 , 2 _ 3 , v2 _ 3 , , q _ 3 , vq _ 3

(230)

We evaluate the diagonal elements as  Se 2_ 3 0  1 2  0 Se2 _ 3     0 0   1 r12  r21 1     rp1 rp 2 



1_ 3 v1_ 3



0   0     2 Sep _ 3  r1 p   r2 p    1 

1_ 3 v1_ 3

  T

2 _ 3 v2 _ 3



2 _ 3 v2 _ 3



T







q _ 3 vq _ 3



q _ 3 vq _ 3



T

(231) We then evaluate





 given by

1   2  2  2   2  2  2 S  S  S  S  e _ 3 e _ 2 e _ 3 e 1 2 _2    2 q  1

2 2   Seq _ 3  Seq _ 2 

2



(232)

Kunihiro Suzuki

412 If

 is smaller than c , we decide the factor load as

a1  1_ 3 v1_ 3 a2  2 _ 3 v2 _ 3 aq  q _ 3 vq _ 3

If

(233)

 is larger than c we repeat this process until





2 2 1   2  2    Se 2_ m  Se 2_ m1   S  S e _ m e _ m  1 1 2   2  q  1

2 2   Seq _ m  Seq _ m1 

2



(234)

  c is held. We finally obtain the factor load given by  a11    a21 a1  1_ m v1_ m         a p1   a12    a22  a2  2 _ m v2 _ m        ap2 

aq  q _ m vq _ m

 a1q    a2 q        a pq 

(235)

On the other hand, we obtain the final diagonal elements as d12  1  Se1 _ m 2

d 22  1  Se2 _ m 2

d q2  1  Seq _ m 2

(236)

Factor Analysis

413

4.4. Uncertainty of Load Factor and Rotation We will finally obtain X , A, F ,  in the end. However, in the mathematical stand point of view, it is expressed by X  AF    ATT 1 F  

(237)

We assume that T is the rotation matrix, Eq. (90) can be modified as X  AT   T    F  

(238)

where   A '  AT    '   F  T    F

(239)

Therefore, if A and F are the root of Eq. (89), A and F are also the roots of it. We will discuss this uncertainty later. In the case of two-factor analysis, we can consider one angle. However, in the q-factor analysis, we have q C2 kinds of planes to be considered. We consider two factors among them. The corresponding factor loads are  a1k   a1l      a a2l 2k  ak   , al             a pk   a pl 

for k  l

(240)

The conversion is expressed by the matrix as  a1k a1l     a2 k a2l   cos   sin      sin  cos      a pk a pl   a1k cos   a1l sin  a1k sin   a1l cos     a2 k cos   a2l sin  a2 k sin   a2l cos         a pk cos   a pl sin  ak sin   al cos  

(241)

Kunihiro Suzuki

414

Therefore, the converted factor load is given by  a1k cos   a1l sin    a1k      a2 k cos   a2l sin    a2 k   ak            a pk cos   a pl sin    a pk 

(242)

 a1k sin   a1l cos    a1l      a sin   a2l cos    a2l  al   2 k           ak sin   al cos    a pl 

(243)

We decide the angle



so that it maximizes the S below.

2

2

2

2

2 2 1 p  a k   1  p  a k   S           p  1  d   p 2   1  d       2 2 1  a l   1  p  a l             p  1  d   p 2   1  d       p

(244)

The  is solved as

tan  4  

G2 G4

(245)

The critical condition is given by

G2 sin 4  0

(246)

Where each parameter is given by 2 2 2 2  a  4  a  4  1    p  a k    p  a l    k l G1                   d p d d      1  d    1   1                   p

(247)

Factor Analysis   a 3  a   a  a 3  1  p  a 2  a 2  p  a  a  G2  4     k    l     k   l       k     l      k   l   x, y , z  d   d   d  d   p   1  d   d    1  d  d

415        

(248) 2 2 2  p  a 2  a 2  p  a k  a l     1  p  a k  p  a l    k l G3  4 3            2      p   1  d   1  d   1  d   d    1  d  d         (249)

1 G4  G1  G3 2

(250)

G2 sin 4  0

(251)

We select two factors. The factor k , l is expressed by the matrix below... 1        T            0 

1 (k ,k )

( k ,l )

cos 

 sin  1 1

(l ,k )

sin 

( l ,l )

cos  1

0                 1 

(252)

We then form AT  

(253)

We perform whole combination of p C2 . When we perform the Valimax rotation, the two factor loads changes in general. Next, we select the same one and different one in the next selection, both factor load changes. Therefore, we need to many cycles so that the change becomes small.

Kunihiro Suzuki

416

4.5. Factor Point and Independent Load with General Number-Factors We review the process up to here. The basic equation for general number p of variables factor analysis is given by  xi1   a11   a12         xi 2   f  a21   f  a22   i1     i2          xip   a p1   ap2 

 a1q   ei1      a2 q   ei 2    fiq           a pq   eip 

We decide factor load, and variance of e1 , e2 , and ei1 , ei 2 ,

for i  1, 2,

,n

(254)

, e p . We further need to evaluate fi , gi

, eip themselves.

We assume that x1 , x2 , , x p are commonly expressed with f1k , f 2 k ,

, f pk This means

that the predictive value of fˆik can be expressed with the linear sum of x1 , x2 , , x p given by

fˆik  h1k xi1  h2k xi 2 

 hpk xip

(255)

Assuming fik as real data, the square of the deviation is given by n



Qk   fik  fˆik i 1



2

   f ik   h1k xi1  hk 2 xi 2  n

i 1

 hpk xip  

2

(256)

We decide h1k , h2k , , hpk so that it minimize Eq. (256). Partial differentiating Eq. (256) with respect to h1k , and set it as 0, we obtain Qk   h1k h1k

n

  f i 1

ik

  h1k xi1  hk 2 xi 2 

 2  fik   h1k xi1  hk 2 xi 2  n

i 1

 hpk xip  

2

 hpk xip xi1  0

(257)

Factor Analysis

417

This leads to

h n

1k xi1 xi1  hk 2 xi 2 xi1 

i 1

 hpk xip xi1    fik xi1 n

i 1

(258)

The left side of Eq. (258) is modified as

h n

x x  hk 2 xi 2 xi1 

1k i1 i1

i 1

 hpk xip xi1   h1k  xi1 xi1  h2 k  xi 2 xi1  n

n

i 1

i 1

 n  h1k r11  h2k r21 

 hpk rp1 

n

 hpk  xip xi1 i 1

(259)

The right side of Eq. (258) is modified as n

f i 1

x

ik i1

  fik  a11 fi1  a12 fi 2  n

i 1

n

n

i 1

i 1

 a1q f iq 

 a11  fik fi1  a12  f ik f i 2 

n

 a1q  f ik f iq i 1

n

 a1k  fik fik i 1

 na1k

(260)

Therefore, Eq. (258) is reduced to

h1k r11  h2k r21 

 hpk rp1  a1k

(261)

Performing the similar analysis with respect to h k , we obtain Q f h k



 h k

n

  f i 1

ik

  h1k xi1  h2 k xi 2 

 2  fik   h1k xi1  h2 k xi 2  n

i 1

We then obtain

 hpk xip  

2

 hpk xip  xi  0

(262)

Kunihiro Suzuki

418

h n

x x  h2 k xi 2 xi 

1k i1 i

i 1

 hpk xip xi    fik xi n

i 1

(263)

The left side of Eq. (263) is modified as

h n

1k xi1 xi  h2 k xi 2 xi 

i 1

 hpk xip xi   h1k  xi1 xi  h2k  xi 2 xi  n

n

i 1

i 1

 n  h1k r1  h2k r2 

 hpk rp 

n

 hpk  xip xi i 1

(264) The right side of Eq. (263) is modified as n

f i 1

x

ik i

  fik  a 1 fi  a 2 fi 2  n

i 1

n

n

i 1

i 1

 a q fiq 

 a11  fik fi1  a12  f ik f i 2 

n

 a1q  f ik f iq i 1

n

 a1k  fik fik i 1

 na1k

(265)

Therefore, Eq. (263) is reduced to

h1k r1  h2k r2 

 hpk rp  a k

(266)

Summarizing above, we obtain  1 r12   r21 1    rp1 rp 2

r1 p   h1k   a1k   r2 p   h2 k   a2 k           1   h2 k   a2 k 

We then obtain the factor point factors as

(267)

Factor Analysis  h1k   1     h2 k    r21        hpk   rp1

r12 1 rp 2

r1 p   r2 p    1 

1

419

 a1k     a2 k       a2 k 

(268)

We can then obtain factor points as

fˆik  h1k xi1  h2k xi 2 

 hpk xip

(269)

The independent factor is given by  ei1   xi1    a11   a12            ei 2    xi 2    f  a21   f  a22        i1   i 2             eip   xip    a p1   ap2 

 a1q     a2 q     fiq       a pq  

for i  1, 2,

,n

(270)

The above results are totally expressed by  e11 e12   e21 e22    en1 en 2

e1 p   x11   e2 p   x21      enp   xn1

x12 x22 xn 2

x1 p   f11   x2 p   f 21      xnp   f n1

f12 f 22 fn2

f1q  a11  f 2 q  a12   f nq   a1q

a21 a22 a2 q

a p1   ap2    a pq 

(271)

5. SUMMARY To summarize the results in this chapter— We treat various kinds of variables in this analysis, and hence summarize the treatments systematically in this section.

5.1. Notation of Variables Number of kinds of variables is p , data number is n , and factor number is q . , . Symbolic expressions for the variable are

Kunihiro Suzuki

420

Symbolic expressions for the data are

i, j

Symbolic expressions for the factors are k , l

 i, j  ,  ,   ,  k , l   The order of the subscript is given by  Based on the above, each expression is as follows. p kinds variable are expressed as

x1 , x2 , , x p This can be expressed with symbolically as x for   1,2, , p

The detailed expression that denotes the each data value is given by x1 : x11 , x21 ,

, xn1

x2 : x12 , x22 ,

, xn 2

x p : x1 p , x2 p ,

, xnp

The corresponding symbolic expression is given by xi

Two kinds of variables can be expressed symbolically as x , x

The normalize variables are expressed by

u1 , u2 , , u p This can be expressed symbolically as u for   1,2,

,p

Factor Analysis The detailed expression that denotes the each normalized data value is given by u1 : u11 , u21 , , un1 u2 : u12 , u22 , , un 2 u p : u1 p , u2 p , , unp

This can be expressed symbolically as ui

Two kinds of normalized variables are symbolically expressed as u , u

We have q kinds of factor points and they are denoted as

f1 , f 2 , , f q We express them symbolically as fk for k  1,2,

,q

The detailed expression to denote the each data value is given by f1 : f11 , f 21 ,

, f n1

f 2 : f12 , f 22 ,

, fn2

f q : f1q , f 2 q ,

, f nq

This can be expressed symbolically as fik

Two kinds of factor points are symbolically expressed as f k , fl

421

Kunihiro Suzuki

422

There are q kinds of factor loadings as

a1 , a2 , , aq Each factor loading has p elements and are expressed as  a11    a21 a1         a p1   a12    a22  a2        ap2   a1q    a2 q aq         a pq 

The factor loading is symbolically expressed as ak

The elements are expressed as ak : a k

Independent load has the same structure as the data x . It has p kinds of variables and each variable has n data. We express them as

e1 , e2 , , e p This can be symbolically expressed as e for   1,2,

,p

Factor Analysis

423

The detailed expression to denote the each data value is given by e1 : e11 , e21 ,

, en1

e2 : e12 , e22 ,

, en 2

e p : e1 p , e2 p ,

, enp

This can be expressed symbolically as

ei Two kinds of independent loadings are symbolically expressed as

e , e We have k kinds of eigenvalues. However, we utilize up to the q number of eigenvalue in the factor analysis given by

1 , 2 , , q

5.2. Basic Equation The basic equation for the factor analysis is given by  ui1   a11   a12         ui 2   f  a21   f  a22   i1 i2              uip   a p1   ap2 

 a1q   ei1      a2 q   ei 2   fiq            a pq   eip 

for i  1, 2,

,n

This equation should hold the following equation given by 1   r21    rp1

r12 1 rp 2

r1 p   r2 p  q   a1k   k 1  1 

a2 k

  2  a1k   Se1   a2 k   0  a pk          a pk   0ix1 

0 Se2  2

0

0   0     2 Sep  

Kunihiro Suzuki

424

5.3. Normalization of Variables The average  and variance  are given by  

1 n  xi n i 1 1 n 2   xi    n  1 i 1

 

When the data are regarded as the one in population set, the variance is given by 1 n 2   xi    n i 1

 

The normalized variable is given by ui 

xi  



5.4. Correlation Factor Matrix The correlation factors are evaluated as r 

1 n  ui ui n  1 i 1

When data are ones in population, the corresponding correlation factor is given by

r 

1 n  ui ui n i 1

We can then obtain the correlation factor matrix given by  1 r12  r21 1 R    rp1 rp 2

r1 p   r2 p    1 

Factor Analysis

425

5.5. Factor Loading Once the correlation matrix is obtained, it can be expanded using related eigenvalues and eigenvectors as 1   r21    rp1

r1 p   r2 p    v v T  2 _1v2 _1v2 _1T   1_1 1_1 1_1  1 

r12 1 rp 2

  p _1v p _1v p _1T

We then approximate as follows as a first step.   Se 2_1  1  0     0ix  1 1  r21     rp1

0  2

Se2 _1 0 r12 1 rp 2

0   0     2  Se p _1  r1 p   r2 p    v v T  2 _1v2 _1v2 _1T    1_1 1_1 1_1  1 

 q _1vq _1vq _1T 

We then approximate the second approximated correlation matrix as below, and obtain related eigenvalues and eigenvectors and expand the matrix as below.  1  Se 2_1 r12 1  2  r21 1  Se2 _1     rp1 rp 2 

  r2 p    1_ 2 v1_ 2 v1_ 2T  2 _ 2 v2 _ 2 v2 _ 2T   2  1  Sep _1  r1 p

The second order independent factor is then given by  Se 2_ 2 0 0   1  2   0 Se2 _ 2 0        2   0ix  0 S ep _ 2   1 2  1  Se _1 r12 r1 p  1   2   r21 1  Se2 _1 r2 p       2    rp1 rp 2 1  Se p _1     1_ 2 v1_ 2 v1_ 2T  2 _ 2 v2 _ 2 v2 _ 2T 

 q _ 2 vq _ 2 vq _ 2T 

  p _ 2 v p _ 2 v p _ 2T

Kunihiro Suzuki

426

We evaluate the deviation of independent factor DSe

DSe

2

 2

as

   Se 2_ 2  Se 2_1 1  1     Se 2_ 2  Se 2_1 2 2       2 2    Se _ 2  Se _1  p  p 

We then evaluate 



1 p  2  Se _ 2  Se2_1 p  1 



2

We repeat the above step until we obtain   c , that is, 



1 p  2  Se _ m  Se2_ m1 p  1 



2

 c

We then obtain factor loading as  a11    a21 a1  1_ m v1_ m         a p1   a12    a22  a2  2 _ m v2 _ m       a  p2 

aq  q _ m vq _ m

 a1q    a2 q        a pq 

5.6. Valimax Rotation Factor loading and factor points have uncertainty with respect to rotation. We decide the rotation angle as follows. The element rotated with an angle of

 is given by

Factor Analysis  a1k cos   a1l sin    a1k      a2 k cos   a2l sin    a2 k   ak            a pk cos   a pl sin    a pk   a1k sin   a1l cos    a1l      a sin   a2l cos    a2l  al   2 k           ak sin   al cos    a pl 

We then evaluate as below. 2

2

2

2

2 2 1 p  a k   1  p  a k   S           p  1  d   p 2   1  d       2 2 1 p  a l   1  p  a l             p  1  d   p 2   1  d      

The angle

 that maximize S is given by

tan  4  

G2 G4

G2 sin 4  0 where 2 2  a  4  a  4  1   p  a 2   p  a 2     G1     k     l        k       l     1  d   d   p    1  d     1  d       p

2 2  p  a 3  a   a  a 3   1  p  a   a   p  a  a     G2  4    k    l     k   l       k     l      k   l    d d d  p   1  d   d    1  d  d     1  d               

2 2 2  p  a 2  a 2  p  a k  a l     1  p  a k  p  a l   k l G3  4 3             2        d d p d d d    1   1   1   1                d       

427

Kunihiro Suzuki

428

1 G4  G1  G3 2 We perform the above process to any combination of k , l . The factor loading is denoted as ak , where we should be careful about the angle conversion as Condition 1

Condition 2

4

K 2 sin 4  0

none

none



sin 4  0

4  4  



sin 4  0

4  4  

K 2 sin 4  0

conversion



range

 8



 8

5.7. Factor Point The coefficient for the factor can be obtained from

 h1k   1     h2 k    r21        hpk   rp1

r12 1 rp 2

r1 p   r2 p    1 

1

 a1k     a2 k       a2 k 

We then obtain factor point associated with a factor

fˆik  h1k xi1  h2k xi 2 

 hpk xip

5.8. Independent Factor The independent factor is obtained as

k

is given by

4

 

 8

  

 

 4

 8

Factor Analysis  ei1   xi1    a11   a12            ei 2    xi 2    f  a21   f  a22        i1   i 2             eip   xip    a p1   ap2 

429

 a1q     a2 q     fiq       a pq  

for i  1, 2,

,n

This can be totally expressed as

 e11   e21    en1

e12 e22 en 2

e1 p   x11   e2 p   x21      enp   xn1

x12 x22 xn 2

x1 p   f11   x2 p   f 21      xnp   f n1

f12 f 22 fn2

f1q  a11  f 2 q  a12   f nq   a1q

a21 a22 a2 q

a p1   ap2    a pq 

5.9. Appreciation of the Results The score of each item is appreciated by the factor of number q . The relationship between the factor and each item is expressed by a k . For example, the elements for factor 1 are expressed by a11 , a12 , , a1q . The most important one is the maximum one among a11 , a12 , , a1q . The point of each member for the factor is fik . Therefore, we should set the target where large a with low f .

Chapter 12

CLUSTER ANALYSIS ABSTRACT One subject (a person for a special case) has various kinds of items. We define distance between the items using the data, and evaluate the similarity. Based on the similarity, we form clusters of the subject. We show various methods to evaluate the similarity.

Keywords: cluster analysis, arithmetic distance, tree diagram, chain effect, Ward method

1. INTRODUCTION Let us consider scores of an examination of various subjects. The scores are scattered in general. We can deduce that some members are resembled. We want to make groups based on the results. On the other case, we can have data where each customer buys some products. It is convenient if we make some groups and perform proper advertisements depending on the characteristics of the groups. Therefore, we want to have some procedures to obtain the groups, which we study in this chapter.

2. SIMPLE EXAMPLE FOR CLUSTERING Cluster analysis defines a distance between two categories and evaluates the similarity by the distance. We have score data for English and mathematics as shown in Table 1. The distance between members A and B is denoted as

d  A, B 

, and it is defined as

Kunihiro Suzuki

432 d  A, B   

 xA1  xB1 

2

 67  43

  64  76 

2

  x A 2  xB 2 

2

2

 26.8

(1) Table 1. Score of English and Mathematics English x1

Member ID A B C D E F G H I J

67 43 45 28 77 59 28 28 45 47

Mathematics x2 64 76 72 92 64 40 76 60 12 80

We can evaluate the distance for the other combinations of students and obtain the data as shown in Table 2. The minimum distance among these data is the distance between members B and C. Therefore, we form a cluster of B and C first. After forming a cluster, we modify the data for the cluster. The simplest modification is to use its average one given by

xBC1 

xB1  xC1 x x , xBC 2  B 2 C 2 2 2

(2)

Using this data, we evaluate the distance between the members which is shown in Table 3. Table 2. Distance of members A A B C D E F G H I J

B

C

D

E

F

G

H

I

J

26.8

23.4 4.5

48.0 21.9 26.2

10.0 36.1 33.0 56.4

25.3 39.4 34.9 60.5 30.0

40.8 15.0 17.5 16.0 50.4 47.5

39.2 21.9 20.8 32.0 49.2 36.9 16.0

56.5 64.0 60.0 81.8 61.1 31.3 66.2 50.9

25.6 5.7 8.2 22.5 34.0 41.8 19.4 27.6 68.0

Cluster Analysis

433

Table 3. The distance between members for second step English x1

Member ID A B,C D E F G H I J

67 44 28 77 59 28 28 45 47

Mathematics x2 64 74 92 64 40 76 60 12 80

A A B,C D E F G H I J

B,C

D

E

F

G

H

I

J

25.1

48.0 24.1

10.0 34.5 56.4

25.3 37.2 60.5 30.0

40.8 16.1 16.0 50.4 47.5

39.2 21.3 32.0 49.2 36.9 16.0

56.5 62.0 81.8 61.1 31.3 66.2 50.9

25.6 6.7 22.5 34.0 41.8 19.4 27.6 68.0

In this case, the distance between the cluster BC and J is the minimum. Therefore, we form the cluster BCJ, and modify the data for the cluster as

xBCJ 1 

xB1  xC1  xJ 1 x x x , xBC 2  B 2 C 2 J 2 3 3

(3)

Using this data, we evaluate the distance between the members which is shown in Table 4. Table 4. The distance between members for third step Member ID A B,C,J D E F G H I

English x1 67 45 28 77 59 28 28 45

Mathematics x2 64 76 92 64 40 76 60 12

A A B,C,J D E F G H I

B,C,J

D

E

F

G

H

I

25.1

48.0 23.3

10.0 34.2 56.4

25.3 38.6 60.5 30.0

40.8 17.0 16.0 50.4 47.5

39.2 23.3 32.0 49.2 36.9 16.0

56.5 64.0 81.8 61.1 31.3 66.2 50.9

In this case the distance between members A and E is the minimum. Therefore, we form a cluster AE, and modify the data as

xAE1 

xA1  xE1 x x , xAE 2  A2 E 2 2 2

(4)

Using this data, we evaluate the distance between the members which is shown in Table 5.

Kunihiro Suzuki

434

Table 5. The distance between members for fourth step English x1

Member ID A,E B,C,J D F G H I

72 45 28 59 28 28 45

Mathematics x2 64 76 92 40 76 60 12

A,E B,C,J A,E B,C,J D F G H I

29.5

D

F

G

H

I

52.2 23.3

27.3 38.6 60.5

45.6 17.0 16.0 47.5

44.2 23.3 32.0 36.9 16.0

58.6 64.0 81.8 31.3 66.2 50.9

In this case, the distance between D, G, and H is the minimum, and we form a cluster DGH, and modify the related data as

xDGH 1 

xD1  xG1  xH 1 x x x , xDGH 2  D 2 G 2 H 2 3 3

(5)

Using this data, we evaluate the distance between the members which is shown in Table 6. Table 6. The distance between members for fifth step English x1

Member ID A,E B,C,J D,G,H F I

72 45 28 59 45

Mathematics x2 64 76 76 40 12

A,E B,C,J D,GH A,E B,C,J D,GH F I

F

I

29.55 45.61 27.29 58.59 17.00 38.63 64.00 47.51 66.22 31.30

In this case, the distance between cluster BCJ and cluster DGH is the minimum, and we form a cluster BCJDGH, and modify the data as

xBCJDGH 1 

xB1  xC1  xJ 1  xD1  xG1  xH 1 x x x x x x , xDGH 2  B 2 C 2 J 2 D 2 G 2 H 2 6 6

(6)

Using this data, we evaluate the distance between the members which is shown in Table 7. Table 7. The distance between members for sixth step Member ID A,E B,C,J,D,G,H F I

English x1 72 36.5 59 45

Mathematics x2 64 76 40 12

A,E A,E B,C,J,D,G,H F I

B,C,J,D,G,H 37.47

F 27.29 42.45

I 58.59 64.56 31.30

Cluster Analysis

435

In this case the distance between cluster AE and F is the minimum. We then form a cluster AEF and modify the data as

xAEF1 

xA1  xE1  xF1 x x x , xAEF 2  A2 E 2 F 2 3 3

(7)

Table 8. The distance between members for sixth step Member ID A,E,F B,C,J,D,G,H I

English Mathematics x1 x2 67.67 56.00 36.50 76.00 45.00 12.00

A,E,F A,E,F B,C,J,D,G,H I

B,C,J,D,G,H 37.03

I 49.50 64.56

Using this data, we evaluate the distance between the members which are shown in Table 8. In this case, the distance between cluster AEF and cluster BCJDGH is the minimum. We hence form a cluster AEF BCJDGH and evaluate the data for the cluster. Finally, we form the last cluster as shown in Table 9. Table 9. The distance between members for eighth step Member ID A,E,F,B,C,J,D,G,H I

English Mathematics x1 x2 46.89 69.33 45.00 12.00

A,E,F,B,C,J,D,G,H A,E,F,B,C,J,D,G,H I

I 57.36

After the above process, we can form a tree structure as shown in Figure 1. We can obtain some groups if we set the critical distance. For example, if we obtain 4 groups, we set the critical distance of 25 and obtain the groups of below. [B,C,J,D,G,H], [A,E], F,I

Figure 1. Tree diagram.

Kunihiro Suzuki

436

There are various definitions of distances. The process above is the simplest one among them. We should mention one point in the above process.

x

x

The above process is valid when the two variables 1 and 2 are similar ones, which means the magnitude of the two variables are in the almost same order. However, they are quite different variables and hence the magnitude is significantly different, the distance is determined by the one which is larger. Therefore, we consider only one variable in this case. In such case, we need to use normalized variables below.

xi1   x1

zi1 

x

, zi 2 

xi 2   x2

 x2

1

(8)

where

x  1

x  2



1 1 xi1 ,  x1   xi1  x1  n i n i





1 1 xi 2 ,  x2   xi 2  x2  n i n i

2

(9)



2

(10)

and n is the data number.

3. THE RELATIONSHIP BETWEEN ARITHMETIC DISTANCE AND VARIANCE We evaluate the relationship between arithmetic distance and the variance. The square of the distance is modified as



  x  x    x  x    x  x     x  x    x  x     x  x    x  x   2  x  x  x  x    x  x    x  x   2  x  x  x  x   S    2  x  x  x  x    x  x  x  x  

d 2  xax1  xbx1

2

2

ax2

bx2

2

ax1

x1

bx1

x1

bx1

x1

2

ax1

ax2

x2

bx2

x2

2

x1

2

2 ab

2

ax2

x2

bx2

ax1

ax1

x1

bx1

x1

2

x1

x2

bx1

ax2

x1

ax2

x2

bx2

x2

bx2

x2

x2

(11)

Cluster Analysis

437

where

x

ax1

 xx1

 x

bx1

xax  xbx1  xax1  xbx1    xx1   xax1  1  xbx1   2 2    xax  xbx1 xbx1  xax1  1 2 2 2 1   xax1  xbx1 4







(12)

Similarly, we obtain

x

ax2

 xx2

 x

bx2



 xx2  



1 xax2  xbx2 4



2

(13)

Therefore, we obtain



 2 1  d 2  Sab  xax  xbx1 2  1  2 1 2  Sab  d 2

  x 2

ax2

 xbx2

  2

(14)

This can be modified as

d 2  2Sab  2

(15)

The arithmetic distance is double of the variance.

4. VARIOUS DISTANCE The distance is basically defined by d ab 

1 n 2  xia  xib   n i 1

(16)

However, there are various definitions for the distance. We briefly mention the distance.

Kunihiro Suzuki

438

When each item is not identical, we multiply a weighted factor wi , and the corresponding distance is given by d ab 

1 n 2  wi  xia  xib  n i 1

(17)

This is called as weighted distance. The generalized distance is given by 1

1 n   d ab    xia  xib   n i 1 

(18)

The maximum deviation is apt to dominate the distance with increasing  .

5. CHAIN EFFECT Chain effect means that one cluster absorbs one factor in order. In that special case, the group is always two for any stage of distance. This situation is not appropriate for our purpose. The above simple procedure is apt to suffer this chain effect, although the reason is not clear.

6. WARD METHOD FOR TWO VARIABLES Ward method form a cluster in the standpoint that the square sum is the minimum. This method is regarded as the one that overcomes the chain effects, and hence is frequently used. Table 10. Score of Japanese and English for members

Member ID 1 2 3 4 5

Japanese x1

English x2 5 4 1 5 5

1 2 5 4 5

Cluster Analysis

439

Let us consider the distance between member 1 and 2. We evaluate the means of Japanese and English scores for members 1 and 2 as shown in Table 11, and evaluate the sum of square given by 2

2

K12    xik  xk 

2

i 1 k 1

  5  4.5    4  4.5   1  1.5    2  1.5  2

2

2

2

 1.00

(19)

We can evaluate the sum of square for the other combination of members as shown in Table 12. The minimum square value is that for the combination of member 4 and 5, and we form a cluster of 4 and 5 denoted as C1 . Table 11. Score of Japanese and English for member 1 and 2

Member ID

Japanese x1

1 2 Mean

English x2

5 4 4.5

1 2 1.5

Table 12. Sum of square for member combinations ID 1 2 3 4 5

1

2

3

4

1 16 4.5 8

9 2.5 5

8.5 8

0.5

5

After forming a cluster we evaluate the distance between cluster C1 and the member 1. The corresponding data is shown in Table 13. Table 13. Score of cluster

Cluster ID Member ID 1 C1 Average

1 4 5

C1

and member 1

Japanese x1 5 5 5 5.00

English x2 1 4 5 3.33

Kunihiro Suzuki

440

The mean of Japanese for this group of the cluster and member is 5 and that of English is 3.33. The square sum 3

2

K C11

KC11    xik  xk 

is evaluated as

2

i 1 k 1

  5  5.00    5  5.00    5  5.00  2

2

 1  3.33   4  3.33   5  3.33 2

2

2

2

 8.67

(20)

We do not use this sum directly, but the deviation of the sum KC11 given by



KC11  KC11  KC1  K1



 8.67   0.5  0   8.17

(21)

where K1 is 0 since there is only one data. The variance deviations for any combination are shown in Table 14. The minimum variance deviation is between members 1 and 2. Therefore, we form a cluster of members 1 and 2, and denote it as C2. Table 14. Variance deviation for second step ID

1 1 2 3 C1

2 1 16 8.17

3

9 4.83

C1

10.83

We then evaluate the variance deviation between C1, C2, and member 3, which is shown in Table 15. The minimum variance deviation is between cluster C1 and C2, and we form the cluster of C1 and C2. The corresponding tree diagram is shown in Figure 2. Table 15. Variance deviation for step 2 ID

3 3 C1 C2

10.83 16.33

C1

9.25

C2

Cluster Analysis

441

Figure 2. Tree diagram with Ward method.

7. WARD METHOD WITH MANY VARIABLES We can easily extend the procedure for the two variables to that for many variables of more than two. Let us consider that we form a cluster lm by merging cluster l and cluster m . Cluster l and cluster m has the data denoted as xilk , ximk , and the merged number is denoted as nl and nm . We then obtain the variables below n

nl

Kl    xilk  xil 

2

i 1 k 1

n

nm

(22)

K m    ximk  xim 

2

i 1 k 1

(23)

The merged variance is expressed by Klm  Kl  Km  Klm

(24)

The deviation of the variance is given by Klm 

nl nm nl  nm

n

 x i 1

il

 xim 

2

which is very important rule and is proved in the following

(25)

Kunihiro Suzuki

442 Square sum of Klm is given by

nm n  nl 2 2 Klm      xilk  xilm     ximk  xilm   i 1  k 1 k 1 

(26)

where nl

xilm 

nm

 xilk   ximk k 1

k 1

nl  nm

(27)

We modify this equation to relate it to Kl and K m as nm n  nl 2 2 K lm      xilk  xil  xil  xilm     ximk  xim  xim  xilm   i 1  k 1 k 1  nl nl   2 2  xilk  xil   2  xilk  xil  xil  xilm   nl  xil  xilm   n  k 1 k 1    n nm m 2 2 i 1      ximk  xim   2  ximk  xim  xim  xilm   nm  xim  xilm   k 1  k 1  nl nm n  2 2      xilk  xil     ximk  xim   i 1  k 1 k 1  n

2 2    nl  xil  xilm   nm  xim  xilm     i 1 n

2 2  K l  K m    nl  xil  xilm   nm  xim  xilm     i 1

(28)

where we utilize nl

nl

k 1

k 1

  xilk  xil  xil  xilm    xil  xilm    xilk  xil   0 nm

 x k 1

imk

(29)

nl

 xim  xim  xilm    xim  xilm    ximk  xim   0 k 1

(30)

Cluster Analysis nl

xil  xilm 

nl

 xilk



k 1

nl

nm

 xilk   ximk k 1

k 1

nl  nm 

nl





k 1

nm



 k 1 nl  nl  nm 

k 1



nl

nm

k 1

k 1

nm  xilk  nl  ximk nl  nl  nm 

 xilk nm nl 



nl

 nl  nm   xilk  nl  xilk   ximk 

nl



443

k 1

nm

 nl nm

x k 1

imk

nl nm nl  nl  nm 

nm nl  xil  xim  nl  nl  nm 

nm  xil  xim  nl  nm

(31)

Similarly, we obtain

xim  xilm 

nl  xim  xin  nl  nm

(32)

Therefore, we obtain  n  x  xim    nl  xim  xin    nl  m il   nm    nl  nm   nl  nm  2

nl  xil  xilm   nm  xim  xilm  2

2

n n  l



2 m

 nm nn2   xil  xim 

 nl  nm 

2

2

2

nl nm 2  xil  xim  nl  nm

(33)

Finally, Eq. (25) is proved. The distance is normalized by the data number n and is given by d lm 

K lm n

(34)

Kunihiro Suzuki

444

The average in the cluster is given by nl

xilm 

nm

 xilk   ximk k 1

k 1

nl  nm

(35)

8. EXAMPLE OF WARD METHOD WITH N VARIABLES We apply the Ward method and form clusters. Table 1 shows the CS data of members for various items. The score is 5 point perfect. The member number n is 20. In this case, the following five items influence the customer satisfaction. a. Understanding customer b. Prompt response c. Flexible response d. Producing ability e. Providing useful information f. Active proposal We evaluate clustering of items using 20 member scores. The dummy variable for items a-f is k , and that for members 1-20 is i .

Step 1: Sum of Square The sum of square is evaluated as n

nl

Kl    xilk  xil 

2

i 1 k 1

(36)

We consider an item a , and evaluate the corresponding square sum. Since we consider the item a , l is a . Since we do not form cluster, k  a . The element is only one, and the mean is the same as the data given by

xia  xia

(37)

Cluster Analysis

445

Table 16. Satisfaction score of members for various items

4 3 3 5 4 2 5 3 1 1 5 2 3 3 3 1 5 1 3 3

f:Active proposal

3 4 2 5 4 2 5 3 1 1 5 3 3 4 3 1 5 1 3 4

e:Providing uselul Information

4 3 1 5 4 2 5 3 1 1 5 3 3 4 3 1 5 1 4 4

d: Producing ability

4 4 3 5 4 2 5 4 2 2 4 3 3 3 4 1 5 4 4 4

c:Flexible response

b:Prompt response

a: Understanding customer

Member ID

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

5 3 2 5 4 2 5 3 1 1 5 2 3 3 3 1 5 3 4 3

4 2 1 5 4 2 5 3 2 3 5 2 3 3 3 1 5 3 3 3

Therefore, the variance is given by Ka  0

(38)

The variances for the other items are the same and is given by

K a  Kb 

 Kf  0

This corresponds the initialization of Kl .

Step 2: Sum of the Variance for Whole Combinations The whole combination is as below. a-b, a-b, a-d, a-e, a-f,a-e,a-f b-c,b-d,b-e,b-f

(39)

Kunihiro Suzuki

446 c-d,c-e,c-f d-e,d-f e-f

Table 17. Variance deviation of items a and b

4 4 3 5 4 2 5 4 2 2 4 3 3 3 4 1 5 4 4 4

Square of deviation

b:Prompt response

a:Understanding customer

Member ID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

4 3 1 5 4 2 5 3 1 1 5 3 3 4 3 1 5 1 4 4 Sum ΔK

0 1 4 0 0 0 0 1 1 1 1 0 0 1 1 0 0 9 0 0 20 10

We show the variance deviation evaluation of the combination between a and b . The corresponding variance is given by K ab 

na nb 20 2   xia  xib  na  nb i 1

(40)

where a and b are single item, and hence na  nb  1 . This also means the means equal to the data themselves, that is, xia  xia and xib  xib . The corresponding sum of square is 20. 20

 x i 1

ia

 xib   20 2

(41)

Cluster Analysis

447

Therefore, the variance deviation is K ab 

na nb na  nb

20

 x i 1

ia

 xib 

2

11   20 11  10

(42)

We can evaluate the other combinations as shown in Table 18. Table 18. Variance deviation for first step Item

a

a b c d

b

c 10

d

e

f

9

9

5.5

8

2

4

4.5

7

3

6.5

9

3.5

7

e

4.5

f

Step 3: First Cluster Formation The minimum variance deviation is that for Kbc . Therefore, we form a cluster of b and c . The corresponding distance dbc is given by dbc 

K bc n

2 20  0.316



The variance associated with the cluster bc is given by

(43)

Kunihiro Suzuki

448 Kbc  Kb  K c  K bc  Kbc 2

(44)

The corresponding item number is given by nbc  2

(45)

Calculating means for the cluster, we obtain the updated table and obtain Table 19. Table 19. Satisfaction score of members for various items and the first cluster

3.5 3.5 1.5 5 4 2 5 3 1 1 5 3 3 4 3 1 5 1 3.5 4

4 3 3 5 4 2 5 3 1 1 5 2 3 3 3 1 5 1 3 3

f

f:Active proposal

3 4 2 5 4 2 5 3 1 1 5 3 3 4 3 1 5 1 3 4

e:Providing uselul Information

4 3 1 5 4 2 5 3 1 1 5 3 3 4 3 1 5 1 4 4

e

d: Producing ability

a: Understanding customer 4 4 3 5 4 2 5 4 2 2 4 3 3 3 4 1 5 4 4 4

d

[bc] mean

Member ID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

[bc]

c:Flexible response

a

b:Prompt response

Cluster 1

5 3 2 5 4 2 5 3 1 1 5 2 3 3 3 1 5 3 4 3

Step 4: Sum of Square for Second Time We then evaluate the variance deviation of the combinations below. a-[bc], a-d, a-e, a-f,a-e,a-f [bc]-d,[bc]-e,[bc]-f d-e,d-f e-f

4 2 1 5 4 2 5 3 2 3 5 2 3 3 3 1 5 3 3 3

Cluster Analysis

449

We show the evaluation of variance between a and [bc] below K abc 

na nbc na  nbc

20

 x i 1

ia

 xibc 

2

1 2  18 1 2  12



(46)

The corresponding table is given by Table 20. Table 20. Variance deviation of items a and a cluster [bc]

4 4 3 5 4 2 5 4 2 2 4 3 3 3 4 1 5 4 4 4

Square of deviation

[bc] mean

a: Understanding customer

Member ID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

3.5 3.5 1.5 5 4 2 5 3 1 1 5 3 3 4 3 1 5 1 3.5 4 Sum ΔK

0.25 0.25 2.25 0 0 0 0 1 1 1 1 0 0 1 1 0 0 9 0.25 0 18 12

We perform similar analysis for the other combinations and obtain Table 21.

Kunihiro Suzuki

450

Table 21. Variance deviation for second step

ID

a

a

bc

d

12

bc

e

f

9

5.5

8

4

6.67

10

3.5

7

d

e

4.5

f

Step 5: Second Cluster Formation The minimum variance deviation is that for Kde . Therefore, we form a cluster of d and e . The corresponding distance dbc is given by d de 

K de n

3.5 20  0.418



(47)

The variance associated with the cluster bc is given by K de  K d  K e  K de  K de  3.5

The corresponding item number is given by

(48)

Cluster Analysis nde  2

451 (49)

We repeat the above process and obtain the tree diagram

Figure 3. Tree diagram for the data.

SUMMARY To summarize the results in this chapter— The distance between two variables are given by

d  A, B  

 xA1  xB1    xA2  xB2  2

2

We evaluate all combination of the variables, and select the minimum one, and form a cluster. After forming a cluster, we modify the data for the cluster. The simplest modification is to use its average one given by

xAB1 

xA1  xB1 x x , xAB 2  A2 B 2 2 2

Using this data, we evaluate the distance between the members. We can use a modified distance as 1

1 n   d    xiA  xiB   n i 1 

Kunihiro Suzuki

452

There is the other method for forming clusters called as a Ward method. The Ward method forms a cluster in the standpoint that the square sum is the minimum. Let us consider that we form cluster lm by merging cluster l and cluster m . Cluster l and cluster m has the data denoted as xilk , ximk , and the merged number is denoted as nl and nm . We then obtain the variables below. n

nl

Kl    xilk  xil 

2

i 1 k 1

n

nm

K m    ximk  xim 

2

i 1 k 1

The merged variance is expressed by Klm  Kl  Km  Klm

The deviation of the variance is given by Klm 

nl nm nl  nm

n

 x i 1

il

 xim 

2

We search the minimum combination for this deviation of variance, and repeat the process.

Chapter 13

DISCRIMINANT ANALYSIS ABSTRACT We assume that we have two groups and have reference data where we clearly know to which group the data belong to. When we obtain one set data for a certain member or subject, the discriminant analysis give us the procedure to decide the group for the member or the subjects. The decision is done by evaluating the distance between the data and ones for each group.

Keywords: distance, determinant, Maharanobis’ distance

1. INTRODUCTION When we have a bad condition in health, we go to a hospital, and take some kinds of medical checks. The doctor has to decide whether the person is sick or not. This problem is generalized as follows. Two groups are set, and the corresponding data with the same items are also provided. We consider the situation where we obtain one set data for the items. Discriminant analysis helps us to decide to which group it belongs. We perform matrix operation in this chapter. The basic matrix operation is shown in Chapter 15 of volume 3.

2. DISCRIMINANT ANALYSIS WITH ONE VARIABLE We want to judge whether a person is healthy or disease using one inspection value x1 . We define a distance between a data and two groups and select the shorter one and decide the person who belongs to the group of shorter distance.

Kunihiro Suzuki

454

We show the inspection data for healthy and diseased people as shown in Table 1. We inspect a person and judge whether he is healthy or diseased. First of all, we make two groups from the data in Table 1: healthy and disease. We then evaluate the averages and variances for the groups as shown in Table 2. Table 1. Inspection data for healthy and diseased people

No.

Kind 1 2 3 4 5 6 7 8 9 10

Healthy Healthy Healthy Healthy Healthy Disease Disease Disease Disease Disease

Inspection x1 50 69 93 76 88 43 56 38 21 25

Table 2. Average and variance for healthy and diseased people

Average Variance

Healthy

Disease

50 69 93 76 88 75.2 230.96

43 56 38 21 25 36.6 159.44

  The average and variance of healthy people are denoted as H1 , and H 1 , and the 2

  average and variance of disease people are denoted as D1 , and D1 . These are evaluated as below. 2

1 5  75.2

H 1    50  69  93  76  88

(1)

Discriminant Analysis

455

1 2 2 2 2 2  H 21    50  75.2    69  75.2    93  75.2    76  75.2   88  75.2   5   230.96



1 5  36.6

D1    43  56  38  21  25

(3)

1 2 2 2 2 2  D 21    43  36.6    56  36.6    38  36.6    21  36.6    25  36.6   5   159.44

(2)



(4)

We assume that the healthy and diseased people data follow normal distributions. The distribution for healthy people is given by

f  x1  

  x1   H 1 2  exp   2 2 2 H 1 2 H 1   1

(5)

The distribution for disease people is given by

  x1   D1 2  f  x1   exp    2 2 2 D 1 2 D 1   1

Figure 1 shows the above explanation schematically.

Figure 1. The probability distribution for healthy and diseased people.

(6)

Kunihiro Suzuki

456

When we have a data value of x1 , we want to judge whether the person is healthy or diseased. We define the distance as below.

DH2 

D  2 D

 x1   H 1 

2

(7)

 H 21

 x1  D1 

2

(8)

 D 2

We can judge whether the person is healthy or diseased as  DH2  DD2  2 2  DH  DD

 Healthy  Disease

(9)

We evaluate the value of x1 where the both distance is the same and denote it as c . We then have the relationship of

H 1  c  H 21



c   D1

(10)

 D 21

We then obtain

c

H 1  H 21  D1  D 21

(11)

 H 21   D 21

In this case, we obtain c

75.2  230.96  36.6  159.44 230.96  159.44

 57.68

(12)

Summarizing above, we obtain general expressions for the decision as  DD2  DH2  0  x1  c  Helathy  2 2  DD  DH  0  x1  c  Disease

(13)

Discriminant Analysis

457

3. DISCRIMINANT ANALYSIS WITH TWO VARIABLES We consider two inspections 1 and 2, and groups to extend the analysis for many variables. Two inspection items are expressed by a vector

A and B . We here introduce vectors

x  x 1  x2 

(14)

The vector associated with averages are given by     μ A   1 A  , μ B   1B   2 A   2 B 

(15)

   2 12 2A   11 2B  A   112A ,           2  B    2  21A 22 A   21B

12 2B 



 2   22 B 

(16)

The determinant for  A and B are given by

A 

B 

11 2A 12 2A  2

 21A  22 A 11 2B 12 2B  2

       11 A  22 A   12 A 21 A

(17)

       11 B  22 B   12 B 21B

(18)

2

 2

2

 2

 21B  22 B

2

2

2

2

2

2

The distribution for two variables for the group A is given by

f A  x1 , x2  

1 2

 2  2

 DA2  exp     2  A

 2   11 A D   x1  1A , x2  2 A   21 2     A 2 A

where

 2   12  x  1A  A  1  22 2    x    A  2 2A 

(19)

(20)

Kunihiro Suzuki

458

 2  2    2   11  12 11A 12 2A  A A   21     A1    2  22 2     2   2   A A   21A 22 A 

(21)

The Maharanobis’ distance associated with the group A is denoted as d A and is given by

d A  DA2

(22)

The Maharanobis’ distance associated with the group B is denoted as d B and is given by

d B  DB2

(23)

where   11 2 DB2   x1  1B , x2  2 B   B21 2     B

 2   12  x  1B  B  1  22 2    x    B  2 2B 

(24)

We evaluate d A and d B , and decide that the data belong to which group.

4. DISCRIMINANT ANALYSIS FOR MULTI VARIABLES We extend the analysis in the previous section to a multi variables case, where we assume p variables. The data are expressed by a vector having p elements given by  x1    x2 x       xp 

It follows the normal distribution with

(25)



N   ,  k

 , and is expressed by

Discriminant Analysis

f  x1 , x2 , , x p  

1 p

 2  2

459

 D2  exp     2  

(26)

where   11 2  12 2  2  2      22    21     2   2 p2  p1

D 2   x1  1 , x2  2 ,

p

 1 p2 



 2 2p 

(27)

   pp2 

  11 2  12 2  21 2      22 2 , xp   p      p1 2  p 2 2 

p

   xi  i   x j   j  

 1 p 2   x1  1     2 p 2   x2  2 

      pp 2   x p   p 

(28)

ij  2 

i 1 j 1

We also define a vector of μ as  1    2 μ       p 

(29)

The distance d is given by d  D2

(30)

We can select the group for shorter distance.

SUMMARY To summarize the results in this chapter— The distance of the member and a group can be evaluated as

Kunihiro Suzuki

460 d  D2

where

D 2   x1  1 , x2  2 ,

  11 2  12 2  21 2   22 2 , xp   p      p1 2  p 2 2 

 1 p  2   x1  1     2 p  2   x2  2 

      pp 2   x p   p 

We evaluate the distance with respect to each group, and select the group with smaller distance.

REFERENCES [1] [2]

[3] [4] [5]

[6]

[7]

W. L. Carlson and B. Thorne, Applied Statistical Methods, 1997, Presence-Hall, Inc., New Jersey, U.S.A. R. A. Barnett, M. R. Ziegler, and K. E. Byleen, College Mathematics for Business, Economics, Life science, and Social sciences 12th edition, 2011, Pearson Education, Inc., 2011, U. S. A. G.Maruyama, Probability and statistics, 1956, Kyoritu, Japan, in Japanese. 丸山儀四郎、“確率および統計入門”、共立出版、日本、1956. A. Kobari, Introduction to probability and statistics, 1973, Iwanami Shoten, Japan. 永田靖、小針 宏、“確率・統計入門”、岩波書店、東京、1973. Y. Tanaka and K. Wakimoto, Multivariate statistical analysis, 1994, Gendai Sugakusha, Japan, in Japanese. 田中豊、脇本和昌、“多変量統計解析法”、現 代数学社、京都、1994. Y. Nagata and M. Munechika, Multivariate analysis, 2007, Science Company, Japan, in Japanese. 永田靖、棟近雅彦、“多変量解析法入門”、サイエンス社、東京、 2007. Y. Wakui and S. Wakui, Covariance Structural Analysis, Nihon Jitsugyo Publisher, 2003, Japan, in Japanese. 涌井良幸、涌井貞美、“共分散構造分析”、日本実業

出版社、東京、2003. [8] Y. Wakui, Bayes Statistics as a Tool, Nihon Jitsugyo Publisher, 2009, Japan, in Japanese. 涌井良幸、“道具としてのベイズ統計”、日本実業出版社、東京、 2009. [9] H. Cramer, Mathematical Methods of Statistics, 1999, Princeton University Press, U. S. A. [10] D. M. Levine,T. C. Krehbiel, and M. L. Berenson, Business Statistics, 2013, Pearson Education Inc., U. S. A.

ABOUT THE AUTHOR Kunihiro Suzuki, PhD Fujitsu Limited, Tokyo, Japan Email: [email protected]

Kunihiro Suzuki was born in Aomori, Japan in 1959. He received his BS, MS, and PhD degrees in electronic engineering from Tokyo Institute of Technology, Tokyo, Japan, in 1981, 1983, and 1996, respectively. He joined Fujitsu Laboratories Ltd., Atsugi, Japan in 1983 and was engaged in design and modeling of high-speed bipolar and MOS transistors. He studied process modeling as a visiting researcher at the Swiss Federal Institute of Technology, Zurich, Switzerland in 1996 and 1997. He moved to Fujitsu Limited, Tokyo, Japan in 2010, where he was engaged in a division that is responsible for supporting sales division. His current interests are statistics and queuing theory for business. His research covers theory and technology in both semiconductor device and process. To analyze and fabricate high-speed devices, he also organizes a group that includes physicists, mathematicians, process engineers, system engineers, and members for analysis such as SIMS and TEM. The combination of theory and experiment and the aid from various members make his group special to do various original works. His models and experimental data are systematic and valid for wide range conditions and can contribute to academic and practical product fields. He is the author and co-author of more than 100 refereed papers in journals, more than 50 papers in international technical conference proceedings, and more than 90 papers in domestic technical conference proceedings.

INDEX A analysis of variance, v, 277, 285, 290, 305 arithmetic distance, 431, 436, 437 average, ix, 3, 4, 5, 6, 46, 55, 56, 64, 65, 66, 75, 77, 84, 86, 92, 93, 126, 148, 227, 228, 231, 232, 233, 234, 235, 236, 239, 240, 241, 242, 244, 248, 249, 250, 253, 254, 255, 257, 261, 269, 270, 271, 274, 277, 279, 281, 282, 285, 287, 290, 291, 292, 293, 295, 299, 301, 302, 311, 321, 322, 328, 331, 341, 356, 357, 364, 375, 402, 423, 432, 444, 451, 454

B Bayesian network, 179, 180, 210, 211, 212, 217, 219, 220, 221 Bayesian updating, 179, 187, 191, 197, 208, 223, 224, 227, 228 beta distribution, 223, 229, 230, 269 beta function, 223 binomial distribution, 223, 232

C chain effect, 431, 438 child node, 179, 210, 212, 213, 217 cluster analysis, v, 431 correlation factor, v, ix, 1, 6, 7, 8, 9, 10, 12, 13, 15, 16, 17, 20, 21, 22, 23, 24, 27, 43, 44, 45, 50, 51, 52, 53, 54, 56, 58, 59, 62, 63, 64, 65, 66, 67, 69, 71, 74, 76, 77, 80, 81, 93, 117, 122, 123, 143,

144, 152, 153, 154, 155, 157, 164, 341, 356, 357, 405, 424 covariance, 1, 2, 4, 5, 6, 10, 15, 17, 75, 133, 150, 169, 173, 176, 364, 367, 368, 369, 370, 375, 379, 380, 381, 382, 402, 405, 408, 410, 411, 461

D degree of freedom adjusted coefficient of determination, 82, 124 determinant, 18, 20, 79, 82, 93, 121, 122, 123, 124, 138, 142, 144, 157, 162, 453, 457 determination coefficient, 79, 80, 122, 143 Dirichlet distribution, 223, 229, 232, 233, 270 discriminant analysis, v, ix, 453, 457, 458 distance, 93, 126, 147, 163, 351, 355, 358, 359, 431, 432, 433, 434, 435, 436, 437, 438, 439, 443, 447, 450, 451, 453, 456, 458, 459, 460

E eigenvalue, 339, 344, 346, 347, 351, 353, 369, 370, 381, 382, 410, 423 eigenvector, 339, 346, 347, 349, 350, 370, 382

F factor analysis, v, ix, 306, 314, 327, 361, 362, 363, 367, 368, 369, 371, 374, 377, 379, 380, 396, 397, 398, 408, 409, 413, 416, 423 factor loading, 354, 361, 363, 369, 375, 381, 409, 421, 422, 424, 426, 428

Index

466 factor point, 361, 363, 371, 374, 375, 383, 384, 385, 396, 397, 399, 400, 401, 402, 416, 418, 421, 427, 428 forced regression, 77

G Gamma distribution, 233, 234, 235, 271 Gamma function, 223 Gibbs sampling, 223, 244, 249, 250, 259, 275

N normal distribution, 18, 21, 22, 27, 30, 32, 33, 44, 49, 50, 55, 56, 65, 71, 83, 87, 89, 90, 125, 128, 135, 146, 149, 150, 236, 239, 255, 262, 265, 272, 299, 455, 458

O observed variable, 169, 170, 176, 177 orthogonal table, v, ix, 305, 317, 328

H hierarchical Bayesian theory, ix, 223, 261

I independent load, 361, 363, 364, 369, 371, 374, 375, 381, 396, 409, 416, 423 inverse Gamma distribution, 223, 239, 273

J Jacobian, 19, 21, 24, 25, 36, 37

L latent variable, 169, 170, 171, 172, 175 leverage ratio, 69, 83, 84, 85, 114, 124, 125, 126, 127, 145, 146, 147, 148 likelihood function, 223, 224, 228, 229, 230, 232, 233, 234, 236, 237, 240, 263, 264, 269, 270, 271, 272, 275 logarithmic regression, 69, 95 logistic regression, 100

M Metropolis-Hastings (MH) algorithm, 223, 250, 251, 252, 256, 257, 258, 259, 261, 275 Monty Hall problem, 185, 187 multicollinearity, 117, 157 multinomial distribution, 223, 232 multiple regression, v, viii, 117, 124, 125, 140, 146, 160, 167, 340, 355, 357, 359

P parent node, 179, 210 partial correlation, 117, 155, 157, 164 partial correlation factor, 117, 155, 157, 164 path diagram, 70, 169, 170, 171, 176, 177, 340, 361, 362 Poisson distribution, 233 population, v, ix, 4, 8, 23, 24, 53, 62, 66, 70, 80, 81, 82, 86, 117, 118, 128, 141, 149, 181, 223, 224, 229, 233, 268, 274, 424 population ratio, 229, 233 posterior probability, 179, 188, 189, 191, 223, 228, 235, 242, 244, 247, 250, 252, 253, 258, 259, 261, 268 prediction, vii, 17, 50, 51, 52, 54, 55, 64, 86, 117, 127, 149, 179, 227, 282, 284 principal component analysis, v, ix, 339, 340, 347, 350, 351, 355, 359, 361, 362, 367, 368, 379 priori probability, 179, 223, 228, 230, 233, 235, 240, 268, 269, 270 pseudo correlation, 1, 12

R rank correlation, 8 regression, v, viii, 12, 23, 69, 70, 71, 72, 74, 76, 77, 78, 79, 80, 81, 83, 84, 86, 90, 92, 93, 94, 95, 97, 100, 108, 109, 110, 111, 112, 113, 114, 117, 118, 121, 122, 123, 124, 125, 127, 135, 137, 138, 139, 140, 141, 142, 144, 145, 146, 149, 150, 151, 155, 157, 158, 159, 160, 161, 162, 163, 164, 165, 167, 339, 340, 355, 356, 357, 359

Index regression line, 69, 72, 74, 76, 77, 79, 80, 81, 83, 86, 90, 93, 100, 111, 118, 122, 127, 135, 141, 149, 150, 163 rotation matrix, 361, 380, 384, 413

S sample correlation factor, 6, 8, 23, 24, 27, 43, 44, 45, 51, 52, 53, 62, 63, 66, 81, 93 standard deviation, 56, 231, 235, 240, 261, 266, 351 standard normal distribution, 18, 27, 30, 32, 33, 49, 56, 83, 89, 125, 146 structural equation modeling, v, viii, 169, 170 studentized range distribution, 277, 283, 299

467 T three prisoners, 184

V Valimax method, 361, 385, 395

W Ward method, 431, 438, 441, 444, 452 weighted distance, 438