MATHEMATICS RESEARCH DEVELOPMENTS
STATISTICS VOLUME 3 CATEGORICAL AND TIME DEPENDENT DATA ANALYSIS
MATHEMATICS RESEARCH DEVELOPMENTS Additional books and e-books in this series can be found on Nova’s website under the Series tab.
KUNIHIRO SUZUKI
Copyright © 2019 by Nova Science Publishers, Inc. All rights reserved. No part of this book may be reproduced, stored in a retrieval system or transmitted in any form or by any means: electronic, electrostatic, magnetic, tape, mechanical photocopying, recording or otherwise without the written permission of the Publisher. We have partnered with Copyright Clearance Center to make it easy for you to obtain permissions to reuse content from this publication. Simply navigate to this publication’s page on Nova’s website and locate the “Get Permission” button below the title description. This button is linked directly to the title’s permission page on copyright.com. Alternatively, you can visit copyright.com and search by title, ISBN, or ISSN. For further questions about using the service on copyright.com, please contact: Copyright Clearance Center Phone: +1-(978) 750-8400 Fax: +1-(978) 750-4470 E-mail: [email protected]. NOTICE TO THE READER The Publisher has taken reasonable care in the preparation of this book, but makes no expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of information contained in this book. The Publisher shall not be liable for any special, consequential, or exemplary damages resulting, in whole or in part, from the readers’ use of, or reliance upon, this material. Any parts of this book based on government reports are so indicated and copyright is claimed for those parts to the extent applicable to compilations of such works. Independent verification should be sought for any data, advice or recommendations contained in this book. In addition, no responsibility is assumed by the publisher for any injury and/or damage to persons or property arising from any methods, products, instructions, ideas or otherwise contained in this publication. This publication is designed to provide accurate and authoritative information with regard to the subject matter covered herein. It is sold with the clear understanding that the Publisher is not engaged in rendering legal or any other professional services. If legal or any other expert assistance is required, the services of a competent person should be sought. FROM A DECLARATION OF PARTICIPANTS JOINTLY ADOPTED BY A COMMITTEE OF THE AMERICAN BAR ASSOCIATION AND A COMMITTEE OF PUBLISHERS. Additional color graphics may be available in the e-book version of this book.
Library of Congress Cataloging-in-Publication Data
Published by Nova Science Publishers, Inc. † New York
CONTENTS

Preface
Chapter 1. Customer Satisfaction Analysis
Chapter 2. Independent Factor Analysis
Chapter 3. Statistical Testing and Predictions
Chapter 4. Score Evaluation
Chapter 5. AHP (Analytic Hierarchy Process)
Chapter 6. Quantification Theory I
Chapter 7. Quantification Theory II
Chapter 8. Quantification Theory III (Correspondence Analysis)
Chapter 9. Quantification Theory IV
Chapter 10. Survival Time Probability
Chapter 11. Population Prediction
Chapter 12. Random Walk
Chapter 13. A Markov Process
Chapter 14. Random Number
Chapter 15. Matrix Operation
Appendix 1. Related Mathematics
Appendix 2. Summary of Probability Distributions and Their Moments
References
About the Author
Index
Related Nova Publications
PREFACE

We use statistics when we evaluate TV program ratings, predict the results of voting, manage stock, predict the amount of sales, and evaluate the effectiveness of medical treatments. We want to predict such results not on the basis of personal experience or impressions, but on the basis of the corresponding data. The accuracy of a prediction depends on the data and on the related theories. It is easy to feed data into a model and show its output without understanding the model. However, the models themselves are not perfect, because in general they contain assumptions and approximations. Therefore, a model should be applied to data with care. We should know which model to apply to the data, what is assumed in the model, and what we can state based on the results of the model.

Let us consider a coin toss, for example. When we toss a coin, we obtain a head or a tail. If we toss the coin three times, we may obtain two heads and one tail. The observed frequency of heads is then 2/3, and that of tails is 1/3. This is a fact, and we need not discuss it further. It is important to notice that this probability (2/3) of getting a head is limited to this trial. We can never say that the probability of obtaining heads with this coin is 2/3, which would be a statement about the general characteristics of the coin. If we perform the coin toss 300 times and obtain heads 200 times, we may be able to state that the probability of obtaining a head is 2/3 as a characteristic of the coin. What we can state based on the obtained data depends on the sample number. Statistics gives us a clear guideline under which we can state something based on the data, with corresponding error ranges.

The mathematics used in statistics is not easy, and it can be tough work to acquire the related techniques. Fortunately, software development has made it easy to obtain results, so many people who are not specialists in mathematics can perform statistical analysis with such software. However, it is important to understand the meaning of a model: why certain variables are introduced, what they express, and what we can state based on the results. Understanding the mathematics related to the models is therefore required to appreciate the results.
In this book, we treat models from fundamental ones to advanced ones without skipping their derivation processes, as far as possible. We can then clearly understand the assumptions and approximations used in the models, and hence the limitations of the models. We also cover almost all the subjects in statistics, since they are all related to each other, and the mathematical treatments used in one model are frequently used in the others. There are many good practical and theoretical books on statistics [1]-[10]. However, each of these books is oriented in a particular direction: fundamental, mathematical, or special subjects. I want to add one more, which treats fundamental and advanced models from the beginning in a self-contained style. I also aim to connect theories to practical subjects. This book consists of three volumes:

The first volume treats the fundamentals of statistics. The second volume treats multiple variable analysis. The third volume treats categorical and time dependent data analysis.

In volumes 1 and 2, we treat numerical data. In this volume, we treat categorical data, with which we can perform analyses similar to those for numerical data. We also treat time dependent data analysis in this volume. We treat the following subjects.

(Chapters 1 and 2) We introduce customer satisfaction (CS) analysis, which decides the important items for improving a target subject based on two standpoints: the correlation factor between each item and the target subject, and the level of each item. This analysis is vital for clarifying the items on which we should focus to improve the objective variable. Independent factor analysis is the categorical version of CS analysis.

(Chapter 3) We summarize the discussions on testing and predictions up to this chapter, which is very important because judgments are made based on statistical methods.

(Chapter 4) We introduce score analysis, where we select a subject with a low score and a high variance, which is supposed to be the most effective one for improving the total score.
(Chapter 5) We treat the analytic hierarchy process (AHP), which is an analysis for various qualitative data. We show how to make a decision quantitatively based on qualitative data.

(Chapters 6 to 9) We treat multi-variable analysis for categorical data. Quantification theory I corresponds to multiple regression analysis for categorical data. Quantification theory II corresponds to discriminant analysis for categorical data. Quantification theory III clarifies the relationship between two categorical variables. Quantification theory IV clarifies the similarity between two categorical variables.

(Chapter 10) From this chapter on, we cover the time dependence of probability. We treat survival time probability, which is frequently used in medical fields, where the data are often incomplete but are still used in full.

(Chapter 11) We treat the population problem, which is very important for us. We show how to predict the age constitution of a population.

(Chapters 12 and 13) We treat time dependent probability functions in these chapters. We start with the random walk and extend it to a Markov process.

(Chapter 14) We briefly study random numbers for generating pseudo experimental data. Generating random numbers is vital for predicting results theoretically, and is necessary in Monte Carlo simulation.

(Chapter 15) We briefly study matrix operations, which are important and fundamental in statistics.

(Appendix 1) We add a brief explanation of the mathematics related to this book.

(Appendix 2) We evaluate moment parameters with various methods, and summarize the probability distribution functions and their related moment parameters.
We hope that readers can come to understand the meaning of the models in statistics and the techniques used to reach the final results. I think this is not easy. However, I also believe that this book helps one to accomplish it with time and effort. I tried to derive every model from the beginning for all subjects, although many of the derivations are not complete. Any comments and suggestions on my analysis would be greatly appreciated.
Kunihiro Suzuki
Chapter 1
CUSTOMER SATISFACTION ANALYSIS

ABSTRACT

We treat data which consist of one objective variable, customer satisfaction (CS) data, and many explanatory variables. CS analysis clarifies which explanatory variables are important for improving the objective variable. The decision is made by considering two aspects: one is the correlation factor between the objective variable and each explanatory variable, and the other is the level achievement ratio of each explanatory variable. We select the explanatory variables with high correlation factors and low level achievement ratios.
Keywords: explanatory variable, objective variable, correlation factor, average, variance, normalization, CS plot, CS analysis, contribution degree, requested improvement degree, CS correlation factor, principal component analysis
1. INTRODUCTION

We want to improve customer satisfaction (CS) and study which items influence the satisfaction. We evaluate the levels of the assumed items as well as the customers' satisfaction, and based on these data, we decide which items are important for improving the satisfaction. This subject generalizes to any situation with one objective variable and many explanatory variables, as shown in Figure 1. We assume that all data are numerical. This type of analysis is called CS analysis even when the objective variable is not customers' satisfaction.
Figure 1. Data structure for CS analysis.
2. QUESTIONNAIRE

We treat one objective variable and five explanatory variables in this section. The objective variable y is the customers' satisfaction. The explanatory variables (items) are the ones which are supposed to influence the customers' satisfaction, and are denoted as x_i (i = 1, 2, \ldots, 5). We assume the items below.

Item 1 (x_1): Understanding of customers' work
Item 2 (x_2): Quality of reply to customers' requests and questions
Item 3 (x_3): Project promotion ability
Item 4 (x_4): Effective information providing ability
Item 5 (x_5): Proposal ability

The target of the analysis is to clarify the items on which we (salesmen) should work to improve the customers' satisfaction. The corresponding evaluation score form is shown in Table 1; the score range of each variable is shown in the table. We obtain the data from 350 customers; that is, the data number n is 350. We consider the correlation between the objective variable and each explanatory variable, and do not consider correlations between the explanatory variables.

Table 1. Questionnaire and variable notations

Variable  Item No  Questionnaire                              Score
y         -        Customers' satisfaction                    1-10
x_1       Item 1   Understanding of customers' work           1-5
x_2       Item 2   Quality of reply to customers' requests    1-5
x_3       Item 3   Project promotion ability                  1-5
x_4       Item 4   Effective information providing ability    1-5
x_5       Item 5   Proposal ability                           1-5
3. FUNDAMENTAL PARAMETERS

The average of the objective variable is given by

\bar{y} = \frac{1}{n} \sum_{k=1}^{n} y_k    (1)

The averages of the explanatory variables are given by

\mu_i = \frac{1}{n} \sum_{k=1}^{n} x_{ik}    (2)

where i denotes the item number, and i = 1, 2, \ldots, p (p is five here).

The unbiased variances and covariances are given by

\sigma_{yy}^2 = \frac{1}{n-1} \sum_{k=1}^{n} \left( y_k - \bar{y} \right)^2    (3)

\sigma_{ii}^2 = \frac{1}{n-1} \sum_{k=1}^{n} \left( x_{ik} - \mu_i \right)^2    (4)

\sigma_{iy}^2 = \frac{1}{n-1} \sum_{k=1}^{n} \left( x_{ik} - \mu_i \right) \left( y_k - \bar{y} \right)    (5)

The correlation factor between the objective variable and explanatory variable i is given by

r_{iy} = \frac{\sigma_{iy}^2}{\sqrt{\sigma_{ii}^2 \sigma_{yy}^2}}    (6)
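As a minimal numerical sketch of Eqs. (1)-(6), the following Python code (the synthetic data and variable names are ours, not from the book) computes the averages, unbiased variances and covariances, and correlation factors:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 350, 5
x = rng.integers(1, 6, size=(n, p)).astype(float)               # item scores, 1-5
y = x @ np.array([0.4, 0.5, 0.3, 0.4, 0.3]) + rng.normal(0, 1, n)  # synthetic satisfaction

y_bar = y.mean()                        # Eq. (1)
mu = x.mean(axis=0)                     # Eq. (2)
s_yy = y.var(ddof=1)                    # Eq. (3), unbiased
s_ii = x.var(axis=0, ddof=1)            # Eq. (4)
s_iy = ((x - mu) * (y - y_bar)[:, None]).sum(axis=0) / (n - 1)   # Eq. (5)
r_iy = s_iy / np.sqrt(s_ii * s_yy)      # Eq. (6)
print(np.round(r_iy, 2))
```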
Table 2 shows the averages, standard deviations, and correlation factors for each item. The corresponding figures for the satisfaction averages and the correlation factors are shown in Figure 2 and Figure 3. We want to select items which are low in average satisfaction and high in correlation factor. We may pick the items by inspecting both figures; however, it is not always clear which ones should be selected. We will introduce a parameter which enables us to select items clearly.
Figure 2. Average of satisfaction for each item.

Figure 3. Correlation factors for each item and the objective variable.
Table 2. Average, standard deviation, and correlation factor for each item

Variable            Item1(x1)  Item2(x2)  Item3(x3)  Item4(x4)  Item5(x5)  y
Average             3.68       3.52       3.12       3.48       3.32       6.83
Standard deviation  0.92       1.12       0.98       0.99       0.95       1.94
Correlation factor  0.74       0.77       0.68       0.73       0.68
4. CORRELATION FACTOR TESTING

We evaluate the effectiveness of the explanatory variables. If the corresponding correlation factor is negative, we neglect the variable. We then test whether the correlation factor is zero. The converted variable

t = \sqrt{n-2} \, \frac{r}{\sqrt{1-r^2}}    (7)

follows a t-distribution with a freedom of n - 2, as shown in Chapter 2 of volume 2. Since we evaluate only positive values, we apply a one-sided probability P and evaluate the corresponding P point t_P. We then evaluate

t > t_P    (8)

If this relationship holds, we judge that the variable is valid, and vice versa. We select the explanatory variables that satisfy Eq. (8) and proceed to the next step.
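A minimal sketch of this test in Python, using scipy only for the one-sided P point of the t-distribution:

```python
import numpy as np
from scipy import stats

def correlation_is_valid(r, n, p_level=0.95):
    """Test r > 0 via Eq. (7): t = sqrt(n-2) * r / sqrt(1 - r**2)."""
    if r <= 0:                                # negative factors are neglected outright
        return False
    t = np.sqrt(n - 2) * r / np.sqrt(1 - r**2)
    t_p = stats.t.ppf(p_level, df=n - 2)      # one-sided P point, Eq. (8)
    return t > t_p

print(correlation_is_valid(0.74, 350))        # item 1 -> True
```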
5. NORMALIZATION OF THE VARIABLES

We evaluate the average \bar{\mu} and the standard deviation \sigma_\mu of the item averages, which are given by

\bar{\mu} = \frac{1}{p} \sum_{i=1}^{p} \mu_i    (9)

\sigma_\mu = \sqrt{ \frac{1}{p} \sum_{i=1}^{p} \left( \mu_i - \bar{\mu} \right)^2 }    (10)

We then normalize each item's average \mu_i, which is denoted as z_i and is given by

z_i = \frac{\mu_i - \bar{\mu}}{\sigma_\mu}    (11)

We then evaluate the average \bar{r} and the standard deviation \sigma_r of the correlation factors, which are given by

\bar{r} = \frac{1}{p} \sum_{i=1}^{p} r_i    (12)

\sigma_r = \sqrt{ \frac{1}{p} \sum_{i=1}^{p} \left( r_i - \bar{r} \right)^2 }    (13)

We then normalize each item's correlation factor r_i, which is denoted as z_{ri} and is given by

z_{ri} = \frac{r_i - \bar{r}}{\sigma_r}    (14)

We then obtain the point in the plane a_i = (z_{ri}, z_i) for each item.

Table 3 shows the averages, standard deviations, correlation factors, and their normalized values. The average score of item 1 is high and that of item 3 is low. On the other hand, the correlation factor of item 2 is high and that of item 3 is low. Table 4 shows the average and the standard deviation of the scores with respect to the items; the normalized averages and correlation factors are evaluated with these values. Table 5 shows the predictive probability and P point of the t-distribution with which we evaluate the effectiveness of the correlations; the evaluation results are included in Table 3.
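A minimal sketch of the normalization of Eqs. (9)-(14), using the rounded values of Table 2 as input (the results therefore differ slightly from Table 3, which uses unrounded inputs):

```python
import numpy as np

mu = np.array([3.68, 3.52, 3.12, 3.48, 3.32])   # item averages (Table 2)
r  = np.array([0.74, 0.77, 0.68, 0.73, 0.68])   # correlation factors (Table 2)

z  = (mu - mu.mean()) / mu.std()    # Eq. (11); population std as in Eq. (10)
zr = (r - r.mean()) / r.std()       # Eq. (14)
print(np.round(z, 2))    # ~ [ 1.35  0.50 -1.60  0.29 -0.55]
print(np.round(zr, 2))   # ~ [ 0.57  1.42 -1.14  0.28 -1.14]
```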
Table 3. Average, standard deviation, correlation factor, and their normalized values for each item

Variable                        Item1(x1)  Item2(x2)  Item3(x3)  Item4(x4)  Item5(x5)  y
Average                         3.68       3.52       3.12       3.48       3.32       6.83
Standard deviation              0.92       1.12       0.98       0.99       0.95       1.94
Correlation factor              0.74       0.77       0.68       0.73       0.68
t-value of correlation factor   20.71      22.27      17.19      20.01      17.18
Evaluation of correlation       Yes        Yes        Yes        Yes        Yes
Normalized average              1.34       0.51       -1.60      0.29       -0.54
Normalized correlation factor   0.66       1.32       -1.16      0.34       -1.17
Contribution                    1.42       1.29       -1.95      0.45       -1.21
Improvement request             -0.48      0.58       0.31       0.03       -0.44

Table 4. Average and standard deviation with respect to items

Table 5. Predictive probability and P point of the t-distribution

Prediction probability   0.95
t_P                      1.65
CS correlation factor    0.83
Figure 4. CS plot of normalized correlation factor and satisfaction score. The contribution axis and the improvement-request axis run at +45 and -45 degrees, respectively.
6. IMPROVEMENT REQUESTED AND CONTRIBUTING ITEMS

We can plot the normalized correlation factor and normalized satisfaction as shown in Figure 4. A high correlation factor means that the item is important, and a high satisfaction means that the item is in good condition. Therefore, an item requiring improvement can be selected as one with a high correlation factor and a low satisfaction score. How can we obtain the corresponding value?

The axis in the direction of -45 degrees corresponds to importance associated with the correlation factor and badness associated with the satisfaction, and hence it expresses the requested improvement degree. Therefore, the projection of each point onto this axis corresponds to the improvement request degree, which is shown by the red arrow for item 2; the distance from the origin to the end point of the red arrow is the improvement request degree. The axis in the direction of +45 degrees corresponds to importance associated with the correlation factor and goodness associated with the satisfaction, and hence it expresses the degree of contribution. The projection of each point onto this axis corresponds to the contribution degree; the distance from the origin to the end point of the blue arrow is the contribution degree.

We can evaluate the degrees as follows. The unit vector for the contribution axis, e_G, and that for the improvement request axis, e_B, are given by

e_G = \left( \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}} \right), \quad e_B = \left( \frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}} \right)    (15)

The degree of contribution is denoted as G_i and can be evaluated as

G_i = a_i \cdot e_G = \frac{1}{\sqrt{2}} \left( z_{ri} + z_i \right)    (16)

The degree of improvement requested is denoted as B_i and can be evaluated as

B_i = a_i \cdot e_B = \frac{1}{\sqrt{2}} \left( z_{ri} - z_i \right)    (17)
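A minimal sketch of the projections of Eqs. (16)-(17), using the normalized values of Table 3:

```python
import numpy as np

zr = np.array([0.66, 1.32, -1.16, 0.34, -1.17])  # normalized correlation (Table 3)
z  = np.array([1.34, 0.51, -1.60, 0.29, -0.54])  # normalized average (Table 3)

G = (zr + z) / np.sqrt(2)   # contribution, Eq. (16)
B = (zr - z) / np.sqrt(2)   # improvement request, Eq. (17)
print(np.round(G, 2))  # matches Table 3 up to rounding
print(np.round(B, 2))
```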
Figure 5 shows the contribution and the requested improvement degrees extracted from Figure 4.

Figure 5. Contribution and improvement request degrees.
Item 1 and item 2 contribute to the satisfaction, while item 2 also has a high improvement request degree. Since the correlation factor of item 2 is high, a high-level score is requested for it; that is why item 2 appears in both degrees. There is no clear critical value for the requested improvement, so we should set a certain value. If we set the value at 0.5 here, we should focus on item 2.
7. CS CORRELATION FACTOR

We want to evaluate the overall status of the CS, which we can do by inspecting the data distribution. If the data lie along the contribution axis, the status is good. On the other hand, if the data lie along the improvement request axis, the status is bad. We can evaluate the status of the CS by evaluating the correlation factor between z_{ri} and z_i. We denote it as r_{z_r z} and evaluate it as

r_{z_r z} = \frac{\sigma_{z_r z}^2}{\sqrt{\sigma_{z_r z_r}^2 \sigma_{zz}^2}}    (18)

where

\sigma_{z_r z_r}^2 = \frac{1}{p} \sum_{i=1}^{p} z_{ri}^2    (19)

\sigma_{zz}^2 = \frac{1}{p} \sum_{i=1}^{p} z_i^2    (20)

\sigma_{z_r z}^2 = \frac{1}{p} \sum_{i=1}^{p} z_{ri} z_i    (21)

We call this the CS correlation factor. Its value lies between -1 and 1, and the status is better for larger values. We show three typical CS statuses, with CS correlation factors of -0.8, 0.0, and 0.8, in Figure 6, where we use 10 items that may influence the objective variable.
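A minimal sketch of Eqs. (18)-(21):

```python
import numpy as np

def cs_correlation(zr, z):
    """CS correlation factor, Eqs. (18)-(21); zr and z are already normalized."""
    s_rr = np.mean(zr**2)       # Eq. (19)
    s_zz = np.mean(z**2)        # Eq. (20)
    s_rz = np.mean(zr * z)      # Eq. (21)
    return s_rz / np.sqrt(s_rr * s_zz)

zr = np.array([0.66, 1.32, -1.16, 0.34, -1.17])
z  = np.array([1.34, 0.51, -1.60, 0.29, -0.54])
print(round(cs_correlation(zr, z), 2))   # ~0.83, cf. Table 5
```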
The total status can be evaluated using this CS correlation factor. Three cases are shown in Figure 6: (a) is in a bad condition, (b) is in a neutral condition, and (c) is in a good condition.

Figure 6. CS plots with various CS correlation factors. (a) Bad condition (r = -0.8), (b) neutral condition (r = 0.0), (c) good condition (r = 0.8).
8. TARGET VALUES FOR SATISFACTION OF EXPLANATORY AND OBJECTIVE VARIABLES

We clarified the target items in the previous section. We now derive the target values of the items using regression theory.
The objective values y_k and explanatory values x_k (k = 1, 2, \ldots, n) are assumed to be related by

\frac{y_k - \bar{y}}{\sigma_y} = r_i \frac{x_k - \mu_i}{\sigma_i}    (22)

We suppose that this relationship still holds after we perform some treatment, and assume

\frac{1}{n} \sum_{k=1}^{n} \frac{y_k - \bar{y}}{\sigma_y} = r_i \frac{1}{n} \sum_{k=1}^{n} \frac{x_k - \mu_i}{\sigma_i}    (23)

We can then modify this as

\frac{1}{\sigma_y} \left( \frac{1}{n} \sum_{k=1}^{n} y_k - \bar{y} \right) = \frac{r_i}{\sigma_i} \left( \frac{1}{n} \sum_{k=1}^{n} x_k - \mu_i \right)    (24)

We then have

\frac{\mu_{yt} - \bar{y}}{\sigma_y} = r_i \frac{\mu_{it} - \mu_i}{\sigma_i}    (25)

where \mu_{yt} and \mu_{it} are the averages of the satisfaction and of item i after some treatment, given by

\mu_{yt} = \frac{1}{n} \sum_{k=1}^{n} y_k    (26)

\mu_{it} = \frac{1}{n} \sum_{k=1}^{n} x_k    (27)

Therefore, we obtain

\Delta \mu_y = r_i \frac{\sigma_y}{\sigma_i} \Delta \mu_i    (28)

where

\Delta \mu_y = \mu_{yt} - \bar{y}    (29)

\Delta \mu_i = \mu_{it} - \mu_i    (30)

We assume that the explanatory variables are independent of each other, so the total improvement of the objective variable, \Delta Q_y, is given by

\Delta Q_y = \sum_i r_i \frac{\sigma_y}{\sigma_i} \Delta \mu_i    (31)
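A minimal sketch of Eq. (31); the improvement targets here are hypothetical values of ours, not from the book:

```python
import numpy as np

r       = np.array([0.74, 0.77, 0.68, 0.73, 0.68])  # correlation factors (Table 2)
sigma_i = np.array([0.92, 1.12, 0.98, 0.99, 0.95])  # item std devs (Table 2)
sigma_y = 1.94                                      # std dev of y
d_mu    = np.array([0.0, 0.5, 0.0, 0.0, 0.0])       # hypothetical: raise item 2 by 0.5

dQ_y = np.sum(r * sigma_y / sigma_i * d_mu)         # Eq. (31)
print(round(dQ_y, 2))   # expected satisfaction gain, ~0.67
```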
9. ANALYSIS FOR SUB GROUPS

We have treated the total group up to here. The group may consist of many sub groups, and the characteristics of the sub groups generally differ from those of the total group. The target items for a sub group may therefore differ from those of the total group, so we try to select target items for the sub groups. We assume that the importance of an item is the same for the sub group; the difference between a given sub group and the total group, or between sub groups, lies in the status of satisfaction. We reference the satisfaction of a sub group with respect to the total one.

We set the average for item i of the sub group as \mu_{Gi} and the data number as n_G. We introduce the normalized variable for item i as z_{Gi}, defined as

z_{Gi} = \frac{\mu_{Gi} - \mu_i}{\sigma_i \sqrt{\frac{1}{N} + \frac{1}{n_G}}}    (32)

This deviation may be too large and become unstable. It may be modified as

z_{Gi} = \frac{\mu_{Gi} - \mu_i}{\alpha \sigma_i \sqrt{\frac{1}{N} + \frac{1}{n_G}}}    (33)

where \alpha is simply a parameter that handles the magnitude of the value and should be larger than 1. We may also be able to use the simple form

z_{Gi} = \frac{\mu_{Gi} - \mu_i}{\sigma_i}    (34)

This form is the simplest, but it does not consider the scale of the sub group. The selection of the model should be investigated further; we use the model of Eq. (34) here. The normalized satisfaction can then be expressed as

z_i' = z_i + z_{Gi}    (35)
If the average of the sub group \mu_{Gi} is the same as that of the total group, the normalized sub-group satisfaction is the same as that of the total group. If it is larger, the normalized satisfaction is larger than that of the total group, which is the expected characteristic.

We treat group A as a low-score group and group B as a high-score group, as shown in Table 6. The deviations of the normalized satisfaction scores are shown in Table 7; we use the model of Eq. (34) here. The contribution and requested improvement change depending on the satisfaction scores, as shown in Table 8 and Table 9, and the target items for groups A and B change correspondingly from those of the total group.

Table 6. Satisfaction scores of groups A and B

Variable         Item1(x1)  Item2(x2)  Item3(x3)  Item4(x4)  Item5(x5)  y     Number
Average (total)  3.68       3.52       3.12       3.48       3.32       6.83  350
Average (A)      3.00       3.00       2.50       3.00       3.00       5.00  15
Average (B)      4.00       3.70       3.50       4.00       4.00       7.00  15

Table 7. Deviation of normalized satisfaction of groups A and B

Variable         Item1(x1)  Item2(x2)  Item3(x3)  Item4(x4)  Item5(x5)
Average (total)  0.00       0.00       0.00       0.00       0.00
Average (A)      -2.17      -1.75      -2.41      -1.85      -5.30
Average (B)      1.32       0.61       1.46       2.00       2.70

Table 8. Contribution degree of groups A and B

Variable   Total  Group A  Group B
Item1(x1)  1.42   -0.55    2.35
Item2(x2)  1.29   0.05     1.72
Item3(x3)  -1.95  -3.66    -0.92
Item4(x4)  0.45   -0.86    1.86
Item5(x5)  -1.21  -4.96    0.71

Table 9. Requested improvement of groups A and B

Variable   Total  Group A  Group B
Item1(x1)  -0.48  1.48     -1.42
Item2(x2)  0.58   1.82     0.15
Item3(x3)  0.31   2.02     -0.72
Item4(x4)  0.03   1.34     -1.38
Item5(x5)  -0.44  3.31     -2.36
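A minimal sketch of the sub-group treatment, assuming the simple model of Eq. (34) and the group A averages of Table 6:

```python
import numpy as np

mu      = np.array([3.68, 3.52, 3.12, 3.48, 3.32])  # total-group item averages
sigma_i = np.array([0.92, 1.12, 0.98, 0.99, 0.95])  # item std devs
mu_A    = np.array([3.00, 3.00, 2.50, 3.00, 3.00])  # group A averages (Table 6)
z       = (mu - mu.mean()) / mu.std()               # total-group normalized averages

z_G = (mu_A - mu) / sigma_i       # sub-group deviation under Eq. (34)
z_A = z + z_G                     # Eq. (35): group A normalized satisfaction
print(np.round(z_A, 2))
```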
10. CS ANALYSIS USING MULTIPLE REGRESSION

So far we have not considered interactions between explanatory variables; that is, we have treated each explanatory variable as independent of the others. Although this is an idealized condition, explanatory variables generally interact with each other. Multiple regression does consider the interaction, so if we use a multiple regression model, we can accommodate it.

The multiple regression factors depend on the scale of the data, so we use normalized variables. We can then obtain the relationship

z_y = b_1 z_1 + b_2 z_2 + \cdots + b_p z_p    (36)

The factors satisfy the equation

\begin{pmatrix} S_{11}^2 & S_{12}^2 & \cdots & S_{1p}^2 \\ S_{21}^2 & S_{22}^2 & \cdots & S_{2p}^2 \\ \vdots & & & \vdots \\ S_{p1}^2 & S_{p2}^2 & \cdots & S_{pp}^2 \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_p \end{pmatrix} = \begin{pmatrix} S_{1y}^2 \\ S_{2y}^2 \\ \vdots \\ S_{py}^2 \end{pmatrix}    (37)

Therefore, we can obtain the factors b_i as

\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_p \end{pmatrix} = \begin{pmatrix} S_{11}^2 & S_{12}^2 & \cdots & S_{1p}^2 \\ S_{21}^2 & S_{22}^2 & \cdots & S_{2p}^2 \\ \vdots & & & \vdots \\ S_{p1}^2 & S_{p2}^2 & \cdots & S_{pp}^2 \end{pmatrix}^{-1} \begin{pmatrix} S_{1y}^2 \\ S_{2y}^2 \\ \vdots \\ S_{py}^2 \end{pmatrix}    (38)

where the matrix operations are shown in Chapter 15. We could use these multiple regression coefficients instead of the correlation factors. However, multiple regression coefficients are difficult to handle: we cannot intuit their values from the data, as shown in the corresponding chapter. This is why multiple regression is not commonly used here, although its fit to the data is better. We treat the interaction in a different way in the next section.
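A minimal sketch of Eq. (38) on synthetic data; np.linalg.solve is used rather than an explicit matrix inverse:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 350, 5
x = rng.normal(size=(n, p))
y = x @ np.array([0.4, 0.5, 0.3, 0.4, 0.3]) + rng.normal(0, 1, n)

zx = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)   # normalized explanatory variables
zy = (y - y.mean()) / y.std(ddof=1)                 # normalized objective variable

S  = zx.T @ zx / (n - 1)          # matrix of S_ij^2 in Eq. (37)
sy = zx.T @ zy / (n - 1)          # vector of S_iy^2
b  = np.linalg.solve(S, sy)       # Eq. (38)
print(np.round(b, 2))
```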
11. INTERACTION BETWEEN EXPLANATORY VARIABLES

We showed that we can include the interaction in multiple regression analysis. In this section, we show another way to include the interaction. (The usual CS analysis is stable precisely because it does not consider the interaction between the explanatory variables.)

We evaluate the correlation factors between the explanatory variables and select a group in which the correlation factors are large. We assume that the group consists of p variables. A high correlation factor means high interaction. Therefore, the improvements of the objective variable cannot simply be added over the explanatory variables: the significance would be doubly counted, and it should be decreased according to the interaction.

We perform a principal component analysis on these explanatory variables and evaluate the first principal component, which can be expressed as

z_i = a_1 u_{1i} + a_2 u_{2i} + \cdots + a_p u_{pi}    (39)

where a_1, a_2, \ldots, a_p are the elements of the eigenvector for the first component, and satisfy

a_1^2 + a_2^2 + \cdots + a_p^2 = 1    (40)

u_1, u_2, \ldots, u_p are the normalized explanatory variables in the group, given by

u_{1i} = \frac{x_{1i} - \mu_1}{\sqrt{\sigma_1^2}}, \quad u_{2i} = \frac{x_{2i} - \mu_2}{\sqrt{\sigma_2^2}}, \quad \ldots, \quad u_{pi} = \frac{x_{pi} - \mu_p}{\sqrt{\sigma_p^2}}    (41)

The average and the variance are given by
\mu_z = \frac{1}{n} \sum_{i=1}^{n} z_i = a_1 \frac{1}{n} \sum_{i=1}^{n} \frac{x_{1i} - \mu_1}{\sqrt{\sigma_1^2}} + a_2 \frac{1}{n} \sum_{i=1}^{n} \frac{x_{2i} - \mu_2}{\sqrt{\sigma_2^2}} + \cdots + a_p \frac{1}{n} \sum_{i=1}^{n} \frac{x_{pi} - \mu_p}{\sqrt{\sigma_p^2}} = 0    (42)

\sigma_z^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( z_i - \mu_z \right)^2 = \sum_{i=1}^{p} \sum_{j=1}^{p} a_i a_j R \left( i, j \right) = a_1^2 + a_2^2 + \cdots + a_p^2 + \sum_{i \ne j} a_i a_j R \left( i, j \right) = 1 + \sum_{i \ne j} a_i a_j R \left( i, j \right)    (43)

where R(i, j) is the correlation factor between variables i and j. Since we select variables whose interactions are significant, R(i, j) is positive. We use this first principal component instead of the original variables. The correlation factor between the first principal component and the objective variable is given by

r_z = \frac{\frac{1}{n-1} \sum_{i=1}^{n} \left( z_i - \mu_z \right) \left( y_i - \bar{y} \right)}{\sqrt{\sigma_z^2} \, \sigma_y} = \frac{a_1 r_1 + a_2 r_2 + \cdots + a_p r_p}{\sqrt{\sigma_z^2}} = \sum_{i=1}^{p} \frac{a_i}{\sqrt{\sigma_z^2}} r_i    (44)
We should also assign the explanatory value for the first principal component. It may be taken as

\mu_x = a_1 \mu_1 + a_2 \mu_2 + \cdots + a_p \mu_p    (45)

Finally, the increment of the objective variable related to the explanatory variables with significant interaction can be expressed as

\Delta y = r_z \left( a_1 \frac{\sigma_y}{\sqrt{\sigma_1^2}} \Delta \mu_1 + a_2 \frac{\sigma_y}{\sqrt{\sigma_2^2}} \Delta \mu_2 + \cdots + a_p \frac{\sigma_y}{\sqrt{\sigma_p^2}} \Delta \mu_p \right)    (46)

We can treat the other explanatory variables in the normal way. Hence, with variables 1 to q forming the interacting group, the total objective variable increment can be expressed as

\Delta Q_y = r_z \sum_{k=1}^{q} a_k \frac{\sigma_y}{\sigma_k} \Delta \mu_k + \sum_{k=q+1}^{p} r_k \frac{\sigma_y}{\sigma_k} \Delta \mu_k    (47)
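A minimal sketch of Eqs. (39)-(44) on synthetic data for an interacting group of three items; the eigenvector of the correlation matrix plays the role of (a_1, ..., a_p):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 350
base = rng.normal(size=n)
x = np.column_stack([base + rng.normal(0, 0.5, n) for _ in range(3)])  # 3 interacting items
y = base + rng.normal(0, 1, n)

u = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)    # Eq. (41)
R = np.corrcoef(u, rowvar=False)                    # correlation matrix R(i, j)
w, v = np.linalg.eigh(R)
a = v[:, -1]                                        # first-component eigenvector, Eq. (40)
a = np.sign(a.sum()) * a                            # fix the eigenvector's sign ambiguity

z = u @ a                                           # first principal component, Eq. (39)
sigma_z2 = z.var(ddof=1)                            # Eq. (43)
r_i = np.array([np.corrcoef(u[:, i], y)[0, 1] for i in range(3)])
r_z = (a @ r_i) / np.sqrt(sigma_z2)                 # Eq. (44)
print(round(r_z, 2))
```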
Figure 7. Correlation factors. (a) Original correlation factors; (b) correlation factors smaller than those in (a) by 0.6; (b') the correlation factors of (b) on a different vertical scale.
12. EXTENDED NORMALIZED CORRELATION FACTOR

In this analysis, we use the normalized correlation factor, which is denoted again as

z_{ri} = \frac{r_i - \bar{r}}{\sigma_r}    (48)

The origin of the normalized value is the average \bar{r}. This means that the ratio \sigma_r / \bar{r} does not influence the normalized value. Figure 7 shows the correlation factors of Figure 3 and also those correlation factors decreased by 0.6. In Figure 7(a), the average of the correlation factors is about 0.7, and the standard deviation is much smaller than the average. On the other hand, in Figure 7(b), the average correlation factor is about 0.1 and is comparable with the standard deviation. In the former case, we can approximately consider that the correlation factors are almost the same for all items; in the latter case, they depend on the items. However, CS analysis gives the same results for both. This means that CS analysis exaggerates the differences between correlation factors.

We therefore propose an extended normalized variable given by

z_{rip} = \frac{z_{ri} + \bar{r} / \sigma_r}{\sqrt{1 + \left( \bar{r} / \sigma_r \right)^2}}    (49)

The extended normalized variable behaves as follows in the limiting cases:

z_{rip} \to \begin{cases} z_{ri} & \text{for } \sigma_r / \bar{r} \gg 1 \\ r_i / \bar{r} & \text{for } \sigma_r / \bar{r} \ll 1 \end{cases}    (50)

These are the expected forms. We then obtain the point for item i as a_i = \left( z_{rip}, z_i \right), and we evaluate the axes for improvement request and contribution. We define the angle \theta by

\tan \theta = \frac{\sigma_r}{\bar{r}}    (51)

that is, we obtain an angle of

\theta = \tan^{-1} \frac{\sigma_r}{\bar{r}}    (52)

The angle takes the following values in the limiting cases:

\theta \to \begin{cases} \pi / 2 & \text{for } \sigma_r / \bar{r} \gg 1 \\ 0 & \text{for } \sigma_r / \bar{r} \ll 1 \end{cases}    (53)

We propose to define the unit vectors for the contribution and the improvement as follows:

e_G = \left( \cos \left( \frac{\pi}{2} - \frac{\theta}{2} \right), \sin \left( \frac{\pi}{2} - \frac{\theta}{2} \right) \right), \quad e_B = \left( \cos \left( \frac{\pi}{2} - \frac{\theta}{2} \right), -\sin \left( \frac{\pi}{2} - \frac{\theta}{2} \right) \right)    (54)

These definitions realize the requested behavior in the limiting cases. The contribution and requested improvement are given by

G_i = a_i \cdot e_G = z_{rip} \cos \left( \frac{\pi}{2} - \frac{\theta}{2} \right) + z_i \sin \left( \frac{\pi}{2} - \frac{\theta}{2} \right), \quad B_i = a_i \cdot e_B = z_{rip} \cos \left( \frac{\pi}{2} - \frac{\theta}{2} \right) - z_i \sin \left( \frac{\pi}{2} - \frac{\theta}{2} \right)    (55)

This subject will be discussed again more clearly in Chapter 4.
13. TREATMENT OF THE CORRELATION FACTOR

It should be noted that the correlation factor has limiting values of 1 and -1; we cannot expect values outside this region. Therefore, a variation from 0.1 to 0.2 and one from 0.9 to 1.0 are not equivalent: the latter rarely occurs and is hard to realize. However, we treat the values identically in CS analysis. We therefore propose to use a converted variable \eta, which is given by

\eta = \frac{1}{2} \ln \frac{1 + r}{1 - r}    (56)

This variable runs from -\infty to +\infty as r changes from -1 to +1. We can perform the same analysis using this \eta.
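A minimal sketch of Eq. (56); numpy's arctanh computes exactly (1/2) ln[(1+r)/(1-r)]:

```python
import numpy as np

r = np.array([0.74, 0.77, 0.68, 0.73, 0.68])
eta = np.arctanh(r)          # Eq. (56)
print(np.round(eta, 2))      # [0.95 1.02 0.83 0.93 0.83]
```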
14. SUMMARY

We summarize the results of this chapter. We consider one objective variable and p items as explanatory variables.

The average of the objective variable is given by

\bar{y} = \frac{1}{n} \sum_{k=1}^{n} y_k

The averages of the explanatory variables are given by

\mu_i = \frac{1}{n} \sum_{k=1}^{n} x_{ik}

where i denotes the item number and i = 1, 2, \ldots, p.

The unbiased variances and covariances are given by

\sigma_{yy}^2 = \frac{1}{n-1} \sum_{k=1}^{n} \left( y_k - \bar{y} \right)^2

\sigma_{ii}^2 = \frac{1}{n-1} \sum_{k=1}^{n} \left( x_{ik} - \mu_i \right)^2

\sigma_{iy}^2 = \frac{1}{n-1} \sum_{k=1}^{n} \left( x_{ik} - \mu_i \right) \left( y_k - \bar{y} \right)

The correlation factor between the objective variable and explanatory variable i is given by

r_{iy} = \frac{\sigma_{iy}^2}{\sqrt{\sigma_{ii}^2 \sigma_{yy}^2}}

We can evaluate whether the correlation is valid by evaluating t given by

t = \sqrt{n-2} \, \frac{r}{\sqrt{1-r^2}}

which follows a t-distribution with a freedom of n - 2. We compare this with the corresponding t_P.

We evaluate the average and the standard deviation of the item averages,

\bar{\mu} = \frac{1}{p} \sum_{i=1}^{p} \mu_i, \quad \sigma_\mu = \sqrt{ \frac{1}{p} \sum_{i=1}^{p} \left( \mu_i - \bar{\mu} \right)^2 }

We then normalize each item's average \mu_i, denoted z_i:

z_i = \frac{\mu_i - \bar{\mu}}{\sigma_\mu}

We evaluate the average \bar{r} and the standard deviation \sigma_r of the correlation factors,

\bar{r} = \frac{1}{p} \sum_{i=1}^{p} r_i, \quad \sigma_r = \sqrt{ \frac{1}{p} \sum_{i=1}^{p} \left( r_i - \bar{r} \right)^2 }

We then normalize each item's correlation factor r_i, denoted z_{ri}:

z_{ri} = \frac{r_i - \bar{r}}{\sigma_r}

We then obtain the point in the plane a_i = \left( z_{ri}, z_i \right).

The degree of contribution, G_i, is evaluated as

G_i = \frac{1}{\sqrt{2}} \left( z_{ri} + z_i \right)

The degree of requested improvement, B_i, is evaluated as

B_i = \frac{1}{\sqrt{2}} \left( z_{ri} - z_i \right)

We can evaluate the status of the CS by the correlation factor between z_{ri} and z_i, denoted r_{z_r z}:

r_{z_r z} = \frac{\sigma_{z_r z}^2}{\sqrt{\sigma_{z_r z_r}^2 \sigma_{zz}^2}}

where

\sigma_{z_r z_r}^2 = \frac{1}{p} \sum_{i=1}^{p} z_{ri}^2, \quad \sigma_{zz}^2 = \frac{1}{p} \sum_{i=1}^{p} z_i^2, \quad \sigma_{z_r z}^2 = \frac{1}{p} \sum_{i=1}^{p} z_{ri} z_i

We can evaluate a sub group as follows. We evaluate the normalized variable for the sub group as

z_{Gi} = \frac{\mu_{Gi} - \mu_i}{\sigma_i}

The normalized satisfaction can then be expressed as

z_i' = z_i + z_{Gi}

and we perform the same evaluation.

We implicitly assume that the explanatory variables are independent of each other. However, there are cases where the interaction is significant. In such cases, we perform a principal component analysis on the variables with high interaction and obtain the first component

z_i = a_1 u_{1i} + a_2 u_{2i} + \cdots + a_p u_{pi}

The related correlation factor is given by

r_z = \sum_{i=1}^{p} \frac{a_i}{\sqrt{\sigma_z^2}} r_i

The corresponding explanatory value is given by

\mu_x = a_1 \mu_1 + a_2 \mu_2 + \cdots + a_p \mu_p

We further introduce an extended normalized variable to account for the magnitude of the average and the standard deviation of the correlation factors:

z_{rip} = \frac{z_{ri} + \bar{r} / \sigma_r}{\sqrt{1 + \left( \bar{r} / \sigma_r \right)^2}}

We then obtain the coordinate for item i as a_i = \left( z_{rip}, z_i \right), and evaluate the axes for improvement request and contribution. We define the angle

\theta = \tan^{-1} \frac{\sigma_r}{\bar{r}}

and the unit vectors for contribution and improvement,

e_G = \left( \cos \left( \frac{\pi}{2} - \frac{\theta}{2} \right), \sin \left( \frac{\pi}{2} - \frac{\theta}{2} \right) \right), \quad e_B = \left( \cos \left( \frac{\pi}{2} - \frac{\theta}{2} \right), -\sin \left( \frac{\pi}{2} - \frac{\theta}{2} \right) \right)

The contribution and requested improvement are given by

G_i = z_{rip} \cos \left( \frac{\pi}{2} - \frac{\theta}{2} \right) + z_i \sin \left( \frac{\pi}{2} - \frac{\theta}{2} \right), \quad B_i = z_{rip} \cos \left( \frac{\pi}{2} - \frac{\theta}{2} \right) - z_i \sin \left( \frac{\pi}{2} - \frac{\theta}{2} \right)

We further pointed out that one can use a variable converted from the correlation factor,

\eta = \frac{1}{2} \ln \frac{1 + r}{1 - r}

Figure 8 shows the flow of the analysis described above.

Figure 8. Flow of analysis.
Chapter 2
INDEPENDENT FACTOR ANALYSIS

ABSTRACT

We treat data which consist of one objective variable and many explanatory variables, as in the CS analysis of the previous chapter. Independent factor analysis is the same as CS analysis but treats categorical data instead of numerical data. The analysis clarifies which explanatory variables are important for improving the objective variable. The decision is made by considering the independent factor and the level achievement ratio.
Keywords: explanatory variable, objective variable, independent factor, independent value, adjusted residual
1. INTRODUCTION

In this chapter, we treat data which consist of one objective variable and many explanatory variables, as shown in Figure 1. The data form is exactly the same as in the previous chapter, but the data types are all categorical. The data values correspond to category levels. For numerical data, the scores were 1, 2, 3, and so on; for categorical data, the levels are, for example, high or low. We can set any number of levels, such as significant-high, high, neutral, low, and significant-low. There are two types of categorical data: levels such as high and low, which have a definite order, and levels such as methods A, B, and C, which have no order; both types can be used in this analysis. We clarify which explanatory variables are important for improving the objective variable.
Figure 1. Data structure for independent factor analysis.
2. QUESTIONNAIRE

We treat one objective variable and five explanatory variables. The objective variable y is the customers' satisfaction. The explanatory variables (items) x_i (i = 1, 2, \ldots, 5) are as below.

Item 1 (x_1): Understanding of customers' work
Item 2 (x_2): Quality of reply to customers' requests and questions
Item 3 (x_3): Project promotion ability
Item 4 (x_4): Effective information providing ability
Item 5 (x_5): Proposal ability

The form is exactly the same as in the previous chapter. The target of the analysis is to clarify the items on which we (salesmen) should work to improve the customers' satisfaction.

Table 1. Questionnaire and variable notations
Table 2. Raw data for independent factor analysis. Each of the 350 respondents (the first 60 IDs are listed in the book) gives a level, high or low, for each of Item1(x1) through Item5(x5) and for the objective variable y.
The level for the data, which was a score for numerical data, is a category for this analysis: high or low, as shown in Table 1. The final data are therefore as shown in Table 2, where the data number is 350. We consider the relationship between the objective variable and each explanatory variable, and do not consider relationships between the explanatory variables. We can then obtain five group data sets that show the relationship between each item and the objective variable. We show the data associated with item 1 in Table 3. We can see qualitatively that high satisfaction for item 1 leads to high overall satisfaction.

We now evaluate this relationship quantitatively. We focus on one item and form a cross table, expressing the data for the i-th row and j-th column as x_{ij}. We then have the data values

x_{11} = 188, \quad x_{12} = 33, \quad x_{21} = 33, \quad x_{22} = 96    (1)

The corresponding cross-tabulation table is shown in Table 3.

Table 3. Cross-tabulated table and level ratios for item 1

              Customers' satisfaction
Item1         high    low     Sum    Ratio (k_i)
high          188     33      221    0.63
low           33      96      129    0.37
Sum           221     129     350
Ratio (r_j)   0.63    0.37           1.00
3. INDEPENDENT VALUE

We define independent values for the entries of the cross-tabulated table. We first evaluate the independent ratio k_i with respect to the item 1 levels by dividing the number for each level, summed over the customers' levels, by the total number. This ratio reflects the level of item 1 independent of the objective variable status. Next, we evaluate the independent ratio r_j with respect to the customers' satisfaction by summing over item 1's levels. This ratio reflects the level of the objective variable. The evaluated values are shown in Table 3.

The independent value for cell (i, j) is denoted as a_{ij} and is given by

a_{ij} = k_i r_j N    (2)

This is the value that we would expect if there were no interaction between the objective and explanatory variables. Table 4 shows the independent values.

Table 4. Independent values

              Customers' satisfaction
Item1         high      low
high          139.55    81.45
low           81.45     47.55

The sum of the independent ratios can be evaluated as

\sum_{i=1}^{m} \sum_{j=1}^{l} k_i r_j = \sum_{i=1}^{m} k_i \sum_{j=1}^{l} r_j = \sum_{i=1}^{m} k_i = 1    (3)
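A minimal sketch of Eq. (2) for the item 1 cross table:

```python
import numpy as np

x = np.array([[188, 33],
              [33, 96]], dtype=float)   # cross table for item 1 (Table 3)
N = x.sum()
k = x.sum(axis=1) / N                   # independent ratios k_i (rows)
r = x.sum(axis=0) / N                   # independent ratios r_j (columns)

a = np.outer(k, r) * N                  # Eq. (2): a_ij = k_i * r_j * N
print(np.round(a, 2))                   # [[139.55  81.45] [ 81.45  47.55]]
```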
4. TESTING

After we obtain a cross-tabulation table such as Table 3, we want to know the significance of the relationship between the two categorical variables, which for numerical data was evaluated with a correlation factor. We utilize likelihood ratio testing here.

In the probability trial, we assume that we obtain k kinds of events E_1, E_2, \ldots, E_k, where the events are exclusive. We try N times and obtain the events E_1, E_2, \ldots, E_k n_1, n_2, \ldots, n_k times, respectively. The probability that E_i occurs is denoted as \theta_i. The probability that we obtain the results n_1, n_2, \ldots, n_k is expressed by a multinomial distribution:

f \left( n_1, n_2, \ldots, n_k; \theta \right) = \frac{N!}{n_1! n_2! \cdots n_k!} \theta_1^{n_1} \theta_2^{n_2} \cdots \theta_k^{n_k}    (4)

where

N = n_1 + n_2 + \cdots + n_k    (5)

Using the obtained data n_1, n_2, \ldots, n_k, the estimated probabilities are given by

\hat{\theta}_N = \left( \hat{\theta}_{1N}, \hat{\theta}_{2N}, \ldots, \hat{\theta}_{kN} \right) = \left( \frac{n_1}{N}, \frac{n_2}{N}, \ldots, \frac{n_k}{N} \right)    (6)

We compare this with the independent values \theta_0 and take the ratio

\lambda \left( n_1, n_2, \ldots, n_k \right) = \frac{f \left( n_1, \ldots, n_k; \theta_0 \right)}{f \left( n_1, \ldots, n_k; \hat{\theta}_N \right)} = \left( \frac{\theta_{10}}{\hat{\theta}_{1N}} \right)^{n_1} \left( \frac{\theta_{20}}{\hat{\theta}_{2N}} \right)^{n_2} \cdots \left( \frac{\theta_{k0}}{\hat{\theta}_{kN}} \right)^{n_k}    (7)

Taking its logarithm, we obtain

-2 \ln \lambda \left( n_1, \ldots, n_k \right) = 2 \sum_{i=1}^{k} n_i \left( \ln \hat{\theta}_{iN} - \ln \theta_{i0} \right) = 2 \sum_{i=1}^{k} n_i \left( \ln \frac{n_i}{N} - \ln \theta_{i0} \right)    (8)
We expand this in a Taylor series. Writing n_i = N \theta_{i0} + \left( n_i - N \theta_{i0} \right) and using \ln \left( 1 + x \right) \approx x - x^2 / 2, we obtain

-2 \ln \lambda = 2 \sum_{i=1}^{k} n_i \ln \frac{n_i}{N \theta_{i0}} = 2 \sum_{i=1}^{k} \left[ N \theta_{i0} + \left( n_i - N \theta_{i0} \right) \right] \ln \left[ 1 + \frac{n_i - N \theta_{i0}}{N \theta_{i0}} \right]    (9)

\approx 2 \sum_{i=1}^{k} \left[ \left( n_i - N \theta_{i0} \right) + \frac{\left( n_i - N \theta_{i0} \right)^2}{N \theta_{i0}} - \frac{\left( n_i - N \theta_{i0} \right)^2}{2 N \theta_{i0}} \right] = \sum_{i=1}^{k} \frac{\left( n_i - N \theta_{i0} \right)^2}{N \theta_{i0}}    (10)

where we neglect terms beyond the second order and utilize

\sum_{i=1}^{k} \left( n_i - N \theta_{i0} \right) = \sum_{i=1}^{k} n_i - N \sum_{i=1}^{k} \theta_{i0} = N - N = 0    (11)

Therefore, we can utilize the testing variable

\chi^2 = \sum_{i=1}^{k} \frac{\left( n_i - N \theta_{i0} \right)^2}{N \theta_{i0}}    (12)

This is known to follow a \chi^2 distribution. In the model, we can replace N \theta_{i0} by the independent value. Since the \chi^2 distribution is the sum of squares of standard normal variables, the average \mu_i and the variance \sigma_i^2 of cell i are given by

\mu_i = N \theta_{i0}    (13)

\sigma_i^2 = N \theta_{i0}    (14)

Therefore, we evaluate the variable for the cross-tabulated table as

\chi^2 = \sum_{i,j} \chi_{ij}^2 = \sum_{i,j} \frac{\left( x_{ij} - a_{ij} \right)^2}{a_{ij}}    (15)

where

\chi_{ij}^2 = \frac{\left( x_{ij} - a_{ij} \right)^2}{a_{ij}}    (16)
We apply the testing procedure to the data shown in Table 3. The term associated with cell (1, 1) is evaluated as

\chi_{11}^2 = \frac{\left( 188 - 139.55 \right)^2}{139.55} = 16.8    (17)

The total sum over all cells is given by

\chi^2 = \chi_{11}^2 + \chi_{12}^2 + \chi_{21}^2 + \chi_{22}^2 = \frac{\left( 188 - 139.55 \right)^2}{139.55} + \frac{\left( 33 - 81.45 \right)^2}{81.45} + \frac{\left( 33 - 81.45 \right)^2}{81.45} + \frac{\left( 96 - 47.55 \right)^2}{47.55} = 123.85    (18)

This can be expressed as

\chi^2 = \sum_i \sum_j \chi_{ij}^2 = \sum_i \sum_j \frac{\left( x_{ij} - a_{ij} \right)^2}{a_{ij}}    (19)

where

\chi_{ij}^2 = \frac{\left( x_{ij} - a_{ij} \right)^2}{a_{ij}}    (20)
Let us consider the degrees of freedom of the variable. The level numbers of the two variables are assumed to be m and l. We utilize the ratios for the two variables, and the freedom is decreased by 1 for each variable. Therefore, the freedom is given by

\left( m - 1 \right) \left( l - 1 \right)    (21)

In this case, m = 2 and l = 2, and the corresponding freedom is given by

\left( m - 1 \right) \left( l - 1 \right) = \left( 2 - 1 \right) \left( 2 - 1 \right) = 1    (22)

We set a predictive probability P = 0.95 and obtain the corresponding P point as

\chi_c^2 = \chi^2 \left( 1, 0.95 \right) = 3.84    (23)

Therefore, we obtain

\chi^2 > \chi_c^2    (24)

This means that the two variables are related.
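A minimal sketch of the whole test of Eqs. (15)-(24), using scipy for the chi-square P point:

```python
import numpy as np
from scipy import stats

x = np.array([[188, 33],
              [33, 96]], dtype=float)
N = x.sum()
a = np.outer(x.sum(axis=1), x.sum(axis=0)) / N       # independent values
chi2 = ((x - a)**2 / a).sum()                        # Eq. (15)

m, l = x.shape
dof = (m - 1) * (l - 1)                              # Eq. (21)
chi2_c = stats.chi2.ppf(0.95, dof)                   # Eq. (23): 3.84
print(round(chi2, 2), round(chi2_c, 2), chi2 > chi2_c)   # ~123.85 3.84 True
```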
5. INDEPENDENT FACTOR

We now consider the maximum and minimum values of \chi^2. To clarify the analysis, we treat data different from those of the previous section, with more levels: the relationship between age and preferred dish among Chinese, Japanese, and French cuisine. The number of dish levels is 3, the number of age levels is 5, and the total number of members is 500.

Let us first consider the minimum \chi^2. The minimum is 0, which we show below. We first set the ratios as shown in Table 5. The corresponding independent values are shown in Table 6. If the data are equal to these values, the corresponding \chi^2 is 0.

Table 5. Setting ratios of age and dish

            Chinese  Japanese  French   Age ratio
mid-20                                  0.2
mid-30                                  0.3
mid-40                                  0.4
mid-50                                  0.3
mid-60                                  0.2
Dish ratio  0.5      0.2       0.3      1

Table 6. The independent values

            Chinese  Japanese  French   Age ratio
mid-20      50       20        30       0.2
mid-30      75       30        45       0.3
mid-40      100      40        60       0.4
mid-50      75       30        45       0.3
mid-60      50       20        30       0.2
Dish ratio  0.5      0.2       0.3      1

Next, we consider the maximum value. The maximum corresponds to a situation of significant deviation: only one level has a value along each row and column, as shown in Table 7. The corresponding independent values are shown in Table 8. The corresponding \chi^2 is as large as 1000.

Table 7. Data with significant deviation

            Chinese  Japanese  French   Age ratio
mid-20      100      0         0        0.2
mid-30      0        200       0        0.4
mid-40      0        0         200      0.4
mid-50      0        0         0        0
mid-60      0        0         0        0
Dish ratio  0.2      0.4       0.4      1

Table 8. Independent values for Table 7

            Chinese  Japanese  French   Age ratio
mid-20      20       40        40       0.2
mid-30      40       80        80       0.4
mid-40      40       80        80       0.4
mid-50      0        0         0        0
mid-60      0        0         0        0
Dish ratio  0.2      0.4       0.4      1
We want to obtain a general form for the maximum \chi^2. Therefore, we use variable values a, b, and c instead of numerical data, as shown in Table 9. The corresponding independent values are shown in Table 10.

Table 9. Table for the maximum \chi^2

            Chinese  Japanese  French   Age ratio
mid-20      a        0         0        a/N
mid-30      0        b         0        b/N
mid-40      0        0         c        c/N
mid-50      0        0         0        0
mid-60      0        0         0        0
Dish ratio  a/N      b/N       c/N      1

Table 10. Independent values for the maximum \chi^2

            Chinese  Japanese  French   Age ratio
mid-20      a^2/N    ab/N      ac/N     a/N
mid-30      ba/N     b^2/N     bc/N     b/N
mid-40      ca/N     cb/N      c^2/N    c/N
mid-50      0        0         0        0
mid-60      0        0         0        0
Dish ratio  a/N      b/N       c/N      1
We can then evaluate \chi^2 as

\chi^2 = \frac{\left( a - \frac{a^2}{N} \right)^2}{\frac{a^2}{N}} + \frac{\left( \frac{ab}{N} \right)^2}{\frac{ab}{N}} + \frac{\left( \frac{ac}{N} \right)^2}{\frac{ac}{N}} + \frac{\left( \frac{ba}{N} \right)^2}{\frac{ba}{N}} + \frac{\left( b - \frac{b^2}{N} \right)^2}{\frac{b^2}{N}} + \frac{\left( \frac{bc}{N} \right)^2}{\frac{bc}{N}} + \frac{\left( \frac{ca}{N} \right)^2}{\frac{ca}{N}} + \frac{\left( \frac{cb}{N} \right)^2}{\frac{cb}{N}} + \frac{\left( c - \frac{c^2}{N} \right)^2}{\frac{c^2}{N}}    (25)

Modifying the first term, we obtain

\frac{\left( a - \frac{a^2}{N} \right)^2}{\frac{a^2}{N}} = \frac{a^2 \left( 1 - \frac{a}{N} \right)^2}{\frac{a^2}{N}} = N \left( 1 - \frac{a}{N} \right)^2 = N - 2a + \frac{a^2}{N}    (26)

Performing the similar analysis for the terms with b and c, and summing up all the terms, we obtain

\chi^2 = 3N - 2 \left( a + b + c \right) + \frac{a^2 + b^2 + c^2 + 2 \left( ab + bc + ca \right)}{N} = 3N - 2N + \frac{\left( a + b + c \right)^2}{N} = N + N = 2N    (27)

Note that this expression does not include any of the individual values a, b, and c. This means that we can use any combination of values provided they satisfy

a + b + c = N    (28)
We assumed that the form of Table 9 gives the maximum \chi^2; we now need to prove it. We modify Table 9 by changing a \to a - \Delta, where \Delta is positive, and moving \Delta to the next cell in the same row. We then obtain the modified table shown in Table 11. What we want to show is that \chi^2 decreases with this operation; if this is realized, it is proved that the form of Table 9 gives the maximum. The corresponding independent values are shown in Table 12. We want to show that \chi^2 is decreased for any incremental positive \Delta. In the derivation below, we neglect the second order in \Delta, since it is assumed to be quite small.

Table 11. Modified table for the maximum \chi^2

            Chinese          Japanese         French   Age ratio
mid-20      a - \Delta       \Delta           0        a/N
mid-30      0                b                0        b/N
mid-40      0                0                c        c/N
mid-50      0                0                0        0
mid-60      0                0                0        0
Dish ratio  (a - \Delta)/N   (b + \Delta)/N   c/N      1

Table 12. Independent values for the modified table

            Chinese               Japanese              French    Age ratio
mid-20      a(a - \Delta)/N       a(b + \Delta)/N       ac/N      a/N
mid-30      b(a - \Delta)/N       b(b + \Delta)/N       bc/N      b/N
mid-40      c(a - \Delta)/N       c(b + \Delta)/N       c^2/N     c/N
mid-50      0                     0                     0         0
mid-60      0                     0                     0         0
Dish ratio  (a - \Delta)/N        (b + \Delta)/N        c/N       1

The \chi^2 of the modified table is given by

\chi^2 = \frac{\left[ \left( a - \Delta \right) - \frac{a \left( a - \Delta \right)}{N} \right]^2}{\frac{a \left( a - \Delta \right)}{N}} + \frac{\left[ \Delta - \frac{a \left( b + \Delta \right)}{N} \right]^2}{\frac{a \left( b + \Delta \right)}{N}} + \frac{ac}{N} + \frac{b \left( a - \Delta \right)}{N} + \frac{\left[ b - \frac{b \left( b + \Delta \right)}{N} \right]^2}{\frac{b \left( b + \Delta \right)}{N}} + \frac{bc}{N} + \frac{c \left( a - \Delta \right)}{N} + \frac{c \left( b + \Delta \right)}{N} + \frac{\left( c - \frac{c^2}{N} \right)^2}{\frac{c^2}{N}}    (29)

where each cell with a data value of 0 contributes its independent value. Expanding the terms that depend on \Delta to first order, we obtain

\frac{\left[ \left( a - \Delta \right) - \frac{a \left( a - \Delta \right)}{N} \right]^2}{\frac{a \left( a - \Delta \right)}{N}} = \frac{N \left( a - \Delta \right)}{a} \left( 1 - \frac{a}{N} \right)^2 \approx N \left( 1 - \frac{a}{N} \right)^2 - \Delta \left( \frac{N}{a} - 2 + \frac{a}{N} \right)    (30)

\frac{\left[ \Delta - \frac{a \left( b + \Delta \right)}{N} \right]^2}{\frac{a \left( b + \Delta \right)}{N}} \approx \frac{ab}{N} - \Delta \left( 2 - \frac{a}{N} \right)    (31)

\frac{\left[ b - \frac{b \left( b + \Delta \right)}{N} \right]^2}{\frac{b \left( b + \Delta \right)}{N}} = \frac{Nb}{b + \Delta} \left( 1 - \frac{b + \Delta}{N} \right)^2 \approx N \left( 1 - \frac{b}{N} \right)^2 - \Delta \left( \frac{N}{b} - \frac{b}{N} \right)    (32)

\frac{b \left( a - \Delta \right)}{N} = \frac{ab}{N} - \Delta \frac{b}{N}, \quad \frac{c \left( a - \Delta \right)}{N} = \frac{ca}{N} - \Delta \frac{c}{N}, \quad \frac{c \left( b + \Delta \right)}{N} = \frac{cb}{N} + \Delta \frac{c}{N}    (33)

while the remaining terms do not depend on \Delta. The zeroth-order terms sum to 2N, as in Eq. (27):

N \left( 1 - \frac{a}{N} \right)^2 + N \left( 1 - \frac{b}{N} \right)^2 + N \left( 1 - \frac{c}{N} \right)^2 + \frac{2 \left( ab + bc + ca \right)}{N} = 2N    (34)

The coefficients of \Delta sum to

- \left( \frac{N}{a} - 2 + \frac{a}{N} \right) - \left( 2 - \frac{a}{N} \right) - \left( \frac{N}{b} - \frac{b}{N} \right) - \frac{b}{N} - \frac{c}{N} + \frac{c}{N} = -N \left( \frac{1}{a} + \frac{1}{b} \right)    (35)

Therefore,

\chi^2 = 2N - \Delta N \left( \frac{1}{a} + \frac{1}{b} \right) < \chi_{max}^2    (36)

and \chi^2 is decreased by the operation, which proves that the form of Table 9 gives the maximum.
We generalize the analysis further to obtain the final form of the maximum \chi^2. We treat two categorical variables x and y, where x has m levels and y has l levels, as shown in Table 13. We assume m \le l, which does not lose generality. This can be expressed by

k = \mathrm{Min} \left( m, l \right)    (37)

with k = m in this case. The ratios associated with the x levels x_1, x_2, \ldots, x_k are denoted as f_1, f_2, \ldots, f_k. Since we assume the form for the maximum, the number of y levels with nonzero entries, y_1, y_2, \ldots, y_k, is the same as the number of x levels, and the ratios of y_{k+1}, y_{k+2}, \ldots, y_l are 0, as shown in Table 13. The ratios associated with y_1, y_2, \ldots, y_l are denoted as g_1, g_2, \ldots, g_l and are given by

g_1 = f_1, \quad g_2 = f_2, \quad \ldots, \quad g_k = f_k, \quad g_{k+1} = g_{k+2} = \cdots = g_l = 0    (38)

Table 13. General form for the maximum \chi^2

          y_1    y_2    ...   y_k    ...   y_l   x ratio
x_1       Nf_1   0      ...   0      ...   0     f_1
x_2       0      Nf_2   ...   0      ...   0     f_2
...       ...    ...    ...   ...    ...   ...   ...
x_k       0      0      ...   Nf_k   ...   0     f_k
y ratio   g_1    g_2    ...   g_k    ...   0     1

Table 14. Independent value table

          y_1        y_2        ...   y_k        ...   y_l   x ratio
x_1       Nf_1^2     Nf_1f_2    ...   Nf_1f_k    ...   0     f_1
x_2       Nf_2f_1    Nf_2^2     ...   Nf_2f_k    ...   0     f_2
...       ...        ...        ...   ...        ...   ...   ...
x_k       Nf_kf_1    Nf_kf_2    ...   Nf_k^2     ...   0     f_k
y ratio   f_1        f_2        ...   f_k        ...   0     1
The corresponding independent values are shown in Table 14. The sum of the components of \chi^2 associated with the first row of Table 14 can be evaluated as

\frac{\left( Nf_1 - Nf_1^2 \right)^2}{Nf_1^2} + \frac{\left( 0 - Nf_1f_2 \right)^2}{Nf_1f_2} + \cdots + \frac{\left( 0 - Nf_1f_k \right)^2}{Nf_1f_k} = N \left( 1 - f_1 \right)^2 + Nf_1 \left( f_2 + f_3 + \cdots + f_k \right) = N \left( 1 - f_1 \right)^2 + Nf_1 \left( 1 - f_1 \right) = N \left( 1 - f_1 \right)    (39)

Performing the similar analysis for the other rows and summing them up, we obtain

\chi^2 = N \left( 1 - f_1 \right) + N \left( 1 - f_2 \right) + \cdots + N \left( 1 - f_k \right) = kN - N \left( f_1 + f_2 + \cdots + f_k \right) = \left( k - 1 \right) N    (40)

where we utilize

f_1 + f_2 + \cdots + f_k = 1    (41)

Therefore, the maximum value of \chi^2 is

\chi_{max}^2 = \left( k - 1 \right) N    (42)

This does not depend on f_1, f_2, \ldots, f_k; we can use any values that satisfy f_1 + f_2 + \cdots + f_k = 1.
Normalizing the obtained \chi^2 by this maximum value, we can define a factor given by

r_c = \sqrt{\frac{\chi^2}{N \left( k - 1 \right)}}    (43)

This is called the independent factor. It takes values between 0 and 1, and the relationship becomes more significant as it approaches 1. In the above example, we obtain

r_c = \sqrt{\frac{\chi^2}{N \left( k - 1 \right)}} = \sqrt{\frac{123.85}{350 \times \left( 2 - 1 \right)}} = 0.59    (44)
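A minimal sketch of the independent factor of Eq. (43), applied to the item 1 cross table:

```python
import numpy as np

def independent_factor(x):
    """Independent factor r_c of Eq. (43): sqrt(chi2 / (N * (k - 1)))."""
    N = x.sum()
    a = np.outer(x.sum(axis=1), x.sum(axis=0)) / N
    chi2 = ((x - a)**2 / a).sum()
    k = min(x.shape)                      # Eq. (37)
    return np.sqrt(chi2 / (N * (k - 1)))

x = np.array([[188, 33],
              [33, 96]], dtype=float)
print(round(independent_factor(x), 2))    # 0.59, Eq. (44)
```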
6. ADJUSTED RESIDUALS

We again treat the data for customers' satisfaction shown in Table 2. We evaluated the relationship between the two categorical variables by the independent factor. We now want to obtain the relationship between each level of the two variables, which can be evaluated with adjusted residuals. We focus on two levels each to make the analysis simple.

We can evaluate \chi^2 as

\chi^2 = \frac{\left( x_{11} - Nk_1r_1 \right)^2}{Nk_1r_1} + \frac{\left( x_{12} - Nk_1r_2 \right)^2}{Nk_1r_2} + \frac{\left( x_{21} - Nk_2r_1 \right)^2}{Nk_2r_1} + \frac{\left( x_{22} - Nk_2r_2 \right)^2}{Nk_2r_2}    (45)

where

k_1 + k_2 = 1, \quad r_1 + r_2 = 1    (46)

These ratios are related to the data values as

k_1 = \frac{x_{11} + x_{12}}{N}, \quad r_1 = \frac{x_{11} + x_{21}}{N}, \quad x_{22} = N - x_{11} - x_{12} - x_{21}    (47)

Therefore, we obtain

x_{12} = Nk_1 - x_{11}, \quad x_{21} = Nr_1 - x_{11}, \quad x_{22} = N \left( 1 - k_1 - r_1 \right) + x_{11}    (48)
We then obtain \chi^2 as

\chi^2 = \frac{\left( x_{11} - Nk_1r_1 \right)^2}{Nk_1r_1} + \frac{\left( x_{11} - Nk_1r_1 \right)^2}{Nk_1r_2} + \frac{\left( x_{11} - Nk_1r_1 \right)^2}{Nk_2r_1} + \frac{\left( x_{11} - Nk_1r_1 \right)^2}{Nk_2r_2} = \left( x_{11} - Nk_1r_1 \right)^2 \frac{k_2r_2 + k_2r_1 + k_1r_2 + k_1r_1}{Nk_1r_1k_2r_2} = \frac{\left( x_{11} - Nk_1r_1 \right)^2}{Nk_1 \left( 1 - k_1 \right) r_1 \left( 1 - r_1 \right)}    (49)

where we use Eq. (48): for example, x_{12} - Nk_1r_2 = Nk_1 - x_{11} - Nk_1 \left( 1 - r_1 \right) = - \left( x_{11} - Nk_1r_1 \right), and similarly for the other cells; the final step uses \left( k_1 + k_2 \right) \left( r_1 + r_2 \right) = 1.
Note that this corresponds to the data of row 1 and column 1: we obtained an expression for \chi^2 focusing on x_{11}. We can also derive expressions focusing on the other cells. Focusing on x_{12}, we obtain

\chi^2 = \frac{\left[ x_{12} - Nk_1 \left( 1 - r_1 \right) \right]^2}{Nk_1 \left( 1 - k_1 \right) r_1 \left( 1 - r_1 \right)}    (50)

Focusing on x_{21}, we obtain

\chi^2 = \frac{\left[ x_{21} - N \left( 1 - k_1 \right) r_1 \right]^2}{Nk_1 \left( 1 - k_1 \right) r_1 \left( 1 - r_1 \right)}    (51)

Focusing on x_{22}, we obtain

\chi^2 = \frac{\left[ x_{22} - N \left( 1 - k_1 \right) \left( 1 - r_1 \right) \right]^2}{Nk_1 \left( 1 - k_1 \right) r_1 \left( 1 - r_1 \right)}    (52)

Since \chi^2 is a single constant, we obtain

\frac{\left( x_{11} - Nk_1r_1 \right)^2}{Nk_1 \left( 1 - k_1 \right) r_1 \left( 1 - r_1 \right)} = \frac{\left[ x_{12} - Nk_1 \left( 1 - r_1 \right) \right]^2}{Nk_1 \left( 1 - k_1 \right) r_1 \left( 1 - r_1 \right)} = \frac{\left[ x_{21} - N \left( 1 - k_1 \right) r_1 \right]^2}{Nk_1 \left( 1 - k_1 \right) r_1 \left( 1 - r_1 \right)} = \frac{\left[ x_{22} - N \left( 1 - k_1 \right) \left( 1 - r_1 \right) \right]^2}{Nk_1 \left( 1 - k_1 \right) r_1 \left( 1 - r_1 \right)}    (53)
Therefore, \chi^2 can be expressed using the parameters of each cell, and the absolute values of the adjusted residuals are all the same. Next, we consider the sign of the adjusted residual. Let us consider the cell (1, 2), where the corresponding data value is x_{12}. We have

x_{11} + x_{12} = k_1 N    (54)

We then have

x_{12} = k_1 N - x_{11}    (55)

The numerator of the adjusted residual is therefore

x_{12} - k_1 r_2 N = k_1 N - x_{11} - k_1 r_2 N = - \left( x_{11} - k_1 r_1 N \right)    (56)

This is the negative of the value for cell (1, 1). Therefore, the adjusted residuals for the 2x2 cross-tabulated table are given in Table 15.

Table 15. General form of the adjusted residuals for a 2x2 cross-tabulated table
We denote the adjusted residual for cell (i, j) as z_{ij}. We can regard the average as Nk_ir_j and the variance as Nk_i \left( 1 - k_i \right) r_j \left( 1 - r_j \right). Therefore, the variable

\frac{x_{ij} - a_{ij}}{\sqrt{a_{ij} \left( 1 - k_i \right) \left( 1 - r_j \right)}}    (57)

can be regarded as the normalized one, and it is the form of the adjusted residual. We showed the above expression for a 2x2 cross-tabulated table; we apply the form of Eq. (57) to any type of cross-tabulated table. The adjusted residual is given by

z_{ij} = \frac{x_{ij} - a_{ij}}{\sqrt{a_{ij} \left( 1 - \frac{n_i}{N} \right) \left( 1 - \frac{n_j}{N} \right)}}    (58)

where n_i and n_j are the totals of row i and column j. This is the variable that expresses the importance of the cell with respect to the objective variable. We assume that it follows the normal distribution, so we can relate the value of the adjusted residual to a probability:

z_P = \begin{cases} 1.64 & \left( P = 0.90 \right) \\ 1.96 & \left( P = 0.95 \right) \\ 2.58 & \left( P = 0.99 \right) \end{cases}    (59)
where the predictive probability is two-sided. We adopt P = 0.95 here. Let us evaluate the adjusted residuals for item 1, which are shown in Table 16. Since the absolute values are larger than z_P (P = 0.95), we judge that the dependence is valid. For the 2x2 table, where \chi^2 = z_{ij}^2, the adjusted residual and the independent factor r_c are related by

r_c = \sqrt{\frac{\chi^2}{N \left( k - 1 \right)}} = \frac{\left| z_{ij} \right|}{\sqrt{N \left( k - 1 \right)}}    (60)

Table 16. Adjusted residuals for item 1

         Customers' satisfaction
Item1    high     low
high     11.13    -11.13
low      -11.13   11.13
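A minimal sketch of the adjusted residual of Eq. (58), reproducing Table 16:

```python
import numpy as np

def adjusted_residuals(x):
    """Adjusted residuals z_ij of Eq. (58) for a cross-tabulated table x."""
    N = x.sum()
    row = x.sum(axis=1, keepdims=True)
    col = x.sum(axis=0, keepdims=True)
    a = row @ col / N                          # independent values a_ij
    return (x - a) / np.sqrt(a * (1 - row / N) * (1 - col / N))

x = np.array([[188, 33],
              [33, 96]], dtype=float)          # item 1 cross table
print(np.round(adjusted_residuals(x), 2))      # [[ 11.13 -11.13] [-11.13  11.13]]
```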
Let us consider the meaning of the adjusted residuals. We study two types of data: one uniform and one deviated, as shown in Table 17. The item ratios are the same for both data sets, 0.5. The corresponding adjusted residuals are 0 and 4.47, respectively, and the independent factors are 0 and 1, respectively. Therefore, both the adjusted residual and the independent factor are related not to the item ratio but to the deviation of the data.

Table 17. Adjusted residuals for uniform and deviated data

Uniform data (item ratio 0.5):
          y high   y low
high      5        5
low       5        5
Adjusted residuals: all 0.00. Independent factor: 0.

Deviated data (item ratio 0.5):
          y high   y low
high      10       0
low       0        10
Adjusted residuals: +4.47 on the diagonal, -4.47 off the diagonal. Independent factor: 1.
Independent Factor Analysis
51
Next, we investigate the deviated data with varying the item ratio as shown in Table 18. When the item ratio changes significantly, the residual and the independent factor are independent of them. Inspecting above, an independent factor expresses the deviation of total data, and the adjust residual expresses the deviation of the levels, and both are independent of the item level ratio.
Item
Item
Item
Item
Item
Table 18. Adjust residual and independent factor of deviated data with varying item ratio Adjust residual
high low
Customers' satisfaction high low Item ratio 4 0 0.2 0 16
Adjust residual
high low
Customers' satisfaction high low Item ratio 8 0 0.4 0 12
Adjust residual
high low
Customers' satisfaction high low Item ratio 10 0 0.5 0 10
Adjust residual
high low
Customers' satisfaction high low Item ratio 12 0 0.6 0 8
Adjust residual
high low
Customers' satisfaction high low Item ratio 16 0 0.8 0 4
high low
high low
high low
high low
high low
high 4.47 -4.47
low -4.47 4.47
Independent factor 1
high 4.47 -4.47
low -4.47 4.47
Independent factor 1
high 4.47 -4.47
low -4.47 4.47
Independent factor 1
high 4.47 -4.47
low -4.47 4.47
Independent factor 1
high 4.47 -4.47
low -4.47 4.47
Independent factor 1
7. LEVEL ACHIEVEMENT RATIO We select levels based on the critical adjust residuals, and we evaluate the corresponding ratio. We call the ratio as a level achievement ratio that is denoted as
rli ,
where i expresses the item. The level of item 1 is high and is required to realize customers’ high satisfaction. The corresponding ratio is 0.63. In this case, the selection of levels of explanatory variable is rather clear. However, the selection is not clear associated with categorical levels in general. In that case, the adjust residual does work.
Kunihiro Suzuki
52
We can set the critical adjust residual value relating to the standard normal distribution. For example, if we relate the critical value to the predictive probability, we set the critical value at 1.96. When we set the critical value, there may be a case where the levels that exceed the value does not exist. We then do not treat the item. Consequently, we evaluate the selection in two gates: one is the evaluation, and the second is the adjust residual critical value. Further, if we evaluate the level with the critical value, it may occur that many levels are selected. In that case, we should sum the ratio as below 2
rl k1 k2
(61)
where levels 1 and 2 are assumed to exceed the critical value. Table 19 summarizes the data including the level achievement ratio. Using the data, we perform the CS analysis in the next section. Table 19. Summary of the satisfaction data Item1
Item2
Item3
Item4
Item5
Level achievement ratio
0.63
0.59
0.32
0.5
0.41
Independent factor
0.59
0.57
0.38
0.53
0.49
Dependence(χ2 evaluation)
yes
yes
yes
yes
yes
1.23
0.88
-1.48
0.08
-0.7
1.08
0.75
-1.78
0.19
-0.23
Contribution score
1.63
1.15
-2.31
0.19
-0.23
Rquired improvement score
-0.11
-0.09
0.21
0.08
0.33
Normalized level achievement ratio Normalized independent factor
8. DETERMINATION OF ITEM BASED ON CS ANALYSIS We obtained independent factors and level ratios for each item. Using these data, we perform CS analysis of which procedure is shown in the previous chapter.
Independent Factor Analysis
53
8.1. Normalization We first evaluate averages and standard deviations for a level ratio with respect to items, which are given by
r l
1 p rli p i 1
1 p rli rl p i 1
r l
(62)
2
(63)
We then normalize the level ratios as
zrl i
rli rl
r
(64)
l
We next evaluate averages and standard deviations for an independent factor with respect to items, which are given by
r c
r c
1 p rr i p i 1 c
1 p rci rc p i 1
(65)
2
(66)
We then normalize the independent factors
zrci
rci as
rci rc
rc
We then obtain a vector for each item as
(67)
ai zrc i , zrl i
.
Kunihiro Suzuki
54
8.2. Improve Requested and Contributed Items We can plot the normalized independent factor and normalize level ratios as shown Figure 2. The high correlation factor means that it is an important item, and high satisfaction means that it is in good condition. Therefore, the improvement request item can be selected as the one with a high independent factor and a low level achievement ratio. How can we obtain the corresponding value ? The axis direction of right angle of -45o corresponds to the importance associated with an independent factor and bad condition associated with the satisfaction, and hence it expresses the degree of an improvement requested degree. Therefore, the projection of each point to the axis corresponds to the improvement request degree. The axis direction of right angle of 45o corresponds to the importance associated with an independent factor and good condition associated with a level achievement ratio, and hence it expresses the degree of contribution. Therefore, the projection of each point to the axis corresponds to the contribution degree. We can evaluate the degree as follows. The unit vector for the contribution axis eG , and that for improvement request axis eB are given by 1 1 , eG 2 2 e 1 , 1 B 2 2
(68)
The degree for the contribution is denoted as
Gi and can be evaluated as
Gi ai eG
zrc i , zrl i
1 2
z
rc i
1 2
zrl i
1,1
The degree for the improvement request is denoted as
(69)
Bi and can be evaluated as
Bi ai eB
zrc i , zrl i
1 2
z
rc i
1 2
zrl i
1, 1
(70)
Independent Factor Analysis
55
Normalized level achievement ratio
Figure 2 shows CS plot and Figure 3 shows the contribution and requested improvement degrees extracted from Figure 2. Item1 and Item2 contribute to the customers’ satisfaction, while there is no clear improvement requested item.
2 Contribution Item1 Item2
1 Item4
0
-1
Item5
Item3
Improve requested
-2 -2
-1
0
1
2
Normalized independent factor Figure 2. CS plot for satisfaction for salesmen.
2.0 1.8 Contribution score
1.6
Improvement request
1.4 1.2 1.0 0.8 0.6
0.4 0.2 0.0 Item1
Item2
Item3
Figure 3. Contribution and improvement request degrees.
Item4
Item5
Kunihiro Suzuki
56
9. CS CORRELATION FACTOR We want to evaluate the status of the CS. We can evaluate it inspecting the data distribution. If the data are along the contribution axis, the status is good. On the other hand, if the data are along the improvement requested axis, the status is bad. We can evaluate the status of the CS by evaluating the correlation factor between zri and
zi
, which is denoted is as
rzr z
, and is evaluate as
zr2z
rzr z
zr2zr z2z
(71)
where 1 p 2 zri p i 1
(72)
z2z
1 p 2 zi p i 1
(73)
z 2z
1 p zri zi p i 1
(74)
z 2z r r
r
We call is as a CS correlation factor. The value is between -1 and 1, and status is better with increasing the value. The CS correlation factor for the data is 0.96, and hence the status is good in this case. This is the reason why we have many contributed items with less improvement requested items.
10. EXPECTED OBJECTIVE VARIABLE IMPROVEMENT WITH IMPROVING EXPLANATORY VALUE When we improve the explanatory variable value, how can we expect the improvement in the objective value? This can be easily done with numerical data for CS analysis using
Independent Factor Analysis
57
a regression analysis. We do not have such theory for categorical data. We propose here a procedure to predict it.
Figure 4 Data flow.
10.1. Two Levels (Objective Variable)-Two Levels (Explanatory) We consider objective and explanatory variables with both two levels. Figure 4 schematically shows the data flow. The ratio of customer’s satisfaction for high is denoted as as
r1 and that for low is denoted
r2 .
We consider the levels of item as yes or no. The level yes corresponds to a requested one which has a high adjust residual, and the level no corresponds to non-requested one which has a low or negative adjust residual. Therefore, we try to increase the ratio of k1 , expecting the increase in r1 . The initial data for corresponding cells are assumed to be x11 , x12 , x21 , x22 , and the ratios are related to the data as below.
k1
x11 x12 N
(75)
k2
x21 x22 N
(76)
Kunihiro Suzuki
58
r1
x11 x21 N
(77)
r2
x12 x22 N
(78)
where N is the total number of the data and is given by N x11 x12 x21 x22
We want to increase is increased to
k1
(79)
. Therefore, we perform something and assume that the ratio
k ' , and is expressed by
k1' k1 k1
where
(80)
k1 is positive. k 2
then becomes as
r
k 2' k 2 k1
. This
k1
is a given value.
r
We study the change of 1 and 2 in this situation. We assume that the values in the cells are changed to x11 x11' ' x12 x12 ' x21 x21 x x' 22 22
(81)
Let us consider the data changes in more detail. The increase in by the decrease of is related to the
x21
with
21
and the decrease of
x22
with
22
k1
can be expressed
. Therefore, the change
k1 as
21 22 N k1 We assume that
x11
and
(82)
x12
do not change.
Independent Factor Analysis
We also assume that all the others are added to factor and is given by
x12
21
is added to
x11
59
and that part of
22
is added to
rc
(83)
is 1, the whole data are added to
data are added to and obtain
x12
x11
. On the other hand, when
. This features are expected ones. We eliminate
' x11 x11 N k1 1 rc 22 ' x12 x12 1 rc 22
rc 22
rc 22 N
is 0, whole
using Eq. (82),
, and the corresponding ratio is
(85)
This can be solved with respect to
22 N
21
rc
(84)
The increase in number for high satisfaction is expressed by
r1' r1
and
. We assume that the change is expressed using the independent
' x11 x11 21 rc 22 ' x12 x12 1 rc 22
When
x11
22
given by
r1' r1 rc
(86)
There are two non-determined variables solve it.
22
and
r1' , and we need one condition to
2 We propose that the independent factor, that is, the is not changed. We then obtain
x '11 Nk1' r1' x11 Nk1r1 Nk1 1 k1 r1 1 r1 Nk1' 1 k1' r1' 1 r1' 2
2
2
(87)
Kunihiro Suzuki
60 2 We can modify as
2
x Nk r ' Nk 1 k r ' 1 r ' ' 11
' 1
' 1
' 1
2
1
1
1
x11 N k1 1 rc 22 Nk1' r1' Nk1' 1 k1' r1' 1 r1'
2
1 rc ' ' ' x11 N k1 N r1 r1 Nk1r1 rc ' ' ' ' Nk1 1 k1 r1 1 r1
2
x11 ' 1 rc 1 rc ' k1 r1 k1 r1 rc rc N N ' ' ' ' k1 1 k1 r1 1 r1
(88)
2
We then obtain ' 1 rc 1 rc x ' k 1 k r 1 r 11 k1 r1 k1 r1 N rc rc N
2
' 1
' 1
' 1
' 1
2
(89)
We introduce variables below A
B
2 N
k1' 1 k1'
(90)
1 rc x11 k1 r1 N rc
1 rc ' C k1 rc
(91)
(92)
The Eq. (89) can be expressed by Ar1' 1 r1' B Cr1'
2
(93)
Independent Factor Analysis
61
' We solve this with respect to r1 and obtain
r1'
A 2 BC A A 4 B C B 2 A C2
(94)
We have two roots, but the only one is available. We first consider the sign of the term C B as
1 rc ' x11 1 rc CB k1 k1 r1 r N r c c 1 rc x11 1 r1 k1 N rc 1 rc x11 x12 x11 1 r1 r N N c 1 rc x12 0 1 r1 N rc
(95)
Therefore, the term C B is positive. We consider that the adjust residual is large for cell 1,1 . Therefore, the below should hold.
x'11 Nk1' r1' 0
(96)
This leads to
B Cr1' 0 We consider the one root of plus sign in (94), and modify it as
(97)
Kunihiro Suzuki
62
B Cr1' B
A 2 BC A A 4 B C B C 2 A C2
2 AB 2 BC 2 AC 2 BC 2 A2C 2 4 ABC 2 C B 2 A C2
2 AB AC AC 1
4B C B A
2 A C2
4B 2 AB AC 1 1 C B A 2 2 A C
(98)
2 AB 2 AC 2 A C2 2 A B C 2 A C2
0
Therefore, this root is not adequate. The other root is modified as
B Cr1' B
A 2 BC A A 4 B C B C 2 A C2
4B 2 AB AC 1 1 C B A 2 2 A C
(99)
4B 2 AB AC 1 C B 1 A 0 2 2 A C Therefore, the root is positive and is adequate. We then have
r1'
A 2 BC A A 4 B C B 2 A C2
(100)
Independent Factor Analysis
63
Table 20. Cross tabulated table for general form and the modified effective one with 2x2 cross tabulated table is shown below
Item
Item
Levels significant high high plane low significant low ratio
high x11 x21 x31 x41 x51 r1
Total satisfaction plane x12 x22 x32 x42 x52 r2
Levels significant high high plane low significant low ratio
high
Total satisfaction plane low
low x13 x23 x33 x43 x53 r3
ratio k1 k2 k3 k4 k5
ratio
x11m
x12m
k1m
x21m
x22m
k2m
r1
r2m
10.2. General Form for Expected Objective Variable Level The target cross tabulated table is not 2 x 2 type in general. More general form is shown in Table 20. In that table, we obtained that the expected explanatory levels are two. We then merge the table and obtain the one blow, where the data are merged as
x11m x11 x21 x12 m x12 x13 x21 x23 x21m x32 x41 x51 x22 m x32 x33 x42 x43 x52 x53 We can then apply the same process to obtain the target values. This process can be easily extended to any expected levels.
(101)
Kunihiro Suzuki
64
11. ANALYSIS FOR SUB GROUP We treat a total group up to here. The group may consist of many sub groups, and the characteristics of the sub groups are different from the total group in general. The target items for the sub groups may be different from the total group. We try to select target items for the sub groups. We assume that the importance of item is the same for the sub groups of the total group. The difference between a certain group and the total group or between each sub groups is the status of satisfaction. We reference the level achievement ratio of the sub group with respect to the total ones. We set the level achievement ratio for i -item as rlGi and the corresponding data number as nG . The normalized value for i -th item zlGi is then given by zlGi
rlGi rli
rli 1 rli rlGi 1 rlGi N nG
(102)
We may use a different form as zlGi
rlGi rli 1 1 rli 1 rli N nG
(103)
where rli
Nrli nG rlGi N nG
(104)
Data for the group can be regarded as the portion of the total group, and hence the variance may be assumed to be the same for the sub-group. In this case, the normalized value may be evaluated as zlGi
rlGi rli 1 1 rli 1 rli N nG
(105)
Independent Factor Analysis
65
The above models may have a too big deviation and suffer an unstable one. This may be modified as a simple one of
zlGi
rlGi rli
rli 1 rli
(106)
This is a normalized one with respect to a standard deviation of the population, where we use it as a default. The normalized satisfaction can be expressed as
zli zli zlGi
(107)
After then the process is the exactly the same as that for CS analysis.
SUMMARY To summarize the results in this chapter.
a The independent value for cell i, j is denoted as ij and is given by aij ki rj N
rj
k
where i is the ratio of an explanatory variable, and is the ratio of an objective variable level. We evaluate the variable for the cross tabulated table as
2
i, j
x
ij
aij
2
aij
The level number of variable are assumed to be m and l . Therefore, the freedom is given by m 1 l 1
We set a predictive probability P , and obtain a corresponding P-value as
Kunihiro Suzuki
66
c 2 2 , P If 2 is larger than c2 , the dependence is valid, and vice versa. The independent factor is given by
rc
2
N k 1
The adjust residual is given by zij
xij aij n n aij 1 i 1 j N N
We select levels based on the critical adjust residuals, and we evaluate the corresponding ratio. We call the ratio as a level achievement ratio and denote it as rli , where i expresses the item. Using the independent factor, and the level achievement ratio, we perform CS analysis, and also we can evaluate a CS correlation factor. We can predict the improvement of the objective variable ratio with increasing the level achievement ratio of the explanatory variable. The improved objective level ratio is given by
r1'
A 2 BC A A 4 B C B 2 A C2
where A
B
2 N
k1' 1 k1'
1 rc x11 k1 r1 N rc
1 rc ' C k1 rc
Independent Factor Analysis
67
The subgroup can be analyzed as
zlGi
rlGi rli
rli 1 rli
This is a normalized one with respect to a standard deviation of the population, where we use it as a default. The normalized satisfaction can be expressed as
zli zli zlGi Figure 5 shows the flow for the above analysis. 2 We obtain the CS data, and it is tested associated with the relationship using testing. We then select items and evaluated the corresponding independent factor and satisfaction. Selecting the levels of items and we evaluate the level achievement ratio. Using the independent factor, and the level achievement ratio, we perform CS analysis and select the target items.
Figure 5. Flow for the analysis.
Chapter 3
STATISTICAL TESTING AND PREDICTIONS ABSTRACT We usually obtain various sample probability variables, or variables of two groups where we obtain a certain difference in general. Assuming the probability function, we evaluate that the data belongs to the population set, or the difference is valid or not. These judging processes are called as testing. In these processes, the values of population are known or approximated values are assumed. Further, we can also predict population variable value range from the obtained sample data. The testing and predictions are partially performed in each chapter up to here. We clearly define the testing and predictions and repeat again and summarize the testing and predictions in statistics in this chapter.
Keywords: testing, hypothesis, null hypothesis, ratio, average, variance, normal distribution, t distribution, F distribution, population ratio
1. INTRODUCTION When we obtain sample data such of average and ratio, these values are different from the ones of the corresponding population set. Furthermore, when we have two data set for sample, we want to know whether the difference is valid or not for the corresponding population set. In the testing process we basically focus on the sample data using or assuming the corresponding population data. In the same obtained data of samples, we want to predict the range of the difference of population. We show the procedure to judge the validity for various probability values, and predict the range. We treat various probability variables in this chapter.
Kunihiro Suzuki
70
2. HYPOTHESIS We set a hypothesis to perform a test. H 0 is a null hypothesis and H1 is an alternative hypothesis. We obtain clear results
whether H 0 is true or H1 is true. We should be careful about the appreciation as the followings. We cannot judge absolutely, and may make errors sometimes. The case is shown in Table 1. Table 1. Judge and real results
Judge
Real H1:true
H0 :false;H1:true
H1:false (1st type error) H0 :true
H0 :true;H1:false
H0 :false (2nd type error)
We basically evaluate whether H 0 is true or false. When H 0 is false, we can clearly stay that H1 is true based on the decided prediction probability. We may also make a mistake in this case as shown in the table, which is called as a first type error. We can reduce the error with increasing the prediction probability. When H 0 is true, we may also make a mistake as shown in the table, which is called as a second type error. However, we cannot directly relate it to the prediction probability. Therefore, this error is rather uncontrollable. It should be noted, when H 0 is true, we should not clearly state that H 0 is true, but should say that we cannot say that H1 is not true. When H 0 is true, there are two possibilities. One is that H 0 is really true. The other is that we cannot say that H1 is true or not due to significant error range. To avoid the latter case, we should be careful when H 0 is true. Therefore, H 0 is established to evaluate H1 clearly. When H 0 is true, we
should stay rather ambiguous results. In that stand point of view, H 0 is called as null hypothesis, where it is established to deny clearly.
Statistical Testing and Predictions
71
3. LEVEL OF SIGNIFICANCE In the probability variable, we cannot predict results with 100% accuracy, but must set a prediction probability P . This means that the results of judge or prediction may fail with a certain times if we perform them many times. For example, if we set the prediction probability at 95%, we may fail 5 times for 100 times testing or predictions. We cannot decide the prediction probability mathematically, but simply assume it. This depends on what accuracy a person needs or is requested from his customers. In the prediction probability, we have two cases. We sometimes do not care the minimum or the maximum case as shown in Figure 1 (a). For example, we focus on the maximum value for a stock, where we care the stock value is sufficient or insufficient for the future sales. We set one edge boundary in this case, and it is called as one sided probability. The other is the typical one of both sides probability where the probability is assumed for both sides of the target value as shown in Figure 1 (b). In the one sided probability, the probability P is related to the region edge P
P
p
as
f z dz
(1)
In the probability distribution where the defined region is positive, the one sided probability P is related to the region edge P
P
0
p
as
f z dz
In both sides prediction probability, the probability P is related to the region edge
(2)
p
as P
P
f z dz
P
(3)
This is true only when the probability distribution is symmetrical. When it is asymmetrical, we combine two one-sided probabilities for the both sides probability as P1
P1
P2
f z dz
P2
f z dz
(4)
(5)
Kunihiro Suzuki
72 where P P2 P1
f()
f()
(6)
P
P
-P
P
P
(a)
(b)
Figure 1. Probability distribution, probability and P-value. (a) One sided probability, (b) both sides probability.
4. P POINTS FOR VARIOUS PROBABILITY DISTRIBUTIONS Once, we set the prediction probability P , we can obtain corresponding P points for
2
various probability distributions. We treat normal, t, , and F distributions. The normal and t distribution are symmetrical and we can assume the both sides prediction probability
2
with its peak position. However, the , and F distributions are asymmetrical, and P points are defined for one sided probability. The normal distribution can be always reduced to a standard normal distribution and we treat the standard normal distribution given by
f x
x2 exp 2 2 1
for x
The t distribution with a freedom of n is given by
(7)
Statistical Testing and Predictions n 1 n 1 2 2 2 t fn t 1 n n n 2
The
73
for t (8)
2 distributions with a freedom of n is given by
fn x
n 1 x 2 x exp n 2 n 22 2
1
for 0 x
n The F distribution with a freedom of 1 and n1 2x 1 f n1 , n2 x n n n n B 1 , 2 1 x 2 2 2 2 2
n1
2
n1 x 2 1 n1 x n2 2 2
(9)
n2 is given by n2
2 1 x
for 0 x (10)
We can obtain corresponding values using a standard software.
5. TESTING FOR ONE VARIABLE 2 5.1. One Sample Data Testing for Known Variance
Hypothesis H0: The sample belongs to the set. H1: The sample does not belong to the set. Evaluation and Judgment
2
We have a set characterized with an average of and a variable of . When we have a certain value of x , we want to evaluate that the value belongs to the set. We evaluate the variable z
z
given by
x
2
(11)
Kunihiro Suzuki
74
We decide a prediction probability P , and evaluate a corresponding
zp
. If the
absolute value of z is smaller than the z P , we can regard that the data can be related to the set, and vice versa, which is expressed by z zP z zP
H0 is true. We cannot say that x does not belong to the set. H1 is true. x does not belong to the set.
Prediction If we do not know the population average, and the data the population, we can predict the population average as x zP x zP 2
x
(12)
can be regarded to belong to
2
(13)
2 5.2. Sample Average Testing for Known Variance
Hypothesis
.
H0: The sample average
x
is the same as the population average
H1: The sample average
x
is different from the population average
.
Evaluation and Judgment We evaluate a sample average x
x
with a sample number n , given by
1 n xi n i 1
(14)
2
We know the average and variance of the population set as and , respectively. We want to evaluate that the sample average is the same as that of the population. The same averages that the sample average variation covers the population average. We evaluate the variable
z
given by
Statistical Testing and Predictions z
75
x
2 n
(15)
We decide a prediction probability P , and evaluate corresponding
zp
. If the absolute
z
value of z is smaller than the p , we can regard that the data can be related to the set, and vice versa. Therefore, we can judge the sample average, and population average as H0 is true. We cannot say that x is different from . H1 is true. x is different from .
z zP z zP
(16)
Prediction If we do not know the population average, we can predict the population average as
x zP
2 n
x zP
2 n
(17)
2 5.3. Sample Average Testing for Unknown Variance
Hypothesis H0: The sample average H1: The sample average
x x
. is different from the population average . is the same as the population average
Evaluation and Judgment We evaluate a sample average
x
with a sample number n , given by
n
x xi i 1
We know the average of the population set as
2 . We then evaluate unbiased variance as
(18)
, but do not know the variance
Kunihiro Suzuki
76
s 2
1 n 2 xi x n 1 i 1
(19)
We want to evaluate that the sample average is the same as that of the population. The same averages that the sample average variation covers the population average. We evaluate the variable t given by t
x s n 2
(20)
We decide a prediction probability P , and evaluate corresponding
tp
. If the absolute
t
value of t is smaller than the p , we can regard that the data can be related to the set, and vice versa. Therefore, we can judge the sample average, and population average as t t P t t P
H0 is true. We cannot say that x is different from . H1 is true. x is different from .
(21)
Prediction If we do not know the population average, we can predict the population average as
x tP
2 n
x tP
2 n
(22)
(Example) Ministry of Health, Labor and Welfare in Japan evaluated new born man’s average weight in 1990, and the average was 3,150 g. More than 10 years passed now. The situation changes significantly, and hence investigate the average weight is changed. We randomly extract 100 new born male babies and obtained the data as follows. The average and unbiased standard deviation can be evaluated as
x 2982 s 316.17
(23)
Statistical Testing and Predictions
77
where
s s
2
(24)
Table 2. Babies’ weight
3372 2935 3118 2851 2675 2646 3163 2949 3522 2638
3110 3247 3580 3191 2689 2383 3187 2447 4231 3181
2619 3060 2879 3521 3070 3385 2358 3058 3366 2935
3315 2674 2915 2958 2935 2527 3268 3139 2520 2846
3420 2591 2679 3391 2443 2984 2512 2998 2724 2783
3230 3035 2956 2920 3121 3501 2716 2909 2894 3202
3017 3226 2809 3387 2936 3076 2890 3152 2941 2666
3124 2996 3092 3263 2655 3330 2807 2629 3061 2849
2928 3012 3482 2706 2863 2787 2647 3602 2788 2639
3048 3159 2753 2500 2794 2957 3084 3385 2863 2716
We can evaluate t as t
2892 3150 5.35 316.17 100
(25)
We decide a prediction probability P of 95%, and the corresponding P points are
t p 1.98
(26)
Therefore, we obtain
t tP
(27)
and we can judge that the weight is changed. z p 1.96 for standard normal distribution t p 1.98 for t distribution
(28)
Kunihiro Suzuki
78
As we mentioned before, we can approximate the t distribution as a standard normal distribution when the sample number is large. The P point for the standard normal distribution is
z p 1.96
(29)
which is almost the same as that for t distribution as is expected, and the judge is also the same.
2 5.4. Sample Variance Testing for Known Variance
Hypothesis 2 2 H0: The sample unbiased variance s is the same as the population average . 2 2 H1: The sample average s is different from the population average .
Evaluation and Judgment We evaluate a sample unbiased variance
s 2
2
with a sample number n , given by
1 n 2 xi x n 1 i 1
We evaluate the variable 2
s
(30)
2 given by
n 1 s 2 2
This follows the
(31)
2
distribution with a freedom of n 1 .
We decide a prediction probability
2
2
2 P , and evaluate corresponding P . If the absolute
value of is smaller than the P , we can regard that the data can be related to the set, and vice versa. Therefore, we can judge the sample average, and population average as
Statistical Testing and Predictions 2 2 P 2 2 P
79
H0 is true. We cannot say that s is different from . 2
2
H1 is true. s is different from . 2
2
(32)
Prediction
2
If we do not know the population variance, we can predict it. Since distribution is asymmetrical, we set two prediction probabilities and hence it is expressed as
n 1 s 2 P2
2
n 1 s 2
2
P2
(33)
1
5.5. Outliers Testing We evaluate whether the data are outlier or not given by
5.6. Population Ratio Testing with Restored Extraction Hypothesis H0: The sample average H1: The sample average
pˆ pˆ
is the same as the population ratio
p
.
is different from the population average
p
.
Evaluation and Judgment We have a population ratio of
p
want to judge the sample ratio that can be regarded as the population ratio We evaluate a normalized variable as z
pˆ
. We investigate the sample ratio and obtained . We
p
.
pˆ p p 1 p n
This variable follows standard normal distribution.
(34)
Kunihiro Suzuki
80
We decide a prediction probability
P , and evaluate corresponding z p . If the absolute
z
value of z is smaller than the P , we can regard that the data can be related to the set, and vice versa. Therefore, we can judge the sample average, and population average as z zP z zP
H0 is true. We cannot say that pˆ is different from p. H1 is true. pˆ is different from p.
(35)
Prediction If we do not know the population ratio, we can predict the population ratio as pˆ
pˆ 1 pˆ z P 2 pˆ 1 pˆ z P 2 z 2 2 pˆ P z P 2 n 4n p 2n n 4n zP 2 zP 2 1 1 n n
zP 2 zP 2n
(36)
When the sample number is sufficiently large, it is reduced to
pˆ z P
pˆ 1 pˆ n
p pˆ z P
pˆ 1 pˆ
(37)
n
5.7. Population Ratio Testing with Non-Restored Extraction Hypothesis H0: The sample average H1: The sample average
pˆ pˆ
is the same as the population ratio
p
.
is different from the population average
p
.
Evaluation and Judgment We have a population ratio of
p
. We investigate the sample ratio and obtained
We want to judge the sample ratio that can be regarded as the population ratio We evaluate a normalized variable as
p
.
pˆ .
Statistical Testing and Predictions
z
81
pˆ p p 1 p N n n N 1
(38)
This variable follows standard normal distribution. We decide a prediction probability
P , and evaluate corresponding z p . If the absolute
z
value of z is smaller than the P , we can regard that the data can be related to the set, and vice versa. Therefore, we can judge the sample average, and population average as z zP z zP
H0 is true. We cannot say that pˆ is different from p. H1 is true. pˆ is different from p.
(39)
Prediction If we do not know the population ratio, we can predict the population ratio as 2 pˆ 1 pˆ N n zP 2 N n zP 2 N n pˆ zP 2n N 1 n N 1 4n 2 N 1 z 2 N n 1 P n N 1 p 2 pˆ 1 pˆ N n zP 2 N n zP 2 N n pˆ zP 2n N 1 n N 1 4n 2 N 1 z 2 N n 1 P n N 1
(40)
6. TESTING FOR TWO VARIABLES We perform testing of two variables in this section. Before we perform the testing, we should discuss some points below. Let us consider that the two averages. When we obtain two averages the values are different of course in general. We evaluate that the difference is valid or not in the statistical evaluation. First, we evaluate two variables independently. We can evaluate averages and corresponding intervals as shown in Chapter 12 of volume 1 as
Kunihiro Suzuki
82 x1 z p
x2 z p
2 s1
n1 2 s2
n2
1 x1 z p
2 s1
2 x2 z p
n1 2 s2
n2
(41)
(42)
We approximate that the sample average follows normal distribution, and also assume x1 x2 .
Whether difference is valid or not can be evaluated by that the interval of these two variables have cross area. This is, we evaluate
x1 z p
2 s1
n1
x2 z p
2 s2
n2
(43)
If Eq. (43) is valid, we can judge 1 2 , and vice versa. This evaluation is inaccurate. What point is wrong?
zP
is decided based on the prediction probability P . Eq. (43) is valid for
1 P 1 P . Therefore, this evaluation is too severe. We should not evaluate two variables independently, but treat the difference itself as one probability variable.
2 6.1. Testing of Difference Between Population Averages: Is Known
Hypothesis H0: The two averages are same. H1: The two averages are different from each other. Evaluation and Judgment We evaluate the difference between two population average where the population variance is known. We obtain two sample averages of x1 and x2 , and assume the corresponding average and standard deviations are
Statistical Testing and Predictions 2 x1 : mean 1 ,standard deviation 1 n1 2 2 x : mean ,standard deviation 2 2 n2
83
(44)
We assume that the two variables X1 and X 2 follows normal distribution. We consider a difference of two probability variables given by y x1 x2
(45)
The corresponding average and variance are given by 1 2
2
(46)
1 2 2 2 n1 n2
(47)
We then construct a normalized form as z
x1 1 x2 2 2
1
n1
2
2
2
n2
(48)
x1 x2 1 2 1 2 n1
z
2 2 n2
follows a standard normal distribution. We want to evaluate whether 1 2 , and
hence set 1 2 , and the normalized variables is z
x1 x2
1 2 2 2 n1 n2
(49)
We decide a prediction probability
z
z
P , and evaluate corresponding z p . If the absolute
value of is smaller than the p , we can regard that the data can be related to the set, and vice versa. Therefore, we can judge difference of the averages as
Kunihiro Suzuki
84 z zP z zP
H0 is true. We cannot say that two averages are different. H1 is true. Two averages are different.
(50)
Prediction We set 1 2
(51)
The difference of two averaged for the population can be evaluated as
x1 x2 zP
1 2 n1
2 2 n2
x1 x2 z P
1 2 n1
2 2 n2
(52)
6.2. Testing of Difference between Population Averages: and the Variances Are Assumed to Be the Same
2
Is Unknown
Hypothesis H0: The two averages are same. H1: The two averages are different from each other. Evaluation and Judgment We evaluate the difference between two population average where the population variance is unknown and assumed to be the same. 2
s We obtain two sample averages of x1 and x2 , and also 1
and
2 s2
, and assume
the corresponding average and standard deviations of x1 and x2 are 2 s x1 : mean 1 ,standard deviation 1 n1 2 s2 x2 : mean 2 ,standard deviation n2
We assume that the two variables X1 and X 2 follows t distribution. We consider the difference as a probability variable given by
(53)
Statistical Testing and Predictions y x1 x2
85 (54)
The corresponding average and variance are given by 1 2
(55)
1 1 2 sp2 n1 n2
(56)
where
sp 2
n1 1 s1 2 n2 1 s2 2 n1 1 n2 1
(57)
We construct a normalized variable given by t
x1 1 x2 2 1 2 1 s p n n 1 2 x x 1 2 1 2
(58)
1 2 1 s p n n 2 1
We assume that the variable follows a t distribution with a freedom of n1 n2 2 . We want to evaluate whether 1 2 , and hence set 1 2 , and the normalized variables is t
x1 x2
(59)
2
sp
1 1 n1 n2
We compare this with P point
tP
We decide a prediction probability
t
and perform a testing.
P , and evaluate corresponding t p . If the absolute
value of t is smaller than the P , we can regard that averages the same, and vice versa. Therefore, we can judge the sample average, and population average as
Kunihiro Suzuki
86 t tP t tP
H0 is true. We cannot say that the averages are different from each other. H1 is true. The averages are different from each other.
(60)
Prediction If we want to evaluate that the population average difference is more than a certain value of , we set
1 2
(61)
and the corresponding normalized variable is
x1 x2 tP
1 1 2 1 2 1 sp x1 x2 t P sp n1 n2 n1 n2
6.3. Testing of Difference between Population Averages: Unknown and the Variances Are Assumed to Be Different
(62)
2
Is
Hypothesis H0: The two averages are same. H1: The two averages are different from each other. Evaluation and Judgment We evaluate the difference between two population averages where the population variance is known. 2
s We obtain two sample averages of x1 and x2 , and also 1
and
2 s2
, and assume
the corresponding average and standard deviations of x1 and x2 are
x1 : mean 1 ,standard deviation x2 : mean 2 ,standard deviation
s1 n1 2
s2 n2 2
We assume that the two variables X1 and X 2 follows t distribution. We consider the difference as a probability variable given by
(63)
Statistical Testing and Predictions
y x1 x2
87
(64)
The corresponding average and variance are given by
1 2
(65)
s1 s2 n1 n2 2
2
2
(66)
We construct a normalized variable given by t
x1 1 x2 2 s1 s2 n1 n2 2
2
(67)
x1 x2 1 2 s1 s2 n1 n2 2
2
We assume that the variable follows a s1 2 s2 2 n1 n2 * n 2 2 1 s1 1 s1 n1 1 n1 n2 1 n2
t distribution with a freedom of
n* , where
2
(68)
* * In general n is not an integer, and we use a integer nearest to n , and denote it as
nf
and hence express it as
n f Round n*
We decide a prediction probability
(69)
P , and evaluate corresponding t p . If the absolute
value of t is smaller than the t P , we can regard that averages are the same, and vice versa. Therefore, we can judge the sample average, and population average as
Kunihiro Suzuki
88 t tP t tP
H0 is true. We cannot say that the averages are different from each other. H1 is true. The averages are different from each other.
(70)
Prediction If we want to evaluate that the population average difference is more than a certain value of , we set 1 2
(71)
and the corresponding normalized variable is
x1 x2 tP
s1 s2 s s x1 x2 t P 1 2 n1 n2 n1 n2 2
2
2
2
(72)
2 6.4. Testing of Difference between Population Averages: Is Unknown with Paired Data
Hypothesis H0: The two averages are same. H1: The two averages are different from each other. Evaluation and Judgment We evaluate the difference between two population averages where the population variance is unknown, and the data are paired. We obtain two sample averages of x1 and x2 with the same data number of n . We have data for group 1 and group 2 are paired, and we can evaluate the difference of each pair as
di xi1 xi 2 We can evaluate the average and unbiased variance associated with d
1 n di n i 1
(73)
di
as
(74)
Statistical Testing and Predictions sd 2
1 n di d n 1 i 1
89
2
(75)
We construct a normalized variable given by t
x1 x2 1 2
(76)
sd n 2
We assume that the variable follows a t distribution with a freedom of n 1 . We decide a prediction probability
P , and evaluate corresponding t p . If the absolute
t
value of t is smaller than the P , we can regard that averages are the same, and vice versa. Therefore, we can judge the sample average, and population average as t tP H0 is true. We cannot say that the averages are different from each other. t tP H1 is true. The averages are different from each other.
(77)
Prediction If we want to evaluate that the population average difference is more than a certain value of , we set 1 2
(78)
and the corresponding normalized variable is
x1 x2 tP
sd s x1 x2 tP d n n 2
2
(79)
6.5. Testing of Difference between Population Ratio with Restored Extraction Hypothesis H0: The two population ratios are same. H1: The two population ratios are different from each other.
Kunihiro Suzuki
90
Evaluation and Judgment We assume that two population ratios are p1 and p2 , and evaluate the difference between two population ratios based on the sample data. ˆ1 and pˆ 2 , and assume the corresponding average and We obtain two sample ratio of p standard deviations are p 1 p1 pˆ1 : mean p1 ,standard deviation 1 n1 p2 1 p2 pˆ 2 : mean p2 ,standard deviation n2
(80)
This follows a normal distribution. We consider the difference as a probability variable given by y pˆ1 pˆ 2
(81)
The corresponding average and variance are given by p1 p2
2
(82)
p1 1 p1 n1
pˆ1 1 pˆ1 n1
p2 1 p2 n2
pˆ 2 1 pˆ 2
(83)
n2
We construct a normalized variable given by z
pˆ1 pˆ 2 p1 p2
(84)
We assume that the variable
z
follows a standard normal distribution. We want to
evaluate whether p1 p2 p , and hence set p1 p2 p , and the normalized variables is z
pˆ1 pˆ 2
We compare this with P point
(85)
zP
and perform a testing.
Statistical Testing and Predictions
91
Therefore, we can judge difference of the averages as z zP H0 is true. We cannot say that two population ratios are different. z zP H1 is true. Two population ratios are different.
(86)
Prediction We set p1 p2 p
(87)
The difference of two population ratios can be evaluated as
pˆ1 pˆ 2 zP
ˆ1 p ˆ 2 zP p p
(88)
6.6. Testing of Difference between Population Ratio with Non-Restored Extraction Hypothesis H0: The two population ratios are same. H1: The two population ratios are different from each other. Evaluation and Judgment We assume that two population ratios are p1 and p2 , and evaluate the difference between two population ratio based on the sample data. ˆ1 and pˆ 2 , and assume the corresponding average and We obtain two sample ratio of p standard deviations are
p 1 p1 N1 n1 pˆ1 : mean p1 ,standard deviation 1 n1 N1 1 p2 1 p2 N 2 n2 pˆ 2 : mean p2 ,standard deviation n2 N2 1
(89)
This follows a normal distribution. We consider the difference as a probability variable given by ˆ1 p ˆ2 y p
(90)
Kunihiro Suzuki
92
The corresponding average and variance are given by p1 p2
(91)
2
p1 1 p1 N1 n1 p 1 p2 N 2 n2 2 n1 N1 1 n2 N2 1
pˆ1 1 pˆ1 N1 n1 2 pˆ 2 1 pˆ 2 N 2 n2 n1 N1 1 n2 N2 1
2
(92)
We construct a normalized variable given by z
pˆ1 pˆ 2 p1 p2
(93)
We assume that the variable
z
follows a standard normal distribution. We want to
evaluate whether p1 p2 p , and hence set p1 p2 p , and the normalized variables is z
pˆ1 pˆ 2
(94)
z
We compare this with P point P and perform a testing. Therefore, we can judge difference of the averages as z zP z zP
H0 is true. We cannot say that two population ratios are different. H1 is true. Two population ratios are different.
(95)
Prediction We set p1 p2 p
(96)
The difference of two population ratios can be evaluated as
pˆ1 pˆ 2 zP
ˆ1 p ˆ 2 zP p p
(97)
Statistical Testing and Predictions
93
6.7. Testing of Ratio of Two Population’s Variances: 1 and 2 Are Known Hypothesis H0: The two population variances are same. H1: The two population variances are different from each other. Evaluation and Judgment
u1 , u2 are given by 2
Two variances 2
u1
2
u2
2
1 n1 2 xi1 1 n1 i 1
(98)
1 n2 2 xi 2 2 n2 i 1
(99)
The variable F defined by
F
u1
2
u2
2
(100)
follows F distribution with a freedom of k1 n1 , k2 n2 , where n1 and n2 are the sample numbers. We compare this with P point FP and perform a testing. Therefore, we can judge difference of the variances as F FP F FP
Prediction None.
H0 is true. We cannot say that two population variances are different. H1 is true. Two population variances are different.
(101)
Kunihiro Suzuki
94
6.8. Testing of Ratio of Two Population’s Variances 1 and 2 Are Unknown Hypothesis H0: The two population variances are same. H1: The two population variances are different from each other. Evaluation and Judgment
s1 , s2 given by 2
The ratio of two unbiased variance
F
s1
2
2
s2
2
(102)
follows a F distribution with a freedom of k1 n1 1, k2 n2 1 , where n1 and n2 are the sample numbers. We compare this with P point FP and perform a testing. Therefore, we can judge the difference of the variances as F FP F FP
H0 is true. We cannot say that two population variances are different. H1 is true. Two population variances are different.
(103)
Prediction None.
7. TESTING FOR CORRELATION FACTORS 7.1. Correlation Factor Testing Hypothesis H0: The two population variables have no correlation relationship. H1: The two population variables have correlation relationship. Evaluation and Judgment We test whether there is correlation relationship between two variables for a gotten correlation factor r . When the data has no correlation relationship, the variable
Statistical Testing and Predictions t n2
follows a
95
r 1 r2
(104)
t distribution with a freedom of n 2 .
We decide a prediction probability
P , and evaluate corresponding t p . If the absolute
value of t is smaller than the t P , we can regard that averages the same, and vice versa. Therefore, we can judge the sample average, and population average as t tP t tP
H0 is true. We cannot say that the averages are different from each other. H1 is true. The averages are different from each other.
(105)
Prediction None.
7.2. Correlation Factor Testing for Reference One Hypothesis H0: The correlation factor is the same as the population correlation factor. H1: The correlation factor is different from the population correlation factor. Evaluation and Judgment We test whether the gotten correlation factor We form a converted variable of 1
r
is the same as the reference of
.
1 r
ln 2 1 r
(106)
This follows a normal distribution with an average of
1 1 ln 2 1 and a standard deviation of
(107)
Kunihiro Suzuki
96
1 n 2.5
(108)
Consequently, the parameter given by
z
(109)
follows a standard normal distribution. We compare this with P point z P and perform a testing. Therefore, we can judge difference of the averages as z zP H0 is true. We cannot say that the corrlataion factor is different from the population correlation factor. z zP H1 is true. The corrlataion factor is different from the population correlation factor.
(110)
Prediction If we do not know the population correlation factor, we can predict it as zP zP
(111)
This can be converted as min max
(112)
where min
e2GL 1 e2GL 1
max
e2GH 1 e2GH 1
(113) (114)
and 1 1 rxy GL ln 2 1 rxy
1 zP n 2.5
(115)
Statistical Testing and Predictions GH
1 1 r ln zP 2 1 r
97
1 n 2.5
(116)
7.3. Two Correlation Factor Testing Hypothesis H0: The two correlation factors are the same. H1: The two correlation factors are different from each other. Evaluation and Judgment
r1
and r2 are the same or not. We assume the corresponding population correlation factors are 1 and 2 , respectively. We test whether the two gotten correlation factors We form converted variables of
1 1 r1 1 ln 2 1 r1
(117)
1 1 r2 2 ln 2 1 r2
(118)
The average and variance of 1 and 2 are given by
1 1 1 1 ln 2 1 1
(119)
1 1 2 2 ln 2 1 2
(120)
1
2
1 n1 2.5
(121)
1 n2 2.5
(122)
Kunihiro Suzuki
98 Consequently, the parameter
z
1 2 1
2
z
given by
(123)
follows a standard normal distribution, where 2 2 1
(124)
2
We compare this with P point z P and perform a testing. Therefore, we can judge difference of the averages as z zP z zP
H0 is true. We cannot say that the two corrlataion factors are differnt from each other. H1 is true. The two corrlataion factors are differnt from each other.
(125)
Prediction None.
8. TESTING FOR REGRESSION Hypothesis H0: The regression is invalid. H1: The regression is valid. Evaluation and Judgment In the regression analysis, a degree of freedom adjusted coefficient of determination is given by n R*2 1
e n
T
where
Se
2
2
S yy
(126)
Statistical Testing and Predictions
99
e n 2
(127)
T n 1
(128)
r 1
(129)
x
1 xi n
(130)
y
1 yi n
(131)
2
S xx
2
S yy
x x
2
y y
2
i
i
n
S xy 2
Sr 2
(132)
n
x
i
x yi y n
y
i
y Yi y n
S yy Sr Se 2
2
2
(133)
(134)
(135)
(136)
We form a parameter n F
r n
e
S r
2
Se
2
(137)
This follows a F distribution with a freedom of k1 r , k2 e .We compare this with P point FP and perform a testing. Therefore, we can judge the difference of the variances as
Kunihiro Suzuki
100 F FP F FP
H0 is true. We cannot say that the regression is valid. H1 is true. The regression is valid.
(138)
Prediction
x0
The regression value for y at
is expressed a
Yˆ0 aˆ0 aˆ1 x0
(139)
where
S xy 2
aˆ1
S xx
(140)
aˆ0 y aˆ1 x
(141)
2
The predictive range is given by 1 x0 x 2 2 ˆ se Y0 Yˆ0 t p n 2 Y0 t p n 2 2 n nS xx
The predictive range for
y0
1 x0 x 2 2 se 2 nS xx n
(142)
is given by
1 x x 2 2 Yˆ0 z p 1 0 2 e yo Yˆ0 z p nS xx n
1 x0 x 2 2 1 e 2 nS xx n
9. TESTING FOR MULTI REGRESSION Hypothesis H0: The regression is invalid. H1: The regression is valid.
(143)
Statistical Testing and Predictions
101
Evaluation and Judgment In the regression analysis, a degree of freedom adjusted coefficient of determination is given by
n R*2 1
Se
e n
2
S yy 2
T
(144)
where e n m 1
(145)
T n 1
(146)
r m
(147)
xp
y
1 xip n
1 yi n
S pq 2
2
S yy
(149)
x
ip
x xiq x n
y y
S py 2
Sr 2
(148)
n
ip
x p yi y n
y
i
y Yi y n
S yy Sr Se 2
2
i
x
2
2
(150)
(151)
(152)
(153)
(154)
Kunihiro Suzuki
102 We form a parameter
n F
r n
e
S r
2
Se
2
(155)
This follows a F distribution with a freedom of k1 r , k2 e . We compare this with P point FP and perform a testing. Therefore, we can judge the difference of the variances as F FP F FP
H0 is true. We cannot say that the regression is valid. H1 is true. The regression is valid.
(156)
Prediction The regression value for y at x0 is expressed a Yi aˆ0 aˆ1 xi1 aˆ2 xi 2
aˆm xim
(157)
where aˆ1 S11 ˆ 2 a2 S21 2 aˆ p Sm 1 2
S12
S1m 2 S2 m 2 Smm
2
2
S22 2
Sm 2 2
aˆ0 y aˆ1 x1 aˆ2 x2
1
S1y2 2 S2 y S 2 py
aˆm xm
(158) (159)
The predictive range is given by
aˆ0 aˆ1 x1 aˆ2 x2
1 D 2 2 aˆm xm t p e ; P se n n
Y aˆ0 aˆ1 x1 aˆ2 x2
(160) 1 D 2 2 aˆm xm t p e ; P se n n
Statistical Testing and Predictions
103
where
D 2 x1 x1
x2 x2
The predictive range for
aˆ0 aˆ1 x1 aˆ2 x2
S 11 2 21 2 S xm xm S m1 2
y0
S S S
12 2 22 2
m 2 2
1m 2
x x 1 1 x2 x2 S mm 2 x x m S m
S
2 m 2
(161)
is given by
1 D 2 2 aˆm xm t e , P 1 se n n
y
(162)
aˆ0 aˆ1 x1 aˆ2 x2
1 D 2 2 aˆm xm t e , P 1 se n n
10. TESTING FOR EFFECTIVENESS OF VARIANCES IN MULTI REGRESSION Hypothesis H0: The variance is invalid. H1: The variance is valid. Evaluation and Judgment In the multiple regression, the effectiveness of the m variances with respect to the regression should be evaluated. We start with a regression without an explanatory variable, which is denoted as model 0. The regression is given by Model 0 : Yi y
(163) S e M 0 2
The corresponding variance Se M 0 2
is given by
1 n 1 n 2 2 yi Yi yi y S yy 2 n i 1 n i 1
(164)
Kunihiro Suzuki
104
In the next step, we evaluate the validity of x1 , x2 , model 1.
, xm , and the model is denoted as
The regression using the explanatory variable is given by
xl
Yi a0 a1 xil1
(165) Se M 1 2
The corresponding variance Se M 1 2
is given by
2 1 n 1 n 2 yi Yi yi a0 a1 xil1 n i 1 n i 1
(166)
Then the variable nS F
2 e M 0
1
nSe M 1 2
2 nSe M 1
e M 0
e M 1
(167)
e M 1
follows a F distribution with freedom and are given by
F e M 0 e M 1 ,e M 1
where
e M 0
and
e M 1
are the
e M 0 n 1
(168)
e M 1 n 2
(169)
We can judge the validity of the explanatory variable as
F1 F e M 1 , e M 1 e M 0 F1 F e M 0 e M 1 , e M 1
We evaluate F1 for x1 and
valid
(170)
invalid
x2 , that is, l
1
1,2 , and evaluate the corresponding F1 .
If both F1 is invalid, we use the model 0 and the process ends. We precede these processes and obtain
Statistical Testing and Predictions
Fk
nS
2 e Mk 1
nSe Mk 2
nS
2 e Mk
e Mk 1
e Mk
105
(171)
e Mk
where
e Mk n k 1
(172)
Fp e Mk 1 e Mk ,e Mk We evaluate the corresponding F value given by . Therefore, we can judge the difference of the variances as
F FP F FP
H0 is true. We cannot say that the variable is valid for the regression. H1 is true. The variable is valid for the regression.
(173)
Prediction None.
11. TESTING FOR VARIANCE ANALYSIS 11.1. One Way Analysis Hypothesis H0: The parameter dependence is invalid. H1: The parameter dependence is valid. Evaluation and Judgment 2
The effectiveness of the level is evaluated with Sex given by
2
Sex
nA1 A1
2
nA2 A2
2
nA3 A3
2
nA1 nA2 nA3
(174) 2
The scattering of the data is expressed with Sin , and is given by
Kunihiro Suzuki
106 nA1
x
Sin
iA1
i 1
2
A1
2
i 1
iA2
A2
nA3
x 2
i 1
iA3
A3
2
nA1 nA2 nA3
The correlation ratio
2
nA2
x
2
(175)
is given by
Sex 2
Sin Sex 2
2
(176)
This is between 0 and 1, and the effectiveness of the factor can be regarded as significant with larger 2 . We form an unbiased variance as sex
n
2
ex
Sex 2
(177)
where
ex p p
(178)
is the level number.
sin
n
2
in
Sin 2
(179)
where in n p
(180)
Finally, the ratio of the unbiased variance is denoted by F and is given by
sex 2
F
sin 2
(181)
Statistical Testing and Predictions
This follows a F distribution with a freedom of distribution is denoted as
FP ex , in
107
ex ,in . The P point for the
F
.
We compare this with P point FP and perform a testing. Therefore, we can judge the difference of the variances as F FP F FP
H0 is true. We cannot say that the paremter dependence is valid. H1 is true. The paremter dependence is valid.
(182)
Prediction The effectiveness between each levels can be evaluated as
A A i
j
2 sin 1 1 2 nAi nAj
(183)
If this value is larger than the studentized range distribution table value of
q r , n r , P , we judge that the difference is effective. The other simple way to evaluate the difference is the one with zi
Ai 2 sin 1 1 2 nAi n
(184)
We may be able to compare absolute value of this with z p for a normal distribution.
11.2. Two Way Analysis without Repeated Data Hypothesis H0: The parameter dependence is invalid. H1: The parameter dependence is valid.
Kunihiro Suzuki
108
Evaluation and Judgment We consider two factors of A : A1, A2 , A3 , A4 and B : B1 , B2 , B3 , where nA 4 and nB 3 . The total data number
n
is given by
n nA nB
(185)
In this case, each data xij can be expressed by the deviation from the average, and is given by
xij Aj Bi eij
(186)
The various variances are given by
nA
S Aex 2
i 1
Ai
2
Se 2
2
nA
nB
S Bex
i 1
Bi
(187)
2
nB 1 nB nA 2 eij n i 1 j 1
(188)
(189)
The various freedoms are given by tot n 1
(190)
A nA 1
(191)
B nB 1
(192)
The freedom associated with the error is given by
Statistical Testing and Predictions
109
e tot A B n 1 nA 1 nB 1 n nA nB 1
(193)
Therefore, the unbiased variances are given by sA 2
sB 2
se 2
n
A n
B n
e
S A
(194)
SB
(195)
Se
(196)
2
2
2
The F associated with a factor A is given by s A 2
FA
se
(197)
2
F , This is compared with the F critical value of AP A c sB 2
FB
se
(198)
2
F , This is compared with the F critical value of BP B e . We compare this with P point F P and perform a testing, where A, B . Therefore, we can judge the difference of the variances as F F P F F P
Prediction None.
H0 is true. We cannot say that the paremter dependence is valid. H1 is true. The paremter dependence is valid.
(199)
Kunihiro Suzuki
110
11.3. Two Way Analysis with Repeated Data Hypothesis H0: The parameter dependence is invalid. H1: The parameter dependence is valid. Evaluation and Judgment We consider two factors of A : A1 , A2 , , AnA and B : B1 , B2 , , BnB , and we have ns set. The total data number
n
is given by
n nA nB ns
(200)
The total average is given by
1 ns nA nB xij _ s n s 1 j 1 i 1
(201)
Each data deviation xij _ s from the total average is given by
xij _ s xij _ s
(202)
The average data for each level is given by xij
1 xij _1 xij _1 2
(203)
The averages are evaluated as
Aj
Bi
nB
1 nB
x
1 nA
nA
ij
i 1
(204)
x j 1
ij
and the average deviation can be evaluated as
(205)
Statistical Testing and Predictions
111
Aj Aj
(206)
Bi Bi
nA
S Aex 2
i 1
Ai
2
2
nA
nB
S Bex
(207)
i 1
Bi
(208)
2
nB
(209)
Aex nA 1
(210)
Bex nB 1
(211)
s s Therefore, the corresponding unbiased variances Aex and Bex are given by 2
s Aex
n
2
sBex
Aex
n
2
bLex
2
S Aex 2
(212) S Bex 2
(213)
A pure error is given by
e pure _ ij _ s xij _ s xij Se _pure 2
1 nB nA ns nA nB ns j i s
(214)
e
2
pure _ ij _ s
The deviation of each data from the total average eij _ s is given by
(215)
Kunihiro Suzuki
112
eij _ s xij _ s Aj Bi
(216)
The difference associated with interaction is given by einteract _ ij _ s eij _ s e pure _ ij _ s
xij Aj Bi Sinteract 2
nA
1 nA nB
nB
ns
e
1 nA nB ns
interact _ ij _ s
j
i
nA
nB
j
i
(217)
2
s
xij Aj Bi
2
(218)
interact tot Aex Bex e _ pure
n 1 nA 1 nB 1 nA nB ns 1
sinteract
n
2
interact
(219)
Sinteract 2
(220)
s Aex 2
FA
se _pure 2
(221)
The critical F value FAP is given by
FAP F Aex ,e
(222)
The effectiveness of a factor B can be evaluated as sBex 2
FB
se _pure 2
The critical F value FBP is given by
(223)
Statistical Testing and Predictions
FBP F Bex ,e
113
(224)
Therefore, the factor B is effective. The effectiveness of interaction can be evaluated as sinteract 2
Finteract
se _pure 2
(225)
The critical F value FinteractP is given by
FinteractP F interact ,e
(226)
We compare this with P point F P and perform a testing, where A, B,interact . Therefore, we can judge difference of the variances as F F P F F P
H0 is true. We cannot say that the paremter dependence is valid. H1 is true. The paremter dependence is valid.
(227)
Prediction None.
11.4. Independent Factor Analysis Hypothesis H0: The parameter dependence is invalid. H1: The parameter dependence is valid. Evaluation and Judgment
a The independent value for cell i, j is denoted as ij and is given by aij ki rj N
(228)
r
where ki is the ratio of explanatory variable, and j is the ratio of objective variable levels. We evaluate the variable for the cross tabulated table as
Kunihiro Suzuki
114
2
i, j
x
ij
aij
2
(229)
aij
The level numbers of variable are assumed to be is given by
m
and l . Therefore, the freedom
m 1 l 1
(230)
We set a predictive probability P , and obtain the corresponding P-value as
c 2 2 , P
(231)
Therefore, we can judge the dependence of the variances as 2 c2 2 2 c
Prediction None.
H0 is true. We cannot say that the paremter dependence is valid. H1 is true. The paremter dependence is valid.
(232)
Chapter 4
SCORE EVALUATION ABSTRACT We show the procedure to decide the subject on which we focus to improve total scores of various subjects. The former procedure was that we evaluate the subject using its normalized value. We add one more aspect to improve the total scores, that is, we consider a standard deviation. We then need to focus on the subject with a low normalized value and a large standard deviation.
Keywords: score evaluation, normalized value, standard deviation, contribution, improvement requested
1. INTRODUCTION Success or failure is determined based on the total score in common examinations. Therefore, we want to know what subject we should focus on to improve the total score. The simple way to select the subject is to evaluate the normalized value for each subject, and try to improve the subject with the low normalized value. We show that it is more effective to add one more aspect to select the subjects, that is, the standard deviation of the subject.
2. EVALUATION OF THE FIVE SUBJECTS Table 1 shows the score of 40 students of five subjects: Japanese, English, Science, Social, and Mathematics. We express the subject with p , where p =1 to 5. We denote the average of subject
p
as
p
and the standard deviation as
p
, which are evaluated as
Kunihiro Suzuki
116 N
x
p
i 1
ip
N
(1)
x N
p2
i 1
ip
p
2
N
(2)
where N is the student number and is 40 here. The score of a member i with the subject normalized variable is given by
zip
p
is denoted as
xip
, and the related
xip p
p
(3)
This value is related to the status of the member i in the group. The order of the member in the group is located in the ratio r , and it is given by z
z2 1 r exp dz 2 2
1 z 1 Erf 2 2
(4)
where Erf is the error function as shown in Appendix 1-4. This is the ratio from the bottom, and the ratio from the top can be obtained by 1 minus the value, which is shown in Figure 1. 1.0
Probability
0.8 0.6
Bottom Top
0.4 0.2 0.0
-3
-2
-1
0 z
Figure 1. Dependence of probability on normalized value z .
1
2
3
Score Evaluation
117
z
z
Table 2 shows the normalized variables ip . The values of ip more than 0.5 are blue hatched, and the values less than -0.5 are red hatched. This corresponds to the top 30% and bottom 30%. Therefore, the blue hatched subjects correspond to the good point and red hatched subjects to bad point for the person. It is then recommended to improve the red hatched subjects for the person, which is the standard evaluation for selecting the target subjects. Table 1. Score of data for 40 members ID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 Average Stdv
Japanese 77 80 93 66 85 63 100 60 100 78 91 90 78 72 100 73 47 70 64 82 71 64 32 94 83 76 59 88 66 90 53 71 60 68 52 80 39 47 74 57 72.33 16.48
English 92 6 46 26 82 46 68 66 72 36 100 56 44 38 78 100 64 78 75 73 54 18 65 73 61 50 90 45 51 90 37 60 91 55 32 30 43 100 97 52 61.00 23.71
Science 17 48 46 24 49 60 38 36 61 52 70 74 67 38 30 54 39 30 28 66 50 22 19 28 72 44 35 40 64 55 53 51 56 55 13 51 42 73 25 29 45.10 16.49
Social 81 76 85 74 87 82 100 70 100 91 77 74 97 72 71 80 100 78 87 86 92 97 77 79 91 87 96 85 67 73 72 91 82 78 67 89 81 87 73 75 82.68 9.37
Math 50 67 34 61 68 0 82 70 77 64 51 100 65 51 18 100 82 69 42 9 55 72 60 66 43 90 0 54 57 74 70 38 0 71 0 94 12 88 91 28 55.58 28.55
Kunihiro Suzuki
118
Table 2. Normalized score for 40 members ID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
Japanese 0.28 0.47 1.25 -0.38 0.77 -0.57 1.68 -0.75 1.68 0.34 1.13 1.07 0.34 -0.02 1.68 0.04 -1.54 -0.14 -0.51 0.59 -0.08 -0.51 -2.45 1.32 0.65 0.22 -0.81 0.95 -0.38 1.07 -1.17 -0.08 -0.75 -0.26 -1.23 0.47 -2.02 -1.54 0.10 -0.93
English 1.31 -2.32 -0.63 -1.48 0.89 -0.63 0.30 0.21 0.46 -1.05 1.64 -0.21 -0.72 -0.97 0.72 1.64 0.13 0.72 0.59 0.51 -0.30 -1.81 0.17 0.51 0.00 -0.46 1.22 -0.67 -0.42 1.22 -1.01 -0.04 1.27 -0.25 -1.22 -1.31 -0.76 1.64 1.52 -0.38
Science -1.70 0.18 0.05 -1.28 0.24 0.90 -0.43 -0.55 0.96 0.42 1.51 1.75 1.33 -0.43 -0.92 0.54 -0.37 -0.92 -1.04 1.27 0.30 -1.40 -1.58 -1.04 1.63 -0.07 -0.61 -0.31 1.15 0.60 0.48 0.36 0.66 0.60 -1.95 0.36 -0.19 1.69 -1.22 -0.98
Social -0.18 -0.71 0.25 -0.93 0.46 -0.07 1.85 -1.35 1.85 0.89 -0.61 -0.93 1.53 -1.14 -1.25 -0.29 1.85 -0.50 0.46 0.35 1.00 1.53 -0.61 -0.39 0.89 0.46 1.42 0.25 -1.67 -1.03 -1.14 0.89 -0.07 -0.50 -1.67 0.67 -0.18 0.46 -1.03 -0.82
Math -0.20 0.40 -0.76 0.19 0.44 -1.95 0.93 0.51 0.75 0.30 -0.16 1.56 0.33 -0.16 -1.32 1.56 0.93 0.47 -0.48 -1.63 -0.02 0.58 0.15 0.37 -0.44 1.21 -1.95 -0.06 0.05 0.65 0.51 -0.62 -1.95 0.54 -1.95 1.35 -1.53 1.14 1.24 -0.97
Score Evaluation
119
3. SCORE EVALUATION CONSIDERING STANDARD DEVIATION We further study the score evaluation. The total Qi for a member i is given by 5
Qi xip p 1
(5)
This is modified as 5
xip p
p 1
p
Qi p
5
p p 1
5
5
p 1
p 1
p zip p
(6)
The second term is independent of a member, that is, it is independent of i . Therefore, we need not to consider this term to select the subjects. The first term is related to the selection of the subjects. The selection of subject for low normalized value means that we focus on the
z
zip
in the first term. However, the first term
is the product of ip and p . Therefore, we should also care about the standard deviation. This means that we should select the subject for low normalized value with high standard deviation. We evaluate the average and the standard deviation of the standard deviation Eq.(6) as
1 5 p 5 p 1
2 1 5 p 5 p 1
18.92 , and 6.61 in this case.
The normalized value is given by
p
in
(7)
(8)
Kunihiro Suzuki
120
z p
p
(9)
Substituting Eq. (9) into Eq. (6), we obtain 5
5
p 1
p 1
Qi p zip p 5
p 1
5 5 p zip zip p p 1 p 1
5
5
5
p 1
p 1
p 1
z p zip zip p
(10)
We want to summarize the first two terms and modify this as 5
5
5
p 1
p 1
p 1
Qi z p zip zip p 5 5 1 z p zip p p 1 p 1 1 z 2 5 p 1 2 p 1 1
1
2
5
z p 1
5 zip p p 1
5
z p
p ip
p 1
(11)
where z i is the extended normalized value given by 1 z p
z p
1
2
(12)
This extended normalized value is related to the importance of the subject to improve the total score. Table 3 summarizes the parameter values.
Score Evaluation
121
Table 3. Standard deviation (Stdev), normalized stdv, and extended normalized stdv Subject Stdv Normalized stdv Extended normalized stdv
Japanese 16.476 -0.369 0.822
English 23.711 0.724 1.183
Science 16.486 -0.368 0.823
Social 9.371 -1.443 0.468
Math 28.551 1.456 1.425
Let us appreciate the normalized value. The extended normalized value is as follows in the limiting cases as
z p
1 z p
for for
1 1
(13)
p The extended normalized value is independent of subject for . This means that we can regard the standard deviation as constant. In this case, there is no priority for the subject, and we should select only by the normalized score as the usual evaluation. p The normalized value is dependent of subject for . This means that the standard deviation depends on the subject. In this case, there is significant priority for the subject, and we should select by considering both normalized score and this extended normalized value. In the latter case, we should perform CS analysis, where we define contribution and requested axis which has the angles of 4 . In the former case, we do not define such axis explicitly. However, if we define the angles of 2 , we obtain the same results as the conventional one. Therefore, we need to define contribution and requested axis varying depending on the value of and .
a z , z
p ip We then obtain coordinate for member i as i . We evaluate the axis for improvement request and contribution. We define the angle given by
tan
That is, we obtain an angle of
(14)
Kunihiro Suzuki
122
tan 1
(15)
The angle has values for limiting case as 2 0
for
for
1 1
(16)
In this case, the angle is given by
6.61 tan 1 18.92 0.34 radian
tan 1
19.30
(17)
We propose to define the unit vectors for contribution and improvement as follows.
eG cos ,sin 2 2 2 2 e cos ,sin B 2 2 2 2
(18)
These definitions realize the requested ones for limiting cases. The contribution and improvement requested are given by
ai eG zGi z p cos 2 2 zip sin 2 2 a e z z cos z sin i B Bi p ip 2 2 2 2
(19)
Score Evaluation
123
The value for contribution and improvement requested are shown in Table 4 and Table 5. The values more than 0.5 in Table 4 are hatched blue, which express the contribution. The values more than 0.5 in Table 5 are hatched red, which express improvement requested. Table 4. Contribution values for 40 members ID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
Japanese 0.42 0.60 1.37 -0.24 0.90 -0.42 1.79 -0.60 1.79 0.48 1.26 1.20 0.48 0.12 1.79 0.18 -1.38 0.00 -0.36 0.72 0.06 -0.36 -2.28 1.43 0.78 0.36 -0.66 1.08 -0.24 1.20 -1.02 0.06 -0.60 -0.12 -1.08 0.60 -1.86 -1.38 0.24 -0.78
English 1.49 -2.09 -0.43 -1.26 1.07 -0.43 0.49 0.41 0.66 -0.84 1.82 -0.01 -0.51 -0.76 0.90 1.82 0.32 0.90 0.78 0.70 -0.09 -1.59 0.36 0.70 0.20 -0.26 1.40 -0.47 -0.22 1.40 -0.80 0.16 1.45 -0.05 -1.01 -1.09 -0.55 1.82 1.69 -0.18
Science -1.54 0.31 0.19 -1.12 0.37 1.03 -0.29 -0.41 1.09 0.55 1.63 1.87 1.45 -0.29 -0.77 0.67 -0.23 -0.77 -0.88 1.39 0.43 -1.24 -1.42 -0.88 1.75 0.07 -0.47 -0.17 1.27 0.73 0.61 0.49 0.79 0.73 -1.78 0.49 -0.05 1.81 -1.06 -0.83
Social -0.10 -0.62 0.32 -0.83 0.53 0.01 1.90 -1.26 1.90 0.95 -0.52 -0.83 1.59 -1.04 -1.15 -0.20 1.90 -0.41 0.53 0.43 1.06 1.59 -0.52 -0.31 0.95 0.53 1.48 0.32 -1.57 -0.94 -1.04 0.95 0.01 -0.41 -1.57 0.74 -0.10 0.53 -0.94 -0.73
Math 0.05 0.63 -0.51 0.43 0.67 -1.68 1.15 0.74 0.98 0.53 0.08 1.77 0.56 0.08 -1.06 1.77 1.15 0.70 -0.23 -1.37 0.22 0.81 0.39 0.60 -0.20 1.43 -1.68 0.18 0.29 0.87 0.74 -0.37 -1.68 0.77 -1.68 1.57 -1.27 1.36 1.46 -0.71
Kunihiro Suzuki
124
Table 5. Improvement requested values for 40 members ID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
Japanese -0.14 -0.32 -1.10 0.52 -0.62 0.70 -1.52 0.88 -1.52 -0.20 -0.98 -0.92 -0.20 0.16 -1.52 0.10 1.65 0.28 0.64 -0.44 0.22 0.64 2.55 -1.16 -0.50 -0.08 0.93 -0.80 0.52 -0.92 1.29 0.22 0.88 0.40 1.35 -0.32 2.13 1.65 0.04 1.05
English -1.09 2.48 0.82 1.65 -0.68 0.82 -0.09 -0.01 -0.26 1.24 -1.42 0.41 0.90 1.15 -0.51 -1.42 0.07 -0.51 -0.38 -0.30 0.49 1.99 0.03 -0.30 0.20 0.66 -1.01 0.86 0.61 -1.01 1.20 0.24 -1.05 0.45 1.40 1.49 0.95 -1.42 -1.30 0.57
Science 1.82 -0.04 0.08 1.40 -0.10 -0.75 0.56 0.68 -0.81 -0.27 -1.35 -1.59 -1.17 0.56 1.04 -0.39 0.50 1.04 1.16 -1.11 -0.16 1.52 1.70 1.16 -1.47 0.20 0.74 0.44 -0.99 -0.45 -0.33 -0.22 -0.51 -0.45 2.06 -0.22 0.32 -1.53 1.34 1.10
Social 0.25 0.78 -0.17 0.99 -0.38 0.15 -1.74 1.41 -1.74 -0.80 0.68 0.99 -1.43 1.20 1.31 0.36 -1.74 0.57 -0.38 -0.27 -0.90 -1.43 0.68 0.46 -0.80 -0.38 -1.32 -0.17 1.73 1.10 1.20 -0.80 0.15 0.57 1.73 -0.59 0.25 -0.38 1.10 0.89
Math 0.43 -0.16 0.98 0.05 -0.19 2.16 -0.67 -0.26 -0.50 -0.05 0.40 -1.30 -0.09 0.40 1.54 -1.30 -0.67 -0.23 0.71 1.85 0.26 -0.33 0.09 -0.12 0.67 -0.95 2.16 0.29 0.19 -0.40 -0.26 0.85 2.16 -0.29 2.16 -1.09 1.74 -0.88 -0.98 1.19
The plot for member ID5,10,26 are shown in Figure 2. The projection of contribution and improvement requested are shown in Table 6. The blue hatched subject is their contribution subjects and the red hatched subject is their improvement request subjects. The evaluated results using the normalized scores and the ones using the proposed procedure are different in general.
Score Evaluation Social
125
Japanese Science English
Math
Normalized score
2 1
ID5 ID10 ID26
/2
0 -1 /2
-2 -2 -1 0 1 2 Extended normalized standard deviaiton Figure 2. Dependence of normalized score on extended normalized standard deviation.
Table 6. Evaluated parameters for member ID5, 10, and 26 ID5 Subject Score Normalized score Contribution Improvement requested
Japanese 85 0.77 0.90 -0.62
English 82 0.89 1.07 -0.68
Science 49 0.24 0.37 -0.10
Social
ID10 Subject Score Normalized score Contribution Improvement requested
Japanese 78 0.34 0.48 -0.20
English 36 -1.05 -0.84 1.24
Science 52 0.42 0.55 -0.27
Social
ID26 Subject Score Normaloized score Contribution Improvement requested
Japanese 76 0.22 0.36 -0.08
English 50 -0.46 -0.26 0.66
Science 44 -0.07 0.07 0.20
Social
87 0.46 0.53 -0.38
91 0.89 0.95 -0.80
87 0.46 0.53 -0.38
Math 68 0.44 0.67 -0.19
Math 64 0.30 0.53 -0.05
Math 90 1.21 1.43 -0.95
Kunihiro Suzuki
126
SUMMARY Here is summarized the results in this chapter. The average and variance of each subject are given by N
x
p
i 1
ip
N
x N
p2
i 1
ip
p
2
N
where N is the number of students.
x p The score of member i for subject is denoted as ip , and is normalized as zip
xip p
p
We evaluate the average and standard deviation of standard deviation as
1 5 p 5 p 1
2 1 5 p 5 p 1
The related angle is evaluated a tan
tan 1 We then normalize the standard deviation for each subject as
Score Evaluation
z p
p
We introduce a normalized variable as 1 z p
z p
1
2
The unit vector contribution is given by
eG cos ,sin 2 2 2 2 The contribution is given by
ai eG zGi z p cos zip sin 2 2 2 2 The unit vector for improvement requested is given by
e B cos ,sin 2 2 2 2 The improvement requested is given by
ai eB zBi z p cos zip sin 2 2 2 2
127
Chapter 5
AHP (ANALYTIC HIERARCHY PROCESS) ABSTRACT Analytic hierarchy process (AHP) enables us to decide which subject we should select based on the various item evaluations. The evaluations are done qualitatively, but we convert them to the numerical ones, and decide the target as if we do it based on the quantitative data. AHP is used in various cases where we cannot have quantitative data.
Keywords: pair comparison method, geometric average, eigenvalue, eigenvector
1. INTRODUCTION When we buy a product, there are various kinds of ones in general. We care about various items to decide which product we select. The items are such as price, style, color, function, etc. It is a rare case that one kind of product is superior in all items. The simple decision can be done if we score each items and sum it up. We can easily decide which one we should select. In this decision, we treat each item identically. However, some item is more important than others. Therefore, we need to weight the items. The weight expresses the importance of the item for the person who selects the product. We cannot decide the importance clearly if there are many items to be considered. Analytic hierarchy process (AHP) was developed by Thomas L. Saaty to overcome the problem, where pair comparison is used. AHP treats quite ambiguous data, but gives us clear numerical results.
Kunihiro Suzuki
130
2. AHP PROCESS We consider a case of selecting one sport club among three ones: club A, club B, and club C. The items which we consider to select the club are supposed to be below four.
Price Facilities Transportation Staff
The corresponding data structure is shown in Figure 1. Each clubs score is shown in Table 1, and corresponding radar chart is shown in Figure 2. The data are given by general evaluation or the personal evaluation. Club A is superior in the price (low cost), and club C is superior in facility. We can evaluate the clubs by summing up the score, which are shown in Table 1. In the standpoint of sum score view, we should select club C. In the above evaluation, we implicitly assume that each item is identical. However, the importance of the items depends on a person, where price is the most important for someone, and facility is the most important for someone. Therefore, we need to include the importance of the items in deciding the club.
Figure 1. Data structure for AHP analysis.
AHP (Analytic Hierarchy Process)
131
Table 1. Scores for club A, B, and C Club A B C
Price 8 3 2
Facility Transportation Staff 2 4 5 2 4 4 8 5 6
Sum 19 13 21
Figure 2. Radar chart for club selection data.
3. PAIR COMPARISON METHOD 3.1. Pair Comparison Table In the pair comparison method, we select two items and compare them relatively. It is rather hard for a person to evaluate all items simultaneously, but the comparison is rather easy if we focus on only two subjects. We usually get the answers categorically, and convert them to values on the back yard. The conversion example is shown in Table 2. The example for raw data is shown in Table 3. The data is converted to numeric one based on Table 2, and finally we obtain Table 4. We evaluate the importance of the items from this table. The conversion of categorical data to the numeric such as better 3 is rather ambiguous. We can only think the categorical expression has some order and we assign a number based on the order. The important point is that the identical level for positive and negative follows the rule that the product is 1. For example, if we assign better 3 , the corresponding negative expression
worse
must be
1
3
. This rule is supposed to express
Kunihiro Suzuki
132
the human impression. I think that the conversion is rather ambiguous and not established one. Table 2. Score conversion Evaluation absolutely worse much worse worse little worse plane little good better much better absolutely better
Score 1/9 1/7 1/5 1/3 1 3 5 7 9
Table 3. Raw data for pair comparison method Absolutely worse
1
Much worse
3 ○
Worse
5
Little worse
7
Plane
Better
9
Little better
Much better
Score Price Price Price Facility Facility Transportation
Absolutely better
Left item
Right item
1/3 1/5 1/7 1/9 Facility Transportation Staff Transportation Staff Staff
○ ○ ○ ○ ○
Table 4. Converted table. The below is the one which is fulfilled base on the data above Price
Facility
Price Facility Transportation Staff
3
Price Price Facility Transportation Staff
Facility 1 1/3 1/5 1/7
3 1 1 1/5
Transportation 5 1
Staff
Transportation 5 1 1 1/3
Staff
7 5 3
7 5 3 1
AHP (Analytic Hierarchy Process)
133
3.2. Weight Evaluation Based on Geometric Average The item of Price has scores of 1,3,5,7 as shown in Table 4. The corresponding geometric average (see Chapter 3 of volume 1) is given by 1
Geometric average Price 1 3 5 7 4 3.20
(1)
The other items are also similarly evaluated as 1
1 4 Geometric average Facility 11 5 1.14 3
(2)
1
1 4 Geometric average Transportation 11 3 0.88 5
(3)
1
1 1 1 4 Geometric average Staff 1 0.31 7 5 3
(4)
Table 5. Weight based on geometric average Item Price Facility Transportation Staff Sum
Average 3.20 1.14 0.88 0.31 5.53
Weight 0.58 0.21 0.16 0.06
Table 6. Evaluation of each club using weight
Weight Club A Club B Club C
Price 0.58 8 3 2
Facility Transportation 0.21 0.16 2 2 8
4 4 5
Staff 0.06 5 4 6
Sum
Weighted sum 19 13 21
5.96 3.01 3.94
The sum of them is 5.53. Therefore, we can evaluate the weight of each item as the average divided by the sum, which is shown in Table 5. We can evaluate each club by
Kunihiro Suzuki
134
weighted sum as shown in Table 6. We select the club C by the simple sum, but we select the club A by the weighted evaluation.
3.3. Eigenvector Method We perform a matrix operation in this section, and the basic matrix operation is described in Chapter 15. We consider
w1 , w2 , , wn aij
n
items denoted as
The pair comparison of
I1 , I 2 , , I n Ij
to
Ii
. The ideal weight is denoted as
is then denoted as
aij
wi wj
(5)
Therefore, the ideal data for the matrix
w1 w 1 w2 A w1 wn w1
and is given by
w1 w2 w2 w2 wn w2
A is given by
w1 wn w2 wn wn wn
(6)
The corresponding geometric average for i-th item i is given by 1
w w w n i i i i wn w1 w2 wi w1w2 wn Therefore, the ratio is
(7)
AHP (Analytic Hierarchy Process)
w1 : w1 :
135
: wn
(8)
We consider the data from the different standpoint of view. Operating the weight vector from the right side, we obtain
w1 w 1 w2 w 1 wn w1
w1 wn w1 w1 w2 w w wn 2 n 2 wn wn wn wn
w1 w2 w2 w2 wn w2
(9)
Therefore, the weight is an eigenvector, and n is the eigenvalue in the ideal case. Consequently, we have the same result in geometric average method and the eigenvector method if the data is ideal. The real matrix is different from the ideal one. However, we perform the two methods even in the case. Let us consider data which are far from the ideal one.
We denote the first eigenvalue and eigenvector of matrix A as max and respectively. We start with the data shown in Table 4. The corresponding matrix is given by
1 1 3 A1 5 1 7
3
5
1
1
1
1
1 5
1 3
7 5 3 1
v
,
(10)
We evaluate the first eigenvalue and eigenvector, which is given by
Av 4.24v
(11)
Kunihiro Suzuki
136 0.57 0.21 v 0.16 0.06
(12)
where 4.24 is an eigenvalue, and to the weight shown in Table 6.
v
is an eigenvector. Note that the eigenvector is close
4. CONSISTENCY CHECK OF PAIR COMPARISON The data based on the pair comparison may suffer inconsistent problem. For example, when we compare A to the other. If A is inferior to B, and is superior to C, B should be inferior to C. This should be determined before performing B-C pair comparison. However, we perform B-C comparison without caring the comparison associated with A. Therefore, we may have inconsistent data sometimes. We should check this inconsistency of the data. Let us consider the table we treated, and show it again as Table 7. We only need the first row hatched. The other data can be generated base on the data as shown in the numbers in brackets. However, we obtain the corresponding the data independently. Therefore, there is some inconsistency. Table 7. Inconsistency of the data Price Price Facility Transportation Staff
We evaluate the value of
Facility
Transportation 3 5 1 (5/3)
Staff 7 5 (7/3) 3 (7/5)
max .
max n
(13)
Focusing on i-th row, we obtain n
a v j 1
ij
j
max vi (14)
AHP (Analytic Hierarchy Process)
137
Modifying this, we obtain n
vj
j 1
vi
max aij
(15)
We then obtain
max 1
1 y ij y j i 1 ij n
(16)
where
yij
1 2 yij
(17)
is valid in general. Therefore, we obtain
max 1
n
2
j i 1
1 2n i
(18)
i has a values from 1 and n , that is, n
n
I 1
i 1
max 1 2 n i
(19)
Therefore, we obtain
1 nmax 1 2n n 2 n n 1 2 2 n This leads to
(20)
Kunihiro Suzuki
138
max n
(21)
Equality holds only when
yij 1
(22)
That is, we obtain
aij
vi vj
(23)
The consistency can be evaluated as the deviation of
max
to n . We usually use the
factor as the deviation divided by n 1 . This factor is denoted as C.I . (consistency index), and it is given by
C.I .
max n n 1
(24)
In the above example, we obtain
max n n 1 4.24 4 4 1 0.08
C .I .
(25)
Roughly speaking, the critical value of C.I . is supposed to be in between 1 and 1.5. If the evaluated C.I . is less than the critical value, we judge that it is OK. In this case, we judge that the data is consistent. The geometric average method and the eigenvector method can be both used in the standpoint of obtaining the weight function. However, we can evaluate consistency of the data with the eigenvector method. Therefore, the eigenvector method is rather preferably used. We assume that the Table 1 is a given one. However, we can also make the table based on the AHP process above.
AHP (Analytic Hierarchy Process)
139
SUMMARY To summarize the results in this chapter– We obtain scores for subjects in many items. The sum of the scores for various items corresponds to the evaluation of the subjects. We add weight to the score. We select two items and compare them qualitatively and convert it to numerical data. We obtain the weight by performing the geometric average method or the eigenvector method. Using the weighted score sum, we can select the subject. We can evaluate the consistency of the data using eigenvalue.
Chapter 6
QUANTIFICATION THEORY I ABSTRACT We predicted an objective variable values for given multi parameters with their error range in the multiple regression. We discuss the same subject when we have categorical data or mixture of numerical and categorical data. One categorical data are converted to the level number -1 numerical data, and then the same procedure as the multi regression is performed.
Keywords: regression, multiple regression, objective variable, explanatory variable, categorical data
1. INTRODUCTION We frequently face to the case for multiple evaluations where the data are not numerical data. For example, we evaluate some subject with levels of yes or no, male or female, done or not done, and so on. These kinds of data are called as categorical ones. We want to predict the objective variable value including these categorical evaluations. Quantification theory I corresponds to the multiple regression with these categorical data. We perform matrix operation in this chapter, and the basics of the matrix operation are described in Chapter 15.
2. ONE VARIABLE ANALYSIS We assume that the objective variable is numerical data and the explanation variable is category data, and assume one categorical variable x1 .
Kunihiro Suzuki
142
The objective variable is numerical data from 0 to 100. The level of explanation variables are expressed with categorical data as shown in Table 1. Table 1. Relationship between group discussion evaluation and score
ID 1 2 3 4 5 6 7 8 9 10
Group discussion x1 Excellent Excellent Excellent Excellent Allowed Allowed Allowed Wrong Wrong Wrong
Score 96 88 77 89 80 71 77 78 70 62
The levels for the group discussion are categorical levels of excellent, allowed, or wrong, and we convert these data as
1 x11 0
for excellent for non excellent
(1)
1 x1 2 0
for allowed for non allowed
(2)
1 x13 0
for wrong for non wrong
(3)
We then obtain the modified data shown in Table 2. The score yi and the modified numerical data is related to
yi 0 11 xi11 1 2 xi1 2 13 xi13 i
(4)
There is one constraint:
xi11 xi1 2 xi1 3 1
(5)
Quantification Theory I
143
Therefore, we can eliminate one variable, and we eliminate
x11
here. We then have
yi 0 1 2 xi1 2 13 xi13 i x1 2 x1 3 0
corresponds to
(6)
x11 1
. The final data is shown in Table 3.
Table 2. Relationship between numerical group discussion evaluation and score ID 1 2 3 4 5 6 7 8 9 10
Group discussion x1(1) x1(2) x1(3) x1 Excellent 1 0 0 Excellent 1 0 0 Excellent 1 0 0 Excellent 1 0 0 Allowed 0 1 0 Allowed 0 1 0 Allowed 0 1 0 Wrong 0 0 1 Wrong 0 0 1 Wrong 0 0 1
Score 96 88 77 89 80 71 77 78 70 62
Table 3. Relationship between final numerical group discussion evaluation and score
ID
x1(2) 1 2 3 4 5 6 7 8 9 10
The predicted score value yˆi for
x1(3) 0 0 0 0 1 1 1 0 0 0
yi
Score 0 0 0 0 0 0 0 1 1 1
96 88 77 89 80 71 77 78 70 62
is given by
yˆi ˆ0 ˆ1 2 xi1 2 ˆ13 xi13
The variance associated with the deviation is given by
(7)
Kunihiro Suzuki
144
1 n 2 yi yˆi n i 1 2 1 n yi ˆ0 ˆ1 2 xi1 2 ˆ1 3 xi13 n i 1
Se 2
(8)
2 S
ˆ
ˆ
ˆ
We impose that e is minimum, and obtain , , as the followings. After obtaining the Table 3, the process is exactly the same as the multiple regression. We repeat here again. Partial differentiating
0
2 Se
11
1 2
ˆ of Eq. (8) with respect to 0 , we obtain
Se 2 n yi ˆ0 ˆ1 2 xi1 2 ˆ13 xi13 0 n i 1 ˆ0
2
(9)
We then have
ˆ0 n y ˆ1 2 x1 2 ˆ13 x13
(10)
Substituting Eq. (10) into Eq. (8), we obtain
2
2 1 n yi ˆ0 ˆ1 2 xi1 2 ˆ1 3 xi13 n i 1 2 1 n yi y ˆ1 2 xi1 2 x1 2 ˆ1 3 xi1 3 x1 3 n i 1
Se
Partial differentiating Se ˆ
2
1 2
2 Se
of Eq. (8) with respect to
(11)
ˆ1 2
, we obtain
2 n yi y ˆ12 xi12 x1 2 ˆ13 xi13 x13 xi12 x12 0 n i 1
(12)
We then have
ˆ1 2 S1221 2 ˆ13 S1231 2 S y 122
Partial differentiating
2 Se
of Eq. (8) with respect to
(13)
ˆ1 3
, we obtain
Quantification Theory I Se ˆ
2
1 2
145
2 n yi y ˆ12 xi12 x12 ˆ13 xi13 x13 xi13 x13 0 n i 1
(14)
We then have
ˆ1 2 S12213 ˆ13 S12313 S y 123
(15)
where
x x
(17)
(18)
S1 21 2
1 n x x12 n i 1 i1 2
S1 213
1 n x x n i 1 i1 2 1 2
S1313
1 n x x13 n i 1 i13
2
2
2
2
(16)
i1 3
2
(19)
(20)
S y 12
1 n yi y xi12 x12 n i 1
S y 13
1 n yi y xi13 x13 n i 1
2
2
13
We can express the result with a matrics form given by S1221 2 S12213 ˆ1 2 S y 212 2 S 2 ˆ S 2 S 1 213 1313 13 y13
(21)
We have ˆ1 2 S1221 2 ˆ S 2 1 3 1 213
S1 213 2 S1313 2
1
S y 122 11.5 S 2 17.5 y 1 3
(22)
Kunihiro Suzuki
146
We can obtain
ˆ1 2 , ˆ13
and
ˆ1 2 , ˆ13
from this. Substituting these to Eq. (10), we
ˆ 87.5
obtain 0 . Finally, we obtain
yˆ ˆ0 ˆ1 2 xi1 2 ˆ1 3 xi1 3
(23)
87.5 11.5 x1 2 17.5 x1 3
This process is the same as the multiple regression. However, only one variable among
x11 , x1 2 , x13
has a value of 1. Therefore, the final equation should be expressed by
0 yˆ 87.5 11.5 17.5
for excellent for allowed for wrong
(24)
3. ANALYSIS WITH MANY VARIABLES We extend the analysis for many variables. We add an item whether a member belongs to a circle club, where the levels are two and are yes or no, which is shown in Table 4. Table 4. Explanation categorical data are group discussion and circle club. The two data are assumed to influence the score of each member
ID 1 2 3 4 5 6 7 8 9 10
Group Circle club discussion x2 x1 Excellent Yes Excellent Yes Excellent No Excellent No Allowed Yes Allowed No Allowed No Wrong Yes Wrong Yes Wrong No
Score 96 88 77 89 80 71 77 78 70 62
Quantification Theory I
147
In this case, we relate each categorical data for group discussion to numerical data as below.
1 x21 0 1 x2 2 0
for yes for no
(25)
for no for yes
(26)
which is shown in Table 5. The score may be expressed by
yi 0 11 xi11 1 2 xi1 2 13 xi13 21 xi 21 2 2 xi 2 2+ i
(27)
However, there are constraints below.
xi11 xi1 2 xi13 1
(28)
xi 21 xi 2 2 1
(29)
Therefore, we can neglect one variable for each categorical data, and neglect
x2 2
x11
and
here, which is shown in Table 6. We then have
yi 0 1 2 xi1 2 13 xi13 21 xi 21 i
(30)
The predicted value of yi is denoted as yˆi , and is given by
yˆi ˆ0 ˆ1 2 xi1 2 ˆ13 xi13 ˆ21 xi 21
(31)
Kunihiro Suzuki
148
Table 5. Relationship between numerical group discussion evaluation and circle data. The two categorical data are converted to numerical data and assumed to influence the score
ID
Group discussion x1
1 2 3 4 5 6 7 8 9 10
x1(1)
Excellent Excellent Excellent Excellent Allowed Allowed Allowed Wrong Wrong Wrong
x1(2) 1 1 1 1 0 0 0 0 0 0
Circle club x2
x1(3) 0 0 0 0 1 1 1 0 0 0
0 0 0 0 0 0 0 1 1 1
x2(1)
Yes Yes No No Yes No No Yes Yes No
x2(2) 1 1 0 0 1 0 0 1 1 0
Score 0 0 1 1 0 1 1 0 0 1
96 88 77 89 80 71 77 78 70 62
Table 6. Relationship between final numerical group discussion and circle club evaluation and score
ID 1 2 3 4 5 6 7 8 9 10
x1(2) 0 0 0 0 1 1 1 0 0 0
x1(3) 0 0 0 0 0 0 0 1 1 1
x2(2) Score 1 1 0 0 1 0 0 1 1 0
96 88 77 89 80 71 77 78 70 62
The related variance of error can be evaluated as 1 n 2 yi yˆi n i 1 2 1 n yi ˆ0 ˆ1 2 xi1 2 ˆ1 3 xi13 ˆ21 xi 21 n i 1
Se 2
We impose that the can be evaluated as
(32)
2 Se
has the minimum value and decide
ˆ0 , ˆ11 , ˆ1 2 , ˆ21
, which
Quantification Theory I ˆ1 2 S1221 2 ˆ S 2 13 1 213 ˆ S 2 21 1 2 21
S1 21 3 2
S1313 2
S1 2 21 2
S1 2 21 2 S13 21 2 S2 1 21 2
1
S y 212 10.0 S 2 19.0 y13 S 2 9.0 y 21
149
(33)
where ˆ0 can be evaluated as
ˆ0 n y ˆ1 2 x1 2 ˆ13 x13 ˆ21 x21 83.0
(34)
Therefore, the corresponding regression equation is given by
yˆ ˆ0 ˆ1 2 x1 2 ˆ13 x13 ˆ21 x21 83.0 10.0 x1 2 19.0 x1 3 9.0 x21 0 83.0 10.0 19.0
for excellent 0 for allowed 9.0 for wrong
(35)
for no for yes
4. MIXTURE OF NUMERICAL AND CATEGORICAL DATA FOR EXPLANATION VARIABLES We can then treat both categorical and numerical data simultaneously. We add a numerical data of time to go to school for the explanation variable as shown in Table 7. We denote the data of time to go to school as x3 , and obtain the final form as shown in Table 8.
yi 0 1 2 xi1 2 13 xi13 21 xi 21 3 xi 3 i
The prediction of
yi
is denoted as
(36)
yˆi
yˆi ˆ0 ˆ1 2 xi1 2 ˆ13 xi13 ˆ21 xi 21 ˆ3 xi 3
(37)
Kunihiro Suzuki
150
Table 7. Explanation categorical data are group discussion and circle club and numerical data are time to go to school. The three data are assumed to influence the score of each member Group discussion x1 Excellent Excellent Excellent Excellent Allowed Allowed Allowed Wrong Wrong Wrong
ID 1 2 3 4 5 6 7 8 9 10
Time Circle club to go school x2 x3 Yes 15 Yes 85 No 78 No 15 Yes 57 No 29 No 64 Yes 22 Yes 57 No 50
Score 96 88 77 89 80 71 77 78 70 62
Table 8. Relationship between final numerical group discussion, circle club, and time to school evaluation and score
ID 1 2 3 4 5 6 7 8 9 10
x1(2)
x1(3) 0 0 0 0 1 1 1 0 0 0
x2(1) 0 0 0 0 0 0 0 1 1 1
x3 1 1 0 0 1 0 0 1 1 0
Score 15 85 78 15 57 29 64 22 57 50
96 88 77 89 80 71 77 78 70 62
The related variance of error can be evaluated as 1 n 2 yi yˆi n i 1 2 1 n yi ˆ0 ˆ1 2 xi1 2 ˆ1 3 xi13 ˆ21 xi 21 ˆ3 xi 3 n i 1
Se 2
(38)
Quantification Theory I 2 Se
We impose that the which can be evaluated as ˆ1 2 S1221 2 ˆ S1 22 1 3 1 3 ˆ S 2 21 1 2 21 ˆ S 2 3 1 2 3
has the minimum value and decide
S1 213
S1 2 21
S1313
S13 21
S1 3 21
S2 1 21
S133
S2 13
2
2
2
2
151
2
2
2
2
ˆ0 , ˆ1 2 , ˆ13 , ˆ21 , ˆ3
S1 23 S y 212 9.75 2 S133 S y 213 19.7 2 S2 13 S y 221 9.2 0.126 2 S33 S y 23
,
2
(39)
We then obtain ˆ0 as
(40)
89.0 9.75 x1 2 19.7 x1 3 9.2 x21 0.126 x3
(41)
ˆ0 n y ˆ1 2 x1 2 ˆ13 x13 ˆ21 x21 ˆ3 x3 89.0
Therefore, we obtain regression line as yˆ ˆ0 ˆ1 2 x1 2 ˆ1 3 x1 3 ˆ21 x21 ˆ3 x3 0 89.0 9.75 19.7
for excellent 0 for allowed 9.2 for wrong
for no 0.126 x3 for yes
We can evaluate contribution ratio and the selection of variables exactly the same process for multiple regression.
SUMMARY To summarize: We assume that the objective variable is numerical data and the explanation variable is categorical data. When we have m variables, and each variable have nk express the regression as
k 1, 2,
, m
levels, we
Kunihiro Suzuki
152
yˆ ˆ0 ˆ1 2 x1 2 ˆ13 x13
ˆ1 n1 x1 n1
ˆ2 2 x2 2 ˆ23 x23 ˆm 2 xm 2 ˆm3 xm3
We neglect
xk 1
ˆ2 n2 x2 n2 ˆm nm x2 nm
since we impose the restriction of
xk 1 xk 2 xk 3
xk nk 1
Only one term is 1 and the others are 0. The factors are given by ˆ1 2 S1221 2 ˆ S 2 1 3 1 213 2 ˆ 1 n1 S1 21 n1 ˆ2 2 S122 2 2 ˆ23 S122 23 ˆ 2 2 n2 S1 2 2 n2 ˆ S 2 m 2 1 2 m 2 ˆ 2 m3 S1 2 m3 ˆ m n S1 22 m n m m
S1 213 2
S1 21 n1 2
S1 2 2 2 2
S1 2 23 2
S1 22 n2 2
S1 2 m 2 2
S1 2 m3 2
2 S1 2 m nm 2 Sm nm m nm
1
S y 212 S 2 y1 3 2 S y1 n1 2 S y 2 2 S y 223 2 S y 2 n2 S 2 ym 2 2 S ym3 2 S ym n m
We then obtain ˆ0 as ˆ1 2 x1 2 ˆ1 3 x13 ˆ1 n x1 n 1 1 ˆ x ˆ x ˆ x 2 3 2 3 2 n2 2 n2 ˆ0 n y 2 2 2 2 ˆm 2 x2 2 ˆm3 x23 ˆm n x2 n m m
We can extend this process to the data form of mixture of categorical and numerical ones.
Chapter 7
QUANTIFICATION THEORY II ABSTRACT The discriminant analysis gives us a procedure to decide to which group a person or subject belongs. Quantification theory II gives the same results with categorical data or mixture of categorical and numerical data.
Keywords: quantification theory II, determinant analysis, Maharanobis’ distance, categorical data
1. INTRODUCTION When a person goes to a hospital, he is asked various items: having headache or not, having nausea or not, having fever or not, and smoking or no-smoking. Therefore, a doctor obtains various data and should do decide whether he is in disease or not, or what kind of disease he has. The data are not always numerical ones, but the mixture of categorical and numerical ones. We study how the categorical data are converted to the numerical data. After that, we can perform discriminant analysis for numerical data, which is called as quantification theory II. Therefore, the new thing exists in the data conversion. We perform matrix operation and the basic matrix operations are described in Chapter 15.
Kunihiro Suzuki
154
2. DISCRIMINANT ANALYSIS WITH ONE CATEGORICAL DATA We consider both healthy and disease members. We treat a categorical data of frequency of nausea, which may influence healthy or disease. The level number for the frequency of nausea is three, and the levels are no, little, and much. The corresponding data is shown in Table 1. Table 1. The relationship between condition and nausea. The levels for nausea are three
No.
Condition 1 2 3 4 5 6 7 8 9 10
Healthy Healthy Healthy Healthy Healthy Disease Disease Disease Disease Disease
Nausea x1 No Little No No No Little Much Little Little Much
Table 2. The relationship between condition and nausea. The levels for nausea are converted to numerical data
No. 1 2 3 4 5 6 7 8 9 10
Condition Healthy Healthy Healthy Healthy Healthy Disease Disease Disease Disease Disease
x1(2)
x1(3) 0 1 0 0 0 1 0 1 1 0
0 0 0 0 0 0 1 0 0 1
Quantification Theory II
155
We assign numerical data to the categorical nausea data as below.
1 x11 0
for no for non no
(1)
1 x1 2 0
for little for non little
(2)
1 x1 3 0
for much for non much
(3)
Since we impose the restriction of
x11 x1 2 x13 1
(4)
We can then neglect the Eq. (1). We evaluate the fundamental values for healthy and disease members below.
Healthy Member Data The number of data nA is given by nA 5
(5)
The averages of levels 2 and 3 are denoted as A1 2 and A1 3 , and are given by
A1 2
A13
x
iA1 2
nA
x
iA1 3
nA
0 1 0 0 0 1 5 5
(6)
00000 0 5
(7)
The variance of level 2 and 3 are denoted as A 21 2 A1 2 and A 213 A13 , and are given by
Kunihiro Suzuki
156
A 21 2 A1 2
x
A 21 3 A1 3
x
A1 2
iA1 2
2
nA 1 iA1 3
A1 3
0 0.2
nA 1
2
1 0.2 0 0.2 0 0.2 0 0.2 2
2
2
2
(8)
4
2
0 0
2
0 0 0 0 0 0 0 0 2
2
2
2
(9)
4
The covariance between level 2 and 3 is denoted as A 21 23 and is given by A 21 2 A13
x
iA1 2
A1 2
x
iA1 3
A13
nA 1
0 0.2 0 0 1 0.2 0 0 0 0.2 0 0 0 0.2 0 0 0 0.2 0 0 4
(10)
Disease Member Data The number of data nB is given by nB 5
(11)
The averages of levels 2 and 3 are denoted as B1 2 and B1 3 , and are given by
B1 2
B13
x
Bi1 2
nB
x
Bi1 3
nB
1 0 11 0 3 5 5
(12)
0 1 0 0 1 2 5 5
(13)
The variance of level 2 and 3 are denoted as B 21 2 2 and B 2133 , and are given by
2 B1 2 B1 2
x
iB1 2
B1 2
nB 1
2
1 0.6
2
0 0.6 1 0.6 1 0.6 0 0.6 2
2
2
2
4
(14)
Quantification Theory II
B 213 B13
x
iB1 3
B13
2
nB 1
0 0.4
2
157
1 0.4 0 0.4 0 0.4 1 0.4 2
2
2
2
4
(15) The covariance between level 2 and 3 is denoted as B 21 2 B13 and is given by B 21 2 B13
x
iB1 2
B1 2
x
iB1 3
B13
(16)
nB 1
1 0.6 0 0.4 0 0.6 1 0.4 1 0.6 0 0.4 1 0.6 0 0.4 0 0.6 1 0.4 4
The parameters for the total group of healthy and disease are evaluated as
1 2
13
nA A1 2 nB B1 2 nA nB nA A13 nB B13 nA nB
0.2 0.6 0.4 2
0 0.4 0.2 2
(17)
(18)
nA 1 A 22 2 nB 1 B 22 2 nA 1 nB 1
0.25
(19)
12313
nA 1 A 233 nB 1 B 233 nA 1 nB 1
0.15
(20)
12213
nA 1 A 21 23 nB 1 B 21 23 nA 1 nB 1
0.15
(21)
1221 2
We can evaluate the covariance matrix as 1221 2 2 1 21 3
12213 0.250 12313
0.150
The inverse matrix is evaluated as
0.150 0.150
(22)
Kunihiro Suzuki
158
10.00 10.00 1 10.00 16.67
(23)
The corresponding judge function
z
is given by
x1 2 1 2 z A1 2 B1 2 , A1 3 B13 1 x1 3 1 2 x 10.00 10.00 1 2 0.40 0.20 0.60,0 0.40 10.00 16.67 x13 0.20 5.33 8.00 x1 2 10.67 x1 3 0 5.33 8.0 10.67
(24)
for no for little for much
We can judge as below: z 0 Group A : Healthy
(25)
z 0 Group B : Disease
(26)
We can judge the accuracy of the evaluation by comparing the predicted result and the data, which is shown in Table 3. The accuracy is 90% in this case. Table 3. The comparison of predicted result with the data No. 1 2 3 4 5 6 7 8 9 10
Condition Healthy Healthy Healthy Healthy Healthy Disease Disease Disease Disease Disease
x1(2)
x1(3) 0 1 0 0 0 1 0 1 1 0
0 0 0 0 0 0 1 0 0 1
Score 5.33 -2.67 5.33 5.33 5.33 -2.67 -5.34 -2.67 -2.67 -5.34
Result Healthy Disease Healthy Healthy Healthy Disease Disease Disease Disease Disease
Quantification Theory II
159
3. DISCRIMINANT ANALYSIS WITH TWO CATEGORICAL DATA We consider the relationship between condition and two explanatory variables adding headache. The level number for the headache is also three, which is shown in Table 4. Table 4. The relationship between condition and nausea and headache. The levels for nausea and headache are three
No.
Condition 1 2 3 4 5 6 7 8 9 10
Healthy Healthy Healthy Healthy Healthy Disease Disease Disease Disease Disease
Nausea Headache x1 x2 No Little Little No No No No No No No Little Much Much No Little Little Little Much Much Little
We assign categorical data associated with nausea of much, little, no to the below.
1 x1 2 0
for little for non little
(27)
1 x1 3 0
for much for non much
(28)
Table 5. The relationship between condition and nausea and headache. The levels for nausea and headache are converted to numerical data No. 1 2 3 4 5 6 7 8 9 10
Condition Healthy Healthy Healthy Healthy Healthy Disease Disease Disease Disease Disease
x1(2)
x1(3) 0 1 0 0 0 1 0 1 1 0
x2(2) 0 0 0 0 0 0 1 0 0 1
x2(3) 1 0 0 0 0 0 0 1 0 1
0 0 0 0 0 1 0 0 1 0
Kunihiro Suzuki
160
We assign categorical data associated with headache of much, little, or no to the below.
1 for little x2 2 0 for non little
(29)
1 for much x23 0 for non much
(30)
Based on the table, we can evaluate the parameters below. The average associated with healthy members are given by
A1 2
A13
A 2 2
A23
x
iA1 2
nA
x
iA1 3
nA
x
iA 2 2
nA
x
iA 2 3
nA
(31)
(32)
(33)
(34)
The average associated with disease members are given by
B1 2
B13
B 2 2
B 23
x
iB1 2
nB
x
iB1 3
nB
x
iB 2 2
nB
x
iB 2 3
nB
(35)
(36)
(37)
(38)
Quantification Theory II
161
The total average is given by
1 2
13
nA A1 2 nB B1 2
nA A13 nB B13
(40)
nA nB
A 2 2
2 3
(39)
nA nB
n A A 2 2 nB B 2 2
(41)
n A nB
nA A23 nB B 23
(42)
nA nB
The healthy members’ variance are given by
A 21 2 A1 2
x
A 213 A13
x
iA1 2
A1 2
2
(43)
nA 1
iA1 3
A1 3
2
(44)
nA 1
A 22 2 A2 2
x
A 223 A23
x
iA 2 2
A 2 2
2
nA 1
iA 2 3
A 2 3
(45)
2
nA 1
(46)
The disease members’ variance are given by
B 21 2 B1 2
x
B 213 B13
x
iB1 2
B1 2
2
nB 1
iB1 3
B13
nB 1
(47)
2
(48)
Kunihiro Suzuki
162
B 22 2 B 2 2
x
B 223 B 23
x
iB 2 2
B 2 2
2
(49)
nB 1
iB 2 3
B 2 3
2
(50)
nB 1
The healthy members’ co-variances are given by
A 21 2 A13
A 21 2 A2 2
A 21 2 A23
A 213 A2 2
A 213 A23
A 22 2 A23
x
A1 2
iA1 2
x
iA1 3
A13
(51)
nA 1
x
iA1 2
x
A1 2
A 2 3
iA 2 3
A 2 3
iA 2 2
A 2 2
iA 2 3
nA 1
x
x
A1 2
iA1 2
nA 1
x
A13
iA1 3
x
nA 1
x
iA1 3
A13
x
iA2 3
A 2 3
iA 2 2
A 2 2
x
iA2 3
A 2 3
(53)
(54)
(55)
nA 1
x
(52)
nA 1
(56)
The disease members’ co-variances are given by
B 21 2 B13
B 21 2 B 2 2
x
iB1 2
B1 2
x
iB1 3
B13
(57)
nB 1
x
iB1 2
B1 2
x
nB 1
iB 2 2
B 2 2
(58)
Quantification Theory II
B 21 2 B 23
B 213 B 2 2
B 213 B 23
B 22 2 B 23
x
iB1 2
B1 2
x
iB 2 3
B 2 3
iB 2 2
B 2 2
nB 1
x
iB1 3
B13
x
nB 1
x
iB1 3
B13
x
iB 2 3
B 2 3
iB 2 2
B 2 2
x
iB 2 3
B 23
(59)
(60)
(61)
nB 1
x
163
nB 1
(62)
The total variances are then given by
1221 2
12313
nA 1 A 21 2 A1 2 nB 1 B 21 2 B1 2 nA 1 nB 1
(63)
nA 1 A 213 A13 nB 1 B 213 B13 nA 1 nB 1
(64)
2 2 2 2 2
2 23 23
(65)
nA 1 A 223 A23 nB 1 B 223 B 23 nA 1 nB 1
(66)
nA 1 A 21 2 A13 nB 1 B 21 2 B13 nA 1 nB 1
(67)
nA 1 A 21 2 A2 2 nB 1 B 21 2 B 2 2 nA 1 nB 1
(68)
12213
122 2 2
nA 1 A 22 2 A2 2 nB 1 B 22 2 B 2 2 nA 1 nB 1
Kunihiro Suzuki
164
nA 1 A 21 2 A23 nB 1 B 21 2 B 23 nA 1 nB 1
(69)
nA 1 A 213 A2 2 nB 1 B 213 B 2 2 nA 1 nB 1
(70)
nA 1 A 213 A23 nB 1 B 213 B 23 nA 1 nB 1
(71)
nA 1 A 22 2 A23 nB 1 B 22 2 B 23 nA 1 nB 1
(72)
122 23
123 2 2
123 23
2 2 2 23
We can evaluate the covariance matrix as
1221 2 1221 3 122 2 2 122 23 2 2 2 2 1313 13 2 2 13 23 1 31 2 2 2 2 2 2 213 2 2 2 2 2 2 23 2 21 2 2 2 2 2 231 2 2313 23 2 2 23 2 2
(73)
The corresponding judge function z is given by
z A1 2 B1 2 , A13 B13 , A2 2 B 2 2 , A23
12.80 9.60 x1(2) 20.80 x1(3) 6.40 x2(2) 14.40 x2(3) 0 12.80 9.60 20.80
no 0 no little 6.40 little much 14.40 much
We can judge as below
x1 2 1 2 x1 3 1 2 B 23 1 x2 2 2 2 x23 23 (74)
Quantification Theory II
165
z 0 Group A : Healthy
(75)
z 0 Group B : Disease
(76)
We can extend this analysis to the data of mixture of categorical and numerical data as shown in Table 6. Table 6. The relationship between condition and nausea and headache and two numerical data. The levels for nausea and headache are three
No.
Condition 1 2 3 4 5 6 7 8 9 10
Healthy Healthy Healthy Healthy Healthy Disease Disease Disease Disease Disease
Nausea Headache x1 x2 No Little Little No No No No No No No Little Much Much No Little Little Little Much Much Little
Inspection1 Inspection2 x3 x4 50 15.5 69 18.4 93 26.4 76 22.9 88 18.6 43 16.9 56 21.6 38 12.2 21 16.0 25 10.5
SUMMARY I summarize the results in this chapter. We consider two groups A and B , and want to judge to which group a member belongs. We obtain categorical data denoted as k , where k 1,2, , m , that is we have k kinds of categorical data. Each categorical data has We convert the data as
nk levels.
xk 2 , xk 3 , , xk nk Each data is 1 or 0. Only the one of them is 1 and the others are 0. That is,
xk 1 xk 2 xk 3
xk nk 1
Kunihiro Suzuki
166 We therefore do not consider
xk 1 .
The average is given by
Ak nk
x
iAk nk
nA
x
iBk nk
Bk nk
nB
The total average is given by
k nk
nA Ak nk nB Bk nk nA nB
The variances are given by
2 Ak n Al n k
iAk nk
Ak nk
x
iAl nl
Al nl
nA 1
l
2 Bk n Al n k
x x
iBk nk
Bk nk
x
iBl nl
Bl nl
nA 1
l
The total variances are given by
k 2 n k l nl
2 nA 1 Ak 2 n Al n nB 1 AB n Bl n nA 1 nB 1 k
l
We define the matrixes below.
k
l
Quantification Theory II
μ AB
A1 2 A1 3 A1 n1 A 2 2 A 2 3 A 2 n2 Am 2 Am 3 Am n m
B1 2 B13 B1 n1 B 2 2 B 2 3 B 2 n2 Bm 2 Bm3 Bm nm
1221 2 12213 2 12313 131 2 2 2 1 n1 1 2 1 n1 13 2 2 2 1 2 2 2 2 13 2 2 31 2 2 2 313 2 2 2 n2 1 2 2 n2 13 2 m 2213 m 21 2 2 2 m31 2 m313 m 2n 1 2 m 2n 1 3 m m
x1 2 x13 x 1 n1 x2 2 x2 3 X x 2 n2 xm 2 xm3 xm n m
167
1 2 13 1 n1 2 2 2 3 2 n2 m 2 m 3 m nm
1221 n
1222 2
12223
1222 n
122 m 2
122 m3
1231 n
1232 2
12323
1232 n
123 m2
123 m3
12n 1 n
12n 2 2 1
12n 23 1
12n 2 n
12n m 2 1
12n m3
2 2 2 1 n
2 2 2 22
2 22 2 3
2 22 2 n
2 22 m2
2 22 m3
2 231 n
2 23 2 2
2 23 23
2 23 2 n
2 23 m 2
2 23 m3
2 2 n 1 n
2 2 n 2 2 2
2 2 n 23 2
2 2 n 2 n
2 2 n m 2 2
2 2 n m 3
m 221 n
m 222 2
m 22 23
m 22 2 n
m 22 m 2
m 22 m3
2 m 31 n1
2 m 3 2 2
2 m 3 2 3
2 m 3 2 n2
2 m 3 m 2
m 23m3
1
1
1
1
1
1
2
1
1
m 2n
m
1 n1
m 2n
m
2 2
m 2n
m
2 3
2
2
1
2
2
2
2
2
2
m 2n
m
2 n2
m 2n
m
m 2
1
2
m 2n
m
m 3
122 m n
123 m nm 12n1 m nm 2 22 m nm 2 23 m nm 2 2 n2 m nm m 22 m nm m 23m nm 2 m nm m nm m
Kunihiro Suzuki
168
We then obtain the decision equation as z μTAB 1 X
We can judge as below. z 0 Group A z 0 Group B
Chapter 8
QUANTIFICATION THEORY III (CORRESPONDENCE ANALYSIS) ABSTRACT Quantification theory III evaluates the relationship between two categorical data. The theory assigns the categorical data to numerical values so that the correlation factor between two categorical data has the maximum value. Data values in the quantification theory are 0 or 1. The correspondence theory is extended to the quantification theory III to accommodate any values. The quantification theory III can be regarded as the one special case of the correspondence theory.
Keywords: categorical data, eigenvalue, eigenvector, quantification theory III, correspondence theory
1. INTRODUCTION We sometimes want to know the relationship between two categorical data. For example, the age dependence of favorite artists or dishes or so on. That is, we want to know the favorite singer or dishes associated with the ages. Quantification theory III enables us to obtain the relationship. Since the theory is included in the corresponding theory as its special case, we dominantly study corresponding theory in this chapter. We perform matrix operation and the basics of the matrix operations are described in Chapter 15.
Kunihiro Suzuki
170
2. BASIC CONCEPT OF QUANTIFICATION THEORY III The typical data for quantization theory III are shown in Table 1, where favorite curriculums of members are shown. We want to clarify the relationship between members and curriculums and divide them into some groups. Table 1. Favorite curriculums Member ID Japanese 1 2 3 4 5 6 7 8 9 10
Society
Math
○
Science
Music
○ ○
○ ○
○
Arts and crafts ○ ○
Physical eduation
○ ○ ○
○
○
○
○
○ ○ ○
○
○
○
○
○ ○
○ ○
○
○
○ ○
○
We change the orders of rows and columns so that ○ is ordered on the diagonal line as possible as we can, and obtain the data as shown in Table 2 for an example. The both categories look like a relationship focused on the symbol ○ in Table 2, while we have no image on the relationship in Table 1. The quantization theory III performs the operation from Table 1 to Table 2 by assigning a numerical data to each categorical data. Table 2. A table reordered from the Table 1 Member ID Japanese 2 6 7 1 9 4 10 8 3 6
○ ○ ○
Society
Math
○ ○
○
○ ○
○ ○ ○ ○ ○
Science
Music
Arts and crafts
Physical eduation
○ ○ ○ ○ ○ ○ ○
○ ○ ○ ○
○ ○ ○ ○
○ ○ ○
Quantification Theory III (Correspondence Analysis)
171
3. GENERAL FORM DATA FOR CORRESPONDENCE ANALYSIS In the previous section, the data was ○ or vacant. That is, the data values are 1 or 0 if we regard ○ as 1 and vacant as 0. This should be extended to the one where any number is available. This is called as the corresponding analysis. Therefore, the quantization analysis III is a special case of the one. Table 3. Data example for correspondence analysis
We consider Table 3 for the analysis. We assign category data of mid-20, mid-30, and mid-40 as x1 , x2 , x3 , respectively, and the category data of Chinese, Italian, French, and Japanese as y1 , y2 , y3 , y4 , respectively. The value of x1 , x2 , x3 , y1 , y2 , y3 , y4 are determined later. The data number related to the cell
x , y is nij . i
j
We define the sum of the cell related to x1 as nx1 n11 n12 n13 n14
(1)
4
n1 j j 1
The other sum associated with
x
are given by
nxi ni1 ni 2 ni 3 ni 4
(2)
4
nij j 1
Similarly, the sum of
y
is given by
n yj n1 j n2 j n3 j 3
nij i 1
(3)
Kunihiro Suzuki
172 The total sum N is given by 4
3
j 1
i 1
N nyj nxi
(4)
These sums are shown in Table 3. Table 4. General expression of the data shown in Table 3 y1
y2
y3
y4
Sum
x1
n11
n12
n13
n14
nx1
x2
n21
n22
n23
n24
nx 2
x3
n31
n32
n33
n34
nx3
Sum
ny1
ny2
ny3
ny4
N
Finally, the data form is expressed as shown in Table 4. We decide the values
x1 , x2 , x3 , y1 , y2 , y3 , y4 in the step so that the correlation factor between x and y is maximum.
y We impose that the average of x and are 0, which are expressed by
nx1 x1 nx 2 x2 nn3 x3 N 1 3 nxi xi N i 1 0
x
y
(5)
ny1 y1 ny 2 y2 n y 3 y3
1 N
N 4
n j 1
yj
yj
0
y We further impose that the variances of x and are 1, which are expressed by
(6)
Quantification Theory III (Correspondence Analysis) nx1 x12 nx 2 x22 nn3 x32 N 3 1 nxi xi2 N i 1 1
173
S xx 2
S yy 2
(7)
n y1 y12 n y 2 y22 n y 3 y32 n y 4 y42 N
1 N
4
n j 1
yj
(8)
y 2j
1
In the normalized variable, the covariance and the correlation factor is identical, and they are expressed by
2
r S xy
n11 x1 y1 n12 x1 y2 n13 x1 y3 n14 x1 y4 1 n21 x2 y1 n22 x2 y2 n23 x2 y3 n24 x2 y4 N n31 x3 y1 n32 x3 y2 n33 x3 y3 n34 x3 y4
(9)
Therefore, the subject is to maximize the correlation factor under the condition of variance of 1. The corresponding Legendre function L is given by
L r S xx 1 S yy 1 2
2
(10)
We decide x1 , x2 , x3 , y1 , y2 , y3 , y4 that provide maximum L . It should be noted that the average 0 is not imposed in Eq. (10), which should be checked after we decide value of the variables. We partially differentiate L with respect to x1 , x2 , x3 , and obtain
N
L n11 y1 n12 y2 n13 y3 n14 y4 2 nx1 x1 0 x1
(11)
N
L n21 y1 n22 y2 n23 y3 n24 y4 2 nx 2 x2 0 x2
(12)
N
L n31 y1 n32 y2 n33 y3 n34 y4 2 nx3 x3 0 x3
(13)
Kunihiro Suzuki
174
Next, we partially differentiate L with respect to y1 , y2 , y3 , y4 , and obtain
N
L n11 x1 n21 x2 n31 x3 2 ny1 y1 0 y1
(14)
N
L n12 x1 n22 x2 n32 x3 2 ny 2 y2 0 y2
(15)
N
L n13 x1 n23 x2 n33 x3 2 ny 3 y3 0 y3
(16)
N
L n14 x1 n24 x2 n34 x3 2 ny 4 y4 0 y4
(17)
We first investigate the relationship between and . From (11)~(13), we obtain x1 , x2 , x3 , n11 x1 y1 n12 x1 y2 n13 x1 y3 n14 x1 y4 nx1 x12 nx 2 x22 nx 3 x32 1 n x y n x y n x y n x y 2 21 2 1 22 2 2 23 2 3 24 2 4 N N n x y n x y n x y n x y 31 3 1 32 3 2 33 3 3 34 3 4
(18)
We then obtain
r 2
(19)
Multiplying y1 , y2 , y3 , y4 to Eqs. (14)-(17), we obtain n11 x1 y1 n12 x1 y2 n13 x1 y3 n14 x1 y4 ny1 y12 ny 2 y22 ny 3 y32 ny 4 y42 1 n x y n x y n x y n x y 2 21 2 1 22 2 2 23 2 3 24 2 4 N N n x y n x y n x y n x y 31 3 1 32 3 2 33 3 3 34 3 4
(20)
We then obtain r 2
This leads to
(21)
Quantification Theory III (Correspondence Analysis)
r 2
175 (22)
Equations (11)-(13)can be expressed with a matrix form as n11 n12 n21 n22 n 31 n32
y n14 1 nx1 0 y2 n24 2 0 nx 2 y3 0 n34 0 y 4
n13 n23 n33
0 x1 0 x2 nx3 x3
(23)
Eqs. (14)~(17) can be expressed with a matrix form as n11 n12 n13 n14
n21 n22 n23 n24
n31 ny1 x1 0 n32 x2 2 0 n33 x3 n34 0
0 ny 2 0 0
0 0 ny 3 0
0 y1 0 y2 0 y3 ny 4 y4
(24)
We then define the matrix below. n11 n12 A n21 n22 n n 31 32
n13 n14 n23 n24 n33 n34
(25)
x1 X x2 x 3
(26)
y1 y Y 2 y3 y4
(27)
nx1 Nx 0 0
0 nx 2 0
0 0 nx 3
(28)
Kunihiro Suzuki
176 Ny
n y1
0
0
0
ny 2
0
0
0
ny 3
0
0
0
0 0 0 n y 4
(29)
The Equations (23) and (24) are expressed AY 2 N x2 X
(30)
At X 2 N y2Y
(31)
From (31), we obtain
Y
1 1 N y2 At X 2
(32)
Substituting Eq. (32) into Eq. (30), we obtain
A
1 1 N y2 At X 2 N x2 X 2
(33)
Arranging this equation, we obtain N x1 A N y2 At N x1 N x X 2 N x X 1
2
(34)
We then obtain N x1 A N y2 At N x1U 2 U 1
2
(35)
where U Nx X
(36)
This can be expressed with a general form as
MU U
(37)
Quantification Theory III (Correspondence Analysis)
177
where M N x1 A N y2 At N x1
(38)
2 r 2
(39)
1
2
This is an eigenvalue problem with respect to a matrix U . The eigenvalues and eigenvectors are obtained as 1 ,U 1 ; 2 ,U 2 , 3 ,U 3
(40)
We always have 1 , and the corresponding eigenvector has all the same factors. It should be noted that we do not solve the eigenvalues of X , but U which is 1
converted as Nx X , and is symmetrical one as shown in Appendix 1-11. i After we obtain U , we can get
X N x1U i
(41)
In the above discussion, we do not impose the average restriction explicitly, and impose it with regard to variance. The maximum correlation factor can be obtained if all x and y values are the same. They are not appropriate one. We want to force the elements of this un-appropriate solution as 1. We assumed that the variance is 1 in the derivation process. The elements in X 1 are not 1, but has the same value. It is denoted as b. Since we impose that the square sum of them is 1, and it should hold Nb 2 1
(42)
We hence obtain b
1 N
(43)
Therefore, we obtain Nb 1
(44)
Kunihiro Suzuki
178 We rescale all data as X NX i
i
(45)
This treatment ensures that the elements of X 1 are all 1. Further, we obtain Y as Y i
1
i
N
2 1 y
At X
(46)
i
In this analysis, we need
x of more than three levels and y of more than two levels.
xi
We can evaluate the distance between category eigenvector. The distance is denoted as 1 1 2 2 dij xi yj xi yj 2
d ij
and category
yj
from the
as
2
(47)
In this analysis, we treat the first component and second component identically. However, the first component is more important. Therefore, we use weighted distance given by
d ij
1 1 1 2 2 2 2 2 x y xi y j i j 1 2 1 2
Further, we can evaluate the distance between each
x
d ij
(48)
and each y as
d x ij
1 1 1 2 2 2 2 2 x x xi x j i j 1 2 1 2
(49)
d y ij
2 1 1 2 2 2 1 2 y y y y j j 1 2 i 1 2 i
(50)
The correlation factor for the first and second component are given by
r 1
1
(51)
Quantification Theory III (Correspondence Analysis)
r 2
2
179 (52)
We perform a corresponding analysis using data of Table 3. We can generate a matrix as below. 8 20 15 4 A 17 10 15 7 12 9 13 17
(53)
47 Nx 0 0
0
(54)
0
0 0 51
Ny
37
0
0
0
39
0
0
0
43
0
0
0
49
0 0 0 28
(55)
The matrix for targeting the eigenvector is given by M N x1 A N y2 At N x1 1
0.379 0.313 0.289 0.313 0.354 0.332 0.289 0.332 0.396
(56)
This is a symmetrical matrix as shown in Appendix 1-11. This can be solved with Jacobi method. The corresponding first eigenvalue and eigenvector is given 0.565 1; 147 0.577 0.589
The second eigenvalue and the eigenvector are given by
(57)
Kunihiro Suzuki
180 0.749 0.098; 147 0.060 0.660
(58)
The third eigenvalue and eigenvector are given by 0.346 0.031; 147 0.814 0.466
(59)
The corresponding X vector is expressed by X NxU
(60)
and the corresponding ones are given by
X
X
X
1
2
3
1 1 1
(61)
1.324 0.104 1.121
(62)
0.612 1.410 0.791
(63)
We can then evaluate Y as Y i
N
1
i
2 1 y
At X i
and the corresponding ones are given by
(64)
Quantification Theory III (Correspondence Analysis)
181
Y
1 1 1 1
(65)
Y
0.398 1.257 0.277 1.650
(66)
1.472 0.765 0.224 1.223
(67)
1
Y
2
3
The target for this analysis is to obtain an eigenvector that provide the maximum correlation factor. 1 corresponds to this target. However, this does not hold the implicit restriction of average of 0. This is no meaning root for our purpose, and we neglect this first eigenvalue and eigenvector. We always obtain this eigenvector in the corresponding analysis, and hence always neglect the first eigenvalue and eigenvector. Therefore, we convert the second eigenvalue and eigenvector to the first ones, and the third eigenvalue and eigenvector to the second ones. The means of the first and second eigenvector are
X 1 0.042, X 2 0.012, Y 1 0.083, Y 2 0.096
(68)
These are not exactly 0, but close to 0. We impose that the variance is 1. The corresponding values are below.
X 1 2 1.016, X 2 2 0.987, Y 1 1.113, Y 2 2 1.054
(69)
They are not exactly 1, but close to 1. The first and second eigenvectors and eigenvalues are then given by
X
2
X
1
1.324 0.104 1.121
(70)
Kunihiro Suzuki
182 0.612 1.410 0.791
(71)
Y Y
0.398 1.257 0.277 1.650
(72)
Y Y
1.472 0.765 0.224 1.223
(73)
X
3
X
2
3
2
1
2
1 0.098
(74)
2 0.031
(75)
The weighed first and second components are expressed by 1 X 1 , 1Y 1 , 2 X 2 , 2Y 2
(76)
and the values are given by 1.154, 0.300 mid 30 : x , x 0.090,0.691 mid 40 : x , x 0.977, 0.388 Chinese : y , y 0.347, 0.721 Itarian : y , y 1.096, 0.375 French : y , y 0.242,0.110 Japanese : y , y 439, 0.599
mid 20 : x1 , x1 1
1
2
2
1
1 2
2
2 2
1
1 3
2
1
1 1
2
2
1
1
1 2
2
1
1 3
2
1
2
3
2 2
2
3
1 4
2
2 4
(77)
Quantification Theory III (Correspondence Analysis)
183
We can plot the above in a plane as shown in Figure 1. Mid-40 is close to Japanese dish, mid-30 is close to Chinese dish, and mid-20 is close to Italian dish. French dish is far from any ages. However, it is in the center of plots, and hence we can regard that French dish are favorite for all ages although it is not favorite for a special age.
Figure 1. Two-dimensional plot.
We can evaluate the distance between two categories and each category as shown in Table 5-Table 7.
Table 5. Distance between two categories
Table 6. Distance between ages
Distance(XX) mid-20 mid-30 mid-40
mid-20 0 1.591 2.134
mid-30 1.591 0 1.396
mid-40 2.134 1.396 0
Kunihiro Suzuki
184
Table 7. Distance between dishes
SUMMARY To summarize the results in this chapter‒ We treat two category data A and B , and they have
m
and l levels, respectively.
n
We then obtain the data of ij . The subscript i denotes the level number of category A , and j denotes the level number of category B . The following data table is shown below. We want to evaluate the assigned numerical values of
xi and y j .
Table 8. Distance between dishes
Category A1 Category A2 ・・・ Category Am Sum
Category B1 Category B2 y1 y2 x1 n11 n12 x2 n21 n22 ・・・ ・・・ ・・・ xm nm1 nm2 ny1 ny2
The sums are given by l
nxi nij j 1
m
n yj nij i 1
The total data number N is given by
・・・ ・・・ ・・・ ・・・ ・・・ ・・・ ・・・
Category Bl yl n1l n2l ・・・ nml nyl
Sum nx1 nx2 ・・・ nxm N
Quantification Theory III (Correspondence Analysis) m
l
i 1
j 1
N nxi n yj
We define the matrixes based on the data table as n11 n12 n n22 A 21 nm1 nm 2 nx1 Nx 0
ny1 Ny 0
n1l n2l nml
nx 2
ny 2
0 nxm
0 nyl
We want to evaluate the vectors' elements below. x1 x X 2 xm y1 y Y 2 yl
From the correspondence theory, we obtain
MU U
185
Kunihiro Suzuki
186 where U Nx X
M N x1 A N y2 At N x1 1
This is an eigenvalue problem with respect to a matrix U . The eigenvalues and eigenvectors are obtained as
1 ,U 1 ; 2 ,U 2 , 3 ,U 3 We can then get
X N x1U i
We rescale all data as X NX i
i
1 We always have 1 , and the corresponding eigenvector has all the same factors, which is not appropriate for this subject. Therefore, we neglect the first eigenvalue and eigenvectors, and renumber the eigenvalues and eigenvectors as 2 1 2 1 2 1 , X X , Y Y 3 2 3 2 3 2 , X X , Y Y
We can evaluate weighted distance
d ij
d ij
given by
1 1 1 2 2 2 2 2 x y i j 1 2 xi y j 1 2
Further, we can evaluate the distance between each
d x ij
x
1 1 1 2 2 2 2 2 xi x j 1 xi x j 1 2 2
and each y as
Chapter 9
QUANTIFICATION THEORY IV ABSTRACT We want to evaluate the similarity of the categorical data. We assume that we can evaluate the similarity numerically. We assign the categorical data so that the values give the maximum correlation factor. We can then define the distance between two categorical data. Quantification theory IV gives the procedure to obtain the distance.
Keywords: quantification theory IV, similarity, correlation factor
1. INTRODUCTION Let us consider various kinds of cars. We assume that we can evaluate the similarity of each pair. Based on the evaluation, we want to define the distance of the two categorical data. We can obtain such results with Quantification theory IV. Table 1. Relationship between group discussion evaluation and score 1 Crown 1 2 3 4 5 6 7 8 9 10
Crown Cedric Sunny Mark II Corolla Skyline March Vitz RAV4 Pjero
2 Cedric 10 9 6 7 5 2 2 1 1 2
3 Sunny 10 7 9 6 3 3 2 1 3
4 Mark II
10 8 8 6 5 4 2 3
10 8 3 4 3 1 4
5 Corolla
6 Skyline
10 6 7 5 3 5
7 March
10 6 5 3 2
8 Vitz
10 9 7 5
9 RAV4
10 8 5
10 Pjero
10 4
10
Kunihiro Suzuki
188
2. ANALYTICAL PROCESS We assume that we can evaluate the similarity between various kinds of cars as shown in Table 1. We want to assign a value Q rij xi x j
2
(1)
j i
i
Since
xi to a car i . We evaluate a parameter given by
rij
expresses the similarity,
rij
expresses the non-similarity. On the other hand,
x if we assign cars i and j to values of xi and j . The square of the distance is expressed
x x by i
x x i
2
j
. We can therefore evaluate the similarity using two parameters of
rij
and
2
. Q is the inner product of the two parameters. Therefore, we evaluate the
j
rij maximum value of the xi related to . x We impose that the variance of i is 1, that is, we impose
S 2
1 n 2 xi x n i 1 1 n 2 1 n xi xi n2 n i 1 i 1
2
(2)
1
We then obtain a Lagrange function given by L rij xi x j i
j i
2
2 1 n 2 1 n xi 2 xi n i 1 n i 1
Partial differentiating L with respect to xi , we obtain
(3)
Quantification Theory IV
189
L 2 n 1 2 rij xi x j 2 xi 2 xi xi n i 1 i j i n 2 2 2 rij rji xi x j xi 2 n j n 0
n
x i 1
i
(4)
We can set an origin arbitrary without losing generality. Therefore, we set x 0 , that is
1 n xi 0 n i 1
(5)
We then obtain
r
ij
j
rji x j rij rji xi 0 j n
(6)
We introduce variables below.
hij h ji rij rji
n
(7)
(8)
Eq. (6) can then be reduced to
h x ij
j
j
hij xi 0 j
(9)
hii does not influence the magnitude of Q , and hence we can set it arbitrary as below.
h
ij
j
rij rji 0 j
(10)
Kunihiro Suzuki
190 Therefore, we obtain hii hij
(11)
j i
Eq. (9) is then reduced to
h x ij
j
xi 0
(12)
j
This can be regarded as an eigenvalue problem of a matrix Multiplying xi to Eq.(12), we obtain
h x x x ij i
i
j
j
2 i
Q n
H ij
.
(13)
i
We can modify Eq. (12) to
h x ij
i
xi 0
j
j
(14)
i
We then obtain
xi hij x j 0 i
j
i
(15)
From above analysis, we can obtain the first and second eigenvalues and eigenvectors
1 , x1 2 , xi 2 . The distance between category i and j is denoted as d ij , and
i , given by it can be evaluated as
dij
2 1 2 2 2 2 xi1 xj1 x x i j 1 2 1 2
Quantification Theory IV
191
SUMMARY To summarize the results in this chapter‒
r We obtain data which express the similarity of category i and j as ij . We can then form a matrix of H hij rij rji The diagonal elements are set as hii hij j i
We obtain the first and second eigenvectors and eigenvalues given by
1 , xi1 , 2 , xi 2 . d The distance between category i and j is denoted as ij and it can be evaluated as dij
2 2 1 2 2 2 xi1 xj1 x xj 1 2 1 2 i
We can extend this process to the data form of mixture of categorical and numerical ones.
Chapter 10
SURVIVAL TIME PROBABILITY ABSTRACT We discuss the survival time probability, which is important issue in a medical field. We judge effectiveness of medical treatment by evaluating the survival time data. The complete data can be obtained only when persons are dead. However, the number of complete data is limited. We have the other data of ones where we cannot trace the medical treatment, and ones where the medical treatment is on the way, that is, the persons are still alive. We want to use all these data to evaluate the survival time. We divide the survival probability by two steps: one is the survival probability up to the target time and the probability that can survive just after the target time among the survive people. We then evaluate the average survival time and the standard deviation and predict the time where alive person can live for the other time. We also evaluate the effectiveness of two kinds of treatments.
Keywords: survival time probability, Kaplan-Meier product limit prediction method
1. INTRODUCTION In a medical field, we want to evaluate the effectiveness of medical treatment. The simple clear example is the one how fatal sick people can live long after the medical treatment. We can gather data of living time period, and apply the data to standard statistical analysis. However, there are difficulty specialized with the medical treatment. The complete data are the ones of time period of death. However, we cannot wait until all people are dead. Some of them are still alive, and some of them are unable to trace. We need to develop a procedure to use whole data and evaluate the survival time.
Kunihiro Suzuki
194
2. SURVIVAL PROBABILITY We show how we can draw the dependence of survival probability on time. We consider an example of 10 mice to which carcinogenic substance are administered. We treat complete data and the time period was 2, 3, 3, 4,4 4,4, 5, 5, 8, and 10 days. The corresponding survival probability can be evaluated as follows. The survival probability for time
t is denoted as S t . When a mouse is dead at time
t , we regard the situation that the mouse is alive up to time t, and dead at time t 0 . Step 1
S t 1.0 for 0 t 2
(1)
Step 2
S 2 0
9 10
(2)
Step 3
S t
9 10
for 2 t 3 (3)
Step 4
S 3 0
7 10
(4)
Step 5
S t
7 10
for 3 t 4 (5)
Step 6
S 4 0
4 10
(6)
Step 7
S t
4 10
for 4 t 5 (7)
Survival Time Probability
195
Step 8
S 5 0
2 10
(8)
Step 9
S t
2 10
for 5 t 8 (9)
Step 10
S 8 0
1 10
(10)
Step 11
S t
1 10
for 8 t 10 (11)
Step 12 S 10 0 0
(12)
The above results are shown in Figure 1. The survival probability decreases from 1 to 0 monotonically. Since we use a unit of day, the feature is not smooth but is angular. This means that we cannot obtain stable data associated with differential parameters.
Survival probability
1.0
0.5
0.0
0
2
4
Figure 1. Time dependence of survival probability.
6 Days
8
10
12
Kunihiro Suzuki
196
3. DIFFERENT EXPRESSION OF SURVIVAL PROBABILITY Let us consider the survival probability from the different point of view in this section. The survival probability just after 3 days is given by
S 3 0
7 10
(13)
We regard this as the product of two probabilities. One is the probability where mice can live up to 3 days. The other one is the probability that alive mice are dead. Therefore, the probability is expressed by
S 3 0
9 7 7 10 9 10
(14)
Therefore, we obtain the same results. However, we can apply this concept to the incomplete data. The first probability is the one that mice can survive at the time, which is not clearly decided for incomplete data. The second probability is clearly defined even for incomplete data as shown in the next section.
Figure 2. Time dependence of survival probability.
Survival Time Probability
197
Figure 3. Time dependence of survival probability.
4. SURVIVAL PROBABILITY WITH INCOMPLETE DATA (KAPLAN-MEIER PREDICTIVE METHOD) We assume that we obtain data as shown in Figure 2. In the figure, 11 persons are shown although we treat the number as n . We set the starting point for each data to the origin, and sort them and obtain the data as shown in Figure 3. The survival time is denoted as
t1 , t2 ,
, tm
(15)
where
t1 t2
tm
(16)
Note that there are probabilities where some data values are same, and hence m n . We assume that
dj
persons record
tj
, that is,
dj
persons are dead at the time of
t j0
.
t , t We assume that a person cannot be traced in the time period of j j 1 . We checked whether he is alive at the time
tj 0
, but do not know the status after then.
Kunihiro Suzuki
198
The number of person who are alive just before or cannot be traced. Therefore, we can evaluate
n j d j wj d j 1 wj 1
nj
tj
equals to the number who are dead
as
dm wm
(17)
The number of n j is dead among this n j persons. Therefore, the corresponding survival probability is given by
nj d j nj
(18)
The survival probability for two continuous steps is then related to
S t j 0
nj d j nj
We do not know
S t j 1 0 (19)
S t j 0
in general, but we do know that
S 0 1
(20)
Eq. (19) can be related to S 0 as S t j 0
nj d j nj
S t j 1 0
n j d j n j 1 d j 1 nj
n j 1
n j d j n j 1 d j 1 nj
n j 1
n j d j n j 1 d j 1 nj
n j 1
n j d j n j 1 d j 1 nj
n j 1
S t j 2 0 n1 d1 S 0 0 n1 n1 d1 n0 0 S 0 n1 n0 n1 d1 n1
(21)
Survival Time Probability Since we start with alive persons, d0 0 , and the survival probability as j
S t j 0 i 1
S 0 0 S 0
199 . Therefore, we obtain
ni di ni
(22)
This is called as a Kaplan-Meier product-limit predictive method. The standard deviation associated with S t j 0 is approximately expressed with
S t 0 S t j 0 j
j
di i di
n n i 1
i
(23)
5. REGRESSION FOR SURVIVAL PROBABILITY We obtained survival probability in the previous section. The resultant one is not smooth and are squarish. We cannot obtain clear parameters associated with derivatives. Therefore, we want to express the survival probability with a smooth analytical function.
5.1. Exponential Function Regression We approximate the survival probability with an exponential function given by
S t exp t
(24)
We want to evaluate to reproduce the data. Eq. (24) can be modified as
ln S t 0 t
The difference between the theory and the data
ei ln S ti 0 ti
(25)
ei is given by
(26)
Kunihiro Suzuki
200
The summation of the deviation Qe is then given by m
Qe ei2 i 1 m
ln S ti 0 ti
2
i 1
(27)
We set so that Qe has the minimum value. Differentiating Qe with respect to , we obtain m Qe 2 ln S ti 0 ti ti 0 i 1
(28)
We then obtain as m
t ln S t i 1
i
i
0
m
t i 1
2 i
(29)
5.2. Weibull Function Regression We assume that the death probability as f t . Then the accumulated death probability can be obtained by integrating f t from 0 to time t as F t f t dt t
(30)
0
Therefore, the survival probability at time
S t
is given by
S t 1 F t We approximate the death probability of
(31) f t
as a Weibull distribution as
Survival Time Probability 1
t f t
t exp
201
(32)
We then obtain the accumulated death probability as F t f t dt t
0
t
t t 1 exp dt 0 t 1 exp
(33)
Therefore, the survival probability is given by
t S t exp
(34)
We set the parameters and so that the deviation between the theory and data is minimum. Logarithm of Eq. (34) is given by
t ln S t 0
(35)
The logarithm of Eq. (35) is given by
ln ln S t 0 ln t ln
(36)
We introduce the following parameters as
K i ln ln S ti 0
(37)
a
(38)
Kunihiro Suzuki
202
b ln
(39)
Eq. (36) is then given by
Ki a ln ti b
(40)
The deviation between the theory and data is given by
ei Ki a ln ti b
(41)
The sum Qe is then given by m
Qe ei2 i 1 m
K i a ln ti b
2
i 1
(42)
We want to minimize this Qe . Differentiating Qe with respect to a and set it to be 0, we obtain m Qe 2 Ki a ln ti b ln ti 0 a i 1
(43)
We then obtain m
m
m
i 1
i 1
a ln ti b ln ti Ki ln ti i 1
2
(44)
Differentiating Qe with respect to b and set it to be 0, we obtain m Qe 2 Ki a ln ti b 0 b i 1
We then obtain
(45)
Survival Time Probability m
m
i 1
i 1
a ln ti bm Ki
(46)
From Eqs. (44) and (46), we obtain m
a
a and b as
1 m Ki m i 1 1 m ln ti m i 1
Ki ln ti i 1 m
ln t i 1
b
203
i
2
(47)
m 1 m K a ln ti i m i 1 i 1
We can then obtain parameters
(48)
a and as
a
(49)
b exp b exp a
(50)
6. AVERAGE AND STANDARD DEVIATION OF SURVIVAL TIME Using the survival probability, we can evaluate an average and a standard deviation of survival time. We then evaluate time where alive people can live how long time period from now. First of all, we can evaluate the percentile of probability of 50%, which is denoted as
T0.5 , and can be easily evaluated from the data. The average survival time is given by S t j 0 t j t j 1 m
j 1
where t0 0 . The variance of the survival time is given by
(51)
Kunihiro Suzuki
204 2 S t j 0 t j t j 1 m
2
j 1
(52)
The third and fourth moments for the survival time are given by 3 S t j 0 t j t j 1 m
3
j 1
4 S t j 0 t j t j 1 m
(53)
4
j 1
(54)
The skewness and kurtosis are then given by
3 3
(55)
4 4
(56)
The probability where a person, who is alive after the time period of t0 , can live time period of t from now can be evaluated as
P t
t 2 1 exp dt 2 2 t0 t 2
t0
t 2 1 exp dt 2 2 2
(57)
where we assume a normal distribution function for the survival time. If we do not use a normal distribution, but a Pearson function for f t with the given moment parameters, we can obtain the corresponding probability as
f t dt P t f t dt t0 t
t0
(58)
Survival Time Probability
205
It may be a case where the longest untraced time is larger than tm . In that case, S tm 0
do not become 0. The evaluated average survival time should be under estimated. The accuracy is improved by adding a term given by S t j 0 t j t j 1 S tm 0 tmax tm m
j 1
(59)
2 S t j 0 t j t j 1 S tm 0 tmax tm m
2
2
j 1
3 S t j 0 t j t j 1 S tm 0 tmax tm m
3
3
j 1
4 S t j 0 t j t j 1 S tm 0 tmax tm m
4
(60)
(61) 4
j 1
(62)
The following discussion is the same as the one without this improvement.
7. HAZARD MODEL 7.1. Definition of Hazard Function A hazard function
t
t
is defined as
f t
S t
(63)
This expresses that the person who are alive up to the time t are dead in the next incremental time period. On the other hand, f x is the probability without the condition who are alive up to the time t. Let us consider human life. We are apt to be dead when we take age. Therefore, the hazard function increases significantly with increasing age. However, the number of dead people may be small and hence f x is then small. Therefore, the hazard function is one important parameter to understand the phenomenon. Differentiating the survival probability, we obtain
Kunihiro Suzuki
206 dS t dt
dF t dt
f t
(64)
We then obtain the hazard function as
t
f t
S t
dS t dt S t
d ln S t dt
(65)
S t The hazard function includes differential form of . However, the data of S t cannot be differentiated, and hence it is hard to obtain the hazard function from the data directly. We then evaluate the integral form given by
t t dt t
0
t d ln S t dt 0 dt ln S t
(66)
The gradient of the accumulated hazard function t corresponds to the hazard function.
7.2. Analytical Expression for Hazard Function (Exponential Approximation) If we use an analytical function for the survival probability, we can obtain analytical one for the hazard function. If we use an exponential function, we obtain
Survival Time Probability d ln S t dt d ln exp t dt
207
t
(67)
Therefore, the parameter is the just the hazard function, which is the reason we use the same notation for the exponential function.
7.3. Analytical Expression for Hazard Function (Weibull Function) If we use a Weibull function, the corresponding hazard function is given by
t
f t
S t
1
t exp t exp
t
1
t t 1
(68)
8. TESTING OF TWO GROUP SURVIVAL TIME We want to evaluate the difference between two groups with respect to the survival time. We performed two types of medical treatment, and want to evaluate the difference. We consider the group A and B, and the alive number for each group at the time given by
nAj d Aj wAj d A j 1 wA j 1
d Am wAm
tj
are
(69)
Kunihiro Suzuki
208
nBj d Bj wBj d B j 1 wB j 1
d Bm wBm
Therefore, we obtain the cross table for the time
t tj
(70)
as shown in Table 1.
Table 1. The number of dead and alive people number for t t j Group A
Dead
Alive
d Aj
nAj
B
d Bj
nBj
Sum
dj
nj
If there is no group dependence, the dead person number is proportional to the group person number and the expected numbers are given by
eAj d j
nAj
eBj d j
nj
(71)
nBj nj
(72)
Therefore, we obtain the expected dead people number for group A and B as m
E A eAj j 1
(73)
m
EB eBj j 1
(74)
On the other hand, the death data are given by m
DA d Aj j 1
(75)
Survival Time Probability
209
m
DB d Bj j 1
(76)
Therefore, the deviation between the theory and data are given by
2
DA EA EA
2
DA EB
2
EB
(77)
We have the relationship between the theory and data as below. E A EB eAj eBj m
j 1
m n d nBj d j Aj j nj j 1 n j
m
dj j 1
d Aj d Bj m
j 1
DA DB
(78)
Therefore, we obtain
EA DA EB DB
(79)
Substituting Eq. (79) in to Eq. (77), we obtain
1 2 1 2 DA EA EA EB
(80)
We compare this value with the critical value for a distribution with a freedom of 1. If Eq. (80) is larger than the critical value, we can state that the results for both groups are different, and vice versa. 2
Kunihiro Suzuki
210
SUMMARY To summarize the results in this chapter: The number of person who are alive just before
n j d j wj d j 1 wj 1
where
dj
tj
t
dj
i 1
and is given by
persons are dead at the time of
is the number of person who are alive up to the time of traced. The survival probability is given by j
nj
dm wm
is the persons record j , that is,
S t j 0
is denoted as
tj
t j0
and
wj
and become not to be able to be
ni di ni
The standard deviation associated with S t j 0 is approximately expressed with
S t 0 S t j 0 j
j
di i di
n n i 1
i
The survival probability is approximately expressed with an exponential function as
S t exp t where m
t ln S t i 1
i
i
0
m
t i 1
2 i
The survival probability is approximately expressed with a Weibull function as
Survival Time Probability
211
t S t exp We introduce the following parameters as
K i ln ln S ti 0 a
b ln
where
a and b are given by m
a
i 1 m
ln t i
i 1
b
1 m Ki m i 1 1 m ln ti m i 1
Ki ln ti 2
m 1m K a ln ti i m i 1 i 1
We can then obtain parameters
a and as
a b
exp a
The moment parameters are given by S t j 0 t j t j 1 S tm 0 tmax tm m
j 1
2 S t j 0 t j t j 1 S tm 0 tmax tm m
j 1
2
2
Kunihiro Suzuki
212
3 S t j 0 t j t j 1 S tm 0 tmax tm m
3
3
j 1
4 S t j 0 t j t j 1 S tm 0 tmax tm m
4
4
j 1
tmax is the longest time for untraced data. If tm is larger than tmax , S tm 0 is 0, and the last terms are eliminated automatically. The skewness and kurtosis of are then given by
3 3 44
The probability where a person, who is alive after the time period of period of t from now can be evaluated as
t0 , can live time
P t
t 2 1 exp dt 2 2 t0 t 2
t 2 1 exp dt 2 2 2
t0
where we assume a normal distribution function for the survival time. If we do not use a normal distribution, but a Pearson function for f t with the given moment parameters, we can obtain the corresponding probability as
P t
f t dt
t0 t
t0
f t dt
The hazard function is defined as t
f t
S t
In the experimental data, we prefer to evaluate the accumulated hazard function as
Survival Time Probability
t ln S t In the analytical model, we can obtain
t for an exponential function and
t
1 t
for a Weibull function.
213
Chapter 11
POPULATION PREDICTION ABSTRACT A cohort ratio is introduced to express the change of the age constitution. The ratio expresses the probability that a certain age members move to the next generation age. We also introduce a birth ratio, which express the new members, that is, they are new born babies. Using the ratios, we can predict the constitution change of population.
Keywords: population, birth ratio, cohort ratio
1. INTRODUCTION It is important to predict the time dependence of constitution of population, which influence economics. We take a national census every five years in Japan. We can predict the dependence of population based on the continuous two national census data.
2. POPULATION IN FUTURE We use a data of national census data of 2005 as shown in Table 1, and 2010 as shown in Table 2. We predict the population in future based on the data.
Kunihiro Suzuki
216
Table 1. 2005 population data Year:2005 Sum 0 ~ 4 5 ~ 9 10 ~ 14 15 ~ 19 20 ~ 24 25 ~ 29 30 ~ 34 35 ~ 39 40 ~ 44 45 ~ 49 50 ~ 54 55 ~ 59 60 ~ 64 65 ~ 69 70 ~ 74 75 ~ 79 80 ~ 84 85 ~ 89 90 ~ 94 95 ~ 99 100 ~ 104 105 ~ 109 More than 110
Sum M ale Female 127,767,994 62,348,977 65,419,017 5,578,087 2,854,502 2,723,585 5,928,495 3,036,503 2,891,992 6,014,652 3,080,678 2,933,974 6,568,380 3,373,430 3,194,950 7,350,598 3,754,822 3,595,776 8,280,049 4,198,551 4,081,498 9,754,857 4,933,265 4,821,592 8,735,781 4,402,787 4,332,994 8,080,596 4,065,470 4,015,126 7,725,861 3,867,500 3,858,361 8,796,499 4,383,240 4,413,259 10,255,164 5,077,369 5,177,795 8,544,629 4,154,529 4,390,100 7,432,610 3,545,006 3,887,604 6,637,497 3,039,743 3,597,754 5,262,801 2,256,317 3,006,484 3,412,393 1,222,635 2,189,758 1,849,260 555,126 1,294,134 840,870 210,586 630,284 211,221 41,426 169,795 23,873 3,580 20,293 1,458 178 1,280 22 2 20
Table 2. 2010 population data. Birth ratio and cohort ratio are also shown Cohort ratio Year:2010 Sum 0 ~ 4 5 ~ 9 10 ~ 14 15 ~ 19 20 ~ 24 25 ~ 29 30 ~ 34 35 ~ 39 40 ~ 44 45 ~ 49 50 ~ 54 55 ~ 59 60 ~ 64 65 ~ 69 70 ~ 74 75 ~ 79 80 ~ 84 85 ~ 89 90 ~ 94 95 ~ 99 100 ~ 104 105 ~ 109 More than 110
Sum M ale Female Birth number Birth ratio M ale Female 128,057,352 62,327,737 65,729,615 5,296,748 2,710,581 2,586,167 5,585,661 2,859,805 2,725,856 1.0018578 1.0008338 5,921,035 3,031,943 2,889,092 0.9984983 0.9989972 6,063,357 3,109,229 2,954,128 13,494 0.00457 1.0092678 1.0068692 6,426,433 3,266,240 3,160,193 110,956 0.03511 0.9682252 0.9891213 7,293,701 3,691,723 3,601,978 306,913 0.08521 0.9831952 1.0017248 8,341,497 4,221,011 4,120,486 384,382 0.09329 1.0053495 1.0095524 9,786,349 4,950,122 4,836,227 220,103 0.04551 1.003417 1.0030353 8,741,865 4,400,375 4,341,490 34,610 0.00797 0.9994522 1.0019608 8,033,116 4,027,969 4,005,147 773 0.00019 0.9907757 0.9975146 7,644,499 3,809,576 3,834,923 0.9850229 0.9939254 8,663,734 4,287,489 4,376,245 0.9781552 0.991613 10,037,249 4,920,468 5,116,781 0.969098 0.9882162 8,210,173 3,921,774 4,288,399 0.9439756 0.976834 6,963,302 3,225,503 3,737,799 0.9098724 0.961466 5,941,013 2,582,940 3,358,073 0.8497232 0.9333804 4,336,264 1,692,584 2,643,680 0.7501535 0.8793261 2,432,588 744,222 1,688,366 0.6087033 0.7710286 1,021,707 241,799 779,908 0.435575 0.6026486 296,756 55,739 241,017 0.2646852 0.3823943 41,318 5,598 35,720 0.1351325 0.2103713 2,486 250 2,236 0.0698324 0.1101858 78 3 75 0.0166667 0.0576923
Population Prediction
217
We define a cohort ratio as
Cohort ratio Age : 5 9
2010 Population Age : 5 9 2005 Population Age : 0 4
Cohort ratio Age :10 14
2010 Population Age :10 14 2005 Population Age : 5 9
Cohort ratio Age :105 109
2010 Population Age :105 109 2005 Population Age :100 1004
(1)
The cohort ratio for the age more than 110 is given by Cohort ratio Age:more than 110
2010 Population Age : more than110 2005 Population Age:105:110+more than110
(2)
We evaluate the ratio for male and female, respectively. We can then predict the population in 2015 as
2015 Population Age : 5 9 =Cohort ratio Age : 5 9 2010 Population Age : 0
4
2015 Population Age :10 14 Cohort ratio Age :10 14 2010 Population Age : 5 9 2015 Population Age :105 1009 Cohort ratio Age :105 109 2010 Population Age :100 1004
(3)
The final range population is given by 2015 Population Age:more than110 Cohort ratio Age:more than110 2010 Population Age:105:109+more than110
(4)
Kunihiro Suzuki
218
We should evaluate the population in the age range from 0 to 4. This corresponds to the new born babies. We assume that female in the age range of 15 and 49 can have babies, and evaluate the corresponding population. We define the birth ratio given by Birth ratio =
Baby number Female populaton
(5)
We can then evaluate the population for the age range of 0 ~4 as
2015 Population Age : 0 4 =Birth ratio Age :15 19 2010 Female population Age :15 19 Birth ratio Age : 20 24 2010 Female population Age : 20 24 Birth ratio Age : 45 49 2010 Female population Age : 45 49
(6)
We only obtain the total population, and should divide them into male or female. The ratio of male and female was 100:105, and we divide them with the ratio. We can predict populations in future by repeating the above step. Although the cohort ratio and the birth ratio should change with time, we use it as constant.
0.4 Total 0-14 14-64 more than 65
Ratio of more than 65
0.3
100 0.2 50
0 1900
0.1
1950
2000 Year
Ratio of more than 65
Populaiton (million)
150
0 2050
Figure 1. Time dependence of population.
Figure 1 shows the dependence of the population on year. The total population decreases from 2015 and the ratio of more than 65 increases.
Population Prediction
219
SUMMARY To summarize the result in this chapter: We define the cohort ratio as
Cohort ratio Age region i+1
Current year population Age region i 1 Previous year population Age region i
The cohort ratio for the final age region is given by Cohort ratio (Age: more than 110) = Current year population (Age: more than 110) Previous year population (Age:105:110 + more than 110
(7)
The current population is given by
Current year population Age region i+1 =Cohort ratio Age region i+1 Previous year population Age region i The first region population is the one for new birth and it is evaluated as
Current year population Age : 0 4 =Birth ratio Age :15 19 Previous year female population Age :15 19 Birth ratio Age : 20 24 Previous year female population Age : 20 24 Birth ratio Age : 45 49 Previous year female population Age : 45 49 Female and male are divided using the ratio of a reference year.
Chapter 12
RANDOM WALK ABSTRACT We study random walk where a person goes to the left or right randomly. We evaluate the time evolution of distance, and the ratio that one exists in the right or left region with respect to the original location. Since the probability that a person goes to left or right is the same, we think that the probability of 0.5 for staying left region is the maximum. However, the result is opposite, and is the minimum for the value of 0.5, although the average is 0.5. Corresponding to the results, the probability for the frequency that a person cross the original point decreases with the increasing the step frequency.
Keywords: random walk, path, principle of symmetry
1. INTRODUCTION We did not consider the time evolution of a probability variable up to here. The corresponding fundamental issue is a random walk. A drunken people loses information of the path to his home. He randomly selects his next step left or right. The subject is that we predict the region that he exists during a certain time period, and the frequency that he returns the original point. We focus on this random walk in this chapter.
2. FUNDAMENTAL ANALYSIS FOR RANDOM WALK We discuss some fundamental analytical techniques to treat the random walk. We focus on one dimensional analysis here for simplicity. We relate the random walk to a coin toss. If we obtain a head, we step to the right, and if we obtain a tail, we step to
Kunihiro Suzuki
222
the left. We regard the one right step as 1 and one left step as -1. The sum expresses the location of the person. The trial number is expressed by x . The random walk is then expressed as shown in Figure 1, which is called as a path.
Figure 1. Path figure.
2.1. General Theory for Evaluating a Case Number of Path We count a case number of a path that starts from A (origin) to E . We can regard the start point A as a reference and set the coordinate as 0,0 . One example of the path is shown in Figure 1. We set the coordinate of E as n, m . The path is then expressed using coordinates as
0,0 , 1, s1 , 2, s2 , 3, s3 , , n, sn
(1)
where
sn m
(2)
The case number where the path starts form 0,0 to n, m is denoted as N n, m , which we evaluate as follows. m
We assume that we obtain heads as
p q n p q m
p
times and tails
q
times, which is related to
n
and
(3)
Random Walk
223
This can be solved with respect to p as
p
nm 2
(4)
This can be regarded as a case that we select right step p times in the n trials, and the corresponding case number is given by
N n,m n C p n C n m 2
(5)
Eq. (3) can also be solved with respect to q as
q
nm 2
(6)
There are some constraints for the final position as follows. The conditions p, q 0 impose that
n m n
(7)
This is rather obvious condition. Further, we have a constraint from Eq. (4) as n m 2p
Therefore, the sum of n and m is an even number.
Figure 2. Principle of symmetry.
(8)
Kunihiro Suzuki
224
2.2. Principle of Symmetry We utilize a principle of symmetry in the analysis here after, which we discuss in this section. We consider the path that starts form B k , a for a 0 to E n, sn as shown in Figure 2. The number of path that holds above and also have common points with x axis is equal to the number of the paths that start from the symmetrical location of B with respect to the
x
axis denoted as B ' k , a to E n, sn .
Let us consider a path that starts from B and has a common point for the first time. We reflect the path with respect to the x -axis. We can always generate this path for any path and we can establish one-to one correspondence. Therefore, the path number from
B E is the same as the one form B' E . Let us consider the path number. The path is expressed by
k , a n, sn
(9)
This can be identical to the path given by
0,0 n k , sn a
(10)
From Eq. (5), we can evaluate the corresponding path number as
N n k , sn a
n k C n k sn a 2
This is the case number where we do not care for crossing the
Figure 3. Path where all points except or the origin are in the positive region.
(11)
x -axis.
Random Walk
225
2.3. The Path Number Where All Points Except for the Starting Point Is Positive We treat a path where all points except for the starting point are positive. This is expressed as
0,0 , 1, s1 , 2, s2 , 3, s3 , , n, sn
for s1 , s2 , , sn 0
(12)
where we set sn m 0 . The corresponding all paths cross the point 1,1 . Therefore, the number for the path is identical to the number for the path.
1,1 n, m
(13)
This path number is identical to the path given by
0,0 n 1, m 1
(14)
Therefore, from Eq. (5), the corresponding path number, where we do not consider the positive region, is given by n 1 C n 1 m 1 2
n 1 C n m 2 2
(15)
We should extract the path number, where it has common points with the x -axis. The number of path that has common points with the x-axis is identical to the number of path of
1, 1 n, m
(16)
The corresponding path number is identical to the number of paths given by
0,0 n 1, m 1 The corresponding paths number is given by
(17)
Kunihiro Suzuki
226 n 1 C n 1 m 1 2
n 1 C n m 2
(18)
Therefore, the number of path that is always in the positive region except for that the origin is given by
n 1 C n m 2 2
n 1! n 1! n m 2 n m 2 n m nm ! ! n 1 ! ! n 1 2 2 2 2 nm nm n n ! n! 2 2 nm nm nm nm n n n 2 ! 2 ! n 1 2 ! 2 ! m n! nm nm n n 2 ! 2 ! m n C nm n 2
n 1C n m 2
(19)
m N n,m n
From the symmetrical consideration, the path number where all points are in the negative region except for the origin is the same, which is given by
m N n,m n The path number where one reaches
(20)
m for the first time after n steps is also given
by
m N n,m n
(21)
The condition is described as
s1 , s2 , , sn1 sn m 0
(22)
The corresponding case can be realized by reflecting path of Figure 3 and set the origin as 0, 0 . Therefore, the path number is the same.
Random Walk
227
Figure 4. The path where all points except for the edge points are in positive region.
2.4. The Number of Path from 0,0 to 2n,0 Where s1 , s2 ,
The path from
s1 , s2 ,
0,0
to
, s2n1 0
2n,0 where
, s2n1 0
(23)
is shown in Figure 4. The number of the path is identical to the number of path from 0,0 to 2n 1,1 where
s1 , s2 ,
, s2n1 0
(24)
The number of the path is then evaluated as
1 1 N 2 n 1,1 2 n 1 C 2 n 11 2n 1 2n 1 2 1 2 n 1 Cn 2n 1 1 2n 1! 2n 1 n ! n 1!
1 2n 2 ! n n 1! n 1!
1 n
2 n 2 Cn 1
(25)
Kunihiro Suzuki
228
Figure 5. Path where all points are 0 or positive.
2.5. The Path Number from 0,0 to 2n, 0 Where s1 , s2 ,
, s2n1 0
We evaluate the number of paths from 0,0 to 2n, 0 where
s1 , s2 ,
, s2n1 0
This means that we allow contacting the region.
(26)
x-axis but do not allow entering the negative
We move up the path by one. The path is then modified as the path from 0,1 to 2n,1 . We add the points 0,0 and 2n 2,0 to the path. The path is then changed to start from
0,0 to 2n 2,0 as shown in Figure 5. If we impose this path to the condition given by
s1, s2 , , s2n1 0
(27)
The number of paths is the one that we want to obtain, and it can be evaluated by changing n to n 1 in Eq. (25) as
1 2 n Cn n 1
(28)
3. THE PROBABILITY THAT A PERSON IS IN POSITIVE REGION We utilize the above analysis, and evaluate time period ratio where a person is in a positive region.
Random Walk
229
3.1. The Probability Where the Path Starts from 0,0 to 2n,0 The probability where the path starts form 0,0 to 2n,0 can be evaluated as
1 N 2 n,0 22 n 1 2n 2n C 2n0 2 2
u2 n
1 22 n
2 n Cn
(29)
Obviously, we obtain
u0 1
(30)
3.2. The Probability That a Person Reaches x Axis at 2n Trial for the First Time We consider the case where a person reaches a x-axis at 2n time step for the first time and denote the corresponding probability f 2n . This is divided into two cases, where a person is always in positive region or always in negative region. We consider the case where he is always in positive region. The case corresponds to the path from 0,0 to 2n,0 where
s1 , s2 ,
, s2n1 0
(31)
The corresponding probability is given by
1 1 22 n n
2 n 2 Cn 1
The case number for the negative region is the same, and hence we obtain
(32)
Kunihiro Suzuki
230 1 1 2 n 2 Cn 1 22 n n 1 1 2 2 n 2 Cn 1 2 2 22 n 2 n 1 1 2 n 2 Cn 1 2n 2 2 n 2 1 u2 n 2 2n
f 2n 2
(33)
3.3. The Probability That a Person Enters a Negative Region at 2n 1 Time Step for the First Time The events that a person enters the negative region at 2n 1 time step for the first time is denoted as G2n1 . We evaluate the probability that a person enters the negative region at
2n 1 time step for the first time. We first consider a path from 0,0 to 2n 2,0 where all points are in the positive region. The corresponding probability is given by 1 2
2n2
1 n
2 n 2 Cn 1
(34)
Further, the probability that he enters the negative region at 2n 1 time step is
1
2
.
Therefore, the probability that a person enters a negative region at 2n 1 time step for the first time is given by
1 1 1 P G2n 1 2n 2 2 2 n
2 n 2 Cn 1
f 2n
(35)
3.4. The Probability That a Person Does Not Cross X Axis up to 2n Time Step
event that a person reaches the
x
x-axis up to
2n times is denoted as A . The axis at 2n time step for the first time is denoted as K2n
The event that a person does not cross the
. We also express all event as . Then, the event A is expressed as
Random Walk A K2 K4
K2n
231 (36)
The corresponding probability P A is given by
P K 2n
P A P P K 2 P K 4 1 f2 f4
f 2n
(37)
We have a relationship below. 1 22 n 2 1 2n2 2 1 2n2 2
u2 n 2 u2 n
2 n 2 Cn 1
1 22 n
2 n Cn
2n ! 1 2n2 n !n ! 2 2 2 1 1 2n 2n 1 2n 2 ! 2 n 2 Cn 1 4 22 n 2 nn n 1! n 1!
2 n 2 Cn 1
C 2 n 2 2 n 2 n 1
1
2 1 u2 n 2 2n f2n
2n 1 2n
1
2
C 2 n 2 2 n 2 n 1
(38)
Therefore, we obtain P A 1 f 2 f 4
f 2n
1 u0 u 2 u 2 u 4
u2 n 2 u2 n
1 u0 u 2 n u2 n
(39)
3.5. The Probability That a Person Does Not Enter Negative Region up to 2n Time Steps We denote the events that a person does not enter the negative region up to 2n as B , which is expressed by B G1 G2
G2n1
(40)
Kunihiro Suzuki
232
Therefore, the corresponding probability P B is given by
P B P P G1 P G2 1 f2 f4
f 2n
u2 n
P G2 n 1 (41)
We denote the events that a person reaches xaxis at 2n time as A2n , where we do not care whether it is the first time, second time, or so on. The events that a person returns to the x -axis at 2r time for the first time as B2r . A2n can then be expressed by
A2 n
n
B
A2 n 2 r
2r
(42)
r 1
The corresponding probability is given by
P A2n
n
PB
2r
P A2n2r
r 1
(43)
This is expressed by
u2 n
n
f r 1
2 r u2 n 2 r
(44)
Figure 6. Path where length 2k is positive. (a) Start with positive region. (b) Start with negative region.
Random Walk
233
3.6. The Probability That the Length of 2k Is in Positive Region of 2n Length Path We denote the probability that the length of 2k is in the positive region of 2n length path as P2k ,2n . The corresponding case is shown in Figure 6. We prove that the probability is given by P2k ,2 n u2 k u2 n 2 k
(45)
P2n,2n is the case where a person never enter the negative region, and hence it is given
by
P2n,2 n u2 n
(46)
P0,2n is the case where a person never enter the positive region, and hence it is given
by P0,2n u2n
(47)
It should be noted that u0 1, and hence Eq. (47) can be expressed by P0,2n u0u2 n
(48)
Therefore, Eq. (45) is valid for k 0 and any n . We assume that Eq. (45) is valid up to the path length of 2n 2 for any k . What we should do is to prove that Eq. (45) is valid for the path of 2n length. This means that we can lengthen the path from 2 to any length, and valid for any k . We consider the path of length of 2n with the positive region length of 2k . We can consider two cases for that. One is the path in the positive region where we reach the x-axis at 2r time step for the first time, and have 2k 2r positive path in the rest of 2n 2r length path. The variable r can have a value from 1 to k . The other is the path in the negative region where we reach the x-axis at 2r time step for the first time, and have 2k positive path in the rest of 2n 2r length path. The variable r can have a value from 1 to n k .
Kunihiro Suzuki
234 Therefore, we obtain
P2 k ,2 n
nk 1 1 f 2 r P2 k 2 r ,2 n 2 r f 2 r P2 k ,2 n 2 r 2 2 r 1 r 1 k
1 2
k
f 2 r u2 k 2 r u2 n 2 k
r 1
k
1 u2 n 2 k 2
r 1
1 n k f 2 r u2 k u2 n 2 k 2 r 2 r 1
nk 1 f 2 r u2 k 2 r u 2 k f 2 r u2 n 2 k 2 r 2 r 1
(49)
On the other hand, we obtain k
f
2 r u2 k 2 r
u2 k
r 1
(50)
nk
f
2 r u2 n 2 k 2 r
u2n 2k
r 1
(51)
Therefore, Eq. (49) is reduced to P2 k ,2 n
1 u2 n 2 k 2
k
r 1
nk 1 f 2 r u2 k 2 r u 2 k f 2 r u2 n 2 k 2 r 2 r 1
1 1 u2 n 2 k u2 k u2 k u2 n 2 k 2 2 u 2 k u2 n 2 k
2k ! 2 22 k k !
(52)
2n 2k ! 2 22 n 2 k n k !
From the Stirling’s theorem (Appendix 1-12), we can approximate n! as
n! 2n nn en Therefore, we obtain
(53)
Random Walk u2 n
235
2n ! 2 2 n ! 2n 2 2n 2n e 2 n 2n
22 n
2n n n e n
2
1
n
u2 n 2 k
(54)
2n 2k ! 2 22 n 2 k n k ! 2 2n 2k 2n 2k 22 n 2 k
e 2 n 2 k
2n 2k n k n k e n k
n k n k 2 n 2 k
22 n 2 k 1 22 n 2 k 1
2n2k
n k n k n k
2
(55)
2
1
n k
Eq. (52) is then approximated as
P2 k ,2 n u2 k u2 n 2 k
1
1
k
n k 1
k n k
(56)
The dependence of the probability on k is shown in Figure 7. The path number is assumed to be 2n 200 . The rigorous model of Eq. (52) and the analytical model of Eq. (56) are compared. The ratio of the positive and the negative regions is equal corresponds to k 50 . The probability is the minimum at the points. The probability increasing with deviating from k 50 . It should be noted that the rigorous model is well approximated with the analytical model.
Kunihiro Suzuki
236 0.03 n = 100
Rigorous Analytical
P
0.02
0.01
0.00
0
50 k
100
Figure 7. The Dependence of probability on k. The path number 2n is 200. The rigorous model of Eq. (52) and the analytical model of Eq. (56) are compared.
We extend the analysis to the normalized one. The ratio of the positive path length to the total path length is given by
2k k z 2n n
(57)
The corresponding probability density f z is given by
P2k ,2n k k f z z
(58)
We then obtain P2 k ,2 n k k
1
k n k 1
k n k 1
k nz z
k k 1 n n 1 z z 1 z
(59)
Random Walk
237
Finally, we obtain
1
f z
z 1 z
(60)
Figure 8 shows the corresponding results. The average ratio is 0.5 as is expected. However, the probability density is the minimum at 0.5. This means that a person is apt to be in one sided region (positive or negative). The probability P for the region a z b can be evaluated as P
b
a
f z dz
b
1 dz a z 1 z
2
sin
1
b sin 1 a
(61)
3.0
f(z)
2.0
1.0
0.0 0.0
0.5 z
1.0
Figure 8. The probability density for the ratio where a person is in the positive region.
4. RETURN FREQUENCY TO THE ORIGIN We evaluate the frequency where a person crosses the x-axis. We show that a person rarely crosses the axis. This may be opposite to our image. Since the probability that a person goes to the left or the right is the same, we may expect the crossing frequency is large. However, this result can also be guessed from the results in the previous section, where the one side region staying probability of 0.5 is the minimum.
Kunihiro Suzuki
238
Figure 9. The path where a person reaches m for the 2n-m step.
Figure 10. The path where a person reaches m for the 2n-m step for the first time.
We start with a path where we reach mat the 2n m step as shown in Figure 9. This path is identical to the path where a person reaches m for the first time at the 2n m step as shown in Figure 10. The probability that a person return to the given by
f 2n
x-axis at the
1 u2 n 2 2n
2n step for the first time is
(62)
We consider the path with length of n where
s1, s2 ,
, sn1 m, sn m
(63)
We regards that the path as the one that reaches mfor the first time. The related probability is denoted as
m 1 m hn N n,m n n 2
m m hn hn
.
is given by
(64)
Random Walk
239
Let us consider the path where we reach 1 for the first time at the 2n 1 step. This path is identical to the path where we move the path totally by (1,-1), and connect (0,0) and (1,-1). The result and path is always negative and reach the axis at the 2n step for the first time. Therefore, the corresponding case number is
1 n
2 n 2 Cn 1
(65)
Therefore, the related probability is given by 1 n
2 n 2 Cn 1
1 2
n 1
1 1 2n 2 2 n 2 1 u2 n 2 2n f 2n
2 n 2 Cn 1
(66)
1 We also evaluate h2 n 1 as
h2 nn 1 1
1 1 N 2 n 1,1 2 n 1 2n 1 2 1 1 2 n 1 Cn 2 n 1 2n 1 2 2 n 1 ! 1 1 2n 1 n 1!n ! 22 n 1
2n 2 ! 1 1 2n 22 n 2 2n 2 n 1 ! n 1!
f 2n
(67)
Therefore, we obtain
h2 n1 f 2 n 1
(68)
Inspecting above, we focus on the path that starts from y location of 0 to 1, and modify the path from the 0 point to the end, and repeat the modification. Therefore, the focused point in this case is shown with red marks in Figure 10.
Kunihiro Suzuki
240
The probability where a person cross the
x-axis m times just after the
2n steps is
f 2n . Then the above is expressed by m
denoted as
h2 n1 f 2n 1
1
(69)
We want to prove that
f 2n h2 n m m
m
(70)
The corresponding case number is
m N 2 n m, m 2n m
(71)
Therefore, the corresponding probability is given by
h2 n m m
m 1 N 2 n m, m 2 n m 2n m 2
(72)
The path in Figure 10 can be modified below. We move the path with the deviation 1, 1 totally in the first step and add a red line at the beginning as shown in Figure 11. We then move with the deviation 1, 1 from the point where the path crosses the
x-axis for the first time. Then the red path crosses the
x -axis
second times as shown in
Figure 12. We then move with the deviation 1, 1 from the point where the path crosses the
x-axis for the second time. Then the red path crosses the
third times as shown in Figure 13. We then move with the deviation 1, 1 from the point where the path crosses the
x-axis for the third time. Then the red path cross the 14.
x
x -axis
axis fourth times as shown in Figure
Random Walk
Figure 11. We modify the path by 1, 1 , and add connection between two black points.
Figure 12. We modify the path by 1, 1 , and add connection between two black points.
Figure 13. We modify the path by 1, 1 , and add connection between two black points.
Figure 14. We modify the path by 1, 1 , and add connection between two black points.
241
Kunihiro Suzuki
242
The path is the one where a person reaches xaxis mtimes at the 2n step while in the negative region. The corresponding case number is then
m N 2 n m, m 2n m We need to release the condition of negative region. We add the case as shown in Figure 15. The corresponding case number is given by
m N 2 n m , m 2m 2n m
Figure 15. The path where a person cross touch x-axis m times at 2n step.
The corresponding probability is given by f 2n m
m 1 N 2 n m , m 2m n 2n m 2 m 1 N 2 n m,m n m 2n m 2 h2 n m m
(73)
We restrict our analysis that we realize mtimes crossing at 2n time step. However, we extend the analysis to where we obtain mtimes crossing during 2n time steps. We
g denote it as 2 n , which is given by m
g 2 n
1
m
2
C 2n m 2n m n
We should prove the above.
(74)
Random Walk We assume that a person touch the
x-axis
243
mtimes at the 2n k step, and then the
person does not touch the axis for the other 2k steps. The case where a person does not touch the axis for the other 2k steps is identical to the case where a person returns to the axis at 2n step. The above means that m m m 1 g 2 n f 2n f 2n
n f 2n
(75)
f 2n as below. k
We study
f 2n h2 n k k
k
k 1 2nk 2n k 2
2 n k Cn
(76)
This can be divided as two terms as
1
2
C 2nk 2nk n
1
2 2
1
2
C 2nk 2nk n
1
C 2nk 2nk n
C 2 n k 1 2 n k 1 n 1 2
1 2
2n k 1! 2n k 1 n !n ! 2 2 n k n 2n k ! 2n k 2n k n !n !
2 n k 1
2nk
2 2n k n 1 1 2 n k 2 n k Cn 2n k 2 2 n k 2n 2k 1 2 n k Cn 2n k 22 n k k 1 C 2nk 2nk n 2n k 2 k f 2n
We therefore obtain
(77)
Kunihiro Suzuki
244 m m m 1 g 2 n f 2n f 2n
1
2
C 2nm 2nm n
n f 2n
1
2
1
2
C 2 n m 1 2 n m 1 n
C 2 n m 1 2 n m 1 n
1
2
1 1 C n 1 n 1 Cn n n n 2 2 1 1 2 n m 2 n m Cn n 1 2 2 1 2 n m 2 n m Cn 2 1 2n m ! 2nm 2 n m !n !
C 2 n m 11 2 n m 11 n
n 1 Cn
(78)
We utilize that p Cq
0
for p q
g 2n g and 2n can be evaluated as 0
The
(79)
g2 n 0
g 2 n
1 22 n
1
2 n Cn
u2n
(80)
1
1
2
C 2 n 1 2 n 1 n
2n 1! 2 2n 1 n ! n ! 2n ! 1 n 2 n 1 2n 2n n ! n ! 2
1
2 n 1
1 22 n u2 n
2 n Cn
(81)
Eq. (77) corresponds to m m 1 m g 2 n g 2 n f 2n 0
(82)
Random Walk
245
This means that
g 2 n g 2 n 1
2
(83) g 2 n to g 2n is evaluated m
Let us consider the case where
n
is quite large. The ratio of
0
as g 2 n m
0 g 2 n
22 n 22 n m
2 n m Cn 2 n Cn
2 n m ! 2n n ! n ! 2n m n ! n ! 2n ! n n 1 n m 1 2m 2n 2n 1 2n m 1 n n 1 n m 1
2m
1 n n 2 1 1 n 1 1 2n
m 1 n 2 m 1 n n m 1 1 2n
(84)
Therefore, we obtain 1 m 1 1 n n n g m ln 2 n0 ln 1 m 1 g 2 n 1 2n 1 2n
m 1
k
k
ln 1 n ln 1 2n k 1
m 1
k
k
n 2n k 1
m 1
2n k
k 1
m m 1 4n
where we utilize a Taylor series for a small
(85)
xas
Kunihiro Suzuki
246
ln 1 x x
(86)
Therefore, we obtain m m 1 m 0 g 2 n g 2 n exp 4n m m 1 1 exp 4n n
(87)
0.06 n = 100
Rigorous Analytical
0.05
g
0.04 0.03 0.02 0.01 0.00
0
20
40
m
60
80
100
Figure 16. Dependence of probability on m.
Figure 16 shows the dependence of the probability on m. The probability decreases with increasing m. That is, a person rarely returns to the original points once he deviates from the original point. The rigorous model of Eq. (78) is well approximated with an analytical one of Eq. (87).
SUMMARY To summarize the results in this chapter: The probability where a person is in the positive region for the time length 2k among the total time length 2n is given by P2k ,2 n
2k ! 2 22k k !
2n 2k ! 2 22n 2k n k !
Random Walk
247
Then the corresponding probability function where the person is in the positive region in unit time for large
n is given by
P2 k ,2 n u2 k u2 n 2 k
1
1
k
n k 1
k n k
The ratio of the positive path length to the total path length is given by
2k k z 2n n The corresponding probability is given by
f z
1
z 1 z
The probability where a person returns to the original point mtimes during 2n path is given by g 2 n
1
m
2
2nm
2n m ! n m !n !
The probability function where the person returns to the origin for large n is given by g 2 n m
m m 1 exp 4n n 1
Chapter 13
A MARKOV PROCESS ABSTRACT A Marcov process assumes that the status for the next step is determined by the status of just the previous step, and not influenced by the step before the previous step. The random walk treated in the previous chapter is included in the Markov process as a special case. The Marcov process is not limited to the random walk, but it accommodates various subjects in business and economics fields, and so on. Markov process uses a transition matrix to express the change from a certain status to the next status. The elements of the transition matrix express the probabilities where one status transits to the next status. We also use an initial vector to express the initial status. The status of any time step is simply obtained by multiplying the transition matrix for corresponding time steps. We also investigate various components of the elements that express vanishing process, a supply source, and constant flux.
Keywords: random walk, transition matrix, supply source, constant flux, condition probability, network, network loop
1. INTRODUCTION In our daily life, we want to predict the future status, such as the population generation constitution, constitution of university students, a share ratio of some products, and queues for service business. We should predict the results based on the data up to now. The simplest assumption is that the next step status is determined only by the previous step status, which is called as a Markov process. If the assumption is valid, we can predict the results simply using the Markov process theory. We study the Markov process theory and show that it can treat various subjects in this chapter.
Kunihiro Suzuki
250
2. A MARKOV PROCESS FOR RANDOM WALK We treated the random walk in the previous chapter. We treat it again with a different way of a Marko process. We assume that we can only select a step to the direction of right or left. We relate the random walk to a coin toss. If we obtain a head, we step to the right, and if we get tail a tail, we then step to the left. This can be regarded as probability process. The probability to obtain a head is denoted as
p
, and one to obtain a tail is denoted
as q 1 p . Here, we assume p 1 2 . The unit of time
t is related to the step, and it is assumed to be 1 for 1 step. The
probability variable that the distance from the origin is set to be X t . The transition of variable is denoted as follows.
x 0 0 Probability 1
(1)
1 1 Probability = 2 x 1 1 Probability = 1 2
(2)
1 2 Probability = 4 1 x 2 0 Probability = 2 1 2 Probability = 4
(3)
3 1 x 3 1 3
1 8 3 Probability = 8 3 Probability = 8 1 Probability = 8
Probability =
(4)
A Markov Process
251
In general, x n can have a value of n, n 2, n 4, , n 4 , n 2 , n for t n . That is, it can have n 1 values. x n has a value of n 2k when we have tail k times among n trials. The related probability fn n 2k is given by k
1 1 f n n 2k n Ck 2 2
nk
1 n Ck 2
n
(5)
For example, x 3 3 corresponds to n 3, k 0 , and we have 3
1 1 f3 3 2 0 f3 3 3 C0 2 8
(6)
On the other hand, x 3 1 corresponds to n 3, k 1 , and we have 3
1 3 f3 3 2 1 f3 1 3 C1 2 8
(7)
3. TRANSITION PROBABILITY FOR RANDOM WALK The location at the time t n is determined by the location at the time t n 1 and it is not predicted definitely, but as a probability variables. This process can be regarded as a Markov process. The Markov process is related to conditional probability. Let us consider the transition of a probability variable X t . X t have variables at the time 0,1,2, , n 1 as a0 , a1 , a2 , , an1 . The vale at the time n is expressed by X t an , and it is assumed to the condition before. The probability is expressed as
P X n an X 0 a0 , X 1 a1 , X 2 a2 ,
, X n 1 an 1
(8)
This is the general form. In the Marko process, it is determined only by the previous step, and hence, Eq. (8) is reduce to
P X n an X n 1 an 1
(9)
Kunihiro Suzuki
252
In the random walk, the location is changed with a probability of 1 2 . Let us consider the case where the status with n 2 changes to the status with n 3 . We consider the location of x 0, 1, 2, 3 . We need to consider the condition probability from the location x 0, 1, 2, 3 with n 2 to the location x 0, 1, 2, 3 with n 3. x 3 3 Let us consider the status with n 2 to the location , which is given by
1 P X 3 3 X 2 2 2 P X 3 3 X 2 1 0 P X 3 3 X 2 0 0 P X 3 3 X 2 1 0 P X 3 3 X 2 2 0 P X 3 3 X 2 3 0
P X 3 3 X 2 3 0
(10)
x 3 2 Let us consider the status with n 2 to the location , which is given by
12 P X 3 2 X 2 2 0 1 P X 3 2 X 2 1 2 P X 3 2 X 2 0 0 P X 3 2 X 2 1 0 P X 3 2 X 2 2 0 P X 3 2 X 2 3 0
P X 3 2 X 2 3
Let us consider the status with n 2 to the location x 3 1 , which is given by
(11)
A Markov Process
1 P X 3 1 X 2 2 2 P X 3 1 X 2 1 0 1 P X 3 1 X 2 0 2 P X 3 1 X 2 1 0 P X 3 1 X 2 2 0 P X 3 1 X 2 3 0
253
P X 3 1 X 2 3 0
(12)
Let us consider the status with n 2 to the location x 3 0 , which is given by
P X 3 0 X 2 2 0 1 P X 3 0 X 2 1 2 P X 3 0 X 2 0 0 1 P X 3 0 X 2 1 2 P X 3 0 X 2 2 0 P X 3 0 X 2 3 0
P X 3 0 X 2 3 0
(13)
Let us consider the status with n 2 to the location x 3 1 , which is given by
P X 3 1 X 2 2 0 P X 3 1 X 2 1 0 1 P X 3 1 X 2 0 2 P X 3 1 X 2 1 0 1 P X 3 1 X 2 2 2 P X 3 1 X 2 3 0
P X 3 1 X 2 3 0
(14)
Kunihiro Suzuki
254
Let us consider the status with n 2 to the location x 3 2 , which is given by
P X 3 2 X 2 2 0 P X 3 2 X 2 1 0 P X 3 2 X 2 0 0 1 P X 3 2 X 2 1 2 P X 3 2 X 2 2 0 1 P X 3 2 X 2 3 2
P X 3 2 X 2 3 0
(15)
Let us consider the status with n 2 to the location x 3 3 , which is given by
P X 3 3 X 2 2 0 P X 3 3 X 2 1 0 P X 3 3 X 2 0 0 P X 3 3 X 2 1 0 1 P X 3 3 X 2 2 2 P X 3 2 X 2 3 0
P X 3 3 X 2 3 0
(16)
Summarizing above, we have a transition matrix given by 0 0 0 0 0 0 12 0 12 0 0 0 0 1 2 0 12 0 12 0 0 0 0 12 0 12 0 0 0 0 0 0 12 0 12 0 0 0 0 12 0 1 2 0 0 0 0 0 0 12 0
(17)
A Markov Process
255
Although we obtain the above matrix by considering the transition from time of 2 to the time of 3, it is the same for any time step independent of time. The transition matrix is directly related to the transition diagram as shown in Figure 1. The numbered ball shows the status. The arrow shows the transition, and the numeric related to the arrows are transition element. The dashed arrow shows vanish, which will be discussed in detail in the next section.
Figure 1. Transition diagram for random walk.
The transition matrix of Eq.(17) can be appreciated as Figure 2, where the transition status is shown. The number of a column expresses the status before the transition, and the row number expresses the status after the transition. The corresponding elements show their transition probability.
Figure 2. The relationship between transition matrix and statuses.
The initial vector is shown in Figure 3. It is directly related to the Figure 2, and it is shown that one person exists at the location of 0 as an initial status.
Kunihiro Suzuki
256
Figure 3. Initial vector with related status.
When we multiply the transition matrix by the initial vector, we obtain the status for the next step. The transition from $n = 0$ to $n = 1$ can be obtained by

$$\begin{pmatrix} 0 & \tfrac12 & 0 & 0 & 0 & 0 & 0 \\ \tfrac12 & 0 & \tfrac12 & 0 & 0 & 0 & 0 \\ 0 & \tfrac12 & 0 & \tfrac12 & 0 & 0 & 0 \\ 0 & 0 & \tfrac12 & 0 & \tfrac12 & 0 & 0 \\ 0 & 0 & 0 & \tfrac12 & 0 & \tfrac12 & 0 \\ 0 & 0 & 0 & 0 & \tfrac12 & 0 & \tfrac12 \\ 0 & 0 & 0 & 0 & 0 & \tfrac12 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \tfrac12 \\ 0 \\ \tfrac12 \\ 0 \\ 0 \end{pmatrix} \tag{18}$$

The result is identical to Eq. (2). The transition from $n = 1$ to $n = 2$ can be obtained by multiplying the transition matrix by the right side of Eq. (18), which gives

$$\begin{pmatrix} 0 \\ \tfrac14 \\ 0 \\ \tfrac12 \\ 0 \\ \tfrac14 \\ 0 \end{pmatrix} \tag{19}$$

The result is identical to Eq. (3). The transition from $n = 2$ to $n = 3$ can be obtained by multiplying the transition matrix by the right side of Eq. (19), which gives

$$\begin{pmatrix} \tfrac18 \\ 0 \\ \tfrac38 \\ 0 \\ \tfrac38 \\ 0 \\ \tfrac18 \end{pmatrix} \tag{20}$$

The result is identical to Eq. (4). Consequently, the status for $n = k$ can be obtained by

$$\begin{pmatrix} 0 & \tfrac12 & 0 & 0 & 0 & 0 & 0 \\ \tfrac12 & 0 & \tfrac12 & 0 & 0 & 0 & 0 \\ 0 & \tfrac12 & 0 & \tfrac12 & 0 & 0 & 0 \\ 0 & 0 & \tfrac12 & 0 & \tfrac12 & 0 & 0 \\ 0 & 0 & 0 & \tfrac12 & 0 & \tfrac12 & 0 \\ 0 & 0 & 0 & 0 & \tfrac12 & 0 & \tfrac12 \\ 0 & 0 & 0 & 0 & 0 & \tfrac12 & 0 \end{pmatrix}^k \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} \tag{21}$$
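As a numerical check, the distribution of Eq. (21) can be computed by repeated matrix-vector multiplication. The short sketch below (Python with NumPy, our choice for illustration rather than the book's) rebuilds the matrix of Eq. (17) and reproduces Eqs. (18)-(20).

```python
import numpy as np

# Transition matrix of Eq. (17): statuses x = -3, ..., 3.
M = np.zeros((7, 7))
for j in range(7):
    if j - 1 >= 0:
        M[j - 1, j] = 0.5  # move one step down with probability 1/2
    if j + 1 <= 6:
        M[j + 1, j] = 0.5  # move one step up with probability 1/2

v = np.zeros(7)
v[3] = 1.0  # one person at the location 0 (Figure 3)

for n in range(1, 4):
    v = M @ v  # one multiplication advances one time step
    print(f"n = {n}:", v)
# n = 3 gives (1/8, 0, 3/8, 0, 3/8, 0, 1/8), matching Eq. (20).
```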
4. TRANSITION MATRIX ELEMENTS

4.1. General Discussion for Matrix Elements

The transition matrix expresses the transition from the status of a column number j to the status of a row number i. Therefore, if we focus on a certain column j, we can inspect to which statuses it transits. For example, let us focus on the third column in Figure 2. This corresponds to the status -1. The second and fourth elements in the column are 1/2. This means that it transits to each of these statuses with a probability of 1/2. We implicitly assume that it transits to some status, and hence the sum of the probabilities should be 1.

Let us consider the first column in Figure 2, which corresponds to the status -3. Only the second row has a value of 1/2. This means that it transits to the status -2 with a probability of 1/2. Since there is no other nonzero element, the sum of the probabilities is not 1 in this column. The other transition is to the status -4. However, the matrix does not treat that status. We need to extend the matrix to cover any status, which requires a transition matrix of infinite size given by

$$\begin{pmatrix} \ddots & \ddots & & & & \\ \ddots & 0 & \tfrac12 & & & \\ & \tfrac12 & 0 & \tfrac12 & & \\ & & \tfrac12 & 0 & \tfrac12 & \\ & & & \tfrac12 & 0 & \ddots \\ & & & & \ddots & \ddots \end{pmatrix} \tag{22}$$

This is true from the standpoint of mathematics. However, we cannot treat an infinite-size matrix in numerical calculation. We therefore use a matrix cut from Eq. (22) with a sufficient size. If the maximum step is m, we should use a matrix with a size larger than $2m + 1$.
Figure 4. Assumed five statuses.
We consider a generalization of the transition matrix, where we consider five statuses. In general, one can transit to any status. Therefore, the general form of the transition matrix is given by

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} & a_{15} \\ a_{21} & a_{22} & a_{23} & a_{24} & a_{25} \\ a_{31} & a_{32} & a_{33} & a_{34} & a_{35} \\ a_{41} & a_{42} & a_{43} & a_{44} & a_{45} \\ a_{51} & a_{52} & a_{53} & a_{54} & a_{55} \end{pmatrix} \tag{23}$$

The column number corresponds to the statuses 1, 2, 3, 4, and 5, and the row number corresponds to the statuses 1, 2, 3, 4, and 5. The transition probability from the status j to the status i is given by $a_{ij}$.
Let us consider the second column as an example. The corresponding part is given by

$$\begin{pmatrix} a_{12} \\ a_{22} \\ a_{32} \\ a_{42} \\ a_{52} \end{pmatrix} \tag{24}$$

Let us consider some special cases. The first special case is staying in the same state, that is, an auto-regressive case, which is expressed by

$$a_{12} = 0,\quad a_{22} = 1,\quad a_{32} = 0,\quad a_{42} = 0,\quad a_{52} = 0 \tag{25}$$

The corresponding diagram is shown in Figure 5.

Figure 5. Auto-regressive case diagram.

The next one is vanishing, which is given by

$$a_{12} = a_{22} = a_{32} = a_{42} = a_{52} = 0 \tag{26}$$

The corresponding diagram is shown in Figure 6.

Figure 6. Vanishing process diagram.

The final case is reflection, which is given by

$$a_{12} = 1,\quad a_{22} = 0,\quad a_{32} = 0,\quad a_{42} = 0,\quad a_{52} = 0 \tag{27}$$

This case shows the reflection to the status 1. The corresponding diagram is shown in Figure 7.

Figure 7. Reflection to the status-1 process diagram.

If it is reflected to various statuses, we divide the value depending on the transition probability, which is given by

$$\begin{pmatrix} a_{12} \\ 0 \\ a_{32} \\ a_{42} \\ a_{52} \end{pmatrix} \tag{28}$$
A Markov Process
261
where

$$\sum_{i \neq j} a_{ij} = 1 \tag{29}$$
The corresponding diagram is shown in Figure 8.
Figure 8. Multi-reflection process diagram.
Figure 9. General expression for the transition from the status 2 to the other statuses.
In the general case, there are both auto-regression and multi-reflections, which is given by

$$\begin{pmatrix} a_{12} \\ a_{22} \\ a_{32} \\ a_{42} \\ a_{52} \end{pmatrix} \tag{30}$$
If there is vanishing, the relationship below holds.

$$\sum_{i\,\mathrm{(all)}} a_{ij} < 1 \tag{31}$$

If there is no vanishing, the relationship below holds.

$$\sum_{i\,\mathrm{(all)}} a_{ij} = 1 \tag{32}$$

Further, the relationship below should hold.

$$0 \le a_{ij} \le 1 \tag{33}$$

The corresponding diagram is shown in Figure 9. We have focused on the column 2 up to here. However, the same discussion is valid for the other columns. Based on the above, we can generate transition matrices for various cases.
4.2. Supply Source

We want to supply the status 1 with h at every step. We consider the transition matrix of Eq. (23), and set the initial condition as

$$\begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \end{pmatrix} \tag{34}$$

When there is no supply, the statuses at the next step can be obtained as

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} & a_{15} \\ a_{21} & a_{22} & a_{23} & a_{24} & a_{25} \\ a_{31} & a_{32} & a_{33} & a_{34} & a_{35} \\ a_{41} & a_{42} & a_{43} & a_{44} & a_{45} \\ a_{51} & a_{52} & a_{53} & a_{54} & a_{55} \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \end{pmatrix} \tag{35}$$
Next, we consider the supply to the status 1, which is given by

$$\begin{pmatrix} h \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} \tag{36}$$

We call this the supply vector. The statuses at the next step are then given by

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} & a_{15} \\ a_{21} & a_{22} & a_{23} & a_{24} & a_{25} \\ a_{31} & a_{32} & a_{33} & a_{34} & a_{35} \\ a_{41} & a_{42} & a_{43} & a_{44} & a_{45} \\ a_{51} & a_{52} & a_{53} & a_{54} & a_{55} \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \end{pmatrix} + \begin{pmatrix} h \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} \tag{37}$$

In the next step, we can obtain

$$\begin{pmatrix} a_{11} & \cdots & a_{15} \\ \vdots & & \vdots \\ a_{51} & \cdots & a_{55} \end{pmatrix} \left[ \begin{pmatrix} a_{11} & \cdots & a_{15} \\ \vdots & & \vdots \\ a_{51} & \cdots & a_{55} \end{pmatrix} \begin{pmatrix} b_1 \\ \vdots \\ b_5 \end{pmatrix} + \begin{pmatrix} h \\ 0 \\ \vdots \\ 0 \end{pmatrix} \right] + \begin{pmatrix} h \\ 0 \\ \vdots \\ 0 \end{pmatrix} \tag{38}$$

We can repeat this cycle. The status to be supplied can be chosen flexibly, and hence the supply vector is expressed in general as

$$\begin{pmatrix} h_1 \\ h_2 \\ h_3 \\ h_4 \\ h_5 \end{pmatrix} \tag{39}$$
When we utilize a matrix for the initial condition, we can then use a supply matrix as

$$\begin{pmatrix} h_1 & 0 & 0 & 0 & 0 \\ 0 & h_2 & 0 & 0 & 0 \\ 0 & 0 & h_3 & 0 & 0 \\ 0 & 0 & 0 & h_4 & 0 \\ 0 & 0 & 0 & 0 & h_5 \end{pmatrix} \tag{40}$$
4.3. Supply Source Included in the Transition Matrix

In the previous method, we need an additional calculation process to handle the supply source. It is a rather complex process, which can be improved. We add a status for the source, and express it within the framework of a transition matrix. We propose to use a status as shown in Figure 10. The initial value for the source is a, and it supplies the status i at every step with

$$h_i = a p_i \tag{41}$$

The corresponding transition matrix and initial vector are enlarged by one order, and the status after the k-th step can be evaluated as

$$\begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ p_1 & a_{11} & a_{12} & a_{13} & a_{14} & a_{15} \\ p_2 & a_{21} & a_{22} & a_{23} & a_{24} & a_{25} \\ p_3 & a_{31} & a_{32} & a_{33} & a_{34} & a_{35} \\ p_4 & a_{41} & a_{42} & a_{43} & a_{44} & a_{45} \\ p_5 & a_{51} & a_{52} & a_{53} & a_{54} & a_{55} \end{pmatrix}^k \begin{pmatrix} a \\ b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \end{pmatrix} \tag{42}$$

Figure 10. The transition diagram for the source.
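A minimal sketch of Eq. (42), under assumed illustrative numbers (the 2-by-2 matrix, p, and a below are not from the text): the source status is prepended, keeps itself with probability 1, and feeds the ordinary statuses through its column.

```python
import numpy as np

# Two ordinary statuses plus one source status (Eq. (42)).
A = np.array([[0.5, 0.2],
              [0.3, 0.6]])       # assumed 2x2 transition matrix
p = np.array([1.0, 0.0])         # the source supplies status 1 only
a = 100.0                        # source value: supplies a*p at every step

T = np.zeros((3, 3))
T[0, 0] = 1.0                    # the source keeps its own value
T[1:, 0] = p                     # supply from the source
T[1:, 1:] = A                    # ordinary transitions

v = np.array([a, 0.0, 0.0])      # initial vector (a, b1, b2)
for _ in range(50):
    v = T @ v
print(v[1:])                     # statuses approach a steady state fed by the source
```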
4.4. Vanishing Monitor

We showed that there are some vanishing processes when the sum of a certain column is less than 1. We do not express the vanishing process explicitly, and we do not have any data associated with it. However, we sometimes want to know the amount of the vanishing process. The upper part of Figure 11 shows the normal vanishing process, where the dashed line expresses it. We add a status $C_v$ and set self-regression on it. The value associated with the status $C_v$ expresses the accumulation of the vanishing.

Figure 11. The accumulation of the vanishing.
4.5. Constant Flux

In a transition matrix, a transition is expressed with a transition probability. Therefore, the flux from one status to another is proportional to the value of the status before the transition. Since the value changes with time, the flux changes accordingly. However, we want to use a constant-value flux in some cases, where we want to express a constant flux a from a status j to a status i. We propose a constant source $S_c$ as shown in Figure 12. We assume self-regression for the source, which keeps the value of the source constant. We also use an initial condition a for the source. The transition probability from the source to the status j is set to -1. This enables us to subtract a flux from the status j at every step. The transition probability from the source to the status i is set to 1. This enables us to add a flux to the status i at every step. Focusing on the statuses j and i, we obtain a constant flux a from the status j to i.

We should be careful about the value $b_j$ of the status j. If $b_j$ is smaller than a, the status cannot transfer the flux amount a, and the transferred amount should be $b_j$. Therefore, the element for the constant flux source $b_{S_c}$ should be

$$b_{S_c} = \min\left(a,\ b_j\right) \tag{43}$$
Figure 12. Transition diagram for a constant flux from a status j to a status i.
4.6. Initial Condition

Next, we consider the initial condition. Let us consider the transition matrix of Eq. (21), where the initial vector corresponds to the status that one person starts from the status 0. The matrix operation predicts the probability of how far the person goes. If we set 10 instead of 1, it predicts how many members go to each location. If we set certain numbers of members for various statuses, we can predict how many members can be expected for each status.
$$\begin{pmatrix} 0 & \tfrac12 & 0 & 0 & 0 & 0 & 0 \\ \tfrac12 & 0 & \tfrac12 & 0 & 0 & 0 & 0 \\ 0 & \tfrac12 & 0 & \tfrac12 & 0 & 0 & 0 \\ 0 & 0 & \tfrac12 & 0 & \tfrac12 & 0 & 0 \\ 0 & 0 & 0 & \tfrac12 & 0 & \tfrac12 & 0 \\ 0 & 0 & 0 & 0 & \tfrac12 & 0 & \tfrac12 \\ 0 & 0 & 0 & 0 & 0 & \tfrac12 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ 3 \\ 5 \\ 2 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ \tfrac32 \\ \tfrac52 \\ \tfrac52 \\ \tfrac52 \\ 1 \\ 0 \end{pmatrix} \tag{44}$$

In the above, we set 5 members at the status 0, 3 members at the status -1, and 2 members at the status 1, and predict the member distribution after the next step.

We can use an initial matrix instead of an initial vector. The resultant data then show the detailed transition for each initial member:

$$\begin{pmatrix} 0 & \tfrac12 & 0 & 0 & 0 & 0 & 0 \\ \tfrac12 & 0 & \tfrac12 & 0 & 0 & 0 & 0 \\ 0 & \tfrac12 & 0 & \tfrac12 & 0 & 0 & 0 \\ 0 & 0 & \tfrac12 & 0 & \tfrac12 & 0 & 0 \\ 0 & 0 & 0 & \tfrac12 & 0 & \tfrac12 & 0 \\ 0 & 0 & 0 & 0 & \tfrac12 & 0 & \tfrac12 \\ 0 & 0 & 0 & 0 & 0 & \tfrac12 & 0 \end{pmatrix} \begin{pmatrix} 0 & & & & & & \\ & 0 & & & & & \\ & & 3 & & & & \\ & & & 5 & & & \\ & & & & 2 & & \\ & & & & & 0 & \\ & & & & & & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & \tfrac32 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \tfrac52 & 0 & 0 & 0 \\ 0 & 0 & \tfrac32 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & \tfrac52 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix} \tag{45}$$

Summing up the row data, we obtain

$$\begin{pmatrix} 0 \\ \tfrac32 \\ \tfrac52 \\ \tfrac52 \\ \tfrac52 \\ 1 \\ 0 \end{pmatrix} \tag{46}$$

Comparing the results of Eqs. (44) and (46), we can check that both results are the same.
5. VARIOUS EXAMPLES

We apply the Markov process to various examples here.

5.1. Promotion of University Student Grades

We trace the promotion of university students through the grades. We assume 4 grades to graduate from the university. Gates for promotion are set at the end of the second and fourth grades. A student can try each gate twice. When a student fails a gate the second time, he withdraws from the university. In addition, some students withdraw spontaneously at every stage. We denote the corresponding statuses as below.

G1: The first grade student
G2: The second grade student
G22: The second grade student who failed the first gate one time
G3: The third grade student
G4: The fourth grade student
G44: The fourth grade student who failed the second gate one time
F: Graduated student
D: Withdrawn student

We assume that 1000 students enter the university. 5% of the students withdraw spontaneously at each step. 60% of the second grade students pass the gate. 80% of the G22 students pass the gate, and the other 20% withdraw. 70% of the fourth grade students graduate from the university. 70% of the G44 students graduate from the university, and the other 30% withdraw. The corresponding transition diagram is shown in Figure 13. We show the corresponding transition matrix (Figure 14) and initial vector (Figure 15). The results are shown in Table 1. In the final stage, 6 years later, about 70% of the students have graduated, and the others have withdrawn.
Figure 13. Diagram for the promotion of university students.

Figure 14. Transition matrix for university student promotion.
Figure 15. Initial vector for university student promotion.
Table 1. Year dependence of the student number for each status

Status  Initial  1st year  2nd year  3rd year  4th year  5th year  6th year
G1      1000     0         0         0         0         0         0
G2      0        950       0         0         0         0         0
G22     0        0         332.5     0         0         0         0
G3      0        0         570       266       0         0         0
G4      0        0         0         541.5     252.7     0         0
G44     0        0         0         0         135.4     63.2      0
F       0        0         0         0         379.1     650.7     694.9
D       0        50        97.5      192.5     232.9     286.1     305.1
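Since Figures 14 and 15 are not reproduced here, the sketch below assembles the transition matrix from the rates quoted in the text; the status ordering is our assumption. Iterating it reproduces Table 1.

```python
import numpy as np

# Statuses: G1, G2, G22, G3, G4, G44, F, D (column: before, row: after).
T = np.zeros((8, 8))
T[1, 0], T[7, 0] = 0.95, 0.05                 # G1: promoted or withdraws
T[3, 1], T[2, 1], T[7, 1] = 0.60, 0.35, 0.05  # G2: pass gate / fail once / withdraw
T[3, 2], T[7, 2] = 0.80, 0.20                 # G22: pass gate / withdraw
T[4, 3], T[7, 3] = 0.95, 0.05                 # G3: promoted or withdraws
T[6, 4], T[5, 4], T[7, 4] = 0.70, 0.25, 0.05  # G4: graduate / fail once / withdraw
T[6, 5], T[7, 5] = 0.70, 0.30                 # G44: graduate / withdraw
T[6, 6] = T[7, 7] = 1.0                       # F and D accumulate

v = np.zeros(8)
v[0] = 1000.0                                 # 1000 entering students
for year in range(1, 7):
    v = T @ v
print(np.round(v, 1))                         # year 6: F ~ 694.9, D ~ 305.1 (Table 1)
```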
5.2. Promotion of University Grades in the Steady State

In the previous example, we traced the promotion process of 1000 entering students. Here, we analyze the steady state of the university under the condition that 1000 students enter the university every year. This can be analyzed using a supply source. The corresponding transition diagram is shown in Figure 16. We do not monitor the accumulation of graduated and withdrawn students, and hence eliminate the corresponding accumulation.
Figure 16. Diagram for university student promotion with a supply source.
The source S expresses that 1000 students enter the university every year. The corresponding transition matrix and initial vector are shown in Figure 17.
Figure 17. Transition matrix and initial vector for promotion of university student under the condition that 1000 students enter every year.
The results after k years can be evaluated as

$$\begin{pmatrix}
1.00 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1.00 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.95 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.35 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.60 & 0.80 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.95 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.25 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.70 & 0.70 & 0 & 0 \\
0 & 0.05 & 0.05 & 0.20 & 0.05 & 0.05 & 0.30 & 0 & 0
\end{pmatrix}^k
\begin{pmatrix} 1000 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} \tag{47}$$

where the rows and columns are ordered as S, G1, G2, G22, G3, G4, G44, F, D.
Table 2 shows the results, and Table 3 shows the student numbers in the steady state. The steady state is established after 7 years. The first grade student number is 1000. The second grade student number is larger than 1000, which is attributed to the fact that some students cannot pass the gate. The third grade student number is 836, which reflects the spontaneous withdrawals and the students withdrawn at the gate. The fourth grade student number is 993. This is larger than the third grade number, which is attributed to the fact that some students cannot graduate. The graduated student number is 695 every year. The withdrawn student number is 305 every year.

Table 2. Time dependence of the student number of each status under the condition that 1000 students enter the university every year

Status  Step0  Step1  Step2  Step3  Step4  Step5  Step6  Step7  Step8
S       1000   1000   1000   1000   1000   1000   1000   1000   1000
G1      0      1000   1000   1000   1000   1000   1000   1000   1000
G2      0      0      950    950    950    950    950    950    950
G22     0      0      0      332.5  332.5  332.5  332.5  332.5  332.5
G3      0      0      0      570    836    836    836    836    836
G4      0      0      0      0      541.5  794.2  794.2  794.2  794.2
G44     0      0      0      0      0      135.4  198.6  198.6  198.6
F       0      0      0      0      0      379.1  650.7  694.9  694.9
D       0      0      50     97.5   192.5  232.9  286.1  305.1  305.1

Table 3. The student numbers in the steady state

G1     G2      G3    G4     F
1000   1282.5  836   992.8  694.9
5.3. Population Problem

We treat a population problem, where we evaluate the time dependence of the constitution of the number of people in each age region. We divide the ages into units of 10 years. We assume that 95% of people survive to the next division up to the 50s. The survival ratio is 80% for the 60s, 60% for the 70s, 40% for the 80s, and 0 for the 90s. People in their 20s and 30s generate babies with the ratio $r_B$. We neglect the difference between male and female in this analysis.
Figure 18. Diagram for population problem.
The corresponding transition diagram is shown in Figure 18, and the corresponding transition matrix and initial vector are given below.

$$\begin{pmatrix}
0 & 0 & r_B & r_B & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.95 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.95 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.95 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.95 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.95 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.95 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.80 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.60 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.40 & 0 & 0 \\
0.05 & 0.05 & 0.05 & 0.05 & 0.05 & 0.05 & 0.20 & 0.40 & 0.60 & 1 & 0
\end{pmatrix}
\qquad
\begin{pmatrix} 100 \\ 100 \\ 100 \\ 100 \\ 100 \\ 100 \\ 100 \\ 100 \\ 100 \\ 100 \\ 0 \end{pmatrix} \tag{48}$$

The column and row numbers are ordered as the 0, 10, 20, ..., 90 age regions, and the final one corresponds to the status D. The unit for the initial condition is ten thousand.
We evaluated the time evolution for 5 steps, that is, the change over 50 years, which is shown in Table 4. The total population monotonically decreases with time with $r_B$ of 0.5, and it increases with $r_B$ of 0.7, although it decreases at first. This initial decrease is influenced by the initial condition.

Table 4. Time evolution of population

r_B = 0.5
Age   Step0  Step1  Step2  Step3  Step4  Step5
0     100    100    95     90     88     86
10    100    95     95     90     86     84
20    100    95     90     90     86     81
30    100    95     90     86     86     81
40    100    95     90     86     81     81
50    100    95     90     86     81     77
60    100    95     90     86     81     77
70    100    80     76     72     69     65
80    100    60     48     46     43     41
90    100    40     24     19     18     17
D     0      250    156    129    119    113
Sum   1000   850    789    751    720    692

r_B = 0.7
Age   Step0  Step1  Step2  Step3  Step4  Step5
0     100    140    133    126    148    168
10    100    95     133    126    120    141
20    100    95     90     126    120    114
30    100    95     90     86     120    114
40    100    95     90     86     81     114
50    100    95     90     86     81     77
60    100    95     90     86     81     77
70    100    80     76     72     69     65
80    100    60     48     46     43     41
90    100    40     24     19     18     17
D     0      250    158    133    124    122
Sum   1000   890    865    859    883    930
It is important whether the total population increases or decreases with time. The source of the people is the 20s and 30s. Therefore, the people in these age regions are important, and we add some analysis. We assume that the population of any age is constant. We set the population of the 20s as $n_{20}$ and that of the 30s as $n_{30}$. Both are related by

$$n_{30} = 0.95 n_{20} \tag{49}$$

The population of the 0s is denoted as $n_0$, and is given by

$$n_0 = r_B \left( n_{20} + n_{30} \right) = 1.95 r_B n_{20} \tag{50}$$

The population of the 10s is denoted as $n_{10}$, and is given by

$$n_{10} = 0.95 n_0 = 0.95 \times 1.95 r_B n_{20} \tag{51}$$

Therefore, the population of the 20s is given by

$$n_{20} = 0.95 n_{10} = 0.95^2 \times 1.95 r_B n_{20} \tag{52}$$

Therefore, the $r_B$ for a constant total population is evaluated as

$$r_B = \frac{1}{0.95^2 \times 1.95} = 0.568 \tag{53}$$

Table 5 shows the time evolution of the population with the above $r_B$.
The total population stays constant as expected, at about 760. We can improve this analysis by using more realistic values.

Table 5. Time evolution with the optimized r_B (r_B = 0.57)

Age   Step0  Step1  Step2  Step3  Step4  Step5  Step6  Step7  Step8  Step9  Step10
0     100    114    108    103    107    111    105    105    109    108    105
10    100    95     108    103    97     102    105    100    100    103    102
20    100    95     90     103    97     93     97     100    95     95     98
30    100    95     90     86     97     93     88     92     95     90     90
40    100    95     90     86     81     93     88     84     87     90     86
50    100    95     90     86     81     77     88     84     79     83     86
60    100    95     90     86     81     77     74     84     79     75     79
70    100    80     76     72     69     65     62     59     67     63     60
80    100    60     48     46     43     41     39     37     35     40     38
90    100    40     24     19     18     17     16     16     15     14     16
D     0      250    156    130    121    116    112    108    106    107    107
Sum   1000   864    815    788    774    768    762    759    761    762    760
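The break-even ratio of Eq. (53) can also be checked numerically: the sketch below builds the matrix of Eq. (48) and confirms that the living population settles near 760 for $r_B \approx 0.568$.

```python
import numpy as np

rB = 1.0 / (0.95**2 * 1.95)            # Eq. (53): ~0.568
surv = [0.95] * 6 + [0.80, 0.60, 0.40] # survival: 0s..50s, then 60s, 70s, 80s

T = np.zeros((11, 11))                 # ages 0, 10, ..., 90 and D
T[0, 2] = T[0, 3] = rB                 # the 20s and 30s generate babies
for i, s in enumerate(surv):
    T[i + 1, i] = s                    # survive to the next age division
    T[10, i] = 1.0 - s                 # the rest go to D
T[10, 9] = 1.0                         # nobody survives the 90s

v = np.array([100.0] * 10 + [0.0])
for step in range(10):
    v = T @ v
print(round(v[:10].sum()))             # total living population, ~760 (Table 5)
```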
5.4. Share Rate of a Product

We can evaluate the share rate of a product. We assume that companies A, B, and C make the same product, and they are competitive. We obtain data as shown in Table 6, where the change of the selected company is shown for 100 people.

Table 6. Change of company product

                 1st time
2nd time    A     B     C
A           10    5     10
B           25    8     12
C           15    7     8
Sum         50    20    30    (Total 100)
The corresponding transition matrix and diagram are shown in Table 7 and Figure 19. The calculation results are shown in Table 8. Company A has a high share rate at first, but it decreases with the time step, and company B increases its share rate.

Table 7. Transition matrix for the change of product

                 1st time
2nd time    A      B      C
A           0.20   0.25   0.33
B           0.50   0.40   0.40
C           0.30   0.35   0.27

Table 8. Results of the transition between companies

     Step0  Step1  Step2  Step3
A    0.5    0.25   0.26   0.26
B    0.2    0.45   0.43   0.43
C    0.3    0.3    0.31   0.31
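A short sketch reproducing Table 8: the columns of Table 7 serve as the transition matrix, and the initial share (0.5, 0.2, 0.3) is iterated.

```python
import numpy as np

# Transition matrix of Table 7 (column: 1st time, row: 2nd time).
T = np.array([[0.20, 0.25, 0.33],
              [0.50, 0.40, 0.40],
              [0.30, 0.35, 0.27]])
share = np.array([0.5, 0.2, 0.3])   # Step0 shares of A, B, C

for step in range(1, 4):
    share = T @ share
    print(f"Step{step}:", np.round(share, 2))
# converges to roughly A: 0.26, B: 0.43, C: 0.31 (Table 8)
```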
Figure 19. Diagram for the transition between companies.

Figure 20. Schematic figure for repeat customers.
5.5. Repeat Customers

We assume a trade area where N customers can possibly use a shop. We want to evaluate the time evolution of the repeat customers of the shop; that is, we want to know the number of repeat customers in the trade area, and the number of repeat customers who visit the shop. First, we need to define repeat and non-repeat customers. We treat customers who use the shop at least one time. A repeat customer is defined as one who uses the shop more than 10 times per year, the others are defined as non-repeat customers, and we know the average using frequencies as

$$f_1 = 30, \qquad f_2 = 5 \tag{54}$$

We show a corresponding schematic figure (Figure 20). We set the statuses of the customers as repeat ($R$) and non-repeat ($NR$). The corresponding Markov transition process is shown in Figure 21. The transition probability from a repeat customer to a non-repeat customer is $\alpha$, and that of the opposite case is $\beta$.

Figure 21. Markov transition process for a repeat customer.

Table 9. The transition of customers over two continuous years

                Last year
This year    R      NR     Sum
R            1300   1500   2800
NR           200    2000   2200
Sum          1500   3500   5000
We obtained data for 5000 customers over two continuous years, as shown in Table 9. We can form a transition matrix from these data, given by

$$\begin{pmatrix} y_1\left(k\right) \\ y_2\left(k\right) \end{pmatrix} = \begin{pmatrix} 0.87 & 0.43 \\ 0.13 & 0.57 \end{pmatrix}^k \begin{pmatrix} 0 \\ 5000 \end{pmatrix} \tag{55}$$

where we assume the initial condition that all customers are non-repeat. Therefore, the parameters for the transition in Figure 21 are given by

$$\alpha = 0.13, \qquad \beta = 0.43 \tag{56}$$

$y_1(k)$ denotes the repeat customer number at the step k, and $y_2(k)$ denotes the non-repeat customer number at the step k. Figure 22 shows the results evaluated with Eq. (55). The repeat customer number increases with the time step, and then saturates.
Figure 22. Time evolution of the repeat and non-repeat customer numbers in the trade area (Q = 5000, γ = 6).
The above results show the repeat and non-repeat customer numbers in the trade area. We further want to know the numbers for the customers who actually visit the shop. The total customer number in the trade area is denoted as $Q$, and is given by

$$Q = y_1 + y_2 \tag{57}$$
We also know the average using frequencies $f_1$ and $f_2$ of the repeat and non-repeat customers. The number of customers who use the shop per day, denoted as $G$, is then given by

$$G = \frac{y_1 f_1 + y_2 f_2}{N} \tag{58}$$

where N is the number of days in a year, and we set it as 365. The repeat and non-repeat customer numbers per day are then given by

$$x_1 = \frac{y_1 f_1}{N} \tag{59}$$

$$x_2 = \frac{y_2 f_2}{N} \tag{60}$$

We define the ratio $\gamma = f_1 / f_2$. Figure 23 shows the time evolution of the repeat and non-repeat customer numbers who visit the shop per day. The repeat customer number increases with the time step and then saturates, which is the same feature as that of the customer numbers in the trade area.
Figure 23. Time evolution of the repeat and non-repeat customers who visit the shop per day (Q = 5000, γ = 6).
We can see in Figure 22 and Figure 23 that the repeat customer number increases and then saturates, as pointed out previously. We evaluate the saturated number. In the saturated condition, the status is expected not to change, and we can set

$$\begin{aligned} y_1 &= \left(1-\alpha\right) y_1 + \beta y_2 \\ y_2 &= \alpha y_1 + \left(1-\beta\right) y_2 \end{aligned} \tag{61}$$

We then obtain

$$\alpha y_1 = \beta y_2 \tag{62}$$

Finally, we obtain the saturated repeat customer number in the trade region as

$$y_1 = \frac{Q}{1 + \dfrac{\alpha}{\beta}} \tag{63}$$

In this example, we obtain

$$y_1 = \frac{5000}{1 + \dfrac{0.13}{0.43}} = 3839 \tag{64}$$

The saturated number of repeat customers who visit the shop per day is evaluated as

$$x_1 = \frac{f_1 y_1}{365} = \frac{30 \times 3839}{365} = 316 \tag{65}$$
These values reproduce the results in Figure 22 and Figure 23.
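The saturated values of Eqs. (64) and (65) can be confirmed by iterating Eq. (55), as in the minimal sketch below.

```python
import numpy as np

T = np.array([[0.87, 0.43],
              [0.13, 0.57]])       # transition matrix of Eq. (55)
y = np.array([0.0, 5000.0])        # all customers are non-repeat at first

for _ in range(50):                # enough steps to saturate
    y = T @ y

print(round(y[0]))                 # ~3839 repeat customers (Eq. (64))
print(round(30 * y[0] / 365))      # ~316 repeat visitors per day (Eq. (65))
```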
5.6. Queue with a Single Teller Window

Let us consider a service trade. When we go to a bank, we wait if the teller window is full, and get service if it is vacant. We can treat this subject using a transition matrix. We assume one teller and 5 for the maximum queue number. We assign the status number to the number of people who are in the system. Therefore, the statuses are 0, 1, 2, 3, 4, 5, 6. We denote the corresponding probabilities as $P(0), P(1), \ldots, P(6)$. This system is called M/M/1(6), and Figure 24 shows this process schematically.
Figure 24. Diagram for M/M/1(6) system.
In this system, the number of persons in the system increases by one with a probability $\lambda$ per unit time. This means that one person enters the system with this probability. The number of persons in the system decreases by one with a probability $\mu$ per unit time. This means that one member's service is finished. We need to set the unit time so that two events do not occur simultaneously. For example, if 5 members enter the system per hour, we should use a unit time of one minute instead of one hour. The probability to increase the number is then evaluated as

$$\lambda = \frac{5}{60} \tag{66}$$

The unit we use decides the time step associated with one multiplication of the transition matrix. In this example, we use a probability per minute. Therefore, the time step is one minute. We consider all the statuses from here.

Let us consider the status 0. We consider what will happen in the next step. In this status, there is no member being served at the teller. If a customer does not come, the status is unchanged. The corresponding probability is $1 - \lambda$. If a customer comes, the status is changed to the status 1. The corresponding probability is $\lambda$.
We consider the status 1. If a customer does not come and the service is not finished, the status is kept. The corresponding probability is $\left(1-\lambda\right)\left(1-\mu\right) \simeq 1-\lambda-\mu$. If a person comes, the status is changed to the status 2. The corresponding probability is $\lambda$. If the service is finished, the status is changed to the status 0. The corresponding probability is $\mu$.

The statuses 2, 3, 4, and 5 are the same as the status 1.

We consider the status 6. If the service is not finished, the status is unchanged. The corresponding probability is $1 - \mu$. We do not care whether a customer comes or not; even if a customer comes, he returns and does not enter the system. If the service is finished, the status is changed to the status 5. The corresponding probability is $\mu$.

The corresponding transition matrix is given by

$$\begin{pmatrix}
1-\lambda & \mu & 0 & 0 & 0 & 0 & 0 \\
\lambda & 1-\lambda-\mu & \mu & 0 & 0 & 0 & 0 \\
0 & \lambda & 1-\lambda-\mu & \mu & 0 & 0 & 0 \\
0 & 0 & \lambda & 1-\lambda-\mu & \mu & 0 & 0 \\
0 & 0 & 0 & \lambda & 1-\lambda-\mu & \mu & 0 \\
0 & 0 & 0 & 0 & \lambda & 1-\lambda-\mu & \mu \\
0 & 0 & 0 & 0 & 0 & \lambda & 1-\mu
\end{pmatrix} \tag{67}$$

When we evaluate $P(0), P(1), \ldots, P(6)$, we can evaluate the number of persons who are in the system, $L$, and the number who are waiting for the service, $L_q$, as

$$L = 1 \cdot P\left(1\right) + 2 P\left(2\right) + 3 P\left(3\right) + 4 P\left(4\right) + 5 P\left(5\right) + 6 P\left(6\right) \tag{68}$$

$$L_q = 1 \cdot P\left(2\right) + 2 P\left(3\right) + 3 P\left(4\right) + 4 P\left(5\right) + 5 P\left(6\right) \tag{69}$$
Table 10. The probability for each status, the expected number in the system, and the queue number

P(0)   P(1)   P(2)   P(3)   P(4)   P(5)   P(6)   L      Lq
0.35   0.24   0.16   0.10   0.07   0.05   0.03   1.56   0.92

We assume that $\lambda = 0.2$ and $\mu = 0.3$, and use the initial condition $P(0) = 1$ with the others 0. The initial condition corresponds to one where no member exists in the system. We can expect that the system forms a steady state after we perform 1000 step cycles. Table 10 shows the results. $L$ is 1.56 and $L_q$ is 0.92.
5.7. Queue with Multiple Teller Windows

We assume 3 tellers and 3 for the maximum queue number. We assign the status number to the number of people who are in the system. Therefore, the statuses are 0, 1, 2, 3, 4, 5, 6. We denote the corresponding probabilities as $P(0), P(1), \ldots, P(6)$. This system is called M/M/3(6), and the corresponding schematic figure is shown in Figure 25.

Figure 25. Diagram for M/M/3(6) system.
In this system, the number of persons in the system increases by one with a probability $\lambda$ per unit time. This means that one person enters the system with this probability. The number of persons in the system decreases by one with a probability $\mu$ per unit time for the status 1, $2\mu$ for the status 2, and $3\mu$ for the statuses 3, 4, 5, and 6.

Let us consider the status 0. We consider what will happen in the next step. In this status, there is no member being served at the tellers. If a customer does not come, the status is unchanged. The corresponding probability is $1-\lambda$. If a customer comes, the status is changed to the status 1. The corresponding probability is $\lambda$.

We consider the status 1. If a customer does not come and the service is not finished, the status is kept. The corresponding probability is $\left(1-\lambda\right)\left(1-\mu\right) \simeq 1-\lambda-\mu$. If a person comes, the status is changed to the status 2. The corresponding probability is $\lambda$. If the service is finished, the status is changed to the status 0. The corresponding probability is $\mu$.

We consider the status 2. If a customer does not come and no service is finished, the status is kept. The corresponding probability is $\left(1-\lambda\right)\left(1-2\mu\right) \simeq 1-\lambda-2\mu$. If a person comes, the status is changed to the status 3. The corresponding probability is $\lambda$. If a service is finished, the status is changed to the status 1. The corresponding probability is $2\mu$.

We consider the status 3. If a customer does not come and no service is finished, the status is kept. The corresponding probability is $\left(1-\lambda\right)\left(1-3\mu\right) \simeq 1-\lambda-3\mu$. If a person comes, the status is changed to the status 4. The corresponding probability is $\lambda$. If a service is finished, the status is changed to the status 2. The corresponding probability is $3\mu$.

The statuses 4 and 5 are the same as the status 3.

We consider the status 6. If no service is finished, the status is unchanged. The corresponding probability is $1-3\mu$. We do not care whether a customer comes or not; even if a customer comes, he returns and does not enter the system. If a service is finished, the status is changed to the status 5. The corresponding probability is $3\mu$.

The corresponding transition matrix is given by

$$\begin{pmatrix}
1-\lambda & \mu & 0 & 0 & 0 & 0 & 0 \\
\lambda & 1-\lambda-\mu & 2\mu & 0 & 0 & 0 & 0 \\
0 & \lambda & 1-\lambda-2\mu & 3\mu & 0 & 0 & 0 \\
0 & 0 & \lambda & 1-\lambda-3\mu & 3\mu & 0 & 0 \\
0 & 0 & 0 & \lambda & 1-\lambda-3\mu & 3\mu & 0 \\
0 & 0 & 0 & 0 & \lambda & 1-\lambda-3\mu & 3\mu \\
0 & 0 & 0 & 0 & 0 & \lambda & 1-3\mu
\end{pmatrix} \tag{70}$$
We assume that $\lambda = 0.2$ and $\mu = 0.1$, and use the initial condition $P(0) = 1$ with the others 0. The initial condition corresponds to one where no member exists in the system. We can expect that the system forms a steady state after we perform 1000 step cycles. Table 11 shows the results. $L$ is 2.30 and $L_q$ is 0.40.

Table 11. The probability for each status, the expected number in the system, and the queue number for M/M/3(6)

P(0)   P(1)   P(2)   P(3)   P(4)   P(5)   P(6)   L      Lq
0.12   0.24   0.24   0.16   0.11   0.07   0.05   2.30   0.40
The Markov process using a transition matrix can accommodate a non-steady state, and can include the time dependence of $\lambda$ and $\mu$ by changing the matrix elements depending on the time.
5.8. Blood Type Transition

We consider the constitution and evolution of blood types. The constituent ratios for Japanese people are AB-type: 10%, A-type: 35%, B-type: 25%, and O-type: 30%. We analyze the time evolution of the types.

We simplify some points to analyze this subject. We assume that the total population is invariable; that is, the number of dead people and that of born people are the same. The generation change occurs simultaneously. There is no flow from other regions. The ratios are the same for male and female. This means that the numbers of the fundamental elements A, B, and O are constant.

The blood types of the next generation are determined from the current blood type constitution, and hence the evolution can be regarded as a Markov process. However, we cannot construct a corresponding matrix with the above data alone. We have four blood types AB, A, B, and O. However, there are 6 kinds if we consider the detailed genotypes AB, AA, AO, BB, BO, and OO. In the analysis, we use these fundamental 6 kinds of blood types, although we have no directly corresponding data. We assign the statuses X1, X2, X3, X4, X5, and X6 to the genotypes. The crosses of these 6 types decide the blood type constitution of the next generation as shown in Table 12. A crossing always generates 4 components, but some of them are the same.

Table 12. Generation of blood type

          X1 (AB)        X2 (AA)        X3 (AO)        X4 (BB)        X5 (BO)        X6 (OO)
X1 (AB)   AA,AB,AB,BB    AA,AA,AB,AB    AA,AO,AB,BO    AB,AB,BB,BB    AB,AO,BB,BO    AO,AO,BO,BO
X2 (AA)   AA,AA,AB,AB    AA,AA,AA,AA    AA,AO,AA,AO    AB,AB,AB,AB    AB,AO,AB,AO    AO,AO,AO,AO
X3 (AO)   AA,AO,AB,BO    AA,AO,AA,AO    AA,AO,AO,OO    AB,AB,BO,BO    AB,AO,BO,OO    AO,AO,OO,OO
X4 (BB)   AB,AB,BB,BB    AB,AB,AB,AB    AB,AB,BO,BO    BB,BB,BB,BB    BB,BO,BB,BO    BO,BO,BO,BO
X5 (BO)   AB,AO,BB,BO    AB,AO,AB,AO    AB,AO,BO,OO    BB,BO,BB,BO    BB,BO,BO,OO    BO,BO,OO,OO
X6 (OO)   AO,AO,BO,BO    AO,AO,AO,AO    AO,AO,OO,OO    BO,BO,BO,BO    BO,BO,OO,OO    OO,OO,OO,OO
We set the current constitution as $X_i^{(n)}$, and the next generation constitution as $X_i^{(n+1)}$. We then obtain

$$\begin{aligned}
X_1^{(n+1)} ={}& \frac{2}{4} X_1^{(n)} X_1^{(n)} + \frac{2}{4}\, 2 X_1^{(n)} X_2^{(n)} + \frac{1}{4}\, 2 X_1^{(n)} X_3^{(n)} + \frac{2}{4}\, 2 X_1^{(n)} X_4^{(n)} + \frac{1}{4}\, 2 X_1^{(n)} X_5^{(n)} \\
&+ \frac{4}{4}\, 2 X_2^{(n)} X_4^{(n)} + \frac{2}{4}\, 2 X_2^{(n)} X_5^{(n)} + \frac{2}{4}\, 2 X_3^{(n)} X_4^{(n)} + \frac{1}{4}\, 2 X_3^{(n)} X_5^{(n)}
\end{aligned} \tag{71}$$

$$X_2^{(n+1)} = \frac{1}{4} X_1^{(n)} X_1^{(n)} + \frac{2}{4}\, 2 X_1^{(n)} X_2^{(n)} + \frac{1}{4}\, 2 X_1^{(n)} X_3^{(n)} + \frac{4}{4} X_2^{(n)} X_2^{(n)} + \frac{2}{4}\, 2 X_2^{(n)} X_3^{(n)} + \frac{1}{4} X_3^{(n)} X_3^{(n)} \tag{72}$$

$$\begin{aligned}
X_3^{(n+1)} ={}& \frac{1}{4}\, 2 X_1^{(n)} X_3^{(n)} + \frac{1}{4}\, 2 X_1^{(n)} X_5^{(n)} + \frac{2}{4}\, 2 X_1^{(n)} X_6^{(n)} + \frac{2}{4}\, 2 X_2^{(n)} X_3^{(n)} + \frac{2}{4}\, 2 X_2^{(n)} X_5^{(n)} \\
&+ \frac{4}{4}\, 2 X_2^{(n)} X_6^{(n)} + \frac{2}{4} X_3^{(n)} X_3^{(n)} + \frac{1}{4}\, 2 X_3^{(n)} X_5^{(n)} + \frac{2}{4}\, 2 X_3^{(n)} X_6^{(n)}
\end{aligned} \tag{73}$$

$$X_4^{(n+1)} = \frac{1}{4} X_1^{(n)} X_1^{(n)} + \frac{2}{4}\, 2 X_1^{(n)} X_4^{(n)} + \frac{1}{4}\, 2 X_1^{(n)} X_5^{(n)} + \frac{4}{4} X_4^{(n)} X_4^{(n)} + \frac{2}{4}\, 2 X_4^{(n)} X_5^{(n)} + \frac{1}{4} X_5^{(n)} X_5^{(n)} \tag{74}$$

$$\begin{aligned}
X_5^{(n+1)} ={}& \frac{1}{4}\, 2 X_1^{(n)} X_3^{(n)} + \frac{1}{4}\, 2 X_1^{(n)} X_5^{(n)} + \frac{2}{4}\, 2 X_1^{(n)} X_6^{(n)} + \frac{2}{4}\, 2 X_3^{(n)} X_4^{(n)} + \frac{1}{4}\, 2 X_3^{(n)} X_5^{(n)} \\
&+ \frac{2}{4}\, 2 X_4^{(n)} X_5^{(n)} + \frac{4}{4}\, 2 X_4^{(n)} X_6^{(n)} + \frac{2}{4} X_5^{(n)} X_5^{(n)} + \frac{2}{4}\, 2 X_5^{(n)} X_6^{(n)}
\end{aligned} \tag{75}$$

$$X_6^{(n+1)} = \frac{1}{4} X_3^{(n)} X_3^{(n)} + \frac{1}{4}\, 2 X_3^{(n)} X_5^{(n)} + \frac{2}{4}\, 2 X_3^{(n)} X_6^{(n)} + \frac{1}{4} X_5^{(n)} X_5^{(n)} + \frac{2}{4}\, 2 X_5^{(n)} X_6^{(n)} + \frac{4}{4} X_6^{(n)} X_6^{(n)} \tag{76}$$
We easily obtain a steady state after about two or three cycle steps. The steady state depends significantly on the initial condition. Since we do not know the real ratios of the detailed genotype constitution, we use a fitting parameter $\gamma$, and the ratios are then given by

$$\begin{aligned} r_{AA} &= 0.35\gamma, & r_{AO} &= 0.35\left(1-\gamma\right) \\ r_{BB} &= 0.25\gamma, & r_{BO} &= 0.25\left(1-\gamma\right) \end{aligned} \tag{77}$$

We obtained good agreement with $\gamma = 0.2$, as shown in Table 13. The steady state is established at the first step cycle, and it is not far from the initial state. According to the results, the genotypes AA and BB are rare cases among the A and B-type people.

Table 13. Time evolution of the blood type constitution

Type   Step0   Step1   Step2   Ratio   Blood type
X1     0.10    0.10    0.10    0.10    AB
X2     0.07    0.07    0.07    0.35    A
X3     0.28    0.28    0.28
X4     0.05    0.04    0.04    0.26    B
X5     0.20    0.22    0.22
X6     0.30    0.29    0.29    0.29    O
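A sketch of the generation update: the pairwise sums of Eqs. (71)-(76) factorize into the allele (gamete) frequencies, which the code below exploits; with γ = 0.2 it reproduces Table 13.

```python
import numpy as np

def next_generation(X):
    """One generation step, equivalent to Eqs. (71)-(76).

    The pairwise crossing sums factorize into the gamete frequencies
    pA, pB, pO, which keeps the code short."""
    X1, X2, X3, X4, X5, X6 = X            # AB, AA, AO, BB, BO, OO
    pA = X2 + (X1 + X3) / 2
    pB = X4 + (X1 + X5) / 2
    pO = X6 + (X3 + X5) / 2
    return np.array([2 * pA * pB, pA**2, 2 * pA * pO,
                     pB**2, 2 * pB * pO, pO**2])

gamma = 0.2                               # fitting parameter of Eq. (77)
X = np.array([0.10, 0.35 * gamma, 0.35 * (1 - gamma),
              0.25 * gamma, 0.25 * (1 - gamma), 0.30])
for step in range(3):
    X = next_generation(X)
    print(f"Step{step + 1}:", np.round(X, 2))   # matches Table 13
```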
6. STATUS AFTER LONG TIME STEPS

When the transition matrix is decided, we can sometimes discuss the status after long time steps analytically.
6.1. Status after k Steps

The status D after k steps can be expressed as

$$D = A^k B \tag{78}$$

where A is a transition matrix and B is the initial vector, given by

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \tag{79}$$

$$B = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} \tag{80}$$

We denote the eigenvectors and eigenvalues of the matrix A as $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ and $\lambda_1, \lambda_2, \ldots, \lambda_n$, respectively. We then constitute a matrix P as

$$P = \begin{pmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_n \end{pmatrix} \tag{81}$$

where

$$A \mathbf{x}_i = \lambda_i \mathbf{x}_i \tag{82}$$

Therefore,

$$AP = \begin{pmatrix} \lambda_1 \mathbf{x}_1 & \lambda_2 \mathbf{x}_2 & \cdots & \lambda_n \mathbf{x}_n \end{pmatrix} = P \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix} \tag{83}$$

We also evaluate the inverse matrix of P, denoted as $P^{-1}$. We can then realize a diagonal matrix as

$$P^{-1} A P = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix} \tag{84}$$

Therefore, we obtain

$$\left( P^{-1} A P \right)^k = \begin{pmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{pmatrix} \tag{85}$$

On the other hand, we can expand the left side of Eq. (85) as

$$\left( P^{-1} A P \right)^k = P^{-1} A P \, P^{-1} A P \cdots P^{-1} A P = P^{-1} A \left( P P^{-1} \right) A \left( P P^{-1} \right) \cdots A P = P^{-1} A^k P \tag{86}$$
Therefore, we obtain

$$A^k = P \begin{pmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{pmatrix} P^{-1} \tag{87}$$

Finally, we obtain the status after the k step processes as

$$D = P \begin{pmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{pmatrix} P^{-1} B \tag{88}$$
6.2. Steady State

If there is a steady state, we can expect

$$A \mathbf{x} = \mathbf{x} \tag{89}$$

This can be modified as

$$\left( A - E \right) \mathbf{x} = 0 \tag{90}$$

Therefore, we can evaluate the existence of the steady state from

$$\det \left( A - E \right) = 0 \tag{91}$$
6.3. Vanishing Process

If we have some vanishing elements, we can arrange the transition matrix in the form

$$A = \begin{pmatrix} E & R \\ 0 & Q \end{pmatrix} \tag{92}$$

where E is a unit matrix, which expresses the statuses where an object stays forever. Q expresses the transitions among the tentative statuses, and R expresses the transition from the tentative statuses to the absorbing statuses. We denote as D the matrix whose elements correspond to the probability that a tentative status j finally transits to an absorbing status k. The matrix size of D is the same as that of R. The status is transferred through R directly, or through the statuses in Q and then through R, to finally reach the status k:

$$D = R + DQ \tag{93}$$

We then obtain

$$D = R \left( E - Q \right)^{-1} \tag{94}$$
7. A NETWORK LOOP

We form various kinds of networks. If there are loops in a network, its characteristics become unstable. Therefore, it is important to monitor the existence of loops in the network. We apply the transition matrix to monitor the loops.

7.1. Network Matrix

Let us consider a simple network as shown in Figure 26. The flow direction is expressed by arrows. The corresponding process is n1 → n2 → n3, and then it ends.

Figure 26. Diagram for a network path.

We consider the matrix associated with the network. The column number corresponds to the status the flow starts from, and the row number corresponds to the status where it ends. The corresponding network matrix is then expressed by
$$N = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \tag{95}$$

Since we focus on the flow, the elements are all 1. The sum of the elements along each column is not 1 in general. We only notice whether an element is 0 or not. Now, we can express the flow n1 → n2 → n3. We consider the initial vector

$$\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \tag{96}$$

Multiplying the network matrix by the initial vector, we obtain

$$\begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \tag{97}$$

This means that the status is in 2. Multiplying the network matrix by the vector of Eq. (97), we obtain

$$\begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \tag{98}$$

This expresses that the status is now 3. Multiplying the network matrix by the vector again, we obtain

$$\begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \tag{99}$$

This expresses the process flow when we start from the node 1. Let us use an initial vector given by
$$\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \tag{100}$$

Multiplying the network matrix by the vector continuously, we obtain

$$\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \rightarrow \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \rightarrow \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \tag{101}$$

This expresses the flow when we start from the node 2. The resultant vector shows the location of the process after each step. If there is no loop in the network, all the elements eventually become 0.
7.2. A Network Path with a Loop

Figure 27 shows a network path with a loop. It is clear that the nodes 3 and 4 form a loop. We want an algorithm to detect the loop.

Figure 27. Diagram of a network path with a loop.

The corresponding network matrix is given by

$$N = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix} \tag{102}$$
We set an initial vector with all elements 1, and multiply N by the initial vector:

$$N \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 2 \\ 1 \\ 1 \\ 1 \end{pmatrix} \tag{103}$$

The node 1 vanishes, and the other nodes are not 0. The node 3 element is 2 because it has fluxes from the nodes 2 and 4. Multiplying by N once more, we obtain

$$N^2 \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 2 \\ 2 \\ 1 \\ 1 \end{pmatrix} \tag{104}$$
The element of the node 2 becomes 0. Multiplying once more, we obtain

$$N^3 \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 2 \\ 2 \\ 2 \\ 1 \end{pmatrix} \tag{105}$$

Multiplying once more again, we obtain

$$N^4 \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 2 \\ 2 \\ 2 \\ 2 \end{pmatrix} \tag{106}$$

It is clear that the feature of the vector is now invariant: the elements of the nodes 1 and 2 are 0, and the other elements are not 0.
Therefore, we regard A as the set of nodes that are on the loop or downstream of it, which is given by

$$A = \left\{ n_3, n_4, n_5, n_6 \right\} \tag{107}$$

We then consider the transpose matrix of N, given by

$$N^T = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix} \tag{108}$$
This corresponds to the network diagram with the opposite direction arrows. We perform a similar operation. We set an initial vector with all elements 1, and multiply $N^T$ by the initial vector:

$$N^T \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 2 \\ 1 \\ 0 \end{pmatrix} \tag{109}$$

Multiplying by $N^T$ once more, we obtain

$$\left(N^T\right)^2 \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 2 \\ 2 \\ 0 \\ 0 \end{pmatrix} \tag{110}$$
Multiplying by $N^T$ once more, we obtain

$$\left(N^T\right)^3 \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 2 \\ 2 \\ 0 \\ 0 \end{pmatrix} \tag{111}$$

Multiplying by $N^T$ once more, we obtain

$$\left(N^T\right)^4 \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \\ 2 \\ 2 \\ 0 \\ 0 \end{pmatrix} \tag{112}$$
It is clear that the feature of the vector is now invariant: the elements of the nodes 5 and 6 are 0, and the other elements are not 0. Therefore, we regard B as the set of nodes that are on the loop or upstream of it, which is given by

$$B = \left\{ n_1, n_2, n_3, n_4 \right\} \tag{113}$$

We set the set of nodes on the loop as L, and obtain

$$L = A \cap B = \left\{ n_3, n_4 \right\} \tag{114}$$

Therefore, the nodes on the loop are determined to be 3 and 4. The above algorithm can be applied to any network. We should take care to multiply more times than the number of network nodes.
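The whole loop-detection procedure can be written compactly, as sketched below: iterate N and its transpose on the all-ones vector more times than there are nodes, and intersect the surviving node sets.

```python
import numpy as np

def nodes_on_loops(N):
    """Return the indices of the nodes on loop paths (Eq. (114))."""
    n = N.shape[0]
    def survivors(M):
        v = np.ones(n)
        for _ in range(n + 1):        # more multiplications than nodes
            v = M @ v
        return set(np.nonzero(v)[0])  # nodes whose elements stay nonzero
    return sorted(survivors(N) & survivors(N.T))

# Network of Figure 27: edges 1->2, 2->3, 3->4, 4->3, 4->5, 5->6.
N = np.zeros((6, 6))
for src, dst in [(1, 2), (2, 3), (3, 4), (4, 3), (4, 5), (5, 6)]:
    N[dst - 1, src - 1] = 1
print([i + 1 for i in nodes_on_loops(N)])   # [3, 4]
```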
SUMMARY

To summarize the results in this chapter:

The status after k steps can be expressed with the transition matrix and the initial vector, and is given by

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} & a_{15} \\ a_{21} & a_{22} & a_{23} & a_{24} & a_{25} \\ a_{31} & a_{32} & a_{33} & a_{34} & a_{35} \\ a_{41} & a_{42} & a_{43} & a_{44} & a_{45} \\ a_{51} & a_{52} & a_{53} & a_{54} & a_{55} \end{pmatrix}^k \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \end{pmatrix}$$

This can be extended to a system with a source as

$$\begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ p_1 & a_{11} & a_{12} & a_{13} & a_{14} & a_{15} \\ p_2 & a_{21} & a_{22} & a_{23} & a_{24} & a_{25} \\ p_3 & a_{31} & a_{32} & a_{33} & a_{34} & a_{35} \\ p_4 & a_{41} & a_{42} & a_{43} & a_{44} & a_{45} \\ p_5 & a_{51} & a_{52} & a_{53} & a_{54} & a_{55} \end{pmatrix}^k \begin{pmatrix} a \\ b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \end{pmatrix}$$

We can monitor a loop by multiplying a network matrix, more times than the number of elements, with the initial vector whose elements are all 1. We record the node numbers whose elements are not 0. We then multiply the transpose network matrix, more times than the number of elements, with the same initial vector, and again record the node numbers whose elements are not 0. The nodes selected in both processes are the nodes on the loop path.
Chapter 14

RANDOM NUMBER

ABSTRACT

We can simulate probability phenomena using random numbers. Since probability variables follow various probability distribution functions, we need to generate the random numbers related to the corresponding distributions. We study how to generate such random numbers in this chapter.

Keywords: random number, regularity, random series, uniform distribution, Poisson distribution, normal distribution, exponential distribution

1. INTRODUCTION

We want to simulate a probability process before we obtain the corresponding real data, or to appreciate the obtained data. In the simulation, we need to generate random numbers. Since there are various kinds of probability distribution functions, we need to generate the corresponding various kinds of random numbers.
2. CHARACTERISTICS OF RANDOM NUMBERS

We assume a vessel with ten balls denoted as 0, 1, 2, ..., and 9. We stir the balls, pick one of them, and record its number. We then return the ball into the vessel, stir the balls, pick one, and record the number. We continue the process. The resultant numbers correspond to random numbers, which have the characteristics below.
Characteristic 1: Principle of Equal A Priori Probabilities

We repeat the above trials n times, and obtain a number series. The ratio of the count $k_i$ associated with the number i to n approaches the value below; that is, it is expressed by

$$\lim_{n \to \infty} \frac{k_i}{n} = \frac{1}{10} \qquad \text{for } i = 0, 1, 2, \ldots, 9 \tag{1}$$

This means that all the numbers 0, 1, 2, ..., 9 have equal a priori probabilities.

Characteristic 2: No Regularity

We obtain a first number. The second number does not depend on the first number. This characteristic is called no regularity, which means that there is no correlation between them.
3. UNIFORM RANDOM NUMBER SERIES

Let us consider a regular dodecahedron die whose faces have the numbers 1, 2, ..., 12. We convert the number n to n - 1. When we get the number 11 or 12, we do not record it and try again until we obtain a number less than 11. We can then obtain a number series between 0 and 9. If we use two dice and assign the number of the first die to the first digit and that of the second die to the second digit, we obtain a random number series between 00 and 99. If we use n dice, we can generate a random number series between 0 and $10^n - 1$.
4. NUMERICAL UNIFORM RANDOM NUMBER GENERATION METHOD

We can generate random numbers numerically. Lehmer proposed a method to generate a random number series. We set

$$x_{n+1} = 15 x_n \mod \left( 10^6 + 1 \right) \tag{2}$$

$$x_0 = 1 \tag{3}$$
This means that $x_{n+1}$ is the remainder of $15 x_n$ divided by $10^6 + 1$. This can be performed as

$$\begin{aligned}
x_0 &= 1 \\
x_1 &= 15 \\
x_2 &= 15 \times 15 \mod \left(10^6 + 1\right) = 225 \\
x_3 &= 15 \times 225 \mod \left(10^6 + 1\right) = 3375 \\
x_4 &= 15 \times 3375 \mod \left(10^6 + 1\right) = 50625 \\
x_5 &= 15 \times 50625 \mod \left(10^6 + 1\right) = 759375 \\
x_6 &= 15 \times 759375 \mod \left(10^6 + 1\right) = 390614
\end{aligned} \tag{4}$$

We then obtain a random series given by

$$1, 15, 225, 3375, 50625, 759375, 390614, \ldots \tag{5}$$

This is a random number series between 0 and $10^6$. Dividing the numbers by $10^6$, we obtain a random number series in the region $\left(0, 1\right)$. We can generalize this further, and define the series below.
$$x_{n+1} = a x_n \mod \left( 10^m + 1 \right) \tag{6}$$

$$x_0 = b \tag{7}$$

We can then obtain a number series given by

$$x_0, x_1, x_2, x_3, x_4, \ldots \tag{8}$$

This is a random number series between 0 and $10^m$. Dividing the numbers by $10^m$, we obtain a random number series in the region $\left(0, 1\right)$.
This random series is not perfect, and has some periodic characteristics. Therefore, it is called a quasi-random series. Roughly speaking, the periodicity is about $10^m$. This should be much larger than the number of trial events, and the series is then worth using in practical cases.
5. TESTING OF RANDOM NUMBER SERIES

We should check a random number series from the standpoints of the principle of equal a priori probabilities and of no regularity.

5.1. Testing of Equal A Priori Probabilities

We divide the range $\left(0, 1\right)$ into l regions. The values at the edges of the divided regions are denoted as $p_i$. Therefore, we have

$$p_0 = 0, \qquad p_l = 1 \tag{9}$$

We denote the number of the data in the i-th region as $f_i$. The expected data number $F_i$ for the i-th region is given by

$$F_i = N \left( p_i - p_{i-1} \right) \tag{10}$$

We define the $\chi^2$ as

$$\chi^2 = \sum_{i=1}^{l} \frac{\left( f_i - F_i \right)^2}{F_i} \tag{11}$$

This can be regarded to follow a $\chi^2$ distribution with a freedom of $l - 1$, and the critical value $\chi_c^2$ is denoted as

$$\chi_c^2 = \chi^2 \left( l - 1, P \right) \tag{12}$$

where P is the prediction probability. If the $\chi^2$ is smaller than $\chi_c^2$, the equal a priori probability property is valid.
5.2. Testing of No Regularity

Correlation Factor Testing

To test the no regularity of a random number series, we utilize a correlation factor. Since a random number series is a one-dimensional sequence, we generate a two-dimensional one from the series. We use a number and the number k steps apart, that is, the pair $x_i$ and $x_{i+k}$, and the correlation factor is evaluated as

$$r = \frac{\dfrac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)\left(x_{i+k}-\bar{x}\right)}{\dfrac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2} = \frac{\dfrac{1}{n}\sum_{i=1}^{n} x_i x_{i+k} - \bar{x}^2}{\dfrac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2} \tag{13}$$

When $i \ge n - k + 1$, we regard $x_{i+k}$ as

$$x_{i+k} = x_{i+k-n} \tag{14}$$

The data number is n. We introduce a variable

$$t = \sqrt{n-2}\, \frac{r}{\sqrt{1 - r^2}} \tag{15}$$

This follows a t distribution with a freedom of $n - 2$, as shown in Chapter 2 of Volume 2. The critical value $t_c$ is denoted as

$$t_c = t \left( n - 2, P \right) \tag{16}$$
where P is the prediction probability. If the $t$ is smaller than $t_c$, the no regularity is ensured. We can select any k, but values between 1 and 5 are frequently used.
Combination Testing

We divide a random number series into divisions of a certain length, for example, a length of 10. We then obtain m divisions. We further categorize each random number as below: if the random number is less than 0.5, we assign it to 0, and otherwise we assign it to 1. We count the number of 1s in each division. If the random number series has no regularity, the expected number of divisions $E_k$ that have k 1s is

$$E_k = m\, {}_{10}C_k \left( \frac{1}{2} \right)^{10} \qquad \text{for } k = 0, 1, 2, \ldots, 10 \tag{17}$$

We denote the observed number of divisions with k 1s as $m_k$, and define the $\chi^2$ as

$$\chi^2 = \sum_{k} \frac{\left( m_k - E_k \right)^2}{E_k} \tag{18}$$

This can be regarded to follow a $\chi^2$ distribution with a freedom of $m - 1$, and the critical value $\chi_c^2$ is denoted as

$$\chi_c^2 = \chi^2 \left( m - 1, P \right) \tag{19}$$

where P is the prediction probability. If the $\chi^2$ is smaller than $\chi_c^2$, the equal a priori probability property is ensured.
Runs Testing

We assign random numbers less than 0.5 to A and the others to B. The random number series is converted, for example, as below:

$$BAABBBABBABAAABABB \tag{20}$$
We call AA, BB, AAA, and so on, where we find continuous character rows, runs. We define the length of each run as the number of its characters. We test that the order of A and B has no regularity. The numbers of A and B are $n_A$ and $n_B$, respectively. We set n as the number of the random numbers, and hence

$$n = n_A + n_B \tag{21}$$

If no regularity is ensured, it is known that the average and variance of the number of runs approach

$$\mu = \frac{2 n_A n_B}{n_A + n_B} + 1 \tag{22}$$

$$\sigma^2 = \frac{2 n_A n_B \left( 2 n_A n_B - n_A - n_B \right)}{\left( n_A + n_B \right)^2 \left( n_A + n_B - 1 \right)} \tag{23}$$

with increasing n. We then obtain a normalized variable as

$$z = \frac{x - \mu}{\sigma} \tag{24}$$

We assume that it follows a standard normal distribution. The corresponding critical value is denoted as $z\left(P\right)$, where P denotes the predictive probability. If the $z$ is smaller than $z\left(P\right)$, no regularity is ensured.
6. RANDOM NUMBER SERIES ASSOCIATED WITH VARIOUS PROBABILITY DISTRIBUTIONS

Utilizing a uniform random series, we can convert it to one for various probability distribution functions. Uniform random numbers consist of random numbers in the range $\left(0, 1\right)$. However, we treat uniform random natural numbers in the range $\left(1, 10\right)$ here to explain the procedure clearly.

Figure 1. Random number generation for a distribution function f, which is converted from the uniform random numbers. (a) Uniform random number. (b) Converted random number.

In the uniform random number series, the probability that a certain number occurs is the same, which is shown in Figure 1(a) as f. We have a probability distribution function as shown in Figure 1(b). We convert the number x to the number t as shown in
Table 1. We can then obtain a random number associated with the probability distribution function.

Table 1. Converted random number table

x:   1    2    3    4    5    6    7    8    9    10
t:   t1   2t1  3t1  4t1  ...
7. INVERSE-TYPE RANDOM NUMBER GENERATION FOR A GENERAL PROBABILITY FUNCTION

We investigate the procedure in the previous section mathematically. The probability density associated with the uniform number with range $\left(0, 1\right)$ is denoted as $g\left(x\right)$, and is 1. We express the uniform random series with the range $\left(0, 1\right)$ as $\mathrm{Rand}\left(1\right)$. The conversion is expressed by

$$g\left(x\right) \Delta x = f\left(t\right) \Delta t \tag{25}$$

Since $g\left(x\right) = 1$, this is reduced to

$$\Delta x = f\left(t\right) \Delta t \tag{26}$$

Expressing this equation in an integral form, we obtain

$$\int_0^x dx' = \int_0^t f\left(t'\right) dt' \tag{27}$$

Therefore, we obtain
$$x = F\left(t\right) \tag{28}$$

where

$$F\left(t\right) = \int_0^t f\left(t'\right) dt' \tag{29}$$

Setting the inverse function of $F\left(t\right)$ as $\mathrm{invF}$, we obtain

$$t = \mathrm{invF}\left(x\right) = \mathrm{invF}\left(\mathrm{Rand}\left(1\right)\right) \tag{30}$$

This is the form of the random number associated with an arbitrary probability distribution function $f\left(t\right)$.
8. RANDOM NUMBER SERIES FOR AN EXPONENTIAL DISTRIBUTION

An exponential probability distribution is expressed by

$$f\left(t\right) = \lambda \exp\left(-\lambda t\right) \tag{31}$$

We then obtain the integral function as

$$F\left(t\right) = \int_0^t \lambda \exp\left(-\lambda t'\right) dt' = 1 - \exp\left(-\lambda t\right) \tag{32}$$

Therefore, we obtain

$$x = 1 - \exp\left(-\lambda t\right) \tag{33}$$

Finally, we obtain

$$t = -\frac{1}{\lambda} \ln\left(1 - x\right) = -\frac{1}{\lambda} \ln\left(1 - \mathrm{Rand}\left(1\right)\right) \tag{34}$$

$\mathrm{Rand}\left(1\right)$ is a random series with a range of $\left(0, 1\right)$. Therefore, $1 - \mathrm{Rand}\left(1\right)$ is also a random series with a range of $\left(0, 1\right)$. Therefore, we can also use the form given by
$$t = -\frac{1}{\lambda} \ln\left(\mathrm{Rand}\left(1\right)\right) \tag{35}$$

The range of $\mathrm{Rand}\left(1\right)$ is commonly given as

$$0 \le \mathrm{Rand}\left(1\right) < 1 \tag{36}$$

That is, we have a probability of $\mathrm{Rand}\left(1\right) = 0$, but no probability of $\mathrm{Rand}\left(1\right) = 1$ in this case. When $\mathrm{Rand}\left(1\right)$ is 0, Eq. (35) diverges infinitely. Therefore, Eq. (34) is preferred. If the range of $\mathrm{Rand}\left(1\right)$ is given as below,

$$0 < \mathrm{Rand}\left(1\right) \le 1 \tag{37}$$

we should use Eq. (35) instead of Eq. (34) to avoid the infinite divergence problem.
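A sketch of Eq. (34). Python's random.random() returns values in [0, 1), so the 1 - Rand(1) form is the safe one, as argued above.

```python
import math
import random

def exponential_random(lam):
    """Inverse-transform sampling of f(t) = lam * exp(-lam*t), Eq. (34)."""
    return -math.log(1.0 - random.random()) / lam  # random() is in [0, 1)

lam = 0.5
samples = [exponential_random(lam) for _ in range(100000)]
print(sum(samples) / len(samples))  # sample mean, close to 1/lam = 2
```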
9. RANDOM NUMBER SERIES FOR A POISSON DISTRIBUTION

We apply the above procedure to the Poisson distribution and obtain

$$x = \int_0^k \frac{\left(\lambda t\right)^{k'}}{k'!} \exp\left(-\lambda t\right) dk' \tag{38}$$

We should obtain the k that satisfies Eq. (38). We treat k and k' as continuous numbers in Eq. (38). However, they are natural numbers in reality in the Poisson distribution. Therefore, Eq. (38) is not valid as it stands. We express the right side of Eq. (38) correctly, and obtain

$$x = \mathrm{Rand}\left(1\right) = \sum_{k'=0}^{k} \frac{\left(\lambda t\right)^{k'}}{k'!} \exp\left(-\lambda t\right) \tag{39}$$

The left side and the right side are not equal in general. Therefore, we evaluate the minimum k for which the sum exceeds $x = \mathrm{Rand}\left(1\right)$. A Poisson distribution can be regarded as one where the event occurrence time period follows an exponential distribution.
We generate the exponential random numbers that correspond to the event occurrence periods. When the sum of the periods exceeds t for the first time, the number k can be the target random number. This process can be expressed by

$$t \le t_1 + t_2 + \cdots + t_k = -\frac{1}{\lambda} \ln\left[\left(1 - \mathrm{Rand}_1\left(1\right)\right)\left(1 - \mathrm{Rand}_2\left(1\right)\right) \cdots \left(1 - \mathrm{Rand}_k\left(1\right)\right)\right] \tag{40}$$

where

$$t_n = -\frac{1}{\lambda} \ln\left(1 - \mathrm{Rand}_n\left(1\right)\right) \tag{41}$$

We then have

$$\left(1 - \mathrm{Rand}_1\left(1\right)\right)\left(1 - \mathrm{Rand}_2\left(1\right)\right) \cdots \left(1 - \mathrm{Rand}_k\left(1\right)\right) \le \exp\left(-\lambda t\right) \tag{42}$$

The minimum k that holds Eq. (42) corresponds to the target k. In this case, we need not care about the infinite divergence problem, and hence can use the relation below.

$$\prod_i \left(1 - \mathrm{Rand}_i\left(1\right)\right) \to \prod_i \mathrm{Rand}_i\left(1\right) \tag{43}$$

Therefore, it can be expressed in the simpler form

$$\mathrm{Rand}_1\left(1\right) \mathrm{Rand}_2\left(1\right) \cdots \mathrm{Rand}_k\left(1\right) = \prod_{n=1}^{k} \mathrm{Rand}_n\left(1\right) \le \exp\left(-\lambda t\right) \tag{44}$$
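A sketch of the product criterion of Eq. (44): uniform numbers are multiplied until the product falls to exp(-λt) or below, and the number of completed inter-event periods (one less than the count of multiplications) gives the Poisson random number.

```python
import math
import random

def poisson_random(lam, t):
    """Poisson random number with mean lam*t via Eq. (44)."""
    threshold = math.exp(-lam * t)
    k, product = 0, 1.0
    while True:
        product *= random.random()
        if product <= threshold:   # the product has broken Eq. (44)
            return k               # completed events before the sum exceeds t
        k += 1

samples = [poisson_random(0.5, 4.0) for _ in range(100000)]
print(sum(samples) / len(samples))   # close to lam*t = 2
```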
10. RANDOM NUMBER SERIES FOR A NORMAL DISTRIBUTION

The normal distribution is given by

$$f\left(t\right) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left[-\frac{\left(t - \mu\right)^2}{2\sigma^2}\right] \tag{45}$$

The integral is given by

$$F\left(t\right) = \int_{-\infty}^{t} \frac{1}{\sqrt{2\pi}\sigma} \exp\left[-\frac{\left(t' - \mu\right)^2}{2\sigma^2}\right] dt' = \frac{1}{2}\left[1 + \mathrm{Erf}\left(\frac{t - \mu}{\sqrt{2}\sigma}\right)\right] \tag{46}$$
Therefore, we obtain

$$t = \mu + \sqrt{2}\sigma\, \mathrm{Erf}^{-1}\left(2\,\mathrm{Rand}\left(1\right) - 1\right) \tag{47}$$
We can also generate a random number series utilizing the central limit theorem. We convert t as

$$z = \frac{t - \mu}{\sigma} \tag{48}$$

This follows a standard normal distribution. The average of $\mathrm{Rand}\left(1\right)$ is given by

$$\int_0^1 x \cdot 1\, dx = \frac{1}{2} \tag{49}$$

The corresponding variance is given by

$$\int_0^1 \left(x - \frac{1}{2}\right)^2 \cdot 1\, dx = \int_0^1 x^2 dx - \int_0^1 x\, dx + \frac{1}{4} = \frac{1}{3} - \frac{1}{2} + \frac{1}{4} = \frac{1}{12} \tag{50}$$

When we sum up n such variables and increase n, the distribution approaches a normal distribution with an average of $n/2$ and a variance of $n/12$. Therefore, the variable
Kunihiro Suzuki
314 n n Rnad i 1 2 z i 1 n 12
(51)
can be regarded to follow a standard normal distribution. The converted variable n n Rnad i 1 2 t i 1 n 12
follows a normal distribution with an average of n 12 , we obtain
(52)
and standard deviation of . Setting
12 t Rnadi 1 6 i 1
(53)
The value that follows a normal distribution has both negative and positive signs. However, we only use positive values in some cases, for example, for service times and the time periods between events. In that case, we neglect the negative random numbers and only take the positive ones. The real average of the random numbers then deviates from the average used in the distribution expression, and the average $\mu'$ should be larger than $\mu$. We evaluate

$$\mu' = \frac{\displaystyle\int_0^{\infty} t\, \frac{1}{\sqrt{2\pi}\sigma} \exp\left[-\frac{\left(t-\mu\right)^2}{2\sigma^2}\right] dt}{\displaystyle\int_0^{\infty} \frac{1}{\sqrt{2\pi}\sigma} \exp\left[-\frac{\left(t-\mu\right)^2}{2\sigma^2}\right] dt} \tag{54}$$

Introducing a variable

$$s = \frac{t - \mu}{\sqrt{2}\sigma} \tag{55}$$
we obtain

$$\mu' = \frac{\displaystyle\frac{\sqrt{2}\sigma}{\sqrt{\pi}} \int_{-\mu/\sqrt{2}\sigma}^{\infty} s \exp\left(-s^2\right) ds + \frac{\mu}{\sqrt{\pi}} \int_{-\mu/\sqrt{2}\sigma}^{\infty} \exp\left(-s^2\right) ds}{\displaystyle\frac{1}{\sqrt{\pi}} \int_{-\mu/\sqrt{2}\sigma}^{\infty} \exp\left(-s^2\right) ds} \tag{56}$$

That is, we obtain

$$\mu' = \mu + \frac{\sqrt{\dfrac{2}{\pi}}\,\sigma \exp\left(-\dfrac{\mu^2}{2\sigma^2}\right)}{1 + \mathrm{erf}\left(\dfrac{\mu}{\sqrt{2}\sigma}\right)} \tag{57}$$

This is larger than $\mu$, as is expected.
11. RANDOM NUMBER SERIES FOR NATURAL NUMBERS BETWEEN 1 AND N

We want to generate natural numbers such as those of a die, which can be realized as

$$\mathrm{Int}\left[\mathrm{Rand}\left(1\right) \times n\right] + 1 \tag{58}$$
12. TWO RANDOM NUMBERS THAT FOLLOW NORMAL DISTRIBUTIONS WITH A CORRELATION FACTOR OF ρ

We first generate two independent random numbers $X_1$ and $X_2$ that follow standard normal distributions. However, this distribution is not a constraint; we can use any type of random numbers. We then convert them as

$$Y_1 = X_1 \tag{59}$$

$$Y_2 = \rho X_1 + \sqrt{1 - \rho^2}\, X_2 \tag{60}$$

The corresponding expected values and variances are evaluated as below.

$$E\left(Y_1\right) = E\left(X_1\right) \tag{61}$$

$$V\left(Y_1\right) = V\left(X_1\right) \tag{62}$$

$$E\left(Y_2\right) = E\left(\rho X_1 + \sqrt{1-\rho^2}\, X_2\right) = \rho E\left(X_1\right) + \sqrt{1-\rho^2}\, E\left(X_2\right) \tag{63}$$

$$V\left(Y_2\right) = \rho^2 V\left(X_1\right) + \left(1-\rho^2\right) V\left(X_2\right) + 2\rho\sqrt{1-\rho^2}\, \mathrm{Cov}\left(X_1, X_2\right) = \rho^2 V\left(X_1\right) + \left(1-\rho^2\right) V\left(X_2\right) \tag{64}$$

$$\mathrm{Cov}\left(Y_1, Y_2\right) = E\left[X_1\left(\rho X_1 + \sqrt{1-\rho^2}\, X_2\right)\right] = \rho E\left(X_1^2\right) + \sqrt{1-\rho^2}\,\mathrm{Cov}\left(X_1, X_2\right) = \rho E\left(X_1^2\right) \tag{65}$$
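A sketch of Eqs. (59) and (60): two independent standard normal numbers are mixed to give the target correlation ρ; the empirical correction of Eq. (74) below is omitted for brevity.

```python
import random

def correlated_pair(rho):
    """Two standard normal random numbers with correlation rho (Eqs. (59)-(60))."""
    x1 = random.gauss(0.0, 1.0)
    x2 = random.gauss(0.0, 1.0)
    y1 = x1
    y2 = rho * x1 + (1.0 - rho**2) ** 0.5 * x2
    return y1, y2

rho = 0.7
pairs = [correlated_pair(rho) for _ in range(100000)]
cov = sum(a * b for a, b in pairs) / len(pairs)
print(cov)   # sample covariance, close to rho = 0.7
```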
In reality, the sample averages of $X_1$ and $X_2$ deviate from 0, and the variances also deviate from 1. We evaluate the averages and variances of $X_1$ and $X_2$ as

$$\mu_1 = \frac{1}{n} \sum_{i=1}^{n} x_{i1} \tag{66}$$

$$\sigma_1^2 = \frac{1}{n} \sum_{i=1}^{n} x_{i1}^2 - \mu_1^2 \tag{67}$$

$$\mu_2 = \frac{1}{n} \sum_{i=1}^{n} x_{i2} \tag{68}$$

$$\sigma_2^2 = \frac{1}{n} \sum_{i=1}^{n} x_{i2}^2 - \mu_2^2 \tag{69}$$

We further evaluate the correlation factor as

$$\tilde{\rho} = \frac{1}{n \sigma_1 \sigma_2} \sum_{i=1}^{n} \left( x_{i1} - \mu_1 \right) \left( x_{i2} - \mu_2 \right) \tag{70}$$
We then normalize the two variables as

$$z_{i1} = \frac{x_{i1} - \mu_1}{\sqrt{\sigma_1^2}} \tag{71}$$

$$z_{i2} = \frac{x_{i2} - \mu_2}{\sqrt{\sigma_2^2}} \tag{72}$$

We convert them as

$$y_{i1} = z_{i1} \tag{73}$$

$$y_{i2} = \left( \rho - \tilde{\rho}\, \frac{\sqrt{1-\rho^2}}{\sqrt{1-\tilde{\rho}^2}} \right) z_{i1} + \frac{\sqrt{1-\rho^2}}{\sqrt{1-\tilde{\rho}^2}}\, z_{i2} \tag{74}$$

The averages are then 0, and the variances are as below.

$$E\left(Y_1\right) = E\left(Z_1\right) = 0 \tag{75}$$

$$V\left(Y_1\right) = V\left(Z_1\right) = 1 \tag{76}$$
$$E\left(Y_2\right) = \left( \rho - \tilde{\rho}\, \frac{\sqrt{1-\rho^2}}{\sqrt{1-\tilde{\rho}^2}} \right) E\left(Z_1\right) + \frac{\sqrt{1-\rho^2}}{\sqrt{1-\tilde{\rho}^2}}\, E\left(Z_2\right) = 0 \tag{77}$$

$$\begin{aligned}
V\left(Y_2\right) &= \left( \rho - \tilde{\rho}\, \frac{\sqrt{1-\rho^2}}{\sqrt{1-\tilde{\rho}^2}} \right)^2 V\left(Z_1\right) + \frac{1-\rho^2}{1-\tilde{\rho}^2}\, V\left(Z_2\right) + 2 \left( \rho - \tilde{\rho}\, \frac{\sqrt{1-\rho^2}}{\sqrt{1-\tilde{\rho}^2}} \right) \frac{\sqrt{1-\rho^2}}{\sqrt{1-\tilde{\rho}^2}}\, \mathrm{Cov}\left(Z_1, Z_2\right) \\
&= \rho^2 - 2\rho\tilde{\rho}\, \frac{\sqrt{1-\rho^2}}{\sqrt{1-\tilde{\rho}^2}} + \tilde{\rho}^2\, \frac{1-\rho^2}{1-\tilde{\rho}^2} + \frac{1-\rho^2}{1-\tilde{\rho}^2} + 2\rho\tilde{\rho}\, \frac{\sqrt{1-\rho^2}}{\sqrt{1-\tilde{\rho}^2}} - 2\tilde{\rho}^2\, \frac{1-\rho^2}{1-\tilde{\rho}^2} \\
&= \rho^2 + \left(1 - \tilde{\rho}^2\right) \frac{1-\rho^2}{1-\tilde{\rho}^2} = 1
\end{aligned} \tag{78}$$

The covariance is given by

$$\mathrm{Cov}\left(Y_1, Y_2\right) = \mathrm{Cov}\left(Z_1,\ \left( \rho - \tilde{\rho}\, \frac{\sqrt{1-\rho^2}}{\sqrt{1-\tilde{\rho}^2}} \right) Z_1 + \frac{\sqrt{1-\rho^2}}{\sqrt{1-\tilde{\rho}^2}}\, Z_2 \right) = \rho - \tilde{\rho}\, \frac{\sqrt{1-\rho^2}}{\sqrt{1-\tilde{\rho}^2}} + \frac{\sqrt{1-\rho^2}}{\sqrt{1-\tilde{\rho}^2}}\, \tilde{\rho} = \rho \tag{79}$$
SUMMARY

To summarize the results in this chapter:

We set

$$x_{n+1} = a x_n \mod \left( 10^m + 1 \right)$$
$$x_0 = b$$

We can then obtain a number series given by

$$x_0, x_1, x_2, x_3, x_4, \ldots$$

This is a random number series between 0 and $10^m$. Dividing the numbers by $10^m$, we obtain a random number series in the region $\left(0, 1\right)$.

We can generate the random number associated with a probability function $f\left(t\right)$ as below. We integrate the function as

$$F\left(t\right) = \int_0^t f\left(t'\right) dt'$$

Setting the inverse function of $F\left(t\right)$ as $\mathrm{invF}$, we obtain

$$t = \mathrm{invF}\left(x\right) = \mathrm{invF}\left(\mathrm{Rand}\left(1\right)\right)$$

An exponential probability distribution is expressed by

$$f\left(t\right) = \lambda \exp\left(-\lambda t\right)$$

and the related random number can be generated as

$$t = -\frac{1}{\lambda} \ln\left(1 - x\right) = -\frac{1}{\lambda} \ln\left(1 - \mathrm{Rand}\left(1\right)\right)$$

The random number associated with a Poisson distribution can be obtained from

$$\mathrm{Rand}_1\left(1\right) \mathrm{Rand}_2\left(1\right) \cdots \mathrm{Rand}_k\left(1\right) = \prod_{n=1}^{k} \mathrm{Rand}_n\left(1\right) \le \exp\left(-\lambda t\right)$$

The number k that first breaks the above is the target one.

We can generate a random number associated with a normal distribution as

$$t = \mu + \sigma \left[ \sum_{i=1}^{12} \mathrm{Rand}_i\left(1\right) - 6 \right]$$

We can obtain two kinds of random numbers with a correlation. We first generate two independent random numbers $X_1$ and $X_2$, and convert them as

$$Y_1 = X_1$$
$$Y_2 = \rho X_1 + \sqrt{1-\rho^2}\, X_2$$

Then the numbers correspond to the random numbers with a correlation factor of $\rho$.
Chapter 15

MATRIX OPERATION

ABSTRACT

Matrix operation is an important and fundamental mathematical tool in statistics. Therefore, we treat the operations of matrices. We treat sums, products, and inverse matrices, determinants of matrices, eigenvalues, and eigenvectors.

Keywords: matrix, inverse matrix, transpose matrix, determinant, eigenvalue, eigenvector

1. INTRODUCTION

Matrix operation is a base of statistics, and many analyses in statistics are built on this subject. Therefore, we treat the basic operations of matrices in this chapter.
2. DEFINITION OF A MATRIX A matrix A with a11 a12 a a A 21 22 an1 an 2
n
rows and m columns are defined by a1m a2 m anm
a The element of i -th row and j -th column is denoted as ij .
(1)
Kunihiro Suzuki
322
When m n , it is called as a square matrix with an order of n , and aii is called as the main diagonal components. A square matrix with the main diagonal elements are all 1 and the other elements are 0 is called as a unit matrix, which is given by 1 0 0 E 0
0 1 0
0 0 1
0
0
0
0 0 0 0 1
(2)
3. SUM OF A MATRIX Sum and difference of matrices of A and B are denoted as C and is given by a11 a21 an1
a12 a22 an 2
a1n b11 b12 a2 n b21 a22 ann bn1 bn 2
b1n c11 c12 b2 n c21 c22 bnn cn1 cn 2
c1n c2 n cnn
(3)
The elements are given by
cij aij bij
(4)
4. PRODUCT OF A CONSTANT NUMBER AND A MATRIX A product of a constant number k and a matrix A is given by a11 a k 21 an1
a12 a22 an 2
a1n ka11 a2 n ka21 ann kan1
ka12 ka22 kan 2
ka1n ka2 n kann
(5)
Matrix Operation
323
5. A PRODUCT OF TWO MATRICES RELATED TO A SIMULTANEOUS EQUATIONS A product of two matrices is rather difficult. It is convenient to relate it to simultaneous equations. It should also be noted that solving the simultaneous equation is one of the important subject for matrix operation. A simultaneous equation for n-variables of x1 , x2 , , xn is given by a11 x1 a12 x2 a21 x1 a22 x2 an1 x1 an 2 x2
a1n xn b1 a2 n xn b2 a1n xn bn
(6)
This is described with a matrix form as a11 a21 an1
a12 a22 an 2
a1n x1 b1 a2 n x2 b2 ann xn bn
(7)
The first matrix in Eq. (7) is called a coefficient matrix, and it is an n-th order square matrix for n-variable simultaneous equations. The product is easily appreciated as n
bi aik xk k 1
(8)
The definition of the product can be easily generalized as follows. We assume that a matrix A has n rows and mcolumns, and a matrix B has m rows and l columns. We can then perform a product operation. The retraction for the product is that the column number of the matrix A and row number of the matrix B must be the same. We can then obtain a matrix C of the product of the matrix A and B as a11 a21 an1
a12 a22 an 2
a1m b11 b12 a2 m b21 a22 anm bm1 bn 2
b1l c11 b2l c21 bnl cn1
c12 c22 cn 2
c1l c2l cnl
(9)
Kunihiro Suzuki
324
The elements of the matrix C are given by m
cij aik bkj k 1
(10)
The row number of C is the same as the row number of A , and the column number of C is the same as the column number of B , that is, the matrix C has a row number of
n and a column number of l . It should be noted that the product AB and BA are different, and sometimes the other cannot be performed. In the above example, the column number of the matrix B is l and the row number of the matrix A is n, and they are different in general, and we cannot perform the product operation. It should also be noted, the product of a unit matrix and any square matrix is independent of the order, that is, we have
EA AE A
(11)
6. TRANSVERSE MATRIX T A transverse matrix with respect to A is denoted as A , which is given by changing the row and column number, and is given by
a11 a21 an1
a12 a22 an 2
T
a1n a11 a2 n a 12 ann a1n
a21 a22 a2 n
an1 an 2 ann
(12)
T The elements of the transverse matrix aij is expressed by
aijT a ji
(13)
Let us consider the transverse matrix of a product of AB , that is, let us consider
AB T .
Matrix Operation
325
The elements of AB should be expressed by T
ab ij
c ji
T
m
a jk bki k 1 m
bki a jk k 1
(14)
Therefore, it is expressed as
AB
T
BT AT
(15)
This can be generalized as
A1 A2
Am1 Am AmT AmT 1 T
A2T A1T
(16)
7. SOLUTION OF A SIMULTANEOUS EQUATIONS We consider solving simultaneous equations of (6) with a matrix form of Eq.(7). The corresponding solution is given by x1 a11 x2 a21 xn an1
a1n a2 n ann
a12 a22 an 2
1
b1 b2 bn
(17)
where a11 a21 an1
a12 a22 an 2
a1n a2 n ann
1
(18)
1 is called as an inverse matrix of A , and it is denoted as A . This is defined because the
product of A and A1 becomes an unit matrix as
Kunihiro Suzuki
326 a11 a A1 A 21 an1
a1n a11 a2 n a21 ann an1
a12 a22 an 2
a12 a22 an 2
1
a1n 1 0 a2 n 0 1 ann 0 0
0 0 1
(19)
1 Multiplying A from the left side of Eq. (7), we obtain
x1 b1 x b A1 A 2 A1 2 xn bn
(20)
The left side of Eq. (20) is reduced to 1 0 0 1 0 0
0 x1 x1 0 x2 x2 1 xn xn
(21)
Finally, we obtain Eq. (17). Therefore, we need to obtain an inverse matrix of Eq. (18), which enables us to solve the simultaneous equation of Eq. (17). We discuss the procedure to obtain the inverse matrix in the following section.
8. GAUSS ELIMINATION METHOD We consider a simple example of 4 variable simultaneous equations given by
4x 3y 2z u 9 2 y 4 z 3u 8 4z u 2 3u 6
(22)
The simultaneous equation is called as an upper triangular matrix type simultaneous equation. This can be solved easily as follows. The last equation gives
Matrix Operation
u
6 2 3
327
(23)
The equation of the second equation from the last gives 2u 4 22 4 1
z
(24)
The equation of the third equation from the last gives 8 4 z 3u 2 846 2 3
y
(25)
The equation of the fourth equation (that is, top of the equation) gives 9 3y 2z u 4 9922 4 1
x
(26)
Therefore, if the equation is reduced to the form of an upper triangle matrix, we can easily obtain a corresponding solution. Let us consider a Gauss elimination method. We consider the simultaneous equation given by
4 x 3 y 2 z u 20 2 x 5 y 3z 2u 5 x 4 y 8 z u 13 3 x 2 y 4 z 5u 9 This can be expressed with a matrix form as
(27)
Kunihiro Suzuki
328 4 3 2 1 x 20 2 5 3 2 y 5 1 4 8 1 z 13 3 2 4 5 u 9
(28)
Using elements of the first column, we want to eliminate the elements of the second and subsequent elements of the first column as
Second row element - First row elemet
Third row element - First row elemet
Second row element First row elemet
(29)
Third row element First row elemet
(30)
Fourth row element First row elemet
(31)
Fourth row element - First row elemet We then obtain
4 3 2 1 20 2 2 2 2 2 24 5 3 3 2 2 1 x 5 20 4 4 4 4 y 4 1 1 1 1 1 1 4 4 3 8 2 1 1 z 13 20 4 4 4 4 4 u 3 3 3 3 3 4 2 3 4 2 5 1 20 3 9 4 4 4 4 4
(32)
Performing the calculation, we obtain 3 4 7 0 2 19 0 4 17 0 4
2 4 15 2 5 2
1 5 x 20 2 y 15 5 z 8 4 u 24 23 4
(33)
Matrix Operation
329
as is expected of 0 elements of the second and subsequent elements of the first column. We then move to the second row focusing on the second column. Using elements of the second column, we want to eliminate the elements of the third and subsequent elements of the second column as
Third row element - Second row elemet
Third row element Second row elemet
Fourth row element - Second row elemet
Fourth row element Second row elemet
(34)
(35)
We then obtain 3 2 1 4 7 5 0 x 4 2 2 y 19 19 2 7 15 19 2 5 19 2 5 0 4 z 4 4 7 2 2 4 7 4 4 7 2 u 17 17 2 7 5 17 2 23 17 2 5 0 4 4 4 7 2 2 4 7 4 4 7 2 20 15 19 2 8 15 4 7 17 2 24 15 4 7
(36)
Performing the calculation, we obtain 4 0 0 0
3 7 2 0 0
2 4 29 14 33 14
1 20 5 x 15 2 y 173 65 z 14 14 u 591 123 14 14
(37)
as is expected of 0 elements of the third and subsequent elements of the second column.
Kunihiro Suzuki
330
We then move to the third row focusing on the third column. Using elements of the third column, we want to eliminate the elements of the fourth element of the third column as
Fourth row element - Third row element
Fourth row element Third row element
(38)
We then obtain 4 0 0 0
3 7 2 0 0
1 20 5 x 15 4 2 y 173 29 65 z 14 14 14 u 591 33 14 173 33 33 14 29 123 33 14 65 14 14 29 14 14 14 29 14 14 14 29 14 2
(39)
Performing the calculation, we obtain 4 0 0 0
3 7 2
2 4
0
29 14
0
0
1 20 5 x 15 2 y 173 65 z 14 14 u 1632 408 29 29
(40)
as is expected of 0 element of the fourth element of the third column. We then obtain a form of an upper triangular matrix, and can solve easily as shown before. From the last row relationship, we obtain
408 1632 u 29 29
(41)
We then have
u
1632 4 408
(42)
Matrix Operation
331
From the second row from the last relationship, we obtain
29 65 173 z u 14 14 14
(43)
We then obtain
z
1 87 173 65 4 3 29 29
(44)
From the third row from the last relationship, we obtain
7 5 y 4 z u 15 2 2
(45)
We then obtain y
2 5 2 15 4 3 4 7 2 7 2 7
(46)
From the fourth row from the last (top) relationship, we obtain 4 x 3 y 2 z u 20
(47)
We then obtain 1 20 3 y 2 z u 4 1 20 3 2 2 3 4 4 1 4 4 1
x
(48)
We should generalize the above Gauss elimination method using a variable given by
Kunihiro Suzuki
332 a11 a21 ai ,1 a i 1,1 a n1
a12 a22
a1,i a2,i
a1,i 1 a2,i 1
ai ,2 ai 1,2
ai ,i ai 1,i
ai ,i 1 ai 1,i 1
an 2
an ,i
an ,i 1
a1n x1 b1 a2 n x2 b2 ai , n xi bi ai 1, n xi 1 bi 1 b ann x n n
(49)
Using elements of the first column, we eliminate the elements of the second and subsequent elements of the first column as a11 a21 a21 a11 a11 a ai ,1 i ,1 a11 a11 ai 1,1 ai 1,1 a11 a11 a an1 n1 a11 a11
a12 a a22 21 a12 a11 ai ,2
ai ,1
ai 1,2
ai 1,1
an 2
a11
a12
a11
a12
an1 a12 a11
a1,i a a2,i 21 a1,i a11 ai ,i
ai ,1
ai 1,i
ai 1,1
an,i
a11 a11
a12 a12
an1 a12 a11
a1,i 1 a a2,i 1 21 a1,i 1 a11 ai ,i 1
ai ,1
ai 1,i 1
ai 1,1
an,i 1
a11 a11
a12 a12
an1 a12 a11
x1 x2 a ai , n i ,1 a12 a11 x i x a i 1 ai 1,n i 1,1 a12 a11 xn a ann n1 a12 a11 a1n a a2 n 21 a1n a11
b1 a b2 21 b1 a11 ai ,1 bi b1 a11 ai 1,1 bi 1 b1 a11 a bn n1 b1 a11
(50) We update the elements, and express the new ones as a11 a12 a22 0 ai ,2 0 0 a i 1,2 0 an 2
a1,i a2,i
a1,i 1 a2,i 1
ai ,i ai 1,i
ai ,i 1 ai 1,i 1
an,i
an,i 1
a1n x1 b1 a2 n x2 b2 ai , n xi bi ai 1, n xi 1 bi 1 ann x n bn
(51)
Matrix Operation
333
Performing the procedure, and we obtain a11 0 0 0 0
a12 a22
a1,i a2,i
a1,i 1 a2,i 1
0 0
ai ,i ai 1,i
ai ,i 1 ai 1,i 1
0
an ,i
an ,i 1
a1n x1 b1 a2 n x2 b2 ai , n xi bi ai 1, n xi 1 bi 1 ann xn bn
We then obtain the elements below the diagonal up to
(52)
ai 1,i 1 is all zero. Next, we
perform the similar operation to the i 1 -th row and subsequent, and obtain a1,i a11 a12 0 a a 22 2,i 0 ai ,i 0 a 0 0 ai 1,i i 1,i ai ,i ai ,i a 0 an ,i n ,i ai ,i 0 ai ,i b1 b2 bi a bi 1 i 1,i bi ai ,i an ,i bi bn ai ,i
a1,i 1 a2,i 1 ai ,i 1 a ai 1,i 1 i 1,i ai ,i 1 ai ,i an ,i 1
an ,i ai ,i
x1 x 2 ai , n a x ai 1, n i 1,i ai , n i ai ,i xi 1 xn an ,i ann ai , n ai ,i a1n a2 n
ai ,i 1
(53) We update the elements and obtain
Kunihiro Suzuki
334 a11 a12 0 a22 0 0 0 0 0 0
a1,i a2,i
a1,i 1 a2,i 1
ai ,i 0
ai ,i 1 ai 1,i 1
0
an,i 1
a1n x1 b1 a2 n x2 b2 ai , n xi bi ai 1, n xi 1 bi 1 ann xn bn
(54)
Performing the similar operation to the last row, we obtain a11 0 0 0 0
a12 a22
a1,i a2,i
a1,i 1 a2,i 1
0 0
ai ,i 0
ai ,i 1 ai 1,i 1
0
0
0
a1n x1 b1 a2 n x2 b2 ai , n xi bi ai 1, n xi 1 bi 1 ann xn bn
(55)
We then obtain a form of an upper triangular matrix, and can solve easily as shown before. From the last row relationship, we obtain
ann xn bn
(56)
We then obtain xn as xn
bn ann
(57)
From the second row from the last relationship, we obtain
an1,n1 xn1 an1,n xn bn1
We then obtain xn 1
xn 1
(58)
as
1 bn1 an1,n xn an 1, n 1
(59)
Matrix Operation We can continue the process in up order of the row. Let us consider the term of We can obtain from the equation given by ai ,i xi ai ,i 1 xi 1
ai ,n xn bi
335
xi .
(60)
We then obtain xi
1 bi ai ,i 1 xi 1 ai ,i
1 ai ,i
ai , n xn
(61)
n bi ai , k xk k i 1
8.1. Gauss Elimination Method and LU Decomposition A Gauss elimination method is vital to solve simultaneous equations as shown above. However, we can generalize this method as shown in this section. This is called an LU decomposition, which is vital in matrix operation. Let us consider the process of a Gauss elimination method. The simultaneous equation is expressed with a matrix form given by
AX B
(62)
where a11 a21 A ai ,1 a i 1,1 a n1
x1 x2 X xi x i 1 x n
a12 a22
a1,i a2,i
a1,i 1 a2,i 1
ai ,2 ai 1,2
ai ,i ai 1,i
ai ,i 1 ai 1,i 1
an 2
an ,i
an ,i 1
a1n a2 n ai , n ai 1,n ann
(63)
(64)
Kunihiro Suzuki
336 b1 b2 B bi b i 1 b n
(65)
We set mi1
ai1 a11
(66)
and define the matrix as 0 0 1 m21 1 0 M 1 m31 0 1 m n1 0 0
0 0 0 1
(67)
Multiplying this to the A , we obtain 0 1 m21 1 M 1 A m31 0 m n1 0 a11 a12 a22 0 0 a32 0 an 2
0 0 1 0 a13 a23 a33 an3
0 a11 0 a21 0 a31 1 an1 an a2 n a33 ann
a12 a22 a32
a13 a23 a33
an 2
an 3
an a2 n a33 ann
(68)
The elements of the second and the subsequent row is updated as
aij aij mi1a1 j Next, we set
(69)
Matrix Operation mi 2
ai 2 a22
337
(70)
and define the matrix given by 1 0 M2 0 0
0 1 m32
0 0 1
mn 2
0
0 0 0 1
(71)
Multiplying this to the M1 A , we obtain 1 0 0 1 M 2 M 1 A 0 m32 0 m n2 a11 a12 a22 0 0 0 0 0
0 a11 a12 0 0 a22 0 0 a32 1 an 2 0 an a2 n a33 ann
0 0 1 0 a13 a23 a33 an3
a13 a23 a33 an3
an a2 n a33 ann
(72)
The elements of the third and the subsequent row is updated as
aij aij mi 2 a2 j
(73)
We repeat the similar process n 1 times, and obtain
M n 1
a11 0 M 2 M1 A 0 0
a12 a22 0
a13 a23 a33
0
0
an a2 n a33 ann
(74)
The right side of Eq. (74) is the exactly the same as that of the Gauss elimination method. Therefore, this process is identical to the method.
Kunihiro Suzuki
338 We then obtain M 2 M1 AX M n1
M n1
M 2 M1B
(75)
Modifying this equation, we obtain
M n1
M 2 M1
1
M n1
M 2 M1 AX B
(76)
We then define the matrixes as
L M n1
M 2 M1
U M n1
M 2 M1 A
1
(77) (78)
Therefore, the original matrix is expressed by
LUX B
(79)
The form of U is shown in Eq. (74) and is an upper triangle matrix. Let us consider the form of L . The inverse matrix with respect to M k is given by
M k 1
1 0 0 0
0 1
1 0 0 0
0 1
0
1 mk 1, k
0
mnk
0
1 mk 1, k
0
mnk
0 0 0 1 0 0 0 1
1
(80)
Matrix Operation
339
Therefore, we obtain
L M 11 M 21
M n11
1 m21 m k 1,1 m n1
0 1 1 mk 1,2
mk 1, k
mn 2
mnk
0 0 0 1
(81)
This is a lower triangle matrix. Therefore, solving a simultaneous equations step is divided into two steps as
LY B
(82)
UX Y
(83)
Both steps are easily solved, which will be shown later. Once we modify the matrix A to LU , we can apply it to any B . Since Eqs. (74) and (81) use updating elements, we need to obtain them from the original elements, which should be done next.
8.2. LU Division In LU division, we express a matrix A as a product of two matrices given by
a11 a21 an1 where
a12 a22 an 2
a1n 1 0 a2 n l21 1 ann ln1 ln 2
0 u11 u12 0 0 u22 1 0 0
u1n u2 n unn
(84)
Kunihiro Suzuki
340
1 0 l 1 L 21 ln1 ln 2
0 0 1
u11 u12 0 u22 U 0 0
(85)
u1n u2 n unn
(86)
We want to evaluate all elements of L and U simultaneously. Let us start with a simple example given by 6 5 4 1 12 13 10 l21 18 21 17 l 31
0 1 l32
u11 l21u11 l u 31 11
0 u11 0 0 1 0
u12 u22 0
u12 l21u12 u22 l31u12 l32u22
u13 u23 u33 u13 l21u13 u23 l31u13 l32u23 u33
(87)
We can decide the elements of the first row elements as u11 6
(88)
u12 5
(89)
u13 4
(90)
From the second row and the first column, we obtain l21u11 12
(91)
l21 is then decided as
l21
12 12 2 u11 6
(92)
Matrix Operation
341
Using this l21 , u22 and u23 are deiced as follows. l21u12 u22 13
(93)
u22 13 l21u12 13 2 5 3
(94)
l21u13 u23 10
(95)
u23 10 l21u13 10 2 4 2
(96)
Focusing on the third row and the first column, we can obtain l31u11 18
(97)
we can decide l31 as l31
18 18 3 u11 6
(98)
Using this l31 ,we can decide l32 as l31u12 l32u22 13
l32
21 l31u12 21 3 5 2 u22 3
(99)
(100)
Using l32 we can decide u33 as l31u13 l32u23 u33 17
(101)
u33 17 l31u13 l32u23 17 3 4 2 2 1
(102)
Therefore, the original matrix is LU divided as
Kunihiro Suzuki
342
6 5 4 1 0 0 6 5 4 12 13 10 2 1 0 0 3 2 18 21 17 3 2 1 0 0 1
(103)
Let us consider a general algorithm for LU division with a four order matrix, which is given by a11 a21 a31 a41
a12 a22 a32 a42
a3 a23 a33 a43
1 0 l 1 21 l31 l32 l41 l42
0 0 1 l43
a4 a24 a34 a44 0 u11 u12 0 0 u22 0 0 0 1 0 0
u12 u11 l u l u 21 12 u22 21 11 l31u11 l31u21 l32 u22 l41u11 l41u12 l42u22
u13 u23 u33 0
u14 u24 u34 u44
(104)
u13 u14 l21u13 u23 l21u14 u24 l31u13 l32 u23 u33 l31u14 l32 u24 u34 l41u13 l42u23 l43u33 l41u14 l42u24 l43u34 u44
Focusing on the first row, we decide
u1 j
as
u1 j a1 j
(105)
Focusing on the second row, we first decide l21 as l21
a21 u11
(106)
Using l21 , we can decide u2 j j 2 as u22 a22 l21u12
(107)
u23 a23 l21u13
(108)
Matrix Operation u24 a24 l21u14
343 (109)
Focusing on the third row, we first decide l31 , l32 as l31
a31 u11
(110)
l32
1 a32 l31u22 u22
(111)
Using l31 , l32 , we can decide u3 j j 3 as u33 a33 l31u13 l32u23
(112)
u34 a34 l31u14 l32u24
(113)
Focusing on the third row, we first decide l41 , l42 , l43 as l41
a41 u11
(114)
l42
1 a42 l41u22 u22
(115)
l43
1 a43 l41u13 l42u23 u33
(116)
Using l31 , l32 , we can decide u44 as u44 a44 l41u14 l42u24 l43u34
(117)
We further generalize the above process. We first initialize the matrix elements as
0 lij 1
for i j , uij 0 for i j
(118)
Kunihiro Suzuki
344
Focusing on the first row, we decide
u1 j
as
u1 j a1 j
(119)
Focusing on the i 2 -th row, we first decide li1 as li1
ai1 u11
(120)
We then decide elements columns as
lij 2 j i
lij
of the second and the subsequent up to j i 1 -th
j 1 1 aij lik ukj u jj k 1
(121)
In the same row, we decide the elements uij of j i –th and the subsequent columns as j 1
uij j i aij lik ukj
(122)
k 1
8.3. Inverse Matrix Derivation Utilizing LU Division 1 We set an inverse matrix with respect to a matrix A as A , that is
AA1 E
(123)
We assume a 5 5 matrix as a11 a21 a31 a41 a 51
a12 a22 a32 a42 a52
a13 a23 a33 a43 a53
a14 a24 a34 a44 a54
a15 x11 a25 x21 a35 x31 a45 x41 a55 x51
x12 x22 x32 x42 x52
x13 x23 x33 x43 x53
x14 x24 x34 x44 x54
x15 1 x25 0 x35 0 x45 0 x55 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
(124)
Matrix Operation
The elements of inverse matrix are denoted as
. Eq. (124) can be expressed as
a11 a21 a31 a41 a 51
a12 a22 a32 a42 a52
a11 a21 a31 a41 a 51
a12 a22 a32 a42 a52
a13 a23 a33 a43 a53
a14 a24 a34 a44 a54
a15 x12 0 a25 x22 1 a35 x32 0 a45 x42 0 a55 x52 0
(126)
a11 a21 a31 a41 a 51
a12 a22 a32 a42 a52
a13 a23 a33 a43 a53
a14 a24 a34 a44 a54
a15 x13 0 a25 x23 0 a35 x33 1 a45 x43 0 a55 x53 0
(127)
a11 a21 a31 a41 a 51
a12 a22 a32 a42 a52
a13 a23 a33 a43 a53
a14 a24 a34 a44 a54
a15 x14 0 a25 x24 0 a35 x34 0 a45 x44 1 a55 x54 0
(128)
a11 a21 a31 a41 a 51
a12 a22 a32 a42 a52
a13 a23 a33 a43 a53
a14 a24 a34 a44 a54
a15 x15 0 a25 x25 0 a35 x35 0 a45 x45 0 a55 x55 1
(129)
a13 a23 a33 a43 a53
a14 a24 a34 a44 a54
a15 x11 1 a25 x21 0 a35 x31 0 a45 x41 0 a55 x51 0
xij
345
(125)
Therefore, we can derive all matrix elements using the simultaneous matrix elements derivation process five times. The matrix elements of a unit matrix is denoted as
eij ij
(130)
Kunihiro Suzuki
346
Applying the simultaneous equation solving process, we can obtain the matrix elements as y1 j e1 j i 1
yij eij lik ykj
for i 2
(131)
k 1
xnj
xij
ynj unn yij
(132)
n
u
k i 1
i ,k
xkj
uii
9. DETERMINANT OF A MATRIX The determinant of a matrix for the two and three order square matrix are given by
a11 a12 a11a22 a12 a21 a21 a2 a11 a21 a31
a12 a22 a32
a3 a23 a11a22 a33 a12 a23 a31 a13 a21a32 a3
(133)
(134)
a13 a22 a31 a12 a21a33 a11a23 a32
The determinant for more than the four order square matrix is rather complex and the derivation of this type is not practical. We do not discuss the determinant in detail, but only show the numerical process to obtain it. We can modify any row elements by adding the constant multiplied the other row elements, which does not change the value of the determinant. Therefore, we can obtain an up triangle one by applying the Gauss method as a '11 0 0
a '12 a '22 0
0
0
a '13 a '23 a '33
a '1n a '2 n a '3n 0
a 'nn
(135)
Matrix Operation
347
The corresponding determinant is then given by a '11 a '22 a '33
(136)
a 'nn
10. NUMERICAL EVALUATION OF EIGENVALUE We sometimes need to obtain eigenvalues and corresponding eigenvectors. They are fundamental unit of a matrix. We discuss the subject in this section.
10.1. Relationship between Matrix and Eigenvector Let us consider a square matrix with a a11 a A 21 an1
n -th order given by
a1n a2 n ann
(137)
We have corresponding
n eigenvalues and eigenvectors associated with A and
a12 a22 an 2
k k denote them as and u , which satisfy
Au
u
k
k
u
k
u1 k k u 2 uk n
(138)
(139)
It is known that each eigenvector is orthogonal, and is given by
u u ij i
j
We constitute a matrix that consists of the eigenvectors as
(140)
Kunihiro Suzuki
348
U u
u
1
2
u
n
(141)
T The corresponding transverse matrix U is given by
u1T 2 T u T U u n T
(142)
where u
k T
u1
k
u2
k
un
k
(143)
T The product of the matrices of U and U is given by
UU T u 1
u
2
u1 u1 2 1 u u u n u1 1 0 0 1 0 0 E
u
n
u u
2
u
2
u
2
u
n
u
2
1
u1T 2 T u u n T
1 n u u 2 1 u u n n u u
0 0 1
(144)
Similarly, we obtain U TU E
We consider a product of
(145)
A and U given by
Matrix Operation
AU u 1
2 u 2
1
n u n
349
(146)
T Multiplying U from the right side to Eq. (146), we obtain
AUU T AI A
(147)
Therefore, we obtain A AU U T
u 1
2 u 2
1
n u n
u u u u 1
1
1T
2
2
2 T
u1T 2 T u u n T
u u
n
n
n T
(148)
Therefore, eigenvectors are fundamental components of a matrix A . It is similar to that a space vector r is expressed by unit vector ei as
r ax e x a y e y a z e z
(149)
Therefore, the eigenvalue corresponds to the component related to the eigenvector. k When we multiply eigenvector u from the right side to the matrix A , we obtain
Au u u u k
1
1
u k
1T
k
u u u 2
2
2T
k
u k
k
u u k T
k
u n
n
u u nT
k
k
(150) Therefore, we can extract the k-th component. It is similar to the space vector case where we have
r e x ax e x e x a y e y e x az e z e x ax
(151)
Kunihiro Suzuki
350
10.2. Power Method We assume ordered eigenvalues given by
1 2
n
(152)
1 We show how to evaluate the maximum eigenvalue of . The eigenvectors are independent of each other, and any vector eigenvectors as
v C1u C2 u 1
2
Cn u
v
is expressed with the
n
(153)
Multiplying a matrix A from the left side to Eq. (153), we obtain
Av C1 Au C2 Au 1
C p Au
2
C1 u C2 u 1
1
2
We perform this process
r
2
C p u
1
1
r
r
1
n
n
(154)
times and obtain
u C u
Ar v C1
n
2
r
2
2
2 C1u1 C2 1
r
2 u
u
Cp
n
r
n C p 1
n
r n u
(155)
Since we have 1 2 n , the second and the subsequent becomes negligible when r increases. Therefore, we can obtain
Ar v C1 1
r
u 1
(156)
Therefore, multiplying A to v with many times, we obtain the maximum eigenvector with a factor. The other thing is how we can judge the sufficient cycle time to extract the maximum eigenvector, which is shown below. We start with an initial vector of v
Matrix Operation v 0
1 n
1
1
351
1
(157)
This can be replaced by the other ones without zero vectors. We multiply A to this vector, and normalize its size as v 1
Av
0
Av
0
v11 1 v 2 v1 n
(158)
We further multiply A to this vector, and normalize its size as v 2
Av 1
Av 1
v21 1 v 2 v1 2
(159)
We further multiply A to this vector, and normalize its size as v 3
Av
2
Av
2
v1 3 3 v 2 v 3 n
(160)
We then repeat this process many times. r * r 1* If the first component becomes predominant, the all elements of v and v become close. Setting a critical value of , we can evaluate as
Kunihiro Suzuki
352 v
r 1
v
v
r
r 1
(161)
If this equation is valid, we judge the first component becomes predominant. Then, the first eigenvector is determined as u v 1
r1
(162)
The eigenvalue can be evaluated from the evaluation of Au u 1
1
1
(163)
We then obtain the eigenvalue as
1
u Au 1T
1
u u 1T
1
(164)
We can evaluate the other eigenvectors and eigenvalues as follows. From Eq. (148), we make a matrix given by
A2 A u u 1
u u 2
2
1
1T
2 T
u u n
n
n T
(165) 1
This A2 is a matrix without the fundamental component of u . Therefore, any vector v is expressed as
v C2 u
Cn u
2
(166)
A2 from the left side to the vector, we obtain
Multiplying
A2 v C2 A2 u 2
C2 u 2
n
2
Cn A2 u
n
Cn u n
n
(167)
Matrix Operation
353
We then obtain the second eigenvector and eigenvalue. We can repeat this process to the end, and finally we can obtain all eigenvectors and related eigenvalues. In this procedure, the error is accumulated with solving the eigenvectors. Therefore, we cannot expect accurate eigenvectors and eigenvalues for smaller eigenvalues. This is mainly applied to obtain the maximum eigenvector and eigenvalue.
11. JACOBI METHOD FOR SYMMETRICAL MATRIX A Jacobi method aims to force elements except for the diagonal ones to zero. After the completion of the process, we can obtain all the eigenvalues and eigenvectors. Although this method is limited to a symmetrical matrix, most of the case where statistics require is related to the symmetrical matrix. Therefore, the Jacobi method is vital for statistics and may be most important. In the symmetrical matrix, any two eigenvectors are orthogonal to each other, which are shown below. Let us consider two eigenvectors of uA and uB and corresponding eigenvalues are
A , and B . We then have AuA A uA
(168)
AuB B uB
(169)
The transverse of Eq.(169) is given by uTB A B uTB
(170)
Since A is a symmetrical matrix, we can assume AT A . Multiplying uA from the right side, we obtain uTB AuA B uBT uA
(171)
The left side is performed as
uTB AuA uBT A uA A uBT uA
(172)
Kunihiro Suzuki
354 Therefore, we obtain
A B uTB uA 0
(173)
Since A B , we obtain
uTB uA 0
(174)
Therefore, it is proved that any two eigenvectors are orthogonal to each other. Let us start with a second order square matrix. The corresponding eigenvalue problem is described as
Ax x
(175)
We assume that eigenvector
x
is normalized, and hence it is expressed by
cos x sin
(176)
Therefore, Eq.(175) is expressed by a11 a21
a12 cos cos a22 sin sin
(177)
This is modified as a11 a21
a12 cos 0 a22 sin
(178)
This is reduced to a11 cos a12 sin 0 a21 cos a22 sin 0
This has solutions only when the below is held.
(179)
Matrix Operation a11 a21
a12 0 a22
355
(180)
Therefore, we obtain
a11 a22 a12 a21 0
(181)
Finally, we obtain the eigenvalues of a a a a 11 22 11 22 a12 a21 2 2 2
(182)
Since we treat the symmetrical matrix, a12 a21 , and Eq. (182) is reduced to a11 a22 a a22 2 11 a12 2 2 2
(183)
We denote two solutions as a11 a22 a a 11 22 a122 2 2
(184)
a11 a22 a a 11 22 a122 2 2
(185)
2
1
2
2
Substituting 1 into Eq. (179), we obtain
a11 1 cos1 a12 sin 1 0
(186)
Modifying this, we obtain tan 1
1 a11 a12
Similarly, substituting 2 into Eq. (179), we obtain
(187)
Kunihiro Suzuki
356 tan 2
2 a11 a12
(188)
We can evaluate 1 and 2 from Eqs.(187) and (188). However, we have a relationship given by tan tan
(189)
Therefore, the angle is not uniquely determined. In the Jacobi method, we do not evaluate the angles directly, but evaluate cos and sin , and the ambiguity vanishes, which is shown later. Let us consider eigenvectors in more detail, which is given by cos 1 cos 2 x1 , x2 sin 1 sin 2
(190)
Performing an inner product of the eigenvectors, we obtain sin 1 sin 2 cos 1 cos 2 sin 1 sin 2 cos 1 cos 2 1 cos 1 cos 2 cos 1 cos 2 1 tan 1 tan 2 a a cos 1 cos 2 1 1 11 2 11 a12 a12
(191)
Furthering the calculation, we obtain 1
1 a11 2 a11
1
a12
a12
12 1 2 a11 a112 a12 2
a11 a22 a11 a22 2 2 a12 a11 a22 a11 a11 2 2 1 a12 2 2
0
2
(192)
Therefore, the two eigenvectors are orthogonal to each other as is shown in general before.
Matrix Operation
357
We define the matrix H consists of the eigenvectors as cos 1 H sin 1
cos 2 sin 2
(193)
We perform a product of AH , which is a12 cos 1 cos 2 a AH 11 a21 a22 sin 1 sin 2 a cos 1 a12 sin 1 a11 cos 2 a12 sin 2 11 a21 cos 1 a22 sin 1 a21 cos 2 a22 sin 2 cos 1 2 cos 2 1 1 sin 1 2 sin 2 cos 1 cos 2 1 0 sin 1 sin 2 0 2 H
(194)
where 1 0
0 2
(195)
When H is orthogonal, the composing column vector is orthogonal to each other. Then, the composing row vector is orthogonal to each other. Therefore, the following is valid.
HH T E
(196) T
Performing HH , we obtain cos 1 cos 2 cos 1 sin 1 HH T sin 1 sin 2 cos 2 sin 2 cos 2 1 cos 2 2 cos 1 sin 1 cos 2 sin 2 sin 2 1 sin 2 2 sin 1 cos 1 sin 2 cos 2 1 0 0 1
(197)
Kunihiro Suzuki
358 Therefore, we obtain sin2 cos1 ,cos2 sin2
(198)
Consequently, H is a rotation matrix of Sr given by cos H Sr sin
sin cos
(199)
T The transverse matrix of Sr is denoted as Sr and is given by
cos S rT sin
sin cos
(200)
Therefore, the product is given by cos sin cos sin SrT Sr sin cos sin cos cos 2 sin 2 cos sin sin cos sin 2 cos 2 sin cos cos sin 1 0 E 0 1
(201)
We then obtain SrT Sr1
(202)
as is expected. Further, we obtain ASr Sr
(203)
We then obtain A Sr SrT
(204)
Matrix Operation
359
We can modify it as follows SrT ASr
(205) T
This means that any matrix A can be diagonalized by multiplying Sr from the left, and Sr from the right. It is also noted that any symmetrical matrix can be expressed with T a form of Sr Sr .
We limit our analysis to n 2 up to here. Let us then consider n 2 . We select the row number of
p
a pq 0 p q
, and the column number of . We rotate the matrix so that
and finish the process when all
aij i j
q
a pq
, and consider a corresponding element is forced to be 0 . We repeat any elements
is sufficiently small.
xx The rotation with an angle of in p q plane, is expressed by 1 Sr
p
q
cos
sin
1 1 sin
cos 1
1
(206)
Let us consider the transformation using Sr as
B SrT ASr
When we multiply
(207)
Sr to A from the right side, only the elements of p and q
T p columns change. When we multiply Sr to ASr from the left side, only the elements of
q
and rows change. We consider the change of the elements after the operation of Eq. (207) inspecting Figure 1.
Kunihiro Suzuki
360
A The elements in region are changed only by the operation of multiplying Sr from the right.
The elements in region the left.
B are changed only by the operation of multiplying SrT from
The elements in region and are changed by both multiplying. The other elements are unchanged. The next thing we should do is to select the angle associated with the operations. D
C
Figure 1. The change elements changed by the operation of Eq. (207).
Let us compare the elements of matrices A and B denoted by row number i and the column number j . bij aij for i, j 1, 2,
, n : i , j p, q
bip bpi aip cos aiq sin for i 1, 2,
biq bqi aip sin aiq cos for i 1, 2,
bpp a pp cos 2 aqq sin 2 a pq sin 2
bqq a pp sin 2 aqq cos 2 a pq sin 2
bpq bqp a pq cos 2
a pp a pq 2
aij
and
bij
with the
(208) , n : j p, q
, n : j p, q
(209) (210)
(211)
(212)
sin 2
(213)
Matrix Operation
361
The sum of the square of p and q column in region (A) is given by bip2 biq2 aip2 cos 2 aiq2 sin 2 2aip aiq cos sin aip2 sin 2 aiq2 cos 2 2aip aiq cos sin aip2 cos 2 sin 2 aiq2 cos 2 sin 2 aip2 aiq2
(214)
Therefore, the sum is unchanged for matrices A and B in this region. Since A and B are symmetrical ones, and the same results are obtained for region (B), which is expressed by
bpi2 bqi2 a 2pi aqi2
(215)
Therefore, the difference of sum of square of non-diagonal elements for matrices A and B is expressed by the difference of elements in region C , which is given by 2 2 a2pq bpq
(216)
This becomes the maximum when Since the operation of
bpq
is 0.
SrT ASr corresponding to the rotation of the matrix elements of
A, the total norm, which is called as a Frobenius norm, must hold. This is expressed by n
n
a i 1 j 1
2 ij
n
n
bij2 i 1 j 1
(217)
Therefore, when we perform this operation, we can expect smaller element value for non-diagonal elements in the region (C), and larger diagonal elements value in the region (D). The corresponding angle can be evaluated from Eq. (213) as bpq a pq cos 2
We then obtain
a pp a pq 2
sin 2 0
(218)
Kunihiro Suzuki
362
tan 2
2a pq a pp a pq
(219)
We can then evaluate cos and sin as follows (see Appendix 1-13). cos 2
1 1 tan 2 2
(220)
sin 2 tan 2 cos2
(221)
1 cos 2 2
cos
(222)
sin 2 2cos
sin
(223)
Let us consider an eigenvector. The Jacob method repeats the process where the maximum non diagonal element to be zero, we finally obtain the matrix given by Sr T
Sr 2T Sr1T ASr1Sr 2
Sr
(224)
where 1 0
2
0 n
(225)
It should be noted that we have ei i ei
(226)
This can be expressed by
S
T r
SrT2 SrT1 ASr1Sr 2
Sr ei i ei
(227)
Matrix Operation Multiplying Sr1Sr 2
Sr from the left, we obtain
Sr ei i Sr1Sr 2
ASr1Sr 2
363
Sr ei
(228)
Therefore, we obtain eigenvector as
vi ASr1Sr 2
Sr ei
(229)
Summarizing above, we can describe the Jacobi method as follows. Set a convergence condition .
0 Set the initial eigenvector matrix V as E .
Search the maximum non diagonal element
a pq
in matrix A .
If
a pq
is valid, the searching process is finished.
If
a pq
is valid, evaluate cos ,sin and obtain Sr .
0
Then update A and V below A SrT1 A Sr1 1
V
1
0
(230)
V S r1 0
(231)
a
Perform next step of searching next pq . We can then obtain a diagonal matrix. A SrT A
1
SrT
Sr
SrT2 SrT1 A Sr1Sr 2 0
Sr
(232)
The j-th column elements correspond to the j-th eigenvalue. Evaluate V V
1
Sr
V S r1 S r 2 0
Sr
The j-th column vector corresponds to the j-th eigenvector.
(233)
Kunihiro Suzuki
364
12. n-TH PRODUCT OF MATRIX We consider matrix A given by a11 a A 21 an1
a1n a2 n ann
a12 a22 an 2
(234)
k We consider the k-th product of the matrix of A . We assume that we know the
eigenvectors and eigenvalues of the matrix A and denote them as x1 ,x2 , ,xn . We then form a matix
P x1 x2
xn
(235)
where a11 a12 a a Axi 21 22 an1 an 2
a1n a2 n x i i ann
(236)
The product of A and P is given by a11 a12 a a22 AP 21 an1 an 2 1x1 2 x 2 x1 1 0 P 0
x2
0
2 0
a1n a2 n x x2 1 ann n x n 1 0 xn 0 0 0 n
0
2 0
xn
0 0 n
(237)
Matrix Operation
365
1 We also evaluate the inverse matrix of P denoted as P . We can realize a diagonal matrix as
1 0 0 2 P 1 AP 0 0
0 0 n
(238)
Therefore, we obtain
P
1
AP
k
1k 0 0
0
2
k
0
0 0 n k
(239)
On the other hand, we can extend the left side of Eq. (239) as
P
1
AP
P k
1
P AP A PP P
AP P 1 AP
P 1 A PP 1
1
1
(240)
P 1 Ak P
Therefore, we obtain 1k 0 k A P 0
0
2 k 0
0 0 1 P n k
(241)
Appendix 1
RELATED MATHEMATICS We briefly show various mathematical treatments which are used in the text book.
1. SUMMATION We utilize a summation expression using n
1 1 1
AND PRODUCT
whole in this book. It is defined as
1 n
k 1
(1)
n
k 1 2
n
k 1
n
k
2
(2)
12 22
n2
k 1
(3)
and so on for higher order summation. Since k is a dummy variable, we can use any variable as n
n
n
m 1 2 k 1
k
l 1
l
m 1
n (4)
Kunihiro Suzuki
368
We can apply this to a suffix of a variable as
x1 x2
xn
n
x
k
k 1
(5)
Therefore, the equation in the text book is given by
a b
n
n
n Cr a n r b r r 0
n C0 a n n C1a n 1b n C2 a n 2 b 2
n Cn 1ab n 1 n Cn b n
(6)
expresses the product as given by n
i 1 2
n
i 1
(7)
This can then be related to a factorial as n
i 1 2
n n!
i 1
(8)
Note that
0! 1
(9)
The definition of 0! comes from below. The combination where we select
n
Cr
r among n elements is given by
n! r ! n r !
(10)
When we set n r , the case number should be 1, that is
n
Cn
n! 1 1 n! n n ! 0!
Therefore, we impose Eq. (9).
(11)
Related Mathematics
369
expresses a more general case as shown below. n
x i 1
i
x1 x2
xn (12)
We also define double factorial where the elements skip one, and is given by
2n !! 2n 2n 2 2 2n 2n 2 2n 2 n 1 n
2n 2 i 1 i 1
(13)
2n 1!! 2n 1 2n 1 1 2n 1 2n 1 2n 1 2n n 1
2n 1 2 i 1 i 1
(14)
2. A GAMMA FUNCTION AND A BETA FUNCTION 2.1. Definition of a Gamma Function Gamma function is defined as
x
0
exp t t x 1dt
(15)
We consider a more general form than Eq.(15), where the term in the exponential term is not t , but it has a term of t a given by
t x 1 t exp dt 0 a
(16)
This can also be related to the Gamma function. We introduce a variable y
t
a
and obtain
Kunihiro Suzuki
370
x 1 t x 1 t exp dt 0 ay exp y ady 0 a
a x y x 1 exp y dy 0
a x x
(17)
2.2. A Gamma Function and a Factorial A Gamma function has a relationship of
x 1 x x
(18)
which we prove hereafter as follows.
x x exp t xt x 1dt 0
dt x dt exp t dt 0
0
0
(19)
exp t t exp t t x dt x 1
Therefore, we obtain n 1 n n n n 1 n 1
(20)
n! 1
where
n
is a natural number. We further obtain
1 exp t dt 1 0
(21)
Therefore, we obtain factorial with a Gamma function as
n 1 n!
(22)
Related Mathematics
371
This can be extended to a real number as
x! x 1 exp t t x dt 0
(23)
2.3. Evaluation of 1 2 2 Introducing a variable of t u , we obtain dt 2udu , and Eq. (15) is reduced to
2
x
exp u 2 u 2 x 2 2udu
0
exp u 2 u 2 x 1du
0
(24)
Therefore, we obtain 1 2 2
exp u 2 du
0
(25)
2.4. A Gamma Function Where x < 0 A Gamma function is expressed as with a limiting form as
n !n x n x x 1 x 2
x lim
x n
(26)
which is proved as follows. We consider a function given by n
0
t
n
n x 1 t x 1dt n
Introducing a variable of
t n s , we obtain
(27)
Kunihiro Suzuki
372 1
1 s ns n 1 s s
n x
n
x 1
nds
0
1
x
n
x 1
ds
(28)
0
Therefore, we obtain
n x n
x
1
1 s
n
s x 1ds
0
1
n n 1 1 s s x x 0 x 1 n 1 s n1 s x ds 0 x
1
1 s
n 1
s x ds
0
(29)
We then obtain
n x n
x
1
1 s
n
s x 1ds
0
n 1 1 s n 1 s x ds x 0 n n 1 1 n2 1 s s x 1ds x x 1 0
n n 1 n 2
1
1 s x x 1 x 2
n 3
s x 2 ds
0
n n 1 n 2 1
x x 1 x 2
x x 1 x 2
1
1 s x n 1
0
s x n 1ds
0
n!
x n 1 x n
(30)
Therefore, we obtain
n x
n !n x x x 1 x 2 x n 1 x n
On the other hand, we obtain
(31)
Related Mathematics n
373
t lim n x lim 1 t x 1dt n n 0 n
n
0
n
exp t t x 1dt
x
(32)
Therefore, we obtain
n !n x n x x 1 x 2 x n 1 x n
x lim
(33)
x is infinite for x = 0, -1, -2, ・・・・・. That is, 1 0 n
(34)
The dependence of the Gamma function on x is shown in Figure 1.
10
Gamma (x)
5 0 -5 -10 -5 -4 -3 -2 -1 0 1 2 x
3 4 5
Figure 1. A Gamma function for whole planes of positive and negative regions.
2.5. A Product of a Gamma Function of n is a natural number and we set
n n 1 2 2
Kunihiro Suzuki
374 n n 1 n 2 2
(35)
From the characteristics of a Gamma function, we obtain n 1 n 1 n 1 2 2 2
(36)
Therefore, we obtain n n 1 n n 1 n 1 n 2 2 2 2 2
Consequently,
n
n
(37)
holds a recursion of
n 1 n 1 2
(38)
We then obtain n 1 n 1 2 n 1 n 2 n 2 2 2 n 1 n 2 1 1 2 2 2 n 1 n 2 1 1 1 2 2 2 2 n 1! n 1 2 21 n n
n
(39)
Finally, we obtain n n 1 1 n n 2 2 2
(40)
Related Mathematics
375
This is valid even for n 1 . This is valid when n is a real number, which we do not prove here.
2.6. A Binominal Factor for n k A binominal factor is given by
n
Ck
n! n k !k !
(41)
Therefore, it is expressed with Gamma functions as
n
Ck
n 1
(42)
n k 1 k 1
n k 1 When n and k are integers, and n k , the term approaches to infinity. Therefore, we have
n
Ck 0 for n < k
(43)
2.7. A Beta Function We consider the product of a Gamma function as
x y 4 exp u 2 u 2 x 1du exp v 2 v 2 y 1dv
0
4
0
0
0
exp u v u 2 x 1v 2 y 1dvdu 2
2
(44)
We convert variables as u r cos , v r sin
(45)
The incremental integration area is converted as
dvdu rd dr
(46)
Kunihiro Suzuki
376
Therefore, the product is expressed as
x y 4 0
2 0
exp r 2 r cos
2 exp r 2 r
2 x y 1
0
2 x 1
r sin
dr 2 2 cos
2 x 1
0
x y 2 2 cos
2 x 1
0
sin
2 y 1
2 y 1
d rdr
sin
2 y 1
d
(47)
d
where we change r from 0 to infinity, and from 0 to 2 . We then obtain a Beta function as
B x, y
x y
(48)
x y
The Beta function is thus defined as
B x, y 2 2 cos
2 x 1
0
sin
2 y 1
d
(49)
Performing a variable conversion, cos 2 t
(50)
we obtain
2cos sin d dt
(51)
Therefore, the Beta function is also expressed by
B x, y 2 2 cos
2 x 1
0
sin
2 y 1
cos sin cos sin 1 0
cos 2 1
0
2 x 1
x 1
sin
t x 1 1 t dt 1
0
y 1
2
d
2 y 1
y 1
dt dt
(52)
Related Mathematics
377
3. GAUSS INTEGRATION 3.1. Normal Gauss Integration Gauss integration I is defined as
I
exp ax2 dx
0
(53)
We change the integration region form 0, to , and express the integration as I ' and is given by
I'
exp ax2 dx
exp ax 2
Since
(54)
is an even function,
I ' 2I
(55)
Since x is a dummy variable, we can use any notation. Changing a variable from x to
y
, we can also express the integration given by
I'
exp ay 2 dy (56)
Multiplying the both equation, we obtain
I '2 exp ax 2 dx exp ay 2 dy
exp a x 2 y 2 dxdy
(57)
We change the integration in the Cartesian axis system to that in the polar axis system as
Kunihiro Suzuki
378
r 2 x2 y 2 dxdy 2 rdr
(58)
We then obtain
I '2 0
exp ar 2 2 rdr (59)
Introducing a variable u r2
(60)
we then obtain
du 2rdr
(61)
The integration is given by
I '2 0
exp ar 2 2 rdr
exp au 2 r
0
0
1 du 2r
exp au du
a
(62)
Finally, we obtain 1 I' 2 1 2 a
I
3.2. Modified Gauss Integration The integration related to the Gauss integration is given by
(63)
Related Mathematics
In a
379
x2n exp ax2 dx
0
(64)
We start with the Gauss integration, which can be denoted as I 0 a . The Gauss integration is given by
I0 a
exp ax2 dx
0
(65)
We regard this as a function of a . Differentiating this equation with respect to a , and we obtain dI 0 a
da
d
exp ax 2 dx
0
da
x exp ax 2 dx 2
0
I1
(66)
On the other hand, we obtain 1 d 2 a dI 0 a da da
3
1 a 2 2 2
(67)
Therefore, we obtain I1 a
1 22
a3
(68)
We differentiate I1 a and obtain dI1 a da
0
x 4 exp ax 2 dx
Therefore, we obtain I 2 a as
5
1 a 2 2 2
(69)
Kunihiro Suzuki
380 I2 a
x 4 exp ax 2 dx
0
3 23
a5
(70)
We repeat the above process, and obtain a general form as In a
3 5 7
x 2 n exp ax 2 dx
0
2 2n 1!! 2
n 1
2n 1 n 1
a
2 n 1
a
2 n 1
(71)
where we define
2n 1!! 1 3 5 7
2n 1
(72)
The other integration related to the Gauss integration is also given by
Kn a
x2n1 exp ax2 dx
0
(73)
We start with n 0 given by
K0 a
x exp ax 2 dx
0
(74)
Introducing a variable u x2
(75)
We then have
du 2xdx We obtain
(76)
Related Mathematics K0 a
x exp ax 2 dx
0
x exp au
0
1 2 1 2a
0
1 du 2x
exp au du
(77)
We regard this as a function of obtain dK0 a
da
381
d
a . Differentiating this equation with respect to a , and
x exp ax 2 dx
0
da
x3 exp ax 2 dx
0
K1 a
(78)
On the other hand, we obtain 1 d 2a da 1 2 2a
dK 0 a da
(79)
Therefore, we obtain K1 a
1 2a 2
(80)
We differentiate this further and obtain dK1 a da
0
x5 exp ax 2 dx
Therefore, we obtain
2 2a 3
(81)
Kunihiro Suzuki
382
K2 a
x5 exp ax 2 dx
0
2 2a 3
(82)
We repeat the process and obtain a general form as Kn a
x 2 n 1 exp ax 2 dx
0
2 3 n 2a n 1 n! n 1 2a
(83)
4. AN ERROR FUNCTION An error function Erf x is defined as the integration of a Gauss function as Erf x
1
x
0
exp y 2 dy
(84)
The boundary conditions for the function is given by Erf 0 0
(85)
Erf 1
(86)
The complementary error function is defined as
Erfc x 1 Erf x
(87)
Let us consider the integration below. x
I 0
z2 exp dz 2 2 1
(88)
Related Mathematics
383
Introducing a variable t
z
(89)
2
we further perform the integration as x
z2 exp dz 2 2
I 0
0
1
x
1
2
2 x
exp t 2 2dt
1 2 0
1 x Erf 2 2
2
2
(90)
exp t 2 dt
The dependences of Erf x and Erfc x on x are shown in Figure 2. The inverse error function Erf Erf 1 x
1
x is approximately expressed as
1 7 2 5 127 3 7 4369 4 9 34807 5 11 x x3 x x x x 2 12 480 40320 5806080 182476800
1.0 Erf(x)
Erf(x), Erfc(x)
0.8 0.6 0.4 Erfc(x)
0.2 0.0 0.0
0.5
1.0 x
1.5
2.0
Figure 2. Dependence of an error function and a complementary error function on x .
(91)
Kunihiro Suzuki
384
5. AN INTEGRAL AREA OF CONVERTED VARIABLES We discuss an integral area for converted two variables, where the two variables
x, y is converted to u, v . The corresponding schematic expression is shown in Figure 3. We start with one variable, and consider the conversion given by
x x u
(92)
The integral for the converted variable can be given by
x2
x1
f x dx f x u u2
u1
The factor shows
dx
du
dx du du
(93)
how the axis length is changed by the conversion.
Let us treat two variables. We convert
x, y to u, v , which is expressed in general
as x x u, v y y u, v
(94)
The corresponding total derivative is given by dx dy
x x du dv u v y y du dv u v
(95)
Using a matrix form, this can also be expressed by x dx u dy y u
x v du y dv v
(96)
Related Mathematics
385
This means that the vectors 1,0 , 0,1 , 1,1 in u, v plane is converted to x dx u dy y u x u y u
x v 1 y 0 v
x dx u dy y u x v y v
x v 0 y 1 v
(97)
(98)
x x dx u v 1 dy y y 1 u v x x v v y y v v
in
x, y plane. The area in x, y is given by det J
x u J y u
x v y v
(99)
, where
(100)
Kunihiro Suzuki
386
u, v is related to the incremental Therefore, the incremental area dudv in the plane area in the
x, y plane as
dxdy det J dudv
(101)
Figure 3. Integral areas for converted variables.
6. A MARGINAL PROBABILITY DISTRIBUTION We consider two probability variables X and Y , and relate them to the probability distribution f x, y , which correspond to the probability distribution where both
y
x and
occur. We want to know the probability of X independent of Y . We can obtain it by
f x summing up the probability for X with whole Y . We set it as 1 . If the values are discrete ones, we can obtain N
f1 X xi pij pi1 pi 2 j 1
piN
(102)
If the values are continuous ones, we obtain f1 x
f x, y dy
Similar analysis can be done for
(103)
y
and we obtain
Related Mathematics f2 y
f x, y dx
(104)
The conditional probability
f x y
387
f x y
can be expressed by
f x, y f2 y
(105)
If X is independent on Y , we obtain f x, y f1 x f 2 y
(106)
7. INTEGRATION BY PARTS Let us consider the derivative of a product of two functions fG , which is given by
fG f 'G fg '
(107)
where
g G'
(108)
We can obtain
fgdx fG f Gdx '
(109)
The -th moment of an exponential distribution is given by
X
x 1 x exp dx 0
where we can regard as
(a-4)
Kunihiro Suzuki
388
f x
g
(110)
x 1 exp
(111)
We then obtain f ' x 1
(112)
x G exp
(113)
Finally, we obtain
X
x 1 x exp dx 0
x x x exp x 1 exp dx 0 0
x x 1 exp dx 0
(114)
8. DERIVATIVES OF INVERSE TRIGONOMETRIC FUNCTIONS We define a variable as
x sin y
1 cos y
for
y 2 2
dy dx
Therefore, we obtain
(115)
(116)
Related Mathematics
dy 1 dx cos y 1 1 x2
389
(117)
This leads the integration given by
1 dx sin 1 x C 2 1 x
(118)
We define a variable as for 0 y
x cos y
1 sin y
dy dx
(119)
(120)
Therefore, we obtain dy 1 dx 1 x2
(121)
This leads the integration given by
1 dx cos1 x C 2 1 x
(122)
y 2 2
(123)
x tan y
for
We then obtain 1 dy cos 2 y dx dy 1 x2 dx
1
(124)
Kunihiro Suzuki
390 Therefore, we obtain
dy 1 dx 1 x 2
(125)
This leads the integration given by
1 dx tan 1 x C 1 x2
(126)
9. A DERIVATIVE FUNCTION The derivative function for f x is defined as f ' x lim
f x x f x
(127)
x
x 0
We then have
f x g x ' f ' x g x f x g ' x
(128)
This can be proved as f x g x ' f x x g x x f x g x lim x 0 x f x x g x x f x g x x f x g x x f x g x lim x 0 x f x x g x x g x lim g x x f x lim x 0 x 0 x x f ' x g x f x g ' x
(129)
Eq. (128) can be generalized as f x g x
n
n
n Ck f k 0
nk
g
k
(130)
Related Mathematics
391
We also have
f x f ' x g x f x g ' x 2 g x g x '
(131)
We do not directly prove it, but consider the following. 1 1 ' 1 g x x g x lim x g x x 0 g x x g x g x x g x lim x 0 x g x x g x x lim x 0 g x x g x
(132)
g ' x g x
2
Therefore, we obtain f x 1 1 f x f ' x g x g x g x f ' x g ' x f x 2 g x g x f ' x g x f x g ' x 2 g x '
'
(133)
We also have g f x g ' y f ' x '
(134)
where
y f x
(135)
Kunihiro Suzuki
392 We prove above. We have
f x x y y
(136)
Therefore, we obtain
x 0 : y f x x f x 0
(137)
Therefore, we obtain g f x lim x 0 '
lim
g f x x g f x x g y y g y y y
x 0
lim
g y y g y f x x f x y
x 0
lim
x
g y y g y
y 0
y
x lim
(138)
f x x f x
x 0
x
g ' y f ' x
10. VECTOR DERIVATIVE We treat vector derivative here. We consider a p-th order vector given by
a1 a2 β ap We consider a scalar
(139)
f which depends on the elements of β , which is denoted as
f a1 , , a p . The vector derivative is then defined as
Related Mathematics f a 1 f f a2 β f a p
393
(140)
We consider two special forms for f here. Let us consider a vector X , and a form f as
f βT X
(141)
T Since β is the 1 p vector, X should be p 1 vector given by
x1 x2 X xp
(142)
Therefore, we obtain
f βT X x1 x2 a1 a2 ap xp a1 x1 a2 x2 a p x p p
ai xi i 1
We then obtain
(143)
Kunihiro Suzuki
394 T f β X β β p
ai xi i 1
β
(144)
x1 x2 xp X
f as Let us consider a symmetrical matrix M , and a form
f βT Mβ
(145)
T p p Since β is the 1 p vector and β is the p 1 vector, M should be a matrix given by
m11 m12 m21 m22 M m p1 m p 2
m1 p m2 p m pp
(146)
Therefore, we obtain
f βT Xβ a1 a2
p
p
ai mij a j i 1 j 1
We then obtain
m11 m12 m21 m22 ap m p1 m p 2
m1 p a1 m2 p a2 m pp a p
(147)
Related Mathematics
395
T f β Mβ β β p
p
ai mij a j
(148)
i 1 j 1
β
Let us consider f a1 . The Eq. (148) can be reduced to p
f a1
p
ai mij a j i 1 j 1
a1
p p a1 m1 j a j a1 ai mi1 a12 m11 j 2 i2 a1 p 2a1 m1 j a j a12 m11 j 2 a1
(149)
p 2 m1 j a j a1m11 j 2 p
2 m1 j a j j 1
Since we assume a symmetrical matrix, we assume
mi1 m1i
(150)
in the derivation process. Therefore, we obtain T f β Mβ β β
p 2 m1 j a j j 1 p 2 m2 j a j j 1 p 2 m pj a j j 1 2 Mβ
(151)
Kunihiro Suzuki
396
11. SYMMETRY OF THE MATRIX N x1 A N y2 At N x1 1
We use a matrix of in corresponding analysis when we evaluate eigenvalues and eigenvectors. We study the symmetry of the matrix.
At N x1
At N x1
is evaluate as
n11 n 12 n13 n14
n31 n32 n33 n34
n21 n22 n23 n24
n11 nx1
nx 2 n22
nx1
nx 2
n13
n23
nx1
nx 2
n14
n24
nx1
nx 2
N Multiplying 2 y
1 n y1 0 2 1 t 1 N A N y x 0 0
0 0 1 nx 3
0
nx1
1
0
nx 2
0
0
n31 nx 3 n32 nx 3 n33 nx 3 n34 nx 3
n21
n12
1
(152)
1
to the matrix, we obtain
0
0
1 ny 2
0
0
1 ny 3
0
0
0 0 0 1 n y 4
n11
n21
nx1 n y1
nx 2 n y1
n12 nx1 n y 2 n13 nx1 n y 3 n14 nx1 n y 4
n22 nx 2 n y 2 n23 nx 2 n y 3 n24 nx 2 n y 4
n11
n21
nx1
nx 2
n12
n22
nx1
nx 2
n13
n23
nx1
nx 2
n14
n24
nx1
nx 2
n31 n x 3 n y1 n32 nx 3 n y 2 n33 nx 3 n y 3 n34 nx 3 n y 4
n31 nx 3 n32 nx 3 n33 nx 3 n34 nx 3
(153)
Related Mathematics We further multiply
397
A to the matrix, and obtain
A N y2 At N x1 1
n11 n21 n 31
M 11 M 21 M 31
n12 n13 n14 n22 n23 n24 n32 n33 n34 M 12 M 13 M 22 M 23 M 32 M 33
n31 nx 3 ny1
n11 nx1 ny1
n21 nx 2 ny1
n12 nx1 ny 2
n22 nx 2 ny 2
n32 nx 3 ny 2
n13 nx1 ny 3
n23 nx 2 ny 3
n33 nx 3 ny 3
n14 nx1 ny 4
n24 nx 2 ny 4
n34 nx 3 ny 4
(154)
Each component is given by M11
M12
M13
M 21
M 22
M 23
n 2 n112 n 2 n 2 12 13 14 nx1 ny1 nx1 ny 2 nx1 ny 3 nx1 ny 4
(155)
n n n11n21 n n n n 12 22 13 23 14 24 nx 2 ny1 nx 2 ny 2 nx 2 ny3 nx 2 ny 4
(156)
n11n31 n n n n n n 12 32 13 33 14 4 nx3 ny1 nx3 ny 2 nx3 ny3 nx3 ny 4
(157)
n n n21n11 n n n n 22 12 23 13 24 14 nx1 ny1 nx1 ny 2 nx1 ny3 nx1 ny 4
(158)
n232 n212 n222 n242 nx 2 ny1 nx 2 ny 2 nx 2 ny3 nx 2 ny 4
(159)
n21n31 n n n n n n 22 32 23 33 24 4 nx3 ny1 nx3 ny 2 nx3 ny3 nx3 ny 4
(160)
Kunihiro Suzuki
398 M 31
M 32
M 33
n31n112 n n 2 n n 2 n n 2 32 12 33 13 34 14 nx1 ny1 nx1 ny 2 nx1 ny3 nx1 ny 4
(161)
n31n21 n n n n n n 32 22 33 23 34 24 nx 2 ny1 nx 2 ny 2 nx 2 ny3 nx 2 ny 4
(162)
n312 n 2 n 2 n 2 32 33 34 nx3 ny1 nx3 ny 2 nx3 ny 3 nx3 ny 4
(163)
Finally, we multiply
N x 1 1
to the matrix. This corresponds to that we multiply
n
x2 to the first row elements, to the second row elements, and elements. Therefore, we obtain
M11
M12
M13
M 21
M 22
M 23
M 31
n 2 n112 n 2 n 2 12 13 14 nx1ny1 nx1ny 2 nx1ny 3 nx1ny 4
1
nx3
1
nx1
to the third row
(164)
n13n23 n11n21 n12 n22 n14 n24 nx1nx 2 ny1 nx1nx 2 ny 2 nx1nx 2 ny3 nx1nx 2 ny 4
(165)
n11n31 n12 n32 n13n33 n14 n4 nx1nx3 ny1 nx1nx3 ny 2 nx1nx3 ny3 nx1nx3 ny 4
(166)
n23n13 n21n11 n22 n12 n24 n14 nx 2 nx1 ny1 nx 2 nx1 ny 2 nx 2 nx1 ny3 nx 2 nx1 ny 4
(167)
n 2 n212 n 2 n 2 22 23 24 nx 2 ny1 nx 2 ny 2 nx 2 ny3 nx 2 ny 4
(168)
n21n31 n22 n32 n23n33 n24 n4 nx 2 nx3 ny1 nx 2 nx3 ny 2 nx 2 nx3 ny3 nx2 nx3 ny 4
(169)
n31n112 n n 2 n n 2 n n 2 32 12 33 13 34 14 nx3 nx1 ny1 nx3 nx1 ny 2 nx3 nx1 ny 3 nx3 nx1 ny 4
(170)
Related Mathematics
n31n21 n32 n22 n33n23 n34 n24 nx3nx 2 ny1 nx3nx 2 ny 2 nx3nx 2 ny3 nx3nx 2 ny 4
M 32
M 33
399
(171)
n312 n 2 n 2 n 2 32 33 34 nx3ny1 nx3ny 2 nx3ny3 nx3ny 4
(172)
M ij M ji
, and we can perform a Jacobi method to obtain
This is symmetrical, that is eigenvalues and eigenvectors.
If we use X instead of U , we should obtain the eigenvalue of X . Let us see what happen in this case. The equation we utilize in the text is given by N x1 A N y2 At N x1 N x X 2 N x X 1
2
(173)
This can be modified so that it is associated with X as
N
2 1 x
A N y2 At X 2 X 1
2
(174)
Therefore, we study the symmetry of matrix
N
2 1 y
At
N
2 1 x
A N
2 1 y
t
A
.
is evaluated as
1 n y1 0 2 1 t N y A 0 0 n11 n y1 n12 ny 2 n13 ny 3 n14 n y4
0
0
1 ny 2
0
0
1 ny3
0
0
n21 n y1
n31 n y1 n32 ny 2 n33 ny 3 n4 n y 4
n22 ny 2 n23 ny 3 n24 ny 4
0 n 0 11 n12 n 0 13 n14 1 n y 4
n21 n22 n23 n24
n31 n32 n33 n4
(175)
Kunihiro Suzuki
400
Multiplying A to the matrix of Eq. (175), we obtain A N y2 At 1
n11 n21 n 31
n12 n22 n32
M 11 M 21 M 31
M 12 M 22 M 32
n13 n23 n33
n11 n y1 n12 n14 ny 2 n24 n n34 13 ny 3 n14 n y4
n21 n y1 n22 ny 2 n23 ny 3 n24 ny 4
n31 n y1 n32 ny 2 n33 ny 3 n34 n y 4
M 13 M 23 M 33
(176)
Each element is given below.
M11
M 12
M 13
M 21
M 22
M 23
n112 n12 2 n132 n14 2 ny1 ny 2 ny 3 ny 4 n11n21 n12 n23 n13 n23 n14 n24 n y1 ny 3 ny 3 ny 4
n11n31 n12 n32 n13 n33 n14 n4 n y1 ny 2 ny 3 ny 4
n21n11 n22 n12 n23 n13 n24 n14 n y1 ny 2 ny 3 ny 4
n212 n222 n232 n242 ny1 ny 3 ny 3 ny 4 n21n31 n22 n32 n23 n33 n24 n4 n y1 ny 2 ny 3 ny 4
(177)
(178)
(179)
(180)
(181)
(182)
Related Mathematics
M 33
n312 n32 2 n332 n34 2 ny1 ny 2 ny 3 ny 4
N Finally, we multiply
401
(183)
2 1 x
to the first row element, 1 nx 2 We then obtain
M11
M 12
M 13
M 21
M 22
M 23
M 31
M 32
M 33
to the matrix. This corresponds to that we multiply 1 nx1 to the second row elements 1 nx3 to the third row elements.
n 2 n112 n 2 n 2 12 13 14 nx1ny1 nx1ny 2 nx1ny 3 nx1ny 4 n11n21 n12 n22 n13 n23 n14 n24 nx1ny1 nx1ny 3 nx1ny 3 nx1ny 4
n11n31 n12 n32 n13 n33 n n 14 4 nx1ny1 nx1ny 2 nx1ny 3 nx1ny 4
n n n21n11 n n n n 22 12 23 13 24 14 nx 2 ny1 nx 2 ny 2 nx 2 ny 3 nx 2 ny 4
n 2 n212 n 2 n 2 22 23 24 nx 2 ny1 nx 2 ny 3 nx 2 ny 3 nx 2 ny 4 n21n31 n22 n32 n n n n 23 33 24 4 nx 2 n y1 nx 2 n y 2 nx 2 n y 3 nx 2 n y 4
n31n11 n32 n12 n n n n 33 13 34 14 nx 3 n y1 nx 3 n y 2 nx 3 n y 3 nx 3 n y 4
n31n21 n32 n23 n33 n23 n34 n24 nx 3 n y1 nx 3 n y 3 nx 3 n y 3 nx 3 n y 4
n312 n 2 n 2 n 2 32 33 34 nx3 ny1 nx3 ny 2 nx3 ny 3 nx3ny 4
(184)
(185)
(186)
(187)
(188)
(189)
(190)
(191)
(192)
Kunihiro Suzuki
402
This is not symmetrical, and we cannot perform a Jacobi method to obtain eigenvalues and eigenvectors.
12. A STIRLING’S FORMULA We prove a Stirling’s formula given by lim n! 2n nn en
n
(193)
Step 1: A Wallis’ formula First, we prove a Wallis’ formula given by 22 42 62 2k 2 lim 2k 1 2k 1 n 1 3 3 5 5 7 k 1 n
lim
n
2
2n 2
2n 1 2n 1 (194)
We consider the integral given by
Sn
2 0
sin n xdx
(195)
Performing the integral, we obtain
Sn
2 0
2
sin n xdx sin x sin n 1 xdx
0
cos x sin n 1 x 2 n 1 0 n 1 n 1
1 sin x sin 2 0
2
0
2
n2
2
cos 2 x sin n 2 xdx
0
xdx
sin n 2 x sin n x dx
n 1 Sn 2 n 1 Sn
(196)
Related Mathematics
403
We then obtain
Sn
n 1 Sn 2 n
(197)
We can then have 2n 1 S2n 2 2n 2n 1 2n 3 S2 n 4 2n 2n 2 2n 1 2n 3 3 1 S0 2n 2n 2 4 2
S2n
(198)
2n S2 n 1 2n 1 2n 2n 2 S 2 n 3 2n 1 2n 1 2n 2n 2 4 2 S1 2n 1 2n 1 5 3
S2 n 1
(199)
where
S0
2
(200)
S1 1
(201)
Therefore, we obtain S2 n 1 2n 2n 2n 2 2n 2 S2 n 2n 1 2n 1 2n 1 2n 3
2
22 42 62
1 3 3 5 5 7
44222 5331
2n 2
2n 1 2n 1
In the region of 0 x 2 , we have a relationship below.
(202)
Kunihiro Suzuki
404
0 sin2n1 x sin2n x sin2n1 x
(203)
Therefore, we obtain
0 S2n1 S2n S2n1
(204)
This can be reduced to
1
S2 n S 2n 1 S2n 1 S2n 1
S2n 1 2n 1 2n 2n S2n 1 2n 1
(205)
Therefore, we obtain S2 n 1 n S 2 n 1 lim
2
lim
1 3 3 5 5 7
n
2
2
2 4 6
2
2n 1 2n 1 2n 2
(206)
Finally, we obtain the Wallis’s formula as 22 42 62 2k 2 lim 2k 1 2k 1 n 1 3 3 5 5 7 k 1 n
lim
n
2
2n 2
2n 1 2n 1 (207)
Step 2: We prove below.
lim
n
22 n n 2 n Cn
We have
(208)
Related Mathematics 2n 1 2n 3 2n 2n 2 1 2n 1 2
S2 n S2 n 1
Multiplying
nS2 n 1
n
3 1 2n 2n 2 4 2 2 2n 1 2n 1
405 42 53
(209)
on the both sides of the equation, and we obtain the root of
n 2n 1 2
S2 n S2 n 1
(210)
Using a relationship
lim
n
S2 n 1 S2 n 1
(211)
we have
S2 n lim nS2n 1 n S2n 1
lim n
nS2n 1
2
(212)
On the other hand, we have 2n 2n 2 4 2 2n 1 2n 1 5 3 2n 2n 2n 2 2n 2 2n 1 2n 2n 1 2n 2
S2 n 1
2n 2 2n 2 2 2n 1! 2 22 n n ! 2n 1 2n !
22 n 2n 1 2n Cn
Therefore, we have
4 42 2 5 4 3 2
42 22
(213)
Kunihiro Suzuki
406
lim 2 nS2 n 1 n
22 n lim 2 n n 2n 1 2n Cn 2n 22 n lim n 2n 1 n 2 n Cn 2n 2 2 lim n 2 1 n 2 n Cn n 22 n lim n n 2 n Cn
(214)
Step 3: We consider the integral given by x
ln xdx x ln x x 1
(215)
We then have
n
1
ln xdx x ln x x 1
n
n ln n n 1
(216)
We can regard the above integration geometrically as
n
1
ln xdx ln 2 ln 3
1 ln n 1 ln n 2
1 ln n 1! ln n n 2 which is shown in Figure 4. Let us consider the integral in detail as shown in Figure 5.
(217)
Related Mathematics
407
y
a a a
5
4
3
a
2
a
1
1
3
2
x
n 1 n
4
Figure 4. The integral of logarithm function and sum of the area of bars.
a
D a
2n
E
2 n 1
C B
A
n
n 1
x
Figure 5. The detail of integral of logarithm function and sum of the area of bars in the region of n and n+1.
The difference between the integral and the sum of bars’ areas is given by n a1 a2 a3 a4
a2n2
(218)
Since, we have a2n CDE ABC a2n1
(219)
Kunihiro Suzuki
408
an
decreases monotonically with increasing n. We further have
1 1 1 1 a2n ln n 1 ln n ln 1 2 4 8 n
(220)
Therefore, we obtain lim a2 n 0
n
(221)
then converges to when
an
n , although we do not know the value of
now. Therefore, we obtain
1 n ln n n 1 ln n 1! ln n n 2
(222)
We then have
n 1! nn
1
2
e n e1 n
(223)
Therefore, we obtain
lim
n
n 1! n
n 12 n
n!
lim
n
e
n
n 12 n
e
A
(224)
where
A e1
(225)
Squaring both sides of Eq. (224), we obtain

$\lim_{n\to\infty}\frac{(n!)^2}{n^{2n+1}\,e^{-2n}} = A^2$ (226)

Eq. (224) should hold for $n \to 2n$, and hence we obtain

$\lim_{n\to\infty}\frac{(2n)!}{(2n)^{2n+\frac{1}{2}}\,e^{-2n}} = A$ (227)

Therefore, we obtain

$A = \frac{A^2}{A} = \lim_{n\to\infty}\frac{(n!)^2}{n^{2n+1}e^{-2n}}\,\frac{(2n)^{2n+\frac{1}{2}}e^{-2n}}{(2n)!} = \lim_{n\to\infty}\sqrt{2}\,\frac{2^{2n}\,(n!)^2}{\sqrt{n}\,(2n)!} = \lim_{n\to\infty}\sqrt{2}\,\frac{2^{2n}}{\sqrt{n}\ {}_{2n}C_n} = \sqrt{2}\,\sqrt{\pi} = \sqrt{2\pi}$ (228)

where Eq. (208) was used in the last step. We then have

$\lim_{n\to\infty}\frac{n!}{n^{n+\frac{1}{2}}\,e^{-n}} = A = \sqrt{2\pi}$ (229)
Finally, we obtain

$\lim_{n\to\infty}\frac{n!}{\sqrt{2\pi n}\,n^n\,e^{-n}} = 1$ (230)

We then obtain, for large $n$,

$n! \simeq \sqrt{2\pi n}\,n^n\,e^{-n}$ (231)

This is Stirling's formula.
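Eq. (230) can also be checked numerically. The sketch below is an added illustration (not part of the original text); math.lgamma is used so that $\ln n!$ does not overflow for large $n$:

import math

def stirling_ratio(n):
    # ratio n! / (sqrt(2 pi n) n^n e^{-n}), computed via logarithms
    log_ratio = math.lgamma(n + 1) - (0.5 * math.log(2 * math.pi * n)
                                      + n * math.log(n) - n)
    return math.exp(log_ratio)

for n in (10, 100, 10000):
    print(n, stirling_ratio(n))  # tends to 1 from above as n grows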
13. TRIGONOMETRIC FUNCTIONS

The trigonometric functions can be appreciated well by starting with the Euler formula, given by

$x + iy = re^{i\theta} = r\left(\cos\theta + i\sin\theta\right)$ (232)

Therefore, we obtain

$x' + iy' = (x+iy)\,e^{i\theta} = (x+iy)(\cos\theta + i\sin\theta) = x\cos\theta - y\sin\theta + i\left(x\sin\theta + y\cos\theta\right)$ (233)

This can be rearranged as

$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}$ (234)

Therefore, the matrix

$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$ (235)

is called a rotation matrix. We further obtain

$e^{i2\theta} = e^{i\theta}\,e^{i\theta}$ (236)

This leads to

$\cos 2\theta + i\sin 2\theta = (\cos\theta + i\sin\theta)(\cos\theta + i\sin\theta) = \cos^2\theta - \sin^2\theta + 2i\cos\theta\sin\theta$ (237)

Therefore, we obtain

$\cos 2\theta = \cos^2\theta - \sin^2\theta = 2\cos^2\theta - 1 = 1 - 2\sin^2\theta$ (238)

$\sin 2\theta = 2\cos\theta\sin\theta$ (239)

Further, we obtain

$\cos^2\theta + \sin^2\theta = 1$ (240)

This leads to

$1 + \tan^2\theta = \frac{1}{\cos^2\theta}$ (241)
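The rotation-matrix relations (234)-(239) are easy to verify numerically. The following sketch is an added illustration (the angle 0.7 is arbitrary); it checks that two successive rotations by $\theta$ equal one rotation by $2\theta$ and confirms the double-angle identities:

import numpy as np

theta = 0.7  # an arbitrary angle in radians
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
R2 = np.array([[np.cos(2 * theta), -np.sin(2 * theta)],
               [np.sin(2 * theta),  np.cos(2 * theta)]])

print(np.allclose(R @ R, R2))                                    # Eq. (236)
print(np.isclose(np.cos(2 * theta), 2 * np.cos(theta) ** 2 - 1)) # Eq. (238)
print(np.isclose(np.sin(2 * theta), 2 * np.cos(theta) * np.sin(theta)))  # Eq. (239)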
Appendix 2

SUMMARY OF PROBABILITY DISTRIBUTIONS AND THEIR MOMENTS

ABSTRACT

We studied various probability distributions and showed that the characteristics of the distributions are expressed with their moments. We studied various analytical techniques to evaluate them. We summarize all the results in this chapter.

Keywords: moment, central moment, moment parameter, expectation, uniform distribution, binomial distribution, multinomial distribution, Dirichlet distribution, negative binomial distribution, beta distribution, gamma distribution, inverse gamma distribution, Poisson distribution, geometric distribution, hypergeometric distribution, normal distribution, standard normal distribution, lognormal distribution, Cauchy distribution, $\chi^2$ distribution, $\chi$ distribution, Rayleigh distribution, F distribution, t distribution, exponential distribution, Erlang distribution, Laplace distribution, Weibull distribution

1. INTRODUCTION

We studied the probability distributions starting from Bernoulli trials, extended them to a binomial distribution, which leads to a normal distribution, and derived distributions composed of various variables. We further studied the probability distributions from the standpoint of moments. The moment generating function was also studied to treat the moments in general. The moments were evaluated with the various methods mentioned above, and we summarize all the results in this chapter.
2. GENERAL RELATIONSHIPS

The moments of a probability distribution for discrete and continuous data are defined as

$\langle X \rangle = \sum_{j=1}^{n} x_j f_j$ (1)

$\langle X \rangle = \int x f(x)\,dx$ (2)

The first central moment is the same as the first moment and is defined as

$\mu_1 = \langle X \rangle$ (3)

which has the same form for discrete and continuous data. The central moments of order two and higher are given by

$\mu_k = \sum_{j=1}^{n} \left(x_j - \mu_1\right)^k f_j$ (4)

$\mu_k = \int \left(x - \mu_1\right)^k f(x)\,dx$ (5)

The expressions are the same for both discrete and continuous data and are given by

$\mu_1 = \langle X \rangle$ (6)

$\mu_2 = \langle X^2 \rangle - \langle X \rangle^2$ (7)

$\mu_3 = \left\langle \left(X - \langle X\rangle\right)^3 \right\rangle = \langle X^3 \rangle - 3\langle X^2 \rangle\langle X \rangle + 2\langle X \rangle^3$ (8)

$\mu_4 = \left\langle \left(X - \langle X\rangle\right)^4 \right\rangle = \langle X^4 \rangle - 4\langle X^3 \rangle\langle X \rangle + 6\langle X^2 \rangle\langle X \rangle^2 - 3\langle X \rangle^4$ (9)

The moment parameters are given by

$\mu = \mu_1 = \langle X \rangle$ (10)

$\sigma^2 = \mu_2 = \langle X^2 \rangle - \langle X \rangle^2$ (11)

$\gamma_3 = \frac{\mu_3}{\sigma^3}$ (12)

$\gamma_4 = \frac{\mu_4}{\sigma^4}$ (13)

There is some confusion of notation where $\mu$ and $\sigma^2$ are used as parameters of distribution functions instead of as moment parameters. In that case, we write $\mu_1$ and $\mu_2$. Another source of confusion is the symbol $\sigma^2$, which we also use for the covariance. The covariance can take positive and negative values, while the superscript 2 suggests a positive value. Therefore, where such confusion may arise, we write $\sigma_2$ instead of $\sigma^2$ to express a second-order variable. We also add brief comments on each distribution.
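Eqs. (6)-(13) translate directly into code. The sketch below is an added illustration (the helper name moment_parameters is ours, and the normal test sample is arbitrary); it estimates the moment parameters of a sample from its raw moments:

import numpy as np

def moment_parameters(x):
    # raw moments <X>, <X^2>, <X^3>, <X^4> of the sample
    m1, m2, m3, m4 = (np.mean(x ** k) for k in (1, 2, 3, 4))
    mu2 = m2 - m1 ** 2                                        # Eq. (7)
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3                      # Eq. (8)
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4   # Eq. (9)
    sigma = np.sqrt(mu2)
    # returns mu, sigma^2, gamma3, gamma4 of Eqs. (10)-(13)
    return m1, mu2, mu3 / sigma ** 3, mu4 / sigma ** 4

rng = np.random.default_rng(0)
print(moment_parameters(rng.normal(0.0, 1.0, 1_000_000)))  # about (0, 1, 0, 3)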
3. FUNCTIONS, GENERATING FUNCTIONS, AND MOMENT PARAMETERS FOR VARIOUS PROBABILITY DISTRIBUTIONS

3.1. A Uniform Distribution

Graphics

Figure 1. Uniform distribution.

Probability Function

$f(x) = \begin{cases} \dfrac{1}{b-a} & \text{for } a \le x \le b \\ 0 & \text{for } x < a \text{ or } x > b \end{cases}$ (14)

Generating Function

$\frac{1}{b-a}\,\frac{e^{\theta b} - e^{\theta a}}{\theta}$ (15)

Moments

$\langle X \rangle = \frac{b+a}{2}$ (16)

$\langle X^2 \rangle = \frac{1}{3}\left(b^2 + ba + a^2\right)$ (17)

$\langle X^3 \rangle = \frac{1}{4}\left(b^3 + b^2a + ba^2 + a^3\right)$ (18)

$\langle X^4 \rangle = \frac{1}{5}\left(b^4 + b^3a + b^2a^2 + ba^3 + a^4\right)$ (19)

Central Moments

$\mu_1 = \frac{b+a}{2}$ (20)

$\mu_2 = \frac{1}{12}\left(b-a\right)^2$ (21)

$\mu_3 = 0$ (22)

$\mu_4 = \frac{1}{80}\left(b-a\right)^4$ (23)

Moment Parameters

$\mu = \frac{b+a}{2}$ (24)

$\sigma^2 = \frac{1}{12}\left(b-a\right)^2$ (25)

$\gamma_3 = 0$ (26)

$\gamma_4 = \frac{9}{5}$ (27)

Peak Position
None.

Comment
The angle probability in darts is uniform over the value range of 0 to $2\pi$ radians. If there is no special reason for a variable to favor a certain value, its probability distribution should be uniform.
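As a sampling check (an added sketch with arbitrary bounds, not part of the original text), the moment parameters (24), (25), and (27) can be estimated from uniform random numbers:

import numpy as np

rng = np.random.default_rng(1)
a, b = 2.0, 5.0  # arbitrary example bounds
x = rng.uniform(a, b, 1_000_000)
print(x.mean(), (b + a) / 2)              # Eq. (24)
print(x.var(), (b - a) ** 2 / 12)         # Eq. (25)
mu4 = np.mean((x - x.mean()) ** 4)
print(mu4 / x.var() ** 2, 9 / 5)          # Eq. (27)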
3.2. A Binomial Distribution

Graphics

Figure 2. Binomial distribution. (a) $p$ dependence with $n = 10$ ($p = 0.1, 0.3, 0.5$). (b) $n$ dependence with $p = 0.2$ ($n = 5, 10, 50$).

Probability Function

$f(x) = {}_nC_x\,p^x q^{n-x}$ for $x = 0, 1, 2, \ldots, n$ (28)

where

$q = 1 - p$ (29)

Generating Function

$\left(pe^{\theta} + q\right)^n$ (30)

Moments

$\langle X \rangle = np$ (31)

$\langle X^2 \rangle = n(n-1)p^2 + np$ (32)

$\langle X^3 \rangle = n(n-1)(n-2)p^3 + 3n(n-1)p^2 + np$ (33)

$\langle X^4 \rangle = n(n-1)(n-2)(n-3)p^4 + 6n(n-1)(n-2)p^3 + 7n(n-1)p^2 + np$ (34)

Central Moments

$\mu_1 = np$ (35)

$\mu_2 = n(n-1)p^2 + np - (np)^2 = np(1-p)$ (36)

$\mu_3 = np(1-p)(1-2p)$ (37)

$\mu_4 = 3n(n-2)p^2(1-p)^2 + np(1-p)$ (38)

Moment Parameters

$\mu = np$ (39)

$\sigma^2 = np(1-p)$ (40)

$\gamma_3 = \frac{1-2p}{\sqrt{np(1-p)}}$ (41)

$\gamma_4 = \frac{3(n-2)}{n} + \frac{1}{np(1-p)}$ (42)

The composite variable $Y = X_1 + X_2$ also follows a binomial distribution with parameters

$f(y) = {}_nC_y\,p^y q^{n-y}$ (43)

where

$n = n_1 + n_2$ (44)

Peak Position
The peak position is given by

$np - q \le x_0 \le np + p$ (45)

where $x_0$ is an integer.

Comment
This distribution is related to Bernoulli trials, where we have only two values of 1 and 0. The binomial distribution describes the number of target events among the total trials. The coin toss is the most frequently used example for this distribution: we obtain tails or heads in each toss, and the binomial distribution gives the probability of $x$ head events among $n$ trials. This distribution can be applied to any variable associated with Bernoulli trials.
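The additivity in Eqs. (43)-(44) can be checked by sampling (an added sketch with arbitrary parameters):

import numpy as np

rng = np.random.default_rng(2)
p, n1, n2 = 0.3, 10, 15
y = rng.binomial(n1, p, 1_000_000) + rng.binomial(n2, p, 1_000_000)
# Y = X1 + X2 should match one binomial with n = n1 + n2
print(y.mean(), (n1 + n2) * p)             # Eq. (39)
print(y.var(), (n1 + n2) * p * (1 - p))    # Eq. (40)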
3.3. A Multinomial Distribution

Graphics
The corresponding graphic is the same as the one for the binomial distribution.

Probability Function

$f(x_1, x_2, \ldots, x_m) = \frac{n!}{x_1!\,x_2!\cdots x_m!}\,p_1^{x_1}\,p_2^{x_2}\cdots p_m^{x_m}$ (46)

where

$x_1 + x_2 + \cdots + x_m = n$ (47)

$p_1 + p_2 + \cdots + p_m = 1$ (48)

Generating Function

$\left(p_i e^{\theta} + q_i\right)^n$ with $q_i = 1 - p_i$ (49)

Moments

$\langle X_i \rangle = np_i$ (50)

$\langle X_i^2 \rangle = n(n-1)p_i^2 + np_i$ (51)

$\langle X_i^3 \rangle = n(n-1)(n-2)p_i^3 + 3n(n-1)p_i^2 + np_i$ (52)

$\langle X_i^4 \rangle = n(n-1)(n-2)(n-3)p_i^4 + 6n(n-1)(n-2)p_i^3 + 7n(n-1)p_i^2 + np_i$ (53)

$\langle X_i X_j \rangle = n(n-1)p_ip_j$ (54)

Central Moments

$\mu_1 = np_i$ (55)

$\mu_2 = np_i(1-p_i)$ (56)

$\mu_3 = np_i(1-p_i)(1-2p_i)$ (57)

$\mu_4 = 3n(n-2)p_i^2(1-p_i)^2 + np_i(1-p_i)$ (58)

Moment Parameters

$\mu = np_i$ (59)

$\sigma^2 = np_i(1-p_i)$ (60)

$\gamma_3 = \frac{1-2p_i}{\sqrt{np_i(1-p_i)}}$ (61)

$\gamma_4 = \frac{3(n-2)}{n} + \frac{1}{np_i(1-p_i)}$ (62)

$\sigma_{ij2} = \langle X_i X_j \rangle - \langle X_i \rangle\langle X_j \rangle = n(n-1)p_ip_j - np_i\,np_j = -np_ip_j$ (63)

$\rho_{ij} = \frac{\sigma_{ij2}}{\sigma_i\,\sigma_j} = -\sqrt{\frac{p_ip_j}{(1-p_i)(1-p_j)}}$ (64)

Peak Position
The peak position is given by

$np_i - (1-p_i) \le x_0 \le np_i + p_i$ (65)

where $x_0$ is an integer.

Comment
This distribution is related to Bernoulli-type trials where each trial can take many values. The multinomial distribution describes the number of occurrences of one event among the total trials. The parameters above are focused on one variable, and hence the results are the same as the ones for the binomial distribution.
3.4. A Negative Binomial Distribution

Graphics

Figure 3. Negative binomial distribution ($p = 0.2, 0.5, 0.7$). (a) $r = 2$. (b) $r = 5$.

Probability Function

$f(x, r, p) = {}_{r-1+x}C_{r-1}\,p^{r-1}q^{x}\,p = {}_{r-1+x}C_{x}\,p^{r}q^{x}$ for $x = 0, 1, 2, 3, \ldots;\ r > 0$ (66)

where

$q = 1 - p$ (67)

Generating Function

$\left(\frac{p}{1 - qe^{\theta}}\right)^r$ (68)

Moments

$\langle X \rangle = \frac{rq}{p}$ (69)

$\langle X^2 \rangle = \frac{rq}{p} + \frac{r(r+1)q^2}{p^2}$ (70)

$\langle X^3 \rangle = \frac{rq}{p} + \frac{3r(r+1)q^2}{p^2} + \frac{r(r+1)(r+2)q^3}{p^3}$ (71)

$\langle X^4 \rangle = \frac{rq}{p} + \frac{7r(r+1)q^2}{p^2} + \frac{6r(r+1)(r+2)q^3}{p^3} + \frac{r(r+1)(r+2)(r+3)q^4}{p^4}$ (72)

Central Moments

$\mu_1 = \frac{rq}{p}$ (73)

$\mu_2 = \frac{rq}{p^2}$ (74)

$\mu_3 = \frac{r(1-p)(2-p)}{p^3}$ (75)

$\mu_4 = \frac{rq}{p^2} + \frac{3r(r+2)q^2}{p^4}$ (76)

Moment Parameters

$\mu = \frac{rq}{p}$ (77)

$\sigma^2 = \frac{rq}{p^2}$ (78)

$\gamma_3 = \frac{2-p}{\sqrt{r(1-p)}}$ (79)

$\gamma_4 = 3\left(1 + \frac{2}{r}\right) + \frac{p^2}{rq}$ (80)

The composite variable $Y = X_1 + X_2$ also follows a negative binomial distribution with parameters

$f(y, r, p) = {}_{r-1+y}C_{y}\,p^r q^y$ (81)

where

$r = r_1 + r_2$ (82)

Peak Position
The peak position $x_0$ is given by

$\frac{rq - 1}{p} \le x_0 \le \frac{rq - 1}{p} + 1$ (83)

where $x_0$ is an integer.

Comment
This distribution is related to the binomial distribution, but for a special situation. In Bernoulli trials we obtain two kinds of events: success and failure. The negative binomial distribution describes the situation where we obtain the $r$-th success at trial number $x + r$. Therefore, the distribution is related to the number of failures $x$ observed by the time we have $r$ successes.
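The failure-counting convention of Eq. (66) is also the one used by numpy, whose negative_binomial sampler counts the failures before the $r$-th success, so Eqs. (69) and (74) can be checked directly (an added sketch with arbitrary parameters):

import numpy as np

rng = np.random.default_rng(3)
r, p = 5, 0.4
x = rng.negative_binomial(r, p, 1_000_000)  # failures before the r-th success
q = 1 - p
print(x.mean(), r * q / p)        # Eq. (69)
print(x.var(), r * q / p ** 2)    # Eq. (74)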
3.5. A Beta Distribution

Graphics

Figure 4. Beta distributions. (a) $\alpha > 1,\ \beta > 1$. (b) $\alpha < 1,\ \beta < 1$. (c) $(\alpha-1)(\beta-1) < 0$. (d) $\alpha = \beta$.

Probability Function

$f(x) = \frac{x^{\alpha-1}\left(1-x\right)^{\beta-1}}{B(\alpha, \beta)}$ for $0 \le x \le 1;\ \alpha > 0,\ \beta > 0$ (84)

Generating Function
None.

Moments

$\langle X^i \rangle = \frac{B(\alpha+i, \beta)}{B(\alpha, \beta)}$ (85)

Central Moments
Use a theorem.

Moment Parameters
Use a theorem.

Peak Position
The distribution has a peak for $\alpha > 1$ and $\beta > 1$, and the peak position $x_0$ is given by

$x_0 = \frac{\alpha - 1}{\alpha + \beta - 2}$ (86)

Comment
A Beta distribution gives various kinds of shapes between 0 and 1 with varying parameter values of $\alpha$ and $\beta$. It is in a sense an extension of a binomial distribution, and it is more flexible in that we can set the values of $\alpha$ and $\beta$ independently. A Beta distribution is used in Bayes' theorem.
3.6. A Dirichlet Distribution

Graphics
The corresponding graphic is the same as the one for the Beta distribution.

Probability Function

$f(x_1, x_2, \ldots, x_m) = \frac{\Gamma(\alpha)}{\Gamma(\alpha_1)\,\Gamma(\alpha_2)\cdots\Gamma(\alpha_m)}\,x_1^{\alpha_1-1}\,x_2^{\alpha_2-1}\cdots x_m^{\alpha_m-1}$ (87)

where

$\alpha = \alpha_1 + \alpha_2 + \cdots + \alpha_m$ (88)

$x_1 + x_2 + \cdots + x_m = 1$ (89)

Generating Function
None.

Moments

$\langle x_j \rangle = \frac{\alpha_j}{\alpha}$ (90)

$\langle x_j^2 \rangle = \frac{\alpha_j(\alpha_j+1)}{\alpha(\alpha+1)}$ (91)

$\langle x_j^3 \rangle = \frac{\alpha_j(\alpha_j+1)(\alpha_j+2)}{\alpha(\alpha+1)(\alpha+2)}$ (92)

$\langle x_j^4 \rangle = \frac{\alpha_j(\alpha_j+1)(\alpha_j+2)(\alpha_j+3)}{\alpha(\alpha+1)(\alpha+2)(\alpha+3)}$ (93)

The moment associated with the covariance is given by

$\langle x_i x_j \rangle = \frac{\alpha_i\,\alpha_j}{\alpha(\alpha+1)}$ (94)

Central Moments
Use a theorem.

Moment Parameters
Use a theorem.

Peak Position
The peak position is the same as the one for the Beta distribution.

Comment
A Dirichlet distribution is a likelihood function for the multinomial distribution and is used in Bayes' theorem.
3.7. A Gamma Distribution

Graphics

Figure 5. Gamma distribution ($\lambda = 0.5, 1.0, 2.0$). (a) $n = 1$. (b) $n = 5$.

Probability Function

$f(x) = \frac{x^{n-1}\,e^{-\frac{x}{\lambda}}}{\Gamma(n)\,\lambda^n}$ for $x > 0$ (95)

Generating Function

$\left(\frac{1}{1 - \lambda\theta}\right)^n$ (96)

Moments

$\langle X \rangle = n\lambda$ (97)

$\langle X^2 \rangle = n(n+1)\lambda^2$ (98)

$\langle X^3 \rangle = n(n+1)(n+2)\lambda^3$ (99)

$\langle X^4 \rangle = n(n+1)(n+2)(n+3)\lambda^4$ (100)

Central Moments

$\mu_1 = \frac{\Gamma(n+1)}{\Gamma(n)}\,\lambda$ (101)

$\mu_2 = \left[\frac{\Gamma(n+2)}{\Gamma(n)} - \left(\frac{\Gamma(n+1)}{\Gamma(n)}\right)^2\right]\lambda^2$ (102)

$\mu_3 = \left[\frac{\Gamma(n+3)}{\Gamma(n)} - 3\,\frac{\Gamma(n+2)}{\Gamma(n)}\,\frac{\Gamma(n+1)}{\Gamma(n)} + 2\left(\frac{\Gamma(n+1)}{\Gamma(n)}\right)^3\right]\lambda^3$ (103)

$\mu_4 = \left[\frac{\Gamma(n+4)}{\Gamma(n)} - 4\,\frac{\Gamma(n+3)}{\Gamma(n)}\,\frac{\Gamma(n+1)}{\Gamma(n)} + 6\,\frac{\Gamma(n+2)}{\Gamma(n)}\left(\frac{\Gamma(n+1)}{\Gamma(n)}\right)^2 - 3\left(\frac{\Gamma(n+1)}{\Gamma(n)}\right)^4\right]\lambda^4$ (104)

When $n$ is an integer, they are reduced to

$\mu_1 = n\lambda$ (105)

$\mu_2 = n\lambda^2$ (106)

$\mu_3 = 2n\lambda^3$ (107)

$\mu_4 = 3n(n+2)\lambda^4$ (108)

Moment Parameters
Use a theorem. When $n$ is an integer, they are reduced to

$\mu = n\lambda$ (109)

$\sigma^2 = n\lambda^2$ (110)

$\gamma_3 = \frac{2}{\sqrt{n}}$ (111)

$\gamma_4 = 3\left(1 + \frac{2}{n}\right)$ (112)

We consider two independent variables $X_1$ and $X_2$ which follow Gamma distributions and form a composite one as

$Y = X_1 + X_2$ (113)

It also follows a Gamma distribution with

$f(y) = \frac{y^{n-1}\,e^{-\frac{y}{\lambda}}}{\Gamma(n)\,\lambda^n}$ (114)

where

$n = n_1 + n_2$ (115)

Peak Position
The peak position is given by

$x_0 = (n-1)\lambda$ (116)

Comment
When an event occurs on average once per time period $\lambda$, the Gamma distribution expresses the probability distribution of the time period in which the event occurs $n$ times. Therefore, it expresses the probability that an event occurs $n$ times during the time period of $x$.
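The additivity in Eqs. (113)-(115) can be checked by sampling (an added sketch with arbitrary parameters; numpy's gamma sampler takes the shape $n$ and scale $\lambda$):

import numpy as np

rng = np.random.default_rng(4)
lam, n1, n2 = 2.0, 3.0, 4.0
y = rng.gamma(n1, lam, 1_000_000) + rng.gamma(n2, lam, 1_000_000)
n = n1 + n2
print(y.mean(), n * lam)        # Eq. (109)
print(y.var(), n * lam ** 2)    # Eq. (110)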
3.8. An Inverse Gamma Distribution

Graphics

Figure 6. Inverse Gamma distribution ($n = 3$; $\lambda = 1, 2, 5$).

Probability Function

$f(x) = \frac{\lambda^n\,x^{-n-1}\,e^{-\frac{\lambda}{x}}}{\Gamma(n)}$ for $x > 0$ (117)

Generating Function
None.

Moments

$\langle X \rangle = \frac{\Gamma(n-1)}{\Gamma(n)}\,\lambda$ (118)

$\langle X^2 \rangle = \frac{\Gamma(n-2)}{\Gamma(n)}\,\lambda^2$ (119)

$\langle X^3 \rangle = \frac{\Gamma(n-3)}{\Gamma(n)}\,\lambda^3$ (120)

$\langle X^4 \rangle = \frac{\Gamma(n-4)}{\Gamma(n)}\,\lambda^4$ (121)

Central Moments

$\mu_1 = \frac{\Gamma(n-1)}{\Gamma(n)}\,\lambda$ (122)

$\mu_2 = \left[\frac{\Gamma(n-2)}{\Gamma(n)} - \left(\frac{\Gamma(n-1)}{\Gamma(n)}\right)^2\right]\lambda^2$ (123)

$\mu_3 = \left[\frac{\Gamma(n-3)}{\Gamma(n)} - 3\,\frac{\Gamma(n-2)}{\Gamma(n)}\,\frac{\Gamma(n-1)}{\Gamma(n)} + 2\left(\frac{\Gamma(n-1)}{\Gamma(n)}\right)^3\right]\lambda^3$ (124)

$\mu_4 = \left[\frac{\Gamma(n-4)}{\Gamma(n)} - 4\,\frac{\Gamma(n-3)}{\Gamma(n)}\,\frac{\Gamma(n-1)}{\Gamma(n)} + 6\,\frac{\Gamma(n-2)}{\Gamma(n)}\left(\frac{\Gamma(n-1)}{\Gamma(n)}\right)^2 - 3\left(\frac{\Gamma(n-1)}{\Gamma(n)}\right)^4\right]\lambda^4$ (125)

When $n$ is an integer, they are reduced to

$\mu_1 = \frac{\lambda}{n-1}$ (126)

$\mu_2 = \frac{\lambda^2}{(n-1)^2(n-2)}$ (127)

$\mu_3 = \frac{4\lambda^3}{(n-1)^3(n-2)(n-3)}$ (128)

$\mu_4 = \frac{3(n+5)\,\lambda^4}{(n-1)^4(n-2)(n-3)(n-4)}$ (129)

Moment Parameters
Use a theorem. When $n$ is an integer, they are reduced to

$\mu = \frac{\lambda}{n-1}$ (130)

$\sigma^2 = \frac{\lambda^2}{(n-1)^2(n-2)}$ (131)

$\gamma_3 = \frac{4\sqrt{n-2}}{n-3}$ (132)

$\gamma_4 = \frac{3(n+5)(n-2)}{(n-3)(n-4)}$ (133)

Peak Position
The peak position $x_0$ is given by

$x_0 = \frac{\lambda}{n+1}$ (134)

Comment
The probability of obtaining several data that follow the same normal distribution can be expressed as the product of many normal distributions. If we regard this function as a function of the variance, it has the form of an inverse Gamma function. Therefore, the inverse Gamma function is used as a prior distribution in Bayesian statistics for evaluating the variance. The inverse Gamma distribution is derived by converting a variable $X$ that follows the Gamma distribution to its inverse $1/X$, which is the reason why the distribution is called the inverse Gamma distribution.
3.9. A Poisson Distribution

Graphics

Figure 7. Poisson distribution ($\lambda = 1, 2, 5$).

Probability Function

$f(x) = \frac{\lambda^x}{x!}\,e^{-\lambda}$ for $x = 0, 1, 2, \ldots$ (135)

Generating Function

$\exp\left[\lambda\left(e^{\theta} - 1\right)\right] = \exp\left[\lambda\left(\theta + \frac{\theta^2}{2} + \frac{\theta^3}{3!} + \frac{\theta^4}{4!} + \cdots\right)\right]$ (136)

Moments

$\langle X \rangle = \lambda$ (137)

$\langle X^2 \rangle = \lambda^2 + \lambda$ (138)

$\langle X^3 \rangle = \lambda^3 + 3\lambda^2 + \lambda$ (139)

$\langle X^4 \rangle = \lambda^4 + 6\lambda^3 + 7\lambda^2 + \lambda$ (140)

Central Moments

$\mu_1 = \lambda$ (141)

$\mu_2 = \lambda$ (142)

$\mu_3 = \lambda$ (143)

$\mu_4 = 3\lambda^2 + \lambda$ (144)

Moment Parameters

$\mu = \lambda$ (145)

$\sigma^2 = \lambda$ (146)

$\gamma_3 = \frac{1}{\sqrt{\lambda}}$ (147)

$\gamma_4 = 3 + \frac{1}{\lambda}$ (148)

The composite variable $Y = X_1 + X_2$ also follows a Poisson distribution

$f(y) = \frac{\lambda^y}{y!}\,e^{-\lambda}$ (149)

where

$\lambda = \lambda_1 + \lambda_2$ (150)

Peak Position
The peak position $x_0$ is given by

$\lambda - 1 \le x_0 \le \lambda$ (151)

where $x_0$ is an integer.

Comment
A Poisson distribution can be regarded as the limit of a binomial distribution with a quite small probability and many trials. This distribution is convenient from the standpoint of convergence, which is a severe issue for the binomial distribution. Therefore, the Poisson distribution is frequently used in the many cases where the occurrence probability is small. It should be noted that the parameter of this distribution is the average event number, while the parameter of the binomial distribution is the probability that a target event occurs.
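The limiting relation in the comment above can be illustrated by sampling (an added sketch; the values of $\lambda$ and $n$ are arbitrary, with $p = \lambda/n$ small):

import numpy as np

rng = np.random.default_rng(5)
lam, n = 3.0, 10_000
x_binom = rng.binomial(n, lam / n, 1_000_000)  # many trials, small p
x_pois = rng.poisson(lam, 1_000_000)
print(x_binom.mean(), x_pois.mean(), lam)      # Eq. (145)
print(x_binom.var(), x_pois.var(), lam)        # Eq. (146)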
3.10. A Geometric Distribution

Graphics

Figure 8. Geometric distribution ($p = 0.1, 0.2, 0.5$).

Probability Distribution

$f(x) = q^{x-1}\,p$ for $x = 1, 2, \ldots$ (152)

where

$q = 1 - p$ (153)

Generating Function

$\frac{pe^{\theta}}{1 - qe^{\theta}}$ (154)

Moments

$\langle X \rangle = \frac{1}{p}$ (155)

$\langle X^2 \rangle = \frac{2 - p}{p^2}$ (156)

$\langle X^3 \rangle = \frac{6 - 6p + p^2}{p^3}$ (157)

$\langle X^4 \rangle = \frac{24 - 36p + 14p^2 - p^3}{p^4}$ (158)

Central Moments

$\mu_1 = \frac{1}{p}$ (159)

$\mu_2 = \frac{1-p}{p^2}$ (160)

$\mu_3 = \frac{(1-p)(2-p)}{p^3}$ (161)

$\mu_4 = \frac{\left[9(1-p) + p^2\right](1-p)}{p^4}$ (162)

Moment Parameters

$\mu = \frac{1}{p}$ (163)

$\sigma^2 = \frac{1-p}{p^2}$ (164)

$\gamma_3 = \frac{2-p}{\sqrt{1-p}}$ (165)

$\gamma_4 = 9 + \frac{p^2}{1-p}$ (166)

Peak Position

$x_0 = 1$ (167)

Comment
In a Bernoulli event with probability $p$, the trial number at which we succeed for the first time follows a geometric distribution. This distribution corresponds to the case where we succeed for the first time at the $x$-th trial. This means that we fail $x - 1$ times and then succeed.
3.11. A Hypergeometric Distribution

Graphics

Figure 9. Hypergeometric distribution ($N = 50$, $M = 15$, $n = 10$).

Probability Distribution

$f(x) = \frac{{}_MC_x\ {}_{N-M}C_{n-x}}{{}_NC_n}$ (168)

Generating Function
None.

Moments

$\langle X \rangle = np$ (169)

$\langle X^2 \rangle = np\,\frac{N - n + (n-1)pN}{N-1}$ (170)

$\langle X^3 \rangle = np\,\frac{(pN-1)(pN-2)(n-1)(n-2)}{(N-1)(N-2)} + 3np\,\frac{(pN-1)(n-1)}{N-1} + np$ (171)

$\langle X^4 \rangle = \langle X(X-1)(X-2)(X-3) \rangle + 6\langle X^3 \rangle - 11\langle X^2 \rangle + 6\langle X \rangle$ (172)

$\langle X(X-1)(X-2)(X-3) \rangle = np\,(pN-1)(pN-2)(pN-3)\,\frac{(n-1)(n-2)(n-3)}{(N-1)(N-2)(N-3)}$ (173)

where

$p = \frac{M}{N}$ (174)

$q = 1 - p$ (175)

Central Moments

$\mu_1 = np$ (176)

$\mu_2 = npq\,\frac{N-n}{N-1}$ (177)

$\mu_3 = npq(1-2p)\,\frac{(N-n)(N-2n)}{(N-1)(N-2)}$ (178)

We do not have a compact expression for $\mu_4$, and hence evaluate it using the theorem as

$\mu_4 = \langle X^4 \rangle - 4\langle X^3 \rangle\langle X \rangle + 6\langle X^2 \rangle\langle X \rangle^2 - 3\langle X \rangle^4$ (179)

Moment Parameters

$\mu = np$ (180)

$\sigma^2 = npq\,\frac{N-n}{N-1}$ (181)

We do not have a compact expression for $\gamma_3$, and hence evaluate it as

$\gamma_3 = \frac{\mu_3}{\sigma^3}$ (182)

We do not have a compact expression for $\gamma_4$, and hence evaluate it as

$\gamma_4 = \frac{\mu_4}{\sigma^4}$ (183)

Peak Position
The peak position $x_0$ is given by

$\frac{np - q + \frac{n-1}{N}}{1 + \frac{2}{N}} \le x_0 \le \frac{np + p + \frac{n+1}{N}}{1 + \frac{2}{N}}$ (184)

where $x_0$ is an integer.

Comment
This distribution is related to Bernoulli trials where we have only two values. A binomial distribution describes the number of one kind of event among the total trials, assuming that the probability of the event is invariant: when determining the ratio of red balls among red and black balls, we return each picked ball, and hence the ratio of red balls among the total stays constant. If we do not return the ball, the probability of obtaining a red ball changes at each draw, and the corresponding probability distribution becomes this hypergeometric one.
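The finite-population correction in Eq. (181) is visible in a sampling check (an added sketch using the parameters of Figure 9; numpy's hypergeometric sampler takes the counts of good and bad items and the sample size):

import numpy as np

rng = np.random.default_rng(6)
N, M, n = 50, 15, 10
x = rng.hypergeometric(M, N - M, n, 1_000_000)
p = M / N
q = 1 - p
print(x.mean(), n * p)                            # Eq. (180)
print(x.var(), n * p * q * (N - n) / (N - 1))     # Eq. (181)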
3.12. A Normal Distribution

Graphics

Figure 10. Normal distribution. (a) $\mu$ dependence. (b) $\sigma$ dependence.

Probability Distribution

$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$ for $-\infty < x < \infty$ (185)

Generating Function

$\exp\left(\mu\theta + \frac{\sigma^2\theta^2}{2}\right)$ (186)

Moments

$\langle X \rangle = \mu$ (187)

$\langle (X-\mu)^2 \rangle = \sigma^2$ (188)

$\langle (X-\mu)^3 \rangle = 0$ (189)

$\langle (X-\mu)^4 \rangle = 3\sigma^4$ (190)

Central Moments

$\mu_1 = \mu$ (191)

$\mu_2 = \sigma^2$ (192)

$\mu_3 = 0$ (193)

$\mu_4 = 3\sigma^4$ (194)

Moment Parameters

$\mu$ (195)

$\sigma^2$ (196)

$\gamma_3 = 0$ (197)

$\gamma_4 = 3$ (198)

The composite variable $Y = X_1 + X_2$ also follows a normal distribution given by

$f(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(y-\mu)^2}{2\sigma^2}\right]$ (199)

with parameters given by

$\mu = \mu_1 + \mu_2$ (200)

$\sigma^2 = \sigma_1^2 + \sigma_2^2$ (201)

A partial normal distribution is given by

$f(x) = \frac{2}{1 + \mathrm{Erf}\left(\frac{a}{\sqrt{2}\,\sigma}\right)}\,\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x-a)^2}{2\sigma^2}\right]$ for $x \ge 0$ (202)

A joined half normal distribution is given by

$f(x) = \begin{cases} \dfrac{2}{\sqrt{2\pi}\,(\sigma_1+\sigma_2)}\exp\left[-\dfrac{(x-\mu)^2}{2\sigma_1^2}\right] & \text{for } x \le \mu \\ \dfrac{2}{\sqrt{2\pi}\,(\sigma_1+\sigma_2)}\exp\left[-\dfrac{(x-\mu)^2}{2\sigma_2^2}\right] & \text{for } x > \mu \end{cases}$ (203)

Peak Position

$x_0 = \mu$ (204)

Comment
When a probability variable is influenced by many independent factors, and each factor is not extremely significant compared with the other factors, we can expect that the variable follows a normal distribution. Many probability variables that have bell-shaped frequency distributions are expressed with a normal distribution. Therefore, the normal distribution is the most important probability distribution in statistics.
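The composite property in Eqs. (199)-(201) can be checked by sampling (an added sketch with arbitrary parameters):

import numpy as np

rng = np.random.default_rng(7)
mu1, s1, mu2, s2 = 1.0, 2.0, -0.5, 1.5
y = rng.normal(mu1, s1, 1_000_000) + rng.normal(mu2, s2, 1_000_000)
print(y.mean(), mu1 + mu2)          # Eq. (200)
print(y.var(), s1 ** 2 + s2 ** 2)   # Eq. (201)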
3.13. A Standard Normal Distribution

Graphics

Figure 11. Standard normal distribution.

Probability Distribution

$f(x) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right)$ for $-\infty < x < \infty$ (205)

Generating Function

$\exp\left(\frac{\theta^2}{2}\right)$ (206)

Moments

$\langle X \rangle = 0$ (207)

$\langle X^2 \rangle = 1$ (208)

$\langle X^3 \rangle = 0$ (209)

$\langle X^4 \rangle = 3$ (210)

Central Moments

$\mu_1 = 0$ (211)

$\mu_2 = 1$ (212)

$\mu_3 = 0$ (213)

$\mu_4 = 3$ (214)

Moment Parameters

$\mu = 0$ (215)

$\sigma^2 = 1$ (216)

$\gamma_3 = 0$ (217)

$\gamma_4 = 3$ (218)

Peak Position

$x_0 = 0$ (219)

Comment
The normal distribution is the most important one, with two parameters: an average and a variance. The distribution is therefore controlled by these two parameters. The standard normal distribution is normalized with respect to the two parameters and is independent of any parameter. Probability variables are converted to normalized ones; we can then analyze the data with the standard normal distribution, and the results are converted back to the original normal distribution if we know the related average and variance.
3.14. A Lognormal Distribution

Graphics

Figure 12. Lognormal distribution ($\mu = 0$; $\sigma = 0.5, 1.0, 1.5, 2.0$).

Probability Distribution

$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma x}\exp\left[-\frac{\left(\ln x - \mu\right)^2}{2\sigma^2}\right]$ for $x > 0$ (220)

Generating Function
None.

Moments

$\langle X \rangle = \exp\left(\mu + \frac{1}{2}\sigma^2\right)$ (221)

$\langle X^2 \rangle = \exp\left(2\mu + 2\sigma^2\right)$ (222)

$\langle X^3 \rangle = \exp\left(3\mu + \frac{9}{2}\sigma^2\right)$ (223)

$\langle X^4 \rangle = \exp\left(4\mu + 8\sigma^2\right)$ (224)

Central Moments

$\mu_1 = \exp\left(\mu + \frac{1}{2}\sigma^2\right)$ (225)

$\mu_2 = \exp\left(2\mu + \sigma^2\right)\left[\exp\left(\sigma^2\right) - 1\right]$ (226)

$\mu_3 = \exp\left(3\mu + \frac{3}{2}\sigma^2\right)\left[\exp\left(3\sigma^2\right) - 3\exp\left(\sigma^2\right) + 2\right]$ (227)

$\mu_4 = \exp\left(4\mu + 2\sigma^2\right)\left[\exp\left(6\sigma^2\right) - 4\exp\left(3\sigma^2\right) + 6\exp\left(\sigma^2\right) - 3\right]$ (228)

Moment Parameters

$\exp\left(\mu + \frac{1}{2}\sigma^2\right)$ (229)

$\exp\left(2\mu + \sigma^2\right)\left[\exp\left(\sigma^2\right) - 1\right]$ (230)

$\gamma_3 = \frac{\exp\left(3\sigma^2\right) - 3\exp\left(\sigma^2\right) + 2}{\left[\exp\left(\sigma^2\right) - 1\right]^{3/2}}$ (231)

$\gamma_4 = \frac{\exp\left(6\sigma^2\right) - 4\exp\left(3\sigma^2\right) + 6\exp\left(\sigma^2\right) - 3}{\left[\exp\left(\sigma^2\right) - 1\right]^{2}}$ (232)

The composite variable $Y = X_1 X_2$ also follows a lognormal distribution with parameters

$\mu = \mu_1 + \mu_2$ (233)

$\sigma^2 = \sigma_1^2 + \sigma_2^2$ (234)

Peak Position

$x_0 = e^{\mu - \sigma^2}$ (235)

Comment
When the probability variable $Y = \ln X$ follows a normal distribution, the lognormal distribution is the one for the variable $X$. This distribution is sometimes used as the distribution of assets.
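The construction in the comment above gives a direct sampling check of Eqs. (221) and (230) (an added sketch with arbitrary parameters):

import numpy as np

rng = np.random.default_rng(8)
mu, sigma = 0.0, 0.5
x = np.exp(rng.normal(mu, sigma, 1_000_000))  # X = exp(Y), Y normal
print(x.mean(), np.exp(mu + sigma ** 2 / 2))  # Eq. (221)
print(x.var(),
      np.exp(2 * mu + sigma ** 2) * (np.exp(sigma ** 2) - 1))  # Eq. (230)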
3.15. A Cauchy Distribution

Graphics

Figure 13. Cauchy distribution ($\mu = 0$; $\beta = 0.5, 1.0, 2.0$).

Probability Distribution

$f(x) = \frac{1}{\pi}\,\frac{\beta}{\left(x - \mu\right)^2 + \beta^2}$ for $-\infty < x < \infty$ (236)

The standard one is given by

$f(x) = \frac{1}{\pi\left(1 + x^2\right)}$ (237)

Generating Function
None.

Moments
None.

Central Moments
None.

Moment Parameters
None.

Peak Position

$x_0 = \mu$ (238)

Comment
This distribution is used for resonance phenomena, for example the X-ray density spectrum in nuclear physics. It is also known as the distribution that has no moment parameters.
3.16. A $\chi^2$ Distribution

Graphics

Figure 14. $\chi^2$ distribution ($n = 1, 3, 5, 10, 20$).

Probability Distribution

$f_n(x) = \frac{1}{2^{\frac{n}{2}}\,\Gamma\left(\frac{n}{2}\right)}\,x^{\frac{n}{2}-1}\exp\left(-\frac{x}{2}\right)$ for $0 \le x$ (239)

Generating Function

$\int_0^{\infty} e^{\theta x}\,\frac{1}{2^{\frac{n}{2}}\,\Gamma\left(\frac{n}{2}\right)}\,x^{\frac{n}{2}-1}e^{-\frac{x}{2}}\,dx = \left(\frac{1}{1 - 2\theta}\right)^{\frac{n}{2}}$ (240)

Moments

$\langle X \rangle = n$ (241)

$\langle X^2 \rangle = n(n+2)$ (242)

$\langle X^3 \rangle = n(n+2)(n+4)$ (243)

$\langle X^4 \rangle = n(n+2)(n+4)(n+6)$ (244)

Central Moments

$\mu_1 = n$ (245)

$\mu_2 = 2n$ (246)

$\mu_3 = 8n$ (247)

$\mu_4 = 12n(n+4)$ (248)

Moment Parameters

$\mu = n$ (249)

$\sigma^2 = 2n$ (250)

$\gamma_3 = \sqrt{\frac{8}{n}}$ (251)

$\gamma_4 = \frac{3(n+4)}{n}$ (252)

The composite variable $Y = X_1 + X_2$ also follows a $\chi^2$ distribution given by

$f_n(y) = \frac{1}{2^{\frac{n}{2}}\,\Gamma\left(\frac{n}{2}\right)}\,y^{\frac{n}{2}-1}\exp\left(-\frac{y}{2}\right)$ (253)

where

$n = n_1 + n_2$ (254)

Peak Position

$x_0 = n - 2$ (255)

When $x_0$ is negative, it is 0.

Comment
When the probability variable $X$ follows a standard normal distribution, the $\chi^2$ distribution is the one for the sum of the squares of $n$ such variables. Therefore, this distribution is related to the error of variables, and hence is frequently used in statistics.
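The sum-of-squares construction in the comment above can be checked by sampling (an added sketch; the degree of freedom is arbitrary):

import numpy as np

rng = np.random.default_rng(9)
n = 5
x = np.sum(rng.normal(0.0, 1.0, (1_000_000, n)) ** 2, axis=1)
print(x.mean(), n)        # Eq. (249)
print(x.var(), 2 * n)     # Eq. (250)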
3.17. A $\chi$ Distribution

Graphics

Figure 15. $\chi$ distribution ($n = 1, 2, 5$).

Probability Distribution

$f_n(x) = \frac{1}{2^{\frac{n}{2}-1}\,\Gamma\left(\frac{n}{2}\right)}\,x^{n-1}\exp\left(-\frac{x^2}{2}\right)$ for $0 \le x$ (256)

Generating Function
None.

Moments

$\langle X \rangle = \sqrt{2}\,\frac{\Gamma\left(\frac{n+1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)}$ (257)

$\langle X^2 \rangle = n$ (258)

$\langle X^3 \rangle = 2^{\frac{3}{2}}\,\frac{\Gamma\left(\frac{n+3}{2}\right)}{\Gamma\left(\frac{n}{2}\right)}$ (259)

$\langle X^4 \rangle = n(n+2)$ (260)

Central Moments
Use a theorem.

Moment Parameters
Use a theorem.

Peak Position

$x_0 = \sqrt{n-1}$ (261)

Comment
When the probability variable $Y = X^2$ follows a $\chi^2$ distribution, this is the distribution for $X$. Therefore, this distribution is related to the distance from the origin.
3.18. A Rayleigh Distribution

Graphics

Figure 16. Rayleigh distribution ($\sigma = 0.5, 1.0, 2.0$).

Probability Distribution

$f(x) = \frac{x}{\sigma^2}\exp\left(-\frac{x^2}{2\sigma^2}\right)$ for $x \ge 0$ (262)

Generating Function
None.

Moments

$\langle X \rangle = \sqrt{\frac{\pi}{2}}\,\sigma$ (263)

$\langle X^2 \rangle = 2\sigma^2$ (264)

$\langle X^3 \rangle = 3\sqrt{\frac{\pi}{2}}\,\sigma^3$ (265)

$\langle X^4 \rangle = 8\sigma^4$ (266)

Central Moments

$\mu_1 = \sqrt{\frac{\pi}{2}}\,\sigma$ (267)

$\mu_2 = \left(2 - \frac{\pi}{2}\right)\sigma^2$ (268)

$\mu_3 = \left(\pi - 3\right)\sqrt{\frac{\pi}{2}}\,\sigma^3$ (269)

$\mu_4 = \left(8 - \frac{3}{4}\pi^2\right)\sigma^4$ (270)

Moment Parameters

$\mu = \sqrt{\frac{\pi}{2}}\,\sigma$ (271)

$\sigma_2 = \left(2 - \frac{\pi}{2}\right)\sigma^2$ (272)

$\gamma_3 = \frac{2\sqrt{\pi}\,(\pi - 3)}{(4 - \pi)^{3/2}}$ (273)

$\gamma_4 = \frac{32 - 3\pi^2}{(4 - \pi)^2}$ (274)

Peak Position

$x_0 = \sigma$ (275)

Comment
When two independent probability variables $X$ and $Y$ follow normal distributions with averages of zero, this is the distribution for $\sqrt{X^2 + Y^2}$. This distribution is used in acoustic engineering.
3.19. An F Distribution

Graphics

Figure 17. F distribution ($n_2 = 5$; $n_1 = 1, 5, 10$).

Probability Distribution

$f_{n_1,n_2}(x) = \frac{1}{B\left(\frac{n_1}{2}, \frac{n_2}{2}\right)}\left(\frac{n_1 x}{n_1 x + n_2}\right)^{\frac{n_1}{2}}\left(\frac{n_2}{n_1 x + n_2}\right)^{\frac{n_2}{2}}\frac{1}{x}$ for $0 \le x$ (276)

Generating Function

$\int_0^{\infty} e^{\theta x}\,\frac{1}{B\left(\frac{n_1}{2}, \frac{n_2}{2}\right)}\left(\frac{n_1 x}{n_1 x + n_2}\right)^{\frac{n_1}{2}}\left(\frac{n_2}{n_1 x + n_2}\right)^{\frac{n_2}{2}}\frac{dx}{x}$ (277)

Moments

$\langle X \rangle = \frac{n_2}{n_2 - 2}$ (278)

$\langle X^2 \rangle = \left(\frac{n_2}{n_1}\right)^2\frac{n_1(n_1+2)}{(n_2-2)(n_2-4)}$ (279)

$\langle X^3 \rangle = \left(\frac{n_2}{n_1}\right)^3\frac{n_1(n_1+2)(n_1+4)}{(n_2-2)(n_2-4)(n_2-6)}$ (280)

$\langle X^4 \rangle = \left(\frac{n_2}{n_1}\right)^4\frac{n_1(n_1+2)(n_1+4)(n_1+6)}{(n_2-2)(n_2-4)(n_2-6)(n_2-8)}$ (281)

Central Moments

$\mu_1 = \frac{n_2}{n_2-2}$ (282)

$\mu_2 = \frac{2n_2^2\,(n_1+n_2-2)}{n_1(n_2-2)^2(n_2-4)}$ (283)

$\mu_3 = \frac{8n_2^3\,(2n_1+n_2-2)(n_1+n_2-2)}{n_1^2(n_2-2)^3(n_2-4)(n_2-6)}$ (284)

$\mu_4 = \left(\frac{n_2}{n_2-2}\right)^4\left[\frac{(n_1+6)(n_1+4)(n_1+2)(n_2-2)^3}{n_1^3(n_2-4)(n_2-6)(n_2-8)} - \frac{4(n_1+4)(n_1+2)(n_2-2)^2}{n_1^2(n_2-4)(n_2-6)} + \frac{6(n_1+2)(n_2-2)}{n_1(n_2-4)} - 3\right]$ (285)

Moment Parameters

$\mu = \frac{n_2}{n_2-2}$ (286)

$\sigma^2 = \frac{2n_2^2\,(n_1+n_2-2)}{n_1(n_2-2)^2(n_2-4)}$ (287)

$\gamma_3 = \frac{(2n_1+n_2-2)\sqrt{8(n_2-4)}}{(n_2-6)\sqrt{n_1(n_1+n_2-2)}}$ (288)

$\gamma_4 = \frac{3(n_2-4)\left[4(n_2-2)^2 + n_1(n_2+10)(n_1+n_2-2)\right]}{n_1(n_2-6)(n_2-8)(n_1+n_2-2)}$ (289)

Peak Position

$x_0 = \frac{n_2(n_1-2)}{n_1(n_2+2)}$ (290)

Comment
This distribution is related to the ratio of two variances, and hence plays an important role in variance analysis.
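Since an F variable is a ratio of two independent $\chi^2$ variables, each divided by its degrees of freedom, Eqs. (286)-(287) can be checked by sampling (an added sketch with arbitrary degrees of freedom):

import numpy as np

rng = np.random.default_rng(10)
n1, n2 = 5, 12
x = (rng.chisquare(n1, 1_000_000) / n1) / (rng.chisquare(n2, 1_000_000) / n2)
print(x.mean(), n2 / (n2 - 2))                                   # Eq. (286)
print(x.var(),
      2 * n2 ** 2 * (n1 + n2 - 2) / (n1 * (n2 - 2) ** 2 * (n2 - 4)))  # Eq. (287)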
3.20. A t Distribution

Graphics

Figure 18. t distribution ($n = 1, 3, 10$).

Probability Distribution

$f_n(x) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)}\left(1 + \frac{x^2}{n}\right)^{-\frac{n+1}{2}}$ for $-\infty < x < \infty$ (291)

Generating Function

$\int_{-\infty}^{\infty} e^{\theta x}\,\frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)}\left(1 + \frac{x^2}{n}\right)^{-\frac{n+1}{2}}dx$ (292)

Moments

$\langle X \rangle = 0$ (293)

$\langle X^2 \rangle = \frac{n}{n-2}$ (294)

$\langle X^3 \rangle = 0$ (295)

$\langle X^4 \rangle = \frac{3n^2}{(n-2)(n-4)}$ (296)

Central Moments

$\mu_1 = 0$ (297)

$\mu_2 = \frac{n}{n-2}$ (298)

$\mu_3 = 0$ (299)

$\mu_4 = \frac{3n^2}{(n-2)(n-4)}$ (300)

Moment Parameters

$\mu = 0$ (301)

$\sigma^2 = \frac{n}{n-2}$ (302)

$\gamma_3 = 0$ (303)

$\gamma_4 = \frac{3(n-2)}{n-4}$ (304)

Peak Position

$x_0 = 0$ (305)

Comment
This distribution is related to the average of sample data, and hence is one of the most important distributions in statistics. This distribution is well approximated by a standard normal distribution for a large sample number $n$.
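A t variable with $n$ degrees of freedom can be built from a standard normal variable and an independent $\chi^2$ variable, which gives a sampling check of Eqs. (301)-(302) (an added sketch with an arbitrary $n$):

import numpy as np

rng = np.random.default_rng(11)
n = 10
z = rng.normal(0.0, 1.0, 1_000_000)
w = rng.chisquare(n, 1_000_000)
t = z / np.sqrt(w / n)          # t with n degrees of freedom
print(t.mean(), 0)              # Eq. (301)
print(t.var(), n / (n - 2))     # Eq. (302)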
3.21. An Exponential Distribution

Graphics

Figure 19. Exponential distribution ($\lambda = 0.5, 1.0, 2.0$).

Probability Distribution

$f(x) = \frac{1}{\lambda}\exp\left(-\frac{x}{\lambda}\right)$ for $0 \le x$ (306)

Generating Function

$\frac{1}{1 - \lambda\theta}$ (307)

Moments

$\langle X \rangle = \lambda$ (308)

$\langle X^2 \rangle = 2\lambda^2$ (309)

$\langle X^3 \rangle = 6\lambda^3$ (310)

$\langle X^4 \rangle = 24\lambda^4$ (311)

Central Moments

$\mu_1 = \lambda$ (312)

$\mu_2 = \lambda^2$ (313)

$\mu_3 = 2\lambda^3$ (314)

$\mu_4 = 9\lambda^4$ (315)

Moment Parameters

$\mu = \lambda$ (316)

$\sigma^2 = \lambda^2$ (317)

$\gamma_3 = 2$ (318)

$\gamma_4 = 9$ (319)

Peak Position

$x_0 = 0$ (320)

Comment
This distribution applies to a variable where the occurrence probability is identical for each event; the number of occurrences is then proportional to the current number of elements. The nuclear decay number is well expressed by this distribution.
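Exponential variables are also easy to generate by inverse-transform sampling, $x = -\lambda \ln U$ with $U$ uniform on $(0, 1]$ (an added sketch, not a method from the text):

import numpy as np

rng = np.random.default_rng(12)
lam = 2.0
u = 1.0 - rng.random(1_000_000)  # shift [0,1) to (0,1] to avoid log(0)
x = -lam * np.log(u)             # inverse transform of Eq. (306)
print(x.mean(), lam)             # Eq. (316)
print(x.var(), lam ** 2)         # Eq. (317)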
3.22. An Erlang Distribution

Graphics

Figure 20. Erlang distribution ($\lambda = 1/k$; $k = 1, 3, 5, 10$).

Probability Distribution

$f_k(x) = \frac{x^{k-1}}{\lambda^k\,(k-1)!}\,e^{-\frac{x}{\lambda}}$ for $0 \le x$ (321)

Generating Function

$\left(\frac{1}{1 - \lambda\theta}\right)^k$ (322)

Moments

$\langle X \rangle = k\lambda$ (323)

$\langle X^2 \rangle = k(k+1)\lambda^2$ (324)

$\langle X^3 \rangle = k(k+1)(k+2)\lambda^3$ (325)

$\langle X^4 \rangle = k(k+1)(k+2)(k+3)\lambda^4$ (326)

Central Moments

$\mu_1 = k\lambda$ (327)

$\mu_2 = k\lambda^2$ (328)

$\mu_3 = 2k\lambda^3$ (329)

$\mu_4 = 3k(k+2)\lambda^4$ (330)

Moment Parameters

$\mu = k\lambda$ (331)

$\sigma^2 = k\lambda^2$ (332)

$\gamma_3 = \frac{2}{\sqrt{k}}$ (333)

$\gamma_4 = 3\left(1 + \frac{2}{k}\right)$ (334)

Peak Position

$x_0 = (k-1)\lambda$ (335)

Comment
When the probability variables $X_1, X_2, \ldots, X_k$ follow exponential distributions, the sum of these probability variables follows this distribution. This distribution is used in queueing theory for the service time where the service consists of many serial processes. The distribution changes from an exponential distribution to a delta function with increasing service step number $k$.
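The sum construction in the comment above can be checked by sampling (an added sketch with arbitrary parameters):

import numpy as np

rng = np.random.default_rng(13)
k, lam = 5, 0.5
x = rng.exponential(lam, (1_000_000, k)).sum(axis=1)  # sum of k exponentials
print(x.mean(), k * lam)        # Eq. (331)
print(x.var(), k * lam ** 2)    # Eq. (332)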
3.23. A Laplace Distribution

Graphics

Figure 21. Laplace distribution ($\mu = 0$; $\lambda = 0.5, 1.0, 2.0$).

Probability Distribution

$f(x) = \frac{1}{2\lambda}\exp\left(-\frac{\left|x - \mu\right|}{\lambda}\right)$ for $-\infty < x < \infty$ (336)

Generating Function

$\frac{e^{\mu\theta}}{1 - \lambda^2\theta^2}$ (337)

Moments

$\langle X \rangle = \mu$ (338)

$\langle X^2 \rangle = \mu^2 + 2\lambda^2$ (339)

$\langle X^3 \rangle = \mu^3 + 6\mu\lambda^2$ (340)

$\langle X^4 \rangle = \mu^4 + 12\mu^2\lambda^2 + 24\lambda^4$ (341)

Central Moments

$\mu_1 = \mu$ (342)

$\mu_2 = 2\lambda^2$ (343)

$\mu_3 = 0$ (344)

$\mu_4 = 24\lambda^4$ (345)

Moment Parameters

$\mu$ (346)

$\sigma^2 = 2\lambda^2$ (347)

$\gamma_3 = 0$ (348)

$\gamma_4 = 6$ (349)

Peak Position

$x_0 = \mu$ (350)

Comment
A Laplace distribution is formed by connecting two exponential distributions that are symmetrical with respect to the origin, and hence it is defined over the entire real axis.
3.24. A Weibull Distribution

Graphics

Figure 22. Weibull distribution. (a) $m$ dependence with $\eta = 1$ ($m = 0.5, 1, 2, 3$). (b) $\eta$ dependence with $m = 2$ ($\eta = 0.5, 1, 2$).

Probability Distribution

$f(x) = \frac{m}{\eta}\left(\frac{x}{\eta}\right)^{m-1}\exp\left[-\left(\frac{x}{\eta}\right)^m\right]$ for $0 \le x;\ m > 0,\ \eta > 0$ (351)

Generating Function
None.

Moments

$\langle X^k \rangle = \eta^k\,\Gamma\left(1 + \frac{k}{m}\right)$ (352)

Central Moments
Use a theorem.

Moment Parameters
Use a theorem.

Peak Position

$x_0 = \eta\left(\frac{m-1}{m}\right)^{\frac{1}{m}}$ (353)

Comment
This distribution is widely used in the reliability field and is related to the failure rate. The failure is not related to the average point in the system, but to the weakest point. The distribution is derived by focusing on the weak point in the system, and hence is called the weakest link model. This distribution expresses various kinds of shapes with varying parameter values, corresponding to various kinds of failure mechanisms.
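The moment formula (352) can be checked by sampling (an added sketch with arbitrary parameters; numpy's weibull sampler uses $\eta = 1$, so the sample is scaled by $\eta$):

import math
import numpy as np

rng = np.random.default_rng(14)
m, eta = 2.0, 1.5
x = eta * rng.weibull(m, 1_000_000)  # scale numpy's eta = 1 samples by eta
for k in (1, 2):
    print(np.mean(x ** k), eta ** k * math.gamma(1 + k / m))  # Eq. (352)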
ABOUT THE AUTHOR Kunihiro Suzuki, PhD Fujitsu Limited, Tokyo, Japan Email: [email protected]
Kunihiro Suzuki was born in Aomori, Japan in 1959. He received his BS, MS, and PhD degrees in electronic engineering from Tokyo Institute of Technology, Tokyo, Japan, in 1981, 1983, and 1996, respectively. He joined Fujitsu Laboratories Ltd., Atsugi, Japan in 1983 and was engaged in the design and modeling of high-speed bipolar and MOS transistors. He studied process modeling as a visiting researcher at the Swiss Federal Institute of Technology, Zurich, Switzerland in 1996 and 1997. He moved to Fujitsu Limited, Tokyo, Japan in 2010, where he was engaged in a division responsible for supporting the sales division. His current interests are statistics and queueing theory for business. His research covers theory and technology in both semiconductor devices and processes. To analyze and fabricate high-speed devices, he also organizes a group that includes physicists, mathematicians, process engineers, system engineers, and members for analysis such as SIMS and TEM. The combination of theory and experiment and the aid from various members enable his group to carry out various original works. His models and experimental data are systematic, valid over a wide range of conditions, and can contribute to both academic and practical product fields. He is the author and co-author of more than 100 refereed papers in journals, more than 50 papers in international technical conference proceedings, and more than 90 papers in domestic technical conference proceedings.