MATHEMATICS RESEARCH DEVELOPMENTS
STATISTICS VOLUME 3 CATEGORICAL AND TIME DEPENDENT DATA ANALYSIS
MATHEMATICS RESEARCH DEVELOPMENTS Additional books and e-books in this series can be found on Nova’s website under the Series tab.
KUNIHIRO SUZUKI
Copyright © 2019 by Nova Science Publishers, Inc. All rights reserved. No part of this book may be reproduced, stored in a retrieval system or transmitted in any form or by any means: electronic, electrostatic, magnetic, tape, mechanical photocopying, recording or otherwise without the written permission of the Publisher. We have partnered with Copyright Clearance Center to make it easy for you to obtain permissions to reuse content from this publication. Simply navigate to this publication’s page on Nova’s website and locate the “Get Permission” button below the title description. This button is linked directly to the title’s permission page on copyright.com. Alternatively, you can visit copyright.com and search by title, ISBN, or ISSN. For further questions about using the service on copyright.com, please contact: Copyright Clearance Center Phone: +1-(978) 750-8400 Fax: +1-(978) 750-4470 E-mail: [email protected]. NOTICE TO THE READER The Publisher has taken reasonable care in the preparation of this book, but makes no expressed or implied warranty of any kind and assumes no responsibility for any errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of information contained in this book. The Publisher shall not be liable for any special, consequential, or exemplary damages resulting, in whole or in part, from the readers’ use of, or reliance upon, this material. Any parts of this book based on government reports are so indicated and copyright is claimed for those parts to the extent applicable to compilations of such works. Independent verification should be sought for any data, advice or recommendations contained in this book. In addition, no responsibility is assumed by the publisher for any injury and/or damage to persons or property arising from any methods, products, instructions, ideas or otherwise contained in this publication. This publication is designed to provide accurate and authoritative information with regard to the subject matter covered herein. It is sold with the clear understanding that the Publisher is not engaged in rendering legal or any other professional services. If legal or any other expert assistance is required, the services of a competent person should be sought. FROM A DECLARATION OF PARTICIPANTS JOINTLY ADOPTED BY A COMMITTEE OF THE AMERICAN BAR ASSOCIATION AND A COMMITTEE OF PUBLISHERS. Additional color graphics may be available in the e-book version of this book.
Library of Congress Cataloging-in-Publication Data
Published by Nova Science Publishers, Inc. † New York
CONTENTS

Preface
Chapter 1. Customer Satisfaction Analysis
Chapter 2. Independent Factor Analysis
Chapter 3. Statistical Testing and Predictions
Chapter 4. Score Evaluation
Chapter 5. AHP (Analytic Hierarchy Process)
Chapter 6. Quantification Theory I
Chapter 7. Quantification Theory II
Chapter 8. Quantification Theory III (Correspondence Analysis)
Chapter 9. Quantification Theory IV
Chapter 10. Survival Time Probability
Chapter 11. Population Prediction
Chapter 12. Random Walk
Chapter 13. A Markov Process
Chapter 14. Random Number
Chapter 15. Matrix Operation
Appendix 1. Related Mathematics
Appendix 2. Summary of Probability Distributions and Their Moments
References
About the Author
Index
Related Nova Publications
PREFACE

We use statistics when we evaluate TV program ratings, predict the results of voting, manage stock, predict the amount of sales, and evaluate the effectiveness of medical treatments. We want to predict such results not on the basis of personal experience or impressions, but on the basis of the corresponding data. The accuracy of a prediction depends on the data and on the related theories. It is easy to feed data into a model and show its output without understanding the model. However, the models themselves are not perfect, because in general they contain assumptions and approximations. Therefore, a model should be applied to data with care. We should know which model to apply to the data, what is assumed in the model, and what we can state based on the results of the model.

Let us consider a coin toss, for example. When we toss a coin, we obtain a head or a tail. If we toss the coin three times, we may obtain two heads and one tail. The observed frequency of heads is then 2/3, and that of tails is 1/3. This is a fact, and we need not discuss it further. It is important to notice that this probability (2/3) of getting a head is limited to this trial. We can never say that the probability of obtaining heads with this coin is 2/3, which would be a statement about the general characteristics of the coin. If we perform the coin toss 300 times and obtain heads 200 times, we may be able to state that the probability of obtaining a head is 2/3 as a characteristic of the coin. What we can state based on the obtained data depends on the sample number. Statistics gives us a clear guideline under which we can state something based on the data, with corresponding error ranges.

The mathematics used in statistics is not easy, and it can be tough work to acquire the related techniques. Fortunately, software development has made it easy to obtain results, so many people who are not specialists in mathematics can perform statistical analysis with such software. However, it is important to understand the meaning of a model: why certain variables are introduced, what they express, and what we can state based on the results. Understanding the mathematics related to the models is therefore required to appreciate the results.
In this book, we treat models from fundamental ones to advanced ones without skipping their derivation processes, as far as possible. We can then clearly understand the assumptions and approximations used in the models, and hence the limitations of the models. We also cover almost all the subjects in statistics, since they are all related to each other, and the mathematical treatments used in one model are frequently used in the others. There are many good practical and theoretical books on statistics [1]-[10]. However, each of these books is oriented in a particular direction: fundamental, mathematical, or special subjects. I want to add one more, which treats fundamental and advanced models from the beginning in a self-contained style. I also aim to connect theories to practical subjects. This book consists of three volumes:

The first volume treats the fundamentals of statistics. The second volume treats multiple variable analysis. The third volume treats categorical and time dependent data analysis.

In volumes 1 and 2, we treat numerical data. In this volume, we treat categorical data, with which we can perform analyses similar to those for numerical data. We also treat time dependent data analysis in this volume. We treat the following subjects.

(Chapters 1 and 2) We introduce customer satisfaction (CS) analysis, which decides the important items for improving a target subject based on two standpoints: the correlation factor between each item and the target subject, and the level of each item. This analysis is vital for clarifying the items on which we should focus to improve the objective variable. Independent factor analysis is the categorical version of CS analysis.

(Chapter 3) We summarize the discussions on testing and predictions up to this chapter, which is very important because judgments are made based on statistical methods.

(Chapter 4) We introduce score analysis, where we select a subject with a low score and a high variance, which is supposed to be the most effective one for improving the total score.
(Chapter 5) We treat the analytic hierarchy process (AHP), which is an analysis for various qualitative data. We show how to make a decision quantitatively based on qualitative data.

(Chapters 6 to 9) We treat multi-variable analysis for categorical data. Quantification theory I corresponds to multiple regression analysis for categorical data. Quantification theory II corresponds to discriminant analysis for categorical data. Quantification theory III clarifies the relationship between two categorical variables. Quantification theory IV clarifies the similarity between two categorical variables.

(Chapter 10) From this chapter on, we cover the time dependence of probability. We treat survival time probability, which is frequently used in medical fields, where the data are often incomplete but are still used in full.

(Chapter 11) We treat the population problem, which is very important for us. We show how to predict the age constitution of a population.

(Chapters 12 and 13) We treat time dependent probability functions in these chapters. We start with the random walk and extend it to a Markov process.

(Chapter 14) We briefly study random numbers for generating pseudo experimental data. Generating random numbers is vital for predicting results theoretically, and is necessary in Monte Carlo simulation.

(Chapter 15) We briefly study matrix operations, which are important and fundamental in statistics.

(Appendix 1) We add a brief explanation of the mathematics related to this book.

(Appendix 2) We evaluate moment parameters with various methods, and summarize the probability distribution functions and their related moment parameters.
We hope that readers can come to understand the meaning of the models in statistics and the techniques used to reach the final results. I think this is not easy. However, I also believe that this book helps one to accomplish it with time and effort. I tried to derive every model from the beginning for all subjects, although many of the derivations are not complete. Any comments and suggestions on my analysis would be greatly appreciated.
Kunihiro Suzuki
Chapter 1
CUSTOMER SATISFACTION ANALYSIS

ABSTRACT

We treat data which consist of one objective variable, customer satisfaction (CS) data, and many explanatory variables. CS analysis clarifies which explanatory variables are important for improving the objective variable. The decision is made by considering two aspects: one is the correlation factor between the objective variable and each explanatory variable, and the other is the level achievement ratio of each explanatory variable. We select the explanatory variables with high correlation factors and low level achievement ratios.
Keywords: explanatory variable, objective variable, correlation factor, average, variance, normalization, CS plot, CS analysis, contribution degree, requested improvement degree, CS correlation factor, principal component analysis
1. INTRODUCTION

We want to improve customer satisfaction (CS) and study which items influence the satisfaction. We evaluate the levels of the assumed items as well as the customers' satisfaction, and based on these data, we decide which items are important for improving the satisfaction. This subject generalizes to any situation with one objective variable and many explanatory variables, as shown in Figure 1. We assume that all data are numerical. This type of analysis is called CS analysis even when the objective variable is not customers' satisfaction.
Figure 1. Data structure for CS analysis.
2. QUESTIONNAIRE

We treat one objective variable and five explanatory variables in this section. The objective variable y is the customers' satisfaction. The explanatory variables (items) are the ones which are supposed to influence the customers' satisfaction, and are denoted as x_i (i = 1, 2, \ldots, 5). We assume the items below.

Item 1 (x_1): Understanding of customers' work
Item 2 (x_2): Quality of reply to customers' requests and questions
Item 3 (x_3): Project promotion ability
Item 4 (x_4): Effective information providing ability
Item 5 (x_5): Proposal ability

The target of the analysis is to clarify the items on which we (salesmen) should work to improve the customers' satisfaction. The corresponding evaluation score form is shown in Table 1; the score range of each variable is shown in the table. We obtain the data from 350 customers; that is, the data number n is 350. We consider the correlation between the objective variable and each explanatory variable, and do not consider correlations between the explanatory variables.

Table 1. Questionnaire and variable notations

Variable  Item No  Questionnaire                              Score
y         -        Customers' satisfaction                    1-10
x_1       Item 1   Understanding of customers' work           1-5
x_2       Item 2   Quality of reply to customers' requests    1-5
x_3       Item 3   Project promotion ability                  1-5
x_4       Item 4   Effective information providing ability    1-5
x_5       Item 5   Proposal ability                           1-5
3. FUNDAMENTAL PARAMETERS

The average of the objective variable is given by

\bar{y} = \frac{1}{n} \sum_{k=1}^{n} y_k    (1)

The averages of the explanatory variables are given by

\mu_i = \frac{1}{n} \sum_{k=1}^{n} x_{ik}    (2)

where i denotes the item number, and i = 1, 2, \ldots, p (p is five here).

The unbiased variances and covariances are given by

\sigma_{yy}^2 = \frac{1}{n-1} \sum_{k=1}^{n} \left( y_k - \bar{y} \right)^2    (3)

\sigma_{ii}^2 = \frac{1}{n-1} \sum_{k=1}^{n} \left( x_{ik} - \mu_i \right)^2    (4)

\sigma_{iy}^2 = \frac{1}{n-1} \sum_{k=1}^{n} \left( x_{ik} - \mu_i \right) \left( y_k - \bar{y} \right)    (5)

The correlation factor between the objective variable and explanatory variable i is given by

r_{iy} = \frac{\sigma_{iy}^2}{\sqrt{\sigma_{ii}^2 \sigma_{yy}^2}}    (6)
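As a minimal numerical sketch of Eqs. (1)-(6), the following Python code (the synthetic data and variable names are ours, not from the book) computes the averages, unbiased variances and covariances, and correlation factors:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 350, 5
x = rng.integers(1, 6, size=(n, p)).astype(float)               # item scores, 1-5
y = x @ np.array([0.4, 0.5, 0.3, 0.4, 0.3]) + rng.normal(0, 1, n)  # synthetic satisfaction

y_bar = y.mean()                        # Eq. (1)
mu = x.mean(axis=0)                     # Eq. (2)
s_yy = y.var(ddof=1)                    # Eq. (3), unbiased
s_ii = x.var(axis=0, ddof=1)            # Eq. (4)
s_iy = ((x - mu) * (y - y_bar)[:, None]).sum(axis=0) / (n - 1)   # Eq. (5)
r_iy = s_iy / np.sqrt(s_ii * s_yy)      # Eq. (6)
print(np.round(r_iy, 2))
```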
Table 2 shows the averages, standard deviations, and correlation factors for each item. The corresponding figures for the satisfaction averages and the correlation factors are shown in Figure 2 and Figure 3. We want to select items which are low in average satisfaction and high in correlation factor. We may pick the items by inspecting both figures; however, it is not always clear which ones should be selected. We will introduce a parameter which enables us to select items clearly.
Figure 2. Average of satisfaction for each item.

Figure 3. Correlation factors for each item and the objective variable.
Table 2. Average, standard deviation, and correlation factor for each item

Variable            Item1(x1)  Item2(x2)  Item3(x3)  Item4(x4)  Item5(x5)  y
Average             3.68       3.52       3.12       3.48       3.32       6.83
Standard deviation  0.92       1.12       0.98       0.99       0.95       1.94
Correlation factor  0.74       0.77       0.68       0.73       0.68
4. CORRELATION FACTOR TESTING

We evaluate the effectiveness of the explanatory variables. If the corresponding correlation factor is negative, we neglect the variable. We then test whether the correlation factor is zero. The converted variable

t = \sqrt{n-2} \, \frac{r}{\sqrt{1-r^2}}    (7)

follows a t-distribution with a freedom of n - 2, as shown in Chapter 2 of volume 2. Since we evaluate only positive values, we apply a one-sided probability P and evaluate the corresponding P point t_P. We then evaluate

t > t_P    (8)

If this relationship holds, we judge that the variable is valid, and vice versa. We select the explanatory variables that satisfy Eq. (8) and proceed to the next step.
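A minimal sketch of this test in Python, using scipy only for the one-sided P point of the t-distribution:

```python
import numpy as np
from scipy import stats

def correlation_is_valid(r, n, p_level=0.95):
    """Test r > 0 via Eq. (7): t = sqrt(n-2) * r / sqrt(1 - r**2)."""
    if r <= 0:                                # negative factors are neglected outright
        return False
    t = np.sqrt(n - 2) * r / np.sqrt(1 - r**2)
    t_p = stats.t.ppf(p_level, df=n - 2)      # one-sided P point, Eq. (8)
    return t > t_p

print(correlation_is_valid(0.74, 350))        # item 1 -> True
```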
5. NORMALIZATION OF THE VARIABLES

We evaluate the average \bar{\mu} and the standard deviation \sigma_\mu of the item averages, which are given by

\bar{\mu} = \frac{1}{p} \sum_{i=1}^{p} \mu_i    (9)

\sigma_\mu = \sqrt{ \frac{1}{p} \sum_{i=1}^{p} \left( \mu_i - \bar{\mu} \right)^2 }    (10)

We then normalize each item's average \mu_i, which is denoted as z_i and is given by

z_i = \frac{\mu_i - \bar{\mu}}{\sigma_\mu}    (11)

We then evaluate the average \bar{r} and the standard deviation \sigma_r of the correlation factors, which are given by

\bar{r} = \frac{1}{p} \sum_{i=1}^{p} r_i    (12)

\sigma_r = \sqrt{ \frac{1}{p} \sum_{i=1}^{p} \left( r_i - \bar{r} \right)^2 }    (13)

We then normalize each item's correlation factor r_i, which is denoted as z_{ri} and is given by

z_{ri} = \frac{r_i - \bar{r}}{\sigma_r}    (14)

We then obtain the point in the plane a_i = (z_{ri}, z_i) for each item.

Table 3 shows the averages, standard deviations, correlation factors, and their normalized values. The average score of item 1 is high and that of item 3 is low. On the other hand, the correlation factor of item 2 is high and that of item 3 is low. Table 4 shows the average and the standard deviation of the scores with respect to the items; the normalized averages and correlation factors are evaluated with these values. Table 5 shows the predictive probability and P point of the t-distribution with which we evaluate the effectiveness of the correlations; the evaluation results are included in Table 3.
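A minimal sketch of the normalization of Eqs. (9)-(14), using the rounded values of Table 2 as input (the results therefore differ slightly from Table 3, which uses unrounded inputs):

```python
import numpy as np

mu = np.array([3.68, 3.52, 3.12, 3.48, 3.32])   # item averages (Table 2)
r  = np.array([0.74, 0.77, 0.68, 0.73, 0.68])   # correlation factors (Table 2)

z  = (mu - mu.mean()) / mu.std()    # Eq. (11); population std as in Eq. (10)
zr = (r - r.mean()) / r.std()       # Eq. (14)
print(np.round(z, 2))    # ~ [ 1.35  0.50 -1.60  0.29 -0.55]
print(np.round(zr, 2))   # ~ [ 0.57  1.42 -1.14  0.28 -1.14]
```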
Table 3. Average, standard deviation, correlation factor, and their normalized values for each item

Variable                        Item1(x1)  Item2(x2)  Item3(x3)  Item4(x4)  Item5(x5)  y
Average                         3.68       3.52       3.12       3.48       3.32       6.83
Standard deviation              0.92       1.12       0.98       0.99       0.95       1.94
Correlation factor              0.74       0.77       0.68       0.73       0.68
t-value of correlation factor   20.71      22.27      17.19      20.01      17.18
Evaluation of correlation       Yes        Yes        Yes        Yes        Yes
Normalized average              1.34       0.51       -1.60      0.29       -0.54
Normalized correlation factor   0.66       1.32       -1.16      0.34       -1.17
Contribution                    1.42       1.29       -1.95      0.45       -1.21
Improvement request             -0.48      0.58       0.31       0.03       -0.44

Table 4. Average and standard deviation with respect to items

Table 5. Predictive probability and P point of the t-distribution

Prediction probability   0.95
t_P                      1.65
CS correlation factor    0.83
Figure 4. CS plot of normalized correlation factor and satisfaction score. The contribution axis and the improvement-request axis run at +45 and -45 degrees, respectively.
6. IMPROVEMENT REQUESTED AND CONTRIBUTING ITEMS

We can plot the normalized correlation factor and normalized satisfaction as shown in Figure 4. A high correlation factor means that the item is important, and a high satisfaction means that the item is in good condition. Therefore, an item requiring improvement can be selected as one with a high correlation factor and a low satisfaction score. How can we obtain the corresponding value?

The axis in the direction of -45 degrees corresponds to importance associated with the correlation factor and badness associated with the satisfaction, and hence it expresses the requested improvement degree. Therefore, the projection of each point onto this axis corresponds to the improvement request degree, which is shown by the red arrow for item 2; the distance from the origin to the end point of the red arrow is the improvement request degree. The axis in the direction of +45 degrees corresponds to importance associated with the correlation factor and goodness associated with the satisfaction, and hence it expresses the degree of contribution. The projection of each point onto this axis corresponds to the contribution degree; the distance from the origin to the end point of the blue arrow is the contribution degree.

We can evaluate the degrees as follows. The unit vector for the contribution axis, e_G, and that for the improvement request axis, e_B, are given by

e_G = \left( \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}} \right), \quad e_B = \left( \frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}} \right)    (15)

The degree of contribution is denoted as G_i and can be evaluated as

G_i = a_i \cdot e_G = \frac{1}{\sqrt{2}} \left( z_{ri} + z_i \right)    (16)

The degree of improvement requested is denoted as B_i and can be evaluated as

B_i = a_i \cdot e_B = \frac{1}{\sqrt{2}} \left( z_{ri} - z_i \right)    (17)
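A minimal sketch of the projections of Eqs. (16)-(17), using the normalized values of Table 3:

```python
import numpy as np

zr = np.array([0.66, 1.32, -1.16, 0.34, -1.17])  # normalized correlation (Table 3)
z  = np.array([1.34, 0.51, -1.60, 0.29, -0.54])  # normalized average (Table 3)

G = (zr + z) / np.sqrt(2)   # contribution, Eq. (16)
B = (zr - z) / np.sqrt(2)   # improvement request, Eq. (17)
print(np.round(G, 2))  # matches Table 3 up to rounding
print(np.round(B, 2))
```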
Figure 5 shows the contribution and the requested improvement degrees extracted from Figure 4.

Figure 5. Contribution and improvement request degrees.
Item 1 and item 2 contribute to the satisfaction, while item 2 also has a high improvement request degree. Since the correlation factor of item 2 is high, a high-level score is requested for it; that is why item 2 appears in both degrees. There is no clear critical value for the requested improvement, so we should set a certain value. If we set the value at 0.5 here, we should focus on item 2.
7. CS CORRELATION FACTOR

We want to evaluate the overall status of the CS, which we can do by inspecting the data distribution. If the data lie along the contribution axis, the status is good. On the other hand, if the data lie along the improvement request axis, the status is bad. We can evaluate the status of the CS by evaluating the correlation factor between z_{ri} and z_i. We denote it as r_{z_r z} and evaluate it as

r_{z_r z} = \frac{\sigma_{z_r z}^2}{\sqrt{\sigma_{z_r z_r}^2 \sigma_{zz}^2}}    (18)

where

\sigma_{z_r z_r}^2 = \frac{1}{p} \sum_{i=1}^{p} z_{ri}^2    (19)

\sigma_{zz}^2 = \frac{1}{p} \sum_{i=1}^{p} z_i^2    (20)

\sigma_{z_r z}^2 = \frac{1}{p} \sum_{i=1}^{p} z_{ri} z_i    (21)

We call this the CS correlation factor. Its value lies between -1 and 1, and the status is better for larger values. We show three typical CS statuses, with CS correlation factors of -0.8, 0.0, and 0.8, in Figure 6, where we use 10 items that may influence the objective variable.
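A minimal sketch of Eqs. (18)-(21):

```python
import numpy as np

def cs_correlation(zr, z):
    """CS correlation factor, Eqs. (18)-(21); zr and z are already normalized."""
    s_rr = np.mean(zr**2)       # Eq. (19)
    s_zz = np.mean(z**2)        # Eq. (20)
    s_rz = np.mean(zr * z)      # Eq. (21)
    return s_rz / np.sqrt(s_rr * s_zz)

zr = np.array([0.66, 1.32, -1.16, 0.34, -1.17])
z  = np.array([1.34, 0.51, -1.60, 0.29, -0.54])
print(round(cs_correlation(zr, z), 2))   # ~0.83, cf. Table 5
```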
The total status can be evaluated using this CS correlation factor. Three cases are shown in Figure 6: (a) is in a bad condition, (b) is in a neutral condition, and (c) is in a good condition.

Figure 6. CS plots with various CS correlation factors. (a) Bad condition (r = -0.8), (b) neutral condition (r = 0.0), (c) good condition (r = 0.8).
8. TARGET VALUES FOR SATISFACTION OF EXPLANATORY AND OBJECTIVE VARIABLES

We clarified the target items in the previous section. We now derive the target values of the items using regression theory.
The objective values y_k and explanatory values x_k (k = 1, 2, \ldots, n) are assumed to be related by

\frac{y_k - \bar{y}}{\sigma_y} = r_i \frac{x_k - \mu_i}{\sigma_i}    (22)

We suppose that this relationship still holds after we perform some treatment, and assume

\frac{1}{n} \sum_{k=1}^{n} \frac{y_k - \bar{y}}{\sigma_y} = r_i \frac{1}{n} \sum_{k=1}^{n} \frac{x_k - \mu_i}{\sigma_i}    (23)

We can then modify this as

\frac{1}{\sigma_y} \left( \frac{1}{n} \sum_{k=1}^{n} y_k - \bar{y} \right) = \frac{r_i}{\sigma_i} \left( \frac{1}{n} \sum_{k=1}^{n} x_k - \mu_i \right)    (24)

We then have

\frac{\mu_{yt} - \bar{y}}{\sigma_y} = r_i \frac{\mu_{it} - \mu_i}{\sigma_i}    (25)

where \mu_{yt} and \mu_{it} are the averages of the satisfaction and of item i after some treatment, given by

\mu_{yt} = \frac{1}{n} \sum_{k=1}^{n} y_k    (26)

\mu_{it} = \frac{1}{n} \sum_{k=1}^{n} x_k    (27)

Therefore, we obtain

\Delta \mu_y = r_i \frac{\sigma_y}{\sigma_i} \Delta \mu_i    (28)

where

\Delta \mu_y = \mu_{yt} - \bar{y}    (29)

\Delta \mu_i = \mu_{it} - \mu_i    (30)

We assume that the explanatory variables are independent of each other, so the total improvement of the objective variable, \Delta Q_y, is given by

\Delta Q_y = \sum_i r_i \frac{\sigma_y}{\sigma_i} \Delta \mu_i    (31)
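A minimal sketch of Eq. (31); the improvement targets here are hypothetical values of ours, not from the book:

```python
import numpy as np

r       = np.array([0.74, 0.77, 0.68, 0.73, 0.68])  # correlation factors (Table 2)
sigma_i = np.array([0.92, 1.12, 0.98, 0.99, 0.95])  # item std devs (Table 2)
sigma_y = 1.94                                      # std dev of y
d_mu    = np.array([0.0, 0.5, 0.0, 0.0, 0.0])       # hypothetical: raise item 2 by 0.5

dQ_y = np.sum(r * sigma_y / sigma_i * d_mu)         # Eq. (31)
print(round(dQ_y, 2))   # expected satisfaction gain, ~0.67
```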
9. ANALYSIS FOR SUB GROUPS

We have treated the total group up to here. The group may consist of many sub groups, and the characteristics of the sub groups generally differ from those of the total group. The target items for a sub group may therefore differ from those of the total group, so we try to select target items for the sub groups. We assume that the importance of an item is the same for the sub group; the difference between a given sub group and the total group, or between sub groups, lies in the status of satisfaction. We reference the satisfaction of a sub group with respect to the total one.

We set the average for item i of the sub group as \mu_{Gi} and the data number as n_G. We introduce the normalized variable for item i as z_{Gi}, defined as

z_{Gi} = \frac{\mu_{Gi} - \mu_i}{\sigma_i \sqrt{\frac{1}{N} + \frac{1}{n_G}}}    (32)

This deviation may be too large and become unstable. It may be modified as

z_{Gi} = \frac{\mu_{Gi} - \mu_i}{\alpha \sigma_i \sqrt{\frac{1}{N} + \frac{1}{n_G}}}    (33)

where \alpha is simply a parameter that handles the magnitude of the value and should be larger than 1. We may also be able to use the simple form

z_{Gi} = \frac{\mu_{Gi} - \mu_i}{\sigma_i}    (34)

This form is the simplest, but it does not consider the scale of the sub group. The selection of the model should be investigated further; we use the model of Eq. (34) here. The normalized satisfaction can then be expressed as

z_i' = z_i + z_{Gi}    (35)
If the average of the sub group \mu_{Gi} is the same as that of the total group, the normalized sub-group satisfaction is the same as that of the total group. If it is larger, the normalized satisfaction is larger than that of the total group, which is the expected characteristic.

We treat group A as a low-score group and group B as a high-score group, as shown in Table 6. The deviations of the normalized satisfaction scores are shown in Table 7; we use the model of Eq. (34) here. The contribution and requested improvement change depending on the satisfaction scores, as shown in Table 8 and Table 9, and the target items for groups A and B change correspondingly from those of the total group.

Table 6. Satisfaction scores of groups A and B

Variable         Item1(x1)  Item2(x2)  Item3(x3)  Item4(x4)  Item5(x5)  y     Number
Average (total)  3.68       3.52       3.12       3.48       3.32       6.83  350
Average (A)      3.00       3.00       2.50       3.00       3.00       5.00  15
Average (B)      4.00       3.70       3.50       4.00       4.00       7.00  15

Table 7. Deviation of normalized satisfaction of groups A and B

Variable         Item1(x1)  Item2(x2)  Item3(x3)  Item4(x4)  Item5(x5)
Average (total)  0.00       0.00       0.00       0.00       0.00
Average (A)      -2.17      -1.75      -2.41      -1.85      -5.30
Average (B)      1.32       0.61       1.46       2.00       2.70

Table 8. Contribution degree of groups A and B

Variable   Total  Group A  Group B
Item1(x1)  1.42   -0.55    2.35
Item2(x2)  1.29   0.05     1.72
Item3(x3)  -1.95  -3.66    -0.92
Item4(x4)  0.45   -0.86    1.86
Item5(x5)  -1.21  -4.96    0.71

Table 9. Requested improvement of groups A and B

Variable   Total  Group A  Group B
Item1(x1)  -0.48  1.48     -1.42
Item2(x2)  0.58   1.82     0.15
Item3(x3)  0.31   2.02     -0.72
Item4(x4)  0.03   1.34     -1.38
Item5(x5)  -0.44  3.31     -2.36
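A minimal sketch of the sub-group treatment, assuming the simple model of Eq. (34) and the group A averages of Table 6:

```python
import numpy as np

mu      = np.array([3.68, 3.52, 3.12, 3.48, 3.32])  # total-group item averages
sigma_i = np.array([0.92, 1.12, 0.98, 0.99, 0.95])  # item std devs
mu_A    = np.array([3.00, 3.00, 2.50, 3.00, 3.00])  # group A averages (Table 6)
z       = (mu - mu.mean()) / mu.std()               # total-group normalized averages

z_G = (mu_A - mu) / sigma_i       # sub-group deviation under Eq. (34)
z_A = z + z_G                     # Eq. (35): group A normalized satisfaction
print(np.round(z_A, 2))
```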
10. CS ANALYSIS USING MULTIPLE REGRESSION

So far we have not considered interactions between explanatory variables; that is, we have treated each explanatory variable as independent of the others. Although this is an idealized condition, explanatory variables generally interact with each other. Multiple regression does consider the interaction, so if we use a multiple regression model, we can accommodate it.

The multiple regression factors depend on the scale of the data, so we use normalized variables. We can then obtain the relationship

z_y = b_1 z_1 + b_2 z_2 + \cdots + b_p z_p    (36)

The factors satisfy the equation

\begin{pmatrix} S_{11}^2 & S_{12}^2 & \cdots & S_{1p}^2 \\ S_{21}^2 & S_{22}^2 & \cdots & S_{2p}^2 \\ \vdots & & & \vdots \\ S_{p1}^2 & S_{p2}^2 & \cdots & S_{pp}^2 \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_p \end{pmatrix} = \begin{pmatrix} S_{1y}^2 \\ S_{2y}^2 \\ \vdots \\ S_{py}^2 \end{pmatrix}    (37)

Therefore, we can obtain the factors b_i as

\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_p \end{pmatrix} = \begin{pmatrix} S_{11}^2 & S_{12}^2 & \cdots & S_{1p}^2 \\ S_{21}^2 & S_{22}^2 & \cdots & S_{2p}^2 \\ \vdots & & & \vdots \\ S_{p1}^2 & S_{p2}^2 & \cdots & S_{pp}^2 \end{pmatrix}^{-1} \begin{pmatrix} S_{1y}^2 \\ S_{2y}^2 \\ \vdots \\ S_{py}^2 \end{pmatrix}    (38)

where the matrix operations are shown in Chapter 15. We could use these multiple regression coefficients instead of the correlation factors. However, multiple regression coefficients are difficult to handle: we cannot intuit their values from the data, as shown in the corresponding chapter. This is why multiple regression is not commonly used here, although its fit to the data is better. We treat the interaction in a different way in the next section.
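A minimal sketch of Eq. (38) on synthetic data; np.linalg.solve is used rather than an explicit matrix inverse:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 350, 5
x = rng.normal(size=(n, p))
y = x @ np.array([0.4, 0.5, 0.3, 0.4, 0.3]) + rng.normal(0, 1, n)

zx = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)   # normalized explanatory variables
zy = (y - y.mean()) / y.std(ddof=1)                 # normalized objective variable

S  = zx.T @ zx / (n - 1)          # matrix of S_ij^2 in Eq. (37)
sy = zx.T @ zy / (n - 1)          # vector of S_iy^2
b  = np.linalg.solve(S, sy)       # Eq. (38)
print(np.round(b, 2))
```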
11. INTERACTION BETWEEN EXPLANATORY VARIABLES

We showed that we can include the interaction in multiple regression analysis. In this section, we show another way to include the interaction. (The usual CS analysis is stable precisely because it does not consider the interaction between the explanatory variables.)

We evaluate the correlation factors between the explanatory variables and select a group in which the correlation factors are large. We assume that the group consists of p variables. A high correlation factor means high interaction. Therefore, the improvements of the objective variable cannot simply be added over the explanatory variables: the significance would be doubly counted, and it should be decreased according to the interaction.

We perform a principal component analysis on these explanatory variables and evaluate the first principal component, which can be expressed as

z_i = a_1 u_{1i} + a_2 u_{2i} + \cdots + a_p u_{pi}    (39)

where a_1, a_2, \ldots, a_p are the elements of the eigenvector for the first component, and satisfy

a_1^2 + a_2^2 + \cdots + a_p^2 = 1    (40)

u_1, u_2, \ldots, u_p are the normalized explanatory variables in the group, given by

u_{1i} = \frac{x_{1i} - \mu_1}{\sqrt{\sigma_1^2}}, \quad u_{2i} = \frac{x_{2i} - \mu_2}{\sqrt{\sigma_2^2}}, \quad \ldots, \quad u_{pi} = \frac{x_{pi} - \mu_p}{\sqrt{\sigma_p^2}}    (41)

The average and the variance are given by
\mu_z = \frac{1}{n} \sum_{i=1}^{n} z_i = a_1 \frac{1}{n} \sum_{i=1}^{n} \frac{x_{1i} - \mu_1}{\sqrt{\sigma_1^2}} + a_2 \frac{1}{n} \sum_{i=1}^{n} \frac{x_{2i} - \mu_2}{\sqrt{\sigma_2^2}} + \cdots + a_p \frac{1}{n} \sum_{i=1}^{n} \frac{x_{pi} - \mu_p}{\sqrt{\sigma_p^2}} = 0    (42)

\sigma_z^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( z_i - \mu_z \right)^2 = \sum_{i=1}^{p} \sum_{j=1}^{p} a_i a_j R \left( i, j \right) = a_1^2 + a_2^2 + \cdots + a_p^2 + \sum_{i \ne j} a_i a_j R \left( i, j \right) = 1 + \sum_{i \ne j} a_i a_j R \left( i, j \right)    (43)

where R(i, j) is the correlation factor between variables i and j. Since we select variables whose interactions are significant, R(i, j) is positive. We use this first principal component instead of the original variables. The correlation factor between the first principal component and the objective variable is given by

r_z = \frac{\frac{1}{n-1} \sum_{i=1}^{n} \left( z_i - \mu_z \right) \left( y_i - \bar{y} \right)}{\sqrt{\sigma_z^2} \, \sigma_y} = \frac{a_1 r_1 + a_2 r_2 + \cdots + a_p r_p}{\sqrt{\sigma_z^2}} = \sum_{i=1}^{p} \frac{a_i}{\sqrt{\sigma_z^2}} r_i    (44)
We should also assign the explanatory value for the first principal component. It may be taken as

\mu_x = a_1 \mu_1 + a_2 \mu_2 + \cdots + a_p \mu_p    (45)

Finally, the increment of the objective variable related to the explanatory variables with significant interaction can be expressed as

\Delta y = r_z \left( a_1 \frac{\sigma_y}{\sqrt{\sigma_1^2}} \Delta \mu_1 + a_2 \frac{\sigma_y}{\sqrt{\sigma_2^2}} \Delta \mu_2 + \cdots + a_p \frac{\sigma_y}{\sqrt{\sigma_p^2}} \Delta \mu_p \right)    (46)

We can treat the other explanatory variables in the normal way. Hence, with variables 1 to q forming the interacting group, the total objective variable increment can be expressed as

\Delta Q_y = r_z \sum_{k=1}^{q} a_k \frac{\sigma_y}{\sigma_k} \Delta \mu_k + \sum_{k=q+1}^{p} r_k \frac{\sigma_y}{\sigma_k} \Delta \mu_k    (47)
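A minimal sketch of Eqs. (39)-(44) on synthetic data for an interacting group of three items; the eigenvector of the correlation matrix plays the role of (a_1, ..., a_p):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 350
base = rng.normal(size=n)
x = np.column_stack([base + rng.normal(0, 0.5, n) for _ in range(3)])  # 3 interacting items
y = base + rng.normal(0, 1, n)

u = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)    # Eq. (41)
R = np.corrcoef(u, rowvar=False)                    # correlation matrix R(i, j)
w, v = np.linalg.eigh(R)
a = v[:, -1]                                        # first-component eigenvector, Eq. (40)
a = np.sign(a.sum()) * a                            # fix the eigenvector's sign ambiguity

z = u @ a                                           # first principal component, Eq. (39)
sigma_z2 = z.var(ddof=1)                            # Eq. (43)
r_i = np.array([np.corrcoef(u[:, i], y)[0, 1] for i in range(3)])
r_z = (a @ r_i) / np.sqrt(sigma_z2)                 # Eq. (44)
print(round(r_z, 2))
```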
Figure 7. Correlation factors. (a) Original correlation factors; (b) correlation factors smaller than those in (a) by 0.6; (b') the correlation factors of (b) on a different vertical scale.
12. EXTENDED NORMALIZED CORRELATION FACTOR

In this analysis, we use the normalized correlation factor, which is denoted again as

z_{ri} = \frac{r_i - \bar{r}}{\sigma_r}    (48)

The origin of the normalized value is the average \bar{r}. This means that the ratio \sigma_r / \bar{r} does not influence the normalized value. Figure 7 shows the correlation factors of Figure 3 and also those correlation factors decreased by 0.6. In Figure 7(a), the average of the correlation factors is about 0.7, and the standard deviation is much smaller than the average. On the other hand, in Figure 7(b), the average correlation factor is about 0.1 and is comparable with the standard deviation. In the former case, we can approximately consider that the correlation factors are almost the same for all items; in the latter case, they depend on the items. However, CS analysis gives the same results for both. This means that CS analysis exaggerates the differences between correlation factors.

We therefore propose an extended normalized variable given by

z_{rip} = \frac{z_{ri} + \bar{r} / \sigma_r}{\sqrt{1 + \left( \bar{r} / \sigma_r \right)^2}}    (49)

The extended normalized variable behaves as follows in the limiting cases:

z_{rip} \to \begin{cases} z_{ri} & \text{for } \sigma_r / \bar{r} \gg 1 \\ r_i / \bar{r} & \text{for } \sigma_r / \bar{r} \ll 1 \end{cases}    (50)

These are the expected forms. We then obtain the point for item i as a_i = \left( z_{rip}, z_i \right), and we evaluate the axes for improvement request and contribution. We define the angle \theta by

\tan \theta = \frac{\sigma_r}{\bar{r}}    (51)

that is, we obtain an angle of

\theta = \tan^{-1} \frac{\sigma_r}{\bar{r}}    (52)

The angle takes the following values in the limiting cases:

\theta \to \begin{cases} \pi / 2 & \text{for } \sigma_r / \bar{r} \gg 1 \\ 0 & \text{for } \sigma_r / \bar{r} \ll 1 \end{cases}    (53)

We propose to define the unit vectors for the contribution and the improvement as follows:

e_G = \left( \cos \left( \frac{\pi}{2} - \frac{\theta}{2} \right), \sin \left( \frac{\pi}{2} - \frac{\theta}{2} \right) \right), \quad e_B = \left( \cos \left( \frac{\pi}{2} - \frac{\theta}{2} \right), -\sin \left( \frac{\pi}{2} - \frac{\theta}{2} \right) \right)    (54)

These definitions realize the requested behavior in the limiting cases. The contribution and requested improvement are given by

G_i = a_i \cdot e_G = z_{rip} \cos \left( \frac{\pi}{2} - \frac{\theta}{2} \right) + z_i \sin \left( \frac{\pi}{2} - \frac{\theta}{2} \right), \quad B_i = a_i \cdot e_B = z_{rip} \cos \left( \frac{\pi}{2} - \frac{\theta}{2} \right) - z_i \sin \left( \frac{\pi}{2} - \frac{\theta}{2} \right)    (55)

This subject will be discussed again more clearly in Chapter 4.
13. TREATMENT OF THE CORRELATION FACTOR

It should be noted that the correlation factor has limiting values of 1 and -1; we cannot expect values outside this region. Therefore, a variation from 0.1 to 0.2 and one from 0.9 to 1.0 are not equivalent: the latter rarely occurs and is hard to realize. However, we treat the values identically in CS analysis. We therefore propose to use a converted variable \eta, which is given by

\eta = \frac{1}{2} \ln \frac{1 + r}{1 - r}    (56)

This variable runs from -\infty to +\infty as r changes from -1 to +1. We can perform the same analysis using this \eta.
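A minimal sketch of Eq. (56); numpy's arctanh computes exactly (1/2) ln[(1+r)/(1-r)]:

```python
import numpy as np

r = np.array([0.74, 0.77, 0.68, 0.73, 0.68])
eta = np.arctanh(r)          # Eq. (56)
print(np.round(eta, 2))      # [0.95 1.02 0.83 0.93 0.83]
```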
14. SUMMARY

We summarize the results of this chapter. We consider one objective variable and p items as explanatory variables.

The average of the objective variable is given by

\bar{y} = \frac{1}{n} \sum_{k=1}^{n} y_k

The averages of the explanatory variables are given by

\mu_i = \frac{1}{n} \sum_{k=1}^{n} x_{ik}

where i denotes the item number and i = 1, 2, \ldots, p.

The unbiased variances and covariances are given by

\sigma_{yy}^2 = \frac{1}{n-1} \sum_{k=1}^{n} \left( y_k - \bar{y} \right)^2

\sigma_{ii}^2 = \frac{1}{n-1} \sum_{k=1}^{n} \left( x_{ik} - \mu_i \right)^2

\sigma_{iy}^2 = \frac{1}{n-1} \sum_{k=1}^{n} \left( x_{ik} - \mu_i \right) \left( y_k - \bar{y} \right)

The correlation factor between the objective variable and explanatory variable i is given by

r_{iy} = \frac{\sigma_{iy}^2}{\sqrt{\sigma_{ii}^2 \sigma_{yy}^2}}

We can evaluate whether the correlation is valid by evaluating t given by

t = \sqrt{n-2} \, \frac{r}{\sqrt{1-r^2}}

which follows a t-distribution with a freedom of n - 2. We compare this with the corresponding t_P.

We evaluate the average and the standard deviation of the item averages,

\bar{\mu} = \frac{1}{p} \sum_{i=1}^{p} \mu_i, \quad \sigma_\mu = \sqrt{ \frac{1}{p} \sum_{i=1}^{p} \left( \mu_i - \bar{\mu} \right)^2 }

We then normalize each item's average \mu_i, denoted z_i:

z_i = \frac{\mu_i - \bar{\mu}}{\sigma_\mu}

We evaluate the average \bar{r} and the standard deviation \sigma_r of the correlation factors,

\bar{r} = \frac{1}{p} \sum_{i=1}^{p} r_i, \quad \sigma_r = \sqrt{ \frac{1}{p} \sum_{i=1}^{p} \left( r_i - \bar{r} \right)^2 }

We then normalize each item's correlation factor r_i, denoted z_{ri}:

z_{ri} = \frac{r_i - \bar{r}}{\sigma_r}

We then obtain the point in the plane a_i = \left( z_{ri}, z_i \right).

The degree of contribution, G_i, is evaluated as

G_i = \frac{1}{\sqrt{2}} \left( z_{ri} + z_i \right)

The degree of requested improvement, B_i, is evaluated as

B_i = \frac{1}{\sqrt{2}} \left( z_{ri} - z_i \right)

We can evaluate the status of the CS by the correlation factor between z_{ri} and z_i, denoted r_{z_r z}:

r_{z_r z} = \frac{\sigma_{z_r z}^2}{\sqrt{\sigma_{z_r z_r}^2 \sigma_{zz}^2}}

where

\sigma_{z_r z_r}^2 = \frac{1}{p} \sum_{i=1}^{p} z_{ri}^2, \quad \sigma_{zz}^2 = \frac{1}{p} \sum_{i=1}^{p} z_i^2, \quad \sigma_{z_r z}^2 = \frac{1}{p} \sum_{i=1}^{p} z_{ri} z_i

We can evaluate a sub group as follows. We evaluate the normalized variable for the sub group as

z_{Gi} = \frac{\mu_{Gi} - \mu_i}{\sigma_i}

The normalized satisfaction can then be expressed as

z_i' = z_i + z_{Gi}

and we perform the same evaluation.

We implicitly assume that the explanatory variables are independent of each other. However, there are cases where the interaction is significant. In such cases, we perform a principal component analysis on the variables with high interaction and obtain the first component

z_i = a_1 u_{1i} + a_2 u_{2i} + \cdots + a_p u_{pi}

The related correlation factor is given by

r_z = \sum_{i=1}^{p} \frac{a_i}{\sqrt{\sigma_z^2}} r_i

The corresponding explanatory value is given by

\mu_x = a_1 \mu_1 + a_2 \mu_2 + \cdots + a_p \mu_p

We further introduce an extended normalized variable to account for the magnitude of the average and the standard deviation of the correlation factors:

z_{rip} = \frac{z_{ri} + \bar{r} / \sigma_r}{\sqrt{1 + \left( \bar{r} / \sigma_r \right)^2}}

We then obtain the coordinate for item i as a_i = \left( z_{rip}, z_i \right), and evaluate the axes for improvement request and contribution. We define the angle

\theta = \tan^{-1} \frac{\sigma_r}{\bar{r}}

and the unit vectors for contribution and improvement,

e_G = \left( \cos \left( \frac{\pi}{2} - \frac{\theta}{2} \right), \sin \left( \frac{\pi}{2} - \frac{\theta}{2} \right) \right), \quad e_B = \left( \cos \left( \frac{\pi}{2} - \frac{\theta}{2} \right), -\sin \left( \frac{\pi}{2} - \frac{\theta}{2} \right) \right)

The contribution and requested improvement are given by

G_i = z_{rip} \cos \left( \frac{\pi}{2} - \frac{\theta}{2} \right) + z_i \sin \left( \frac{\pi}{2} - \frac{\theta}{2} \right), \quad B_i = z_{rip} \cos \left( \frac{\pi}{2} - \frac{\theta}{2} \right) - z_i \sin \left( \frac{\pi}{2} - \frac{\theta}{2} \right)

We further pointed out that one can use a variable converted from the correlation factor,

\eta = \frac{1}{2} \ln \frac{1 + r}{1 - r}

Figure 8 shows the flow of the analysis described above.

Figure 8. Flow of analysis.
Chapter 2
INDEPENDENT FACTOR ANALYSIS

ABSTRACT

We treat data which consist of one objective variable and many explanatory variables, as in the CS analysis of the previous chapter. Independent factor analysis is the same as CS analysis but treats categorical data instead of numerical data. The analysis clarifies which explanatory variables are important for improving the objective variable. The decision is made by considering the independent factor and the level achievement ratio.
Keywords: explanatory variable, objective variable, independent factor, independent value, adjusted residual
1. INTRODUCTION

In this chapter, we treat data which consist of one objective variable and many explanatory variables, as shown in Figure 1. The data form is exactly the same as in the previous chapter, but the data types are all categorical. The data values correspond to category levels. For numerical data, the scores were 1, 2, 3, and so on; for categorical data, the levels are, for example, high or low. We can set any number of levels, such as significant-high, high, neutral, low, and significant-low. There are two types of categorical data: levels such as high and low, which have a definite order, and levels such as methods A, B, and C, which have no order; both types can be used in this analysis. We clarify which explanatory variables are important for improving the objective variable.
Figure 1. Data structure for independent factor analysis.
2. QUESTIONNAIRE

We treat one objective variable and five explanatory variables. The objective variable y is the customers' satisfaction. The explanatory variables (items) x_i (i = 1, 2, \ldots, 5) are as below.

Item 1 (x_1): Understanding of customers' work
Item 2 (x_2): Quality of reply to customers' requests and questions
Item 3 (x_3): Project promotion ability
Item 4 (x_4): Effective information providing ability
Item 5 (x_5): Proposal ability

The form is exactly the same as in the previous chapter. The target of the analysis is to clarify the items on which we (salesmen) should work to improve the customers' satisfaction.

Table 1. Questionnaire and variable notations
Table 2. Raw data for independent factor analysis. Each of the 350 respondents (the first 60 IDs are listed in the book) gives a level, high or low, for each of Item1(x1) through Item5(x5) and for the objective variable y.
The level for the data, which was a score for numerical data, is a category for this analysis: high or low, as shown in Table 1. The final data are therefore as shown in Table 2, where the data number is 350. We consider the relationship between the objective variable and each explanatory variable, and do not consider relationships between the explanatory variables. We can then obtain five group data sets that show the relationship between each item and the objective variable. We show the data associated with item 1 in Table 3. We can see qualitatively that high satisfaction for item 1 leads to high overall satisfaction.

We now evaluate this relationship quantitatively. We focus on one item and form a cross table, expressing the data for the i-th row and j-th column as x_{ij}. We then have the data values

x_{11} = 188, \quad x_{12} = 33, \quad x_{21} = 33, \quad x_{22} = 96    (1)

The corresponding cross-tabulation table is shown in Table 3.

Table 3. Cross-tabulated table and level ratios for item 1

              Customers' satisfaction
Item1         high    low     Sum    Ratio (k_i)
high          188     33      221    0.63
low           33      96      129    0.37
Sum           221     129     350
Ratio (r_j)   0.63    0.37           1.00
3. INDEPENDENT VALUE

We define independent values for the entries of the cross-tabulated table. We first evaluate the independent ratio k_i with respect to the item 1 levels by dividing the number for each level, summed over the customers' levels, by the total number. This ratio reflects the level of item 1 independent of the objective variable status. Next, we evaluate the independent ratio r_j with respect to the customers' satisfaction by summing over item 1's levels. This ratio reflects the level of the objective variable. The evaluated values are shown in Table 3.

The independent value for cell (i, j) is denoted as a_{ij} and is given by

a_{ij} = k_i r_j N    (2)

This is the value that we would expect if there were no interaction between the objective and explanatory variables. Table 4 shows the independent values.

Table 4. Independent values

              Customers' satisfaction
Item1         high      low
high          139.55    81.45
low           81.45     47.55

The sum of the independent ratios can be evaluated as

\sum_{i=1}^{m} \sum_{j=1}^{l} k_i r_j = \sum_{i=1}^{m} k_i \sum_{j=1}^{l} r_j = \sum_{i=1}^{m} k_i = 1    (3)
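A minimal sketch of Eq. (2) for the item 1 cross table:

```python
import numpy as np

x = np.array([[188, 33],
              [33, 96]], dtype=float)   # cross table for item 1 (Table 3)
N = x.sum()
k = x.sum(axis=1) / N                   # independent ratios k_i (rows)
r = x.sum(axis=0) / N                   # independent ratios r_j (columns)

a = np.outer(k, r) * N                  # Eq. (2): a_ij = k_i * r_j * N
print(np.round(a, 2))                   # [[139.55  81.45] [ 81.45  47.55]]
```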
4. TESTING

After we obtain a cross-tabulation table such as Table 3, we want to know the significance of the relationship between the two categorical variables, which for numerical data was evaluated with a correlation factor. We utilize likelihood ratio testing here.

In the probability trial, we assume that we obtain k kinds of events E_1, E_2, \ldots, E_k, where the events are exclusive. We try N times and obtain the events E_1, E_2, \ldots, E_k n_1, n_2, \ldots, n_k times, respectively. The probability that E_i occurs is denoted as \theta_i. The probability that we obtain the results n_1, n_2, \ldots, n_k is expressed by a multinomial distribution:

f \left( n_1, n_2, \ldots, n_k; \theta \right) = \frac{N!}{n_1! n_2! \cdots n_k!} \theta_1^{n_1} \theta_2^{n_2} \cdots \theta_k^{n_k}    (4)

where

N = n_1 + n_2 + \cdots + n_k    (5)

Using the obtained data n_1, n_2, \ldots, n_k, the estimated probabilities are given by

\hat{\theta}_N = \left( \hat{\theta}_{1N}, \hat{\theta}_{2N}, \ldots, \hat{\theta}_{kN} \right) = \left( \frac{n_1}{N}, \frac{n_2}{N}, \ldots, \frac{n_k}{N} \right)    (6)

We compare this with the independent values \theta_0 and take the ratio

\lambda \left( n_1, n_2, \ldots, n_k \right) = \frac{f \left( n_1, \ldots, n_k; \theta_0 \right)}{f \left( n_1, \ldots, n_k; \hat{\theta}_N \right)} = \left( \frac{\theta_{10}}{\hat{\theta}_{1N}} \right)^{n_1} \left( \frac{\theta_{20}}{\hat{\theta}_{2N}} \right)^{n_2} \cdots \left( \frac{\theta_{k0}}{\hat{\theta}_{kN}} \right)^{n_k}    (7)

Taking its logarithm, we obtain

-2 \ln \lambda \left( n_1, \ldots, n_k \right) = 2 \sum_{i=1}^{k} n_i \left( \ln \hat{\theta}_{iN} - \ln \theta_{i0} \right) = 2 \sum_{i=1}^{k} n_i \left( \ln \frac{n_i}{N} - \ln \theta_{i0} \right)    (8)
We expand this in a Taylor series. Writing n_i = N \theta_{i0} + \left( n_i - N \theta_{i0} \right) and using \ln \left( 1 + x \right) \approx x - x^2 / 2, we obtain

-2 \ln \lambda = 2 \sum_{i=1}^{k} n_i \ln \frac{n_i}{N \theta_{i0}} = 2 \sum_{i=1}^{k} \left[ N \theta_{i0} + \left( n_i - N \theta_{i0} \right) \right] \ln \left[ 1 + \frac{n_i - N \theta_{i0}}{N \theta_{i0}} \right]    (9)

\approx 2 \sum_{i=1}^{k} \left[ \left( n_i - N \theta_{i0} \right) + \frac{\left( n_i - N \theta_{i0} \right)^2}{N \theta_{i0}} - \frac{\left( n_i - N \theta_{i0} \right)^2}{2 N \theta_{i0}} \right] = \sum_{i=1}^{k} \frac{\left( n_i - N \theta_{i0} \right)^2}{N \theta_{i0}}    (10)

where we neglect terms beyond the second order and utilize

\sum_{i=1}^{k} \left( n_i - N \theta_{i0} \right) = \sum_{i=1}^{k} n_i - N \sum_{i=1}^{k} \theta_{i0} = N - N = 0    (11)

Therefore, we can utilize the testing variable

\chi^2 = \sum_{i=1}^{k} \frac{\left( n_i - N \theta_{i0} \right)^2}{N \theta_{i0}}    (12)

This is known to follow a \chi^2 distribution. In the model, we can replace N \theta_{i0} by the independent value. Since the \chi^2 distribution is the sum of squares of standard normal variables, the average \mu_i and the variance \sigma_i^2 of cell i are given by

\mu_i = N \theta_{i0}    (13)

\sigma_i^2 = N \theta_{i0}    (14)

Therefore, we evaluate the variable for the cross-tabulated table as

\chi^2 = \sum_{i,j} \chi_{ij}^2 = \sum_{i,j} \frac{\left( x_{ij} - a_{ij} \right)^2}{a_{ij}}    (15)

where

\chi_{ij}^2 = \frac{\left( x_{ij} - a_{ij} \right)^2}{a_{ij}}    (16)
We apply the testing procedure to the data shown in Table 3. The term associated with cell (1, 1) is evaluated as

\chi_{11}^2 = \frac{\left( 188 - 139.55 \right)^2}{139.55} = 16.8    (17)

The total sum over all cells is given by

\chi^2 = \chi_{11}^2 + \chi_{12}^2 + \chi_{21}^2 + \chi_{22}^2 = \frac{\left( 188 - 139.55 \right)^2}{139.55} + \frac{\left( 33 - 81.45 \right)^2}{81.45} + \frac{\left( 33 - 81.45 \right)^2}{81.45} + \frac{\left( 96 - 47.55 \right)^2}{47.55} = 123.85    (18)

This can be expressed as

\chi^2 = \sum_i \sum_j \chi_{ij}^2 = \sum_i \sum_j \frac{\left( x_{ij} - a_{ij} \right)^2}{a_{ij}}    (19)

where

\chi_{ij}^2 = \frac{\left( x_{ij} - a_{ij} \right)^2}{a_{ij}}    (20)
Let us consider the degrees of freedom of the variable. The level numbers of the two variables are assumed to be m and l. We utilize the ratios for the two variables, and the freedom is decreased by 1 for each variable. Therefore, the freedom is given by

\left( m - 1 \right) \left( l - 1 \right)    (21)

In this case, m = 2 and l = 2, and the corresponding freedom is given by

\left( m - 1 \right) \left( l - 1 \right) = \left( 2 - 1 \right) \left( 2 - 1 \right) = 1    (22)

We set a predictive probability P = 0.95 and obtain the corresponding P point as

\chi_c^2 = \chi^2 \left( 1, 0.95 \right) = 3.84    (23)

Therefore, we obtain

\chi^2 > \chi_c^2    (24)

This means that the two variables are related.
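A minimal sketch of the whole test of Eqs. (15)-(24), using scipy for the chi-square P point:

```python
import numpy as np
from scipy import stats

x = np.array([[188, 33],
              [33, 96]], dtype=float)
N = x.sum()
a = np.outer(x.sum(axis=1), x.sum(axis=0)) / N       # independent values
chi2 = ((x - a)**2 / a).sum()                        # Eq. (15)

m, l = x.shape
dof = (m - 1) * (l - 1)                              # Eq. (21)
chi2_c = stats.chi2.ppf(0.95, dof)                   # Eq. (23): 3.84
print(round(chi2, 2), round(chi2_c, 2), chi2 > chi2_c)   # ~123.85 3.84 True
```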
5. INDEPENDENT FACTOR

We now consider the maximum and minimum values of \chi^2. To clarify the analysis, we treat data different from those of the previous section, with more levels: the relationship between age and preferred dish among Chinese, Japanese, and French cuisine. The number of dish levels is 3, the number of age levels is 5, and the total number of members is 500.

Let us first consider the minimum \chi^2. The minimum is 0, which we show below. We first set the ratios as shown in Table 5. The corresponding independent values are shown in Table 6. If the data are equal to these values, the corresponding \chi^2 is 0.

Table 5. Setting ratios of age and dish

            Chinese  Japanese  French   Age ratio
mid-20                                  0.2
mid-30                                  0.3
mid-40                                  0.4
mid-50                                  0.3
mid-60                                  0.2
Dish ratio  0.5      0.2       0.3      1

Table 6. The independent values

            Chinese  Japanese  French   Age ratio
mid-20      50       20        30       0.2
mid-30      75       30        45       0.3
mid-40      100      40        60       0.4
mid-50      75       30        45       0.3
mid-60      50       20        30       0.2
Dish ratio  0.5      0.2       0.3      1

Next, we consider the maximum value. The maximum corresponds to a situation of significant deviation: only one level has a value along each row and column, as shown in Table 7. The corresponding independent values are shown in Table 8. The corresponding \chi^2 is as large as 1000.

Table 7. Data with significant deviation

            Chinese  Japanese  French   Age ratio
mid-20      100      0         0        0.2
mid-30      0        200       0        0.4
mid-40      0        0         200      0.4
mid-50      0        0         0        0
mid-60      0        0         0        0
Dish ratio  0.2      0.4       0.4      1

Table 8. Independent values for Table 7

            Chinese  Japanese  French   Age ratio
mid-20      20       40        40       0.2
mid-30      40       80        80       0.4
mid-40      40       80        80       0.4
mid-50      0        0         0        0
mid-60      0        0         0        0
Dish ratio  0.2      0.4       0.4      1
We want to obtain a general form for the maximum \chi^2. Therefore, we use variable values a, b, and c instead of numerical data, as shown in Table 9. The corresponding independent values are shown in Table 10.

Table 9. Table for the maximum \chi^2

            Chinese  Japanese  French   Age ratio
mid-20      a        0         0        a/N
mid-30      0        b         0        b/N
mid-40      0        0         c        c/N
mid-50      0        0         0        0
mid-60      0        0         0        0
Dish ratio  a/N      b/N       c/N      1

Table 10. Independent values for the maximum \chi^2

            Chinese  Japanese  French   Age ratio
mid-20      a^2/N    ab/N      ac/N     a/N
mid-30      ba/N     b^2/N     bc/N     b/N
mid-40      ca/N     cb/N      c^2/N    c/N
mid-50      0        0         0        0
mid-60      0        0         0        0
Dish ratio  a/N      b/N       c/N      1
We can then evaluate \chi^2 as

\chi^2 = \frac{\left( a - \frac{a^2}{N} \right)^2}{\frac{a^2}{N}} + \frac{\left( \frac{ab}{N} \right)^2}{\frac{ab}{N}} + \frac{\left( \frac{ac}{N} \right)^2}{\frac{ac}{N}} + \frac{\left( \frac{ba}{N} \right)^2}{\frac{ba}{N}} + \frac{\left( b - \frac{b^2}{N} \right)^2}{\frac{b^2}{N}} + \frac{\left( \frac{bc}{N} \right)^2}{\frac{bc}{N}} + \frac{\left( \frac{ca}{N} \right)^2}{\frac{ca}{N}} + \frac{\left( \frac{cb}{N} \right)^2}{\frac{cb}{N}} + \frac{\left( c - \frac{c^2}{N} \right)^2}{\frac{c^2}{N}}    (25)

Modifying the first term, we obtain

\frac{\left( a - \frac{a^2}{N} \right)^2}{\frac{a^2}{N}} = \frac{a^2 \left( 1 - \frac{a}{N} \right)^2}{\frac{a^2}{N}} = N \left( 1 - \frac{a}{N} \right)^2 = N - 2a + \frac{a^2}{N}    (26)

Performing the similar analysis for the terms with b and c, and summing up all the terms, we obtain

\chi^2 = 3N - 2 \left( a + b + c \right) + \frac{a^2 + b^2 + c^2 + 2 \left( ab + bc + ca \right)}{N} = 3N - 2N + \frac{\left( a + b + c \right)^2}{N} = N + N = 2N    (27)

Note that this expression does not include any of the individual values a, b, and c. This means that we can use any combination of values provided they satisfy

a + b + c = N    (28)
We assumed that the form of Table 9 gives the maximum \chi^2; we now need to prove it. We modify Table 9 by changing a \to a - \Delta, where \Delta is positive, and moving \Delta to the next cell in the same row. We then obtain the modified table shown in Table 11. What we want to show is that \chi^2 decreases with this operation; if this is realized, it is proved that the form of Table 9 gives the maximum. The corresponding independent values are shown in Table 12. We want to show that \chi^2 is decreased for any incremental positive \Delta. In the derivation below, we neglect the second order in \Delta, since it is assumed to be quite small.

Table 11. Modified table for the maximum \chi^2

            Chinese          Japanese         French   Age ratio
mid-20      a - \Delta       \Delta           0        a/N
mid-30      0                b                0        b/N
mid-40      0                0                c        c/N
mid-50      0                0                0        0
mid-60      0                0                0        0
Dish ratio  (a - \Delta)/N   (b + \Delta)/N   c/N      1

Table 12. Independent values for the modified table

            Chinese               Japanese              French    Age ratio
mid-20      a(a - \Delta)/N       a(b + \Delta)/N       ac/N      a/N
mid-30      b(a - \Delta)/N       b(b + \Delta)/N       bc/N      b/N
mid-40      c(a - \Delta)/N       c(b + \Delta)/N       c^2/N     c/N
mid-50      0                     0                     0         0
mid-60      0                     0                     0         0
Dish ratio  (a - \Delta)/N        (b + \Delta)/N        c/N       1

The \chi^2 of the modified table is given by

\chi^2 = \frac{\left[ \left( a - \Delta \right) - \frac{a \left( a - \Delta \right)}{N} \right]^2}{\frac{a \left( a - \Delta \right)}{N}} + \frac{\left[ \Delta - \frac{a \left( b + \Delta \right)}{N} \right]^2}{\frac{a \left( b + \Delta \right)}{N}} + \frac{ac}{N} + \frac{b \left( a - \Delta \right)}{N} + \frac{\left[ b - \frac{b \left( b + \Delta \right)}{N} \right]^2}{\frac{b \left( b + \Delta \right)}{N}} + \frac{bc}{N} + \frac{c \left( a - \Delta \right)}{N} + \frac{c \left( b + \Delta \right)}{N} + \frac{\left( c - \frac{c^2}{N} \right)^2}{\frac{c^2}{N}}    (29)

where each cell with a data value of 0 contributes its independent value. Expanding the terms that depend on \Delta to first order, we obtain

\frac{\left[ \left( a - \Delta \right) - \frac{a \left( a - \Delta \right)}{N} \right]^2}{\frac{a \left( a - \Delta \right)}{N}} = \frac{N \left( a - \Delta \right)}{a} \left( 1 - \frac{a}{N} \right)^2 \approx N \left( 1 - \frac{a}{N} \right)^2 - \Delta \left( \frac{N}{a} - 2 + \frac{a}{N} \right)    (30)

\frac{\left[ \Delta - \frac{a \left( b + \Delta \right)}{N} \right]^2}{\frac{a \left( b + \Delta \right)}{N}} \approx \frac{ab}{N} - \Delta \left( 2 - \frac{a}{N} \right)    (31)

\frac{\left[ b - \frac{b \left( b + \Delta \right)}{N} \right]^2}{\frac{b \left( b + \Delta \right)}{N}} = \frac{Nb}{b + \Delta} \left( 1 - \frac{b + \Delta}{N} \right)^2 \approx N \left( 1 - \frac{b}{N} \right)^2 - \Delta \left( \frac{N}{b} - \frac{b}{N} \right)    (32)

\frac{b \left( a - \Delta \right)}{N} = \frac{ab}{N} - \Delta \frac{b}{N}, \quad \frac{c \left( a - \Delta \right)}{N} = \frac{ca}{N} - \Delta \frac{c}{N}, \quad \frac{c \left( b + \Delta \right)}{N} = \frac{cb}{N} + \Delta \frac{c}{N}    (33)

while the remaining terms do not depend on \Delta. The zeroth-order terms sum to 2N, as in Eq. (27):

N \left( 1 - \frac{a}{N} \right)^2 + N \left( 1 - \frac{b}{N} \right)^2 + N \left( 1 - \frac{c}{N} \right)^2 + \frac{2 \left( ab + bc + ca \right)}{N} = 2N    (34)

The coefficients of \Delta sum to

- \left( \frac{N}{a} - 2 + \frac{a}{N} \right) - \left( 2 - \frac{a}{N} \right) - \left( \frac{N}{b} - \frac{b}{N} \right) - \frac{b}{N} - \frac{c}{N} + \frac{c}{N} = -N \left( \frac{1}{a} + \frac{1}{b} \right)    (35)

Therefore,

\chi^2 = 2N - \Delta N \left( \frac{1}{a} + \frac{1}{b} \right) < \chi_{max}^2    (36)

and \chi^2 is decreased by the operation, which proves that the form of Table 9 gives the maximum.
We generalize the analysis further to obtain the final form of the maximum \chi^2. We treat two categorical variables x and y, where x has m levels and y has l levels, as shown in Table 13. We assume m \le l, which does not lose generality. This can be expressed by

k = \mathrm{Min} \left( m, l \right)    (37)

with k = m in this case. The ratios associated with the x levels x_1, x_2, \ldots, x_k are denoted as f_1, f_2, \ldots, f_k. Since we assume the form for the maximum, the number of y levels with nonzero entries, y_1, y_2, \ldots, y_k, is the same as the number of x levels, and the ratios of y_{k+1}, y_{k+2}, \ldots, y_l are 0, as shown in Table 13. The ratios associated with y_1, y_2, \ldots, y_l are denoted as g_1, g_2, \ldots, g_l and are given by

g_1 = f_1, \quad g_2 = f_2, \quad \ldots, \quad g_k = f_k, \quad g_{k+1} = g_{k+2} = \cdots = g_l = 0    (38)

Table 13. General form for the maximum \chi^2

          y_1    y_2    ...   y_k    ...   y_l   x ratio
x_1       Nf_1   0      ...   0      ...   0     f_1
x_2       0      Nf_2   ...   0      ...   0     f_2
...       ...    ...    ...   ...    ...   ...   ...
x_k       0      0      ...   Nf_k   ...   0     f_k
y ratio   g_1    g_2    ...   g_k    ...   0     1

Table 14. Independent value table

          y_1        y_2        ...   y_k        ...   y_l   x ratio
x_1       Nf_1^2     Nf_1f_2    ...   Nf_1f_k    ...   0     f_1
x_2       Nf_2f_1    Nf_2^2     ...   Nf_2f_k    ...   0     f_2
...       ...        ...        ...   ...        ...   ...   ...
x_k       Nf_kf_1    Nf_kf_2    ...   Nf_k^2     ...   0     f_k
y ratio   f_1        f_2        ...   f_k        ...   0     1
The corresponding independent values are shown in Table 14. The sum of the components of \chi^2 associated with the first row of Table 14 can be evaluated as

\frac{\left( Nf_1 - Nf_1^2 \right)^2}{Nf_1^2} + \frac{\left( 0 - Nf_1f_2 \right)^2}{Nf_1f_2} + \cdots + \frac{\left( 0 - Nf_1f_k \right)^2}{Nf_1f_k} = N \left( 1 - f_1 \right)^2 + Nf_1 \left( f_2 + f_3 + \cdots + f_k \right) = N \left( 1 - f_1 \right)^2 + Nf_1 \left( 1 - f_1 \right) = N \left( 1 - f_1 \right)    (39)

Performing the similar analysis for the other rows and summing them up, we obtain

\chi^2 = N \left( 1 - f_1 \right) + N \left( 1 - f_2 \right) + \cdots + N \left( 1 - f_k \right) = kN - N \left( f_1 + f_2 + \cdots + f_k \right) = \left( k - 1 \right) N    (40)

where we utilize

f_1 + f_2 + \cdots + f_k = 1    (41)

Therefore, the maximum value of \chi^2 is

\chi_{max}^2 = \left( k - 1 \right) N    (42)

This does not depend on f_1, f_2, \ldots, f_k; we can use any values that satisfy f_1 + f_2 + \cdots + f_k = 1.
Normalizing the obtained \chi^2 by this maximum value, we can define a factor given by

r_c = \sqrt{\frac{\chi^2}{N \left( k - 1 \right)}}    (43)

This is called the independent factor. It takes values between 0 and 1, and the relationship becomes more significant as it approaches 1. In the above example, we obtain

r_c = \sqrt{\frac{\chi^2}{N \left( k - 1 \right)}} = \sqrt{\frac{123.85}{350 \times \left( 2 - 1 \right)}} = 0.59    (44)
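A minimal sketch of the independent factor of Eq. (43), applied to the item 1 cross table:

```python
import numpy as np

def independent_factor(x):
    """Independent factor r_c of Eq. (43): sqrt(chi2 / (N * (k - 1)))."""
    N = x.sum()
    a = np.outer(x.sum(axis=1), x.sum(axis=0)) / N
    chi2 = ((x - a)**2 / a).sum()
    k = min(x.shape)                      # Eq. (37)
    return np.sqrt(chi2 / (N * (k - 1)))

x = np.array([[188, 33],
              [33, 96]], dtype=float)
print(round(independent_factor(x), 2))    # 0.59, Eq. (44)
```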
6. ADJUSTED RESIDUALS

We again treat the data for customers' satisfaction shown in Table 2. We evaluated the relationship between the two categorical variables by the independent factor. We now want to obtain the relationship between each level of the two variables, which can be evaluated with adjusted residuals. We focus on two levels each to make the analysis simple.

We can evaluate \chi^2 as

\chi^2 = \frac{\left( x_{11} - Nk_1r_1 \right)^2}{Nk_1r_1} + \frac{\left( x_{12} - Nk_1r_2 \right)^2}{Nk_1r_2} + \frac{\left( x_{21} - Nk_2r_1 \right)^2}{Nk_2r_1} + \frac{\left( x_{22} - Nk_2r_2 \right)^2}{Nk_2r_2}    (45)

where

k_1 + k_2 = 1, \quad r_1 + r_2 = 1    (46)

These ratios are related to the data values as

k_1 = \frac{x_{11} + x_{12}}{N}, \quad r_1 = \frac{x_{11} + x_{21}}{N}, \quad x_{22} = N - x_{11} - x_{12} - x_{21}    (47)

Therefore, we obtain

x_{12} = Nk_1 - x_{11}, \quad x_{21} = Nr_1 - x_{11}, \quad x_{22} = N \left( 1 - k_1 - r_1 \right) + x_{11}    (48)
We then obtain \chi^2 as

\chi^2 = \frac{\left( x_{11} - Nk_1r_1 \right)^2}{Nk_1r_1} + \frac{\left( x_{11} - Nk_1r_1 \right)^2}{Nk_1r_2} + \frac{\left( x_{11} - Nk_1r_1 \right)^2}{Nk_2r_1} + \frac{\left( x_{11} - Nk_1r_1 \right)^2}{Nk_2r_2} = \left( x_{11} - Nk_1r_1 \right)^2 \frac{k_2r_2 + k_2r_1 + k_1r_2 + k_1r_1}{Nk_1r_1k_2r_2} = \frac{\left( x_{11} - Nk_1r_1 \right)^2}{Nk_1 \left( 1 - k_1 \right) r_1 \left( 1 - r_1 \right)}    (49)

where we use Eq. (48): for example, x_{12} - Nk_1r_2 = Nk_1 - x_{11} - Nk_1 \left( 1 - r_1 \right) = - \left( x_{11} - Nk_1r_1 \right), and similarly for the other cells; the final step uses \left( k_1 + k_2 \right) \left( r_1 + r_2 \right) = 1.
Note that this corresponds to the data of row 1 and column 1: we obtained an expression for \chi^2 focusing on x_{11}. We can also derive expressions focusing on the other cells. Focusing on x_{12}, we obtain

\chi^2 = \frac{\left[ x_{12} - Nk_1 \left( 1 - r_1 \right) \right]^2}{Nk_1 \left( 1 - k_1 \right) r_1 \left( 1 - r_1 \right)}    (50)

Focusing on x_{21}, we obtain

\chi^2 = \frac{\left[ x_{21} - N \left( 1 - k_1 \right) r_1 \right]^2}{Nk_1 \left( 1 - k_1 \right) r_1 \left( 1 - r_1 \right)}    (51)

Focusing on x_{22}, we obtain

\chi^2 = \frac{\left[ x_{22} - N \left( 1 - k_1 \right) \left( 1 - r_1 \right) \right]^2}{Nk_1 \left( 1 - k_1 \right) r_1 \left( 1 - r_1 \right)}    (52)

Since \chi^2 is a single constant, we obtain

\frac{\left( x_{11} - Nk_1r_1 \right)^2}{Nk_1 \left( 1 - k_1 \right) r_1 \left( 1 - r_1 \right)} = \frac{\left[ x_{12} - Nk_1 \left( 1 - r_1 \right) \right]^2}{Nk_1 \left( 1 - k_1 \right) r_1 \left( 1 - r_1 \right)} = \frac{\left[ x_{21} - N \left( 1 - k_1 \right) r_1 \right]^2}{Nk_1 \left( 1 - k_1 \right) r_1 \left( 1 - r_1 \right)} = \frac{\left[ x_{22} - N \left( 1 - k_1 \right) \left( 1 - r_1 \right) \right]^2}{Nk_1 \left( 1 - k_1 \right) r_1 \left( 1 - r_1 \right)}    (53)
Therefore, \chi^2 can be expressed using the parameters of each cell, and the absolute values of the adjusted residuals are all the same. Next, we consider the sign of the adjusted residual. Let us consider the cell (1, 2), where the corresponding data value is x_{12}. We have

x_{11} + x_{12} = k_1 N    (54)

We then have

x_{12} = k_1 N - x_{11}    (55)

The numerator of the adjusted residual is therefore

x_{12} - k_1 r_2 N = k_1 N - x_{11} - k_1 r_2 N = - \left( x_{11} - k_1 r_1 N \right)    (56)

This is the negative of the value for cell (1, 1). Therefore, the adjusted residuals for the 2x2 cross-tabulated table are given in Table 15.

Table 15. General form of the adjusted residuals for a 2x2 cross-tabulated table
We denote the adjusted residual for cell (i, j) as z_{ij}. We can regard the average as Nk_ir_j and the variance as Nk_i \left( 1 - k_i \right) r_j \left( 1 - r_j \right). Therefore, the variable

\frac{x_{ij} - a_{ij}}{\sqrt{a_{ij} \left( 1 - k_i \right) \left( 1 - r_j \right)}}    (57)

can be regarded as the normalized one, and it is the form of the adjusted residual. We showed the above expression for a 2x2 cross-tabulated table; we apply the form of Eq. (57) to any type of cross-tabulated table. The adjusted residual is given by

z_{ij} = \frac{x_{ij} - a_{ij}}{\sqrt{a_{ij} \left( 1 - \frac{n_i}{N} \right) \left( 1 - \frac{n_j}{N} \right)}}    (58)

where n_i and n_j are the totals of row i and column j. This is the variable that expresses the importance of the cell with respect to the objective variable. We assume that it follows the normal distribution, so we can relate the value of the adjusted residual to a probability:

z_P = \begin{cases} 1.64 & \left( P = 0.90 \right) \\ 1.96 & \left( P = 0.95 \right) \\ 2.58 & \left( P = 0.99 \right) \end{cases}    (59)
where the predictive probability is two-sided. We adopt P = 0.95 here. Let us evaluate the adjusted residuals for item 1, which are shown in Table 16. Since the absolute values are larger than z_P (P = 0.95), we judge that the dependence is valid. For the 2x2 table, where \chi^2 = z_{ij}^2, the adjusted residual and the independent factor r_c are related by

r_c = \sqrt{\frac{\chi^2}{N \left( k - 1 \right)}} = \frac{\left| z_{ij} \right|}{\sqrt{N \left( k - 1 \right)}}    (60)

Table 16. Adjusted residuals for item 1

         Customers' satisfaction
Item1    high     low
high     11.13    -11.13
low      -11.13   11.13
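A minimal sketch of the adjusted residual of Eq. (58), reproducing Table 16:

```python
import numpy as np

def adjusted_residuals(x):
    """Adjusted residuals z_ij of Eq. (58) for a cross-tabulated table x."""
    N = x.sum()
    row = x.sum(axis=1, keepdims=True)
    col = x.sum(axis=0, keepdims=True)
    a = row @ col / N                          # independent values a_ij
    return (x - a) / np.sqrt(a * (1 - row / N) * (1 - col / N))

x = np.array([[188, 33],
              [33, 96]], dtype=float)          # item 1 cross table
print(np.round(adjusted_residuals(x), 2))      # [[ 11.13 -11.13] [-11.13  11.13]]
```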
Let us consider the meaning of the adjusted residuals. We study two types of data: one uniform and one deviated, as shown in Table 17. The item ratios are the same for both data sets, 0.5. The corresponding adjusted residuals are 0 and 4.47, respectively, and the independent factors are 0 and 1, respectively. Therefore, both the adjusted residual and the independent factor are related not to the item ratio but to the deviation of the data.

Table 17. Adjusted residuals for uniform and deviated data

Uniform data (item ratio 0.5):
          y high   y low
high      5        5
low       5        5
Adjusted residuals: all 0.00. Independent factor: 0.

Deviated data (item ratio 0.5):
          y high   y low
high      10       0
low       0        10
Adjusted residuals: +4.47 on the diagonal, -4.47 off the diagonal. Independent factor: 1.
Independent Factor Analysis
51
Next, we investigate the deviated data with varying the item ratio as shown in Table 18. When the item ratio changes significantly, the residual and the independent factor are independent of them. Inspecting above, an independent factor expresses the deviation of total data, and the adjust residual expresses the deviation of the levels, and both are independent of the item level ratio.
Item
Item
Item
Item
Item
Table 18. Adjust residual and independent factor of deviated data with varying item ratio Adjust residual
high low
Customers' satisfaction high low Item ratio 4 0 0.2 0 16
Adjust residual
high low
Customers' satisfaction high low Item ratio 8 0 0.4 0 12
Adjust residual
high low
Customers' satisfaction high low Item ratio 10 0 0.5 0 10
Adjust residual
high low
Customers' satisfaction high low Item ratio 12 0 0.6 0 8
Adjust residual
high low
Customers' satisfaction high low Item ratio 16 0 0.8 0 4
high low
high low
high low
high low
high low
high 4.47 -4.47
low -4.47 4.47
Independent factor 1
high 4.47 -4.47
low -4.47 4.47
Independent factor 1
high 4.47 -4.47
low -4.47 4.47
Independent factor 1
high 4.47 -4.47
low -4.47 4.47
Independent factor 1
high 4.47 -4.47
low -4.47 4.47
Independent factor 1
7. LEVEL ACHIEVEMENT RATIO We select levels based on the critical adjust residuals, and we evaluate the corresponding ratio. We call the ratio as a level achievement ratio that is denoted as
rli ,
where i expresses the item. The level of item 1 is high and is required to realize customers’ high satisfaction. The corresponding ratio is 0.63. In this case, the selection of levels of explanatory variable is rather clear. However, the selection is not clear associated with categorical levels in general. In that case, the adjust residual does work.
Kunihiro Suzuki
52
We can set the critical adjust residual value relating to the standard normal distribution. For example, if we relate the critical value to the predictive probability, we set the critical value at 1.96. When we set the critical value, there may be a case where the levels that exceed the value does not exist. We then do not treat the item. Consequently, we evaluate the selection in two gates: one is the evaluation, and the second is the adjust residual critical value. Further, if we evaluate the level with the critical value, it may occur that many levels are selected. In that case, we should sum the ratio as below 2
rl k1 k2
(61)
where levels 1 and 2 are assumed to exceed the critical value. Table 19 summarizes the data including the level achievement ratio. Using the data, we perform the CS analysis in the next section. Table 19. Summary of the satisfaction data Item1
Item2
Item3
Item4
Item5
Level achievement ratio
0.63
0.59
0.32
0.5
0.41
Independent factor
0.59
0.57
0.38
0.53
0.49
Dependence(χ2 evaluation)
yes
yes
yes
yes
yes
1.23
0.88
-1.48
0.08
-0.7
1.08
0.75
-1.78
0.19
-0.23
Contribution score
1.63
1.15
-2.31
0.19
-0.23
Rquired improvement score
-0.11
-0.09
0.21
0.08
0.33
Normalized level achievement ratio Normalized independent factor
8. DETERMINATION OF ITEM BASED ON CS ANALYSIS We obtained independent factors and level ratios for each item. Using these data, we perform CS analysis of which procedure is shown in the previous chapter.
Independent Factor Analysis
53
8.1. Normalization We first evaluate averages and standard deviations for a level ratio with respect to items, which are given by
r l
1 p rli p i 1
1 p rli rl p i 1
r l
(62)
2
(63)
We then normalize the level ratios as
zrl i
rli rl
r
(64)
l
We next evaluate averages and standard deviations for an independent factor with respect to items, which are given by
r c
r c
1 p rr i p i 1 c
1 p rci rc p i 1
(65)
2
(66)
We then normalize the independent factors
zrci
rci as
rci rc
rc
We then obtain a vector for each item as
(67)
ai zrc i , zrl i
.
Kunihiro Suzuki
54
8.2. Improve Requested and Contributed Items We can plot the normalized independent factor and normalize level ratios as shown Figure 2. The high correlation factor means that it is an important item, and high satisfaction means that it is in good condition. Therefore, the improvement request item can be selected as the one with a high independent factor and a low level achievement ratio. How can we obtain the corresponding value ? The axis direction of right angle of -45o corresponds to the importance associated with an independent factor and bad condition associated with the satisfaction, and hence it expresses the degree of an improvement requested degree. Therefore, the projection of each point to the axis corresponds to the improvement request degree. The axis direction of right angle of 45o corresponds to the importance associated with an independent factor and good condition associated with a level achievement ratio, and hence it expresses the degree of contribution. Therefore, the projection of each point to the axis corresponds to the contribution degree. We can evaluate the degree as follows. The unit vector for the contribution axis eG , and that for improvement request axis eB are given by 1 1 , eG 2 2 e 1 , 1 B 2 2
(68)
The degree for the contribution is denoted as
Gi and can be evaluated as
Gi ai eG
zrc i , zrl i
1 2
z
rc i
1 2
zrl i
1,1
The degree for the improvement request is denoted as
(69)
Bi and can be evaluated as
Bi ai eB
zrc i , zrl i
1 2
z
rc i
1 2
zrl i
1, 1
(70)
Independent Factor Analysis
55
Normalized level achievement ratio
Figure 2 shows CS plot and Figure 3 shows the contribution and requested improvement degrees extracted from Figure 2. Item1 and Item2 contribute to the customers’ satisfaction, while there is no clear improvement requested item.
2 Contribution Item1 Item2
1 Item4
0
-1
Item5
Item3
Improve requested
-2 -2
-1
0
1
2
Normalized independent factor Figure 2. CS plot for satisfaction for salesmen.
2.0 1.8 Contribution score
1.6
Improvement request
1.4 1.2 1.0 0.8 0.6
0.4 0.2 0.0 Item1
Item2
Item3
Figure 3. Contribution and improvement request degrees.
Item4
Item5
Kunihiro Suzuki
56
9. CS CORRELATION FACTOR We want to evaluate the status of the CS. We can evaluate it inspecting the data distribution. If the data are along the contribution axis, the status is good. On the other hand, if the data are along the improvement requested axis, the status is bad. We can evaluate the status of the CS by evaluating the correlation factor between zri and
zi
, which is denoted is as
rzr z
, and is evaluate as
zr2z
rzr z
zr2zr z2z
(71)
where 1 p 2 zri p i 1
(72)
z2z
1 p 2 zi p i 1
(73)
z 2z
1 p zri zi p i 1
(74)
z 2z r r
r
We call is as a CS correlation factor. The value is between -1 and 1, and status is better with increasing the value. The CS correlation factor for the data is 0.96, and hence the status is good in this case. This is the reason why we have many contributed items with less improvement requested items.
10. EXPECTED OBJECTIVE VARIABLE IMPROVEMENT WITH IMPROVING EXPLANATORY VALUE When we improve the explanatory variable value, how can we expect the improvement in the objective value? This can be easily done with numerical data for CS analysis using
Independent Factor Analysis
57
a regression analysis. We do not have such theory for categorical data. We propose here a procedure to predict it.
Figure 4 Data flow.
10.1. Two Levels (Objective Variable)-Two Levels (Explanatory) We consider objective and explanatory variables with both two levels. Figure 4 schematically shows the data flow. The ratio of customer’s satisfaction for high is denoted as as
r1 and that for low is denoted
r2 .
We consider the levels of item as yes or no. The level yes corresponds to a requested one which has a high adjust residual, and the level no corresponds to non-requested one which has a low or negative adjust residual. Therefore, we try to increase the ratio of k1 , expecting the increase in r1 . The initial data for corresponding cells are assumed to be x11 , x12 , x21 , x22 , and the ratios are related to the data as below.
k1
x11 x12 N
(75)
k2
x21 x22 N
(76)
Kunihiro Suzuki
58
r1
x11 x21 N
(77)
r2
x12 x22 N
(78)
where N is the total number of the data and is given by N x11 x12 x21 x22
We want to increase is increased to
k1
(79)
. Therefore, we perform something and assume that the ratio
k ' , and is expressed by
k1' k1 k1
where
(80)
k1 is positive. k 2
then becomes as
r
k 2' k 2 k1
. This
k1
is a given value.
r
We study the change of 1 and 2 in this situation. We assume that the values in the cells are changed to x11 x11' ' x12 x12 ' x21 x21 x x' 22 22
(81)
Let us consider the data changes in more detail. The increase in by the decrease of is related to the
x21
with
21
and the decrease of
x22
with
22
k1
can be expressed
. Therefore, the change
k1 as
21 22 N k1 We assume that
x11
and
(82)
x12
do not change.
Independent Factor Analysis
We also assume that all the others are added to factor and is given by
x12
21
is added to
x11
59
and that part of
22
is added to
rc
(83)
is 1, the whole data are added to
data are added to and obtain
x12
x11
. On the other hand, when
. This features are expected ones. We eliminate
' x11 x11 N k1 1 rc 22 ' x12 x12 1 rc 22
rc 22
rc 22 N
is 0, whole
using Eq. (82),
, and the corresponding ratio is
(85)
This can be solved with respect to
22 N
21
rc
(84)
The increase in number for high satisfaction is expressed by
r1' r1
and
. We assume that the change is expressed using the independent
' x11 x11 21 rc 22 ' x12 x12 1 rc 22
When
x11
22
given by
r1' r1 rc
(86)
There are two non-determined variables solve it.
22
and
r1' , and we need one condition to
2 We propose that the independent factor, that is, the is not changed. We then obtain
x '11 Nk1' r1' x11 Nk1r1 Nk1 1 k1 r1 1 r1 Nk1' 1 k1' r1' 1 r1' 2
2
2
(87)
Kunihiro Suzuki
60 2 We can modify as
2
x Nk r ' Nk 1 k r ' 1 r ' ' 11
' 1
' 1
' 1
2
1
1
1
x11 N k1 1 rc 22 Nk1' r1' Nk1' 1 k1' r1' 1 r1'
2
1 rc ' ' ' x11 N k1 N r1 r1 Nk1r1 rc ' ' ' ' Nk1 1 k1 r1 1 r1
2
x11 ' 1 rc 1 rc ' k1 r1 k1 r1 rc rc N N ' ' ' ' k1 1 k1 r1 1 r1
(88)
2
We then obtain ' 1 rc 1 rc x ' k 1 k r 1 r 11 k1 r1 k1 r1 N rc rc N
2
' 1
' 1
' 1
' 1
2
(89)
We introduce variables below A
B
2 N
k1' 1 k1'
(90)
1 rc x11 k1 r1 N rc
1 rc ' C k1 rc
(91)
(92)
The Eq. (89) can be expressed by Ar1' 1 r1' B Cr1'
2
(93)
Independent Factor Analysis
61
' We solve this with respect to r1 and obtain
r1'
A 2 BC A A 4 B C B 2 A C2
(94)
We have two roots, but the only one is available. We first consider the sign of the term C B as
1 rc ' x11 1 rc CB k1 k1 r1 r N r c c 1 rc x11 1 r1 k1 N rc 1 rc x11 x12 x11 1 r1 r N N c 1 rc x12 0 1 r1 N rc
(95)
Therefore, the term C B is positive. We consider that the adjust residual is large for cell 1,1 . Therefore, the below should hold.
x'11 Nk1' r1' 0
(96)
This leads to
B Cr1' 0 We consider the one root of plus sign in (94), and modify it as
(97)
Kunihiro Suzuki
62
B Cr1' B
A 2 BC A A 4 B C B C 2 A C2
2 AB 2 BC 2 AC 2 BC 2 A2C 2 4 ABC 2 C B 2 A C2
2 AB AC AC 1
4B C B A
2 A C2
4B 2 AB AC 1 1 C B A 2 2 A C
(98)
2 AB 2 AC 2 A C2 2 A B C 2 A C2
0
Therefore, this root is not adequate. The other root is modified as
B Cr1' B
A 2 BC A A 4 B C B C 2 A C2
4B 2 AB AC 1 1 C B A 2 2 A C
(99)
4B 2 AB AC 1 C B 1 A 0 2 2 A C Therefore, the root is positive and is adequate. We then have
r1'
A 2 BC A A 4 B C B 2 A C2
(100)
Independent Factor Analysis
63
Table 20. Cross tabulated table for general form and the modified effective one with 2x2 cross tabulated table is shown below
Item
Item
Levels significant high high plane low significant low ratio
high x11 x21 x31 x41 x51 r1
Total satisfaction plane x12 x22 x32 x42 x52 r2
Levels significant high high plane low significant low ratio
high
Total satisfaction plane low
low x13 x23 x33 x43 x53 r3
ratio k1 k2 k3 k4 k5
ratio
x11m
x12m
k1m
x21m
x22m
k2m
r1
r2m
10.2. General Form for Expected Objective Variable Level The target cross tabulated table is not 2 x 2 type in general. More general form is shown in Table 20. In that table, we obtained that the expected explanatory levels are two. We then merge the table and obtain the one blow, where the data are merged as
x11m x11 x21 x12 m x12 x13 x21 x23 x21m x32 x41 x51 x22 m x32 x33 x42 x43 x52 x53 We can then apply the same process to obtain the target values. This process can be easily extended to any expected levels.
(101)
Kunihiro Suzuki
64
11. ANALYSIS FOR SUB GROUP We treat a total group up to here. The group may consist of many sub groups, and the characteristics of the sub groups are different from the total group in general. The target items for the sub groups may be different from the total group. We try to select target items for the sub groups. We assume that the importance of item is the same for the sub groups of the total group. The difference between a certain group and the total group or between each sub groups is the status of satisfaction. We reference the level achievement ratio of the sub group with respect to the total ones. We set the level achievement ratio for i -item as rlGi and the corresponding data number as nG . The normalized value for i -th item zlGi is then given by zlGi
rlGi rli
rli 1 rli rlGi 1 rlGi N nG
(102)
We may use a different form as zlGi
rlGi rli 1 1 rli 1 rli N nG
(103)
where rli
Nrli nG rlGi N nG
(104)
Data for the group can be regarded as the portion of the total group, and hence the variance may be assumed to be the same for the sub-group. In this case, the normalized value may be evaluated as zlGi
rlGi rli 1 1 rli 1 rli N nG
(105)
Independent Factor Analysis
65
The above models may have a too big deviation and suffer an unstable one. This may be modified as a simple one of
zlGi
rlGi rli
rli 1 rli
(106)
This is a normalized one with respect to a standard deviation of the population, where we use it as a default. The normalized satisfaction can be expressed as
zli zli zlGi
(107)
After then the process is the exactly the same as that for CS analysis.
SUMMARY To summarize the results in this chapter.
a The independent value for cell i, j is denoted as ij and is given by aij ki rj N
rj
k
where i is the ratio of an explanatory variable, and is the ratio of an objective variable level. We evaluate the variable for the cross tabulated table as
2
i, j
x
ij
aij
2
aij
The level number of variable are assumed to be m and l . Therefore, the freedom is given by m 1 l 1
We set a predictive probability P , and obtain a corresponding P-value as
Kunihiro Suzuki
66
c 2 2 , P If 2 is larger than c2 , the dependence is valid, and vice versa. The independent factor is given by
rc
2
N k 1
The adjust residual is given by zij
xij aij n n aij 1 i 1 j N N
We select levels based on the critical adjust residuals, and we evaluate the corresponding ratio. We call the ratio as a level achievement ratio and denote it as rli , where i expresses the item. Using the independent factor, and the level achievement ratio, we perform CS analysis, and also we can evaluate a CS correlation factor. We can predict the improvement of the objective variable ratio with increasing the level achievement ratio of the explanatory variable. The improved objective level ratio is given by
r1'
A 2 BC A A 4 B C B 2 A C2
where A
B
2 N
k1' 1 k1'
1 rc x11 k1 r1 N rc
1 rc ' C k1 rc
Independent Factor Analysis
67
The subgroup can be analyzed as
zlGi
rlGi rli
rli 1 rli
This is a normalized one with respect to a standard deviation of the population, where we use it as a default. The normalized satisfaction can be expressed as
zli zli zlGi Figure 5 shows the flow for the above analysis. 2 We obtain the CS data, and it is tested associated with the relationship using testing. We then select items and evaluated the corresponding independent factor and satisfaction. Selecting the levels of items and we evaluate the level achievement ratio. Using the independent factor, and the level achievement ratio, we perform CS analysis and select the target items.
Figure 5. Flow for the analysis.
Chapter 3
STATISTICAL TESTING AND PREDICTIONS ABSTRACT We usually obtain various sample probability variables, or variables of two groups where we obtain a certain difference in general. Assuming the probability function, we evaluate that the data belongs to the population set, or the difference is valid or not. These judging processes are called as testing. In these processes, the values of population are known or approximated values are assumed. Further, we can also predict population variable value range from the obtained sample data. The testing and predictions are partially performed in each chapter up to here. We clearly define the testing and predictions and repeat again and summarize the testing and predictions in statistics in this chapter.
Keywords: testing, hypothesis, null hypothesis, ratio, average, variance, normal distribution, t distribution, F distribution, population ratio
1. INTRODUCTION When we obtain sample data such of average and ratio, these values are different from the ones of the corresponding population set. Furthermore, when we have two data set for sample, we want to know whether the difference is valid or not for the corresponding population set. In the testing process we basically focus on the sample data using or assuming the corresponding population data. In the same obtained data of samples, we want to predict the range of the difference of population. We show the procedure to judge the validity for various probability values, and predict the range. We treat various probability variables in this chapter.
Kunihiro Suzuki
70
2. HYPOTHESIS We set a hypothesis to perform a test. H 0 is a null hypothesis and H1 is an alternative hypothesis. We obtain clear results
whether H 0 is true or H1 is true. We should be careful about the appreciation as the followings. We cannot judge absolutely, and may make errors sometimes. The case is shown in Table 1. Table 1. Judge and real results
Judge
Real H1:true
H0 :false;H1:true
H1:false (1st type error) H0 :true
H0 :true;H1:false
H0 :false (2nd type error)
We basically evaluate whether H 0 is true or false. When H 0 is false, we can clearly stay that H1 is true based on the decided prediction probability. We may also make a mistake in this case as shown in the table, which is called as a first type error. We can reduce the error with increasing the prediction probability. When H 0 is true, we may also make a mistake as shown in the table, which is called as a second type error. However, we cannot directly relate it to the prediction probability. Therefore, this error is rather uncontrollable. It should be noted, when H 0 is true, we should not clearly state that H 0 is true, but should say that we cannot say that H1 is not true. When H 0 is true, there are two possibilities. One is that H 0 is really true. The other is that we cannot say that H1 is true or not due to significant error range. To avoid the latter case, we should be careful when H 0 is true. Therefore, H 0 is established to evaluate H1 clearly. When H 0 is true, we
should stay rather ambiguous results. In that stand point of view, H 0 is called as null hypothesis, where it is established to deny clearly.
Statistical Testing and Predictions
71
3. LEVEL OF SIGNIFICANCE In the probability variable, we cannot predict results with 100% accuracy, but must set a prediction probability P . This means that the results of judge or prediction may fail with a certain times if we perform them many times. For example, if we set the prediction probability at 95%, we may fail 5 times for 100 times testing or predictions. We cannot decide the prediction probability mathematically, but simply assume it. This depends on what accuracy a person needs or is requested from his customers. In the prediction probability, we have two cases. We sometimes do not care the minimum or the maximum case as shown in Figure 1 (a). For example, we focus on the maximum value for a stock, where we care the stock value is sufficient or insufficient for the future sales. We set one edge boundary in this case, and it is called as one sided probability. The other is the typical one of both sides probability where the probability is assumed for both sides of the target value as shown in Figure 1 (b). In the one sided probability, the probability P is related to the region edge P
P
p
as
f z dz
(1)
In the probability distribution where the defined region is positive, the one sided probability P is related to the region edge P
P
0
p
as
f z dz
In both sides prediction probability, the probability P is related to the region edge
(2)
p
as P
P
f z dz
P
(3)
This is true only when the probability distribution is symmetrical. When it is asymmetrical, we combine two one-sided probabilities for the both sides probability as P1
P1
P2
f z dz
P2
f z dz
(4)
(5)
Kunihiro Suzuki
72 where P P2 P1
f()
f()
(6)
P
P
-P
P
P
(a)
(b)
Figure 1. Probability distribution, probability and P-value. (a) One sided probability, (b) both sides probability.
4. P POINTS FOR VARIOUS PROBABILITY DISTRIBUTIONS Once, we set the prediction probability P , we can obtain corresponding P points for
2
various probability distributions. We treat normal, t, , and F distributions. The normal and t distribution are symmetrical and we can assume the both sides prediction probability
2
with its peak position. However, the , and F distributions are asymmetrical, and P points are defined for one sided probability. The normal distribution can be always reduced to a standard normal distribution and we treat the standard normal distribution given by
f x
x2 exp 2 2 1
for x
The t distribution with a freedom of n is given by
(7)
Statistical Testing and Predictions n 1 n 1 2 2 2 t fn t 1 n n n 2
The
73
for t (8)
2 distributions with a freedom of n is given by
fn x
n 1 x 2 x exp n 2 n 22 2
1
for 0 x
n The F distribution with a freedom of 1 and n1 2x 1 f n1 , n2 x n n n n B 1 , 2 1 x 2 2 2 2 2
n1
2
n1 x 2 1 n1 x n2 2 2
(9)
n2 is given by n2
2 1 x
for 0 x (10)
We can obtain corresponding values using a standard software.
5. TESTING FOR ONE VARIABLE 2 5.1. One Sample Data Testing for Known Variance
Hypothesis H0: The sample belongs to the set. H1: The sample does not belong to the set. Evaluation and Judgment
2
We have a set characterized with an average of and a variable of . When we have a certain value of x , we want to evaluate that the value belongs to the set. We evaluate the variable z
z
given by
x
2
(11)
Kunihiro Suzuki
74
We decide a prediction probability P , and evaluate a corresponding
zp
. If the
absolute value of z is smaller than the z P , we can regard that the data can be related to the set, and vice versa, which is expressed by z zP z zP
H0 is true. We cannot say that x does not belong to the set. H1 is true. x does not belong to the set.
Prediction If we do not know the population average, and the data the population, we can predict the population average as x zP x zP 2
x
(12)
can be regarded to belong to
2
(13)
2 5.2. Sample Average Testing for Known Variance
Hypothesis
.
H0: The sample average
x
is the same as the population average
H1: The sample average
x
is different from the population average
.
Evaluation and Judgment We evaluate a sample average x
x
with a sample number n , given by
1 n xi n i 1
(14)
2
We know the average and variance of the population set as and , respectively. We want to evaluate that the sample average is the same as that of the population. The same averages that the sample average variation covers the population average. We evaluate the variable
z
given by
Statistical Testing and Predictions z
75
x
2 n
(15)
We decide a prediction probability P , and evaluate corresponding
zp
. If the absolute
z
value of z is smaller than the p , we can regard that the data can be related to the set, and vice versa. Therefore, we can judge the sample average, and population average as H0 is true. We cannot say that x is different from . H1 is true. x is different from .
z zP z zP
(16)
Prediction If we do not know the population average, we can predict the population average as
x zP
2 n
x zP
2 n
(17)
2 5.3. Sample Average Testing for Unknown Variance
Hypothesis H0: The sample average H1: The sample average
x x
. is different from the population average . is the same as the population average
Evaluation and Judgment We evaluate a sample average
x
with a sample number n , given by
n
x xi i 1
We know the average of the population set as
2 . We then evaluate unbiased variance as
(18)
, but do not know the variance
Kunihiro Suzuki
76
s 2
1 n 2 xi x n 1 i 1
(19)
We want to evaluate that the sample average is the same as that of the population. The same averages that the sample average variation covers the population average. We evaluate the variable t given by t
x s n 2
(20)
We decide a prediction probability P , and evaluate corresponding
tp
. If the absolute
t
value of t is smaller than the p , we can regard that the data can be related to the set, and vice versa. Therefore, we can judge the sample average, and population average as t t P t t P
H0 is true. We cannot say that x is different from . H1 is true. x is different from .
(21)
Prediction If we do not know the population average, we can predict the population average as
x tP
2 n
x tP
2 n
(22)
(Example) Ministry of Health, Labor and Welfare in Japan evaluated new born man’s average weight in 1990, and the average was 3,150 g. More than 10 years passed now. The situation changes significantly, and hence investigate the average weight is changed. We randomly extract 100 new born male babies and obtained the data as follows. The average and unbiased standard deviation can be evaluated as
x 2982 s 316.17
(23)
Statistical Testing and Predictions
77
where
s s
2
(24)
Table 2. Babies’ weight
3372 2935 3118 2851 2675 2646 3163 2949 3522 2638
3110 3247 3580 3191 2689 2383 3187 2447 4231 3181
2619 3060 2879 3521 3070 3385 2358 3058 3366 2935
3315 2674 2915 2958 2935 2527 3268 3139 2520 2846
3420 2591 2679 3391 2443 2984 2512 2998 2724 2783
3230 3035 2956 2920 3121 3501 2716 2909 2894 3202
3017 3226 2809 3387 2936 3076 2890 3152 2941 2666
3124 2996 3092 3263 2655 3330 2807 2629 3061 2849
2928 3012 3482 2706 2863 2787 2647 3602 2788 2639
3048 3159 2753 2500 2794 2957 3084 3385 2863 2716
We can evaluate t as t
2892 3150 5.35 316.17 100
(25)
We decide a prediction probability P of 95%, and the corresponding P points are
t p 1.98
(26)
Therefore, we obtain
t tP
(27)
and we can judge that the weight is changed. z p 1.96 for standard normal distribution t p 1.98 for t distribution
(28)
Kunihiro Suzuki
78
As we mentioned before, we can approximate the t distribution as a standard normal distribution when the sample number is large. The P point for the standard normal distribution is
z p 1.96
(29)
which is almost the same as that for t distribution as is expected, and the judge is also the same.
2 5.4. Sample Variance Testing for Known Variance
Hypothesis 2 2 H0: The sample unbiased variance s is the same as the population average . 2 2 H1: The sample average s is different from the population average .
Evaluation and Judgment We evaluate a sample unbiased variance
s 2
2
with a sample number n , given by
1 n 2 xi x n 1 i 1
We evaluate the variable 2
s
(30)
2 given by
n 1 s 2 2
This follows the
(31)
2
distribution with a freedom of n 1 .
We decide a prediction probability
2
2
2 P , and evaluate corresponding P . If the absolute
value of is smaller than the P , we can regard that the data can be related to the set, and vice versa. Therefore, we can judge the sample average, and population average as
Statistical Testing and Predictions 2 2 P 2 2 P
79
H0 is true. We cannot say that s is different from . 2
2
H1 is true. s is different from . 2
2
(32)
Prediction
2
If we do not know the population variance, we can predict it. Since distribution is asymmetrical, we set two prediction probabilities and hence it is expressed as
n 1 s 2 P2
2
n 1 s 2
2
P2
(33)
1
5.5. Outliers Testing We evaluate whether the data are outlier or not given by
5.6. Population Ratio Testing with Restored Extraction Hypothesis H0: The sample average H1: The sample average
pˆ pˆ
is the same as the population ratio
p
.
is different from the population average
p
.
Evaluation and Judgment We have a population ratio of
p
want to judge the sample ratio that can be regarded as the population ratio We evaluate a normalized variable as z
pˆ
. We investigate the sample ratio and obtained . We
p
.
pˆ p p 1 p n
This variable follows standard normal distribution.
(34)
Kunihiro Suzuki
80
We decide a prediction probability
P , and evaluate corresponding z p . If the absolute
z
value of z is smaller than the P , we can regard that the data can be related to the set, and vice versa. Therefore, we can judge the sample average, and population average as z zP z zP
H0 is true. We cannot say that pˆ is different from p. H1 is true. pˆ is different from p.
(35)
Prediction If we do not know the population ratio, we can predict the population ratio as pˆ
pˆ 1 pˆ z P 2 pˆ 1 pˆ z P 2 z 2 2 pˆ P z P 2 n 4n p 2n n 4n zP 2 zP 2 1 1 n n
zP 2 zP 2n
(36)
When the sample number is sufficiently large, it is reduced to
pˆ z P
pˆ 1 pˆ n
p pˆ z P
pˆ 1 pˆ
(37)
n
5.7. Population Ratio Testing with Non-Restored Extraction Hypothesis H0: The sample average H1: The sample average
pˆ pˆ
is the same as the population ratio
p
.
is different from the population average
p
.
Evaluation and Judgment We have a population ratio of
p
. We investigate the sample ratio and obtained
We want to judge the sample ratio that can be regarded as the population ratio We evaluate a normalized variable as
p
.
pˆ .
Statistical Testing and Predictions
z
81
pˆ p p 1 p N n n N 1
(38)
This variable follows standard normal distribution. We decide a prediction probability
P , and evaluate corresponding z p . If the absolute
z
value of z is smaller than the P , we can regard that the data can be related to the set, and vice versa. Therefore, we can judge the sample average, and population average as z zP z zP
H0 is true. We cannot say that pˆ is different from p. H1 is true. pˆ is different from p.
(39)
Prediction If we do not know the population ratio, we can predict the population ratio as 2 pˆ 1 pˆ N n zP 2 N n zP 2 N n pˆ zP 2n N 1 n N 1 4n 2 N 1 z 2 N n 1 P n N 1 p 2 pˆ 1 pˆ N n zP 2 N n zP 2 N n pˆ zP 2n N 1 n N 1 4n 2 N 1 z 2 N n 1 P n N 1
(40)
6. TESTING FOR TWO VARIABLES We perform testing of two variables in this section. Before we perform the testing, we should discuss some points below. Let us consider that the two averages. When we obtain two averages the values are different of course in general. We evaluate that the difference is valid or not in the statistical evaluation. First, we evaluate two variables independently. We can evaluate averages and corresponding intervals as shown in Chapter 12 of volume 1 as
Kunihiro Suzuki
82 x1 z p
x2 z p
2 s1
n1 2 s2
n2
1 x1 z p
2 s1
2 x2 z p
n1 2 s2
n2
(41)
(42)
We approximate that the sample average follows normal distribution, and also assume x1 x2 .
Whether difference is valid or not can be evaluated by that the interval of these two variables have cross area. This is, we evaluate
x1 z p
2 s1
n1
x2 z p
2 s2
n2
(43)
If Eq. (43) is valid, we can judge 1 2 , and vice versa. This evaluation is inaccurate. What point is wrong?
zP
is decided based on the prediction probability P . Eq. (43) is valid for
1 P 1 P . Therefore, this evaluation is too severe. We should not evaluate two variables independently, but treat the difference itself as one probability variable.
2 6.1. Testing of Difference Between Population Averages: Is Known
Hypothesis H0: The two averages are same. H1: The two averages are different from each other. Evaluation and Judgment We evaluate the difference between two population average where the population variance is known. We obtain two sample averages of x1 and x2 , and assume the corresponding average and standard deviations are
Statistical Testing and Predictions 2 x1 : mean 1 ,standard deviation 1 n1 2 2 x : mean ,standard deviation 2 2 n2
83
(44)
We assume that the two variables X1 and X 2 follows normal distribution. We consider a difference of two probability variables given by y x1 x2
(45)
The corresponding average and variance are given by 1 2
2
(46)
1 2 2 2 n1 n2
(47)
We then construct a normalized form as z
x1 1 x2 2 2
1
n1
2
2
2
n2
(48)
x1 x2 1 2 1 2 n1
z
2 2 n2
follows a standard normal distribution. We want to evaluate whether 1 2 , and
hence set 1 2 , and the normalized variables is z
x1 x2
1 2 2 2 n1 n2
(49)
We decide a prediction probability
z
z
P , and evaluate corresponding z p . If the absolute
value of is smaller than the p , we can regard that the data can be related to the set, and vice versa. Therefore, we can judge difference of the averages as
Kunihiro Suzuki
84 z zP z zP
H0 is true. We cannot say that two averages are different. H1 is true. Two averages are different.
(50)
Prediction We set 1 2
(51)
The difference of two averaged for the population can be evaluated as
x1 x2 zP
1 2 n1
2 2 n2
x1 x2 z P
1 2 n1
2 2 n2
(52)
6.2. Testing of Difference between Population Averages: and the Variances Are Assumed to Be the Same
2
Is Unknown
Hypothesis H0: The two averages are same. H1: The two averages are different from each other. Evaluation and Judgment We evaluate the difference between two population average where the population variance is unknown and assumed to be the same. 2
s We obtain two sample averages of x1 and x2 , and also 1
and
2 s2
, and assume
the corresponding average and standard deviations of x1 and x2 are 2 s x1 : mean 1 ,standard deviation 1 n1 2 s2 x2 : mean 2 ,standard deviation n2
We assume that the two variables X1 and X 2 follows t distribution. We consider the difference as a probability variable given by
(53)
Statistical Testing and Predictions y x1 x2
85 (54)
The corresponding average and variance are given by 1 2
(55)
1 1 2 sp2 n1 n2
(56)
where
sp 2
n1 1 s1 2 n2 1 s2 2 n1 1 n2 1
(57)
We construct a normalized variable given by t
x1 1 x2 2 1 2 1 s p n n 1 2 x x 1 2 1 2
(58)
1 2 1 s p n n 2 1
We assume that the variable follows a t distribution with a freedom of n1 n2 2 . We want to evaluate whether 1 2 , and hence set 1 2 , and the normalized variables is t
x1 x2
(59)
2
sp
1 1 n1 n2
We compare this with P point
tP
We decide a prediction probability
t
and perform a testing.
P , and evaluate corresponding t p . If the absolute
value of t is smaller than the P , we can regard that averages the same, and vice versa. Therefore, we can judge the sample average, and population average as
Kunihiro Suzuki
86 t tP t tP
H0 is true. We cannot say that the averages are different from each other. H1 is true. The averages are different from each other.
(60)
Prediction If we want to evaluate that the population average difference is more than a certain value of , we set
1 2
(61)
and the corresponding normalized variable is
x1 x2 tP
1 1 2 1 2 1 sp x1 x2 t P sp n1 n2 n1 n2
6.3. Testing of Difference between Population Averages: Unknown and the Variances Are Assumed to Be Different
(62)
2
Is
Hypothesis H0: The two averages are same. H1: The two averages are different from each other. Evaluation and Judgment We evaluate the difference between two population averages where the population variance is known. 2
s We obtain two sample averages of x1 and x2 , and also 1
and
2 s2
, and assume
the corresponding average and standard deviations of x1 and x2 are
x1 : mean 1 ,standard deviation x2 : mean 2 ,standard deviation
s1 n1 2
s2 n2 2
We assume that the two variables X1 and X 2 follows t distribution. We consider the difference as a probability variable given by
(63)
Statistical Testing and Predictions
y x1 x2
87
(64)
The corresponding average and variance are given by
1 2
(65)
s1 s2 n1 n2 2
2
2
(66)
We construct a normalized variable given by t
x1 1 x2 2 s1 s2 n1 n2 2
2
(67)
x1 x2 1 2 s1 s2 n1 n2 2
2
We assume that the variable follows a s1 2 s2 2 n1 n2 * n 2 2 1 s1 1 s1 n1 1 n1 n2 1 n2
t distribution with a freedom of
n* , where
2
(68)
* * In general n is not an integer, and we use a integer nearest to n , and denote it as
nf
and hence express it as
n f Round n*
We decide a prediction probability
(69)
P , and evaluate corresponding t p . If the absolute
value of t is smaller than the t P , we can regard that averages are the same, and vice versa. Therefore, we can judge the sample average, and population average as
Kunihiro Suzuki
88 t tP t tP
H0 is true. We cannot say that the averages are different from each other. H1 is true. The averages are different from each other.
(70)
Prediction If we want to evaluate that the population average difference is more than a certain value of , we set 1 2
(71)
and the corresponding normalized variable is
x1 x2 tP
s1 s2 s s x1 x2 t P 1 2 n1 n2 n1 n2 2
2
2
2
(72)
2 6.4. Testing of Difference between Population Averages: Is Unknown with Paired Data
Hypothesis H0: The two averages are same. H1: The two averages are different from each other. Evaluation and Judgment We evaluate the difference between two population averages where the population variance is unknown, and the data are paired. We obtain two sample averages of x1 and x2 with the same data number of n . We have data for group 1 and group 2 are paired, and we can evaluate the difference of each pair as
di xi1 xi 2 We can evaluate the average and unbiased variance associated with d
1 n di n i 1
(73)
di
as
(74)
Statistical Testing and Predictions sd 2
1 n di d n 1 i 1
89
2
(75)
We construct a normalized variable given by t
x1 x2 1 2
(76)
sd n 2
We assume that the variable follows a t distribution with a freedom of n 1 . We decide a prediction probability
P , and evaluate corresponding t p . If the absolute
t
value of t is smaller than the P , we can regard that averages are the same, and vice versa. Therefore, we can judge the sample average, and population average as t tP H0 is true. We cannot say that the averages are different from each other. t tP H1 is true. The averages are different from each other.
(77)
Prediction If we want to evaluate that the population average difference is more than a certain value of , we set 1 2
(78)
and the corresponding normalized variable is
x1 x2 tP
sd s x1 x2 tP d n n 2
2
(79)
6.5. Testing of Difference between Population Ratio with Restored Extraction Hypothesis H0: The two population ratios are same. H1: The two population ratios are different from each other.
Kunihiro Suzuki
90
Evaluation and Judgment We assume that two population ratios are p1 and p2 , and evaluate the difference between two population ratios based on the sample data. ˆ1 and pˆ 2 , and assume the corresponding average and We obtain two sample ratio of p standard deviations are p 1 p1 pˆ1 : mean p1 ,standard deviation 1 n1 p2 1 p2 pˆ 2 : mean p2 ,standard deviation n2
(80)
This follows a normal distribution. We consider the difference as a probability variable given by y pˆ1 pˆ 2
(81)
The corresponding average and variance are given by p1 p2
2
(82)
p1 1 p1 n1
pˆ1 1 pˆ1 n1
p2 1 p2 n2
pˆ 2 1 pˆ 2
(83)
n2
We construct a normalized variable given by z
pˆ1 pˆ 2 p1 p2
(84)
We assume that the variable
z
follows a standard normal distribution. We want to
evaluate whether p1 p2 p , and hence set p1 p2 p , and the normalized variables is z
pˆ1 pˆ 2
We compare this with P point
(85)
zP
and perform a testing.
Statistical Testing and Predictions
91
Therefore, we can judge difference of the averages as z zP H0 is true. We cannot say that two population ratios are different. z zP H1 is true. Two population ratios are different.
(86)
Prediction We set p1 p2 p
(87)
The difference of two population ratios can be evaluated as
pˆ1 pˆ 2 zP
ˆ1 p ˆ 2 zP p p
(88)
6.6. Testing of Difference between Population Ratio with Non-Restored Extraction Hypothesis H0: The two population ratios are same. H1: The two population ratios are different from each other. Evaluation and Judgment We assume that two population ratios are p1 and p2 , and evaluate the difference between two population ratio based on the sample data. ˆ1 and pˆ 2 , and assume the corresponding average and We obtain two sample ratio of p standard deviations are
p 1 p1 N1 n1 pˆ1 : mean p1 ,standard deviation 1 n1 N1 1 p2 1 p2 N 2 n2 pˆ 2 : mean p2 ,standard deviation n2 N2 1
(89)
This follows a normal distribution. We consider the difference as a probability variable given by ˆ1 p ˆ2 y p
(90)
Kunihiro Suzuki
92
The corresponding average and variance are given by p1 p2
(91)
2
p1 1 p1 N1 n1 p 1 p2 N 2 n2 2 n1 N1 1 n2 N2 1
pˆ1 1 pˆ1 N1 n1 2 pˆ 2 1 pˆ 2 N 2 n2 n1 N1 1 n2 N2 1
2
(92)
We construct a normalized variable given by z
pˆ1 pˆ 2 p1 p2
(93)
We assume that the variable
z
follows a standard normal distribution. We want to
evaluate whether p1 p2 p , and hence set p1 p2 p , and the normalized variables is z
pˆ1 pˆ 2
(94)
z
We compare this with P point P and perform a testing. Therefore, we can judge difference of the averages as z zP z zP
H0 is true. We cannot say that two population ratios are different. H1 is true. Two population ratios are different.
(95)
Prediction We set p1 p2 p
(96)
The difference of two population ratios can be evaluated as
pˆ1 pˆ 2 zP
ˆ1 p ˆ 2 zP p p
(97)
Statistical Testing and Predictions
93
6.7. Testing of Ratio of Two Population’s Variances: 1 and 2 Are Known Hypothesis H0: The two population variances are same. H1: The two population variances are different from each other. Evaluation and Judgment
u1 , u2 are given by 2
Two variances 2
u1
2
u2
2
1 n1 2 xi1 1 n1 i 1
(98)
1 n2 2 xi 2 2 n2 i 1
(99)
The variable F defined by
F
u1
2
u2
2
(100)
follows F distribution with a freedom of k1 n1 , k2 n2 , where n1 and n2 are the sample numbers. We compare this with P point FP and perform a testing. Therefore, we can judge difference of the variances as F FP F FP
Prediction None.
H0 is true. We cannot say that two population variances are different. H1 is true. Two population variances are different.
(101)
Kunihiro Suzuki
94
6.8. Testing of Ratio of Two Population’s Variances 1 and 2 Are Unknown Hypothesis H0: The two population variances are same. H1: The two population variances are different from each other. Evaluation and Judgment
s1 , s2 given by 2
The ratio of two unbiased variance
F
s1
2
2
s2
2
(102)
follows a F distribution with a freedom of k1 n1 1, k2 n2 1 , where n1 and n2 are the sample numbers. We compare this with P point FP and perform a testing. Therefore, we can judge the difference of the variances as F FP F FP
H0 is true. We cannot say that two population variances are different. H1 is true. Two population variances are different.
(103)
Prediction None.
7. TESTING FOR CORRELATION FACTORS 7.1. Correlation Factor Testing Hypothesis H0: The two population variables have no correlation relationship. H1: The two population variables have correlation relationship. Evaluation and Judgment We test whether there is correlation relationship between two variables for a gotten correlation factor r . When the data has no correlation relationship, the variable
Statistical Testing and Predictions t n2
follows a
95
r 1 r2
(104)
t distribution with a freedom of n 2 .
We decide a prediction probability
P , and evaluate corresponding t p . If the absolute
value of t is smaller than the t P , we can regard that averages the same, and vice versa. Therefore, we can judge the sample average, and population average as t tP t tP
H0 is true. We cannot say that the averages are different from each other. H1 is true. The averages are different from each other.
(105)
Prediction None.
7.2. Correlation Factor Testing for Reference One Hypothesis H0: The correlation factor is the same as the population correlation factor. H1: The correlation factor is different from the population correlation factor. Evaluation and Judgment We test whether the gotten correlation factor We form a converted variable of 1
r
is the same as the reference of
.
1 r
ln 2 1 r
(106)
This follows a normal distribution with an average of
1 1 ln 2 1 and a standard deviation of
(107)
Kunihiro Suzuki
96
1 n 2.5
(108)
Consequently, the parameter given by
z
(109)
follows a standard normal distribution. We compare this with P point z P and perform a testing. Therefore, we can judge difference of the averages as z zP H0 is true. We cannot say that the corrlataion factor is different from the population correlation factor. z zP H1 is true. The corrlataion factor is different from the population correlation factor.
(110)
Prediction If we do not know the population correlation factor, we can predict it as zP zP
(111)
This can be converted as min max
(112)
where min
e2GL 1 e2GL 1
max
e2GH 1 e2GH 1
(113) (114)
and 1 1 rxy GL ln 2 1 rxy
1 zP n 2.5
(115)
Statistical Testing and Predictions GH
1 1 r ln zP 2 1 r
97
1 n 2.5
(116)
7.3. Two Correlation Factor Testing Hypothesis H0: The two correlation factors are the same. H1: The two correlation factors are different from each other. Evaluation and Judgment
r1
and r2 are the same or not. We assume the corresponding population correlation factors are 1 and 2 , respectively. We test whether the two gotten correlation factors We form converted variables of
1 1 r1 1 ln 2 1 r1
(117)
1 1 r2 2 ln 2 1 r2
(118)
The average and variance of 1 and 2 are given by
1 1 1 1 ln 2 1 1
(119)
1 1 2 2 ln 2 1 2
(120)
1
2
1 n1 2.5
(121)
1 n2 2.5
(122)
Kunihiro Suzuki
98 Consequently, the parameter
z
1 2 1
2
z
given by
(123)
follows a standard normal distribution, where 2 2 1
(124)
2
We compare this with P point z P and perform a testing. Therefore, we can judge difference of the averages as z zP z zP
H0 is true. We cannot say that the two corrlataion factors are differnt from each other. H1 is true. The two corrlataion factors are differnt from each other.
(125)
Prediction None.
8. TESTING FOR REGRESSION Hypothesis H0: The regression is invalid. H1: The regression is valid. Evaluation and Judgment In the regression analysis, a degree of freedom adjusted coefficient of determination is given by n R*2 1
e n
T
where
Se
2
2
S yy
(126)
Statistical Testing and Predictions
99
e n 2
(127)
T n 1
(128)
r 1
(129)
x
1 xi n
(130)
y
1 yi n
(131)
2
S xx
2
S yy
x x
2
y y
2
i
i
n
S xy 2
Sr 2
(132)
n
x
i
x yi y n
y
i
y Yi y n
S yy Sr Se 2
2
2
(133)
(134)
(135)
(136)
We form a parameter n F
r n
e
S r
2
Se
2
(137)
This follows a F distribution with a freedom of k1 r , k2 e .We compare this with P point FP and perform a testing. Therefore, we can judge the difference of the variances as
Kunihiro Suzuki
100 F FP F FP
H0 is true. We cannot say that the regression is valid. H1 is true. The regression is valid.
(138)
Prediction
x0
The regression value for y at
is expressed a
Yˆ0 aˆ0 aˆ1 x0
(139)
where
S xy 2
aˆ1
S xx
(140)
aˆ0 y aˆ1 x
(141)
2
The predictive range is given by 1 x0 x 2 2 ˆ se Y0 Yˆ0 t p n 2 Y0 t p n 2 2 n nS xx
The predictive range for
y0
1 x0 x 2 2 se 2 nS xx n
(142)
is given by
1 x x 2 2 Yˆ0 z p 1 0 2 e yo Yˆ0 z p nS xx n
1 x0 x 2 2 1 e 2 nS xx n
9. TESTING FOR MULTI REGRESSION Hypothesis H0: The regression is invalid. H1: The regression is valid.
(143)
Statistical Testing and Predictions
101
Evaluation and Judgment In the regression analysis, a degree of freedom adjusted coefficient of determination is given by
n R*2 1
Se
e n
2
S yy 2
T
(144)
where e n m 1
(145)
T n 1
(146)
r m
(147)
xp
y
1 xip n
1 yi n
S pq 2
2
S yy
(149)
x
ip
x xiq x n
y y
S py 2
Sr 2
(148)
n
ip
x p yi y n
y
i
y Yi y n
S yy Sr Se 2
2
i
x
2
2
(150)
(151)
(152)
(153)
(154)
Kunihiro Suzuki
102 We form a parameter
n F
r n
e
S r
2
Se
2
(155)
This follows a F distribution with a freedom of k1 r , k2 e . We compare this with P point FP and perform a testing. Therefore, we can judge the difference of the variances as F FP F FP
H0 is true. We cannot say that the regression is valid. H1 is true. The regression is valid.
(156)
Prediction The regression value for y at x0 is expressed a Yi aˆ0 aˆ1 xi1 aˆ2 xi 2
aˆm xim
(157)
where aˆ1 S11 ˆ 2 a2 S21 2 aˆ p Sm 1 2
S12
S1m 2 S2 m 2 Smm
2
2
S22 2
Sm 2 2
aˆ0 y aˆ1 x1 aˆ2 x2
1
S1y2 2 S2 y S 2 py
aˆm xm
(158) (159)
The predictive range is given by
aˆ0 aˆ1 x1 aˆ2 x2
1 D 2 2 aˆm xm t p e ; P se n n
Y aˆ0 aˆ1 x1 aˆ2 x2
(160) 1 D 2 2 aˆm xm t p e ; P se n n
Statistical Testing and Predictions
103
where
D 2 x1 x1
x2 x2
The predictive range for
aˆ0 aˆ1 x1 aˆ2 x2
S 11 2 21 2 S xm xm S m1 2
y0
S S S
12 2 22 2
m 2 2
1m 2
x x 1 1 x2 x2 S mm 2 x x m S m
S
2 m 2
(161)
is given by
1 D 2 2 aˆm xm t e , P 1 se n n
y
(162)
aˆ0 aˆ1 x1 aˆ2 x2
1 D 2 2 aˆm xm t e , P 1 se n n
10. TESTING FOR EFFECTIVENESS OF VARIANCES IN MULTI REGRESSION Hypothesis H0: The variance is invalid. H1: The variance is valid. Evaluation and Judgment In the multiple regression, the effectiveness of the m variances with respect to the regression should be evaluated. We start with a regression without an explanatory variable, which is denoted as model 0. The regression is given by Model 0 : Yi y
(163) S e M 0 2
The corresponding variance Se M 0 2
is given by
1 n 1 n 2 2 yi Yi yi y S yy 2 n i 1 n i 1
(164)
Kunihiro Suzuki
104
In the next step, we evaluate the validity of x1 , x2 , model 1.
, xm , and the model is denoted as
The regression using the explanatory variable is given by
xl
Yi a0 a1 xil1
(165) Se M 1 2
The corresponding variance Se M 1 2
is given by
2 1 n 1 n 2 yi Yi yi a0 a1 xil1 n i 1 n i 1
(166)
Then the variable nS F
2 e M 0
1
nSe M 1 2
2 nSe M 1
e M 0
e M 1
(167)
e M 1
follows a F distribution with freedom and are given by
F e M 0 e M 1 ,e M 1
where
e M 0
and
e M 1
are the
e M 0 n 1
(168)
e M 1 n 2
(169)
We can judge the validity of the explanatory variable as
F1 F e M 1 , e M 1 e M 0 F1 F e M 0 e M 1 , e M 1
We evaluate F1 for x1 and
valid
(170)
invalid
x2 , that is, l
1
1,2 , and evaluate the corresponding F1 .
If both F1 is invalid, we use the model 0 and the process ends. We precede these processes and obtain
Statistical Testing and Predictions
Fk
nS
2 e Mk 1
nSe Mk 2
nS
2 e Mk
e Mk 1
e Mk
105
(171)
e Mk
where
e Mk n k 1
(172)
Fp e Mk 1 e Mk ,e Mk We evaluate the corresponding F value given by . Therefore, we can judge the difference of the variances as
F FP F FP
H0 is true. We cannot say that the variable is valid for the regression. H1 is true. The variable is valid for the regression.
(173)
Prediction None.
11. TESTING FOR VARIANCE ANALYSIS 11.1. One Way Analysis Hypothesis H0: The parameter dependence is invalid. H1: The parameter dependence is valid. Evaluation and Judgment 2
The effectiveness of the level is evaluated with Sex given by
2
Sex
nA1 A1
2
nA2 A2
2
nA3 A3
2
nA1 nA2 nA3
(174) 2
The scattering of the data is expressed with Sin , and is given by
Kunihiro Suzuki
106 nA1
x
Sin
iA1
i 1
2
A1
2
i 1
iA2
A2
nA3
x 2
i 1
iA3
A3
2
nA1 nA2 nA3
The correlation ratio
2
nA2
x
2
(175)
is given by
Sex 2
Sin Sex 2
2
(176)
This is between 0 and 1, and the effectiveness of the factor can be regarded as significant with larger 2 . We form an unbiased variance as sex
n
2
ex
Sex 2
(177)
where
ex p p
(178)
is the level number.
sin
n
2
in
Sin 2
(179)
where in n p
(180)
Finally, the ratio of the unbiased variance is denoted by F and is given by
sex 2
F
sin 2
(181)
Statistical Testing and Predictions
This follows a F distribution with a freedom of distribution is denoted as
FP ex , in
107
ex ,in . The P point for the
F
.
We compare this with P point FP and perform a testing. Therefore, we can judge the difference of the variances as F FP F FP
H0 is true. We cannot say that the paremter dependence is valid. H1 is true. The paremter dependence is valid.
(182)
Prediction The effectiveness between each levels can be evaluated as
A A i
j
2 sin 1 1 2 nAi nAj
(183)
If this value is larger than the studentized range distribution table value of
q r , n r , P , we judge that the difference is effective. The other simple way to evaluate the difference is the one with zi
Ai 2 sin 1 1 2 nAi n
(184)
We may be able to compare absolute value of this with z p for a normal distribution.
11.2. Two Way Analysis without Repeated Data Hypothesis H0: The parameter dependence is invalid. H1: The parameter dependence is valid.
Kunihiro Suzuki
108
Evaluation and Judgment We consider two factors of A : A1, A2 , A3 , A4 and B : B1 , B2 , B3 , where nA 4 and nB 3 . The total data number
n
is given by
n nA nB
(185)
In this case, each data xij can be expressed by the deviation from the average, and is given by
xij Aj Bi eij
(186)
The various variances are given by
nA
S Aex 2
i 1
Ai
2
Se 2
2
nA
nB
S Bex
i 1
Bi
(187)
2
nB 1 nB nA 2 eij n i 1 j 1
(188)
(189)
The various freedoms are given by tot n 1
(190)
A nA 1
(191)
B nB 1
(192)
The freedom associated with the error is given by
Statistical Testing and Predictions
109
e tot A B n 1 nA 1 nB 1 n nA nB 1
(193)
Therefore, the unbiased variances are given by sA 2
sB 2
se 2
n
A n
B n
e
S A
(194)
SB
(195)
Se
(196)
2
2
2
The F associated with a factor A is given by s A 2
FA
se
(197)
2
F , This is compared with the F critical value of AP A c sB 2
FB
se
(198)
2
F , This is compared with the F critical value of BP B e . We compare this with P point F P and perform a testing, where A, B . Therefore, we can judge the difference of the variances as F F P F F P
Prediction None.
H0 is true. We cannot say that the paremter dependence is valid. H1 is true. The paremter dependence is valid.
(199)
Kunihiro Suzuki
110
11.3. Two Way Analysis with Repeated Data Hypothesis H0: The parameter dependence is invalid. H1: The parameter dependence is valid. Evaluation and Judgment We consider two factors of A : A1 , A2 , , AnA and B : B1 , B2 , , BnB , and we have ns set. The total data number
n
is given by
n nA nB ns
(200)
The total average is given by
1 ns nA nB xij _ s n s 1 j 1 i 1
(201)
Each data deviation xij _ s from the total average is given by
xij _ s xij _ s
(202)
The average data for each level is given by xij
1 xij _1 xij _1 2
(203)
The averages are evaluated as
Aj
Bi
nB
1 nB
x
1 nA
nA
ij
i 1
(204)
x j 1
ij
and the average deviation can be evaluated as
(205)
Statistical Testing and Predictions
111
Aj Aj
(206)
Bi Bi
nA
S Aex 2
i 1
Ai
2
2
nA
nB
S Bex
(207)
i 1
Bi
(208)
2
nB
(209)
Aex nA 1
(210)
Bex nB 1
(211)
s s Therefore, the corresponding unbiased variances Aex and Bex are given by 2
s Aex
n
2
sBex
Aex
n
2
bLex
2
S Aex 2
(212) S Bex 2
(213)
A pure error is given by
e pure _ ij _ s xij _ s xij Se _pure 2
1 nB nA ns nA nB ns j i s
(214)
e
2
pure _ ij _ s
The deviation of each data from the total average eij _ s is given by
(215)
Kunihiro Suzuki
112
eij _ s xij _ s Aj Bi
(216)
The difference associated with interaction is given by einteract _ ij _ s eij _ s e pure _ ij _ s
xij Aj Bi Sinteract 2
nA
1 nA nB
nB
ns
e
1 nA nB ns
interact _ ij _ s
j
i
nA
nB
j
i
(217)
2
s
xij Aj Bi
2
(218)
interact tot Aex Bex e _ pure
n 1 nA 1 nB 1 nA nB ns 1
sinteract
n
2
interact
(219)
Sinteract 2
(220)
s Aex 2
FA
se _pure 2
(221)
The critical F value FAP is given by
FAP F Aex ,e
(222)
The effectiveness of a factor B can be evaluated as sBex 2
FB
se _pure 2
The critical F value FBP is given by
(223)
Statistical Testing and Predictions
FBP F Bex ,e
113
(224)
Therefore, the factor B is effective. The effectiveness of interaction can be evaluated as sinteract 2
Finteract
se _pure 2
(225)
The critical F value FinteractP is given by
FinteractP F interact ,e
(226)
We compare this with P point F P and perform a testing, where A, B,interact . Therefore, we can judge difference of the variances as F F P F F P
H0 is true. We cannot say that the paremter dependence is valid. H1 is true. The paremter dependence is valid.
(227)
Prediction None.
11.4. Independent Factor Analysis Hypothesis H0: The parameter dependence is invalid. H1: The parameter dependence is valid. Evaluation and Judgment
a The independent value for cell i, j is denoted as ij and is given by aij ki rj N
(228)
r
where ki is the ratio of explanatory variable, and j is the ratio of objective variable levels. We evaluate the variable for the cross tabulated table as
Kunihiro Suzuki
114
2
i, j
x
ij
aij
2
(229)
aij
The level numbers of variable are assumed to be is given by
m
and l . Therefore, the freedom
m 1 l 1
(230)
We set a predictive probability P , and obtain the corresponding P-value as
c 2 2 , P
(231)
Therefore, we can judge the dependence of the variances as 2 c2 2 2 c
Prediction None.
H0 is true. We cannot say that the paremter dependence is valid. H1 is true. The paremter dependence is valid.
(232)
Chapter 4
SCORE EVALUATION ABSTRACT We show the procedure to decide the subject on which we focus to improve total scores of various subjects. The former procedure was that we evaluate the subject using its normalized value. We add one more aspect to improve the total scores, that is, we consider a standard deviation. We then need to focus on the subject with a low normalized value and a large standard deviation.
Keywords: score evaluation, normalized value, standard deviation, contribution, improvement requested
1. INTRODUCTION Success or failure is determined based on the total score in common examinations. Therefore, we want to know what subject we should focus on to improve the total score. The simple way to select the subject is to evaluate the normalized value for each subject, and try to improve the subject with the low normalized value. We show that it is more effective to add one more aspect to select the subjects, that is, the standard deviation of the subject.
2. EVALUATION OF THE FIVE SUBJECTS Table 1 shows the score of 40 students of five subjects: Japanese, English, Science, Social, and Mathematics. We express the subject with p , where p =1 to 5. We denote the average of subject
p
as
p
and the standard deviation as
p
, which are evaluated as
Kunihiro Suzuki
116 N
x
p
i 1
ip
N
(1)
x N
p2
i 1
ip
p
2
N
(2)
where N is the student number and is 40 here. The score of a member i with the subject normalized variable is given by
zip
p
is denoted as
xip
, and the related
xip p
p
(3)
This value is related to the status of the member i in the group. The order of the member in the group is located in the ratio r , and it is given by z
z2 1 r exp dz 2 2
1 z 1 Erf 2 2
(4)
where Erf is the error function as shown in Appendix 1-4. This is the ratio from the bottom, and the ratio from the top can be obtained by 1 minus the value, which is shown in Figure 1. 1.0
Probability
0.8 0.6
Bottom Top
0.4 0.2 0.0
-3
-2
-1
0 z
Figure 1. Dependence of probability on normalized value z .
1
2
3
Score Evaluation
117
z
z
Table 2 shows the normalized variables ip . The values of ip more than 0.5 are blue hatched, and the values less than -0.5 are red hatched. This corresponds to the top 30% and bottom 30%. Therefore, the blue hatched subjects correspond to the good point and red hatched subjects to bad point for the person. It is then recommended to improve the red hatched subjects for the person, which is the standard evaluation for selecting the target subjects. Table 1. Score of data for 40 members ID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 Average Stdv
Japanese 77 80 93 66 85 63 100 60 100 78 91 90 78 72 100 73 47 70 64 82 71 64 32 94 83 76 59 88 66 90 53 71 60 68 52 80 39 47 74 57 72.33 16.48
English 92 6 46 26 82 46 68 66 72 36 100 56 44 38 78 100 64 78 75 73 54 18 65 73 61 50 90 45 51 90 37 60 91 55 32 30 43 100 97 52 61.00 23.71
Science 17 48 46 24 49 60 38 36 61 52 70 74 67 38 30 54 39 30 28 66 50 22 19 28 72 44 35 40 64 55 53 51 56 55 13 51 42 73 25 29 45.10 16.49
Social 81 76 85 74 87 82 100 70 100 91 77 74 97 72 71 80 100 78 87 86 92 97 77 79 91 87 96 85 67 73 72 91 82 78 67 89 81 87 73 75 82.68 9.37
Math 50 67 34 61 68 0 82 70 77 64 51 100 65 51 18 100 82 69 42 9 55 72 60 66 43 90 0 54 57 74 70 38 0 71 0 94 12 88 91 28 55.58 28.55
Kunihiro Suzuki
118
Table 2. Normalized score for 40 members ID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
Japanese 0.28 0.47 1.25 -0.38 0.77 -0.57 1.68 -0.75 1.68 0.34 1.13 1.07 0.34 -0.02 1.68 0.04 -1.54 -0.14 -0.51 0.59 -0.08 -0.51 -2.45 1.32 0.65 0.22 -0.81 0.95 -0.38 1.07 -1.17 -0.08 -0.75 -0.26 -1.23 0.47 -2.02 -1.54 0.10 -0.93
English 1.31 -2.32 -0.63 -1.48 0.89 -0.63 0.30 0.21 0.46 -1.05 1.64 -0.21 -0.72 -0.97 0.72 1.64 0.13 0.72 0.59 0.51 -0.30 -1.81 0.17 0.51 0.00 -0.46 1.22 -0.67 -0.42 1.22 -1.01 -0.04 1.27 -0.25 -1.22 -1.31 -0.76 1.64 1.52 -0.38
Science -1.70 0.18 0.05 -1.28 0.24 0.90 -0.43 -0.55 0.96 0.42 1.51 1.75 1.33 -0.43 -0.92 0.54 -0.37 -0.92 -1.04 1.27 0.30 -1.40 -1.58 -1.04 1.63 -0.07 -0.61 -0.31 1.15 0.60 0.48 0.36 0.66 0.60 -1.95 0.36 -0.19 1.69 -1.22 -0.98
Social -0.18 -0.71 0.25 -0.93 0.46 -0.07 1.85 -1.35 1.85 0.89 -0.61 -0.93 1.53 -1.14 -1.25 -0.29 1.85 -0.50 0.46 0.35 1.00 1.53 -0.61 -0.39 0.89 0.46 1.42 0.25 -1.67 -1.03 -1.14 0.89 -0.07 -0.50 -1.67 0.67 -0.18 0.46 -1.03 -0.82
Math -0.20 0.40 -0.76 0.19 0.44 -1.95 0.93 0.51 0.75 0.30 -0.16 1.56 0.33 -0.16 -1.32 1.56 0.93 0.47 -0.48 -1.63 -0.02 0.58 0.15 0.37 -0.44 1.21 -1.95 -0.06 0.05 0.65 0.51 -0.62 -1.95 0.54 -1.95 1.35 -1.53 1.14 1.24 -0.97
Score Evaluation
119
3. SCORE EVALUATION CONSIDERING STANDARD DEVIATION We further study the score evaluation. The total Qi for a member i is given by 5
Qi xip p 1
(5)
This is modified as 5
xip p
p 1
p
Qi p
5
p p 1
5
5
p 1
p 1
p zip p
(6)
The second term is independent of a member, that is, it is independent of i . Therefore, we need not to consider this term to select the subjects. The first term is related to the selection of the subjects. The selection of subject for low normalized value means that we focus on the
z
zip
in the first term. However, the first term
is the product of ip and p . Therefore, we should also care about the standard deviation. This means that we should select the subject for low normalized value with high standard deviation. We evaluate the average and the standard deviation of the standard deviation Eq.(6) as
1 5 p 5 p 1
2 1 5 p 5 p 1
18.92 , and 6.61 in this case.
The normalized value is given by
p
in
(7)
(8)
Kunihiro Suzuki
120
z p
p
(9)
Substituting Eq. (9) into Eq. (6), we obtain 5
5
p 1
p 1
Qi p zip p 5
p 1
5 5 p zip zip p p 1 p 1
5
5
5
p 1
p 1
p 1
z p zip zip p
(10)
We want to summarize the first two terms and modify this as 5
5
5
p 1
p 1
p 1
Qi z p zip zip p 5 5 1 z p zip p p 1 p 1 1 z 2 5 p 1 2 p 1 1
1
2
5
z p 1
5 zip p p 1
5
z p
p ip
p 1
(11)
where z i is the extended normalized value given by 1 z p
z p
1
2
(12)
This extended normalized value is related to the importance of the subject to improve the total score. Table 3 summarizes the parameter values.
Score Evaluation
121
Table 3. Standard deviation (Stdev), normalized stdv, and extended normalized stdv Subject Stdv Normalized stdv Extended normalized stdv
Japanese 16.476 -0.369 0.822
English 23.711 0.724 1.183
Science 16.486 -0.368 0.823
Social 9.371 -1.443 0.468
Math 28.551 1.456 1.425
Let us appreciate the normalized value. The extended normalized value is as follows in the limiting cases as
z p
1 z p
for for
1 1
(13)
p The extended normalized value is independent of subject for . This means that we can regard the standard deviation as constant. In this case, there is no priority for the subject, and we should select only by the normalized score as the usual evaluation. p The normalized value is dependent of subject for . This means that the standard deviation depends on the subject. In this case, there is significant priority for the subject, and we should select by considering both normalized score and this extended normalized value. In the latter case, we should perform CS analysis, where we define contribution and requested axis which has the angles of 4 . In the former case, we do not define such axis explicitly. However, if we define the angles of 2 , we obtain the same results as the conventional one. Therefore, we need to define contribution and requested axis varying depending on the value of and .
a z , z
p ip We then obtain coordinate for member i as i . We evaluate the axis for improvement request and contribution. We define the angle given by
tan
That is, we obtain an angle of
(14)
Kunihiro Suzuki
122
tan 1
(15)
The angle has values for limiting case as 2 0
for
for
1 1
(16)
In this case, the angle is given by
6.61 tan 1 18.92 0.34 radian
tan 1
19.30
(17)
We propose to define the unit vectors for contribution and improvement as follows.
eG cos ,sin 2 2 2 2 e cos ,sin B 2 2 2 2
(18)
These definitions realize the requested ones for limiting cases. The contribution and improvement requested are given by
ai eG zGi z p cos 2 2 zip sin 2 2 a e z z cos z sin i B Bi p ip 2 2 2 2
(19)
Score Evaluation
123
The value for contribution and improvement requested are shown in Table 4 and Table 5. The values more than 0.5 in Table 4 are hatched blue, which express the contribution. The values more than 0.5 in Table 5 are hatched red, which express improvement requested. Table 4. Contribution values for 40 members ID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
Japanese 0.42 0.60 1.37 -0.24 0.90 -0.42 1.79 -0.60 1.79 0.48 1.26 1.20 0.48 0.12 1.79 0.18 -1.38 0.00 -0.36 0.72 0.06 -0.36 -2.28 1.43 0.78 0.36 -0.66 1.08 -0.24 1.20 -1.02 0.06 -0.60 -0.12 -1.08 0.60 -1.86 -1.38 0.24 -0.78
English 1.49 -2.09 -0.43 -1.26 1.07 -0.43 0.49 0.41 0.66 -0.84 1.82 -0.01 -0.51 -0.76 0.90 1.82 0.32 0.90 0.78 0.70 -0.09 -1.59 0.36 0.70 0.20 -0.26 1.40 -0.47 -0.22 1.40 -0.80 0.16 1.45 -0.05 -1.01 -1.09 -0.55 1.82 1.69 -0.18
Science -1.54 0.31 0.19 -1.12 0.37 1.03 -0.29 -0.41 1.09 0.55 1.63 1.87 1.45 -0.29 -0.77 0.67 -0.23 -0.77 -0.88 1.39 0.43 -1.24 -1.42 -0.88 1.75 0.07 -0.47 -0.17 1.27 0.73 0.61 0.49 0.79 0.73 -1.78 0.49 -0.05 1.81 -1.06 -0.83
Social -0.10 -0.62 0.32 -0.83 0.53 0.01 1.90 -1.26 1.90 0.95 -0.52 -0.83 1.59 -1.04 -1.15 -0.20 1.90 -0.41 0.53 0.43 1.06 1.59 -0.52 -0.31 0.95 0.53 1.48 0.32 -1.57 -0.94 -1.04 0.95 0.01 -0.41 -1.57 0.74 -0.10 0.53 -0.94 -0.73
Math 0.05 0.63 -0.51 0.43 0.67 -1.68 1.15 0.74 0.98 0.53 0.08 1.77 0.56 0.08 -1.06 1.77 1.15 0.70 -0.23 -1.37 0.22 0.81 0.39 0.60 -0.20 1.43 -1.68 0.18 0.29 0.87 0.74 -0.37 -1.68 0.77 -1.68 1.57 -1.27 1.36 1.46 -0.71
Kunihiro Suzuki
124
Table 5. Improvement requested values for 40 members ID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
Japanese -0.14 -0.32 -1.10 0.52 -0.62 0.70 -1.52 0.88 -1.52 -0.20 -0.98 -0.92 -0.20 0.16 -1.52 0.10 1.65 0.28 0.64 -0.44 0.22 0.64 2.55 -1.16 -0.50 -0.08 0.93 -0.80 0.52 -0.92 1.29 0.22 0.88 0.40 1.35 -0.32 2.13 1.65 0.04 1.05
English -1.09 2.48 0.82 1.65 -0.68 0.82 -0.09 -0.01 -0.26 1.24 -1.42 0.41 0.90 1.15 -0.51 -1.42 0.07 -0.51 -0.38 -0.30 0.49 1.99 0.03 -0.30 0.20 0.66 -1.01 0.86 0.61 -1.01 1.20 0.24 -1.05 0.45 1.40 1.49 0.95 -1.42 -1.30 0.57
Science 1.82 -0.04 0.08 1.40 -0.10 -0.75 0.56 0.68 -0.81 -0.27 -1.35 -1.59 -1.17 0.56 1.04 -0.39 0.50 1.04 1.16 -1.11 -0.16 1.52 1.70 1.16 -1.47 0.20 0.74 0.44 -0.99 -0.45 -0.33 -0.22 -0.51 -0.45 2.06 -0.22 0.32 -1.53 1.34 1.10
Social 0.25 0.78 -0.17 0.99 -0.38 0.15 -1.74 1.41 -1.74 -0.80 0.68 0.99 -1.43 1.20 1.31 0.36 -1.74 0.57 -0.38 -0.27 -0.90 -1.43 0.68 0.46 -0.80 -0.38 -1.32 -0.17 1.73 1.10 1.20 -0.80 0.15 0.57 1.73 -0.59 0.25 -0.38 1.10 0.89
Math 0.43 -0.16 0.98 0.05 -0.19 2.16 -0.67 -0.26 -0.50 -0.05 0.40 -1.30 -0.09 0.40 1.54 -1.30 -0.67 -0.23 0.71 1.85 0.26 -0.33 0.09 -0.12 0.67 -0.95 2.16 0.29 0.19 -0.40 -0.26 0.85 2.16 -0.29 2.16 -1.09 1.74 -0.88 -0.98 1.19
The plot for member ID5,10,26 are shown in Figure 2. The projection of contribution and improvement requested are shown in Table 6. The blue hatched subject is their contribution subjects and the red hatched subject is their improvement request subjects. The evaluated results using the normalized scores and the ones using the proposed procedure are different in general.
Score Evaluation Social
125
Japanese Science English
Math
Normalized score
2 1
ID5 ID10 ID26
/2
0 -1 /2
-2 -2 -1 0 1 2 Extended normalized standard deviaiton Figure 2. Dependence of normalized score on extended normalized standard deviation.
Table 6. Evaluated parameters for member ID5, 10, and 26 ID5 Subject Score Normalized score Contribution Improvement requested
Japanese 85 0.77 0.90 -0.62
English 82 0.89 1.07 -0.68
Science 49 0.24 0.37 -0.10
Social
ID10 Subject Score Normalized score Contribution Improvement requested
Japanese 78 0.34 0.48 -0.20
English 36 -1.05 -0.84 1.24
Science 52 0.42 0.55 -0.27
Social
ID26 Subject Score Normaloized score Contribution Improvement requested
Japanese 76 0.22 0.36 -0.08
English 50 -0.46 -0.26 0.66
Science 44 -0.07 0.07 0.20
Social
87 0.46 0.53 -0.38
91 0.89 0.95 -0.80
87 0.46 0.53 -0.38
Math 68 0.44 0.67 -0.19
Math 64 0.30 0.53 -0.05
Math 90 1.21 1.43 -0.95
Kunihiro Suzuki
126
SUMMARY Here is summarized the results in this chapter. The average and variance of each subject are given by N
x
p
i 1
ip
N
x N
p2
i 1
ip
p
2
N
where N is the number of students.
x p The score of member i for subject is denoted as ip , and is normalized as zip
xip p
p
We evaluate the average and standard deviation of standard deviation as
1 5 p 5 p 1
2 1 5 p 5 p 1
The related angle is evaluated a tan
tan 1 We then normalize the standard deviation for each subject as
Score Evaluation
z p
p
We introduce a normalized variable as 1 z p
z p
1
2
The unit vector contribution is given by
eG cos ,sin 2 2 2 2 The contribution is given by
ai eG zGi z p cos zip sin 2 2 2 2 The unit vector for improvement requested is given by
e B cos ,sin 2 2 2 2 The improvement requested is given by
ai eB zBi z p cos zip sin 2 2 2 2
127
Chapter 5
AHP (ANALYTIC HIERARCHY PROCESS) ABSTRACT Analytic hierarchy process (AHP) enables us to decide which subject we should select based on the various item evaluations. The evaluations are done qualitatively, but we convert them to the numerical ones, and decide the target as if we do it based on the quantitative data. AHP is used in various cases where we cannot have quantitative data.
Keywords: pair comparison method, geometric average, eigenvalue, eigenvector
1. INTRODUCTION When we buy a product, there are various kinds of ones in general. We care about various items to decide which product we select. The items are such as price, style, color, function, etc. It is a rare case that one kind of product is superior in all items. The simple decision can be done if we score each items and sum it up. We can easily decide which one we should select. In this decision, we treat each item identically. However, some item is more important than others. Therefore, we need to weight the items. The weight expresses the importance of the item for the person who selects the product. We cannot decide the importance clearly if there are many items to be considered. Analytic hierarchy process (AHP) was developed by Thomas L. Saaty to overcome the problem, where pair comparison is used. AHP treats quite ambiguous data, but gives us clear numerical results.
Kunihiro Suzuki
130
2. AHP PROCESS We consider a case of selecting one sport club among three ones: club A, club B, and club C. The items which we consider to select the club are supposed to be below four.
Price Facilities Transportation Staff
The corresponding data structure is shown in Figure 1. Each clubs score is shown in Table 1, and corresponding radar chart is shown in Figure 2. The data are given by general evaluation or the personal evaluation. Club A is superior in the price (low cost), and club C is superior in facility. We can evaluate the clubs by summing up the score, which are shown in Table 1. In the standpoint of sum score view, we should select club C. In the above evaluation, we implicitly assume that each item is identical. However, the importance of the items depends on a person, where price is the most important for someone, and facility is the most important for someone. Therefore, we need to include the importance of the items in deciding the club.
Figure 1. Data structure for AHP analysis.
AHP (Analytic Hierarchy Process)
131
Table 1. Scores for club A, B, and C Club A B C
Price 8 3 2
Facility Transportation Staff 2 4 5 2 4 4 8 5 6
Sum 19 13 21
Figure 2. Radar chart for club selection data.
3. PAIR COMPARISON METHOD 3.1. Pair Comparison Table In the pair comparison method, we select two items and compare them relatively. It is rather hard for a person to evaluate all items simultaneously, but the comparison is rather easy if we focus on only two subjects. We usually get the answers categorically, and convert them to values on the back yard. The conversion example is shown in Table 2. The example for raw data is shown in Table 3. The data is converted to numeric one based on Table 2, and finally we obtain Table 4. We evaluate the importance of the items from this table. The conversion of categorical data to the numeric such as better 3 is rather ambiguous. We can only think the categorical expression has some order and we assign a number based on the order. The important point is that the identical level for positive and negative follows the rule that the product is 1. For example, if we assign better 3 , the corresponding negative expression
worse
must be
1
3
. This rule is supposed to express
Kunihiro Suzuki
132
the human impression. I think that the conversion is rather ambiguous and not established one. Table 2. Score conversion Evaluation absolutely worse much worse worse little worse plane little good better much better absolutely better
Score 1/9 1/7 1/5 1/3 1 3 5 7 9
Table 3. Raw data for pair comparison method Absolutely worse
1
Much worse
3 ○
Worse
5
Little worse
7
Plane
Better
9
Little better
Much better
Score Price Price Price Facility Facility Transportation
Absolutely better
Left item
Right item
1/3 1/5 1/7 1/9 Facility Transportation Staff Transportation Staff Staff
○ ○ ○ ○ ○
Table 4. Converted table. The below is the one which is fulfilled base on the data above Price
Facility
Price Facility Transportation Staff
3
Price Price Facility Transportation Staff
Facility 1 1/3 1/5 1/7
3 1 1 1/5
Transportation 5 1
Staff
Transportation 5 1 1 1/3
Staff
7 5 3
7 5 3 1
AHP (Analytic Hierarchy Process)
133
3.2. Weight Evaluation Based on Geometric Average The item of Price has scores of 1,3,5,7 as shown in Table 4. The corresponding geometric average (see Chapter 3 of volume 1) is given by 1
Geometric average Price 1 3 5 7 4 3.20
(1)
The other items are also similarly evaluated as 1
1 4 Geometric average Facility 11 5 1.14 3
(2)
1
1 4 Geometric average Transportation 11 3 0.88 5
(3)
1
1 1 1 4 Geometric average Staff 1 0.31 7 5 3
(4)
Table 5. Weight based on geometric average Item Price Facility Transportation Staff Sum
Average 3.20 1.14 0.88 0.31 5.53
Weight 0.58 0.21 0.16 0.06
Table 6. Evaluation of each club using weight
Weight Club A Club B Club C
Price 0.58 8 3 2
Facility Transportation 0.21 0.16 2 2 8
4 4 5
Staff 0.06 5 4 6
Sum
Weighted sum 19 13 21
5.96 3.01 3.94
The sum of them is 5.53. Therefore, we can evaluate the weight of each item as the average divided by the sum, which is shown in Table 5. We can evaluate each club by
Kunihiro Suzuki
134
weighted sum as shown in Table 6. We select the club C by the simple sum, but we select the club A by the weighted evaluation.
3.3. Eigenvector Method We perform a matrix operation in this section, and the basic matrix operation is described in Chapter 15. We consider
w1 , w2 , , wn aij
n
items denoted as
The pair comparison of
I1 , I 2 , , I n Ij
to
Ii
. The ideal weight is denoted as
is then denoted as
aij
wi wj
(5)
Therefore, the ideal data for the matrix
w1 w 1 w2 A w1 wn w1
and is given by
w1 w2 w2 w2 wn w2
A is given by
w1 wn w2 wn wn wn
(6)
The corresponding geometric average for i-th item i is given by 1
w w w n i i i i wn w1 w2 wi w1w2 wn Therefore, the ratio is
(7)
AHP (Analytic Hierarchy Process)
w1 : w1 :
135
: wn
(8)
We consider the data from the different standpoint of view. Operating the weight vector from the right side, we obtain
w1 w 1 w2 w 1 wn w1
w1 wn w1 w1 w2 w w wn 2 n 2 wn wn wn wn
w1 w2 w2 w2 wn w2
(9)
Therefore, the weight is an eigenvector, and n is the eigenvalue in the ideal case. Consequently, we have the same result in geometric average method and the eigenvector method if the data is ideal. The real matrix is different from the ideal one. However, we perform the two methods even in the case. Let us consider data which are far from the ideal one.
We denote the first eigenvalue and eigenvector of matrix A as max and respectively. We start with the data shown in Table 4. The corresponding matrix is given by
1 1 3 A1 5 1 7
3
5
1
1
1
1
1 5
1 3
7 5 3 1
v
,
(10)
We evaluate the first eigenvalue and eigenvector, which is given by
Av 4.24v
(11)
Kunihiro Suzuki
136 0.57 0.21 v 0.16 0.06
(12)
where 4.24 is an eigenvalue, and to the weight shown in Table 6.
v
is an eigenvector. Note that the eigenvector is close
4. CONSISTENCY CHECK OF PAIR COMPARISON The data based on the pair comparison may suffer inconsistent problem. For example, when we compare A to the other. If A is inferior to B, and is superior to C, B should be inferior to C. This should be determined before performing B-C pair comparison. However, we perform B-C comparison without caring the comparison associated with A. Therefore, we may have inconsistent data sometimes. We should check this inconsistency of the data. Let us consider the table we treated, and show it again as Table 7. We only need the first row hatched. The other data can be generated base on the data as shown in the numbers in brackets. However, we obtain the corresponding the data independently. Therefore, there is some inconsistency. Table 7. Inconsistency of the data Price Price Facility Transportation Staff
We evaluate the value of
Facility
Transportation 3 5 1 (5/3)
Staff 7 5 (7/3) 3 (7/5)
max .
max n
(13)
Focusing on i-th row, we obtain n
a v j 1
ij
j
max vi (14)
AHP (Analytic Hierarchy Process)
137
Modifying this, we obtain n
vj
j 1
vi
max aij
(15)
We then obtain
max 1
1 y ij y j i 1 ij n
(16)
where
yij
1 2 yij
(17)
is valid in general. Therefore, we obtain
max 1
n
2
j i 1
1 2n i
(18)
i has a values from 1 and n , that is, n
n
I 1
i 1
max 1 2 n i
(19)
Therefore, we obtain
1 nmax 1 2n n 2 n n 1 2 2 n This leads to
(20)
Kunihiro Suzuki
138
max n
(21)
Equality holds only when
yij 1
(22)
That is, we obtain
aij
vi vj
(23)
The consistency can be evaluated as the deviation of
max
to n . We usually use the
factor as the deviation divided by n 1 . This factor is denoted as C.I . (consistency index), and it is given by
C.I .
max n n 1
(24)
In the above example, we obtain
max n n 1 4.24 4 4 1 0.08
C .I .
(25)
Roughly speaking, the critical value of C.I . is supposed to be in between 1 and 1.5. If the evaluated C.I . is less than the critical value, we judge that it is OK. In this case, we judge that the data is consistent. The geometric average method and the eigenvector method can be both used in the standpoint of obtaining the weight function. However, we can evaluate consistency of the data with the eigenvector method. Therefore, the eigenvector method is rather preferably used. We assume that the Table 1 is a given one. However, we can also make the table based on the AHP process above.
AHP (Analytic Hierarchy Process)
139
SUMMARY To summarize the results in this chapter– We obtain scores for subjects in many items. The sum of the scores for various items corresponds to the evaluation of the subjects. We add weight to the score. We select two items and compare them qualitatively and convert it to numerical data. We obtain the weight by performing the geometric average method or the eigenvector method. Using the weighted score sum, we can select the subject. We can evaluate the consistency of the data using eigenvalue.
Chapter 6
QUANTIFICATION THEORY I ABSTRACT We predicted an objective variable values for given multi parameters with their error range in the multiple regression. We discuss the same subject when we have categorical data or mixture of numerical and categorical data. One categorical data are converted to the level number -1 numerical data, and then the same procedure as the multi regression is performed.
Keywords: regression, multiple regression, objective variable, explanatory variable, categorical data
1. INTRODUCTION We frequently face to the case for multiple evaluations where the data are not numerical data. For example, we evaluate some subject with levels of yes or no, male or female, done or not done, and so on. These kinds of data are called as categorical ones. We want to predict the objective variable value including these categorical evaluations. Quantification theory I corresponds to the multiple regression with these categorical data. We perform matrix operation in this chapter, and the basics of the matrix operation are described in Chapter 15.
2. ONE VARIABLE ANALYSIS We assume that the objective variable is numerical data and the explanation variable is category data, and assume one categorical variable x1 .
Kunihiro Suzuki
142
The objective variable is numerical data from 0 to 100. The level of explanation variables are expressed with categorical data as shown in Table 1. Table 1. Relationship between group discussion evaluation and score
ID 1 2 3 4 5 6 7 8 9 10
Group discussion x1 Excellent Excellent Excellent Excellent Allowed Allowed Allowed Wrong Wrong Wrong
Score 96 88 77 89 80 71 77 78 70 62
The levels for the group discussion are categorical levels of excellent, allowed, or wrong, and we convert these data as
1 x11 0
for excellent for non excellent
(1)
1 x1 2 0
for allowed for non allowed
(2)
1 x13 0
for wrong for non wrong
(3)
We then obtain the modified data shown in Table 2. The score yi and the modified numerical data is related to
yi 0 11 xi11 1 2 xi1 2 13 xi13 i
(4)
There is one constraint:
xi11 xi1 2 xi1 3 1
(5)
Quantification Theory I
143
Therefore, we can eliminate one variable, and we eliminate
x11
here. We then have
yi 0 1 2 xi1 2 13 xi13 i x1 2 x1 3 0
corresponds to
(6)
x11 1
. The final data is shown in Table 3.
Table 2. Relationship between numerical group discussion evaluation and score ID 1 2 3 4 5 6 7 8 9 10
Group discussion x1(1) x1(2) x1(3) x1 Excellent 1 0 0 Excellent 1 0 0 Excellent 1 0 0 Excellent 1 0 0 Allowed 0 1 0 Allowed 0 1 0 Allowed 0 1 0 Wrong 0 0 1 Wrong 0 0 1 Wrong 0 0 1
Score 96 88 77 89 80 71 77 78 70 62
Table 3. Relationship between final numerical group discussion evaluation and score
ID
x1(2) 1 2 3 4 5 6 7 8 9 10
The predicted score value yˆi for
x1(3) 0 0 0 0 1 1 1 0 0 0
yi
Score 0 0 0 0 0 0 0 1 1 1
96 88 77 89 80 71 77 78 70 62
is given by
yˆi ˆ0 ˆ1 2 xi1 2 ˆ13 xi13
The variance associated with the deviation is given by
(7)
Kunihiro Suzuki
144
1 n 2 yi yˆi n i 1 2 1 n yi ˆ0 ˆ1 2 xi1 2 ˆ1 3 xi13 n i 1
Se 2
(8)
2 S
ˆ
ˆ
ˆ
We impose that e is minimum, and obtain , , as the followings. After obtaining the Table 3, the process is exactly the same as the multiple regression. We repeat here again. Partial differentiating
0
2 Se
11
1 2
ˆ of Eq. (8) with respect to 0 , we obtain
Se 2 n yi ˆ0 ˆ1 2 xi1 2 ˆ13 xi13 0 n i 1 ˆ0
2
(9)
We then have
ˆ0 n y ˆ1 2 x1 2 ˆ13 x13
(10)
Substituting Eq. (10) into Eq. (8), we obtain
2
2 1 n yi ˆ0 ˆ1 2 xi1 2 ˆ1 3 xi13 n i 1 2 1 n yi y ˆ1 2 xi1 2 x1 2 ˆ1 3 xi1 3 x1 3 n i 1
Se
Partial differentiating Se ˆ
2
1 2
2 Se
of Eq. (8) with respect to
(11)
ˆ1 2
, we obtain
2 n yi y ˆ12 xi12 x1 2 ˆ13 xi13 x13 xi12 x12 0 n i 1
(12)
We then have
ˆ1 2 S1221 2 ˆ13 S1231 2 S y 122
Partial differentiating
2 Se
of Eq. (8) with respect to
(13)
ˆ1 3
, we obtain
Quantification Theory I Se ˆ
2
1 2
145
2 n yi y ˆ12 xi12 x12 ˆ13 xi13 x13 xi13 x13 0 n i 1
(14)
We then have
ˆ1 2 S12213 ˆ13 S12313 S y 123
(15)
where
x x
(17)
(18)
S1 21 2
1 n x x12 n i 1 i1 2
S1 213
1 n x x n i 1 i1 2 1 2
S1313
1 n x x13 n i 1 i13
2
2
2
2
(16)
i1 3
2
(19)
(20)
S y 12
1 n yi y xi12 x12 n i 1
S y 13
1 n yi y xi13 x13 n i 1
2
2
13
We can express the result with a matrics form given by S1221 2 S12213 ˆ1 2 S y 212 2 S 2 ˆ S 2 S 1 213 1313 13 y13
(21)
We have ˆ1 2 S1221 2 ˆ S 2 1 3 1 213
S1 213 2 S1313 2
1
S y 122 11.5 S 2 17.5 y 1 3
(22)
Kunihiro Suzuki
146
We can obtain
ˆ1 2 , ˆ13
and
ˆ1 2 , ˆ13
from this. Substituting these to Eq. (10), we
ˆ 87.5
obtain 0 . Finally, we obtain
yˆ ˆ0 ˆ1 2 xi1 2 ˆ1 3 xi1 3
(23)
87.5 11.5 x1 2 17.5 x1 3
This process is the same as the multiple regression. However, only one variable among
x11 , x1 2 , x13
has a value of 1. Therefore, the final equation should be expressed by
0 yˆ 87.5 11.5 17.5
for excellent for allowed for wrong
(24)
3. ANALYSIS WITH MANY VARIABLES We extend the analysis for many variables. We add an item whether a member belongs to a circle club, where the levels are two and are yes or no, which is shown in Table 4. Table 4. Explanation categorical data are group discussion and circle club. The two data are assumed to influence the score of each member
ID 1 2 3 4 5 6 7 8 9 10
Group Circle club discussion x2 x1 Excellent Yes Excellent Yes Excellent No Excellent No Allowed Yes Allowed No Allowed No Wrong Yes Wrong Yes Wrong No
Score 96 88 77 89 80 71 77 78 70 62
Quantification Theory I
147
In this case, we relate each categorical data for group discussion to numerical data as below.
1 x21 0 1 x2 2 0
for yes for no
(25)
for no for yes
(26)
which is shown in Table 5. The score may be expressed by
yi 0 11 xi11 1 2 xi1 2 13 xi13 21 xi 21 2 2 xi 2 2+ i
(27)
However, there are constraints below.
xi11 xi1 2 xi13 1
(28)
xi 21 xi 2 2 1
(29)
Therefore, we can neglect one variable for each categorical data, and neglect
x2 2
x11
and
here, which is shown in Table 6. We then have
yi 0 1 2 xi1 2 13 xi13 21 xi 21 i
(30)
The predicted value of yi is denoted as yˆi , and is given by
yˆi ˆ0 ˆ1 2 xi1 2 ˆ13 xi13 ˆ21 xi 21
(31)
Kunihiro Suzuki
148
Table 5. Relationship between numerical group discussion evaluation and circle data. The two categorical data are converted to numerical data and assumed to influence the score
ID
Group discussion x1
1 2 3 4 5 6 7 8 9 10
x1(1)
Excellent Excellent Excellent Excellent Allowed Allowed Allowed Wrong Wrong Wrong
x1(2) 1 1 1 1 0 0 0 0 0 0
Circle club x2
x1(3) 0 0 0 0 1 1 1 0 0 0
0 0 0 0 0 0 0 1 1 1
x2(1)
Yes Yes No No Yes No No Yes Yes No
x2(2) 1 1 0 0 1 0 0 1 1 0
Score 0 0 1 1 0 1 1 0 0 1
96 88 77 89 80 71 77 78 70 62
Table 6. Relationship between final numerical group discussion and circle club evaluation and score
ID 1 2 3 4 5 6 7 8 9 10
x1(2) 0 0 0 0 1 1 1 0 0 0
x1(3) 0 0 0 0 0 0 0 1 1 1
x2(2) Score 1 1 0 0 1 0 0 1 1 0
96 88 77 89 80 71 77 78 70 62
The related variance of error can be evaluated as 1 n 2 yi yˆi n i 1 2 1 n yi ˆ0 ˆ1 2 xi1 2 ˆ1 3 xi13 ˆ21 xi 21 n i 1
Se 2
We impose that the can be evaluated as
(32)
2 Se
has the minimum value and decide
ˆ0 , ˆ11 , ˆ1 2 , ˆ21
, which
Quantification Theory I ˆ1 2 S1221 2 ˆ S 2 13 1 213 ˆ S 2 21 1 2 21
S1 21 3 2
S1313 2
S1 2 21 2
S1 2 21 2 S13 21 2 S2 1 21 2
1
S y 212 10.0 S 2 19.0 y13 S 2 9.0 y 21
149
(33)
where ˆ0 can be evaluated as
ˆ0 n y ˆ1 2 x1 2 ˆ13 x13 ˆ21 x21 83.0
(34)
Therefore, the corresponding regression equation is given by
yˆ ˆ0 ˆ1 2 x1 2 ˆ13 x13 ˆ21 x21 83.0 10.0 x1 2 19.0 x1 3 9.0 x21 0 83.0 10.0 19.0
for excellent 0 for allowed 9.0 for wrong
(35)
for no for yes
4. MIXTURE OF NUMERICAL AND CATEGORICAL DATA FOR EXPLANATION VARIABLES We can then treat both categorical and numerical data simultaneously. We add a numerical data of time to go to school for the explanation variable as shown in Table 7. We denote the data of time to go to school as x3 , and obtain the final form as shown in Table 8.
yi 0 1 2 xi1 2 13 xi13 21 xi 21 3 xi 3 i
The prediction of
yi
is denoted as
(36)
yˆi
yˆi ˆ0 ˆ1 2 xi1 2 ˆ13 xi13 ˆ21 xi 21 ˆ3 xi 3
(37)
Kunihiro Suzuki
150
Table 7. Explanation categorical data are group discussion and circle club and numerical data are time to go to school. The three data are assumed to influence the score of each member Group discussion x1 Excellent Excellent Excellent Excellent Allowed Allowed Allowed Wrong Wrong Wrong
ID 1 2 3 4 5 6 7 8 9 10
Time Circle club to go school x2 x3 Yes 15 Yes 85 No 78 No 15 Yes 57 No 29 No 64 Yes 22 Yes 57 No 50
Score 96 88 77 89 80 71 77 78 70 62
Table 8. Relationship between final numerical group discussion, circle club, and time to school evaluation and score
ID 1 2 3 4 5 6 7 8 9 10
x1(2)
x1(3) 0 0 0 0 1 1 1 0 0 0
x2(1) 0 0 0 0 0 0 0 1 1 1
x3 1 1 0 0 1 0 0 1 1 0
Score 15 85 78 15 57 29 64 22 57 50
96 88 77 89 80 71 77 78 70 62
The related variance of error can be evaluated as 1 n 2 yi yˆi n i 1 2 1 n yi ˆ0 ˆ1 2 xi1 2 ˆ1 3 xi13 ˆ21 xi 21 ˆ3 xi 3 n i 1
Se 2
(38)
Quantification Theory I 2 Se
We impose that the which can be evaluated as ˆ1 2 S1221 2 ˆ S1 22 1 3 1 3 ˆ S 2 21 1 2 21 ˆ S 2 3 1 2 3
has the minimum value and decide
S1 213
S1 2 21
S1313
S13 21
S1 3 21
S2 1 21
S133
S2 13
2
2
2
2
151
2
2
2
2
ˆ0 , ˆ1 2 , ˆ13 , ˆ21 , ˆ3
S1 23 S y 212 9.75 2 S133 S y 213 19.7 2 S2 13 S y 221 9.2 0.126 2 S33 S y 23
,
2
(39)
We then obtain ˆ0 as
(40)
89.0 9.75 x1 2 19.7 x1 3 9.2 x21 0.126 x3
(41)
ˆ0 n y ˆ1 2 x1 2 ˆ13 x13 ˆ21 x21 ˆ3 x3 89.0
Therefore, we obtain regression line as yˆ ˆ0 ˆ1 2 x1 2 ˆ1 3 x1 3 ˆ21 x21 ˆ3 x3 0 89.0 9.75 19.7
for excellent 0 for allowed 9.2 for wrong
for no 0.126 x3 for yes
We can evaluate contribution ratio and the selection of variables exactly the same process for multiple regression.
SUMMARY To summarize: We assume that the objective variable is numerical data and the explanation variable is categorical data. When we have m variables, and each variable have nk express the regression as
k 1, 2,
, m
levels, we
Kunihiro Suzuki
152
yˆ ˆ0 ˆ1 2 x1 2 ˆ13 x13
ˆ1 n1 x1 n1
ˆ2 2 x2 2 ˆ23 x23 ˆm 2 xm 2 ˆm3 xm3
We neglect
xk 1
ˆ2 n2 x2 n2 ˆm nm x2 nm
since we impose the restriction of
xk 1 xk 2 xk 3
xk nk 1
Only one term is 1 and the others are 0. The factors are given by ˆ1 2 S1221 2 ˆ S 2 1 3 1 213 2 ˆ 1 n1 S1 21 n1 ˆ2 2 S122 2 2 ˆ23 S122 23 ˆ 2 2 n2 S1 2 2 n2 ˆ S 2 m 2 1 2 m 2 ˆ 2 m3 S1 2 m3 ˆ m n S1 22 m n m m
S1 213 2
S1 21 n1 2
S1 2 2 2 2
S1 2 23 2
S1 22 n2 2
S1 2 m 2 2
S1 2 m3 2
2 S1 2 m nm 2 Sm nm m nm
1
S y 212 S 2 y1 3 2 S y1 n1 2 S y 2 2 S y 223 2 S y 2 n2 S 2 ym 2 2 S ym3 2 S ym n m
We then obtain ˆ0 as ˆ1 2 x1 2 ˆ1 3 x13 ˆ1 n x1 n 1 1 ˆ x ˆ x ˆ x 2 3 2 3 2 n2 2 n2 ˆ0 n y 2 2 2 2 ˆm 2 x2 2 ˆm3 x23 ˆm n x2 n m m
We can extend this process to the data form of mixture of categorical and numerical ones.
Chapter 7
QUANTIFICATION THEORY II ABSTRACT The discriminant analysis gives us a procedure to decide to which group a person or subject belongs. Quantification theory II gives the same results with categorical data or mixture of categorical and numerical data.
Keywords: quantification theory II, determinant analysis, Maharanobis’ distance, categorical data
1. INTRODUCTION When a person goes to a hospital, he is asked various items: having headache or not, having nausea or not, having fever or not, and smoking or no-smoking. Therefore, a doctor obtains various data and should do decide whether he is in disease or not, or what kind of disease he has. The data are not always numerical ones, but the mixture of categorical and numerical ones. We study how the categorical data are converted to the numerical data. After that, we can perform discriminant analysis for numerical data, which is called as quantification theory II. Therefore, the new thing exists in the data conversion. We perform matrix operation and the basic matrix operations are described in Chapter 15.
Kunihiro Suzuki
154
2. DISCRIMINANT ANALYSIS WITH ONE CATEGORICAL DATA We consider both healthy and disease members. We treat a categorical data of frequency of nausea, which may influence healthy or disease. The level number for the frequency of nausea is three, and the levels are no, little, and much. The corresponding data is shown in Table 1. Table 1. The relationship between condition and nausea. The levels for nausea are three
No.
Condition 1 2 3 4 5 6 7 8 9 10
Healthy Healthy Healthy Healthy Healthy Disease Disease Disease Disease Disease
Nausea x1 No Little No No No Little Much Little Little Much
Table 2. The relationship between condition and nausea. The levels for nausea are converted to numerical data
No. 1 2 3 4 5 6 7 8 9 10
Condition Healthy Healthy Healthy Healthy Healthy Disease Disease Disease Disease Disease
x1(2)
x1(3) 0 1 0 0 0 1 0 1 1 0
0 0 0 0 0 0 1 0 0 1
Quantification Theory II
155
We assign numerical data to the categorical nausea data as below.
1 x11 0
for no for non no
(1)
1 x1 2 0
for little for non little
(2)
1 x1 3 0
for much for non much
(3)
Since we impose the restriction of
x11 x1 2 x13 1
(4)
We can then neglect the Eq. (1). We evaluate the fundamental values for healthy and disease members below.
Healthy Member Data The number of data nA is given by nA 5
(5)
The averages of levels 2 and 3 are denoted as A1 2 and A1 3 , and are given by
A1 2
A13
x
iA1 2
nA
x
iA1 3
nA
0 1 0 0 0 1 5 5
(6)
00000 0 5
(7)
The variance of level 2 and 3 are denoted as A 21 2 A1 2 and A 213 A13 , and are given by
Kunihiro Suzuki
156
A 21 2 A1 2
x
A 21 3 A1 3
x
A1 2
iA1 2
2
nA 1 iA1 3
A1 3
0 0.2
nA 1
2
1 0.2 0 0.2 0 0.2 0 0.2 2
2
2
2
(8)
4
2
0 0
2
0 0 0 0 0 0 0 0 2
2
2
2
(9)
4
The covariance between level 2 and 3 is denoted as A 21 23 and is given by A 21 2 A13
x
iA1 2
A1 2
x
iA1 3
A13
nA 1
0 0.2 0 0 1 0.2 0 0 0 0.2 0 0 0 0.2 0 0 0 0.2 0 0 4
(10)
Disease Member Data The number of data nB is given by nB 5
(11)
The averages of levels 2 and 3 are denoted as B1 2 and B1 3 , and are given by
B1 2
B13
x
Bi1 2
nB
x
Bi1 3
nB
1 0 11 0 3 5 5
(12)
0 1 0 0 1 2 5 5
(13)
The variance of level 2 and 3 are denoted as B 21 2 2 and B 2133 , and are given by
2 B1 2 B1 2
x
iB1 2
B1 2
nB 1
2
1 0.6
2
0 0.6 1 0.6 1 0.6 0 0.6 2
2
2
2
4
(14)
Quantification Theory II
B 213 B13
x
iB1 3
B13
2
nB 1
0 0.4
2
157
1 0.4 0 0.4 0 0.4 1 0.4 2
2
2
2
4
(15) The covariance between level 2 and 3 is denoted as B 21 2 B13 and is given by B 21 2 B13
x
iB1 2
B1 2
x
iB1 3
B13
(16)
nB 1
1 0.6 0 0.4 0 0.6 1 0.4 1 0.6 0 0.4 1 0.6 0 0.4 0 0.6 1 0.4 4
The parameters for the total group of healthy and disease are evaluated as
1 2
13
nA A1 2 nB B1 2 nA nB nA A13 nB B13 nA nB
0.2 0.6 0.4 2
0 0.4 0.2 2
(17)
(18)
nA 1 A 22 2 nB 1 B 22 2 nA 1 nB 1
0.25
(19)
12313
nA 1 A 233 nB 1 B 233 nA 1 nB 1
0.15
(20)
12213
nA 1 A 21 23 nB 1 B 21 23 nA 1 nB 1
0.15
(21)
1221 2
We can evaluate the covariance matrix as 1221 2 2 1 21 3
12213 0.250 12313
0.150
The inverse matrix is evaluated as
0.150 0.150
(22)
Kunihiro Suzuki
158
10.00 10.00 1 10.00 16.67
(23)
The corresponding judge function
z
is given by
x1 2 1 2 z A1 2 B1 2 , A1 3 B13 1 x1 3 1 2 x 10.00 10.00 1 2 0.40 0.20 0.60,0 0.40 10.00 16.67 x13 0.20 5.33 8.00 x1 2 10.67 x1 3 0 5.33 8.0 10.67
(24)
for no for little for much
We can judge as below: z 0 Group A : Healthy
(25)
z 0 Group B : Disease
(26)
We can judge the accuracy of the evaluation by comparing the predicted result and the data, which is shown in Table 3. The accuracy is 90% in this case. Table 3. The comparison of predicted result with the data No. 1 2 3 4 5 6 7 8 9 10
Condition Healthy Healthy Healthy Healthy Healthy Disease Disease Disease Disease Disease
x1(2)
x1(3) 0 1 0 0 0 1 0 1 1 0
0 0 0 0 0 0 1 0 0 1
Score 5.33 -2.67 5.33 5.33 5.33 -2.67 -5.34 -2.67 -2.67 -5.34
Result Healthy Disease Healthy Healthy Healthy Disease Disease Disease Disease Disease
Quantification Theory II
159
3. DISCRIMINANT ANALYSIS WITH TWO CATEGORICAL DATA We consider the relationship between condition and two explanatory variables adding headache. The level number for the headache is also three, which is shown in Table 4. Table 4. The relationship between condition and nausea and headache. The levels for nausea and headache are three
No.
Condition 1 2 3 4 5 6 7 8 9 10
Healthy Healthy Healthy Healthy Healthy Disease Disease Disease Disease Disease
Nausea Headache x1 x2 No Little Little No No No No No No No Little Much Much No Little Little Little Much Much Little
We assign categorical data associated with nausea of much, little, no to the below.
1 x1 2 0
for little for non little
(27)
1 x1 3 0
for much for non much
(28)
Table 5. The relationship between condition and nausea and headache. The levels for nausea and headache are converted to numerical data No. 1 2 3 4 5 6 7 8 9 10
Condition Healthy Healthy Healthy Healthy Healthy Disease Disease Disease Disease Disease
x1(2)
x1(3) 0 1 0 0 0 1 0 1 1 0
x2(2) 0 0 0 0 0 0 1 0 0 1
x2(3) 1 0 0 0 0 0 0 1 0 1
0 0 0 0 0 1 0 0 1 0
Kunihiro Suzuki
160
We assign categorical data associated with headache of much, little, or no to the below.
1 for little x2 2 0 for non little
(29)
1 for much x23 0 for non much
(30)
Based on the table, we can evaluate the parameters below. The average associated with healthy members are given by
A1 2
A13
A 2 2
A23
x
iA1 2
nA
x
iA1 3
nA
x
iA 2 2
nA
x
iA 2 3
nA
(31)
(32)
(33)
(34)
The average associated with disease members are given by
B1 2
B13
B 2 2
B 23
x
iB1 2
nB
x
iB1 3
nB
x
iB 2 2
nB
x
iB 2 3
nB
(35)
(36)
(37)
(38)
Quantification Theory II
161
The total average is given by
1 2
13
nA A1 2 nB B1 2
nA A13 nB B13
(40)
nA nB
A 2 2
2 3
(39)
nA nB
n A A 2 2 nB B 2 2
(41)
n A nB
nA A23 nB B 23
(42)
nA nB
The healthy members’ variance are given by
A 21 2 A1 2
x
A 213 A13
x
iA1 2
A1 2
2
(43)
nA 1
iA1 3
A1 3
2
(44)
nA 1
A 22 2 A2 2
x
A 223 A23
x
iA 2 2
A 2 2
2
nA 1
iA 2 3
A 2 3
(45)
2
nA 1
(46)
The disease members’ variance are given by
B 21 2 B1 2
x
B 213 B13
x
iB1 2
B1 2
2
nB 1
iB1 3
B13
nB 1
(47)
2
(48)
Kunihiro Suzuki
162
B 22 2 B 2 2
x
B 223 B 23
x
iB 2 2
B 2 2
2
(49)
nB 1
iB 2 3
B 2 3
2
(50)
nB 1
The healthy members’ co-variances are given by
A 21 2 A13
A 21 2 A2 2
A 21 2 A23
A 213 A2 2
A 213 A23
A 22 2 A23
x
A1 2
iA1 2
x
iA1 3
A13
(51)
nA 1
x
iA1 2
x
A1 2
A 2 3
iA 2 3
A 2 3
iA 2 2
A 2 2
iA 2 3
nA 1
x
x
A1 2
iA1 2
nA 1
x
A13
iA1 3
x
nA 1
x
iA1 3
A13
x
iA2 3
A 2 3
iA 2 2
A 2 2
x
iA2 3
A 2 3
(53)
(54)
(55)
nA 1
x
(52)
nA 1
(56)
The disease members’ co-variances are given by
B 21 2 B13
B 21 2 B 2 2
x
iB1 2
B1 2
x
iB1 3
B13
(57)
nB 1
x
iB1 2
B1 2
x
nB 1
iB 2 2
B 2 2
(58)
Quantification Theory II
B 21 2 B 23
B 213 B 2 2
B 213 B 23
B 22 2 B 23
x
iB1 2
B1 2
x
iB 2 3
B 2 3
iB 2 2
B 2 2
nB 1
x
iB1 3
B13
x
nB 1
x
iB1 3
B13
x
iB 2 3
B 2 3
iB 2 2
B 2 2
x
iB 2 3
B 23
(59)
(60)
(61)
nB 1
x
163
nB 1
(62)
The total variances are then given by
1221 2
12313
nA 1 A 21 2 A1 2 nB 1 B 21 2 B1 2 nA 1 nB 1
(63)
nA 1 A 213 A13 nB 1 B 213 B13 nA 1 nB 1
(64)
2 2 2 2 2
2 23 23
(65)
nA 1 A 223 A23 nB 1 B 223 B 23 nA 1 nB 1
(66)
nA 1 A 21 2 A13 nB 1 B 21 2 B13 nA 1 nB 1
(67)
nA 1 A 21 2 A2 2 nB 1 B 21 2 B 2 2 nA 1 nB 1
(68)
12213
122 2 2
nA 1 A 22 2 A2 2 nB 1 B 22 2 B 2 2 nA 1 nB 1
Kunihiro Suzuki
164
nA 1 A 21 2 A23 nB 1 B 21 2 B 23 nA 1 nB 1
(69)
nA 1 A 213 A2 2 nB 1 B 213 B 2 2 nA 1 nB 1
(70)
nA 1 A 213 A23 nB 1 B 213 B 23 nA 1 nB 1
(71)
nA 1 A 22 2 A23 nB 1 B 22 2 B 23 nA 1 nB 1
(72)
122 23
123 2 2
123 23
2 2 2 23
We can evaluate the covariance matrix as
1221 2 1221 3 122 2 2 122 23 2 2 2 2 1313 13 2 2 13 23 1 31 2 2 2 2 2 2 213 2 2 2 2 2 2 23 2 21 2 2 2 2 2 231 2 2313 23 2 2 23 2 2
(73)
The corresponding judge function z is given by
z A1 2 B1 2 , A13 B13 , A2 2 B 2 2 , A23
12.80 9.60 x1(2) 20.80 x1(3) 6.40 x2(2) 14.40 x2(3) 0 12.80 9.60 20.80
no 0 no little 6.40 little much 14.40 much
We can judge as below
x1 2 1 2 x1 3 1 2 B 23 1 x2 2 2 2 x23 23 (74)
Quantification Theory II
165
z 0 Group A : Healthy
(75)
z 0 Group B : Disease
(76)
We can extend this analysis to the data of mixture of categorical and numerical data as shown in Table 6. Table 6. The relationship between condition and nausea and headache and two numerical data. The levels for nausea and headache are three
No.
Condition 1 2 3 4 5 6 7 8 9 10
Healthy Healthy Healthy Healthy Healthy Disease Disease Disease Disease Disease
Nausea Headache x1 x2 No Little Little No No No No No No No Little Much Much No Little Little Little Much Much Little
Inspection1 Inspection2 x3 x4 50 15.5 69 18.4 93 26.4 76 22.9 88 18.6 43 16.9 56 21.6 38 12.2 21 16.0 25 10.5
SUMMARY I summarize the results in this chapter. We consider two groups A and B , and want to judge to which group a member belongs. We obtain categorical data denoted as k , where k 1,2, , m , that is we have k kinds of categorical data. Each categorical data has We convert the data as
nk levels.
xk 2 , xk 3 , , xk nk Each data is 1 or 0. Only the one of them is 1 and the others are 0. That is,
xk 1 xk 2 xk 3
xk nk 1
Kunihiro Suzuki
166 We therefore do not consider
xk 1 .
The average is given by
Ak nk
x
iAk nk
nA
x
iBk nk
Bk nk
nB
The total average is given by
k nk
nA Ak nk nB Bk nk nA nB
The variances are given by
2 Ak n Al n k
iAk nk
Ak nk
x
iAl nl
Al nl
nA 1
l
2 Bk n Al n k
x x
iBk nk
Bk nk
x
iBl nl
Bl nl
nA 1
l
The total variances are given by
k 2 n k l nl
2 nA 1 Ak 2 n Al n nB 1 AB n Bl n nA 1 nB 1 k
l
We define the matrixes below.
k
l
Quantification Theory II
μ AB
A1 2 A1 3 A1 n1 A 2 2 A 2 3 A 2 n2 Am 2 Am 3 Am n m
B1 2 B13 B1 n1 B 2 2 B 2 3 B 2 n2 Bm 2 Bm3 Bm nm
1221 2 12213 2 12313 131 2 2 2 1 n1 1 2 1 n1 13 2 2 2 1 2 2 2 2 13 2 2 31 2 2 2 313 2 2 2 n2 1 2 2 n2 13 2 m 2213 m 21 2 2 2 m31 2 m313 m 2n 1 2 m 2n 1 3 m m
x1 2 x13 x 1 n1 x2 2 x2 3 X x 2 n2 xm 2 xm3 xm n m
167
1 2 13 1 n1 2 2 2 3 2 n2 m 2 m 3 m nm
1221 n
1222 2
12223
1222 n
122 m 2
122 m3
1231 n
1232 2
12323
1232 n
123 m2
123 m3
12n 1 n
12n 2 2 1
12n 23 1
12n 2 n
12n m 2 1
12n m3
2 2 2 1 n
2 2 2 22
2 22 2 3
2 22 2 n
2 22 m2
2 22 m3
2 231 n
2 23 2 2
2 23 23
2 23 2 n
2 23 m 2
2 23 m3
2 2 n 1 n
2 2 n 2 2 2
2 2 n 23 2
2 2 n 2 n
2 2 n m 2 2
2 2 n m 3
m 221 n
m 222 2
m 22 23
m 22 2 n
m 22 m 2
m 22 m3
2 m 31 n1
2 m 3 2 2
2 m 3 2 3
2 m 3 2 n2
2 m 3 m 2
m 23m3
1
1
1
1
1
1
2
1
1
m 2n
m
1 n1
m 2n
m
2 2
m 2n
m
2 3
2
2
1
2
2
2
2
2
2
m 2n
m
2 n2
m 2n
m
m 2
1
2
m 2n
m
m 3
122 m n
123 m nm 12n1 m nm 2 22 m nm 2 23 m nm 2 2 n2 m nm m 22 m nm m 23m nm 2 m nm m nm m
Kunihiro Suzuki
168
We then obtain the decision equation as z μTAB 1 X
We can judge as below. z 0 Group A z 0 Group B
Chapter 8
QUANTIFICATION THEORY III (CORRESPONDENCE ANALYSIS) ABSTRACT Quantification theory III evaluates the relationship between two categorical data. The theory assigns the categorical data to numerical values so that the correlation factor between two categorical data has the maximum value. Data values in the quantification theory are 0 or 1. The correspondence theory is extended to the quantification theory III to accommodate any values. The quantification theory III can be regarded as the one special case of the correspondence theory.
Keywords: categorical data, eigenvalue, eigenvector, quantification theory III, correspondence theory
1. INTRODUCTION We sometimes want to know the relationship between two categorical data. For example, the age dependence of favorite artists or dishes or so on. That is, we want to know the favorite singer or dishes associated with the ages. Quantification theory III enables us to obtain the relationship. Since the theory is included in the corresponding theory as its special case, we dominantly study corresponding theory in this chapter. We perform matrix operation and the basics of the matrix operations are described in Chapter 15.
Kunihiro Suzuki
170
2. BASIC CONCEPT OF QUANTIFICATION THEORY III The typical data for quantization theory III are shown in Table 1, where favorite curriculums of members are shown. We want to clarify the relationship between members and curriculums and divide them into some groups. Table 1. Favorite curriculums Member ID Japanese 1 2 3 4 5 6 7 8 9 10
Society
Math
○
Science
Music
○ ○
○ ○
○
Arts and crafts ○ ○
Physical eduation
○ ○ ○
○
○
○
○
○ ○ ○
○
○
○
○
○ ○
○ ○
○
○
○ ○
○
We change the orders of rows and columns so that ○ is ordered on the diagonal line as possible as we can, and obtain the data as shown in Table 2 for an example. The both categories look like a relationship focused on the symbol ○ in Table 2, while we have no image on the relationship in Table 1. The quantization theory III performs the operation from Table 1 to Table 2 by assigning a numerical data to each categorical data. Table 2. A table reordered from the Table 1 Member ID Japanese 2 6 7 1 9 4 10 8 3 6
○ ○ ○
Society
Math
○ ○
○
○ ○
○ ○ ○ ○ ○
Science
Music
Arts and crafts
Physical eduation
○ ○ ○ ○ ○ ○ ○
○ ○ ○ ○
○ ○ ○ ○
○ ○ ○
Quantification Theory III (Correspondence Analysis)
171
3. GENERAL FORM DATA FOR CORRESPONDENCE ANALYSIS In the previous section, the data was ○ or vacant. That is, the data values are 1 or 0 if we regard ○ as 1 and vacant as 0. This should be extended to the one where any number is available. This is called as the corresponding analysis. Therefore, the quantization analysis III is a special case of the one. Table 3. Data example for correspondence analysis
We consider Table 3 for the analysis. We assign category data of mid-20, mid-30, and mid-40 as x1 , x2 , x3 , respectively, and the category data of Chinese, Italian, French, and Japanese as y1 , y2 , y3 , y4 , respectively. The value of x1 , x2 , x3 , y1 , y2 , y3 , y4 are determined later. The data number related to the cell
x , y is nij . i
j
We define the sum of the cell related to x1 as nx1 n11 n12 n13 n14
(1)
4
n1 j j 1
The other sum associated with
x
are given by
nxi ni1 ni 2 ni 3 ni 4
(2)
4
nij j 1
Similarly, the sum of
y
is given by
n yj n1 j n2 j n3 j 3
nij i 1
(3)
Kunihiro Suzuki
172 The total sum N is given by 4
3
j 1
i 1
N nyj nxi
(4)
These sums are shown in Table 3. Table 4. General expression of the data shown in Table 3 y1
y2
y3
y4
Sum
x1
n11
n12
n13
n14
nx1
x2
n21
n22
n23
n24
nx 2
x3
n31
n32
n33
n34
nx3
Sum
ny1
ny2
ny3
ny4
N
Finally, the data form is expressed as shown in Table 4. We decide the values
x1 , x2 , x3 , y1 , y2 , y3 , y4 in the step so that the correlation factor between x and y is maximum.
y We impose that the average of x and are 0, which are expressed by
nx1 x1 nx 2 x2 nn3 x3 N 1 3 nxi xi N i 1 0
x
y
(5)
ny1 y1 ny 2 y2 n y 3 y3
1 N
N 4
n j 1
yj
yj
0
y We further impose that the variances of x and are 1, which are expressed by
(6)
Quantification Theory III (Correspondence Analysis) nx1 x12 nx 2 x22 nn3 x32 N 3 1 nxi xi2 N i 1 1
173
S xx 2
S yy 2
(7)
n y1 y12 n y 2 y22 n y 3 y32 n y 4 y42 N
1 N
4
n j 1
yj
(8)
y 2j
1
In the normalized variable, the covariance and the correlation factor is identical, and they are expressed by
2
r S xy
n11 x1 y1 n12 x1 y2 n13 x1 y3 n14 x1 y4 1 n21 x2 y1 n22 x2 y2 n23 x2 y3 n24 x2 y4 N n31 x3 y1 n32 x3 y2 n33 x3 y3 n34 x3 y4
(9)
Therefore, the subject is to maximize the correlation factor under the condition of variance of 1. The corresponding Legendre function L is given by
L r S xx 1 S yy 1 2
2
(10)
We decide x1 , x2 , x3 , y1 , y2 , y3 , y4 that provide maximum L . It should be noted that the average 0 is not imposed in Eq. (10), which should be checked after we decide value of the variables. We partially differentiate L with respect to x1 , x2 , x3 , and obtain
N
L n11 y1 n12 y2 n13 y3 n14 y4 2 nx1 x1 0 x1
(11)
N
L n21 y1 n22 y2 n23 y3 n24 y4 2 nx 2 x2 0 x2
(12)
N
L n31 y1 n32 y2 n33 y3 n34 y4 2 nx3 x3 0 x3
(13)
Kunihiro Suzuki
174
Next, we partially differentiate L with respect to y1 , y2 , y3 , y4 , and obtain
N
L n11 x1 n21 x2 n31 x3 2 ny1 y1 0 y1
(14)
N
L n12 x1 n22 x2 n32 x3 2 ny 2 y2 0 y2
(15)
N
L n13 x1 n23 x2 n33 x3 2 ny 3 y3 0 y3
(16)
N
L n14 x1 n24 x2 n34 x3 2 ny 4 y4 0 y4
(17)
We first investigate the relationship between and . From (11)~(13), we obtain x1 , x2 , x3 , n11 x1 y1 n12 x1 y2 n13 x1 y3 n14 x1 y4 nx1 x12 nx 2 x22 nx 3 x32 1 n x y n x y n x y n x y 2 21 2 1 22 2 2 23 2 3 24 2 4 N N n x y n x y n x y n x y 31 3 1 32 3 2 33 3 3 34 3 4
(18)
We then obtain
r 2
(19)
Multiplying y1 , y2 , y3 , y4 to Eqs. (14)-(17), we obtain n11 x1 y1 n12 x1 y2 n13 x1 y3 n14 x1 y4 ny1 y12 ny 2 y22 ny 3 y32 ny 4 y42 1 n x y n x y n x y n x y 2 21 2 1 22 2 2 23 2 3 24 2 4 N N n x y n x y n x y n x y 31 3 1 32 3 2 33 3 3 34 3 4
(20)
We then obtain r 2
This leads to
(21)
Quantification Theory III (Correspondence Analysis)
r 2
175 (22)
Equations (11)-(13)can be expressed with a matrix form as n11 n12 n21 n22 n 31 n32
y n14 1 nx1 0 y2 n24 2 0 nx 2 y3 0 n34 0 y 4
n13 n23 n33
0 x1 0 x2 nx3 x3
(23)
Eqs. (14)~(17) can be expressed with a matrix form as n11 n12 n13 n14
n21 n22 n23 n24
n31 ny1 x1 0 n32 x2 2 0 n33 x3 n34 0
0 ny 2 0 0
0 0 ny 3 0
0 y1 0 y2 0 y3 ny 4 y4
(24)
We then define the matrix below. n11 n12 A n21 n22 n n 31 32
n13 n14 n23 n24 n33 n34
(25)
x1 X x2 x 3
(26)
y1 y Y 2 y3 y4
(27)
nx1 Nx 0 0
0 nx 2 0
0 0 nx 3
(28)
Kunihiro Suzuki
176 Ny
n y1
0
0
0
ny 2
0
0
0
ny 3
0
0
0
0 0 0 n y 4
(29)
The Equations (23) and (24) are expressed AY 2 N x2 X
(30)
At X 2 N y2Y
(31)
From (31), we obtain
Y
1 1 N y2 At X 2
(32)
Substituting Eq. (32) into Eq. (30), we obtain
A
1 1 N y2 At X 2 N x2 X 2
(33)
Arranging this equation, we obtain N x1 A N y2 At N x1 N x X 2 N x X 1
2
(34)
We then obtain N x1 A N y2 At N x1U 2 U 1
2
(35)
where U Nx X
(36)
This can be expressed with a general form as
MU U
(37)
Quantification Theory III (Correspondence Analysis)
177
where M N x1 A N y2 At N x1
(38)
2 r 2
(39)
1
2
This is an eigenvalue problem with respect to a matrix U . The eigenvalues and eigenvectors are obtained as 1 ,U 1 ; 2 ,U 2 , 3 ,U 3
(40)
We always have 1 , and the corresponding eigenvector has all the same factors. It should be noted that we do not solve the eigenvalues of X , but U which is 1
converted as Nx X , and is symmetrical one as shown in Appendix 1-11. i After we obtain U , we can get
X N x1U i
(41)
In the above discussion, we do not impose the average restriction explicitly, and impose it with regard to variance. The maximum correlation factor can be obtained if all x and y values are the same. They are not appropriate one. We want to force the elements of this un-appropriate solution as 1. We assumed that the variance is 1 in the derivation process. The elements in X 1 are not 1, but has the same value. It is denoted as b. Since we impose that the square sum of them is 1, and it should hold Nb 2 1
(42)
We hence obtain b
1 N
(43)
Therefore, we obtain Nb 1
(44)
Kunihiro Suzuki
178 We rescale all data as X NX i
i
(45)
This treatment ensures that the elements of X 1 are all 1. Further, we obtain Y as Y i
1
i
N
2 1 y
At X
(46)
i
In this analysis, we need
x of more than three levels and y of more than two levels.
xi
We can evaluate the distance between category eigenvector. The distance is denoted as 1 1 2 2 dij xi yj xi yj 2
d ij
and category
yj
from the
as
2
(47)
In this analysis, we treat the first component and second component identically. However, the first component is more important. Therefore, we use weighted distance given by
d ij
1 1 1 2 2 2 2 2 x y xi y j i j 1 2 1 2
Further, we can evaluate the distance between each
x
d ij
(48)
and each y as
d x ij
1 1 1 2 2 2 2 2 x x xi x j i j 1 2 1 2
(49)
d y ij
2 1 1 2 2 2 1 2 y y y y j j 1 2 i 1 2 i
(50)
The correlation factor for the first and second component are given by
r 1
1
(51)
Quantification Theory III (Correspondence Analysis)
r 2
2
179 (52)
We perform a corresponding analysis using data of Table 3. We can generate a matrix as below. 8 20 15 4 A 17 10 15 7 12 9 13 17
(53)
47 Nx 0 0
0
(54)
0
0 0 51
Ny
37
0
0
0
39
0
0
0
43
0
0
0
49
0 0 0 28
(55)
The matrix for targeting the eigenvector is given by M N x1 A N y2 At N x1 1
0.379 0.313 0.289 0.313 0.354 0.332 0.289 0.332 0.396
(56)
This is a symmetrical matrix as shown in Appendix 1-11. This can be solved with Jacobi method. The corresponding first eigenvalue and eigenvector is given 0.565 1; 147 0.577 0.589
The second eigenvalue and the eigenvector are given by
(57)
Kunihiro Suzuki
180 0.749 0.098; 147 0.060 0.660
(58)
The third eigenvalue and eigenvector are given by 0.346 0.031; 147 0.814 0.466
(59)
The corresponding X vector is expressed by X NxU
(60)
and the corresponding ones are given by
X
X
X
1
2
3
1 1 1
(61)
1.324 0.104 1.121
(62)
0.612 1.410 0.791
(63)
We can then evaluate Y as Y i
N
1
i
2 1 y
At X i
and the corresponding ones are given by
(64)
Quantification Theory III (Correspondence Analysis)
181
Y
1 1 1 1
(65)
Y
0.398 1.257 0.277 1.650
(66)
1.472 0.765 0.224 1.223
(67)
1
Y
2
3
The target for this analysis is to obtain an eigenvector that provide the maximum correlation factor. 1 corresponds to this target. However, this does not hold the implicit restriction of average of 0. This is no meaning root for our purpose, and we neglect this first eigenvalue and eigenvector. We always obtain this eigenvector in the corresponding analysis, and hence always neglect the first eigenvalue and eigenvector. Therefore, we convert the second eigenvalue and eigenvector to the first ones, and the third eigenvalue and eigenvector to the second ones. The means of the first and second eigenvector are
X 1 0.042, X 2 0.012, Y 1 0.083, Y 2 0.096
(68)
These are not exactly 0, but close to 0. We impose that the variance is 1. The corresponding values are below.
X 1 2 1.016, X 2 2 0.987, Y 1 1.113, Y 2 2 1.054
(69)
They are not exactly 1, but close to 1. The first and second eigenvectors and eigenvalues are then given by
X
2
X
1
1.324 0.104 1.121
(70)
Kunihiro Suzuki
182 0.612 1.410 0.791
(71)
Y Y
0.398 1.257 0.277 1.650
(72)
Y Y
1.472 0.765 0.224 1.223
(73)
X
3
X
2
3
2
1
2
1 0.098
(74)
2 0.031
(75)
The weighed first and second components are expressed by 1 X 1 , 1Y 1 , 2 X 2 , 2Y 2
(76)
and the values are given by 1.154, 0.300 mid 30 : x , x 0.090,0.691 mid 40 : x , x 0.977, 0.388 Chinese : y , y 0.347, 0.721 Itarian : y , y 1.096, 0.375 French : y , y 0.242,0.110 Japanese : y , y 439, 0.599
mid 20 : x1 , x1 1
1
2
2
1
1 2
2
2 2
1
1 3
2
1
1 1
2
2
1
1
1 2
2
1
1 3
2
1
2
3
2 2
2
3
1 4
2
2 4
(77)
Quantification Theory III (Correspondence Analysis)
183
We can plot the above in a plane as shown in Figure 1. Mid-40 is close to Japanese dish, mid-30 is close to Chinese dish, and mid-20 is close to Italian dish. French dish is far from any ages. However, it is in the center of plots, and hence we can regard that French dish are favorite for all ages although it is not favorite for a special age.
Figure 1. Two-dimensional plot.
We can evaluate the distance between two categories and each category as shown in Table 5-Table 7.
Table 5. Distance between two categories
Table 6. Distance between ages
Distance(XX) mid-20 mid-30 mid-40
mid-20 0 1.591 2.134
mid-30 1.591 0 1.396
mid-40 2.134 1.396 0
Kunihiro Suzuki
184
Table 7. Distance between dishes
SUMMARY To summarize the results in this chapter‒ We treat two category data A and B , and they have
m
and l levels, respectively.
n
We then obtain the data of ij . The subscript i denotes the level number of category A , and j denotes the level number of category B . The following data table is shown below. We want to evaluate the assigned numerical values of
xi and y j .
Table 8. Distance between dishes
Category A1 Category A2 ・・・ Category Am Sum
Category B1 Category B2 y1 y2 x1 n11 n12 x2 n21 n22 ・・・ ・・・ ・・・ xm nm1 nm2 ny1 ny2
The sums are given by l
nxi nij j 1
m
n yj nij i 1
The total data number N is given by
・・・ ・・・ ・・・ ・・・ ・・・ ・・・ ・・・
Category Bl yl n1l n2l ・・・ nml nyl
Sum nx1 nx2 ・・・ nxm N
Quantification Theory III (Correspondence Analysis) m
l
i 1
j 1
N nxi n yj
We define the matrixes based on the data table as n11 n12 n n22 A 21 nm1 nm 2 nx1 Nx 0
ny1 Ny 0
n1l n2l nml
nx 2
ny 2
0 nxm
0 nyl
We want to evaluate the vectors' elements below. x1 x X 2 xm y1 y Y 2 yl
From the correspondence theory, we obtain
MU U
185
Kunihiro Suzuki
186 where U Nx X
M N x1 A N y2 At N x1 1
This is an eigenvalue problem with respect to a matrix U . The eigenvalues and eigenvectors are obtained as
1 ,U 1 ; 2 ,U 2 , 3 ,U 3 We can then get
X N x1U i
We rescale all data as X NX i
i
1 We always have 1 , and the corresponding eigenvector has all the same factors, which is not appropriate for this subject. Therefore, we neglect the first eigenvalue and eigenvectors, and renumber the eigenvalues and eigenvectors as 2 1 2 1 2 1 , X X , Y Y 3 2 3 2 3 2 , X X , Y Y
We can evaluate weighted distance
d ij
d ij
given by
1 1 1 2 2 2 2 2 x y i j 1 2 xi y j 1 2
Further, we can evaluate the distance between each
d x ij
x
1 1 1 2 2 2 2 2 xi x j 1 xi x j 1 2 2
and each y as
Chapter 9
QUANTIFICATION THEORY IV ABSTRACT We want to evaluate the similarity of the categorical data. We assume that we can evaluate the similarity numerically. We assign the categorical data so that the values give the maximum correlation factor. We can then define the distance between two categorical data. Quantification theory IV gives the procedure to obtain the distance.
Keywords: quantification theory IV, similarity, correlation factor
1. INTRODUCTION Let us consider various kinds of cars. We assume that we can evaluate the similarity of each pair. Based on the evaluation, we want to define the distance of the two categorical data. We can obtain such results with Quantification theory IV. Table 1. Relationship between group discussion evaluation and score 1 Crown 1 2 3 4 5 6 7 8 9 10
Crown Cedric Sunny Mark II Corolla Skyline March Vitz RAV4 Pjero
2 Cedric 10 9 6 7 5 2 2 1 1 2
3 Sunny 10 7 9 6 3 3 2 1 3
4 Mark II
10 8 8 6 5 4 2 3
10 8 3 4 3 1 4
5 Corolla
6 Skyline
10 6 7 5 3 5
7 March
10 6 5 3 2
8 Vitz
10 9 7 5
9 RAV4
10 8 5
10 Pjero
10 4
10
Kunihiro Suzuki
188
2. ANALYTICAL PROCESS We assume that we can evaluate the similarity between various kinds of cars as shown in Table 1. We want to assign a value Q rij xi x j
2
(1)
j i
i
Since
xi to a car i . We evaluate a parameter given by
rij
expresses the similarity,
rij
expresses the non-similarity. On the other hand,
x if we assign cars i and j to values of xi and j . The square of the distance is expressed
x x by i
x x i
2
j
. We can therefore evaluate the similarity using two parameters of
rij
and
2
. Q is the inner product of the two parameters. Therefore, we evaluate the
j
rij maximum value of the xi related to . x We impose that the variance of i is 1, that is, we impose
S 2
1 n 2 xi x n i 1 1 n 2 1 n xi xi n2 n i 1 i 1
2
(2)
1
We then obtain a Lagrange function given by L rij xi x j i
j i
2
2 1 n 2 1 n xi 2 xi n i 1 n i 1
Partial differentiating L with respect to xi , we obtain
(3)
Quantification Theory IV
189
L 2 n 1 2 rij xi x j 2 xi 2 xi xi n i 1 i j i n 2 2 2 rij rji xi x j xi 2 n j n 0
n
x i 1
i
(4)
We can set an origin arbitrary without losing generality. Therefore, we set x 0 , that is
1 n xi 0 n i 1
(5)
We then obtain
r
ij
j
rji x j rij rji xi 0 j n
(6)
We introduce variables below.
hij h ji rij rji
n
(7)
(8)
Eq. (6) can then be reduced to
h x ij
j
j
hij xi 0 j
(9)
hii does not influence the magnitude of Q , and hence we can set it arbitrary as below.
h
ij
j
rij rji 0 j
(10)
Kunihiro Suzuki
190 Therefore, we obtain hii hij
(11)
j i
Eq. (9) is then reduced to
h x ij
j
xi 0
(12)
j
This can be regarded as an eigenvalue problem of a matrix Multiplying xi to Eq.(12), we obtain
h x x x ij i
i
j
j
2 i
Q n
H ij
.
(13)
i
We can modify Eq. (12) to
h x ij
i
xi 0
j
j
(14)
i
We then obtain
xi hij x j 0 i
j
i
(15)
From above analysis, we can obtain the first and second eigenvalues and eigenvectors
1 , x1 2 , xi 2 . The distance between category i and j is denoted as d ij , and
i , given by it can be evaluated as
dij
2 1 2 2 2 2 xi1 xj1 x x i j 1 2 1 2
Quantification Theory IV
191
SUMMARY To summarize the results in this chapter‒
r We obtain data which express the similarity of category i and j as ij . We can then form a matrix of H hij rij rji The diagonal elements are set as hii hij j i
We obtain the first and second eigenvectors and eigenvalues given by
1 , xi1 , 2 , xi 2 . d The distance between category i and j is denoted as ij and it can be evaluated as dij
2 2 1 2 2 2 xi1 xj1 x xj 1 2 1 2 i
We can extend this process to the data form of mixture of categorical and numerical ones.
Chapter 10
SURVIVAL TIME PROBABILITY ABSTRACT We discuss the survival time probability, which is important issue in a medical field. We judge effectiveness of medical treatment by evaluating the survival time data. The complete data can be obtained only when persons are dead. However, the number of complete data is limited. We have the other data of ones where we cannot trace the medical treatment, and ones where the medical treatment is on the way, that is, the persons are still alive. We want to use all these data to evaluate the survival time. We divide the survival probability by two steps: one is the survival probability up to the target time and the probability that can survive just after the target time among the survive people. We then evaluate the average survival time and the standard deviation and predict the time where alive person can live for the other time. We also evaluate the effectiveness of two kinds of treatments.
Keywords: survival time probability, Kaplan-Meier product limit prediction method
1. INTRODUCTION In a medical field, we want to evaluate the effectiveness of medical treatment. The simple clear example is the one how fatal sick people can live long after the medical treatment. We can gather data of living time period, and apply the data to standard statistical analysis. However, there are difficulty specialized with the medical treatment. The complete data are the ones of time period of death. However, we cannot wait until all people are dead. Some of them are still alive, and some of them are unable to trace. We need to develop a procedure to use whole data and evaluate the survival time.
Kunihiro Suzuki
194
2. SURVIVAL PROBABILITY We show how we can draw the dependence of survival probability on time. We consider an example of 10 mice to which carcinogenic substance are administered. We treat complete data and the time period was 2, 3, 3, 4,4 4,4, 5, 5, 8, and 10 days. The corresponding survival probability can be evaluated as follows. The survival probability for time
t is denoted as S t . When a mouse is dead at time
t , we regard the situation that the mouse is alive up to time t, and dead at time t 0 . Step 1
S t 1.0 for 0 t 2
(1)
Step 2
S 2 0
9 10
(2)
Step 3
S t
9 10
for 2 t 3 (3)
Step 4
S 3 0
7 10
(4)
Step 5
S t
7 10
for 3 t 4 (5)
Step 6
S 4 0
4 10
(6)
Step 7
S t
4 10
for 4 t 5 (7)
Survival Time Probability
195
Step 8
S 5 0
2 10
(8)
Step 9
S t
2 10
for 5 t 8 (9)
Step 10
S 8 0
1 10
(10)
Step 11
S t
1 10
for 8 t 10 (11)
Step 12 S 10 0 0
(12)
The above results are shown in Figure 1. The survival probability decreases from 1 to 0 monotonically. Since we use a unit of day, the feature is not smooth but is angular. This means that we cannot obtain stable data associated with differential parameters.
Survival probability
1.0
0.5
0.0
0
2
4
Figure 1. Time dependence of survival probability.
6 Days
8
10
12
Kunihiro Suzuki
196
3. DIFFERENT EXPRESSION OF SURVIVAL PROBABILITY Let us consider the survival probability from the different point of view in this section. The survival probability just after 3 days is given by
S 3 0
7 10
(13)
We regard this as the product of two probabilities. One is the probability where mice can live up to 3 days. The other one is the probability that alive mice are dead. Therefore, the probability is expressed by
S 3 0
9 7 7 10 9 10
(14)
Therefore, we obtain the same results. However, we can apply this concept to the incomplete data. The first probability is the one that mice can survive at the time, which is not clearly decided for incomplete data. The second probability is clearly defined even for incomplete data as shown in the next section.
Figure 2. Time dependence of survival probability.
Survival Time Probability
197
Figure 3. Time dependence of survival probability.
4. SURVIVAL PROBABILITY WITH INCOMPLETE DATA (KAPLAN-MEIER PREDICTIVE METHOD) We assume that we obtain data as shown in Figure 2. In the figure, 11 persons are shown although we treat the number as n . We set the starting point for each data to the origin, and sort them and obtain the data as shown in Figure 3. The survival time is denoted as
t1 , t2 ,
, tm
(15)
where
t1 t2
tm
(16)
Note that there are probabilities where some data values are same, and hence m n . We assume that
dj
persons record
tj
, that is,
dj
persons are dead at the time of
t j0
.
t , t We assume that a person cannot be traced in the time period of j j 1 . We checked whether he is alive at the time
tj 0
, but do not know the status after then.
Kunihiro Suzuki
198
The number of person who are alive just before or cannot be traced. Therefore, we can evaluate
n j d j wj d j 1 wj 1
nj
tj
equals to the number who are dead
as
dm wm
(17)
The number of n j is dead among this n j persons. Therefore, the corresponding survival probability is given by
nj d j nj
(18)
The survival probability for two continuous steps is then related to
S t j 0
nj d j nj
We do not know
S t j 1 0 (19)
S t j 0
in general, but we do know that
S 0 1
(20)
Eq. (19) can be related to S 0 as S t j 0
nj d j nj
S t j 1 0
n j d j n j 1 d j 1 nj
n j 1
n j d j n j 1 d j 1 nj
n j 1
n j d j n j 1 d j 1 nj
n j 1
n j d j n j 1 d j 1 nj
n j 1
S t j 2 0 n1 d1 S 0 0 n1 n1 d1 n0 0 S 0 n1 n0 n1 d1 n1
(21)
Survival Time Probability Since we start with alive persons, d0 0 , and the survival probability as j
S t j 0 i 1
S 0 0 S 0
199 . Therefore, we obtain
ni di ni
(22)
This is called as a Kaplan-Meier product-limit predictive method. The standard deviation associated with S t j 0 is approximately expressed with
S t 0 S t j 0 j
j
di i di
n n i 1
i
(23)
5. REGRESSION FOR SURVIVAL PROBABILITY We obtained survival probability in the previous section. The resultant one is not smooth and are squarish. We cannot obtain clear parameters associated with derivatives. Therefore, we want to express the survival probability with a smooth analytical function.
5.1. Exponential Function Regression We approximate the survival probability with an exponential function given by
S t exp t
(24)
We want to evaluate to reproduce the data. Eq. (24) can be modified as
ln S t 0 t
The difference between the theory and the data
ei ln S ti 0 ti
(25)
ei is given by
(26)
Kunihiro Suzuki
200
The summation of the deviation Qe is then given by m
Qe ei2 i 1 m
ln S ti 0 ti
2
i 1
(27)
We set so that Qe has the minimum value. Differentiating Qe with respect to , we obtain m Qe 2 ln S ti 0 ti ti 0 i 1
(28)
We then obtain as m
t ln S t i 1
i
i
0
m
t i 1
2 i
(29)
5.2. Weibull Function Regression We assume that the death probability as f t . Then the accumulated death probability can be obtained by integrating f t from 0 to time t as F t f t dt t
(30)
0
Therefore, the survival probability at time
S t
is given by
S t 1 F t We approximate the death probability of
(31) f t
as a Weibull distribution as
Survival Time Probability 1
t f t
t exp
201
(32)
We then obtain the accumulated death probability as F t f t dt t
0
t
t t 1 exp dt 0 t 1 exp
(33)
Therefore, the survival probability is given by
t S t exp
(34)
We set the parameters and so that the deviation between the theory and data is minimum. Logarithm of Eq. (34) is given by
t ln S t 0
(35)
The logarithm of Eq. (35) is given by
ln ln S t 0 ln t ln
(36)
We introduce the following parameters as
K i ln ln S ti 0
(37)
a
(38)
Kunihiro Suzuki
202
b ln
(39)
Eq. (36) is then given by
Ki a ln ti b
(40)
The deviation between the theory and data is given by
ei Ki a ln ti b
(41)
The sum Qe is then given by m
Qe ei2 i 1 m
K i a ln ti b
2
i 1
(42)
We want to minimize this Qe . Differentiating Qe with respect to a and set it to be 0, we obtain m Qe 2 Ki a ln ti b ln ti 0 a i 1
(43)
We then obtain m
m
m
i 1
i 1
a ln ti b ln ti Ki ln ti i 1
2
(44)
Differentiating Qe with respect to b and set it to be 0, we obtain m Qe 2 Ki a ln ti b 0 b i 1
We then obtain
(45)
Survival Time Probability m
m
i 1
i 1
a ln ti bm Ki
(46)
From Eqs. (44) and (46), we obtain m
a
a and b as
1 m Ki m i 1 1 m ln ti m i 1
Ki ln ti i 1 m
ln t i 1
b
203
i
2
(47)
m 1 m K a ln ti i m i 1 i 1
We can then obtain parameters
(48)
a and as
a
(49)
b exp b exp a
(50)
6. AVERAGE AND STANDARD DEVIATION OF SURVIVAL TIME Using the survival probability, we can evaluate an average and a standard deviation of survival time. We then evaluate time where alive people can live how long time period from now. First of all, we can evaluate the percentile of probability of 50%, which is denoted as
T0.5 , and can be easily evaluated from the data. The average survival time is given by S t j 0 t j t j 1 m
j 1
where t0 0 . The variance of the survival time is given by
(51)
Kunihiro Suzuki
204 2 S t j 0 t j t j 1 m
2
j 1
(52)
The third and fourth moments for the survival time are given by 3 S t j 0 t j t j 1 m
3
j 1
4 S t j 0 t j t j 1 m
(53)
4
j 1
(54)
The skewness and kurtosis are then given by
3 3
(55)
4 4
(56)
The probability where a person, who is alive after the time period of t0 , can live time period of t from now can be evaluated as
P t
t 2 1 exp dt 2 2 t0 t 2
t0
t 2 1 exp dt 2 2 2
(57)
where we assume a normal distribution function for the survival time. If we do not use a normal distribution, but a Pearson function for f t with the given moment parameters, we can obtain the corresponding probability as
f t dt P t f t dt t0 t
t0
(58)
Survival Time Probability
205
It may be a case where the longest untraced time is larger than tm . In that case, S tm 0
do not become 0. The evaluated average survival time should be under estimated. The accuracy is improved by adding a term given by S t j 0 t j t j 1 S tm 0 tmax tm m
j 1
(59)
2 S t j 0 t j t j 1 S tm 0 tmax tm m
2
2
j 1
3 S t j 0 t j t j 1 S tm 0 tmax tm m
3
3
j 1
4 S t j 0 t j t j 1 S tm 0 tmax tm m
4
(60)
(61) 4
j 1
(62)
The following discussion is the same as the one without this improvement.
7. HAZARD MODEL 7.1. Definition of Hazard Function A hazard function
t
t
is defined as
f t
S t
(63)
This expresses that the person who are alive up to the time t are dead in the next incremental time period. On the other hand, f x is the probability without the condition who are alive up to the time t. Let us consider human life. We are apt to be dead when we take age. Therefore, the hazard function increases significantly with increasing age. However, the number of dead people may be small and hence f x is then small. Therefore, the hazard function is one important parameter to understand the phenomenon. Differentiating the survival probability, we obtain
Kunihiro Suzuki
206 dS t dt
dF t dt
f t
(64)
We then obtain the hazard function as
t
f t
S t
dS t dt S t
d ln S t dt
(65)
S t The hazard function includes differential form of . However, the data of S t cannot be differentiated, and hence it is hard to obtain the hazard function from the data directly. We then evaluate the integral form given by
t t dt t
0
t d ln S t dt 0 dt ln S t
(66)
The gradient of the accumulated hazard function t corresponds to the hazard function.
7.2. Analytical Expression for Hazard Function (Exponential Approximation) If we use an analytical function for the survival probability, we can obtain analytical one for the hazard function. If we use an exponential function, we obtain
Survival Time Probability d ln S t dt d ln exp t dt
207
t
(67)
Therefore, the parameter is the just the hazard function, which is the reason we use the same notation for the exponential function.
7.3. Analytical Expression for Hazard Function (Weibull Function) If we use a Weibull function, the corresponding hazard function is given by
t
f t
S t
1
t exp t exp
t
1
t t 1
(68)
8. TESTING OF TWO GROUP SURVIVAL TIME We want to evaluate the difference between two groups with respect to the survival time. We performed two types of medical treatment, and want to evaluate the difference. We consider the group A and B, and the alive number for each group at the time given by
nAj d Aj wAj d A j 1 wA j 1
d Am wAm
tj
are
(69)
Kunihiro Suzuki
208
nBj d Bj wBj d B j 1 wB j 1
d Bm wBm
Therefore, we obtain the cross table for the time
t tj
(70)
as shown in Table 1.
Table 1. The number of dead and alive people number for t t j Group A
Dead
Alive
d Aj
nAj
B
d Bj
nBj
Sum
dj
nj
If there is no group dependence, the dead person number is proportional to the group person number and the expected numbers are given by
eAj d j
nAj
eBj d j
nj
(71)
nBj nj
(72)
Therefore, we obtain the expected dead people number for group A and B as m
E A eAj j 1
(73)
m
EB eBj j 1
(74)
On the other hand, the death data are given by m
DA d Aj j 1
(75)
Survival Time Probability
209
m
DB d Bj j 1
(76)
Therefore, the deviation between the theory and data are given by
2
DA EA EA
2
DA EB
2
EB
(77)
We have the relationship between the theory and data as below. E A EB eAj eBj m
j 1
m n d nBj d j Aj j nj j 1 n j
m
dj j 1
d Aj d Bj m
j 1
DA DB
(78)
Therefore, we obtain
EA DA EB DB
(79)
Substituting Eq. (79) in to Eq. (77), we obtain
1 2 1 2 DA EA EA EB
(80)
We compare this value with the critical value for a distribution with a freedom of 1. If Eq. (80) is larger than the critical value, we can state that the results for both groups are different, and vice versa. 2
Kunihiro Suzuki
210
SUMMARY To summarize the results in this chapter: The number of person who are alive just before
n j d j wj d j 1 wj 1
where
dj
tj
t
dj
i 1
and is given by
persons are dead at the time of
is the number of person who are alive up to the time of traced. The survival probability is given by j
nj
dm wm
is the persons record j , that is,
S t j 0
is denoted as
tj
t j0
and
wj
and become not to be able to be
ni di ni
The standard deviation associated with S t j 0 is approximately expressed with
S t 0 S t j 0 j
j
di i di
n n i 1
i
The survival probability is approximately expressed with an exponential function as
S t exp t where m
t ln S t i 1
i
i
0
m
t i 1
2 i
The survival probability is approximately expressed with a Weibull function as
Survival Time Probability
211
t S t exp We introduce the following parameters as
K i ln ln S ti 0 a
b ln
where
a and b are given by m
a
i 1 m
ln t i
i 1
b
1 m Ki m i 1 1 m ln ti m i 1
Ki ln ti 2
m 1m K a ln ti i m i 1 i 1
We can then obtain parameters
a and as
a b
exp a
The moment parameters are given by S t j 0 t j t j 1 S tm 0 tmax tm m
j 1
2 S t j 0 t j t j 1 S tm 0 tmax tm m
j 1
2
2
Kunihiro Suzuki
212
3 S t j 0 t j t j 1 S tm 0 tmax tm m
3
3
j 1
4 S t j 0 t j t j 1 S tm 0 tmax tm m
4
4
j 1
tmax is the longest time for untraced data. If tm is larger than tmax , S tm 0 is 0, and the last terms are eliminated automatically. The skewness and kurtosis of are then given by
3 3 44
The probability where a person, who is alive after the time period of period of t from now can be evaluated as
t0 , can live time
P t
t 2 1 exp dt 2 2 t0 t 2
t 2 1 exp dt 2 2 2
t0
where we assume a normal distribution function for the survival time. If we do not use a normal distribution, but a Pearson function for f t with the given moment parameters, we can obtain the corresponding probability as
P t
f t dt
t0 t
t0
f t dt
The hazard function is defined as t
f t
S t
In the experimental data, we prefer to evaluate the accumulated hazard function as
Survival Time Probability
t ln S t In the analytical model, we can obtain
t for an exponential function and
t
1 t
for a Weibull function.
213
Chapter 11
POPULATION PREDICTION ABSTRACT A cohort ratio is introduced to express the change of the age constitution. The ratio expresses the probability that a certain age members move to the next generation age. We also introduce a birth ratio, which express the new members, that is, they are new born babies. Using the ratios, we can predict the constitution change of population.
Keywords: population, birth ratio, cohort ratio
1. INTRODUCTION It is important to predict the time dependence of constitution of population, which influence economics. We take a national census every five years in Japan. We can predict the dependence of population based on the continuous two national census data.
2. POPULATION IN FUTURE We use a data of national census data of 2005 as shown in Table 1, and 2010 as shown in Table 2. We predict the population in future based on the data.
Kunihiro Suzuki
216
Table 1. 2005 population data Year:2005 Sum 0 ~ 4 5 ~ 9 10 ~ 14 15 ~ 19 20 ~ 24 25 ~ 29 30 ~ 34 35 ~ 39 40 ~ 44 45 ~ 49 50 ~ 54 55 ~ 59 60 ~ 64 65 ~ 69 70 ~ 74 75 ~ 79 80 ~ 84 85 ~ 89 90 ~ 94 95 ~ 99 100 ~ 104 105 ~ 109 More than 110
Sum M ale Female 127,767,994 62,348,977 65,419,017 5,578,087 2,854,502 2,723,585 5,928,495 3,036,503 2,891,992 6,014,652 3,080,678 2,933,974 6,568,380 3,373,430 3,194,950 7,350,598 3,754,822 3,595,776 8,280,049 4,198,551 4,081,498 9,754,857 4,933,265 4,821,592 8,735,781 4,402,787 4,332,994 8,080,596 4,065,470 4,015,126 7,725,861 3,867,500 3,858,361 8,796,499 4,383,240 4,413,259 10,255,164 5,077,369 5,177,795 8,544,629 4,154,529 4,390,100 7,432,610 3,545,006 3,887,604 6,637,497 3,039,743 3,597,754 5,262,801 2,256,317 3,006,484 3,412,393 1,222,635 2,189,758 1,849,260 555,126 1,294,134 840,870 210,586 630,284 211,221 41,426 169,795 23,873 3,580 20,293 1,458 178 1,280 22 2 20
Table 2. 2010 population data. Birth ratio and cohort ratio are also shown Cohort ratio Year:2010 Sum 0 ~ 4 5 ~ 9 10 ~ 14 15 ~ 19 20 ~ 24 25 ~ 29 30 ~ 34 35 ~ 39 40 ~ 44 45 ~ 49 50 ~ 54 55 ~ 59 60 ~ 64 65 ~ 69 70 ~ 74 75 ~ 79 80 ~ 84 85 ~ 89 90 ~ 94 95 ~ 99 100 ~ 104 105 ~ 109 More than 110
Sum M ale Female Birth number Birth ratio M ale Female 128,057,352 62,327,737 65,729,615 5,296,748 2,710,581 2,586,167 5,585,661 2,859,805 2,725,856 1.0018578 1.0008338 5,921,035 3,031,943 2,889,092 0.9984983 0.9989972 6,063,357 3,109,229 2,954,128 13,494 0.00457 1.0092678 1.0068692 6,426,433 3,266,240 3,160,193 110,956 0.03511 0.9682252 0.9891213 7,293,701 3,691,723 3,601,978 306,913 0.08521 0.9831952 1.0017248 8,341,497 4,221,011 4,120,486 384,382 0.09329 1.0053495 1.0095524 9,786,349 4,950,122 4,836,227 220,103 0.04551 1.003417 1.0030353 8,741,865 4,400,375 4,341,490 34,610 0.00797 0.9994522 1.0019608 8,033,116 4,027,969 4,005,147 773 0.00019 0.9907757 0.9975146 7,644,499 3,809,576 3,834,923 0.9850229 0.9939254 8,663,734 4,287,489 4,376,245 0.9781552 0.991613 10,037,249 4,920,468 5,116,781 0.969098 0.9882162 8,210,173 3,921,774 4,288,399 0.9439756 0.976834 6,963,302 3,225,503 3,737,799 0.9098724 0.961466 5,941,013 2,582,940 3,358,073 0.8497232 0.9333804 4,336,264 1,692,584 2,643,680 0.7501535 0.8793261 2,432,588 744,222 1,688,366 0.6087033 0.7710286 1,021,707 241,799 779,908 0.435575 0.6026486 296,756 55,739 241,017 0.2646852 0.3823943 41,318 5,598 35,720 0.1351325 0.2103713 2,486 250 2,236 0.0698324 0.1101858 78 3 75 0.0166667 0.0576923
Population Prediction
217
We define a cohort ratio as
Cohort ratio Age : 5 9
2010 Population Age : 5 9 2005 Population Age : 0 4
Cohort ratio Age :10 14
2010 Population Age :10 14 2005 Population Age : 5 9
Cohort ratio Age :105 109
2010 Population Age :105 109 2005 Population Age :100 1004
(1)
The cohort ratio for the age more than 110 is given by Cohort ratio Age:more than 110
2010 Population Age : more than110 2005 Population Age:105:110+more than110
(2)
We evaluate the ratio for male and female, respectively. We can then predict the population in 2015 as
2015 Population Age : 5 9 =Cohort ratio Age : 5 9 2010 Population Age : 0
4
2015 Population Age :10 14 Cohort ratio Age :10 14 2010 Population Age : 5 9 2015 Population Age :105 1009 Cohort ratio Age :105 109 2010 Population Age :100 1004
(3)
The final range population is given by 2015 Population Age:more than110 Cohort ratio Age:more than110 2010 Population Age:105:109+more than110
(4)
Kunihiro Suzuki
218
We should evaluate the population in the age range from 0 to 4. This corresponds to the new born babies. We assume that female in the age range of 15 and 49 can have babies, and evaluate the corresponding population. We define the birth ratio given by Birth ratio =
Baby number Female populaton
(5)
We can then evaluate the population for the age range of 0 ~4 as
2015 Population Age : 0 4 =Birth ratio Age :15 19 2010 Female population Age :15 19 Birth ratio Age : 20 24 2010 Female population Age : 20 24 Birth ratio Age : 45 49 2010 Female population Age : 45 49
(6)
We only obtain the total population, and should divide them into male or female. The ratio of male and female was 100:105, and we divide them with the ratio. We can predict populations in future by repeating the above step. Although the cohort ratio and the birth ratio should change with time, we use it as constant.
0.4 Total 0-14 14-64 more than 65
Ratio of more than 65
0.3
100 0.2 50
0 1900
0.1
1950
2000 Year
Ratio of more than 65
Populaiton (million)
150
0 2050
Figure 1. Time dependence of population.
Figure 1 shows the dependence of the population on year. The total population decreases from 2015 and the ratio of more than 65 increases.
Population Prediction
219
SUMMARY To summarize the result in this chapter: We define the cohort ratio as
Cohort ratio Age region i+1
Current year population Age region i 1 Previous year population Age region i
The cohort ratio for the final age region is given by Cohort ratio (Age: more than 110) = Current year population (Age: more than 110) Previous year population (Age:105:110 + more than 110
(7)
The current population is given by
Current year population Age region i+1 =Cohort ratio Age region i+1 Previous year population Age region i The first region population is the one for new birth and it is evaluated as
Current year population Age : 0 4 =Birth ratio Age :15 19 Previous year female population Age :15 19 Birth ratio Age : 20 24 Previous year female population Age : 20 24 Birth ratio Age : 45 49 Previous year female population Age : 45 49 Female and male are divided using the ratio of a reference year.
Chapter 12
RANDOM WALK ABSTRACT We study random walk where a person goes to the left or right randomly. We evaluate the time evolution of distance, and the ratio that one exists in the right or left region with respect to the original location. Since the probability that a person goes to left or right is the same, we think that the probability of 0.5 for staying left region is the maximum. However, the result is opposite, and is the minimum for the value of 0.5, although the average is 0.5. Corresponding to the results, the probability for the frequency that a person cross the original point decreases with the increasing the step frequency.
Keywords: random walk, path, principle of symmetry
1. INTRODUCTION We did not consider the time evolution of a probability variable up to here. The corresponding fundamental issue is a random walk. A drunken people loses information of the path to his home. He randomly selects his next step left or right. The subject is that we predict the region that he exists during a certain time period, and the frequency that he returns the original point. We focus on this random walk in this chapter.
2. FUNDAMENTAL ANALYSIS FOR RANDOM WALK We discuss some fundamental analytical techniques to treat the random walk. We focus on one dimensional analysis here for simplicity. We relate the random walk to a coin toss. If we obtain a head, we step to the right, and if we obtain a tail, we step to
Kunihiro Suzuki
222
the left. We regard the one right step as 1 and one left step as -1. The sum expresses the location of the person. The trial number is expressed by x . The random walk is then expressed as shown in Figure 1, which is called as a path.
Figure 1. Path figure.
2.1. General Theory for Evaluating a Case Number of Path We count a case number of a path that starts from A (origin) to E . We can regard the start point A as a reference and set the coordinate as 0,0 . One example of the path is shown in Figure 1. We set the coordinate of E as n, m . The path is then expressed using coordinates as
0,0 , 1, s1 , 2, s2 , 3, s3 , , n, sn
(1)
where
sn m
(2)
The case number where the path starts form 0,0 to n, m is denoted as N n, m , which we evaluate as follows. m
We assume that we obtain heads as
p q n p q m
p
times and tails
q
times, which is related to
n
and
(3)
Random Walk
223
This can be solved with respect to p as
p
nm 2
(4)
This can be regarded as a case that we select right step p times in the n trials, and the corresponding case number is given by
N n,m n C p n C n m 2
(5)
Eq. (3) can also be solved with respect to q as
q
nm 2
(6)
There are some constraints for the final position as follows. The conditions p, q 0 impose that
n m n
(7)
This is rather obvious condition. Further, we have a constraint from Eq. (4) as n m 2p
Therefore, the sum of n and m is an even number.
Figure 2. Principle of symmetry.
(8)
Kunihiro Suzuki
224
2.2. Principle of Symmetry We utilize a principle of symmetry in the analysis here after, which we discuss in this section. We consider the path that starts form B k , a for a 0 to E n, sn as shown in Figure 2. The number of path that holds above and also have common points with x axis is equal to the number of the paths that start from the symmetrical location of B with respect to the
x
axis denoted as B ' k , a to E n, sn .
Let us consider a path that starts from B and has a common point for the first time. We reflect the path with respect to the x -axis. We can always generate this path for any path and we can establish one-to one correspondence. Therefore, the path number from
B E is the same as the one form B' E . Let us consider the path number. The path is expressed by
k , a n, sn
(9)
This can be identical to the path given by
0,0 n k , sn a
(10)
From Eq. (5), we can evaluate the corresponding path number as
N n k , sn a
n k C n k sn a 2
This is the case number where we do not care for crossing the
Figure 3. Path where all points except or the origin are in the positive region.
(11)
x -axis.
Random Walk
225
2.3. The Path Number Where All Points Except for the Starting Point Is Positive We treat a path where all points except for the starting point are positive. This is expressed as
0,0 , 1, s1 , 2, s2 , 3, s3 , , n, sn
for s1 , s2 , , sn 0
(12)
where we set sn m 0 . The corresponding all paths cross the point 1,1 . Therefore, the number for the path is identical to the number for the path.
1,1 n, m
(13)
This path number is identical to the path given by
0,0 n 1, m 1
(14)
Therefore, from Eq. (5), the corresponding path number, where we do not consider the positive region, is given by n 1 C n 1 m 1 2
n 1 C n m 2 2
(15)
We should extract the path number, where it has common points with the x -axis. The number of path that has common points with the x-axis is identical to the number of path of
1, 1 n, m
(16)
The corresponding path number is identical to the number of paths given by
0,0 n 1, m 1 The corresponding paths number is given by
(17)
Kunihiro Suzuki
226 n 1 C n 1 m 1 2
n 1 C n m 2
(18)
Therefore, the number of path that is always in the positive region except for that the origin is given by
n 1 C n m 2 2
n 1! n 1! n m 2 n m 2 n m nm ! ! n 1 ! ! n 1 2 2 2 2 nm nm n n ! n! 2 2 nm nm nm nm n n n 2 ! 2 ! n 1 2 ! 2 ! m n! nm nm n n 2 ! 2 ! m n C nm n 2
n 1C n m 2
(19)
m N n,m n
From the symmetrical consideration, the path number where all points are in the negative region except for the origin is the same, which is given by
m N n,m n The path number where one reaches
(20)
m for the first time after n steps is also given
by
m N n,m n
(21)
The condition is described as
s1 , s2 , , sn1 sn m 0
(22)
The corresponding case can be realized by reflecting path of Figure 3 and set the origin as 0, 0 . Therefore, the path number is the same.
Random Walk
227
Figure 4. The path where all points except for the edge points are in positive region.
2.4. The Number of Path from 0,0 to 2n,0 Where s1 , s2 ,
The path from
s1 , s2 ,
0,0
to
, s2n1 0
2n,0 where
, s2n1 0
(23)
is shown in Figure 4. The number of the path is identical to the number of path from 0,0 to 2n 1,1 where
s1 , s2 ,
, s2n1 0
(24)
The number of the path is then evaluated as
1 1 N 2 n 1,1 2 n 1 C 2 n 11 2n 1 2n 1 2 1 2 n 1 Cn 2n 1 1 2n 1! 2n 1 n ! n 1!
1 2n 2 ! n n 1! n 1!
1 n
2 n 2 Cn 1
(25)
Kunihiro Suzuki
228
Figure 5. Path where all points are 0 or positive.
2.5. The Path Number from 0,0 to 2n, 0 Where s1 , s2 ,
, s2n1 0
We evaluate the number of paths from 0,0 to 2n, 0 where
s1 , s2 ,
, s2n1 0
This means that we allow contacting the region.
(26)
x-axis but do not allow entering the negative
We move up the path by one. The path is then modified as the path from 0,1 to 2n,1 . We add the points 0,0 and 2n 2,0 to the path. The path is then changed to start from
0,0 to 2n 2,0 as shown in Figure 5. If we impose this path to the condition given by
s1, s2 , , s2n1 0
(27)
The number of paths is the one that we want to obtain, and it can be evaluated by changing n to n 1 in Eq. (25) as
1 2 n Cn n 1
(28)
3. THE PROBABILITY THAT A PERSON IS IN POSITIVE REGION We utilize the above analysis, and evaluate time period ratio where a person is in a positive region.
Random Walk
229
3.1. The Probability Where the Path Starts from 0,0 to 2n,0 The probability where the path starts form 0,0 to 2n,0 can be evaluated as
1 N 2 n,0 22 n 1 2n 2n C 2n0 2 2
u2 n
1 22 n
2 n Cn
(29)
Obviously, we obtain
u0 1
(30)
3.2. The Probability That a Person Reaches x Axis at 2n Trial for the First Time We consider the case where a person reaches a x-axis at 2n time step for the first time and denote the corresponding probability f 2n . This is divided into two cases, where a person is always in positive region or always in negative region. We consider the case where he is always in positive region. The case corresponds to the path from 0,0 to 2n,0 where
s1 , s2 ,
, s2n1 0
(31)
The corresponding probability is given by
1 1 22 n n
2 n 2 Cn 1
The case number for the negative region is the same, and hence we obtain
(32)
Kunihiro Suzuki
230 1 1 2 n 2 Cn 1 22 n n 1 1 2 2 n 2 Cn 1 2 2 22 n 2 n 1 1 2 n 2 Cn 1 2n 2 2 n 2 1 u2 n 2 2n
f 2n 2
(33)
3.3. The Probability That a Person Enters a Negative Region at 2n 1 Time Step for the First Time The events that a person enters the negative region at 2n 1 time step for the first time is denoted as G2n1 . We evaluate the probability that a person enters the negative region at
2n 1 time step for the first time. We first consider a path from 0,0 to 2n 2,0 where all points are in the positive region. The corresponding probability is given by 1 2
2n2
1 n
2 n 2 Cn 1
(34)
Further, the probability that he enters the negative region at 2n 1 time step is
1
2
.
Therefore, the probability that a person enters a negative region at 2n 1 time step for the first time is given by
1 1 1 P G2n 1 2n 2 2 2 n
2 n 2 Cn 1
f 2n
(35)
3.4. The Probability That a Person Does Not Cross X Axis up to 2n Time Step
event that a person reaches the
x
x-axis up to
2n times is denoted as A . The axis at 2n time step for the first time is denoted as K2n
The event that a person does not cross the
. We also express all event as . Then, the event A is expressed as
Random Walk A K2 K4
K2n
231 (36)
The corresponding probability P A is given by
P K 2n
P A P P K 2 P K 4 1 f2 f4
f 2n
(37)
We have a relationship below. 1 22 n 2 1 2n2 2 1 2n2 2
u2 n 2 u2 n
2 n 2 Cn 1
1 22 n
2 n Cn
2n ! 1 2n2 n !n ! 2 2 2 1 1 2n 2n 1 2n 2 ! 2 n 2 Cn 1 4 22 n 2 nn n 1! n 1!
2 n 2 Cn 1
C 2 n 2 2 n 2 n 1
1
2 1 u2 n 2 2n f2n
2n 1 2n
1
2
C 2 n 2 2 n 2 n 1
(38)
Therefore, we obtain P A 1 f 2 f 4
f 2n
1 u0 u 2 u 2 u 4
u2 n 2 u2 n
1 u0 u 2 n u2 n
(39)
3.5. The Probability That a Person Does Not Enter Negative Region up to 2n Time Steps We denote the events that a person does not enter the negative region up to 2n as B , which is expressed by B G1 G2
G2n1
(40)
Kunihiro Suzuki
232
Therefore, the corresponding probability P B is given by
P B P P G1 P G2 1 f2 f4
f 2n
u2 n
P G2 n 1 (41)
We denote the events that a person reaches xaxis at 2n time as A2n , where we do not care whether it is the first time, second time, or so on. The events that a person returns to the x -axis at 2r time for the first time as B2r . A2n can then be expressed by
A2 n
n
B
A2 n 2 r
2r
(42)
r 1
The corresponding probability is given by
P A2n
n
PB
2r
P A2n2r
r 1
(43)
This is expressed by
u2 n
n
f r 1
2 r u2 n 2 r
(44)
Figure 6. Path where length 2k is positive. (a) Start with positive region. (b) Start with negative region.
Random Walk
233
3.6. The Probability That the Length of 2k Is in Positive Region of 2n Length Path We denote the probability that the length of 2k is in the positive region of 2n length path as P2k ,2n . The corresponding case is shown in Figure 6. We prove that the probability is given by P2k ,2 n u2 k u2 n 2 k
(45)
P2n,2n is the case where a person never enter the negative region, and hence it is given
by
P2n,2 n u2 n
(46)
P0,2n is the case where a person never enter the positive region, and hence it is given
by P0,2n u2n
(47)
It should be noted that u0 1, and hence Eq. (47) can be expressed by P0,2n u0u2 n
(48)
Therefore, Eq. (45) is valid for k 0 and any n . We assume that Eq. (45) is valid up to the path length of 2n 2 for any k . What we should do is to prove that Eq. (45) is valid for the path of 2n length. This means that we can lengthen the path from 2 to any length, and valid for any k . We consider the path of length of 2n with the positive region length of 2k . We can consider two cases for that. One is the path in the positive region where we reach the x-axis at 2r time step for the first time, and have 2k 2r positive path in the rest of 2n 2r length path. The variable r can have a value from 1 to k . The other is the path in the negative region where we reach the x-axis at 2r time step for the first time, and have 2k positive path in the rest of 2n 2r length path. The variable r can have a value from 1 to n k .
Kunihiro Suzuki
234 Therefore, we obtain
P2 k ,2 n
nk 1 1 f 2 r P2 k 2 r ,2 n 2 r f 2 r P2 k ,2 n 2 r 2 2 r 1 r 1 k
1 2
k
f 2 r u2 k 2 r u2 n 2 k
r 1
k
1 u2 n 2 k 2
r 1
1 n k f 2 r u2 k u2 n 2 k 2 r 2 r 1
nk 1 f 2 r u2 k 2 r u 2 k f 2 r u2 n 2 k 2 r 2 r 1
(49)
On the other hand, we obtain k
f
2 r u2 k 2 r
u2 k
r 1
(50)
nk
f
2 r u2 n 2 k 2 r
u2n 2k
r 1
(51)
Therefore, Eq. (49) is reduced to P2 k ,2 n
1 u2 n 2 k 2
k
r 1
nk 1 f 2 r u2 k 2 r u 2 k f 2 r u2 n 2 k 2 r 2 r 1
1 1 u2 n 2 k u2 k u2 k u2 n 2 k 2 2 u 2 k u2 n 2 k
2k ! 2 22 k k !
(52)
2n 2k ! 2 22 n 2 k n k !
From the Stirling’s theorem (Appendix 1-12), we can approximate n! as
n! 2n nn en Therefore, we obtain
(53)
Random Walk u2 n
235
2n ! 2 2 n ! 2n 2 2n 2n e 2 n 2n
22 n
2n n n e n
2
1
n
u2 n 2 k
(54)
2n 2k ! 2 22 n 2 k n k ! 2 2n 2k 2n 2k 22 n 2 k
e 2 n 2 k
2n 2k n k n k e n k
n k n k 2 n 2 k
22 n 2 k 1 22 n 2 k 1
2n2k
n k n k n k
2
(55)
2
1
n k
Eq. (52) is then approximated as
P2 k ,2 n u2 k u2 n 2 k
1
1
k
n k 1
k n k
(56)
The dependence of the probability on k is shown in Figure 7. The path number is assumed to be 2n 200 . The rigorous model of Eq. (52) and the analytical model of Eq. (56) are compared. The ratio of the positive and the negative regions is equal corresponds to k 50 . The probability is the minimum at the points. The probability increasing with deviating from k 50 . It should be noted that the rigorous model is well approximated with the analytical model.
Kunihiro Suzuki
236 0.03 n = 100
Rigorous Analytical
P
0.02
0.01
0.00
0
50 k
100
Figure 7. The Dependence of probability on k. The path number 2n is 200. The rigorous model of Eq. (52) and the analytical model of Eq. (56) are compared.
We extend the analysis to the normalized one. The ratio of the positive path length to the total path length is given by
2k k z 2n n
(57)
The corresponding probability density f z is given by
P2k ,2n k k f z z
(58)
We then obtain P2 k ,2 n k k
1
k n k 1
k n k 1
k nz z
k k 1 n n 1 z z 1 z
(59)
Random Walk
237
Finally, we obtain
1
f z
z 1 z
(60)
Figure 8 shows the corresponding results. The average ratio is 0.5 as is expected. However, the probability density is the minimum at 0.5. This means that a person is apt to be in one sided region (positive or negative). The probability P for the region a z b can be evaluated as P
b
a
f z dz
b
1 dz a z 1 z
2
sin
1
b sin 1 a
(61)
3.0
f(z)
2.0
1.0
0.0 0.0
0.5 z
1.0
Figure 8. The probability density for the ratio where a person is in the positive region.
4. RETURN FREQUENCY TO THE ORIGIN We evaluate the frequency where a person crosses the x-axis. We show that a person rarely crosses the axis. This may be opposite to our image. Since the probability that a person goes to the left or the right is the same, we may expect the crossing frequency is large. However, this result can also be guessed from the results in the previous section, where the one side region staying probability of 0.5 is the minimum.
Kunihiro Suzuki
238
Figure 9. The path where a person reaches m for the 2n-m step.
Figure 10. The path where a person reaches m for the 2n-m step for the first time.
We start with a path where we reach mat the 2n m step as shown in Figure 9. This path is identical to the path where a person reaches m for the first time at the 2n m step as shown in Figure 10. The probability that a person return to the given by
f 2n
x-axis at the
1 u2 n 2 2n
2n step for the first time is
(62)
We consider the path with length of n where
s1, s2 ,
, sn1 m, sn m
(63)
We regards that the path as the one that reaches mfor the first time. The related probability is denoted as
m 1 m hn N n,m n n 2
m m hn hn
.
is given by
(64)
Random Walk
239
Let us consider the path where we reach 1 for the first time at the 2n 1 step. This path is identical to the path where we move the path totally by (1,-1), and connect (0,0) and (1,-1). The result and path is always negative and reach the axis at the 2n step for the first time. Therefore, the corresponding case number is
1 n
2 n 2 Cn 1
(65)
Therefore, the related probability is given by 1 n
2 n 2 Cn 1
1 2
n 1
1 1 2n 2 2 n 2 1 u2 n 2 2n f 2n
2 n 2 Cn 1
(66)
1 We also evaluate h2 n 1 as
h2 nn 1 1
1 1 N 2 n 1,1 2 n 1 2n 1 2 1 1 2 n 1 Cn 2 n 1 2n 1 2 2 n 1 ! 1 1 2n 1 n 1!n ! 22 n 1
2n 2 ! 1 1 2n 22 n 2 2n 2 n 1 ! n 1!
f 2n
(67)
Therefore, we obtain
h2 n1 f 2 n 1
(68)
Inspecting above, we focus on the path that starts from y location of 0 to 1, and modify the path from the 0 point to the end, and repeat the modification. Therefore, the focused point in this case is shown with red marks in Figure 10.
Kunihiro Suzuki
240
The probability where a person cross the
x-axis m times just after the
2n steps is
f 2n . Then the above is expressed by m
denoted as
h2 n1 f 2n 1
1
(69)
We want to prove that
f 2n h2 n m m
m
(70)
The corresponding case number is
m N 2 n m, m 2n m
(71)
Therefore, the corresponding probability is given by
h2 n m m
m 1 N 2 n m, m 2 n m 2n m 2
(72)
The path in Figure 10 can be modified below. We move the path with the deviation 1, 1 totally in the first step and add a red line at the beginning as shown in Figure 11. We then move with the deviation 1, 1 from the point where the path crosses the
x-axis for the first time. Then the red path crosses the
x -axis
second times as shown in
Figure 12. We then move with the deviation 1, 1 from the point where the path crosses the
x-axis for the second time. Then the red path crosses the
third times as shown in Figure 13. We then move with the deviation 1, 1 from the point where the path crosses the
x-axis for the third time. Then the red path cross the 14.
x
x -axis
axis fourth times as shown in Figure
Random Walk
Figure 11. We modify the path by 1, 1 , and add connection between two black points.
Figure 12. We modify the path by 1, 1 , and add connection between two black points.
Figure 13. We modify the path by 1, 1 , and add connection between two black points.
Figure 14. We modify the path by 1, 1 , and add connection between two black points.
241
Kunihiro Suzuki
242
The path is the one where a person reaches xaxis mtimes at the 2n step while in the negative region. The corresponding case number is then
m N 2 n m, m 2n m We need to release the condition of negative region. We add the case as shown in Figure 15. The corresponding case number is given by
m N 2 n m , m 2m 2n m
Figure 15. The path where a person cross touch x-axis m times at 2n step.
The corresponding probability is given by f 2n m
m 1 N 2 n m , m 2m n 2n m 2 m 1 N 2 n m,m n m 2n m 2 h2 n m m
(73)
We restrict our analysis that we realize mtimes crossing at 2n time step. However, we extend the analysis to where we obtain mtimes crossing during 2n time steps. We
g denote it as 2 n , which is given by m
g 2 n
1
m
2
C 2n m 2n m n
We should prove the above.
(74)
Random Walk We assume that a person touch the
x-axis
243
mtimes at the 2n k step, and then the
person does not touch the axis for the other 2k steps. The case where a person does not touch the axis for the other 2k steps is identical to the case where a person returns to the axis at 2n step. The above means that m m m 1 g 2 n f 2n f 2n
n f 2n
(75)
f 2n as below. k
We study
f 2n h2 n k k
k
k 1 2nk 2n k 2
2 n k Cn
(76)
This can be divided as two terms as
1
2
C 2nk 2nk n
1
2 2
1
2
C 2nk 2nk n
1
C 2nk 2nk n
C 2 n k 1 2 n k 1 n 1 2
1 2
2n k 1! 2n k 1 n !n ! 2 2 n k n 2n k ! 2n k 2n k n !n !
2 n k 1
2nk
2 2n k n 1 1 2 n k 2 n k Cn 2n k 2 2 n k 2n 2k 1 2 n k Cn 2n k 22 n k k 1 C 2nk 2nk n 2n k 2 k f 2n
We therefore obtain
(77)
Kunihiro Suzuki
244 m m m 1 g 2 n f 2n f 2n
1
2
C 2nm 2nm n
n f 2n
1
2
1
2
C 2 n m 1 2 n m 1 n
C 2 n m 1 2 n m 1 n
1
2
1 1 C n 1 n 1 Cn n n n 2 2 1 1 2 n m 2 n m Cn n 1 2 2 1 2 n m 2 n m Cn 2 1 2n m ! 2nm 2 n m !n !
C 2 n m 11 2 n m 11 n
n 1 Cn
(78)
We utilize that p Cq
0
for p q
g 2n g and 2n can be evaluated as 0
The
(79)
g2 n 0
g 2 n
1 22 n
1
2 n Cn
u2n
(80)
1
1
2
C 2 n 1 2 n 1 n
2n 1! 2 2n 1 n ! n ! 2n ! 1 n 2 n 1 2n 2n n ! n ! 2
1
2 n 1
1 22 n u2 n
2 n Cn
(81)
Eq. (77) corresponds to m m 1 m g 2 n g 2 n f 2n 0
(82)
Random Walk
245
This means that
g 2 n g 2 n 1
2
(83) g 2 n to g 2n is evaluated m
Let us consider the case where
n
is quite large. The ratio of
0
as g 2 n m
0 g 2 n
22 n 22 n m
2 n m Cn 2 n Cn
2 n m ! 2n n ! n ! 2n m n ! n ! 2n ! n n 1 n m 1 2m 2n 2n 1 2n m 1 n n 1 n m 1
2m
1 n n 2 1 1 n 1 1 2n
m 1 n 2 m 1 n n m 1 1 2n
(84)
Therefore, we obtain 1 m 1 1 n n n g m ln 2 n0 ln 1 m 1 g 2 n 1 2n 1 2n
m 1
k
k
ln 1 n ln 1 2n k 1
m 1
k
k
n 2n k 1
m 1
2n k
k 1
m m 1 4n
where we utilize a Taylor series for a small
(85)
xas
Kunihiro Suzuki
246
ln 1 x x
(86)
Therefore, we obtain m m 1 m 0 g 2 n g 2 n exp 4n m m 1 1 exp 4n n
(87)
0.06 n = 100
Rigorous Analytical
0.05
g
0.04 0.03 0.02 0.01 0.00
0
20
40
m
60
80
100
Figure 16. Dependence of probability on m.
Figure 16 shows the dependence of the probability on m. The probability decreases with increasing m. That is, a person rarely returns to the original points once he deviates from the original point. The rigorous model of Eq. (78) is well approximated with an analytical one of Eq. (87).
SUMMARY To summarize the results in this chapter: The probability where a person is in the positive region for the time length 2k among the total time length 2n is given by P2k ,2 n
2k ! 2 22k k !
2n 2k ! 2 22n 2k n k !
Random Walk
247
Then the corresponding probability function where the person is in the positive region in unit time for large
n is given by
P2 k ,2 n u2 k u2 n 2 k
1
1
k
n k 1
k n k
The ratio of the positive path length to the total path length is given by
2k k z 2n n The corresponding probability is given by
f z
1
z 1 z
The probability where a person returns to the original point mtimes during 2n path is given by g 2 n
1
m
2
2nm
2n m ! n m !n !
The probability function where the person returns to the origin for large n is given by g 2 n m
m m 1 exp 4n n 1
Chapter 13
A MARKOV PROCESS ABSTRACT A Marcov process assumes that the status for the next step is determined by the status of just the previous step, and not influenced by the step before the previous step. The random walk treated in the previous chapter is included in the Markov process as a special case. The Marcov process is not limited to the random walk, but it accommodates various subjects in business and economics fields, and so on. Markov process uses a transition matrix to express the change from a certain status to the next status. The elements of the transition matrix express the probabilities where one status transits to the next status. We also use an initial vector to express the initial status. The status of any time step is simply obtained by multiplying the transition matrix for corresponding time steps. We also investigate various components of the elements that express vanishing process, a supply source, and constant flux.
Keywords: random walk, transition matrix, supply source, constant flux, condition probability, network, network loop
1. INTRODUCTION In our daily life, we want to predict the future status, such as the population generation constitution, constitution of university students, a share ratio of some products, and queues for service business. We should predict the results based on the data up to now. The simplest assumption is that the next step status is determined only by the previous step status, which is called as a Markov process. If the assumption is valid, we can predict the results simply using the Markov process theory. We study the Markov process theory and show that it can treat various subjects in this chapter.
Kunihiro Suzuki
250
2. A MARKOV PROCESS FOR RANDOM WALK We treated the random walk in the previous chapter. We treat it again with a different way of a Marko process. We assume that we can only select a step to the direction of right or left. We relate the random walk to a coin toss. If we obtain a head, we step to the right, and if we get tail a tail, we then step to the left. This can be regarded as probability process. The probability to obtain a head is denoted as
p
, and one to obtain a tail is denoted
as q 1 p . Here, we assume p 1 2 . The unit of time
t is related to the step, and it is assumed to be 1 for 1 step. The
probability variable that the distance from the origin is set to be X t . The transition of variable is denoted as follows.
x 0 0 Probability 1
(1)
1 1 Probability = 2 x 1 1 Probability = 1 2
(2)
1 2 Probability = 4 1 x 2 0 Probability = 2 1 2 Probability = 4
(3)
3 1 x 3 1 3
1 8 3 Probability = 8 3 Probability = 8 1 Probability = 8
Probability =
(4)
A Markov Process
251
In general, x n can have a value of n, n 2, n 4, , n 4 , n 2 , n for t n . That is, it can have n 1 values. x n has a value of n 2k when we have tail k times among n trials. The related probability fn n 2k is given by k
1 1 f n n 2k n Ck 2 2
nk
1 n Ck 2
n
(5)
For example, x 3 3 corresponds to n 3, k 0 , and we have 3
1 1 f3 3 2 0 f3 3 3 C0 2 8
(6)
On the other hand, x 3 1 corresponds to n 3, k 1 , and we have 3
1 3 f3 3 2 1 f3 1 3 C1 2 8
(7)
3. TRANSITION PROBABILITY FOR RANDOM WALK The location at the time t n is determined by the location at the time t n 1 and it is not predicted definitely, but as a probability variables. This process can be regarded as a Markov process. The Markov process is related to conditional probability. Let us consider the transition of a probability variable X t . X t have variables at the time 0,1,2, , n 1 as a0 , a1 , a2 , , an1 . The vale at the time n is expressed by X t an , and it is assumed to the condition before. The probability is expressed as
P X n an X 0 a0 , X 1 a1 , X 2 a2 ,
, X n 1 an 1
(8)
This is the general form. In the Marko process, it is determined only by the previous step, and hence, Eq. (8) is reduce to
P X n an X n 1 an 1
(9)
Kunihiro Suzuki
252
In the random walk, the location is changed with a probability of 1 2 . Let us consider the case where the status with n 2 changes to the status with n 3 . We consider the location of x 0, 1, 2, 3 . We need to consider the condition probability from the location x 0, 1, 2, 3 with n 2 to the location x 0, 1, 2, 3 with n 3. x 3 3 Let us consider the status with n 2 to the location , which is given by
1 P X 3 3 X 2 2 2 P X 3 3 X 2 1 0 P X 3 3 X 2 0 0 P X 3 3 X 2 1 0 P X 3 3 X 2 2 0 P X 3 3 X 2 3 0
P X 3 3 X 2 3 0
(10)
x 3 2 Let us consider the status with n 2 to the location , which is given by
12 P X 3 2 X 2 2 0 1 P X 3 2 X 2 1 2 P X 3 2 X 2 0 0 P X 3 2 X 2 1 0 P X 3 2 X 2 2 0 P X 3 2 X 2 3 0
P X 3 2 X 2 3
Let us consider the status with n 2 to the location x 3 1 , which is given by
(11)
A Markov Process
1 P X 3 1 X 2 2 2 P X 3 1 X 2 1 0 1 P X 3 1 X 2 0 2 P X 3 1 X 2 1 0 P X 3 1 X 2 2 0 P X 3 1 X 2 3 0
253
P X 3 1 X 2 3 0
(12)
Let us consider the status with n 2 to the location x 3 0 , which is given by
P X 3 0 X 2 2 0 1 P X 3 0 X 2 1 2 P X 3 0 X 2 0 0 1 P X 3 0 X 2 1 2 P X 3 0 X 2 2 0 P X 3 0 X 2 3 0
P X 3 0 X 2 3 0
(13)
Let us consider the status with n 2 to the location x 3 1 , which is given by
P X 3 1 X 2 2 0 P X 3 1 X 2 1 0 1 P X 3 1 X 2 0 2 P X 3 1 X 2 1 0 1 P X 3 1 X 2 2 2 P X 3 1 X 2 3 0
P X 3 1 X 2 3 0
(14)
Kunihiro Suzuki
254
Let us consider the status with n 2 to the location x 3 2 , which is given by
P X 3 2 X 2 2 0 P X 3 2 X 2 1 0 P X 3 2 X 2 0 0 1 P X 3 2 X 2 1 2 P X 3 2 X 2 2 0 1 P X 3 2 X 2 3 2
P X 3 2 X 2 3 0
(15)
Let us consider the status with n 2 to the location x 3 3 , which is given by
P X 3 3 X 2 2 0 P X 3 3 X 2 1 0 P X 3 3 X 2 0 0 P X 3 3 X 2 1 0 1 P X 3 3 X 2 2 2 P X 3 2 X 2 3 0
P X 3 3 X 2 3 0
(16)
Summarizing above, we have a transition matrix given by 0 0 0 0 0 0 12 0 12 0 0 0 0 1 2 0 12 0 12 0 0 0 0 12 0 12 0 0 0 0 0 0 12 0 12 0 0 0 0 12 0 1 2 0 0 0 0 0 0 12 0
(17)
A Markov Process
255
Although we obtain the above matrix by considering the transition from time of 2 to the time of 3, it is the same for any time step independent of time. The transition matrix is directly related to the transition diagram as shown in Figure 1. The numbered ball shows the status. The arrow shows the transition, and the numeric related to the arrows are transition element. The dashed arrow shows vanish, which will be discussed in detail in the next section.
Figure 1. Transition diagram for random walk.
The transition matrix of Eq.(17) can be appreciated as Figure 2, where the transition status is shown. The number of a column expresses the status before the transition, and the row number expresses the status after the transition. The corresponding elements show their transition probability.
Figure 2. The relationship between transition matrix and statuses.
The initial vector is shown in Figure 3. It is directly related to the Figure 2, and it is shown that one person exists at the location of 0 as an initial status.
Kunihiro Suzuki
256
Figure 3. Initial vector with related status.
When we multiply the transition matrix by the initial vector, we obtain the status for the next step. The transition from $n = 0$ to $n = 1$ can be obtained by

$$\begin{pmatrix} 0 & \tfrac12 & 0 & 0 & 0 & 0 & 0 \\ \tfrac12 & 0 & \tfrac12 & 0 & 0 & 0 & 0 \\ 0 & \tfrac12 & 0 & \tfrac12 & 0 & 0 & 0 \\ 0 & 0 & \tfrac12 & 0 & \tfrac12 & 0 & 0 \\ 0 & 0 & 0 & \tfrac12 & 0 & \tfrac12 & 0 \\ 0 & 0 & 0 & 0 & \tfrac12 & 0 & \tfrac12 \\ 0 & 0 & 0 & 0 & 0 & \tfrac12 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \tfrac12 \\ 0 \\ \tfrac12 \\ 0 \\ 0 \end{pmatrix} \tag{18}$$

The result is identical to Eq. (2). The transition from $n = 1$ to $n = 2$ can be obtained by multiplying the transition matrix by the right side of Eq. (18), which gives

$$\begin{pmatrix} 0 \\ \tfrac14 \\ 0 \\ \tfrac12 \\ 0 \\ \tfrac14 \\ 0 \end{pmatrix} \tag{19}$$

The result is identical to Eq. (3). The transition from $n = 2$ to $n = 3$ can be obtained by multiplying the transition matrix by the right side of Eq. (19), which gives

$$\begin{pmatrix} \tfrac18 \\ 0 \\ \tfrac38 \\ 0 \\ \tfrac38 \\ 0 \\ \tfrac18 \end{pmatrix} \tag{20}$$

The result is identical to Eq. (4). Consequently, the status for $n = k$ can be obtained by

$$\begin{pmatrix} 0 & \tfrac12 & 0 & 0 & 0 & 0 & 0 \\ \tfrac12 & 0 & \tfrac12 & 0 & 0 & 0 & 0 \\ 0 & \tfrac12 & 0 & \tfrac12 & 0 & 0 & 0 \\ 0 & 0 & \tfrac12 & 0 & \tfrac12 & 0 & 0 \\ 0 & 0 & 0 & \tfrac12 & 0 & \tfrac12 & 0 \\ 0 & 0 & 0 & 0 & \tfrac12 & 0 & \tfrac12 \\ 0 & 0 & 0 & 0 & 0 & \tfrac12 & 0 \end{pmatrix}^k \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} \tag{21}$$
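As a numerical check, the distribution of Eq. (21) can be computed by repeated matrix-vector multiplication. The short sketch below (Python with NumPy, our choice for illustration rather than the book's) rebuilds the matrix of Eq. (17) and reproduces Eqs. (18)-(20).

```python
import numpy as np

# Transition matrix of Eq. (17): statuses x = -3, ..., 3.
M = np.zeros((7, 7))
for j in range(7):
    if j - 1 >= 0:
        M[j - 1, j] = 0.5  # move one step down with probability 1/2
    if j + 1 <= 6:
        M[j + 1, j] = 0.5  # move one step up with probability 1/2

v = np.zeros(7)
v[3] = 1.0  # one person at the location 0 (Figure 3)

for n in range(1, 4):
    v = M @ v  # one multiplication advances one time step
    print(f"n = {n}:", v)
# n = 3 gives (1/8, 0, 3/8, 0, 3/8, 0, 1/8), matching Eq. (20).
```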
4. TRANSITION MATRIX ELEMENTS

4.1. General Discussion for Matrix Elements

The transition matrix expresses the transition from the status of a column number j to the status of a row number i. Therefore, if we focus on a certain column j, we can inspect to which statuses it transits. For example, let us focus on the third column in Figure 2. This corresponds to the status -1. The second and fourth elements in the column are 1/2. This means that it transits to each of these statuses with a probability of 1/2. We implicitly assume that it transits to some status, and hence the sum of the probabilities should be 1.

Let us consider the first column in Figure 2, which corresponds to the status -3. Only the second row has a value of 1/2. This means that it transits to the status -2 with a probability of 1/2. Since there is no other nonzero element, the sum of the probabilities is not 1 in this column. The other transition is to the status -4. However, the matrix does not treat that status. We need to extend the matrix to cover any status, which requires a transition matrix of infinite size given by

$$\begin{pmatrix} \ddots & \ddots & & & & \\ \ddots & 0 & \tfrac12 & & & \\ & \tfrac12 & 0 & \tfrac12 & & \\ & & \tfrac12 & 0 & \tfrac12 & \\ & & & \tfrac12 & 0 & \ddots \\ & & & & \ddots & \ddots \end{pmatrix} \tag{22}$$

This is true from the standpoint of mathematics. However, we cannot treat an infinite-size matrix in numerical calculation. We therefore use a matrix cut from Eq. (22) with a sufficient size. If the maximum step is m, we should use a matrix with a size larger than $2m + 1$.
Figure 4. Assumed five statuses.
We consider a generalization of the transition matrix, where we consider five statuses. In general, one can transit to any status. Therefore, the general form of the transition matrix is given by

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} & a_{15} \\ a_{21} & a_{22} & a_{23} & a_{24} & a_{25} \\ a_{31} & a_{32} & a_{33} & a_{34} & a_{35} \\ a_{41} & a_{42} & a_{43} & a_{44} & a_{45} \\ a_{51} & a_{52} & a_{53} & a_{54} & a_{55} \end{pmatrix} \tag{23}$$

The column number corresponds to the statuses 1, 2, 3, 4, and 5, and the row number corresponds to the statuses 1, 2, 3, 4, and 5. The transition probability from the status j to the status i is given by $a_{ij}$.
Let us consider the second column as an example. The corresponding part is given by

$$\begin{pmatrix} a_{12} \\ a_{22} \\ a_{32} \\ a_{42} \\ a_{52} \end{pmatrix} \tag{24}$$

Let us consider some special cases. The first special case is staying in the same state, that is, an auto-regressive case, which is expressed by

$$a_{12} = 0,\quad a_{22} = 1,\quad a_{32} = 0,\quad a_{42} = 0,\quad a_{52} = 0 \tag{25}$$

The corresponding diagram is shown in Figure 5.

Figure 5. Auto-regressive case diagram.

The next one is vanishing, which is given by

$$a_{12} = a_{22} = a_{32} = a_{42} = a_{52} = 0 \tag{26}$$

The corresponding diagram is shown in Figure 6.

Figure 6. Vanishing process diagram.

The final case is reflection, which is given by

$$a_{12} = 1,\quad a_{22} = 0,\quad a_{32} = 0,\quad a_{42} = 0,\quad a_{52} = 0 \tag{27}$$

This case shows the reflection to the status 1. The corresponding diagram is shown in Figure 7.

Figure 7. Reflection to the status-1 process diagram.

If it is reflected to various statuses, we divide the value depending on the transition probability, which is given by

$$\begin{pmatrix} a_{12} \\ 0 \\ a_{32} \\ a_{42} \\ a_{52} \end{pmatrix} \tag{28}$$
A Markov Process
261
where

$$\sum_{i \neq j} a_{ij} = 1 \tag{29}$$
The corresponding diagram is shown in Figure 8.
Figure 8. Multi-reflection process diagram.
Figure 9. General expression for the transition from the status 2 to the other statuses.
In the general case, there are both auto-regression and multi-reflections, which is given by

$$\begin{pmatrix} a_{12} \\ a_{22} \\ a_{32} \\ a_{42} \\ a_{52} \end{pmatrix} \tag{30}$$
If there is vanishing, the relationship below holds.

$$\sum_{i\,\mathrm{(all)}} a_{ij} < 1 \tag{31}$$

If there is no vanishing, the relationship below holds.

$$\sum_{i\,\mathrm{(all)}} a_{ij} = 1 \tag{32}$$

Further, the relationship below should hold.

$$0 \le a_{ij} \le 1 \tag{33}$$

The corresponding diagram is shown in Figure 9. We have focused on the column 2 up to here. However, the same discussion is valid for the other columns. Based on the above, we can generate transition matrices for various cases.
4.2. Supply Source

We want to supply the status 1 with h at every step. We consider the transition matrix of Eq. (23), and set the initial condition as

$$\begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \end{pmatrix} \tag{34}$$

When there is no supply, the statuses at the next step can be obtained as

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} & a_{15} \\ a_{21} & a_{22} & a_{23} & a_{24} & a_{25} \\ a_{31} & a_{32} & a_{33} & a_{34} & a_{35} \\ a_{41} & a_{42} & a_{43} & a_{44} & a_{45} \\ a_{51} & a_{52} & a_{53} & a_{54} & a_{55} \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \end{pmatrix} \tag{35}$$
Next, we consider the supply to the status 1, which is given by

$$\begin{pmatrix} h \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} \tag{36}$$

We call this the supply vector. The statuses at the next step are then given by

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} & a_{15} \\ a_{21} & a_{22} & a_{23} & a_{24} & a_{25} \\ a_{31} & a_{32} & a_{33} & a_{34} & a_{35} \\ a_{41} & a_{42} & a_{43} & a_{44} & a_{45} \\ a_{51} & a_{52} & a_{53} & a_{54} & a_{55} \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \end{pmatrix} + \begin{pmatrix} h \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} \tag{37}$$

In the next step, we can obtain

$$\begin{pmatrix} a_{11} & \cdots & a_{15} \\ \vdots & & \vdots \\ a_{51} & \cdots & a_{55} \end{pmatrix} \left[ \begin{pmatrix} a_{11} & \cdots & a_{15} \\ \vdots & & \vdots \\ a_{51} & \cdots & a_{55} \end{pmatrix} \begin{pmatrix} b_1 \\ \vdots \\ b_5 \end{pmatrix} + \begin{pmatrix} h \\ 0 \\ \vdots \\ 0 \end{pmatrix} \right] + \begin{pmatrix} h \\ 0 \\ \vdots \\ 0 \end{pmatrix} \tag{38}$$

We can repeat this cycle. The status to be supplied can be chosen flexibly, and hence the supply vector is expressed in general as

$$\begin{pmatrix} h_1 \\ h_2 \\ h_3 \\ h_4 \\ h_5 \end{pmatrix} \tag{39}$$
When we utilize a matrix for the initial condition, we can then use a supply matrix as

$$\begin{pmatrix} h_1 & 0 & 0 & 0 & 0 \\ 0 & h_2 & 0 & 0 & 0 \\ 0 & 0 & h_3 & 0 & 0 \\ 0 & 0 & 0 & h_4 & 0 \\ 0 & 0 & 0 & 0 & h_5 \end{pmatrix} \tag{40}$$
4.3. Supply Source Included in the Transition Matrix

In the previous method, we need an additional calculation process to handle the supply source. It is a rather complex process, which can be improved. We add a status for the source, and express it within the framework of a transition matrix. We propose to use a status as shown in Figure 10. The initial value for the source is a, and it supplies the status i at every step with

$$h_i = a p_i \tag{41}$$

The corresponding transition matrix and initial vector are enlarged by one order, and the status after the k-th step can be evaluated as

$$\begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ p_1 & a_{11} & a_{12} & a_{13} & a_{14} & a_{15} \\ p_2 & a_{21} & a_{22} & a_{23} & a_{24} & a_{25} \\ p_3 & a_{31} & a_{32} & a_{33} & a_{34} & a_{35} \\ p_4 & a_{41} & a_{42} & a_{43} & a_{44} & a_{45} \\ p_5 & a_{51} & a_{52} & a_{53} & a_{54} & a_{55} \end{pmatrix}^k \begin{pmatrix} a \\ b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \end{pmatrix} \tag{42}$$

Figure 10. The transition diagram for the source.
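A minimal sketch of Eq. (42), under assumed illustrative numbers (the 2-by-2 matrix, p, and a below are not from the text): the source status is prepended, keeps itself with probability 1, and feeds the ordinary statuses through its column.

```python
import numpy as np

# Two ordinary statuses plus one source status (Eq. (42)).
A = np.array([[0.5, 0.2],
              [0.3, 0.6]])       # assumed 2x2 transition matrix
p = np.array([1.0, 0.0])         # the source supplies status 1 only
a = 100.0                        # source value: supplies a*p at every step

T = np.zeros((3, 3))
T[0, 0] = 1.0                    # the source keeps its own value
T[1:, 0] = p                     # supply from the source
T[1:, 1:] = A                    # ordinary transitions

v = np.array([a, 0.0, 0.0])      # initial vector (a, b1, b2)
for _ in range(50):
    v = T @ v
print(v[1:])                     # statuses approach a steady state fed by the source
```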
4.4. Vanishing Monitor

We showed that there are some vanishing processes when the sum of a certain column is less than 1. We do not express the vanishing process explicitly, and we do not have any data associated with it. However, we sometimes want to know the amount of the vanishing process. The upper part of Figure 11 shows the normal vanishing process, where the dashed line expresses it. We add a status $C_v$ and set self-regression on it. The value associated with the status $C_v$ expresses the accumulation of the vanishing.

Figure 11. The accumulation of the vanishing.
4.5. Constant Flux

In a transition matrix, a transition is expressed with a transition probability. Therefore, the flux from one status to another is proportional to the value of the status before the transition. Since the value changes with time, the flux changes accordingly. However, we want to use a constant-value flux in some cases, where we want to express a constant flux a from a status j to a status i. We propose a constant source $S_c$ as shown in Figure 12. We assume self-regression for the source, which keeps the value of the source constant. We also use an initial condition a for the source. The transition probability from the source to the status j is set to -1. This enables us to subtract a flux from the status j at every step. The transition probability from the source to the status i is set to 1. This enables us to add a flux to the status i at every step. Focusing on the statuses j and i, we obtain a constant flux a from the status j to i.

We should be careful about the value $b_j$ of the status j. If $b_j$ is smaller than a, the status cannot transfer the flux amount a, and the transferred amount should be $b_j$. Therefore, the element for the constant flux source $b_{S_c}$ should be

$$b_{S_c} = \min\left(a,\ b_j\right) \tag{43}$$
Figure 12. Transition diagram for a constant flux from a status j to a status i.
4.6. Initial Condition

Next, we consider the initial condition. Let us consider the transition matrix of Eq. (21), where the initial vector corresponds to the status that one person starts from the status 0. The matrix operation predicts the probability of how far the person goes. If we set 10 instead of 1, it predicts how many members go to each location. If we set certain numbers of members for various statuses, we can predict how many members can be expected for each status.
$$\begin{pmatrix} 0 & \tfrac12 & 0 & 0 & 0 & 0 & 0 \\ \tfrac12 & 0 & \tfrac12 & 0 & 0 & 0 & 0 \\ 0 & \tfrac12 & 0 & \tfrac12 & 0 & 0 & 0 \\ 0 & 0 & \tfrac12 & 0 & \tfrac12 & 0 & 0 \\ 0 & 0 & 0 & \tfrac12 & 0 & \tfrac12 & 0 \\ 0 & 0 & 0 & 0 & \tfrac12 & 0 & \tfrac12 \\ 0 & 0 & 0 & 0 & 0 & \tfrac12 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ 3 \\ 5 \\ 2 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ \tfrac32 \\ \tfrac52 \\ \tfrac52 \\ \tfrac52 \\ 1 \\ 0 \end{pmatrix} \tag{44}$$

In the above, we set 5 members at the status 0, 3 members at the status -1, and 2 members at the status 1, and predict the member distribution after the next step.

We can use an initial matrix instead of an initial vector. The resultant data then show the detailed transition for each initial member:

$$\begin{pmatrix} 0 & \tfrac12 & 0 & 0 & 0 & 0 & 0 \\ \tfrac12 & 0 & \tfrac12 & 0 & 0 & 0 & 0 \\ 0 & \tfrac12 & 0 & \tfrac12 & 0 & 0 & 0 \\ 0 & 0 & \tfrac12 & 0 & \tfrac12 & 0 & 0 \\ 0 & 0 & 0 & \tfrac12 & 0 & \tfrac12 & 0 \\ 0 & 0 & 0 & 0 & \tfrac12 & 0 & \tfrac12 \\ 0 & 0 & 0 & 0 & 0 & \tfrac12 & 0 \end{pmatrix} \begin{pmatrix} 0 & & & & & & \\ & 0 & & & & & \\ & & 3 & & & & \\ & & & 5 & & & \\ & & & & 2 & & \\ & & & & & 0 & \\ & & & & & & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & \tfrac32 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \tfrac52 & 0 & 0 & 0 \\ 0 & 0 & \tfrac32 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & \tfrac52 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix} \tag{45}$$

Summing up the row data, we obtain

$$\begin{pmatrix} 0 \\ \tfrac32 \\ \tfrac52 \\ \tfrac52 \\ \tfrac52 \\ 1 \\ 0 \end{pmatrix} \tag{46}$$

Comparing the results of Eqs. (44) and (46), we can check that both results are the same.
5. VARIOUS EXAMPLES

We apply the Markov process to various examples here.

5.1. Promotion of University Student Grades

We trace the promotion of university students through the grades. We assume 4 grades to graduate from the university. Gates for promotion are set at the end of the second and fourth grades. A student can try each gate twice. When a student fails a gate the second time, he withdraws from the university. In addition, some students withdraw spontaneously at every stage. We denote the corresponding statuses as below.

G1: The first grade student
G2: The second grade student
G22: The second grade student who failed the first gate one time
G3: The third grade student
G4: The fourth grade student
G44: The fourth grade student who failed the second gate one time
F: Graduated student
D: Withdrawn student

We assume that 1000 students enter the university. 5% of the students withdraw spontaneously at each step. 60% of the second grade students pass the gate. 80% of the G22 students pass the gate, and the other 20% withdraw. 70% of the fourth grade students graduate from the university. 70% of the G44 students graduate from the university, and the other 30% withdraw. The corresponding transition diagram is shown in Figure 13. We show the corresponding transition matrix (Figure 14) and initial vector (Figure 15). The results are shown in Table 1. In the final stage, 6 years later, about 70% of the students have graduated, and the others have withdrawn.
Figure 13. Diagram for the promotion of university students.

Figure 14. Transition matrix for university student promotion.
Figure 15. Initial vector for university student promotion.
Table 1. Year dependence of the student number for each status

Status  Initial  1st year  2nd year  3rd year  4th year  5th year  6th year
G1      1000     0         0         0         0         0         0
G2      0        950       0         0         0         0         0
G22     0        0         332.5     0         0         0         0
G3      0        0         570       266       0         0         0
G4      0        0         0         541.5     252.7     0         0
G44     0        0         0         0         135.4     63.2      0
F       0        0         0         0         379.1     650.7     694.9
D       0        50        97.5      192.5     232.9     286.1     305.1
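Since Figures 14 and 15 are not reproduced here, the sketch below assembles the transition matrix from the rates quoted in the text; the status ordering is our assumption. Iterating it reproduces Table 1.

```python
import numpy as np

# Statuses: G1, G2, G22, G3, G4, G44, F, D (column: before, row: after).
T = np.zeros((8, 8))
T[1, 0], T[7, 0] = 0.95, 0.05                 # G1: promoted or withdraws
T[3, 1], T[2, 1], T[7, 1] = 0.60, 0.35, 0.05  # G2: pass gate / fail once / withdraw
T[3, 2], T[7, 2] = 0.80, 0.20                 # G22: pass gate / withdraw
T[4, 3], T[7, 3] = 0.95, 0.05                 # G3: promoted or withdraws
T[6, 4], T[5, 4], T[7, 4] = 0.70, 0.25, 0.05  # G4: graduate / fail once / withdraw
T[6, 5], T[7, 5] = 0.70, 0.30                 # G44: graduate / withdraw
T[6, 6] = T[7, 7] = 1.0                       # F and D accumulate

v = np.zeros(8)
v[0] = 1000.0                                 # 1000 entering students
for year in range(1, 7):
    v = T @ v
print(np.round(v, 1))                         # year 6: F ~ 694.9, D ~ 305.1 (Table 1)
```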
5.2. Promotion of University Grades in the Steady State

In the previous example, we traced the promotion process of 1000 entering students. Here, we analyze the steady state of the university under the condition that 1000 students enter the university every year. This can be analyzed using a supply source. The corresponding transition diagram is shown in Figure 16. We do not monitor the accumulation of graduated and withdrawn students, and hence eliminate the corresponding accumulation.
Figure 16. Diagram for university student promotion with a supply source.
The source S expresses that 1000 students enter the university every year. The corresponding transition matrix and initial vector are shown in Figure 17.
Figure 17. Transition matrix and initial vector for promotion of university student under the condition that 1000 students enter every year.
The results after k years can be evaluated as

$$\begin{pmatrix}
1.00 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1.00 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.95 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.35 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.60 & 0.80 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.95 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.25 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.70 & 0.70 & 0 & 0 \\
0 & 0.05 & 0.05 & 0.20 & 0.05 & 0.05 & 0.30 & 0 & 0
\end{pmatrix}^k
\begin{pmatrix} 1000 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} \tag{47}$$

where the rows and columns are ordered as S, G1, G2, G22, G3, G4, G44, F, D.
Table 2 shows the results, and Table 3 shows the student numbers in the steady state. The steady state is established after 7 years. The first grade student number is 1000. The second grade student number is larger than 1000, which is attributed to the fact that some students cannot pass the gate. The third grade student number is 836, which reflects the spontaneous withdrawals and the students withdrawn at the gate. The fourth grade student number is 993. This is larger than the third grade number, which is attributed to the fact that some students cannot graduate. The graduated student number is 695 every year. The withdrawn student number is 305 every year.

Table 2. Time dependence of the student number of each status under the condition that 1000 students enter the university every year

Status  Step0  Step1  Step2  Step3  Step4  Step5  Step6  Step7  Step8
S       1000   1000   1000   1000   1000   1000   1000   1000   1000
G1      0      1000   1000   1000   1000   1000   1000   1000   1000
G2      0      0      950    950    950    950    950    950    950
G22     0      0      0      332.5  332.5  332.5  332.5  332.5  332.5
G3      0      0      0      570    836    836    836    836    836
G4      0      0      0      0      541.5  794.2  794.2  794.2  794.2
G44     0      0      0      0      0      135.4  198.6  198.6  198.6
F       0      0      0      0      0      379.1  650.7  694.9  694.9
D       0      0      50     97.5   192.5  232.9  286.1  305.1  305.1

Table 3. The student numbers in the steady state

G1     G2      G3    G4     F
1000   1282.5  836   992.8  694.9
5.3. Population Problem

We treat a population problem, where we evaluate the time dependence of the constitution of the number of people in each age region. We divide the ages into units of 10 years. We assume that 95% of people survive to the next division up to the 50s. The survival ratio is 80% for the 60s, 60% for the 70s, 40% for the 80s, and 0 for the 90s. People in their 20s and 30s generate babies with the ratio $r_B$. We neglect the difference between male and female in this analysis.
Figure 18. Diagram for population problem.
The corresponding transition diagram is shown in Figure 18, and the corresponding transition matrix and initial vector are given below.

$$\begin{pmatrix}
0 & 0 & r_B & r_B & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0.95 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0.95 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.95 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0.95 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0.95 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0.95 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0.80 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.60 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.40 & 0 & 0 \\
0.05 & 0.05 & 0.05 & 0.05 & 0.05 & 0.05 & 0.20 & 0.40 & 0.60 & 1 & 0
\end{pmatrix}
\qquad
\begin{pmatrix} 100 \\ 100 \\ 100 \\ 100 \\ 100 \\ 100 \\ 100 \\ 100 \\ 100 \\ 100 \\ 0 \end{pmatrix} \tag{48}$$

The column and row numbers are ordered as the 0, 10, 20, ..., 90 age regions, and the final one corresponds to the status D. The unit for the initial condition is ten thousand.
We evaluated the time evolution for 5 steps, that is, the change over 50 years, which is shown in Table 4. The total population monotonically decreases with time with $r_B$ of 0.5, and it increases with $r_B$ of 0.7, although it decreases at first. This initial decrease is influenced by the initial condition.

Table 4. Time evolution of population

r_B = 0.5
Age   Step0  Step1  Step2  Step3  Step4  Step5
0     100    100    95     90     88     86
10    100    95     95     90     86     84
20    100    95     90     90     86     81
30    100    95     90     86     86     81
40    100    95     90     86     81     81
50    100    95     90     86     81     77
60    100    95     90     86     81     77
70    100    80     76     72     69     65
80    100    60     48     46     43     41
90    100    40     24     19     18     17
D     0      250    156    129    119    113
Sum   1000   850    789    751    720    692

r_B = 0.7
Age   Step0  Step1  Step2  Step3  Step4  Step5
0     100    140    133    126    148    168
10    100    95     133    126    120    141
20    100    95     90     126    120    114
30    100    95     90     86     120    114
40    100    95     90     86     81     114
50    100    95     90     86     81     77
60    100    95     90     86     81     77
70    100    80     76     72     69     65
80    100    60     48     46     43     41
90    100    40     24     19     18     17
D     0      250    158    133    124    122
Sum   1000   890    865    859    883    930
It is important whether the total population increases or decreases with time. The source of the people is the 20s and 30s. Therefore, the people in these age regions are important, and we add some analysis. We assume that the population of any age is constant. We set the population of the 20s as $n_{20}$ and that of the 30s as $n_{30}$. Both are related by

$$n_{30} = 0.95 n_{20} \tag{49}$$

The population of the 0s is denoted as $n_0$, and is given by

$$n_0 = r_B \left( n_{20} + n_{30} \right) = 1.95 r_B n_{20} \tag{50}$$

The population of the 10s is denoted as $n_{10}$, and is given by

$$n_{10} = 0.95 n_0 = 0.95 \times 1.95 r_B n_{20} \tag{51}$$

Therefore, the population of the 20s is given by

$$n_{20} = 0.95 n_{10} = 0.95^2 \times 1.95 r_B n_{20} \tag{52}$$

Therefore, the $r_B$ for a constant total population is evaluated as

$$r_B = \frac{1}{0.95^2 \times 1.95} = 0.568 \tag{53}$$

Table 5 shows the time evolution of the population with the above $r_B$.
The total population stays constant as expected, at about 760. We can improve this analysis by using more realistic values.

Table 5. Time evolution with the optimized r_B (r_B = 0.57)

Age   Step0  Step1  Step2  Step3  Step4  Step5  Step6  Step7  Step8  Step9  Step10
0     100    114    108    103    107    111    105    105    109    108    105
10    100    95     108    103    97     102    105    100    100    103    102
20    100    95     90     103    97     93     97     100    95     95     98
30    100    95     90     86     97     93     88     92     95     90     90
40    100    95     90     86     81     93     88     84     87     90     86
50    100    95     90     86     81     77     88     84     79     83     86
60    100    95     90     86     81     77     74     84     79     75     79
70    100    80     76     72     69     65     62     59     67     63     60
80    100    60     48     46     43     41     39     37     35     40     38
90    100    40     24     19     18     17     16     16     15     14     16
D     0      250    156    130    121    116    112    108    106    107    107
Sum   1000   864    815    788    774    768    762    759    761    762    760
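The break-even ratio of Eq. (53) can also be checked numerically: the sketch below builds the matrix of Eq. (48) and confirms that the living population settles near 760 for $r_B \approx 0.568$.

```python
import numpy as np

rB = 1.0 / (0.95**2 * 1.95)            # Eq. (53): ~0.568
surv = [0.95] * 6 + [0.80, 0.60, 0.40] # survival: 0s..50s, then 60s, 70s, 80s

T = np.zeros((11, 11))                 # ages 0, 10, ..., 90 and D
T[0, 2] = T[0, 3] = rB                 # the 20s and 30s generate babies
for i, s in enumerate(surv):
    T[i + 1, i] = s                    # survive to the next age division
    T[10, i] = 1.0 - s                 # the rest go to D
T[10, 9] = 1.0                         # nobody survives the 90s

v = np.array([100.0] * 10 + [0.0])
for step in range(10):
    v = T @ v
print(round(v[:10].sum()))             # total living population, ~760 (Table 5)
```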
5.4. Share Rate of a Product

We can evaluate the share rate of a product. We assume that companies A, B, and C make the same product, and they are competitive. We obtain data as shown in Table 6, where the change of the selected company is shown for 100 people.

Table 6. Change of company product

                 1st time
2nd time    A     B     C
A           10    5     10
B           25    8     12
C           15    7     8
Sum         50    20    30    (Total 100)
The corresponding transition matrix and diagram are shown in Table 7 and Figure 19. The calculation results are shown in Table 8. Company A has a high share rate at first, but it decreases with the time step, and company B increases its share rate.

Table 7. Transition matrix for the change of product

                 1st time
2nd time    A      B      C
A           0.20   0.25   0.33
B           0.50   0.40   0.40
C           0.30   0.35   0.27

Table 8. Results of the transition between companies

     Step0  Step1  Step2  Step3
A    0.5    0.25   0.26   0.26
B    0.2    0.45   0.43   0.43
C    0.3    0.3    0.31   0.31
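A short sketch reproducing Table 8: the columns of Table 7 serve as the transition matrix, and the initial share (0.5, 0.2, 0.3) is iterated.

```python
import numpy as np

# Transition matrix of Table 7 (column: 1st time, row: 2nd time).
T = np.array([[0.20, 0.25, 0.33],
              [0.50, 0.40, 0.40],
              [0.30, 0.35, 0.27]])
share = np.array([0.5, 0.2, 0.3])   # Step0 shares of A, B, C

for step in range(1, 4):
    share = T @ share
    print(f"Step{step}:", np.round(share, 2))
# converges to roughly A: 0.26, B: 0.43, C: 0.31 (Table 8)
```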
Figure 19. Diagram for the transition between companies.

Figure 20. Schematic figure for repeat customers.
5.5. Repeat Customers

We assume a trade area where N customers can possibly use a shop. We want to evaluate the time evolution of the repeat customers of the shop; that is, we want to know the number of repeat customers in the trade area, and the number of repeat customers who visit the shop. First, we need to define repeat and non-repeat customers. We treat customers who use the shop at least one time. A repeat customer is defined as one who uses the shop more than 10 times per year, the others are defined as non-repeat customers, and we know the average using frequencies as

$$f_1 = 30, \qquad f_2 = 5 \tag{54}$$

We show a corresponding schematic figure (Figure 20). We set the statuses of the customers as repeat ($R$) and non-repeat ($NR$). The corresponding Markov transition process is shown in Figure 21. The transition probability from a repeat customer to a non-repeat customer is $\alpha$, and that of the opposite case is $\beta$.

Figure 21. Markov transition process for a repeat customer.

Table 9. The transition of customers over two continuous years

                Last year
This year    R      NR     Sum
R            1300   1500   2800
NR           200    2000   2200
Sum          1500   3500   5000
We obtained data for 5000 customers over two continuous years, as shown in Table 9. We can form a transition matrix from these data, given by

$$\begin{pmatrix} y_1\left(k\right) \\ y_2\left(k\right) \end{pmatrix} = \begin{pmatrix} 0.87 & 0.43 \\ 0.13 & 0.57 \end{pmatrix}^k \begin{pmatrix} 0 \\ 5000 \end{pmatrix} \tag{55}$$

where we assume the initial condition that all customers are non-repeat. Therefore, the parameters for the transition in Figure 21 are given by

$$\alpha = 0.13, \qquad \beta = 0.43 \tag{56}$$

$y_1(k)$ denotes the repeat customer number at the step k, and $y_2(k)$ denotes the non-repeat customer number at the step k. Figure 22 shows the results evaluated with Eq. (55). The repeat customer number increases with the time step, and then saturates.
Figure 22. Time evolution of the repeat and non-repeat customer numbers in the trade area (Q = 5000, γ = 6).
The above results show the repeat and non-repeat customer numbers in the trade area. We further want to know the numbers for the customers who actually visit the shop. The total customer number in the trade area is denoted as $Q$, and is given by

$$Q = y_1 + y_2 \tag{57}$$
We also know the average using frequencies $f_1$ and $f_2$ of the repeat and non-repeat customers. The number of customers who use the shop per day, denoted as $G$, is then given by

$$G = \frac{y_1 f_1 + y_2 f_2}{N} \tag{58}$$

where N is the number of days in a year, and we set it as 365. The repeat and non-repeat customer numbers per day are then given by

$$x_1 = \frac{y_1 f_1}{N} \tag{59}$$

$$x_2 = \frac{y_2 f_2}{N} \tag{60}$$

We define the ratio $\gamma = f_1 / f_2$. Figure 23 shows the time evolution of the repeat and non-repeat customer numbers who visit the shop per day. The repeat customer number increases with the time step and then saturates, which is the same feature as that of the customer numbers in the trade area.
Figure 23. Time evolution of the repeat and non-repeat customers who visit the shop per day (Q = 5000, γ = 6).
We can see in Figure 22 and Figure 23 that the repeat customer number increases and then saturates, as pointed out previously. We evaluate the saturated number. In the saturated condition, the status is expected not to change, and we can set

$$\begin{aligned} y_1 &= \left(1-\alpha\right) y_1 + \beta y_2 \\ y_2 &= \alpha y_1 + \left(1-\beta\right) y_2 \end{aligned} \tag{61}$$

We then obtain

$$\alpha y_1 = \beta y_2 \tag{62}$$

Finally, we obtain the saturated repeat customer number in the trade region as

$$y_1 = \frac{Q}{1 + \dfrac{\alpha}{\beta}} \tag{63}$$

In this example, we obtain

$$y_1 = \frac{5000}{1 + \dfrac{0.13}{0.43}} = 3839 \tag{64}$$

The saturated number of repeat customers who visit the shop per day is evaluated as

$$x_1 = \frac{f_1 y_1}{365} = \frac{30 \times 3839}{365} = 316 \tag{65}$$
These values reproduce the results in Figure 22 and Figure 23.
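The saturated values of Eqs. (64) and (65) can be confirmed by iterating Eq. (55), as in the minimal sketch below.

```python
import numpy as np

T = np.array([[0.87, 0.43],
              [0.13, 0.57]])       # transition matrix of Eq. (55)
y = np.array([0.0, 5000.0])        # all customers are non-repeat at first

for _ in range(50):                # enough steps to saturate
    y = T @ y

print(round(y[0]))                 # ~3839 repeat customers (Eq. (64))
print(round(30 * y[0] / 365))      # ~316 repeat visitors per day (Eq. (65))
```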
5.6. Queue with a Single Teller Window

Let us consider a service trade. When we go to a bank, we wait if the teller window is full, and get service if it is vacant. We can treat this subject using a transition matrix. We assume one teller and 5 for the maximum queue number. We assign the status number to the number of people who are in the system. Therefore, the statuses are 0, 1, 2, 3, 4, 5, 6. We denote the corresponding probabilities as $P(0), P(1), \ldots, P(6)$. This system is called M/M/1(6), and Figure 24 shows this process schematically.
Figure 24. Diagram for M/M/1(6) system.
In this system, the number of persons in the system increases by one with a probability $\lambda$ per unit time. This means that one person enters the system with this probability. The number of persons in the system decreases by one with a probability $\mu$ per unit time. This means that one member's service is finished. We need to set the unit time so that two events do not occur simultaneously. For example, if 5 members enter the system per hour, we should use a unit time of one minute instead of one hour. The probability to increase the number is then evaluated as

$$\lambda = \frac{5}{60} \tag{66}$$

The unit we use decides the time step associated with one multiplication of the transition matrix. In this example, we use a probability per minute. Therefore, the time step is one minute. We consider all the statuses from here.

Let us consider the status 0. We consider what will happen in the next step. In this status, there is no member being served at the teller. If a customer does not come, the status is unchanged. The corresponding probability is $1 - \lambda$. If a customer comes, the status is changed to the status 1. The corresponding probability is $\lambda$.
We consider the status 1. If a customer does not come and the service is not finished, the status is kept. The corresponding probability is $\left(1-\lambda\right)\left(1-\mu\right) \simeq 1-\lambda-\mu$. If a person comes, the status is changed to the status 2. The corresponding probability is $\lambda$. If the service is finished, the status is changed to the status 0. The corresponding probability is $\mu$.

The statuses 2, 3, 4, and 5 are the same as the status 1.

We consider the status 6. If the service is not finished, the status is unchanged. The corresponding probability is $1 - \mu$. We do not care whether a customer comes or not; even if a customer comes, he returns and does not enter the system. If the service is finished, the status is changed to the status 5. The corresponding probability is $\mu$.

The corresponding transition matrix is given by

$$\begin{pmatrix}
1-\lambda & \mu & 0 & 0 & 0 & 0 & 0 \\
\lambda & 1-\lambda-\mu & \mu & 0 & 0 & 0 & 0 \\
0 & \lambda & 1-\lambda-\mu & \mu & 0 & 0 & 0 \\
0 & 0 & \lambda & 1-\lambda-\mu & \mu & 0 & 0 \\
0 & 0 & 0 & \lambda & 1-\lambda-\mu & \mu & 0 \\
0 & 0 & 0 & 0 & \lambda & 1-\lambda-\mu & \mu \\
0 & 0 & 0 & 0 & 0 & \lambda & 1-\mu
\end{pmatrix} \tag{67}$$

When we evaluate $P(0), P(1), \ldots, P(6)$, we can evaluate the number of persons who are in the system, $L$, and the number who are waiting for the service, $L_q$, as

$$L = 1 \cdot P\left(1\right) + 2 P\left(2\right) + 3 P\left(3\right) + 4 P\left(4\right) + 5 P\left(5\right) + 6 P\left(6\right) \tag{68}$$

$$L_q = 1 \cdot P\left(2\right) + 2 P\left(3\right) + 3 P\left(4\right) + 4 P\left(5\right) + 5 P\left(6\right) \tag{69}$$
Table 10. The probability for each status, the expected number in the system, and the queue number

P(0)   P(1)   P(2)   P(3)   P(4)   P(5)   P(6)   L      Lq
0.35   0.24   0.16   0.10   0.07   0.05   0.03   1.56   0.92

We assume that $\lambda = 0.2$ and $\mu = 0.3$, and use the initial condition $P(0) = 1$ with the others 0. The initial condition corresponds to one where no member exists in the system. We can expect that the system forms a steady state after we perform 1000 step cycles. Table 10 shows the results. $L$ is 1.56 and $L_q$ is 0.92.
5.7. Queue with Multiple Teller Windows

We assume 3 tellers and 3 for the maximum queue number. We assign the status number to the number of people who are in the system. Therefore, the statuses are 0, 1, 2, 3, 4, 5, 6. We denote the corresponding probabilities as $P(0), P(1), \ldots, P(6)$. This system is called M/M/3(6), and the corresponding schematic figure is shown in Figure 25.

Figure 25. Diagram for M/M/3(6) system.
In this system, the number of persons in the system increases by one with a probability $\lambda$ per unit time. This means that one person enters the system with this probability. The number of persons in the system decreases by one with a probability $\mu$ per unit time for the status 1, $2\mu$ for the status 2, and $3\mu$ for the statuses 3, 4, 5, and 6.

Let us consider the status 0. We consider what will happen in the next step. In this status, there is no member being served at the tellers. If a customer does not come, the status is unchanged. The corresponding probability is $1-\lambda$. If a customer comes, the status is changed to the status 1. The corresponding probability is $\lambda$.

We consider the status 1. If a customer does not come and the service is not finished, the status is kept. The corresponding probability is $\left(1-\lambda\right)\left(1-\mu\right) \simeq 1-\lambda-\mu$. If a person comes, the status is changed to the status 2. The corresponding probability is $\lambda$. If the service is finished, the status is changed to the status 0. The corresponding probability is $\mu$.

We consider the status 2. If a customer does not come and no service is finished, the status is kept. The corresponding probability is $\left(1-\lambda\right)\left(1-2\mu\right) \simeq 1-\lambda-2\mu$. If a person comes, the status is changed to the status 3. The corresponding probability is $\lambda$. If a service is finished, the status is changed to the status 1. The corresponding probability is $2\mu$.

We consider the status 3. If a customer does not come and no service is finished, the status is kept. The corresponding probability is $\left(1-\lambda\right)\left(1-3\mu\right) \simeq 1-\lambda-3\mu$. If a person comes, the status is changed to the status 4. The corresponding probability is $\lambda$. If a service is finished, the status is changed to the status 2. The corresponding probability is $3\mu$.

The statuses 4 and 5 are the same as the status 3.

We consider the status 6. If no service is finished, the status is unchanged. The corresponding probability is $1-3\mu$. We do not care whether a customer comes or not; even if a customer comes, he returns and does not enter the system. If a service is finished, the status is changed to the status 5. The corresponding probability is $3\mu$.

The corresponding transition matrix is given by

$$\begin{pmatrix}
1-\lambda & \mu & 0 & 0 & 0 & 0 & 0 \\
\lambda & 1-\lambda-\mu & 2\mu & 0 & 0 & 0 & 0 \\
0 & \lambda & 1-\lambda-2\mu & 3\mu & 0 & 0 & 0 \\
0 & 0 & \lambda & 1-\lambda-3\mu & 3\mu & 0 & 0 \\
0 & 0 & 0 & \lambda & 1-\lambda-3\mu & 3\mu & 0 \\
0 & 0 & 0 & 0 & \lambda & 1-\lambda-3\mu & 3\mu \\
0 & 0 & 0 & 0 & 0 & \lambda & 1-3\mu
\end{pmatrix} \tag{70}$$
We assume that $\lambda = 0.2$ and $\mu = 0.1$, and use the initial condition $P(0) = 1$ with the others 0. The initial condition corresponds to one where no member exists in the system. We can expect that the system forms a steady state after we perform 1000 step cycles. Table 11 shows the results. $L$ is 2.30 and $L_q$ is 0.40.

Table 11. The probability for each status, the expected number in the system, and the queue number for M/M/3(6)

P(0)   P(1)   P(2)   P(3)   P(4)   P(5)   P(6)   L      Lq
0.12   0.24   0.24   0.16   0.11   0.07   0.05   2.30   0.40
The Markov process using a transition matrix can accommodate a non-steady state, and can include the time dependence of $\lambda$ and $\mu$ by changing the matrix elements depending on the time.
5.8. Blood Type Transition

We consider the constitution and evolution of blood types. The constituent ratios for Japanese people are AB-type: 10%, A-type: 35%, B-type: 25%, and O-type: 30%. We analyze the time evolution of the types.

We simplify some points to analyze this subject. We assume that the total population is invariable; that is, the number of dead people and that of born people are the same. The generation change occurs simultaneously. There is no flow from other regions. The ratios are the same for male and female. This means that the numbers of the fundamental elements A, B, and O are constant.

The blood types of the next generation are determined from the current blood type constitution, and hence the evolution can be regarded as a Markov process. However, we cannot construct a corresponding matrix with the above data alone. We have four blood types AB, A, B, and O. However, there are 6 kinds if we consider the detailed genotypes AB, AA, AO, BB, BO, and OO. In the analysis, we use these fundamental 6 kinds of blood types, although we have no directly corresponding data. We assign the statuses X1, X2, X3, X4, X5, and X6 to the genotypes. The crosses of these 6 types decide the blood type constitution of the next generation as shown in Table 12. A crossing always generates 4 components, but some of them are the same.

Table 12. Generation of blood type

          X1 (AB)        X2 (AA)        X3 (AO)        X4 (BB)        X5 (BO)        X6 (OO)
X1 (AB)   AA,AB,AB,BB    AA,AA,AB,AB    AA,AO,AB,BO    AB,AB,BB,BB    AB,AO,BB,BO    AO,AO,BO,BO
X2 (AA)   AA,AA,AB,AB    AA,AA,AA,AA    AA,AO,AA,AO    AB,AB,AB,AB    AB,AO,AB,AO    AO,AO,AO,AO
X3 (AO)   AA,AO,AB,BO    AA,AO,AA,AO    AA,AO,AO,OO    AB,AB,BO,BO    AB,AO,BO,OO    AO,AO,OO,OO
X4 (BB)   AB,AB,BB,BB    AB,AB,AB,AB    AB,AB,BO,BO    BB,BB,BB,BB    BB,BO,BB,BO    BO,BO,BO,BO
X5 (BO)   AB,AO,BB,BO    AB,AO,AB,AO    AB,AO,BO,OO    BB,BO,BB,BO    BB,BO,BO,OO    BO,BO,OO,OO
X6 (OO)   AO,AO,BO,BO    AO,AO,AO,AO    AO,AO,OO,OO    BO,BO,BO,BO    BO,BO,OO,OO    OO,OO,OO,OO
We set the current constitution as $X_i^{(n)}$, and the next generation constitution as $X_i^{(n+1)}$. We then obtain

$$\begin{aligned}
X_1^{(n+1)} ={}& \frac{2}{4} X_1^{(n)} X_1^{(n)} + \frac{2}{4}\, 2 X_1^{(n)} X_2^{(n)} + \frac{1}{4}\, 2 X_1^{(n)} X_3^{(n)} + \frac{2}{4}\, 2 X_1^{(n)} X_4^{(n)} + \frac{1}{4}\, 2 X_1^{(n)} X_5^{(n)} \\
&+ \frac{4}{4}\, 2 X_2^{(n)} X_4^{(n)} + \frac{2}{4}\, 2 X_2^{(n)} X_5^{(n)} + \frac{2}{4}\, 2 X_3^{(n)} X_4^{(n)} + \frac{1}{4}\, 2 X_3^{(n)} X_5^{(n)}
\end{aligned} \tag{71}$$

$$X_2^{(n+1)} = \frac{1}{4} X_1^{(n)} X_1^{(n)} + \frac{2}{4}\, 2 X_1^{(n)} X_2^{(n)} + \frac{1}{4}\, 2 X_1^{(n)} X_3^{(n)} + \frac{4}{4} X_2^{(n)} X_2^{(n)} + \frac{2}{4}\, 2 X_2^{(n)} X_3^{(n)} + \frac{1}{4} X_3^{(n)} X_3^{(n)} \tag{72}$$

$$\begin{aligned}
X_3^{(n+1)} ={}& \frac{1}{4}\, 2 X_1^{(n)} X_3^{(n)} + \frac{1}{4}\, 2 X_1^{(n)} X_5^{(n)} + \frac{2}{4}\, 2 X_1^{(n)} X_6^{(n)} + \frac{2}{4}\, 2 X_2^{(n)} X_3^{(n)} + \frac{2}{4}\, 2 X_2^{(n)} X_5^{(n)} \\
&+ \frac{4}{4}\, 2 X_2^{(n)} X_6^{(n)} + \frac{2}{4} X_3^{(n)} X_3^{(n)} + \frac{1}{4}\, 2 X_3^{(n)} X_5^{(n)} + \frac{2}{4}\, 2 X_3^{(n)} X_6^{(n)}
\end{aligned} \tag{73}$$

$$X_4^{(n+1)} = \frac{1}{4} X_1^{(n)} X_1^{(n)} + \frac{2}{4}\, 2 X_1^{(n)} X_4^{(n)} + \frac{1}{4}\, 2 X_1^{(n)} X_5^{(n)} + \frac{4}{4} X_4^{(n)} X_4^{(n)} + \frac{2}{4}\, 2 X_4^{(n)} X_5^{(n)} + \frac{1}{4} X_5^{(n)} X_5^{(n)} \tag{74}$$

$$\begin{aligned}
X_5^{(n+1)} ={}& \frac{1}{4}\, 2 X_1^{(n)} X_3^{(n)} + \frac{1}{4}\, 2 X_1^{(n)} X_5^{(n)} + \frac{2}{4}\, 2 X_1^{(n)} X_6^{(n)} + \frac{2}{4}\, 2 X_3^{(n)} X_4^{(n)} + \frac{1}{4}\, 2 X_3^{(n)} X_5^{(n)} \\
&+ \frac{2}{4}\, 2 X_4^{(n)} X_5^{(n)} + \frac{4}{4}\, 2 X_4^{(n)} X_6^{(n)} + \frac{2}{4} X_5^{(n)} X_5^{(n)} + \frac{2}{4}\, 2 X_5^{(n)} X_6^{(n)}
\end{aligned} \tag{75}$$

$$X_6^{(n+1)} = \frac{1}{4} X_3^{(n)} X_3^{(n)} + \frac{1}{4}\, 2 X_3^{(n)} X_5^{(n)} + \frac{2}{4}\, 2 X_3^{(n)} X_6^{(n)} + \frac{1}{4} X_5^{(n)} X_5^{(n)} + \frac{2}{4}\, 2 X_5^{(n)} X_6^{(n)} + \frac{4}{4} X_6^{(n)} X_6^{(n)} \tag{76}$$
We easily obtain a steady state after about two or three cycle steps. The steady state depends significantly on the initial condition. Since we do not know the real ratios of the detailed genotype constitution, we use a fitting parameter $\gamma$, and the ratios are then given by

$$\begin{aligned} r_{AA} &= 0.35\gamma, & r_{AO} &= 0.35\left(1-\gamma\right) \\ r_{BB} &= 0.25\gamma, & r_{BO} &= 0.25\left(1-\gamma\right) \end{aligned} \tag{77}$$

We obtained good agreement with $\gamma = 0.2$, as shown in Table 13. The steady state is established at the first step cycle, and it is not far from the initial state. According to the results, the genotypes AA and BB are rare cases among the A and B-type people.

Table 13. Time evolution of the blood type constitution

Type   Step0   Step1   Step2   Ratio   Blood type
X1     0.10    0.10    0.10    0.10    AB
X2     0.07    0.07    0.07    0.35    A
X3     0.28    0.28    0.28
X4     0.05    0.04    0.04    0.26    B
X5     0.20    0.22    0.22
X6     0.30    0.29    0.29    0.29    O
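A sketch of the generation update: the pairwise sums of Eqs. (71)-(76) factorize into the allele (gamete) frequencies, which the code below exploits; with γ = 0.2 it reproduces Table 13.

```python
import numpy as np

def next_generation(X):
    """One generation step, equivalent to Eqs. (71)-(76).

    The pairwise crossing sums factorize into the gamete frequencies
    pA, pB, pO, which keeps the code short."""
    X1, X2, X3, X4, X5, X6 = X            # AB, AA, AO, BB, BO, OO
    pA = X2 + (X1 + X3) / 2
    pB = X4 + (X1 + X5) / 2
    pO = X6 + (X3 + X5) / 2
    return np.array([2 * pA * pB, pA**2, 2 * pA * pO,
                     pB**2, 2 * pB * pO, pO**2])

gamma = 0.2                               # fitting parameter of Eq. (77)
X = np.array([0.10, 0.35 * gamma, 0.35 * (1 - gamma),
              0.25 * gamma, 0.25 * (1 - gamma), 0.30])
for step in range(3):
    X = next_generation(X)
    print(f"Step{step + 1}:", np.round(X, 2))   # matches Table 13
```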
6. STATUS AFTER LONG TIME STEPS

When the transition matrix is decided, we can sometimes discuss the status after long time steps analytically.
6.1. Status after k Steps

The status D after k steps can be expressed as

$$D = A^k B \tag{78}$$

where A is a transition matrix and B is the initial vector, given by

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \tag{79}$$

$$B = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} \tag{80}$$

We denote the eigenvectors and eigenvalues of the matrix A as $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ and $\lambda_1, \lambda_2, \ldots, \lambda_n$, respectively. We then constitute a matrix P as

$$P = \begin{pmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_n \end{pmatrix} \tag{81}$$

where

$$A \mathbf{x}_i = \lambda_i \mathbf{x}_i \tag{82}$$

Therefore,

$$AP = \begin{pmatrix} \lambda_1 \mathbf{x}_1 & \lambda_2 \mathbf{x}_2 & \cdots & \lambda_n \mathbf{x}_n \end{pmatrix} = P \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix} \tag{83}$$

We also evaluate the inverse matrix of P, denoted as $P^{-1}$. We can then realize a diagonal matrix as

$$P^{-1} A P = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix} \tag{84}$$

Therefore, we obtain

$$\left( P^{-1} A P \right)^k = \begin{pmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{pmatrix} \tag{85}$$

On the other hand, we can expand the left side of Eq. (85) as

$$\left( P^{-1} A P \right)^k = P^{-1} A P \, P^{-1} A P \cdots P^{-1} A P = P^{-1} A \left( P P^{-1} \right) A \left( P P^{-1} \right) \cdots A P = P^{-1} A^k P \tag{86}$$
Therefore, we obtain

$$A^k = P \begin{pmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{pmatrix} P^{-1} \tag{87}$$

Finally, we obtain the status after the k step processes as

$$D = P \begin{pmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{pmatrix} P^{-1} B \tag{88}$$
6.2. Steady State

If there is a steady state, we can expect

$$A \mathbf{x} = \mathbf{x} \tag{89}$$

This can be modified as

$$\left( A - E \right) \mathbf{x} = 0 \tag{90}$$

Therefore, we can evaluate the existence of the steady state from

$$\det \left( A - E \right) = 0 \tag{91}$$
6.3. Vanishing Process

If we have some vanishing elements, we can arrange the transition matrix in the form

$$A = \begin{pmatrix} E & R \\ 0 & Q \end{pmatrix} \tag{92}$$

where E is a unit matrix, which expresses the statuses where an object stays forever. Q expresses the transitions among the tentative statuses, and R expresses the transition from the tentative statuses to the absorbing statuses. We denote as D the matrix whose elements correspond to the probability that a tentative status j finally transits to an absorbing status k. The matrix size of D is the same as that of R. The status is transferred through R directly, or through the statuses in Q and then through R, to finally reach the status k:

$$D = R + DQ \tag{93}$$

We then obtain

$$D = R \left( E - Q \right)^{-1} \tag{94}$$
7. A NETWORK LOOP

We form various kinds of networks. If there are loops in a network, its characteristics become unstable. Therefore, it is important to monitor the existence of loops in the network. We apply the transition matrix to monitor the loops.

7.1. Network Matrix

Let us consider a simple network as shown in Figure 26. The flow direction is expressed by arrows. The corresponding process is n1 → n2 → n3, and then it ends.

Figure 26. Diagram for a network path.

We consider the matrix associated with the network. The column number corresponds to the status the flow starts from, and the row number corresponds to the status where it ends. The corresponding network matrix is then expressed by
$$N = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \tag{95}$$

Since we focus on the flow, the elements are all 1. The sum of the elements along each column is not 1 in general. We only notice whether an element is 0 or not. Now, we can express the flow n1 → n2 → n3. We consider the initial vector

$$\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \tag{96}$$

Multiplying the network matrix by the initial vector, we obtain

$$\begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \tag{97}$$

This means that the status is in 2. Multiplying the network matrix by the vector of Eq. (97), we obtain

$$\begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \tag{98}$$

This expresses that the status is now 3. Multiplying the network matrix by the vector again, we obtain

$$\begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \tag{99}$$

This expresses the process flow when we start from the node 1. Let us use an initial vector given by
$$\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \tag{100}$$

Multiplying the network matrix by the vector continuously, we obtain

$$\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \rightarrow \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \rightarrow \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \tag{101}$$

This expresses the flow when we start from the node 2. The resultant vector shows the location of the process after each step. If there is no loop in the network, all the elements eventually become 0.
7.2. A Network Path with a Loop

Figure 27 shows a network path with a loop. It is clear that the nodes 3 and 4 form a loop. We want an algorithm to detect the loop.

Figure 27. Diagram of a network path with a loop.

The corresponding network matrix is given by

$$N = \begin{pmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0
\end{pmatrix} \tag{102}$$
We set an initial vector with all elements 1, and multiply N by the initial vector:

$$N \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 2 \\ 1 \\ 1 \\ 1 \end{pmatrix} \tag{103}$$

The node 1 vanishes, and the other nodes are not 0. The node 3 element is 2 because it has fluxes from the nodes 2 and 4. Multiplying by N once more, we obtain

$$N^2 \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 2 \\ 2 \\ 1 \\ 1 \end{pmatrix} \tag{104}$$
The element of the node 2 becomes 0. Multiplying once more, we obtain

$$N^3 \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 2 \\ 2 \\ 2 \\ 1 \end{pmatrix} \tag{105}$$

Multiplying once more again, we obtain

$$N^4 \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 2 \\ 2 \\ 2 \\ 2 \end{pmatrix} \tag{106}$$

It is clear that the feature of the vector is now invariant: the elements of the nodes 1 and 2 are 0, and the other elements are not 0.
Therefore, we regard A as the set of nodes that are on the loop or downstream of it, which is given by

$$A = \left\{ n_3, n_4, n_5, n_6 \right\} \tag{107}$$

We then consider the transpose matrix of N, given by

$$N^T = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix} \tag{108}$$
This corresponds to the network diagram with the opposite direction arrows. We perform a similar operation. We set an initial vector with all elements 1, and multiply $N^T$ by the initial vector:

$$N^T \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 2 \\ 1 \\ 0 \end{pmatrix} \tag{109}$$

Multiplying by $N^T$ once more, we obtain

$$\left(N^T\right)^2 \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 2 \\ 2 \\ 0 \\ 0 \end{pmatrix} \tag{110}$$
Multiplying by $N^T$ once more, we obtain

$$\left(N^T\right)^3 \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 2 \\ 2 \\ 0 \\ 0 \end{pmatrix} \tag{111}$$

Multiplying by $N^T$ once more, we obtain

$$\left(N^T\right)^4 \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \\ 2 \\ 2 \\ 0 \\ 0 \end{pmatrix} \tag{112}$$
It is clear that the feature of the vector is now invariant: the elements of the nodes 5 and 6 are 0, and the other elements are not 0. Therefore, we regard B as the set of nodes that are on the loop or upstream of it, which is given by

$$B = \left\{ n_1, n_2, n_3, n_4 \right\} \tag{113}$$

We set the set of nodes on the loop as L, and obtain

$$L = A \cap B = \left\{ n_3, n_4 \right\} \tag{114}$$

Therefore, the nodes on the loop are determined to be 3 and 4. The above algorithm can be applied to any network. We should take care to multiply more times than the number of network nodes.
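The whole loop-detection procedure can be written compactly, as sketched below: iterate N and its transpose on the all-ones vector more times than there are nodes, and intersect the surviving node sets.

```python
import numpy as np

def nodes_on_loops(N):
    """Return the indices of the nodes on loop paths (Eq. (114))."""
    n = N.shape[0]
    def survivors(M):
        v = np.ones(n)
        for _ in range(n + 1):        # more multiplications than nodes
            v = M @ v
        return set(np.nonzero(v)[0])  # nodes whose elements stay nonzero
    return sorted(survivors(N) & survivors(N.T))

# Network of Figure 27: edges 1->2, 2->3, 3->4, 4->3, 4->5, 5->6.
N = np.zeros((6, 6))
for src, dst in [(1, 2), (2, 3), (3, 4), (4, 3), (4, 5), (5, 6)]:
    N[dst - 1, src - 1] = 1
print([i + 1 for i in nodes_on_loops(N)])   # [3, 4]
```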
SUMMARY

To summarize the results in this chapter:

The status after k steps can be expressed with the transition matrix and the initial vector, and is given by

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} & a_{15} \\ a_{21} & a_{22} & a_{23} & a_{24} & a_{25} \\ a_{31} & a_{32} & a_{33} & a_{34} & a_{35} \\ a_{41} & a_{42} & a_{43} & a_{44} & a_{45} \\ a_{51} & a_{52} & a_{53} & a_{54} & a_{55} \end{pmatrix}^k \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \end{pmatrix}$$

This can be extended to a system with a source as

$$\begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ p_1 & a_{11} & a_{12} & a_{13} & a_{14} & a_{15} \\ p_2 & a_{21} & a_{22} & a_{23} & a_{24} & a_{25} \\ p_3 & a_{31} & a_{32} & a_{33} & a_{34} & a_{35} \\ p_4 & a_{41} & a_{42} & a_{43} & a_{44} & a_{45} \\ p_5 & a_{51} & a_{52} & a_{53} & a_{54} & a_{55} \end{pmatrix}^k \begin{pmatrix} a \\ b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \end{pmatrix}$$

We can monitor a loop by multiplying a network matrix, more times than the number of elements, with the initial vector whose elements are all 1. We record the node numbers whose elements are not 0. We then multiply the transpose network matrix, more times than the number of elements, with the same initial vector, and again record the node numbers whose elements are not 0. The nodes selected in both processes are the nodes on the loop path.
Chapter 14

RANDOM NUMBER

ABSTRACT

We can simulate probability phenomena using random numbers. Since probability variables follow various probability distribution functions, we need to generate the random numbers related to the corresponding distributions. We study how to generate such random numbers in this chapter.

Keywords: random number, regularity, random series, uniform distribution, Poisson distribution, normal distribution, exponential distribution

1. INTRODUCTION

We want to simulate a probability process before we obtain the corresponding real data, or to appreciate the obtained data. In the simulation, we need to generate random numbers. Since there are various kinds of probability distribution functions, we need to generate the corresponding various kinds of random numbers.
2. CHARACTERISTICS OF RANDOM NUMBERS

We assume a vessel with ten balls denoted as 0, 1, 2, ..., and 9. We stir the balls, pick one of them, and record its number. We then return the ball into the vessel, stir the balls, pick one, and record the number. We continue the process. The resultant numbers correspond to random numbers, which have the characteristics below.
Characteristic 1: Principle of Equal A Priori Probabilities

We repeat the above trials n times, and obtain a number series. The ratio of the count $k_i$ associated with the number i to n approaches the value below; that is, it is expressed by

$$\lim_{n \to \infty} \frac{k_i}{n} = \frac{1}{10} \qquad \text{for } i = 0, 1, 2, \ldots, 9 \tag{1}$$

This means that all the numbers 0, 1, 2, ..., 9 have equal a priori probabilities.

Characteristic 2: No Regularity

We obtain a first number. The second number does not depend on the first number. This characteristic is called no regularity, which means that there is no correlation between them.
3. UNIFORM RANDOM NUMBER SERIES

Let us consider a regular dodecahedron die whose faces have the numbers 1, 2, ..., 12. We convert the number n to n - 1. When we get the number 11 or 12, we do not record it and try again until we obtain a number less than 11. We can then obtain a number series between 0 and 9. If we use two dice and assign the number of the first die to the first digit and that of the second die to the second digit, we obtain a random number series between 00 and 99. If we use n dice, we can generate a random number series between 0 and $10^n - 1$.
4. NUMERICAL UNIFORM RANDOM NUMBER GENERATION METHOD

We can generate random numbers numerically. Lehmer proposed a method to generate a random number series. We set

$$x_{n+1} = 15 x_n \mod \left( 10^6 + 1 \right) \tag{2}$$

$$x_0 = 1 \tag{3}$$
This means that $x_{n+1}$ is the remainder of $15 x_n$ divided by $10^6 + 1$. This can be performed as

$$\begin{aligned}
x_0 &= 1 \\
x_1 &= 15 \\
x_2 &= 15 \times 15 \mod \left(10^6 + 1\right) = 225 \\
x_3 &= 15 \times 225 \mod \left(10^6 + 1\right) = 3375 \\
x_4 &= 15 \times 3375 \mod \left(10^6 + 1\right) = 50625 \\
x_5 &= 15 \times 50625 \mod \left(10^6 + 1\right) = 759375 \\
x_6 &= 15 \times 759375 \mod \left(10^6 + 1\right) = 390614
\end{aligned} \tag{4}$$

We then obtain a random series given by

$$1, 15, 225, 3375, 50625, 759375, 390614, \ldots \tag{5}$$

This is a random number series between 0 and $10^6$. Dividing the numbers by $10^6$, we obtain a random number series in the region $\left(0, 1\right)$. We can generalize this further, and define the series below.
$$x_{n+1} = a x_n \mod \left( 10^m + 1 \right) \tag{6}$$

$$x_0 = b \tag{7}$$

We can then obtain a number series given by

$$x_0, x_1, x_2, x_3, x_4, \ldots \tag{8}$$

This is a random number series between 0 and $10^m$. Dividing the numbers by $10^m$, we obtain a random number series in the region $\left(0, 1\right)$.
This random series is not perfect, and has some periodic characteristics. Therefore, it is called a quasi-random series. Roughly speaking, the periodicity is about $10^m$. This should be much larger than the number of trial events, and the series is then worth using in practical cases.
5. TESTING OF RANDOM NUMBER SERIES

We should check a random number series from the standpoints of the principle of equal a priori probabilities and of no regularity.

5.1. Testing of Equal A Priori Probabilities

We divide the range $\left(0, 1\right)$ into l regions. The values at the edges of the divided regions are denoted as $p_i$. Therefore, we have

$$p_0 = 0, \qquad p_l = 1 \tag{9}$$

We denote the number of the data in the i-th region as $f_i$. The expected data number $F_i$ for the i-th region is given by

$$F_i = N \left( p_i - p_{i-1} \right) \tag{10}$$

We define the $\chi^2$ as

$$\chi^2 = \sum_{i=1}^{l} \frac{\left( f_i - F_i \right)^2}{F_i} \tag{11}$$

This can be regarded to follow a $\chi^2$ distribution with a freedom of $l - 1$, and the critical value $\chi_c^2$ is denoted as

$$\chi_c^2 = \chi^2 \left( l - 1, P \right) \tag{12}$$

where P is the prediction probability. If the $\chi^2$ is smaller than $\chi_c^2$, the equal a priori probability property is valid.
5.2. Testing of No Regularity

Correlation Factor Testing

To test the no regularity of a random number series, we utilize a correlation factor. Since a random number series is a one-dimensional sequence, we generate a two-dimensional one from the series. We use a number and the number k steps apart, that is, the pair $x_i$ and $x_{i+k}$, and the correlation factor is evaluated as

$$r = \frac{\dfrac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)\left(x_{i+k}-\bar{x}\right)}{\dfrac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2} = \frac{\dfrac{1}{n}\sum_{i=1}^{n} x_i x_{i+k} - \bar{x}^2}{\dfrac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2} \tag{13}$$

When $i \ge n - k + 1$, we regard $x_{i+k}$ as

$$x_{i+k} = x_{i+k-n} \tag{14}$$

The data number is n. We introduce a variable

$$t = \sqrt{n-2}\, \frac{r}{\sqrt{1 - r^2}} \tag{15}$$

This follows a t distribution with a freedom of $n - 2$, as shown in Chapter 2 of Volume 2. The critical value $t_c$ is denoted as

$$t_c = t \left( n - 2, P \right) \tag{16}$$
where P is the prediction probability. If the $t$ is smaller than $t_c$, the no regularity is ensured. We can select any k, but values between 1 and 5 are frequently used.
Combination Testing

We divide a random number series into divisions of a certain length, for example, a length of 10. We then obtain m divisions. We further categorize each random number as below: if the random number is less than 0.5, we assign it to 0, and otherwise we assign it to 1. We count the number of 1s in each division. If the random number series has no regularity, the expected number of divisions $E_k$ that have k 1s is

$$E_k = m\, {}_{10}C_k \left( \frac{1}{2} \right)^{10} \qquad \text{for } k = 0, 1, 2, \ldots, 10 \tag{17}$$

We denote the observed number of divisions with k 1s as $m_k$, and define the $\chi^2$ as

$$\chi^2 = \sum_{k} \frac{\left( m_k - E_k \right)^2}{E_k} \tag{18}$$

This can be regarded to follow a $\chi^2$ distribution with a freedom of $m - 1$, and the critical value $\chi_c^2$ is denoted as

$$\chi_c^2 = \chi^2 \left( m - 1, P \right) \tag{19}$$

where P is the prediction probability. If the $\chi^2$ is smaller than $\chi_c^2$, the equal a priori probability property is ensured.
Runs Testing

We assign random numbers less than 0.5 to A and the others to B. The random number series is converted, for example, as below:

$$BAABBBABBABAAABABB \tag{20}$$
We call AA, BB, AAA, and so on, where we find continuous character rows, runs. We define the length of each run as the number of its characters. We test that the order of A and B has no regularity. The numbers of A and B are $n_A$ and $n_B$, respectively. We set n as the number of the random numbers, and hence

$$n = n_A + n_B \tag{21}$$

If no regularity is ensured, it is known that the average and variance of the number of runs approach

$$\mu = \frac{2 n_A n_B}{n_A + n_B} + 1 \tag{22}$$

$$\sigma^2 = \frac{2 n_A n_B \left( 2 n_A n_B - n_A - n_B \right)}{\left( n_A + n_B \right)^2 \left( n_A + n_B - 1 \right)} \tag{23}$$

with increasing n. We then obtain a normalized variable as

$$z = \frac{x - \mu}{\sigma} \tag{24}$$

We assume that it follows a standard normal distribution. The corresponding critical value is denoted as $z\left(P\right)$, where P denotes the predictive probability. If the $z$ is smaller than $z\left(P\right)$, no regularity is ensured.
6. RANDOM NUMBER SERIES ASSOCIATED WITH VARIOUS PROBABILITY DISTRIBUTIONS

Utilizing a uniform random series, we can convert it to one for various probability distribution functions. Uniform random numbers consist of random numbers in the range $\left(0, 1\right)$. However, we treat uniform random natural numbers in the range $\left(1, 10\right)$ here to explain the procedure clearly.

Figure 1. Random number generation for a distribution function f, which is converted from the uniform random numbers. (a) Uniform random number. (b) Converted random number.

In the uniform random number series, the probability that a certain number occurs is the same, which is shown in Figure 1(a) as f. We have a probability distribution function as shown in Figure 1(b). We convert the number x to the number t as shown in
Table 1. We can then obtain a random number associated with the probability distribution function.

Table 1. Converted random number table

x:   1    2    3    4    5    6    7    8    9    10
t:   t1   2t1  3t1  4t1  ...
7. INVERSE-TYPE RANDOM NUMBER GENERATION FOR A GENERAL PROBABILITY FUNCTION

We investigate the procedure in the previous section mathematically. The probability density associated with the uniform number with range $\left(0, 1\right)$ is denoted as $g\left(x\right)$, and is 1. We express the uniform random series with the range $\left(0, 1\right)$ as $\mathrm{Rand}\left(1\right)$. The conversion is expressed by

$$g\left(x\right) \Delta x = f\left(t\right) \Delta t \tag{25}$$

Since $g\left(x\right) = 1$, this is reduced to

$$\Delta x = f\left(t\right) \Delta t \tag{26}$$

Expressing this equation in an integral form, we obtain

$$\int_0^x dx' = \int_0^t f\left(t'\right) dt' \tag{27}$$

Therefore, we obtain
$$x = F\left(t\right) \tag{28}$$

where

$$F\left(t\right) = \int_0^t f\left(t'\right) dt' \tag{29}$$

Setting the inverse function of $F\left(t\right)$ as $\mathrm{invF}$, we obtain

$$t = \mathrm{invF}\left(x\right) = \mathrm{invF}\left(\mathrm{Rand}\left(1\right)\right) \tag{30}$$

This is the form of the random number associated with an arbitrary probability distribution function $f\left(t\right)$.
8. RANDOM NUMBER SERIES FOR AN EXPONENTIAL DISTRIBUTION

An exponential probability distribution is expressed by

$$f\left(t\right) = \lambda \exp\left(-\lambda t\right) \tag{31}$$

We then obtain the integral function as

$$F\left(t\right) = \int_0^t \lambda \exp\left(-\lambda t'\right) dt' = 1 - \exp\left(-\lambda t\right) \tag{32}$$

Therefore, we obtain

$$x = 1 - \exp\left(-\lambda t\right) \tag{33}$$

Finally, we obtain

$$t = -\frac{1}{\lambda} \ln\left(1 - x\right) = -\frac{1}{\lambda} \ln\left(1 - \mathrm{Rand}\left(1\right)\right) \tag{34}$$

$\mathrm{Rand}\left(1\right)$ is a random series with a range of $\left(0, 1\right)$. Therefore, $1 - \mathrm{Rand}\left(1\right)$ is also a random series with a range of $\left(0, 1\right)$. Therefore, we can also use the form given by
$$t = -\frac{1}{\lambda} \ln\left(\mathrm{Rand}\left(1\right)\right) \tag{35}$$

The range of $\mathrm{Rand}\left(1\right)$ is commonly given as

$$0 \le \mathrm{Rand}\left(1\right) < 1 \tag{36}$$

That is, we have a probability of $\mathrm{Rand}\left(1\right) = 0$, but no probability of $\mathrm{Rand}\left(1\right) = 1$ in this case. When $\mathrm{Rand}\left(1\right)$ is 0, Eq. (35) diverges infinitely. Therefore, Eq. (34) is preferred. If the range of $\mathrm{Rand}\left(1\right)$ is given as below,

$$0 < \mathrm{Rand}\left(1\right) \le 1 \tag{37}$$

we should use Eq. (35) instead of Eq. (34) to avoid the infinite divergence problem.
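A sketch of Eq. (34). Python's random.random() returns values in [0, 1), so the 1 - Rand(1) form is the safe one, as argued above.

```python
import math
import random

def exponential_random(lam):
    """Inverse-transform sampling of f(t) = lam * exp(-lam*t), Eq. (34)."""
    return -math.log(1.0 - random.random()) / lam  # random() is in [0, 1)

lam = 0.5
samples = [exponential_random(lam) for _ in range(100000)]
print(sum(samples) / len(samples))  # sample mean, close to 1/lam = 2
```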
9. RANDOM NUMBER SERIES FOR A POISSON DISTRIBUTION

We apply the above procedure to the Poisson distribution and obtain

$$x = \int_0^k \frac{\left(\lambda t\right)^{k'}}{k'!} \exp\left(-\lambda t\right) dk' \tag{38}$$

We should obtain the k that satisfies Eq. (38). We treat k and k' as continuous numbers in Eq. (38). However, they are natural numbers in reality in the Poisson distribution. Therefore, Eq. (38) is not valid as it stands. We express the right side of Eq. (38) correctly, and obtain

$$x = \mathrm{Rand}\left(1\right) = \sum_{k'=0}^{k} \frac{\left(\lambda t\right)^{k'}}{k'!} \exp\left(-\lambda t\right) \tag{39}$$

The left side and the right side are not equal in general. Therefore, we evaluate the minimum k for which the sum exceeds $x = \mathrm{Rand}\left(1\right)$. A Poisson distribution can be regarded as one where the event occurrence time period follows an exponential distribution.
We generate the exponential random numbers that correspond to the event occurrence periods. When the sum of the periods exceeds t for the first time, the number k can be the target random number. This process can be expressed by

$$t \le t_1 + t_2 + \cdots + t_k = -\frac{1}{\lambda} \ln\left[\left(1 - \mathrm{Rand}_1\left(1\right)\right)\left(1 - \mathrm{Rand}_2\left(1\right)\right) \cdots \left(1 - \mathrm{Rand}_k\left(1\right)\right)\right] \tag{40}$$

where

$$t_n = -\frac{1}{\lambda} \ln\left(1 - \mathrm{Rand}_n\left(1\right)\right) \tag{41}$$

We then have

$$\left(1 - \mathrm{Rand}_1\left(1\right)\right)\left(1 - \mathrm{Rand}_2\left(1\right)\right) \cdots \left(1 - \mathrm{Rand}_k\left(1\right)\right) \le \exp\left(-\lambda t\right) \tag{42}$$

The minimum k that holds Eq. (42) corresponds to the target k. In this case, we need not care about the infinite divergence problem, and hence can use the relation below.

$$\prod_i \left(1 - \mathrm{Rand}_i\left(1\right)\right) \to \prod_i \mathrm{Rand}_i\left(1\right) \tag{43}$$

Therefore, it can be expressed in the simpler form

$$\mathrm{Rand}_1\left(1\right) \mathrm{Rand}_2\left(1\right) \cdots \mathrm{Rand}_k\left(1\right) = \prod_{n=1}^{k} \mathrm{Rand}_n\left(1\right) \le \exp\left(-\lambda t\right) \tag{44}$$
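A sketch of the product criterion of Eq. (44): uniform numbers are multiplied until the product falls to exp(-λt) or below, and the number of completed inter-event periods (one less than the count of multiplications) gives the Poisson random number.

```python
import math
import random

def poisson_random(lam, t):
    """Poisson random number with mean lam*t via Eq. (44)."""
    threshold = math.exp(-lam * t)
    k, product = 0, 1.0
    while True:
        product *= random.random()
        if product <= threshold:   # the product has broken Eq. (44)
            return k               # completed events before the sum exceeds t
        k += 1

samples = [poisson_random(0.5, 4.0) for _ in range(100000)]
print(sum(samples) / len(samples))   # close to lam*t = 2
```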
10. RANDOM NUMBER SERIES FOR A NORMAL DISTRIBUTION

The normal distribution is given by

$$f\left(t\right) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left[-\frac{\left(t - \mu\right)^2}{2\sigma^2}\right] \tag{45}$$

The integral is given by

$$F\left(t\right) = \int_{-\infty}^{t} \frac{1}{\sqrt{2\pi}\sigma} \exp\left[-\frac{\left(t' - \mu\right)^2}{2\sigma^2}\right] dt' = \frac{1}{2}\left[1 + \mathrm{Erf}\left(\frac{t - \mu}{\sqrt{2}\sigma}\right)\right] \tag{46}$$
Therefore, we obtain

$$t = \mu + \sqrt{2}\sigma\, \mathrm{Erf}^{-1}\left(2\,\mathrm{Rand}\left(1\right) - 1\right) \tag{47}$$
We can also generate a random number series utilizing the central limit theorem. We convert t as

$$z = \frac{t - \mu}{\sigma} \tag{48}$$

This follows a standard normal distribution. The average of $\mathrm{Rand}\left(1\right)$ is given by

$$\int_0^1 x \cdot 1\, dx = \frac{1}{2} \tag{49}$$

The corresponding variance is given by

$$\int_0^1 \left(x - \frac{1}{2}\right)^2 \cdot 1\, dx = \int_0^1 x^2 dx - \int_0^1 x\, dx + \frac{1}{4} = \frac{1}{3} - \frac{1}{2} + \frac{1}{4} = \frac{1}{12} \tag{50}$$

When we sum up n such variables and increase n, the distribution approaches a normal distribution with an average of $n/2$ and a variance of $n/12$. Therefore, the variable
Kunihiro Suzuki
314 n n Rnad i 1 2 z i 1 n 12
(51)
can be regarded to follow a standard normal distribution. The converted variable n n Rnad i 1 2 t i 1 n 12
follows a normal distribution with an average of n 12 , we obtain
(52)
and standard deviation of . Setting
12 t Rnadi 1 6 i 1
(53)
The value that follows a normal distribution has both negative and positive signs. However, we only use positive values in some cases, for example, for service times and the time periods between events. In that case, we neglect the negative random numbers and only take the positive ones. The real average of the random numbers then deviates from the average used in the distribution expression, and the average $\mu'$ should be larger than $\mu$. We evaluate

$$\mu' = \frac{\displaystyle\int_0^{\infty} t\, \frac{1}{\sqrt{2\pi}\sigma} \exp\left[-\frac{\left(t-\mu\right)^2}{2\sigma^2}\right] dt}{\displaystyle\int_0^{\infty} \frac{1}{\sqrt{2\pi}\sigma} \exp\left[-\frac{\left(t-\mu\right)^2}{2\sigma^2}\right] dt} \tag{54}$$

Introducing a variable

$$s = \frac{t - \mu}{\sqrt{2}\sigma} \tag{55}$$
we obtain

$$\mu' = \frac{\displaystyle\frac{\sqrt{2}\sigma}{\sqrt{\pi}} \int_{-\mu/\sqrt{2}\sigma}^{\infty} s \exp\left(-s^2\right) ds + \frac{\mu}{\sqrt{\pi}} \int_{-\mu/\sqrt{2}\sigma}^{\infty} \exp\left(-s^2\right) ds}{\displaystyle\frac{1}{\sqrt{\pi}} \int_{-\mu/\sqrt{2}\sigma}^{\infty} \exp\left(-s^2\right) ds} \tag{56}$$

That is, we obtain

$$\mu' = \mu + \frac{\sqrt{\dfrac{2}{\pi}}\,\sigma \exp\left(-\dfrac{\mu^2}{2\sigma^2}\right)}{1 + \mathrm{erf}\left(\dfrac{\mu}{\sqrt{2}\sigma}\right)} \tag{57}$$

This is larger than $\mu$, as is expected.
11. RANDOM NUMBER SERIES FOR NATURAL NUMBERS BETWEEN 1 AND N

We want to generate natural numbers such as those of a die, which can be realized as

$$\mathrm{Int}\left[\mathrm{Rand}\left(1\right) \times n\right] + 1 \tag{58}$$
12. TWO RANDOM NUMBERS THAT FOLLOW NORMAL DISTRIBUTIONS WITH A CORRELATION FACTOR OF ρ

We first generate two independent random numbers $X_1$ and $X_2$ that follow standard normal distributions. However, this distribution is not a constraint; we can use any type of random numbers. We then convert them as

$$Y_1 = X_1 \tag{59}$$

$$Y_2 = \rho X_1 + \sqrt{1 - \rho^2}\, X_2 \tag{60}$$

The corresponding expected values and variances are evaluated as below.

$$E\left(Y_1\right) = E\left(X_1\right) \tag{61}$$

$$V\left(Y_1\right) = V\left(X_1\right) \tag{62}$$

$$E\left(Y_2\right) = E\left(\rho X_1 + \sqrt{1-\rho^2}\, X_2\right) = \rho E\left(X_1\right) + \sqrt{1-\rho^2}\, E\left(X_2\right) \tag{63}$$

$$V\left(Y_2\right) = \rho^2 V\left(X_1\right) + \left(1-\rho^2\right) V\left(X_2\right) + 2\rho\sqrt{1-\rho^2}\, \mathrm{Cov}\left(X_1, X_2\right) = \rho^2 V\left(X_1\right) + \left(1-\rho^2\right) V\left(X_2\right) \tag{64}$$

$$\mathrm{Cov}\left(Y_1, Y_2\right) = E\left[X_1\left(\rho X_1 + \sqrt{1-\rho^2}\, X_2\right)\right] = \rho E\left(X_1^2\right) + \sqrt{1-\rho^2}\,\mathrm{Cov}\left(X_1, X_2\right) = \rho E\left(X_1^2\right) \tag{65}$$
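A sketch of Eqs. (59) and (60): two independent standard normal numbers are mixed to give the target correlation ρ; the empirical correction of Eq. (74) below is omitted for brevity.

```python
import random

def correlated_pair(rho):
    """Two standard normal random numbers with correlation rho (Eqs. (59)-(60))."""
    x1 = random.gauss(0.0, 1.0)
    x2 = random.gauss(0.0, 1.0)
    y1 = x1
    y2 = rho * x1 + (1.0 - rho**2) ** 0.5 * x2
    return y1, y2

rho = 0.7
pairs = [correlated_pair(rho) for _ in range(100000)]
cov = sum(a * b for a, b in pairs) / len(pairs)
print(cov)   # sample covariance, close to rho = 0.7
```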
In reality, the sample averages of $X_1$ and $X_2$ deviate from 0, and the variances also deviate from 1. We evaluate the averages and variances of $X_1$ and $X_2$ as

$$\mu_1 = \frac{1}{n} \sum_{i=1}^{n} x_{i1} \tag{66}$$

$$\sigma_1^2 = \frac{1}{n} \sum_{i=1}^{n} x_{i1}^2 - \mu_1^2 \tag{67}$$

$$\mu_2 = \frac{1}{n} \sum_{i=1}^{n} x_{i2} \tag{68}$$

$$\sigma_2^2 = \frac{1}{n} \sum_{i=1}^{n} x_{i2}^2 - \mu_2^2 \tag{69}$$

We further evaluate the correlation factor as

$$\tilde{\rho} = \frac{1}{n \sigma_1 \sigma_2} \sum_{i=1}^{n} \left( x_{i1} - \mu_1 \right) \left( x_{i2} - \mu_2 \right) \tag{70}$$
We then normalize the two variables as

$$z_{i1} = \frac{x_{i1} - \mu_1}{\sqrt{\sigma_1^2}} \tag{71}$$

$$z_{i2} = \frac{x_{i2} - \mu_2}{\sqrt{\sigma_2^2}} \tag{72}$$

We convert them as

$$y_{i1} = z_{i1} \tag{73}$$

$$y_{i2} = \left( \rho - \tilde{\rho}\, \frac{\sqrt{1-\rho^2}}{\sqrt{1-\tilde{\rho}^2}} \right) z_{i1} + \frac{\sqrt{1-\rho^2}}{\sqrt{1-\tilde{\rho}^2}}\, z_{i2} \tag{74}$$

The averages are then 0, and the variances are as below.

$$E\left(Y_1\right) = E\left(Z_1\right) = 0 \tag{75}$$

$$V\left(Y_1\right) = V\left(Z_1\right) = 1 \tag{76}$$
$$E\left(Y_2\right) = \left( \rho - \tilde{\rho}\, \frac{\sqrt{1-\rho^2}}{\sqrt{1-\tilde{\rho}^2}} \right) E\left(Z_1\right) + \frac{\sqrt{1-\rho^2}}{\sqrt{1-\tilde{\rho}^2}}\, E\left(Z_2\right) = 0 \tag{77}$$

$$\begin{aligned}
V\left(Y_2\right) &= \left( \rho - \tilde{\rho}\, \frac{\sqrt{1-\rho^2}}{\sqrt{1-\tilde{\rho}^2}} \right)^2 V\left(Z_1\right) + \frac{1-\rho^2}{1-\tilde{\rho}^2}\, V\left(Z_2\right) + 2 \left( \rho - \tilde{\rho}\, \frac{\sqrt{1-\rho^2}}{\sqrt{1-\tilde{\rho}^2}} \right) \frac{\sqrt{1-\rho^2}}{\sqrt{1-\tilde{\rho}^2}}\, \mathrm{Cov}\left(Z_1, Z_2\right) \\
&= \rho^2 - 2\rho\tilde{\rho}\, \frac{\sqrt{1-\rho^2}}{\sqrt{1-\tilde{\rho}^2}} + \tilde{\rho}^2\, \frac{1-\rho^2}{1-\tilde{\rho}^2} + \frac{1-\rho^2}{1-\tilde{\rho}^2} + 2\rho\tilde{\rho}\, \frac{\sqrt{1-\rho^2}}{\sqrt{1-\tilde{\rho}^2}} - 2\tilde{\rho}^2\, \frac{1-\rho^2}{1-\tilde{\rho}^2} \\
&= \rho^2 + \left(1 - \tilde{\rho}^2\right) \frac{1-\rho^2}{1-\tilde{\rho}^2} = 1
\end{aligned} \tag{78}$$

The covariance is given by

$$\mathrm{Cov}\left(Y_1, Y_2\right) = \mathrm{Cov}\left(Z_1,\ \left( \rho - \tilde{\rho}\, \frac{\sqrt{1-\rho^2}}{\sqrt{1-\tilde{\rho}^2}} \right) Z_1 + \frac{\sqrt{1-\rho^2}}{\sqrt{1-\tilde{\rho}^2}}\, Z_2 \right) = \rho - \tilde{\rho}\, \frac{\sqrt{1-\rho^2}}{\sqrt{1-\tilde{\rho}^2}} + \frac{\sqrt{1-\rho^2}}{\sqrt{1-\tilde{\rho}^2}}\, \tilde{\rho} = \rho \tag{79}$$
SUMMARY

To summarize the results in this chapter:

We set

$$x_{n+1} = a x_n \mod \left( 10^m + 1 \right)$$
$$x_0 = b$$

We can then obtain a number series given by

$$x_0, x_1, x_2, x_3, x_4, \ldots$$

This is a random number series between 0 and $10^m$. Dividing the numbers by $10^m$, we obtain a random number series in the region $\left(0, 1\right)$.

We can generate the random number associated with a probability function $f\left(t\right)$ as below. We integrate the function as

$$F\left(t\right) = \int_0^t f\left(t'\right) dt'$$

Setting the inverse function of $F\left(t\right)$ as $\mathrm{invF}$, we obtain

$$t = \mathrm{invF}\left(x\right) = \mathrm{invF}\left(\mathrm{Rand}\left(1\right)\right)$$

An exponential probability distribution is expressed by

$$f\left(t\right) = \lambda \exp\left(-\lambda t\right)$$

and the related random number can be generated as

$$t = -\frac{1}{\lambda} \ln\left(1 - x\right) = -\frac{1}{\lambda} \ln\left(1 - \mathrm{Rand}\left(1\right)\right)$$

The random number associated with a Poisson distribution can be obtained from

$$\mathrm{Rand}_1\left(1\right) \mathrm{Rand}_2\left(1\right) \cdots \mathrm{Rand}_k\left(1\right) = \prod_{n=1}^{k} \mathrm{Rand}_n\left(1\right) \le \exp\left(-\lambda t\right)$$

The number k that first breaks the above is the target one.

We can generate a random number associated with a normal distribution as

$$t = \mu + \sigma \left[ \sum_{i=1}^{12} \mathrm{Rand}_i\left(1\right) - 6 \right]$$

We can obtain two kinds of random numbers with a correlation. We first generate two independent random numbers $X_1$ and $X_2$, and convert them as

$$Y_1 = X_1$$
$$Y_2 = \rho X_1 + \sqrt{1-\rho^2}\, X_2$$

Then the numbers correspond to the random numbers with a correlation factor of $\rho$.
Chapter 15

MATRIX OPERATION

ABSTRACT

Matrix operation is an important and fundamental mathematical tool in statistics. Therefore, we treat the operations of matrices. We treat sums, products, and inverse matrices, determinants of matrices, eigenvalues, and eigenvectors.

Keywords: matrix, inverse matrix, transpose matrix, determinant, eigenvalue, eigenvector

1. INTRODUCTION

Matrix operation is a base of statistics, and many analyses in statistics are built on this subject. Therefore, we treat the basic operations of matrices in this chapter.
2. DEFINITION OF A MATRIX A matrix A with a11 a12 a a A 21 22 an1 an 2
n
rows and m columns are defined by a1m a2 m anm
a The element of i -th row and j -th column is denoted as ij .
(1)
Kunihiro Suzuki
322
When m n , it is called as a square matrix with an order of n , and aii is called as the main diagonal components. A square matrix with the main diagonal elements are all 1 and the other elements are 0 is called as a unit matrix, which is given by 1 0 0 E 0
0 1 0
0 0 1
0
0
0
0 0 0 0 1
(2)
3. SUM OF A MATRIX Sum and difference of matrices of A and B are denoted as C and is given by a11 a21 an1
a12 a22 an 2
a1n b11 b12 a2 n b21 a22 ann bn1 bn 2
b1n c11 c12 b2 n c21 c22 bnn cn1 cn 2
c1n c2 n cnn
(3)
The elements are given by
cij aij bij
(4)
4. PRODUCT OF A CONSTANT NUMBER AND A MATRIX A product of a constant number k and a matrix A is given by a11 a k 21 an1
a12 a22 an 2
a1n ka11 a2 n ka21 ann kan1
ka12 ka22 kan 2
ka1n ka2 n kann
(5)
Matrix Operation
323
5. A PRODUCT OF TWO MATRICES RELATED TO A SIMULTANEOUS EQUATIONS A product of two matrices is rather difficult. It is convenient to relate it to simultaneous equations. It should also be noted that solving the simultaneous equation is one of the important subject for matrix operation. A simultaneous equation for n-variables of x1 , x2 , , xn is given by a11 x1 a12 x2 a21 x1 a22 x2 an1 x1 an 2 x2
a1n xn b1 a2 n xn b2 a1n xn bn
(6)
This is described with a matrix form as a11 a21 an1
a12 a22 an 2
a1n x1 b1 a2 n x2 b2 ann xn bn
(7)
The first matrix in Eq. (7) is called a coefficient matrix, and it is an n-th order square matrix for n-variable simultaneous equations. The product is easily appreciated as n
bi aik xk k 1
(8)
The definition of the product can be easily generalized as follows. We assume that a matrix A has n rows and mcolumns, and a matrix B has m rows and l columns. We can then perform a product operation. The retraction for the product is that the column number of the matrix A and row number of the matrix B must be the same. We can then obtain a matrix C of the product of the matrix A and B as a11 a21 an1
a12 a22 an 2
a1m b11 b12 a2 m b21 a22 anm bm1 bn 2
b1l c11 b2l c21 bnl cn1
c12 c22 cn 2
c1l c2l cnl
(9)
Kunihiro Suzuki
324
The elements of the matrix C are given by m
cij aik bkj k 1
(10)
The row number of C is the same as the row number of A , and the column number of C is the same as the column number of B , that is, the matrix C has a row number of
n and a column number of l . It should be noted that the product AB and BA are different, and sometimes the other cannot be performed. In the above example, the column number of the matrix B is l and the row number of the matrix A is n, and they are different in general, and we cannot perform the product operation. It should also be noted, the product of a unit matrix and any square matrix is independent of the order, that is, we have
EA AE A
(11)
6. TRANSVERSE MATRIX T A transverse matrix with respect to A is denoted as A , which is given by changing the row and column number, and is given by
a11 a21 an1
a12 a22 an 2
T
a1n a11 a2 n a 12 ann a1n
a21 a22 a2 n
an1 an 2 ann
(12)
T The elements of the transverse matrix aij is expressed by
aijT a ji
(13)
Let us consider the transverse matrix of a product of AB , that is, let us consider
AB T .
Matrix Operation
325
The elements of AB should be expressed by T
ab ij
c ji
T
m
a jk bki k 1 m
bki a jk k 1
(14)
Therefore, it is expressed as
AB
T
BT AT
(15)
This can be generalized as
A1 A2
Am1 Am AmT AmT 1 T
A2T A1T
(16)
7. SOLUTION OF A SIMULTANEOUS EQUATIONS We consider solving simultaneous equations of (6) with a matrix form of Eq.(7). The corresponding solution is given by x1 a11 x2 a21 xn an1
a1n a2 n ann
a12 a22 an 2
1
b1 b2 bn
(17)
where a11 a21 an1
a12 a22 an 2
a1n a2 n ann
1
(18)
1 is called as an inverse matrix of A , and it is denoted as A . This is defined because the
product of A and A1 becomes an unit matrix as
Kunihiro Suzuki
326 a11 a A1 A 21 an1
a1n a11 a2 n a21 ann an1
a12 a22 an 2
a12 a22 an 2
1
a1n 1 0 a2 n 0 1 ann 0 0
0 0 1
(19)
1 Multiplying A from the left side of Eq. (7), we obtain
x1 b1 x b A1 A 2 A1 2 xn bn
(20)
The left side of Eq. (20) is reduced to 1 0 0 1 0 0
0 x1 x1 0 x2 x2 1 xn xn
(21)
Finally, we obtain Eq. (17). Therefore, we need to obtain an inverse matrix of Eq. (18), which enables us to solve the simultaneous equation of Eq. (17). We discuss the procedure to obtain the inverse matrix in the following section.
8. GAUSS ELIMINATION METHOD We consider a simple example of 4 variable simultaneous equations given by
4x 3y 2z u 9 2 y 4 z 3u 8 4z u 2 3u 6
(22)
The simultaneous equation is called as an upper triangular matrix type simultaneous equation. This can be solved easily as follows. The last equation gives
Matrix Operation
u
6 2 3
327
(23)
The equation of the second equation from the last gives 2u 4 22 4 1
z
(24)
The equation of the third equation from the last gives 8 4 z 3u 2 846 2 3
y
(25)
The equation of the fourth equation (that is, top of the equation) gives 9 3y 2z u 4 9922 4 1
x
(26)
Therefore, if the equation is reduced to the form of an upper triangle matrix, we can easily obtain a corresponding solution. Let us consider a Gauss elimination method. We consider the simultaneous equation given by
4 x 3 y 2 z u 20 2 x 5 y 3z 2u 5 x 4 y 8 z u 13 3 x 2 y 4 z 5u 9 This can be expressed with a matrix form as
(27)
Kunihiro Suzuki
328 4 3 2 1 x 20 2 5 3 2 y 5 1 4 8 1 z 13 3 2 4 5 u 9
(28)
Using elements of the first column, we want to eliminate the elements of the second and subsequent elements of the first column as
Second row element - First row elemet
Third row element - First row elemet
Second row element First row elemet
(29)
Third row element First row elemet
(30)
Fourth row element First row elemet
(31)
Fourth row element - First row elemet We then obtain
4 3 2 1 20 2 2 2 2 2 24 5 3 3 2 2 1 x 5 20 4 4 4 4 y 4 1 1 1 1 1 1 4 4 3 8 2 1 1 z 13 20 4 4 4 4 4 u 3 3 3 3 3 4 2 3 4 2 5 1 20 3 9 4 4 4 4 4
(32)
Performing the calculation, we obtain 3 4 7 0 2 19 0 4 17 0 4
2 4 15 2 5 2
1 5 x 20 2 y 15 5 z 8 4 u 24 23 4
(33)
Matrix Operation
329
as is expected of 0 elements of the second and subsequent elements of the first column. We then move to the second row focusing on the second column. Using elements of the second column, we want to eliminate the elements of the third and subsequent elements of the second column as
Third row element - Second row elemet
Third row element Second row elemet
Fourth row element - Second row elemet
Fourth row element Second row elemet
(34)
(35)
We then obtain 3 2 1 4 7 5 0 x 4 2 2 y 19 19 2 7 15 19 2 5 19 2 5 0 4 z 4 4 7 2 2 4 7 4 4 7 2 u 17 17 2 7 5 17 2 23 17 2 5 0 4 4 4 7 2 2 4 7 4 4 7 2 20 15 19 2 8 15 4 7 17 2 24 15 4 7
(36)
Performing the calculation, we obtain 4 0 0 0
3 7 2 0 0
2 4 29 14 33 14
1 20 5 x 15 2 y 173 65 z 14 14 u 591 123 14 14
(37)
as is expected of 0 elements of the third and subsequent elements of the second column.
Kunihiro Suzuki
330
We then move to the third row focusing on the third column. Using elements of the third column, we want to eliminate the elements of the fourth element of the third column as
Fourth row element - Third row element
Fourth row element Third row element
(38)
We then obtain 4 0 0 0
3 7 2 0 0
1 20 5 x 15 4 2 y 173 29 65 z 14 14 14 u 591 33 14 173 33 33 14 29 123 33 14 65 14 14 29 14 14 14 29 14 14 14 29 14 2
(39)
Performing the calculation, we obtain 4 0 0 0
3 7 2
2 4
0
29 14
0
0
1 20 5 x 15 2 y 173 65 z 14 14 u 1632 408 29 29
(40)
as is expected of 0 element of the fourth element of the third column. We then obtain a form of an upper triangular matrix, and can solve easily as shown before. From the last row relationship, we obtain
408 1632 u 29 29
(41)
We then have
u
1632 4 408
(42)
Matrix Operation
331
From the second row from the last relationship, we obtain
29 65 173 z u 14 14 14
(43)
We then obtain
z
1 87 173 65 4 3 29 29
(44)
From the third row from the last relationship, we obtain
7 5 y 4 z u 15 2 2
(45)
We then obtain y
2 5 2 15 4 3 4 7 2 7 2 7
(46)
From the fourth row from the last (top) relationship, we obtain 4 x 3 y 2 z u 20
(47)
We then obtain 1 20 3 y 2 z u 4 1 20 3 2 2 3 4 4 1 4 4 1
x
(48)
We should generalize the above Gauss elimination method using a variable given by
Kunihiro Suzuki
332 a11 a21 ai ,1 a i 1,1 a n1
a12 a22
a1,i a2,i
a1,i 1 a2,i 1
ai ,2 ai 1,2
ai ,i ai 1,i
ai ,i 1 ai 1,i 1
an 2
an ,i
an ,i 1
a1n x1 b1 a2 n x2 b2 ai , n xi bi ai 1, n xi 1 bi 1 b ann x n n
(49)
Using elements of the first column, we eliminate the elements of the second and subsequent elements of the first column as a11 a21 a21 a11 a11 a ai ,1 i ,1 a11 a11 ai 1,1 ai 1,1 a11 a11 a an1 n1 a11 a11
a12 a a22 21 a12 a11 ai ,2
ai ,1
ai 1,2
ai 1,1
an 2
a11
a12
a11
a12
an1 a12 a11
a1,i a a2,i 21 a1,i a11 ai ,i
ai ,1
ai 1,i
ai 1,1
an,i
a11 a11
a12 a12
an1 a12 a11
a1,i 1 a a2,i 1 21 a1,i 1 a11 ai ,i 1
ai ,1
ai 1,i 1
ai 1,1
an,i 1
a11 a11
a12 a12
an1 a12 a11
x1 x2 a ai , n i ,1 a12 a11 x i x a i 1 ai 1,n i 1,1 a12 a11 xn a ann n1 a12 a11 a1n a a2 n 21 a1n a11
b1 a b2 21 b1 a11 ai ,1 bi b1 a11 ai 1,1 bi 1 b1 a11 a bn n1 b1 a11
(50) We update the elements, and express the new ones as a11 a12 a22 0 ai ,2 0 0 a i 1,2 0 an 2
a1,i a2,i
a1,i 1 a2,i 1
ai ,i ai 1,i
ai ,i 1 ai 1,i 1
an,i
an,i 1
a1n x1 b1 a2 n x2 b2 ai , n xi bi ai 1, n xi 1 bi 1 ann x n bn
(51)
Matrix Operation
333
Performing the procedure, and we obtain a11 0 0 0 0
a12 a22
a1,i a2,i
a1,i 1 a2,i 1
0 0
ai ,i ai 1,i
ai ,i 1 ai 1,i 1
0
an ,i
an ,i 1
a1n x1 b1 a2 n x2 b2 ai , n xi bi ai 1, n xi 1 bi 1 ann xn bn
We then obtain the elements below the diagonal up to
(52)
ai 1,i 1 is all zero. Next, we
perform the similar operation to the i 1 -th row and subsequent, and obtain a1,i a11 a12 0 a a 22 2,i 0 ai ,i 0 a 0 0 ai 1,i i 1,i ai ,i ai ,i a 0 an ,i n ,i ai ,i 0 ai ,i b1 b2 bi a bi 1 i 1,i bi ai ,i an ,i bi bn ai ,i
a1,i 1 a2,i 1 ai ,i 1 a ai 1,i 1 i 1,i ai ,i 1 ai ,i an ,i 1
an ,i ai ,i
x1 x 2 ai , n a x ai 1, n i 1,i ai , n i ai ,i xi 1 xn an ,i ann ai , n ai ,i a1n a2 n
ai ,i 1
(53) We update the elements and obtain
Kunihiro Suzuki
334 a11 a12 0 a22 0 0 0 0 0 0
a1,i a2,i
a1,i 1 a2,i 1
ai ,i 0
ai ,i 1 ai 1,i 1
0
an,i 1
a1n x1 b1 a2 n x2 b2 ai , n xi bi ai 1, n xi 1 bi 1 ann xn bn
(54)
Performing the similar operation to the last row, we obtain a11 0 0 0 0
a12 a22
a1,i a2,i
a1,i 1 a2,i 1
0 0
ai ,i 0
ai ,i 1 ai 1,i 1
0
0
0
a1n x1 b1 a2 n x2 b2 ai , n xi bi ai 1, n xi 1 bi 1 ann xn bn
(55)
We then obtain a form of an upper triangular matrix, and can solve easily as shown before. From the last row relationship, we obtain
ann xn bn
(56)
We then obtain xn as xn
bn ann
(57)
From the second row from the last relationship, we obtain
an1,n1 xn1 an1,n xn bn1
We then obtain xn 1
xn 1
(58)
as
1 bn1 an1,n xn an 1, n 1
(59)
Matrix Operation We can continue the process in up order of the row. Let us consider the term of We can obtain from the equation given by ai ,i xi ai ,i 1 xi 1
ai ,n xn bi
335
xi .
(60)
We then obtain xi
1 bi ai ,i 1 xi 1 ai ,i
1 ai ,i
ai , n xn
(61)
n bi ai , k xk k i 1
8.1. Gauss Elimination Method and LU Decomposition A Gauss elimination method is vital to solve simultaneous equations as shown above. However, we can generalize this method as shown in this section. This is called an LU decomposition, which is vital in matrix operation. Let us consider the process of a Gauss elimination method. The simultaneous equation is expressed with a matrix form given by
AX B
(62)
where a11 a21 A ai ,1 a i 1,1 a n1
x1 x2 X xi x i 1 x n
a12 a22
a1,i a2,i
a1,i 1 a2,i 1
ai ,2 ai 1,2
ai ,i ai 1,i
ai ,i 1 ai 1,i 1
an 2
an ,i
an ,i 1
a1n a2 n ai , n ai 1,n ann
(63)
(64)
Kunihiro Suzuki
336 b1 b2 B bi b i 1 b n
(65)
We set mi1
ai1 a11
(66)
and define the matrix as 0 0 1 m21 1 0 M 1 m31 0 1 m n1 0 0
0 0 0 1
(67)
Multiplying this to the A , we obtain 0 1 m21 1 M 1 A m31 0 m n1 0 a11 a12 a22 0 0 a32 0 an 2
0 0 1 0 a13 a23 a33 an3
0 a11 0 a21 0 a31 1 an1 an a2 n a33 ann
a12 a22 a32
a13 a23 a33
an 2
an 3
an a2 n a33 ann
(68)
The elements of the second and the subsequent row is updated as
aij aij mi1a1 j Next, we set
(69)
Matrix Operation mi 2
ai 2 a22
337
(70)
and define the matrix given by 1 0 M2 0 0
0 1 m32
0 0 1
mn 2
0
0 0 0 1
(71)
Multiplying this to the M1 A , we obtain 1 0 0 1 M 2 M 1 A 0 m32 0 m n2 a11 a12 a22 0 0 0 0 0
0 a11 a12 0 0 a22 0 0 a32 1 an 2 0 an a2 n a33 ann
0 0 1 0 a13 a23 a33 an3
a13 a23 a33 an3
an a2 n a33 ann
(72)
The elements of the third and the subsequent row is updated as
aij aij mi 2 a2 j
(73)
We repeat the similar process n 1 times, and obtain
M n 1
a11 0 M 2 M1 A 0 0
a12 a22 0
a13 a23 a33
0
0
an a2 n a33 ann
(74)
The right side of Eq. (74) is the exactly the same as that of the Gauss elimination method. Therefore, this process is identical to the method.
Kunihiro Suzuki
338 We then obtain M 2 M1 AX M n1
M n1
M 2 M1B
(75)
Modifying this equation, we obtain
M n1
M 2 M1
1
M n1
M 2 M1 AX B
(76)
We then define the matrixes as
L M n1
M 2 M1
U M n1
M 2 M1 A
1
(77) (78)
Therefore, the original matrix is expressed by
LUX B
(79)
The form of U is shown in Eq. (74) and is an upper triangle matrix. Let us consider the form of L . The inverse matrix with respect to M k is given by
M k 1
1 0 0 0
0 1
1 0 0 0
0 1
0
1 mk 1, k
0
mnk
0
1 mk 1, k
0
mnk
0 0 0 1 0 0 0 1
1
(80)
Matrix Operation
339
Therefore, we obtain
L M 11 M 21
M n11
1 m21 m k 1,1 m n1
0 1 1 mk 1,2
mk 1, k
mn 2
mnk
0 0 0 1
(81)
This is a lower triangle matrix. Therefore, solving a simultaneous equations step is divided into two steps as
LY B
(82)
UX Y
(83)
Both steps are easily solved, which will be shown later. Once we modify the matrix A to LU , we can apply it to any B . Since Eqs. (74) and (81) use updating elements, we need to obtain them from the original elements, which should be done next.
8.2. LU Division In LU division, we express a matrix A as a product of two matrices given by
a11 a21 an1 where
a12 a22 an 2
a1n 1 0 a2 n l21 1 ann ln1 ln 2
0 u11 u12 0 0 u22 1 0 0
u1n u2 n unn
(84)
Kunihiro Suzuki
340
1 0 l 1 L 21 ln1 ln 2
0 0 1
u11 u12 0 u22 U 0 0
(85)
u1n u2 n unn
(86)
We want to evaluate all elements of L and U simultaneously. Let us start with a simple example given by 6 5 4 1 12 13 10 l21 18 21 17 l 31
0 1 l32
u11 l21u11 l u 31 11
0 u11 0 0 1 0
u12 u22 0
u12 l21u12 u22 l31u12 l32u22
u13 u23 u33 u13 l21u13 u23 l31u13 l32u23 u33
(87)
We can decide the elements of the first row elements as u11 6
(88)
u12 5
(89)
u13 4
(90)
From the second row and the first column, we obtain l21u11 12
(91)
l21 is then decided as
l21
12 12 2 u11 6
(92)
Matrix Operation
341
Using this l21 , u22 and u23 are deiced as follows. l21u12 u22 13
(93)
u22 13 l21u12 13 2 5 3
(94)
l21u13 u23 10
(95)
u23 10 l21u13 10 2 4 2
(96)
Focusing on the third row and the first column, we can obtain l31u11 18
(97)
we can decide l31 as l31
18 18 3 u11 6
(98)
Using this l31 ,we can decide l32 as l31u12 l32u22 13
l32
21 l31u12 21 3 5 2 u22 3
(99)
(100)
Using l32 we can decide u33 as l31u13 l32u23 u33 17
(101)
u33 17 l31u13 l32u23 17 3 4 2 2 1
(102)
Therefore, the original matrix is LU divided as
Kunihiro Suzuki
342
6 5 4 1 0 0 6 5 4 12 13 10 2 1 0 0 3 2 18 21 17 3 2 1 0 0 1
(103)
Let us consider a general algorithm for LU division with a four order matrix, which is given by a11 a21 a31 a41
a12 a22 a32 a42
a3 a23 a33 a43
1 0 l 1 21 l31 l32 l41 l42
0 0 1 l43
a4 a24 a34 a44 0 u11 u12 0 0 u22 0 0 0 1 0 0
u12 u11 l u l u 21 12 u22 21 11 l31u11 l31u21 l32 u22 l41u11 l41u12 l42u22
u13 u23 u33 0
u14 u24 u34 u44
(104)
u13 u14 l21u13 u23 l21u14 u24 l31u13 l32 u23 u33 l31u14 l32 u24 u34 l41u13 l42u23 l43u33 l41u14 l42u24 l43u34 u44
Focusing on the first row, we decide
u1 j
as
u1 j a1 j
(105)
Focusing on the second row, we first decide l21 as l21
a21 u11
(106)
Using l21 , we can decide u2 j j 2 as u22 a22 l21u12
(107)
u23 a23 l21u13
(108)
Matrix Operation u24 a24 l21u14
343 (109)
Focusing on the third row, we first decide l31 , l32 as l31
a31 u11
(110)
l32
1 a32 l31u22 u22
(111)
Using l31 , l32 , we can decide u3 j j 3 as u33 a33 l31u13 l32u23
(112)
u34 a34 l31u14 l32u24
(113)
Focusing on the third row, we first decide l41 , l42 , l43 as l41
a41 u11
(114)
l42
1 a42 l41u22 u22
(115)
l43
1 a43 l41u13 l42u23 u33
(116)
Using l31 , l32 , we can decide u44 as u44 a44 l41u14 l42u24 l43u34
(117)
We further generalize the above process. We first initialize the matrix elements as
0 lij 1
for i j , uij 0 for i j
(118)
Kunihiro Suzuki
344
Focusing on the first row, we decide
u1 j
as
u1 j a1 j
(119)
Focusing on the i 2 -th row, we first decide li1 as li1
ai1 u11
(120)
We then decide elements columns as
lij 2 j i
lij
of the second and the subsequent up to j i 1 -th
j 1 1 aij lik ukj u jj k 1
(121)
In the same row, we decide the elements uij of j i –th and the subsequent columns as j 1
uij j i aij lik ukj
(122)
k 1
8.3. Inverse Matrix Derivation Utilizing LU Division 1 We set an inverse matrix with respect to a matrix A as A , that is
AA1 E
(123)
We assume a 5 5 matrix as a11 a21 a31 a41 a 51
a12 a22 a32 a42 a52
a13 a23 a33 a43 a53
a14 a24 a34 a44 a54
a15 x11 a25 x21 a35 x31 a45 x41 a55 x51
x12 x22 x32 x42 x52
x13 x23 x33 x43 x53
x14 x24 x34 x44 x54
x15 1 x25 0 x35 0 x45 0 x55 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
(124)
Matrix Operation
The elements of inverse matrix are denoted as
. Eq. (124) can be expressed as
a11 a21 a31 a41 a 51
a12 a22 a32 a42 a52
a11 a21 a31 a41 a 51
a12 a22 a32 a42 a52
a13 a23 a33 a43 a53
a14 a24 a34 a44 a54
a15 x12 0 a25 x22 1 a35 x32 0 a45 x42 0 a55 x52 0
(126)
a11 a21 a31 a41 a 51
a12 a22 a32 a42 a52
a13 a23 a33 a43 a53
a14 a24 a34 a44 a54
a15 x13 0 a25 x23 0 a35 x33 1 a45 x43 0 a55 x53 0
(127)
a11 a21 a31 a41 a 51
a12 a22 a32 a42 a52
a13 a23 a33 a43 a53
a14 a24 a34 a44 a54
a15 x14 0 a25 x24 0 a35 x34 0 a45 x44 1 a55 x54 0
(128)
a11 a21 a31 a41 a 51
a12 a22 a32 a42 a52
a13 a23 a33 a43 a53
a14 a24 a34 a44 a54
a15 x15 0 a25 x25 0 a35 x35 0 a45 x45 0 a55 x55 1
(129)
a13 a23 a33 a43 a53
a14 a24 a34 a44 a54
a15 x11 1 a25 x21 0 a35 x31 0 a45 x41 0 a55 x51 0
xij
345
(125)
Therefore, we can derive all matrix elements using the simultaneous matrix elements derivation process five times. The matrix elements of a unit matrix is denoted as
eij ij
(130)
Kunihiro Suzuki
346
Applying the simultaneous equation solving process, we can obtain the matrix elements as y1 j e1 j i 1
yij eij lik ykj
for i 2
(131)
k 1
xnj
xij
ynj unn yij
(132)
n
u
k i 1
i ,k
xkj
uii
9. DETERMINANT OF A MATRIX The determinant of a matrix for the two and three order square matrix are given by
a11 a12 a11a22 a12 a21 a21 a2 a11 a21 a31
a12 a22 a32
a3 a23 a11a22 a33 a12 a23 a31 a13 a21a32 a3
(133)
(134)
a13 a22 a31 a12 a21a33 a11a23 a32
The determinant for more than the four order square matrix is rather complex and the derivation of this type is not practical. We do not discuss the determinant in detail, but only show the numerical process to obtain it. We can modify any row elements by adding the constant multiplied the other row elements, which does not change the value of the determinant. Therefore, we can obtain an up triangle one by applying the Gauss method as a '11 0 0
a '12 a '22 0
0
0
a '13 a '23 a '33
a '1n a '2 n a '3n 0
a 'nn
(135)
Matrix Operation
347
The corresponding determinant is then given by a '11 a '22 a '33
(136)
a 'nn
10. NUMERICAL EVALUATION OF EIGENVALUE We sometimes need to obtain eigenvalues and corresponding eigenvectors. They are fundamental unit of a matrix. We discuss the subject in this section.
10.1. Relationship between Matrix and Eigenvector Let us consider a square matrix with a a11 a A 21 an1
n -th order given by
a1n a2 n ann
(137)
We have corresponding
n eigenvalues and eigenvectors associated with A and
a12 a22 an 2
k k denote them as and u , which satisfy
Au
u
k
k
u
k
u1 k k u 2 uk n
(138)
(139)
It is known that each eigenvector is orthogonal, and is given by
u u ij i
j
We constitute a matrix that consists of the eigenvectors as
(140)
Kunihiro Suzuki
348
U u
u
1
2
u
n
(141)
T The corresponding transverse matrix U is given by
u1T 2 T u T U u n T
(142)
where u
k T
u1
k
u2
k
un
k
(143)
T The product of the matrices of U and U is given by
UU T u 1
u
2
u1 u1 2 1 u u u n u1 1 0 0 1 0 0 E
u
n
u u
2
u
2
u
2
u
n
u
2
1
u1T 2 T u u n T
1 n u u 2 1 u u n n u u
0 0 1
(144)
Similarly, we obtain U TU E
We consider a product of
(145)
A and U given by
Matrix Operation
AU u 1
2 u 2
1
n u n
349
(146)
T Multiplying U from the right side to Eq. (146), we obtain
AUU T AI A
(147)
Therefore, we obtain A AU U T
u 1
2 u 2
1
n u n
u u u u 1
1
1T
2
2
2 T
u1T 2 T u u n T
u u
n
n
n T
(148)
Therefore, eigenvectors are fundamental components of a matrix A . It is similar to that a space vector r is expressed by unit vector ei as
r ax e x a y e y a z e z
(149)
Therefore, the eigenvalue corresponds to the component related to the eigenvector. k When we multiply eigenvector u from the right side to the matrix A , we obtain
Au u u u k
1
1
u k
1T
k
u u u 2
2
2T
k
u k
k
u u k T
k
u n
n
u u nT
k
k
(150) Therefore, we can extract the k-th component. It is similar to the space vector case where we have
r e x ax e x e x a y e y e x az e z e x ax
(151)
Kunihiro Suzuki
350
10.2. Power Method We assume ordered eigenvalues given by
1 2
n
(152)
1 We show how to evaluate the maximum eigenvalue of . The eigenvectors are independent of each other, and any vector eigenvectors as
v C1u C2 u 1
2
Cn u
v
is expressed with the
n
(153)
Multiplying a matrix A from the left side to Eq. (153), we obtain
Av C1 Au C2 Au 1
C p Au
2
C1 u C2 u 1
1
2
We perform this process
r
2
C p u
1
1
r
r
1
n
n
(154)
times and obtain
u C u
Ar v C1
n
2
r
2
2
2 C1u1 C2 1
r
2 u
u
Cp
n
r
n C p 1
n
r n u
(155)
Since we have 1 2 n , the second and the subsequent becomes negligible when r increases. Therefore, we can obtain
Ar v C1 1
r
u 1
(156)
Therefore, multiplying A to v with many times, we obtain the maximum eigenvector with a factor. The other thing is how we can judge the sufficient cycle time to extract the maximum eigenvector, which is shown below. We start with an initial vector of v
Matrix Operation v 0
1 n
1
1
351
1
(157)
This can be replaced by the other ones without zero vectors. We multiply A to this vector, and normalize its size as v 1
Av
0
Av
0
v11 1 v 2 v1 n
(158)
We further multiply A to this vector, and normalize its size as v 2
Av 1
Av 1
v21 1 v 2 v1 2
(159)
We further multiply A to this vector, and normalize its size as v 3
Av
2
Av
2
v1 3 3 v 2 v 3 n
(160)
We then repeat this process many times. r * r 1* If the first component becomes predominant, the all elements of v and v become close. Setting a critical value of , we can evaluate as
Kunihiro Suzuki
352 v
r 1
v
v
r
r 1
(161)
If this equation is valid, we judge the first component becomes predominant. Then, the first eigenvector is determined as u v 1
r1
(162)
The eigenvalue can be evaluated from the evaluation of Au u 1
1
1
(163)
We then obtain the eigenvalue as
1
u Au 1T
1
u u 1T
1
(164)
We can evaluate the other eigenvectors and eigenvalues as follows. From Eq. (148), we make a matrix given by
A2 A u u 1
u u 2
2
1
1T
2 T
u u n
n
n T
(165) 1
This A2 is a matrix without the fundamental component of u . Therefore, any vector v is expressed as
v C2 u
Cn u
2
(166)
A2 from the left side to the vector, we obtain
Multiplying
A2 v C2 A2 u 2
C2 u 2
n
2
Cn A2 u
n
Cn u n
n
(167)
Matrix Operation
353
We then obtain the second eigenvector and eigenvalue. We can repeat this process to the end, and finally we can obtain all eigenvectors and related eigenvalues. In this procedure, the error is accumulated with solving the eigenvectors. Therefore, we cannot expect accurate eigenvectors and eigenvalues for smaller eigenvalues. This is mainly applied to obtain the maximum eigenvector and eigenvalue.
11. JACOBI METHOD FOR SYMMETRICAL MATRIX A Jacobi method aims to force elements except for the diagonal ones to zero. After the completion of the process, we can obtain all the eigenvalues and eigenvectors. Although this method is limited to a symmetrical matrix, most of the case where statistics require is related to the symmetrical matrix. Therefore, the Jacobi method is vital for statistics and may be most important. In the symmetrical matrix, any two eigenvectors are orthogonal to each other, which are shown below. Let us consider two eigenvectors of uA and uB and corresponding eigenvalues are
A , and B . We then have AuA A uA
(168)
AuB B uB
(169)
The transverse of Eq.(169) is given by uTB A B uTB
(170)
Since A is a symmetrical matrix, we can assume AT A . Multiplying uA from the right side, we obtain uTB AuA B uBT uA
(171)
The left side is performed as
uTB AuA uBT A uA A uBT uA
(172)
Kunihiro Suzuki
354 Therefore, we obtain
A B uTB uA 0
(173)
Since A B , we obtain
uTB uA 0
(174)
Therefore, it is proved that any two eigenvectors are orthogonal to each other. Let us start with a second order square matrix. The corresponding eigenvalue problem is described as
Ax x
(175)
We assume that eigenvector
x
is normalized, and hence it is expressed by
cos x sin
(176)
Therefore, Eq.(175) is expressed by a11 a21
a12 cos cos a22 sin sin
(177)
This is modified as a11 a21
a12 cos 0 a22 sin
(178)
This is reduced to a11 cos a12 sin 0 a21 cos a22 sin 0
This has solutions only when the below is held.
(179)
Matrix Operation a11 a21
a12 0 a22
355
(180)
Therefore, we obtain
a11 a22 a12 a21 0
(181)
Finally, we obtain the eigenvalues of a a a a 11 22 11 22 a12 a21 2 2 2
(182)
Since we treat the symmetrical matrix, a12 a21 , and Eq. (182) is reduced to a11 a22 a a22 2 11 a12 2 2 2
(183)
We denote two solutions as a11 a22 a a 11 22 a122 2 2
(184)
a11 a22 a a 11 22 a122 2 2
(185)
2
1
2
2
Substituting 1 into Eq. (179), we obtain
a11 1 cos1 a12 sin 1 0
(186)
Modifying this, we obtain tan 1
1 a11 a12
Similarly, substituting 2 into Eq. (179), we obtain
(187)
Kunihiro Suzuki
356 tan 2
2 a11 a12
(188)
We can evaluate 1 and 2 from Eqs.(187) and (188). However, we have a relationship given by tan tan
(189)
Therefore, the angle is not uniquely determined. In the Jacobi method, we do not evaluate the angles directly, but evaluate cos and sin , and the ambiguity vanishes, which is shown later. Let us consider eigenvectors in more detail, which is given by cos 1 cos 2 x1 , x2 sin 1 sin 2
(190)
Performing an inner product of the eigenvectors, we obtain sin 1 sin 2 cos 1 cos 2 sin 1 sin 2 cos 1 cos 2 1 cos 1 cos 2 cos 1 cos 2 1 tan 1 tan 2 a a cos 1 cos 2 1 1 11 2 11 a12 a12
(191)
Furthering the calculation, we obtain 1
1 a11 2 a11
1
a12
a12
12 1 2 a11 a112 a12 2
a11 a22 a11 a22 2 2 a12 a11 a22 a11 a11 2 2 1 a12 2 2
0
2
(192)
Therefore, the two eigenvectors are orthogonal to each other as is shown in general before.
Matrix Operation
357
We define the matrix H consists of the eigenvectors as cos 1 H sin 1
cos 2 sin 2
(193)
We perform a product of AH , which is a12 cos 1 cos 2 a AH 11 a21 a22 sin 1 sin 2 a cos 1 a12 sin 1 a11 cos 2 a12 sin 2 11 a21 cos 1 a22 sin 1 a21 cos 2 a22 sin 2 cos 1 2 cos 2 1 1 sin 1 2 sin 2 cos 1 cos 2 1 0 sin 1 sin 2 0 2 H
(194)
where 1 0
0 2
(195)
When H is orthogonal, the composing column vector is orthogonal to each other. Then, the composing row vector is orthogonal to each other. Therefore, the following is valid.
HH T E
(196) T
Performing HH , we obtain cos 1 cos 2 cos 1 sin 1 HH T sin 1 sin 2 cos 2 sin 2 cos 2 1 cos 2 2 cos 1 sin 1 cos 2 sin 2 sin 2 1 sin 2 2 sin 1 cos 1 sin 2 cos 2 1 0 0 1
(197)
Kunihiro Suzuki
358 Therefore, we obtain sin2 cos1 ,cos2 sin2
(198)
Consequently, H is a rotation matrix of Sr given by cos H Sr sin
sin cos
(199)
T The transverse matrix of Sr is denoted as Sr and is given by
cos S rT sin
sin cos
(200)
Therefore, the product is given by cos sin cos sin SrT Sr sin cos sin cos cos 2 sin 2 cos sin sin cos sin 2 cos 2 sin cos cos sin 1 0 E 0 1
(201)
We then obtain SrT Sr1
(202)
as is expected. Further, we obtain ASr Sr
(203)
We then obtain A Sr SrT
(204)
Matrix Operation
359
We can modify it as follows SrT ASr
(205) T
This means that any matrix A can be diagonalized by multiplying Sr from the left, and Sr from the right. It is also noted that any symmetrical matrix can be expressed with T a form of Sr Sr .
We limit our analysis to n 2 up to here. Let us then consider n 2 . We select the row number of
p
a pq 0 p q
, and the column number of . We rotate the matrix so that
and finish the process when all
aij i j
q
a pq
, and consider a corresponding element is forced to be 0 . We repeat any elements
is sufficiently small.
xx The rotation with an angle of in p q plane, is expressed by 1 Sr
p
q
cos
sin
1 1 sin
cos 1
1
(206)
Let us consider the transformation using Sr as
B SrT ASr
When we multiply
(207)
Sr to A from the right side, only the elements of p and q
T p columns change. When we multiply Sr to ASr from the left side, only the elements of
q
and rows change. We consider the change of the elements after the operation of Eq. (207) inspecting Figure 1.
Kunihiro Suzuki
360
A The elements in region are changed only by the operation of multiplying Sr from the right.
The elements in region the left.
B are changed only by the operation of multiplying SrT from
The elements in region and are changed by both multiplying. The other elements are unchanged. The next thing we should do is to select the angle associated with the operations. D
C
Figure 1. The change elements changed by the operation of Eq. (207).
Let us compare the elements of matrices A and B denoted by row number i and the column number j . bij aij for i, j 1, 2,
, n : i , j p, q
bip bpi aip cos aiq sin for i 1, 2,
biq bqi aip sin aiq cos for i 1, 2,
bpp a pp cos 2 aqq sin 2 a pq sin 2
bqq a pp sin 2 aqq cos 2 a pq sin 2
bpq bqp a pq cos 2
a pp a pq 2
aij
and
bij
with the
(208) , n : j p, q
, n : j p, q
(209) (210)
(211)
(212)
sin 2
(213)
Matrix Operation
361
The sum of the square of p and q column in region (A) is given by bip2 biq2 aip2 cos 2 aiq2 sin 2 2aip aiq cos sin aip2 sin 2 aiq2 cos 2 2aip aiq cos sin aip2 cos 2 sin 2 aiq2 cos 2 sin 2 aip2 aiq2
(214)
Therefore, the sum is unchanged for matrices A and B in this region. Since A and B are symmetrical ones, and the same results are obtained for region (B), which is expressed by
bpi2 bqi2 a 2pi aqi2
(215)
Therefore, the difference of sum of square of non-diagonal elements for matrices A and B is expressed by the difference of elements in region C , which is given by 2 2 a2pq bpq
(216)
This becomes the maximum when Since the operation of
bpq
is 0.
SrT ASr corresponding to the rotation of the matrix elements of
A, the total norm, which is called as a Frobenius norm, must hold. This is expressed by n
n
a i 1 j 1
2 ij
n
n
bij2 i 1 j 1
(217)
Therefore, when we perform this operation, we can expect smaller element value for non-diagonal elements in the region (C), and larger diagonal elements value in the region (D). The corresponding angle can be evaluated from Eq. (213) as bpq a pq cos 2
We then obtain
a pp a pq 2
sin 2 0
(218)
Kunihiro Suzuki
362
tan 2
2a pq a pp a pq
(219)
We can then evaluate cos and sin as follows (see Appendix 1-13). cos 2
1 1 tan 2 2
(220)
sin 2 tan 2 cos2
(221)
1 cos 2 2
cos
(222)
sin 2 2cos
sin
(223)
Let us consider an eigenvector. The Jacob method repeats the process where the maximum non diagonal element to be zero, we finally obtain the matrix given by Sr T
Sr 2T Sr1T ASr1Sr 2
Sr
(224)
where 1 0
2
0 n
(225)
It should be noted that we have ei i ei
(226)
This can be expressed by
S
T r
SrT2 SrT1 ASr1Sr 2
Sr ei i ei
(227)
Matrix Operation Multiplying Sr1Sr 2
Sr from the left, we obtain
Sr ei i Sr1Sr 2
ASr1Sr 2
363
Sr ei
(228)
Therefore, we obtain eigenvector as
vi ASr1Sr 2
Sr ei
(229)
Summarizing above, we can describe the Jacobi method as follows. Set a convergence condition .
0 Set the initial eigenvector matrix V as E .
Search the maximum non diagonal element
a pq
in matrix A .
If
a pq
is valid, the searching process is finished.
If
a pq
is valid, evaluate cos ,sin and obtain Sr .
0
Then update A and V below A SrT1 A Sr1 1
V
1
0
(230)
V S r1 0
(231)
a
Perform next step of searching next pq . We can then obtain a diagonal matrix. A SrT A
1
SrT
Sr
SrT2 SrT1 A Sr1Sr 2 0
Sr
(232)
The j-th column elements correspond to the j-th eigenvalue. Evaluate V V
1
Sr
V S r1 S r 2 0
Sr
The j-th column vector corresponds to the j-th eigenvector.
(233)
Kunihiro Suzuki
364
12. n-TH PRODUCT OF MATRIX We consider matrix A given by a11 a A 21 an1
a1n a2 n ann
a12 a22 an 2
(234)
k We consider the k-th product of the matrix of A . We assume that we know the
eigenvectors and eigenvalues of the matrix A and denote them as x1 ,x2 , ,xn . We then form a matix
P x1 x2
xn
(235)
where a11 a12 a a Axi 21 22 an1 an 2
a1n a2 n x i i ann
(236)
The product of A and P is given by a11 a12 a a22 AP 21 an1 an 2 1x1 2 x 2 x1 1 0 P 0
x2
0
2 0
a1n a2 n x x2 1 ann n x n 1 0 xn 0 0 0 n
0
2 0
xn
0 0 n
(237)
Matrix Operation
365
1 We also evaluate the inverse matrix of P denoted as P . We can realize a diagonal matrix as
1 0 0 2 P 1 AP 0 0
0 0 n
(238)
Therefore, we obtain
P
1
AP
k
1k 0 0
0
2
k
0
0 0 n k
(239)
On the other hand, we can extend the left side of Eq. (239) as
P
1
AP
P k
1
P AP A PP P
AP P 1 AP
P 1 A PP 1
1
1
(240)
P 1 Ak P
Therefore, we obtain 1k 0 k A P 0
0
2 k 0
0 0 1 P n k
(241)
Appendix 1
RELATED MATHEMATICS We briefly show various mathematical treatments which are used in the text book.
1. SUMMATION We utilize a summation expression using n
1 1 1
AND PRODUCT
whole in this book. It is defined as
1 n
k 1
(1)
n
k 1 2
n
k 1
n
k
2
(2)
12 22
n2
k 1
(3)
and so on for higher order summation. Since k is a dummy variable, we can use any variable as n
n
n
m 1 2 k 1
k
l 1
l
m 1
n (4)
Kunihiro Suzuki
368
We can apply this to a suffix of a variable as
x1 x2
xn
n
x
k
k 1
(5)
Therefore, the equation in the text book is given by
a b
n
n
n Cr a n r b r r 0
n C0 a n n C1a n 1b n C2 a n 2 b 2
n Cn 1ab n 1 n Cn b n
(6)
expresses the product as given by n
i 1 2
n
i 1
(7)
This can then be related to a factorial as n
i 1 2
n n!
i 1
(8)
Note that
0! 1
(9)
The definition of 0! comes from below. The combination where we select
n
Cr
r among n elements is given by
n! r ! n r !
(10)
When we set n r , the case number should be 1, that is
n
Cn
n! 1 1 n! n n ! 0!
Therefore, we impose Eq. (9).
(11)
Related Mathematics
369
expresses a more general case as shown below. n
x i 1
i
x1 x2
xn (12)
We also define double factorial where the elements skip one, and is given by
2n !! 2n 2n 2 2 2n 2n 2 2n 2 n 1 n
2n 2 i 1 i 1
(13)
2n 1!! 2n 1 2n 1 1 2n 1 2n 1 2n 1 2n n 1
2n 1 2 i 1 i 1
(14)
2. A GAMMA FUNCTION AND A BETA FUNCTION 2.1. Definition of a Gamma Function Gamma function is defined as
x
0
exp t t x 1dt
(15)
We consider a more general form than Eq.(15), where the term in the exponential term is not t , but it has a term of t a given by
t x 1 t exp dt 0 a
(16)
This can also be related to the Gamma function. We introduce a variable y
t
a
and obtain
Kunihiro Suzuki
370
x 1 t x 1 t exp dt 0 ay exp y ady 0 a
a x y x 1 exp y dy 0
a x x
(17)
2.2. A Gamma Function and a Factorial A Gamma function has a relationship of
x 1 x x
(18)
which we prove hereafter as follows.
x x exp t xt x 1dt 0
dt x dt exp t dt 0
0
0
(19)
exp t t exp t t x dt x 1
Therefore, we obtain n 1 n n n n 1 n 1
(20)
n! 1
where
n
is a natural number. We further obtain
1 exp t dt 1 0
(21)
Therefore, we obtain factorial with a Gamma function as
n 1 n!
(22)
Related Mathematics
371
This can be extended to a real number as
x! x 1 exp t t x dt 0
(23)
2.3. Evaluation of 1 2 2 Introducing a variable of t u , we obtain dt 2udu , and Eq. (15) is reduced to
2
x
exp u 2 u 2 x 2 2udu
0
exp u 2 u 2 x 1du
0
(24)
Therefore, we obtain 1 2 2
exp u 2 du
0
(25)
2.4. A Gamma Function Where x < 0 A Gamma function is expressed as with a limiting form as
n !n x n x x 1 x 2
x lim
x n
(26)
which is proved as follows. We consider a function given by n
0
t
n
n x 1 t x 1dt n
Introducing a variable of
t n s , we obtain
(27)
Kunihiro Suzuki
372 1
1 s ns n 1 s s
n x
n
x 1
nds
0
1
x
n
x 1
ds
(28)
0
Therefore, we obtain
n x n
x
1
1 s
n
s x 1ds
0
1
n n 1 1 s s x x 0 x 1 n 1 s n1 s x ds 0 x
1
1 s
n 1
s x ds
0
(29)
We then obtain
n x n
x
1
1 s
n
s x 1ds
0
n 1 1 s n 1 s x ds x 0 n n 1 1 n2 1 s s x 1ds x x 1 0
n n 1 n 2
1
1 s x x 1 x 2
n 3
s x 2 ds
0
n n 1 n 2 1
x x 1 x 2
x x 1 x 2
1
1 s x n 1
0
s x n 1ds
0
n!
x n 1 x n
(30)
Therefore, we obtain
n x
n !n x x x 1 x 2 x n 1 x n
On the other hand, we obtain
(31)
Related Mathematics n
373
t lim n x lim 1 t x 1dt n n 0 n
n
0
n
exp t t x 1dt
x
(32)
Therefore, we obtain
n !n x n x x 1 x 2 x n 1 x n
x lim
(33)
x is infinite for x = 0, -1, -2, ・・・・・. That is, 1 0 n
(34)
The dependence of the Gamma function on x is shown in Figure 1.
10
Gamma (x)
5 0 -5 -10 -5 -4 -3 -2 -1 0 1 2 x
3 4 5
Figure 1. A Gamma function for whole planes of positive and negative regions.
2.5. A Product of a Gamma Function of n is a natural number and we set
n n 1 2 2
Kunihiro Suzuki
374 n n 1 n 2 2
(35)
From the characteristics of a Gamma function, we obtain n 1 n 1 n 1 2 2 2
(36)
Therefore, we obtain n n 1 n n 1 n 1 n 2 2 2 2 2
Consequently,
n
n
(37)
holds a recursion of
n 1 n 1 2
(38)
We then obtain n 1 n 1 2 n 1 n 2 n 2 2 2 n 1 n 2 1 1 2 2 2 n 1 n 2 1 1 1 2 2 2 2 n 1! n 1 2 21 n n
n
(39)
Finally, we obtain n n 1 1 n n 2 2 2
(40)
Related Mathematics
375
This is valid even for n 1 . This is valid when n is a real number, which we do not prove here.
2.6. A Binominal Factor for n k A binominal factor is given by
n
Ck
n! n k !k !
(41)
Therefore, it is expressed with Gamma functions as
n
Ck
n 1
(42)
n k 1 k 1
n k 1 When n and k are integers, and n k , the term approaches to infinity. Therefore, we have
n
Ck 0 for n < k
(43)
2.7. A Beta Function We consider the product of a Gamma function as
x y 4 exp u 2 u 2 x 1du exp v 2 v 2 y 1dv
0
4
0
0
0
exp u v u 2 x 1v 2 y 1dvdu 2
2
(44)
We convert variables as u r cos , v r sin
(45)
The incremental integration area is converted as
dvdu rd dr
(46)
Kunihiro Suzuki
376
Therefore, the product is expressed as
x y 4 0
2 0
exp r 2 r cos
2 exp r 2 r
2 x y 1
0
2 x 1
r sin
dr 2 2 cos
2 x 1
0
x y 2 2 cos
2 x 1
0
sin
2 y 1
2 y 1
d rdr
sin
2 y 1
d
(47)
d
where we change r from 0 to infinity, and from 0 to 2 . We then obtain a Beta function as
B x, y
x y
(48)
x y
The Beta function is thus defined as
B x, y 2 2 cos
2 x 1
0
sin
2 y 1
d
(49)
Performing a variable conversion, cos 2 t
(50)
we obtain
2cos sin d dt
(51)
Therefore, the Beta function is also expressed by
B x, y 2 2 cos
2 x 1
0
sin
2 y 1
cos sin cos sin 1 0
cos 2 1
0
2 x 1
x 1
sin
t x 1 1 t dt 1
0
y 1
2
d
2 y 1
y 1
dt dt
(52)
Related Mathematics
377
3. GAUSS INTEGRATION 3.1. Normal Gauss Integration Gauss integration I is defined as
I
exp ax2 dx
0
(53)
We change the integration region form 0, to , and express the integration as I ' and is given by
I'
exp ax2 dx
exp ax 2
Since
(54)
is an even function,
I ' 2I
(55)
Since x is a dummy variable, we can use any notation. Changing a variable from x to
y
, we can also express the integration given by
I'
exp ay 2 dy (56)
Multiplying the both equation, we obtain
I '2 exp ax 2 dx exp ay 2 dy
exp a x 2 y 2 dxdy
(57)
We change the integration in the Cartesian axis system to that in the polar axis system as
Kunihiro Suzuki
378
r 2 x2 y 2 dxdy 2 rdr
(58)
We then obtain
I '2 0
exp ar 2 2 rdr (59)
Introducing a variable u r2
(60)
we then obtain
du 2rdr
(61)
The integration is given by
I '2 0
exp ar 2 2 rdr
exp au 2 r
0
0
1 du 2r
exp au du
a
(62)
Finally, we obtain 1 I' 2 1 2 a
I
3.2. Modified Gauss Integration The integration related to the Gauss integration is given by
(63)
Related Mathematics
In a
379
x2n exp ax2 dx
0
(64)
We start with the Gauss integration, which can be denoted as I 0 a . The Gauss integration is given by
I0 a
exp ax2 dx
0
(65)
We regard this as a function of a . Differentiating this equation with respect to a , and we obtain dI 0 a
da
d
exp ax 2 dx
0
da
x exp ax 2 dx 2
0
I1
(66)
On the other hand, we obtain 1 d 2 a dI 0 a da da
3
1 a 2 2 2
(67)
Therefore, we obtain I1 a
1 22
a3
(68)
We differentiate I1 a and obtain dI1 a da
0
x 4 exp ax 2 dx
Therefore, we obtain I 2 a as
5
1 a 2 2 2
(69)
Kunihiro Suzuki
380 I2 a
x 4 exp ax 2 dx
0
3 23
a5
(70)
We repeat the above process, and obtain a general form as In a
3 5 7
x 2 n exp ax 2 dx
0
2 2n 1!! 2
n 1
2n 1 n 1
a
2 n 1
a
2 n 1
(71)
where we define
2n 1!! 1 3 5 7
2n 1
(72)
The other integration related to the Gauss integration is also given by
Kn a
x2n1 exp ax2 dx
0
(73)
We start with n 0 given by
K0 a
x exp ax 2 dx
0
(74)
Introducing a variable u x2
(75)
We then have
du 2xdx We obtain
(76)
Related Mathematics K0 a
x exp ax 2 dx
0
x exp au
0
1 2 1 2a
0
1 du 2x
exp au du
(77)
We regard this as a function of obtain dK0 a
da
381
d
a . Differentiating this equation with respect to a , and
x exp ax 2 dx
0
da
x3 exp ax 2 dx
0
K1 a
(78)
On the other hand, we obtain 1 d 2a da 1 2 2a
dK 0 a da
(79)
Therefore, we obtain K1 a
1 2a 2
(80)
We differentiate this further and obtain dK1 a da
0
x5 exp ax 2 dx
Therefore, we obtain
2 2a 3
(81)
Kunihiro Suzuki
382
K2 a
x5 exp ax 2 dx
0
2 2a 3
(82)
We repeat the process and obtain a general form as Kn a
x 2 n 1 exp ax 2 dx
0
2 3 n 2a n 1 n! n 1 2a
(83)
4. AN ERROR FUNCTION An error function Erf x is defined as the integration of a Gauss function as Erf x
1
x
0
exp y 2 dy
(84)
The boundary conditions for the function is given by Erf 0 0
(85)
Erf 1
(86)
The complementary error function is defined as
Erfc x 1 Erf x
(87)
Let us consider the integration below. x
I 0
z2 exp dz 2 2 1
(88)
Related Mathematics
383
Introducing a variable t
z
(89)
2
we further perform the integration as x
z2 exp dz 2 2
I 0
0
1
x
1
2
2 x
exp t 2 2dt
1 2 0
1 x Erf 2 2
2
2
(90)
exp t 2 dt
The dependences of Erf x and Erfc x on x are shown in Figure 2. The inverse error function Erf Erf 1 x
1
x is approximately expressed as
1 7 2 5 127 3 7 4369 4 9 34807 5 11 x x3 x x x x 2 12 480 40320 5806080 182476800
1.0 Erf(x)
Erf(x), Erfc(x)
0.8 0.6 0.4 Erfc(x)
0.2 0.0 0.0
0.5
1.0 x
1.5
2.0
Figure 2. Dependence of an error function and a complementary error function on x .
(91)
Kunihiro Suzuki
384
5. AN INTEGRAL AREA OF CONVERTED VARIABLES We discuss an integral area for converted two variables, where the two variables
x, y is converted to u, v . The corresponding schematic expression is shown in Figure 3. We start with one variable, and consider the conversion given by
x x u
(92)
The integral for the converted variable can be given by
x2
x1
f x dx f x u u2
u1
The factor shows
dx
du
dx du du
(93)
how the axis length is changed by the conversion.
Let us treat two variables. We convert
x, y to u, v , which is expressed in general
as x x u, v y y u, v
(94)
The corresponding total derivative is given by dx dy
x x du dv u v y y du dv u v
(95)
Using a matrix form, this can also be expressed by x dx u dy y u
x v du y dv v
(96)
Related Mathematics
385
This means that the vectors 1,0 , 0,1 , 1,1 in u, v plane is converted to x dx u dy y u x u y u
x v 1 y 0 v
x dx u dy y u x v y v
x v 0 y 1 v
(97)
(98)
x x dx u v 1 dy y y 1 u v x x v v y y v v
in
x, y plane. The area in x, y is given by det J
x u J y u
x v y v
(99)
, where
(100)
Kunihiro Suzuki
386
u, v is related to the incremental Therefore, the incremental area dudv in the plane area in the
x, y plane as
dxdy det J dudv
(101)
Figure 3. Integral areas for converted variables.
6. A MARGINAL PROBABILITY DISTRIBUTION We consider two probability variables X and Y , and relate them to the probability distribution f x, y , which correspond to the probability distribution where both
y
x and
occur. We want to know the probability of X independent of Y . We can obtain it by
f x summing up the probability for X with whole Y . We set it as 1 . If the values are discrete ones, we can obtain N
f1 X xi pij pi1 pi 2 j 1
piN
(102)
If the values are continuous ones, we obtain f1 x
f x, y dy
Similar analysis can be done for
(103)
y
and we obtain
Related Mathematics f2 y
f x, y dx
(104)
The conditional probability
f x y
387
f x y
can be expressed by
f x, y f2 y
(105)
If X is independent on Y , we obtain f x, y f1 x f 2 y
(106)
7. INTEGRATION BY PARTS Let us consider the derivative of a product of two functions fG , which is given by
fG f 'G fg '
(107)
where
g G'
(108)
We can obtain
fgdx fG f Gdx '
(109)
The -th moment of an exponential distribution is given by
X
x 1 x exp dx 0
where we can regard as
(a-4)
Kunihiro Suzuki
388
f x
g
(110)
x 1 exp
(111)
We then obtain f ' x 1
(112)
x G exp
(113)
Finally, we obtain
X
x 1 x exp dx 0
x x x exp x 1 exp dx 0 0
x x 1 exp dx 0
(114)
8. DERIVATIVES OF INVERSE TRIGONOMETRIC FUNCTIONS We define a variable as
x sin y
1 cos y
for
y 2 2
dy dx
Therefore, we obtain
(115)
(116)
Related Mathematics
dy 1 dx cos y 1 1 x2
389
(117)
This leads the integration given by
1 dx sin 1 x C 2 1 x
(118)
We define a variable as for 0 y
x cos y
1 sin y
dy dx
(119)
(120)
Therefore, we obtain dy 1 dx 1 x2
(121)
This leads the integration given by
1 dx cos1 x C 2 1 x
(122)
y 2 2
(123)
x tan y
for
We then obtain 1 dy cos 2 y dx dy 1 x2 dx
1
(124)
Kunihiro Suzuki
390 Therefore, we obtain
dy 1 dx 1 x 2
(125)
This leads the integration given by
1 dx tan 1 x C 1 x2
(126)
9. A DERIVATIVE FUNCTION The derivative function for f x is defined as f ' x lim
f x x f x
(127)
x
x 0
We then have
f x g x ' f ' x g x f x g ' x
(128)
This can be proved as f x g x ' f x x g x x f x g x lim x 0 x f x x g x x f x g x x f x g x x f x g x lim x 0 x f x x g x x g x lim g x x f x lim x 0 x 0 x x f ' x g x f x g ' x
(129)
Eq. (128) can be generalized as f x g x
n
n
n Ck f k 0
nk
g
k
(130)
Related Mathematics
391
We also have
f x f ' x g x f x g ' x 2 g x g x '
(131)
We do not directly prove it, but consider the following. 1 1 ' 1 g x x g x lim x g x x 0 g x x g x g x x g x lim x 0 x g x x g x x lim x 0 g x x g x
(132)
g ' x g x
2
Therefore, we obtain f x 1 1 f x f ' x g x g x g x f ' x g ' x f x 2 g x g x f ' x g x f x g ' x 2 g x '
'
(133)
We also have g f x g ' y f ' x '
(134)
where
y f x
(135)
Kunihiro Suzuki
392 We prove above. We have
f x x y y
(136)
Therefore, we obtain
x 0 : y f x x f x 0
(137)
Therefore, we obtain g f x lim x 0 '
lim
g f x x g f x x g y y g y y y
x 0
lim
g y y g y f x x f x y
x 0
lim
x
g y y g y
y 0
y
x lim
(138)
f x x f x
x 0
x
g ' y f ' x
10. VECTOR DERIVATIVE We treat vector derivative here. We consider a p-th order vector given by
a1 a2 β ap We consider a scalar
(139)
f which depends on the elements of β , which is denoted as
f a1 , , a p . The vector derivative is then defined as
Related Mathematics f a 1 f f a2 β f a p
393
(140)
We consider two special forms for f here. Let us consider a vector X , and a form f as
f βT X
(141)
T Since β is the 1 p vector, X should be p 1 vector given by
x1 x2 X xp
(142)
Therefore, we obtain
f βT X x1 x2 a1 a2 ap xp a1 x1 a2 x2 a p x p p
ai xi i 1
We then obtain
(143)
Kunihiro Suzuki
394 T f β X β β p
ai xi i 1
β
(144)
x1 x2 xp X
f as Let us consider a symmetrical matrix M , and a form
f βT Mβ
(145)
T p p Since β is the 1 p vector and β is the p 1 vector, M should be a matrix given by
m11 m12 m21 m22 M m p1 m p 2
m1 p m2 p m pp
(146)
Therefore, we obtain
f βT Xβ a1 a2
p
p
ai mij a j i 1 j 1
We then obtain
m11 m12 m21 m22 ap m p1 m p 2
m1 p a1 m2 p a2 m pp a p
(147)
Related Mathematics
395
T f β Mβ β β p
p
ai mij a j
(148)
i 1 j 1
β
Let us consider f a1 . The Eq. (148) can be reduced to p
f a1
p
ai mij a j i 1 j 1
a1
p p a1 m1 j a j a1 ai mi1 a12 m11 j 2 i2 a1 p 2a1 m1 j a j a12 m11 j 2 a1
(149)
p 2 m1 j a j a1m11 j 2 p
2 m1 j a j j 1
Since we assume a symmetrical matrix, we assume
mi1 m1i
(150)
in the derivation process. Therefore, we obtain T f β Mβ β β
p 2 m1 j a j j 1 p 2 m2 j a j j 1 p 2 m pj a j j 1 2 Mβ
(151)
Kunihiro Suzuki
396
11. SYMMETRY OF THE MATRIX N x1 A N y2 At N x1 1
We use a matrix of in corresponding analysis when we evaluate eigenvalues and eigenvectors. We study the symmetry of the matrix.
At N x1
At N x1
is evaluate as
n11 n 12 n13 n14
n31 n32 n33 n34
n21 n22 n23 n24
n11 nx1
nx 2 n22
nx1
nx 2
n13
n23
nx1
nx 2
n14
n24
nx1
nx 2
N Multiplying 2 y
1 n y1 0 2 1 t 1 N A N y x 0 0
0 0 1 nx 3
0
nx1
1
0
nx 2
0
0
n31 nx 3 n32 nx 3 n33 nx 3 n34 nx 3
n21
n12
1
(152)
1
to the matrix, we obtain
0
0
1 ny 2
0
0
1 ny 3
0
0
0 0 0 1 n y 4
n11
n21
nx1 n y1
nx 2 n y1
n12 nx1 n y 2 n13 nx1 n y 3 n14 nx1 n y 4
n22 nx 2 n y 2 n23 nx 2 n y 3 n24 nx 2 n y 4
n11
n21
nx1
nx 2
n12
n22
nx1
nx 2
n13
n23
nx1
nx 2
n14
n24
nx1
nx 2
n31 n x 3 n y1 n32 nx 3 n y 2 n33 nx 3 n y 3 n34 nx 3 n y 4
n31 nx 3 n32 nx 3 n33 nx 3 n34 nx 3
(153)
Related Mathematics We further multiply
397
A to the matrix, and obtain
A N y2 At N x1 1
n11 n21 n 31
M 11 M 21 M 31
n12 n13 n14 n22 n23 n24 n32 n33 n34 M 12 M 13 M 22 M 23 M 32 M 33
n31 nx 3 ny1
n11 nx1 ny1
n21 nx 2 ny1
n12 nx1 ny 2
n22 nx 2 ny 2
n32 nx 3 ny 2
n13 nx1 ny 3
n23 nx 2 ny 3
n33 nx 3 ny 3
n14 nx1 ny 4
n24 nx 2 ny 4
n34 nx 3 ny 4
(154)
Each component is given by M11
M12
M13
M 21
M 22
M 23
n 2 n112 n 2 n 2 12 13 14 nx1 ny1 nx1 ny 2 nx1 ny 3 nx1 ny 4
(155)
n n n11n21 n n n n 12 22 13 23 14 24 nx 2 ny1 nx 2 ny 2 nx 2 ny3 nx 2 ny 4
(156)
n11n31 n n n n n n 12 32 13 33 14 4 nx3 ny1 nx3 ny 2 nx3 ny3 nx3 ny 4
(157)
n n n21n11 n n n n 22 12 23 13 24 14 nx1 ny1 nx1 ny 2 nx1 ny3 nx1 ny 4
(158)
n232 n212 n222 n242 nx 2 ny1 nx 2 ny 2 nx 2 ny3 nx 2 ny 4
(159)
n21n31 n n n n n n 22 32 23 33 24 4 nx3 ny1 nx3 ny 2 nx3 ny3 nx3 ny 4
(160)
Kunihiro Suzuki
398 M 31
M 32
M 33
n31n112 n n 2 n n 2 n n 2 32 12 33 13 34 14 nx1 ny1 nx1 ny 2 nx1 ny3 nx1 ny 4
(161)
n31n21 n n n n n n 32 22 33 23 34 24 nx 2 ny1 nx 2 ny 2 nx 2 ny3 nx 2 ny 4
(162)
n312 n 2 n 2 n 2 32 33 34 nx3 ny1 nx3 ny 2 nx3 ny 3 nx3 ny 4
(163)
Finally, we multiply
N x 1 1
to the matrix. This corresponds to that we multiply
n
x2 to the first row elements, to the second row elements, and elements. Therefore, we obtain
M11
M12
M13
M 21
M 22
M 23
M 31
n 2 n112 n 2 n 2 12 13 14 nx1ny1 nx1ny 2 nx1ny 3 nx1ny 4
1
nx3
1
nx1
to the third row
(164)
n13n23 n11n21 n12 n22 n14 n24 nx1nx 2 ny1 nx1nx 2 ny 2 nx1nx 2 ny3 nx1nx 2 ny 4
(165)
n11n31 n12 n32 n13n33 n14 n4 nx1nx3 ny1 nx1nx3 ny 2 nx1nx3 ny3 nx1nx3 ny 4
(166)
n23n13 n21n11 n22 n12 n24 n14 nx 2 nx1 ny1 nx 2 nx1 ny 2 nx 2 nx1 ny3 nx 2 nx1 ny 4
(167)
n 2 n212 n 2 n 2 22 23 24 nx 2 ny1 nx 2 ny 2 nx 2 ny3 nx 2 ny 4
(168)
n21n31 n22 n32 n23n33 n24 n4 nx 2 nx3 ny1 nx 2 nx3 ny 2 nx 2 nx3 ny3 nx2 nx3 ny 4
(169)
n31n112 n n 2 n n 2 n n 2 32 12 33 13 34 14 nx3 nx1 ny1 nx3 nx1 ny 2 nx3 nx1 ny 3 nx3 nx1 ny 4
(170)
Related Mathematics
n31n21 n32 n22 n33n23 n34 n24 nx3nx 2 ny1 nx3nx 2 ny 2 nx3nx 2 ny3 nx3nx 2 ny 4
M 32
M 33
399
(171)
n312 n 2 n 2 n 2 32 33 34 nx3ny1 nx3ny 2 nx3ny3 nx3ny 4
(172)
M ij M ji
, and we can perform a Jacobi method to obtain
This is symmetrical, that is eigenvalues and eigenvectors.
If we use X instead of U , we should obtain the eigenvalue of X . Let us see what happen in this case. The equation we utilize in the text is given by N x1 A N y2 At N x1 N x X 2 N x X 1
2
(173)
This can be modified so that it is associated with X as
N
2 1 x
A N y2 At X 2 X 1
2
(174)
Therefore, we study the symmetry of matrix
N
2 1 y
At
N
2 1 x
A N
2 1 y
t
A
.
is evaluated as
1 n y1 0 2 1 t N y A 0 0 n11 n y1 n12 ny 2 n13 ny 3 n14 n y4
0
0
1 ny 2
0
0
1 ny3
0
0
n21 n y1
n31 n y1 n32 ny 2 n33 ny 3 n4 n y 4
n22 ny 2 n23 ny 3 n24 ny 4
0 n 0 11 n12 n 0 13 n14 1 n y 4
n21 n22 n23 n24
n31 n32 n33 n4
(175)
Kunihiro Suzuki
400
Multiplying A to the matrix of Eq. (175), we obtain A N y2 At 1
n11 n21 n 31
n12 n22 n32
M 11 M 21 M 31
M 12 M 22 M 32
n13 n23 n33
n11 n y1 n12 n14 ny 2 n24 n n34 13 ny 3 n14 n y4
n21 n y1 n22 ny 2 n23 ny 3 n24 ny 4
n31 n y1 n32 ny 2 n33 ny 3 n34 n y 4
M 13 M 23 M 33
(176)
Each element is given below.
M11
M 12
M 13
M 21
M 22
M 23
n112 n12 2 n132 n14 2 ny1 ny 2 ny 3 ny 4 n11n21 n12 n23 n13 n23 n14 n24 n y1 ny 3 ny 3 ny 4
n11n31 n12 n32 n13 n33 n14 n4 n y1 ny 2 ny 3 ny 4
n21n11 n22 n12 n23 n13 n24 n14 n y1 ny 2 ny 3 ny 4
n212 n222 n232 n242 ny1 ny 3 ny 3 ny 4 n21n31 n22 n32 n23 n33 n24 n4 n y1 ny 2 ny 3 ny 4
(177)
(178)
(179)
(180)
(181)
(182)
Related Mathematics
M 33
n312 n32 2 n332 n34 2 ny1 ny 2 ny 3 ny 4
N Finally, we multiply
401
(183)
2 1 x
to the first row element, 1 nx 2 We then obtain
M11
M 12
M 13
M 21
M 22
M 23
M 31
M 32
M 33
to the matrix. This corresponds to that we multiply 1 nx1 to the second row elements 1 nx3 to the third row elements.
n 2 n112 n 2 n 2 12 13 14 nx1ny1 nx1ny 2 nx1ny 3 nx1ny 4 n11n21 n12 n22 n13 n23 n14 n24 nx1ny1 nx1ny 3 nx1ny 3 nx1ny 4
n11n31 n12 n32 n13 n33 n n 14 4 nx1ny1 nx1ny 2 nx1ny 3 nx1ny 4
n n n21n11 n n n n 22 12 23 13 24 14 nx 2 ny1 nx 2 ny 2 nx 2 ny 3 nx 2 ny 4
n 2 n212 n 2 n 2 22 23 24 nx 2 ny1 nx 2 ny 3 nx 2 ny 3 nx 2 ny 4 n21n31 n22 n32 n n n n 23 33 24 4 nx 2 n y1 nx 2 n y 2 nx 2 n y 3 nx 2 n y 4
n31n11 n32 n12 n n n n 33 13 34 14 nx 3 n y1 nx 3 n y 2 nx 3 n y 3 nx 3 n y 4
n31n21 n32 n23 n33 n23 n34 n24 nx 3 n y1 nx 3 n y 3 nx 3 n y 3 nx 3 n y 4
n312 n 2 n 2 n 2 32 33 34 nx3 ny1 nx3 ny 2 nx3 ny 3 nx3ny 4
(184)
(185)
(186)
(187)
(188)
(189)
(190)
(191)
(192)
Kunihiro Suzuki
402
This is not symmetrical, and we cannot perform a Jacobi method to obtain eigenvalues and eigenvectors.
12. A STIRLING’S FORMULA We prove a Stirling’s formula given by lim n! 2n nn en
n
(193)
Step 1: A Wallis’ formula First, we prove a Wallis’ formula given by 22 42 62 2k 2 lim 2k 1 2k 1 n 1 3 3 5 5 7 k 1 n
lim
n
2
2n 2
2n 1 2n 1 (194)
We consider the integral given by
Sn
2 0
sin n xdx
(195)
Performing the integral, we obtain
Sn
2 0
2
sin n xdx sin x sin n 1 xdx
0
cos x sin n 1 x 2 n 1 0 n 1 n 1
1 sin x sin 2 0
2
0
2
n2
2
cos 2 x sin n 2 xdx
0
xdx
sin n 2 x sin n x dx
n 1 Sn 2 n 1 Sn
(196)
Related Mathematics
403
We then obtain
Sn
n 1 Sn 2 n
(197)
We can then have 2n 1 S2n 2 2n 2n 1 2n 3 S2 n 4 2n 2n 2 2n 1 2n 3 3 1 S0 2n 2n 2 4 2
S2n
(198)
2n S2 n 1 2n 1 2n 2n 2 S 2 n 3 2n 1 2n 1 2n 2n 2 4 2 S1 2n 1 2n 1 5 3
S2 n 1
(199)
where
S0
2
(200)
S1 1
(201)
Therefore, we obtain S2 n 1 2n 2n 2n 2 2n 2 S2 n 2n 1 2n 1 2n 1 2n 3
2
22 42 62
1 3 3 5 5 7
44222 5331
2n 2
2n 1 2n 1
In the region of 0 x 2 , we have a relationship below.
(202)
Kunihiro Suzuki
404
0 sin2n1 x sin2n x sin2n1 x
(203)
Therefore, we obtain
0 S2n1 S2n S2n1
(204)
This can be reduced to
1
S2 n S 2n 1 S2n 1 S2n 1
S2n 1 2n 1 2n 2n S2n 1 2n 1
(205)
Therefore, we obtain S2 n 1 n S 2 n 1 lim
2
lim
1 3 3 5 5 7
n
2
2
2 4 6
2
2n 1 2n 1 2n 2
(206)
Finally, we obtain the Wallis’s formula as 22 42 62 2k 2 lim 2k 1 2k 1 n 1 3 3 5 5 7 k 1 n
lim
n
2
2n 2
2n 1 2n 1 (207)
Step 2: We prove below.
lim
n
22 n n 2 n Cn
We have
(208)
Related Mathematics 2n 1 2n 3 2n 2n 2 1 2n 1 2
S2 n S2 n 1
Multiplying
nS2 n 1
n
3 1 2n 2n 2 4 2 2 2n 1 2n 1
405 42 53
(209)
on the both sides of the equation, and we obtain the root of
n 2n 1 2
S2 n S2 n 1
(210)
Using a relationship
lim
n
S2 n 1 S2 n 1
(211)
we have
S2 n lim nS2n 1 n S2n 1
lim n
nS2n 1
2
(212)
On the other hand, we have 2n 2n 2 4 2 2n 1 2n 1 5 3 2n 2n 2n 2 2n 2 2n 1 2n 2n 1 2n 2
S2 n 1
2n 2 2n 2 2 2n 1! 2 22 n n ! 2n 1 2n !
22 n 2n 1 2n Cn
Therefore, we have
4 42 2 5 4 3 2
42 22
(213)
Kunihiro Suzuki
406
lim 2 nS2 n 1 n
22 n lim 2 n n 2n 1 2n Cn 2n 22 n lim n 2n 1 n 2 n Cn 2n 2 2 lim n 2 1 n 2 n Cn n 22 n lim n n 2 n Cn
(214)
Step 3: We consider the integral given by x
ln xdx x ln x x 1
(215)
We then have
n
1
ln xdx x ln x x 1
n
n ln n n 1
(216)
We can regard the above integration geometrically as
n
1
ln xdx ln 2 ln 3
1 ln n 1 ln n 2
1 ln n 1! ln n n 2 which is shown in Figure 4. Let us consider the integral in detail as shown in Figure 5.
(217)
Related Mathematics
407
y
a a a
5
4
3
a
2
a
1
1
3
2
x
n 1 n
4
Figure 4. The integral of logarithm function and sum of the area of bars.
a
D a
2n
E
2 n 1
C B
A
n
n 1
x
Figure 5. The detail of integral of logarithm function and sum of the area of bars in the region of n and n+1.
The difference between the integral and the sum of bars’ areas is given by n a1 a2 a3 a4
a2n2
(218)
Since, we have a2n CDE ABC a2n1
(219)
Kunihiro Suzuki
408
an
decreases monotonically with increasing n. We further have
1 1 1 1 a2n ln n 1 ln n ln 1 2 4 8 n
(220)
Therefore, we obtain lim a2 n 0
n
(221)
then converges to when
an
n , although we do not know the value of
now. Therefore, we obtain
1 n ln n n 1 ln n 1! ln n n 2
(222)
We then have
n 1! nn
1
2
e n e1 n
(223)
Therefore, we obtain
lim
n
n 1! n
n 12 n
n!
lim
n
e
n
n 12 n
e
A
(224)
where
A e1
(225)
Squaring both sides of Eq. (224), we obtain

$\lim_{n\to\infty}\frac{(n!)^2}{n^{2n+1}\,e^{-2n}} = A^2$ (226)

Eq. (224) should hold for $n \to 2n$, and hence we obtain

$\lim_{n\to\infty}\frac{(2n)!}{(2n)^{2n+\frac{1}{2}}\,e^{-2n}} = A$ (227)

Therefore, we obtain

$A = \frac{A^2}{A} = \lim_{n\to\infty}\frac{(n!)^2}{n^{2n+1}e^{-2n}}\,\frac{(2n)^{2n+\frac{1}{2}}e^{-2n}}{(2n)!} = \lim_{n\to\infty}\sqrt{2}\,\frac{2^{2n}\,(n!)^2}{\sqrt{n}\,(2n)!} = \lim_{n\to\infty}\sqrt{2}\,\frac{2^{2n}}{\sqrt{n}\ {}_{2n}C_n} = \sqrt{2}\,\sqrt{\pi} = \sqrt{2\pi}$ (228)

where Eq. (208) was used in the last step. We then have

$\lim_{n\to\infty}\frac{n!}{n^{n+\frac{1}{2}}\,e^{-n}} = A = \sqrt{2\pi}$ (229)
Finally, we obtain

$\lim_{n\to\infty}\frac{n!}{\sqrt{2\pi n}\,n^n\,e^{-n}} = 1$ (230)

We then obtain, for large $n$,

$n! \simeq \sqrt{2\pi n}\,n^n\,e^{-n}$ (231)

This is Stirling's formula.
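Eq. (230) can also be checked numerically. The sketch below is an added illustration (not part of the original text); math.lgamma is used so that $\ln n!$ does not overflow for large $n$:

import math

def stirling_ratio(n):
    # ratio n! / (sqrt(2 pi n) n^n e^{-n}), computed via logarithms
    log_ratio = math.lgamma(n + 1) - (0.5 * math.log(2 * math.pi * n)
                                      + n * math.log(n) - n)
    return math.exp(log_ratio)

for n in (10, 100, 10000):
    print(n, stirling_ratio(n))  # tends to 1 from above as n grows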
13. TRIGONOMETRIC FUNCTIONS

The trigonometric functions can be appreciated well by starting with the Euler formula, given by

$x + iy = re^{i\theta} = r\left(\cos\theta + i\sin\theta\right)$ (232)

Therefore, we obtain

$x' + iy' = (x+iy)\,e^{i\theta} = (x+iy)(\cos\theta + i\sin\theta) = x\cos\theta - y\sin\theta + i\left(x\sin\theta + y\cos\theta\right)$ (233)

This can be rearranged as

$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}$ (234)

Therefore, the matrix

$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$ (235)

is called a rotation matrix. We further obtain

$e^{i2\theta} = e^{i\theta}\,e^{i\theta}$ (236)

This leads to

$\cos 2\theta + i\sin 2\theta = (\cos\theta + i\sin\theta)(\cos\theta + i\sin\theta) = \cos^2\theta - \sin^2\theta + 2i\cos\theta\sin\theta$ (237)

Therefore, we obtain

$\cos 2\theta = \cos^2\theta - \sin^2\theta = 2\cos^2\theta - 1 = 1 - 2\sin^2\theta$ (238)

$\sin 2\theta = 2\cos\theta\sin\theta$ (239)

Further, we obtain

$\cos^2\theta + \sin^2\theta = 1$ (240)

This leads to

$1 + \tan^2\theta = \frac{1}{\cos^2\theta}$ (241)
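The rotation-matrix relations (234)-(239) are easy to verify numerically. The following sketch is an added illustration (the angle 0.7 is arbitrary); it checks that two successive rotations by $\theta$ equal one rotation by $2\theta$ and confirms the double-angle identities:

import numpy as np

theta = 0.7  # an arbitrary angle in radians
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
R2 = np.array([[np.cos(2 * theta), -np.sin(2 * theta)],
               [np.sin(2 * theta),  np.cos(2 * theta)]])

print(np.allclose(R @ R, R2))                                    # Eq. (236)
print(np.isclose(np.cos(2 * theta), 2 * np.cos(theta) ** 2 - 1)) # Eq. (238)
print(np.isclose(np.sin(2 * theta), 2 * np.cos(theta) * np.sin(theta)))  # Eq. (239)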
Appendix 2

SUMMARY OF PROBABILITY DISTRIBUTIONS AND THEIR MOMENTS

ABSTRACT

We studied various probability distributions and showed that the characteristics of the distributions are expressed with their moments. We studied various analytical techniques to evaluate them. We summarize all the results in this chapter.

Keywords: moment, central moment, moment parameter, expectation, uniform distribution, binomial distribution, multinomial distribution, Dirichlet distribution, negative binomial distribution, beta distribution, gamma distribution, inverse gamma distribution, Poisson distribution, geometric distribution, hypergeometric distribution, normal distribution, standard normal distribution, lognormal distribution, Cauchy distribution, $\chi^2$ distribution, $\chi$ distribution, Rayleigh distribution, F distribution, t distribution, exponential distribution, Erlang distribution, Laplace distribution, Weibull distribution

1. INTRODUCTION

We studied the probability distributions starting from Bernoulli trials, extended them to a binomial distribution, which leads to a normal distribution, and derived distributions composed of various variables. We further studied the probability distributions from the standpoint of moments. The moment generating function was also studied to treat the moments in general. The moments were evaluated with the various methods mentioned above, and we summarize all the results in this chapter.
2. GENERAL RELATIONSHIPS

The moments of a probability distribution for discrete and continuous data are defined as

$\langle X \rangle = \sum_{j=1}^{n} x_j f_j$ (1)

$\langle X \rangle = \int x f(x)\,dx$ (2)

The first central moment is the same as the first moment and is defined as

$\mu_1 = \langle X \rangle$ (3)

which has the same form for discrete and continuous data. The central moments of order two and higher are given by

$\mu_k = \sum_{j=1}^{n} \left(x_j - \mu_1\right)^k f_j$ (4)

$\mu_k = \int \left(x - \mu_1\right)^k f(x)\,dx$ (5)

The expressions are the same for both discrete and continuous data and are given by

$\mu_1 = \langle X \rangle$ (6)

$\mu_2 = \langle X^2 \rangle - \langle X \rangle^2$ (7)

$\mu_3 = \left\langle \left(X - \langle X\rangle\right)^3 \right\rangle = \langle X^3 \rangle - 3\langle X^2 \rangle\langle X \rangle + 2\langle X \rangle^3$ (8)

$\mu_4 = \left\langle \left(X - \langle X\rangle\right)^4 \right\rangle = \langle X^4 \rangle - 4\langle X^3 \rangle\langle X \rangle + 6\langle X^2 \rangle\langle X \rangle^2 - 3\langle X \rangle^4$ (9)

The moment parameters are given by

$\mu = \mu_1 = \langle X \rangle$ (10)

$\sigma^2 = \mu_2 = \langle X^2 \rangle - \langle X \rangle^2$ (11)

$\gamma_3 = \frac{\mu_3}{\sigma^3}$ (12)

$\gamma_4 = \frac{\mu_4}{\sigma^4}$ (13)

There is some confusion of notation where $\mu$ and $\sigma^2$ are used as parameters of distribution functions instead of as moment parameters. In that case, we write $\mu_1$ and $\mu_2$. Another source of confusion is the symbol $\sigma^2$, which we also use for the covariance. The covariance can take positive and negative values, while the superscript 2 suggests a positive value. Therefore, where such confusion may arise, we write $\sigma_2$ instead of $\sigma^2$ to express a second-order variable. We also add brief comments on each distribution.
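Eqs. (6)-(13) translate directly into code. The sketch below is an added illustration (the helper name moment_parameters is ours, and the normal test sample is arbitrary); it estimates the moment parameters of a sample from its raw moments:

import numpy as np

def moment_parameters(x):
    # raw moments <X>, <X^2>, <X^3>, <X^4> of the sample
    m1, m2, m3, m4 = (np.mean(x ** k) for k in (1, 2, 3, 4))
    mu2 = m2 - m1 ** 2                                        # Eq. (7)
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3                      # Eq. (8)
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4   # Eq. (9)
    sigma = np.sqrt(mu2)
    # returns mu, sigma^2, gamma3, gamma4 of Eqs. (10)-(13)
    return m1, mu2, mu3 / sigma ** 3, mu4 / sigma ** 4

rng = np.random.default_rng(0)
print(moment_parameters(rng.normal(0.0, 1.0, 1_000_000)))  # about (0, 1, 0, 3)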
3. FUNCTIONS, GENERATING FUNCTIONS, AND MOMENT PARAMETERS FOR VARIOUS PROBABILITY DISTRIBUTIONS

3.1. A Uniform Distribution

Graphics

Figure 1. Uniform distribution.

Probability Function

$f(x) = \begin{cases} \dfrac{1}{b-a} & \text{for } a \le x \le b \\ 0 & \text{for } x < a \text{ or } x > b \end{cases}$ (14)

Generating Function

$\frac{1}{b-a}\,\frac{e^{\theta b} - e^{\theta a}}{\theta}$ (15)

Moments

$\langle X \rangle = \frac{b+a}{2}$ (16)

$\langle X^2 \rangle = \frac{1}{3}\left(b^2 + ba + a^2\right)$ (17)

$\langle X^3 \rangle = \frac{1}{4}\left(b^3 + b^2a + ba^2 + a^3\right)$ (18)

$\langle X^4 \rangle = \frac{1}{5}\left(b^4 + b^3a + b^2a^2 + ba^3 + a^4\right)$ (19)

Central Moments

$\mu_1 = \frac{b+a}{2}$ (20)

$\mu_2 = \frac{1}{12}\left(b-a\right)^2$ (21)

$\mu_3 = 0$ (22)

$\mu_4 = \frac{1}{80}\left(b-a\right)^4$ (23)

Moment Parameters

$\mu = \frac{b+a}{2}$ (24)

$\sigma^2 = \frac{1}{12}\left(b-a\right)^2$ (25)

$\gamma_3 = 0$ (26)

$\gamma_4 = \frac{9}{5}$ (27)

Peak Position
None.

Comment
The angle probability in darts is uniform over the value range of 0 to $2\pi$ radians. If there is no special reason for a variable to favor a certain value, its probability distribution should be uniform.
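As a sampling check (an added sketch with arbitrary bounds, not part of the original text), the moment parameters (24), (25), and (27) can be estimated from uniform random numbers:

import numpy as np

rng = np.random.default_rng(1)
a, b = 2.0, 5.0  # arbitrary example bounds
x = rng.uniform(a, b, 1_000_000)
print(x.mean(), (b + a) / 2)              # Eq. (24)
print(x.var(), (b - a) ** 2 / 12)         # Eq. (25)
mu4 = np.mean((x - x.mean()) ** 4)
print(mu4 / x.var() ** 2, 9 / 5)          # Eq. (27)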
3.2. A Binomial Distribution

Graphics

Figure 2. Binomial distribution. (a) $p$ dependence with $n = 10$ ($p = 0.1, 0.3, 0.5$). (b) $n$ dependence with $p = 0.2$ ($n = 5, 10, 50$).

Probability Function

$f(x) = {}_nC_x\,p^x q^{n-x}$ for $x = 0, 1, 2, \ldots, n$ (28)

where

$q = 1 - p$ (29)

Generating Function

$\left(pe^{\theta} + q\right)^n$ (30)

Moments

$\langle X \rangle = np$ (31)

$\langle X^2 \rangle = n(n-1)p^2 + np$ (32)

$\langle X^3 \rangle = n(n-1)(n-2)p^3 + 3n(n-1)p^2 + np$ (33)

$\langle X^4 \rangle = n(n-1)(n-2)(n-3)p^4 + 6n(n-1)(n-2)p^3 + 7n(n-1)p^2 + np$ (34)

Central Moments

$\mu_1 = np$ (35)

$\mu_2 = n(n-1)p^2 + np - (np)^2 = np(1-p)$ (36)

$\mu_3 = np(1-p)(1-2p)$ (37)

$\mu_4 = 3n(n-2)p^2(1-p)^2 + np(1-p)$ (38)

Moment Parameters

$\mu = np$ (39)

$\sigma^2 = np(1-p)$ (40)

$\gamma_3 = \frac{1-2p}{\sqrt{np(1-p)}}$ (41)

$\gamma_4 = \frac{3(n-2)}{n} + \frac{1}{np(1-p)}$ (42)

The composite variable $Y = X_1 + X_2$ also follows a binomial distribution with parameters

$f(y) = {}_nC_y\,p^y q^{n-y}$ (43)

where

$n = n_1 + n_2$ (44)

Peak Position
The peak position is given by

$np - q \le x_0 \le np + p$ (45)

where $x_0$ is an integer.

Comment
This distribution is related to Bernoulli trials, where we have only two values of 1 and 0. The binomial distribution describes the number of target events among the total trials. The coin toss is the most frequently used example for this distribution: we obtain tails or heads in each toss, and the binomial distribution gives the probability of $x$ head events among $n$ trials. This distribution can be applied to any variable associated with Bernoulli trials.
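The additivity in Eqs. (43)-(44) can be checked by sampling (an added sketch with arbitrary parameters):

import numpy as np

rng = np.random.default_rng(2)
p, n1, n2 = 0.3, 10, 15
y = rng.binomial(n1, p, 1_000_000) + rng.binomial(n2, p, 1_000_000)
# Y = X1 + X2 should match one binomial with n = n1 + n2
print(y.mean(), (n1 + n2) * p)             # Eq. (39)
print(y.var(), (n1 + n2) * p * (1 - p))    # Eq. (40)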
3.3. A Multinomial Distribution

Graphics
The corresponding graphic is the same as the one for the binomial distribution.

Probability Function

$f(x_1, x_2, \ldots, x_m) = \frac{n!}{x_1!\,x_2!\cdots x_m!}\,p_1^{x_1}\,p_2^{x_2}\cdots p_m^{x_m}$ (46)

where

$x_1 + x_2 + \cdots + x_m = n$ (47)

$p_1 + p_2 + \cdots + p_m = 1$ (48)

Generating Function

$\left(p_i e^{\theta} + q_i\right)^n$ with $q_i = 1 - p_i$ (49)

Moments

$\langle X_i \rangle = np_i$ (50)

$\langle X_i^2 \rangle = n(n-1)p_i^2 + np_i$ (51)

$\langle X_i^3 \rangle = n(n-1)(n-2)p_i^3 + 3n(n-1)p_i^2 + np_i$ (52)

$\langle X_i^4 \rangle = n(n-1)(n-2)(n-3)p_i^4 + 6n(n-1)(n-2)p_i^3 + 7n(n-1)p_i^2 + np_i$ (53)

$\langle X_i X_j \rangle = n(n-1)p_ip_j$ (54)

Central Moments

$\mu_1 = np_i$ (55)

$\mu_2 = np_i(1-p_i)$ (56)

$\mu_3 = np_i(1-p_i)(1-2p_i)$ (57)

$\mu_4 = 3n(n-2)p_i^2(1-p_i)^2 + np_i(1-p_i)$ (58)

Moment Parameters

$\mu = np_i$ (59)

$\sigma^2 = np_i(1-p_i)$ (60)

$\gamma_3 = \frac{1-2p_i}{\sqrt{np_i(1-p_i)}}$ (61)

$\gamma_4 = \frac{3(n-2)}{n} + \frac{1}{np_i(1-p_i)}$ (62)

$\sigma_{ij2} = \langle X_i X_j \rangle - \langle X_i \rangle\langle X_j \rangle = n(n-1)p_ip_j - np_i\,np_j = -np_ip_j$ (63)

$\rho_{ij} = \frac{\sigma_{ij2}}{\sigma_i\,\sigma_j} = -\sqrt{\frac{p_ip_j}{(1-p_i)(1-p_j)}}$ (64)

Peak Position
The peak position is given by

$np_i - (1-p_i) \le x_0 \le np_i + p_i$ (65)

where $x_0$ is an integer.

Comment
This distribution is related to Bernoulli-type trials where each trial can take many values. The multinomial distribution describes the number of occurrences of one event among the total trials. The parameters above are focused on one variable, and hence the results are the same as the ones for the binomial distribution.
3.4. A Negative Binomial Distribution

Graphics

Figure 3. Negative binomial distribution ($p = 0.2, 0.5, 0.7$). (a) $r = 2$. (b) $r = 5$.

Probability Function

$f(x, r, p) = {}_{r-1+x}C_{r-1}\,p^{r-1}q^{x}\,p = {}_{r-1+x}C_{x}\,p^{r}q^{x}$ for $x = 0, 1, 2, 3, \ldots;\ r > 0$ (66)

where

$q = 1 - p$ (67)

Generating Function

$\left(\frac{p}{1 - qe^{\theta}}\right)^r$ (68)

Moments

$\langle X \rangle = \frac{rq}{p}$ (69)

$\langle X^2 \rangle = \frac{rq}{p} + \frac{r(r+1)q^2}{p^2}$ (70)

$\langle X^3 \rangle = \frac{rq}{p} + \frac{3r(r+1)q^2}{p^2} + \frac{r(r+1)(r+2)q^3}{p^3}$ (71)

$\langle X^4 \rangle = \frac{rq}{p} + \frac{7r(r+1)q^2}{p^2} + \frac{6r(r+1)(r+2)q^3}{p^3} + \frac{r(r+1)(r+2)(r+3)q^4}{p^4}$ (72)

Central Moments

$\mu_1 = \frac{rq}{p}$ (73)

$\mu_2 = \frac{rq}{p^2}$ (74)

$\mu_3 = \frac{r(1-p)(2-p)}{p^3}$ (75)

$\mu_4 = \frac{rq}{p^2} + \frac{3r(r+2)q^2}{p^4}$ (76)

Moment Parameters

$\mu = \frac{rq}{p}$ (77)

$\sigma^2 = \frac{rq}{p^2}$ (78)

$\gamma_3 = \frac{2-p}{\sqrt{r(1-p)}}$ (79)

$\gamma_4 = 3\left(1 + \frac{2}{r}\right) + \frac{p^2}{rq}$ (80)

The composite variable $Y = X_1 + X_2$ also follows a negative binomial distribution with parameters

$f(y, r, p) = {}_{r-1+y}C_{y}\,p^r q^y$ (81)

where

$r = r_1 + r_2$ (82)

Peak Position
The peak position $x_0$ is given by

$\frac{rq - 1}{p} \le x_0 \le \frac{rq - 1}{p} + 1$ (83)

where $x_0$ is an integer.

Comment
This distribution is related to the binomial distribution, but for a special situation. In Bernoulli trials we obtain two kinds of events: success and failure. The negative binomial distribution describes the situation where we obtain the $r$-th success at trial number $x + r$. Therefore, the distribution is related to the number of failures $x$ observed by the time we have $r$ successes.
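The failure-counting convention of Eq. (66) is also the one used by numpy, whose negative_binomial sampler counts the failures before the $r$-th success, so Eqs. (69) and (74) can be checked directly (an added sketch with arbitrary parameters):

import numpy as np

rng = np.random.default_rng(3)
r, p = 5, 0.4
x = rng.negative_binomial(r, p, 1_000_000)  # failures before the r-th success
q = 1 - p
print(x.mean(), r * q / p)        # Eq. (69)
print(x.var(), r * q / p ** 2)    # Eq. (74)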
3.5. A Beta Distribution

Graphics

Figure 4. Beta distributions. (a) $\alpha > 1,\ \beta > 1$. (b) $\alpha < 1,\ \beta < 1$. (c) $(\alpha-1)(\beta-1) < 0$. (d) $\alpha = \beta$.

Probability Function

$f(x) = \frac{x^{\alpha-1}\left(1-x\right)^{\beta-1}}{B(\alpha, \beta)}$ for $0 \le x \le 1;\ \alpha > 0,\ \beta > 0$ (84)

Generating Function
None.

Moments

$\langle X^i \rangle = \frac{B(\alpha+i, \beta)}{B(\alpha, \beta)}$ (85)

Central Moments
Use a theorem.

Moment Parameters
Use a theorem.

Peak Position
The distribution has a peak for $\alpha > 1$ and $\beta > 1$, and the peak position $x_0$ is given by

$x_0 = \frac{\alpha - 1}{\alpha + \beta - 2}$ (86)

Comment
A Beta distribution gives various kinds of shapes between 0 and 1 with varying parameter values of $\alpha$ and $\beta$. It is in a sense an extension of a binomial distribution, and it is more flexible in that we can set the values of $\alpha$ and $\beta$ independently. A Beta distribution is used in Bayes' theorem.
3.6. A Dirichlet Distribution

Graphics
The corresponding graphic is the same as the one for the Beta distribution.

Probability Function

$f(x_1, x_2, \ldots, x_m) = \frac{\Gamma(\alpha)}{\Gamma(\alpha_1)\,\Gamma(\alpha_2)\cdots\Gamma(\alpha_m)}\,x_1^{\alpha_1-1}\,x_2^{\alpha_2-1}\cdots x_m^{\alpha_m-1}$ (87)

where

$\alpha = \alpha_1 + \alpha_2 + \cdots + \alpha_m$ (88)

$x_1 + x_2 + \cdots + x_m = 1$ (89)

Generating Function
None.

Moments

$\langle x_j \rangle = \frac{\alpha_j}{\alpha}$ (90)

$\langle x_j^2 \rangle = \frac{\alpha_j(\alpha_j+1)}{\alpha(\alpha+1)}$ (91)

$\langle x_j^3 \rangle = \frac{\alpha_j(\alpha_j+1)(\alpha_j+2)}{\alpha(\alpha+1)(\alpha+2)}$ (92)

$\langle x_j^4 \rangle = \frac{\alpha_j(\alpha_j+1)(\alpha_j+2)(\alpha_j+3)}{\alpha(\alpha+1)(\alpha+2)(\alpha+3)}$ (93)

The moment associated with the covariance is given by

$\langle x_i x_j \rangle = \frac{\alpha_i\,\alpha_j}{\alpha(\alpha+1)}$ (94)

Central Moments
Use a theorem.

Moment Parameters
Use a theorem.

Peak Position
The peak position is the same as the one for the Beta distribution.

Comment
A Dirichlet distribution is a likelihood function for the multinomial distribution and is used in Bayes' theorem.
3.7. A Gamma Distribution

Graphics

Figure 5. Gamma distribution ($\lambda = 0.5, 1.0, 2.0$). (a) $n = 1$. (b) $n = 5$.

Probability Function

$f(x) = \frac{x^{n-1}\,e^{-\frac{x}{\lambda}}}{\Gamma(n)\,\lambda^n}$ for $x > 0$ (95)

Generating Function

$\left(\frac{1}{1 - \lambda\theta}\right)^n$ (96)

Moments

$\langle X \rangle = n\lambda$ (97)

$\langle X^2 \rangle = n(n+1)\lambda^2$ (98)

$\langle X^3 \rangle = n(n+1)(n+2)\lambda^3$ (99)

$\langle X^4 \rangle = n(n+1)(n+2)(n+3)\lambda^4$ (100)

Central Moments

$\mu_1 = \frac{\Gamma(n+1)}{\Gamma(n)}\,\lambda$ (101)

$\mu_2 = \left[\frac{\Gamma(n+2)}{\Gamma(n)} - \left(\frac{\Gamma(n+1)}{\Gamma(n)}\right)^2\right]\lambda^2$ (102)

$\mu_3 = \left[\frac{\Gamma(n+3)}{\Gamma(n)} - 3\,\frac{\Gamma(n+2)}{\Gamma(n)}\,\frac{\Gamma(n+1)}{\Gamma(n)} + 2\left(\frac{\Gamma(n+1)}{\Gamma(n)}\right)^3\right]\lambda^3$ (103)

$\mu_4 = \left[\frac{\Gamma(n+4)}{\Gamma(n)} - 4\,\frac{\Gamma(n+3)}{\Gamma(n)}\,\frac{\Gamma(n+1)}{\Gamma(n)} + 6\,\frac{\Gamma(n+2)}{\Gamma(n)}\left(\frac{\Gamma(n+1)}{\Gamma(n)}\right)^2 - 3\left(\frac{\Gamma(n+1)}{\Gamma(n)}\right)^4\right]\lambda^4$ (104)

When $n$ is an integer, they are reduced to

$\mu_1 = n\lambda$ (105)

$\mu_2 = n\lambda^2$ (106)

$\mu_3 = 2n\lambda^3$ (107)

$\mu_4 = 3n(n+2)\lambda^4$ (108)

Moment Parameters
Use a theorem. When $n$ is an integer, they are reduced to

$\mu = n\lambda$ (109)

$\sigma^2 = n\lambda^2$ (110)

$\gamma_3 = \frac{2}{\sqrt{n}}$ (111)

$\gamma_4 = 3\left(1 + \frac{2}{n}\right)$ (112)

We consider two independent variables $X_1$ and $X_2$ which follow Gamma distributions and form a composite one as

$Y = X_1 + X_2$ (113)

It also follows a Gamma distribution with

$f(y) = \frac{y^{n-1}\,e^{-\frac{y}{\lambda}}}{\Gamma(n)\,\lambda^n}$ (114)

where

$n = n_1 + n_2$ (115)

Peak Position
The peak position is given by

$x_0 = (n-1)\lambda$ (116)

Comment
When an event occurs on average once per time period $\lambda$, the Gamma distribution expresses the probability distribution of the time period in which the event occurs $n$ times. Therefore, it expresses the probability that an event occurs $n$ times during the time period of $x$.
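The additivity in Eqs. (113)-(115) can be checked by sampling (an added sketch with arbitrary parameters; numpy's gamma sampler takes the shape $n$ and scale $\lambda$):

import numpy as np

rng = np.random.default_rng(4)
lam, n1, n2 = 2.0, 3.0, 4.0
y = rng.gamma(n1, lam, 1_000_000) + rng.gamma(n2, lam, 1_000_000)
n = n1 + n2
print(y.mean(), n * lam)        # Eq. (109)
print(y.var(), n * lam ** 2)    # Eq. (110)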
3.8. An Inverse Gamma Distribution

Graphics

Figure 6. Inverse Gamma distribution ($n = 3$; $\lambda = 1, 2, 5$).

Probability Function

$f(x) = \frac{\lambda^n\,x^{-n-1}\,e^{-\frac{\lambda}{x}}}{\Gamma(n)}$ for $x > 0$ (117)

Generating Function
None.

Moments

$\langle X \rangle = \frac{\Gamma(n-1)}{\Gamma(n)}\,\lambda$ (118)

$\langle X^2 \rangle = \frac{\Gamma(n-2)}{\Gamma(n)}\,\lambda^2$ (119)

$\langle X^3 \rangle = \frac{\Gamma(n-3)}{\Gamma(n)}\,\lambda^3$ (120)

$\langle X^4 \rangle = \frac{\Gamma(n-4)}{\Gamma(n)}\,\lambda^4$ (121)

Central Moments

$\mu_1 = \frac{\Gamma(n-1)}{\Gamma(n)}\,\lambda$ (122)

$\mu_2 = \left[\frac{\Gamma(n-2)}{\Gamma(n)} - \left(\frac{\Gamma(n-1)}{\Gamma(n)}\right)^2\right]\lambda^2$ (123)

$\mu_3 = \left[\frac{\Gamma(n-3)}{\Gamma(n)} - 3\,\frac{\Gamma(n-2)}{\Gamma(n)}\,\frac{\Gamma(n-1)}{\Gamma(n)} + 2\left(\frac{\Gamma(n-1)}{\Gamma(n)}\right)^3\right]\lambda^3$ (124)

$\mu_4 = \left[\frac{\Gamma(n-4)}{\Gamma(n)} - 4\,\frac{\Gamma(n-3)}{\Gamma(n)}\,\frac{\Gamma(n-1)}{\Gamma(n)} + 6\,\frac{\Gamma(n-2)}{\Gamma(n)}\left(\frac{\Gamma(n-1)}{\Gamma(n)}\right)^2 - 3\left(\frac{\Gamma(n-1)}{\Gamma(n)}\right)^4\right]\lambda^4$ (125)

When $n$ is an integer, they are reduced to

$\mu_1 = \frac{\lambda}{n-1}$ (126)

$\mu_2 = \frac{\lambda^2}{(n-1)^2(n-2)}$ (127)

$\mu_3 = \frac{4\lambda^3}{(n-1)^3(n-2)(n-3)}$ (128)

$\mu_4 = \frac{3(n+5)\,\lambda^4}{(n-1)^4(n-2)(n-3)(n-4)}$ (129)

Moment Parameters
Use a theorem. When $n$ is an integer, they are reduced to

$\mu = \frac{\lambda}{n-1}$ (130)

$\sigma^2 = \frac{\lambda^2}{(n-1)^2(n-2)}$ (131)

$\gamma_3 = \frac{4\sqrt{n-2}}{n-3}$ (132)

$\gamma_4 = \frac{3(n+5)(n-2)}{(n-3)(n-4)}$ (133)

Peak Position
The peak position $x_0$ is given by

$x_0 = \frac{\lambda}{n+1}$ (134)

Comment
The probability of obtaining several data that follow the same normal distribution can be expressed as the product of many normal distributions. If we regard this function as a function of the variance, it has the form of an inverse Gamma function. Therefore, the inverse Gamma function is used as a prior distribution in Bayesian statistics for evaluating the variance. The inverse Gamma distribution is derived by converting a variable $X$ that follows the Gamma distribution to its inverse $1/X$, which is the reason why the distribution is called the inverse Gamma distribution.
3.9. A Poisson Distribution

Graphics

Figure 7. Poisson distribution ($\lambda = 1, 2, 5$).

Probability Function

$f(x) = \frac{\lambda^x}{x!}\,e^{-\lambda}$ for $x = 0, 1, 2, \ldots$ (135)

Generating Function

$\exp\left[\lambda\left(e^{\theta} - 1\right)\right] = \exp\left[\lambda\left(\theta + \frac{\theta^2}{2} + \frac{\theta^3}{3!} + \frac{\theta^4}{4!} + \cdots\right)\right]$ (136)

Moments

$\langle X \rangle = \lambda$ (137)

$\langle X^2 \rangle = \lambda^2 + \lambda$ (138)

$\langle X^3 \rangle = \lambda^3 + 3\lambda^2 + \lambda$ (139)

$\langle X^4 \rangle = \lambda^4 + 6\lambda^3 + 7\lambda^2 + \lambda$ (140)

Central Moments

$\mu_1 = \lambda$ (141)

$\mu_2 = \lambda$ (142)

$\mu_3 = \lambda$ (143)

$\mu_4 = 3\lambda^2 + \lambda$ (144)

Moment Parameters

$\mu = \lambda$ (145)

$\sigma^2 = \lambda$ (146)

$\gamma_3 = \frac{1}{\sqrt{\lambda}}$ (147)

$\gamma_4 = 3 + \frac{1}{\lambda}$ (148)

The composite variable $Y = X_1 + X_2$ also follows a Poisson distribution

$f(y) = \frac{\lambda^y}{y!}\,e^{-\lambda}$ (149)

where

$\lambda = \lambda_1 + \lambda_2$ (150)

Peak Position
The peak position $x_0$ is given by

$\lambda - 1 \le x_0 \le \lambda$ (151)

where $x_0$ is an integer.

Comment
A Poisson distribution can be regarded as the limit of a binomial distribution with a quite small probability and many trials. This distribution is convenient from the standpoint of convergence, which is a severe issue for the binomial distribution. Therefore, the Poisson distribution is frequently used in the many cases where the occurrence probability is small. It should be noted that the parameter of this distribution is the average event number, while the parameter of the binomial distribution is the probability that a target event occurs.
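The limiting relation in the comment above can be illustrated by sampling (an added sketch; the values of $\lambda$ and $n$ are arbitrary, with $p = \lambda/n$ small):

import numpy as np

rng = np.random.default_rng(5)
lam, n = 3.0, 10_000
x_binom = rng.binomial(n, lam / n, 1_000_000)  # many trials, small p
x_pois = rng.poisson(lam, 1_000_000)
print(x_binom.mean(), x_pois.mean(), lam)      # Eq. (145)
print(x_binom.var(), x_pois.var(), lam)        # Eq. (146)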
3.10. A Geometric Distribution

Graphics

Figure 8. Geometric distribution ($p = 0.1, 0.2, 0.5$).

Probability Distribution

$f(x) = q^{x-1}\,p$ for $x = 1, 2, \ldots$ (152)

where

$q = 1 - p$ (153)

Generating Function

$\frac{pe^{\theta}}{1 - qe^{\theta}}$ (154)

Moments

$\langle X \rangle = \frac{1}{p}$ (155)

$\langle X^2 \rangle = \frac{2 - p}{p^2}$ (156)

$\langle X^3 \rangle = \frac{6 - 6p + p^2}{p^3}$ (157)

$\langle X^4 \rangle = \frac{24 - 36p + 14p^2 - p^3}{p^4}$ (158)

Central Moments

$\mu_1 = \frac{1}{p}$ (159)

$\mu_2 = \frac{1-p}{p^2}$ (160)

$\mu_3 = \frac{(1-p)(2-p)}{p^3}$ (161)

$\mu_4 = \frac{\left[9(1-p) + p^2\right](1-p)}{p^4}$ (162)

Moment Parameters

$\mu = \frac{1}{p}$ (163)

$\sigma^2 = \frac{1-p}{p^2}$ (164)

$\gamma_3 = \frac{2-p}{\sqrt{1-p}}$ (165)

$\gamma_4 = 9 + \frac{p^2}{1-p}$ (166)

Peak Position

$x_0 = 1$ (167)

Comment
In a Bernoulli event with probability $p$, the trial number at which we succeed for the first time follows a geometric distribution. This distribution corresponds to the case where we succeed for the first time at the $x$-th trial. This means that we fail $x - 1$ times and then succeed.
3.11. A Hypergeometric Distribution

Graphics

Figure 9. Hypergeometric distribution ($N = 50$, $M = 15$, $n = 10$).

Probability Distribution

$f(x) = \frac{{}_MC_x\ {}_{N-M}C_{n-x}}{{}_NC_n}$ (168)

Generating Function
None.

Moments

$\langle X \rangle = np$ (169)

$\langle X^2 \rangle = np\,\frac{N - n + (n-1)pN}{N-1}$ (170)

$\langle X^3 \rangle = np\,\frac{(pN-1)(pN-2)(n-1)(n-2)}{(N-1)(N-2)} + 3np\,\frac{(pN-1)(n-1)}{N-1} + np$ (171)

$\langle X^4 \rangle = \langle X(X-1)(X-2)(X-3) \rangle + 6\langle X^3 \rangle - 11\langle X^2 \rangle + 6\langle X \rangle$ (172)

$\langle X(X-1)(X-2)(X-3) \rangle = np\,(pN-1)(pN-2)(pN-3)\,\frac{(n-1)(n-2)(n-3)}{(N-1)(N-2)(N-3)}$ (173)

where

$p = \frac{M}{N}$ (174)

$q = 1 - p$ (175)

Central Moments

$\mu_1 = np$ (176)

$\mu_2 = npq\,\frac{N-n}{N-1}$ (177)

$\mu_3 = npq(1-2p)\,\frac{(N-n)(N-2n)}{(N-1)(N-2)}$ (178)

We do not have a compact expression for $\mu_4$, and hence evaluate it using the theorem as

$\mu_4 = \langle X^4 \rangle - 4\langle X^3 \rangle\langle X \rangle + 6\langle X^2 \rangle\langle X \rangle^2 - 3\langle X \rangle^4$ (179)

Moment Parameters

$\mu = np$ (180)

$\sigma^2 = npq\,\frac{N-n}{N-1}$ (181)

We do not have a compact expression for $\gamma_3$, and hence evaluate it as

$\gamma_3 = \frac{\mu_3}{\sigma^3}$ (182)

We do not have a compact expression for $\gamma_4$, and hence evaluate it as

$\gamma_4 = \frac{\mu_4}{\sigma^4}$ (183)

Peak Position
The peak position $x_0$ is given by

$\frac{np - q + \frac{n-1}{N}}{1 + \frac{2}{N}} \le x_0 \le \frac{np + p + \frac{n+1}{N}}{1 + \frac{2}{N}}$ (184)

where $x_0$ is an integer.

Comment
This distribution is related to Bernoulli trials where we have only two values. A binomial distribution describes the number of one kind of event among the total trials, assuming that the probability of the event is invariant: when determining the ratio of red balls among red and black balls, we return each picked ball, and hence the ratio of red balls among the total stays constant. If we do not return the ball, the probability of obtaining a red ball changes at each draw, and the corresponding probability distribution becomes this hypergeometric one.
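The finite-population correction in Eq. (181) is visible in a sampling check (an added sketch using the parameters of Figure 9; numpy's hypergeometric sampler takes the counts of good and bad items and the sample size):

import numpy as np

rng = np.random.default_rng(6)
N, M, n = 50, 15, 10
x = rng.hypergeometric(M, N - M, n, 1_000_000)
p = M / N
q = 1 - p
print(x.mean(), n * p)                            # Eq. (180)
print(x.var(), n * p * q * (N - n) / (N - 1))     # Eq. (181)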
3.12. A Normal Distribution

Graphics

Figure 10. Normal distribution. (a) $\mu$ dependence. (b) $\sigma$ dependence.

Probability Distribution

$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$ for $-\infty < x < \infty$ (185)

Generating Function

$\exp\left(\mu\theta + \frac{\sigma^2\theta^2}{2}\right)$ (186)

Moments

$\langle X \rangle = \mu$ (187)

$\langle (X-\mu)^2 \rangle = \sigma^2$ (188)

$\langle (X-\mu)^3 \rangle = 0$ (189)

$\langle (X-\mu)^4 \rangle = 3\sigma^4$ (190)

Central Moments

$\mu_1 = \mu$ (191)

$\mu_2 = \sigma^2$ (192)

$\mu_3 = 0$ (193)

$\mu_4 = 3\sigma^4$ (194)

Moment Parameters

$\mu$ (195)

$\sigma^2$ (196)

$\gamma_3 = 0$ (197)

$\gamma_4 = 3$ (198)

The composite variable $Y = X_1 + X_2$ also follows a normal distribution given by

$f(y) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(y-\mu)^2}{2\sigma^2}\right]$ (199)

with parameters given by

$\mu = \mu_1 + \mu_2$ (200)

$\sigma^2 = \sigma_1^2 + \sigma_2^2$ (201)

A partial normal distribution is given by

$f(x) = \frac{2}{1 + \mathrm{Erf}\left(\frac{a}{\sqrt{2}\,\sigma}\right)}\,\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\frac{(x-a)^2}{2\sigma^2}\right]$ for $x \ge 0$ (202)

A joined half normal distribution is given by

$f(x) = \begin{cases} \dfrac{2}{\sqrt{2\pi}\,(\sigma_1+\sigma_2)}\exp\left[-\dfrac{(x-\mu)^2}{2\sigma_1^2}\right] & \text{for } x \le \mu \\ \dfrac{2}{\sqrt{2\pi}\,(\sigma_1+\sigma_2)}\exp\left[-\dfrac{(x-\mu)^2}{2\sigma_2^2}\right] & \text{for } x > \mu \end{cases}$ (203)

Peak Position

$x_0 = \mu$ (204)

Comment
When a probability variable is influenced by many independent factors, and each factor is not extremely significant compared with the other factors, we can expect that the variable follows a normal distribution. Many probability variables that have bell-shaped frequency distributions are expressed with a normal distribution. Therefore, the normal distribution is the most important probability distribution in statistics.
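The composite property in Eqs. (199)-(201) can be checked by sampling (an added sketch with arbitrary parameters):

import numpy as np

rng = np.random.default_rng(7)
mu1, s1, mu2, s2 = 1.0, 2.0, -0.5, 1.5
y = rng.normal(mu1, s1, 1_000_000) + rng.normal(mu2, s2, 1_000_000)
print(y.mean(), mu1 + mu2)          # Eq. (200)
print(y.var(), s1 ** 2 + s2 ** 2)   # Eq. (201)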
3.13. A Standard Normal Distribution

Graphics

Figure 11. Standard normal distribution.

Probability Distribution

$f(x) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right)$ for $-\infty < x < \infty$ (205)

Generating Function

$\exp\left(\frac{\theta^2}{2}\right)$ (206)

Moments

$\langle X \rangle = 0$ (207)

$\langle X^2 \rangle = 1$ (208)

$\langle X^3 \rangle = 0$ (209)

$\langle X^4 \rangle = 3$ (210)

Central Moments

$\mu_1 = 0$ (211)

$\mu_2 = 1$ (212)

$\mu_3 = 0$ (213)

$\mu_4 = 3$ (214)

Moment Parameters

$\mu = 0$ (215)

$\sigma^2 = 1$ (216)

$\gamma_3 = 0$ (217)

$\gamma_4 = 3$ (218)

Peak Position

$x_0 = 0$ (219)

Comment
The normal distribution is the most important one, with two parameters: an average and a variance. The distribution is therefore controlled by these two parameters. The standard normal distribution is normalized with respect to the two parameters and is independent of any parameter. Probability variables are converted to normalized ones; we can then analyze the data with the standard normal distribution, and the results are converted back to the original normal distribution if we know the related average and variance.
3.14. A Lognormal Distribution

Graphics

Figure 12. Lognormal distribution ($\mu = 0$; $\sigma = 0.5, 1.0, 1.5, 2.0$).

Probability Distribution

$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma x}\exp\left[-\frac{\left(\ln x - \mu\right)^2}{2\sigma^2}\right]$ for $x > 0$ (220)

Generating Function
None.

Moments

$\langle X \rangle = \exp\left(\mu + \frac{1}{2}\sigma^2\right)$ (221)

$\langle X^2 \rangle = \exp\left(2\mu + 2\sigma^2\right)$ (222)

$\langle X^3 \rangle = \exp\left(3\mu + \frac{9}{2}\sigma^2\right)$ (223)

$\langle X^4 \rangle = \exp\left(4\mu + 8\sigma^2\right)$ (224)

Central Moments

$\mu_1 = \exp\left(\mu + \frac{1}{2}\sigma^2\right)$ (225)

$\mu_2 = \exp\left(2\mu + \sigma^2\right)\left[\exp\left(\sigma^2\right) - 1\right]$ (226)

$\mu_3 = \exp\left(3\mu + \frac{3}{2}\sigma^2\right)\left[\exp\left(3\sigma^2\right) - 3\exp\left(\sigma^2\right) + 2\right]$ (227)

$\mu_4 = \exp\left(4\mu + 2\sigma^2\right)\left[\exp\left(6\sigma^2\right) - 4\exp\left(3\sigma^2\right) + 6\exp\left(\sigma^2\right) - 3\right]$ (228)

Moment Parameters

$\exp\left(\mu + \frac{1}{2}\sigma^2\right)$ (229)

$\exp\left(2\mu + \sigma^2\right)\left[\exp\left(\sigma^2\right) - 1\right]$ (230)

$\gamma_3 = \frac{\exp\left(3\sigma^2\right) - 3\exp\left(\sigma^2\right) + 2}{\left[\exp\left(\sigma^2\right) - 1\right]^{3/2}}$ (231)

$\gamma_4 = \frac{\exp\left(6\sigma^2\right) - 4\exp\left(3\sigma^2\right) + 6\exp\left(\sigma^2\right) - 3}{\left[\exp\left(\sigma^2\right) - 1\right]^{2}}$ (232)

The composite variable $Y = X_1 X_2$ also follows a lognormal distribution with parameters

$\mu = \mu_1 + \mu_2$ (233)

$\sigma^2 = \sigma_1^2 + \sigma_2^2$ (234)

Peak Position

$x_0 = e^{\mu - \sigma^2}$ (235)

Comment
When the probability variable $Y = \ln X$ follows a normal distribution, the lognormal distribution is the one for the variable $X$. This distribution is sometimes used as the distribution of assets.
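The construction in the comment above gives a direct sampling check of Eqs. (221) and (230) (an added sketch with arbitrary parameters):

import numpy as np

rng = np.random.default_rng(8)
mu, sigma = 0.0, 0.5
x = np.exp(rng.normal(mu, sigma, 1_000_000))  # X = exp(Y), Y normal
print(x.mean(), np.exp(mu + sigma ** 2 / 2))  # Eq. (221)
print(x.var(),
      np.exp(2 * mu + sigma ** 2) * (np.exp(sigma ** 2) - 1))  # Eq. (230)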
3.15. A Cauchy Distribution

Graphics

Figure 13. Cauchy distribution ($\mu = 0$; $\beta = 0.5, 1.0, 2.0$).

Probability Distribution

$f(x) = \frac{1}{\pi}\,\frac{\beta}{\left(x - \mu\right)^2 + \beta^2}$ for $-\infty < x < \infty$ (236)

The standard one is given by

$f(x) = \frac{1}{\pi\left(1 + x^2\right)}$ (237)

Generating Function
None.

Moments
None.

Central Moments
None.

Moment Parameters
None.

Peak Position

$x_0 = \mu$ (238)

Comment
This distribution is used for resonance phenomena, for example the X-ray density spectrum in nuclear physics. It is also known as the distribution that has no moment parameters.
3.16. A $\chi^2$ Distribution

Graphics

Figure 14. $\chi^2$ distribution ($n = 1, 3, 5, 10, 20$).

Probability Distribution

$f_n(x) = \frac{1}{2^{\frac{n}{2}}\,\Gamma\left(\frac{n}{2}\right)}\,x^{\frac{n}{2}-1}\exp\left(-\frac{x}{2}\right)$ for $0 \le x$ (239)

Generating Function

$\int_0^{\infty} e^{\theta x}\,\frac{1}{2^{\frac{n}{2}}\,\Gamma\left(\frac{n}{2}\right)}\,x^{\frac{n}{2}-1}e^{-\frac{x}{2}}\,dx = \left(\frac{1}{1 - 2\theta}\right)^{\frac{n}{2}}$ (240)

Moments

$\langle X \rangle = n$ (241)

$\langle X^2 \rangle = n(n+2)$ (242)

$\langle X^3 \rangle = n(n+2)(n+4)$ (243)

$\langle X^4 \rangle = n(n+2)(n+4)(n+6)$ (244)

Central Moments

$\mu_1 = n$ (245)

$\mu_2 = 2n$ (246)

$\mu_3 = 8n$ (247)

$\mu_4 = 12n(n+4)$ (248)

Moment Parameters

$\mu = n$ (249)

$\sigma^2 = 2n$ (250)

$\gamma_3 = \sqrt{\frac{8}{n}}$ (251)

$\gamma_4 = \frac{3(n+4)}{n}$ (252)

The composite variable $Y = X_1 + X_2$ also follows a $\chi^2$ distribution given by

$f_n(y) = \frac{1}{2^{\frac{n}{2}}\,\Gamma\left(\frac{n}{2}\right)}\,y^{\frac{n}{2}-1}\exp\left(-\frac{y}{2}\right)$ (253)

where

$n = n_1 + n_2$ (254)

Peak Position

$x_0 = n - 2$ (255)

When $x_0$ is negative, it is 0.

Comment
When the probability variable $X$ follows a standard normal distribution, the $\chi^2$ distribution is the one for the sum of the squares of $n$ such variables. Therefore, this distribution is related to the error of variables, and hence is frequently used in statistics.
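The sum-of-squares construction in the comment above can be checked by sampling (an added sketch; the degree of freedom is arbitrary):

import numpy as np

rng = np.random.default_rng(9)
n = 5
x = np.sum(rng.normal(0.0, 1.0, (1_000_000, n)) ** 2, axis=1)
print(x.mean(), n)        # Eq. (249)
print(x.var(), 2 * n)     # Eq. (250)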
3.17. A $\chi$ Distribution

Graphics

Figure 15. $\chi$ distribution ($n = 1, 2, 5$).

Probability Distribution

$f_n(x) = \frac{1}{2^{\frac{n}{2}-1}\,\Gamma\left(\frac{n}{2}\right)}\,x^{n-1}\exp\left(-\frac{x^2}{2}\right)$ for $0 \le x$ (256)

Generating Function
None.

Moments

$\langle X \rangle = \sqrt{2}\,\frac{\Gamma\left(\frac{n+1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)}$ (257)

$\langle X^2 \rangle = n$ (258)

$\langle X^3 \rangle = 2^{\frac{3}{2}}\,\frac{\Gamma\left(\frac{n+3}{2}\right)}{\Gamma\left(\frac{n}{2}\right)}$ (259)

$\langle X^4 \rangle = n(n+2)$ (260)

Central Moments
Use a theorem.

Moment Parameters
Use a theorem.

Peak Position

$x_0 = \sqrt{n-1}$ (261)

Comment
When the probability variable $Y = X^2$ follows a $\chi^2$ distribution, this is the distribution for $X$. Therefore, this distribution is related to the distance from the origin.
3.18. A Rayleigh Distribution

Graphics

Figure 16. Rayleigh distribution ($\sigma = 0.5, 1.0, 2.0$).

Probability Distribution

$f(x) = \frac{x}{\sigma^2}\exp\left(-\frac{x^2}{2\sigma^2}\right)$ for $x \ge 0$ (262)

Generating Function
None.

Moments

$\langle X \rangle = \sqrt{\frac{\pi}{2}}\,\sigma$ (263)

$\langle X^2 \rangle = 2\sigma^2$ (264)

$\langle X^3 \rangle = 3\sqrt{\frac{\pi}{2}}\,\sigma^3$ (265)

$\langle X^4 \rangle = 8\sigma^4$ (266)

Central Moments

$\mu_1 = \sqrt{\frac{\pi}{2}}\,\sigma$ (267)

$\mu_2 = \left(2 - \frac{\pi}{2}\right)\sigma^2$ (268)

$\mu_3 = \left(\pi - 3\right)\sqrt{\frac{\pi}{2}}\,\sigma^3$ (269)

$\mu_4 = \left(8 - \frac{3}{4}\pi^2\right)\sigma^4$ (270)

Moment Parameters

$\mu = \sqrt{\frac{\pi}{2}}\,\sigma$ (271)

$\sigma_2 = \left(2 - \frac{\pi}{2}\right)\sigma^2$ (272)

$\gamma_3 = \frac{2\sqrt{\pi}\,(\pi - 3)}{(4 - \pi)^{3/2}}$ (273)

$\gamma_4 = \frac{32 - 3\pi^2}{(4 - \pi)^2}$ (274)

Peak Position

$x_0 = \sigma$ (275)

Comment
When two independent probability variables $X$ and $Y$ follow normal distributions with averages of zero, this is the distribution for $\sqrt{X^2 + Y^2}$. This distribution is used in acoustic engineering.
3.19. An F Distribution

Graphics

Figure 17. F distribution ($n_2 = 5$; $n_1 = 1, 5, 10$).

Probability Distribution

$f_{n_1,n_2}(x) = \frac{1}{B\left(\frac{n_1}{2}, \frac{n_2}{2}\right)}\left(\frac{n_1 x}{n_1 x + n_2}\right)^{\frac{n_1}{2}}\left(\frac{n_2}{n_1 x + n_2}\right)^{\frac{n_2}{2}}\frac{1}{x}$ for $0 \le x$ (276)

Generating Function

$\int_0^{\infty} e^{\theta x}\,\frac{1}{B\left(\frac{n_1}{2}, \frac{n_2}{2}\right)}\left(\frac{n_1 x}{n_1 x + n_2}\right)^{\frac{n_1}{2}}\left(\frac{n_2}{n_1 x + n_2}\right)^{\frac{n_2}{2}}\frac{dx}{x}$ (277)

Moments

$\langle X \rangle = \frac{n_2}{n_2 - 2}$ (278)

$\langle X^2 \rangle = \left(\frac{n_2}{n_1}\right)^2\frac{n_1(n_1+2)}{(n_2-2)(n_2-4)}$ (279)

$\langle X^3 \rangle = \left(\frac{n_2}{n_1}\right)^3\frac{n_1(n_1+2)(n_1+4)}{(n_2-2)(n_2-4)(n_2-6)}$ (280)

$\langle X^4 \rangle = \left(\frac{n_2}{n_1}\right)^4\frac{n_1(n_1+2)(n_1+4)(n_1+6)}{(n_2-2)(n_2-4)(n_2-6)(n_2-8)}$ (281)

Central Moments

$\mu_1 = \frac{n_2}{n_2-2}$ (282)

$\mu_2 = \frac{2n_2^2\,(n_1+n_2-2)}{n_1(n_2-2)^2(n_2-4)}$ (283)

$\mu_3 = \frac{8n_2^3\,(2n_1+n_2-2)(n_1+n_2-2)}{n_1^2(n_2-2)^3(n_2-4)(n_2-6)}$ (284)

$\mu_4 = \left(\frac{n_2}{n_2-2}\right)^4\left[\frac{(n_1+6)(n_1+4)(n_1+2)(n_2-2)^3}{n_1^3(n_2-4)(n_2-6)(n_2-8)} - \frac{4(n_1+4)(n_1+2)(n_2-2)^2}{n_1^2(n_2-4)(n_2-6)} + \frac{6(n_1+2)(n_2-2)}{n_1(n_2-4)} - 3\right]$ (285)

Moment Parameters

$\mu = \frac{n_2}{n_2-2}$ (286)

$\sigma^2 = \frac{2n_2^2\,(n_1+n_2-2)}{n_1(n_2-2)^2(n_2-4)}$ (287)

$\gamma_3 = \frac{(2n_1+n_2-2)\sqrt{8(n_2-4)}}{(n_2-6)\sqrt{n_1(n_1+n_2-2)}}$ (288)

$\gamma_4 = \frac{3(n_2-4)\left[4(n_2-2)^2 + n_1(n_2+10)(n_1+n_2-2)\right]}{n_1(n_2-6)(n_2-8)(n_1+n_2-2)}$ (289)

Peak Position

$x_0 = \frac{n_2(n_1-2)}{n_1(n_2+2)}$ (290)

Comment
This distribution is related to the ratio of two variances, and hence plays an important role in variance analysis.
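Since an F variable is a ratio of two independent $\chi^2$ variables, each divided by its degrees of freedom, Eqs. (286)-(287) can be checked by sampling (an added sketch with arbitrary degrees of freedom):

import numpy as np

rng = np.random.default_rng(10)
n1, n2 = 5, 12
x = (rng.chisquare(n1, 1_000_000) / n1) / (rng.chisquare(n2, 1_000_000) / n2)
print(x.mean(), n2 / (n2 - 2))                                   # Eq. (286)
print(x.var(),
      2 * n2 ** 2 * (n1 + n2 - 2) / (n1 * (n2 - 2) ** 2 * (n2 - 4)))  # Eq. (287)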
3.20. A t Distribution

Graphics

Figure 18. t distribution ($n = 1, 3, 10$).

Probability Distribution

$f_n(x) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)}\left(1 + \frac{x^2}{n}\right)^{-\frac{n+1}{2}}$ for $-\infty < x < \infty$ (291)

Generating Function

$\int_{-\infty}^{\infty} e^{\theta x}\,\frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)}\left(1 + \frac{x^2}{n}\right)^{-\frac{n+1}{2}}dx$ (292)

Moments

$\langle X \rangle = 0$ (293)

$\langle X^2 \rangle = \frac{n}{n-2}$ (294)

$\langle X^3 \rangle = 0$ (295)

$\langle X^4 \rangle = \frac{3n^2}{(n-2)(n-4)}$ (296)

Central Moments

$\mu_1 = 0$ (297)

$\mu_2 = \frac{n}{n-2}$ (298)

$\mu_3 = 0$ (299)

$\mu_4 = \frac{3n^2}{(n-2)(n-4)}$ (300)

Moment Parameters

$\mu = 0$ (301)

$\sigma^2 = \frac{n}{n-2}$ (302)

$\gamma_3 = 0$ (303)

$\gamma_4 = \frac{3(n-2)}{n-4}$ (304)

Peak Position

$x_0 = 0$ (305)

Comment
This distribution is related to the average of sample data, and hence is one of the most important distributions in statistics. This distribution is well approximated by a standard normal distribution for a large sample number $n$.
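A t variable with $n$ degrees of freedom can be built from a standard normal variable and an independent $\chi^2$ variable, which gives a sampling check of Eqs. (301)-(302) (an added sketch with an arbitrary $n$):

import numpy as np

rng = np.random.default_rng(11)
n = 10
z = rng.normal(0.0, 1.0, 1_000_000)
w = rng.chisquare(n, 1_000_000)
t = z / np.sqrt(w / n)          # t with n degrees of freedom
print(t.mean(), 0)              # Eq. (301)
print(t.var(), n / (n - 2))     # Eq. (302)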
3.21. An Exponential Distribution

Graphics

Figure 19. Exponential distribution ($\lambda = 0.5, 1.0, 2.0$).

Probability Distribution

$f(x) = \frac{1}{\lambda}\exp\left(-\frac{x}{\lambda}\right)$ for $0 \le x$ (306)

Generating Function

$\frac{1}{1 - \lambda\theta}$ (307)

Moments

$\langle X \rangle = \lambda$ (308)

$\langle X^2 \rangle = 2\lambda^2$ (309)

$\langle X^3 \rangle = 6\lambda^3$ (310)

$\langle X^4 \rangle = 24\lambda^4$ (311)

Central Moments

$\mu_1 = \lambda$ (312)

$\mu_2 = \lambda^2$ (313)

$\mu_3 = 2\lambda^3$ (314)

$\mu_4 = 9\lambda^4$ (315)

Moment Parameters

$\mu = \lambda$ (316)

$\sigma^2 = \lambda^2$ (317)

$\gamma_3 = 2$ (318)

$\gamma_4 = 9$ (319)

Peak Position

$x_0 = 0$ (320)

Comment
This distribution applies to a variable where the occurrence probability is identical for each event; the number of occurrences is then proportional to the current number of elements. The nuclear decay number is well expressed by this distribution.
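Exponential variables are also easy to generate by inverse-transform sampling, $x = -\lambda \ln U$ with $U$ uniform on $(0, 1]$ (an added sketch, not a method from the text):

import numpy as np

rng = np.random.default_rng(12)
lam = 2.0
u = 1.0 - rng.random(1_000_000)  # shift [0,1) to (0,1] to avoid log(0)
x = -lam * np.log(u)             # inverse transform of Eq. (306)
print(x.mean(), lam)             # Eq. (316)
print(x.var(), lam ** 2)         # Eq. (317)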
3.22. An Erlang Distribution

Graphics

Figure 20. Erlang distribution ($\lambda = 1/k$; $k = 1, 3, 5, 10$).

Probability Distribution

$f_k(x) = \frac{x^{k-1}}{\lambda^k\,(k-1)!}\,e^{-\frac{x}{\lambda}}$ for $0 \le x$ (321)

Generating Function

$\left(\frac{1}{1 - \lambda\theta}\right)^k$ (322)

Moments

$\langle X \rangle = k\lambda$ (323)

$\langle X^2 \rangle = k(k+1)\lambda^2$ (324)

$\langle X^3 \rangle = k(k+1)(k+2)\lambda^3$ (325)

$\langle X^4 \rangle = k(k+1)(k+2)(k+3)\lambda^4$ (326)

Central Moments

$\mu_1 = k\lambda$ (327)

$\mu_2 = k\lambda^2$ (328)

$\mu_3 = 2k\lambda^3$ (329)

$\mu_4 = 3k(k+2)\lambda^4$ (330)

Moment Parameters

$\mu = k\lambda$ (331)

$\sigma^2 = k\lambda^2$ (332)

$\gamma_3 = \frac{2}{\sqrt{k}}$ (333)

$\gamma_4 = 3\left(1 + \frac{2}{k}\right)$ (334)

Peak Position

$x_0 = (k-1)\lambda$ (335)

Comment
When the probability variables $X_1, X_2, \ldots, X_k$ follow exponential distributions, the sum of these probability variables follows this distribution. This distribution is used in queueing theory for the service time where the service consists of many serial processes. The distribution changes from an exponential distribution to a delta function with increasing service step number $k$.
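The sum construction in the comment above can be checked by sampling (an added sketch with arbitrary parameters):

import numpy as np

rng = np.random.default_rng(13)
k, lam = 5, 0.5
x = rng.exponential(lam, (1_000_000, k)).sum(axis=1)  # sum of k exponentials
print(x.mean(), k * lam)        # Eq. (331)
print(x.var(), k * lam ** 2)    # Eq. (332)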
3.23. A Laplace Distribution

Graphics

Figure 21. Laplace distribution ($\mu = 0$; $\lambda = 0.5, 1.0, 2.0$).

Probability Distribution

$f(x) = \frac{1}{2\lambda}\exp\left(-\frac{\left|x - \mu\right|}{\lambda}\right)$ for $-\infty < x < \infty$ (336)

Generating Function

$\frac{e^{\mu\theta}}{1 - \lambda^2\theta^2}$ (337)

Moments

$\langle X \rangle = \mu$ (338)

$\langle X^2 \rangle = \mu^2 + 2\lambda^2$ (339)

$\langle X^3 \rangle = \mu^3 + 6\mu\lambda^2$ (340)

$\langle X^4 \rangle = \mu^4 + 12\mu^2\lambda^2 + 24\lambda^4$ (341)

Central Moments

$\mu_1 = \mu$ (342)

$\mu_2 = 2\lambda^2$ (343)

$\mu_3 = 0$ (344)

$\mu_4 = 24\lambda^4$ (345)

Moment Parameters

$\mu$ (346)

$\sigma^2 = 2\lambda^2$ (347)

$\gamma_3 = 0$ (348)

$\gamma_4 = 6$ (349)

Peak Position

$x_0 = \mu$ (350)

Comment
A Laplace distribution is formed by connecting two exponential distributions that are symmetrical with respect to the origin, and hence it is defined over the entire real axis.
3.24. A Weibull Distribution

Graphics

Figure 22. Weibull distribution. (a) $m$ dependence with $\eta = 1$ ($m = 0.5, 1, 2, 3$). (b) $\eta$ dependence with $m = 2$ ($\eta = 0.5, 1, 2$).

Probability Distribution

$f(x) = \frac{m}{\eta}\left(\frac{x}{\eta}\right)^{m-1}\exp\left[-\left(\frac{x}{\eta}\right)^m\right]$ for $0 \le x;\ m > 0,\ \eta > 0$ (351)

Generating Function
None.

Moments

$\langle X^k \rangle = \eta^k\,\Gamma\left(1 + \frac{k}{m}\right)$ (352)

Central Moments
Use a theorem.

Moment Parameters
Use a theorem.

Peak Position

$x_0 = \eta\left(\frac{m-1}{m}\right)^{\frac{1}{m}}$ (353)

Comment
This distribution is widely used in the reliability field and is related to the failure rate. The failure is not related to the average point in the system, but to the weakest point. The distribution is derived by focusing on the weak point in the system, and hence is called the weakest link model. This distribution expresses various kinds of shapes with varying parameter values, corresponding to various kinds of failure mechanisms.
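The moment formula (352) can be checked by sampling (an added sketch with arbitrary parameters; numpy's weibull sampler uses $\eta = 1$, so the sample is scaled by $\eta$):

import math
import numpy as np

rng = np.random.default_rng(14)
m, eta = 2.0, 1.5
x = eta * rng.weibull(m, 1_000_000)  # scale numpy's eta = 1 samples by eta
for k in (1, 2):
    print(np.mean(x ** k), eta ** k * math.gamma(1 + k / m))  # Eq. (352)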
ABOUT THE AUTHOR Kunihiro Suzuki, PhD Fujitsu Limited, Tokyo, Japan Email: [email protected]
Kunihiro Suzuki was born in Aomori, Japan in 1959. He received his BS, MS, and PhD degrees in electronic engineering from Tokyo Institute of Technology, Tokyo, Japan, in 1981, 1983, and 1996, respectively. He joined Fujitsu Laboratories Ltd., Atsugi, Japan in 1983 and was engaged in the design and modeling of high-speed bipolar and MOS transistors. He studied process modeling as a visiting researcher at the Swiss Federal Institute of Technology, Zurich, Switzerland in 1996 and 1997. He moved to Fujitsu Limited, Tokyo, Japan in 2010, where he was engaged in a division responsible for supporting the sales division. His current interests are statistics and queueing theory for business. His research covers theory and technology in both semiconductor devices and processes. To analyze and fabricate high-speed devices, he also organizes a group that includes physicists, mathematicians, process engineers, system engineers, and members for analysis such as SIMS and TEM. The combination of theory and experiment and the aid from various members enable his group to carry out various original works. His models and experimental data are systematic, valid over a wide range of conditions, and can contribute to both academic and practical product fields. He is the author and co-author of more than 100 refereed papers in journals, more than 50 papers in international technical conference proceedings, and more than 90 papers in domestic technical conference proceedings.