Testing Statistical Hypotheses with Given Reliability

By Kartlos Joseph Kachiashvili

This book first published 2023

Cambridge Scholars Publishing
Lady Stephenson Library, Newcastle upon Tyne, NE6 2PA, UK

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Copyright © 2023 by Kartlos Joseph Kachiashvili

All rights for this book reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.

ISBN (10): 1-5275-1063-8
ISBN (13): 978-1-5275-1063-0
TABLE OF CONTENTS
Preface
Introduction
Chapter 1. Hypotheses Testing Methods
  1.1. Existing Basic Parallel Methods of Hypotheses Testing
    1.1.1. The Fisher's p-test
    1.1.2. The Neyman-Pearson's frequentist test
    1.1.3. The Jeffreys Bayesian approach
      1.1.3.1. General loss function
      1.1.3.2. Stepwise loss function
    1.1.4. The Berger's conditional test
  1.2. Sequential Tests
    1.2.1. The Wald's Method
    1.2.2. The Bayes' Method
    1.2.3. The Berger's Method
  1.3. Constrained Bayesian Methods (CBM) of Hypotheses Testing
  1.4. The Method of Sequential Analysis of Bayesian Type
Chapter 2. Constrained Bayesian Method for Testing Different Types of Hypotheses
  2.1. General Statement of Constrained Bayesian Method
    2.1.1. Restrictions on the averaged probability of acceptance of true hypotheses (Task 1)
    2.1.2. Restrictions on the conditional probabilities of acceptance of true hypotheses (Task 2)
    2.1.3. Restrictions on the conditional probabilities of acceptance of each true hypothesis (Task 2₁)
    2.1.4. Restrictions on posterior probabilities of acceptance of true hypotheses (Task 3)
    2.1.5. Restriction on the averaged probability of rejection of true hypotheses (Task 4)
    2.1.6. Restrictions on the conditional probabilities of rejection of each true hypothesis (Task 5)
    2.1.7. Restrictions on a posteriori probabilities of rejection of each true hypothesis (Task 6)
    2.1.8. Restrictions on probabilities of rejection of true hypothesis (Task 6₁)
    2.1.9. Restrictions on posterior probability of rejected true hypotheses (Task 7)
  2.2. Directional Hypotheses Testing Methods
  2.3. CBM for Testing Directional Hypotheses with Restricted False Discovery Rates
    2.3.1. Restrictions on the averaged probability of acceptance of true hypotheses for testing directional hypotheses (Task 1)
    2.3.2. Restrictions on the conditional probabilities of acceptance of each true hypothesis for testing directional hypotheses (Task 2)
    2.3.3. Restrictions on the averaged probability of rejection of true hypotheses for testing directional hypotheses (Task 4)
    2.3.4. Restrictions on the conditional probabilities of rejection of each true hypothesis for testing directional hypotheses (Task 5)
    2.3.5. Restrictions on posterior probabilities of rejected true hypotheses for testing directional hypotheses (Task 7)
  2.4. CBM for Testing Multiple Hypotheses with Directional Alternatives in Sequential Experiments
    2.4.1. CBM for testing multiple directional hypotheses
      2.4.1.1. A sequential test for multiple directional hypotheses
  2.5. Application of CBM to Union-Intersection and Intersection-Union Hypotheses Testing Problems
    2.5.1. Statement of the problem
    2.5.2. General solution of the stated problem
      2.5.2.1. Another loss function
    2.5.3. Examples
  2.6. Quasi-Optimal Rule of Testing Directional Hypotheses and Its Application to Big Data
    2.6.1. Quasi-optimal approach of testing the hypotheses
    2.6.2. Testing multiple directional hypotheses
  2.7. CBM for Testing Hypotheses Concerning Parameters of Normal Distribution with Equal Mathematical Expectation and Variance
    2.7.1. Statement of the problem
      2.7.1.1. Estimation of the parameter
    2.7.2. Testing Hypotheses (2.169)
      2.7.2.1. The maximum ratio test
      2.7.2.2. Stein's method
    2.7.3. CBM for testing hypotheses (2.169) at conditional distributions (2.189)
    2.7.4. Testing hypotheses (2.175) at conditional distributions (2.176) using CBM 2
  2.8. Constrained Bayesian Method for Testing Equi-Correlation Coefficient of a Standard Symmetric Multivariate Normal Distribution
    2.8.1. Introduction
    2.8.2. Statement of the problem
    2.8.3. Testing (2.221) hypotheses
      2.8.3.1. The maximum ratio test
      2.8.3.2. Stein's approach
    2.8.4. CBM for testing (2.224) hypotheses
    2.8.5. Evolution of CBM 2 for testing (2.224) hypotheses
      2.8.5.1. Using the maximum ratio estimation
      2.8.5.2. Using the Stein's approach
Chapter 3. Comparison Analysis of Hypotheses Testing Methods
  3.1. Comparison of CBM with the frequentist and the Bayes methods
  3.2. Comparison of Hypotheses Testing in Parallel Experiments
  3.3. Comparison of Hypotheses Testing in Sequential Experiments
  3.4. Comparison of the Directional Hypotheses Testing Methods
    3.4.1. CBM for the normally distributed directional hypotheses
      3.4.1.1. Determination of the Lagrange multiplier
    3.4.2. Computation results
      3.4.2.1. Discussion
Chapter 4. Experimental Investigations
  4.1. Simulation results of Directional Hypotheses at Restriction of False Discovery Rates
    4.1.1. Computation results for the normally distributed directional hypotheses
    4.1.2. Discussion of the obtained results
  4.2. Simulation results of Multiple Hypotheses with Directional Alternatives in Sequential Experiments
    4.2.1. Computation results
    4.2.2. Discussion of the obtained results
  4.3. Simulation results of Union-Intersection and Intersection-Union Hypotheses
    4.3.1. Calculations for concrete examples
    4.3.2. Computational results
    4.3.3. Discussion of the obtained results
  4.4. Consideration of the normal distribution for Quasi-Optimal Rule of Testing directional hypotheses
    4.4.1. Computation results
    4.4.2. Discussion of the obtained results
  4.5. Computation results for Testing Composite Hypotheses Concerning Normal Distribution with Equal Parameters
    4.5.1. Maximum Likelihood method
    4.5.2. Computation results obtained by Stein's Method
  4.6. Simulation results of testing Equi-Correlation Coefficient of a Standard Symmetric Multivariate Normal Distribution
    4.6.1. Computation results
    4.6.2. Discussion of the obtained results
Appendix A1. Algorithm of computation of (2.251) distribution function
Appendix A2. The Kullback-Leibler divergence between the distributions corresponding to the basic and alternative hypotheses
References
PREFACE
It is difficult to overestimate the role and place of one of the main fields of mathematical statistics, hypothesis testing, in both theoretical and applied statistics. There are many theoretical and applied works dedicated to solving this problem, and their number is increasing every day. Among them, based on contemporary research, my 2018 book is worth noting, of which this book is the logical continuation (Kachiashvili, 2018a). In particular, the results of the further development of CBM for many types of interesting and practically useful hypotheses, and the results of comparison with the existing basic methods, are given here (Fisher, 1925; Neyman and Pearson, 1928, 1933; Jeffreys, 1939; Wald, 1947a, b; Berger, 2003). A brief description of the discussed methods is given in the first chapter. The second chapter provides optimal and quasi-optimal methods for testing individual and multiple directional hypotheses for parallel and sequential experiments using CBM; also, the general forms of presentation of statistical hypotheses as their intersection-union and union-intersection, and optimal decision-making methods for such formulations using CBM, are discussed; hypothesis testing methods using CBM for the normal distribution with equal parameters and for the equi-correlation coefficient of a standard symmetric multivariate normal distribution are considered. The next chapter shows the advantage of CBM compared to existing classical methods in terms of the reliability of the made decisions and the minimization of the required information (sample size). The fourth chapter presents the results of the experimental investigation of the developed methods, which clearly
confirm the validity of the obtained theoretical results and the conclusions made on their basis. In our opinion, the work will be interesting and useful for both professional and beginner researchers and practitioners in many fields who are interested in the theoretical and practical issues of the considered direction of mathematical statistics, namely, statistical hypothesis testing. It will also be very useful for specialists of different directions for solving suitable problems at the appropriate level, because the book discusses in detail many practically important problems and provides detailed algorithms for their solution, the direct use of which presents little difficulty.
INTRODUCTION
A statistical hypothesis is a formalized record of properties of the investigated phenomenon and of relevant assumptions. Statistical hypotheses are set when random factors affect the investigated phenomena, i.e., when the observation results of the investigated phenomena are random. The properties of the investigated phenomenon are completely defined by its probability distribution law. Therefore, a statistical hypothesis is an assumption concerning this or that property of the probability distribution law of a random variable. Mathematical statistics is the set of methods for studying events caused by random variability and for estimating the measures (the probabilities) of the possibility of occurrence of these events. For this reason, parametrical methods of hypotheses testing directly use distribution laws, while non-parametrical methods use not the distribution laws themselves but only the properties of these laws. Practically all methods of mathematical statistics, one way or another and in different doses, use hypotheses testing techniques. Therefore, it is very difficult to overestimate the importance of the methods of statistical hypotheses testing in the theory and practice of mathematical statistics (Kachiashvili, 2019, 2022). A lot of investigations are dedicated to the statistical hypotheses testing theory and practice (see, for example, Berger, 1985, 2003; Berger et al., 1994; Bernardo and Rueda, 2002; Christensen, 2005; Hubbard and Bayarri, 2003; Lehmann, 1993, 1997; Moreno and Giron, 2006; Moreno and Martínez, 2022; Mei and Jiang, 2022; Wolpert, 1996; Zou et al., 2022) and their number increases steadily. But, despite this, there are only three
following basic ideas (philosophies) of hypotheses testing at parallel experiments: the Fisher, the Neyman-Pearson and the Jeffreys approaches (Fisher, 1925; Neyman and Pearson, 1928, 1933; Jeffreys, 1939). They use different ideas for testing hypotheses, but all of them are identical in one aspect: they all necessarily accept one of the stated hypotheses when making a decision, despite the existence or absence of enough information for decision making with given reliability. The considered methods have well-known positive and negative sides (Kachiashvili, 2022). All other existing methods are particular cases of these approaches, taking into account the peculiarities of the concrete problems and adapting to these specificities for increasing the reliability of the decision (see, for example, Berger and Wolpert, 1988; Berger et al., 1994; Bernardo, 1980; Delampady and Berger, 1990; Kiefer, 1977; Bansal and Sheng, 2010; Bansal and Miescke, 2013; Bansal et al., 2016). The essences of these methods are discussed below. Fisher considered only one hypothetical distribution and, on the basis of the observation results, made a decision concerning its correctness (see Item 2.1). A question that Fisher did not raise was the origin of his test statistics: why these rather than some others? This is the question that Neyman and Pearson considered (Neyman & Pearson, 1928, 1933). Their solution involved not only the hypothesis but also a class of possible alternatives and the probabilities of two kinds of errors (see Item 2.2): false rejection (Error I) and false acceptance (Error II) (Lehmann, 1993). The "best" test was the one that minimized the probability of false acceptance (Error II) subject to a bound on the probability of false rejection (Error I); the latter bound is the significance level of the test. To the requirements of the Neyman-Pearson experiment, Jeffreys added a priori probabilities and a loss function, using
which he defined a risk function as the averaged losses; by minimizing the risk function, the hypotheses acceptance regions are defined (see Item 2.3). An attempt to reconcile the different points of view of the noted philosophies was made in (Berger, 2003), and as a result a new, compromise test T* was offered (see Item 2.4). The method uses the Fisher p-value criterion for making a decision, the Neyman-Pearson statement (using basic and alternative hypotheses) and Jeffreys' formulae for computing the Type I and Type II conditional error probabilities for every observation result x on the basis of which the decision is made. A new approach (philosophy) to statistical hypotheses testing, called Constrained Bayesian Methods (CBM), was developed comparatively recently (Kachiashvili, 1989, 2003, 2011, 2014a, b, 2015, 2016, 2018a, b; Kachiashvili et al., 2012a, b, c; Kachiashvili and Mueed, 2013; Kachiashvili et al., 2018; Kachiashvili, 2021) (see Item 1.4). This method differs from the traditional Bayesian approach in that the risk function is split into two parts, reflecting the risks of incorrect rejection and incorrect acceptance of hypotheses, and the risk minimization problem is stated as a constrained optimization problem in which one of the risk components is restricted and the other one is minimized. It generates data-dependent measures of evidence with regard to the level of restriction. In spite of the absolutely different motivations behind the introduction of T* and CBM, they lead to hypotheses acceptance regions with identical properties in principle. Namely, in contrast to the classical case, when the observation space is divided into two complementary sub-spaces for acceptance and rejection of the tested hypotheses, here the observation space contains regions for making the decision and regions for not making the decision (see, for example, Berger, 2003; Kachiashvili et al., 2012a; Kachiashvili et al. 2012b;
Kachiashvili and Mueed, 2013; Kachiashvili, 2018a). Though, for CBM, the situation is more differentiated than for T*. For CBM, the regions for not making the decision are divided into the regions of impossibility of making the decision and the regions of impossibility of making a unique decision. In the first case, the impossibility of making the decision is equivalent to the impossibility of making the decision with the given probability of the error for a given observation result, and it becomes possible when the probability of the error decreases. In the second case, it is impossible to make a unique decision when the probability of the error is required to be small, and it is unattainable for the given observation result. By increasing the error probability, it becomes possible to make a decision. In our opinion, these properties of T* and CBM are very interesting and useful. They bring the statistical hypotheses testing rule much closer to the everyday decision-making rule when, at a shortage of necessary information, acceptance of one of the stated suppositions is not compulsory. The specific features of the hypotheses testing regions of the Berger's T* test and CBM, namely, the existence of the no-decision region in the T* test and the existence of regions of impossibility of making a unique or any decision in CBM, give the opportunity to develop sequential tests on their basis (Berger et al., 1994; Kachiashvili and Hashmi, 2010; Kachiashvili, 2015, 2018a). The sequential test was introduced by Wald in the mid-1940s (Wald, 1947a, b). Since Wald's pioneering works, a lot of different investigations have been dedicated to sequential analysis problems (see, for example, Berger and Wolpert, 1984; Ghosh, 1970; Ghosh and Sen, 1991; Siegmund, 1985) and efforts towards the development of this approach constantly increase as it has many important
advantages in comparison with the parallel methods (Tartakovsky et al., 2015). Application of CBM to different types of hypotheses (two and many simple, composite, directional and multiple hypotheses) with parallel and sequential experiments showed the advantage and uniqueness of the method in comparison with existing ones (Kachiashvili, 2014a, b, 2015, 2016, 2018a, b; Kachiashvili et al., 2018). The advantage of the method is the optimality of made decisions with guaranteed reliability and minimality of necessary observations for given reliability. CBM uses not only loss functions and a priori probabilities for making decisions as the classical Bayesian rule does, but also a significance level as the frequentist method does. The combination of these opportunities improves the quality of made decisions in CBM in comparison with other methods. This fact has been confirmed many times by application of CBM to the solution of different practical problems (Kachiashvili, 2018a, 2019a, b; Kachiashvili et al., 2012c; Kachiashvili and Melikdzhanian, 2006; Kachiashvili et al., 2007; Kachiashvili et al., 2008; Kachiashvili et al., 2009; Kachiashvili et al., 2012b; Kachiashvili and Prangishvili, 2018; Kachiashvili et al., 2019; Kachiashvili et al., 2020).
CHAPTER 1 HYPOTHESES TESTING METHODS
1.1. Existing Basic Parallel Methods of Hypotheses Testing

1.1.1. The Fisher's p-test

Let us suppose that the observation result $X \sim f(x \mid \theta)$, where $f(x \mid \theta)$ is the probability distribution density of $X$ under hypothesis $H$, and it is necessary to test the hypothesis $H_0: \theta = \theta_0$. Let us choose a test statistic $t(X)$ such that large values of $t(X)$ reflect evidence against $H_0$. After computing the p-value $p = P(t(X) \ge t(x) \mid H_0)$, where $t(x)$ is the value of the statistic $t(X)$ computed from the sample $x$, hypothesis $H_0$ is rejected if $p$ is small (Kachiashvili, 2014b). Some methods of generalization of this approach for multiple hypotheses can be found in (Kachiashvili, 2018a).
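A minimal numerical sketch of this rule (an illustration, not part of the book): for a normal mean with known variance, the standardized sample mean serves as $t(X)$ and the p-value is computed under $H_0$. The sample values, the model and the use of SciPy are assumptions made only for the demonstration.

```python
# A minimal sketch of the Fisher p-test for H0: theta = theta0 with
# X ~ N(theta, sigma^2), sigma known; the data below are illustrative.
import numpy as np
from scipy import stats

theta0, sigma = 0.0, 1.0
x = np.array([0.41, 1.37, -0.25, 0.93, 1.58, 0.66])   # hypothetical sample

# Test statistic: standardized sample mean; large values speak against H0.
t_obs = (x.mean() - theta0) / (sigma / np.sqrt(len(x)))

# p-value: probability under H0 of a statistic at least as large as observed.
p_value = stats.norm.sf(t_obs)

print(f"t = {t_obs:.3f}, p = {p_value:.4f}")
# H0 is rejected if p is small (e.g. below a conventional 0.05 level).
```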
1.1.2. The Neyman-Pearson's frequentist test

For the Neyman-Pearson (N-P) criterion for testing a null hypothesis $H_0: \theta = \theta_0$, it is necessary to form some alternative hypothesis, for instance, $H_A: \theta = \theta_A$, $\theta_A > \theta_0$. The null hypothesis rejection region has the form $T \ge c$, and otherwise $H_0$ is accepted. Here $c$ is the critical value defined from the condition $\alpha = P(T \ge c \mid H_0)$. The quantity $\alpha$ is the Type I error probability, while the Type II error probability is calculated as $\beta = P(T < c \mid H_A)$ (Kachiashvili, 2014b, 2018a).

Generalization of this method for many (more than two) hypotheses is given by the generalized Neyman-Pearson lemma (Rao, 2006), but its application in practice is quite problematic.
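A corresponding sketch for the Neyman-Pearson construction in the same assumed normal-mean setting: the critical value $c$ is obtained from $\alpha = P(T \ge c \mid H_0)$, after which the Type II error probability $\beta = P(T < c \mid H_A)$ follows. All numerical settings are illustrative assumptions.

```python
# Neyman-Pearson test for H0: theta = theta0 vs HA: theta = thetaA (> theta0),
# with T = standardized sample mean of n observations; values are illustrative.
import numpy as np
from scipy import stats

theta0, thetaA, sigma, n, alpha = 0.0, 1.0, 1.0, 10, 0.05
se = sigma / np.sqrt(n)

# Critical value from alpha = P(T >= c | H0) for T = (xbar - theta0) / se.
c = stats.norm.ppf(1.0 - alpha)

# Type II error probability beta = P(T < c | HA).
beta = stats.norm.cdf(c - (thetaA - theta0) / se)

print(f"critical value c = {c:.3f}, alpha = {alpha}, beta = {beta:.4f}")
```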
1.1.3. The Jeffreys Bayesian approach

The general statement of the Bayes method (the Jeffreys method) for an arbitrary number of hypotheses is the following. Let the sample $x^T = (x_1, \dots, x_n)$ be generated from $p(x; \theta)$, and the problem of interest is to test $H_i: \theta \in \Theta_i$, $i = 1, 2, \dots, S$, where $\Theta_i \subset R^m$, $i = 1, 2, \dots, S$, are disjoint subsets with $\bigcup_{i=1}^{S} \Theta_i = R^m$. The number of tested hypotheses is $S$. Let the prior on $\theta$ be denoted by $\sum_{i=1}^{S} \pi(\theta \mid H_i)\, p(H_i)$, where, for each $i = 1, 2, \dots, S$, $p(H_i)$ is the a priori probability of hypothesis $H_i$ and $\pi(\theta \mid H_i)$ is a prior density with support $\Theta_i$; $p(x \mid H_i)$ denotes the marginal density of $x$ given $H_i$, i.e. $p(x \mid H_i) = \int_{\Theta_i} p(x \mid \theta)\, \pi(\theta \mid H_i)\, d\theta$; $D = \{d\}$ is the set of solutions, where $d = \{d_1, \dots, d_S\}$, it being

$$ d_i = \begin{cases} 1, & \text{if hypothesis } H_i \text{ is accepted}, \\ 0, & \text{otherwise}; \end{cases} $$

$\delta(x) = \{\delta_1(x), \delta_2(x), \dots, \delta_S(x)\}$ is the decision function that associates each observation vector $x$ with a certain decision

$$ x \xrightarrow{\ \delta(x)\ } d \in D; $$

$\Gamma_j$ is the region of acceptance of hypothesis $H_j$, i.e. $\Gamma_j = \{x: \delta_j(x) = 1\}$. It is obvious that $\delta(x)$ is completely determined by the $\Gamma_j$ regions, i.e. $\delta(x) = \{\Gamma_1, \Gamma_2, \dots, \Gamma_S\}$.

Let us introduce the loss function $L(H_i, \delta(x))$ which determines the value of loss in the case when the sample has the probability distribution corresponding to hypothesis $H_i$ but, because of random errors, decision $\delta(x)$ is made. Making the decision that hypothesis $H_i$ is true, in reality the true one could be one of the hypotheses $H_1, \dots, H_{i-1}, H_{i+1}, \dots, H_S$, i.e., accepting one of the hypotheses, we risk rejecting one of the $(S-1)$ really true hypotheses. This risk is called the risk corresponding to the hypothesis $H_i$, and it is equal to (Berger, 1985; Kachiashvili, 2003)

$$ \rho(H_i, \delta) = \int_{R^n} L(H_i, \delta(x))\, p(x \mid H_i)\, dx. $$

A complete risk for any decision rule $\delta(x)$, i.e. the risk of making an incorrect decision, is characterized by the function

$$ r_\delta = \sum_{i=1}^{S} \rho(H_i, \delta)\, p(H_i) = \sum_{i=1}^{S} p(H_i) \int_{R^n} L(H_i, \delta(x))\, p(x \mid H_i)\, dx, \qquad (1.1) $$

which is called the risk function. The decision rule $\delta^*(x)$ or, what is the same, $\Gamma_i^*$, $i = 1, \dots, S$ (the regions of acceptance of hypotheses $H_i$, $i = 1, \dots, S$), is called a Bayes rule if

$$ r_{\delta^*} = \min_{\{\delta(x)\}} r_\delta. \qquad (1.2) $$
Its solutions for general and stepwise loss functions are given below.

1.1.3.1. General loss function

In the general case, the loss function $L(H_i, \delta(x))$ consists of two components:

$$ L(H_i, \delta(x)) = \sum_{j=1}^{S} L_1\big(H_i, \delta_j(x)=1\big) + \sum_{j=1}^{S} L_2\big(H_i, \delta_j(x)=0\big), \qquad (1.3) $$

i.e. loss function $L(H_i, \delta(x))$ is the total loss of incorrectly accepted and incorrectly rejected hypotheses. Taking into account (1.3), the solution of the problem (1.2) can be written down in the following form (Berger, 1985; Kachiashvili, 2003):

$$ \Gamma_j = \left\{ x: \sum_{i=1}^{S} L_1\big(H_i, \delta_j(x)=1\big)\, p(H_i)\, p(x \mid H_i) < \sum_{i=1}^{S} L_2\big(H_i, \delta_j(x)=0\big)\, p(H_i)\, p(x \mid H_i) \right\}, \quad j = 1, \dots, S. \qquad (1.4) $$

Let us suppose that the losses are the same within the acceptance and rejection regions and introduce denotations $L_1(H_i, H_j)$ and $L_2(H_i, H_j)$ for incorrect acceptance of $H_i$ when $H_j$ is true and incorrect rejection of $H_i$ in favor of $H_j$. Then it is possible to rewrite the risk function (1.1) as follows (Kachiashvili, 1989, 2003):

$$ r_\delta = \sum_{j=1}^{S} \sum_{i=1, i \ne j}^{S} L(H_i, H_j)\, p(H_i) \int_{\Gamma_j} p(x \mid H_i)\, dx, \qquad (1.5) $$

and condition (1.4) takes the form

$$ \Gamma_j = \left\{ x: \sum_{i=1}^{S} L_1(H_i, H_j)\, p(H_i \mid x) < \sum_{i=1}^{S} L_2(H_i, H_k)\, p(H_i \mid x); \ \forall k: k = 1, \dots, j-1, j+1, \dots, S \right\}, \quad j = 1, \dots, S. \qquad (1.6) $$

Example 1.1. Let us consider the case when the number of hypotheses equals two. Then risk function (1.5) is

$$ r_\delta = L(H_1, H_2)\, p(H_1) \int_{\Gamma_2} p(x \mid H_1)\, dx + L(H_2, H_1)\, p(H_2) \int_{\Gamma_1} p(x \mid H_2)\, dx, \qquad (1.7) $$

and hypotheses acceptance regions (1.6) take the form

$$ \Gamma_1 = \{ x: L_1(H_1, H_1)\, p(H_1)\, p(x \mid H_1) + L_1(H_2, H_1)\, p(H_2)\, p(x \mid H_2) < L_2(H_1, H_2)\, p(H_1)\, p(x \mid H_1) + L_2(H_2, H_2)\, p(H_2)\, p(x \mid H_2) \}, $$
$$ \Gamma_2 = \{ x: L_1(H_1, H_2)\, p(H_1)\, p(x \mid H_1) + L_1(H_2, H_2)\, p(H_2)\, p(x \mid H_2) < L_2(H_1, H_1)\, p(H_1)\, p(x \mid H_1) + L_2(H_2, H_1)\, p(H_2)\, p(x \mid H_2) \}. \qquad (1.8) $$
1.1.3.2. Stepwise loss function

Let us suppose that the losses for incorrectly accepted hypotheses are identical, while those for correctly-made decisions are equal to zero, i.e.

$$ L(H_i, H_j) = \begin{cases} C & \text{at } i \ne j, \\ 0 & \text{at } i = j. \end{cases} \qquad (1.9) $$

In this case, risk function (1.5) takes the form (Kachiashvili, 1989, 2003; Duda et al., 2006; Sage and Melse, 1972):

$$ r_\delta = C\left( 1 - \sum_{i=1}^{S} p(H_i) \int_{\Gamma_i} p(x \mid H_i)\, dx \right). \qquad (1.10) $$

The minimum in (1.10) is achieved by solving the problem:

$$ \max_{\{\Gamma_i\}} \sum_{i=1}^{S} p(H_i) \int_{\Gamma_i} p(x \mid H_i)\, dx. \qquad (1.11) $$

It is evident that we can consider $C = 1$ without limiting the generality. It is not difficult to be persuaded that the solution of problem (1.11) has the following form:

$$ \Gamma_i = \left\{ x: p(H_i)\, p(x \mid H_i) > p(H_j)\, p(x \mid H_j); \ \forall j: j \in (1, \dots, i-1, i+1, \dots, S) \right\}. \qquad (1.12) $$

Let us denote:

$$ \Gamma_{ij} = \{ x: p(H_i)\, p(x \mid H_i) > p(H_j)\, p(x \mid H_j) \} = \left\{ x: \frac{p(x \mid H_i)}{p(x \mid H_j)} > \frac{p(H_j)}{p(H_i)} \right\}. \qquad (1.13) $$

Then $\Gamma_i = \bigcap_{j=1, j \ne i}^{S} \Gamma_{ij}$.

Example 1.2. For stepwise loss functions (1.9), hypotheses acceptance regions (1.12) at testing two hypotheses are the following:

$$ \Gamma_1 = \{ x: p(H_1)\, p(x \mid H_1) > p(H_2)\, p(x \mid H_2) \}, \quad \Gamma_2 = \{ x: p(H_2)\, p(x \mid H_2) > p(H_1)\, p(x \mid H_1) \}. \qquad (1.14) $$
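For two simple normal hypotheses, the regions (1.14) reduce to comparing the products $p(H_i)\, p(x \mid H_i)$. The following sketch applies this rule to one sample; the densities, priors and data are assumptions chosen only for illustration.

```python
# Bayes rule (stepwise "0-1" loss) for two simple hypotheses about a normal
# mean: accept the hypothesis with the larger p(H_i) * p(x | H_i).
import numpy as np
from scipy import stats

mu1, mu2, sigma = 0.0, 1.0, 1.0           # hypothetical means under H1 and H2
p_H = np.array([0.5, 0.5])                # a priori probabilities
x = np.array([0.8, 1.1, -0.2, 0.9])       # illustrative observations

# Joint densities p(x | H_i) of the independent sample under each hypothesis.
lik = np.array([
    stats.norm.pdf(x, mu1, sigma).prod(),
    stats.norm.pdf(x, mu2, sigma).prod(),
])

accepted = np.argmax(p_H * lik)           # regions (1.14): the larger product wins
posterior = p_H * lik / (p_H * lik).sum()
print(f"accept H{accepted + 1}, posterior probabilities = {np.round(posterior, 3)}")
```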
An attempt to reconcile the different points of view of the noted philosophies was made in (Berger, 2003), and as a result a new, compromise test T* was offered. The method uses the Fisher p-value criterion for making a decision, the Neyman-Pearson statement (using basic and alternative hypotheses) and Jeffreys' formulae for computing the Type I and Type II conditional error probabilities for every observation result x on the basis of which the decision is made.
1.1.4. The Berger's conditional test

The conditional test $T^C$ is the following:

$$ T^C: \begin{cases} \text{if } B(x) \le c_0, \text{ reject } H_0 \text{ and report the conditional error probability (CEP) } \alpha(x) = \dfrac{B(x)}{1 + B(x)}; \\[4pt] \text{if } B(x) > c_0, \text{ accept } H_0 \text{ and report the CEP } \beta(x) = \dfrac{1}{1 + B(x)}, \end{cases} $$

where $B(x)$ is the likelihood ratio and $c_0$ is the minimax critical value defined as

$$ P(B(x) \le c_0 \mid H_0) = 1 - P(B(x) \le c_0 \mid H_1). \qquad (1.15) $$

The modified conditional test $T^*$ consists in the following:

$$ T^*: \begin{cases} \text{if } B(x) \le r, \text{ reject } H_0 \text{ and report the conditional error probability (CEP) } \alpha(B(x)) = B(x)/(1 + B(x)); \\ \text{if } r < B(x) < a, \text{ make no decision}; \\ \text{if } B(x) \ge a, \text{ accept } H_0 \text{ and report the CEP } \beta(x) = 1/(1 + B(x)), \end{cases} $$

where $B(x) = p(x \mid H_0)/p(x \mid H_A)$ is the likelihood ratio and $a$ and $r$ are defined as follows:

$$ r = 1 \text{ and } a = F_0^{-1}\big(1 - F_A(1)\big) \ \text{ if } F_0(1) \le 1 - F_A(1); \quad r = F_A^{-1}\big(1 - F_0(1)\big) \text{ and } a = 1 \ \text{ if } F_0(1) > 1 - F_A(1), \qquad (1.16) $$

where $F_0$ and $F_A$ are the cumulative distribution functions (c.d.f.) of $B(X)$ under $p(x \mid H_0)$ and $p(x \mid H_A)$, respectively. As was mentioned in (Dass and Berger, 2003, p. 196), "T* is an actual frequentist test; the reported CEPs, $\alpha(B(x))$ and $\beta(B(x))$, are conditional frequentist Type I and Type II error probabilities, conditional on the statistic we use to measure strength of evidence in the data. Furthermore, $\alpha(B(x))$ and $\beta(B(x))$ will be seen to have the Bayesian interpretation of being (objective) posterior probabilities of $H_0$ and $H_A$, respectively. Thus, T* is simultaneously a conditional frequentist and a Bayesian test."

Generalization of the T* test for any number of hypotheses seems quite problematic. For the general case, it is possible only by simulation, because the definition of the exact distribution of the likelihood ratio $B(x)$ for arbitrary hypothetical distributions is very difficult if not impossible.
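In line with this remark, the distribution functions $F_0$ and $F_A$ in (1.16) can be approximated by simulation when they are not available analytically. The sketch below is a Python illustration under assumed simple normal hypotheses (not the book's code): it estimates $F_0$ and $F_A$ by Monte Carlo, derives $r$ and $a$ from (1.16), and applies the $T^*$ rule to one sample; the model, sample size and simulation size are assumptions.

```python
# A Monte Carlo sketch of Berger's conditional test T* for two simple normal
# hypotheses; all numerical settings are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu0, muA, sigma, n = 0.0, 1.0, 1.0, 5

def B(sample):
    """Likelihood ratio B(x) = p(x | H0) / p(x | HA)."""
    return (stats.norm.pdf(sample, mu0, sigma).prod()
            / stats.norm.pdf(sample, muA, sigma).prod())

# Empirical c.d.f.s F0 and FA of B(X) under H0 and HA (Monte Carlo).
B0 = np.sort([B(rng.normal(mu0, sigma, n)) for _ in range(20000)])
BA = np.sort([B(rng.normal(muA, sigma, n)) for _ in range(20000)])
F0 = lambda b: np.searchsorted(B0, b, side="right") / B0.size
FA = lambda b: np.searchsorted(BA, b, side="right") / BA.size

# Thresholds r and a according to (1.16).
if F0(1.0) <= 1.0 - FA(1.0):
    r, a = 1.0, np.quantile(B0, 1.0 - FA(1.0))
else:
    r, a = np.quantile(BA, 1.0 - F0(1.0)), 1.0

x = rng.normal(muA, sigma, n)        # one illustrative observed sample
b = B(x)
if b <= r:
    print(f"reject H0, CEP alpha = {b / (1 + b):.3f}")
elif b >= a:
    print(f"accept H0, CEP beta  = {1 / (1 + b):.3f}")
else:
    print(f"no decision: r = {r:.3f} < B(x) = {b:.3f} < a = {a:.3f}")
```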
1.2. Sequential Tests

1.2.1. The Wald's method

The sequential test was introduced by Wald in the mid-forties of the last century (Wald, 1947a, b). The essence of the Wald's sequential test consists in the following: compute the likelihood ratio $B(x) = p(x_1, x_2, \dots, x_n \mid H_0) / p(x_1, x_2, \dots, x_n \mid H_A)$ for the $n$ sequentially obtained observation results, and, if

$B < B(x) < A$, do not make the decision and continue the observation of the random variable; if

$B(x) \ge A$, accept the hypothesis $H_0$ on the basis of the $n$ observation results; if

$B(x) \le B$, accept the hypothesis $H_A$ on the basis of the $n$ observation results.

The thresholds $A$ and $B$ are chosen so that $A = \dfrac{1 - \beta}{\alpha}$ and $B = \dfrac{\beta}{1 - \alpha}$. Here $\alpha$ and $\beta$ are the desirable values of the error probabilities of Types I and II, respectively. It is proved (Wald, 1947a) that in this case the real values of the error probabilities of Types I and II are close enough to the desired values, but still differ from them. Since Wald's pioneering works, a lot of different investigations have been dedicated to the sequential analysis problems (see, for example, Ghosh, 1970; Siegmund, 1985; Kachiashvili, 2018a) and efforts towards the development of this approach constantly increase as it has many important advantages in comparison with the parallel methods (see, for example, Tartakovsky et al., 2015).
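A short Python sketch of Wald's sequential test for two simple hypotheses about a normal mean (an illustration with assumed parameters, not the book's code). It is written with the likelihood ratio taken as the ratio of the alternative to the null density, an equivalent convention for which the thresholds $A = (1 - \beta)/\alpha$ and $B = \beta/(1 - \alpha)$ are used directly.

```python
# Sketch of Wald's sequential probability ratio test (SPRT) for
# H0: mu = 0 vs HA: mu = 1 with unit-variance normal observations.
# LR_n = p(x_1..x_n | HA) / p(x_1..x_n | H0) (alternative over null),
# a convention for which A = (1 - beta)/alpha and B = beta/(1 - alpha).
# All numerical settings are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu0, muA, sigma = 0.0, 1.0, 1.0
alpha, beta = 0.05, 0.05
logA, logB = np.log((1 - beta) / alpha), np.log(beta / (1 - alpha))

log_lr, n = 0.0, 0
while logB < log_lr < logA:
    xi = rng.normal(muA, sigma)          # data generated under HA here
    n += 1
    log_lr += (stats.norm.logpdf(xi, muA, sigma)
               - stats.norm.logpdf(xi, mu0, sigma))

decision = "accept HA" if log_lr >= logA else "accept H0"
print(f"{decision} after n = {n} observations (log LR = {log_lr:.2f})")
```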
1.2.2. The Bayes' method

Concerning the Bayesian sequential methods, the following is written in Berger (1985): "While Bayesian analysis in fixed sample size problems is straightforward (robustness consideration aside), Bayesian sequential analysis is very difficult" (p. 442). The idea of the sequential Bayesian procedure consists in computing the Bayes risk function at every stage of the obtained observation results and comparing it with the expected posterior Bayes risk that will be obtained if more observations are taken. If the posterior Bayes risk is greater than the Bayes risk function, experimentation is stopped and the decision is made; otherwise, experimentation continues. Readers interested in the details of the sequential Bayesian method can refer to the following sources: Berger, 1985; Arrow et al., 1949; Ghosh and Sen, 1991.
1.2.3. The Berger's method

The sequential test developed on the basis of the T* test is as follows (Berger et al., 1994):

if the likelihood ratio $B(x) \le r$, reject $H_0$ and report the conditional error probability $\alpha(B(x)) = B(x)/(1 + B(x))$;

if $r < B(x) < a$, make no decision and continue the observations;

if $B(x) \ge a$, accept $H_0$ and report the conditional error probability $\beta(B(x)) = 1/(1 + B(x))$.

Here $r$ and $a$ are determined by the ratios (1.16).
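A schematic sketch of this sequential rule for two simple normal hypotheses: the thresholds $r$ and $a$ are taken here as given constants purely for illustration (in practice they are computed from (1.16), for instance by simulation as sketched in Item 1.1.4); the model and all numbers are assumptions.

```python
# Sequential use of the conditional test: observe one value at a time and
# stop as soon as B(x) leaves the no-decision interval (r, a).  The thresholds
# r, a and the normal model are illustrative; in practice r, a come from (1.16).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
mu0, muA, sigma = 0.0, 1.0, 1.0
r, a = 0.4, 2.5                      # assumed thresholds for illustration

log_B, n = 0.0, 0
while np.log(r) < log_B < np.log(a):
    xi = rng.normal(mu0, sigma)      # data generated under H0 here
    n += 1
    log_B += (stats.norm.logpdf(xi, mu0, sigma)
              - stats.norm.logpdf(xi, muA, sigma))

B = np.exp(log_B)
if B <= r:
    print(f"n = {n}: reject H0, CEP alpha = {B / (1 + B):.3f}")
else:
    print(f"n = {n}: accept H0, CEP beta = {1 / (1 + B):.3f}")
```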
1.3. Constrained Bayesian Methods (CBM) of Hypotheses Testing

A new approach (philosophy) to statistical hypotheses testing, called Constrained Bayesian Methods (CBM), was developed comparatively recently (Kachiashvili, 2003, 2011, 2014a, b, 2015, 2016, 2018a, b; Kachiashvili et al., 2012a, b, c; Kachiashvili and Mueed, 2013; Kachiashvili et al., 2018). This method differs from the traditional Bayesian approach in that the risk function is split into two parts, reflecting the risks of incorrect rejection and incorrect acceptance of hypotheses, and the risk minimization problem is stated as a constrained optimization problem in which one of the risk components is restricted and the other one is minimized. It generates data-dependent measures of evidence with regard to the level of restriction. In spite of the absolutely different motivations behind the introduction of T* and CBM, they lead to hypotheses acceptance regions with identical properties in principle. Namely, in contrast to the classical case, when the observation space is divided into two complementary sub-spaces for acceptance and rejection of the tested hypotheses, here the observation space contains regions for making the decision and regions for not making the decision (see, for example, Berger, 2003; Kachiashvili, 2018a; Kachiashvili et al., 2012a; Kachiashvili et al., 2012b; Kachiashvili and Mueed, 2013). Though, for CBM, the situation is more differentiated than for T*. For CBM, the regions for not making the decision are divided into the regions of impossibility of making the decision and the regions of impossibility of making a unique decision. In the first case, the impossibility of making the decision is equivalent to the impossibility of making the decision with the given probability of the error for a given observation result, and it becomes possible when the probability of the error decreases. In the second case, it is impossible to make a unique decision when the probability of the error is required to be small, and it is unattainable for the given observation result. By increasing the error probability, it becomes possible to make a decision.

It is possible to formulate nine different statements of CBM depending on what type of restriction is desired, which is determined by the aim of the practical problem that must be solved (Kachiashvili, 2011, 2018a; Kachiashvili et al., 2012b). They are (see Chapter 2):
1) Restrictions on the averaged probability of acceptance of true hypotheses (Task 1);
2) Restrictions on the conditional probabilities of acceptance of true hypotheses (Task 2);
3) Restrictions on the conditional probabilities of acceptance of each true hypothesis (Task 2₁);
4) Restrictions on posterior probabilities of acceptance of true hypotheses (Task 3);
5) Restrictions on the averaged probability of rejection of true hypotheses (Task 4);
6) Restrictions on the conditional probabilities of rejection of each true hypothesis (Task 5);
7) Restrictions on a posteriori probabilities of rejection of each true hypothesis (Task 6);
8) Restrictions on probabilities of rejection of true hypothesis (Task 6₁);
9) Restrictions on posterior probability of rejected true hypotheses (Task 7).

Let us introduce Task 1, as an example, for a demonstration of the specificity of CBM. In this case, we have to minimize the averaged loss of incorrectly accepted hypotheses
$$ r_\delta = \min_{\{\Gamma_j\}} \left\{ \sum_{i=1}^{S} p(H_i) \sum_{j=1}^{S} \int_{\Gamma_j} L_1\big(H_i, \delta_j(x)=1\big)\, p(x \mid H_i)\, dx \right\}, \qquad (1.17) $$

subject to the averaged loss of incorrectly rejected hypotheses

$$ \sum_{i=1}^{S} p(H_i) \sum_{j=1}^{S} \int_{R^n - \Gamma_j} L_2\big(H_i, \delta_j(x)=0\big)\, p(x \mid H_i)\, dx = \sum_{i=1}^{S} p(H_i) \sum_{j=1}^{S} \int_{R^n} L_2\big(H_i, \delta_j(x)=0\big)\, p(x \mid H_i)\, dx - \sum_{i=1}^{S} p(H_i) \sum_{j=1}^{S} \int_{\Gamma_j} L_2\big(H_i, \delta_j(x)=0\big)\, p(x \mid H_i)\, dx \le r_1, \qquad (1.18) $$

where $r_1$ is some real number determining the level of the averaged loss of incorrectly rejected hypotheses. By solving problem (1.17), (1.18), we have

$$ \Gamma_j = \left\{ x: \sum_{i=1}^{S} L_1\big(H_i, \delta_j(x)=1\big)\, p(H_i)\, p(x \mid H_i) < \lambda \sum_{i=1}^{S} L_2\big(H_i, \delta_j(x)=0\big)\, p(H_i)\, p(x \mid H_i) \right\}, \quad j = 1, \dots, S, \qquad (1.19) $$

where the Lagrange multiplier $\lambda$ ($\lambda > 0$) is defined so that in (1.18) the equality takes place.

Example 1.3. Let us consider stepwise losses

$$ L_1\big(H_i, \delta_j(x)=1\big) = \begin{cases} 0 & \text{at } i = j, \\ 1 & \text{at } i \ne j, \end{cases} \qquad L_2\big(H_i, \delta_j(x)=0\big) = \begin{cases} 0 & \text{at } i \ne j, \\ 1 & \text{at } i = j. \end{cases} \qquad (1.20) $$

Then problem (1.17), (1.18) transforms into

$$ r_\delta = \min_{\{\Gamma_j\}} \left\{ \sum_{i=1}^{S} p(H_i) \sum_{j=1, j \ne i}^{S} \int_{\Gamma_j} p(x \mid H_i)\, dx \right\}, \qquad (1.21) $$

subject to

$$ 1 - \sum_{i=1}^{S} p(H_i) \int_{\Gamma_i} p(x \mid H_i)\, dx \le r_1, \qquad (1.22) $$

and hypotheses acceptance regions (1.19) take the form (Kachiashvili, 2013)

$$ \Gamma_j = \left\{ x: \sum_{i=1, i \ne j}^{S} p(H_i)\, p(x \mid H_i) < \lambda\, p(H_j)\, p(x \mid H_j) \right\}, \quad j = 1, \dots, S. \qquad (1.23) $$

When the number of hypotheses is $S = 2$, the statement of the problem and its solution are

$$ r_\delta = \min_{\{\Gamma_1, \Gamma_2\}} \left\{ p(H_1) \int_{\Gamma_2} p(x \mid H_1)\, dx + p(H_2) \int_{\Gamma_1} p(x \mid H_2)\, dx \right\}, \qquad (1.24) $$

$$ p(H_1) \int_{\Gamma_1} p(x \mid H_1)\, dx + p(H_2) \int_{\Gamma_2} p(x \mid H_2)\, dx \ge 1 - r_1, \qquad (1.25) $$

$$ \Gamma_1 = \{ x: p(H_2)\, p(x \mid H_2) < \lambda\, p(H_1)\, p(x \mid H_1) \}, \quad \Gamma_2 = \{ x: p(H_1)\, p(x \mid H_1) < \lambda\, p(H_2)\, p(x \mid H_2) \}. \qquad (1.26) $$
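A small numerical sketch of this two-hypothesis case (an illustration with assumed normal densities, priors and restriction level $r_1$, not the book's software): the Lagrange multiplier $\lambda$ is found so that (1.25) holds with equality, and the resulting regions (1.26) may overlap or leave a gap, which is exactly the source of the no-decision outcomes discussed below.

```python
# CBM (Task 1, stepwise "0-1" losses) for two simple hypotheses about a
# normal mean, using regions (1.26): lambda is chosen so that restriction
# (1.25) holds with equality.  Model, priors and r1 are illustrative.
import numpy as np
from scipy import stats, optimize

mu = np.array([0.0, 1.0])        # means under H1 and H2
sigma = 1.0
p_H = np.array([0.5, 0.5])       # a priori probabilities
r1 = 0.05                        # allowed averaged probability of incorrect rejection

mid = mu.mean()
slope = (mu[1] - mu[0]) / sigma**2
shift = np.log(p_H[1] / p_H[0])

def regions(lam):
    """Boundaries of Gamma_1 = (-inf, c1) and Gamma_2 = (c2, +inf) from (1.26)."""
    return mid + (np.log(lam) - shift) / slope, mid - (np.log(lam) + shift) / slope

def accept_true(lam):
    """Left-hand side of restriction (1.25)."""
    c1, c2 = regions(lam)
    return (p_H[0] * stats.norm.cdf(c1, mu[0], sigma)
            + p_H[1] * stats.norm.sf(c2, mu[1], sigma))

# lambda > 0 such that accept_true(lambda) = 1 - r1 (equality in (1.25)).
lam = optimize.brentq(lambda L: accept_true(L) - (1.0 - r1), 1e-6, 1e6)
c1, c2 = regions(lam)
print(f"lambda = {lam:.3f}, Gamma_1 = (-inf, {c1:.3f}), Gamma_2 = ({c2:.3f}, +inf)")

x = 0.5                          # one illustrative observation
in1, in2 = x < c1, x > c2
if in1 and in2:
    print("x falls into both regions: no unique decision at this r1")
elif in1 or in2:
    print("accept", "H1" if in1 else "H2")
else:
    print("x falls into neither region: no decision possible at this r1")
```

With these settings the bisection gives $\lambda \approx 3.1 > 1$, so the two regions overlap and the observation $x = 0.5$ falls into the overlap: no unique decision can be made unless $r_1$ is relaxed. For $\lambda < 1$ the regions would instead leave a gap in which no decision at all is possible.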
In our opinion, the mentioned properties of T* and CBM are very interesting and useful. They bring the statistical hypotheses testing rule much closer to the everyday decision-making rule when, at a shortage of necessary information, acceptance of one of the stated suppositions is not compulsory. The specific features of the hypotheses testing regions of the Berger's T* test and CBM, namely, the existence of the no-decision region in the T* test and the existence of regions of impossibility of making a unique or any decision in CBM, give the opportunity to develop sequential tests on their basis (Berger et al., 1994; Kachiashvili, 2018a; Kachiashvili, 2015; Kachiashvili and Hashmi, 2010).
1.4. The Method of Sequential Analysis of Bayesian Type

Let us suppose that there is an opportunity to obtain repeated observations. To introduce the method of sequential analysis for an arbitrary number of hypotheses on the basis of the constrained Bayesian task, let us use the denotations introduced by (Wald, 1947a). Let $R_m^n$ be the sampling space of all possible samples of $m$ independent $n$-dimensional observation vectors $x_1, \dots, x_m$. Let us split $R_m^n$ into $S + 1$ disjoint sub-regions $R_{m,1}^n, R_{m,2}^n, \dots, R_{m,S}^n, R_{m,S+1}^n$ such that $R_m^n = \bigcup_{i=1}^{S+1} R_{m,i}^n$. Let $p(x_1, \dots, x_m \mid H_i)$ be the total probability distribution density of the $m$ independent $n$-dimensional observation vectors; $m$ is the sample size. Then $p(x_1, \dots, x_m \mid H_i) = p(x_1 \mid H_i) \cdots p(x_m \mid H_i)$.

Let us determine the following decision rule (Kachiashvili, 2014a, 2018a; Kachiashvili and Hashmi, 2010). If the matrix of observation results $x = (x_1, \dots, x_m)$ belongs to the sub-region $R_{m,i}^n$, $i = 1, \dots, S$, then hypothesis $H_i$ is accepted and, if $x = (x_1, \dots, x_m)$ belongs to the sub-region $R_{m,S+1}^n$, the decision is not made and the observations continue until one of the tested hypotheses is accepted.

The regions $R_{m,i}^n$, $i = 1, \dots, S+1$, are determined in the following way: $R_{m,i}^n$, $i = 1, \dots, S$, is such a part of the acceptance region $\Gamma_i^m$ of hypothesis $H_i$ that does not belong to any other region $\Gamma_j^m$, $j = 1, \dots, i-1, i+1, \dots, S$; $R_{m,S+1}^n$ is a part of the sampling space $R_m^n$ that belongs simultaneously to more than one region $\Gamma_i^m$, $i = 1, \dots, S$, or does not belong to any of these regions. Here the index $m$ ($m = 1, 2, \dots$) points to the fact that the regions are determined on the basis of $m$ sequential observation results.

The hypotheses acceptance regions $R_{m,i}^n$, $i = 1, \dots, S+1$, could be determined as follows. Let us denote by $I_i^m$ the population of sub-regions of intersections of the acceptance regions $\Gamma_i^m$ of hypotheses $H_i$ ($i = 1, \dots, S$) in CBM of hypotheses testing with the regions of acceptance of the other hypotheses $H_j$, $j = 1, \dots, S$, $j \ne i$. By $E_m^n = R_m^n - \bigcup_{i=1}^{S} \Gamma_i^m$ we denote the population of regions of the space $R_m^n$ that do not belong to any of the hypotheses acceptance regions. Then the hypotheses acceptance regions in the method of sequential analysis of Bayesian type are determined in the following way:

$$ R_{m,i}^n = \Gamma_i^m / I_i^m, \quad i = 1, \dots, S; \qquad R_{m,S+1}^n = \left( \bigcup_{i=1}^{S} I_i^m \right) \cup E_m^n. \qquad (1.27) $$

Here the regions $\Gamma_i^m$, $I_i^m$, $E_m^n$, $i = 1, \dots, S$, are defined on the basis of the hypotheses acceptance regions in CBM (see, for example, (1.19)).
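A compact simulation sketch of this sequential scheme for two simple normal hypotheses (an illustration built on the CBM regions of Example 1.3, with all numerical settings assumed): at every stage $m$ the CBM regions are recomputed for the accumulated sample, and the procedure stops only when the observation falls into exactly one acceptance region, i.e. into some $R_{m,i}^n$, $i = 1, \dots, S$; otherwise it continues, which corresponds to $R_{m,S+1}^n$.

```python
# Sequential analysis of Bayesian type (Section 1.4) built on the CBM regions
# of Example 1.3 for two simple normal hypotheses.  All settings are
# illustrative assumptions.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(3)
mu, sigma = np.array([0.0, 1.0]), 1.0       # means under H1 and H2
p_H, r1 = np.array([0.5, 0.5]), 0.05        # priors and restriction level

def cbm_regions(m):
    """CBM boundaries for xbar of m observations, xbar ~ N(mu_i, sigma^2 / m)."""
    sd = sigma / np.sqrt(m)
    mid, slope = mu.mean(), (mu[1] - mu[0]) / sd**2
    shift = np.log(p_H[1] / p_H[0])
    def bounds(loglam):                      # Gamma_1 = {xbar < c1}, Gamma_2 = {xbar > c2}
        return mid + (loglam - shift) / slope, mid - (loglam + shift) / slope
    def accept_true(loglam):                 # left side of restriction (1.25)
        c1, c2 = bounds(loglam)
        return (p_H[0] * stats.norm.cdf(c1, mu[0], sd)
                + p_H[1] * stats.norm.sf(c2, mu[1], sd))
    loglam = optimize.brentq(lambda L: accept_true(L) - (1 - r1),
                             -10 * m - 10, 10 * m + 10)
    return bounds(loglam)

x, m = [], 0
while True:
    x.append(rng.normal(mu[1], sigma))       # data generated under H2 here
    m += 1
    c1, c2 = cbm_regions(m)
    xbar = np.mean(x)
    in1, in2 = xbar < c1, xbar > c2
    if in1 != in2:                           # exactly one region: make the decision
        print(f"m = {m}: accept {'H1' if in1 else 'H2'} (xbar = {xbar:.3f})")
        break
    # both or neither region: region R^n_{m,S+1}, no unique decision, continue
```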
Finally, it must be noted that the detailed investigation of different statements of CBM and the choice of optimal loss functions in the constrained statements of the Bayesian testing problem open wide opportunities in statistical hypotheses testing with new, beforehand unknown and interesting properties. On the other hand, the statement of the Bayesian estimation problem as a constrained optimization problem gives
new opportunities in finding optimal estimates with new, beforehand unknown properties, and it seems that these properties will advantageously differ from those of the approaches known today (Bishop, 2006). In our opinion, the proposed CBM are avenues for future, promising investigations which will give researchers the opportunity to obtain new promising results in the theory and practice of statistical inference, and this completely corresponds to the thoughts of the well-known statistician B. Efron (Efron, 2004): "Broadly speaking, nineteenth century statistics was Bayesian, while the twentieth century was frequentist, at least from the point of view of most scientific practitioners. Here in the twenty-first century scientists are bringing statisticians much bigger problems to solve, often comprising millions of data points and thousands of parameters. Which statistical philosophy will dominate practice? My guess, backed up with some recent examples, is that a combination of Bayesian and frequentist ideas will be needed to deal with our increasingly intense scientific environment. This will be a challenging period for statisticians, both applied and theoretical, but it also opens the opportunity for a new golden age, rivaling that of Fisher, Neyman, and the other giants of the early 1900s."
CHAPTER 2 CONSTRAINED BAYESIAN METHOD FOR TESTING DIFFERENT TYPES OF HYPOTHESES
2.1. General Statement of Constrained Bayesian Method

The essence of CBM consists in restricting one type of error (Type I or Type II) and minimizing the other type of error. Depending on which type of error is restricted and which is minimized, different formulations of CBM are possible. Let us consider all possible statements of CBM, the concrete kind of which must be chosen depending on what type of restriction is desired, which is determined by the practical problem that must be solved (Kachiashvili, 2011, 2018a; Kachiashvili et al., 2012b).
2.1.1. Restrictions on the averaged probability of acceptance of true hypotheses (Task 1)

Let us use the notations introduced in Item 1.1.3. In the general case, the loss function $L(H_i, \delta(x))$ consists of two components:

$$ L(H_i, \delta(x)) = \sum_{j=1}^{S} L_1\big(H_i, \delta_j(x)=1\big) + \sum_{j=1}^{S} L_2\big(H_i, \delta_j(x)=0\big), \qquad (2.1) $$

where $L_1(H_i, \delta_j(x)=1)$ and $L_2(H_i, \delta_j(x)=0)$ are the losses of incorrectly accepted and incorrectly rejected hypotheses. The formulation and the solution of this task are given in Item 1.3.
Let us suppose that the losses are the same within the acceptance and rejection regions and introduce the denotations $L_1(H_i, H_j)$ and $L_2(H_i, H_j)$ for incorrect acceptance of $H_j$ when $H_i$ is true and incorrect rejection of $H_j$ in favour of $H_i$. Then the decision-making regions (1.18) take the form

$$ \Gamma_j = \left\{ x: \sum_{i=1}^{S} p(H_i)\, L_1(H_i, H_j)\, p(x \mid H_i) < \lambda \sum_{i=1}^{S} p(H_i)\, L_2(H_i, H_k)\, p(x \mid H_i); \ \forall k: k = 1, \dots, j-1, j+1, \dots, S \right\}, \quad j = 1, \dots, S; \qquad (2.2_1) $$

that is the same as

$$ \Gamma_j = \left\{ x: \sum_{i=1}^{S} L_1(H_i, H_j)\, p(H_i \mid x) < \lambda \sum_{i=1}^{S} L_2(H_i, H_k)\, p(H_i \mid x); \ \forall k: k = 1, \dots, j-1, j+1, \dots, S \right\}, \quad j = 1, \dots, S. \qquad (2.2_2) $$

From (1.17) it is clear that the following condition must be fulfilled:

$$ r_1 < \sum_{i=1}^{S} p(H_i) \sum_{j=1}^{S} \int_{R^n} L_2\big(H_i, \delta_j(x)=0\big)\, p(x \mid H_i)\, dx, \qquad (2.3) $$

i.e., for the losses $L_1(H_i, H_j)$ and $L_2(H_i, H_j)$,

$$ r_1 < \sum_{i=1}^{S} p(H_i) \sum_{j=1}^{S} L_2(H_i, H_j). \qquad (2.4) $$

Taking into account (2.1), the solution of the problem (1.2) can be written down in the following form (Berger, 1985; Kachiashvili, 2003):

$$ \Gamma_j = \left\{ x: \sum_{i=1}^{S} L_1\big(H_i, \delta_j(x)=1\big)\, p(H_i)\, p(x \mid H_i) < \sum_{i=1}^{S} L_2\big(H_i, \delta_j(x)=0\big)\, p(H_i)\, p(x \mid H_i) \right\}, \quad j = 1, \dots, S, \qquad (2.5) $$

and, for the losses $L_1(H_i, H_j)$ and $L_2(H_i, H_j)$, we have

$$ \Gamma_j = \left\{ x: \sum_{i=1}^{S} L_1(H_i, H_j)\, p(H_i \mid x) < \sum_{i=1}^{S} L_2(H_i, H_k)\, p(H_i \mid x); \ \forall k: k = 1, \dots, j-1, j+1, \dots, S \right\}, \quad j = 1, \dots, S. \qquad (2.6) $$

It is obvious that the difference between (1.18) and (2.5), that is, between (2.2) and (2.6), consists of the Lagrange multiplier $\lambda$, which cardinally changes the properties of the decision-making regions (see, for example, Kachiashvili et al., 2012a; Kachiashvili and Mueed, 2013; Kachiashvili, 2018a). In particular, the following hold: $\bigcup_{j=1}^{S} \Gamma_j = R^n$ and $\Gamma_j \cap \Gamma_i = \emptyset$, $i \ne j$, when $\Gamma_j$ are defined by (2.5); and, as a rule, $\bigcup_{j=1}^{S} \Gamma_j \ne R^n$ when $\Gamma_j$ are defined by (2.2), which is caused by the existence of such $i$ and $j$ ($i \ne j$) that $\Gamma_j \cap \Gamma_i \ne \emptyset$, and also of such sub-regions of $R^n$ which do not belong to any $\Gamma_j$, $j = 1, \dots, S$, defined by (2.2). Here $R^n$ is the $n$-dimensional observation space.
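The overlap of the $\Gamma_j$ regions and the existence of sub-regions covered by no $\Gamma_j$ can be checked numerically. The following Python sketch (an illustration, not taken from the book) does this for CBM with the stepwise "0-1" losses, i.e. regions (1.23), and three simple hypotheses about a normal mean; the model, priors, restriction level and Monte Carlo settings are assumptions chosen only for the demonstration.

```python
# CBM Task 1 with "0-1" losses for S = 3 simple normal-mean hypotheses:
# find the common Lagrange multiplier by bisection and estimate by Monte
# Carlo how often the decision is absent, unique, or non-unique.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(4)
mu = np.array([-1.0, 0.0, 1.0])            # means under H1, H2, H3
sigma = 1.0
p_H = np.ones(3) / 3                       # a priori probabilities
r1, N = 0.1, 50_000                        # restriction level, Monte Carlo size

samples = [rng.normal(m, sigma, N) for m in mu]     # x drawn under each H_i

def weights(x):
    """Matrix of p(H_i) * p(x | H_i), one row per hypothesis."""
    return np.array([p * stats.norm.pdf(x, m, sigma) for p, m in zip(p_H, mu)])

W = [weights(s) for s in samples]

def in_region(Wx, j, lam):
    """x in Gamma_j iff sum_{i != j} p_i p(x|H_i) < lam * p_j p(x|H_j), cf. (1.23)."""
    return (Wx.sum(axis=0) - Wx[j]) < lam * Wx[j]

def accept_true(lam):
    """Averaged probability of accepting a true hypothesis, cf. restriction (1.22)."""
    return sum(p_H[i] * in_region(W[i], i, lam).mean() for i in range(3))

# Common Lagrange multiplier: equality in the restriction.
lam = optimize.brentq(lambda L: accept_true(L) - (1.0 - r1), 1e-6, 1e6)

# Fractions of observations with no, a unique, or a non-unique decision.
W_all = np.concatenate(W, axis=1)
counts = np.vstack([in_region(W_all, j, lam) for j in range(3)]).sum(axis=0)
print(f"lambda = {lam:.3f}")
print(f"no region accepts x        : {np.mean(counts == 0):.3f}")
print(f"exactly one region accepts : {np.mean(counts == 1):.3f}")
print(f"more than one region       : {np.mean(counts >= 2):.3f}")
```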
2.1.2. Restrictions on the conditional probabilities of acceptance of true hypotheses (Task 2)

To minimize (1.16) at

$$ \sum_{j=1}^{S} \int_{R^n - \Gamma_j} L_2\big(H_i, \delta_j(x)=0\big)\, p(x \mid H_i)\, dx \le r_2, \quad i = 1, \dots, S. \qquad (2.7) $$

For losses $L_1(H_i, H_j)$ and $L_2(H_i, H_j)$, we have

$$ \sum_{j=1}^{S} L_2(H_i, H_k) \int_{\Gamma_j} p(x \mid H_i)\, dx \ge \sum_{j=1}^{S} L_2(H_i, H_k) - r_2, \quad \forall k: k \in (1, \dots, j-1, j+1, \dots, S). \qquad (2.8) $$

And, finally, for the losses (1.19) from (2.8), we have

$$ 1 - \int_{\Gamma_i} p(x \mid H_i)\, dx \le r_2, \quad \text{i.e.,} \quad \int_{\Gamma_i} p(x \mid H_i)\, dx \ge 1 - r_2, \quad i = 1, \dots, S. $$

When $r_2 = \alpha$, this is Task 2 at loss function "0-1" (Kachiashvili, 2003; Kachiashvili, 2018a).

Let us solve the constrained optimization problem (1.16), (2.7). The Lagrange function is

$$ \min_{\{\Gamma_j\}} \left\{ \sum_{i=1}^{S} p(H_i) \sum_{j=1}^{S} \int_{\Gamma_j} L_1\big(H_i, \delta_j(x)=1\big)\, p(x \mid H_i)\, dx + \sum_{i=1}^{S} \lambda_i \left[ \sum_{j=1}^{S} \int_{R^n} L_2\big(H_i, \delta_j(x)=0\big)\, p(x \mid H_i)\, dx - \sum_{j=1}^{S} \int_{\Gamma_j} L_2\big(H_i, \delta_j(x)=0\big)\, p(x \mid H_i)\, dx - r_2 \right] \right\}. $$

For solving this problem, the solution of the following is necessary:

$$ \min_{\{\Gamma_j\}} \left\{ \sum_{j=1}^{S} \int_{\Gamma_j} \left[ \sum_{i=1}^{S} p(H_i)\, L_1\big(H_i, \delta_j(x)=1\big)\, p(x \mid H_i) - \sum_{i=1}^{S} \lambda_i\, L_2\big(H_i, \delta_j(x)=0\big)\, p(x \mid H_i) \right] dx \right\}. $$

From here it is clear that

$$ \Gamma_j = \left\{ x: \sum_{i=1}^{S} p(H_i)\, L_1\big(H_i, \delta_j(x)=1\big)\, p(x \mid H_i) < \sum_{i=1}^{S} \lambda_i\, L_2\big(H_i, \delta_j(x)=0\big)\, p(x \mid H_i) \right\}, \qquad (2.9) $$

which, for the losses $L_1(H_i, H_j)$ and $L_2(H_i, H_j)$, takes the form

$$ \Gamma_j = \left\{ x: \sum_{i=1}^{S} p(H_i)\, L_1(H_i, H_j)\, p(x \mid H_i) < \sum_{i=1}^{S} \lambda_i\, L_2(H_i, H_k)\, p(x \mid H_i); \ \forall k: k = 1, \dots, j-1, j+1, \dots, S \right\}. \qquad (2.10) $$

In the expressions (2.9) and (2.10), the Lagrange multipliers $\lambda_i$, $i = 1, \dots, S$, are determined so that in the conditions (2.7) the equalities are provided. For the losses (1.19), expression (2.9) takes the form

$$ \Gamma_j = \left\{ x: \sum_{i=1, i \ne j}^{S} p(H_i)\, p(x \mid H_i) < \lambda_j\, p(x \mid H_j) \right\}, $$

which completely coincides with Task 2 at loss function "0-1".
2.1.3. Restrictions on the conditional probabilities of acceptance of each true hypothesis (Task 2₁)

To minimize (1.16) at

$$ \int_{R^n - \Gamma_j} L_2\big(H_i, \delta_j(x)=0\big)\, p(x \mid H_i)\, dx \le r_2', \quad i, j = 1, \dots, S. \qquad (2.11) $$

Let us transform (2.11):

$$ \int_{\Gamma_j} L_2\big(H_i, \delta_j(x)=0\big)\, p(x \mid H_i)\, dx \ge \int_{R^n} L_2\big(H_i, \delta_j(x)=0\big)\, p(x \mid H_i)\, dx - r_2', \quad i, j = 1, \dots, S, \qquad (2.12) $$

which for $L_1(H_i, H_j)$ and $L_2(H_i, H_j)$ gives

$$ L_2(H_i, H_k) \int_{\Gamma_j} p(x \mid H_i)\, dx \ge L_2(H_i, H_k) - r_2', \quad \forall k: k \in (1, \dots, j-1, j+1, \dots, S). $$

For the losses (1.19), from (2.12) we have

$$ \int_{\Gamma_j} p(x \mid H_j)\, dx \ge 1 - r_2', $$

and when $r_2' = \alpha$, this is Task 2 at loss function "0-1".

Solution of the constrained optimization problem (1.16), (2.12), using the Lagrange method, gives

$$ \Gamma_j = \left\{ x: \sum_{i=1}^{S} p(H_i)\, L_1\big(H_i, \delta_j(x)=1\big)\, p(x \mid H_i) < \sum_{i=1}^{S} \lambda_{ij}\, L_2\big(H_i, \delta_j(x)=0\big)\, p(x \mid H_i) \right\}, \qquad (2.13) $$

which for $L_1(H_i, H_j)$ and $L_2(H_i, H_j)$ takes the form

$$ \Gamma_j = \left\{ x: \sum_{i=1}^{S} p(H_i)\, L_1(H_i, H_j)\, p(x \mid H_i) < \sum_{i=1}^{S} \lambda_{ij}\, L_2(H_i, H_k)\, p(x \mid H_i); \ \forall k: k = 1, \dots, j-1, j+1, \dots, S \right\}, \qquad (2.14) $$

where $\lambda_{ij}$ are determined so that in (2.11) the equalities are provided. For the losses (1.19), expression (2.13) takes the form

$$ \Gamma_j = \left\{ x: \sum_{i=1, i \ne j}^{S} p(H_i)\, p(x \mid H_i) < \lambda_j\, p(x \mid H_j) \right\}, $$

which completely coincides with Task 2 at loss function "0-1". This fact is evident from the comparison of the restrictions (2.7) and (2.11) too.
2.1.4. Restrictions on posterior probabilities of acceptance of true hypotheses (Task 3)

To minimize (1.16) at the restriction

$$ p(H_i) \sum_{j=1}^{S} \int_{R^n - \Gamma_j} L_2\big(H_i, \delta_j(x)=0\big)\, p(x \mid H_i)\, dx \le r_{3i}, \quad i = 1, \dots, S. \qquad (2.15) $$

When the loss functions do not change within the hypotheses acceptance and rejection regions, i.e., instead of $L_1(H_i, \delta_j(x)=1)$ and $L_2(H_i, \delta_j(x)=0)$ we have $L_1(H_i, H_j)$ and $L_2(H_i, H_j)$, expression (2.15) transforms into

$$ p(H_i) \sum_{j=1}^{S} L_2(H_i, H_k) \int_{\Gamma_j} p(x \mid H_i)\, dx \ge p(H_i) \sum_{j=1}^{S} L_2(H_i, H_k) - r_{3i}, \quad \forall k: k \in (1, \dots, j-1, j+1, \dots, S), \quad i = 1, \dots, S. \qquad (2.16) $$

For the losses (1.19), from (2.16) we have

$$ \int_{\Gamma_j} p(x \mid H_j)\, dx \ge 1 - r_3, $$

and when $r_3 = \alpha$, this is Task 2 at loss function "0-1". Using Lagrange's method for solving the constrained optimization problem (1.16), (2.15), we obtain

$$ \Gamma_j = \left\{ x: \sum_{i=1}^{S} p(H_i)\, L_1\big(H_i, \delta_j(x)=1\big)\, p(x \mid H_i) < \sum_{i=1}^{S} \lambda_i\, p(H_i)\, L_2\big(H_i, \delta_j(x)=0\big)\, p(x \mid H_i) \right\}, \qquad (2.17) $$

which for $L_1(H_i, H_j)$ and $L_2(H_i, H_j)$ transforms into the following:

$$ \Gamma_j = \left\{ x: \sum_{i=1}^{S} p(H_i)\, L_1(H_i, H_j)\, p(x \mid H_i) < \sum_{i=1}^{S} \lambda_i\, p(H_i)\, L_2(H_i, H_k)\, p(x \mid H_i); \ \forall k: k = 1, \dots, j-1, j+1, \dots, S \right\}. \qquad (2.18) $$

In the expressions (2.17) and (2.18), the Lagrange multipliers $\lambda_i$, $i = 1, \dots, S$, are determined so that in the conditions (2.15) the equalities are provided. For the losses (1.19), expression (2.18) takes the form

$$ \Gamma_j = \left\{ x: \sum_{i=1, i \ne j}^{S} p(H_i)\, p(x \mid H_i) < \lambda_j\, p(H_j)\, p(x \mid H_j) \right\}, $$

which completely coincides with Task 3 at loss function "0-1".
2.1.5. Restriction on the averaged probability of rejection of true hypotheses (Task 4)

In this case, we have to minimize

$$ \min_{\{\Gamma_j\}} \left\{ \sum_{i=1}^{S} p(H_i) \sum_{j=1}^{S} \int_{R^n - \Gamma_j} L_2\big(H_i, \delta_j(x)=0\big)\, p(x \mid H_i)\, dx \right\}, \qquad (2.19) $$

subject to

$$ \sum_{i=1}^{S} p(H_i) \sum_{j=1}^{S} \int_{\Gamma_j} L_1\big(H_i, \delta_j(x)=1\big)\, p(x \mid H_i)\, dx \le r_4. \qquad (2.20) $$

Let us rewrite (2.19):

$$ \min_{\{\Gamma_j\}} \left\{ \sum_{i=1}^{S} p(H_i) \sum_{j=1}^{S} \left[ \int_{R^n} L_2\big(H_i, \delta_j(x)=0\big)\, p(x \mid H_i)\, dx - \int_{\Gamma_j} L_2\big(H_i, \delta_j(x)=0\big)\, p(x \mid H_i)\, dx \right] \right\}. \qquad (2.21) $$

It is obvious that the minimum in (2.21) is achieved at maximization of the expression

$$ G_\delta = \max_{\{\Gamma_j\}} \left\{ \sum_{i=1}^{S} p(H_i) \sum_{j=1}^{S} \int_{\Gamma_j} L_2\big(H_i, \delta_j(x)=0\big)\, p(x \mid H_i)\, dx \right\}. \qquad (2.22) $$

Therefore, instead of the constrained optimization problem (2.19), (2.20), the following problem must be solved: maximize (2.22) at the restriction (2.20). Application of Lagrange's undetermined multipliers method gives the following hypotheses acceptance regions:

$$ \Gamma_j = \left\{ x: \sum_{i=1}^{S} p(H_i)\, L_1\big(H_i, \delta_j(x)=1\big)\, p(x \mid H_i) < \frac{1}{\lambda} \sum_{i=1}^{S} p(H_i)\, L_2\big(H_i, \delta_j(x)=0\big)\, p(x \mid H_i) \right\}, \quad j = 1, \dots, S, \qquad (2.23) $$

which for the losses $L_1(H_i, H_j)$ and $L_2(H_i, H_j)$ transforms as follows:

$$ \Gamma_j = \left\{ x: \sum_{i=1}^{S} p(H_i)\, L_1(H_i, H_j)\, p(x \mid H_i) < \frac{1}{\lambda} \sum_{i=1}^{S} p(H_i)\, L_2(H_i, H_k)\, p(x \mid H_i); \ \forall k: k = 1, \dots, j-1, j+1, \dots, S \right\}, \quad j = 1, \dots, S. \qquad (2.24) $$

For the losses (1.19), from (2.20) we have

$$ \sum_{i=1}^{S} p(H_i) \sum_{j=1, j \ne i}^{S} \int_{\Gamma_j} p(x \mid H_i)\, dx \le r_4, \qquad (2.25) $$

which at $r_4 = \alpha$ completely coincides with Task 4 at loss function "0-1". For the same stepwise losses, decision region (2.24) becomes

$$ \Gamma_j = \left\{ x: p(H_j)\, p(x \mid H_j) > \lambda \sum_{i=1, i \ne j}^{S} p(H_i)\, p(x \mid H_i) \right\}, \quad j = 1, \dots, S, \qquad (2.26) $$

which completely coincides with the solution of Task 4 at loss function "0-1".
2.1.6. Restrictions on the conditional probabilities of rejection of each true hypothesis (Task 5)

In this case, it is necessary to maximize (2.22) at the restrictions

$$ \int_{\Gamma_j} L_1\big(H_i, \delta_j(x)=1\big)\, p(x \mid H_i)\, dx \le r_5, \quad i, j = 1, \dots, S. \qquad (2.27) $$

Hypotheses acceptance regions obtained by solution of the problem (2.22), (2.27), are the following:

$$ \Gamma_j = \left\{ x: \sum_{i=1}^{S} p(H_i)\, L_2\big(H_i, \delta_j(x)=0\big)\, p(x \mid H_i) > \sum_{i=1}^{S} \lambda_{ij}\, L_1\big(H_i, \delta_j(x)=1\big)\, p(x \mid H_i) \right\}, \quad j = 1, \dots, S, \qquad (2.28) $$

which for the losses $L_1(H_i, H_j)$ and $L_2(H_i, H_j)$ takes the form

$$ \Gamma_j = \left\{ x: \sum_{i=1}^{S} p(H_i)\, L_2(H_i, H_k)\, p(x \mid H_i) > \sum_{i=1}^{S} \lambda_{ij}\, L_1(H_i, H_j)\, p(x \mid H_i); \ \forall k: k = 1, \dots, j-1, j+1, \dots, S \right\}, \quad j = 1, \dots, S. \qquad (2.29) $$

For the losses (1.19), restrictions (2.27) transform as follows:

$$ \int_{\Gamma_j} p(x \mid H_i)\, dx \le r_5, \quad i, j = 1, \dots, S, \ i \ne j, \qquad (2.30) $$

which at $r_5 = \alpha$ completely coincides with Task 5 at loss function "0-1". For the losses (1.19), decision region (2.28) becomes

$$ \Gamma_j = \left\{ x: p(H_j)\, p(x \mid H_j) > \sum_{i=1, i \ne j}^{S} \lambda_{ij}\, p(x \mid H_i) \right\}, \quad j = 1, \dots, S, \qquad (2.31) $$

which completely coincides with Task 5 at loss function "0-1".
2.1.7. Restrictions on a posteriori probabilities of rejection of each true hypothesis (Task 6)

In this case, it is necessary to maximize (2.22) at the restrictions

$$ p(H_i) \int_{\Gamma_j} L_1\big(H_i, \delta_j(x)=1\big)\, p(x \mid H_i)\, dx \le r_6, \quad i, j = 1, \dots, S. \qquad (2.32) $$

Application of Lagrange's method gives the hypotheses acceptance regions

$$ \Gamma_j = \left\{ x: \sum_{i=1}^{S} p(H_i)\, L_2\big(H_i, \delta_j(x)=0\big)\, p(x \mid H_i) > \sum_{i=1}^{S} \lambda_{ij}\, p(H_i)\, L_1\big(H_i, \delta_j(x)=1\big)\, p(x \mid H_i) \right\}, \quad j = 1, \dots, S, \qquad (2.33) $$

where $\lambda_{ij}$ are determined so that the equalities are provided in (2.32). The forms of regions (2.33), for the losses $L_1(H_i, H_j)$ and $L_2(H_i, H_j)$, are the following:

$$ \Gamma_j = \left\{ x: \sum_{i=1}^{S} p(H_i)\, L_2(H_i, H_k)\, p(x \mid H_i) > \sum_{i=1}^{S} \lambda_{ij}\, p(H_i)\, L_1(H_i, H_j)\, p(x \mid H_i); \ \forall k: k = 1, \dots, j-1, j+1, \dots, S \right\}, \quad j = 1, \dots, S. \qquad (2.34) $$

For the stepwise losses (1.19), expressions (2.32) take the form

$$ p(H_i) \int_{\Gamma_j} p(x \mid H_i)\, dx \le r_6, \quad i, j = 1, \dots, S, \ i \ne j, \qquad (2.35) $$

which at $r_6 = \alpha$ completely coincides with Task 6 at loss function "0-1". For the losses (1.19), decision region (2.33) becomes

$$ \Gamma_j = \left\{ x: p(H_j)\, p(x \mid H_j) > \sum_{i=1, i \ne j}^{S} \lambda_{ij}\, p(H_i)\, p(x \mid H_i) \right\}, \quad j = 1, \dots, S, \qquad (2.36) $$

which completely coincides with Task 6 at loss function "0-1".
2.1.8. Restrictions on probabilities of rejection of true hypothesis (Task 61) In this case, it is necessary to maximize (2.22) at restrictions S
¦ ³
j 1 *j
L1 ( Hi , G j ( x) 1) p( x | Hi )dx d r6c , i 1,..., S .
(2.37)
Lagrange’s method gives hypotheses acceptance regions of the form S
^ x : ¦i
*j
!
1
p ( H i ) L2 ( H i , G j ( x )
S O L ( Hi , G j ( x) i 1 i 1
¦
0) p ( x | H i ) !
1) p( x | H i )½¾ , j 1,..., S , ¿
(2.38)
where Oi are determined so that in (2.37) equalities are provided. For losses L1 ( Hi , H j ) and L2 ( Hi , H j ) , regions (2.38) become *j
x : ® ¯
¦
S i 1
p ( H i ) L2 ( H i , H k ) p ( x | H i ) !
¦
S O L ( Hi , H j ) p( x | i 1 i 1
k : k 1,..., j 1, j 1,..., S ` , j 1,..., S .
H i );
(2.39)
For losses (1.19) from (2.37) we have S
¦ Let us r6c
³
j 1, j z i * j
p( x | Hi )dx d r6c , i 1,..., S .
(2.40)
D , 0 d D d 1 . Then condition (2.40) means that the
probability that Hi hypothesis will not be accepted when it is true is less than D . For losses (1.19) decision region (2.38) becomes *j
x : p( H ) p( x | H ) ! ® j j ¯
S O p( x | i 1,i z j i
¦
Hi )½¾ , j 1,..., S , (2.41) ¿
where Oi are determined so that the equalities are provided in (2.37).
Constrained Bayesian Method for Testing Different Type of Hypotheses
35
2.1.9. Restrictions on posterior probability of rejected true hypotheses (Task 7) In this case, it is necessary to maximize (2.22) at restrictions
p ( Hi )
S
¦ ³
L ( Hi , G j ( x) j 1 *j 1
1) p( x | Hi )dx d r8 , i 1,..., S . (2.42)
By solution of the problem (2.22), (2.42), we obtain x : ® ¯
*j
!
¦
¦
S i 1
S O i 1 i
p ( H i ) L2 ( H i , G j ( x )
0) p ( x | H i ) !
p ( H i ) L1 ( H i , G j ( x) 1) p ( x | H i )½¾ , ¿ j
1,..., S ,
(2.43)
where Oi are determined so that in (2.42) equalities take place. For losses L1 ( Hi , H j ) and L2 ( Hi , H j ) , instead of (2.43), we have x : ® ¯
*j
!
¦
S i 1
S O i 1 i
¦
p ( H i ) L2 ( H i , H k ) p ( x | H i ) !
p ( H i ) L1 ( H i , H j ) p( x | H i );
k : k 1,..., j 1, j 1,..., S ` , j 1,..., S .
(2.44)
For losses (1.19) from (2.42) we have S
¦
i 1,i z j
which at r7
p( Hi )
³
*j
p( x | Hi )dx d r7 , i 1,..., S .
(2.45)
D completely coincides with Task 7 at loss function “0-1”.
For losses (1.19) decision region (2.43) becomes *j
x : p( H ) p( x | H ) ! ® j j ¯
S O p ( Hi ) p( x | i 1,i z j i
¦
Hi )½¾ , j 1,..., S , (2.46) ¿
where Oi are determined so that in (2.45) equalities are provided.
Chapter 2
36
2.2. Directional Hypotheses Testing Methods Statistical hypothesis testing is one of the basic problems of the mathematical statistics theory and practice. Many different types of hypotheses have been considered in the literature. However directional hypotheses are comparatively new in comparison to traditional hypotheses. For parametrical models, this problem can be stated as
H0 : T T 0 vs. H : T T 0 , or H : T ! T 0 .
(2.47)
where T is the parameter of the model, T 0 is known (see, for example, Bansal & Sheng, 2010). These alternatives are called skewed or directional alternatives. The consideration of directional hypotheses started in the 1950s. The earliest works considering this problem were by Lehmann (1950, 1957a, 1957b) and Bahadur (1952). Interest in this problem has not decreased since (see, for example, Kaiser, 1960; Leventhal and Huynh, 1996; Finner, 1999; Jones & Tukey, 2000 and Shaffer, 2002; Bansal & Sheng, 2010). For solving this problem, authors used traditional methods based on p-values, frequentist or Bayesian approaches and their modifications. A compact but exhaustive review of these works is given in Bansal & Sheng (2010), where Bayesian decision theoretical methodology for testing the directional hypotheses was developed and compared with the frequentist method. In the same work, the decision theoretic methodology was used for testing multiple directional hypotheses. The cases of multiple experiments for directional hypotheses were also considered in Bansal & Miescke (2013) and Bansal et al. (2016). The choice of a loss function related to the Kullback-Leibler divergence in a general Bayesian framework for testing the directional hypotheses is considered in Bansal et al. (2012).
Constrained Bayesian Method for Testing Different Type of Hypotheses
37
A new approach to the statistical hypotheses testing, called Constrained Bayesian Methods (CBM), was developed by Kachiashvili et al. (2012a, b; 2018a), Kachiashvili and Mueed (2013). As we have seen in Item 2.1, this method differs from the traditional Bayesian approach with a risk function split into two parts, reflecting risks for incorrect rejection and incorrect acceptance of hypotheses and stating the risk minimization problem as a constrained optimization problem when one of the risk components is restricted and the another one is minimized (Kachiashvili, 2011; Kachiashvili et al., 2012b). Application of this method to different types of hypotheses (two and many simple, composite and multiple hypotheses) with parallel and sequential experiments showed the advantage and uniqueness of the method in comparison with existing ones (Kachiashvili, 2014a, b; Kachiashvili, 2015; Kachiashvili, 2016; Kachiashvili, 2018a, b). The uniqueness of the method is the following. It gives the regions of impossibility of making a decision alongside of the regions of acceptance of testing hypotheses (like the sequential analysis method). This allows us to develop both parallel and sequential methods without any additional efforts. The advantage of the method is the optimality of made decisions with guaranteed reliability and minimality of necessary observations for given reliability (see, for example, Kachiashvili, 2014a, b; Kachiashvili, 2015; Kachiashvili, 2016; Kachiashvili, 2018a). CBM uses not only loss functions and a priori probabilities for making decisions as the classical Bayesian rule does, but also a significance level as the frequentist method does. The combination of these opportunities improves the quality of made decisions in CBM in comparison with other methods. Taking into account the fact that CBM gives better results than other known methods for testing the traditional hypotheses, it is expected that it will give similar better
Chapter 2
38
results for testing the directional hypotheses as it, in addition to the classical Bayesian method, uses significance levels in appropriate restrictions. For testing (2.47) hypotheses, the loss functions that do not depend on x are used in (Bansal & Sheng, 2010). Let us denote: *0 , * and * are the regions of acceptance of the appropriate hypotheses. And, for testing (2.47) hypotheses, let us use Task 1, for concreteness. In the considered case, decision-making regions (2.2) become: the hypothesis H acceptance region
*
^x : L1 ( H , H ) p( H | x) L1 ( H0 , H ) p( H0 | x) L1 ( H , H ) p( H | x) O > L2 ( H , H 0 ) p( H | x) L2 ( H0 , H 0 ) p( H0 | x)
(k {0)
L2 ( H , H 0 ) p( H | x)@ & & L1 ( H , H ) p( H | x) L1 ( H0 , H ) p( H0 | x) L1 ( H , H ) p( H | x) O > L2 ( H , H ) p( H | x) L2 ( H0 , H ) p( H0 | x)
(k { )
L2 ( H , H ) p( H | x)@ ; similarly, for
*0
(2.48)
*0 , we have
^x : L1 ( H , H0 ) p( H | x) L1 ( H0 , H 0 ) p( H0 | x) L1 ( H , H 0 ) p( H | x)
(k { )
O > L2 ( H , H ) p( H | x) L2 ( H0 , H ) p( H0 | x) L2 ( H , H ) p( H | x)@ &
& L1 ( H , H0 ) p( H | x) L1 ( H0 , H 0 ) p( H0 | x) L1 ( H , H 0 ) p( H | x) (k { )
O > L2 ( H , H ) p( H | x) L2 ( H0 , H ) p( H0 | x) L2 ( H , H ) p( H | x)@ ;
(2.49)
Constrained Bayesian Method for Testing Different Type of Hypotheses
and, for
39
* , we have *
^x : L1 ( H , H ) p( H | x) L1 ( H0 , H ) p( H0 | x) L1 ( H , H ) p( H | x) O > L2 ( H , H 0 ) p( H | x) L2 ( H0 , H 0 ) p( H0 | x)
(k { 0)
L2 ( H , H 0 ) p( H | x)@ & & L1 ( H , H ) p( H | x) L1 ( H0 , H ) p( H0 | x) L1 ( H , H ) p( H | x) O > L2 ( H , H ) p( H | x) L2 ( H0 , H ) p( H0 | x)
(k { )
L2 ( H , H ) p( H | x)@ .
(2.50)
The following “ 0 K ” loss function was used in Bansal & Sheng (2010)
L1 ( H , H )
L1 ( H0 , H0 )
L1 ( H , H )
L2 ( H , H )
L2 ( H0 , H0 ) L2 ( H , H ) 0 ,
L1 ( H , H0 ) L1 ( H , H0 ) K 0 , L2 ( H , H0 ) L2 ( H , H0 ) K 0 , L1 ( H0 , H ) L1 ( H0 , H ) L1 ( H , H ) L1 ( H , H ) K1 , L2 ( H , H ) L2 ( H , H ) L2 ( H0 , H ) L2 ( H0 , H ) K1 . (2.51) Inputting these losses into decision-making regions (2.48), (2.49) and (2.50), we have: for * ( k { 0 ) *
^x : K1 p( H0 | x) p( H | x) O K0 p( H | x) p( H | x) &
( k { ) & K1 p( H0 | x) p( H | x) O K1 p( H | x) p( H0 | x) ` , (2.52) i.e. (k {0)
*
p ( H0 | x) p( H | x) ®x : ¯ p ( H | x) p ( H | x)
1 p ( H | x) K 0 O& 1 p ( H 0 | x ) K1
Chapter 2
40
(k { )
&
p ( H0 | x) p( H | x) p( H0 | x) p( H | x)
1 p( H | x) O` ; 1 p( H | x)
(2.53)
similarly, for *0 , we have
^x : K 0 p( H | x) p( H | x) O K1 p( H0 | x) p( H | x) &
( k { ) *0
( k { ) & K 0 p( H | x) p( H | x) O K1 p( H | x) p( H0 | x) ` , (2.54) i.e. (k {)
*0
(k { )
&
p( H 0 | x) p ( H | x) ®x : ¯ p( H | x) p( H | x)
p( H | x) p( H0 | x) p ( H | x) p ( H | x)
1 p( H | x) K 0 1 ! & 1 p ( H 0 | x ) K1 O
1 p( H | x) K 0 1 ½ ! ¾, 1 p ( H 0 | x ) K1 O ¿
(2.55)
and, finally *0
° § 1 p ( H | x ) 1 p ( H | x ) · K 0 1 °½ ¸! , ¾; ® x : min ¨¨ ¸ °¯ © 1 p ( H 0 | x ) 1 p ( H 0 | x ) ¹ K1 O °¿
(2.56)
for * , we obtain
*
^x : L1 ( H , H ) p( H | x) L1 ( H0 , H ) p( H0 | x) L1 ( H , H ) p( H | x)
(k {0)
O > L2 ( H , H 0 ) p( H | x) L2 ( H0 , H 0 ) p( H0 | x) L2 ( H , H 0 ) p( H | x)@ & & L1 ( H , H ) p( H | x) L1 ( H0 , H ) p( H0 | x) L1 ( H , H ) p( H | x) (k {)
O > L2 ( H , H ) p( H | x) L2 ( H0 , H ) p( H0 | x) L2 ( H , H ) p( H | x)@ , i.e.
(2.57)
Constrained Bayesian Method for Testing Different Type of Hypotheses
p ( H 0 | x) p ( H | x) ®x : ¯ p( H | x) p ( H | x)
( k { 0 ) * (k {)
p ( H0 | x) p ( H | x) p ( H0 | x) p ( H | x)
&
41
1 p ( H | x) K 0 O& 1 p ( H 0 | x ) K1
1 p( H | x) O` . 1 p( H | x)
(2.58)
Analyzing regions (2.53), (2.56) and (2.58), we conclude that generally, for arbitrary O ! 0 , in contradistinction to the classical cases, the following conditions
*
take
* * 0
place:
*i
*
j
z ,
iz j,
i , j ( , 0, )
and
z R n , i.e., in general, hypotheses acceptance regions
intersect and the union of these regions does not coincide with the observation space. If more than one of conditions (2.53), (2.56) and (2.58) or none of these conditions are fulfilled, then it is impossible to make a simple decision. In the first case, more than one of the hypotheses are suspected to be true and, in the second case, it is impossible to make a single decision. In such cases, it is necessary to obtain one more observation and, on the basis of increased sample, to make a decision using conditions (2.53), (2.56) and (2.58) or to change
r1 in condition (1.17) upon fulfilling only
one of conditions (2.53), (2.56) and (2.58). When O
1 , decision rules
(2.53), (2.56) and (2.58) completely coincide with the Bayesian decision rule given in Bansal & Sheng (2010). In the case of loss functions (2.51), condition (1.17) takes the form K1 p ( H )
³
*
p ( x | H ) dx K 0 p ( H 0 )
K1 p ( H )
³
*
³
*0
p ( x | H 0 )dx
p ( x | H )dx t
t p( H ) K1 p( H0 ) K 0 p( H ) K1 r1 .
(2.59)
Hence it is clear that the following condition must always be satisfied
Chapter 2
42
p( H ) K1 p( H0 ) K 0 p( H ) K1 ! r1 . Let us choose
r1
r1 as follows
p( H ) K1 D p( H0 ) K 0 D 0 p( H ) K1 D ,
where 0 d D d 1 , 0 d D 0 d 1 and 0 d D d 1 . Then, in the right side of (2.59), we have
p( H ) K1 (1 D ) p( H0 ) K 0 (1 D 0 ) p( H ) K1 (1 D ) . (2.60) Let us consider the following losses L1 ( Hi , H j )
0 at i j , ° ® ° L1 ( Hi , H j ) at i z j , ¯
L2 ( Hi , H j )
0 at i z j , ° ® ° L2 ( Hi , H i ) at i ¯
j.
(2.61)
It is clear that the “ 0 1 ” loss function is a private case of the step-wise loss (3.61). For loss functions (2.61), Eq. (1.16) takes the form rG
ª min ® p ( H ) « L1 ( H , H 0 ) ^* ,*0 ,* `¯ ¬
+ L1 ( H , H ) ª + p ( H 0 ) « L1 ( H0 , H ) ¬ ª + p ( H ) « L1 ( H , H ) ¬
³
*
³
*0
p ( x | H ) dx
º p ( x | H )dx » + ¼
³
p ( x | H0 )dx L1 ( H 0 , H )
³
º p ( x | H 0 )dx » + ¼
³
p ( x | H ) dx L1 ( H , H 0 )
³
º½ p ( x | H ) dx » ¾ , ¼¿
*
*
*
*0
(2.62) and condition (1.17) transforms in the following form
Constrained Bayesian Method for Testing Different Type of Hypotheses
p ( H ) L2 ( H , H )
³
*
p ( x | H ) dx p ( H 0 ) L2 ( H 0 , H 0 )
p ( H ) L2 ( H , H )
³
*
³
*0
43
p ( x | H0 ) dx
p ( x | H ) dx t
t p( H ) L2 ( H , H ) p( H0 ) L2 ( H0 , H0 ) p( H ) L2 ( H , H ) r1 . (2.63) Stated problem (2.62), (2.63) can be written as
rG
min
^L1 ( H , H 0 ) P( x *0 ) P( H | x *0 )
^* ,*0 ,* `
+ L1 ( H , H ) P( x * ) P( H | x * ) + + L1 ( H0 , H ) P( x * ) P( H0 | x * ) + L1 ( H0 , H ) P( x * ) P( H0 | x * ) + + L1 ( H , H ) P( x * ) P( H | x * ) +
L1 ( H , H 0 ) P( x *0 ) P( H | x *0 )` ,
(2.64)
at L2 ( H , H ) p( x * ) p( H | x * )
+ L2 ( H0 , H0 ) p( x *0 ) p( H0 | x *0 )
L2 ( H , H ) p( x H ) p( H | x * ) t t p( H ) L2 ( H , H ) p( H0 ) L2 ( H0 , H0 ) + p( H ) L2 ( H , H ) r1 .
(2.65)
At the “ 0 1 ” loss function, (2.64) and (2.65) take the form
rG
min
^P( x *0 ) P( H | x *0 ) P( x * ) P( H | x * ) +
^* ,*0 ,* `
+ P( x * ) P( H0 | x * ) P( x * ) P( H0 | x * ) + + P( x * ) P( H | x * ) P( x *0 ) P( H | x *0 )` , at
(2.66)
Chapter 2
44
p( x * ) p( H | x * ) p( x *0 ) p( H0 | x *0 )
p( x H ) p( H | x * ) t 1 r1 .
(2.67)
Let us rewrite (2.66) and (2.67) in the following forms
rG
^P( H ) >P( x *0 | H ) P( x * | H )@ +
min
^* ,*0 ,* `
+ P( H0 ) >P( x * | H0 ) P( x * | H0 )@ + + P( H ) > P( x * | H ) P( x *0 | H )@` ,
(2.68)
and p( x * | H ) p( H ) p( x *0 | H0 ) p( H0 )
p( x H | H ) p( H ) t 1 r1 .
(2.69)
The results of (2.68) and (2.69) can be stated in terms of positive false discovery rates ( pFDR ) for testing multiple hypotheses (Storey, 2003). Let us call false discovery rates of the appropriate hypotheses the following
pFDR (* ) P( x *0 | H ) P( x * | H ) , pFDR (* ) P( x * | H ) P( x *0 | H ) ,
(2.70)
pFDR0 (*0 ) P( x * | H 0 ) P( x * | H 0 ) , and true discovery rates of the appropriate hypotheses the following p ( x * | H ) , TDR0 (*0 )
TDR ( * )
TDR ( * )
p( x *0 | H0 ) ,
p ( x * | H ) .
(2.71)
Then equations (2.68) and (2.69) will be written as follows
rG
min
^P( H ) pFDR (* ) P( H 0 ) pFDR0 (*0 )
^* ,*0 ,* `
P( H ) pFDR (* )` ,
(2.72)
at p(H )TDR (* ) p(H 0 )TDR0 (*0 ) p(H )TDR (* ) t 1 r1 .
(2.73)
Constrained Bayesian Method for Testing Different Type of Hypotheses
45
For comparing the decision rules, let us consider a hierarchical structure of the prior on T similarly to Bansal & Sheng (2010). Let us introduce the first stage prior
p
p( H ) ,
p0
p( H0 ) ,
p
p( H ) , with
p p0 p 1 , and the second stage prior on T as S (T ) S (T | H ) ,
S 0 (T ) S (T | H 0 ) and S (T ) S (T | H ) , where, S 0 (T ) I (T T 0 ) and S () and S () are the densities with supports in (f ,0 ) and (0, f ) , respectively. Then the prior on T can be written as
S (T )
p S (T ) I (T 0) p0 I (T
T0 ) p S (T ) I (T ! 0) . (2.74)
For a fixed prior S , the decision rule can be compared by comparing the points in the space
S (S )
^pFDR (* ), pFDR (* ), pFDR (* ) : G D `,
0
*
0
where D* is the class of randomized decision rules. Let us consider a subclass of decision rules D D* such that pFDR0 (*0 ) is constant for all
G D . p
Let
us
consider
^ p( H ), p( H0 ), p( H )`
and pc
two
different
sets
^ pc( H ), p( H0 ), pc( H )`
that the following relations take place
of
priors:
and suppose
p ( H ) ! p c( H )
and thus
p ( H ) p c( H ) . Then the following fact can be proved.
c Theorem 2.1. If G CBM and G CBM denote CBM rules within the class D under the priors p and p c , respectively, then c
c pFDRG CBM (* ) d pFDRG CBM (* ) , TDRG CBM (* ) t TDRG CBM (* )
and c
c pFDRG CBM (* ) t pFDRG CBM (* ) , TDRG CBM (* ) d TDRG CBM (* ) .
Chapter 2
46
The proof of this theorem is similar to the proof of theorem 1 of Bansal & Sheng (2010), therefore its shortened version for only the false discovery rate, adapted to the considered case, is given below. The validity of this theorem is clearly demonstrated by the computation results shown in Figure 3.7 and Figure 3.9 b. Proof. Bayes Risk of a decision rule G under the prior (2.74) is given by p pFDRG CBM p0 pFDR0G CBM p pFDRG CBM .
rG
c are the constrained Bayesian rules under S Thus, since G CBM and G CBM and S c , respectively, p pFDRG CBM p0 pFDR0G CBM p pFDRG CBM d c
d p pFDRG CBM p0 pFDR0G CBM p pFDRG CBM , c
c
c
pc pFDRG CBM p0 pFDR0G CBM pc pFDRG CBM d c
c
p c pFDRGCBM p0 pFDR0G CBM pc pFDRG CBM . Now, since pFDR0 is constant within the class D , and since G CBM and
c G CBM belong to the class D , we have c
c
p pFDRG CBM p pFDRG CBM d p pFDRG CBM p pFDRG CBM , c
c
pc pFDRG CBM pc pFDRG CBM d pc pFDRGCBM pc pFDRG CBM , which implies that
pc pFDR
pc pFDR pFDR t 0 . (2.75) pFDR by x and pFDR if we will denote pFDR by y and will consider system of equations pFDR c
c
p pFDRG CBM pFDRG CBM p pFDRG CBM pFDRG CBM d 0 , c G CBM
G CBM
G CBM
Now,
pFDR
G CBM
c G CBM
c G CBM
G CBM
c G CBM
Constrained Bayesian Method for Testing Different Type of Hypotheses
47
(2.75) relatively x and y , we will easily be convinced in the validity of the theorem.
c Corollary 2.1. If G CBM and G CBM denote CBM rules under the priors p ( x *0 | H ) and and p c , respectively, then PG CBM ( x *0 | H ) t PG CBM c PG CBM ( x *0 | H ) d PG CBM ( x *0 | H ) , where c PG CBM ( x *0 | H )
³
R n *0 ( G CBM )
PG CBM ( x *0 | H )
³
p ( x | H ) dx and
R n *0 ( G CBM )
p ( x | H ) dx .
The proof of the corollary directly follows from the proof of Theorem 2.1 (see Figure 3.7). When testing the directional hypotheses, some authors (see, for example Shaffer, 2002 and Jones & Tukey, 2000) offer to use the Type III error rate which is defined as Type III error rate
P( x * | H 0 ) P ( x * | H 0 ) .
(2.76)
A somewhat different definition of this term is offered in Kaiser (1960). In particular, the type III error involves inferring incorrectly the direction of the effect. For example, when the population value of the tested parameter is actually more than the null value, getting a sample value that is so much below the null value that you reject the null and conclude that the population value is also below the null value. In the considered case this means: Type III error rate
P ( x * | H ) P ( x * | H ) .
(2.77)
Chapter 2
48
T Let us denote Type III error rate in (2.76) as ERR III and Type III error K rate in (2.77) as ERR III .
T K Theorem 2.2. For the considered directional hypotheses, ERRIII ! ERRIII
always takes place and, when
min
^i , j,0, ;i z j`
div H i , H j o f , both error
rates tend to zero. Proof. If we recall the character of considered directional hypotheses and the fact that the increasing divergence among hypotheses entails a decrease in the probabilities of errors of the first and the second types at hypotheses testing, and, in the limit, when
min
^i , j,0, ;i z j`
div H i , H j o f , there takes
place Į ĺ 0 and ȕ ĺ 0 (Kullback, 1978), we will be convinced in the validity of the theorem. From the comparison of expressions (2.70) with (2.76) and (2.77), it is T K and pFDR (* ) pFDR (* ) ! ERRIII . seen that pFDR0 (*0 ) ERRIII T K ERRIII The ratio between pFDR (* ) pFDR (* ) and ERRIII can be
arbitrary in general. From the above given, it is clear that CBM is a data-dependent test (see Eqs. (2.66) and (2.67)) similarly to the Fisher’s p –value test, in addition to the fact that it also computes Type I and Type II error probabilities like the Neyman-Pearson’s approach (see Eqs. (2.62) and (2.63)), and uses a posteriori probabilities like the Bayes test (see Eqs. (2.64) and (2.65)).
Constrained Bayesian Method for Testing Different Type of Hypotheses
49
2.3. CBM for Testing Directional Hypotheses Restricted False Discovery Rates The traditional formulation of testing simple basic hypotheses versus composite alternatives is a well studied problem in many scientific works (Marden, 2000; Anderson, 1982; Wijsman, 1967; Berger and Pericchi, 1996; Kass and Wasserman, 1996; Gomez-Villegas et al., 2009; Duchesne and Francq, 2014; Bedbur et al. 2013). The problem of making the sense about direction of difference between parameter values, defined by basic and alternative hypotheses, is important in many applications (Kaiser, 1960; Leventhal and Huynh, 1996; Leventhal, 1999; Finner, 1999; Jones and Tukey, 2000; Shaffer, 2002; Bansal and Sheng, 2010; Bansal and Miescke, 2013; Bansal et al., 2016). Here the decision whether the parameter outstrips or falls behind of the value defined by basic hypothesis is meaningful. The alternatives in (2.47) are called skewed or directional alternatives. The consideration of directional hypotheses found their applications in different realms. Among them are biology, medicine, technique and so on (Bansal et al, 2016; Efron, 2007). The appropriate tests “has just begun to stir up some interests in the educational and behavioral literature” (Huynh, 2014) (See, for example, Hand et al., 1985; Harris, 1997; Hopkins, 1973). Directional false discovery rate (DFDR) or mixed directional false discovery rate (mdFDR) are used when alternatives are skewed (Bansal et al., 2016). The optimal procedures controlling DFDR (or mdFDR) use twotailed procedures assuming that directional alternatives are symmetrically distributed. Therefore, decision rule is symmetric in relation with the parameter’s value defined by basic hypothesis (Shaffer, 2002; Benjamini and Yekutieli, 2005). For the experiments where the distribution of the alternative hypotheses is skewed, the asymmetric decision rule, based on skew normal priors and used Bayesian methodology for testing when
50
Chapter 2
minimizing mdFDR, is offered in (Bansal et al., 2016). Theoretically, it has been proved “that a skewed prior permits a higher power in number of correct discoveries than if the prior is symmetric”. This result is confirmed by simulation study comparing the proposed rule with a frequentist’s rule and the rule offered in (Benjamini and Yekutieli, 2005). Because CBM allows us to foresee the skewness by not only a prior probabilities but also by restriction levels in the constraints, it is expected that it will give more powerful decision rule in number of correct discoveries than existed symmetric or asymmetric in the prior decision rules. Therefore, different statements of CBM (see Item 2.1), for testing skewed hypotheses with restricted mdFDR, are considered below (Kachiashvili, 2011; Kachiashvili, 2018a; Kachiashvili et al., 2012a, b). They differ from each other by the kind of restrictions put on the Type I or Type II errors made at testing. Consider skewed hypotheses, let us examine some kind of CBM, from all possible statements (see Item 2.1), for testing directional hypotheses (2.47). Remark 2.1. numbering of the tasks, described below, is preserved from
Item 2.1 that coincide with (Kachiashvili, 2018a).
2.3.1. Restrictions on the averaged probability of acceptance of true hypotheses for testing directional hypotheses (Task 1) For directional hypotheses (2.47) and loss functions 0 at i j , ° and L1( H i , G j ( x) 1) ® ° K1 at i z j; ¯ K 0 at i j , ° L2 ( H i , G j ( x) 0) ® °0 at i z j; ¯
(2.78)
Constrained Bayesian Method for Testing Different Type of Hypotheses
51
using concepts of posterior probabilities, the problem (1.16), (1.17) transforms in the following form (Kachiashvili et al., 2018a)
rG
min
^p( H ) K1>P( x *0 | H ) P( x * | H )@
^* ,*0 ,* `
p( H0 ) K1 >P( x * | H0 ) P( x * | H0 )@ p( H ) K1>P( x * | H ) P( x *0 | H )@` ,
(2.79)
subject to
p( H ) P( x * | H ) p( H0 ) P( x *0 | H 0 ) p ( H ) P( x * | H ) t 1
r1 . K0
(2.80)
The solution of the problem (2.79) and (2.80) by Lagrange undetermined multiplier method gives
*
^x : K1 p( H0 | x) p( H | x) O1 K0 p( H | x)` ,
*0
^x : K1 p( H | x) p( H | x) O1 K0 p( H0 | x)`,
*
^x : K1 p( H | x) p( H0 | x) O1 K 0 p( H | x)` ,
(2.81)
where Lagrange multiplier O1 is determined so that in condition (2.80) the equality takes place. Remark 2.2. for the statement (2.79), (2.80) as well as for other statements
(see below Tasks 2, 4, 5 and 7 for directional hypotheses), depending upon the choice of
x , there is a possibility that G j ( x) 1 for more than one j or
G j ( x) 0 for all j ( ,0, ) . Let us introduce denotations
Chapter 2
52
rG
K1 >P( x *0 | H ) P( x * | H )@ ,
rG0
K1 >P( x * | H0 ) P( x * | H0 )@ ,
rG
K1 >P( x * | H ) P( x *0 | H )@ ,
(2.82)
and let us call them individual average risks (Bansal et al., 2016). Then for the average risk (2.79) we have rG
min
^p( H ) rG p( H ) rG p( H
^ H , H0 , H `
0
0
) rG
`.
(2.83)
When testing directional hypotheses, it is possible to make a false statement about choice among alternative hypotheses, i.e. to make a directional error, or a type III error (Benjamini and Yekutieli, 2005). For recognition of directional errors in terms of a false discovery rate (FDR), two variants of false discovery rate (FDR) were introduced in (Benjamini et al., 1993):
pure directional false discovery rate (pdFDR) and mixed
directional false discovery rate (mdFDR), which are the following
pdFDR P( x * | H ) P( x * | H ) ,
(2.84)
and
mdFDR P( x * | H ) P( x * | H 0 ) P( x * | H ) P( x * | H 0 ) .
(2.85)
It is obvious that pdFDR d mdFDR . It is shown that the FDR is an effective model selection criterion, as it can be translated into a penalty function (Benjamini and Yekutieli, 2005). Therefore, FDR gives the opportunity to increase the power of the test in general case (Benjamini and Hochberg, 1995). Both variants of FDR for directional hypotheses: pdFDR and mdFDR can be expressed by Type III error rates (ERRIII): K pdFDR ERRIII and mdFDR SERRIII ,
(2.86)
Constrained Bayesian Method for Testing Different Type of Hypotheses
53
where T ERRIII K ERRIII
P( x * | H 0 ) P( x * | H 0 ) ,
(2.87)
P( x * | H ) P( x * | H ) ,
(2.88)
and T K ERRIII ERRIII .
SERRIII
(2.89)
T K Here ERRIII and ERRIII are two different forms of Type III error rates,
considered by different authors (Mosteller, 1948; Kaiser, 1960; Jones and Tukey, 2000 and Shaffer, 2002) and SERRIII is the summary type III error rate ( SERRIII ) (Kachiashvili et al., 2018). Hereinafter, if necessary, let us ascribe the number of the task related to the considered CBM directly to this abbreviation. Theorem 2.3. CBM 1 with restriction level of (2.8), at satisfying a condition
r 1 1 Pmin K 0
q,
where
0 q 1 ,
Pmin
min^p( H ), p( H 0 ), p( H )` ,
ensures a decision rule with mdFDR (i.e. with SERRIII ) less or equal to
q
; that is, with the condition mdFDR SERRIII d q . Proof. Because of the peculiarity of the decision making rule of CBM, alongside of hypotheses acceptance regions, there exists the regions of impossibility of making a decision (Kachiashvili et al., 2012b; Kachiashvili and Mueed, 2013). Therefore, instead of condition
P( x * | H i ) P( x *0 | H i ) P( x * | H i ) 1 , i < , < { ^,0,` , of the classical decision making procedures, the following condition is fulfilled in CBM
Chapter 2
54
P( x * | H i ) P( x *0 | H i ) P( x * | H i ) P(imd | H i ) 1 , i < , < { ^,0,` ,
(2.90)
where imd is the abbreviation of the impossibility of making a decision. Taking into account (2.90), condition (2.80) can be rewritten as follows
p( H ) P( x * | H ) p( H0 ) P( x *0 | H 0 ) p( H ) P( x * | H ) p( H )>1 P( x * | H ) P( x *0 | H ) P(imd | H )@ p( H0 )>1 P( x * | H 0 ) P( x * | H 0 ) P(imd | H 0 )@ p( H )>1 P( x * | H ) P( x *0 | H ) P(imd | H )@ t 1
r1 . K0
From here follows that
p( H )>P( x * | H ) P( x *0 | H )@ p( H0 )>P( x * | H 0 ) P( x * | H 0 )@ p ( H )>P( x * | H ) P( x *0 | H )@ d Let us denote Pmin
r1 . K0
(2.91)
min^p( H ), p( H 0 ), p( H )` . Then from (2.91) we
have
P( x * | H ) P( x *0 | H ) P( x * | H 0 ) P( x * | H 0 ) P( x * | H ) P( x *0 | H ) d
r 1 1 . Pmin K 0
Taking into account (2.85), we write mdFDR P( x *0 | H ) P( x *0 | H ) d This proves the theorem.
r 1 1 . Pmin K 0
(2.92)
Constrained Bayesian Method for Testing Different Type of Hypotheses
55
Let us call the false acceptance rate (FAR) the following
FAR P( x *0 | H ) P( x *0 | H ) ,
2.93)
then from (2.92), we have mdFDR FAR d
r 1 1 . Pmin K 0
(2.94)
2.3.2. Restrictions on the conditional probabilities of acceptance of each true hypothesis for testing directional hypotheses (Task 2) To minimize (2.79) subject to
P( x *i | H i ) t 1
r2i , i < , < { ^,0,` . K 0 p( H i )
(2.95)
The solution of the problem (2.79), (2.95) is the following
* *0 *
^x : K p( H | x) p( H | x) O K p( H | x)`, ^x : K p( H | x) p( H | x) O K p( H | x)`, ^x : K p( H | x) p( H | x) O K p( H | x)`, 1
0
2
0
1
0 2
0
0
1
0
2
0
(2.96)
where the Lagrange multipliers O2 , O02 and O2 are determined so that in conditions (2.95) the equalities hold. Theorem 2.4. CBM 2 with restriction level of (2.95), at satisfying a
condition
r2 r20 r2 K 0 p( H ) K 0 p( H 0 ) K 0 p( H )
q , where 0 q 1 ,
ensures a decision rule with mdFDR (i.e. with SERRIII ) less or equal to q ; that is, with the condition mdFDR SERRIII d q . Proof. Taking into account (2.85), (2.90), condition
Chapter 2
56
mdFDR SERRIII d q
(2.97)
can be rewritten as follows
mdFDR
^1 P( x *0 | H 0 ) P(imd | H 0 )`
^1 P( x * | H ) P( x *0 | H ) P(imd | H )` ^1 P( x * | H ) P( x *0 | H ) P(imd | H )` d q . (2.98) On the basis of (2.95), condition (2.98) takes the form
mdFDR d
r20 r2 P(imd | H 0 ) K 0 p( H 0 ) K 0 p( H )
P( x *0 | H ) P(imd | H )
r2 P( x *0 | H ) P(imd | H ) d q . K 0 p( H )
(2.99)
It is obvious that if the following condition is fulfilled r2 r20 r2 K 0 p( H ) K 0 p( H 0 ) K 0 p( H )
q , (2.100)
then condition (2.99), and accordingly the condition of the theorem, is fulfilled. Using denotation (2.93), at fulfillment of (2.100) the following inequality can be written
mdFDR FAR d q P(imd | H 0 ) P(imd | H ) P(imd | H ) . (2.101)
2.3.3. Restrictions on the averaged probability of rejection of true hypotheses for testing directional hypotheses (Task 4) For directional hypotheses (2.47) Task 4, i.e., problem (2.20), (2.22) takes the form:
Constrained Bayesian Method for Testing Different Type of Hypotheses
GG
max
57
^K 0 > p( H ) P( x * | H ) p( H0 ) P( x *0 | H0 )
^* ,*0 ,* `
p( H ) P( x * | H )@` ,
(2.102)
subject to
p( H ) K1 >P( x *0 | H ) P( x * | H )@ p( H0 ) K1 >P( x * | H 0 ) P( x * | H 0 )@ p( H ) K1 >P( x * | H ) P( x *0 | H )@ d r4 . (2.103) Solution of the problem (2.102), (2.103) by the method of Lagrange undetermined multipliers gives *
½ 1 K 0 p ( H | x )¾ , ® x : K1 p ( H0 | x ) p ( H | x ) O4 ¯ ¿
*0
½ 1 K 0 p ( H0 | x )¾ , ® x : K1 p ( H | x ) p ( H | x ) O 4 ¿ ¯
*
½ 1 K 0 p ( H | x )¾ , (2.104) ® x : K1 p ( H | x ) p ( H0 | x ) O4 ¯ ¿
where Lagrange multiplier O4 is determined so that in condition (2.104) the equality takes place. Theorem 2.5. CBM 4 with restriction level of (2.103), at satisfying
condition
Pmin
r 1 4 Pmin K1
q,
where
0 q 1,
min^p( H ), p( H 0 ), p( H )` , ensures a decision rule with mdFDR
(i.e. with SERRIII ) less or equal to q ; that is, with the condition
mdFDR SERRIII d q .
Chapter 2
58
Proof. Let us rewrite (2.103) as follows
P( x *0 | H ) P( x * | H ) P( x * | H 0 ) P( x * | H 0 ) P( x * | H ) P( x *0 | H ) d
1 r 4 . Pmin K1
(2.105)
Taking into account (2.85), condition (2.105) can be rewritten as follows mdFDR P( x *0 | H ) P( x *0 | H ) d
1 r 4 . Pmin K1
(2.106)
Because of P( x *0 | H ) P( x *0 | H ) t 0 , the correctness of the stated theorem is obvious. Recalling (2.93), condition (2.106) can be rewritten as follows mdFDR FAR d
r 1 4 . Pmin K1
(2.107)
2.3.4. Restrictions on the conditional probabilities of rejection of each true hypothesis for testing directional hypotheses (Task 5) In this case, it is necessary to maximize (2.102) subject to P( x *0 | H ) d
r50, r , , P( x * | H ) d 5 , K1 K1
P( x * | H 0 ) d
r5,0 r ,0 , P( x * | H 0 ) d 5 , K1 K1
P( x *0 | H ) d
r50, r , , P( x * | H ) d 5 . K1 K1
Application of the Lagrange method gives
(2.108)
Constrained Bayesian Method for Testing Different Type of Hypotheses
* *0 *
^x : K O ^x : K O ^x : K O
p( x | H ) K p( x | H ) K
59
` p( H ) p( x | H )` , p( H ) p( x | H )` ,
1
,0 5
p( x | H0 ) O5, p( x | H ) K0 p( H ) p( x | H ) ,
1
0, 5
p( x | H ) O05,
1
, 5
p( x | H ) O5,0
0
0
0
0
0
(2.109) where Lagrange multiplier vectors O5
(O5,0 , O5, ) , O05
(O05, , O05, ) and
O5 (O5, , O5,0 ) are determined so that in (2.108) the equality takes place. Theorem 2.6. CBM 5 with restriction level of (2.108), at satisfying condition
r5, r5,0 r5, r5,0 K1
r5, r5,0 r5, r5,0
q K1 ), where 0 q 1 , ensures a decision rule
q
(i.e.
with mdFDR (i.e. with SERRIII ) less or equal to q ; that is, with the condition mdFDR
SERRIII d q .
Proof. It is not difficult to be convinced in the correctness of the theorem, putting restrictions from (2.108) into expression (2.85) that gives
mdFDR d
r5, r5,0 r5, r5,0 K1
q.
(2.110)
On the other hand, taking into account equations (2.84) and (2.93) and restrictions (2.108), for pure directional false discovery rate (pdFDR) and for false acceptance rate (FAR) we have
pdFDR d and
r5, r5, , K1
(2.111)
Chapter 2
60
FAR d
r50, r50, , K1
(2.112)
respectively. In the condition of Theorem 2.6, there takes place pdFDR mdFDR d q and,
if
restriction
levels
in
r5, r5,0 r5, r5,0 r50, r50,
(2.108)
satisfy
the
condition
q K1 , then the following inequality
mdFDR FAR d q is correct.
To compare conditions (2.94), (2.101), (2.107) and (2.110), we conclude that in the conditions of the stated theorems the less strict is Task 5 by restriction level of the mdFDR as in this case we have mdFDR d q , then the less strict are Task 1 and Task 4 as in these cases we have mdFDR d q FAR . Task 2 is the most strict as in this case we have
mdFDR d q FAR P(imd | H ) P(imd | H 0 ) P(imd | H ) . The consideration of the statements of different Tasks and hypotheses acceptance regions allow us to see that the directionality can be foreseen by a priori probabilities in Tasks 1 and 4 while it can be foreseen not only by a priori probabilities but by restriction levels too in Tasks 2 and 5. Therefore, consideration of Tasks 2 and 5 for testing directional hypotheses is preferable than of Tasks 1 and 4. Unfortunately, to make a decision at given restriction level of mdFDR is possible not for every observation result x , i.e., for given q there exist observation results for which making a simple decision is impossible. In such cases there are two possible ways: to change q or collect extra observations and decision make on the basis of the total information contained in these observations. Decision making rules with necessary restriction levels in the abovementioned tasks that guaranty the fulfillment
Constrained Bayesian Method for Testing Different Type of Hypotheses
61
of the condition mdFDR d q can be described by the following procedure, sequential in principle by its nature (Kachiashvili, 2014b; Kachiashvili et al., 2012a). Let us denote xi ( i 1,2,... ) as the i th observation result. Then the sequential test is as follows: Procedure A Step 1. If test statistics x1 *i and x1 * j , where j < / i , accept hypothesis H i , i < , where < { ^,0,` is a set of indices. If x1 *i , i< ,
(x * ) ,
( x1 *i )
or
1
j
iz j,
i, j < ,
( x * )( x * ) , then continue sampling; collect
( x1 * )
1
compute x2
( x1 x2 ) / 2 .
0
1
or
x 2 and
Step 2. If test statistics x2 *i and x2 * j , where j < / i , accept hypothesis H i , i < , where < { ^,0,` is a set of indices. If x2 *i , i< ,
(x
( x2 *i )
or
(x
(x
( x2 * )
2
*0 )
compute x2
( x1 x2 x3 ) / 3 ;
2
2
*j ) ,
iz j,
i, j < ,
or
* ) , then continue sampling; collect x3 and
etc. sampling continues until the arithmetic mean of observation results does not belong to only one region of acceptance of tested hypotheses. Lagrange multipliers that are determined for n 1 are used in hypotheses acceptance regions for making decision at increasing sample size.
62
Chapter 2
The following assertion is correct for above given and, in general, for all possible statements of CBM at directionality of hypotheses. Theorem 2.7. Let us assume that the probability distributions p( x | H i ) ,
i < , where < { ^,0,` , are such that, at increasing number of observations n in the sample, the entropy concerning distribution parameter T , in relation to which the hypotheses are formulated, decreases*). In such cases, for given set of hypotheses H 0 , H and H , there always exists such a sample of size n , on the basis of which, decision concerning tested hypotheses can be made with given reliability, when in decision making regions, Lagrange multipliers are determined for n 1 and the condition mdFDR d q is satisfied.
Note 2.1. *) hereinafter it is assumed that this supposition is fulfilled. Proof. Let us denote the necessary reliability of making a decision by D (
0 D 1 ). By making a decision with given reliability, we mean the fulfillment of the following inequality P( x *i | H i ) t D , when H i , i < , < { ^,0,` , is true and x is a statistics on the basis of which decision is made. When sample size increases and we make decision on the basis of
xn (index n indicates that arithmetic mean xn is computed on the basis of the sample with n observations), a posteriori distribution p( H i | xn ) , when
H i , i < , is true, increases and other two probabilities: p( H j | xn ) and p( H k | xn ) , j z k , j , k < / i , decrease. In decision making regions (2.81), (2.96), (2.104) and (2.109), fraction, in the denominator of which
Constrained Bayesian Method for Testing Different Type of Hypotheses
63
p( H i | xn ) attends, when H i is true, must be less than some constant that does not depend on n . Therefore, the bigger p( H i | xn ) is, the bigger the hypothesis H i acceptance region will be and, accordingly, the bigger the probability P( xn *i | H i ) will be, proving the first statement of the theorem. For that reason, mentioned above, in the considered case, probabilities P( xn * j | H i ) , j z i , i, j < , < { ^,0,` , are decreased, consequently mdFDR does not increase (see (2.85)) and inequality mdFDR d q is fulfilled.
2.3.5. Restrictions on posterior probabilities of rejected true hypotheses for testing directional hypotheses (Task 7) In the considered case, it is necessary to maximize (2.102) subject to
K1 > P( H 0 ) P( x * | H 0 ) P( H ) P( x * | H )@ d r7 , K1 >P( H ) P( x *0 | H ) P( H ) P( x *0 | H )@ d r70 , K1 > P( H ) P( x * | H ) P( H 0 ) P( x * | H 0 )@ d r7 . (2.113) Here for directional hypotheses (2.47), the losses (2.78) are used. Using an undetermined Lagrange multiplier method and the concepts of posterior probabilities, for solving the problem (2.102), (2.113), we have *
° ½° 1 ® x : K1 p ( H0 | x ) p ( H | x ) K 0 p ( H | x )¾ , °¯ °¿ O7
*0
° ½° 1 ® x : K1 p ( H | x ) p ( H | x ) 0 K 0 p ( H0 | x )¾ , °¯ °¿ O7
*
½° ° 1 ® x : K1 p ( H | x ) p ( H0 | x ) K 0 p ( H | x )¾ , (2.114) °¿ °¯ O7
Chapter 2
64
where Lagrange multipliers O7 , O07 and O7 are determined, so that in conditions (2.113), the equalities take place. Theorem 2.8. CBM 7 with restriction level of (2.113), at satisfying a
condition
Pmin
r7 r7 K1 Pmin
q (i.e. r7 r7
q K1 Pmin ),
where 0 q 1 ,
min^p( H ), p( H 0 ), p( H )` , ensures a decision rule with mdFDR
(i.e. with SERRIII ) less or equal to q ; that is, with the condition
mdFDR SERRIII d q . Proof. Taking into account denotation Pmin
min^p( H ), p( H 0 ), p( H )`
in restrictions (2.113) and considering the definition of mdFDR (2.85), we have mdFDR d
r7 r7 K1 Pmin
q,
(2.115)
that was necessary to prove. It is obvious that when O07 is chosen so that in the second condition of (2.113) the equality takes place, the following is fulfilled (see formula (2.93))
FAR d c where Pmin
1 r70 , c Pmin K1
(2.116)
min^P( H ), P ( H )`.
On the basis of Theorem 2.8 and of condition (2.116), the correctness of the following Theorem is obvious.
Constrained Bayesian Method for Testing Different Type of Hypotheses
65
Theorem 2.9. CBM 7 with restriction level of (2.113), at satisfying a condition
r7 r70 r7 K1 Pmin
0 q 1 , Pmin
q
(i.e.
r7 r70 r7
q K1 Pmin ),
where
min^p( H ), p( H 0 ), p( H )` , ensures a decision rule with
mdFDR FAR less or equal to q ; that is, with the condition mdFDR FAR d q .
Proof. Taking into account denotation Pmin
min^p( H ), p( H 0 ), p( H )`
in restrictions (2.113) and to consider the definitions of mdFDR (2.85) and
FAR (2.93), we are easily convinced of the validity of the theorem. As mentioned above (see Remark 2.2), for given restriction level q , it is not always possible to make a decision on the basis of observation result x . In such a case, it is necessary to change q , which causes a change in
mdFDR as a result, or to continue observations until a decision is made; that is, to turn to the sequential experiment. Decision making rules with necessary restriction levels in the abovementioned tasks that guaranty the fulfillment of the condition mdFDR d q can be described by Procedure A, which is sequential in
principle by its nature. Definition 2.1. A procedure is “proper” if, as a result of its application, the probability of achieving the put aim is equal to 1.
Chapter 2
66
It is not difficult to be convinced that theorem 7 is also equitable for decision making regions (2.114) (Kachiashvili et al., 2019). From this theorem, it directly follows that the sequential procedure A is proper and when making a decision the condition mdFDR d q is satisfied.
2.4. CBM for Testing Multiple Hypotheses with Directional Alternatives in Sequential Experiments In many applications such as Biology, Medicine, Genetics, Epidemiology, Defense, Environment, Economics, Communication, Radio Astronomy, Video Signalling, Computers, Networks and more, the case of multiple directional hypotheses is considered. Testing hypotheses is as follows (Bansal et al., 2016; Bansal and Miescke, 2013; Benjamini and Hochberg, 1995; Finner, 1999; Shaffer, 2002):
Hi(0 ) : Ti
0 vs H i( ) : T i 0
or H i( ) : T i ! 0 , i 1,..., m . (2.117)
where m is the number of individual hypotheses about parameters T1,...,T m that must be tested by test statistics X
X 1, X 2 ,..., X m ,
where
X i | f xi | T i .
In many cases, especially in biomedical study, the number of hypotheses is very large (Bansal et al., 2016). For testing multiple hypotheses, depending on the pursued aim, many different criteria are offered. For example, comparison-wise error rate (CWER), family-wise error rate (FWER), and false discovery rate (FDR) or positive false discovery rate (pFDR) (Benjamini and Yekutieli, 2005; Chow, 2011). It is well known that the following ratio is among these criteria: “when a great many tests are to be done, the FDR (or some alternate form, such as the pFDR) represents a promising alternative between comparison-wise error (CWE) protection,
Constrained Bayesian Method for Testing Different Type of Hypotheses
67
often considered to be too liberal (i.e. having high power (K.K.)), and family-wise error (FWE) protection, often considered to be too conservative (i.e. having lower power (K.K.))” (see, for example, Prof. D. Edwards’ comment to the paper by Benjamini and Yekutieli, 2005, p. 81). Alternatives such as skewed directional false discovery rate (DFDR) or mixed directional false discovery rate (mdFDR) may be used (Bansal et al., 2016). The optimal procedures controlling DFDR (or mdFDR) use twotailed procedures assuming that directional alternatives are symmetrically distributed. Therefore, the decision rule is symmetric in relation to the parameter’s value defined by the basic hypothesis (Shaffer, 2002; Benjamini and Yekutieli, 2005). For experiments where the distributions of the alternative hypotheses are skewed, the asymmetric decision rule, based on skew normal priors and using Bayesian methodology for testing, was offered in Bansal et al. (2016). Bansal et al. (2016) theoretically proved “that a skewed prior permits a higher power in number of correct discoveries than if the prior is symmetric (Bansal et al., 2016, p. 2)”. This result has been confirmed by a simulation study comparing the proposed rule with a frequentist’s rule and the rule offered in Benjamini and Yekutieli (2005). A sequential method based on constrained Bayesian methods (CBMs) for testing multiple hypotheses was considered in Kachiashvili (2014b), where it was shown that to control both the family-wise error rate and the familywise power, a sample with significantly fewer observations than a Bonferroni or intersection scheme is used, considering the ideas of step-up and step-down methods for multiple comparisons of sequential designs. The new method surpasses existing testing methods proposed earlier in a substantial reduction of the expected sample size. Because CBM can be applied to symmetric and asymmetric hypotheses without any changes and
Chapter 2
68
taking into account that it can foresee the skewness (directionality) in both prior probabilities and the restriction levels, investigation of the CBM approach to testing directional hypotheses is interesting. CBM for testing both an individual and multiple directional hypotheses are considered below in context of FDR.
2.4.1. CBM for testing multiple directional hypotheses Let us consider multiple directional hypotheses (2.117). Let us X
X 1, X 2 ,..., X m
X i ~ f xi | T i .
be a collection of test statistics such that d
Let
d i (1,0,1) , where d i
H i(0)
is true, and d i
d1, d 2 ,..., d m
be a decision rule with
1 means that H i( ) is true, d i
0 means that
1 means that H i( ) is true. We consider the loss
function of the form (Bansal and Miescke, 2013; Bansal et al., 2016) L T , d
where T
T1,T 2 ,..., T m . LT , d i i
hypothesis
H i(0)
versus
H i( )
or
¦
m L i 1 i
T i , d i
,
(2.118)
is the loss for testing each individual
H i( )
.
A constrained Bayesian rule for testing multiple hypotheses (2.117) can be obtained by maximizing the average power (Kachiashvili, 2018a) r G
¦
m r i 1 i
d i
,
(2.119)
subject to
g j di d rl ,ji
, i 1,..., m , j 1,..., kl , l (1,2,3,4,5,6,7 ) , (2.120)
Constrained Bayesian Method for Testing Different Type of Hypotheses
where
g j di
69
, j 1,..., kl , are restricted functions in the stated task of CBM
for i th individual set of directional hypotheses,
rl ,ji
are restriction levels, l
is the number of the task of the CBM (Kachiashvili, 2018a; l 7 in the considered case), and kl is the number of equations in the restrictions of task l of the CBM ( kl
3 in the considered case).
In this case, the total mixed directional false discovery rate (tmdFDR) when testing multiple hypotheses (2.117) is as follows (Bansal and Miescke, 2013; Bansal et al., 2016; Kachiashvili et al., 2020):
tmdFDR
m
¦
i 1
mdFDRi ,
(2.121)
where mdFDRi is mixed directional false discovery rate of i th individual hypotheses. Thus, for testing multiple directional hypotheses (2.117), each d i in stated the CBM - that is, decisions concerning each subset of directional hypotheses in the stated CBM - must be determined using Procedure A, defined above (see Item 2.3.5). Theorem 2.10. For each subset of directional hypotheses of multiple
hypotheses (2.117), one of the possible stated tasks of CBM is used with the restriction levels providing the condition mdFDRi d qi , i 1,..., m . Then, if qi , i 1,..., m , are taken so that
tmdFDRd q is fulfilled.
¦
m
q i 1 i
q , the following condition
Chapter 2
70
Proof. Because of the definition of tmdFDR (2.85), when taking the restriction levels qi , i 1,..., m , in each individual directional task so that m q i 1 i
¦
q , the correctness of the theorem is obvious.
Taking into account the specificity of CBM, as a result of testing multiple hypotheses (2.117) on the basis of test statistics
X
X1, X 2 ,..., X m , there
can appear that for some i , i(1,...,m) , a simple decision concerning testing hypotheses is not made.To have a decision rule with tmdFDR of the level q and simple decisions in all subsets of directional hypotheses, additional information to the parameters Ti ( i(1,...,m) ) concerning which decisions are not made must be obtained; that is, there must be obtained extra observations of the appropriate parameters until decisions in relation to all individual hypotheses will not be made. 2.4.1.1. A sequential test for multiple directional hypotheses
Depending on the relation between the components of the vector of test statistics
X
X1, X 2 ,..., X m , different sequential tests for testing multiple
directional hypotheses (2.117) can be considered. When the components of the vector of test statistics can be observed independently, a decision concerning the subsets of the set of multiple hypotheses can be made independent of each other. However, when the considered components are impossible to observe independently, decisions concerning all subsets of the set of multiple hypotheses must be made on the basis of the vector of test statistics.
Constrained Bayesian Method for Testing Different Type of Hypotheses
71
Let us consider the first case, when the components of the vector
X
X1, X 2 ,..., X m
can be observed independently and, consequently,
decisions concerning the subsets of hypotheses can be made independently. Let us suppose that for test statistics
X i ~ f xi | Ti , i (1,...,m) , a decision xi(1)
is not made on the basis of observation result
. Here the upper index
indicates the number of the observation in sequence. Let sequence of independent observations of the sample
xi(1) , xi( 2) ,...
is
xi( 2) , xi(3) ,...
be the
X i . Then, the common density of
f xi(1) | Ti f xi( 2) | T i
. Using these densities in
decision-making regions of the appropriate CBM, we make decisions until one of the tested directional hypotheses is not accepted; that is, the stopping rule in this sequential test is as follows:
T
max Ti
i(1,..., m)
,
(2.122)
where
Ti Here,
½ inf ®ni : ti xi(1) , xi( 2) ,..., xi( ni ) only one of *ij , j < ¾ ¯ ¿.
ni is the number of sequentially obtained observations for making
a final decision in the i th subset of multiple hypotheses, and < is a set of tested hypotheses in each subset of multiple hypotheses; that is, < { ^,0,` . If independent observation of the components of the vector
X
X1, X 2 ,..., X m
T
° inf ®n : °¯
m i
is impossible, then the stopping rule is as follows:
ª º ½° ti X (1) , X ( 2) ,..., X ( n ) only one of * ij , j < » ¾ « 1 ¬ ¼ °¿ , (2.123)
Chapter 2
72
(i ) where X , i 1,..., n , are independent, m -dimensional observation
vectors. Theorem 2.11. The stopping rules defined by both (2.122) and (2.123) for hypothesis acceptance regions of the considered task of the CBM are proper for any correctly chosen restriction levels in the appropriate restrictions. Proof. The correctness of the statement that the stopping rule (2.122) is
proper in the condition of the theorem follows from theorem 5.3 of Kachiashvili (2014b), which, in the considered case, has the following form: “if the probability distributions increase in the sample size distribution parameter
f xi | Ti , i 1,..., m , are such that an
ni entails a decrease in the entropy concerning
Ti about which the hypotheses are formulated for
any values of restriction levels in a constrained Bayesian task, there always exists such an integer
ni*
that, if the number of repeated observations
ni in
the method of sequential analysis of Bayesian type is more than this value – that is,
ni ! ni*
- one of the tested hypotheses will be accepted with a
probability equal to unity.” The correctness of the statement that the stopping rule (2.123) is proper in the condition of the theorem follows from the above considered and theorem 5.1 of Kachiashvili (2014b), which proves that, in the condition of the theorem, an infinitely increasing number of repeated observations in the sequential analysis of Bayesian type entails infinite decreasing probabilities of errors of the first and the second types.
Constrained Bayesian Method for Testing Different Type of Hypotheses
For satisfaction of the condition choose qi
m q i 1 i
¦
73
q (see Theorem 2.10), we can
q / m , i 1,..., m . On the other hand, it was proved in
Kachiashvili (2014b, p.29) that, at increasing Kullback–Leibler divergences between tested hypotheses, the minimum possible values of error probabilities for which a single decision could be made in the Constrained Bayesian Method decrease and, in the limit, tend to zero. Using this fact, let us define the following algorithm of multiple hypotheses (2.117) testing. Let us introduce denotations /n,i
n
i 1
p ( H i( 0) | X i )
n
i 1
/0n,i
n
i 1
i 1
/n,i
n
i 1
i 1
i 1
p ( H i( ) | X i )
n
i 1
p( H i( ) | X i )
p( H i( 0) | X i )
p ( H i( ) | X i ) n
n
p ( H i( ) | X i )
p ( H i( ) | X i ) n
n
i 1
p ( H i( 0) | X i )
p ( H i( ) | X i )
,
,
,
where X1 , X 2 ,…, X n are sequentially obtained n independent vectors of observations on the basis of which hypotheses (2.117) are tested. /n,i , /0n,i and /n,i are likelihood ratios on the basis of which a decision is made in the considered task of the CBM. Let us denote the averaged values of introduced likelihood ratios /in
p( H i( ) ) /n,i p( H i(0) ) /0n,i p( H i( ) ) /n,i , i 1,..., m ,
and use these values for a characterization of the divergences between the hypotheses in the subsets of tested multiple hypotheses (2.117).
Chapter 2
74
After observing the vectors X1 , X 2 ,…, X n , order the averaged likelihood ratios statistics in nonincreasing order,
/[n1] t /[n2] t t /[nm] , and let H [ j ] , H 0[ j ] and H [ j ] for j 1,...,m be the corresponding tested directional hypotheses, respectively, arranged in the same order. Let us set the levels of the mixed directional false discovery rates for individual tests as (Kachiashvili, 2015)
q[ j ]
§ q ¨ /[nm j 1] / ¨ ©
m
i 1
·
i ¸ n¸
¦/
for j 1,...,m ,
(2.124)
¹
in the considered task. Proceed from the most significant averaged likelihood ratio statistic /[n1] down to the second most significant statistic, etc. Procedure B Step 1. Apply Procedure A to the subset of multiple hypotheses to which the averaged likelihood ratio /[n1] corresponds and go to /[n2] . Step 2. Apply Procedure A to the subset of multiple hypotheses to which the averaged likelihood ratio /[n2] corresponds and go to /[n3] . etc. The sampling is continuing until a simple decision will not be made for all the subsets of (2.117) multiple hypotheses. The stopping rules in this case are remain the same as in (2.122) or (2.123), depending on whether the components of the vector
X
X1, X 2 ,..., X m are observed independently or dependently.
Constrained Bayesian Method for Testing Different Type of Hypotheses
75
Theorem 2.12. The stopping rules (2.122) and (2.123) are proper; the
offered scheme strongly controls the mixed directional false discovery rate. That is, for any set ȥ of true hypotheses in directional hypotheses (2.117),
PȌ ^T f` 1 and
tmdFDRd q . Here ȥ (\ 1,\ 2 ,...,\ m ) is the index set of the true hypotheses,
\ i ^,0,`. Proof. The assertion that the stopping rules (2.122) and (2.123) are proper in the considered case tends from Theorem 2.11 when restriction levels in the appropriate restrictions are chosen correctly. The fact that the offered scheme strongly controls mixed directional false discovery rate tends from the Theorem 2.10 and the equality m
m
¦
qj
j 1
q
§ j ¨ /n / ¨ 1©
¦ j
m
i 1
·
i ¸ n¸
¦/
q.
¹
2.5. Application of CBM to Union-Intersection and Intersection-Union Hypotheses Testing Problems The consideration of the Union-Intersection (UI) problem where the basic hypothesis H 0 states the simultaneous occurrence of several disjoint subhypotheses, i.e., when H 0
S
i 1
H 0i , started in the middle of the last
century (Roy, 1953). The reverse scenario where the basic hypothesis H 0 states the occurrence of at least one of the sub-hypotheses, i.e., when
Chapter 2
76
H0
S
i 1
H 0i , was considered some time later (SenGupta, 1991;
SenGupta and Pal, 2001) and was termed the Intersection-Union (IU) testing of hypotheses problem. Both statements of the hypotheses testing problem deserve attention as they appear in many practical applications. For example, UI situations arise when one considers multi-parameter testing problems in multivariate distributions, while IU testing problems arise in, e.g., one-parameter situations such as “equivalence” testing problems, acceptance sampling in statistical process control, reliability and multivariate analysis (Pal and SenGupta, 2000), directional statistics (Jammalamadaka and SenGupta, 2001, Section 6.3.3; SenGupta and Pal, 2001), multi-parameter problems like contaminated or mixture models (Berger, 1982; Choudhary and Nagaraja, 2004; Madallaz and Mau, 1981; SenGupta, 2007), multiple comparisons in verbal fluency-disorder studies (Soulakova, 2017), group sequential clinical trials (Peng et al, 2018), etc. The intersection of the separate critical regions obtained by the standard separate tests for each H 0i for testing H 0 is considered in Choudhary and Nagaraja (2004). The general uniformly most powerful (UMP) test is presented in (Lehmann, 1986) for solving this problem. An approach based on a Pivotal Parametric Product (P3) as given by SenGupta (2007) is exemplified by computations for several practical examples and by comparisons of the obtained results with the results given in (Berger, 1982). To pursue the above problem, more general statements of the UI and IU problems and application of Constrained Bayesian Method (CBM) for solving it are presented below (Kachiashvili, 2019b; Kachiashvili and SenGupta, (unpublished-b)).
Constrained Bayesian Method for Testing Different Type of Hypotheses
77
2.5.1. Statement of the problem Let us consider the problem of testing a basic hypothesis against an alternative one when one of them is a union or intersection of a sub-set of hypotheses and another is a negation of the first one (SenGupta, 2007; Roy, 1953). Since these two cases easily can be transformed into each other by changing the basic (null) and alternative hypothesis and vice versa, we will consider the case when the null (basic) hypothesis is union or intersection of a sub-set of hypotheses and the alternative is negation of the null, i.e., a) H 0 {
b) H 0 {
S0
i 1
H 0i and H1 { no H 0 or
S0
i 1
H 0i and H1 { no H 0 .
(2.1251)
In general, this problem can be stated as follows: to test the basic hypothesis H 0 , against the alternative one H1 , where H 0 and H1 are the union or intersection of some sub-sets of hypotheses H 01 , H 02 ,…, H 0 S 0 and H11 , H12 ,…, H1S1 , respectively. Here H 0
H H H
H
1
(or more
generally H 0i
1j
, i 1,.., S0 , j 1,.., S1 ) and the fulfillment of the
condition
1
R m , where R m is m dimensional parametrical
0
space, is not obligatory in contrast to the classical case. Hypotheses from one sub-set can intersect with each other but hypotheses from different subsets do not intersect. These suppositions make the statistical hypotheses similar to the hypotheses usually encountered in real-life and, therefore, make them more natural. In general, here we consider the following combinations of testing of hypotheses (Kachiashvili and SenGupta, (unpublished-b)):
Chapter 2
78 S0
a) H 0 {
b) H 0 {
c) H 0 {
d) H 0 {
i 1
H 0i vs. H1 {
H 0i vs. H1 {
H 0i vs. H1 {
S0
i 1 S0
i 1
S0
i 1
S1
H 0i vs. H1 {
i 1 S1
i 1
H1i ; H1i ;
S1
i 1
S1
i 1
H1i ;
(2.1252)
H1i .
It is obvious that hypotheses (2.1251) are particular cases of hypotheses (2.1252). The standard separate tests for each couple H 0i and H1 j , offered in Choudhary and Nagaraja (2004), which yield a test for H 0 and H1 with the acceptance regions given by the intersection or union of the separate acceptance regions, has the following drawback. The information, that may be contained in the hypotheses H 0i and/or H1 j concerning other subhypotheses, are lost in such separate considerations. Application of CBM for testing these hypotheses is free from such drawbacks. It does not need the derivations of a new test statistic for every concrete case and its distribution law (as P3 test needs) (see SenGupta, 2007, 1991), which may be non-trivial in many cases. Besides, it is free from the necessity to have “exact separate tests” (SenGupta, 2007). Let us adopt the following notations for the application of CBM to testing of hypotheses (2.1251) or (2.1252). Denote H ic { H 0i , i 1,..., S0 , H ic { H1i , i
S0 1,..., S0 S1 . Then we have to test S
S0 S1 hypotheses H1c ,
H 2c ,…, H Sc (instead of S0 S1 separate tests in pairs). Let us henceforth omit the upper index for simplicity. Let a sample xT
( x1 ,..., xn ) be
generated from p ( x;T ) and the problem of interest is to test H i : T i 4i ,
Constrained Bayesian Method for Testing Different Type of Hypotheses
79
m i 1,..., S , where 4i R , i 1,..., S , R m is m dimensional parameter
space and the requirement of being disjoint subsets of 4i is not obligatory. Let the prior on T be denoted by
¦
S
S (T | H i ) p( H i ) , where for each
i 1
i 1,..., S , p( H i ) is the a priori probability of hypothesis Hi and S (T | H i )
is a prior density with support 4i ; p( x | H i ) denotes the marginal density of x given Hi , i.e. p ( x | H i )
³
4i
p ( x | T )S (T | H i ) dT . Let us use one of
possible above formulated CBM, for example Task 3 (see Item 3.1.4), for testing stated hypotheses. Remark 2.3. This problem may be viewed as a generalization of the
optimization problem of Dantzig and Wald which was considered in Lehmann (1986) for obtaining the most powerful test.
2.5.2. General solution of the stated problem Let us consider Task 3 for the solution of the stated problem. This is one of the possible forms that can be modified to other forms depending on the specific hypotheses testing technique applied (see, for example, Kachiashvili, 2011; Kachiashvili et al., 2012b; Kachiashvili, 2018a). The solution of the problem (1.16), (2.15) for general loss functions
L1 ( H i , G j ( x) 1) and L2 ( H i , G j ( x) 0) is given by formula (2.17), for losses L1 ( Hi , H j ) and L2 ( Hi , H j ) , transforms in (2.18) and for losses (2.78) gives the following decision making regions *j
x : K ® 1 ¯
¦
S
i 1,i z j
p( H i | x) K 0 O j p ( H j | x)½¾ , j 1,..., S , (2.126) ¿
Chapter 2
80
where O j , j 1,..., S , are determined so that in conditions (2.80) the equalities take place. For testing hypotheses (2.125), decision making rules are defined on the basis of the regions (2.126) as follows: x
for hypotheses of (2.1251)
a) - accept H 0 if x belongs only to the union of the regions *i , i 1,.., S0 ( x
S0 * i 1 i
);
- accept H1 if x belongs only to the region *S0 1 ; - do not make a decision in any other case. b) - accept H 0 if x belongs only to the intersection of the regions *i ,
i 1,.., S0 ( x
S0 * i 1 i
);
- accept H1 if x belongs only to the region *S0 1 ; - do not make a decision in any other case. x
for hypotheses of (2.1252)
a) - accept H 0 if x belongs only to the intersection of the regions *i ,
i 1,.., S0 ( x
S0 * i 1 i
);
- accept H1 if x belongs only to the intersection of the regions *i ,
i S0 1,..., S0 S1 ( x
S0 S1
i S0 1
*i );
- do not make a decision in any other case. b) - accept H 0 if x belongs only to the union of the regions *i ,
i 1,.., S0 ( x
S0
* i 1 i
);
Constrained Bayesian Method for Testing Different Type of Hypotheses
81
- accept H1 if x belongs only to the union of the regions *i ,
i S0 1,..., S0 S1 ( x
S0 S1
i S0 1
*i );
- do not make a decision in any other case. c) - accept H 0 if x belongs only to the intersection of the regions *i ,
i 1,.., S0 ( x
S0 * i 1 i
);
- accept H1 if x belongs only to the union of the regions *i ,
i S0 1,..., S0 S1 ( x
S0 S1
i S0 1
*i );
- do not make a decision in any other case. d) - accept H 0 if x belongs only to the union of the regions *i ,
i 1,.., S0 ( x
* i 1 i
);
- accept H1 if x belongs only to the intersection of the regions *i ,
i S0 1,..., S0 S1 ( x
S0 S1
i S0 1
*i );
- do not make a decision in any other case. Remarks 2.4a. In all the above situations, the statement “do not make a
decision in any other case” was made since it is impossible to make a decision at the desired levels on the basis of existing information (see (2.15)). 2.4b. If making a decision on the basis of existing information, i.e., on the
basis of the sample with given observations is impossible, then there are two actions available: to change restriction levels in (2.15) until a decision will not be made, or to continue the sampling, i.e., to transfer to the sequential experiment, until decision is made (Kachiashvili, 2014b, 2018a).
Chapter 2
82
Theorem 2.13. CBM 3 at statement (1.16), (2.15), for hypotheses b) of
(2.1252) and losses (2.78), ensures a decision rule with the error rates of the Type-I and Type-II restricted by the following inequalities S0
Dd
¦ i 1 S
Ed
r2i , K 0 p( H i )
r2i . K p( H i ) 1 0
¦
i S0
(2.127)
Proof. The Type-I and Type-II error rates for hypotheses (2.1252) are the following
³
D
*1
p ( x | H 0 )dx and E
³
*0
p ( x | H 1 ) dx .
(2.128)
For hypotheses of (2.1252) b), expression (2.128) can be rewritten as follows
³
D d
S
* j S 0 1 j S0
S
i 1
j S 0 1 * j
¦ ¦
S0
iS01 p( x | H i )dx
³
¦ ³ i 1
S j S 0 1
*j
S0
S
i 1
j S 0 1
¦ ¦
p ( x | H i )dx
p( x | H i )dx d p ( x * j | H i ) (2.129)
and
³
E d
S0
* j 1 j
¦
iS S0 1 p( x | H i )dx
S
S0
i S 0 1
¦ ³
j 1 *j
S
¦
p ( x | H i )dx
i S0 1
¦
³
S0
S
i S 0 1
j 1
*j
¦
p( x | H i )dx d
S0 j 1
p ( x * j | H i ) . (2.130)
Since the following condition holds in CBM (Kachiashvili, 2018b) when a decision is made
¦
S0 j 1
p( x * j | H i )
¦
S j S 0 1
p ( x * j | H i ) 1 , i 1,..., S ,
(2.131)
Constrained Bayesian Method for Testing Different Type of Hypotheses
83
conditions (2.129) and (2.130) can be rewritten as follows
Dd d
¦
¦
S0
ª1
i 1« ¬
¦
ª r2i « 1 p( H ) K 0 i ¬«
S0 i
d
¦
S0 j 1
¦
p( x * j | H i )º d »¼ S0 j 1, j z i
º p( x * j | H i )» d ¼»
r2i , 1 p( H ) K i 0
S0
i
and
Ed d
¦
S i
¦
S
ª1
i S 0 1 « ¬
¦
ª r2i « S 0 1 p ( H ) K i 0 ¬«
d
¦
S
i
S j S 0 1
¦
p( x * j | H i )º d »¼
S j S 0 1, j z i
º p( x * j | H i )» d ¼»
r2i . S0 1 p ( H ) K i 0
Making the same transformations for other combinations of hypotheses (2.1252), it is easily seen that the theorem holds for all these hypotheses. 2.5.2.1. Another loss function
Let us, instead of losses (2.78), consider the following loss functions for hypotheses of (2.1252):
°0, at i, j (1,..., S 0 ) or i, j ( S 0 1,..., S ), ° ° L1( H i , G j ( x) 1) ® K1 , at i (1,..., S 0 ) and j ( S 0 1,..., S ) or ° ° at i ( S 0 1,..., S ) and j (1,..., S 0 ); °¯ (2.132)
Chapter 2
84
and
°K 0 , at i, j (1,..., S 0 ) or i, j ( S 0 1,..., S ), ° ° L2 ( H i , G j ( x) 0) ®0, at i (1,..., S 0 ) and j ( S 0 1,..., S ) or , ° ° at i ( S 0 1,..., S ) and j (1,..., S 0 ). °¯ (2.133) Then CBM 3, i.e., statement of the problem (1.16), (2.15), for hypotheses b) of (2.1252), takes the following form
rG
° ª min ®K1 « ^* j ` °¯ ¬
¦
¦
S0
i 1
S i S 0 1
p( H i )
p( H i )
¦
S
³
j S 0 1 * j
S0
¦ ³
j 1 *j
p( x | H i )dx
º ½° p ( x | H i )dx » ¾ , ¼ °¿
(2.134)
subject to S0
K 0 p( H i )
¦³ j 1
R n * j
p( x | H i )dx d r3i , i 1,..., S0 ,
S
K 0 p( H i )
¦³
j S0 1
R n * j
p( x | H i )dx d r3i , i
S0 1,..., S . (2.135)
Application of the Lagrange method for solving of the constrained optimization problem (2.134) and (2.135), gives *j
° ® x : K1 °¯
¦
S0
S i S 0 1
p( H i ) p( x | H i ) K 0
½°
¦ O p( H ) p( x | H )¾° , i
i
i
¿
i 1
j 1,..., S0 , *j
° ® x : K1 °¯
¦
S0
i 1
S
p( H i ) p ( x | H i ) K 0
½°
¦ O p(H ) p( x | H )¾° , i
i S 0 1
i
i
¿
Constrained Bayesian Method for Testing Different Type of Hypotheses
j
S0 1,..., S ,
85
(2.136)
where Lagrange multipliers Oi , i 1,..., S , are determined so that equality holds in (2.135). Thus, we have *1 ... *S0
*0 and *S0 1 ... *S
*1 .
Theorem 2.14. CBM 3 under (1.16), (2.15), for hypotheses b) of (2.1252)
and losses (2.132) and (2.133) ensures a decision rule with the error rates of the Type-I and Type-II restricted by the following inequalities S0
¦
Dd
i 1 S
r2i , K 0 S0 p( H i )
r2i . K (S S0 ) p( H i ) 1 0
¦
Ed
i S0
(2.137)
Proof. Restrictions (2.135) for hypotheses acceptance regions (2.136) are transformed to the forms
K 0 p( H i ) S0 1 p( x *0 | H i ) d r2i , i 1,..., S0 , K 0 p( H i ) (S S0 ) 1 p( x *1 | H i ) d r2i , i S0 1,..., S . (2.138) The use of these ratios for the Type-I and Type-II error rates (2.129) and (2.130) respectively, gives
Dd
¦
S0
i 1
p( x *1 | H i )
S0
S0
1 p( x *0 | H i ) d ¦i 1 i 1
¦
r2i , K 0 S0 p( H i )
and
Ed
¦
S i S 0 1
p ( x *0 | H i )
d
¦
S i
¦
S i S 0 1
1 p( x *1 | H i ) d
r2i . S 0 1 K ( S S ) p ( H ) 0 0 i
Chapter 2
86
Similarly, this Theorem can be proved for other combinations of hypotheses (2.1252).
2.5.3. Examples Let us consider examples for illustrating the fact that well known cases of statistical hypotheses formulations are particular cases of hypotheses given by formula (2.125). Example 2.1. Case of One-parameter H 0 (SenGupta, 2007, p. 4) Let a random variable X follow the distribution f ( x;T ) , where T is a scalar parameter. Let us consider testing
H 0 : T d T1 or T t T 2 vs H1 : T1 T T 2 .
(2.139)
Let us denote: H 01 : T d T1 , H 02 : T t T 2 , H11 : T ! T1 and H12 : T T 2 . Then hypotheses (2.139) can be rewritten in the form
H 0 : H 01
H
02
vs. H1 : H11
i.e., we have case (2.1252) d), where S0
H
2 , S1
12
,
(2.140)
2 and S
S 0 S1
4.
Remark 2.5. We are forced to choose four hypotheses ( H 01 , H 02 , H11
and H12 ) instead of three H 01 , H 02 and H1 (to which correspond disjoint parametrical
41
subsets
^P : P1 P P2 ` )
401
^P : P d P1` ,
402
^P : P t P2 `
and
because of the specificity of the example under
consideration. Otherwise, the suitable choice of the parameter P of truncated normal distribution at H1 is impossible and the quality of decision made depends on a chosen value of P ( P ( P1 , P 2 ) ). This note will be more evident when presenting concrete examples in Item 4.3.
Constrained Bayesian Method for Testing Different Type of Hypotheses
87
a) Let us consider the case of loss functions (2.78). Restrictions (2.16) and decision regions (2.126) take the following forms in this case
r201 , K 0 p( H 01 )
³
p( x | H 01 )dx t 1
³
p( x | H 02 )dx t 1
r202 , K 0 p( H 02 )
³
p( x | H11 )dx t 1
r211 , K 0 p( H11 )
³
p( x | H12 )dx t 1
*01
*02
*11
*12
r212 . K 0 p( H12 )
(2.141)
*01
^x : K1 p( H 02 | x) p( H11 | x) p( H12 | x) K 0 O01 p( H 01 | x)` ,
*02
^x : K1 p( H 01 | x) p( H11 | x) p( H12 | x) K 0 O02 p( H 02 | x)` ,
*11
^x : K1 p( H 01 | x) p( H 02 | x) p( H12 | x) K 0 O11 p( H11 | x)` ,
*12
^x : K1 p( H 01 | x) p( H 02 | x) p( H11 | x) K 0 O12 p( H12 | x)` . (2.142)
The errors of the Type-I and the Type-II accordingly are:
D
³
p ( x | H 01 )dx
³
p ( x | H 02 ) dx
p ( x *1 | H 01 ) p ( x *1 | H 02 ) ,
E
³
p( x | H11 )dx
³
p( x | H12 )dx
p( x *0 | H11 ) p( x *0 | H12 ) ,
*1
*0
*1
*0
(2.143)
Chapter 2
88
and, by Theorem 2.13, for their restriction on the desired levels at making a decision, restriction levels in (2.141) must be chosen on the basis of the following conditions
Dd
r201 r202 , K 0 p( H 01 ) K 0 p( H 02 )
r211 r212 . K 0 p ( H11 ) K 0 p( H12 )
Ed
(2.144)
b) Let us consider the case of loss functions (2.132) and (2.133). Risk function (2.134) and restriction conditions (2.135) take following forms in this case rG
ª min ® K1 « ^* j ` ¯ ¬
³
*11
p( H 01 ) p( x | H 01 ) p( H 02 ) p( x | H 02 ) dx
³
p( H 01 ) p( x | H 01 ) p( H 02 ) p( x | H 02 ) dx
³
p( H11 ) p ( x | H11 ) p ( H12 ) p ( x | H12 ) dx
³
p( H11 ) p( x | H11 ) p ( H12 ) p( x | H12 ) dx º» ¾ ,
*12
*01
½
¼¿
*02
(2.145)
and ª K 0 p( H 01 ) « ¬
³
ª K 0 p( H 02 ) « ¬
³
³
p( x | H 02 )dx
³
³
p ( x | H11 )dx
³
º p ( x | H11 )dx » d r211 , ¼
³
p ( x | H 12 ) dx
³
º p ( x | H 12 )dx » d r212 . (2.146) ¼
ª K 0 p ( H11 ) « ¬ ª K 0 p ( H12 ) « ¬
º p ( x | H 01 )dx » d r201 , ¼
p ( x | H 01 )dx
n
R *01
R n *01
R n *11
R n *11
n
R *02
R n *02
R n *12
R n *12
º p( x | H 02 )dx » d r202 , ¼
Constrained Bayesian Method for Testing Different Type of Hypotheses
89
Solution of (2.145), (2.146) gives the hypotheses acceptance regions
*01 { *02 { *0 and *11 { *12 { *1 , where *0
^x : K1 p( H11) p( x | H11) p( H12 ) p( x | H12 )
K 0 O01 p( H 01 ) p( x | H 01 ) O02 p( H 02 ) p( x | H 02 ) ` , *1
^x : K1 p( H 01) p( x | H 01) p( H 02 ) p( x | H 02 )
K 0 O11 p( H11 ) p( x | H11 ) O12 p( H12 ) p( x | H12 ) ` . (2.147) Here, Lagrange multipliers O01 , O02 , O11 and O12 are determined so that in the conditions (2.146) equalities hold. Taking into account (2.147), restriction conditions (2.146) take the forms
³
*0
r201 , 2 K 0 p( H 01 )
p( x | H 01 )dx t 1
³
*0
p ( x | H 02 ) dx t 1
r202 2 K 0 p ( H 02 )
,
³
p( x | H11 )dx t 1
r211 , 2 K 0 p( H11 )
³
p( x | H12 )dx t 1
r212 . 2 K 0 p( H12 )
*1
*1
(2.148)
It should be noted that the determination of Lagrange multipliers is more difficult for (2.132), (2.133) than for (2.78), because in the first case the two-dimensional equations with respect to Lagrange multipliers must be solved, instead of one-dimensional in the second case. For guaranteeing restrictions of Type-I and Type-II error rates at the desired levels, according to Theorem 2.14, restriction levels in conditions (2.148) must be chosen so that the following inequalities are fulfilled
Dd
r201 r202 , 2 K 0 p ( H 01 ) 2 K 0 p( H 02 )
Chapter 2
90
r211 r212 . 2 K 0 p ( H11 ) 2 K 0 p( H12 )
Ed
(2.149)
The comparison of (2.144) and (2.149) allows us to conclude that at identical K0 , p( H 01) , p( H 02 ) , p( H11) , p( H12 ) , r201 , r202 , r211 and r212 , Type I and Type II error rates for losses (2.132), (2.133), in general, are less than the same error rates for losses (2.78) at the solution of the stated condition of the problem (see, Remark 4.1 in Item 4.3.1). Therefore, the use of losses (2.132), (2.133) is not only more logical than the use of losses (2.78) but, also, it is preferable for minimization of Type I and Type II error rates. Example 2.2. Case of Multi-parameter H 0 (SenGupta, 2007, p. 13) Let us consider the mixture model with density
g ( x | p,T ,- )
p f ( x | T ,- ) (1 p) f ( x | T 0 ,- ) ,
(2.150)
where 0 d p d 1 , T 4 , an interval of the real line; both p and T are unknown and T0 is a known point of 4 ; and - is an unknown parameter (possibly vector-valued), to be interpreted as a nuisance parameter. The density f ( x | T , - ) is assumed to be sufficiently “regular”. We want to test the null hypothesis H 0 : “no contamination” against the alternative H1 : “there is contamination”. As a result of the above setup, the null hypothesis of the contamination translates to the union of three parametric hypotheses:
ª « H 01 : p 0 ¬
H
02
:T
T0
H
03
: p 0 and T
º
T0 » . ¼
Taking into account (2.150), the null parametric sub-hypotheses are
H 01 : g 01 ( x | T ,- )
f ( x | T 0 ,- ) ,
Constrained Bayesian Method for Testing Different Type of Hypotheses
H 02 : g 02 ( x | p,T ,- ) H 03 : g 03 ( x | T ,- )
p f ( x | T 0 ,- ) (1 p) f ( x | T 0 ,- )
91
f ( x | T 0 ,- ) ,
f ( x | T 0 ,- ) .
(2.151)
Because of g 01 { g 02 { g 03 , hypotheses H 01 , H 02 and H 03 are the same. It is obvious that to the alternative hypothesis correspond the following parametric hypothesis H1 : p z 0 and T z T 0 with underlying density (2.150). Finally, we have the following set of hypotheses for testing
H 0 : X ~ f ( x | T 0 ,- ) vs H1 ~ g ( x | p,T ,- ) .
(2.152)
Thus we have S0 1 and S1 1 . Let us introduce p( H i ) , i
0,1 , a priori probabilities; S (Z | H i ) , a priori
density with support :i ( Z { ( p ,T ,- ) ); and p( x | H i ) the marginal density of x given Hi , i.e. p( x | H i )
³
:i
g i ( x | Z )S (Z | H i )dZ , i
0,1 .
(2.153)
Taking into account (2.151), more specifically, for marginal densities we have f ( x | T 0 ,- )S (- | H 0 )d- ,
p( x | H 0 )
³
p( x | H1 )
³³ T ³
Q 1
0 4/
0
Q
p f ( x | T ,- )S ( p | H1 )S (- | H1 )dpdTd-
1
³ ³ (1 p) f ( x | T ,- )S ( p | H )S (- | H )dpd- . 0 Q
0
1
1
(2.154)
Here Q is the domain of support of - . It is obvious that we have a particular situation of the previous case (see Example 2.1), where for testing we use again CBM 3. Therefore, the results obtained for Example 2.1, and,
Chapter 2
92
in particular, theorem 2.14, are in force but conditions (2.144) are simplified and have the following forms
Dd
r20 r21 , Ed . K 0 p( H 0 ) K 0 p ( H1 )
(2.155)
2.6. Quasi-Optimal Rule of Testing Directional Hypotheses and Its Application to Big Data In many cases, testing hypotheses (2.47), especially (2.117), requires making a decision so that all possible errors of incorrect results are restricted on the desired levels. Unfortunately, the existing classical methods do not have such opportunities. Opposite to this, CBM allows us to make such decisions. But, when the number of hypotheses surpasses two, in many statements of CBM, the determination of vectors of Lagrange multipliers is necessary. This is quite a difficult and time consuming procedure, as it requires the solution of nonlinear equations concerning Lagrange multipliers. For overcoming these problems, i.e., for simplification of the computation process with the reduction of necessary time at the expense of the determination of scalar Lagrange multipliers and, consequently, increasing the accuracy of obtained results, the consideration of (2.47) hypotheses in pairs is offered below. Let us, instead of simultaneous consideration of three hypotheses competing among themselves, as it is made in (2.47), consider them in streams
as
follows
(Kachiashvili,
Kachiashvili
and
SenGupta,
(unpublished)):
H 0 : T T 0 vs. H : T T 0 , H 0 : T T 0 vs. H : T ! T 0 , H : T T 0 vs. H 0 : T T 0 ,
(2.156)
Constrained Bayesian Method for Testing Different Type of Hypotheses
H : T T 0 vs. H : T ! T 0 ,
93
(2.157)
H : T ! T 0 vs. H 0 : T T 0 , H : T ! T 0 vs. H : T T 0 ,
(2.158)
Let us denote: E0 j is the region of acceptance of hypothesis H 0 at testing of this hypothesis versus hypothesis H j , j , (couple of hypotheses (2.156)); E j is the region of acceptance of hypothesis H at testing of this hypothesis versus hypothesis H j , j 0, (couple of hypotheses (2.157)) and E j is the region of acceptance of hypothesis H at testing of this hypothesis versus hypothesis H j , j ,0 (couple of hypotheses (2.158). Then hypothesis Hi , i ,0, , is accepted if an observation result x Eij , j ^,0,` , j z i . For finding regions Eij , i, j ^,0,` ,
i z j , let us consider one of possible statement of CBM p ( H i ) p ( x | H i ) dx ,
(2.159)
p ( H j ) p ( x | H j ) dx d J ij ,
(2.160)
max
^Eij ` ³Eij
subject to
³
Eij
i ,0, , j ,0, , j z i . Here p( H i ) and p( x | H i ) are a priori probabilities of hypothesis Hi and the marginal density of x given Hi , respectively. Let us call the stated problem of determination of regions Eij ,
i, j ^,0,` , i z j , using statements (2.159) and (2.160), as Quasi-
Chapter 2
94
Optimal Constrained Bayesian Method (QOCBM) for testing directional hypotheses, similar to the method utilized by Kachiashvili et al. (2012c). Finally, let us note that hypotheses (2.47), as well as hypotheses (2.117), can be presented as a case of intersection-union, union-intersection hypotheses (Roy, 1953; SenGupta, 2007; Kachiashvili, 2019b). Therefore, an approach developed below can be also used for solving the problems of intersection-union, union-intersection hypotheses.
2.6.1. Quasi-optimal approach of testing the hypotheses The solution of the constrained optimization problem (2.159) and (2.160) is the following ° p ( H j ) p ( x | H j ) 1 ½° ®x : ¾, p ( H i ) p ( x | H i ) Oij °¿ °¯
Eij
(2.161)
where Lagrange multiplier Oij is determined so that the equality takes place in (2.160). Let us make denotations: *0
E0
E
0
is the regions of acceptance of
E
hypothesis H 0 in the initial problem (2.47); similarly, *
E0
the regions of acceptance of hypothesis H and *
E
E 0
is
is the
regions of acceptance of hypothesis H . Then decision making rule is defined by the following procedure. Procedure C
E E
-
if x *
E0
-
if x *0
E0
0
, accept hypothesis H , , accept hypothesis H 0 ,
Constrained Bayesian Method for Testing Different Type of Hypotheses
-
if x *
E 0
E
95
, accept hypothesis H .
The averaged risk function of incorrectly acceptance of tested directional hypotheses is computed by the formula Q rCBM
ª p( H ) « ¬
ª p( H 0 )« ¬
ª p( H )« ¬
³
*0
+ p( H
³
*
º p( x | H )dx » ¼
³
p ( x | H 0 ) dx
³
º p ( x | H 0 ) dx » ¼
³
p ( x | H ) dx
³
º p ( x | H )dx » = ¼
*
*
> E )>P x E E )>P x E E
p ( H ) P x E0
+ p(H 0
p( x | H )dx
0
0
0
*
*0
E | H P x E E | H P x E E
| H P x E0
@ | H @+ | H @ . (2.162)
| H + 0
0
0
0
0
As was mentioned above, one of basic criteria of optimality at testing directional hypotheses is mdFDR (2.85). Another criterion of optimality at testing directional hypotheses is the Type III error rate. As is shown in Item 2.3.1 (see Eqs. (2.86), (2.87) and (2.88)), there exists the following ratio
mdFDR SERRIII between mdFDR and the summary type III error rate ( SERRIII ). Taking into account the above introduced denotations, we have the following for mdFDR
mdFDR
P x ( 0
P x ( 0
(
(
| H–
( | H P x ( ( | H
| H P x ( 0 0
0
P x ( | H P x ( 0 | x ( , H
0
Chapter 2
96
Px ( 0 | H 0 Px ( | ( 0 , H 0 P x ( | H – P x ( 0 | x ( , H –
Px ( 0 | H 0 Px ( | ( 0 , H 0 .
(2.163)
Recalling the condition of determination of Lagrange multipliers, restrictions (2.160) can be rewritten as follows P x ( | H J / p ( H ) , P x ( 0 | H 0 J 0 / p ( H 0 ) ,
P x ( 0 | H J 0 / p ( H ) , P x ( 0 | H J 0 / p ( H ) , P x ( | H J / p ( H ) , P x ( 0 | H 0 J 0 / p ( H 0 ) . (2.164)
Theorem 2.15. QOCBM with restriction levels (2.160) (that is (2.164)), at
satisfying
a
J 0 / p( H 0 )
condition
J / p( H ) J 0 / p( H 0 ) J / p( H )
q where 0 q 1 , ensures a decision rule with mdFDR
(that is with SERRIII ) less or equal to q ; that is, with the condition
mdFDR SERRIII d q . Proof. Because the second multipliers in (2.163) are less than 1 and taking into account conditions (2.164), we can write mdFDR d P x ( | H P x ( 0 | H 0 P x ( | H
P x ( 0 | H 0 J / p ( H ) J 0 / p ( H 0 ) J / p ( H ) J 0 / p ( H 0 ) q . (2.165) And thus it is proved. Theorem 2.16. QOCBM with restriction levels (2.160) (that is (2.164)), at
satisfying a condition
¦
¦
J ij i^,0, ` j^,0, `, j z i
q , where 0 q 1 , ensures a
Constrained Bayesian Method for Testing Different Type of Hypotheses
97
decision rule with the averaged risk function (2.162) of incorrectly acceptance of tested directional hypotheses less or equal to q ; that is, with Q
the condition rCBM d q . Proof. Let us rewrite risk function (2.162) as follows Q rCBM
p ( H )>P x E 0 | H P x E 0 | x E 0 , H
Px E | H Px E0 | x E , H @ p ( H 0 )>P x E 0 | H 0 P x E | x E 0 , H 0
Px E0 | H 0 Px E | x E0 , H 0 @ p ( H )>P x E | H P x E 0 | x E , H
Px E0 | H Px E0 | x E0 , H @ d d p ( H )>P x E0 | H P x E | H @ p ( H 0 )>P x E 0 | H 0 P x E 0 | H 0 @ p ( H )>P x E | H P x E 0 | H @
p( H )
J 0 J p( H )
p( H 0 )
¦
J 0 J 0 p( H 0 )
¦
J ij i^, 0, ` j^, 0, `, j z i
p( H ) q.
J J 0 p( H ) (2.166)
That proves the theorem. Theorem 2.17. For given restriction levels in (2.160) when the minimum
value of the divergence between hypotheses H i and H j tends to infinity, Q that is min J ( H i , H j ) o f , i, j ^,0,` , i z j , both risk function ( rCBM
^i , j`
98
Chapter 2
) and mdFDR , for fixed Lagrange multipliers defined by formulae (2.160) at satisfaction of (2.165) and (2.166), respectively, tend to zero. Proof. It is not difficult to be convinced that when min J ( H i , H j ) o f , ^i , j`
i, j ^,0,` , i z j , the second multipliers in (2.163) and (2.166) as well tend to zero and the values of the first multipliers are determined by condition (2.160). Therefore, their multiplication tends to zero and, Q accordingly, the values of rCBM and mdFDR tend to zero too.
Q In general, to make a judgment about the ratio between rCBM and Q mdFDR is impossible, even when the values of both rCBM and mdFDR Q are restricted on one and the same level because the value of rCBM depends
on a priori probabilities whereas mdFDR does not. Because of the specificity of hypotheses acceptance regions of CBM (Kachiashvili, 2018a), when testing directional hypotheses using Procedure C a situation can happen when making a simple decision is impossible because test statistics belong to intersection areas of hypotheses acceptance regions or do not belong to any of these regions. In such a situation, to make a simple decision with given reliability on the basis of existing information, i.e. existing sample is impossible and, to overcome this, more information, i.e. the increase in the sample is necessary. When increasing the sample size is impossible, the restriction levels in (2.160) must be changed until a simple decision will not be made. When increasing the information is possible, we have to transfer to the sequential experiment, i.e., to increase a sample size and apply Procedure C to all observation results for making a decision and
Constrained Bayesian Method for Testing Different Type of Hypotheses
99
so on, until a simple decision will not be made. The appropriate sequential procedure of making a decision is given in Procedure D. Procedure D Let us denote the existing sample by x ( x1 ,..., xn ) , test statistics on the basis of n observations - by xn , then the sequential procedure is the following Step 1 -
if xn belongs to only region *
E0
E
E0
E
, accept hypothesis
H , -
if xn belongs to only region *0
0
, accept hypothesis
, accept hypothesis
H0 , -
if xn belongs to only region *
E0
E
H , -
otherwise continue sampling; collect xn1 and compute new test statistics xn1 ;
Step 2 -
if xn 1 belongs to only region *
E 0
E
E0
E
, accept hypothesis
H , -
if xn 1 belongs to only region *0
H0 ,
0
, accept hypothesis
Chapter 2
100
-
if xn 1 belongs to only region *
E0
E
, accept hypothesis
H , -
otherwise continue sampling; collect xn2 and compute new test statistics xn2 ;
etc. The sampling continue until test statistic does not belong to only one region of acceptance of tested hypotheses. Note 2.2. it is clear that in the beginning of the sequential test a sample size can be equal to one, i.e., n 1 and this corresponds to the parallel experiment on which the testing process finishes if the desired level of reliability of making a decision is achievable for this amount of information, otherwise sampling continues, i.e., a parallel experiment goes over to the sequential experiment naturally.
2.6.2. Testing multiple directional hypotheses For testing multiple directional hypotheses (2.117), let us use the concept of the total mixed directional false discovery rate (tmdFDR) (see expression (2.121) in Item 2.4.1). For guaranteeing the level q of tmdFDR , at testing hypotheses (2.117), we have to consider m subsets of directional hypotheses and, for each of them, to use above described procedure D, providing qi level of mdFDR i for i th subset of hypotheses, so that the following was fulfilled
¦
m q i 1 i
q.
We act similarly for providing a level q for the total averaged risk function. Namely, we provide qi level of the appropriate averaged risk
Constrained Bayesian Method for Testing Different Type of Hypotheses 101
function for i th subset of individual directional hypotheses and, as a result, we have Q rCBM
¦
m Q r i 1 i ,CBM
(2.167)
for the total averaged risk function, where riQ,CBM is the averaged risk function of the
i th subset of directional hypotheses (Kachiashvili et al.,
2020). Q ) can be chosen The values of qi in both cases (for tmdFDR and for rCBM
so that they are equal to each other, that is, qi
q / m or they can be
different, equal to the inverse proportions of the informational distances between the test hypotheses of the subgroups of directional hypotheses (Kachiashvili et al. ., 2020). Q on the desired levels, In both cases of restriction of tmdFDR and of rCBM
we use the aforementioned sequential Procedure D where sampling continues, until a simple decision will not be made for all the subsets of (2.117) multiple hypotheses. The stopping rules remain the same as in (2.122) or (2.123), depending on whether the components of the vector
X
X1, X 2 ,..., X m
are observed independently or dependently. The
theorems 2.11 and 2.12, proving properness of stopping rules for both cases, are in force for considered directional hypotheses as well. Currently, in many applications, very actual is a situation when the number of individual hypotheses in the set of multiple hypotheses (2.117) is very big, i.e., when data is big (Bansal et al., 2016; Bansal and Miescke, 2013). In such a situation, determination of Lagrange multipliers for each subset of individual hypothesis requires long time of computation. Though the computation of Lagrange multipliers is implemented before making a
Chapter 2
102
decision, on the preparatory stage, still the reduction of computation time is important for many practical applications for operative and cheapness points of view. For this purpose, the following theorem is proved. Theorem 2.18. Let us individual hypotheses in the set of (2.117) of multiple
hypotheses are stated concerning values of the parameters
T1,...,Tm , and
X i i 1,..., m ( ) test statistics of all subsets of individual hypotheses have one and the same distribution laws, then, if for testing of all subsets of individual hypotheses, we use one and the same Lagrange multipliers determined for a subset of individual hypothesis with lowest divergence among directional hypotheses on the level qi
¦
m q i 1 i
q / m , satisfying condition
q , the total mixed directional false discovery rate (2.121) and
the total risk function (2.167) will be restricted on the level q .
Proof. The correctness of the theorem follows from theorem 2.17 in accordance of which mdFDRi and riQ,CBM ( i 1,..., m ) tend to zero when the Kullback–Leibler divergence among directional hypotheses tends to infinity.
2.7. CBM for Testing Hypotheses Concerning Parameters of Normal Distribution with Equal Mathematical Expectation and Variance In a large variety of practical situations, a normal distribution with equal parameters, that is N T ,T , may be postulated as a useful model for data. In statistical science, one may encounter a situation when a response variable X happens to be the number of (i) calls arriving at a switchboard
Constrained Bayesian Method for Testing Different Type of Hypotheses 103
in a unit time, (ii) claims received daily at an insurance company, (iii) disintegrated particles of radioactive substances in a given period of time, (iv) observed meteorites in a given period of time in the space, or (v) observed false reflected radio-signals at a receiving station. In such situations, X is often viewed as a random variable having a Poisson distribution with a parameter T , especially when T is believed to be large. For making optimal inferences in such a situation, it is important to estimate T accurately. But, when T is assumed to be large, in practice, an original Poisson distribution of X is frequently approximated with a
N T ,T distribution, T ! 0 . Thus, estimation of T or testing some hypotheses concerning T in a N T ,T population are problems of significant practical importance. We address two other important points: (a) suppose that Y has a
N T * , cT *
distribution with T * ! 0 unknown, but c ! 0 is assumed
known. Then, the coded data X
c 1Y obtained from X would be guided
by the distribution N T ,T where T
c 1T * . Here is another important
point: (b) asymptotic distributions of appropriately standardized designed
* * purely sequential stopping times are known to follow a N T , cT
distribution with T * ! 0 unknown and c ! 0 is known. Very detailed sets of interesting exploratory data analysis (EDA) have been reported recently by Mukhopadhyay and Zhang (2018, 2020). Hosts of inference problems have been discussed in the literature. For example, Neyman-Pearson D level most powerful (MP) methods for testing
H 0 : T T0 against an alternative H1 : T T1 ( ! T 0 ) was
considered in Bhattacharjee and Mukhopadhyay (2011). In accordance with
Chapter 2
104
the theorem of Karlin and Rubin (1956), it is the uniformly most powerful (UMP) D level test for testing H 0 : T d T0 against H1 : T ! T1 . They reported various exact and large-sample Neyman-Pearson D level methods having additionally preassigned a Type-II error probability E , 0 E 1 , by determining the requisite sample size n { n(D , E ) . While such determination of n producing both error rates for large values of T0 and
T1 is complicated, in the present work, both Type-I and Type-II error rates are both fixed and preassigned without any additional complexity. Moreover, we consider intricate cases under composite and complex alternative hypotheses. Applications of Wald’s (1947) sequential probability ratio test (SPRT) and random SPRT (RSPRT) from Mukhopadhyay and de Silva (2008) for deciding between H 0 : T
T0 vs. H1 : T T1 where T 0 ,T1 ( T0 z T1 ) were
considered in Bhattacharjee and Mukhopadhyay (2012), with respective target Type-I and Type-II error probabilities 0 D 1 , 0 E 1 , and
D E 1. Here, we may emphasize that SPRT does not work well under the present scenario because a complete and sufficient statistic for T is
n
¦
i 1
X i2 when
a population is N (T ,T ) . This clearly contrasts from the classical case when
¦
n
i 1
X i as a complete and sufficient statistic for T under a N (T ,V 2 )
population. This creates some unexpected complications in implementing Wald’s SPRT. Therefore, Mukhopadhyay and de Silva (2008) incorporated an RSPRT. In the case of both SPRT and RSPRT, they addressed the issue of truncation when observations arrived from a N T ,T distribution.
Constrained Bayesian Method for Testing Different Type of Hypotheses 105
We revisit this problem in this book by offering maximum likelihood and Stein’s methods under the CBM ideology for testing hypotheses concerning
T parameter’s values (Kachiashvili, Mukhopadhyay and Kachiashvili, (unpublished)).
2.7.1. Statement of the problem Let us consider a random variable X distributed as N T ,T , T ! 0 , with a probability density function (PDF) as follows:
f ( x, T )
1 2ST
e
x T 2 2T
, x!0, T .
(2.168)
The problem is to test the hypotheses
H 0 : T T 0 , T0 ! 0 vs. H : 0 T T 0 or H : T ! T 0 ,
(2.169)
on the basis of independent identically distributed (i.i.d.) random variables
X 1 , X 2 ,..., X n with the following likelihood function: g ( x, T )
(2ST ) n / 2 e
1 2T
n
¦ xi T 2 i 1
.
(2.170)
The hypotheses from (2.169) are asymmetric in relation to the parameter
T . A basic null hypothesis specifies a value of the parameter directly whereas the alternative hypotheses do not, but they only indicate some intervals of choice. Thus, we have to test a simple hypothesis versus composite alternatives when the two latter specifications are not symmetric with regard to the basic one. We handle composite hypotheses via the maximum likelihood ratio tests as well as Stein’s method (Wijsman 1967; Anderson 1982; Kachiashvili 2016). In applications of the first method, the MLE of the relevant
Chapter 2
106
parameter of interest must be computed. Stein’s method allows us to identify the uniformly most powerful invariant test. 2.7.1.1. Estimation of the parameter
For defining estimation of the parameter T of the normal distribution (2.168) by maximum likelihood method, the following problem must be solved: n 1 ¦ xi T 2 ° 2 T i 1 max®(2ST ) n / 2 e ^T ` ° ¯
ª 1 « 2T (2ST ) n / 2 e «¬
n
n
i 1
i 1
¦ xi2 ¦ xi
T n º ½ » 2 »¼ °
¾ . (2.171) ° ¿
Because the maximum of (2.171) with respect to T under given n
observations does not involve S and
¦x
i
, instead of (2.171) we can solve
i 1
the following: ª 1 n 2 T n º ½ xi » ° n / 2 «¬« 2T ¦ 2 ¼» ° i 1 e max®T ¾, ^T ` ° ° ¯ ¿
(2.172)
or equivalently the following:
1 n 2 n ½ n max® ln T ¦ xi T ¾ ^T ` ¯ 2 2T i 1 2 ¿ Solution of (2.173) leads to:
T ML
1 1 1 # 2 4 n
n
2 i
¦x
.
i 1
Because of positivity of the parameter T , we have the maximum likelihood estimator (MLE) expressed as:
(2.173)
Constrained Bayesian Method for Testing Different Type of Hypotheses 107
T ML
1 1 1 2 4 n
n
2 i
¦x
(2.174)
i 1
which is positive with a probability of 1 .
2.7.2. Testing hypotheses (2.169) 2.7.2.1. The maximum ratio test
In this case, we test basic hypothesis
H 0 : T T 0 , T0 ! 0 versus H A : T T ML , T 0 z T ML , where
T ML
(2.175)
is defined by (2.174).
Distribution densities at basic and alternative hypotheses have the following forms
p( x | H 0 )
p( x | H A )
1 2ST0
e
1 2ST ML
x T 0 2
e
2T 0
and
x T ML 2 2T ML
,
(2.176)
respectively. It is obvious that the general rule of hypotheses testing, in this case, is: when alternative hypothesis is accepted and T ML T 0 , hypothesis H is accepted, otherwise hypothesis H is accepted. It remains to determine a decision rule for giving preference to one of the distributions (2.176). And for application of such a decision rule, a sample x1, x2 ,..., xn must be divided on two sub-sets x1, x2 ,..., xm and xm1, x2 ,..., xn , where m n . When using, for example, sample x1, x2 ,..., xm , we compute the estimation T ML using
Chapter 2
108
(2.174) and, when using sample xm1, x2 ,..., xn , we give the preference to one of the distributions (2.176). 2.7.2.2. Stein’s method
We have to test the following hypotheses
1
H 0 : p( x | H 0 )
2ST0 T0
H : p( x | H )
³
f
³T
x T 0 2
1 2ST
0
H : p( x | H )
e
0
1 2ST
2T 0
e
e
,
x T 2
2T
J (T )dT ,
(2.177)
x T 2 2T
J (T )dT ,
where J (T ) and J (T ) are the densities of the parameter T at the alternative suppositions H and H , respectively. It is seen from (2.174) that the statistic
1 n
n
2 i
¦x
is a complete and
i 1
sufficient (CS) statistic of T . Let us find its distribution law. For this purpose, let us make the following transformations
1 n
n
¦
xi2
i 1
1 n
2
n
¦ T K T i
i 1
n
§ T K i T · ¨ ¸ ¸ n i 1 ¨© T ¹
T
¦
2
T
2
n
K T , n¦ i
1/ 2
i 1
(2.178) where Ki are independent random variables having the standard normal distributions, i.e., Ki ~ N 0,1 . It is clear that random variable
K T ~ N T ,1 . i
1/ 2
1/ 2
Constrained Bayesian Method for Testing Different Type of Hypotheses 109
It is known (Johnson et al., 2004, p. 433) that the sum squares of n independent normally distributed random variables
K T i
1/ 2
is
distributed according to the non-central chi-square distribution with n degrees of freedom and non-centrality parameter n
2
¦ T
O
1/ 2
nT .
(2.179)
i 1
Therefore, for CS statistics we have 1 n
n
2 i
¦x i 1
~
T 2 F nc nT , n
(2.180)
where F nc2 nT has a non-central distribution with degrees of freedom n and parameter of non-centrality nT . Thus, as a prior densities of T in (2.177) at H and H , we choose noncentral chi-square distributions truncated to the intervals 0 T T0 and
T 0 T f , respectively. Let us denote Fn( a,b) (nT ) and f n( a,b) (nT ) cumulative distribution function (CDF) and PDF of a non-central chi-square distribution truncated to the interval ( a , b ) , respectively. For the untruncated case, i.e., when a 0, b f , we use designations Fn,nT ( x) and f n,nT ( x) . Then, for nT ( a, b) we have (Marchand, 1996; Johnson et al., 2004; p. 436)
Fn(,an,Tb ) ( x)
F n,nT ( x) F n,nT (a) F n,nT (b) F n,nT (a)
(2.181)
and f n(,an,Tb ) ( x)
f n,nT( x) F n,nT (b) F n,nT (a )
,
(2.182)
Chapter 2
110
where f n,nT( x) e ( nT x ) / 2
1§ x · ¨ ¸ 2 © nT ¹
( n 2) / 4
I ( n 2) / 2
nTx , x ! 0 , (2.183)
and
I ( n 2) / 2
nTx
· §1 nTx ¸ ¨ 2 © ¹
( n 2) / 2 f
nTx / 4 j
¦ j!*(n 2) / 2 j 1 .
(2.184)
j 1
The use of distribution (2.181) and (2.182) as a priori densities in (2.177) are associated with difficulties as at determination of conditional densities
p( x | H ) and p( x | H ) , so directly at testing the appropriate hypotheses. On the other hand, when degrees of freedom increase and, in practice, it is greater than 5, i.e., in the considered case n ! 5 , non-central distribution
F nc2 nT is quite good approximated by normal distribution as it is seen from Figures 29.1a, b of (Johnson et al., 2004; p. 444). The use of normal distribution is strengthened also by central limit theorem for sufficiently large n . Therefore, let us use normal approximation for distribution (2.180). For this purpose, we determine mathematical expectation and 1 n
variance of random variable
n
¦x
2 i
. According to (Johnson et al., 2004;
i 1
p. 447)
§1 E¨ ¨n © §1 var¨ ¨n ©
n
n
i 1
i 1
· T2
2¸ i ¸
¦x
· T (n nT ) T (1 T ) , ¹ n
2¸ i ¸
¦x ¹
n2
2(n 2nT )
2T 2 (1 2T ) . n
Truncated normal distributions to the intervals
(2.185)
0 T T0
T 0 T f with the parameters (2.185) have the following forms
and
Constrained Bayesian Method for Testing Different Type of Hypotheses 111
f ( x;T (1 T ), 2T (1 2T ) ,0, T 0 ) § x T (1 T ) · ¸ T 2(1 2T ) / n ¨© T 2(1 2T ) / n ¸¹ § T T (1 T ) · § · ¸ )¨ 0 T (1 T ) ¸ )¨ 0 ¨ T 2(1 2T ) / n ¸ ¨ T 2(1 2T ) / n ¸ © ¹ © ¹ 1
M¨
§ x T (1 T ) · 1 ¸ M¨ T 2(1 2T ) / n ¨© T 2(1 2T ) / n ¸¹ § T T (1 T ) · § · T (1 T ) ¸ 1 )¨ ¸ )¨ 0 ¨ T 2(1 2T ) / n ¸ ¨ T 2(1 2T ) / n ¸ © ¹ © ¹
,
f ( x;T (1 T ), 2T (1 2T ) , T 0 , f) § x T (1 T ) · ¸ T 2(1 2T ) / n ¨© T 2(1 2T ) / n ¸¹ . § T T (1 T ) · 0 ¸ 1 )¨ ¨ T 2(1 2T ) / n ¸ © ¹ 1
M¨
(2.186)
Finally, for the conditional densities, we have
H 0 : p( x | H 0 )
H : p( x | H )
H : p( x | H )
1 2ST0 T0
³
0
f
³T
0
e
x T 0 2
1 2ST 1 2ST
2T 0
e
e
,
x T 2 2T
f ( x;T (1 T ),T 2(1 2T ) / n ,0,T 0 )dT ,
x T 2 2T
f ( x;T (1 T ),T 2(1 2T ) / n ,T 0 , f)dT . (2.187)
Unfortunately, the use of a priori densities (2.186) instead of (2.182) in (2.177) does not substantially simplify the expressions of conditional
Chapter 2
112
densities p( x | H ) and p( x | H ) and their application for testing hypotheses remains problematic. Because testing informationally distanced hypotheses leads to small probabilities of erroneous decisions and these probabilities are smaller for bigger distances, we may consider some finite interval (T 0 , kT0 ) instead of the infinite interval (T 0 , f) at H , with k
2,3, 4,... . But this arbitrariness will affect the final result insignificantly.
The change will be smaller as k is larger. The transition to the finite intervals allows us to use the uniform distributions for densities J (T ) and
1
J (T ) , i.e., let us use J (T )
T0
1 kT 0 T 0
and J (T )
1 in (k 1)T 0
(2.177). Then, instead of (2.169), we may test the hypotheses:
H 0 : T T 0 , T 0 ! 0 vs. H : 0 T T0 or H : T 0 T kT 0 (2.188) and the appropriate conditional densities are
1
H 0 : p( x | H 0 )
2ST0
H : p( x | H ) 1
T 0 2S
T0
³
0
T
H : p( x | H ) 1 ( k 1)T 0
kT 0
³T
0
0
2ST
x T 2 1 2T 2e
2ST
e
e
(2.1891)
,
x T 2 2T
1
T0
dT
1
T0
e
0
1 2ST
e
x T 2 2T
dT
(2.1892)
x T 2 2T
x T 2 2T
T0
³
dT ,
2ST
0
1
2T 0
1
kT 0
³T
x T 0 2
1
T0
³
e
dT
1 dT (k 1)T0
Constrained Bayesian Method for Testing Different Type of Hypotheses 113 2
1
kT 0
(k 1)T 0 2S
³T
T
x T 1 2T 2e
dT .
(2.1893)
0
2.7.3. CBM for testing hypotheses (2.169) under conditional distributions (2.189) For testing hypotheses (2.169) at conditional distributions (2.189), let us make denotations: * j is the region of acceptance of hypothesis H j and it is determined by decision function G ( x)
^G1 ( x),G 2 ( x),...,G S ( x)` where we
denote: 1 if H j is accepted , ° G j ( x) ® otherwise, °0, ¯
that is, * j
^x : G j ( x) 1`.
Let us suppose that L1 ( Hi , G j ( x) 1) and L2 ( Hi , G j ( x) 0) are the losses due to incorrectly accepting and incorrectly rejecting the hypotheses. Then the total loss of incorrectly accepted and incorrectly rejected hypotheses
L( H i , G ( x)) is determined by (2.1). Adapting the denotations to the considered hypotheses, let us consider Task 2 and Task 7 of CBM (see Items 2.3.2 and 2.3.5) for testing hypotheses (2.169) (Kachiashvili, 2018). For Task 7 (see Item 2.3.5) Theorems 2.8 and 2.9 were proved for directional alternatives which are in force for the considered case (Kachiashvili et al., 2020). At consideration of Task 2 (see Item 2.3.2), the following theorems are truthful.
Chapter 2
114
Theorem 2.19. CBM 2 at statement (2.79), (2.95), for hypotheses (2.169),
conditional distributions (2.189) and losses (2.78), ensures a decision rule with the error rates of Type-I and Type-II, restricted by the following inequalities
Dd
Ed
r20 , K 0 p( H 0 )
r2 r2 . K 0 p( H ) K 0 p( H )
(2.190)
Proof. The Type-I and Type-II error rates for hypotheses (2.169) are the following
D
P( x * | H 0 ) P( x * | H 0 )
³
*
p ( x | H 0 ) dx
³
*
p ( x | H 0 ) dx ,
(2.191)
and
E
P( x *0 | H ) P( x *0 | H )
³
*0
p ( x | H ) dx
³
*0
p ( x | H ) dx .
(2.192)
Hypotheses acceptance regions in CBM have special properties (2.90) (Kachiashvili et al., 2019). Because of specificity (2.90) of hypotheses acceptance regions in CBM and restrictions (2.95), we can write
D
³
*
d 1
and
p ( x | H 0 ) dx
³
*0
³
*
p ( x | H 0 )dx d
p ( x | H 0 ) dx d
r20 p( H 0 ) K 0
Constrained Bayesian Method for Testing Different Type of Hypotheses 115
³
E
ª d «1 ¬
³
*
*0
p ( x | H ) dx
³
*0
p ( x | H ) dx d
ª d 1 « ¬
³
p ( x | H ) dx
³
º p ( x | H ) dx » ¼
ª 1 « ¬
³
p ( x | H ) dx
³
º p ( x | H ) dx » d ¼
*
*
º ª p ( x | H )dx » «1 ¼ ¬
³
*
*
*
r2 r2 º p ( x | H )dx » d . ¼ p( H ) K 0 p( H ) K 0
Theorem 3.20. CBM 2 at statement (2.79), (2.95), for hypotheses (2.169),
conditional distributions (2.189) and losses (2.78), ensure a decision rule with rG the averaged loss of incorrectly accepted hypotheses and D s the averaged loss of incorrectly rejected hypotheses is restricted by the following inequalities rG d
K1 r2 r20 r2 , K0
>
@
D s d r2 r20 r2 .
(2.193)
Proof. Taking into account (2.79), restrictions (2.95) and condition (2.90), we can write § rG d p ( H ) K1 ¨1 ©
³
*
§ · p( x | H )dx ¸ p ( H 0 ) K1 ¨1 ¹ ©
§ p ( H ) K 1 ¨1 © d p ( H ) K1
³
*
³
*0
· p( x | H 0 )dx ¸ ¹
· p ( x | H ) dx ¸ d ¹
r2 r20 p ( H 0 ) K1 p( H ) K0 p( H 0 ) K 0
p ( H ) K1
r2 p( H ) K 0
K1 r2 r20 r2 , K0
Chapter 2
116
and
Ds
p( H ) K 0
p ( x | H ) dx p ( H 0 ) K 0
³
R n *
p( H ) K 0
§ p ( H ) K 0 ¨1 ©
³
*
³
R n *
R n *0
p ( x | H )dx
§ · p( x | H )dx ¸ p ( H 0 ) K 0 ¨1 ¹ ©
§ p ( H ) K 0 ¨1 ©
³
*
p ( x | H 0 ) dx
³
³
*0
· p( x | H 0 )dx ¸ ¹
· 0 p ( x | H ) dx ¸ d r2 r2 r2 . ¹
2.7.4. Testing hypotheses (2.175) at conditional distributions (2.176) using CBM 2 In this case, we have to solve the following conditional problem: to minimize rG
§ min ® p ( H 0 )¨ ©
^*0 ,*A `¯
³
*A
*0
L1 ( H 0 , H 0 ) p ( x | H 0 ) dx
· L1 ( H 0 , H A ) p ( x | H 0 ) dx ¸ ¹
§ p ( H A )¨ ©
³
³
*0
L1 ( H A , H 0 ) p ( x | H A )dx
·½ L1 ( H A , H A ) p( x | H A )dx ¸¾ , *A ¹¿
³
subject to ª p( H0 ) « ¬
³
³
R n *A
R n *0
L2 ( H 0 , H 0 ) p ( x | H 0 )dx ,
º L2 ( H 0 , H A ) p ( x | H 0 ) dx » d r20 ¼
(2.194)
Constrained Bayesian Method for Testing Different Type of Hypotheses 117
ª p( H A ) « ¬
³
³
R n *A
R n *0
L2 ( H A , H 0 ) p ( x | H A )dx
º L2 ( H A , H A ) p ( x | H A ) dx » d r2A . ¼
(2.195)
For the losses (2.78), the problem (2.194), (2.195) transforms in the form
min ® p ( H0 ) K1
rG
^*0 ,*A `¯
³
*A
p( x | H0 )dx p ( H A ) K1
³
*0
½ p ( x | H A )dx ¾ (2.196) ¿
subject to § p ( H 0 ) K 0 ¨1 ©
³
· p( x | H0 )dx ¸ d r20 , ¹
§ p ( H A ) K 0 ¨1 ©
³
· p ( x | H A ) dx ¸ d r2A . ¹
*0
*A
(2.197)
The solution of the problem (2.196), (2.197) by the Lagrange method gives
*0 *A
^x : p(H A ) K1 p( x | H A ) O0 p(H0 ) K0 p( x | H0 )` , ^x : p(H 0 ) K1 p( x | H 0 ) OA p(H A ) K0 p( x | H A )`,
(2.198)
where Lagrange multipliers O0 and OA are determined so that in (2.197) the equalities take place. Let us determine O0 and OA for the conditional densities (2.176). Hypotheses acceptance regions (2.198) for (2.176) take the form
*0
p( x | H A ) p( H 0 ) K 0 ½ O0 ®x : ¾ p( H A ) K1 ¿ ¯ p( x | H 0 )
½° ° 2T MLT 0 ª 1 T0 º 2 ®x : x «ln O0 ln d 0 ln » T MLT 0 ¾ , (2.199) T ML T 0 ¬ 2 T ML ¼ °¿ °¯
Chapter 2
118
where d 0
p( H 0 ) K 0 , and p( H A ) K1 *A
p( x | H A ) 1 p( H 0 ) K1 ½ ! ®x : ¾ ¯ p( x | H 0 ) O A p( H A ) K 0 ¿
½° ° 2T 0T ML ª 1 T0 º 2 ®x : x ! «ln d A ln O A ln » T 0T ML ¾ , (2.200) T ML T 0 ¬ 2 T ML ¼ °¿ °¯ where d A
p( H 0 ) K1 . p( H A ) K 0
Note 2.3. Intervals of searching O0 and OA must be chosen so that the values
J0
2T MLT 0 ª
T ML T 0 «¬
ln O0 ln d 0 1 ln 2
T0 º » T MLT 0 T ML ¼
(2.201)
and
JA
2T 0T ML ª T0 º 1 «ln d A ln O A ln » T 0T ML (2.202) T ML T 0 ¬ 2 T ML ¼
are determined positively. Lagrange multipliers O0 and OA must be determined from the conditions (2.197) that can be rewritten as follows p ( x *0 | H 0 ) t 1
r20 , p( H0 ) K 0
p ( x *A | H A ) t 1
r2A . p( H A ) K 0
(2.203)
Let us rewrite hypotheses acceptance regions (2.199) and (2.200) as follows
Constrained Bayesian Method for Testing Different Type of Hypotheses 119
*0
^x : x
2
`
^x : x
J 0 , and *A
`
2
!JA .
(2.204)
Then conditions (2.203) can be written in the form p( x 2 J 0 | H0 ) t 1
r20 p( H0 ) K 0
r2c 0 ,
p( x 2 ! J A | H A ) t 1
r2A p( H A ) K 0
r2c A .
(2.205)
As was mentioned above (see (2.180)) x 2 ~ T F1c 2 (T ) . Therefore, conditions (2.205) take the forms J 0 / T0
p ( x 2 J 0 | H0 )
2
³ F c ( x;T )dx t r c 1
0
2
0
,
0
f
p( x 2 ! J A | H A )
2
³T F c ( x;T 1
JA/
ML ) dx
t r2c A .
(2.206)
ML
In accordance with Abdel-Aty’s (1954) distribution function, P ( x;1, T ) of x 2 is quite accurate and can be approximated by the formula
§ § x ·1 / 3 § 2 ¨¨ ¸ ¨¨1 ¨ ©1T ¹ f 9 © P ( x;1,T ) | )¨ 2 ¨ ¨ 9f ©
· ·¸ ¸¸ ¸ ¹ , where f ¸ ¸ ¸ ¹
1
T2 1 2T
. (2.207)
Thus, for finding Lagrange multipliers O0 and OA , we have to solve the following equations
§ § J / T ·1 / 3 § · ¨ ¨ 0 0 ¸ ¨1 2 ·¸ ¸ ¨ 9f ¸¸ ¨ ¨ 1 T0 ¸ T 02 0 ¹ © ¹ ¸ r2c0 , where f 0 1 )¨ © , (2.208) 1 2T 0 2 ¨ ¸ ¨¨ ¸¸ 9 f0 © ¹ and
Chapter 2
120
§ § J /T ¨ ¨ A ML ¨ ¨ 1 T ML 1 )¨ © ¨ ¨¨ ©
1/ 3
· ¸¸ ¹
§ 2 · ·¸ ¸¸ ¨¨1 2 ¸ T ML © 9 f ML ¹ ¸ r c A . 2 , where f ML 1 1 2T ML 2 ¸ ¸¸ 9 f ML ¹ (2.209)
On the other hand, in accordance with Chebyshev’s inequality, we can rewrite inequalities (2.205) as follows p( x 2 J 0 | H0 ) 1 p( x 2 t J 0 | H0 ) d 1 p( x 2 ! J A | H A ) d
E(x2 | H A )
1
JA
E ( x 2 | H0 )
J0
1
r20 , p( H0 ) K 0
r2A . p( H A ) K 0
(2.210)
From here we find E ( x 2 | H0 )
T 0 (1 T 0 ) J0
J0 E(x2 | H A )
r20 , p( H0 ) K 0
r2A T A (1 T A ) . 1 JA p( H A ) K 0
JA
(2.211)
Taking into account (2.199) and (2.200), and notations (2.201) and (2.202), on the basis of (2.211), we can write
J0 2T MLT 0 ª
T ML T 0 «¬ O0 , and
p( H0 ) K 0 T 0 (1 T 0 ) r20
ln O0 ln d 0 1 ln 2
T0 º » T MLT 0 T ML ¼
, p ( H 0 ) K 0 T 0 (1 T 0 ) r20
,
°ª p( H ) K T (1 T ) ½° º T ML T 0 1 T 0 0 0 0 0 exp®« d T T ln ln » ¾ ML 0 0 2 T ML r20 °¯¬« °¿ ¼» 2T MLT 0 (2.212)
Constrained Bayesian Method for Testing Different Type of Hypotheses 121
JA
T A (1 T A ) § r2A ¨1 ¨ p( H A ) K 0 ©
· ¸ ¸ ¹
,
2T 0T ML ª 1 T0 º «ln d A ln O A ln » T 0T ML T ML T0 ¬ 2 T ML ¼
OA
T A (1 T A ) § · r2A ¨1 ¸ ¨ p( H A ) K 0 ¸¹ ©
,
½ ª º ° ° « » ° 1 T 0 « T A (1 T A ) » T ML T 0 ° T 0T ML » exp®ln d A ln ¾. 2 T ML « § 2T 0T ML ° · r2A ° « ¨1 » ¸ ° ° « ¨© » p( H A ) K 0 ¸¹ ¬ ¼ ¯ ¿ (2.213)
2.8. Constrained Bayesian Method for Testing EquiCorrelation Coefficient of a Standard Symmetric Multivariate Normal Distribution 2.8.1. Introduction Symmetric Multivariate Normal Distribution (SMND) is widely used in many applications of different spheres of human activities such as psychology, education, genetics, and so on (SenGupta, 1987). It is also using for solving different statistical problems, for example, in analysis of variance (ANOVA) for modeling of the error part (De and Mukhopadhyay, 2019). A random vector has a SMND if its components have equal means, equal variances and an equal correlation coefficient between the pairs of components. The last is called equi-correlation coefficient (Rao, 1973; SenGupta, 1987). A vector has a standard symmetric multivariate normal distribution (SSMND) when the components have zero mean and unite variances. Consideration of SSMND instead of SMND does not reduce
122
Chapter 2
generality when the means and variances are known. On the other hand, SSMND is interesting from several theoretical aspects (SenGupta, 1987, p. 2): it is an invariant model which belongs to a curved exponential family (Efron, 1975; Anderssen, 1976), it allows us to find a simple estimator of the correlation coefficient (Sampson, 1976), and it can be derived by a small sample optimal test for the correlation coefficient (SenGupta, 1987). Due to the told the inference problem of a correlation coefficient of such model is quite actual and to its solution a lot of works are dedicated starting from early 40th of last century (Mahalanobis, 1940) and more recently (Mukhopadhyay and de Silva, 2009; Zacks, 2009; Banerjee and Mukhopadhyay, 2016). In the majority of these works, the problem of finding estimators of the correlation coefficient was considered (see, for example, Sampson, 1976, 1978; Ghosh and Sen, 1991; Ghosh et al., 1997; De and Mukhopadhyay, 2015), whereas, comparatively rarely the testing problem was considered (SenGupta, 1987). Likelihood ratio test (LRT), for general cases when testing H 0 : U
U 0 against H 2 : U z U 0 , and the
locally most powerful test (LMPT) when testing H 0 : U
0 against
H 1 : U ! ( ) 0 is offered by SenGupta in his work (1987). LMPT is based
on the best (minimum variance) natural unbiased estimator (BNUE) of U . In the present work, we consider the Constrained Bayesian Method (CBM) for testing the above given hypotheses concerning the equi-correlation coefficient using exact and asymptotical probability density functions (PDFs) of test statistics obtained in (SenGupta, 1987; De and Mukhopadhyay, 2015; Zacks and Raming, 1987). This allows us to make a decision with the restricted criteria of optimality on the desired levels that is not achieved by the testing rules given in (SenGupta, 1987). Along with the theoretical results, the results of a simulation are offered in Item 4.6 for
Constrained Bayesian Method for Testing Different Type of Hypotheses 123
the demonstration of the validity of this theory, the obtained results, and an investigation of their properties.
2.8.2. Statement of the problem We consider the problem of testing hypotheses concerning equality to a concrete value of the correlation coefficient of SMND in this paragraph which can be formally presented as follows. Let us consider a k -dimensional X random vector which obeys the k variate normal distribution with a zero mean vector and correlation matrix
W of the following structure:
W (1 U) I kuk U J kuk ,
(2.214)
where I kuk is an identity matrix and J kuk is a matrix of ones. It is known (SenGupta, 1987) that W 1
(cij ) k uk ,
where
cii cij
^1 (k 2)U`/^(1 U)>1 (k 1)U@` , U /^(1 U )>1 (k 1)U @` , i z j ,
and the density function of X for non-singular W is the following
p( X ; U )
¦
1 ( 2S )
k/2
2 ª§ k 2 · § k x · U º ½° xi ¸ ° ¨ ¨ ¸ « » i i 1 ° 1 ¹ © i 1 ¹ » °¾ , exp ® « © 2 1 U 1 ( k 1 ) U 1 U « »° ° « »° °¯ ¬ ¼¿
W
1/ 2
¦
(2.215) f xi f , i 1,..., k ; 1 /( k 1) U 1 .
Chapter 2
124
As the interval of U , 1 /( k 1),1 is taken, for to be non-singular the correlation matrix W (De and Mukhopadyay, 2019), i.e. for existence of W 1 .
Let us introduce Y1 , Y2 ,... , the Helmert orthogonal transformation of uncorrelated multivariate normal vectors sequence
Yi
X 1 , X 2 ,... , i.e.,
H X i , i 1,2,..., and H is the Helmert matrix given as (De and
Mukhopadyay, 2019)
H
1 1 1 ª 1 º ... « » k k k « k » « 1 » 1 0 ... 0 « » 2 « 2 » « 1 » 1 2 0 ... 0 « » . (2.216) 6 6 « 6 » «........................................................ » « » 1 1 1 « » ... « » k (k 1) k (k 1) » « k (k 1) ¬« ¼»
Then each of Y1 , Y2 ,... , are i.i.d. k -dimensional random vectors with zero means and a diagonal correlation matrix
V
HWHT
diag(v1,...,vk ) ,
(2.217)
where v1 ,...,vk are the eigenvalues of the correlation matrix W , i.e.,
Yi ~ Nk 0,V (De and Mukhopadyay, 2019). It is known (Rao, 2006) that v1 1 (k 1) U and v2
... vk 1 U .
(2.218)
Let Y1,...,Yn be i.i.d. observations, i.e., sample from Nk 0,V and introduce variables
Constrained Bayesian Method for Testing Different Type of Hypotheses 125 n y2 i 1 i1
¦
V1n
where Yi
yi1, yi2 ,..., yik , i
and V2n
n
k
¦ ¦ i 1
y2 j 2 ij
,
(2.219)
1,..., n .
The maximum likelihood estimation (MLE) of the correlation coefficient
U is (Zacks and Raming, 1987; De and Mukhopadyay, 2019)
Uˆ n
V1n V2 n (k 1) 1 . V1n V2 n
(2.220)
The problem that we want to solve can be formulated as follows: to test
H0 : U U0 , vs. H1 : U (!)U0 ,
(2.2211)
or
H0 : U U0 vs. H2 : U z U0 ,
(2.2212)
on the basis of the sample Y1,...,Yn . Here Yi is a k -dimensional random vector with independent components, each of which has zero mean and variances determined by (2.218). Probability density function (PDF) of Yi is
p(Yi | U ) (2S )
k 2
V
1 2
1 ½ exp® Yi V 1 YiT ¾ , i 1,..., n . ¯ 2 ¿
(2.222)
Joint PDF of the sample Y1,...,Yn has the following form (De and Mukhopadyay, 2019)
p(Y1,...,Yn | U ) (2S ) kn / 2 V
(2S ) kn / 2 (1 (k 1) U )
n 2
° 1 exp® °¯ 2
n n( k 1) 2 (1 U ) 2
n
¦Y V i
i 1
1
½° YiT ¾ °¿
° 1 § V1n V ·½° exp® ¨¨ 2n ¸¸¾ , °¯ 2 © 1 (k 1) U 1 U ¹°¿ (2.223)
where V1n and V2n are defined by (2.219).
Chapter 2
126
2.8.3. Testing (2.221) hypotheses Let us transform hypotheses (2.221) into directional ones, i.e., instead of (2.221) consider hypotheses (Kachiashvili and SenGupta, (unpublished-a)):
H0 : U U0 vs. H : 1/(k 1) U U0 or H : U0 U 1, (2.224)
U, U0 1/(k 1),1 . 2.8.3.1. The maximum ratio test
In this case, we test hypotheses
H0 : U U0 vs. H A : U Uˆ n , U0 z Uˆ n ,
(2.225)
where Uˆn is defined by (2.220). For testing hypotheses (2.225), we use the sample Y m
(Yn1 ,...,Ynm )
and PDF (2.223) with the parameters U0 and Uˆn for the validity of hypotheses H0 and H A , respectively, i.e., PDFs at basic and alternative hypotheses are determined by (2.223) using the values of U0 and Uˆn for basic and alternative suppositions, respectively. The algorithm for making a decision is the following. Let us designate
Y1,...,Yn ,Yn1,...,Ynm i.i.d. random vectors obtained by Helmert orthogonal transformation of the sample X1,..., X n , X n1,..., X nm , i.e., Yi
H Xi ,
i 1, 2,..., n m . On the basis of the first half of the vectors Y1,...,Yn , we
compute MLE of the parameter U by (2.220) and on the basis of the second half of the sample Yn1,...,Ynm , we test hypotheses (2.225). PDFs used for testing
are
the
p(Yn1,...,Ynm | H A )
following:
p(Yn1 ,...,Ynm | H 0 )
p(Y m | U0 )
and
p(Y m | Uˆ A ) determined by (2.223). When testing
Constrained Bayesian Method for Testing Different Type of Hypotheses 127
(2.225) and decision is made in favor of the alternative hypothesis, we accept H if Uˆ n U0 , otherwise H . 2.8.3.2. Stein’s approach
The integrated probability distribution densities (PDD) of the sample
Ym
(Yn1 ,...,Ynm ) are necessary in this case for the hypotheses under
consideration, i.e., the following PDDs: m
H0 : p(Y | H0 ) (2S )
mk 2 (1 (k
1) U0 )
° 1 § V1m V exp ® ¨¨ 2m °¯ 2 © 1 ( k 1) U 0 1 U 0
m m( k 1) 2 2 (1 U ) 0
·½° ¸¾ , ¸° ¹¿
(2.226)
U0
H : p(Y m | H )
³ p(Y
m
| U ) J ( U )dU ,
(2.227)
1 /(k 1) 1 m
H : p(Y | H )
³ p(Y U
m
| U ) J ( U )dU ,
(2.228)
0
m where p(Y | U ) is defined by (2.223) for the sample Y m
(Yn1 ,...,Ynm )
; J ( U ) and J ( U ) are the densities of the parameter U at the alternative suppositions H and H , respectively. Because at alternative hypotheses H and H the value of the parameter
U belongs to the intervals (1/(k 1), U0 ) and (U0 ,1) , respectively, and no information about preferability of some values from these intervals, we use the uniform distributions of U in the appropriate intervals, i.e. (2.227) and (2.228) PDDs transform in the following forms:
Chapter 2
128
H : p(Y m | H ) (2S )mk / 2 U0
³
(1 (k 1) U )
m m( k 1) 2 (1 U ) 2
1 /(k 1)
1
U0 (k 1)1
° 1 § V1m V ·½° exp® ¨¨ 2m ¸¸¾dU , °¯ 2 © 1 (k 1) U 1 U ¹°¿ (2.229)
H : p(Y m | H ) (2S ) mk / 2 1
(1 (k 1) U )
³ U
m m( k 1) 2 (1 U ) 2
0
1 1 U0
° 1 § V1m V ·½° exp® ¨¨ 2m ¸¸¾dU , °¯ 2 © 1 (k 1) U 1 U ¹°¿ (2.230)
where V1m
Ym
and V2m are computed by (2.219) for the sample
(Yn1 ,...,Ynm ) .
2.8.4. CBM for testing (2.224) hypotheses Let us consider CBM for testing (2.224) hypotheses. One of possible statement of CBM, namely, so called Task 7 (Kachiashvili, 2018a; Kachiashvili et al., 2020) has the following form GG
max
^K >p( H
^* ,*0 ,* `
0
) P (Y
m
* | H ) p ( H 0 ) P (Y m *0 | H0 )
@`
p( H ) P(Y m * | H ) ,
(2.231)
subject to
> K >p ( H ) P(Y K >p ( H ) P(Y
@ * | H )@ d r , * | H )@ d r ,
K1 p ( H 0 ) P(Y m * | H 0 ) p( H ) P(Y m * | H ) d r7 , 1
1
m m
*0 | H ) p( H ) P(Y m * | H ) p( H 0 ) P(Y m
0
0 7
0
7
(2.232)
Constrained Bayesian Method for Testing Different Type of Hypotheses 129
where p (H0 ) , p ( H ) and p ( H ) are a priori probabilities of the appropriate hypotheses, P (Y m *i | H j ) , i , j ( ,0, ) , is a probability of acceptance of Hi hypothesis at validity of H j hypothesis on the basis of Y m ; K1 and K0 define the values of losses at incorrectly accepted
hypotheses, i.e., they define the loss functions of incorrect acceptance and incorrect rejection of hypotheses which are the following m
L1 ( H i , G j (Y ) 1)
0 at i j , ° and ® ° K1 at i z j; ¯
K 0 at i j , ° L2 ( H i , G j (Y ) 0) ® °0 at i z j; ¯ m
^G (Y
G (Y m )
m
(2.233)
`
),G 0 (Y m ),G (Y m ) is a decision function, the components
of which 1 if H j is accepted , ° G j (Y ) ® otherwise, °0 ¯ m
*j
^Y
m
`
: G j (Y m ) 1 , j ( ,0, ) , is H j hypothesis acceptance region;
r7 , r7 and r7 are restriction levels of the averaged losses of incorrectly acceptance of hypotheses H , H0 and H , respectively. The solution of the problem (2.231), (2.232) by undetermined Lagrange multipliers gives hypotheses acceptance regions *
½ ° m 1 m m m ° ®Y : K1 p ( H0 | Y ) p ( H | Y ) K 0 p ( H | Y )¾ , °¿ °¯ O7
Chapter 2
130
*0
½ ° m 1 m m m ° ®Y : K1 p ( H | Y ) p ( H | Y ) 0 K 0 p ( H0 | Y )¾ , °¿ °¯ O7
*
½ ° m 1 m m m ° ®Y : K1 p ( H | Y ) p ( H0 | Y ) K 0 p ( H | Y )¾ , °¿ °¯ O7
(2.234) where Lagrange multipliers O7 , O07 and O7 are determined so that in the conditions (2.232) the equalities take place. In accordance with the theorems proven in (Kachiashvili et al., 2020), when the circumstances satisfied,
the
O7 O7 K1 Pmin
following
q or
O7 O07 O7
conditions
q ( 0 q 1 ) are
K1 Pmin mdFDR
SERR III d q
or
mdFDR FAR d q are fulfilled, where the following criteria of optimality
of made decisions are used: mixed directional false discovered rate (
mdFDR ), the summary type III error rate ( SERR III ) and false acceptance rate ( FAR ). They are determined by the following rations
mdFDR P(Y m * | H ) P(Y m * | H 0 ) P(Y m * | H ) P(Y m * | H 0 ) ,
(2.235)
FAR P(Y m *0 | H ) P(Y m *0 | H ) .
(2.236)
Another possible statement of CBM, namely, so called Task 2 (Kachiashvili, 2018a; Kachiashvili et al., 2020) has the following form RG
min
^p ( H
^* ,*0 ,* `
) K1
> p( H ) K >P(Y
>
@
P (Y m *0 | H ) P (Y m * | H )
@ * | H )@`,
p( H0 ) K1 P(Y m * | H0 ) P(Y m * | H0 )
subject to
1
m
* | H ) P(Y m
0
(2.237)
Constrained Bayesian Method for Testing Different Type of Hypotheses 131
P(Y m * | H ) t 1
r2 r20 , P(Y m *0 | H 0 ) t 1 , P( H ) K 0 P( H0 ) K 0
P(Y m * | H ) t 1
r2 , P( H ) K 0
(2.238)
0 where r2 , r2 and r2 are restriction levels in the considered statement.
The solution of the problem of (2.237), (2.238) by Lagrange method, gives the following hypotheses acceptance regions
* *0 *
^Y ^Y ^Y
m m m
: K p( H | Y : K p( H | Y
` ) O K p( H | Y )`, ) O K p( H | Y )`, (2.239)
: K1 p( H0 | Y m ) p( H | Y m ) O2 K 0 p( H | Y m ) , m
1
1
) p( H | Y m
m
) p( H0 | Y m
0 2
0
0
2
0
m
m
0 where Lagrange multipliers O2 , O2 and O2 are determined so that in the
conditions (2.238) the equalities take place. Theorems are proven in (Kachiashvili and SenGupta, (unpublished-a)) in accordance of which (2.239) decision regions ensure restricted error rates of the Type I and Type II, determined by the following ratios
Dd
r20 r2 r2 , Ed , K 0 P( H0 ) K 0 P( H ) K 0 P( H )
(2.240)
also, they ensure the restriction of the averaged loss of incorrectly accepted hypotheses RG and the averaged loss of incorrectly rejected hypotheses DG by the following ratios RG d
K1 r2 r20 r2 , D G d r2 r20 r2 , K0
>
@
(2.241)
where
DG
p( H ) K 0 1 P(Y m * | H ) p( H 0 ) K 0 1 P(Y m *0 | H 0 )
Chapter 2
132
p( H ) K 0 1 P(Y m * | H ) .
2.8.5. Evolution of CBM 2 for testing (2.224) hypotheses 2.8.5.1. Using the maximum ratio estimation
Like (2.239), hypotheses acceptance regions, at testing (2.225) hypotheses using CBM 2, have the following forms
*0 *A
^Y ^Y
` )`, (2.242)
m
: K1 p( HA ) p(Y m | HA ) O02 K 0 p( H0 ) p(Y m | H0 ) ,
m
: K1 p( H0 ) p(Y m | H0 ) O2A K 0 p( HA ) p(Y m | HA
0 A where Lagrange multipliers O2 and O2 are determined from the following
conditions (Kachiashvili and SenGupta, (unpublished-a))
) 1 P (Y
K 0 p( H0 ) 1 P (Y m *0 | H0 ) K 0 p ( HA
m
r20 ,
*A | HA )
r2A ,
(2.243)
0 A where r2 and r2 are the restriction levels at validity of H0 and H A ,
respectively. For the conditional PDFs of Y m , we have (2.226) at validity of H0 and the following at validity of H A
H A : p(Y m | H A ) (2S )
mk 2 (1 (k
1) Uˆ n )
m 2 (1 U ˆ
° 1 § V1m V exp ® ¨¨ 2m ˆ k 2 1 ( 1 ) 1 U Uˆ n °¯ n ©
·½° ¸¾ , ¸° ¹¿
n)
m( k 1) 2
(2.244)
0 A Let us determine Lagrange multipliers O2 and O2 in (2.242) for
satisfaction of (2.243). Hypotheses acceptance regions (2.242) for the conditional densities (2.226) and (2.244) after simple transformations take the forms
Constrained Bayesian Method for Testing Different Type of Hypotheses 133
^Y : d m
*0
0 1
`
V1m d 20 V2m 2 g 20 g10 ,
(2.245)
where d10
1 1 , 1 (k 1) U 0 1 (k 1) Uˆ n 1 1 , 1 U 0 1 Uˆ n
d 20 g10
m § 1 (k 1) U 0 ln ¨¨ 2 © 1 (k 1) Uˆ n
^Y : d m
· ¸, ¸ ¹
§ p( H 0 ) K 0 · ¸. ln ¨¨ O02 p ( H A ) K1 ¸¹ ©
g 20
*A
· m(k 1) § 1 U 0 ¸ ln ¨¨ ¸ 2 ¹ © 1 Uˆ n
A 1
(2.246)
`
V1m d 2A V2m 2 g 2A g1A ,
(2.247)
where
d1A
d10 , d 2A
d 20 , g1A
g 2A
§ p( H A ) K 0 ln ¨¨ O2A p ( H 0 ) K1 ©
· ¸. ¸ ¹
g10 , (2.248)
0 A For determination of Lagrange multipliers O2 and O2 from the conditions
(2.243),
computation
of
the
probabilities
P (Y m *0 | H0 )
and
P (Y m *A | HA ) are essential. For this purpose, the knowledge of the probability distribution functions of the random variables
[0
d10 V1m d 20 V2m ,
[A
d1A V1m d 2A V2m ,
(2.249)
are necessary. The properties of V1m and V2m are given in (SenGupta, 1987; De and Mukhopadyay, 2019):
Chapter 2
134
(i)
V1m ~ (1 (k 1) U ) F m2 ,
(ii)
V2 m ~ (1 U ) F m2 ( k 1) ,
(iii)
V1m and V2m are independent.
(2.250)
The distribution function of a linear combination of chi-square random variables is considered in many works (Fleiss, 1971; Solomon and Stephens, 1977; Moschopoulos and Canada, 1984; Coelho, 2020). Below, we use the results of (Moschopoulos and Canada, 1984), in accordance of l which the distribution function F ( z, U ) of [l , l (0, A) , from (2.249) is
F l ( z, U ) b2l
f
¦
j 0
a lj
³
z
0
f jl ( y, U )dy ,
l where formulae for the determination of coefficients b2 , a lj , j
(2.251) 0,1,... ,
and f jl ( y , U ) density are given in Appendix A1. Using (2.251), conditions (2.243) will be written as follows
F 0 (2 ( g 20 g10 ), U 0 ) 1
r20 , p( H 0 ) K 0
F A (2 ( g 2A g1A ), Uˆ n ) 1
r2A , p( H A ) K 0
(2.252)
0 A by solving which with respect to Lagrange multipliers O2 and O2 and using
the sample Y m , we make a decision depending on which of the conditions (2.242) is satisfied. Note 2.4. Because of specificity of hypotheses acceptance regions in CBM, in that they can be intersected or their union can not recover the observation space, there can arise the situation when not one of (2.225) hypotheses are
Constrained Bayesian Method for Testing Different Type of Hypotheses 135
accepted on the basis of the sample Y m
(Yn1 ,...,Ynm ) . In this case, we
pass to sequential experiment, i.e., we increase the sample size by one and test hypotheses (2.225). If a unique decision cannot made on the basis of the sample Y m1
(Yn1,...,Ynm , Ynm1 ) , we again increase its size by one, i.e.,
we obtain the sample Y m2
(Yn1,...,Ynm , Ynm1 , Ynm2 ) and so on until
a unique decision will not be made. Note 2.5. Decision making procedures can be purely sequential when we start testing (2.225) hypotheses for m 1 . Otherwise, it is combined: after parallel experiment, if on the basis of m observations a simple decision is not made, we pass to the sequential experiment until a decision will not be made. The notes 2.4 and 2.5 concern Stein’s approach, which is described below. 2.8.5.2. Using the Stein’s approach
We generate the sample vectors Yi,l ~ N k (0,V ) , i 1,..., n , l (,0, ) , where N k (0,V ) is k -variate normal distribution. As a result, we have Yi ,l
( yil1 ,..., yikl ) , i 1,..., n , where components of the vectors Yi,l are
independent and are given by the ratios
yil1 ~ N (0,1 (k 1) Ul ) , yijl ~ N (0,1 U l ) , j
2,..., k .
(2.253)
Here N (0, T ) is one dimensional normal distribution with zero mean and variance equal to T . Depending on which of equation from (2.238) we solve, Ul takes values
Chapter 2
136
Ul 1/(k 1), U0 , Ul
U0 or Ul U0 ,1 ,
(2.254)
at the validity of H , H0 or H hypotheses, respectively. Let us suppose that for the vectors Yi,l , i 1,..., n , condition Yi ,l *l is fulfilled nl times, then we have P Yl *l | H l |
where Yl
nl , n
(2.255)
( y1l ,..., ykl ) and *l , l ( ,0, ) , is defined by (2.239).
l By changing O2 in *l , we achieve the fulfilment of the condition
§ · r2l ¸ dH , PYl *l | H l ¨1 ¨ p( H l ) K 0 ¸¹ ©
(2.256)
where H is the desired accuracy of the solution of the equations (2.238). l Thus O2 is the solution of the appropriate equation from (2.238) with given
accuracy. For the next generated value Yn 1,l
l and already determined O2 ,
l (,0, ) , we test conditions (2.239) using distribution densities
p(Yn 1,l | H t )
(2S )
k 2
° 1 § ( y l ) 2 n 1,1 exp® ¨ ¨ 2 ( 1 ( k 1) U t ) °¯ ©
(1 (k 1) U t ) k
¦
j
1 2
(1 U t )
k 1 2
( y nl 1, j ) 2 ·¸½° ¾ , l , t ( ,0, ) , 2 (1 U ) ¸ t ¹° ¿
and depending on which hypothesis acceptance region Yn 1,l belongs to, the appropriate hypothesis is accepted. If accepted hypothesis is Hl , the decision is correct, otherwise, the decision is incorrect.
Constrained Bayesian Method for Testing Different Type of Hypotheses 137
If no one hypothesis is accepted, a new random vector Yn 2,l is generated and then of (2.239) conditions are testing using (2.223) (in which n
2)
until one of the tested hypothesis is not accepted. Note 2.6. In (Zacks and Raming, 1987) is shown that Uˆn is asymptotically normally distributed when n o f , i.e., n ( Uˆ n U ) o N (0, V U2 ) , where
V U2
2(1 U ) 2 (1 (k 1) U ) 2 . k (k 1)
(2.257)
On the basis of this fact, for quite big n , we can test directional hypotheses (2.224) using CBM for the following conditional distributions, instead of (2.226), (2.229) and (2.230) H 0 : p( z | H 0 ) H : p( z | H )
1 U0 1
H : p( z | H )
1 1 U0
N ( z; ( Uˆ n U 0 ),V U20 ) ,
³
U0
1
(2.258)
N ( z; ( Uˆ n U ),V U2 )dU ,
(2.259)
U ),V U2 )dU .
(2.260)
1
³U N ( z; ( Uˆ
n
0
For testing (2.225) hypotheses using maximum ratio test, we have to use the following densities H 0 : p( z | H 0 )
N ( z; ( Uˆ n U 0 ),V U20 ) ,
H A : p( z | H A )
N ( z;0, V U2ˆn ) .
(2.261)
Here N ( z; a,V 2 ) is normal distribution with mathematical expectation
a and variance V 2 .
CHAPTER 3 COMPARISON ANALYSIS OF HYPOTHESES TESTING METHODS
3.1. Comparison of CBM with the Frequentist and the Bayes Methods As was mentioned in Item 1.3, for making a simple decision, i.e., for acceptance of one of the hypotheses being tested, an observation result must belong to the part of the appropriate region which does not intersect with other hypotheses acceptance regions (Kachiashvili et al., 2012a, b, Kachiashvili, 2018a). That means that hypotheses acceptance regions in the above considered statements of CBM practically always are smaller than the regions of acceptance of the same hypotheses in classical Bayesian or other statements (except in the case O
1 , when they coincide to each
other). This is a very important peculiarity which gives CBM the opportunity to make decisions more accurately than other known methods and, moreover, to develop parallel and sequential methods as well. Let us define the summary risk (SR) of making the incorrect decision at hypotheses testing as the weighed sum of probabilities of making incorrect decisions, i.e.,
rS (*)
S
S
i 1
j 1, j zi
¦ ¦
L( Hi , H j ) p( Hi )
³
*j
p( x | Hi )dx .
(3.1)
It is clear that, for given losses and probabilities, SR depends on the regions of making decisions. Let us denote by *CBM and * B the hypotheses
Comparison Analysis of Hypotheses Testing Methods
139
acceptance regions in CBM and Bayes rule, respectively. Then SR for CBM and Bayes rules are rS (* CBM ) and rS (* B ) , respectively. Remark 3.1. For simplicity, the theorem given below is formulated and proven when O is scalar, though it is not difficult to be convinced that the same is true when it is a vector. Theorem 3.1. For given losses and probabilities, SR of making incorrect decision in CBM is a convex function of O with a maximum at O
1 . By
increasing or decreasing O , SR decreases, and in the limit, i.e., at O o f or O o 0 , SR tends to zero. Proof. It is known that decision-making regions in the Bayesian rule satisfy conditions
S
i 1
*iB
B R n and *i
*
B j
, i , j 1,..., S , i z j . It is
proved (see, for example, Kachiashvili et al., 2012a; Kachiashvili & Mueed, 2013; Kachiashvili, 2018a) that in all Tasks of CBM, when O differs from
1 , in observation space R n , there appear to be subspaces of intersections of hypotheses acceptance regions or subspaces which do not belong to any region of acceptance of hypotheses. Both kinds of subspaces are greater, the more O differs from 1 and, when O o f or O o 0 , their union coincides with observation space R n , i.e., decision-making regions become empty (see hypotheses acceptance regions (1.18) or (2.2) and (2.6)). In the first case, hypotheses acceptance regions are reduced by the intersection subregion and, in the second case, hypotheses acceptance regions are reduced by the regions that do not belong to any region of acceptance of hypotheses. Thus in both cases (when O ! 1 and when O 1 ), hypotheses acceptance
Chapter 3
140
regions are reduced in comparison with the case of O
1 , and in the limits
( O o f or O o 0 ) hypotheses acceptance regions become empty. Since, in general, in CBM O z 1 , the hypotheses acceptance regions are less than the regions when O
1 . The hypotheses acceptance regions become smaller
as the difference between O and 1 increases. Since SR of making the incorrect decision (3.1) is defined on these regions, the reduced regions correspond to the reduced SR and vice versa. This proves the theorem. Corollary 3.1. At the same conditions SR of making the incorrect decision
in CBM is less or is equal to SR of the Bayesian decision rule, i.e.,
rS (*CBM ) d rS (* B ) . The justice of this corollary is obvious from Theorem 3.1. Similar to Theorem 3.1, it is not difficult to make sure that SR of making the incorrect decision in CBM is less than or equals SR of the frequentist decision rule, i.e., rS (*CBM ) d rS (* f ) , where rS (* f ) is SR for the frequentist method.
3.2. Comparison of Hypotheses Testing in Parallel Experiments For comparison of the above-described methods, let us consider concrete examples from (Berger, Brown and Wolpert, 1994). In particular, let us consider the following example.
Comparison Analysis of Hypotheses Testing Methods
141
Example 3.1 (Berger, Brown and Wolpert, 1994). Suppose that X1 , X 2 ,…,
X n are i.i.d. N (T ,1) and that it is desired to test H 0 : T
1 versus
H A : T 1 . Then n
B
i 1
1 (2S ) 1 / 2 exp{ ( xi 1) 2 } 2 1 1 / 2 (2S ) exp{ ( xi 1) 2 } 2
Let us test the introduced hypotheses for n
exp{2nx} .
4 and different x .
In the T C test, the threshold C0 is determined on the basis of condition (1.15) which, in the considered case, takes the form § 1 · § 1 · 1 ) ¨ ln C0 2 ¸ ) ¨ ln C0 2 ¸ , © n ¹ ¹ © n
where ) is the standard normal c.d.f. Form here it is seen that C0
1.
Thus the T C test completely coincides with the Bayes one. In both tests, the hypothesis acceptance regions are: if B d 1 (i.e. x t 0 ), reject H 0 and report error probability D ( B )
B( x ) ; if B ! 1 (i.e., x 0 ), accept H 0 1 B( x )
and report error probability E ( B )
1 . Hereinafter, considering the 1 B( x )
concrete examples, we imply that the hypotheses are a priori identically probable. To concretize condition (1.16) for determination of the thresholds r and a in the T * test, we obtain
F0 (1) 1 F1 (1) 1 )(2) ,
(3.2)
Chapter 3
142
a 1. Thus the T * test coincides with
and from here it is obvious that r
the T C and Bayes tests for the considered example. Here we consider only one of the Tasks of CBM, namely the Task 1 for stepwise loss function with two possible values 0 and 1. The essence of this method is the minimization of the averaged probability of incorrect acceptance of hypotheses at restriction of the averaged probability of rejection of true hypotheses, i.e.,
min§¨1 {*i } ©
¦
S i 1
p( H i ) P( X *i | H i ) ·¸ , ¹
(3.3)
subject to S
¦
i 1
p( H i )
S
¦
j 1, j zi
P( X * j | H i ) d J .
(3.4)
Solution of Task (3.3) and (3.4) is (Kachiashvili, 2011; Kachiashvili, Hashmi & Mueed, 2012b)
*j
{x : p( H j ) p(x | H j ) ! O
S
¦
i 1,i z j
p( H i ) p(x | H i )} , j 1,..., S . (3.5)
Coefficient O is the same for all regions of acceptance of hypotheses, and it is determined so that in (3.4), the equality takes place. When the number of hypotheses is equal to 2 and their a priori probabilities are equal to 1 / 2 , solution (3.5) can be rewritten using the Bayes factor: the hypothesis H 0 rejection region is defined as B (x ) d O , and the alternative hypothesis rejection region is B ( x ) t 1 / O . Probabilities (3.3) and (3.4) take the forms
min 1 ( P( B(x) ! O | H 0 ) P( B(x) 1 / O | H A )) / 2 ,
^*0 ,*A `
(3.6)
and
( P( B(x) ! O | H A ) P( B(x) 1 / O | H 0 )) / 2 d J ,
(3.7)
Comparison Analysis of Hypotheses Testing Methods
143
respectively. The posterior probabilities of the hypotheses are calculated similarly to the above given Bayes method. The probabilities of incorrect rejection of basic and alternative hypotheses when they are true are
P ( B ( x ) d O | H 0 ) 1 P ( B ( x) ! O | H 0 )
D0 and
P ( B ( x) t 1 / O | H A ) 1 P ( B ( x) 1 / O | H A ) ,
DA
respectively, and the probabilities of incorrect acceptance of hypotheses when
EA
they
are
erroneous
are
E0
P ( B ( x) ! O | H A )
and
P( B(x) 1/ O | H 0 ) , respectively.
It is clear that, at O 1 , CBM completely coincides with the Bayes method but, at O z 1 , it has new properties (Kachiashvili & Mueed, 2013; Kachiashvili, Kachiashvili & Mueed, 2012b). Namely, when O 1 hypotheses acceptance regions intersect, by using the data from this intersecting area, it is impossible to make an unambiguous decision; when
O ! 1 in the observation space there arises a sub-region which does not belong to any hypothesis acceptance region and it is impossible to make a simple
decision
(Kachiashvili,
Kachiashvili
&
Mueed,
2012a,b;
Kachiashvili, 2018a). Therefore, probabilities of errors of Type I and Type II are computed by the following ratios: at O 1 ,
D0
P ( B ( x) d O | H 0 ) , D A
P( B(x) t 1 / O | H A ) ,
E0
P ( B ( x) ! O | H A ) , E A
P ( B ( x) 1 / O | H 0 ) ;
at O ! 1 ,
D0
P ( B ( x) 1 / O | H 0 ) , D A
P ( B ( x) ! O | H A ) ,
Chapter 3
144
E0
P( B(x) ! O | H A ) , E A
P ( B ( x) 1 / O | H 0 ) ;
at O 1 ,
D0
P ( B ( x) d O | H 0 ) , D A
P( B(x) t 1 / O | H A ) ,
E0
P ( B ( x) t 1 / O | H A ) , E A
P ( B ( x) O | H 0 ) .
While the probabilities of making no decision are
P(1/ O d B(x) d O | H 0 ) and P(1/ O d B(x) d O | H A ) at O ! 1 , and
P(O B(x) 1/ O | H 0 ) and P(O B(x) O | H A ) at O 1 , respectively. Let us denote D c and E c as the probability of not accepting true hypothesis and the probability of accepting false hypothesis, respectively. Then, it is obvious that, when O 1 ,
D 0c D 0 , D cA D A , E 0c
E 0 , E Ac
EA ;
when O ! 1 ,
D 0c D 0 P(1/ O d B(x) d O | H 0 ) , D cA D A P(1/ O d B(x) d O | H A ) ,
E 0c
E 0 , E Ac
EA ;
and when O 1 ,
D 0c D 0 P( O d B(x) d 1 / O | H 0 ) , D cA D A P( O d B(x) d 1/ O | H A ) , E 0c
E 0 , E Ac
EA .
As was mentioned in Dass and Berger (2003, p. 196), “ T * is an actual frequentist test; the reported CEPs, D ( B ( x )) and E ( B ( x )) , are conditional frequentist Type I and Type II error probabilities, conditional on the statistic we use to measure strength of evidence in the data. Furthermore, D ( B ( x ))
Comparison Analysis of Hypotheses Testing Methods
145
and E ( B ( x)) will be seen to have the Bayesian interpretation of being (objective) posterior probabilities of H 0 and H A , respectively. Thus, T * is simultaneously a conditional frequentist and a Bayesian test.” It is not difficult to be convinced that the same is true for the considered CBM. Generalization of the T * test for any number of hypotheses seems quite problematic. For the general case, it is possible only by the way of simulation because the definition of exact distribution of B (x ) likelihood ratio for any hypothetical distributions is very difficult, if not impossible. Generalization of CBM for any number of hypotheses does not represent any problem. It is stated and solved namely for the arbitrary number of hypotheses
(Kachiashvili,
2011;
Kachiashvili
&
Mueed,
2013;
Kachiashvili, Kachiashvili & Mueed, 2012a; Kachiashvili, Hashmi & Mueed, 2012b; Kachiashvili, 2018a). The properties of the decision rules are common and do not depend on the number of hypotheses. In (Dass and Berger, 2003) it is also noted that, because T * is a Bayesian test, it inherits many of the positive features of Bayesian tests; as the sample size grows, the test chooses the right model. If the data actually arises from the third model, the test chooses the hypothesis which is the closest to the true model in Kullback-Leibler divergence (Berk, 1966). CBM has the same positive features that the T * test has and chooses the right model with greater reliability when the increasing sample size (see (Kachiashvili and Mueed, 2013; Kachiashvili, 2018a) and examples given below). If the data arises from the model which is not included in the hypothetical set of tested hypotheses and J is quite small in restriction (3.4), the CBM does not choose any tested hypotheses. Let us determine the threshold O in CBM on the basis of condition (3.7). After simple transformations, we obtain
Chapter 3
146
O exp((8 4) 1 (J ))) .
(3.8)
For probabilities of errors of Type I and Type II, in this case, we have at O 1
D0 D A
E0
E A 1 )(2) ;
D0 D A
E0
E A 1 )¨
(3.91)
at O ! 1 § ln O · 2¸ ; ¹ © 4
(3.92)
at O 1
D0 D A
E0
EA
§ ln O · )¨ 2¸ . ¹ © 4
(3.93)
The probabilities of not accepting of hypotheses and suspicion on validity of both hypotheses are: at O ! 1
P(1 / O d B(x) d O | H 0 )
P(1 / O d B(x) d O | H A )
§ ln O · § ln O · )¨ 2 ¸ )¨ 2¸ , 4 4 © ¹ © ¹
at O 1
P(O B(x) 1 / O | H 0 ) P(O B(x) 1 / O | H A ) § ln O · § ln O · )¨ 2 ¸ )¨ 2¸ , © 4 ¹ © 4 ¹
respectively. From (3.8) it is seen that the thresholds in the decision rule of CBM depend on the restriction of the averaged probability of incorrect rejection of hypotheses. The dependence of these thresholds, i.e., O and 1/ O , on the probability J is shown in Figure 3.1. Different values of the likelihood ratio are also given here, which are used for making a decision. The computation
Comparison Analysis of Hypotheses Testing Methods
147
results of the thresholds and error probabilities depending on J are given in Table 3.1. From here it is seen that, depending on the chosen restriction
J , the region of making the decision on the basis of likelihood ratio B (x ) is divided into three non-intersecting sub-regions: sub-regions of acceptance of one of tested hypothesis, sub-region of not acceptance of the hypotheses and sub-region of the impossibility of acceptance of one hypothesis. Their union coincides with the domain of definition of B (x ) .
Figure 3.1. Dependence of the thresholds O and 1/ O on the probability
J (see (3.8)) ( lam { O , lam1 { 1/ O , Bx { B (x ) ).
Chapter 3
148
Table 3.1. Computed values of the thresholds, error probabilities and probabilities of making no decision for Example 3.1. Prob. of
J
O
0.0001
no decision
Dc
Ec
0.0001
0.38926
0.38936
0.0001
0.00211
0.0002
0.32259
0.32279
0.0002
174.5318
0.00573
0.0005
0.23852
0.23902
0.0005
0.001
78.32989
0.01277
0.001
0.18047
0.18147
0.001
0.005
10.00732
0.09993
0.005
0.07220
0.0772
0.005
0.006
7.75686
0.12892
0.006
0.06239
0.06839
0.006
0.007
6.22799
0.16057
0.007
0.05445
0.06145
0.007
0.008
5.13286
0.19482
0.008
0.04780
0.0558
0.008
0.009
4.31662
0.23166
0.009
0.04209
0.05109
0.009
0.01
3.68913
0.27107
0.01
0.03710
0.0471
0.01
0.015
1.97459
0.50643
0.015
0.01863
0.03363
0.015
0.016
1.78183
0.56122
0.016
0.01576
0.03176
0.016
0.017
1.61654
0.61861
0.017
0.01306
0.03006
0.017
0.018
1.4736
0.67861
0.018
0.01052
0.02852
0.018
0.019
1.34907
0.74125
0.019
0.00811
0.02711
0.019
0.02
1.23986
0.80654
0.02
0.00581
0.02581
0.02
0.021
1.14348
0.87452
0.021
0.00362
0.02462
0.021
0.022
1.05798
0.9452
0.022
0.00152
0.02352
0.022
0.023
0.98174
1.0186
0.0225
0.00050
0.023
0.0225
0.024
0.91345
1.09475
0.02156
0.00244
0.024
0.02156
0.025
0.85202
1.17368
0.02067
0.00433
0.025
0.02067
0.026
0.79655
1.25541
0.01985
0.00615
0.026
0.01985
0.027
0.74628
1.33998
0.01908
0.00792
0.027
0.01908
0.028
0.70057
1.4274
0.01836
0.00964
0.028
0.01836
0.029
0.65888
1.51772
0.01768
0.01132
0.029
0.01768
1/ O
D
968.8075
0.00103
0.0002
473.5868
0.0005
E
Comparison Analysis of Hypotheses Testing Methods
149
0.03
0.62075
1.61095
0.01704
0.01296
0.03
0.01704
0.04
0.36889
2.71083
0.01225
0.02775
0.04
0.01225
0.05
0.24157
4.13954
0.00926
0.04074
0.05
0.00926
0.07
0.12284
8.14037
0.0058
0.06420
0.07
0.0058
0.09
0.07158
13.97095
0.00392
0.08608
0.09
0.00392
0.1
0.05648
17.70406
0.00328
0.09672
0.1
0.00328
0.15
0.02119
47.19398
0.00152
0.14848
0.15
0.00152
0.2
0.00972
102.875
0.00079
0.19921
0.2
0.00079
0.25
0.00498
200.7461
0.00044
0.24958
0.25002
0.00044
0.3
0.00273
365.9139
0.00025
0.29984
0.30009
0.00025
0.4
0.00092
1082.049
0.00009
0.40035
0.40044
0.00009
0.5
0.00034
2980.958
0.00003
0.49863
0.49866
0.00003
0.6
0.00012
8212.301
0.00001
0.60140
0.60141
0.00001
0.7
0.00004
24284.71
0
0.70252
0.70252
0
0.8
0.00001
86377.71
0
0.81009
0.81009
0
The dependences of the probabilities of errors Type I and Type II on the threshold O are shown in Figure 3.2a. The appropriate computed values are given in Table 3.1. Also given here are the probabilities of making no decision depending on the threshold O . From this data, it is seen that, when the probabilities of errors Type I and Type II are equal to
J
F0 (1) 1 F1 (1) 0.02275 , we have O 1 and the probabilities of
errors have the maximum values and the probability of making no decision is equal to zero (see Figure 3.2b). In this case, CBM coincides with the T C , T * and Bayes tests with identical probabilities of errors of both types (see Eq. (3.2) and Eq. (3.9)). Though, in the general case, for any J the situation is more common, and, to each statistics x , on the basis of which the decision is made, there corresponds a certain interval for J , J [J 1 , J 2 ] ,
Chapter 3
150
for which the correct decision is made. For J [J 1 , J 2 ] , either both hypotheses are rejected (when the information contained in x is insufficient for making the decision at the given level) or both hypotheses are suspected to be true (when the information contained in x is insufficient for making a unique decision at the given level). For that reason, probabilities of errors of both types in CBM are less than in the T C , T * and Bayes tests. The probability of making no decision is a measure characterizing a shortage of information for making a simple decision for chosen J for the given
1.00000
0.02000
0.80000
Lambda
.0000010
.0000400
.0003400
.0027300
.0097200
.0564800
.1228400
.3688900
.6588800
.7462800
.8520200
.9817400
1.1434800
1.3490700
1.6165400
1.9745900
6.2279900
.0000010
.0000400
.0003400
.0027300
.0097200
.0564800
.1228400
.3688900
.6588800
.7462800
.8520200
.9817400
1.1434800
1.3490700
1.6165400
1.9745900
6.2279900
0.00000 4.3166200
0.00000 10.0073200
0.20000
174.5318000
0.00500
4.3166200
0.40000
10.0073200
0.01000
0.60000
174.5318000
0.01500
968.8075000
Probability of no decision
0.02500
968.8075000
Probability of Error I or Error II
hypotheses.
Lambda
a)
b)
Figure 3.2. The dependences of the probabilities of errors and making no decision on O . The results of testing of the hypotheses by the above-considered tests for different values of the statistics x are given in Table 3.2. The classical Neyman-Pearson (N-P) test is defined with equal error probabilities, which gives the rejection and acceptance regions similar to T C , T * and Bayes tests, but reports the error probabilities of Type I and Type II,
D
E 1 )( n ) , equal to the same probabilities of CBM for O 1 (
Comparison Analysis of Hypotheses Testing Methods
J
151
0.04550 ). Also reported was the p -value against H 0 which was equal
to 1 )( n ( x 1)) . Some of these data were taken from Table 1 in (Berger, Brown & Wolpert, 1994). It is seen from Table 3.2 that T C , T * , Bayes and N-P tests accept the alternative hypothesis on the basis of x
0 , while there is no basis for such
a decision. Both hypotheses are identically probable or are identically improbable. In this sense, the p -value test is more preferable as it rejects the H 0 hypothesis, to say nothing about H A . Though, the value of the probability p does not give any information about the existence of other hypothesis identically probable to H 0 . In this situation, CBM gives the most logical answer, as it proves that both hypotheses are identically probable or identically improbable. For other values of the statistics x , all considered tests make correct decisions with different error probabilities. But, in these cases, CBM also seems more preferable as it simply defines the significance levels of the test for which it is possible to accept one of the tested hypotheses by statistics x . It is evident that D
E
E c and D c changes correspondingly to the
probability of not making a decision, but it is shifted to the positive side of the ordinate by the value of D . If it is necessary to have such a decision that not only D and E do not surpass some D * , but also D c , it is necessary to choose such O and, accordingly, such J to correspond with D c d D * . It is obvious that in this case D
E D * . For instance, in the considered
example, if, for the statistics x
0.25 , it is necessary to have such a
decision that D c d 0.05 , we must take J corresponds with the following: O
0.01 for CBM, which
3.68313 , 1/ O 0.27107 , D c 0.0471
Chapter 3
152
and D
E
E c 0.01 , i.e. 1 E
0.99 (see Table 3.1). In this case, CBM
is a more powerful test than T C , T * and Bayes tests. It is also a more powerful test than N-P by D and E but is a less powerful by D c . If we take J , 1/ O
0.02275 , then the parameters of CBM will be: O 1.00001
0.99999 , D
E
D c E c 0.02275 , and CBM and N-P tests will
have the identical probabilities of errors of both types, and they will surpass the T C , T * and Bayes tests. For other values of J from the interval J [ 0.007 ,0.05 ] (see Table 3.1), CBM surpasses the N-P test by D and E
but by D c it is worse than the N-P test. For the sake of justice, it is necessary to note that to compare CBM with the N-P by D c is not completely correct, as the N-P has no region of making no decision, i.e., it does not have a characteristic like D c . If the interval [J 1 , J 2 ] , in which it is possible to accept unique decisions, does not contain J * which is necessary to make a decision then we must act as follows: If J * ! J 2 , choose the value J 2 and say that this is the minimum possible significance level of the test for given information and make the decision for J 2 . If J * J 1 , choose the value J 1 and say that this is the maximum possible significance level of the test for given information and make the decision for J 1 . If J * [J 1, J 2 ] , make the decision for J * and say that this is the chosen significance level or power (this depends on the kind of the considered Task) (see Kachiashvili, 2011; Kachiashvili, Hashmi & Mueed, 2012b; Kachiashvili, 2018a) of the test. In the considered case, (1 J ) is the averaged power of the test (see (3.7)).
Comparison Analysis of Hypotheses Testing Methods
153
In Berger, Brown and Wolpert’s (1994) study, for considering the T C , N-P and p -value tests for Example 3.1, it was stated that “The intuitive attractiveness of D (B ) and E (B ) is clear. If the data are x suggests that the evidence equally supports H 0 : T
0 , intuition
1 and H1 : T 1 ;
D (B ) and E (B ) so indicate, while D and E (and p -value) do not. When
x 1 , in contrast, intuition would suggest overwhelming evidence for H1 (note that x 1 is four standard deviation from T
1 ); again D (B ) and
E (B ) reflect this.” It was also stated (see p. 1791) “Since T C is completely
justified from all foundational perspectives and is as “data-adaptive” as the
p -value, T C is clearly to be preferred.” Applying these suggestions to the data given in Table 3.2, the advantage of CBM over T C and consequently over the other tests is evident. Let us consider the following example from Berger, Boukai and Wang’s (1997) study. Example 3.2. Suppose X ! 0 and it is necessary to test H 0 : X ~ e x versus HA : X ~
1 x / 2 . e 2
The likelihood ratio is B( x) 2e x / 2 and its range is the interval (0,2) . Let us concretize the above-considered tests for this example. Let us begin the consideration with CBM. Like in the previous case, let us consider the situation when the hypotheses have identical a priori probabilities. Condition (3.7) will be written as follows
Chapter 3
154
P(2e X / 2 ! O | H A ) P(2e X / 2 1 / O | H 0 ) d 2J .
(3.10)
Thus we have the following decision rule: if B( x) 2e x / 2 ! O , i.e. x 2 ln x ! 2 ln
O 2
, hypothesis H 0 is accepted, if B( x) 2e x / 2 1 / O , i.e.
1 hypothesis H A is accepted. It is clear that, if O 1 , the 2O
hypotheses acceptance regions are mutually complementary, and CBM coincides with classical Bayes and N-P tests (see below). If O 1 , the subregions of impossibility of making unique decisions appear in the decisionmaking space, and, if O ! 1 , the sub-regions of impossibility of making decision appear in the decision-making space. After simple transformations, we obtain from (3.10)
J
1 O 1 2. 2 4 8O
(3.11)
The graph of dependence (3.11) is shown in Figure 3.3. From here it is seen that the inverse proportional dependence exists between J and O .
Figure 3.3. Dependence between J and O for Example 3.2.
Comparison Analysis of Hypotheses Testing Methods
155
Since O ! 0 at 0 d J d 1 , the value of the threshold O changes in the interval >0.4516,2.1121@ . Thus, for the given value of J , we find a positive solution of equation (3.11) with regard to O (which is in the interval
>0.4516,2.0@ ,
and the thresholds O and 1/ O determine the regions of
acceptance of tested hypotheses. By these thresholds, the probabilities of errors Type I and Type II are also determined: at O 1
D0
1
EA
2
4O
0.25 , D A
E0 1
O 2
0.5 ;
at O ! 1 ( 1 O d 2.1121)
D0
EA
1 4O2
, DA
E0
O °1 , if 1 O d 2, ® 2 ° ¯0, if 2 O d 2.1121;
at O 1 ( 0 d O 1 )
D0
EA
O2 4
, DA
E0
°0, if 0 d O d 0.5, ® 1 °1 if 0.5 O 1. °¯ 2O
The probabilities of not accepting hypotheses and suspicion on the validity of both hypotheses are: at O ! 1 ( 1 O d 2.1121)
P ( 2 ln
O 2
x 2 ln 2O | H 0 )
O4 1 , if 1 O d 2, ° ° 4 O2 ® °1 1 , if 2 O d 2.1121, ° ( 2O ) 2 ¯
Chapter 3
156
P (2 ln
O 2
x 2 ln 2O | H A )
O2 1 , if 1 O d 2, ° ° 2O ® °1 1 , if 2 O d 2.1121; °¯ 2O
at O 1 ( 0 d O 1 )
P ( 2 ln 2O x 2 ln
P (2 ln 2O x 2 ln
O 2
O 2
| H0)
O2 °1 , if 0 d O d 0.5, ° 4 ® 4 O 1 ° , if 0.5 O 1; °¯ 4O2
| H A)
O °1 , if 0 d O d 0.5, ° 2 ® 2 °1 O , if 0.5 O 1, ¯° 2 O
respectively. The computed values of these error probabilities are given in Table 3.4. In this case, Bayes test has the following form:
B
°if x t 1.38629, ° ° ° ® °if x 1.38629, ° ° ° ¯
reject H 0 and report the posterior probability D * ( B( x))
B( x) , 1 B( x)
accept H 0 and report the posterior probability E * ( B( x))
1 . 1 B( x)
The critical values of the T C test determined on the basis of (1.15) are:
C1 1.236 and C2
3.236 . Of interest is the positive solution of this
Comparison Analysis of Hypotheses Testing Methods
157
equation only, which is C 1.236 . The conditional error probabilities of this test are similar to Bayes test (Berger, Brown & Wolpert, 1994, p. 1788). The thresholds of the T * test are determined by condition (1.16) which, after simple computation, yields r 1 and a
2 , so that the no-decision
region is the interval (1, 2 ) (Berger, Boukai & Wang, 1997). The reported error probabilities, upon rejection or acceptance, are analogous of the Bayes and T C tests. The critical value for the classical N-P test with equal error probabilities is defined on the basis of (1.15) and, like for the T C test, it is C 1.236 . Error probabilities of Type I and Type II are equal to D
E
0.382 . If we
choose C 1 in the N-P test, the error probabilities of the unconditional test are D
0.25 and E
0 .5 .
p -value
against
The p value
H0
is
computed
by
the
formula
exp( x ) .
The results of the application of these tests for solving the considered problem are given in Table 3.3. Because the error probabilities for the Bayes, T C and T * tests are identical, in Table 3.3 they are omitted for T C and T * tests. The thresholds for making the decision, the error probabilities of Type I and Type II and probabilities of making no decision in CBM are given in Table 3.4 and in Figure 3.4. Therefore, the error probabilities of CBM are not given in Table 3.3. Comparing the results of different tests given in Table 3.3, we can infer the following. Despite the identity of the probabilities of both types in the Bayes, T C and T * tests, for the observation result x 1 they accept different hypotheses: the Bayes - H 0 , T C - H A and T * does not allow to
158
Chapter 3
make a decision. For the same observation result, the N-P and p -value tests accept the H 0 hypothesis, and the probabilities of errors of both types for N-P are less than in the previous tests. For the observation result
x 1.38629 , for which the likelihood ration is equal to 1, the Bayes, T C and T * tests accept the H A hypothesis, while N-P and p -value tests accept the H 0 hypothesis. Though, for this observation result, both hypotheses are equiprobable. In the interest of fairness, it should be noted that this fact is evidenced by the equality of the error probabilities of Type I and Type II in Bayes, T C and T * tests to 0.5. The CBM differs from the considered tests. For any observation result, depending on the chosen level of the averaged error probability of Type II (see (3.10)), the correct decision is made, or it is indicated that on the basis of the existing information it is impossible to make a decision in general or to make a concrete decision (to choose one of tested hypotheses). When the observation result is x 1 , CBM accepts the H 0 hypothesis for J [ 0.3,0.45 ] and does not make a decision for other J . When the
likelihood ratio is equal to 1 ( x 1.38629 ), none of the hypotheses are accepted. Depending on the value of the averaged error probability, both hypotheses are suspected to be true or both are rejected. The critical value of J is 0.375, to which the error probabilities of Type I and Type II equal to 0.25 and 0.5, respectively, correspond (analogously of the N-P test for
C 1 ). In this case, O 1 / O 1 (see Table 3.4) and the CBM is formally similar to the classical Bayes and N-P tests, though the decisions are absolutely different in these tests. In Figure 3.4 it is seen that the more significantly differs the likelihood ratio from 1 or, that is the same, the more information is contained in the observation results in favor of one of the
Comparison Analysis of Hypotheses Testing Methods
159
testing hypothesis, the more is the number of possible values of the averaged error probabilities for which the true hypothesis is accepted.
Figure 3.4. Dependence of O and 1/ O on J for Example 3.2 (Data are obtained by the exact solution of dependence (3.11) using MATLAB).
Chapter 3
1
0.8
0.14
0.000 34
0.0 28
0.2 5
1
B(x)
0.000 34
0.122 8
0.444 (4)
0.999 66
0.877 2
0.555 (5)
HA
HA
HA
0.02 275
0.02 275
0.02 275
0.02 275
0.02 275
0.02 275 HA
HA
HA
0.00 003
0.00 621
0.01 989
AH*)
No one1) Both e2) Reje J [0.02,0.02 D [0.02,0.025] E [0.02,0.02 H A ct J ! 0.025 D 0.02 E ! 0.025 Both H0 e D ! 0.025 J 0.02 E 0.02 No one Reje J [0.007,0.0 D [0.009,0.06] E [0.007,0.0 H A ct Both J ! 0.05 E ! 0.05 D 0.009 H0 e D ! 0.06 J 0.007 E 0.007 No one Reje J [0.0001,0. D (0.00003,0.38E [0.0001,0 H A ct Both J t 0.5 E t 0.5 D d 0.00003 H0 e
For N-P test For p-value For constrained Bayes test (CBM) For tests T C , T * test and Bayes J D CEP A CEP A p - AH*) E D (J , B) E (J , B) *) *) H H D (B) E (B) value 0.02 Reje J d 0.0455 D d 0.02275 0.5 0.5 H A 0.02 H A 0.02 E d 0.02275 275 275 275 ct J ! 0.0455 D ! 0.02275 E ! 0.02275 H0
Table 3.2. The results of testing of the hypotheses by the considered tests for different values of the statistics x .
0
x
160
0.02 275 H0
H0
Both hypotheses are identically possible to be true
0.02 275
0.02 275
2)
H0
0.02 275
Both hypotheses are identically impossible to be true
0.000 34
H0
Accepted Hypothesis
0.999 66
0.119 2
1)
2980. 958
-1
0.88
*)
7.389
0.2 5
0.5
0.06 681
Acc ept H0
Acc ept H0 No one J [0.0001,0 D (0.00003,0.389E [0.0001,0. H 0 Both J t 0.5 E t 0.5 D d 0.00003 e
161
J [0.007,0.0D [0.00920,0.06 E [0.007,0.0 H 0 Both J ! 0.05 D 0.00926 E ! 0.05 J 0.007 D ! 0.06145 E 0.007 e
Comparison Analysis of Hypotheses Testing Methods
1.5576
1.2130
0.5
1
6
2
B(x)
0
x
162
0.5481
0.609
6
0.4518
9
0.3909
0.33(3)
E (B)
D (B)
0.66(6)
CEP
CEP
For Bayes tests
H0
H0
H0
AH*)
HA
H0
H0
AH*)
tests
TC
For
0.38 2
sion
2
2
0.38
2
0.38
2
2
0.38
0.38
E
0.38
D
For N-P test
No deci-
H0
H0
AH*)
tests
For T *
8
H 0 0.3678
3
H 0 0.6065
H0 1
J
J d 0.25
J t 0.5
H 0 J [0.3,0.45]
J d 0.15
J t 0.65
H 0 J [0.2,0.6]
J d 0.03
J t 0.9
H 0 J [0.04,0.85]
H*)
value
H *)
A
p-
No one
Bothe
H0
No one
Bothe
H0
No one
Bothe
H0
AH*)
test
test
A
For constrained Bayes
For p-value
Table 3.3. The results of application of the considered tests for exponential distribution.
Chapter 3
2)
6
0.7357
0.42388
0.5761 HA
HA
HA HA
Both hypotheses are identically possible to be true
hypotheses are identically impossible to be true
Accepted Hypothesis
1) Both
*)
2
3
0.4
HA
HA
0.38
0.38
2
0.38 2
2
0.38 2
0.38
0.6
HA
0.38
0.9447
HA HA
1.5
0.5 2
0.5 2
1
9
1.3862
4
H A 0.1353
3
H A 0.2231
H 0 0.2500
Comparison Analysis of Hypotheses Testing Methods
J d 0.2
J t 0.55
H 0 J [0.25,0.5]
J d 0.3
J t 0.45
H 0 J [0.35,0.4]
J ! 0.375
H 0 J d 0.375
No one
Bothe
HA
No one
Bothe
HA
Bothe
No one
163
Chapter 3
164
Table 3.4. Computed values of thresholds and error probabilities for Example 3.2. J
1/ O
O
D 0 (J , B)
D A (J , B)
E 0 (J , B)
E A (J , B)
0
2.1121
0.47346
1
0.76327
0
0.05604
0.001
2.1085
0.47427
1
0.76286
0
0.05623
0.005
2.094
0.47755
1
0.76122
0
0.05701
0.01
2.076
0.4817
1
0.75915
0
0.05801
0.02
2.0401
0.49017
1
0.75491
0
0.06007
0.03
2.0044
0.4989
1
0.75055
0
0.06223
0.04
1.969
0.50787
0.96924
0.74606
0.0155
0.06448
0.05
1.9337
0.51714
0.9348
0.74143
0.03315
0.06686
0.06
1.8987
0.52668
0.90127
0.73666
0.05065
0.06935
0.07
1.8639
0.53651
0.86853
0.73175
0.06805
0.07196
0.08
1.8294
0.54663
0.83668
0.72669
0.0853
0.0747
0.09
1.7952
0.55704
0.80569
0.72148
0.1024
0.07757
0.1
1.7612
0.56779
0.77546
0.7161
0.1194
0.0806
0.15
1.5962
0.62649
0.63696
0.68676
0.2019
0.09812
0.2
1.4408
0.69406
0.51898
0.65297
0.2796
0.12043
0.25
1.2972
0.77089
0.42068
0.61455
0.3514
0.14857
0.3
1.1671
0.85682
0.34053
0.57159
0.41645
0.18354
0.35
1.0519
0.95066
0.27662
0.52467
0.47405
0.22594
0.37
1.0101
0.99
0.25508
0.505
0.49495
0.24503
0.375
1
1
0.25
0.5
0.5
0.25
0.4
0.9519
1.05053
0.22653
0.47473
0.52405
0.2759
0.45
0.8663
1.15433
0.18762
0.42283
0.56685
0.33312
0.5
0.7937
1.25992
0.15749
0.37004
0.60315
0.39685
0.55
0.7323
1.36556
0.13407
0.31722
0.63385
0.46619
0.6
0.6803
1.46994
0.1157
0.26503
0.65985
0.54018
0.65
0.636
1.57233
0.10112
0.21384
0.682
0.61805
Comparison Analysis of Hypotheses Testing Methods
165
0.7
0.598
1.67224
0.0894
0.16388
0.701
0.6991
0.75
0.5652
1.76929
0.07986
0.11536
0.7174
0.78259
0.8
0.5366
1.86359
0.07198
0.06821
0.7317
0.86824
0.85
0.5115
1.95503
0.06541
0.02248
0.74425
0.95554
0.9
0.4892
2.04415
0.05983
0
0.7554
1
0.95
0.4694
2.13038
0.05508
0
0.7653
1
0.99
0.455
2.1978
0.05176
0
0.7725
1
1
0.4516
2.21435
0.05099
0
0.7742
1
3.3. Comparison of Hypotheses Testing in Sequential Experiments The specific features of hypotheses testing regions of the Berger’s T * test and CBM (see previous sections) are namely, the existence of the nodecision region in the T * test and the existence of regions of impossibility of making a unique or any decision in CBM give the opportunities to develop the sequential tests on their basis. Using the concrete example taken from Berger, Brown & Wolpert’s (1994) study, below, these tests are compared among themselves and with the Wald (1947a) sequential test. For clarity, let us briefly describe these tests. The sequential test developed on the basis of the T * test is as follows (Berger, Brown & Wolpert, 1994): if the likelihood ratio B ( x ) d r , reject H 0 and report the conditional error probability D ( B ( x ))
B ( x ) /(1 B ( x )) ;
if r B ( x ) a , make no decision; if B ( x ) t a , accept H 0 and report the conditional error probability
E ( B ( x )) 1 /(1 B ( x )) . Here r and a are determined by ratios (1.16).
Chapter 3
166
The sequential test developed on the basis of CBM consists of the following (Kachiashvili and Hashmi, 2010; Kachiashvili, 2013, 2018a). Let
*in be the Hi hypothesis acceptance region (3.5) on the basis of n sequentially obtained repeated observation results; Rnm is the decisionmaking space in the sequential method; m is the dimensionality of the observation vector; I in is the population of sub-regions of intersections of hypotheses Hi acceptance regions *in acceptance Enm
Rnm
of
S
i 1
other
(i 1,..., S ) with the regions of
hypotheses
Hj,
j 1,..., S ,
jzi;
*in is the population of regions of space Rnm which do not
belong to any of hypotheses acceptance regions. The Hi hypotheses acceptance regions for n sequentially obtained observation results in the sequential method are:
Rnm,i
*in / I in , i 1,..., S ;
(3.12)
the no-decision region is:
Rnm, S 1
§ ¨ ©
S
I n ·¸ i 1 i ¹
E
m n
,
(3.13)
where *in
{x : p(x | H i ) !
S Oi p (x | " 1," zi "
¦
0 d Oi" f , " 1,..., S . Coefficients Oi"
O
H " )} ,
(3.14)
p( H " ) are defined by the p( H i )
equality of the suitable restriction (3.4). This test is called the sequential test of Bayesian type (Kachiashvili & Hashmi, 2010). Such tests could be considered for all Constrained Bayesian
Comparison Analysis of Hypotheses Testing Methods
167
Methods offered in Kachiashvili, Hashmi and Mueed (2012b), and Kachiashvili (2018a). The essence of the Wald’s (1947a,b) sequential test consists of the following:
B ( x)
compute
the
p( x1 , x2 ,..., xn | H 0 ) / p( x1 , x2 ,..., xn | H A )
likelihood for
n
ratio sequentially
obtained observation results, and, if B B (x ) A ,
(3.15)
do not make the decision and continue the observation of the random variable. If B (x) t A ,
(3.16)
accept the hypothesis H 0 on the basis of n observation results. If B (x ) d B ,
(3.17)
accept the hypothesis H A on the basis of n observation results. The thresholds A and B are chosen so that A
1 E
D
and B
E 1D
.
(3.18)
Here D and E are the desirable values of the error probabilities of Type I and Type II, respectively. It is proved (Wald, 1947a) that in this case the real values of the error probabilities of Type I and Type II are close enough to the desired values, but still are distinguished from them. Example 3.3 (Berger, Brown & Wolpert, 1994). Consider the scenario of
Example 3.1, but suppose the data are observed sequentially. As we showed above, the hypotheses are identically probable.
Chapter 3
168
The sequential test developed on the basis of the T * test for this concrete example is as follows (Berger, Brown & Wolpert, 1994): if xn t g (n) , where n
is the number of sequentially obtained
observations, stop experimentation, reject H 0 and report the conditional error probability D ( Bn ) 1 /[1 exp(2nxn ]) ; if xn g (n) , stop experimentation, accept H 0 and report the conditional error probability E ( Bn ) 1 /[1 exp(2nxn ]) . The choice g ( n)
1 §1 · ln¨ 1¸ 2n © D ¹
(3.19)
guarantees that the reported error probability will not exceed D (Berger, Brown & Wolpert, 1994). The sequential test developed on the basis of CBM in this case is as follows: if
^
`
x min ) 1 (J ) / n 1 , ) 1 (J ) / n 1 , stop experimentation,
accept
H0
and
report
the
conditional
error
probability
ECBM (J , n) P( x BCBM | H A ) ) n BCBM 1 ;
^
`
if x ! max ) 1 (J ) / n 1 , ) 1 (J ) / n 1 , accept H A and report the
conditional
error
probability
DCBM (J , n) Px ! ACBM | H 0 1 ) n ACBM 1 . Otherwise, do not make the decision and continue the observation of the random variable. Here J is the desired value of restriction in (3.4), ) is the standard normal c.d.f. and
Comparison Analysis of Hypotheses Testing Methods
^ min^)
n 1 ,)
169
` n 1 `.
max )1J / n 1 , )1J / n 1 ,
ACBM BCBM
1
J /
1
J /
Wald’s sequential test for this concrete example is as follows:
x
if
x !
1 1 E ln , 2n D
stop
experimentation,
accept
H0 ;
if
1 E ln , stop experimentation, accept H A ; otherwise, do not 2n 1 D
make the decision and continue the observation of the random variable. The error probabilities of Type I and Type II computed similarly to the previous case are:
DW (D , E , n) Px ! AW | H 0 1 ) n AW 1 and
EW (D , E , n) P( x BW | H A ) ) n BW 1 respectively. Here AW
1 1 E 1 E ln and BW ln . 2n D 2n 1 D
It is obvious that, when D
E in Wald’s test and D in Berger’s test from
(3.19) are equal, the hypotheses acceptance thresholds in both these tests are the same. That means that these tests become identical. Let us consider the case when, for Wald’s test, D Berger’s test, D J
E
0.05 , for the
E 0.05 , and, for the sequential test of Bayesian type,
0.05 . The dependences of the thresholds on the number of observations
in the considered tests for chosen error probabilities are shown in Figure 3.5. The computed values are given in Table 3.5. The dependence of error probabilities on the number of observations in the sequential test of Bayesian type and in Wald’s test (that is the same, in Berger’s test) is shown in Figure 3.6, and the computed values are given in Table 3.6. From this data, it is seen that in the sequential test of Bayesian type, the probability of
Chapter 3
170
incorrectly accepting a hypothesis when another hypothesis is true at increasing n , decreases more significantly than in Wald’s test, but the probability of not accepting a true hypothesis in Wald’s test decreases more significantly at increasing n than in the sequential test of Bayesian type. Though, it should be noted that Berger computed the error probabilities in a similar manner as Fisher had for the given value of the statistics (Berger, Brown & Wolpert, 1994). These probabilities given in Table 3.6 were computed as the averaged possibilities of occurrence of such events in the manner similar to the Neyman’s principle.
A_CBM and B_CBM - the upper and lower thresholds of the sequential test of Bayesian type; A_W and A_B - the upper thresholds of Wald's and Berger's sequential tests, respectively; B_W and B_B - the lower thresholds of Wald's and Berger's sequential tests, respectively.

Figure 3.5. Dependence of the thresholds on the number of observations in the considered tests (Kullback's divergence between the considered hypotheses J(1:2) = 2).
Table 3.5. The computed values of the thresholds depending on the number of observations in the considered tests.

  n     A_CBM      B_CBM       A_W and A_B   B_W and B_B
  1     0.64485    -0.64485    1.47222       -1.47222
  2     0.16309    -0.16309    0.73611       -0.73611
  3     0.05034    -0.05034    0.49074       -0.49074
  4     0.17757    -0.17757    0.36805       -0.36805
  5     0.2644     -0.2644     0.29444       -0.29444
  6     0.32849    -0.32849    0.24537       -0.24537
  7     0.3783     -0.3783     0.21032       -0.21032
  8     0.41846    -0.41846    0.18403       -0.18403
  9     0.45172    -0.45172    0.16358       -0.16358
  10    0.47985    -0.47985    0.14722       -0.14722
  11    0.50406    -0.50406    0.13384       -0.13384
  12    0.52517    -0.52517    0.12268       -0.12268
  13    0.5438     -0.5438     0.11325       -0.11325
  14    0.56039    -0.56039    0.10516       -0.10516
  15    0.5753     -0.5753     0.09815       -0.09815
  16    0.58879    -0.58879    0.09201       -0.09201
  17    0.60106    -0.60106    0.0866        -0.0866
  18    0.6123     -0.6123     0.08179       -0.08179
  19    0.62264    -0.62264    0.07749       -0.07749
  20    0.6322     -0.6322     0.07361       -0.07361
  21    0.64106    -0.64106    0.07011       -0.07011
  22    0.64932    -0.64932    0.06692       -0.06692
  23    0.65702    -0.65702    0.06401       -0.06401
  24    0.66425    -0.66425    0.06134       -0.06134
  25    0.67103    -0.67103    0.05889       -0.05889
  26    0.67742    -0.67742    0.05662       -0.05662
  27    0.68345    -0.68345    0.05453       -0.05453
  28    0.68915    -0.68915    0.05258       -0.05258
  29    0.69456    -0.69456    0.05077       -0.05077
  30    0.69969    -0.69969    0.04907       -0.04907
  40    0.73993    -0.73993    0.03681       -0.03681
  50    0.76738    -0.76738    0.02944       -0.02944
  60    0.78765    -0.78765    0.02454       -0.02454
  70    0.8034     -0.8034     0.02103       -0.02103
  80    0.8161     -0.8161     0.0184        -0.0184
  90    0.82662    -0.82662    0.01636       -0.01636
  100   0.83551    -0.83551    0.01472       -0.01472
Figure 3.6. Dependence of the error probabilities on the number of observations in the sequential test of Bayesian type.
Table 3.6. The values of error probabilities depending on the number of observations. The Error I probability in CBM is P(x̄ > A_CBM | H_0) = P(x̄ < B_CBM | H_A) and the Error II probability in CBM is P(x̄ < A_CBM | H_A) = P(x̄ > B_CBM | H_0); for Wald's test, the Error II probability is P(x̄ < A_W | H_A) = P(x̄ > B_W | H_0) and the Error I probability is P(x̄ > A_W | H_0) = P(x̄ < B_W | H_A).

  n      Error I prob.   Error II prob.   Error II prob.    Error I prob.
         in CBM          in CBM           for Wald's test   for Wald's test
  1      0.05            0.36124          0.68161           0.00671
  2      0.05            0.11829          0.3545            0.00704
  3      0.03444         0.05             0.18887           0.00491
  4      0.00926         0.05             0.10313           0.00311
  5      0.00235         0.05             0.05732           0.0019
  6      0.00057         0.05             0.03227           0.00114
  7      0.00013         0.05             0.01834           0.00068
  8      0.00003         0.05             0.0105            0.00041
  9      0.00001         0.05             0.00605           0.00024
  10     0               0.05             0.0035            0.00014
  11     0               0.05             0.00203           0.00008
  12     0               0.05             0.00119           0.00005
  13     0               0.05             0.00069           0.00003
  14     0               0.05             0.00041           0.00002
  15     0               0.05             0.00024           0.00001
  16     0               0.05             0.00014           0.00001
  17     0               0.05             0.00008           0
  18     0               0.05             0.00005           0
  19     0               0.05             0.00003           0
  20     0               0.05             0.00002           0
  21     0               0.05             0.00001           0
  22     0               0.05             0.00001           0
  ≥23    0               0.05             0                 0
The computation results for the sequentially processed sample generated by N(1,1) with 17 observations are given in Table 3.7, where the arithmetic mean of the observations x_k, ..., x_m is denoted by x̄_{k,m}. From this, it is seen that Wald's and Berger's tests yield the same results, though the reported error probabilities in Berger's test are a little smaller than in Wald's test for the reason mentioned above (Berger computed the error probabilities for the given value of the statistic). Out of 17 observations, correct decisions were taken 7 times, on the basis of 3, 3, 5, 1, 1, 3 and 1 observations, in both tests; the average number of observations per decision is 2.43. In the sequential test of Bayesian type, for the same sample, correct decisions were taken 10 times, on the basis of 1, 2, 2, 1, 3, 2, 1, 1, 3 and 1 observations; the average number of observations per decision is 1.7. The reported error probabilities in the sequential test of Bayesian type and in Wald's test decrease with the number of observations used for making the decision (see Table 3.6); with respect to the Type II error probability, the sequential test of Bayesian type strongly surpasses Wald's test. The corresponding characteristics of Berger's test, in contrast, have no monotonic dependence on the number of observations (for the reason mentioned above); they are basically defined by the value of the likelihood ratio. For example, the value of the Type I error probability for 5 observations (x_7, ..., x_11) surpasses the analogous value for 3 observations (x_14, x_15, x_16), and both of them surpass the same value for 1 observation (x_17).
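The grouping of the observations into decisions reported below in Table 3.7 can be checked directly for Wald's test with a short sketch that simply restarts the test after every decision; the observation values are those of the table, and the restart-after-decision convention mirrors the description above. Names are illustrative.

```python
import math

def wald_sequential(xs, alpha=0.05, beta=0.05):
    """Run Wald's test repeatedly over a stream of observations,
    restarting after every decision; returns the list of (n_used, decision)."""
    decisions, s, k = [], 0.0, 0
    for x in xs:
        s += x
        k += 1
        xbar = s / k
        A = math.log((1 - beta) / alpha) / (2 * k)   # upper threshold A_W
        B = math.log(beta / (1 - alpha)) / (2 * k)   # lower threshold B_W
        if xbar > A:
            decisions.append((k, "HA"))
            s, k = 0.0, 0
        elif xbar < B:
            decisions.append((k, "H0"))
            s, k = 0.0, 0
    return decisions

sample = [1.201596, 0.043484, 0.658932, 0.039022, 0.616960, 2.026540,
          -0.422764, 0.562569, 0.353047, -0.123311, 1.126263, 1.521061,
          1.486411, -0.578935, 0.623006, 1.616669, 1.754413]
# should reproduce seven H_A decisions after 3, 3, 5, 1, 1, 3 and 1 observations
print(wald_sequential(sample))
```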
Table 3.7. The results of testing of a normal sample.

| n  | Observation result x_i | x̄_{k,m} (Berger, Wald) | Berger's test               | Wald's test | x̄_{k,m} (Bayesian type) | Sequential test of Bayesian type |
|----|------------------------|-------------------------|-----------------------------|-------------|--------------------------|----------------------------------|
| 1  | 1.201596               |                         |                             |             | x̄_1 = 1.2016            | H_A                              |
| 2  | 0.043484               |                         |                             |             |                          |                                  |
| 3  | 0.658932               | x̄_{1,3} = 0.6347       | H_A, α = 0.0217, β = 0.9783 | H_A         | x̄_{2,3} = 0.3512        | H_A                              |
| 4  | 0.039022               |                         |                             |             |                          |                                  |
| 5  | 0.616960               |                         |                             |             | x̄_{4,5} = 0.3280        | H_A                              |
| 6  | 2.026540               | x̄_{4,6} = 0.8942       | H_A, α = 0.0047, β = 0.9953 | H_A         | x̄_6 = 2.02654           | H_A                              |
| 7  | -0.422764              |                         |                             |             |                          |                                  |
| 8  | 0.562569               |                         |                             |             |                          |                                  |
| 9  | 0.353047               |                         |                             |             | x̄_{7,9} = 0.1643        | H_A                              |
| 10 | -0.123311              |                         |                             |             |                          |                                  |
| 11 | 1.126263               | x̄_{7,11} = 0.2992      | H_A, α = 0.0478, β = 0.9522 | H_A         | x̄_{10,11} = 0.5015      | H_A                              |
| 12 | 1.521061               | x̄_12 = 1.521061        | H_A, α = 0.0456, β = 0.9544 | H_A         | x̄_12 = 1.521061         | H_A                              |
| 13 | 1.486411               | x̄_13 = 1.4864          | H_A, α = 0.0487, β = 0.9513 | H_A         | x̄_13 = 1.486411         | H_A                              |
| 14 | -0.578935              |                         |                             |             |                          |                                  |
| 15 | 0.623006               |                         |                             |             |                          |                                  |
| 16 | 1.616669               | x̄_{14,16} = 0.5536     | H_A, α = 0.0348, β = 0.9652 | H_A         | x̄_{14,16} = 0.55362     | H_A                              |
| 17 | 1.754413               | x̄_17 = 1.7544          | H_A, α = 0.0291, β = 0.9709 | H_A         | x̄_17 = 1.754413         | H_A                              |

Average number of observations per decision: 2.43 for Berger's and Wald's tests, and 1.7 for the sequential test of Bayesian type.
Example 3.4. Let us briefly consider example 7 from Berger, Brown and Wolpert's (1994) study. The sequential experiment involves i.i.d. N(θ,1) data for testing H_0: θ = 0 versus H_A: θ = 1 under a symmetric stopping rule (or at least a rule for which α = β). Suppose the report states that sampling stopped after 20 observations, with x̄_20 = 0.7. In this case, the likelihood ratio is

B_20 = ∏_{i=1}^{20} [f(x_i | 0)/f(x_i | 1)] = exp{−20(x̄_20 − 0.5)} = 0.018.

T* test. Compute F_0(1) = 0.8413. Therefore, a = 1 and r = 0.3174. Because B_20 = 0.018 ≤ r = 0.3174, the basic hypothesis H_0 is rejected and the associated conditional error probability α(B_20) = B_20/(1 + B_20) = 0.01768 is reported.

Wald test. Choosing α = 0.05 and β = 0.05, the thresholds are computed as A = 19 and B = 0.0526. Because B_20 = 0.018 < B = 0.0526, the alternative hypothesis H_A is accepted. The error probabilities are

α = P(B_20 < 0.0526 | H_0) = 0.001899 and β = P(B_20 > 19 | H_A) = 0.001899.

CBM test. The results of computation obtained by CBM for the data x̄_20 = 0.7, σ²_{x̄_20} = 1/20 = 0.05 and γ = 0.05 are the following: λ = 3.141981 and B_CBM = 1/λ = 0.31827. Because B_20 = 0.018 < B_CBM = 1/λ = 0.31827, the alternative hypothesis H_A is accepted and the error probabilities are

α = P(B_20 < 0.31827 | H_0) = 0.00635 and β = P(B_20 > 3.141981 | H_A) = 0.00635.

If γ = 0.01 is chosen, the computation results are the following: the alternative hypothesis H_A is accepted with error probabilities α = 0.01 and β = 0.015945.
It is obvious that, for this example, in terms of error probabilities CBM surpasses the T* test, and Wald's method surpasses CBM. Though, for the sake of fairness, it is necessary to note that the error probabilities of CBM are also quite small.
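The arithmetic of Example 3.4 can be verified in a few lines. The sketch below is only a numerical check under the assumptions stated in the example; it uses the fact that B_20 is a monotone function of x̄_20, so events on B_20 translate into events on the normally distributed sample mean.

```python
import math

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, xbar = 20, 0.7
B_n = math.exp(-n * (xbar - 0.5))          # likelihood ratio f(x|0)/f(x|1)
print("B_20 =", round(B_n, 3))              # ~0.018

# Berger's T* report after rejecting H0
print("alpha(B_20) =", round(B_n / (1.0 + B_n), 5))   # ~0.01768

# Wald's thresholds for alpha = beta = 0.05
alpha = beta = 0.05
A, B = (1.0 - beta) / alpha, beta / (1.0 - alpha)      # 19 and ~0.0526

def p_LR_below(c, theta):
    # P(B_n < c | theta):  B_n < c  <=>  xbar > 0.5 - ln(c)/n,  xbar ~ N(theta, 1/n)
    t = 0.5 - math.log(c) / n
    return 1.0 - Phi((t - theta) * math.sqrt(n))

print("P(B_20 < B | H0) =", round(p_LR_below(B, 0.0), 6))          # ~0.0019
print("P(B_20 > A | HA) =", round(1.0 - p_LR_below(A, 1.0), 6))    # ~0.0019
print("P(B_20 < 1/lambda | H0) =", round(p_LR_below(0.31827, 0.0), 5))  # ~0.00635
```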
3.4. Comparison of the Directional Hypotheses Testing Methods

Directional hypotheses are relatively new in comparison to traditional hypotheses. For parametric models, this problem can be stated as H_0: θ = θ_0 vs. H_−: θ < θ_0 or H_+: θ > θ_0, where θ is the parameter of the model and θ_0 is known (see, for example, Bansal & Sheng, 2010). Till now, CBM has been introduced and investigated for the stepwise loss function (see, for example, Kachiashvili, 2011; Kachiashvili et al., 2012b). Let us now consider the general case (see Item 1.1.3) and one of the possible formulations of CBM, namely Task 1, given by (1.17) and (1.18). The kinds of functions (1.17) and (1.18) could be chosen differently depending on what types of restrictions are desired, which is determined by the aim of the practical problem that must be solved (Kachiashvili, 2011, 2018a; Kachiashvili et al., 2012b). Statements (1.17) and (1.18) minimize the averaged risk caused by incorrectly accepted hypotheses under restriction of the averaged risk caused by incorrectly rejected hypotheses. Its solution is given by (1.19). For the stepwise loss function, the statement and the solution of this task transform into (1.21), (1.22) and (1.23). For the losses L_1(H_i, H_j) and L_2(H_i, H_j), decision-making regions (1.19) take the form (2.6) and condition (2.4) must be fulfilled.
3.4.1. CBM for the normally distributed directional hypotheses

For illustration of the fact that the results of CBM are better than the results of the Bayes and frequentist methods when testing the directional hypotheses, let us consider the example given in Bansal and Sheng (2010) for showing some advantage of the Bayes rule in comparison with the frequentist one. Let the sample X_1, X_2, ..., X_n be derived from N(θ, σ²) with known σ², and let p(x̄ | H_−) and p(x̄ | H_+) be the densities induced by the truncated normal N(0, ω_0⁻¹σ²) (ω_0 known) priors over (−∞, 0) and (0, +∞), respectively. Due to the aforementioned, the arithmetic mean is the sufficient statistic. For determination of hypotheses acceptance regions (2.53), (2.56) and (2.58), the following ratios must be determined:

(1 − p(H_−|x̄))/(1 − p(H_0|x̄)),  (1 − p(H_+|x̄))/(1 − p(H_0|x̄))  and  (1 − p(H_−|x̄))/(1 − p(H_+|x̄)).

Taking into account the conditions of the stated problem, after routine computation, we have

(1 − p(H_−|x̄))/(1 − p(H_0|x̄)) = [p(H_0|x̄) + p(H_+|x̄)]/[p(H_−|x̄) + p(H_+|x̄)]
  = [p_0 √(n+ω_0) exp{−u²/2} + 2√ω_0 p_+ Φ(u)] / (2√ω_0 [p_−(1−Φ(u)) + p_+ Φ(u)]),

(1 − p(H_+|x̄))/(1 − p(H_0|x̄)) = [p_0 √(n+ω_0) exp{−u²/2} + 2√ω_0 p_−(1−Φ(u))] / (2√ω_0 [p_−(1−Φ(u)) + p_+ Φ(u)]),

(1 − p(H_−|x̄))/(1 − p(H_+|x̄)) = [p_0 √(n+ω_0) exp{−u²/2} + 2√ω_0 p_+ Φ(u)] / [p_0 √(n+ω_0) exp{−u²/2} + 2√ω_0 p_−(1−Φ(u))],   (3.20)

where Φ(·) is the standard normal distribution function and u = n x̄/(σ√(n+ω_0)). Application of these ratios to hypotheses acceptance regions (2.53), (2.56) and (2.58) gives
Γ_0 = {x̄ : (1 − p(H_−|x̄))/(1 − p(H_0|x̄)) > K_0/(K_1 λ)  and  (1 − p(H_+|x̄))/(1 − p(H_0|x̄)) > K_0/(K_1 λ)},   (3.21)

Γ_− = {x̄ : (1 − p(H_−|x̄))/(1 − p(H_0|x̄)) < (K_0/K_1) λ  and  (1 − p(H_−|x̄))/(1 − p(H_+|x̄)) < λ},   (3.22)

Γ_+ = {x̄ : (1 − p(H_+|x̄))/(1 − p(H_0|x̄)) < (K_0/K_1) λ  and  (1 − p(H_−|x̄))/(1 − p(H_+|x̄)) > 1/λ},   (3.23)

where the ratios entering (3.21), (3.22) and (3.23) are those computed in (3.20).
In (3.21), (3.22) and (3.23), the Lagrange multiplier λ is determined so that equality is achieved in condition (2.63).

Finally, the decision rule in the considered case is the following: if x̄ belongs to only one of the regions Γ_0, Γ_− or Γ_+ determined by formulae (3.21), (3.22) and (3.23), then the appropriate hypothesis is accepted. Otherwise, i.e., if x̄ belongs to more than one of the considered regions or does not belong to any of them, a decision is not made: in the first case a single decision cannot be made, because more than one hypothesis is suspected to be true, and in the second case no hypothesis can be accepted. For making a decision, it is necessary to change the restriction level r_1 in (2.63) or to add one more observation to the sample.
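For concreteness, the ratios (3.20) can be evaluated numerically as below. This is only a minimal sketch of the computation reconstructed above: the quantities q0, qm, qp are proportional to the posterior probabilities of H_0, H_−, H_+ (the common normalizing factor cancels in the ratios), and the function name and argument layout are illustrative.

```python
import math

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def posterior_ratios(xbar, n, sigma, omega0, p_minus, p0, p_plus):
    """Return the three ratios of (3.20):
    (1-P(H-|x))/(1-P(H0|x)), (1-P(H+|x))/(1-P(H0|x)), (1-P(H-|x))/(1-P(H+|x))."""
    u = n * xbar / (sigma * math.sqrt(n + omega0))
    # quantities proportional to the posterior probabilities of H0, H-, H+
    q0 = p0 * math.sqrt(n + omega0) * math.exp(-u * u / 2.0)
    qm = 2.0 * math.sqrt(omega0) * p_minus * (1.0 - Phi(u))
    qp = 2.0 * math.sqrt(omega0) * p_plus * Phi(u)
    return (q0 + qp) / (qm + qp), (q0 + qm) / (qm + qp), (q0 + qp) / (q0 + qm)

# e.g. with the prior probabilities used later in Section 3.4.2
print(posterior_ratios(0.1, 100, 1.0, 1.0, 0.3975, 0.3975, 0.205))
```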
3.4.1.1. Determination of the Lagrange multiplier

As mentioned above, the Lagrange multiplier λ is determined so that equality is achieved in condition (2.59). For the solution of equation (2.59), the computation of the following integrals is necessary:

∫_{Γ_0} p(x̄ | H_0) dx̄,  ∫_{Γ_−} p(x̄ | H_−) dx̄  and  ∫_{Γ_+} p(x̄ | H_+) dx̄.   (3.24)

The first integral can be easily computed by the Monte-Carlo method. It is necessary to generate the random variable x̄ with distribution law p(x̄ | H_0) = N(x̄ | 0, σ²/n) N times and to check the condition x̄ ∈ Γ_0 (see (3.21)). Let the condition x̄ ∈ Γ_0 be fulfilled N_1 ≤ N times. Then

∫_{Γ_0} p(x̄ | H_0) dx̄ ≈ N_1/N.

For computation of the second integral of (3.24), we have to generate the random variable x̄ with distribution law p(x̄ | H_−) N times and to check the condition x̄ ∈ Γ_− (see (3.22)). Let the condition x̄ ∈ Γ_− be fulfilled N_2 ≤ N times. Then

∫_{Γ_−} p(x̄ | H_−) dx̄ ≈ N_2/N.
Taking into account the specificity of the considered case, for the distribution law p(x̄ | H_−) we have

p(x̄ | H_−) = ∫_{−∞}^{0} (√n/(√(2π)σ)) exp{−n(x̄−θ)²/(2σ²)} · (2√ω_0/(√(2π)σ)) exp{−ω_0 θ²/(2σ²)} dθ
           = (2√ω_0 √n/(√(2π)σ√(n+ω_0))) (1 − Φ(u)) exp{−ω_0 u²/(2n)},

where u = n x̄/(σ√(n+ω_0)). Therefore,

p(x̄ | H_−) = (2√ω_0 √n/(√(2π)σ√(n+ω_0))) (1 − Φ(n x̄/(σ√(n+ω_0)))) exp{−ω_0 n² x̄²/(2nσ²(n+ω_0))}.

Let us denote σ_1² = σ²(n+ω_0)/(nω_0), i.e. σ_1 = σ√(n+ω_0)/√(nω_0); then

p(x̄ | H_−) = 2 (1 − Φ(√n x̄/(σ_1 √ω_0))) N(x̄ | 0, σ_1²),  x̄ ∈ (−∞, 0),

where N(x̄ | 0, σ_1²) is the normal distribution density with mathematical expectation equal to zero and variance equal to σ_1². For obtaining a sample of x̄ with PDF p(x̄ | H_−), it is necessary to solve the equation

∫_{−∞}^{x̄} 2 (1 − Φ(√n y/(σ_1 √ω_0))) N(y | 0, σ_1²) dy = u,  x̄ ∈ (−∞, 0),

where u is a uniformly distributed random variable from the interval [0,1], i.e. u ~ U[0,1].

The conditional distribution density of x̄ at validity of H_+ is

p(x̄ | H_+) = ∫_{0}^{+∞} (√n/(√(2π)σ)) exp{−n(x̄−θ)²/(2σ²)} · (2√ω_0/(√(2π)σ)) exp{−ω_0 θ²/(2σ²)} dθ
           = 2 Φ(√n x̄/(σ_1 √ω_0)) N(x̄ | 0, σ_1²),  x̄ ∈ (0, +∞).

For obtaining a sample of x̄ with PDF p(x̄ | H_+), it is necessary to solve the equation

∫_{0}^{x̄} 2 Φ(√n y/(σ_1 √ω_0)) N(y | 0, σ_1²) dy = u,  x̄ ∈ (0, +∞),

where u ~ U[0,1].
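The integrals (3.24) lend themselves to a simple Monte-Carlo sketch. Instead of inverting the distribution functions derived above, the sketch below draws θ from the corresponding truncated prior and then x̄ given θ, which produces the same marginal law p(x̄ | H_−) or p(x̄ | H_+); the region test is passed in as a function, since it depends on the current value of λ. Names and defaults are illustrative.

```python
import random, math

def sample_xbar(hyp, n=100, sigma=1.0, omega0=1.0, rng=random):
    """Draw one value of the sample mean under H0 ('0'), H- ('-') or H+ ('+')."""
    if hyp == "0":
        theta = 0.0
    else:
        # theta ~ truncated N(0, sigma^2/omega0) on (-inf, 0) or (0, +inf)
        theta = abs(rng.gauss(0.0, sigma / math.sqrt(omega0)))
        if hyp == "-":
            theta = -theta
    return rng.gauss(theta, sigma / math.sqrt(n))

def mc_probability(region, hyp, N=10000, **kw):
    """Monte-Carlo estimate of the integral of p(xbar | H_hyp) over a region:
    the fraction N1/N of simulated means that satisfy the region test."""
    hits = sum(region(sample_xbar(hyp, **kw)) for _ in range(N))
    return hits / N

# usage sketch: 'in_Gamma_0' would implement inequality (3.21) for a given lambda
# in_Gamma_0 = lambda x: ...      # hypothetical region test
# print(mc_probability(in_Gamma_0, "0", n=100))
```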
3.4.2. Computation results

For the reasons noted in the beginning of Section 3.4.1, let us compute a concrete example with the initial data from Bansal and Sheng (2010): the a priori probabilities are p = {p_−, p_0, p_+} = {0.3975, 0.3975, 0.205} and p′ = {p′_−, p_0, p′_+} = {0.205, 0.3975, 0.3975}; the values of the loss functions are K_0 = K_1 = 1; the probabilities in restriction (2.60) are α_0 = α_− = α_+ = 0.05; the coefficient is ω_0 = 1; the variance is σ² = 1; the sample size is n = 100. The probabilities were computed by simulating 10,000 samples from the appropriate populations. Computation results are given in Table 3.8.
p
pc
p
pc
Bayes at
CBM at
CBM at
pc
CBM at
Bayes at
p
pc
CBM at
CBM at
p
CBM at
p
pc
Bayes at
pc
p
Bayes at
Bayes at
pc
CBM at
Bayes at
p
CBM at
Used method
0.9573
6.5014
0.9576
7.5
0.9580
6.1718
0.9652
0.9580
8.75
7.1875
1 1
0.9654
8.75
1 1
0.9653
8.5937
1 1
0.9658
Averaged probability on the left-side of (2.59)
8.7213
λ
Lagrange Multiplier
-0.2
-0.3
-0.4
0.1613
0.0353
0.1231 0.1917
0.0328
0.0023
0.0154 0.0338
0.0007
0.0001
0.0007 0.0023
0
0.1771
0.2148
0.8724 0.7981
0.5459
0.5924
0.9834 0.9643
0.853
0.8942
0.9993 0.9975
0.9837
0.9865
0
0
0 0
0
0
0 0
0
0
0 0
0
0
H+
H−
0
H0
x -0.5
Hypotheses acceptance probabilities
Math. expectation of the sample
0.2148 (0.9647) 0.1771 (0.8387)
0.5924 (0.9977) 0.5459 (0.9672) 0.8769 0.8083
0.8942 (0.9999) 0.853 (0.9993) 0.9846 0.9662
0.9993 0.9977
0.9865 (1) 0.9837 (1)
H0
0.0368 (0.7852) 0.1696 (0.8229)
0.0023 (0.4076) 0.0331 (0.4541) 0.1276 0.2019
0.0001 (0.1058) 0.0007 (0.147) 0.0166 0.0357
0.0007 0.0025
0 (0.0135) 0 (0.0163)
H−
0.9985 (1) 0.9917 (1)
0.9997 (1) 1 1
1
1 1
1
1
1 1
1
1
H+
Hypotheses rejection probabilities (probabilities of impossibility of acceptance of Hypotheses)
Table 3.8. The results of testing directional hypotheses using CBM and Bayes rules.
0.6616
0.7499
0 0
0.4213
0.4053
0 0
0.1463
0.1057
0 0
0.0163
0.0135
Probability of impossibility of making a decision
p
pc
p
pc
p
pc
p
pc
p
pc
CBM at
CBM at
Bayes at
Bayes at
CBM at
CBM at
Bayes at
Bayes at
pc
CBM at
Bayes at
p
CBM at
Bayes at
p
pc
pc
CBM at
Bayes at
p
CBM at
Bayes at
p
pc
Bayes at
Bayes at
0.9573
6.5014
0.9579
7.0312
0.9574
6.5429
0.9574
6.2402
1 1
0.9662
8.75
1 1
0.9652
8.75
1 1
0.9662
8.75
1 1
0.9662
8.75
1 1
0.2
0.1
0
-0.1
0.5595 0.4247
0.0759
0.1166
0.8683 0.7862
0.2935
0.3334
0.9493 0.9492
0.534
0.3915
0.7891 0.874
0.4775
0.1916
0.4246 0.5513
0.0001 0.0001
0
0
0.0023 0.0006
0.0001
0
0.033 0.0148
0.0017
0.0022
0.2043 0.1182
0.0303
0.036
0.567 0.4373
0.4296 0.5666
0.2633
0.1587
0.1221 0.2081
0.0491
0.0227
0.0147 0.0331
0.0037
0.0016
0.0009 0.0025
0.0002
0
0 0.0001
0.1587 (0.8834) 0.2633 (0.9241) 0.4405 0.5753
0.1317 0.2138
0.0227 (0.6666) 0.0492 (0.7065)
0.0038 (0.6085) 0.0054 (0.466) 0.0507 0.0508
0.036 (0.8084) 0.0305 (0.5225) 0.2109 0.126
0.5754 0.4487
0.9874 (1) 0.9992 (1) 0.9999 0.9999
0.9977 0.9994
0.888 (1) 0.983 (0.9999)
0.5857 (0.9978) 0.8532 (0.9983) 0.967 0.9852
0.2187 (0.964) 0.541 (0.9697) 0.7957 0.8818
0.433 0.5627
0.1292 (0.8413) 0.0767 (0.7367) 0.5704 0.4334
0.8779 0.7919
0.4454 (0.9773) 0.3105 (0.9509)
0.8058 (0.9984) 0.6808 (0.9963) 0.9853 0.9669
0.9729 (1) 0.9365 (0.9998) 0.9991 0.9975
1 0.9999
0 0
0.6608
0.7247
0 0
0.6573
0.6439
0 0
0.4606
0.6047
0 0
0.492
0.7724
0 0
p
pc
p
pc
CBM at
CBM at
Bayes at
Bayes at
0.9580
7.1875
0.9580
6.1718
0.9576
7.5
1 1
0.9652
8.75
1 1
0.9662
8.75
1 1
0.9652
8.75
0.5
0.4
0.3
0.0023 0.0007
0
0
0.0311 0.0164
0
0.0005
0.1928 0.1182
0.005
0.0166
0 0
0
0
0 0
0
0
0 0
0
0
0.9976 0.9992
0.9896
0.9775
0.9668 0.983
0.9126
0.8426
0.7994 0.8792
0.6214
0.5095
0.9775 (1) 0.9896 (1) 0.9977 0.9993
0.8426 (0.9995) 0.9126 (1) 0.9689 0.9836
0.5095 (0.9834) 0.6214 (0.995) 0.8072 0.8818
1 1
1
1
1 1
1
1
1 1
0.9995 (1) 1
0 (0.0225) 0 (0.0104) 0.0024 0.0008
0.0005 (0.1574) 0 (0.0874) 0.0332 0.017
0.0171 (0.4905) 0.005 (0.3786) 0.2006 0.1208
0 0
0.0104
0.0225
0 0
0.0874
0.1569
0 0
0.3736
0.4739
Remark 3.2. The probabilities of impossibility of acceptance of hypotheses are given in the brackets of the columns of hypotheses rejection probabilities.
p
pc
pc
CBM at
Bayes at
p
CBM at
Bayes at
p
pc
Bayes at
pc
CBM at
Bayes at
p
CBM at
In accordance with the results presented in Table 3.8, the following dependences are constructed:
- dependences of the probabilities of impossibility of acceptance of the H_0 hypothesis on the arithmetic mean of the observation results (Figure 3.7);
- dependences of the probability of acceptance of the H_0 hypothesis on the arithmetic mean of the observation results (Figure 3.8);
- dependences of the probabilities of acceptance of the H_− and H_+ hypotheses on the arithmetic mean of the observation results (Figure 3.9);
- dependences of the probabilities of rejection of the H_− and H_+ hypotheses on the arithmetic mean of the observation results (Figure 3.10).
From these graphs, the correctness of the above-described theoretical results and the advantage of CBM in comparison with the Bayes rule and, accordingly, with the frequentist method are obvious.
Figure 3.7. Dependences of the probabilities of impossibility of acceptance of the H_0 hypothesis on the arithmetic mean of the observation results. B - Bayes rule; L1 - CBM for losses (1.20); p1 ≡ p(H_−) and p2 ≡ p(H_+).

Figure 3.8. Dependences of the probability of acceptance of the H_0 hypothesis on the arithmetic mean of the observation results.

Figure 3.9. Dependences of the probabilities of acceptance of the H_− and H_+ hypotheses on the arithmetic mean of the observation results (panels a and b).

Figure 3.10. Dependences of the probabilities of rejection of the H_− and H_+ hypotheses on the arithmetic mean of the observation results (panels a and b).
3.4.2.1. Discussion

CBM is more sensitive to changes in the a priori probabilities than the Bayes test because, in CBM, the a priori probabilities are multiplied by the probabilities of the significance levels; hence a change in the a priori probabilities changes the restriction level in (1.18), and accordingly the decision-making regions, more significantly. From the specificity of the decision rule in CBM, the following relations take place among the computed probabilities:
1. (prob. of rejec. of H_0) = (prob. of rejec. of all hypotheses) + (prob. of rejec. of H_0 and H_− and accep. of H_+) + (prob. of rejec. of H_0 and H_+ and accep. of H_−);
2. (prob. of accep. of H_0) + (prob. of accep. of H_−) + (prob. of accep. of H_+) + (prob. of not making a decision) = 1;
3. a) in the absence of intersecting regions: (prob. of accep. of H_0) + (prob. of rejec. of H_0) = 1; b) for intersecting regions: (prob. of accep. of H_0) + (prob. of rejec. of H_0) + (prob. of suspicion of more than one hypothesis to be true) = 1;
4. (prob. of rejec. of H_− and H_+ and accep. of H_0) = (prob. of x̄ ∈ Γ_0) − (prob. of accep. of H_0 and H_− and rejec. of H_+) − (prob. of accep. of H_0 and H_+ and rejec. of H_−) − (prob. of accep. of all H_0, H_− and H_+);
5. the summary risk (3.1) can be computed from the appropriate computation results as follows:
SR = p(H_−)·[(1 − (prob. of accep. of H_−)) − (prob. of impos. of H_−)]
   + p(H_0)·[(1 − (prob. of accep. of H_0)) − (prob. of impos. of H_0)]
   + p(H_+)·[(1 − (prob. of accep. of H_+)) − (prob. of impos. of H_+)].
The dependences of SR on the Lagrange multiplier are shown in Figure 3.11. They clearly demonstrate the validity of Theorem 3.1.

Figure 3.11. Dependence of the summary risk (SR) on the Lagrange multiplier. The graphs of the summary risk are constructed from the computed values of SR, using simulated samples under the supposition of the validity of the following hypotheses: a) H_−: x̄ = −0.1; H_0: x̄ = 0; H_+: x̄ = 0.1; b) H_−: x̄ = −0.2; H_0: x̄ = 0; H_+: x̄ = 0.2; c) H_−: x̄ = −0.2; H_0: x̄ = 0; H_+: x̄ = 0.1.
6. The Type III error rate (2.76) can be computed from the computation results by the following relation:

ERR_III^T = (prob. of accep. of H_− | H_0 is true) + (prob. of accep. of H_+ | H_0 is true),

and the Type III error rate (2.77) can be computed as

ERR_III^K = (prob. of accep. of H_− | H_+ is true) + (prob. of accep. of H_+ | H_− is true).

(A short computational sketch of SR and of these rates follows this list.)
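As a small computational sketch, the summary risk and the two Type III error rates can be read off acceptance and no-decision probabilities such as those reported in Table 3.8; the dictionary layout used here is purely illustrative.

```python
def summary_risk(p_prior, p_accept, p_impossible):
    """Summary risk (3.1) from acceptance and no-decision probabilities.
    All dictionaries are keyed by the hypothesis labels '-', '0', '+'."""
    return sum(p_prior[h] * (1.0 - p_accept[h] - p_impossible[h])
               for h in ("-", "0", "+"))

def type_iii_rates(p_accept_given):
    """p_accept_given[(decided, true)] = P(accept H_decided | H_true)."""
    err_T = p_accept_given[("-", "0")] + p_accept_given[("+", "0")]   # relation (2.76)
    err_K = p_accept_given[("-", "+")] + p_accept_given[("+", "-")]   # relation (2.77)
    return err_T, err_K
```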
Appropriate computed results are shown in Figure 3.12. They clearly demonstrate the validity of Theorem 2.2.
Figure 3.12. Dependences of the Type III error rates on the Lagrange multiplier. The graphs of the Type III error rates are constructed from the computed values of the ERRs, using simulated samples under the supposition of the validity of the following hypotheses: a) H_−: x̄ = −0.1; H_0: x̄ = 0; H_+: x̄ = 0.1; b) H_−: x̄ = −0.2; H_0: x̄ = 0; H_+: x̄ = 0.2; c) H_−: x̄ = −0.2; H_0: x̄ = 0; H_+: x̄ = 0.1.
Remark 3.3. The values of the different Type III error rates differ considerably. Therefore, the character of the change in the graphs of ERR_III^K for hypotheses a) and b) is not clear from the graph given on the left side of Figure 3.12. For avoiding this inconvenience, the graphs of the Type III error rates are grouped depending on their values and presented in the two right graphs of Figure 3.12 (ERR_III^T and ERR_III^K for case c) on the upper graph, and ERR_III^K for cases a) and b) on the lower graph).
As the conclusion of the current chapter, we note that the offered CBM is a more general method of hypotheses testing than the existing classical Fisher, Jeffreys, Neyman and Berger methods. It has all the positive properties of the aforementioned methods: it is a data-dependent measure like Fisher's test; for making the decision it uses a posteriori probabilities like Jeffreys' test; and it computes Type I and Type II error probabilities like the Neyman-Pearson approach. Like Berger's methods, it has no-decision regions. Moreover, the regions of making decisions have new, more general properties than the same regions in the other considered methods. These properties allow us to make better-founded and more reliable decisions; in particular, a unique hypothesis is not accepted, or no hypothesis is accepted at all, when the information on the basis of which the decision must be made is not enough for distinguishing informationally close hypotheses or for choosing a hypothesis among informationally distant ones. The computed results presented in this chapter confirm the above reasoning and clearly demonstrate the positive properties of CBM in comparison with the existing methods.

A very interesting peculiarity of CBM is the possibility of its use in parallel and sequential experiments without any changes, and of transiting smoothly from the parallel to the sequential methodology when necessary. Unlike Berger's and Wald's methods, the sequential test of Bayesian type is universal: without modification, it can be used for any number of hypotheses and any dimensionality of the observation vector. It is simple and very convenient to use, and methodologically it practically does not depend on the number of tested hypotheses or the dimensionality of the observation space. The computed results presented in this chapter clearly demonstrate the high quality of the sequential test of Bayesian type.

A generalization of CBM for arbitrary loss functions and its application to testing the directional hypotheses is also offered. The advantage of CBM in comparison with the Bayes and frequentist methods is theoretically proven and clearly demonstrated by a concrete computed example. The advantages of the use of CBM for testing the directional hypotheses are: 1) alongside the a priori probabilities and loss functions, it uses the significance levels of the hypotheses for sharpening the sensitivity concerning direction; 2) it makes decisions more carefully and with given reliability; 3) smaller values of SR and of the Type III error rates correspond to it. CBM allows making a decision with the required reliability if the existing information is sufficient; otherwise, it is necessary to increase the information or to reduce the required reliability of the made decision. CBM surpasses the Bayes and frequentist methods owing to the guaranteed reliability of the made decisions.
CHAPTER 4 EXPERIMENTAL INVESTIGATIONS
4.1. Simulation results of Directional Hypotheses at Restriction of False Discovery Rates

With the intention of checking the correctness of the theoretical results given in Item 3.4.1, let us consider the following example. Let us suppose that an observation result x is distributed by the normal distribution N(θ, σ²) with known σ² at H_0, and that it is distributed by the truncated N(0, ω_0⁻¹σ²) (ω_0 known) densities over (−∞, 0) and (0, +∞) at H_− and at H_+, respectively. The arithmetic mean is a sufficient statistic for this example. For determination of the hypotheses acceptance regions (2.81), (2.96) and (2.104), the following ratios must be determined:

p(H_0|x̄)/[p(H_−|x̄) + p(H_+|x̄)],  p(H_−|x̄)/[p(H_+|x̄) + p(H_0|x̄)]  and  p(H_+|x̄)/[p(H_0|x̄) + p(H_−|x̄)].

Note 4.1. Hypotheses acceptance regions (2.109) differ a little from these ones; therefore, the computational formulae for Task 5 are slightly different from the formulae given below. Because this is not related to any principal difficulties, we will not concentrate our attention on it.

Taking into account the conditions of the stated problem, after routine computation, we have
p(H_0|x̄)/[p(H_−|x̄) + p(H_+|x̄)] = p(H_0)√(n+ω_0) exp{−u²/2} / [p(H_+) 2√ω_0 Φ(u) + p(H_−) 2√ω_0 (1−Φ(u))],   (4.1)

p(H_−|x̄)/[p(H_+|x̄) + p(H_0|x̄)] = p(H_−) 2√ω_0 (1−Φ(u)) / [p(H_+) 2√ω_0 Φ(u) + p(H_0)√(n+ω_0) exp{−u²/2}],   (4.2)

p(H_+|x̄)/[p(H_0|x̄) + p(H_−|x̄)] = p(H_+) 2√ω_0 Φ(u) / [p(H_0)√(n+ω_0) exp{−u²/2} + p(H_−) 2√ω_0 (1−Φ(u))],   (4.3)

where Φ(·) is the standard normal distribution function and u = n x̄/(σ√(n+ω_0)) (Kachiashvili, 2018a; Kachiashvili et al., 2018). Application of these ratios to the hypotheses acceptance regions of the
considered Tasks 1, 2 and 4, i.e., to formulae (2.81), (2.96) and (2.104), respectively, gives

Γ_− = {x̄ : [p(H_0)√(n+ω_0) exp{−u²/2} + p(H_+) 2√ω_0 Φ(u)] / [p(H_−) 2√ω_0 (1−Φ(u))] < λ′ K_0/K_1},

Γ_0 = {x̄ : [p(H_−) 2√ω_0 (1−Φ(u)) + p(H_+) 2√ω_0 Φ(u)] / [p(H_0)√(n+ω_0) exp{−u²/2}] < λ″ K_0/K_1},

Γ_+ = {x̄ : [p(H_−) 2√ω_0 (1−Φ(u)) + p(H_0)√(n+ω_0) exp{−u²/2}] / [p(H_+) 2√ω_0 Φ(u)] < λ‴ K_0/K_1},   (4.4)

where λ′ = λ″ = λ‴ = λ_1 for Task 1; λ′ = λ_{2−}, λ″ = λ_{20} and λ‴ = λ_{2+} for Task 2; and λ′ = λ″ = λ‴ = 1/λ_4 for Task 4. The Lagrange multipliers are determined so that equality is achieved in the appropriate conditions (2.80), (2.95) and (2.103). For the solution of these equations, the computation of the following integrals is necessary:

∫_{Γ_j} p(x̄ | H_i) dx̄,  i, j ∈ Ψ,  Ψ ≡ {−, 0, +},   (4.5)

which can easily be done by the Monte-Carlo method described in Kachiashvili et al. (2018a). In particular, it is necessary to generate the random variable x̄ with distribution law p(x̄ | H_i) N times and to check the condition x̄ ∈ Γ_j. Let the condition x̄ ∈ Γ_j be fulfilled N_1 ≤ N times. Then

∫_{Γ_j} p(x̄ | H_i) dx̄ ≈ N_1/N.
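The equations for the Lagrange multipliers are solved numerically. A minimal bisection sketch is given below; it assumes that a (for example, Monte-Carlo) estimate of the relevant restriction, such as the left-hand side of condition (2.80), is available as a function of λ and is monotone over the bracketing interval. The function names and tolerances are illustrative.

```python
def solve_lambda(restriction_value, target, lam_lo=1e-3, lam_hi=1e3, tol=1e-4):
    """Bisection for the Lagrange multiplier so that restriction_value(lam) == target.
    'restriction_value' is assumed to be a monotone (e.g. Monte-Carlo) estimate of
    the left-hand side of the restriction for the current acceptance regions."""
    for _ in range(200):
        lam = 0.5 * (lam_lo + lam_hi)
        value = restriction_value(lam)
        if abs(value - target) < tol:
            return lam
        if value < target:          # assumed: the restriction grows with lambda
            lam_lo = lam
        else:
            lam_hi = lam
    return 0.5 * (lam_lo + lam_hi)

# usage sketch: restriction_value could combine mc_probability estimates of (4.5)
# lam = solve_lambda(lambda l: estimate_restriction(l), target=0.01025)
```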
4.1.1. Computation results for the normally distributed directional hypotheses

Let us compute a concrete example with the initial data from Bansal and Sheng (2010) and Kachiashvili et al. (2018a): the a priori probabilities are p = {p_−, p_0, p_+} = {0.3975, 0.3975, 0.205}; the values of the loss functions are K_0 = K_1 = 1; the coefficient in this case is ω_0 = 1; the variance is σ² = 1; and the level of mdFDR in all the considered cases is q = 0.05.

For these data, for satisfaction of the conditions of Theorems 2.3, 2.4, 2.5 and 2.6, the probabilities in restrictions (2.80), (2.95), (2.103) and (2.108) must be taken as r_1 = 0.01025; r_{2−} = r_{20} = r_{2+} = 0.0050456; r_4 = 0.01025; and r_{5,−} = r_{5,0} = r_{5,+} = 0.0125. Otherwise, we can take different restriction levels for the different restriction conditions of the considered Tasks, for example, proportionally to the a priori probabilities.

To save room and simplify the reading of the work, computation results are given below only for Task 1. Computation results for Task 1 when decisions are made on the basis of x̄_n, computed from n observation results, and the appropriate λ_n, obtained by solving equation (2.80) for p(x̄_n | H_i), i ∈ Ψ, Ψ ≡ {−, 0, +}, are given in Table 4.1. Computation results for Task 1 when decisions are made on the basis of x̄_n, computed from n observation results, and λ, obtained by solving equation (2.80) for n = 1, are given in Table 4.2. We used 3,000 simulation results for the computation of the probability integrals (4.5) when determining the Lagrange multipliers, and we used simulated samples with 10,000 observations for making decisions at different values of n. The correctness of the theoretical results of Item 2.3.1, in particular of Theorem 2.3, is obvious from these results. For changing the informational distance between the tested hypotheses, we were changing the value of n (see Table 4.1): evidently, the greater n is, the greater is the informational distance between the hypotheses. For easy perception, the results of Table 4.1 are presented graphically in Figures 4.1, 4.2, 4.3 and 4.4. It is seen from these that, when the informational distance between the hypotheses increases, the probability of acceptance of the true hypothesis increases, while the probability of impossibility of making a decision and the mixed directional false discovery rate decrease, which is logically correct.
Lagrange multiplier
λ
4.681218057687266 5.297517776489258 5.817072043658022 6.27502441406250 6.68864345672742 7.069199352497151 7.423412142832838 7.8125 8.070774078369141 8.370046615600586 8.65600024076107 8.930274526515243 9.194187541976135 9.448831728659570 9.695120066094205 9.933824826543038 10.165596008300781 10.391025543212891 10.610603003517735 10.82275390625 12.5
n
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 100
0/0.0128/0.0038 0/0.0156/0.005 0/0.0138/0.0047 0/0.0137/0.0052 0/0.0111/0.0053 0/0.0111/0.0044 0/0.0103/0.0037 0/0.0085/0.0045 0/0.0086/0.0037 0/0.0076/0.0031 0/0.0071/0.004 0/0.0065/0.0032 0/0.0059/0.0024 0/0.0049/0.0027 0/0.0052/0.0032 0/0.0047/0.0021 0/0.0055/0.0031 0/0.0044/0.0025 0/0.004/0.0029 0/0.0042/0.0028 0.1361/0.0014/ 0.0007
0/0.12/0 0/0.2122/0 0/0.2748/0 0/0.3195/0 0/0.3621/0 0/0.3847/0 0/0.4114/0 0/0.4371/0 0/0.4451/0 0/0.4678/0 0/0.4867/0 0/0.492/0 0/0.506/0 0/0.5181/0 0/0.5266/0 0/0.5308/0 0/0.5429/0 0/0.5459/0 0/0.5578/0 0/0.5743/0 0/0.7773/0
0/0/0.0535 0/0/0.1418 0/0/0.2065 0/0/0.2534 0/0/0.2865 0/0/0.3187 0/0/0.3599 0/0/0.3684 0/0/0.3948 0/0/0.4196 0/0/0.4292 0/0/0.446 0/0/0.4647 0/0/0.4773 0/0/0.4798 0/0/0.4962 0/0/0.5032 0/0/0.5165 0/0/0.5204 0/0/0.5268 0.0155/0/ 0.7528
0.9834 0.9794 0.9815 0.9811 0.9836 0.9845 0.986 0.987 0.9877 0.9893 0.9889 0.9903 0.9917 0.9924 0.9916 0.9932 0.9914 0.9931 0.9931 0.993 0.8618
0.88 0.7878 0.7252 0.6805 0.6379 0.6153 0.5886 0.5629 0.5549 0.5322 0.5133 0.508 0.494 0.4819 0.4734 0.4692 0.4571 0.4541 0.4422 0.4257 0.2227
at H− 0.9465 0.8582 0.7935 0.7466 0.7135 0.6813 0.6401 0.6316 0.6052 0.5804 0.5708 0.554 0.5353 0.5227 0.5202 0.5038 0.4968 0.4835 0.4796 0.4732 0.2317
at H+
at H0
at H+
at H0
at H−
Probabilities of impossibility of making a decision
Probabilities of acceptance of Hypotheses
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Pure directional false discovery rate dFDR
Mixed directional false discovery rate mdFDR