Joint Statistical Papers [Reprint 2020 ed.] 9780520339897



JOINT STATISTICAL PAPERS

JOINT STATISTICAL PAPERS J. NEYMAN & E. S. PEARSON

UNIVERSITY OF CALIFORNIA PRESS BERKELEY AND LOS ANGELES

University of California Press Berkeley and Los Angeles, California Library of Congress Catalog Card Number: 67-15554

PRINTED IN GREAT BRITAIN

CONTENTS

Foreword

1 On the use and interpretation of certain test criteria for purposes of statistical inference. Part I. Biometrika (1928) 20A, 175-240.

2 On the use and interpretation of certain test criteria for purposes of statistical inference. Part II. Biometrika (1928) 20A, 263-94.

3 On the problem of two samples. Bull. Acad. Pol. Sci. (1930), 73-96.

4 On the problem of k samples. Bull. Acad. Pol. Sci. (1931), 460-81.

5 Further notes on the χ² distribution. Biometrika (1931) 22, 298-305.

6 On the problem of the most efficient tests of statistical hypotheses. Phil. Trans. Roy. Soc. A (1933) 231, 289-337.

7 The testing of statistical hypotheses in relation to probabilities a priori. Proc. Camb. Phil. Soc. (1933) 24, 492-510.

8 Contributions to the theory of testing statistical hypotheses. Stat. Res. Mem. (1936) 1, 1-37.

9 Sufficient statistics and uniformly most powerful tests of statistical hypotheses. Stat. Res. Mem. (1936) 1, 113-37.

10 Contributions to the theory of testing statistical hypotheses. Stat. Res. Mem. (1938) 2, 25-57.

Appendix. On statistics the distribution of which is independent of the parameters involved in the original probability law of the observed variables. Stat. Res. Mem. (1938) 2, 57-9.

FOREWORD

Professor E. S. Pearson retired from the Managing Editorship of Biometrika at the end of 1965, and the Trustees decided to celebrate his long and distinguished editorship by reissuing some of his written contributions to statistics in collected form. Some of the most important of these were made jointly with Professor Jerzy Neyman and so it was decided to issue two volumes: one of papers largely by Professor Pearson alone, which has been published, and the other of the joint Neyman-Pearson papers, which is the present volume. The Trustees salute both authors, and are glad to make their important work available to statisticians and students in convenient form. The University Presses of Cambridge and California will shortly publish a volume of collected papers by Professor Neyman alone.

L. H. C. TIPPETT
Chairman of Biometrika Trustees
December 1966

ACKNOWLEDGEMENTS The Trustees of Biometrika thank the Polish Academy of Science and Letters, The Royal Society, The Cambridge Philosophical Society and the Department of Statistics, University College London, for permission to reproduce papers which originally appeared in their publications.

ON THE USE AND INTERPRETATION OF CERTAIN TEST CRITERIA FOR PURPOSES OF STATISTICAL INFERENCE. PART I.

By J. NEYMAN, PH.D. AND E. S. PEARSON, D.Sc.

CONTENTS.

I. Introductory
II. Sampling from a Normal Population:
(1) Description of the Fundamental Space
(2) The Use of the Contour Systems in testing Hypothesis A
(3) The Criterion of Likelihood
(4) The Application of the Criterion of Likelihood in testing Hypothesis A
(5) Student's Test and Hypothesis B
(6) Alternative Method of examining Hypothesis B
(7) Solutions obtained by the Inverse Method
(8) Analysis of Church's samples from a Skew Population
(9) Illustrations of the use of the Tables of $P_\lambda$
III. Sampling from a Rectangular Population:
(10) Sampling Distributions of the Frequency Constants
(11) Solutions obtained by the Inverse Method
(12) Illustrations from Experimental Sampling
IV. Sampling from an Exponential Population:
(13) Sampling Distributions of Frequency Constants
(14) Test for Random Intervals
V. Conclusion
VI. Appendix with $P_\lambda$ Tables

I. INTRODUCTORY.

One of the most common as well as most important problems which arise in the interpretation of statistical results is that of deciding whether or not a particular sample may be judged as likely to have been randomly drawn from a certain population, whose form may be either completely or only partially specified. We may term Hypothesis A the hypothesis that the population from which the sample Σ has been randomly drawn is that specified, namely Π. In general the method of procedure is to apply certain tests or criteria, the results of which will enable the investigator to decide with a greater or less degree of confidence whether to accept or reject Hypothesis A, or, as is often the case, will show him that further data are required before a decision can be reached. At first sight the problem may be thought to be a simple one, but upon fuller examination one


is forced to the conclusion that in many cases there is probably no single "best" method of solution. The sum total of the reasons which will weigh with the investigator in accepting or rejecting the hypothesis can very rarely be expressed in numerical terms. All that is possible for him is to balance the results of a mathematical summary, formed upon certain assumptions, against other less precise impressions based upon a priori or a posteriori considerations. The tests themselves give no final verdict, but as tools help the worker who is using them to form his final decision; one man may prefer to use one method, a second another, and yet in the long run there may be little to choose between the value of their conclusions. What is of chief importance in order that a sound judgment may be formed is that the method adopted, its scope and its limitations, should be clearly understood, and it is because we believe this often not to be the case that it has seemed worth while to us to discuss the principles involved in some detail and to illustrate their application to certain important sampling tests.

There are two distinct methods of approach, one to start from the population Π, and to ask what is the probability that a sample such as Σ should have been drawn from it, and the other the inverse method of starting from Σ and seeking the probability that Π is the population sampled. The first is the more customary method of approach, partly because it seems natural to take Π as the point of departure since in practice there are often strong a priori grounds for believing that this is the population sampled, and partly because there is a common tendency to view with suspicion any method involving the use of inverse probability.
But in fact, however strong may be the a priori evidence in favour of Π, there would be no problem at all to answer if we were not prepared to consider the possibility of alternative hypotheses as to the population sampled; and we shall find that it is impossible to follow the first method very far without introducing certain ideas of inverse probability, that is to say, arguing from the sample to the population. If on the other hand we start boldly with assumptions regarding a priori and a posteriori probability, we reach by an almost simpler method sampling tests very nearly equivalent to those obtained from the first starting-point. Indeed the inverse method may be considered by some the more logical of the two; we shall consider first however the other solution.

Perhaps the most suggestive method of description is to represent Σ by a point in a hyperspace whose dimensions will depend upon the particular problem considered; and to associate the criteria for acceptance or rejection with a system of contours in this space, so chosen that in moving out from contour to contour Hypothesis A becomes less and less probable*. The frequency with which the sample corresponding to a particular point will occur in random sampling from Π may be represented by giving to the space an appropriate "point-density." Thus the chance of drawing a sample whose representative point lies within a certain

* Here and later the term "probability" used in connection with Hypothesis A must be taken in a very wide sense. It cannot necessarily be described by a single numerical measure of inverse probability; as the hypothesis becomes "less probable," our confidence in it decreases, and the reason for this lies in the meaning of the particular contour system that has been chosen.


region in the space is proportional to the density integrated throughout that region, or in other words to the "weight" of the region. The contours are not necessarily contours of equal density, but may be surfaces or regions throughout which some statistical measure such as the mean or standard deviation remains at a constant "level." Although it is impossible to visualise a space of high dimensions, it will be found that with the help of the analogy of a three-dimensional density space this method of description is often a very considerable aid to purely algebraic discussion.

Setting aside the possibility that the sampling has not been random or that the population has changed during its course*, Σ must either have been drawn randomly from Π or from Π', where the latter is some other population which may have any one of an infinite variety of forms differing only slightly or very greatly from Π. The nature of the problem is such that it is impossible to find criteria which will distinguish exactly between these alternatives, and whatever method we adopt two sources of error must arise:

(1) Sometimes, when Hypothesis A is rejected, Σ will in fact have been drawn from Π.
(2) More often, in accepting Hypothesis A, Σ will really have been drawn from Π'.

In the long run of statistical experience the frequency of the first source of error (or in a single instance its probability) can be controlled by choosing as a discriminating contour, one outside which the frequency of occurrence of samples from Π is very small, say, 5 in 100 or 5 in 1000. In the density space such a contour will include almost the whole weight of the field. Clearly there will be an infinite variety of systems from which it is possible to choose a contour satisfying such a condition. For example there will be the system of contours upon any one of which (a) the mean, or (b) the standard deviation, or (c) the ratio of mean to standard deviation, of Σ is constant.
The second source of error is more difficult to control, but if wrong judgments cannot be avoided, their seriousness will at any rate be diminished if on the whole Hypothesis A is wrongly accepted only in cases where the true sampled population, Π', differs but slightly from Π. It is not of course possible to determine Π', but making use of some clearly defined conception of probability we may determine a "probable" or "likely" form of it, and hence fix the contours so that in moving "inwards" across them the difference between Π and the population from which it is "most likely" that Σ has been sampled should become less and less. This choice also implies that on moving "outwards" across the contours, other hypotheses as to the population sampled become more and more likely than Hypothesis A.

* A non-random or biased sample from Π may appear to be a random sample from Π'; in practice it may be very important to be able to attribute Σ to Π rather than Π'. But clearly no general rules to cover such cases can be given, for the position must depend in each case on the possible forms of bias that may have occurred.
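The two sources of error can be illustrated by a small simulation. The sketch below is not part of the original paper: the sample size, the alternative population Π' (a normal population whose mean is shifted by one standard deviation), the seed and all function names are assumptions chosen for illustration. It compares two contour systems, contours of constant mean and contours of constant standard deviation, each fixed so that about 5 in 100 samples from Π fall outside.

```python
import math
import random

N = 10               # sample size (illustrative assumption)
SIGMA = 1.0          # standard deviation of the specified population
Z_5PC = 1.96         # two-sided 5% point of the normal curve
CHI2_9_5PC = 16.919  # upper 5% point of chi-square with N - 1 = 9 degrees of freedom


def m_and_s(xs, a=0.0):
    """Mean measured from a, and standard deviation with divisor n."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return mean - a, s


def reject_by_mean(m, s):
    """Discriminating contour taken from the system of constant m."""
    return abs(m) > Z_5PC * SIGMA / math.sqrt(N)


def reject_by_sd(m, s):
    """Discriminating contour taken from the system of constant s."""
    return N * s * s / SIGMA ** 2 > CHI2_9_5PC


def error_rates(reject, shift=1.0, trials=5000, seed=1):
    """Frequencies of the first and second source of error for one rule."""
    rng = random.Random(seed)
    first = second = 0
    for _ in range(trials):
        from_pi = [rng.gauss(0.0, SIGMA) for _ in range(N)]     # drawn from the specified population
        from_alt = [rng.gauss(shift, SIGMA) for _ in range(N)]  # drawn from the shifted alternative
        first += reject(*m_and_s(from_pi))        # rejected although drawn from the specified population
        second += not reject(*m_and_s(from_alt))  # accepted although drawn from the alternative
    return first / trials, second / trials


for name, rule in (("mean contours", reject_by_mean), ("s contours", reject_by_sd)):
    first, second = error_rates(rule)
    print(f"{name}: first error {first:.3f}, second error {second:.3f}")
```

Both rules keep the first source of error near 5 in 100, but against this particular alternative the contours of constant s accept Hypothesis A almost as often as if the sample had come from Π, since a shift of the mean leaves the distribution of s unchanged. Control of the first error alone therefore does not single out a useful contour system.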


Both these aspects of the problem must be taken into account. Using only the first control, any given sample could always be found to lie outside an extremely divergent contour of some system. That is to say a criterion of any desired degree of stringency could always be found for a sample. But regarding the position from the second point of view, it is seen that there will only be certain systems which are of any value as criteria. The application of these principles will become clearer when illustrated in the cases of particular sampling tests, but it seems well to emphasise at the outset the importance of careful thinking in these matters. For example it might readily be supposed that if, in sampling from Π, a sample of form Σ₁ occurs more frequently than one of form Σ₂, then greater confidence in Hypothesis A would be justified if Σ₁ were drawn rather than Σ₂. But we shall see that under certain conditions this may not be the case, because in the first event alternative hypotheses are relatively far more probable than A than they are in the second. It is indeed obvious, upon a little consideration, that the mere fact that a particular sample may be expected to occur very rarely in sampling from Π would not in itself justify the rejection of the hypothesis that it had been so drawn, if there were no other more probable hypotheses conceivable.

II. SAMPLING FROM A NORMAL POPULATION.

(1) Description of the Fundamental Space. We shall discuss first the case of sampling from an "infinite" population in which the single variable under consideration follows a Normal Distribution, and we shall suppose that this population, Π, is completely specified, its mean a and standard deviation σ being known. We have a sample, Σ, of n observations, $X_1, X_2, \dots X_n$, and the hypothesis whose probability we wish to test is that Σ is a random sample from Π; this is Hypothesis A. The methods to be discussed are perfectly general, but we have particularly in mind the case of small samples where the data are insufficient for the application of the ordinary (P, χ²) test for goodness of fit. Making use of the geometrical method first introduced into this problem by R. A. Fisher*, we shall imagine an n-dimensioned space in which we take an origin at the point O (a, a, ... a) and rectangular axes $Ox_1, Ox_2, \dots Ox_n$. Referred to these axes, Σ is the point $x_1 = X_1 - a$, $x_2 = X_2 - a$, ... $x_n = X_n - a$. We now fill this space with the density field appropriate for samples of n drawn from Π, making the point-density at $(x_1, x_2, \dots x_n)$ equal

$$D = \frac{1}{(\sqrt{2\pi}\,\sigma)^n}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n} x_i^2} \quad \text{(i)},$$

so that the chance of drawing a sample with variates lying within the range $x_1 \pm \frac{1}{2}dx_1$, ... $x_n \pm \frac{1}{2}dx_n$ is $D\,dx_1 \dots dx_n$.

* Biometrika, Vol. X. p. 507.
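As a hedged illustration of the density field (i), the following sketch evaluates D at a point of the sample space. The function name and the numerical values are assumptions made for the example; the only property being shown is that D depends on the point solely through its distance from the origin O.

```python
import math


def point_density(xs, sigma):
    """Point-density D of equation (i) at (x_1, ..., x_n), where x_i = X_i - a."""
    n = len(xs)
    sum_of_squares = sum(x * x for x in xs)
    return math.exp(-sum_of_squares / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma) ** n


# Two points at the same distance (5 units) from the origin carry the same density.
print(point_density([3.0, 4.0], sigma=2.0))
print(point_density([5.0, 0.0], sigma=2.0))
```

This is why the contours of equal density considered in the text are hyperspheres centred at O.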


In the great majority of problems we are neither able nor do we wish to distinguish between differentiated variables, but in the space as defined there will be n! points corresponding to a given unordered set of n values of the variable. Each of these points will lie in a different region of the general hyperspace; one such region will be that for which

$$x_1 < x_2 < \dots < x_n \quad \text{(ii)},$$

which may be described as a wedge-shaped region lying between n − 1 primes passing through the line

$$x_1 = x_2 = \dots = x_n \quad \text{(iii)}.$$

These n! regions are identical in shape and density distribution and one may be superposed upon another or all combined together by a rotation about the axis (iii)*. The contour surfaces of equal density and the other contours that will be considered are all figures of revolution about this axis, and it will follow that any conclusions which may be drawn regarding the integral of the density taken throughout regions lying between such contours will apply to a single segmental region as well as to the complete space. For simplicity in treatment we shall therefore consider the latter space, which may be termed the fundamental space, bearing in mind that in practice it is generally only possible to locate Σ in a single region such as that defined by the conditions (ii). This region may be supposed filled with a density obtained by superposing the n! similar regions.

[FIG. 1]

Denote the mean of Σ by a + m and its standard deviation by s. The section of the fundamental space by the two-dimensioned plane which passes through the line (iii) and the point P representing Σ is shown in Figure 1. O is the origin, OD the line (iii), and E the point where this line is cut by

$$x_1 + x_2 + \dots + x_n = nm \quad \text{(iv)}.$$

* In the case n = 2, the regions are the halves of the $x_1, x_2$ plane divided by the line $x_1 = x_2$. For n = 3, there are six regions, each being a "wedge" or "slice" in three-dimensioned space lying between two planes which cut at an angle of 60° in the line $x_1 = x_2 = x_3$. We shall use the term "prime" to describe any linear function of the n variables. Such a locus is a "flat" space of n − 1 dimensions lying in n dimensions.
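The statements in the footnote for n = 3 can be checked directly. The sketch below is an illustration only; the helper names and the sample values are assumptions. It counts the points that correspond to one unordered set of three values and computes the angle between the planes $x_1 = x_2$ and $x_2 = x_3$, which bound one of the six wedges.

```python
import math
from itertools import permutations


def region_count(values):
    """Number of distinct points obtained by reordering one unordered set of values."""
    return len(set(permutations(values)))


def angle_between_planes_deg(normal_1, normal_2):
    """Acute angle, in degrees, between two planes given by their normal vectors."""
    dot = sum(p * q for p, q in zip(normal_1, normal_2))
    norms = math.sqrt(sum(p * p for p in normal_1)) * math.sqrt(sum(q * q for q in normal_2))
    return math.degrees(math.acos(abs(dot) / norms))


# Three distinct (made-up) values give 3! points, one in each wedge.
print(region_count((0.3, 1.2, 2.5)))

# The plane x1 = x2 has normal (1, -1, 0); the plane x2 = x3 has normal (0, 1, -1).
print(angle_between_planes_deg((1, -1, 0), (0, 1, -1)))
```

Both planes contain the line $x_1 = x_2 = x_3$, so the computed angle is the angle of the wedge measured about the axis (iii).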


The prime (iv) passes through P and lies at right angles to OD, so that the angle PEO is a right angle. Further P is the point $(X_1 - a, X_2 - a, \dots X_n - a)$ and E the point (m, m, ... m), so that

$$OP^2 = \sum_{i=1}^{n} (X_i - a)^2 = n(m^2 + s^2), \qquad OE^2 = nm^2,$$

and consequently $EP^2 = ns^2$. Hence if we reduce the scale of this section of the hyperspace in the linear ratio of $\sqrt{n} : 1$ we shall obtain a plane in which the position of the point corresponding to P is given by rectangular coordinates (m, s). This plane will be termed the (m, s)-plane. Any contour in the hyperspace, whose equation can be expressed in terms of the two variables m and s only, may therefore be obtained by a rotation of the corresponding curve in the (m, s)-plane about the axis (iii), and a linear enlargement in the ratio $\sqrt{n} : 1$. Consider now the following:

(a) The point density D in the hyperspace may be written

$$D = \text{constant} \times e^{-\frac{n(m^2 + s^2)}{2\sigma^2}} = \text{constant} \times e^{-\frac{1}{2}\chi^2},$$

where*

$$\chi^2 = \frac{1}{\sigma^2}\sum_{i=1}^{n} (X_i - a)^2 = \frac{n(m^2 + s^2)}{\sigma^2}.$$

(b) The contours of equal density are (n − 1)-fold hyperspheres centred at the origin O.

(c) If the two-dimensioned (m, s)-plane be rotated in n dimensions about the axis (iii), the point P will trace out an (n − 2)-fold hypersphere lying in a prime perpendicular to the axis of rotation; the boundary surface of this hypersphere is proportional to $s^{n-2}$.

Consequently the integral of D taken between any contour surfaces in the fundamental space which can be obtained (after enlargement) by a rotation of curves lying in the (m, s)-plane, will be the same as the integral of d between the corresponding curves in that plane, where d is a point-density assigned to the plane measured by

$$d = \text{constant} \times s^{n-2} e^{-\frac{n(m^2 + s^2)}{2\sigma^2}},$$

or choosing the constant so that $\int_0^{\infty}\int_{-\infty}^{+\infty} d \, dm \, ds = 1$,

$$d = \frac{n^{n/2}}{2^{(n-2)/2}\sqrt{\pi}\,\Gamma\!\left(\frac{n-1}{2}\right)\sigma^{n}}\; s^{n-2} e^{-\frac{n(m^2 + s^2)}{2\sigma^2}} \quad \text{(v)}.$$

* This χ² must not be confused with the χ² of the Goodness of Fit Test.
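The relation $OP^2 = n(m^2 + s^2)$, on which the reduction to the (m, s)-plane rests, is easy to verify numerically. The sketch below uses made-up observations and assumed values of a and σ; the function names are hypothetical. It also checks that the point (m, s) lies at distance $\chi\sigma/\sqrt{n}$ from the origin of the (m, s)-plane.

```python
import math


def m_and_s(sample, a):
    """m is the sample mean measured from a; s is the standard deviation with divisor n."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    return mean - a, s


def squared_distance_from_origin(sample, a):
    """OP^2, the squared distance of the sample point from the origin O (a, a, ... a)."""
    return sum((x - a) ** 2 for x in sample)


sample = [4.1, 5.3, 6.0, 4.7, 5.9]  # made-up observations
a, sigma = 5.0, 1.0                 # assumed population mean and standard deviation
n = len(sample)

m, s = m_and_s(sample, a)
op2 = squared_distance_from_origin(sample, a)
chi = math.sqrt(op2) / sigma

print(math.isclose(op2, n * (m * m + s * s)))                       # OP^2 = n(m^2 + s^2)
print(math.isclose(math.hypot(m, s), chi * sigma / math.sqrt(n)))   # radius in the (m, s)-plane
```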

The equations of the five contour systems defined below can all be expressed in terms of m and s. In order to grasp their bearing on the problem of testing Hypothesis A, it is necessary to consider their form and position in the fundamental space, but their reduction to the (m, s)-plane makes clearer their relation to one another and enables a ready integration of D to be obtained in certain regions of the hyperspace. These contours, which are shown diagrammatically in the (m, s)-plane in Figure 2, are as follows:

(1) The contours of equal density D (contours of χ). These are semicircles, centred at O, of radius $\chi\sigma/\sqrt{n}$.

(2) The contours of constant m; a series of lines parallel to the axis of s.

(3) The contours of constant s; a series of lines parallel to the axis of m.

(4) The contours upon each of which the ratio z = m/s is constant (contours of z). These are a series of straight lines radiating from O.

(5) The contours of equal density d, or of equi-probable doublets (m, s). They form a series of oval curves in the (m, s)-plane with equations

$$(n - 2)\log_e s - \frac{n}{2\sigma^2}(m^2 + s^2) = \text{constant}.$$

[...] χ, m, s, or z, equal to or greater than the value observed:

(1) $$P_\chi = c_1 \int_{\chi}^{\infty} \chi^{n-1} e^{-\frac{1}{2}\chi^2}\, d\chi \quad \text{(vii)}.$$

(2) $$P_m = c_2 \int_{m}^{\infty} e^{-\frac{nm^2}{2\sigma^2}}\, dm \quad \text{(viii)}.$$

(3) The distribution of s being skew, we may take

$$P_s = c_3 \int_{s}^{\infty} s^{n-2} e^{-\frac{ns^2}{2\sigma^2}}\, ds \quad \text{(ix)},$$

or

$$P_s' = c_3 \int_{0}^{s} s^{n-2} e^{-\frac{ns^2}{2\sigma^2}}\, ds \quad \text{(x)},$$

according as s is greater or less than the modal value $s_0 = \sigma\sqrt{(n-2)/n}$.

(4) $$P_z = c_4 \int_{z}^{\infty} (1 + z^2)^{-\frac{n}{2}}\, dz \quad \text{(xi)},$$

and finally, for the chance of obtaining a less probable doublet (m, s),

(5) $$P_{m,s} = c_5 \iint s^{n-2} e^{-\frac{n(m^2 + s^2)}{2\sigma^2}}\, dm\, ds$$ [...]
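Two of these tail integrals can be evaluated with nothing more than the standard library. The sketch below is an illustration under stated assumptions, not the authors' computing method: the constants $c_2$ and $c_4$ are taken to normalise each integrand over the whole range from −∞ to +∞, so that $P_m$ and $P_z$ are single upper tails, and the function names are hypothetical. $P_m$ of (viii) reduces to the complementary error function, and $P_z$ of (xi) is integrated by Simpson's rule after the substitution z = tan θ, which turns the integrand into $\cos^{n-2}\theta$ on a finite range.

```python
import math


def p_m(m, n, sigma):
    """Upper tail of (viii): chance of a mean deviation at least m in samples of n."""
    return 0.5 * math.erfc(m * math.sqrt(n) / (sigma * math.sqrt(2)))


def p_z(z, n, steps=2000):
    """Upper tail of (xi) for z = m/s, with s computed with divisor n (requires n >= 2)."""
    # Assumed normalising constant: c4 times the integral over the whole line equals 1.
    c4 = math.gamma(n / 2) / (math.sqrt(math.pi) * math.gamma((n - 1) / 2))
    lower, upper = math.atan(z), math.pi / 2
    h = (upper - lower) / steps  # steps must be even for Simpson's rule

    def integrand(theta):
        return math.cos(theta) ** (n - 2)

    total = integrand(lower) + integrand(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(lower + k * h)
    return c4 * total * h / 3


# For n = 2 the law (xi) is the Cauchy curve, so the tail has a closed form to compare with.
print(p_z(1.0, 2), 0.5 - math.atan(1.0) / math.pi)
print(p_m(0.62, 10, 1.0))
```

The closed form for n = 2 gives a quick check on the numerical integration; for larger n the same routine applies unchanged.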

[...] test are applied to a sample, and the Hypothesis A is rejected if, let us say, |m| > 3 [...]

We can now obtain the ratio of (a) maximum likelihood for a member of M (Π), to (b) maximum likelihood for any normal population (the DM of (xviii)), or [...]

[...] .01, it will make a considerable difference to his conclusions whether he uses $P_z$ or $P_\lambda$ (max). But as the ultimate

[FIG. 3]

value of statistical judgment depends upon a clear understanding of the meaning of the statistical tests applied, the difference between the values of the two P's should present no difficulty. The difference in the two scales of probability simply corresponds to the difference in attitude of mind with which the problem is approached; the use of $P_\lambda$ (max) perhaps corresponds to the more cautious attitude, but it would seem impossible to claim that one approach is the correct one and the other is erroneous.

(7) Solutions obtained by the Inverse Method. The inverse method of approach is to start from Σ, the observed sample, as the one certain fact in the problem. Points in the space no longer represent samples but populations from which Σ may have been drawn; that is to say, the density field is made appropriate to Σ and not to Π, and we shall reject an


hypothesis concerning the population sampled on entering regions of the field in which the density falls below a certain level. This alternative point of view is of some interest, and although it leads to tests which are almost identical with those reached above, we shall discuss it in some detail in the present case of sampling from a normal population. We must again assume that Σ has been drawn from some normally distributed population, and as this can be described completely by its mean, a, and standard deviation, σ, the whole discussion may be confined to the two-dimensioned (a, σ)-plane. Referred to a fixed origin in this plane and rectangular axes, Σ is the point G ($\bar{X}$, s) and any population Π a point P (a, σ). The position is illustrated in Figure 4. The ordinary method of inverse probability consists in postulating some function φ(a, σ) to represent the probability a priori that the sampled population is Π, and taking for the point-density of the field

$$D = \text{constant} \times \phi(a, \sigma) \times \sigma^{-n} e^{-\frac{n}{2\sigma^2}\{(\bar{X} - a)^2 + s^2\}} \quad \text{(xxviii)}.$$

[FIG. 4]

$D\,da\,d\sigma$ would then be termed the probability a posteriori that the population sampled has a mean and standard deviation in the range $a \pm \frac{1}{2}da$, $\sigma \pm \frac{1}{2}d\sigma$. The difficulty of this procedure in any practical problem lies in the fact that it is almost impossible to express φ in exact terms. We prefer therefore to follow a line of argument which while really equivalent to the above with φ assumed constant makes use of the principle of likelihood rather than the somewhat vaguer conception of a posteriori probability. The likelihood of Π may be written

$$L(\Pi) = (1/\sqrt{2\pi})^{n}\, \sigma^{-n} e^{-\frac{n}{2\sigma^2}\{(\bar{X} - a)^2 + s^2\}} \quad \text{(xxix)}.$$

Population points such as P of Figure 4, for which L is constant will lie on closed curves, $\log_e \sigma^2 + \{(\bar{X} - a)^2 + s^2\}/\sigma^2 =$ [...] $= 1$ when $a = \bar{X}$, $\sigma = s$. It is of interest to compare the values of $P_{m,s}$, (xii); $P_\lambda$, (xxi); and [...]

[Table: Expected ("normal" case) and Observed frequencies; entries illegible]

[...] ($P_{z''} > 0.5$) will not imply that the intervals (less b) are random. This result will follow whatever be the form of distribution of intervals, provided only that $x_1 - b$ is small compared to the average interval. A large $P_{z''}$ merely suggests that if the intervals (less the closed period) are random, then it is reasonable to suppose that the closed period is b. On the other hand, a very small value of $P_{z''}$ suggests that either the supposed closed period b is incorrect, or that the intervals (less b) are not random. In cases where it is clear a priori that there can be no closed period, then a small $P_{z''}$ certainly suggests that the intervals are not random.

If however it is possible to observe a number, say N, of series of n consecutive intervals, we may calculate z" in each case and compare the distribution of these N values of z" with the theoretical frequency law (lx). If there is agreement, then not only is it probable that the hypothesis as to b is correct, but also that the intervals (less b) are random. As z" is independent of c, the average "rate of happenings" of the events, it follows that if b is unchanged, and in particular is zero, we may combine together in the test the z"'s of different series of n observations, even though this rate may change from series to series. The test is also applicable, however small n may be. The practical importance of this will be illustrated below.

Through the kindness of Miss E. M. Newbold and Professor Greenwood, we have been able to make use of some of the original data regarding Factory Accidents used by the former in Report No. 34 of the Industrial Fatigue Research Board*. We may ask what answer the z"-test gives to the following question: Are the lengths of interval between consecutive accidents incurred by the same individual worker distributed randomly? For this purpose use has been made of the cards of male workers in a Chocolate Factory during the year 1923†.
The time of accident was recorded to the day and even hour, and by making allowance

* "A Contribution to the Study of the Human Factor in the Causation of Accidents," 1925. For a later paper by the same writer, see Journal of Roy. Stat. Soc. 1927, p. 487.

† These are the cards of the series Ml of page 11 of the I.F.R.B. Report, No. 34. The injuries were mainly of minor character, i.e. "cut finger," "abrasion," "splinter," etc. attended to in the Ambulance Room of the Factory, and did not lead to absence from work.


for Sundays, Bank Holidays, and as far as possible extra days of absence, sickness, etc.*, it was possible to obtain fairly accurately the number of days which each worker was exposed to risk between accidents. As Miss Newbold has shown, the liability to accident varies considerably among the workers, varying from none to fifteen accidents in the year for the cases examined, but the ratios z", which are independent of this liability or rate, can be combined together. The cards have been analysed for 50 workers; of these, 25 incurred only three accidents in the year, giving two intervals or the minimum to which the test can be applied.

TABLE IX. Intervals between Accidents. Distributions of z".

n = 2
z" class: 0–.5, .5–1.0, 1.0–1.5, 1.5–2.0, 2.0–2.5, 2.5–3.0, 3.0–4.0, 4.0–5.0, 5.0–6.0, 6.0–7.0, 7.0–8.0, 8.0–9.0, 9.0–10.0, greater than 10.0
Theory: 16.7, 8.3, 5.0, 3.3, 2.4, 1.8, 2.5, 1.7, 1.2, 0.9, 0.7, 0.6, 0.4, 4.5
Observation: 20, 7, 4, 2, 2, 2.5, 5, [...] 1.5, 1, [...] 5 (greater than 10.0)
Total: Observation 50, Theory 50.0

n = 4
z" class: 0–.1, .1–.2, .2–.3, .3–.4, .4–.5, .5–1.0, 1.0–1.5, 1.5–2.0, 2.0–5.0, 5.0–8.0
Theory: 6.2, 4.3, 3.1, 2.3, 1.7, 4.3, 1.5, 0.7, 0.8, 0.1
Observation: 4, 8, 4, 1, 1, 3, 3, 1 [...]
Total: Observation 25, Theory 25.0
Mean z": Observation .460, Theory .500
Difference in terms of standard error: −.23

n = 6
z" class: 0–.1, .1–.2, .2–.3, .3–.4, .4–.5, greater than .5
Theory: [...]
Observation: 4, 10, 5 [...]
Total: Observation 20, Theory 20.0
Mean z": Observation .179, Theory .250
Difference in terms of standard error: −.99
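The Theory columns of Table IX can be reproduced by a short calculation. The frequency law (lx) is not legible in this copy, so the distribution function used below, $F(z'') = 1 - (1 + z'')^{-(n-1)}$, is an assumption inferred from the tabulated Theory values themselves and is not quoted from the text; the function names are hypothetical. It agrees with the legible entries for n = 2 and n = 4, and its mean, 1/(n − 2), matches the theoretical means .500 and .250 given for n = 4 and n = 6.

```python
def cdf_z(z, n):
    """Assumed distribution function of z'' for n intervals (inferred from Table IX)."""
    return 1.0 - (1.0 + z) ** -(n - 1)


def expected_frequencies(edges, n, total):
    """Expected numbers, out of `total` series, falling in each class between successive edges."""
    return [round(total * (cdf_z(hi, n) - cdf_z(lo, n)), 1)
            for lo, hi in zip(edges, edges[1:])]


def expected_beyond(z, n, total):
    """Expected number of series with z'' greater than z."""
    return round(total * (1.0 - cdf_z(z, n)), 1)


# First classes of Table IX, n = 2 (50 workers) and n = 4 (25 workers).
print(expected_frequencies([0, 0.5, 1.0, 1.5], n=2, total=50))
print(expected_beyond(10.0, n=2, total=50))
print(expected_frequencies([0, 0.1, 0.2, 0.3], n=4, total=25))
```

If the assumed law is right, the same two functions give the expected frequencies for any other grouping of z", which is convenient when the observed series are few and the classes have to be widened.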

In the above table are given the calculated values of z" for (a) the 50 cases, taking the first three accidents or n = 2, (b) the 25 of these for which there were at least five accidents (n = 4), and (c) the 20 of these for which there were at least seven accidents (n = 6). The intervals chosen were the first two, first four and first six of the year. It is clear that as the records were restricted to a year, they do not contain a completely random selection of intervals, at any rate for the workers

* As the dates of absence were not specified exactly, but grouped in thirteen 4-week periods, there was often some uncertainty as to the accident interval to which they belonged. But it was always possible to locate the longer periods of absence such as the 10–14 days annual holiday.


of small liability. The table compares the observed frequencies with the theoretical obtained by integrating (lx) between appropriate limits; also the mean z", and the ratio of the difference between mean (observed) and mean (theory) to the standard error. In the case of n = 2, the theoretical mean and standard deviation of z" are infinite so that the latter comparison cannot be made, but as far as can be judged from a sample of 50, the correspondence in the frequencies appears very close. For n = 4 and n = 6 the suggestion of too few observed values of z" in the first group would need confirmation from fuller data*; the mean values of z" are

TABLE X. Distribution of z"; Omnibuses and Pedestrians.

Intervals between Omnibuses, n = 2
z" class: 0–.5, .5–1.0, 1.0–1.5, 1.5–2.0, 2.0–2.5, 2.5–3.0, 3.0–4.0, 4.0–5.0, 5.0–6.0, 6.0–7.0, 7.0–8.0, 8.0–9.0, 9.0–10.0, greater than 10.0
Theory: 9.3, 4.7, 2.8, 1.9, 1.3, 1.0, 1.4, 0.9, 0.7, 0.5, 0.4, 0.3, 0.3, 2.5
Observation: 4, 7.5, 1.5, 3, 2, [...] 2, 4, 2, -, [...] 2
Total: Observation 28, Theory 28.0

Intervals between Omnibuses, n = 4
z" class: 0–.1, .1–.2, .2–.3, .3–.4, .4–.5, .5–1.0, 1.0–1.5, 1.5–2.0, 2.0–5.0, 5.0–8.0, greater than 8.0
Theory: 7.0, 4.8, 3.5, 2.5, 1.9, 4.8, 1.7, 0.8, 0.9, 0.1, -
Observation: 3, 1, 2, 5, 2, 6, 3, 2, 2, 1, 1
Total: Observation 28, Theory 28.0
Mean z": Observation 1.258, Theory .500
Difference in terms of standard error: +4.6

Intervals between Pedestrians, n = 5
z" class: 0–.1, .1–.2, .2–.3, .3–.4, .4–.5, .5–.6, .6–.7, greater than .7
Theory: 3.2, 2.0, 1.3, 0.9, 0.6, 0.4, 0.3, [...]
Observation: 4, 3, -, -, 1, 1, 1 [...]
Total: Observation 10, Theory 10.0
Mean z": Observation .210, Theory .333
Difference in terms of standard error: −.82

however quite in agreement with theory as far as can be judged with such large standard errors. To obtain some further idea of the sensitiveness of the test, we have applied it to the time intervals observed between the passing of consecutive omnibuses on a London bus route. These intervals are not of course random, the omnibuses

* In general there could be no closed interval, b, the times between accidents varying from 0 to over 180 days, but it is possible that after an accident the worker is for a day or two more cautious, and this reduces the chance of very small intervals and therefore small values of z".


having started at fixed intervals (apparently of 3 minutes) from three-quarters of an hour to an hour before the observations were made. During this time traffic delays, etc. would have superposed chance positive and negative variations on to the scheduled intervals, so that in some cases these had been reduced to a few seconds and in others had been doubled. Table X gives the values of z" calculated for 28 sets of n = 2 and 28 of n = 4. The fit for n = 2 is far less satisfactory than in the case of the accident intervals, and appears even worse for n = 4, where the mean z" exceeds the theoretical value by 4.6 times its standard error. Finally in the same table are given 10 values of z" calculated from sets of 5 intervals of time observed between pedestrians passing a fixed point on a roadway; here, as far as can be judged, agreement is satisfactory, suggesting that the intervals were random. There has not been time before going to press to extend these results, but inadequate as the numbers are, they certainly suggest that the method may be of value in testing the randomness of intervals in many problems, and further that the test will be sensitive even for pairs of consecutive intervals (n = 2) provided that the number of such pairs available is large enough.

V. CONCLUSION.

The main problem that we have considered is that of determining whether it is probable that a given sample, taken as a whole, has been drawn from a specified population. From the idea of the representation of a sample of n as a point in multiple space, there follows at once the conception that the discriminating criteria may be regarded as associated with surfaces in this space which divide regions containing points for which the hypothesis to be tested as to the sample's origin is less probable from those for which it is more probable. Regarded from this point of view it is seen that in any given problem there need be no single system of surfaces which it is "best" to make use of. The system adopted will provide a numerical measure, and this must be coordinated in the mind of the statistician with a clear understanding of the process of reasoning on which the test is based.

We have endeavoured to connect in a logical sequence several of the most simple tests, and in so doing have found it essential to make use of what R. A. Fisher has termed "the principle of likelihood." The process of reasoning, however, is necessarily an individual matter, and we do not claim that the method which has been most helpful to ourselves will be of greatest assistance to others. It would seem to be a case where each individual must reason out for himself his own philosophy. The differences in method of approach have been illustrated in the two problems that have been described as the testing of Hypothesis A and Hypothesis B. Both problems suppose a knowledge of the form or shape of the population sampled, A supposes its standard deviation known and B does not. In both cases there appear to be four or five methods open to the statistician; if properly interpreted we should not describe one as more accurate than another, but according to the problem in hand should recommend this one or that as providing information which is more relevant to the purpose. The principles involved have been illustrated in the case of samples from normal, rectangular and exponential populations, for each of which analogous tests exist. Most attention has been given to the normal case, simply because the $\beta$'s of the populations of experience do tend to cluster round the Gaussian point. New tables have been computed for what is termed the $P_\lambda$ test of Hypothesis A, and illustrations of some of the various types of problems in which it could be employed have been given. Now these tests, whether applicable to the normal, the rectangular or the exponential population, have all assumed both that there is perfect random sampling and that while there is uncertainty as to position and scale there is none as to the shape of the population curve. In following what would appear a more efficient system of reasoning, it must be remembered that the ideal situation is not quite that of practical reality, and we must be careful that what is gained on the swings is not being lost on the roundabouts. It would be useless to emphasise the gain in relevance of a $P_z = .08$ obtained from Student's Tables over a $P = .15$ obtained from Sheppard's Tables, if in fact the former were in error by an amount of the order of .05 owing to a faulty assumption that the population sampled was normal. Owing to the rapidity with which the distribution of means tends to normality, as n increases, the P obtained by entering Sheppard's Tables with $z = m/s$ has a definite meaning if properly interpreted, even if the population sampled be skew. But by using $P_z$ we may be obtaining a false impression of the accuracy of the criterion upon which our judgment is to be based. In practice in dealing with small samples it is generally not possible to be certain of the form of the population distribution from internal evidence.
But from other considerations, from previous experience in similar problems and so forth, we may have good grounds for believing that the distribution is roughly of a certain type, and if skew for knowing the direction in which it will be more sharply limited. There will undoubtedly be regions in the $\beta_1$, $\beta_2$ population field surrounding the normal, the rectangular, and the exponential points and any other points for which the problem can be completely solved, within which the criteria appropriate for these spots will be adequate. It is therefore a problem of first importance to ascertain what is the extent of these regions of adequacy, and failing a fresh advance in theory* the most straightforward method of doing this is to take experimental soundings in the $\beta_1$, $\beta_2$ field. In this process we have not at present gone very far, but have shown that

(a) At $\beta_1 = 0.2$, $\beta_2 = 3.2$, the normal theory for samples of 10 does provide criteria which for practical purposes are probably adequate in the case of the distribution of m, of s, of z and for the $P_\lambda$ contours.

(b) At $\beta_1 = 0$, $\beta_2 = 1.8$, the z-distribution for samples of 4 and of 10 appears adequate, but the normal theory quite fails for s and $P_\lambda$. In this direction an intermediate sounding is required.

(c) Some experimental work at present in progress suggests that the normal theory will also be adequate for s and z for samples of 5 and 10 from a symmetrical leptokurtic curve with $\beta_2$ as great as 4.2.

(d) The results obtained in connection with the exponential population are quite out of touch with the symmetrical cases, but the distributions reached in that case will probably be adequate for a fairly wide range of skew populations with $\beta_1$ and $\beta_2$ in the region surrounding the point (4, 9).

* It is of course a serious limitation to the practical value of the method of likelihood, that it appears only to provide readily soluble equations in certain simple cases. It is not until the samples are large enough for the application of the ($P$, $\chi^2$) group test that the problem can be solved in a more general manner.

The limitation implied by the assumption of perfect random sampling must not of course be overlooked; this is of particular importance in the case of very small samples. It has been shown that there may be some latitude as to the exact shape of the population curve associated with each hypothesis, but it has been assumed that in any event the sample is a random one. It is true that the causes underlying the bias or selection which occurs in the drawing of a sample from $\Pi$ may be such that the sample can be regarded as randomly drawn from $\Pi'$; but if a biased sample from an approximately normal population can only be considered as a random sample from a J-shaped curve, it is clear that our methods of comparison will be inadequate. The bias may consist only of a few errors of observation or of crudeness of grouping, but the smaller the sample the more important will be the effect. The answer to criticism from this source is that the tests should only be regarded as tools which must be used with discretion and understanding, and not as instruments which in themselves give the final verdict.

It is impossible to escape from the difficulties; they make it desirable to avoid the use of very small samples whenever possible, but do not prevent some conclusions of value being drawn when fuller data are not available. If the criteria that have been discussed indicate that the supposed population of origin is improbable in comparison with some other alternative, that is a sign which is not to be disregarded. But we must not discard the original hypothesis until we have examined the alternative suggested, and have satisfied ourselves that it does involve a change in the real underlying factors in which we are interested; until we have assured ourselves that the alternative hypothesis is not error in observation, error in record, variation due to some outside factor that it was believed had been controlled, or to any one of many causes which when they chance to modify even one observation in a sample of five may be of serious consequence.

We must not conclude without thanking Miss Ida McLearn for her diagrams, Miss M. Page, without whose assistance in computing the completion of the paper in its present form must have been indefinitely delayed, and Mrs L. J. Comrie who carried out the final stage in the tabling of $P_\lambda$.

VI. APPENDIX: TABLES OF $P_\lambda$.

$P_\lambda$ is given by the relation

$$P_\lambda = \frac{n^{n/2}}{2^{(n-2)/2}\sqrt{\pi}\,\Gamma\!\left(\frac{n-1}{2}\right)} \iint S^{\,n-2}\, e^{-\frac{n}{2}(M^2+S^2)}\, dM\, dS \qquad \text{(xxi bis)},$$

where the integral is to be taken outside the curve in the (M, S)-plane upon which $\lambda$ is constant, and

$$\lambda = S^{\,n} e^{-\frac{n}{2}(M^2+S^2-1)} \qquad \text{(xix bis)}.$$

We may write

$$\log_{10}\lambda = \frac{n}{2}\left\{\log_{10} S^2 - (M^2+S^2-1)\log_{10} e\right\} = \frac{n}{2}\left\{\log_{10} e - k\right\},$$

and obtain for the equations of the contours

$$(M^2+S^2)\log_{10} e - \log_{10} S^2 = k \qquad \text{(xx bis)}.$$

Values of M for equidistant values of S* were first computed for the curves corresponding to a number of values of k between .45 and 2.40. Some of the contours have been drawn in Figures 13 and 14. The lowest possible value of k, corresponding to the centre of the system, M = 0, S = 1, $\lambda = 1$, is $k = \log_{10} e = .4342945$. If $S_1$ and $S_2$ be the points at which a contour cuts the axis of S, if $M_S$ indicates the value of M at the point on the curve corresponding to a given S, and if we write $x_S = \sqrt{n}\,M_S$, then the integral (xxi bis) can be put into the form

$$P_\lambda = 1 - \frac{n^{\frac{n-1}{2}}}{2^{\frac{n-3}{2}}\,\Gamma\!\left(\frac{n-1}{2}\right)} \int_{S_1}^{S_2} S^{\,n-2}\, e^{-\frac{n}{2}S^2}\, \alpha_S\, dS \qquad \text{(lxiii)},$$

where

$$\tfrac{1}{2}(1+\alpha_S) = \int_{-\infty}^{x_S} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}\, dx,$$

and is obtained by entering Sheppard's Tables with $x_S = \sqrt{n}\,M_S$.

* The argument interval for S varied from .005 to .04.

The values of the integral (lxiii) were then calculated by quadrature. The task of calculating $P_\lambda$ for an adequate framework of values of k and for n = 3 to n = 50 would have been very great, but it was discovered that the ratios $P_\lambda/\lambda$ for a given k are very nearly independent of n. Table XI gives the values of $P_\lambda$ actually computed divided by the corresponding $\lambda$. It was estimated that the computed value of $P_\lambda$ might be in error by several units in the fifth decimal place, and this limits the number of significant figures that can be given for the ratio as k increases. It also means that the final figures in the tabled ratios are not exact. The results however appeared to justify the following procedure:

(a) To obtain interpolated series of ratios for n = 3, 4 and 10 from the ten or eleven values computed in each case.

TABLE XI. Computed Values of $P_\lambda/\lambda$.

k        n = 3     n = 4     n = 5     n = 10    n = 20    n = 50
log e    1.0000    1.0000    1.0000    1.0000    1.0000    1.0000
.45      -         1.0169    -         1.0170    -         1.0170
.50      1.0743    -         1.0736    1.0731    1.0732    1.072
.55      -         -         -         1.1327    -         1.135
.60      1.1991    1.1977    -         1.1959    1.195     -
.65      -         -         -         1.2622    -         -
.70      -         1.3363    -         1.332     1.33      -
.75      1.4151    1.4117    -         -         -         -
.80      1.4957    1.4921    1.4899    1.485+    -         -
.90      1.6721    1.6672    -         1.66      -         -
1.00     -         1.864     -         1.85      -         -
1.20     2.3432    2.335     2.330     -         -         -
1.50     3.292     3.286     -         -         -         -
1.80     4.638     4.62      -         -         -         -
2.40     9.42      -         -         -         -         -
3.00     18.53     -         -         -         -         -

(b) To extrapolate for high values of k for n = 10 and 4, from n = 4 and 3 respectively, in view of the fact that as $P_\lambda$ decreases only a few significant figures are needed in the ratio.

(c) With the help of these "frames" and the isolated values for n = 5, to interpolate appropriate ratios for n = 5, 6, 7, 8 and 9.

(d) Since the ratios at n = 10 give accurately the three values of $P_\lambda$ computed for n = 20 and the three for n = 50, to assume that the ratios for n = 10 will be adequate for all values of n in the range 10 to 50.

To obtain $P_\lambda$ it was only necessary to compute $\log_{10}\lambda = \frac{n}{2}(\log_{10} e - k)$, add to it the logarithm of the ratio and find the anti-logarithm. The tables given below were calculated in this way. They are given to four decimal places, but the final figure may often be in error by a single unit, although not, it is believed, by as much as two units. After the work was commenced it was realised that the variable k was not perhaps the best to have taken from the point of view of interpolation in the tables; $k^2$ would probably have been better. It has been necessary to change the interval from .01 to .05 at k = .65, and from .05 to .30 at k = 1.50. For high values of n and low values of k the tables are not easy to use if four-figure accuracy is required, but for most practical purposes it is sufficient to obtain $P_\lambda$ to two decimal places only, and in this there is no difficulty.

Method of entry.

Having obtained from the data of the problem under consideration $M = m/\sigma$ and $S = s/\sigma$, k may either be calculated from (xx bis), or generally obtained with sufficient accuracy from one or other of the Figures 13 and 14. $P_\lambda$ is then found by entering the tables with n and k. Examples of the use of the tables have been given in Section (9) above.
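As an illustration of the method of entry, the following Python sketch computes k from (xx bis) and then evaluates $P_\lambda$ directly by numerical quadrature of (lxiii), instead of by table look-up. It is not the computing scheme used for the tables: the function names, the bisection for $S_1$ and $S_2$, the sine substitution and the choice of Simpson's rule are all assumptions made here for illustration, and the error function stands in for Sheppard's Tables.

```python
import math

LOG10_E = math.log10(math.e)  # k at the centre of the system (M = 0, S = 1)


def contour_k(M, S):
    """k of the contour through (M, S), following (xx bis)."""
    return (M * M + S * S) * LOG10_E - math.log10(S * S)


def likelihood(n, k):
    """lambda on contour k for a sample of n: log10(lambda) = (n/2)(log10 e - k)."""
    return 10.0 ** (0.5 * n * (LOG10_E - k))


def _bisect(g, lo, hi, steps=200):
    """Root of g between lo and hi, assuming g changes sign there."""
    lo_positive = g(lo) > 0
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if (g(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def p_lambda(n, k, panels=2000):
    """P_lambda by Simpson quadrature of (lxiii); panels must be even."""
    if k <= LOG10_E:
        return 1.0  # the contour shrinks to the central point
    # Points S1 < 1 < S2 where the contour cuts the axis of S (M = 0), in u = S^2.
    g = lambda u: u * LOG10_E - math.log10(u) - k
    a, b = 1.0, 1.0
    while g(a) <= 0:
        a *= 0.5
    while g(b) <= 0:
        b *= 2.0
    s1 = math.sqrt(_bisect(g, a, 1.0))
    s2 = math.sqrt(_bisect(g, 1.0, b))

    def integrand(S):
        m2 = (k + math.log10(S * S)) / LOG10_E - S * S  # M_S squared on the contour
        x = math.sqrt(n * max(m2, 0.0))                 # x_S = sqrt(n) * M_S
        alpha = math.erf(x / math.sqrt(2.0))            # (1 + alpha)/2 is the normal integral up to x_S
        return S ** (n - 2) * math.exp(-0.5 * n * S * S) * alpha

    # Substituting S = mid + half*sin(theta) removes the square-root behaviour at S1 and S2.
    mid, half = 0.5 * (s1 + s2), 0.5 * (s2 - s1)
    h = math.pi / panels
    total = 0.0
    for i in range(panels + 1):
        theta = -0.5 * math.pi + i * h
        weight = 1 if i in (0, panels) else (4 if i % 2 else 2)
        total += weight * integrand(mid + half * math.sin(theta)) * half * math.cos(theta)
    integral = total * h / 3.0
    log_const = (0.5 * (n - 1) * math.log(n) - 0.5 * (n - 3) * math.log(2.0)
                 - math.lgamma(0.5 * (n - 1)))
    return 1.0 - math.exp(log_const) * integral
```

For a sample of n with observed M and S, `p_lambda(n, contour_k(M, S))` plays the part of the table entry; the values it returns for n = 3 and n = 10 at k = .50 should fall close to the tabled .8562 and .5036.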

Extension of the Tables.

On the assumption that the ratios $P_\lambda/\lambda$ for n = 10 will continue to hold good for any value of n greater than 50, we may proceed as follows: (a) find the appropriate k, (b) calculate $\lambda$ from the relation $\log_{10}\lambda = \frac{n}{2}(\log_{10} e - k)$, (c) multiply the ratio $P_\lambda/\lambda$ given in Table XII below by $\lambda$, and so obtain $P_\lambda$.

TABLE XII. Values of $P_\lambda/\lambda$ for n = 10 and beyond.

k       Pλ/λ       k       Pλ/λ       k       Pλ/λ       k       Pλ/λ
.435    1.0008     .500    1.0731     .565    1.1514     .630    1.2353
.440    1.0062     .505    1.0789     .570    1.1576     .635    1.2420
.445    1.0116     .510    1.0847     .575    1.1639     .640    1.2488
.450    1.0170     .515    1.0906     .580    1.1702     .645    1.2555
.455    1.0224     .520    1.0965     .585    1.1766     .650    1.2623
.460    1.0279     .525    1.1024     .590    1.1830     .70     1.332
.465    1.0334     .530    1.1084     .595    1.1894     .75     1.407
.470    1.0390     .535    1.1144     .600    1.1959     .80     1.485+
.475    1.0446     .540    1.1205     .605    1.2024     .85     1.569
.480    1.0502     .545    1.1266     .610    1.2089     .90     1.658
.485    1.0559     .550    1.1328     .615    1.2155     .95     1.753
.490    1.0616     .555    1.1390     .620    1.2221     1.00    1.855
.495    1.0673     .560    1.1452     .625    1.2287     -       -
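The three steps (a) to (c) can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, only a handful of the Table XII entries are copied in, and the linear interpolation between them is an assumption of this sketch, not a rule given in the text.

```python
import math

LOG10_E = math.log10(math.e)

# A few (k, ratio) pairs read from Table XII; a fuller copy of the table would interpolate better.
TABLE_XII = [
    (0.45, 1.0170), (0.50, 1.0731), (0.55, 1.1328), (0.60, 1.1959),
    (0.65, 1.2623), (0.70, 1.332), (0.80, 1.485), (0.90, 1.658), (1.00, 1.855),
]


def ratio_from_table(k):
    """Linear interpolation of P_lambda / lambda between the tabled values of k."""
    if not TABLE_XII[0][0] <= k <= TABLE_XII[-1][0]:
        raise ValueError("k lies outside the tabled range used in this sketch")
    for (k0, r0), (k1, r1) in zip(TABLE_XII, TABLE_XII[1:]):
        if k0 <= k <= k1:
            return r0 + (r1 - r0) * (k - k0) / (k1 - k0)


def p_lambda_from_ratio(n, k):
    """Steps (b) and (c): lambda from log10(lambda) = (n/2)(log10 e - k), then ratio times lambda."""
    log10_lambda = 0.5 * n * (LOG10_E - k)
    return ratio_from_table(k) * 10.0 ** log10_lambda
```

The same rule reproduces the body of the tables for n between 10 and 50; for example `p_lambda_from_ratio(20, 0.50)` agrees with the tabled .2364 to about four decimal places, which gives some check before the rule is applied beyond n = 50.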

[Figures 13 and 14: likelihood contours (constant k) in the (M, S)-plane.]

TABLES OF $P_\lambda$.

Size of Sample, n = 3 to 14 (rows), against k (columns).

 n   .44    .45    .46    .47    .48    .49    .50    .51    .52    .53    .54    .55    .56    .57    .58    .59    .60    .61    .62    .63    .64    .65
 3   .9865  .9634  .9408  .9189  .8975  .8766  .8562  .8363  .8168  .7978  .7792  .7611  .7434  .7261  .7092  .6927  .6766  .6608  .6454  .6305  .6158  .6015
 4   .9800  .9460  .9132  .8817  .8513  .8219  .7935  .7661  .7396  .7140  .6894  .6656  .6426  .6204  .5990  .5784  .5584  .5391  .5205  .5026  .4852  .4685
 5   .9736  .9291  .8866  .8461  .8075  .7707  .7355  .7019  .6699  .6393  .6101  .5823  .5558  .5304  .5063  .4832  .4611  .4401  .4201  .4009  .3826  .3652
 6   .9672  .9124  .8607  .8120  .7660  .7227  .6818  .6432  .6068  .5724  .5401  .5096  .4808  .4536  .4280  .4038  .3810  .3594  .3391  .3199  .3018  .2847
 7   .9609  .8961  .8356  .7792  .7267  .6778  .6321  .5895  .5497  .5126  .4781  .4459  .4159  .3879  .3618  .3374  .3147  .2935  .2737  .2553  .2381  .2220
 8   .9546  .8800  .8113  .7478  .6894  .6356  .5859  .5402  .4980  .4591  .4233  .3903  .3598  .3318  .3059  .2820  .2600  .2397  .2210  .2037  .1878  .1732
 9   .9484  .8642  .7876  .7177  .6541  .5961  .5432  .4951  .4512  .4112  .3748  .3416  .3113  .2837  .2586  .2357  .2148  .1958  .1784  .1626  .1482  .1350
10   .9421  .8487  .7646  .6888  .6205  .5590  .5036  .4537  .4088  .3683  .3318  .2990  .2694  .2427  .2187  .1970  .1775  .1599  .1441  .1298  .1169  .1053
11   .9360  .8335  .7423  .6611  .5887  .5243  .4669  .4159  .3704  .3299  .2938  .2617  .2331  .2076  .1849  .1647  .1467  .1306  .1163  .1036  .0923  .0822
12   .9298  .8186  .7207  .6344  .5585  .4917  .4329  .3811  .3356  .2954  .2601  .2290  .2017  .1776  .1563  .1376  .1212  .1067  .0939  .0827  .0728  .0641
13   .9238  .8039  .6996  .6089  .5299  .4612  .4014  .3493  .3040  .2646  .2303  .2005  .1745  .1519  .1322  .1151  .1001  .0872  .0759  .0660  .0575  .0500
14   .9177  .7895  .6792  .5844  .5027  .4325  .3721  .3202  .2755  .2370  .2039  .1755  .1510  .1299  .1118  .0962  .0827  .0712  .0613  .0527  .0453  .0390

 n   .70    .75    .80    .85    .90    .95    1.00   1.05   1.10   1.15   1.20   1.25   1.30   1.35   1.40   1.45   1.50
 3   .5348  .4756  .4229  .3762  .3347  .2979  .2651  .2359  .2100  .1869  .1664  .1482  .1319  .1174  .1046  .0932  .0830
 4   .3931  .3299  .2769  .2325  .1952  .1640  .1377  .1157  .0972  .0817  .0687  .0578  .0486  .0409  .0344  .0289  .0243
 5   .2892  .2291  .1815  .1439  .1140  .0904  .0717  .0568  .0450  .0357  .0284  .0226  .0179  .0142  .0113  .0090  .0071
 6   .2129  .1591  .1190  .0891  .0666  .0499  .0373  .0279  .0209  .0156  .0117  .0088  .0066  .0049  .0037  .0028  .0021
 7   .1567  .1106  .0781  .0551  .0389  .0275  .0194  .0137  .0097  .0069  .0048  .0034  .0024  .0017  .0012  .0009  .0006
 8   .1153  .0768  .0512  .0341  .0228  .0152  .0101  .0068  .0045  .0030  .0020  .0013  .0009  .0006  .0004  .0003  .0002
 9   .0849  .0534  .0336  .0211  .0133  .0084  .0053  .0033  .0021  .0013  .0008  .0005  .0003  .0002  .0001  .0001  .0001
10   .0625  .0371  .0221  .0131  .0078  .0046  .0027  .0016  .0010  .0006  .0003  .0002  .0001  .0001  .0000
11   .0461  .0258  .0145  .0081  .0046  .0026  .0014  .0008  .0005  .0003  .0001  .0001  .0000
12   .0339  .0180  .0095  .0050  .0027  .0014  .0007  .0004  .0002  .0001  .0001  .0000
13   .0250  .0125  .0062  .0031  .0016  .0008  .0004  .0002  .0001  .0000
14   .0184  .0087  .0041  .0019  .0009  .0004  .0002  .0001  .0000

 n   1.80   2.10   2.40   2.70   3.00
 3   .0415  .0206  .0106  .0053  .0026
 4   .0086  .0030  .0011  .0004  .0001
 5   .0018  .0004  .0001  .0000
 6   .0004  .0001  .0000
 7   .0001  .0000
 8   .0000
 9   .0000

Size of Sample, n = 15 to 26 (rows), against k (columns).

 n   .44    .45    .46    .47    .48    .49    .50    .51    .52    .53    .54    .55    .56    .57    .58    .59    .60    .61    .62    .63    .64    .65
15   .9117  .7754  .6594  .5608  .4770  .4057  .3450  .2934  .2496  .2123  .1806  .1536  .1306  .1111  .0945  .0804  .0684  .0582  .0495  .0421  .0358  .0304
16   .9057  .7615  .6402  .5382  .4525  .3805  .3199  .2690  .2261  .1901  .1599  .1344  .1130  .0950  .0799  .0672  .0565  .0475  .0399  .0336  .0282  .0237
17   .8998  .7478  .6215  .5166  .4293  .3568  .2966  .2465  .2049  .1703  .1416  .1177  .0978  .0813  .0676  .0562  .0467  .0388  .0323  .0268  .0223  .0186
18   .8939  .7344  .6034  .4958  .4073  .3347  .2750  .2259  .1856  .1525  .1253  .1030  .0846  .0695  .0571  .0470  .0386  .0317  .0260  .0214  .0176  .0145
19   .8881  .7213  .5858  .4758  .3864  .3139  .2549  .2071  .1682  .1366  .1110  .0901  .0732  .0595  .0483  .0392  .0319  .0259  .0210  .0171  .0139  .0113
20   .8822  .7084  .5687  .4566  .3666  .2944  .2364  .1898  .1524  .1224  .0983  .0789  .0634  .0509  .0409  .0328  .0263  .0212  .0170  .0136  .0110  .0088
21   .8765  .6956  .5521  .4382  .3478  .2761  .2191  .1739  .1381  .1096  .0870  .0691  .0548  .0435  .0345  .0274  .0218  .0173  .0137  .0109  .0086  .0069
22   .8707  .6832  .5360  .4206  .3300  .2589  .2032  .1594  .1251  .0982  .0770  .0604  .0474  .0372  .0292  .0229  .0180  .0141  .0111  .0087  .0068  .0054
23   .8650  .6709  .5204  .4037  .3131  .2429  .1884  .1461  .1133  .0879  .0682  .0529  .0410  .0318  .0247  .0192  .0149  .0115  .0089  .0069  .0054  .0042
24   .8593  .6589  .5052  .3874  .2970  .2278  .1746  .1339  .1027  .0787  .0604  .0463  .0355  .0272  .0209  .0160  .0123  .0094  .0072  .0055  .0042  .0033
25   .8537  .6471  .4905  .3718  .2818  .2136  .1619  .1227  .0930  .0705  .0535  .0405  .0307  .0233  .0177  .0134  .0101  .0077  .0058  .0044  .0034  .0025
26   .8481  .6355  .4762  .3568  .2674  .2004  .1501  .1125  .0843  .0632  .0473  .0355  .0265  .0199  .0149  .0112  .0084  .0063  .0047  .0035  .0026  .0020

 n   .70    .75    .80    .85    .90    .95    1.00   1.05
15   .0135  .0060  .0027  .0012  .0005  .0002  .0001  .0000
16   .0100  .0042  .0018  .0007  .0003  .0001  .0001  .0000
17   .0073  .0029  .0012  .0005  .0002  .0001  .0000
18   .0054  .0020  .0008  .0003  .0001  .0000
19   .0040  .0014  .0005  .0002  .0001  .0000
20   .0029  .0010  .0003  .0001  .0000
21   .0022  .0007  .0002  .0001  .0000
22   .0016  .0005  .0001  .0000
23   .0012  .0003  .0001  .0000
24   .0009  .0002  .0001  .0000
25   .0006  .0002  .0000
26   .0005  .0001  .0000

Size of Sample, n = 27 to 38 (rows), against k (columns).

 n   .44    .45    .46    .47    .48    .49    .50    .51    .52    .53    .54    .55    .56    .57    .58    .59    .60    .61    .62    .63    .64    .65
27   .8426  .6241  .4623  .3425  .2537  .1879  .1392  .1031  .0764  .0566  .0419  .0311  .0230  .0170  .0126  .0094  .0069  .0051  .0038  .0028  .0021  .0015
28   .8371  .6129  .4488  .3287  .2407  .1762  .1290  .0945  .0692  .0507  .0371  .0272  .0199  .0146  .0107  .0078  .0057  .0042  .0031  .0022  .0016  .0012
29   .8316  .6020  .4357  .3154  .2283  .1653  .1196  .0866  .0627  .0454  .0329  .0238  .0172  .0125  .0090  .0065  .0047  .0034  .0025  .0018  .0013  .0009
30   .8262  .5912  .4230  .3027  .2166  .1550  .1109  .0794  .0568  .0407  .0291  .0208  .0149  .0107  .0076  .0055  .0039  .0028  .0020  .0014  .0010  .0007
31   .8207  .5806  .4107  .2905  .2055  .1454  .1028  .0728  .0515  .0364  .0258  .0182  .0129  .0091  .0065  .0046  .0032  .0023  .0016  .0011  .0008  .0006
32   .8154  .5702  .3987  .2788  .1950  .1364  .0954  .0667  .0466  .0326  .0228  .0160  .0112  .0078  .0055  .0038  .0027  .0019  .0013  .0009  .0006  .0004
33   .8100  .5600  .3871  .2676  .1850  .1279  .0884  .0611  .0423  .0292  .0202  .0140  .0097  .0067  .0046  .0032  .0022  .0015  .0011  .0007  .0005  .0003
34   .8047  .5499  .3758  .2568  .1755  .1199  .0820  .0560  .0383  .0262  .0179  .0122  .0084  .0057  .0039  .0027  .0018  .0012  .0009  .0006  .0004  .0003
35   .7994  .5401  .3648  .2465  .1665  .1125  .0760  .0513  .0347  .0234  .0158  .0107  .0072  .0049  .0033  .0022  .0015  .0010  .0007  .0005  .0003  .0002
36   .7942  .5304  .3542  .2365  .1580  .1055  .0705  .0471  .0314  .0210  .0140  .0094  .0063  .0042  .0028  .0019  .0012  .0008  .0006  .0004  .0002  .0002
37   .7890  .5209  .3439  .2270  .1499  .0989  .0653  .0431  .0285  .0188  .0125  .0082  .0054  .0036  .0024  .0016  .0010  .0007  .0004  .0003  .0002  .0001
38   .7839  .5116  .3339  .2179  .1422  .0928  .0606  .0395  .0258  .0168  .0110  .0072  .0047  .0031  .0020  .0013  .0008  .0006  .0004  .0002  .0002  .0001

 n   .70    .75    .80
27   .0003  .0001  .0000
28   .0003  .0001  .0000
29   .0002  .0000
30   .0001  .0000
31   .0001  .0000
32   .0001  .0000
33   .0001  .0000
34   .0000
35   .0000
36   .0000
37   .0000
38   .0000

Size of Sample, n = 39 to 50 (rows), against k (columns).

 n   .44    .45    .46    .47    .48    .49    .50    .51    .52    .53    .54    .55    .56    .57    .58    .59    .60    .61    .62    .63    .64    .65
39   .7787  .5024  .3241  .2091  .1349  .0870  .0562  .0362  .0234  .0151  .0097  .0063  .0041  .0026  .0017  .0011  .0007  .0005  .0003  .0002  .0001  .0001
40   .7736  .4934  .3147  .2007  .1280  .0816  .0521  .0332  .0212  .0135  .0086  .0055  .0035  .0022  .0014  .0009  .0006  .0004  .0002  .0002  .0001  .0001
41   .7686  .4846  .3055  .1926  .1214  .0766  .0483  .0304  .0192  .0121  .0076  .0048  .0030  .0019  .0012  .0008  .0005  .0003  .0002  .0001  .0001  .0000
42   .7635  .4759  .2966  .1848  .1152  .0718  .0448  .0279  .0174  .0108  .0068  .0042  .0026  .0016  .0010  .0006  .0004  .0002  .0002  .0001  .0001  .0000
43   .7585  .4673  .2879  .1774  .1093  .0673  .0415  .0256  .0158  .0097  .0060  .0037  .0023  .0014  .0009  .0005  .0003  .0002  .0001  .0001  .0000
44   .7536  .4590  .2795  .1703  .1037  .0632  .0385  .0234  .0143  .0087  .0053  .0032  .0020  .0012  .0007  .0004  .0003  .0002  .0001  .0001  .0000
45   .7486  .4507  .2714  .1634  .0984  .0592  .0357  .0215  .0129  .0078  .0047  .0028  .0017  .0010  .0006  .0004  .0002  .0001  .0001  .0000
46   .7437  .4427  .2635  .1568  .0933  .0556  .0331  .0197  .0117  .0070  .0042  .0025  .0015  .0009  .0005  .0003  .0002  .0001  .0001  .0000
47   .7389  .4347  .2558  .1505  .0886  .0521  .0307  .0180  .0106  .0062  .0037  .0022  .0013  .0007  .0004  .0003  .0002  .0001  .0001  .0000
48   .7340  .4269  .2483  .1444  .0840  .0489  .0284  .0165  .0096  .0056  .0033  .0019  .0011  .0006  .0004  .0002  .0001  .0001  .0000
49   .7292  .4193  .2411  .1386  .0797  .0458  .0264  .0152  .0087  .0050  .0029  .0017  .0010  .0005  .0003  .0002  .0001  .0001  .0000
50   .7244  .4118  .2341  .1330  .0756  .0430  .0244  .0139  .0079  .0045  .0026  .0015  .0008  .0005  .0003  .0002  .0001  .0000

N.B. I feel it necessary to make a brief comment on the authorship of this paper. Its origin was a matter of close co-operation, both personal and by letter, and the ground covered included the general ideas and the illustration of these by sampling from a normal population. A part of the results reached in common are included in Chapters I, II and V. Later I was much occupied with other work, and therefore unable to co-operate. The experimental work, the calculation of tables and the developments of the theory of Chapters III and IV are due solely to Dr Egon S. Pearson.

J. NEYMAN.

ON THE USE AND INTERPRETATION OF CERTAIN TEST CRITERIA FOR PURPOSES OF STATISTICAL INFERENCE. PART II.

BY J. NEYMAN, PH.D. AND E. S. PEARSON, D.Sc.

CONTENTS.

                                                          PAGE
(1) An Extension of the Definition of Likelihood            67
(2) The Fundamental $\chi^2$ Problem                        69
(3) The Test of Goodness of Fit                             72
(4) A Sampling Experiment                                   78
(5) Summary of the Position                                 84
(6) The Case of Two Samples                                 87
(7) Application to Contingency Tables                       88

(1) An Extension of the Definition of Likelihood.

In an earlier paper* we have endeavoured to emphasise the importance of placing in a logical sequence the stages of reasoning adopted in the solution of certain statistical problems, which may be termed problems of inference. In testing whether a given sample, $\Sigma$, is likely to have been drawn from a population $\Pi$, we have started from the simple principle that appears to be used in the judgments of ordinary life, that the degree of confidence placed in an hypothesis depends upon the relative probability or improbability of alternative hypotheses. From this point of view any criterion which is to assist in scaling the degree of confidence with which we accept or reject the hypothesis that $\Sigma$ has been randomly drawn from $\Pi$ should be one which decreases as the probability (defined in some definite manner) of alternative hypotheses becomes relatively greater. Now it is of course impossible in practice to scale the confidence with which we form a judgment with any single numerical criterion, partly because there will nearly always be present certain a priori conditions and limitations which cannot be expressed in exact terms. Yet though it may be impossible to bring the ideal situation into agreement with the real, some form of numerical measure is essential as a guide and control. In our previous paper we have made use of the criterion of likelihood. That there may be other forms of criteria or that this one can be interpreted in a different manner is very possible, but our object has been to find a single principle connecting the various sampling tests already in use, and one which could be extended to new problems.

* Biometrika, xxA, pp. 175-240.

Hitherto no absolute value has been assigned to the measure of likelihood; if the chance of obtaining $\Sigma$ from $\Pi$ be $C$, and from $\Pi'$ be $C'$, then the ratio of the likelihoods of $\Pi$ and $\Pi'$ with regard to $\Sigma$ has been defined as $C/C'$. This ratio of chances or frequencies is undoubtedly one which has a direct bearing on our judgments in ordinary life. In dealing, however, with continuous variation, for any unique sample $\Sigma$, both $C$ and $C'$ will be infinitesimally small. It is therefore necessary to take the chance of obtaining a sample whose variate values lie within certain fixed elements which are independent of $\Pi$ and $\Pi'$. The value of the ratio $C/C'$ will then depend upon the elements chosen. The ratio of the likelihoods has in this case been defined as the limiting value of $C/C'$ as the elements tend to zero. It is perhaps only easy to grasp the significance of the ratio in the case where $C$ and $C'$ are finite, as for example when dealing with grouped observations*. But if in giving the same interpretation to the limiting ratio we are making in a certain sense an assumption, so far we have been unable to discover any case where the use of this criterion leads to results which on intuitive grounds appear unreasonable. It was in this form of the ratio of likelihoods that the idea was introduced by R. A. Fisher†, and it was so used in our previous paper.

* The ratio is not here final in the sense that the limiting ratio is, since $C$ and $C'$ will change with the arrangement of grouping.
† Phil. Trans. Vol. 222, A, p. 326.

It is, however, possible and perhaps desirable to give a more precise definition of the term. The populations $\Pi$, $\Pi'$, etc. which are considered as possible sources of origin for $\Sigma$ must belong to some set or universe of populations, $\Omega$, which can generally be specified; further, $\Omega$ will contain, in all cases likely to be met in practice, a certain population $\Pi(\Omega_{\max})$ for which the chance of drawing $\Sigma$ is a maximum, say $C(\Omega_{\max})$. We may now define the likelihood of the hypothesis that $\Sigma$ has been drawn from $\Pi$ as

$$\lambda = C / C(\Omega_{\max}) \qquad (1).$$

Under these conditions the maximum likelihood becomes unity. The likelihood, $\lambda$, is of course defined with regard to a definite set of populations, and will change if $\Omega$ be changed. The hypothesis that $\Sigma$ has been drawn from an exactly defined population $\Pi$ may be termed a simple hypothesis, while we may term the hypothesis that $\Sigma$ has been drawn from some unspecified population belonging to a clearly defined subset $\omega$ of the set $\Omega$, a composite hypothesis. For example, if $\Omega$ represents all possible normal frequency distributions

$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-a)^2}{2\sigma^2}},$$

a simple hypothesis is that $\Sigma$ has been sampled from a population for which $a = a_1$, $\sigma =$ […] for which the chance of occurrence of $\Sigma$ is a maximum be $\Pi(\omega_{\max})$, and let this chance be $C(\omega_{\max})$. Then we define the likelihood of the composite hypothesis that $\Sigma$ has been sampled from some member of the sub-set $\omega$ of the set $\Omega$ to be the ratio*,

$$\lambda = C(\omega_{\max}) / C(\Omega_{\max}) \qquad (2)$$
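Definitions (1) and (2) can be made concrete for the normal set $\Omega$ with a short Python sketch. The composite hypothesis used below (mean fixed at $a_1$, standard deviation unspecified) is chosen here purely for illustration, and the function names are hypothetical. The sketch maximises the chance of the sample over $\Omega$ and over the sub-set $\omega$ and forms the two ratios through logarithms.

```python
import math


def normal_log_chance(xs, a, sigma):
    """Log of the point density of the sample xs under a normal law (mean a, s.d. sigma)."""
    n = len(xs)
    sum_sq = sum((x - a) ** 2 for x in xs)
    return -n * math.log(sigma * math.sqrt(2.0 * math.pi)) - sum_sq / (2.0 * sigma ** 2)


def likelihoods(xs, a1, sigma1):
    """Return (lambda for the simple hypothesis, lambda for the composite hypothesis)."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)  # s.d. with divisor n

    # Over all normal populations the chance is greatest at a = mean, sigma = s.
    log_c_big_omega = normal_log_chance(xs, mean, s)

    # Simple hypothesis: a = a1 and sigma = sigma1 both specified, as in (1).
    lam_simple = math.exp(normal_log_chance(xs, a1, sigma1) - log_c_big_omega)

    # Composite hypothesis (assumed example): a = a1, sigma unspecified.
    # Within this sub-set the chance is greatest at sigma^2 = s^2 + (mean - a1)^2, as in (2).
    sigma_omega = math.sqrt(s * s + (mean - a1) ** 2)
    lam_composite = math.exp(normal_log_chance(xs, a1, sigma_omega) - log_c_big_omega)
    return lam_simple, lam_composite
```

For a sample such as `[1.2, 0.4, -0.3, 2.1, 0.9]` tested against $a_1 = 0$, $\sigma_1 = 1$, both ratios lie between 0 and 1, and the composite likelihood cannot be smaller than the simple one because the sub-set contains the simple hypothesis.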

"Student's" Problem consists in testing such a composite hypothesis, and this definition leads at once to the expression (xxv) of our first paper, namely,

$$\lambda = (1 + z^2)^{-\frac{n}{2}},$$

so that the contours of constant likelihood are the z-contours. The value of this criterion in the case of testing composite hypotheses will perhaps be questioned. It may be argued that it is impossible to estimate the probability of such a hypothesis without a knowledge of the relative a priori probabilities of the constituent simple hypotheses. But in general it is quite impossible even to attempt to express our a priori knowledge in exact terms. Can we then get no further? Certainly we are faced with a problem whose solution cannot take the same form as that which is possible when a simple hypothesis is tested, yet we are inclined to think that this ratio of maximum chances or frequencies of occurrence provides perhaps the best numerical measure that can be found under the circumstances to guide our judgment. As for a simple hypothesis, so in the case of a composite one, a knowledge of the ratio alone is not however sufficient. It has been pointed out that the errors of judgment which must inevitably occur will be of two kinds, (1) we sometimes reject the hypothesis when it is in fact true, and (2) we sometimes accept it when $\Sigma$ has been drawn from some population not belonging to the sub-set $\omega$. The form of the criterion $\lambda$ has been chosen to minimise the effect of (2), but it is necessary to know also the probability distribution of $\lambda$ in sampling from the members of $\omega$ in order to control the source of error (1). In the cases which we have so far come across this distribution has possessed the essential property of being the same or approximately the same whatever member of $\omega$ may have been sampled.

(2) The Fundamental $\chi^2$ Problem.

We have indicated† in the case of a single continuous variable the geometrical relation in multiple space between the point density

$$D = f(x_1)\, f(x_2) \ldots f(x_N) \qquad (3)$$

associated with a sample $x_1, x_2, \ldots x_N$ from a population following the distribution law $y = f(x)$, and the term of the multinomial

$$C = \frac{N!}{n_1!\, n_2! \ldots n_k!}\; p_1^{\,n_1} p_2^{\,n_2} \ldots p_k^{\,n_k} \qquad (4)$$

representing the chance of drawing a sample with group frequencies $n_1, n_2 \ldots n_k$ from a population in which the corresponding group proportions are $p_1, p_2 \ldots p_k$.

* This will be as before a limiting ratio if we are dealing with continuous variation and observations which are not grouped.
† Biometrika, xxA, p. 185.

The expression (4) is, however, of very general application, for in using it we are not limited to problems of a single variable, to equal grouping ranges or even to quantitative variables at all. Further, instead of using the N-dimensioned space previously considered, it is now possible to represent the problem in a space of k dimensions, in which the variables are the sample group frequencies $n_1, n_2 \ldots n_k$, while the relation

$$n_1 + n_2 + \ldots + n_k = N \qquad (5)$$

representing a prime in this space, shows that the sample point P is constrained to move in a space of k - 1 dimensions. As the quantities $n_s$ must increase by integral values, we are in fact only concerned with a finite number of discrete points in the space, with each of which is associated a value of $C$. But if the size of the sample be large so that there exists a very great variety of possible sample points, and if none of the quantities $p_s N$ are too small, it becomes possible to represent with good approximation the sum of discrete values of $C$ throughout a given region of the field by the integral of a continuous density function throughout that region. This function we shall now consider. Write

$$m_s = p_s N, \quad \text{and} \quad \delta n_s = n_s - m_s \quad (s = 1, 2 \ldots k) \qquad (6),$$

where of course

$$m_1 + m_2 + \ldots + m_k = N \qquad (7).$$

Then (4) may be written

[Equation (8): expression illegible in source.]

If the values of all the group frequencies $n_s$ are large enough to justify, as an approximation, the neglect of the terms $1/12n_s$ and those which follow in Stirling's Expansion for $(n_s)!$, it is known that (8) may be written*

[Equation (9): expression illegible in source; it gives $C$ as a constant factor involving $(\sqrt{2\pi})^{k-1}$ multiplied by an exponential.]

All the terms except the first in the power of the exponential tend to zero as the $m_s$ increase. We take, therefore, as an approximation to $C$, the continuous density field represented by
$$D = D_0 e^{-\frac{1}{2}\chi^2} \tag{10}$$
where
$$\chi^2 = \sum_{s=1}^{k} \left(\frac{\delta n_s^2}{m_s}\right) = \sum_{s=1}^{k} \frac{(n_s - m_s)^2}{m_s} \tag{11}$$
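The closeness of the density field (10) to the multinomial term (4) can be checked numerically. The following is a minimal sketch, not part of the original paper: the function names are illustrative, and the constant $D_0 = 1/\sqrt{(2\pi N)^{k-1} p_1 p_2 \dots p_k}$ is an assumption (the usual normalising constant for this approximation), since the printed form of $D_0$ is not legible here.

```python
import math

def multinomial_term(n, p):
    """Exact multinomial term C of (4) for group counts n and proportions p."""
    N = sum(n)
    log_c = math.lgamma(N + 1)
    for n_s, p_s in zip(n, p):
        log_c += n_s * math.log(p_s) - math.lgamma(n_s + 1)
    return math.exp(log_c)

def chi_square(n, p):
    """chi^2 of (11): sum of (n_s - m_s)^2 / m_s with m_s = p_s * N."""
    N = sum(n)
    return sum((n_s - p_s * N) ** 2 / (p_s * N) for n_s, p_s in zip(n, p))

def density_approx(n, p):
    """D = D0 * exp(-chi^2 / 2); D0 is an ASSUMED normalising constant."""
    N, k = sum(n), len(n)
    d0 = 1.0 / math.sqrt((2 * math.pi * N) ** (k - 1) * math.prod(p))
    return d0 * math.exp(-0.5 * chi_square(n, p))

# Hypothetical example: N = 1000 observations in k = 3 groups.
p = [0.2, 0.3, 0.5]
n = [210, 290, 500]
exact = multinomial_term(n, p)
approx = density_approx(n, p)
ratio = exact / approx
print(exact, approx, ratio)
```

With group frequencies of this size the ratio of the two values should lie close to unity; with small $m_s$ the agreement deteriorates, as the text warns.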

The assumptions involved in obtaining (10) from the multinomial are the same as those of the original method†, which reached the same result starting from the multivariate normal correlation surface of the sample group frequencies.

* See for example E. Borel, Traité du Calcul des Probabilités, Tome 1, Fasc. 1, pp. 35-36.

† K. Pearson, Phil. Mag. July 1900, pp. 157-175.

On the use and interpretation

of certain test

criteria

71

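The next problem is to consider the contours which should be used in testing the hypothesis that $\Sigma$ has been drawn from $\Pi$. We shall use the contours of likelihood. In the earlier paper, when dealing with continuous variation in small samples, it was necessary to limit the set, $\Omega$, of possible populations (as defined on p. 264 above) to normal distributions, to rectangular distributions, etc. This is no longer necessary, for $\Omega$ may now be taken as the set of all possible populations classed into $k$ frequency groups. For such populations $C(\Omega \max)$ is obtained by making the expression (4), with proportions $p_1', p_2', \dots p_k'$, a maximum subject to $p_1' + p_2' + \dots + p_k' = 1$. The solution is
$$p_s' = n_s/N \qquad (s = 1, 2, \dots k).$$
Consequently the likelihood of this simple hypothesis becomes
$$\lambda = \frac{C}{C(\Omega \max)} = \left(\frac{m_1}{n_1}\right)^{n_1} \left(\frac{m_2}{n_2}\right)^{n_2} \dots \left(\frac{m_k}{n_k}\right)^{n_k}$$
But
$$\log \lambda = -n_1 \log\left(1 + \frac{\delta n_1}{m_1}\right) - \dots - n_k \log\left(1 + \frac{\delta n_k}{m_k}\right) = -\sum_{s=1}^{k} (m_s + \delta n_s)\left\{\frac{\delta n_s}{m_s} - \frac{1}{2}\frac{\delta n_s^2}{m_s^2} + \dots\right\} = -\sum_{s=1}^{k} \left\{\delta n_s + \frac{1}{2}\frac{\delta n_s^2}{m_s} - \frac{1}{6}\frac{\delta n_s^3}{m_s^2} + \dots\right\}$$
and since $\sum (\delta n_s) = 0$, we find that
$$\lambda = e^{-\frac{1}{2}\chi^2 + \frac{1}{6}\sum \delta n_s^3/m_s^2 - \dots} \tag{13}$$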

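or approximately
$$\lambda = e^{-\frac{1}{2}\chi^2} \tag{14}$$
It follows that the levels of constant likelihood, $\lambda$, among the discrete sample points will correspond approximately to the contours of $\chi$ in the density field represented by (10)*, and we may take as the chance of obtaining a sample from $\Pi$ with $\lambda \le \lambda_0$ that of obtaining $\chi \ge \chi_0$, or
$$P = \frac{\int_{\chi_0}^{\infty} \chi^{k-2} e^{-\frac{1}{2}\chi^2}\, d\chi}{\int_{0}^{\infty} \chi^{k-2} e^{-\frac{1}{2}\chi^2}\, d\chi} \tag{15}$$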
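The relation between the likelihood $\lambda$ and $\chi^2$, and the probability $P$ of (15), may be illustrated as follows. This is a sketch under stated assumptions rather than a method taken from the paper: the function names, the trapezoidal integration, its step count and the finite upper limit are all illustrative choices, and the sample frequencies are hypothetical.

```python
import math

def likelihood_ratio(n, m):
    """lambda = prod (m_s / n_s)^{n_s}, the ratio C / C(Omega max)."""
    return math.exp(sum(n_s * math.log(m_s / n_s)
                        for n_s, m_s in zip(n, m) if n_s > 0))

def chi_square(n, m):
    """chi^2 = sum of (n_s - m_s)^2 / m_s."""
    return sum((n_s - m_s) ** 2 / m_s for n_s, m_s in zip(n, m))

def tail_probability(chi0, k, steps=20000, upper=40.0):
    """P of (15) by the trapezoidal rule (steps and upper are illustrative)."""
    def f(x):
        return x ** (k - 2) * math.exp(-0.5 * x * x)

    def integral(a, b):
        h = (b - a) / steps
        total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, steps))
        return total * h

    return integral(chi0, upper) / integral(0.0, upper)

# Hypothetical sample of N = 1000 in k = 3 groups.
m = [200.0, 300.0, 500.0]   # population frequencies m_s
n = [210, 290, 500]         # observed frequencies n_s
lam = likelihood_ratio(n, m)
chi2 = chi_square(n, m)
P = tail_probability(math.sqrt(chi2), k=len(n))
print(lam, math.exp(-0.5 * chi2), P)
```

For $k = 3$ the integrals in (15) can be evaluated in closed form, giving $P = e^{-\frac{1}{2}\chi_0^2}$, which provides a simple check on the numerical integration.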

From this point of view we give another interpretation to the ordinary $(P, \chi^2)$ Test; we accept $\lambda$ rather than $\chi$ as the fundamental criterion by which to measure our confidence in the hypothesis that $\Sigma$ has been sampled from $\Pi$. It would of course be unreasonable to accept as the most probable alternative the population of maximum likelihood, namely, that in which $p_s = n_s/N$, but these proportions point to a direction in which there will lie a widening field of possible alternatives as the likelihood diminishes. And this is really the idea which underlies the use of the $\lambda$ contours.

* The relation between $\chi^2$ and the method of likelihood has been discussed by R. A. Fisher, Phil. Trans. Vol. 222, A, pp. 356-358; Jour. Roy. Stat. Soc. Vol. LXXXVII. pp. 442-450.


(3) The Test of Goodness of Fit.

In the preceding section the standard $\chi^2$ result has been shown to follow from the use of the criterion of likelihood; it was there supposed that the population $\Pi$ could be exactly specified, that is to say, we were dealing with a simple hypothesis. We have now another kind of problem in which a theoretical law of frequency of given type, but whose exact form depends upon certain undetermined parameters, is fitted to a series of observations, the value obtained for the parameters depending on the method of fitting. Some measure is then required of the adequacy of this fit. If there be $c$ of these parameters or constants to be determined*, then the proportional frequency in the $s$th group into which the population is classed may be expressed by a relation
$$p_s = f(s;\ a_1, a_2, \dots a_c) \tag{16}$$
where the form of $f$ is known, but $a_1, a_2, \dots a_c$ are to be determined from the sample. Using some appropriate method of fitting, (16) will give a series of frequencies
$$m_s = N p_s = N f(s;\ a_1, a_2, \dots a_c) \qquad (s = 1, 2, \dots k) \tag{17}$$
with which to compare the observed frequencies $n_1, \dots n_k$. It is quite possible that all we may then require is an answer to the question: Is it probable that the observed sample would have arisen in random sampling from a population whose group proportions are actually the values $p_s = m_s/N$ of (17) determined by fitting (16) to the observations? This question is answered with the help of the ordinary $(P, \chi^2)$ Test, putting $\chi = \chi_1$, where
$$\chi_1^2 = \sum_{s=1}^{k} \frac{(n_s - m_s)^2}{m_s} \tag{18}$$

and the Tables are entered with $n = k$. We are testing a simple hypothesis. On the other hand, we may attempt to answer a rather different question: Is it probable that the observed sample would have occurred in random sampling from a population whose law of frequency is defined by (16)? These two questions are not the same; how far an answer to the first can be considered as satisfying the second will be discussed later.

In recent papers† R. A. Fisher has given results showing that under certain conditions the second question can be answered by using the $(P, \chi^2)$ Test with $\chi^2 = \chi_1^2$ as in (18) and $n = k - c$. Partly because we shall approach the problem from a somewhat different point of view and partly because we believe that some may have difficulty in following Dr Fisher's very condensed reasoning we propose to go into this question at some length. We shall also deduce the correlation surface of the $\chi$ of (11) and the $\chi_1$ of (18), and test the theoretical formulae by applying them to the results of experimental sampling.

If the differences between the sample group frequencies, $n_s$, and the corresponding reduced population frequencies, $\tilde m_s$, or $x_s = n_s - \tilde m_s$ $(s = 1, \dots k)$, are taken as variables, the problem may be represented in a $k$-dimensioned space with axes $Ox_1, Ox_2, \dots Ox_k$, in which the sample points are constrained to lie on the prime
$$x_1 + x_2 + \dots + x_k = 0 \tag{19}$$
or in $k - 1$ dimensions. The chance of drawing from the population a sample lying in a given region of this space may then be represented approximately, as has been seen, by the integral throughout this region of the continuous density function
$$D = D_0 e^{-\frac{1}{2}\chi^2} \tag{10 bis}$$
where we may now write
$$\chi^2 = \frac{x_1^2}{\tilde m_1} + \frac{x_2^2}{\tilde m_2} + \dots + \frac{x_k^2}{\tilde m_k} \tag{20}$$

* $c$ is the minimum number of parameters required to express the law.

† Jour. Roy. Stat. Soc. LXXXV. pp. 87-94, and LXXXVII. pp. 442-450.

If now we fit to the sample a frequency law of given form depending on $c$ parameters, the resulting frequencies $m_1, m_2, \dots m_k$ will be represented by a point $X_1 = m_1 - \tilde m_1, \dots X_k = m_k - \tilde m_k$, which must lie on what may be termed the population locus, or
$$X_s = f_s(a_1, a_2, \dots a_c) \qquad (s = 1, 2, \dots k - 1) \tag{21a}$$
$$X_1 + X_2 + \dots + X_k = 0 \tag{21b}$$
For a given sample $(x_1, \dots x_k)$, the point $(X_1, \dots X_k)$ chosen will depend upon the method of fitting employed. The value of $\chi_1^2$ calculated from the fitted population frequencies is
$$\chi_1^2 = \frac{(x_1 - X_1)^2}{\tilde m_1 + X_1} + \dots + \frac{(x_k - X_k)^2}{\tilde m_k + X_k} \tag{22}$$
This expression represents a hyperellipsoid centred at $(X_1, \dots X_k)$. Suppose now that the method of fitting is one which determines $(X_1, \dots X_k)$, subject to equations (21), so that $\chi_1$ is a minimum. Then it follows that the locus of the sample points for which $\chi_1^2$ is constant is the envelope of the hyperellipsoids (22) whose centres are constrained to move on the population locus (21). Under these conditions, therefore, the chance of obtaining a value of $\chi_1$ greater let us say than $\gamma$ will be equal to the integral of $D$ taken outside the envelope of the hyperellipsoids for which $\chi_1 = \gamma$. A general solution for this integration would be extremely difficult to obtain; in the present paper we shall, however,

(a) Show how a solution may be obtained under certain simplifying assumptions.

(b) Give in an Appendix what we believe to be a rigorous proof that the approximation (a) is the limiting form for large samples under very general conditions.

(c) Describe certain experimental results which suggest that this limiting solution will in fact represent adequately the situation met with in common practice.

The assumptions required for the solution (a) are as follows:

(1) That, as before, the distribution of samples from the population $\Pi$ can be adequately represented by the continuous density function $D$ of (10).


(2) That the point $(X_1, \dots X_k)$ is sufficiently close to the origin, or the fitted $m_s$ near enough to the true $\tilde m_s$, for the linear transformation by which the hyperellipsoids (20) are turned into concentric hyperspheres of radius $\chi$, or
$$x_1'^2 + x_2'^2 + \dots + x_k'^2 = \chi^2 \tag{23}$$
to change also the hyperellipsoids (22) approximately into hyperspheres of radius $\chi_1$, or*
$$(x_1' - X_1')^2 + \dots + (x_k' - X_k')^2 = \chi_1^2 \tag{24}$$
In other words, we assume that at any rate within the region of significant density, the terms $X_1, \dots X_k$ in the denominators of (22) are negligible compared with $\tilde m_1, \dots \tilde m_k$.

(3) If the $c$ constants $a$ be eliminated from (21a), we shall be left with $k - c - 1$ equations of form
$$F_t(X_1, \dots X_k) = 0 \qquad (t = 1, 2, \dots k - c - 1) \tag{25}$$
Since $X_s = m_s - \tilde m_s$ is small these equations may generally be expanded by Taylor's Theorem in powers of the $X$'s. It will be assumed that in the neighbourhood of the origin it is adequate to take the first order terms only in these expansions, and to represent the population locus, instead of by (21), by the $k - c$ primes
$$a_{1t} X_1 + \dots + a_{kt} X_k = 0 \qquad (t = 1, 2, \dots k - c - 1) \tag{26a}$$
$$X_1 + \dots + X_k = 0 \tag{26b}$$
If these three assumptions are justified the same linear transformation will change the hyperellipsoids (22) as well as the hyperellipsoids (20) into hyperspheres, and will leave the equations (26) as a series of primes. The effect of the conditions (19) and (26b) is to limit the field to a space of $k - 1$ dimensions. It follows that the problem reduces to that of integrating $D$ outside the envelope traced out in the $(k - 1)$-dimensioned space defined by (19), by a hypersphere of radius $\chi_1$ whose centre is constrained to move in this space on the $k - c - 1$ primes (26a)†.

We cannot visualise the relation of the hypersurfaces and envelopes beyond the case where $k = 4$, but in this case the relationship can be represented in three dimensions; the contours of $\chi$ are ordinary ellipsoidal surfaces in three dimensions surrounding the population point $O$. If $c = 1$, the point $Q$ representing the population obtained by fitting to the sample is constrained to lie on a locus of one dimension, i.e. a curve, which by the assumption (3) above we suppose to lie in the neighbourhood of $O$ on the intersection of two planes, or to be represented by a straight line. After the transformation, the locus of constant $\chi_1$ is approximately the envelope of spheres of radius $\chi_1$ whose centres move on a straight line, that is to say, it is a cylinder. If $c = 2$, then the locus of $Q$ is two-dimensioned, and we assume that near $O$ it can be represented by a plane. The locus of constant $\chi_1$ is now approximately the envelope of spheres whose centres move on a plane, that is to say, consists of two parallel planes at a distance $2\chi_1$ apart.

* $x'$ and $X'$ represent the coordinates after transformation.

† More correctly, we shall be dealing with (19) and (26a) when transformed into terms of $x'$ and $X'$.


To obtain the distribution of $\chi_1$ we shall consider the solution of the following general lemma.

Lemma. A point $O$ in a space of $n$ dimensions is surrounded by a density field
$$D = D_0 e^{-\frac{1}{2}z^2} \tag{27}$$
where
$$x_1^2 + x_2^2 + \dots + x_n^2 = z^2 \tag{28}$$
The chance of a point $P$ falling into any region of this space is measured by the integral of $D$ taken throughout that region. A point $Q$ is constrained to move upon a locus defined by the intersection of $s$ primes,
$$a_{1t} x_1 + a_{2t} x_2 + \dots + a_{nt} x_n = 0 \qquad (t = 1, 2, \dots s) \tag{29}$$
Then the problem is to find the chance that the distance, $u$, between $P$ and the nearest possible position of $Q$ lies within the limits $u \pm \frac{1}{2}du$, $du$ being an infinitesimal element.

Let $P$ be the point $(x_1, x_2, \dots x_n)$ and $Q$ the point $(X_1, X_2, \dots X_n)$. Then
$$(x_1 - X_1)^2 + \dots + (x_n - X_n)^2 = u^2 \tag{30}$$
and
$$a_{1t} X_1 + a_{2t} X_2 + \dots + a_{nt} X_n = 0 \qquad (t = 1, 2, \dots s) \tag{31}$$
Let Figure 1 represent the section of the $n$-dimensioned space by the plane $POQ$; call $OQ = w$.

[FIG. 1]

For a given $P$, $Q$ is determined in the following manner. By solving the $s$ equations (31) it is possible to express $s$ of the quantities $X_1, \dots X_n$ as linear functions of the remaining $n - s$, say $X_1, X_2, \dots X_{n-s}$; these values may be substituted into (30) and will give on minimising $u^2$, $n - s$ linear relations of form
$$\frac{\partial u^2}{\partial X_t} = 0 \qquad (t = 1, 2, \dots n - s) \tag{32}$$


These relations (32) will determine the unique point $Q$ on (29) for which $PQ = u$ is a minimum. It follows conversely that for a given point $Q$ the locus of points $P$ for which (1) $PQ = u =$ constant, (2) $Q$ is the nearest point to $P$ lying on (29), is the intersection of the $(n - 1)$-dimensioned hypersphere* centred at $Q$, radius $u$, with the $(n - s)$ primes (32), which may now be regarded as linear equations in the $x$'s. That is to say, the locus is an $(s - 1)$-dimensioned hypersphere lying at right angles to (29). It is also clear that $PQ$ must be at right angles to $OQ$, seeing that $OQ$ must lie in the locus (29) and that $PQ$ is the minimum distance from $P$ to this locus. Hence for a fixed $Q$ and constant $u$, $z^2 = u^2 + w^2$ and is constant, so that this locus of $P$ for fixed $Q$ lies on a single one of the hyperspherical density contours defined by (27) and (28). Consequently the chance of drawing a sample for which

(1) $u + \frac{1}{2}du \ge PQ > u - \frac{1}{2}du$,

(2) $Q$ is at a fixed position on (29),

(3) $u = PQ$ is a minimum,

$$\propto u^{s-1}\, du\, D_0 e^{-\frac{1}{2}z^2} \propto u^{s-1} e^{-\frac{1}{2}u^2}\, du \times e^{-\frac{1}{2}w^2}.$$
To complete the solution it is necessary to integrate this expression for all possible positions of $Q$ on (29), remembering that for a given $P$, $Q$ is uniquely determined; that is to say, we must take $\int e^{-\frac{1}{2}w^2}\, dv$ over the hypervolume defined by (29). The result will be clearly independent of $u$, and it follows that the chance of obtaining a point $P$ for which $PQ$ lies in the limits $u \pm \frac{1}{2}du$ $\propto u^{s-1} e^{-\frac{1}{2}u^2}\, du$, or
$$f(u)\, du = c_0 u^{s-1} e^{-\frac{1}{2}u^2}\, du \tag{33}$$
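The result (33) implies that $u^2$ is distributed as $\chi^2$ with $s$ degrees of freedom, so that its mean value is $s$. A small random-sampling experiment can illustrate this. The sketch below is not taken from the paper: the dimensions $n = 5$, $s = 2$, the randomly drawn coefficients of the primes, the seed and the number of trials are all arbitrary illustrative assumptions, with $D_0 e^{-\frac{1}{2}z^2}$ realised by independent unit normal coordinates.

```python
import random

def orthonormal_basis(vectors):
    """Gram-Schmidt orthonormalisation of the normals of the s primes."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            dot = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - dot * bi for wi, bi in zip(w, b)]
        norm = sum(wi * wi for wi in w) ** 0.5
        basis.append([wi / norm for wi in w])
    return basis

def squared_distance_to_locus(point, basis):
    """u^2: squared length of the part of OP normal to every prime,
    i.e. the squared distance from P to the nearest point Q on (29)."""
    return sum(sum(pi * bi for pi, bi in zip(point, b)) ** 2 for b in basis)

def simulate(n=5, s=2, trials=20000, seed=0):
    """Average u^2 over random points P drawn from the density field (27)."""
    rng = random.Random(seed)
    normals = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(s)]
    basis = orthonormal_basis(normals)
    total = 0.0
    for _ in range(trials):
        point = [rng.gauss(0, 1) for _ in range(n)]
        total += squared_distance_to_locus(point, basis)
    return total / trials

mean_u2 = simulate()
print(mean_u2)
```

The mean of $u^2$ should come out close to $s$ whatever the particular primes drawn, in keeping with the remark that the result is independent of the position of $Q$.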

A further problem is to find the chance of obtaining a point $P$ for which $u$ and $z$ lie within the limits $u \pm \frac{1}{2}du$ and $z \pm \frac{1}{2}dz$. For this we take the expression
$$u^{s-1} e^{-\frac{1}{2}u^2}\, du \int e^{-\frac{1}{2}w^2}\, dv \tag{34}$$
where the integral is now to be taken over that portion only of the hypervolume (29) for which $w^2 = z^2 - u^2$. Now (29) cuts the hypersphere (28) in an $(n - s - 1)$-dimensioned hypersphere, and consequently (34) is proportional to
$$u^{s-1} w^{n-s-1} e^{-\frac{1}{2}u^2} e^{-\frac{1}{2}w^2}\, du\, dw.$$

* We shall speak of the locus represented by an equation such as (28) as an $(n - 1)$-dimensioned hypersphere; for if $n = 3$ we have the surface of an ordinary sphere in 3-dimensioned space, which is a 2-dimensioned locus. Similarly if $n = 2$ the locus is a circle, or a figure of 1-dimension lying in a plane.


Changing the variables from $u$ and $w$ to $u$ and $z$, since $z = \sqrt{u^2 + w^2}$, we have $du\, dw = du\, dz \cdot z/w$, and consequently
$$\phi(u, z)\, du\, dz \propto u^{s-1} e^{-\frac{1}{2}u^2} w^{n-s-2} e^{-\frac{1}{2}w^2} z\, du\, dz = c_0' u^{s-1} z (z^2 - u^2)^{\frac{n-s-2}{2}} e^{-\frac{1}{2}z^2}\, du\, dz \tag{35}$$
These results may be applied at once to the $(\chi, \chi_1)$ problem as it has been simplified by the assumptions of p. 270. We must write $z = \chi$, $u = \chi_1$ and as the problem is reduced to $k - 1$ dimensions by condition (19), $n = k - 1$. Also the $s$ primes (29) correspond to the $k - c - 1$ equations (26a), so that $s = k - c - 1$. It follows from (33) that
$$f(\chi_1)\, d\chi_1 = c_0 \chi_1^{k-c-2} e^{-\frac{1}{2}\chi_1^2}\, d\chi_1 \tag{36}$$
and from (35) that
$$\phi(\chi, \chi_1)\, d\chi\, d\chi_1 = c_0' \chi_1^{k-c-2} \chi (\chi^2 - \chi_1^2)^{\frac{c-2}{2}} e^{-\frac{1}{2}\chi^2}\, d\chi\, d\chi_1 \tag{37}$$

Equation (36) represents the distribution referred to above which has been given by Fisher; it shows that if the assumptions involved are justified the distribution of minimum $\chi_1$ is that of the ordinary $\chi$ obtained by using the true population frequencies, if the group number $k$ be reduced by the number of constants in the population law which are determined by fitting to the sample. The relation (37) is we believe novel. Both these results assume a more convenient form if we take $\psi = \chi^2$ and $\psi_1 = \chi_1^2$ as variables. We have, then, for the distributions of $\psi$ and $\psi_1$ the Type III curves
$$y = y_0 \psi^{\frac{k-3}{2}} e^{-\frac{1}{2}\psi} \tag{38}$$
$$y = y_0' \psi_1^{\frac{k-c-3}{2}} e^{-\frac{1}{2}\psi_1} \tag{39}$$
and these are the margins of the correlation surface
$$z = z_0 \psi_1^{\frac{k-c-3}{2}} (\psi - \psi_1)^{\frac{c-2}{2}} e^{-\frac{1}{2}\psi} \tag{40}$$
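The passage from (37) to (40) is a direct change of variables, which may be written out as follows. This is a worked step added for clarity and assumes only the forms of (37) and (40) with $\psi = \chi^2$, $\psi_1 = \chi_1^2$; the constant $\tfrac14 c_0'$ plays the part of $z_0$.

```latex
\begin{aligned}
\psi &= \chi^2,\quad \psi_1 = \chi_1^2
 \;\Longrightarrow\;
 d\chi\, d\chi_1 = \frac{d\psi\, d\psi_1}{4\sqrt{\psi\,\psi_1}},\\[4pt]
\phi(\chi,\chi_1)\, d\chi\, d\chi_1
 &= c_0'\, \psi_1^{\frac{k-c-2}{2}}\, \psi^{\frac12}\,
    (\psi-\psi_1)^{\frac{c-2}{2}}\, e^{-\frac12\psi}\,
    \frac{d\psi\, d\psi_1}{4\,\psi^{\frac12}\,\psi_1^{\frac12}}\\[4pt]
 &= \tfrac14 c_0'\, \psi_1^{\frac{k-c-3}{2}}\,
    (\psi-\psi_1)^{\frac{c-2}{2}}\, e^{-\frac12\psi}\, d\psi\, d\psi_1 .
\end{aligned}
```

Since $e^{-\frac12\psi} = e^{-\frac12\psi_1}\, e^{-\frac12(\psi-\psi_1)}$, the surface factorises into a function of $\psi_1$ alone multiplied by a function of $\psi - \psi_1$ alone, which is one way of seeing why the arrays of $\psi$ for given $\psi_1$ are again Type III curves.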

This surface (40) has the following properties:

(a) It is limited by the line $\psi = \psi_1$, i.e. $\psi \ge \psi_1$.

(b) For a given $\psi$ the array of $\psi_1$ is a Type I curve for which it is easy to show that
$$\text{Mean } \psi_1 = \psi (k - c - 1)/(k - 1),$$
or the regression of $\psi_1$ on $\psi$ is linear.

(c) For a given $\psi_1$ the array of $\psi$ is a Type III curve with start at the point $\psi = \psi_1$, and with mean at a distance $c$ and mode at a distance $c - 2$ from this point. The regression of $\psi$ on $\psi_1$ is therefore linear.

(d) Since $\sigma_\psi = \sqrt{2(k - 1)}$ and …

… $a_2$ and $a_3$ as linear functions of $n_1, \dots n_8$, and these values when substituted into the expressions for $m_1, \dots m_8$ (of which that for $m_1$ is given in the footnote†) result in the following eight relations expressing $m_s$ as linear functions of the observed sample group frequencies:
$$\begin{aligned}
m_1 &= 0.8939 n_1 + 0.2424 n_2 - 0.0606 n_3 - 0.1212 n_4 - 0.0454 n_5 + 0.0606 n_6 + 0.0909 n_7 - 0.0606 n_8,\\
m_2 &= 0.2424 n_1 + 0.3745 n_2 + 0.3290 n_3 + 0.1818 n_4 + 0.0087 n_5 - 0.1147 n_6 - 0.1126 n_7 + 0.0909 n_8,\\
m_3 &= -0.0606 n_1 + 0.3290 n_2 + 0.4177 n_3 + 0.3117 n_4 + 0.1168 n_5 - 0.0606 n_6 - 0.1147 n_7 + 0.0606 n_8,\\
m_4 &= -0.1212 n_1 + 0.1818 n_2 + 0.3117 n_3 + 0.3139 n_4 + 0.2338 n_5 + 0.1169 n_6 + 0.0087 n_7 - 0.0455 n_8
\end{aligned} \tag{42}$$

† For example, $m_1 = \int (a_0 + a_1 x + a_2 x^2 + a_3 x^3)\, dx$, a linear function of $a_0, a_1, a_2, a_3$ whose numerical coefficients are illegible in source.


The expressions for $m_5$, $m_6$, $m_7$ and $m_8$ are obtained by reversing the order of the multipliers in the expressions for $m_4$, $m_3$, $m_2$, $m_1$ respectively. For example,
$$m_7 = 0.0909 n_1 - 0.1126 n_2 - 0.1147 n_3 + 0.0087 n_4 + 0.1818 n_5 + 0.3290 n_6 + 0.3745 n_7 + 0.2424 n_8.$$
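The linear relations giving the fitted frequencies can be set out as an 8 × 8 array of multipliers, which makes the reversal rule and a simple arithmetical check easy to apply. The sketch below transcribes the printed multipliers for $m_1$ to $m_4$; the names `ROWS_1_TO_4`, `MATRIX` and `fitted_frequencies`, and the sample frequencies used, are illustrative assumptions and not part of the paper.

```python
# Multipliers of n_1..n_8 in the printed expressions for m_1..m_4.
ROWS_1_TO_4 = [
    [0.8939, 0.2424, -0.0606, -0.1212, -0.0454, 0.0606, 0.0909, -0.0606],
    [0.2424, 0.3745, 0.3290, 0.1818, 0.0087, -0.1147, -0.1126, 0.0909],
    [-0.0606, 0.3290, 0.4177, 0.3117, 0.1168, -0.0606, -0.1147, 0.0606],
    [-0.1212, 0.1818, 0.3117, 0.3139, 0.2338, 0.1169, 0.0087, -0.0455],
]

# m_5..m_8 use the multipliers of m_4..m_1 in reversed order.
MATRIX = ROWS_1_TO_4 + [row[::-1] for row in reversed(ROWS_1_TO_4)]

def fitted_frequencies(n):
    """Fitted group frequencies m_1..m_8 for observed frequencies n_1..n_8."""
    return [sum(c * n_s for c, n_s in zip(row, n)) for row in MATRIX]

row_sums = [sum(row) for row in MATRIX]

# Hypothetical symmetrical sample of N = 200.
sample = [10, 20, 30, 40, 40, 30, 20, 10]
fitted = fitted_frequencies(sample)
print(row_sums)
print(fitted, sum(fitted))
```

Each row of multipliers sums to unity within rounding, and the fitted frequencies add up to the sample size within the same accuracy, which gives a convenient check on any transcription of the coefficients.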

Using these expressions the fitted population frequencies, $m_1, \dots m_8$, could each be computed by a continuous operation on the calculating machine, for each of the 208 samples. And from these
$$\chi^2 = \sum_{s=1}^{8} (n_s - \tilde m_s)^2/\tilde m_s \quad \text{and} \quad \chi_1^2 = \sum_{s=1}^{8} (n_s - m_s)^2/m_s$$
were calculated. The resulting $(\chi^2, \chi_1^2)$ distribution is shown in Table I. Consider now, (a) the $\chi^2$ margin, (b) the $\chi_1^2$ margin, (c) the correlation and regression, and (d) the distribution in the arrays of constant $\chi_1^2$.

(a) The $\chi^2$ Margin. Since $k = 8$, the theoretical distribution of $\psi = \chi^2$ is $y = y_0 \psi^{5/2} e^{-\frac{1}{2}\psi}$, a Type III curve with Mean $\psi = k - 1 = 7.0000$ and standard error for 208 samples of 0.2594, …

… This result can, however, be obtained by a method which makes no appeal to the conceptions of maximum likelihood and minimum $\chi^2$. The set of all possible samples of $N$ from a fixed independence population with marginal ratios $a_s$, $b_t$ $(s = 1, 2, \dots p;\ t = 1, 2, \dots q)$ can be divided into a number of sub-sets for each of which the marginal totals of the sample have the same series of values. Consider the frequency distribution among the cells in the samples composing one of these sub-sets in which, let us say, the marginal totals are $\nu_{s\cdot}$, $\nu_{\cdot t}$ $(s = 1, 2, \dots p;\ t = 1, 2, \dots q)$. The frequency of occurrence of a particular combination of cell contents $n_{st}$ will be proportional to a term of the multinomial expansion, or as in (54) to
$$\frac{N!}{n_{11}!\, n_{12}! \dots n_{pq}!}\, a_1^{\nu_{1\cdot}} \dots a_p^{\nu_{p\cdot}}\, b_1^{\nu_{\cdot 1}} \dots b_q^{\nu_{\cdot q}},$$
or since the $\nu$'s are constant within the sub-set, to $N!/(n_{11}!\, n_{12}! \dots n_{pq}!)$. That is to say, the distribution of frequency and hence the distribution of $\chi_1^2 = N\phi^2$ within the sub-set does not depend upon the $a$'s and $b$'s of the sampled population, but only on the specially chosen marginal totals $\nu_{s\cdot}$ and $\nu_{\cdot t}$. Within this sub-set, therefore, defined by the $\nu$'s the distribution of cell contents in random sampling will be the same whatever be the independence distribution sampled; in fact we may suppose that its marginal ratios actually have the values $a_s = \nu_{s\cdot}/N$, $b_t = \nu_{\cdot t}/N$ given by the marginal totals of the sub-set. But if this were so, the distribution of $\chi_1$ within the sub-set becomes simply that of a $\chi$ obtained for samples of $N$ for which the frequencies are restricted by $p + q - 2$ linear relations,
$$n_{s1} + n_{s2} + \dots + n_{sq} = \nu_{s\cdot} \qquad (s = 1, 2, \dots p - 1)$$
$$n_{1t} + n_{2t} + \dots + n_{pt} = \nu_{\cdot t} \qquad (t = 1, 2, \dots q - 1)$$
We are now dealing no longer with envelopes but with hyperellipsoids in a space of reduced dimensions, and for this case it is known as proved originally in Biometrika, XI.* that the distribution of $\chi$ is given by
$$y = y_0 \chi^{pq - 2 - (p + q - 2)} e^{-\frac{1}{2}\chi^2} \tag{58}$$

* Phil. Trans. Vol. 222 A, p. 357; Jour. Roy. Stat. Soc. Vol. LXXXVII. p. 447.

This distribution for $\chi_1$ will hold within each of the sub-sets†; it will therefore hold for the aggregate of all possible samples seeing that it is independent of the $\nu$'s defining the sub-set. It follows that the distribution of $\phi^2 = \chi_1^2/N$ is that given in (56). The possibility of this method of solution appears to arise from the special form of the population locus representing independence distributions, and the method does not seem to be rigorously applicable in general.

It is perhaps of interest to conclude by referring to a third and entirely different way in which one of us approached the $\phi^2$ distribution a few years ago. Writing $n_{st} = m_{st} + \delta n_{st}$, $n_{s\cdot} = m_{s\cdot} + \delta n_{s\cdot}$, $n_{\cdot t} = m_{\cdot t} + \delta n_{\cdot t}$, it is found that

[Equation (59): expansion of $\chi_1^2$ in the cell and marginal variations $\delta n_{st}$, $\delta n_{s\cdot}$, $\delta n_{\cdot t}$; expression illegible in source.]

* K. Pearson, "On the Theories of Multiple and Partial Contingency," p. 145.

† There is of course an approximation involved at this point, because even if the sample be very large, in some of the sub-sets the frequencies $m_{st} = \nu_{s\cdot}\nu_{\cdot t}/N$ must still be too small to justify the assumptions leading to the $\chi^2$ distribution. The chance of a sample falling into one of these sub-sets is, however, so small as barely to affect the form of the aggregate distribution of all possible samples.


going only as far as second order terms. The mean value of the squares and products of the cell variations in (59) are well known, and it was easy to show on taking mean values that Mean $\chi_1^2 = (p - 1)(q - 1)$. Next taking the mean value of the square of (59) and evaluating at some length the 4th order products of form $(\delta n_{st}^2\, \delta n_{s't'}^2)$ etc., it was possible to show that to the same order of approximation [displayed results, including equation (60), illegible in source]. Multiplying together the expressions (59) and (60), taking mean values and evaluating the resulting 4th order products, it was found that
$$\text{Mean } (\chi^2 \chi_1^2) = p^2 q^2 + 2pq - p^2 q - p q^2 - p - q + 1,$$
and from these results the coefficient of correlation between $\chi^2$ and $\chi_1^2$ is found to be
$$r_{\chi^2 \chi_1^2} = \sqrt{\frac{(p - 1)(q - 1)}{pq - 1}}.$$
Remembering that in the present case $k = pq$ and $c = p + q - 2$, it will be seen that these values for the mean and standard deviation of $\chi^2$ and $\chi_1^2$ and for the correlation between the two correspond exactly to special cases of the general results given on p. 273 above. This is an interesting confirmation of the previous work and suggests that the order of approximation involved is in both cases the same.

APPENDIX. The Distribution of $\chi_1$.

In section (3) above the distribution of minimum $\chi_1$ given in equation (36) was deduced by the use of geometrical arguments in multiple space, applied to a position which had been simplified by certain assumptions. We shall first give in the following Lemma what may appear a more exact proof to replace the geometrical reasoning of pp. 271-273 above, and then establish more rigorously the manner in which the true distribution of $\chi_1$ tends to that which has been obtained from these simplifying assumptions.

Instead of the sample frequencies $n_1, n_2, \dots n_k$, take as variables the proportions $q_1, q_2, \dots q_k$ where $Nq_s = n_s$, and let $p_1, p_2, \dots p_k$ be the corresponding population group proportions, where $Np_s = m_s$. It follows that

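$$q_1 + q_2 + \dots + q_k = 1, \qquad p_1 + p_2 + \dots + p_k = 1.$$
Let $P_{\chi_1}$ be defined by
$$P_{\chi_1} = c_1 \int\!\!\int \dots \int e^{-\frac{1}{2} N \sum_{s=1}^{k} \frac{(p_s - q_s)^2}{p_s}}\, dq_1 \dots dq_{k-1} \tag{61}$$
where the region of integration is that inside the envelope $S$ of the hypersurfaces

[expression illegible in source]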



P, the envelope b?ing taken with regard to the c -

ft-,)

+ ••)

2

+

+ squared terms containing (a*-ft) 2 ,

(t=k-

+ product terms containing ( x t - f t ) ,

(t=k~2,

-

i f e f )

3, ... 2, 1) ft-

3,... 2, 1).

This expression must be positive at every point on the straight line defined by x ( = f t ( t = ft-3, ... 2, 1); Bt-i, k-i (xk-i" ft-i) +

i-j

- ft-2) =

0

except a t f t , ft, . . . f t - i where it vanishes. Hence t_i Z>V-2,1-2 ~ - ^ i - i , *-2 > 0 and 7 t - 2 , t - 2 * 0 ; the process of extracting positive squared terms may thus be continued to the next stage. It may be shown in the same way that it can be continued step by step until (73) is reached, and that always y ^ O .


I t follows from (74) that , = 0, for s=k-c, k-c + \,... 1. We need not, however, solve these equations for the f¥a, since clearly the result of the substitution of their solutions in (73) will be to give *TW}=X. i=l

2

(75),

i is denoted the expression for i in which we put /3, = 0, or „= S {7«^,}. (75) is «=i the transformed equation of the envelope, £'. k /p -a)'' We must now transform the function under the integral in (61). iV S - — — = x 2 represents S = 1 P» where by ^

in the original coordinate space a hyperellipsoid centred at pu p2, ...pk_l. After the transformation it becomes that member of the set (73) which has its centre at the origin, that is for which f3„=0, s = l , 2, ... k - 1. Hence (61) transforms into rr c Px l= 2jJ---J

r

,e

-ii _kS1 l{4>J}

dxidxi...dxtc-1

(76).

The proof of the Lemma now follows at once. Since t

(x, - V.) O i - ^ i ) } = X I 2

(89),

S («•,—®,)2, and the quantities a,t are functions of the it's, but do not depend on «=i

N a n d will tend to finite limits as ir,-—p,(s=1,

2,

...k).

W e must now determine the radii rx and r 2 of the hyperspheres An>+n». V(2TT)/ \S2 V(27T)/

and hence

(11)

We must now find the maximum value of C associated with a population belonging to o). This we may do by first noting that if ox = a 2 = a (say) and ,

(12)

where x0 and s0 are the mean and standard deviation obtained by combining the ny + n2 variables of the samples and S 2 . It now follows that the maximum chance occurs when a = x0, (T = s0

( 1

\n,+n, e-i

X (1 + g2)-ioo in (40) and using (41) we find that

an expression giving the moment coefficients of the distribution of $\lambda$ whose probability integral we have tabled in our earlier paper. In order to obtain some appreciation of the rapidity with which the distribution of $\lambda_H$ in the two sample problem tends to the rectangular form, we have calculated from (40) the frequency constants: mean $\lambda$, standard deviation of

110

E. S. PEARSON AND J. NEYMAN

$\lambda$, $\beta_1$ and $\beta_2$, for a variety of values of $n_1$ and $n_2$*. These are given in Table I. For the rectangular distribution these constants have values Mean = 0.5; Standard deviation = $1/\sqrt{12}$ = 0.288675; $\beta_1 = 0$; $\beta_2 = 1.8$. We have also calculated certain values of (43). All constants approach the values for the rectangular distribution as $n_1$ and $n_2$ are increased, but without further analysis it is not possible to say at what point it would be justifiable to use the equality $P_{\lambda_H} = \lambda_H$. We propose here only to set out briefly the steps in reasoning which have led us to the rough Tables II and III given below.

(a) As $\lambda_H$ must lie between 0 and 1 we assume that its distribution can be represented approximately by the law
$$f(\lambda) = \frac{\Gamma(p + q)}{\Gamma(p)\Gamma(q)} \lambda^{p-1} (1 - \lambda)^{q-1} \tag{44}$$

The rectangular distribution is a special case of this, arising when $p = q = 1$.

(b) The $p$ and $q$ of (44) may be expressed in terms of the mean, $m_1$, and the variance, $m_2$, of the distribution; namely
$$p = \frac{m_1}{m_2}\{m_1(1 - m_1) - m_2\}, \qquad q = \frac{1 - m_1}{m_2}\{m_1(1 - m_1) - m_2\} \tag{45}$$

(c) Substituting in (45) the true means and standard deviations of the distributions given in the third and fourth columns of Table I, that is putting $m_1 = \text{Mean } \lambda_H$, $m_2 = \mu_2' - (\text{Mean } \lambda_H)^2$, we have calculated $p$ and $q$ in each case.

(d) The curves represented by (44) have the following values for $\beta_1$ and $\beta_2$,
$$\beta_1 = \frac{4(p - q)^2 (p + q + 1)}{pq(p + q + 2)^2}, \qquad \beta_2 = \frac{3\beta_1(p + q + 2) + 6(p + q + 1)}{2(p + q + 3)} \tag{46}$$

These have been calculated using the $p$'s and $q$'s of (c), and the resulting values are given in the 7th and 8th columns of Table I.

(e) It will be seen that these latter values agree to the 4th decimal place very closely with the true values of $\beta_1$ and $\beta_2$ for the $\lambda_H$ distribution calculated from the moment coefficients of equations (40) and (43) and given in the 5th and 6th columns of the table. This suggests that the curves (44) may give a reasonable fit to the true $\lambda_H$ distribution, although a more exact confirmation is clearly required in the critical region near $\lambda_H = 0$. A partial check on this point is described below.

(f) Using these curves (44) we have computed for several different sizes of the samples the values of $\lambda_H$ for which
$$P_{\lambda_H} = \int_0^{\lambda_H} f(\lambda)\, d\lambda = 0.05 \text{ and } 0.01.$$

* Mean $\lambda = \mu_1'$, Standard deviation $= \sqrt{\mu_2' - \mu_1'^2}$, and if $\mu_2$, $\mu_3$ and $\mu_4$ are the second, the third and fourth moment-coefficients of $\lambda$ referred to the mean, $\beta_1 = \mu_3^2/\mu_2^3$, $\beta_2 = \mu_4/\mu_2^2$.

111

Problem of two samples

These are given in Tables II and III. The computation was rendered easy because in all cases the value of $q$ differed only slightly from unity, and therefore in the neighbourhood of $\lambda_H = 0$ we obtain approximately from (44)
$$P_{\lambda_H} = \frac{\Gamma(p + q)}{\Gamma(p)\Gamma(q)} \cdot \frac{\lambda_H^{\,p}}{p}.$$

TABLE I
Moment Coefficients of $\lambda_H$ Distribution

                True values from equations (40) and (43)     Values from equations (46)
  n1    n2      Mean λ     σ_λ       β1         β2              β1         β2
   5     5      0.4222    0.2986    0.0787     1.8257          0.0779     1.8252
   5    10      0.4403    0.2973    0.0451     1.8001          0.0450     1.8001
   5    20      0.4459    0.2969    0.0365     1.7942          0.0367     1.7943
   5    50      0.4477    0.2968    0.0339     1.7924          0.0342     1.7926
   5     ∞      0.4481    0.2968    0.0334     1.7920          0.0338     1.7922

  10    10      0.4634    0.2946    0.0166     1.7863          0.0165     1.7863
  10    20      0.4718    0.2935    0.0098     1.7855          0.0098     1.7855
  10    50      0.4749    0.2931    0.0077     1.7856          0.0077     1.7856
  10     ∞      0.4757    0.2930    0.0072     1.7857          0.0072     1.7857

  20    20      0.4823    0.2918    0.00383    1.7879          0.00382    1.7880
  20    50      0.4869    0.2911    0.00208    1.7901          0.00208    1.7901
  20     ∞      0.4882    0.2909    0.00168    1.7908          0.00169    1.7908

  50    50      0.4930    0.2900    0.00059    1.7940          0.00058    1.7941
  50     ∞      0.4954    0.2896    0.00026    1.7959          0.00026    1.7959
   ∞     ∞      0.5000    0.2887    0.00000    1.8000          0.0000     1.8000

(g) For the limiting cases $n_2$ (or $n_1$) $= \infty$ we may compare the values of $\lambda_H$ corresponding to $P_\lambda = 0.05$ and 0.01 computed in this approximate manner with those obtained from the tables in our earlier paper (Neyman & Pearson, 1928a, pp. 238-40). The latter were obtained by a quadrature of the density field lying inside the oval $\lambda$ contours in the $(m, s)$ field, i.e. by a completely independent and exact method. They are given in brackets in the marginal columns of Tables II and III, and will be seen to agree very closely with the results of our present approximation.

(h) As a further check on the accuracy of this method of approximation, the case $n_1 = n_2 = 5$ was taken and the integration of $F(t, \theta)$ outside certain of the $\lambda_H$ contours was carried out by using quadratures. It was found that, for $P_{\lambda_H} = 0.01$, $\lambda_H = 0.00193$ against the approximate value of 0.00186; for $P_{\lambda_H} = 0.05$, $\lambda_H = 0.0169$ against the approximate value of 0.0167. The former values are given in brackets in the cells corresponding to $n_1 = n_2 = 5$ of Tables II and III.


(k) As far as can be judged from the frequency constants the distribution of $\lambda$ for the limiting case $n_2 = \infty$ and also for the special case $n_1 = n_2 = 5$ are of the same general type as in the other cases. The agreement therefore, described in (g) and (h) above, between the true values of $\lambda$ found by quadrature and those found by using the curves (44), suggests that Tables II and III may be taken throughout as giving adequate approximations to the values of $\lambda_H$ corresponding to $P_\lambda = 0.05$ and 0.01.

TABLE II
Values of $\lambda_H$ giving $P_{\lambda_H} = 0.05$

  n1 \ n2        5                  10                 20                 50                 ∞
     5      0.0167 (0.0169)     0.0222             0.0241             0.0247             0.0248 (0.0247)
    10      0.0222              0.0312             0.0349             0.0364             0.0368 (0.0367)
    20      0.0241              0.0349             0.0401             0.0425             0.0432 (0.0431)
    50      0.0247              0.0364             0.0425             0.0459             0.0473 (0.0474)
     ∞      0.0248 (0.0247)     0.0368 (0.0367)    0.0432 (0.0431)    0.0473 (0.0474)    0.0500

TABLE

III

Values of AH giving PXfI = 0-01 5

10

20

50

•0019 (•0019)

•0029

•0033

•0034

•0034 (•0033)

10

•0029

•0048

•0058

•0061

•0062 (•0062)

20

•0033

•0058

•0071

•0078

•0080 (•0079)

50

•0034

•0061

•0078

•0088

•0092 (•0092)

00

•0034 (•0033)

•0062 (•0062)

•0080 (•0079)

•0092 (•0092)

•0100

5

00

IV. AN ILLUSTRATIVE EXAMPLE

In our previous paper (loc. cit. p. 202) we took as an example the variations in Cephalic Index (breadth to length ratio × 100) measured on each of two series of 10 human skulls, namely

Series 1: 74.1; 77.7; 74.4; 74.0; 73.8; 72.2; 75.2; 78.2; 77.1; 78.4
$n_1 = 10$; $\bar{x}_1 = 75.51$; $s_1 = 2.059$.

Series 2: 66.7; 69.4; 67.8; 73.2; 79.3; 80.7; 64.9; 82.2; 72.4; 78.1
$n_2 = 10$; $\bar{x}_2 = 73.47$; $s_2 = 5.942$.

We then inquired whether it was likely, as far as the cephalic index was concerned, that these two sets of skulls could be random samples from a population in which the mean cephalic index was 75.06 and the standard deviation 2.68, and concluded that it was very probable in the first case ($P_\lambda = 0.504$), but highly improbable in the second ($P_\lambda < 0.0001$). We were then testing two "simple" hypotheses. We may now proceed to test the "composite" hypothesis that these two samples have come from the same population, without specifying what that population may be, except that it will be assumed that the distribution of cephalic index does not differ so much from normality as to invalidate the test.* On combining the two samples it is found that $s_0 = 4.562$, and hence $\lambda_H = (s_1 s_2 / s_0^2)^{10} = 0.00492$. Table III indicates that this value corresponds very closely to $P_{\lambda_H} = 0.01$; that is to say, only once in a hundred times should we expect our criterion to have as low or a lower value were the hypothesis tested true. We should therefore conclude that it was very unlikely that the two series of skulls came from the same population. The values of $t$ and $\theta$ for this pair of samples are

$t = 0.973$, $\theta = 8.328$.
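The arithmetic of this example can be retraced with a short sketch. It assumes that the standard deviations are taken with divisor $n$ and that $\theta$ is the ratio of the two sample variances; both are assumptions made here because they reproduce the quoted figures, and the function names are illustrative only.

```python
from math import sqrt

series_1 = [74.1, 77.7, 74.4, 74.0, 73.8, 72.2, 75.2, 78.2, 77.1, 78.4]
series_2 = [66.7, 69.4, 67.8, 73.2, 79.3, 80.7, 64.9, 82.2, 72.4, 78.1]


def mean(values):
    return sum(values) / len(values)


def std_dev(values):
    """Standard deviation with divisor n (assumed convention)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


def two_sample_criterion(a, b):
    """lambda_H = (s1*s2/s0^2)^n for two samples of equal size n."""
    if len(a) != len(b):
        raise ValueError("this sketch covers equal sample sizes only")
    n = len(a)
    s1, s2 = std_dev(a), std_dev(b)
    s0 = std_dev(a + b)          # standard deviation of the combined sample
    lam_h = (s1 * s2 / s0 ** 2) ** n
    theta = (s2 / s1) ** 2       # assumed definition: ratio of the variances
    return s1, s2, s0, lam_h, theta


s1, s2, s0, lam_h, theta = two_sample_criterion(series_1, series_2)
print(f"s1={s1:.3f} s2={s2:.3f} s0={s0:.3f} lambda_H={lam_h:.5f} theta={theta:.3f}")
```

The value of $\theta$ obtained from unrounded standard deviations differs slightly in the third decimal from the one obtained from the rounded $s_1$ and $s_2$.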

The point $R(t, \theta)$ representing the samples has been plotted in the accompanying diagram, and also a curve representing (though not drawn to scale) the member of the family of contours (18) for which $\lambda_H = 0.00492$. Then $P_{\lambda_H}$ is the integral of the function $F(t, \theta)$ of (33) and (34) taken throughout the region of the field lying outside this contour. Let us also consider the hypotheses $H_1$ and $H_2$. Equation (23) gives $\lambda_{H_1} = 0.00822$; this is constant for sample points lying not only on the line $R'NR$ through $R$, parallel to the axis of $t$, but also on a second parallel $Q'MQ$. The latter corresponds to certain pairs of samples in which $s_1$ is greater than $s_2$, and $\theta = 0.120$. $P_{\lambda_{H_1}}$ is the integral of $F(t, \theta)$ over (a) the region for which

* The sensitiveness of the test to deviations from normality will require further consideration.


$\theta \geq 8.328$ and (b) the region for which $\theta \leq 0.120$, that is to say over the two shaded areas of the diagram. It is found that $P_{\lambda_{H_1}} = 0.0042$*. We should therefore conclude that it was extremely unlikely that the two samples came from populations with a common standard deviation, and as the test of hypothesis $H_2$ is based on the assumption that the populations sampled have a common standard deviation, we should hardly go on to examine this. If however we had some a priori grounds for believing that $\sigma_1 = \sigma_2$, and that therefore the observed difference between $s_1$ and $s_2$ was just a very abnormal chance fluctuation, then we might

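[Diagram: the $(t, \theta)$ field, showing the sample point $R$, the contour $\lambda_H = 0.00492$, the parallels $R'NR$ and $Q'MQ$, and the two shaded areas.]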

l-%, • • •>

^ 1 1 0 0 8 ^ ,

k-2

i}

x ,

=

k

(r

~

1

x

k

~

2

n i=2

c

o

s

«

1

-

(

3

9

)

h a v e

( /c =

N (s?)ioo,

(t =

1,2,...,k)

the $\mu'_p$ tend

(i) for $\lambda_H$, to $(p+1)^{-(k-1)}$, (75)

(ii) for $\lambda_{H_1}$, to $(p+1)^{-\frac{1}{2}(k-1)}$, (76)

(iii) for $\lambda_{H_2}$, to $(p+1)^{-\frac{1}{2}(k-1)}$. (77)

We conclude that the frequency distributions of the $\lambda$'s tend at the same time to limiting forms having for their $p$th moment coefficients these values; it follows that the limiting forms of the frequency functions of $\lambda_{H_1}$ and $\lambda_{H_2}$ are identical. But the function for which

$\mu'_p = (p+1)^{-m}$ (78)

... Further, upon making the transformation

$\lambda = e^{-\frac{1}{2}\chi^2}$ (80)

we obtain

... (81)
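The link between the moments $(p+1)^{-m}$ and the $\chi^2$ integral can be checked numerically. The sketch below assumes, as a standard result rather than a reading of this copy, that the density on $(0, 1)$ with these moments is $(-\log\lambda)^{m-1}/\Gamma(m)$, so that $-2\log\lambda$ follows the $\chi^2$ law with $2m$ degrees of freedom. The function names and the integration settings are illustrative.

```python
from math import exp, factorial, gamma, log


def moment(p, m, upper=60.0, steps=60000):
    """p-th moment of lambda under the assumed limiting density.

    Uses the substitution u = -log(lambda) and the midpoint rule on (0, upper).
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += u ** (m - 1) * exp(-(p + 1) * u)
    return total * h / gamma(m)


def lower_tail(lam, m):
    """Chance of a value of lambda not exceeding lam, for integer m.

    This is the upper tail of chi-square with 2m degrees of freedom,
    evaluated at chi2 = -2*log(lam), written as a finite sum.
    """
    half_chi2 = -log(lam)
    return lam * sum(half_chi2 ** j / factorial(j) for j in range(m))


for p, m in [(1, 1), (2, 1), (1, 2), (3, 2)]:
    print(p, m, moment(p, m), (p + 1) ** (-m))
```

With $m = 1$ the lower tail equals $\lambda$ itself, which is the rectangular form and agrees with the entries .0500 and .0100 in the corner cells of Tables II and III.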


It follows that when dealing with very large samples we may expect to obtain approximations to $P_\lambda$ from the tables of the $\chi^2$ integral, or the Incomplete Gamma Function. Using Elderton's Tables we should enter them with $n' = 2k - 1$ when the tested hypothesis is $H$ and with $n' = k$ when dealing with the hypotheses $H_1$, $H_2$.*

VI. APPROXIMATIONS TO $P_\lambda$ IN THE CASE OF SMALL SAMPLES

Various methods of calculating the probability integral of $\lambda_{H_2}$ are available based upon the distribution (23). For instance the z-transformation of R. A. Fisher may be performed, namely

(82)

and his tables of 5% and 1% probability limits used (Fisher, 1930). Other methods of procedure have been discussed by one of us in a recent paper (Pearson, 1929). It will be noted that in this case there are only two variable quantities, $N$ and $k$. When dealing with $\lambda_H$ and $\lambda_{H_1}$ the general problem is more complex as the moments, and hence the respective frequency functions, depend upon $n_1, n_2, \ldots, n_k$ as well as upon $k$. Many of the more recent applications of statistical analysis have however been made to a class of problems in which the size of the samples can be controlled to a large extent and in fact made equal, so that $n_1 = n_2 = \ldots = n_k = n$. Such experiments are planned for a definite purpose; this is the case in the cement test described at the beginning of this paper; it is true for many other tests as have been described, for example, in the publications of the Bell Telephone Laboratories of New York and in the new book of R. Becker et al. (1927). It will also occur in biological experiments and in agricultural plot trials. That is to say, a very useful purpose would be served were it possible to provide, for a selection of values of $n$ and $k$, the 5% and 1% limits† for $\lambda_H$ and $\lambda_{H_1}$. The problem requires fuller investigation than has at present been possible; in the first place it is clear that some transformation of $\lambda$ is required. For the case $k = 2$ previously discussed, the distribution of $\lambda_H$ tends to the so-called Rectangular Form in which all values of $\lambda$ between 0 and 1 are equally likely to occur. But as $k$ increases the frequency becomes more and more compressed towards the limit $\lambda_H = 0$§. Having regard to the limiting forms, it is possible that a logarithmic transformation is desirable. An alternative method has however been explored which seems likely to be of value if $n$ be not too large. This will now be discussed.

* [Later note: Elderton's table for the probability integral of $\chi^2$ was published in Tables for Statisticians and Biometricians, 1; in his notation, $n' - 1$ corresponds to the degrees of freedom. E.S.P.]
† Namely the limits below which values of $\lambda$ would be found to lie in only 5% and 1% of sets of $k$ samples, if the corresponding hypotheses tested were true.
§ For example in the case $k = 20$, $n = 10$ it was found from equations (56) that Mean $(\lambda_H) = 0.00000070$, $\sigma = 0.00001518$, $\beta_1 = 56815$, $\beta_2 = 206127$!

It is seen from (21) and (23) that $L = \ldots$ is distributed exactly according to the law

$P\{L\}$ = constant × $L$ ...

If $H_0$ is a composite hypothesis, denote by $A_\Omega(\omega)$ the sub-set of $A_\Omega$ corresponding to the set $\omega$ of simple hypotheses belonging to $H_0$, and by $p_\omega(x_1, x_2, \ldots, x_n)$ the upper bound of $A_\Omega(\omega)$. The likelihood of the composite hypothesis is then

$\lambda = \dfrac{p_\omega(x_1, x_2, \ldots, x_n)}{p_\Omega(x_1, x_2, \ldots, x_n)}$. (12)

In most cases met with in practice, the elementary probabilities corresponding to different simple hypotheses of the set $\Omega$ are continuous and differentiable functions

$p(\alpha_1, \alpha_2, \ldots, \alpha_c, \alpha_{c+1}, \ldots, \alpha_k;\ x_1, x_2, \ldots, x_n)$, (13)

of a certain number, $k$, of parameters $\alpha_1, \alpha_2, \ldots, \alpha_c, \alpha_{c+1}, \ldots, \alpha_k$; and each simple hypothesis specifies the values of these parameters. Under these conditions the upper bound, $p_\Omega(x_1, x_2, \ldots, x_n)$, is often a maximum of (13) (for fixed values of the $x$'s) with regard to all possible systems of the $\alpha$'s. If $H_0$ is a composite hypothesis with $c$ degrees of freedom, it specifies the values of $k - c$ parameters, say $\alpha_{c+1}, \alpha_{c+2}, \ldots, \alpha_k$, and leaves the others unspecified. Then $p_\omega(x_1, x_2, \ldots, x_n)$ is often a maximum of (13) (for fixed values of the $x$'s and of $\alpha_{c+1}, \alpha_{c+2}, \ldots, \alpha_k$) with regard to all possible values of $\alpha_1, \alpha_2, \ldots, \alpha_c$. The use of the principle of likelihood in testing hypotheses consists in accepting for critical regions those determined by the inequality

$\lambda < C = \text{const.}$ (14)

Let us now for a moment consider the form in which judgements are made in practical experience. We may accept or we may reject a hypothesis with varying degrees of confidence; or we may decide to remain in doubt. But whatever conclusion is reached the following position must be recognized. If we reject $H_0$, we may reject it when it is true; if we accept $H_0$, we may be accepting it when it is false, that is to say, when really some alternative $H_t$ is true. These two sources of error can rarely be eliminated completely; in some cases it will be more important to avoid the first, in others the second.
We are reminded of the old problem considered by Laplace of the number of votes in a court of judges that should be needed to convict a prisoner. Is it more serious to convict an innocent man or to acquit a guilty? That will depend upon the consequences of the error; is the punishment death or fine; what is the danger to the community of released criminals; what are the current ethical views on punishment? From the point of view of mathematical theory all that we can do is to show how the risk of the errors may be controlled and minimized. The use of these statistical tools in any given case, in determining just how the balance should be struck, must be left to the investigator.
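As a hedged illustration of the likelihood (12) of a composite hypothesis, the sketch below takes a case chosen for this example only: a sample assumed normal, with $H_0$ specifying the mean $a_0$ and leaving $\sigma$ unspecified. The two upper bounds are then attained at the usual maximising values of the parameters. The observations, the constant 0.1 and the function names are invented for the illustration.

```python
from math import exp, log, pi


def max_log_likelihood(xs, centre):
    """Log of the normal likelihood maximised over sigma, mean fixed at `centre`."""
    n = len(xs)
    var = sum((x - centre) ** 2 for x in xs) / n   # maximising value of sigma^2
    return -0.5 * n * (log(2.0 * pi * var) + 1.0)


def likelihood_ratio(xs, a0):
    """lambda = p_omega / p_Omega for H0: mean = a0, sigma unspecified."""
    xbar = sum(xs) / len(xs)
    log_p_omega = max_log_likelihood(xs, a0)     # upper bound over the sub-set omega
    log_p_Omega = max_log_likelihood(xs, xbar)   # upper bound over the whole set Omega
    return exp(log_p_omega - log_p_Omega)


sample = [4.1, 5.3, 6.0, 4.7, 5.6, 5.1]   # hypothetical observations
lam = likelihood_ratio(sample, a0=4.0)
in_critical_region = lam < 0.1            # region lambda < C with an arbitrary C
print(lam, in_critical_region)
```

In this case the ratio reduces to $(s^2/s_0^2)^{n/2}$, where $s^2$ is the mean square about the sample mean and $s_0^2$ the mean square about $a_0$, so that small values of $\lambda$ correspond to sample means far from $a_0$ relative to the scatter.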

Problem of most efficient tests of statistical

hypotheses


The principle upon which the choice of the critical region is determined so that the two sources of errors may be controlled is of first importance. Suppose for simplicity that the sample space is of two dimensions, so that the sample points lie on a plane. Suppose further that besides the hypothesis $H_0$ to be tested, there are only two alternatives $H_1$ and $H_2$. The situation is illustrated in fig. 1, where the cluster of spots round the point $O$, of circles round $A_1$, and of crosses round $A_2$ may be taken to represent the probability or density field appropriate to the three hypotheses. The spots, circles and crosses in the figure suggest diagrammatically the behaviour of the functions $p_i(x_1, x_2)$, $i = 0, 1, 2$, in the sense that the number of spots included in any region $w$ is proportional to the integral of $p_0(x_1, x_2)$ taken over this region, etc. Looking at the diagram we see that, if the process of sampling was repeated many times, then, were the hypothesis $H_0$ true, most sample points would lie somewhere near the point $O$. On the contrary, if $H_1$ or $H_2$ were true, the sample points would be close to $O$ in comparatively rare cases only.

FIG. 1.
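The part played by the integrals of $p_i(x_1, x_2)$ over a region can be sketched numerically. The example is a toy model under stated assumptions, none of which is taken from the figure: each hypothesis is supposed to give two independent normal coordinates of unit variance, with centre $O = (0, 0)$ for $H_0$ and a hypothetical centre $(2, 0)$ for $H_1$, and the region considered is the half-plane $x_1 > c$.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # unit normal law for each coordinate (toy assumption)


def prob_half_plane(centre_x1, c):
    """Chance that a sample point falls in the region x1 > c.

    Only the first coordinate matters, since the region does not involve x2.
    """
    return 1.0 - STD_NORMAL.cdf(c - centre_x1)


def boundary_for_size(eps, centre_x1=0.0):
    """Boundary c for which the chance of x1 > c under H0 equals eps."""
    return centre_x1 + STD_NORMAL.inv_cdf(1.0 - eps)


eps = 0.05
c = boundary_for_size(eps)
size = prob_half_plane(0.0, c)    # chance of rejecting H0 when it is true
power = prob_half_plane(2.0, c)   # chance of rejecting H0 when H1 is true
print(c, size, power)
```

Moving the boundary changes both chances together, which is why the first source of error is fixed first and regions of equal size are then compared on the second.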

In trying to choose a proper critical region, we notice at once that it is very easy to control errors of the first kind referred to above. In fact, the chance of rejecting the hypothesis $H_0$ when it is true may be reduced to as low a level as we please. For if $w$ is any region in the sample space, which we intend to use as a critical region, the chance, $P_0(w)$, of rejecting the hypothesis $H_0$ when it is true, is merely the chance determined by $H_0$ of having a sample point inside of $w$, and is equal either to the sum (when the sample points do not form a continuous region) or to the integral (when they do) of $p_0(x_1, x_2)$ taken over the region $w$. It may be easily made $\leq \epsilon$, by choosing $w$ sufficiently small. Four possible regions are suggested on the figure: (1) $w_1$; (2) $w_2$, i.e. the region to the right of the line $BC$; (3) $w_3$, the region within the circle centred at $A_2$; (4) $w_4$, the region between the straight lines $OD$, $OE$. If the integrals of $p_0(x_1, x_2)$ over these regions, or the numbers of spots included


in them, are equal, we know that they are all of equal value in regard to the first source of error; for as far as our judgment on the truth or falsehood of $H_0$ is concerned, if an error cannot be avoided it does not matter on which sample we make it.* It is the frequency of these errors that matters, and this, for errors of the first kind, is equal in all four cases. It is when we turn to consider the second source of error (that of accepting $H_0$ when it is false) that we see the importance of distinguishing between different critical regions. If $H_1$ were the only admissible alternative to $H_0$, it is evident that we should choose from $w_1$, $w_2$, $w_3$ and $w_4$ that region in which the chance of a sample point falling, if $H_1$ were true, is greatest; that is to say the region in the diagram containing the greatest number of the small circles forming the cluster round $A_1$. This would be the region $w_2$, because for example,

$P_1(w_2) > P_1(w_1)$

or

$P_1(W - w_2)$ ... $a_0$ as in fig. 4. We deal first with the class of alternatives for which $a > a_0$. If $\epsilon = 0.05$; $x_0 = a_0 + 1.6449$ ... $a_0$. Take $w_{00}$ as the square $GEA_3F$ with length of side $= b_0\sqrt{\epsilon}$ lying in the upper left-hand corner of $W_0$. (b) $a_1 < a_0$. Take a similar square with corner at $A_1$.

FIG. 7.

In both cases the whole space outside $W_0$ must be added to make up the critical region $w_0$. In the general case of samples of $n$, the region $w_{00}$ will be a hypercube with length of side $b_0\sqrt[n]{\epsilon}$ fitting into one or other of the two corners of the hypercube of $W_0$ which lie on the axis $x_1 = x_2 = \ldots = x_n$. The whole of the space outside $W_0$ within which sample points can fall will also be added to $w_{00}$ to make up $w_0$.*

Example (5). Suppose that the set of alternatives consists of distributions of form (63), for all of which $a = a_0$, but $b$ may vary. $H_0$ is the hypothesis that $b = b_0$. The sample spaces, $W_t$, are now hypercubes of varying size all centred at the point $(x_1 = x_2 = \ldots = x_n = a_0)$. A little consideration suggests that we should make the critical region $w_0$ consist of:
(1) The whole space outside the hypercube $W_0$.
(2) The region $w_{00}$ inside a hypercube with centre at $(x_1 = x_2 = \ldots = x_n = a_0)$,

* If the set is limited to distributions for which $b = b_0$, no sample point can lie outside the envelope of hypercubes whose centres lie on the axis $x_1 = x_2 = \ldots = x_n$.


sides parallel to the co-ordinate axes and of volume $\epsilon b_0^n$. This region $w_{00}$ is chosen because it will lie completely within the sample space $W_{0t}$ common to $H_0$ and $H_t$ for a larger number of the set of alternatives than any other region of equal content.

Example (6). $H_0$ is the hypothesis that $a = a_0$, $b = b_0$, and the set of admissible alternatives is given by (63) in which both $a$ and $b$ are now unspecified. Both the mid-point $(x_1 = x_2 = \ldots = x_n = a_t)$ and the length of side, $b_t$, of the alternative sample spaces $W_t$ can therefore vary. Clearly we shall again include in $w_0$ the whole space outside $W_0$, but there can be no common region $w_{00}$ within $W_0$.
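Returning for a moment to Example (5), the critical region described there lends itself to a direct calculation. The sketch below assumes that (63) is the rectangular law with centre $a$ and range $b$ (constant density $1/b$ over an interval of length $b$), so that chances are ratios of volumes. The function names and numerical values are illustrative only.

```python
def inner_side(b0, eps, n):
    """Side of the central hypercube w00 of volume eps * b0**n."""
    return b0 * eps ** (1.0 / n)


def chance_of_rejection(b_t, b0, eps, n):
    """Chance that a sample of n falls in w0 when the range is b_t.

    w0 is the space outside W0 together with the central hypercube w00;
    the centre a0 is common to H0 and the alternative.
    """
    side = inner_side(b0, eps, n)
    outside_w0 = max(0.0, 1.0 - (b0 / b_t) ** n)   # non-zero only when b_t > b0
    inside_w00 = min(1.0, (side / b_t) ** n)       # w00 may contain the whole of W_t
    return outside_w0 + inside_w00


for b_t in (0.2, 0.5, 1.0, 2.0):
    print(b_t, chance_of_rejection(b_t, b0=1.0, eps=0.05, n=3))
```

When $b_t = b_0$ the chance is $\epsilon$, as required of a region of that size; for alternatives with a range smaller than the side of $w_{00}$ every sample falls inside the critical region.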

FIG. 8.

Fig. 8(a) represents the position for $n = 2$. Four squares $W_1$, $W_2$, $W_3$, and $W_4$ correspond to the sample spaces of possible alternatives $H_1$, $H_2$, $H_3$, and $H_4$, and the smaller shaded squares $w_1$, $w_2$, $w_3$, and $w_4$ represent possible critical regions for $H_0$ with regard to these. What compromise shall we make in choosing a critical region with regard to the whole set $\Omega$? As we have shown elsewhere* the method of likelihood fixes for the critical region that part of the space that represents samples for which the range (the difference between extreme variates) is less than a given value, say $l < l_0$. For samples of 2, $l = x_1 - x_2$ if $x_1 > x_2$, and $x_2 - x_1$ if $x_1 < x_2$, and the critical region $w_{00}$ will therefore lie between two straight lines parallel to and equidistant from the axis $x_1 = x_2$. A pair of such lines will be the envelope of the small squares $w_1$, $w_2$, etc., of fig. 8(a). In fact, the complete critical region will be as shown in fig. 8(b), the belt $w_{00}$ being chosen so that its area is $\epsilon b_0^2$. For $n = 3$ the surface $l = l_0$ is a prism of hexagonal cross-section, whose generating lines are parallel to the axis $x_1 = x_2 = x_3$. The space $w_{00}$, within this and the

* Biometrika, vol. 20A, p. 208 (1928). Section on 'Samples from a Rectangular Population'.


whole space outside the cube $W_0$ will form the critical region $w_0$. In general for samples of $n$ the critical region of the likelihood method will consist of the space outside the hypercube $W_0$, and the space of content $\epsilon b_0^n$ within the envelope of hypercubes having centres on the axis $x_1 = x_2 = \ldots = x_n$ and edges parallel to the axes of co-ordinates.

It will have been noted that a correspondence exists between the hypotheses tested in examples (1) and (4), (2) and (5), (3) and (6), and between the resulting critical regions. Consider for instance the position for $n = 3$ in example (3); the boundary of the critical region may be obtained by rotating fig. 6 in 3-dimensioned space about the axis of means. The region of acceptance of $H_0$ is then bounded by a surface analogous to an anchor ring surrounding the axis $x_1 = x_2 = x_3$, traced out by the rotation of the dotted curve $\lambda$ = constant. Its counterpart in example (6) is the region inside a cube from which the hexagonal sectioned prism $w_{00}$ surrounding the diagonal has been removed. A similar correspondence may be traced in the case of sampling from a distribution following the exponential law. It continues to hold in the higher dimensioned spaces with $n > 3$.

The difference between the normal, rectangular and exponential laws is, of course, very great, but the question of what may be termed the stability in form of best critical regions for smaller changes in the frequency law is of considerable practical importance.

IV. COMPOSITE HYPOTHESES

(a) Introductory

In the present investigation we shall suppose that the set $\Omega$ of admissible hypotheses defines the functional form of the probability law for a given sample, namely

$p(x_1, x_2, \ldots, x_n)$, (66)

but that this law is dependent upon the values of $c + d$ parameters

$\alpha^{(1)}, \alpha^{(2)}, \ldots, \alpha^{(c)};\ \alpha^{(c+1)}, \ldots, \alpha^{(c+d)}$; ...

This composite hypothesis consists of a sub-set $\omega$ (of the set $\Omega$) of simple hypotheses. We shall denote the probability law for $H'_0$ by

$p_0 = p_0(x_1, x_2, \ldots, x_n)$, (69)

associating with (69) in any given case the series (68). An alternative simple


hypothesis which is definitely specified will be written as $H_t$, and with this will be associated

(1) a probability law

$p_t = p_t(x_1, x_2, \ldots, x_n)$, (70)

(2) a series of parameters

$\alpha_t^{(1)}, \alpha_t^{(2)}, \ldots, \alpha_t^{(c+d)}$. (71)

We shall suppose that there is a common sample space $W$ for any admissible hypothesis $H_t$, although its probability law $p_t$ may be zero in some parts of $W$. As when dealing with simple hypotheses we must now determine a family of critical regions in the sample space, $W$, having regard to the two sources of error in judgment. In the first place it is evident that a necessary condition for a critical region, $w$, suitable for testing $H'_0$ is that

$P_0(w) = \int\!\!\int\cdots\int_w p_0(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \ldots dx_n = \text{constant} = \epsilon$ (72)

for every simple hypothesis of the sub-set $\omega$. That is to say, it is necessary for $P_0(w)$ to be independent of the values of $\alpha^{(1)}, \alpha^{(2)}, \ldots, \alpha^{(c)}$. If this condition is satisfied we shall speak of $w$ as a region of "size" $\epsilon$, similar to $W$ with regard to the $c$ parameters $\alpha^{(1)}, \alpha^{(2)}, \ldots, \alpha^{(c)}$.

Our first problem is to express the condition for similarity in analytical form. Afterwards it will be necessary to pick out from the regions satisfying this condition that one which reduces to a minimum the chance of accepting $H'_0$ when a simple alternative hypothesis $H_t$ is true. If this region is the same for all the alternatives $H_t$ of the set $\Omega$, then we shall have a common best critical region for $H'_0$ with regard to the whole set of alternatives.

The fundamental position from which we start should be noted at this point. It is assumed that the only possible critical regions that can be used are similar regions; that is to say regions such that $P(w) = \epsilon$ for every simple hypothesis of the sub-set $\omega$. It is clear that were it possible to assign differing measures of a priori probability to these simple hypotheses, a principle might be laid down for determining critical regions, $w$, for which $P(w)$ would vary from one simple hypothesis to another. But it would seem hardly possible to put such a test into working form. We have, in fact, no hesitation in preferring to retain the simple conception of control of the first source of error (rejection of $H'_0$ when it is true) by the choice of $\epsilon$, which follows from the use of similar regions. This course seems necessary as a matter of practical policy, apart from any theoretical objections to the introduction of measures of a priori probability.

(b) Similar Regions for Case in which $H'_0$ has One Degree of Freedom

We shall commence with this simple case for which the series (68) becomes

$\alpha^{(1)}$; ... (73)

We have been able to solve the problem of similar regions only under very limiting conditions concerning $p_0$.
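These are as follows:

(a) $p_0$ is indefinitely differentiable with regard to $\alpha^{(1)}$ for all values of $\alpha^{(1)}$ and in every point of $W$, except perhaps in points forming a set of measure zero. That is to say, we suppose that $\partial^k p_0 / \partial(\alpha^{(1)})^k$ exists for every $k$ ... Writing

$\phi = \dfrac{\partial \log p_0}{\partial \alpha^{(1)}} = \dfrac{1}{p_0}\dfrac{\partial p_0}{\partial \alpha^{(1)}}$, $\quad \phi' = \dfrac{\partial \phi}{\partial \alpha^{(1)}}$, (74)

(b) the function $p_0$ satisfies the equation

$\phi' = A + B\phi$, (75)

where the coefficients $A$ and $B$ are functions of $\alpha^{(1)}$ but are independent of $x_1, x_2, \ldots, x_n$.

This last condition could be somewhat generalized by adding the term $C\phi^2$ to the right-hand side of (75), but this would introduce some complication and we have not found any practical case in which $p_0$ does satisfy (75) in this more general form and does not in the simple form. We have, however, met instances in which neither of the two forms of the condition (b) is satisfied by $p_0$. If the probability law $p_0$ satisfies the two conditions (a) and (b), then it follows that a necessary and sufficient condition for $w$ to be similar to $W$ with regard to $\alpha^{(1)}$ is that

$\dfrac{\partial^k P_0(w)}{\partial(\alpha^{(1)})^k} = \int\!\!\int\cdots\int_w \dfrac{\partial^k p_0}{\partial(\alpha^{(1)})^k}\, dx_1\, dx_2 \ldots dx_n = 0 \quad (k = 1, 2, \ldots)$. (76)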

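Taking in (76) $k = 1$ and 2 and writing $\partial p_0/\partial\alpha^{(1)} = p_0\phi$ and $\partial^2 p_0/\partial(\alpha^{(1)})^2 = p_0(\phi^2 + \phi')$, it will be found that

$\dfrac{\partial P_0(w)}{\partial \alpha^{(1)}} = \int\!\!\int\cdots\int_w p_0\,\phi\, dx_1\, dx_2 \ldots dx_n = 0$, (79)

$\dfrac{\partial^2 P_0(w)}{\partial(\alpha^{(1)})^2} = \int\!\!\int\cdots\int_w p_0(\phi^2 + \phi')\, dx_1\, dx_2 \ldots dx_n = 0$. (80)

Using (75) we may transform this last equation into the following

$\dfrac{\partial^2 P_0(w)}{\partial(\alpha^{(1)})^2} = \int\!\!\int\cdots\int_w p_0(\phi^2 + A + B\phi)\, dx_1\, dx_2 \ldots dx_n = 0$. (81)

Having regard to (72) and (79) it follows from (81) that

$\int\!\!\int\cdots\int_w p_0\,\phi^2\, dx_1\, dx_2 \ldots dx_n = -A\epsilon = \epsilon\,\psi_2(\alpha^{(1)})$ (say). (82)

The condition (76) for $k = 3$ may now be obtained by differentiating (81). We shall have

$\dfrac{\partial^3 P_0(w)}{\partial(\alpha^{(1)})^3} = \int\!\!\int\cdots\int_w p_0\{\phi^3 + 3B\phi^2 + (3A + B^2 + B')\phi + A' + AB\}\, dx_1 \ldots dx_n = 0$, (83)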


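which, owing to (72), (79) and (82) is equivalent to the condition

$\int\!\!\int\cdots\int_w p_0\,\phi^3\, dx_1 \ldots dx_n = (3AB - A' - AB)\,\epsilon = \epsilon\,\psi_3(\alpha^{(1)})$. (84)

Similarly, in general,

$\int\!\!\int\cdots\int_w p_0\,\phi^k\, dx_1 \ldots dx_n = \epsilon\,\psi_k(\alpha^{(1)}) \quad (k = 1, 2, \ldots)$, (85)

where $\psi_k(\alpha^{(1)})$ is a function of $\alpha^{(1)}$ but independent of the sample $x$'s, since the quantities $A$, $B$ and their derivatives with regard to $\alpha^{(1)}$ are independent of the $x$'s. $\psi_k(\alpha^{(1)})$ is also independent of the region $w$, and it follows that whatever be $w$, and its size $\epsilon$, if it be similar to $W$ with regard to $\alpha^{(1)}$, the equation (85) must hold true for every value of $k$, i.e., 1, 2, .... Since the complete sample space $W$ is clearly similar to $W$ and of size unity, it follows that

$\dfrac{1}{\epsilon}\int\!\!\int\cdots\int_w p_0\,\phi^k\, dx_1\, dx_2 \ldots dx_n = \int\!\!\int\cdots\int_W p_0\,\phi^k\, dx_1\, dx_2 \ldots dx_n \quad (k = 1, 2, 3, \ldots)$. (86)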

Now $p_0(x_1, x_2, \ldots, x_n)$ is a probability law of $n$ variates $x_1, x_2, \ldots, x_n$, defined in the region $W$; similarly $\frac{1}{\epsilon}\,p_0(x_1, x_2, \ldots, x_n)$ may be considered as a probability law for the same variates under the condition that their variation is limited to the region $w$. We may regard $\phi$ as a dependent variate which is a known function of the $n$ independent variates $x_i$. The integral on the right-hand side of (86) is the $k$th moment coefficient of this variate $\phi$ obtained on the assumption that the variation in the sample point $x_1, x_2, \ldots, x_n$ is limited to the region $W$, while the integral on the left-hand side is the same moment coefficient obtained for variation of the sample point, limited to the region $w$. Denoting these moment coefficients $\mu_k(W)$ and $\mu_k(w)$, we may rewrite (86) in the form:

$\mu_k(w) = \mu_k(W) \quad (k = 1, 2, 3, \ldots)$. (87)
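The relations (79), (82) and (84) can be checked numerically in the simplest situation, $w = W$ and $\epsilon = 1$. The sketch below assumes a law chosen for this illustration only: a single observation from a normal law with zero mean, the standard deviation $\sigma$ playing the part of $\alpha^{(1)}$. For this law $\phi = (x^2 - \sigma^2)/\sigma^3$, and $\phi' = A + B\phi$ holds with $A = -2/\sigma^2$ and $B = -3/\sigma$; these coefficients are derived for the example and are not quoted from the text.

```python
from math import exp, pi, sqrt


def moment_of_phi(k, sigma, half_width=12.0, steps=48000):
    """k-th moment of phi = (x^2 - sigma^2)/sigma^3 under N(0, sigma^2).

    Midpoint rule over (-half_width*sigma, +half_width*sigma).
    """
    h = 2.0 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = -half_width * sigma + (i + 0.5) * h
        p0 = exp(-0.5 * (x / sigma) ** 2) / (sigma * sqrt(2.0 * pi))
        phi = (x * x - sigma * sigma) / sigma ** 3
        total += p0 * phi ** k
    return total * h


sigma = 1.5
A = -2.0 / sigma ** 2        # coefficients of phi' = A + B*phi for this example
B = -3.0 / sigma
A_prime = 4.0 / sigma ** 3   # derivative of A with regard to sigma

m1 = moment_of_phi(1, sigma)   # (79): expected to vanish
m2 = moment_of_phi(2, sigma)   # (82): expected to equal -A
m3 = moment_of_phi(3, sigma)   # (84): expected to equal 3AB - A' - AB
print(m1, m2, -A, m3, 3 * A * B - A_prime - A * B)
```

A region $w$ similar to $W$ would have to reproduce these same moments of $\phi$ when the integration is confined to $w$ and divided by $\epsilon$.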

It is known that if the set of moment coefficients satisfy certain conditions, the corresponding frequency distribution is completely defined.* Such, for instance, is the case when the series $\sum\{\mu_k (it)^k / k!\}$ is convergent, and it then represents the characteristic function of the distribution. We do not, however, propose to go more closely into this question, and shall consider only the cases in which the moment coefficients of $\phi$ satisfy the conditions of H. Hamburger†. In these cases, to which the theory developed below only applies‡, it follows from (87) that when $\phi$, which is related to $p_0(x_1, x_2, \ldots, x_n)$ by (74), is such as to satisfy the equation (75), the identity of the two distributions of $\phi$ is the necessary (and clearly also sufficient) condition for $w$ being similar to $W$ with regard to the parameter $\alpha^{(1)}$.

* Hamburger; Math. Ann., vol. 81, p. 4 (1920).
† We are indebted to Dr R. A. Fisher for kindly calling our attention to the fact that we had originally omitted to refer to this restriction.
‡ It may easily be proved that these conditions are satisfied in the case of examples (7), (8), (9), (10) and (11) discussed below.


The significance of this result may be grasped more clearly from the following consideration. Every point of the sample space $W$ will fall on to one or other of the family of hypersurfaces

$\phi = \text{const.} = \phi_1$. (88)

Then if

$P_0(w(\phi)) = \int\!\!\int\cdots\int_{w(\phi_1)} p_0\, dw(\phi_1)$, (89)

$P_0(W(\phi)) = \int\!\!\int\cdots\int_{W(\phi_1)} p_0\, dW(\phi_1)$ (90)

represent the integral of $p_0$ taken over the common parts, $w(\phi)$ and $W(\phi)$ ... $\phi$ = const., and satisfying the condition

$P_0(v(\phi)) = \epsilon\,P_0(W(\phi))$, (99)

we should have

$P_t(v(\phi)) \leq P_t(w_0(\phi))$, (100)

except perhaps for a set of values of $\phi$ of measure zero. Suppose in fact that the proposition is not true and that there exists a set $E$ of values of $\phi$ of positive measure for which it is possible to define the regions $v(\phi)$ such that

$P_t(v(\phi)) > P_t(w_0(\phi))$. (101)

Denote by $CE$ the set of values of $\phi$ complementary to $E$. We shall now define a region, say $v$, which will be similar to $W$ with regard to $\alpha^{(1)}$ and such that

$P_t(v) > P_t(w_0)$, (102)

which will contradict the assumption that $w_0$ is the best critical region with regard to $H_t$.


The region $v$ will consist of parts of hypersurfaces $\phi$ = const. For $\phi$'s included in $CE$, these parts, $v(\phi)$, will be identical with $w_0(\phi)$, and for $\phi$'s belonging to $E$, they will be ... $a_0$; region $w_0$ defined by $z =$ ... (118) (119)


where $z_0$ is related to $\epsilon$ by (117), and $z'_0$ by a similar expression in which the limits of the integral are $-\infty$ and $z'_0 = -z_0$. This is "Student's" test.* It is also the test reached by using the principle of likelihood. Further, it has now been shown that starting with information in the form supposed, there can be no better test for the hypothesis under consideration.

(2) Example (9). The hypothesis concerning the variance in the sampled population. The sample has been drawn from some normal population and $H'_0$ is the hypothesis that $\sigma =$ ... $, \ldots, \alpha^{(c)}$ and of size $\epsilon$. This is the natural generalization of the notion of similarity with regard to one single parameter previously introduced.

Let us first consider regions $w$, which are similar to $W$ with regard to some single parameter, $\alpha^{(i)}$, for some fixed values of other parameters $\alpha^{(j)}$ ($j = 1, 2, \ldots, c$, but $j \neq i$). Clearly there may be regions which are similar to $W$ with regard to $\alpha^{(i)}$ when the other $\alpha$'s have some definite values, but which cease to be similar when these values are changed. We shall now prove the following proposition. The necessary and sufficient condition for $w$ being similar to $W$ with regard to the set of parameters $\alpha^{(1)}, \alpha^{(2)}, \ldots, \alpha^{(c)}$, is that it should be similar with regard to each one of them separately for every possible system of values of the other parameters.

The necessity of this condition is evident. We shall have to prove that it is also sufficient. This we shall do assuming $c = 2$, since the generalization follows at once from this. The conditions of the theorem mean that, (a), whatever be the values of $\alpha_1^{(1)}$, $\alpha_1^{(2)}$ and $\alpha_2^{(2)}$, we shall have

$P(\{\alpha_1^{(1)}, \alpha_1^{(2)}\}\, w) = P(\{\alpha_1^{(1)}, \alpha_2^{(2)}\}\, w) = \epsilon$, (132)

and (b), whatever be $\alpha_1^{(1)}$, $\alpha_2^{(1)}$, $\alpha_2^{(2)}$, then

$P(\{\alpha_1^{(1)}, \alpha_2^{(2)}\}\, w) = P(\{\alpha_2^{(1)}, \alpha_2^{(2)}\}\, w) = \epsilon$. (133)

It follows that whatever be $\alpha_1^{(1)}$, $\alpha_1^{(2)}$; $\alpha_2^{(1)}$, $\alpha_2^{(2)}$, we shall have

$P(\{\alpha_1^{(1)}, \alpha_1^{(2)}\}\, w) = P(\{\alpha_2^{(1)}, \alpha_2^{(2)}\}\, w) = \epsilon$, (134)

and thus that the region $w$ is similar to $W$ with regard to the set $\alpha^{(1)}$, $\alpha^{(2)}$.

We shall now introduce a conception which may be termed that of the independence of a family of hypersurfaces from a parameter. Let

$f_i(\alpha, x_1, x_2, \ldots, x_n) = C_i \quad (i = 1, 2, \ldots, k < n)$, (135)

be the equations of certain hypersurfaces in the $n$-dimensioned space, $\alpha$ and $C_i$ being parameters. Denote by $S(\alpha, C_1, C_2, \ldots, C_k)$ the intersection of these hypersurfaces, or if $k = 1$, the hypersurface corresponding to the equation (135). Consider the family of hypersurfaces $S(\alpha, C_1, C_2, \ldots, C_k)$ corresponding to a fixed value of $\alpha$ and to all possible values of $C_1, C_2, \ldots, C_k$. This will be denoted by $F(\alpha)$. Take any hypersurface $S(\alpha_1, C'_1, \ldots, C'_k)$ from any family $F(\alpha_1)$. If whatever be $\alpha_2$ it is possible to find suitable values of the $C$'s, for example $C''_1, C''_2, \ldots, C''_k$, such that the hypersurface $S(\alpha_2, C''_1, C''_2, \ldots, C''_k)$ is identical with $S(\alpha_1, C'_1, \ldots, C'_k)$, then we shall say that the family $F(\alpha)$ is independent of $\alpha$. A simple illustration in 3-dimensioned space may be helpful. Let

$f_1(\alpha, x_1, x_2, x_3) = (x_1 - \alpha)^2 + (x_2 - \alpha)^2 + (x_3 - \alpha)^2 = C_1$, (136)

$f_2(\alpha, x_1, x_2, x_3) = x_1 + x_2 + x_3 = C_2$. (137)
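A short numerical sketch may help with equations (136) and (137). It uses nothing beyond those two equations; the helper names and the trial values of $\alpha$, $C_1$ and $C_2$ are illustrative. The intersection of the surfaces (136) and (137) is described by a centre and a squared radius, and a second value of $\alpha$ is then matched to the same intersection by adjusting $C_1$ alone.

```python
def circle_of_intersection(alpha, c1, c2):
    """Centre coordinate and squared radius of the curve where (136) meets (137).

    The centre is the point (centre, centre, centre) on the line x1 = x2 = x3.
    """
    centre = c2 / 3.0
    # squared radius of the sphere less squared distance of its centre from the plane
    radius_sq = c1 - 3.0 * (alpha - centre) ** 2
    if radius_sq < 0:
        raise ValueError("sphere and plane do not meet")
    return centre, radius_sq


def matching_c1(alpha_1, c1, c2, alpha_2):
    """Value of C1 that gives, for alpha_2, the same curve as (alpha_1, c1, c2)."""
    centre, radius_sq = circle_of_intersection(alpha_1, c1, c2)
    return radius_sq + 3.0 * (alpha_2 - centre) ** 2


first = circle_of_intersection(alpha=1.0, c1=5.0, c2=6.0)
c1_new = matching_c1(1.0, 5.0, 6.0, alpha_2=-0.5)
second = circle_of_intersection(alpha=-0.5, c1=c1_new, c2=6.0)
print(first, c1_new, second)
```

The two calls return the same centre and squared radius, which is the sense in which one member of $F(\alpha_1)$ is also a member of $F(\alpha_2)$.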

These equations represent families of spheres and of planes. For given values of $\alpha$, $C_1$ and $C_2$, $S(\alpha, C_1, C_2)$ will be a circle lying at right angles to the line $x_1 = x_2 = x_3$ and having its centre on that line. The family $F(\alpha)$ obtained by varying $C_1$ and $C_2$ consists of the set of all possible circles satisfying these conditions. This family is clearly independent of $\alpha$, that is, of the position of the centre of the sphere on the line $x_1 = x_2 = x_3$, so that $F(\alpha)$ may be described as independent of $\alpha$.

It is now possible to solve the problem of finding regions similar to $W$ with regard to a set of parameters $\alpha^{(1)}, \alpha^{(2)}, \ldots, \alpha^{(c)}$, but we are at present only able to do so under rather limiting conditions. In the first place we shall have conditions analogous to those assumed when dealing with the case $c = 1$ (see p. 164).

(A) We shall assume the existence of $\partial^k p_0/\partial(\alpha^{(i)})^k$ in every point of the sample space except perhaps in a set of measure zero, and for all values of the $\alpha$'s.

(B) Writing

$\phi_i = \dfrac{\partial \log p_0}{\partial \alpha^{(i)}} = \dfrac{1}{p_0}\dfrac{\partial p_0}{\partial \alpha^{(i)}}$, (138)

it will be assumed that for every $i = 1, 2, \ldots, c$

$\phi'_i = A_i + B_i\phi_i$, (139)

$A_i$ and $B_i$ being independent of the $x$'s.

(C) Further there will need to be conditions concerning the hypersurfaces $\phi_i$ = const. Denote by ... and $w(\phi_1)$ which do not change when $\alpha^{(2)}$ varies, is guaranteed by the condition that $F(\alpha$ ... We shall now use the condition that $w$ is similar to $W$ with regard to $\alpha^{(1)}$ whatever be the values of other parameters, and thus of $\alpha^{(2)}$. This means that the variation of $\alpha^{(2)}$ does not destroy the equation (142). It follows that

$\dfrac{\partial^k}{\partial(\alpha^{(2)})^k}\int\cdots\int_{w(\phi_1)} p\, dw(\phi_1) = \epsilon\,\dfrac{\partial^k}{\partial(\alpha^{(2)})^k}\int\cdots\int_{W(\phi_1)} p\, dW(\phi_1) \quad (k = 1, 2, \ldots)$.

* We are aware that these conditions are more stringent than is necessary for the existence of similar regions; this is a point requiring further investigation.

As the regions $W(\phi_1)$ and $w(\phi_1)$ are independent of $\alpha^{(2)}$, these conditions may be written in the form

We may now use the condition (139) for $i = 2$, and applying the method used when dealing with the case $c = 1$, show that (144) is equivalent to

$$\int\cdots\int_{w(\phi_1)} p_0\,\phi_2^k\, dw(\phi_1) = \epsilon \int\cdots\int_{W(\phi_1)} p_0\,\phi_2^k\, dW(\phi_1). \tag{145}$$

Following the same method of argument we find immediately that the necessary (and clearly also sufficient) condition for w being similar to W with regard to $\alpha^{(1)}$ and $\alpha^{(2)}$ is that

$$\int\cdots\int_{w(\phi_1,\phi_2)} p_0\, dw(\phi_1,\phi_2) = \epsilon \int\cdots\int_{W(\phi_1,\phi_2)} p_0\, dW(\phi_1,\phi_2), \tag{146}$$

where $W(\phi_1, \phi_2)$ means the intersection in the sample space W of the hypersurfaces $\phi_1 = C_1$ and $\phi_2 = C_2$ for any values of $C_1$ and $C_2$, and $w(\phi_1, \phi_2)$ the part of the same contained in w. It is easily seen that the same argument may be repeated $c - 1$ times and that finally we shall find that the necessary and sufficient condition for w being similar to W with regard to the whole set of parameters $\alpha^{(1)}, \alpha^{(2)}, \ldots, \alpha^{(c)}$ is that

$$\int\cdots\int_{w(\phi_1,\phi_2,\ldots,\phi_c)} p_0\, dw(\phi_1,\phi_2,\ldots,\phi_c) = \epsilon \int\cdots\int_{W(\phi_1,\phi_2,\ldots,\phi_c)} p_0\, dW(\phi_1,\phi_2,\ldots,\phi_c),$$

or

$$P_0(w(\phi_1,\phi_2,\ldots,\phi_c)) = \epsilon\, P_0(W(\phi_1,\phi_2,\ldots,\phi_c)). \tag{147}$$

Here •••>0C) means the intersection in W of hypersurfaces fa = Cit (i = 1,2,..., c) for fixed values of the a's and for any system of values of the C's. •••, Pt(v). (149) As in the case where the probability law p0 depended only upon the value of one unspecified parameter a(1), we can prove that if w0 is a region maximising Pt(w), then except for a set of values of c))- That is to say, P i K ^ , (¡>2, i = Ci,

=

(156)

(3>

is independent of a ; and so on in general for a® until lastly we find that the family F{OL(c)) of hypersurfaces S(a(c\ C1; C2, ...,Cc_1), formed of points satisfying c - 1 equations ^ = ^ ^ = ^ _^ = ^ ( ^ is independent of o6':). If all these conditions are satisfied, then the best critical region w0 of size e with regard to a simple alternative, Ht, determining a frequency law pt, must be built up of pieces, w0(^>1,^>2, of the hypersurfaces W(y,z, ...,^ on which the inequality (152) holds, the coefficient •••>&) being determined to satisfy (147). We note that if the boundaries of the regions wQ( 2,..., c) are independent of the d additional parameters, a (c+1) ,a (c+2) , ...,oc(c+d), (158) specified in (67), then w0 will be a common best critical region with regard to every Ht of the set Q. (c) Illustrative examples We shall give two illustrations in which we suppose that two samples, (1) S x of size nv mean = standard deviation = sv (2) S 2 of size n2, mean = x2, standard deviation = s2, have been drawn at random from some normal populations. If this is so, the most general probability law for the observed event may be written p(xx,x2,...,nni; (

xni+1,...,xN) 1

Y

1

CXP

in

+

„ ( ^ Z + f l )

(159)

where nx-\-n2 = N, and a1, cr1 are the mean and standard deviation of the first, and a2, 2, 3) is the locus of point in which xv x2 and si have certain fixed values, the right-hand side of (175) is the product of e and the corresponding value of the frequency function of the three variates xv x2 and or p0(xv x2, s®). The left-hand side of the same equation is the integral of p0 over that part of W( i,

&»)) =

C k" Po(xi>

x

2>

s

Jki

or

P0(w(

1.^2^3))=

R*« J k"

_

a. 4)

in the case

dsl,

(181)

(a),

_

i> 0 ( a; i> a; 2> s o» 5 l) iis i» inth ecase(6).

(182)

Here k" and k'" are the upper and the lower limits of variation of s2 for fixed values of xv x2 and Further, we shall have P O (

w

( 0 I > 0 2 > 0

3

) ) = P O (

X

I >

x

2 > 4 ) =

It is known that p0(xlt

x2,

flf,

s\) =

«»

1- 3 sp-3

:

I" Po(xi>x2>4>4)d4J k"

exp - J - i i - i

(

1 8 3

)

-J • ( 1 8 4 )

^

Introducing s\ instead of sf as a new variate, we have =

(185)

and Po(xl>

x

s

a>5!)

2>

+ n

= const. (Nsl-».aDK-x-®«».-»exp-("~^ I t is easily seen that

/ 2 2 of

„ Jfc* =

— n 2

a

k

"

'

2

~

+

= 0.

J

. (186) (187)

Therefore, using (181), (183) and (186) we have from (175), after cancelling equal multipliers on both sides, P

Jfci

(Nsl-n2s|)i(ni-3)sp'3dsi

= e P (Ns2a-n2s\)^n^s^-sds\

Jo

(188)

for case (a), and an analogous equation for case (b). Write n2s\

= Nslu

(189)

where u is the new variate. Then instead of (188) we shall have f 1 (1 Ju,

u)}

= eB

-

1), \(n2

-

1)} =

f"" (1 J 0

u)Un i~3>

du,

(190)

where

n ^ U°~Nsl'

U°"Nsl•

(iyi)

I t follows from (190) that u0 and u[ depend only upon nv n2 and e. Therefore, whatever be xv x2, and defined by the inequality

or

the element w( = a

«!, 4) =

3 e X p - jiV

+

,

(212)

C being a constant. Substituting in (212) »2 'N'

(213) (214)

N =

(215)

and multiplying by the absolute value of the Jacobian N S(as0, «g, v) '' 2

(216)

we obtain the frequency function of x0, Sq, V, sf namely, +

=

(217)

We note that for fixed values of Sq and v, the variate sf may lie between limits zero and . , (218)

The frequency function p0(x0, v) is found from p0{x0, sjj, s f , v) by integrating it with regard to s f , between the limits zero and sj 2 . We have thus =

exp — IN

^

j.

(219)


Putting this into the equation (211), and cancelling on both sides equal constants, we find

- î e J ^ - ^ t ^ ' V N 2•Then ,

P(Wm =J1

=

=

(10)

and the chance of making an error in following this rule will be $\epsilon$, whatever be the probabilities a priori of the two alternatives. As the size of sample is increased (n remaining odd), $P(w) - \epsilon$ will tend to zero, and probably no serious objection could be raised against the test. It is possible that there may be many analogous cases in which a test independent of the probabilities a priori (in the sense of Definition A) could be found, but it is necessary to call attention to a very wide class of problems in which even if a region satisfying (8) could be determined, it would provide a test of no practical value. Such problems are those in which the set of alternative hypotheses is very numerous (if not infinite), and some of these differ only insignificantly from the hypothesis tested. When, for example, we test the hypothesis that a


sample has been drawn from a normal population with mean a 0 and standard deviation |//f) we must substitute the products a0P(w\H0) and « ¿ P ^ l / ^ ) . I t follows that this situation (c) would not introduce any new principle essentially modifying the problem,


although, of course, there might be considerable difference in the details of solution and its difficulty. If the situation can be represented as in (b), the following method of choice of a critical region might be adopted, which would lead to a further definition of independence. The probability of type II errors associated with the use of a critical region w has been defined by

$$P_{\mathrm{II}}(w) = \sum_{i=1}^{m} \{\phi_i P(\bar{w} \mid H_i)\}. \tag{27}$$

Denote by $\Pi(w)$ the upper bound of the probabilities $P(\bar{w} \mid H_i)$. Clearly

$$P_{\mathrm{II}}(w) \le \Pi(w), \tag{28}$$

whatever be the probabilities $\phi_i$. Choose now from all equivalent regions of size $\epsilon$ for $H_0$, that region $w_0$ for which $\Pi(w_0)$ is a minimum*. If we accept the test based on $w_0$, it has the following properties:

(a) It is of size $\epsilon$, i.e. $P(w_0 \mid H_0) = \epsilon$.

(b) For every other critical region $w_1$, equivalent to $w_0$, the upper bound of the probability of type II errors, namely $\Pi(w_1)$, is larger than $\Pi(w_0)$.

That is to say, while $w_0$ and $w_1$ would provide equal control of type I errors, the control of type II errors is in general not the same, and may be described as follows:

(i) There is no guarantee that the test based on $w_0$ will be always of greater resultant power than that based on $w_1$, for there may be systems of the values of the $\phi_i$'s for which the use of $w_1$ will provide the better control of type II errors.

(ii) But it may be asserted that whatever the probabilities a priori, the probability of type II errors, $P_{\mathrm{II}}(w_0)$, when using $w_0$, never surpasses $\Pi(w_0)$; while there will be systems of $\phi$'s such that this probability of error, $P_{\mathrm{II}}(w_1)$, corresponding to the region $w_1$ is as close to $\Pi(w_1)$ as desired, and thus is larger than $\Pi(w_0) \ge P_{\mathrm{II}}(w_0)$.

This property of the region $w_0$ described in (ii), providing a known upper limit to $P_{\mathrm{II}}(w_0)$ whatever the $\phi_i$ may be, could be taken as the basis of a further definition of a test independent of the a priori probability law.

DEFINITION C. A test T based on a critical region $w_0$ will be termed independent of the probabilities a priori, $\phi_i$, if (1) whatever the $\phi_i$'s may be, the probability of type II errors $P_{\mathrm{II}}(w_0)$ never exceeds a number $\Pi(w_0)$; and (2) whatever other equivalent test $T_1$ is taken, based on a critical region $w_1$, there are systems of $\phi$'s such that

$$P_{\mathrm{II}}(w_1) > \Pi(w_0) \ge P_{\mathrm{II}}(w_0),$$

and for which the test T has thus greater resultant power than $T_1$.

* It is assumed that a region $w_0$, minimizing $\Pi(w_0)$, exists.
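The choice of $w_0$ described above can be traced on a small discrete example. The sketch below is an illustration only: the six-point sample space, the probability laws `P0` and `ALTERNATIVES`, and the helper names are all hypothetical, and exact fractions are used so that "equivalent regions of size $\epsilon$" can be selected without rounding.

```python
from fractions import Fraction as F
from itertools import combinations

# Hypothetical probability laws on a sample space of six points.
P0 = [F(1, 6)] * 6                                                  # H0
ALTERNATIVES = [
    [F(6, 12), F(2, 12), F(1, 12), F(1, 12), F(1, 12), F(1, 12)],   # H1
    [F(1, 12), F(1, 12), F(1, 12), F(1, 12), F(2, 12), F(6, 12)],   # H2
]


def prob(region, law):
    """Probability that the sample point falls in the region under a given law."""
    return sum(law[x] for x in region)


def upper_bound(region):
    """Pi(w): the largest probability of accepting H0 under any alternative."""
    return max(1 - prob(region, law) for law in ALTERNATIVES)


def type_two(region, priors):
    """P_II(w) = sum_i phi_i * P(w-bar | H_i) for priors (phi_1, phi_2)."""
    return sum(phi * (1 - prob(region, law))
               for phi, law in zip(priors, ALTERNATIVES))


EPS = F(1, 3)
# All regions of size EPS for H0 (here: every pair of sample points).
equivalent = [w for w in combinations(range(6), 2) if prob(w, P0) == EPS]
# The region whose upper bound Pi(w) is a minimum.
w0 = min(equivalent, key=upper_bound)
```

With these laws, a prior system concentrated on $H_1$ makes the region `(0, 1)` better than `w0`, as in (i), while no prior system can push `type_two(w0, ...)` above `upper_bound(w0)`, as in (ii).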

III. THE CASE OF COMPOSITE HYPOTHESES

As defined above, a composite hypothesis $H_i$ may be regarded as consisting of a set of simple hypotheses $h_{i1}, h_{i2}, \ldots, h_{ik}, \ldots$, or, if we use the terminology of mathematical logic, as the logical sum of these simple hypotheses. $\phi_{ik}$ will denote the probability a priori of $h_{ik}$ ($i = 0, 1, 2, \ldots$; $k = 1, 2, \ldots$). Further we shall write

$$\phi_{ik} = \phi_i \psi_{ik}, \tag{29}$$

where $\sum_k \psi_{ik} = 1$. We may now express the chance of occurrence of the first and second types of error, in testing a composite hypothesis $H_0$ by means of a critical region w, namely

$$P_{\mathrm{I}}(w) = \sum_k \{\phi_{0k} P(w \mid h_{0k})\} = \phi_0 \sum_k \{\psi_{0k} P(w \mid h_{0k})\}, \tag{30}$$

$$P_{\mathrm{II}}(w) = \sum_{i \ge 1} \sum_k \{\phi_{ik} P(\bar{w} \mid h_{ik})\}, \tag{31}$$

where as before $\bar{w} = W - w$ is the region for acceptance of $H_0$. As before we shall treat all errors of type I as equivalent, and we are therefore concerned with the value of the probability of their logical sum. From our point of approach this is an essential feature of the problem. For example, if we test the composite hypothesis that a sample has been drawn from some unspecified normally distributed population, it is only the question of normality to which we intend the test to be sensitive. To suppose that it would matter less if we rejected $H_0$ when the standard deviation of the normal population has some value which a priori we consider most improbable, would be to misinterpret the purpose of the test as we regard it. To examine the value of the standard deviation, a separate test would need to be applied after decision on the question of normality has been taken. If now we write

$$P(w \mid h_{0k}) = \epsilon_k, \tag{32}$$

we have

$$P_{\mathrm{I}}(w) = \sum_k \{\phi_{0k}\,\epsilon_k\}. \tag{33}$$

As in the case of testing a simple hypothesis, we cannot determine $P_{\mathrm{I}}(w)$ without knowing $\phi_0$. Moreover in the present case we require also a knowledge of the $\psi_{0k}$'s. There are however certain statements with regard to the first source of error which can be made that are independent of the probabilities a priori. (a) We may sometimes find regions w, such that $P_{\mathrm{I}}(w) = \phi_0 \epsilon$, where $\epsilon$ is independent of the $\psi_{0k}$'s; that is to say, using such a region we shall be able to determine exactly the chance $\epsilon$ of rejecting a true hypothesis when we meet it, although … $P(w_1 \mid h_{ik})$ (38)

for every other admissible simple hypothesis $h_{ik}$, where $w_1$ is any other region whatsoever, absolutely equivalent to $w_0$.*

(2) The test so determined is not necessarily unique, since there may be more than one set of absolutely equivalent regions. On the other hand such regions may not exist at all.

(3) So long as the values of $\phi_{0k} = \phi_0 \psi_{0k}$ are unknown, we shall only be able to obtain the upper bound of $P_{\mathrm{I}}(w_0)$. In the special case when one of the sets of absolutely equivalent regions is a set of similar regions, we can however find and control the chance $\epsilon$ of rejecting a true hypothesis when we meet one.

The conditions to be satisfied are undoubtedly complicated. In the first place it may be questioned: is it ever possible that relations (34) and (38) should both be satisfied? We have dealt with this problem at length elsewhere, and shown that in a number of important statistical tests connected with samples from normal populations these conditions do hold good (Neyman & Pearson, 1933).†

* In other words, if a test T satisfies Definition D, it will be uniformly more powerful with regard to the class of alternatives C(h) than any other equivalent test; and vice versa.
† Such is "Student's" test, also R. A. Fisher's tests for comparing the means and variances in two samples from normal populations.


The regions associated with such tests we have termed best critical regions, as in the case of testing simple hypotheses. If it is considered essential to choose a critical region such that … can be given any desired value, then it appears that no better tests independent of the probabilities a priori than those satisfying (34) and (38) can be devised. On the other hand, similar regions for $H_0$ may not exist, or if they exist as well as another set of absolutely equivalent but dissimilar regions, it is still possible that we might prefer to choose a critical region from the latter. In doing this we should know only the upper bound of $\epsilon$ instead of its actual value, but we might obtain a better control of type II errors. With regard to absolutely equivalent regions in general, it will be seen that a unique solution satisfying Definition D is possible, when (a) For each simple hypothesis $h_{0k}$ a common best critical region $w_k$ exists with regard to each alternative hypothesis $h_{ij}$ ($i \ne 0$). (b) When $w_k$ is the same for all values of k, or $w_k = w_0$ (say), although $P(w_0 \mid h_{0k}) = \epsilon_k$ is not necessarily the same. A simple illustration of this case occurs when the hypothesis $H_0$ is that the mean a of a normal population with known standard deviation $\sigma$ is not negative, i.e. $a \ge 0$, the alternative hypothesis being that $a < 0$. We have previously considered a problem closely connected with this (Neyman & Pearson, 1933, pp. 302-4*); namely that of determining the best critical region for testing the simple hypothesis, say that the mean of a sampled normal population of known standard deviation $\sigma$ has some definite value $a_0$, the alternative being that $a < a_0$.
It was shown that whatever the alternative, say ha, the best critical region w is the same, namely that defined by the inequality nx — x1 + x2+...+xn ^ e ) , (4 2) we shall find that it has the following properties: (a) For every population hypothesis ha (contained in H0), specifying that the population mean has some definite, non-negative value a, we have PK|A 0 ) = e a P(w1\hai). (45) It follows from the preceding work that this region w0, defined by (42), will * See pp. 153-4 above.


provide a test T independent of the probabilities a priori in the sense of Definition D; that is to say, whatever be the values of these probabilities, T has a greater resultant power (and is also uniformly more powerful) than any other test $T_1$ absolutely equivalent to T. Further $\epsilon$, which is at our choice, will be the upper bound of the chance of rejecting a true hypothesis when we meet it. This problem is of course a simple one, and so far it has not in all cases been possible to establish that tests in common use satisfy Definition D. In cases where there is no test of a composite hypothesis satisfying this Definition, it would be possible to set down a further definition of independence analogous to that discussed in the case of testing simple hypotheses (Definition C).
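The illustration of a normal mean with $H_0$: $a \ge 0$ can be followed numerically. The sketch below assumes that the region takes the form "sample mean not greater than $-z_\epsilon \sigma/\sqrt{n}$", where $z_\epsilon$ is the upper $\epsilon$ point of the standard normal law; this cut-off and the name `rejection_probability` are assumptions made for the illustration, not quotations from the text.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal law, mean 0 and s.d. 1


def rejection_probability(a, n, sigma, eps):
    """P(w0 | h_a) for the region: sample mean <= -z_eps * sigma / sqrt(n)."""
    z_eps = STD_NORMAL.inv_cdf(1.0 - eps)
    # The sample mean is normal about a with standard deviation sigma / sqrt(n).
    return STD_NORMAL.cdf(-z_eps - a * math.sqrt(n) / sigma)


n, sigma, eps = 16, 2.0, 0.05
# Simple hypotheses contained in H0 (a >= 0): chance of rejecting a true hypothesis.
sizes_under_h0 = [rejection_probability(a, n, sigma, eps) for a in (0.0, 0.5, 1.0, 2.0)]
# Alternatives (a < 0): chance of rejecting H0 when it is false.
powers = [rejection_probability(a, n, sigma, eps) for a in (-0.5, -1.0, -2.0)]
```

Under these assumptions the chance of rejection equals $\epsilon$ only at $a = 0$ and falls below it for every $a > 0$, so that $\epsilon$ acts as an upper bound, while the chance of rejection grows as $a$ moves further below zero.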

IV. CONCLUSION

We began with the question: to what extent is it possible to employ tests of statistical hypotheses which are independent of probabilities a priori? To answer this it has been necessary to discuss what is meant by "independence". We have suggested that a statistical test may be regarded as a rule of behaviour to be applied repeatedly in our experience when faced with the same set of alternative hypotheses. From this point of view it becomes natural to analyse the errors that we shall make, which are of two types: I. We reject the hypothesis $H_0$ when it is true. II. We accept $H_0$ when some alternative $H_i$ is true. In making a decision upon which subsequent action will be based we are influenced by the consequences which follow from a wrong decision; some errors will matter more than others, and certain tests might be described as safer than others. We have therefore considered the conditions under which the choice of a test could not be improved upon (though the test might give us more information) even if the probabilities a priori were known. In the first place we have suggested that all errors of type I may be regarded as equivalent, while those of type II will be of differing quality according to the extent to which the alternative hypothesis which is true varies from that which is tested. Two tests which are equivalent ($H_0$ simple), or absolutely equivalent ($H_0$ composite), assure an equal control of type I errors. A test with greater resultant power than a second ensures a smaller risk of type II errors. The following definition, which applies, with slight modification, to composite as well as to simple hypotheses, has then been discussed: A test of a simple (composite) hypothesis may be termed independent of the probabilities a priori, if it is of greater resultant power than any other (absolutely) equivalent test, whatever be these probabilities.
We have shown the relation of tests satisfying these conditions to those based on what have been defined elsewhere as best critical regions. Further we have suggested other lines of attack when such regions do not exist. Reference has also been made to the assumption, often necessary in practice, regarding the limits of the class of admissible hypotheses.

REFERENCES

FISHER, R. A. (1930). Inverse probability. Proc. Camb. Phil. Soc. 26, 528-35.
FISHER, R. A. (1933). The concepts of inverse probability and fiducial probability referring to unknown parameters. Proc. Roy. Soc. A, 139, 343-8.
NEYMAN, J. & PEARSON, E. S. (1928a, b). On the use and interpretation of certain test criteria for purposes of statistical inference. Biometrika, 20A, (a) Part I, 175-240; (b) Part II, 263-94.
NEYMAN, J. & PEARSON, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Phil. Trans. A, 231, 289-337.
PEARSON, E. S. & NEYMAN, J. (1930). On the problem of two samples. Bull. Acad. Polonaise Sci. et Lettres, Série A, 73-96.

CONTRIBUTIONS TO THE THEORY OF TESTING STATISTICAL HYPOTHESES

I. UNBIASSED CRITICAL REGIONS OF TYPE A AND TYPE $A_1$

BY J. NEYMAN AND E. S. PEARSON

CONTENTS
                                                                          PAGE
1. Introductory                                                            203
2. The choice of a critical region                                         208
3. Unbiassed critical regions                                              210
4. The determination of an unbiassed critical region of type A             212
5. Simplification of solution in terms of $\phi = \partial \log p/\partial\theta$   214
6. Illustrative examples                                                   217
7. The invariance of an unbiassed region of type A with regard to
   transformations of the unknown parameter                                228
8. The determination of an unbiassed critical region of type $A_1$         229
9. Summary                                                                 232

APPENDICES
I. The conditions of regularity                                            233
II. Calculation of the limits of the type A unbiassed critical region for v  237

1. Introductory. In a number of recent publications we have discussed a method of approach to the problem of testing statistical hypotheses starting from a simple concept which may be expressed as follows: arrange your test so as to minimize the probability of errors. Since the errors involved in testing hypotheses are of two kinds, the problem requires further specification which may have different forms. One of these has led to the theory of uniformly most powerful tests, but as it has been found that in many situations a solution along these lines is impossible, it follows that in such cases the problem of minimizing the probability of errors must be specified in a different form. This fresh specification will be discussed in the present paper, but we believe that it will be useful first to give a brief review of the general method of approach. In the first place it must be remembered that, as in all branches of applied mathematics, it is necessary to set up a precise but generally simplified model which we believe represents the world that we observe with sufficient accuracy to provide us with useful results. The methods of statistical analysis may be regarded as tools constructed to operate on this model and, as with all tools, their parts and uses must be described in technical terminology. Further, since a tool is something to be used on many occasions under conditions which are similar but not precisely identical, it is important to consider what are the common elements of the practical problems to which a statistical test is to be applied.


In the construction of these tests we must distinguish between those steps which can be established by mathematical theory and (on the assumption that this theory is correct) are therefore not open to question, and other steps in the making of which the statistician has a freedom of choice. In a given set of conditions mathematical theory may show that definite consequences will follow from the application of a statistical test; these consequences may or may not appeal to the statistician as desirable. If they do not, he can adjust the form of test so that the results which can be shown to follow by mathematical reasoning are such as he can accept as of value in practical application. There is here inevitably present a personal element, since clearly established consequences which appear satisfactory to one individual may not be so regarded by another. The controversies that have troubled the world of mathematical statistics seem often to have arisen from a failure to distinguish clearly between these subjective and objective aspects of the problem.

Before giving mathematical precision either to the model or the statistical tools, it will be useful to illustrate these ideas on a simple type of situation. Two examples will be considered.

Example I. A manufacturer of electric lamps wishes to ascertain, from tests carried out on a small number of lamps selected at random, whether a batch of 1500 lamps is likely to satisfy that condition in a specification which lays down upper and lower tolerance limits for the "initial efficiency",* say x, of individual lamps.

Example II. A pharmacologist is asked to report whether a new and cheaper form of insulin is of inferior quality to an older standard. The experiments are carried out on rabbits and consist in the determination of a certain reaction, x; low values of x are associated with inferior quality.
It is very likely that these two problems would contain common elements as follows: (1) As a result of the tests a decision must be reached determining which of two alternative courses is to be followed. Thus the manufacturer must decide whether (A) to place the batch of lamps on the market, or (B) to test a second sample. The pharmacologist must decide whether to report, (A) "I can detect no indication that the cheaper insulin is worse than the more expensive one and therefore recommend the use of the cheaper insulin if certain further tests prove satisfactory"; or (B) "The cheaper insulin seems to be of inferior quality and I cannot advocate its use." (2) Since the decision must be made on a limited amount of data,† it is inevitable that it will sometimes differ from that which would have been made were
* The "initial efficiency" expressed in lumens per watt is a measure of the lamp's efficiency as a light producer; it depends largely on the quality of the filament.
† For example, economic considerations will govern the number of lamps which are tested.


fuller data available; it may therefore be described as liable to error. Two different faulty decisions or errors may be made; course (A) may be followed where, with more complete information, (B) would have been chosen, and vice versa. (3) Very different consequences will follow from these two errors and therefore they should be distinguished. On the one hand, for example, the manufacturer may suffer in reputation by allowing a faulty batch of lamps to go on the market, and on the other he may incur unnecessary expense in testing a second sample of lamps or even in destroying the whole batch when it is in fact up to standard. (4) In both problems it is likely that the situation may be represented by the same mathematical model. Thus past experience may have shown that the distribution of the tested character, x, whether in a single batch of lamps or in rabbits from a laboratory population receiving the same treatment, is approximately normal having a known standard deviation not dependent upon the mean. The model then defines as the probability law for x

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\xi)^2}{2\sigma^2}}, \tag{1}$$

and the statistical problem is to draw from n observations $x_1, x_2, \ldots x_n$ inferences regarding the unknown value of the mean, $\xi$. In the manufacturer's problem a specified value, $\xi_0$, represents his objective mean initial efficiency for this type of lamp, while for the pharmacologist $\xi_0$ is the mean reaction using the standard form of insulin. Beyond this there is clearly some difference between the two situations. The manufacturer does not expect his objective always to be maintained, nor does he mind if the mean level has shifted to $\xi$, provided that, let us say, the values of $\xi - 3\sigma$ and $\xi + 3\sigma$ fall within the tolerance limits set by the specification for individual lamps. The pharmacologist, on the other hand, may be almost certain that the new insulin cannot be better than the old, but is anxious to detect whether it is worse, i.e. to detect the existence of any positive difference, $\xi_0 - \xi$. Both, however, will be satisfied if the statistical test can provide them with a rule which will very rarely result in the choice of course (B) when $\xi = \xi_0$, and will be increasingly likely to suggest course (B) as $\xi$ differs more and more from $\xi_0$. It is to give precision to the statistical tool used in dealing with situations of this kind that we have introduced a formal terminology which is convenient if understood in the sense defined. Thus in the present instance we are concerned with a test of the statistical hypothesis, $H_0$, that $\xi = \xi_0$. The class of admissible hypotheses is defined by equation (1), where it is supposed that $\sigma$ is known. The manufacturer is concerned with alternative values of $\xi$ both above and below $\xi_0$, while the pharmacologist wishes to concentrate attention on the range $\xi < \xi_0$. The model which only allows for uncertainty as to $\xi$ may be inadequate to represent the situation, but this is another question; the test has been constructed to fit the model and it may lose


its meaning if this is inaccurate or over simplified. The responsibility of choosing the model rests, however, with the statistician applying the test, who, in this case, would have to decide for himself whether he is running any serious risk in assuming the variation in x to be normal and $\sigma$ known.* If, in following the rule provided by the test (which in this case would probably relate to the mean $\bar{x}$ of the observations in the sample), course (A) is adopted, this has been described as the acceptance of the hypothesis, $H_0$, that $\xi = \xi_0$. If course (B) is adopted it is said that $H_0$ is rejected. If $H_0$ is rejected when, in fact, $\xi = \xi_0$, it is said that $H_0$ is rejected when it is true and that an error of the first kind has been made. If $H_0$ is accepted when $\xi \ne \xi_0$, it is said to have been accepted when some alternative is true, and that an error of the second kind has been made. In any given case it is impossible to determine whether a hypothesis is true or false, but it is possible to assess the efficiency of a statistical test by the manner in which it controls these two sources of error on repeated application in the situation defined by the mathematical model. The method of approach that we have followed in a series of previous papers† has been to consider how these two sources of error may be most effectively controlled. We shall now give an outline of the results obtained and recall certain definitions which will be used below. Denote generally by $x_1, x_2, \ldots x_n$ a system of variables, the values of which can be given by observation. Any system of values of the x's will be represented by a point, E, in the n-dimensioned space, W. The point E‡ and the space W will be called the sample point and the sample space respectively.
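The control of the two kinds of error "on repeated application" can be imitated by simulation under the model (1). The sketch below takes the pharmacologist's one-sided situation and assumes a rule of the form "choose course (B) when the sample mean falls below $\xi_0 - z_\alpha \sigma/\sqrt{n}$"; the rule, the numerical values and the name `course_b_frequency` are assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist


def course_b_frequency(xi, xi0, sigma, n, alpha, trials, seed):
    """Fraction of repeated samples on which the rule chooses course (B).

    Rule assumed here: choose (B), i.e. reject H0, when the sample mean falls
    below xi0 - z_alpha * sigma / sqrt(n).
    """
    rng = random.Random(seed)
    cutoff = xi0 - NormalDist().inv_cdf(1.0 - alpha) * sigma / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        # One application of the test: n observations drawn from the model (1).
        mean = sum(rng.gauss(xi, sigma) for _ in range(n)) / n
        if mean < cutoff:
            rejections += 1
    return rejections / trials


# Error of the first kind: H0 rejected although xi = xi0.
first_kind = course_b_frequency(10.0, 10.0, 1.0, 9, 0.05, 20000, seed=1)
# Error of the second kind: H0 accepted although xi lies one sigma below xi0.
second_kind = 1.0 - course_b_frequency(9.0, 10.0, 1.0, 9, 0.05, 20000, seed=2)
```

The first frequency should lie close to the chosen level of 0.05, and the second shows how often the rule would fail to detect a shift of one standard deviation with nine observations.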
If the variables $x_1, x_2, \ldots x_n$ have the property that, whatever the region w in the sample space, there exists a number, $P\{E \in w\}$, representing the probability that the sample point E will fall within the region w, then we shall describe these x's as random variables. The probability $P\{E \in w\}$ considered as a function of the region w will be described as the integral probability law of the x's. Any assumption concerning the nature of the integral probability law $P\{E \in w\}$ is called a statistical hypothesis. The statistical hypothesis is said to be simple if it determines $P\{E \in w\}$ as a single-valued function of the region w. Any statistical hypothesis which is not simple is called composite. In our papers quoted we have assumed, and this assumption will be kept throughout the present paper, that there exists a function, $p(x_1, \ldots x_n) = p(E)$, defined, non-negative and continuous in almost any point§ of W such that
* Investigations into the adequacy of the statistician's models to represent the phenomena that he observes have, of course, been made from time to time.
† J. Neyman and E. S. Pearson, (1) "On the problem of the most efficient tests of statistical hypotheses", Phil. Trans. Roy. Soc. A, vol. ccxxxi (1933), p. 289; (2) "The testing of statistical hypotheses in relation to probabilities a priori", Proc. Camb. Phil. Soc. vol. xxix (1933), p. 492.
‡ In our previous publications we denoted the sample point by Σ. This, however, was found to be inconvenient because of the possible confusion with the sign of summation. The letter E is suggested by the possible term "event" point.
§ I.e. except perhaps for a set of points of measure zero.


whatever the region w, the value of P{Eew} is equal to the integral of p (E) taken over that region w. The function p (E) thus defined is called the elementary probability law of the x's. Under the assumption that the elementary probability law exists, we may say that a statistical hypothesis means any assumption concerning the nature of the elementary probability law of the z's. As it was shown in our previous publications and as is apparent from the preceding section, any test of a statistical hypothesis, H0, may be considered as equivalent to a rule of rejecting H0 whenever the sample point falls within a specified region, w, called the critical region, and in accepting it in all other cases. We have termed the probability of the first kind of error determined by H0 the size of the corresponding critical region w; thus if the size be a and the hypothesis tested, H0, determines the elementary probability law of the x's, say p{xx,...xn then

| 60m,... 0O«>) = p(x1,...xn\H0),

P {Eew\H0}= J . . .J p(x1,...xn\0OP{Eew

| H'}

(14)

for every admissible hypothesis H' of the set $\Omega$, alternative to $H_0$, and for every region w of the set of unbiassed regions of size $\alpha$. A region $w_0$ having these properties may be termed the best unbiassed critical region, and the test associated with it, the uniformly most powerful unbiassed test. If we accept the principle of choosing an unbiassed test, and if among these a uniformly most powerful test exists, then it is difficult to see that this test could be bettered. Of unbiassed tests providing equal control of the first kind of error, this test reduces to a minimum the risk of the second kind of error, whatever admissible hypothesis alternative to $H_0$ be true. We must leave the examination of the general theory of such tests for later discussion, at present approaching the problem in a form which is mathematically simpler and which we believe does in fact, in the special examples illustrated below, provide an unbiassed test which is also uniformly most powerful. We shall now define as an unbiassed critical region of type A, a region w satisfying the equation (11) and also the conditions

$$\left.\frac{\partial P\{E \in w \mid \theta\}}{\partial \theta}\right|_{\theta=\theta_0} = 0, \tag{15}$$

$$\left.\frac{\partial^2 P\{E \in w \mid \theta\}}{\partial \theta^2}\right|_{\theta=\theta_0} = \text{a maximum}. \tag{16}$$

On the assumption that the differential coefficients exist, the condition (16) implies that in using w, the probability of rejecting H0 when it is not true not only increases as | θ − θ0 | increases, but does so in the neighbourhood of θ = θ0 at a more rapid rate than if any other critical region satisfying (15) of the same size were used. It may, of course, be argued that the behaviour of what we shall term the power function, P{E ∈ w | θ}, is of less importance in the neighbourhood of θ = θ0 than farther away on either side, because the consequences of accepting H0 when an alternative hypothesis is true will in general be of less serious import

* The range is the difference between the largest and smallest of the n values of x.

J. NEYMAN AND E. S. PEARSON

when θ differs only slightly from θ0 than when θ is considerably larger or smaller. This criticism would, of course, not apply if the type A region (which is necessarily unbiassed) was also the best unbiassed region. The final verdict of the practical value of the region satisfying conditions (11), (15) and (16) will depend therefore on the properties of the power function throughout the whole range of admissible values of θ. We shall, however, start by considering the conditions under which a type A region can be obtained and the procedure to be followed in determining it.

4. The determination of an unbiassed critical region of type A.

Proposition I. If whatever be the region w in the sample space, the derivatives

$$\left[\frac{\partial P\{E \in w \mid \theta\}}{\partial\theta}\right]_{\theta=\theta_0} \quad\text{and}\quad \left[\frac{\partial^2 P\{E \in w \mid \theta\}}{\partial\theta^2}\right]_{\theta=\theta_0} \qquad (17)$$

exist and are represented by the integrals

$$\int\ldots\int_w \left[\frac{\partial p}{\partial\theta}\right]_{\theta=\theta_0} dx_1 \ldots dx_n = \int\ldots\int_w p_\theta'(E \mid \theta_0)\,dx_1 \ldots dx_n \ \text{(say)}$$

and

$$\int\ldots\int_w \left[\frac{\partial^2 p}{\partial\theta^2}\right]_{\theta=\theta_0} dx_1 \ldots dx_n = \int\ldots\int_w p_\theta''(E \mid \theta_0)\,dx_1 \ldots dx_n \ \text{(say)} \qquad (18)$$

respectively, then the region w0 within which

$$p_\theta''(E \mid \theta_0) \ge k_1\,p_\theta'(E \mid \theta_0) + k_2\,p(E \mid \theta_0) \qquad (19)$$

and outside which

$$p_\theta''(E \mid \theta_0) \le k_1\,p_\theta'(E \mid \theta_0) + k_2\,p(E \mid \theta_0), \qquad (20)$$

where k1 and k2 are constants chosen so that

$$\int\ldots\int_{w_0} p(E \mid \theta_0)\,dx_1 \ldots dx_n = \alpha, \qquad (21)$$

$$\int\ldots\int_{w_0} p_\theta'(E \mid \theta_0)\,dx_1 \ldots dx_n = 0, \qquad (22)$$
is the unbiassed region of type A defined above. The proof of this proposition is a simple consequence of the following

Lemma. Consider a set of integrable functions F0, F1, ... Fm defined in the whole space of x1, ... xn, and regions w in this space satisfying the conditions

$$\int\ldots\int_w F_i\,dx_1 \ldots dx_n = c_i \quad (i = 1, 2, \ldots m), \qquad (23)$$

where the c's are some constants. Let w0 be one of these regions, within which Til

and outside of which

F^-ZiktFJ, e"(E\90)dx1...dxn

(30)

(31)

corresponding to any other region w satisfying the same conditions (21) and (22). Comparing this statement with that of the Lemma, it is found that the role of F0 is here played by the function p_θ''(E | θ0) and that of F1 and F2 by p(E | θ0) and p_θ'(E | θ0). It follows that the region w0 required, maximizing (31), is that within which (19) is true and outside which (20) is true. Thus the proposition is proved.


In the application of the general method to particular examples, it is of considerable importance that what may be termed the conditions of regularity of the function p(E | θ) under which the differentials of (17) may be represented by the integrals of (18) should be clearly understood, since the solution of the problem of unbiassed critical regions contained in (19) and (20) depends upon these conditions being satisfied. The meaning of the restriction has therefore been described in some detail in an Appendix,* where two illustrative examples are given, in one of which the conditions are satisfied and in the other are not.

Proposition II. If the probability law p(E | θ) satisfies the conditions of regularity (a) and (b) of the Appendix (p. 31), and if there exists a sufficient statistic T of θ, then the unbiassed critical region of type A is limited by surfaces upon which T has constant values.

Proof. Darmois† and Neyman‡ have shown that if T is a sufficient statistic of θ, then in every point E of the sample space, except perhaps in a set of measure zero,

$$p(E \mid \theta) = p(T \mid \theta)\,f(x_1, \ldots x_n), \qquad (32)$$

where f(x1, ... xn) is a function not depending on the parameter θ. Substituting the right-hand side of this equality in the inequality (19) defining the unbiassed critical region, we are able to divide both sides by f(x1, ... xn) and obtain

$$p_\theta''(T \mid \theta_0) \ge k_1\,p_\theta'(T \mid \theta_0) + k_2\,p(T \mid \theta_0) \qquad (33)$$

in every point of the region. This proves the theorem.

5. Simplification of solution in terms of φ = ∂ log p/∂θ.

The solution given in the preceding section may be simplified considerably in the following way. If we write

$$\varphi = \frac{\partial \log p(E \mid \theta)}{\partial\theta}, \qquad \varphi' = \frac{\partial^2 \log p(E \mid \theta)}{\partial\theta^2}, \qquad (34)$$

we shall have

$$p_\theta'(E \mid \theta_0) = \varphi\,p(E \mid \theta_0), \qquad (35)$$

$$p_\theta''(E \mid \theta_0) = (\varphi' + \varphi^2)\,p(E \mid \theta_0). \qquad (36)$$

Substituting these two expressions into the inequality (19) defining the unbiassed critical region of type A, we obtain

$$\varphi' + \varphi^2 \ge k_1\varphi + k_2 \qquad (37)$$

in any point where p(E | θ0) ≠ 0. Further steps in the solution depend upon the properties of φ' and φ.

Case (a). Here φ' is a function of φ, say F(φ), not involving the x's explicitly.

* See p. 31.
† G. Darmois, "Sur les lois de probabilité à estimation exhaustive", Comptes Rendus, t. cc (1935), p. 1266.
‡ J. Neyman, "Su un teorema concernente le cosiddette statistiche sufficienti", Giornale dell'Istituto Italiano degli Attuari, vol. vi (1935).
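The identities (35) and (36) can be checked numerically. The following is only a minimal sketch, assuming a single observation from a normal law with unit standard deviation (so that φ = x − θ and φ' = −1); the function names and the step size h are illustrative choices, not part of the original paper.

```python
import math


def density(x, theta):
    """Normal density with unit standard deviation: p(x | theta), one observation."""
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2.0 * math.pi)


def phi(x, theta):
    """phi = d log p / d theta for the unit-variance normal law."""
    return x - theta


PHI_PRIME = -1.0  # d phi / d theta, a constant for this particular law


def finite_difference_derivatives(x, theta, h=1e-4):
    """Central-difference estimates of the first and second theta-derivatives of p."""
    p_plus = density(x, theta + h)
    p_mid = density(x, theta)
    p_minus = density(x, theta - h)
    first = (p_plus - p_minus) / (2.0 * h)
    second = (p_plus - 2.0 * p_mid + p_minus) / (h * h)
    return first, second


def via_phi(x, theta):
    """The same derivatives from (35) and (36): phi * p and (phi' + phi^2) * p."""
    p = density(x, theta)
    f = phi(x, theta)
    return f * p, (PHI_PRIME + f * f) * p


if __name__ == "__main__":
    for x in (-1.5, 0.3, 2.0):
        print(x, finite_difference_derivatives(x, 0.5), via_phi(x, 0.5))
```

The two pairs of values printed for each x should agree to several decimal places, which is all that the sketch is meant to show.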

Contributions to the theory of testing statistical hypotheses

It may be shown that in this case a sufficient statistic of θ exists.* Here (37) becomes an inequality in terms of φ,

$$F(\varphi) + \varphi^2 \ge k_1\varphi + k_2, \qquad (38)$$

and it is seen that the unbiassed critical region of type A, w, is limited by the surfaces on which φ has certain constant values, say

$$\varphi = c_i \quad (i = 1, 2, \ldots m), \qquad (39)$$

where m ≥ 2 is the number of different roots of the equation

$$F(\varphi) + \varphi^2 - k_1\varphi - k_2 = 0. \qquad (40)$$

The region w is thus defined by several inequalities of the type Ci^ ( ^ W = a, f + C O ^ f + " i » ( ^ f ) c ^ ' = 0. J - CO J J - CO J

(54)

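6. Illustrative examples.

Example IV. Suppose that the admissible hypotheses assume the following probability law of the x's:

$$p(E \mid \xi) = p(x_1, \ldots x_n \mid \xi) = \left(\frac{1}{\sqrt{2\pi}}\right)^{n} e^{-\frac{1}{2} S (x_i - \xi)^2} \quad (-\infty < x_i < +\infty;\ i = 1, 2, \ldots n). \qquad (55)$$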

In other words the variables x are known to be independent and normally distributed with unit standard deviation, but the value of the mean is uncertain. The simple hypothesis, H0, to be tested assumes that ξ = ξ0, and it is desired to determine the unbiassed critical region of type A and size α for testing H0. It is shown in detail in the Appendix that the function defined in (55) satisfies the conditions of regularity justifying the use of the integrals of (18) in place of the differentials of (17). The unbiassed critical region of type A is therefore defined by the inequalities (19) and (20); more conveniently we may use the method of Section 5, working in terms of φ. We have

$$\log p(E \mid \xi) = -n \log\sqrt{2\pi} - \tfrac{1}{2} S (x_i - \xi)^2, \qquad (56)$$

$$\varphi = S (x_i - \xi_0), \qquad \varphi' = -n, \qquad (57)$$

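and consequently the condition (44) is satisfied. Writing S(x_i) = n x̄, we see that

$$\varphi = n(\bar{x} - \xi_0). \qquad (58)$$

Clearly we may substitute x̄ for φ in the relations (47), (50) and (51). It follows that the unbiassed critical region of type A is defined by the inequalities, say

$$\bar{x} \le \xi_0 - c_1' = \bar{x}_1, \qquad \bar{x} \ge \xi_0 + c_2' = \bar{x}_2, \qquad (59)$$

where

$$\int_{\bar{x}_1}^{\bar{x}_2} p(\bar{x})\,d\bar{x} = 1 - \alpha, \qquad (60)$$

$$\int_{\bar{x}_1}^{\bar{x}_2} (\bar{x} - \xi_0)\,p(\bar{x})\,d\bar{x} = 0. \qquad (61)$$

Since, if H0 be true, x̄ is known to be distributed normally, and therefore symmetrically, about ξ0 with a standard deviation of 1/√n, or

$$p(\bar{x}) = \sqrt{\frac{n}{2\pi}}\;e^{-\frac{1}{2} n (\bar{x} - \xi_0)^2}, \qquad (62)$$

it follows from (61) that c1' = c2' = c' = λ/√n, say. Hence the unbiassed critical region of type A is determined by the inequalities

$$\bar{x} \le \xi_0 - \lambda/\sqrt{n} \quad\text{and}\quad \bar{x} \ge \xi_0 + \lambda/\sqrt{n}, \qquad (63)$$

where from (60) and (62) it is seen that λ is obtained from the equation

$$\int_{\lambda}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2} x^2}\,dx = \tfrac{1}{2}\alpha \qquad (64)$$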

and is at once found, for any desired value of α, by using the tables of the normal probability integral. It is seen that in the present instance the test associated with the type A unbiassed critical region is precisely the test in common use based on equal "tail areas" of the sampling distribution of x̄. This is also the test which follows from the use of the likelihood ratio. It will also be noticed that since x̄ is a sufficient statistic for ξ, the type A region will be limited by surfaces upon which x̄ = constant (Proposition II). A knowledge of this boundary condition without the introduction of some further principle will not, however, suffice to determine which out of the infinite number of regions of size α that are limited by surfaces on which x̄ is constant should be selected. In Fig. 3 the power functions associated with six different critical regions of size α = 0.05, all bounded by surfaces x̄ = constant, are represented.* On the assumption that σ = σ0 is known,† any one of these regions would give control of the first kind of error if used to test the hypothesis H0 that ξ = ξ0 = 0. The regions (a) and (b) are what we have termed the best critical regions that would be appropriate to use in testing H0 if the admissible alternative hypotheses were confined to the set, Ω, for which (a) ξ > ξ0 or (b) ξ < ξ0. In the case (a), for example, where the value of the power function has only to be considered for ξ > ξ0, no critical region of size α can be found giving a power curve lying at any point above the curve shown in the diagram associated with the best critical region. This is true whether the alternative region is bounded by x̄ = constant or not.‡ On the other hand, if both classes of alternative hypotheses with ξ > ξ0 and ξ < ξ0 are admissible, as we have supposed in formulating Example IV, neither of the best critical regions is satisfactory; it is seen that, in one direction or the other, the ordinates of the power curve tend to zero as | ξ − ξ0 | increases.
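As an illustration of the passage above, the constant λ of the equal-tails region and the power functions of the two-sided and one-sided regions can be computed with the standard library in place of printed tables. This is a sketch only: the standardized shift delta = √n (ξ − ξ0) and all function names are assumptions introduced here, not notation from the paper.

```python
from statistics import NormalDist

# Under H0 the standardized mean sqrt(n) * (xbar - xi0) follows this law.
STD_NORMAL = NormalDist()


def two_sided_lambda(alpha):
    """Value of lambda for which the upper-tail area beyond lambda is alpha / 2."""
    return STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)


def power_two_sided(delta, alpha):
    """Power of the equal-tails region when sqrt(n) * (xi - xi0) = delta."""
    lam = two_sided_lambda(alpha)
    # Reject when the standardized mean is <= -lam or >= +lam.
    return STD_NORMAL.cdf(-lam - delta) + 1.0 - STD_NORMAL.cdf(lam - delta)


def power_upper_tail(delta, alpha):
    """Power of the one-sided region: standardized mean >= z with tail area alpha."""
    z = STD_NORMAL.inv_cdf(1.0 - alpha)
    return 1.0 - STD_NORMAL.cdf(z - delta)


if __name__ == "__main__":
    alpha = 0.05
    print("lambda:", two_sided_lambda(alpha))
    for delta in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(delta, power_two_sided(delta, alpha), power_upper_tail(delta, alpha))
```

For positive shifts the one-sided region has the higher power, while for negative shifts its power falls below α; the equal-tails region has power above α on both sides, in line with the behaviour described in the text.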
Regions (c), (d), (e) and (f) all satisfy the conditions (11) and (15), and the first three are unbiassed, i.e. satisfy (12) also. Clearly region (f) is of no practical value, since the power of the test is less than α for every admissible alternative hypothesis. Region (c) is the unbiassed critical region of type A satisfying (16). As will be pointed out in Section 8 the region (c) is also of the form there defined as type A1. This is because besides satisfying (11) and (15) it satisfies condition (14) if the

* The boundaries for the regions are: (a) +1.6449, (b) -1.6449, (c) -1.9600 and +1.9600, (d) -2.3263, -0.0376 and +0.0376, +2.3263, (e) -2.0537 and -1.6954, +1.6954 and +2.0537, (f) -0.5388 and -0.4677, +0.4677 and +0.5388.
† In the example just discussed it was supposed that

(88)

Solving (87) and (88) with regard to pe' (E | 60) and pe" (E \ 0O), substituting into (19), (20) and (22) and multiplying (22) by P{E*w\ei}

(94)

for every other region w satisfying (92) and (93) and for every admissible value θ1 ≠ θ0. Since equations (92) and (93) are identical with (11) and (15), the specification of the type A1 region differs from that of the type A region in the substitution of condition (94) in place of (16). It is felt intuitively and may be rigorously proved that if regions of type A1 and type A both exist they must be identical, but even if the elementary probability law p(E | θ) is regular and admits of differentiation under the integral representing P{E ∈ w | θ}, type A1 may not exist. If, however, this region does exist, it possesses the important property that its power with regard to any alternative hypothesis will exceed that of any other unbiassed region of size α satisfying (93).


It is also probable that in very many cases the test of H0 based on a type A1 region will be the uniformly most powerful unbiassed test in the sense defined on p. 9 above, i.e. will satisfy condition (14). It must be remembered, however, that the condition (94) is not identical with (14); in the former the alternative regions w satisfy (92) and (93) (i.e. (11) and (15)), while in the latter they satisfy (11) and (12). It may be possible (in fact one example of this kind is known) for an unbiassed region, say w0', of size α to exist for which the derivative of P{E ∈ w0' | θ} does not exist at θ = θ0, but which is nevertheless uniformly more powerful than the type A1 region, w0. The situation is suggested in the following diagram, where w0' is described as of type A2.

[Fig. 6. Power curves for regions of type A1 (w0) and type A2 (w0'). Vertical axis: power of test; horizontal axis: scale of θ, with θ0 marked.]

The relation between these regions clearly needs fuller discussion and illustration. Here it will suffice to indicate the method by which the unbiassed critical region of type A1, if it exists, may be determined. We shall assume that for any region w the first derivative of the power function given in (17) may be represented by the integral of (18). Then using the Lemma of p. 10 it is not difficult to show that the region, say w0(θ1), which satisfies (92), (93) and (94) for a fixed alternative value, θ = θ1, is such that within w0(θ1)

$$p(E \mid \theta_1) \ge k_1\,p_\theta'(E \mid \theta_0) + k_2\,p(E \mid \theta_0) \qquad (95)$$

and outside w0(θ1)

$$p(E \mid \theta_1) \le k_1\,p_\theta'(E \mid \theta_0) + k_2\,p(E \mid \theta_0), \qquad (96)$$

where k1 and k2 are two constants to be determined so as to satisfy (92) and (93). If the region w0(θ1) so obtained is the same for every admissible alternative θ1 ≠ θ0, that is to say, is independent of θ1, then it will be an unbiassed critical region of type A1.


Example Va. We shall now show that the unbiassed critical region of type A found in Example V when testing the hypothesis that σ = σ0 has the property of being also of type A1. Using the probability law for the x's given in (65), it is found that the inequality (95) may be written in the form (97), where k1 and k2 are certain functions of σ1 to be chosen so that w0(σ1) satisfies (92) and (93). Writing as before S(x_i²) = vσ0² and rearranging, the inequality (97) may be written

$$e^{cv} \ge a + bv, \qquad (98)$$

where a and b are functions of k1 and k2 as well as of σ1. It follows that the region w0(σ1) is limited by surfaces on which v has certain constant values, and that instead of finding k1 and k2 in (97) so as to satisfy (92) and (93), we may determine values for a and b in (98) so as to satisfy the same conditions. We shall now show that the inequality (98) may be made equivalent to two inequalities

$$v \le v_1 \quad\text{and}\quad v \ge v_2, \qquad (99)$$

where v1 and v2 have the values found previously in Example V, and therefore satisfy (92) and (93), which in this special case assume the form (68) and (72). If we substitute these two values of v alternatively into both parts of (98) and join them with the sign of equality, we obtain

$$e^{cv_1} = a + bv_1, \qquad e^{cv_2} = a + bv_2. \qquad (100)$$

If the values of a and b obtained by solving (100) are substituted into (98), it may be shown that an inequality is obtained which is satisfied by values of v which satisfy (99) and by these only. To prove this consider the function, say

$$y(v) = e^{cv} - a - bv. \qquad (101)$$

We shall have y(v1) = y(v2), which shows that the function y(v) must have at least one extremum between v1 and v2. Taking the derivatives

$$y'(v) = ce^{cv} - b, \qquad y''(v) = c^2 e^{cv} > 0, \qquad (102)$$

we see that y(v) has only one extremum and that it is a minimum. It follows that y(v) > 0 for v < v1 and v > v2, and y(v) < 0 for v1 < v < v2.
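The convexity argument built on (100), (101) and (102) can be illustrated numerically. The sketch below solves the two equations (100) for a and b and evaluates y(v); the particular values of c, v1 and v2 used in the demonstration are hypothetical and are not those of Example V.

```python
import math


def chord_coefficients(c, v1, v2):
    """Solve equations (100) for a and b: exp(c*v) = a + b*v at v = v1 and v = v2."""
    e1, e2 = math.exp(c * v1), math.exp(c * v2)
    b = (e2 - e1) / (v2 - v1)
    a = e1 - b * v1
    return a, b


def y(v, c, a, b):
    """The function (101): y(v) = exp(c*v) - a - b*v."""
    return math.exp(c * v) - a - b * v


if __name__ == "__main__":
    c, v1, v2 = 0.3, 2.0, 9.0  # hypothetical values, chosen only for illustration
    a, b = chord_coefficients(c, v1, v2)
    for v in (1.0, v1, 5.0, v2, 10.0):
        print(v, y(v, c, a, b))
```

Because y'' is positive for any non-zero c, the straight line a + bv cuts the exponential curve in exactly the two points v1 and v2, so y is negative between them and positive outside, whichever sign c has.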

v1 0 that \pg"'(E\9)\ 0, then the point E' will be within the region W0 (0o + AO) and also within the region W+ (0o — A8), and therefore we shall have p(E'\eo

and

+ Ad) = 0,

p(E'\eo-A0)

lim p (E' \ 60 + AO) = 0,

A8->0

= e-~^i-eo+^

(132)

lim p (E' \ 0o - AO) = - . es

A0-5-O

(133)

It follows that p(E' | θ) considered as a function of θ is not even continuous at the point θ = θ0. Thus the derivative p_θ'(E' | θ0) does not exist and the validity conditions for the solution given above are not satisfied. As a result, for certain regions w in the sample space, the integrals (18) do not represent the derivatives


(17). It is easily seen that such will be regions including points belonging both to Wo(0) and W+(B). To make the position quite clear, we may consider the case when n = 2. Take for the region w that, in which x l + x 2 ^ a = constant. (134) Assume that 9 < a and calculate the probability of the sample point falling within w. This is given by P{Eew I 9}=je-J" = 1

e-tot-°>dxtJ dx,

_ e -_( a _ 0 ) ^ - 2 0 ) .

(135)

The derivative of this probability with regard to 9 < a exists, namely ~ P {Eew | 9} = e-(*i> —®»)|A|,

(6)

where there should be substituted on the right-hand side of (6), instead of the x's, their expressions in terms of the y's obtained from (4).

3. The testing of statistical hypotheses.

Any assumption concerning the probability law of a set of variables is called a statistical hypothesis. This is called simple if it specifies the probability law completely; otherwise it is called composite. Any test of a statistical hypothesis may be regarded as a rule for the rejection of the hypothesis tested when the

* I.e. except perhaps in a set of points of measure zero.
† The above terminology, e.g. sample point and sample space, etc., may suggest that the x's we consider denote necessarily independent observations of one or more random variables. In such case p(x1, ... xn | θ1, ... θl) would be a product of several functions of similar form and differing only in their arguments. This is a limitation, which is frequently assumed. However, the nature of the problems considered below does not require this limitation in any way and the reasoning which follows applies equally in cases where the x's are mutually dependent or not.


observed sample point falls within a specified region, w, called the critical region, and for its acceptance in other cases.* In testing statistical hypotheses errors of two kinds may be made: (1) We may reject the hypothesis tested when it is true. (2) We may accept it when it is false. The probability of the first kind of error determined by the hypothesis tested, say H, has been called the size of the corresponding critical region,† and is given by P{E ∈ w | H}. Two tests based on critical regions of the same size have been called equivalent. The probability of rejecting H when the true hypothesis is an alternative simple hypothesis, say H', or P{E ∈ w | H'}, has been termed the power of the test with regard to H'. The most powerful test for H with regard to H' is the test whose power is greater than that of any other equivalent test, and the critical region yielding the most powerful test with regard to H' has been termed the best critical region for H with regard to H'. If we denote by Ω the set of all simple hypotheses which are considered admissible, then the test of a hypothesis, H0, has been called uniformly most powerful with regard to Ω, if it is the most powerful test with regard to every hypothesis included in Ω, alternative to H0. Write p(x1, ... xn | H0) and p(x1, ... xn | H') for the probability laws of the x's determined by two simple hypotheses, H0, that which is tested, and H', an alternative. In recent papers we have discussed the problem of the most powerful tests of statistical hypotheses,‡ and have shown that the region w0 defined by the inequality

$$p(x_1, \ldots x_n \mid H') \ge k\,p(x_1, \ldots x_n \mid H_0), \qquad (7)$$

where k > 0 is a constant chosen so that

$$P\{E \in w_0 \mid H_0\} = \alpha, \qquad (8)$$

is the best critical region with regard to H' having size α. The existence of a uniformly most powerful test with regard to the whole class of alternatives depends on the inequality (7). If after substituting the proper value of k determined from (8), the inequality is independent of H', then the region defined by (7) is the common best critical region with regard to the whole class of alternatives. But it may happen that (7) depends on H', and in this case no uniformly most powerful test exists. We have given examples of both situations. The problem of the most powerful tests, in the case where all admissible hypotheses are associated with a probability law of the x's of the same form, differing only in the values of certain parameters involved in the probability law, has been

* Sometimes it is true we may decide to remain in doubt, and in this case the rule will indicate a division of W into three instead of two regions.
† J. Neyman and E. S. Pearson, Phil. Trans. Roy. Soc. A, vol. ccxxxi (1933), p. 289; Proc. Camb. Phil. Soc. vol. xxix, part 4 (1933), p. 492.
‡ Loc. cit.

Sufficient statistics and uniformly most powerful tests

discussed by R. A. Fisher,* who connected it with the theory of the so-called "sufficient" statistics. He stated that in cases where a uniformly most powerful test exists: (1) A sufficient statistic must exist and be constant on the boundary of the corresponding best critical region. (2) The number l of independent parameters which are specified by the alternative hypotheses must be one only. In the present paper we propose to discuss the same question rather more fully. It would appear that the original conception of a sufficient statistic needs some extension and classification, and that Fisher's argument must be modified, since his statements just quoted, if taken in full generality, appear to be inexact. We should like to emphasize at this point that we are not concerned in the discussion which follows with the use of sufficient statistics in problems of estimation, but rather with their bearing on the theory of testing statistical hypotheses. In treating this latter problem we have found it necessary to use not only the conception of sufficient statistics as introduced by R. A. Fisher, but to introduce also some new conceptions which, as far as we are aware, have not been considered before, namely the conceptions of a sufficient set of statistics and of a shared sufficient statistic. Again, we are not concerned with the bearing on Fisher's theory of estimation of the new functions which we define. We believe that our definition of a "specific sufficient statistic" corresponds to Fisher's conception of a sufficient statistic, but though he has written on sufficient statistics in several places the definitions he has given appear, in our opinion, to leave some room for misunderstanding. In his paper "On the mathematical foundations of theoretical statistics"† there is a section headed "Definitions" in which the following is found: Sufficiency.
A statistic satisfies the criterion of sufficiency when no other statistic which can be calculated from the sample provides any additional information as to the value of the parameter estimated.

This definition contains the term "information" which is not previously defined. Later, however, in the same paper (p. 316) Fisher writes: The complete criterion suggested by our work on the mean square error‡ is: That the statistic chosen should summarize the whole of the relevant information supplied by the sample. This may be called the Criterion of Sufficiency. In mathematical language we may interpret this statement by saying that if θ be the parameter to be estimated, θ1 a statistic which contains the whole of the information as to the value of θ, which the sample supplies, and θ2 any other statistic, then the surface of distribution of pairs of

* Proc. Roy. Soc. A, vol. cxliv (1934), p. 285.
† R. A. Fisher, Phil. Trans. Roy. Soc. A, vol. ccxxii (1922), p. 309.
‡ R. A. Fisher, "A mathematical examination of the methods of determining the accuracy of an observation by the mean error and by the mean square error", Monthly Notices of R.A.S. vol. lxxx (1920), p. 758.


values of θ1 and θ2, for a given value of θ, is such that for a given value of θ1, the distribution of θ2 does not involve θ. In other words, when θ1 is known, knowledge of the value of θ2 throws no further light upon the value of θ.

In our notation θ1 and θ2 are T1 and T2. This we have regarded as Fisher's mathematical definition of a sufficient statistic and we believe that it is equivalent to our Definition II. We have hitherto defined the sample space, W, as the whole space of the x's, including that is to say every point specified by a system of real values of x1, ... xn. It will, however, be convenient in what follows to use the same expression in a somewhat narrower sense. The space W to which the following propositions relate will be defined as the set of points in which the probability law p(x1, ... xn | θ1, ... θl), as determined by at least one of the admissible hypotheses, is not zero. It will be seen that with this definition the sample space may be limited. In fact, it will not contain points, say x1', ... xn', in which p(x1', ... xn' | θ1, ... θl) = 0 for every set of values of the θ's specified by the different admissible hypotheses.

4. Definitions and properties of sufficient statistics.

We shall start by defining what we shall term a statistic and more particularly a sufficient statistic. We shall need to consider sufficient statistics of different classes, the first of which is, we believe, the class originally defined by Fisher.

Definition I. If a function, T, of random variables x1, ... xn possesses the following properties:
(a) T is defined and single valued at almost every point of the sample space W,
(b) whatever be a number T', the locus of points in the sample space in which T < T' is such that the probability law of the x's may be integrated over it, giving the probability P{T < T'},*
(c) there exist such values, T', that the locus of points, W(T'), in which T = T' is of at least (n − 1) dimensions, i.e. one less than the number of dimensions of the sample space W,
(d) T does not depend upon any unknown parameters which may be involved in the probability law,
then it will be called a statistic.

In formulating this definition we have tried to cover the case of statistics which are, at present, in common use. Frequently they are continuous functions possessing derivatives of all orders at all sample points. Such, for instance, are the mean $\bar{x} = \sum_{i=1}^{n} (x_i)/n$ and the variance $s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2/n$. On the other hand, "Student's" ratio z = x̄/s is not continuous and has no sense at any point of the line x1 = x2 = ... = xn. This shows that it would not be reasonable to apply the

* Such functions are called "measurable".


term statistic to continuous functions only. It may, however, be useful to require continuity and differentiability almost everywhere in W, but we do not insist on this point. We have thought it necessary to require that condition (b) should be fulfilled, since statistics for which it is impossible to calculate the probability laws would be of no practical value. The meaning of the condition (c) may be illustrated on the following example. Consider the locus of points, say W(x̄, s), in which both x̄ and s are constant. This is the intersection of the hypercylinder, s² = constant, and the prime, x̄ = constant, and its dimensions are therefore (n − 2). Now if T is a function of the x's such that the locus of points in which it is constant is W(x̄, s), then we shall not consider it as a statistic. On the other hand, we shall consider W(x̄, s) as determined by the values of two statistics, x̄ and s. Clearly for a function, T, to be a statistic we could not require that for any value T' the locus W(T') be of (n − 1) dimensions, as such a condition would eliminate a number of statistics in common use. For instance, for any number T' < 0 the locus of points in which $s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2/n = T'$ does not contain points at all. Further, the locus of points in which s² = 0, represented by the straight line x1 = x2 = ... = xn, is only of one dimension. Again, it would not be reasonable to require that the locus, W(T), be of exactly (n − 1) dimensions. In a number of practical problems we do, in fact, consider statistics which are such that the locus is of n dimensions. To illustrate this point consider the case where we have n independent observations, x1, x2, ... xn, of a single random variable which follows a normal law with known standard deviation. For the purpose of testing hypotheses regarding the mean associated with this law and for the purpose of its estimation, we can and do sometimes use a statistic T which is equal to the number of the x's having a value greater than some fixed value, say X.* The locus W(T) in the sample space in which this statistic is constant is of n dimensions, i.e. of the same number as the sample space. This point can be made clear by considering the case n = 2, as suggested in Fig. 1. Here the sample space is represented by the plane of x1 and x2, and T assumes constant values of (i) 0, (ii) 1, and (iii) 2, respectively, for (i) x1 ≤ X, x2 ≤ X, (ii) x1 ≤ X, x2 > X, or x1 > X, x2 ≤ X, and (iii) x1 > X, x2 > X. All four of these loci are of two dimensions. In the condition (d) a distinction is made between an unknown parameter, say θ, and its definite value, say θ = θ0, specified by some hypothesis. The object of this limitation is merely to make sure that it will be possible to calculate the value of T at almost all points of the sample space. If, for example, we write the expression

* This is usually done when, for some reason, we are unable to ascertain the actual values of the x's but can count the number of those which exceed the value X.


z = (x̄ − θ)/s and state that θ is the unknown population mean of the x's, x̄ being the mean and s² the variance in a sample, then we are unable to calculate the value of z, although x̄ and s are known. Such an expression will not be called a statistic. On the other hand, if we substitute into z instead of the "unknown parameter θ" its numerical value, say θ0 = 1, specified by some hypothesis, then z = (x̄ − 1)/s becomes a function of the x's determined in every point of the sample space (except in a set of measure zero for which x1 = x2 = ... = xn), and we shall call it a statistic. It may be noticed here that some writers use the words "statistic" and "estimate" as synonymous. In our view a distinction should be made, the term "estimate" being used only with regard to such statistics which, for one reason or another, are selected for estimating the value of unknown parameters involved in the probability law of the random variables considered. As we are, however, not concerned here with problems of estimation we need not enter into any details concerning "estimates". We hope that as a result of this somewhat lengthy explanation the meaning of our Definition I will be understood and further that it will be found to be not too narrow.

[Fig. 1. The sample space for n = 2, with axes x1 and x2, divided by the lines x1 = X and x2 = X into four regions in which T = 0, T = 1 (two regions) and T = 2.]

Definition II. The statistic T1 is called a specific sufficient statistic with regard to the parameter θ1 if, whatever other statistic T2 be taken, the relative probability law p(T2 | T1) of T2, given T1, is independent of θ1. This we believe to correspond with Fisher's original definition of a sufficient statistic. We have added the adjective "specific" for convenience in comparison.

Definition III. The statistic T1 is called a shared sufficient statistic of the parameters θ1, ... θq if, whatever other statistic T2 be taken, the relative probability law of T2, given T1, is independent of these q parameters, while it depends on the remaining l − q parameters θ(q+1), ... θl.


It will be noticed that the conception of a shared sufficient statistic is narrower than that of a specific sufficient statistic. In fact, if $T_1$ is a shared sufficient statistic of say two parameters $\theta_1$ and $\theta_2$, then it will satisfy the definition of a specific sufficient statistic of $\theta_1$ or $\theta_2$ taken separately. On the other hand, a statistic which is known to be specifically sufficient with regard to $\theta_1$ need not be a shared sufficient statistic of $\theta_1$ and $\theta_2$.

Definition IV. A set of $m$ algebraically independent statistics $T_1, \dots T_m$ (i.e. such that none of them can be presented as a function of the others) is called a sufficient set of statistics with regard to parameters $\theta_1, \dots \theta_q$ if, whatever be any other statistic, $T$, the relative probability law of $T$, given $T_1, \dots T_m$, is independent of $\theta_1, \dots \theta_q$.

It will be seen that, starting with this definition, it would be possible to carry on the subdivision of the conception of a sufficient set of statistics. We shall, however, leave this subject to be dealt with elsewhere.

Neyman has recently proved the following proposition:*

Proposition I. The necessary and sufficient condition for a statistic to be specifically sufficient with regard to a parameter $\theta$ (in the sense of Definition II) is that in any point of the sample space (as defined on p. 117), except perhaps for a set of measure zero, it should be possible to present the probability law of the $x$'s in the form of the product

$$p(x_1, \dots x_n \mid \theta) = p(T \mid \theta)\,\phi(x_1, \dots x_n)\Big|_{T = T(x_1, \dots x_n)}, \tag{9}$$

where $p(T \mid \theta)$ denotes the probability law of $T$, and $\phi$ is a function of the $x$'s, independent of $\theta$.

This necessary and sufficient condition may also be put in the following form:

Proposition Ia. The necessary and sufficient condition for a statistic $T$ to be specifically sufficient with regard to a parameter $\theta$ is that it should be possible to present the probability law of the $x$'s in the form of a product

$$p(x_1, \dots x_n \mid \theta) = f(T, \theta)\,\phi(x_1, \dots x_n), \tag{10}$$

where the function $f$ depends upon $T$ and $\theta$, and $\phi$ does not involve $\theta$, the above equality holding good in all points of the sample space, except perhaps in a set of points of measure zero.†

* Giornale dell'Istituto Italiano degli Attuari, vol. vi, no. 4 (1935). See also G. Darmois, Comptes Rendus, t. cc (1935), p. 1265.

† The sufficiency of the conditions expressed in Propositions I and Ia is evident, and it has been used by Fisher (e.g. in Proc. Roy. Soc. A, vol. CXLIV (1934), p. 289). The necessity of the conditions is not, however, so evident, and this forms the main topic of Neyman's paper quoted. The proof there given concerns both the case of continuous and discontinuous variables. In the latter case the equations (9) or (10) must hold good in all points where $p(x_1, \dots x_n \mid \theta)$ is not zero. In the case where the variables are continuous it was also required that $\partial T/\partial x_i$ $(i = 1, \dots n)$ should exist and be continuous, and that at least one of these derivatives should differ from zero in almost the whole sample space. It is believed, however, that these limitations are not essential for the validity of the theorem.
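A factorization of the form (10) has a consequence that is easy to check numerically: at two sample points where $T$ takes the same value, the ratio of the probability laws equals the ratio of the two values of $\phi$ and so cannot change with $\theta$. The sketch below illustrates this under assumptions that are not taken from the paper: independent normal observations with unit variance and unknown mean $\theta$, with $T$ the sample mean. The function names are hypothetical.

```python
import math

def normal_joint_density(xs, theta):
    """Joint density of independent N(theta, 1) observations (illustrative law)."""
    n = len(xs)
    q = sum((x - theta) ** 2 for x in xs)
    return (2.0 * math.pi) ** (-n / 2.0) * math.exp(-0.5 * q)

def density_ratio(xs, ys, theta):
    """Ratio of the joint densities at two sample points for the same theta."""
    return normal_joint_density(xs, theta) / normal_joint_density(ys, theta)

xs = [0.0, 1.0, 2.0]   # sample mean 1
ys = [1.0, 1.0, 1.0]   # sample mean 1, same value of T as xs
zs = [0.0, 0.0, 0.0]   # sample mean 0, different value of T

thetas = (-1.0, 0.0, 2.5)
same_T = [density_ratio(xs, ys, th) for th in thetas]   # constant in theta
diff_T = [density_ratio(xs, zs, th) for th in thetas]   # varies with theta
```

When the two points share the same mean the ratio is the same for every trial value of $\theta$; when the means differ it is not. This is only a check of the necessary consequence of (10) on a toy law, not a proof of sufficiency.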


J . N E Y M A N AND E . S . PEARSON

We may now state the following further propositions:

Proposition II. For a statistic $T$ to be a shared sufficient statistic of parameters $\theta_1, \dots \theta_q$, it is necessary and sufficient that in almost every point of $W$, it should be possible to present the probability law of the $x$'s in the form of the product

$$p(x_1, \dots x_n \mid \theta_1, \dots \theta_q, \dots \theta_l) = p(T \mid \theta_1, \dots \theta_q)\,\phi(x_1, \dots x_n; \theta_{q+1}, \dots \theta_l)\Big|_{T = T(x_1, \dots x_n)}, \tag{11}$$

where $p(T \mid \theta_1, \dots \theta_q)$ is the probability law of $T$, and the function $\phi$ does not depend upon $\theta_1, \dots \theta_q$.

Proposition III. For a set of algebraically independent statistics $T_1, \dots T_m$ to be a sufficient set with regard to the parameters $\theta_1, \dots \theta_q$, it is necessary and sufficient that in almost every point of $W$ it should be possible to present the probability law of the $x$'s in the form of the product

$$p(x_1, \dots x_n \mid \theta_1, \dots \theta_q, \dots \theta_l) = p(T_1, \dots T_m \mid \theta_1, \dots \theta_q)\,\phi(x_1, \dots x_n; \theta_{q+1}, \dots \theta_l), \tag{12}$$

where $p(T_1, \dots T_m \mid \theta_1, \dots \theta_q)$ is the probability law of $T_1, \dots T_m$ and the function $\phi$ does not depend upon $\theta_1, \dots \theta_q$.

Here again equivalent forms of the conditions expressed in Propositions II and III will be obtained if instead of $p(T \mid \theta_1, \dots \theta_q)$ and $p(T_1, \dots T_m \mid \theta_1, \dots \theta_q)$ in equations (11) and (12) were substituted any functions $f_1(T, \theta_1, \dots \theta_q)$ and $f_2(T_1, \dots T_m, \theta_1, \dots \theta_q)$ respectively, not necessarily probability laws. Propositions II and III in such transformed form will be referred to as Propositions IIa and IIIa respectively.

Since the method of proof of Propositions II and III is identical with that given by Neyman for Proposition I, we shall omit the proof and refer the reader to the publication quoted. It should be noticed that while in the present paper we consider only the case where the variables $x_1, \dots x_n$ are continuous, the Definitions II-IV and the Propositions I-III hold good in the most general case, except that for discontinuous variables the equalities (11) and (12) must hold good in all points where the left-hand side is positive.
In the course of the following pages examples will be given illustrating these definitions of sufficient statistics. Our main purpose is, however, to consider what connection exists between such statistics and uniformly most powerful tests. In doing so we shall limit ourselves to the case where the hypothesis tested is simple, and specifies the value of one or more parameters involved in the probability law of the $x$'s, whose functional form is assumed known.* We shall start with the assumption that a uniformly most powerful test with regard to a class $\Omega$ exists in the general case where the alternative hypotheses of $\Omega$ depend upon several, say $q$, independent parameters. Since, however, this assumption contradicts the second of Fisher's statements quoted on p. 116, we shall start by giving an example where it is true.

* For the definition of a simple hypothesis see p. 114.


5. The existence of uniformly most powerful tests when the alternative hypotheses depend upon more than one independent parameter.

Example I. Suppose that each of the independent random variables, $x_i$, follows the probability law

$$p(x_i) = \beta e^{-\beta(x_i - \gamma)} \quad \text{for } x_i \ge \gamma, \tag{13}$$

$$p(x_i) = 0 \quad \text{for } x_i < \gamma. \tag{14}$$

Fig. 2. Example with exponential law; the best critical region is shaded.

We shall assume that the class $\Omega$ is composed of hypotheses for which $\gamma \le \gamma_0$ and $\beta \ge \beta_0 > 0$, $\gamma$ and $\beta$ being thus two independent parameters. The simple hypothesis to test, $H_0$, assumes that $\gamma = \gamma_0$, $\beta = \beta_0$. Any alternative, $H'$, will thus assume either that $\gamma$ [...]

(16)


in the part of the sample space, say $W_+(\gamma)$, determined by the inequalities

$$x_i \ge \gamma \quad (i = 1, 2, \dots n), \tag{16}$$

and will be zero in the remainder of $W$, which we shall denote by $W_0(\gamma)$. To simplify notation we shall write below $p_0$ for $p(x_1, \dots x_n \mid \gamma_0, \beta_0)$ and, if either $\gamma \ne \gamma_0$ or $\beta \ne \beta_0$ or both, $p_1$ for $p(x_1, \dots x_n \mid \gamma, \beta)$. The best critical region, $w$, for any alternative $H'$ will be defined by the inequality

$$p_1 \ge k p_0, \tag{17}$$

where $k$ is a constant to be determined, so that $P\{E \in w \mid H_0\} = \alpha$. It will be seen that $w$ must contain all points of $W_0(\gamma_0)$, i.e. in which $p_0 = 0$, since all such points satisfy (17) whatever be the corresponding value of $p_1$. Besides these points of $W_0(\gamma_0)$, the best critical region will contain certain points of $W_+(\gamma_0)$ in which $p_0 > 0$. Since $\gamma_0 \ge \gamma$, then $p_1 > 0$, whenever $p_0 > 0$, or in other words whenever $p_0$ is of the form (13), $p_1$ must be of similar form. It follows that the part of the region $W_+(\gamma_0)$ included in the best critical region is defined by

$$\beta^n e^{-n\beta(\bar{x} - \gamma)} \ge k\,\beta_0^n e^{-n\beta_0(\bar{x} - \gamma_0)}, \tag{18}$$

$k$ being a constant to be determined later, and $\bar{x}$ denoting the mean of the $n$ values of $x$. The formula (18) is equivalent to

$$(\beta - \beta_0)\,\bar{x} \le \gamma\beta - \gamma_0\beta_0 - \frac{1}{n}\log k + \log(\beta/\beta_0). \tag{19}$$

According to the conditions of our problem $\beta$ may be either equal to or greater than $\beta_0$. In the first case the inequality (18) does not imply any limitation of the shape of the part of $W_+(\gamma_0)$ to be included in the best critical region, which may therefore be chosen arbitrarily, subject to the unique condition that

$$P\{E \in w \mid H_0\} = \alpha.$$

In the case where $\beta > \beta_0$ the inequality (19) may be written

$$\bar{x} \le \left\{\gamma\beta - \gamma_0\beta_0 - \frac{1}{n}\log k + \log(\beta/\beta_0)\right\} \Big/ (\beta - \beta_0) = k_1 \text{ (say)}, \tag{20}$$

and does imply a limitation. This limitation is, however, of the same character whatever the values of $\gamma$ and $\beta$ specified by the alternative hypothesis, $H'$, included in $\Omega$, provided $\beta > \beta_0$. Using the known distribution of the mean in a sample from an exponential distribution* and the Tables of the Incomplete Gamma Function,† it is easy to determine the $k_1$ of (20) such that $P\{E \in w \mid \gamma_0, \beta_0\} = \alpha$. This quantity, $k_1$, will be a function of $\alpha$, $\gamma_0$ and $\beta_0$ only, so that the part of $W_+(\gamma_0)$, defined by (20), to be included in the best critical region depends on the parameters $\gamma_0$ and $\beta_0$ specified by $H_0$, but is independent of the alternative $H'$.

* See for example Neyman and Pearson, Biometrika, vol. xx A (1928), p. 223:

$$p(\bar{x} \mid \gamma_0 \beta_0) = \frac{(n\beta_0)^n}{\Gamma(n)}\,(\bar{x} - \gamma_0)^{n-1} e^{-n\beta_0(\bar{x} - \gamma_0)}.$$

† Karl Pearson, Tables of the Incomplete Gamma Function.
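The determination of $k_1$ described above can be sketched numerically. Under $H_0$ the quantity $\bar{x} - \gamma_0$ follows a gamma law with index $n$ and rate $n\beta_0$, and the region $W_0(\gamma_0)$ has probability zero, so $k_1$ solves $P\{\bar{x} \le k_1 \mid \gamma_0, \beta_0\} = \alpha$. The code below is a minimal illustration, not the authors' procedure: it replaces the table look-up by bisection, uses the closed form of the incomplete gamma function for integer $n$, and the function names `gamma_cdf_int` and `k1` are hypothetical.

```python
import math

def gamma_cdf_int(n, rate, t):
    """P{Y <= t} for Y following a gamma law with integer index n and the given rate."""
    if t <= 0:
        return 0.0
    term, total = 1.0, 1.0          # j = 0 term of sum_{j<n} (rate*t)^j / j!
    for j in range(1, n):
        term *= rate * t / j
        total += term
    return 1.0 - math.exp(-rate * t) * total

def k1(alpha, n, gamma0, beta0, tol=1e-12):
    """Solve P{xbar <= k1 | gamma0, beta0} = alpha by bisection."""
    rate = n * beta0                # xbar - gamma0 has index n and rate n*beta0 under H0
    lo, hi = 0.0, 1.0
    while gamma_cdf_int(n, rate, hi) < alpha:
        hi *= 2.0                   # widen the bracket until it contains the root
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gamma_cdf_int(n, rate, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return gamma0 + 0.5 * (lo + hi)
```

The arguments of `k1` are $\alpha$, $n$, $\gamma_0$ and $\beta_0$ only; no parameter of the alternative enters, which is the property the text relies on.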


It follows that if we include in the best critical region (1) the whole of the region $W_0(\gamma_0)$, where $p_0 = 0$, (2) the part of $W_+(\gamma_0)$ (in which $p_0 > 0$) determined by the inequality (20), we obtain a region, say $w_0$, which is a best critical region for $H_0$ with regard to any alternative $H'$ of $\Omega$ specifying the two independent parameters $\gamma$ and $\beta$, provided that $\Omega$ is restricted to the class $\gamma \le \gamma_0$, $\beta \ge \beta_0$. We conclude that the existence of a uniformly most powerful test with regard to a class, $\Omega$, of simple hypotheses does not require that the number of independent parameters specified by the alternatives should be necessarily equal to one. The statement to the contrary, as made by Fisher, is therefore in general not correct.

6. Conclusions to be drawn from the existence of a uniformly most powerful test.

Suppose that the probability law $p(x_1, \dots x_n \mid \theta_1, \dots \theta_l)$, or for short $p_1(E)$, where $x_1, \dots x_n$ are the coordinates of the point $E$, depends upon $l$ independent parameters $\theta_1, \dots \theta_l$, whose values for all admissible hypotheses of a set $\Omega$ are contained in certain intervals, limited or not. The hypothesis tested, $H_0$, specifies one such system of values, $\theta_1^0, \dots \theta_l^0$, and to shorten the notation we shall write $p_0(E)$ for $p(x_1, \dots x_n \mid \theta_1^0, \dots \theta_l^0)$. Denote by $W_+$ and $W_0$ the parts of the sample space $W$ in which $p_0(E) > 0$ and $p_0(E) = 0$ respectively. Assume that there exists a uniformly most powerful test, and denote by $\alpha$ the size of the corresponding best critical region, $w(\alpha)$. This region, besides including all points of $W_0$, in which $p_0 = 0$, will contain a part, say $v(\alpha)$, in which $p_0 > 0$. It is clear that in any point, $E$, of $v(\alpha)$ we must have $p_1(E) > 0$ for every system of values of the $\theta$'s, for otherwise $E$ could not be included in the best critical region within which it is known that $p_1 \ge k p_0$. $v(\alpha)$ will be called the positive part of the best critical region.
We shall say that $E$ is a point belonging to the positive boundary $L(\alpha)$ of $w(\alpha)$ if: (a) $E$ belongs to $v(\alpha)$, (b) in the vicinity of $E$ there is at least one point $E'$ belonging to $w(\alpha)$, and at least one point $E''$ lying outside $w(\alpha)$, both $E'$ and $E''$ being different from $E$.

Proposition IV. If $w(\alpha)$ is the best critical region for $H_0$ common to all alternatives included in $\Omega$, and if $E_1$ and $E_2$ are any two different points of the positive boundary $L(\alpha)$, then, for every fixed system of values of $\theta_1, \dots \theta_l$, we shall have

$$p_1(E_1)/p_0(E_1) = p_1(E_2)/p_0(E_2). \tag{21}$$

Fix a system of admissible values of the $\theta$'s, say $\theta_i = \theta_i'$ $(i = 1, \dots l)$. The best critical region, $w(\alpha)$, being common with regard to all admissible hypotheses, there must exist a constant, say $k(\theta_1', \dots \theta_l')$, such that

$$p_1(E) \ge k(\theta_1', \dots \theta_l')\,p_0(E) \quad \text{if } E \text{ is within } w(\alpha), \tag{22}$$

$$p_1(E) \le k(\theta_1', \dots \theta_l')\,p_0(E) \quad \text{for all points } E \text{ outside } w(\alpha). \tag{23}$$


If $E_i$ $(i = 1, 2)$ lies on the positive boundary $L(\alpha)$, then both $p_0(E_i)$ and $p_1(E_i)$ are positive in $E_i$ and therefore continuous. Since in the vicinity of $E_i$ there must be a point, $E_i'$, lying within $w(\alpha)$ where (22) is true, and also a point $E_i''$ where (23) is true, it follows that

$$p_1(E_i) = k(\theta_1', \dots \theta_l')\,p_0(E_i) \quad (i = 1, 2), \tag{24}$$

and consequently (21) is true and the proposition proved.

Assume now that for any value of $\alpha$, $0 < \alpha < \alpha_0 \le 1$, there exists a uniformly most powerful test for which the size of the best critical region is $\alpha$. (In the majority of problems $\alpha_0 = 1$, but this is not necessarily the case.) Denote by $S$ the set of uniformly most powerful tests such that to any $\alpha$, $0 < \alpha < \alpha_0$, there corresponds one and only one test of $S$. This set will be called a system of tests associated with the limit $\alpha_0$. Denote by $w(\alpha)$ the best critical region of size $\alpha$ corresponding to a test of $S$. We shall say that the set $S$ is ordered if for any $0 < \alpha_1 < \alpha_2 < \alpha_0$ the best critical region $w(\alpha_1)$ is included in $w(\alpha_2)$.

Fig. 3.

Proposition V. If a system, $S_1$, of tests exists and is not ordered, then it is possible to find another system, $S_2$, associated with the same limit $\alpha_0$, which will be an ordered system.

Proposition V is a simple corollary of the following Lemma.

Lemma I. If $\alpha_1 < \alpha_2$ and $w(\alpha_1)$ and $w(\alpha_2)$ are two best critical regions common to all alternatives, of size $\alpha_1$ and $\alpha_2$ respectively, and further if $w(\alpha_1)$ contains a part, say $v'$, which is not contained in $w(\alpha_2)$, then it is possible to find a region, $w'(\alpha_1)$, which will be a best critical region common to all alternatives, will be of size $\alpha_1$, and will be contained in $w(\alpha_2)$.

$w(\alpha_1)$ and $w(\alpha_2)$ will both include all points of $W_0$, in which $p_0 = 0$. If $w(\alpha_1)$ is not wholly contained in $w(\alpha_2)$, the positive parts, $v(\alpha)$, of these two regions may be expressed as [...] $k_2\,p_0(E)$. (28)

It follows that within $v''$, $k_2\,p_0(E) \le p_1(E) \le k_1\,p_0(E)$, and thus

$$k_2 \le k_1. \tag{29}$$

Further, outside $w(\alpha_2)$ and therefore within $v'$, [...]

The set $\Omega$ was determined by the inequalities $\gamma \le \gamma_0$ and $\beta \ge \beta_0$, $\gamma_0$ and $\beta_0$ being certain fixed numbers, and the hypothesis tested, $H_0$, was that $\gamma = \gamma_0$, $\beta = \beta_0$. It was shown that a uniformly most powerful test for $H_0$ existed corresponding to any $\alpha < 1$. It will be found, however, that the condition of Proposition VII regarding the spaces $B$ and $W$ is not satisfied. In the first place the sample space $W$ is not limited and extends from $-\infty$ to $+\infty$ for all variables $x_1, \dots x_n$. For whatever be the point $E'$ with real coordinates $x_1', \dots x_n'$, it is possible to find a hypothesis, say $H'$, included in $\Omega$ specifying a value $\gamma = \gamma'$, such that $\gamma'$ [...] $0$ in $E'$. On the other hand, the boundary space $B$ must by definition be entirely contained in the part, $W_+$, of the sample space in which $p_0(E) > 0$. No point, $E''$, having at least one of its coordinates $x_1, \dots x_n$ less than $\gamma_0$, belongs therefore to $B$, and it is clear that the set of such points $E''$ does not possess a measure equal to zero.

* It is easily seen that $T$ satisfies the Definition I.

We shall now show in fuller detail that, as the general theoretical conclusions have led us to expect, no shared sufficient statistic of $\gamma$ and $\beta$ exists. For suppose that there is such a statistic $T$, then according to Proposition II we must have in almost the whole of $W$,

$$p(x_1, \dots x_n \mid \gamma\beta) = p(T \mid \gamma\beta)\,\phi(x_1, \dots x_n), \tag{41}$$

where $\phi$ is independent of both $\gamma$ and $\beta$. In particular if $\gamma_1$ [...] $\beta_0$ are [...] and consequently if at any two points, say $E^{(1)}$ and $E^{(2)}$ of $W_+$, the shared sufficient statistic $T$ has the same value $T_1$, then the ratio

$$R(T_1) = p_0(E^{(i)})/p_1(E^{(i)}) = p(T_1 \mid \gamma_0\beta_0)/p(T_1 \mid \gamma_1\beta_1) \tag{43}$$

must have the same value in those points, i.e. for both $i = 1$ and 2. In other words the locus of points, $W(T)$, where $T$ is constant must form either a part or the whole of the locus $W(R)$ in which the ratio $R$ is constant. This circumstance will enable us to decide whether there is a shared sufficient statistic of $\gamma$ and $\beta$. In the present example the ratio $R$ is of the form

$$R = (\beta_0/\beta_1)^n\, e^{-n\{\bar{x}(\beta_0 - \beta_1) + \gamma_1\beta_1 - \gamma_0\beta_0\}} \tag{44}$$

in every point of $W_+$. It is seen that within $W_+$ the locus of points, $W(R)$, in which $R$ is constant is identical with that in which the mean, $\bar{x}$, is constant. It follows that if a shared sufficient statistic, $T$, exists at all, either (i) it is constant throughout the whole of the prime $\bar{x} = $ constant and changes its value when that of $\bar{x}$ changes; or (ii) each of the primes $\bar{x} = $ constant can be broken into a number of parts within which $T$ has a constant value, which however differs from part to part of the prime and also from the values assumed by $T$ in other primes $\bar{x} = $ constant.

If the position (i) were true, then $\bar{x}$ would itself be a shared sufficient statistic. The fixing of $\bar{x}$ would in fact mean the fixing of $T$, the probability law of $\bar{x}$ would differ at most from that of $T$ by a factor, depending on the nature of $T$ only and not on the values of $\gamma$ and $\beta$, and finally it would follow that if $T$ satisfies the conditions of the Proposition II, then $\bar{x}$ would do so also. A superficial examination of equation (38) might suggest to the reader that $T = \bar{x}$ satisfies the conditions of Proposition IIa with $\phi(x_1, \dots x_n) = 1$. This, however, is not the case, and it is easy to show that if we write

$$p(x_1, \dots x_n \mid \gamma\beta) = \beta^n e^{-n\beta(\bar{x} - \gamma)}\,\phi(x_1, \dots x_n), \tag{45}$$

and require that this equality should hold good in almost the whole of the sample space $W$ (so as to satisfy the conditions of Proposition IIa), we shall find that the function $\phi$ must depend upon $\gamma$. To see that this is the case, fix a sample point $E_1$ and choose the value of $\gamma = \gamma_1$ so that it does not exceed any of the coordinates of $E_1$. Then the probability law of the $x$'s will be of the form (38) and

$$\phi(x_1, \dots x_n) = 1. \tag{46}$$

Next increase the value of $\gamma$ to $\gamma_2 > \gamma_1$ so as to exceed the smallest of the coordinates of the point $E_1$, say $x_1$. According to the definition $p(x_1, \dots x_n \mid \gamma_2\beta) = 0$


and therefore the value of $\phi$ will have also to be zero. It follows that the value of $\phi$ at the same sample point $E_1$ may be either 1 or 0 according to the value of $\gamma$. As this result will apply to every point in the sample space, it follows that the function $\phi$ does not satisfy the conditions of Proposition IIa. The result would be similar if we were to try to present $p(x_1, \dots x_n \mid \gamma\beta)$ as a product of $p(\bar{x} \mid \gamma\beta)$ and another function, say $\phi$, so as to satisfy the conditions of Proposition II. It follows that $\bar{x}$ is not a shared sufficient statistic of $\gamma$ and $\beta$.

Let us now consider the alternative (ii). Denote by $T$ the shared sufficient statistic, the existence of which we assume, and by $p(T \mid \gamma\beta)$ its probability law. Next select two points $E^{(1)}$ and $E^{(2)}$ with coordinates $(x_1', \dots x_n')$ and $(x_1'', \dots x_n'')$, respectively, in which $T$ has the same value. Let $x_1'$ and $x_1''$ be the smallest of the coordinates of the two points. According to Proposition II, it follows that [...] (47) where $\phi$ does not depend on $\gamma$ and $\beta$. It will be noted that the values of $\phi$ at the points $E^{(1)}$ and $E^{(2)}$ cannot be equal to zero, since if this were so the values of $p(x_1, \dots x_n \mid \gamma\beta)$ at these points would be always zero, whereas for $\gamma$ [...]

$$p(x_1, \dots x_n \mid \gamma\sigma) = p(\bar{x} \mid \gamma\sigma)\,\phi(x_1, \dots x_n), \tag{60}$$

where [...] (61)

$\phi(x_1, \dots x_n) = $ [...] (62)

Since $\phi$ is independent of either $\gamma$ or $\sigma$, it follows that $\bar{x}$ is a shared sufficient statistic of $\gamma$ and $\sigma$.

Example IV. Here we shall show that a specific sufficient statistic of one unknown parameter may exist, while the best critical regions corresponding to any two alternative hypotheses are different. Take the probability law (52)


defined in the preceding example and put $\sigma = \gamma$, and suppose that $\Omega$ is the class of hypotheses having $\gamma > 0$. Then in any point of the sample space we shall have [...] (63)

If we now proceed to test the hypothesis that $\gamma = \gamma_0$, it is found that the best critical region with regard to any alternative is defined by the inequality

$$\bar{x}^2(\gamma^2 - \gamma_0^2) - 2\bar{x}(\gamma - \gamma_0)\gamma\gamma_0 \ge k. \tag{64}$$

Two cases arise: (1) $\gamma > \gamma_0$; $w$ is then determined by the inequalities (57), (2) $\gamma < \gamma_0$; $w$ is now determined by (58), where now

$$c = \gamma\gamma_0/(\gamma + \gamma_0) \tag{65}$$

and the values of $d$ are chosen so that the best critical region may be of size $\alpha$. Clearly as before the boundaries of the best critical regions depend on the value of $\gamma$, and there is no uniformly most powerful test with regard to all the alternatives. Moreover, the best critical regions corresponding to any two different alternative hypotheses are different. $\bar{x}$ is however a specific sufficient statistic of $\gamma$, as may be shown following the same reasoning as in Example III. To illustrate this problem graphically the case of $n = 2$ has been taken, for which equation (63) may be written

$$p(x_1 x_2 \mid \gamma) = \frac{1}{2\pi\gamma}\, e^{-\frac{1}{2\gamma^2}\{(x_1 - 2\gamma)^2 + 2(x_1 - 2\gamma)x_2 + (1 + \gamma^2)x_2^2\}}. \tag{66}$$

This law may be described as a normal bivariate correlation distribution for which

Mean $x_1 = 2\gamma$, Mean $x_2 = 0$,
Standard deviation of $x_1 = \sqrt{1 + \gamma^2}$, Standard deviation of $x_2 = 1$,
Coefficient of correlation $= -1/\sqrt{1 + \gamma^2}$.

In Fig. 4 coordinate axes $Ox_1$ and $Ox_2$ have been drawn, and the elliptic contours of equal probability

$$\frac{1}{\gamma^2}\{(x_1 - 2\gamma)^2 + 2(x_1 - 2\gamma)x_2 + (1 + \gamma^2)x_2^2\} = \chi^2 = 5.991 \tag{67}$$

are shown for three hypotheses, namely $\gamma = 0.5$, $\gamma = 1.0$ and $\gamma = 2.0$. If a sample point $E$, $(x_1, x_2)$, is subject to the law (66), then the probability of $E$ falling within the ellipse (67) is 0.95.* Thus the three curves suggest pictorially the manner in which the normal correlation distribution changes with $\gamma$. The straight line $DOD'$ represents the limiting form of (67) as $\gamma \to 0$.

* If $w$ is the region within (67), then it is known that

$$P\{E \in w \mid \gamma\} = \iint_w p(x_1 x_2 \mid \gamma)\,dx_1\,dx_2 = 1 - e^{-\frac{1}{2}\chi^2} = 0.95.$$
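The figure 0.95 and the shape of the contours can be checked with a few lines of code. The sketch below assumes the means, standard deviations and correlation listed above; from them it forms the covariance matrix, verifies that the quadratic form written in `quad_form` is built from its inverse, and evaluates $1 - e^{-\chi^2/2}$ at $\chi^2 = 5.991$. The function names are hypothetical and the quadratic form is a reconstruction from the listed moments, not a quotation.

```python
import math

def covariance(g):
    """Covariance matrix implied by sd(x1)=sqrt(1+g^2), sd(x2)=1, corr=-1/sqrt(1+g^2)."""
    return [[1.0 + g * g, -1.0], [-1.0, 1.0]]

def inverse_covariance(g):
    """Matrix of the quadratic form: (1/g^2) * [[1, 1], [1, 1+g^2]]."""
    return [[1.0 / (g * g), 1.0 / (g * g)],
            [1.0 / (g * g), (1.0 + g * g) / (g * g)]]

def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def quad_form(x1, x2, g):
    """Quadratic form of the contour, centred at the mean (2g, 0)."""
    d = x1 - 2.0 * g
    return (d * d + 2.0 * d * x2 + (1.0 + g * g) * x2 * x2) / (g * g)

def ellipse_content(chi2):
    """Probability that a bivariate normal point falls inside the contour Q = chi2."""
    return 1.0 - math.exp(-0.5 * chi2)

identity_check = matmul2(covariance(2.0), inverse_covariance(2.0))
content = ellipse_content(5.991)
```

The value 5.991 is the 5 per cent point of $\chi^2$ with two degrees of freedom, which is why the content of each ellipse comes out at 0.95 whatever the value of $\gamma$.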


Fig. 4.

[...] (1) For the alternative $\gamma = 2.0 > \gamma_0$. Here the inequality (57) must be used, and the boundaries will be two straight lines such as $EE'$ and $GG'$ equidistant from the point $C_1$ on $AOB$. Since from (65) the constant $c_1 = \tfrac{2}{3}$, in the diagram $OC_1 = \sqrt{2}\,c_1 = \tfrac{2}{3}\sqrt{2}$. The distance between the boundaries or $2\sqrt{2}\,d$ will depend on the value of $\alpha$ chosen, i.e. upon the risk accepted of rejecting $H_0$ when it is true.

(2) For the alternative $\gamma = 0.5 < \gamma_0$. Here we must use (58) and the best critical region will lie between two straight lines such as $JJ'$ and $LL'$, equidistant from the point $C_2$ on $AOB$. From (65), $c_2 = \tfrac{1}{3}$ so that $OC_2 = \sqrt{2}\,c_2 = \tfrac{1}{3}\sqrt{2}$, while again the distance between the boundaries will depend upon $\alpha$.

Since the length $\sqrt{2}\,c = \sqrt{2}\,\gamma/(1 + \gamma)$ determining the position of the points $C_1$, $C_2$, ... etc. is different for every alternative, the best critical regions will be different in every case, although the boundaries of these regions all belong to the family of straight lines, $\bar{x} = $ constant. The region $w$ appropriate for an alternative $\gamma = \gamma_1$ is picked out so that $P\{E \in w \mid (\gamma = \gamma_1)\}$ is a maximum subject to

$$P\{E \in w \mid (\gamma = \gamma_0)\} = \alpha.$$
We may also interpret the sufficient statistic in this simple case. If we take new rectangular axes $OB$, $OD$ with coordinates $u = (x_1 + x_2)/\sqrt{2}$, $v = (x_2 - x_1)/\sqrt{2}$, then the probability law for $u$ and $v$ becomes

$$p(uv \mid \gamma) = \frac{1}{\gamma\sqrt{\pi}}\, e^{-\frac{(u - \sqrt{2}\gamma)^2}{\gamma^2}} \times \frac{1}{2\sqrt{\pi}}\, e^{-\frac{(u + v)^2}{4}}. \tag{68}$$

Thus whatever be $\gamma$, for a fixed $u = \sqrt{2}\,\bar{x}$, $v$ is distributed normally with a standard deviation of $\sqrt{2}$ about a mean of $-u$. When $u$ (or $\bar{x}$) is known, it follows that a knowledge of $v$ provides no additional information whatsoever concerning $\gamma$. In this sense $\bar{x}$ is a specific "sufficient" statistic of $\gamma$.*

* The sufficient estimate of $\gamma$ obtained from the maximum likelihood solution is found to be $\hat{\gamma} = 2\bar{x}(\sqrt{2} - 1)$ if $\bar{x} > 0$, and $\hat{\gamma} = 2\bar{x}(1 - \sqrt{2})$ if $\bar{x} < 0$.

Example V. On p. 120 of the present paper we gave a definition of a sufficient set of statistics with regard to parameters $\theta_1, \dots \theta_q$. It is not proposed to discuss here the properties of such a set, but we shall indicate briefly with an example that such sets of statistics do exist. If each of $n$ independent random variables $x$ follows the normal probability law

$$p(x_i) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x_i - \gamma)^2}{2\sigma^2}}, \tag{69}$$

then the joint probability law of $x_1, \dots x_n$ may be written

$$p(x_1, \dots x_n \mid \gamma\sigma) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{-\frac{n\{(\bar{x} - \gamma)^2 + s^2\}}{2\sigma^2}}, \tag{70}$$

where

$$n\bar{x} = \sum_{i=1}^{n}(x_i) \quad \text{and} \quad ns^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2. \tag{71}$$
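The joint law of independent normal observations depends on the sample only through $\bar{x}$ and $s^2$ because of the identity $\Sigma(x_i - \gamma)^2 = n\{(\bar{x} - \gamma)^2 + s^2\}$. The following sketch checks this numerically by evaluating the logarithm of the joint law in two ways, once from the individual observations and once from $\bar{x}$ and $s^2$ alone. The sample values and function names are illustrative assumptions.

```python
import math

def log_law_from_sample(xs, gamma, sigma):
    """Log of the joint normal law computed from the individual observations."""
    n = len(xs)
    return (-n * math.log(sigma * math.sqrt(2.0 * math.pi))
            - sum((x - gamma) ** 2 for x in xs) / (2.0 * sigma ** 2))

def log_law_from_statistics(n, xbar, s2, gamma, sigma):
    """Log of the same law computed from the mean xbar and s^2 only."""
    return (-n * math.log(sigma * math.sqrt(2.0 * math.pi))
            - n * ((xbar - gamma) ** 2 + s2) / (2.0 * sigma ** 2))

xs = [1.2, 0.7, 2.9, 1.6]                       # arbitrary illustrative sample
n = len(xs)
xbar = sum(xs) / n
s2 = sum((x - xbar) ** 2 for x in xs) / n       # n * s^2 = sum of squared deviations

direct = log_law_from_sample(xs, gamma=0.5, sigma=1.3)
via_stats = log_law_from_statistics(n, xbar, s2, gamma=0.5, sigma=1.3)
```

The two values agree to rounding error for any choice of $\gamma$ and $\sigma$, so once $\bar{x}$ and $s$ are fixed the remaining detail of the sample carries no further dependence on the parameters.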

Then it follows that
(1) $\bar{x}$ is a specific sufficient statistic of $\gamma$.
(2) There is no specific sufficient statistic of $\sigma$ nor a shared sufficient statistic of $\gamma$ and $\sigma$.
(3) $\bar{x}$ and $s$ form a set of sufficient statistics with regard to $\gamma$ and $\sigma$.
(4) If $\gamma$ is known, then $\chi^2 = \Sigma(x_i - \gamma)^2$ is a specific sufficient statistic of $\sigma$.
Other examples of sufficient systems of statistics are mentioned on p. 131.

8. Conclusions.

The theory and examples discussed have shown that
(a) When a system of uniformly most powerful tests exists and some other additional conditions are satisfied, then sufficient statistics either specific or shared must also exist.
(b) When a system of uniformly most powerful tests exists there may be no unique sufficient statistic at all.
(c) When a sufficient statistic exists, a system of uniformly most powerful tests may exist or not.

We cannot therefore agree with the opinion expressed by R. A. Fisher* that the problem of uniformly most powerful tests is, as it were, covered by that of sufficient statistics. Neither do we think that in cases where no sufficient statistics exist his own method of approach, by what he has termed the theory of estimation, is the most convenient or simple one to follow. Certainly there are links between these two kinds of problem, which are very interesting, but no more. In our opinion the problems of testing statistical hypotheses should be treated by starting directly from some comprehensible principle expressed, if possible, in terms of existing concepts, such as that of probability. The concept which seems reasonable to us is this: arrange your test so as to minimize the probability of errors. Since the errors involved in testing hypotheses are of two kinds, the problem requires further specification which may have different forms. One of these has led to the theory of uniformly most powerful tests. Since it has been found that in many cases a solution along these lines is impossible, it follows that in such cases the problem of minimizing the probability of errors must be specified in a different form. Our further attempt in this direction has led us to a conception of unbiassed critical regions, the theory of which is being published partly in another paper by the authors in the present issue and partly by Neyman, elsewhere.†

Before concluding we may remark that the practical statistician may perhaps complain that certain of our examples, e.g. III and IV, are somewhat artificial. With this we agree. We think, however, that they serve their purpose, which is to show that the premise: "a sufficient statistic exists", does not necessarily involve the conclusion: "a uniformly most powerful test exists". There is however nothing artificial or exceptional in other examples, I, Ia and II.

* Proc. Roy. Soc. A, vol. cxliv (1934), p. 296.

† Bull. Soc. Math. de France, t. 63 (1935), p. 246-266.

CONTRIBUTIONS TO THE THEORY OF TESTING STATISTICAL HYPOTHESES

BY J. NEYMAN AND E. S. PEARSON

CONTENTS

Part II. Certain theorems on unbiassed critical regions of type A.
1. Introductory
2. The existence of the unbiassed critical regions of type A when the hypothesis tested concerns the value of only one parameter
3. Sufficient conditions for the unbiassed critical region of type A being of type $A_1$

Part III. Unbiassed tests of simple statistical hypotheses specifying the values of more than one unknown parameter.
1. General remarks
2. Unbiassed critical regions of type C
3. Effect of a transformation performed on the parameters
4. Simplification of solution in terms of $\phi$'s
5. An illustrative example
6. Power functions of alternative tests of the hypothesis $H_0$, considered in the previous section
Appendix

PART II. CERTAIN THEOREMS ON UNBIASSED CRITICAL REGIONS OF TYPE A

1. Introductory.

The present paper is a direct continuation of our previous publication under the same title (Neyman & Pearson, 1936).* This makes it unnecessary to discuss the general character of the problems considered or to explain the notation. In fact, the present publication does not need any kind of detailed introduction. The following preliminary remarks are, however, necessary.

In presenting a theory, it is an advantage to be able to develop it gradually step by step, without leaving any gaps. This, however, is not always possible. The results obtained in a given direction do not always cover the whole ground but, on the contrary, they sometimes go by jumps and the gaps have to be filled up later on. If we were to insist on leaving no gaps, then the publication of the latest results would have to be greatly delayed. As these have proved to be useful and have led on to some other papers prepared by workers in the Department, we have thought it justifiable to go ahead with their publication in spite of our being aware of the fact that many points raised in our Part I still require consideration. The most important of these is the question of the existence of the unbiassed critical regions of types $A$ and $A_1$. In Part I, we gave their definitions and showed the procedure which, if it is possible to carry out, leads to an unbiassed test of type $A$. We showed also how to test whether the corresponding critical region belongs to the type $A_1$. Further, we gave a few examples proving that the new conceptions introduced were not contradictory in themselves, that is to say that there were cases where the tests discussed existed. However, we did not attempt to consider the question whether it is always possible to determine an unbiassed test either of type $A$ or of type $A_1$; nor, if the answer is in the negative, what are the conditions under which the solution of the problem actually exists. This is a gap which should obviously be filled before we proceed any further.

* This paper will be referred to below as Part I, for short.

In what follows in Part II, we shall prove the existence and the identity of the unbiassed tests of type $A$ and $A_1$, although the theorems in question refer only to one of the three particular cases distinguished in Part I. Next, in Part III, we shall proceed directly to broaden the conception of the unbiassed tests so as to cover the case where the hypothesis tested is simple but specifies the values of more than one parameter. This work, since it is breaking fresh ground, will necessarily be to some extent exploratory.

It may be useful to mention here two papers appearing since the publication of our Part I, which seem to emphasize the importance of the unbiassed test. In the first of these (Neyman, 1935) it was possible to show how an unbiassed test, called of type $B$, can be defined and determined in the case where the hypothesis tested is composite and specifies the value of one unknown parameter involved in the probability law of the observable variables. The other publication (Neyman, 1937 a) contains a theorem showing that when the elementary probability law is differentiable with respect to the parameter $\theta$, to which the hypothesis tested $H_0$ ascribes a value $\theta_0$, and when the admissible hypotheses are both that $\theta < \theta_0$ and $\theta > \theta_0$, then there can be no uniformly most powerful test for the hypothesis $H_0$. It is worth noting that the theory of unbiassed tests, as developed in our Part I and in the subsequent publication by Neyman, refers just to those cases where the conditions of the theorem last mentioned are satisfied and, therefore, where no uniformly most powerful test exists. This fact seems to emphasize the importance of the unbiassed tests.

2. The existence of the unbiassed critical regions of type A when the hypothesis tested concerns the value of only one parameter.

In our earlier paper, we have considered the problem of unbiassed critical regions assuming the following fundamental condition which, without its being mentioned again, will be always assumed below:

FUNDAMENTAL CONDITION. The elementary probability law $p(E \mid \theta)$ of the $x$'s, whose values may be given by observation, admits two successive derivatives with respect to $\theta$ under the sign of integral, taken over any fixed region $w$ in the sample


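space, so that

$$\frac{d^k}{d\theta^k}\int\dots\int_w p(E \mid \theta)\,dx_1 \dots dx_n = \int\dots\int_w \frac{\partial^k p(E \mid \theta)}{\partial\theta^k}\,dx_1 \dots dx_n \tag{1}$$

for $k = 1, 2$.

If we denote by

$$\phi = \frac{\partial \log p(E \mid \theta)}{\partial\theta}\bigg|_{\theta = \theta_0}, \qquad \phi' = \frac{\partial^2 \log p(E \mid \theta)}{\partial\theta^2}\bigg|_{\theta = \theta_0}, \tag{2}$$

then

$$\frac{\partial p(E \mid \theta)}{\partial\theta}\bigg|_{\theta = \theta_0} = \phi\,p(E \mid \theta_0), \tag{3}$$

$$\frac{\partial^2 p(E \mid \theta)}{\partial\theta^2}\bigg|_{\theta = \theta_0} = (\phi' + \phi^2)\,p(E \mid \theta_0), \tag{4}$$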

and it is seen that the inequality (19) of our Part I, p. 10, defining the unbiassed critical region $w_0$ of type $A$ reduces to [...] $0$. (8)

Outside of $w_0$ it is necessary that [...] $0$. (9)

It will be useful to draw attention to the following important consequence of the Fundamental Condition. As, whatever $\theta$,

$$\int\!\cdots\!\int_W p(E \mid \theta)\,dx_1 \ldots dx_n = 1, \qquad (10)$$

the derivative of the left-hand side of (10) with respect to $\theta$ must be identically equal to zero. Owing to (1) and (3), we shall have therefore

$$\int\!\cdots\!\int_W \phi\,p(E \mid \theta_0)\,dx_1 \ldots dx_n = 0. \qquad (11)$$
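As a numerical illustration of (10) and (11), the following minimal sketch assumes a single observation from a normal law with unit variance and mean $\theta$; this model, the value `theta0 = 0.7` and the helper names are assumptions made for the example and do not come from the paper. For this law $\phi = x - \theta_0$ and $\phi' = -1$, so the integral of $\phi\,p$ and the integral of $(\phi' + \phi^2)\,p$ over the whole sample space should both vanish.

```python
import math

def normal_pdf(x, theta):
    """Density of one observation x from N(theta, 1) (assumed model)."""
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2.0 * math.pi)

def integrate(f, lo, hi, steps=20000):
    """Composite midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

theta0 = 0.7                       # hypothetical value ascribed by H0
lo, hi = theta0 - 12.0, theta0 + 12.0

# Equation (10): the probability law integrates to one.
total_mass = integrate(lambda x: normal_pdf(x, theta0), lo, hi)

# Equation (11): phi = x - theta0 has zero mean under H0.
score_mean = integrate(lambda x: (x - theta0) * normal_pdf(x, theta0), lo, hi)

# Second derivative of (10): (phi' + phi^2) has zero mean, with phi' = -1.
curvature_mean = integrate(
    lambda x: (-1.0 + (x - theta0) ** 2) * normal_pdf(x, theta0), lo, hi
)

print(total_mass, score_mean, curvature_mean)
```

The truncation of the range to twelve standard deviations either side of `theta0` is a convenience of the sketch; the neglected tails are far below the tolerance of the midpoint rule.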

We shall use this equality below.

The conclusions concerning the existence of the unbiassed critical regions of type A depend on the nature of the function $\phi'$. In this respect we have formerly (Part I, pp. 12-14) distinguished the following three cases:

(a) Here $\phi'$ is a function of $\phi$ not involving the x's explicitly,

$$\phi' = F(\phi). \qquad (12)$$

(b) This is a particular case of (a), namely, that where

$$F(\phi) = A + B\phi, \qquad (13)$$

$A$ and $B$ being independent of the x's.

(c) Here 0. As was pointed out in Part I, the inequality (19) reduces in this case to the two following inequalities:

$$\phi \le c_1 \quad \text{and} \quad \phi \ge c_2, \qquad (14)$$

and the existence of the unbiassed critical region of type A depends on the possibility of finding two numbers $c_1 < c_2$ and a region $w_0$ in the sample space satisfying (6) and (7) such that within this region the inequalities (14) should hold good, while within its complementary, say $\bar{w} = W - w_0$,

$$c_1 \le \phi \le c_2. \qquad (15)$$

Owing to (6), (10) and (11), the region $\bar{w}$ complementary to $w_0$ must satisfy the conditions

$$\int\!\cdots\!\int_{\bar{w}} p(E \mid \theta_0)\,dx_1 \ldots dx_n = 1 - \alpha, \qquad (16)$$

$$\int\!\cdots\!\int_{\bar{w}} \phi\,p(E \mid \theta_0)\,dx_1 \ldots dx_n = 0. \qquad (17)$$

It is obvious that whenever $\bar{w}$ is found it determines uniquely $w_0$. Therefore, whenever convenient, we may consider the problem of determining $\bar{w}$.

PROPOSITION I. If the relationship (13) holds good almost everywhere in the sample space, then the region $\bar{w}$ can be always determined.
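To make the two conditions (16) and (17) concrete, here is a small sketch for a law that is not taken from the paper: a single observation $x$ from an exponential distribution with mean $\theta$, tested at $\theta_0 = 1$. For this assumed law $\phi = x - 1$ and $\phi' = -1 - 2\phi$, so (13) holds with $A = -1$, $B = -2$, and the complementary region is an interval $x_1 \le x \le x_2$. The function name and the parametrisation by the lower-tail probability `y` are choices made for the sketch; the root is located by bisection on a function that changes sign between the two ends of the range, which is the same kind of continuity argument as in the proof.

```python
import math

def type_a_acceptance_interval(alpha, iterations=200):
    """Interval [x1, x2] for X ~ Exp(mean 1) such that
    P(x1 <= X <= x2) = 1 - alpha   and   integral of (x - 1) e^{-x} over it = 0.
    y is the probability left below x1, so alpha - y is left above x2."""

    def endpoints(y):
        x1 = -math.log(1.0 - y)        # P(X < x1) = y
        x2 = -math.log(alpha - y)      # P(X > x2) = alpha - y
        return x1, x2

    def score_integral(y):
        # Antiderivative of (x - 1) e^{-x} is -x e^{-x}.
        x1, x2 = endpoints(y)
        return x1 * math.exp(-x1) - x2 * math.exp(-x2)

    lo, hi = 0.0, alpha                # negative at lo, positive near hi
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if score_integral(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return endpoints(0.5 * (lo + hi))

x1, x2 = type_a_acceptance_interval(0.05)
coverage = math.exp(-x1) - math.exp(-x2)
score = x1 * math.exp(-x1) - x2 * math.exp(-x2)
print(x1, x2, coverage, score)
```

In this example the two tails of the resulting critical region do not carry equal probabilities, which is why the second condition is needed in addition to the size condition.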

Denote by $F(t)$ the probability, determined by the hypothesis tested, of $\phi$ having a value smaller than $t$, i.e.

$$F(t) = P\{\phi < t\} = \int\!\cdots\!\int_{\phi < t} p(E \mid \theta_0)\,dx_1 \ldots dx_n,$$

1 — a),

+ Ay),

+ Ay)-V(y),

1^(2/+ 1 — a)— V{y + A — 1 — a),

when when when

Ay>0,

Ay0,

when/fyp(E\e0)dx1...dxn

(32)

0, then $t(\Delta y)$ must tend to $-\infty$ and the integral in the right-hand side of (32) must tend to zero since, otherwise, the integral (11) of the same function, taken over the whole sample space, would not be convergent. It follows that, in this case also, (26) tends to zero with $\Delta y$ and, therefore, that $I(y)$ is continuous at $y = 0$. A similar method leads to the conclusion that it is continuous at $y = \alpha$.

Denote by $w(y) = W - \bar{w}(y)$ the region complementary to $\bar{w}(y)$, and consider $\bar{w}(0)$, corresponding to $y = 0$. The equality (11) reduces to

$$\int\!\cdots\!\int_{\bar{w}(0)} \phi\,p(E \mid \theta_0)\,dx_1 \ldots dx_n + \int\!\cdots\!\int_{w(0)} \phi\,p(E \mid \theta_0)\,dx_1 \ldots dx_n = 0. \qquad (33)$$

We may write

$$\int\!\cdots\!\int_{\bar{w}(0)} \phi\,p(E \mid \theta_0)\,dx_1 \ldots dx_n = a(1 - \alpha), \qquad (34)$$

with $a \le t(1 - \alpha)$. Similarly

$$\int\!\cdots\!\int_{w(0)} \phi\,p(E \mid \theta_0)\,dx_1 \ldots dx_n = b\alpha, \qquad (35)$$

with $b \ge t(1 - \alpha) \ge a$. Owing to (33), either both $a$ and $b$ are equal to zero or else $a < 0 < b$. In the former case the region $w(0)$ satisfies the definition of the unbiassed critical region of type A. In the latter case we have $I(0) < 0$. Repeating the same reasoning, we shall find that $I(\alpha)$ must be positive, which proves that there must exist a number $y'$ between zero and $\alpha$ such that $I(y') = 0$. Then $w(y')$ will satisfy the definition of the unbiassed critical region of type A.

Example. In the above proof we have assumed that the probability of $\phi$ being equal to some fixed value $t$ may be different from zero. In view of the considerable regularity of the probability law $p(E \mid \theta)$ required to satisfy the differential equation (13), it may be questioned whether such an assumption can be realized at all. If this question were to be answered in the negative, then the above discussions would be unnecessarily elaborate. The following example shows that the elementary probability law $p(E \mid \theta)$ may satisfy the equation (13) at almost any point of $W$ when $p(E \mid \theta_0) > 0$, and then, for some values $t$ the probability $P\{\phi = t\}$ may be greater than zero. Consider the case of only two variables $x_1$ and $x_2$ and assume that, $\theta$ being some positive number, $p(E \mid \theta) =$

^

T l [ d 0

e

g _

l ) 1

= 0

f o r

0^+xi^l

for I 0 at least for one value of $\theta$.

PROPOSITION II.* If at any point of $W_+$ we have

$$\phi' = A + B\phi, \qquad (41)$$

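where $A$ and $B$ are independent of $E$ and if $\phi$ does not vanish identically in $W_+$, then the unbiassed critical region of type A to test a hypothesis $H_0$ ascribing to $\theta$ any value $\theta_0$, as described in the proof of Proposition I, will be necessarily of type $A_1$.

Proof. Denote by $w_0$ the unbiassed critical region of type A, i.e. such that for two fixed numbers $c_1 \le c_2$ and at any point of $w_0$ either

$$\phi \le c_1 \qquad (42)$$

or

$$\phi \ge c_2, \qquad (43)$$

while

$$\int\!\cdots\!\int_{w_0} p(E \mid \theta_0)\,dx_1 \ldots dx_n = \alpha, \qquad (44)$$

$$\int\!\cdots\!\int_{w_0} \phi\,p(E \mid \theta_0)\,dx_1 \ldots dx_n = 0. \qquad (45)$$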

In order to prove the Proposition II, we have to prove that whatever be the alternative hypothesis $H_1$ specifying the value $\theta \ne \theta_0$ and whatever any other region $w$ satisfying (44) and (45), we shall have

$$\int\!\cdots\!\int_{w_0} p(E \mid \theta)\,dx_1 \ldots dx_n \ge \int\!\cdots\!\int_{w} p(E \mid \theta)\,dx_1 \ldots dx_n. \qquad (46)$$

Owing to the Lemma proved in Part I, p. 11, the proof will be completed if we manage to show that, both $\theta_0$ and $\theta \ne \theta_0$ being fixed, it is always possible to determine two numbers $a$ and $b$, depending on $\theta$ and $\theta_0$ but independent of $E$, such that within $w_0$

$$p(E \mid \theta) \ge p(E \mid \theta_0)(a\phi + b), \qquad (47)$$

and within the region $\bar{w}_0$, complementary to $w_0$, p(E\d)),

(70)

where $\beta(\theta_1, \theta_2 \mid w)$ is considered as a function of $\theta_1$ and $\theta_2$ corresponding to the region $w$. If this region $w$ is used as a critical region, then $\beta(\theta_1, \theta_2 \mid w)$ will be called its power function. Whatever $\theta_1'$ and $\theta_2'$ may be, the value of $\beta(\theta_1', \theta_2' \mid w)$ represents the probability of $E$ falling within $w$, calculated under the assumptions that $\theta_1 = \theta_1'$ and $\theta_2 = \theta_2'$. If $w$ is the critical region of a test of a hypothesis $H_0$, then this function also gives the probability that $H_0$ will be rejected using the test based on $w$. Hence the values of $\beta(\theta_1, \theta_2 \mid w)$, corresponding to various values of $\theta_1$ and $\theta_2$, indicate how frequently $H_0$ will be rejected in cases when it is true and also when it is untrue in one way or another. It follows that a rational comparison of any two tests suggested for the same hypothesis must be based on that of the corresponding power functions; this would involve the calculation of the integral (70) over the two critical regions for any values of $\theta_1$ and $\theta_2$. Below we shall give examples of such comparisons.

But, as we have already seen in earlier sections, apart from the question of comparison, the idea of the power function may be useful when we attempt to deduce a test. The properties of the power function depend on the size and shape of the critical region and, if these are entirely at our disposal, we may consider how best to choose them in order to provide the power function with properties which, for one reason or another, we consider as advantageous. In § 2 below, we shall direct our attention to changes in the immediate neighbourhood of the hypothesis tested, that is to say where both $|\theta_1|$ and $|\theta_2|$ are small, and see how the conception of an unbiassed test of type A can be extended to cover the case where the number of unknown parameters is two or more.

2. Unbiassed critical regions of type C.
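We shall assume that the following fundamental condition is satisfied: At any point of the sample space, the elementary probability law $p(E \mid \theta_1, \theta_2)$ is differentiable with respect to $\theta_1$ and $\theta_2$ and admits two consecutive differentiations under the sign of the integral taken over any fixed region $w$; that is to say

$$\frac{\partial^k}{\partial\theta_1^r\,\partial\theta_2^s}\int\!\cdots\!\int_w p(E \mid \theta_1, \theta_2)\,dx_1 \ldots dx_n = \int\!\cdots\!\int_w \frac{\partial^k p(E \mid \theta_1, \theta_2)}{\partial\theta_1^r\,\partial\theta_2^s}\,dx_1 \ldots dx_n \qquad (71)$$

for $r + s = k = 1, 2$.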

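The derivatives of the above type taken at $\theta_1 = \theta_2 = 0$ will be simply denoted as follows:

$$\left.\frac{\partial}{\partial\theta_i}\int\!\cdots\!\int_w p(E \mid \theta_1, \theta_2)\,dx_1 \ldots dx_n\right|_{\theta_1=\theta_2=0} = \beta_i(w) \quad \text{for } i = 1, 2, \qquad (72)$$

$$\left.\frac{\partial^2}{\partial\theta_1\,\partial\theta_2}\int\!\cdots\!\int_w p(E \mid \theta_1, \theta_2)\,dx_1 \ldots dx_n\right|_{\theta_1=\theta_2=0} = \beta_{1,2}(w). \qquad (73)$$

Similarly, $\beta_{1,1}(w)$ and $\beta_{2,2}(w)$ will denote partial derivatives of $\beta(w)$ taken twice with respect to $\theta_1$ and $\theta_2$.

DEFINITION I.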

If the region $w_0$ possesses the following properties:

$$\beta_1(w_0) = \beta_2(w_0) = 0, \qquad (74)$$

$$\beta_{1,2}(w_0) = 0, \qquad (75)$$

$$\beta_{1,1}(w_0) = \beta_{2,2}(w_0), \qquad (76)$$

and if, for any other region $w$ whatsoever satisfying (74), (75) and (76) and for which

$$\beta(0, 0 \mid w_0) = \beta(0, 0 \mid w) = \alpha, \qquad (77)$$

the following inequality holds,

$$\beta_{1,1}(w_0) \ge \beta_{1,1}(w), \qquad (78)$$

then we shall say that $w_0$ is a regular unbiassed critical region for testing $H_0$ of type C (Neyman, 1937b).

Apart from the above definition, it may be useful to describe these properties of the critical region of type C in a more intuitive way. For this purpose let us take the Taylor expansion of the power function of any region $w$ in the vicinity of $\theta_1 = \theta_2 = 0$. We shall have

$$\beta(\theta_1, \theta_2 \mid w) = \beta(0, 0 \mid w) + \theta_1\beta_1(w) + \theta_2\beta_2(w) + \tfrac{1}{2}\{\theta_1^2\beta_{1,1}(w) + 2\theta_1\theta_2\beta_{1,2}(w) + \theta_2^2\beta_{2,2}(w)\} + \ldots \qquad (79)$$

For small values of the $\theta$'s the terms of first and second order will provide a sufficient approximation to $\beta(\theta_1, \theta_2 \mid w)$. If we compare regions satisfying (77), then, for all of them the first term in the expansion (79) will be equal to $\alpha$ or the probability of rejecting falsely the hypothesis tested when it is in fact true. The conditions (74), which we may conveniently call the conditions of unbiassedness, mean that the terms of the first order in $\theta$'s in (79) vanish and that, consequently, the power function may possess, at $\theta_1 = \theta_2 = 0$, an extremum. Whether this is so or not, depends on the sign of

$$\Delta = \beta_{1,2}^2(w) - \beta_{1,1}(w)\,\beta_{2,2}(w). \qquad (80)$$


If this be negative and $\beta_{1,1}(w) = \beta_{2,2}(w) > 0$, then the power function has a minimum at $\theta_1 = \theta_2 = 0$. This means that, when using the region $w$ to test the hypothesis $H_0$, we shall reject $H_0$ when it is slightly in error (so that $|\theta_1|$ and $|\theta_2|$ are small) more frequently than when it is correct. Assuming for the moment that the conditions (74) are satisfied and $\Delta < 0$ we may write

$$\beta(\theta_1, \theta_2 \mid w) = \alpha + \tfrac{1}{2}\{\theta_1^2\beta_{1,1}(w) + 2\theta_1\theta_2\beta_{1,2}(w) + \theta_2^2\beta_{2,2}(w)\} + \ldots \qquad (81)$$
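The quality of the second-order approximation (81) can be examined numerically in a simple case. The sketch below is an illustration only and rests on assumptions that are not part of the text at this point: two independent unit-variance normal variables $u$, $v$ with means $\theta_1$, $\theta_2$, and the critical region $u^2 + v^2 > a^2$ of size $\alpha$. For that region $\beta_{1,2} = 0$ and $\beta_{1,1} = \beta_{2,2} = -\alpha\log_e\alpha$ (a short calculation under the stated assumptions), and the exact power is obtained from the standard Poisson-mixture series for the non-central $\chi^2$ law with two degrees of freedom. All function names are hypothetical.

```python
import math

def chi2_even_sf(x, m):
    """P(chi-square with 2m degrees of freedom > x), m = 1, 2, ..."""
    half = 0.5 * x
    term, total = 1.0, 0.0
    for i in range(m):
        total += term                 # adds (x/2)^i / i!
        term *= half / (i + 1)
    return math.exp(-half) * total

def power_outside_circle(theta1, theta2, alpha, terms=60):
    """Exact power of the region u^2 + v^2 > a^2 for independent
    u ~ N(theta1, 1), v ~ N(theta2, 1), with a^2 = -2 log(alpha)."""
    a2 = -2.0 * math.log(alpha)
    half_lambda = 0.5 * (theta1 ** 2 + theta2 ** 2)
    weight = math.exp(-half_lambda)   # Poisson weight for j = 0
    total = 0.0
    for j in range(terms):
        total += weight * chi2_even_sf(a2, j + 1)
        weight *= half_lambda / (j + 1)
    return total

def quadratic_approximation(theta1, theta2, alpha):
    """Right-hand side of (81) truncated after the second-order terms,
    with beta11 = beta22 = -alpha*log(alpha) and beta12 = 0."""
    beta11 = -alpha * math.log(alpha)
    return alpha + 0.5 * beta11 * (theta1 ** 2 + theta2 ** 2)

alpha = 0.05
exact = power_outside_circle(0.1, 0.05, alpha)
approx = quadratic_approximation(0.1, 0.05, alpha)
print(exact, approx)
```

Because the power depends on the $\theta$'s only through $\theta_1^2 + \theta_2^2$, the loci of constant power in this example are circles about the origin.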

If the $\theta$'s are small, then the situation may be easily interpreted by considering the plane with axes of co-ordinates $O\theta_1$ and $O\theta_2$, as in Fig. 2. Ignoring the terms of higher order in the $\theta$'s, we see that the loci, $\beta(\theta_1, \theta_2 \mid w) = \text{constant}$, are represented by coaxial ellipses, say $C(w)$, centred at the origin. These may be termed ellipses of equidetectability. Since it is assumed that the hypothesis tested ascribes to the $\theta$'s values equal to zero, the value $\theta_1 \ne 0$ may be conveniently called "the error in $\theta_1$". With this terminology, if the ellipse, $\beta(\theta_1, \theta_2 \mid w) = \text{constant}$, is situated as shown in Fig. 2, we may say that "positive and negative errors in the two $\theta$'s are not equally controlled by the test". In fact, the two points $A$ and $B$ correspond to the same $\theta_1 = \theta_1' > 0$ and to $\theta_2 = \theta_2' > 0$ and $\theta_2 = -\theta_2' < 0$


respectively; these points, however, do not lie on the same ellipse, and therefore the probabilities of detecting the falsehood of the hypothesis tested when the true values of the $\theta$'s are $(\theta_1', \theta_2')$ and $(\theta_1', -\theta_2')$ are not the same. In many cases this result may seem artificial, and the condition (75) has been introduced to insure that small positive and negative errors in the $\theta$'s should be controlled equally by the test. In fact, when $\beta_{1,2}(w) = 0$, the ellipses $C(w)$ are symmetrical with respect to both axes of co-ordinates.

Supposing that (75) is satisfied but not (76), we notice that another property of the test may be considered sometimes as an artificiality. For example, if $\beta_{1,1}(w) < \beta_{2,2}(w)$, then the ellipses will be longer in the horizontal than in the vertical direction. Consequently, comparatively small errors in $\theta_2$ will be as frequently detected by the test as much larger errors in $\theta_1$. If both parameters have been properly scaled,* then this feature is undesirable, and it has been avoided by the introduction of the condition (76) requiring that small errors in the $\theta$'s of equal size should be equally frequently detected by the test. If (76) is satisfied the ellipses reduce to circles.

It remains to justify the condition (78). This implies that using the region $w_0$ we shall detect the falsehood of the hypothesis tested (provided that the true $\theta$'s are in the vicinity of the origin) more frequently than using any other region which, (a) controls the errors of first kind as efficiently as does $w_0$, and (b) possesses the properties (74), (75) and (76) already discussed.

Since later we shall need to use them, it will be convenient to give a special name to critical regions $w$ satisfying the conditions of unbiassedness (74), for which in (80) $\Delta < 0$, for which $\beta_{1,1}(w) > 0$ but which do not satisfy (75) and (76). We will therefore adopt the following definition:

DEFINITION II.
If the region w1 possesses the properties A ) for any other region w such that

and

(74), while in (80) (82)

/?(0,0 | u^) = /?(0,0 | w) = a,

(83)

^

(84)

fi^ú

P

^

'

Pl.tM then we shall say that $w_1$ is a non-regular unbiassed critical region of type C.

From what was said above, it will be noticed that, for small values of the $\theta$'s, the region $w_1$ thus defined will be unbiassed and more powerful than any other region corresponding to the same ellipses of equidetectability $C(w)$ as the region $w_1$. The reader may have noticed the emphasis we lay on the restriction that these statements are valid for small values of the $\theta$'s. It is essential to remember that they may not apply to large errors in the $\theta$'s, and we do not suggest that the unbiassed critical regions of type C are always necessarily the most advantageous. Let us repeat this point again: the final rational choice of the critical region must be based on the properties of its power function over the whole range of possible values of the $\theta$'s and not only in the vicinity of the origin. We shall discuss this point later on, together with the question of a "proper scaling" of the errors in the $\theta$'s.

* This problem of scaling is discussed more fully below.

We must first, however, consider the method of determining the regions that have been defined. To simplify the printing, we shall write simply $p$ for $p(E \mid 0, 0)$ and add appropriate subscripts to $p$ to denote the derivatives with respect to either $\theta_1$ or $\theta_2$, taken at $\theta_1 = \theta_2 = 0$. Thus, for example, $p_{11}$ will denote $\partial^2 p(E \mid \theta_1, \theta_2)/\partial\theta_1^2 \,|_{\theta_1=\theta_2=0}$ etc.

PROPOSITION III. If the region $w_0$ satisfies the following

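conditions:

(i) at any point inside $w_0$

$$p_{11} \ge k_1(p_{11} - p_{22}) + k_2 p_{12} + k_3 p_1 + k_4 p_2 + k_5 p, \qquad (85)$$

and at any point outside $w_0$

$$p_{11} \le k_1(p_{11} - p_{22}) + k_2 p_{12} + k_3 p_1 + k_4 p_2 + k_5 p, \qquad (86)$$

where the $k_i$'s are constant coefficients;

(ii) if

$$\int\!\cdots\!\int_{w_0} p\,dx_1 \ldots dx_n = \alpha, \qquad (87)$$

$$\int\!\cdots\!\int_{w_0} p_i\,dx_1 \ldots dx_n = \int\!\cdots\!\int_{w_0} p_{12}\,dx_1 \ldots dx_n = \int\!\cdots\!\int_{w_0} (p_{11} - p_{22})\,dx_1 \ldots dx_n = 0 \quad (i = 1, 2), \qquad (88)$$

then the region $w_0$ is the regular unbiassed critical region of type C.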

The proof of Proposition III follows directly from the application of the lemma proved in Part I, pp. 10-11. Thus the problem of finding an unbiassed critical region of type C reduces to that of determining the constants $k_i$, so that a region satisfying (85) and (86) satisfies also (87) and (88).

PROPOSITION IV. If a region $w_1$ satisfies the following conditions:

(i) at any point inside $w_1$

$$p_{11} \ge k_1(\gamma_{12}p_{11} - \gamma_{11}p_{12}) + k_2(\gamma_{22}p_{11} - \gamma_{11}p_{22}) + k_3 p_1 + k_4 p_2 + k_5 p, \qquad (89)$$

and at any point outside $w_1$

$$p_{11} \le k_1(\gamma_{12}p_{11} - \gamma_{11}p_{12}) + k_2(\gamma_{22}p_{11} - \gamma_{11}p_{22}) + k_3 p_1 + k_4 p_2 + k_5 p, \qquad (90)$$

where the $k_i$'s are constant coefficients, while the $\gamma$'s are such that $\gamma_{11} > 0$ and

$$\gamma_{12}^2 - \gamma_{11}\gamma_{22} < 0; \qquad (91)$$

(ii) if

$$\int\!\cdots\!\int_{w_1} p\,dx_1 \ldots dx_n = \alpha, \qquad \int\!\cdots\!\int_{w_1} p_i\,dx_1 \ldots dx_n = 0, \quad i = 1, 2, \qquad (92)$$

$$\int\!\cdots\!\int_{w_1} (\gamma_{12}p_{11} - \gamma_{11}p_{12})\,dx_1 \ldots dx_n = \int\!\cdots\!\int_{w_1} (\gamma_{22}p_{11} - \gamma_{11}p_{22})\,dx_1 \ldots dx_n = 0, \qquad (93)$$

then the region $w_1$ is the non-regular unbiassed critical region of type C, having the ellipses of equidetectability determined by the equation

$$\gamma_{11}\theta_1^2 + 2\gamma_{12}\theta_1\theta_2 + \gamma_{22}\theta_2^2 = \text{const.} \qquad (94)$$

In this case also, the proof of the proposition is a direct consequence of the lemma mentioned above.

3. Effect of a transformation performed on the parameters.

In our Part I we have found that the unbiassed critical regions of type A are invariant with respect to a broad class of transformations of the unknown parameters. Instead of this property, in the case of the regions of type C, we find the property expressed in the following propositions, which lead us to a closer consideration of the problem of the proper choice of parameters.

Consider a case where it is possible to establish a regular unbiassed critical region, $w_0$, of type C to test a hypothesis $H_0$ ascribing to two parameters $\theta_1$ and $\theta_2$ the values of zero. The region $w_0$ will then satisfy the conditions of Proposition III where the $k$'s are some fixed numbers. Suppose now that owing to certain considerations it is decided to transform the original parameters $\theta_1$ and $\theta_2$. Thus instead of considering the probability law $p(E \mid \theta_1, \theta_2)$ to be dependent on $\theta_1$ and $\theta_2$, we should decide to consider it as dependent on some other parameters, $\Theta_1$ and $\Theta_2$, of which $\theta_1$ and $\theta_2$ are certain specified, single-valued and differentiable functions

$$\theta_1 = f_1(\Theta_1, \Theta_2), \qquad \theta_2 = f_2(\Theta_1, \Theta_2). \qquad (95)$$

Without any loss of generality, we may assume that when $\Theta_1 = \Theta_2 = 0$, then $f_1 = f_2 = 0$, so that in this case the hypothesis tested, $H_0$, will ascribe the values zero to both $\Theta_1$ and $\Theta_2$.

Formerly we denoted by $\Omega$ the set of admissible hypotheses specifying the values of $\theta_1$ and $\theta_2$. The transformation (95) leads to a transformation of the set $\Omega$ into a new set, say $\Omega'$, containing hypotheses which specify the values of the parameters $\Theta_1$ and $\Theta_2$. The probability laws specified by members of $\Omega$ and $\Omega'$ will be the same, but, owing to the transformation (95), they will be in a sense differently ordered, so that the conception of what we have called "an error in the hypothesis tested" will be different.

We shall now proceed to discuss the properties of the region $w_0$ with respect to the new set $\Omega'$, if it is known that with respect to the set $\Omega$, it is a regular unbiassed critical region of type C. In particular, we must ask whether it is possible and if so, under what conditions, to affirm that the properties of $w_0$ with respect to $\Omega'$ will be the same as with respect to $\Omega$? In order to assure a one-to-one correspondence between the hypotheses forming the two sets $\Omega$ and $\Omega'$, we shall assume that the equations (95) are soluble with respect to $\Theta_1$ and $\Theta_2$, and in particular that the Jacobian

$$\left.\frac{\partial(\theta_1, \theta_2)}{\partial(\Theta_1, \Theta_2)}\right|_{\Theta_1=\Theta_2=0} \ne 0.$$


We shall further introduce the following notation:

dd1 50!

= A, Mi

« dd2 C, 30! =

A dW1 — Al< = 30 1 30 2

30 2

'

3^ (96) 30| 3202 3202 (s2> = Ci> 30? 30 x 30 2 30| Next the derivatives of the power function of the region w0 with respect to 0X and 0 2 will be marked by the appropriate subscripts inserted in brackets. Further, as we shall consider a single region w0 only, we shall omit it in denoting either the power function or any of its derivatives, so that, for example,

sel

A

a m . 02 K ) _ o ~

but

= Ä n ) = AJ1 +

P u

(qi)

'

(

'

CJ2 + A*ßn + 2ACß12 + 2) = - {AD - BCf = - (f/^1' 6*\Y < 0, \o(0i, 0 2 )/ (103)

it follows that for all such regions the power function $\beta$ will have a minimum at $\Theta_i = 0$, and that all of them will correspond to the same system of ellipses of equidetectability, namely,

$$(A^2 + C^2)\,\Theta_1^2 + 2(AB + CD)\,\Theta_1\Theta_2 + (B^2 + D^2)\,\Theta_2^2 = \text{constant}. \qquad (104)$$
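The coefficients in (104) are those obtained by substituting the linear parts $\theta_1 = A\Theta_1 + B\Theta_2$, $\theta_2 = C\Theta_1 + D\Theta_2$ of the transformation into $\theta_1^2 + \theta_2^2$. The sketch below computes them for given values of $A$, $B$, $C$, $D$ and tests whether the resulting ellipses are circles. The function names are hypothetical, and the test for circles used here (vanishing cross term, equal coefficients of the squares) is simply the geometric condition read off from (104); it is offered as an assumption of the sketch and not as a quotation of the paper's own condition.

```python
import math

def transformed_ellipse(A, B, C, D):
    """Coefficients (e11, e12, e22) of
    e11*T1^2 + 2*e12*T1*T2 + e22*T2^2 = constant, as in (104)."""
    return (A * A + C * C, A * B + C * D, B * B + D * D)

def discriminant(coeffs):
    """e12^2 - e11*e22; equals -(A*D - B*C)^2 by Lagrange's identity,
    so it is negative whenever the Jacobian differs from zero."""
    e11, e12, e22 = coeffs
    return e12 * e12 - e11 * e22

def is_circle(coeffs, tol=1e-12):
    """True when the cross term vanishes and the squared terms agree."""
    e11, e12, e22 = coeffs
    return abs(e12) <= tol and abs(e11 - e22) <= tol

angle = 0.3
rotation = transformed_ellipse(math.cos(angle), -math.sin(angle),
                               math.sin(angle), math.cos(angle))
stretch = transformed_ellipse(2.0, 0.0, 0.0, 1.0)
general = transformed_ellipse(1.5, 0.5, -0.3, 2.0)
print(rotation, stretch, general, discriminant(general))
```

A rotation of the parameter axes leaves the circles of equidetectability unchanged, whereas a change of scale in one parameter alone turns them into ellipses.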


If the region $w_0$ satisfies also the conditions (78), then it is seen to satisfy the definition II of the non-regular unbiassed critical region with respect to the set of alternatives $\Omega'$. In particular cases the region $w_0$ may be regular also with respect to $\Omega'$. For this it is necessary and sufficient that (105)

All these results may be summed up in the following proposition.

PROPOSITION V. If the formulae connecting the new parameters $\Theta_1$ and $\Theta_2$ with the initial ones, $\theta_1$ and $\theta_2$, are such that the Jacobian $\partial(\theta_1, \theta_2)/\partial(\Theta_1, \Theta_2)$ differs from zero at $\Theta_1 = \Theta_2 = 0$, then a region $w_0$, which is a regular unbiassed critical region of type C with respect to the set of alternative hypotheses $\Omega$, will retain its property of unbiassedness with respect to the set $\Omega'$ of transformed hypotheses, specifying the values of new parameters $\Theta_1$ and $\Theta_2$. If the conditions (105) are satisfied, then $w_0$ will be a regular unbiassed critical region with respect to $\Omega'$. Otherwise, it will be non-regular.

It is seen that the properties of the unbiassed critical regions of type C differ essentially from those of type A. The latter were invariant with regard to a transformation of the parameter involved in the probability law of the x's. It follows that, when deciding to apply an unbiassed critical region of type C to test a hypothesis $H_0$, it is necessary to be quite clear about the system of parameters which it is most appropriate to adopt in any particular practical problem. The choice of such a system lies clearly beyond the bounds of the theory of statistics and must be made in accordance with the practical importance of errors which it is desired to avoid when testing a particular hypothesis. It may be useful, however, to give a few illustrations.

If we test a hypothesis $H_0$ involving the assumption, among others, that the standard deviation $\sigma$ of the population sampled has some specified value, e.g. $\sigma_0 = 1$, it may be considered, though not necessarily, that $\sigma$ by itself is not an appropriate parameter to use in adjusting a regular unbiassed critical region of type C. It may be said that positive and negative errors in

022 = - W2 = 0 012 = = ^12» where $\bar{x}_1$ and $\bar{x}_2$ denote the means of the two samples. Substituting (115) into (112), we get the following inequality: + k i n \(x 2 - £0)2 -

- £0) -

fc>2(x2

(117)

which must be satisfied at any point belonging to the critical region $w_0$ sought. It is seen that $w_0$ is bounded by a surface corresponding to the equation $f(\bar{x}_1, \bar{x}_2) = \text{constant}$, and that therefore all the conditions (113) that the critical region has to satisfy could be expressed by means of the integrals taken over a region $w_0'$ in the plane of $\bar{x}_1$ and $\bar{x}_2$, determined by the same inequality (117). Of course, instead of the original probability law $p(E \mid \xi_0, \xi_0)$, we shall have that of $\bar{x}_1$ and $\bar{x}_2$, which is known to be

$$p(\bar{x}_1, \bar{x}_2 \mid \xi_0, \xi_0) = \frac{\sqrt{n_1 n_2}}{2\pi}\,e^{-\frac{1}{2}\{n_1(\bar{x}_1 - \xi_0)^2 + n_2(\bar{x}_2 - \xi_0)^2\}}. \qquad (118)$$

We shall simplify the printing still more by introducing, instead of xt and x2, the variables u = a, to the case where a = A and b is infinite, giving the pair of straight lines m = ±A.

(128)

Clearly $A$ is given by the equation

$$\frac{2}{\sqrt{2\pi}}\int_A^{\infty} e^{-\frac{1}{2}x^2}\,dx = \alpha, \qquad (129)$$

while, since the circle corresponds to a contour of the probability function $p(u, v)$ of (124), it is easily seen that its radius is

$$a = b = \sqrt{-2\log_e\alpha}. \qquad (130)$$

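It is now only necessary to establish that it is always possible to find a member of this system of ellipses, the region $w_0''$ outside which will satisfy (123). With this object in view, write

$$F(a, b) = \int\!\!\int_{w_0''} (n_1 u^2 - n_2 v^2)\,\frac{e^{-\frac{1}{2}(u^2+v^2)}}{2\pi}\,du\,dv - (n_1 - n_2)\,\alpha. \qquad (131)$$

(1) Suppose $w_0''$ is the region outside the circle with radius given by (130). Make the transformation $u = r\cos\psi$, $v = r\sin\psi$, and it is found that

$$\frac{1}{2\pi}\int\!\!\int_{u^2+v^2 \le a^2} u^2 e^{-\frac{1}{2}(u^2+v^2)}\,du\,dv = \frac{1}{2\pi}\int_0^a \left( r^3 e^{-\frac{1}{2}r^2}\int_0^{2\pi}\cos^2\psi\,d\psi \right) dr = 1 - e^{-\frac{1}{2}a^2}\left(1 + \tfrac{1}{2}a^2\right) = 1 - \alpha\left(1 + \tfrac{1}{2}a^2\right). \qquad (132)$$

Consequently,

$$F(a, a) = (n_1 - n_2)\,\alpha\left(1 + \tfrac{1}{2}a^2\right) - (n_1 - n_2)\,\alpha = \tfrac{1}{2}(n_1 - n_2)\,\alpha a^2. \qquad (133)$$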
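The closed forms (130), (132) and (133) can be checked by direct numerical integration. The following sketch evaluates the polar integral in (132) by the midpoint rule and compares $F(a, a)$ with $\tfrac{1}{2}(n_1 - n_2)\alpha a^2$. The function names, the step count and the sample sizes `n1 = 10`, `n2 = 6` are arbitrary choices made for the illustration; the code also assumes, as in (131), that $u$ and $v$ are independent standard normal variables under the hypothesis tested.

```python
import math

def circle_radius(alpha):
    """Radius from (130): the mass of p(u, v) outside the circle is alpha."""
    return math.sqrt(-2.0 * math.log(alpha))

def u2_integral_inside_circle(a, steps=20000):
    """(1/2pi) * integral of u^2 exp(-(u^2+v^2)/2) over u^2 + v^2 <= a^2.
    In polar form this is (1/2) * integral_0^a r^3 exp(-r^2/2) dr,
    because (1/2pi) * integral_0^{2pi} cos^2(psi) dpsi = 1/2."""
    h = a / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * h
        total += r ** 3 * math.exp(-0.5 * r * r)
    return 0.5 * h * total

def f_on_circle(n1, n2, alpha):
    """F(a, a) of (131) for the region outside the circle of radius a."""
    a = circle_radius(alpha)
    # E[u^2] = 1 over the whole plane, so outside = 1 - inside;
    # by symmetry the v^2 integral over the same region is identical.
    outside = 1.0 - u2_integral_inside_circle(a)
    return (n1 - n2) * outside - (n1 - n2) * alpha

alpha = 0.05
a = circle_radius(alpha)
inside_numeric = u2_integral_inside_circle(a)
inside_closed = 1.0 - alpha * (1.0 + 0.5 * a * a)     # right side of (132)
f_numeric = f_on_circle(10, 6, alpha)
f_closed = 0.5 * (10 - 6) * alpha * a * a             # right side of (133)
print(inside_numeric, inside_closed, f_numeric, f_closed)
```

With $n_1 > n_2$ the value of $F(a, a)$ is positive, in agreement with the sign of the closed form (133).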

(2) Now take iv", as the space outside the parallel lines (127). In this case f... f 2 n j JWo•

dudv = 2 f °° ( — f + " u2 —— e-*"2 du\ dv J A W(2tt) J —a> j(2n) j = a,

(134)

making use of (129). Further, tyj

f . . .

f

J to,"

d u

d v

=

2

= 19*

f

°° ( v *

JA I



]



V( 2w )

e - i *

l - A e - W + oc. \ it

2

f

+

°° - 7 7 ^ —

J-