PROCEEDINGS OF THE SIXTH BERKELEY SYMPOSIUM
VOLUME I
PROCEEDINGS of the SIXTH BERKELEY SYMPOSIUM ON MATHEMATICAL STATISTICS AND PROBABILITY Held at the Statistical Laboratory University of California June 21-July 18, 1970 with the support of
University of California National Science Foundation Air Force Office of Scientific Research Army Research Office Office of Naval Research
VOLUME I
THEORY OF STATISTICS EDITED BY LUCIEN M. LE CAM, JERZY NEYMAN, AND ELIZABETH L. SCOTT
UNIVERSITY OF CALIFORNIA PRESS BERKELEY AND LOS ANGELES 1972
UNIVERSITY OF CALIFORNIA PRESS BERKELEY AND LOS ANGELES CALIFORNIA
CAMBRIDGE UNIVERSITY PRESS LONDON, ENGLAND
COPYRIGHT © 1972, BY THE REGENTS OF THE UNIVERSITY OF CALIFORNIA
The United States Government and its offices, agents, and employees, acting within the scope of their duties, may reproduce, publish, and use this material in whole or in part for governmental purposes without payment of royalties thereon or therefor. The publication or republication by the government either separately or in a public document of any material in which copyright subsists shall not be taken to cause any abridgment or annulment of the copyright or to authorize any use or appropriation of such copyright material without the consent of the copyright proprietor.
ISBN: 0-520-01964-4
LIBRARY OF CONGRESS CATALOG CARD NUMBER: 49-8189
PRINTED IN THE UNITED STATES OF AMERICA
CONTENTS OF PROCEEDINGS VOLUMES I, II, AND III

Volume I—Theory of Statistics

General Theory
R. J. BERAN, Upper and lower risks and minimax procedures. C. R. BLYTH and D. M. ROBERTS, On inequalities of Cramer-Rao type and admissibility proofs. J. OOSTERHOFF and W. R. VAN ZWET, The likelihood ratio test for the multinomial distribution. W. E. STRAWDERMAN, On the existence of proper Bayes minimax estimators of the mean of a multivariate normal distribution.

Sequential Analysis
P. J. BICKEL and J. YAHAV, On the Wiener process approximation to Bayesian sequential testing problems. Yu. V. LINNIK and I. V. ROMANOVSKY, Some new results in sequential estimation theory. R. MILLER, Sequential rank tests—one sample case. R. A. WIJSMAN, Examples of exponentially bounded stopping time of invariant sequential probability ratio tests when the model may be false.

Asymptotic Theory
R. R. BAHADUR and M. RAGHAVACHARI, Some asymptotic properties of likelihood ratios on general sample spaces. D. M. CHIBISOV, On the normal approximation for a certain class of statistics. J. HAJEK, Local asymptotic minimax and admissibility in estimation. R. A. JOHNSON and G. G. ROUSSAS, Applications of contiguity to multiparameter hypotheses testing. J. KIEFER, Iterated logarithm analogues for sample quantiles when p_n ↓ 0. L. LE CAM, Limits of experiments. M. D. PERLMAN, On the strong consistency of approximate maximum likelihood estimators. P. SWITZER, Efficiency robustness of estimators.

Nonparametric Procedures
R. E. BARLOW and K. A. DOKSUM, Isotonic tests for convex orderings. Z. W. BIRNBAUM, Asymptotically distribution free statistics similar to Student's t. K. A. DOKSUM, Decision theory for some nonparametric models. J. M. HAMMERSLEY, A few seedlings of research. A. W. MARSHALL and F. PROSCHAN, Classes of distributions applicable in replacement with renewal theory implications. R. PYKE, Spacings revisited. H. RUBIN, On large sample properties of certain nonparametric procedures. I. R. SAVAGE and J. SETHURAMAN, Asymptotic distribution of the log likelihood ratio based on ranks in the two sample problem. I. VINCZE, On some results and problems in connection with statistics of the Kolmogorov-Smirnov type.

Regression Analysis
T. W. ANDERSON, Efficient estimation of regression coefficients in time series. D. R. BRILLINGER, The spectral analysis of stationary interval functions. H. BÜHLMANN, Credibility procedures. W. G. COCHRAN, Some effects of errors of measurement on linear regression. L. J. GLESER and I. OLKIN, Estimation for a regression model with an unknown covariance matrix. J. M. HOEM, On the statistical theory of analytic graduation.
Multivariate Analysis
C. R. RAO and S. K. MITRA, Generalized inverse of a matrix and its applications. H. CHERNOFF, Metric considerations in cluster analysis. F. N. DAVID, Measurement of diversity. L. A. GOODMAN, Some multiplicative models for the analysis of cross classified data. T. ORCHARD and M. A. WOODBURY, A missing information principle: theory and applications. M. SOBEL and G. H. WEISS, Recent results on using the play-the-winner sampling rule with binomial selection problems. M. ZELEN, Exact significance tests for contingency tables embedded in a 2^n classification.

Volume II—Probability Theory

Introduction
J. L. DOOB, William Feller 1906-1970. M. KAC, William Feller, in memoriam. L. K. SCHMETTERER, Alfred Renyi, in memoriam.

Measure Theory
D. W. MÜLLER, Randomness and extrapolation. R. M. BLUMENTHAL and H. H. CORSON, On continuous collections of measures. G. DEBREU and D. SCHMEIDLER, The Radon-Nikodym derivative of a correspondence. R. DUDLEY, A counterexample on measurable processes. Z. FROLIK, Projective limits of measure spaces. W. HILDENBRAND, Metric measure spaces of economic agents. C. IONESCU TULCEA, Liftings commuting with translations. J. H. B. KEMPERMAN, On a class of moment problems. D. MAHARAM, Consistent extensions of linear functionals and of probability measures. H. ROSENTHAL, On the span in L^p of sequences of independent random variables. L. K. SCHMETTERER, On Poisson laws and related questions. M. L. STRAF, Weak convergence of stochastic processes with several parameters.

Inequalities
D. L. BURKHOLDER, B. J. DAVIS, and R. F. GUNDY, Integral inequalities for convex functions of operators on martingales. S. DAS GUPTA, M. L. EATON, I. OLKIN, M. PERLMAN, L. J. SAVAGE, and M. SOBEL, Inequalities on the probability content of convex regions for elliptically contoured distributions.

Combinatorial Analysis
P. DOUBILET, G.-C. ROTA, and R. STANLEY, On the foundations of combinatorial theory, VI: the idea of generating function.

Ergodic Theory
S. KAKUTANI, Strictly ergodic symbolic dynamical systems. W. KRIEGER, On unique ergodicity. D. S. ORNSTEIN, On the root problem in ergodic theory.

Gaussian Processes
J. FELDMAN, Sets of boundedness and continuity for the canonical normal process. A. M. GARSIA, Continuity properties of Gaussian processes with multidimensional time parameter. G. KALLIANPUR and M. NADKARNI, Supports of Gaussian measures. R. S. LIPTSER and A. M. SHIRYAYEV, Statistics of conditionally Gaussian random sequences. M. B. MARCUS and L. A. SHEPP, Sample behavior of Gaussian processes. S. OREY, Growth rate of certain Gaussian processes.
Central Limit Theorem
R. N. BHATTACHARYA, Recent results on refinements of the central limit theorem. R. F. COGBURN, The central limit theorem for Markov processes. A. DVORETZKY, Asymptotic normality of sums of dependent random variables. B. V. GNEDENKO, Limit theorems for sums of a random number of positive independent random variables. M. ROSENBLATT, Central limit theorem for stationary processes. V. V. SAZONOV, On a bound for the rate of convergence in the multidimensional central limit theorem. C. STEIN, A bound for the error in the normal approximation to the distribution of a sum of dependent random variables.

Volume III—Probability Theory

Passage Problems
Yu. K. BELYAYEV, Point processes and first passage problems. A. A. BOROVKOV, Limit theorems for random walks with boundaries. N. C. JAIN and W. E. PRUITT, The range of random walk. H. ROBBINS and D. SIEGMUND, On the law of the iterated logarithm for maxima and minima. A. D. SOLOVIEV, Asymptotic distribution of the moment of first crossing of a high level by a birth and death process.

Markov Processes—Potential Theory
R. G. AZENCOTT and P. CARTIER, Martin boundaries of random walks on locally compact groups. J. L. DOOB, The structure of a Markov chain. S. PORT and C. STONE, Classical potential theory and Brownian motion. S. PORT and C. STONE, Logarithmic potentials and planar Brownian motion. K. SATO, Potential operators for Markov processes.

Markov Processes—Trajectories—Functionals
R. GETOOR, Approximations of continuous additive functionals. K. ITO, Poisson point processes attached to Markov processes. J. F. C. KINGMAN, Regenerative phenomena and the characterization of Markov transition probabilities. E. J. McSHANE, Stochastic differential equations and models of random processes. P. A. MEYER, R. SMYTHE, and J. WALSH, Birth and death of Markov processes. P. W. MILLAR, Stochastic integrals and processes with stationary independent increments. D. W. STROOCK and S. R. S. VARADHAN, On the support of diffusion processes with applications to the strong maximum principle. D. W. STROOCK and S. R. S. VARADHAN, Diffusion processes.

Point Processes, Branching Processes
R. V. AMBARTSUMIAN, On random fields of segments and random mosaic on a plane. H. SOLOMON and P. C. C. WANG, Nonhomogeneous Poisson fields of random lines with applications to traffic flow. D. R. COX and P. A. W. LEWIS, Multivariate point processes. M. R. LEADBETTER, On basic results of point process theory. W. J. BUHLER, The distribution of generations and other aspects of the family structure of branching processes. P. S. PURI, A method for studying the integral functionals of stochastic processes with applications: III. W. A. O'N. WAUGH, Uses of the sojourn time series for the Markovian birth process. J. GANI, First emptiness problems in queueing, storage, and traffic theory. H. E. DANIELS, Kuhn-Grün type approximations for polymer chain distributions. L. KATZ and M. SOBEL, Coverage of generalized chess boards by randomly placed rooks. R. HOLLEY, Pressure and Helmholtz free energy in a dynamic model of a lattice gas. D. MOLLISON, The rate of spatial propagation of simple epidemics. W. H. OLSON and V. R. R. UPPULURI, Asymptotic distribution of eigenvalues of random matrices.
Information and Control
R. S. BUCY, A priori bounds for the Ricatti equation. T. FERGUSON, Lose a dollar or double your fortune. H. J. KUSHNER, Necessary conditions for discrete parameter stochastic optimization problems. P. VARAIYA, Differential games. E. C. POSNER and E. R. RODEMICH, Epsilon entropy of probability distributions.
PREFACE

Berkeley Symposia on Mathematical Statistics and Probability have been held at five year intervals since 1945. The Sixth Berkeley Symposium was divided into four sessions. The first took place from June 21 to July 18, 1970. It covered mostly topics in statistical theory and in theoretical and applied probability. The second session was held from April 9 to April 12, 1971 on the special subject of evolution with emphasis on studies of evolution conducted at the molecular level. The third session held in June 1971 was devoted to problems of biology and health. A fourth session on pollution was held in July 1971.

The first three volumes of the Proceedings cover papers presented in June and July, 1970, as well as papers which were sent to us at that time, but could not be presented in person by their authors. The first volume is entirely devoted to statistics. The second and third are devoted to contributions in probability. Allocation of the papers to the three volumes was made in a manner which we hope is fairly rational, but with an unavoidable amount of arbitrariness and randomness. In the event of doubt, a general index should help the prospective reader locate the desired contribution.

The Berkeley Symposia differ substantially from most other scientific meetings in that they are intended to provide an extended period of contact between participants from all countries in the world. In addition, an effort is made to promote cross contacts between scholars whose fields of specialization cover a broad spectrum from pure probability to applied statistics. However, these fields have expanded so rapidly in the past decades that it is no longer possible to touch upon every domain in a few weeks only. Since time limits the number of invited lectures, the selection of speakers is rapidly becoming an impossible task. We could only sample the abundance of available talent.
For this selection, as well as for several other important matters, we were privileged to have the assistance of an advisory committee consisting of Professors Z. W. Birnbaum and L. Schmetterer, representatives of the Institute of Mathematical Statistics, and of Professor Steven Orey, delegate of the American Mathematical Society. The visible success of our gathering is in no small measure attributable to the help we have received from this committee and other scientific friends.

A conference which extends over six weeks with participants from various parts of the world entails expenses. In this respect we feel fortunate that in spite of the general shortage of funds, the University of California and the Federal Agencies found it possible to support our enterprise. We are grateful for the allocation of funds from the Russell S. Springer Memorial Foundation, the National Science Foundation, the Office of Naval Research, the Army Research Office, the Air Force Office of Scientific Research, and the National Institutes of Health, which contributed particularly to the sessions on evolution and on problems of biology and health. In addition the pollution session received support from the Atomic Energy Commission.

The organization of the meetings fell under the responsibility of the undersigned with the able help of the staff of the Statistical Laboratory and of the Department of Statistics. For assistance with travel arrangements and various organizational matters, special thanks are due to Mrs. Barbara Gaugl.

The end of the actual meeting signals the end of a very exciting period, but not the end of our task, since the editing and publishing of over 3,000 pages of typewritten material still requires an expenditure of time and effort. In this respect we are indebted to Dr. Morris Friedman for translations of Russian manuscripts. We are particularly grateful to Dr. Amiel Feinstein and Mrs. Margaret Stein who not only translated such manuscripts but acted as editors, checking the references and even verifying the accuracy of mathematical results.

The actual editing and marking of manuscripts was not easy since we attempted to follow a uniform style. We benefitted from the talent and skill of Mrs. Virginia Thompson who also assumed responsibility for organizing and supervising the assistant editors, Miss Carol Conti, Mrs. Margaret Darland, and Miss Jean Kettler. We are extremely grateful to all the editors for the knowledge and patience they have devoted to these manuscripts.

In the actual publication of the material the University of California Press maintained their tradition of excellence. The typesetting was performed by the staff of Oliver Burridge Filmsetting Ltd., in Sussex, England.

The meetings of the Sixth Symposium were saddened by the absence of two of our long time friends and regular participants, William Feller and Alfred Renyi. Professors J. L. Doob and Mark Kac were kind enough to write a short appreciation of Feller. For a similar appreciation of Renyi, we are indebted to Professor L. Schmetterer. The texts appear at the beginning of the second volume.

L.LC.
J.N.
E.L.S.
CONTENTS

General Theory
R. J. BERAN—Upper and Lower Risks and Minimax Procedures
C. R. BLYTH and D. M. ROBERTS—On Inequalities of Cramer-Rao Type and Admissibility Proofs
J. OOSTERHOFF and W. R. VAN ZWET—The Likelihood Ratio Test for the Multinomial Distribution
W. E. STRAWDERMAN—On the Existence of Proper Bayes Minimax Estimators of the Mean of a Multivariate Normal Distribution

Sequential Analysis
P. J. BICKEL and J. YAHAV—On the Wiener Process Approximation to Bayesian Sequential Testing Problems
Yu. V. LINNIK and I. V. ROMANOVSKY—Some New Results in Sequential Estimation Theory
R. MILLER—Sequential Rank Tests—One Sample Case
R. A. WIJSMAN—Examples of Exponentially Bounded Stopping Time of Invariant Sequential Probability Ratio Tests When the Model May be False

Asymptotic Theory
R. R. BAHADUR and M. RAGHAVACHARI—Some Asymptotic Properties of Likelihood Ratios on General Sample Spaces
D. M. CHIBISOV—On the Normal Approximation for a Certain Class of Statistics
J. HAJEK—Local Asymptotic Minimax and Admissibility in Estimation
R. A. JOHNSON and G. G. ROUSSAS—Applications of Contiguity to Multiparameter Hypotheses Testing
J. KIEFER—Iterated Logarithm Analogues for Sample Quantiles When p_n ↓ 0
L. LE CAM—Limits of Experiments
M. D. PERLMAN—On the Strong Consistency of Approximate Maximum Likelihood Estimators
P. SWITZER—Efficiency Robustness of Estimators

Nonparametric Procedures
R. E. BARLOW and K. A. DOKSUM—Isotonic Tests for Convex Orderings
Z. W. BIRNBAUM—Asymptotically Distribution Free Statistics Similar to Student's t
K. A. DOKSUM—Decision Theory for Some Nonparametric Models
J. M. HAMMERSLEY—A Few Seedlings of Research
A. W. MARSHALL and F. PROSCHAN—Classes of Distributions Applicable in Replacement with Renewal Theory Implications
R. PYKE—Spacings Revisited
H. RUBIN—On Large Sample Properties of Certain Nonparametric Procedures
I. R. SAVAGE and J. SETHURAMAN—Asymptotic Distribution of the Log Likelihood Ratio Based on Ranks in the Two Sample Problem
I. VINCZE—On Some Results and Problems in Connection with Statistics of the Kolmogorov-Smirnov Type

Regression Analysis
T. W. ANDERSON—Efficient Estimation of Regression Coefficients in Time Series
D. R. BRILLINGER—The Spectral Analysis of Stationary Interval Functions
H. BÜHLMANN—Credibility Procedures
W. G. COCHRAN—Some Effects of Errors of Measurement on Linear Regression
L. J. GLESER and I. OLKIN—Estimation for a Regression Model with an Unknown Covariance Matrix
J. M. HOEM—On the Statistical Theory of Analytic Graduation

Multivariate Analysis
C. R. RAO and S. K. MITRA—Generalized Inverse of a Matrix and Its Applications
H. CHERNOFF—Metric Considerations in Cluster Analysis
F. N. DAVID—Measurement of Diversity
L. A. GOODMAN—Some Multiplicative Models for the Analysis of Cross Classified Data
T. ORCHARD and M. A. WOODBURY—A Missing Information Principle: Theory and Applications
M. SOBEL and G. WEISS—Recent Results on Using the Play-the-Winner Sampling Rule with Binomial Selection Problems
M. ZELEN—Exact Significance Tests for Contingency Tables Embedded in a 2^n Classification
UPPER AND LOWER RISKS AND MINIMAX PROCEDURES
R. J. BERAN
UNIVERSITY OF CALIFORNIA, BERKELEY
The essential goal of R. A. Fisher's fiducial argument was to make posterior inferences about unknown parameters without resorting to a prior distribution. Over the past decade, there have been two major attempts at developing a statistical theory that would accomplish this convincingly. One of these efforts has been described in a series of publications by Fraser, the other in papers by Dempster. From the early work [4], [11], [12], [13], which was tied to a fiducial viewpoint, both authors developed statistical theories that were distinct from the fiducial argument, yet achieved the goal of non-Bayesian posterior inference [5], [6], [7], [8], [14], [15], [16]. Despite technical and other differences, the main ideas underlying this later work by Dempster and by Fraser appear to be similar. Fraser's papers, analyzing statistical models that possess a special kind of structure, arrive at "structural probability" distributions for the unknown parameters. Dempster's papers, dealing with less specialized models, derive "upper and lower probabilities" on the parameter space. Disregarding some technicalities, these upper and lower probabilities reduce to structural probabilities for the models considered by Fraser. To this extent, upper and lower probabilities are a generalization of structural probabilities. However, there appear to be differences in interpretation. Fraser has given a frequency interpretation to structural probabilities in [11], [12] (but not in later work); this interpretation depends upon the special form of the statistical models in his theory, and does not apply to Dempster's theory. Dempster has provided no simple interpretation for upper and lower probabilities; he suggested in [7] that his theory might be "an acceptable idealization of intuitive inferential 'appreciations'." More recently, he has embedded his theory within a generalized Bayesian framework [9], [10]. 
The justification for the latter is unclear at present (see the discussion to [9]). Lacking in both the Dempster and Fraser theories are systematic methods for dealing with estimation and hypothesis testing problems (or suitable analogues of such). A method of constructing tests was described by Fraser in [16], but no performance criteria were established. Dempster [5] defined upper and lower risks but did not pursue their application; the statistical meaning of these risks is not evident under his interpretation of upper and lower probabilities. Since even simple models suggest a variety of natural estimates and tests, some theory seems necessary as a guide to choice of procedure.

The results presented in this paper proceed in several directions. A statistical interpretation for upper and lower probabilities and risks is described in Section 2; this rationale leads naturally to a minimax criterion for statistical procedures and, in principle, to an alternative to standard decision theory. The desirability of such an alternative stems from well-known awkward features of standard decision theory, such as the possibility that a test of low size and high power may make a decision which is contradicted by the data (see Hacking [17]). A heuristic account of these ideas in a less general context has previously been given by the author in [1]. Section 3 of the paper develops basic mathematical properties of upper and lower probabilities and risks in the light of Choquet's [3] theory of capacities. The results include extensions of properties given by Dempster in [6]. In Section 4, convenient conditions are established for the existence of minimax procedures (as defined in Section 2). An example in a nonparametric setting follows.

This research was partially supported by National Science Foundation Grant GP-15283.

2. Statistical background

An experiment is performed, resulting in observation $x$. It is known that the observed $x$ was generated from a parameter value $t$ and a realized random variable $e$ by the mapping

(2.1)  $x = \xi(e, t).$
Moreover, $t$ lies in a parameter space $T$, $x$ lies in an observation space $X$, and $e$ is realized according to a probability measure $P$ on an elementary space $E$. Both $P$ and the mapping $\xi$ are known. The problem is to draw inferences concerning $t$ from $x$ and the model.

The following formal assumptions are made: $X$ is a Borel subset of a metric space and is endowed with the $\sigma$-algebra $\mathcal{X}$ of all Borel sets. $T$ and $E$ are complete separable metric spaces, endowed with $\sigma$-algebras $\mathcal{T}$ and $\mathcal{E}$, respectively. $\mathcal{T}$ consists of all Borel sets in $T$. $P$ is defined on the Borel sets in $E$ and $\mathcal{E}$ is the completion with respect to $P$ of the $\sigma$-algebra of these Borel sets; thus $\mathcal{E}$ contains all analytic sets. The function $\xi: E \times T \to X$ is Borel measurable.

Formally, performing the experiment described above amounts to realizing, through physical operations, a specific triple $(x, t, e) \in X \times T \times E$. Before the experiment is carried out (or the outcome $x$ is noted), the following prospective assertions can be made about the triple to be realized: the chance that $e \in B$, $B \in \mathcal{E}$, is $P(B)$; $t$ is an unspecified element of $T$; the observable $x$ is related to $t$ and $e$ through (2.1).

Once the experiment has been performed and $x$ has been observed, the particular triple $(x, t, e)$ that was realized can be described more precisely. If

(2.2)  $T_x(e) = \{t \in T : x = \xi(e, t)\},$

it is evident that the $e$ realized in the experiment must lie in

(2.3)  $E_x = \{e \in E : T_x(e) \neq \emptyset\},$

and whatever that $e$ is, the realized $t$ must belong to the corresponding $T_x(e)$. Since $E_x = \mathrm{proj}_E[\xi^{-1}(x)]$, $E_x$ is analytic under the assumptions and so lies in $\mathcal{E}$. Let $P[B \mid E_x]$ denote the conditional probability defined by

(2.4)  $P[B \mid E_x] = \dfrac{P[B \cap E_x]}{P[E_x]}, \qquad B \in \mathcal{E},$

provided $P[E_x] > 0$. If $P[E_x] = 0$, it may still be possible to condition upon a suitable statistic. In any event, a modification of $\xi$ so as to include round off error incurred in observing $x$ will generally result in $P[E_x] > 0$.

Thus, after the experiment has been performed and $x$ has been observed, the following prospective statements can be made about the realized triple $(x, t, e)$: $x$ is as observed; $e \in E_x$ and the chance that $e \in B$, $B \in \mathcal{E}$, is $P[B \mid E_x]$; whatever $e$ is, $t$ is an unspecified element of the corresponding set $T_x(e)$; relation (2.1) is necessarily satisfied. This collection of assertions about the triple $(x, t, e)$ will be called the posterior model $\mathcal{M}_x$ for the experiment. Both Dempster and Fraser have previously considered reductions of this type, though not in terms of experimental triples.

Since the realized experiment $(x, t, e)$ is described more precisely by the posterior model $\mathcal{M}_x$ than by the original model, it is proposed to evaluate statistical procedures of interest by their average behavior over a hypothetical sequence of independent experiments, each of which is generated under the assumptions of $\mathcal{M}_x$. The aim is to measure how well a statistical procedure performs when applied to hypothetical experimental triples that are as similar as can be arranged to the actual triple $(x, t, e)$.

Let $D$ denote a space of decisions and let $\ell: T \times D \to R^+$ be a nonnegative loss function. Endow $R^+$ with the $\sigma$-algebra of all its Borel sets, and assume that for every $d \in D$, $\ell(\cdot, d)$ is a measurable mapping of $(T, \mathcal{T})$ into $R^+$. Suppose $d \in D$ is a specific decision whose consequences are to be evaluated relative to the posterior model $\mathcal{M}_x$ under the loss function $\ell$. Let $\{(x, t_i, e_i),\ i = 1, 2, \ldots\}$ be a sequence of independent hypothetical experiments generated under the posterior model; in other words, $e_1, e_2, \ldots$ are independent random variables, each distributed according to $P[\,\cdot \mid E_x]$, $t_i$ is selected arbitrarily from $T_x(e_i)$, and $x$ is the observed data. For each $i$, the equation $x = \xi(e_i, t_i)$ will necessarily be satisfied.

Let the general notation $\mathrm{prop}_n(\pi_i)$ denote the proportion of true propositions among the propositions $\{\pi_1, \pi_2, \ldots, \pi_n\}$. The average loss incurred over the first $n$ hypothetical experiments as a result of taking decision $d$ is $n^{-1} \sum_{i=1}^{n} \ell(t_i, d)$. Since $\ell \ge 0$, (2.5) for every $z \in R^+$ and $d \in D$. If $A(z, d) = \{t \in T : \ell(t, d) > z\}$, then $A(z, d) \in \mathcal{T}$ and $\{\ell(t_i, d) > z\} = \{t_i \in A(z, d)\}$. Therefore, (2.6)
(3.1)  $\Delta_p = \phi(B) - \sum \phi(B \cup A_i) + \sum \phi(B \cup A_i \cup A_j) - \cdots + (-1)^p \phi(B \cup A_1 \cup \cdots \cup A_p),$

and let

(3.2)  $\nabla_p = \phi(B) - \sum \phi(B \cap A_i) + \sum \phi(B \cap A_i \cap A_j) - \cdots + (-1)^p \phi(B \cap A_1 \cap \cdots \cap A_p).$

The sums in (3.1) and (3.2) are taken over all possible distinct combinations of indices, excluding combinations that repeat indices. Following Choquet [3], we say that $\phi$ is alternating of order $p$ if $\Delta_p \le 0$ for arbitrary $B, A_1, \ldots, A_p \in \mathcal{T}$ and is monotone of order $p$ if $\nabla_p \ge 0$ for arbitrary $B, A_1, \ldots, A_p \in \mathcal{T}$.

PROPOSITION 3.1. The set function $v$ is alternating of all orders. The set function $u$ is monotone of all orders.
PROOF (essentially due to Choquet). The probability $P[\,\cdot \mid E_x]$ is monotone and alternating of all orders. Now

(3.3)  $v(A) = P[\psi(A) \mid E_x], \qquad A \in \mathcal{T},$

where $\psi(A) = \mathrm{proj}_E[\ldots]$
=
-
I j=-i
a/l(Aj),
the second equality coming from Proposition 3.2. Finally, (b) is a consequence of (3.34) and (3.35).

REMARK 3.7. From Lemma 3.1, the set function $q$ appearing in Lemma 3.2 is a probability. Thus, (b) represents $s(f)$ as an expectation.

PROPOSITION 3.12. Let $f, g \in \mathcal{C}$ be such that $f + g$ is defined.
(a) If either $s(f^+) + s(g^+) < \infty$ or $s(f^-) + r(g^-) < \infty$, then $r(f) + s(g) \le s(f + g) \le s(f) + s(g)$.
(b) If either $s(f^-) + s(g^-) < \infty$ or $r(f^+) + s(g^+) < \infty$, then $r(f) + r(g) \le r(f + g) \le r(f) + s(g)$.

PROOF. (i) Let $f, g \in \mathcal{C}$ be elementary functions to which the hypotheses of (a) apply. Then $s(f)$, $s(g)$ and $r(f)$ exist. Assume that $|s(f + g)| < \infty$. The sum $f + g$ may be represented in the form (3.28). If $e(\cdot)$ denotes expectation with respect to $q$, then by Remark 3.7 and the preceding lemmas,

(3.36)  $s(f + g) = e(f + g) = e(f) + e(g) \le s(f) + s(g).$

(ii) If $f, g \in \mathcal{C}$, each may be approximated from below by a monotone increasing sequence of elementary functions. Under the hypotheses of (a) and if $|s(f + g)| < \infty$, the result of (i) applies to approximating elementary functions. Taking monotone limits establishes

(3.37)  $s(f + g) \le s(f) + s(g).$
Special cases. If $s(f + g) = -\infty$ and the hypotheses of (a) hold, then (3.37) is trivial. If $s(f + g) = \infty$, then $s(f^+ + g^+) \ge s[(f + g)^+] = \infty$. Since $f^+$ and $g^+$ each may be approximated from below by a monotone increasing sequence of simple functions, each of which is in $\mathcal{C}$ and is bounded, the result in (i) for elementary functions applies; taking monotone limits shows that $s(f^+) + s(g^+) \ge s(f^+ + g^+) = \infty$. Thus, one of $s(f)$, $s(g)$ is $\infty$ and (3.37) is valid.

In summary, therefore, if $s(f + g)$ exists and the hypotheses of (a) hold, then (3.37) is valid. Under the same assumptions,

(3.38)  $s(g) = s(f + g - f) \le s(f + g) + s(-f),$

which, by Proposition 3.9, is equivalent to the left inequality in (a).

(iii) Suppose $f, g \in \mathcal{C}$ and the hypotheses of (b) hold, ensuring that $r(f)$, $r(g)$, $s(g)$ exist. Assume also that $r(f + g)$ exists. Since $r(f + g) = -s(-f - g)$, the inequalities of (b) follow from (ii).

(iv) To complete the proof, it is necessary to show that $s(f + g)$, $r(f + g)$ exist under the hypotheses of (a) and (b), respectively. Since $s(f^+ + g^+)$, $r(f^- + g^-)$ exist, it follows from (ii) and (iii) that

(3.39)  $s[(f + g)^+] \le s(f^+ + g^+) \le s(f^+) + s(g^+),$
        $r[(f + g)^-] \le r(f^- + g^-) \le s(f^-) + r(g^-).$

Thus $s(f + g)$ exists under the hypotheses of (a). A dual argument shows that $r(f + g)$ exists in (b).

COROLLARY 3.1. Let $f \in \mathcal{C}$.
(a) If $s(f)$ exists, $|s(f)| \le s(|f|)$.
(b) If $r(f)$ exists, $|r(f)| \le s(|f|)$.
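The inequalities of Proposition 3.12 and Corollary 3.1 can be tried out on a finite example. The following is a minimal sketch, assuming that the upper expectation $s(f)$ averages, over $e$, the largest value of $f$ on $T_x(e)$ and that the lower expectation $r(f)$ averages the smallest value; this form and the tables `COND` and `TX` are illustrative assumptions, not definitions quoted from the paper. All quantities are finite here, so the finiteness hypotheses hold automatically.

```python
from fractions import Fraction
import random

# Hypothetical posterior model: each e in E_x carries a conditional probability
# and a nonempty set T_x(e) of parameter values consistent with the data.
COND = {"e1": Fraction(1, 2), "e2": Fraction(1, 3), "e3": Fraction(1, 6)}
TX = {"e1": (0, 1), "e2": (1, 2, 3), "e3": (3,)}


def s(f):
    """Upper expectation (assumed form): mean over e of max of f on T_x(e)."""
    return sum(p * max(f[t] for t in TX[e]) for e, p in COND.items())


def r(f):
    """Lower expectation (assumed form): mean over e of min of f on T_x(e)."""
    return sum(p * min(f[t] for t in TX[e]) for e, p in COND.items())


def add(f, g):
    """Pointwise sum of two functions on the parameter space."""
    return {t: f[t] + g[t] for t in f}


def absval(f):
    """Pointwise absolute value."""
    return {t: abs(value) for t, value in f.items()}


def check(f, g):
    """True when Proposition 3.12 (a), (b) and Corollary 3.1 hold for f, g."""
    h = add(f, g)
    part_a = r(f) + s(g) <= s(h) <= s(f) + s(g)
    part_b = r(f) + r(g) <= r(h) <= r(f) + s(g)
    corollary = abs(s(f)) <= s(absval(f)) and abs(r(f)) <= s(absval(f))
    return part_a and part_b and corollary


def random_function(rng):
    """Random integer-valued function on the parameter values 0..3."""
    return {t: Fraction(rng.randint(-5, 5)) for t in range(4)}


def run_checks(trials=200, seed=0):
    """Check the inequalities on many random pairs of functions."""
    rng = random.Random(seed)
    return all(check(random_function(rng), random_function(rng))
               for _ in range(trials))
```

Under the assumed form, the duality $r(f) = -s(-f)$ used in step (iii) of the proof holds by construction, since the minimum of $f$ is minus the maximum of $-f$.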
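Define sets $K$, $K^+$ by

$K = \{(x_1, x_2, \ldots, x_n) : x_1 \ge 0, \ldots, x_n \ge 0\}, \qquad K^+ = \{(x_1, x_2, \ldots, x_n) : x_1 > 0, \ldots, x_n > 0\}.$

PROPOSITION 3.13. Let $h: K \to R$ be continuous and concave in $K$ and such that $h(x) > 0$ and $h(kx) = k h(x)$ for every $x \in K^+$ and $k \ge 0$. Let $f_1, f_2, \ldots, f_n \in \mathcal{C}^+$ be such that $s(f_i) < \infty$ for $1 \le i \le n$. Then

(3.41)  $s[h(f_1, \ldots$

$|\ell(t, d) - \ell(t, d')| < \epsilon$ for every $t \in T$. Applying Proposition 3.9, parts (b), (c), (d), (e), to the right side of (4.1) establishes

(4.2)  $|s(\ell, d) - s(\ell, d')| < \epsilon, \qquad |r(\ell, d) - r(\ell, d')| < \epsilon,$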
hence (a) and (b). EXAMPLE. An example of the statistical model described in Section 2 is the nonparametric version of the two sample location shift model. If $(x_1, \cdots, x_m)$ are the observations of the first sample and $(y_1, \cdots, y_n)$ are the observations of the second sample, the model can be written in the form
(4.3) $x_i = F^{-1}(u_i)$, $1 \le i \le m$; $y_j = \mu + F^{-1}(u_{m+j})$, $1 \le j \le n$,
where $(u_1, \cdots, u_{m+n})$ are realizations of independent, identically distributed random variables, each uniformly distributed on $[0, 1]$, $F$ belongs to the class of all continuous distribution functions on the real line, $\mu \in (-\infty, \infty)$, and $(\mu, F)$ is the unknown parameter. Equations (4.3) are of the general form (2.1).
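The construction in (4.3) can be imitated numerically once a particular $F$ is fixed. The following is a minimal sketch and not part of the original paper: the function names are hypothetical, and the standard logistic distribution is an assumed stand-in for the unknown continuous $F$.

```python
import math


def logistic_quantile(u):
    """Inverse distribution function of the standard logistic law.

    Assumed choice of F^{-1}; any continuous F could be used instead.
    """
    return math.log(u / (1.0 - u))


def two_sample_shift(us, m, mu, quantile=logistic_quantile):
    """Build the two samples of (4.3) from uniforms u_1, ..., u_{m+n}.

    The first m uniforms give x_i = F^{-1}(u_i); the remaining ones give
    y_j = mu + F^{-1}(u_{m+j}).
    """
    xs = [quantile(u) for u in us[:m]]
    ys = [mu + quantile(u) for u in us[m:]]
    return xs, ys
```

Feeding the function independent uniform draws (for example from `random.random()`) yields one realization of the two samples; the shift `mu` only enters the second sample.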
MINIMAX PROCEDURES
Let {dij - yj — xh 1 ^ i ^ m, 1 ^ j ^ n} and let Oj < a 2 < • • • < . where A/ = mw + 1, denote the ordered { 0, (3.1)
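$ET^2 \ge (ET)^2 + \frac{\{\operatorname{Cov}(T, V)\}^2}{\operatorname{Var} V}$,
that is,
(3.2) $\operatorname{Var} T \ge \frac{\{\operatorname{Cov}(T, V)\}^2}{\operatorname{Var} V}$,
or equivalently, from the invariance properties of (2.6),
(3.3) $E\{T - g(\theta)\}^2 \ge \{ET - g(\theta)\}^2 + \frac{\{\operatorname{Cov}(T, V)\}^2}{\operatorname{Var} V}$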
for any real valued function $g$ defined on $\Omega$. This inequality appears to give a lower bound on the risk of $T$ as an estimator of $g(\theta)$ for squared error loss $\{T - g(\theta)\}^2$, and obviously also for quadratic loss $a(\theta)\{T - g(\theta)\}^2$ with $a(\theta) > 0$. But the apparent bound is useless because it depends on $T$: rather than compute $\operatorname{Cov}(T, V)$ to get a lower bound on the risk, we would compute $\operatorname{Var} T$ to get the risk itself. However, when $V$ is such that $\operatorname{Cov}(T, V)$ depends on $T$ only through $ET$, that is, when $V$ has the property (3.4)
$ET_1 = ET_2 \Rightarrow \operatorname{Cov}(T_1, V) = \operatorname{Cov}(T_2, V)$,
SIXTH B E R K E L E Y S Y M P O S I U M : BLYTH A N D ROBERTS
then the best Schwarz inequality (3.1) takes the very useful form of a lower bound on the risk of $T$ in terms of $ET$:
(3.5) $ET = m(\theta) \Rightarrow ET^2 \ge \{m(\theta)\}^2 + b_m(\theta)$,
that is,
(3.6) $ET = m(\theta) \Rightarrow \operatorname{Var} T \ge b_m(\theta)$,
or equivalently,
(3.7) $ET = m(\theta) \Rightarrow E\{T - g(\theta)\}^2 \ge \{m(\theta) - g(\theta)\}^2 + b_m(\theta)$,
where $b_m(\theta) = \{\operatorname{Cov}(T, V)\}^2 / \operatorname{Var} V$. We will refer to (3.5) as an inequality of Cramér-Rao type. The question as to what functions $V$ satisfy condition (3.4) and therefore give Cramér-Rao type inequalities has the following partial answer. THEOREM 2. A necessary condition for $V$ to give a Cramér-Rao type inequality is that $V$ depend on $X$ only through a minimal sufficient statistic. This condition is also sufficient, when the minimal sufficient statistic has a complete family of possible distributions. PROOF. Necessity. For every statistic $T$ and every sufficient statistic $S$, notice that $E(T|S)$ is a statistic and has the same expectation as $T$. The property (3.4) for $V$ therefore implies
(3.8) $\operatorname{Cov}\{E(T|S), V\} = \operatorname{Cov}(T, V)$,
that is,
(3.9) $E\{[E(T|S)]V\} - [E\{E(T|S)\}]EV = E(TV) - (ET)(EV)$,
that is,
(3.10) $E\{[E(T|S)]V\} = E(TV)$,
that is,
(3.11) $E[E\{[E(T|S)]V \mid S\}] = E[E(TV|S)]$,
that is,
(3.12) $E[E(T|S)E(V|S)] = E[E(TV|S)]$.
This is true for every $T$. In particular, for $T = V(X, \theta_0)$ the above identity gives, at $\theta = \theta_0$,
(3.13) $E_{\theta_0}[\{E(V|S)\}^2] = E_{\theta_0}[E(V^2|S)]$.
This shows, for every $\theta_0$ in $\Omega$, that the distribution of $V$ given $S$ must be concentrated on one point, that is, $V$ must be a function of $S$. Sufficiency. If $S$ is a sufficient statistic and V = F()/P((o) with quadratic loss. Hodges and Lehmann [7] proved admissibility for $(a, b) = \Omega$ of particular estimators $T^*$ for the specific exponential families binomial, Poisson, normal $(\omega, 1)$, and gamma with scale parameter $\omega$, given by specific choices of /1. Their proofs use the relaxed inequality (4.3); different proofs can be given using only (4.4). Girshick and Savage [6] proved $T^* = X$ admissible for $(a, b) = \Omega = (-\infty, \infty)$. Their proof uses (4.3) which they further relax to (4.4), so can be carried out using only (4.4). Karlin [8], using the limiting Bayes method, gave sufficient conditions for $T^*$, with k = 0, to be admissible for $(a, b) = \Omega$. Ping [10] gave a Hodges-Lehmann type proof of the following extension of Karlin's theorem: $T^*$ is admissible provided
I N E Q U A L I T I E S OF C R A M E R - R A O T Y P E
J
'ON
29
(*lO d£ =
TO
oo = ->» lim JTD0 [P(X)]-xe~kit
d{.
Ping writes down the relaxed inequality (his formula 1.4) but then replaces this by the further relaxed inequality (his formula 1.5), so that his proof actually uses only (4.4).
EXAMPLE 3. Gamma with scale parameter. For $X_1, \cdots, X_n$ independent, each with density $\lambda e^{-\lambda x}$, $x \ge 0$, $\lambda > 0$, the sufficient statistic $X = \sum_{i=1}^{n} X_i$ has exponential family (complete) of possible densities
(5.14) $\frac{\lambda^n}{\Gamma(n)} x^{n-1} e^{-\lambda x}$, $x \ge 0$, $\lambda > 0$.
For this family with $n > 2$ (not necessarily integer valued), for estimating $\lambda = n/EX$ with quadratic loss, the uniformly best constant multiple of $1/X$ is
(5.15) $T^* = \frac{n - 2}{X}$,
which has expectation
(5.16) $m^*(\lambda) = ET^* = \frac{n - 2}{n - 1}\lambda$
and risk
(5.17) $E(T^* - \lambda)^2 = \frac{\lambda^2}{n - 1}$.
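The expectation and risk of $T^* = (n - 2)/X$, namely $(n - 2)\lambda/(n - 1)$ and $\lambda^2/(n - 1)$, can be checked by simulation. The sketch below is only an illustration and not part of the original paper: the function name, the number of replications and the seed are arbitrary assumptions, and $X$ is drawn directly from the gamma density (5.14) with shape $n$ and rate $\lambda$.

```python
import random


def simulate_gamma_estimator(n, lam, reps=100_000, seed=0):
    """Monte Carlo estimate of E T* and E(T* - lam)^2 for T* = (n - 2) / X.

    X has the gamma density with shape n and rate lam, so the scale
    argument passed to gammavariate is 1 / lam.
    """
    rng = random.Random(seed)
    sum_t = 0.0
    sum_sq_err = 0.0
    for _ in range(reps):
        x = rng.gammavariate(n, 1.0 / lam)
        t = (n - 2) / x
        sum_t += t
        sum_sq_err += (t - lam) ** 2
    return sum_t / reps, sum_sq_err / reps


if __name__ == "__main__":
    n, lam = 10, 2.0
    mean_hat, risk_hat = simulate_gamma_estimator(n, lam)
    print("simulated mean:", mean_hat, "theory:", (n - 2) * lam / (n - 1))
    print("simulated risk:", risk_hat, "theory:", lam ** 2 / (n - 1))
```

A moderately large $n$ is used in the demonstration because the squared error has heavy tails when $n$ is close to 2, which makes the simulated risk converge slowly.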
This estimator $T^*$ was proved admissible by Ghosh and Singh [5] using a limiting Bayes argument. Here it provides an example, for an exponential family, of using the Hodges-Lehmann method to prove admissibility of an estimator of something other than $EX$. For an estimator $T(X)$
( E x a m p l e 2 is restricted to estimation of with expectation m(k)
inequality f o r T, l/X proves the existence o f (5.18)
A
r ^ x ' - i e - f d x , 1 (n) Jo x
m(k)
= j r(k) k
and we have (5.19) Written in terms o f (5.20)
«(A) =
the relaxed inequality (4.3) is
r(k) - j t ~
-
EX.)
and finite variance, Schwarz's
r'(k).
(5.21) $(n - 1)\{\lambda s'(\lambda)\}^2 + 2\{\lambda s'(\lambda)\} + (n - 1)(n - 2)\{s(\lambda)\}^2 \le 0$,
and the further relaxed inequality (4.4) is
(5.22) $\{(n - 2)s(\lambda) - \lambda s'(\lambda)\}^2 + 2\lambda s'(\lambda) \le 0$.
For each of these inequalities, it is easy to prove that $s(\lambda) \equiv 0$, corresponding to $m(\lambda) = m^*(\lambda)$, is the only solution that corresponds to some $m(\lambda) = ET$, so either inequality can be used to prove $T^*$ admissible. The proofs are the same as those given for the corresponding inequalities in Example 1, except that the roles of $\lambda \to 0$ and $\lambda \to \infty$ are reversed. EXAMPLE 4.
Normal
(0, 1). F o r X n o r m a l (9, 1), — oo < 6
( 0 ) ] 2 - 26m(0) g 0.
This inequality has the nontrivial solution $m(\theta) = \theta$, corresponding to $T(X) = X$. The same happens for any constant $T^*$ in the exponential family problem of Example 2.
REFERENCES
[1] E. W. BARANKIN, "Locally best unbiased estimates," Ann. Math. Statist., Vol. 20 (1949), pp. 477-501.
[2] A. BHATTACHARYYA, "On some analogues of the amount of information and their use in statistical estimation," Sankhyā, Vol. 8 (1946-1948), pp. 1-14, 201-218, 315-328.
[3] D. G. CHAPMAN and H. ROBBINS, "Minimum variance estimation without regularity assumptions," Ann. Math. Statist., Vol. 22 (1951), pp. 581-586.
[4] H. CRAMÉR, Mathematical Methods of Statistics, Princeton, Princeton University Press, 1946.
[5] J. K. GHOSH and R. SINGH, "Estimation of the reciprocal of scale parameter of a Gamma density," Ann. Inst. Statist. Math., Vol. 22 (1970), pp. 51-55.
[6] M. A. GIRSHICK and L. J. SAVAGE, "Bayes and minimax estimates for quadratic loss functions," Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, Berkeley and Los Angeles, University of California Press, 1951, pp. 53-73.
[7] J. L. HODGES and E. L. LEHMANN, "Some applications of the Cramér-Rao inequality," Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, Berkeley and Los Angeles, University of California Press, 1951, pp. 13-22.
[8] S. KARLIN, "Admissibility for estimation with quadratic loss," Ann. Math. Statist., Vol. 29 (1958), pp. 406-436.
[9] J. KIEFER, "On minimum variance estimates," Ann. Math. Statist., Vol. 23 (1952), pp. 627-629.
[10] C. PING, "Minimax estimates of parameters of distributions belonging to the exponential family," Chinese Math., Vol. 5 (1964), pp. 277-299.
THE LIKELIHOOD RATIO TEST FOR THE MULTINOMIAL DISTRIBUTION
J. OOSTERHOFF
UNIVERSITY OF NIJMEGEN
and
W. R. VAN ZWET
UNIVERSITY OF LEIDEN
1. Introduction and summary Let X 0 for all z As £ «¡7:,- = 0 and af = Jtf = 0 whenever 7t; = 0, the points with coordinates 7t; + eaini are points of Q for all sufficiently small £ > 0. Hence (2.1)
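$\sum_{i=1}^{k} a_i \pi_i \frac{\partial}{\partial p_i} f(p)\Big|_{p=\pi} = \sum_{i=1}^{k} a_i \sum_{z \in A_N} P(Z^{(N)} = z \mid \pi) N z_i = N \sum_{z \in A_N} P(Z^{(N)} = z \mid \pi) \sum_{i=1}^{k} a_i z_i$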
is a directional derivative of $f$ at $\pi$ in a direction in $\Omega$ multiplied by a nonnegative constant. Note, however, that $a_i \pi_i$ may be equal to zero for all $i$ if $\pi_i = 0$ for some $i$. If $f(\pi) > 0$, then (2.1) is positive because $\sum a_i z_i > 0$ for all $z \in A_N$ and consequently $f$ does not have a maximum at $\pi$. If $f(\pi) = 0$ the same conclusion holds since $A_N$ is nonempty. Q.E.D. For $z, p \in \Omega$ we define
(2.2) $I(z, p) = \sum_{i=1}^{k} z_i \log \frac{z_i}{p_i}$,
where $z_i \log (z_i/p_i) = 0$ by definition if $z_i = 0$. It is well known that for fixed $p$ this function is convex in $z$, positive unless $z = p$ and finite if $p_i > 0$ for all $i$. In Lemma 2.2 we show that under $p$ the random variable $I(Z^{(N)}, p)$ is of order at most $N^{-1}$ in probability uniformly in $p$.
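The function $I(z, p)$ of (2.2) is straightforward to evaluate numerically. The following is a minimal sketch, not taken from the paper: the function name is hypothetical, the convention $z_i \log(z_i/p_i) = 0$ for $z_i = 0$ follows the definition above, and returning infinity when some $z_i > 0$ has $p_i = 0$ is an assumption made to match the usual extended-value convention.

```python
import math


def information(z, p):
    """Evaluate I(z, p) = sum_i z_i * log(z_i / p_i) for probability vectors.

    Terms with z_i == 0 contribute 0 by definition. If z_i > 0 while
    p_i == 0 the value is taken to be +infinity (assumed convention).
    """
    total = 0.0
    for zi, pi in zip(z, p):
        if zi == 0:
            continue
        if pi == 0:
            return math.inf
        total += zi * math.log(zi / pi)
    return total
```

For instance, `information([1.0, 0.0], [0.5, 0.5])` equals `math.log(2)`, and the value is zero when the two vectors coincide.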
LIKELIHOOD RATIO TEST LEMMA 2 . 2 .
F o r
PROOF.
(2.4)
0
t h e r e
e x i s t s
sup p(l(Z
e v e r y
For 0 ^
Z, ^
log - = 2 i
l
zt
Zi
log
Since uncer p ,
Z \
^
N )
=
(2.5)
Z \
N )
0 ^
Pi
log
N )
( Z \
/ ( Z
w
^
^
\
\
s u c h
t h a t
f o r
all
Zi
z
i
~
J
Pi
,
= (Zi -
s
,
Pi) +
(
z
i
~
Pi
= 0, we have under
(
^
X Pi
*o
P i )
2
.
Pi
= 0 a.s. if p
/ p , )
p )
N
e.
1,
1+
Pi
0
>
2;
1, 0
0, (2.24)
(2.25)
-
a
-v =
+ aN +
0
for N -» oo. We consider three cases. (i) Suppose that p° e Since cN + aN + AN~1 -» 0 for N -* co, there exists e > 0 such that for all sufficiently large N the set {z|z e fi, I(z, p°) ^ C N + AN + '} is contained in the convex set {z|z e O, z( ^ e f o r i = 1, • • • , k}. By Lemma 2.1 the supremum over Q in (2.25) may therefore be replaced by the supremum over the set of all p e Q with p ^ e. Furthermore, we may again use the fact that under p I(Zw.p°)
(2.26)
= Y Z\N)\og^ + I(Zm.p) • =i Pi
a.s.
and 0 ^ I(Z{N). p) < oo a.s. It follows from Lemma 2.2 that to prove (2.25) it is sufficient to show that for every A > 0 and e > 0, (2.27)
sup p(cN pg£
V
tends to zero for N —• oo.
A
£ Z\n log % ^ cN + aN + Pi iV
i=1
/
SIXTH BERKELEY SYMPOSIUM: OOSTERHOFF AND VAN ZWET
The condition Ncn —• oo implies that c s is positive for all sufficiently large N: together with the condition Naj/C^ 1 —• 0 it also yields a
(2-28)
»
+
i
= °((l)1/2)
=
°{Cn)
for N —* oo. Hence Cjy — aN — AN 1 > 0 for all sufficiently large N. As f o r p = p° the random variable in (2.27) is equal to 0 a.s., the supremum in (2.27)) may be restricted to the set of all p =j=- p u with p ^ £. Applying Lemma 2.3 we find that it suffices to show that for every A > 0 and e > 0 V
°ip>p°)
J _ J c
- a
N
N
- A N "
V
-HP,PQ)
0{P,P°)
tends to zero for N -* oo, uniformly for all p =/= p° with p ^ e. Define, for N = 1,2, -, il w>1 = j p l p e a j » * p \ p ^
^
j
"n.2
>
^J
(2.30) = j p l p e a p *P°,P
^ e,I(p,p°)
For p e ClN 1, (2.29) is bounded above by 1 - d>i* C " ~
(2.31)
Un
~ 0 A N ~ l N"2
and by (2.28) and Lemma 2.4 ^ - a . - A N 1
1
o ( p , p°)
cnN112
^
2a(p,p°)
=
CnNW 2[M2I(p,p°)-]1'2
Nc \1/2 —rr-1 -» oo for N \2M2
oo.
For p e ilN 2, (2.29) is bounded above by (2.33)
a{ p,p°)
=
{aN + A
N - i ) (\MJip, N
g (aN +AN~l)
p°)
^
( IN \1/2 ( - — -> 0 \MiCnJ
by the mean value theorem, Lemma 2.4 and (2.28). Hence the suprema of (2.29)) over both (lN 1 and SlN 2 tend to zero which proves the lemma for p° e Cl. (ii) Suppose that jo 0 is a boundary point but not an extreme point o f f ) ; without loss of generality we assume that for some p° 0 for
} = 1, • • • , m and pf = 0 for i = m + I, • • • , k. Since I(z, p°) = oo if zt =/= 0 for some m + 1 ^ i ^ k, the set {z\z e ii, I(z, ^ cN + aN + AN~is contained in the convex set {z\z e ii, z, = 0 for i = m + 1, • • • , k}. By Lemma 2.1 the supremum over i i in (2.25) may therefore be replaced by the supremum over all p e i i with pi = 0 for i = m + 1, • • • , k. But under any p with p( = 0 for i = m + 1, • • • , k, m %(N) (2.34) I(Z£) is now an interior point. This has been dealt with in (i). (iii) Suppose that/? 0 is an extreme point of Q. This implies that I(ZiN), p°) can only assume the values 0 and oo. Since cN — aN > 0 for all sufficiently large N, (2.25) is immediate. Q.E.D. We remark that in the proof of L e m m a 2.5 we have made use of the condition c,v -» 0 only to ensure that in case (i), for every A > 0 (2.35)
{ z | z e f i , / ( z , j > ° ) g cN + aN + AN'1}
c { z | z e f i , z ^ e}
for some £ > 0 for all sufficiently large N, whereas in case (ii) it is needed that the same condition holds for the reduced lower dimensional problem. As Ojy + AN'1 = o(Cjy) by (2.14), L e m m a 2.5 will continue to hold if we replace the condition cN -* 0 by the following assumption. F o r all sufficiently large N the set {z|z £ fi, /(z, p°) ^ c^} remains bounded away from the set of all points z e Q that have zf = 0 for all i for which pf = 0 but also for at least one i with pf ^ 0. This extension of L e m m a 2.5 is the main step in relaxing the condition — log Pm)- For the power of such a test at p
where p = (3.9)
ßN(p)
e{4>n(Zw)\P%
=
1 1 - nN + nNE(N(ZiN))\p)
p° we have
if P i = • • • = otherwise,
Pm
= 0,
where (3.10)
* = X Pi'
Pi
for
1,
For the random vector Z(N\ consider the auxiliary problem of testing H: p = p° against K: p J= p°, where p denotes the parameter vector of the distribution of Z{N). A test for this problem will reject H with probability n{z) if Z{N) = z. Such a test has size aN if and only if (j)N satisfies (3.8), and its power at p is given by ßN(p) =
(3.11)
E{(t>N(ZiN))\p).
Thus there exists a one to one correspondence between the class of size a.N tests for H based on ZiN) that reject H with probability one if Z\N) =/= 0 for at least one i = m + I. •••. k and the class of all size oln tests for H based on Zm. Here corresponding tests have the same function (¿0) - (2p -
r)
" « I t
Again the right side approaches + 00 as t approaches + 00 while the left side remains bounded. This contradiction establishes the lemma.
BAYES MINIMAX ESTIMATORS
From the lemma and the definition of \j/(t) we get immediately the following result. THEOREM l . I f d ( x ) = A ( | | X | | 2 ) X is a minimax
estimator,
2
(9)
and
2
£„(/*( || X || )X( - 0.) = -exp {-i||a: -
^
- 2 ( p - 2) log H | - 2 ( p -
expl-iHa:
-
d\\2}dG(6) exp
u\\2} f [ dxt i= 1 „0112
}
n
^
2) log ||«°||.
Because of the orthogonal invariance of dG(6) and the fact that \U [| = Y], the two terms in the above inequality which depend on u° do so only through t]. Hence, these two terms are constants. By Jensen's inequality applied to (14) it follows that (15)
log A
[L
exp{ —I||ar - 01|2} dG{0) RP \JRp
^ (16)
-2(p exp{ —
> ec
exp { f x
- u||2} f [
2) log H | + c 0|| 2 }rfG(0)i exp {— i l l * - « H
Yidx, i= 1
I -2(P-2)
Note that by the orthogonal invariance of $G$, the above inequality holds for all $u$ such that $\|u\| \ge \varepsilon$. The left side of (16) is essentially the (density of) the convolution of the standard normal with the convolution of $dG(\theta)$ and the standard normal. Hence the quantity on the left represents the Radon-Nikodym derivative (with respect to Lebesgue measure on $R^p$) of a measure which is finite if and only if $\int_{R^p} dG(\theta) < \infty$. However, integrating the right side of (16) over the set $\|u\| \ge \varepsilon$ we see that the result can only be finite if $2(p - 2) - (p - 1) > 1$, or equivalently if $p > 4$. We therefore have the following result. THEOREM 2. Let $\delta(X)$ be a spherically symmetric minimax estimator which is generalized Bayes with respect to the (generalized) prior $dG(\theta)$. If $p \le 4$, $dG(\theta)$ cannot be a proper prior distribution.
5. Remarks We have shown that there do not exist spherically symmetric proper Bayes minimax estimators in four or lower dimensions. In a previous paper we demonstrated the existence of such estimators for $p \ge 5$. We are thus far unable to rule out the possibility that nonspherically symmetric proper Bayes minimax estimators exist in three and four dimensions although it seems highly unlikely that such estimators do exist. Also we have been unable thus far to say anything concrete about the situation where some part of the covariance structure of the problem is unknown. L. Brown has recently found a proof of the nonexistence of nonspherically symmetric proper Bayes estimators for $p = 3$ and 4.
The author wishes to thank L. Brown for bringing this problem to his attention and C. Stein for much helpful discussion.
REFERENCES
[1] A. J. BARANCHIK, "Multiple regression and estimation of the mean of a multivariate normal distribution," Stanford University, Technical Report No. 51 (1964).
[2] A. J. BARANCHIK, "A family of minimax estimators of the mean of a multivariate normal distribution," Ann. Math. Statist., Vol. 41 (1970), pp. 642-645.
[3] CHARLES STEIN, "Inadmissibility of the usual estimator for the mean of a multivariate normal distribution," Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Berkeley and Los Angeles, University of California Press, 1955, pp. 197-206.
[4] W. E. STRAWDERMAN, "Generalized Bayes estimators and admissibility of estimators of the mean vector of a multivariate normal distribution with quadratic loss," Ph.D. thesis, Rutgers University, 1969.
[5] W. E. STRAWDERMAN, "Proper Bayes minimax estimators of the multivariate normal mean," Ann. Math. Statist., Vol. 42 (1971), pp. 385-388.
ON THE WIENER PROCESS APPROXIMATION TO BAYESIAN SEQUENTIAL TESTING PROBLEMS
P. J. BICKEL¹
UNIVERSITY OF CALIFORNIA, BERKELEY
and
J. A. YAHAV²,³
UNIVERSITY OF TEL AVIV
1. Introduction and summary In 1959 Chernoff [7] initiated the study of the asymptotic theory of sequential Bayes tests as the cost of observation tends to zero. He dealt with the case of a finite parameter space. The definitive generalization of the line of attack initiated in that paper was given by Kiefer and Sacks in [13]. Their work as well as that of Chernoff, the intervening papers of Albert [1], Bessler [3], and Schwarz [19], and the subsequent work of the authors [4] used implicitly or explicitly the theory of large deviations and applied only to situations where hypothesis and alternative were separated or at least an indifference region was present. In the meantime in 1961 Chernoff [8] began to study the problem of testing $H: \theta \le 0$ versus $K: \theta > 0$ on the basis of observation of a Wiener process with drift $\theta$ per unit time as an approximation to the discrete time normal observations problem. Having made the striking observation that study of the asymptotic behavior of the Bayes procedures for any normal prior was in this case equivalent to the study of the Bayes procedure with Lebesgue measure as prior and unit cost of observation, he reduced this problem for suitable loss functions to the solution of a free boundary problem for the heat equation. In subsequent work ([2], [9], [10] and [16]) the nature of this solution was investigated by Chernoff and others. In this paper we are concerned with the problem of testing $H: \theta \le 0$ versus $K: \theta > 0$ by sampling sequentially from a member of one parameter exponential (Koopman-Darmois) family of distributions (see equation (3.1)) at cost $c$ per observation. We will assume the simple zero-one loss structure in which an error in decision costs one unit while being right costs nothing.
¹ Prepared with the partial support of the Office of Naval Research, Contract NONR N00014-69-A-0200-1038.
² Prepared with the partial support of U.S. Public Health Grant GM-10525(07).
³ This research was done while the author was visiting the University of California, Berkeley.
SIXTH BERKELEY SYMPOSIUM: BICKEL A N D YAHAV
Our main result, Theorem 4.2, states that if we assume a bounded continuous prior density $\psi$ on the parameter space and that an observation has mean zero and variance one if $\theta = 0$, then our problem is asymptotically equivalent to the analogous Wiener process problem with drift $\theta$ per unit time, the same loss and cost structure and prior "density" $\equiv \psi(0)$. Chernoff's observation applies here also and this asymptotic problem is equivalent to the problem for fixed cost. A formal result in this direction was obtained for the special case of Bernoulli trials by Moriguti and Robbins [18]. Our technique may be viewed as an extension to the sequential case of an approach of Wald [21] and LeCam [14]. It is clearly applicable to other testing, estimation, and general decision problems. We begin by examining the Wiener process problem and the embedded discrete time normal observation problem for a general continuous and bounded prior density $\psi$. Our first two results, Lemmas 1.1 and 2.2, establish the asymptotic relation between the Wiener process problem with prior density $\psi$ and the same problem with prior density $\equiv \psi(0)$. Our basic tool is the similarity transform used by Chernoff in [8] and a weak compactness theorem which is a special case of an unpublished result of LeCam. A statement and proof of the latter for our special case is given in the Appendix (Theorem A.1). The validity of this result requires the use of randomized procedures. These are employed throughout the paper, despite the fact that the Bayes procedures for all our problems are nonrandomized. Randomization also plays an important role in considering the relation between the discrete and continuous time problems where we make heavy use of sufficiency. Reference to Chapter 7 of Ferguson [12] may prove helpful. In Section 3 we show essentially that the exponential family problem is asymptotically at least as hard as the Wiener process problem.
To do this we successively, without substantial loss, reduce the problem to one in which observation is carried out in blocks, the parameter space is shrunk to a neighborhood of zero, and the time of observation is truncated. At this stage we use a Berry-Esseen type bound essentially due to Petrov [19] to show that the normal approximation is valid and then apply the results of Section 2. This approximation theorem is given as Lemma 3.3 and its proof is given in the Appendix. Finally, in the fourth section we show that the Wiener process problem is at least as difficult asymptotically as the exponential family problem. In doing so, we exhibit implicitly a sequence of procedures, independent of $\psi$, for which the bound of Section 3 is achieved. Some concluding remarks and statements of open problems are given in the last section. 2. The normal theory problem In this section we shall describe randomized sequential procedures in continuous and discrete time and derive asymptotic results for the Wiener process problem and its discrete time approximations.
W I E N E R PROCESS S E Q U E N T I A L TESTING
Let C [ 0 . x ) be the set o f all c o n t i n u o u s functions defined o n [0, x ) such that l i m , ^ x(t)/t2 = 0 e n d o w e d with the norm ||x|] = s u p , | x ( / ) | / ( l + t2). The space C is complete separable and metric. Let OS d e n o t e the class of Borel sets o n C [ 0 , 30) (the product sigma field) and let 8&t d e n o t e the Borel field generated by the maps x -> :r(,s) for 0 ^ s ^ Let fi = C [ 0 . x ) x [ 0 . 1 ] , si be the product Borel field and Qe. — x 0 with zero-one loss and cost c per unit time. A sequential procedure 71 = (
0
T )
}
d
6
P
{
e
Jo
W
(
T
)
< 0 } d 6 .
T h e right side of (2.25) converges t o z e r o as T -> oo w h i c h completes t h e p r o o f of t h e l e m m a . Before giving o u r final l e m m a we review t w o ways of defining sequential p r o c e d u r e s f o r discrete time p r o b l e m s . Let 9C = Rm x [0, 1] be t h e p r o d u c t of a c o u n t a b l e n u m b e r of copies of R a n d [0, 1], a n d let ^ be t h e Borel field o n this space. A r a n d o m i z e d s t o p p i n g t i m e t is n o w a m e a s u r a b l e m a p f r o m ?£ t o t h e n a t u r a l n u m b e r s {0, 1, 2, • • • , oo} such t h a t t h e event [ t ( - , Z) ^ n ] is, f o r every z a n d n, m e a s u r a b l e with respect t o t h e such t h a t t h e 2 section of A n [ r ^ «] is in 2&* f o r every n a n d z. In this f o r m u l a t i o n (which we refer t o as I) a p r o c e d u r e n = (d, t) h a s the same interpretation as in the c o n t i n u o u s t i m e p r o b l e m . O n the o t h e r h a n d , following F e r g u s o n [ 1 2 ] we can define a s t o p p i n g rule z by a sequence of f u n c t i o n s (t/f 0 , ij/j, 1j/ 2 , • • •) where i¡/j is a 3&f m e a s u r a b l e f u n c t i o n f r o m R1X1 t o [0, 1] a n d Dj°=o 'A; ^ 1. If t is a s t o p p i n g t i m e in t h e sense of (I), t h e n the 1¡/j are given by l
(2.26)
< P
J
( x
l
, x
2
, - - - )
=
A[z: t ( i „ i 2 , • • • ,
z )
=
1
j],
where k is Lebesgue measure. Conversely, it is a w e l l - k n o w n result of W a l d a n d Wolfowitz [ 2 3 ] t h a t given a s t o p p i n g t i m e in this second m o d e as (1p0, • • •) there is a s t o p p i n g t i m e in t h e sense of I satisfying (2.26) (see t h e p r o o f of T h e o r e m A . l ) . Similarly, a t e r m i n a l decision rule is specified in t h e second m o d e as a sequence ( = Ee{dl[(i
-
1)A7 < t ^ ¿A7]/W(iV). • • • , H'(»A7)} for i = 0, 1. • • •
and ¿i"' = 0 otherwise. It is clear that n {N) = { ( ^ K • • •)• ( ' ' )) (2.37)
J j=o