English Pages 504 Year 1985
de Gruyter Studies in Mathematics 7 Editors: Heinz Bauer · Peter Gabriel
Helmut Strasser
Mathematical Theory of Statistics Statistical Experiments and Asymptotic Decision Theory
Walter de Gruyter · Berlin · New York 1985
Author: Dr. Helmut Strasser Professor of Mathematics Universität Bayreuth
For my wife
Library of Congress Cataloging in Publication Data Strasser, Helmut, 1948- Mathematical theory of statistics. (De Gruyter studies in mathematics ; 7) Bibliography: p. Includes indexes. 1. Mathematical statistics. 2. Experimental design. 3. Statistical decision. I. Title. II. Series. QA276.S855 1985 519.5 85-16269 ISBN 0-89925-028-9 (U.S.)
CIP-Kurztitelaufnahme der Deutschen Bibliothek Strasser, Helmut: Mathematical theory of statistics : statist. experiments and asymptot. decision theory / Helmut Strasser. Berlin ; New York : de Gruyter, 1985. (De Gruyter studies in mathematics ; 7) ISBN 3-11-010258-7 NE: GT
© Copyright 1985 by Walter de Gruyter & Co., Berlin. All rights reserved, including those of translation into foreign languages. No part of this book may be reproduced in any form - by photoprint, microfilm, or any other means - nor transmitted nor translated into a machine language without written permission from the publisher. Printed in Germany. Cover design: Rudolf Hübler, Berlin. Typesetting and Printing: Tutte Druckerei GmbH, Salzweg-Passau. Binding: Lüderitz & Bauer, Berlin.
Preface
The present book is intended neither as a textbook for a course in statistics nor as a monograph on statistical decision theory. It is rather an attempt to connect well-known facts of classical statistics with problems and methods of contemporary research. To justify such a connection, let us recapitulate some modern developments of mathematical statistics. A considerable amount of statistical research during the last thirty years can be subsumed under the label "asymptotic justification of statistical methods". Already in the first half of this century asymptotic arguments became necessary, as it turned out that the optimization problems of statistics can be solved for finite sample size only in very few particular cases. The starting point of a systematic mathematical approach to asymptotics was the famous paper by Wald [1943]. This early paper already contains many important ideas for the asymptotic treatment of testing hypotheses. Ten years later the thesis of LeCam appeared, which has since turned out to be fundamental for asymptotic estimation. Briefly, the situation considered in both papers is as follows. The mathematical model is a family of probability measures or, for short, an experiment, which can be viewed as a finite dimensional differentiable manifold, i.e. a so-called parametric model. By independent replication of the experiment it becomes possible to restrict the analysis to a small subset of the model, i.e. to localize the problem. As the sample size tends to infinity the experiment can locally be approximated by a much simpler one, namely by a Gaussian shift. Essentially, this is due to the smoothness properties of the experiment. Thus, for large sample sizes the statistical analysis of the original experiment can be replaced by the analysis of the approximating Gaussian shift. It took at least twenty years to understand the structure of the papers of Wald and LeCam in the way described above.
Remarkable steps were the papers by LeCam [1960] and by Hájek [1970 and 1972]. Finally, LeCam succeeded in extending his version of decision theory [1955 and 1964] to an asymptotic theory of experiments [1972 and 1979]. It covers the main results of asymptotic statistics obtained so far, thus providing a framework which considerably facilitates the understanding of classical asymptotics. Moreover, for good reasons this theory claims to determine the framework of future developments in asymptotic statistics. Apart from the above-mentioned papers, the asymptotic theory of experiments is presented in the Lecture Notes of LeCam [1969 and 1974] and, hopefully soon, in a forthcoming monograph by LeCam. The present book is intended to serve as a "missing link" between introductory textbooks like those of
Lehmann [1958], Schmetterer [1974], or Witting [1985], and the presentation by LeCam. This goal determines what the book contains as well as what it omits. As to the mathematical prerequisites, we present the asymptotic theory of experiments on a mathematical level corresponding to the usual level of upper graduate courses in probability. Essentially, there are two sets of problems where the state of the art justifies the attempt at a unified presentation. First, there is the general decision-theoretic framework of statistics together with asymptotic decision theory, and second, its application to the case of independent replications of parametric models. The present volume deals mainly with the first complex. To give a rough outline of the organization of the book, let us briefly discuss the contents. More detailed information is provided by the introductions to the single chapters. After having collected some more or less well-known facts of probability theory in Chapter 1, we deal in Chapter 2 with basic facts of testing hypotheses. On the one hand, the results of this chapter are needed later, but on the other hand they convey a first idea of handling decision-theoretic concepts. It turns out that the theory of Neyman-Pearson is the basic tool for the analysis of binary experiments. This suggests illustrating the theory of experiments in Chapter 3 by means of these tools only. Some of the main results of the theory of experiments are proved in an elementary way, solely by the theory of Neyman and Pearson. Classical statistics already compares experiments with respect to the information they contain. The central role is played by the concept of sufficiency. In Chapter 4 we take up this idea and extend it, via the notion of exhaustivity, to the general concept of randomization of dominated experiments. By relatively simple methods we prove a version of the randomization criterion for dominated experiments.
In Chapter 5 we collect the most important applications of sufficiency to exponential experiments and Gaussian shifts. The testing problems which have been considered up to this point are of dimension one. In Chapter 6 we start with the consideration of higher dimensional testing problems. We begin with Wald's classical complete class theorem and prove the completeness of convex acceptance regions for exponential experiments. Our main interest in this chapter is in Gaussian shift experiments of dimension greater than one. At first, some basic facts are proved in an elementary way, i.e. by means of the Neyman-Pearson lemma. The rest of the chapter, however, is devoted to another approach to testing for Gaussian shifts, namely to reduction by invariance. In Chapter 6 we take this opportunity to discuss amenability of groups and to prove the simplest version of the Hunt-Stein theorem. Partly, these results are needed later, but their presentation in Chapter 6 serves mainly as an introduction to the ideas of statistical invariance concepts. Before we go into the general decision theory we present in Chapter 7 a brief
compendium of estimation theory. The power of the Neyman-Pearson theory is illustrated by median unbiased estimates. Sufficiency is applied to mean unbiased estimates. As becomes clear already in Chapter 3, the natural extension of the Neyman-Pearson lemma to arbitrary experiments leads to the Bayesian calculus, which is considered without any regard to "subjectivistic" interpretation. It turns out that invariance-theoretic methods of estimation as well as most proofs of admissibility are, in a technical sense, Bayesian in spirit. At this point there is plenty of motivation to introduce the concepts of decision theory in full generality. In Chapter 8 we present the classical results of decision theory, such as the minimax theorem, the complete class theorem and the general Hunt-Stein theorem. Chapter 9 deals with those parts of decision theory which are known under the label of comparison of experiments. The general theorems of asymptotic decision theory are contained in Chapter 10. By means of these results it is possible to carry through our program, namely to reduce asymptotic problems to the analysis of the limit experiments. This is done in the remaining Chapters 11-13. Let us have a closer look at the program. The asymptotic method consists of three steps corresponding to Chapters 11, 12 and 13. The main idea is to embed a statistical experiment into a convergent sequence of experiments and then to analyse the limit experiment of the sequence instead of the original one. Hence, one major problem is the statistical analysis of the limit experiment. In the present book we confine ourselves to limits which are Gaussian shifts. In Chapter 11 the statistical theory of Gaussian shifts is recapitulated in sufficient generality to cover infinite dimensional parameter spaces. A second problem consists in the proof of convergence for certain sequences of experiments. In this context we only consider the case of convergence to Gaussian shifts.
This case can be treated by means of stochastic expansions of likelihood processes. By way of example we establish in Chapter 12 the basic expansion for independent replications of a given experiment. As soon as convergence of a sequence to a limit experiment is established, the asymptotic decision theory of Chapter 10 may be applied. This synthesis is carried through in Chapter 13 for sequences of experiments converging to Gaussian shifts. In this way, we obtain the main results of classical asymptotics. We show by means of examples how to treat both parametric and nonparametric problems by the tools established so far. It is clear that the motivation for the development of a general asymptotic decision theory can only be found in classical examples, which are not the main subject of this book. The connoisseur of classical statistics will not mind the lack of discussion of the intuitive background of the concepts. But for the beginner we emphasize that this book will completely miss its aim if the reader is not aware of the connection with the origin of the ideas. There is a sufficient number of good textbooks covering classical statistical methods and some
asymptotics, which should be used as a permanent reference. Moreover, it is highly recommended to read the original papers quoted in the text. We do not claim that our presentation is final in any respect but we hope that it will be helpful for some readers. The author wishes to express his gratitude to H. Bauer, P. Gabriel and W. Schuder for accepting this book for publication. He feels very much indebted to C. Becker and H. Milbrodt, who helped during the preparation of the manuscript by working through several versions, filling gaps and correcting numerous mistakes, and to Miss Heike Milbrodt for translating parts of the preface and the introductions into English. Thanks are also due to many colleagues for supporting me by valuable hints and remarks, especially I. Bomze, A. Janssen, H. Linhart and J. Pfanzagl. It is a special pleasure to assure Mrs. G. Witzigmann of my most grateful appreciation for the care and skill she displayed in the typing of the manuscript. Bayreuth, July 1985
Helmut Strasser
Contents
Chapter 1: Basic Notions on Probability Measures
1. Decomposition of probability measures
2. Distances between probability measures
3. Topologies and σ-fields on sets of probability measures
4. Separable sets of probability measures
5. Transforms of bounded Borel measures
6. Miscellaneous results

Chapter 2: Elementary Theory of Testing Hypotheses
7. Basic definitions
8. Neyman-Pearson theory for binary experiments
9. Experiments with monotone likelihood ratios
10. The generalized lemma of Neyman-Pearson
11. Exponential experiments of rank 1
12. Two-sided testing for exponential experiments: Part 1
13. Two-sided testing for exponential experiments: Part 2

Chapter 3: Binary Experiments
14. The error function
15. Comparison of binary experiments
16. Representation of experiment types
17. Concave functions and Mellin transforms
18. Contiguity of probability measures

Chapter 4: Sufficiency, Exhaustivity, and Randomizations
19. The idea of sufficiency
20. Pairwise sufficiency and the factorization theorem
21. Sufficiency and topology
22. Comparison of dominated experiments by testing problems
23. Exhaustivity
24. Randomization of experiments
25. Statistical isomorphism

Chapter 5: Exponential Experiments
26. Basic facts
27. Conditional tests
28. Gaussian shifts with nuisance parameters

Chapter 6: More Theory of Testing
29. Complete classes of tests
30. Testing for Gaussian shifts
31. Reduction of testing problems by invariance
32. The theorem of Hunt and Stein

Chapter 7: Theory of estimation
33. Basic notions of estimation
34. Median unbiased estimation for Gaussian shifts
35. Mean unbiased estimation
36. Estimation by desintegration
37. Generalized Bayes estimates
38. Full shift experiments and the convolution theorem
39. The structure model
40. Admissibility of estimators

Chapter 8: General decision theory
41. Experiments and their L-spaces
42. Decision functions
43. Lower semicontinuity
44. Risk functions
45. A general minimax theorem
46. The minimax theorem of decision theory
47. Bayes solutions and the complete class theorem
48. The generalized theorem of Hunt and Stein

Chapter 9: Comparison of experiments
49. Basic concepts
50. Standard decision problems
51. Comparison of experiments by standard decision problems
52. Concave function criteria
53. Hellinger transforms and standard measures
54. Comparison of experiments by testing problems
55. The randomization criterion
56. Conical measures
57. Representation of experiments
58. Transformation groups and invariance
59. Topological spaces of experiments

Chapter 10: Asymptotic decision theory
60. Weakly convergent sequences of experiments
61. Contiguous sequences of experiments
62. Convergence in distribution of decision functions
63. Stochastic convergence of decision functions
64. Convergence of minimum estimates
65. Uniformly integrable experiments
66. Uniform tightness of generalized Bayes estimates
67. Convergence of generalized Bayes estimates

Chapter 11: Gaussian shifts on Hilbert spaces
68. Linear stochastic processes and cylinder set measures
69. Gaussian shift experiments
70. Banach sample spaces
71. Testing for Gaussian shifts
72. Estimation for Gaussian shifts
73. Testing and estimation for Banach sample spaces

Chapter 12: Differentiability and asymptotic expansions
74. Stochastic expansion of likelihood ratios
75. Differentiable curves
76. Differentiable experiments
77. Conditions for differentiability
78. Examples of differentiable experiments
79. The stochastic expansion of a differentiable experiment

Chapter 13: Asymptotic normality
80. Asymptotic normality
81. Exponential approximation and asymptotic sufficiency
82. Application to testing hypotheses
83. Application to estimation
84. Characterization of central sequences

Appendix: Notation and terminology
References
List of symbols
Author index
Subject index
Chapter 1: Basic Notions on Probability Measures
This chapter contains some mathematical tools that are, in a sense, typical of statistical theory. Strictly speaking, however, their mathematical background has little to do with statistical ideas. We give detailed proofs where the problems have so far hardly been discussed in textbooks on probability theory. The decomposition of probability measures with the help of likelihood ratios is of particular importance. Here, we base the concept of the likelihood ratio exclusively on the Radon-Nikodym Theorem, because the introduction of a dominating measure seems arbitrary. Section 1 contains fundamental facts and rules on likelihood ratios. In Section 2 we discuss the variational distance and the Hellinger distance between probability measures. We prove some well-known inequalities which will be useful later. Section 3 contains brief remarks on topologies of sets of probability measures, and in Section 4 we are concerned with separability of those topologies. Separability has important consequences for measurability of the likelihood function. The results are partly due to Pfanzagl [1969] and Nolle [1966]. In Section 5 we deal with transforms of measures. Our major concern is with the Laplace and Mellin transforms of measures. Finally, we discuss the Hellinger transform of measures on simplices. For each transform we shall give a uniqueness theorem and a continuity theorem. Section 6 contains miscellaneous results needed later and serves mainly for easy reference.
1. Decomposition of probability measures
Let $(\Omega, \mathcal{A})$ be a measurable space and let $P|\mathcal{A}$, $Q|\mathcal{A}$ be probability measures. We begin with existence and uniqueness of Lebesgue decompositions of probability measures.
1.1 Lemma. There is an $\mathcal{A}$-measurable function $f \ge 0$ and a set $N \in \mathcal{A}$ such that $P(N) = 0$ and
$$Q(A) = \int_A f \, dP + Q(A \cap N), \quad A \in \mathcal{A}.$$
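On a finite sample space the Lebesgue decomposition of Lemma 1.1, $Q(A) = \int_A f \, dP + Q(A \cap N)$ with $P(N) = 0$, can be written down explicitly. The following is only an illustrative sketch under that finiteness assumption: the function names and the three-point example are hypothetical, and setting $f = 0$ on $N$ is an arbitrary choice, since $f$ is determined only up to $P$-null sets.

```python
from fractions import Fraction
from itertools import chain, combinations


def lebesgue_decomposition(p, q):
    """Decompose q with respect to p on a finite sample space.

    p and q map every sample point to its probability (same keys).
    Returns (f, N): a density f >= 0 and a set N with P(N) = 0.
    """
    # N collects the points that P does not charge.
    N = {w for w in p if p[w] == 0}
    # Off N the density is the ratio q/p; on N its value is irrelevant (set to 0).
    f = {w: Fraction(0) if w in N else Fraction(q[w]) / Fraction(p[w]) for w in p}
    return f, N


def decomposed_mass(p, q, f, N, A):
    """Right-hand side of the lemma: integral of f over A w.r.t. P, plus Q(A ∩ N)."""
    return sum(f[w] * p[w] for w in A) + sum(q[w] for w in A if w in N)


def all_events(points):
    """All subsets of a finite sample space, as tuples of points."""
    points = list(points)
    return chain.from_iterable(combinations(points, r) for r in range(len(points) + 1))


# Hypothetical example: Q puts mass 3/4 on a point that P does not charge.
P = {"a": Fraction(1, 2), "b": Fraction(1, 2), "c": Fraction(0)}
Q = {"a": Fraction(1, 4), "b": Fraction(0), "c": Fraction(3, 4)}

f, N = lebesgue_decomposition(P, Q)
assert sum(P[w] for w in N) == 0  # P(N) = 0
assert all(decomposed_mass(P, Q, f, N, A) == sum(Q[w] for w in A)
           for A in all_events(P))
```

Exact rational arithmetic is used so that the identity can be checked with equality on every event rather than up to rounding.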
Proof. [...]
Apart from the variational distance, the so-called Hellinger distance is an important tool for statistical purposes. [...]
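For probability vectors on a finite set, the variational distance and the Hellinger distance can be computed directly. The sketch below is illustrative only and assumes one common normalization, namely $d_1(P,Q) = \tfrac12 \sum_i |p_i - q_i|$ and $d_2(P,Q) = \bigl(\tfrac12 \sum_i (\sqrt{p_i} - \sqrt{q_i})^2\bigr)^{1/2}$; the normalization used elsewhere in the text may differ by constant factors, and the function names are hypothetical. Under this convention both distances lie in $[0, 1]$ and satisfy $d_2^2 \le d_1 \le d_2 \sqrt{2 - d_2^2}$.

```python
import math


def variational_distance(p, q):
    """Half the L1 distance between two probability vectors (assumed normalization)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def hellinger_distance(p, q):
    """Square root of half the squared L2 distance between the root densities."""
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(0.5 * s)


# Hypothetical example on a three-point sample space.
p = [0.5, 0.5, 0.0]
q = [0.25, 0.0, 0.75]

d1 = variational_distance(p, q)
d2 = hellinger_distance(p, q)

# The two distances are comparable under the normalization assumed above.
assert d2 ** 2 <= d1 <= d2 * math.sqrt(2.0 - d2 ** 2)
print(d1, d2)
```

With this normalization, mutually singular measures are at distance 1 in both metrics, and identical measures are at distance 0.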
2.13 Remark. Suppose that $P|\mathcal{A}$, $Q|\mathcal{A}$ and $R|\mathcal{A}$ are probability measures. Let $\left(\frac{dQ}{dP}, N_1\right)$ and $\left(\frac{dR}{dP}, N_2\right)$ be the Lebesgue decompositions of $Q$ and $R$ with respect to $P$. Later, the functions [...] will be considered. Easy computations show that $P(g_Q g_R) = 4[d_2^2(P, Q) + d_2^2(P, R) - \ldots$ [...]