Mathematical Statistics: A Decision Theoretic Approach [1 ed.] 0122537505, 9780122537509



Probability and Mathematical Statistics
A Series of Monographs and Textbooks

Edited by

Z. W. Birnbaum
University of Washington
Seattle, Washington

E. Lukacs
Catholic University
Washington, D.C.

1. Thomas Ferguson. Mathematical Statistics: A Decision Theoretic Approach. 1967
2. Howard Tucker. A Graduate Course in Probability. 1967

In preparation
K. R. Parthasarathy. Probability Measures on Metric Spaces

MATHEMATICAL STATISTICS
A DECISION THEORETIC APPROACH

Thomas S. Ferguson
DEPARTMENT OF MATHEMATICS
UNIVERSITY OF CALIFORNIA
LOS ANGELES, CALIFORNIA

1967

ACADEMIC PRESS
New York and London

THIS WORK WAS SUPPORTED IN PART BY A GRANT FROM THE FORD FOUNDATION, AND BY A CONTRACT WITH THE OFFICE OF NAVAL RESEARCH, CONTRACT 233(75).

COPYRIGHT © 1967, BY ACADEMIC PRESS INC.
ALL RIGHTS RESERVED.
NO PART OF THIS BOOK MAY BE REPRODUCED IN ANY FORM, BY PHOTOSTAT, MICROFILM, OR ANY OTHER MEANS, WITHOUT WRITTEN PERMISSION FROM THE PUBLISHERS.

ACADEMIC PRESS INC.
111 Fifth Avenue, New York, New York 10003

United Kingdom Edition published by
ACADEMIC PRESS INC. (LONDON) LTD.
Berkeley Square House, London W.1

LIBRARY OF CONGRESS CATALOG CARD NUMBER: 66-30080

PRINTED IN THE UNITED STATES OF AMERICA

Preface

The theory of games is a part of the rich mathematical legacy left by John von Neumann, one of the outstanding mathematicians of our era. Although others—notably Emil Borel—preceded him in formulating a theory of games, it was von Neumann who with the publication in 1928 of a proof of the minimax theorem for finite games laid the foundation for the theory of games as it is known today. Von Neumann's work culminated in a book written in collaboration with Oskar Morgenstern entitled Theory of Games and Economic Behavior, published in 1944.

At about the same time, statistical theory was being given an increasingly rigorous mathematical foundation in a series of papers by J. Neyman and Egon Pearson. Statistical theory until that time, as developed by Karl Pearson, R. A. Fisher, and others, had lacked the precise mathematical formulation, supplied by Neyman and Pearson, that allows the delicate foundational questions involved to be treated rigorously.

Apparently it was Abraham Wald who first appreciated the connections between the theory of games and the statistical theory of Neyman and Pearson, and who recognized the advantages of basing statistical theory on the theory of games. Wald's theory of statistical decisions, as it is called, generalizes and simplifies the Neyman-Pearson theory by unifying, that is, by treating problems considered as distinct in the Neyman-Pearson theory as special cases of the decision theory problem.


In the 1940's, Wald produced a prodigious amount of research that resulted in the publication of his book Statistical Decision Functions in 1950, the year of his tragic death in an airplane accident.

It is our objective to present the elements of Wald's decision theory and an investigation of the extent to which problems of mathematical statistics may be treated successfully by this approach. The main viewpoint is developed in the first two chapters and culminates in a rather general complete class theorem (Theorem 2.10.3). The remaining five chapters deal with statistical topics. No separate chapter on estimation is included, since estimation is discussed as examples for general decision problems. It was originally intended that only those parts of statistical theory that could be justified from a decision-theoretic viewpoint would be included. Mainly, this entails the omission of those topics whose mathematical justification is given by large sample theory, such as maximum likelihood estimates, minimum χ² methods, and likelihood ratio tests. However, one exception is made. Although the theory of confidence sets as treated does not allow a decision-theoretic justification, it was felt that this topic "belongs" in any discourse on statistics wherein tests of hypotheses are treated. For purposes of comparison, the decision-theoretic notion of a set estimate is included in the exercises.

This book is intended for first-year graduate students in mathematics. It has been used in mimeographed form at UCLA in a two-semester or three-quarter course attended mainly by mathematicians, bio-statisticians, and engineers. I have generally finished the first four chapters in the first semester, deleting perhaps Sections 1.4 and 3.7, but I have never succeeded in completing the last three chapters in the second semester.
There are four suggested prerequisites.

(1) The main prerequisite is a good undergraduate course in probability. Ideally, this course should pay a little more attention to conditional expectation than the usual course. In particular, the formula E(E(X | Y)) = E(X) should be stressed. Although the abstract approach to probability theory through measure theory is not used (except in Section 3.7, which may be omitted), it is assumed that the reader is acquainted with the notions of a σ-field of sets (as the natural domain of definition of a probability) and of a set of probability zero.

(2) An undergraduate course in analysis on Euclidean spaces is strongly recommended. It is assumed that the reader knows the concepts of continuity, uniform continuity, open and closed sets, the Riemann integral, and so forth.

(3) An introductory undergraduate course in statistics is highly desirable as background material. Although the usual notions of test, power function, and so on, are defined as they arise, the discussion and illustration are rather abstract.

(4) A course in the algebra of matrices would be helpful to the student.

Rudimentary notes leading to this book have been in existence for about six years. Each succeeding generation of students has improved the quality of the text and removed errors overlooked by their predecessors. Without the criticism and interest of these students, too numerous to mention individually, this book would not have been written. Early versions of the notes benefitted from comments by Jack Kiefer and Herbert Robbins. The notes were used by Milton Sobel for a course at the University of Minnesota; his criticisms and those of his students were very useful. Further improvements followed when Paul Hoel used the notes in a course at UCLA. Finally, Gus Haggstrom gave the galleys a critical reading and caught several errors that eluded all previous readers. To all these, I express my deep appreciation.

Berkeley, California
February, 1967

THOMAS S. FERGUSON

Table of Contents

Preface

Chapter 1. Game Theory and Decision Theory
1.1 Basic Elements
1.2 A Comparison of Game Theory and Decision Theory
1.3 Decision Function; Risk Function
1.4 Utility and Subjective Probability
1.5 Randomization
1.6 Optimal Decision Rules
1.7 Geometric Interpretation for Finite Θ
1.8 The Form of Bayes Rules for Estimation Problems

Chapter 2. The Main Theorems of Decision Theory
2.1 Admissibility and Completeness
2.2 Decision Theory
2.3 Admissibility of Bayes Rules
2.4 Basic Assumptions
2.5 Existence of Bayes Decision Rules
2.6 Existence of a Minimal Complete Class
2.7 The Separating Hyperplane Theorem
2.8 Essential Completeness of the Class of Nonrandomized Decision Rules
2.9 The Minimax Theorem
2.10 The Complete Class Theorem
2.11 Solving for Minimax Rules

Chapter 3. Distributions and Sufficient Statistics
3.1 Useful Univariate Distributions
3.2 The Multivariate Normal Distribution
3.3 Sufficient Statistics
3.5 Exponential Families of Distributions
3.6 Complete Sufficient Statistics
3.7 Continuity of the Risk Function

Chapter 4. Invariant Statistical Decision Problems
4.1 Invariant Decision Problems
4.2 Invariant Decision Rules
4.3 Admissible and Minimax Invariant Rules
4.4 Location and Scale Parameters
4.5 Minimax Estimates of Location Parameters
4.6 Minimax Estimates for the Parameters of a Normal Distribution
4.7 The Pitman Estimate
4.8 Estimation of a Distribution Function

Chapter 5. Testing Hypotheses
5.1 The Neyman-Pearson Lemma
5.2 Uniformly Most Powerful Tests
5.3 Two-Sided Tests
5.4 Uniformly Most Powerful Unbiased Tests
5.5 Locally Best Tests
5.6 Invariance in Hypothesis Testing
5.7 The Two-Sample Problem
5.8 Confidence Sets
5.9 The General Linear Hypothesis
5.10 Confidence Ellipsoids and Multiple Comparisons

Chapter 6. Multiple Decision Problems
6.1 Monotone Multiple Decision Problems
6.2 Bayes Rules in Multiple Decision Problems
6.3 Slippage Problems

Chapter 7. Sequential Decision Problems
7.1 Sequential Decision Rules
7.2 Bayes and Minimax Sequential Decision Rules
7.3 Convex Loss and Sufficiency
7.4 Invariant Sequential Decision Problems
7.5 Sequential Tests of a Simple Hypothesis Against a Simple Alternative
7.6 The Sequential Probability Ratio Test
7.7 The Fundamental Identity of Sequential Analysis

References
Subject Index


CHAPTER 1

Game Theory and Decision Theory

1.1 Basic Elements

The elements of decision theory are similar to those of the theory of games. In particular, decision theory may be considered as the theory of a two-person game, in which nature takes the role of one of the players. The so-called normal form of a zero-sum two-person game, henceforth to be referred to as a game, consists of three basic elements:

1. A nonempty set, Θ, of possible states of nature, sometimes referred to as the parameter space.
2. A nonempty set, 𝒜, of actions available to the statistician.
3. A loss function, L(θ, a), a real-valued function defined on Θ × 𝒜.

A game in the mathematical sense is just such a triplet (Θ, 𝒜, L), and any such triplet defines a game, which is interpreted as follows. Nature chooses a point θ in Θ, and the statistician, without being informed of the choice nature has made, chooses an action a in 𝒜. As a consequence of these two choices, the statistician loses an amount L(θ, a). [The function L(θ, a) may take negative values. A negative loss may be interpreted as a gain, but throughout this book L(θ, a) represents the loss to the statistician if he takes action a when θ is the "true state of nature".] Simple though this definition may be, its scope is quite broad, as the following examples illustrate.
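The triplet can be made concrete in a few lines of code. The following is a minimal sketch, not from the text: the loss values are illustrative toy numbers, and the worst-case comparison anticipates the minimax ideas of later sections.

```python
# A minimal sketch of a game (Theta, A, L): nature picks theta, the
# statistician picks a, and the statistician loses L(theta, a).
# (The loss values are assumed toy numbers, not prescribed by the text.)
Theta = ["theta1", "theta2"]
A = ["a1", "a2"]
L = {("theta1", "a1"): 4, ("theta1", "a2"): 1,
     ("theta2", "a1"): -3, ("theta2", "a2"): 0}

# Not knowing theta, a cautious statistician might compare actions by their
# worst-case loss and pick the action whose worst case is smallest.
worst = {a: max(L[(theta, a)] for theta in Theta) for a in A}
best_action = min(worst, key=worst.get)
print(best_action, worst[best_action])  # -> a2 1
```

Here action a2 guards the statistician against a loss worse than 1, whatever nature does; later sections make this idea precise.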

EXAMPLE 1. ODD OR EVEN. Two contestants simultaneously put up either one or two fingers. One of the players, call him player I, wins if the sum of the digits showing is odd, and the other player, player II, wins if the sum of the digits showing is even. The winner in all cases receives in dollars the sum of the digits showing, this being paid to him by the loser. To create a triplet (Θ, 𝒜, L) out of this game we give player I the label "nature" and player II the label "statistician". Each of these players has two possible choices, so that Θ = {1, 2} = 𝒜, in which "1" and "2" stand for the decisions to put up one and two fingers, respectively. The loss function is given by Table 1.1.

Table 1.1
               a
    θ        1     2
    1       −2     3
    2        3    −4
          L(θ, a)

Thus L(1, 1) = −2,

L(1, 2) = 3, L(2, 1) = 3, and L(2, 2) = −4. It is quite clear that this is a game in the sense described in the first paragraph. This example is discussed later in Section 1.7, in which it is shown that one of the players has a distinct advantage over the other. Can you tell which one it is? Which player would you rather be?

EXAMPLE 2. TIC-TAC-TOE, CHESS.

In the game (Θ, 𝒜, L) an element of the space Θ or 𝒜 is sometimes referred to as a strategy. In some games strategies are built on a more elementary concept, that of a "move". Many parlor games illustrate this feature; for example, the games tic-tac-toe, chess, checkers, Battleship, Nim, Go, and so forth. A move is an action made by a specified player at a specified time during the game. The rules determine at each move the player whose turn it is to move and the choices of move available to that player at that time. For such a game a strategy is a rule that specifies for a given player the exact move he is to make each time it is his turn to move, for all possible histories of the game. The game of tic-tac-toe has at most nine moves, one player making five of them, the other making four. A player's strategy must tell him exactly what move to make in each possible position that may occur in the game. Because the number of possible games of tic-tac-toe is rather small (less than 9!), it is possible to write down an optimal strategy for each player. In this case each player has a strategy that guarantees its user at least a tie, no matter what his opponent does. Such strategies are called optimal strategies. Naturally, in the game of chess it is physically impossible to describe "all possible histories", for there are too many possible games of chess and many more strategies, in fact, than there are atoms in our solar system. We can write down strategies for the game of chess, but none so far constructed has much of a chance of beating the average amateur. When the two players have written down their strategies, they may be given to a referee who may play through the game and determine the winner. In the triplet (Θ, 𝒜, L), which describes either tic-tac-toe or chess, the spaces Θ and 𝒜 are the sets of all strategies for the two players, and the loss function L(θ, a) may be +1 if the strategy θ beats the strategy a, 0 for a draw, and −1 if a beats θ.

EXAMPLE 3. A GAME WITH BLUFFING.

Another feature of many games, and one that is characteristic of card games, is the notion of a chance move. The dealing or drawing of cards, the rolling of dice, the spinning of a roulette wheel, and so on, are examples of chance moves. In the theory of games it is assumed that both players are aware of the probabilities of the various outcomes resulting from a chance move. Sometimes, as in card games, one player may be informed of the actual outcome of a chance move, whereas the other player is not. This leads to the possibility of "bluffing". The following example is a loose description of a situation which sometimes occurs in the game of stud poker. Two players each put an "ante" of a units into a pot (a > 0). Player I then draws a card from a deck, which gives him a winning or a losing card. Both players are aware of the probability P that the card drawn is a winning card (0 < P < 1). Player I then may bet b units (b > 0) by putting b units into the pot or he may check. If player I checks, he wins the pot if he has a winning card and loses the pot if he has a losing card. If player I bets, player II may call and put b units in the pot or he may fold. If player II folds, player I wins the pot whatever card he has drawn. If player II calls, player I wins the pot if he has a winning card and loses it otherwise. If I receives a winning card, it is clear that he should bet: if he checks,


he automatically receives total winnings of a units, whereas if he bets, he will receive at least a units and possibly more. For the purposes of our discussion we assume that the rules of the game enforce this condition: that if I receives a winning card, he must bet. This will eliminate some obviously poor strategies from player I's strategy set. With this restriction, player I has two possible strategies: (a) the bluff strategy—bet with a winning card or a losing card; and (b) the honest strategy—bet with a winning card, check with a losing card. The two strategies for player II are (a) the call strategy—if player I bets, call; and (b) the fold strategy—if player I bets, fold. Given a strategy for each player in a game with chance moves, a referee can play the game through as before, playing each chance move with the probability distribution specified, and determining who has won and by how much. The actual payoff in such games is thus a random quantity determined by the chance moves. In writing down a loss function, we replace these random quantities by their expected values in order to obtain a game as defined. (Further discussion of this may be found in Sections 1.3 and 1.4.)

Table 1.2
                  Call                 Fold
    Bluff     (2P − 1)(a + b)            a
    Honest    (2P − 1)a + Pb        (2P − 1)a

Table 1.2

shows player I's expected winnings and player II's expected losses. For example, if I uses the honest strategy and II uses the call strategy, player II's loss will be (a + b) with probability P (I receives a winning card) and −a with probability (1 − P) (I receives a losing card). The expected loss is

    (a + b)P − a(1 − P) = (2P − 1)a + Pb,

as found in the table. If player I is given the label "nature" and player II the label "statistician," the triplet (Θ, 𝒜, L), in which Θ = {bluff, honest}, 𝒜 = {call, fold}, and L is given by Table 1.2, defines a game that contains the main aspects of the bluffing game already described. This game is considered in Exercises 1.7.4 and 5.2.8.
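Every entry of Table 1.2 can be verified mechanically by averaging over the chance move. The sketch below is mine, not from the text; the particular values of a, b, and P are arbitrary illustrations of quantities the text leaves general.

```python
# Recompute Table 1.2 by taking expectations over the chance move.
a, b, P = 1.0, 2.0, 0.6  # ante, bet size, win-card probability (assumed values)

def expected_loss_II(strategy_I, strategy_II):
    """Player II's expected loss = player I's expected winnings."""
    total = 0.0
    for card, prob in (("win", P), ("lose", 1 - P)):
        bets = (card == "win") or (strategy_I == "bluff")  # I must bet on a winner
        if not bets:                       # I checks with a losing card
            total += prob * (-a)           # and loses the pot
        elif strategy_II == "fold":
            total += prob * a              # II folds; I wins the ante
        else:                              # II calls; winner takes a + b
            total += prob * (a + b if card == "win" else -(a + b))
    return total

# Compare with the closed forms in Table 1.2:
assert abs(expected_loss_II("bluff", "call") - (2*P - 1)*(a + b)) < 1e-9
assert abs(expected_loss_II("bluff", "fold") - a) < 1e-9
assert abs(expected_loss_II("honest", "call") - ((2*P - 1)*a + P*b)) < 1e-9
assert abs(expected_loss_II("honest", "fold") - (2*P - 1)*a) < 1e-9
```

With a = 1, b = 2, P = 0.6, for instance, honest against call costs player II (2P − 1)a + Pb = 1.4 units on average.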

1.2 A Comparison of Game Theory and Decision Theory

There are certain differences between game theory and decision theory that arise from the philosophical interpretation of the elements θ, a, and L. The main differences are these.

1. In a two-person game the two players are trying simultaneously to maximize their winnings (or to minimize their losses), whereas in decision theory nature chooses a state without this view in mind. This difference plays a role mainly in the interpretation of what is considered to be a good decision for the statistician and results in presenting him with a broader dilemma and a correspondingly wider class of what might be called "reasonable" decision rules. This is natural, for one can depend on an intelligent opponent to behave "rationally", that is to say, in a way profitable to him. However, a criterion of "rational" behavior for nature may not exist or, if it does, the statistician may not have knowledge of it. We do not assume that nature wins the amount L(θ, a) when θ and a are the points chosen by the players. An example will make this clear. Consider the game (Θ, 𝒜, L) in which Θ = {θ₁, θ₂} and 𝒜 = {a₁, a₂} and in which the loss function L is given by Table 1.3.

Table 1.3
             a₁    a₂
    θ₁        4     1
    θ₂       −3     0
          L(θ, a)

In game

theory, in which the player choosing a point from Θ is assumed to be intelligent and his winnings in the game are given by the function L, the only "rational" choice for him is θ₁. No matter what his opponent does, he will gain more if he chooses θ₁ than if he chooses θ₂. Thus it is clear that the statistician should choose action a₂, instead of a₁, for he will lose only one instead of four. Again, this is the only reasonable thing for him to do. Now, suppose that the function L does not reflect


the winnings of nature or that nature chooses a state without any clear objective in mind. Then we can no longer state categorically that the statistician should choose action a₂. If nature happens to choose θ₂, the statistician will prefer to take action a₁. This basic conceptual difference between game theory and decision theory is reflected in the difference between the theorems we have called fundamental for game theory and fundamental for decision theory (Sec. 2.2).

2. It is assumed that nature chooses the "true state" once and for all and that the statistician has at his disposal the possibility of gathering information on this choice by sampling or by performing an experiment. This difference between game theory and decision theory is more apparent than real, for one can easily imagine a game between two intelligent adversaries in which one of the players has an advantage given to him by the rules of the game by which he can get some information on the choice his opponent has made before he himself has to make a decision. It turns out (Sec. 1.3) that the over-all problem which allows the statistician to gain information by sampling may simply be viewed as a more complex game. However, all statistical games have this characteristic feature, and it is the exploitation of the structure which such gathering of information gives to a game that distinguishes decision theory from game theory proper.

For an entertaining introduction to finite games the delightful book The Compleat Strategyst by the late J. D. Williams (1954) is highly recommended. The more serious student should also consult the lucid accounts of game theory found in McKinsey (1952), Karlin (1959), and Luce and Raiffa (1957). An elementary text by Chernoff and Moses (1959) provides a good introduction to the main concepts of decision theory. The important book by Blackwell and Girshick (1954), which is a more advanced text, is recommended as collateral reading for this study.

1.3 Decision Function; Risk Function

To give a mathematical structure to this process of information gathering, we suppose that the statistician before making a decision is allowed to look at the observed value of a random variable or vector, X, whose distribution depends on the true state of nature, θ. Throughout most of this book the sample space, denoted by 𝒳, is taken to be (a Borel subset of)


a finite dimensional Euclidean space, and the probability distributions of X are supposed to be defined on the Borel subsets, ℬ, of 𝒳. Thus for each θ ∈ Θ there is a probability measure P_θ defined on ℬ, and a corresponding cumulative distribution function F_X(x | θ), which represents the distribution of X when θ is the true value of the parameter. [If X is an n-dimensional vector, it is best to consider X as a notation for (X₁, …, X_n) and F_X(x | θ) as a notation for the multivariate cumulative distribution function F_{X₁,…,X_n}(x₁, …, x_n | θ).]

A statistical decision problem or a statistical game is a game (Θ, 𝒜, L) coupled with an experiment involving a random observable X whose distribution P_θ depends on the state θ ∈ Θ chosen by nature. On the basis of the outcome of the experiment X = x (x is the observed value of X), the statistician chooses an action d(x) ∈ 𝒜. Such a function d, which maps the sample space 𝒳 into 𝒜, is an elementary strategy for the statistician in this situation. The loss is now the random quantity L(θ, d(X)). The expected value of L(θ, d(X)) when θ is the true state of nature is called the risk function

    R(θ, d) = E_θ L(θ, d(X))        (1.1)

and represents the average loss to the statistician when the true state of nature is θ and the statistician uses the function d. Note that for some choices of the function d and some values of the parameter θ the expected value in (1.1) may be ±∞ or, worse, it may not even exist. As the following definition indicates, we do not bother ourselves about such functions.

Definition 1. Any function d(x) that maps the sample space 𝒳 into 𝒜 is called a nonrandomized decision rule or a nonrandomized decision function, provided the risk function R(θ, d) exists and is finite for all θ ∈ Θ. The class of all nonrandomized decision rules is denoted by D.

Unfortunately, the class D is not well defined unless we specify the sense in which the expectation in (1.1) is to be understood. The reader may take this expectation to be the Lebesgue integral,

    R(θ, d) = E_θ L(θ, d(X)) = ∫ L(θ, d(x)) dP_θ(x).


With such an understanding, D consists of those functions d for which L(θ, d(x)) is for each θ ∈ Θ a Lebesgue integrable function of x. In particular, D contains all simple functions. (A function d from 𝒳 to 𝒜 is called simple if there is a finite partition of 𝒳 into measurable subsets B₁, …, B_m ∈ ℬ, and a finite subset {a₁, …, a_m} of 𝒜 such that for x ∈ B_i, d(x) = a_i for i = 1, …, m.) On the other hand, the expectation in (1.1) may be taken as the Riemann or the Riemann-Stieltjes integral,

    R(θ, d) = E_θ L(θ, d(X)) = ∫ L(θ, d(x)) dF_X(x | θ).

In that case D would contain only functions d for which L(θ, d(x)) is for each θ ∈ Θ continuous on a set of probability one under F_X(x | θ). For the purposes of understanding what follows, it is not too important which of the various definitions is given to the expectation in (1.1). In most of the proofs of the theorems given later we use only certain linearity [E(aX + Y) = aEX + EY] and ordering (X ≥ 0 ⇒ EX ≥ 0) properties of the expectation; such proofs are equally valid for Lebesgue and Riemann integrals. Therefore we let the definition of the expectation be arbitrary (unless otherwise stated) and assume that the class D of decision rules is well defined.
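As a numerical illustration of (1.1), consider an assumed example that is not from the text: X ~ N(θ, 1), squared-error loss L(θ, a) = (θ − a)², and the rule d(x) = x. Then R(θ, d) = E_θ(X − θ)² = 1 for every θ, and a direct midpoint-rule approximation of ∫ L(θ, d(x)) dP_θ(x) confirms it.

```python
import math

def risk(theta, d, loss, n=20000, half_width=10.0):
    """Midpoint-rule approximation of R(theta, d) = ∫ loss(theta, d(x)) dP_theta(x)
    when X has the N(theta, 1) density; tails beyond half_width are negligible."""
    lo = theta - half_width
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        density = math.exp(-(x - theta) ** 2 / 2.0) / math.sqrt(2.0 * math.pi)
        total += loss(theta, d(x)) * density * h
    return total

squared_error = lambda theta, a: (theta - a) ** 2
d = lambda x: x
print([round(risk(t, d, squared_error), 4) for t in (-2.0, 0.0, 3.5)])  # -> [1.0, 1.0, 1.0]
```

The constant risk here is special to this loss and rule; in general R(θ, d) varies with θ, which is exactly why comparing decision rules is delicate.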

I n t h a t case D would contain only functions d for which L(0, d(x)) is for each 0 £ @ continuous on a set of probability one under Fx(x | 0). F o r t h e purposes of understanding w h a t follows, it is n o t too i m p o r t a n t which of t h e various definitions is given to t h e expectation in (1.1). I n most of t h e proofs of t h e theorems given later we use only certain lin­ earity [E{aX + Y) = aEX + EY~] a n d ordering ( X > 0 => EX > 0) properties of t h e expectation; such proofs are equally valid for Lebesgue and R i e m a n n integrals. Therefore we let t h e definition of the expecta­ tion be arbitrary (unless otherwise stated) and assume t h a t t h e class D of decision rules is well defined. E X A M P L E 1. T h e game of " o d d or e v e n " mentioned in Sec. 1.1 m a y be extended to a statistical decision problem. Suppose t h a t before t h e game is played t h e player called " t h e statistician" is allowed t o ask t h e player called " n a t u r e " how m a n y fingers he intends to p u t u p a n d t h a t n a t u r e m u s t answer truthfully with probability 3 / 4 (hence u n t r u t h ­ fully with probability 1/4). T h e statistician therefore observes a r a n d o m variable X (the answer n a t u r e gives) taking t h e values 1 or 2. If 0 = 1 is t h e t r u e state of nature, t h e probability t h a t X == 1 is 3 / 4 ; t h a t is, Pi{X = 1} = 3 / 4 . Similarly, P } X = 1} = 1 / 4 . T h e r e are exactly four possible functions from X = {1, 2} into ft = {1, 2}. These are t h e four decision rules: 2

= 1,

4 ( 2 ) = 1;

4 ( 1 ) = 1,

4 ( 2 ) = 2;

4 ( 1 ) = 2,

4 ( 2 ) = 1;

4 ( 1 ) = 2,

4 ( 2 ) = 2.

di(l)

Rules 4 and 4 ignore t h e value of X . Rule 4 reflects t h e belief of t h e

statistician that nature is telling the truth, and rule d₃, that nature is not telling the truth. The risk table (Table 1.4) should be checked by the student as an exercise.

Table 1.4
               d₁      d₂      d₃     d₄
    θ   1     −2     −3/4     7/4      3
        2      3     −9/4     5/4     −4
            R(θ, d)
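The exercise just suggested can be automated. This sketch (my own check, not part of the text) recomputes every entry of Table 1.4 from the loss function of Table 1.1 and the stated answer probabilities:

```python
# Risks R(theta, d) = E_theta L(theta, d(X)) for the odd-or-even decision
# problem, with P_1{X=1} = 3/4 and P_2{X=1} = 1/4.
from fractions import Fraction

L = {(1, 1): -2, (1, 2): 3, (2, 1): 3, (2, 2): -4}          # loss L(theta, a)
P = {1: {1: Fraction(3, 4), 2: Fraction(1, 4)},              # P_theta{X = x}
     2: {1: Fraction(1, 4), 2: Fraction(3, 4)}}
rules = {1: {1: 1, 2: 1}, 2: {1: 1, 2: 2},                   # d_i as maps x -> a
         3: {1: 2, 2: 1}, 4: {1: 2, 2: 2}}

def risk(theta, d):
    return sum(P[theta][x] * L[(theta, d[x])] for x in (1, 2))

for theta in (1, 2):
    print([risk(theta, rules[i]) for i in (1, 2, 3, 4)])
```

The two printed rows reproduce the rows of Table 1.4: −2, −3/4, 7/4, 3 for θ = 1 and 3, −9/4, 5/4, −4 for θ = 2.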

It is a custom, which we steadfastly observe, that the choice of a decision function should depend only on the risk function R(θ, d) (the smaller in value the better) and not otherwise on the distribution of the random variable L(θ, d(X)). (For example, this would entail the supposition that a poor man would be indifferent when choosing between the offer of $10,000 as an outright gift, and the offer of a gamble that would give him $20,000 with probability one half and $0 with probability one half.) There is a relatively sound mathematical reason for the statistician to behave in this fashion, provided the loss function is measured in utiles rather than in some monetary way. This topic is the subject of the next section.

Notice that the original game (Θ, 𝒜, L) has been replaced by a new game, (Θ, D, R), in which the space D and the function R have an underlying structure, depending on 𝒜, L, and the distribution of X, whose exploitation must be the main objective of decision theory. Naturally, only a small part of statistics can be contained within such a simple framework. No room has been made for such broad topics as the choice of experiments, the design of experiments, or sequential analysis. In each case a new structure could be added to the framework to include these topics, and the problem would be reduced once again to a simple game. For example, in sequential analysis the statistician may take observations one at a time, paying c units each time he does so. Therefore a decision rule will have to tell him both when to stop taking observations and what action to take once he has stopped. He will try


to choose a decision rule that will minimize in some sense his new risk, which is defined now as the expected value of the loss plus the cost. Nevertheless, even the simple structure of the first paragraph of this section is broad enough to contain the main aspects of three important categories in what might be called "classical" mathematical statistics.

1. 𝒜 consists of two points, 𝒜 = {a₁, a₂}. Decision theoretic problems in which 𝒜 consists of exactly two points are called problems in testing hypotheses. Consider the special case in which Θ is the real line and suppose that the loss function is for some fixed number θ₀ given by the formulas

    L(θ, a₁) = { l₁   if θ > θ₀
               { 0    if θ ≤ θ₀
and
    L(θ, a₂) = { 0    if θ > θ₀
               { l₂   if θ ≤ θ₀,

where h and l are positive numbers. Here we would like to t a k e action ai if 6 < 0 and action a if 0 > 6 . T h e space D of decision rules consists of those functions d from t h e sample space into {a a \ with t h e prop­ erty t h a t P {d(X) = ai} is well-defined for all values of 0 £ 0 . T h e risk function in this case is easy to compute 2

O

2

0

h

2

e

R(6,d)

hPe{d(X)

= ]

l Pe{d(X)

= a)

ai

if

0 > 0

if

0 < 0 •

O

= 2

2

O
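The risk formula above is easy to evaluate in a concrete model. The following sketch is not from the text: it assumes X ~ N(θ, 1) and the hypothetical threshold rule d(x) = a₂ if x > c, d(x) = a₁ otherwise.

```python
import math

def Phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def risk(theta, c, theta0=0.0, l1=1.0, l2=2.0):
    """Risk of the threshold rule d(x) = a2 if x > c, else a1, when
    X ~ N(theta, 1): l1*P{d(X)=a1} for theta > theta0, l2*P{d(X)=a2} otherwise."""
    if theta > theta0:
        return l1 * Phi(c - theta)        # P_theta{X <= c} = P_theta{d(X) = a1}
    return l2 * (1.0 - Phi(c - theta))    # P_theta{X > c}  = P_theta{d(X) = a2}

# The risk is small when theta is far from the threshold c, largest near it:
print(risk(3.0, c=1.0))    # small: wrongly taking action a1 is then unlikely
print(risk(-3.0, c=1.0))   # small: wrongly taking action a2 is then unlikely
```

The two branches of the computation are exactly the two error probabilities discussed next, weighted by the losses l₁ and l₂.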

In this way probabilities of making two types of error are involved. For θ > θ₀, P_θ{d(X) = a₁} is the probability of making the error of taking action a₁ when we should take action a₂ and θ is the true state of nature. Similarly, for θ ≤ θ₀,

P_θ{d(X) = a₂} = 1 − P_θ{d(X) = a₁}

is the probability of making the error of taking action a₂ when we should take action a₁ and θ is the true state of nature.

2. 𝒜 consists of k points, {a₁, a₂, …, a_k}, k ≥ 3. These decision theoretic problems are called multiple decision problems. As an example, a statistician might be called on to decide which of k worthy students is


to receive a scholarship on the basis of school grades and financial need, in which the loss is based on the student's expected performance with and without a scholarship. Another typical example occurs when an experimenter is to judge which of two treatments has a greater yield on the basis of an experiment. He may (a) decide treatment 1 is better, (b) decide treatment 2 is better, or (c) withhold judgment until more data are available. In this example k = 3.

3. 𝒜 consists of the real line, 𝒜 = (−∞, ∞).

1.4 Utility and Subjective Probability

Hypothesis H₁. If p₁, p₂, and q are in 𝒫*, and 0 < λ ≤ 1, then p₁ ≺ p₂ if, and only if, λp₁ + (1 − λ)q ≺ λp₂ + (1 − λ)q.
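Hypothesis H₁ can be illustrated numerically if finite lotteries are represented as {payoff: probability} dictionaries and ranked by expected utility under an assumed utility u. This is only an illustration, not the text's development (which runs the other way, deriving u from the preference pattern); the lotteries and the utility below are made up.

```python
def mix(lam, p, q):
    """The mixture lam*p + (1 - lam)*q of two finite lotteries,
    each represented as a {payoff: probability} dictionary."""
    out = dict()
    for x, pr in p.items():
        out[x] = out.get(x, 0.0) + lam * pr
    for x, pr in q.items():
        out[x] = out.get(x, 0.0) + (1.0 - lam) * pr
    return out

def eu(p, u):
    """Expected utility of lottery p under utility u."""
    return sum(u(x) * pr for x, pr in p.items())

u = lambda x: x ** 0.5              # an assumed utility on dollar payoffs
p1 = {1.0: 0.5, 9.0: 0.5}           # hypothetical lotteries, not from the text
p2 = {6.25: 1.0}
q = {25.0: 1.0}

lam = 0.3
before = eu(p1, u) < eu(p2, u)                         # is p1 worse than p2?
after = eu(mix(lam, p1, q), u) < eu(mix(lam, p2, q), u)
print(before, after)                # H1: the two comparisons agree
```

Because expectation is linear, mixing both lotteries with the same q by the same λ rescales the gap between their expected utilities by λ without changing its sign, which is precisely the "if, and only if" in H₁.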

Hypothesis H₂. If p₁, p₂, and p₃ ∈ 𝒫* are such that p₁ ≺ p₂ ≺ p₃, then there exist numbers λ and μ, with 0 < λ < 1 and 0 < μ < 1, such that

λp₃ + (1 − λ)p₁ ≺ p₂ ≺ μp₃ + (1 − μ)p₁.

Hypothesis H₁ seems reasonable. A minor objection is that we might be indifferent between λp₁ + (1 − λ)q and λp₂ + (1 − λ)q when λ is sufficiently small, say λ = 10⁻¹⁰⁰⁰, even though we prefer p₂ to p₁. Another objection comes from the man who dislikes gambles with random payoffs. He might prefer p₂ that would give him $2.00 for sure to a gamble p₁ that would give him $3.10 with probability 1/2 and $1.00 with probability 1/2; but if q is $5.00 for sure and λ = 1/2, he might prefer λp₁ + (1 − λ)q to λp₂ + (1 − λ)q on the basis of larger expected value, for the payoff is random in either case.

Hypothesis H₂ is more debatable. It is safe to assume that death ≺ 10¢ ≺ $1.00. Yet would there exist a μ < 1 such that 10¢ ≺ μ($1.00) + (1 − μ)(death)? Perhaps not. For myself, I would say that 1 − μ = 10⁻¹⁰⁰⁰ would suffice. At any rate, hypothesis H₂ implies that there is no payoff infinitely more desirable or infinitely less desirable than any other payoff. For penetrating critiques of the whole subject of utility and subjective probability, two entertaining and informative books are recommended: Luce and Raiffa (1957) and Savage (1954).
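The point of H₂ can be made numerically: as long as every payoff, even death, is assigned a finite utility, a suitable μ < 1 exists. The utility values below are purely illustrative assumptions, not from the text.

```python
# Illustrative (assumed) utilities; the only requirement is that they are finite.
u_death = -1.0e9
u_dime = 0.10       # utility assigned to 10 cents
u_dollar = 1.00     # utility assigned to $1.00

# H2 asks for some mu < 1 with  u_dime < mu*u_dollar + (1 - mu)*u_death.
# Solving the linear inequality for mu gives a threshold strictly below 1:
mu_min = (u_dime - u_death) / (u_dollar - u_death)
mu = 0.5 * (mu_min + 1.0)            # any mu in (mu_min, 1) works

print(mu_min)                        # strictly less than 1, however bad u_death is
assert mu_min < 1.0
assert u_dime < mu * u_dollar + (1 - mu) * u_death
```

Only an infinite value of u_death (an "infinitely less desirable" payoff, which H₂ excludes) would push the threshold mu_min up to 1 and make no such μ exist.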

Theorem 1. If a preference pattern ≺ on 𝒫* satisfies H₁ and H₂, then there exists a utility, u, on 𝒫* which agrees with ≺. Furthermore, u is uniquely determined up to a linear transformation.

Note. If u is a utility that agrees with ≺, then ū = αu + β, where α > 0 and β are real numbers, is also a utility that agrees with ≺. Thus the uniqueness of u up to a linear transformation is as strong a uniqueness as any that can be obtained.
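The Note can be checked directly: since probabilities sum to one, replacing u by αu + β with α > 0 rescales every expected utility by α and shifts it by β, so every pairwise comparison is preserved. A quick sketch with made-up lotteries and an arbitrary assumed utility:

```python
import itertools

def eu(lottery, u):
    """Expected utility of a {payoff: probability} lottery under utility u."""
    return sum(u(x) * pr for x, pr in lottery.items())

u = lambda x: x ** 2                 # an arbitrary assumed utility
ubar = lambda x: 3.0 * u(x) + 7.0    # linear transform: alpha = 3 > 0, beta = 7

# A handful of hypothetical lotteries (payoffs and probabilities made up):
lotteries = [{0.0: 0.5, 4.0: 0.5}, {1.0: 1.0},
             {2.0: 0.25, 3.0: 0.75}, {5.0: 0.1, 0.5: 0.9}]

# Because probabilities sum to one, E[ubar] = 3*E[u] + 7, so every pairwise
# comparison -- hence the induced preference pattern -- is unchanged.
for p, q in itertools.combinations(lotteries, 2):
    assert (eu(p, u) < eu(q, u)) == (eu(p, ubar) < eu(q, ubar))
```

A transform with α < 0 would reverse every comparison, which is why the theorem's uniqueness statement requires α > 0.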

Proof. We break up the proof into a number of easy steps.

1. If p₀ ≺ p₁ and 0 ≤ λ < μ ≤ 1, then

λp₁ + (1 − λ)p₀ ≺ μp₁ + (1 − μ)p₀.

(If p₁ is preferred to p₀, then, between any two linear combinations of p₁ and p₀, the one giving larger weight to p₁ is preferred.)
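Under an expected-utility representation (which Theorem 1 will ultimately provide), statement (1) is just the monotonicity of λ ↦ λu₁ + (1 − λ)u₀ when u₁ > u₀. A numeric check with assumed expected-utility values:

```python
def mix_eu(lam, eu1, eu0):
    """Expected utility of lam*p1 + (1 - lam)*p0, given assumed expected
    utilities eu1 for p1 and eu0 for p0 (expectation is linear in the mixture)."""
    return lam * eu1 + (1.0 - lam) * eu0

eu0, eu1 = 2.0, 5.0                     # assumed values with p0 worse than p1
grid = [i / 10.0 for i in range(11)]    # lambda = 0.0, 0.1, ..., 1.0
values = [mix_eu(lam, eu1, eu0) for lam in grid]

# Larger weight on p1 gives the preferred mixture, exactly as (1) asserts:
assert all(a < b for a, b in zip(values, values[1:]))
```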

Proof. Because μ − λ is positive, H₁ implies that

λp₁ + (1 − λ)p₀ = (μ − λ)p₀ + (1 − μ + λ)[(λ/(1 − μ + λ))p₁ + ((1 − μ)/(1 − μ + λ))p₀]
             ≺ (μ − λ)p₁ + (1 − μ + λ)[(λ/(1 − μ + λ))p₁ + ((1 − μ)/(1 − μ + λ))p₀]
             = μp₁ + (1 − μ)p₀.

2. If p₀ ≺ p₁ and p₀ ≼ q ≼ p₁, there exists a unique number λ′, 0 ≤ λ′ ≤ 1, such that λ′p₁ + (1 − λ′)p₀ ∼ q.
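The least-upper-bound construction in the proof below suggests a bisection search for the indifference point λ′ when the preferences come from assumed expected utilities eu₀ ≤ eu_q ≤ eu₁ (a numeric sketch, not part of the text):

```python
def indifference_lambda(eu_q, eu0, eu1, tol=1e-10):
    """Bisection for lam' with lam'*eu1 + (1 - lam')*eu0 == eu_q, mirroring
    the proof's construction: T = {lam : mixture strictly worse than q},
    and lam' is the least upper bound of T."""
    assert eu0 <= eu_q <= eu1 and eu0 < eu1
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid * eu1 + (1.0 - mid) * eu0 < eu_q:
            lo = mid        # mid is in T, so lam' lies above mid
        else:
            hi = mid        # mid is an upper bound for T
    return 0.5 * (lo + hi)

lam = indifference_lambda(eu_q=3.0, eu0=2.0, eu1=5.0)
print(lam)                  # approximately 1/3, since 3 = (1/3)*5 + (2/3)*2
```

The two branches of the bisection are the two halves of the proof: points of T sit below λ′, upper bounds of T sit at or above it.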

Proof. If either p₁ ∼ q or p₀ ∼ q, the result is immediate. Hence assume that p₀ ≺ q ≺ p₁ and let

T = {λ : 0 ≤ λ ≤ 1 and λp₁ + (1 − λ)p₀ ≺ q};

then 0 ∈ T and 1 ∉ T. By (1), if λ₁ ∈ T and λ₂ < λ₁, then λ₂ ∈ T. Thus T is an interval. Let λ′ be the least upper bound of T. We will show that λ′ satisfies the requirement of statement (2).

(a). q ≼ λ′p₁ + (1 − λ′)p₀. This statement is obvious if λ′ = 1. Now suppose that λ′ < 1 and that this statement is false. Then

λ′p₁ + (1 − λ′)p₀ ≺ q ≺ p₁,

so that from H₂ there is a λ, 0 < λ < 1, such that

λp₁ + (1 − λ)[λ′p₁ + (1 − λ′)p₀] ≺ q.

This is the same as

(λ′ + λ(1 − λ′))p₁ + (1 − λ)(1 − λ′)p₀ ≺ q,

so that

λ′ + λ(1 − λ′) ∈ T;

but

λ′ + λ(1 − λ′) > λ′,

which contradicts the definition of λ′ as an upper bound of T.


(b). λ′p₁ + (1 − λ′)p₀ ≼ q. The proof, similar to that of (a), is left to the reader.

Together, (a) and (b) imply that λ′p₁ + (1 − λ′)p₀ ∼ q. Only unicity remains to be proved.

(c). Unicity. If λ′p₁ + (1 − λ′)p₀ ∼ λ″p₁ + (1 − λ″)p₀, then from (1) both λ′ ≤ λ″ and λ″ ≤ λ′, so that λ″ = λ′, completing the proof of (2).

If all p ∈ 𝒫* are equivalent, the result is trivial. So we suppose there exist p₀ and p₁ ∈ 𝒫* such that p₀ ≺ p₁. By the interval [p₀, p₁] we shall mean the set

[p₀, p₁] = {q ∈ 𝒫* : p₀ ≼ q ≼ p₁}.

3. Let p₀ and p₁ be any elements of 𝒫* for which p₀ ≺ p₁. Then there exists a utility function, u, on the interval [p₀, p₁], uniquely determined up to a linear transformation, which agrees with ≺ on [p₀, p₁].

Proof. For q ∈ [p₀, p₁] define u(q) to be that unique number λ′ such that q ∼ λ′p₁ + (1 − λ′)p₀. Note that u(p₀) = 0 and u(p₁) = 1.

(a). u agrees with ≺