Annals of Mathematics Studies Number 39

ANNALS OF MATHEMATICS STUDIES
Edited by Emil Artin and Marston Morse

1. Algebraic Theory of Numbers, by Hermann Weyl
3. Consistency of the Continuum Hypothesis, by Kurt Gödel
6. The Calculi of Lambda-Conversion, by Alonzo Church
10. Topics in Topology, by Solomon Lefschetz
11. Introduction to Nonlinear Mechanics, by N. Kryloff and N. Bogoliuboff
15. Topological Methods in the Theory of Functions of a Complex Variable, by Marston Morse
16. Transcendental Numbers, by Carl Ludwig Siegel
17. Problème Général de la Stabilité du Mouvement, by M. A. Liapounoff
19. Fourier Transforms, by S. Bochner and K. Chandrasekharan
20. Contributions to the Theory of Nonlinear Oscillations, Vol. I, edited by S. Lefschetz
21. Functional Operators, Vol. I, by John von Neumann
22. Functional Operators, Vol. II, by John von Neumann
24. Contributions to the Theory of Games, Vol. I, edited by H. W. Kuhn and A. W. Tucker
25. Contributions to Fourier Analysis, edited by A. Zygmund, W. Transue, M. Morse, A. P. Calderón, and S. Bochner
26. A Theory of Cross-Spaces, by Robert Schatten
27. Isoperimetric Inequalities in Mathematical Physics, by G. Pólya and G. Szegő
28. Contributions to the Theory of Games, Vol. II, edited by H. Kuhn and A. W. Tucker
29. Contributions to the Theory of Nonlinear Oscillations, Vol. II, edited by S. Lefschetz
30. Contributions to the Theory of Riemann Surfaces, edited by L. Ahlfors et al.
31. Order-Preserving Maps and Integration Processes, by Edward J. McShane
32. Curvature and Betti Numbers, by K. Yano and S. Bochner
33. Contributions to the Theory of Partial Differential Equations, edited by L. Bers, S. Bochner, and F. John
34. Automata Studies, edited by C. E. Shannon and J. McCarthy
35. Surface Area, by Lamberto Cesari
36. Contributions to the Theory of Nonlinear Oscillations, Vol. III, edited by S. Lefschetz
37. Lectures on the Theory of Games, by Harold W. Kuhn. In press
38. Linear Inequalities and Related Systems, edited by H. W. Kuhn and A. W. Tucker
39. Contributions to the Theory of Games, Vol. III, edited by M. Dresher, A. W. Tucker and P. Wolfe
40. Contributions to the Theory of Games, Vol. IV, edited by R. Duncan Luce and A. W. Tucker. In press
41. Contributions to the Theory of Nonlinear Oscillations, Vol. IV, edited by S. Lefschetz. In press
CONTRIBUTIONS TO THE THEORY OF GAMES, VOLUME III

C. Berge, L. D. Berkovitz, L. E. Dubins, H. Everett, W. H. Fleming, D. Gale, D. Gillette, O. Gross, J. F. Hannan, J. C. Holladay, J. R. Isbell, S. Karlin, J. G. Kemeny, J. Milnor, J. C. Oxtoby, M. O. Rabin, R. Restrepo, H. E. Scarf, L. S. Shapley, M. Sion, G. L. Thompson, W. Walden, P. Wolfe

Edited by M. Dresher, A. W. Tucker, and P. Wolfe

Princeton University Press, Princeton, New Jersey
1957
Copyright © 1957 by Princeton University Press. London: Oxford University Press. All Rights Reserved. L. C. Card 57-5460.

This research was supported in part by the Office of Naval Research. Reproduction, translation, publication, use and disposal in whole or in part by or for the United States Government is permitted. Papers 1, 12, 17, 21, 22, and 23 are published by permission of The RAND Corporation.

Printed in the United States of America
PREFACE

The Theory of Games that John von Neumann created some thirty years ago stands as one of many lasting monuments to his great genius. We join with the whole scientific world in mourning the untimely loss (February 8, 1957) of this giant of modern mathematics.

Since the publication in 1953 of CONTRIBUTIONS TO THE THEORY OF GAMES, Volume II (Annals of Mathematics Study 28), work in game theory has developed in two principal directions. One has been the investigation of certain classes of infinite two-person zero-sum games, not only to establish the existence of solutions but also to describe their detailed nature and construction. The present Study is devoted mainly to contributions of this sort. Other research has been directed toward the solution of as large a class of n-person games as possible, or the proposal of new solution-like notions. Such work is to appear in CONTRIBUTIONS TO THE THEORY OF GAMES, Volume IV (Annals of Mathematics Study 40).

There is no general bibliography in this Study to supplement those in Volumes I and II (Annals of Mathematics Studies 24 and 28). Instead, it is planned to have such a bibliography in Volume IV (Study 40). Also, the new book of R. D. Luce and H. Raiffa, GAMES AND DECISIONS: An Introductory Survey (Wiley, 1957), contains an excellent contemporary survey of game theory and a bibliography.

The editing and preparing of this Study have been done partly at Princeton University in the Department of Mathematics through a Logistics Project sponsored by the Office of Naval Research, and partly at the RAND Corporation. Princeton Project members who participated in the task have been M. Frank, J. H. Griesmer, H. W. Kuhn, R. Z. Norman, M. Sion, G. L. Thompson, A. W. Tucker, and Philip Wolfe. RAND participants have been L. D. Berkovitz, A. W. Boldyreff, M. Dresher, O. Gross, O. Helmer, S. Johnson, M. Peisakoff, H. Scarf, and L. S. Shapley.
The following additional persons have generously assisted in the refereeing work: D. Blackwell, H. Everett, W. H. Fleming, R. Isaacs, J. P. Mayberry, J. Nash, R. Restrepo, J. Robinson, and D. V. Widder. The typing
of the master copy has been done by Mrs. Euthie Anthony with efficient care. To all these, and to the Princeton University Press through its Director, H. S. Bailey, Jr., the Editors express sincere thanks.
M. Dresher
A. W. Tucker
P. Wolfe

July 1957
CONTENTS

Preface ..... v
Introduction ..... 1

PART I: Moves as Plays of Other Games
Paper 1. On Games of Survival, by J. Milnor and L. S. Shapley ..... 15
2. Recursive Games, by H. Everett ..... 47
3. Finitary Games, by J. R. Isbell ..... 79
4. Approximation to Bayes Risk in Repeated Play, by James Hannan ..... 97
5. Information in Games With Finite Resources, by David Gale ..... 141

PART II: Games With Perfect Information
6. Effective Computability of Winning Strategies, by Michael O. Rabin ..... 147
7. The Banach-Mazur Game and Banach Category Theorem, by John C. Oxtoby ..... 159
8. Topological Games With Perfect Information, by Claude Berge ..... 165
9. Stochastic Games With Zero Stop Probabilities, by Dean Gillette ..... 179
10. Cartesian Products of Termination Games, by John C. Holladay ..... 189
11. A Study of Simple Games Through Experiments on Computing Machines, by W. Walden ..... 201

PART III: Games With Partial Information
12. Games With Partial Information, by H. E. Scarf and L. S. Shapley ..... 213
13. A Discrete Evasion Game, by L. E. Dubins ..... 231
14. An Infinite Move Game With a Lag, by Samuel Karlin ..... 257
15. The Effect of Psychological Attitudes on the Outcomes of Games, by John G. Kemeny and Gerald L. Thompson ..... 273

PART IV: Games With a Continuum of Strategies
16. On a Game Without a Value, by Maurice Sion and Philip Wolfe ..... 299
17. A Rational Game on the Square, by O. Gross ..... 307
18. Tactical Problems Involving Several Actions, by Rodrigo Restrepo ..... 313
19. Multistage Poker Models, by Samuel Karlin and Rodrigo Restrepo ..... 337
20. On Games Described by Bell Shaped Kernels, by Samuel Karlin ..... 365

PART V: Games With a Continuum of Moves
21. On Differential Games With Survival Payoff, by H. E. Scarf ..... 393
22. A Note on Differential Games of Prescribed Duration, by W. H. Fleming ..... 407
23. On Differential Games With Integral Payoff, by L. D. Berkovitz and W. H. Fleming ..... 413
CONTRIBUTIONS TO THE THEORY OF GAMES, III
INTRODUCTION

Most of the contributions in this Study deal with infinite two-person zero-sum games, establishing the existence of solutions and investigating in detail the nature and construction of solutions. The papers fall in two broad categories: those in which the games are presented in "normalized form" and those which exploit the structure of the game given in "extensive form."

Laying down the necessary definitions for an infinite game in extensive form is not a difficult matter, since it is obviously trivial to relax the usual condition that the number of choices at each move is finite. If plays of infinite length are also allowed, it is sufficient to redefine a play to be a sequence of choices, with the payoff function then defined on such sequences. Simple measurability conditions will ensure the existence of an expected payoff for any pair of the players' pure strategies, given probability distributions at each of the umpire's choices. The normalized form for such a game can then be obtained, once the sets of pure strategies for the two players have been described as appropriate measure spaces, by defining mixed strategies as probability measures over these spaces. Mixed strategies can be obtained from behavior strategies as product measures in the usual manner.

Most of the previous research on infinite games has taken as its object games that are in normalized form with a continuum of pure strategies for each player. If this continuum is taken as a unit interval, as is usually the case, the set of all joint outcomes constitutes a unit square and the payoff function can be taken simply as a function on the unit square. Thus the phrase "game on the unit square" describes a two-person zero-sum game defined by a measurable function K(x, y), 0 ≤ x, y ≤ 1, in which the two players select x and y independently, each in ignorance of the other's choice, following which the player selecting y pays the other the amount K(x, y).
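The value of a game on the unit square with a well-behaved kernel can be approximated numerically by restricting both players to a finite grid and solving the resulting matrix game, for instance by a fictitious-play iteration. The sketch below is not from the Study: the kernel K(x, y) = (x - y)², the grid size, and all names are illustrative choices. It brackets the value of the discretized game between the two players' best pure guarantees against each other's empirical mixtures.

```python
# Approximate the value of a game on the unit square by restricting both
# players to an even grid and running fictitious play on the resulting
# matrix game.  Illustrative sketch only; the kernel K(x, y) = (x - y)^2
# and the grid size are not taken from the Study.

def K(x, y):
    return (x - y) ** 2              # the maximizer picks x, the minimizer y

def bracket_value(n_grid=11, steps=20000):
    pts = [i / (n_grid - 1) for i in range(n_grid)]
    A = [[K(x, y) for y in pts] for x in pts]   # payoff matrix on the grid
    rows = [0] * n_grid                          # empirical counts, player I
    cols = [0] * n_grid                          # empirical counts, player II
    i = j = 0
    for _ in range(steps):
        rows[i] += 1
        cols[j] += 1
        # Each player best-responds to the opponent's empirical mixture.
        i = max(range(n_grid), key=lambda r: sum(A[r][c] * cols[c] for c in range(n_grid)))
        j = min(range(n_grid), key=lambda c: sum(A[r][c] * rows[r] for r in range(n_grid)))
    # Best pure guarantees against the final empirical mixtures: the value
    # of the discretized game always lies between these two numbers.
    upper = max(sum(A[r][c] * cols[c] for c in range(n_grid)) / steps for r in range(n_grid))
    lower = min(sum(A[r][c] * rows[r] for r in range(n_grid)) / steps for c in range(n_grid))
    return lower, upper

lo, hi = bracket_value()
```

For this particular kernel the solution is classical: the minimizer plays y = 1/2, the maximizer mixes equally on x = 0 and x = 1, and the value is 1/4, so the bracket can be checked directly.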
All games having continua of pure strategies can be regarded as games on the unit square, but the setting up of K will require special handling if the extensive structure of the game involves a more complicated information pattern than mere independent selection of x and y.
In any case, the basic question asked about these infinite games is: do they have optimal strategies (or, at least, "ε-optimal strategies" for arbitrarily small ε)? The early example of Ville [1] showed that a rather simple game on the square need not have a value. He gave also the first general theorem of a positive nature: a game on the unit square possesses optimal strategies if it has a continuous payoff function K. The positive results that have been obtained along these lines are not a great deal stronger. Glicksberg [2] has shown that ε-optimal strategies exist if the payoff function is upper- or lower-semicontinuous, results which can be refined somewhat by the use of a theorem of Kneser [3] on mixed strategies. Karlin [4] has obtained related extensions in terms of the behavior of the payoff function as the kernel of an integral transformation.

Although these results establish the existence of a value and of optimal strategies for some games in extensive form treated in this Study, they are too weak to settle these questions for most of them. Many of these games, formulated to model game-like situations drawn from other fields, have payoffs which are too "pathological" to allow treatment by very general methods, and detailed use must be made of their particular extensive structures. Most of the games studied, moreover, belong to the broad class of "multi-move" games: infinite games which are built up out of a set of "components" which are games or game-like structures, themselves of finite length. If one component of such a set is prescribed as a "start", and the outcome of each of the components is an instruction to play another component of the set, possibly together with a numerical payoff, then the entire infinite game is recursively defined by the set of rules for the components.
An extraordinary variety of types of infinite games, each presenting a unique problem to the theorist, results from the specification of various information patterns and payoffs for the infinite game.

Part I is devoted to games built out of repeated play of simultaneous-move games. That is, it is supposed that each component is already a game in normalized form, each player being uninformed of the other's

1. Ville, J., "Sur la théorie générale des jeux où intervient l'habileté des joueurs," Traité du Calcul des Probabilités et de ses Applications, par E. Borel et collaborateurs, Paris (1938), vol. IV, no. 2, 105-113.
2. Glicksberg, I. L., "Minimax theorem for upper and lower semicontinuous payoffs," RAND RM-478 (October 1950).
3. Kneser, H., "Sur un théorème fondamental de la théorie des jeux," Comptes Rendus Acad. Sci. Paris 234 (1952), 2418-2420.
4. Karlin, S., "Operator treatment of minmax principle," Annals of Mathematics Study No. 28 (Princeton, 1953), pp. 133-154.
present choice but completely informed of all choices made in components previously played, and hence knowing in which component he is playing at the moment of choice. The games of Part II are again multi-move games, but have perfect information for both players; not only is each player informed of all previous choices, but each component is a game of perfect information. The games of Part III, on the other hand, carry even a weaker information structure than do those of Part I; at the time of choice, a player is only partially informed of the component he is playing. They are games of "information lag". In another direction, the papers of Part IV deal with the oldest type of infinite game: games given directly in normalized form on the unit square. Finally, Part V consists of contributions to the theory of the important but difficult class of games whose plays are described by a continuum of choices.

PART I: Moves as Plays of Other Games

The games studied in the five papers of this Part have, in various publications, borne the descriptive titles "survival," "ruin," "attrition," "stochastic," "recursive," and "multistage." The study of these games was initiated by a group of mathematicians at the RAND Corporation in 1951. In several unpublished RAND memoranda cited in Paper 1, special cases of the game of survival described in the next paragraph were studied. One involved repeated play of a single two-by-two matrix game by players who begin playing with given initial resources and continue until one player is ruined. Another concerned a similar game having a matrix of arbitrary finite size with integral entries.

PAPER 1
This line of development has been carried considerably further by Milnor and Shapley in the first paper of this Study. Precisely, two players with initial resources r and R - r agree to play the zero-sum matrix game ||a_ij|| an unprescribed number of times. If a player is ruined, the game terminates, and the payoff is 1 to the survivor, 0 to the ruined player. Since the transition probabilities are controlled by the participants, and not by chance, it is possible for both players to survive indefinitely, in which case the payoffs are Q ≥ 0 and 1 - Q ≥ 0, where Q may be an arbitrary function of the entire course of play. Using a certain game-theoretic functional equation and the theory of semi-martingales, the authors analyze the extent to which the existence of solutions depends on Q.

It is shown that, if the game has a value, it can be given as a function of the first player's initial resources and is a monotonic solution of the functional equation
φ(r) = val ||φ(r + a_ij)||,   0 < r < R,

with boundary conditions

φ(r) = 0 if r ≤ 0,   and   φ(r) = 1 if r ≥ R.
In particular, if Q ≡ 1 then the value exists, as well as an optimal strategy for the first player. If there are no zeros in ||a_ij||, then the value exists and is independent of Q. In the latter case, optimal strategies exist for both players. If ||a_ij|| contains zeros, then the existence of a game value depends on the "regularity" of the function Q.

PAPER 2
In a "survival game", a payoff accumulates throughout an entire play of the game. In the recursive game developed in this Paper, a payoff occurs only when the game stops. A recursive game is a set of n "game elements", each of which is a two-person game (with no restriction on the cardinality of the players1 sets of strategies) whose outcome is either a zero-sum payoff or an Instruction that a specified game of the set be played again. Plays of infinite length may thus occur, and are assigned payoff zero. The game is studied by means of its "value mapping": given an n-vector v, a zero-sum game is derived from each game element by re placing for each i = 1, ..., n, the outcome "play game element i next" by the number v^; the n values (when they exist) of the games so de rived constitute the image of v under the value mapping. Everett shows, under the hypothesis that any game derived from a game element has optimal strategies, that the recursive game has a value In stationary strategies— mixed strategies for the recursive game consisting simply in the employ of the same mixed strategy in each game element. However, optimal strategies may not exist— only "e-optimal" strategies. If, on the other hand, the games derived from the game elements are assumed only to have values, then the recursive game will still have a value, but not necessarily in sta tionary strategies. These results are generalized to a type of recursive game in which moves are made continuously, the passage from one game ele ment to its successor taking an infinitesimal time. Everett also considers stochastic games, where payoffs take place even though play does not stop. It is no longer true that values must ex ist. However, several large classes of stochastic games have values. In particular, Everett shows that a stochastic game always has a value when it consists of one element, which can at most repeat itself. 
This paper thus complements the historically important 1953 paper of Shapley [5], which shows the existence of optimal stationary strategies for stochastic games having probability one of terminating in a finite length of time. Everett's results include those of Shapley and also the special cases of survival games described above.

5. Shapley, L. S., "Stochastic games," Proc. Nat. Acad. Sci. (USA) 39 (1953), 1095-1100.
PAPER 3
In "FInitary games," Isbell gives a variety of results related to the decomposition of a given game into finite game elements and the reconstruction of games from them. He finds first that, employing suitable behavior strategies, the customary condition that a play of a finite game cannot meet a given information set more than once can be re moved without losing the existence of optimal strategies in the game. By means of the Kakutani fixed-point theorem, this result can be given in the form of the existence of an equilibrium point for such a finite many-person game. Another result is closely related to those of Papers 1 and 2: A finitary game is composed from a finite number of game elements as Is the recursive game of Paper 2, with the provision that all non-terminating plays have the same value to a given player (not necessarily zero). Isbell shows that such a game possesses optimal strategies, although not nec essarily stationary ones. Finally he proves a mlnlmax theorem of consider able range for "programming games" having a two-player payoff function which is the quotient of two multilinear forms. PAPER k
In this paper, which views repeated play of a single finite game as a statistical decision process, Hannan studies strategy-sequences for player II of a zero-sum two-person game which take advantage of player I's misplay. These strategy-sequences are continuous approximations to the (fictitious play) strategy-sequence which consists in playing, at each play, a Bayes strategy against I's cumulative past choices. Hannan exhibits a strategy-sequence for II using at the (k+1)st move a Bayes strategy against the sum of I's cumulative past choices and the vector (3n²/2m)^(1/2) k^(1/2) z, where z is chosen at random from the unit m-cube (the original matrix game being m by n). An upper bound is derived for the expected inutility incurred by this strategy for N moves.

PAPER 5
Studying the role of information in the sequential play of matrix games, Gale analyzes a particular class of games, called games with finite resources, for which the information about the opponent's moves may be omitted without any loss in the game value. In such a game each player is required to play each pure strategy of a given finite game a fixed number of times, in any order. The payoff is the sum of the payoffs from the individual plays. Gale shows that in such games it is of no advantage to a player to know which strategies are available to his opponent. Further, it turns out that the uniform mixed strategy is optimal.

PART II: Games With Perfect Information

The five papers that comprise the second part of this Study treat games with perfect information, that is, games in which each player is always
informed of the complete previous history of the play. Historically, this was the first category of games in extensive form to be studied. It owes its importance to the result, stated (for the game of Chess) by E. Zermelo and first given a complete proof by J. von Neumann, which asserts that a finite two-person zero-sum game can always be solved by pure strategies without randomization. The basis of this result is clear: at each occasion for a choice, a player selects from among the subgames with perfect information that follow each of the alternatives open to him. If all of the plays are of finite length (i.e., composed of a finite number of choices), an induction completes the proof, provided that the result holds for games with but one choice in each play. (A transfinite induction is needed if there is no uniform bound on the length of the plays.) Therefore the theorem is valid as stated in games that offer but a finite number of alternatives at each occasion for a choice, while pure strategies are ε-optimal in all games with perfect information in which all plays are of finite length.

PAPER 6
While the question of the existence of winning strategies for infinite games with all plays of finite length and perfect information is thus affirmatively settled, Rabin points out some fundamental problems pertaining to the computability of winning strategies in these games. First, he asks, how are the rules of a game to be given so as to ensure the possibility of actually playing it? The rules must be such that it is possible to ascertain effectively within a finite time (i) whether a move by a player is legal and (ii) whether the payoff from any play can be computed. These requirements are expressed mathematically by stipulating that certain functions are effectively computable (recursive). Rabin thus arrives at a mathematical definition of "actual games," i.e., games which can actually be played. The second problem posed is the analogue for actual games of the basic question for games: does every actual game (which necessarily possesses optimal strategies) possess optimal strategies which are effectively computable? Using a standard result of recursive function theory, this question is answered in the negative: there exist actual games which have no computable winning strategies. A further theorem indicates that substituting a computer for the player having winning strategies is sometimes the worst possible arrangement. A short description of Turing machines and recursive functions is appended to the paper.

PAPER 7
The "game of Banach and Mazur" has been part of the folklore of game theory for some time. It is "played" as follows: a subset A of the real line Is given, and two players alternately choose nonempty closed bounded intervals of the line In such a way that each interval is a subinterval of the preceding choice. The player who chooses first wins
if the intersection of this nest of intervals with A is nonempty; otherwise the other player wins. The theorem conjectured by Mazur and proved by Banach (but never published) asserts that (i) the game is determined in favor of the second player if and only if A is of first category, and (ii) the game is determined in favor of the first player if and only if the complement of A is of first category at some point. A proof of this theorem by Mycielski, Świerczkowski, and Zięba has been announced, and will appear in Fundamenta Mathematicae. In the present paper, Oxtoby generalizes the game to an arbitrary topological space X in which the players alternately choose sets from a family of sets with nonempty interiors such that every nonempty open set contains a set of the family. Under these conditions (i) holds without change, while (ii) is valid if X is a complete metric space. As an interesting by-product, the Banach category theorem is shown to be a corollary of (i) in an arbitrary topological space.

PAPER 8
In "Topological games with perfect information” Berge treats a certain class of many-person games with perfect information of the type of "games of pursuit," for which the set X of all "positions" is topologized. Defining the rules and payoff of the game by certain func tions on X, the game Is called topological if these functions are con tinuous. It Is not required that X be finite, or that a play, which is a sequence of elements of X, be bounded in length. The theorem of Zermelo, von Neumann, and Kuhn on the existence of an equilibrium point in pure strategies is obtained for this game. In addition, Berge gives some topo logical properties of the function on X whose value at any position in X Is the payoff a given player can ensure himself if play starts from that position. PAPER 9
In this paper Gillette examines some questions related to a game of infinite length consisting of repeated play of a finite number of finite zero-sum two-person games of perfect information, the outcome of each of which is both a utility payoff and instructions to play another game of the set. Shapley [footnote 5 above] has considered games of this sort for which one member of the set is a game from which there is no passage to another, subject to the assumption that with probability one every play enters this game. Dropping Shapley's restriction, Gillette uses as "average" payoff function for the infinite game the expression

lim inf_{n→∞} (1/(n+1)) (p_0 + p_1 + ... + p_n),

where p_n is the payoff on the occasion of the nth play of a game of the set. Via the auxiliary "discounted" payoff

p_0 + (1 - s) p_1 + ... + (1 - s)^n p_n + ...   (0 < s ≤ 1),
E{x_k | x_{k-1}, ..., x_0} ≥ x_{k-1}.

A fundamental theorem ([5], page 324) implies that a bounded semi-martingale converges with probability 1, and that its limit x_∞ satisfies E{x_∞ | x_0} ≥ x_0. For our purposes, "bounded" can be taken to mean that the x_k themselves are bounded, uniformly in k, although the results stated are valid under much weaker conditions.

Let φ be any bounded solution of (3). We define a local φ-strategy to be a mixed strategy that always prescribes optimal probabilities for the games ||φ(r_{k-1} + a_ij)||. Thus, in this terminology, a locally optimal strategy is a local v-strategy. If Player I uses a local φ-strategy against an arbitrary strategy of Player II, then the sequence {φ(r_k)} that is generated is a bounded semi-martingale. (Note that E{φ(r_k) | r_{k-1}, ..., r_0} ≥ φ(r_{k-1}) implies E{φ(r_k) | φ(r_{k-1}), ..., φ(r_0)} ≥ φ(r_{k-1}), even though φ may not be one-one.) Hence we have convergence with probability 1, and

E{ lim_{k→∞} φ(r_k) | r_0 } ≥ φ(r_0).
GAMES OF SURVIVAL

Now if φ satisfies (4) as well, the left side of this inequality can be expressed as

0·prob{I is ruined} + 1·prob{II is ruined} + θ·prob{both survive},

where θ is some number between 0 and 1. Hence:

(5)   prob{II is ruined} ≥ φ(r_0) - θ·prob{both survive};

(6)   prob{I survives} ≥ φ(r_0) + (1 - θ)·prob{both survive}.

Thus such a strategy for Player I guarantees that he will survive with probability ≥ φ(r_0). If we could show that double survival has probability zero, at least for some particular local φ-strategy of Player I, then it would follow that he can guarantee himself an expected payoff of φ(r_0), or more, regardless of the other's strategy, and regardless of Q. A similar argument for Player II would then establish the existence of a value and optimal strategies for the survival game, independent of Q.

In attempting to carry out a proof on the above lines, one might hope to start with an arbitrary local φ-strategy and (i) use the known convergence of {φ(r_k)} to establish convergence of {r_k}; then (ii) use the convergence of {r_k} to show that the game must end; all with probability 1.

Unfortunately, neither (i) nor (ii) is unconditionally valid. In Section 3 we proceed by way of strictly monotonic approximants, for which (i) is valid, and obtain thereby the existence of the value. In Section 4 we obtain the existence of optimal strategies by working with a restricted class of "special φ-strategies", which make {r_k} converge even when φ is not strictly monotonic. However, in both proofs it is necessary to assume that none of the a_ij is zero, in order to make convergence of {r_k} equivalent to termination of play (step (ii)). In Section 5 we drop the zero-free condition on ||a_ij|| and find that a value still exists if Q is sufficiently regular. However, the value may depend on Q (see Examples 1, 2, 5 above), and the players may not have optimal strategies (Example 7 below). Our proof parallels the one in Section 3 (strictly monotonic approximants), but is based on a more complicated functional equation, to be discussed there. Finally, in Section 6 we will derive some estimates for the value function that have much in common with the well-known approximate solutions of the classic "gambler's ruin" problem. They have simple analytic forms,
in contrast to the sharply discontinuous nature of the exact value functions (see Examples 8 and 9 below). The estimates become more precise if R is made large compared to the a_ij, and they give exact information if the a_ij are all ±1, or ±1 and 0. They also provide strategies that are approximately optimal. It should be noted that Sections 3, 4, and 6 are essentially independent of one another.
[In Example 7, Player I can win with probability approaching 1 if he always chooses his row according to the distribution (1 - ε, ε - ε², ε²), with ε small but positive. However, if Q < 1 he has no strictly optimal strategy.

Example 7:   (  1   0   0 )
             ( -1   1   0 )
             ( -1  -1   1 )

Example 8:   (  1  -a )
             ( -a   1 )

Example 9:   (  1  -a+ε )
             ( -a   1   )

Example 8 illustrates in a simple way some of the possibilities for the value function v(r). Under optimal play the first player's fortune describes a random walk on (0, R) with +1 and -a having equal probability. The value is just the probability of absorption at R. If a is rational then the value is a finite step-function, which can be determined exactly by solving a certain system of linear equations. But if a is irrational (with R > 1 + a > 1), then the value function is discontinuous on a set of points everywhere dense in (0, R); it is strictly monotonic; and its derivative is almost everywhere 0.

In Example 9, let a be irrational, and assume 0 < ε < a < R - 1. The value function is again strictly monotone, with discontinuities everywhere dense between 0 and R. We no longer have a simple random walk as above, and we no longer have a good description of the value function. Whether the derivative vanishes almost everywhere in this case is an open question.]
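For rational a the system of linear equations mentioned above is easy to set up concretely. As an illustration (mine, not the authors'): take a to be a positive integer, put the walk on the integer points of (0, R), and sweep the equations v(r) = (v(r+1) + v(r-a))/2 with the absorbing boundary values to a fixed point.

```python
# The random walk of Example 8: the fortune r moves +1 or -a with equal
# probability, absorbing at r <= 0 (ruin) and at r >= R (win).  For a
# positive integer a, the absorption probabilities satisfy the linear system
#     v(r) = (v(r+1) + v(r-a)) / 2,   0 < r < R,
# with v(r) = 0 for r <= 0 and v(r) = 1 for r >= R.  Solved here by simple
# repeated sweeps; an illustrative sketch, not code from the paper.

def absorption_at_R(R, a, sweeps=20000):
    v = [0.0] * (R + 1)          # v[r] ~ probability of reaching R before ruin
    v[R] = 1.0
    for _ in range(sweeps):
        for r in range(1, R):
            up = v[r + 1]                          # step +1
            down = v[r - a] if r - a > 0 else 0.0  # step -a (ruin at or below 0)
            v[r] = 0.5 * (up + down)
    return v

v = absorption_at_R(10, 1)
```

With a = 1 this reduces to the classical symmetric gambler's ruin, whose absorption probability v(r) = r/R provides a direct check on the sketch.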
§2. SOLUTIONS OF THE FUNCTIONAL EQUATIONS
A monotonic solution to (3), (4) can be constructed by an iterative procedure. Define φ_0 by:
φ_0(r) = 0 if r < R,   1 if r ≥ R,

and let

(7)   φ_n = T^n φ_0, where the transformation T is given by:

T φ(r) = val ||φ(r + a_ij)||   for 0 < r < R,
T φ(r) = φ(r)                  for r ≤ 0 or r ≥ R.
It is clear that φₙ can be interpreted as the value function of the finite, truncated game in which Player I loses unless he succeeds in ruining his opponent in n moves or less.

LEMMA 1. The sequence {φₙ} just defined converges pointwise to a monotonic solution of (3), (4).

PROOF.
By construction the fixed points of (7) are solutions of (3), and conversely. Since T is continuous, it suffices to show that lim φₙ exists and is monotonic. This is accomplished by showing inductively that φₙ(r) is monotonic increasing in both n and r. The details present no difficulty whatever. (Compare the much harder proof of Lemma 5 below.)

Let v₀ denote the limit of the sequence φₙ = Tⁿφ₀, and let v₁ denote the limit of the similar (descending) sequence {Tⁿφ₀′} beginning with the function

    φ₀′(r) = { 0   if r < 0
             { 1   if r ≥ 0.
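The iteration of Lemma 1 can be carried out numerically when R and the entries a_ij are integers, since the fortune then stays on a finite integer grid. The sketch below is our own illustration, not part of the text; it assumes a 2×2 matrix so that the matrix-game value has a closed form:

```python
def val2x2(m):
    # Value of the 2x2 zero-sum matrix game [[a, b], [c, d]].
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))   # Player I's pure-strategy guarantee
    upper = min(max(a, c), max(b, d))   # Player II's pure-strategy guarantee
    if lower == upper:                  # saddle point: pure strategies suffice
        return lower
    return (a * d - b * c) / (a + d - b - c)  # mixed-strategy value

def iterate_T(A, R, sweeps=5000):
    # phi_{n+1} = T phi_n as in (7): on 0 < r < R replace phi(r) by the value
    # of the matrix game ||phi(r + a_ij)||; outside (0, R) phi keeps the
    # boundary values of phi_0 (0 below R, 1 at R and above).
    lo = min(min(row) for row in A)
    hi = max(max(row) for row in A)
    phi = {r: (1.0 if r >= R else 0.0) for r in range(lo, R + hi + 1)}
    for _ in range(sweeps):
        for r in range(1, R):
            phi[r] = val2x2([[phi[r + A[i][j]] for j in (0, 1)] for i in (0, 1)])
    return phi
```

For A = [[1, -1], [-1, 1]] both players mix equally at every fortune, the fortune performs the symmetric ±1 random walk of Example 8 with a = 1, and the limit is v₀(r) = r/R.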
THEOREM 1. If Q = 0 then the value of the survival game exists and is equal to v₀(r₀). If Q = 1 then the value exists and is equal to v₁(r₀).

PROOF. Player I can guarantee that

    prob{II is ruined} ≥ φₙ(r₀)

by following an optimal strategy for the n-th truncated game (and playing arbitrarily after the n-th move). On the other hand, Player II can guarantee that
    prob{II survives} ≥ 1 - v₀(r₀)

by adopting a local v₀-strategy (see (6) above). But the payoff of the Q = 0 game depends solely on whether Player II survives or not. Therefore v₀(r₀) is the value. The other case is similar.

Note that the proof provides an optimal strategy for Player II, but not Player I, if Q = 0. The existence of this optimal strategy, and of the value, could have been deduced from the lower semi-continuity of the payoff, as a function of the pure strategies (see [9]). A similar remark applies to the Q = 1 game.
THEOREM 2. If (3), (4) have a unique solution φ, then the value of the survival game exists and is equal to φ(r₀), independently of Q.

PROOF. As before, Player I can ensure that

    prob{II is ruined} ≥ φₙ(r₀).

Similarly Player II can ensure that

    prob{I is ruined} ≥ 1 - φₙ′(r₀).

But lim φₙ = v₀ = φ = v₁ = lim φₙ′; hence φ(r₀) is the value of the game.

Note that this time we do not obtain an optimal strategy for either player.
The next lemma identifies v₀ and v₁ as the "extreme" solutions of (3), (4), and incidentally establishes a converse to Theorem 2.

LEMMA 2. If φ is any solution of (3), (4) that is bounded between 0 and 1, then v₀ ≤ φ ≤ v₁.

PROOF. We observe that φ₀ ≤ φ, and that φₙ ≤ φ implies φₙ₊₁ = Tφₙ ≤ Tφ = φ. Hence v₀ ≤ φ. Symmetrically, v₁ ≥ φ.

COROLLARY. If the value function of the survival game exists and is independent of Q, then it is the only solution of (3), (4) bounded between 0 and 1.

The last provision is necessary, since "spurious" unbounded solutions do sometimes occur.
The next lemma shows that v₀ and v₁ usually have jumps at 0 and R, and characterizes the exceptions in terms of the matrix ||a_ij||.

LEMMA 3. (A) The following are equivalent:
    (i) v₁(r) is continuous at r = R;
    (ii) v₁(r) ≡ 1 for 0 < r < R;
    (iii) ||a_ij|| has a nonnegative row.
(B) The following are equivalent:
    (i) v₀(r) is continuous at r = R;
    (ii) v₀(r) ≡ 1 for 0 < r < R;
    (iii) every set of columns of ||a_ij||, considered as a submatrix of ||a_ij||, has a nonnegative row, not all zero.

Corresponding statements hold concerning continuity of v₀ and v₁ at r = 0.

PROOF.
(A) Obviously (iii) ⟹ (ii) ⟹ (i). If (iii) is false there is a negative entry in each row. A strategy of playing all columns with equal probability, on every move, gives Player II a probability ≥ n^(-[R/a]-1) of winning, where n is the number of columns and a is the smallest nonzero |a_ij|. This gives a positive lower bound for 1 - v₁(r₀), independent of r₀, and makes v₁ discontinuous at R. Hence (i) ⟹ (iii).
(B) Obviously (ii) ⟹ (i). If (iii) is false there is a set of s columns on which Player II can distribute his choices with equal probabilities 1/s, giving him a probability ≥ s^(-[R/a]-1) of surviving. Hence v₀(r₀) is bounded away from 1 and v₀ is discontinuous at R. Hence (i) ⟹ (iii).

To complete the proof, suppose that (iii) holds but not (ii). Choose r* > 0 so that v₀(r*) < v₀(r* + a). Then v₀(r*) is strictly less than v₀(r* + a_ij) whenever a_ij is positive. Let η be an optimal mixed strategy for II in the matrix game ||v₀(r* + a_ij)||; let S be the set of columns j with η_j > 0; and let i₀ be the nonnegative subrow, not all zero, whose existence is asserted by (iii). Then

    v₀(r*) ≤ v₀(r* + a_{i₀j})

holds for all j in S, with strict inequality at least once. Summing over j, and recalling the optimality of η, we obtain:

    v₀(r*) < Σ_j η_j v₀(r* + a_{i₀j}) ≤ val ||v₀(r* + a_ij)|| = v₀(r*),

a contradiction. Hence (iii) ⟹ (ii). (An alternative proof that (iii) implies (ii) could be obtained from the discussion below.)
COROLLARY. If

    max_i min_j a_ij < 0 < min_j max_i a_ij,

then every bounded solution of (3), (4) has jumps at 0 and R.

It may be of interest to describe some near-optimal strategies for Player I in the event that (i), (ii), (iii) of (B) hold. (Compare Example 7 above.) Let S₀ be the set of all columns; let i₀ be a row nonnegative and not identically zero on S₀; let S₁ be the subset of S₀ on which a_{i₀j} = 0; and so on. Then we have S₀ ⊃ S₁ ⊃ ... ⊃ S_p ⊃ S_{p+1} = ∅ for some p (proper inclusion all the way), and moreover the i₀, ..., i_p are all distinct. Then it is easy to show that the probabilities

    x_{i₀} = 1 - ε,  x_{i₁} = ε - ε²,  ...,  x_{i_{p-1}} = ε^{p-1} - ε^p,  x_{i_p} = ε^p,  all other x_i = 0,

if used repeatedly by Player I, guarantee with probability ≥ 1 - ε that the first nonzero a_{i_k j_k} to occur will be positive. Hence Player I wins with probability ≥ (1 - ε)^{[(R-r₀)/a]+1}, no matter what Player II does. This bound goes to 1 as ε → 0.
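The mechanism behind the mixture (1 - ε, ε - ε², ε²) can be checked numerically. The sketch below is our own illustration (the matrix is our reconstruction of Example 7, and the function name is invented): for each pure column of Player II it computes the chance that a nonzero entry, when one occurs, is positive.

```python
from fractions import Fraction

def worst_positive_ratio(matrix, eps):
    # Player I's row probabilities 1 - eps, eps - eps^2, eps^2 (the text's
    # near-optimal mixture for a 3-row matrix; hypothetical helper of ours).
    x = [1 - eps, eps - eps**2, eps**2]
    worst = None
    for j in range(len(matrix[0])):              # each pure column of Player II
        pos = sum(p for p, row in zip(x, matrix) if row[j] > 0)
        neg = sum(p for p, row in zip(x, matrix) if row[j] < 0)
        if pos + neg > 0:
            ratio = pos / (pos + neg)            # chance a nonzero entry is positive
            worst = ratio if worst is None else min(worst, ratio)
    return worst
```

For the Example 7 matrix [[1, 0, 0], [-1, 1, 0], [-1, -1, 1]] and any 0 < ε < 1 the worst ratio is exactly 1 - ε, whichever column Player II selects; since zero entries merely repeat the situation, the first nonzero entry is positive with probability at least 1 - ε.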
§3. EXISTENCE OF A VALUE WHEN ||a_ij|| IS ZERO-FREE
This section will be devoted to the proof of the following theorem:

THEOREM 3. If ||a_ij|| is zero-free, then the value of the survival game exists and is independent of the payoff Q assigned to nonterminating play.

During the proof we shall work with certain generalized survival games, having more general, bounded payoff functions P*(r) in place of the P(r) of (2). The other elements of the game, namely ||a_ij||, R, and r₀, remain as before. The functional equation (3) is still applicable, but with new boundary conditions:
(4*)    φ(r) = P*(r),    r ≤ 0, r ≥ R.

LEMMA 4. Suppose that (3), (4*) have a strictly monotonic solution φ*. Then, if ||a_ij|| is zero-free, the value of the generalized survival game exists and is equal to φ*(r₀).

PROOF. Let Player I use a local φ*-strategy and Player II an arbitrary strategy. Then {φ*(r_k)} is a bounded semimartingale, and converges with probability 1. Because φ* is strictly monotonic, {r_k} also converges. This means that play terminates, since, with none of the a_ij = 0, lim r_k must lie outside (0, R). Hence

    E{P*(lim r_k)} = E{φ*(lim r_k)} ≥ φ*(r₀).

A similar argument for Player II completes the proof.
A similar argument for Player II completes the proof. We shall consider functions (until Section 6): r P*(r) = \ t
P*
of the following form only
€(r - R - A) 1 + e(r - R -
A)
if
r
R
,
where A = max |a^ .|, and e is a positive constant. As € -- ► 0 these functions approach P(r) from below. Imitating the construction of Section 2, we define 0 * = Tn0 * where T is the same transformation (7 ), and 0 * is given by: f =
It if exists,
lim 0R
e(r - R - A)
if
r
R
\
is obviously a solution of (3 ), (4 *).
LEMMA 5. If val ||a_ij|| ≥ 0, and if v₀ is not continuous at R, then for sufficiently small ε the sequence {φₙ*} just defined converges pointwise to a strictly monotonic solution of (3), (4*).

PROOF. To show that the limit exists we observe first that
    φ₁*(r) = val ||φ₀*(r + a_ij)|| ≥ val ||ε(r - R - A + a_ij)||
           = ε(r - R - A) + ε·val ||a_ij|| ≥ ε(r - R - A) = φ₀*(r)

if 0 < r < R, and φ₁*(r) = φ₀*(r) otherwise. Moreover, φₙ* ≥ φₙ₋₁* implies φₙ₊₁* = Tφₙ* ≥ Tφₙ₋₁* = φₙ*. Since the sequence is obviously bounded, it therefore converges.

To show that the limit is strictly monotonic we shall prove inductively that for ε sufficiently small the function φₙ*(r) - εr is monotonic in r for each n. This is trivial for n = 0; assume it for φₙ₋₁*.

CASE 1. Take 0 < r < s < R. Then we have:

    φₙ*(r) - εr = val ||φₙ₋₁*(r + a_ij) - εr|| ≤ val ||φₙ₋₁*(s + a_ij) - εs|| = φₙ*(s) - εs.

CASE 2. Take r ≤ 0 < s < R. Then we have:

    φₙ*(r) - εr = φₙ₋₁*(r) - εr ≤ φₙ₋₁*(s) - εs ≤ φₙ*(s) - εs

(using the first part of this proof in the last step).

CASE 3. Take 0 < r < R ≤ s. Note that φ₀* ≤ v₀ over the entire range of interest (-A, R + A), and hence, by induction, that φₙ* ≤ v₀. Since the latter is assumed discontinuous at R, we can select ε so that εA < 1 - v₀(R-). Then we have:

    φₙ*(r) - εr ≤ φₙ*(R-) - εR ≤ v₀(R-) - εR < 1 - εA - εR = φₙ*(s) - εs

(using Case 1 in the first step).

The other cases, namely r < s ≤ 0, r ≤ 0 < R ≤ s, and R ≤ r < s, are trivial. This completes the proof of Lemma 5.
PROOF OF THEOREM 3. There is no loss of generality in assuming that val ||a_ij|| ≥ 0. The theorem is trivial if v₀ ≡ 1 in (0, R); we may therefore assume (Lemma 3) that v₀ is discontinuous at R. Lemmas 4 and 5 now give us well-defined value functions φ*(r₀) for our class of generalized games, for ε sufficiently small. As ε → 0 their payoffs P* converge uniformly to P, our original payoff function. It follows that

    lim_{ε→0} φ*(r₀)

exists and is the value of the original game. This number is obviously independent of Q. This completes the proof.

As yet we know nothing about the existence of optimal strategies, unless Q = 0 or Q = 1. As Example 4 showed, local v-strategies need not be optimal. However, the local φ*-strategies, for given ε, are nearly optimal: it can be shown that they guarantee an expected payoff within ε(R + 2A) of the true value.

COROLLARY. If ||a_ij|| is zero-free then (3), (4) have a unique solution bounded between 0 and 1.

PROOF.
Theorem 3 and the Corollary to Lemma 2.

§4. EXISTENCE OF OPTIMAL STRATEGIES
In this section we shall find optimal strategies for both players, under the assumption that ||a_ij|| is zero-free. The strategies are locally optimal, with the added property that they force play to terminate with probability one. Their optimality is therefore independent of Q.

The strategies in question are constructed as follows. Let φ be a monotonic solution of (3), (4). For each number c for which the set C = φ⁻¹(c) is not empty, let ξ(c) be an optimal strategy of Player I in the matrix game

(8)    || inf_{s∈C} φ(s + a_ij) ||.

Define a mixed strategy for the survival game by the rule: choose i_k according to the probability distribution ξ(φ(r_{k-1})). Such a strategy will be called a special φ-strategy of Player I. Note that the same probabilities must be used each time the same value of φ(r) comes up. Special φ-strategies of Player II are defined similarly with "sup" instead
of "inf". We shall see presently that a special φ-strategy is also a local φ-strategy. First we state our main result.

THEOREM 4. If φ is any monotonic solution of (3), (4), and if ||a_ij|| is zero-free, then the special φ-strategies are optimal, independently of Q, and the value of the game is φ(r₀).
As stated, Theorem 4 is independent of and includes Theorem 3, and this independence will be maintained throughout the proof, which takes up the rest of this section. Consequently we have a separate proof of the existence of a value in the zero-free case. Of course, φ is actually unique and equal to v, and our real object is only to show that the special v-strategies are optimal.

LEMMA 6. A special φ-strategy is also a local φ-strategy.

PROOF. We must show that ξ(φ(r)) is optimal in ||φ(r + a_ij)|| for all r in (0, R). But ||φ(r + a_ij)|| majorizes (8), and both matrices have the same value φ(r), by (3) and the fact that φ was assumed monotonic. Since ξ(φ(r)) is optimal for (8) by definition, it is also optimal for ||φ(r + a_ij)||.

The proof of Theorem 4 will be based on the following concept. Given a special φ-strategy ξ(c) for Player I, define the "punishment function":

    π(r, j) = Σ_i ξ_i(c) φ(r + a_ij) - c,
where c = φ(r). By Lemma 6, π is always non-negative. If Player II follows a local φ-strategy then π will always be zero. (Thus π measures the expected amount that Player II will be punished for choosing the j-th column.)

LEMMA 7. If Player I uses a special φ-strategy against any strategy of Player II, then every possible play of the survival game has the property that, for each k ≥ 1, one of the following is true:
    (a) r_{k-1} ≤ 0 or r_{k-1} ≥ R;
    (b) φ(r_{k-1}) = 0;
    (c) π(r_{k-1}, j_k) > 0;
    (d) r_k ≥ r_{k-1} + a, where a = min |a_ij|;
    (e) φ(r_k) ≠ φ(r_{k-1}).
The importance of this lemma lies in the fact that it sharply restricts the possibility of a non-terminating play in which {φ(r_k)} converges.

PROOF. We will assume that (a), (b), (d), and (e) are false and prove that (c) is true. Since (a) is false we have

    r_k = r_{k-1} + a_{i_k j_k}.

Since (d) is false this implies that r_k < r_{k-1}, i.e., a_{i_k j_k} < 0.

Set c = φ(r_{k-1}), C = φ⁻¹(c). Since (b) is false it follows that this C has a lower bound, and therefore that

    inf_{s∈C} φ(s + a_{i_k j_k}) < c.

Since (e) is false we have

    c = φ(r_k) = φ(r_{k-1} + a_{i_k j_k}).

Therefore, in the inequality

    inf_{s∈C} φ(s + a_{i j_k}) ≤ φ(r_{k-1} + a_{i j_k}),

the strict inequality holds for i = i_k. The fact that the row i_k is actually played by Player I on the k-th move implies that its probability ξ_{i_k}(c) is positive. Therefore

    Σ_i ξ_i(c) inf_{s∈C} φ(s + a_{i j_k}) < Σ_i ξ_i(c) φ(r_{k-1} + a_{i j_k}).

But since ξ(c) is optimal in (8), the left side is at least c. Therefore

    π(r_{k-1}, j_k) = Σ_i ξ_i(c) φ(r_{k-1} + a_{i j_k}) - c > 0.

This proves assertion (c) and completes the proof of Lemma 7.
LEMMA 8. If Player I uses a special φ-strategy against any strategy of Player II, then with probability one the sequence {π_k} = {π(r_{k-1}, j_k)} converges to 0.

PROOF. (Compare [5], page 297, Theorem 1.2 (i).) Consider the sum of the π_k. For each n ≥ 1 we have:

    E{ Σ_{k=1}^n π_k } = E{ Σ_{k=1}^n [ E{φ(r_k) | r_{k-1}, j_k} - φ(r_{k-1}) ] }
                       = Σ_{k=1}^n [ E{φ(r_k)} - E{φ(r_{k-1})} ]
                       = E{φ(r_n)} - E{φ(r₀)} ≤ 1.

It follows that the probability of the infinite sum exceeding any given bound M is ≤ 1/M. Hence, with probability one the series has a finite sum and {π_k} converges to 0.

PROOF OF THEOREM 4. The theorem is easy if there is a positive row or negative column in ||a_ij||. We therefore assume

    max_i min_j a_ij < 0 < min_j max_i a_ij.
Let Player I adopt a special φ-strategy, and Player II an arbitrary strategy. The play of the game that then occurs can be described by the pair of sequences {i_k}, {j_k}, which we shall regard as the underlying random variable. They are sufficient to determine three other important sequences {r_k}, {π_k}, and {φ(r_k)}. The last-named is a bounded semimartingale, by Lemma 6, and therefore the set of plays for which it fails to converge has probability zero. The set of plays for which {π_k} fails to converge to 0 also has probability zero, by Lemma 8. We shall prove that every play outside these two sets terminates. By (5) this will imply that Player I's special φ-strategy assures him an expected payoff ≥ φ(r₀). The corresponding argument for the other player will complete the proof.

Consider therefore a play {i_k}, {j_k} in which both

(9)    φ(r_k) → c

and

(10)    π_k → 0.

If c = 0 or 1 then play must terminate, because of the fact that φ is discontinuous at 0 and R (see the Corollary of Lemma 3, Section 2). Thus our object will be to show that the hypothesis 0 < c < 1 leads to a contradiction.

Let {k_n} be the sequence of indices k such that φ(r_k) ≠ c, or π_k ≠ 0, or both, and let s_n = r_{k_n}. Lemma 7 shows that {k_n} and {s_n} have infinitely many terms, since alternatives "(a)" and "(b)" are excluded by hypothesis, while any unduly long chain of "(d)"s would take r_k out of the interval C = φ⁻¹(c). Thus, instances of "(c)" or "(e)" must occur regularly, giving us π_k ≠ 0 or φ(r_k) ≠ φ(r_{k-1}). In fact, consecutive terms of {k_n} can not differ by more than [(γ/a) + 2], where γ is the length of C, a = min |a_ij|, and [x] is the greatest integer ≤ x.

Let us now examine the possible limit points of the sequence {s_n}, bearing in mind properties (9) and (10). First we have the upper and lower endpoints of C; call them u and ℓ respectively. (If C is empty, a short argument shows that (9) is contradicted.) Secondly, there are the points within C, in the neighborhood of which π can be arbitrarily small, but positive. There are only a finite number of such points, since, for each j, π(r, j) is a monotonic increasing function of r in C; denote these points by y₃, y₄, ..., y_p. Let δ be a small positive constant (it has an exact value, which will be defined later), and let Y₁, Y₂, ..., Y_p be a collection of intervals, variously open and closed, but all of length δ, defined as follows:

    Y₁ = [u, u + δ)   if u ∉ C;      Y₁ = (u, u + δ)   if u ∈ C;
    Y₂ = (ℓ - δ, ℓ]   if ℓ ∉ C;      Y₂ = (ℓ - δ, ℓ)   if ℓ ∈ C;
    Y_i = (y_i, y_i + δ],   3 ≤ i ≤ p.
Let Y denote their set-theoretic union. By (9), (10), there is an n₀ (depending on δ) such that s_n ∈ Y for all n > n₀. Now define δ to be the smallest nonzero number of the form:

(11)    | Σ_{ν=1}^N a_{i_ν j_ν} |,    N ≤ p[(γ/a) + 2].

The effect of this definition is to ensure that when |m - n| ≤ p the difference |s_m - s_n| is either 0 or ≥ δ. It follows that, for n > n₀, all s_n lying in a given Y_i are equal (proof below). Hence some s* appears infinitely often in the sequence {s_n}. If s* ∉ C then (9) is contradicted. If s* ∈ C then (10) is contradicted, since π_{k_n} is infinitely often positive, and bounded away from 0 by the smallest nonzero π(s*, j). This is the desired contradiction.
Finally, we have to prove the statement underlined above. Call a pair (s_m, s_n) conflicting if they lie in the same Y_i but are unequal. Either the statement in question is true or there is a conflicting pair (s_m, s_n), coming after s_{n₀} and spanning no other conflicting pair (s_{m′}, s_{n′}), m ≤ m′ < n′ ≤ n. Let m₀ = m, and let m_{j+1} be the last integer such that s_{m_j + 1} and s_{m_{j+1}} lie in a common interval Y_i. Then the ascending sequence

    m₀, m₁, m₂, ..., m_{q-1}, m_q = n

has at most p + 1 terms, and has the property that s_{m_j + 1} = s_{m_{j+1}} for each j. We see that each difference s_{m_j + 1} - s_{m_j} is the sum of at most [(γ/a) + 2] of the a_{i_ν j_ν}; hence s_n - s_m, being the sum of at most p such differences, is of the form (11), and is therefore either 0 or ≥ δ. But s_m and s_n lie in a common interval of length δ, so that |s_n - s_m| < δ; hence s_m = s_n. This contradicts the assumption that (s_m, s_n) was a conflicting pair.
§5. EXISTENCE OF A VALUE IN GENERAL

When there are zeros in the ||a_ij|| matrix a unique solution to the fundamental equations (3), (4) is no longer assured, and the value of the survival game may depend on the double-survival payoff Q. In this section we show that the value does exist if Q is sufficiently regular. Our proof makes use of a new pair of functional equations involving the notion of recursive game (equations (3̄), (5̄) below), which reduce to (3), (4) if ||a_ij|| is zero-free, and which always have a unique solution.

The restriction on Q consists in assuming that whenever {r_k} converges to a limit r within (0, R), the payoff depends only on that limit: Q = P̄(r). (Of course {r_k} can converge to r only if the equality r_k = r holds for all sufficiently large k.) We assume P̄ to be monotonic increasing and, of course, bounded between 0 and 1. It is convenient to amalgamate it with the old function P, which was defined only outside (0, R), and to denote both by P. The still-arbitrary portion of Q will be denoted by Q̄. Thus, the principal distinction in the present set-up is between convergent play (payoff P) and nonconvergent play (payoff Q̄), replacing the former distinction between terminating play (payoff P) and nonterminating play (payoff Q).

Note that the present arrangement still includes the important special cases Q ≡ constant. On the other hand, the mischievous Gale-Stewart functions (Example 3 in Section 1) are kept out.

THEOREM 5. If the convergent-play payoff is a monotonic function P of the limit of the first player's fortune, then the value of the survival game exists and is independent of the payoff Q̄ for nonconvergent play.

The proof follows the same general lines as the proof of Theorem 3 in Section 3. However we must begin with some new definitions and notation.
By an elementary recursive game (M, p) we shall mean the following: A matrix ||m_ij|| = M is given, each entry being either a number or the symbol Ⓢ. Players I and II choose i₁ and j₁ respectively. If m_{i₁j₁} is a number the game ends, and II pays I the indicated amount. If m_{i₁j₁} is Ⓢ there is no payment, and the players go back to the beginning and make new choices i₂ and j₂, etc. In case of infinite repetition the payoff is the number p (a constant).

The above is a special case of the recursive games defined by Everett [6], except for the "p" feature. (Everett effectively assumes that p = 0. However, (M, p) is obviously equivalent to (M′, 0), with M′ obtained by subtracting p from each numerical entry of M.) Translating the results of [6] we find that the elementary recursive game has a value, which we shall denote by val(M, p), though perhaps not optimal strategies. The value satisfies the relation

(12)    x = val ||m_ij : x||,
where ||m_ij : x|| is the matrix obtained by inserting the numerical variable x in place of Ⓢ in ||m_ij||. The solutions of (12) form a closed interval, and it develops that val(M, p) is the solution of (12) that is closest to p.

The new functional equations can now be formulated; they are:

(3̄)    φ(r) = val(M(φ, r), P(r)),    0 < r < R,

and

(5̄)    φ(r) = P(r),    r ≤ 0, r ≥ R,

where M(φ, r) is the matrix with entries:

    m_ij(φ, r) = { φ(r + a_ij)   if a_ij ≠ 0
                 { Ⓢ             if a_ij = 0.

(5̄) and (4) are actually the same. If ||a_ij|| is zero-free, then (3̄) reduces to (3). In general, if φ is a solution of (3̄), then (12) gives us:

    φ(r) = val ||m_ij(φ, r) : φ(r)|| = val ||φ(r + a_ij)||;

that is, φ is a solution of (3) as well. The converse is not true, however, since in fact (3̄), (5̄) always have a unique solution, while (3), (4) do not.
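The fixed-point relation (12) can be made concrete with a small numerical sketch. The example matrix below is our own invention, not from the text: for M = [[Ⓢ, Ⓢ], [0, 1]] the solutions of x = val ||m_ij : x|| are exactly the x ≥ 0, and val(M, p) is the solution closest to p, namely max(p, 0) — Player I either repeats forever (top row, payoff p) or stops (bottom row, where Player II pays min(0, 1) = 0).

```python
def val2x2(m):
    # Value of the 2x2 zero-sum matrix game [[a, b], [c, d]].
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))
    upper = min(max(a, c), max(b, d))
    if lower == upper:                         # saddle point
        return lower
    return (a * d - b * c) / (a + d - b - c)   # mixed-strategy value

def recursive_value(M, p, grid):
    # Scan a grid for solutions of x = val||m_ij : x|| (equation (12));
    # 'S' marks the repeat symbol.  Among the closed interval of solutions,
    # val(M, p) is the one closest to the infinite-repetition payoff p.
    subst = lambda x: [[x if e == 'S' else e for e in row] for row in M]
    solutions = [x for x in grid if abs(val2x2(subst(x)) - x) < 1e-9]
    return min(solutions, key=lambda x: abs(x - p))
```

The grid scan is only an illustration of the "closest solution" rule; it is not how one would compute val(M, p) in general.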
As in Section 2 (Lemma 1), we can construct a monotonic solution v̄₀ to (3̄), (5̄) by iterating the transformation T̄:

(7̄)    T̄φ(r) = { val(M(φ, r), P(r))   if 0 < r < R
               { φ(r)                 if r ≤ 0, r ≥ R,

applied to the same initial function φ₀(r), which is 0 for r < R and 1 for r ≥ R. It is easily shown that the functions T̄ⁿφ₀(r) are monotonic in r and form a bounded increasing sequence; the limit is the desired function v̄₀. (It can be interpreted as the value function of the Q̄ ≡ 0 game (compare Theorem 1), but there is no point in establishing this fact now, in view of the stronger result which will be proved as Theorem 7.)

As before, we introduce certain generalized payoff functions P*(r); they will be assumed monotonic increasing in (-A, R + A). The symbols (3̄*), (5̄*), (7̄*) will refer to equations (3̄), (5̄), (7̄) with P replaced by P*. The next lemma corresponds to Lemma 4 in Section 3.
LEMMA 9. Suppose that (3̄*), (5̄*) have a strictly monotonic solution φ*. Then the value of the generalized survival game exists and is equal to φ*(r₀).

PROOF. Relative to a particular play of the generalized survival game define k₀ = 0, and let k_{n+1} be the first k (if any) such that r_k ≠ r_{k_n}. The subsequence {s_n} = {r_{k_n}} is finite in length if and only if {r_k} converges. We now describe a "local recursive ε-optimal φ*-strategy" for Player I; it resembles our previous "local" strategies, but is based on elementary recursive games instead of matrix games. Choose a sequence of positive numbers ε₀, ε₁, ε₂, ... with sum ε. Let Player I begin by playing an ε₀-optimal strategy of the elementary recursive game (M(φ*, r₀), P*(r₀)). If and when that strategy runs out (after k₁ moves, in fact), let him continue with an ε₁-optimal strategy of (M(φ*, s₁), P*(s₁)), and so on. In general, on his (k_n + 1)-st move, he will be commencing an ε_n-optimal strategy of (M(φ*, s_n), P*(s_n)).

We wish to show that such a strategy, played against an arbitrary strategy of Player II, causes {r_k} to converge with probability one. Define the infinite sequence {x_n} as follows:

    x_n = { φ*(s_n)      if {s_i} is defined through i = n,
          { P*(s_{n₀})   if {s_i} stops at i = n₀ ≤ n.

Our construction ensures that, for n = 1, 2, ...:

    E{x_n | x_{n-1}, ..., x₀} ≥ x_{n-1} - ε_{n-1}.

Therefore the sequence x₀, x₁ + ε₀, x₂ + ε₀ + ε₁, x₃ + ε₀ + ε₁ + ε₂, etc. is a bounded semimartingale. We conclude that {x_n} converges with probability one, with

    E{x_∞ | x₀} ≥ x₀ - ε.

However, {x_n} can not converge if {s_i} does not stop at some n₀, since the s_i oscillate and φ* is strictly monotonic. Hence x_∞ = P*(lim r_k), and we have:

    E{P*(lim r_k)} ≥ φ*(r₀) - ε.

The rest of the proof is obvious.

We now particularize P*(r) to be P(r) + ε(r - R - A), where ε is a positive constant and A = max |a_ij|. (Compare Section 3.) Using the same initial function as before:

    φ₀*(r) = { ε(r - R - A)       if r < R
             { 1 + ε(r - R - A)   if r ≥ R,
we generate a sequence {i*} = (T*n 0 *) by iterating the new transformation T*, given by (7 *). The next lemma corresponds to Lemma 5. LEMMA 10 .
if
val||a^j|| > 0,
tinuous at R, —-X* sequence {0 }
and if
vQ
is not con
then for sufficiently small e the just defined converges to a strictly
monotonic solution of (3 *), (£*)• The proof is essentially the same as the proof of Lemma 5 * The substitution of elementary recursive games for matrix games causes trouble only at one spot: the proof of 0 > 0 Q . The difficulty is resolved by an appeal to the following fact: p > 0 and val||m^j : o|| > 0 together imply that val(M, p) >0 . LEMMA 11. If both vQ and then so does vQ .
P have jumps at
R,
PROOF. We use the fact that vQ Is a solution of (3 ), (.5 ). As in the proof of Lemma 11 we can find a "local recursive e-optimal ^-strate gy" for Player I that ensures that the sequence (xn ) (as defined there, but with and that
P
for
P*
and
vQ
for
0 *)
converges with probability
E (Xco I Xo 5 ? W This holds forany strategy
" 6
1,
•
of Player II; we shall consider a particular
one. By Lemma 3 the jump in vQ means that has a set of columns that meets each row in a subrow that contains a negative element, or is all zero. The same strategy for Player II used In the proof of Lemma 3 , part B, guarantees a probability > s[- R/a] = 5 > 0 that {r^.} will never in crease. This means that with probability > 5 , (r^) will converge to a limit r < rQ < R. Hence E{xoo I X0 J < (1 " 5) + 5 * P(R ")
-
This bound is < 1 because of rQ . Thus, letting € 0
of the jump in we find that
P
at P,
~0 (r0 )
* and is independent hounded away
GAMES OP SURVIVAL 1for
from
0
< rQ < R,
as was
37
to be shown.
PROOF OF THEOREM 5. There is no loss of generality in assuming val ||a_ij|| ≥ 0. The theorem is trivial if v₀ is continuous at R, since then v₀ = v₁ by Lemmas 2 and 3; so we can assume that v₀ has a jump at R. Assume for the moment that P also has a jump at R. Then, applying Lemmas 11, 10, and 9 in that order, we find that the P* games all have values, which converge as ε → 0. The uniform convergence ensures that the original game also has a value, and it is clear that this value, being the limit of the P* values, is independent of Q̄. On the other hand, if P is continuous at R, then we can approximate it uniformly by a sequence of discontinuous, monotonic functions. The preceding argument applies to the latter, and passing to the limit completes the proof.

COROLLARY. Equations (3̄), (5̄) have a unique solution, assuming only that P is monotonic and satisfies (4).

PROOF. Let φ be any solution of (3̄), (5̄) and consider the game determined by P, Q̄, with Q̄ ≡ 1. A "local recursive ε-optimal φ-strategy" for Player I (see proof of Lemma 9) will guarantee him an expected payoff of at least φ(r₀) - ε in this game. Thus:

    v(r₀) ≥ φ(r₀),

if v is its value function. But v is also the value function of the game defined by Q̄ ≡ 0, by Theorem 5. By symmetry we have:

    v(r₀) ≤ φ(r₀).

Thus φ is uniquely determined.

§6. APPROXIMATIONS AND BOUNDS FOR THE VALUE FUNCTION
In this section we extend to games of survival some of the known results for random walks with absorbing barriers — i.e., the gambler's ruin problem (see [7], Chapter 14). The random walk on (0, R), with each step determined by the fixed random variable ξ, leads naturally to a functional equation, highly reminiscent of our fundamental equation (3):

(13)    φ(r) = E{φ(r + ξ)},    0 < r < R.

It is satisfied by several functions associated with the random walk; among them is the probability p_R(r) that a particle starting at r will reach R before it reaches 0. This "absorption" probability is uniquely determined by (13) and the familiar boundary condition:
(4)    φ(r) = P(r),    r ≤ 0, r ≥ R;

assuming that ξ is not identically 0.

If it happens that E{ξ} = 0, then (13) has among its solutions all linear functions A + Br. Applying the two conditions p_R(0) = 0 and p_R(R) = 1 we get A = 0 and B = 1/R, or

    p_R(r) ≈ r/R,    0 < r < R.
This is not exact, because the particle will in general be absorbed beyond, not at, the barriers 0 and R. Taking this fact into account, we obtain rigorous estimates:

    r/(R + ν) ≤ p_R(r) ≤ (r + μ)/(R + μ),    0 < r < R,

where μ and ν are such that always -μ ≤ ξ ≤ ν.
If on the other hand E{£} 4 °> such that
0
n < |
< r < R
>
< v.
then there will be a unique,
X
x
I
e °
1
provided that £ takes on both positive and negative values with positive probability (see [7], page 302, or [15], page 2 8 4 ). Then (13) has among its solutions all functions of the form A + Be 0 . As before, this leads to an approximation:
v
.
% ( r ) ^ ..XTR'-"'"Y
0< r < R
and bounds: ^0 (r+ n )
\jR+v) ^ pR (r) ^ (R+n) e u - l e u
\>
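These bounds are easy to test numerically. The sketch below is our own illustration (the function names are invented, not from the text): for the step ξ = +2 or -1 with probability 1/2 each, it solves (14) by bisection — here e^{λ₀} happens to be the golden-ratio root of t³ - 2t + 1 = 0 — and brackets the absorption probability, computed directly from (13), with μ = 1 and ν = 2.

```python
import math

def lambda0_of_step(steps, probs):
    # Nonzero root of E{e^{lam*xi}} = 1, equation (14).  For a step with
    # positive mean the root is negative, so bisect on a negative bracket:
    # g > 0 far to the left, g < 0 just below 0.
    g = lambda lam: sum(p * math.exp(lam * s) for s, p in zip(steps, probs)) - 1.0
    lo, hi = -20.0, -1e-9
    for _ in range(200):
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def F(lam, x):
    # f(lambda0, x) = (e^{lambda0 x} - 1)/lambda0
    return (math.exp(lam * x) - 1.0) / lam

def absorption(r0, R, up, down, sweeps=40000):
    # Direct solution of (13) with absorbing barriers: step +up or -down,
    # probability 1/2 each; h(r) = P(reach >= R before <= 0 | start at r).
    h = {r: (1.0 if r >= R else 0.0) for r in range(-down, R + up + 1)}
    for _ in range(sweeps):
        for r in range(1, R):
            h[r] = 0.5 * (h[r + up] + h[r - down])
    return h[r0]
```

The bracketing follows from optional stopping applied to the martingale e^{λ₀ S_n}: the overshoot at absorption lies in [-μ, 0] below and in [R, R + ν] above.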
The linear case first discussed (with E{ξ} = 0) corresponds to λ₀ = 0. Actually it is not an exceptional case; this becomes evident if we introduce the function f:

    f(λ, x) = { (e^{λx} - 1)/λ   if λ ≠ 0
              { x                if λ = 0,
which is continuous and monotonic increasing in λ for each x. (In fact for x ≠ 0 the function f(λ, x) is strictly monotonic in λ.) The relation (14) defining λ₀ becomes E{f(λ₀, ξ)} = 0, which has a unique solution in all cases. Write F(x) for f(λ₀, x). Then the approximation and bounds for p_R are simply F(r)/F(R), F(r)/F(R + ν), and F(r + μ)/F(R + μ), respectively, regardless of whether λ₀ is positive, negative, or zero.

For games of survival we have some very similar results:

LEMMA 12. If ||a_ij|| is zero-free and if

    max_i min_j a_ij < 0 < min_j max_i a_ij,

then there is a unique number λ₀ such that

(15)    val ||f(λ₀, a_ij)|| = 0.

Moreover λ₀ and val ||a_ij|| have opposite signs, or are both zero.

PROOF. The value of ||f(λ, a_ij)|| is a continuous and strictly monotonic function of λ, since none of the a_ij is 0. It tends to the limit +∞ as λ → +∞ because of the positive element in each column, and to the limit -∞ as λ → -∞ because of the negative element in each row. Therefore it has precisely one zero. The last part of the lemma follows from the fact that val ||f(0, a_ij)|| = val ||a_ij||.

Again write F(x) for f(λ₀, x).

LEMMA 13. F is a solution of (3).

PROOF. Using the identity (valid for all λ):

    f(λ, x + y) = f(λ, x) + e^{λx} f(λ, y),

we have:

    val ||F(r + a_ij)|| = F(r) + e^{λ₀r} val ||F(a_ij)|| = F(r),

as required. The corollary which follows is proved by the same device.

COROLLARY. The local F-strategies are precisely those mixed strategies that use only probability distributions that are optimal in the matrix game ||F(a_ij)||.
THEOREM 6. If ||a_ij|| is zero-free and if

    max_i min_j a_ij < 0 < min_j max_i a_ij,

then the value of the survival game is approximately equal to F(r₀)/F(R). More precisely, we have:

(16)    F(r₀)/F(R + ν) ≤ v(r₀) ≤ F(r₀ + μ)/F(R + μ),

where ν = max a_ij, μ = -min a_ij. The local F-strategies are approximately optimal, in the sense that Player I can enforce the lower bound of (16), and Player II the upper bound, by using them.

We remark that an all-positive row (max min a_ij > 0) or an all-negative column (min max a_ij < 0) trivializes the game. (These cases correspond to λ₀ = -∞ and +∞ respectively.)
PROOF OP THEOREM 6. Denote the indicated lower bound in (1 6 ) by g(r ). Clearly g is a strictly monotonic solution of (3 ), and the local F-strategies are also local g-strategies. If we set P*(r) = g(r) for r outside (0 , R) we have a "generalized survival game" In the sense of Section 3 . By Lemma b, g(rQ ) Is its value, andthe local P-strategies are optimal. But P* < P throughout the relevant intervals (- n, 0 ] and [R, R + v); therefore g(rQ ) < v(rQ ), and the local P^-strategies enforce at least the lower amount for Player I. The other bound Is established in the same way. The bounds (1 6 ) can sometimes be Improved by exploiting special properties of the matrix. For example, inadmissible rows or columns of ||F(a^ .)|| can be disregarded in calculating sults are the following:
m
and
v.
Two other such re
COROLLARY 1. IfrQ, R,and the a± . are all integers, then |i and v In (1 6 ) may be replaced by n and v - 1 respectively.
- 1
COROLLARY 2. If
the
rand
R
are integers and
aij are i 19 then the value of the game exactly P(r )/P(R), and the local F-strategies are optimal.
is
GAMES OP SURVIVAL An equally exact result, holds for arbitrary rQ and R; it has the form v(rQ ) = P(rQ + |I)/F(R + 5 + v), where - [I and R + v are the unique absorption points of the process. However, jl and v depend on rQ in such a way that v is actually a step function, despite the continuity of P. Two simple asymptotic results are of interest: COROLLARY 3 • If R -------- ----- °° with rQ value of the game tends to a limit & that is or satisfies
o(ro+,x) , val||a^.|| < 0
or
> o.
Thus, if the '’money1' game is in his favor, Player I can defeat even an arbitrarily rich opponent, with some probability. COROLLARY 4 . If rQ and R --- ► 00 in a fixed ratio, or, equivalently, if the a^ . all --- ► 0 in a fixed ratio, then the limit of v(rQ ) is either o, rQ/R, or 1, depending on whether valHa^.H < 0 , = o, or > o, respectively. As we pass to the limit in this fashion, the "naive” strategy of maximizing the minimum, expected money gain on each round becomes better and better. Indeed, in the balanced case (valHa^.H = o) It is a local F-strategy, and in the lopsided cases (val||a^j|| < o or > o) one player has nothing to lose anyway, In the limit, while any strategy with positive expected gain wins for the other. These remarks may clarify the rather puzzling (and not entirely correct) conclusions of [1 ], [2 ], [11 ] to the effect that the "naive” strategy just mentioned is approximately optimal. [The following is an example of a game in which this "naive" strategy is not satisfactory:
Example 1o In fact if Player I follows it here he will always choose the first row and hence always lose (assuming
MILNOR AND SHAPLEY o < € < ~). Another case where the "naivp" strategy is not satisfactory for Player I is given by Example 7 above.] A generalization of Corollary k has been obtained by Scarf [12] for survival games in which r and a^j are n-dimensional vectors. Under certain assumptions, which reduce to our condition val||.a^.|| = 0, he finds that the limiting value functions are generalized harmonic functions, being the zeros of certain second-order differential operators, in general nonlinear. A different extension of the survival game model, of some inter est, is obtained by changing the information pattern, disrupting in some specified way the process whereby the players learn of each other1s past moves and the resulting winnings or losses (see [14 ]). Since the local P-strategies can be played without benefit of any information whatever, the bounds of Theorem 6 remain applicable, and we have; COROLLARY 5. In a game of survival with restricted information flow, the value (If It exists) lies with in the bounds (16). In any case, the minorant (supInf) and majorant (inf-sup) values exist and satisfy (16).
Q = 1,
We note in passing that the value always exists If Q = 0 or since the payoff as a function of the pure strategies is semi
continuous, and the pure strategy spaces are compact, regardless of the In formation pattern (compare [13]). So far in this section we have been proceeding on the assumption that lla^jll is zero-free, We now indicate without proof the modifications required If this assumption is dropped. The parallel numbering will assist comparison. LEMMA 12°.
If
solutions of
max min a. . < 0 < min max a. . then the i j J j i J
(15°)
val||f (A,, a^j )|| = 0 constitute a finite, closed interval [a,1, a ”]. More over A r and A," both have signs opposite to val||a^j||, and we have A f < 0 < a " if and only if val||a^j|| = 0. Write
F !(x)
LEMMA 13°.
for
Both
f (X T, x) P 1 and
and Fn
P !,(x)
for
f(A", x).
are solutions of (3).
GAMES OF SURVIVAL COROLLARY. The local F l- and F n-strategies are precisely those mixed strategies that use only prob ability distributions that are optimal in the matrix games
||F!(a^.)|| and
THEOREM 6 ?
||Fn(a^ .)||
respectively.
If max min a. . < o < min max a. . i j j i J
then the extreme solutions vQ and v 1 of (3), (4 ) are approximated by F"(r)/F,f(R) and F l(r )/F!(R) respectively, with precise bounds of the form (16). In the Q = 0 game, Player II can enforce the upper bound to vQ by playing a local F"~strategy, and Player I can enforce to within any e > 0 of the lower bound by choosing 6 > 0 small enough and playing optimal strategies of |[f(X ” + 8 , a^ .)|| on each round. A similar statementholds for the Q = 1 game and its value function v 1.Forgeneral Q the value (if it exists) lies between and F 1(r + n )/F!(R + \i ). Again we remark that
the cases max min
F M(r0 )/FM(R + v)
a^. > 0
and
min
max
aij < 0 are trivial. A guide towhat happens when one or both is equal to zero Is provided by Lemma 3, in Section 2. The five corollaries are unchanged or are modified in the obvious way, using the last part of Lemma 14° and noting that statements must be made in terms of vQ and v 1, with the value of the game In general (if it exists) lying in between. Corollary 2° can be extended slightly (with the aid of Theorem 2) to yield the following result: THEOREM 7 -
If the
are all
+ 1
X ’ = x", then VQ = V 1 and- the value game exists and is independent of Q.
It more general answer seems valid almost
and if
of the
survival
is natural to ask whether X f = x" Implies vQ = v 1 under conditions. In view of Example 11,discussed below, the to be in the negative. However the converse Implicationis always. In fact, if X f < x" then the inequality:
F n(r+M.)
m a x [ u !, u"] and vQ < mlntu1, u n]. However, a simple calculation shows that u ? and u n are distinct for R > 1; hence vQ and
v1
are also distinct.] BIBLIOGRAPHY
[1]
BELLMAN, R., M0n multi-stage games with imprecise payoff," RAND Corporation Research Memorandum RM- 1337; September 1954 .
[2]
BELLMAN, R., "Decision-making in the face of uncertainty - II," Naval Research Logistics Quarterly 1 (1954 ), pp. 327-332 .
[3]
BELLMAN, R., and LASALLE, J., "On non-zero-sum. games and stochastic processes," RAND Corporation Research Memorandum RM-212, August 1949 -
[4 ] BLACKWELL, D., "On multi-component attrition games," Naval Research Logistics Quarterly 1, (1954 ), pp. 210-216.
GAMES OF SURVIVAL [5]
DOOB, J. L., Stochastic Processes, Wiley, 19 5 3 *
[6 ] EVERETT, H., "Recursive games," this Study. CT]
FELLER, W., Probability Theory and its Applications,
Wiley, 1950.
[8 ] GALE, D., and STEWART, F. M., "infinite games with perfect informa tion," Annals of Mathematics Study No. 28 (Princeton, 1 9 5 3 )> pp* 2^5-266. [9] [10]
GLICKSBERG, I., "Minimax theorem for upper and lower semi-continuous payoffs," RAND Corporation Research Memorandum RM-^78 , October 1950. HA.USNER, M., "Games of survival and optimal strategies in games of survival," RAND Corporation Research Memoranda RM-776 and RM-777 , February 19 5 2 .
[11 ] PEISAKOFF, M. P., "More on games of survival," RAND Corporation Re search Memorandum RM-884 , June 1952 ; reproduced in R. Bellman: "The theory of dynamic programming," RAND Corporation Report R- 2^5 (1 9 5 3 ), pp. 87-96. [12 ]
SCARF, H. E., "On differential games with survival payoff," this Study.
[13]
SCARF, H. E., and SHAPLEY, L. S., "Games with information lag," RAND Corporation Research Memorandum RM-1320, August 19 5 4 •
[14 ] SCARF, H. E., and SHAPLEY, L. S., "Games with partial information," this Study. [15]
WALD, A., "On cumulative sums of random variables," Annals of Mathe matical Statistics 15 (1 9 4 4 ), pp. 283-296.
J . Milnor L. S. Shapley Princeton University The RAND Corporation and The California Institute of Technology
RECURSIVE GAMES H. Everett1 INTRODUCTION A recursive game is a finite set of "game elements", which are games for which the outcome of a single play (payoff) is either a real number, or another game of the set, but not both.
By assigning real numbers
to game payoffs, each element of the recursive game becomes an ordinary game, whose value and optimal strategies (If they exist) of course depend upon the particular assignment. It is shown that if every game element possesses a solution for arbitrary assignments, then the recursive game possesses a solution. In particular, if the game elements possess minlmax solutions for all assignments of real numbers to game payoffs, then the recursive game possesses a supinf solution in stationary strategies, while If the game elements possess only supinf solutions, then the recursive game possesses a supinf solution which may, however, require non-stationary strategies. No restrictions are placed upon the type of game elements, other than the condition that they possess solutions for arbitrary assign ments of real numbers to game payoffs. Some extensions to more general games are given. § 1 . DEFINITIONS A recursive game, is a finite set of n "game elements", 1 2 n denoted by r > r > •••, r > each of which possesses a pair of strategy spaces, denoted by S^ and S^ corresponding to for Players 1 and 2 respectively. To every pair of strategies X^ e S^, Y^ e S^, there Is associated an expression (generalized payoff):
(1 .0 )
Hk (Xk , Yk ; T ) = pkek + £
qk> M
J=1 1
National Science Foundation Predoctoral Fellow 1953 -5 6 . ^7
EVERETT where pk >
> 0
and
+ X j
= 1
The interpretation of this generalized payoff is that if Player 1 and Player 2 play with strategies X^ and respectively, the possible outcomes of the single round are either to terminate play -with k Player 1 receiving an amount e from Player 2 , or to have no payoff and k ki proceed to play another game of the set, where p and the q J are the probabilities of these events. A strategy
x € £>
for
P^ is an infinite sequence of vectors,
X = {X^} = x], ~XS , .• ~Xt , ...where T t = (X^, X2, ..., x£) and x i e Sf for all t and all i, with the interpretation that if P, k k finds himself in rfor the t-th round of play, he willuse strategy X. . 1 i A strategy x is stationary in component i if X^ = X^ for all t. A strategy x is stationary if it is stationary in all components. Similar definitions holdfor a, strategy w e g*2for P2 . A pairof strategies x, ¥ and a starting position define a random walk with absorbing barriers among the game elements. Since ab sorption in rk in the t-th round carries the payoff e^ an expectation, Ex*^[x, ¥] is defined. Thus to each strategy pair there corresponds an expectation vector, whose components correspond to the starting positions. If we define the n x n matrices P^ and and the column vector E^ for the strategy pair (x, y ) by: [Pt]1:3* - S ^ p 1
[ Q ^ 1^ = qij*
(1 . 2 ) ■i _ „i t where p1, qij*, and e1 are given by H^CX^, Y^; r) through (1 .0 ), then straightforward calculation gives the expectation vector for n rounds of play as:
(1.3)
— . n / k ~' \ — Exn (x, I) = £ f j % PA k=i ' t=o
where k-1
and hence that the ultimate expectation is
/
RECURSIVE GAMES k -1 (1 A )
Ex(x, y) =
lim
Sn'*-
" ' I
Il«t
k= 1 \ t=o
' A
/
which for bounded payoffs always converges, and which assigns zero expec tation to a non-terminating play. A recursive game will be said to possess a solution if there ex ists a vector V, and if for all e > o there exist strategies x€ € ^ e such that:
(1 .5 )
Ex(Xe, y ) > V - e T
for a l l
Ex(x, ¥€) 2
and x € £>
where U > ¥ (> U1 > W 1 for all i, and 1= (1, 1, ..., 1 ). Then x€ and ¥€ are called e-best strategies, and V is the value of the re cursive game. (Our definition thus corresponds to a solution of all of the games which can arise from the different possible starting positions
§2.
THE VALUE MAPPING,
M
For an arbitrary vector ¥ = (W , W , . . . , ¥ ) we can reduce a game element to an ordinary (non-recursive) game r^(¥) by defining the (numerical valued) payoff function for rk (¥) to be: (2 .1 )
Hk (Xk , Yk; W) = pkek + ^
qkj’w j’,
(Xk, Yk ) e Sk x Sk
j which results from H(Xk, Yk; T) by replacing the symbols rJ by the real numbers W J in (l.o). In effect we are arbitrarily assigning a "value", to the command to play
(2 .2 )
(2 .3 )
H •
DEFINITION 1 . A game element r1 satisfies the supinf condition if the ordinary game r^(¥) possesses a supinf solution in the usual sense for all W. DEFINITION 2 . A game element r1 satisfies the minimax condition if r3"(W) possesses a minimax solution for all
W.
50
EVERETT
Of course, if a game element satisfies the minimax condition, it also satisfies the supinf condition. We shall henceforth deal only with re cursive games, all of whose elements satisfy at least the supinf condition. If each of the n game elements of a recursive game r satisfies the supinf condition, then for any n-vector U, we define the n-vector IT = M( U ) through: (2.k)
U'1 = Val
U )
The mapping, M, of n-vectors into n-vectors is then called the value mapping for the game r. We now define the relations
>*
and
* V1 U
>*
V1 > 0
If
V
1
U1 > V1
if
r
U1 < V1
if
for all
i
for all
i
V1 < 0
(2 .5 )
U
•
0
the classes
I
C1 ( r ), 0 { r ) of n-vectors
by: W e C 1( r )
M( W ) > ‘ W
W € C2 ( r ) ^
M( W )
(2 .6 )
and we note that C1( r ) and for the zero vector.
W
C2 ( r ) are always disjoint except possibly
THEOREM 1. (a) W e C 1 ( v ) ==>■ for every there exists a strategy xG € ^ such that Ex(Xe , ¥) > ¥ (b) there exists a strategy
eT W Y€ e
E x(X, ¥ ) O
¥ e S’ )
e C2 ( r ) ===>• for every such that (all
X e g )
e > 0
51
RECURSIVE GAMES PROOF. We shall prove (a) by supposing that we are given a W e C 1 ( r ) and. an e > o, and then using W to construct a strategy x€ 6 & , which we subsequently prove gives the desired result.
Let W ! = M( W ). Because W € C 1 ( T ) all components ^ which are positive increase under the value mapping, and since there are only a finite number, there exists a 7 > 0 such that W1 > o = > W Ti - W 1 > 7 for all i. Choose 6 such that 0 < 5 < min (7 , e), and then let strateX e 6^ for P^ have components irl as follows: 1)
If
r1 ( W ) possesses an optimal strategy,
X1 e sj, for P , then let x£ = X1 all t . (stationary in comp, i)
2 ) If
r1 ( W )
for
fails to possess an optimal strate
gy for P1, but W1 > 0, then let = X1 for all t, whereX1 e s| is 6-best in r1 ( W ). (stationary in comp. I)
gy)
3 ) If
r (W )
gy for
fails to possess an optimal strate
P 1,
and
W1
0,
then let
X^ e s|
be a strategy which is 5,-best in r^( W ), 1 t where 5^ = (— ) 6 . (non-stationary in comp. Then by (2 .1 ), (2.4), and (2 .6 ), for
Xt
i).
so defined and for all
Yt:
H'
(2 .8 )
¥1 + 7 - 6
¥ 1 - &t so that if we define the non-negative vectors If
w
in
1 ) and 2 )(w1 > o)
in 3 ) ( ¥ 1 < 0 ) n
and 0
> 0
&t if
by: ¥
> 0
(2 .9 ) 0
if
¥ ^ 8t g 0 if
then we can summarize (2.8) in matrix notation as (2 .1 0 ) for all
3tEt + t,
under
Xe,
and for all
> ¥ + |i - 6t
¥.
¥
¥.,
.
§3 • THE CRITICAL VECTOR (3*0
DEFINITION 3 • V = V( r ) Is a critical vector for — $■r ZZZH for every e > 0 there exists a pair of vectors, ¥ 1 and ¥2, lying componentwise within an e-neighborhood of V, (e N ( V )), such that ¥ 1 e C1 ( r ) and ¥ 2 e Cg ( T ). (V is in the intersection of the closures of c2 ( r
C1( r ) and
)•)
THEOREM 2 * V is a critical vector in r = > r possesses a solution, with value V. (Hence V is unique,) PROOF. Follows Immediately from definition of critical vector, Theorem 1 , and the definition of a solution. COROLLARY. If V possesses a critical vector, V, then there exist for all € > 0, e-best strategies x€, ¥g, for the players which are stationary in all components I for which either satisfies the i 1 minimax condition, or V is favorable. (V > 0 is favorable for P^, V1 < 0 is favorable for P2 . ) PROOF. Follows from construction of e-best strategies (2 .7 ) in proof of Theorem 1. REMARK 1. The value of an ordinary (non-recursive) game is obviously a critical 1 -vector In that game. §4.
REDUCTIONS OF RECURSIVE GAMES
For any recursive game
1 2 r = ( r, r,
n ..., r ),
we can form a
55
RECURSIVE GAMES
reduced recursive game rs ( ¥ s ), from any subset s of the game ele ments of "r^ by assigning real numbers ¥^ to the game payoffs r1 for the remaining set s of game elements. That is, the generalized payoff function for the game element r1, i € s, in the reduced game, is defined to be:
(4.1 )
H ^ X 1, Y 1;
r s ( ¥ ®)) = p-^e1 +
qij*¥j* + j€S
qlj'rj* jes
¥e shall say that a game element r1 has bounded payoff if there exist finite numbers a, p such that p § e1 § a for all (X1, Y 1 ) € s| x S^. ¥e shall now investigate_the behavior of the value, ifit ex ists, of a reduced game rs (v i & ) formed by assigning the single real number v to the set s . ¥e will abbreviate the k-th component of
v7i by
V^(v),
3
( v T 5 )}
and the game element LEMMA 1 . (a)
rk (v 1 3 ) by
rk (v).
¥e then have:
a > o, p < 0
are anypayoffbounds for all rk , k e s. (b) Vk (v ) exists for all v. together imply (c ) 3 § Vk (v ) § a for p ^ v o, and for all v, Vk (v +
5 *^ Vk (v - 5 ) wj (1 e 3 )
which Is simply a statement that for the value map game (4.27)
M
for the reduced
r s ( V s ): M ^ 3)
^
3
(so that ^ 3 € C 1(
T s ( T S )) .
Similar treatment holds for W 2S and we conclude that in r s ( V 3 ), and the proof is completed.
V s
)
is critical
6o
EVERETT §5.
EXISTENCE OF THE CRITICAL VECTOR - MAIN THEOREM
THEOREM 5 . Every recursive game whose game elements have bounded payoffs and satisfy the supinf con dition possesses a critical vector. PROOF.
Induction on the number of game elements using:
HYPOTHESIS (k): Every recursive game consisting of k or fewer game elements, all of which have bounded pay offs, and satisfy the supinf condition, possesses a critical vector. Now consider any recursive game r which consists of k + 1 game elements with the above properties. Remove one element, say r^, and consider the remaining set, r r, as a reduced game r r (v) which Is a function of the nvalue" v assigned to r^. This Is then a re cursive game with k elements and hence by hypothesis possesses a critical vector V (v) for all v. Moreover, since Lemma 1 applies to criti cal vectors, as we have seen, we conclude that V P (v) is a continuous monotonic function of v in all components. Now consider the ordinary game r^( V r (v), v), which possesses a value for all v by virtue of its satisfying the supinfcondition. Define: (5 .1 )
V( V ) = Val rq ( T r (V ),
V
)
then applying Lemmas 1 and 2 we obtain the conditions
onV(v):
V(v) - 5 g V(v - 6) 0
such that
3 g v ^ a
where a, 3 are the upper and lower payoff bounds for all of the game elements of r. Therefore, V(v) is a continuous mapping of the closed line segment [3 , or] into itself, so that there exists a closed, non empty set of fixed points, and hence in particular there exists a fixed point of minimum absolute value which we shall designate as v*. That is, there always exists a v* such that:
RECURSIVE G M E S (5*4)V (v*) = v*,
and for all
v,
61
V (v ) = v = >
|v | > |v * j
We shall now show that the (k+1)-vector V defined by V = [V (v*), v*], and which always exists, is critical in r. To this we proceed to show that for any e > 0 there is a W 1 e N £( V such that W 1 e C^( r ): CASE 1.
do )
v* > 0.
We first remark that (5*5)
V(v) > v
for
since by (5 .2 ) and (5*4) if we let
0 § v < v*
v = v* ~ 8, 8 > 0,
then
but the equality cannot hold, or it would contradict the minimum absolute value property of the fixed point v*. Now, (5*5) implies that for every e > 0 there is a v € g N£ (v * ) for which V ( v G ) > v €, and hencethere exists a 8, 0 < 8 < €, for which: V ( v ) = V(v* - 8) > V( v * ) - 8 = v* - 8
(5.6)
=vj
V (ve ) > ve +8
.
We now turn to the reduced game r r (v€), for whichthe induc tion hypothesis guarantees the existence of a vector V P (v€ ) which is critical in ~r~r (v€ ), the property that: (5.7) Now, (5-6) since
Val
so that there is avector
for all
W 1r , v e ) >
W^
€Ng(V r (v€ ))
k e r
.
Implies that In rq, Valrq ( Vr (ve ), ve ) > vs + S, e N& (V r (vs )), applying Lemma 2 we get
(5-8)
Val rq ( W ^ ,
ve ) > ve
with
and
.
But (5*7) and (5*8) are simply the statement that the k + 1 vector W 1 = [W1P, v€ ] has the property that M( W 1 ) > W^, so that
(5.9) But by choice of (5.10)
e c1 ( T
)
.
v€: V € € N € (v *)
which implies, using Lemma 1 for critical vectors, that
V r (ve ) e N £(V r (v*
62
EVERETT
which, because
W^
e N& {V
(v€ )},
(5.11 )
and
5 < e,
implies that
e N 2 € ( T p (v*))
and combining
(5•1o ) and (5*11) we see
(5.12)
that
¥ 1 = [W^, v€] € N2e{[V P (v*), v*]}
and hence that CASE
Now,
W 1 e N2€( V ),
2. —
and Case 1 is completed.
0.
v* ^
-p
V = CV
(v*), v*] may have some othercomponents besides
the q-thcomponent equal to v*. Let p denote the set of allthose in dices k for which = v*. Then, since by Theorem 3 V P is a fixed point of M, the value mapping for the reduced game r r (v*), we can conclude that: (5.13)
Val
elements
rk ( V p (v*), v*) = v*
k € p
We now turn to the reduced game r P(vl p ) consisting of the r^, i € p, with "value" v assigned to all of the gameele
ments in p. By applying Theorem 4 to tor V P (v*), we have: (5.14)
V( restricted to
since
all
Vk = v*
for
? p( v *T p,
which has critical vec
P)) = Y p = V
is critical in
k e p. Now, if we denote
we get, by Lemma 1, that for any (5*15)
r P (v*),
€ > 0,
V P (v*) - V P (v)
V P( v * ) - e T P . Yal r ^ v
1
for which: o W,
V p (v),
v* - v = e,
"v
W.1 p e N~{V p (v)}
by
and for v = v* - e:
in all components since V p (v*) = V p is not equalto v* component. Hence there exists a 5, 0 < & < e, suchthat: (5 .1 6 )
f
P) 5.wj
all
1 € p
in any
RECURSIVE GAMES The existence of such a r
v 1 P).
(5.19)
¥1 p
63
is assured, since
V p (v)
is critical in
Nov, returning to (5-13) and rewriting It as rk (v* 1
Val
V P(v*)) = v*
all
k € p
we apply Lemma 2 to get: (5 •2 0 )
Val rk ((v* - € ) 1
V P(v*) - € 1 -P) > v* - € all
v
which means, since
(5.21)
Val r k (v
=
v*
T p,
-
e,
k € p
and by (5.17) and Lemma 2 , that
T P(v) - 5
T p) >
V
a ll
k e p
.
But
^
p e Ng (v" p (v )) ,
and Lemma 2 applied to (5 . 2 1 ) yield:
(5.22)
Val rk (v T p, W1 p ) > v a l l k e p
.
Now we see that, because v o a ¥ 1 e N e ( V ) such that ¥ 1 € C 1( r ). Similar treatment with use of alternate relations and reversal of all inequalities (effectively reversing the roles of the players) yields the existence of a ¥2 € C2 ( r ), ¥2 € N €( V ) and we have proved that the vector V, which we have constructed, is critical in r. ¥e have shown that
Hyp(k) = > Hyp(k + 1 ).
Furthermore, for the
EVERETT
64
empty game, which has zero payoff for all strategies, and hence value zero, the hypothesis is obviously satisfied, since zero itself Is critical in this game, and the proof of Theorem 5 is completed. (An independent proof of the hypothesis for recursive games with one element is contained in Theorem 8 .) We can now summarize our results, using Theorems 2 , 5, and Corollary 1 : THEOREM 6 . MAIN THEOREM. Every recursive game whose elements have bounded payoffs and satisfy the supinf condition possesses a solution, V, and e-best strate gies XG and for the players which are stationary in all components i for which either r1 satisfies the minimax condition or V1 is favorable. An important consequence of Theorem 6 is that any recursive game whose game elements are matrix games (matrices with generalized payoff elements of the form a^ . = jejLj + ^k^ij1"^ possesses a solution and e-best'stationary strategies for the players, since all such matrix game elements satisfy the minimax condition. §6 . GENERALIZATIONS For our purposes we shall define a stochastic game, r , a collection of game elements {t ^}, each with strategy spaces S^ S^, and generalized payoff function of the form: ff^X1, Y 1; r ) = e1 + p S +
to be and
(X1, Y 1 ) e sj x S;
( 6.1 )
p1, q1J’ > 0;
p1 +
= 1
where now e1 is a payoff which takes place whether or not the play stops, p^ is the stop probability, and the q^* are the transition probabilities to other game elements, as before. With such games the payoffs are allow ed to accumulate throughout the course of the play, in distinction to re cursive games, where payoff can take place only when the play stops. If we now extend all of our definitions and formulae in the ob vious manner (which amounts to replacing P E by E in the expectation formulae) to stochastic games, we notice that Theorems 1 , 2 , 3 , and 4 re main true for stochastic games. Lemma 1 , however, fails ((c) is no longer -Xr true, and the crucial § of (d) must be replaced by the milder (6.4) such that
for every
. Ex(x% ¥) > i
£ there exists a for all
x^ e g
Y e 6^
and similarly for Val = and make a similar extension of the notion of a critical vector (in this case simply a number), then we can give a complete answer to the existence of a solution for simple stochastic games, with no restrictions on e or q. THEOREM 8 . Every simple stochastic game which satisfies the supinf condition possesses a solution. o
The stochastic games treated by Shapley [1 ], which are generalized matrix games with the condition that the stop probability is bounded away from zero for all strategies in all game elements, are a subclass of pseudo-recursive games.
66
EVERETT
PROOF. line is in either is always in of their closures critical for r,
We simply remark that every point of the extended real C ^ r ) or C2 (r), and that neither is empty since - oo and +00 is always in C2, so that the intersection is non-empty. But a point in the closure of both is and the theorem is proved. c.
UNIVALENT STOCHASTIC GAMES
A univalent stochastic game is one for which the payoffs are always non-negative (or non-positive) for all strategies, and in all game elements. Such games are useful for describing certain pursuit games, in which Player 1 , the player being pursued, receives some positive payoff from the pursuer (P2 ) for every move that takes place for which he successfully avoids capture, the play ending with no payoff when capture takes place. It is useful now to introduce the notion of a "trap" in a stochastic game, which is an element or set of elements such that once the play reaches an element of the trap one of the players can force the play to remain in trap indefinitely in such a way as to accumulate payoffs from the other player, and hence achieve an arbitrarily high expectation. Traps are, then, sets of elements which have infinite values in the sense of (6.4). A game contains no traps when each player can prevent infinite ad verse expectations in all elements. We shall see, however, that even trap-free stochastic games do not always possess solutions, at least in the sense of our previous definition of a solution. THEOREM 9 . Every univalent stochastic game, whose game elements satisfy the supinf condition, and which contains no traps, possesses a solution. PROOF.
Consider the sequence
{ W-^ }:
W0 = T
Wk+^ = M( \
)
which is generated by iterating the value mapping. Since all payoffs are non-negative we know that W 1 > ° = W Q. Now, assume that ^ +1 >
Then, by Lemma 2 ( f ) Val r 1 ( ^ k+1 ) > V al r i (
implies that M( W^ +1 ) > M( by induction we have proved: (6 .5 )
{ Wn )
) fo r a l l i ,
) which means that
^ + 2 i ^k+1
is monotone increasing in all components
which 30
RECURSIVE GAMES
6?
Let r(n) be the truncated game which results from r by in troducing a compulsory stop with zero payoff after n rounds of play, whose value is obviously given by iterating the value mappingntimes on the zero vector and hence equal to Wn - We are assured that P^can come arbitrarily close to this value by simply playing a strategy which is €-best in the truncated game and arbitrarily thereafter, since, because all of the payoffs are > 0, he can lose nothing after n moves. So that for every Wn in the sequence { ), and for every e > 0, there ex ists a xe e £ such that: (6.6)
Ex(xe, ¥) > Wn
- € 1 for all
¥ €
But, since the game is presumed to betrap-free, P2can pre vent any infinite positive expectations, so that the sequence(W-^ } is bounded above, and hence converges to some finite limit, W*, which P 1 can approach arbitrarily closely, and for which M( W* ) = W*. However, this means that W* e Cp ( r ),sothat P2 can also, byTheorem.1 , come arbitrarily close to W*, so thatthe game has a solution withvalue W* . (It should be noted that in general the limit ofthe iterated value mapping is not the value of the game. See §9 Ex. 6.) It can be shown that if the elements of a univalent stochastic game satisfy the extended minimax condition (r^( W ) possesses a mini max solution for all W including the possibility of infinite components), then the game possesses a solution even finite values are, of course, allowed.
if traps are present, where in The difficulty in extendingthis
result to the supinf case lies in the fact that the limit of the iterated mapping, W*, which may now have infinite components, does not necessarily satisfy the relation that M( W* ) = W* in the supinf case as it does in the minimax case. §7.
EXTENSION TO THE CASE OP CONTINUOUS TIME
We shall now present a further generalization of the theory of recursive games developed so far, to include the case of a continuous, rather than discrete, time parameter. We wish to show that the theory of continuous time recursive games can be reduced in a simple manner to the earlier theory.
ments (7.1 )
A continuous time recursive game r is a collection of game ele {r^}, with payoff functions of the form.: E1 (X± y Y1;
) = p-^e1 + z*. qij*rj* J
(z!
omits
j = i)
68
EVERETT
where the interpretation is that if the players are playing strategies X1, Y^ in r^, then in the (infinitesimal) time interval dt the play stops with payoff e1 with probability p^dt, while with probability qij*dt the players move on and play H . The p1 and q^* are referred to as transition rates. They are non-negative, but do not necessarily sum to unity. In such games the players are at each instant playing some strate gy, but they are free to change at any time. However, we assume that with all admissible time dependent strategies the transition rates are integrable, i.e., /p^dt and the /q^'dt always exist. (In any actual game it is simply impossible that the players could change strategies so fast that this condition would not be met. ) We furthermore assume that the transition rates p1 and qij*, as well as the payoffs e1, are bounded for all strategies, in all elements. We shall show that we can, in a simple manner, associate with r a discrete time recursive game r (a ), which, if it has a critical vector, supplies all the information necessary for optimal (or e-best) play in r — i.e., which has the same value, and whose e-best strategies furnish e-best strategies for r . Thus the problem of continuous time recursive games will be reduced to that of discrete time games which we have already discussed.
Let
A
The reduction to a discrete time game is accomplished as follows: be a positive number such that A(p^ + € -q*^ ) is xe and YG are also e-best In r, which has a solution with value V.
PROOF. Let us assume that it is the k-th round and that P1 is playing X_ε, and let t measure the time elapsed since the beginning of the round (the (k-1)-st event). P1 changes strategies only at times which are multiples of Δ, and at time t is therefore playing x^{k+1+[t/Δ]}, for which, according to (2.8) and (2.9):

(7.4)    Δp^i e^i + Σ_j Δq^{ij} W^j + [1 - Δ(p^i + Σ_j q^{ij})] W^i ≥ W^i + Δμ^i - Δδ^{k+1+[t/Δ]}

for all Y^i ∈ S_2. This implies, according to (7.3), that

(7.5)    p^i e^i + Σ_j q^{ij} W^j ≥ (p^i + Σ_j q^{ij}) W^i + μ^i - δ^{k+1+[t/Δ]}

for all Y^i and all i. Since (7.5) holds for all Y^i and all i, it holds at each instant of play of Γ.
We are now interested in the ultimate outcome of the k-th round, regardless of the time involved, and wish to compute the probabilities p̄^i, q̄^{ij} (i ≠ j) of the various possible ultimate outcomes of the k-th round. We can then view the course of play as a discrete stochastic process which takes place only with each event, in which time is eliminated.
Whatever strategy Y = Y(t) P2 is playing, the transition rates p^i, q^{ij} as well as the payoffs e^i are functions of the time, subject to (7.5). Let us restrict our attention to the i-th element, and let n(t)dt be the probability of an event in the time interval dt, so that the transition rate n(t) is:

(7.6)    n(t) = p^i(t) + Σ_j q^{ij}(t).

Furthermore, let R(t) be the probability that the k-th event has not yet occurred at time t (note: t measured from the beginning of the k-th round). Then clearly R(t) is monotone decreasing, bounded between 0 and 1, and satisfies the relation:

(7.7)    1 - R(t) = ∫_0^t R(τ) n(τ) dτ.
The probability that by time t the k-th round will have resulted in a stop, p̄^i(t), is

(7.8)    p̄^i(t) = ∫_0^t R(τ) p^i(τ) dτ,

while the probability that it will have resulted in a transition to Γ^j, q̄^{ij}(t), is

(7.9)    q̄^{ij}(t) = ∫_0^t R(τ) q^{ij}(τ) dτ.
Finally, if

    ē^i(t) = [∫_0^t R(τ) p^i(τ) e^i(τ) dτ] / [∫_0^t R(τ) p^i(τ) dτ]

denotes the mean payoff (which is, of course, bounded by any bounds for e^i), then we can write the total expected payoff as:

(7.10)    p̄^i(t) ē^i(t) = ∫_0^t R(τ) p^i(τ) e^i(τ) dτ.
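For constant rates these formulas can be checked in closed form (an illustrative special case of my own, not from the text): with p^i(t) = p and Σ_j q^{ij}(t) = q constant, n = p + q, the survival probability R(t) = exp(-nt) satisfies (7.7), and (7.8) evaluates to p̄^i(t) = (p/n)(1 - exp(-nt)).

```python
import math

# Midpoint-rule check of (7.7) and (7.8) for constant transition rates
# (the numeric values p = 0.3, q = 0.7 are assumptions for illustration).
def midpoint(f, t, steps=100000):
    h = t / steps
    return sum(f((k + 0.5) * h) for k in range(steps)) * h

p, q, t = 0.3, 0.7, 2.0
n = p + q                              # (7.6): n(t) = p + q
R = lambda tau: math.exp(-n * tau)     # probability the round is still alive

lhs_77 = 1.0 - R(t)                    # left side of (7.7)
rhs_77 = midpoint(lambda tau: R(tau) * n, t)
pbar = midpoint(lambda tau: R(tau) * p, t)          # (7.8)
closed_form = (p / n) * (1.0 - math.exp(-n * t))
```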
However, making use of (7.5), we have that for the k-th round, in the i-th element, under X_ε and for all Y(t):

(7.11)    p̄^i(t) ē^i(t) + Σ_j q̄^{ij}(t) W^j = ∫_0^t R(τ) [p^i(τ) e^i(τ) + Σ_j q^{ij}(τ) W^j] dτ
                                            ≥ ∫_0^t R(τ) [n(τ) W^i + μ^i - δ^{k+1+[τ/Δ]}] dτ,
so that, using (7.7),

(7.12)    p̄^i(t) ē^i(t) + Σ_j q̄^{ij}(t) W^j ≥ [1 - R(t)] W^i + μ^i ∫_0^t R(τ) dτ - ∫_0^t R(τ) δ^{k+1+[τ/Δ]} dτ.
Now by the construction of (2.7) the sum Σ_l δ^l converges; and since R(τ) is bounded by 1, each interval of length Δ contributes at most Δδ^{k+1+[τ/Δ]} to the last integral, which is therefore bounded by Δ Σ_{l>k} δ^l for all t. Letting t → ∞ in (7.12) we obtain

(7.13)    p̄^i ē^i + Σ_j q̄^{ij} W^j ≥ [1 - R(∞)] W^i + μ^i ∫_0^∞ R(τ) dτ - Δ Σ_{l>k} δ^l.

If W^i > 0, then R(∞) must be zero, since otherwise ∫_0^∞ R(τ) dτ would be infinite (R ≥ R(∞)) and the right side of (7.13) would be infinite, an impossibility for bounded ē^i and finite W^j. Therefore if W^i is positive, [1 - R(∞)] W^i = W^i, while if W^i ≤ 0 then [1 - R(∞)] W^i ≥ W^i. Hence (7.13) implies that

(7.14)    p̄^i ē^i + Σ_j q̄^{ij} W^j ≥ W^i + μ^i ∫_0^∞ R(τ) dτ - Δ Σ_{l>k} δ^l.
Finally, since Δ was chosen so that Δ(p^i + Σ_j q^{ij}) ≤ 1 for all strategies in all elements, we have that Δn(τ) ≤ 1, so that by (7.7) ∫_0^∞ R(τ) dτ ≥ Δ[1 - R(∞)]; and because W^i > 0 implies R(∞) = 0, (7.14) yields

(7.15)    p̄^i ē^i + Σ_j q̄^{ij} W^j ≥ W^i + Δμ^i - Δ Σ_{l>k} δ^l.

Similar analysis holds for each element, so that (7.15) holds for all i.
This expression (7.15), involving the ultimate transition probabilities and expected payoffs for the k-th round, is formally equivalent to the expressions (2.8), (2.9). But if we form matrices P̄_k, Q̄_k and vectors Ē_k from p̄^i_k, q̄^{ij}_k, and ē^i_k by the formulas (1.2), then the formulas (1.3), (1.4) for the expectation are applicable to our case. Therefore the proof of Theorem 1 is also applicable, and we can conclude that the ultimate expectation for X_ε satisfies

(7.16)    E(X_ε, Y) ≥ W - ε̄    for all Y.
Since W ∈ N_ε(V), the strategy X_ε is 2ε-best for P1. Reversal of the roles of the players shows the same for P2, and the theorem is proved.

Theorem 10 is easily generalized to the case of continuous time stochastic games, which are games Γ whose elements Γ^i have payoffs of the form

    H^i(X^i, Y^i) = e^i + p^i S + Σ_{j≠i} q^{ij} Γ^j,

where the interpretation is that if the players are playing X^i, Y^i in Γ^i, then in time dt a payoff e^i dt takes place, and with probability p^i dt play stops (S), while with probability q^{ij} dt there is a transition to Γ^j. e^i is in this case a rate of payoff which is going on at all times (accumulating throughout the course of play) until play stops. Theorem 10 then goes through directly with the substitution of e^i for p^i e^i in all formulas (Ē for P̄Ē), and we have

THEOREM 11. Theorem 10 holds for continuous time stochastic games.

Finally, we remark that there is no difficulty in handling recursive (or stochastic) games in which some elements are discrete time games and the others continuous. One simply reduces the continuous time game elements to discrete time elements in the manner presented here, leaving the discrete time elements unaltered.
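A one-element check of the substitution of e^i for p^i e^i (again my own constant-rate illustration, not an example from the text): with payoff rate e and stop rate p, the continuous time stochastic element accumulates an expected total of e/p, and the derived discrete element, which pays eΔ per round and stops with probability pΔ, sums to the same e/p.

```python
# Hypothetical one-element continuous time stochastic game: payoff accrues
# at rate e until play stops at rate p; the expected total is e / p.
def discrete_total(e, p, delta, rounds=200000):
    total, alive = 0.0, 1.0
    for _ in range(rounds):
        total += alive * e * delta   # payoff e * delta accrues this round
        alive *= 1.0 - p * delta     # play stops with probability p * delta
    return total

approx = discrete_total(e=2.0, p=0.5, delta=0.001)
exact = 2.0 / 0.5                    # e / p
```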
§8. SUMMARY AND COMMENTS
Our main tools have been the concepts of the value mapping and the critical vector. Theorems 1 and 2 establish that the critical vector is a solution of discrete time recursive (and stochastic) games, while Theorems 10 and 11 extend this result to the continuous time case. We can therefore state with full generality: If a recursive or stochastic game, either discrete or continuous (or mixed) time, possesses a critical vector, then that critical vector is unique and is the solution of the game.

Theorem 5 establishes the existence of a critical vector for all discrete time recursive games whose elements have bounded payoffs and satisfy the sup-inf condition. This result, together with the above result, implies that: Every discrete time recursive game whose elements have bounded payoffs and satisfy the sup-inf condition, as well as every continuous time game for which a derived discrete time game is such a recursive game, possesses a solution. This latter result cannot be extended to stochastic games, since the existence of a critical vector is no longer guaranteed, as shown by Examples 3 and 4 of §9.

We should like now to emphasize several points. We have addressed ourselves solely to the combinatorial problem of what can be expected when a number of games are "hooked together" with various feedback paths (by allowing some outcomes to feed into other games instead of numerical payoffs), under the assumption that the individual games (elements) are "inherently soluble" (i.e., that when the loops are opened, by replacing game payoffs by numerical payoffs, the resulting ordinary games have solutions). The situation is fully analogous to servomechanism analysis, where the complex behavior of a closed loop servomechanism is analyzed in terms of the (open loop) behavior of its parts. The theory of servomechanisms is concerned solely with the problem of predicting this closed loop behavior from known behavior of the components. An appropriate alternate name for recursive games would be "games with feedback".

Since it was not necessary to place any restrictions on the type of game elements to achieve our results, they are valid whether the elements be matrix games, games on the square, infinite games in extensive form, some type as yet undiscovered, or for that matter other recursive games. It is therefore improper to regard recursive games as a particular class of games. Rather, the concept is one which can be applied to any game (every game is trivially a one element recursive game), but which is useful only if the game is such that there are a number of different situations which can confront the players (the game elements) and the behavior of these elementary situations is completely understood.

§9. EXAMPLES, COUNTER-EXAMPLES, APPLICATIONS
In order to illustrate the results, and to motivate some of the restrictions imposed on the theorems, we list some simple examples of discrete time games. A supply of examples for the continuous time case may be easily obtained from these by suitable reinterpretation of probabilities as transition rates.

EXAMPLE 1. Value 1, ε-best strategy [1 - ε, ε] for P1, all strategies optimal for P2. This is an example of a recursive game which satisfies the minimax condition, yet which possesses no optimal strategy for P1. Notice that 1 is the unique fixed point of the value map for this game.

EXAMPLE 2.
Where P2 is restricted to strategies of the type [1 - a, a] with 0 < a ≤ 1. This is a recursive game which satisfies the sup-inf condition, for which P2 possesses no stationary ε-best strategy. The value is 1, [4/5, 1/5] is optimal for P1, and P2 can also approach 1 by playing a non-stationary strategy [1 - a_i, a_i] for which

    Σ_{i=1}^∞ a_i < ε.
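The remark in Example 1 that the value is a fixed point of the value mapping can be made concrete with a hypothetical one-element game of my own (not one of Everett's examples): take the element ((Γ, 0), (0, 1)), replace the game payoff Γ by a number v, and iterate v → value of the resulting matrix game. Here M(v) = v/(v + 1) for v > 0, with unique fixed point 0.

```python
# Iterating the value mapping of a hypothetical recursive element
# ((Gamma, 0), (0, 1)); the whole instance is an illustration, assumed.
def val2x2(m):
    """Value of a 2x2 zero-sum matrix game."""
    lower = max(min(row) for row in m)                        # maximin
    upper = min(max(row[j] for row in m) for j in range(2))   # minimax
    if lower == upper:                                        # saddle point
        return lower
    (a, b), (c, d) = m
    return (a * d - b * c) / (a + d - b - c)                  # mixed value

M = lambda v: val2x2([[v, 0.0], [0.0, 1.0]])   # the value mapping

v = 1.0
for _ in range(2000):
    v = M(v)          # v_k = 1 / (k + 1): converges to the fixed point 0
```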
EXAMPLE 3.

[Matrix displays for the elements, with entries including 10, 0, -10, (1 + Γ^2), and (-2 + Γ^3).]

Example of a stochastic game with traps, for which no solution exists.
EXAMPLE 4.

[Matrix displays for the elements, with entries including (1 + Γ^2), (-1 + Γ^1), and 5.]

This is an example of a stochastic game which contains no traps, but which still does not possess a solution according to our definition, due to the fact that under the "best" strategies the expectation oscillates. This game does not possess a critical vector.

EXAMPLE 5.

[Matrix displays for the elements: Γ^1 is a three-row, two-column matrix with entries including Γ^2 and 20, and Γ^2 : (-10).]
This is a recursive game satisfying the minimax condition, for which the value of Γ^1 is 5, with optimal strategies [0, 1/2, 1/2] for P1 and [1/2, 1/2] for P2 in Γ^1. However, for any truncation (compulsory stop after n rounds) the value of Γ^1 is 10 instead of 5. This shows that the solution of a recursive (or stochastic) game cannot in general be obtained as a limit of solutions of truncated games. Note also that for this example the iterated value mapping starting with 0 does not converge to the value of the game.

EXAMPLE 6. "Colonel Blotto commands a desert outpost staffed by three military units, and is charged with the task of capturing the encampment of two units of enemy tribesmen, which is located ten miles away.
Blotto scores +1 if he successfully captures the enemy base without losing his own base, and -1 if he loses his own base under any circumstances. Daylight raids are impractical, and for night raids an attacking force needs one more unit than the defending force to effect capture. If an attacking force arrives with insufficient strength to effect capture, then it retreats to its own base without engaging."

In this game a strategy for a player for a single night's operation consists simply of a partition of units into attacking and defending forces. Letting A stand for attack, D for defend, the matrix for this recursive game (rows for Blotto's partitions, columns for the enemy's, Γ denoting a replay of the game) is:

                           ENEMY
                     A: 0     A: 1     A: 2
                     D: 2     D: 1     D: 0
    BLOTTO
    A: 0, D: 3        Γ        Γ        Γ
    A: 1, D: 2        Γ        Γ        1
    A: 2, D: 1        Γ        1       -1
    A: 3, D: 0        1       -1       -1
The value of this game is easily seen to be +1, with strategies of the form [0, 1 - ε - ε², ε, ε²] ε-best for Blotto, and all strategies optimal for the enemy. It is tactical situations such as this which appear to be the main application of zero-sum two-person recursive games. It is amusing to note that Blotto's patience could be measured by the reciprocal of the ε he chooses for his ε-best strategy, since the smaller he chooses ε the surer he is of winning, while at the same time the possible duration of play is increased. This is always the case in a recursive game when a player is forced to resort to ε-best strategies even though the elements satisfy the minimax condition, since such a strategy becomes necessary only in the event that optimal play in Γ^i(V) for all game elements would fail to make all of the favorable game states transient, to use the terminology of stochastic processes. Although not zero-sum, many bargaining situations can be conveniently formalized as recursive games by allowing one or several players to submit offers or bids, the play ending with payoff if agreement is reached, and continuing with no payoff and subsequent attempts otherwise.
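Using the matrix implied by the stated raiding rules (my reconstruction of it, so treat the table as an assumption), the value claim can be checked numerically: against a fixed enemy column the repeated round either wins (+1) with probability w, loses (-1) with probability l, or is replayed, so the eventual expected payoff is (w - l)/(w + l); under the ε-best strategy every column's figure tends to +1 as ε shrinks.

```python
# Blotto's recursive game: rows = (attack, defend) partitions of 3 units,
# columns = enemy partitions of 2 units; G (None) marks a replayed round.
# The matrix is my reconstruction from the stated rules, not quoted text.
G = None
MATRIX = [
    [G,  G,  G],   # Blotto: attack 0, defend 3
    [G,  G,  1],   # attack 1, defend 2
    [G,  1, -1],   # attack 2, defend 1
    [1, -1, -1],   # attack 3, defend 0
]

def eventual_payoff(x, col):
    """Expected absorbing payoff of the repeated round against one column."""
    w = sum(p for row, p in zip(MATRIX, x) if row[col] == 1)
    l = sum(p for row, p in zip(MATRIX, x) if row[col] == -1)
    return (w - l) / (w + l)

eps = 0.01
x = [0.0, 1 - eps - eps**2, eps, eps**2]   # the epsilon-best strategy
worst = min(eventual_payoff(x, c) for c in range(3))
```

The worst column yields roughly 1 - 3ε, illustrating why smaller ε makes Blotto surer of winning at the cost of longer play.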
BIBLIOGRAPHY

[1] SHAPLEY, L. S., "Stochastic games," Proceedings of the National Academy of Sciences, U.S.A., 39 (1953), pp. 1095-1100.

H. Everett
Princeton University
FINITARY GAMES¹

J. R. Isbell

INTRODUCTION

This paper applies old theorems to new definitions in order to show that the von Neumann-Kuhn extensive form covers more games than it has been given credit for. All our theorems say that certain classes of games have equilibrium points in certain classes of strategies; accordingly we have to delineate these classes with some care.

First we remove Kuhn's restriction [11] that a play cannot meet an information set more than once. To solve the resulting games we must admit strategies which are probability mixes of behavior strategies; or rather, if we stick to strategies of the usual sort (mixtures of pure strategies), we find a formal solution, but a player may be able to do better with on-the-spot randomization. Two referees have suggested interpretations for this novel situation. The first is that the moves are played by randomizing machines. For each move an umpire asks a machine for its decision. Before play, each player decides the settings of his machines, one setting for each of his information sets. Thus he needs one machine for each of his information sets; but any one machine may operate more than once in the course of the play. Now this might happen because the decisions have to be acted on faster than a human can act, perhaps in intercepting a missile; and it is easy to imagine reasons why it could be infeasible to build a memory into the machine, e.g., weight limitations. The second interpretation is closer to Kuhn's concept of a strategist [11] (and also to the author's initial interpretation). The moves will be made by personal agents, not understanding the whole game but capable of randomizing as ordered. Since the agents have memory, we must have a different agent for each move; but one order goes to all agents in

¹ This is part of the author's doctoral thesis, Princeton University; supported (in part) by the Office of Naval Research at George Washington University under ONR Contract N7onr4l904. It was presented (in part) to the American Mathematical Society on February 28, 1953.
the same information set. This might happen because in the crisis central headquarters is too busy to analyze all local conditions, or simply because communication facilities are overloaded. Either interpretation supposes an underlying structure satisfying Kuhn's restriction. For some reason the player cannot profitably exploit all the information in this underlying structure. He loses by this inability (the opponents' information being fixed); but he would lose even more if he restricted himself to mixtures of pure strategies. In other words, he achieves his best result by a partial decentralization of decision. The formal proof of existence of equilibrium points is an adaptation of results of Dresher, Karlin, and Shapley [2].

Second, we follow up Kuhn's concept of difference game [11] to decompose a finite game into its (unique) composition series, consisting of minimal difference games. Third, we admit infinite extensive forms of the same nature as the finite ones, with zero payoff for infinitely continued play. Fourth, we decompose these finitary games into composition series, and further, into stochastic forms. Shapley's stochastic games [12] form a subclass of the stochastic forms of finitary games; but these are to be contrasted with "infinitary games" in which there are non-zero payoffs for infinite play, and with Everett's generalized stochastic (recursive) games [4].

Last, we include some material on programming games. The interest of such games seems to lie principally in the field in which they were found, as models for tactical problems. The prototype programming game involves two non-homogeneous forces; each weapon chooses an enemy target, or a distribution of its fire over several targets. After some time one unit will be killed, and new strategies will be chosen to continue the process. The payoff function is linear fractional. We have discussed such tactical models in [9]. Now there is an existence theorem for linear fractional payoff functions which we need in [8] (the other part of the author's thesis) and give in this paper. One can also regard the prototype as a single move, not belonging to a player, but shared by all players; and one can go on to build extensive games out of such moves. We conclude the paper with a sketch of the building operation and an indication of the proof of the relevant existence theorem.
Now there is an existence theorem for linear fractional payoff functions which we need in [8 ] (the other part of the author’s thesis) and give in this paper. One can also regard the proto type as a single move, not belonging to a player, but shared by all play ers; and one can go on to build extensive games out of such moves. We conclude the paper with a sketch of the building operation and an indi cation of the proof of the relevant existence theorem. § 1 . FINITE GAMES As Is customary, we use the term ’'game" for an incomplete set of rules. When one defines the class of admissible strategies, one has a function on strategies to stochastic processes which could also be called the game.
DEFINITION. A finite game Γ on the finite set N of players is an ordered pair (G, H); the game structure G is an ordered triple (K, P, J) and the payoff H is an ordered pair (C, F).

(a) The game tree K is a finite partially ordered set of moves with a least move 0 and for every x in K just one maximum chain from 0 to x, the path of x. A maximal move is a play; the moves covering the move x are alternatives at x. At every move which is not a play there must be at least two alternatives.

(b) The player partition P is an indexed partition of the set of all moves which are not plays into n + 1 sets, P_0, P_1, ..., P_n. A move in P_0 is a chance move and a move in P_i, i > 0, is a personal move of player i.

(c) The information J is an ordered pair (U, Q) of partitions. The information partition U is a refinement of P; its elements are called information sets, and when x and y are moves in the same information set, there must exist a one-one correspondence between the alternatives at x and those at y. The choice partition Q sets up such a correspondence; the elements of Q are choices, and for each choice q there must exist u in U such that q contains just one alternative at each move in u. Then q is over u; if u ⊂ P_i, q is over P_i. The sum of Q is K - {0}.

(d) The umpire's rule C is a function assigning to each information set u ⊂ P_0 a probability distribution on the choices over u.

(e) The objective F is a function on plays whose values are non-negative real n-vectors.

The definition is modified from that of Kuhn [11]. We shift the term payoff from F to H to facilitate the generalization to infinite games. In a finite game the chance moves can be made with any independent probability distributions without the strategic nature of the game being seriously affected by the changes. Of course, the players are supposed to know the umpire's rule, and the entire apparatus of the definition.

We have incorporated the suggestion of Dalkey [1] that an information structure be put on the chance moves. This means that certain
chance moves may be linked together, so that the same probability distribution must be used over the alternatives at each. The situation appears to correspond to reality; e.g., a deck of cards might be shuffled and dealt in the same manner at several points in a game tree. We have not incorporated the stronger suggestion of Dunne and Otter [3] that general non-independent umpire's rules be admitted; for this appears to correspond to nothing in reality. The form used admits that players may be given information about chance choices which is incomplete in a combinatorial way; thus a chance move might have five alternatives, three of which lie in the same information set. A game which can be described in any of these forms, of course, can be described in any other of them (excepting the requirement of Kuhn, Dalkey, and Dunne-Otter that a path cannot meet an information set in two moves); but the class of games carried by a given game structure is different in the different forms.

In our treatment the basic type of strategy is the behavior strategy. The strategy space of player i is the space of all functions s, whose arguments are information sets u ⊂ P_i, such that s(u) is a probability distribution on the choices over u, topologized by pointwise convergence; or alternatively the space of all functions on choices over P_i whose values are non-negative real numbers summing, over each information set, to 1. Let C_i be the set of all choices over P_i; for each u in P_i, let C_u be the set of all choices over u.

DEFINITION. The coefficient simplex S_u is the abstract simplex of all probability vectors on C_u. The strategy space S_i is the Cartesian product X_{u ∈ P_i} S_u.
The elements of a strategy space are behavior strategies, and their coordinates are behavior coefficients. The typical behavior strategy s^i in S_i is an array of behavior coefficients; s^i_{jk} is the probability of the k-th choice at the j-th information set of player i. To play it, the player sets up a machine M_j for each of his information sets u_j. When a move in u_j arises, an umpire will go to the machine for a choice. The player sets all machines initially so that M_j will choose k with probability s^i_{jk}. (If only behavior strategies are to be used then a simpler scheme of play may suffice; but in general, some such scheme as given above is needed.) We need notation to describe the stochastic process which a game
becomes when all players have chosen behavior strategies. Let the umpire's rule C be displayed as s^0 = {s^0_{jk}}; let Z be the set of all plays. The probability of arriving at a move x is the product over the path of x of the probabilities of the unique choices required to stay on that path; designate this expression as Pr(x) = Π s^i_{jk}. The indices i, j, k, of course, depend on several variables; but the entire collection is determined by x. Then the mathematical expectation of payoff is the vector function E = Σ_Z Pr(z) F(z). Let F_i denote the i-th coordinate payoff function (objective), E_i the expectation of the i-th player. We have E_i(s^1, ..., s^n) = Σ_z F_i(z) Π_z s^k_{lm}. The constants F_i(z) and s^0_{jk} can be assimilated into an expression Σ_z f_{zj}(s^j), where f_{zj} is, except for a constant factor, the product of those s^p_{qr} involved in Pr(z) for which p = j. We shall transform S_j to a space of vectors p_j = (f_{zj}(s^j)). Then the payoff ... (1) For moves y > x in the same information set, delete all moves > y which require a choice different from that at x. (2) Then replace each such y by the (unique) next remaining move z > y.
The result is a game in the usual sense, and we refer to the usual theory [11].

The proof of equilibria in mixed strategies does not differ essentially from Kakutani's original application of his fixed point theorem [10]. The theorem is that a point-to-convex-set transformation in a compact convex subset of E^n, with a closed graph, has a fixed point; i.e., some point is in its image. The appropriate domain is the Cartesian product M of the n moment spaces M_i. If p = (p_i) is in M, let Y_i(p) be the subset of M_i consisting of those points r^i such that E_i(p_1, ..., r^i, ..., p_n) is a maximum. For each p, Y_i(p) is the maximizing set of a linear function and hence is non-void, closed, and convex. Set Y(p) = X_i Y_i(p). Now the complement of the graph of Y is open; i.e., if y ∉ Y(p), so that y^i is not in the maximizing set Y_i(p) for some i, then E_i(p : y^i) < max - ε for some ε > 0, and therefore some neighborhood of the pair (p, y) consists entirely of points (r, s) with s^i ∉ Y_i(r). Hence the Kakutani theorem applies, and there is a point p = (p_i) at which no player can improve. That is,

THEOREM 1.1. A finite game has an equilibrium point in mixed strategies.

FIGURE 1

Figure 1 is as simple an example as we can find to illustrate the necessity for non-linear mixed strategies. The diagram represents a two-person zero-sum game; each play is labeled with the payment to player I there, which is the negative of the payment to player II. The strategy of II is obvious. One may verify that I obtains 9/16 by randomizing between [x_1 = 3/4, x_2 = 0, x_3 = 1/4, y_1 = 0, y_2 = 1] and [x_1 = 0, x_2 = 3/4, x_3 = 1/4, ...]. ... Suppose there is a linear strategy y^i such that E_i(s : y^i) > E_i(s). Then for some v^j, E_i(s : v^j) > E_i(s) by linearity. By Kuhn's Lemma, E_i(s : y^i) = E_i(s : v^j). But this contradicts the hypothesis.
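The expectation calculus E = Σ_Z Pr(z)F(z) used throughout this section can be illustrated on a toy tree (a hypothetical game of my own, not Figure 1): player I moves at the root, then the umpire makes a chance move; Pr(z) is the product of the choice probabilities along the path of z.

```python
# Hypothetical two-level tree: player I chooses L or R, then a chance move
# chooses a or b under the umpire's rule; E_1 = sum of Pr(z) * F_1(z).
plays = {                 # path of the play z -> payoff F_1(z) to player I
    ("L", "a"): 4.0,
    ("L", "b"): 0.0,
    ("R", "a"): 1.0,
    ("R", "b"): 3.0,
}
s1 = {"L": 0.25, "R": 0.75}       # behavior coefficients of player I
umpire = {"a": 0.5, "b": 0.5}     # umpire's rule at the chance move

E1 = sum(s1[m] * umpire[c] * f for (m, c), f in plays.items())
# E1 is linear in the coefficients s1, with F_1(z) and the umpire's
# probabilities absorbed as constants, as in the assimilation above.
```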
Kuhn's proof [11] that a game with perfect recall has an equilibrium point in behavior strategies applies here, because a game with perfect recall is linear. But we may approach the matter somewhat differently. First, a player i has perfect recall if whenever x < y in P_i,

(1) x and y belong to different information sets, u and v,

(2) every element z of v is preceded by an element w of u, so that

(3) the choice at w in the path of x is the same as the choice at w in the path of y.

Thus the player remembers at y everything he knew at x, and what choice he made at x. Then the partial ordering of moves induces a partial ordering R_i of the information sets of player i; for x < y, x in u, y in v, u R_i v. The possibility z < w with z in v, w in u is excluded, since there would have to be another element a of u such that a < z < w, and then a and w would violate (1).

THEOREM 1.3. If i has perfect recall, T_i(S_i) is convex.
PROOF. The general convex combination p = Σ_k a_k p_k, k = 1, ..., m, where p_k = T_i(s_k), is the image of the behavior strategy s defined as follows: Write s_k = {s^k_{ab}}. For each k, for each information set a in P_i, let w^k_a be the product over the information sets c in P_i preceding a of the coefficients s^k_{cd}, where d is the index of the choice at c in the path of a. For Σ_k a_k w^k_a = 0, s_{ab} is arbitrary; for Σ_k a_k w^k_a > 0,

    s_{ab} = (Σ_k a_k w^k_a s^k_{ab}) / (Σ_k a_k w^k_a).

The formula is easily verified.

A game is said to decompose at the move x when every personal information set which meets the tree of all moves y ≥ x is contained in that tree. A subgame of a finite game Γ has for game tree such a subtree of the game tree of Γ; the remaining apparatus is defined by restriction. If the least move of the subtree is x, the subgame may be designated Γ_x. The treatment of subgames given by Kuhn [11] is fully applicable here. However, we wish to use a simpler concept in place of that of difference game.

DEFINITION. A finite inning I is an ordered pair (G, H), where G is a (finite) game structure, H is an ordered pair (C, F), C is an umpire's rule in G, and F is a function on plays whose values are non-negative vectors or finite innings (in either case) on the players of G. For the present we consider the smallest class satisfying the definition.

DEFINITION. The composition series of a finite game Γ = Γ_0 is a partially ordered set of finite innings defined as follows: The elements I_x are in one-one correspondence with the subgames Γ_x of Γ. For each Γ_x, the game tree of I_x is the set of all moves y in Γ_x such that if x < z < y, then Γ does not decompose at z. The game structure and the umpire's rule of I_x are the restrictions to the game tree of I_x of those objects in Γ. The objective F* of I_x is defined as follows: If z is a play in I_x and a play in Γ, then F*(z) = F(z); otherwise Γ decomposes at z, and then F*(z) = I_z. The composition series is partially ordered by the relation I_x < I_y if and only if x < y.

... each b_{ij} ≥ 0, each Σ_j b_{ij} = 1. We wish to apply Glicksberg's theorem [6] that equilibrium points relative to regular probability measures exist for every game with continuous payoff on a (finite) product of compact Hausdorff spaces. The requirement of regularity assures that the expectations exist; but since the behavior strategies themselves are admitted, such an equilibrium is as stable as could be asked. Now the strategy space of a player is certainly compact in the product topology, in which a sequence of strategies converges if and only if each coordinate sequence (b_{ij}) converges. The Malthusian condition actually assures continuity in this topology; for, given any ε > 0, there is k such that M a^k < ε(1 - a)/2, whence the expectation from moves after the k-th is less than ε/2. There remains the expectation in the finite game obtained by stopping after k moves, a continuous function of strategies, which then varies less than ε/2 in sufficiently small neighborhoods. Hence we have

THEOREM 2.1. A Malthusian game has an equilibrium point.

The Malthusian condition is stronger than necessary. If a finitary game is defined by relaxing the previous definition to the condition of continuity of the payoff in the product topology, there follows the

COROLLARY. A finitary game has an equilibrium point.
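The geometric tail estimate behind the Malthusian continuity argument can be spelled out numerically (the constants M, a, ε below are my own illustrative choices): once M a^k < ε(1 - a)/2, the expectation contributed by the k-th and later moves, at most Σ_{j≥k} M a^j = M a^k / (1 - a), is below ε/2.

```python
# Tail bound behind the Malthusian continuity argument (assumed constants).
M, a, eps = 10.0, 0.9, 0.01

k = 0
while M * a**k >= eps * (1 - a) / 2:   # find the k promised in the text
    k += 1

# Partial geometric sum from the k-th move on (4000 terms is ample here,
# since the omitted remainder is smaller still).
tail = sum(M * a**j for j in range(k, k + 4000))
```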
Kuhn's Lemma, and hence 1.2, carries over to linear finitary games. For this one uses a standard theorem of measure theory: for every sequence {S^l} of probability spaces, there exists a unique measure on the product X S^l which for any finite set of coordinate indices combines the given measures independently. Then for each player k, take a finite space S^l for each of his information sets u_l; let S^l contain one point for each choice at u_l. A behavior strategy {b_{ij}} induces a probability measure on each S^l in a natural way; the product measure naturally induces the required linear strategy.

THEOREM 2.2. A finitary game in which no information set meets any path twice has an equilibrium point in linear strategies (relative to mixed strategies).

A composition series is not necessarily the most articulated form for a general finitary game. One may well be playing the same inning over and over, either with discounted payoff or with the same payoff but a possibility at any time that play may stop.¹ Accordingly we approach the decomposition question in a slightly different way. We thus refrain from defining infinitary innings.
Given a finitary game Γ, let D(Γ) be the partially ordered subset of the game tree of Γ consisting of those moves x at which Γ decomposes. In D(Γ) define an equivalence relation: x equivalent to y if there is a one-one correspondence φ between the game trees of Γ_x and Γ_y preserving all the structure of these games except that there may be a positive real number a ≠ 1 such that φ carries the objective of Γ_x into a times the objective of Γ_y. Define a covering relation in the set C(Γ) of the equivalence classes of D(Γ) by [x] R* [y] if and only if for some representatives x, y, x < y and there is no z in D(Γ) such that x < z < y.

LEMMA. Equivalence in D(Γ) is an equivalence relation. If x and y are equivalent there is a unique a(x, y) = a^{-1}(y, x) such that the objective of Γ_x is a times the objective of Γ_y.² In C(Γ), if [x] R* [y] then there is a unique n such that for every element x of [x], the game tree of Γ_x contains n elements y_1, ..., y_n of [y] such that for no z in D(Γ) is x < z < y_i; and an equivalence between x and x' carries corresponding y_i into each other.

The proof is omitted. We shall use C(Γ) as an indexing set; and for each z in C(Γ), choose any element of z and label it x°(z).

¹ There are two other such possibilities. There may be zero payoff, or there may be no payoff unless play passes irreversibly out of the inning. In connection with the latter case certain non-finitary games can be solved quite satisfactorily [4].

² If the objectives of Γ_x and Γ_y vanish identically, we set a(x, y) = 1.
DEFINITION. The stochastic form of a finitary game Γ is an ordered triple (S, R, 0), where R is a relation in S and 0 is an element of S. The elements I_z of S are in one-one correspondence with the elements z of C(Γ); R is the relation induced by R*, and 0 is I_D, where D is the element [0] of C(Γ) containing the least move 0 in the game tree of Γ. Each I_z is an ordered pair (G, H) = ((K, P, J), (C, F*)). Here K is a partially ordered set isomorphic with the subset of the game tree of Γ_x, where x is any element of z, consisting of those moves y such that if x < w < y, then Γ does not decompose at w. The sets P, J, C, are isomorphic with the corresponding sets in Γ_x, for any x in z. To define F*, choose an isomorphism of K onto K_{x°(z)}, p → p*. If Γ decomposes at p*, then F*(p) is an ordered pair (z', a), where z' ∈ C(Γ) is the equivalence class of p* and a = a(p*, x°(z')). Otherwise, F*(p) = F(p*), where F is the objective of Γ.

The I_z are innings of Γ; a set I is an inning if and only if there exists a finitary game Γ such that I is an inning of Γ. A stationary strategy in a finitary game is a function on its innings whose values are mixed strategies in their game structures.

THEOREM 2.3. A finitary game has an equilibrium point in stationary strategies (relative to mixed strategies).

PROOF. The proof is again by Glicksberg's theorem, with the same topology (the product topology). Stationary strategies correspond to a closed subspace of the space of behavior strategies, and hence there is a relative equilibrium point. Now if player i could gain by changing to a non-stationary strategy, using strategy s in one occurrence of an inning
and s' in another, the two occurrences having objectives F and aF, then either his expectations are of the form t, at, or he does better in one case. At any rate he does not diminish his gain by using the better strategy in both cases, the strategies of the other players remaining fixed. The argument applies successively to all occurrences of all innings and contradicts the assumption of relative equilibrium. There may well be non-stationary equilibria at which any player attempting to change to a stationary strategy will lose.

In this setting one would naturally define a stochastic game as one all of whose innings have finite game trees. Shapley [12] has made a special study of zero-sum two-player stochastic games; and Everett [4] has investigated games of the same sort in which the expected payoff is bounded but does not go to zero as the number of moves increases. (This is only a small part of Everett's results in [4].) In both cases the expectation is a linear fractional function of stationary strategies; in Everett's case the denominator may vanish, and then the numerator also vanishes and the expectation is in fact zero. Generally this is a jump discontinuity in expectation. We observe that given any polynomial resp. rational algebraic payoff function (in appropriate variables, including the unpleasant examples of Glicksberg and Gross [7]), there is a finite resp. stochastic game with this payoff.

The extensive description of the moves in a game is frequently condensed to a representation in terms of an objective function on a Cartesian product of strategy spaces. A strategy space is defined to be a convex subset of sequential Hilbert space, compact in the weak topology. Typically it is a space of moments or measures representing mixed strategies in a finitary game. The objective function is often multilinear; and we can conclude from Glicksberg's fixed point theorem [6] that such a function has an equilibrium point.
We give below a stronger result, which finds application in [8].

Programming games arise naturally in some models of warfare [9]. One has a number of units of different kinds, not expecting to be reinforced, firing at each other. More formally, the system is in a given state and may pass next to any of the states h_i; all h_i have definite values for each player. Each player must choose a program, i.e., an element of a strategy space (for example, a battery of probability measures representing the proportions in which the firepower of his several units is to be distributed). Let r_j be the program of the j-th player, let P(t) be the probability that at time t no transition has taken place, and let Q_i(t) be the probability of transition to h_i by time t. We
assume

(1)  Q_i'(t) = (c_i + Σ_j r_j·a_i^j) P(t),

with constant coefficients c_i, a_i^j. Thus the effects of the programs are supposed to be differentially linear and independent. Assuming the transition probabilities are not all zero, the ratio of the probabilities of transition to h_i and to h_k is independent of time and is given by

(2)  (c_i + Σ_j r_j·a_i^j) / (c_k + Σ_j r_j·a_k^j).
Hence if we let the expressions in (2) be p_i resp. p_k, and the vector values of the states h_i be designated w_i, the vector payoff is

(3)  Σ_i p_i(r_1, ..., r_n) w_i / Σ_i p_i(r_1, ..., r_n).
This is a multilinear fractional function; therefore it has an equilibrium point, as will be shown.

DEFINITION. A programming game is a real n-vector-valued function P on the product of n strategy spaces, S^1, ..., S^n, of the form P(x^1, ..., x^n) = P(x) = (P_1(x), ..., P_n(x)), where

P_i(x) = Σ_α Π_j f_α(j)(x^j) / Σ_β Π_j g_β(j)(x^j),

with α and β running over finite sets, all f_α(j) and g_β(j) continuous linear functions, and the g's strictly positive.

DEFINITION. An equilibrium point in a programming game is an n-tuple of strategies, (x^1, ..., x^n), such that
P_i(x^1, ..., x^n) = max over y^i of P_i(x^1, ..., y^i, ..., x^n).

THEOREM. A programming game has an equilibrium point.

We omit the proof, which parallels that of Theorem 1.1. The appropriate mapping T is given by

T_i(x) = {y^i in S^i | P_i(x^1, ..., y^i, ..., x^n) = max}.
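The two definitions can be illustrated by a small numerical sketch. Everything below is invented for illustration (none of it is from the paper): each payoff is a ratio of multilinear forms with an entrywise positive denominator matrix, as the definition requires, and a naive grid-search best-reply iteration happens to settle down for this coordination-flavored instance. In general only the fixed-point theorem, not such an iteration, guarantees an equilibrium.

```python
import numpy as np

# Invented 2-player programming game on 1-simplices: each payoff is a ratio
# of multilinear forms, P_i(x, y) = (x @ Anum[i] @ y) / (x @ B @ y), with B
# entrywise positive so every denominator is strictly positive on the simplices.
Anum = [np.array([[3.0, 0.0], [0.0, 2.0]]),   # numerator for player 1
        np.array([[2.0, 0.0], [0.0, 3.0]])]   # numerator for player 2
B = np.array([[2.0, 1.0], [1.0, 2.0]])        # shared positive denominator

def payoff(i, x, y):
    return float(x @ Anum[i] @ y) / float(x @ B @ y)

# Grid of mixed strategies (t, 1 - t) on the 1-simplex.
grid = [np.array([t, 1.0 - t]) for t in np.linspace(0.0, 1.0, 201)]

def best_reply(i, other):
    if i == 0:
        return max(grid, key=lambda x: payoff(0, x, other))
    return max(grid, key=lambda y: payoff(1, other, y))

# Sequential best-reply iteration; it need not converge in general, but it
# does for this instance.
x = y = np.array([0.5, 0.5])
for _ in range(25):
    x = best_reply(0, y)
    y = best_reply(1, x)

# Approximate-equilibrium check: no unilateral grid deviation gains anything.
gain0 = max(payoff(0, x2, y) for x2 in grid) - payoff(0, x, y)
gain1 = max(payoff(1, x, y2) for y2 in grid) - payoff(1, x, y)
print(x, y, gain0, gain1)
```

For this instance the iteration settles on the pure pair x = y = (1, 0) and both deviation gains vanish, so the pair is an equilibrium point in the sense of the definition above.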
Instead of Kakutani's theorem, one uses Glicksberg's generalization: A closed point-to-convex-set mapping in a compact convex set has a fixed point [6].

We conclude with a sketch of the general game including ordinary personal moves and programming moves (as well as chance moves). Incidentally, both personal moves and chance moves are special cases of programming moves.

There is no player partition. Each player may be active at all moves. Thus each player must have an information structure over the entire game tree. Each player has a strategy space X_j^i for each of his information sets u_j^i. For each move m, let X_m be the Cartesian product of the strategy spaces X_j^i belonging to information sets u_j^i which contain m. Thus there is one factor X_j^i for each player. Let Y_m be the space of probability distributions over the alternatives at m. There is given a continuous function H_m on X_m to Y_m.³ Finally, there is given an objective function F on the set of all terminal moves; in n-person games, the values of F are real n-vectors, but in two-person zero-sum games, the values are real numbers (representing the payment from the second player to the first).

Summarizing, an n-person extensive programming game consists of

(i) a finite game tree K;
(ii) n information partitions U^i = {u_j^i} over K;
(iii) for each information set u_j^i a compact Hausdorff space X_j^i;
(iv) for each move m a continuous function H_m on X_m to Y_m, as described above;
(v) an objective function F as described above.

³ A point x in X_m represents a complete determination of all the relevant parameters controlled by the players. There ensues a one-step stochastic process whose outcome H_m(x) is a probability distribution over alternatives, rather than a single alternative as in finite or finitary games. One can make m a pure chance move by reducing X_m to a single point. Then one must modify the information partition; we leave the details to the interested reader.
It is clear that specification of a point in each X_j^i determines a stochastic process governed by the probability distributions which are the relevant values of the functions H_m; further, each coordinate of F will have an expected value under this process, and the expectation E(F({x_j^i})) is a continuous function of the family {x_j^i}. This is all that is needed for the existence of an equilibrium point for the function E(F( )), by the general equilibrium point theorem of Glicksberg [6]. We recall the well-known fact that the equilibrium points of a two-person zero-sum game are precisely the pairs of good strategies for the two players.

The content of this definition and proof sketch may be somewhat obscured by the form. We may summarize it as follows. It is well known that, under certain restrictions, the values of the objective of a game may be other games. What we have done here is to point out the principle: The moves of a game may be other games. Regarded from this point of view, our definition above simply imposes a hand-picked list of restrictions which is known to be adequate. This suggests the interesting problem of finding weaker adequate restrictions, which we shall not go into here.

BIBLIOGRAPHY
[1] DALKEY, NORMAN, "Equivalence of information patterns and essentially determinate games," Annals of Mathematics Study 28 (Princeton, 1953), pp. 217-244.

[2] DRESHER, M., KARLIN, S., and SHAPLEY, L., "Polynomial games," Annals of Mathematics Study 24 (Princeton, 1950), pp. 161-180.

[3] DUNNE, J., and OTTER, R., "Games with equilibrium points," Proceedings of the National Academy of Sciences 39 (1953), pp. 310-314.

[4] EVERETT, H., "Recursive games," this Study.

[5] FENCHEL, W., "Krümmung und Windung geschlossener Raumkurven," Mathematische Annalen 101 (1929), pp. 238-252.

[6] GLICKSBERG, I., "A further generalization of the Kakutani fixed point theorem, with application to Nash equilibrium points," Proceedings of the American Mathematical Society 3 (1952), pp. 170-174.

[7] GLICKSBERG, I., and GROSS, O., "Notes on games over the square," Annals of Mathematics Study 28 (Princeton, 1953), pp. 173-182.

[8] ISBELL, J., "Absolute games," to be published.

[9] ISBELL, J., and MARLOW, W., "Attrition games," Naval Research Logistics Quarterly 3 (1956), pp. 71-94.

[10] KAKUTANI, S., "A generalization of Brouwer's fixed point theorem," Duke Mathematical Journal 8 (1941), pp. 457-459.

[11] KUHN, H., "Extensive games and the problem of information," Annals of Mathematics Study 28 (Princeton, 1953), pp. 193-216.

[12] SHAPLEY, L., "Stochastic games," Proceedings of the National Academy of Sciences 39 (1953), pp. 1095-1100.
J. R. Isbell
The George Washington University Logistics Research Project
APPROXIMATION TO BAYES RISK IN REPEATED PLAY

James Hannan¹

SUMMARY

This paper is concerned with the development of a dynamic theory of decision under uncertainty. The results obtained are directly applicable to the development of a dynamic theory of games in which at least one player is, at each stage, fully informed on the joint empirical distribution of the past choices of strategies of the rest. Since the decision problem can be imbedded in a sufficiently unspecified game theoretic model, the paper is written in the language and notation of the general two-person game, in which, however, player I's motivation is completely unspecified.

Sections 2-7 consider a sequence game based on N successive plays of the same m by n game and culminate in Theorem 4, which exhibits a usable sequence-strategy for II, consisting in the use at the (k+1)-st play of a strategy Bayes against the perturbation of I's cumulative past choice by the addition of [3n²/2m]^{1/2} k^{1/2} z, with z chosen at random from the unit m-cube. With |B| denoting the maximum difference within rows of II's inutility matrix, Theorem 4 asserts that the expected inutility incurred by this strategy across N plays, less N times the single-game Bayes inutility against I's empirical distribution of choices within the N plays, is bounded above by [3n²m/2]^{1/2} |B| N^{1/2}, uniformly in N and in I's N choices.

For fixed N, a sequence-strategy which minimizes the maximum of the criterion of Theorem 4 is characterized by a recursive program in Section 4. Except in the trivial case where II has a dominant column, the resulting min-max is bounded below by a non-zero multiple of N^{1/2} (Theorem 2). In a slight generalization of Matching Pennies the solution to the recursive program and the resulting min-max are explicitly exhibited.

Sections 8-9 consider the sequence game when the component game may be non-finite. For the restrictive class of non-finite games where a
Bayes response satisfies a Lipschitz condition of order α > 0, Theorem 5 asserts that the criterion of Theorem 4 is O(Σ_1^N k^{-α}) when II's strategy
consists in the use of the values of this Bayes response at the successive cumulative past choices of I. The game on the square with II's inutility given by squared deviation illustrates the non-vacuity of Theorem 5 for α < 1. For the class of games where I has only a finite number, m, of pure strategies, Theorem 6 asserts, under regularity conditions vacuously satisfied by finite games, the truth of the natural generalization of Theorem 4 with the n of the strategy and bounds replaced by 2.

The appendix considers a semi-dynamic game suggested by an interpretation of problem 13 (ii) in the first of these Contributions. If I uses a fixed randomized strategy x on each play independently, II's expected inutility, less N times the single-game Bayes inutility against x, is O(1) when II's strategy choices are the values of any Bayes response at the successive cumulative past choices of I (Theorem 7).

§1.
INTRODUCTION
A finitary form of the (static) decision problem with numerical utilities may be described briefly as follows: exactly one of n possible decisions is required to be made in a context in which the decision-maker knows only that one of m possible states of nature obtains and, for each decision and each state, the inutility a_ij of decision j when state i obtains. The problem of choosing a decision which will in some sense minimize inutility has been resolved by many principles of solution [12], [13], and [16]. These principles have in each case been suggested and to some extent supported by the additional assumptions associated with some class of realizations. In particular the classical Bayes principle introduces the assumption that the state index i is subject to a known (a priori) probability distribution. Under this assumption, which is considered much too restrictive for many realizations, a decision which minimizes expected inutility is a very satisfactory solution of the problem.

The present paper is concerned with a sequence of N decision problems, which are formally alike except for the fact that the state of nature may vary arbitrarily from problem to problem. Decisions are required to be made successively and it is assumed that they may be allowed to depend on the e.d. (empirical distribution) of the states of nature across the previous problems in the sequence. This total lack of assumptions regarding the behavior of the state sequence is a feature distinguishing the present structure from many considerations of multistage processes
(cf. for example [1], [9]).

In a certain sense the opportunity for minimizing the average inutility of the set of N decisions depends on the e.d. of the N states involved. If this e.d. were known before any decisions had to be made, this knowledge would enable the choice of a decision Bayes with respect to this distribution. The repeated use of such a decision would reduce the average inutility across problems to the minimum expected inutility on a single problem where the Bayes assumption holds and the probability distribution on the state index is the same as the e.d. of the N states in the sequence problem (see (4.10)).

Another hypothetical situation, somewhat more suggestive for the sequence problem, is that in which successive decisions are permitted to depend on the successive e.d. of all states thru their respective presents. The use of decisions Bayes with respect to these distributions reduces average inutility to not more than the Bayes single-problem inutility with respect to the N-state e.d. (see (6.4) ff.).

The most important conclusion of this paper is that the knowledge of the successive e.d. of past states makes it constructively possible to do almost as well at reducing the average inutility across problems as in the case where N and the distribution of the N states are known in advance (or even as in the case of the preceding paragraph). The sequence of decisions exhibited which attain this performance is a sequence of randomized (not necessarily properly) decisions whose expectations are the values of a sequence of smoothed versions of the Bayes response at the successive e.d. of previous states.

The idea of using the Bayes inutility against the N-state e.d. as a goal for the performance of a set of decisions (distinguished by the term compound decision problem and with stochastic information on the N-state e.d. replacing the knowledge of past e.d. in the sequence problem) was enunciated in [14]. The program outlined in [14] for the rigorous investigation of compound decision problems was initiated in [6] and these papers exerted a strong influence on the sequence development.
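The smoothing idea can be made concrete with a small simulation. Everything here is illustrative rather than taken from the paper: the 2x2 inutility matrix, the alternating state sequence, and the plain k^{1/2} perturbation scale are invented (the paper's scale carries the constant [3n²/2m]^{1/2}), but the recipe is the one just described — at each stage play a decision Bayes against the cumulative past state counts perturbed by a random point of the unit cube, then compare total inutility with N times the single-game Bayes inutility against the final e.d.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented 2x2 inutility matrix: rows = states of nature, columns = decisions.
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])
m = A.shape[0]
N = 20000

states = [k % 2 for k in range(N)]   # an arbitrarily chosen state sequence

E = np.zeros(m)                      # cumulative past state counts E^{k-1}
total = 0.0
for k, i in enumerate(states, start=1):
    z = rng.uniform(size=m)          # z drawn uniformly from the unit m-cube
    w = E + np.sqrt(k) * z           # perturbed cumulative past, scale ~ k^{1/2}
    j = int(np.argmin(w @ A))        # decision Bayes against the perturbation
    total += A[i, j]
    E[i] += 1.0

# Benchmark: N times the single-game Bayes inutility against the final e.d.
bayes = N * float(np.min((E / N) @ A))
print(total, bayes, (total - bayes) / np.sqrt(N))
```

The excess of the incurred inutility over the Bayes benchmark stays on the order of N^{1/2}, in line with the N^{1/2} bound of Theorem 4.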
The inadequacy of min-max solutions has long been noted in connection with statistical decision problems, and [7], [8], [11], [14], [18], [19] are particularly relevant. The fact that this inadequacy is always present in certain compound decision problems was noted in [14] and [6]. An example of a compound problem in which the min-max solution fails to be a direct product is exhibited in [17].

Within the theory of games the need for a dynamic theory is documented in [13] and [10]. The main development of the present paper constitutes an approach to the solution of a strong form of problem 13, part
(ii) in [10]. A weak form is considered in an Appendix.
The use of the Bayes response across a sequence of games is coincident with a one-sided version of "the method of fictitious play," [3], [4], and [15]. Theorem 5, applied to a zero-sum game in which the hypotheses of the theorem are satisfied for both players, yields an interesting bound on the rate of convergence of the method.

PART I. FINITE GAMES

§2. THE COMPONENT GAME, G
The main purpose of this section is to introduce the notational framework of a single finite game, later to be used as the generic component of a sequence of games. The single game terminology can also be applied to the normal form of sequence games, and for this purpose some extensions of the concept of "regret" (the loss of [16]) are introduced. These may induce interesting orderings in games with sufficient structure and, in particular, will do so for the sequence games of Section 3.

Let G be a finite two-person general game in which players I and II have, respectively, m and n strategies. Their spaces of randomized strategies will be denoted by X and Y,

(1)  X = {x = (x_1, x_2, ..., x_m) | x_i ≥ 0, Σ_1^m x_i = 1},
     Y = {y = (y_1, ..., y_n) | y_j ≥ 0, Σ_1^n y_j = 1},

and their pure strategies will be represented

(2)  ε = (δ_1, δ_2, ..., δ_m), a basis vector in X,
     η = (δ_1, δ_2, ..., δ_n), a basis vector in Y.

In accord with the dominant decision theoretic orientation, the game will be consistently viewed from the position of player II. Nothing will be assumed about player I's motivation, and the game will be defined only up to player II's inutility, which will be described by a loss matrix A. The elements of A will be denoted by a_ij (or A_i^j), the rows by A_i and the columns by A^j,

(3)  A_i = [a_i1, ..., a_in],   A = [a_11 ... a_1n; ... ; a_m1 a_m2 ... a_mn].
To exclude trivial games, it will be assumed that A has no dominant column,

(4)  max_ε [εA^j - min_r εA^r] > 0  for each j,

and, to avoid notation distinguishing a subset of non-weakly-dominated columns, that A has no dominated or duplicated column,

(5)  εA^j ≥ εAy for all ε only if y_j = 1.

The expectation of the loss when II uses a randomized strategy y will be called the risk,

R(ε, y) = E_y(εAη) = εAy.
For given y, the risk function is representable as the vector s = Ay, and the mapping from Y to S = AY furnishes a convenient canonical form for G. From the point of view of risk, G is identical with the game in which II's pure strategies are m-vectors in the set of columns of A,

(6)  σ = (σ_1, ..., σ_m) in {A^1, A^2, ..., A^n},

II's randomized strategies are m-vectors in the convex hull of the columns of A,

(7)  s = (s_1, ..., s_m) in S = {Ay | y in Y},

and the risk of s in S is given by the scalar product

(8)  R(ε, s) = ε E_y σ = εs.

For each x in X the minimum of the expectation of risk, min_s xs, is attained for s in the convex hull of the set of minimizing σ. This minimum will be called the Bayes risk against x and any minimizing s will be called a Bayes strategy against x. Considered as functions, φ and s, on X, φ will be termed the Bayes envelope, s a Bayes response. It is convenient to extend their definition to the whole of m-space by

(9)  φ(w) = min_s ws = min_σ wσ,
     s = any function to S such that ws(w) = φ(w) for each w.
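The envelope φ and a response s of (9) are directly computable; the following sketch (with an invented 3x2 loss matrix) simply evaluates the definitions.

```python
import numpy as np

# Invented 3x2 loss matrix A; rows are I's pure strategies, columns are II's.
A = np.array([[0.0, 2.0],
              [1.0, 1.0],
              [3.0, 0.0]])

def phi(w):
    # Bayes envelope: phi(w) = min_j w . A^j, concave and piecewise linear.
    return float(np.min(w @ A))

def bayes_response(w):
    # A Bayes response: a column attaining the minimum (first index on ties).
    return A[:, int(np.argmin(w @ A))]

w = np.array([0.3, 0.3, 0.4])
s = bayes_response(w)
print(phi(w), float(w @ s))   # equal, by the defining property w.s(w) = phi(w)
```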
Some continuity properties of the Bayes envelope and the risks of Bayes strategies are immediate consequences of their definitions. These properties are little utilized in static game theory and their importance here stems from their direct use and analogical value in connection with the sequence games to be introduced in Section 3. The Bayes envelope is known to be concave and continuous in more general games ([2] Theorem 2.27). Here φ, as the minimum of a finite class of linear functions, is concave and piecewise linear, and

(10)  (w - w')s(w) ≤ φ(w) - φ(w') ≤ (w - w')s(w').

Although any Bayes response will be discontinuous at every point of possible ambiguity, each Bayes response possesses a local weak continuity at each w,

(11)  ws(w') - φ(w) = w[s(w') - s(w)] ≤ (w - w')[s(w') - s(w)].
Introducing the several norms for m-vectors,

(12)  ||v|| = Σ_1^m |v_i|,   |v| = max_i |v_i|,

and the uniform bound on the variation of any Bayes response,

(13)  |B| = sup_{w,w'} |s(w') - s(w)| = max_σ max_ε [εσ - φ(ε)] = max_i max_j [A_i^j - min_r A_i^r],

the inequality (11) may conveniently be weakened to

(14)  ws(w') - φ(w) ≤ ||w - w'|| |s(w') - s(w)| ≤ ||w - w'|| |B|.
Letting W_j = {w | wA^j = φ(w)}, φ is linear on each W_j and each W_j is convex and closed. Each W_j not containing a given w has a positive distance from it and, letting d(w) be the minimum of these distances (or +∞ if not otherwise defined), it follows that each σ which is Bayes with respect to a w' in the open neighborhood

Σ_1^m (w_i - w_i')² < d²(w)

(and hence also each s(w')) is necessarily Bayes with respect to w,

(15)  ws(w') - φ(w) > 0  only if  Σ_1^m (w_i - w_i')² ≥ d²(w).

The bound (11) is thus improved to zero for all ||w - w'|| sufficiently small, quite non-uniformly in w.
In zero-sum games against an intelligent opponent, attention has been concentrated on the ordering of II's strategies induced by their maximum risk. Letting R(s) denote max_ε εs, the value of G is given by

R = min_s R(s)

and, by the min-max theorem,

(16)  R = max_x min_σ xσ = max_x φ(x).

In games against Nature, the Bayes envelope is considered a worthy defensive goal for II since it is usually felt that I's move is in no way influenced by II's choice of strategy. As a consequence, a strategy s is evaluated for each x in X in terms of the "regret," the additional expected risk above φ(x) which it incurs,

(17)  D(x, s) = xs - φ(x).

D(x, s) is clearly non-negative and its maximum with respect to x is frequently used to establish a complete ordering on S. It follows from the concavity of φ that Σ_i x_i [ε_i s - φ(ε_i)] ≥ xs - φ(x) and hence that equality holds in

max_ε D(ε, s) ≤ max_x D(x, s).

Because of the assumption (4),

D(s) = max_x D(x, s) = max_ε D(ε, s) = max_ε [εs - φ(ε)] > 0,

(18)  D = min_s D(s) > 0,

and, since

max_ε [εs - φ(ε)] = max_x Σ_i x_i [ε_i s - φ(ε_i)] = max_x [xs - Σ_i x_i φ(ε_i)],

the min-max theorem yields the representation,

(19)  D = min_s max_x [xs - Σ_1^m x_i φ(ε_i)] ...

... for each k, in G^∞. Such selection is implicit in all results involving strong strategies. Non-randomized sequence strategies for II will be denoted by
s(ε) = X_1^∞ σ^k(ε^{k-1}) and their risk given by the natural extension of (2). Because of the linearity of the loss in the component moves, the sequence risks attainable by arbitrary randomization are attained by the class of randomized sequence strategies induced by product p-measures on the possible ε. Letting

(3)  s^k(ε^{k-1}) = A y^k(ε^{k-1}),

such strategies will be denoted by s(ε) = X_1^N s^k(ε^{k-1}) and their risk by

(4)  R_N(ε, s) = Σ_1^N ε^k s^k(ε^{k-1}).
The orderings induced by maximum risk and maximum regret in G^N will be considered briefly. Letting x = X_1^N x^k be a product p-measure on the ε,

(5)  E_x [ε^k s^k(ε^{k-1})] = E_x [x^k s^k(ε^{k-1})] ≥ φ(x^k),

and from this it follows that

(6)  max_ε R_N(ε, s) ≥ max_x Σ_1^N φ(x^k) = N max_x φ(x) = N R.
The lower bound in (6) will be attained and s(ε) will minimax risk in G^N if s^k(ε^{k-1}) minimaxes risk in G for each ε and each k ≤ N. That this condition is also necessary, if II has only a constant risk minimax s in G, follows from the recursive characterization: s(ε) minimaxes risk in G^N if and only if, for each ε,

(7)  max_{ε^k} Σ_1^k ε^j s^j(ε^{j-1}) ≤ kR,   k = 1, 2, ..., N.
This prospect of a constant risk minimax s(ε) in G^N makes the maximum risk ordering quite unattractive. Alternatively, if II has non-constant risk minimax s in G, it follows from (7) that II has multiple minimax s(ε) in G^N and additional principles will be required to discriminate among these.
A similar treatment of regret in the sequence game G^N is possible. Expressing regret by

(8)  Σ_1^N [ε^k s^k(ε^{k-1}) - φ(ε^k)],

it follows from (2.17) that

max_ε Σ_1^N D(ε^k, s^k(ε^{k-1})) = max_ε [Σ_1^N ε^k s^k(ε^{k-1}) - Σ_1^N φ(ε^k)] ≥ max_x [Σ_1^N x^k E_x s^k(ε^{k-1}) - Σ_1^N Σ_i x_i^k φ(ε_i)] ≥ N D.
The conditions under which this lower bound will be attained, the recursive characterization of minimax regret sequence strategies, and the reasons for dissatisfaction with the maximum regret ordering exactly parallel their risk counterparts.

A classification of II's strategies will be based on their degree of dependence on I's prior moves. Those in which this dependence is unrestricted will be called recursive strategies. The subclass in which s^k is a function only of the empirical distribution of I's prior moves,

(9)  E^{k-1} = Σ_1^{k-1} ε^j,   k = 1, 2, ...,

will be called symmetric and denoted by s(E) = X_1^∞ s^k(E^{k-1}). For introductory investigations and comparisons, the class of strategies which are not properly recursive will be distinguished. Such strategies could be used by II in the absence of any information about I's past moves,
will be called product strategies and represented by X_1^N s^k. Product strategies in which all components are identical will be called power strategies and denoted by the common component (or by [s]^N). For power strategies, the risk (4) reduces to a linear function of E^N,

(10)  R_N(ε, s) = Σ_1^N ε^k s = E^N s ≥ φ(E^N),

with equality if and only if s is Bayes against E^N. Thus even if II were restricted to power strategies, advance knowledge of E^N would enable II to attain an average risk across the N games which is equal to the Bayes risk in a single game where I uses the randomized strategy E^N/N. This suggests considering the regret associated with the ignorance of the partition function E^N, in the light of the possibility of using power strategies,

(11)  D_N(ε, s) = Σ_1^N ε^k s^k(ε^{k-1}) - φ(E^N).
It should be noted of this modification of regret and, more generally, of any modification involving the partition function E^N, that

ψ(E^N) = inf_{s in T} max_{ε | E^N} Σ_1^N ε^k s^k(ε^{k-1})

differs from

inf_{s in T} max_{ε | E^N} Σ_1^N [ε^k s^k(ε^{k-1}) - φ(ε^k)]

by Σ_1^N φ(ε^k) = Σ_1^m E_i^N φ(ε_i), a function of E^N alone; and hence that in dealing with any

Σ_1^N ε^k s^k(ε^{k-1}) - ψ(E^N)

it may, without loss of generality, and will, without further comment, henceforth be assumed that

(12)  φ(ε) = min_j εA^j = 0  for each ε.
In the rest of the paper the modified regret (11) will be used almost exclusively. It has the advantages of simplicity and single game interpretability over the modification based on the envelope risk function of recursive strategies,

(13)  φ_N(E^N) = min_s max_{ε | E^N} Σ_1^N ε^k s^k(ε^{k-1}).
In order to relate the results of later sections to those obtainable in connection with the latter modification, it is of interest to prove

THEOREM 1.  φ_N(E^N) ≥ φ(E^N) - (m - 1)^{1/2} N^{1/2} |B|.
PROOF. Letting Q be a p-measure on the N! permutations of ε^N, the min-max theorem and the use of conditional expectation yield

(14)  φ_N(E^N) = max_Q min_s E_Q Σ_1^N ε^k s^k(ε^{k-1}) = max_Q Σ_1^N E_Q φ(E_Q(ε^k | ε^{k-1})).

If P is uniform on the N! permutations,

E_P(ε^k | ε^{k-1}) = (E^N - E^{k-1})/(N - k + 1),

which has the same distribution as E^{N-k+1}/(N - k + 1), and (14) yields the lower bound,

(15)  φ_N(E^N) ≥ Σ_{r=1}^N E_P φ(E^r/r).

For r = 1, 2, ..., N - 1 it follows from (2.10) and Σ_1^m E_i^N/N = Σ_1^m E_i^r/r = 1 that

(16)  φ(E^r/r) ≥ φ(E^N/N) - (E^N/N - E^r/r) s(E^r/r).

It can easily be verified (and is noted in [5] pp. 182-3) that

E_P (E_i^r/r - E_i^N/N)² = (E_i^N/N)(1 - E_i^N/N)(N - r)/(r(N - 1)),

and from this and two applications of the Schwarz inequality

(17)  E_P |(E^N/N - E^r/r) s(E^r/r)| ≤ (|B|/2) Σ_1^m E_P |E_i^N/N - E_i^r/r| ≤ (|B|/2) [m Σ_1^m E_P (E_i^N/N - E_i^r/r)²]^{1/2} ≤ (r^{-1/2}/2)(m - 1)^{1/2} |B|.

From (16) and (17),

Σ_1^N E_P φ(E^r/r) ≥ φ(E^N) - [Σ_1^{N-1} r^{-1/2}/2] (m - 1)^{1/2} |B| ≥ φ(E^N) - (m - 1)^{1/2} N^{1/2} |B|,

and the proof of the theorem is complete.
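The sampling-without-replacement variance identity invoked in the proof can be spot-checked by simulation; the population size, the counts, and r below are arbitrary choices made for the check.

```python
import random

random.seed(1)

# Population: state i occurs E_i^N = 15 times among N = 40 states.
N, r = 40, 10
population = [1] * 15 + [0] * 25
p = 15 / N

trials = 100000
acc = 0.0
for _ in range(trials):
    # random.sample gives the first r entries of a uniform random permutation
    sample = random.sample(population, r)
    acc += (sum(sample) / r - p) ** 2
mc = acc / trials

# The identity: E_P (E_i^r/r - E_i^N/N)^2 = p(1 - p)(N - r)/(r(N - 1)).
exact = p * (1 - p) * (N - r) / (r * (N - 1))
print(mc, exact)
```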
§4. MINIMAX MODIFIED REGRET ORDERINGS IN G^N

The results of later sections afford indirect proofs of theorems on the efficacy of recursive strategies optimal in the weak (or in a certain natural strong) ordering based on the modified regret (3.11). Exact characterizations are obtainable in the form of recursive programs and, in a slight generalization of Matching Pennies, explicit solutions are exhibited.

Some obvious properties of modified regret may be noted. Consideration of the m power moves for I shows that the maximum is non-negative for each recursive s,

(1)  D_N(s) = max_ε [Σ_1^N ε^k s^k(ε^{k-1}) - φ(E^N)] ≥ 0.
(It will later be seen that N^{-1/2} D_N(s) is bounded away from zero.) The same consideration shows that the relative minimum of D_N(s), over the class of II's product strategies, is of the order of magnitude of N,

(2)  max_ε [Σ_1^N ε s^k - φ(Nε)] = N D(Σ_1^N s^k / N) ≥ N D.
It is a most important conclusion of this paper that recursive s do not necessarily suffer from the weakness (2). Letting D_N denote the minimum of D_N(s) over the class of II's recursive strategies,

(3)  D_N = min_s D_N(s) = min_s max_ε [Σ_1^N ε^k s^k(ε^{k-1}) - φ(E^N)],

a byproduct of the proof of Theorem 4 consists in the exhibition of a weak symmetric strategy, s(E), such that D_N(s) ≤ (3n²m/2)^{1/2} N^{1/2} |B|. A similar byproduct of Theorem 6 uniformly improves on this result and implies that

(4)  D_N ≤ (2m)^{1/2} N^{1/2} |B|.
The problems of the exact determination of D_N and a minimizing s have, in principle, a simple recursive solution. It follows from the definition of a recursive s that D_N has the representation

(5)  D_N = min_{s^1} max_{ε^1} [ε^1 s^1 + min_{s^2} max_{ε^2} [ε^2 s^2 + ... + min_{s^N} max_{ε^N} [ε^N s^N - φ(E^N)] ... ]],

and hence that s minimizes D_N(s) if, for each ε^{r-1} and each r = N, N - 1, ..., 1, s^r(ε^{r-1}) minimizes

(6)  max_{ε^r} [ε^r s^r + min_{s^{r+1}} max_{ε^{r+1}} [ε^{r+1} s^{r+1} + ... + min_{s^N} max_{ε^N} [ε^N s^N - φ(E^N)] ... ]].
Denoting the minimum with respect to s^r of the expression (6) by V^N(E^{r-1}), the s^r(ε^{r-1}) minimizing (6) may be taken to be the symmetric s^r, minimax in the auxiliary game

(7)  εs + V^N(E^{r-1} + ε).

Moreover,

V^N(E^{r-1}) = min_{s^r} max_{ε^r} [ε^r s^r + V^N(E^r)]

and the min-max theorem yields

(8)  V^N(E^{r-1}) = max_x [φ(x) + E_x V^N(E^r)] = F^r V^N(E^r) = F^r F^{r+1} ... F^N(- φ(E^N)),

with F an abbreviation for the operator which it replaces, and hence the representation,

(9)  D_N = V^N(0) = F^1 F^2 ... F^N(- φ(E^N)),
from which follow the useful lower bounds,

(10)  D_N ≥ max_{x = X x^k} [Σ_1^N φ(x^k) - E_x φ(E^N)] ≥ max_x [N φ(x) - E_{[x]^N} φ(E^N)].
These bounds could have been obtained more directly from

(11)  max_ε D_N(ε, s) ≥ E_x D_N(ε, s) = Σ_1^N x^k E_x s^k(ε^{k-1}) - E_x φ(E^N).
(is)
h2 . i r ' £
! # ( . - » ' )| LXo J
- Z 1
(«! -
*o1 >
THEOREM 2 . If A satisfies ( 2 . k ) there exists satisfying (1 2 ) and for any such xQ
>N0 (x ) 0
Z
i\r 0 (EN ) > (2 * r l'/2hN1 / 2 [xlN
xQ
.
PROOF. For any such x_0,

N φ(x_0) - φ(E^N) ≥ max [(Nx_0 - E^N)σ, (Nx_0 - E^N)σ']

and, representing this maximum by the average of the sum and the absolute difference (and noting that x_0σ = x_0σ'),

(13)  N φ(x_0) - φ(E^N) ≥ (Nx_0 - E^N)(σ + σ')/2 + |E^N(σ - σ')|/2.

Because of the assumption that h is positive, the random variable E^N(σ - σ') is asymptotically normal with mean 0 and variance Nh², and the absolute moment of the standardized variable approaches that of the standard normal, which is (2/π)^{1/2}. Thus

(14)  E_{[x_0]^N} |E^N(σ - σ')|/2 ~ (2π)^{-1/2} h N^{1/2},

and the bound of the theorem follows from (10), (13) and (14).
Theorems 4 and 6 exhibit a strong recursive s* for which

(15)  D_N(ε, s*) ≤ N^{1/2} (6m)^{1/2} |B|.
Theorem 2 and this result suggest ordering strong strategies by

(16)  U(s) = sup_N N^{-1/2} D_N(s) = sup_N max_ε N^{-1/2} [Σ_1^N ε^k s^k(ε^{k-1}) - φ(E^N)],

since they insure that

(17)  h (2π)^{-1/2} ≤ inf_s U(s) ≤ (6m)^{1/2} |B|.
No program for strong strategies optimal in the ordering (16) has been found and, after concluding the present section with an example in which weakly optimal strategies can be obtained explicitly, much of the rest of the paper is devoted to a comparatively simple class of usable strong strategies.

EXAMPLE 1. MATCHING m-SIDED PENNIES. Let A be the m by m diagonal matrix

    A = diag(1/p_1, 1/p_2, 1/p_3, ..., 1/p_m) ,

so that the Bayes envelope is φ(w) = min_i w_i/p_i.
Letting ε(i) denote the i-th unit vector, the i-th row of the auxiliary game at the last stage is y_i/p_i − φ(E^{N−1} + ε(i)). It follows easily from the concavity of φ in general and the particulars of the example that

(1)    0 ≤ Σ_i p_i φ(E^{N−1} + ε(i)) − φ(E^{N−1}) ≤ φ(E^{N−1} + p) − φ(E^{N−1}) = 1 .

Consequently max_i [y_i/p_i − φ(E^{N−1} + ε(i))] is minimized uniquely by

(2)    y_i = p_i [1 + φ(E^{N−1} + ε(i)) − Σ_j p_j φ(E^{N−1} + ε(j))] ,

with the minimum being the constant risk of y in the auxiliary game, 1 − Σ_j p_j φ(E^{N−1} + ε(j)). A similar argument applies at each stage, including the induction stage, and yields at stage r that the i-th row of (4.7) is expressible as
(3)    y_i/p_i − E_{[p]^{N−r}} φ(E^{r−1} + ε(i) + Σ_{r+1}^N ε_k) ,

that its maximum with respect to i is minimized uniquely by

(4)    y_i^r = p_i E_{[p]^{N−r}} [1 + φ(E^{r−1} + ε(i) + Σ_{r+1}^N ε_k) − Σ_j p_j φ(E^{r−1} + ε(j) + Σ_{r+1}^N ε_k)] ,

and hence that the minimum is the constant risk of y in the auxiliary game,

(5)    V_N(E^{r−1}) = N − r + 1 − E_{[p]^{N−r+1}} φ(E^{r−1} + Σ_r^N ε_k) .

It may be noted that the y^r(E^{r−1}) depend also on N, and hence define a sequence strategy only in G_N; a more complete notation would make this explicit.
In view of the apparent complexity of the strategy (4) it is of some interest to note that it could be constructively attained by the following compound randomization: choose ξ with the distribution induced on the values of Σ_{r+1}^N ε_k by the product p-measure [p]^{N−r}, and for each fixed ξ use the strategy y_{|ξ} with

(6)    y_{i|ξ} = p_i [1 + φ(E^{r−1} + ε(i) + ξ) − Σ_j p_j φ(E^{r−1} + ε(j) + ξ)] .
Since φ(E^{r−1} + ε(i) + ξ) = φ(E^{r−1} + ξ) unless y(E^{r−1} + ξ) determines a unique column j and i = j, it follows from (6) that y_{|ξ} = p when y(E^{r−1} + ξ) is not unique, assigns probability 1 to the column j associated with y(E^{r−1} + ξ) when that column is sufficiently pre-eminent, and is in general an interpolate between p and the j-th column, defined by

(7)    y_{i|ξ} = p_i [1 − p_j Δ]            i ≠ j
       y_{i|ξ} = p_i [1 + (1 − p_j) Δ]      i = j

with

    Δ = φ(E^{r−1} + ε(j) + ξ) − φ(E^{r−1} + ξ) ≥ 0 .
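The agreement between formulas (6) and (7), and the fact that the interpolate remains a probability vector, can be checked directly with φ(w) = min_i w_i/p_i as in this example. A small sketch (the particular vectors p and w below are arbitrary illustrative choices):

```python
def phi(w, p):
    # the example's Bayes envelope: phi(w) = min_i w_i / p_i
    return min(wi / pi for wi, pi in zip(w, p))

def bayes_row(w, p):
    # formula (6): y_i = p_i * [1 + phi(w + e(i)) - sum_j p_j * phi(w + e(j))]
    m = len(p)
    bumped = [phi([w[k] + (1 if k == i else 0) for k in range(m)], p)
              for i in range(m)]
    avg = sum(pj * bj for pj, bj in zip(p, bumped))
    return [pi * (1 + bi - avg) for pi, bi in zip(p, bumped)]

p = (0.2, 0.3, 0.5)
w = (3.0, 4.6, 10.0)          # column 0 uniquely minimizes w_i / p_i here
y = bayes_row(w, p)
j = min(range(3), key=lambda i: w[i] / p[i])
delta = phi([w[k] + (1 if k == j else 0) for k in range(3)], p) - phi(w, p)
print(y, j, delta)
```

With these numbers Δ is small enough that y is a genuine interpolate between p and the j-th column rather than the degenerate pre-eminent case.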
Taking r equal to 1 in (5) it follows that

(8)    D_N = N − E_{[p]^N} φ(Σ_1^N ε_k) ,
and hence that, in this example, equality holds in (4.10). Since [p]^N assigns positive measure to each ε^N and max_ε D_N(ε, s) ≥ D_N for every s, the sequence strategy y(E^{r−1}) defined by (4) has the modified regret D_N for all ε^N, a fact which could also have been inferred from the behavior of the y^r(E^{r−1}) in the auxiliary games. The general bounds (4.4) on D_N and the class of lower bounds of Theorem 2 have appropriate specializations which assert (assuming 1/p_1 ≠ 1/p_m) that

(9)    0 < N^{−1/2} [N − E_{[p]^N} φ(E^N)] ≤ [2m]^{1/2} .
These bounds could be much improved but serve to illustrate the order of magnitude of the asymptotic evaluation.

The program for finding the V_N(E^{r−1}) envisaged in (4.8) could be carried out directly as follows. Consider the maximization in F_N in two stages, over x_N such that φ(x_N) = c, then over 0 ≤ c ≤ 1. For the former, each x_{Nj} ≥ c p_j and

(10)    φ(x_N) − E_{x_N} φ(E^{N−1} + ε_N) ≤ c − [c p_j φ(E^{N−1} + ε(j)) + (1 − c p_j) φ(E^{N−1})] ,

where j is the index of a column attaining φ(E^{N−1}). Equality holds in (10) if x_{Nj} = c p_j, and since the coefficient of c on the right is nonnegative, the latter maximization is accomplished at c = 1, yielding

(11)    F_N(−φ(E^N)) = 1 + E_p(−φ(E^{N−1} + ε_N)) .

Because the maximizations in the operators are attained for x not depending on the values of the other x's, an easy iteration is possible and yields the result (5).
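The exact evaluation D_N = N − E φ(Σ ε_k) can be computed by enumerating the distribution of the frequency count. A sketch for the fair two-sided case (an illustration; `phi` is the example's envelope min_i w_i/p_i, and the choice p = (1/2, 1/2) is arbitrary):

```python
from math import comb

def D(N, p=(0.5, 0.5)):
    # D_N = N - E phi(E^N), where the count of side 1 is Binomial(N, p[0])
    total = 0.0
    for k in range(N + 1):
        prob = comb(N, k) * p[0] ** k * p[1] ** (N - k)
        phi = min(k / p[0], (N - k) / p[1])   # phi(w) = min_i w_i / p_i
        total += prob * phi
    return N - total

print(D(2), D(100))
```

For N = 2 the three outcomes give E φ = 0·(1/4) + 2·(1/2) + 0·(1/4) = 1, so D_2 = 1 exactly, and D_N grows like a constant times N^{1/2}, consistent with the two-sided bounds above.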
§5. THE H, μ CLASS OF RECURSIVE RESPONSES

If s is a Bayes response (2.9), the recursive strategy for G_N with s_k(ε^{k−1}) = s(E^{k−1}) for k = 1, 2, ... has some interesting aspects. It is the strategy usually attributed to II in the "method of fictitious play" ([3]; [15]) evaluation of a zero-sum G. It was noted in [15] that alternative use of s(E^k) (which exceeds the concept of a recursive strategy) improves the convergence rate of the evaluation, and it will be seen in Section 6 that

    Σ_1^N ε_k s(E^k) ≤ φ(E^N) .
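The "method of fictitious play" referred to here is easy to sketch: at every round each side plays a pure best response to the opponent's accumulated empirical play, and the induced bounds evaluate the matrix game. A minimal sketch (the 2×2 matching-pennies matrix and the tie-breaking rule are illustrative choices, not from the paper):

```python
# Fictitious play for a zero-sum matrix game: the row player maximizes and the
# column player minimizes the payoff x A y.
A = [[1, -1], [-1, 1]]                     # matching pennies; the value is 0
m, n = len(A), len(A[0])
row_counts, col_counts = [0] * m, [0] * n
T = 5000
for t in range(T):
    # each side best-responds to the opponent's empirical frequencies so far
    i = max(range(m), key=lambda r: sum(A[r][c] * col_counts[c] for c in range(n)))
    j = min(range(n), key=lambda c: sum(A[r][c] * row_counts[r] for r in range(m)))
    row_counts[i] += 1
    col_counts[j] += 1
# Robinson's theorem: these two bounds close in on the value of the game
upper = max(sum(A[r][c] * col_counts[c] for c in range(n)) for r in range(m)) / T
lower = min(sum(A[r][c] * row_counts[r] for r in range(m)) for c in range(n)) / T
print(lower, upper)
```

The empirical mixtures sandwich the value between `lower` and `upper`, and the gap shrinks as T grows.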
If s were uniformly continuous on X it would be possible to conclude from this that

    Σ_1^N ε_k s(E^{k−1}) ≤ φ(E^N) + o(N)

uniformly in ε. Unfortunately s is necessarily discontinuous and the conclusion is easily shown to be false for any A satisfying (2.4). (It will be seen in Section 8 that in certain non-finite games neither of these objections will continue to apply.) The present section defines and examines a class of sequences of responses, s*^k, which induce responses uniformly continuous on X for each k, weakly monotone in k for each x, and, under appropriate conditions on H (in addition to (1)), converging weakly in k to s for each x, with first k-differences converging to zero in norm, uniformly in x.

Let H^k, k = 0, 1, ..., be a non-decreasing sequence of positive numbers, 0 < H^0 ≤ H^1 ≤ ..., such that

(1)    h^k = H^k/k ,    k = 1, 2, ... ,    is a non-increasing sequence, h^1 ≥ h^2 ≥ ... .
Introduce the abbreviation A^{qr} = A^q − A^r and let μ be any p-measure on the unit m-cube, Z = [z = (z_1, ..., z_m) | 0 ≤ z_i ≤ 1], such that, for each q ≠ r and each t_1 < t_2, the distribution function of zA^{qr} satisfies the Lipschitz condition,

(2)    μ[z | t_1 ≤ zA^{qr}/|A^{qr}| ≤ t_2] ≤ L(t_2 − t_1) .

Letting σ be a Bayes response (2.9) which is pure-strategy valued and such that σ(cw) = σ(w) for each c > 0, define for each k = 1, 2, ...,
(3)    s*^k(w) = E_μ σ(w + H^{k−1} z)    on Σ_1^m w_i = k − 1 .

It may be noted that s*^k is independent of the ambiguity in the choice of σ since σ is ambiguous at w + H^{k−1}z only if (w + H^{k−1}z)A^{qr} = 0 for some q ≠ r, and for each fixed w, H^{k−1}, q, r, the set of z satisfying this equation has μ-measure zero by (2). Since the range of σ is the columns of A, s*^k may be represented somewhat more explicitly,

(4)    s*^k(w) = Σ_j μ[z | σ(w + H^{k−1}z) = A^j] A^j    on Σ_1^m w_i = k − 1 .
For the comparison of different terms of the sequence (3) it is convenient to consider the sequence of responses induced on the common domain X,

(5)    s*^k[x] = s*^k((k − 1)x) = E_μ σ((k − 1)x + H^{k−1}z) ,    which is E_μ σ(z) if k = 1 and E_μ σ(x + h^{k−1}z) if k > 1 .

Letting Δ = σ((k − 1)x + H^{k−1}z) − σ(kx + H^k z), it follows from

    H^{k−1}[kx + H^k z]Δ ≥ 0 ≥ H^k[(k − 1)x + H^{k−1}z]Δ

that [kH^{k−1} − (k − 1)H^k] xΔ ≥ 0 for each z, and hence it is easily verified that

(6)    x s*^k[x] ≥ x s*^{k+1}[x] .
It further follows from (2.11) that x σ(x + h^{k−1}z) − x s(x) ≤ m h^{k−1} |B| and from (2.15) that this difference is zero if h^{k−1} ||z|| < d(x). From the latter,

(7)    x s*^k[x] = φ(x)    if h^{k−1} ≤ m^{−1/2} d(x) .
A constructive proof of the continuity of s*^k[x] is obtainable from the definition of the σ-function and the assumption (2) on μ. Because of the homogeneity of the σ-function it will be convenient to obtain this as one of the byproducts of a lemma which will furnish a main tool in the proof of Theorem 3 in Section 6.

LEMMA 1. If w and w' are m-vectors, σ is a pure-strategy valued, positive-homogeneous Bayes response and μ satisfies (2), then

    |E_μ σ(w + z) − E_μ σ(w' + z)| ≤ L m(m − 1) |B| ||w' − w|| .

PROOF. Letting T_{jk} = [z | σ(w + z) = A^j, σ(w' + z) = A^k], the difference E_μ σ_i(w + z) − E_μ σ_i(w' + z) is expressible as a sum of integrals over the sets T_{jk} with j ≠ k.
    − H^k z(σ^k − σ^{k+1}) ,

whence by summation by parts on the right and the use of the bounds H^N zσ^{N+1} ≥ 0, zσ^k ≤ ||z|| |B|,

(8)    Σ_1^N (H^k − H^{k−1}) zσ^k ≥ − H^N ||z|| |B| .

From (6), (7) and (8) the lower bound of the theorem follows upon taking expectation with respect to μ. From

    E^{k−1}(σ^k − σ^{k+1}) ≤ − H^{k−1} z(σ^k − σ^{k+1}) ,    k = 2, ..., N ,

it follows by summation by parts on the right that

(9)    Σ_2^N E^{k−1}(σ^k − σ^{k+1}) ≤ H^N zσ^{N+1} − H^1 zσ^2 − Σ_2^N (H^k − H^{k−1}) zσ^{k+1} ,

and from (9) and the upper bound (7) that

(10)    E^N σ^{N+1} − φ(E^N) + Σ_2^N E^{k−1}(σ^k − σ^{k+1}) ≤ H^N zσ^{N+1} − H^1 zσ^2 − Σ_2^N (H^k − H^{k−1}) zσ^{k+1} ≤ H^N ||z|| |B| ,

whence the first part of the upper bound of the theorem follows by expectation with respect to μ.

The term

    Σ_1^N ε_k(σ^k − σ^{k+1})
is alone in that there is no interesting upper bound for the integrand which is uniform in ε for each z. The expectation of the summands, ε_k[s*^k(E^{k−1}) − s*^{k+1}(E^k)], could however be partitioned into a continuity term and a difference term which could then be bounded by the use of (5.11) and (5.12). Any such bound can be slightly improved by a direct application of Lemma 1 with w = E^{k−1}/H^{k−1} and w' = E^k/H^k. With this identification, w_i − w'_i is by (5.1) non-negative if ε_{ki} = 0 and non-positive if ε_{ki} = 1,

(11)    ||w' − w|| ≤ (k − 1)[1/H^{k−1} − 1/H^k] + 1/H^k ,

and the conclusion of the lemma is that

(12)    E_μ ε_k[s*^k(E^{k−1}) − s*^{k+1}(E^k)] ≤ L m(m − 1) |B| [(k − 1)(1/H^{k−1} − 1/H^k) + 1/H^k] .

The second part of the upper bound of the theorem follows from (12) by summation with respect to k, and the proof of the theorem is now complete.

§7. CHOICE OF H, μ IN s*(E)
For any H, μ satisfying (5.1) and (5.2) abbreviate the upper bound of Theorem 3 by

(1)    U_N(H, μ) = H^N e + 2L|B|^2 Σ_1^N 1/H^k .

For fixed μ it will be shown that U_N and sup_N N^{−1/2} U_N admit simple minimizing H in the respective classes of weak and strong H. The resulting minimums are increasing functions of Le, and the minimum of the sup of this product over all A satisfying (2.4) is attained when μ is uniform on Z. This choice of μ and the associated strong H for which sup_N N^{−1/2} U_N is minimal yield Theorem 4 as a corollary to Theorem 3.

By minimizing successively with respect to H^1, H^2, ..., H^{N−1} within the constraints (5.1) it follows that

(2)    Σ_1^N 2/H^k ≥ 2N/H^N ,    with equality iff H^1 = H^2 = ⋯ = H^N ,

and hence that U_N is bounded below by a linear form in H^N and 1/H^N,

(3)    U_N(H, μ) ≥ H^N e + 2L|B|^2 N/H^N ≥ 2 N^{1/2} [2Le]^{1/2} |B| ,

with equality iff H is the weak

(4)    H^k = |B| [2LN/e]^{1/2} ,    k = 1, 2, ..., N .
To obtain a strong H minimizing sup_N N^{−1/2} U_N, note first that sup_N N^{−1/2} U_N = ∞ if lim N^{−1/2} H^N is 0 or ∞. Attention may therefore be restricted to strong H with 0 < lim N^{−1/2} H^N = a < ∞. For such H, since Σ_1^N k^{−1/2} ~ 2N^{1/2},

(5)    lim N^{−1/2} U_N ≥ ea + 4L|B|^2/a ,

and hence

    sup_N N^{−1/2} U_N ≥ lim N^{−1/2} U_N ≥ ea + 4L|B|^2/a .

However, for H^k = a k^{1/2}, k = 1, 2, ..., N,
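The tradeoff behind the choice H^k = a k^{1/2} is visible numerically: a constant H makes the sum term grow linearly in N, while H^k proportional to k^{1/2} balances the two terms at order N^{1/2}. A sketch with illustrative constants (e, L and |B| all set to 1; the exact coefficients here are assumptions, not the paper's):

```python
from math import sqrt

def U(N, H):
    # schematic upper bound: leading term H(N) plus a stability sum 2 * sum 1/H(k)
    return H(N) + 2 * sum(1.0 / H(k) for k in range(1, N + 1))

a = 2.0                                    # perturbation scale
for N in (100, 400, 1600):
    strong = U(N, lambda k: a * sqrt(k)) / sqrt(N)
    const = U(N, lambda k: a) / sqrt(N)
    print(N, round(strong, 3), round(const, 3))
```

With H^k = a k^{1/2} the normalized bound settles near a + 4/a (here 4.0), while the constant choice grows like N^{1/2}.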
LEMMA 2. If c > 0, w and w' are m-vectors, and T is the transformation Tz = (w' + z)/c − w of Z' , then, for each i = 1, 2, ..., m,

    ∫_{Z'} σ_i(w + z) dμ_{Z'} − ∫_{Z'} σ_i(w' + z) dμ_{Z'} ≤ |B| [1 − μ_{Z'}(TZ') + m(1 − c)^+] .
PROOF. Expressing ∫ σ_i(w' + z) as a Lebesgue integral with integration variable z', transforming via z = Tz' and using the homogeneity of σ, noting σ_i(w + z) ≥ 0,

    ∫_{Z'} σ_i(w' + z) dμ_{Z'} = c^m ∫_{TZ'} σ_i(w + z) dμ ≥ c^m ∫_{TZ'} σ_i(w + z) dμ_{Z'} .

Partitioning Z' by TZ',

(5)    ∫_{Z'} σ_i(w + z) dμ_{Z'} ≤ [1 − μ_{Z'}(TZ')] |B| + ∫_{TZ'} σ_i(w + z) dμ_{Z'} ,

whence by subtraction the difference of the lemma is bounded above by

(6)    [1 − μ_{Z'}(TZ')] |B| + (1 − c^m) ∫_{TZ'} σ_i(w + z) dμ_{Z'} .

The proof is completed by using the bound (1 − c^m)^+ |B| for the integral term in (6), then weakening by (1 − c^m)^+ = (1 − c)^+ (1 + c + ⋯ + c^{m−1}) ≤ m(1 − c)^+ .

The rest of this section will be devoted to applications of Lemma 2 to the particular case Z' = Z. Here Z − TZ is the subset of Z which falls outside the image interval in at least one coordinate, and since each z_i has the uniform distribution on (0, 1), the bound of Lemma 2 is in this case bounded above by

(7)    |B| [ Σ_1^m (w'_i/c − w_i)^+ + Σ_1^m (1 + w_i − (1 + w'_i)/c)^+ + m(1 − c)^+ ] .

By taking c = 1 in (7) (and in (7) with w and w' interchanged) there follows a uniform strengthening of the bound deducible from Lemma 1,

(8)    |E_{μ_Z} σ(w + z) − E_{μ_Z} σ(w' + z)| ≤ |B| Σ_1^m |w'_i − w_i| .
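The smoothing that drives the uniform bound above is visible in the simplest two-column case: with z uniform on the unit square, the probability that the perturbed response picks a given column is a piecewise-quadratic, hence Lipschitz, function of w, although σ itself jumps. A sketch (the identity loss matrix is an illustrative choice):

```python
def prob_col0(w0, w1):
    # P over z uniform on [0,1]^2 that column 0 minimizes (w + z) A, with A the
    # identity: column 0 wins when w0 + z0 < w1 + z1, i.e. z1 - z0 > w0 - w1;
    # z1 - z0 has the triangular density on [-1, 1], bounded by 1.
    t = w0 - w1
    if t <= -1:
        return 1.0
    if t >= 1:
        return 0.0
    if t <= 0:
        return 1 - (1 + t) ** 2 / 2
    return (1 - t) ** 2 / 2

d = prob_col0(0.0, 0.0) - prob_col0(0.05, 0.0)
print(d)    # a small change: the density bound makes prob_col0 1-Lipschitz in w0 - w1
```

Shifting w by 0.05 moves the smoothed choice probability by at most 0.05, while the unsmoothed σ would jump from one column to the other at w0 = w1.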
This can be used with w and w' identified as for (5.11) and (5.12) to obtain the strengthening of their respective particularizations to the case μ = μ_Z. For k > 1,

(9)    |s*^k[x] − s*^k[x']| ≤ |B| Σ_1^m |x'_i − x_i| / h^{k−1} ,

(10)    |s*^k[x] − s*^{k+1}[x]| ≤ m |B| / H^k .

For all k > 1 such that m h^k < 1, (10) may be distinctly improved by direct applications of (7). For w and w' as for (5.12) and c = h^k/h^{k−1},

(11)    s*^k[x] − s*^{k+1}[x] ≤ |B| m [1 − h^k/h^{k−1}] ,

while from the interchange of w and w' it follows that the negative of the left hand side of (11) has the same upper bound, and hence

(12)    |s*^k[x] − s*^{k+1}[x]| ≤ |B| m [1 − h^k/h^{k−1}] .

The possibility of distinct improvement in the bound (12) as compared with (10) follows from the fact that [1 − h^k/h^{k−1}] = [kH^{k−1} − (k − 1)H^k]/kH^{k−1} is of order of magnitude k^{−1} by (5.1), while the bound (10) will be of order of magnitude k^{−1/2} when the H^k are of order k^{1/2}.
The most important use for Lemma 2 will be to bound

    E Σ_1^N ε_k(σ^k − σ^{k+1})

in the special case μ = μ_Z. As in Section 6 the bounds obtainable from the use of (9) and (10) (or (12)) can be slightly improved upon by direct applications of (8). Taking w = E^{k−1}/H^{k−1}, k > 1, and w' = E^k/H^k, there follows

(13)    |ε_k [s*^k(E^{k−1}) − s*^{k+1}(E^k)]| ≤ |B| Σ_1^m |E^k_i/H^k − E^{k−1}_i/H^{k−1}| ,

whence by summation with respect to k there follows from (8) and (6.11) a generalization and improvement of the particularization of (6.12),

(14)    |E_{μ_Z} [σ(E^{k−1} + H^{k−1}z) − σ(E^k + H^k z)]| ≤ |B| Σ_1^m |E^k_i/H^k − E^{k−1}_i/H^{k−1}| .
The bound (14) can be further reduced for sufficiently large N by a substitute for (13) which is obtained directly from (7). For k > 1 the choice of c = H^{k−1}/H^k yields

(15)    s*^k(E^{k−1}) − s*^{k+1}(E^k) ≤ |B| [1/H^k + m(1 − H^{k−1}/H^k)] ,

and this improves on (13) iff k − 2 > mH^{k−1}. By bounding the left hand side by |B| when k = 1, it follows that

(16)    Σ_1^N ε_k [s*^k(E^{k−1}) − s*^{k+1}(E^k)]

admits the corresponding summed bound. It follows that the explicit assumption of the existence of σ(w) and the bound (14) prove that, for μ = μ_Z, the conclusion of Theorem 3 with n = 2 applies to s* with

    s*^k(E^{k−1}) = E_{μ_Z} σ(E^{k−1} + H^{k−1}z) .

Further, the investigation in Section 7 of the behavior of the resulting upper bound with respect to the choice of H is immediately applicable and yields the promised improvement and generalization of (7.13),

(17)    E Σ_1^N ε_k σ(E^{k−1} + H^{k−1}z) − φ(E^N) ≤ N^{1/2} (2m)^{1/2} |B| ,
as well as that of Theorem 4.

THEOREM 6. If μ_Z is the uniform measure on the m-cube Z and σ(w) minimizes wσ for each m-vector w, then, for all N and ε,

    − N^{1/2} √(8m) |B| ≤ Σ_1^N ε_k E_{μ_Z} σ(E^{k−1} + H^{k−1}z) − φ(E^N) ≤ N^{1/2} √(8m) |B| .
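The strategy of Theorem 6, playing the column that minimizes the perturbed past count (E^{k−1} + scaled uniform z)·σ, is easy to simulate, and the modified regret then grows only like N^{1/2}. A sketch (the loss matrix, the scale H_k = k^{1/2}, and the random opponent are illustrative choices, not the paper's):

```python
import random

random.seed(0)
m, N = 2, 4000
A = [[1, 0], [0, 1]]             # loss of playing column j against opponent side i
counts = [0] * m                 # E^{k-1}: opponent's empirical totals so far
total_loss = 0.0
for k in range(1, N + 1):
    Hk = k ** 0.5                # perturbation scale growing like sqrt(k)
    z = [random.random() for _ in range(m)]
    # play the column minimizing the perturbed past loss
    j = min(range(m),
            key=lambda c: sum(A[i][c] * (counts[i] + Hk * z[i]) for i in range(m)))
    i = random.randrange(m)      # an i.i.d. uniform opponent
    total_loss += A[i][j]
    counts[i] += 1
best_fixed = min(sum(A[i][c] * counts[i] for i in range(m)) for c in range(m))
regret = total_loss - best_fixed
print(regret, regret / N ** 0.5)
```

Here `best_fixed` is φ(E^N), the loss of the best single column in hindsight, so `regret` is the modified regret of the run; its normalized value stays bounded as N grows.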
As was noted in Section 8, the natural generalization of Theorem 1 is applicable here and, with Theorem 6, proves

(18)    D_N(ε, s*) ≤ N^{1/2} [√(m − 1) + √(8m)] |B| .

That of Theorem 2 proves that the strategy of Theorem 6 incurs a maximum modified regret optimal in order of magnitude if there exists x_0 satisfying (4.12).

APPENDIX. SEMI-DYNAMIC GAMES
It may be of interest to compare the behavior of s* in G_N (Theorems 4 and 6) with that of an estimated Bayes response, s(E^{k−1}), in a context suggested by the interpretation of problem 13(ii) in [10] as a semi-dynamic finite game in which I uses a fixed but unknown [x]^N in G_N. In this context sequence strategies for II are simply ordered by their (unknown) expected inutility, and for each recursive s,

    E Σ_1^N ε_k s_k(ε^{k−1}) = E Σ_1^N x s_k(ε^{k−1}) ≥ Nφ(x) .

The expected risk of the estimated Bayes response, s(E^{k−1}), is

(1)    E Σ_1^N ε_k s(E^{k−1}) = Σ_1^N E x s(E^{k−1}) .

Letting P_k be the [x]^k measure of the set of E^k such that x s(E^k) exceeds φ(x), it follows from (2.15) that

(2)    E x s(E^k)
β_i, i = 1, ..., N. A player, knowing the numbers, is given the cards face down on a table. He has an amount of money a, and his first move consists in placing some fraction α of a on one of the cards. The card is turned over and he is paid α times the number on the card. He continues by playing some fraction of the remaining capital a − α on one of the unturned cards, continuing in this way until all cards have been turned. We show that no matter what the player does he cannot expect a payoff greater than a(Σβ_i)/N. (This amount can, of course, be assured by placing the amount a/N on each card.)

This last example does not fall under the general class of games with finite resources as described above. For this reason we shall consider a more general type of game in obtaining the result of the next section.
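The claim that any policy committing all of a expects exactly a(Σβ_i)/N can be verified by brute force for small N: condition on which of the remaining cards turns up, recurse, and compare with a(Σβ_i)/N. A sketch (the particular adaptive betting rule below is an arbitrary illustrative choice):

```python
from math import isclose

def expected_payoff(cards, capital, policy):
    # uniformly random hidden order: each remaining value is on top with equal
    # probability; the last bet must commit whatever capital is left
    if len(cards) == 1:
        return capital * cards[0]
    bet = policy(cards, capital)          # amount placed this turn, 0 <= bet <= capital
    n = len(cards)
    total = 0.0
    for t in range(n):                    # value cards[t] is the one turned over
        rest = cards[:t] + cards[t + 1:]
        total += bet * cards[t] + expected_payoff(rest, capital - bet, policy)
    return total / n

# an adaptive policy: bet more aggressively while a high card remains hidden
greedy = lambda cs, cap: cap * (0.7 if max(cs) > 1.5 * sum(cs) / len(cs) else 0.3)
cards, a = [5.0, 1.0, 2.0, 0.0], 1.0
v = expected_payoff(cards, a, greedy)
print(v, a * sum(cards) / len(cards))     # both equal a * (mean card), up to rounding
```

The short induction in the text is exactly what makes the recursion policy-independent: betting α on a uniformly random remaining card earns α times the mean, and the residue inherits the same mean.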
§2. THE MAIN RESULT
We first describe a game Γ consisting of the following objects: an m by n matrix M; an m-vector a and an n-vector b; an integer N; a set A of N-tuples (a_1, ..., a_N) of m-vectors and a set B of N-tuples (b_1, ..., b_N) of n-vectors, subject to two conditions: (1) a_1 + ⋯ + a_N = a for every (a_1, ..., a_N) in A; (2) if (a_1, ..., a_N) ∈ A, so also is (a_{π(1)}, ..., a_{π(N)}), where π is any permutation of the integers 1, ..., N. The elements of B are analogously defined. The elements of A and B are called admissible N-tuples.

The game is now played as follows: Player II chooses an admissible N-tuple (b_1, b_2, ..., b_N). Player I chooses a vector a_1 belonging to some N-tuple of A, is then informed of b_1 and chooses a_2 so that a_1, a_2 are the first two terms of an element of A. Play
GAMES WITH FINITE RESOURCES

continues in this fashion. After choosing a_i, player I is informed of b_i and then chooses a_{i+1} so that the sequence a_1, ..., a_i, a_{i+1} forms the first i + 1 terms of an admissible N-tuple. Play terminates after player I has made N choices, and the payoff to player I is then

    a_1 M b_1 + a_2 M b_2 + ⋯ + a_N M b_N .

The game Γ is thus described by six objects and we represent it by the notation,

    Γ = [M, a, b, N, A, B] .
Note that a game with finite resources corresponds to the special case where the matrix M is N by N, a and b are N-vectors all of whose components are 1, and the N-tuples (a_1, ..., a_N) and (b_1, ..., b_N) are admissible if the vectors a_i and b_i are unit vectors, that is, vectors with one coordinate equal to unity and the rest zero. In Example 2 above the matrix M is the number 1, the vector a is the number a, and the vector b is the number β_1 + ⋯ + β_N. The set A consists of all N-tuples (α_1, ..., α_N) such that Σ_1^N α_i = a, and the set B consists of all permutations of the N-tuple (β_1, ..., β_N). We can now state our result.

THEOREM. The value of the game Γ is given by v = (a M b)/N. An optimal strategy for either player consists of choosing any admissible N-tuple (a_1, ..., a_N) or (b_1, ..., b_N) and playing all permutations of the N-tuple with equal probability. (The symmetric nature of this result shows that player I gains no advantage from his additional information.)

PROOF. Let (b_1, b_2, ..., b_N) be an admissible N-tuple for player II. Let t be the mixed strategy which consists in playing all permutations of this N-tuple with equal probability. We shall show that if player I plays any pure strategy against t his expectation will be exactly v. The proof by induction on N is immediate for N = 1. Suppose player I picks a strategy σ which involves choosing a_1 on the first move, and suppose player II chooses b_k on his first move, so that the partial payoff from this move is a_1 M b_k. The players now find themselves playing a new game, Γ' = [M, a − a_1, b − b_k, N − 1, A', B']. Furthermore, in Γ' player II is playing the mixed strategy in which he plays with equal probability all permutations of the (N−1)-tuple (b_1, ..., b_{k−1}, b_{k+1}, ..., b_N). By induction hypothesis the expected payoff for this game is
    v' = (a − a_1) M (b − b_k) / (N − 1) .
GALE
Assume that the complete configuration is such that M, in internal state q_j, scans a square containing s_i, and that T(s_i, q_j) = (s_{i'}, u, q_{j'}), where u is one of L, C, R. M will then perform an atomic act composed of three parts:

1) M prints s_{i'} instead of s_i,11

10. The numeral corresponding to the integer n consists of n + 1 vertical strokes.
11. s_0 means leaving the square blank.

RABIN
2) M moves one square to the left, stays on the original square, or moves one square to the right, according as u is L, C, or R, respectively,12

3) M goes into internal state q_{j'}.

When the atomic act is completed we have a new complete configuration w'. If the new state is not the terminal state then M proceeds to perform a new atomic act, and
so on. M's operation stops when M goes into the terminal state q_0; from there on no atomic acts are performed.

To make M perform a calculation we print various symbols on the tape and set the machine in some internal state scanning a square of the tape; this complete configuration constitutes the input. M starts operating and transforms the original complete configuration by a series of atomic acts. If M ever enters q_0 during its operation it stops, and whatever is printed on the tape at that time is the machine's output. We are now in a position to define effective computability.

DEFINITION.
A function f(x) = y from integers to integers is effectively computable (recursive) if there exists a Turing machine M such that for each integer x, if M's input is ξq_1, where ξ is the numeral corresponding to the integer x, then the machine does reach q_0 and the output is ηq_0, where η is the numeral corresponding to f(x) (= y).

We could actually go further with our formalization and completely eliminate from our definition the notions of a machine, machine tape, etc., by replacing everything by some kind of an algebraic system of Post words. How to do it is quite clear; on the other hand, doing so would eliminate some of the intuitive appeal of the definition. For this reason we leave things as they are. Chapter XIII of Kleene's book [2] contains a detailed discussion of Turing machines and computable functions. In Section 70 the reader will find the motivation for defining computable functions as those functions calculable by some Turing machine.
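The definition above is easy to make concrete: a finite transition table T(s_i, q_j) = (s_{i'}, u, q_{j'}) drives the atomic acts. A minimal sketch of a machine computing f(x) = x + 1 in stroke numerals (the table is an illustrative construction, not one from the paper; for simplicity it uses n strokes for the integer n, where the paper's convention of n + 1 strokes works the same way):

```python
def run_turing(tape, state, pos, table, terminal="q0", max_steps=10_000):
    # tape: dict position -> symbol; blank squares read as "s0"
    for _ in range(max_steps):
        if state == terminal:                 # terminal state: no atomic acts
            return tape
        sym = tape.get(pos, "s0")
        write, move, state = table[(state, sym)]
        tape[pos] = write                     # part 1: print
        pos += {"L": -1, "C": 0, "R": 1}[move]  # part 2: move
    raise RuntimeError("machine did not halt")

# successor machine: scan right across the strokes, append one, halt
table = {
    ("q1", "1"):  ("1", "R", "q1"),           # keep moving right over the numeral
    ("q1", "s0"): ("1", "C", "q0"),           # first blank: print one more stroke
}
x = 4
tape = {i: "1" for i in range(x)}             # the numeral for 4: four strokes
out = run_turing(tape, "q1", 0, table)
print(sum(1 for v in out.values() if v == "1"))   # 5 strokes: f(4) = 5
```

The same driver runs any table of the form above, which is all the formal definition requires of a machine.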
12. L, C, and R stand for "left", "center", and "right".

EFFECTIVE COMPUTABILITY
BIBLIOGRAPHY

[1] GALE, DAVID, and STEWART, F. M., "Infinite games with perfect information", Annals of Mathematics Study No. 28, (Princeton, 1953), pp. 245-266.

[2] KLEENE, S. C., Introduction to Metamathematics, Van Nostrand, New York, (1952).

[3] von NEUMANN, J., and MORGENSTERN, O., Theory of Games and Economic Behavior, Princeton, 1944, 2nd ed. 1947, pp. 112-128.

[4] POST, E., "Recursively enumerable sets of positive integers and their decision problems", Bulletin of the American Mathematical Society, 50 (1944), pp. 284-316.

[5] ZERMELO, E., "Über eine Anwendung der Mengenlehre auf die Theorie des Schachspiels", Proc. Fifth Int. Cong. of Math., Cambridge (1912), Vol. II, p. 501.

Michael O. Rabin
Princeton University

ADDED IN PROOF: An additional reference is "Machine Computability in Game Theory" by F. L. Wolf (Doctoral dissertation, Minnesota (1955), unpublished). Judging from an abstract, it seems that, in contrast to actual games as defined in this paper, there exists a continuum number of "constructively defined games" in Wolf's sense; there would thus exist constructively defined games in which the players could not be supplied with effective instructions for actually playing the game. Furthermore, Wolf does not prove the existence of a particular constructively defined game which does not possess any effectively computable optimal strategies.
THE BANACH-MAZUR GAME AND BANACH CATEGORY THEOREM

John C. Oxtoby

The object of this note is to generalize the Banach-Mazur game of "pick an interval" to an arbitrary topological space, to show that the theorem conjectured by Mazur and proved by Banach (but never published by him) (see [1]) concerning the circumstances under which the game is determined in favor of one of the players still holds, and to obtain thereby a new proof of the Banach category theorem [2].

Let X be an arbitrary topological space, and let 𝒢 be a specified class of subsets of X such that each member of 𝒢 has a non-empty interior and such that every non-empty open set contains a member of 𝒢. Let X = A ∪ B be any given decomposition of X into two disjoint sets. The game is played as follows: Two players (A) and (B) alternately choose sets G_n belonging to 𝒢 such that G_n ⊃ G_{n+1} (n = 1, 2, ...). Player (A) chooses the terms with odd index, and (B) those with even index. Player (A) wins in case A ∩ ⋂ G_n is non empty. Otherwise (B) wins.

Any nested sequence (G_n) of sets belonging to 𝒢 is called a play of the game. (The index n will always be understood to run over the set of positive integers.) A strategy for (B) is a sequence (f_n) of functions such that G_{2n} = f_n(G_1, ..., G_{2n−1}) is defined for every sequence G_1, ..., G_{2n−1} of 2n − 1 sets belonging to 𝒢 and is a subset of G_{2n−1}. A play (G_n) is said to be consistent with f_B = (f_n) if G_{2n} = f_n(G_1, ..., G_{2n−1}) for every n. A strategy for (A) is a pair f_A = (G, (f_n)) consisting of a set G belonging to 𝒢 and a sequence of functions f_n such that G_{2n+1} = f_n(G_1, ..., G_{2n}) is defined for every sequence G_1, ..., G_{2n} of 2n sets belonging to 𝒢 and is a subset of G_{2n}. A play (G_n) is said to be consistent with f_A = (G, (f_n)) if G_1 = G and G_{2n+1} = f_n(G_1, ..., G_{2n}) for every n. Any pair of strategies f_A, f_B determines uniquely a play consistent with both strategies. f_A is a winning strategy for (A) if A ∩ ⋂ G_n is non empty for every play consistent with f_A; f_B is a winning strategy for (B) if A ∩ ⋂ G_n is empty for every play consistent with f_B.
OXTOBY

The game is said to be determined in favor of (A) or (B) if there exists a winning strategy for (A) or (B), respectively.

THEOREM 1. The game is determined in favor of (B) if and only if A is of first category in X.

PROOF. Let 𝒢_0 be a well ordered subclass of 𝒢 such that every non-empty open set contains a member of 𝒢_0. (In case X has a countable base, 𝒢_0 may be taken to be countable.) Suppose A is ⋃ A_n for some sequence of nowhere dense sets A_n. For any sequence G_1, ..., G_{2n−1} of 2n − 1 sets belonging to 𝒢, let f_n(G_1, ..., G_{2n−1}) be the first member of 𝒢_0 that is contained in G_{2n−1} − Ā_n. Then the sequence f_B = (f_n) is a strategy for (B). For any play (G_n) consistent with f_B we have ⋂ G_n ⊂ B. Hence f_B is a winning strategy for (B).
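The strategy just described for (B) can be made concrete on the line with A = the rationals, a first category set: at move n, (B) picks a closed subinterval dodging the n-th rational, so the final intersection misses every rational and lies in B. A sketch with exact rational endpoints (the enumeration and (A)'s moves are illustrative choices):

```python
from fractions import Fraction as F

def rationals():
    # a simple enumeration q_1, q_2, ... of rationals in (0, 1), repeats allowed
    q = 2
    while True:
        for p in range(1, q):
            yield F(p, q)
        q += 1

def avoid(lo, hi, q):
    # (B)'s move: a closed subinterval of one third the length not containing q
    third = (hi - lo) / 3
    if q <= lo + third:
        return hi - third, hi       # q is in (or left of) the left third
    return lo, lo + third           # otherwise the left third dodges q

lo, hi = F(0), F(1)
dodged = []
enum = rationals()
for n in range(30):
    q = next(enum)
    lo, hi = avoid(lo, hi, q)       # (B) dodges the n-th rational
    dodged.append(q)
    mid = (lo + hi) / 2             # (A)'s move: any closed subinterval at all
    lo, hi = (lo + mid) / 2, (mid + hi) / 2
print(lo, hi)
```

After n rounds the current interval is non-degenerate and excludes the first n enumerated rationals, exactly the situation the proof iterates forever.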
Conversely, suppose there exists a winning strategy f_B = (f_n) for (B). By an f-chain of order n we shall mean a nested sequence G_1 ⊃ G_2 ⊃ ⋯ ⊃ G_{2n} of 2n sets belonging to 𝒢 such that G_{2i} = f_i(G_1, G_2, ..., G_{2i−1}) for i = 1, 2, ..., n. The interior of G_{2n} is called the interior of the chain. An f-chain of order n + k is a continuation of one of order n if the first 2n terms of both chains are the same. The class of all f-chains is ordered by the relation of continuation.

Among all f-chains of order 1, let F_1 be a maximal family with the property that the interiors of any two members of F_1 are disjoint. Then the union U_1 of the interiors of the members of F_1 is a dense open set in X. Proceeding by induction, suppose a family F_n of f-chains of order n has been so defined that the interiors of the members of F_n are disjoint and that their union U_n is dense in X. Among the f-chains of order n + 1 that are continuations of members of F_n, let F_{n+1} be a maximal family with the property that the interiors of any two members are disjoint. Such a family exists by Zorn's lemma, since the required property is of finite character. (In case 𝒢 is denumerable, we can define F_{n+1} explicitly. Let 𝒢_n be the subsequence of 𝒢 consisting of those sets that are contained in the last term of some chain belonging to F_n. Each member of 𝒢_n determines an f-chain of order n + 1 of which it is the (2n+1)-th term. Arrange these chains in a sequence. Taking these in order, select those whose interior is disjoint to the interiors of the chains already selected. The chains selected in this way may be taken to define F_{n+1}.) It follows from the maximality of F_{n+1} that the union U_{n+1} of the interiors of the members of F_{n+1} is dense in X. Therefore, by induction, we may assume that such
THE RAMCH-MAZUR GAME a family Fn is defined for every n, and that each member of Fn+1 is a continuation of some member of Fn . Put E = H Un . For each x in E there is a unique sequence CCn ) of f-chains such that Cn belongs to Fn and such that x belongs to the interior of C , for every n. This sequence of f-chains is linearly ordered by continuation, and defines an infinite nested sequence of sets GR belonging to where Bn is nowhere dense in X. For each n, and for any sequence G 1 , G2nof 2n sets belonging to Cf 9 let fn (G1, G2n)be the first member G2n+1 of such that G2n+1 C G2n - Bn and such that diamG2n+1 < ~ . Then the pair fA = (G, Cfn )) constitutes a strategy for (A). For any play {Gn } con sistent with this strategy, the intersection fl Gn is contained in A. Since G2n+1 C G2n for every n, we have fl Gn = fl Gn - Since diam &2n+l < for every n, and since X is a complete metric space, it follows that fl Gn consists of just one point. Hence «. n Gn Gn is non empty. Therefore is a winning strategy for (A).
Conversely, suppose f_A = (G_0, (f_n)) is a winning strategy for (A). It will suffice to show that G_0 ∩ B is of first category in X. To this end we shall define a new winning strategy g_A such that the intersection of every play consistent with g_A is a single point. For each n, let
OXTOBY

h_n be a mapping of 𝒢 into 𝒢 such that h_n(G) ⊂ G and diam h_n(G) < 1/n for every G in 𝒢. For any sequence G_1, ..., G_{2n} of sets belonging to 𝒢, let H_{2i−1} = G_{2i−1} and H_{2i} = h_i(G_{2i}) (i = 1, 2, ..., n), and define g_n(G_1, ..., G_{2n}) = f_n(H_1, ..., H_{2n}). Since H_{2n} ⊂ G_{2n} it follows that g_n(G_1, ..., G_{2n}) ⊂ G_{2n}. Hence the pair g_A = (G_0, (g_n)) constitutes a strategy for (A).

Consider any play (G_n) consistent with this strategy. Let H_{2n−1} = G_{2n−1} and H_{2n} = h_n(G_{2n}) for every n. We shall show that (H_n) is a play consistent with f_A. It is clear that H_n belongs to 𝒢 for every n. To show that the sequence (H_n) is nested we proceed by induction. We have H_1 = G_1 = G_0 and H_2 = h_1(G_2) ⊂ G_2 ⊂ G_1 = H_1. Hence H_1 ⊃ H_2. Suppose that H_1 ⊃ H_2 ⊃ ⋯ ⊃ H_{2n}. Then H_{2n+1} = G_{2n+1} = g_n(G_1, ..., G_{2n}) = f_n(H_1, ..., H_{2n}) ⊂ H_{2n}. Hence H_{2n} ⊃ H_{2n+1}. Moreover, H_{2n+2} = h_{n+1}(G_{2n+2}) ⊂ G_{2n+2} ⊂ G_{2n+1} = H_{2n+1}. Therefore the sequence (H_n) is a play. The relation H_{2n+1} = f_n(H_1, ..., H_{2n}) shows that this play is consistent with f_A. Since f_A is a winning strategy, the set A ∩ ⋂ H_n must be non empty. Both (H_n) and (G_n) are nested, hence ⋂ H_n = ⋂ H_{2n+1} = ⋂ G_{2n+1} = ⋂ G_n. Consequently A ∩ ⋂ G_n is non empty, and therefore g_A is a winning strategy for (A). Moreover, since diam H_{2n} < 1/n for every n, the intersection ⋂ G_n consists of just one point, and this point belongs to A. It follows that (g_n) is a winning strategy for the second player in the game in which X is replaced by G_0 and the first player's set is G_0 ∩ B. Therefore, by Theorem 1, G_0 ∩ B is of first category in X.

REMARK 1. If X is the line and 𝒢 is the class of non-degenerate bounded closed intervals, then the game is the original Banach-Mazur game. Theorems 1 and 2 imply that this game is determined in favor of (B) if and only if A is of first category, and in favor of (A) if and only if B is of first category at some point. This is the theorem of Mazur and Banach as stated in [1]. As noted by the authors of [1], the axiom of choice implies that there exist linear sets A for which the game is indeterminate. For instance, this will be the case whenever both A and B intersect every perfect set. (For the existence of such sets, see [3], p. 26.)

REMARK 2. If X is a compact Hausdorff space, the criterion stated in Theorem 2 for the game to be determined in favor of (A) is still sufficient, but it is no longer necessary. For a counter example, let X be the product of uncountably many spaces each equal to {0, 1}. Let A be the set of points for which all but countably many coordinates are equal to 0, and let B = X − A. Neither A nor B is of first category at any point, but the game is still determined in favor of (A). A winning strategy for (A) consists in choosing always a basic closed open set.
BIBLIOGRAPHY

[1] MYCIELSKI, J., SWIERCZKOWSKI, S., and ZIĘBA, A., "On infinite positional games", Bull. Acad. Polon. Sci. Cl. III, vol. 4 (1956), pp. 485-488. It is stated that proofs will appear in a paper in Fundamenta Mathematicae.

[2] BANACH, S., "Théorème sur les ensembles de première catégorie," Fund. Math. 16 (1930), pp. 395-398.

A position of the game may be given by a point (m_1, m_2, m_3, θ) of the cartesian product M × M × M × Θ. The rule Γ is defined as follows: let B_{r_i}(m_i) be a sphere (with its interior) having as radius r_i the greatest distance which (i) can cover in one second. If m_3 ≠ m_1, m_3 ≠ m_2, and m_3 ∉ H, define

    Γ(m_1, m_2, m_3, 1) = B_{r_1}(m_1) × {m_2} × {m_3} × {2}
    Γ(m_1, m_2, m_3, 2) = {m_1} × B_{r_2}(m_2) × {m_3} × {3}
    Γ(m_1, m_2, m_3, 3) = {m_1} × {m_2} × B_{r_3}(m_3) × {1} .

If m_3 = m_1, or m_3 = m_2, or m_3 ∈ H, define

    Γ(m_1, m_2, m_3, θ) = ∅ .
If the aim of (1 ) is to capture (3 ), define f 1 (m^, m 2,m 3,
0) = 0
if
=1
if
(1
)
6
N+
m3
4
m 1, m3
m^ = m 1
4 or
m2 m.^ = m 2
.
I f the aim o f (2) is m erely to approach ( 3 ) as c lo s e ly as p o s s ib le , then d e fin e f g ^ , m2, m3, 9) = - d(m2, ) (2) £ N+
.
I f ( 3 ) is tr y in g to reach the harbor H, m2, m3, 9) = 0 = 1 (3 ) e N+
d e fin e if m / h i f m3 e H .
168
BERGE
This game is alternative. Note that if there is no place where (3) can be secure from cap ture one may define f^ (x) f^ (m1, m2,
by , e) = o =
if m^ = m 2
1
4 m2
if
or
m^ =
and
m^ 4 m -,
(1 ) € N“ . The aim of (3) being to maximize player.
REMARK.
inf
Qfo(x),
X £o j
player (3) is now a passive
In the finite games considered by von Neumann and Morgen-
stern one has x
4y
implies
f± (x)
=o
if
rx 4 0
f± (x)
>°
if
rx = 0
rx n
ry = 0
In the infinite games considered by Gale and Stewart [6], or Wolfe [11], the rule is still given in this form, but the manner of defining the pay-off is quite different. In our formulation, the pay-off of a play depends on the set of the positions met during the play, but not on the order in which we met these positions; for that reason we shall obtain a stronger existence theorem, e.g., the Zermelo-von Neumann Theorem for infinite games. But the main advantage of our description is in being "global", in distinction to the preceding theories, which consider a game having a fixed initial position x_0 ("local point of view"); our theorems, on the other hand, will be stated for every initial position ("global point of view").

A game (Γ, f_1, f_2, ..., f_n) is only a structure on the abstract partitioned set X. If A is a subset of X, if Γ^A denotes the restriction to A of Γ, and f_i^A(x) the restriction to A of f_i(x), the game (Γ^A, f_1^A, f_2^A, ..., f_n^A) on the partitioned set A is a subgame of (Γ, f_1, f_2, ..., f_n); every subset A defines a subgame, and every subgame of a game is a game.

§3.
EXISTENCE OF AN EQUILIBRIUM POINT
We shall now give a generalization of the Zermelo-von Neumann Theorem proved by Kuhn ([9 ]) for a less general class of n-person games with perfect information.
169

TOPOLOGICAL GAMES

If S is a subset of X and if ∅ is the null-set, define

    Γ⁺S = {x / Γx ⊂ S, Γx ≠ ∅}
    Γ⁻S = {x / Γx ∩ S ≠ ∅}
    X_0 = {x / Γx = ∅} .

If P(S) designates the collection of the subsets of S, Γ⁺ and Γ⁻ are two mappings from P(X) into P(X - X_0) which will be called the upper and the lower inverse of the multi-valued mapping Γ. The lower inverse is the inverse as usually considered in analysis; the properties of the upper inverse have been studied by the author elsewhere ([3]) for use in game theory. If Γ is a single-valued mapping one has Γ⁺ = Γ⁻ = Γ⁻¹ (hence the name "inverse"). Define
    X(0) = X_0
    X(1) = X(0) ∪ Γ⁺X(0)
    . . .
    X(k) = X(k - 1) ∪ Γ⁺X(k - 1)
    . . .
    X(ω) = ∪_{k<ω} X(k)
    X(ω + 1) = X(ω) ∪ Γ⁺X(ω)

and so on, for all ordinal numbers. One has

    X(0) ⊂ X(1) ⊂ X(2) ⊂ ... ⊂ X(ω) ⊂ X(ω + 1) ⊂ ... ;

if x_0 ∈ X(k), one can say that the game will terminate within k moves; if x_0 ∈ X(ω + k), one can say that within k moves one will be able to tell in how many moves the game will terminate; and so on.

A game will be called Γ-finite if the set Γx is finite for all x; it will be called Γ⁻-finite if the set Γ⁻x is finite for all x; it will be called Γ⁺-finite if, for every finite set S, the set Γ⁺S is finite. The length of a play (x_0, x_1, x_2, ...), with x_1 ∈ Γx_0, x_2 ∈ Γx_1, ..., x_n ∈ Γx_{n-1}, ..., is the number of elements of this sequence; if, for a given position x, all the plays starting from x have their length bounded by a number k, we shall say that the game is locally bounded at x; if all the plays starting from x have a finite length we shall say that the game is locally finite at x. These definitions are the same as in graph theory.
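For a finite game graph the transfinite construction stops after finitely many steps, and the sets X(k) can be computed directly. A small sketch in Python (the dictionary encoding of Γ is ours, not the paper's):

```python
# X(0) is the set of terminal positions; X(k) = X(k-1) ∪ Γ⁺X(k-1), where
# Γ⁺S = {x : Γx ≠ ∅ and Γx ⊆ S}.  A position lies in X(k) exactly when
# every play starting from it terminates within k moves.

def termination_sets(gamma):
    """gamma maps each position to the set Γx of its successors."""
    x_sets = [{p for p, succ in gamma.items() if not succ}]     # X(0)
    while True:
        prev = x_sets[-1]
        upper = {p for p, succ in gamma.items() if succ and succ <= prev}
        nxt = prev | upper                                      # X(k)
        if nxt == prev:
            return x_sets
        x_sets.append(nxt)

# d is terminal; c -> d; b -> {c, d}; a -> {a, b} permits an infinite play.
gamma = {"a": {"a", "b"}, "b": {"c", "d"}, "c": {"d"}, "d": set()}
sets_ = termination_sets(gamma)
print([sorted(s) for s in sets_])     # [['d'], ['c', 'd'], ['b', 'c', 'd']]
```

Position a never enters any X(k): the play a, a, a, ... is infinite, so the game is not locally finite there.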
It is well known that a game which is Γ-finite and locally finite at x is also locally bounded at x (König). Furthermore, if the

170

set X is finite, the game is locally finite if and only if there does not
exist a cyclic path.

THEOREM 1. The game is locally finite if and only if there exists an ordinal α such that X(α) = X.

PROOF. Let A be the set of all positions which belong to some play of infinite length; the game is locally finite if and only if A = ∅. One has A ∩ X(0) = ∅, and by transfinite induction A ∩ X(ρ) = ∅ for all ordinal numbers ρ: if x belongs to some play of infinite length, some position x_1 ∈ Γx belongs to A, hence x ∉ Γ⁺X(ρ) whenever A ∩ X(ρ) = ∅. If such an α does exist, A = A ∩ X(α) = ∅, hence the game is locally finite; conversely, if the game is locally finite, every position belongs to X(α) for α large enough.
An n-tuple σ = (σ_1, σ_2, ..., σ_n) of strategies for (1), (2), ..., (n) determines a play of the game. Define ⟨x; σ⟩ as the set of the positions met during that play, and

    f_i(x; σ) = sup {f_i(y) / y ∈ ⟨x; σ⟩} if (i) ∈ N⁻,
    f_i(x; σ) = inf {f_i(y) / y ∈ ⟨x; σ⟩} if (i) ∈ N⁺,

as the resulting pay-off for player (i). Define also

    (σ/τ_i) = (σ_1, σ_2, ..., σ_{i-1}, τ_i, σ_{i+1}, ..., σ_n),

the n-tuple obtained from σ by replacing σ_i by τ_i. After Nash, we shall say that σ is an equilibrium point for the initial position x_0 if one has

    f_i(x_0; σ/τ_i) ≤ f_i(x_0; σ)   for all i = 1, 2, ..., n and for all τ_i .

If these inequalities hold for every initial position x_0, σ is an equilibrium point of the game.

THEOREM 2 (Zermelo-von Neumann). If the game is locally finite, and the sets F_i = {f_i(x) / x ∈ X} are finite, there exists an equilibrium point.

PROOF. Define σ^α as the restriction to X(α) of the mapping we are trying to construct.
171
TOPOLOGICAL GAMES
In X(0) we define σ⁰ arbitrarily by σ⁰x = x for all x in X(0). Having defined σ^k on X(k) we shall define σ^{k+1} in the following way:

    1) if x ∈ X(k), take σ^{k+1}x = σ^k x;
    2) if x ∈ X(k + 1) - X(k), x ∈ X_i, take σ^{k+1}x = y such that

        f_i(y; σ^k) = max_{z ∈ Γx} f_i(z; σ^k) .

As it is obvious that σ⁰ is an equilibrium point of the subgame defined by X(0), let us prove that if σ^k is an equilibrium point of the subgame defined by X(k), then σ^{k+1} is an equilibrium point of the subgame defined by X(k + 1), i.e.,

    f_i(x; σ^{k+1}/τ_i^{k+1}) ≤ f_i(x; σ^{k+1})   for all i = 1, 2, ..., n, for all x in X(k + 1), for all τ^{k+1} .

Since this is obvious if x ∈ X(k), assume that x ∈ X(k + 1) - X(k). Suppose that x ∈ X_i and consider a strategy τ^{k+1} for player (i); to simplify one will write

    σ^{k+1}x = σx = y,   τ^{k+1}x = τx = z .

One has f_i(z; σ/τ_i)
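On a finite game without cycles the extension step used in this proof is ordinary backward induction. A minimal sketch, under an assumed encoding (the names `gamma`, `owner`, `payoff` are ours, not the paper's):

```python
# Backward induction in the spirit of the σ^k construction: at each position
# the player to move picks a successor maximizing his own component of the
# already-computed pay-off vector.
from functools import lru_cache

def equilibrium(gamma, owner, payoff):
    sigma = {}                      # the strategy profile being built

    @lru_cache(maxsize=None)
    def value(p):
        if not gamma[p]:            # p ∈ X(0): terminal, pay-offs are given
            return payoff[p]
        i = owner[p]
        best = max(gamma[p], key=lambda q: value(q)[i])
        sigma[p] = best             # σx = y with f_i(y; σ) maximal on Γx
        return value(best)

    for p in gamma:
        value(p)
    return sigma

gamma = {"r": {"l", "m"}, "l": {"t1", "t2"}, "m": set(),
         "t1": set(), "t2": set()}
owner = {"r": 0, "l": 1}
payoff = {"m": (1, 0), "t1": (2, 2), "t2": (0, 3)}
sigma = equilibrium(gamma, owner, payoff)
print(sigma)
```

Here the owner of l prefers t2, so the owner of r, anticipating this, plays m: the resulting profile is an equilibrium point of the subgame at each stage of the construction.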
    G_γ(k) = (1 ∪ Γ⁺B_1 ∪ Γ⁻B_2) G_γ(k - 1) .

Using transfinite induction one can define in the natural way

    G_γ(ω) = ∪_{k<ω} G_γ(k),   G_γ(ω + 1) = (1 ∪ Γ⁺B_1 ∪ Γ⁻B_2) G_γ(ω) ,

and so on. It is clear that G_γ is the set of positions from which (1) can strongly (strictly) guarantee γ, and we have G_γ = ∪_α G_γ(α) (the union is taken over all transfinite ordinal numbers α). In a lower topological game for (1), S_γ is open; moreover if G is open, the set (1 ∪ Γ⁺B_1 ∪ Γ⁻B_2)G is also open. Hence the sets G_γ(α) and G_γ are open.

THEOREM 6. In a lower topological game for (1) the best pay-off function v(x) of (1) is lower semi-continuous on X.
176

PROOF. Let ε > 0 and x_0 ∈ X; player (1) can guarantee γ = v(x_0) - ε at the position x_0; furthermore v(x) > v(x_0) - ε for all x in G_γ. G_γ is a neighborhood of x_0 (Theorem 5), therefore v(x) is lower semi-continuous at x_0. As x_0 is given arbitrarily, v(x) is lower semi-continuous on X.

THEOREM 7. If, in an upper topological game for (1), one has Γx ≠ ∅ for all x, and if we arbitrarily stop the game at the k-th move, then the set G_γ of positions in which (1) can guarantee γ is closed.

PROOF. Since Γx ≠ ∅ for all x, the sets Γ⁺F and Γ⁻F are closed for any closed set F; thus the set G_γ = G_γ(k) = (1 ∪ Γ⁺B_1 ∪ Γ⁻B_2)^k S_γ is closed.
THEOREM 8. If, in an upper topological game for (1), one has Γx ≠ ∅ for all x, and if we arbitrarily stop the game at the k-th move, the best pay-off function v(x) for (1) is upper semi-continuous.

PROOF. Let ε > 0 and x_0 ∈ X; at the position x_0 player (1) cannot guarantee γ = v(x_0) + ε, and one has v(x) < v(x_0) + ε for all x in X - G_γ. As X - G_γ is a neighborhood of x_0 after Theorem 7, the function v(x) is upper semi-continuous at x_0; as x_0 is arbitrarily given, v(x) is upper semi-continuous in X.
COROLLARY. In an upper topological game for (1), if Γx ≠ ∅ for all x, if Γx is a compact set for all x in X, and if we arbitrarily stop the game at the k-th move, there exists for (1) an optimal strategy.

PROOF. Let x_0 ∈ X; after Theorem 8 and since Γx_0 is compact, there exists a position y in Γx_0 such that

    v(y) = max_{z ∈ Γx_0} v(z) ;

if player (1) chooses such a position y he will adopt an optimal strategy.
177
TOPOLOGICAL GAMES
These propositions are applicable to the example above: the best pay-off function of player (2) is continuous, but the pay-off functions of players (1) and (3) are only upper semi-continuous if we stop the game at the k-th move. If, as in this example, the space X is a uniform topological space, more precise topological properties for v(x) can be obtained in studying the space of the strategies for player (1) with the strong topology and the weak topology ([3]). However we have not succeeded in finding a topological characterization of these functions.

The simplicity of the above results seems to be due to the global point of view adopted; i.e., we did not consider the game relatively to a given initial position x_0, but as being an additional structure on the space X. If X is provided with other structures than the topological one, the following problem arises: can the formula of the winning position, as given in the proof of Theorem 5, be used for further investigation (and particularly for existence theorems)?

REMARK.
One can obtain a slightly more general description in using preference ordering relations instead of preference functions. A binary relation ≥, defined on a set X, is a quasi-ordering if

    1) x ≥ x for all x ∈ X;
    2) x ≥ y, y ≥ z imply x ≥ z;
    3) for all x, y ∈ X, either x ≥ y or y ≥ x (or both).

If one has x ≥ y and y ≥ x, one says that x is equivalent to y and one writes x ≡ y; if x ≥ y but not y ≥ x, we shall write x > y. The preference of player (i) on X can be given by a quasi-ordering ≥_i, and a real valued bounded function f_i(x) is equivalent to such a quasi-ordering: x ≥_i y is equivalent to f_i(x) ≥ f_i(y).

If (i) ∈ N⁺, we shall write ⟨y_0, τ⟩ ≥_i ⟨x_0, σ⟩ if, for every position y in ⟨y_0, τ⟩, there exists a position x in ⟨x_0, σ⟩ such that y ≥_i x (if (i) ∈ N⁻ a similar definition is immediate). σ is an equilibrium point of the game if

    ⟨x_0, σ/τ_i⟩ ≤_i ⟨x_0, σ⟩   for all i, all τ_i, all x_0 .
178
BERGE
If X is a topological space, a relation of quasi-ordering ≥ on X is upper semi-continuous in x_0 if, for every point x_1 (> x_0), if such an x_1 exists, one has a neighborhood V(x_0) of x_0 such that x ∈ V(x_0) implies x < x_1. Reversing the inequalities, we shall define a lower semi-continuous quasi-ordering in x_0. So, all the preceding results can be obtained in terms of quasi-ordering. We used the preference function for the only reason that the notion of semi-continuity of a real valued function may be more familiar to the reader than that of semi-continuity of a quasi-ordering.

BIBLIOGRAPHY

[1] BERGE, C., "Une théorie ensembliste des jeux alternatifs," C. R. Ac. Sc. 241 (1955), pp. 294-296.

[2] BERGE, C., "Sur une généralisation du théorème de Zermelo-von Neumann," C. R. Ac. Sc. 241 (1955), pp. 455-457.

[3] BERGE, C., "Théorie des jeux (à n personnes)," Mém. des Sc. Math., Paris, 1957 (to be published).

[4] CHOQUET, G., "Convergences," Ann. Univ. Grenoble 23 (1947), pp. 57-97.

[5] FLEMING, W. H., "A note on differential games of prescribed duration," this Study.

[6] GALE, D., and STEWART, F. M., "Infinite games with perfect information," Annals of Mathematics Study No. 28 (Princeton, 1953), pp. 245-266.

[7] ISAACS, R., "Differential games," RAND Corporation Research Memoranda RM-1391, 1399, 1411, 1486 (1954-1955).

[8] KAKUTANI, S., "A generalization of Brouwer's fixed-point theorem," Duke Math. J. 8 (1941), pp. 457-459.

[9] KUHN, H. W., "Extensive games and the problem of information," Annals of Mathematics Study No. 28 (Princeton, 1953), pp. 193-216.

[10] MICHAEL, E., "Continuous selections," Annals of Math. 63 (1956), pp. 361-381.

[11] WOLFE, P., "The strict determinateness of certain infinite games," Pac. J. Math. 5 (1955), pp. 891-897.

[12] ZERMELO, E., "Über eine Anwendung der Mengenlehre auf die Theorie des Schachspiels," Proc. Fifth Int. Cong. of Math., Cambridge (1912), Vol. II, p. 501.
Claude Berge Economics Research Project Princeton University
STOCHASTIC GAMES WITH ZERO STOP PROBABILITIES

Dean Gillette

§1.
INTRODUCTION
A stochastic game has been defined by Shapley [1] to be a game consisting of a finite collection of positions among which two players pass according to jointly-controlled transition probabilities. Thus, if at the i-th position the players choose pure strategies k and l, the probability of passing to position j is given by p^{ij}_{kl} and the (finite) payoff to the first player is a^i_{kl}. Payments are to accumulate throughout the game.

Shapley has, in addition, required that, at each position, there is a positive probability s^i_{kl} of stopping the game. We may include the stop probability within our framework by requiring the existence of a position j_0 with a^{j_0}_{kl} = 0 and with p^{i j_0}_{kl} ≥ s > 0 and p^{j_0 j}_{kl} = 0 for all i, k, l, j.

In this paper, we shall investigate the consequences of relaxing the requirement that the game shall certainly end, i.e., we will allow the stop probability to be zero at some or all positions. It follows, of course, that the accumulated payoff may not be well defined. We introduce effective payoffs, defined over the entire game, and attempt to find optimum strategies for the players relative to these effective payoffs.

§2. STRATEGIES AND EFFECTIVE PAYOFFS
A strategy is taken to be a selection of a probability distribution on the alternatives at each position of the infinite game.
It is not
in general required that a given distribution be maintained on successive returns to a given position; a player is allowed to change his mind.
These
strategies are more properly labeled behavior strategies, but since we deal with no others, we will omit the modifier. Of particular interest are stationary strategies, in which for each of the finite number of distinct positions, a probability distribution is specified for use when that position is reached, independent of where it
179
might occur in the infinite game. A stationary strategy for the first player, on a game of M positions, may be represented by an M-tuple of probability distributions:

    x = (x^1, x^2, ..., x^M),   each x^i = (x^i_1, x^i_2, ..., x^i_{m_i}),

and similarly for the second player. We shall also use a pair (x, y) to denote non-stationary strategies for the two players, but shall specify the type of strategy where necessary.

The probability of passing from position i to position j, and the payoff to player one at position i, given stationary strategies (x, y), are respectively:

    p_{ij}(x, y) = Σ_{k,l} p^{ij}_{kl} x^i_k y^i_l
    A_i(x, y) = Σ_{k,l} a^i_{kl} x^i_k y^i_l .

Let P^0_{ij} denote the identity matrix, and for n ≥ 1, let:

    P^n_{ij}(x, y) = Σ_{m=1}^{M} p_{im}(x, y) P^{n-1}_{mj}(x, y) .

The payoff that player one has accumulated after traversing N + 1 positions, N ≥ 0, starting with position i, given stationary strategies (x, y), is:

    H^N_i(x, y) = Σ_{n=0}^{N} Σ_{j=1}^{M} P^n_{ij}(x, y) A_j(x, y) .

For non-stationary strategies, this formulation of H^N_i(x, y) is, of course, not applicable; however, we shall retain the notation for the accumulated payment, independent of the type of strategy.

An effective payoff is some function of the payoffs accumulated position by position during the course of the game. If the payoff after n + 1 positions have been played is discounted by (1 - s)^n, 0 < s < 1, we write the effective s-discounted payoff for strategies (x, y) as:
STOCHASTIC GAMES WITHOUT STOP

    D^s_i(x, y) = Σ_{n=0}^{∞} (1 - s)^n ( H^n_i(x, y) - H^{n-1}_i(x, y) ),

where H^{-1}_i(x, y) ≡ 0. We also define a limiting average payoff, L_i(x, y), by:

    L_i(x, y) = lim inf_{N→∞} (1/N) H^N_i(x, y) .

The lim inf is taken to insure existence. However any convex combination of lim inf and lim sup may be taken without destroying the character of later proofs.

Shapley's results [1] may be interpreted to demonstrate the validity of the following theorem.

THEOREM 1.
For the s-discounted effective payoff on a stochastic game there are stationary strategies (x*, y*) such that for all strategies (x, y) and all i = 1, ..., M:

    D^s_i(x, y*) ≤ D^s_i(x*, y*) ≤ D^s_i(x*, y) .

Moreover, if the game is of perfect information, a solution exists in stationary pure strategies.

(Note that this theorem concerns stochastic games, possibly with stop probabilities, and with a discount. However, with the formulation of a stop position as in the Introduction above, Shapley's Theorem 2 and Application 2 are applicable.)

In order to obtain an analogous result for the limiting average payoff, some lemmas are required.

§3. PRELIMINARY RESULTS

LEMMA 1.
Suppose f_n is a sequence of functions converging uniformly on X × Y to a function f. If for each n there exist (x_n, y_n) ∈ X × Y such that for all (x, y) ∈ X × Y:

    (A)   f_n(x, y_n) ≤ f_n(x_n, y_n) ≤ f_n(x_n, y)
then:

    (i)   inf_{y∈Y} sup_{x∈X} f(x, y) = sup_{x∈X} inf_{y∈Y} f(x, y) .

Moreover, for each ε > 0 there is an n such that if (x_n, y_n) satisfies (A) then for all m ≥ n:

    (ii)   f_m(x_n, y_n) - ε < f_m(x_n, y)   and   f_m(x, y_n) < f_m(x_n, y_n) + ε   for all (x, y) ∈ X × Y .

Choosing n so that for all m ≥ n and for all x, y,

    |f_m(x, y) - f(x, y)| < ε/4 ,

one may easily verify (ii), from which (i) follows.

LEMMA 2. If a_n is a sequence of non-negative numbers, then:

    lim sup_{N→∞} (1/N) Σ_{n=1}^{N} a_n ≤ lim sup_{s→0⁺} s Σ_{n=1}^{∞} (1 - s)^{n-1} a_n ,

    lim inf_{N→∞} (1/N) Σ_{n=1}^{N} a_n ≥ lim inf_{s→0⁺} s Σ_{n=1}^{∞} (1 - s)^{n-1} a_n .
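The two means appearing in Lemma 2 can be compared numerically; for a convergent non-negative sequence both approach the same limit. A quick sanity check (the helper names are ours):

```python
# Cesàro mean (1/N) Σ a_n versus the Abel-type mean s Σ (1-s)^(n-1) a_n
# for a_n = 1 + 1/n; both tend to 1 as N → ∞ and s → 0+.
def cesaro(a, n_max):
    return sum(a(n) for n in range(1, n_max + 1)) / n_max

def abel(a, s, terms=200000):
    return s * sum((1 - s) ** (n - 1) * a(n) for n in range(1, terms + 1))

a = lambda n: 1.0 + 1.0 / n
print(cesaro(a, 100000))    # close to 1
print(abel(a, 0.0001))      # close to 1
```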
A proof of this generalization of a theorem of Hardy and Littlewood may be constructed along the same lines as the proof of that theorem as given by Titchmarsh ([3], p. 227).

LEMMA 3. If P = {p_{ij}} is an M-dimensional stochastic matrix, with P^n the n-th power of P, then there is a stochastic matrix Q = {q_{ij}} such that:

    lim_{N→∞} (1/N) Σ_{n=1}^{N} P^n = Q .

Moreover, if there is an integer m and a number δ > 0 such that

    min_{ij} p^m_{ij} ≥ δ ,

then the numbers q_{ij} = q_j are independent of i, and

    | (1/N) Σ_{n=0}^{N-1} ( p^n_{ij} - q_j ) | ≤ (1 - Mδ)(1 - (1 - Mδ)^{N/m}) / (N M δ) .
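The Cesàro limit Q of Lemma 3 can be observed on a small chain with strictly positive entries; every row of the average of the powers approaches the stationary distribution (pure-Python sketch, our helper names):

```python
# Average (1/N) Σ_{n=1..N} P^n for a 2-state stochastic matrix whose
# stationary distribution is (2/3, 1/3); both rows of the result approach it.
def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def cesaro_matrix(p, big_n):
    m = len(p)
    power = [row[:] for row in p]              # P^1
    total = [[0.0] * m for _ in range(m)]
    for _ in range(big_n):
        for i in range(m):
            for j in range(m):
                total[i][j] += power[i][j]
        power = mat_mul(power, p)
    return [[t / big_n for t in row] for row in total]

p = [[0.9, 0.1], [0.2, 0.8]]
q = cesaro_matrix(p, 2000)
print(q)    # rows near (2/3, 1/3)
```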
These results, from the theory of Markov processes, may be obtained by a slight extension of the results given in Doob ([2], pp. 173-175).

§4.
THE LIMITING AVERAGE EFFECTIVE PAYOFF

We impose the condition that the payoffs to player one be non-negative. From the definition of the limiting average effective payoff, we see that this results in no loss of generality.

We first concentrate on stationary strategies. From the definitions and Lemma 3, we may write:

    L_i(x, y) = lim_{N→∞} (1/N) Σ_{n=1}^{N} Σ_{j=1}^{M} P^n_{ij}(x, y) A_j(x, y) .

It follows by Lemma 2 that:

    (B)   L_i(x, y) = lim_{s→0⁺} s D^s_i(x, y) .
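Relation (B) can be checked numerically for a fixed stationary pair; the induced chain p and pay-off vector a below are our illustrative choices:

```python
# v solves v = a + (1-s) p v, so v_i = D^s_i; then s*v_i should be near the
# limiting average, which here is the stationary expectation 2/3 of a = (1, 0).
def discounted(p, a, s, iters):
    v = [0.0] * len(a)
    for _ in range(iters):
        v = [a[i] + (1 - s) * sum(p[i][j] * v[j] for j in range(len(a)))
             for i in range(len(a))]
    return v

p = [[0.9, 0.1], [0.2, 0.8]]
a = [1.0, 0.0]
s = 0.001
v = discounted(p, a, s, iters=20000)
print([s * vi for vi in v])    # both entries near 2/3
```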
There is always a solution for the s-discounted payoff; consequently, from Lemma 1 there are at least ε-best strategies in the limiting average effective payoff, provided the passage to the limit is uniform over a set of (x, y) simultaneously containing solutions of an infinite sequence of s-discounted payoffs with the discount approaching zero. We consider several cases.

CASE 1. Perfect Information. The convergence in (B) is uniform over the (finite) set of all pure stationary strategies. Since this set also contains solutions for the s-discounted games for all s, we find that there are pure stationary strategies x*, y* so that for all pure stationary strategies (x, y), for all i:

    L_i(x*, y) ≥ L_i(x*, y*) ≥ L_i(x, y*) .
Now, let (x, y) be any pair of strategies, stationary or not. From the definitions of the effective payoffs and with the aid of Lemma 2, we have:

184

    L_i(x, y*) ≤ lim sup_{N→∞} (1/N) H^N_i(x, y*)
              ≤ lim sup_{s→0⁺} s Σ_{n=1}^{∞} (1 - s)^{n-1} ( H^n_i(x, y*) - H^{n-1}_i(x, y*) )
              = lim sup_{s→0⁺} s D^s_i(x, y*)
              ≤ lim sup_{s→0⁺} s D^s_i(x*, y*) = lim inf_{s→0⁺} s D^s_i(x*, y*) = L_i(x*, y*) ,

and

    L_i(x*, y*) = lim inf_{s→0⁺} s D^s_i(x*, y*) ≤ lim inf_{s→0⁺} s D^s_i(x*, y)
                = lim inf_{s→0⁺} s Σ_{n=1}^{∞} (1 - s)^{n-1} ( H^n_i(x*, y) - H^{n-1}_i(x*, y) )
                ≤ lim inf_{N→∞} (1/N) H^N_i(x*, y) = L_i(x*, y) .

Thus, we have:

THEOREM 2. For the limiting average effective payoff on a stochastic game of perfect information there are pure stationary strategies (x*, y*) such that for all strategies x and y, all i = 1, ..., M:

    L_i(x*, y) ≥ L_i(x*, y*) ≥ L_i(x, y*) .
CASE 2. Cyclic Stochastic Games. A cyclic stochastic game is one in which there exists a δ > 0 and N ≥ 1 such that:

    Σ_{j_1, ..., j_{N-1}} p^{i j_1}_{k_1 l_1} p^{j_1 j_2}_{k_2 l_2} ⋯ p^{j_{N-1} j}_{k_N l_N} ≥ δ

for all positions i, j and all pure choices k_1, l_1, ..., k_N, l_N. Here, independent of the choice of strategies, there is a positive probability of passing from position i to position j in exactly N moves. In particular, there can be no "absorbing subsets" of the positions of the stochastic game, nor is there a stop position.

It is clear that the transition probabilities P^N_{ij}(x, y) for a cyclic game satisfy the conditions of the second part of Lemma 3, simultaneously in all stationary strategies (x, y). As a result of this, it may be shown that, for stationary strategies, the passage to the limit (B) is uniform over all stationary strategies. Consequently, from Lemma 1 and Theorem 1, for stationary strategies (x, y) and all i:

    (C)   inf_y sup_x L_i(x, y) = sup_x inf_y L_i(x, y) .

Using the norm

    ||x - x̄|| = Σ_{i,k} |x^i_k - x̄^i_k| ,

it may be shown that the functions s D^s_i(x, y) are continuous in x and y. From the uniform convergence, it follows that L_i(x, y) is continuous, so that the inf and sup of (C) may be replaced by min and max. Using this fact and using Lemma 2 as in Case 1 we have:

THEOREM 3. For the limiting average effective payoff on cyclic stochastic games, there are stationary strategies x* and y* such that for all i:

    L_i(x, y*) ≤ L_i(x*, y*) ≤ L_i(x*, y) .

A Counterexample for the General Case. It is not in general the case that the unrestricted stochastic game will have a solution in stationary strategies for the limiting average effective payoff. Consider the following probabilities and payoffs for a 3-position game:

    a^1_{kl} = |1 0|
               |0 1| ,

    p^{11}_{11} = p^{11}_{12} = 1,   p^{12}_{21} = 1,   p^{13}_{22} = 1,
    p^{22}_{kl} = 1,   a^2_{kl} = 0,
    p^{33}_{kl} = 1,   a^3_{kl} = 1,

all other transition probabilities being 0. It may be verified that for stationary strategies:

    min_y max_x L_1(x, y) = 1/2,   max_x min_y L_1(x, y) = 0 .
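Assuming the matrices as reconstructed above, the stationary analysis is elementary: against a stationary y = (q, 1 - q), playing row 1 forever earns the long-run average q, while playing row 2 earns the absorbing value 1 - q. A sketch under that assumption:

```python
# Player one's best stationary-reply value against y = (q, 1-q) is
# max(q, 1 - q); minimizing over q gives 1/2, attained at q = 1/2.
best_reply = lambda q: max(q, 1.0 - q)
grid = [i / 1000 for i in range(1001)]
print(min(best_reply(q) for q in grid))    # 0.5
```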
186

Here, we have a non-cyclic game (position 2 may be considered a stop position or a repeating position with zero payoff). Convergence in (B) is non-uniform and the function L_i(x, y) is discontinuous in y at y^1 = (1, 0).

§5. REMARKS

1. The fact that a solution exists in stationary strategies is dependent upon the effective payoff. Consider the following two-position game:
    p^{11}_{kl} = |0 0|,   p^{12}_{kl} = |1 1|,
    p^{21}_{kl} = |1 1|,   p^{22}_{kl} = |0 0|,
    a^1_{kl} = |1 -1|,   a^2_{kl} = |-1 1| .
If we let the effective payoff be

    E_i(x, y) = lim sup_{N→∞} - |H^N_i(x, y)| ,

it may be verified that for any stationary x,

    min_y E_i(x, y) = -∞,   i = 1, 2,

but that there exists a non-stationary strategy x̄ that gives:

    min_y E_1(x̄, y) = 0,   min_y E_2(x̄, y) = -1 .

(The maximizing player should choose the same alternative number as did his opponent at the previous move.) We note also that here the addition of a constant to the payoff may seriously affect the results of the game.

2.
In general, the value of a stochastic game depends on the starting position. However, as may be deduced from Lemma 3, with the limiting average effective payoff for a cyclic game, the value is independent of the starting position.
It Is easy to show that if there are finite sets
such that If for all off lie In
(X, Y),
s < sQ ,
X
and
Y
optimal strategies of the s-discounted pay
then there is an
s1 < sQ
and pair
(x, y)
that is uniformly the solution for the s-discounted payoff, for the limiting average payoff.
in
s < s^
(X, Y) and
STOCHASTIC GAMES WITHOUT STOP In particular, games of perfect Information fall into this cate gory. One may ask if there is an analogous uniformity of solution in cyclic games. The following example (due to Shapley) answers the question In the negative:
I P21 1/3 P22 i '2^3 /3
11 - i !/l I/'t
p 12
-
1 1/31
o
ll ?l
II
4o)
is easily explained
by observing that the values of all positions In each circle are equal. Let us now generalize these results to infinite termination games. First consider Nim played with ordinal numbers instead of non-negative in tegers . Were there to exist a line of play in Ordinal Nim containing an infinite number of moves, then there would exist an infinite decreasing sequence of ordinal numbers, which Is impossible ([5], p.
2 4 6 ).
Conse
quently, we see that Ordinal Nim is a termination game. Given an initial set of ordinal numbers, consider mappings ordinals into
0
or
1
so that
Given two different mappings f 1(a ) < f2 (a ),
where
a
f ~ 1(l )
f1
and
f
of
is a finite set of ordinals.
f2,
let
f 1 < f2
mean that
is the largest ordinal on which they disagree.
This relation obviously forms a transitive ordering between such mappings. Consider a subset
F
of mappings.
If
F
contains a non-zero
mapping, define as the smallest ordinal so that for some f € F, f (cr^ ) = 1 but f ( c r ) = o for a > . Proceeding by math ema ti cal indue tion, given a^ > a2 > ••• > crn , f e F, f(cr) = 1 and if orf > a, for some
i = 1, ..., n] .
consider the set {a/ a < an ; for some then f (a1 ) = 1 if and only if a 1 = cr^
If this set is non-empty, define
smallest element in this set.
an + -]
as the
Since every decreasing sequence of ordinals
terminates, this sequence has a last element
a^.
Define
g (cr) = 1
if
HOLLADAY or € {c^,
..., cr-^}, g(a) = 0
the first mapping in
otherwise.
( [ ?’ ] ,
a representation of ordinal numbers binary (or base
Then
g
F. Since these mappings p.
is easily shown to he
are well ordered, they form
2 5 4 ).
Let us call this the
2) representation of ordinal numbers.
Let us now show that for Ordinal Nim, a position is safe if and only if, expressing the pile numbers as binary mappings, for each ordinal, the number of mappings which map it into one is even.
To show that Con
dition I on safe positions is satisfied by this dichotomic labeling, ob serve that the representation of the number zero is the zero mapping and that zero is an even number.
To show Condition II, observe that given a
safe position, a move will alter some function thereby destroying the evenness on at least one ordinal.
To show Condition III, given an unsafe
position, consider the largest ordinal which is mapped into one an odd number of times (this may
be
done since there are only a finite number
which get mapped into one at all). ordinal into one.
Consider a
mapping which sends this
Then the following is a move leading to a safe position:
Replace this mapping by that mapping which maps an ordinal into one if and only if an odd number of the other mappings map it into one. Since every termination game must end in a finite number of moves, no cyclical game may exist.
Therefore, the following definition cannot be
reflexive for any two positions.
Therefore, the relation is a partial
ordering (transitivity Is obvious). (6)
DEFINITION.
Given a game
be said to follow a position
G,
let a position
p2
p1
if and only if there
exists a sequence of moves in the game which leads from. P2
to
p 1.
(7 ) LEMMA.
Given a non-empty set
a game, there exists p
does not follow
a final element in PROOF. if
pn
Choose
pQ e P pQ .
terminate, there exists
of positions in
such that
p e P
implies
In other words, there exists
P.
p1 €
is not a final element
Since the game starting at
P
p1
pn e P
P. Proceeding by induction, given pn e of P,
choose
and going to such that
Pn+1 e P which follows p 2,
pn
to
p^
P, pn «
etc., must
is a final element of
P.
Notice that the method of proof of (8) is a generalization to partially ordered sets with a maximality principal of the well known justi fication of definition by induction on well-ordered sets ([5 ], P* 250). (8)
THEOREM. Given a game G, there exists one and

195

TERMINATION GAMES

only one mapping f of positions in G to ordinal numbers so that, for a position p, f(p) is the smallest ordinal σ such that there is no legal move from p to a position p_1 for which f(p_1) = σ.

PROOF. Let us define a terminal T as a set of positions such that whenever p ∈ T and p' follows p, then p' ∈ T. If f_1 and f_2 were two mappings of a terminal T into ordinals which satisfy the above description and f_1 ≠ f_2, then P = {p / p ∈ T, f_1(p) ≠ f_2(p)} would be non-void. Let p_0 be a final element of P. Then any move on p_0 leads to a position in T - P, on which f_1 = f_2, so f_1(p_0) must equal f_2(p_0), which is a contradiction.

Consider a partial ordering of terminals by the relation contains. Given any ordered set of terminals on which f exists, because of its uniqueness on each terminal, it exists on the union of these ordered terminals. Therefore, using Zorn's Lemma, there exists a maximal terminal T_0 on which f exists. If T_0 is a proper subset of G, choose a final position p_0 of G - T_0. Then T_0 ∪ {p_0} is also a terminal and f may be extended to p_0. This contradicts the maximality of T_0 and so this theorem is proved.

(9)
THEOREM. Given a game, there exists a unique mapping of positions into the words safe and unsafe which satisfies the three conditions at the beginning of this article. Furthermore, this mapping labels a position as safe if and only if a player leaving his opponent with that position may force a win.

PROOF. If, from (8), we label a position as safe if and only if it maps into zero, we have found a labeling satisfying Conditions I, II and III. To prove uniqueness, it is sufficient to prove the second sentence of this theorem. To win, use the following strategy: leave the opponent with a safe position. By Condition II, the opponent must leave an unsafe position. Using Condition III, one may again leave the opponent with a safe position. Since, whenever a safe position is left, it is the opponent's move, Condition I guarantees a win. If one leaves the opponent with an unsafe position, he may likewise use this strategy. Therefore, one may always force a win if and only if he leaves a safe position.

(10) LEMMA. If p is a position in G, σ an ordinal
HOLLADAY number and f the mapping of (8 ), then (p, cr) is safe in G x one-pile Ordinal Nim if and only if f(p) - cr. PROOF. Condition I is satisfied since if (p, cr) is a termina ting position, so is p and f(p) = o = cr. To show Condition II, observe that If (p, cr) to (pf, a) is a move, then f (p?) 4 f (p). Toshow Con dition III, If f(p) > cr, there exists a move to (p!, cr) wheref(p') = a If f(p) < cr, one may move to (p, f(p)). Now, using (1 0 ) in place of (1), one may proceed as before to prove (3 ) and its corollary for non-finite terminating games. The value of a position is the ordinal number assigned to it by the function of (8 ). An. Interesting generalization of the game of Nim is the follow ing due to E. Ho Moore [6 ]. The rules for "Nim^." (k a positive integer) are the same as for Nim itself (which Is MNim1" ) save that for each move the player may take at will from any number of piles not exceeding k. As Moore discovered, a position In "Nim^” is safe If and only if, express ing the pile numbers as binary numbers, each power of two is represented zero times modulo k + 1 . It may easily be shown that for "Ordinal Nim^", this is still the solution where the binary expansion of an ordinal number Is the expansion referred to earlier in this paper. Given games G 1, ..., Gn, let(G] x ... x Gn )k be the same as for G 1 x aea x Gn (which Is (G1 x ... x G)1) save that for each move the player may move at will from any number of games not exceeding k. An interesting question Is whether a position in (G1 x ... x Gn )^ is safe If and only If the set of values of the component positions is safe in "Nim^.". In fact, let us consider a termination game consisting of termina tion games G 1, ..•, G for which a move consists of making moves from certain authorized subsets of the games G 1, ..., G . Is a position in this generalized Cartesian product game safe if and only if the values of the component positions is safe in the corresponding Nim-like game? 
The answer to the question just posed is sometimes, but not usually. To see this, consider the three conditions on safe positions. Condition I would be satisfied, since the component positions of a terminating position are all terminating positions, whose values are therefore zero. Also, Condition III would be satisfied: given a position for which the component values are unsafe in the Nim-like game, choose a move in the Nim-like game which would change these values to a safe position. Using (5) (or (8)) as a definition of value, for each component position corresponding to a pile which is to be lowered, there exists a move which leaves a position whose value equals the desired pile size in the Nim-like game. The only problem is in satisfying Condition II.
TERMINATION GAMES
For the corresponding Nim-like game, Condition II implies that for any safe position, a move leaves an unsafe position. However, in the Nim-like game, all moves lower pile sizes and never raise them, and (5) (or (8)) does not guarantee that a move will leave a position of lower value. As an illustration, consider the well-known "Modulo Game". A position consists of a non-negative number (or pile size), and a move consists of reducing the size of the pile by any amount not to exceed m, where m is a specified positive integer. The value of a position is the size of the pile modulo m + 1.
In four-pile "Nim_2", the position (1, 2, 3, 3) is safe. In the modulo game M where m = 3, the values of 5, 2 and 3 are 1, 2 and 3 respectively. Therefore, one might expect (5, 2, 3, 3) to be safe in (M × M × M × M)_2. However, this position is unsafe, since one may move to (2, 1, 3, 3), which is safe.
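The counterexample can be verified by exhaustive search. The sketch below (the algorithm and names are ours; the text gives no program) computes the safe positions of (M × M × M × M)_2 with m = 3 directly from the three conditions: a position is safe exactly when every move from it leads to an unsafe position.

```python
from functools import lru_cache
from itertools import combinations, product

M = 3  # a move in the modulo game lowers a pile by 1..m, with m = 3
K = 2  # at most k = 2 component games may be moved in at once

def moves(pos):
    """All positions reachable from pos in one move of (M x M x M x M)_2."""
    result = []
    for r in range(1, K + 1):
        for subset in combinations(range(len(pos)), r):
            if any(pos[i] == 0 for i in subset):
                continue
            ranges = [range(1, min(M, pos[i]) + 1) for i in subset]
            for deltas in product(*ranges):
                nxt = list(pos)
                for i, d in zip(subset, deltas):
                    nxt[i] -= d
                result.append(tuple(nxt))
    return result

@lru_cache(maxsize=None)
def safe(pos):
    # Terminating positions are safe; otherwise a position is safe iff
    # no move reaches a safe position.
    return all(not safe(nxt) for nxt in moves(pos))

print(safe((2, 1, 3, 3)))   # True: with all piles <= m the subgame is just Nim_2
print(safe((5, 2, 3, 3)))   # False, although its mod-4 values (1,2,3,3) are safe in Nim_2
```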
For (G_1 × ... × G_n)_k, where k > 1, if n ≥ k + 1, then the theory breaks down. A position in (k + 1)-pile "Nim_k" is safe if and only if all pile sizes are equal. Since altering the sizes of any k or fewer piles destroys the equality, the following is true: a position in (G_1 × ... × G_{k+1})_k is safe if and only if the values of all of the component positions are equal. Likewise, a position in (G_1 × ... × G_n)_n is safe if and only if all of the component positions are safe. Cartesian products of the k-pile "Nim_k" type, and of the other types which satisfy this theory, will also satisfy this theory. For instance, a position (p_1, ..., p_6) in ((G_1 × G_2 × G_3)_2 × (G_4 × G_5 × G_6))
is safe if and only if f(p_1) = f(p_2) = f(p_3) and (f(p_4), f(p_5), f(p_6)) is safe in 3-pile Nim, where f is the value function of (8).

In k-pile "Ordinal Nim_k", we find from (8) that the value of a position (σ_1, ..., σ_k) is σ_1 + ... + σ_k, where we define inductively σ_1 + ... + σ_k as the first ordinal number which exceeds all ordinals of the form σ_1′ + ... + σ_k′, where σ_i′ ≤ σ_i (i = 1, ..., k) and at least one of these inequalities is strict. It may easily be seen that this concept of sum coincides with the concept of "natural sum" due to Hessenberg [7]. The "natural sum" is the sum obtained from term-by-term addition of the base ω expansions of the ordinal numbers (mappings of ordinals into the non-negative integers for which only a finite number of ordinals map into positive integers). However, one could also describe this "natural sum" from expansions in other bases as, for instance, the base two described earlier in this article. The need for carrying in such other bases is no problem at all, since each ordinal has an immediate successor. Notice that if we express the pile numbers in Ordinal Nim as expansions to the base ω, a position is safe if and only if, for each ordinal, the integers into which it maps form a safe position for integer Nim.
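For ordinals below ω^ω the natural sum is easy to compute from base-ω expansions. This illustrative sketch (the dictionary encoding and names are ours) adds the expansions term by term, exactly as described above; no carrying is needed.

```python
def natural_sum(a, b):
    """Hessenberg natural sum of ordinals below omega**omega, each given as a
    dict {exponent: coefficient} of its (finite) base-omega expansion."""
    total = dict(a)
    for exp, coeff in b.items():
        total[exp] = total.get(exp, 0) + coeff
    return {e: c for e, c in total.items() if c > 0}

# (omega + 3) natural-sum (omega*2 + 1) = omega*3 + 4,
# whereas the ordinary (non-commutative) ordinal sum would be omega*3 + 1.
print(natural_sum({1: 1, 0: 3}, {1: 2, 0: 1}))   # {1: 3, 0: 4}
```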
Consider Cartesian products of the type satisfying the analogue of (3) (a position is safe if and only if the set of values of the component positions is safe in the corresponding Nim-like game). One might then suspect that the analogue of (4) would follow (the value of a position is the value of the position in the corresponding Nim-like game determined by the values of the component positions). However, this speculation is not necessarily true. For example, consider (M × N)_2, where M is the modulo game described earlier and N is one-pile Ordinal Nim. Then the value of a position (i, σ) is the smaller of the following two ordinal numbers:

    (1)  σ + i,
    (2)  σ(m + 1) + r,

where r is the remainder obtained upon dividing σ + i by m + 1. The concept of multiplication used here is the usual non-commutative one, obtained by letting σ(m + 1) be the ordinal corresponding to the order type obtained by ordering the pairs (σ′, n), 1 ≤ σ′ ≤ σ, 1 ≤ n ≤ m + 1, by (σ_1, n_1) < (σ_2, n_2) if and only if either σ_1 < σ_2, or σ_1 = σ_2 and n_1 < n_2.
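The non-commutative product σ(m + 1) can be made concrete for small ordinals. The sketch below (the pair encoding is our own) multiplies an ordinal below ω² on the right by a positive integer, following the order-type definition just given.

```python
def times_nat(o, n):
    """Right-multiply an ordinal below omega**2 by a positive integer n.
    An ordinal omega*a + b is encoded as the pair (a, b)."""
    a, b = o
    if a == 0:
        return (0, b * n)           # a purely finite ordinal
    # (omega*a + b)*n is n stacked copies; every copy but the last has its
    # finite tail b absorbed by the next omega*a, leaving omega*(a*n) + b.
    return (a * n, b)

print(times_nat((1, 1), 2))   # (2, 1): (omega + 1)*2 = omega*2 + 1
# By contrast (m + 1)*omega = omega, so this multiplication is not commutative.
```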
To show that this valuation satisfies (8), let us first show that a move changes the value. For a move (i, σ) to (i′, σ′): if σ′ < σ, then

    σ′ + i′ < σ + i   and   σ′(m + 1) + r′ < σ(m + 1) + r,

so f(i′, σ′) < f(i, σ). If σ′ = σ, then

    f(i, σ) − f(i′, σ) ≡ i − i′ ≢ 0 (mod m + 1).

If i ≤ m, then (i, σ) is equivalent to a position in "two-pile Nim_2", so

    f(i, σ) = σ + i ≤ σ(m + 1) + r.

Now we need but show that when i > m and v is an ordinal less than f(i, σ), one may move to a position whose value is v. Define k as the ordinal such that 0 ≤ v − k(m + 1) ≤ m, and define j so that 0 ≤ i − j ≤ m and v ≡ k + j (mod m + 1). Then we get three cases.

CASE I. v ≤ k + j. Then k(m + 1) + r ≤ σ(m + 1) + r and k ≤ σ, so v = f(j, k); since k ≤ σ and j ≤ i, (i, σ) to (j, k) is a move.

CASE II. k + j < v ≤ σ + j. Then (i, σ) to (j, σ) is a legal move and f(j, σ) = v.

CASE III. v > σ + j. Then v < f(i, σ) ≤ σ + i implies that i > v − σ. Also, since k + j < v, we have v − σ > j ≥ i − m, and we see that (i, σ) to (v − σ, σ) is a move, with f(v − σ, σ) = σ + (v − σ) = v.
GAMES WITH PARTIAL INFORMATION
SCARF AND SHAPLEY

The next step is to replace this inequality by an equality, and this is accomplished by the following reasoning. Let x*(a_{n+1}|a_{n−λ+2}, ..., a_n) be the initial component of an optimal behavior strategy for Player 1 in G_n. Since the strategy is optimal, it can be told to Player 2 without degrading Player 1's expected return. Let Player 2 choose b_{n+1} so as to minimize

    Σ p(a_{n−λ+2}) V(I_{n+1}; p*_{n+1}(·|a_{n−λ+2})),

where p*_{n+1}(·|a_{n−λ+2}) is compounded from p_n(·) and x*(a_{n+1}|a_{n−λ+2}, ..., a_n) in the obvious way. Then with probability p(a_{n−λ+2}) the common fund of information available to both players is I_{n+1}. Now if Player 2 continues his strategy by playing an optimal strategy in G_{n+1}(I_{n+1}; p*_{n+1}(·|a_{n−λ+2})), it is clear that he will prevent Player 1 from getting an expectation greater than

    Σ p(a_{n−λ+2}) V(I_{n+1}; p*_{n+1}(·|a_{n−λ+2})),

which, from the way that b_{n+1} was chosen, is equal to

    Min_{b_{n+1}} Σ p(a_{n−λ+2}) V(I_{n+1}; p*_{n+1}(·|a_{n−λ+2})).

Since Player 1 was assumed to be playing optimally, this last quantity must be no less than V(I_n; p_n(·)), and we obtain

    V(I_n; p_n(·)) ≤ Max_{x(a_{n+1}|a_{n−λ+2}, ..., a_n)} Min_{b_{n+1}} Σ p(a_{n−λ+2}) V(I_{n+1}; p_{n+1}(·|a_{n−λ+2})).

Combining this with the previous inequality, we obtain the desired functional relationship.

THEOREM 1. Let G_0 be a game with time lag λ (written in the form k = 1, ℓ = λ) which has a continuous payoff. Let V(I_n; p_n(·)) be the value of the subgame in which both players' information about the past is I_n = (a_1, ..., a_{n−λ+1}; b_1, ..., b_n) and in which Player 1's previous λ − 1 moves are governed by the joint probability distribution p_n(·) = Prob(a_{n−λ+2}, ..., a_n). Then

    V(I_n; p_n(·)) = Max_{x(a_{n+1}|a_{n−λ+2}, ..., a_n)} Min_{b_{n+1}} Σ p(a_{n−λ+2}) V(I_{n+1}; p_{n+1}(·|a_{n−λ+2})),

where

    p(a_{n−λ+2}) = Σ_{a_{n−λ+3}, ..., a_n} p(a_{n−λ+2}, ..., a_n)

and

    p_{n+1}(·|a_{n−λ+2}) = p(a_{n−λ+2}, ..., a_n) x(a_{n+1}|a_{n−λ+2}, ..., a_n) / p(a_{n−λ+2}).
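Each stage of the functional equation in Theorem 1 is itself a finite zero-sum game: a Max over Player 1's randomization against a Min over Player 2's reply. As a hedged numerical aside (fictitious play is our own choice of method here, not the authors'), such a matrix game can be solved approximately; the two empirical mixtures always bracket the value.

```python
def fictitious_play_bounds(A, iters=20000):
    """Approximate the value of the zero-sum matrix game A (row player
    maximizes) by fictitious play; returns (lower, upper) bounds on the value."""
    m, n = len(A), len(A[0])
    row_counts = [1] + [0] * (m - 1)   # arbitrary opening plays
    col_counts = [1] + [0] * (n - 1)
    for _ in range(iters):
        # each player best-responds to the opponent's empirical mixture
        row_scores = [sum(A[i][j] * col_counts[j] for j in range(n)) for i in range(m)]
        row_counts[row_scores.index(max(row_scores))] += 1
        col_scores = [sum(A[i][j] * row_counts[i] for i in range(m)) for j in range(n)]
        col_counts[col_scores.index(min(col_scores))] += 1
    t = iters + 1
    # any mixture gives a guaranteed floor (row) or ceiling (column) on the value
    lower = min(sum(A[i][j] * row_counts[i] for i in range(m)) / t for j in range(n))
    upper = max(sum(A[i][j] * col_counts[j] for j in range(n)) / t for i in range(m))
    return lower, upper

# Rock-paper-scissors has value 0; the bounds must bracket it.
lo, hi = fictitious_play_bounds([[0, -1, 1], [1, 0, -1], [-1, 1, 0]])
print(lo <= 0 <= hi)   # True
```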
§4. OPTIMAL STRATEGIES FOR PLAYER 1
In this section, we shall show that a class of optimal strategies for Player 1 in the game with time lag λ can be derived from the functional equations that we have established in the preceding section. As before, we assume that the game is represented in the form k = 1, ℓ = λ > 0. Optimal strategies for the case λ = 0 are obtained by the process outlined in the Introduction. The first of the functional equations relates the value of G_0 (the actual value of the game itself) to the value of G_1(I_1; p_1(·)). For λ ≥ 2 the first equation is

    V = Max_{x(a_1)} Min_{b_1} V(I_1; p_1(·)),

with I_1 = b_1 and p_1(·) = x(a_1). Let us define the components of a behavior strategy for Player 1 in the following recursive fashion. Let x(a_1) be chosen so as to maximize Min_{b_1} V(I_1; p_1(·)). Call such a maximizing distribution x*(a_1).
In general, if x*(a_1), x*(a_2|a_1; b_1), ..., x*(a_n|a_1, ..., a_{n−1}; b_1, ..., b_{n−1}) are known, then for each I_n = (a_1, ..., a_{n−λ+1}; b_1, ..., b_n) we form the joint probability distribution p*_n(·) given by the following product of λ − 1 factors:

    x*(a_{n−λ+2}|a_1, ..., a_{n−λ+1}; b_1, ..., b_{n−λ+1}) ⋯ x*(a_n|a_1, ..., a_{n−1}; b_1, ..., b_{n−1}).

Then x*(a_{n+1}|a_1, ..., a_{n−λ+1}; b_1, ..., b_n) is chosen equal to any x(a_{n+1}|a_{n−λ+2}, ..., a_n) that maximizes

    Min_{b_{n+1}} Σ p*(a_{n−λ+2}) V(I_{n+1}; p*_{n+1}(·|a_{n−λ+2})).

As I_n takes on all conceivable values, we obtain the complete behavior strategy component x*(a_{n+1}|a_1, ..., a_n; b_1, ..., b_n).
We want to show that this method for selecting the components of a behavior strategy for Player 1 leads to an optimal strategy. Suppose that such a sequence has been chosen. Let us define the sequence of functions V*(I_n) to be equal to V(I_n; p*_n(·)). They have the property that

    I.   Min_{b_{n+1}} Σ x*(a_{n−λ+2}|a_1, ..., a_{n−λ+1}; b_1, ..., b_{n−λ+1}) V*(I_{n+1}) = V*(I_n),

    II.  Min_{b_1} V*(I_1) = V,

and

    III. lim_{n→∞} V*(I_n) = M(a, b)

uniformly, where M is the payoff function.
Property I is a direct consequence of the definitions. Property II follows from the application of the initial functional equation. Property III is a direct consequence of the continuity of the payoff function, which implies that the values of the subgames approach the payoff function for large fixed initial segments.

Now let us suppose that the strategy {x*} is played against an arbitrary mixed strategy for Player 2, which is represented in behavior strategy form by the sequence of conditional distributions y(b_{n+1}|a_1, ..., a_{n−λ+1}; b_1, ..., b_n). These two strategies give rise to a measure on the space of all sequences of a's and b's with the property that

    prob(a_1, ..., a_n, b_1, ..., b_n) = x*(a_1) ⋯ x*(a_n|a_1, ..., a_{n−1}; b_1, ..., b_{n−1}) y(b_1) ⋯ y(b_n|a_1, ..., a_{n−λ}; b_1, ..., b_{n−1}),

and the functions V*(I_n) become a sequence of random variables. Then

    E(V*(I_{n+1})|I_n) = E(V*(I_{n+1})|a_1, ..., a_{n−λ+1}, b_1, ..., b_n)
        = Σ_{a_{n−λ+2}, ..., a_{n+1}, b_{n+1}} [prob(a_1, ..., a_{n+1}, b_1, ..., b_{n+1}) / prob(a_1, ..., a_{n−λ+1}, b_1, ..., b_n)] V*(I_{n+1}),
which in turn is equal to

    Σ_{a_{n−λ+2}, b_{n+1}} x*(a_{n−λ+2}|a_1, ..., a_{n−λ+1}; b_1, ..., b_{n−λ+1}) y(b_{n+1}|a_1, ..., a_{n−λ+1}; b_1, ..., b_n) V*(I_{n+1}),

because I_{n+1} does not depend on a_{n−λ+3}, ..., a_{n+1}. Property I tells us that this last expression is not less than V*(I_n), and we obtain

    E(V*(I_{n+1})|I_n) ≥ V*(I_n).

If we integrate out the conditioning variables and apply Property II, we obtain

    E(V*(I_{n+1})) ≥ V,

and applying Property III yields E(M) ≥ V, which tells us that our strategy is optimal.

There may be some question at this point as to which optimal strategies of Player 1 are obtained from the functional equation by the procedure outlined above. It is quite easy to give examples in which not all of Player 1's optimal strategies are obtained in this way. It is true, but we shall not prove it at this point, that the class of strategies obtained from the functional equation will include the class of "best" strategies for Player 1 [8, p. 84] (if we disregard those portions of a strategy that refer to situations of measure zero).

THEOREM 2. If the components of a behavior strategy for Player 1 are chosen recursively in the way outlined above, this strategy is optimal.
§5. OPTIMAL STRATEGIES FOR PLAYER 2

To obtain optimal strategies for Player 2, the game must be represented in the form k = λ + 1, ℓ = 0. The diagram for the n-th subgame in this case is

    [diagram of the moves in this subgame omitted]

The game is specified by I_n = (a_1, ..., a_{n+1}; b_1, ..., b_{n−λ}) and q_n(·) = Prob(b_{n−λ+1}, ..., b_n). We notice that the first move in this subgame is made by Player 2. If we denote its value by V(I_n; q_n(·)), the functional equation is

    V(I_n; q_n(·)) = Min_{y(b_{n+1}|b_{n−λ+1}, ..., b_n)} Max_{a_{n+2}} Σ q(b_{n−λ+1}) V(I_{n+1}; q_{n+1}(·|b_{n−λ+1})),

and by using the same techniques as in Section 4, it is possible to compute an optimal strategy for Player 2 in a recursive fashion from this functional equation.
§6. THE GENERALIZED SUBGAMES (OTHER VALUES OF k AND ℓ)

In this section we consider the representation of our game with time lag λ for general values of k and ℓ (k ≥ 1, ℓ ≥ 0, λ = k + ℓ − 1). For each value of (k, ℓ) there will be two classes of subgames, depending on which player moves first. These will be generalizations of either the games discussed in Section 3 or those discussed in Section 5. In what follows we shall restrict our attention to the former. The diagram for the n-th subgame is given by

    [diagram of the n-th subgame omitted]

This subgame is described, first of all, by the fund of information known to both players after Player 2 has made his n-th move. In this
case it will be a specification of Player 1's first n − ℓ + 1 moves and Player 2's first n − k + 1 moves, say I_n = (a_1, ..., a_{n−ℓ+1}; b_1, ..., b_{n−k+1}). We are also given an arbitrary pair of joint probability distributions p_n(·) = Prob(a_{n−ℓ+2}, ..., a_n) and q_n(·) = Prob(b_{n−k+2}, ..., b_n). The game will be denoted by G_n(I_n; p_n(·), q_n(·)), and it proceeds as follows: the moves a_{n−ℓ+2}, ..., a_n are randomized from p_n(·) and told to Player 1 but not to Player 2. Simultaneously, the moves b_{n−k+2}, ..., b_n are randomized from q_n(·) and told to Player 2 but not to Player 1. They then proceed as they would in the original game, with the same payoff M(a, b). We still assume that this payoff is continuous, so that the general theorem of [9] applies and the game has a value, which we denote by V(I_n; p_n(·), q_n(·)).

As before, there exists a sequence of functional relations. They take the form

    V(I_n; p_n(·), q_n(·)) = Max_{x(a_{n+1}|a_{n−ℓ+2}, ..., a_n)} Min_{y(b_{n+1}|b_{n−k+2}, ..., b_n)} Σ p(a_{n−ℓ+2}) q(b_{n−k+2}) V(I_{n+1}; p_{n+1}(·|a_{n−ℓ+2}), q_{n+1}(·|b_{n−k+2})) = Min Max (the same expression).

The proof is quite similar to the proof given in Section 3, and we shall not repeat it here.

We would like to indicate the major difference between the functional equations in this case and the functional equations that were discussed in Sections 3 and 4. In those sections we showed how an optimal strategy for Player 1 could be computed recursively from the functional equation. The corresponding procedure for the present case would be the following. Suppose that x*(a_1), ..., x*(a_n|a_1, ..., a_{n−1}; b_1, ..., b_{n−k}) have been computed. Then for any I_n = (a_1, ..., a_{n−ℓ+1}; b_1, ..., b_{n−k+1}) we would consider a game G_n(I_n; p_n(·), q_n(·)) with p_n(·) defined by

    p_n(·) = x*(a_{n−ℓ+2}|a_1, ..., a_{n−ℓ+1}; b_1, ..., b_{n−λ+1}) ⋯ x*(a_n|a_1, ..., a_{n−1}; b_1, ..., b_{n−k}),

and for some q_n(·) which for the moment we leave undefined. Then x(a_{n+1}|a_{n−ℓ+2}, ..., a_n) would be chosen so as to maximize

    Min_{y(b_{n+1}|b_{n−k+2}, ..., b_n)} Σ p(a_{n−ℓ+2}) q(b_{n−k+2}) V(I_{n+1}; p_{n+1}(·|a_{n−ℓ+2}), q_{n+1}(·|b_{n−k+2})).

It is clear that this choice would depend on q_n(·), and we would therefore only be able to show that the strategy we have chosen is optimal against a particular choice of Player 2's strategy. This means that in order to obtain an optimal strategy for Player 1 in an arbitrary (k, ℓ) representation (λ > 0), we must transform this representation into k = 1, ℓ = λ by renumbering the moves of one of the players and adding several vacuous moves at the beginning of the game, and then apply the method of Theorem 2. To obtain optimal strategies for Player 2, we must transform into k = λ + 1, ℓ = 0.
§7. REMARKS
In our discussion we have consistently assumed that the payoff function is continuous. This has permitted us to say that each subgame under discussion has both a value and optimal strategies for either player, and there is at least one point in the proof that we have given for the validity of the functional equation at which the existence of optimal strategies was specifically used. There are other conditions under which our subgames may be shown to have a value. For example, as is shown in [9], if the payoff function is upper (lower) semi-continuous, then each subgame has a value and optimal strategies exist for Player 1 (2). The question arises as to whether the functional equations relating the values of these subgames are still valid. It can be shown that a modification of the argument of Section 3 yields the same functional equation, with Max Min replaced by Max Inf (Sup Min). It is also true that the optimal strategies of the player who has them in the semi-continuous case can be generated by means of the functional equations. On the other hand, very little can be said about the other player's strategies from the functional equation.

To illustrate this point, let us assume that the payoff is lower semi-continuous, so that the maximizing player does not necessarily have an optimal strategy. Let us recursively pick strategies for Player 1 by choosing an x*(a_{n+1}|a_{n−λ+2}, ..., a_n) which is ε/2^{n+1}-effective in

    Min_{b_{n+1}} Σ p(a_{n−λ+2}) V(I_{n+1}; p_{n+1}(·|a_{n−λ+2})),

and, as before, define V*(I_n) = V(I_n; p*_n(·)). Then it will be true that against any strategy for Player 2 we have

    E(V*(I_{n+1})|I_n) ≥ V*(I_n) − ε/2^{n+1},

and therefore E(V*(I_n)) ≥ V − ε. But lower semi-continuity weakens Property III of Section 4 to:

    III′. lim inf_{(a′, b′) → (a, b)} M(a′, b′) ≥ lim_{n→∞} V*(I_n).

As a result, we can conclude that

    E(lim inf_{(a′, b′) → (a, b)} M(a′, b′)) ≥ V − ε,

but not that E(M(a, b)) ≥ V − ε.
Even without the conditions of semi-continuity on the payoff function, it is still meaningful to talk about the functional equations. Without any conditions on the payoff function, if we can find a solution of the functional equations

    V(I_n; p_n(·)) = Max_{x(a_{n+1}|a_{n−λ+2}, ..., a_n)} Min_{b_{n+1}} Σ p(a_{n−λ+2}) V(I_{n+1}; p_{n+1}(·|a_{n−λ+2}))

with the property that V(I_n; p_n(·)) → M(a, b) (say boundedly), then the strategy for Player 1 which is generated recursively from these equations will guarantee Player 1 at least V(I_0) against any strategy of Player 2.
§8. EXAMPLE
It may be instructive to show how the above techniques can be applied to the celebrated "bomber-battleship" game to yield functional equations. (References [3], [5], [6], [7].) We shall do this in two ways, according to the two arrangements:

    Case I:  k = 1, ℓ = 2;
    Case II: k = 3, ℓ = 0 (a_1 and a_2 vacuous).

    [diagrams of the two move arrangements omitted]

The b_n will each be 0 or 1; the a_n will each be 0, 1, 2, or Θ (pass). The first a_n ≠ Θ is interpreted as a prediction that

    b_n + b_{n+1} = a_n (in Case I)   or   b_{n+1} + b_{n+2} = a_n (in Case II).

The payoff is 1 to the a-player for a correct prediction, 0 for an incorrect prediction or no prediction. This payoff is lower semi-continuous, so optimal strategies are assured only for the b-player. (This formulation follows Blackwell [1].)

In Case I we observe that the generalized subgame G_n(I_n; p_n(·)) is trivial if any a_i ≠ Θ in I_n. On the other hand, G_n(I_n; p_n(·)) and G_m(I_m; p_m(·)) are completely isomorphic provided that p_n = p_m, b_n = b_m, and all a_i, a_j are Θ (i.e., the earlier b_i, b_j do not matter). Hence we may write
    V(I_n; p_n(·)) = f_v(x_0, x_1, x_2)   if n ≥ 1,

where v = b_n and x_a = p_n(a). Moreover, symmetry tells us that f_0(x_0, x_1, x_2) = f_1(x_2, x_1, x_0). Hence the functional equations of Theorem 1, n ≥ 1, reduce to the single equation

    (1)  f(x, y, z) = Max_{u, v, w} Min [t f_0(u, v, w) + x, t f_0(w, v, u) + z],

where t = 1 − x − y − z; a similar Max Min expression holds for the value of the game. Equation (1) is the same as equation (34) in Isaacs' paper [6] and forms the basis of his analysis of ε-optimal strategies for the a-player. The unique "ideal" (locally optimal) strategy is given by x_0 = x_1 = x_2 = 0 (i.e., never predict) and is clearly not optimal.

In Case II, G_n(I_n; q_n(·)) is again trivial if any a_i ≠ Θ in I_n, while the other games are entirely independent of all b_j in I_n.
DUBINS

Thus, given ε > 0, there exists an n_0 such that

    |θ_{n_0} − θ| < ε/2.

Now choose v ∈ S_2 such that for all u ∈ S_1,

    F_{n_0}(u, v) ≥ θ_{n_0} − ε/2;

this can be done since G_{n_0} has value θ_{n_0}. This implies that F(u, v) ≥ F_{n_0}(u, v) ≥ θ_{n_0} − ε/2 ≥ θ − ε for all u ∈ S_1. Thus G has value θ, and s is an optimal strategy for player 1 (the evader), which has been shown to exist. Since θ_2 = 1/3, it follows that 1/3 ≤ θ. It is not difficult to see that θ ≤ 1/2. We will show in the next section that

    θ = (3 − √5)/2.
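Numerically, (3 − √5)/2 arises from the golden-ratio quadratic e² = 1 − e. As an illustrative check (the computation below is ours, not from the paper), an evader who moves left first with probability e and thereafter repeats his previous direction with probability e makes each of the bomber's three possible two-step predictions succeed with probability at most 1 − e = (3 − √5)/2.

```python
import math

e = (math.sqrt(5) - 1) / 2          # positive root of e**2 = 1 - e

# first move left with probability e; each later move repeats the
# previous direction with probability e
two_left = e * e                              # left, then left again
straight = e * (1 - e) + (1 - e) * (1 - e)    # left-right or right-left
two_right = (1 - e) * e                       # right, then right again

best_prediction = max(two_left, straight, two_right)
print(abs(best_prediction - (1 - e)) < 1e-12)   # True: the bound is 1 - e = (3 - sqrt 5)/2
```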
§3. VALUE OF THE GAME AND OPTIMAL STRATEGIES OF THE EVADER

In this section we determine the value θ of the game G and find all optimal strategies of the evader.

We will call a strategy s of the evader a special strategy if, for every path p ∈ P, the conditional probabilities that the evader then moves two steps to the left, and two steps to the right, given that he has taken path p, are each ≤ θ and their sum is ≥ 1 − θ.
To state this somewhat more formally: s is a special strategy provided that for every path p = (a_1, ..., a_n) the following three inequalities hold:

    (A)  s(a_1, ..., a_n) · s(a_1, ..., a_n, R) ≤ θ,
    (B)  (1 − s(a_1, ..., a_n)) · (1 − s(a_1, ..., a_n, L)) ≤ θ,
    (C)  s(a_1, ..., a_n) · s(a_1, ..., a_n, R) + (1 − s(a_1, ..., a_n)) · (1 − s(a_1, ..., a_n, L)) ≥ 1 − θ,

where s(p) denotes the probability that the evader moves to the right immediately after taking the path p.
Therefore, if s were not special, we have shown that there exists a strategy for the bomber for which the probability of destruction of the evader is > θ. This is a contradiction, so every optimal strategy for the evader is special.

Consider the strategy for the evader which corresponds to his proceeding initially to the left with probability e, and whenever he takes a move in a certain direction, he takes his next in the same direction also with probability e. It is clear by referring to Figure 3 that two units of time later, the evader arrives two steps to the left with probability e², and that he arrives directly in front with probability 1 − e. If he chooses e to be such that e² = 1 − e, then the bomber cannot destroy him with probability greater than 1 − e. Thus the value of the game, θ, is certainly less than or equal to 1 − e. It is simple to check that

    1 − e = (3 − √5)/2,

and that therefore θ, the value of the game, is less than or equal to (3 − √5)/2. This was observed by Isaacs, and he conjectured that the value of the game was (3 − √5)/2. We will show that his conjecture was correct. We note that (3 − √5)/2 < 1/2.

    [Figure 3]

We will now show that if s is a special strategy for the evader, and if p is any path, then the probability with which the evader goes to the right at the instant he has completed path p is less than or equal to e (where e is the positive root of the quadratic e² = 1 − e), as is the probability of his going to the left. Equivalently, we will show that 1 − e ≤ s(p) ≤ e. Assume otherwise. It is clearly sufficient to assume that for some path p the evader's probability of continuing to the left
is h_0, where h_0 > e. We will derive a contradiction. Reference is made to Figure 4 for the meaning of the symbols.

    [Figure 4]   [Figure 5]

Since the strategy s is special, it follows that

    h_0 g_0 ≤ θ   and   1 − h_0 g_0 − (1 − h_0) h_1 ≤ θ.

Adding both inequalities together, we get 1 − h_1(1 − h_0) ≤ 2θ, from which it follows that

    h_1 ≥ (1 − 2θ)/(1 − h_0).

We may assume h_0 ≤ 2θ ≤ 2(1 − e) < 1, and may therefore divide by 1 − h_0.
Similarly, by referring to Figure 5 for the meaning of the symbols, and using the fact that s is a special strategy, we have h_1 g_1 ≤ θ and 1 − h_1 g_1 − h_2(1 − h_1) ≤ θ. Adding as before, we have 1 − h_2(1 − h_1) ≤ 2θ, from which it follows that

    h_2 ≥ (1 − 2θ)/(1 − h_1).

Again h_1 < 1, for otherwise s would not be special. By induction, one defines h_n and determines that

    h_n ≥ (1 − 2θ)/(1 − h_{n−1})

and h_n ≤ 2θ < 1. We now show that h_n is monotone non-decreasing. Clearly if h_n ≥ h_{n−1}, it follows that h_{n+1} ≥ h_n. Therefore it is sufficient to show that h_1 ≥ h_0. This will certainly be proven once it is shown that

    h_0 ≤ (1 − 2θ)/(1 − h_0),

which we now proceed to do. That is, we desire to show that h_0(1 − h_0) ≤ 1 − 2θ. We observe that the polynomial x(1 − x) has a maximum at x = 1/2
A DISCRETE EVASION GAME

and is decreasing to the right of x = 1/2. Since h_0 was assumed greater than e, and e > 1/2, it follows that

    h_0(1 − h_0) < e(1 − e) = e − e² = (1 − e²) − e² = 1 − 2e² = 1 − 2(1 − e) ≤ 1 − 2θ.

The last equality above follows from the fact that 1 − e = e², and the last inequality follows from the previously established fact that θ ≤ 1 − e. Thus we have shown that h_0(1 − h_0) < 1 − 2θ and have completed the proof that h_n is monotone non-decreasing. Since h_n is bounded above by 2θ and is non-decreasing, h_n → h. Since

    h_n ≥ (1 − 2θ)/(1 − h_{n−1}),

it follows that

    h ≥ (1 − 2θ)/(1 − h).

Thus h(1 − h) ≥ 1 − 2θ, from which it follows that h(1 − h) ≥ 1 − 2e², since θ ≤ e². On the other hand, e(1 − e) = 1 − 2e², and again since the polynomial x(1 − x) is decreasing to the right of x = 1/2, it follows that h ≤ e. But h_0 was assumed > e and h ≥ h_0. This is a contradiction. We have therefore proven that if s is a special strategy for the evader, then for every path p,

    1 − e ≤ s(p) ≤ e.
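The contradiction can be watched numerically. In this sketch (the recursion is from the text; the numeric experiment and names are ours) we iterate the extremal recursion h_n = (1 − 2θ)/(1 − h_{n−1}), whose fixed points are exactly θ and e; a start above e climbs past the ceiling 2θ that special strategies must respect, while a start at θ stays put.

```python
theta = (3 - 5 ** 0.5) / 2           # the value of the game
e = 1 - theta                        # = (sqrt(5) - 1)/2, the root of e**2 = 1 - e
c = 1 - 2 * theta                    # the constant of the recursion h_n = c/(1 - h_{n-1})

def steps_to_escape(h0, max_steps=50):
    """Number of iterations until h_n exceeds the bound 2*theta, or None."""
    h = h0
    for n in range(1, max_steps + 1):
        h = c / (1 - h)
        if h > 2 * theta:
            return n
    return None

# the fixed points of x = c/(1 - x) are theta and e: x*(1 - x) = c at both
print(abs(theta * (1 - theta) - c) < 1e-12)   # True
print(steps_to_escape(0.65))                  # 3: a start just above e escapes quickly
print(steps_to_escape(theta))                 # None: theta is a fixed point below the bound
```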
Now we will show that for any special strategy s and non-trivial path p (that is, for any path other than the null path), if p = (a_1, ..., a_n) and a_n = R, then s(p) ≥ e, but if a_n = L, then s(p) ≤ 1 − e. Intuitively, we are asserting that for any special strategy s, the evader will always continue in the direction of his last move with probability not less than e. It is clearly sufficient to assume that p = (a_1, ..., a_n), where a_n = L; that is, the n-th step is to the left. Since s is a special strategy, we have (see Figure 6 on the following page)

    s(a_1, ..., a_{n−1}) s(a_1, ..., a_{n−1}, R) + (1 − s(a_1, ..., a_{n−1}))(1 − s(p)) ≥ 1 − θ.

Now, since the last paragraph has shown that s(a_1, ..., a_{n−1}, R) ≤ e and 1 − e ≤ s(a_1, ..., a_{n−1}) ≤ e, the asserted bound on s(p) follows.
We will show that against any strategy of the pursuer, the evader possesses a strategy which insures that his probability of destruction is less than θ. In order to prove the existence of such a strategy for the evader, we have found it necessary to introduce an auxiliary family of games. For each real number u in the closed interval [0, 1], we will define a game G_u which, heuristically, is the modification of the primary game G obtained by constraining the evader to take his first move to the right with probability u. We will then show that each game G_u has a value V(u), and that for each game G_u the evader possesses an optimal strategy s_u.

The details are quite similar to those of Section 1, and therefore we will be sketchy. In Section 1 we defined the game G_n for each n ≥ 2. For each u in [0, 1] and each n ≥ 2, we define the game G_{n,u} to be the game G_n restricted by considering only those s_1 ∈ S_1 for which s_1(o) = u (o is the null path). More precisely, the set of strategies for the first player (evader) for the game G_{n,u} will be S_1(u), the subset of S_1 consisting of all s_1 ∈ S_1 such that s_1(o) = u. The set of strategies for the second player (the bomber, or pursuer) will be S_2 as before. The payoff function F_{n,u}(s_1, s_2) will be the same as F_n, except that it is restricted to a subset of S_1 × S_2; namely, it is restricted to S_1(u) × S_2. It is easy to see that S_1(u) is a closed subspace of S_1 in the topology of weak convergence. For each u in [0, 1] and for each s_1 ∈ S_1(u) and each s_2 ∈ S_2, F_{n,u}(s_1, s_2) is a monotone non-decreasing function of n, bounded above by 1. Thus F_{n,u}(s_1, s_2) has a limit, which we call F_u(s_1, s_2). The game with strategies S_1(u) and S_2 and payoff function F_u we call G_u. A modification of the usual theory of finite games shows that each game G_{n,u} has a value θ_n(u), and that θ_n(u) is monotone non-decreasing and is bounded from above by 1. Let V(u) be the limit, which must then exist. Then, since S_1(u) is compact, an argument similar to the one given in Section 1 shows that V(u) is the value of the game G_u and that the evader possesses an optimal strategy s_u. It is, of course, clear that F_u is the function F cut down to S_1(u) × S_2. From this it easily follows that V(u) ≥ θ, the value of the game G. Also, from our knowledge of what the optimal strategies of the evader for the game G are, it follows that V(u) = θ for all u in the closed interval [θ, 1 − θ], and that V(u) > θ for all other u in the closed interval [0, 1]. Furthermore, it is easy to see that V(u) is symmetric about the point u = 1/2.
We will presently determine the function V(u) rather explicitly for the half-open interval (1 − θ, 1], and by symmetry we will then know the function V(u) over the closed interval [0, 1]. However, before determining the function V(u), we will indicate our reason for doing so.
Recall that we were in the course of proving that the pursuer possesses no optimal strategy for the game G. We had stated that if he possessed such a strategy, we could assume he possessed one which called for his bombing initially with some positive probability, and we had shown that he must bomb with probability zero two steps to the left of the initial position of the evader, as well as two steps to the right of the evader's initial position. Thus he bombs with probability β > 0 directly in front of the evader's initial position, and he delays bombing with probability 1 − β. (See Figure 8.) We now desire to show that there exists a strategy for the evader which defeats any such strategy of the bomber. That is, we desire to show
that for any β > 0, there exists a strategy for the evader for which his probability of destruction is less than θ, the value of the game G. Our proof that the pursuer possesses no optimal strategy will then be complete.

Suppose the evader goes initially to the right with probability 1, and takes his next move to the right with probability u. Then the evader is destroyed on his next move with probability β(1 − u).

    [Figure 8]

And since an optimal strategy for the game G_u exists for the evader, he can select a strategy so that the probability with which he is subsequently destroyed is less than or equal to (1 − β)V(u). Thus the evader possesses a strategy for which his probability of destruction is less than or equal to

    β(1 − u) + (1 − β)V(u).

Thus we need only show that for any β > 0, there exists a u in [0, 1] such that

    (1)  β(1 − u) + (1 − β)V(u) < θ.
Let e = 1 − θ. Recalling that V(e) = θ, we have:

    β(1 − u) + (1 − β)V(u)
        = β(1 − e − (u − e)) + (1 − β)(V(u) − V(e) + θ)
        = β(1 − e) − β(u − e) + (1 − β)[V(u) − V(e)] + (1 − β)θ
        = βθ − β(u − e) + (1 − β)[V(u) − V(e)] + (1 − β)θ
        = θ − β(u − e)[1 − ((1 − β)/β) · ((V(u) − V(e))/(u − e))].

Thus we need only show that there exists a u > e such that

    ((1 − β)/β) · ((V(u) − V(e))/(u − e)) < 1.
It is clearly sufficient to show that

    (2)  lim_{u → e⁺} (V(u) − V(e))/(u − e) = 0.

In order to prove (2), we must look a little more closely at the function V(u) in the half-open interval e < u ≤ 1, and this is what we now propose to do. We will first prove the following fact: if u > 1 − θ, then

    (3)  V(u) = min max [uθ + (1 − u)V(s), …].
1 -0 1 - d as long as this pattern continues. For such c and d, we have gn (c) > gn (d). (b) Since gR is convex and g ^ O [In spite of the fact that gR is piecewise can be employed for purposes of simplifying effect the rigor of ourarguments. In fact, for a finite number of values.3 (c)
)=■§■, we get ~ > g ^ c ) > 0. linear, the use of derivatives the reasoning and does not all derivatives exist except
By a direct calculation from (6), we get
M _ gn(c) [2(1~d)~c] + O-d) + Sn(c)- Sn(d) (l-x)[2g^(c)+l+g^(d)] This expression is strictly positive in view of the inequalities 1 > 1 (d ) |i gn (c) > gn (d) and Icg^c)! < j < 1 - d. Consequently,
if this expression is positive near x = (1 − 2v_{n+1})/(1 − v_{n+1}), the preceding arguments show that the positivity becomes stronger wherever (6) is valid and x decreases away from (1 − 2v_{n+1})/(1 − v_{n+1}).
Summarizing the conclusions of the above analysis in the form of a theorem, we get:

THEOREM 3. The function g_n is convex and attains its minimum v_n on an interval of x values containing (1 − 2v_n)/(1 − v_n). Moreover, v_{n+1} > v_n for n ≥ 2. As x varies from 0 to (1 − 2v_{n+1})/(1 − v_{n+1}), the corresponding c and d for g_{n+1}(x) in (3) decrease. Explicitly, d varies from the value ½ to the value v_{n+1}, and c decreases from the value 1 to the value 1 − v_{n+1}. When v_{n+1} < d < ½ and correspondingly 1 − v_{n+1} < c < 1, then c and d are uniquely determined by the relations (6).

A new proof of Theorem 1 can be obtained from the result of Theorem 3 by allowing n to become infinite.
§3. THE BOMBER'S OPTIMAL STRATEGY

With the aid of Theorem 3, additional qualitative features concerning the optimal strategy for the bomber can be given. Let us consider first the truncated game G_n in which B must fire during the first n units of time. Let p_m denote the probability that B fires at time m in the game G_n. In other words, p_m represents the sum of all probabilities that B fires at time m, taking into consideration all possible histories
A GAME WITH A LAG
of the paths of S. Of course, p_m may be decomposed into its various parts describing the fine structure of where and with what probability B aims his bomb at the possible locations of S at the m-th stage of the game. It suffices for our purposes to deal only with the quantity p_m. It will now be established that p_m > 0 where n ≥ 1, n > m ≥ 0. Suppose to the contrary that for some m_0, p_{m_0} = 0. Then, if S follows during the first m_0 + 1 moves his optimal strategy for the game G_{m_0}, and thereafter employs the optimal strategy for the game G_{n−m_0−1}, then the most B can secure for himself is the value

(10)    v_{m_0−1} Σ_{i=1}^{m_0−1} p_i + v_{n−m_0−1} Σ_{i=m_0+1}^{n} p_i < v_n .
This inequality is based on Theorem 3, which implied that v_j < v_n for j < n. As this implies a contradiction, we deduce that for any optimal strategy of B, there is positive probability that he will fire at each move.

Our next objective is to prove the nonexistence of optimal strategies for B in the infinite move game. Suppose that such a strategy exists and that B follows this optimal procedure. Let S employ his optimal strategy P_0 with minor changes to be indicated later. Suppose at the move previous to a time when B decides to fire, S had been moving in the direction to the right; then it follows from the nature of the strategy P_0 that B must aim either at the present position of S or at the position two units to the right of S's present location. Let r_1 and r_2 denote respectively the probabilities that B will fire at each of these two places. After one unit of time two outcomes are possible:

(a) S continued to the right;
(b) S reversed direction.
The same argument shows that in case (a) B bombs, aiming with probability s_1 at the new present sighted position and with probability s_2 two units to the right of it, where s_2 = 0 is not an excluded value. In case (b) he aims at the sighted position with probability t_1 and two units to the left with probability t_2. The diagram on the following page might help to clarify these operations. If B fires when sighting S at Q_0, then, according to our discussion, he fires at two places with probabilities r_1 and r_2 as indicated. Similarly, if B releases his bomb at Q_1, then the probabilities of striking the two chosen targets are indicated by s_1 and s_2. A similar interpretation is valid for t_1 and t_2. We now establish the following relations
(11)    s_2 (√5 − 1)/2 = r_1 − r_2 ,

(12)    t_2 (√5 − 1)/2 − r_2 = r_1 .
Let us consider the yield if S de parts only from his strategy PQ when at position Q 1 such that with probability y3 S continues to the left and with probability 1 - y, he reverses direction. Thereafter S continues with the procedure of the strategy PQ * Let us compute the conditional yield for
B
under the circumstances that
Q, to Q . The total yield to firing probabilities gives
B
S
has moved from
corresponding to the three possible
(*4-0
(H^)2
Substituting for
this expression reduces to (r2 + r, ) 2 - ^ 1 +
[(s2 + s, )
V5
3-/5
+ (t2 +
r0 ~ r, + s
\/~5
a )
But (r2 + !•,)+ (s2 + s, ) ^ - § - 1 + (t2 + t, ) -3—
is the conditional probability that B fires In the subsequent two moves provided S has moved from Q, to Qq . Since according to P , the re turn for B Is
A GAME WITH A LAG
271
3 2
this requires as
e
is of
arbitrary
9
sign that nT5 - 1
r l
"
r 2
=
S2
2
•
The second relation (1 2 ) is derived by an analogous argument in changing the strategy PQ for S only at the point Q2 to an arbitrary mixture. A similar set of equations holds if we are faced with the symmetrical motion in which S was moving to the left. We shall use these relations to prove the nonexistence of an optimal strategy for the bomber. Let p 1 denote the probability that B fires at the initial move. It is clear that in this case B must aim at the original position of S. As in (1 2 ), we obtain r2
where we have regarded
Q0
nT5 - 1 2 > Pi
in our diagram as the initial starting point
for S. Equation (1 1 ) implies that r 1 > r2 . Thus we find that any of the component probabilities p2 of firing at the second stage are
nT5 - 1 2
Repeating the argument with successive applications of (1 1 ) and (1 2 ) im plies that the probability of firing at successive moves increases by a factor 2 nT5
- 1
As the Siam of the probabilities is one, we infer that the only compatible explanation requires that p 1 = 0. The situation after the first stage is symmetrical to the initial setup and hence we infer that all the prob abilities of B firing at any stage must be zero. This proves THEOREM k. There exists no optimal strategy for B in the infinite game. In any truncated game Gn there is positive probability that B fires at each stage.
272
KARLIN
The relations (11) and (1 2), of interest In themselves, suggest 'the construction of a specific strategy for B in any truncated game which is in fact e effective. We will, however, not pursue this point. We remark in closing that an alternative proof of the first part of 'Theorem 4 can be developed based on the fact that carried through by Dubins [2].
g ?(v) = o.
This has In fact been
BIBLIOGRAPHY [1]
SCARF', H. E., and SHAPLEY, L. S., "Games with partial Information,” this Study *
[2]
DUBINS, L. E 0, "A discrete evasion game,n this Study.
[3.1
ISAACS, R., nThe problem of aiming and evasion, " Naval Research Logistics Quarterly 2 (1955), pp. 47-67.
Samuel Karlin Stanford University
THE EFFECT OF PSYCHOLOGICAL ATTITUDES ON THE OUTCOMES OF GAMES 1
John G. Kemeny and Gerald L. Thompson § 1.
INTRODUCTION
We begin by clarifying the relationship between our theories and the classical game theory of von Neumann and Morgenstern [6]. Our theory presumes the following psychology for an individual playing a matrix (i,ev normalized) game: The entries in the matrix are interpreted as monetary payments; each person applies to these entries a real-valued utility funct Ion and then plays the game as if his opponent1s utility function had the same effect on the matrix. Briefly, this can be described by saying that each player plays the game "as it looks to him." In general, our procedure will lead to a non-zero sum game which raises questions of cooperation, bargaining and other types of player 'interaction before and during the course of play of the game. Our approach can be regarded as a first approxi mation to the complete problem and yields a very conservative method of play analogous to the minimax rule for decision-making in the theory of statisti cal decisions. Extensions of our theory to include other possibilities lead to interesting but difficult problems. In the development of our theory we find It necessary to apply the utility function not only to monetary outcomes but also to sums of monetary outcomes plus expected future outcomes. We explicitly assume that such an operation is meaningful, and while it is a moot assumption, It seems not unlike what is frequently done in certain decision-making situations. It allows for the possibility that the manner in which a person plays a game may be influenced by the state of his fortune at the time he plays the game. This possibility is implicitly but not explicitly allowed for In the classi cal theory. Our theory may be of use to persons who are applying game theory The preparation of this paper was supported by the National Science Foundation by means of a grant given to the Dartmouth Mathematics Project and, in. part, by the Office of Naval Research through a contract with Princeton University. 273
KEMENY AND THOMPSON
27 h
to "real“life” situations in which the payoffs in terms of money are well defined but not those in terms of utility. It can be used, as we do in Section 7, toexplain paradoxes in the game-playing behavior of players. In Sections 2 through 4 we discuss strategy preserving utility functions and their effect on matrix, non-zero sum and stochastic games. In Sections 5 through 7 we discuss and illustrate various utility functions. In the last section, Section 8, we discuss more fully the reckless, cautious and common attitudes. §2.
CHARACTERIZATION OF STRATEGY PRESERVING UTILITY FUNCTIONS
Let G be an m x n matrix game with matrix ||g^j||; we define valtG] to be the value of G and X[G] to be the set of optimal strate gies for the first player in G. By a utility function f we shall mean a real-valued (nonconstant) monotone increasing function. If f is a utility function and
G
a game, then by j )I •
game whose matrix is
f (G)
we shall mean the matrix
DEFINITION. We shall say that the utility function f is strategy preserving If and only if for all matrix games G and all real constants h we have X[f(G + hE)] = X[f(G)] where E is the are unity.
m x n
;
matrix all of whose entries
The Intuitive interpretation of a strategy preserving utility function is the following: suppose that the first player has a fortune of h dollars at the time he is to play the game G; his fortune at the end of the game will be g^j + h where g^j is the actual payoff he receives from the game; the utility which he assigns to this outcome is f(g^j + h); then a strategy preserving utility function is such that every strategy optimal in the game f(G + hE) is optimal in the game f(G) and con versely; in other words it is such that the way In which the first player plays a matrix game G is independent of the state of his fortune when he plays It. The purpose of this section is to characterize such functions. Two important types of utility functions are the linear and ex ponential ones given by ax + c, a > 0 ,
{ ae
+ c,
ab ^ o
•
ATTITUDES AND GAME OUTCOMES
275
Here the letters a, b, and c indicate parameters and the conditions on the parameters are chosen so that f !(x) > 0 for all x. Several facts about elementary matrix game theory will be needed subsequently. First of all we recall that the addition of the same constant to every entry in the matrix does not change the sets of optimal strategies, nor does the multiplication of every entry by the same positive constant. (These two facts are sufficient to show that linear utility functions are strategy-preserving.) Secondly, we recall that a 2 x 2 matrix game is non-strictly-determined if and only if either the inequalities £>11 ^ Si 2 * £>2 1
£>2 2
^12* ^21
*
or the inequalities obtained from these by reversing the inequality signs, are satisfied. Finally, the first component of the optimal strategy for the first player in the 2 x 2 non-strictly determined case is given by the formula (2 )
x
= ___________________ S22 ~ 821_____ 1 S 11 " S 12 “ S21 + S22
LEMMA 1. If f is a strategy-preserving utility function then f is differentiable and f *(x) > 0 for all x . PROOF. If f is monotone increasing but not strictly increasing (and nonconstant) then there exist real numbers a, b, and h, with a < b, so that f(a) = f(b) and f(a + h) < f(b + h). If we set g 11 = g22= b and g 12 = g2 i = a> then every strategy for the first play er is optimal in the gamef(G) but the game f (G + hE) has a unique opti mal strategy, so that f is not strategy preserving. If f is monotone strictly increasing, then, by a well-known theorem (see, for example, [2 ], p. 177> ff • ) , it is differentiable almost everywhere. Hence f has a derivative at two points, say at a and b, where a > b. Set g 11 = a and g 12 = g21 = b; if g22 = x where x > then the 2 x 2 game G = llg-yllis non-strictly determined. strategy preserving we have from (2 ) that
If
f is
x f(x) - f(b) X 1 “ f(a) - 2f(b) + f(x) is the first component of the optimal strategy for the first player in the game f(G); similarly, ■y* -
X1
f(x+h) - f(b+h)
“ f (a+h")"~-~2 f (b+h ) + f (x+h)
b,
KEMENY AND THOMPSON
276
is the analogous quantity for the game equal we must have that
f (G-
+
h E ).
Pop
f(x+h) - f (b+h) _ f(x) - f(b) fTa^T™rT T B + h T "
these two to be
*
Using the identity that if d 4 f then c/d = e/f implies (c - e)/(d - f) c/d, we subtract corresponding terms on the right-hand side of this ex pression from those on the left * If we then divide numerator and denominator of the left-hand side of the resulting expression by h we obtain (1 /h) [f (x+h) - f (x)] - d/h)[f(b+h) - f (b )]
TTTKTITra^ITT^
f(x) - f(b)
= fW ^ T T F T
'
which is true for all values of ho Letting h tend to zero, and using the fact that f has a derivative at a and b, we see that f has a deriva tive for all x > b. To show thatf has a derivative for all x < b, we choose any two real numbers c and d where c > d > b, set g^ ^= d, g 12 = g21 = c and g22 = x < b, and use reasoning analogous to the above. Because f is strictly Increasing f 1(x) > o for all x. This concludes the proof of the lemma* THEOREM 1. A necessary and sufficient condition that a utility function should be strategy preserving Is that it should be either linear or exponential as in equa tion (1 ). PROOFo Sufficiency. We have already observed that linear func tions are strategy preserving* If f is exponential then, using the ele mentary facts about matrix games mentioned above, we have X
X '||aebSij+bh + C | X
||aebsij • ebh|| "
X
!|aebgij + c II
X 'Ilf(g± j )ll '
•
Necessity. Let G- = IlSijll a non-strictly determined 2 x 2 matrix game so that it has a unique optimal strategy for the first player. By the lemma, f is strictly increasing, so that f(G) and f(G + hE) are also non-strictly determined and have unique optimal strategies. By an analysis similar to that carried out in the lemma, in order that the first components of the optimal strategies for the first player In each of
ATTITUDES AND GAME OUTCOMES these games he equal, we must have f(g22+h) - f(g12+h) f(gn +h) - f(g21+h)
^ f(g22)- f(g12) = f (g11 )- f (g21 )
We observe that obvious solutions to this identity are linear functions of the form f(x) = ax + c, where we require a > o to make f ?(x) > o. To find other solutions, we use the identity mentioned in the proof of the lemma and subtract numerator and denominator of the right-hand side from thecorresponding quantities on the left-hand side* Dividingthrough the resultingexpression by h, letting h tend to zero, andusing thediffer entiability of f as given by the lemma, we obtain the following differ ential equation f'(g22) - f'(g12). f (g22) - f (s12) - T'Tg;:yT = r ( g ~ r - .f(g2p which must be true for all non-strictly determined games G. Observe that this equation becomes indeterminate for the linear solutions found above. After cross-multiplying the latter identity, setting the result equal to a constant and solving, we see that f satisfies f ’(x) = bf(x) + d
,
for all x, where b and d are constants• The unique non-linear tion of thisdifferential equation is the exponential one
solu
f(x) = aebx + c where
c = -
that
ab > o,
d/b.
To satisfy the condition that
completing the
f !(x) > o
werequire
proof of the theorem.
Von Neumann and Morgenstern [6] use the theory of the zero-sum two-person game to derive, from a zero-sum n-person game in extensive form, its characteristic function. Briefly the idea is to partition the set of players into two opposing coalitions in all possible ways and, for each such partition, to solve the resulting two person game (where a "person” now is a "coalition"). Suppose that we regard each coalition (or "person") as having a fortune which is the sum of the individual fortunes of the members of the coalition. Assume also that each coalition has a utility function with which it evaluates its prospects. We can then state the following result.
KEMENY AND THOMPSON
278
COROLLARY'. A necessary and sufficient condition that the set of optimal strategies for a coalition be inde pendent of the present fortune of the coalition is that the coalition!s utility function should be either linear or exponential as in (1). Since the usual n-person theory considers only the characteristic function of the game this result is not particularly pertinentto the classical theory but may well be to variants of the theory. §3-
EQUILIBRIUM POINT PRESERVING UTILITY FUNCTIONS
Having proved the above result for two-person zero-sum games, it is natural to inquire whether or not a similar result holdsfornon-zero sum games. As shown below, such a result does hold. Let r be an n-person non-zero-sum game and let * = (j ..., irn ) signify a vector of pure strategies and ^ = (ji1, ..., ^ ) signify a vector of mixed strategies for the game. If the mixed strategy vector [x is used th then the expected payoff to the 1 player is K±(n) =
^
Jf where
k^( *)
is the (expected) payoff to
i
if pure strategy
chosen, and is the probability of it being chosen if mixed strategy |I is an equilibrium point if for each i K± (il) > K ^ E / ^ )
for all
n±
\±
* is used.
is A
.
Here the symbol stands for the vector , ..., •■■> Sn ) Let E[r] be the set of all equilibrium points of r. By a theorem of J. Nash [5 ], the set E[r] is non-empty. As in the case of matrix games we regal’d the payoffs as monetary and permit each player to have his own utility function with which he evaluates his prospects. If player 1 has a utility function f^ then his modified expected payoff is *, Kj_(n) = For example, if
f^(x) = ax + c
then
ATTITUDES AND GAME OUTCOMES *h"x*
and iff^(x) = ae + c,
then
( b)
K^n) = a
*
279
bk . (*)
e
nn + c
.
n
If player i has a fortune of h^ when he plays the game, then his modi fied payoffsare, In the linear case (31)
K^(n) = a ^ k ^ n ) ^
+ ah± + c= Kt(u) + ah.^
;
and similarly, in the exponential case (k' )
1
bh.
K±(n) = ae
bk1 (n ) 2j
e
bh. r
iafl + c = e
*
1 K±(n) - c
+ c
.
Jt
Let f = (f1, ..., f ) be the vector of utility functions for the players; let h = (h1, ..., hn ) be the vector of present fortunes; let f (r ) be the game whose payoffs are f^(k^(*)) to the i^b player; and let f(r + h) be the game in which the ith player has fortune h^ and hence must evaluate the prospects f^(k^(* ) + h^). DEFINITION. The vector f is an equilibrium point preserving vector of utility functions if and only if for all non-zero-sum games r and all n-vectors h we have E[f(r + h )] = E[f(r)]. THEOREM 2. A necessary and sufficient condition that f be an equilibrium point preserving vector of utility functions for all games r is that each f^ be either linear or exponential as in (1). PROOF. Necessity is the same as In Theorem 1 since a matrix game Is a special type of non-zero-sum n-person game. We shall prove sufficiency for the exponential case only since the linear case is easier. Suppose jl is an equilibrium point in f (r ), that is K*(S) > K*( P:/m.± ) Let
f^(x) = ae°x + c;
then, if
h^
for all Is the
n1
.
present fortune
280
KEMENY AND THOMPSON
i _
bhi r *
K ± (n) = e > e
= which shows that
1
K± [j-l ] *
-
c
1
+
Kj.(il/n1 ) - c
c
1 + c,
(j l / )j-
jl is an equilibrium point in
for all
^
for all
\i^,
f(r + h).
If G is a matrix game (i.e., a zero-sum two-person game) and both players use their own utility functions, then the result, in general, will be a non-zero-sum game. This theorem shows how the equilibrium point in such a game behave if the utility functions are of a special type. §4.
STRATEGY PRESERVING UTILITY FUNCTIONS AND STOCHASTIC GAMES
The basic paper on stochastic games is due to Shapley [7 ], and we shall follow his notation quite closely here. In order to discuss his ideas we must consider both stochastic games and stochastic games of bound ed length, i.e., truncated stochastic games. The following definition ex plains both concepts as well as that of a sequence game. DEFINITION. Let G 1, G2, ..., GN be a finite ordered set of matrix games where Gk = is an x nv k ^ kr real-valued matrix. Let s^j and p^j be non-negative numbers bounded by 1 , called stop probabilities and transition probabilities respectively. The interpretav tion of these numbers is the following: if G is k k played and the result Is outcome g. • then s . • is the probability that the game stops and p ^ . is the prob ability that game Gr will be played next. Since these are disjunct and exhaustive alternatives we must have k kr k 3ij + Zr pIj = 1* ket r be the game which begins with G^ and then proceeds according to the above rules. Finally, let r be the collection {r1, r2, ..., r^}. (A) A truncated stochastic game r is one which is played according to the above rules, except that after at most T (a finite integer) games have been played, all further play stops. (B) A sequence game Is a truncated stochastic game with N = 1, that is, it consists of the repeated play of a single matrix game G at most T times. Here S = llsj_jll where = 1 - p^j, is the matrix of stop
281
ATTITUDES AND GAME OUTCOMES probabilities.
If
s^. = 1
for all
1
and
j,
then
we have an ordinary matrix game* (C) A stochastic game r is one which is played according to the above rules, except that the requirement k that . > s > 0 for all 1, j, and k, is added. Let
f
be a utility function and
r
a game of one
of the above types; then by the modified game r* of this type we shall mean the game with matrices f(Gk ) and the same probabilities as in r. If Gk is one of the matrix games considered and a = (a1, ..., a'N is an N-vector, then by Gk (a) we shall mean the matrix game whose entries are given by kr r
k Let^e the truncated stochastic game that begins with after at most T games have been played; let
'(0 ) = val
k and ends
"(o) = (“ (0 )’
val Gk («(0 ))
«?o))
«d ) = (ajl
a^T ) = val k Then a (rp) value is the value vector for
G
))
}
;
e(T) truncated stochastic game
k r(i>)>
and
^.
k The moves of which follow a simultaneous move of the two players form a subgame (see Kuhn [3 ], P* 2 04). Hence it is easy to show, by induction, that a behavior strategy for the first player is optimal in r^Yp) if and only if it is determined as follows: suppose that t moves from the end of the game there is a positive probability of playing the game GP; then his optimal behavior strategy should specify an optimal strategy in the set X[Gr (a ^ ^)J. Since a stochastic game r is zero-sum while the modified game r* may well be non-zero-sum, we cannot talk about optimal strategies and the value of the latter games. Instead we shall talk about mlnimax strategies and the minimax value of the game. As in finite games, the play ers may be able to do better than this by cooperative, and in some cases,
KEMENY AND THOMPSON even by non-cooperative behavior. THEOREM 3 . Let r be a truncated stochastic game, let f be a utility function, and let r* be the modified truncated stochastic game. Then a behavior strategy in r* achieves the minimax value for the first player in dependently of his present fortune for r if and only if f is either linear or exponential as in (1 ). PROOF.
As above, one can show by induction that a minimax be
havior strategy in r* is determined as follows: if at t moves from the end of the game there is positive probability of playing the game Gr; then the minimax behavior strategy should specify a strategy in the set X[Gr (a(t ) + hE)] where h is the fortune of the player at that time and E is the matrix of all ones. By Theorem 1 , a necessary and sufficient condition that, for all games r, this strategy be independent of h, i.e., should also belong to the set X[f (Gr (oi^ ^ ))], is that f be of the form (1 ). In effect this theorem says that the player need onlyknow how many more games he has to play after Gr in order to decide how toplay. nPast history" is irrelevant to his decision. Now let r be a stochastic game and let G 1, ..., G^ be the matrix games occurring in r. If a is any N-vector then we define (with Shapley) a vector mapping Va = 0 , where the components of p are given by Pk = val[Gk (a)]
.
THEOREM 4. If r is a stochastic game, f a utility function of type (1 ) and r* is the modified stochastic game, then the first player's minimax value vector of r* is the unique solution of the equations 0 = val[f(Gk (0 ))] for
,
k = 1 , . . N.
PROOF. The details of the proof are as in Shapleyfs paper except for the convergence of the V mapping. The proof that theV-mapping con verges for utility functions of type (1 ) follows from the fact that the value of a matrix game is a continuous function of itsentries,andfunc tions oftype (1 ) are continuous. THEOREM 5» Stationary strategies achieve the mini max value in each game (r^)* belonging to r*.
283
ATTITUDES AND GAME OUTCOMES PROOF. §5.
As in Shapley’s paper. PSYCHOLOGICAL ATTITUDES AND UTILITY FUNCTIONS
In this section we shall discuss various types of psychological attitudes. For convenience we shall give names to them and leave to the reader the judgement of the suitability of these names. By an attitude of the first kind we shall mean one which depends only on the payoffs involved. Such attitudes can always be represented by means of a utility function. Attitudes of the second kind depend on factors other than the payoff, and hence cannot be described by means of a utility function. All but one of our examples will be of utility functions of the first kind. When discussing attitudes of the first kind we shall usually re quire tion, types
that f (0 ) =0 since, by a linear (strategy preserving) transforma wecan always make f have this property. In Figure 1 we show six of continuous utility functions and two discontinuous ones.
The first such is the "fair attitude” of a person who judges his utility to be directly proportional to the payoff. Since an additive constant can be ignored, we can think of the linear function representing the fair attitude.
f(x) = x
as
Next there is the "reckless or gambler’s attitude" of a person who concentrates on winning large sums. To him a large win looks even larger and a large loss is discounted. The result is a utility curve that is concave upward over its entire range. The exponential functions satisfy ing (1 ) with a > 0 are of this type. For example, the function f(x) = ex - 1 Is such. The "cautious attitude" is that of a person who concentrates on avoiding large losses. He exaggerates large losses and correspondingly discounts large wins. Thus his utility curve is convex downward over its entire range. The exponential functions satisfying (1 ) with a < 0 are of this type. For example, f(x) = 1 - e~x is such. All functions satisfying (1 ) represent one of these three types, hence we know that other attitudes must not satisfy (1 ).It is interest ing to note that utility curves similar to these three were observed the Mosteller-Nogee utility experiment [ h ].
in
The "poor man’s attitude" is that of a person for whom large sums, either positive or negative, are exaggerated. His utility curve is concave upward for positive x, and concave downward for negative x . The "rich man’s attitude" is that of a person to whom large
284
KEMENY AND THOMPSON
Rich
Desperate
FIGURE 1
ATTITUDES AND GAME OUTCOMES
285
sums, either positive or negative, are discounted. His utility curve is concave downward for positive x and concave upward for negative x. The "common attitude" is that of a person who Is reckless enough to play games In what the payoff entries are small, but when payoffs become large, he becomes cautious. Thus, for a range about zero his utility curve is convex upward, but it becomes concave downward as x becomes large in absolute value. In addition to the continuous utility functions discussed above a large number of discontinuous ones can also be distinguished. Here we mention just two interesting cases. The "winning attitude" is that of a person who, besides consider ing how much he wins or loses, puts a positive premium on winning and a negative premium on losing. We shall assume that he has a fair attitude otherwise, so that his utility curve is the line y = x, with the posi tive half moved up and the negative half moved down. By means of this utility function we are able to "explain" the paradoxial sequence game dis cussed in Section 7* The "desperate attitude" is that of a person who must win a given sum of money. Any amount of money less than this Is of no value to him and any amount in excess does not have greater value. His utility curve is a step function, positive from some sum on, and negative otherwise. (In this we do not require that f (0 ) = 0 .) The reader will doubtless be able to think of other intuitively interesLing utility curves. The ones mentioned above are sufficient to discus-?, some Interesting examples, several of which are described in the next tvo sections. §6 . EXAMPLES A. The lottery game. A player of the lottery game purchases from a banker a small chance of winning a large sum of money. Let s be the price of the lottery ticket, let K be the amount of the win and let p be the probability of win. The expectation of the banker is p(- K) + (1 - p)s
.
The banker will adjust p so that his expectation is positive and hence so that the player1s expectation is negative, i.e., he will choose 0 < P < TT~a Here
p
is small since
K
•
is large compared to
s.
KEMENY AND THOMPSON
286
Suppose the player has a utility function
f . He then evaluates
his expected utility as E = pf(K) + (1 - p)f(- s)
.
Let us assume that he will play only if his expected utility is positive, and will not play if it is negative or zero. Then we can make a list of typical decisions to play or not to play by the various players, depending upon their utility functions. It is to be emphasized that not all utility functions of a given type will make the same decision in the same situa tion; this will happen only when their shapes are sufficiently extreme. Not Play
Play, Reckless Poor Common (K small) Desperate (K greater than discontinuity point)
Fair Cautious Common (K large) Desperate (K less than discontinuity point) Winning Rich
The reader may check his intuition as to whether or not the various de cisions made on the basis of the utility functions are consistent with the names we have given them. (B)
The Insurance Game. Here the player owns property worth
dollars and there Is a small probability p of losing it by accidental means. The insurance company offers to pay the player K dollars if the property is destroyed in return for a premium of s dollars which the play er pays to the company- The expectation of the insurance company is (1 - p )s + p( - K + s ), and clearly they will adjust s so that this ex pectation is positive; i.e., they will require that s > pK. A simple computation shows that this makes the playerls monetary expectation nega tive. Suppose, however, the player has a utility function f; then his expected utility is f (- s ) if he insures and pf(- K) if he does not insure; the difference of these two is D = f(- s) - pf(- K)
.
Let us assume that he will insure only if D is positive. Again typical (but not necessary) decisions by persons having the above utility func tions are as follows:
ATTITUDES AND GAME OUTCOMES

    Insure                      Not Insure
    Cautious                    Fair
    Poor                        Reckless
    Common (if K large)         Common (if K small)
                                Rich
                                Winning
                                Desperate
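The insurance rule D = f(-s) - pf(-K) can be illustrated the same way. Again the two utility shapes below are my illustrative assumptions, not the paper's; the premium is chosen so that s > pK, as the text requires:

```python
import math

def insurance_advantage(f, p, K, s):
    # D = f(-s) - p f(-K): expected-utility gain from insuring property
    # worth K against a loss of probability p, at premium s.
    return f(-s) - p * f(-K)

# Illustrative utility shapes (assumed): convex downward = cautious,
# convex upward = reckless; both satisfy f(0) = 0.
cautious = lambda x: 1 - math.exp(-x)
reckless = lambda x: math.exp(x) - 1

p, K = 0.01, 100.0
s = 2.0    # premium exceeds pK = 1, so the company's expectation is positive

print(insurance_advantage(cautious, p, K, s) > 0)  # the cautious player insures
print(insurance_advantage(reckless, p, K, s) > 0)  # the reckless player does not
```

The heavy penalty a concave utility puts on the rare large loss makes D positive for the cautious player even though the bet is monetarily unfavorable.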
Again the reader can check whether or not these decisions agree with his intuition. By adjusting s, the insurance company can appeal to people having different kinds of utility functions.

(C) Colonel Blotto. The attitudes illustrated above were all of the first kind and continuous. Here we wish to illustrate a discontinuous attitude of the first kind and an attitude of the second kind. That famed military leader, Colonel Blotto, has four divisions available and is contesting two mountain passes with an inferior enemy who has only three divisions. Assume that whichever side has the most divisions at a pass takes the pass and the opponent's divisions, except that in case of a tie neither side takes the pass and neither loses a division. Blotto has three strategies corresponding to three possible ways of dividing his troops into two parts: 4 + 0, 3 + 1, and 2 + 2. His opponent may divide his divisions in either of the two ways, 3 + 0 or 2 + 1. We assume that, having decided on the partitioning of the troops, the actual assignment to the passes is done at random. Given a choice of a strategy by each player we can calculate the expected outcome. For example, if Blotto plays 3 + 1 and his opponent plays 2 + 1 we must distinguish two cases: (1) Blotto's three divisions go against the opponent's two; then Blotto wins one pass and ties for the other; (2) Blotto's three divisions go against the opponent's one; then Blotto wins one pass and loses one. These possibilities are equally likely, so Blotto's payoff is 1/2. (Here we do not take into account, as we do later, the loss of divisions.) Proceeding in this way we obtain the following payoff table:

             3 + 0    2 + 1
    4 + 0     1/2       0
    3 + 1     1/2      1/2
    2 + 2      0        1

The resulting matrix game is strictly determined and Blotto should always play 3 + 1 and his opponent should always play 3 + 0. The value of the
game to Blotto is 1/2, indicating his numerical superiority. If both players play optimally there is probability 1/2 that Blotto will lose a division. But now let us suppose that Blotto takes into account not only the number of passes that he can capture but also worries about the number of his men who are captured; in other words he has an attitude of the second kind. To be specific, assume that to Blotto a division lost is as bad as a pass lost. The resulting payoff table can be computed to be the following:

             3 + 0    2 + 1
    4 + 0     1/2       0
    3 + 1      0        0
    2 + 2     -2        1
The solution to this game is that Blotto should play the first row with probability 6/7 and the last row with probability 1/7; his opponent should play the first column with probability 2/7 and the second column with probability 5/7. The value of the game to Blotto is 1/7, indicating that his concern over loss of troops has decreased his expectation in the game. If both players play optimally, then Blotto's probability of losing a division is only 2/49, as compared with probability 1/2 in the preceding example. Thus his new strategy is successful in reducing the probability of troop loss. As a final variant, assume that Blotto has the desperate attitude and wants, at all costs, to capture one pass. To be definite, assume that, regardless of losses, he puts value one on capturing one or more passes and zero on not capturing a pass. It is easy to see that this makes all entries in the payoff table equal to one, so that any strategy for Blotto is optimal. Thus the desperate attitude obscures strategic differences.
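Both solutions can be verified directly. The sketch below (mine, not the authors') checks the saddle point of the first table, and checks that the mixed strategies for the second table — a direct computation gives the column mix (2/7, 5/7) — guarantee the same amount, which is therefore the value:

```python
from fractions import Fraction as F

# Payoff tables (rows: 4+0, 3+1, 2+2; columns: 3+0, 2+1).
passes_only = [[F(1, 2), F(0)], [F(1, 2), F(1, 2)], [F(0), F(1)]]
with_losses = [[F(1, 2), F(0)], [F(0), F(0)], [F(-2), F(1)]]

def row_guarantee(game, mix):
    # Worst case over columns of a mixed row strategy's expected payoff.
    return min(sum(p * game[i][j] for i, p in enumerate(mix))
               for j in range(len(game[0])))

def col_guarantee(game, mix):
    # Worst case over rows of what a mixed column strategy concedes.
    return max(sum(q * game[i][j] for j, q in enumerate(mix))
               for i in range(len(game)))

# First game: pure saddle point at (3+1, 3+0), value 1/2.
assert row_guarantee(passes_only, [0, 1, 0]) == F(1, 2)
assert col_guarantee(passes_only, [1, 0]) == F(1, 2)

# Second game: row mix (6/7, 0, 1/7) and column mix (2/7, 5/7)
# guarantee the same amount, so the value is 1/7.
assert row_guarantee(with_losses, [F(6, 7), 0, F(1, 7)]) == F(1, 7)
assert col_guarantee(with_losses, [F(2, 7), F(5, 7)]) == F(1, 7)
```

Since the maximizer's guarantee equals the minimizer's concession, both mixed strategies are optimal.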
§7. A SEQUENCE GAME

As a more complicated example of an application of utility functions, let us consider the game of matching pennies in which the players have agreed to quit after N plays, and in which the first player is given the option of quitting sooner if he wishes. According to ordinary game-theoretic arguments, this game is fair and hence a "rational" person should not be unwilling to accept the role of either player in the game. Yet most people would gladly play the role of the first player, and would refuse the role of the second player. We shall show that a person with the "winning" attitude (and initial fortune zero) would be willing to play the game as
the first player and would not be willing to play the game as the second player. Moreover he will continue to play until one of the following three conditions obtains: his fortune becomes one; his fortune returns to zero and there is exactly one more play of the game; his fortune drops so low that he has no chance of raising it back to zero. If any of these three cases occurs, the player stops playing the game. Thus the winning attitude produces a behavior somewhat like that of many persons.

Let the game G be given by the matrix

     1   -1
    -1    1
     0    0

where the row of zeros corresponds to the optional stop privilege of the first player. Obviously both players will use a (1/2, 1/2) strategy if they decide to play. The rules of the game are: after the Nth play of G the game stops; after fewer than N plays of G the game continues if the first player has chosen one of the first two rows of the matrix but stops if he has chosen the third row. Suppose now that the first player has the winning attitude given by the utility function

           x + ε    for x > 0 ,
    f(x) = x        for x = 0 ,
           x - ε    for x < 0 .
Note that the value of f(x) is obtained from x by adding a bonus of +ε, 0, or -ε, depending upon the sign of x. As the game proceeds the first player's fortune varies according to a random walk (see Feller [1], Chapter 14). The expected change in his utility depends upon his expected final fortune and his expected final bonus. By a well-known theorem, his expected final fortune is the same as his initial fortune, regardless of how he uses his optional stop privilege. Hence his expected final change in utility depends only upon his expected final bonus.

By a position we shall mean the pair h, n where h is the player's fortune and n the number of games left to play. Let v_{h,n} denote the function that gives the expected change in bonus of the first player if he starts at position h, n and plays optimally. We proceed to determine v_{h,n} by means of difference equations. Let us first determine the boundary conditions. The easiest values to define are for n = 0. Clearly

(5)    v_{h,0} = 0    for all h ,

since the game is then not played at all. If h < -n, that is, if it is impossible for the player ever to raise his fortune to zero, then he cannot change the bonus and hence

(6)    v_{h,n} = 0    for h < -n .

Finally, if h ≥ 1, then the player already has the maximum bonus of +ε, and cannot improve on it and may possibly decrease it by playing. On the other hand, he can assure himself of a zero change in bonus by not playing, that is,

(7)    v_{h,n} = 0    for h ≥ 1, all n .
(Here as elsewhere in this paper we use the convention that the player will not play unless his expected bonus is positive.)

Equations (5), (6), and (7) define three boundaries and determine a wedge in the plane. If the player always plays when he is located inside this wedge, then it is obvious that the following difference equations hold:

(8)     v_{0,n} = (1/2) v_{-1,n-1} ,

(9)     v_{-1,n} = (1/2)(ε + v_{0,n-1} + v_{-2,n-1}) ,

(10)    v_{h,n} = (1/2)(v_{h+1,n-1} + v_{h-1,n-1})    for h < -1 .

If we find a nonnegative solution to these equations within the wedge defined by the boundary conditions then we will have found the expression for the expected gain v_{h,n}. Without going into details it can be shown that the solution of these difference equations, valid in the wedge, is

(11)    v_{0,n} = ε Σ_{k=1}^{n} C^{k+1}_{k/2} / ((k+1) 2^k) ,

(12)    v_{h,n} = (-h+1) ε Σ_{k=-h}^{n} C^{k+1}_{(k+h)/2} / ((k+1) 2^k)    for h < 0 .
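The boundary conditions (5)–(7), the recursion (8)–(10), and the closed forms (11)–(12) can be checked against one another numerically. The sketch below is mine, not the authors'; it uses exact rational arithmetic, writes C^m_p for the binomial coefficient binom(m, p) with the vanishing convention stated in the text, and takes ε = 1:

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

EPS = Fraction(1)

@lru_cache(maxsize=None)
def v(h, n):
    # Boundary conditions (5), (6), (7).
    if n == 0 or h >= 1 or h < -n:
        return Fraction(0)
    # Difference equations (8), (9), (10).
    if h == 0:
        return Fraction(1, 2) * v(-1, n - 1)
    if h == -1:
        return Fraction(1, 2) * (EPS + v(0, n - 1) + v(-2, n - 1))
    return Fraction(1, 2) * (v(h + 1, n - 1) + v(h - 1, n - 1))

def C(m, twice_p):
    # binom(m, p), set to zero unless p is a nonnegative integer;
    # twice_p = 2p, so p is an integer exactly when twice_p is even.
    return comb(m, twice_p // 2) if twice_p >= 0 and twice_p % 2 == 0 else 0

def v_closed(h, n):
    # Equations (11) (for h = 0) and (12) (for h < 0).
    if h == 0:
        return EPS * sum(Fraction(C(k + 1, k), (k + 1) * 2**k)
                         for k in range(1, n + 1))
    return (-h + 1) * EPS * sum(Fraction(C(k + 1, k + h), (k + 1) * 2**k)
                                for k in range(-h, n + 1))

assert v(0, 1) == 0      # the "nick": no play at fortune 0 with one game left
assert all(v(h, n) == v_closed(h, n)
           for h in range(-6, 1) for n in range(0, 10))
```

The recursion and the closed forms agree everywhere in the wedge, and v_{0,1} = 0 while the neighboring values are positive.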
In interpreting these equations the binomial coefficients C^m_p are to be set equal to zero if the subscript p is not a nonnegative integer. By direct verification it is easy to show that v_{0,1} = 0, which means that the player does not play when his fortune is zero and exactly one game remains to be played. Also by direct verification, it can be seen that v_{h,n} > 0 for h = 0 and n > 1, or h < 0 and n > 0; that is, the player will continue to play in the rest of the wedge defined by the boundary conditions. The final solution is then that the player plays in a plane region that is a "nicked" wedge, the nick being at the position 0, 1.

The solution just found may be stated verbally as follows: If the player (having a winning utility function) and zero fortune is asked to play the game, he should accept the role of the first player providing N is at least 2. This strategy is, roughly speaking, "play until you get ahead, then quit." More precisely, he should quit whenever (a) his fortune rises above zero, or (b) he falls too far behind to catch up (h < -n), or (c) he gets his losses back and there is only a single game left. On the other hand, if the player is offered the second player's position he should definitely refuse, since his opponent could use the above strategy and assure that the expected change in the player's bonus is negative.

The reader should observe that the above optimal strategy depends upon the present fortune of the player, in contrast to the games discussed in the early parts of this paper in which the optimal strategy was independent of the present fortune of the player.

§8. THE RECKLESS, CAUTIOUS AND COMMON ATTITUDES

We discuss first the reckless and cautious attitudes and then, at the end of the section, the common attitude. In order to discuss the first two we shall consider "bets" which are, in effect, one-person games against Nature. By a bet (λ, p) we shall mean the matrix game
                    Nature
                 p        1 - p
    Accept Bet   λ         -λ
    Reject Bet   0          0

Here Nature's strategies are to play column one with fixed probability p > 0 and column two with probability 1 - p. The quantity λ is called the stake. If the player chooses the first row he "accepts" the bet and if he plays the second row he "rejects" the bet. The bet is called a fair bet if p = 1/2; it is called a losing bet if p < 1/2; and it is a winning bet if p > 1/2.
A function f is said to be convex upward at x if there exists a positive number δ such that for all λ with 0 < λ < δ,

    f(x + λ) - f(x) > f(x) - f(x - λ) .

The function f is convex downward at x if the inequality sign is reversed.
In this section we shall always assume that if f is a utility function then f(0) = 0. If the player has a utility function f and x is his present fortune, then his expected utility from accepting the bet is

    pf(x + λ) + (1 - p)f(x - λ) .

The utility of his present fortune is, of course, f(x). We assume that the player will play according to the following rules:

    (A) if pf(x + λ) + (1 - p)f(x - λ) - f(x) > 0 then accept the bet;
    (B) if pf(x + λ) + (1 - p)f(x - λ) - f(x) < 0 then reject the bet.
In words this strategy says: accept the bet if and only if your expected utility from the bet is greater than the utility of your present fortune.

We shall say that a player is reckless in an interval (a, b) if and only if, when his present fortune x lies in this interval, he will accept winning and fair bets and some unfair bets, providing the stake is sufficiently small. Similarly, we shall say that he is cautious in an interval (a, b) if and only if, when his present fortune x lies in the interval, he will accept only some (sufficiently favorable) winning bets.

Assume for the moment that the function f is strictly increasing; then we can recast the above strategy by means of a function g_λ defined by

    g_λ(x) = (f(x + λ) - f(x)) / (f(x) - f(x - λ)) .

The above strategy now becomes:

    (A) if g_λ(x) > (1 - p)/p then accept the bet;
    (B) if g_λ(x) < (1 - p)/p then reject the bet.

Obviously losing bets correspond to (1 - p)/p > 1, winning bets correspond to (1 - p)/p < 1, and fair bets to (1 - p)/p = 1.
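The ratio g_λ is easy to compute, and its behavior in x previews the classification proved below. In the sketch (mine), f = e^x - 1 makes g_λ constant in x, while f = e^{x²} - 1 makes it grow with the fortune; both functions appear later as examples:

```python
import math

def g(f, x, lam):
    # g_lambda(x) = (f(x + lam) - f(x)) / (f(x) - f(x - lam))
    return (f(x + lam) - f(x)) / (f(x) - f(x - lam))

uniformly    = lambda x: math.exp(x) - 1        # a e^{bx} - a with a = b = 1
increasingly = lambda x: math.exp(x * x) - 1    # log f ~ x^2, convex upward

lam = 0.5
# For f = e^x - 1 the ratio is the same at every fortune (it equals e^lam):
vals = [g(uniformly, x, lam) for x in (0.0, 1.0, 5.0)]
assert all(abs(v - math.exp(lam)) < 1e-9 for v in vals)

# For f = e^{x^2} - 1 the ratio grows with the fortune x:
assert g(increasingly, 2.0, lam) < g(increasingly, 3.0, lam)
```

A constant g_λ means the player's willingness to bet never varies with his fortune; a growing g_λ means he becomes ever more reckless as his fortune increases.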
THEOREM 6. A utility function f gives rise to consistently reckless behavior in an interval a < x < b if and only if f is continuous, convex upward and strictly increasing in the interval. A utility function f gives rise to consistently cautious behavior in the interval if and only if it is continuous, convex downward and strictly increasing in the interval.

PROOF. We prove the first statement only, since the proof of the second is similar. Since f is monotone increasing it has right and left hand limits at any point c in the interval (a, b). Therefore, we can define

    f-(c) = lim_{x→c-} f(x)    and    f+(c) = lim_{x→c+} f(x) ,

and set

    d(c) = f+(c) - f-(c) .

Suppose now that a point c in (a, b) is a point of discontinuity of f, i.e., suppose that a < c < b and d = d(c) > 0. Then consider the fair bet (λ, 1/2) taken at a fortune x with x - λ < c < x, so that f(x) - f(x - λ) ≥ d, and set z = f(x + λ) - f(x). If for some such choice z is less than d then, by (B), the player will not accept the fair bet, contrary to the fact that he has the reckless attitude. On the other hand, if for all such choices we have z ≥ d, then f+(c) does not exist, which is also a contradiction. Hence we can conclude that f is continuous in (a, b).

If f is not always convex upward then there is a fortune x and a stake λ such that x, x + λ, and x - λ belong to (a, b) and such that

    f(x + λ) - f(x) < f(x) - f(x - λ) .

But then the player would not accept the fair bet (λ, 1/2), contrary to the assumption that he is reckless. In the same way one can obtain a contradiction if f is not strictly monotone in the interval (a, b).

By referring to the definition of convexity and the rules of behavior of the player given above it is easy to see that if f is continuous, convex upward and strictly increasing in the interval then it gives rise to reckless behavior.

COROLLARY. If f is consistently reckless (or cautious) in the interval (a, b) then g_λ is continuous in the interval for every λ > 0.

PROOF. Since f is strictly increasing the denominator of g_λ is non-zero, and hence g_λ is continuous since it is the quotient of a continuous function by a non-zero continuous function.

If a player is everywhere reckless then g_λ(x) > 1 for λ > 0 and all x. The manner in which g_λ varies for fixed λ as x → ∞ shows whether the player becomes more or less reckless as his fortune goes to infinity. We shall say that the player's behavior is increasingly reckless or boundedly reckless according to whether g_λ goes monotonically to infinity or converges monotonically to a finite limit as x → ∞. By a uniformly reckless utility function we shall mean one for which g_λ remains constant for all x and fixed λ. A uniformly reckless utility function is obviously a special type of a boundedly reckless function. Notice that we do not consider here the possibility of non-monotonic behavior of g_λ; in other words we require that the player's behavior should vary consistently as x increases. In the next theorem we study these kinds of behavior, making use of the function k(x) = log f(x).

THEOREM 7. (A) The function f is an increasingly reckless utility function if and only if k = log f is eventually (that is, for sufficiently large x) convex upward. (B) The function f is a boundedly reckless utility function only if k(x + λ) - k(x) is bounded. (C) The function f is a uniformly reckless utility function if and only if it is of the form ae^{bx} - a.
PROOF. If f is a consistently reckless utility function, then by Theorem 6 it is monotone increasing to infinity; then k = log f must also increase monotonically to infinity, for if log f were bounded then f would also be bounded and would cross the graph of the equation y = x, contrary to the fact that f was convex upward.

(A) We can rewrite the function g_λ in terms of the function k as follows:

    g_λ(x) = (e^{k(x+λ)} - e^{k(x)}) / (e^{k(x)} - e^{k(x-λ)})
           = (e^{k(x)-k(x-λ)} / (e^{k(x)-k(x-λ)} - 1)) (e^{k(x+λ)-k(x)} - 1) .

It is not hard to see that g_λ is bounded if k(x + λ) - k(x) is bounded. The first factor on the right hand side is of the form z/(z - 1), which decreases monotonically to one as z tends to infinity; hence g_λ is monotone increasing if and only if k(x + λ) - k(x) is monotonically increasing to infinity, and the latter statement is the same thing as saying that k = log f is eventually convex upward.

(B) Since f is increasing, k(x + λ) - k(x) is always positive. If k(x + λ) - k(x) were unbounded then, as in (A), we would have g_λ(x) → ∞ as x → ∞.

(C) If g_λ is always equal to a constant M we have a functional equation of the form

    f(x + λ) + Mf(x - λ) = (1 + M)f(x)    for all x .

It is easy to check that a function of the form

    f(x) = ae^{bx} - a ,    a > 0 ,

satisfies the equation providing M = (e^{bλ} - 1)/(1 - e^{-bλ}). The latter equation determines b uniquely. By investigating the behavior of f on a dense set of x values one can show, in the standard manner for handling such functional equations, that f is uniquely determined up to a multiplicative constant.

EXAMPLES. It is instructive to have examples of functions exhibiting the above kinds of behavior. Such are the following:

    (A) f(x) = e^{x²} - 1 ,    (B) f(x) = x³ ,    (C) f(x) = e^x - 1 .

The function k is obtained from f by "moderating" the convexity of f by taking its logarithm. Now k is not defined for small positive and all negative values of x, but for large values of x it can be considered to be a utility function. Define the following functions:

    k_1 = k = log f
    k_2 = log k_1 = log log f
    ..................
    k_n = log k_{n-1} .
We shall say that f is increasingly reckless of type n if and only if k_n is (eventually) convex upward but k_{n+1} is not. Defining exp z = e^z, we can see that the function

    f(x) = exp exp ... exp x - exp exp ... exp 1
           (n times)          ((n-1) times)

is a utility function reckless of type n. Thus we see that there exists an infinite hierarchy of reckless utility functions, each of a given type.

The next theorem shows how reckless utility functions affect the behavior of a person playing matrix games. Obviously an analogous theorem can be proved about cautious utility functions.

THEOREM 8. If f is a reckless utility function and G is a matrix game, then
    (A) val[G] = 0 implies val[f(G)] ≥ 0;
    (B) val[G] > 0 implies val[f(G)] > 0;
    (C) for every fortune x there exists a game G with val[G] < 0 and val[f(G + xE)] > f(x); i.e., the player will always be willing to play some unfair games.

PROOF. (A) Let X = (x_1, ..., x_m) be an optimal strategy for G and consider the columns of G and f(G). For each column j we have

    0 = val[G] ≤ Σ_i x_i g_{ij} ,

and hence, by the convexity of f and f(0) = 0,

    Σ_i x_i f(g_{ij}) ≥ f(Σ_i x_i g_{ij}) ≥ 0 ,

where the second inequality sign may not be strict in the case that g_{ij} = 0 for all i. Hence val[f(G)] ≥ 0.

(B) The proof here is the same as in case (A) except that the second inequality must be strict, since val[G] > 0 implies that each column of G has at least one positive entry.
Consider the game
ATTITUDES AND GAME OUTCOMES / °6
X - 8
- X - 6 \
y - X - 5
6
\-
J
where o < 8 < X. Optimal s t r a t e g i e s f o r each p la y e r a r e v a l [Gg] = - 5 < 0. S im ila r ly , f o r th e game / f (x + x - 6 ) f(GB + xE) = | \ f (x - x - 8 ) th e op tim al s t r a t e g i e s a re
(1/2, 1/2 )
v a l [f(G 5 + xE )] = ( l / 2 ) [ f ( x + F or
8
= 0we have, from the co n v ex ity
8 > 0
and
f (x - x - 8 ) f (x + x - 8 )
f o r each p la y e r and i t s v a lu e i s X - 8 ) + f (x - X -
of
8 )]
f , th a t
v a l [f(G 0 + xE )] > f ( x ) By c o n tin u ity , we can choose
( 1/ 2, 1/2 )
.
so sm all th a t
v a l [f(G 6 + xE ) ] > f ( x ) , con clu din g th e p r o o f. We say th a t a p la y e r h as the common a t t it u d e i f and only i f he i s r e c k le s s whenh is p re se n t fo rtu n e l i e s in an in t e r v a l (a , b ) c o n ta in in g the o r ig in , and i s c a u tio u s when h is p re se n t fo rtu n e l i e s o u tsid e t h i s in t e r v a l . As exam ples, c o n sid e r th e fo llo w in g fu n c tio n s : x.
L et
(a ) Our f i r s t example i s con tinuous and d if f e r e n t i a b l e f o r a l l a < 0 < b be a r b it r a r y numbers and l e t f be a s fo llo w s: ex - 1
f(x) =
at
(b ) a and b .
V
fo r
a< x < b
- e2b e~x + 2eb - 1
fo r
x> b
- e 2a e ”x + 2ea - 1
fo r
x< a
,
,
(b) The second example is continuous and differentiable except at a and b. Let a < 0 < b be arbitrary numbers and let f be as follows:

    f(x) = e^x - 1                     for a ≤ x ≤ b ,
         = (1/2)(x - b) + e^b - 1      for x > b ,
         = 2(x - a) + e^a - 1          for x < a .
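Example (a) can be checked numerically: the exponential tails are chosen so that both the value and the derivative of f match at the joins, and f is convex upward inside (a, b) but convex downward outside. A sketch (mine, with the illustrative choice a = -1, b = 1):

```python
import math

def f(x, a=-1.0, b=1.0):
    # Example (a): e^x - 1 inside [a, b], matched exponential tails outside.
    if x > b:
        return -math.exp(2 * b) * math.exp(-x) + 2 * math.exp(b) - 1
    if x < a:
        return -math.exp(2 * a) * math.exp(-x) + 2 * math.exp(a) - 1
    return math.exp(x) - 1

def num_diff(x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

# Continuity and a matching derivative at the joins a = -1 and b = 1:
for c in (-1.0, 1.0):
    assert abs(f(c + 1e-9) - f(c - 1e-9)) < 1e-6
    assert abs(num_diff(c + 1e-3) - num_diff(c - 1e-3)) < 1e-2

# Reckless (convex upward) inside (a, b), cautious (convex downward) outside:
assert f(0.3) + f(-0.3) > 2 * f(0.0)    # convex upward near 0
assert f(2.3) + f(1.7) < 2 * f(2.0)     # convex downward beyond b
```

The same checks applied to example (b) would show the matching values but mismatched one-sided slopes at a and b.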
The characteristics of the common attitude are quite similar to those of many persons. For example, if the player's fortune is in the interval (a, b) then he will be willing to play games whose entries are not too large. Thus he would match pennies if the payoffs are in terms of dollars; however he would not match pennies for payoffs in terms of millions of dollars. He will continue to play a game until his fortune rises to somewhere in the neighborhood of b or until it falls to somewhere near a. In other words, he will quit if "he gets far enough ahead" or if "his losses drive him from the game." By means of this utility function it would be possible to construct a game-playing robot having these properties.

BIBLIOGRAPHY

[1]
FELLER, W., An Introduction to Probability Theory and its Applications, John Wiley and Sons, 1950.

[2] GOFFMAN, C., Real Functions, Rinehart and Co., Inc., 1953.

[3] KUHN, H. W., "Extensive games and the problem of information," Annals of Mathematics Study No. 28 (Princeton, 1953), pp. 193-216.

[4] MOSTELLER, F. and NOGEE, P., "An experimental measurement of utility," Journal of Political Economy 59 (1951), pp. 371-404.

[5] NASH, J. F., "Non-cooperative games," Annals of Mathematics 54 (1951), pp. 286-295.

[6] von NEUMANN, JOHN and MORGENSTERN, OSKAR, The Theory of Games and Economic Behavior, Princeton, 1944; third edition, 1953.

[7] SHAPLEY, L. S., "Stochastic games," Proc. Nat. Acad. Sci. 39 (1953), pp. 1095-1100.

John G. Kemeny
Gerald L. Thompson
Dartmouth College
ON A GAME WITHOUT A VALUE

Maurice Sion^1 and Philip Wolfe^2

§1. INTRODUCTION

The object of the present paper is to show that one of the main results in the theory of infinite games, the theorem of Glicksberg [2] on semi-continuous payoffs, cannot be extended in certain directions.

In §2 we present a game on the square (a form of continuous Blotto) which does not have a value, but whose payoff function is topologically even simpler than that of the classical example due to Ville [6]. Scarf and Shapley [5] have applied Glicksberg's theorem to a number of infinite games in extensive form. It is natural to ask whether the condition of semi-continuity of the payoff is equally important for the determinacy of such games. To answer this question we show in §3 that any game on the square may be transcribed into a game in extensive form with its value, or lack of value, preserved. In §4 we find that the transcription of the example of §2 to extensive form yields a type of "game of pursuit" without a value.

§2. THE GAME ON THE SQUARE

Glicksberg's theorem states: If A and B are compact sets, and K is an upper (lower) semi-continuous function on A × B, then

    sup_f inf_g ∫∫ K df dg = inf_g sup_f ∫∫ K df dg ,

where f and g range respectively over the sets of all Borel probability measures on A and B.

^1 Supported by the United States Air Force, through the Office of Scientific Research of the Air Research and Development Command, under contract No. AF 18(600)-1109.
^2 Under contract with the Office of Naval Research.
SION AND WOLFE

The example delimiting possible extensions of this theorem is the game on the unit square (0 ≤ x ≤ 1, 0 ≤ y ≤ 1) having the payoff (see Figure 1)

    K(x, y) = -1    if x < y < x + 1/2 ,
            =  0    if x = y or y = x + 1/2 ,
            = +1    otherwise .

Note that, unlike Ville's example, this function K assumes the values +1, -1 respectively on two open sets and vanishes on their complement. It is clearly neither upper nor lower semi-continuous. (Recall that a function F is upper (lower) semi-continuous if for any number c, the set {P | F(P) < c} ({P | F(P) > c}) is open.) We shall show that

    sup_f inf_g ∫∫ K df dg = 1/3 < 3/7 = inf_g sup_f ∫∫ K df dg .

[Figure 1: the regions of the unit square on which K takes the values +1, -1, and 0.]

Let
f be any probability measure on [0, 1]. If f([0, 1/2)) ≥ 1/2, let ...

2) ... Q(0) = 0; Q(1) = 1; Q'(t) > 0, 0 < t < 1.
RESTREPO

3) The symbols z, r(z_k), s(z_k), Ψ(z) are defined only when x ∈ X and y ∈ Y are two vectors such that x_i ≠ y_j, all i, j. z denotes the vector whose components are the numbers x_1, ..., x_n, y_1, ..., y_m arranged in increasing order; for k = 1, ..., n + m, either z_k = x_i (for some i), or z_k = y_j (for some j), but not both. Thus

    r(z_k) =  P(x_i)    if z_k = x_i ,
             -Q(y_j)    if z_k = y_j ;

and

    s(z_k) =  P(x_i)    if z_k = x_i ,
              Q(y_j)    if z_k = y_j ;

are well defined functions. Finally, Ψ(z) is defined recursively as follows:

    Ψ(z_1, ..., z_k) = r(z_1) + [1 - s(z_1)] Ψ(z_2, ..., z_k)
    Ψ(z_2, ..., z_k) = r(z_2) + [1 - s(z_2)] Ψ(z_3, ..., z_k)
    ......................................................
    Ψ(z_k) = r(z_k) .

The number of components in x and y is not essential, and the definition can be applied to a single vector x ∈ X, or to a single vector y ∈ Y. In this case one may say that the other vector has no components.
4) The pay-off function Ψ(x, y) is defined as follows: if each component of x is different from each component of y, then

    Ψ(x, y) = Ψ(z) ,

where Ψ(z) is the number defined above. If the given condition is not satisfied, then

    Ψ(x, y) = (1/2) {Ψ(x + 0, y - 0) + Ψ(x - 0, y + 0)} ,

where x ± 0 and y ± 0 indicate one-sided limits taken in every component.
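The recursion for Ψ(z) unwinds to the sum Ψ(z) = Σ_i r(z_i) Π_{l<i} [1 - s(z_l)], which is easy to implement. The sketch below is mine; the choice P = Q = identity is an illustrative assumption for the accuracy functions. It also spot-checks the splitting identity and the sign change under interchanging the roles of x and y:

```python
def psi(z):
    # z: components in increasing order, each a pair (kind, prob) where
    # kind is 'x' (prob = P(x_i)) or 'y' (prob = Q(y_j)).
    # Unwinding Psi(z_1,...,z_k) = r(z_1) + [1 - s(z_1)] Psi(z_2,...,z_k)
    # gives Psi(z) = sum_i r(z_i) * prod_{l<i} (1 - s(z_l)).
    total, weight = 0.0, 1.0
    for kind, p in z:
        r = p if kind == 'x' else -p    # r = P on x-components, -Q on y's
        total += weight * r
        weight *= 1.0 - p               # s = P or Q in either case
    return total

# Illustrative accuracy functions P = Q = identity on (0, 1):
z = [('x', 0.2), ('y', 0.5), ('x', 0.6), ('y', 0.9)]

# Splitting after the first t components (here t = 2):
head, tail = z[:2], z[2:]
w = (1 - 0.2) * (1 - 0.5)
assert abs(psi(z) - (psi(head) + w * psi(tail))) < 1e-12

# Interchanging the roles of x and y reverses the sign of Psi:
flipped = [('y' if k == 'x' else 'x', p) for k, p in z]
assert abs(psi(z) + psi(flipped)) < 1e-12
```

The two assertions are exactly the properties established for Ψ in the next section.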
TACTICAL PROBLEMS

§4. SOME PROPERTIES OF Ψ(x, y)

Ψ(x, y) is continuous as long as the relative order of the components of x and y does not change. Furthermore, Ψ(x, y) is skew-symmetric in the following sense: if the roles of X and Y are interchanged, Ψ(y, x) = -Ψ(x, y). Other simple properties of Ψ are given in the next three lemmas.

LEMMA 1. If z = (z_1, ..., z_t, z_{t+1}, ..., z_k), then

    Ψ(z) = Ψ(z_1, ..., z_t) + Π_{i=1}^{t} [1 - s(z_i)] Ψ(z_{t+1}, ..., z_k) .

PROOF. If t = 1, the formula reduces to the definition of Ψ(z). For t > 1, the formula is proved by induction. The details are
If Z
f{z)
=
( z
^
• • • ;
z t
=
_ 1 ,
Z t
,
Z t + 1 ,
zt_ 1, zt+1,
t-1 + ]~j [1 -
S
(Z_^ )] (r(zt )
Z k ),
zk )
-
s(zt ) Y(zt+1,
zk )} .
i= 1 PROOF. By Lemma 1,
¥(z) = ^(z^
t-1 zt_1 ) + j"! [1 - s(z1 )] Y(zt, 1=1
zk )
,
and '?(zt,
zk ) = r(zt ) + [1 - s(zt )] 5f(zt+1,
zk )
.
Therefore,
¥ (z ) = ¥ (Z . j, *•m
t-1 Z^___j) + | |" [1 ” s (Z^ )] Y (zt+ ^, •*•; Zk ) 1=1
t-1 + ]~T f1 - s(z1 )] {p(zt ) - s(zt ) ¥(zt + 1, . i= 1 Lemma 1 shows that the first two terms can be replaced by Zt+1>
zk )}
y
( z ^,
.
z^_1
zk )# LEMMA 3* For any fixed y, ¥(x, y) Is a monotone in creasing function of each component x^ of x as long as x^ ranges over an open interval that does not con tain any components of y . Similarly, for any fixed x, ¥(x, y) Is a monotone decreasing function of each component y . of y as long as y^ ranges over an open interval that does not contain any components of
x.
PROOF. Since ¥ (x, y) is skew-symmetric in x and y, it is sufficient to prove only the first half of the lemma. The result Is
RESTREPO
318
established by means of Lemma 2. In the term
j~j [
17
i-1
1
-
P(xk )]
k= 1
where
z*
Indeed, any component
[ 1
-
)]
Q(yj
(P^)
-
x^
appears only
P(X l ) y(z*)}
,
y . o,
andan+| = 1•
The first
n - 1 measures are continuous;
last measure Fn may mass a at xn = 1.
the
have only one discontinuity with
Equivalent conditions are used when dealing with measures G(y) defined over Y. NOTATION.
In the following sections,
denote the expected values of
P
and
¥.
That is,
ai+i
%= J
P(t) dP± (t)
a.
R(y) =
J
y(x, y) dF(x)
and
R(y)
will always
TACTICAL PROBLEMS The vector
f)(& )
and the function
■ U W
319
are defined as follows:
.2.....Dn>
“k
+ [1 - D^] 0 ( & )
\
0(Bn ) = 0
.
Using the last definition, it is easy to see that
Ji(xk+1,
xn ) dPk+1(xk+l) ... dPn(xn ) = 0(Dk)
.
LEMMA 4. Let F(x) be in the class O, and let y ∈ Y be a vector whose last component y_m is contained in the open interval (a_k, a_{k+1}). Then

    R(y_1, ..., y_m) = R(y_1, ..., y_{m-1})
        + Π_{i=1}^{k-1} [1 - D_i] Π_{j=1}^{m-1} [1 - Q(y_j)] [-Q(y_m)]
          { 2 ∫_{y_m}^{a_{k+1}} P(x_k) dF_k(x_k) + [1 - D_k][1 + Φ(D_k)] } .
(y,, • • • , y ^ , ym)) - *(x, (y^ •••> ym_-|))
k
m-1
= j~[ [1 - p(x± )] ]~j [1 - Q(yj)] [- Q(ym )] M 1=1
But if
ym < xk < a ^ ,
*(x, (y1, • k-1
+ *(xk+1> •••> xn )J
j=i
ym_1, ym)) - *(x, (y1?
ym_1))
m-i
= J~j [1 - p(xi )] [] [1 - Q(yj)] [- Q(ym )] t1 + 1=1 j=i
xk+i^ •••■’ xn )]
RESTREPO
320
The vectors
x
x^ = Jm
with
•••» ym _i»
ym )) - y(x, (y 1, ..., ym _1 ) )] dP(x) m~ 1
- rt" i=i
k-i + f f [1 - D . ] 1=1 r
r
n [1 - Q(yj )H-Q(ym )] -j J j= i v.
- %i
m-1 f f [ i - Q(yj ) J [ - Q(ym)] j=i
i [1 - P ( x k )][i + 0(0^)] dPk (xk ) IJ
•
ak+1
I/
Therefore,
•••> v V
= J [^(5 , ( j v k- 1
have F-measure zero.
^ I 1 + P(Xk) + t1 ~ P(xk )] 0 (Dk )}dFk (xk ) >
ym
k-i
=
m-1
f|[1 - d.] [] 1=1
•
{ 2/
f1 -Q(yj)3 [-
•
j=1
+ [1 - °k] [1 +
p i+k-1
2 n
[1 - ct ]
7i+k , r
j = ri
t=± In order to complete the proof of the lemma it is sufficient to replace this system of equations by the equivalent system that is obtained by sub tracting from each equation the preceding one; for instance, if the equa tion that contains h^j, then
hij+i
Is subtracted from the equation that contains
bij +2 Q(bl j + i )
P(x_. ) dFi (x1 ) + 2y. - 2 7 n. . l>J+2
/
ft(b..+ 2 )
= 2hi j + l
Q ( b i j +1 ) "
j " 2hi J +1
Simplifying, hij
^1
Similarly, the equations that contain
+ 1 ^ hij + l hj_r>
and
h^+1 Q
show that
'
RESTREPO
326
bi+i ,o
^1 ~ It may happen that one of the original
a^1s
coincides with one of the
original t>j,s> say ai+i = bi+i 1 ‘ this case the system, (d) does not contain an equation with h i+1 . ,. ,o ^ and the last derivation is not valid. However, subtracting in this case the equation with from, the equa tion with hi+1 .j one can see that 1
In this case it is convenient to introduce the unnecessary parameter hi+1,o
defined hi+1,o
“ ^ ai+l^ hi+l,1
t1 ” ^ bI+l,l^ hi+1,1J
if ai+l . „ = b i+1,1 . , With this definition, the equations stated
h.JL JL , h._L"r , . I y KJ
y(x>
h.l “r
,
I y |
are always related by
in the lemma.
LEMMA 7. Let F(x) and G(y) be contained in the class O. If

    ∫ Ψ(x, y) dF(x) = v̄ ,    all y ∈ S_G , y_m ≠ 1 ,

then

    ∫ Ψ(x, y) dF(x) ≥ v̄ ,    all y ∈ Y .

If

    ∫ Ψ(x, y) dG(y) = v ,    all x ∈ S_F , x_n ≠ 1 ,

then

    ∫ Ψ(x, y) dG(y) ≤ v ,    all x ∈ X .
PROOF.
First Inequality.
The definition of
¥(x, y)
implies
that Y(x, y) > ¥(x, y - o),
all
x, y
Therefore, it is sufficient to consider only those vectors y with ym 4 1 * The inequality is proved by induction, and we may assume that it has been established for all the vectors y whose last m - q components coincide with the last m - q components of some vector in the support of G(y).
327
TACTICAL PROBLEMS
Let y = (y1, ..., yq+1^ y*+2^ •••> y^) be a vector whose last m - (q + 1 ! components coincide with the last m - (q + 1) components of some vector in the support of G. The assumptions of this lemma imply that R(y) must satisfy all the equations derived in the proof of Lemma 6. In particular, R(y) must be independent of all the starred components, and (by (b)), i-1
q+i [1 - D S 1 ff
R(y) = R(y,, •••, yq, yq + 1 ) - 2 ff
S=1
where
b^j+1 < y*+2 < b^j+2 ‘
t=1
[1 - Q(yt )]
7 ± j +1
,
Consider four cases.
CASE 1. y^+1 > b^j+1 .
In this case we must have
bij+l < yq+1 ^ bij+2 * Then-> by Lemma 4, i-1
yq ) - 2 f| [i - Dg] f[ [1 - ft(yt )] rljj + 2 j" H 2hi,j+i
y Hence, i-1 R(y) = R(y,,
q
yq ) - 2 ff [1 - Ds] ff S = 1
i-1 IT S=1
q ^1
CS] n II - Q(yt^ t=1
[1 - Q(yt )l rl j j + 1
t = 1
r | 2hij+i + 2 ^7i, j+2 “ 7i, j+ 1^ ^^q+1 ^ v.
By equation (2 ), 7j_,j+2 ” 7i,j+1 = ~ hij + 1; clearly* R (y) is a monotone increasing function of yq+1• If the components yp , yp+1, .••, y^ also lie in the interval [bij+1, b1j+2], the same argument will show that R(y) Is a monotone increasing function of these components, and R(y) achieves a smaller value if all these components are replaced by
32 8
RESTREPO
b_{i,j+1}; we can write y*_{q+1} = b_{i,j+1} ∈ [b_{i,j}, b_{i,j+1}]. Then y*_{q+1} coincides with the (q + 1)-st component of some vector in the support of G(y), and

R(y) ≥ R(y₁, …, y_q, b_{i,j+1}, y*_{q+2}, …, y*_m) ≥ v̄.

CASE 2. b_{i,j} < y_{q+1} < b_{i,j+1}. In this case we simply choose y*_{q+1} = b_{i,j+1}.
CASE 3. a₁ < y_{q+1} < b_{i,j}. For definiteness, assume that b_{k,l} ≤ y_{q+1} ≤ b_{k,l+1}. Applying Lemma 4 to R(y) and simplifying the result by means of the appropriate condition (a) from Lemma 6,

R(y) = R(y₁, …, y_q) − ∏_{s=1}^{i−1} [1 − D_s] ∏_{t=1}^{q} [1 − Q(y_t)] γ_{i,j+1}
  − ∏_{s=1}^{k−1} [1 − D_s] ∏_{t=1}^{q} [1 − Q(y_t)] [2h_{k,l} + 2(γ_{k,l+1} − γ_{k,l}) Q(y_{q+1})].

As before, R(y) is a monotone increasing function of y_{q+1}, so that

R(y) ≥ R(y₁, …, y_q, a₁, y*_{q+2}, …, y*_m).

CASE 4. y_{q+1} ≤ a₁. Replacing y_{q+1} by a₁ does not increase R(y). The problem is now reduced to Case 3.
Second Inequality. It is sufficient to interchange the roles of X and Y and to recall that Ψ(y, x) = −Ψ(x, y). Then, by assumption,

∫ Ψ(y, x) dG(y) = −v,  all x ∈ S_F, x_n ≠ 1.

Then, by the first inequality (applied to Ψ(y, x)),

∫ Ψ(y, x) dG(y) ≥ −v.
COROLLARY. If y is not contained in the support of G(y), then

∫ Ψ(x, y) dF(x) > v̄.
This result follows from the strict monotonicity of the functions that appear in the proof of Lemma 7.

LEMMA 8. Let F(x) and G(y) be two corresponding strategies such that at least one of them is continuous at 1. Then v = v̄, and F and G are optimal.

PROOF. For definiteness, we may assume that G is continuous. By definition of v̄,

∫ Ψ(x, y) dF(x) = v̄,  all y ∈ S_G, y_m ≠ 1.

Since the vectors with y_m = 1 have G-measure zero,

(4)  ∫∫ Ψ(x, y) dF(x) dG(y) = ∫ v̄ dG(y) = v̄.

Similarly, by definition of v,

∫ Ψ(x, y) dG(y) = v,  all x ∈ S_F, x_n ≠ 1.

Since G is continuous, the integral is continuous and the equation is valid for all x ∈ S_F. Therefore,

(5)  ∫∫ Ψ(x, y) dG(y) dF(x) = ∫ v dF(x) = v.
Equations (4) and (5) imply that v = v̄; this number is denoted by v. Lemma 7 asserts that v is the value of the game, and that F and G are optimal strategies.

REMARK. It is well known that if F and G are optimal, then

∫ Ψ(x, y) dF(x) = v,  all y ∈ S_G,

provided that y is a point of continuity of the integral; a similar result is valid when F and G are interchanged. In particular, any pair of optimal strategies that belong to the class O must be a pair of corresponding strategies, and it is easy to show that one of them must be continuous at t = 1. This remark may be taken as the converse of Lemma 8.
CHARACTERIZATION OF CORRESPONDING STRATEGIES. Let F(x) and G(y) be two strategies contained in the class O, and let α and β denote the discrete masses that F_n and G_m may have at x_n = 1 and y_m = 1. Let

D_i = ∫_{a_i}^{a_{i+1}} P(t) dF_i(t),   E_j = ∫_{b_j}^{b_{j+1}} Q(t) dG_j(t),

and let f*(x) and g*(y) be two discontinuous functions defined by

(6)  f*(x) = ∏_{b_j > x} [1 − Q(b_j)] · Q′(x)/(P(x) Q²(x)),

(7)  g*(y) = ∏_{a_i > y} [1 − P(a_i)] · P′(y)/(Q(y) P²(y)).
The results of Lemma 6 can be stated as follows: F(x) and G(y) form a pair of corresponding strategies if and only if the following conditions hold:

(8)  dF_i(x_i) = h_i f*(x_i) dx_i,  a_i < x_i < a_{i+1},  i = 1, …, n,

(9)  dG_j(y_j) = k_j g*(y_j) dy_j,  b_j < y_j < b_{j+1},  j = 1, …, m,

(10)  1 + 2α = D_n + 2h_n,

(11)  1 + 2β = E_m + 2k_m,

(12)  h_i = [1 − D_i] h_{i+1},  i = 1, …, n − 1,

(13)  k_j = [1 − E_j] k_{j+1},  j = 1, …, m − 1.

If these equations are used to characterize the optimal strategies, it is necessary to include the continuity hypothesis (αβ = 0) of Lemma 8 and the normalizing equations.
§7. EXISTENCE OF A SOLUTION
We have shown that in order to find two optimal strategies it is sufficient to find a set of numbers a₁, …, a_n, b₁, …, b_m, h₁, …, h_n, k₁, …, k_m, α and β that satisfy the previous set of equations. In equations (10), (11), (12) and (13) it is convenient to write D_i and E_j in terms of f* and g*, and to eliminate h_n, k_m, h_i and k_j by means of the normalizing equations. In this form, the complete system is as follows:

Normalizing Equations

(14)  h_n ∫_{a_n}^{1} f*(t) dt + α = 1,

(15)  k_m ∫_{b_m}^{1} g*(t) dt + β = 1,

(16)  h_i ∫_{a_i}^{a_{i+1}} f*(t) dt = 1,  i = 1, …, n − 1,

(17)  k_j ∫_{b_j}^{b_{j+1}} g*(t) dt = 1,  j = 1, …, m − 1.
Equations from Lemma 6 and Lemma 8

(18)  ∫_{a_n}^{1} [(1 + α) − (1 − α) P(t)] f*(t) dt = 2(1 − α),

(19)  ∫_{b_m}^{1} [(1 + β) − (1 − β) Q(t)] g*(t) dt = 2(1 − β),

(20)  ∫_{a_i}^{a_{i+1}} [1 − P(t)] f*(t) dt = 1/h_{i+1},  i = 1, …, n − 1,

(21)  ∫_{b_j}^{b_{j+1}} [1 − Q(t)] g*(t) dt = 1/k_{j+1},  j = 1, …, m − 1,

(22)  a₁ = b₁,

(23)  αβ = 0.
LEMMA 9. Let a be any number in the interval (0, 1], and let b₁, …, b_m be any set of parameters, subject to the restriction 0 < b₁ < … < b_m < 1; let f* be defined by equation (6). Then

lim_{x→0} ∫_x^a [1 − P(t)] f*(t) dt = +∞.

PROOF. Let

M = ∏_{j=1}^{m} [1 − Q(b_j)],

and choose c such that 1 − P(t) > P(t) for t < c. Then, for x < c,

∫_x^a [1 − P(t)] f*(t) dt ≥ M ∫_x^c Q′(t)/Q²(t) dt + ∫_c^a [1 − P(t)] f*(t) dt
  = M [1/Q(x) − 1/Q(c)] + ∫_c^a [1 − P(t)] f*(t) dt.

The first term tends to +∞; the second one is finite.
LEMMA 10. Let α be any number in the interval [0, 1), and let b₁, …, b_m be any set of parameters, subject to the restriction 0 < b₁, …, b_m < 1. Under these assumptions, equations (14), (16), (18) and (20) have a unique solution a₁, …, a_n, h₁, …, h_n. If the b's are fixed, a_n is a monotone increasing function of α, and a_n → 1 as α → 1. If α is fixed, a_n is a decreasing function of b_m, and a_n → 0 as b_m → 1.

PROOF. Lemma 9 shows that equation (18) has a solution a_n ∈ (0, 1]; since the integrand is positive, the solution is unique, and it is clearly a continuous, monotone increasing function of α; setting α = 1, one obtains a_n = 1; but if α < 1, a_n must be in (0, 1), and h_n can be computed by means of equation (14). When a_n and h_n are known, a_{n−1} can be obtained from equation (20), and h_{n−1} from equation (16). The process can be continued until all the a's and h's are found.

Finally, it is necessary to consider a_n as a function of b_m. Let α and d be fixed, and consider

∫_x^d [(1 + α) − (1 − α) P(t)] f*(t) dt + ∫_d^1 [(1 + α) − (1 − α) P(t)] f*(t) dt.

In the first term, the integrand contains the factor [1 − Q(b_m)] and tends to zero uniformly as b_m → 1; in the second term the integrand is bounded, and the range of integration can be made arbitrarily small. Thus, as b_m → 1, the solution a_n of equation (18) must approach 0.
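The solvability of (18) can be checked against the classical one-bullet silent duel. A minimal numeric sketch, assuming uniform accuracy functions P(t) = Q(t) = t, α = 0, no starred parameters, and the form of f* as reconstructed in (6) above (so f*(t) = t⁻³ — this closed form is an editorial reading, not verbatim from the text): bisection on (18) should return a₁ = 1/3, and the normalizing equation (14) then gives h₁ = 1/4, i.e., the classical silent-duel density x⁻³/4.

```python
def integral(f, lo, hi, steps=20000):
    # composite midpoint rule
    h = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * h) for i in range(steps)) * h

# One bullet per player, uniform accuracies P(t) = Q(t) = t, alpha = 0;
# then (assumed form of (6)) f*(t) = Q'(t)/(P(t) Q(t)**2) = t**-3.
fstar = lambda t: t ** -3

def eq18(a):
    # residual of equation (18) with alpha = 0:
    # integral_a^1 (1 - P(t)) f*(t) dt  minus  2(1 - alpha) = 2
    return integral(lambda t: (1 - t) * fstar(t), a, 1.0) - 2.0

# the residual is decreasing in a, so solve by bisection
lo, hi = 1e-3, 1.0 - 1e-9
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if eq18(mid) > 0:
        lo = mid
    else:
        hi = mid
a1 = 0.5 * (lo + hi)
h1 = 1.0 / integral(fstar, a1, 1.0)   # normalizing equation (14)
print(a1, h1)   # ≈ 0.3333…, 0.25
```

The agreement with the known silent-duel solution (firing support [1/3, 1], density x⁻³/4) is what motivates the reconstructed form of f* used here.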
REMARK. The value of each a_i depends only on those parameters b_j that are larger than a_i. The remaining parameters can be changed, and new parameters may be added. Also, the process is symmetric: if a₁, …, a_n, β are given parameters, equations (15), (17), (19), and (21) have unique solutions b₁, …, b_m. Results like those of Lemma 10 will hold.

THEOREM. The system of equations (14), …, (23) has a unique solution. This solution determines two strategies F(x) and G(y) that are optimal, and these are the only optimal strategies that belong to the class O.

PROOF. It is sufficient to show that the system has a unique solution a₁, …, a_n, b₁, …, b_m, h₁, …, h_n, k₁, …, k_m, α, β. Then F(x) and G(y) can be defined by equations (8) and (9). The normalizing equations show that F and G are strategies, and the remaining equations show that F and G satisfy the hypothesis of Lemma 8. The remark that follows that lemma shows that no other strategies in the class O can be optimal.

EXISTENCE. Consider two numbers α and β such that 0 ≤ α < 1, 0 ≤ β < 1, αβ = 0, and construct a set of numbers a₁*, …, a_n*, b₁*, …, b_m* as follows: first, compute two numbers a_n and b_m by means of equations (18) and (19), using no parameters in the definitions of f* and g*; the resulting
numbers are compared and the larger one is kept as a parameter; for definiteness, we assume that a_n > b_m, and define a_n* = a_n; the smaller number is neglected. In the next step a number a_{n−1} is computed by means of equation (20) (without parameters), and a new b_m is computed from equation (19), using the single parameter a_n*. It is clear that the new b_m is smaller than the one computed in the previous step, and therefore b_m < a_n*; the two numbers a_{n−1} and b_m are compared, and the larger one is kept as a parameter. For definiteness, one may assume that a_{n−1} > b_m, and define a*_{n−1} = a_{n−1}. The process is continued in this manner: at each step, a new a_i and a new b_j are computed, using as parameters the previously starred a's and b's; the two numbers are compared, and the larger one is kept as a parameter, denoted by a_i* (or b_j*); the corresponding h_i* (or k_j*) is computed by means of the appropriate normalizing equation. Since each parameter is smaller than those computed previously, it is clear that the resulting set is self-consistent: that is, if b₁*, …, b_m* are used as parameters, one obtains again a₁*, …, a_n*, and conversely. Therefore, these numbers satisfy the system of equations, except for equation (22).

Now, consider a₁* and b₁* as functions of α and β. If the previous construction is carried out with α = β = 0, it may happen that a₁* = b₁* and all equations are satisfied; otherwise, either a₁* < b₁* or b₁* < a₁*; for definiteness it may be assumed that b₁* < a₁*. In this case the same construction is applied with α = 0, β = 1 − ε, and ε is chosen so small that in the first step a_n < b_m, and b_m* = b_m is arbitrarily close to 1. In the remaining steps a_n is always computed with the parameter b_m*, and a_n → 0; thus all the b*'s are computed without parameters and remain bounded away from zero; then a₁* ≤ a_n* < b₁*; the inequality has been reversed as β increased from 0 to 1.
Since the a's and b's are limits of integration of strictly positive densities, it is clear that a₁* and b₁* are continuous functions of β, and there exists a positive β for which a₁* = b₁*. If the construction with α = β = 0 leads to the inequality a₁* < b₁*, the system of equations has a solution with α > 0, β = 0.

UNIQUENESS. The solution is unique. Indeed, suppose that a₁, …, a_n, b₁, …, b_m and a₁*, …, a_n*, b₁*, …, b_m* are two solutions. These solutions give rise to two pairs of optimal strategies F(x), G(y) and F*(x), G*(y). Since F and G* are optimal,

∫ Ψ(x, y) dF(x) = v,  all y ∈ S_{G*}, y_m ≠ 1.

The corollary to Lemma 7 shows that S_{G*} ⊂ S_G. Since the argument is symmetric, the two measures must have the same supports, i.e., b_j* = b_j for all j. Similarly, a_i* = a_i.
BIBLIOGRAPHY

[1] SHIFFMAN, M., "Games of timing," Annals of Mathematics Study No. 28 (Princeton, 1953), pp. 97–123.

[2] KARLIN, S., "Reduction of certain classes of games to integral equations," Annals of Mathematics Study No. 28 (Princeton, 1953), pp. 125–158.

[3] BLACKWELL, D., and GIRSHICK, M. A., Theory of Games and Statistical Decisions, New York, John Wiley and Sons, 1954.
Rodrigo Restrepo California Institute of Technology
MULTISTAGE POKER MODELS

Samuel Karlin and Rodrigo Restrepo*

One rich source of examples of non-trivial two-person games can be found in the area of parlor games. Specifically, variants of poker games provide an inexhaustible flow of such models. Aside from their game-theoretic interest, there is also much similarity between the analysis and solution of poker games and the statistical decision problem of testing hypotheses. It is therefore of considerable value to develop unified methods for analyzing such games.

Von Neumann was the first to make use of the general theory of games in the study of some specific poker models [1]. Subsequently, in a joint paper, Gillies, Mayberry and von Neumann investigated some extensions of these games [2] (see also [3]). In this paper we continue the study of poker models by allowing the possibilities of several rounds of betting and different sizes of bets. In spite of the complicated form of the problem, complete solutions are obtained. More important, implicit in the techniques used here is a general method which can be applied to a great variety of games. The method is applicable if the pay-off is the expected value of a function P of two random variables x and y, provided that the strategies for I and II can be identified with vectors Φ(x) = (φ₁(x), …, φ_m(x)) and Ψ(y) = (ψ₁(y), …, ψ_n(y)). These vectors may be subject to arbitrary constraints. The cumulative distributions F and G of x and y can be arbitrary, but implicit in the definition of the strategies is the assumption that I knows the value of x and II knows the value of y. In these games the pay-off will be of the form

R(Φ, Ψ) = ∫∫ P(x, y; Φ(x), Ψ(y)) dF(x) dG(y).

* This work was supported in part by O.N.R. through Contract Nonr-220(16) with the California Institute of Technology.
In the subsequent examples it will be possible to write this pay-off in two different forms:

R(Φ, Ψ) = M₁(Ψ) + Σ_{i=1}^m ∫ φ_i(x) C_{φ_i}(Ψ; x) dF(x)

and

R(Φ, Ψ) = M₂(Φ) + Σ_{j=1}^n ∫ ψ_j(y) C_{ψ_j}(Φ; y) dG(y),

where M₁ and M₂ are independent of Φ and Ψ respectively, and where C_{φ_i} and C_{ψ_j} denote the coefficients of φ_i and ψ_j in the expressions for R(Φ, Ψ). The notation emphasizes the fact that C_{φ_i} and C_{ψ_j} are functions of Ψ and Φ respectively.
In order to solve the game it is necessary and sufficient to find two strategies Φ* and Ψ* such that

(1)  R(Φ*, Ψ*) = max_Φ { M₁(Ψ*) + Σ_i ∫ φ_i(x) C_{φ_i}(Ψ*; x) dF(x) }

and

(2)  R(Φ*, Ψ*) = min_Ψ { M₂(Φ*) + Σ_j ∫ ψ_j(y) C_{ψ_j}(Φ*; y) dG(y) }.

In contrast, where K_i(y) = 0, the values of ψ_i(y) will not affect the pay-off. Guided to some extent by intuitive considerations, we shall construct two strategies Φ* and Ψ*. It will be easy to verify that these strategies satisfy equations (1) and (2) and are therefore optimal. The main problem in the construction is to make sure that the strategies are consistent, in the sense that the function Φ* that maximizes (3) is the same function that appears in (4), and similarly for Ψ*. We now proceed with the details.

Since player B has no opportunity to bluff, we may expect that

(5)  ψ_i(y) = 0 if y < b_i,  ψ_i(y) = 1 if y > b_i,

for some b_i. This is in fact the case, as each K_i(y) is non-increasing.
On the other hand, we may expect that player A will bluff sometimes when his hand is low. In order to allow such occasions of bluffing, we determine the critical numbers b_i which define ψ_i(y) so that the coefficient L_i(x) of φ_i is zero for x < b_i. This can be accomplished as L_i is constant on this interval. Hence, we choose

(6)  b_i = a_i/(2 + a_i),

and thus b₁ < b₂ < … < b_n < 1. The coefficient L_i(x) of φ_i is zero in the interval (0, b_i), and thereafter it is a linear function of x such that L_i(1) = a_i + 2.
From this we deduce that the functions L_i(x) and L_j(x) intersect at the unique point

(7)  c_{ij} = … / ((2 + a_i)(2 + a_j)).

Clearly c_{ij} is a strictly increasing function of i and j. Define c₁ = b₁ and c_i = c_{i−1,i} for i ≥ 2. For x in the interval (c_i, c_{i+1}), L_i(x) > L_j(x) ≥ 0 for all j ≠ i. Consequently, according to our previous discussion, if Φ* maximizes R(Φ, Ψ*), then φ_i*(x) = 1 for c_i < x < c_{i+1}, and for definiteness we also set φ_i*(c_i) = 1. Of course, this is of no consequence, since if a strategy is altered only at a finite number of points (or on a set of Lebesgue measure zero), the yield R(Φ, Ψ) remains unchanged. Summarizing, we have shown that if Ψ* is defined as in (5) with b_i = a_i/(2 + a_i), then any strategy Φ* of the form

(8)  φ_i*(x) = 1 for c_i < x < c_{i+1};  φ_i*(x) = 0 elsewhere on (c₁, 1);  φ_i*(x) arbitrary for 0 < x < c₁  (where Σ_{i=1}^n φ_i*(x) ≤ 1)

maximizes R(Φ, Ψ*). The values of φ_i*(x) in the interval (0, c₁) are still undetermined because of the relations L_i(x) = 0 which are valid for that same interval.
It remains to show that the Ψ* as constructed actually minimizes R(Φ*, Ψ). In order to guarantee this property for Ψ*, it might be necessary to impose some further conditions on the φ_i*. For this purpose, we shall utilize the flexibility present in the definition of the φ_i* as x ranges over the interval (0, c₁). In order that Ψ* minimize R(Φ*, Ψ) we must show that the coefficient K_i(y) of ψ_i(y) is non-negative if y < b_i and non-positive for y > b_i. Since K_i(y) is a continuous monotone decreasing function, the last condition is equivalent to the relation

(9)  −(a_i + 2) ∫_0^{b_i} φ_i*(x) dx + a_i ∫_{b_i}^1 φ_i*(x) dx = 0.

Inserting the special form (8) of Φ* into (9) leads to the equations

(10)  ∫_0^{b₁} φ₁*(x) dx = b₁(1 − b₁)(b₂ + 1),
      ∫_0^{b₁} φ_i*(x) dx = b_i(1 − b_i)(b_{i+1} − b_{i−1}),  i = 2, …, n − 1,

and

(11)  ∫_0^{b₁} φ_n*(x) dx = b_n(1 − b_n)(1 − b_{n−1}).

Since φ_i*(x) ≤ 1, these equations can be satisfied if and only if

(12)  Σ_{i=1}^n ∫_0^{b₁} φ_i*(x) dx ≤ 2b₁.

But the sum of the right-hand sides of the equations (10) and (11) is at most (1/4)(2 + b_n − b₁), since always b_i(1 − b_i) ≤ 1/4. Since b₁ > 1/2, (1/4)(2 + b_n − b₁) ≤ (1/4)(3 − b₁) < 1 < 2b₁. Thus, the requirements in (12), (10) and (11) can be fulfilled. We have consequently established the inequalities (1) and (2) for the strategies Φ* and Ψ*. In summary, we display the optimal strategies as follows:
KARLIN AND RESTREPO o
if
y
b±
(y)
1
arbitrary but subject to the constraint that f" 1 J 0^(x) dx = b^ tCj_+1 ” cj_J
*/ \ K(x)
if
x < b^
if
b1 < x
x„
1 + x_
**(y) = 0
J > Jr
x < y
’ *(y, x) x > y where x is the unique root lying in the unit interval of the equation 3 2 x + x + x - 1 = o. When the uniform distribution is replaced by an arbitrary continu ous distribution F then the form of the optimal strategies is unchanged. However, if F is taken to be a discrete distribution, then the form of the optimal strategies is modified only in that u*(x) and Y*(y) may in volve randomization at the critical values xQ and yQ respectively. This is analogous to the situation of Model I. The original "le Her" game corre sponds to such a discrete underlying distribution. The solution of this special example was carried out by M. Dresher [6 ]. §4.
POKER MODEL WITH
k
RAISES
The two players ante one unit and receive independently hands
x
350
KARLIN M D RESTREPO
and y, identified with points of the unit interval, according to the uni form distribution. There are k + 1 rounds of betting. In the first round player A can fold relinquishing his ante to player B, or he may bet "a" units. Players A and B move alternately. In each subsequent round the appropriate player can fold or see (whereupon the game ends), or raise the bet by "a" units. In the last round the player can only fold or see. If k Is even the last possible round ends with A, and if k is odd the last possible round ends with B. A strategy for player A can be described by a k-tuple of func 0v (x)). These functions shall indicate A's tions 0 (0 1 (x), 0Q (x), 1 procedure of action when he receives the hand x. Explicitly, >± (x is the probability that A folds Immediately and is the prob ability that A bets at his first opportunity. Further ^(x) =
the probability that
A
folds at his second round,
?2 (x) =
the probability that
A
sees at his second round,
Z ^ 30i(x) =
the probability that
A
raises at his second round
if the occasion arises, i.e., if player B had raised in his first round thus causing the continuation of the game into A fs second round of betting Similarly, if the game continues through A !s r-th round, then
02r-3(x)
the probability that round of betting,
A
folds at his r-th
02 r_2 (x)
the probability that round of betting,
A
sees at his r-th
the probability that round of betting.
A
raises at his r-th
Analogously, a strategy for player B is expressible as a k-tuple Y = (^(y), ?2 (y), . . o, Y^(y)) which shall describe B rs course of action when B receives thehand y. The precise meaning of the components is as follows: At B's first opportunity he folds with probability = 1 " E^-i^n*(y)^ sees A's original bet with probability ¥ (y) J '1 J v 1 and raises with probability sj=2 ^j(y'* ^ Same continues to B's r-th opportunity to bet, then we have Y2 r-2 ^ ^ =
Probability that B round of betting,
folds at his r-th
1 (y) =
the probability that B round of betting,
sees at his r-th
Y2
MULTISTAGE POKER MODELS z. J
Y .(y) = the probability that J round of betting.
B
351 raises at his r-th
If the two players receive hands x and y and choose the strategies and ¥ respectively, then the pay-off to player A is /
k
v
k
P (0 (x), Y(y)) = (-1 ) ( 1 - X 0i o. J
The coefficients for
R(0, ¥).
c0
CL0 . and Cw canbe obtained from, the formula Recalling the definition of K(x, y) one easily obtains
= 2 + a 1
J
x
J
- (a + 2 )
O
1 y* -
(a + 2 )
X
J
1
k
X
O
j = 2
and y
c
= - (a
+2
) J
X o
The point < 0.
C
b
1
k
0i
1 =1
x > b
and
y > b
/
S 0i y
will be chosen so that, for
Thus, for
+ a
k
x > b, C0
•
1=1 > o
and, for
y > b,
in order to determine the optimal
1
strategies, the actual magnitude of a coefficient is not as relevant as the sign of the differences C0 - C0 or - C» . Since the functions i J 1 j 0^ and 0^+1 (or and ¥ .+ 1) appear together in most of the terms of R(0, ¥) it is easy to obtain by inspection (and simplification) the follow ing formulas:
MULTISTAGE POKER MODELS x
((Ur - Da + 2 )
° 02 r " % r - l
355
k
1
f
£
o
j= 2r
- a
k
f
£
X
j=2P
(8 )
i Cm
2p
“ C™
((4r - 2 )a + 2 )
2 r-i
J
2P™ 1
J
y
'.9)
+
((i+r - D a + 2 )
k Y
f o
i + a
/y
i=2 r+1
i
•!
i=2r+1
2 r = 2 , U, ..., k,
y 2 r-l
- C
2 r- 2
= -
((Ur - 3 )a + 2 )
/
1
k
*
+ a
/y
I=2P-1
°Q
k
i
i=2r-1
-I-
( 10)
3
2P-1
(i D
^0
((Ur - U)a + 2 )
**r _2 + a
/
J
^
2P-2
r
J
X
((4r - 3 )a + 2 )
2 r-i
I o
1 -
One should notice that the even op odd
* /X
k
z
’
j-2 r
k
i
J = 2P
*;
chapactep of the indices i
is important; the equations for ^2 p- 1 are essentlally the same fop 0gp• The same pemapk holds if wecompape the equations fop
and
j
as those and
02 r+1 ’ 0Pder ^or Prescribed strategy 0* to be optimal we require that Cfl - C be zero for x = c0 1. In view of the definition of 2r 2 r- 1 this last condition will imply that C0 - C0 = o for 2r 2 r-1
KARLIN AND RESTREPO
356 b < x < d2p-1,
if
(1 2 )
for
[(^r - 1 )a + 2 ]
2 r = 2 , b,
..., k.
and the definition of x > d2 p __1
^
/
^ rij = a^l - d2 p_ 1 j=2 r
Again from the increasing nature of ¥*,
it follows that
Gw
(x)
(C0
- C0 )(x) 2r 2r-i (x) for x < b,
C0
2r
2 r-l
respectively, In order to insure that
C, = CL er-! 2r- 2
at
c
the follow
ing relation must be satisfied:
a| c [c2 r~2 " d2 r-2 ] ~ a _d2 r-i ~ c2 r-2 ] (13)
k + [(^r - 3 )a + 2 ]
]T nj + a£l - d2r_^ j=2r
which is to be valid for 2r - 1 = 3, 5, ..., k-1. By differentiating it is easily observed that C0 - C0 is strictly increasing for x 2 r-l 2 r- 2 traversing the interval (d2p_2, d2 r-i ^ which contains the point c 2 r - 2 in its interior. (y)- CL (y) = 0 2 r- 1 2 r- 2 and is strictly decreasing otherwise, provided that
By analogous arguments wededuce that for
(U)
b < y < c2 p_2
CL
k [(^r - 3 )a + 2 ] ]T mjL = a[~1 - c2p__2 1 i=2 r- 1
Also, by imposing the requirement that 2 r = 2 , 4, ...,
k should vanishwhen
a| d0_ , - c [d2 r-i “ C2p"1]
2 r-l = 3 , 5 ,
k-1 .
C„, (y) (y) for 2r 2 r- 1 y = d2 r-l' we ob"tain
= a[°2p ~ d2 r-i]
(15)
,
k +
( ( b v - i)a + 2 )
mi + a ( 1 - °2 rj*
Y
i=2 r+l Furthermore,
(y) - CL
CL
2r C2r-1
to
c2 r *
(y) 2r-i
strictly decreases as
y
varies from.
MULTISTAGE POKER MODELS Finally, if and only If
357
(y) is strictly decreasing; it vanishes for
y = b
1 k
(1 6 )
(a + 2 )
= a(l - b). i= 1
Similarly, the coefficient
C0 (x)
vanishes identically for
0 < x < b
1
and is otherwise positive, if and only if k (17)
2 = (a + 2 ) Y ,
n- + (a + 2 ) ( 1
- b).
The equations (1 2 ), (1 3 ), (14) and (1 5 ) appear in cycles of four. Having assumed that k is even, the first cycle corresponds to 2 r = k, the last cycle corresponds to 2 r = 2 . But in this last case we must ex clude equation (1 3 ) which is valid only when 2 r > k. The four equations of each cycle can be written in the following form:
(1 8 )
[(4r - 1 )a + 2 ]
X nj = a j=2 r
- dgr_i I
2p - 2 , ..., k,
k (19)
a /c2 r _2 - d2r_2 ) = a f l \
+ [(in- - 3 )a + 2 ]
- c 2 v _2 \
^ nj j=2 r
2 r = k,
(2 0 )
[(4r> - 3 )a + 2 ]
k X mi = a M i=2 r-i
- c Z r _2 \
..., k.
2 r = 2 , ..., k,
k (2 1 )
a ^d2r_, - o&r _ ^
“ a (1
- d2 r_1J+
[(^r-i)a + 2 ]
X ^'1 I=2 r+1
2 r = 2 , ..., k. §6 . SOME FURTHER IDENTITIES If 0* is the function that maximizes equation ( k ) it is clear ly necessary that Cw = o at all points x where o < 0 .(x) < 1 . In 1 particular, in the interval o < x < b, where bluffing is involved, we must
1
358
KARLIN AND RESTREPO
have C0 - Va is even)1 (2 2 )
0 of equivalently (having assumed that
G0
C0 - C0 - 0, *k-l k-3
Similarly, in the interval (2 3 )
C
0, *.., C0
C k-3
k-5
Therefore, if Y. and 0^ (2 2 ) and (2 3 ) reduce to
C
- C
(2 5 )
J
[(4r - 3)a + 2 ]
= 0
so,..., C
k -2
2
k-4
are of the form given in Figure 1, equations
1 /
0.
1
° < y < b,
C„ - C so, k k- 2
(24 ) C(4r - 1)a + 2 ]
k
*2r = 2a
/ o
k
1
2a
2r~1
2r = 2, . . k,
2 j=2r+1
/o
k
s
i=2r
and
r»1 k
(2 6 )
f
2 = (a + 2)
Z
O
Equation (2 6 ) is equivalent to
■
j=1
= 0 , 0 < x < 1.
CL
Equation (2 5 ) (with
1
2r = 2 ) is equivalent to
= 0, ° < J < b . Rewriting (24) and (2 5 ) in
Gv
2
an equivalent form we obtain
[(4r - 1 )a + 2 ]
1
/ o
[(4r - 3 )a + 2 ]
OP [(kr - 1)a + 2]
r
o
1 [(4r - 1)a + 2 3
. 2
i=2r-1
/o
k d2p-1 -
-I=2r-1
z
c2r-2
= [(4r - 1)a + 2 ]
0±
k
i "i + -' - dar ..j=2P+2 ~
r m. + 1
k i=2r
r = [ (hr + 1 )a + 2]
k I
1
j=2r+1
k
I nj - 1 L j =2p
[(4r - 3 )a + 2 ]
1
1 o
j=2r 1
(25a)
= [(4r + 1 )a + 2 ]
k
k
I .i=2r+1
1 0 41
(24a)
1
ro
I k
MULTISTAGE POKER MODELS
359
The first equation can be simplified by means of equations (1 8 ) and (1 9 ); the second equation by means of (2 0 ) and (2 1 ). Hence, (4ra + 2 )(c 2p ~ d2r-1 ^ = [
+ 2 )a + 2](c2r - d2r ^
(24b) 2r = 2 , ..., k-2, [ ( b v - a)a + a H d ^
- c2r_2 ) = (Ura + 2 )(d2p _ 1 - c ^ , )
(25b) 2r = 2 , ..., k.
§7*
EXISTENCE OP A SOLUTION
Our task now is to show that equations (1 8 ), (1 9 ), (2 0 ), (2 1 ), (24b), (25b ) and (2 6 ) possess a solution b, c ., d ., m., n . consistent J JJ with the requirements of Figure 1 . Tx Z2 r- 1 mi
Eliminatingn . between equations (1 8 ) and (1 9 ), and ti J ^I>om equations (2 0 ) and (2 1 ), we obtain
[( k r - 1 )a + 2 ] (2 c2p __2 - 1 - d2p_2 ) = [( k r - 3 )a + 2 ](1 - d2 p_ 1 ) (2 6 ) 2 r = k , .•., k and [( k r + 1 )a + 2 ](2 d2 r - 1
- 1 - c2p__1 ) = [(Ur - 1 )a + 2 ](l - c2p)
( 27)
2 r = 2 , ..., k-2 .
Moreover, from (2 1 ) for (2 8 )
2 r = k,
we get
dk_i - Cic- 1 = 1 - dk-i
which is Identical to (2 7 ) provided we allow the Index 2 r = k and in terpret c2k = 1 . With this convention (24b), (25b), (2 6 ) and (2 7 ) are equivalent to the following system of equations.
(2 9 )
” °2 r- 1 ^ ~ d2 r- 1 ^ ~ c2 r ±JLJ- = 2 -. . — 1----------------------------------------------------(4r-1)a + 2 (4r-1)a + 2 (4r+1)a + 2
(3 0 )
^ - .l- 1 ^ 2£ii , ^gr- 1 ~ 02 r-l 4ra + 2 (4r~2)a + 2 (interpret
cQ = b )
2 r = 2 , ..., k,
3^0
KARLIN AND RESTREPO
(31 ) u 1'
1 ^2 r- 2 ^ °2 r-2 ^ 1 ~ +^2-—r-—1 — -----------------=£-=— = 2 --— (4r~3)a + 2 (4r-3)a + 2 (ij-r- 1 )a + 2
(3 2 )
-iH=S---( k r - 2 )a + 2
2 r = 4, ..., k,
Hzl = 2vzi---- %IzL
2V =
(k r - k )a + 2
k, ...,
k,
and
(33)
=
1-b +
(1 _ d l ).
The last equation is obtained by examination of (2 6 ) and (1 8 ). Let < ^ - 1 (2 9 ) (since c^ = 1 that
be any number in the interior of [0, 1 ]. Then from by convention), c^^ is determined and we also see
c^c__1 < d ^ i • Prom (3 0 ), we next obtain
0 ^- 2
and evidently
c^.^ < c^_-j • Equation (31 ) gives the value of d-^-2 and shows that dk- 2 < °k-2 ° Finally, by means of (3 2 ) we get < ^ - 3 also seen to be smaller than d^._2 . The process can be thus continued until c0 = b is found from equation (3 0 ). It is evident that if we start the process with = 1, then all the unknowns must have the value 1. However, then (3 3 ) leads to the inequality -d- --+ cL > 1 - b + ■ Jci * d (1 - d .) 1= 0 . On the other hand, if dk __1 = 0 then since d 1 < dk - 1 and b 1 < d^.^, we get < 1 < 1 - b + — .A — (1 - d 1 ). As all the c fs and d ’s are continuous functions of d - ^ there exist some d^__1 (0, 1 ) for which equality holds in (33 )• It Is clear from (33) that b must be positive and thus all c^ and d. are positive. Prom (1 8 ) we see in view of the nature of the determined d. k k that zj=2 r nj> Zj=2 r+2 nj > 0 so a-^ ^ e masses n. are positive. Similarly, from (2 0 ), we deduce that all the masses arepositive. From (1 8 ) for 2 r = 2and (33), wefindthat the sum of the n. is small er than b. From (2 0 ), we obtain that Zm. = (1 - b) < b since o -L a + d 1TT~ 2 > 1 - b We have thus established the existence of strategies of the type described in Figure 1 that satisfy (1 8 ), (1 9 ), (2 0 ), (2 1 ), ( 2 b ) , ( 2 5 ), and (2 6 ). §8 .
THE CONSISTENCY OP
0*
AND
Y*
In order to complete the analysis of the problem it is necessary to verify that the solutions 0* and constructed above actually satisfy ( k ) and (5)- Since the problem is largely symmetric (the ® 3 with even index play the same role as the ¥*s with odd index), we will
361
MULTISTAGE POKER MODELS simply verify that
0*
maximizes
R(0, ¥*).
0 < x < b,
In the interval tions (24) and (2 2 )) and
(x) = 0 for all r (equa2 r- 1 (see the discussion in connection
C0
(x) < 0 2r with (1 2 )). Therefore, we conclude that in order to maximize R(0, ¥*), it is necessary that 02p = 0 for all r in this interval. Furthermore, in view of the fact that C0 (x) = 0 (0 < x < b), 0p 1 can be chosen 2 r-l arbitrarily in the interval, provided only that 0 1 + 0^ + ••• + 0k- 1 - 1 * In particular, since E mi < b it is possibleto choose 02r„-j such that C0
02r-l = m 2r-i’ 2r " 1 =
3'
k_1.
At this point it is convenient to notice that at C0 = 0(see (1 2 )), so that 2 r-l 2r C- (b) = 0 for all i
(34)
x = b,
1.
From equation (1 0 ) we also note that each
C* (x) is increasing so that 1 C0 > 0 for all x > b.For x > b, we have C0 > 0 . Consequently, in I ^ order to maximize R(0, ¥*) we must have 0± ^ 1 *Furthermore, the 0i with the largest coefficient should take the value 0^(x) = 1 . At this point, it is necessary to consider two typical intervals (c2p_2,
c2 r - 1 ) and (°2 p-i > c2 r ^ us our attention on the interval (°2 r-2 ' C2 r-1 ^ wbere 02 r-l (x) = 1 . In order to show that 0* is indeed an optimal strategy we must verify that in the given interval C0(x) > C0 2 r-l
(x),
all
1.
We consider two cases: CASE 1 .
i > 2r - 1 .
If
i = 2 s > 2 r,
the derivation of ( 12 )
shows that C0
(x) s C0 2S
On theother hand, if
(x),
b < x < d
*2S-1
23
. 1
i = 2 s- 1 > 2 r, equations (1 1 ), (1 3 ) and
(34)show
that
C0 23-1
(x)
Since each of the intervals terval
(x)
- c0
< 0,
b < x < C
2S-2
(b, d2s_1 ) and
(c2r_2 > C2r- 1 ^ we have
(b,
c 2 3 _2 '>
contains the in
KARLIN AND RESTREPO
362 C0 (x) 2r-i for
x
in
=
C0 (x) 2r
>
C0 (x) = C0 (x) > ... 2r+i 2r+2
(c2r_2 , o ^ , ) .
CASE 2 •
i < 2 r - 1 . By equation (13 ),
c0 2r-1
(x) - c0 (x) > 0, 2r-2
and by (8 ) and (1 2 ), for any C0
c
0,
b < x < 1.
Finally, from (8 ) and (1 1 ), 1 ° *
2 s
"
C «>2 S _ 2
=
^
+
^
*2 8 - 2
/
+
a
/
X
*“
1
x
fO j~2i
* 2 3 - 1
-
S
/
*23-1
o
S
IX j=i2s *i ■
-“
This is a monotone increasing function of x, for x > ^2 3 - 2 ’ in ParH cu lar, It is a monotone increasing function for x > c0o 0; at x = c„a „ CS"*C S this function has the value zero (equations (1 2 ) and (13))- Therefore C0 23
-
G0 (x) 23—2
>
0,
C
for all
c2 r- 2 ^ x < C2r-1 ’
I < 2r - 1 . A similar argument can be applied to show that
c2 r- 1 < x < c2 r^
02 r ^
= 1
^op
is the t,est strategy.
As an Illustration A strategy for player A is 0g(x)) and similarly player (^(y), ¥2 (y)). The meaning ceding theory shows that the follows:
we consider the case where k = 2 and a = 2 described by a pair of functions (0^x), B fs strategies are given by the pair of these quantities are as before. The pre optimal strategies 0* and are as
MULTISTAGE POKER MODELS
12
I
35
*/ % (-I(x) =
o. condition is that /_“ 0 ( u ) du < °° .
Our final
Such functions were studied originally by Polya In connection with the problem of characterizing those functions 0(u) which can be approxi mated uniformly as close as desired In any prescribed finite interval by polynomials possessing only real zeros [5]. Later, I. J. Schoenberg [6] developed certain variation diminishing properties of Polya frequency func tions. Explicitly, let f(x) be a bounded continuous function. The sign variation of f(x) denoted by V(f) . is defined to be the supremum. of the number of changes of sign of the sequence ff(x1 ), f(x2 ), ..., f(xn )] where the x^ are chosen freely and then arranged in Increasing order so that x 1 < Xp < ... < x^ and where also n is allowed to vary arbitrarily. The fundamental proposition established by Schoenberg concerning Polya fre quency functions is that the transformation
\[ g(x) = \int_{-\infty}^{\infty} \phi(x - t)\, f(t)\, dt \]
is sign variation diminishing [V(g) ≤ V(f)] if and only if φ is a Pólya frequency function. Independently, Krein and Gantmacher also developed some of the sign variation diminishing properties for the case of matrix transformations satisfying property (II). (The author, in other publications, has exploited these sign changing properties in order to develop some results about parametric statistical decision theory.) A function φ is said to be a proper Pólya frequency function (P.P.F.F.) if all the determinants arising in (II) are strictly positive. Examples of P.P.F.F. are
(a) φ(u) = e^{-u²},    (b) φ(u) = sech u,    (c) φ(u) = e^{u - e^u}.
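The variation diminishing property can be observed numerically. The sketch below (my illustration, not part of the original text) convolves an oscillating f with the P.P.F.F. φ(u) = e^{-u²} of example (a) by direct quadrature and counts sign changes on a grid; the count for g should never exceed that for f.

```python
import math

def sign_changes(vals, tol=1e-12):
    """Count sign alternations, skipping numerically-zero entries."""
    signs = [1 if v > 0 else -1 for v in vals if abs(v) > tol]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

def convolve(f, phi, xs, lo=-10.0, hi=10.0, n=2000):
    """g(x) = integral of phi(x - t) f(t) dt, by the trapezoid rule."""
    h = (hi - lo) / n
    ts = [lo + k * h for k in range(n + 1)]
    out = []
    for x in xs:
        w = [phi(x - t) * f(t) for t in ts]
        out.append(h * (sum(w) - 0.5 * (w[0] + w[-1])))
    return out

phi = lambda u: math.exp(-u * u)                          # P.P.F.F. of example (a)
f = lambda t: math.sin(3.0 * t) * math.exp(-0.1 * t * t)  # oscillating input

xs = [-6.0 + 0.04 * k for k in range(301)]
Vf = sign_changes([f(x) for x in xs])
Vg = sign_changes(convolve(f, phi, xs))
print(Vf, Vg)   # the convolution never increases the number of sign changes
```

Grid-detected sign changes are a lower bound on the true V; the inequality V(g) ≤ V(f) is the point of the demonstration.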
BELL SHAPED KERNELS
As pointed out by Schoenberg [6], many new P.P.F.F. can be generated by the process of convolving two Pólya frequency functions, as follows: if φ₁ is a P.P.F.F. and φ₂ is a continuous P.F.F. with the additional property that for every choice of x_i (x₁ < x₂ < ⋯ < xₙ) there exists a set of n values y_j, y₁ < y₂ < ⋯ < yₙ, which may depend on the x_i, such that det ||φ₂(x_i - y_j)|| > 0, with a symmetrical condition valid when the y_j are prescribed in advance, then
\[ \phi_1 * \phi_2 = \int_{-\infty}^{\infty} \phi_1(x - u)\, \phi_2(u)\, du \]
is itself a P.P.F.F. Any Pólya frequency function satisfying the conditions described above for φ₂ will be called a regular P.F.F. Throughout this paper, unless explicitly stated to the contrary, we shall deal only with regular P.F.F. By convolving a regular P.F.F. with

\[ \phi_\sigma(u) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-u^2/2\sigma^2} \]

and then allowing σ → 0, it follows that any regular P.F.F. can be approached uniformly over any finite interval by a P.P.F.F. The key property needed in the sequel in our study of games generated by regular P.F.F. is summarized in the following lemma:

LEMMA 1. Let a_i (i = 1, ..., n) be arbitrary real numbers and let the y_i be arranged in order so that y₁ < y₂ < ⋯ < yₙ. If φ(x) is a regular P.F.F., then the function f(x) = Σ_{j=1}^{n} a_j φ(x - y_j) has at
most n - 1 changes of sign.
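Lemma 1 lends itself to a direct numerical check. The sketch below (an illustration with arbitrary sample data, not from the text) evaluates f(x) = Σ a_j φ(x - y_j) for the Gaussian P.P.F.F. with n = 4 and several coefficient vectors, counting sign changes on a fine grid.

```python
import math

def sign_changes(vals, tol=1e-12):
    """Count sign alternations, skipping numerically-zero entries."""
    signs = [1 if v > 0 else -1 for v in vals if abs(v) > tol]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

phi = lambda u: math.exp(-u * u)        # the P.P.F.F. of example (a)
ys = [-2.0, -0.5, 1.0, 2.5]             # y_1 < y_2 < ... < y_n, n = 4
grid = [-8.0 + 0.01 * k for k in range(1601)]

results = []
for coeffs in ([1, -2, 3, -1], [1, 1, -1, 1], [-1, 2, -2, 1]):
    f_vals = [sum(a * phi(x - y) for a, y in zip(coeffs, ys)) for x in grid]
    results.append(sign_changes(f_vals))
print(results)   # every count is at most n - 1 = 3
```

Remark 1 below sharpens this: the count is in fact bounded by the number of sign changes in the coefficient sequence itself.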
This lemma can easily be obtained as a consequence of the work of Schoenberg. For the sake of completeness we present a proof.

PROOF OF LEMMA 1. Let us first assume that φ is a P.P.F.F. Suppose to the contrary that x₁ < x₂ < ⋯ < x_{n+1} exist such that f(x_k)f(x_{k+1}) < 0. Expanding the determinant

\[
\begin{vmatrix}
\phi(x_1 - y_1) & \cdots & \phi(x_{n+1} - y_1)\\
\phi(x_1 - y_2) & \cdots & \phi(x_{n+1} - y_2)\\
\vdots & & \vdots\\
\phi(x_1 - y_n) & \cdots & \phi(x_{n+1} - y_n)\\
f(x_1) & \cdots & f(x_{n+1})
\end{vmatrix} = 0
\]
by the last row, we obtain

(1)    0 = Σ_{k=1}^{n+1} f(x_k) (-1)^{k+n} U_k(y₁, ..., yₙ),

where the symbol U_k denotes the minor obtained by eliminating the last row and the k-th column. As the f(x_k) alternate in sign, the terms f(x_k)(-1)^k all have the same sign, while all the U_k are strictly positive; we see that the sum cannot be zero, which is incompatible with (1). The result of the lemma for the case of φ a regular P.F.F. is obtained readily by approximating φ by P.P.F.F.'s. The details shall be omitted.

REMARK 1. A more refined analysis shows that the number of variations in sign of f(x) is bounded by the number of changes of sign of the sequence a_j, and in fact the number of zeros of f(x), counting multiplicities, is bounded by the number of changes of sign of the a_j. For the proof of these facts see references [3] and [4].

The next series of lemmas are new and shall be applied in connection with games whose pay-off kernels are built out of P.F.F.

LEMMA 2. If a positive regular P.F.F. φ(t) is also a real analytic function and y₁ < y₂ < ⋯ < yₙ, x₁ < x₂ < ⋯ < xₙ, then det ||φ(x_i - y_j)|| > 0.
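Before turning to the proof, the conclusion of Lemma 2 can be spot-checked numerically for the Gaussian P.P.F.F.; the particular nodes below are arbitrary sample values of my choosing.

```python
import math

def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    n, d = len(m), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if p != c:
            m[c], m[p] = m[p], m[c]   # row swap flips the sign
            d = -d
        d *= m[c][c]
        for r in range(c + 1, n):
            t = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= t * m[c][k]
    return d

phi = lambda u: math.exp(-u * u)     # Gaussian P.P.F.F.
xs = [0.0, 0.5, 1.0, 1.5]            # increasing nodes
ys = [-0.5, 0.0, 0.5, 1.0]
D = det([[phi(x - y) for y in ys] for x in xs])
print(D)   # strictly positive, as Lemma 2 asserts for this kernel
```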
PROOF. Since by assumption φ(t) > 0, the lemma is valid for n = 1. We proceed by induction and assume that the lemma is true for n. Take x₁ < x₂ < ⋯ < xₙ fixed and let x be variable. Define

(2)
\[
f(x) = \begin{vmatrix}
\phi(x_1 - y_1) & \phi(x_1 - y_2) & \cdots & \phi(x_1 - y_{n+1})\\
\phi(x_2 - y_1) & & \cdots & \phi(x_2 - y_{n+1})\\
\vdots & & & \vdots\\
\phi(x_n - y_1) & & \cdots & \phi(x_n - y_{n+1})\\
\phi(x - y_1) & & \cdots & \phi(x - y_{n+1})
\end{vmatrix} .
\]

Expanding this determinant by the last row we obtain f(x) = Σ_{k=1}^{n+1} a_k φ(x - y_k), where by virtue of the induction hypothesis the coefficients a_k are all non-zero and alternate in sign. Furthermore, f(x) vanishes at x₁, x₂, ..., xₙ but is not identically zero. To establish this last assertion, let F(s) be the Fourier transform of f(x). Then, by a direct calculation,
\[ F(s) = \Phi(s) \sum_{k=1}^{n+1} a_k\, e^{i s y_k}, \]
with Φ(s) the Fourier transform of φ(x). Since the a_k are non-zero coefficients of a trigonometric polynomial and Φ(s) is not identically zero, the assertion follows. The analyticity of f(x) implies further that f(x) possesses only isolated zeros. We next show that f(x) changes sign at the points x₁, x₂, ..., xₙ. In fact, for x > xₙ property (II) of P.F.F.'s implies that f(x) > 0. Now for xₙ > x > x_{n-1}, by interchanging the last two rows of (2) we see that f(x) < 0. Similarly, for x_{n-1} > x > x_{n-2} we obtain that f(x) > 0 by use of two interchanges of rows in (2). Continuing these arguments, we see that f(x) alternates in sign in the intervals (-∞, x₁), (x₁, x₂), ..., (xₙ, ∞) and has therefore the maximal number of changes of sign permitted by Lemma 1. If f(x) should vanish for some ξ where x_i < ξ < x_{i+1}, then ξ is a root of even multiplicity of f(x). Suppose for definiteness f(x) ≥ 0 in the interval (x_i, x_{i+1}). Then the function f(x) - εφ(x - y₁) is negative at ξ for ε > 0. For ε sufficiently small this new function maintains all the previous variations in sign that f(x) possessed and in addition accumulates two new changes of sign near x = ξ. This is in contradiction with Lemma 1. Consequently, f(x) cannot vanish except at x₁, x₂, ..., xₙ, which is equivalent to the conclusion of Lemma 2.

If the assumption of analyticity is completely discarded, then the conclusion of Lemma 2 is not valid in general. In fact, φ(u) = e^{-|u|} provides an example of a regular P.F.F. which, however, is not a P.P.F.F. The verification of this fact is left as an exercise for the reader.

LEMMA 3. (a) If φ is a regular P.F.F. and φ' is continuous, then g(x) = Σ_{i=1}^{n} a_i φ'(x - y_i) has at most 2n - 1 variations of sign. (b) If φ'' is continuous and the a_i are all positive, then g'(x) has at most 2n changes of sign.

PROOF OF (a). Let h > 0 be sufficiently small and consider
\[ g_h(x) = \frac{1}{h} \sum_{i=1}^{n} a_i \left\{ \phi(x + h - y_i) - \phi(x - y_i) \right\} = \sum_{j=1}^{2n} b_j\, \phi(x - z_j), \]

where z_{2i-1} = y_i - h, z_{2i} = y_i, and b_{2i-1} = a_i/h while b_{2i} = -a_i/h (i = 1, ..., n). By Lemma 1 the number of changes in sign of g_h(x) is at most 2n - 1. Since g_h(x) tends to g(x) uniformly on any finite interval, we obtain that g(x) cannot vary its sign more than 2n - 1 times.

PROOF OF (b). In this case we approximate to g'(x) by

\[ p_h(x) = \frac{1}{h^2} \sum_{i=1}^{n} a_i \left\{ \phi(x + h - y_i) + \phi(x - h - y_i) - 2\phi(x - y_i) \right\} = \sum_j c_j\, \phi(x - w_j), \]

where the w_j have been arranged in order of increasing values. It is an easy matter to check that the number of changes of sign in the sequence c_j is at most 2n. For this count the hypothesis that all the a_i are positive must be used. By an application of the result of Remark 1 following Lemma 1, we conclude that p_h(x) changes sign at most 2n times, and similarly g'(x). We shall have occasion to use part (b) of this lemma in the discussion of Theorem 3. Finally, we note that all the analogues of Lemmas 1-3 remain true if we consider sums of the form Σ_i b_i φ(x_i - y), which are functions of y with the set of x_i given.

§2. BELL SHAPED GAMES
A game over the unit square with kernel K(x, y) is said to be a bell shaped game if K(x, y) = φ(x - y), where φ(t) is a positive analytic regular P.F.F. The reason for the descriptive title is that indeed φ(t) is a bell shaped function; that is, φ^{(k)}(t) has precisely k simple zeros [see 6]. The naming of these games may be slightly misleading, since the function Ψ(u) = 1/(1 + u²) is also bell shaped but is not a P.F.F. Nevertheless, essentially all of our results concerning bell shaped games developed below are also true for the game

\[ K(x, y) = \frac{1}{1 + (x - y)^2} \]
generated by Ψ(u). Perhaps this suggests that much of our theory holds under more general circumstances. However, the proofs employed seem to depend heavily upon the special sign variation diminishing properties of P.F.F. The extension to a more general class of kernels which would include P.F.F.'s and the special example cited remains an open question.

LEMMA 4. The value v of a bell shaped game is always strictly positive, and all optimal strategies must be finite distributions. Equivalently, the spectrum (set of points of increase) of any optimal strategy for either player consists of a finite number of points.

REMARK 2. The argument presented below applies to more general situations than treated here. The main proviso is that the game is analytic and a mechanism is present to carry out the corresponding limit procedure employed in the proof of this lemma.

PROOF. As φ is analytic and strictly positive everywhere, we obtain that φ(x - y) ≥ δ > 0 for 0 ≤ x, y ≤ 1. Hence v ≥ δ. If one of the players (say X) had an optimal strategy with an infinite number of values x in its spectrum, then it follows for any such x that, for every optimal g(y), ∫₀¹ φ(x - y) dg(y) = v. Since the integral is an analytic function of x, we get that ∫₀¹ φ(x - y) dg(y) ≡ v for all x. But it is known that any P.F.F. φ(u) must vanish exponentially as |u| → ∞ (see [6]). Hence, as x tends to ∞, ∫₀¹ φ(x - y) dg(y) → 0, contradicting the fact that v > 0.

The analyticity assumption is somewhat vital, since the game K(x, y) = e^{-|x-y|} arising from the P.F.F. φ(u) = e^{-|u|} does not have optimal strategies with finite spectrum. This will be demonstrated in Section 4. From now on, unless explicitly stated to the contrary, we assume that φ(u) is analytic. All that we shall in fact need to carry out the analysis in the sequel is that the conclusion of Lemma 4 should hold and that

\[ \lim_{|u| \to \infty} \phi(u) = 0 . \]
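The content of Lemma 4 can be observed on a discretized version of a bell shaped game. In the sketch below the grid size, the parameter λ = 8 in the kernel e^{-λ(x-y)²}, and the use of fictitious play are all choices of this illustration, not of the text; the solver produces certified bounds v_lo ≤ v ≤ v_hi, both strictly positive.

```python
import math

def fictitious_play(K, iters=4000):
    """Brown's fictitious play for the zero-sum matrix game K (payoff to the
    maximizing row player).  Returns certified lower/upper bounds on the value
    together with the empirical mixed strategies."""
    m, n = len(K), len(K[0])
    row_hits, col_hits = [0] * m, [0] * n
    row_score = [0.0] * m    # cumulative payoff of each row vs column history
    col_score = [0.0] * n    # cumulative payoff of each column vs row history
    i = j = 0
    for _ in range(iters):
        row_hits[i] += 1
        col_hits[j] += 1
        for a in range(m):
            row_score[a] += K[a][j]
        for b in range(n):
            col_score[b] += K[i][b]
        i = max(range(m), key=row_score.__getitem__)   # maximizer's best reply
        j = min(range(n), key=col_score.__getitem__)   # minimizer's best reply
    p = [h / iters for h in row_hits]
    q = [h / iters for h in col_hits]
    v_lo = min(sum(p[a] * K[a][b] for a in range(m)) for b in range(n))
    v_hi = max(sum(K[a][b] * q[b] for b in range(n)) for a in range(m))
    return v_lo, v_hi, p, q

grid = [k / 20 for k in range(21)]     # discretized unit interval
lam = 8.0                              # sample steepness parameter (my choice)
K = [[math.exp(-lam * (x - y) ** 2) for y in grid] for x in grid]
v_lo, v_hi, p, q = fictitious_play(K)
print(round(v_lo, 3), round(v_hi, 3))  # v_lo <= v <= v_hi, both positive
```

The empirical optimal strategies concentrate their mass on a few grid points, in line with the finite-spectrum conclusion of the lemma.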
Lemma 4 tells us that all optimal strategies involve only a finite number of steps. Our next lemma points out a relationship between the number of steps involved in any optimal minimizing strategy and the number of points in the spectrum of any optimal maximizing strategy.
LEMMA 5. If there exists an optimal strategy for the minimizing player involving n steps, then all optimal maximizing strategies involve n or n - 1 fixed points.

PROOF. Let g = Σ_{i=1}^{n} a_i I_{y_i} be an optimal strategy for player II with n points of increase, where I_{y_i} denotes the pure strategy for player II fully concentrating its total mass at the point y_i. Of course, a_i > 0 and Σ a_i = 1. Let h(x) = Σ a_i φ(x - y_i); then on account of Lemma 3 we know that h'(x) changes sign at most 2n - 1 times. Furthermore, h(x) ≤ v throughout the unit interval, with equality at each point of increase of any optimal maximizing strategy. If an optimal maximizing strategy involves m points, each interior one is a relative maximum of h(x); since h(x) → 0 as |x| → ∞, a count of the alternations of h'(x) forced by these maxima shows that h'(x) must change sign at least 2m - 1 times, so that m ≤ n. In the opposite direction, the function r(y) = Σ_{i=1}^{m} b_i φ(x_i - y) formed from an optimal maximizing strategy satisfies r(y) ≥ v with equality at the n points y_i, and by Lemma 3, r'(y) changes sign at most 2m - 1 times; the corresponding count of the alternations forced by the n minima shows this to be impossible if m ≤ n - 2. If only one of the points 0 or 1 appears in the spectrum of g(y), the same analysis shows that the number of variations of sign is greater than 2m - 1. This contradiction proves the lemma.

Our next two lemmas are concerned with some special features of the nature of optimal strategies. Lemma 3 shows that φ'(x) has at most one zero, while the fact that φ(x) tends to zero as x → ±∞ shows that φ'(x) must have at least one zero. In the next two lemmas we have normalized φ(u) by means of a translation of the independent variable in such a way that φ'(0) = 0. The examples cited previously were all already normalized as indicated. Throughout the remainder of this section we assume this normalization to hold.

LEMMA 6. If φ(x) is normalized in such a way that φ'(0) = 0, then the points 0 and 1 belong to the spectrum of every optimal minimizing strategy. However, no maximizing optimal strategy uses 0 or 1.

PROOF. Let g(y) = Σ_{j=1}^{n} a_j I_{y_j} and f(x) = Σ_{i=1}^{m} b_i I_{x_i} be optimal strategies, with a_j, b_i ≠ 0. We may assume that the points x_i (and y_j) are arranged in increasing order. It will now be shown that y₁ = 0 by demonstrating that any other possibility leads to a contradiction.

CASE (i). Let 0 ≤ x₁ < y₁. Define h(x) = Σ_{j=1}^{n} a_j φ(x - y_j). Since f and g are optimal we get that h(x₁) = v and h(x) ≤ v for x in the unit interval. These conditions and the fact that h(-∞) = 0 clearly imply that there exists an x' ≤ x₁ for which h'(x') = 0. However, such a number x' cannot exist, since we would have x' - y_j < 0 and therefore φ'(x' - y_j) > 0 for every j, which in turn implies that h'(x') > 0.
CASE (ii). Let 0 < y₁ < x₁. Set r(y) = Σ_{i=1}^{m} b_i φ(x_i - y). Then r(y) ≥ v for all y in the unit interval and r(y₁) = v. These conditions require the existence of a point y' ≤ y₁ for which r'(y') = 0, which is impossible: since φ'(u) < 0 for u > 0 and x_i - y > 0 for all y ≤ y₁, we have r'(y) ≠ 0 there.

CASE (iii). Let y₁ = 0. This is the only possibility that remains. Since (i) and (ii) lead to contradictions, the point 0 must be a point of increase for every optimal minimizing strategy. A similar analysis is valid at the point 1. The last part of the lemma is established by arguments similar to those employed in the proof of Case (ii). The details are omitted.
LEMMA 7. 0 = y₁ < x₁ < y₂ for x₁ in the spectrum of some optimal X strategy.

PROOF. In view of the previous lemma we only need to show that x₁ < y₂. If x₁ were greater than or equal to y₂, then the function r(y) = Σ_i b_i φ(x_i - y) ≥ v must attain the value v both at y₁ and at y₂. Hence there exists y', y₁ < y' < y₂, for which r'(y') = 0. But since x_i - y' > 0 we must have φ'(x_i - y') < 0 by the normalization; therefore r'(y') ≠ 0, and this contradiction proves the lemma.

An analogous proof will show that between y_{n-1} and yₙ = 1 there must be a point of the spectrum of an optimal X strategy.
LEMMA 8. The optimal strategies for both the X and Y players are unique.

PROOF. Suppose there exists an optimal strategy for Y using n + 1 points; then an optimal X strategy can involve only a fixed set of n or n + 1 points (see Lemma 5) in its spectrum. If there exist two optimal X strategies, then Σ_i λ_i φ(x_i - y) = 0 for y = y₁, ..., y_{n+1} with the λ_i not all zero. By Lemma 2, det ||φ(x_i - y_j)|| > 0, which implies a contradiction. Thus the optimal X strategy is always unique. If there exists an optimal X strategy using n + 1 points, then the same argument establishes that the Y good strategy is unique. Finally, let us consider the case where there are two minimizing strategies employing n + 1 points while the unique X optimal strategy involves n points. We obtain, for c_i not all zero, that

(1)    Σ_{i=1}^{n+1} c_i φ(x - y_i) = 0    for x = x₁, ..., xₙ,

and, with the aid of Lemma 7, as the x_i are all interior to the interval [0, 1],

(2)    Σ_{i=1}^{n+1} c_i φ'(x - y_i) = 0    for x = x₁, ..., xₙ.

The relation (1) is equivalent to

\[
0 = \begin{vmatrix}
\phi(x_1 - y_1) & \cdots & \phi(x_1 - y_{n+1})\\
\vdots & & \vdots\\
\phi(x_n - y_1) & \cdots & \phi(x_n - y_{n+1})\\
\phi(x - y_1) & \cdots & \phi(x - y_{n+1})
\end{vmatrix} = \Psi(x),
\]

and thus all the coefficients d_i in the expansion Ψ(x) = Σ_i d_i φ(x - y_i) are non-zero (Lemma 2). The function Ψ(x) was analyzed in the course of the proof of Lemma 2 and was found to strictly alternate in sign as x passes through the x_i. Further, in view of (2) we deduce that the x_i are all inflection points. An appropriate combination

g(x) = α₁ φ(x - 0) - α₂ φ(x - 1)
with α₁ and α₂ chosen sufficiently small (|α_i| < ε) can be found such that g has a simple zero at x₁, g(x₁) = 0, and such that g'(x₁)[Ψ(x₁ + η) - Ψ(x₁ - η)] < 0 for η small enough (see Remark 1 following Lemma 1). By selecting ε sufficiently small, the function Ψ(x) + g(x) retains the changes of sign that Ψ(x) had at x₂, x₃, ..., xₙ, and in the neighborhood of x₁ we now obtain two variations of sign. But Ψ(x) + g(x) is of the form Σ_i d_i' φ(x - y_i) with at least n + 1 changes of sign. This contradicts the conclusion of Lemma 1, and the proof of this lemma is complete.

At this point it seems worthwhile to summarize our previous lemmas in the form of a theorem.

THEOREM 1. If K(x, y) = φ(x - y) is a bell shaped game with φ'(0) = 0, then the optimal strategies for both players are unique finite step distributions such that:

(1) If the minimizing optimal strategy consists of n steps, then the maximizing optimal strategy is composed of either n or n - 1 steps.
(2) The points 0 and 1 belong to the spectrum of the optimal Y strategy.
(3) The first point x₁ in the optimal X strategy separates 0 and the next optimal y point. The last value of the optimal x strategy has a similar property relative to the point 1.

We shall later show that the X optimal strategy can have either n or n - 1 points of increase, so that statement (1) cannot be improved upon.

THEOREM 2. If φ(u) = φ(-u), then the unique solution for both players is symmetric about ½; i.e., if S = {x_i} is the set of points involved in the spectrum of the maximizing x strategy, then S is left invariant by the transformation x → 1 - x, with the weights at the points x_i satisfying a_i = a_j when x_i = 1 - x_j. A similar statement applies for the optimal Y strategy.
PROOF. Let f(x) = Σ a_i I_{x_i} represent an optimal strategy for player I. Then Σ a_i φ(x_i - y) ≥ v for all y in the unit interval. But since φ(u) is even, we obtain

Σ a_i φ((1 - x_i) - y) = Σ a_i φ(-x_i + (1 - y)) = Σ a_i φ(x_i - (1 - y)) ≥ v.

Consequently, f*(x) = Σ a_i I_{1 - x_i} is also optimal. Since the optimal strategy is unique, the truth of our theorem is now immediately clear.

§3. FAMILIES OF BELL SHAPED GAMES DEPENDING ON A PARAMETER
Some results will now be presented concerning a family of games depending on a parameter, namely K_λ(x, y) = φ(λ(x - y)), λ > 0, where φ is a regular analytic P.F.F. Let n(λ) denote the number of steps in the optimal strategy used by the minimizing player in the game with kernel K_λ(x, y), and let v(λ) be the value of the game. The following lower bound for the number of steps in terms of the value can be established.

LEMMA 9. Let K(x, y) = φ(x - y) denote the kernel of the game, where φ is a P.F.F.; then n ≥ φ(0)/v, where v is the value and n is the number of steps involved in the optimal Y strategy.
PROOF. Let y_i, i = 1, ..., n, be the points of the spectrum of the optimal Y strategy, with weights a_i. Then Σ a_i φ(x - y_i) ≤ v for every x in the unit interval. At least one of the a_i is ≥ 1/n. Thus, since φ(x) > 0, we obtain v ≥ (1/n) φ(x - y_{i₀}) for every 0 ≤ x ≤ 1. Setting x = y_{i₀} yields the conclusion.

Upper bounds for n can be obtained with the aid of the Rouché theorem of complex variables, which furnishes bounds on the number of zeros of φ'(x) in terms of the bounds of φ' in a complex region containing the unit interval. Returning now to our considerations of the family of games K_λ(x, y), we can prove

LEMMA 10. As λ → ∞, n(λ) tends to infinity and v(λ) tends to zero. Moreover, n(λ) is lower semi-continuous.

PROOF. If we consider the uniform distribution, then
\[ \int_0^1 K_\lambda(x, y)\, dy = \int_0^1 \phi(\lambda(x - y))\, dy \]
tends to zero as λ → ∞. Indeed, since φ(λ(x - y)) → 0 for almost every y and for each x, the Lebesgue convergence criterion implies the above conclusion. Consequently, v(λ) tends to zero, and on account of Lemma 9 we find that n(λ) goes to infinity. It remains only to show that n(λ) is lower semi-continuous. If λ_i → λ, then since the optimal strategies are unique, the optimal strategies for the λ_i games must converge to the corresponding optimal strategy for the λ game. But the number of steps involved in the limit is n(λ), and thus for i sufficiently large n(λ_i) ≥ n(λ), which establishes our lemma.

For the game defined by the kernel K_λ(x, y) with λ sufficiently small, K_λ(x, y) is a strictly concave function in each variable separately, keeping the other variable fixed. By a known result [7], such kernels possess unique optimal strategies such that the optimal X strategy concentrates at a single point x₁ and the spectrum of the optimal Y strategy consists of the points 0 and 1. As λ increases we reach a value λ₀ such that slightly beyond λ₀ the optimal X strategy and the optimal Y strategy both involve distributions with two steps. As λ further increases we arrive at the circumstance where the optimal X strategy involves two points and the optimal Y strategy uses three points, and so this pattern continues. In the next theorem we demonstrate and discuss in greater detail the validity of this pattern.

For purposes of expediting the exposition we introduce the following notation. Let m(λ) denote the number of steps in the optimal x strategy for the game K_λ(x, y). Also, let us recall that n(λ) is the number of steps involved in the optimal y strategy. In view of Lemma 5, m(λ) is equal to either n(λ) or n(λ) - 1.

THEOREM 3. If λ₀ is a critical value such that the number of steps involved in either the optimal x or optimal y strategy increases as λ increases beyond λ₀, then one of these two patterns must occur:
(a) If m(λ₀) = n(λ₀) - 1, then for λ > λ₀ but sufficiently close to λ₀ we get m(λ) = n(λ) = n(λ₀).

(b) If m(λ₀) = n(λ₀), then for λ close to λ₀ (λ > λ₀) it follows that n(λ₀) = m(λ) = n(λ) - 1.
THEOREM 3a. If λ₀ is a critical value such that the number of steps involved in either the optimal x or optimal y strategy decreases as λ increases beyond λ₀, then one of these two patterns must occur:

(a) If m(λ₀) = n(λ₀) - 1, then for λ > λ₀ but sufficiently close to λ₀ we get m(λ) = n(λ) = m(λ₀).

(b) If m(λ₀) = n(λ₀), then for λ close to λ₀ (λ > λ₀) it follows that n(λ) = m(λ) + 1 = n(λ₀).

The proof of Theorem 3 only will be given; the arguments for Theorem 3a are similar.
PROOF OF (a). Suppose to the contrary that for λ sufficiently close to λ₀ but larger than λ₀ we have n(λ) ≥ n(λ₀) + 1. Let the optimal x strategy for the game with parameter λ = λ₀ be

\[ S = \sum_{i=1}^{n(\lambda_0)-1} \beta_i\, I_{x_i(\lambda_0)} . \]

Suppose the optimal Y strategy concentrates at the points y_i(λ). As λ approaches λ₀ from above, at least two of the points in the spectrum of the optimal Y strategy must reduce to a single point. Denote this point by y_{i₀}. Let

\[ g(y) = \sum_{i=1}^{n(\lambda_0)-1} \beta_i\, \phi(\lambda_0 (x_i(\lambda_0) - y)) . \]

As g'(0) and g'(1) are not zero because of the normalization φ'(0) = 0 and Lemma 7, we deduce that y_{i₀} is interior to the unit interval. It now follows that g'(y) possesses a multiple root of odd order at y = y_{i₀}. Also, between two consecutive interior y_i(λ₀) there are at least two values of y (the points where g changes from concavity to convexity) where g''(y) changes sign. This is also valid for y < y₂ and for y > y_{n(λ₀)-1} (remembering that y₁ = 0 and y_{n(λ₀)} = 1). Thus g''(y) has at least 2n(λ₀) - 2 changes of sign. Moreover, g'' has a root of even multiplicity at y_{i₀}. Suppose for definiteness that g''(y) ≥ 0 for y ≠ y_{i₀} in a neighborhood of y_{i₀}. Since the function h''(y), with h(y) = ε φ(λ₀(x₁(λ₀) - y)), has at most a simple root at y = y_{i₀}, we find that for
ε of sufficiently small magnitude and of appropriate sign, the function k''(y) = g''(y) + h''(y) has two sign changes in the neighborhood of y_{i₀}. Moreover, if ε is chosen small enough, then k''(y) maintains all the original sign changes that g''(y) possessed. Hence we have shown that the function k(y) = g(y) + h(y), which is of the form

\[ \sum_{i=1}^{n(\lambda_0)-1} c_i\, \phi(\lambda_0(x_i - y)) \]

with the c_i all positive, has a second derivative k''(y) with at least 2n(λ₀) sign changes. This is in contradiction with the conclusion of part (b) of Lemma 3. The only compatible interpretation requires that n(λ) = n(λ₀). Since λ₀ is a critical value, it follows that m(λ) > m(λ₀) for λ > λ₀.

Finally, m(λ) cannot be greater than n(λ₀), since otherwise, on account of Lemma 5, n(λ), necessarily larger than or equal to m(λ), would also be greater than n(λ₀). This was just shown to be impossible. The proof of part (a) is thus complete.

PROOF OF (b). The argument is similar to that of (a) above with only slight modifications, and so only an outline of the proof will be given. Specifically, it will be shown that if m(λ) ≥ n(λ₀) + 1 for λ close to λ₀, we deduce a contradiction. In fact, if y_i(λ₀) are the points in the spectrum of an optimal Y strategy, then
\[ r(x) = \sum_{i=1}^{n(\lambda_0)} a_i\, \phi(x - y_i) \]
has relative maxima at x_i(λ₀), i = 1, ..., m(λ₀) = n(λ₀), with at least one value x_{i₀} a multiple root of odd order for r'(x). The last assertion is a consequence of the assumption that m(λ) ≥ n(λ₀) + 1. Again, r''(x) has at least two changes of sign associated with each x_i (the points where r(x) changes from concavity to convexity in a neighborhood about x_i), and in addition r''(x) possesses a root of even multiplicity at x_{i₀}. As in part (a), we arrive at a function s(x) of the same form as r(x), with positive coefficients, such that s''(x) possesses 2n(λ₀) + 2 sign changes. This is in contradiction with Lemma 3. Therefore our assumption that m(λ) ≥ n(λ₀) + 1 is false. The final conclusion of our theorem is now clear.

It seems worthwhile now to indicate how one could compute the critical values corresponding to where the numbers of steps involved in the optimal strategies change. We will illustrate this only with kernels
generated by an even Pólya frequency function (φ(u) = φ(-u)). The results are given without proof. For λ sufficiently small, K_λ(x, y) = φ(λ(x - y)) is a concave function in each variable separately. Thus the optimal X strategy concentrates at a single point, which in view of the symmetry of φ and Theorem 2 must be located at ½. The optimal Y strategy concentrates only at 0 and 1, with equal probabilities. As λ increases we reach a value λ₁ such that a multiple root of odd order for

\[ g'(x) = \tfrac{1}{2}\, \phi'(\lambda_1(x - 0)) + \tfrac{1}{2}\, \phi'(\lambda_1(x - 1)) \]

appears at x = ½. This is the first critical value. For K(x, y) = e^{-λ(x-y)²}, setting g''(½) = 0 we obtain λ₁ = 2. The kernel

\[ K(x, y) = \frac{1}{1 + \lambda(x - y)^2}, \]

which is not a P.F.F. but shares many of the game theoretic properties of bell shaped games, has its first critical number λ₁ = 4/3. For λ slightly greater than λ₁ the optimal maximizing strategy must concentrate at two points x₁ and 1 - x₁ = x₂ (x₁ < ½) with equal probability.
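The two critical numbers quoted above can be checked by finite differences; the closed-form expressions in the comments are straightforward calculus, and the value 4/3 for the Cauchy-type kernel follows from the same condition g''(½) = 0.

```python
import math

def second_derivative(g, x=0.5, h=1e-4):
    """Central-difference approximation to g''(x)."""
    return (g(x - h) - 2.0 * g(x) + g(x + h)) / (h * h)

def gaussian_g(lam):
    """g(x) for the optimal Y strategy (1/2, 1/2) at 0 and 1, kernel e^{-lam u^2}."""
    return lambda t: 0.5 * math.exp(-lam * t * t) + 0.5 * math.exp(-lam * (t - 1) ** 2)

def cauchy_g(lam):
    """Same construction for the kernel 1 / (1 + lam u^2)."""
    return lambda t: 0.5 / (1 + lam * t * t) + 0.5 / (1 + lam * (t - 1) ** 2)

# Gaussian kernel: g''(1/2) = (lam^2 - 2*lam) e^{-lam/4}, vanishing at lam = 2.
a = [second_derivative(gaussian_g(lam)) for lam in (1.8, 2.0, 2.2)]
# Cauchy-type kernel: g''(1/2) vanishes at lam = 4/3.
b = [second_derivative(cauchy_g(lam)) for lam in (4/3 - 0.2, 4/3, 4/3 + 0.2)]
print(a, b)
```

In both cases g''(½) changes sign from negative to positive as λ passes the critical value, which is exactly when x = ½ stops being a maximum of the payoff against the two-point Y strategy.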
In view of the remark preceding this lemma, we secure for our special choices of y₁, y₂ and y₃ that the determinant in (3) is strictly negative. In order to avoid this contradiction we must have that

∂²K(x, y)/∂x² > 0    for x > y.

An analogous argument proves that

∂²K(x, y)/∂y² > 0    for x > y.

By symmetry,

∂²K(x, y)/∂x² > 0 and ∂²K(x, y)/∂y² > 0    for x < y.
The proof of our lemma is now complete.
These two lemmas together yield the following theorem:

THEOREM 4. If K(x, y) is a G.P.T. kernel such that either (i) K(x, y) satisfies K_xx > 0 and K_yy > 0 in each of the regions x < y and x > y separately in the unit square, or (ii) K(x, y) = Ψ(|x - y|) for all real x and y, then the optimal strategy for each player is unique and is composed of a density, positive over the entire unit interval, and two jumps at 0 and 1. Moreover, the density part of the strategy is obtained by solving an appropriate integral equation by means of a Neumann series. An example of this kind is K(x, y) = ….

§5. GREEN KERNELS

As a specific application of this theory about P.T. kernels we shall study games whose kernels are the Green's functions of appropriate
differential equations. The optimal strategies for these games are directly related to the coefficients which define the differential equation.

EXAMPLE 1. Let p(x) > 0 be differentiable and q(x) ≥ 0 be continuous for a ≤ x ≤ b, where we assume that a < 0 < 1 < b. Consider the differential equation

(pu')' - qu = 0,    a < x < b.
We assume that this equation has two linearly independent solutions Φ(x) and Ψ(x) satisfying suitable boundary conditions (for example Φ(a) = Ψ(b) = 0) which in addition have the following properties: Φ(x) > 0, Ψ(x) > 0, Φ'(x) > 0, Ψ'(x) < 0, and Ψ''(x) ≥ 0, Φ''(x) ≥ 0 over the interval 0 ≤ x ≤ 1. Consider a game defined over the unit square whose kernel is the Green's function of the differential equation. Explicitly,

\[ K(x, y) = \begin{cases} \Phi(x)\,\Psi(y), & x \le y,\\ \Psi(x)\,\Phi(y), & x \ge y. \end{cases} \]
It is an easy matter to show that K(x, y) is a G.P.T. kernel and in fact K(x, y) defines a G.T. kernel. Consequently, the optimal strategy is the same for both players (as K(x, y) = K(y, x)) and is of the form F = (αI₀, f(x), βI₁), where f(x) is the density part of the strategy and α and β represent the magnitudes of the jumps at 0 and 1 respectively. By the methods of [2], it follows that

\[ f(x) = -\,v\, \frac{\Phi''\Psi'}{[\Phi'\Psi - \Phi\Psi']^2},\qquad
\alpha = v\, \frac{\Phi'(0)}{\Phi(0)\,[\Phi'(0)\Psi(0) - \Phi(0)\Psi'(0)]},\qquad
\beta = -\,v\, \frac{\Psi'(1)}{\Psi(1)\,[\Phi'(1)\Psi(1) - \Phi(1)\Psi'(1)]}, \]

where v is the value of the game. We shall now show that f(x) = kq(x) for some suitable constant k. Indeed, as pΦ'' + p'Φ' - qΦ = 0 = pΨ'' + p'Ψ' - qΨ and p[Φ'Ψ - ΦΨ'] = constant, we can solve for q and we get f(x) = kq(x). A similar reduction implies that
corresponding closed forms hold for α and β. The unknown constant k is determined so that ∫₀¹ f(x) dx = 1 - α - β.

The following differential equations satisfy the conditions of Example 1:

I. If u'' - u = 0, -∞ < x < ∞, then Φ(x) = eˣ, Ψ(y) = e^{-y}, and K(x, y) = e^{-|x-y|}. Therefore the density part of the solution is f(x) = kq(x) = k.

II. Consider u'' - (1 + x²)u = 0. This equation has the solutions

\[ \Phi(x) = e^{x^2/2} \int_{-\infty}^{x} e^{-t^2}\, dt
\qquad\text{and}\qquad
\Psi(y) = e^{y^2/2} \int_{y}^{\infty} e^{-t^2}\, dt . \]

Therefore, for x ≥ y,

\[ K(x, y) = \exp\!\left[\tfrac{1}{2}(x^2 + y^2)\right] \int_{-\infty}^{y} e^{-t^2}\, dt \int_{x}^{\infty} e^{-t^2}\, dt , \]

and, since q(x) = 1 + x², the density part of the solution is f(x) = kq(x) = k(1 + x²).

… ∫ Φₙ(x) dx = v[p(1)Φₙ'(1) - p(0)Φₙ'(0)].

If we now substitute this expression in equation (1), we conclude that the condition R(y) ≡ v is equivalent to

αΦₙ(0) + βΦₙ(1) + v[p(1)Φₙ'(1) - p(0)Φₙ'(0)] = 0.

Therefore, in order that F(x) be optimal it is sufficient that

(2)    αΦₙ(0) - v p(0)Φₙ'(0) = 0,    βΦₙ(1) + v p(1)Φₙ'(1) = 0.

If we compare these equations with the original boundary conditions

aΦₙ(0) + cΦₙ'(0) = 0,    bΦₙ(1) + dΦₙ'(1) = 0,

we see that (2) will be satisfied with a > 0, b > 0, α ≥ 0, β ≥ 0, v > 0 whenever

-c/p(0) = d/p(1) > 0.

Under these assumptions α = ka, β = kb, v = -kc/p(0) = kd/p(1), where k is determined so that ∫ kq(x) dx = 1 - α - β. We remark that the optimal F(x) is unique.
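For Example I the optimal strategy can be written out completely and checked. Requiring R(y) = αe^{-y} + βe^{-(1-y)} + k∫₀¹ e^{-|x-y|} dx to be constant, together with the normalization α + β + k = 1, gives α = β = k = 1/3 and v = 2/3; this explicit solution is my own coefficient-matching computation, offered only as an illustration consistent with f(x) = kq(x) = k.

```python
import math

alpha = beta = k = 1.0 / 3.0   # candidate jumps and density level (my computation)

def R(y, n=4000):
    """alpha*K(0,y) + beta*K(1,y) + k * integral_0^1 K(x,y) dx, K = exp(-|x-y|)."""
    h = 1.0 / n
    vals = [math.exp(-abs(i * h - y)) for i in range(n + 1)]
    integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))   # trapezoid rule
    return alpha * math.exp(-y) + beta * math.exp(-(1.0 - y)) + k * integral

payoffs = [R(j / 10) for j in range(11)]
print([round(v, 4) for v in payoffs])   # constant 2/3 up to quadrature error
```

The constancy of R(y) over the unit interval is exactly the optimality condition for the minimizing player.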
BIBLIOGRAPHY

[1] DRESHER, M., KARLIN, S., and SHAPLEY, L. S., "Polynomial games", Annals of Mathematics Study 24 (Princeton, 1950), pp. 161-180.

[2] KARLIN, S., "Reduction of certain classes of games to integral equations", Annals of Mathematics Study 28 (Princeton, 1953), pp. 125-158.

[3] KARLIN, S., "Decision theory for Pólya type distributions, case of two actions, I", to appear in the Third Berkeley Symposium on Probability and Statistics.

[4] KARLIN, S., "Pólya type distributions, II", to appear in the Annals of Mathematical Statistics.

[5] PÓLYA, G., "Über Annäherung durch Polynome mit lauter reellen Wurzeln", Rendiconti di Palermo 36 (1913), pp. 279-295.

[6] SCHOENBERG, I. J., "On Pólya frequency functions", Journal d'Analyse Mathématique 1 (1951), pp. 331-374.

[7] BOHNENBLUST, H. F., KARLIN, S., and SHAPLEY, L. S., "Games with continuous convex pay-off", Annals of Mathematics Study 24 (Princeton, 1950), pp. 148-153.
Samuel Karlin Stanford University
ON DIFFERENTIAL GAMES WITH SURVIVAL PAYOFF

H. E. Scarf

§1. SURVIVAL GAMES
In a paper appearing elsewhere in this volume, a class of differential games with integral payoffs is discussed by Fleming [5]. The games are defined and it is shown that, under fairly stringent conditions, the game will have a value and each player will possess ε-effective strategies. We shall, in this paper, discuss differential games with a survival payoff. Basically, because survival games may last for an infinite length of time, while the games discussed in the reference mentioned above are explicitly limited to a finite time, the techniques available for the treatment of survival games are more involved and comparable results are more difficult to obtain than in the case of games with an integral payoff. For these reasons, the emphasis in this paper will be on proving a result which is somewhat weaker than the existence of a value for survival games. What we shall do is define a series of approximating games with a discrete time parameter, and show that, under conditions similar to those given by Fleming, both the upper and lower values of the approximating games converge to the same limit as the grid size for the time parameter tends to zero.

We shall not make any statements about the convergence of optimal strategies; but, inasmuch as the value functions for the discrete games characterize optimal play, their limit may be expected to give a reasonable indication of the limiting characteristics of optimal play.
The discrete games that we are interested in are readily seen to be generalizations of the survival games considered by Hausner [8], Peisakoff [12], Bellman [1, 2], Blackwell [2], LaSalle [1], and Milnor and Shapley [11]. We consider a bounded n-dimensional region R, with boundary B. A bounded n-dimensional vector-valued function g(x; y, z) is given, which for each x in R either is a continuous vector-valued function on the (y, z) unit square, or is a vector-valued matrix. The game is played as follows: A vector x_0, interior to the region R, is
chosen. On the first move, player I chooses a particular value of y and player II simultaneously chooses a particular value of z, and subsequently a straight line is drawn from the point x_0 to the point x_1 = x_0 + δg(x_0; y, z). The players then make another choice of y and z and we draw a straight line from x_1 to x_2 = x_1 + δg(x_1; y, z). This process is repeated until the path penetrates the boundary, at which point the game is terminated. The payoff is defined as follows: We have a function b(x), defined and continuous on the boundary B; if the game terminates at the point x, then the payoff to the first player is b(x), and the second player receives the negative of this amount. In order to complete the definition of the game, we should define the payoff in the event that the game does not terminate, but, as we shall see, this is a matter of indifference to us. Of course, both players are permitted to use mixed strategies at every stage of the game. In order to indicate the dependence of this game upon the parameter δ we shall designate this game by G_δ. As δ tends to zero, the motion of the game tends more and more to be described by the equations ẋ = g(x; y, z), and these are the defining equations of a differential game. Let us assume for the moment that G_δ has a value W_δ(x), x being the initial starting point. We shall give sufficient conditions for this sequence of functions to converge, and also obtain a system of differential inequalities whose solution represents the limiting function. Conditions for the existence of W_δ(x) are, at present, known only for the one-dimensional case, but as we shall see later in the paper, it is not necessary to assume the existence of the value. It will actually be true that both the upper and lower values (defined appropriately) will converge to the same limit. We define W_δ^+(x) to be the best that player I can guarantee himself, using mixed strategies, in G_δ.
That is, W_δ^+(x) is the Sup Inf over the payoff, in mixed strategies. W_δ^-(x) is defined to be the Inf Sup of the payoff. It is true that W_δ^+(x) ≤ W_δ^-(x).

§2. SURVIVAL GAMES WITH FINITE TIME
As was mentioned in the previous section, one of the basic differences between survival games and the type of games with integral payoffs treated by Fleming in [5] is that the latter are played for a finite length of time, whereas the former may continue indefinitely. It is possible to modify survival games so as to make them last for a finite length of time, and we shall devote this section to a discussion of games of this sort. The purpose will be to point out some important differences between infinite survival games and their finite counterparts. In order to define the discrete analogues of a finite survival
game, in addition to the data given in Section 1, we have to fix a time T, and also require that the boundary function b(x) be extended continuously throughout the interior of the region R. The game G_δ(x_0, T) is defined as follows: Starting from x_0, a broken line path is constructed from x_0 to x_1, and from x_1 to x_2 ... with the associated boundary condition possesses more than one solution. More specifically, there is a large class of functions g, for which the equation

    Val (∇W, g) = 0

is true for every continuously differentiable function W. The class of survival games which have this property are the furthest removed, in both technique and final results, from finite survival games, and it is this class of games which we shall discuss in the remainder of this paper.

§3. UNBIASED DIFFERENTIAL GAMES

In the previous section it was mentioned that we shall restrict our attention to a specific class of survival games, which we shall call unbiased games. They are, roughly speaking, described by saying that at each point neither player can force any particular direction.

DEFINITION 1. A differential game is said to be unbiased if for every x in R, and for every vector c, the scalar product (c, g(x; y, z)), when considered as a game over the (y, z) space, has value zero.
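Definition 1 can be checked mechanically when g is a matrix of vectors. The sketch below (a modern Python illustration, not part of the original text) tests the 4×4 cyclic matrix examined in the next paragraph: for any vector c, uniform mixing guarantees each player an expected scalar product (c, g) of exactly zero, so every projected game has value zero.

```python
import random

# The 4x4 cyclic matrix of move vectors g(y, z) from the example below:
# each row is a pure strategy for player I, each column one for player II.
G = [[(1, 0), (-1, 0), (0, 1), (0, -1)],
     [(0, -1), (1, 0), (-1, 0), (0, 1)],
     [(0, 1), (0, -1), (1, 0), (-1, 0)],
     [(-1, 0), (0, 1), (0, -1), (1, 0)]]

def dot(c, v):
    return c[0] * v[0] + c[1] * v[1]

def is_unbiased(G, c):
    """Check that the matrix game with payoff (c, g(y, z)) has value 0:
    uniform mixing guarantees expected payoff 0 against every pure reply."""
    n, m = len(G), len(G[0])
    # Player I mixes rows uniformly: every column average must be 0.
    col_avgs = [sum(dot(c, G[y][z]) for y in range(n)) / n for z in range(m)]
    # Player II mixes columns uniformly: every row average must be 0.
    row_avgs = [sum(dot(c, G[y][z]) for z in range(m)) / m for y in range(n)]
    return all(abs(a) < 1e-12 for a in col_avgs + row_avgs)

random.seed(0)
assert all(is_unbiased(G, (random.uniform(-1, 1), random.uniform(-1, 1)))
           for _ in range(100))
```

Since every row and every column of the cyclic matrix contains each of the four unit moves exactly once, the uniform mixtures are optimal for both players.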
We shall discuss some examples of unbiased games in detail later on, but let us, for the moment, examine a specific game in the plane. Let the region and the boundary value be arbitrary, and let g(x; y, z) be independent of x and equal to the matrix

    (1, 0)    (-1, 0)   (0, 1)    (0, -1)
    (0, -1)   (1, 0)    (-1, 0)   (0, 1)
    (0, 1)    (0, -1)   (1, 0)    (-1, 0)
    (-1, 0)   (0, 1)    (0, -1)   (1, 0)
This is a specific example of a general class of unbiased games, i.e., when g(x; y, z) is given by a cyclic matrix with row-sum equal to zero. The optimal strategy for either player in any projection of the above matrix, i.e., for any linear combination of the matrices, is to play each row, or column, with probability one-fourth. In the game G_δ, all of the elements of strategy customarily associated with a game are lacking; if both players play optimally, the resulting stochastic process is a simple random walk in which the point moves by an amount δ, with probability one-fourth, in the north, east, south, or west directions. The value function W_δ(x_1, x_2) satisfies the equation

    W_δ(x_1, x_2) = (1/4) W_δ(x_1 + δ, x_2) + (1/4) W_δ(x_1 − δ, x_2) + (1/4) W_δ(x_1, x_2 + δ) + (1/4) W_δ(x_1, x_2 − δ)
and is close to b(μ) when x is near the boundary point μ. It is well known that, as δ tends to zero, these functions converge. The limit function is harmonic in R and assumes the boundary value b(x). In the general case our results will be somewhat similar; the primary difference will be the replacement of the Laplacian by a considerably more complex differential operator. We need some definitions.

DEFINITION 2. We define D_yz to be the first-order linear differential operator

    D_yz = Σ_k g_k(x; y, z) ∂/∂x_k .

The operator D_yz^2 is defined to be

    D_yz^2 = Σ_{j,k} g_j(x; y, z) g_k(x; y, z) ∂²/∂x_j ∂x_k .
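The convergence of the random-walk value functions W_δ to a harmonic limit, noted above for the cyclic game, can be observed numerically. The following sketch (modern Python; the square region, grid size, and boundary function are assumptions made for illustration) iterates the discrete mean-value equation and compares the result with the harmonic function x_1² − x_2².

```python
# Solve the discrete mean-value equation W(x) = (1/4) * sum of the four
# neighbouring values on a grid over the unit square, with boundary data
# b(x1, x2) = x1^2 - x2^2.  (The region, grid size, and boundary function
# are illustrative assumptions, not taken from the original text.)
N = 20                      # grid points per side; delta = 1/N
h = 1.0 / N

def b(i, j):                # boundary values of a harmonic function
    return (i * h) ** 2 - (j * h) ** 2

# Initialise: boundary fixed, interior zero.
W = [[b(i, j) if i in (0, N) or j in (0, N) else 0.0
      for j in range(N + 1)] for i in range(N + 1)]

for _ in range(5000):       # Jacobi iteration on the interior points
    W = [[W[i][j] if i in (0, N) or j in (0, N) else
          0.25 * (W[i + 1][j] + W[i - 1][j] + W[i][j + 1] + W[i][j - 1])
          for j in range(N + 1)] for i in range(N + 1)]

# x1^2 - x2^2 is discretely harmonic, so the iteration converges to it.
err = max(abs(W[i][j] - b(i, j))
          for i in range(N + 1) for j in range(N + 1))
assert err < 1e-6
```

Because the boundary function chosen here is itself discretely harmonic, the grid solution reproduces it exactly, up to iteration tolerance.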
DEFINITION 3. We define the operator L(f) to be

    L(f) = lim_{δ→0} (1/δ) Val ||(D_yz + (δ/2) D_yz^2) f|| .
This latter definition is very important for us, and needs some comment. Let us first apply this definition to the cyclic game just discussed. In this case it is easy to verify that the matrix ||(D_yz + (δ/2) D_yz^2) f|| becomes

    f_1 + (δ/2)f_11     -f_1 + (δ/2)f_11    f_2 + (δ/2)f_22     -f_2 + (δ/2)f_22
    -f_2 + (δ/2)f_22    f_1 + (δ/2)f_11     -f_1 + (δ/2)f_11    f_2 + (δ/2)f_22
    f_2 + (δ/2)f_22     -f_2 + (δ/2)f_22    f_1 + (δ/2)f_11     -f_1 + (δ/2)f_11
    -f_1 + (δ/2)f_11    f_2 + (δ/2)f_22     -f_2 + (δ/2)f_22    f_1 + (δ/2)f_11
which is itself cyclic and therefore has the value (δ/4)(f_11 + f_22). If we divide by δ and let δ tend to zero, we see that L(f) is, aside from a constant factor, the Laplacian. Another interesting case occurs when the functions g(x; y, z) are again independent of x and are given by the two-dimensional matrix
    (1, 0)    (-1, 0)
    (-1, 0)   (1, 0)
    (0, 1)    (0, -1)
    (0, -1)   (0, 1)
This game is unbiased, and the matrix ||(D_yz + (δ/2) D_yz^2) f|| becomes

    f_1 + (δ/2)f_11     -f_1 + (δ/2)f_11
    -f_1 + (δ/2)f_11    f_1 + (δ/2)f_11
    f_2 + (δ/2)f_22     -f_2 + (δ/2)f_22
    -f_2 + (δ/2)f_22    f_2 + (δ/2)f_22
The maximizer can ensure himself of at least (δ/2) Max(f_11, f_22) by playing either the first two rows (if f_11 > f_22) or the last two rows (if f_22 > f_11) with equal probabilities, and the minimizer can hold him to at most this amount by playing the two columns with equal probabilities. It follows that L(f) = (1/2) Max(f_11, f_22).

The definition of L(f) may be put into another form by means of a result due independently to Gross [7] and Mills [10]. It is based on the observation that L(f) is equal to the derivative of Val ||(D_yz + (δ/2) D_yz^2) f|| with respect to δ, when δ is zero.
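For the cyclic example this observation can be checked directly: the perturbed matrix game is again cyclic, so uniform mixing is optimal and its value can be written down in closed form. The sketch below (modern Python; the test function f and the evaluation point are assumptions for illustration) verifies that (1/δ)·Val equals (1/4)(f_11 + f_22) for every δ.

```python
# Numerical check of Definition 3 for the cyclic example: the matrix game
# || (D_yz + (delta/2) D_yz^2) f || has value (delta/4)(f_11 + f_22), so
# (1/delta) * Val equals L(f) = (1/4)(f_11 + f_22) for every delta.
# Illustrative test function: f(x1, x2) = x1^2 + 3*x2^2 at (0.5, -0.2).
f1, f2 = 1.0, -1.2          # first partial derivatives at the point
f11, f22 = 2.0, 6.0         # second partial derivatives (constant here)

def perturbed_matrix(delta):
    e = [f1 + delta / 2 * f11, -f1 + delta / 2 * f11,
         f2 + delta / 2 * f22, -f2 + delta / 2 * f22]
    # cyclic arrangement: row y is the list e shifted right by y places
    return [[e[(z - y) % 4] for z in range(4)] for y in range(4)]

def value_cyclic(M):
    """Value of a cyclic matrix game: uniform mixing is optimal for both
    players, and every row and column then yields the overall average."""
    n = len(M)
    return sum(sum(row) for row in M) / (n * n)

for delta in (0.1, 0.01, 0.001):
    assert abs(value_cyclic(perturbed_matrix(delta)) / delta
               - 0.25 * (f11 + f22)) < 1e-9
```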
LEMMA 1. Let P(f, x) represent the class of optimal strategies for the maximizing player in the game ||D_yz f||, and Q(f, x) the corresponding class for the minimizing player. Then

    L(f) = Max_{p ∈ P} Min_{q ∈ Q} ∫∫ (1/2) D_yz^2(f) dp(y) dq(z).

There is an analogous statement for matrices. The proof of the statement for matrices may be found in the paper by Mills; the proof of the statement for the continuous case follows the same lines.

§4. THE MAIN RESULTS
We have given one example, the cyclic case, in which the limit of the value functions exists, and is equal to the solution of the Laplace equation which assumes the correct boundary values, that is, which is equal to b(x) on the boundary. In general, the result will be analogous to this; the value functions, or, if these do not exist, the upper and lower values, will converge to the solution of L(f) = 0 which is equal to b(x) on the boundary. Our actual technique will be to approach this result from above and below, in much the same manner that super- and subharmonic functions may be used to approximate harmonic functions.

THEOREM 2. Let L(f) be the operator associated with an unbiased differential game. Let f(x) be a function with continuous bounded derivatives up to the third order in some open set S containing the closure of R in its interior, and which satisfies the conditions:

    1. L(f) ≥ c > 0 for all interior points of R,
    2. f(x) ≤ b(x) for x on the boundary B.

Then

    lim_{δ→0} W_δ^+(x) ≥ f(x).
PROOF. Let us first of all choose δ so small that, if x is an interior point of R, then x + δg(x; y, z) is in S. We now define a strategy for the first player in G_δ by defining a set of probability distributions dp(y; x, δ). If the position of the game is x, then the first player is to play y with probability dp(y; x, δ). We define dp(y; x, δ) to be any optimal strategy for the first player in the game

    ||(D_yz + (δ/2) D_yz^2) f|| .
Let us examine the variations in the function f(x) as this strategy develops. We know that

    f(x + δg(x; y, z)) = f(x) + δ D_yz f(x) + (δ²/2) D_yz^2 f(x) + δ³ r(x, δ),

where r(x, δ) is bounded for all δ and x in the region. If we imagine the play to have started at x_0, and if (x_n) represents a typical sequence of plays, then we may say that

    E(f(x_n) | x_0, ..., x_{n-1}) ≥ f(x_{n-1}) + δ Val ||(D_yz + (δ/2) D_yz^2) f|| − δ³M.
The way in which we have chosen the strategy for the first player implies that

    E(f(x_n) | x_0, ..., x_{n-1}) ≥ f(x_{n-1}) + cδ² − δ³M,

and for δ sufficiently small

    E(f(x_n) | x_0, ..., x_{n-1}) ≥ f(x_{n-1}) + (c/2)δ².
Since the function f(x) is bounded inside the region R, we can deduce that with probability one the sequence (x_n) will leave the region. Let n* represent the random variable which is the length of time that the process continues; that is, n* is the number of steps, starting
from x_0 and continuing until the process penetrates the boundary for the first time. It is a standard result from the theory of martingales that E(f(x_{n*})) ≥ f(x_0) [4, p. 302]. The point x_{n*} is within δM of the boundary point x̄, if this is the point of penetration of the boundary; and since f is continuous, we may conclude that E(b) ≥ f(x_0) − ε, where ε tends to zero with δ. It follows that

    lim_{δ→0} W_δ^+(x) ≥ f(x).
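The submartingale step in the proof can be watched in simulation. The sketch below (a modern illustration; the unit-disk region, step size, and the constant d are assumptions, not from the original text) runs the cyclic random walk with the subharmonic function f(x) = |x|²/2 − d: every play leaves the region, and f at the exit point is at least f(x_0), as the martingale inequality requires.

```python
import random

# Simulation of the submartingale argument: for the cyclic (random-walk)
# game and the subharmonic function f(x) = |x|^2 / 2 - d, the value of f
# along the path never decreases in expectation, and the walk leaves the
# region with probability one.
random.seed(1)
delta = 0.05
d = 1.0
moves = [(delta, 0.0), (-delta, 0.0), (0.0, delta), (0.0, -delta)]

def f(x):
    return 0.5 * (x[0] ** 2 + x[1] ** 2) - d

def play(x0, max_steps=1_000_000):
    """Run the random walk from x0 until it leaves the unit disk."""
    x = x0
    for _ in range(max_steps):
        if x[0] ** 2 + x[1] ** 2 >= 1.0:        # boundary penetrated
            return x
        m = random.choice(moves)                 # both players mix uniformly
        x = (x[0] + m[0], x[1] + m[1])
    raise RuntimeError("walk did not terminate")

x0 = (0.2, -0.1)
terminals = [play(x0) for _ in range(200)]
# Every terminal point lies outside the unit disk, so f there exceeds f(x0):
assert all(f(x) >= f(x0) for x in terminals)
```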
It is not difficult to give conditions which guarantee that the class of functions satisfying the conditions of the theorem is not empty. One condition, which we shall not use later, is that |g(x; y, z)| be bounded away from zero. Consider the function f(x) = (1/2)|x|² − d, where d is a positive constant chosen so large as to make f(x) less than the minimum of b(x). Then D_yz^2(f) is equal to |g(x; y, z)|², and if c is positive and less than the minimum of (1/2)|g(x; y, z)|², we easily obtain

    ∫∫ (1/2) D_yz^2(f) dp(y) dq(z) ≥ c

for any distributions p and q. Applying Lemma 1, we see that L(f) ≥ c.
It is clearly possible to reverse the conditions of Theorem 2, and obtain an upper bound for lim_{δ→0} W_δ^-(x). We would, in this case, be dealing with the analogue of a superharmonic function. Continuing in this spirit, let us define two classes of functions.

DEFINITION 4. A function f(x) will be said to be in class M^+ if it has continuous bounded derivatives up to the third order in some open set containing the closure of the region R, satisfies L(f) ≥ c > 0 inside the region, and is less than or equal to b on the boundary.

DEFINITION 5. A function f(x) will be said to be in class M^- if it has continuous bounded derivatives up to the third order in some open set containing the closure of the region R, satisfies L(f) ≤ c < 0 inside the region, and is greater than or equal to b on the boundary.
Our results so far may be summarized by saying that any function in M^+ is a lower bound for lim_{δ→0} W_δ^+(x). It is also true that the pointwise Sup of all functions in M^+, which we denote by W^+(x), is a lower bound for lim_{δ→0} W_δ^+(x); and similarly the function W^-(x), defined to be the pointwise Inf of all functions in M^-, forms an upper bound for lim_{δ→0} W_δ^-(x). Suppose that we can demonstrate that W^+(x) ... W^+. It is therefore true that W^+ = W^- = W, and the theorem is proved.

§5. REMARKS
1. We would, first of all, like to comment upon some of the conditions imposed in Theorem 2. There are many examples in which the conditions of Theorem 2 are violated, and in which convergence occurs. If we examine the example in Section 3 in which L(f) turned out to be (1/2) Max(f_11, f_22), we see that it violates the condition that optimal strategies in any projection actually produce non-zero motion in that projection (this is the verbal translation of the first condition in Theorem 3). It is, however, true that if L(W) = 0, then L(W + (a/2)|x|²) is greater than zero for positive a, and less than zero for negative a. This is sufficient to prove that there are functions arbitrarily close to W which lie in M^+, and ones which lie in M^-, and this is the basic idea of Theorem 2. It is also possible, in this case, to dispense with the condition that the gradient of W be different from zero. In general, if L(f + (a/2)|x|²) is greater than L(f) for a positive, and less than L(f) for a negative, then the existence of a solution to L(W) = 0, with the correct boundary conditions, is sufficient to yield the conclusion of Theorem 2. It would be very interesting to see what classes of games produce operators with this property. It is, of course, the non-linearity of the operator L(f) which makes it difficult to obtain any general results about its behavior.

2. We have restricted our treatment so far to unbiased differential games. In this type of game, neither player has the option of forcing the expected change of position to be in a favorable direction. He must, at each stage of the game, rest his hopes upon the variance of his choices. It is precisely this feature of unbiased games which gives rise to operators which resemble elliptic second-order differential operators. In general, survival games are not unbiased. For example, if we expect the limiting game to have pure strategies for each player, then it seems unreasonable to impose a condition which makes the choice of the variance the only strategic element. It is quite possible, however, that in a game which is not unbiased, the players may, at various moments, be forced to pay some attention to the variance of their moves, so that the resulting play will be governed by a combination of first- and second-order operators.

BIBLIOGRAPHY

[1] BELLMAN, RICHARD, and LASALLE, JOSEPH, "On non-zero sum games and stochastic processes," The RAND Corporation, Research Memorandum RM-212, August, 1949.

[2] BELLMAN, RICHARD, and BLACKWELL, DAVID, "On a particular non-zero sum game," The RAND Corporation, Research Memorandum RM-250, September, 1950.

[3] DARLING, D. A., "The continuous pursuit problem," unpublished work, September, 1954.

[4] DOOB, J. L., Stochastic Processes, New York, John Wiley and Sons, 1953.

[5] FLEMING, W. H., "A note on differential games of prescribed duration," this Study.

[6] BERKOVITZ, L. D., and FLEMING, W. H., "On differential games with an integral payoff," this Study.

[7] GROSS, OLIVER, "The derivatives of the value of a game," The RAND Corporation, Research Memorandum RM-1286, March, 1954.

[8] HAUSNER, MELVIN, "Optimal strategies in games of survival," The RAND Corporation, Research Memorandum RM-777, February, 1952.

[9] ISAACS, RUFUS, "Differential games II: the definition and formulation," The RAND Corporation, Research Memorandum RM-1399, November, 1954.

[10] MILLS, HARLAN D., "Marginal values of matrix games and linear programs," Annals of Mathematics Study No. 38 (Princeton, 1956), pp. 183-193.

[11] MILNOR, J. W., and SHAPLEY, L. S., "On games of survival," The RAND Corporation, Paper P-622, January, 1955.

[12] PEISAKOFF, MELVIN, "More on games of survival," The RAND Corporation, Research Memorandum RM-88, June, 1952.
H. E. Scarf
The RAND Corporation
A NOTE ON DIFFERENTIAL GAMES OF PRESCRIBED DURATION

W. H. Fleming¹

§1. INTRODUCTION
The term differential game, as used here and in [1], [2], applies to a multi-stage process with continuous time in which each of two players continuously exerts a control on the position of the process. At present there is no satisfactory general mathematical formulation of differential games, let alone existence theorems for solutions. Part of the difficulty is to describe mathematically a situation in which each player is allowed to randomize continuously in time, the particular distribution which he uses at time t depending on the outcome of play for time less than t. A possible approach is to replace continuous time by discrete time, and to prove that the values of the time-discrete games converge to a limit as the maximum time between successive moves shrinks to 0. In [2], Scarf approaches the problem in this way. He introduces a certain second-order differential operator L(V) such that, if the equation L(V) = 0 has a sufficiently differentiable solution V with the appropriate boundary values, then the values of the time-discrete games converge to V(u) for every permissible position u of the process. Thus, V(u) may in some sense be called the value of the limiting differential game. An essential feature of Scarf's formulation is that termination of play occurs at that time when the position u reaches the boundary of the region in which it varies. In this note we consider an analogous situation, with the important difference that the time T (finite) when play terminates is fixed and prescribed in advance. In this case the differential operator is of first, rather than second, order and the proof of the corresponding approximation theorem under the assumption of existence of a continuously differentiable solution is very simple. In addition to the time-discrete form we also define the

¹ The preparation of this paper was sponsored (in part) by the RAND Corporation.
continuous process itself as a game. Under the same restrictive assumptions as in the approximation theorem, the value of this game exists and coincides with the limit of the values of the approximating games.

§2. TIME-DISCRETE FORM

Let t denote a point of the real interval (0, T),
x = (x_1, ..., x_m) a point in euclidean m-space, and y, z points of compact sets Y, Z, respectively, in some euclidean space of finite dimension. Let f(x, y, z), g(x, y, z) be continuous functions, f being real valued and g = (g_1, ..., g_m) having values in m-space. It is assumed that g satisfies a uniform Lipschitz condition in x, namely, there is a positive constant K such that

    |g(x, y, z) − g(x′, y, z)| ≤ K|x − x′|

for all x, x′, y, z. A time-discrete version of the process is defined in the following way. Consider a partition π of the interval (0, T), namely, a set of points t_1, ..., t_{N+1} with 0 = t_1 < t_2 < ... < t_N < t_{N+1} = T. Set δ_k = t_{k+1} − t_k, k = 1, ..., N. An initial position x_1 is given. At move k = 1, ..., N, Player I chooses y_k from Y and II chooses z_k from Z simultaneously. Both know the position x_k of the game after k − 1 moves before move k is made. The new position x_{k+1} is given by
(2.1)    x_{k+1} = x_k + δ_k g(x_k, y_k, z_k),    k = 1, ..., N.
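A play of the time-discrete game under fixed pure strategies can be sketched directly from the difference equation; the payoff accumulated below follows (2.2). The particular f, g, V_0 and strategies in this sketch are illustrative assumptions, not taken from the paper.

```python
# A sketch of one play of the time-discrete game: positions follow the
# difference equation (2.1) and the payoff accumulates as in (2.2).
# The functions f, g, V0 and the (pure) strategies are illustrative.
T, N = 1.0, 100
delta = T / N                               # uniform partition of (0, T)

def g(x, y, z):                              # dynamics; Lipschitz in x
    return -x + y - z

def f(x, y, z):                              # running payoff
    return x * x + y - z

def V0(x):                                   # worth of the final position
    return -abs(x)

def play(x1, strat_I, strat_II):
    x, total = x1, 0.0
    for _ in range(N):
        y, z = strat_I(x), strat_II(x)       # simultaneous choices from Y, Z
        total += delta * f(x, y, z)          # running term of (2.2)
        x = x + delta * g(x, y, z)           # difference equation (2.1)
    return total + V0(x)                     # add the terminal term of (2.2)

payoff = play(0.5, lambda x: 1.0, lambda x: 0.0)
assert isinstance(payoff, float)
```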
The total payoff is

(2.2)    Σ_{k=1}^{N} δ_k f(x_k, y_k, z_k) + V_0(x_{N+1}),

where V_0(x) is a given continuously differentiable function describing the worth of the final position of the game. We say that a non-negative measure μ on Y (or, similarly, a non-negative measure ν on Z) is an elementary distribution if μ(Y) = 1 and μ is carried by a finite set of points. Let a strategy for I in the above game consist of a function μ(x; k) of x and k = 1, ..., N, whose range is contained in the set of elementary distributions on Y. A strategy for II is a function ν(x; k) with the corresponding properties. At move k, I chooses y_k according to the elementary distribution μ(x_k; k), and II chooses z_k according to ν(x_k; k). Since the measures μ(x; k) and ν(x; k) are carried by a finite set of points for each (x, k), there are no integrability difficulties involved in calculating the expected payoff after N moves; the
integrals which appear reduce to finite sums. Let V(x_1, T; π) denote the value of the game, if it exists. Let π_1 denote the partition of the interval (0, T − δ_1) by the points t′_1, ..., t′_N, where t′_i = t_{i+1}, i = 1, ..., N. Since t does not appear explicitly in f or g, we may regard the continuation of the game after the first move as a new game over π_1 with initial position x_2. We denote by val M(y, z) the value of a game over Y × Z with payoff M(y, z), provided M(y, z) is continuous. For any such game, ε-effective elementary distribution strategies² exist for every ε > 0.

THEOREM 1. For each T > 0, initial position x, and partition π of (0, T), the game has a value V(x, T; π). The value is continuous in x for each T and π, and satisfies the following functional equation:

(2.3)    V(x, T; π) = val_{(y,z)} [δ_1 f(x, y, z) + V(x + δ_1 g(x, y, z), T − δ_1; π_1)],

where we make the convention that V(x, T − δ_1; π_1) = V_0(x) for all x in the case δ_1 = T. Theorem 1 can be easily proved by familiar arguments using induction on the number of points of π; we omit the proof. While Theorem 1 guarantees the existence of a value and ε-effective strategies for every ε > 0, optimal (i.e., 0-effective) strategies need not exist.

REMARK. The essential piece of information the players use at each stage k of the game is the current position x_k. It can be proved that a more general formulation of strategy which makes use of complete past history would not change the value of the game; ε-effective strategies of the above game are still ε-effective in the more general game.

§3. CONTINUOUS FORM
The natural continuous analogue of the game just described would be obtained by replacing the difference equation (2.1) by a differential equation, the sum in (2.2) by an integral, and strategy by a function μ(x, t) (or ν(x, t)) with elementary distributions as values. Since we do not know how to formulate this idea precisely, let us consider the following modification which preserves many of the time-discrete features. A strategy for Player I for a game of duration T now consists of a set (t′_1, ..., t′_N) and a function μ(x, t), 0 ≤ t ≤ T, with values in the set of elementary distributions on Y, subject to 0 = t′_1 < t′_2 < ...

² A strategy for I is ε-effective if it yields at least val M(y, z) − ε against every strategy for II. ε-effective strategies for II are defined correspondingly.
together with the initial position x(0) = c. The strategies or instructions for play may take into account the state x(t) of the game at time t, it being assumed that each player always knows the state of the game. The payoff to Player I is given by the integral

    P = ∫_0^T f[x(t), y(t), z(t)] dt.

Player I's objective is to choose a strategy which enables him to maximize P, while the second player's goal is a strategy which will minimize P. To give a precise, useful mathematical formulation of the problem is a nontrivial task, and the present paper by no means attempts to give a complete theory. There is not even an existence theorem ensuring that max min = min max. Some of the results which we shall present were first obtained in hitherto unpublished works by Isaacs in a formal manner or else under more restrictive conditions than those set forth here. The reader will also note a connection with the work of Bellman [1]. What we do is to consider the problem as a two-sided extremum problem with differential and inequality side conditions and apply well-known techniques of
BERKOVITZ AND FLEMING

the calculus of variations (cf. Bliss [3], Valentine [5]) to see how much information can be obtained in this way. Analogues of the classical Euler equations are derived as necessary conditions for a saddle-point. Their solutions, which are called extremals, turn out to be characteristics of a certain partial differential equation involving the value of the game. The converse problem is then treated. Namely, when does a family of extremals determine a solution of the game? The basic condition for this is that the curves x(t) associated with the family F of extremals simply cover a certain region R in the (x, t) plane. The family then defines a pair of strategies which are optimal in a sense to be made precise below. The value W of the game is a continuously differentiable function of position (c, t) in R. Actually, we consider the more general case of a finite number of nonoverlapping regions R_1, ..., R_m, such that, for each i, R_i is simply covered by a family of extremals. In this case the value W is continuously differentiable in each region R_i and is continuous across boundary arcs common to more than one region. The partial derivatives of W are in general discontinuous across such boundary arcs. Although the discussion in this paper is carried out for a scalar differential equation, ẋ = g, the argument can be carried over to a vector equation ẋ = G.

§2. MATHEMATICAL FORMULATION
We begin by defining certain terms which are used throughout this paper. An arc in the (x, t) plane is called smooth if each function in its representation in terms of arc length is twice continuously differentiable. A region G in the (x, t) plane has a piecewise smooth boundary if its boundary consists of a finite number of smooth arcs without cusps at corners. Let Ḡ denote the closure of G. A function φ(x, t) is C^(k) in Ḡ if it is C^(k) in G and all of its derivatives up to and including those of order k have a continuous extension to Ḡ.
Throughout this paper we shall be considering two real-valued functions f(x, y, z) and g(x, y, z) which are defined for all x and for all y, z with 0 ≤ y ≤ 1, 0 ≤ z ≤ 1, and which satisfy the following conditions:

(2.1)
    (a) f and g are C^(2) on the closure of their domain of definition;
    (b) f_yy ≤ 0, g_yy ≤ 0, f_zz ≥ 0, g_zz ≥ 0;
    (c) ax + b ≤ g(x, y, z) ≤ a′x + b′ for suitable constants a, a′, b, b′.

Let T > 0 be fixed for the remainder of this paper.
GAMES WITH INTEGRAL PAYOFF

The symbol R will designate a region of the (x, t) plane contained in the strip 0 ≤ t ≤ T, such that the projection of R on the t axis covers the entire interval 0 ≤ t ≤ T. We shall consider functions y(x, t) and z(x, t) defined on R and the differential equation ... ≤ W(c, t_0) ≤ P(y*, z; c, t_0). Henceforth a path corresponding to a saddle-point (y*(x, t), z*(x, t)) will be denoted by x(t), y(t), z(t). Also, whenever a = 1 we shall drop the subscript on the payoff P.
We close this section with some remarks about the assumptions on f and g. Conditions (2.1b) say that f is concave in y and convex in z, and are imposed in order to avoid introducing mixed strategies. In similar problems involving only maximization or only minimization, such concavity or convexity requirements were found to be essential to ensure that the maximum or minimum is attained. (See [2].) The requirement (2.1c) is placed on g to ensure that the solutions of (2.2) and (2.3) have a uniform bound depending only on the initial condition (c, t_0).
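Conditions (2.1) are easy to check numerically for a concrete pair. In the sketch below (modern Python; this particular f and g are assumptions chosen to satisfy the conditions, not functions from the paper), second differences confirm f_yy ≤ 0 and f_zz ≥ 0 on the strip 0 ≤ y, z ≤ 1, and g is sandwiched between two linear functions of x.

```python
# Numerical check of conditions (2.1) for an illustrative pair (f, g).
# f is concave in y and convex in z; g satisfies the linear growth bound.

def f(x, y, z):
    return x * x + 2.0 * y - y * y + z * z      # f_yy = -2, f_zz = 2

def g(x, y, z):
    return 0.5 * x + y - z                      # 0.5*x - 1 <= g <= 0.5*x + 1

h = 1e-4
pts = [(x / 4.0, y / 10.0, z / 10.0)
       for x in range(-8, 9) for y in range(1, 10) for z in range(1, 10)]

for (x, y, z) in pts:
    # central second differences approximate f_yy and f_zz
    fyy = (f(x, y + h, z) - 2 * f(x, y, z) + f(x, y - h, z)) / h ** 2
    fzz = (f(x, y, z + h) - 2 * f(x, y, z) + f(x, y, z - h)) / h ** 2
    assert fyy <= 1e-6 and fzz >= -1e-6                          # (2.1b)
    assert 0.5 * x - 1 - 1e-9 <= g(x, y, z) <= 0.5 * x + 1 + 1e-9  # (2.1c)
```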
§3. NECESSARY CONDITIONS

We now proceed to derive a set of necessary conditions which must hold along a path resulting from a saddle-point. In simple examples use of these necessary conditions, together with the sufficiency theory to follow in Sections 4 and 5, often enables one to find the complete solution to the game. (See Section 7.) Let (y*(x, t), z*(x, t)) be a saddle-point relative to ... has a unique solution x(t; x_0, t_0) through each point (x_0, t_0) of R_i, i = 1, ..., n. The nontangency condition in (3.1) ensures that for (x_1, t_1) sufficiently close to (x_0, t_0) the solution x(t; x_1, t_1) has a unique continuation across each arc, i = 1, ..., k. This fact together with (2.1c) implies the existence of a constant K such that for (x_1, t_1) sufficiently close to (x_0, t_0) the inequality

    |x(t; x_0, t_0) − x(t; x_1, t_1)| ≤ K|x_0 − x_1|

holds for t_1 ≤ t ≤ T.

The case in which x(t) follows some boundary R_i ∩ R_j over an interval of time rather than crossing as in (3.1) is also of importance, but is not treated here. Let us introduce the following quantities, which play a fundamental role in what follows:

    H(t) = f_y[x(t), y(t), z(t)] + λ(t) g_y[x(t), y(t), z(t)],

    ... = 0 if 0 < y < 1, ...
2. The functions y_i(t, u) and z_i(t, u) are both continuous in U_i; moreover, H_i vanishes at any point where y_i is discontinuous, and K_i vanishes at any point where z_i is discontinuous.

3. Let the functions w_i(t, u), defined by

(6.1)    w_i(t, u) = ∫_t^T f[x_i(t, u), y_i(t, u), z_i(t, u)] dt,    i = 1, ..., m,
be such that whenever x_i(t, u) = x_j(t, u′), then w_i(t, u) = w_j(t, u′). Then the m families x_1(t, u), ..., x_m(t, u) determine a field F over R.

PROOF. It is required to show that (5.1) and (5.2) hold. From hypotheses 1 and 2 it can easily be shown that corresponding to each i there is determined by means of x_i(t, u) a simply connected region with piecewise smooth boundary, and such that for (i, k) ≠ (j, l), R_ik ∩ R_jl = ∅. These regions, suitably renumbered, can be taken as the subregions R_i of R described in (5.1a). It also follows from the hypotheses of the theorem that for each i the functions x_i(t, u) can be inverted to give functions u = u_i(x, t), continuous on R̄_i and C^(2) in each R_ik. For (x, t) in R_ik define:
(6.2)
    y_ik(x, t) = y_i(t, u_i(x, t)),
    z_ik(x, t) = z_i(t, u_i(x, t)),
    λ_ik(x, t) = λ_i(t, u_i(x, t)).

These functions clearly satisfy (4.1a) and (4.1b) on R_ik. Since x_i(t, u), y_i(t, u), z_i(t, u) is an extremal, and, by (2.1b), f + λ_ik g is concave in y and convex in z, it follows that (4.1c) also holds on R_ik. From the definition (3.3) of λ_i and the relation

    λ_it = λ_ikx x_it + λ_ikt = λ_ikx g(x_i, y_i, z_i) + λ_ikt = λ_ikx g(x_i, y_ik, z_ik) + λ_ikt

it follows that (4.1d) holds. Hence (5.1a) is verified. To show that (5.1b) holds, we proceed as follows. Define

    y*(x, t) = y_ik(x, t),    z*(x, t) = z_ik(x, t)

for i = 1, ..., m; k = 1, ..., k(i), if (x, t) belongs to just one region R_ik. If (x, t) belongs to more than one region, then we choose the first such region according to the lexicographic ordering of (i, k) to define y* and z*. With this definition, y* and z* satisfy (2.4a). For the purpose of checking that (2.4c) and (2.4d) hold, it suffices to show that for any i, k, u, the curve x_i(t, u) is not tangent to the curve X(u) = x_i(t_ik(u), u). This is immediate, for tangency at a point would imply the existence of a constant p such that

    dX = x_it dt_ik + x_iu du = p x_it dt,    dt_ik = p dt,

whence x_iu du = 0. But du ≠ 0 since t_ik(u) is C^(2), and x_iu is bounded away from zero; thus the curves cannot be tangent.

We now proceed to verify (5.2). Define for (x, t) ∈ R̄_ik

(6.3)    W(x, t) = w_i(t, u_i(x, t)).
From hypothesis 3 it is clear that W(x, t) is continuous on R and is C^(2) on each R_ik. It is evident from (6.1) and (6.3) that W(x, T) = 0. Thus (5.2a) and (5.2b) are established. On each R_ik we have, by (6.3),

    W(x_i(t, u), t) = w_i(t, u),    u = u_i(x, t),

whence
V t + wt = wi t But, by (6.1 ),
w it = - f
and
x^ = g,
•
so we have
W t (x, t) * - f(x, y*, z*) - Wx (x, t)g(x,
y*, z*)
Thus to establish (5.2c) it is required to show that W_x = λ_ik on each R_ik; that is (see (6.2)), it suffices to show that

(6.4)    λ_i(t, u) = (w_iu u_ix)(x_i(t, u), t) .
Differentiating (6.1) yields

(6.5)    w_iu = ∫_t^T (∂f/∂u) dt + Σ_k Δ_ik(f) (dt_ik/du) ,

where

∂f/∂u = f_x x_iu + f_y y_iu + f_z z_iu ,

Δ_ik(f) = f[x(t_ik, u), y⁻(t_ik, u), z⁻(t_ik, u)] - f[x(t_ik, u), y⁺(t_ik, u), z⁺(t_ik, u)] ,

the sum in (6.5) extending over those curves t_ik(u) lying between t and T, and the + and - superscripts indicating right- and left-hand limits as (t, u) → (t_ik(u), u) from the interior of U_i,k+1 and U_ik, respectively.
Upon differentiating the right-hand side of (6.4) with respect to t, we get, with the help of (6.5),

(d/dt)(w_iu u_ix)(x, t) = -(∂f/∂u) u_ix + w_iu u_ixt .

But u_ix = (x_iu)⁻¹, and so

(∂f/∂u) u_ix = f_x + (f_y y_iu + f_z z_iu) u_ix .
Hence, by (6.2),

(6.6)    (∂f/∂u) u_ix = f_x + f_y (∂y_ik/∂x) + f_z (∂z_ik/∂x) ≡ ∂f/∂x .

Furthermore,

w_iu u_ixt = -w_iu x_iut / (x_iu)² = -(∂g/∂u) u_ix (w_iu u_ix) ,

where

∂g/∂u = g_x x_iu + g_y y_iu + g_z z_iu .

Thus, writing

(6.7)    ∂g/∂x ≡ g_x + g_y (∂y_ik/∂x) + g_z (∂z_ik/∂x) ,

we have

(d/dt)(w_iu u_ix) = -(∂f/∂x) - (∂g/∂x)(w_iu u_ix) .

By using (4.2) one can write the differential equation (3.3) for λ as follows:

dλ/dt = -(∂f/∂x) - (∂g/∂x) λ ,

where ∂f/∂x and ∂g/∂x are as defined in (6.6) and (6.7). Thus for each i, λ_i(t, u) and w_iu u_ix satisfy the same differential equation in t at all interior points of the regions U_ik. Furthermore, since w_iu(T, u) = 0 and λ_i(T, u) = 0, it follows that to complete the proof of (6.4) it is necessary to show that w_iu u_ix is continuous across the curves t_ik(u). If we drop the subscript i and use the superscripts + and - as explained above, then for each k, k = 1, ..., k(i) - 1, it is required to show that

(w_u u_x)⁻ - (w_u u_x)⁺ = 0 .

But

(6.8)    (w_u u_x)⁻ - (w_u u_x)⁺ = (w_u⁻ - w_u⁺) u_x⁻ + w_u⁺ (u_x⁻ - u_x⁺) = u_x⁻ { (w_u⁻ - w_u⁺) + w_u⁺ u_x⁺ (x_u⁺ - x_u⁻) } .
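The uniqueness argument above (two solutions of one linear equation with the same terminal value coincide) can be checked numerically. The following is a minimal sketch, not part of the original argument, using a smooth one-family instance of our own choosing (f = x², g ≡ 1, a fixed control, so there are no switching curves); it compares w_iu u_ix with λ by quadrature and a finite difference:

```python
# Numerical sanity check of (6.4): w_iu * u_ix equals the multiplier λ,
# since both satisfy the same linear ODE in t with the same value at t = T.
# Hypothetical smooth instance (our choice, not the paper's example):
#   f(x, y, z) = x**2,  g ≡ 1,  extremal through x(T) = u:  x(t, u) = u + (t - T).
T = 1.0

def integral(f, a, b, n=20000):
    # midpoint rule
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def x(t, u):
    return u + (t - T)

def w(t, u):
    # w(t, u) = integral of f from t to T along the extremal (cf. (6.1))
    return integral(lambda s: x(s, u) ** 2, t, T)

def lam(t, u):
    # dλ/dt = -(f_x + λ g_x) = -2x,  λ(T) = 0  =>  λ(t) = ∫_t^T 2 x ds
    return integral(lambda s: 2.0 * x(s, u), t, T)

u0, t0, h = 0.3, 0.2, 1e-5
w_u = (w(t0, u0 + h) - w(t0, u0 - h)) / (2.0 * h)  # ∂w/∂u by central difference
u_x = 1.0                                           # here x_u = 1, so u_x = 1/x_u = 1
print(w_u * u_x, lam(t0, u0))  # the two sides of (6.4); they agree
```

Since the family has no corners, the boundary terms of (6.5) vanish and the agreement is exact up to quadrature error.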
Clearly, (6.4) is valid in the region U_ik adjacent to t = T; so, proceeding inductively, we may assume that w_u⁺ u_x⁺ = λ. It follows from (6.5) that

w_u⁻ - w_u⁺ = (dt_k/du) [ f(x, y⁻, z⁻) - f(x, y⁺, z⁺) ] .

Writing x as the integral of g, and then differentiating with respect to u, gives

x_u⁻ - x_u⁺ = -(dt_k/du) [ g(x, y⁻, z⁻) - g(x, y⁺, z⁺) ] .

Thus the right-hand member of (6.8) can be written as

u_x⁻ (dt_k/du) { [ f(x, y⁻, z⁻) - f(x, y⁺, z⁺) ] + λ [ g(x, y⁻, z⁻) - g(x, y⁺, z⁺) ] } ,

and to complete the proof we must show that the term in curly braces is zero.

If y⁺ = y⁻ and z⁺ = z⁻, then there is no problem. Suppose y⁻ ≠ y⁺ but z⁻ = z⁺ = z. Then by hypothesis 2(d), H = f_y + λ g_y = 0. It follows from the concavity of Φ(y, z) = f(x, y, z) + λ g(x, y, z) in y that both y⁺ and y⁻ maximize Φ for the given z and x in question. Hence

f(x, y⁺, z) - f(x, y⁻, z) = -λ [ g(x, y⁺, z) - g(x, y⁻, z) ] ,

which, since z = z⁻ = z⁺, says that the term in curly braces is zero. Suppose then that y⁻ ≠ y⁺ and z⁻ ≠ z⁺. Then H = K = 0, and (y⁻, z⁻) and (y⁺, z⁺) are both saddle-points for the game over the square with payoff Φ(y, z). Hence Φ(y⁻, z⁻) = Φ(y⁺, z⁺), and

f(x, y⁺, z⁺) - f(x, y⁻, z⁻) = -λ [ g(x, y⁺, z⁺) - g(x, y⁻, z⁻) ] ,

as was to be proved.

§7.
AN EXAMPLE
We shall consider the game for which

f(x, y, z) = x² ,    g(x, y, z) = 2y - z/2 - 1 ,    x(0) = c .
Let us make some preliminary observations. Intuitively, it is desirable for Player I to make x either very large or very small if possible. Further, given sufficient time he can reach either of these goals, as is shown by the following table, which gives the values of dx/dt for the extreme values of y and z:

            z = 0      z = 1
  y = 0      -1        -3/2
  y = 1      +1        +1/2

This suggests that a good choice for I may be either

y_0(x, t) = y_0(t) = 0    or    y_1(x, t) = y_1(t) = 1 .

Moreover, if c > 0 and I chooses y_1, II should clearly choose z_1 = 1. Similarly, if c < 0 and I chooses y_0, II should choose z_0 = 0. We have
(7.1)
    P(y_0, z_0) = ∫_0^T (c - t)² dt = c²T - cT² + T³/3 ,

    P(y_0, z_1) = ∫_0^T (c - (3/2)t)² dt = c²T - (3/2)cT² + (3/4)T³ ,

    P(y_1, z_0) = ∫_0^T (c + t)² dt = c²T + cT² + T³/3 ,

    P(y_1, z_1) = ∫_0^T (c + (1/2)t)² dt = c²T + (1/2)cT² + T³/12 .

Clearly,

    P(y_0, z_0) > P(y_1, z_1)  if  T > 6c ,
    P(y_0, z_0) < P(y_1, z_1)  if  T < 6c .
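The algebra in (7.1) and the comparison that follows can be checked exactly with rational arithmetic; the sample values of c and T below, one on each side of T = 6c, are our own:

```python
from fractions import Fraction as F

# Exact check of the payoffs (7.1). For constant (y, z) the rate
# g = 2y - z/2 - 1 is constant, so x(t) = c + g*t and
# P = integral of (c + g t)^2 over [0, T] = c^2 T + g c T^2 + g^2 T^3 / 3.
def P(y, z, c, T):
    g = 2 * F(y) - F(z, 2) - 1
    return c * c * T + g * c * T * T + g * g * T ** 3 / 3

c = F(1)
for T in (F(7), F(5)):   # T > 6c, then T < 6c
    print(T > 6 * c, P(0, 0, c, T) > P(1, 1, c, T))  # the comparisons agree
```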
Next, by (3.2) and (3.3),

H = f_y + λ g_y = 2λ ,    K = f_z + λ g_z = -(1/2)λ ,

dλ/dt = -(f_x + λ g_x) = -2x ,    λ(T) = 0 .
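These necessary conditions can be integrated backward from t = T. The following small Euler sketch, not from the paper (u and the step size are our own choices), confirms that with y = z = 1 held, λ stays positive below T and first vanishes at t_1 = T - 4u:

```python
# Backward Euler integration of the extremal equations with y = z = 1
# (dx/dt = 1/2, dλ/dt = -2x) from x(T) = u, λ(T) = 0, for a sample u > 0.
T, u, dt = 1.0, 0.1, 1e-5
x, lam, t = u, 0.0, T
while t > 0.0:
    x -= 0.5 * dt          # x(t - dt) = x(t) - (dx/dt) dt
    lam += 2.0 * x * dt    # λ(t - dt) = λ(t) - (dλ/dt) dt = λ(t) + 2x dt
    t -= dt
    if lam <= 0.0:
        break
print(t, T - 4.0 * u)      # λ first vanishes near t_1 = T - 4u
```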
Set x(T) = u. For u > 0, dλ/dt < 0 at t = T, and hence λ > 0 in some interval t_1 < t < T. Then H > 0, K < 0 on (t_1, T), and so y(t, u) = 1, z(t, u) = 1 and dx(t, u)/dt = 1/2 along the extremal passing through (u, T). Along this extremal, x(t, u) = (t - T)/2 + u. The value of t_1 is determined by λ(t_1) = 0 to be t_1 - T = -4u. At t = t_1 we have x(t_1, u) = -u. Similarly, for u < 0 we have y(t, u) = 0, z(t, u) = 0 on an interval t_2 < t < T; along this extremal x(t, u) = -(t - T) + u, the value of t_2 is t_2 - T = 2u, and x(t_2, u) = -u. We have thus defined two one-parameter families of extremals, one family corresponding to values u > 0 and the other to values u < 0. Consider the first family on the closed region U_1* defined by u > 0, T - (3u/2) ≤ t ≤ T, and the second family on the region U_2* defined by u < 0, T + (6u/5) ≤ t ≤ T. (See Figure 1.) The function x(t, u) = (t - T)/2 + u maps the interior of U_1* in a one-to-one fashion onto the region R_1* of the (x, t) plane defined by t < T and x > -(t - T)/6. (See Figure 2.) On the other hand, the function x(t, u) = -(t - T) + u maps the interior of U_2* onto the region R_2*, which is defined by t < T and x < -(t - T)/6. Denoting functions from the first family by the subscript 1 and those from the second by 2, we have, for arbitrary t_0 < T,
x_1(t_0, -(2/3)(t_0 - T)) = x_2(t_0, (5/6)(t_0 - T))

and

∫_{t_0}^T [ (t - T)/2 - (2/3)(t_0 - T) ]² dt = ∫_{t_0}^T [ -(t - T) + (5/6)(t_0 - T) ]² dt ;

that is,

w_1(t_0, -(2/3)(t_0 - T)) = w_2(t_0, (5/6)(t_0 - T)) .
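That the two families fit together along the common boundary x = -(t - T)/6 can be confirmed numerically; T and the sample time t_0 below are our own choices:

```python
# Check that along the boundary x = -(t - T)/6 the positions of the two
# families agree and the payoffs-to-go w_1, w_2 agree.
T, t0 = 1.0, 0.25
u1 = -2.0 / 3.0 * (t0 - T)   # family-1 parameter whose extremal meets the boundary at t0
u2 = 5.0 / 6.0 * (t0 - T)    # family-2 parameter whose extremal meets the boundary at t0

def x1(t):                   # y = z = 1 family: dx/dt = 1/2
    return (t - T) / 2.0 + u1

def x2(t):                   # y = z = 0 family: dx/dt = -1
    return -(t - T) + u2

def w(x, n=20000):           # w = integral of x(t)^2 over [t0, T], midpoint rule
    h = (T - t0) / n
    return h * sum(x(t0 + (i + 0.5) * h) ** 2 for i in range(n))

print(x1(t0) - x2(t0))       # 0: both sit on x = -(t0 - T)/6
print(w(x1) - w(x2))         # ~0: each integral equals 7 (T - t0)^3 / 36
```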
Thus we have a field over the half plane t < T with two subregions R_1* and R_2*. To place the example completely in the context of the theory we restrict U_1* and U_2* to nonnegative t values and bounded u values in order to obtain regions U_1 and U_2, respectively, and thereby obtain in turn regions R_1 and R_2 which are restrictions of R_1* and R_2*, respectively, and lie in the strip 0 ≤ t ≤ T of the (x, t) plane. The saddle-point is the pair of functions:

    y*(x, t) = 1  if (x, t) is in R_1 or in R̄_1 ∩ R̄_2 ,
    y*(x, t) = 0  if (x, t) is in R_2 ;

    z*(x, t) = 1  if (x, t) is in R_1 or in R̄_1 ∩ R̄_2 ,
    z*(x, t) = 0  if (x, t) is in R_2 .
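As a closing sketch, not from the paper, the saddle-point strategies reduce to a test of which side of the line x = -(t - T)/6 the state lies on. Playing them against each other from x(0) = c, with c, T, and the Euler step chosen by us and T > 6c, reproduces P(y_0, z_0) of (7.1), since such a start remains in R_2:

```python
# Simulate the synthesized saddle-point strategies for the example.
T, c, dt = 1.0, 0.05, 1e-4

def y_star(x, t):
    # 1 on R_1 and on the common boundary (region 1 is chosen first), else 0
    return 1.0 if x >= -(t - T) / 6.0 else 0.0

def z_star(x, t):
    return 1.0 if x >= -(t - T) / 6.0 else 0.0

def g(x, y, z):
    return 2.0 * y - z / 2.0 - 1.0

# Euler integration of dx/dt = g(x, y*, z*) from x(0) = c,
# accumulating the payoff P = integral of x^2.
x, P = c, 0.0
n = int(round(T / dt))
for k in range(n):
    t = k * dt
    y, z = y_star(x, t), z_star(x, t)
    P += x * x * dt
    x += g(x, y, z) * dt
print(P)  # close to c^2 T - c T^2 + T^3/3, i.e., P(y_0, z_0) of (7.1)
```

For this small c the state never crosses the switching line, so both players hold y = z = 0 throughout and the Euler sum approaches c²T - cT² + T³/3.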
FIGURE 1

FIGURE 2. Dashed lines are extremals.
L. D. Berkovitz
W. H. Fleming
The RAND Corporation