English Pages 1008 Year 2013
more information - www.cambridge.org/9781107005488
Game Theory

Covering both noncooperative and cooperative games, this comprehensive introduction to game theory also includes some advanced chapters on auctions, games with incomplete information, games with vector payoffs, stable matchings, and the bargaining set. Mathematically oriented, the book presents every theorem alongside a proof. The material is presented clearly and every concept is illustrated with concrete examples from a broad range of disciplines. With numerous exercises the book is a thorough and extensive guide to game theory from undergraduate through graduate courses in economics, mathematics, computer science, engineering, and life sciences to being an authoritative reference for researchers.

Michael Maschler was a professor in the Einstein Institute of Mathematics and the Center for the Study of Rationality at the Hebrew University of Jerusalem in Israel. He greatly contributed to cooperative game theory and to repeated games with incomplete information.

Eilon Solan is a professor in the School of Mathematical Sciences at Tel Aviv University in Israel. The main topic of his research is repeated games. He serves on the editorial board of several academic journals.

Shmuel Zamir is a professor emeritus in the Department of Statistics and the Center for the Study of Rationality at the Hebrew University of Jerusalem in Israel. The main topics of his research are games with incomplete information and auction theory. He is the editor-in-chief of the International Journal of Game Theory.
Game Theory
MICHAEL MASCHLER EILON SOLAN SHMUEL ZAMIR Translated from Hebrew by Ziv Hellman English Editor Mike Borns
Cambridge University Press
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo, Delhi, Mexico City

Cambridge University Press
The Edinburgh Building, Cambridge CB2 8RU, UK

Published in the United States of America by Cambridge University Press, New York

www.cambridge.org
Information on this title: www.cambridge.org/9781107005488

© The Estate of the late Michael Maschler, Eilon Solan and Shmuel Zamir 2013

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2013
Printed in the United Kingdom at the University Press, Cambridge

A catalog record for this publication is available from the British Library

Library of Congress Cataloging in Publication data
Zamir, Shmuel. [Torat ha-mishakim. English]
Game theory / Michael Maschler, Eilon Solan, Shmuel Zamir ; translated from Hebrew by Ziv Hellman ; English editor, Mike Borns.
pages cm
Translation of: Torat ha-mishakim / Shemu’el Zamir, Mikha’el Mashler ve-Elon Solan.
Includes bibliographical references and index.
ISBN 978-1-107-00548-8 (hardback)
1. Game theory. I. Maschler, Michael, 1927–2008. II. Solan, Eilon. III. Title.
QA269.Z3613 2013
519.3 – dc23 2012050827

ISBN 978-1-107-00548-8 Hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
To Michael Maschler
Contents

Acknowledgments
Notations
Introduction

1 The game of chess
  1.1 Schematic description of the game
  1.2 Analysis and results
  1.3 Remarks
  1.4 Exercises

2 Utility theory
  2.1 Preference relations and their representation
  2.2 Preference relations over uncertain outcomes: the model
  2.3 The axioms of utility theory
  2.4 The characterization theorem for utility functions
  2.5 Utility functions and affine transformations
  2.6 Infinite outcome set
  2.7 Attitude towards risk
  2.8 Subjective probability
  2.9 Discussion
  2.10 Remarks
  2.11 Exercises

3 Extensive-form games
  3.1 An example
  3.2 Graphs and trees
  3.3 Game trees
  3.4 Chomp: David Gale’s game
  3.5 Games with chance moves
  3.6 Games with imperfect information
  3.7 Exercises

4 Strategic-form games
  4.1 Examples and definition of strategic-form games
  4.2 The relationship between the extensive form and the strategic form
  4.3 Strategic-form games: solution concepts
  4.4 Notation
  4.5 Domination
  4.6 Second-price auctions
  4.7 The order of elimination of dominated strategies
  4.8 Stability: Nash equilibrium
  4.9 Properties of the Nash equilibrium
  4.10 Security: the maxmin concept
  4.11 The effect of elimination of dominated strategies
  4.12 Two-player zero-sum games
  4.13 Games with perfect information
  4.14 Games on the unit square
  4.15 Remarks
  4.16 Exercises

5 Mixed strategies
  5.1 The mixed extension of a strategic-form game
  5.2 Computing equilibria in mixed strategies
  5.3 The proof of Nash’s Theorem
  5.4 Generalizing Nash’s Theorem
  5.5 Utility theory and mixed strategies
  5.6 The maxmin and the minmax in n-player games
  5.7 Imperfect information: the value of information
  5.8 Evolutionarily stable strategies
  5.9 Remarks
  5.10 Exercises

6 Behavior strategies and Kuhn’s Theorem
  6.1 Behavior strategies
  6.2 Kuhn’s Theorem
  6.3 Equilibria in behavior strategies
  6.4 Kuhn’s Theorem for infinite games
  6.5 Remarks
  6.6 Exercises

7 Equilibrium refinements
  7.1 Subgame perfect equilibrium
  7.2 Rationality, backward induction, and forward induction
  7.3 Perfect equilibrium
  7.4 Sequential equilibrium
  7.5 Remarks
  7.6 Exercises

8 Correlated equilibria
  8.1 Examples
  8.2 Definition and properties of correlated equilibrium
  8.3 Remarks
  8.4 Exercises

9 Games with incomplete information and common priors
  9.1 The Aumann model of incomplete information and the concept of knowledge
  9.2 The Aumann model of incomplete information with beliefs
  9.3 An infinite set of states of the world
  9.4 The Harsanyi model of games with incomplete information
  9.5 Incomplete information as a possible interpretation of mixed strategies
  9.6 The common prior assumption: inconsistent beliefs
  9.7 Remarks
  9.8 Exercises

10 Games with incomplete information: the general model
  10.1 Belief spaces
  10.2 Belief and knowledge
  10.3 Examples of belief spaces
  10.4 Belief subspaces
  10.5 Games with incomplete information
  10.6 The concept of consistency
  10.7 Remarks
  10.8 Exercises

11 The universal belief space
  11.1 Belief hierarchies
  11.2 Types
  11.3 Definition of the universal belief space
  11.4 Remarks
  11.5 Exercises

12 Auctions
  12.1 Notation
  12.2 Common auction methods
  12.3 Definition of a sealed-bid auction with private values
  12.4 Equilibrium
  12.5 The symmetric model with independent private values
  12.6 The Envelope Theorem
  12.7 Risk aversion
  12.8 Mechanism design
  12.9 Individually rational mechanisms
  12.10 Finding the optimal mechanism
  12.11 Remarks
  12.12 Exercises

13 Repeated games
  13.1 The model
  13.2 Examples
  13.3 The T-stage repeated game
  13.4 Characterization of the set of equilibrium payoffs of the T-stage repeated game
  13.5 Infinitely repeated games
  13.6 The discounted game
  13.7 Uniform equilibrium
  13.8 Discussion
  13.9 Remarks
  13.10 Exercises

14 Repeated games with vector payoffs
  14.1 Notation
  14.2 The model
  14.3 Examples
  14.4 Connections between approachable and excludable sets
  14.5 A geometric condition for the approachability of a set
  14.6 Characterizations of convex approachable sets
  14.7 Application 1: Repeated games with incomplete information
  14.8 Application 2: Challenge the expert
  14.9 Discussion
  14.10 Remarks
  14.11 Exercises

15 Bargaining games
  15.1 Notation
  15.2 The model
  15.3 Properties of the Nash solution
  15.4 Existence and uniqueness of the Nash solution
  15.5 Another characterization of the Nash solution
  15.6 The minimality of the properties of the Nash solution
  15.7 Critiques of the properties of the Nash solution
  15.8 Monotonicity properties
  15.9 Bargaining games with more than two players
  15.10 Remarks
  15.11 Exercises

16 Coalitional games with transferable utility
  16.1 Examples
  16.2 Strategic equivalence
  16.3 A game as a vector in a Euclidean space
  16.4 Special families of games
  16.5 Solution concepts
  16.6 Geometric representation of the set of imputations
  16.7 Remarks
  16.8 Exercises

17 The core
  17.1 Definition of the core
  17.2 Balanced collections of coalitions
  17.3 The Bondareva–Shapley Theorem
  17.4 Market games
  17.5 Additive games
  17.6 The consistency property of the core
  17.7 Convex games
  17.8 Spanning tree games
  17.9 Flow games
  17.10 The core for general coalitional structures
  17.11 Remarks
  17.12 Exercises

18 The Shapley value
  18.1 The Shapley properties
  18.2 Solutions satisfying some of the Shapley properties
  18.3 The definition and characterization of the Shapley value
  18.4 Examples
  18.5 An alternative characterization of the Shapley value
  18.6 Application: the Shapley–Shubik power index
  18.7 Convex games
  18.8 The consistency of the Shapley value
  18.9 Remarks
  18.10 Exercises

19 The bargaining set
  19.1 Definition of the bargaining set
  19.2 The bargaining set in two-player games
  19.3 The bargaining set in three-player games
  19.4 The bargaining set in convex games
  19.5 Discussion
  19.6 Remarks
  19.7 Exercises

20 The nucleolus
  20.1 Definition of the nucleolus
  20.2 Nonemptiness and uniqueness of the nucleolus
  20.3 Properties of the nucleolus
  20.4 Computing the nucleolus
  20.5 Characterizing the prenucleolus
  20.6 The consistency of the nucleolus
  20.7 Weighted majority games
  20.8 The bankruptcy problem
  20.9 Discussion
  20.10 Remarks
  20.11 Exercises

21 Social choice
  21.1 Social welfare functions
  21.2 Social choice functions
  21.3 Non-manipulability
  21.4 Discussion
  21.5 Remarks
  21.6 Exercises

22 Stable matching
  22.1 The model
  22.2 Existence of stable matching: the men’s courtship algorithm
  22.3 The women’s courtship algorithm
  22.4 Comparing matchings
  22.5 Extensions
  22.6 Remarks
  22.7 Exercises

23 Appendices
  23.1 Fixed point theorems
  23.2 The Separating Hyperplane Theorem
  23.3 Linear programming
  23.4 Remarks
  23.5 Exercises

References
Index
Acknowledgments
A great many people helped in the composition of the book and we are grateful to all of them. We thank Ziv Hellman, the devoted translator of the book. When he undertook this project he did not know that it would take up so much of his time. Nevertheless, he implemented all our requests with patience. We also thank Mike Borns, the English editor, who efficiently read through the text and brought it to its present state. We thank Ehud Lehrer who contributed exercises and answered questions that we had while writing the book, Uzi Motro who commented on the section on evolutionarily stable strategies, Dov Samet who commented on several chapters and contributed exercises, Tzachi Gilboa, Sergiu Hart, Aviad Heifetz, Bo’az Klartag, Vijay Krishna, Rida Laraki, Nimrod Megiddo, Abraham Neyman, Guni Orshan, Bezalel Peleg, David Schmeidler, Rann Smorodinsky, Peter Sudhölter, Yair Tauman, Rakesh Vohra, and Peter Wakker who answered our questions, and the many friends and students who read portions of the text, suggested improvements and exercises and spotted mistakes, including Alon Amit, Itai Arieli, Galit Ashkenazi-Golan, Yaron Azrieli, Shani Bar-Gera, Asaf Cohen, Ronen Eldan, Gadi Fibich, Tal Galili, Yuval Heller, John Levy, Maya Liran, C Maor, Ayala Mashiach-Yaakovi, Noa Nitzan, Gilad Pagi, Dori Reuveni, Eran Shmaya, Erez Sheiner, Omri Solan, Ron Solan, Roee Teper, Zorit Varmaz, and Saar Zilberman. Finally, we thank the Center for the Study of Rationality at the Hebrew University of Jerusalem and Hana Shemesh for the assistance they provided from the beginning of this project.
Notations
The book makes use of a large number of notations; we have striven to stick to accepted notation and to be consistent throughout the book. The coordinates of a vector are always denoted by a subscript index, $x = (x_i)_{i=1}^n$, while the indices of the elements of sequences are always denoted by a superscript index, $x^1, x^2, \ldots$ The index of a player in a set of players is always denoted by a subscript index, while a time index (in repeated games) is always denoted by a superscript index. The end of the proof of a theorem, the end of an example (◭), and the end of a remark are each indicated by a dedicated symbol. For convenience we provide a list of the mathematical notation used throughout the book, accompanied by a short explanation and the pages on which they are formally defined. The notations that appear below are those that are used more than once.

$0$ : chance move in an extensive-form game (p. 50)
$\mathbf{0}$ : origin of a Euclidean space (p. 570)
$\emptyset$ : strategy used by a player who has no decision vertices in an extensive-form game (p. 5)
$\mathbf{1}_A$ : function that is equal to 1 on event $A$ and to 0 otherwise (p. 595)
$2^Y$ : collection of all subsets of $Y$ (p. 325)
$|X|$ : number of elements in finite set $X$ (p. 603)
$\|x\|_\infty$ : $L_\infty$ norm, $\|x\|_\infty := \max_{i=1,2,\ldots,n} |x_i|$ (p. 531)
$\|x\|$ : $L_2$ norm of a vector, $\|x\| := \sqrt{\sum_{l=1}^{d} (x_l)^2}$ (p. 570)
$A \vee B$ : maximum matching (for men) in a matching problem (p. 895)
$A \wedge B$ : maximum matching (for women) in a matching problem (p. 896)
$A \subseteq B$ : set $A$ is contained in set $B$ or is equal to it
$A \subset B$ : set $A$ is strictly contained in set $B$
$\langle x, y \rangle$ : inner product (p. 570)
$\langle x^0, \ldots, x^k \rangle$ : $k$-dimensional simplex (p. 920)
$\succsim_i$ : preference relation of player $i$ (p. 14)
$\succ_i$ : strict preference relation of player $i$ (p. 10)
$\approx_i$ : indifference relation of player $i$ (pp. 10, 897)
$\succsim_P$ : preference relation of an individual (p. 857)
$\succ_Q$ : strict preference relation of society (p. 857)
$\approx_Q$ : indifference relation of society (p. 857)
$x \geq y$ : $x_k \geq y_k$ for each coordinate $k$, where $x, y$ are vectors in a Euclidean space (p. 625)
$x > y$ : $x \geq y$ and $x \neq y$ (p. 625)
$x \gg y$ : $x_k > y_k$ for each coordinate $k$, where $x, y$ are vectors in a Euclidean space (p. 625)
$x + y$ : sum of vectors in a Euclidean space, $(x + y)_k := x_k + y_k$ (p. 625)
$xy$ : coordinatewise product of vectors in a Euclidean space, $(xy)_k := x_k y_k$ (p. 625)
$x + S$ : $x + S := \{x + s : s \in S\}$, where $x \in \mathbb{R}^d$ and $S \subseteq \mathbb{R}^d$ (p. 625)
$xS$ : $xS := \{xs : s \in S\}$, where $x \in \mathbb{R}^d$ and $S \subseteq \mathbb{R}^d$ (p. 625)
$cx$ : product of real number $c$ and vector $x$ (p. 625)
$cS$ : $cS := \{cs : s \in S\}$, where $c$ is a real number and $S \subseteq \mathbb{R}^d$ (p. 625)
$S + T$ : sum of sets, $S + T := \{x + y : x \in S, y \in T\}$ (p. 625)
$\lceil c \rceil$ : smallest integer greater than or equal to $c$ (p. 534)
$\lfloor c \rfloor$ : largest integer less than or equal to $c$ (p. 534)
$x^\top$ : transpose of a vector, column vector that corresponds to row vector $x$ (p. 571)

$\operatorname{argmax}_{x \in X} f(x)$ : set of all $x$ where function $f$ attains its maximum in the set $X$ (pp. 125, 625)
$a(i)$ : producer $i$'s initial endowment in a market (p. 703)
$A$ : set of actions in a decision problem with experts (p. 601)
$A$ : set of alternatives (p. 856)
$A_i$ : player $i$'s action set in an extensive-form game, $A_i := \cup_{j=1}^{k_i} A(U_i^j)$ (p. 221)
$A_k$ : possible outcome of a game (p. 13)
$A(x)$ : set of available actions at vertex $x$ in an extensive-form game (p. 44)
$A(U_i)$ : set of available actions at information set $U_i$ of player $i$ in an extensive-form game (p. 54)
$b_i$ : buyer $i$'s bid in an auction (pp. 91, 466)
$b(S)$ : $b(S) = \sum_{i \in S} b_i$, where $b \in \mathbb{R}^N$ (p. 669)
$br_I(y)$ : Player I's set of best replies to strategy $y$ (p. 125)
$br_{II}(x)$ : Player II's set of best replies to strategy $x$ (p. 125)
$B_i$ : player $i$'s belief operator (p. 392)
$B_i^p$ : set of states of the world in which the probability that player $i$ ascribes to event $E$ is at least $p$, $B_i^p(E) := \{\omega \in Y : \pi_i(E \mid \omega) \geq p\}$ (p. 426)
$BZ_i(N; v)$ : Banzhaf value of a coalitional game (p. 780)
$\mathcal{B}$ : coalitional structure (p. 673)
$B_i^T$ : set of behavior strategies of player $i$ in a $T$-repeated game (p. 525)
$B_i^\infty$ : set of behavior strategies of player $i$ in an infinitely repeated game (p. 538)
$c$ : coalitional function of a cost game (p. 661)
$c_+$ : maximum of $c$ and 0 (p. 840)
$c_i$ : $c_i(v_i) := v_i - \frac{1 - F_i(v_i)}{f_i(v_i)}$ (p. 501)
$C$ : function that dictates the amount that each buyer pays given the vector of bids in an auction (p. 466)
$C(x)$ : set of children of vertex $x$ in an extensive-form game (p. 5)
$\mathcal{C}(N, v)$ : core of a coalitional game (p. 687)
$\mathcal{C}(N, v; \mathcal{B})$ : core for a coalitional structure (p. 732)
$\operatorname{conv}\{x_1, \ldots, x_K\}$ : smallest convex set that contains the vectors $\{x_1, \ldots, x_K\}$; also called the convex hull of $\{x_1, \ldots, x_K\}$ (pp. 530, 625, 917)

$d$ : disagreement point of a bargaining game (p. 625)
$d_i$ : debt to creditor $i$ in a bankruptcy problem (p. 833)
$d^t$ : distance between average payoff and target set (p. 581)
$d(x, y)$ : Euclidean distance between two vectors in Euclidean space (p. 571)
$d(x, S)$ : Euclidean distance between point and set (p. 571)
$D(\alpha, x)$ : collection of coalitions whose excess is at least $\alpha$, $D(\alpha, x) := \{S \subseteq N, S \neq \emptyset : e(S, x) \geq \alpha\}$ (p. 818)

$e(S, x)$ : excess of coalition $S$, $e(S, x) := v(S) - x(S)$ (p. 802)
$E$ : set of edges in a graph (pp. 41, 43)
$E$ : estate of bankrupt entity in a bankruptcy problem (p. 833)
$E$ : set of experts in a decision problem with experts (p. 601)

$F$ : set of feasible payoffs in a repeated game (pp. 530, 578)
$F$ : social welfare function (p. 857)
$F_i$ : cumulative distribution function of buyer $i$'s private values in an auction (p. 466)
$F_i(\omega)$ : atom of the partition $\mathcal{F}_i$ that contains $\omega$ (p. 324)
$F^N$ : cumulative distribution function of joint distribution of vector of private values in an auction (p. 466)
$\mathcal{F}$ : collection of all subgames in the game of chess (p. 5)
$\mathcal{F}$ : family of bargaining games (p. 625)
$\mathcal{F}^N$ : family of bargaining games with set of players $N$ (p. 650)
$\mathcal{F}_d$ : family of bargaining games in $\mathcal{F}$ where the set of alternatives is comprehensive and all alternatives are at least as good as the disagreement point, which is $(0, 0)$ (p. 644)
$\mathcal{F}_i$ : player $i$'s information in an Aumann model of incomplete information (p. 323)

$g^T$ : average payoff up to stage $T$ (inclusive) in a repeated game (p. 572)
$G$ : graph (p. 41)
$G$ : social choice function (p. 865)

$h$ : history of a repeated game (p. 525)
$h^t$ : history at stage $t$ of a repeated game (p. 602)
$H(t)$ : set of $t$-stage histories of a repeated game (pp. 525, 601)
$H(\infty)$ : set of plays in an infinitely repeated game (p. 538)
$H(\alpha, \beta)$ : hyperplane, $H(\alpha, \beta) := \{x \in \mathbb{R}^d : \langle \alpha, x \rangle = \beta\}$ (pp. 577, 943)
$H^+(\alpha, \beta)$ : half-space, $H^+(\alpha, \beta) := \{x \in \mathbb{R}^d : \langle \alpha, x \rangle \geq \beta\}$ (pp. 577, 943)
$H^-(\alpha, \beta)$ : half-space, $H^-(\alpha, \beta) := \{x \in \mathbb{R}^d : \langle \alpha, x \rangle \leq \beta\}$ (pp. 577, 943)

$i$ : player
$-i$ : set of all players except player $i$
$I$ : function that dictates the winner of an auction given the vector of bids (p. 466)

$J$ : number of lotteries that compose a compound lottery (p. 14)
$J(x)$ : player who chooses a move at vertex $x$ of an extensive-form game (p. 44)

$-k$ : player who is not $k$ in a two-player game (p. 571)
$k_i$ : number of information sets of player $i$ in an extensive-form game (p. 54)
$K$ : number of outcomes of a game (p. 16)
$K_i$ : player $i$'s knowledge operator (p. 325)
$KS$, $KS(S)$ : Kalai–Smorodinsky solution to bargaining games (p. 648)

$L$ : lottery, $L = [p_1(A_1), p_2(A_2), \ldots, p_K(A_K)]$ (p. 13)
$L$ : number of commodities in a market (p. 703)
$\hat{L}$ : compound lottery, $\hat{L} = [q_1(L_1), \ldots, q_J(L_J)]$ (p. 14)
$\mathcal{L}$ : set of lotteries (p. 13)
$\hat{\mathcal{L}}$ : set of compound lotteries (p. 15)

$m(\varepsilon)$ : minimal coordinate of vector $\varepsilon$ (pp. 264, 268)
$m_i$ : number of pure strategies of player $i$ (p. 147)
$m_i(S)$ : highest possible payoff to player $i$ in a bargaining game (p. 643)
$M$ : maximal absolute value of a payoff in a game (p. 521)
$M_{m,l}$ : space of matrices of dimension $m \times l$ (p. 204)
$M(\varepsilon)$ : maximal coordinate of vector $\varepsilon$ (pp. 264, 268)
$\mathcal{M}(N; v; \mathcal{B})$ : bargaining set for coalitional structure $\mathcal{B}$ (p. 786)

$n$ : number of players (p. 77)
$n$ : number of buyers in an auction (p. 466)
$n_x$ : number of vertices in subgame $\Gamma(x)$ (p. 4)
$N$ : set of players (pp. 43, 833, 660)
$N$ : set of buyers in an auction (p. 466)
$N$ : set of individuals (p. 856)
$N$ : set of producers in a market (p. 703)
$\mathbb{N}$ : set of natural numbers, $\mathbb{N} := \{1, 2, 3, \ldots\}$
$\mathcal{N}$ : $\mathcal{N}(S, d)$, Nash's solution to bargaining games (p. 630)
$\mathcal{N}(N; v)$ : nucleolus of a coalitional game (p. 805)
$\mathcal{N}(N; v; \mathcal{B})$ : nucleolus of a coalitional game for coalitional structure $\mathcal{B}$ (p. 805)
$\mathcal{N}(N; v; K)$ : nucleolus relative to set $K$ (p. 804)

$O$ : set of outcomes (pp. 13, 43)

$p$ : common prior in a Harsanyi game with incomplete information (p. 347)
$p_k$ : probability that the outcome of lottery $L$ is $A_k$ (p. 13)
$p_x$ : probability distribution over actions at chance move $x$ (p. 50)
$P$ : binary relation (p. 857)
$P$ : set of all weakly balancing weights for collection $\mathcal{D}^*$ of all coalitions (p. 701)
$P$ : common prior in an Aumann model of incomplete information (p. 334)
$P_\sigma(x)$ : probability that the play reaches vertex $x$ when the players implement strategy vector $\sigma$ in an extensive-form game (p. 254)
$P_\sigma(U)$ : probability that the play reaches a vertex in information set $U$ when the players implement strategy vector $\sigma$ in an extensive-form game (p. 273)
$P^N$ : vector of preference relations (p. 857)
$PO(S)$ : set of efficient (Pareto optimal) points in $S$ (p. 627)
$PO^W(S)$ : set of weakly efficient points in $S$ (p. 627)
$\mathcal{P}(A)$ : set of all strict preference relations over a set of alternatives $A$ (p. 857)
$\mathcal{P}(N)$ : collection of nonempty subsets of $N$, $\mathcal{P}(N) := \{S \subseteq N, S \neq \emptyset\}$ (pp. 670, 701)
$\mathcal{P}^*(A)$ : set of all preference relations over a set of alternatives $A$ (p. 857)
$\mathcal{PN}(N; v)$ : prenucleolus of a coalitional game (p. 805)
$\mathcal{PN}(N; v; \mathcal{B})$ : prenucleolus of a coalitional game for coalitional structure $\mathcal{B}$ (p. 805)

$q$ : quota in a weighted majority game (p. 664)
$q(w)$ : minimal weight of a winning coalition in a weighted majority game, $q(w) := \min_{S \in W^m} w(S)$ (p. 828)
$\mathbb{Q}_{++}$ : set of positive rational numbers

$r_k$ : total probability that the result of a compound lottery is $A_k$ (p. 18)
$R_1(p)$ : set of possible payoffs when Player 1 plays mixed action $p$, $R_1(p) := \{p u q^\top : q \in \Delta(J)\}$ (p. 576)
$R_2(q)$ : set of possible payoffs when Player 2 plays mixed action $q$, $R_2(q) := \{p u q^\top : p \in \Delta(I)\}$ (p. 576)
$\mathbb{R}$ : real line
$\mathbb{R}_+$ : set of nonnegative numbers
$\mathbb{R}_{++}$ : set of positive numbers
$\mathbb{R}^n$ : $n$-dimensional Euclidean space
$\mathbb{R}^n_+$ : nonnegative orthant in an $n$-dimensional Euclidean space, $\mathbb{R}^n_+ := \{x \in \mathbb{R}^n : x_i \geq 0, \forall i = 1, 2, \ldots, n\}$
$\mathbb{R}^S$ : $|S|$-dimensional Euclidean space, where each coordinate corresponds to a player in $S$ (p. 669)
$\operatorname{range}(G)$ : range of a social choice function (p. 870)

$s$ : strategy vector (p. 45)
$s$ : function that assigns a state of nature to each state of the world (p. 323)
$s^t$ : action vector played at stage $t$ of a repeated game (p. 525)
$s_i$ : strategy of player $i$ (pp. 45, 56)
$s_t$ : state of nature that corresponds to type vector $t$ in a Harsanyi game with incomplete information (p. 347)
$s^{-1}(C)$ : set of states of the world that correspond to a state of nature in $C$, $s^{-1}(C) := \{\omega \in Y : s(\omega) \in C\}$ (p. 330)
$S$ : set of all vectors of pure strategies (p. 77)
$S$ : set of states of nature in models of incomplete information (p. 323)
$S$ : set of states of nature in a decision problem with experts (p. 601)
$S$ : set of alternatives in a bargaining game (p. 625)
$S_i$ : set of player $i$'s pure strategies (p. 77)
$\mathrm{Sh}$ : Shapley value (p. 754)
$\operatorname{supp}$ : support of a probability distribution (p. 206)
$\operatorname{supp}$ : support of a vector in $\mathbb{R}^n$ (p. 925)

$t_i$ : player $i$'s type in models of incomplete information (p. 452)
$T$ : set of vectors of types in a Harsanyi model of incomplete information (p. 347)
$T$ : number of stages in a finitely repeated game (p. 528)
$T_i$ : player $i$'s type set in a Harsanyi model of incomplete information (p. 347)

$u$ : payoff function in a strategic-form game (pp. 43, 601)
$u_i$ : player $i$'s utility function (p. 14)
$u_i$ : player $i$'s payoff function (p. 77)
$u_i$ : producer $i$'s production function in a market (p. 703)
$u_i^t$ : payoff of player $i$ at stage $t$ in a repeated game (p. 527)
$u^t$ : vector of payoffs at stage $t$ in a repeated game (p. 527)
$u(s)$ : outcome of a game under strategy vector $s$ (p. 45)
$U_i^j$ : information set of player $i$ in an extensive-form game (p. 54)
$U_i$ : mixed extension of player $i$'s payoff function (p. 147)
$U(C)$ : uniform distribution over set $C$
$U[\alpha]$ : scalar payoff function generated by projecting the payoffs in direction $\alpha$ in a game with payoff vectors (p. 588)

$v$ : value of a two-player zero-sum game (p. 114)
$v$ : coalitional function of a coalitional game (p. 660)
$\underline{v}$ : maxmin value of a two-player non-zero-sum game (p. 113)
$\overline{v}$ : minmax value of a two-player non-zero-sum game (p. 113)
$v$ : maximal private value of buyers in an auction (p. 471)
$v_0$ : root of a game tree (pp. 42, 43)
$v_i$ : buyer $i$'s private value in an auction (p. 91)
$v^*$ : superadditive closure of a coalitional game (p. 732)
$\underline{v}_i$ : player $i$'s maxmin value in a strategic-form game (pp. 103, 104, 176)
$\overline{v}_i$ : player $i$'s minmax value in a strategic-form game (pp. 177, 529)
$\operatorname{val}(A)$ : value of a two-player zero-sum game whose payoff function is given by matrix $A$ (p. 588)
$V$ : set of vertices of a graph (pp. 41, 43)
$V$ : set of individually rational payoffs in a repeated game (p. 530)
$V_0$ : set of vertices in an extensive-form game where a chance move takes place (p. 43)
$V_i$ : set of player $i$'s decision points in an extensive-form game (p. 43)
$V_i$ : random variable representing buyer $i$'s private value in an auction (p. 467)
$\mathcal{V}$ : buyer's set of possible private values in a symmetric auction (p. 471)
$\mathcal{V}_i$ : buyer $i$'s set of possible private values (p. 466)
$\mathcal{V}_N$ : set of vectors of possible private values, $\mathcal{V}_N := \mathcal{V}_1 \times \mathcal{V}_2 \times \cdots \times \mathcal{V}_n$ (p. 466)

$w_i$ : player $i$'s weight in a weighted majority game (p. 664)
$W^m$ : collection of minimal winning coalitions in a simple monotonic game (p. 826)

$x_{-i}$ : $x_{-i} := (x_j)_{j \neq i}$ (p. 85)
$x(S)$ : $x(S) := \sum_{i \in S} x_i$, where $x \in \mathbb{R}^N$ (p. 669)
$X$ : $X := \times_{i \in N} X_i$ (p. 2)
$X_k$ : space of belief hierarchies of order $k$ (p. 442)
$X_{-i}$ : $X_{-i} := \times_{j \neq i} X_j$ (p. 85)
$X(n)$ : standard $(n - 1)$-dimensional simplex, $X(n) := \{x \in \mathbb{R}^n : \sum_{i=1}^n x_i = 1, x_i \geq 0\ \forall i\}$ (p. 935)
$X(N; v)$ : set of imputations in a coalitional game, $X(N; v) := \{x \in \mathbb{R}^n : x(N) = v(N), x_i \geq v(i)\ \forall i \in N\}$ (pp. 674, 802)
$X^0(N; v)$ : set of preimputations, $X^0(N; v) := \{x \in \mathbb{R}^N : x(N) = v(N)\}$ (p. 805)
$X(\mathcal{B}; v)$ : set of imputations for coalitional structure $\mathcal{B}$, $X(\mathcal{B}; v) := \{x \in \mathbb{R}^N : x(S) = v(S)\ \forall S \in \mathcal{B}, x_i \geq v(i)\ \forall i\}$ (p. 674)
$X^0(\mathcal{B}; v)$ : set of preimputations for coalitional structure $\mathcal{B}$, $X^0(\mathcal{B}; v) := \{x \in \mathbb{R}^N : x(S) = v(S)\ \forall S \in \mathcal{B}\}$ (p. 805)

$Y$ : set of states of the world (pp. 323, 334)
$Y(\omega)$ : minimal belief subspace in state of the world $\omega$ (p. 401)
$Y_i(\omega)$ : minimal belief subspace of player $i$ in state of the world $\omega$ (p. 403)

$Z_k$ : space of coherent belief hierarchies of order $k$ (p. 445)
$Z(P, Q; R)$ : preference relation in which alternatives in $R$ are preferred to alternatives not in $R$, the preference over alternatives in $R$ is determined by $P$, and the preference over alternatives not in $R$ is determined by $Q$ (p. 866)
$Z(P^N, Q^N; R)$ : preference profile in which the preference of individual $i$ is $Z(P_i, Q_i; R)$ (p. 867)

$\beta_i$ : buyer $i$'s strategy in an auction (p. 467)
$\beta_i$ : buyer $i$'s strategy in a selling mechanism (p. 495)
$\beta_i^*$ : buyer $i$'s strategy in a direct selling mechanism in which he reports his private value (p. 495)
$\Gamma$ : extensive-form game (pp. 43, 50, 54)
$\Gamma$ : extension of a strategic-form game to mixed strategies (p. 147)
$\Gamma_T$ : $T$-stage repeated game (p. 528)
$\Gamma_\lambda$ : discounted game with discount factor $\lambda$ (p. 544)
$\Gamma_\infty$ : infinitely repeated game (p. 539)
$\Gamma(x)$ : subgame of an extensive-form game that starts at vertex $x$ (pp. 4, 45, 55)
$\Gamma^*(p)$ : extended game that includes a chance move that selects a vector of recommendations according to the probability distribution $p$ in the definition of a correlated equilibrium (p. 305)
$\Delta(S)$ : set of probability distributions over $S$ (p. 146)
$\varepsilon$ : vector of constraints in the definition of perfect equilibrium (p. 264)
$\varepsilon_i$ : vector of constraints of player $i$ in the definition of perfect equilibrium (p. 264)
$\varepsilon_i(s_i)$ : minimal probability in which player $i$ selects pure strategy $s_i$ in the definition of perfect equilibrium (p. 264)
$\theta(x)$ : vector of excesses in decreasing order (p. 802)
$\theta_i^k$ : $A_k \approx [\theta_i^k(A_K), (1 - \theta_i^k)(A_0)]$ (p. 20)
$\lambda$ : discount factor in a repeated game (p. 543)
$\lambda_\alpha$ : egalitarian solution with angle $\alpha$ of bargaining games (p. 640)
$\mu_k$ : belief hierarchy of order $k$ (p. 442)
$\chi^S$ : incidence vector of a coalition (p. 693)
$\Pi$ : belief space, $\Pi = (Y, \mathcal{F}, s, (\pi_i)_{i \in N})$ (p. 466)
$\pi_i$ : player $i$'s belief in a belief space (p. 387)
$\sigma$ : strategy in a decision problem with experts (p. 601)
$\sigma_i$ : mixed strategy of player $i$ (p. 146)
$\sigma_{-k}$ : strategy of the player who is not player $k$ in a two-player game (p. 571)
$\Sigma_i$ : set of mixed strategies of player $i$ (p. 147)
$\tau_i$ : strategy in a game with an outside observer $\Gamma^*(p)$ (p. 305)
$\tau_i$ : player $i$'s strategy in a repeated game (pp. 525, 538)
$\tau_i^*$ : strategy in a game with an outside observer in which player $i$ follows the observer's recommendation (p. 306)
$\varphi$, $\varphi(S, d)$ : solution concept for bargaining games (p. 626)
$\varphi$ : solution concept for coalitional games (p. 673)
$\varphi$ : solution concept for bankruptcy problems (p. 833)
$\Omega$ : universal belief space (p. 453)
Introduction
What is game theory?

Game theory is the name given to the methodology of using mathematical tools to model and analyze situations of interactive decision making. These are situations involving several decision makers (called players) with different goals, in which the decision of each affects the outcome for all the decision makers. This interactivity distinguishes game theory from standard decision theory, which involves a single decision maker, and it is its main focus. Game theory tries to predict the behavior of the players and sometimes also provides decision makers with suggestions regarding ways in which they can achieve their goals.

The foundations of game theory were laid down in the book The Theory of Games and Economic Behavior, published in 1944 by the mathematician John von Neumann and the economist Oskar Morgenstern. The theory has been developed extensively since then and today it has applications in a wide range of fields. The applicability of game theory is due to the fact that it is a context-free mathematical toolbox that can be used in any situation of interactive decision making. A partial list of fields where the theory is applied, along with examples of some questions that are studied within each field using game theory, includes:
• Theoretical economics. A market in which vendors sell items to buyers is an example of a game. Each vendor sets the price of the items that he or she wishes to sell, and each buyer decides from which vendor he or she will buy items and in what quantities. In models of markets, game theory attempts to predict the prices that will be set for the items along with the demand for each item, and to study the relationships between prices and demand. Another example of a game is an auction. Each participant in an auction determines the price that he or she will bid, with the item being sold to the highest bidder. In models of auctions, game theory is used to predict the bids submitted by the participants, the expected revenue of the seller, and how the expected revenue will change if a different auction method is used.

• Networks. The contemporary world is full of networks; the Internet and mobile telephone networks are two prominent examples. Each network user wishes to obtain the best possible service (for example, to send and receive the maximal amount of information in the shortest span of time over the Internet, or to conduct the highest-quality calls using a mobile telephone) at the lowest possible cost. A user has to choose an Internet service provider or a mobile telephone provider, where those providers are also players in the game, since they set the prices of the service they provide. Game theory tries to predict the behavior of all the participants in these markets. This game is more complicated from the perspective of the service providers than from the perspective of the buyers, because the service providers can cooperate with each other (for example, mobile telephone providers can use each other’s network infrastructure to carry communications in order to reduce costs), and game theory is used to predict which cooperative coalitions will be formed and suggests ways to determine a “fair” division of the profit of such cooperation among the participants.

• Political science. Political parties forming a governing coalition after parliamentary elections are playing a game whose outcome is the formation of a coalition that includes some of the parties. This coalition then divides government ministries and other elected offices, such as parliamentary speaker and committee chairmanships, among the members of the coalition. Game theory has developed indices measuring the power of each political party. These indices can predict or explain the division of government ministries and other elected offices given the results of the elections. Another branch of game theory suggests various voting methods and studies their properties.

• Military applications. A classical military application of game theory models a missile pursuing a fighter plane. What is the best missile pursuit strategy? What is the best strategy that the pilot of the plane can use to avoid being struck by the missile? Game theory has contributed to the field of defense the insight that the study of such situations requires strategic thinking: when coming to decide what you should do, put yourself in the place of your rival and think about what he/she would do and why, while taking into account that he/she is doing the same and knows that you are thinking strategically and that you are putting yourself in his/her place.

• Inspection. A broad family of problems from different fields can be described as two-player games in which one player is an entity that can profit by breaking the law and the other player is an “inspector” who monitors the behavior of the first player. One example of such a game is the activities of the International Atomic Energy Agency, in its role of enforcing the Treaty on the Non-Proliferation of Nuclear Weapons by inspecting the nuclear facilities of signatory countries. Additional examples include the enforcement of laws prohibiting drug smuggling, auditing of tax declarations by the tax authorities, and ticket inspections on public trains and buses.

• Biology. Plants and animals also play games. Evolution “determines” strategies that flowers use to attract insects for pollination and it “determines” strategies that the insects use to choose which flowers they will visit. Darwin’s principle of the “survival of the fittest” states that only those organisms with the inherited properties that are best adapted to the environmental conditions in which they are located will survive. This principle can be explained by the notion of Evolutionarily Stable Strategy, which is a variant of the notion of Nash equilibrium, the most prominent game-theoretic concept. The introduction of game theory to biology in general and to evolutionary biology in particular explains, sometimes surprisingly well, various biological phenomena.
Game theory has applications to other fields as well. For example, to philosophy it contributes some insights into concepts related to morality and social justice, and it raises questions regarding human behavior in various situations that are of interest to psychology. Methodologically, game theory is intimately tied to mathematics: the study of game-theoretic models makes use of a variety of mathematical tools, from probability and
combinatorics to differential equations and algebraic topology. Analyzing game-theoretic models sometimes requires developing new mathematical tools. Traditionally, game theory is divided into two major subfields: strategic games, also called noncooperative games, and coalitional games, also called cooperative games. Broadly speaking, in strategic games the players act independently of each other, with each player trying to obtain the most desirable outcome given his or her preferences, while in coalitional games the same holds true with the stipulation that the players can agree on and sign binding contracts that enforce coordinated actions. Mechanisms enforcing such contracts include law courts and behavioral norms. Game theory does not deal with the quality or justification of these enforcement mechanisms; the cooperative game model simply assumes that such mechanisms exist and studies their consequences for the outcomes of the game. The categories of strategic games and coalitional games are not well defined. In many cases interactive decision problems include aspects of both coalitional games and strategic games, and a complete theory of games should contain an amalgam of the elements of both types of models. Nevertheless, in a clear and focused introductory presentation of the main ideas of game theory it is convenient to stick to the traditional categorization. We will therefore present each of the two models, strategic games and coalitional games, separately. Chapters 1–14 are devoted to strategic games, and Chapters 15–20 are devoted to coalitional games. Chapters 21 and 22 are devoted to social choice and stable matching, which include aspects of both noncooperative and cooperative games.
How to use this book
The main objective of this book is to serve as an introductory textbook for the study of game theory at both the undergraduate and the graduate levels. A secondary goal is to serve as a reference book for students and scholars who are interested in an acquaintance with some basic or advanced topics of game theory. The number of introductory topics is large and different teachers may choose to teach different topics in introductory courses. We have therefore composed the book as a collection of chapters that are, to a large extent, independent of each other, enabling teachers to use any combination of the chapters as the basis for a course tailored to their individual taste. To help teachers plan a course, we have included an abstract at the beginning of each chapter that presents its content in a short and concise manner.
Each chapter begins with the basic concepts and eventually goes farther than what may be termed the “necessary minimum” in the subject that it covers. Most chapters include, in addition to introductory concepts, material that is appropriate for advanced courses. This gives teachers the option of teaching only the necessary minimum, presenting deeper material, or asking students to complement classroom lectures with independent readings or guided seminar presentations. We could not, of course, include all known results of game theory in one textbook, and therefore the end of each chapter contains references to other books and journal articles in which the interested reader can find more material for a deeper understanding of the subject. Each chapter also contains exercises, many of which are relatively easy, while some are more advanced and challenging.
This book was composed by mathematicians; the writing is therefore mathematically oriented, and every theorem in the book is presented with a proof. Nevertheless, an effort has been made to make the material clear and transparent, and every concept is illustrated with examples intended to impart as much intuition and motivation as possible. The book is appropriate for teaching undergraduate and graduate students in mathematics, computer science and exact sciences, economics and social sciences, engineering, and life sciences. It can be used as a textbook for teaching different courses in game theory, depending on the level of the students, the time available to the teacher, and the specific subject of the course. For example, it could be used in introductory level or advanced level semester courses on coalitional games, strategic games, a general course in game theory, or a course on applications of game theory. It could also be used for advanced mini-courses on, e.g., incomplete information (Chapters 9, 10, and 11), auctions (Chapter 12), or repeated games (Chapters 13 and 14). As mentioned previously, the material in the chapters of the book will in many cases encompass more than a teacher would choose to teach in a single course. This requires teachers to choose carefully which chapters to teach and which parts to cover in each chapter. For example, the material on strategic games (Chapters 4 and 5) can be taught without covering extensive-form games (Chapter 3) or utility theory (Chapter 2). Similarly, the material on games with incomplete information (Chapter 9) can be taught without teaching the other two chapters on models of incomplete information (Chapters 10 and 11). For the sake of completeness, we have included an appendix containing the proofs of some theorems used throughout the book, including Brouwer’s Fixed Point Theorem, Kakutani’s Fixed Point Theorem, the Knaster–Kuratowski–Mazurkiewicz (KKM) Theorem, and the separating hyperplane theorem. 
The appendix also contains a brief survey of linear programming. A teacher can choose to prove each of these theorems in class, assign the proofs of the theorems as independent reading to the students, or state any of the theorems without proof based on the assumption that students will see the proofs in other courses.
1 The game of chess
Chapter summary
In the opening chapter of this book, we use the well-known game of chess to illustrate the notions of strategy and winning strategy. We then prove one of the first results in game theory, due to John von Neumann: in the game of chess either White (the first mover) has a winning strategy, or Black (the second mover) has a winning strategy, or each player has a strategy guaranteeing at least a draw. This is an important and nontrivial result, especially in view of the fact that to date, it is not known which of the above three alternatives holds, let alone what the winning strategy is, if one exists. In later chapters of the book, this result takes a more general form and is applied to a large class of games.
We begin with an exposition of the elementary ideas in noncooperative game theory, by analyzing the game of chess. Although the theory that we will develop in this chapter relates to that specific game, in later chapters it will be developed to apply to much more general situations.
1.1 Schematic description of the game
The game of chess is played by two players, traditionally referred to as White and Black. At the start of a match, each player has sixteen pieces arranged on the chessboard. White is granted the opening move, following which each player in turn moves pieces on the board, according to a set of fixed rules. A match has three possible outcomes:
• Victory for White, if White captures the Black King.
• Victory for Black, if Black captures the White King.
• A draw, if:
  1. it is Black’s turn, but he has no possible legal moves available, and his King is not in check;
  2. it is White’s turn, but he has no possible legal moves available, and his King is not in check;
  3. both players agree to declare a draw;
  4. a board position precludes victory for both sides;
  5. 50 consecutive turns have been played without a pawn having been moved and without the capture of any piece on the board, and the player whose turn it is requests that a draw be declared;
  6. or if the same board position appears three times, and the player whose turn it is requests that a draw be declared.
1.2 Analysis and results
For the purposes of our analysis all we need to assume is that the game is finite, i.e., the number of possible turns is bounded (even if that bound is an astronomically large number). This does not apply, strictly speaking, to the game of chess, but since our lifetimes are finite, we can safely assume that every chess match is finite.
We will denote the set of all possible board positions in chess by $X$. A board position by definition includes the identity of each piece on the board, and the board square on which it is located. A board position, however, does not provide full details on the sequence of moves that led to it: there may well be two or more sequences of moves leading to the same board position. We therefore need to distinguish between a “board position” and a “game situation,” which is defined as follows.
Definition 1.1 A game situation (in the game of chess) is a finite sequence $(x_0, x_1, x_2, \ldots, x_K)$ of board positions in $X$ satisfying
1. $x_0$ is the opening board position.
2. For each even integer $k$, $0 \le k < K$, going from board position $x_k$ to $x_{k+1}$ can be accomplished by a single legal move on the part of White.
3. For each odd integer $k$, $0 \le k < K$, going from board position $x_k$ to $x_{k+1}$ can be accomplished by a single legal move on the part of Black.
We will denote the set of game situations by $H$.
Suppose that a player wishes to program a computer to play chess. The computer would need a plan of action that would tell it what to do in any given game situation that could arise. A full plan of action for behavior in a game is called a strategy.
Definition 1.2 A strategy for White is a function $s_W$ that associates every game situation $(x_0, x_1, x_2, \ldots, x_K) \in H$, where $K$ is even, with a board position $x_{K+1}$, such that going from board position $x_K$ to $x_{K+1}$ can be accomplished by a single legal move on the part of White. Analogously, a strategy for Black is a function $s_B$ that associates every game situation $(x_0, x_1, x_2, \ldots, x_K) \in H$, where $K$ is odd, with a board position $x_{K+1}$ such that going from board position $x_K$ to $x_{K+1}$ can be accomplished by a single legal move on the part of Black.
Any pair of strategies $(s_W, s_B)$ determines an entire course of moves, as follows. In the opening move, White plays the move that leads to board position $x_1 = s_W(x_0)$. Black then plays the move leading to board position $x_2 = s_B(x_0, x_1)$, and so on. The succeeding board positions are determined by $x_{2k+1} = s_W(x_0, x_1, \ldots, x_{2k})$ and $x_{2k+2} = s_B(x_0, x_1, \ldots, x_{2k+1})$ for all $k = 0, 1, 2, \ldots$.
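The way a pair of strategies determines the whole course of moves can be sketched in code. The following is a minimal illustration, not chess: it assumes a hypothetical toy game in which a position is the number of tokens left in a pile and a legal move removes one or two tokens. The names `legal_successors`, `play`, `take_one`, and `take_two_if_possible` are invented for this sketch. A game situation is stored as the tuple of positions visited so far, and a strategy is any function from such a tuple to the next position.

```python
from typing import Callable, List, Tuple

Situation = Tuple[int, ...]              # (x_0, x_1, ..., x_K): the positions visited so far
Strategy = Callable[[Situation], int]    # maps a game situation to the next position


def legal_successors(position: int) -> List[int]:
    """Toy rule (an assumption, not chess): remove 1 or 2 tokens from the pile."""
    return [position - t for t in (1, 2) if position - t >= 0]


def play(s_white: Strategy, s_black: Strategy, x0: int) -> Situation:
    """Build the play determined by the strategy pair, starting from position x0."""
    situation: Situation = (x0,)
    while legal_successors(situation[-1]):           # stop at a terminal position
        k = len(situation) - 1                       # index of the last position
        mover = s_white if k % 2 == 0 else s_black   # White moves when k is even
        nxt = mover(situation)
        if nxt not in legal_successors(situation[-1]):
            raise ValueError("strategy prescribed an illegal move")
        situation = situation + (nxt,)
    return situation


def take_one(situation: Situation) -> int:
    """Hypothetical strategy: always remove one token."""
    return situation[-1] - 1


def take_two_if_possible(situation: Situation) -> int:
    """Hypothetical strategy: remove two tokens when at least two remain."""
    position = situation[-1]
    return position - 2 if position >= 2 else position - 1


full_play = play(take_one, take_two_if_possible, 5)
print(full_play)  # (5, 4, 2, 1, 0)
```

Because each strategy receives the full situation rather than only the last position, the sketch matches the definition above: a strategy may prescribe different moves at the same board position when the histories differ.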
An entire course of moves (from the opening move to the closing one) is termed a play of the game. Every play of the game of chess ends in either a victory for White, a victory for Black, or a draw. A strategy for White is termed a winning strategy if it guarantees that White will win, no matter what strategy Black chooses.
Definition 1.3 A strategy $s_W$ is a winning strategy for White if for every strategy $s_B$ of Black, the play of the game determined by the pair $(s_W, s_B)$ ends in victory for White. A strategy $s_W$ is a strategy guaranteeing at least a draw for White if for every strategy $s_B$ of Black, the play of the game determined by the pair $(s_W, s_B)$ ends in either a victory for White or a draw.
If $s_W$ is a winning strategy for White, then any White player (or even computer program) adopting that strategy is guaranteed to win, even if he faces the world’s chess champion. The concepts of “winning strategy” and “strategy guaranteeing at least a draw” for Black are defined analogously, in an obvious manner.
The next theorem follows from one of the earliest theorems ever published in game theory (see Theorem 3.13 on page 46).
Theorem 1.4 In chess, one and only one of the following must be true:
(i) White has a winning strategy.
(ii) Black has a winning strategy.
(iii) Each of the two players has a strategy guaranteeing at least a draw.
We emphasize that the theorem does not relate to a particular chess match, but to all chess matches. That is, suppose that alternative (i) is the true case, i.e., White has a winning strategy $s_W$. Then any person who is the White player and follows the prescriptions of that strategy will always win every chess match he ever plays, no matter who the opponent is. If, however, alternative (ii) is the true case, then Black has a winning strategy $s_B$, and any person who is the Black player and follows the prescriptions of that strategy will always win every chess match he ever plays, no matter who the opponent is. Finally, if alternative (iii) is the true case, then White has a strategy $s_W$ guaranteeing at least a draw, and Black has a strategy $s_B$ guaranteeing at least a draw. Any person who is the White player (or the Black player) and follows the prescriptions of $s_W$ (or $s_B$, respectively) will always get at least a draw in every chess match he ever plays, no matter who the opponent is. Note that if alternative (i) holds, there may be more than one winning strategy, and similar statements can be made with regard to the other two alternatives.
So, given that one of the three alternatives must be true, which one is it? We do not know. If the day ever dawns in which a winning strategy for one of the players is discovered, or strategies guaranteeing at least a draw for each player are discovered, the game of chess will cease to be of interest. In the meantime, we can continue to enjoy the challenge of playing (or watching) a good chess match. Despite the fact that we do not know which alternative is the true one, the theorem is significant, because a priori it might have been the case that none of the alternatives was possible; one could have postulated that no player could ever have a strategy always guaranteeing a victory, or at least a draw.
[Figure 1.1 The game of chess presented in extensive form. The levels of the tree are labeled, from the root downward: White moves, Black moves, White moves.]
We present two proofs of the theorem. The first proof is the “classic” proof, which in principle shows how to find a winning strategy for one of the players (if such a strategy exists) or a strategy guaranteeing at least a draw (if such a strategy exists). The second proof is shorter, but it cannot be used to find a winning strategy for one of the players (if such a strategy exists) or a strategy guaranteeing at least a draw (if such a strategy exists).
We start with several definitions that are needed for the first proof of the theorem. The set of game situations can be depicted by a tree¹ (see Figure 1.1). Such a tree is called a game tree. Each vertex of the game tree represents a possible game situation. Denote the set of vertices of the game tree by $H$. The root vertex is the opening game situation $x_0$, and for each vertex $x$, the set of children vertices of $x$ is the set of game situations that can be reached from $x$ in one legal move. For example, in his opening move, White can move one of his eight pawns one or two squares forward, or move one of his two knights to one of two squares. So White has $8 \cdot 2 + 2 \cdot 2 = 20$ possible opening moves, which means that the root vertex of the tree has 20 children vertices. Every vertex that can be reached from $x$ by a sequence of moves is called a descendant of $x$. Every leaf of the tree corresponds to a terminal game situation, in which either White has won, Black has won, or a draw has been declared.
Given a vertex $x \in H$, we may consider the subtree beginning at $x$, which is by definition the tree whose root is $x$ that is obtained by removing all vertices that are not descendants of $x$. This subtree of the game tree, which we will denote by $\Gamma(x)$, corresponds to a game that is called the subgame beginning at $x$. We will denote by $n_x$ the number of vertices in $\Gamma(x)$. The game $\Gamma(x_0)$ is by definition the game that starts with the opening situation of the game, and is therefore the standard chess game.
1 The mathematical definition of a tree appears in the sequel (see Definition 3.5 on page 42).
If $y$ is a child vertex of $x$, then $\Gamma(y)$ is a subtree of $\Gamma(x)$ that does not contain $x$. In particular, $n_x > n_y$. Moreover, $n_x = 1$ if and only if $x$ is a terminal situation of the game, i.e., the players cannot implement any moves at this subgame. In such a case, the strategy of a player is denoted by $\emptyset$. Denote by
$$F = \{\Gamma(x) : x \in H\} \tag{1.1}$$
the collection of all subgames that are defined by subtrees of the game of chess. Theorem 1.4 can be proved using the result of Theorem 1.5.
Theorem 1.5 Every game in $F$ satisfies one and only one of the following alternatives:
(i) White has a winning strategy.
(ii) Black has a winning strategy.
(iii) Each of the players has a strategy guaranteeing at least a draw.
Proof: The proof proceeds by induction on $n_x$, the number of vertices in the subgame $\Gamma(x)$. Suppose $x$ is a vertex such that $n_x = 1$. As noted above, that means that $x$ is a terminal vertex. If the White King has been removed from the board, Black has won, in which case $\emptyset$ is a winning strategy for Black. If the Black King has been removed from the board, White has won, in which case $\emptyset$ is a winning strategy for White. Alternatively, if both Kings are on the board at the end of play, the game has ended in a draw, in which case $\emptyset$ is a strategy guaranteeing a draw for both Black and White.
Next, suppose that $x$ is a vertex such that $n_x > 1$. Assume by induction that at all vertices $y$ satisfying $n_y < n_x$, one and only one of the alternatives (i), (ii), or (iii) is true in the subgame $\Gamma(y)$. Suppose, without loss of generality, that White has the first move in $\Gamma(x)$. Any board position $y$ that can be reached from $x$ satisfies $n_y < n_x$, and so the inductive assumption is true in the corresponding subgame $\Gamma(y)$. Denote by $C(x)$ the collection of vertices that can be reached from $x$ in one of White’s moves.
1. If there is a vertex $y_0 \in C(x)$ such that White has a winning strategy in $\Gamma(y_0)$, then alternative (i) is true in $\Gamma(x)$: the winning strategy for White in $\Gamma(x)$ is to choose as his first move the move leading to vertex $y_0$, and to follow the winning strategy in $\Gamma(y_0)$ at all subsequent moves.
2. If Black has a winning strategy in $\Gamma(y)$ for every vertex $y \in C(x)$, then alternative (ii) is true in $\Gamma(x)$: Black can win by ascertaining what the vertex $y$ is after White’s first move, and following his winning strategy in $\Gamma(y)$ at all subsequent moves.
3. Otherwise:
• (1) does not hold, i.e., White has no winning strategy in $\Gamma(y)$ for any $y \in C(x)$. Because the induction hypothesis holds for every vertex $y \in C(x)$, either Black has a winning strategy in $\Gamma(y)$, or both players have a strategy guaranteeing at least a draw in $\Gamma(y)$.
• (2) does not hold, i.e., there is a vertex $y_0 \in C(x)$ such that Black does not have a winning strategy in $\Gamma(y_0)$. But because (1) does not hold, White also does not have a
winning strategy in $\Gamma(y_0)$. Therefore, by the induction hypothesis applied to $\Gamma(y_0)$, both players have a strategy guaranteeing at least a draw in $\Gamma(y_0)$.
As we now show, in this case, in $\Gamma(x)$ both players have a strategy guaranteeing at least a draw. White can guarantee at least a draw by choosing a move leading to vertex $y_0$, and from there by following the strategy that guarantees at least a draw in $\Gamma(y_0)$. Black can guarantee at least a draw by ascertaining what the board position $y$ is after White’s first move, and at all subsequent moves in $\Gamma(y)$ either by following a winning strategy or following a strategy that guarantees at least a draw in that subgame.
The proof just presented is a standard inductive proof over a tree: one assumes that the theorem is true for every subtree rooted at a child of the root vertex, and then shows that it is true for the entire tree. The proof can also be accomplished in the following way: select any vertex $x$ that is neither a terminal vertex nor the root vertex. The subgame starting from this vertex, $\Gamma(x)$, contains at least two vertices, but fewer vertices than the original game (because it does not include the root vertex), and the induction hypothesis can therefore be applied to $\Gamma(x)$. Now “fold up” the subgame and replace it with a terminal vertex whose outcome is the outcome that is guaranteed by the induction hypothesis to be obtained in $\Gamma(x)$. This leads to a new game $\widehat{\Gamma}$. Since $\Gamma(x)$ has at least two vertices, $\widehat{\Gamma}$ has fewer vertices than the original game, and therefore by the induction hypothesis the theorem is true for $\widehat{\Gamma}$. It is straightforward to ascertain that a player has a winning strategy in $\widehat{\Gamma}$ if and only if he has a winning strategy in the original game.
In the proof of Theorem 1.5 we used the following properties of the game of chess:
(C1) The game is finite.
(C2) The strategies of the players determine the play of the game. In other words, there is no element of chance in the game; neither dice nor card draws are involved.
(C3) Each player, at each turn, knows the moves that were made at all previous stages of the game.
We will later see examples of games in which at least one of the above properties fails to hold, for which the statement of Theorem 1.5 also fails to hold (see for example the game “Matching Pennies,” Example 3.20 on page 52).
We next present a second proof of Theorem 1.4. We will need the following two facts from formal logic for the proof. Let $X$ be a finite set and let $A(x)$ be an arbitrary logical formula.² Then:
• If it is not the case that “for every $x \in X$ the formula $A(x)$ holds,” then there exists an $x \in X$ where the formula $A(x)$ does not hold:
$$\neg(\forall x(A)) = \exists x(\neg A). \tag{1.2}$$
• If it is not the case that “there exists an $x \in X$ where the formula $A(x)$ holds,” then for every $x \in X$ the formula $A(x)$ does not hold:
$$\neg(\exists x(A)) = \forall x(\neg A). \tag{1.3}$$
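For a finite set the two identities (1.2) and (1.3) can be checked mechanically. The sketch below is only an illustration under stated assumptions: it takes a hypothetical four-element set $X$, represents a formula $A$ by its tuple of truth values on $X$, and verifies both identities for all $2^4$ such formulas. The function name `identities_hold` is invented for this example.

```python
from itertools import product


def identities_hold(truth_values):
    """truth_values[i] is the truth value of A(x) at the i-th element of X."""
    # (1.2): not (for all x: A)  equals  (exists x: not A)
    eq_1_2 = (not all(truth_values)) == any(not a for a in truth_values)
    # (1.3): not (exists x: A)  equals  (for all x: not A)
    eq_1_3 = (not any(truth_values)) == all(not a for a in truth_values)
    return eq_1_2 and eq_1_3


# Enumerate every formula A on a 4-element set X (2**4 truth assignments).
checked = [identities_hold(tv) for tv in product([False, True], repeat=4)]
print(all(checked), len(checked))  # True 16
```

An exhaustive check of this kind is not a proof for arbitrary sets; it only confirms the identities on the small finite case enumerated here.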
2 Recall that the logical statement “for every x ∈ X event A obtains” is written formally as ∀x(A), and the statement “there exists an x ∈ X for which event A obtains” is written as ∃x(A), while “event A does not obtain” is written as ¬A. For ease of exposition, we will omit the set X from each of the formal statements in the proof.
Second Proof of Theorem 1.4: As stated above, we assume that the game of chess is a finite game, i.e., there is a natural number $K$ such that every play of the game concludes after at most $2K$ turns ($K$ turns on the part of White and $K$ turns on the part of Black). Assume that there are exactly $2K$ turns in every play of the game: every play that ends in fewer turns can be continued by adding more turns, up to $2K$, at which each player alternately implements the move “do nothing,” which has no effect on the board position.
For every $k$, $1 \le k \le K$, denote by $a_k$ the move implemented by White at his $k$-th turn, and by $b_k$ the move implemented by Black at his $k$-th turn. Denote by $W$ the sentence that White wins (after $2K$ turns). Then $\neg W$ is the sentence that the play ends in either a draw or a victory for Black. Using these symbols, the statement “White has a winning strategy” can be written formally as
$$\exists a_1 \forall b_1 \exists a_2 \forall b_2 \exists a_3 \cdots \exists a_K \forall b_K (W). \tag{1.4}$$
It follows that the statement “White does not have a winning strategy” can be written formally as
$$\neg(\exists a_1 \forall b_1 \exists a_2 \forall b_2 \exists a_3 \cdots \exists a_K \forall b_K (W)). \tag{1.5}$$
By repeated application of Equations (1.2) and (1.3) we deduce that this is equivalent to
$$\forall a_1 \exists b_1 \forall a_2 \exists b_2 \forall a_3 \cdots \forall a_K \exists b_K (\neg W). \tag{1.6}$$
This, however, says that Black has a strategy guaranteeing at least a draw. In other words, we have proved that if White has no winning strategy, then Black has a strategy that guarantees at least a draw. We can similarly prove that if Black has no winning strategy, then White has a strategy that guarantees at least a draw. This leads to the conclusion that one of the three alternatives of Theorem 1.4 must hold.
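The case analysis in the first proof (the proof of Theorem 1.5) is in effect a recursive procedure, and it can be sketched on a small tree. The following is a minimal illustration under assumptions that are not part of the text: a terminal vertex is represented by one of the labels `"W"`, `"B"`, `"D"` (White wins, Black wins, draw), an internal vertex by the list of its children, and the players alternate moves starting with White. The function name `classify` and the tree `toy` are hypothetical. Running such a procedure on the real game tree of chess is far beyond what is computationally feasible.

```python
from typing import List, Union

# A leaf is an outcome label; an internal vertex is the list of its children.
Tree = Union[str, List["Tree"]]


def classify(tree: Tree, white_to_move: bool = True) -> str:
    """Return 'W', 'B' or 'D': the alternative of Theorem 1.5 that holds in the subgame."""
    if isinstance(tree, str):                       # terminal vertex (n_x = 1)
        return tree
    # Induction hypothesis: every child subgame is already classified.
    results = [classify(child, not white_to_move) for child in tree]
    mover, opponent = ("W", "B") if white_to_move else ("B", "W")
    if mover in results:                            # case 1: move to a winning child
        return mover
    if all(r == opponent for r in results):         # case 2: every child is lost
        return opponent
    return "D"                                      # case 3: both can secure a draw


toy = [["B", "D"], ["W", ["D", "B"]], ["B"]]
print(classify(toy))  # D
```

In `toy`, White's first and third moves lead to subgames that Black wins, while the second leads to a subgame in which both players can guarantee at least a draw, so the whole game falls under alternative (iii).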
1.3 Remarks
The second proof of Theorem 1.4 was brought to the attention of the authors by Abraham Neyman, to whom thanks are due.
1.4 Exercises
1.1 “The outcome of every play of the game of chess is either a victory for White, a victory for Black, or a draw.” Is that statement equivalent to the result of Theorem 1.4? Justify your answer.
1.2 Find three more games that satisfy Properties (C1)–(C3) on page 6 that are needed for proving Theorem 1.4.
1.3 Theorem 1.4 was proved in this chapter under the assumption that the length of a game of chess is bounded. In this exercise we will prove the theorem without that assumption, that is, we will allow an infinite number of moves. We will agree that the outcome of an infinitely long game of chess is a draw.
When one allows infinite plays, the set of game situations is an infinite set. However, to know how to continue playing, the players need not know all the sequence of past moves. In fact, only a bounded amount of information needs to be told to the players, e.g.,
• What is the current board position?
• Have the players played an even or an odd number of moves up to now (for knowing whose turn it is)?
• For every board position, has it appeared in the play up to now 0 times, once, or more than once (for knowing whether the player whose turn it is may ask for a draw)?
We will therefore make use of the fact that one may suppose that there are only a finite number of board positions in chess. Consider the following version of chess. The rules of the game are identical to the rules on page 1, with the one difference that if a board position is repeated during a play, the play ends in a draw. Since the number of game situations is finite, this version of chess is a finite game. We will call it “finite chess.”
(a) Prove that in finite chess exactly one of the following holds:
(i) White has a winning strategy.
(ii) Black has a winning strategy.
(iii) Each of the two players has a strategy guaranteeing at least a draw.
(b) Prove that if one of the players has a winning strategy in finite chess, then that player also has a winning strategy in chess.
We now prove that if each player has a strategy guaranteeing at least a draw in finite chess, then each player has a strategy guaranteeing at least a draw in chess. We will prove this claim for White. Suppose, therefore, that White has a strategy $\sigma_W$ in finite chess that guarantees at least a draw. Consider the following strategy $\widehat{\sigma}_W$ for White in chess:
• Implement strategy $\sigma_W$ until either the play of chess terminates or a board position repeats itself (at which point the play of finite chess terminates).
• If the play of chess arrives at a game situation $x$ that has previously appeared, implement the strategy $\sigma_W$ restricted to the subgame beginning at $x$ until the play arrives at a board position $y$ that has previously appeared, and so on.
(c) Prove that the strategy $\widehat{\sigma}_W$ guarantees at least a draw for White in chess.
2 Utility theory
Chapter summary
The objective of this chapter is to provide a quantitative representation of players’ preference relations over the possible outcomes of the game, by what is called a utility function. This is a fundamental element of game theory, economic theory, and decision theory in general, since it facilitates the application of mathematical tools in analyzing game situations whose outcomes may vary in their nature, and often be uncertain. The utility function representation of preference relations over uncertain outcomes was developed and named after John von Neumann and Oskar Morgenstern. The main feature of the von Neumann–Morgenstern utility is that it is linear in the probabilities of the outcomes. This implies that a player evaluates an uncertain outcome by its expected utility. We present some properties (also known as axioms) that players’ preference relations can satisfy. We then prove that any preference relation having these properties can be represented by a von Neumann–Morgenstern utility and that this representation is determined up to a positive affine transformation. Finally we note how a player’s attitude toward risk is expressed in his von Neumann–Morgenstern utility function.
2.1 Preference relations and their representation
A game is a mathematical model of a situation of interactive decision making, in which every decision maker (or player) strives to attain his “best possible” outcome, knowing that each of the other players is striving to do the same thing. But what does a player’s “best possible” outcome mean? The outcomes of a game need not be restricted to “Win,” “Loss,” or “Draw.” They may well be monetary payoffs or non-monetary payoffs, such as “your team has won the competition,” “congratulations, you’re a father,” “you have a headache,” or “you have granted much-needed assistance to a friend in distress.” To analyze the behavior of players in a game, we first need to ascertain the set of outcomes of a game and then we need to know the preferences of each player with respect to the set of outcomes. This means that for every pair of outcomes x and y, we need to know for each player whether he prefers x to y, whether he prefers y to x, or whether he is indifferent between them. We denote by O the set of outcomes of the game. The preferences of each player over the set O are captured by the mathematical concept that is termed preference relation.
Definition 2.1 A preference relation of player $i$ over a set of outcomes $O$ is a binary relation denoted by $\succsim_i$.
A binary relation is formally a subset of $O \times O$, but instead of writing $(x, y) \in \succsim_i$ we write $x \succsim_i y$, and read that as saying “player $i$ either prefers $x$ to $y$ or is indifferent between the two outcomes”; sometimes we will also say in this case that the player “weakly prefers” $x$ to $y$. Given the preference relation $\succsim_i$ we can define the corresponding strict preference relation $\succ_i$, which describes when player $i$ strictly prefers one outcome to another:
$$x \succ_i y \iff x \succsim_i y \text{ and } y \not\succsim_i x. \tag{2.1}$$
We can similarly define the indifference relation $\approx_i$, which expresses the fact that a player is indifferent between two possible outcomes:
$$x \approx_i y \iff x \succsim_i y \text{ and } y \succsim_i x. \tag{2.2}$$
We will assume that every player’s preference relation satisfies the following three properties.

Assumption 2.2 The preference relation $\succsim_i$ over $O$ is complete; that is, for any pair of outcomes $x$ and $y$ in $O$ either $x \succsim_i y$, or $y \succsim_i x$, or both.

Assumption 2.3 The preference relation $\succsim_i$ over $O$ is reflexive; that is, $x \succsim_i x$ for every $x \in O$.

Assumption 2.4 The preference relation $\succsim_i$ over $O$ is transitive; that is, for any triple of outcomes $x$, $y$, and $z$ in $O$, if $x \succsim_i y$ and $y \succsim_i z$ then $x \succsim_i z$.

The assumption of completeness says that a player should be able to compare any two possible outcomes and state whether he is indifferent between the two, or has a definite preference for one of them, in which case he should be able to state which is the preferred outcome. One can imagine real-life situations in which this assumption does not hold, where a player is unable to rank his preferences between two or more outcomes (or is uninterested in doing so). The assumption of completeness is necessary for the mathematical analysis conducted in this chapter.

The assumption of reflexivity is quite natural: every outcome is weakly preferred to itself.

The assumption of transitivity is needed under any reasonable interpretation of what a preference relation means. If this assumption does not hold, then there exist three outcomes $x$, $y$, $z$ such that $x \succsim_i y$ and $y \succsim_i z$, but $z \succ_i x$. That would mean that if a player were asked to choose directly between $x$ and $z$ he would choose $z$, but if he were first asked to choose between $z$ and $y$ and then between the outcome he just preferred ($y$) and $x$, he would choose $x$, so that his choices would depend on the order in which alternatives are offered to him. Without the assumption of transitivity, it is unclear what a player means when he says that he prefers $z$ to $x$.

The “greater than or equal to” relation $\geq$ over the real numbers is a familiar preference relation. It is complete, reflexive, and transitive.
If a game’s outcomes for player $i$ are sums of dollars, it is reasonable to suppose that the player will compare different outcomes using this preference relation. Since using real numbers and the $\geq$ ordering relation is very convenient for the purposes of conducting analysis, it would be an advantage to be able in general to represent game outcomes by real numbers, and player preferences by the familiar $\geq$ relation. Such a representation of a preference relation is called a utility function, and is defined as follows.

Definition 2.5 Let $O$ be a set of outcomes and $\succsim$ be a complete, reflexive, and transitive preference relation over $O$. A function $u : O \to \mathbb{R}$ is called a utility function representing $\succsim$ if for all $x, y \in O$,

$$x \succsim y \iff u(x) \geq u(y). \tag{2.3}$$
In other words, a utility function $u$ is a function associating each outcome $x$ with a real number $u(x)$ in such a way that the more an outcome is preferred, the larger is the real number associated with it. If the set of outcomes is finite, any complete, reflexive, and transitive preference relation can easily be represented by a utility function.

Example 2.6 Suppose that $O = \{a, b, c, d\}$ and the preference relation $\succsim$ is given by

$$a \succ b \approx c \succ d. \tag{2.4}$$

Note that although the relation is defined only on part of the set of all pairs of outcomes, the assumptions of reflexivity and transitivity enable us to extend the relation to every pair of outcomes. For example, from the above we can immediately conclude that $a \succ c$. The utility function $u$ defined by

$$u(a) = 22, \quad u(b) = 13, \quad u(c) = 13, \quad u(d) = 0 \tag{2.5}$$

represents $\succsim$. There is, in fact, a continuum of utility functions that represent this relation, because the only condition that a utility function needs to meet in order to represent $\succsim$ is

$$u(a) > u(b) = u(c) > u(d). \tag{2.6}$$

◭

The following theorem, whose proof is left to the reader (Exercise 2.2), generalizes the conclusion of the example.

Theorem 2.7 Let $O$ be a set of outcomes and let $\succsim$ be a complete, reflexive, and transitive preference relation over $O$. Suppose that $u$ is a utility function representing $\succsim$. Then for every strictly increasing function $v : \mathbb{R} \to \mathbb{R}$, the composition $v \circ u$ defined by

$$(v \circ u)(x) = v(u(x)) \tag{2.7}$$

is also a utility function representing $\succsim$.

Given the result of this theorem, a utility function is often called an ordinal function, because it represents only the order of preferences between outcomes. The numerical values that a utility function associates with outcomes have no significance beyond the ordering they induce, and do not in any way represent the “intensity” of a player’s preferences.
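The ordinal nature of a utility function can be checked numerically on Example 2.6. The following is a minimal sketch, not part of the original text: the helper names `weakly_prefers` and `same_relation`, and the particular transformation `v`, are assumptions chosen for illustration. It compares the relation induced by $u$ with the relations induced by a strictly increasing transformation and by a decreasing one.

```python
from itertools import product

# Outcomes and utility values from Example 2.6 (Equation (2.5)).
outcomes = ["a", "b", "c", "d"]
u = {"a": 22, "b": 13, "c": 13, "d": 0}


def weakly_prefers(utility, x, y):
    """x is weakly preferred to y under the relation induced by `utility`."""
    return utility[x] >= utility[y]


def same_relation(u1, u2):
    """True if u1 and u2 induce the same weak preference on every pair."""
    return all(
        weakly_prefers(u1, x, y) == weakly_prefers(u2, x, y)
        for x, y in product(outcomes, repeat=2)
    )


def v(t):
    """A hypothetical strictly increasing transformation (illustrative only)."""
    return t ** 3 + 5


v_of_u = {x: v(u[x]) for x in outcomes}  # the composition v o u of Theorem 2.7
neg_u = {x: -u[x] for x in outcomes}     # a decreasing transformation

print(same_relation(u, v_of_u))  # True
print(same_relation(u, neg_u))   # False
```

The second check shows why Theorem 2.7 requires the transformation to be increasing: reversing the order of the utilities reverses the strict preferences.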
2.2 Preference relations over uncertain outcomes: the model
Once we have represented a player’s preferences by a utility function, we need to deal with another problem: the outcome of a game may well be uncertain and determined by a lottery. This can occur for two reasons:
• The game may include moves of chance. Examples of such games include backgammon and Monopoly (where dice are tossed) and bridge and poker (where the shuffling of the deck introduces chance into the game). In many economic situations, an outcome may depend on uncertain factors such as changes in currency conversion rates or the valuation of stocks in the stock market, and the outcome itself may therefore be uncertain. The most convenient way to model such situations is to describe some of the determining factors as lottery outcomes.
• One or more of the players may play in a non-deterministic manner, choosing moves by lottery. For example, in a chess match, a player may choose his opening move by tossing a coin. The formal analysis of strategies that depend on lotteries will be presented in Chapter 5.
Example 2.8 Consider the following situation involving one player who has two possible moves, $T$ and $B$. The outcome is the amount of dollars that the player receives. If she chooses $B$, she receives $7,000. If she chooses $T$, she receives the result of a lottery that grants a payoff of $0 or $20,000 with equal probability. The lottery is denoted by $[\frac{1}{2}(\$20{,}000), \frac{1}{2}(\$0)]$. What move can we expect the player to prefer? The answer depends on the player’s attitude to risk. There are many people who would rather receive $7,000 with certainty than take their chances with a toss of a coin determining whether they receive $20,000 or $0, while others would take a chance on the large sum of $20,000. Risk attitude is a personal characteristic that varies from one individual to another, and therefore affects a player’s preference relation. ◭
To analyze situations in which the outcome of a game may depend on a lottery over several possible outcomes, the preference relations of players need to be extended to cover preferences over lotteries involving the outcomes. Given an extended preference relation of a player, which includes preferences over both individual outcomes and lotteries, we can again ask whether such a relation can be represented by a utility function. In other words, can we assign a real number to each lottery in such a way that one lottery is preferred by the player to another lottery if and only if the number assigned to the more-preferred lottery is greater than the number assigned to the less-preferred lottery?

A convenient property that such a utility function can satisfy is linearity, meaning that the number assigned to a lottery is equal to the expected value of the numbers assigned to the individual outcomes over which the lottery is being conducted. For example, if $L = [p(x), (1 - p)(y)]$ is a lottery assigning probability $p$ to outcome $x$, and probability $1 - p$ to outcome $y$, then the linearity requirement would imply that

$$u(L) = p\,u(x) + (1 - p)\,u(y). \tag{2.8}$$
Figure 2.1 Lotteries over outcomes (three tree diagrams: $L_1$ over $A_5$ and $A_7$; $L_2$ over $A_1$ and $A_2$; $L_3$ over $A_1$, $A_2$, $A_5$, and $A_7$)
Such a utility function is linear in the probabilities $p$ and $1 - p$; hence the name. The use of linear utility functions is very convenient for analyzing games in which the outcomes are uncertain (a topic studied in depth in Section 5.5 on page 172). But we still need to answer the question: which preference relations of a player (over lotteries of outcomes) can be represented by a linear utility function, as expressed in Equation (2.8)? The subject of linear utility functions was first explored by the mathematician John von Neumann and the economist Oskar Morgenstern [1944], and it is the subject matter of this chapter.

Suppose that a decision maker is faced with a decision determining which of a finite number of possible outcomes, sometimes designated “prizes,” he will receive. (The terms “outcome” and “prize” will be used interchangeably in this section.) Denote the set of possible outcomes by $O = \{A_1, A_2, \ldots, A_K\}$. In Example 2.8 there are three outcomes $O = \{A_1, A_2, A_3\}$, where $A_1 = \$0$, $A_2 = \$7{,}000$, and $A_3 = \$20{,}000$.

Given the set of outcomes $O$, the relevant space for conducting analysis is the set of lotteries over the outcomes in $O$. Figure 2.1 depicts three possible lotteries over outcomes. The three lotteries in Figure 2.1 are: $L_1$, a lottery granting $A_5$ and $A_7$ with equal probability; $L_2$, a lottery granting $A_1$ with probability $\frac{2}{3}$ and $A_2$ with probability $\frac{1}{3}$; and $L_3$, granting $A_1$, $A_2$, $A_5$, and $A_7$ with respective probabilities $\frac{1}{2}$, $\frac{1}{4}$, $\frac{1}{8}$, and $\frac{1}{8}$.

A lottery $L$ in which outcome $A_k$ has probability $p_k$ (where $p_1, \ldots, p_K$ are nonnegative real numbers summing to 1) is denoted by

$$L = [p_1(A_1), p_2(A_2), \ldots, p_K(A_K)], \tag{2.9}$$

and the set of all lotteries over $O$ is denoted by $\mathcal{L}$. The three lotteries in Figure 2.1 can thus be written as

$$L_1 = [\tfrac{1}{2}(A_5), \tfrac{1}{2}(A_7)], \quad L_2 = [\tfrac{2}{3}(A_1), \tfrac{1}{3}(A_2)], \quad L_3 = [\tfrac{1}{2}(A_1), \tfrac{1}{4}(A_2), \tfrac{1}{8}(A_5), \tfrac{1}{8}(A_7)].$$

The set of outcomes $O$ may be regarded as a subset of the set of lotteries $\mathcal{L}$ by identifying each outcome $A_k$ with the lottery yielding $A_k$ with probability 1. In other words, receiving outcome $A_k$ with certainty is equivalent to conducting a lottery that yields $A_k$ with probability 1 and yields all the other outcomes with probability 0,

$$[0(A_1), 0(A_2), \ldots, 0(A_{k-1}), 1(A_k), 0(A_{k+1}), \ldots, 0(A_K)]. \tag{2.10}$$
We will denote a preference relation for player $i$ over the set of all lotteries by $\succsim_i$, so that $L_1 \succsim_i L_2$ indicates that player $i$ either prefers lottery $L_1$ to lottery $L_2$ or is indifferent between the two lotteries.

Definition 2.9 Let $\succsim_i$ be a preference relation for player $i$ over the set of lotteries $\mathcal{L}$. A utility function $u_i$ representing the preferences of player $i$ is a real-valued function defined over $\mathcal{L}$ satisfying

$$u_i(L_1) \geq u_i(L_2) \iff L_1 \succsim_i L_2, \quad \forall L_1, L_2 \in \mathcal{L}. \tag{2.11}$$

In words, a utility function is a function whose values reflect the preferences of a player over lotteries.

Definition 2.10 A utility function $u_i$ is called linear if for every lottery $L = [p_1(A_1), p_2(A_2), \ldots, p_K(A_K)]$, it satisfies¹

$$u_i(L) = p_1 u_i(A_1) + p_2 u_i(A_2) + \cdots + p_K u_i(A_K). \tag{2.12}$$

As noted above, the term “linear” expresses the fact that the function $u_i$ is a linear function in the probabilities $(p_k)_{k=1}^{K}$. If the utility function is linear, the utility of a lottery is the expected value of the utilities of the outcomes. A linear utility function is also called a von Neumann–Morgenstern utility function.

Which preference relation of a player can be represented by a linear utility function? First of all, since $\geq$ is a transitive relation, it cannot possibly represent a preference relation $\succsim_i$ that is not transitive. The transitivity assumption that we imposed on the preferences over the outcomes $O$ must therefore be extended to preference relations over lotteries. This alone, however, is still insufficient for the existence of a linear utility function over lotteries: there are complete, reflexive, and transitive preference relations over the set of simple lotteries that cannot be represented by linear utility functions (see Exercise 2.18).

The next section presents four requirements on preference relations that ensure that a preference relation $\succsim_i$ over $O$ can be represented by a linear utility function. These requirements are also termed the von Neumann–Morgenstern axioms.
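To make Definition 2.10 concrete, the following sketch evaluates the three lotteries of Figure 2.1 under a linear utility function. The utility values in `u_outcome` are hypothetical (the text assigns none to $A_1$, $A_2$, $A_5$, $A_7$), and the function name `linear_utility` is an assumption made for this illustration.

```python
from fractions import Fraction as F

# Hypothetical utilities of the individual outcomes (not from the text).
u_outcome = {"A1": F(0), "A2": F(4), "A5": F(6), "A7": F(10)}

# The three lotteries of Figure 2.1, as maps outcome -> probability.
L1 = {"A5": F(1, 2), "A7": F(1, 2)}
L2 = {"A1": F(2, 3), "A2": F(1, 3)}
L3 = {"A1": F(1, 2), "A2": F(1, 4), "A5": F(1, 8), "A7": F(1, 8)}


def linear_utility(lottery, u):
    """Equation (2.12): the expected value of the outcomes' utilities."""
    assert sum(lottery.values()) == 1, "probabilities must sum to 1"
    return sum(p * u[a] for a, p in lottery.items())


for name, lottery in [("L1", L1), ("L2", L2), ("L3", L3)]:
    print(name, linear_utility(lottery, u_outcome))
# L1 8
# L2 4/3
# L3 3
```

Under these assumed utilities the player would rank $L_1$ above $L_3$ above $L_2$; a different choice of utilities for the outcomes could produce a different ranking.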
2.3 The axioms of utility theory
Given the observations of the previous section, we would like to identify which preference relations $\succsim_i$ over lotteries can be represented by linear utility functions $u_i$. The first requirement that must be imposed is that the preference relation be extended beyond the set of simple lotteries to a larger set: the set of compound lotteries.

Definition 2.11 A compound lottery is a lottery of lotteries. A compound lottery is therefore given by

$$\hat{L} = [q_1(L_1), q_2(L_2), \ldots, q_J(L_J)], \tag{2.13}$$

where $q_1, \ldots, q_J$ are nonnegative numbers summing to 1, and $L_1, \ldots, L_J$ are lotteries in $\mathcal{L}$. This means that for each $1 \leq j \leq J$ there are nonnegative numbers $(p_k^j)_{k=1}^{K}$ summing to 1 such that

$$L_j = [p_1^j(A_1), p_2^j(A_2), \ldots, p_K^j(A_K)]. \tag{2.14}$$

¹ Given the identification of outcomes with lotteries, we use the notation $u_i(A_k)$ to denote the utility of the lottery in Equation (2.10), in which the probability of receiving outcome $A_k$ is one.

Figure 2.2 An example of a compound lottery (tree diagram: probability $\frac{3}{4}$ of a lottery yielding $A_1$ with probability $\frac{2}{3}$ and $A_2$ with probability $\frac{1}{3}$; probability $\frac{1}{4}$ of a lottery yielding $A_5$ and $A_7$ with probability $\frac{1}{2}$ each)
Compound lotteries naturally arise in many situations. Consider, for example, an individual who chooses his route to work based on the weather: on rainy days he travels by Route 1, and on sunny days he travels by Route 2. Travel time along each route varies, because it depends on many factors (beyond the weather). We are therefore dealing with a “travel time to work” random variable, whose value depends on a lottery of a lottery: there is some probability that tomorrow morning will be rainy, in which case travel time will be determined by a probability distribution depending on the factors affecting travel along Route 1, and there is a complementary probability that tomorrow will be sunny, so that travel time will be determined by a probability distribution depending on the factors affecting travel along Route 2.

We will show in the sequel that under proper assumptions there is no need to consider lotteries that are more compound than compound lotteries, namely, lotteries of compound lotteries. All our analysis can be conducted by limiting consideration to only one level of compounding. To distinguish between the two types of lotteries with which we will be working, we will call the lotteries $L \in \mathcal{L}$ simple lotteries. The set of compound lotteries is denoted by $\hat{\mathcal{L}}$.

A graphic depiction of a compound lottery appears in Figure 2.2. Denoting $L_1 = [\frac{2}{3}(A_1), \frac{1}{3}(A_2)]$ and $L_2 = [\frac{1}{2}(A_5), \frac{1}{2}(A_7)]$, the compound lottery in Figure 2.2 is

$$\hat{L} = [\tfrac{3}{4}(L_1), \tfrac{1}{4}(L_2)]. \tag{2.15}$$

Every simple lottery $L$ can be identified with the compound lottery $\hat{L}$ that yields the simple lottery $L$ with probability 1:

$$\hat{L} = [1(L)]. \tag{2.16}$$
As every outcome $A_k$ is identified with the simple lottery

$$L = [0(A_1), \ldots, 0(A_{k-1}), 1(A_k), 0(A_{k+1}), \ldots, 0(A_K)], \tag{2.17}$$

it follows that an outcome $A_k$ is also identified with the compound lottery $[1(L)]$, in which $L$ is the simple lottery defined in Equation (2.17). Given these identifications, the space we will work with will be the set of compound lotteries,² which includes within it the set of simple lotteries $\mathcal{L}$, and the set of outcomes $O$. We will assume from now on that the preference relation $\succsim_i$ is defined over the set of compound lotteries. Player $i$’s utility function, representing his preference relation $\succsim_i$, is therefore a function $u_i : \hat{\mathcal{L}} \to \mathbb{R}$ satisfying

$$u_i(\hat{L}_1) \geq u_i(\hat{L}_2) \iff \hat{L}_1 \succsim_i \hat{L}_2, \quad \forall \hat{L}_1, \hat{L}_2 \in \hat{\mathcal{L}}. \tag{2.18}$$
Given the identification of outcomes with simple lotteries, $u_i(A_k)$ and $u_i(L)$ denote the utility of compound lotteries corresponding to the outcome $A_k$ and the simple lottery $L$, respectively. Because the preference relation is complete, it determines the preference between any two outcomes $A_k$ and $A_l$. Since it is transitive, the outcomes can be ordered, from most preferred to least preferred. We will number the outcomes (recall that the set of outcomes is finite) in such a way that

$$A_K \succsim_i \cdots \succsim_i A_2 \succsim_i A_1. \tag{2.19}$$

2.3.1 Continuity

Every reasonable decision maker will prefer receiving $300 to $100, and prefer receiving $100 to $0, that is,

$$\$300 \succ_i \$100 \succ_i \$0. \tag{2.20}$$

It is also a reasonable assumption that a decision maker will prefer receiving $300 with probability 0.9999 (and $0 with probability 0.0001) to receiving $100 with probability 1. It is likewise reasonable to assume he would prefer receiving $100 with probability 1 to receiving $300 with probability 0.0001 (and $0 with probability 0.9999). Formally,

$$[0.9999(\$300), 0.0001(\$0)] \succ_i \$100 \succ_i [0.0001(\$300), 0.9999(\$0)].$$

The higher the probability of receiving $300 (and correspondingly, the lower the probability of receiving $0), the more the lottery will be preferred. By continuity, it is reasonable to suppose that there will be a particular probability $p$ at which the decision maker will be indifferent between receiving $100 and a lottery granting $300 with probability $p$ and $0 with probability $1 - p$:

$$\$100 \approx_i [p(\$300), (1 - p)(\$0)]. \tag{2.21}$$
² The set of lotteries, as well as the set of compound lotteries, depends on the set of outcomes $O$, so that in fact we should denote the set of lotteries by $\mathcal{L}(O)$, and the set of compound lotteries by $\hat{\mathcal{L}}(O)$. For the sake of readability, we take the underlying set of outcomes $O$ to be fixed, and we will not specify this dependence in our formal presentation.
The exact value of $p$ will vary depending on the decision maker: a pension fund making many investments is interested in maximizing expected profits, and its $p$ will likely be close to $\frac{1}{3}$. The $p$ of a risk-averse individual will be higher than $\frac{1}{3}$, whereas for the risk lovers among us $p$ will be less than $\frac{1}{3}$. Furthermore, the size of $p$, even for one individual, may be situation-dependent: for example, a person may generally be risk averse, and have $p$ higher than $\frac{1}{3}$. However, if this person has a pressing need to return a debt of $200, then $100 will not help him, and his $p$ may be temporarily lower than $\frac{1}{3}$, despite his risk aversion. The next axiom encapsulates the idea behind this example.

Axiom 2.12 (Continuity) For every triplet of outcomes $A \succsim_i B \succsim_i C$, there exists a number $\theta_i \in [0, 1]$ such that

$$B \approx_i [\theta_i(A), (1 - \theta_i)(C)]. \tag{2.22}$$

2.3.2 Monotonicity

Every reasonable decision maker will prefer to increase his probability of receiving a more-preferred outcome and lower the probability of receiving a less-preferred outcome. This natural property is captured in the next axiom.

Axiom 2.13 (Monotonicity) Let $\alpha, \beta$ be numbers in $[0, 1]$, and suppose that $A \succ_i B$. Then

$$[\alpha(A), (1 - \alpha)(B)] \succsim_i [\beta(A), (1 - \beta)(B)] \tag{2.23}$$

if and only if $\alpha \geq \beta$.

Assuming the Axioms of Continuity and Monotonicity yields the next theorem, whose proof is left to the reader (Exercise 2.4).

Theorem 2.14 If a preference relation satisfies the Axioms of Continuity and Monotonicity, and if $A \succsim_i B \succsim_i C$, and $A \succ_i C$, then the value of $\theta_i$ defined in the Axiom of Continuity is unique.

Corollary 2.15 If a preference relation $\succsim_i$ over $\hat{\mathcal{L}}$ satisfies the Axioms of Continuity and Monotonicity, and if $A_K \succ_i A_1$, then for each $k = 1, 2, \ldots, K$ there exists a unique $\theta_i^k \in [0, 1]$ such that

$$A_k \approx_i [\theta_i^k(A_K), (1 - \theta_i^k)(A_1)]. \tag{2.24}$$
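The indifference probability in the $300/$100/$0 example can be computed explicitly once one assumes, ahead of Theorem 2.18, that the player evaluates lotteries by expected utility. The sketch below rests on that assumption; the two utility functions (`risk_neutral` and the concave `risk_averse`) and the helper `indifference_probability` are hypothetical choices made only to illustrate the discussion of $p$ above.

```python
import math


def indifference_probability(u, best, middle, worst):
    """Solve u(middle) = t * u(best) + (1 - t) * u(worst) for t."""
    return (u(middle) - u(worst)) / (u(best) - u(worst))


def risk_neutral(x):
    """Hypothetical utility equal to the dollar amount."""
    return x


def risk_averse(x):
    """Hypothetical concave utility (square root of the dollar amount)."""
    return math.sqrt(x)


p_neutral = indifference_probability(risk_neutral, 300, 100, 0)
p_averse = indifference_probability(risk_averse, 300, 100, 0)
print(round(p_neutral, 4))  # 0.3333
print(round(p_averse, 4))   # 0.5774
```

The expected-profit maximizer is indifferent at $p = \frac{1}{3}$, while the assumed concave utility requires a higher probability of the $300 prize, in line with the description of a risk-averse individual.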
The corollary and the fact that $A_1 \approx_i [0(A_K), 1(A_1)]$ and $A_K \approx_i [1(A_K), 0(A_1)]$ imply that

$$\theta_i^1 = 0, \quad \theta_i^K = 1. \tag{2.25}$$

2.3.3 Simplification of lotteries

The next axiom states that the only considerations that determine the preference between lotteries are the probabilities attached to each outcome, and not the way that the lottery is conducted. For example, if we consider the lottery in Figure 2.2, with respect to the probabilities attached to each outcome that lottery is equivalent to lottery $L_3$ in Figure 2.1: in both lotteries the probability of receiving outcome $A_1$ is $\frac{1}{2}$, the probability of receiving outcome $A_2$ is $\frac{1}{4}$, the probability of receiving outcome $A_5$ is $\frac{1}{8}$, and the probability of receiving outcome $A_7$ is $\frac{1}{8}$. The next axiom captures the intuition that it is reasonable to suppose that a player will be indifferent between these two lotteries.

Axiom 2.16 (Axiom of Simplification of Compound Lotteries) For each $j = 1, \ldots, J$, let $L_j$ be the following simple lottery:

$$L_j = [p_1^j(A_1), p_2^j(A_2), \ldots, p_K^j(A_K)], \tag{2.26}$$

and let $\hat{L}$ be the following compound lottery:

$$\hat{L} = [q_1(L_1), q_2(L_2), \ldots, q_J(L_J)]. \tag{2.27}$$

For each $k = 1, \ldots, K$ define

$$r_k = q_1 p_k^1 + q_2 p_k^2 + \cdots + q_J p_k^J; \tag{2.28}$$

this is the overall probability that the outcome of the compound lottery $\hat{L}$ will be $A_k$. Consider the simple lottery

$$L = [r_1(A_1), r_2(A_2), \ldots, r_K(A_K)]. \tag{2.29}$$

Then

$$\hat{L} \approx_i L. \tag{2.30}$$
As noted above, the motivation for the axiom is that it should not matter whether a lottery is conducted in a single stage or in several stages, provided the probability of receiving the various outcomes is identical in the two lotteries. The axiom ignores all aspects of the lottery except for the overall probability attached to each outcome, so that, for example, it takes no account of the possibility that conducting a lottery in several stages might make participants feel tense, which could alter their preferences, or their readiness to accept risk.
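The reduction in Equation (2.28) can be carried out mechanically. The sketch below, with a hypothetical helper `simplify` and lotteries encoded as dictionaries (both assumptions of this illustration), checks that the compound lottery of Figure 2.2 reduces to the simple lottery $L_3$ of Figure 2.1.

```python
from fractions import Fraction as F


def simplify(compound):
    """Reduce [(q_j, L_j), ...] to a simple lottery via Equation (2.28)."""
    r = {}
    for q, simple in compound:
        for outcome, p in simple.items():
            # r_k accumulates q_j * p_k^j over all simple lotteries L_j.
            r[outcome] = r.get(outcome, F(0)) + q * p
    return r


# The simple lotteries making up the compound lottery of Figure 2.2.
L1 = {"A1": F(2, 3), "A2": F(1, 3)}
L2 = {"A5": F(1, 2), "A7": F(1, 2)}
compound = [(F(3, 4), L1), (F(1, 4), L2)]  # Equation (2.15)

# Lottery L3 of Figure 2.1.
L3 = {"A1": F(1, 2), "A2": F(1, 4), "A5": F(1, 8), "A7": F(1, 8)}

print(simplify(compound) == L3)  # True
```

Exact rational arithmetic is used so that the comparison with $L_3$ is not affected by floating-point rounding.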
2.3.4 Independence

Our last requirement regarding the preference relation $\succsim_i$ relates to the following scenario. Suppose that we create a new compound lottery out of a given compound lottery by replacing one of the simple lotteries involved in the compound lottery with a different simple lottery. The axiom then requires a player who is indifferent between the original simple lottery and its replacement to be indifferent between the two corresponding compound lotteries.

Axiom 2.17 (Independence) Let $\hat{L} = [q_1(L_1), \ldots, q_J(L_J)]$ be a compound lottery, and let $M$ be a simple lottery. If $L_j \approx_i M$ then

$$\hat{L} \approx_i [q_1(L_1), \ldots, q_{j-1}(L_{j-1}), q_j(M), q_{j+1}(L_{j+1}), \ldots, q_J(L_J)]. \tag{2.31}$$

One can extend the Axioms of Simplification and Independence to compound lotteries of any order (i.e., lotteries over lotteries over lotteries . . . over lotteries over outcomes) in a natural way. By induction over the levels of compounding, it follows that the player’s preference relation over all compound lotteries (of any order) is determined by the player’s preference relation over simple lotteries (why?).
2.4 The characterization theorem for utility functions
The next theorem characterizes when a player has a linear utility function.

Theorem 2.18 If player $i$’s preference relation $\succsim_i$ over $\hat{\mathcal{L}}$ is complete and transitive, and satisfies the four von Neumann–Morgenstern axioms (Axioms 2.12, 2.13, 2.16, and 2.17), then this preference relation can be represented by a linear utility function.

The next example shows how a player whose preference relation satisfies the von Neumann–Morgenstern axioms compares two lotteries based on his utility from the outcomes of the lottery.
Example 2.19 Suppose that Joshua is choosing which of the following two lotteries he prefers:

• $[\frac{1}{2}(\text{New car}), \frac{1}{2}(\text{New computer})]$: a lottery in which his probability of receiving a new car is $\frac{1}{2}$, and his probability of receiving a new computer is $\frac{1}{2}$.
• $[\frac{1}{3}(\text{New motorcycle}), \frac{2}{3}(\text{Trip around the world})]$: a lottery in which his probability of receiving a new motorcycle is $\frac{1}{3}$, and his probability of receiving a trip around the world is $\frac{2}{3}$.

Suppose that Joshua’s preference relation over the set of lotteries satisfies the von Neumann–Morgenstern axioms. Then Theorem 2.18 implies that there is a linear utility function $u$ representing his preference relation. Suppose that according to this function $u$:

$$u(\text{New car}) = 25, \quad u(\text{Trip around the world}) = 14, \quad u(\text{New motorcycle}) = 3, \quad u(\text{New computer}) = 1.$$

Then Joshua’s utility from the first lottery is

$$u\left([\tfrac{1}{2}(\text{New car}), \tfrac{1}{2}(\text{New computer})]\right) = \tfrac{1}{2} \times 25 + \tfrac{1}{2} \times 1 = 13, \tag{2.32}$$

and his utility from the second lottery is

$$u\left([\tfrac{1}{3}(\text{New motorcycle}), \tfrac{2}{3}(\text{Trip around the world})]\right) = \tfrac{1}{3} \times 3 + \tfrac{2}{3} \times 14 = \tfrac{31}{3} = 10\tfrac{1}{3}. \tag{2.33}$$

It follows that he prefers the first lottery (whose outcomes are a new car and a new computer) to the second lottery (whose outcomes are a new motorcycle and a trip around the world). ◭
Proof of Theorem 2.18: We first assume that $A_K \succ_i A_1$, i.e., the most-desired outcome $A_K$ is strictly preferred to the least-desired outcome $A_1$. If $A_1 \approx_i A_K$, then by transitivity, the player is indifferent between all the outcomes. That case is simple to handle, and we will deal with it at the end of the proof.

Step 1: Definition of a function $u_i$ over the set of lotteries.

By Corollary 2.15, for each $1 \leq k \leq K$ there exists a unique real number $0 \leq \theta_i^k \leq 1$ satisfying

$$A_k \approx_i [\theta_i^k(A_K), (1 - \theta_i^k)(A_1)]. \tag{2.34}$$

We now define a function $u_i$ over the set of compound lotteries $\hat{\mathcal{L}}$. Suppose $\hat{L} = [q_1(L_1), \ldots, q_J(L_J)]$ is a compound lottery, in which $q_1, \ldots, q_J$ are nonnegative numbers summing to 1, and $L_1, \ldots, L_J$ are simple lotteries given by $L_j = [p_1^j(A_1), \ldots, p_K^j(A_K)]$. For each $1 \leq k \leq K$ define

$$r_k = q_1 p_k^1 + q_2 p_k^2 + \cdots + q_J p_k^J. \tag{2.35}$$

This is the probability that the outcome of the lottery is $A_k$. Define a function $u_i$ on the set of compound lotteries $\hat{\mathcal{L}}$:

$$u_i(\hat{L}) = r_1 \theta_i^1 + r_2 \theta_i^2 + \cdots + r_K \theta_i^K. \tag{2.36}$$

It follows from (2.36) that, in particular, every simple lottery $L = [p_1(A_1), \ldots, p_K(A_K)]$ satisfies

$$u_i(L) = \sum_{k=1}^{K} p_k \theta_i^k. \tag{2.37}$$
Step 2: $u_i(A_k) = \theta_i^k$ for all $1 \leq k \leq K$.

Outcome $A_k$ is equivalent to the lottery $L = [1(A_k)]$, which in turn is equivalent to the compound lottery $\hat{L} = [1(L)]$. The outcome of this lottery $\hat{L}$ is $A_k$ with probability 1, so that in this case

$$r_l = \begin{cases} 1 & \text{if } l = k, \\ 0 & \text{if } l \neq k. \end{cases} \tag{2.38}$$

We deduce that

$$u_i(A_k) = \theta_i^k, \quad \forall k \in \{1, 2, \ldots, K\}. \tag{2.39}$$

Since $\theta_i^1 = 0$ and $\theta_i^K = 1$, we deduce that in particular $u_i(A_1) = 0$ and $u_i(A_K) = 1$.

Step 3: The function $u_i$ is linear.

To show that $u_i$ is linear, it suffices to show that for each simple lottery $L = [p_1(A_1), \ldots, p_K(A_K)]$,

$$u_i(L) = \sum_{k=1}^{K} p_k u_i(A_k). \tag{2.40}$$

This equation holds, because Equation (2.37) implies that the left-hand side of this equation equals $\sum_{k=1}^{K} p_k \theta_i^k$, and Equation (2.39) implies that the right-hand side also equals $\sum_{k=1}^{K} p_k \theta_i^k$.
Step 4: $\hat{L} \approx_i [u_i(\hat{L})(A_K), (1 - u_i(\hat{L}))(A_1)]$ for every compound lottery $\hat{L}$.

Let $\hat{L} = [q_1(L_1), \ldots, q_J(L_J)]$ be a compound lottery, where

$$L_j = [p_1^j(A_1), \ldots, p_K^j(A_K)], \quad \forall j = 1, 2, \ldots, J. \tag{2.41}$$

Denote, as before,

$$r_k = \sum_{j=1}^{J} q_j p_k^j, \quad \forall k = 1, 2, \ldots, K. \tag{2.42}$$

By the Simplification Axiom,

$$\hat{L} \approx_i [r_1(A_1), r_2(A_2), \ldots, r_K(A_K)]. \tag{2.43}$$

Denote $M_k = [\theta_i^k(A_K), (1 - \theta_i^k)(A_1)]$ for every $1 \leq k \leq K$. By definition, $A_k \approx_i M_k$ for every $1 \leq k \leq K$. Therefore, $K$ applications of the Independence Axiom to the lottery in Equation (2.43) yield

$$\hat{L} \approx_i [r_1(M_1), r_2(M_2), \ldots, r_K(M_K)]. \tag{2.44}$$

Since all the lotteries $(M_k)_{k=1}^{K}$ are lotteries over outcomes $A_1$ and $A_K$, the lottery on the right-hand side of Equation (2.44) is also a lottery over these two outcomes. Therefore, if we denote by $r^*$ the total probability of $A_K$ in the lottery on the right-hand side of Equation (2.44), then

$$r^* = \sum_{k=1}^{K} r_k \theta_i^k = u_i(\hat{L}), \tag{2.45}$$

and the Simplification Axiom implies that

$$\hat{L} \approx_i [r^*(A_K), (1 - r^*)(A_1)] = [u_i(\hat{L})(A_K), (1 - u_i(\hat{L}))(A_1)]. \tag{2.46}$$
Step 5: The function $u_i$ is a utility function.

To prove that $u_i$ is a utility function, we need to show that for any pair of compound lotteries $\hat{L}$ and $\hat{L}'$ in $\hat{\mathcal{L}}$,

$$\hat{L} \succsim_i \hat{L}' \iff u_i(\hat{L}) \geq u_i(\hat{L}'), \tag{2.47}$$

and this follows from Step 4 and the Monotonicity Axiom. This concludes the proof, under the assumption that $A_K \succ_i A_1$.

We next turn to deal with the degenerate case in which the player is indifferent between all the outcomes:

$$A_1 \approx_i A_2 \approx_i \cdots \approx_i A_K. \tag{2.48}$$

By the Axioms of Independence and Simplification, the player is indifferent between any two simple lotteries. To see why, consider the simple lottery $L = [p_1(A_1), \ldots, p_K(A_K)]$. By repeated use of the Axiom of Independence,

$$L \approx_i [p_1(A_1), p_2(A_1), \ldots, p_K(A_1)]. \tag{2.49}$$

The Axiom of Simplification implies that $L \approx_i [1(A_1)]$, so every compound lottery $\hat{L}$ satisfies $\hat{L} \approx_i [1(A_1)]$. It follows that the player is indifferent between any two compound lotteries, so that any constant function $u_i$ represents his preference relation.

Theorem 2.18 implies that if a player’s preference relation satisfies the von Neumann–Morgenstern axioms, then in order to know the player’s preferences over lotteries it suffices to know the utility he attaches to each individual outcome, because the utility of any lottery can then be calculated from these utilities (see Equation (2.37) and Example 2.19).

Note that the linearity of utility functions in the probabilities of the individual outcomes, together with the Axiom of Simplification, implies the linearity of utility functions in the probabilities of simple lotteries. In words, if $L_1$ and $L_2$ are simple lotteries and $\hat{L} = [q(L_1), (1 - q)(L_2)]$, then $u_i(\hat{L}) = q\,u_i(L_1) + (1 - q)\,u_i(L_2)$ (see Exercise 2.11).
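The construction in Step 1 of the proof, and the linearity in the probabilities of simple lotteries noted in the last paragraph, can be verified on a small instance. In the sketch below the indifference probabilities in `theta`, the four-outcome set, and the function names `u_simple` and `u_compound` are all hypothetical; only the formulas follow Equations (2.35)–(2.37).

```python
from fractions import Fraction as F

# Hypothetical indifference probabilities theta_k (as in Corollary 2.15),
# with theta = 0 for the worst outcome A1 and theta = 1 for the best, A4.
theta = {"A1": F(0), "A2": F(1, 4), "A3": F(2, 3), "A4": F(1)}


def u_simple(lottery):
    """Equation (2.37): u(L) = sum over k of p_k * theta_k."""
    return sum(p * theta[a] for a, p in lottery.items())


def u_compound(compound):
    """Equations (2.35)-(2.36): compute r_k, then sum r_k * theta_k."""
    r = {}
    for q, simple in compound:
        for a, p in simple.items():
            r[a] = r.get(a, F(0)) + q * p
    return sum(r_k * theta[a] for a, r_k in r.items())


L1 = {"A1": F(1, 2), "A3": F(1, 2)}
L2 = {"A2": F(1, 5), "A4": F(4, 5)}
q = F(1, 3)

lhs = u_compound([(q, L1), (1 - q, L2)])         # u of [q(L1), (1 - q)(L2)]
rhs = q * u_simple(L1) + (1 - q) * u_simple(L2)  # q u(L1) + (1 - q) u(L2)
print(lhs == rhs)  # True
```

By Step 4 of the proof, the common value is also the probability of the best outcome in the two-outcome lottery to which the player is indifferent.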
2.5 Utility functions and affine transformations
Definition 2.20 Let $u : X \to \mathbb{R}$ be a function. A function $v : X \to \mathbb{R}$ is a positive affine transformation of $u$ if there exist a positive real number $\alpha > 0$ and a real number $\beta$ such that

$$v(x) = \alpha u(x) + \beta, \quad \forall x \in X. \tag{2.50}$$

The definition implies that if $v$ is a positive affine transformation of $u$, then $u$ is a positive affine transformation of $v$ (Exercise 2.19). The next theorem states that every positive affine transformation of a linear utility function is also a linear utility function.

Theorem 2.21 If $u_i$ is a linear utility function representing player $i$’s preference relation $\succsim_i$, then every positive affine transformation of $u_i$ is also a linear utility function representing $\succsim_i$.

Proof: Let $\succsim_i$ be player $i$’s preference relation, and let $v_i = \alpha u_i + \beta$ be a positive affine transformation of $u_i$. In particular, $\alpha > 0$. The first step is to show that $v_i$ is a utility function representing $\succsim_i$. Let $\hat{L}_1$ and $\hat{L}_2$ be compound lotteries. We will show that $\hat{L}_1 \succsim_i \hat{L}_2$ if and only if $v_i(\hat{L}_1) \geq v_i(\hat{L}_2)$. Note that since $u_i$ is a utility function representing $\succsim_i$,

$$\hat{L}_1 \succsim_i \hat{L}_2 \iff u_i(\hat{L}_1) \geq u_i(\hat{L}_2) \tag{2.51}$$
$$\iff \alpha u_i(\hat{L}_1) + \beta \geq \alpha u_i(\hat{L}_2) + \beta \tag{2.52}$$
$$\iff v_i(\hat{L}_1) \geq v_i(\hat{L}_2), \tag{2.53}$$

which is what we needed to show.
Next, we need to show that $v_i$ is linear. Let $L = [p_1(A_1), p_2(A_2), \ldots, p_K(A_K)]$ be a simple lottery. Since $p_1 + p_2 + \cdots + p_K = 1$, and $u_i$ is linear, we get

$$v_i(L) = \alpha u_i(L) + \beta \tag{2.54}$$
$$= \alpha\bigl(p_1 u_i(A_1) + p_2 u_i(A_2) + \cdots + p_K u_i(A_K)\bigr) + (p_1 + p_2 + \cdots + p_K)\beta \tag{2.55}$$
$$= p_1 v_i(A_1) + p_2 v_i(A_2) + \cdots + p_K v_i(A_K), \tag{2.56}$$

which shows that $v_i$ is linear.
The next theorem states the opposite direction of the previous theorem. Its proof is left to the reader (Exercise 2.21). Theorem 2.22 If ui and vi are two linear utility functions representing player i’s preference relation, where that preference relation satisfies the von Neumann–Morgenstern axioms, then vi is a positive affine transformation of ui . Corollary 2.23 A preference relation of a player that satisfies the von Neumann– Morgenstern axioms is representable by a linear utility function that is uniquely determined up to a positive affine transformation.
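A quick numerical check of Theorem 2.21 is sketched below. The utility values, the constants `alpha` and `beta`, the sample lotteries, and the helper `expected` are all assumptions introduced for illustration; the check confirms that the transformed function is again linear and ranks the sample lotteries exactly as the original does.

```python
from fractions import Fraction as F
from itertools import permutations

# Hypothetical linear utility on three outcomes, and a positive affine map.
u = {"A1": F(0), "A2": F(2, 5), "A3": F(1)}
alpha, beta = F(3), F(-7)  # alpha > 0, as Definition 2.20 requires
v = {a: alpha * u[a] + beta for a in u}


def expected(lottery, util):
    """Linear utility of a simple lottery (Equation (2.12))."""
    return sum(p * util[a] for a, p in lottery.items())


lotteries = [
    {"A1": F(1, 2), "A3": F(1, 2)},
    {"A2": F(1)},
    {"A1": F(1, 4), "A2": F(1, 4), "A3": F(1, 2)},
]

# Linearity of v: the expectation of v equals alpha * u(L) + beta.
affine_ok = all(
    expected(L, v) == alpha * expected(L, u) + beta for L in lotteries
)
# Representation: v orders every pair of lotteries the same way u does.
same_order = all(
    (expected(L, u) >= expected(M, u)) == (expected(L, v) >= expected(M, v))
    for L, M in permutations(lotteries, 2)
)
print(affine_ok, same_order)  # True True
```

Replacing `alpha` with a negative number would keep the first check true but break the second, which is why the transformation must be positive.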
2.6 Infinite outcome set
We have so far assumed that the set of outcomes O is finite. A careful review of the proofs reveals that all the results above continue to hold if the following conditions are satisfied:
• The set of outcomes O is any set, finite or infinite.
• The set of simple lotteries L contains every lottery over a finite number of outcomes.
• The set of compound lotteries L̂ contains every lottery over a finite number of simple lotteries.
• The player has a complete, reflexive, and transitive preference relation over the set of compound lotteries L̂.
• There exists a (weakly) most-preferred outcome AK ∈ O: the player (weakly) prefers AK to any other outcome in O.
• There exists a (weakly) least-preferred outcome A1 ∈ O: the player (weakly) prefers any other outcome in O to A1.
In Exercise 2.22, the reader is asked to check that Theorems 2.18 and 2.22, and Corollary 2.23, hold in this general model.
2.7 Attitude towards risk
There are people who are risk averse, people who are risk neutral, and people who are risk seeking. The risk attitude of an individual can change over time; it may depend, for
example, on the individual’s family status or financial holdings. How does risk attitude affect a player’s utility function?
In this section, we will assume that the set of outcomes is given by the interval O = [−R, R]: the real number x ∈ [−R, R] represents the monetary outcome that the player receives. We will assume that every player prefers receiving more, in dollars, to receiving less, so that x ≻i y if and only if x > y. We will similarly assume that the player has a complete, reflexive, and transitive preference relation over the set of compound lotteries that satisfies the von Neumann–Morgenstern axioms. Denote by ui player i’s utility function. As previously noted, the function ui is determined by player i’s utility from every outcome of a lottery. These utilities are given by a real-valued function Ui : R → R. In words, for every x ∈ O,
$$U_i(x) := u_i([1(x)]). \tag{2.57}$$
Since players are assumed to prefer getting as large a monetary amount as possible, Ui is a monotonically increasing function. By the assumption that each player’s preference relation satisfies the von Neumann–Morgenstern axioms, it follows that for every simple lottery L = [p1(x1), p2(x2), . . . , pK(xK)],
$$u_i(L) = \sum_{k=1}^{K} p_k U_i(x_k) = \sum_{k=1}^{K} p_k u_i([1(x_k)]). \tag{2.58}$$
The significance of this equation is that the utility ui(L) of a lottery L is the expected utility of the resulting payoff. Given a lottery L = [p1(x1), p2(x2), . . . , pK(xK)] with a finite number of possible outcomes, we will denote by μL the expected value of L, given by
$$\mu_L = \sum_{k=1}^{K} p_k x_k. \tag{2.59}$$
Definition 2.24 A player i is termed risk neutral if for every lottery L with a finite number of possible outcomes,
$$u_i(L) = u_i([1(\mu_L)]). \tag{2.60}$$
A player i is termed risk averse if for every lottery L with a finite number of possible outcomes,
$$u_i(L) \leq u_i([1(\mu_L)]). \tag{2.61}$$
A player i is termed risk seeking (or risk loving) if for every lottery L with a finite number of possible outcomes,
$$u_i(L) \geq u_i([1(\mu_L)]). \tag{2.62}$$
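The comparison in Definition 2.24 can be carried out for a single lottery as in the sketch below. The three functions used for Ui are hypothetical examples chosen for illustration, the lottery is restricted to nonnegative amounts so that the square root is defined, and one lottery by itself does not establish a risk attitude, since the definition quantifies over every lottery.

```python
import math

def expected_value(lottery):
    """mu_L: the expected monetary value of a list of (probability, amount) pairs."""
    return sum(p * x for p, x in lottery)

def expected_utility(lottery, U):
    """u_i(L): the expected utility of the lottery under the function U."""
    return sum(p * U(x) for p, x in lottery)

def compare_with_sure_amount(lottery, U, tol=1e-12):
    """Return -1, 0, or 1 as u_i(L) is below, equal to, or above U(mu_L)."""
    gap = expected_utility(lottery, U) - U(expected_value(lottery))
    if abs(gap) <= tol:
        return 0
    return 1 if gap > 0 else -1

# An even-odds lottery between $0 and $100 (illustrative numbers).
lottery = [(0.5, 0.0), (0.5, 100.0)]

concave_case = compare_with_sure_amount(lottery, math.sqrt)       # u_i(L) < U(mu_L)
linear_case = compare_with_sure_amount(lottery, lambda x: x)      # u_i(L) = U(mu_L)
convex_case = compare_with_sure_amount(lottery, lambda x: x * x)  # u_i(L) > U(mu_L)
print(concave_case, linear_case, convex_case)
```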
Using Definition 2.24, to establish a player’s risk attitude, we need to compare the utility he ascribes to every lottery with the utility he ascribes to the expected value of that lottery. Conducting such a comparison can be exhausting, because it involves checking the condition with respect to every possible lottery. The next theorem, whose proof is
left to the reader (Exercise 2.23), shows that it suffices to conduct the comparisons only between lotteries involving pairs of outcomes.
Theorem 2.25 A player i is risk neutral if and only if for each p ∈ [0, 1] and every pair of outcomes x, y ∈ R,
$$u_i([p(x), (1-p)(y)]) = u_i([1(px + (1-p)y)]). \tag{2.63}$$
A player i is risk averse if and only if for each p ∈ [0, 1] and every pair of outcomes x, y ∈ R,
$$u_i([p(x), (1-p)(y)]) \leq u_i([1(px + (1-p)y)]). \tag{2.64}$$
A player i is risk seeking if and only if for each p ∈ [0, 1] and every pair of outcomes x, y ∈ R,
$$u_i([p(x), (1-p)(y)]) \geq u_i([1(px + (1-p)y)]). \tag{2.65}$$
Example 2.26 Consider a player whose preference relation is represented by the utility function Ui (x) that is depicted in Figure 2.3, which is concave.
[Figure 2.3 The utility function of a risk-averse player. Horizontal axis: x, with the points y, αy + (1 − α)w, and w marked. Vertical axis: ui([1(x)]) = Ui(x), with the values Ui(y), αUi(y) + (1 − α)Ui(w), Ui(αy + (1 − α)w), and Ui(w) marked.]
The figure depicts the graph of the function Ui, which associates each x with the utility of the player from definitely receiving outcome x (see Equation (2.57)). We will show that the concavity of the function Ui is an expression of the fact that player i is risk averse. Since the function Ui is concave, the chord connecting any two points on the graph of the function never passes above the graph. Hence for every y, w ∈ R and every α ∈ (0, 1),
\begin{align}
u_i([1(\alpha y + (1-\alpha)w)]) &= U_i(\alpha y + (1-\alpha)w) \tag{2.66}\\
&\geq \alpha U_i(y) + (1-\alpha)U_i(w) \tag{2.67}\\
&= \alpha u_i([1(y)]) + (1-\alpha)u_i([1(w)]) \tag{2.68}\\
&= u_i([\alpha(y), (1-\alpha)(w)]). \tag{2.69}
\end{align}
In words, player i (weakly) prefers receiving with certainty the expectation αy + (1 − α)w to receiving y with probability α and w with probability 1 − α, which is precisely what risk aversion means. When Ui is strictly concave, as in Figure 2.3, the inequality in (2.67) is strict whenever y ≠ w. ◭
As Example 2.26 suggests, one’s attitude to risk can be described in simple geometrical terms, using the utility function.
Theorem 2.27 A player i, whose preference relation satisfies the von Neumann–Morgenstern axioms, is risk neutral if and only if Ui is a linear function, he is risk averse if and only if Ui is a concave function, and he is risk seeking if and only if Ui is a convex function.
Proof: Since by assumption the player’s preference relation satisfies the von Neumann–Morgenstern axioms, the utility of every simple lottery L = [p1(x1), p2(x2), . . . , pK(xK)] is given by
$$u_i(L) = \sum_{k=1}^{K} p_k u_i([1(x_k)]) = \sum_{k=1}^{K} p_k U_i(x_k). \tag{2.70}$$
A player is risk averse if and only if ui(L) ≤ ui([1(μL)]) = Ui(μL), or, in other words, if and only if
$$\sum_{k=1}^{K} p_k U_i(x_k) = u_i(L) \leq U_i(\mu_L) = U_i\left(\sum_{k=1}^{K} p_k x_k\right). \tag{2.71}$$
In summary, a player is risk averse if and only if
$$\sum_{k=1}^{K} p_k U_i(x_k) \leq U_i\left(\sum_{k=1}^{K} p_k x_k\right). \tag{2.72}$$
This inequality holds for every (x1, x2, . . . , xK) and for every vector of nonnegative numbers (p1, p2, . . . , pK) summing to 1, if and only if Ui is concave. Similarly, player i is risk seeking if and only if
$$\sum_{k=1}^{K} p_k U_i(x_k) \geq U_i\left(\sum_{k=1}^{K} p_k x_k\right). \tag{2.73}$$
This inequality holds for every (x1, x2, . . . , xK) and for every vector of nonnegative numbers (p1, p2, . . . , pK) summing to 1, if and only if Ui is convex. A player is risk neutral if and only if he is both risk averse and risk seeking. Since a function is both concave and convex if and only if it is linear, player i is risk neutral if and only if Ui is linear.
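The geometric criterion of Theorem 2.27 can be explored numerically by testing the two-outcome inequalities of Theorem 2.25 on a grid. The sketch below is a finite check and therefore only suggestive, not a proof; the function name classify_on_grid, the grid sizes, the tolerance, and the sample functions are all assumptions made for illustration.

```python
import math

def classify_on_grid(U, R=10.0, steps=21, tol=1e-9):
    """Test p*U(x) + (1-p)*U(y) against U(p*x + (1-p)*y) on a grid in [-R, R]."""
    xs = [-R + 2 * R * i / (steps - 1) for i in range(steps)]
    ps = [j / 10 for j in range(11)]
    averse = seeking = True
    for x in xs:
        for y in xs:
            for p in ps:
                lottery_utility = p * U(x) + (1 - p) * U(y)
                sure_utility = U(p * x + (1 - p) * y)
                if lottery_utility > sure_utility + tol:
                    averse = False   # chord lies above the graph somewhere
                if lottery_utility < sure_utility - tol:
                    seeking = False  # chord lies below the graph somewhere
    if averse and seeking:
        return "risk neutral"
    if averse:
        return "risk averse"
    if seeking:
        return "risk seeking"
    return "none of the three"

print(classify_on_grid(lambda x: 2 * x + 1))            # straight line
print(classify_on_grid(lambda x: -math.exp(-x / 10)))   # concave
print(classify_on_grid(lambda x: math.exp(x / 10)))     # convex
print(classify_on_grid(lambda x: x ** 3))               # concave then convex
```

The last case illustrates that a monotonically increasing Ui need not fall into any of the three categories.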
2.8 Subjective probability
A major milestone in the study of utility theory was attained in 1954, with Leonard Savage’s publication of The Foundations of Statistics. Savage generalized von Neumann and Morgenstern’s model, in which the probability of each outcome in every lottery is “objective” and known to the participants. That model is reasonable when the outcome is determined by a flip of a coin or a toss of dice, but in most of the lotteries we face in real life, probabilities are often unknown. Consider, for example, the probability of a major
earthquake occurring over the next year in the San Fernando Valley, or the probability that a particular candidate will win the next presidential election. The exact probabilities of these occurrences are unknown. Different people will differ in their assessments of these probabilities, which are subjective. In addition, as noted above, people often fail to perceive probability correctly, so that their perceptions contradict the laws of probability.
Savage supposed that there is an infinite set of states of the world, Ω; each state of the world is a complete description of all the variables characterizing the players, including the information they have. Players are asked to choose between “gambles,” which formally are functions f : Ω → O. What this means is that if a player chooses gamble f, and the state of the world (i.e., the true reality) is ω, then the outcome the player receives is f(ω). Players are assumed to have complete, reflexive, and transitive preference relations over the set of all gambles. For example, if E, F ⊂ Ω are two events, and A1, A2, A3, and A4 are outcomes, a player can compare a gamble in which he receives A1 if the true state is in E and A2 if the true state is not in E, with a gamble in which he receives A3 if the true state is in F and A4 if the true state is not in F.
Savage proved that if the preference relation of player i satisfies certain axioms, then there exists a probability distribution qi over Ω and a function ui : O → R representing player i’s preference relation. In other words, the player, by preferring one gamble to another, behaves as if he is maximizing expected utility, where the expected utility is calculated using the probability distribution qi:
$$u_i(f) = \int_{\Omega} u_i(f(\omega))\, dq_i(\omega). \tag{2.74}$$
Similarly to von Neumann–Morgenstern utility, the utility of f is the expected value of the utility of the outcomes, with qi representing player i’s subjective probability, and utility ui representing the player’s preferences (whether or not he is conscious of using a probability distribution and a utility function at all). A further development in subjective probability theory, slightly different from Savage’s, was published by Anscombe and Aumann [1963].
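To make Equation (2.74) concrete, the sketch below replaces the integral by a finite sum. This is a simplifying assumption, since Savage's model uses an infinite set of states; the state names, outcome utilities, and the two belief distributions are hypothetical.

```python
from fractions import Fraction

def subjective_expected_utility(gamble, q, u):
    """Sum over states of q(state) times the utility of the outcome f(state)."""
    return sum(q[state] * u[gamble[state]] for state in q)

# Hypothetical utilities of three outcomes.
u = {"A1": Fraction(0), "A2": Fraction(2, 5), "A3": Fraction(1)}

# Two gambles, i.e., functions from states of the world to outcomes.
f = {"quake": "A1", "no_quake": "A3"}  # bad outcome only if an earthquake occurs
g = {"quake": "A2", "no_quake": "A2"}  # the same middling outcome in every state

# Two players with the same utilities but different subjective probabilities.
q_optimist = {"quake": Fraction(1, 10), "no_quake": Fraction(9, 10)}
q_pessimist = {"quake": Fraction(7, 10), "no_quake": Fraction(3, 10)}

for name, q in (("optimist", q_optimist), ("pessimist", q_pessimist)):
    uf = subjective_expected_utility(f, q, u)
    ug = subjective_expected_utility(g, q, u)
    print(name, "prefers", "f" if uf > ug else "g")
```

With identical utilities over outcomes, the two players rank the gambles differently solely because their subjective probabilities differ.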
2.9 Discussion
Theoretically, a simple interview is all that is needed to ascertain a player’s utility function, assuming his preference relation satisfies the von Neumann–Morgenstern axioms. One can set the utility of A1, the least-preferred outcome, to be 0, the utility of AK, the most-preferred outcome, to be 1, and then find, for every k ∈ {2, 3, . . . , K − 1}, the value θ_i^k such that the player is indifferent between Ak and the lottery [θ_i^k(AK), (1 − θ_i^k)(A1)]. Experimental evidence shows that in interviews, people often give responses that indicate their preferences do not always satisfy the von Neumann–Morgenstern axioms. Here are some examples.
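Before turning to the examples, the idealized interview described above can be sketched as a bisection search for the indifference probability. Everything here is a hypothetical illustration: the function elicit_theta, the number of rounds, and the simulated respondent, who is assumed to answer consistently with a hidden utility of 0.35 for the outcome Ak.

```python
def elicit_theta(prefers_lottery, rounds=30):
    """Bisection on theta in [0, 1].

    prefers_lottery(theta) should return True when the respondent strictly
    prefers the lottery [theta(A_K), (1 - theta)(A_1)] to receiving A_k for sure.
    Monotonicity means a larger theta makes the lottery more attractive.
    """
    low, high = 0.0, 1.0
    for _ in range(rounds):
        mid = (low + high) / 2
        if prefers_lottery(mid):
            high = mid  # lottery too attractive, so indifference lies below mid
        else:
            low = mid   # lottery not attractive enough, so it lies at or above mid
    return (low + high) / 2

# Simulated respondent (assumption): utility of A_k is 0.35 on the 0-to-1 scale,
# so the lottery is strictly preferred exactly when theta exceeds 0.35.
hidden_utility = 0.35
theta = elicit_theta(lambda t: t > hidden_utility)
print(round(theta, 6))
```

A real respondent may answer inconsistently, in which case the search need not converge to a meaningful value; this is the kind of difficulty the examples below describe.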
2.9.1 The assumption of completeness
The assumption of completeness appears to be very reasonable, but it should not be regarded as self-evident. There are cases in which people find it difficult to express clear
preferences between outcomes. For example, imagine a child whose parents are divorced, who is asked whether he prefers a day with his mother or his father. Many children find the choice too difficult, and refuse to answer the question.
2.9.2 The assumption of transitivity
Give a person a sufficiently large set of choices between outcomes, and you are likely to discover that his declared preferences contradict the assumption of transitivity. Some of these “errors” can be corrected by presenting the player with evidence of inconsistencies, careful analysis of the answers, and attempts to correct the player’s valuations.
Violations of transitivity are not always due to inconsistencies on the part of an individual player. If a “player” is actually composed of a group of individuals, each of whom has a transitive preference relation, it is possible for the group’s collective preferences to be non-transitive. The next example illustrates this phenomenon.
Example 2.28 The Condorcet Paradox Three alternative political policies, A, B, and C, are being debated. It is suggested that a referendum be conducted to choose between them. The voters, however, have divided opinions on the relative preferences between the policies, as follows:
Democrats: A ≻D B ≻D C
Republicans: B ≻R C ≻R A
Independents: C ≻I A ≻I B
Suppose that the population is roughly equally divided between Democrats, Republicans, and Independents. It is possible to fashion a referendum that will result in a nearly two-thirds majority approving any one of the alternative policies. For example, if the referendum asks the electorate to choose between A and B, a majority will vote A ≻ B. If, instead, the referendum presents a choice between B and C, a majority will vote B ≻ C; and a similar result can be fashioned for C ≻ A. Which of these three policies, then, can we say the electorate prefers? The lack of transitivity in preferences resulting from the use of the majority rule is an important subject in “social choice theory” (see Chapter 21). This was first studied by Condorcet3 (see Example 21.1 on page 854). ◭
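The cycle in Example 2.28 can be verified by counting pairwise majorities. The sketch below assumes, as a simplification of "roughly equally divided," that the three groups have exactly equal weight; the variable and function names are illustrative.

```python
from itertools import permutations

# Each group's ranking, from most preferred to least preferred.
rankings = {
    "Democrats": ["A", "B", "C"],
    "Republicans": ["B", "C", "A"],
    "Independents": ["C", "A", "B"],
}
# Assumption: the three groups are exactly equal in size.
weights = {group: 1 for group in rankings}

def majority_prefers(x, y):
    """True if the groups ranking x above y outweigh those ranking y above x."""
    votes_x = sum(
        weight
        for group, weight in weights.items()
        if rankings[group].index(x) < rankings[group].index(y)
    )
    votes_y = sum(weights.values()) - votes_x
    return votes_x > votes_y

majority_wins = [(x, y) for x, y in permutations("ABC", 2) if majority_prefers(x, y)]
for x, y in majority_wins:
    print(f"a majority prefers {x} to {y}")
```

Each policy defeats one rival and loses to the other, each time by two groups to one, so the majority relation is not transitive even though every group's ranking is.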
2.9.3 Perceptions of probability
If a person’s preference relation over three possible outcomes A, B, and C satisfies A ≻ B ≻ C, we may by trial and error present him with various different probability values p, until we eventually identify a value p0 such that
$$B \approx [p_0(A), (1-p_0)(C)]. \tag{2.75}$$
Let’s say, for example, that the player reports that he is indifferent between the following:
$$\$7{,}000 \approx \left[\tfrac{2}{3}(\$20{,}000), \tfrac{1}{3}(\$0)\right]. \tag{2.76}$$
3 Marie Jean Antoine Nicolas Caritat, Marquis de Condorcet, 1743–94, was a French philosopher and mathematician who wrote about political science.
Empirically, however, if the same person is asked how large x must be in order for him to be indifferent between the following:
$$\$7{,}000 \approx \left[\tfrac{2}{3}(\$x), \tfrac{1}{3}(\$0)\right], \tag{2.77}$$
the answer often4 differs from $20,000. This shows that the perceptions of probability that often occur naturally to decision makers may diverge from the mathematical formulations. People are not born with internal calculators, and we must accept the fact that what people perceive may not always follow the laws of probability.
2.9.4 The Axiom of Simplification
The Axiom of Simplification states that the utility of a compound lottery depends solely on the probability it eventually assigns to each outcome. We have already noted that this ignores other aspects of compound lotteries; for example, it ignores the pleasure (or lack of pleasure) a participant gains from the very act of participating in a lottery. It is therefore entirely possible that a person may prefer a compound lottery to a simple lottery with exactly the same outcome probabilities, or vice versa.
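The probability that a compound lottery "eventually assigns to each outcome" can be computed mechanically, as in the sketch below. The representation of lotteries as lists of pairs, the function name reduce_compound, and the example lottery are assumptions made for illustration.

```python
from collections import defaultdict
from fractions import Fraction

def reduce_compound(compound):
    """Collapse a compound lottery into outcome probabilities.

    compound is a list of (q_j, simple_lottery) pairs, and each simple lottery
    is a list of (p_k, outcome) pairs.
    """
    totals = defaultdict(Fraction)  # missing outcomes start at probability 0
    for q, simple in compound:
        for p, outcome in simple:
            totals[outcome] += q * p
    return dict(totals)

half = Fraction(1, 2)
compound = [
    (half, [(half, "A1"), (half, "A2")]),  # first stage: a fair coin picks a lottery
    (half, [(Fraction(1), "A2")]),
]
print(reduce_compound(compound))
```

Under the Axiom of Simplification the player is indifferent between this compound lottery and the simple lottery that yields A1 with probability 1/4 and A2 with probability 3/4; the discussion above concerns people for whom the extra stage itself matters.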
2.9.5 Other aspects that can influence preferences
People’s preferences change over time and with changing circumstances. A person may prefer steak to roast beef today, and roast beef to steak tomorrow. One also needs to guard against drawing conclusions about preferences too quickly from answers given to interview questions, because the answers are liable to depend on the information available to the player. Take, for example, the following story, based on a similar story appearing in Luce and Raiffa [1957].
A man at a restaurant asks a waiter to list the available items on the menu. The waiter replies “steak and roast beef.” The man orders the roast beef. A few minutes later, the waiter returns and informs him that he forgot to note an additional item on the menu, filet mignon. “In that case,” says the restaurant guest, “I’ll have the steak, please.”
Does this behavior reveal inconsistency in preferences? Not necessarily. The man may love steak, but may also be concerned that in most restaurants, the steak is not served sufficiently tender to his taste. He therefore orders the roast beef, confident that most chefs know how to cook a decent roast. When he is informed that the restaurant serves filet mignon, he concludes that there is a high-quality chef in the kitchen, and feels more confident in the chef’s ability to prepare a tender steak. In other words, the fact that given a choice between steak and roast beef, a player chooses roast beef, does not necessarily mean that he prefers roast beef to steak. It may only indicate that the quality of the steak is unknown, in which case choosing “steak” may translate into a lottery between quality steak and intolerable steak. Before receiving additional information, the player ascribes low probability to receiving quality steak. After the additional information has been given, the probability of quality steak increases in the player’s estimation, thus affecting his choice. The player’s preference of steak to roast
4 The authors wish to thank Reinhard Selten for providing them with this example.
beef has not changed at all over time, but rather his perception of the lottery with which he is presented. This story illustrates how additional information can bring about changes in choices without contradicting the assumptions of utility theory.
Another story, this one a true event that occurred during the Second World War on the Pacific front,5 seems to contradict utility theory. A United States bomber squadron, charged with bombing Tokyo, was based on the island of Saipan, 3000 kilometers from the bombers’ targets. Given the vast distance the bombers had to cover, they flew without fighter-plane accompaniment and carried few bombs, in order to cut down on fuel consumption. Each pilot was scheduled to rotate back to the United States after 30 successful bombing runs, but Japanese air defenses were so efficient that only half the pilots sent on the missions managed to survive 30 bombing runs.
Experts in operations research calculated a way to raise the odds of overall pilot survival by increasing the bomb load carried by each plane – at the cost of placing only enough fuel in each plane to travel in one direction. The calculations indicated that increasing the number of bombs per plane would significantly reduce the number of required bombing runs, enabling three-quarters of the pilots to be rotated back to the United States immediately, without requiring them to undertake any more missions. The remaining pilots, however, would face certain death, since they would have no way of returning to base after dropping their bombs over Tokyo. If the pilots to be sent home were chosen randomly, then the pilots were, in fact, being offered the lottery
$$\left[\tfrac{3}{4}(\text{Life}), \tfrac{1}{4}(\text{Death})\right],$$
in place of their existing situation, which was equivalent to the lottery
$$\left[\tfrac{1}{2}(\text{Life}), \tfrac{1}{2}(\text{Death})\right].$$
Every single pilot rejected the suggested lottery outright. They all preferred their existing situation. Were the pilots lacking a basic understanding of probability? Were they contradicting the von Neumann–Morgenstern axioms? One possible explanation for why they failed to act in accordance with the axioms is that they were optimists by nature, believing that “it will not happen to me.” But there are other explanations, that do not necessarily lead to a rejection of standard utility theory. The choice between life and death may not have been the only factor that the pilots took into account. There may also have been moral issues, such as taboos against sending some comrades on certain suicide missions while others got to return home safely. In addition, survival rates are not fixed in war situations. There was always the chance that the war would take a dramatic turn, rendering the suicide missions unnecessary, or that another ingenious solution would be found. And indeed, a short time after the suicide mission suggestion was raised, American forces captured the island of Iwo Jima. The air base in Iwo Jima was sufficiently close to Tokyo, only
5 The story was related to the authors by Kenneth Arrow, who heard of it from Merrill F. Flood.
600 kilometers away, to enable fighter planes to accompany the bombers, significantly raising the survival rates of American bombers, and the suicide mission suggestion was rapidly consigned to oblivion.6
2.10 Remarks
The authors wish to thank Tzachi Gilboa and Peter Wakker for answering several questions that arose during the composition of this chapter. The Sure-Thing Principle, which appears in Exercise 2.12, first appeared in Savage [1954]. first presented in Marschak [1950] and Nash [1950a]. The property described in Exercise 2.14 is called “Betweenness.” Exercise 2.15 is based on a column written by John Branch in The New York Times on August 30, 2010. Exercise 2.25 is based on Rothschild and Stiglitz [1970], which also contains an example of the phenomenon appearing in Exercise 2.27. The Arrow–Pratt measure of absolute risk aversion, which appears in Exercise 2.28, was first defined by Arrow [1965] and Pratt [1964].
2.11 Exercises
2.1 Prove the following claims:
(a) A strict preference relation ≻ is anti-symmetric and transitive.7
(b) An indifference relation ≈ is symmetric and transitive.8
2.2 Prove Theorem 2.7 (page 11): let O be a set of outcomes, and let ≿ be a complete, reflexive, and transitive relation over O. Suppose that u is a utility function representing ≿. Prove that for every monotonically increasing function v : R → R, the composition v ◦ u defined by
$$(v \circ u)(x) = v(u(x)) \tag{2.78}$$
is also a utility function representing ≿.
2.3 Give an example of a countable set of outcomes O and a preference relation ≿ over O, such that every utility function representing ≿ must include values that are not integers.
2.4 Prove Theorem 2.14 (page 17): if a preference relation ≿i satisfies the axioms of continuity and monotonicity, and if A ≿i B ≿i C and A ≻i C, then there exists a unique number θi ∈ [0, 1] that satisfies
$$B \approx_i [\theta_i(A), (1-\theta_i)(C)]. \tag{2.79}$$
6 Bombing missions emanating from Iwo Jima also proved to be largely inefficient – only ten such missions were attempted – but American military advances in the Spring of 1945 rapidly made those unnecessary as well.
7 A relation ≻ is anti-symmetric if for each x, y, if x ≻ y, then it is not the case that y ≻ x.
8 A relation ≈ is symmetric if for each x, y, if x ≈ y, then y ≈ x.
2.5 Prove that the von Neumann–Morgenstern axioms are independent. In other words, for every axiom there exists a set of outcomes and a preference relation that does not satisfy that axiom but does satisfy the other three axioms.
2.6 Prove the converse of Theorem 2.18 (page 19): if there exists a linear utility function representing a preference relation ≿i of player i, then ≿i satisfies the von Neumann–Morgenstern axioms.
2.7 Suppose that a person whose preferences satisfy the von Neumann–Morgenstern axioms, and who always prefers more money to less money, says that:
• he is indifferent between receiving $500 and participating in a lottery in which he receives $1,000 with probability 2/3 and receives $0 with probability 1/3;
• he is indifferent between receiving $100 and participating in a lottery in which he receives $500 with probability 3/8 and receives $0 with probability 5/8.
(a) Find a linear utility function representing this person’s preferences, and in addition satisfying u($1,000) = 1 and u($0) = 0.
(b) Determine which of the following two lotteries will be preferred by this person:
• A lottery in which he receives $1,000 with probability 3/10, $500 with probability 1/10, $100 with probability 1/2, and $0 with probability 1/10, or
• A lottery in which he receives $1,000 with probability 2/10, $500 with probability 3/10, $100 with probability 2/10, and $0 with probability 3/10.
(c) Is it possible to ascertain which of the following two lotteries he will prefer? Justify your answer.
• A lottery in which he receives $1,000 with probability 3/10, $500 with probability 1/10, $100 with probability 1/2, and $0 with probability 1/10.
• Receiving $400 with probability 1.
(d) Is it possible to ascertain which of the following two lotteries he will prefer? Justify your answer.
• A lottery in which he receives $1,000 with probability 3/10, $500 with probability 1/10, $100 with probability 1/2, and $0 with probability 1/10.
• Receiving $600 with probability 1.
2.8 How would the preferences between the two lotteries in Exercise 2.7(b) change if u($1,000) = 8 and u($0) = 3? Justify your answer.
2.9 Suppose that a person whose preferences satisfy the von Neumann–Morgenstern axioms says that his preferences regarding outcomes A, B, C, and D satisfy
$$C \approx_i \left[\tfrac{3}{5}(A), \tfrac{2}{5}(D)\right], \quad B \approx_i \left[\tfrac{3}{4}(A), \tfrac{1}{4}(C)\right], \quad A \succ_i D. \tag{2.80}$$
Determine which of the following two lotteries will be preferred by this person:
$$L_1 = \left[\tfrac{2}{5}(A), \tfrac{1}{5}(B), \tfrac{1}{5}(C), \tfrac{1}{5}(D)\right] \quad \text{or} \quad L_2 = \left[\tfrac{2}{5}(B), \tfrac{3}{5}(C)\right]. \tag{2.81}$$
2.10 What would be your answer to Exercise 2.9 if D ≻i A instead of A ≻i D? Relate your answer to this exercise with your answer to Exercise 2.9.
2.11 Prove that if ui is a linear utility function, then
$$u_i(\widehat{L}) = \sum_{j=1}^{J} q_j u_i(L_j) \tag{2.82}$$
is satisfied for every compound lottery L̂ = [q1(L1), q2(L2), . . . , qJ(LJ)].
2.12 The Sure-Thing Principle Prove that a preference relation that satisfies the von Neumann–Morgenstern axioms also satisfies
$$[\alpha(L_1), (1-\alpha)(L_3)] \succ [\alpha(L_2), (1-\alpha)(L_3)] \tag{2.83}$$
if and only if
$$[\alpha(L_1), (1-\alpha)(L_4)] \succ [\alpha(L_2), (1-\alpha)(L_4)], \tag{2.84}$$
for any four lotteries L1, L2, L3, L4, and any α ∈ [0, 1].
2.13 Suppose a person whose preferences satisfy the von Neumann–Morgenstern axioms says that with respect to lotteries L1, L2, L3, L4, his preferences are L1 ≻ L2 and L3 ≻ L4. Prove that for all 0 ≤ α ≤ 1,
$$[\alpha(L_1), (1-\alpha)(L_3)] \succ [\alpha(L_2), (1-\alpha)(L_4)]. \tag{2.85}$$
2.14 Suppose a person whose preferences satisfy the von Neumann–Morgenstern axioms says that with respect to lotteries L1 and L2, his preference is L1 ≻ L2. Prove that for all 0 < α ≤ 1,
$$[\alpha(L_1), (1-\alpha)(L_2)] \succ L_2. \tag{2.86}$$
2.15 A tennis player who is serving at the beginning of a point has two attempts to serve; if the ball does not land within the white lines of the opponent’s court on his first attempt, he receives a second attempt. If the second attempt also fails to land in the opponent’s court, the serving player loses the point. If the ball lands in the opponent’s court during either attempt, the players volley the ball over the net until one or the other player wins the point. While serving, a player has two alternatives. He may strike the ball with great force, or with medium force. Statistics gathered from a large number of tennis matches indicate that if the server strikes the ball with great force, the ball lands in the opponent’s court with probability 0.65, with the server subsequently winning the point with probability 0.75. If, however, the server strikes the ball with medium force, the ball lands in the opponent’s court with probability 0.9, with the server subsequently winning the point with probability 0.5. In most cases, servers strike the ball with great force on their first-serve attempts, and with medium force on their second attempts. (a) Assume that there are two possible outcomes: winning a point or losing a point, and that the server’s preference relation over compound lotteries satisfies the von Neumann–Morgenstern axioms. Find a linear utility function representing the server’s preference relation.
(b) Write down the compound lottery that takes place when the server strikes the ball with great force, and when he strikes the ball with medium force.
(c) The server has four alternatives: two alternatives in his first-serve attempt (striking the ball with great force or with medium force), and similarly two alternatives in his second-serve attempt if the first attempt failed. Write down the compound lotteries corresponding to each of these four alternatives. Note that in this case the compound lotteries are of order 3: lotteries over lotteries over lotteries.
(d) Which compound lottery is most preferred by the server, out of the four compound lotteries you identified in item (c) above? Is this alternative the one chosen by most tennis players?
2.16 Ron eats yogurt every morning. Ron especially loves yogurt that comes with a small attached container containing white and dark chocolate balls, which he mixes into his yogurt prior to eating it. Because Ron prefers white chocolate to dark chocolate, he counts the number of white chocolate balls in the container, his excitement climbing higher the greater the number of white chocolate balls. One day, Ron’s brother Tom has an idea for increasing his brother’s happiness: he will write to the company producing the yogurt and ask them to place only white chocolate balls in the containers attached to the yogurt! To Tom’s surprise, Ron opposes this idea: he prefers the current situation, in which he does not know how many white chocolate balls are in the container, to the situation his brother is proposing, in which he knows that each container has only white chocolate balls. Answer the following questions.
(a) Write down the set of outcomes in this situation, and Ron’s preference relation over those outcomes.
(b) Does Ron’s preference relation over lotteries satisfy the von Neumann–Morgenstern axioms? Justify your answer.
2.17 A farmer wishes to dig a well in a square field whose coordinates are (0, 0), (0, 1000), (1000, 0), and (1000, 1000). The well must be located at a point whose coordinates (x, y) are integers. The farmer’s preferences are lexicographic: if x1 > x2, he prefers that the well be dug at the point (x1, y1) to the point (x2, y2), for all y1, y2. If x1 = x2, he prefers the first point only if y1 > y2. Does there exist a preference relation over compound lotteries over pairs of integers (x, y), 0 ≤ x, y ≤ 1000, that satisfies the von Neumann–Morgenstern axioms and extends the lexicographic preference relation? If so, give an example of a linear utility function representing such a preference relation, and if not, explain why such a preference relation does not exist.
2.18 In this exercise, we will show that in the situation described in Exercise 2.17, when the coordinates (x, y) can be any real numbers in the square [0, 1000]², there does not exist a utility function that represents the lexicographic preference relation. Suppose, by contradiction, that there does exist a utility function over [0, 1000]² that represents the lexicographic preference relation.
(a) Prove that for each (x, y) ∈ [0, 1000]² there exists a unique θx,y ∈ [0, 1] such that the farmer is indifferent between locating the well at point (x, y) and a
lottery in which the well is located at point (0, 0) with probability 1 − θx,y and located at point (1000, 1000) with probability θx,y.
(b) Prove that the function (x, y) ↦ θx,y is injective, that is, θx′,y′ ≠ θx,y whenever (x′, y′) ≠ (x, y).
(c) For each x, define Ax := {θx,y : y ∈ [0, 1000]}. Prove that for each x the set Ax contains at least two elements, and that the sets {Ax, x ∈ [0, 1]} are pairwise disjoint.
(d) Prove that if x1 < x2 then θ1 < θ2 for all θ1 ∈ Ax1 and for all θ2 ∈ Ax2.
(e) Prove that there does not exist a set {Ax : x ∈ [0, 1]} satisfying (c) and (d).
(f) Deduce that there does not exist a utility function over [0, 1000]² that represents the lexicographic preference relation.
(g) Which of the von Neumann–Morgenstern axioms is not satisfied by the preference relation in this exercise?
2.19 Prove that if v is a positive affine transformation of u, then u is a positive affine transformation of v.
2.20 Prove that if v is a positive affine transformation of u, and if w is a positive affine transformation of v, then w is a positive affine transformation of u.
2.21 Prove Theorem 2.22 (page 23): suppose a person’s preferences, which satisfy the von Neumann–Morgenstern axioms, are representable by two linear utility functions u and v. Prove that v is a positive affine transformation of u.
2.22 Let O be an infinite set of outcomes. Let L be the set of all lotteries over a finite number of outcomes in O, and let L̂ be the set of all compound lotteries over a finite number of simple lotteries in L. Suppose that a player has a complete, reflexive, and transitive preference relation ≿ over the set of compound lotteries L̂ that satisfies the von Neumann–Morgenstern axioms, and also satisfies the property that O contains a most-preferred outcome A_K, and a least-preferred outcome A_1, that is, A_K ≿ A ≿ A_1 holds for every outcome A in O. Answer the following questions:
(a) Prove Theorem 2.18 (page 19): there exists a linear utility function that represents the player’s preference relation.
(b) Prove Theorem 2.22 (page 23): if u and v are two linear utility functions of the player that represent ≿, then v is a positive affine transformation of u.
(c) Prove Corollary 2.23 (page 23): there exists a unique linear utility function (up to a positive affine transformation) representing the player’s preference relation.
2.23 Prove Theorem 2.25 on page 25.
2.24 Recall that a linear utility function u_i over lotteries with outcomes in the interval [−R, R] defines a utility function U_i over payoffs in the interval [−R, R] by setting U_i(x) := u_i([1(x)]). In the other direction, every function U_i : [−R, R] → ℝ defines a linear utility function u_i over lotteries with outcomes in the interval [−R, R] by u_i([p_1(x_1), p_2(x_2), . . . , p_K(x_K)]) := Σ_{k=1}^{K} p_k U_i(x_k).
For each of the following functions Ui defined on [−R, R], determine whether it defines a linear utility function of a risk-neutral, risk-averse, or risk-seeking player,
or none of the above: (a) 2x + 5, (b) −7x + 5, (c) 7x − 5, (d) x^2, (e) x^3, (f) e^x, (g) ln(x), (h) x for x ≥ 0, and 6x for x < 0, (i) 6x for x ≥ 0, and x for x < 0, (j) x^{3/2} for x ≥ 0, x for x < 0, (k) x/ln(2 + x), for x ≥ 0, and x for x < 0. Justify your answers.
2.25 In this exercise, we show that a risk-averse player dislikes the addition of noise to a lottery. Let U : ℝ → ℝ be a concave function, let X be a random variable with a finite expected value, and let Y be a random variable that is independent of X and has an expected value 0. Define Z = X + Y. Prove that E[U(X)] ≥ E[U(Z)].
2.26 In this exercise, we show that in choosing between two random variables with the same expected value, each with a normal distribution, a risk-averse player will prefer the random variable that has a smaller variance. Let U : ℝ → ℝ be a concave function, and let X be a random variable with a normal distribution, expected value μ, and standard deviation σ. Let λ > 1, and let Y be a random variable with a normal distribution, expected value μ, and standard deviation λσ.
(a) Prove that U(μ + c) + U(μ − c) ≥ U(μ + c√λ) + U(μ − c√λ) for all c > 0.
(b) By a proper change of variable, and using item (a) above, prove that
\[ \int_{-\infty}^{\infty} U(x)\,\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx \;\ge\; \int_{-\infty}^{\infty} U(y)\,\frac{1}{\sqrt{2\pi}\,\lambda\sigma}\,e^{-\frac{(y-\mu)^2}{2\lambda^2\sigma^2}}\,dy. \tag{2.87} \]
(c) Conclude that E[U(X)] ≥ E[U(Y)].
2.27 In Exercises 2.25 and 2.26, a risk-averse player, in choosing between two random variables with the same expected value, prefers the random variable with smaller variance. This exercise shows that this does not always hold: sometimes a risk-averse player called upon to choose between two random variables with the same expected value will actually prefer the random variable with greater variance. Let U(x) = 1 − e^{−x} be a player’s utility function.
(a) Is the player risk averse, risk neutral, or risk seeking? Justify your answer.
For each a ∈ (0, 1) and each p ∈ (0, 1), let X_{a,p} be a random variable whose distribution is
\[ P(X_{a,p} = 1 - a) = \frac{1-p}{2}, \qquad P(X_{a,p} = 1) = p, \qquad P(X_{a,p} = 1 + a) = \frac{1-p}{2}. \]
(b) Calculate the expected value E[X_{a,p}] and the variance Var(X_{a,p}) for each a ∈ (0, 1) and each p ∈ (0, 1).
(c) Let c^2 = a^2(1 − p). Show that the expected utility of the lottery X_{a,p} is given by
\[ E[U(X_{a,p})] = 1 - \frac{1}{2e}\left( (e^{a} + e^{-a} - 2)\,\frac{c^2}{a^2} + 2 \right), \tag{2.88} \]
which is not a constant function in a and p.
(d) Show that there exist a_1, a_2, p_1, p_2 ∈ (0, 1) such that
E[X_{a_1,p_1}] = E[X_{a_2,p_2}] and Var(X_{a_1,p_1}) = Var(X_{a_2,p_2}), (2.89)
but E[U(X_{a_1,p_1})] < E[U(X_{a_2,p_2})].
(e) Conclude that there exist a_1, a_2, p_1, p_2 ∈ (0, 1) such that
E[X_{a_1,p_1}] = E[X_{a_2,p_2}], Var(X_{a_1,p_1}) < Var(X_{a_2,p_2}), and E[U(X_{a_1,p_1})] < E[U(X_{a_2,p_2})]. (2.90)
2.28 The Arrow–Pratt measure of absolute risk aversion Let U_i be a monotonically increasing, strictly concave, and twice continuously differentiable function over ℝ, and let i be a player for which U_i is his utility function for money. The Arrow–Pratt measure of absolute risk aversion for player i is
\[ r_{U_i}(x) := -\frac{U_i''(x)}{U_i'(x)}. \tag{2.91} \]
The purpose of this exercise is to understand the meaning of this measure.
(a) Suppose the player has $x, and is required to participate in a lottery in which he stands to gain or lose a small amount $h, with equal probability. Denote by Y the amount of money the player will have after the lottery is conducted. Calculate the expected value of Y, E[Y], and the variance of Y, Var(Y).
(b) What is the utility of the lottery, u_i(Y), for this player? What is the player’s utility loss due to the fact that he is required to participate in the lottery; in other words, what is Δu_h := U_i(x) − u_i(Y)?
(c) Prove that lim_{h→0} Δu_h / h^2 = −U_i''(x)/2.
(d) Denote by y_{x,h} the amount of money that satisfies u_i(y_{x,h}) = u_i(Y), and by Δx_h the difference Δx_h := x − y_{x,h}. Explain why Δx_h ≥ 0. Make use of the following figure in order to understand the significance of the various quantities.
[Figure: the graph of U_i, marking the points x − h, y_{x,h}, x, and x + h, the values U_i(x + h), U_i(x), and u_i(Y), and the differences Δx_h and Δu_h.]
(e) Using the fact that lim_{h→0} Δu_h/Δx_h = U_i'(x), and your answers to the items (b) and (d) above, prove that
\[ \lim_{h\to 0}\frac{\Delta x_h}{\mathrm{Var}(Y)} = -\frac{U_i''(x)}{2U_i'(x)} = \tfrac{1}{2}\, r_{U_i}(x). \tag{2.92} \]
We can now understand the meaning of the Arrow–Pratt measure of absolute risk aversion r_{U_i}(x): it is the sum of money, multiplied by the constant 1/2, that a player starting out with $x is willing to pay in order to avoid participating in a fair lottery over an infinitesimal amount $h with expected value 0, measured in units of lottery variance.
(f) Calculate the Arrow–Pratt measure of absolute risk aversion for the following utility functions: (a) U_i(x) = x^α for 0 < α < 1, (b) U_i(x) = 1 − e^{−αx} for α > 0.
(g) A function U_i exhibits constant absolute risk aversion if r_{U_i} is a constant function (i.e., does not depend on x). It exhibits increasing absolute risk aversion if r_{U_i} is an increasing function in x, and exhibits decreasing absolute risk aversion if r_{U_i} is a decreasing function in x. Check which functions in part (f) exhibit constant, increasing, or decreasing absolute risk aversion.
2.29 Which of the von Neumann–Morgenstern axioms were violated by the preferences expressed by the Second World War pilots in the story described on page 30?
3
Extensive-form games
Chapter summary In this chapter we introduce a graphic way of describing a game, the description in extensive form, which depicts the rules of the game, the order in which the players make their moves, the information available to players when they are called to take an action, the termination rules, and the outcome at any terminal point. A game in extensive form is given by a game tree, which consists of a directed graph in which the set of vertices represents positions in the game, and a distinguished vertex, called the root, represents the starting position of the game. A vertex with no outgoing edges represents a terminal position in which play ends. To each terminal vertex corresponds an outcome that is realized when the play terminates at that vertex. Any nonterminal vertex represents either a chance move (e.g., a toss of a die or a shuffle of a deck of cards) or a move of one of the players. To any chance-move vertex corresponds a probability distribution over the edges emanating from that vertex, which correspond to the possible outcomes of the chance move. To describe games with imperfect information, in which players do not necessarily know the full board position (like poker), we introduce the notion of information sets. An information set of a player is a set of decision vertices of the player that are indistinguishable by him given his information at that stage of the game. A game of perfect information is a game in which all information sets consist of a single vertex. In such a game whenever a player is called to take an action, he knows the exact history of actions and chance moves that led to that position. A strategy of a player is a function that assigns to each of his information sets an action available to him at that information set. A path from the root to a terminal vertex is called a play of the game. When the game has no chance moves, any vector of strategies (one for each player) determines the play of the game, and hence the outcome. 
In a game with chance moves, any vector of strategies determines a probability distribution over the possible outcomes of the game.
This chapter presents the theory of games in extensive form. It will be shown that many familiar games, including the game of chess studied in Chapter 1, can be described formally as extensive-form games, and that Theorem 1.4 can be generalized to every finite extensive-form game.
3.1 An example
How does one describe a game? Every description of a game must include the following elements:
• A set of players (decision makers).
• The possible actions available to each player.
• Rules determining the order in which players make their moves.
• A rule determining when the game ends.
• A rule determining the outcome of every possible game ending.
A natural way to depict a game is graphically, where every player’s action is depicted as a transition from one vertex to another vertex in a graph (as we saw in Figure 1.1 for the game of chess).
Example 3.1 Consider the simple game shown in Figure 3.1. We start with a table with four squares, labeled 1, 2, 3, and 4.
2 4
1 3
Figure 3.1 The game board in Example 3.1
Two players, labeled Players I and II, participate in the game. Player I has the opening move, in which he “captures” one of the squares. By alternate turns, each player captures one of the squares, subject to the following conditions:
1. A square may be captured by a player only if it has not been previously captured by either player.
2. Square 4 may not be captured if Square 2 or Square 3 has been previously captured.
3. The game ends when Square 1 is captured. The player who captures Square 1 is the losing player.
A graphic depiction of this game appears in Figure 3.2.
Figure 3.2 The game tree in Example 3.1
Every circled vertex in Figure 3.2 represents a decision by a player, and is labeled with the number of that player. The terminal vertices of the game are indicated by dark dots. The edges of the graph depict game actions. The number that appears next to each edge corresponds to the square that is captured. Next to every terminal vertex, the corresponding game outcome is indicated. A game depicted by such a graph is called a game in extensive form, or extensive-form game. ◭
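The three rules of Example 3.1 are enough to generate the game tree mechanically. The following is a minimal sketch under stated assumptions: the encoding of a position as the tuple of squares captured so far, and the function names `legal_moves`, `build_tree`, and `count`, are choices made for this illustration and do not come from the text.

```python
# Sketch: generate the game tree of Example 3.1 directly from its three rules.
# A position is encoded (an assumption of this sketch) as the tuple of
# squares captured so far, in the order in which they were captured.

def legal_moves(captured):
    """Squares that may be captured next, given the squares captured so far."""
    if 1 in captured:                      # rule 3: the game has ended
        return []
    moves = []
    for square in (1, 2, 3, 4):
        if square in captured:             # rule 1: already captured
            continue
        if square == 4 and (2 in captured or 3 in captured):
            continue                       # rule 2: square 4 is blocked
        moves.append(square)
    return moves

def build_tree(captured=()):
    """Return the subtree rooted at the position `captured` as nested dicts."""
    # Player I moves after an even number of captures, Player II otherwise.
    mover = "I" if len(captured) % 2 == 0 else "II"
    moves = legal_moves(captured)
    if not moves:
        # Square 1 was just captured by the other player, who therefore loses.
        return {"outcome": f"{mover} wins"}
    return {"player": mover,
            "children": {m: build_tree(captured + (m,)) for m in moves}}

def count(node):
    """Return (number of vertices, number of leaves) of the subtree."""
    if "outcome" in node:
        return 1, 1
    totals = [count(child) for child in node["children"].values()]
    return 1 + sum(v for v, _ in totals), sum(l for _, l in totals)

tree = build_tree()
print(count(tree))   # (vertices, leaves)
```

The generated tree has 20 vertices, 10 of which are leaves, which agrees with the vertex set V listed for this example later in the chapter.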
As the example illustrates, a graph that describes a game has a special structure, and is sometimes called a game tree. To provide a formal definition of a game tree, we first define a tree.
3.2 Graphs and trees
Definition 3.2 A (finite) directed graph is a pair G = (V, E), where:
• V is a finite set, whose elements are called vertices.
• E ⊆ V × V is a finite set of pairs of vertices, whose elements are called edges. Each directed edge is composed of two vertices: the two ends of the edge (it is possible for both ends of a single edge to be the same vertex).
A convenient way of depicting a graph geometrically is by representing each vertex by a dot and each edge by an arrow (a straight line, an arc, or a circle) connecting two vertices. Illustrative examples of geometric depictions of graphs are presented in Figure 3.3.
Remark 3.3 Most of the games that are described in this book are finite games, and can therefore be represented by finite graphs. But there are infinite games, whose representation requires infinite graphs.
Definition 3.4 Let x^1 and x^{K+1} be two vertices in a graph G. A path from x^1 to x^{K+1} is a finite sequence of vertices and edges of the form
x^1, e_1, x^2, e_2, . . . , e_K, x^{K+1} (3.1)
in which the edges are distinct: e_k ≠ e_l for every k ≠ l, and for 1 ≤ k ≤ K, the edge e_k connects vertex x^k with vertex x^{k+1}. The number K is called the path length. A path is called cyclic if K ≥ 1 and x^1 = x^{K+1}.
Figure 3.3 Examples of graphs
Figure 3.4 Examples of trees (Tree A, Tree B, Tree C, and Tree D)
Definition 3.5 A tree is a triple G = (V, E, x^0), where (V, E) is a directed graph, x^0 ∈ V is a vertex called the root of the tree, and for every vertex x ∈ V there is a unique path in the graph from x^0 to x.
The definition of a tree implies that a graph containing only one vertex is a tree: the triple ({x^0}, ∅, x^0) is a tree. The requirement that for each vertex x ∈ V there exists a unique path from the root to x guarantees that if there is an edge from a vertex x̂ to a vertex x then x̂ is “closer” to the root than x: the path leading from the root to x passes through x̂ (while the path from the root to x̂ does not pass through x). It follows that there is no need to state explicitly the directions of the edges in the tree.
Figure 3.4 shows several trees. Tree A contains only one vertex, the root x^0. Tree B contains two vertices and one edge, from x^0 to x^1. Tree C contains four edges, from x^0 to x^1, from x^0 to x^2, from x^1 to x^3, and from x^1 to x^4.
A vertex x is called a child of a vertex x̂ if there is a directed edge from x̂ to x. For example, in the tree in Figure 3.2, g and h are children of c, and s and w are children of k. A vertex x is a leaf (or a terminal point) if it has no children, meaning that there are no directed edges emanating from x.
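Definition 3.5 can be tested on a finite directed graph without enumerating paths. The sketch below relies on an equivalent condition, which is an assumption of this illustration rather than a statement from the text: the root has no incoming edge, every other vertex has exactly one incoming edge, and every vertex is reachable from the root. The function name `is_tree` and the string labels for vertices are likewise hypothetical.

```python
def is_tree(vertices, edges, root):
    """Check whether (vertices, edges, root) is a tree in the sense of
    Definition 3.5, using in-degrees and reachability (see the lead-in)."""
    incoming = {v: 0 for v in vertices}
    children = {v: [] for v in vertices}
    for tail, head in edges:               # an edge is a pair (tail, head)
        incoming[head] += 1
        children[tail].append(head)

    # An edge into the root would create a second path from the root to itself.
    if incoming[root] != 0:
        return False
    # Two edges into a vertex would give two paths; none would give no path.
    if any(incoming[v] != 1 for v in vertices if v != root):
        return False

    # Every vertex must be reachable from the root.
    seen, stack = {root}, [root]
    while stack:
        v = stack.pop()
        for c in children[v]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen == set(vertices)

# Tree C as described in the text: four edges out of x0 and x1.
V = {"x0", "x1", "x2", "x3", "x4"}
E = {("x0", "x1"), ("x0", "x2"), ("x1", "x3"), ("x1", "x4")}
print(is_tree(V, E, "x0"))                      # Tree C
print(is_tree(V, E | {("x2", "x3")}, "x0"))     # two paths lead to x3
```

The first call accepts Tree C, and the second rejects the graph obtained by adding an edge from x2 to x3, since x3 can then be reached from the root along two different paths.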
3.3 Game trees
Various games can be represented by trees. When a tree represents a game, the root of the tree corresponds to the initial position of the game, and every game position is represented by a vertex of the tree. The children of each vertex v are the vertices corresponding to the game positions that can be arrived at from v via one action. In other words, the number of children of a vertex is equal to the number of possible actions in the game position corresponding to that vertex. Figure 3.2 indicates that in addition to the game tree, we need two further components in order to describe a game fully:
• For every vertex that is not a leaf, we need to specify the player who is to take an action at that vertex.
• At each leaf, we need to describe the outcome of the game.
Definition 3.6 Let B be a nonempty set. A partition of B is a collection B_1, B_2, . . . , B_K of pairwise disjoint and nonempty subsets of B whose union is B.
We are now ready for the first definition of a game in extensive form. We will later add more elements to this definition.
Definition 3.7 A game in extensive form (or extensive-form game) is an ordered vector¹
Γ = (N, V, E, x^0, (V_i)_{i∈N}, O, u), (3.2)
where:
• N is a finite set of players.
• (V, E, x^0) is a tree called the game tree.
• (V_i)_{i∈N} is a partition of the set of vertices that are not leaves.
• O is the set of possible game outcomes.
• u is a function associating every leaf of the tree with a game outcome in the set O.
By “possible outcome” we mean a detailed description of what happens as a result of the actions undertaken by the players. Some examples of outcomes include:
1. Player I is declared the winner of the game, and Player II the loser.
2. Player I receives $2, Player II receives $3, and Player III receives $5.
3. Player I gets to go out to the cinema with Player II, while Player III is left at home.
4. If the game describes bargaining between two parties, the outcome is the detailed description of the points agreed upon in the bargaining.
5. In most of the following chapters, an outcome u(x) at a leaf x will be a vector of real numbers representing the utility² of each player when a play reaches leaf x.
For each player i ∈ N, the set V_i is player i’s set of decision vertices, and for each leaf x, the outcome at that leaf is u(x). Note that the partition (V_i)_{i∈N} may contain empty sets. We accept the possibility of empty sets in (V_i)_{i∈N} in order to be able to treat games in which a player may not be required to make any moves, but is still a game participant who is affected by the outcome of the game.
In the example in Figure 3.2,
N = {I, II},
V = {r, a, b, c, d, e, f, g, h, i, j, k, l, m, p, q, s, w, y, z},
x^0 = r,
V_I = {r, f, h, j, k},
V_II = {b, c, d, q, w}.
The set of possible outcomes is
O = {I wins, II wins}, (3.3)
and the function u is given by
u(a) = u(l) = u(m) = u(p) = u(s) = II wins,
u(e) = u(g) = u(i) = u(y) = u(z) = I wins.

¹ The word “ordered” indicates the convention that the elements of the game in extensive form appear in a specific order: the first element is the set of players, the second is the set of vertices, etc.
² The subject of utility theory is discussed in Chapter 2.
The requirement that (Vi )i∈N be a partition of the set of vertices that are not leaves stems from the fact that at each game situation there is one and only one player who is called upon to take an action. For each vertex x that is not a leaf, there is a single player i ∈ N for whom x ∈ Vi . That player is called the decision maker at vertex x, and denoted by J (x). In the example in Figure 3.2, J (r) = J (f ) = J (h) = J (j ) = J (k) = I,
J (b) = J (c) = J (d) = J (q) = J (w) = II. Denote by C(x) the set of all children of a non-leaf vertex x. Every edge that leads from x to one of its children is called a possible action at x. We will associate every action with the child to which it is connected, and denote by A(x) the set of all actions that are possible at the vertex x. Later, we will define more complicated games, in which such a mapping between the possible actions at x and the children of x does not exist. An extensive-form game proceeds in the following manner:
• Player J(x^0) initiates the game by choosing a possible action in A(x^0). Equivalently, he chooses an element x^1 in the set C(x^0).
• If x^1 is not a leaf, Player J(x^1) chooses a possible action in A(x^1) (equivalently, an element x^2 ∈ C(x^1)).
• The game continues in this manner, until a leaf vertex x is reached, and then the game ends with outcome u(x).
By definition, the collection of the vertices of the graph is a finite set, so that the game necessarily ends at a leaf, yielding a sequence of vertices (x^0, x^1, . . . , x^k), where x^0 is the root of the tree, x^k is a leaf, and x^{l+1} ∈ C(x^l) for l = 0, 1, . . . , k − 1. This sequence is called a play.³ Every play ends at a particular leaf x^k with outcome u(x^k). Similarly, every leaf x^k determines a unique play, which corresponds to the unique path connecting the root x^0 with x^k.
It follows from the above description that every player who is to take an action knows the current state of the game, meaning that he knows all the actions in the game that led to the current point in the play. This implicit assumption is called perfect information, an important concept to be studied in detail when we discuss the broader family of games with imperfect information. Definition 3.7 therefore defines extensive-form games with perfect information.

³ Note carefully the words that are used here: a game is a general description of rules, whereas a play is a sequence of actions conducted in a particular instance of playing the game. For example, chess is a game; the sequence of actions in a particular chess match between two players is a play.

Remark 3.8 An extensive-form game, as defined here, is a finite game: the set of vertices V is finite. It is possible to define extensive-form games in which the game tree
(V, E, x^0) is infinite. When the game tree is infinite, there are two possibilities to be considered. It is possible that the depth of the tree is bounded, i.e., that there exists a natural number L such that the length of every path in the tree is less than or equal to L. This corresponds to a game that ends after at most L actions have been played, and there is at least one player who has an infinite number of actions available at an information set. The other possibility is that the depth of the vertices of the tree is not bounded; that is, there exists an infinite path in the game tree. This corresponds to a game that might never end. The definition of extensive-form game can be generalized to the case in which the game tree is infinite. Accomplishing this requires implementing mathematical tools from measure theory that go beyond the scope of this book. With the exception of a few examples in this chapter, we will assume here that extensive-form games are finite.
We are now ready to present one of the central concepts of game theory: the concept of strategy. A strategy is a prescription for how to play a game. The definition is as follows.
Definition 3.9 A strategy for player i is a function s_i mapping each vertex x ∈ V_i to an element in A(x) (equivalently, to an element in C(x)).
According to this definition, a strategy includes instructions on how to behave at each vertex in the game tree, including vertices that previous actions by the player preclude from being reached. For example, in the game of chess, even if White’s strategy calls for opening by moving a pawn from c2 to c3, the strategy must include instructions on how White should play in his second move if in his first move he instead moved a pawn from c2 to c4, and Black then took his action. The main reason this definition is used is its simplicity: it does not require us to provide details regarding which vertices need to be dealt with in the strategy and which can be ignored.
We will later see that this definition is also needed for further developments of the theory, which take into account the possibility of errors on the part of players, leading to situations that were unintended.
Definition 3.10 A strategy vector is a list of strategies s = (s_i)_{i∈N}, one for each player. Player i’s set of strategies is denoted by S_i, and the set of all strategy vectors is denoted S = S_1 × S_2 × · · · × S_n.
Every strategy vector s = (s_i)_{i∈N} determines a unique play (path from the root to a leaf). The play that is determined by a strategy vector s = (s_i)_{i∈N} is (x^0, x^1, x^2, . . . , x^k), where x^1 is the choice of player J(x^0), based on his strategy, x^2 is the choice of player J(x^1) based on his strategy, and so on, and x^k is a leaf. The play corresponds to the terminal point x^k, whose outcome u(x^k) we also denote by u(s).
We next proceed to define the concept of subgame:
Definition 3.11 Let Γ = (N, V, E, x^0, (V_i)_{i∈N}, O, u) be an extensive-form game (with perfect information), and let x ∈ V be a vertex in the game tree. The subgame starting at x, denoted by Γ(x), is the extensive-form game Γ(x) = (N, V(x), E(x), x, (V_i(x))_{i∈N}, O, u), where:
• The set of players N is as in the game Γ.
• The set of vertices V(x) includes x, and all the vertices that are descendants of x in the game tree (V, E, x^0), that is, the children of x, their children, the children of these children, and so on.
• The set of edges E(x) includes all the edges in E that connect the vertices in V(x).
• The set of vertices at which player i is a decision maker is V_i(x) = V_i ∩ V(x).
• The set of possible outcomes is the set of possible outcomes in the game Γ.
• The function mapping leaves to outcomes is the function u, restricted to the set of leaves in the game tree (V(x), E(x), x).
The original game Γ is itself a subgame: Γ(x^0) = Γ. In addition, every leaf x defines a subgame in which no player can make a choice.
We next focus on games with two players, I and II, whose set of outcomes is O = {I wins, II wins, Draw}. We will define the concepts of a winning strategy and a strategy guaranteeing at least a draw for such games.
Definition 3.12 Let Γ be an extensive-form game with Players I and II, whose set of outcomes is O = {I wins, II wins, Draw}. A strategy s_I of Player I is called a winning strategy if
u(s_I, s_II) = I wins, ∀s_II ∈ S_II. (3.4)
A strategy s_I of Player I is called a strategy guaranteeing at least a draw if
u(s_I, s_II) ∈ {I wins, Draw}, ∀s_II ∈ S_II. (3.5)
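To make Definitions 3.9, 3.10, and 3.12 concrete, the following sketch enumerates all strategies in the game of Example 3.1 and tests condition (3.4) by brute force. The vertex labels, decision makers, and outcomes are those listed in the text; the edges in `CHILDREN` are a reconstruction from the rules of the game and those lists (an assumption, since the figure itself is not reproduced here), and the function names are hypothetical.

```python
from itertools import product

# CHILDREN[x][square] = child reached when the mover at x captures `square`.
# Reconstructed from the rules of Example 3.1 (assumption of this sketch).
CHILDREN = {
    "r": {1: "a", 2: "b", 3: "c", 4: "d"},
    "b": {1: "e", 3: "f"}, "f": {1: "l"},
    "c": {1: "g", 2: "h"}, "h": {1: "m"},
    "d": {1: "i", 2: "j", 3: "k"},
    "j": {1: "p", 3: "q"}, "q": {1: "y"},
    "k": {1: "s", 2: "w"}, "w": {1: "z"},
}
# The decision maker J(x) at every non-leaf vertex, as given in the text.
PLAYER = {"r": "I", "f": "I", "h": "I", "j": "I", "k": "I",
          "b": "II", "c": "II", "d": "II", "q": "II", "w": "II"}
# The outcome function u on the leaves, as given in the text.
OUTCOME = {**{x: "II wins" for x in "almps"},
           **{x: "I wins" for x in "egiyz"}}

def strategies(player):
    """All strategies of `player`: one action at each of his decision vertices."""
    own = [x for x in CHILDREN if PLAYER[x] == player]
    choices = [list(CHILDREN[x]) for x in own]
    return [dict(zip(own, combo)) for combo in product(*choices)]

def play(s_I, s_II):
    """The play (sequence of vertices) determined by the strategy vector."""
    x, path = "r", ["r"]
    while x in CHILDREN:                       # stop when a leaf is reached
        s = s_I if PLAYER[x] == "I" else s_II
        x = CHILDREN[x][s[x]]
        path.append(x)
    return path

def is_winning_for_I(s_I):
    """Condition (3.4): Player I wins against every strategy of Player II."""
    return all(OUTCOME[play(s_I, s_II)[-1]] == "I wins"
               for s_II in strategies("II"))

winning = [s for s in strategies("I") if is_winning_for_I(s)]
print(len(strategies("I")), len(strategies("II")), winning)
```

Under this encoding Player I has 16 strategies and Player II has 12, and exactly one strategy of Player I satisfies (3.4): capture square 4 at the root, then answer square 2 with square 3 and square 3 with square 2. Note that a strategy also prescribes actions at the vertices f and h, which cannot be reached once Player I opens with square 4.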
A winning strategy for Player II, and a strategy guaranteeing at least a draw for Player II, are defined similarly.
Theorem 3.13 (Von Neumann [1928]) In every two-player game (with perfect information) in which the set of outcomes is O = {I wins, II wins, Draw}, one and only one of the following three alternatives holds:
1. Player I has a winning strategy.
2. Player II has a winning strategy.
3. Each of the two players has a strategy guaranteeing at least a draw.
The proof of Theorem 3.13 is similar to the proof of Theorem 1.4 for the game of chess (page 3), and it is left to the reader as an exercise (Exercise 3.7). As we saw above, in proving Theorem 1.4 we did not, in fact, make use of any of the rules specific to the game of chess; the proof is valid for any game that satisfies the three properties (C1)–(C3) specified on page 6. Additional games to which Theorem 3.13 applies include checkers, the game Nim (see Exercise 3.14), and the game Hex (see Exercise 3.19).
Remark 3.14 In our definition of a game in extensive form, we assumed that the game tree is finite. The proof of Theorem 3.13 shows that the theorem also holds when the game tree is infinite, but the depth of the vertices of the tree is bounded: there exists a natural
number L such that the depth of each vertex in the tree is less than or equal to L. It turns out that the theorem is not true when the depth of the vertices of the tree is unbounded. See Mycielski [1992], Claim 3.1.
We now consider another game that satisfies the conditions of Theorem 3.13. This game is interesting because we can prove which of the three possibilities of the theorem holds in this game, but we do not know how to calculate the appropriate strategy, in contrast to the game of chess, in which we do not even know which of the three alternatives holds.
Figure 3.5 Gale’s game for n = m = 8 (horizontal coordinate i and vertical coordinate j, each running from 1 to 8)
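For a finite game tree, which of the three alternatives of Theorem 3.13 holds can be found by working from the leaves back to the root. The sketch below illustrates this backward-induction idea; it is not claimed to be the proof that the text leaves as an exercise. The encoding of a tree as nested tuples, the names `value` and `alternative`, and the transcription of Example 3.1 (reconstructed from its rules) are assumptions of this illustration.

```python
# Outcomes ranked from Player I's point of view.
RANK = {"II wins": 0, "Draw": 1, "I wins": 2}

def value(node):
    """Outcome of the subgame at `node` when both players play optimally.
    A leaf is an outcome string; any other vertex is (player, [children])."""
    if isinstance(node, str):
        return node
    player, children = node
    values = [value(child) for child in children]
    # Player I picks his best available outcome, Player II picks I's worst.
    pick = max if player == "I" else min
    return pick(values, key=RANK.__getitem__)

def alternative(root):
    """Index (1, 2, or 3) of the alternative of Theorem 3.13 that holds."""
    return {"I wins": 1, "II wins": 2, "Draw": 3}[value(root)]

# The game of Example 3.1; children are listed in order of the captured square.
example_3_1 = ("I", [
    "II wins",                                              # capture 1
    ("II", ["I wins", ("I", ["II wins"])]),                 # capture 2
    ("II", ["I wins", ("I", ["II wins"])]),                 # capture 3
    ("II", ["I wins",                                       # capture 4
            ("I", ["II wins", ("II", ["I wins"])]),
            ("I", ["II wins", ("II", ["I wins"])])]),
])
print(alternative(example_3_1))

# A hypothetical game in which Player I can do no better than a draw.
drawn = ("I", ["Draw", ("II", ["I wins", "II wins"])])
print(alternative(drawn))
```

For Example 3.1 the first alternative holds (Player I has a winning strategy), while in the small hypothetical game the third alternative holds. The recursion visits every vertex once, which is feasible only for small trees; for games such as chess the tree is far too large.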
3.4 Chomp: David Gale’s game
The game described in this section is known by the name of Chomp, and was invented by David Gale (see Gale [1974]). It is played on an n × m board of squares. Each square is denoted by a pair of coordinates (i, j), 1 ≤ i ≤ n and 1 ≤ j ≤ m: i is the horizontal coordinate, and j is the vertical coordinate. Figure 3.5 depicts the game board for n = m = 8. Every player in turn captures a square, subject to the following rules: if at a certain stage the square (i_0, j_0) has been captured, no square that is located north-east of (i_0, j_0) can be captured in subsequent moves. This means that after (i_0, j_0) has been captured, all the squares (i, j) satisfying i ≥ i_0 and j ≥ j_0 can be regarded as if they have been removed from the board. In Figure 3.5, for example, square (4, 7) has been captured, so that all the darkened squares in the figure are regarded as having been removed. Player I has the opening move. The player who captures square (1, 1) (which is marked in Figure 3.5 with a black inner square) is declared the loser. We note that the game in Example 3.1 is David Gale’s game for n = m = 2.
Theorem 3.15 In David Gale’s game on an n × n board, the following strategy is a winning strategy for Player I: in the opening move capture square (2, 2), thus leaving only the squares in row j = 1 and column i = 1 (see Figure 3.6). From that point on, play symmetrically to Player II’s actions. That is, if Player II captures square (i, j), Player I captures square (j, i) in the following move.
Figure 3.6 The game board after Player I has captured square (2, 2)
The above strategy is well defined. That is, if Player II captures square (i, j) (and (i, j) ≠ (1, 1)), square (j, i) has not yet been removed from the board (verify this!). This strategy is also a winning strategy when the board is infinite, ∞ × ∞.
What happens if the board is rectangular but not square? Which player then has a winning strategy? As the next theorem states, the opening player always has a winning strategy.
Theorem 3.16 For every finite n × m board (with n > 1 or m > 1), Player I, who has the opening move, has a winning strategy.
Proof: The game satisfies the conditions of von Neumann’s Theorem (Theorem 3.13), and therefore one of the three possibilities of the theorem must hold. Since the game cannot end in a draw, there are only two remaining possibilities:
1. Player I has a winning strategy.
2. Player II has a winning strategy.
Theorem 3.16 will be proved once the following claim is proved.
Claim 3.17 For every finite n × m board (with n > 1 or m > 1), if Player II has a winning strategy, then Player I also has a winning strategy.
Since it is impossible for both players to have winning strategies, it follows that Player II cannot have a winning strategy, and therefore the only remaining possibility is that Player I has a winning strategy.
Proof of Claim 3.17: Suppose that Player II has a winning strategy s_II. This strategy guarantees Player II victory over any strategy used by Player I. In particular, the strategy grants Player II victory even if Player I captures square (n, m) (the top-rightmost square) in the opening move. Suppose that Player II’s next action, as called for by strategy s_II, is to capture square (i_0, j_0) (see Figure 3.7(a)). From this point on, a new game is effectively being played, as depicted in Figure 3.7(b). In this game Player I has the opening move, and Player II, using strategy s_II, guarantees
himself victory. In other words, the player who makes the opening move in this game is the losing player. But Player I can guarantee himself the situation depicted in Figure 3.7(b) when Player II opens, by choosing the square (i_0, j_0) on his first move. In conclusion, a winning strategy in the original game for Player I is to open with (i_0, j_0) and then continue according to strategy s_II, thus completing the proof of the claim.
Figure 3.7 The board after the first action (a) and the board after the second action (b)
It follows from Claim 3.17 that Player II has no winning strategy, so that Player I must have a winning strategy.
The conclusion of the theorem is particularly striking, given the fact that for n ≠ m we do not know how to find the winning strategy for Player I, even with the aid of computers, on relatively small boards with n and m between 30 and 40.
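On very small boards the conclusion of Theorem 3.16 can be checked by exhaustive search. The sketch below is one way to do so under stated assumptions: a position is encoded as a tuple of column heights (`heights[i]` is the number of squares remaining in column i + 1), and the names `mover_wins` and `first_player_wins` are hypothetical. Unlike the proof above, the search is constructive, which is exactly why it is limited to small n and m.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def mover_wins(heights):
    """True if the player about to move can force a win from this position.
    heights[i] is the number of squares remaining in column i + 1."""
    for i, h in enumerate(heights):
        for j in range(1, h + 1):          # consider capturing square (i + 1, j)
            if i == 0 and j == 1:
                continue                   # capturing (1, 1) loses immediately
            # Capturing (i + 1, j) removes every square (i', j') with
            # i' >= i + 1 and j' >= j, so those columns are cut to height j - 1.
            after = heights[:i] + tuple(min(k, j - 1) for k in heights[i:])
            if not mover_wins(after):
                return True                # this move leaves the opponent losing
    return False                           # every alternative to (1, 1) loses

def first_player_wins(n, m):
    """Does Player I have a winning strategy on the full n x m board?"""
    return mover_wins((m,) * n)

for n in range(1, 6):
    print(n, [first_player_wins(n, m) for m in range(1, 6)])
```

Every board in this range is a win for Player I except the 1 × 1 board, which the theorem excludes: there the opening player is forced to capture square (1, 1). The number of positions grows quickly with n and m, so this search does not contradict the remark that winning strategies are unknown for larger boards.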
3.5 Games with chance moves
In the games we have seen so far, the transition from one state to another is always accomplished by actions undertaken by the players. Such a model is appropriate for games such as chess and checkers, but not for card games or dice games (such as poker or backgammon) in which the transition from one state to another may depend on a chance process: in card games, the shuffle of the deck, and in backgammon the toss of the dice. It is possible to come up with situations in which transitions from state to state depend on other chance factors, such as the weather, earthquakes, or the stock market. These sorts of state transitions are called chance moves.

To accommodate this feature, our model is expanded by labeling some of the vertices in the game tree $(V, E, x^0)$ as chance moves. The edges emanating from vertices corresponding to chance moves represent the possible outcomes of a lottery, and next to each such edge is listed the probability that the outcome it represents will be the result of the lottery.
Example 3.18 A game with chance moves Consider the two-player game depicted in Figure 3.8.
Figure 3.8 An example of a game with chance moves
The outcomes of the game are noted by pairs of numbers $(z_I, z_{II})$, where $z_I$ is the monetary payoff to Player I, and $z_{II}$ is the monetary payoff to Player II. The verbal description of this game is as follows. At the root of the game (vertex R) Player I has the choice of selecting between action a, which leads to the termination of the game with payoff (0, 0), and action b, which leads to a chance move at vertex A. The chance move is a lottery (or a flip of a coin) leading with probability 1/2 to state B, which is a decision vertex of Player II, and with probability 1/2 to state C, which is a decision vertex of Player I. At state B, Player II chooses between action f, leading to a termination of the game with payoff (2, 0), and action e, leading to state D, which is a chance move; at this chance move, with probability 1/3 the game ends with payoff (5, −1), and with probability 2/3 the game ends with payoff (−2, 5). At state C, Player I chooses between action g, leading to the termination of the game with payoff (1, 1), and action h, leading to a chance move at vertex E. At this chance move the game ends with payoff (0, 2), or (−1, 1), or (1, 1), with respective probabilities 1/4, 1/2, and 1/4.
◭
Formally, the addition of chance moves to the model proceeds as follows. We add a new player, who is called “Nature,” and denoted by 0. The set of players is thus expanded to N ∪ {0}. For every vertex x at which a chance move is implemented, we denote by $p_x$ the probability vector over the possible outcomes of a lottery that is implemented at vertex x. This leads to the following definition of a game in extensive form.

Definition 3.19 A game in extensive form (with perfect information and chance moves) is a vector

$$\Gamma = (N, V, E, x^0, (V_i)_{i \in N \cup \{0\}}, (p_x)_{x \in V_0}, O, u), \tag{3.6}$$

where:

• N is a finite set of players.
• $(V, E, x^0)$ is the game tree.
• $(V_i)_{i \in N \cup \{0\}}$ is a partition of the set of vertices that are not leaves.
• For every vertex $x \in V_0$, $p_x$ is a probability distribution over the edges emanating from x.
• O is the set of possible outcomes.
• u is a function mapping each leaf of the game tree to an outcome in O.

The notation used in the extension of the model is the same as the previous notation, with the following changes:
• The partition of the set of vertices is now $(V_i)_{i \in N \cup \{0\}}$. We have, therefore, added the set $V_0$ to the partition, where $V_0$ is the set of vertices at which a chance move is implemented.
• For each vertex $x \in V_0$, a vector $p_x$, which is a probability distribution over the edges emanating from x, has been added to the model.

Games with chance moves are played similarly to games without chance moves, the only difference being that at vertices with chance moves a lottery is implemented, to determine the action to be undertaken at that vertex. We can regard a vertex x with a chance move as a roulette wheel, with the area of the pockets of the roulette wheel proportional to the values $p_x$. When the game is at a chance vertex, the wheel is spun, and the pocket at which the wheel settles specifies the new state of the game.

Note that in this description we have included a hidden assumption, namely, that the probabilities of the chance moves are known to all the players, even when the game includes moves that involve the probability of rain, or an earthquake, or a stock market crash, and so forth. In such situations, we presume that the probability assessments of these occurrences are known by all the players. More advanced models take into account the possibility that the players do not all necessarily share the same assessments of the probabilities of chance moves. Such models are considered in Chapters 9, 10, and 11.

In a game without chance moves, a strategy vector determines a unique play of the game (and therefore also a unique game outcome). When a game includes chance moves, a strategy vector determines a probability distribution over the possible game outcomes.

Example 3.18 (Continued) (See Figure 3.8.) Suppose that Player I uses strategy $s_I$, defined as

$$s_I(R) = b, \quad s_I(C) = h, \tag{3.7}$$

and that Player II uses strategy $s_{II}$, defined as

$$s_{II}(B) = f. \tag{3.8}$$

Then:

• the play R → A → B → (2, 0) occurs with probability 1/2, leading to outcome (2, 0);
• the play R → A → C → E → (0, 2) occurs with probability 1/8, leading to outcome (0, 2);
• the play R → A → C → E → (−1, 1) occurs with probability 1/4, leading to outcome (−1, 1);
• the play R → A → C → E → (1, 1) occurs with probability 1/8, leading to outcome (1, 1).
◭
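The probabilities listed in Example 3.18 can be reproduced mechanically by walking the tree and multiplying chance probabilities along each play. The Python sketch below is one possible encoding, transcribed from the verbal description of Figure 3.8; the dictionary layout and the function name `outcome_distribution` are assumptions made for this illustration, not notation from the text.

```python
from fractions import Fraction as F

# Each vertex is either a decision vertex ("player", owner, {action: child})
# or a chance vertex ("chance", [(probability, child), ...]).
# A child is either a vertex name (str) or a payoff pair (tuple), i.e., a leaf.
TREE = {
    "R": ("player", "I", {"a": (0, 0), "b": "A"}),
    "A": ("chance", [(F(1, 2), "B"), (F(1, 2), "C")]),
    "B": ("player", "II", {"f": (2, 0), "e": "D"}),
    "C": ("player", "I", {"g": (1, 1), "h": "E"}),
    "D": ("chance", [(F(1, 3), (5, -1)), (F(2, 3), (-2, 5))]),
    "E": ("chance", [(F(1, 4), (0, 2)), (F(1, 2), (-1, 1)), (F(1, 4), (1, 1))]),
}

def outcome_distribution(node, strategy, prob=F(1), dist=None):
    """Probability of each outcome when the players follow `strategy` (vertex -> action)."""
    dist = {} if dist is None else dist
    if isinstance(node, tuple):          # leaf: record the payoff pair
        dist[node] = dist.get(node, F(0)) + prob
        return dist
    entry = TREE[node]
    if entry[0] == "chance":             # chance vertex: branch on every lottery outcome
        for p, child in entry[1]:
            outcome_distribution(child, strategy, prob * p, dist)
    else:                                # decision vertex: follow the prescribed action
        outcome_distribution(entry[2][strategy[node]], strategy, prob, dist)
    return dist

# The strategy vector of (3.7) and (3.8).
strategy = {"R": "b", "C": "h", "B": "f"}
dist = outcome_distribution("R", strategy)
```

Exact fractions are used instead of floating-point numbers so that the result can be compared directly with the values 1/2, 1/8, 1/4, and 1/8 given in the example.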
Using this model of games with chance moves, we can represent games such as backgammon, Monopoly, Chutes and Ladders, and dice games (but not card games such as poker and bridge, which are not games with perfect information, because players do
not know what cards the other players are holding). Note that von Neumann’s Theorem (Theorem 3.13) does not hold in games with chance moves. In dice games, such as backgammon, a player who benefits from favorable rolls of the dice can win regardless of whether or not he has the first move, and regardless of the strategy adopted by his opponent.
3.6 Games with imperfect information
One of the distinguishing properties of the games we have seen so far is that at every stage of the game each of the players has perfect knowledge of all the developments in the game prior to that stage: he knows exactly which actions were taken by all the other players, and if there were chance moves, he knows what the results of the chance moves were. In other words, every player, when it is his turn to take an action, knows precisely at which vertex in the game tree the game currently is. A game satisfying this condition is called a game with perfect information.

The assumption of perfect information is clearly a very restrictive assumption, limiting the potential scope of analysis. Players often do not know all the actions taken by the other players and/or the results of chance moves (for example, in many card games the hand of cards each player holds is not known to the other players). The following game is perhaps the simplest example of a game with imperfect information.

Example 3.20 Matching Pennies The game Matching Pennies is a two-player game in which each player chooses one of the sides of a coin, H (for heads) or T (for tails), in the following way: each player inserts into an envelope a slip of paper on which his choice is written. The envelopes are sealed and submitted to a referee. If both players have selected the same side of the coin, Player II pays one dollar to Player I. If they have selected opposite sides of the coin, Player I pays one dollar to Player II. The depiction of Matching Pennies as an extensive-form game appears in Figure 3.9. In Figure 3.9, Player I’s actions are denoted by upper case letters, and Player II’s actions are denoted by lower case letters.
Figure 3.9 The game Matching Pennies as a game in extensive form
Figure 3.9 introduces a new element to the depictions of extensive-form games: the two vertices A and B of Player II are surrounded by an ellipse. This visual element represents the fact that when Player II is in the position of selecting between h and t, he does not know whether the game state is currently at vertex A or vertex B, because he does not know whether Player I has selected H or T. These two vertices together form an information set of Player II.
◭
Remark 3.21 The verbal description of Matching Pennies is symmetric between the two players, but in Figure 3.9 the players are not symmetric. The figure depicts Player I making his choice before Player II’s choice, with Player II not knowing which choice Player I made; this is done in order to depict the game conveniently as a tree. We could alternatively have drawn the tree with Player II making his choice before Player I, with Player I not knowing which choice Player II made. Both trees are equivalent, and they are equivalent to the verbal description of the game in which the two players make their choices simultaneously.
In general, a player’s information set consists of a set of vertices that satisfy the property that when play reaches one of these vertices, the player knows that play has reached one of these vertices, but he does not know which vertex has been reached. The next example illustrates this concept.
Example 3.22
Consider the following situation. David Beckham, star mid-fielder for Manchester United,
is interested in leaving the team and signing up instead with either Real Madrid, Bayern Munich, or AC Milan. Both Bayern Munich and AC Milan have told Beckham they want to hire him, and even announced their interest in the star to the media. Beckham has yet to hear anything on the matter from Real Madrid. With the season fast approaching, Beckham has only a week to determine which club he will be playing for. Real Madrid announces that it will entertain proposals of interest from players only up to midnight tonight, because its Board of Directors will be meeting tomorrow to discuss to which players the club will be making offers (Real Madrid’s Board of Directors does not rule out making offers to players who have not approached it with proposals of interest). Only after the meeting will Real Madrid make offers to the players it wishes to add to its roster for the next season. Beckham needs to decide whether to approach Real Madrid today with an expression of interest, or wait until tomorrow, hoping that the club will make him an offer on its own initiative. Real Madrid’s Board of Directors will be called upon to consider two alternatives: hiring an outside expert to assess Beckham’s potential contribution to the team, or dropping all considerations of hiring Beckham, without even asking for an expert’s opinion. If Real Madrid hires an outside expert, the club will make an offer to Beckham if the outside expert’s assessment is positive, and decline to make an offer to Beckham if the assessment is negative. The outside expert, if hired by Real Madrid, will not be informed whether or not Beckham has approached Real Madrid. If Beckham fails to receive an offer from Real Madrid, he will not know whether that is because the expert determined his contribution to the team unworthy of a contract, or because the team did not even ask for an expert’s opinion. 
After a week, whether or not he receives an offer from Real Madrid, Beckham must decide which club he will be playing for next season, Bayern Munich, AC Milan, or Real Madrid, assuming the latter has sent him an offer. This situation can be described as a three-player game (see Figure 3.10) (verify this). There are three information sets in this game that contain more than one vertex. The expert does not know whether or not Beckham has approached Real Madrid with an expression of interest. If Beckham has not received an offer from Real Madrid, he does not know whether that is because the expert determined his contribution to the team unworthy of a contract, or because the team did not even ask for an expert’s opinion.
Figure 3.10 The game in Example 3.22 in extensive form
◭
The addition of information sets to our model leads to the following definition.

Definition 3.23 Let $\Gamma = (N, V, E, x^0, (V_i)_{i \in N \cup \{0\}}, (p_x)_{x \in V_0}, O, u)$ be a game in extensive form. An information set of player i is a pair $(U_i, A(U_i))$ such that

• $U_i = \{x_i^1, x_i^2, \dots, x_i^m\}$ is a subset of $V_i$ that satisfies the property that at each vertex in $U_i$ player i has the same number of actions $l_i = l_i(U_i)$, i.e.,

$$|A(x_i^j)| = l_i, \quad \forall j = 1, 2, \dots, m. \tag{3.9}$$

• $A(U_i)$ is a partition of the $m l_i$ edges $\bigcup_{j=1}^{m} A(x_i^j)$ into $l_i$ disjoint sets, each of which contains one element from each of the sets $(A(x_i^j))_{j=1}^{m}$. We denote the elements of the partition by $a_i^1, a_i^2, \dots, a_i^{l_i}$. The partition $A(U_i)$ is called the action set of player i in the information set $U_i$.

We now explain the significance of the definition. When the play of the game arrives at vertex x in information set $U_i$, all that player i knows is that the play has arrived at one of the vertices in this information set. The player therefore cannot choose a particular edge emanating from x. Each element of the partition $a_i^l$ contains m edges, one edge for each vertex in the information set. The partition elements $a_i^1, a_i^2, \dots, a_i^{l_i}$ are the “actions” from which the player can choose; if player i chooses one of the elements from the partition $a_i^l$, the play continues along the unique edge in the intersection $a_i^l \cap A(x)$. For this reason, when we depict games with information sets, we denote edges located in the same partition elements by the same letter.

Definition 3.24 A game in extensive form (with chance moves and with imperfect information) is a vector

$$\Gamma = (N, V, E, x^0, (V_i)_{i \in N \cup \{0\}}, (p_x)_{x \in V_0}, (U_i^j)_{i \in N}^{j = 1, \dots, k_i}, O, u), \tag{3.10}$$

where:

• N is a finite set of players.
• $(V, E, x^0)$ is a game tree.
• $(V_i)_{i \in N \cup \{0\}}$ is a partition of the set of vertices that are not leaves.
• For each vertex $x \in V_0$, $p_x$ is a probability distribution over the set of edges emanating from x.
• For each player $i \in N$, $(U_i^j)_{j = 1, \dots, k_i}$ is a partition of $V_i$.
• For each player $i \in N$ and every $j \in \{1, 2, \dots, k_i\}$, the pair $(U_i^j, A(U_i^j))$ is an information set of player i.
• O is a set of possible outcomes.
• u is a function mapping each leaf of the game tree to a possible outcome in O.
We have added information sets to the previous definition of a game in extensive form (Definition 3.19): $(U_i^j)_{j = 1, \dots, k_i}$ is a partition of $V_i$. Every element $U_i^j$ in this partition is an information set of player i. Note that the information sets are defined only for players $i \in N$, because, as noted above, Nature has no information sets.

In a game with imperfect information, each player i, when choosing an action, does not know at which vertex x the play is located. He only knows the information set $U_i^j$ that contains x. The player then chooses one of the equivalence classes of actions available to him in $U_i^j$, i.e., an element in $A(U_i^j)$.

The game proceeds as described on pages 44 and 51, with one difference: when the play is at vertex x, the decision maker at that state, player J(x), knows only the information set $U_{J(x)}^j$ that contains x, and he chooses an element a in $A(U_{J(x)}^j)$.

We can now describe many more games as games in extensive form: various card games such as poker and bridge, games of strategy such as Stratego, and many real-life situations, such as bargaining between two parties.

Definition 3.25 An extensive-form game is called a game with perfect information for player i if each information set of player i contains only one vertex. An extensive-form game is called a game with perfect information if it is a game with perfect information for all of the players.

In Definition 3.11 (page 45), we defined a subgame starting at a vertex x to be the game defined by restriction to the subtree starting at x. A natural question arises as to how this definition can be adapted to games in which players have information sets that contain several vertices, because player i may have an information set $(U_i, A(U_i))$ where $U_i$ contains both vertices that are in the subtree starting at x, and vertices that are outside this subtree.

We will say that Γ(x) is a subgame only if for every player i and each of his information sets $(U_i, A(U_i))$, the set $U_i$ is either contained entirely inside the subtree starting at x, or disjoint from this subtree. For simplicity we will often refer to $U_i$ as an information set, and omit the set partition $A(U_i)$.

Example 3.26
Consider the two-player game with chance moves and with imperfect information that is
described in Figure 3.11. The outcomes, the names of the actions, and the probabilities assigned to the chance moves are not specified in this game (as they are not needed for our discussion).
Figure 3.11 The game in Example 3.26 in extensive form
The game in Figure 3.11 has four subgames: Γ(R), Γ(C), Γ(D), and Γ(G). The subtree starting at A (or at B) cannot represent a subgame, because the information set {A, B} of Player II is neither contained in, nor disjoint from, the subtree. It would therefore be incorrect to write Γ(A) (or Γ(B)). Similarly, the subtrees that start at E and F cannot represent subgames, because the information set {E, F} of Player II is neither contained in, nor disjoint from, each of these subtrees.
◭
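The subgame condition (every information set is either inside the subtree or disjoint from it) is easy to test mechanically. The following Python sketch uses a small hypothetical tree, not the tree of Figure 3.11, in which the vertices A and B share an information set; the function names and the dictionary encoding are assumptions made for this illustration.

```python
def subtree(children, x):
    """All vertices of the subtree rooted at x; `children` maps a vertex to its child vertices."""
    seen, stack = set(), [x]
    while stack:
        v = stack.pop()
        seen.add(v)
        stack.extend(children.get(v, []))
    return seen

def is_subgame_root(children, info_sets, x):
    """True if every information set is contained in the subtree at x or disjoint from it."""
    sub = subtree(children, x)
    return all(u <= sub or u.isdisjoint(sub) for u in info_sets)

# Hypothetical example: R leads to A and B (one information set), A leads to C.
# Vertices named z1, ..., z5 are leaves.
children = {"R": ["A", "B"], "A": ["C", "z1"], "B": ["z2", "z3"], "C": ["z4", "z5"]}
info_sets = [frozenset({"R"}), frozenset({"A", "B"}), frozenset({"C"})]

roots = [x for x in children if is_subgame_root(children, info_sets, x)]
```

In this hypothetical tree only R and C qualify: the subtree at A contains A but not B, so it splits the information set {A, B}, which mirrors the reason given above for why Γ(A) is not a subgame in Example 3.26.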
3.6.1 Strategies in games with imperfect information

Recall that a player’s strategy is a set of instructions telling the player which action to choose, every time he is called upon to play. When we dealt with games with perfect information, in which each player, when coming to make a decision, knows the current vertex x, a strategy was defined as a function $s_i : V_i \to V$, where $s_i(x) \in C(x)$ for every $x \in V_i$. In a game with imperfect information, when choosing an action, the player knows the information set that contains the current vertex. Therefore a strategy is a function that assigns an action to each information set.

Definition 3.27 A strategy of player i is a function from each of his information sets to the set of actions available at that information set, i.e.,

$$s_i : \mathcal{U}_i \to \bigcup_{j=1}^{k_i} A(U_i^j), \tag{3.11}$$

where $\mathcal{U}_i = \{U_i^1, \dots, U_i^{k_i}\}$ is the collection of player i’s information sets, and for each information set $U_i^j \in \mathcal{U}_i$,

$$s_i(U_i^j) \in A(U_i^j). \tag{3.12}$$
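To make Definition 3.27 concrete, the following Python sketch enumerates the strategies of both players in Matching Pennies (Example 3.20) as functions from information sets to actions, and tabulates the resulting payoffs as read from Figure 3.9. The labels `"R"` and `"{A,B}"` for the information sets and the helper names are assumptions made for this illustration.

```python
from itertools import product

# Information sets of each player in Matching Pennies and the actions available there.
info_sets = {
    "I": {"R": ["H", "T"]},        # Player I's single information set, the root
    "II": {"{A,B}": ["h", "t"]},   # Player II cannot distinguish vertex A from vertex B
}

def strategies(player):
    """All functions from the player's information sets to the actions available there."""
    names = list(info_sets[player])
    choices = product(*(info_sets[player][u] for u in names))
    return [dict(zip(names, choice)) for choice in choices]

def payoff(s1, s2):
    """Payoff pair: Player I wins a dollar if the two sides match, and loses one otherwise."""
    same = s1["R"].lower() == s2["{A,B}"]
    return (1, -1) if same else (-1, 1)

table = {(s1["R"], s2["{A,B}"]): payoff(s1, s2)
         for s1 in strategies("I") for s2 in strategies("II")}
```

Because Player II has a single information set with two actions, he has only two strategies here. If A and B were separate information sets (a game with perfect information), the same enumeration would give him four strategies, one for each way of responding to H and to T.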
Just as in games with chance moves and perfect information, a strategy vector determines a distribution over the outcomes of a game. For example, in Example 3.22, suppose that the players implement the following strategies:

• David Beckham approaches Real Madrid, and then chooses to play for Real Madrid if Real Madrid then makes him an offer; otherwise, he chooses to play for Bayern Munich.
• Real Madrid hires an outside expert if Beckham approaches it, and does not hire an outside expert if Beckham does not approach the club. Real Madrid makes an offer to Beckham only if Beckham first approaches the club, and if the outside expert gives a positive recommendation.
• The outside expert recommends that Real Madrid not make an offer to Beckham.

There are no chance moves in this game, so that the strategy vector determines a unique play of the game, and therefore also a unique outcome: Beckham ends up playing for Bayern Munich, after he approaches Real Madrid, Real Madrid in turn hires an outside expert to provide a recommendation, the expert returns with a negative recommendation, Real Madrid does not make an offer to Beckham, and Beckham then decides to play for Bayern Munich.
3.7 Exercises
3.1 Describe the following situation as an extensive-form game. Three piles of matches are on a table. One pile contains a single match, a second pile contains two matches, and the third pile contains three matches. Two players alternately remove matches from the table. In each move, the player whose turn it is to act at that move may remove matches from one and only one pile, and must remove at least one match. The player who removes the last match loses the game. By drawing arrows on the game tree, identify a way that one of the players can guarantee victory.

3.2 Candidate choice Depict the following situation as a game in extensive form. Eric, Larry, and Sergey are senior partners in a law firm. The three are considering candidates for joining their firm. Three candidates are under consideration: Lee, Rebecca, and John. The choice procedure, which reflects the seniority relations between the three law firm partners, is as follows:

• Eric makes the initial proposal of one of the candidates.
• Larry follows by proposing a candidate of his own (who may be the same candidate that Eric proposed).
• Sergey then proposes a candidate (who may be one of the previously proposed candidates).
• A candidate who receives the support of two of the partners is accepted into the firm. If no candidate has the support of two partners, all three candidates are rejected.

3.3 Does aggressiveness pay off? Depict the following situation as a game in extensive form. A bird is defending its territory. When another bird attempts to invade this territory, the first bird is faced with two alternatives: to stand and fight for its territory, or to flee and seek another place for its home. The payoff to each bird is defined to be the expected number of descendants it will leave for posterity, and these are calculated as follows:

• If the invading bird yields to the defending bird and instead flies to another territory, the payoff is: 6 descendants for the defending bird, 4 descendants for the invading bird.
• If the invading bird presses an attack and the defending bird flies to another territory, the payoff is: 4 descendants for the defending bird, 6 descendants for the invading bird.
• If the invading bird presses an attack and the defending bird stands its ground and fights, the payoff is: 2 descendants for the defending bird, 2 descendants for the invading bird.

3.4 Depict the following situation as a game in extensive form. Peter and his three children, Andrew, James, and John, manage a communications corporation. Andrew is the eldest child, James the second-born, and John the youngest of the children. Two candidates have submitted offers for the position of corporate accountant at the communications corporation. The choice of a new accountant is conducted as follows: Peter first chooses two of his three children. The two selected children conduct a meeting to discuss the strengths and weaknesses of each of the two candidates. The elder of the two children then proposes a candidate. The younger of the two children expresses either agreement or disagreement to the proposed candidate. A candidate is accepted to the position only if two children support his candidacy. If neither candidate enjoys the support of two children, both candidates are rejected.

3.5 (a) How many strategies has each player got in each of the following three games (the outcomes of the games are not specified in the figures)?
[Figure: game trees of Game A, Game B, and Game C]

Figure 3.12 The board of the game Tic-Tac-Toe, after three moves
(b) Write out in full all the strategies of each player in each of the three games.
(c) How many different plays are possible in each of the games?

3.6 In a single-player game in which at each vertex x that is not the root the player has $m_x$ actions, how many strategies has the player got?

3.7 Prove von Neumann’s Theorem (Theorem 3.13 on page 46): in every two-player finite game with perfect information in which the set of outcomes is O = {I wins, II wins, Draw}, one and only one of the following three alternatives holds:
(a) Player I has a winning strategy.
(b) Player II has a winning strategy.
(c) Each of the two players has a strategy guaranteeing at least a draw.
Where does your proof make use of the assumption that the game is finite?

3.8 Tic-Tac-Toe How many strategies has Player I got in Tic-Tac-Toe, in which two players play on a 3 × 3 board, as depicted in Figure 3.12? Player I makes the first move, and each player in turn chooses a square that has not previously been selected. Player I places an X in every square that he chooses, and Player II places an O in every square that he chooses. The game ends when every square has been selected. The first player who has managed to place his mark in three adjoining squares, where those three squares form either a column, a row, or a diagonal, is the winner.⁴ (Do not attempt to draw a full game tree. Despite the fact that the rules of the game are quite simple, the game tree is exceedingly large. Despite the size of the game tree, with a little experience players quickly learn how to ensure at least a draw in every play of the game.)

⁴ The game, of course, can effectively be ended when one of the players has clearly ensured victory for himself, but calculating the number of strategies in that case is more complicated.

3.9 By definition, a player’s strategy prescribes his selected action at each vertex in the game tree. Consider the following game.

[Figure: two-player game tree for Exercise 3.9]

Player I has four strategies, $T_1T_2$, $T_1B_2$, $B_1T_2$, $B_1B_2$. Two of these strategies, $B_1T_2$ and $B_1B_2$, regardless of the strategy used by Player II, yield the same play of the game, because if Player I has selected action $B_1$ at the root vertex, he will never get to his second decision vertex. We can therefore eliminate one of these two strategies and define a reduced strategy $B_1$, which only stipulates that Player I chooses $B_1$ at the root of the game. In the game appearing in the above figure, the reduced strategies of Player I are $T_1T_2$, $T_1B_2$, and $B_1$. The reduced strategies of Player II are the same as his regular strategies, $t_1t_2$, $t_1b_2$, $b_1t_2$, and $b_1b_2$, because Player II does not know to which vertex Player I’s choice will lead.

Formally, a reduced strategy $\tau_i$ of player i is a function from a subcollection $\widehat{\mathcal{U}}_i$ of player i’s collection of information sets to actions, satisfying the following two conditions:
(i) For any strategy vector of the remaining players $\sigma_{-i}$, given the vector $(\tau_i, \sigma_{-i})$, the game will definitely not get to an information set of player i that is not in the collection $\widehat{\mathcal{U}}_i$.
(ii) There is no strict subcollection of $\widehat{\mathcal{U}}_i$ satisfying condition (i).
(a) List the reduced strategies of each of the players in the game depicted in the following figure.

[Figure: three-player game tree for Exercise 3.9(a), with payoff triples at the leaves]
(b) What outcome of the game will obtain if the three players make use of the reduced strategies $\{(B_1), (t_1, t_3), (\beta_1, \tau_2)\}$?
(c) Can any player increase his payoff by unilaterally making use of a different strategy (assuming that the other two players continue to play according to the strategies of part (b))?

3.10 Consider the game in the following figure.

[Figure: game tree for Exercise 3.10, with Players I and II and outcomes O1, O2, O3]

The outcomes $O_1$, $O_2$, and $O_3$ are distinct and taken from the set {I wins, II wins, Draw}.
(a) Is there a choice of $O_1$, $O_2$, and $O_3$ such that Player I can guarantee victory for himself? Justify your answer.
(b) Is there a choice of $O_1$, $O_2$, and $O_3$ such that Player II can guarantee victory for himself? Justify your answer.
(c) Is there a choice of $O_1$, $O_2$, and $O_3$ such that both players can guarantee for themselves at least a draw? Justify your answer.

3.11 The Battle of the Sexes The game in this exercise, called Battle of the Sexes, is an oft-quoted example in game theory (see also Example 4.21 on page 98). The name given to the game comes from the story often attached to it: a couple is trying to determine how to spend their time next Friday night. The available activities in their town are attending a concert (C), or watching a football match (F). The man prefers football, while the woman prefers the concert, but both of them prefer being together to spending time apart. The pleasure each member of the couple receives from the available alternatives is quantified as follows:
• From watching the football match together: 2 for the man, 1 for the woman.
• From attending the concert together: 1 for the man, 2 for the woman.
• From spending time apart: 0 for the man, 0 for the woman.

The couple do not communicate well with each other, so each one chooses where he or she will go on Friday night before discovering what the other selected, and refuses to change his or her mind (alternatively, we can imagine each one going to his or her choice directly from work, without informing the other). Depict this situation as a game in extensive form.

3.12 The Centipede game⁵ The game tree appearing in Figure 3.13 depicts a two-player game in extensive form (note that the tree is shortened; there are another 94 choice vertices and another 94 leaves that do not appear in the figure). The payoffs appear as pairs (x, y), where x is the payoff to Player I (in thousands of dollars) and y is the payoff to Player II (in thousands of dollars). The players make moves in alternating turns, with Player I making the first move. Every player has a till into which money is added throughout the play of the game. At the root of the game, Player I’s till contains $1,000, and Player II’s till is empty. Every player in turn, at her move, can elect either to stop the game (S), in which case every player receives as payoff the amount of money in her till, or to continue to play (C). Each time a player elects to continue the game, she removes $1,000 from her till and places it in the other player’s till, while simultaneously the game-master adds another $2,000 to the other player’s till. If no player has stopped the game after 100 turns have passed, the game ends, and each player receives the amount of money in her till at that point. How would you play this game in the role of Player I? Justify your answer!

⁵ The Centipede game was invented by Robert Rosenthal (see Rosenthal [1981]).
62
Extensive-form games Stage: 1 I S (1, 0)
C
2 II S (0, 3)
3 I
C S
(3, 2)
C
4 II
99 I
S
S
(2, 5)
C
100 II
C
(101, 100)
S
(99, 98) (98, 101)
Figure 3.13 The Centipede game (outcomes are in payoffs of thousands of dollars)
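The payoffs shown in Figure 3.13 follow from the till rules stated in Exercise 3.12, and the bookkeeping can be sketched in a few lines of Python. The function name `centipede_payoffs` and the list representation of the two tills are assumptions made for this illustration; amounts are in thousands of dollars. The sketch only tabulates payoffs and does not address the question of how to play.

```python
def centipede_payoffs(stages=100):
    """Return ({stage: payoffs if the mover stops at that stage}, payoffs if nobody ever stops)."""
    tills = [1, 0]                       # Player I starts with 1 (thousand), Player II with 0
    stop_payoffs = {}
    for stage in range(1, stages + 1):
        mover = (stage - 1) % 2          # 0 = Player I (odd stages), 1 = Player II (even stages)
        stop_payoffs[stage] = tuple(tills)
        # The mover continues: 1 is moved to the other till, and the game-master adds 2 to it.
        tills[mover] -= 1
        tills[1 - mover] += 3
    return stop_payoffs, tuple(tills)

stop_payoffs, never_stopped = centipede_payoffs()
```

The entries for stages 1 to 4 and 99 to 100 can be compared with the leaves drawn in the figure, and `never_stopped` with the leaf reached when both players always continue.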
3.13 Consider the following game. Two players, each in turn, place a quarter on a round table, in such a way that the coins are never stacked one over another (although the coins may touch each other); every quarter must be placed fully on the table. The first player who cannot place an additional quarter on the table at his turn, without stacking it on an already placed quarter, loses the game (and the other player is the winner). Prove that the opening player has a winning strategy.

3.14 Nim⁶ Nim is a two-player game, in which piles of matches are placed before the players (the number of piles in the game is finite, and each pile contains a finite number of matches). Each player in turn chooses a pile, and removes any number of matches from the pile he has selected (he must remove at least one match). The player who removes the last match wins the game.
(a) Does von Neumann’s Theorem (Theorem 3.13 on page 46) imply that one of the players must have a winning strategy? Justify your answer!

We present here a series of guided exercises for constructing a winning strategy in the game of Nim. At the beginning of play, list, in a column, the number of matches in each pile, expressed in base 2. For example, if there are 4 piles containing, respectively, 2, 12, 13, and 21 matches, list:

   10
 1100
 1101
10101

Next, check whether the number of 1s in each column is odd or even. In the above example, counting from the right, in the first and fourth columns the number of 1s is even, while in the second, third, and fifth columns the number of 1s is odd. A position in the game will be called a “winning position” if the number of 1s in each column is even. The game state depicted above is not a winning position.
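The column-parity check described above can be carried out mechanically. The Python sketch below computes, for each binary column counted from the right, whether the number of 1s is even or odd, and tests the definition of a “winning position”; the function names are assumptions made for this illustration, and the position (1, 2, 3) is a hypothetical extra example that does not appear in the text.

```python
def column_parities(piles):
    """Parity (0 = even, 1 = odd) of the number of 1s in each binary column, rightmost first."""
    width = max(piles).bit_length()
    return [sum((p >> k) & 1 for p in piles) % 2 for k in range(width)]

def is_winning_position(piles):
    """A 'winning position' in the sense of the exercise: every column has an even number of 1s."""
    return not any(column_parities(piles))

example = [2, 12, 13, 21]
parities = column_parities(example)        # entries correspond to columns 1 to 5 from the right
hypothetical = is_winning_position([1, 2, 3])
```

For the position in the text, `parities` should mark the second, third, and fifth columns as odd. The same parities are the binary digits of the bitwise exclusive-or of the pile sizes, which is a convenient way to run the check on larger positions.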
(b) Prove that, starting from any position that is not a winning position, it is possible to get to a winning position in one move (that is, by removing matches from a single pile). In our example, if 18 matches are removed from the largest pile, the remaining four piles will have 2, 12, 13, and 3 matches, respectively, which in base 2 are represented as
      10
    1100
    1101
      11
which is a winning position, as every column has an even number of 1s.
(c) Prove that at a winning position, every legal action leads to a non-winning position.
(d) Explain why at the end of every play of the game, the position of the game will be a winning position.
(e) Explain how we can identify which player can guarantee victory for himself (given the initial number of piles of matches and the number of matches in each pile), and describe that player’s winning strategy.
6 Nim is an ancient game, probably originating in China. There are accounts of the game being played in Europe as early as the 15th century. The proof presented in this exercise is due to Bouton [1901].
3.15 The game considered in this exercise is exactly like the game of Nim of the previous exercise, except that here the player who removes the last match loses the game. (The game described in Exercise 3.1 is an example of such a game.)
(a) Is it possible for one of the players in this game to guarantee victory? Justify your answer.
(b) Explain how we can identify which player can guarantee victory for himself in this game (given the initial number of piles of matches and the number of matches in each pile), and describe that player’s winning strategy.
3.16 Answer the following questions relating to David Gale’s game of Chomp (see Section 3.4 on page 47).
(a) Which of the two players has a winning strategy in a game of Chomp played on a 2 × ∞ board? Justify your answer. Describe the winning strategy.
(b) Which of the two players has a winning strategy in a game of Chomp played on an m × ∞ board, where m is any finite integer? Justify your answer. Describe the winning strategy.
(c) Find two winning strategies for Player I in a game of Chomp played on an ∞ × ∞ board.
3.17 Show that the conclusion of von Neumann’s Theorem (Theorem 3.13, page 46) does not hold for the Matching Pennies game (Example 3.20, page 52), where we interpret the payoff (1, −1) as victory for Player I and the payoff (−1, 1) as victory for Player II. Which condition in the statement of the theorem fails to obtain in Matching Pennies?
3.18 Prove that von Neumann’s Theorem (Theorem 3.13, page 46) holds in games in extensive form with perfect information and without chance moves, in which the game tree has a countable number of vertices, but the depth of every vertex is bounded; i.e., there exists a positive integer K that is greater than the length of every path in the game tree.
3.19 Hex Hex⁷ is a two-player game conducted on a rhombus containing n² hexagonal cells, as depicted in Figure 3.14 for n = 6. The players control opposite sides of the rhombus (in the accompanying figure, the names of the players are “Light” and “Dark”). Light controls the south-west (SW) and north-east (NE) sides, while Dark controls the north-west (NW) and south-east (SE) sides. The game proceeds as follows. Dark has the opening move. Every player in turn chooses an unoccupied hexagon, and occupies it with a colored game piece. A player who manages to connect the two sides he controls with a continuous path⁸ of hexagons occupied by his pieces is declared a winner. If neither player can do so, a draw is called. We will show that a play of the game can never end in a draw. In Figure 3.14, we depict a play of the game won by Dark. Note that, by the rules, the players can keep placing game pieces until the entire board has been filled, so that a priori it might seem as if it might be possible for both players to win, but it turns out to be impossible, as we will prove. There is, in fact, an intuitive argument for why a draw cannot occur: imagine that one player’s game pieces are bodies of water, and the other player’s game pieces are dry land. If the water player is a winner, it means that he has managed to create a water channel connecting his sides, through which no land-bridge constructed by the opposing player can cross. We will see that turning this intuitive argument into a formal proof is not at all easy.⁹
[Figure 3.14 The Hex game board for n = 6 (in the play depicted here, dark is the winner). Board not reproduced: the corners are labeled N, E, S, and W, and each side is labeled with the name of the player who controls it.]
7 Hex was invented in 1942 by a student named Piet Hein, who called it Polygon. It was reinvented, independently, by John Nash in 1948. The name Hex was given to the game by Parker Bros., who sold a commercial version of it. The proof that the game cannot end in a draw, and that there cannot be two winners, is due to David Gale [1979]. The presentation in this exercise is due to Jack van Rijswijck (see http://www.cs.ualberta.ca/~javhar/). The authors thank Taco Hoekwater for assisting them in preparing the figure of the game board.
8 A continuous path is a chain of adjacent hexagons, where two hexagons are called “adjacent” if they share a common edge.
9 This argument is equivalent to Jordan’s Theorem, which states that a closed, continuous curve divides a plane into two parts, in such a way that every continuous curve that connects a point in one of the two disconnected parts with a point in the other part must necessarily pass through the original curve.
For simplicity, assume that the edges of the board, as in Figure 3.14, are also composed of (half) hexagons. The hexagons composing each edge will be assumed to be colored with the color of the player who controls that respective edge of the board. Given a fully covered board, we construct a broken line (which begins at the corner labeled W). Every leg of the broken line separates a game piece of one color from a game piece of the other color (see Figure 3.14).
(a) Prove that within the board, with the exception of the corners, the line can always be continued in a unique manner.
(b) Prove that the broken line will never return to a vertex through which it previously passed (hint: use induction).
(c) From the first two claims, and the fact that the board is finite, conclude that the broken line must end at a corner of the board (not the corner from which it starts). Keep in mind that one side of the broken line always touches hexagons of one color (including the hexagons comprising the edges of the rhombus), and the other side of the line always touches hexagons of the other color.
(d) Prove that if the broken line ends at corner S, the sides controlled by Dark are connected by dark-colored hexagons, so that Dark has won (as in Figure 3.14). Similarly, if the broken line ends at corner N, Light has won.
(e) Prove that it is impossible for the broken line to end at corner E.
(f) Conclude that a draw is impossible.
(g) Conclude that it is impossible for both players to win.
(h) Prove that the player with the opening move has a winning strategy.
Guidance for the last part: Based on von Neumann’s Theorem (Theorem 3.13, page 46), and previous claims, one (and only one) of the players has a winning strategy. Call the player with the opening move Player I, and the other player, Player II. Suppose that Player II has a winning strategy. We will prove then that Player I has a winning strategy too, contradicting von Neumann’s Theorem. The winning strategy for Player I is as follows: in the opening move, place a game piece on any hexagon on the board. Call that game piece the “special piece.” In subsequent moves, play as if (i) you are Player II (and use his winning strategy), (ii) the special piece has not been placed, and (iii) your opponent is Player I. If the strategy requires placing a game piece where the special game piece has already been placed, put a piece on any empty hexagon, and from there on call that game piece the “special piece.”
3.20 And-Or is a two-player game played on a full binary tree with a root, of depth n (see Figure 3.15). Every player in turn chooses a leaf of the tree that has not previously been selected, and assigns it the value 1 or 0. After all the leaves have been assigned a value, a value for the entire tree is calculated as in the figure. The first step involves calculating the value of the vertices at one level above the level of the leaves: the value of each such vertex is calculated using the logic “or” function, operating on the values assigned to its children. Next, a value is calculated for each vertex one level up, with that value calculated using the logic “and” function, operating on the values previously calculated for their respective children. The truth tables of the “and” and “or” functions are:¹⁰

| x | y | x and y | x or y |
|---|---|---------|--------|
| 0 | 0 | 0       | 0      |
| 0 | 1 | 0       | 1      |
| 1 | 0 | 0       | 1      |
| 1 | 1 | 1       | 1      |

Equivalently, x and y = min{x, y} and x or y = max{x, y}. The values of all the vertices of the tree are alternately calculated in this manner recursively, with the value of each vertex calculated using either the “and” or “or” functions, operating on values calculated for their respective children. Player I wins if the value of the root vertex is 1, and loses if the value of the root vertex is 0. Figure 3.15 shows the end of a play of this game, and the calculations of vertex values by use of the “and” and “or” functions. In this figure, Player I is the winner.
[Figure 3.15 A depiction of the game And-Or of depth n = 4 as an extensive-form game. Tree not reproduced.]
10 Equivalently, “x or y” = x ∨ y = max{x, y}, and “x and y” = x ∧ y = min{x, y}.
Answer the following questions:
(a) Which player has a winning strategy in a game played on a tree of depth two?
(b) Which player has a winning strategy in a game played on a tree of depth 2k, where k is any positive integer?
Guidance: To find the winning strategy in a game played on a tree of depth 2k, keep in mind that you can first calculate inductively the winning strategy for a game played on a tree of depth 2k − 2.
3.21 Each one of the following figures cannot depict a game in extensive form. For each one, explain why.
[Figures for Exercise 3.21, Parts A to H: tree diagrams not reproduced.]
3.22 In each of the following games, Player I has an information set containing more than one vertex. What exactly has Player I “forgotten” (or could “forget”) during the play of each game?
[Figures for Exercise 3.22, Games A to C: game trees not reproduced.]
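3.23 In which information sets for the following game does Player II know the action taken by Player I?
[Figure for Exercise 3.23: game tree not reproduced; Player II’s information sets are labeled $U_{II}^1$, $U_{II}^2$, and $U_{II}^3$.]
3.24 Sketch the information sets in the following game tree in each of the situations described in this exercise.
[Figure for Exercise 3.24: game tree with decision vertices of Players I, II, and III, not reproduced.]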
(a) Player II does not know what Player I selected, while Player III knows what Player I selected, but if Player I moved down, Player III does not know what Player II selected.
(b) Player II does not know what Player I selected, and Player III does not know the selections of either Player I or Player II.
(c) At every one of his decision points, Player I cannot remember whether or not he has previously made any moves.
3.25 For each of the following games:
(a) List all of the subgames.
(b) For each information set, note what the player to whom the information set belongs knows, and what he does not know, at that information set.
[Figures for Exercise 3.25, Games A to D: game trees not reproduced.]
3.26 Only a partial depiction of a game in extensive form is presented in the accompanying figure of this exercise. Sketch the information sets describing each of the following situations.
(a) Player II, at his decision points, knows what Player I selected, but does not know the result of the chance move.
(b) Player II, at his decision points, knows the result of the chance move (where relevant). If Player I has selected T, Player II knows that this is the case, but if Player I selected either B or M, Player II does not know which of these two actions was selected.
(c) Player II, at his decision points, knows both the result of the chance move and any choice made by Player I.
[Figure for Exercise 3.26: partial game tree not reproduced.]
3.27 (a) What does Player I know, and what does he not know, at each information set in the following game:
[Figure for Exercise 3.27: game tree not reproduced.]
(b) How many strategies has Player I got?
(c) The outcome of the game is the payment to Player I. What do you recommend Player I should play in this game?
3.28 How many strategies has Player II got in the game in the figure in this exercise, in each of the described situations? Justify your answers.
[Figure for Exercise 3.28: game tree not reproduced; Player II’s decision vertices are labeled A, B, C, D, and E.]
(a) The information sets of Player II are: {A}, {B, C}, {D, E}.
(b) The information sets of Player II are: {A, B}, {C}, {D, E}.
(c) The information sets of Player II are: {A, B, C}, {D, E}.
(d) The information sets of Player II are: {A, B, D}, {C}, {E}.
3.29 Consider the following two-player game.
[Figure for Exercise 3.29: game tree not reproduced; Player II’s information sets are labeled $U_{II}^1$ and $U_{II}^2$, and the outcomes are O1 to O8.]
(a) What does Player II know, and what does he not know, at each of his information sets?
(b) Depict the same game as a game in extensive form in which Player II makes his move prior to the chance move, and Player I makes his move after the chance move.
(c) Depict the same game as a game in extensive form in which Player I makes his move prior to the chance move, and Player II makes his move after the chance move.
3.30 Depict the following situation as a game in extensive form. Two corporations manufacturing nearly identical chocolate bars are independently considering whether or not to increase their advertising budgets by $500,000. The sales experts of both corporations are of the opinion that if both corporations increase their advertising budgets, they will each get an equal share of the market, and the same result will ensue if neither corporation increases its advertising budget. In contrast, if one corporation increases its advertising budget while the other maintains the same level of advertising, the corporation that increases its advertising budget will grab an 80% market share, and the other will be left with a 20% market share. The decisions of the chief executives of the two corporations are made simultaneously; neither one of the chief executives knows what the decision of the other chief executive is at the time he makes his decision.
3.31 Investments Depict the following situation as a game in extensive form. Jack has $100,000 at his disposal, which he would like to invest. His options include investing in gold for one year; if he does so, the expectation is that there is a probability of 30% that the price of gold will rise, yielding Jack a profit of $20,000, and a probability of 70% that the price of gold will drop, causing Jack to lose $10,000. Jack can alternatively invest his money in shares of the Future Energies corporation; if he does so, the expectation is that there is a probability of 60% that the price of the shares will rise, yielding Jack a profit of $50,000, and a probability of 40% that the price of the shares will drop to such an extent that Jack will lose his entire investment. Another option open to Jack is placing the money in a safe index-linked money market account yielding a 5% return.
3.32 In the game depicted in Figure 3.16, if Player I chooses T, there is an ensuing chance move, after which Player II has a turn, but if Player I chooses B, there is no chance move, and Player II has an immediately ensuing turn (without a chance move). The outcome of the game is a pair of numbers (x, y) in which x is the payoff for Player I and y is the payoff for Player II.
[Figure for Exercise 3.32: game tree not reproduced.]
(a) What are all the strategies available to Player I?
(b) How many strategies has Player II got? List all of them.
(c) What is the expected payoff to each player if Player I plays B and Player II plays (t1, b2, t3)?
(d) What is the expected payoff to each player if Player I plays T and Player II plays (t1, b2, t3)?
3.33 The following questions relate to Figure 3.16. The outcome of the game is a triple (x, y, z) representing the payoff to each player, with x denoting the payoff to Player I, y the payoff to Player II and z the payoff to Player III.
(a) Depict, by drawing arrows, strategies (a, c, e), (h, j, l), and (m, p, q) of the three players.
(b) Calculate the expected payoff if the players make use of the strategies in part (a).
(c) How would you play this game, if you were Player I? Assume that each player is striving to maximize his expected payoff.
[Figure 3.16: game tree not reproduced.]
3.34 Bill asks Al to choose heads or tails. After Al has made his choice (without disclosing it to Bill), Bill flips a coin. If the coin falls on Al’s choice, Al wins. Otherwise, Bill wins. Depict this situation as a game in extensive form.
3.35 A pack of three cards, labeled 1, 2, and 3, is shuffled. William, Mary, and Anne each take a card from the pack. Each of the two players holding a card with low values (1 or 2) pays the amount of money appearing on the card he or she is holding to the player holding the high-valued card (namely, 3). Depict this situation as a game in extensive form.
3.36 Depict the game trees of the following variants of the candidate game appearing in Exercise 3.2:
(a) Eric does not announce which candidate he prefers until the end of the game. He instead writes down the name of his candidate on a slip of paper, and shows that slip of paper to the others only after Larry and Sergey have announced their preferred candidate.
(b) A secret ballot is conducted: no player announces his preferred candidate until the end of the game.
(c) Eric and Sergey keep their candidate preference a secret until the end of the game, but Larry announces his candidate preference as soon as he has made his choice.
3.37 Describe the game Rock, Paper, Scissors as an extensive-form game (if you are unfamiliar with this game, see page 78 for a description).
3.38 Consider the following game. Player I has the opening move, in which he chooses an action in the set {L, R}. A lottery is then conducted, with either λ or ρ selected, both with probability 1/2. Finally, Player II chooses either l or r. The outcomes of the game are not specified. Depict the game tree associated with the extensive-form game in each of the following situations:
(a) Player II, at his turn, knows Player I’s choice, but does not know the outcome of the lottery.
(b) Player II, at his turn, knows the outcome of the lottery, but does not know Player I’s choice.
(c) Player II, at his turn, knows the outcome of the lottery only if Player I has selected L.
(d) Player II, at his turn, knows Player I’s choice if the outcome of the lottery is λ, but does not know Player I’s choice if the outcome of the lottery is ρ.
(e) Player II, at his turn, does not know Player I’s choice, and also does not know the outcome of the lottery.
3.39 In the following game, the root is a chance move, Player I has three information sets, and the outcome is the amount of money that Player I receives.
(a) What does Player I know in each of his information sets, and what does he not know?
(b) What would you recommend Player I to play, assuming that he wants to maximize his expected payoff? Justify your answer.
[Figure for Exercise 3.39: game tree not reproduced.]
4 Strategic-form games
Chapter summary In this chapter we present the model of strategic-form games. A game in strategic form consists of a set of players, a strategy set for each player, and an outcome to each vector of strategies, which is usually given by the vector of utilities the players enjoy from the outcome. The strategic-form description ignores dynamic aspects of the game, such as the order of the moves by the players, chance moves, and the informational structure of the game. The goal of the theory is to suggest which strategies are more likely to be played by the players, or to recommend to players which strategy to implement (or not to implement). We present several concepts that allow one to achieve these goals. The first concept introduced is domination (strict or weak), which provides a partial ordering of strategies of the same player; it tells when one strategy is “better” than another strategy. Under the hypothesis that it is commonly known that “rational” players do not implement a dominated strategy we can then introduce the process of iterated elimination of dominated strategies, also called rationalizability. In this process, dominated strategies are successively eliminated from the game, thereby simplifying it. We go on to introduce the notion of stability, captured by the concept of Nash equilibrium, and the notion of security, captured by the concept of the maxmin value and maxmin strategies. The important class of two-player zero-sum games is introduced along with its solution called the value (or the minmax value). This solution concept shares both properties of security and stability. When the game is not zero-sum, security and stability lead typically to different predictions. We prove that every extensive-form game with perfect information has a Nash equilibrium. This is actually a generalization of the theorem on the game of chess proved in Chapter 1. 
To better understand the relationships between the various concepts, we study the effects of elimination of dominated strategies on the maxmin value and on equilibrium payoffs. Finally, as a precursor to mixed strategies introduced in the next chapter, we look at an example of a two-player game on the unit square and compute its Nash equilibrium.
As we saw in Chapter 3, a player’s strategy in an extensive-form game is a decision rule that determines that player’s action in each and every one of his information sets. When there are no chance moves in the game, each vector of strategies – one strategy per player – determines the play of the game and therefore also the outcome. If there are chance moves, a vector of strategies determines a probability distribution over possible plays of the game, and therefore also over the outcomes of the game. The strategy chosen by a player therefore influences the outcome (or the probability distribution of outcomes, if there are chance moves). If all we are interested in is the outcomes of the game and not the specific actions that brought about those outcomes, then it suffices to describe the game as the set of strategies available to each player, along with the distribution over the outcomes that each vector of strategies brings about.
4.1 Examples and definition of strategic-form games
For the analysis of games, every player must have preferences with respect to the set of outcomes. This subject was covered in detail in Chapter 2 on utility theory, where we saw that if player i’s preference relation $\succsim_i$ satisfies the von Neumann–Morgenstern axioms, then it can be represented by a linear utility function $u_i$. In other words, to every possible outcome o, we can associate a real number $u_i(o)$ representing the utility that player i ascribes to o, with the player preferring one outcome to another if and only if the utility of the first outcome is higher than the utility of the second outcome. The player prefers one lottery to another lottery if and only if the expected utility of the outcomes according to the first lottery is greater than the expected utility of the outcomes according to the second lottery. In most games we analyze in this book, we assume that the preference relations of the players satisfy the von Neumann–Morgenstern axioms. We will also assume that the outcomes of plays of games are given in utility terms. This means that the outcome of a play of a game is an n-dimensional vector, where the i-th coordinate is player i’s utility from that play of the game.¹
Example 4.1 Consider the following two-player game (Figure 4.1) presented in extensive form with six possible outcomes.
[Figure 4.1 A two-player game in extensive form. Tree not reproduced: Player I’s actions are L1, R1, L2, R2, Player II’s actions are l1, r1, l2, r2, and the outcomes are O1 to O6.]
1 This is equivalent to the situation where the outcomes are monetary payoffs and the players are risk neutral, in which case every lottery over payoffs is equivalent to the expected monetary payoff in the lottery drawing.
Player I has four strategies: L1L2, L1R2, R1L2, R1R2, and Player II has four strategies: l1l2, l1r2, r1l2, r1r2. The extensive-form description presents in detail what each player knows at each of his decision points. But we can ignore all that information, and present the players’ strategies, along with the outcomes they lead to, in the table in Figure 4.2:
| Player I \ Player II | l1l2 | l1r2 | r1l2 | r1r2 |
|----------------------|------|------|------|------|
| L1L2                 | O1   | O1   | O2   | O2   |
| L1R2                 | O1   | O1   | O3   | O3   |
| R1L2                 | O4   | O6   | O4   | O6   |
| R1R2                 | O5   | O6   | O5   | O6   |

Figure 4.2 The game in Figure 4.1 in strategic form
In this description of the game, the rows represent the strategies of Player I and the columns those of Player II. In each cell of the table appears the outcome that arises if the two players choose the pair of strategies associated with that cell. For example, if Player I chooses strategy L1L2 and Player II chooses strategy l1l2, we will be in the upper-leftmost cell of the table, leading to outcome O1. ◭
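The passage from the tree to the table can be sketched in Python as follows. The tree itself is an assumption here: the function `outcome` encodes one tree that is consistent with every cell of Figure 4.2 (Player I moves first; Player II then chooses between l1 and r1 after L1, or between l2 and r2 after R1; Player I’s second choice, between L2 and R2, is made at a single information set). The names `outcome`, `S1`, `S2`, and `table` are ours.

```python
from itertools import product


def outcome(s1, s2):
    """Outcome of the play determined by the strategy pair (s1, s2).

    Assumed tree, chosen to be consistent with Figure 4.2.
    """
    first, second = s1      # Player I: action at the root, action at his second information set
    after_L1, after_R1 = s2  # Player II: action after L1, action after R1
    if first == "L1":
        if after_L1 == "l1":
            return "O1"
        return "O2" if second == "L2" else "O3"
    if after_R1 == "r2":
        return "O6"
    return "O4" if second == "L2" else "O5"


# A strategy assigns one action to each information set of the player.
S1 = list(product(["L1", "R1"], ["L2", "R2"]))
S2 = list(product(["l1", "r1"], ["l2", "r2"]))

# The strategic form: one outcome per vector of strategies.
table = {(s1, s2): outcome(s1, s2) for s1 in S1 for s2 in S2}

for s1 in S1:
    print("".join(s1), [table[(s1, s2)] for s2 in S2])
```

Each printed row lists the outcomes in the column order l1l2, l1r2, r1l2, r1r2, so the output can be compared directly with Figure 4.2.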
A game presented in this way is called a game in strategic form or a game in normal form.
Definition 4.2 A game in strategic form (or in normal form) is an ordered triple $G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$, in which:
• $N = \{1, 2, \ldots, n\}$ is a finite set of players.
• $S_i$ is the set of strategies of player i, for every player $i \in N$. We denote the set of all vectors of strategies by $S = S_1 \times S_2 \times \cdots \times S_n$.
• $u_i : S \to \mathbb{R}$ is a function associating each vector of strategies $s = (s_i)_{i \in N}$ with the payoff (= utility) $u_i(s)$ to player i, for every player $i \in N$.
In this definition, the sets of strategies available to the players are not required to be finite, and in fact we will see games with infinite strategy sets in this book. A game in which the strategy set of each player is finite is termed a finite game. The fact that $u_i$ is a function of the vector of strategies s, and not solely of player i’s strategy $s_i$, is what makes this a game, i.e., a situation of interactive decisions in which the outcome for each player depends not on his strategy alone, but on the strategies chosen by all the players.
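As an illustration of Definition 4.2, the triple (N, (S_i), (u_i)) can be written down as a small data structure. The sketch below covers finite games only, and the class name `StrategicFormGame` and its fields are our own choices rather than anything defined in the text. The instance is Rock, Paper, Scissors (Example 4.3), with payoff 1 for a win, −1 for a loss, and 0 for a draw.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

Profile = Tuple[str, ...]  # a vector of strategies, one per player


@dataclass(frozen=True)
class StrategicFormGame:
    players: Sequence[str]                            # the set N
    strategies: Dict[str, Sequence[str]]              # S_i for every player i
    payoff: Callable[[Profile], Tuple[float, ...]]    # s -> (u_1(s), ..., u_n(s))

    def profiles(self):
        """All vectors of strategies, i.e. the product S_1 x ... x S_n."""
        return product(*(self.strategies[i] for i in self.players))


# Circular dominance: each key beats its value.
BEATS = {"Rock": "Scissors", "Scissors": "Paper", "Paper": "Rock"}


def rps_payoff(s):
    s1, s2 = s
    if s1 == s2:
        return (0, 0)
    return (1, -1) if BEATS[s1] == s2 else (-1, 1)


rps = StrategicFormGame(
    players=["I", "II"],
    strategies={"I": list(BEATS), "II": list(BEATS)},
    payoff=rps_payoff,
)

for s in rps.profiles():
    print(s, rps.payoff(s))
```

The payoff function takes the whole strategy vector as its argument, which mirrors the remark that each player’s payoff depends on the strategies chosen by all the players.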
Example 4.3 Rock, Paper, Scissors In the game “Rock, Paper, Scissors,” each one of two players chooses an action from three alternatives: Rock, Paper, or Scissors. The actions are selected by the players simultaneously, with a circular dominance relationship obtaining between the three alternatives: rock smashes scissors, which cut paper, which in turn covers rock. The game in extensive form is described in Figure 4.3 in which the terminal vertices are labeled by the outcomes “I wins,” “II wins,” or D (for draw).
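[Figure 4.3 Rock, Paper, Scissors as a game in extensive form. Tree not reproduced: Player I chooses R, P, or S, and Player II then chooses R, P, or S. When Player I chooses R, the outcomes against R, P, S are Draw, II wins, I wins; when Player I chooses P, they are I wins, Draw, II wins; when Player I chooses S, they are II wins, I wins, Draw.]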
Setting the payoff to a player to be 1 for a win, −1 for a loss, and 0 for a draw, we obtain the game in strategic form appearing in Figure 4.4. In each cell in Figure 4.4 the left number denotes the payoff to Player I and the right number denotes the payoff to Player II.
| Player I \ Player II | Rock  | Paper | Scissors |
|----------------------|-------|-------|----------|
| Rock                 | 0, 0  | −1, 1 | 1, −1    |
| Paper                | 1, −1 | 0, 0  | −1, 1    |
| Scissors             | −1, 1 | 1, −1 | 0, 0     |

Figure 4.4 Rock, Paper, Scissors as a strategic-form game
◭
Games in strategic form are sometimes called matrix games because they are described by matrices.² When the number of players n is greater than 2, the corresponding matrix is n-dimensional, and each cell of the matrix contains a vector with n coordinates, representing the payoffs to the n players. When there are no chance moves, a game in strategic form is derived from a game in extensive form in the following way:
• List the set of all strategies $S_i$ available to each player i in the extensive-form game.
• For each vector of strategies $s = (s_i)_{i \in N}$ find the play determined by this vector of strategies, and then derive the payoffs induced by this play: $u(s) := (u_1(s), u_2(s), \ldots, u_n(s))$.
• Draw the appropriate n-dimensional matrix. When there are two players, the number of rows in the matrix equals the number of strategies of Player I, the number of columns equals the number of strategies of Player II, and the pair of numbers appearing in each cell is the pair of payoffs defined by the pair of strategies associated with that cell. When there are more than two players, the matrix is multi-dimensional (see Exercises 4.17 and 4.18 for examples of games with three players).
2 When n = 2 it is customary to call these games bimatrix games, as they are given by two matrices, one for the payoff of each player.
How is a strategic-form game derived from an extensive-form game when there are chance moves? In that case, every strategy vector $s = (s_i)_{i \in N}$ determines a probability distribution $\mu_s$ over the set O of the game’s possible outcomes, where for each $o \in O$ the value of $\mu_s(o)$ is the probability that if the players play according to strategy vector s the outcome will be o. The cell corresponding to strategy vector s contains the average of the payoffs corresponding to the possible outcomes according to this probability distribution, i.e., the vector $u(s) = (u_i(s))_{i \in N} \in \mathbb{R}^N$ defined by
$u_i(s) := \sum_{o \in O} \mu_s(o) \times u_i(o). \qquad (4.1)$
Since we are assuming that the preference relations of all players satisfy the von Neumann–Morgenstern axioms, $u_i(s)$ is the utility that player i receives from the lottery over the outcomes of the game that is induced when the players play according to strategy vector s.
Example 4.4 Consider the game in extensive form presented in Figure 4.5.
[Figure 4.5 An extensive-form game with chance moves. Tree not reproduced: the vertices are labeled R, A, B, C, D, and E; Player I’s actions are a, b, g, h and Player II’s actions are e, f.]
In this game the outcome is a payoff to each of the players. This is a game of perfect information. Player I has two decision nodes, in each of which he has two possible actions. Player I's strategy set is therefore

$$S_{\mathrm{I}} = \{(a, g), (a, h), (b, g), (b, h)\}. \tag{4.2}$$

Player II has one decision node with two possible actions, so that Player II's strategy set is

$$S_{\mathrm{II}} = \{e, f\}. \tag{4.3}$$
To see how the payoffs are calculated, look, for example, at Player I's strategy (b, g) and at Player II's strategy e. If the players choose these strategies, three possible plays can occur with positive probability:

• The play R → A → B → D → (5, −1), with probability 1/6.
• The play R → A → B → D → (−2, 5), with probability 1/3.
• The play R → A → C → (1, 1), with probability 1/2.

It follows that the expected payoff is

$$\tfrac{1}{6}(5, -1) + \tfrac{1}{3}(-2, 5) + \tfrac{1}{2}(1, 1) = \left(\tfrac{2}{3}, 2\right). \tag{4.4}$$
We can similarly calculate the payoffs to each pair of strategies. The resulting strategic-form game appears in Figure 4.6 (verify!).
Player I \ Player II |  f         |  e
(a, g)               |  0, 0      |  0, 0
(a, h)               |  0, 0      |  0, 0
(b, g)               |  3/2, 1/2  |  2/3, 2
(b, h)               |  7/8, 5/8  |  1/24, 17/8
Figure 4.6 The strategic form of the game in Figure 4.5
◭
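The calculation behind Figure 4.6 can be sketched in a few lines of Python. This is only an illustration of Equation (4.1), not part of the original text: the nested-tuple encoding of the tree, the names `TREE`, `expected_payoff`, and `table`, and the reading of the tree from Figure 4.5 are assumptions made for this sketch. Exact fractions are used so that the cells can be compared with the figure directly.

```python
from fractions import Fraction as F
from itertools import product

# Assumed encoding of the tree in Figure 4.5:
#   ("leaf", payoff), ("chance", [(prob, child), ...]),
#   ("move", player, vertex, {action: child})
TREE = ("move", "I", "R", {
    "a": ("leaf", (0, 0)),
    "b": ("chance", [
        (F(1, 2), ("move", "II", "B", {
            "e": ("chance", [(F(1, 3), ("leaf", (5, -1))),
                             (F(2, 3), ("leaf", (-2, 5)))]),
            "f": ("leaf", (2, 0)),
        })),
        (F(1, 2), ("move", "I", "C", {
            "g": ("leaf", (1, 1)),
            "h": ("chance", [(F(1, 4), ("leaf", (0, 2))),
                             (F(1, 2), ("leaf", (-1, 1))),
                             (F(1, 4), ("leaf", (1, 1)))]),
        })),
    ]),
})


def expected_payoff(node, choice):
    """Expected payoff vector from `node`; choice[vertex] is the action taken there."""
    kind = node[0]
    if kind == "leaf":
        return tuple(F(x) for x in node[1])
    if kind == "chance":
        # Average the children's payoffs, weighted by the chance probabilities.
        total = (F(0), F(0))
        for prob, child in node[1]:
            sub = expected_payoff(child, choice)
            total = tuple(t + prob * x for t, x in zip(total, sub))
        return total
    _, _player, vertex, actions = node
    return expected_payoff(actions[choice[vertex]], choice)


S_I = list(product("ab", "gh"))   # (action at R, action at C)
S_II = ["e", "f"]                 # action at B

table = {
    (s1, s2): expected_payoff(TREE, {"R": s1[0], "C": s1[1], "B": s2})
    for s1 in S_I for s2 in S_II
}

for s1 in S_I:
    print(s1, [tuple(str(x) for x in table[(s1, s2)]) for s2 in S_II])
```

Each cell of `table` is the average in (4.1) for one strategy vector; the entry for ((b, g), e) reproduces the value (2/3, 2) obtained in (4.4).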
In the game in Figure 4.6, Player I's two strategies (a, g) and (a, h) correspond to the same row of payoffs. This means that, independently of Player II's strategy, the strategy (a, g) leads to the same payoffs as does the strategy (a, h). We say that these two strategies are equivalent. This equivalence can be understood by considering the corresponding game in extensive form (Figure 4.5): when Player I chooses a (at vertex R), the choice between g and h has no effect on the outcome of the game, because the play never arrives at vertex C. We can therefore represent the two strategies (a, g) and (a, h) by one strategy, (a), and derive the strategic-form game described in Figure 4.7.

A strategic-form game in which every set of equivalent strategies is represented by a single strategy (“the equivalence set”) is called a game in reduced strategic form. This is essentially the form of the game that is arrived at when we take into account the fact that a particular action by a player excludes reaching some information sets of that player. In that case, there is no need to specify his strategies at those information sets.
Player I \ Player II |  f         |  e
(a)                  |  0, 0      |  0, 0
(b, g)               |  3/2, 1/2  |  2/3, 2
(b, h)               |  7/8, 5/8  |  1/24, 17/8
Figure 4.7 The reduced strategic form of the game in Figure 4.5
Example 4.5 The game of chess in strategic form The number of strategies in the game of chess is immense even if we impose a maximal (large) number of moves after which the outcome is declared as a draw. There is no practical way to write down its game matrix (just as there is no practical way to present the game in extensive form). But it is significant that in principle the game can be represented by a finite matrix (even if its size is astronomic) (see Figure 4.8). The only possible outcomes of the game appearing in the cells of the matrix are W (victory for White), B (victory for Black), and D (draw).
White \ Black |  1  |  2  |  3  | · · ·
1             |  W  |  D  |  W  | · · ·
2             |  D  |  D  |  B  | · · ·
3             |  B  |  B  |  D  | · · ·
· · ·         | · · · | · · · | · · · | · · ·
Figure 4.8 The game of chess in strategic form
A winning strategy for the White player (if one exists) would be represented by a row all of whose elements are W. A winning strategy for the Black player (if one exists) would be represented in this matrix by a column all of whose elements are B. A strategy ensuring at least a draw for White (or Black) is a row (or a column) all of whose elements are D or W (or B or D). It follows from Theorem 1.4 (page 3) that in the matrix representing the game of chess, one and only one of the following alternatives holds:

1. There is a row all of whose elements are W.
2. There is a column all of whose elements are B.
3. There is a row all of whose elements are W or D, and a column all of whose elements are B or D.

If the third possibility obtains, then the cell at the intersection of the row ensuring at least a draw for White and the column guaranteeing at least a draw for Black must contain D: if both players are playing a strategy guaranteeing at least a draw, then the outcome of the play must necessarily be a draw. ◭
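As a small illustration, the three alternatives can be tested mechanically on any finite outcome matrix. The sketch below is a toy under stated assumptions: it uses only the 3 × 3 corner displayed in Figure 4.8 as a stand-in matrix (the real chess matrix cannot be written down), and the function names `all_in` and `alternatives` are invented for this example. For an arbitrary matrix none of the alternatives need hold; that exactly one holds for chess is the content of Theorem 1.4.

```python
def all_in(cells, allowed):
    """True if every cell of a row or column belongs to the set `allowed`."""
    return all(c in allowed for c in cells)


def alternatives(matrix):
    """Return the numbers (1, 2, 3) of the alternatives that hold for `matrix`.

    Rows are White's strategies, columns are Black's strategies, and each
    cell is "W", "B", or "D".
    """
    rows = matrix
    cols = list(zip(*matrix))
    holds = []
    if any(all_in(r, {"W"}) for r in rows):
        holds.append(1)                      # White has a winning strategy
    if any(all_in(c, {"B"}) for c in cols):
        holds.append(2)                      # Black has a winning strategy
    if (any(all_in(r, {"W", "D"}) for r in rows)
            and any(all_in(c, {"B", "D"}) for c in cols)):
        holds.append(3)                      # each player can ensure a draw
    return holds


# Toy matrix: the corner shown in Figure 4.8, treated as if it were a whole game.
TOY = [["W", "D", "W"],
       ["D", "D", "B"],
       ["B", "B", "D"]]

print(alternatives(TOY))
```

In the toy matrix only the third alternative holds: row 1 contains only W and D, column 2 contains only B and D, and the cell where they meet is D, as the argument above requires.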
4.2 The relationship between the extensive form and the strategic form
We have shown that every extensive-form game can be associated with a unique reduced strategic-form game (meaning that every set of equivalent strategies in the extensive-form game is represented by one strategy in the strategic-form game). We have also exhibited a way of deriving a strategic-form game from an extensive-form game. There are two natural questions that arise with respect to the inverse operation: Does every strategic-form game have an extensive-form game from which it is derived? Is there a unique extensive-form game associated with a strategic-form game? The answer to the first question is affirmative, while the answer to the second question is negative.

To show that the first question has an affirmative answer, we will now describe how to associate an extensive-form game with a given strategic-form game. Let $G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ be a game in strategic form, and denote the strategies of each player $i$ by $S_i = \{s_i^1, \ldots, s_i^{m_i}\}$. The reader can verify that G is the strategic-form game associated with the extensive-form game that appears in Figure 4.9. This is a natural description that is also called “the canonical representation” of the game. It captures the main characteristic of a strategic-form game: in essence, the players choose their strategies simultaneously. This property is expressed in Figure 4.9 by the fact that each player has a single information set. For example, despite the fact that in the extensive-form game Player 1 chooses his strategy first, none of the other players, when coming to choose their strategies, know which strategy Player 1 has chosen. Clearly, the order of the players that appear in Figure 4.9 can be selected arbitrarily. Since there are n! permutations over the set of n players, and each permutation defines a different ordering of the players, there are n! such extensive-form canonical descriptions of the same strategic-form game.
Are there other, significantly different, ways of describing the same strategic-form game? The answer is positive. For example, each one of the three games in Figure 4.11 yields the two-player strategic-form game of Figure 4.10. Representation A in Figure 4.11 is the canonical representation of the game. In representation C we have changed the order of the players: instead of Player I playing first followed by Player II, we have divided the choice made by Player II into two parts: one choice is made before Player I makes his selection, and one afterwards. As neither player knows which strategy was selected by the other player, the difference is immaterial to the game. Representation B is more interesting, because in that game Player II knows Player I’s selection before he makes his selection (verify that the strategic form of each of the extensive-form games in Figure 4.11 is identical to the strategic-form game in Figure 4.10.) The fact that a single strategic-form game can be derived from several different extensive-form games is not surprising, because the strategic-form description of a game is a condensed description of the extensive-form game. It ignores many of the dynamic aspects of the extensive-form description. An interesting mathematical question that arises here is “what is the extent of the difference” between two extensive-form games associated with the same strategic-form game? Given two extensive-form games, is it possible
to identify whether or not they yield the same strategic-form game, without explicitly calculating their strategic-form representation? This subject was studied by Thompson [1952], who defined three elementary operations that do not change the “essence” of a game. He then proved that if two games in extensive form with the same set of players can be transformed into each other by a finite number of these three elementary operations, then those two extensive-form games correspond to the same strategic-form game. He also showed that the other direction obtains: if two games in extensive form yield the same strategic-form game, then they can be transformed into each other by a finite number of these three elementary operations.

[Figure 4.9: game tree. Player 1 chooses one of his strategies $s_1^1, \ldots, s_1^{m_1}$, then Player 2 chooses one of $s_2^1, \ldots, s_2^{m_2}$, then Player 3 chooses one of $s_3^1, \ldots, s_3^{m_3}$, and so on; each player has a single information set.]
Figure 4.9 A canonical representation of a strategic-form game as an extensive-form game

Player I \ Player II |  rr     |  rl     |  lr    |  ll
L                    |  2, −1  |  2, −1  |  2, 0  |  2, 0
R                    |  3, 1   |  1, 0   |  3, 1  |  1, 0
Figure 4.10 The strategic-form game derived from each of the three games in Figure 4.11

[Figure 4.11: three game trees, labeled Representation A, Representation B, and Representation C.]
Figure 4.11 Three extensive-form games corresponding to the same strategic-form game in Figure 4.10
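The claim that a game in which Player II observes Player I's move can still yield the strategic form of Figure 4.10 can be checked with a short sketch. The encoding below is an assumption made for illustration: the terminal payoffs in `LEAF` and the convention that the first letter of Player II's strategy is his reply to L and the second his reply to R are read off Figures 4.10 and 4.11, not stated in the text, and the names `LEAF`, `outcome`, and `strategic_form` are invented here.

```python
# Assumed terminal payoffs of Representation B: Player I moves first, Player II
# observes the move and then chooses l or r.
# Keys are (Player I's action, Player II's action).
LEAF = {
    ("R", "r"): (3, 1), ("R", "l"): (1, 0),
    ("L", "r"): (2, -1), ("L", "l"): (2, 0),
}

S_I = ["L", "R"]
# A strategy of Player II names one action per decision vertex. By assumption,
# the first letter is the reply to L and the second letter is the reply to R.
S_II = ["rr", "rl", "lr", "ll"]


def outcome(s1, s2):
    """Payoff pair reached when Player I plays s1 and Player II follows plan s2."""
    reply = s2[0] if s1 == "L" else s2[1]
    return LEAF[(s1, reply)]


# One row per strategy of Player I, one column per strategy of Player II.
strategic_form = {s1: [outcome(s1, s2) for s2 in S_II] for s1 in S_I}

for s1 in S_I:
    print(s1, strategic_form[s1])
```

Under these assumptions the two rows printed are exactly the rows L and R of Figure 4.10, even though the extensive-form game here has perfect information while the canonical representation does not.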
4.3 Strategic-form games: solution concepts
We have dealt so far only with the different ways of describing games in extensive and strategic form. We discussed von Neumann’s theorem in the special case of two players and three possible outcomes: victory for White, a draw, or victory for Black. Now we will look at more general games, and consider the central question of game theory: What can we say about what “will happen” in a given game? First of all, note that this question has at least three different possible interpretations:
1. An empirical, descriptive interpretation: How do players, in fact, play in a given game?
2. A normative interpretation: How “should” the players play in a given game?
3. A theoretical interpretation: What can we predict will happen in a game given certain assumptions regarding “reasonable” or “rational” behavior on the part of the players?

Descriptive game theory deals with the first interpretation. This field of study involves observations of the actual behavior of players, both in real-life situations and in artificial laboratory conditions where they are asked to play games and their behavior is recorded. This book will not address that area of game theory.

The second interpretation would be appropriate for a judge, legislator, or arbitrator called upon to determine the outcome of a game based on several agreed-upon principles, such as justice, efficiency, nondiscrimination, and fairness. This approach is best suited for the study of cooperative games, in which binding agreements are possible, enabling outcomes to be derived from “norms” or agreed-upon principles, or determined by an arbitrator who bases his decision on those principles. This is indeed the approach used for the study of bargaining games (see Chapter 15) and the Shapley value (see Chapter 18).

In this chapter we will address the third interpretation, the theoretical approach. After we have described a game, in either extensive or strategic form, what can we expect to happen? What outcomes, or set of outcomes, will reasonably ensue, given certain assumptions regarding the behavior of the players?
4.4 Notation
Let $N = \{1, \ldots, n\}$ be a finite set, and for each $i \in N$ let $X_i$ be any set. Denote $X := \times_{i \in N} X_i$. For each $i \in N$ we will denote by

$$X_{-i} := \times_{j \neq i} X_j \tag{4.5}$$

the Cartesian product of all the sets $X_j$ except for the set $X_i$. In other words,

$$X_{-i} = \{(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n) : x_j \in X_j, \ \forall j \neq i\}. \tag{4.6}$$

An element in $X_{-i}$ will be denoted $x_{-i} = (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$, which is the $(n - 1)$-dimensional vector derived from $(x_1, \ldots, x_n) \in X$ by suppressing the $i$-th coordinate.
4.5 Domination
Consider the two-player game that appears in Figure 4.12, in which Player I chooses a row and Player II chooses a column. Comparing Player II’s strategies M and R, we find that:
• If Player I plays T, the payoff to Player II under strategy M is 2, compared to only 1 under strategy R.
• If Player I plays B, the payoff to Player II under strategy M is 1, compared to only 0 under strategy R.
Player I \ Player II |  L     |  M     |  R
T                    |  1, 0  |  1, 2  |  0, 1
B                    |  0, 3  |  0, 1  |  2, 0
Figure 4.12 Strategy M dominates strategy R
We see that independently of whichever strategy is played by Player I, strategy M always yields a higher payoff to Player II than strategy R. This motivates the following definition:

Definition 4.6 A strategy $s_i$ of player $i$ is strictly dominated if there exists another strategy $t_i$ of player $i$ such that for each strategy vector $s_{-i} \in S_{-i}$ of the other players,

$$u_i(s_i, s_{-i}) < u_i(t_i, s_{-i}). \tag{4.7}$$

If this is the case, we say that $s_i$ is strictly dominated by $t_i$, or that $t_i$ strictly dominates $s_i$.

In the example in Figure 4.12 strategy R is strictly dominated by strategy M. It is therefore reasonable to assume that if Player II is “rational,” he will not choose R, because under any scenario in which he might consider selecting R, the strategy M would be a better choice. This is the first rationality property that we assume.

Assumption 4.7 A rational player will not choose a strictly dominated strategy.

We will assume that all the players are rational.

Assumption 4.8 All players in a game are rational.

Can a strictly dominated strategy (such as strategy R in Figure 4.12) be eliminated, under these two assumptions? The answer is: not necessarily. It is true that if Player II is rational he will not choose strategy R, but if Player I does not know that Player II is rational, he is liable to believe that Player II may choose strategy R, in which case it would be in Player I's interest to play strategy B. So, in order to eliminate the strictly dominated strategies one needs to postulate that:
• Player II is rational, and
• Player I knows that Player II is rational.

On further reflection, it becomes clear that this, too, is insufficient, and we also need to assume that:

• Player II knows that Player I knows that Player II is rational.

Otherwise, Player II would need to consider the possibility that Player I may play B, considering R to be a strategy contemplated by Player II, in which case Player II may be tempted to play L. Once again, further scrutiny reveals that this is still insufficient, and we need to assume that:

• Player I knows that Player II knows that Player I knows that Player II is rational.
• Player II knows that Player I knows that Player II knows that Player I knows that Player II is rational.
• And so forth.

If all of the infinitely many conditions implied by the above are satisfied, we say that the fact that Player II is rational is common knowledge among the players. This is an important concept underlying most of our presentation. Here we will give only an informal presentation of the concept of common knowledge. A formal definition appears in Chapter 9, where we extensively study common knowledge.

Definition 4.9 A fact is common knowledge among the players of a game if for any finite chain of players $i_1, i_2, \ldots, i_k$ the following holds: player $i_1$ knows that player $i_2$ knows that player $i_3$ knows . . . that player $i_k$ knows the fact.

The chain in Definition 4.9 may contain several instances of the same player (as indeed happens in the above example). Definition 4.9 is an informal definition since we have not formally defined what the term “fact” means, nor have we defined the significance of knowing a fact. We will now add a further assumption to the two assumptions listed above:

Assumption 4.10 The fact that all players are rational (Assumption 4.8) is common knowledge among the players.

Strictly dominated strategies can be eliminated under Assumptions 4.7, 4.8, and 4.10 (we will not provide a formal proof of this claim). In the example in Figure 4.12, this means that, given the assumptions, we should focus on the game obtained by the elimination of strategy R, which appears in Figure 4.13. In this game strategy B of Player I is strictly dominated by strategy T. Because the rationality of Player I is common knowledge, as is the fact that B is a strictly dominated strategy, after the elimination of strategy R, strategy B can also be eliminated. The players therefore need to consider a game with even fewer strategies, which is given in Figure 4.14.

Player I \ Player II |  L     |  M
T                    |  1, 0  |  1, 2
B                    |  0, 3  |  0, 1
Figure 4.13 The game in Figure 4.12 after the elimination of strategy R

Player I \ Player II |  L     |  M
T                    |  1, 0  |  1, 2
Figure 4.14 The game in Figure 4.12 following the elimination of strategies R and B
Because in this game strategy L is strictly dominated (for Player II) by strategy M, after its elimination only one result remains, (1, 2), which obtains when Player I plays T and Player II plays M. The process we have just described is called iterated elimination of strictly dominated strategies. When this process yields a single strategy vector (one strategy per player), as in the example above, then, under Assumptions 4.7, 4.8, and 4.10, that is the strategy vector that will obtain, and it may be regarded as the solution of the game. A special case in which such a solution is guaranteed to exist is the family of games in which every player has a strategy that strictly dominates all of his other strategies, that is, a strictly dominant strategy. Clearly, in that case, the elimination of all strictly dominated strategies leaves each player with only one strategy: his strictly dominant strategy. When this occurs we say that the game has a solution in strictly dominant strategies.
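The process can be sketched for two-player games as follows. This is a minimal illustration of Definition 4.6 applied repeatedly, not a procedure taken from the text: the function names, the dictionary encoding of the payoff matrix, and the particular order in which players and strategies are scanned are all assumptions of this sketch (players are indexed 0 for Player I and 1 for Player II).

```python
def strictly_dominated(player, s, strategies, payoff):
    """True if some other strategy of `player` does strictly better than `s`
    against every remaining strategy of the opponent (Definition 4.6)."""
    own, other = strategies[player], strategies[1 - player]

    def u(mine, theirs):
        profile = (mine, theirs) if player == 0 else (theirs, mine)
        return payoff[profile][player]

    return any(all(u(t, o) > u(s, o) for o in other) for t in own if t != s)


def iterated_elimination(strategies, payoff):
    """Remove strictly dominated strategies until none is left.

    Returns the surviving strategy lists and the order of elimination."""
    strategies = [list(strategies[0]), list(strategies[1])]
    order = []
    changed = True
    while changed:
        changed = False
        for player in (0, 1):
            for s in list(strategies[player]):
                if strictly_dominated(player, s, strategies, payoff):
                    strategies[player].remove(s)
                    order.append((player, s))
                    changed = True
    return strategies, order


# The game of Figure 4.12: keys are (row of Player I, column of Player II).
FIG_4_12 = {
    ("T", "L"): (1, 0), ("T", "M"): (1, 2), ("T", "R"): (0, 1),
    ("B", "L"): (0, 3), ("B", "M"): (0, 1), ("B", "R"): (2, 0),
}

remaining, order = iterated_elimination([["T", "B"], ["L", "M", "R"]], FIG_4_12)
print(order)
print(remaining)
```

On the game of Figure 4.12 this sketch removes R, then B, then L, and is left with the single strategy vector (T, M), whose payoff is (1, 2), in agreement with the discussion above.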
Example 4.11 The Prisoner's Dilemma The “Prisoner's Dilemma” is a very simple game that is interesting in several respects. It appears in the literature in the form of the following story. Two individuals who have committed a serious crime are apprehended. Lacking incriminating evidence, the prosecution can obtain an indictment only by persuading one (or both) of the prisoners to confess to the crime. Interrogators give each of the prisoners (both of whom are isolated in separate cells and unable to communicate with each other) the following choices:

1. If you confess and your friend refuses to confess, you will be released from custody and receive immunity as a state's witness.
2. If you refuse to confess and your friend confesses, you will receive the maximum penalty for your crime (ten years of incarceration).
3. If both of you sit tight and refuse to confess, we will make use of evidence that you have committed tax evasion to ensure that both of you are sentenced to a year in prison.
4. If both of you confess, it will count in your favor and we will reduce each of your prison terms to six years.

This situation defines a two-player strategic-form game in which each player has two strategies: D, which stands for Defection, betraying your fellow criminal by confessing, and C, which stands for Cooperation, cooperating with your fellow criminal and not confessing the crime. In this notation, the outcome of the game (in prison years) is shown in Figure 4.15.
Player I \ Player II |  D      |  C
D                    |  6, 6   |  0, 10
C                    |  10, 0  |  1, 1
Figure 4.15 The Prisoner's Dilemma in prison years
As usual, the left-hand number in each cell of the matrix represents the outcome (in prison years) for Player I, and the right-hand number represents the outcome for Player II.
We now present the game in utility units. For example, suppose the utility of both players is given by the following function u:

u(release) = 5, u(one year in prison) = 4, u(6 years in prison) = 1, u(10 years in prison) = 0.
The game in utility terms appears in Figure 4.16.
Player I \ Player II |  D     |  C
D                    |  1, 1  |  5, 0
C                    |  0, 5  |  4, 4
Figure 4.16 The Prisoner's Dilemma in utility units
For both players, strategy D (Defect) strictly dominates strategy C (Cooperate). Elimination of strictly dominated strategies leads to the single solution (D, D) in which both prisoners confess, resulting in the payoff (1, 1). ◭
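The passage from Figure 4.15 to Figure 4.16, and the claim that D strictly dominates C for both players, can be verified with a short sketch. The dictionary names and the helper `strictly_dominates` are assumptions introduced for this illustration; the numbers are those of the example (players are indexed 0 for Player I and 1 for Player II).

```python
# Outcomes in prison years (Figure 4.15): keys are (Player I, Player II).
YEARS = {
    ("D", "D"): (6, 6), ("D", "C"): (0, 10),
    ("C", "D"): (10, 0), ("C", "C"): (1, 1),
}

# The utility function u of the example, as a map from prison years to utility.
UTILITY = {0: 5, 1: 4, 6: 1, 10: 0}

# The game in utility units (Figure 4.16).
GAME = {profile: tuple(UTILITY[y] for y in years)
        for profile, years in YEARS.items()}


def strictly_dominates(player, t, s):
    """True if strategy t gives `player` a strictly higher utility than s
    against each strategy of the other player."""
    for other in ("D", "C"):
        with_t = (t, other) if player == 0 else (other, t)
        with_s = (s, other) if player == 0 else (other, s)
        if GAME[with_t][player] <= GAME[with_s][player]:
            return False
    return True


print(GAME)
print([strictly_dominates(p, "D", "C") for p in (0, 1)])
```

The check confirms that D strictly dominates C for each player, while the payoff at (C, C) is higher for both players than the payoff at (D, D).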
What makes the Prisoner’s Dilemma interesting is the fact that if both players choose strategy C, the payoff they receive is (4, 4), which is preferable for both of them. The solution derived from Assumptions 4.7, 4.8, and 4.10, which appear to be quite reasonable assumptions, is “inefficient”: The pair of strategies (C, C) is unstable, because each individual player can deviate (by defecting) and gain an even better payoff of 5 (instead of 4) for himself (at the expense of the other player, who would receive 0). In the last example, two strictly dominated strategies were eliminated (one per player), but there was no specification regarding the order in which these strategies were eliminated: was Player I’s strategy C eliminated first, or Player II’s, or were they both eliminated simultaneously? In this case, a direct verification reveals that the order of elimination makes no difference. It turns out that this is a general result: whenever iterated elimination of strictly dominated strategies leads to a single strategy vector, that outcome is independent of the order of elimination. In fact, we can make an even stronger statement: even if iterated elimination of strictly dominated strategies yields a set of strategies (not necessarily a single strategy), that set does not depend on the order of elimination (see Exercise 4.10). There are games in which iterated elimination of strictly dominated strategies does not yield a single strategy vector. For example, in a game that has no strictly dominated strategies, the process fails to eliminate any strategy. The game in Figure 4.17 provides an example of such a game. Although there are no strictly dominated strategies in this game, strategy B does have a special attribute: although it does not always guarantee a higher payoff to Player I relative to strategy T , in all cases it does grant him a payoff at least as high, and in the special case in which Player II chooses strategy L, B is a strictly better choice than T . 
In this case we say that strategy B weakly dominates strategy T (and strategy T is weakly dominated by strategy B).
Player I \ Player II |  L     |  R
T                    |  1, 2  |  2, 3
B                    |  2, 2  |  2, 0
Figure 4.17 A game with no strictly dominated strategies
Definition 4.12 Strategy $s_i$ of player $i$ is termed weakly dominated if there exists another strategy $t_i$ of player $i$ satisfying the following two conditions:

(a) For every strategy vector $s_{-i} \in S_{-i}$ of the other players,

$$u_i(s_i, s_{-i}) \leq u_i(t_i, s_{-i}). \tag{4.8}$$

(b) There exists a strategy vector $t_{-i} \in S_{-i}$ of the other players such that

$$u_i(s_i, t_{-i}) < u_i(t_i, t_{-i}). \tag{4.9}$$

In this case we say that strategy $s_i$ is weakly dominated by strategy $t_i$, and that strategy $t_i$ weakly dominates strategy $s_i$.

If strategy $t_i$ dominates (weakly or strictly) strategy $s_i$, then $s_i$ does not (weakly or strictly) dominate $t_i$. Clearly, strict domination implies weak domination. Because we will refer henceforth almost exclusively to weak domination, we use the term “domination” to mean “weak domination,” unless the term “strict domination” is explicitly used. The following rationality assumption is stronger than Assumption 4.7.

Assumption 4.13 A rational player does not use a dominated strategy.

Under Assumptions 4.8, 4.10, and 4.13 we may eliminate strategy T in the game in Figure 4.17 (as it is weakly dominated), and then proceed to eliminate strategy R (which is strictly dominated after the elimination of T). The only remaining strategy vector is (B, L), with a payoff of (2, 2). Such a strategy vector is called rational, and the process of iterative elimination of weakly dominated strategies is called rationalizability. The meaning of “rationalizability” is that a player who expects a certain strategy vector to obtain can explain to himself why that strategy vector will be reached, based on the assumption of rationality.

Definition 4.14 A strategy vector $s \in S$ is termed rational if it is the unique result of a process of iterative elimination of weakly dominated strategies.

Whereas Assumption 4.7 looks reasonable, Assumption 4.13 is quite strong. Reinhard Selten, in trying to justify Assumption 4.13, suggested a concept he termed the trembling hand principle. The basic postulate of this principle is that every single strategy available to a player may be used with positive probability, which may well be extremely small. This may happen simply by mistake (the player's hand might tremble as he reaches to press the button setting in motion his chosen strategy, so that by mistake the button associated
with a different strategy is activated instead), by irrationality on the part of the player, or because the player chose a wrong strategy due to miscalculations. This topic will be explored in greater depth in Section 7.3 (page 262). To illustrate the trembling hand principle, suppose that Player II in the example of Figure 4.17 chooses strategies L and R with respective probabilities x and 1 − x, where 0 < x < 1. The expected payoff to Player I in that case is x + 2(1 − x) = 2 − x if he chooses strategy T , as opposed to a payoff of 2 if he chooses strategy B. It follows that strategy B grants him a strictly higher expected payoff than T , so that a rational Player I facing Player II who has a trembling hand will choose B and not T ; i.e., he will not choose the weakly dominated strategy. The fact that strategy si of player i (weakly or strictly) dominates his strategy ti depends only on player i’s payoff function, and is independent of the payoff functions of the other players. Therefore, a player can eliminate his dominated strategies even when he does not know the payoff functions of the other players. This property will be useful in Section 4.6. In the process of rationalizability we eliminate dominated strategies one after the other. Eliminating strategy si of player i after strategy sj of player j means that we assume that player i believes that player j will not implement sj . This assumption is reasonable only if player i knows player j ’s payoff function. Therefore, the process of iterative elimination of dominated strategies can be justified only if the payoff functions of the players are common knowledge among them; if this condition does not hold, this process is harder to justify. The process of rationalizability – iterated elimination of dominated strategies – is an efficient tool that leads, sometimes surprisingly, to significant results. The following example, taken from the theory of auctions, provides an illustration.
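The two elimination steps in the game of Figure 4.17, and the trembling-hand comparison between T and B, can be checked with the following sketch. The helper names (`weakly_dominates`, `u1`, `u2`, `expected_u1`) and the sample value of x are assumptions made for this illustration; the payoffs are those of Figure 4.17.

```python
from fractions import Fraction

# The game of Figure 4.17: keys are (row of Player I, column of Player II).
GAME = {("T", "L"): (1, 2), ("T", "R"): (2, 3),
        ("B", "L"): (2, 2), ("B", "R"): (2, 0)}


def u1(row, col):
    """Payoff of Player I."""
    return GAME[(row, col)][0]


def u2(col, row):
    """Payoff of Player II, written with his own strategy first."""
    return GAME[(row, col)][1]


def weakly_dominates(t, s, opposing, u):
    """Conditions (a) and (b) of Definition 4.12: t is never worse than s,
    and strictly better against at least one opposing strategy."""
    diffs = [u(t, o) - u(s, o) for o in opposing]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)


# Step 1: B weakly (but not strictly) dominates T for Player I.
step1 = weakly_dominates("B", "T", ["L", "R"], u1)
strict1 = all(u1("B", c) > u1("T", c) for c in ["L", "R"])

# Step 2: once T is removed, L strictly dominates R for Player II.
step2 = all(u2("L", r) > u2("R", r) for r in ["B"])


def expected_u1(row, x):
    """Expected payoff of Player I when Player II plays L with probability x."""
    return x * u1(row, "L") + (1 - x) * u1(row, "R")


x = Fraction(1, 1000)   # an arbitrary small tremble toward L
print(step1, strict1, step2)
print(expected_u1("T", x), expected_u1("B", x))
```

For any x strictly between 0 and 1 the expected payoff of T is 2 − x, which is below the payoff 2 of B, so the weakly dominated strategy is strictly worse against a trembling opponent.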
4.6 Second-price auctions
A detailed study of auction theory is presented in Chapter 12. In this section we will concentrate on the relevance of the concept of dominance to auctions known as sealed-bid second-price auctions, which are conducted as follows:
• An indivisible object is offered for sale.
• The set of buyers in the auction is denoted by $N$. Each buyer $i$ attaches a value $v_i$ to the object; that is, he is willing to pay at most $v_i$ for the object (and is indifferent between walking away without the object and obtaining it at price $v_i$). The value $v_i$ is buyer $i$'s private value, which may arise from entirely subjective considerations, such as his preference for certain types of artistic objects or styles, or from potential profits (for example, the auctioned object might be a license to operate a television channel). This state of affairs motivates our additional assumption that each buyer $i$ knows his own private value $v_i$ but not the values that the other buyers attach to the object. This does not, however, prevent him from assessing the private values of the other buyers, or from believing that he knows their private values with some level of certainty.
• Each buyer $i$ bids a price $b_i$ (presented to the auctioneer in a sealed envelope).
• The winner of the object is the buyer who makes the highest bid. That may not be surprising, but in contrast to the auctions most of us usually see, the winner does not proceed to pay the bid he submitted. Instead he pays the second-highest price offered (hence the name second-price auction). If several buyers bid the same maximal price, a fair lottery is conducted between them to determine who will receive the object in exchange for paying that amount (which in this case is also the second-highest price offered).

Let us take a concrete example. Suppose there are four buyers respectively bidding 5, 15, 2, and 21. The buyer bidding 21 is the winner, paying 15 in exchange for the object. In general, the winner of the auction is a buyer $i$ for which

$$b_i = \max_{j \in N} b_j. \tag{4.10}$$

If buyer $i$ is the winner, the amount he pays is $\max_{j \neq i} b_j$. We now proceed to describe a sealed-bid second-price auction as a strategic-form game:³

1. The set of players is the set $N$ of buyers in the auction.
2. The set of strategies available to buyer $i$ is the set of possible bids $S_i = [0, \infty)$.
3. The payoff to buyer $i$, when the strategy vector is $b = (b_1, \ldots, b_n)$, is

$$u_i(b) = \begin{cases} 0 & \text{if } b_i < \max_{j \in N} b_j, \\[4pt] \dfrac{v_i - \max_{j \neq i} b_j}{|\{k : b_k = \max_{j \in N} b_j\}|} & \text{if } b_i = \max_{j \in N} b_j. \end{cases} \tag{4.11}$$

How should we expect a rational buyer to act in this auction? At first glance, this appears to be a very difficult problem to solve, because no buyer knows the private values of his competitors, let alone the prices they will bid. He may not even know how many other buyers are participating in the auction. So what price $b_i$ will buyer $i$ bid? Will he bid a price lower than $v_i$, in order to ensure that he does not lose money in the auction, or higher than $v_i$, in order to increase his probability of winning, all the while hoping that the second-highest bid will be lower than $v_i$? The process of rationalizability leads to the following result:

Theorem 4.15 In a second-price sealed-bid auction, the strategy $b_i = v_i$ weakly dominates all other strategies.

In other words, under Assumptions 4.8, 4.10, and 4.13, the auction will proceed as follows:
• Every buyer will bid $b_i = v_i$.
• The winner will be the buyer whose private valuation of the object is the highest.⁴

The price paid by the winning buyer (i.e., the object's sale price) is the second-highest private value. If several buyers share the same maximal bid, one of them, selected randomly by a fair lottery, will get the object, and will pay his private value (which in this special case is also the second-highest bid, and his profit will therefore be 0).
3 The relation between this auction method and other, more familiar, auction methods is discussed in Chapter 12.
4 This property is termed efficiency in the game theory and economics literature.
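The payoff function (4.11) can be written out as a short sketch and applied to the four-buyer example above. The function name `payoff` and the private values used below are assumptions for illustration only: the text gives the bids 5, 15, 2, and 21 but not the values, so the sketch supposes that each buyer bids his value.

```python
from fractions import Fraction


def payoff(i, bids, values):
    """Payoff of buyer i according to (4.11).

    A losing buyer gets 0. A buyer whose bid is maximal gets his value minus
    the highest bid among the others, divided by the number of buyers who
    made the maximal bid (the fair lottery among tied buyers)."""
    top = max(bids)
    if bids[i] < top:
        return Fraction(0)
    tied = sum(1 for b in bids if b == top)
    highest_other = max(b for k, b in enumerate(bids) if k != i)
    return Fraction(values[i] - highest_other, tied)


bids = [5, 15, 2, 21]
values = [5, 15, 2, 21]   # hypothetical: every buyer bids his private value

payoffs = [payoff(i, bids, values) for i in range(len(bids))]
print(payoffs)

# A tie at the top (hypothetical numbers): two buyers bid 10, and the first
# of them values the object at 12.
print(payoff(0, [10, 10, 3], [12, 10, 3]))
```

With these assumed values the buyer bidding 21 pays 15 and obtains a payoff of 6, while the other three buyers obtain 0; in the tied case the first buyer's payoff is (12 − 10)/2 = 1.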
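[Figure 4.18: graph of the payoff $u_i(b_i, B_{-i})$ (vertical axis, with $v_i$ marked) as a function of $B_{-i}$ (horizontal axis, with $v_i$ marked).]
Figure 4.18 The payoff function for strategy $b_i = v_i$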
Each buyer knows his private value vi and therefore he also knows his payoff function. Since buyers do not necessarily know each other’s private value, they do not necessarily know each other’s payoff functions. Nevertheless, as we mentioned on page 91, the concept of domination is defined also when a player does not know the other players’ payoff functions. Proof: Consider a buyer i whose private value is vi . Divide the set of strategies available to him, Si = [0, ∞), into three subsets:
• The strategies in which his bid is less than vi: [0, vi).
• The strategy in which his bid is equal to vi: {vi}.
• The strategies in which his bid is higher than vi: (vi, ∞).
We now show that strategy bi = vi dominates all the strategies in the other two subsets. Given the procedure of the auction, the payment eventually made by buyer i depends on the strategies selected by the other buyers, through their highest bid, and the number of buyers bidding that highest bid. Denote the maximal bid put forward by the other buyers by
\[ B_{-i} = \max_{j \neq i} b_j, \tag{4.12} \]
and the number of buyers who offered this bid by
\[ N_{-i} = \bigl|\{k \neq i : b_k = \max_{j \neq i} b_j\}\bigr|. \tag{4.13} \]
The payoff function of buyer i, as a function of the strategy vector b (i.e., the vector of all the bids made by the buyers) is
\[
u_i(b) =
\begin{cases}
0 & \text{if } b_i < B_{-i}, \\[4pt]
\dfrac{v_i - B_{-i}}{N_{-i} + 1} & \text{if } b_i = B_{-i}, \\[4pt]
v_i - B_{-i} & \text{if } b_i > B_{-i}.
\end{cases}
\tag{4.14}
\]
Since the only dependence that the payoff function $u_i(b)$ has on the bids $b_{-i}$ of the other buyers is via the highest bid, $B_{-i}$, we sometimes denote this function by $u_i(b_i, B_{-i})$. If buyer i chooses strategy bi = vi, his payoff as a function of $B_{-i}$ is given in Figure 4.18. If buyer i chooses strategy bi satisfying bi < vi, his payoff function is given by Figure 4.19.
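The payoff function (4.14) and the dominance claim of Theorem 4.15 can be checked numerically. The following is a minimal sketch, not part of the original text: the function names, the integer bid grid 0, ..., 10, the private value 6, and the choice of two competing buyers are all illustrative assumptions, and a finite grid can only illustrate the theorem, not prove it.

```python
from fractions import Fraction
from itertools import product


def payoff(v_i, b_i, other_bids):
    """Payoff (4.14) of buyer i with private value v_i and bid b_i."""
    top = max(other_bids)                           # B_{-i}: highest competing bid
    ties = sum(1 for b in other_bids if b == top)   # N_{-i}: number of buyers bidding B_{-i}
    if b_i < top:
        return Fraction(0)
    if b_i == top:
        return Fraction(v_i - top, ties + 1)        # fair lottery among the tied buyers
    return Fraction(v_i - top)


def weakly_dominates(v_i, bid_a, bid_b, grid, n_others=2):
    """True if bid_a is never worse than bid_b on the grid and sometimes strictly better."""
    never_worse, sometimes_better = True, False
    for others in product(grid, repeat=n_others):
        a = payoff(v_i, bid_a, others)
        b = payoff(v_i, bid_b, others)
        if a < b:
            never_worse = False
        if a > b:
            sometimes_better = True
    return never_worse and sometimes_better


grid = range(0, 11)   # assumed bid grid (illustrative)
v_i = 6               # assumed private value (illustrative)
truthful_dominates_all = all(
    weakly_dominates(v_i, v_i, b, grid) for b in grid if b != v_i
)
print(truthful_dominates_all)
```

Exact rational arithmetic (`Fraction`) is used so that the tie-splitting term in (4.14) is compared without rounding error.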
Figure 4.19 The payoff function for strategy bi < vi [plot of u(bi, B−i) against B−i]
Figure 4.20 The payoff function for strategy bi > vi [plot of u(bi, B−i) against B−i]
The height of the dot in Figure 4.19, when bi = B−i , depends on the number of buyers who bid B−i . The payoff function in Figure 4.18 (which corresponds to the strategy bi = vi ) is (weakly) greater than the one in Figure 4.19 (corresponding to a strategy bi with bi < vi ). The former is strictly greater than the latter when bi ≤ B−i < vi . It follows that strategy bi = vi dominates all strategies in which the bid is lower than buyer i’s private value. The payoff function for a strategy bi satisfying bi > vi is displayed in Figure 4.20. Again, we see that the payoff function in Figure 4.18 is (weakly) greater than the payoff function in Figure 4.20. The former is strictly greater than the latter when vi < B−i ≤ bi . It follows that the strategy in which the bid is equal to the private value weakly dominates all other strategies, as claimed. Theorem 4.15 holds also when some buyers do not know the number of buyers participating in the auction, their private values, their beliefs (about the number of buyers and the private values of the other buyers), and their utility functions (for example, information on whether the other players are risk seekers, risk averse, or risk neutral; see Section 2.7). The only condition needed for Theorem 4.15 to hold is that each buyer know the rules of the auction.
4.7 The order of elimination of dominated strategies
As we have argued, when only strictly dominated strategies are involved in a process of iterated elimination, the result is independent of the order in which strategies are eliminated (Exercise 4.10). In iterated elimination of weakly dominated strategies, the result may be sensitive to the order of elimination. This phenomenon occurs for example in the following game. Example 4.16 Consider the strategic-form game that appears in Figure 4.21.
                        Player II
                    L       C       R
              T   1, 2    2, 3    0, 3
Player I      M   2, 2    2, 1    3, 2
              B   2, 1    0, 0    1, 0

Figure 4.21 A game in which the order of the elimination of dominated strategies influences the yielded result

In the table below, we present three strategy elimination procedures, each leading to a different result (verify!).
       Order of elimination (from left to right)    Result      Payoff
(1)    T, R, B, C                                    ML          2, 2
(2)    B, L, C, T                                    MR          3, 2
(3)    T, C, R                                       ML or BL    2, 2 or 2, 1
The last line shows that eliminating strategies in the order T, C, R leaves two results, ML and BL, with no possibility for further elimination because Player I is indifferent between the two results. This means that the order of elimination may determine not only the yielded strategy vector, but also whether or not the process yields a single strategy vector. ◭
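The three elimination orders of Example 4.16 can be replayed mechanically. The sketch below is illustrative only and not part of the original text: the function names are assumptions, and each removal is checked for weak dominance (which includes strict dominance) among the strategies still remaining at that stage.

```python
# Payoffs of Figure 4.21: PAYOFF[(row, col)] = (payoff to Player I, payoff to Player II)
PAYOFF = {
    ("T", "L"): (1, 2), ("T", "C"): (2, 3), ("T", "R"): (0, 3),
    ("M", "L"): (2, 2), ("M", "C"): (2, 1), ("M", "R"): (3, 2),
    ("B", "L"): (2, 1), ("B", "C"): (0, 0), ("B", "R"): (1, 0),
}


def u(player, own, other):
    """Payoff to `player` (0 = Player I, 1 = Player II) playing `own` against `other`."""
    cell = (own, other) if player == 0 else (other, own)
    return PAYOFF[cell][player]


def is_weakly_dominated(player, s, own_set, other_set):
    """True if some remaining strategy of `player` weakly dominates s."""
    for t in own_set:
        if t == s:
            continue
        diffs = [u(player, t, o) - u(player, s, o) for o in other_set]
        if all(d >= 0 for d in diffs) and any(d > 0 for d in diffs):
            return True
    return False


def eliminate(order):
    """Remove strategies in the given order; fail if a removal is not justified."""
    rows, cols = ["T", "M", "B"], ["L", "C", "R"]
    for s in order:
        player = 0 if s in rows else 1
        own, other = (rows, cols) if player == 0 else (cols, rows)
        if not is_weakly_dominated(player, s, own, other):
            raise ValueError(f"{s} is not dominated at this stage")
        own.remove(s)
    return rows, cols


for order in (["T", "R", "B", "C"], ["B", "L", "C", "T"], ["T", "C", "R"]):
    print(order, "->", eliminate(order))
```

Each call returns the surviving rows and columns, so the third order leaves two rows and one column, matching the last line of the table.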
4.8 Stability: Nash equilibrium
Dominance is a very important concept in game theory. As we saw in the previous section, it has several limitations, and it is insufficient for predicting a rational result in every game. In this section we present another important principle, stability. Consider the following two-player game in strategic form (Figure 4.22).

                        Player II
                    L       C       R
              T   0, 6    6, 0    4, 3
Player I      M   6, 0    0, 6    4, 3
              B   3, 3    3, 3    5, 5

Figure 4.22 A two-player game with no dominated strategies

There is no dominance relationship between the strategies in this game. For example, if we compare the strategies T and M of Player I, it turns out that neither of them is always preferable to the other: M is better than T if Player II chooses L, and T is better than M if Player II chooses C. In fact, M is the best reply of Player I to L, while T is his best reply to C, and B is his best reply to R. Similarly, for Player II, L is the best reply to T and C is the best reply to M, while strategy R is the best reply to B. A player who knows the strategies used by the other players is in effect playing a game in which only he is called upon to choose a strategy. If that player is rational, he will choose the best reply to those strategies used by the other players. For example, in the game in Figure 4.22:
• If Player II knows that Player I will choose T, he will choose L (his best reply to T).
• If Player I knows that Player II will choose L, he will choose M (his best reply to L).
• If Player II knows that Player I will choose M, he will choose C (his best reply to M).
• If Player I knows that Player II will choose C, he will choose T (his best reply to C).
• If Player II knows that Player I will choose B, he will choose R (his best reply to B).
• If Player I knows that Player II will choose R, he will choose B (his best reply to R).
The pair of strategies (B, R) satisfies a stability property: each strategy in this pair is the best reply to the other strategy. Alternatively, we can state this property in the following way: assuming the players choose (B, R), neither player has a profitable deviation; that is, under the assumption that the other player indeed chooses his strategy according to (B, R), neither player has a strategy that grants a higher payoff than sticking to (B, R). This stability property was defined by John Nash, who invented the equilibrium concept that bears his name.
Definition 4.17 A strategy vector $s^* = (s_1^*, \ldots, s_n^*)$ is a Nash equilibrium if for each player $i \in N$ and each strategy $s_i \in S_i$ the following is satisfied:
\[ u_i(s^*) \geq u_i(s_i, s_{-i}^*). \tag{4.15} \]
The payoff vector $u(s^*)$ is the equilibrium payoff corresponding to the Nash equilibrium $s^*$. The strategy $\widehat{s}_i \in S_i$ is a profitable deviation of player i at a strategy vector $s \in S$ if $u_i(\widehat{s}_i, s_{-i}) > u_i(s)$. A Nash equilibrium is a strategy vector at which no player has a profitable deviation.
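Definition 4.17 translates directly into a brute-force search over strategy pairs when both strategy sets are finite. The following is a minimal sketch for two-player games, applied to the game of Figure 4.22; the function name and the dictionary encoding of the payoff matrix are assumptions made for illustration.

```python
from itertools import product

ROWS, COLS = ["T", "M", "B"], ["L", "C", "R"]

# Payoffs of Figure 4.22: FIG_4_22[(row, col)] = (payoff to Player I, payoff to Player II)
FIG_4_22 = {
    ("T", "L"): (0, 6), ("T", "C"): (6, 0), ("T", "R"): (4, 3),
    ("M", "L"): (6, 0), ("M", "C"): (0, 6), ("M", "R"): (4, 3),
    ("B", "L"): (3, 3), ("B", "C"): (3, 3), ("B", "R"): (5, 5),
}


def pure_nash_equilibria(rows, cols, payoff):
    """All strategy pairs at which neither player has a profitable deviation (4.15)."""
    equilibria = []
    for r, c in product(rows, cols):
        u1, u2 = payoff[(r, c)]
        # Player I deviates within column c; Player II deviates within row r.
        row_has_no_gain = all(payoff[(r2, c)][0] <= u1 for r2 in rows)
        col_has_no_gain = all(payoff[(r, c2)][1] <= u2 for c2 in cols)
        if row_has_no_gain and col_has_no_gain:
            equilibria.append((r, c))
    return equilibria


print(pure_nash_equilibria(ROWS, COLS, FIG_4_22))
```

The search examines every one of the nine strategy pairs, which is practical only for small finite games.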
The Nash equilibrium is often simply referred to as an equilibrium, and sometimes as an equilibrium point. As defined above, it says that no player i has a profitable unilateral deviation from $s^*$. The Nash equilibrium can be equivalently expressed in terms of the best-reply concept, which we first define.
Definition 4.18 Let $s_{-i}$ be a strategy vector of all the players not including player i. Player i's strategy $s_i$ is termed a best reply to $s_{-i}$ if
\[ u_i(s_i, s_{-i}) = \max_{t_i \in S_i} u_i(t_i, s_{-i}). \tag{4.16} \]
The next definition, based on the best-reply concept, is equivalent to the definition of Nash equilibrium in Definition 4.17 (Exercise 4.15).
Definition 4.19 The strategy vector $s^* = (s_1^*, \ldots, s_n^*)$ is a Nash equilibrium if $s_i^*$ is a best reply to $s_{-i}^*$ for every player $i \in N$.
In the example in Figure 4.22, the pair of strategies (B, R) is the unique Nash equilibrium among the nine strategy pairs (verify!). For example, the pair (T, L) is not an equilibrium, because T is not a best reply to L; Player I has a profitable deviation from T to M or to B.
Social behavioral norms may be viewed as Nash equilibria. If a norm were not an equilibrium, some individuals in society would find some deviation from that behavioral norm to be profitable, and it would cease to be a norm. A great deal of research in game theory is devoted to identifying equilibria and studying the properties of equilibria in various games. One important research direction that has been emerging in recent years studies processes (such as learning, imitation, or regret) leading to equilibrium behavior, along with the development of algorithms for calculating equilibria.
Example 4.11 The Prisoner's Dilemma (continued)
The Prisoner's Dilemma is presented in the matrix in Figure 4.23.
                    Player II
                    D       C
Player I      D   1, 1    5, 0
              C   0, 5    4, 4

Figure 4.23 The Prisoner's Dilemma
The unique equilibrium is (D, D), in which both prisoners confess to the crime, resulting in payoff (1, 1). Recall that this is the same result that is obtained by elimination of strictly dominated strategies. ◭
Example 4.20 Coordination game The game presented in Figure 4.24 is an example of a broad class of games called “coordination games.” In a coordination game, it is in the interests of both players to coordinate their strategies. In this example both (A, a) and (B, b) are equilibrium points. The equilibrium payoff associated with (A, a) is (1, 1), and the equilibrium payoff of (B, b) is (3, 3). In both cases, and for both players, the payoff is better than (0, 0), which is the payoff for “miscoordinated” strategies (A, b) or (B, a).
                    Player II
                    a       b
Player I      A   1, 1    0, 0
              B   0, 0    3, 3

Figure 4.24 A coordination game
◭
Example 4.21 Battle of the Sexes The game in Figure 4.25 is called the “Battle of the Sexes.”
                    Woman
                    F       C
Man           F   2, 1    0, 0
              C   0, 0    1, 2

Figure 4.25 Battle of the Sexes
The name of the game is derived from the following description. A couple is trying to plan what they will be doing on the weekend. The alternatives are going to a concert (C) or watching a football match (F). The man prefers football and the woman prefers the concert, but both prefer being together to being alone, even if that means agreeing to the less-preferred recreational pastime. There are two equilibrium points: (F, F) with a payoff of (2, 1) and (C, C) with a payoff of (1, 2). The woman would prefer the strategy pair (C, C) while the man would rather see (F, F) chosen. However, either one is an equilibrium. ◭
Example 4.22 The Security Dilemma The game illustrated in Figure 4.26 is also a coordination game called the “Security Dilemma.” The game describes the situation involving the Union of Soviet Socialist Republics (USSR, Player 1) and the United States (US, Player 2) after the Second World War. Each of these countries had the capacity to produce nuclear weapons. The best outcome for each country (4 utility units in the figure) was the one in which neither country had nuclear weapons,
because producing nuclear weapons is expensive and possession of such weapons is liable to lead to war with severe consequences. A less desirable outcome for each country (3 utility units in the figure) is for it to have nuclear weapons while the other country lacks nuclear weapons. Even less desirable for each country (2 utility units in the figure) is for both countries to have nuclear weapons. The worst outcome for a country (1 utility unit in the figure) is for it to lack nuclear weapons while the other country has nuclear weapons.
                                                    US
                                      Don't produce        Produce
                                      nuclear weapons      nuclear weapons
USSR   Produce nuclear weapons             3, 1                 2, 2
       Don't produce nuclear weapons       4, 4                 1, 3

Figure 4.26 The Security Dilemma
There are two Nash equilibria in this game: in one equilibrium neither country produces nuclear weapons and in the other equilibrium both countries produce nuclear weapons. If the US believes that the USSR is not going to produce nuclear weapons then it has no reason to produce nuclear weapons, while if the US believes that the USSR is going to produce nuclear weapons then it would be better off producing nuclear weapons. In the first equilibrium each country runs the risk that the other country will produce nuclear weapons, but in the second equilibrium there is no such risk: if the US does produce nuclear weapons then if the USSR also produces nuclear weapons then the US has implemented the best strategy under the circumstances, while if the USSR does not produce nuclear weapons then the outcome for the US has improved from 2 to 3. In other words, the more desirable equilibrium for both players is also the more risky one. This is why this game got the name the Security Dilemma. Some have claimed that the equilibrium under which both countries produce nuclear weapons is the more reasonable equilibrium (and that is in fact the equilibrium that has obtained historically). Note that the maxmin strategy of each country is to produce nuclear weapons; that strategy guarantees a country implementing it at least 2, while a country implementing the strategy of not producing nuclear weapons runs the risk of getting only 1. ◭
Example 4.23 Cournot5 duopoly competition
Two manufacturers, labeled 1 and 2, produce the same product and compete for the same market of potential customers. The manufacturers simultaneously select their production quantities, with demand determining the market price of the product, which is identical for both manufacturers. Denote by $q_1$ and $q_2$ the quantities respectively produced by Manufacturers 1 and 2. The total quantity of products in the market is therefore $q_1 + q_2$. Assume that when the supply is $q_1 + q_2$ the price of each item is $2 - q_1 - q_2$. Assume also that the per-item production cost for the first manufacturer is $c_1 > 0$ and that for the second manufacturer it is $c_2 > 0$. Does there exist an equilibrium in this game? If so, what is it?

5 Antoine Augustin Cournot, August 28, 1801–March 31, 1877, was a French philosopher and mathematician. In his book Researches on the Mathematical Principles of the Theory of Wealth, published in 1838, he presented the first systematic application of mathematical tools for studying economic theory. The book marks the beginning of modern economic analysis.

This is a two-player game (Manufacturers 1 and 2 are the players), and the strategy set of each player is $[0, \infty)$. If Player 1 chooses strategy $q_1$ and Player 2 chooses strategy $q_2$, the payoff for Player 1 is
\[ u_1(q_1, q_2) = q_1(2 - q_1 - q_2) - q_1 c_1 = q_1(2 - c_1 - q_1 - q_2), \tag{4.17} \]
and the payoff for Player 2 is
\[ u_2(q_1, q_2) = q_2(2 - q_1 - q_2) - q_2 c_2 = q_2(2 - c_2 - q_1 - q_2). \tag{4.18} \]
Player 1's best reply to Player 2's strategy $q_2$ is the value $q_1$ maximizing $u_1(q_1, q_2)$. The function $q_1 \mapsto u_1(q_1, q_2)$ is a quadratic function that attains its maximum at the point where the derivative of the function is zero:
\[ \frac{\partial u_1}{\partial q_1}(q_1, q_2) = 0. \tag{4.19} \]
Differentiating the right-hand side of Equation (4.17) yields the first-order condition $2 - c_1 - 2q_1 - q_2 = 0$, or
\[ q_1 = \frac{2 - c_1 - q_2}{2}. \tag{4.20} \]
Similarly, Player 2's best reply to Player 1's strategy $q_1$ is given by the point where the derivative of $u_2(q_1, q_2)$ with respect to $q_2$ is zero. Taking the derivative, we get
\[ q_2 = \frac{2 - c_2 - q_1}{2}. \tag{4.21} \]
Solving equations (4.20) and (4.21) yields
\[ q_1^* = \frac{2 - 2c_1 + c_2}{3}, \qquad q_2^* = \frac{2 - 2c_2 + c_1}{3}. \tag{4.22} \]
A careful check indicates that this is an equilibrium (Exercise 4.24) and this is the only equilibrium of the game. The payoffs to the players at equilibrium are
\[ u_1(q_1^*, q_2^*) = \left(\frac{2 - 2c_1 + c_2}{3}\right)^2 = (q_1^*)^2, \tag{4.23} \]
\[ u_2(q_1^*, q_2^*) = \left(\frac{2 - 2c_2 + c_1}{3}\right)^2 = (q_2^*)^2. \tag{4.24} \]
For example, if the two players have identical production costs $c_1 = c_2 = 1$, then the equilibrium production quantities will be $q_1^* = q_2^* = \frac{1}{3}$, and the payoff to each player is $\left(\frac{1}{3}\right)^2 = \frac{1}{9}$. ◭
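The closed-form solution (4.22) can be checked against the best-reply equations (4.20) and (4.21) with exact arithmetic. The sketch below is illustrative only: the function names are assumptions, the best reply is truncated at 0 because quantities are nonnegative, and the closed form is used only for costs at which both quantities in (4.22) are nonnegative.

```python
from fractions import Fraction


def best_reply(cost, q_other):
    """Best reply from the first-order condition (4.20)/(4.21), truncated at 0."""
    return max(Fraction(0), Fraction(2 - cost - q_other) / 2)


def cournot_equilibrium(c1, c2):
    """Closed form (4.22); assumes both expressions are nonnegative."""
    return (2 - 2 * c1 + c2) / Fraction(3), (2 - 2 * c2 + c1) / Fraction(3)


def profit(q_own, q_other, cost):
    """Payoff (4.17)/(4.18) of a manufacturer producing q_own at per-item cost `cost`."""
    return q_own * (2 - cost - q_own - q_other)


c1 = c2 = Fraction(1)
q1, q2 = cournot_equilibrium(c1, c2)

# At an equilibrium each quantity is a best reply to the other.
print(q1, q2, best_reply(c1, q2) == q1, best_reply(c2, q1) == q2)
print(profit(q1, q2, c1), profit(q2, q1, c2))
```

With $c_1 = c_2 = 1$ the sketch reproduces the quantities and payoffs of the example; other cost pairs can be passed to `cournot_equilibrium` in the same way.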
4.9 Properties of the Nash equilibrium
The Nash equilibrium is the most central and important solution concept for games in strategic form or extensive form. To understand why, it is worthwhile to consider both the advantages and limitations of Nash’s seminal concept.
                    Player II
                    L       R
Player I      T   0, 0    4, 2
              B   3, 5    0, 0

Figure 4.27 A coordination game
4.9.1 Stability
The most important property expressed by the Nash equilibrium is stability: under Nash equilibrium, each player acts to his best possible advantage with respect to the behavior of the other players. Indeed, this would appear to be a requirement for any solution concept: if there is to be any "expected" result (by any conceivable theory predicting the result of a game), that result must be in equilibrium, because otherwise there will be at least one player with a profitable deviation and the "expected" result will not materialize. From this perspective, the Nash equilibrium is not a solution concept but rather a meta-solution: the stability property is one that we would like every "expected" or "reasonable" solution to exhibit.
4.9.2 A self-fulfilling agreement
Another way to express the property of stability is to require that if there is "agreement" to play a particular equilibrium, then, even if the agreement is not binding, it will not be breached: no player will deviate from the equilibrium point, because there is no way to profit from any unilateral violation of the agreement. This appears to be particularly convincing in games of coordination, as in the example in Figure 4.27. This game has two Nash equilibria, (T, R) and (B, L), and it is reasonable to suppose that if the players were to communicate they would "agree" (probably after a certain amount of debate) to play one of them. The properties of the equilibrium concept imply that whether they choose (T, R) or (B, L) they will both fulfill the agreement and not deviate from it, because any unilateral deviation will bring about a loss to the deviator (and to the other player).
4.9.3 Equilibrium and evolution
The principle of the survival of the fittest is one of the fundamental principles of Darwin's Theory of Evolution. The principle postulates that our world is populated by a multitude of species of plants and animals, including many mutations, but only those whose inherited traits are fitter than those of others to withstand the test of survival will pass on their genes to posterity. For example, if an animal that has been endowed with certain inherited characteristics has on average four offspring who manage to live to adulthood, while a mutation of the animal with a different set of traits has on average only three offspring living to adulthood, then, over several generations, the descendants of the first animal will overwhelm the descendants of the mutation in absolute numbers.
A picturesque way of expressing Darwin’s principle depicts an animal (or plant) being granted the capacity of rational intelligence prior to birth and selecting the genetic traits with which it will enter the world. Under that imaginary scenario we would expect the animal (or plant) to choose those traits that grant the greatest possible advantages in the struggle for survival. Animals, of course, are not typically endowed with rational thought and no animal can choose its own genetic inheritance. What actually happens is that those individuals born with traits that are a poor fit relative to the conditions for survival will pass those same characteristics on to their progeny, and over time their numbers will dwindle. In other words, the surviving and prevailing traits are a kind of “best reply” to the environment – from which the relationship to the concept of Nash equilibrium follows. Section 5.8 (page 186) presents in greater detail how evolutionary processes can be modeled in game-theoretic terms and the role played by the Nash equilibrium in the theory of evolution.
4.9.4 Equilibrium from the normative perspective
Consider the concept of equilibrium from the normative perspective of an arbitrator or judge recommending a certain course of action (hopefully based on reasonable and acceptable principles). In that case we should expect the arbitrator's recommendation to be an equilibrium point. Otherwise (since it is a recommendation and not a binding agreement) there will be at least one agent who will be tempted to benefit from not following his end of the recommendation. Seeking equilibrium alone, however, is not enough for the arbitrator to arrive at a decision. If, for example, there is more than one equilibrium point, as in the coordination game in Figure 4.27, choosing between them requires more considerations and principles. A rich literature, in fact, deals with "refinements" of the concept of equilibrium, which seek to choose (or "invalidate") certain equilibria within the set of all possible equilibria. This subject will be discussed in Chapter 7.
Despite all its advantages, the Nash equilibrium is not the be-all and end-all in the study of strategic- or extensive-form games. Beyond the fact that in some games there is no equilibrium and in others there may be a multiplicity of equilibria, even when there is a single Nash equilibrium it is not always entirely clear that the equilibrium will be the strategy vector that is "recommended" or "predicted" by a specific theory. There are many who believe, for example, that the unique equilibrium of the Prisoner's Dilemma does not constitute a "good recommendation" or a "good prediction" of the outcome of the game. We will later see additional examples in which it is unclear that an equilibrium will necessarily be the outcome of a game (cf. the first example in Section 4.10, the repeated Prisoner's Dilemma in Example 7.15 (page 259), the Centipede game (Example 7.16, page 259), and Example 7.17 on page 261).
4.10 Security: the maxmin concept
As we have already pointed out, the concept of equilibrium, despite its advantages, does not always describe the expected behavior of rational players, even in those cases where an equilibrium exists and is unique. Consider, for example, the game described in Figure 4.28.
                        Player II
                      L           R
              T     2, 1       2, −20
Player I      M     3, 0      −10, 1
              B   −100, 2      3, 3

Figure 4.28 A game with a unique but "dangerous" equilibrium
The unique equilibrium in this game is (B, R), with a payoff of (3, 3). But thinking this over carefully, can we really expect this result to obtain with high probability? One can imagine Player I hesitating to choose B: what if Player II were to choose L (whether by accident, due to irrationality, or for any other reason)? Given that the result (B, L) is catastrophic for Player I, he may prefer strategy T, guaranteeing a payoff of only 2 (compared to the equilibrium payoff of 3), but also guaranteeing that he will avoid getting −100 instead. If Player II is aware of this hesitation, and believes that there is a reasonable chance that Player I will flee to the safety of T, he will also be wary of choosing the equilibrium strategy R (and risking the −20 payoff), and will likely choose strategy L instead. This, in turn, increases Player I's motivation to choose T.
This underscores an additional aspect of rational behavior that exists to some extent in the behavior of every player: guaranteeing the best possible result without "relying" on the rationality of the other players, and even making the most pessimistic assessment of their potential behavior. So what can player i, in a general game, guarantee for himself? If he chooses strategy $s_i$, the worst possible payoff he can get is
\[ \min_{t_{-i} \in S_{-i}} u_i(s_i, t_{-i}). \tag{4.25} \]
Player i can choose the strategy $s_i$ that maximizes this value. In other words, disregarding the possible rationality (or irrationality) of the other players, he can guarantee for himself a payoff of
\[ \underline{v}_i := \max_{s_i \in S_i} \min_{t_{-i} \in S_{-i}} u_i(s_i, t_{-i}). \tag{4.26} \]
The quantity $\underline{v}_i$ is called the maxmin value of player i, which is sometimes also called the player's security level. A strategy $s_i^*$ that guarantees this value is called a maxmin strategy. Such a strategy satisfies
\[ \min_{t_{-i} \in S_{-i}} u_i(s_i^*, t_{-i}) \geq \min_{t_{-i} \in S_{-i}} u_i(s_i, t_{-i}), \quad \forall s_i \in S_i, \tag{4.27} \]
which is equivalent to
\[ u_i(s_i^*, t_{-i}) \geq \underline{v}_i, \quad \forall t_{-i} \in S_{-i}. \tag{4.28} \]
                             Player II
                           L           R        min over sII ∈ SII of uI(sI, sII)
                   T     2, 1       2, −20                  2
Player I           M     3, 0      −10, 1                 −10
                   B   −100, 2      3, 3                 −100
min over sI ∈ SI
of uII(sI, sII)          0          −20                 2, 0  (maxmin values; in an oval in the original figure)

Figure 4.29 The game in Figure 4.28 with the security value of each player
Remark 4.24 The definition of a game in strategic form does not include a requirement that the set of strategies available to any of the players be finite. When the strategy set is infinite, the minimum in Equation (4.25) may not exist for certain strategies $s_i \in S_i$. Even if the minimum in Equation (4.25) is attained for every strategy $s_i \in S_i$, the maximum in Equation (4.26) may not exist. It follows that when the strategy set of one or more players is infinite, we need to replace the minimum and maximum in the definition of the maxmin value by infimum and supremum, respectively:
\[ \underline{v}_i := \sup_{s_i \in S_i} \inf_{t_{-i} \in S_{-i}} u_i(s_i, t_{-i}). \tag{4.29} \]
If the supremum is never attained there is no maxmin strategy: for each $\varepsilon > 0$ the player can guarantee for himself at least $\underline{v}_i - \varepsilon$, but not at least $\underline{v}_i$. A continuous function defined over a compact domain always attains a maximum and a minimum. Moreover, when X and Y are compact sets in $\mathbb{R}^m$ and $f : X \times Y \to \mathbb{R}$ is a continuous function, the function $x \mapsto \min_{y \in Y} f(x, y)$ is also continuous (Exercise 4.22). It follows that when the strategy sets of the players are compact and the payoff functions are continuous, the maxmin strategies of the players are well defined.
We will now proceed to calculate the value guaranteed by each strategy in the example in Figure 4.28. In Figure 4.29, the numbers in the right-most column (outside the payoff matrix) indicate the worst payoff to Player I if he chooses the strategy of the corresponding row. Similarly, the numbers in the bottom-most row (outside the payoff matrix) indicate the worst payoff to Player II if he chooses the strategy of the corresponding column. Finally, the oval contains the maxmin value of both players. The maxmin value of Player I is 2 and the strategy that guarantees this value is T. The maxmin value of Player II is 0 with maxmin strategy L. If the two players choose their maxmin strategies, the result is (T, L) with payoff (2, 1), in which Player II's payoff of 1 is greater than his maxmin value.
As the next example illustrates, a player may have several maxmin strategies. In such a case, when the players use maxmin strategies the payoff depends on which strategies they have chosen.
Example 4.25 Consider the two-player game appearing in Figure 4.30.
                           Player II
                         L        R       min over sII ∈ SII of uI(sI, sII)
Player I           T   3, 1     0, 4                 0
                   B   2, 3     1, 1                 1
min over sI ∈ SI
of uII(sI, sII)         1        1                 1, 1  (maxmin values)

Figure 4.30 A game with the maxmin values of the players
The maxmin value of Player I is 1 and his unique maxmin strategy is B. The maxmin value of Player II is 1, and both L and R are his maxmin strategies. It follows that when the two players implement maxmin strategies the payoff might be (2, 3) or (1, 1), depending on which maxmin strategy is implemented by Player II. ◭
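For finite games the maxmin value (4.26) and the set of maxmin strategies can be computed by direct enumeration. The following minimal sketch recomputes the security levels shown in Figures 4.29 and 4.30; the function names and the dictionary encoding of the games are illustrative assumptions, and only pure strategies are considered.

```python
def maxmin(own, other, payoff_of):
    """Return (maxmin value, list of maxmin strategies) for one player."""
    # Worst payoff guaranteed by each own strategy, as in (4.25).
    guaranteed = {s: min(payoff_of(s, t) for t in other) for s in own}
    value = max(guaranteed.values())                      # (4.26)
    return value, [s for s in own if guaranteed[s] == value]


def security_levels(game, rows, cols):
    """Maxmin value and strategies of Player I (rows) and Player II (columns)."""
    player_1 = maxmin(rows, cols, lambda r, c: game[(r, c)][0])
    player_2 = maxmin(cols, rows, lambda c, r: game[(r, c)][1])
    return player_1, player_2


# Figure 4.28: game[(row, col)] = (payoff to Player I, payoff to Player II)
FIG_4_28 = {
    ("T", "L"): (2, 1), ("T", "R"): (2, -20),
    ("M", "L"): (3, 0), ("M", "R"): (-10, 1),
    ("B", "L"): (-100, 2), ("B", "R"): (3, 3),
}
# Figure 4.30
FIG_4_30 = {
    ("T", "L"): (3, 1), ("T", "R"): (0, 4),
    ("B", "L"): (2, 3), ("B", "R"): (1, 1),
}

print(security_levels(FIG_4_28, ["T", "M", "B"], ["L", "R"]))
print(security_levels(FIG_4_30, ["T", "B"], ["L", "R"]))
```

Returning the full list of maximizers, rather than a single strategy, makes the non-uniqueness in Example 4.25 visible.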
We next explore the connection between the maxmin strategy and dominant strategies.
Theorem 4.26 A strategy of player i that dominates all his other strategies is a maxmin strategy for that player. Such a strategy, furthermore, is a best reply of player i to any strategy vector of the other players.
The proof of this theorem is left to the reader (Exercise 4.25). The theorem implies the following conclusion.
Corollary 4.27 In a game in which every player has a strategy that dominates all of his other strategies, the vector of dominant strategies is an equilibrium point and a vector of maxmin strategies.
An example of this kind of game is a sealed-bid second-price auction, as we saw in Section 4.6. The next theorem constitutes a strengthening of Corollary 4.27 in the case of strict domination (for the proof see Exercise 4.26).
Theorem 4.28 In a game in which every player i has a strategy $s_i^*$ that strictly dominates all of his other strategies, the strategy vector $(s_1^*, \ldots, s_n^*)$ is the unique equilibrium point of the game as well as the unique vector of maxmin strategies.
Is there a relation between the maxmin value of a player and his payoff in a Nash equilibrium? As the next theorem states, the payoff of each player in a Nash equilibrium is at least his maxmin value.
Theorem 4.29 Every Nash equilibrium $s^*$ of a strategic-form game satisfies $u_i(s^*) \geq \underline{v}_i$ for every player i.
Proof: For every strategy $s_i \in S_i$ we have
\[ u_i(s_i, s_{-i}^*) \geq \min_{s_{-i} \in S_{-i}} u_i(s_i, s_{-i}). \tag{4.30} \]
Since the definition of an equilibrium implies that $u_i(s^*) = \max_{s_i \in S_i} u_i(s_i, s_{-i}^*)$, we deduce that
\[ u_i(s^*) = \max_{s_i \in S_i} u_i(s_i, s_{-i}^*) \geq \max_{s_i \in S_i} \min_{s_{-i} \in S_{-i}} u_i(s_i, s_{-i}) = \underline{v}_i, \tag{4.31} \]
as required.

4.11 The effect of elimination of dominated strategies
Elimination of dominated strategies was discussed in Section 4.5 (page 85). A natural question that arises is how does the process of iterative elimination of dominated strategies change the maxmin values and the set of equilibria of the game? We will show here that the elimination of strictly dominated strategies has no effect on a game's set of equilibria. The iterated elimination of weakly dominated strategies can reduce the set of equilibria, but it cannot create new equilibria. On the other hand, the maxmin value of any particular player is unaffected by the elimination of his dominated strategies, whether those strategies are weakly or strictly dominated.
Theorem 4.30 Let $G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ be a strategic-form game, and let $\widehat{s}_j \in S_j$ be a dominated strategy of player j. Let $\widehat{G}$ be the game derived from G by the elimination of strategy $\widehat{s}_j$. Then the maxmin value of player j in $\widehat{G}$ is equal to his maxmin value in G.
Proof: The maxmin value of player j in G is
\[ \underline{v}_j = \max_{s_j \in S_j} \min_{s_{-j} \in S_{-j}} u_j(s_j, s_{-j}), \tag{4.32} \]
and his maxmin value in $\widehat{G}$ is
\[ \underline{\widehat{v}}_j = \max_{\{s_j \in S_j,\, s_j \neq \widehat{s}_j\}} \min_{s_{-j} \in S_{-j}} u_j(s_j, s_{-j}). \tag{4.33} \]
Let $t_j$ be a strategy of player j that dominates $\widehat{s}_j$ in G. Then the following is satisfied:
\[ u_j(\widehat{s}_j, s_{-j}) \leq u_j(t_j, s_{-j}), \quad \forall s_{-j} \in S_{-j}, \tag{4.34} \]
and therefore
\[ \min_{s_{-j} \in S_{-j}} u_j(\widehat{s}_j, s_{-j}) \leq \min_{s_{-j} \in S_{-j}} u_j(t_j, s_{-j}) \leq \max_{\{s_j \in S_j,\, s_j \neq \widehat{s}_j\}} \min_{s_{-j} \in S_{-j}} u_j(s_j, s_{-j}). \tag{4.35} \]
This leads to the conclusion that
\[ \underline{v}_j = \max_{s_j \in S_j} \min_{s_{-j} \in S_{-j}} u_j(s_j, s_{-j}) \tag{4.36} \]
\[ = \max\Bigl\{ \max_{\{s_j \in S_j,\, s_j \neq \widehat{s}_j\}} \min_{s_{-j} \in S_{-j}} u_j(s_j, s_{-j}),\ \min_{s_{-j} \in S_{-j}} u_j(\widehat{s}_j, s_{-j}) \Bigr\} \tag{4.37} \]
\[ = \max_{\{s_j \in S_j,\, s_j \neq \widehat{s}_j\}} \min_{s_{-j} \in S_{-j}} u_j(s_j, s_{-j}) = \underline{\widehat{v}}_j, \tag{4.38} \]
which is what we wanted to prove.
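Theorem 4.30 can be illustrated on a small example. The game below is hypothetical (it does not appear in the text) and is chosen so that Player I's strategy B and Player II's strategy R are both dominated; the function names are likewise assumptions. The sketch checks that removing Player I's own dominated strategy leaves his maxmin value unchanged, whereas removing the other player's dominated strategy raises it in this particular game.

```python
# Hypothetical 3x2 game: U1 and U2 map (row, col) to the payoffs of Players I and II.
U1 = {("T", "L"): 2, ("T", "R"): 0,
      ("M", "L"): 1, ("M", "R"): 1,
      ("B", "L"): 0, ("B", "R"): 0}
U2 = {("T", "L"): 1, ("T", "R"): 0,
      ("M", "L"): 1, ("M", "R"): 0,
      ("B", "L"): 1, ("B", "R"): 0}
ROWS, COLS = ["T", "M", "B"], ["L", "R"]


def u1(r, c):
    return U1[(r, c)]


def u2(c, r):
    return U2[(r, c)]


def weakly_dominated(s, own, other, u):
    """True if some other strategy in `own` weakly dominates s."""
    return any(
        all(u(t, o) >= u(s, o) for o in other) and any(u(t, o) > u(s, o) for o in other)
        for t in own if t != s
    )


def maxmin_value(own, other, u):
    """Maxmin value over pure strategies, as in (4.32)."""
    return max(min(u(s, t) for t in other) for s in own)


full = maxmin_value(ROWS, COLS, u1)               # original game G
own_removed = maxmin_value(["T", "M"], COLS, u1)  # B (Player I's own strategy) eliminated
other_removed = maxmin_value(ROWS, ["L"], u1)     # R (Player II's strategy) eliminated
print(full, own_removed, other_removed)
```

A single example of course proves nothing; the sketch only makes the two cases concrete.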
Note that the elimination of a (strictly or weakly) dominated strategy of one player may increase the maxmin values of other players (but not decrease them; see Exercise 4.27). It follows that when calculating the maxmin value of player i we can eliminate his dominated strategies, but we must not eliminate dominated strategies of other players, since this may result in increasing player i's maxmin value. Therefore, iterated elimination of (weakly or strictly) dominated strategies may increase the maxmin value of some players.
The next theorem states that if we eliminate some of the strategies of each player (whether or not they are dominated), then every equilibrium of the original game (the game prior to the elimination of strategies) is also an equilibrium of the game resulting from the elimination process, provided that none of the strategies of that equilibrium were eliminated.
Theorem 4.31 Let $G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ be a game in strategic form, and let $\widehat{G} = (N, (\widehat{S}_i)_{i \in N}, (u_i)_{i \in N})$ be the game derived from G through the elimination of some of the strategies, namely, $\widehat{S}_i \subseteq S_i$ for each player $i \in N$. If $s^*$ is an equilibrium in game G, and if $s_i^* \in \widehat{S}_i$ for each player i, then $s^*$ is an equilibrium in the game $\widehat{G}$.
Proof: Because $s^*$ is an equilibrium of the game G, it follows that for each player i,
\[ u_i(s_i, s_{-i}^*) \leq u_i(s^*), \quad \forall s_i \in S_i. \tag{4.39} \]
Because $\widehat{S}_i \subseteq S_i$ for each player $i \in N$, it is the case that
\[ u_i(s_i, s_{-i}^*) \leq u_i(s^*), \quad \forall s_i \in \widehat{S}_i. \tag{4.40} \]
Because $s^*$ is a vector of strategies in the game $\widehat{G}$, we conclude that it is an equilibrium of $\widehat{G}$.
It should be noted that in general the post-elimination game $\hat{G}$ may contain new equilibria that were not equilibria in the original game (Exercise 4.28). The next theorem shows that this cannot happen if the eliminated strategies are weakly dominated; that is, no new equilibria are created if a weakly dominated strategy of a particular player is eliminated. Repeated application of the theorem then yields the fact that the process of iterated elimination of weakly dominated strategies does not lead to the creation of new equilibria (Corollary 4.33).
Theorem 4.32 Let $G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ be a game in strategic form, let $j \in N$, and let $\hat{s}_j \in S_j$ be a weakly dominated strategy of player $j$ in this game. Denote by $\hat{G}$ the game derived from $G$ by the elimination of the strategy $\hat{s}_j$. Then every equilibrium of $\hat{G}$ is also an equilibrium of $G$.

Proof: The strategy sets of the game $\hat{G}$ are

\[ \hat{S}_i = \begin{cases} S_i & \text{if } i \ne j, \\ S_j \setminus \{\hat{s}_j\} & \text{if } i = j. \end{cases} \tag{4.41} \]
Let $s^* = (s_i^*)_{i \in N}$ be an equilibrium strategy vector of the game $\hat{G}$. Then

\[ u_i(s_i, s_{-i}^*) \le u_i(s^*), \quad \forall i \ne j,\ \forall s_i \in \hat{S}_i = S_i, \tag{4.42} \]
\[ u_j(s_j, s_{-j}^*) \le u_j(s^*), \quad \forall s_j \in \hat{S}_j. \tag{4.43} \]

To show that $s^*$ is an equilibrium of the game $G$ we must show that no player $i$ can profit in $G$ by deviating to a strategy that differs from $s_i^*$. First we will show that this is true of every player $i$, $i \ne j$. Let $i$ be a player who is not player $j$. Since $\hat{S}_i = S_i$, by Equation (4.42) player $i$ has no deviation from $s_i^*$ that is profitable for him. As for player $j$, Equation (4.43) implies that he cannot profit from deviating to any strategy in $\hat{S}_j = S_j \setminus \{\hat{s}_j\}$. It only remains, then, to check that player $j$ sees no gain from switching from strategy $s_j^*$ to strategy $\hat{s}_j$.

Because $\hat{s}_j$ is a dominated strategy, there exists a strategy $t_j \in S_j$ that dominates it. It follows that $t_j \ne \hat{s}_j$, and in particular that $t_j \in \hat{S}_j$, so that

\[ u_j(\hat{s}_j, s_{-j}) \le u_j(t_j, s_{-j}), \quad \forall s_{-j} \in S_{-j}. \tag{4.44} \]

Inserting $s_{-j} = s_{-j}^*$ in Equation (4.44) and $s_j = t_j$ in Equation (4.43), we get

\[ u_j(\hat{s}_j, s_{-j}^*) \le u_j(t_j, s_{-j}^*) \le u_j(s_j^*, s_{-j}^*), \tag{4.45} \]

which shows that deviating to strategy $\hat{s}_j$ is indeed not profitable for player $j$.
The following corollary (whose proof is left to the reader in Exercise 4.29) is implied by Theorem 4.32.
Corollary 4.33 Let $G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ be a game in strategic form, and let $\hat{G}$ be the game derived from the game $G$ by iterative elimination of dominated strategies. Then every equilibrium $s^*$ of $\hat{G}$ is also an equilibrium of $G$. In particular, if the iterative elimination results in a single vector $s^*$, then $s^*$ is an equilibrium of the game $G$.
Iterated elimination of dominated strategies, therefore, cannot create new equilibria. However, as the next example shows, it can result in the loss of some of the equilibria of the original game. This can happen even when there is only one elimination process possible. Example 4.34 Consider the two-player game given by the matrix in Figure 4.31.
                     Player II
                     L        R
Player I     T     0, 0     2, 1
             B     3, 2     1, 2

Figure 4.31 Elimination of dominated strategies may eliminate an equilibrium point
The game has two equilibria: (T, R) and (B, L). The only dominated strategy in the game is L (dominated by R). The elimination of strategy L results in a game in which B is dominated, and its elimination in turn yields the result (T, R). Thus, the elimination of L also eliminates the strategy vector (B, L), an equilibrium point in the original game. The payoff corresponding to the eliminated equilibrium is (3, 2), which for both players is preferable to (2, 1), the payoff corresponding to (T, R), the equilibrium of the post-elimination game. ◭
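The steps of Example 4.34 can be checked mechanically. The following is a minimal Python sketch, not part of the original text; the function names `pure_equilibria` and `weakly_dominated` are illustrative assumptions, and only pure strategies are considered.

```python
from itertools import product

# Payoffs of Figure 4.31: payoffs[(row, col)] = (u_I, u_II).
payoffs = {
    ("T", "L"): (0, 0), ("T", "R"): (2, 1),
    ("B", "L"): (3, 2), ("B", "R"): (1, 2),
}

def pure_equilibria(rows, cols):
    """Pure-strategy equilibria of the game restricted to rows x cols."""
    result = []
    for r, c in product(rows, cols):
        u1, u2 = payoffs[(r, c)]
        no_row_deviation = all(payoffs[(r2, c)][0] <= u1 for r2 in rows)
        no_col_deviation = all(payoffs[(r, c2)][1] <= u2 for c2 in cols)
        if no_row_deviation and no_col_deviation:
            result.append((r, c))
    return result

def weakly_dominated(player, own, other):
    """Strategies in `own` that are weakly dominated for `player` (0 = I, 1 = II)."""
    def u(s, o):
        return payoffs[(s, o)][0] if player == 0 else payoffs[(o, s)][1]
    dominated = []
    for s in own:
        for t in own:
            if t == s:
                continue
            never_worse = all(u(s, o) <= u(t, o) for o in other)
            sometimes_better = any(u(s, o) < u(t, o) for o in other)
            if never_worse and sometimes_better:
                dominated.append(s)
                break
    return dominated

rows, cols = ["T", "B"], ["L", "R"]
print(pure_equilibria(rows, cols))        # equilibria of the original game
print(weakly_dominated(1, cols, rows))    # dominated strategies of Player II
cols = ["R"]                              # eliminate L
print(weakly_dominated(0, rows, cols))    # dominated strategies of Player I
rows = ["T"]                              # eliminate B
print(pure_equilibria(rows, cols))        # equilibria after the elimination
```

Under these assumptions the sketch reproduces the example: the original game has the equilibria (T, R) and (B, L), and after eliminating L and then B only (T, R) remains.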
In fact, the iterative elimination of weakly dominated strategies can result in the elimination of all the equilibria of the original game (Exercise 4.12). But this cannot happen under iterative elimination of strictly dominated strategies, which preserves the set of equilibrium points. That is the content of the following theorem.

Theorem 4.35 Let $G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ be a game in strategic form, let $j \in N$, and let $\hat{s}_j \in S_j$ be a strictly dominated strategy of player $j$. Let $\hat{G}$ be the game derived from $G$ by the elimination of strategy $\hat{s}_j$. Then the set of equilibria in the game $\hat{G}$ is identical to the set of equilibria of the game $G$.

Theorem 4.35 leads to the next corollary.

Corollary 4.36 A strictly dominated strategy cannot be an element of a game’s equilibrium.

The conclusion of the last corollary is not true for weakly dominated strategies. As can be seen in Example 4.34, a weakly dominated strategy can be an element of an equilibrium. Indeed, there are cases in which an equilibrium strategy vector $s^*$ is comprised of a weakly dominated strategy $s_i^*$ for each player $i \in N$ (Exercise 4.30).
Proof of Theorem 4.35: Denote by $E$ the set of equilibria of the game $G$, and by $\hat{E}$ the set of equilibria of the game $\hat{G}$. Theorem 4.32 implies that $\hat{E} \subseteq E$, because every strictly dominated strategy is also a weakly dominated strategy. It remains to show that $E \subseteq \hat{E}$.

Let $s^* \in E$ be an equilibrium of the game $G$. To show that $s^* \in \hat{E}$, we will show that $s^*$ is a strategy vector in the game $\hat{G}$, which by Theorem 4.31 then implies that $s^* \in \hat{E}$. As the game $\hat{G}$ was derived from the game $G$ by elimination of player $j$’s strategy $\hat{s}_j$, it suffices to show that $s_j^* \ne \hat{s}_j$. Strategy $\hat{s}_j$ is strictly dominated in the game $G$, so that there exists a strategy $t_j \in S_j$ that strictly dominates it:

\[ u_j(\hat{s}_j, s_{-j}) < u_j(t_j, s_{-j}), \quad \forall s_{-j} \in S_{-j}. \tag{4.46} \]

Because $s^*$ is an equilibrium point, by setting $s_{-j} = s_{-j}^*$ in Equation (4.46) we get

\[ u_j(\hat{s}_j, s_{-j}^*) < u_j(t_j, s_{-j}^*) \le u_j(s_j^*, s_{-j}^*), \tag{4.47} \]

thus yielding the conclusion that $\hat{s}_j \ne s_j^*$, which is what we needed to show.
When we put together Corollary 4.33 and Theorem 4.35, the following picture emerges: in implementing a process of iterated elimination of dominated strategies we may lose equilibria, but no new equilibria are created. If the elimination is of only strictly dominated strategies, the set of equilibria remains unchanged throughout the process. In particular, if the process of eliminating strictly dominated strategies results in a single strategy vector, this strategy vector is the unique equilibrium point of the original game (because it is
the equilibrium of the game at the end of the process in which each player has only one strategy remaining). The uniqueness of the equilibrium constitutes a strengthening of Corollary 4.33 in the case in which only strictly dominated strategies are eliminated.

Corollary 4.37 If iterative elimination of strictly dominated strategies yields a unique strategy vector $s^*$, then $s^*$ is the unique Nash equilibrium of the game.

In summary, to find a player’s maxmin values we can first eliminate his (strictly or weakly) dominated strategies. In implementing this elimination process we may eliminate some of his maxmin strategies and also change the maxmin values of some other players. For finding equilibria we can also eliminate strictly dominated strategies without changing the set of equilibria of the game. Elimination of weakly dominated strategies may eliminate some equilibria of the game. The process of iterated elimination of weakly dominated strategies is useful for cases in which finding all equilibrium points is a difficult problem and we can be content with finding at least one equilibrium.
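As an illustration of Theorem 4.35 and Corollary 4.37, the following Python sketch runs iterated elimination of strictly dominated strategies on a small game and compares the pure-strategy equilibria before and after. The 2×3 game and all function names are hypothetical and are not taken from the text; domination is checked only by pure strategies.

```python
from itertools import product

# Hypothetical 2x3 game (not from the text): U[r][c] = (u_I, u_II).
ROWS, COLS = ["T", "B"], ["L", "C", "R"]
U = [
    [(1, 0), (1, 2), (0, 1)],   # T
    [(0, 3), (0, 1), (2, 0)],   # B
]

def payoff(player, own, other):
    """Payoff of `player` (0 = I, 1 = II) playing `own` against `other`."""
    return U[own][other][0] if player == 0 else U[other][own][1]

def strictly_dominated(player, own_set, other_set):
    """One strategy in own_set strictly dominated by another one, or None."""
    for s in own_set:
        for t in own_set:
            if t != s and all(payoff(player, s, o) < payoff(player, t, o)
                              for o in other_set):
                return s
    return None

def iterated_strict_elimination(rows, cols):
    """Remove strictly dominated strategies one at a time until none remain."""
    rows, cols = list(rows), list(cols)
    while True:
        s = strictly_dominated(0, rows, cols)
        if s is not None:
            rows.remove(s)
            continue
        s = strictly_dominated(1, cols, rows)
        if s is not None:
            cols.remove(s)
            continue
        return rows, cols

def pure_equilibria(rows, cols):
    """Pure-strategy equilibria of the game restricted to rows x cols."""
    return [(r, c) for r, c in product(rows, cols)
            if all(U[r2][c][0] <= U[r][c][0] for r2 in rows)
            and all(U[r][c2][1] <= U[r][c][1] for c2 in cols)]

rows, cols = iterated_strict_elimination(range(2), range(3))
print([ROWS[r] for r in rows], [COLS[c] for c in cols])
print(pure_equilibria(range(2), range(3)) == pure_equilibria(rows, cols))
```

In this hypothetical game the process eliminates R, then B, then L, leaving the single vector (T, C), which is also the only pure-strategy equilibrium of the original game.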
4.12 Two-player zero-sum games
As we have seen, the Nash equilibrium and the maxmin are two different concepts that reflect different behavioral aspects: the first is an expression of stability, while the second captures the notion of security. Despite the different roots of the two concepts, there are cases in which both lead to the same results. A special case where this occurs is in the class of two-player zero-sum games, which is the subject of this section. In a given two-player game, denote, as we have done so far, the set of players by N = {I, II} and the set of strategies respectively by SI and SII . Example 4.38 Consider the two-player game appearing in Figure 4.32.
                                   Player II
                             L         C         R       min_{sII ∈ SII} uI(sI, sII)
Player I            T      3, −3     −5, 5     −2, 2              −5
                    M      1, −1     4, −4     1, −1               1
                    B      6, −6     −3, 3     −5, 5              −5
min_{sI ∈ SI} uII(sI, sII)  −6        −4        −1               1, −1

Figure 4.32 A two-player zero-sum game
In this example, $\underline{v}_I = 1$ and $\underline{v}_{II} = -1$. The maxmin strategy of Player I is M and that of Player II is R. The strategy pair (M, R) is also the equilibrium of this game (check!). In other words, here we have a case where the vector of maxmin strategies is also an equilibrium point: the two concepts lead to the same result. ◭
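The "check!" in Example 4.38 can be carried out with a few lines of Python. This is a sketch only; the helper names are assumptions introduced for illustration, and the payoffs are those of Figure 4.32.

```python
# Payoffs of Figure 4.32: U[r][c] = (u_I, u_II); rows T, M, B and columns L, C, R.
ROWS, COLS = ["T", "M", "B"], ["L", "C", "R"]
U = [
    [(3, -3), (-5, 5), (-2, 2)],   # T
    [(1, -1), (4, -4), (1, -1)],   # M
    [(6, -6), (-3, 3), (-5, 5)],   # B
]

def maxmin_player_one():
    """Maxmin value and maxmin strategies of Player I (the row player)."""
    guarantees = [min(U[r][c][0] for c in range(3)) for r in range(3)]
    value = max(guarantees)
    return value, [ROWS[r] for r in range(3) if guarantees[r] == value]

def maxmin_player_two():
    """Maxmin value and maxmin strategies of Player II (the column player)."""
    guarantees = [min(U[r][c][1] for r in range(3)) for c in range(3)]
    value = max(guarantees)
    return value, [COLS[c] for c in range(3) if guarantees[c] == value]

def is_equilibrium(r, c):
    """True if neither player gains by a unilateral pure deviation from (r, c)."""
    no_row_deviation = all(U[r2][c][0] <= U[r][c][0] for r2 in range(3))
    no_col_deviation = all(U[r][c2][1] <= U[r][c][1] for c2 in range(3))
    return no_row_deviation and no_col_deviation

print(maxmin_player_one())    # maxmin value and strategy of Player I
print(maxmin_player_two())    # maxmin value and strategy of Player II
print(is_equilibrium(ROWS.index("M"), COLS.index("R")))
```

The guarantees computed for each row and column are the entries in the margins of Figure 4.32.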
In the game in Example 4.38, for each pair of strategies the sum of the payoffs that the two players receive is zero. In other words, in any possible outcome of the game the payoff one player receives is exactly equal to the payoff the other player has to pay.

Definition 4.39 A two-player game is a zero-sum game if for each pair of strategies $(s_I, s_{II})$ one has

\[ u_I(s_I, s_{II}) + u_{II}(s_I, s_{II}) = 0. \tag{4.48} \]
In other words, a two-player game is a zero-sum game if it is a closed system from the perspective of the payoffs: each player gains what the other player loses. It is clear that in such a game the two players have diametrically opposed interests.

Remark 4.40 As we saw in Chapter 2, assuming that the players have von Neumann–Morgenstern linear utilities, any player’s utility function is determined only up to a positive affine transformation. Therefore, if the payoffs represent the players’ utilities from the various outcomes of the game, then they are determined up to a positive affine transformation. Changing the representation of the utility function of the players can then transform a zero-sum game into a non-zero-sum game. We will return to this issue in Section 5.5 (page 172); it will be proved there that the results of this chapter are independent of the particular representation of utility functions, and they hold true in two-player non-zero-sum games that are obtained from two-player zero-sum games by applying positive affine transformations to the players’ payoffs.

Most real-life situations analyzed using game theory are not two-player zero-sum games, because even though the interests of the players diverge in many cases, they are often not completely diametrically opposed. Despite this, two-player zero-sum games have a special importance that justifies studying them carefully, as we do in this section. Here are some of the reasons:

1. Many classical games, such as chess, backgammon, checkers, and a plethora of dice games, are two-player zero-sum games. These were the first games to be studied mathematically and the first to yield formal results, results that spawned and shaped game theory as a young field of study in the early part of the twentieth century.

2. Given their special and highly restrictive properties, these games are generally simpler and easier to analyze mathematically than many other games. As is usually the case in mathematics, this makes them convenient objects for the initial exploration of ideas and possible directions for research in game theory.

3. Because two-player zero-sum games leave no room for cooperation between the players, they are useful for isolating certain aspects of games and checking which results stem from cooperative considerations and which stem from other aspects of the game (information flows, repetitions, and so on).

4. In every situation, no matter how complicated, a natural benchmark for each player is his “security level”: what he can guarantee for himself based solely on his own efforts, without relying on the behavior of other players. In practice, calculating the security level means assuming a worst-case scenario in which all other players are acting as an adversary. This means that the player is considering an auxiliary zero-sum game, in which all the other players act as if they were one opponent whose payoff is the opposite of his own payoff. In other words, even when analyzing a game that is non-zero-sum, the analysis of auxiliary zero-sum games can prove useful.

5. Two-player zero-sum games emerge naturally in other models. One example is games involving only a single player, which are often termed decision problems. They involve a decision maker choosing an action from among a set of alternatives, with the resultant payoff dependent both on his choice of action and on certain, often unknown, parameters over which he has no control. To calculate what the decision maker can guarantee for himself, we model the player’s environment as if it were a second player who controls the unknown parameters and whose intent is to minimize the decision maker’s payoff. This in effect yields a two-player zero-sum game. This approach is used in statistics, and we will return to it in Section 14.8 (page 600).

Let us now turn to the study of two-player zero-sum games. Since the payoffs $u_I$ and $u_{II}$ satisfy $u_I + u_{II} = 0$, we can confine our attention to one function, $u_I = u$, with $u_{II} = -u$. The function $u$ will be termed the payoff function of the game, and it represents the payment that Player II makes to Player I. Note that this creates an artificial asymmetry (albeit only with respect to the symbols being used) between the two players: Player I, who is usually the row player, seeks to maximize $u(s)$ (his payoff) and Player II, who is usually the column player, is trying to minimize $u(s)$, which is what he is paying (since his payoff is $-u(s)$). The game in Example 4.38 (page 110) can therefore be represented as shown in Figure 4.33.

                     Player II
                     L      C      R
Player I     T       3     −5     −2
             M       1      4      1
             B       6     −3     −5

Figure 4.33 The payoff function u of the zero-sum game in Example 4.38

The game of Matching Pennies (Example 3.20, page 52) can also be represented as a zero-sum game (see Figure 4.34).

Consider now the maxmin values of the players in a two-player zero-sum game. Player I’s maxmin value is given by

\[ \underline{v}_I = \max_{s_I \in S_I} \min_{s_{II} \in S_{II}} u(s_I, s_{II}), \tag{4.49} \]

and Player II’s maxmin value is

\[ \underline{v}_{II} = \max_{s_{II} \in S_{II}} \min_{s_I \in S_I} \big(-u(s_I, s_{II})\big) = -\min_{s_{II} \in S_{II}} \max_{s_I \in S_I} u(s_I, s_{II}). \tag{4.50} \]
                     Player II
                     H      T
Player I     H       1     −1
             T      −1      1

Figure 4.34 The payoff function u of the game Matching Pennies
                                   Player II
                                   L      R      min_{sII ∈ SII} u(sI, sII)
Player I                   T      −2      5              −2
                           B       3      0               0
max_{sI ∈ SI} u(sI, sII)           3      5             0, 3

Figure 4.35 A game in strategic form with the maxmin and minmax values
Denote

\[ \underline{v} := \max_{s_I \in S_I} \min_{s_{II} \in S_{II}} u(s_I, s_{II}), \tag{4.51} \]
\[ \overline{v} := \min_{s_{II} \in S_{II}} \max_{s_I \in S_I} u(s_I, s_{II}). \tag{4.52} \]
The value $\underline{v}$ is called the maxmin value of the game, and $\overline{v}$ is called the minmax value. Player I can guarantee that he will get at least $\underline{v}$, and Player II can guarantee that he will pay no more than $\overline{v}$. A strategy of Player I that guarantees $\underline{v}$ is termed a maxmin strategy. A strategy of Player II that guarantees $\overline{v}$ is called a minmax strategy.

We next calculate the maxmin value and minmax value in various examples of games. In Example 4.38, $\underline{v} = 1$ and $\overline{v} = 1$. In other words, Player I can guarantee that he will get a payoff of at least 1 (using the maxmin strategy M), while Player II can guarantee that he will pay at most 1 (by way of the minmax strategy R).

Consider the game shown in Figure 4.35. In this figure we have indicated on the right of each row the minimal payoff that the corresponding strategy of Player I guarantees him. Beneath each column we have indicated the maximal amount that Player II will pay if he implements the corresponding strategy. In this game $\underline{v} = 0$ but $\overline{v} = 3$. Player I cannot guarantee that he will get a payoff higher than 0 (which he can guarantee using his maxmin strategy B) and Player II cannot guarantee that he will pay less than 3 (which he can guarantee using his minmax strategy L).

Finally, look again at the game of Matching Pennies (Figure 4.36).
                                   Player II
                                   H      T      min_{sII ∈ SII} u(sI, sII)
Player I                   H       1     −1              −1
                           T      −1      1              −1
max_{sI ∈ SI} u(sI, sII)           1      1            −1, 1

Figure 4.36 Matching Pennies with the maxmin and minmax values
In this game, $\underline{v} = -1$ and $\overline{v} = 1$. Neither of the two players can guarantee a result that is better than the loss of one dollar (the strategies H and T of Player I are both maxmin strategies, and the strategies H and T of Player II are both minmax strategies).

As these examples indicate, the maxmin value $\underline{v}$ and the minmax value $\overline{v}$ may be unequal, but it is always the case that $\underline{v} \le \overline{v}$. The inequality is clear from the definitions of the maxmin and minmax: Player I can guarantee that he will get at least $\underline{v}$, while Player II can guarantee that he will not pay more than $\overline{v}$. As the game is a zero-sum game, the inequality $\underline{v} \le \overline{v}$ must hold. A formal proof of this fact can of course also be given (Exercise 4.34).

Definition 4.41 A two-player game has a value if $\underline{v} = \overline{v}$. The quantity $v := \underline{v} = \overline{v}$ is then called the value of the game.⁶ Any maxmin and minmax strategies of Player I and Player II respectively are then called optimal strategies.

Consider again the game shown in Figure 4.33. This game has a value equal to 1. Player I can guarantee that he will get at least 1 for himself by selecting the optimal strategy M, and Player II can guarantee that he will not pay more than 1 by choosing the optimal strategy R. Note that the strategy pair (M, R) is also a Nash equilibrium.

Another example of a game that has a value is the game of chess, assuming that if the play does not end after a predetermined number of moves, it terminates in a draw. We do not know what that value is, but the existence of a value follows from Theorem 1.4 (page 3). Since it is manifestly a two-player game in which the interests of the players are diametrically opposed, we describe chess as a zero-sum game where White is the maximizer and Black is the minimizer by use of the following payoff function:

\[ u(\text{White wins}) = 1, \quad u(\text{Black wins}) = -1, \quad u(\text{Draw}) = 0. \tag{4.53} \]
6 The value of a game is sometimes also called the minmax value of the game.
Theorem 1.4 (page 3) implies that one and only one of the following must occur:

(i) White has a strategy guaranteeing a payoff of 1.
(ii) Black has a strategy guaranteeing a payoff of −1.
(iii) Each of the two players has a strategy guaranteeing a payoff of 0; that is, White can guarantee a payoff in the set {0, 1}, and Black can guarantee a payoff in the set {0, −1}.

If case (i) holds, then $\underline{v} \ge 1$. As the maximal payoff is 1, it must be true that $\overline{v} \le 1$. Since we always have $\underline{v} \le \overline{v}$, we deduce that $1 \le \underline{v} \le \overline{v} \le 1$, which means that $\underline{v} = \overline{v} = 1$. Thus, the game has a value and $\underline{v} = \overline{v} = 1$ is its value.

If case (ii) holds, then $\overline{v} \le -1$. Since the minimal payoff is −1, it follows that $\underline{v} \ge -1$. Hence $-1 \le \underline{v} \le \overline{v} \le -1$, leading to $\underline{v} = \overline{v} = -1$, and the game has a value of −1.

Finally, suppose case (iii) holds. Then $\underline{v} \ge 0$ and $\overline{v} \le 0$. So $0 \le \underline{v} \le \overline{v} \le 0$, leading to $\underline{v} = \overline{v} = 0$, and the game has a value of 0.

Note that in chess each pair of optimal strategies is again a Nash equilibrium. For example, if case (i) above holds, then White’s strategy is optimal if and only if it is a winning strategy. On the other hand, any strategy of Black guarantees him a payoff of at least −1 and therefore all his strategies are optimal. Every pair consisting of a winning strategy for White and any strategy for Black is an equilibrium. Since White can guarantee victory for himself, he certainly has no profitable deviation; since Black will lose no matter what, no deviation is strictly profitable for him either. The following conclusion has therefore been proved.

Corollary 4.42 The game of chess has a value that is either 1 (if case (i) holds), or −1 (if case (ii) holds), or 0 (if case (iii) holds).

The following theorem can be proven in the same way that Theorem 3.13 (page 46) was proved. Later in this book, a more general result is shown to be true, for games that are not zero-sum (see Theorem 4.49 on page 118).

Theorem 4.43 Every finite two-player zero-sum extensive-form game with perfect information has a value.

In every example we have considered so far, every zero-sum game with a value also has an equilibrium. The following two theorems establish a close relationship between the concepts of the value and of Nash equilibrium in two-player zero-sum games.

Theorem 4.44 If a two-player zero-sum game has a value $v$, and if $s_I^*$ and $s_{II}^*$ are optimal strategies of the two players, then $s^* = (s_I^*, s_{II}^*)$ is an equilibrium with payoff $(v, -v)$.

Theorem 4.45 If $s^* = (s_I^*, s_{II}^*)$ is an equilibrium of a two-player zero-sum game, then the game has a value $v = u(s_I^*, s_{II}^*)$, and the strategies $s_I^*$ and $s_{II}^*$ are optimal strategies.

Before we prove Theorems 4.44 and 4.45, we wish to stress that these theorems show that in two-player zero-sum games the concept of equilibrium, which is based on stability, and the concept of minmax, which is based on security levels, coincide. If security level considerations are important factors in determining players’ behavior, one may expect that the concept of equilibrium will have greater predictive power in two-player zero-sum
games (where equilibrium strategies are also minmax strategies) than in more general games in which the two concepts lead to different predictions regarding players’ behavior.

Note that despite the fact that the strategic form of the game is implicitly a simultaneously played game in which each player, in selecting his strategy, does not know the strategy selected by the other player, if the game has a value then each player can reveal the optimal strategy that he intends to play to the other player and still guarantee his maxmin value. Suppose that $s_I^*$ is an optimal strategy for Player I in a game with value $v$. Then

\[ \min_{s_{II} \in S_{II}} u(s_I^*, s_{II}) = v, \tag{4.54} \]

and therefore for each $s_{II} \in S_{II}$ the following inequality is satisfied:

\[ u(s_I^*, s_{II}) \ge v. \tag{4.55} \]
In other words, even if Player I were to “announce” to Player II that he intends to play $s_I^*$, Player II cannot bring about a situation in which the payoff (to Player I) will be less than the value. This simple observation has technical implications for the search for optimal strategies: in order to check whether or not a particular strategy, say of Player I, is optimal, we check what it can guarantee, that is, what the payoff will be when Player II knows that this is the strategy chosen by Player I and does his best to counter it.

Proof of Theorem 4.44: From the fact that both $s_I^*$ and $s_{II}^*$ are optimal strategies, we deduce that

\[ u(s_I^*, s_{II}) \ge v, \quad \forall s_{II} \in S_{II}, \tag{4.56} \]
\[ u(s_I, s_{II}^*) \le v, \quad \forall s_I \in S_I. \tag{4.57} \]

Inserting $s_{II} = s_{II}^*$ into Equation (4.56) we deduce $u(s_I^*, s_{II}^*) \ge v$, and inserting $s_I = s_I^*$ into Equation (4.57) we get $u(s_I^*, s_{II}^*) \le v$. The equation $v = u(s_I^*, s_{II}^*)$ follows. Equations (4.56) and (4.57) can now be written as

\[ u(s_I^*, s_{II}) \ge u(s_I^*, s_{II}^*), \quad \forall s_{II} \in S_{II}, \tag{4.58} \]
\[ u(s_I, s_{II}^*) \le u(s_I^*, s_{II}^*), \quad \forall s_I \in S_I, \tag{4.59} \]

and therefore $(s_I^*, s_{II}^*)$ is an equilibrium with payoff $(v, -v)$.
Proof of Theorem 4.45: Since $(s_I^*, s_{II}^*)$ is an equilibrium, no player can benefit by a unilateral deviation:

\[ u(s_I, s_{II}^*) \le u(s_I^*, s_{II}^*), \quad \forall s_I \in S_I, \tag{4.60} \]
\[ u(s_I^*, s_{II}) \ge u(s_I^*, s_{II}^*), \quad \forall s_{II} \in S_{II}. \tag{4.61} \]

Let $v = u(s_I^*, s_{II}^*)$. We will prove that $v$ is indeed the value of the game. From Equation (4.61) we get

\[ u(s_I^*, s_{II}) \ge v, \quad \forall s_{II} \in S_{II}, \tag{4.62} \]

and therefore $\underline{v} \ge v$. From Equation (4.60) we deduce that

\[ u(s_I, s_{II}^*) \le v, \quad \forall s_I \in S_I, \tag{4.63} \]

and therefore $\overline{v} \le v$. Because it is always the case that $\underline{v} \le \overline{v}$ we get

\[ v \le \underline{v} \le \overline{v} \le v, \tag{4.64} \]

which implies that the value exists and is equal to $v$. Furthermore, from Equation (4.62) we deduce that $s_I^*$ is an optimal strategy for Player I, and from Equation (4.63) we deduce that $s_{II}^*$ is an optimal strategy for Player II.

Corollary 4.46 In a two-player zero-sum game, if $(s_I^*, s_{II}^*)$ and $(s_I^{**}, s_{II}^{**})$ are two equilibria, then it follows that

1. Both equilibria yield the same payoff: $u(s_I^*, s_{II}^*) = u(s_I^{**}, s_{II}^{**})$.
2. Both $(s_I^*, s_{II}^{**})$ and $(s_I^{**}, s_{II}^*)$ are also equilibria (and, given the above, they also yield the same payoff).

Proof: The first part follows from Theorem 4.45, because the payoff of each one of the equilibria is necessarily equal to the value of the game. For the second part, note that Theorem 4.45 implies that all the strategies $s_I^*, s_I^{**}, s_{II}^*, s_{II}^{**}$ are optimal strategies. By Theorem 4.44 we conclude that $(s_I^*, s_{II}^{**})$ and $(s_I^{**}, s_{II}^*)$ are equilibria.

Neither of the two conclusions of Corollary 4.46 is necessarily true in a two-player game that is not zero-sum. Consider, for example, the coordination game in Example 4.20, shown in Figure 4.37. (A, a) and (B, b) are two equilibria with different payoffs (thus, the first part of Corollary 4.46 does not hold in this example) and (A, b) and (B, a) are not equilibria (thus the second part of the corollary does not hold).

                     Player II
                     a        b
Player I     A     1, 1     0, 0
             B     0, 0     3, 3

Figure 4.37 Coordination game

The most important conclusion to take away from this section is that in two-player zero-sum games the value and Nash equilibrium, two different solution concepts, actually coincide and lead to the same results. Put another way, in two-player zero-sum games, the goals of security and stability are unified. John Nash regarded his concept of equilibrium to be a generalization of the value. But while the concept of the value expresses both the aspects of security and stability, the Nash equilibrium expresses only the aspect of stability. In games that are not zero-sum games, security and stability are different concepts, as we saw in the game depicted in Figure 4.28.
There is a geometric interpretation to the value of a two-player zero-sum game, which finds expression in the concept of the saddle point.
Definition 4.47 A pair of strategies $(s_I^*, s_{II}^*)$ is a saddle point of the function $u : S_I \times S_{II} \to \mathbb{R}$ if

\[ u(s_I^*, s_{II}^*) \ge u(s_I, s_{II}^*), \quad \forall s_I \in S_I, \tag{4.65} \]
\[ u(s_I^*, s_{II}^*) \le u(s_I^*, s_{II}), \quad \forall s_{II} \in S_{II}. \tag{4.66} \]

In other words, $u(s_I^*, s_{II}^*)$ is the highest value in column $s_{II}^*$, and the smallest in the row $s_I^*$. The name “saddle point” stems from the shape of a horse’s saddle, whose center is perceived to be the minimal point of the saddle from one direction and the maximal point from the other direction. The proof of the next theorem is left to the reader (Exercise 4.36).

Theorem 4.48 In a two-player zero-sum game, $(s_I^*, s_{II}^*)$ is a saddle point of the payoff function $u$ if and only if $s_I^*$ is an optimal strategy for Player I and $s_{II}^*$ is an optimal strategy for Player II. In that case, $u(s_I^*, s_{II}^*)$ is the value of the game.
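The maxmin value, the minmax value, and the saddle points of a finite payoff matrix can be computed directly from Equations (4.51), (4.52), (4.65), and (4.66). The Python sketch below does this for the payoff functions of Figures 4.33, 4.35, and 4.34; the function and variable names are illustrative assumptions, and only pure strategies are considered.

```python
def maxmin(u):
    """Maxmin value: the best of the row minima (Player I maximizes)."""
    return max(min(row) for row in u)

def minmax(u):
    """Minmax value: the smallest of the column maxima (Player II minimizes)."""
    n_cols = len(u[0])
    return min(max(row[c] for row in u) for c in range(n_cols))

def saddle_points(u):
    """Index pairs (r, c) whose entry is the largest in its column and the smallest in its row."""
    points = []
    for r, row in enumerate(u):
        for c, entry in enumerate(row):
            column_max = max(other_row[c] for other_row in u)
            if entry == column_max and entry == min(row):
                points.append((r, c))
    return points

# Payoff functions u (payments from Player II to Player I).
example_4_38 = [[3, -5, -2], [1, 4, 1], [6, -3, -5]]    # Figure 4.33
figure_4_35 = [[-2, 5], [3, 0]]                          # Figure 4.35
matching_pennies = [[1, -1], [-1, 1]]                    # Figure 4.34

for name, u in [("Example 4.38", example_4_38),
                ("Figure 4.35", figure_4_35),
                ("Matching Pennies", matching_pennies)]:
    print(name, maxmin(u), minmax(u), saddle_points(u))
```

In line with Theorem 4.48, the only matrix with a saddle point is the one from Example 4.38, at row index 1 and column index 2 (the pair (M, R)), where the maxmin and minmax values coincide; the other two games have no saddle point in pure strategies.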
4.13 Games with perfect information
As we have shown, there are games in which there exists no Nash equilibrium. These games will be treated in Chapter 5. In this section we focus instead on a large class of widely applicable games that all have Nash equilibria. These games are best characterized in extensive form. We will show that if an extensive-form game satisfies a particular characteristic, then it always has a Nash equilibrium. Furthermore, there are even equilibria that can be calculated directly from the game tree, without requiring that the game first be transformed into strategic form. Because it is often more convenient to work directly with the extensive form of a game, this way of calculating equilibria has a significant advantage.

In this section we study extensive-form games with perfect information. Recall that an extensive-form game is of perfect information if every information set of every player consists of only one vertex.

Theorem 4.49 (Kuhn) Every finite game with perfect information has at least one Nash equilibrium.

Kuhn’s Theorem constitutes a generalization of Theorem 4.43, which states that every two-player zero-sum game with perfect information has a value. The proof of the theorem is similar to the proof of Theorem 1.4 (page 3), and involves induction on the number of vertices in the game tree. Every child of the root of a game tree defines a subgame containing fewer vertices than the original game (a fact that follows from the assumption that the game has perfect information) and the induction hypothesis then implies that the subgame has an equilibrium. Choose one equilibrium for each such subgame. If the root of the original game involves a chance move, then the union of the equilibria of all the subgames defines an equilibrium for the entire game. If the root involves a decision taken by player $i$, then that player will survey the subgames that will be played (one for each child that he may choose), calculate the payoff he will receive under the chosen equilibrium in each of those subgames, and choose the vertex leading to the subgame that grants him the maximal payoff. These intuitive ideas will now be turned into a formal proof.

[Figure 4.38 The game tree and subgames starting at the children of the root: the root $v^0$ has children $x^1, x^2, \ldots, x^L$, and each child $x^l$ is the root of the subgame $\Gamma(x^l)$.]

Proof of Theorem 4.49: It is convenient to assume that if a player in any particular game has no action available in any vertex in the game tree, then his strategy set consists of a single strategy denoted by $\emptyset$. The proof of the theorem is by induction on the number of vertices in the game tree. If the game tree is comprised of a single vertex, then the unique strategy vector is $(\emptyset, \ldots, \emptyset)$ (so a fortiori there are no available deviations), and it is therefore the unique Nash equilibrium.

Assume by induction that the claim is true for each game in extensive form containing fewer than $K$ vertices, and consider a game $\Gamma$ with $K$ vertices. Denote by $x^1, \ldots, x^L$ the children of the root $v^0$, and by $\Gamma(x^l)$ the subgame whose root is $x^l$ and whose vertices are those following $x^l$ in the tree (see Figure 4.38). Because the game is one with perfect information, $\Gamma(x^l)$ is indeed a subgame. If we had not assumed this then $\Gamma(x^l)$ would not necessarily be a subgame, because there could be an information set containing vertices that are descendants of both $x^{l_1}$ and $x^{l_2}$ (where $l_1 \ne l_2$) and we would be unable to make use of the induction hypothesis.

The payoff functions of the game $\Gamma$ are, as usual, $u_i : \times_{i \in N} S_i \to \mathbb{R}$. For each $l \in \{1, 2, \ldots, L\}$, the payoff functions in the subgame $\Gamma(x^l)$ are $u_i^l : \times_{i \in N} S_i^l \to \mathbb{R}$, where $S_i^l$ is player $i$’s set of strategies in the subgame $\Gamma(x^l)$. For any $l \in \{1, \ldots, L\}$, the root $v^0$ of the original game $\Gamma$ is not a vertex of $\Gamma(x^l)$, and therefore the number of vertices in $\Gamma(x^l)$ is less than $K$. By the induction hypothesis, for each $l \in \{1, 2, \ldots, L\}$ the game $\Gamma(x^l)$ has an equilibrium $s^{*l} = (s_i^{*l})_{i \in N}$ (if there are several such equilibria we arbitrarily choose one of them).
Case 1: The root $v^0$ is a chance move.

For each $l \in \{1, 2, \ldots, L\}$ denote by $p^l$ the probability that child $x^l$ is chosen. For each player $i$ consider the strategy $s_i^*$ in the game $\Gamma$ defined as follows. If vertex $x^l$ is chosen in the first move of the play of the game, implement strategy $s_i^{*l}$ in the subgame $\Gamma(x^l)$. By definition it follows that $u_i(s^*) = \sum_{l=1}^{L} p^l u_i^l(s^{*l})$.

We will show that the strategy vector $s^* = (s_i^*)_{i \in N}$ is a Nash equilibrium. Suppose that player $j$ deviates to a different strategy $s_j$. Let $s_j^l$ be the restriction of $s_j$ to the subgame $\Gamma(x^l)$. The expected payoff to player $j$ under the strategy vector $(s_j, s_{-j}^*)$ is $\sum_{l=1}^{L} p^l u_j^l(s_j^l, s_{-j}^{*l})$. Since $s^{*l}$ is an equilibrium of $\Gamma(x^l)$, $u_j^l(s_j^l, s_{-j}^{*l}) \le u_j^l(s^{*l})$ for all $l = 1, \ldots, L$, and therefore

\[ u_j(s_j, s_{-j}^*) = \sum_{l=1}^{L} p^l u_j^l(s_j^l, s_{-j}^{*l}) \le \sum_{l=1}^{L} p^l u_j^l(s^{*l}) = u_j(s^*). \tag{4.67} \]

In other words, player $j$ does not profit by deviating from $s_j^*$ to $s_j$. Since this holds true for every player $j \in N$, the strategy vector $s^*$ is indeed a Nash equilibrium.

Case 2: The root is a decision vertex for player $i_0$.

We first define a strategy vector $s^* = (s_i^*)_{i \in N}$ and then show that it is a Nash equilibrium. For each player $i$, $i \ne i_0$, consider the strategy $s_i^*$ defined as follows. If vertex $x^l$ is chosen in the first move of the play of the game, in the subgame $\Gamma(x^l)$ implement strategy $s_i^{*l}$. For player $i_0$ define the following strategy $s_{i_0}^*$: at the root choose the child $x^{l_0}$ at which the maximum $\max_{1 \le l \le L} u_{i_0}^l(s^{*l})$ is attained. For each $l \in \{1, 2, \ldots, L\}$, in the subgame $\Gamma(x^l)$ implement⁷ the strategy $s_{i_0}^{*l}$. The payoff under the strategy vector $s^* = (s_i^*)_{i \in N}$ is $u^{l_0}(s^{*l_0})$.

The proof that each player $i$, except for player $i_0$, cannot profit from a deviation from $s_i^*$ is similar to the proof in Case 1 above. We will show that player $i_0$ also cannot profit by deviating from $s_{i_0}^*$, thus completing the proof that the strategy vector $s^*$ is a Nash equilibrium.

Suppose that player $i_0$ deviates by selecting strategy $s_{i_0}$. Let $x^{\hat{l}}$ be the child of the root selected by this strategy, and for each child $x^l$ of the root let $s_{i_0}^l$ be the strategy $s_{i_0}$ restricted to the subgame $\Gamma(x^l)$.

• If $\hat{l} = l_0$, since $s^{*l_0}$ is an equilibrium of the subgame $\Gamma(x^{l_0})$, the payoff to player $i_0$ is

\[ u_{i_0}(s_{i_0}, s_{-i_0}^*) = u_{i_0}^{l_0}\big(s_{i_0}^{l_0}, s_{-i_0}^{*l_0}\big) \le u_{i_0}^{l_0}(s^{*l_0}) = u_{i_0}(s^*). \tag{4.68} \]

In other words, the deviation is not a profitable one.

• If $\hat{l} \ne l_0$, since $s^{*\hat{l}}$ is an equilibrium of the subgame $\Gamma(x^{\hat{l}})$ and using the definition of $l_0$ we obtain

\[ u_{i_0}(s_{i_0}, s_{-i_0}^*) = u_{i_0}^{\hat{l}}\big(s_{i_0}^{\hat{l}}, s_{-i_0}^{*\hat{l}}\big) \le u_{i_0}^{\hat{l}}(s^{*\hat{l}}) \le u_{i_0}^{l_0}(s^{*l_0}) = u_{i_0}(s^*). \tag{4.69} \]

This too is not a profitable deviation, which completes the proof.
••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
7 Since defining a strategy requires defining how a player plays at each node at which he chooses an action, we also need to define si∗0 in the subgames Ŵ(x l ) which the first move of the play of the game does not lead to (l = l0 ).
Remark 4.50 In the course of the last proof, we proceeded by induction from the root to its children and beyond. This is called forward induction. We can prove the theorem by backward induction, as follows. Let x be a vertex all of whose children are leaves. Since the game has perfect information, the player choosing an action at vertex x knows that the play of the game has arrived at that vertex (and not at a larger information set containing x) and he therefore chooses the leaf l giving him the maximal payoff. We can imagine erasing the leaves following x and thus turning x into a leaf with a payoff equal to the payoff of l. The resulting game tree has fewer vertices than the original tree, so we can apply the induction hypothesis to it. The reader is asked to complete this proof in Exercise 4.39.

This process is called backward induction. It yields a practical algorithm for finding an equilibrium in finite games with perfect information: start at vertices leading immediately to leaves. Assuming the play of the game gets to such a vertex, the player at that vertex will presumably choose the leaf granting him the maximal payoff (if there are two or more such leaves, the player may arbitrarily choose any one of them). We then attach that payoff to such a vertex. If one of these vertices is the vertex of a chance move, the payoff at that vertex is the expectation of the payoffs at the leaves reached by the chance move. From here we proceed in stages: at each stage, we attach payoffs to vertices leading immediately to vertices that had payoffs attached to them in previous stages. At each such vertex, the player controlling that vertex will make a selection leading to the maximal possible payoff to him, and that is the payoff associated with the vertex. We continue by this process to climb the tree until we reach the root. In some cases this process leads to multiple equilibria. As shown in Exercise 4.40 some equilibria cannot be obtained by this process.
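The backward induction procedure of Remark 4.50 can be sketched in a few lines of Python. This is only an illustrative sketch: the `Node` class, its field names, and the small example tree are assumptions made here, not part of the text. Ties are broken in favor of the first maximizing child, which corresponds to the arbitrary choice allowed in the remark.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Node:
    """A vertex of a finite game tree with perfect information (hypothetical structure)."""
    player: Optional[int] = None                   # deciding player; None at leaves and chance moves
    payoffs: Optional[Tuple[float, ...]] = None    # payoff vector, set only at leaves
    children: Dict[str, "Node"] = field(default_factory=dict)  # action label -> child
    probs: Optional[Dict[str, float]] = None       # set only at chance moves


def backward_induction(node, path="root"):
    """Return (payoff vector attached to node, chosen action at each decision vertex below it)."""
    if node.payoffs is not None:                   # a leaf carries its own payoff
        return node.payoffs, {}
    values, plan = {}, {}
    for action, child in node.children.items():
        values[action], sub_plan = backward_induction(child, path + "/" + action)
        plan.update(sub_plan)
    if node.probs is not None:                     # chance move: attach the expected payoff
        n = len(next(iter(values.values())))
        expected = tuple(sum(node.probs[a] * values[a][i] for a in values) for i in range(n))
        return expected, plan
    # Decision vertex: the controlling player picks a child maximizing his own payoff
    # (ties are broken in favor of the first maximizer).
    best = max(values, key=lambda a: values[a][node.player])
    plan[path] = best
    return values[best], plan


# Hypothetical two-player example: player 0 moves at the root, player 1 after T,
# and a fair chance move follows B.
leaf = lambda *p: Node(payoffs=tuple(p))
tree = Node(player=0, children={
    "T": Node(player=1, children={"t": leaf(3, 1), "b": leaf(0, 0)}),
    "B": Node(probs={"h": 0.5, "k": 0.5}, children={"h": leaf(4, 0), "k": leaf(0, 4)}),
})
value, plan = backward_induction(tree)
```

In this example player 1 would choose t after T, the chance move after B is worth (2, 2) in expectation, and player 0 therefore chooses T. The returned `plan` lists an action at every decision vertex, including those off the resulting path, as a strategy requires.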
4.14 Games on the unit square
In this section we analyze two examples of two-player games in which the set of strategies is infinite, namely, the unit interval [0, 1]. These examples will be referred to in Chapter 5, where we introduce mixed strategies.
4.14.1 A two-player zero-sum game on the unit square

Consider the two-player zero-sum strategic-form game in which:⁸
• the strategy set of Player I is X = [0, 1];
• the strategy set of Player II is Y = [0, 1];
• the payoff function (which is what Player II pays Player I) is
$$u(x, y) = 4xy - 2x - y + 3, \quad \forall x \in [0, 1],\ \forall y \in [0, 1]. \tag{4.70}$$
8 In games on the unit square it is convenient to represent a strategy as a continuous variable, and we therefore denote player strategies by x and y (rather than $s_I$ and $s_{II}$), and the sets of strategies are denoted by X and Y respectively (rather than $S_I$ and $S_{II}$).

This game is called a game on the unit square, because the set of strategy vectors is the unit square in $\mathbb{R}^2$. We can check whether or not this game has a value, and if it does, we can identify optimal strategies for the two players, as follows. First we calculate
$$\underline{v} = \max_{x \in [0,1]} \min_{y \in [0,1]} u(x, y), \tag{4.71}$$
and
$$\overline{v} = \min_{y \in [0,1]} \max_{x \in [0,1]} u(x, y), \tag{4.72}$$
and check whether or not they are equal. For each $x \in [0, 1]$,
$$\min_{y \in [0,1]} u(x, y) = \min_{y \in [0,1]} (4xy - 2x - y + 3) = \min_{y \in [0,1]} (y(4x - 1) - 2x + 3). \tag{4.73}$$
For each fixed x, this is a linear function in y, and therefore the point at which the minimum is attained is determined by the slope 4x − 1: if the slope is positive the function is increasing and the minimum is attained at y = 0; if the slope is negative this is a decreasing function and the minimum is attained at y = 1; if the slope is 0 the function is constant in y and every point is a minimum point. This leads to the following (see Figure 4.39):
$$\min_{y \in [0,1]} u(x, y) = \begin{cases} 2x + 2 & \text{if } x \le \tfrac{1}{4}, \\ -2x + 3 & \text{if } x \ge \tfrac{1}{4}. \end{cases} \tag{4.74}$$

[Figure 4.39: The function $x \mapsto \min_{y \in [0,1]} u(x, y)$]

This function of x attains a unique maximum at $x = \tfrac{1}{4}$, and its value there is $2\tfrac{1}{2}$. Therefore,
$$\underline{v} = \max_{x \in [0,1]} \min_{y \in [0,1]} u(x, y) = 2\tfrac{1}{2}. \tag{4.75}$$
We similarly calculate the following (see Figure 4.40):
$$\max_{x \in [0,1]} u(x, y) = \max_{x \in [0,1]} (4xy - 2x - y + 3) = \max_{x \in [0,1]} (x(4y - 2) - y + 3) \tag{4.76}$$
$$= \begin{cases} -y + 3 & \text{if } y \le \tfrac{1}{2}, \\ 3y + 1 & \text{if } y \ge \tfrac{1}{2}. \end{cases} \tag{4.77}$$

[Figure 4.40: The function $y \mapsto \max_{x \in [0,1]} u(x, y)$]

This function of y attains a unique minimum at $y = \tfrac{1}{2}$, and its value there is $2\tfrac{1}{2}$:
$$\overline{v} = \min_{y \in [0,1]} \max_{x \in [0,1]} u(x, y) = 2\tfrac{1}{2}. \tag{4.78}$$
In other words, the game has a value $v = 2\tfrac{1}{2}$, and $x^* = \tfrac{1}{4}$ and $y^* = \tfrac{1}{2}$ are optimal strategies (in fact the only optimal strategies in this game). Since $x^*$ and $y^*$ are the only optimal strategies of the players, we deduce from Theorems 4.44 and 4.45 that $(x^*, y^*)$ is the only equilibrium of the game.
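The calculation above can be cross-checked numerically. The following Python sketch evaluates the maxmin and minmax of the payoff function in Equation (4.70) over a finite grid, using exact rational arithmetic. The grid size and the function names are choices made here for illustration. Because u is linear in each variable separately, the inner optimum is always attained at an endpoint of [0, 1], which the grid contains; the outer optimum is exact only because the grid happens to contain the points 1/4 and 1/2.

```python
from fractions import Fraction


def u(x, y):
    """Payoff function of Equation (4.70): what Player II pays Player I."""
    return 4 * x * y - 2 * x - y + 3


def grid(n):
    """The n + 1 equally spaced points 0, 1/n, ..., 1 of the unit interval."""
    return [Fraction(k, n) for k in range(n + 1)]


def maxmin_minmax(payoff, n=100):
    """Return ((maxmin value, maximizing x), (minmax value, minimizing y)) on the grid."""
    pts = grid(n)
    maxmin = max((min(payoff(x, y) for y in pts), x) for x in pts)
    minmax = min((max(payoff(x, y) for x in pts), y) for y in pts)
    return maxmin, minmax


(lower_value, x_star), (upper_value, y_star) = maxmin_minmax(u)
```

On this grid both values equal 5/2, attained at x = 1/4 and y = 1/2, in agreement with Equations (4.75) and (4.78).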
4.14.2 A two-player non-zero-sum game on the unit square

Consider the following two-player non-zero-sum game in strategic form:
• the strategy set of Player I is X = [0, 1];
• the strategy set of Player II is Y = [0, 1];
• the payoff function of Player I is
$$u_I(x, y) = 3xy - 2x - 2y + 2, \quad \forall x \in [0, 1],\ \forall y \in [0, 1]; \tag{4.79}$$
• the payoff function of Player II is
$$u_{II}(x, y) = -4xy + 2x + y, \quad \forall x \in [0, 1],\ \forall y \in [0, 1]. \tag{4.80}$$
Even though this is not a zero-sum game, the maxmin concept, reflecting the security level of a player, is still well defined (see Equation (4.26)). Player I can guarantee
$$\underline{v}_I = \max_{x \in [0,1]} \min_{y \in [0,1]} u_I(x, y), \tag{4.81}$$
and Player II can guarantee
$$\underline{v}_{II} = \max_{y \in [0,1]} \min_{x \in [0,1]} u_{II}(x, y). \tag{4.82}$$
Similarly to the calculations carried out in Section 4.14.1, we derive the following (see Figure 4.41):
$$\min_{y \in [0,1]} u_I(x, y) = \min_{y \in [0,1]} (3xy - 2x - 2y + 2) = \min_{y \in [0,1]} (y(3x - 2) - 2x + 2) \tag{4.83}$$
$$= \begin{cases} x & \text{for } x \le \tfrac{2}{3}, \\ -2x + 2 & \text{for } x \ge \tfrac{2}{3}. \end{cases} \tag{4.84}$$

[Figure 4.41: The function $x \mapsto \min_{y \in [0,1]} u_I(x, y)$]

This function of x has a single maximum, attained at $x = \tfrac{2}{3}$, with the value $\tfrac{2}{3}$. We therefore have
$$\underline{v}_I = \max_{x \in [0,1]} \min_{y \in [0,1]} u_I(x, y) = \tfrac{2}{3}. \tag{4.85}$$
The sole maxmin strategy available to Player I is $x = \tfrac{2}{3}$. We similarly calculate for Player II (see Figure 4.42):
$$\min_{x \in [0,1]} u_{II}(x, y) = \min_{x \in [0,1]} (-4xy + 2x + y) = \min_{x \in [0,1]} (x(2 - 4y) + y) \tag{4.86}$$
$$= \begin{cases} y & \text{for } y \le \tfrac{1}{2}, \\ 2 - 3y & \text{for } y \ge \tfrac{1}{2}. \end{cases} \tag{4.87}$$

[Figure 4.42: The function $y \mapsto \min_{x \in [0,1]} u_{II}(x, y)$]

This function of y has a single maximum, attained at $y = \tfrac{1}{2}$, with value $\tfrac{1}{2}$. We therefore have
$$\underline{v}_{II} = \max_{y \in [0,1]} \min_{x \in [0,1]} u_{II}(x, y) = \tfrac{1}{2}, \tag{4.88}$$
and the sole maxmin strategy of Player II is $y = \tfrac{1}{2}$.

The next step is to calculate a Nash equilibrium of this game, assuming that there is one. The most convenient way to do so is to use the definition of the Nash equilibrium based on the “best reply” concept (Definition 4.18 on page 97): a pair of strategies $(x^*, y^*)$ is a Nash equilibrium if $x^*$ is Player I’s best reply to $y^*$, and $y^*$ is Player II’s best reply to $x^*$.
For each $x \in [0, 1]$, denote by $\mathrm{br}_{II}(x)$ the collection of best replies⁹ of Player II to the strategy x:
$$\mathrm{br}_{II}(x) := \operatorname{argmax}_{y \in [0,1]} u_{II}(x, y) = \{y \in [0, 1] : u_{II}(x, y) \ge u_{II}(x, z)\ \ \forall z \in [0, 1]\}. \tag{4.89}$$
In other words, $\mathrm{br}_{II}(x)$ is the collection of values y at which the maximum of $u_{II}(x, y)$ is attained. To calculate $\mathrm{br}_{II}(x)$ in this example, we will write $u_{II}(x, y)$ as
$$u_{II}(x, y) = y(1 - 4x) + 2x. \tag{4.90}$$
For each fixed x, this is a linear function of y: if it has a positive slope the function is increasing and attains its maximum at y = 1. If the slope is negative, the function is decreasing and the maximum point is y = 0. If the slope of the function is 0, then the function is constant and every point $y \in [0, 1]$ is a maximum point. The slope turns from positive to negative at $x = \tfrac{1}{4}$, and the graph of $\mathrm{br}_{II}(x)$ is given in Figure 4.43. Note that $\mathrm{br}_{II}$ is not a function, because $\mathrm{br}_{II}(\tfrac{1}{4})$ is not a single point but the interval [0, 1].

9 br stands for best reply.

[Figure 4.43: The graph of $\mathrm{br}_{II}(x)$]

The calculation of $\mathrm{br}_I(y)$ is carried out similarly. The best reply of Player I to each $y \in [0, 1]$ is
$$\mathrm{br}_I(y) := \operatorname{argmax}_{x \in [0,1]} u_I(x, y) = \{x \in [0, 1] : u_I(x, y) \ge u_I(z, y)\ \ \forall z \in [0, 1]\}. \tag{4.91}$$
Writing $u_I(x, y)$ as
$$u_I(x, y) = x(3y - 2) - 2y + 2 \tag{4.92}$$
shows that, for each fixed y, this is a linear function in x: if it has a positive slope the function is increasing and attains its maximum at x = 1. A negative slope implies that the function is decreasing and its maximum point is x = 0, and a slope of 0 indicates a constant function where every point $x \in [0, 1]$ is a maximum point. The slope turns from negative to positive at $y = \tfrac{2}{3}$, and the graph of $\mathrm{br}_I(y)$ is given in Figure 4.44. Note that the variable y is represented by the vertical axis, even though it is the variable of the function $\mathrm{br}_I(y)$. This is done so that both graphs, $\mathrm{br}_I(y)$ and $\mathrm{br}_{II}(x)$, can be conveniently depicted within the same system of axes, as in Figure 4.45.

[Figure 4.44: The graph of $\mathrm{br}_I(y)$]

[Figure 4.45: The graphs of $x \mapsto \mathrm{br}_{II}(x)$ (darker line) and $y \mapsto \mathrm{br}_I(y)$ (lighter line)]

In terms of the best-reply concept, the pair of strategies $(x^*, y^*)$ is an equilibrium point if and only if $x^* \in \mathrm{br}_I(y^*)$ and $y^* \in \mathrm{br}_{II}(x^*)$. In other words, we require $(x^*, y^*)$ to be on both graphs $\mathrm{br}_{II}(x)$ and $\mathrm{br}_I(y)$. As is clear from Figure 4.45, the only point satisfying this condition is $(x^* = \tfrac{1}{4}, y^* = \tfrac{2}{3})$. We conclude that the game has a single Nash equilibrium $(x^*, y^*)$ where $x^* = \tfrac{1}{4}$ and $y^* = \tfrac{2}{3}$, with the equilibrium payoff of $u_I(x^*, y^*) = \tfrac{2}{3}$ to Player I and $u_{II}(x^*, y^*) = \tfrac{1}{2}$ to Player II.

This example shows, again, that in games that are not zero-sum the concepts of Nash equilibrium and optimal strategies differ; despite the fact that for both players the equilibrium payoff is equal to the security level ($\tfrac{2}{3}$ for Player I and $\tfrac{1}{2}$ for Player II), the maxmin strategies are not the equilibrium strategies. The maxmin strategies are $\hat{x} = \tfrac{2}{3}$ and $\hat{y} = \tfrac{1}{2}$, while the equilibrium strategies are $x^* = \tfrac{1}{4}$ and $y^* = \tfrac{2}{3}$.
• The pair of maxmin strategies, $\hat{x} = \tfrac{2}{3}$ and $\hat{y} = \tfrac{1}{2}$, is not an equilibrium. The payoff to Player I is $\tfrac{2}{3}$, but he can increase his payoff by deviating to x = 0 because
$$u_I(0, \hat{y}) = u_I(0, \tfrac{1}{2}) = 1 > \tfrac{2}{3} = u_I(\hat{x}, \hat{y}). \tag{4.93}$$
The payoff to Player II is $\tfrac{1}{2}$, and he can also increase his payoff by deviating to y = 0 because
$$u_{II}(\hat{x}, 0) = u_{II}(\tfrac{2}{3}, 0) = \tfrac{4}{3} > \tfrac{1}{2} = u_{II}(\hat{x}, \hat{y}). \tag{4.94}$$
• The equilibrium strategies $x^* = \tfrac{1}{4}$ and $y^* = \tfrac{2}{3}$ are not optimal strategies. If Player I chooses strategy $x^* = \tfrac{1}{4}$ and Player II plays y = 1, the payoff to Player I is less than his security level $\tfrac{2}{3}$:
$$u_I(\tfrac{1}{4}, 1) = \tfrac{1}{4} < \tfrac{2}{3} = \underline{v}_I. \tag{4.95}$$
Similarly, when Player II plays $y^* = \tfrac{2}{3}$, if Player I plays x = 1 then the payoff to Player II is less than his security level $\tfrac{1}{2}$:
$$u_{II}(1, \tfrac{2}{3}) = 0 < \tfrac{1}{2} = \underline{v}_{II}.$$
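The best-reply argument and the deviation checks of Section 4.14.2 can be reproduced with exact arithmetic. The sketch below is illustrative only; the function names and the representation of a best-reply set as an interval (lo, hi) are assumptions made here. It relies on the fact, used in the text, that each payoff function is linear in the player's own strategy, so the best-reply set is determined by the sign of the slope.

```python
from fractions import Fraction as F


def u1(x, y):
    """Payoff of Player I, Equation (4.79)."""
    return 3 * x * y - 2 * x - 2 * y + 2


def u2(x, y):
    """Payoff of Player II, Equation (4.80)."""
    return -4 * x * y + 2 * x + y


def linear_argmax(slope):
    """Maximizers over [0, 1] of a linear function with the given slope, as an interval (lo, hi)."""
    if slope > 0:
        return (F(1), F(1))
    if slope < 0:
        return (F(0), F(0))
    return (F(0), F(1))          # constant function: every point is a maximizer


def br1(y):
    """Best replies of Player I to y; the slope of u1 in x is 3y - 2 (Equation (4.92))."""
    return linear_argmax(3 * y - 2)


def br2(x):
    """Best replies of Player II to x; the slope of u2 in y is 1 - 4x (Equation (4.90))."""
    return linear_argmax(1 - 4 * x)


def is_equilibrium(x, y):
    """(x, y) is an equilibrium iff x is in br1(y) and y is in br2(x)."""
    lo1, hi1 = br1(y)
    lo2, hi2 = br2(x)
    return lo1 <= x <= hi1 and lo2 <= y <= hi2


equilibrium_ok = is_equilibrium(F(1, 4), F(2, 3))   # the pair (x*, y*)
maxmin_pair_ok = is_equilibrium(F(2, 3), F(1, 2))   # the pair of maxmin strategies
```

Under these definitions `equilibrium_ok` is true and `maxmin_pair_ok` is false, and evaluating `u1` and `u2` at the points used in Equations (4.93) to (4.95) returns the values stated there.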
$u_i(s_i, s_{-i}^*)$ for each player $i \in N$ and each strategy $s_i \in S_i \setminus \{s_i^*\}$.
(a) Prove that if the process of iterative elimination of strictly dominated strategies results in a unique strategy vector $s^*$, then $s^*$ is a strict Nash equilibrium, and it is the only Nash equilibrium of the game.
(b) Prove that if $s^* = (s_i^*)_{i=1}^n$ is a strict Nash equilibrium, then none of the strategies $s_i^*$ can be eliminated by iterative elimination of dominated strategies (under either strict or weak domination).¹⁰

4.10 Prove that the result of iterated elimination of strictly dominated strategies (that is, the set of strategies remaining after the elimination process has been completed) is independent of the order of elimination. Deduce that if the result of the elimination process is a single vector $s^*$, then that same vector will be obtained under every possible order of the elimination of strictly dominated strategies.

4.11 Find all rational strategy vectors in the following games. In each game Player I chooses the row and Player II chooses the column.

Game A
        a        b        c        d
α     6, 2     6, 3     7, 6     2, 8
β     8, 5     6, 9     4, 6     4, 7

Game B
        a        b
α     9, 5     5, 3
β     8, 6     8, 4

Game C
        a         b         c        d
α    −1, 20    −7, −7    −1, 2    −5, 8
β    27, 20    13, −1    21, 2    13, −1
γ    −5, 20    −3, 5     7, −1    3, −4

Game D
        a        b        c        d
α     3, 7     0, 13    4, 5     5, 3
β     5, 3     4, 5     4, 5     3, 7
γ     4, 5     3, 7     4, 5     5, 3
δ     4, 5     4, 5     4, 5     4, 5
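Iterated elimination of strictly dominated strategies, as used in Exercises 4.9 to 4.11, is mechanical enough to sketch in code. The following Python sketch handles two-player games and domination by pure strategies only. The data layout (a dictionary from strategy pairs to payoff pairs) and the small example game are assumptions made here for illustration, and the example is deliberately not one of the games in the exercises.

```python
def strictly_dominated(payoffs, player, s, own, other):
    """True if strategy s of `player` is strictly dominated by another pure strategy in `own`."""
    def pay(mine, theirs):
        # payoffs is keyed by (row strategy, column strategy)
        cell = payoffs[(mine, theirs)] if player == 0 else payoffs[(theirs, mine)]
        return cell[player]
    return any(all(pay(t, o) > pay(s, o) for o in other) for t in own if t != s)


def iterated_elimination(payoffs, rows, cols):
    """Remove strictly dominated strategies one at a time until none remains."""
    rows, cols = list(rows), list(cols)
    changed = True
    while changed:
        changed = False
        for r in rows:
            if strictly_dominated(payoffs, 0, r, rows, cols):
                rows.remove(r)
                changed = True
                break
        else:  # no row was removed in this pass, so try the columns
            for c in cols:
                if strictly_dominated(payoffs, 1, c, cols, rows):
                    cols.remove(c)
                    changed = True
                    break
    return rows, cols


# Hypothetical 2 x 2 game (a Prisoner's Dilemma): B strictly dominates T, and R strictly dominates L.
example = {
    ("T", "L"): (3, 3), ("T", "R"): (0, 4),
    ("B", "L"): (4, 0), ("B", "R"): (1, 1),
}
surviving = iterated_elimination(example, ["T", "B"], ["L", "R"])
```

The sketch eliminates rows before columns; by Exercise 4.10 the surviving set does not depend on this order when domination is strict.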
4.12 Find a game that has at least one equilibrium, but in which iterative elimination of dominated strategies yields a game with no equilibria.

4.13 Prove directly that a strictly dominated strategy cannot be an element of a game’s equilibrium (Corollary 4.36, page 109). In other words, show that in every strategy vector in which there is a player using a strictly dominated strategy, that player can deviate and increase his payoff.

10 This is not true of equilibria that are not strict. See Example 4.16, where there are four nonstrict Nash equilibria (T, C), (M, L), (M, R), and (B, L).
4.16 Exercises
4.14 In a first-price auction, each buyer submits his bid in a sealed envelope. The winner of the auction is the buyer who submits the highest bid, and the amount he pays is equal to what he bid. If several buyers have submitted bids equal to the highest bid, a fair lottery is conducted among them to choose one winner, who then pays his bid.
(a) In this situation, does the strategy $\beta_i^*$ of buyer i, in which he bids his private value for the item, weakly dominate all his other strategies?
(b) Find a strategy of buyer i that weakly dominates strategy $\beta_i^*$. Does the strategy under which each buyer bids his private value weakly dominate all the other strategies? Justify your answer.

4.15 Prove that the two definitions of the Nash equilibrium, presented in Definitions 4.17 and 4.19, are equivalent to each other.

4.16 Find all the equilibria in the following games. In each game Player I chooses the row and Player II chooses the column.

Game A
        a        b        c        d
γ     7, 3     6, 3     5, 5     4, 7
β     4, 2     5, 8     8, 6     5, 8
α     6, 1     3, 8     2, 4     6, 9

Game B
        a        b        c        d
δ     5, 2     3, 1     2, 2     4, 5
γ     0, 3     2, 2     0, 1     −1, 3
β     8, 4     7, 0     6, −1    5, 2
α     0, 5     1, −2    2, 2     3, 4

Game C (the label of the first row is missing from the extracted text)
        a         b         c         d
      0, 0     −1, 1      1, 1      0, −1
δ     1, −1     1, 0      0, 1      0, 0
γ     0, 1     −1, −1     1, 0      1, −1
β    −1, 1      0, −1    −1, 1      0, 0
α     1, 1      0, 0     −1, −1     0, 0
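For finite two-player games such as those in Exercise 4.16, pure-strategy equilibria can be found by checking the best-reply condition cell by cell. The sketch below is illustrative: the function name, the dictionary layout, and the example game are assumptions made here, and the example is a hypothetical coordination game rather than one of the exercise games.

```python
def pure_nash_equilibria(payoffs, rows, cols):
    """All (row, col) pairs in which each strategy is a best reply to the other."""
    equilibria = []
    for r in rows:
        for c in cols:
            u1, u2 = payoffs[(r, c)]
            # Player I cannot gain by switching rows while the column stays fixed
            row_ok = all(payoffs[(r2, c)][0] <= u1 for r2 in rows)
            # Player II cannot gain by switching columns while the row stays fixed
            col_ok = all(payoffs[(r, c2)][1] <= u2 for c2 in cols)
            if row_ok and col_ok:
                equilibria.append((r, c))
    return equilibria


# Hypothetical coordination game with two pure equilibria.
example = {
    ("T", "L"): (2, 1), ("T", "R"): (0, 0),
    ("B", "L"): (0, 0), ("B", "R"): (1, 2),
}
found = pure_nash_equilibria(example, ["T", "B"], ["L", "R"])
```

The check uses weak inequalities, so equilibria that are not strict are also reported.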
4.17 In the following three-player game, Player I chooses a row (A or B), Player II chooses a column (a or b), and Player III chooses a matrix (α, β, or γ). Find all the equilibria of this game.

Matrix α
        a           b
A    0, 0, 5     0, 0, 0
B    2, 0, 0     0, 0, 0

Matrix β
        a           b
A    1, 2, 3     0, 0, 0
B    0, 0, 0     1, 2, 3

Matrix γ
        a           b
A    0, 0, 0     0, 0, 0
B    0, 5, 0     0, 0, 4
4.18 Find the equilibria of the following three-player game (Player I chooses row T , C, or B, Player II a column L, M, or R, and Player III chooses matrix P or Q). L
M
R
T
4, 9, 3
7, 8, 10
5, 7, −1
2, 2, 8
C
3, 4, 5
17, 3, 12
3, 5, 2
−3, 5, 0
B
9, 7, 2
20, 0, 13
0, 15, 0
L
M
R
T
3, 10, 8
8, 14, 6
4, 12, 7
C
4, 7, 2
5, 5, 2
B
3, −5, 0
0, 3, 4 P
Q
4.19 Prove that in the Centipede game (see Exercise 3.12 on page 61), at every Nash equilibrium, Player I chooses S at the first move in the game.

4.20 A two-player game is symmetric if the two players have the same strategy set $S_1 = S_2$ and the payoff functions satisfy $u_1(s_1, s_2) = u_2(s_2, s_1)$ for each $s_1, s_2 \in S_1$. Prove that the set of equilibria of a two-player symmetric game is a symmetric set: if $(s_1, s_2)$ is an equilibrium, then $(s_2, s_1)$ is also an equilibrium.

4.21 Describe the following games in strategic form (in three-player games, let Player I choose the row, Player II choose the column, and Player III choose the matrix). In each game, find all the equilibria, if any exist.

[Figure: six game trees in extensive form, labeled Game A through Game F; the trees are not recoverable from the extracted text.]
4.22 Let X and Y be two compact sets in $\mathbb{R}^m$, and let $f : X \times Y \to \mathbb{R}$ be a continuous function. Prove that the function $x \mapsto \min_{y \in Y} f(x, y)$ is also a continuous function.

4.23 In each of the following two-player zero-sum games, implement a process of iterative elimination of dominated strategies. For each game list the strategies you have eliminated and find the maxmin strategy of Player I and the minmax strategy of Player II.
Player I
a
Player II b c
d
γ
8
4
8
4
β
2
5
3
8
α
6
1
4
5
Game A
a
b
Player II c
d
δ
6
4
2
1
γ
5
3
3
0
β
1
0
5
α
2
−3
2
Player I
a
Player II b c
d
δ
3
6
5
5
γ
5
5
5
5
4
β
5
3
5
6
3
α
6
5
5
3
Player I
Game B
Game C
4.24 Prove that in Example 4.23 on page 99 (duopoly competition) the pair of strategies $(q_1^*, q_2^*)$ defined by
$$q_1^* = \frac{2 - 2c_1 + c_2}{3}, \qquad q_2^* = \frac{2 - 2c_2 + c_1}{3} \tag{4.97}$$
is an equilibrium.

4.25 Prove Theorem 4.26 (page 105): if player i has a (weakly) dominant strategy, then it is his (not necessarily unique) maxmin strategy. Moreover, this strategy is his best reply to every strategy vector of the other players.

4.26 Prove Theorem 4.28 (page 105): in a game in which every player i has a strategy $s_i^*$ that strictly dominates all of his other strategies, the strategy vector $(s_1^*, \ldots, s_n^*)$ is the unique equilibrium point of the game as well as the unique vector of maxmin strategies.

4.27 Let $G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ be a game in strategic form, and let $\hat{s}_i \in S_i$ be an arbitrary strategy of player i in this game. Let $\widehat{G}$ be the game derived from G by the elimination of strategy $\hat{s}_i$. Prove that for each player j, $j \neq i$, the maxmin value of player j in the game $\widehat{G}$ is greater than or equal to his maxmin value in G. Is the maxmin value of player i in game $\widehat{G}$ necessarily less than his maxmin value in G? Prove this last statement, or find a counterexample.
4.28 Find an example of a game $G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ in strategic form such that the game $\widehat{G}$ derived from G by elimination of one strategy in one player’s strategy set has an equilibrium that is not an equilibrium in the game G.

4.29 Prove Corollary 4.33 on page 108: let $G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ be a strategic-form game and let $\widehat{G}$ be the game derived from G by iterative elimination of dominated strategies. Then every equilibrium $s^*$ in the game $\widehat{G}$ is also an equilibrium in the game G.

4.30 Find an example of a strategic-form game G and of an equilibrium $s^*$ of that game such that for each player $i \in N$ the strategy $s_i^*$ is dominated.

4.31 The following questions relate to the following two-player zero-sum game.

[Figure: game tree in which Player I chooses T or B and Player II then chooses t1 or b1, or t2 or b2; the extracted leaf payoffs are t1: 12, b1: 15, t2: 8, b2: 10.]

(a) Find an optimal strategy for each player by applying backward induction.
(b) Describe this game in strategic form.
(c) Find all the optimal strategies of the two players.
(d) Explain why there are optimal strategies in addition to the one you identified by backward induction.
4.32 (a) Let $A = (a_{ij})$ be an n × m matrix representing a two-player zero-sum game, where the row player is Ann and the column player is Bill. Let $B = (b_{ji})$ be a new m × n matrix in which the row player is Bill and the column player is Ann. What is the relation between the matrices A and B?
(b) Conduct a similar transformation of the names of the players in the following matrix and write down the new matrix. Player I chooses the row and Player II chooses the column.

        L      M      R
T       3     −5      7
B      −2      8      4
4.33 The value of the two-player zero-sum game given by the matrix A is 0. Is it necessarily true that the value of the two-player zero-sum game given by the matrix −A is also 0? If your answer is yes, prove this. If your answer is no, provide a counterexample.

4.34 Let A and B be two finite sets, and let $u : A \times B \to \mathbb{R}$ be an arbitrary function.¹¹ Prove that
$$\max_{a \in A} \min_{b \in B} u(a, b) \le \min_{b \in B} \max_{a \in A} u(a, b). \tag{4.98}$$
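Before attempting the proof, the inequality in Equation (4.98) can be explored numerically on finite matrices. The sketch below is only an illustration and proves nothing; the function names, the example matrix, and the random test matrices are assumptions made here.

```python
import random


def maxmin(matrix):
    """max over rows of the row minimum (the row player maximizes)."""
    return max(min(row) for row in matrix)


def minmax(matrix):
    """min over columns of the column maximum (the column player minimizes)."""
    return min(max(col) for col in zip(*matrix))


# Hypothetical example in which the inequality is strict.
example = [[1, 4],
           [3, 2]]

# Spot-check the inequality on 1000 pseudo-random 3 x 4 integer matrices.
rng = random.Random(0)
trials = [[[rng.randint(-9, 9) for _ in range(4)] for _ in range(3)] for _ in range(1000)]
all_hold = all(maxmin(m) <= minmax(m) for m in trials)
```

For the example matrix the two sides are 2 and 3, so a finite game need not have a value in pure strategies.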
4.35 Show whether or not the value exists in each of the following games. If the value exists, find it and find all the optimal strategies for each player. As usual, Player I is the row player and Player II is the column player. a
b
A
2
2
B
1
3
a
b
c
A
1
2
3
B
4
3
0
Game A
Game B
a
b
c
d
A
3 12
3
4
12
B
7
5
6
C
4
2
3
a
b
A
3
0
13
B
2
2
0
C
0
3
Game C
Game D
4.36 Prove Theorem 4.48 (page 118): in a two-player zero-sum game, $(s_I^*, s_{II}^*)$ is a saddle point if and only if $s_I^*$ is an optimal strategy for Player I and $s_{II}^*$ is an optimal strategy for Player II.

4.37 Let A and B be two finite-dimensional matrices with positive payoffs. Show that the game
$$\begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}$$
has no value. (Each 0 here represents a matrix of the proper dimensions, such that all of its entries are 0.)

11 The finiteness of A and B is needed to ensure the existence of a minimum and maximum in Equation (4.98). The claim holds (using the same proof) for each pair of sets A and B and function u for which the min and the max of the function in Equation (4.98) exist (for example, if A and B are compact sets and u is a continuous function; see Exercise 4.22). Alternatively, we may remove all restrictions on A, B, and u and replace min by inf and max by sup.
4.38 Answer the following questions with reference to Game A and Game B that appear in the diagram below.
(a) Find all equilibria obtained by backward induction.
(b) Describe the games in strategic form.
(c) Check whether there are other Nash equilibria in addition to those found by backward induction.

[Figure: game trees of Game A and Game B; the trees are not recoverable from the extracted text.]
4.39 Prove Theorem 4.49 (on page 118) using backward induction (a general outline for the proof can be found in Remark 4.50 on page 121).

4.40 Find a Nash equilibrium in the following game using backward induction:

[Figure: three-player game tree; the tree is not recoverable from the extracted text.]
Find an additional Nash equilibrium of this game.

4.41 In a two-player zero-sum game on the unit square where Player I’s strategy set is X = [0, 1] and Player II’s strategy set is Y = [0, 1], check whether or not the game associated with each of the following payoff functions has a value, and if so, find the value and optimal strategies for the two players:
(a) $u(x, y) = 1 + 4x + y - 5xy$.
(b) $u(x, y) = 4 + 2y - 4xy$.

4.42 Consider a two-player non-zero-sum game on the unit square in which Player I’s strategy set is X = [0, 1], Player II’s strategy set is Y = [0, 1], and the payoff functions for the players are given below. Find the maxmin value and the maxmin strategy (or strategies) of the players. Does this game have an equilibrium? If so, find it.
$$u_I(x, y) = 2x - xy, \qquad u_{II}(x, y) = 2 + 3x + 3y - 3xy.$$

4.43 Consider a two-player non-zero-sum game on the unit square in which Player I’s strategy set is X = [0, 1], and Player II’s strategy set is Y = [0, 1], which has a unique equilibrium $(x^*, y^*)$, where $x^*, y^* \in (0, 1)$. Prove that the equilibrium payoff to each player equals his maxmin value.

4.44 Fifty people are playing the following game. Each player writes down, on a separate slip of paper, one integer in the set {0, 1, ..., 100}, alongside his name. The gamemaster then reads the numbers on each slip of paper, and calculates the average x of all the numbers written by the players. The winner of the game is the player (or players) who wrote down the number that is closest to $\tfrac{2}{3}x$. The winners equally divide the prize of $1,000 between them. Describe this as a strategic-form game, and find all the Nash equilibria of the game. What would be your strategy in this game? Why?

4.45 Peter, Andrew, and James are playing the following game in which the winner is awarded M dollars. Each of the three players receives a coupon and is to decide whether or not to bet on it. If a player chooses to bet, he or she loses the coupon with probability $\tfrac{1}{2}$ and wins an additional coupon with probability $\tfrac{1}{2}$ (thus resulting in two coupons in total). The success of each player in the bet is independent of the results of the bets of the other players. The winner of the prize is the player with the greatest number of coupons. If there is more than one such player, the winner is selected from among them in a lottery where each has an equal chance of winning. The goal of each player is to maximize the probability of winning the award.
(a) Describe this game as a game in strategic form and find all its Nash equilibria.
(b) Now assume that the wins and losses of the players are perfectly correlated: a single coin flip determines whether all the players who decided to bet either all win an additional coupon or all lose their coupons. Describe this new situation as a game in strategic form and find all its Nash equilibria.

4.46 Partnership Game Lee (Player 1) and Julie (Player 2) are business partners. Each of the partners has to determine the amount of effort he or she will put into the business, which is denoted by $e_i$, i = 1, 2, and may be any nonnegative real number. The cost of effort $e_i$ for Player i is $c e_i$, where c > 0 is equal for both players. The success of the business depends on the amount of effort put in by the players; the business’s profit is denoted by $r(e_1, e_2) = e_1^{\alpha_1} e_2^{\alpha_2}$, where $\alpha_1, \alpha_2 \in (0, 1)$ are fixed constants known by Lee and Julie, and the profit is shared equally between the two partners. Each player’s utility is given by the difference between the share of the profit received by that player and the cost of the effort he or she put into the business. Answer the following questions:
(a) Describe this situation as a strategic-form game. Note that the set of strategies of each player is the continuum.
(b) Find all the Nash equilibria of the game.

4.47 Braess Paradox There are two main roads connecting San Francisco and San Jose, a northern road via Mountain View and a southern road via Cupertino. Travel time on each of the roads depends on the number x of cars using the road per minute, as indicated in the following diagram.

[Diagram: San Francisco to Mountain View: 1 + x; Mountain View to San Jose: 51 + 0.1x; San Francisco to Cupertino: 51 + 0.1x; Cupertino to San Jose: 1 + x.]
For example, the travel time between San Francisco and Mountain View is 1 + x, where x is the number of cars per minute using the road connecting these cities, and the travel time between Mountain View and San Jose is 51 + 0.1x, where x is the number of cars per minute using the road connecting those two cities. Each driver chooses which road to take in going from San Francisco to San Jose, with the goal of reducing to a minimum the amount of travel time. Early in the morning, 60 cars per minute get on the road from San Francisco to San Jose (where we assume the travellers leave early enough in the morning so that they are the only ones on the road at that hour). (a) Describe this situation as a strategic-form game, in which each driver chooses the route he will take. (b) What are all the Nash equilibria of this game? At these equilibria, how much time does the trip take at an early morning hour? (c) The California Department of Transportation constructs a new road between Mountain View and Cupertino, with travel time between these cities 10 + 0.1x (see the diagram below). This road is one way, enabling travel solely from Mountain View to Cupertino. Find a Nash equilibrium in the new game. Under this equilibrium how much time does it take to get to San Jose from San Francisco at an early morning hour?
(d) Does the construction of the additional road improve travel time?

[Diagram: San Francisco to Mountain View: 1 + x; Mountain View to San Jose: 51 + 0.1x; San Francisco to Cupertino: 51 + 0.1x; Cupertino to San Jose: 1 + x; Mountain View to Cupertino (one way): 10 + 0.1x.]
This phenomenon is “paradoxical” because, as you discovered in the answers to (b) and (c), the construction of a new road increases the travel time for all travellers. This is because when the new road is opened, travel along the San Francisco–Mountain View–Cupertino–San Jose route takes less time than along the San Francisco–Mountain View–San Jose route and the San Francisco–Cupertino–San Jose route, causing drivers to take the new route. But that causes the total number of cars along the two routes San Francisco–Mountain View–San Jose and San Francisco–Cupertino–San Jose to increase: travel time along each stretch of road increases. Such a phenomenon was in fact noted in New York (where the closure of a road for construction work had the effect of decreasing travel time) and in Stuttgart (where the opening of a new road increased travel time).

4.48 The Davis Removal Company and its main rival, Roland Ltd, have fleets of ten trucks each, which leave the companies’ headquarters for Chicago each morning at 5 am for their daily assignments. At that early hour, these trucks are the only vehicles on the roads. Travel time along the road between the Davis Removal Company and Chicago is $20 + 2\sqrt{x}$, where x is the number of cars on the road, and it is similarly $20 + 2\sqrt{x}$ on the road connecting the headquarters of Roland Ltd with Chicago, where x is the number of cars on the road. The Illinois Department of Transportation paves a new two-way road between the companies’ headquarters, where travel time on this new road is 0.2, independent of the number of cars on the road. This situation is described in the following diagram.

[Diagram: Davis to Chicago: $20 + 2\sqrt{x}$; Roland to Chicago: $20 + 2\sqrt{x}$; Davis to Roland and Roland to Davis: 0.2.]

Answer the following questions:
(a) Before the new road is constructed, what is the travel time of each truck between its headquarters and Chicago?
(b) Describe the situation after the construction of the new road as a two-player strategic-form game, in which the players are the managers of the removal companies and each player must determine the number of trucks to send on the road connecting his company with Chicago (with the rest traveling on the newly opened road and the road connecting the other company’s headquarters and Chicago), with the goal of keeping to a minimum the total travel time to Chicago of all the trucks in his fleet. Note that if Davis, for example, instructs all its drivers to go on the road between company headquarters and Chicago, and Roland sends seven of its trucks directly to Chicago and three first to the Davis headquarters and then to Chicago, the total time racked up by the fleet of Roland Ltd is
\[ 7 \times (20 + 2\sqrt{7}) + 3 \times (0.2 + 20 + 2\sqrt{13}). \tag{4.99} \]
(c) Is the strategy vector in which both Davis and Roland send their entire fleets directly to Chicago, ignoring the new road, a Nash equilibrium? (d) Show that the strategy vector in which both Davis and Roland send six drivers directly to Chicago and four via the new road is an equilibrium. What is the total travel time of the trucks of the two companies in this equilibrium? Did the construction of a new road decrease or increase total travel time? (e) Construct the payoff matrix of this game, with the aid of a spreadsheet program. Are there any additional equilibria in this game?

4.49 Location games Two competing coffee house chains, Pete’s Coffee and Caribou Coffee, are seeking locations for new branch stores in Cambridge. The town consists of only one street, along which all the residents live. Each of the two chains therefore needs to choose a single point within the interval [0, 1], which represents the exact location of the branch store along the road. It is assumed that each resident will go to the coffee house that is nearest to his place of residence. If the two chains choose the exact same location, they will each attract an equal number of customers. Each chain, of course, seeks to maximize its number of customers. To simplify the analysis required here, suppose that each point along the interval [0, 1] represents a town resident, and that the fraction of residents who frequent each coffee house is the fraction of points closer to one store than to the other. (a) Describe this situation as a two-player strategic-form game. (b) Prove that the only equilibrium in this game is that given by both chains selecting the location $x = \tfrac12$. (c) Prove that if three chains were to compete for a location in Cambridge, the resulting game would have no equilibrium. (Under this scenario, if two or three of the chains choose the same location, they will split the points closest to them equally between them.)
4.50 For each of the following two games, determine whether or not it can represent a strategic-form game corresponding to an extensive-form game with perfect information. If so, describe a corresponding extensive-form game; if not, justify your answer.

Game A
                 Player II
                 a        b
            A    1, 1     5, 3
Player I    B    3, 0     5, 3
            C    1, 1     0, 4
            D    3, 0     5, 3

Game B
                 Player II
                 a        b
            A    3, 0     0, 4
Player I    B    3, 0     0, 4
            C    3, 0     1, 1
            D    3, 0     5, 3
4.51 Let $\Gamma$ be a game in extensive form. The agent-form game derived from $\Gamma$ is a strategic-form game where each player $i$ in $\Gamma$ is split into several players: for each information set $U_i \in \mathcal{U}_i$ of player $i$ we define a player $(i, U_i)$ in the agent-form game. Thus, if each player $i$ has $k_i$ information sets in $\Gamma$, then there are $\sum_{i \in N} k_i$ players in the agent-form game. The set of strategies of player $(i, U_i)$ is $A(U_i)$. There is a bijection between the set of strategy vectors in the game $\Gamma$ and the set of strategy vectors in the agent-form game: the strategy vector $\sigma = (\sigma_i)_{i \in N}$ in $\Gamma$ corresponds to the strategy vector $(\sigma_i(U_i))_{\{i \in N,\, U_i \in \mathcal{U}_i\}}$ in the agent-form game. The payoff function of player $(i, U_i)$ in the agent-form game is the payoff function of player $i$ in the game $\Gamma$. Prove that if $\sigma = (\sigma_i)_{i \in N}$ is a Nash equilibrium in the game $\Gamma$, then the strategy vector $(\sigma_i(U_i))_{\{i \in N,\, U_i \in \mathcal{U}_i\}}$ is a Nash equilibrium in the agent-form game derived from $\Gamma$.
5
Mixed strategies
Chapter summary Given a game in strategic form we extend the strategy set of a player to the set of all probability distributions over his strategies. The elements of the new set are called mixed strategies, while the elements of the original strategy set are called pure strategies. Thus, a mixed strategy is a probability distribution over pure strategies. For a strategic-form game with finitely many pure strategies for each player we define the mixed extension of the game, which is a game in strategic form in which the set of strategies of each player is his set of mixed strategies, and his payoff function is the multilinear extension of his payoff function in the original game. The main result of the chapter is the Nash Theorem, which is one of the milestones of game theory. It states that the mixed extension always has a Nash equilibrium; that is, a Nash equilibrium in mixed strategies exists in every strategic-form game in which all players have finitely many pure strategies. We prove the theorem and provide ways to compute equilibria in special classes of games, although the problem of computing Nash equilibrium in general games is computationally hard. We generalize the Nash Theorem to mixed extensions in which the set of strategies of each player is not the whole set of mixed strategies, but rather a polytope subset of this set. We investigate the relation between utility theory discussed in Chapter 2 and mixed strategies, and define the maxmin value and the minmax value of a player (in mixed strategies), which measure respectively the amount that the player can guarantee to himself, and the lowest possible payoff that the other players can force on the player. The concept of evolutionary stable strategy, which is the Nash equilibrium adapted to Darwin’s Theory of Evolution, is presented in Section 5.8.
There are many examples of interactive situations (games) in which it is to a decision maker’s advantage to be “unpredictable”:
• If a baseball pitcher throws a waist-high fastball on every pitch, the other team’s batters will have an easy time hitting the ball.
• If a tennis player always serves the ball to the same side of the court, his opponent will have an advantage in returning the serve.
• If a candidate for political office predictably issues announcements on particular dates, his opponents can adjust their campaign messages ahead of time to pre-empt him and gain valuable points at the polls.
• If a traffic police car is placed at the same junction at the same time every day, its effectiveness is reduced.

It is easy to add many more such examples, in a wide range of situations. How can we integrate this very natural consideration into our mathematical model?

Example 5.1 Consider the two-player zero-sum game depicted in Figure 5.1.
                              Player II
                              L      R      min_{sII} u(sI, sII)
Player I                 T    4      1      1
                         B    2      3      2
max_{sI} u(sI, sII)           4      3      2, 3

Figure 5.1 A two-player zero-sum game; the security values of the players are circled
Player I’s security level is 2; if he plays B he guarantees himself a payoff of at least 2. Player II’s security level is 3; if he plays R he guarantees himself a payoff of at most 3. This is written as
\[ \underline{v} = \max_{s_I \in \{T,B\}} \min_{s_{II} \in \{L,R\}} u(s_I, s_{II}) = 2, \tag{5.1} \]
\[ \overline{v} = \min_{s_{II} \in \{L,R\}} \max_{s_I \in \{T,B\}} u(s_I, s_{II}) = 3. \tag{5.2} \]
Since
\[ \overline{v} = 3 > 2 = \underline{v}, \tag{5.3} \]
the game has no value. Can one of the players, say Player I, guarantee a “better outcome” by playing “unpredictably”? Suppose that Player I tosses a coin with parameter $\tfrac14$, that is, a coin that comes up heads with probability $\tfrac14$ and tails with probability $\tfrac34$. Suppose furthermore that Player I plays T if the result of the coin toss is heads and B if the result of the coin toss is tails. Such a strategy is called a mixed strategy. What would that lead to? First of all, the payoffs would no longer be definite, but instead would be probabilistic payoffs. If Player II plays L the result is a lottery $[\tfrac14(4), \tfrac34(2)]$; that is, with probability $\tfrac14$ Player II pays 4, and with probability $\tfrac34$ pays 2. If these payoffs are the utilities of a player whose preference relation satisfies the von Neumann–Morgenstern axioms (see Chapter 2), then Player I’s utility from this lottery is $\tfrac14 \times 4 + \tfrac34 \times 2 = 2\tfrac12$. If, however, Player II plays R the result is the lottery $[\tfrac14(1), \tfrac34(3)]$. In this case, if the payoffs are utilities, Player I’s utility from this lottery is $\tfrac14 \times 1 + \tfrac34 \times 3 = 2\tfrac12$. ◭
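The arithmetic of Example 5.1 can be checked with a short Python sketch. It is only an illustration: the names `PAYOFF`, `expected_payoff`, and `guarantee` are hypothetical and do not come from the text, and exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction

# Payoff matrix of Figure 5.1 (payments from Player II to Player I).
PAYOFF = {("T", "L"): 4, ("T", "R"): 1,
          ("B", "L"): 2, ("B", "R"): 3}

def expected_payoff(mix, column):
    """Expected payoff to Player I when he plays `mix` and Player II plays `column`."""
    return sum(prob * PAYOFF[(row, column)] for row, prob in mix.items())

def guarantee(mix):
    """Worst-case expected payoff of `mix` over Player II's pure strategies."""
    return min(expected_payoff(mix, column) for column in ("L", "R"))

coin = {"T": Fraction(1, 4), "B": Fraction(3, 4)}   # the coin of Example 5.1
pure_B = {"T": Fraction(0), "B": Fraction(1)}       # always play B

print(expected_payoff(coin, "L"), expected_payoff(coin, "R"))  # 5/2 5/2
print(guarantee(pure_B), guarantee(coin))                      # 2 5/2
```

The second line of output compares what Player I secures by always playing B with what the coin toss secures in expectation.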
5.1 The mixed extension of a strategic-form game
In the rest of this section, we will assume that the utilities of the players satisfy the von Neumann–Morgenstern axioms; hence their utility functions are linear (in probabilities).
In other words, the payoff (= utility) to a player from a lottery is the expected payoff of that lottery. With this definition of what a payoff is, Player I can guarantee that no matter what happens his expected payoff will be at least $2\tfrac12$, in contrast to a security level of 2 if he does not base his strategy on the coin toss.

Definition 5.2 Let $G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ be a strategic-form game in which the set of strategies of each player is finite. A mixed strategy of player $i$ is a probability distribution over his set of strategies $S_i$. Denote by
\[ \Sigma_i := \Bigl\{ \sigma_i : S_i \to [0, 1] \;:\; \sum_{s_i \in S_i} \sigma_i(s_i) = 1 \Bigr\} \tag{5.4} \]
the set of mixed strategies of player $i$.
A mixed strategy of player $i$ is, therefore, a probability distribution over $S_i$: $\sigma_i = (\sigma_i(s_i))_{s_i \in S_i}$. The number $\sigma_i(s_i)$ is the probability of playing the strategy $s_i$. To distinguish between the mixed strategies $\Sigma_i$ and the strategies $S_i$, the latter are called pure strategies. Because all the results proved in previous chapters involved only pure strategies, the claims in them should be qualified accordingly. For example, Kuhn’s Theorem (Theorem 4.49 on page 118) should be read as saying: In every finite game with perfect information, there is at least one equilibrium point in pure strategies. We usually denote a mixed strategy using the notations for lotteries (see Chapter 2). For example, if Player I’s set of pure strategies is $S_I = \{A, B, C\}$, we denote the mixed strategy $\sigma_I$ under which he chooses each pure strategy with probability $\tfrac13$ by $\sigma_I = [\tfrac13(A), \tfrac13(B), \tfrac13(C)]$. If $S_I = \{H, T\}$, Player I’s set of mixed strategies is
\[ \Sigma_I = \{ [p_1(H), p_2(T)] : p_1 \ge 0,\ p_2 \ge 0,\ p_1 + p_2 = 1 \}. \tag{5.5} \]
In this case, the set $\Sigma_I$ is equivalent to the line segment in $\mathbb{R}^2$ connecting (1, 0) to (0, 1). We can identify $\Sigma_I$ with the interval [0, 1] by identifying every real number $x \in [0, 1]$ with the probability distribution over $\{H, T\}$ that satisfies $p(H) = x$ and $p(T) = 1 - x$. If $S_{II} = \{L, M, R\}$, Player II’s set of mixed strategies is
\[ \Sigma_{II} = \{ [p_1(L), p_2(M), p_3(R)] : p_1 \ge 0,\ p_2 \ge 0,\ p_3 \ge 0,\ p_1 + p_2 + p_3 = 1 \}. \tag{5.6} \]
In this case, the set $\Sigma_{II}$ is equivalent to the triangle in $\mathbb{R}^3$ whose vertices are (1, 0, 0), (0, 1, 0), and (0, 0, 1). For any finite set $A$, denote by $\Delta(A)$ the set of all probability distributions over $A$. That is,
\[ \Delta(A) := \Bigl\{ p : A \to [0, 1] \;:\; \sum_{a \in A} p(a) = 1 \Bigr\}. \tag{5.7} \]
The set $\Delta(A)$ is termed a simplex in $\mathbb{R}^{|A|}$. The dimension of the simplex $\Delta(A)$ is $|A| - 1$ (this follows from the constraint that $\sum_{a \in A} p(a) = 1$). We denote the number of pure strategies of player $i$ by $m_i$, and we assume that his pure strategies have
a particular ordering, with the notation $S_i = \{s_i^1, s_i^2, \ldots, s_i^{m_i}\}$. It follows that the set of mixed strategies $\Sigma_i = \Delta(S_i)$ is a subset of $\mathbb{R}^{m_i}$ of dimension $m_i - 1$. We identify a pure strategy $s_i$ with the mixed strategy $\sigma_i = [1(s_i)]$, in which the pure strategy $s_i$ is chosen with probability 1. This implies that every pure strategy can also be considered a mixed strategy. We now define the mixed extension of a game.

Definition 5.3 Let $G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ be a strategic-form game in which for every player $i \in N$, the set of pure strategies $S_i$ is nonempty and finite. Denote by $S := S_1 \times S_2 \times \cdots \times S_n$ the set of pure strategy vectors. The mixed extension of $G$ is the game
\[ \Gamma = (N, (\Sigma_i)_{i \in N}, (U_i)_{i \in N}), \tag{5.8} \]
in which, for each $i \in N$, player $i$’s set of strategies is $\Sigma_i = \Delta(S_i)$, and his payoff function is the function $U_i : \Sigma \to \mathbb{R}$, which associates each strategy vector $\sigma = (\sigma_1, \ldots, \sigma_n) \in \Sigma = \Sigma_1 \times \cdots \times \Sigma_n$ with the payoff
\[ U_i(\sigma) = \mathbf{E}_\sigma[u_i(\sigma)] = \sum_{(s_1, \ldots, s_n) \in S} u_i(s_1, \ldots, s_n)\, \sigma_1(s_1) \sigma_2(s_2) \cdots \sigma_n(s_n). \tag{5.9} \]
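Equation (5.9) translates directly into a sum over all pure strategy vectors. The following Python sketch is one possible rendering, not a definitive implementation: the function name `mixed_payoff` and its argument layout are assumptions made for illustration, and the mixed strategy chosen for Player II in the demonstration is arbitrary.

```python
from fractions import Fraction
from itertools import product

def mixed_payoff(u_i, strategy_sets, sigma):
    """Expected payoff U_i(sigma) of Equation (5.9).

    u_i           -- function mapping a pure strategy vector (a tuple) to a payoff
    strategy_sets -- list of the players' pure strategy sets S_1, ..., S_n
    sigma         -- list of dicts; sigma[k][s] is the probability that
                     player k plays the pure strategy s
    """
    total = 0
    for s in product(*strategy_sets):          # every pure strategy vector in S
        prob = 1
        for k, s_k in enumerate(s):            # independence: multiply the marginals
            prob *= sigma[k][s_k]
        total += u_i(s) * prob
    return total

# Illustration on the zero-sum game of Figure 5.1 (Player I's payoffs).
table = {("T", "L"): 4, ("T", "R"): 1, ("B", "L"): 2, ("B", "R"): 3}
sets = [("T", "B"), ("L", "R")]
sigma = [{"T": Fraction(1, 4), "B": Fraction(3, 4)},
         {"L": Fraction(1, 2), "R": Fraction(1, 2)}]
print(mixed_payoff(table.get, sets, sigma))    # 5/2
```

The inner loop multiplies the players' individual probabilities, which is exactly where the independence of the players' lotteries enters the formula.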
Remark 5.4 Mixed strategies were defined above only for the case in which the sets of pure strategies are finite. It follows that the mixed extension of a game is only defined when the set of pure strategies of each player is finite. However, the concept of mixed strategy, and hence the mixed extension of a game, can be defined when the set of pure strategies of a player is a countable set (see Example 5.12 and Exercise 5.50). In that case the set $\Sigma_i = \Delta(S_i)$ is an infinite-dimensional set. It is possible to extend the definition of mixed strategy further to the case in which the set of strategies is any set in a measurable space, but that requires making use of concepts from measure theory that go beyond the background in mathematics assumed for this book.

Note that the fact that the mixed strategies of the players are statistically independent of each other plays a role in Equation (5.9), because the probability of drawing a particular vector of pure strategies $(s_1, s_2, \ldots, s_n)$ is the product $\sigma_1(s_1) \sigma_2(s_2) \cdots \sigma_n(s_n)$. In other words, each player $i$ conducts the lottery $\sigma_i$ that chooses $s_i$ independently of the lotteries conducted by the other players.

The mixed extension $\Gamma$ of a strategic-form game $G$ is itself a strategic-form game, in which the set of strategies of each player is of the cardinality of the continuum. It follows that all the concepts we defined in Chapter 4, such as dominant strategy, security level, and equilibrium, are also defined for $\Gamma$, and all the results we proved in Chapter 4 apply to mixed extensions of games.

Definition 5.5 Let $G$ be a game in strategic form, and $\Gamma$ be its mixed extension. Every equilibrium of $\Gamma$ is called an equilibrium in mixed strategies of $G$. If $G$ is a two-player zero-sum game, and if $\Gamma$ has value $v$, then $v$ is called the value of $G$ in mixed strategies.
Example 5.1 (Continued) Consider the two-player zero-sum game in Figure 5.2.

                 Player II
                 L      R
Player I    T    4      1
            B    2      3

Figure 5.2 The game in strategic form
When Player I’s strategy set contains two actions, T and B, we identify the mixed strategy [x(T), (1 − x)(B)] with the probability x of selecting the pure strategy T. Similarly, when Player II’s strategy set contains two actions, L and R, we identify the mixed strategy [y(L), (1 − y)(R)] with the probability y of selecting the pure strategy L. For each pair of mixed strategies x, y ∈ [0, 1] (with the identifications x ≈ [x(T), (1 − x)(B)] and y ≈ [y(L), (1 − y)(R)]) the payoff is
\begin{align}
U(x, y) &= 4xy + 1x(1 - y) + 2(1 - x)y + 3(1 - x)(1 - y) \tag{5.10} \\
        &= 3 - 2x - y + 4xy. \tag{5.11}
\end{align}
This mixed extension is identical to the game over the unit square presented in Section 4.14.1. As we showed there, the game has the value $2\tfrac12$, and its optimal strategies are $x = \tfrac14$ and $y = \tfrac12$. It follows that the value in mixed strategies of the game in Figure 5.2 is $2\tfrac12$, and the optimal strategies of the players are $x^* = [\tfrac14(T), \tfrac34(B)]$ and $y^* = [\tfrac12(L), \tfrac12(R)]$. We conclude that this game has no value in pure strategies, but it does have a value in mixed strategies. ◭
The payoff function defined in Equation (5.10) is a linear function over x for each fixed y and, similarly, a linear function over y for each fixed x. Such a function is called a bilinear function. The analysis we conducted in Example 5.1 can be generalized to all two-player games where each player has two pure strategies. The extension to mixed strategies of such a game is a game on the unit square with bilinear payoff functions. In the converse direction, every zero-sum two-player game over the unit square with bilinear payoff functions is the extension to mixed strategies of a two-player zero-sum game in which each player has two pure strategies (Exercise 5.6). The next theorem states that this property can be generalized to any number of players and any number of actions, as long as we properly generalize the concept of bilinearity to multilinearity.

Theorem 5.6 Let $G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ be a game in strategic form in which the set of strategies $S_i$ of every player is finite, and let $\Gamma = (N, (\Sigma_i)_{i \in N}, (U_i)_{i \in N})$ be its mixed extension. Then for each player $i \in N$, the function $U_i$ is a multilinear function in the $n$ variables $(\sigma_i)_{i \in N}$, i.e., for every player $i$, for every $\sigma_i, \sigma_i' \in \Sigma_i$, and for every $\lambda \in [0, 1]$,
\[ U_i(\lambda \sigma_i + (1 - \lambda)\sigma_i', \sigma_{-i}) = \lambda U_i(\sigma_i, \sigma_{-i}) + (1 - \lambda) U_i(\sigma_i', \sigma_{-i}), \quad \forall \sigma_{-i} \in \Sigma_{-i}. \]

Proof: Recall that
\[ U_i(\sigma) = \sum_{(s_1, \ldots, s_n) \in S} u_i(s_1, \ldots, s_n)\, \sigma_1(s_1) \sigma_2(s_2) \cdots \sigma_n(s_n). \tag{5.12} \]
The function $U_i$ is a function of $\sum_{i=1}^{n} m_i$ variables:
\[ \sigma_1(s_1^1), \sigma_1(s_1^2), \ldots, \sigma_1(s_1^{m_1}), \sigma_2(s_2^1), \ldots, \sigma_2(s_2^{m_2}), \ldots, \sigma_n(s_n^1), \ldots, \sigma_n(s_n^{m_n}). \tag{5.13} \]
For each $i \in N$, for all $j$, $1 \le j \le m_i$, and for each $s = (s_1, \ldots, s_n) \in S$, the function
\[ \sigma_i(s_i^j) \mapsto u_i(s_1, \ldots, s_n)\, \sigma_1(s_1) \sigma_2(s_2) \cdots \sigma_n(s_n) \tag{5.14} \]
is a constant function if $s_i \ne s_i^j$ and a linear function of $\sigma_i(s_i^j)$ with slope
\[ u_i(s_1, \ldots, s_n)\, \sigma_1(s_1) \sigma_2(s_2) \cdots \sigma_{i-1}(s_{i-1}) \sigma_{i+1}(s_{i+1}) \cdots \sigma_n(s_n) \tag{5.15} \]
if $s_i = s_i^j$. Thus, the function $U_i$, as the sum of linear functions in $\sigma_i(s_i^j)$, is also linear in $\sigma_i(s_i^j)$. It follows that for every $i \in N$, the function $U_i(\cdot, \sigma_{-i})$ is linear in each of the coordinates $\sigma_i(s_i^j)$ of $\sigma_i$, for all $\sigma_{-i} \in \Sigma_{-i}$:
\[ U_i(\lambda \sigma_i + (1 - \lambda)\sigma_i', \sigma_{-i}) = \lambda U_i(\sigma_i, \sigma_{-i}) + (1 - \lambda) U_i(\sigma_i', \sigma_{-i}), \tag{5.16} \]
for every $\lambda \in [0, 1]$, and every $\sigma_i, \sigma_i' \in \Sigma_i$.
Since a multilinear function over $\Sigma$ is a continuous function (see Exercise 5.4), we have the following corollary of Theorem 5.6.

Corollary 5.7 The payoff function $U_i$ of player $i$ is a continuous function in the extension to mixed strategies of every finite strategic-form game $G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$.

We can also derive a second corollary from Theorem 5.6, which can be used to determine whether a particular mixed strategy vector is an equilibrium.

Corollary 5.8 Let $G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ be a strategic-form game, and let $\Gamma$ be its mixed extension. A mixed strategy vector $\sigma^*$ is an equilibrium in mixed strategies of $\Gamma$ if and only if for every player $i \in N$ and every pure strategy $s_i \in S_i$
\[ U_i(\sigma^*) \ge U_i(s_i, \sigma^*_{-i}). \tag{5.17} \]

Proof: If $\sigma^*$ is an equilibrium in mixed strategies of $\Gamma$, then $U_i(\sigma^*) \ge U_i(\sigma_i, \sigma^*_{-i})$ for every player $i \in N$ and every mixed strategy $\sigma_i \in \Sigma_i$. Since every pure strategy is in particular a mixed strategy, $U_i(\sigma^*) \ge U_i(s_i, \sigma^*_{-i})$ for every player $i \in N$ and every pure strategy $s_i \in S_i$, and Equation (5.17) holds. To show the converse implication, suppose that the mixed strategy vector $\sigma^*$ satisfies Equation (5.17) for every player $i \in N$ and every pure strategy $s_i \in S_i$. Then for each mixed strategy $\sigma_i$ of player $i$,
\begin{align}
U_i(\sigma_i, \sigma^*_{-i}) &= \sum_{s_i \in S_i} \sigma_i(s_i)\, U_i(s_i, \sigma^*_{-i}) \tag{5.18} \\
&\le \sum_{s_i \in S_i} \sigma_i(s_i)\, U_i(\sigma^*) \tag{5.19} \\
&= U_i(\sigma^*) \sum_{s_i \in S_i} \sigma_i(s_i) = U_i(\sigma^*), \tag{5.20}
\end{align}
where Equation (5.18) follows from the fact that $U_i$ is a multilinear function, and Equation (5.19) follows from Equation (5.17). In particular, $\sigma^*$ is an equilibrium in mixed strategies of $\Gamma$.
Example 5.9 A mixed extension of a two-player game that is not zero-sum Consider the two-player non-zero-sum game given by the payoff matrix shown in Figure 5.3.

                 Player II
                 L         R
Player I    T    1, −1     0, 2
            B    0, 1      2, 0

Figure 5.3 A two-player, non-zero-sum game without an equilibrium
As we now show, this game has no equilibrium in pure strategies (you can follow the arrows in Figure 5.3 to see why this is so).
• (T, L) is not an equilibrium, since Player II can gain by deviating to R.
• (T, R) is not an equilibrium, since Player I can gain by deviating to B.
• (B, L) is not an equilibrium, since Player I can gain by deviating to T.
• (B, R) is not an equilibrium, since Player II can gain by deviating to L.
Does this game have an equilibrium in mixed strategies? To answer this question, we first write out the mixed extension of the game:
• The set of players is the same as the set of players in the original game: N = {I, II}.
• Player I’s set of strategies is $\Sigma_I = \{[x(T), (1 - x)(B)] : x \in [0, 1]\}$, which can be identified with the interval [0, 1].
• Player II’s set of strategies is $\Sigma_{II} = \{[y(L), (1 - y)(R)] : y \in [0, 1]\}$, which can be identified with the interval [0, 1].
• Player I’s payoff function is
\[ U_I(x, y) = xy + 2(1 - x)(1 - y) = 3xy - 2x - 2y + 2. \tag{5.21} \]
• Player II’s payoff function is
\[ U_{II}(x, y) = -xy + 2x(1 - y) + y(1 - x) = -4xy + 2x + y. \tag{5.22} \]

This is the game on the unit square that we studied in Section 4.14.2 (page 123). We found a unique equilibrium for this game: $x^* = \tfrac14$ and $y^* = \tfrac23$. The unique equilibrium in mixed strategies of the given game is therefore
\[ \bigl( [\tfrac14(T), \tfrac34(B)],\ [\tfrac23(L), \tfrac13(R)] \bigr). \tag{5.23} \]
◭
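Corollary 5.8 gives a finite test for equilibrium: it suffices to compare the equilibrium payoff with the payoff of each pure-strategy deviation. The sketch below applies that test to the game of Figure 5.3. The helper names (`payoffs`, `pure`, `is_equilibrium`) are hypothetical, and exact fractions are used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction

# Payoffs (u_I, u_II) of the game in Figure 5.3.
U = {("T", "L"): (1, -1), ("T", "R"): (0, 2),
     ("B", "L"): (0, 1),  ("B", "R"): (2, 0)}
ROWS, COLS = ("T", "B"), ("L", "R")

def payoffs(sigma_I, sigma_II):
    """Expected payoffs (U_I, U_II) under independent mixed strategies."""
    u1 = sum(sigma_I[r] * sigma_II[c] * U[(r, c)][0] for r in ROWS for c in COLS)
    u2 = sum(sigma_I[r] * sigma_II[c] * U[(r, c)][1] for r in ROWS for c in COLS)
    return u1, u2

def pure(strategy, strategies):
    """The mixed strategy that plays `strategy` with probability 1."""
    return {s: Fraction(int(s == strategy)) for s in strategies}

def is_equilibrium(sigma_I, sigma_II):
    """Test of Corollary 5.8: no pure-strategy deviation is profitable."""
    u1, u2 = payoffs(sigma_I, sigma_II)
    ok_I = all(payoffs(pure(r, ROWS), sigma_II)[0] <= u1 for r in ROWS)
    ok_II = all(payoffs(sigma_I, pure(c, COLS))[1] <= u2 for c in COLS)
    return ok_I and ok_II

x_star = {"T": Fraction(1, 4), "B": Fraction(3, 4)}
y_star = {"L": Fraction(2, 3), "R": Fraction(1, 3)}
print(payoffs(x_star, y_star))                            # expected payoffs at the candidate
print(is_equilibrium(x_star, y_star))                     # True
print(is_equilibrium(pure("T", ROWS), pure("L", COLS)))   # False
```

The last call confirms, for one pure strategy vector, the observation made at the start of the example: at (T, L) Player II gains by deviating to R.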
We have seen in this section two examples of two-player games, one a zero-sum game and the other a non-zero-sum game. Neither of them has an equilibrium in pure strategies, but they both have equilibria in mixed strategies. Do all games have equilibria in mixed
strategies? John Nash, who defined the concept of equilibrium, answered this question affirmatively.

Theorem 5.10 (Nash [1950b, 1951]) Every game in strategic form G, with a finite number of players and in which every player has a finite number of pure strategies, has an equilibrium in mixed strategies.

The proof of Nash’s Theorem will be presented later in this chapter. As a corollary, along with Theorem 4.45 on page 115, we have an analogous theorem for two-player zero-sum games. This special case was proven by von Neumann twenty-two years before Nash proved his theorem on the existence of the equilibrium that bears his name.

Theorem 5.11 (von Neumann’s Minmax Theorem [1928]) Every two-player zero-sum game in which every player has a finite number of pure strategies has a value in mixed strategies.

In other words, in every two-player zero-sum game the minmax value in mixed strategies is equal to the maxmin value in mixed strategies. Nash regarded his result as a generalization of the Minmax Theorem to n players. It is, in fact, a generalization of the Minmax Theorem both to two-player games that may not be zero-sum and to games with any finite number of players. On the other hand, as we noted on page 117, this is a generalization of only one aspect of the notion of the “value” of a game, namely, the aspect of stability. The other aspect of the value of a game – the security level – which characterizes the value in two-player zero-sum games, is not generalized by the Nash equilibrium. Recall that the value in mixed strategies of a two-player zero-sum game, if it exists, is given by
\[ v := \max_{\sigma_I \in \Sigma_I} \min_{\sigma_{II} \in \Sigma_{II}} U(\sigma_I, \sigma_{II}) = \min_{\sigma_{II} \in \Sigma_{II}} \max_{\sigma_I \in \Sigma_I} U(\sigma_I, \sigma_{II}). \tag{5.24} \]
Since the payoff function is multilinear, for every strategy $\sigma_I$ of Player I, the function $\sigma_{II} \mapsto U(\sigma_I, \sigma_{II})$ is linear. A point $x$ in a set $X \subseteq \mathbb{R}^n$ is called an extreme point if it is not a convex combination of two other points in the set (see Definition 23.2 on page 917). Every linear function defined over a compact set attains its maximum and minimum at extreme points. The set of extreme points of a player’s set of mixed strategies is his set of pure strategies (Exercise 5.5). It follows that for every strategy $\sigma_I$ of Player I, it suffices to calculate the internal minimum in the middle term in Equation (5.24) over pure strategies. Similarly, for every strategy $\sigma_{II}$ of Player II, it suffices to compute the internal maximum in the right-hand term in Equation (5.24) over pure strategies. That is, if $v$ is the value in mixed strategies of the game, then
\[ v = \max_{\sigma_I \in \Sigma_I} \min_{s_{II} \in S_{II}} U(\sigma_I, s_{II}) = \min_{\sigma_{II} \in \Sigma_{II}} \max_{s_I \in S_I} U(s_I, \sigma_{II}). \tag{5.25} \]
As the next example shows, when the number of pure strategies is infinite, Nash’s Theorem and the Minmax Theorem need not hold.
Example 5.12 Choosing the largest number Consider the following two-player zero-sum game. Two players simultaneously and independently choose a positive integer. The player who chooses the smaller number pays a dollar to the player who chooses the larger number. If the two players choose the same integer, no exchange of money occurs. We will model this as a game in strategic form, and then show that it has no value in mixed strategies. Both players have the same set of pure strategies:
\[ S_I = S_{II} = \mathbb{N} = \{1, 2, 3, \ldots\}. \tag{5.26} \]
This set is not finite; it is a countably infinite set. The payoff function is
\[ u(s_I, s_{II}) = \begin{cases} 1 & \text{when } s_I > s_{II}, \\ 0 & \text{when } s_I = s_{II}, \\ -1 & \text{when } s_I < s_{II}. \end{cases} \tag{5.27} \]
A mixed strategy in this game is a probability distribution over the set of positive integers:
\[ \Sigma_I = \Sigma_{II} = \Bigl\{ (x_1, x_2, \ldots) \;:\; \sum_{k=1}^{\infty} x_k = 1,\ x_k \ge 0\ \ \forall k \in \mathbb{N} \Bigr\}. \tag{5.28} \]
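We will show that
\[ \sup_{\sigma_I \in \Sigma_I} \inf_{\sigma_{II} \in \Sigma_{II}} U(\sigma_I, \sigma_{II}) = -1 \tag{5.29} \]
and
\[ \inf_{\sigma_{II} \in \Sigma_{II}} \sup_{\sigma_I \in \Sigma_I} U(\sigma_I, \sigma_{II}) = 1. \tag{5.30} \]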
It will then follow from Equations (5.29) and (5.30) that the game has no value in mixed strategies. Let $\sigma_I$ be a strategy of Player I, and let $\varepsilon \in (0, 1)$. Since $\sigma_I$ is a distribution over $\mathbb{N}$, there exists a sufficiently large $k \in \mathbb{N}$ satisfying
\[ \sigma_I(\{1, 2, \ldots, k\}) > 1 - \varepsilon. \tag{5.31} \]
In words, the probability that Player I will choose a number that is less than or equal to $k$ is greater than $1 - \varepsilon$. But then, if Player II chooses the pure strategy $k + 1$ we will have
\[ U(\sigma_I, k + 1) < (1 - \varepsilon) \times (-1) + \varepsilon \times 1 = -1 + 2\varepsilon, \tag{5.32} \]
because with probability greater than $1 - \varepsilon$, Player I loses and the payoff is −1, and with probability less than $\varepsilon$, he wins and the payoff is 1. Since this is true for any $\varepsilon \in (0, 1)$, Equation (5.29) holds. Equation (5.30) is proved in a similar manner. ◭
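The argument behind Equations (5.31) and (5.32) can be traced numerically for one concrete mixed strategy. The sketch below assumes, purely for illustration, that Player I chooses the integer n with probability $2^{-n}$; this distribution and the function names are not taken from the text.

```python
from fractions import Fraction

def sigma_I(n):
    """Illustrative mixed strategy: Player I picks n with probability 2**-n."""
    return Fraction(1, 2 ** n)

def cutoff(eps):
    """Smallest k with sigma_I({1, ..., k}) > 1 - eps, as in Equation (5.31)."""
    k, mass = 0, Fraction(0)
    while mass <= 1 - eps:
        k += 1
        mass += sigma_I(k)
    return k

def payoff_against(m):
    """U(sigma_I, m): Player I wins if he picks more than m, loses if less."""
    p_less = sum(sigma_I(n) for n in range(1, m))   # P(s_I < m)
    p_more = 1 - p_less - sigma_I(m)                # P(s_I > m)
    return p_more - p_less

for eps in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    k = cutoff(eps)
    u = payoff_against(k + 1)
    assert u < -1 + 2 * eps                         # Equation (5.32)
    print(eps, k, float(u))
```

As ε shrinks, Player II's reply k + 1 grows and Player I's expected payoff approaches −1, which is the content of Equation (5.29) for this particular strategy.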
We defined extensive-form games with the use of finite games; in particular, in every extensive-form game every player has a finite number of pure strategies. We therefore have the following corollary of Theorem 5.10. Theorem 5.13 Every extensive-form game has an equilibrium in mixed strategies.
5.2 Computing equilibria in mixed strategies
Before we proceed to the proof of Nash’s Theorem, we will consider the subject of computing equilibria in mixed strategies. When the number of players is large, and
similarly when the number of strategies is large, finding an equilibrium, to say nothing of finding all the equilibria, is a very difficult problem, both theoretically and computationally. We will present only a few examples of computing equilibria in simple games.
5.2.1 The direct approach

The direct approach to finding equilibria is to write down the mixed extension of the strategic-form game and then to compute the equilibria in the mixed extension (assuming we can do that). In the case of a two-player game where each player has two pure strategies, the mixed extension is a game over the unit square with bilinear payoff functions, which can be solved as we did in Section 4.14 (page 121). Although this approach works well in two-player games where each player has two pure strategies, when there are more strategies, or more players, it becomes quite complicated. We present here a few examples of this sort of computation. We start with two-player zero-sum games, where finding equilibria is equivalent to finding the value of the game, and equilibrium strategies are optimal strategies. Using Equation (5.25) we can find the value of the game by computing $\max_{\sigma_I \in \Sigma_I} \min_{s_{II} \in S_{II}} U(\sigma_I, s_{II})$ or $\min_{\sigma_{II} \in \Sigma_{II}} \max_{s_I \in S_I} U(s_I, \sigma_{II})$, which also enables us to find the optimal strategies of the players: every strategy $\sigma_I$ at which the maximum in $\max_{\sigma_I \in \Sigma_I} \min_{s_{II} \in S_{II}} U(\sigma_I, s_{II})$ is attained is an optimal strategy of Player I, and every strategy $\sigma_{II}$ at which the minimum in $\min_{\sigma_{II} \in \Sigma_{II}} \max_{s_I \in S_I} U(s_I, \sigma_{II})$ is attained is an optimal strategy of Player II. The first game we consider is a game over the unit square. The computation presented here differs slightly from the computation in Section 4.14 (page 121).
Example 5.14 A two-player zero-sum game, in which each player has two pure strategies Consider the two-player zero-sum game in Figure 5.4.

                 Player II
                 L      R
Player I    T    5      0
            B    3      4

Figure 5.4 A two-player zero-sum game
We begin by computing $\max_{\sigma_I \in \Sigma_I} \min_{s_{II} \in S_{II}} U(\sigma_I, s_{II})$ in this example. If Player I plays the mixed strategy [x(T), (1 − x)(B)], his payoff, as a function of x, depends on the strategy of Player II:

• If Player II plays L: U(x, L) = 5x + 3(1 − x) = 2x + 3.
• If Player II plays R: U(x, R) = 4(1 − x) = −4x + 4.

The graph in Figure 5.5 shows these two functions. The thick line plots the function representing the minimum payoff that Player I can receive if he plays x: $\min_{s_{II} \in S_{II}} U(x, s_{II})$. This minimum is called the lower envelope of the payoffs of Player I.
Figure 5.5 The payoff function of Player I and the lower envelope of those payoffs, in the game in Figure 5.4 [plot of U(x, L) and U(x, R) against x ∈ [0, 1]; the maximum v of the lower envelope is at $x = \tfrac16$]

The value of the game in mixed strategies equals $\max_{\sigma_I \in \Sigma_I} \min_{s_{II} \in S_{II}} U(\sigma_I, s_{II})$, which is the maximum of the lower envelope. This maximum is attained at the intersection point of the two corresponding lines appearing in Figure 5.5, i.e., at the point at which
\[ 2x + 3 = -4x + 4, \tag{5.33} \]
whose solution is $x = \tfrac16$. It follows that Player I’s optimal strategy is $x^* = [\tfrac16(T), \tfrac56(B)]$. The value of the game is the height of the intersection point, $v = 2 \times \tfrac16 + 3 = 3\tfrac13$. We conduct a similar calculation for finding Player II’s optimal strategy, aimed at finding the strategy $\sigma_{II}$ at which the minimum in $\min_{\sigma_{II} \in \Sigma_{II}} \max_{s_I \in S_I} U(s_I, \sigma_{II})$ is attained. For each one of the pure strategies T and B of Player I, we compute the payoff as a function of the mixed strategy y of Player II, and look at the upper envelope of these two lines (see Figure 5.6).
Figure 5.6 The payoff function of Player II and the upper envelope of those payoffs, in the game in Figure 5.4 [plot of U(T, y) = 5y and U(B, y) = 4 − y against y ∈ [0, 1]; the minimum v of the upper envelope is at $y = \tfrac23$]
The minimum of the upper envelope is attained at the point of intersection of these two lines. It is the solution of the equation 5y = 4 − y, which is $y = \tfrac23$. It follows that the optimal strategy of Player II is $y^* = [\tfrac23(L), \tfrac13(R)]$. The value of the game is the height of the intersection point,
\[ U(B, y^*) = 4 - \tfrac23 = 3\tfrac13. \tag{5.34} \]
This procedure can be used for finding the optimal strategies of every two-player zero-sum game in which the players each have two pure strategies. Note that the value v, as computed in Figure 5.6 (the minmax value), is identical to the value v computed in Figure 5.5 (the maxmin value): both are equal to $3\tfrac13$. This equality follows from Theorem 5.11, which states that the game has a value in mixed strategies. ◭
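The two intersection computations of Example 5.14 have a closed form for any 2 × 2 zero-sum game whose lines cross inside the unit interval. The sketch below is a minimal illustration under that assumption (no saddle point in pure strategies); the function name `solve_2x2_zero_sum` is hypothetical.

```python
from fractions import Fraction

def solve_2x2_zero_sum(a, b, c, d):
    """Solve the zero-sum game [[a, b], [c, d]] (rows T, B; columns L, R).

    Assumes the game has no saddle point in pure strategies, so that both
    optimal strategies are completely mixed and the two lines intersect
    inside (0, 1). Returns (x, y, v), where x = P(T), y = P(L), v = value.
    """
    denom = Fraction(a - b - c + d)
    x = (d - c) / denom        # solves U(x, L) = U(x, R)
    y = (d - b) / denom        # solves U(T, y) = U(B, y)
    v = a * x + c * (1 - x)    # height of the intersection point, U(x, L)
    return x, y, v

x, y, v = solve_2x2_zero_sum(5, 0, 3, 4)   # the game in Figure 5.4
print(x, y, v)                              # 1/6 2/3 10/3
```

Applied to the game of Figure 5.2, the same function returns x = 1/4, y = 1/2, and the value 5/2 found in Example 5.1.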
The graphical procedure presented in Example 5.14 is very convenient. It can be extended to games in which one of the players has two pure strategies and the other player has any finite number of strategies. Suppose that Player I has two pure strategies. We can plot (as a straight line) the payoffs for each pure strategy of Player II as a function of x, the mixed strategy chosen by Player I. We can find the minimum of these lines (the lower envelope), and then find the maximum of the lower envelope. This maximum is the value of the game in mixed strategies.
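The lower-envelope procedure just described can be sketched in Python for any number of columns. The lower envelope is a minimum of straight lines, hence concave and piecewise linear, so its maximum lies at x = 0, at x = 1, or at a crossing of two lines; the code simply evaluates the envelope at all such candidates. The function name and the candidate-enumeration approach are illustrative choices, not the book's method, and the payoffs are assumed to be integers or fractions.

```python
from fractions import Fraction
from itertools import combinations

def maximize_lower_envelope(top, bottom):
    """Maxmin for a zero-sum game in which Player I has two pure strategies.

    top[j] and bottom[j] are Player I's payoffs in rows T and B against
    Player II's j-th pure strategy. Line j is
        U(x, j) = top[j] * x + bottom[j] * (1 - x),  x = P(T).
    Returns (x, value) at a maximum of the lower envelope.
    """
    def line(j, x):
        return top[j] * x + bottom[j] * (1 - x)

    candidates = {Fraction(0), Fraction(1)}
    for j, k in combinations(range(len(top)), 2):
        slope_gap = (top[j] - bottom[j]) - (top[k] - bottom[k])
        if slope_gap != 0:                       # parallel lines never cross
            x = Fraction(bottom[k] - bottom[j], slope_gap)
            if 0 <= x <= 1:
                candidates.add(x)

    def envelope(x):
        return min(line(j, x) for j in range(len(top)))

    best_x = max(candidates, key=envelope)
    return best_x, envelope(best_x)

# The game of Figure 5.4: rows T = (5, 0) and B = (3, 4).
print(maximize_lower_envelope([5, 0], [3, 4]))
# A three-column game: rows T = (2, 5, -1) and B = (1, -2, 5), as in Figure 5.7.
print(maximize_lower_envelope([2, 5, -1], [1, -2, 5]))
```

This yields only Player I's optimal strategy and the value; finding an optimal strategy for Player II requires a separate step.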
Example 5.15 Consider the two-player zero-sum game in Figure 5.7.

                   Player II
                 L      M      R
Player I    T    2      5     −1
            B    1     −2      5

Figure 5.7 The two-player zero-sum game in Example 5.15
If Player I plays the mixed strategy $[x(T), (1 - x)(B)]$, his payoff, as a function of x, depends on the strategy chosen by Player II:

• If Player II plays L: $U(x, L) = 2x + (1 - x) = 1 + x$.
• If Player II plays M: $U(x, M) = 5x - 2(1 - x) = 7x - 2$.
• If Player II plays R: $U(x, R) = -x + 5(1 - x) = -6x + 5$.

Figure 5.8 shows these three functions. As before, the thick line represents the lower envelope, the function $\min_{s_{II} \in \{L, M, R\}} U(x, s_{II})$. The maximum of the lower envelope is attained at the point x in the intersection of the lines $U(x, L)$ and $U(x, R)$, and it is therefore the solution of the equation $1 + x = -6x + 5$, which is $x = \frac{4}{7}$. It follows that the optimal strategy of Player I is $x^* = [\frac{4}{7}(T), \frac{3}{7}(B)]$. The maximum of the lower envelope is $U(\frac{4}{7}, L) = U(\frac{4}{7}, R) = 1\frac{4}{7}$; hence the value of the game in mixed strategies is $1\frac{4}{7}$.

How can we find optimal strategies for Player II? For each mixed strategy $\sigma_{II}$ of Player II, the payoff $U(x, \sigma_{II})$, as a function of x, is a linear function. In fact, it is a weighted average of the functions $U(x, L)$, $U(x, M)$, and $U(x, R)$. If $\sigma_{II}^*$ is an optimal strategy of Player II, then it guarantees that the payoff will be at most the value of the game, regardless of the mixed strategy x chosen by Player I. In other words, we must have

$$U(x, \sigma_{II}^*) \le 1\frac{4}{7}, \quad \forall x \in [0, 1]. \tag{5.35}$$
Figure 5.8 The graphs of the payoff functions of Player I [plot of the lines U(x, L), U(x, M), and U(x, R) against x ∈ [0, 1]; the lower envelope attains its maximum, 1 4/7, at x*]
Consider the graph in Figure 5.8. Since $U(\frac{4}{7}, \sigma_{II}^*)$ is at most $1\frac{4}{7}$, but $U(\frac{4}{7}, L) = U(\frac{4}{7}, R) = 1\frac{4}{7}$ and $U(\frac{4}{7}, M) > 1\frac{4}{7}$, the only mixed strategies for which $U(\frac{4}{7}, \sigma_{II}) \le 1\frac{4}{7}$ are mixed strategies in which the probability of choosing the pure strategy M is 0, and in those mixed strategies $U(\frac{4}{7}, \sigma_{II}) = 1\frac{4}{7}$. Our task, therefore, is to find the appropriate weights for the pure strategies L and R that guarantee that the weighted average of $U(x, L)$ and $U(x, R)$ is the constant function $1\frac{4}{7}$. Since every weighted average of these functions equals $1\frac{4}{7}$ at the point $x = \frac{4}{7}$, it suffices to find weights that guarantee that the weighted average will be $1\frac{4}{7}$ at one additional point x, for example, at x = 0 (because a linear function that attains the same value at two distinct points is a constant function). This means we need to consider the equation

$$1\frac{4}{7} = qU(0, L) + (1 - q)U(0, R) = q + 5(1 - q) = 5 - 4q. \tag{5.36}$$

The solution to this equation is $q = \frac{6}{7}$, and therefore the unique optimal strategy of Player II is $\sigma_{II}^* = [\frac{6}{7}(L), \frac{1}{7}(R)]$. ◭
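The computation in Example 5.15 can be reproduced numerically. The following is a minimal sketch, not part of the original text: the function name `maximize_lower_envelope` and the variable names are illustrative assumptions, and exact arithmetic is done with Python's `fractions` module. It finds the maximum of the lower envelope by checking the endpoints of [0, 1] and every crossing point of two lines, and then solves Equation (5.36) for the weight q on L.

```python
from fractions import Fraction as F
from itertools import combinations

def maximize_lower_envelope(top, bottom):
    """Return (x*, v) maximizing min_s U(x, s), where U(x, s) = top[s]*x + bottom[s]*(1 - x)."""
    # Each pure strategy s of Player II gives a line in x:
    # slope top[s] - bottom[s], intercept bottom[s].
    lines = [(F(t - b), F(b)) for t, b in zip(top, bottom)]

    def envelope(x):
        return min(slope * x + intercept for slope, intercept in lines)

    # A concave piecewise-linear function on [0, 1] is maximized at an endpoint
    # or at a point where two of the lines cross.
    candidates = {F(0), F(1)}
    for (a1, c1), (a2, c2) in combinations(lines, 2):
        if a1 != a2:
            x = (c2 - c1) / (a1 - a2)
            if 0 <= x <= 1:
                candidates.add(x)
    x_star = max(sorted(candidates), key=envelope)
    return x_star, envelope(x_star)

top = [2, 5, -1]     # row T of Figure 5.7, against L, M, R
bottom = [1, -2, 5]  # row B of Figure 5.7
x_star, value = maximize_lower_envelope(top, bottom)

# Weight q on L (and 1 - q on R) solving Equation (5.36):
# q * U(0, L) + (1 - q) * U(0, R) = value, where U(0, s) = bottom[s].
q = (bottom[2] - value) / (bottom[2] - bottom[0])

print(x_star, value, q)  # 4/7 11/7 6/7
```

The last step hard-codes L and R as the two lines through the maximum, as in the example; for another game one would first identify which lines pass through the maximum of the envelope.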
The procedure used in the last example for finding an optimal strategy for Player II is a general one: after finding the value of the game and the optimal strategy of Player I, we need only look for pure strategies of Player II for which the intersection of the lines corresponding to their payoffs comprises the maximum of the lower envelope. In the above example, there were only two such pure strategies. In other cases, there may be more than two pure strategies comprising the maximum of the lower envelope. In such cases, we need only choose two such strategies: one for which the corresponding line is nonincreasing, and one for which the corresponding line is nondecreasing (see, for example, Figure 5.9(F)). After we have identified two such strategies, it remains to solve one linear equation and find a weighted average of the lines that yields a horizontal line.

Figure 5.9 Possible graphs of payoffs as a function of x [six panels, Cases A to F]

Remark 5.16 The above discussion shows that in every two-player zero-sum game in which Player I has two pure strategies and Player II has $m_{II}$ pure strategies, Player II has an optimal mixed strategy that chooses, with positive probability, at most two pure strategies. This is a special case of a more general result: in every two-player zero-sum game where Player I has $m_I$ pure strategies and Player II has $m_{II}$ pure strategies, if $m_I < m_{II}$ then Player II has an optimal mixed strategy that chooses, with positive probability, at most $m_I$ pure strategies.

To compute the value, we found the maximum of the lower envelope. In the example above, there was a unique maximum, which was attained at an interior point of the segment [0, 1]. In general there may not be a unique maximum, and the maximal value may be attained at one of the extreme points, x = 0 or x = 1. Figure 5.9 depicts six distinct possible graphs of the payoff functions $(U(x, s_{II}))_{s_{II} \in S_{II}}$. In cases A and F, the optimal strategy of Player I is attained at an internal point $x^*$. In case B, the maximum of the lower envelope is attained at $x^* = 1$, and in case C the maximum is attained at $x^* = 0$. In case D, the maximum is attained in the interval $[x_0, 1]$; hence every point in this interval is an optimal strategy of Player I. In case E, the maximum is attained in the interval $[x_0, x_1]$; hence every point in this interval is an optimal strategy of Player I.

As for Player II, his unique optimal strategy is at an internal point in case A (and therefore is not a pure strategy). His unique optimal strategy is a pure strategy in cases B, C, D, and E. In case F, Player II has a continuum of optimal strategies (see Exercise 5.11).
5.2.2 Computing equilibrium points

When dealing with a game that is not zero sum, the Nash equilibrium solution concept is not equivalent to the maxmin value. The computational procedure above will therefore not lead to Nash equilibrium points in that case, and we need other procedures.
The most straightforward and natural way to develop such a procedure is to build on the definition of the Nash equilibrium in terms of the “best reply.” We have already seen such a procedure in Section 4.14.2 (page 123), when we looked at non-zero-sum games on the unit square. We present another example here, in which there is more than one equilibrium point.
Example 5.17 Battle of the Sexes The Battle of the Sexes game, which we saw in Example 4.21 (page 98), appears in Figure 5.10.

                   Player II
                 F         C
Player I    F    2, 1      0, 0
            C    0, 0      1, 2

Figure 5.10 Battle of the Sexes
Recall that for each mixed strategy $[x(F), (1 - x)(C)]$ of Player I (which we will refer to as x for short), we denoted the collection of best replies of Player II by:

$$br_{II}(x) = \operatorname{argmax}_{y \in [0,1]} u_{II}(x, y) \tag{5.37}$$
$$= \{y \in [0, 1] : u_{II}(x, y) \ge u_{II}(x, z) \ \ \forall z \in [0, 1]\}. \tag{5.38}$$

Similarly, for each mixed strategy $[y(F), (1 - y)(C)]$ of Player II (which we will refer to as y for short), we denoted the collection of best replies of Player I by:

$$br_I(y) = \operatorname{argmax}_{x \in [0,1]} u_I(x, y) \tag{5.39}$$
$$= \{x \in [0, 1] : u_I(x, y) \ge u_I(z, y) \ \ \forall z \in [0, 1]\}. \tag{5.40}$$

In the Battle of the Sexes, these correspondences¹ are given by

$$br_{II}(x) = \begin{cases} 0 & \text{if } x < \frac{2}{3}, \\ [0, 1] & \text{if } x = \frac{2}{3}, \\ 1 & \text{if } x > \frac{2}{3}, \end{cases} \qquad br_I(y) = \begin{cases} 0 & \text{if } y < \frac{1}{3}, \\ [0, 1] & \text{if } y = \frac{1}{3}, \\ 1 & \text{if } y > \frac{1}{3}. \end{cases}$$
Figure 5.11 depicts the graphs of these two set-valued functions, brI and brII . The graph of brII is the lighter line, and the graph of brI is the darker line. The two graphs are shown on the same set of axes, where the x-axis is the horizontal line, and the y-axis is the vertical line. For each x ∈ [0, 1], brII (x) is a point or a line located above x. For each y ∈ [0, 1], brI (y) is a point or a line located to the right of y.
¹ A set-valued function, or a correspondence, is a multivalued function that associates every point in the domain with a set of values (as opposed to a single value, as is the case with an ordinary function).
Figure 5.11 The graphs of brI (black line) and of brII (grey line) [x on the horizontal axis, with brII switching at x = 2/3; y on the vertical axis, with brI switching at y = 1/3]
A point (x ∗ , y ∗ ) is an equilibrium point if and only if x ∗ ∈ brI (y ∗ ) and y ∗ ∈ brII (x ∗ ). This is equivalent to (x ∗ , y ∗ ) being a point at which the two graphs brI and brII intersect (verify this for yourself). As Figure 5.11 shows, these graphs intersect in three points:
• $(x^*, y^*) = (0, 0)$: corresponding to the pure strategy equilibrium (C, C).
• $(x^*, y^*) = (1, 1)$: corresponding to the pure strategy equilibrium (F, F).
• $(x^*, y^*) = (\frac{2}{3}, \frac{1}{3})$: corresponding to the equilibrium in mixed strategies

$$x^* = \left[\tfrac{2}{3}(F), \tfrac{1}{3}(C)\right], \quad y^* = \left[\tfrac{1}{3}(F), \tfrac{2}{3}(C)\right]. \tag{5.41}$$

Note two interesting points:

• The payoff at the mixed strategy equilibrium is $(\frac{2}{3}, \frac{2}{3})$. For each player, this payoff is worse than the worst payoff he would receive if either of the pure strategy equilibria were chosen instead.
• The payoff $\frac{2}{3}$ is also the security level (maxmin value) of each of the two players (verify this), but the maxmin strategies guaranteeing this level are not equilibrium strategies; the maxmin strategy of Player I is $[\frac{1}{3}(F), \frac{2}{3}(C)]$, and the maxmin strategy of Player II is $[\frac{2}{3}(F), \frac{1}{3}(C)]$. ◭
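The three intersection points found in Example 5.17 can be checked directly from the definition of equilibrium. The sketch below is an illustration only, with assumed names (`payoff`, `is_equilibrium`). It relies on the fact that a player's expected payoff is linear in his own mixing probability, so it is enough to compare against the two pure deviations.

```python
from fractions import Fraction as F

# Payoff matrices of Figure 5.10; rows and columns are ordered (F, C).
U1 = [[2, 0], [0, 1]]  # Player I
U2 = [[1, 0], [0, 2]]  # Player II

def payoff(U, x, y):
    """Expected payoff when Player I plays F with probability x and Player II with probability y."""
    return (x * y * U[0][0] + x * (1 - y) * U[0][1]
            + (1 - x) * y * U[1][0] + (1 - x) * (1 - y) * U[1][1])

def is_equilibrium(x, y):
    """Check x in br_I(y) and y in br_II(x) by comparing against the pure deviations 0 and 1."""
    ok_I = all(payoff(U1, x, y) >= payoff(U1, z, y) for z in (0, 1))
    ok_II = all(payoff(U2, x, y) >= payoff(U2, x, z) for z in (0, 1))
    return ok_I and ok_II

points = [(F(0), F(0)), (F(1), F(1)), (F(2, 3), F(1, 3))]
for x, y in points:
    print((x, y), is_equilibrium(x, y), payoff(U1, x, y), payoff(U2, x, y))
```

For the mixed point the two payoffs printed are both 2/3, in line with the first remark at the end of the example.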
This geometric procedure for computing equilibrium points, as intersection points of the graphs of the best replies of the players, is not applicable if there are more than two players or if each player has more than two pure strategies. But there are cases in which this procedure can be mimicked by finding solutions of algebraic equations corresponding to the intersections of best-response graphs.
5.2.3 The indifference principle

One effective tool for finding equilibria is the indifference principle. The indifference principle says that if a mixed equilibrium calls for a player to use two distinct pure strategies with positive probability, then the expected payoff to that player for using one of those pure strategies equals the expected payoff to him for using the other pure strategy, assuming that the other players are playing according to the equilibrium.
Theorem 5.18 Let $\sigma^*$ be an equilibrium in mixed strategies of a strategic-form game, and let $s_i$ and $\hat{s}_i$ be two pure strategies of player i. If $\sigma_i^*(s_i) > 0$ and $\sigma_i^*(\hat{s}_i) > 0$, then

$$U_i(s_i, \sigma_{-i}^*) = U_i(\hat{s}_i, \sigma_{-i}^*). \tag{5.42}$$

The reason this theorem holds is simple: if the expected payoff to player i when he plays pure strategy $s_i$ is higher than when he plays $\hat{s}_i$, then he can improve his expected payoff by increasing the probability of playing $s_i$ and decreasing the probability of playing $\hat{s}_i$.

Proof: Suppose by contradiction that Equation (5.42) does not hold. Without loss of generality, suppose that

$$U_i(s_i, \sigma_{-i}^*) > U_i(\hat{s}_i, \sigma_{-i}^*). \tag{5.43}$$

Let $\sigma_i'$ be the strategy of player i defined by

$$\sigma_i'(t_i) := \begin{cases} \sigma_i^*(t_i) & \text{if } t_i \notin \{s_i, \hat{s}_i\}, \\ 0 & \text{if } t_i = \hat{s}_i, \\ \sigma_i^*(s_i) + \sigma_i^*(\hat{s}_i) & \text{if } t_i = s_i. \end{cases} \tag{5.44}$$

Then

$$U_i(\sigma_i', \sigma_{-i}^*) = \sum_{t_i \in S_i} \sigma_i'(t_i) U_i(t_i, \sigma_{-i}^*)$$
$$= \sum_{t_i \notin \{s_i, \hat{s}_i\}} \sigma_i^*(t_i) U_i(t_i, \sigma_{-i}^*) + (\sigma_i^*(s_i) + \sigma_i^*(\hat{s}_i)) U_i(s_i, \sigma_{-i}^*) \tag{5.45}$$
$$> \sum_{t_i \notin \{s_i, \hat{s}_i\}} \sigma_i^*(t_i) U_i(t_i, \sigma_{-i}^*) + \sigma_i^*(s_i) U_i(s_i, \sigma_{-i}^*) + \sigma_i^*(\hat{s}_i) U_i(\hat{s}_i, \sigma_{-i}^*) \tag{5.46}$$
$$= \sum_{t_i \in S_i} \sigma_i^*(t_i) U_i(t_i, \sigma_{-i}^*) \tag{5.47}$$
$$= U_i(\sigma^*). \tag{5.48}$$

The equalities in Equation (5.45) and Equation (5.47) follow from the definition of $\sigma_i'$, and Equation (5.46) follows from Equation (5.43) together with $\sigma_i^*(\hat{s}_i) > 0$. But this contradicts the assumption that $\sigma^*$ is an equilibrium, because player i can increase his payoff by deviating to strategy $\sigma_i'$. This contradiction shows that the assumption that Equation (5.42) does not hold was wrong, and the theorem therefore holds.

We will show how the indifference principle can be used to find equilibria, by reconsidering the game in Example 5.9.
Example 5.9 (Continued) The payoff matrix in this game appears in Figure 5.12.

                   Player II
                 L          R
Player I    T    1, −1      0, 2
            B    0, 1       2, 0

Figure 5.12 The payoff matrix in Example 5.9

As we have already seen, the only equilibrium point in this game is

$$\left(\left[\tfrac{1}{4}(T), \tfrac{3}{4}(B)\right], \left[\tfrac{2}{3}(L), \tfrac{1}{3}(R)\right]\right). \tag{5.49}$$
Definition 5.19 A mixed strategy $\sigma_i$ of player i is called a completely mixed strategy if $\sigma_i(s_i) > 0$ for every pure strategy $s_i \in S_i$. An equilibrium $\sigma^* = (\sigma_i^*)_{i \in N}$ is called a completely mixed equilibrium if for every player $i \in N$ the strategy $\sigma_i^*$ is a completely mixed strategy.

In words, a player's completely mixed strategy chooses each pure strategy with positive probability. It follows that at every completely mixed equilibrium, every pure strategy vector is chosen with positive probability.

We will now compute the equilibrium using the indifference principle. The first step is to ascertain, by direct inspection, that the game has no pure strategy equilibria. We can also ascertain that there is no Nash equilibrium of this game in which one of the two players plays a pure strategy. By Nash's Theorem (Theorem 5.10), the game has at least one equilibrium in mixed strategies, and it follows that at every equilibrium of the game both players play completely mixed strategies. For every pair of mixed strategies (x, y), we have that $U_{II}(x, L) = 1 - 2x$, $U_{II}(x, R) = 2x$, $U_I(T, y) = y$, and $U_I(B, y) = 2(1 - y)$. By the indifference principle, at equilibrium Player I is indifferent between playing T and playing B, and Player II is indifferent between L and R. In other words, if the equilibrium is $(x^*, y^*)$, then:
• Player I is indifferent between T and B:

$$U_I(T, y^*) = U_I(B, y^*) \implies y^* = 2(1 - y^*) \implies y^* = \tfrac{2}{3}. \tag{5.50}$$

• Player II is indifferent between L and R:

$$U_{II}(x^*, L) = U_{II}(x^*, R) \implies 1 - 2x^* = 2x^* \implies x^* = \tfrac{1}{4}. \tag{5.51}$$

We have, indeed, found the same equilibrium that we found above, using a different procedure. Interestingly, in computing the mixed strategy equilibrium, each player's strategy is determined by the payoffs of the other player; each player plays in such a way that the other player is indifferent between his two pure strategies (and therefore the other player has no incentive to deviate). This is in marked contrast to the maxmin strategy of a player, which is determined solely by the player's own payoffs. This is yet another expression of the significant difference between the solution concepts of Nash equilibrium and maxmin strategy, in games that are not two-player zero-sum games. ◭
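The two indifference equations (5.50) and (5.51) have a closed-form solution in any 2×2 game. The following sketch is an illustration under assumptions: the function name `completely_mixed_2x2` and the matrix layout (`A[r][c]` for Player I, `B[r][c]` for Player II) are hypothetical choices, not notation from the text. Note how the code mirrors the observation above: y is computed from Player I's payoffs only, and x from Player II's payoffs only.

```python
from fractions import Fraction as F

def completely_mixed_2x2(A, B):
    """Solve the two indifference equations of a 2x2 game.

    A[r][c] and B[r][c] are the payoffs of Players I and II. Returns (x, y), the
    probabilities of the first row and of the first column, or None if the
    equations have no solution with 0 < x < 1 and 0 < y < 1.
    """
    # Player I indifferent between the rows:
    #   A00*y + A01*(1 - y) = A10*y + A11*(1 - y)
    den_y = A[0][0] - A[0][1] - A[1][0] + A[1][1]
    # Player II indifferent between the columns:
    #   B00*x + B10*(1 - x) = B01*x + B11*(1 - x)
    den_x = B[0][0] - B[1][0] - B[0][1] + B[1][1]
    if den_x == 0 or den_y == 0:
        return None
    y = F(A[1][1] - A[0][1], den_y)
    x = F(B[1][1] - B[1][0], den_x)
    return (x, y) if 0 < x < 1 and 0 < y < 1 else None

# The game of Figure 5.12 (rows T, B; columns L, R).
A = [[1, 0], [0, 2]]
B = [[-1, 2], [1, 0]]
print(completely_mixed_2x2(A, B))  # (Fraction(1, 4), Fraction(2, 3))
```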
5.2.4 Dominance and equilibrium

The concept of strict dominance (Definition 4.6 on page 86) is a useful tool for computing equilibrium points. As we saw in Corollary 4.36 (page 109), in strategic-form games a strictly dominated strategy is chosen with probability 0 in each equilibrium. The next result, which is a generalization of that corollary, is useful for finding equilibria in mixed strategies.

Theorem 5.20 Let $G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ be a game in strategic form in which the sets $(S_i)_{i \in N}$ are all finite sets. If a pure strategy $s_i \in S_i$ of player i is strictly dominated by a mixed strategy $\sigma_i \in \Sigma_i$, then in every equilibrium of the game, the pure strategy $s_i$ is chosen by player i with probability 0.

Proof: Let $s_i$ be a pure strategy of player i that is strictly dominated by a mixed strategy $\sigma_i$, and let $\hat{\sigma} = (\hat{\sigma}_i)_{i \in N}$ be a strategy vector in which player i chooses strategy $s_i$ with positive probability: $\hat{\sigma}_i(s_i) > 0$. We will show that $\hat{\sigma}$ is not an equilibrium by showing that $\hat{\sigma}_i$ is not a best reply of player i to $\hat{\sigma}_{-i}$. Define a mixed strategy $\sigma_i' \in \Sigma_i$ as follows:

$$\sigma_i'(t_i) = \begin{cases} \hat{\sigma}_i(s_i) \cdot \sigma_i(s_i) & t_i = s_i, \\ \hat{\sigma}_i(t_i) + \hat{\sigma}_i(s_i) \cdot \sigma_i(t_i) & t_i \ne s_i. \end{cases} \tag{5.52}$$

In words, player i, using strategy $\sigma_i'$, chooses his pure strategy in two stages: first he chooses a pure strategy using the probability distribution $\hat{\sigma}_i$. If this choice leads to a pure strategy that differs from $s_i$, he plays that strategy. But if $s_i$ is chosen, player i chooses another pure strategy using the distribution $\sigma_i$, and plays whichever pure strategy that leads to.

Finally, we show that $\sigma_i'$ yields player i a payoff that is higher than $\hat{\sigma}_i$, when played against $\hat{\sigma}_{-i}$, and hence $\hat{\sigma}$ cannot be an equilibrium. Since $\sigma_i$ strictly dominates $s_i$, it follows that, in particular, … and we have $U_i(\hat{\sigma}_i, \hat{\sigma}_{-i}) =$ …
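$$\mathrm{supp}(\sigma_I^*) := \{s_I \in S_I : \sigma_I^*(s_I) > 0\}, \tag{5.64}$$
$$\mathrm{supp}(\sigma_{II}^*) := \{s_{II} \in S_{II} : \sigma_{II}^*(s_{II}) > 0\}. \tag{5.65}$$

The sets $\mathrm{supp}(\sigma_I^*)$ and $\mathrm{supp}(\sigma_{II}^*)$ are called the support of the mixed strategies $\sigma_I^*$ and $\sigma_{II}^*$ respectively, and they contain all the pure strategies that are chosen with positive probability under $\sigma_I^*$ and $\sigma_{II}^*$, respectively. By the indifference principle (see Theorem 5.18 on page 160), at equilibrium any two pure strategies that are played by a particular player with positive probability yield the same payoff to that player. Choose $s_I^0 \in \mathrm{supp}(\sigma_I^*)$ and $s_{II}^0 \in \mathrm{supp}(\sigma_{II}^*)$. Then $(\sigma_I^*, \sigma_{II}^*)$ satisfies the following constraints:

$$U_I(s_I^0, \sigma_{II}^*) = U_I(s_I, \sigma_{II}^*), \quad \forall s_I \in \mathrm{supp}(\sigma_I^*), \tag{5.66}$$
$$U_{II}(\sigma_I^*, s_{II}^0) = U_{II}(\sigma_I^*, s_{II}), \quad \forall s_{II} \in \mathrm{supp}(\sigma_{II}^*). \tag{5.67}$$

At equilibrium, neither player can profit from unilateral deviation; in particular,

$$U_I(s_I^0, \sigma_{II}^*) \ge U_I(s_I, \sigma_{II}^*), \quad \forall s_I \in S_I \setminus \mathrm{supp}(\sigma_I^*), \tag{5.68}$$
$$U_{II}(\sigma_I^*, s_{II}^0) \ge U_{II}(\sigma_I^*, s_{II}), \quad \forall s_{II} \in S_{II} \setminus \mathrm{supp}(\sigma_{II}^*). \tag{5.69}$$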
Since $U_I$ and $U_{II}$ are multilinear functions, this is a system of equations that are linear in $\sigma_I^*$ and $\sigma_{II}^*$. By taking into account the constraint that $\sigma_I^*$ and $\sigma_{II}^*$ are probability distributions, we conclude that $(\sigma_I^*, \sigma_{II}^*)$ is the solution of a system of linear equations. In addition, every pair of mixed strategies $(\sigma_I^*, \sigma_{II}^*)$ that solves Equations (5.66)–(5.69) is a Nash equilibrium.

This leads to the following direct algorithm for finding equilibria in a two-player game that is not zero sum: For every nonempty subset $Y_I$ of $S_I$ and every nonempty subset $Y_{II}$ of $S_{II}$, determine whether there exists an equilibrium $(\sigma_I^*, \sigma_{II}^*)$ satisfying $Y_I = \mathrm{supp}(\sigma_I^*)$ and $Y_{II} = \mathrm{supp}(\sigma_{II}^*)$. The set of equilibria whose support is $Y_I$ and $Y_{II}$ is the set of solutions of the system of equations comprised of Equations (5.70)–(5.79), in which $s_I^0$ and $s_{II}^0$ are any
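two pure strategies in $Y_I$ and $Y_{II}$, respectively (Exercise 5.54):

$$\sum_{s_{II} \in S_{II}} \sigma_{II}(s_{II}) u_I(s_I^0, s_{II}) = \sum_{s_{II} \in S_{II}} \sigma_{II}(s_{II}) u_I(s_I, s_{II}), \quad \forall s_I \in Y_I, \tag{5.70}$$
$$\sum_{s_I \in S_I} \sigma_I(s_I) u_{II}(s_I, s_{II}^0) = \sum_{s_I \in S_I} \sigma_I(s_I) u_{II}(s_I, s_{II}), \quad \forall s_{II} \in Y_{II}, \tag{5.71}$$
$$\sum_{s_{II} \in S_{II}} \sigma_{II}(s_{II}) u_I(s_I^0, s_{II}) \ge \sum_{s_{II} \in S_{II}} \sigma_{II}(s_{II}) u_I(s_I, s_{II}), \quad \forall s_I \in S_I \setminus Y_I, \tag{5.72}$$
$$\sum_{s_I \in S_I} \sigma_I(s_I) u_{II}(s_I, s_{II}^0) \ge \sum_{s_I \in S_I} \sigma_I(s_I) u_{II}(s_I, s_{II}), \quad \forall s_{II} \in S_{II} \setminus Y_{II}, \tag{5.73}$$
$$\sum_{s_I \in S_I} \sigma_I(s_I) = 1, \tag{5.74}$$
$$\sum_{s_{II} \in S_{II}} \sigma_{II}(s_{II}) = 1, \tag{5.75}$$
$$\sigma_I(s_I) > 0, \quad \forall s_I \in Y_I, \tag{5.76}$$
$$\sigma_{II}(s_{II}) > 0, \quad \forall s_{II} \in Y_{II}, \tag{5.77}$$
$$\sigma_I(s_I) = 0, \quad \forall s_I \in S_I \setminus Y_I, \tag{5.78}$$
$$\sigma_{II}(s_{II}) = 0, \quad \forall s_{II} \in S_{II} \setminus Y_{II}. \tag{5.79}$$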
Determining whether this system of equations has a solution can be accomplished by solving a linear program. Because the number of nonempty subsets of $S_I$ is $2^{m_I} - 1$ and the number of nonempty subsets of $S_{II}$ is $2^{m_{II}} - 1$, the complexity of this algorithm is exponential in $m_I$ and $m_{II}$, and hence this algorithm is computationally inefficient.
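The enumeration over supports can be sketched in a few lines. The code below is a simplified illustration and not the algorithm exactly as stated: instead of solving a linear program for every pair of supports, it assumes a nondegenerate game, considers only supports of equal size, and solves the indifference equations as a square linear system (skipping singular systems). All function names (`solve_linear`, `indifference_mix`, `support_enumeration`) are hypothetical.

```python
from fractions import Fraction as F
from itertools import combinations

def solve_linear(M, b):
    """Gauss-Jordan elimination over Fractions; returns the unique solution or None."""
    n = len(M)
    aug = [[F(v) for v in row] + [F(bi)] for row, bi in zip(M, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None  # singular system: skipped in this sketch
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def indifference_mix(U, own, other):
    """Probabilities on `other` making the owner of U indifferent among `own`.

    U(s, t) is the owner's payoff for own pure strategy s against pure strategy t.
    Unknowns: one probability per element of `other`, plus the common payoff w.
    """
    rows = [[U(s, t) for t in other] + [-1] for s in own]
    rows.append([1] * len(other) + [0])
    sol = solve_linear(rows, [0] * len(own) + [1])
    return None if sol is None else sol[:len(other)]

def support_enumeration(A, B):
    """Equilibria of the bimatrix game (A, B) with equal-size supports (nondegenerate case)."""
    m, n = len(A), len(A[0])
    found = []
    for k in range(1, min(m, n) + 1):
        for Y1 in combinations(range(m), k):
            for Y2 in combinations(range(n), k):
                q = indifference_mix(lambda s, t: A[s][t], Y1, Y2)  # (5.70), (5.75)
                p = indifference_mix(lambda t, s: B[s][t], Y2, Y1)  # (5.71), (5.74)
                if p is None or q is None or min(p + q) <= 0:       # (5.76), (5.77)
                    continue
                sigma1 = [F(0)] * m
                sigma2 = [F(0)] * n
                for s, ps in zip(Y1, p):
                    sigma1[s] = ps
                for t, qt in zip(Y2, q):
                    sigma2[t] = qt
                u1 = [sum(A[s][t] * sigma2[t] for t in range(n)) for s in range(m)]
                u2 = [sum(B[s][t] * sigma1[s] for s in range(m)) for t in range(n)]
                # (5.72), (5.73): no pure strategy outside the support does better.
                if max(u1) == u1[Y1[0]] and max(u2) == u2[Y2[0]]:
                    found.append((sigma1, sigma2))
    return found

battle = support_enumeration([[2, 0], [0, 1]], [[1, 0], [0, 2]])
example_5_9 = support_enumeration([[1, 0], [0, 2]], [[-1, 2], [1, 0]])
print(len(battle), example_5_9)
```

On the Battle of the Sexes this sketch returns the three equilibria of Example 5.17, and on the game of Example 5.9 it returns only the completely mixed equilibrium. In degenerate games, where supports of different sizes or a continuum of equilibria may occur, the full system (5.70)–(5.79) is needed.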
5.3 The proof of Nash's Theorem
This section is devoted to proving Nash's Theorem (Theorem 5.10), which states that every finite game has an equilibrium in mixed strategies. The proof of the theorem makes use of the following result.

Theorem 5.23 (Brouwer's Fixed Point Theorem) Let X be a convex and compact set in a d-dimensional Euclidean space, and let $f : X \to X$ be a continuous function. Then there exists a point $x \in X$ such that $f(x) = x$. Such a point x is called a fixed point of f.

Brouwer's Fixed Point Theorem states that every continuous function from a convex and compact set to itself has a fixed point, that is, a point that the function maps to itself. In one dimension, Brouwer's Theorem takes an especially simple form: a convex and compact set is a closed line segment [a, b]. When $f : [a, b] \to [a, b]$ is a continuous function, one of the following three alternatives must obtain:

1. $f(a) = a$, hence a is a fixed point of f.
2. $f(b) = b$, hence b is a fixed point of f.
3. $f(a) > a$ and $f(b) < b$. Consider the function $g(x) = f(x) - x$, which is continuous, where $g(a) > 0$ and $g(b) < 0$. The Intermediate Value Theorem implies that there exists $x \in [a, b]$ satisfying $g(x) = 0$, that is to say, $f(x) = x$. Every such x is a fixed point of f.

The graphical expression of the proof of Brouwer's Fixed Point Theorem in one dimension is as follows: the graph of every continuous function from the segment [a, b] to itself must intersect the main diagonal in at least one point (see Figure 5.16).

Figure 5.16 Brouwer's Theorem: a fixed point in the one-dimensional case

If the dimension is two or greater, the proof of Brouwer's Fixed Point Theorem is not simple. It can be proved in several different ways, with a variety of mathematical tools. A proof of the theorem using Sperner's Lemma appears in Section 23.1.2 (page 935).

We will now prove Nash's Theorem using Brouwer's Fixed Point Theorem. The proofs of the following two claims are left to the reader (Exercises 5.1 and 5.2).

Theorem 5.24 If player i's set of pure strategies $S_i$ is finite, then his set of mixed strategies $\Sigma_i$ is convex and compact.

Theorem 5.25 If $A \subset \mathbb{R}^n$ and $B \subset \mathbb{R}^m$ are compact sets, then the set $A \times B$ is a compact subset of $\mathbb{R}^{n+m}$. If A and B are convex sets, then $A \times B$ is a convex subset of $\mathbb{R}^{n+m}$.

Theorems 5.24 and 5.25 imply that the set $\Sigma = \Sigma_1 \times \Sigma_2 \times \cdots \times \Sigma_n$ is a convex and compact subset of the Euclidean space $\mathbb{R}^{m_1 + m_2 + \cdots + m_n}$. The proof of Nash's Theorem then proceeds as follows. We will define a function $f : \Sigma \to \Sigma$, and prove that it satisfies the following two properties:
• f is a continuous function.
• Every fixed point of f is an equilibrium of the game.

Since $\Sigma$ is convex and compact, and f is continuous, it follows from Brouwer's Fixed Point Theorem that f has at least one fixed point. The second property above then implies that the game has at least one equilibrium point.

The idea behind the definition of f is as follows. For each strategy vector $\sigma$, we define $f(\sigma) = (f_i(\sigma))_{i \in N}$ to be a vector of strategies, where $f_i(\sigma)$ is a strategy of player i. $f_i(\sigma)$ is defined in such a way that if $\sigma_i$ is not a best reply to $\sigma_{-i}$, then $f_i(\sigma)$ is given by shifting $\sigma_i$ in the direction of a "better reply" to $\sigma_{-i}$. It then follows that $f_i(\sigma) = \sigma_i$ if and only if $\sigma_i$ is a best reply to $\sigma_{-i}$.

To define f, we first define an auxiliary function $g_i^j : \Sigma \to [0, \infty)$ for each player i and each index j, where $1 \le j \le m_i$. That is, for each vector of mixed strategies $\sigma$ we define a nonnegative number $g_i^j(\sigma)$.

The payoff that player i receives under the vector of mixed strategies $\sigma$ is $U_i(\sigma)$. The payoff he receives when he plays the pure strategy $s_i^j$, but all the other players play according to $\sigma$, is $U_i(s_i^j, \sigma_{-i})$. We define the function $g_i^j$ as follows:

$$g_i^j(\sigma) := \max\left\{0, U_i(s_i^j, \sigma_{-i}) - U_i(\sigma)\right\}. \tag{5.80}$$

In words, $g_i^j(\sigma)$ equals 0 if player i cannot profit from deviating from $\sigma_i$ to $s_i^j$. When $g_i^j(\sigma) > 0$, player i gains a higher payoff if he increases the probability of playing the pure strategy $s_i^j$. Because a player has a profitable deviation if and only if he has a profitable deviation to a pure strategy, we have the following result:

Claim 5.26 The strategy vector $\sigma$ is an equilibrium if and only if $g_i^j(\sigma) = 0$, for each player $i \in N$ and for all $j = 1, 2, \ldots, m_i$.

To proceed with the proof, we need the following claim.

Claim 5.27 For every player $i \in N$, and every $j = 1, 2, \ldots, m_i$, the function $g_i^j$ is continuous.

Proof: Let $i \in N$ be a player, and let $j \in \{1, 2, \ldots, m_i\}$. From Corollary 5.7 (page 149) the function $U_i$ is continuous. The function $\sigma_{-i} \mapsto U_i(s_i^j, \sigma_{-i})$, as a function of $\sigma_{-i}$, is therefore also continuous. In particular, the difference $U_i(s_i^j, \sigma_{-i}) - U_i(\sigma)$ is a continuous function. Since 0 is a continuous function, and since the maximum of continuous functions is a continuous function, we have that the function $g_i^j$ is continuous.

We can now define the function f. The function f has to satisfy the property that every one of its fixed points is an equilibrium of the game. It then follows that if $\sigma$ is not an equilibrium, it must be the case that $\sigma \ne f(\sigma)$. How can we guarantee that? The main idea is to consider, for every player i, the indices j such that $g_i^j(\sigma) > 0$; these indices correspond to pure strategies that will increase player i's payoff if he increases the probability that they will be played (and decreases the probability of playing pure strategies that do not satisfy this inequality). This idea leads to the following definition.

Because $f(\sigma)$ is an element in $\Sigma$, i.e., it is a vector of mixed strategies, $f_i^j(\sigma)$ is the probability that player i will play the pure strategy $s_i^j$. Define:

$$f_i^j(\sigma) := \frac{\sigma_i(s_i^j) + g_i^j(\sigma)}{1 + \sum_{k=1}^{m_i} g_i^k(\sigma)}. \tag{5.81}$$

In words, if $s_i^j$ is a better reply than $\sigma_i$ to $\sigma_{-i}$, we increase its probability by $g_i^j(\sigma)$, and then normalize the resulting numbers so that we obtain a probability distribution. We now turn our attention to the proof that f satisfies all its required properties.
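To make definitions (5.80) and (5.81) concrete, the following sketch evaluates the map f once for a two-player game. It is only an illustration: the names `expected` and `nash_map` are assumptions, and the game used is the one from Figure 5.12. The map serves the proof through Brouwer's theorem; repeatedly applying it need not converge to an equilibrium.

```python
from fractions import Fraction as F

def expected(U, x, y):
    """Expected payoff U(x, y) for mixed strategies x (rows) and y (columns)."""
    return sum(x[a] * y[b] * U[a][b] for a in range(len(x)) for b in range(len(y)))

def nash_map(A, B, x, y):
    """One application of f from (5.81) in a two-player game with payoff matrices A and B."""
    def pure(k, n):
        return [F(int(j == k)) for j in range(n)]

    # (5.80): gain from switching to each pure strategy, truncated at 0.
    g1 = [max(F(0), expected(A, pure(j, len(x)), y) - expected(A, x, y))
          for j in range(len(x))]
    g2 = [max(F(0), expected(B, x, pure(j, len(y))) - expected(B, x, y))
          for j in range(len(y))]
    # (5.81): add the gains and renormalize.
    fx = [(x[j] + g1[j]) / (1 + sum(g1)) for j in range(len(x))]
    fy = [(y[j] + g2[j]) / (1 + sum(g2)) for j in range(len(y))]
    return fx, fy

A = [[1, 0], [0, 2]]    # Player I, rows T, B and columns L, R
B = [[-1, 2], [1, 0]]   # Player II

equilibrium = ([F(1, 4), F(3, 4)], [F(2, 3), F(1, 3)])
uniform = ([F(1, 2), F(1, 2)], [F(1, 2), F(1, 2)])

print(nash_map(A, B, *equilibrium) == equilibrium)  # the equilibrium is a fixed point
print(nash_map(A, B, *uniform))                     # a non-equilibrium point is moved
```

At the uniform profile, Player I gains by moving toward B and Player II by moving toward R, so f shifts probability in those directions.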
Claim 5.28 The range of f is $\Sigma$, i.e., $f(\Sigma) \subseteq \Sigma$.

Proof: We need to show that $f(\sigma)$ is a vector of mixed strategies, for every $\sigma \in \Sigma$, i.e.,

1. $f_i^j(\sigma) \ge 0$ for all i, and for all $j \in \{1, 2, \ldots, m_i\}$.
2. $\sum_{j=1}^{m_i} f_i^j(\sigma) = 1$ for all players $i \in N$.

The first condition holds because $g_i^j(\sigma)$ is nonnegative by definition, and hence the denominator in Equation (5.81) is at least 1, and the numerator is nonnegative.

As for the second condition, because $\sum_{j=1}^{m_i} \sigma_i(s_i^j) = 1$, it follows that

$$\sum_{j=1}^{m_i} f_i^j(\sigma) = \sum_{j=1}^{m_i} \frac{\sigma_i(s_i^j) + g_i^j(\sigma)}{1 + \sum_{k=1}^{m_i} g_i^k(\sigma)} \tag{5.82}$$
$$= \frac{\sum_{j=1}^{m_i} \left(\sigma_i(s_i^j) + g_i^j(\sigma)\right)}{1 + \sum_{k=1}^{m_i} g_i^k(\sigma)} \tag{5.83}$$
$$= \frac{\sum_{j=1}^{m_i} \sigma_i(s_i^j) + \sum_{j=1}^{m_i} g_i^j(\sigma)}{1 + \sum_{j=1}^{m_i} g_i^j(\sigma)} = 1. \tag{5.84}$$

Claim 5.29 f is a continuous function.

Proof: Claim 5.27 implies that both the numerator and the denominator in the definition of $f_i^j$ are continuous functions. As mentioned in the proof of Claim 5.28, the denominator in the definition of $f_i^j$ is at least 1. Thus, $f_i^j$ is the ratio of two continuous functions, where the denominator is always positive, and therefore it is a continuous function.

To complete the proof of the theorem, we need to show that every fixed point of f is an equilibrium of the game. This is accomplished in several steps.

Claim 5.30 Let $\sigma$ be a fixed point of f. Then

$$g_i^j(\sigma) = \sigma_i(s_i^j) \sum_{k=1}^{m_i} g_i^k(\sigma), \quad \forall i \in N, \ j \in \{1, 2, \ldots, m_i\}. \tag{5.85}$$

Proof: The strategy vector $\sigma$ is a fixed point of f, and therefore $f(\sigma) = \sigma$. This is an equality between vectors; hence every coordinate in the vector on the left-hand side of the equation equals the corresponding coordinate in the vector on the right-hand side, i.e.,

$$f_i^j(\sigma) = \sigma_i(s_i^j), \quad \forall i \in N, \ j \in \{1, 2, \ldots, m_i\}. \tag{5.86}$$

From the definition of f,

$$\frac{\sigma_i(s_i^j) + g_i^j(\sigma)}{1 + \sum_{k=1}^{m_i} g_i^k(\sigma)} = \sigma_i(s_i^j), \quad \forall i \in N, \ j \in \{1, 2, \ldots, m_i\}. \tag{5.87}$$
The denominator on the left-hand side is positive; multiplying both sides of the equations by the denominator yields

$$\sigma_i(s_i^j) + g_i^j(\sigma) = \sigma_i(s_i^j) + \sigma_i(s_i^j) \sum_{k=1}^{m_i} g_i^k(\sigma), \quad \forall i \in N, \ j \in \{1, 2, \ldots, m_i\}. \tag{5.88}$$

Cancelling the term $\sigma_i(s_i^j)$ from both sides of Equation (5.88) leads to Equation (5.85).

We now turn to the proof of the last step.

Claim 5.31 Let $\sigma$ be a fixed point of f. Then $\sigma$ is a Nash equilibrium.

Proof: Suppose by contradiction that $\sigma$ is not an equilibrium. Claim 5.26 implies that there exists a player i, and $l \in \{1, 2, \ldots, m_i\}$, such that $g_i^l(\sigma) > 0$. In particular, $\sum_{k=1}^{m_i} g_i^k(\sigma) > 0$; hence from Equation (5.85) we have

$$\sigma_i(s_i^j) > 0 \iff g_i^j(\sigma) > 0, \quad \forall j \in \{1, 2, \ldots, m_i\}. \tag{5.89}$$

Because $g_i^l(\sigma) > 0$, one has in particular that $\sigma_i(s_i^l) > 0$. Since the function $U_i$ is multilinear, $U_i(\sigma) = \sum_{j=1}^{m_i} \sigma_i(s_i^j) U_i(s_i^j, \sigma_{-i})$. This yields

$$0 = \sum_{j=1}^{m_i} \sigma_i(s_i^j) \left(U_i(s_i^j, \sigma_{-i}) - U_i(\sigma)\right) \tag{5.90}$$
$$= \sum_{\{j \,:\, \sigma_i(s_i^j) > 0\}} \sigma_i(s_i^j) \left(U_i(s_i^j, \sigma_{-i}) - U_i(\sigma)\right) \tag{5.91}$$
$$= \sum_{\{j \,:\, \sigma_i(s_i^j) > 0\}} \sigma_i(s_i^j) g_i^j(\sigma), \tag{5.92}$$

where the last equality holds because from Equation (5.89), if $\sigma_i(s_i^j) > 0$, then $g_i^j(\sigma) > 0$, and in this case $g_i^j(\sigma) = U_i(s_i^j, \sigma_{-i}) - U_i(\sigma)$. But the sum (Equation (5.92)) is positive: it contains at least one element (j = l), and by Equation (5.89) every summand in the sum is positive. This contradiction leads to the conclusion that $\sigma$ must be a Nash equilibrium.
5.4 Generalizing Nash's Theorem
There are situations in which, due to various constraints, a player cannot make use of some mixed strategies. For example, there may be situations in which player i cannot choose two pure strategies $s_i$ and $\hat{s}_i$ with different probability, and he is then forced to limit himself to mixed strategies $\sigma_i$ in which $\sigma_i(s_i) = \sigma_i(\hat{s}_i)$. A player may find himself in a situation in which he must choose a particular pure strategy $s_i$ with probability greater than or equal to some given number $p_i(s_i)$, and he is then forced to limit himself to mixed strategies $\sigma_i$ in which $\sigma_i(s_i) \ge p_i(s_i)$. In both of these examples, the constraints can be translated into linear inequalities.

A bounded set that is defined by the intersection of a finite number of half-spaces is called a polytope. The number of extreme points of every polytope S is finite, and every polytope is the convex hull of its extreme points: if $x^1, x^2, \ldots, x^K$ are the extreme points of S, then S is the smallest convex set containing $x^1, x^2, \ldots, x^K$ (see Definition 23.1 on page 917). In other words, for each $s \in S$ there exist nonnegative numbers $(\alpha^l)_{l=1}^K$ whose sum is 1, such that $s = \sum_{l=1}^K \alpha^l x^l$; conversely, for each vector of nonnegative numbers $(\alpha^l)_{l=1}^K$ whose sum is 1, the vector $\sum_{l=1}^K \alpha^l x^l$ is in S.

The space of mixed strategies $\Sigma_i$ is a simplex, which is a polytope whose extreme points are the unit vectors $e^1, e^2, \ldots, e^{m_i}$, where $e^k = (0, \ldots, 0, 1, 0, \ldots, 0)$ is an $m_i$-dimensional vector whose k-th coordinate is 1, and all the other coordinates of $e^k$ are 0. We will now show that Nash's Theorem still holds when the space of strategies of a player is a polytope, and not necessarily a simplex. We note that Nash's Theorem holds under even more generalized conditions, but we will not present those generalizations in this book.

Theorem 5.32 Let $G = (N, (X_i)_{i \in N}, (U_i)_{i \in N})$ be a strategic-form game in which, for each player i,
• The set $X_i$ is a polytope in $\mathbb{R}^{d_i}$.
• The function $U_i$ is a multilinear function over the variables $(s_i)_{i \in N}$.

Then G has an equilibrium.

Nash's Theorem (Theorem 5.10 on page 151) is a special case of Theorem 5.32, where $X_i = \Sigma_i$ for every player $i \in N$.

Proof: The set of strategies $X_i$ of player i in the game G is a polytope. Denote the extreme points of this set by $\{x_i^1, x_i^2, \ldots, x_i^{K_i}\}$. Define an auxiliary strategic-form game $\hat{G}$ in which:

• The set of players is N.
• The set of pure strategies of player $i \in N$ is $L_i := \{1, 2, \ldots, K_i\}$. Denote $L := \times_{i \in N} L_i$.
• For each vector of pure strategies $l = (l_1, l_2, \ldots, l_n) \in L$, the payoff to player i is

$$v_i(l) := U_i\left(x_1^{l_1}, x_2^{l_2}, \ldots, x_n^{l_n}\right). \tag{5.93}$$
It follows that in the auxiliary game every player i chooses an extreme point in his set of strategies $X_i$, and his payoff in the auxiliary game is given by $U_i$. For each $i \in N$, denote by $V_i$ the multilinear extension of $v_i$. Since $U_i$ is a multilinear function, player i's payoff function in the extension of $\hat{G}$ to mixed strategies is

$$V_i(\alpha) = \sum_{l_1=1}^{K_1} \sum_{l_2=1}^{K_2} \cdots \sum_{l_n=1}^{K_n} \alpha_1^{l_1} \alpha_2^{l_2} \cdots \alpha_n^{l_n} v_i(l_1, l_2, \ldots, l_n) \tag{5.94}$$
$$= \sum_{l_1=1}^{K_1} \sum_{l_2=1}^{K_2} \cdots \sum_{l_n=1}^{K_n} \alpha_1^{l_1} \alpha_2^{l_2} \cdots \alpha_n^{l_n} U_i\left(x_1^{l_1}, x_2^{l_2}, \ldots, x_n^{l_n}\right) \tag{5.95}$$
$$= U_i\left(\sum_{l_1=1}^{K_1} \alpha_1^{l_1} x_1^{l_1}, \ldots, \sum_{l_n=1}^{K_n} \alpha_n^{l_n} x_n^{l_n}\right). \tag{5.96}$$

The auxiliary game $\hat{G}$ satisfies the conditions of Nash's Theorem (Theorem 5.10 on page 151), and it therefore has a Nash equilibrium in mixed strategies $\alpha^*$. It follows that
for every player i, ∗ Vi (α ∗ ) ≥ Vi (αi , α−i ), ∀i ∈ N, ∀αi ∈ (Li ).
(5.97)
Denote by $\alpha_i^* = (\alpha_i^{*,l_i})_{l_i=1}^{K_i}$ player $i$’s strategy in the equilibrium $\alpha^*$. Since $X_i$ is a convex set, the weighted average
\[ s_i^* := \sum_{l_i=1}^{K_i} \alpha_i^{*,l_i} x_i^{l_i} \tag{5.98} \]
is a point in $X_i$. We will now show that $s^* = (s_i^*)_{i \in N}$ is an equilibrium of the game $G$. Let $i \in N$ be a player, and let $s_i \in X_i$ be any strategy of player $i$. Since $\{x_i^1, x_i^2, \ldots, x_i^{K_i}\}$ are the extreme points of $X_i$, there exists a distribution $\alpha_i = (\alpha_i^{l_i})_{l_i=1}^{K_i}$ over $L_i$ such that $s_i = \sum_{l_i=1}^{K_i} \alpha_i^{l_i} x_i^{l_i}$. Equations (5.98), (5.94), and (5.97) imply that, for each player $i \in N$,
\[ U_i(s^*) = V_i(\alpha^*) \geq V_i(\alpha_i, \alpha^*_{-i}) = U_i(s_i, s^*_{-i}). \tag{5.99} \]
That is, if player $i$ deviates to $s_i$, he cannot profit. Since this is true for every player $i \in N$ and every strategy $s_i \in X_i$, the strategy vector $s^*$ is an equilibrium of the game $G$.
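The key step of the proof is the identity between (5.94) and (5.96): averaging payoffs over extreme points gives the same number as evaluating the payoff at the averaged point. The following is a minimal numerical sketch of that identity, not part of the original text. The two polytopes `X1` and `X2`, the coefficient matrix `A`, and the bilinear payoff `U` are all hypothetical choices made only for illustration.

```python
import itertools
import random

random.seed(0)

# Hypothetical data: extreme points of two polytopes in R^2.
X1 = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # a square, K_1 = 4
X2 = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]              # a triangle, K_2 = 3
A = [[1.0, -2.0], [0.5, 3.0]]                          # assumed payoff coefficients


def U(s1, s2):
    """Bilinear payoff U(s1, s2) = s1^T A s2, linear in each argument."""
    return sum(s1[a] * A[a][b] * s2[b] for a in range(2) for b in range(2))


def random_distribution(k):
    """A random probability vector of length k."""
    w = [random.random() for _ in range(k)]
    total = sum(w)
    return [x / total for x in w]


def mix(alpha, points):
    """Weighted average sum_l alpha^l x^l of the extreme points."""
    return tuple(sum(a * p[c] for a, p in zip(alpha, points)) for c in range(2))


def V(alpha1, alpha2):
    """Multilinear extension of v(l1, l2) = U(x_1^{l1}, x_2^{l2})."""
    return sum(alpha1[l1] * alpha2[l2] * U(X1[l1], X2[l2])
               for l1, l2 in itertools.product(range(len(X1)), range(len(X2))))


alpha1, alpha2 = random_distribution(len(X1)), random_distribution(len(X2))
lhs = V(alpha1, alpha2)                    # the double sum in (5.94)/(5.95)
rhs = U(mix(alpha1, X1), mix(alpha2, X2))  # the payoff at the averaged points, (5.96)
print(abs(lhs - rhs) < 1e-9)
```

The check relies only on `U` being linear in each argument separately, which is the property the proof uses.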
5.5 Utility theory and mixed strategies
In defining the mixed extension of a game, we defined the payoff that a vector of mixed strategies yields as the expected payoff when every player chooses a pure strategy according to the probability given by his mixed strategy. But how is this definition justified? In this section we will show that if the preferences of the players satisfy the von Neumann–Morgenstern axioms of utility theory (see Chapter 2), we can interpret the numerical values in each cell of the payoff matrix as the utility the players receive when the outcome of the game is that cell (see Figure 5.17).
Suppose that we are considering a two-player game, such as the game in Figure 5.17. In this game there are six possible outcomes, $O = \{A_1, A_2, \ldots, A_6\}$. Each player has a preference relation over the set of lotteries over $O$. Suppose that the two players have linear utility functions, $u_1$ and $u_2$ respectively, over the set of lotteries. Every pair of mixed strategies $x = (x_1, x_2)$ and $y = (y_1, y_2, y_3)$ induces a lottery over the possible outcomes. The probability of reaching each one of the possible outcomes is indicated in Figure 5.18. In other words, every pair of mixed strategies $(x, y)$ induces the following lottery $L_{x,y}$ over the outcomes:
\[ L = L_{x,y} = [x_1 y_1 (A_1), x_1 y_2 (A_2), x_1 y_3 (A_3), x_2 y_1 (A_4), x_2 y_2 (A_5), x_2 y_3 (A_6)]. \]
Since the utility functions of the two players are linear, player $i$’s utility from this lottery is
\[ u_i(L_{x,y}) = x_1 y_1 u_i(A_1) + x_1 y_2 u_i(A_2) + x_1 y_3 u_i(A_3) + x_2 y_1 u_i(A_4) + x_2 y_2 u_i(A_5) + x_2 y_3 u_i(A_6). \tag{5.100} \]
                     Player II
                 y1      y2      y3
Player I   x1    A1      A2      A3
           x2    A4      A5      A6
Figure 5.17 A two-player game in terms of outcomes

Outcome        A1      A2      A3      A4      A5      A6
Probability    x1y1    x1y2    x1y3    x2y1    x2y2    x2y3
Figure 5.18 The probability of reaching each of the outcomes

                                      Player II
                 y1                  y2                  y3
Player I   x1    u1(A1), u2(A1)      u1(A2), u2(A2)      u1(A3), u2(A3)
           x2    u1(A4), u2(A4)      u1(A5), u2(A5)      u1(A6), u2(A6)
Figure 5.19 The game in Figure 5.17 in terms of utilities
Player $i$’s utility from this lottery is therefore equal to his expected payoff in the strategic-form game in which in each cell of the payoff matrix we write the utilities of the players from the outcome obtained at that cell (Figure 5.19). If, therefore, we assume that each player’s goal is to maximize his utility, what we are seeking is the equilibria of the game in Figure 5.19. If $(x, y)$ is an equilibrium of this game, then any player who unilaterally deviates from his equilibrium strategy cannot increase his utility. Note that because, in general, the utility functions of the players differ from each other, the game in terms of utilities (Figure 5.19) is not a zero-sum game, even if the original game is a zero-sum game in which the outcome is a sum of money that Player II pays to Player I.
Recall that a player’s utility function is determined up to a positive affine transformation (Corollary 2.23, on page 23). How does the presentation of a game change if a different choice of players’ utility functions is made? Let $v_1$ and $v_2$ be two positive affine transformations of $u_1$ and $u_2$ respectively; i.e., $u_i$ and $v_i$ are equivalent representations of the utilities of player $i$ that satisfy $v_i(L) = \alpha_i u_i(L) + \beta_i$ for every lottery $L$, where $\alpha_i > 0$ and $\beta_i \in \mathbb{R}$ for $i = 1, 2$. The game in Figure 5.17 in terms of the utility functions $v_1$ and $v_2$ will be analogous to the matrix that appears in Figure 5.19, with $u_1$ and $u_2$ replaced by $v_1$ and $v_2$ respectively.
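As a small illustration of Equation (5.100), the sketch below computes the lottery induced by a pair of mixed strategies and the resulting expected utility, and then checks that replacing $u_i$ by a positive affine transformation changes the expected utility by the same transformation. The utility table `u1`, the mixed strategies `x` and `y`, and the constants `a` and `b` are hypothetical numbers chosen only for this example.

```python
from fractions import Fraction
from itertools import product


def induced_lottery(x, y):
    """Probability of each cell (row k, column j) under mixed strategies x and y."""
    return {(k, j): x[k] * y[j] for k, j in product(range(len(x)), range(len(y)))}


def expected_utility(u, x, y):
    """Equation (5.100): utility of the induced lottery for a linear utility,
    where u[k][j] is the utility of the outcome in cell (k, j)."""
    return sum(p * u[k][j] for (k, j), p in induced_lottery(x, y).items())


# Hypothetical utilities of one player over the six outcomes A1..A6.
u1 = [[3, 0, 1],
      [2, 4, 0]]
x = [Fraction(1, 4), Fraction(3, 4)]
y = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

a, b = 2, 5  # assumed positive affine transformation v = a*u + b
v1 = [[a * c + b for c in row] for row in u1]

print(expected_utility(u1, x, y))
print(expected_utility(v1, x, y) == a * expected_utility(u1, x, y) + b)
```

The second printed value is `True` because the cell probabilities sum to 1, so the constant `b` passes through the expectation unchanged.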
Example 5.33 Consider the two games depicted in Figure 5.20. Game B in Figure 5.20 is derived from Game A by adding a constant value of 6 to the payoff of Player II in every cell, whereby we have implemented a positive affine transformation (where α = 1, β = 6) on the payoffs of Player II.
                  Player II                            Player II
                 L         M                          L         M
Player I   T     3, −3     −2, 2        Player I   T     3, 3      −2, 8
           B     5, −5     1, −1                   B     5, 1      1, 5
                 Game A                               Game B
Figure 5.20 Adding a constant value to the payoffs of one of the players
While Game A is a zero-sum game, Game B is not a zero-sum game, because the sum of the utilities in each cell of the matrix is 6. Such a game is called a constant-sum game. Every constant-sum game can be transformed to a zero-sum game by adding a constant value to the payoffs of one of the players, whereby the concepts constant-sum game and zero-sum game are equivalent. As we will argue in Theorem 5.35 below, the equilibria of a game are unchanged by adding constant values to the payoffs. For example, in the two games in Figure 5.20, strategy B strictly dominates T for Player I, and strategy M strictly dominates L for Player II. It follows that in both of these games, the only equilibrium point is (B, M). If we implement a positive affine transformation in which $\alpha \neq 1$ on the payoffs of the players, we will still end up with a game in which the only thing that has changed is the units in which we are measuring the utilities of the players. For example, the following game is derived from Game A in Figure 5.20 by implementing the affine transformation $x \mapsto 5x + 7$ on the payoffs of Player I (see Figure 5.21).
                  Player II
                 L          M
Player I   T     22, −3     −3, 2
           B     32, −5     12, −1
Figure 5.21 The utilities in Game A in Figure 5.20 after implementing the affine transformation $x \mapsto 5x + 7$ on the payoffs to Player I ◭
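The games of Figures 5.20 and 5.21 can be used for a quick sanity check of the claim that positive affine transformations leave the equilibrium (B, M) in place. The sketch below is only an illustration restricted to pure strategies; the helper names `pure_equilibria` and `transform` are hypothetical and do not appear in the text.

```python
def pure_equilibria(payoffs):
    """Return all pure-strategy equilibria (row, column) of a two-player game.

    payoffs[r][c] is the pair (payoff to Player I, payoff to Player II).
    """
    rows, cols = len(payoffs), len(payoffs[0])
    result = []
    for r in range(rows):
        for c in range(cols):
            best_row = all(payoffs[r][c][0] >= payoffs[k][c][0] for k in range(rows))
            best_col = all(payoffs[r][c][1] >= payoffs[r][k][1] for k in range(cols))
            if best_row and best_col:
                result.append((r, c))
    return result


def transform(payoffs, a1=1, b1=0, a2=1, b2=0):
    """Apply u_I -> a1*u_I + b1 and u_II -> a2*u_II + b2 (a1, a2 > 0) to every cell."""
    return [[(a1 * u1 + b1, a2 * u2 + b2) for (u1, u2) in row] for row in payoffs]


# Rows are T, B; columns are L, M (Game A of Figure 5.20).
game_a = [[(3, -3), (-2, 2)],
          [(5, -5), (1, -1)]]
game_b = transform(game_a, b2=6)        # Game B of Figure 5.20
game_c = transform(game_a, a1=5, b1=7)  # the game of Figure 5.21

for game in (game_a, game_b, game_c):
    print(pure_equilibria(game))  # index (1, 1) stands for (B, M)
```

This checks pure strategies only; the statement for equilibria in mixed strategies is the subject of Theorem 5.35.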
Games that differ only in the utility representations of the players are considered to be equivalent games.
Definition 5.34 Two games in strategic form $(N, (S_i)_{i \in N}, (u_i)_{i \in N})$ and $(N, (S_i)_{i \in N}, (v_i)_{i \in N})$ with the same set of players and the same sets of pure strategies are strategically equivalent if for each player $i \in N$ the function $v_i$ is a positive affine transformation of the function $u_i$. In other words, there exist $\alpha_i > 0$ and $\beta_i \in \mathbb{R}$ such that
\[ v_i(s) = \alpha_i u_i(s) + \beta_i, \quad \forall s \in S. \tag{5.101} \]
The name “strategic equivalence” comes from the next theorem, whose proof we leave as an exercise (Exercise 5.58).
Theorem 5.35 Let $G$ and $\widehat{G}$ be two strategically equivalent strategic-form games. Every equilibrium $\sigma = (\sigma_1, \ldots, \sigma_n)$ in mixed strategies of the game $G$ is an equilibrium in mixed strategies of the game $\widehat{G}$.
In other words, each equilibrium in the original game remains an equilibrium after changing the utility functions of the players by positive affine transformations. Note, however, that the equilibrium payoffs do change from one strategically equivalent game to another, in accordance with the positive affine transformation that has been implemented.
Corollary 5.36 If the preferences of every player over lotteries over the outcomes of the game satisfy the von Neumann–Morgenstern axioms, then the set of equilibria of the game is independent of the particular utility functions used to represent the preferences.
Suppose that we are given the payoff matrix in Figure 5.21 and asked whether or not this game is strategically equivalent to a zero-sum game. What should we do? If the game is strategically equivalent to a zero-sum game, then there exist two positive affine transformations $f_1$ and $f_2$ such that $f_2(u_2(s)) = -f_1(u_1(s))$ for every strategy vector $s \in S$. Since the inverse of a positive affine transformation is also a positive affine transformation (Exercise 2.19 on page 35), and the composition of two positive affine transformations is also a positive affine transformation (Exercise 2.20 on page 35), in this case the positive affine transformation $f_3 = -((f_1)^{-1} \circ (-f_2))$ satisfies the property that $f_3(u_2(s)) = -u_1(s)$ for every strategy vector $s \in S$. In other words, if the game is strategically equivalent to a zero-sum game, there exists a positive affine transformation that, when applied to the utilities of Player II, yields the negative of the utilities of Player I. Denote such a transformation, assuming it exists, by $u \mapsto \alpha u + \beta$.
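Then we need to check whether there exist $\alpha > 0$ and $\beta \in \mathbb{R}$ such that
\begin{align}
-5\alpha + \beta &= -32, \tag{5.102} \\
-3\alpha + \beta &= -22, \tag{5.103} \\
-1\alpha + \beta &= -12, \tag{5.104} \\
2\alpha + \beta &= 3. \tag{5.105}
\end{align}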
In order to ascertain whether this system of equations has a solution, we can find α and β that solve two of the above equations, and check whether they satisfy the rest of the equations. For example, if we solve Equations (5.102) and (5.103), we get α = 5 and β = −7, and we can then check that these values do indeed also solve the Equations (5.104) and (5.105). Since we have found α and β solving the system of equations, we deduce that this game is strategically equivalent to a zero-sum game. Remark 5.37 Given the above, some people define a zero-sum game to be a game strategically equivalent to a game (N, (Si )i∈N , (vi )i∈N ) in which v1 + v2 = 0.
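The procedure described above (solve two of the equations for $\alpha$ and $\beta$, then check the rest) can be sketched in a few lines. This is an illustrative sketch, not the book's own algorithm; the function name `zero_sum_transformation` is hypothetical, and exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction


def zero_sum_transformation(payoffs):
    """Look for alpha > 0 and beta with alpha*u_II + beta = -u_I in every cell.

    payoffs[r][c] is the pair (u_I, u_II). Returns (alpha, beta) if such a
    pair exists, and None otherwise.
    """
    cells = [(Fraction(u1), Fraction(u2)) for row in payoffs for (u1, u2) in row]
    a1, a2 = cells[0]
    # Find a second cell whose u_II differs, so that two equations pin down alpha, beta.
    other = next(((b1, b2) for (b1, b2) in cells if b2 != a2), None)
    if other is None:
        # u_II is constant: a solution exists only if u_I is constant as well.
        if all(u1 == a1 for (u1, _) in cells):
            return Fraction(1), -a1 - a2
        return None
    b1, b2 = other
    alpha = (b1 - a1) / (a2 - b2)
    beta = -a1 - alpha * a2
    if alpha <= 0:
        return None
    # Check that the remaining equations hold too.
    if all(alpha * u2 + beta == -u1 for (u1, u2) in cells):
        return alpha, beta
    return None


figure_5_21 = [[(22, -3), (-3, 2)],
               [(32, -5), (12, -1)]]
print(zero_sum_transformation(figure_5_21))
```

For the game of Figure 5.21 the function recovers $\alpha = 5$ and $\beta = -7$, in agreement with the computation in the text.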
The connection presented in this section between utility theory and game theory underscores the significance of utility theory. Representing the utilities of players by linear functions enables us to compute Nash equilibria with relative ease. Had we represented the players’ preferences/indifferences by nonlinear utility functions, calculating equilibria would be far more complicated. This is similar to the way we select measurement scales in various fields. Many physical laws are expressed using the Celsius scale, because they can be given a simple expression. For example, consider the physical law that states that the change in the length of a metal rod is proportional to the change in its temperature. If temperature is measured in Fahrenheit, that law remains unchanged, since the Fahrenheit scale is a positive affine transformation of the Celsius scale. In contrast, if we were to measure temperature using, say, the log of the Celsius scale, many physical laws would have much more complicated formulations. Using linear utilities enables us to compute the utilities of simple lotteries using expected-value calculations, which simplifies the analysis of strategic-form games. This, of course, depends on the assumption that the preferences of the players can be represented by linear utility functions, i.e., that their preferences satisfy the von Neumann–Morgenstern axioms.
Another important point that has emerged from this discussion is that most daily situations do not correspond to two-player zero-sum games, even if the outcomes are in fact sums of money one person pays to another. This is because the utility of one player from receiving an amount of money $x$ is usually not diametrically opposite to the utility of the other from paying this amount. That is, there are amounts $x \in \mathbb{R}$ for which $u_1(x) + u_2(-x) \neq 0$.
On the other hand, as far as equilibria are concerned, the particular representation of the utilities of the players does not affect the set of equilibria of a game. If there exists a representation that leads to a zero-sum game, we are free to choose that representation, and if we do so, we can find equilibria by solving a linear program (see Section 5.2.5 on page 164). One family of games that is always amenable to such a representation, which can be found easily, is the family of two-person games with two outcomes, where the preferences of the two players for the two alternative outcomes are diametrically opposed in these games. In such games we can always define the utilities of one of the players over the outcomes to be 1 or 0, and define the utilities of the other player over the outcomes to be −1 or 0. In contrast, zero-sum games are rare in the general family of two-player games. Nevertheless, two-player zero-sum games are very important in the study of game theory, as explained on pages 111–112.
5.6 The maxmin and the minmax in n-player games
In Section 4.10 (page 102), we defined the maxmin to be the best outcome that a player can guarantee for himself under his most pessimistic assumption regarding the behavior of the other players.
Definition 5.38 The maxmin in mixed strategies of player $i$ is defined as follows:
\[ \underline{v}_i := \max_{\sigma_i \in \Sigma_i} \min_{\sigma_{-i} \in \Sigma_{-i}} U_i(\sigma_i, \sigma_{-i}). \tag{5.106} \]
In two-player zero-sum games we also defined the concept of the minmax value, which is interpreted as the least payoff that the other players can guarantee that a player will get. In two-player zero-sum games, minimizing the payoff of one player is equivalent to maximizing the payoff of his opponent, and hence in two-player zero-sum games the maxmin of Player I is equal to the minmax of Player II. This is not true, however, in two-player games that are not zero-sum games, and in games with more than two players. Analogously to the definition of the maxmin in Equation (5.106), the minmax value of a player is defined as follows.
Definition 5.39 Let $G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ be a strategic-form game. The minmax value in mixed strategies of player $i$ is
\[ \overline{v}_i := \min_{\sigma_{-i} \in \Sigma_{-i}} \max_{\sigma_i \in \Sigma_i} U_i(\sigma_i, \sigma_{-i}). \tag{5.107} \]
$\overline{v}_i$ is the lowest possible payoff that the other players can force on player $i$. A player’s maxmin and minmax values depend solely on his payoff function, which is why different players in the same game may well have different maxmin and minmax values. One of the basic characteristics of these values is that a player’s minmax value in mixed strategies is greater than or equal to his maxmin value in mixed strategies.
Theorem 5.40 In every strategic-form game $G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$, for each player $i \in N$,
\[ \overline{v}_i \geq \underline{v}_i. \tag{5.108} \]
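Equation (5.108) is expected: if the other players can guarantee that player $i$ will not receive more than $\overline{v}_i$, and player $i$ can guarantee himself at least $\underline{v}_i$, then $\overline{v}_i \geq \underline{v}_i$.
Proof: Let $\widehat{\sigma}_{-i} \in \Sigma_{-i}$ be a strategy vector at which the minimum in Equation (5.107) is attained; i.e.,
\[ \overline{v}_i = \max_{\sigma_i \in \Sigma_i} U_i(\sigma_i, \widehat{\sigma}_{-i}) \leq \max_{\sigma_i \in \Sigma_i} U_i(\sigma_i, \sigma_{-i}), \quad \forall \sigma_{-i} \in \Sigma_{-i}. \tag{5.109} \]
On the other hand,
\[ U_i(\sigma_i, \widehat{\sigma}_{-i}) \geq \min_{\sigma_{-i} \in \Sigma_{-i}} U_i(\sigma_i, \sigma_{-i}), \quad \forall \sigma_i \in \Sigma_i. \tag{5.110} \]
Taking the maximum over all mixed strategies $\sigma_i \in \Sigma_i$ on both sides of the inequality (5.110) yields
\[ \overline{v}_i = \max_{\sigma_i \in \Sigma_i} U_i(\sigma_i, \widehat{\sigma}_{-i}) \geq \max_{\sigma_i \in \Sigma_i} \min_{\sigma_{-i} \in \Sigma_{-i}} U_i(\sigma_i, \sigma_{-i}) = \underline{v}_i. \tag{5.111} \]
We conclude that $\overline{v}_i \geq \underline{v}_i$, which is what we needed to show.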
In a two-player game $G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ where $N = \{\mathrm{I}, \mathrm{II}\}$, the maxmin value in mixed strategies of each player is always equal to his minmax value in mixed strategies. For Player I, for example, these two values equal the value of a second two-player zero-sum game $\widehat{G} = (N, (S_i)_{i \in N}, (v_i)_{i \in N})$, in which $v_{\mathrm{I}} = u_{\mathrm{I}}$ and $v_{\mathrm{II}} = -u_{\mathrm{I}}$ (Exercise 5.64). As the next example shows, in a game with more than two players the maxmin value may be less than the minmax value.
Example 5.41 Consider the three-player game in which the set of players is N = {I, II, III}, and every player has two pure strategies; Player I chooses a row (T or B), Player II chooses a column (L or R), and Player III chooses the matrix (W or E). The payoff function u1 of Player I is shown in Figure 5.22.
           L    R                  L    R
      T    0    1             T    1    1
      B    1    1             B    1    0
           W                       E
Figure 5.22 Player I’s payoff function in the game in Example 5.41
We compute the maxmin value in mixed strategies of Player I. If Player I uses the mixed strategy $[x(T), (1 - x)(B)]$, Player II uses the mixed strategy $[y(L), (1 - y)(R)]$, and Player III uses the mixed strategy $[z(W), (1 - z)(E)]$, then Player I’s payoff is
\[ U_{\mathrm{I}}(x, y, z) = 1 - xyz - (1 - x)(1 - y)(1 - z). \tag{5.112} \]
We first find
\[ \underline{v}_{\mathrm{I}} = \max_{x} \min_{y,z} U_{\mathrm{I}}(x, y, z) = \tfrac{1}{2}. \tag{5.113} \]
To see this, note that $U_{\mathrm{I}}(x, 0, 0) = x \leq \frac{1}{2}$ for every $x \leq \frac{1}{2}$, and $U_{\mathrm{I}}(x, 1, 1) = 1 - x \leq \frac{1}{2}$ for every $x \geq \frac{1}{2}$, and hence $\min_{y,z} U_{\mathrm{I}}(x, y, z) \leq \frac{1}{2}$ for every $x$. On the other hand, $U_{\mathrm{I}}(\frac{1}{2}, y, z) \geq \frac{1}{2}$ for each $y$ and $z$, and hence $\max_x \min_{y,z} U_{\mathrm{I}}(x, y, z) = \frac{1}{2}$, which is what we claimed.
We next turn to calculating the minmax value of Player I.
\begin{align}
\overline{v}_{\mathrm{I}} &= \min_{y,z} \max_{x} U_{\mathrm{I}}(x, y, z) \tag{5.114} \\
&= \min_{y,z} \max_{x} \bigl(1 - xyz - (1 - x)(1 - y)(1 - z)\bigr) \tag{5.115} \\
&= \min_{y,z} \max_{x} \bigl(1 - (1 - y)(1 - z) + x(1 - y - z)\bigr). \tag{5.116}
\end{align}
For every fixed $y$ and $z$ the function $x \mapsto (1 - (1 - y)(1 - z) + x(1 - y - z))$ is linear; hence the maximum in Equation (5.116) is attained at the extreme point $x = 1$ if $1 - y - z \geq 0$, and at the extreme point $x = 0$ if $1 - y - z \leq 0$. This yields
\[ \max_{x} \bigl(1 - (1 - y)(1 - z) + x(1 - y - z)\bigr) = \begin{cases} 1 - (1 - y)(1 - z) & \text{if } y + z \geq 1, \\ 1 - yz & \text{if } y + z \leq 1. \end{cases} \]
The minimum of the function $1 - (1 - y)(1 - z)$ over the domain $y + z \geq 1$ is $\frac{3}{4}$, and is attained at $y = z = \frac{1}{2}$. The minimum of the function $1 - yz$ over the domain $y + z \leq 1$ is also $\frac{3}{4}$, and is attained at $y = z = \frac{1}{2}$. We therefore deduce that
\[ \overline{v}_{\mathrm{I}} = \tfrac{3}{4}. \tag{5.117} \]
In other words, in this example $\underline{v}_{\mathrm{I}} = \frac{1}{2} < \frac{3}{4} = \overline{v}_{\mathrm{I}}$.
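The two values computed in Example 5.41 can be checked numerically by evaluating Player I's payoff (5.112) on a grid of mixed strategies. The sketch below is only an approximate check under an assumed grid resolution `n`; it is not a proof, and the grid search is a stand-in for the analytic argument in the text.

```python
def U(x, y, z):
    """Player I's payoff (5.112) when T, L, W are played with probabilities x, y, z."""
    return 1 - x * y * z - (1 - x) * (1 - y) * (1 - z)


n = 50  # assumed grid resolution; the grid contains 0, 1/2 and 1
grid = [k / n for k in range(n + 1)]

# Maxmin: Player I picks x first, then Players II and III pick (y, z) against him.
maxmin = max(min(U(x, y, z) for y in grid for z in grid) for x in grid)

# Minmax: Players II and III pick independent mixed strategies (y, z) first,
# then Player I best-responds with x.
minmax = min(max(U(x, y, z) for x in grid) for y in grid for z in grid)

print(maxmin, minmax)
```

On this grid the maxmin is attained at $x = \frac{1}{2}$ and the minmax at $y = z = \frac{1}{2}$, matching the values $\frac{1}{2}$ and $\frac{3}{4}$ derived above.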
38 , the added information is advantageous to Player I, in accordance with Theorem 5.45. ◭
In games that are not zero-sum, however, the situation is completely different. In the following example, Player I receives additional information, but this leads him to lose at the equilibrium point.
5.7 Imperfect information: the value of information
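Example 5.47 Detrimental addition of information Consider Game A in Figure 5.29.
[Figure 5.29 The games in Example 5.47. Two game trees, Game A and Game B. In both games Player II chooses L, M, or R; M ends the game with payoff (1, 1), while after L or R Player I chooses an action leading to payoff (10, 0) or (0, 10). In Game A Player I’s two vertices lie in a single information set, with actions l and r; in Game B they are separate information sets, with actions l1, r1 and l2, r2.]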
The only equilibrium point of this game is the following pair of mixed strategies:
• Player I plays $[\frac{1}{2}(l), \frac{1}{2}(r)]$.
• Player II plays $[\frac{1}{2}(L), \frac{1}{2}(R), 0(M)]$.
To see this, note that strategy M is strictly dominated by strategy $[\frac{1}{2}(L), \frac{1}{2}(R), 0(M)]$, and it follows from Theorem 4.35 (page 109) that it may be eliminated. After the elimination of this strategy, the resulting game is equivalent to Matching Pennies (Example 3.20 on page 52), whose sole equilibrium point is the mixed strategy under which both players choose each of their pure strategies with probability $\frac{1}{2}$ (verify that this is true). The equilibrium payoff is therefore (5, 5).
When Player I receives additional information and can distinguish between the two vertices (see Game B in Figure 5.29), an equilibrium in the game is:
• Player I plays the pure strategy $(l_1, r_2)$.
• Player II plays $[0(L), 0(R), 1(M)]$.
To see this, note that the pure strategy $(l_1, r_2)$ of Player I is his best reply to Player II’s strategy, and strategy $[0(L), 0(R), 1(M)]$ is Player II’s best reply to $(l_1, r_2)$. The equilibrium payoff is therefore (1, 1). This is not the only equilibrium of this game; there are more equilibria, but they all yield an equilibrium payoff of (1, 1) (Exercise 5.67).
Adding information in this game is to Player I’s detriment, because his payoff drops from 5 to 1. In this particular example, the additional information has also impacted Player II’s payoff negatively, but this is not always the case (see Exercise 5.62). The reason that additional information is to Player I’s detriment is that he cannot ignore the new information: if the play of the game reaches one of the vertices that Player I controls, it is to his advantage to exploit the information he has, since ignoring it lowers his expected payoff. As a rational player, he must make use of the information. Player II knows this, and adapts his strategy to this new situation. It would be to Player I’s advantage to commit to not using his additional information, but without such a binding commitment, Player II may not believe any “promise” that Player I makes to disregard his information. Careful consideration of this example brings out the source of this phenomenon: it is not the addition of information, per se, that is the cause of Player I’s loss, but the fact that Player II knows that Player I has additional information, which leads Player II to change his behavior. ◭
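The equilibrium claims of Example 5.47 can be checked mechanically once terminal payoffs are fixed. The sketch below is an illustration under an explicit assumption: the figure's tree layout is not reproduced here, so the table `AFTER` is a guess that is consistent with the text (after L, action l yields (10, 0); after R, action r yields (10, 0)). The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# ASSUMED terminal payoffs (Player I, Player II); the assignment of payoffs
# to branches is an assumption consistent with the text, not read off the figure.
AFTER = {"L": {"l": (10, 0), "r": (0, 10)},
         "R": {"l": (0, 10), "r": (10, 0)}}
OUTSIDE = (1, 1)  # payoff when Player II plays M


def outcome(plan, move):
    """Payoffs when Player I follows plan = (action after L, action after R)
    and Player II plays move."""
    if move == "M":
        return OUTSIDE
    return AFTER[move][plan[0] if move == "L" else plan[1]]


def expected(mix_I, mix_II):
    """Expected payoffs for mixed strategies given as dicts strategy -> probability."""
    total = [Fraction(0), Fraction(0)]
    for (plan, p), (move, q) in product(mix_I.items(), mix_II.items()):
        u = outcome(plan, move)
        total[0] += p * q * u[0]
        total[1] += p * q * u[1]
    return tuple(total)


def is_equilibrium(mix_I, mix_II, plans):
    """True if no pure deviation (plans for Player I, L/M/R for Player II) is profitable."""
    base = expected(mix_I, mix_II)
    ok_I = all(expected({plan: 1}, mix_II)[0] <= base[0] for plan in plans)
    ok_II = all(expected(mix_I, {move: 1})[1] <= base[1] for move in "LMR")
    return ok_I and ok_II


half = Fraction(1, 2)
# Game A: one information set, so Player I takes the same action after L and after R.
plans_A = [("l", "l"), ("r", "r")]
eq_A = ({("l", "l"): half, ("r", "r"): half}, {"L": half, "R": half, "M": 0})
# Game B: Player I distinguishes the two vertices, so all four plans are available.
plans_B = list(product("lr", repeat=2))
eq_B = ({("l", "r"): 1}, {"L": 0, "R": 0, "M": 1})

print(expected(*eq_A), is_equilibrium(*eq_A, plans_A))
print(expected(*eq_B), is_equilibrium(*eq_B, plans_B))
print(is_equilibrium(*eq_A, plans_B))
```

Under these assumed payoffs the last line prints `False`: once Player I can condition on the vertex, the strategy pair of Game A stops being an equilibrium, because Player I would deviate to the plan (l, r). This is the sense in which he "cannot ignore" the information.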
A question naturally arises from the material in this section: why does the addition of information always (weakly) help a player in two-player zero-sum games, while in games that are not zero-sum, it may be advantageous or detrimental? The answer lies in the fact that there is a distinction between the concepts of the maxmin value and the equilibrium in games that are not zero sum, while in two-player zero-sum games the two concepts coincide. Additional information to a player (weakly) increases his maxmin value in every game, whether or not it is a two-player zero-sum game (Theorem 5.44). In a two-player zero-sum game, the unique equilibrium payoff is the value of the game (which is also the maxmin value), which is why adding information to a player always (weakly) increases his payoff. If the game is not a two-player zero-sum game, equilibrium payoffs, which can rise or fall with the addition of information, need not equal a player’s maxmin value. The statement “adding information can be detrimental” specifically relates to situations in which a player’s equilibrium payoff, after he gains information, can fall. But this is only relevant if we are expecting the outcome of a game to be an equilibrium, both before and after the addition of new information (we may perhaps expect this if the game has a unique equilibrium, or strictly dominant strategies, as in Exercise 5.63). In contrast, in situations in which we expect the players to play their maxmin strategies, adding information cannot be detrimental.
5.8 Evolutionarily stable strategies
Darwin’s Theory of Evolution is based on the principle of the survival of the fittest, according to which new generations of the world’s flora and fauna bear mutations.² An individual who has undergone a mutation will pass it on to his descendants. Only those individuals most adapted to their environment succeed in the struggle for survival. It follows from this principle that, in the context of the process of evolution, every organism acts as if it were a rational creature, by which we mean a creature whose behavior is directed toward one goal: to maximize the expected number of its reproducing descendants. We say that it acts “as if” it were rational in order to stress that the individual organism is not a strategically planning creature. If an organism’s inherited properties are not adapted to the struggle for survival, however, it will simply not have descendants. For example, suppose that the expected number of surviving offspring per individual in a given population is three in every generation. If a mutation raising the number of expected offspring to four occurs in only one individual, eventually there will be essentially no individuals in the population carrying genes yielding the expected number of three offspring, because the ratio of individuals carrying the gene for an expected number of four descendants to individuals carrying the gene for an expected number of three descendants will grow exponentially over the generations.
If we relate to an organism’s number of offspring as a payoff, we have described a process that is propelled by the maximization of payoffs. Since the concept of equilibrium in a game is also predicated on the idea that only strategies that maximize expected payoffs (against the strategies used by the other players) will be chosen, we have a motivation for using ideas from game theory in order to explain evolutionary phenomena. Maynard Smith and Price [1973] showed that, in fact, it is possible to use the Nash equilibrium concept to shed new light on Darwin’s theory. This section presents the basic ideas behind the application of game theory to the study of evolutionary biology. The interested reader can find descriptions of many phenomena in biology that are explicable using the theory developed in this section in Richard Dawkins’s popular book, The Selfish Gene (Dawkins [1976]). The next example, taken from Maynard Smith and Price [1973], introduces the main idea, and the general approach, used in this theory.
² A mutation is a change in a characteristic that an individual has that is brought on by a change in genetic material. In this section, we will use the term mutation to mean an individual in a population whose behavior has changed, and who passes on that change in behavior to his descendants.
Example 5.48
Suppose that a particular animal can exhibit one of two possible behaviors: aggressive behavior or peaceful behavior. We will describe this by saying that there are two types of animals: hawks (who are aggressive), and doves (who are peaceful). The different types of behavior are expressed when an animal invades the territory of another animal of the same species. A hawk will aggressively repel the invader. A dove, in contrast, will yield to the aggressor and be driven out of its territory. If one of the two animals is a hawk and the other a dove, the outcome of this struggle is that the hawk ends up in the territory, while the dove is driven out, exposed to predators and other dangers. If both animals are doves, one of them will end up leaving the territory. Suppose that each of them leaves the territory in that situation with a probability of $\frac{1}{2}$. If both animals are hawks, a fight ensues, during which both of them are injured, perhaps fatally, and at most one of them will remain in the territory and produce offspring. Figure 5.30 presents an example of a matrix describing the expected number of offspring of each type of animal in this situation. Note that the game in Figure 5.30 is symmetric; that is, both players have the same set of strategies $S_1 = S_2$, and their payoff functions satisfy $u_1(s_1, s_2) = u_2(s_2, s_1)$ for each $s_1, s_2 \in S$. This is an example of a “single-species” population, i.e., a population comprised of only one species of animal, where each individual can exhibit one of several possible behaviors.
                        Invader
                    Dove      Hawk
Defender   Dove     4, 4      2, 8
           Hawk     8, 2      1, 1
Figure 5.30 The expected number of offspring following an encounter between two individuals in the population
Our focus here is on the dynamic process that develops under conditions of many random encounters between individuals in the population, along with the appearance of random mutations. A mutation is an individual in the population characterized by a particular behavior: it may be of type dove, or type hawk. More generally, a mutation can be of type $x$ ($0 \leq x \leq 1$); that is, the individual³ will behave as a dove with probability $x$, and as a hawk with probability $1 - x$. The expected number of offspring of an individual who randomly encounters another individual in the population depends on both its type and the type of the individual it has encountered; to be more precise, the expected number depends on the probability $y$ that the encountered individual is a dove (or the probability $1 - y$ that the encountered individual is a hawk). This probability depends on the composition of the population, that is, on how many individuals there are of a given type in the population, and whether those types are “pure” doves or hawks, or mixed types $x$. Every population composition determines a unique real number $y$ ($0 \leq y \leq 1$), which is the probability that a randomly chosen individual in the population will behave as a dove (in its next encounter). Suppose now that a mutation, before it is born, can “decide” what type it will be (dove, hawk, or $x$ between 0 and 1). This “decision” and the subsequent interactions the mutation will have with the population can be described by the matrix in Figure 5.31.
³ For convenience, we will use the same symbol $x$ to stand both for the real number between 0 and 1 specifying the probability of being of type “dove,” and for the lottery $[x(\text{dove}), (1 - x)(\text{hawk})]$.
                        Population
                    Dove      Hawk
                    y         1 − y
Mutation   Dove     4, 4      2, 8
           Hawk     8, 2      1, 1
Figure 5.31 The Mutation–Population game
In this matrix, the columns and the rows represent behavioral types. If we treat the matrix as a game, the column player is the “population,” which is implementing a fixed mixed strategy [y(Dove), (1 − y)(Hawk)]; i.e., with probability y the column player will behave as a dove and with probability 1 − y he will behave as a hawk. The row player, who is the mutation, in contrast chooses its type. The expected payoff of a mutation from a random encounter is 4y + 2(1 − y) if it is a dove, 8y + (1 − y) if it is a hawk, and x(4y + 2(1 − y)) + (1 − x)(8y + (1 − y)) if it is of type x. For example, if the population is comprised of 80% doves (y = 0.8) and 20% hawks, and a new mutation is called upon to choose its type when it is born, what “should” the mutation choose? If the mutation chooses to be born a dove (x = 1), its expected number of offspring is 0.8 × 4 + 0.2 × 2 = 3.6, while if it chooses to be born a hawk (x = 0), its expected number of offspring is 0.8 × 8 + 0.2 × 1 = 6.6. It is therefore to the mutation’s advantage to be born a hawk. No mutation, of course, has the capacity to decide whether it will be a hawk or a dove, because these characteristics are either inherited, or the result of a random change in genetic composition. What happens in practice is that individuals who have the characteristics of a hawk will reproduce more than individuals who have the characteristics of a dove. Over the generations, the number of hawks will rise, and the ratio of doves to hawks will not be 80% : 20% (because the percentage of hawks will be increasing). A population in which the ratio of doves to hawks is 80% : 20% is therefore evolutionarily unstable. Similarly, if the population is comprised of 10% doves (y = 0.1) and 90% hawks, we have an evolutionarily unstable situation (because the percentage of doves will increase). It can be shown that only if the population is comprised of 20% doves and 80% hawks will the expected number of offspring of each type be equal. 
When $y^* = 0.2$,
\[ 0.2 \times 4 + 0.8 \times 2 = 2.4 = 0.2 \times 8 + 0.8 \times 1. \tag{5.130} \]
We therefore have $u_1(x, y^*) = 2.4$ for all $x \in [0, 1]$. Note that $y^* = 0.2$ is the symmetric equilibrium of the game in Figure 5.31, even when the player represented by the “population” can choose any mixed strategy. In other words, $u_1(x, y^*) \leq u_1(y^*, y^*)$ for each $x$, and $u_2(y^*, x) \leq u_2(y^*, y^*)$ for each $x$ (in fact, the expressions on both sides of the inequality sign in all these cases equal 2.4). Can we conclude that when the distribution of the population corresponds to the symmetric Nash equilibrium of the associated game, the population will be evolutionarily stable? The following example shows that to attain evolutionary stability, we need to impose a stronger condition that takes into account encounters between two mutations. ◭
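The arithmetic of Example 5.48 can be reproduced with a short sketch. It computes the expected number of offspring of a type-$x$ mutation in a population that behaves as a dove with probability $y$, and solves for the population composition at which doves and hawks do equally well. The names `PAYOFF`, `offspring`, and `indifference_point` are hypothetical; the payoffs are the row player's entries of Figure 5.31.

```python
from fractions import Fraction

# Row player's payoffs in Figure 5.31: PAYOFF[own behavior][encountered behavior].
PAYOFF = {"dove": {"dove": 4, "hawk": 2},
          "hawk": {"dove": 8, "hawk": 1}}


def offspring(x, y):
    """Expected offspring of a type-x mutation (dove with probability x)
    in a population that behaves as a dove with probability y."""
    as_dove = y * PAYOFF["dove"]["dove"] + (1 - y) * PAYOFF["dove"]["hawk"]
    as_hawk = y * PAYOFF["hawk"]["dove"] + (1 - y) * PAYOFF["hawk"]["hawk"]
    return x * as_dove + (1 - x) * as_hawk


def indifference_point():
    """The y at which doves and hawks have the same expected number of offspring."""
    d, h = PAYOFF["dove"], PAYOFF["hawk"]
    # Solve y*d[dove] + (1-y)*d[hawk] = y*h[dove] + (1-y)*h[hawk] for y.
    return Fraction(h["hawk"] - d["hawk"],
                    (d["dove"] - d["hawk"]) - (h["dove"] - h["hawk"]))


y80 = Fraction(4, 5)
print(offspring(1, y80), offspring(0, y80))  # dove vs. hawk in an 80% dove population
y_star = indifference_point()
print(y_star, offspring(Fraction(1, 3), y_star))
```

At the indifference point every type $x$ obtains the same expected number of offspring, which is why the third argument in the last call can be any value in $[0, 1]$.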
Example 5.49 Consider the situation shown in Figure 5.32, in which the payoffs in each encounter are different from the ones above.
                        Population
                    Dove      Hawk
                    y         1 − y
Mutation   Dove     4, 4      2, 2
           Hawk     2, 2      2, 2
Figure 5.32 The payoff matrix in a symmetric game
This game has two Nash equilibria, (Dove, Dove) and (Hawk, Hawk). The former corresponds to a population comprised solely of doves, and the second to a population comprised solely of hawks. When the population is entirely doves (y = 1), the expected number of offspring of type dove is 4, and the expected number of offspring of type hawk is 2, and therefore hawk mutations disappear over the generations, and the dove population remains stable. When the population is entirely hawks (y = 0), the expected number of offspring, of either type, is 2. But in this case, as long as the percentage of doves born is greater than 0, it is to a mutation’s advantage to be born a dove. This is because the expected number of offspring of a mutation that is born a hawk is 2, but the expected number of offspring of a mutation that is born a dove is 2(1 − ε) + 4ε = 2 + 2ε, where ε is the percentage of doves in the population (including mutations). In other words, when there are random changes in the composition of the population, it is to a mutation’s advantage to be born a dove, because its expected number of offspring will be slightly higher than the expected number of offspring of a hawk. After a large number of generations have passed, the doves will form a majority of the population. This shows that a population comprised solely of hawks is not evolutionarily stable. What happens if the population is comprised of doves, but many mutations occur, and the percentage of hawks in the population becomes ε? By a calculation similar to the one above, the expected number of offspring of a hawk is 2, while the expected number of offspring of a dove is 4 − 2ε. As long as ε < 1, the expected number of offspring of a dove will be greater than that of a hawk, and hence the percentage of hawks in the population will decrease. This shows that a population comprised entirely of doves is evolutionarily stable. If the population is comprised solely of hawks, where can a dove mutation come from? 
Such a mutation can arise randomly, as the result of a genetic change that occurs in an individual in the population. In general, even when a particular type is entirely absent from a population, in order to check whether the population is evolutionarily stable it is necessary to check what would happen if the absent type were to appear “ab initio.” ◭
We will limit our focus in this section to two-player symmetric games. We will also assume that payoffs are nonnegative, since the payoffs in these games represent the expected number of offspring, which cannot be a negative number. Examples 5.48 and 5.49 lead to the following definition of evolutionarily stable strategy.

Definition 5.50 A mixed strategy $x^*$ in a two-player symmetric game is an evolutionarily stable strategy (ESS) if for every mixed strategy $x$ that differs from $x^*$ there exists $\varepsilon_0 = \varepsilon_0(x) > 0$ such that, for all $\varepsilon \in (0, \varepsilon_0)$,

$$(1-\varepsilon)\,u_1(x, x^*) + \varepsilon\,u_1(x, x) < (1-\varepsilon)\,u_1(x^*, x^*) + \varepsilon\,u_1(x^*, x). \tag{5.131}$$
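Inequality (5.131) can be checked numerically for a given mutation and a given proportion ε. The following is a minimal sketch, assuming that the row player's payoffs are stored in a matrix `A` so that u1(x, y) is the bilinear form xᵀAy; the matrix below encodes the payoffs described in Example 5.49, in the order (Dove, Hawk). The names `u1` and `mutant_is_worse_off` are illustrative and do not come from the text. Note that the sketch tests a single value of ε, whereas the definition asks for the inequality to hold for all sufficiently small ε.

```python
from fractions import Fraction

def u1(A, x, y):
    """Expected payoff x^T A y of a player using mixed strategy x against y."""
    return sum(x[i] * A[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

def mutant_is_worse_off(A, x_star, x, eps):
    """Return True if inequality (5.131) holds for mutation x at proportion eps."""
    mutant = (1 - eps) * u1(A, x, x_star) + eps * u1(A, x, x)
    normal = (1 - eps) * u1(A, x_star, x_star) + eps * u1(A, x_star, x)
    return mutant < normal

# Row player's payoffs as described in Example 5.49, in the order (Dove, Hawk).
A = [[4, 2],
     [2, 2]]
dove, hawk = [1, 0], [0, 1]
eps = Fraction(1, 100)

# A hawk mutation in a population of doves, then a dove mutation among hawks.
print(mutant_is_worse_off(A, dove, hawk, eps))
print(mutant_is_worse_off(A, hawk, dove, eps))
```

Exact rational arithmetic (`Fraction`) is used so that the strict inequality is not affected by floating-point rounding.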
The biological interpretation of this definition is as follows. Since mutations occur in nature on a regular basis, we are dealing with populations mostly composed of “normal” individuals, with a minority of mutations. We will interpret $x^*$ as the distribution of types among the normal individuals. Consider a mutation making use of strategy $x$, and assume that the proportion of this mutation in the population is $\varepsilon$. Every individual of type $x$ will encounter a normal individual of type $x^*$ with probability $1-\varepsilon$, receiving in that case the payoff $u_1(x, x^*)$, and will encounter a mutation of type $x$ with probability $\varepsilon$, receiving in that case the payoff $u_1(x, x)$. Equation (5.131) therefore says that in a population in which the proportion of mutations is $\varepsilon$, the expected payoff of a mutation (the left-hand side of the inequality in Equation (5.131)) is smaller than the expected payoff of a normal individual (the right-hand side of the inequality in Equation (5.131)), and hence the proportion of mutations will decrease and eventually disappear over time, with the composition of the population returning to being mostly $x^*$. An “evolutionarily stable equilibrium” is therefore a mixed strategy of the column player that corresponds to a population that is immune to being overtaken by mutations.

In Example 5.49, Equation (5.131) holds for the dove strategy ($x^* = 1$) for every $\varepsilon < 1$ and it is therefore an evolutionarily stable strategy. In contrast, Equation (5.131) does not hold for the hawk strategy ($x^* = 0$), and the hawk strategy is therefore not evolutionarily stable. As we saw in that example, for each $x \neq 0$ (where $x$ denotes the proportion of doves in the mutation),

$$(1-\varepsilon)\,u_1(x, x^*) + \varepsilon\,u_1(x, x) = 2 + 2\varepsilon x^2 > 2 = (1-\varepsilon)\,u_1(x^*, x^*) + \varepsilon\,u_1(x^*, x).$$

By continuity, Equation (5.131) holds as a weak inequality for $\varepsilon = 0$. From this we deduce that every evolutionarily stable strategy defines a symmetric Nash equilibrium in the game.
In particular, the concept of an evolutionarily stable equilibrium constitutes a refinement of the concept of Nash equilibrium.

Theorem 5.51 If $x^*$ is an evolutionarily stable strategy in a two-player symmetric game, then $(x^*, x^*)$ is a symmetric Nash equilibrium in the game.

As Example 5.49 shows, the opposite direction does not hold: if $(x^*, x^*)$ is a symmetric Nash equilibrium, $x^*$ is not necessarily an evolutionarily stable strategy. In this example, the strategy vector (Hawk, Hawk) is a symmetric Nash equilibrium, but the Hawk strategy is not an evolutionarily stable strategy. The next theorem characterizes evolutionarily stable strategies.

Theorem 5.52 A strategy $x^*$ is evolutionarily stable if and only if for each $x \neq x^*$ only one of the following two conditions obtains:

$$u_1(x, x^*) < u_1(x^*, x^*), \tag{5.132}$$

or

$$u_1(x, x^*) = u_1(x^*, x^*) \quad \text{and} \quad u_1(x, x) < u_1(x^*, x). \tag{5.133}$$
The first condition states that if a mutation deviates from x ∗ , it will lose in its encounters with the normal population. The second condition says that if the payoff a mutation receives from encountering a normal individual is equal to that received by a normal individual encountering a normal individual, that mutation will receive a smaller payoff
when it encounters the same mutation than a normal individual would in encountering the mutation. In both cases the population of normal individuals will increase faster than the population of mutations.

Proof: We will first prove that if $x^*$ is an evolutionarily stable strategy then for each $x \neq x^*$ one of the conditions (5.132) or (5.133) holds. From Theorem 5.51, $(x^*, x^*)$ is a Nash equilibrium, and therefore $u_1(x, x^*) \le u_1(x^*, x^*)$ for each $x \neq x^*$. If for a particular $x$ neither of the conditions (5.132) or (5.133) holds, then $u_1(x, x^*) = u_1(x^*, x^*)$ and $u_1(x, x) \ge u_1(x^*, x)$, but then Equation (5.131) does not hold for this $x$ for any $\varepsilon > 0$, contradicting the fact that $x^*$ is an evolutionarily stable strategy. It follows that for each $x \neq x^*$ at least one of the two conditions (5.132) or (5.133) obtains.

Suppose next that for every mixed strategy $x \neq x^*$, at least one of the two conditions (5.132) or (5.133) obtains. We will prove that $x^*$ is an evolutionarily stable strategy. If condition (5.132) obtains, then Equation (5.131) obtains for all $\varepsilon < \frac{u_1(x^*, x^*) - u_1(x, x^*)}{4M}$, where $M$ is the upper bound of the payoffs: $M = \max_{s_1 \in S_1} \max_{s_2 \in S_2} u_1(s_1, s_2)$ (verify!). If condition (5.133) obtains then Equation (5.131) obtains for all $\varepsilon \in (0, 1]$. It follows that Equation (5.131) obtains in both cases, and therefore $x^*$ is an evolutionarily stable strategy.

If condition (5.132) obtains for each $x \neq x^*$, the equilibrium $(x^*, x^*)$ is called a strict equilibrium. The next corollary follows from Theorem 5.52.

Corollary 5.53 In a symmetric game, if $(x^*, x^*)$ is a strict symmetric equilibrium then $x^*$ is an evolutionarily stable equilibrium.

Indeed, if $(x^*, x^*)$ is a strict symmetric equilibrium, then condition (5.132) holds for every $x \neq x^*$.
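The step marked “verify!” in the proof of Theorem 5.52 can be checked along the following lines. This is one possible argument, not necessarily the one the authors have in mind; it relies on the standing assumption that payoffs are nonnegative, and the shorthand $d$ is introduced here only for convenience.

```latex
Let $d := u_1(x^*, x^*) - u_1(x, x^*)$, so that $d > 0$ by condition (5.132).
Rearranging, inequality (5.131) is equivalent to
\[
  (1-\varepsilon)\, d \;>\; \varepsilon \bigl( u_1(x, x) - u_1(x^*, x) \bigr).
\]
Since all payoffs lie in $[0, M]$, we have $u_1(x, x) - u_1(x^*, x) \le M$
and $0 < d \le M$ (in particular $M > 0$).
If $0 < \varepsilon < \dfrac{d}{4M}$, then $\varepsilon < \tfrac14$ and
$\varepsilon M < \tfrac{d}{4}$, hence
\[
  (1-\varepsilon)\, d \;>\; \tfrac34\, d \;>\; \tfrac14\, d \;>\; \varepsilon M
  \;\ge\; \varepsilon \bigl( u_1(x, x) - u_1(x^*, x) \bigr),
\]
which is the required inequality.
```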
Theorem 5.52 and Corollary 5.53 yield a method for finding evolutionarily stable strategies: find all symmetric equilibria in the game, and for each one of them, determine whether or not it is a strict equilibrium. Every strict symmetric equilibrium defines an evolutionarily stable strategy. For every Nash equilibrium that is not strict, check whether condition (5.133) obtains for each x different from x ∗ for which condition (5.132) does not obtain (hence necessarily u1 (x, x ∗ ) = u1 (x ∗ , x ∗ )). Example 5.48 (Continued) Recall that the payoff function in this example is as shown in Figure 5.33.
                       Population
                     Dove      Hawk
Mutation    Dove     4, 4      2, 8
            Hawk     8, 2      1, 1

Figure 5.33 The expected number of offspring from encounters between two individuals in Example 5.48

The symmetric mixed equilibrium is $([\tfrac15(\text{Dove}), \tfrac45(\text{Hawk})], [\tfrac15(\text{Dove}), \tfrac45(\text{Hawk})])$. The proportion of doves at equilibrium is $x^* = \tfrac15$. Denote by $x$ the proportion of doves in a mutation. Since the equilibrium is completely mixed, each of the two pure strategies yields the same expected payoff, and therefore $u_1(x, x^*) = u_1(x^*, x^*)$ for all $x \neq x^*$. To check whether $[\tfrac15(\text{Dove}), \tfrac45(\text{Hawk})]$ is an evolutionarily stable strategy, we need to check whether condition (5.133) obtains; that is, we need to check whether $u_1(x, x) < u_1(x^*, x)$ for every $x \neq x^*$. This inequality can be written as

$$4x^2 + 2x(1-x) + 8(1-x)x + (1-x)^2 < \tfrac15 \cdot 4x + \tfrac15 \cdot 2(1-x) + \tfrac45 \cdot 8x + \tfrac45 (1-x),$$

which can be simplified to

$$(5x - 1)^2 > 0, \tag{5.134}$$

and this inequality obtains for each $x$ different from $\tfrac15$. We have thus proved that $[\tfrac15(\text{Dove}), \tfrac45(\text{Hawk})]$ is an evolutionarily stable strategy.

This game has two additional asymmetric Nash equilibria: (Dove, Hawk) and (Hawk, Dove). These equilibria do not contribute to the search for evolutionarily stable equilibria, since Theorem 5.51 relates evolutionarily stable equilibria solely to symmetric equilibria. ◭
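The calculation in Example 5.48 can be reproduced with exact arithmetic. The sketch below is illustrative only: it encodes the row player's payoffs of Figure 5.33 in a matrix `A`, represents a strategy by its proportion of doves, and checks the indifference property and condition (5.133) on a finite grid of mutations (a grid check is evidence, not a proof). The helper name `u1` and the grid size are assumptions made for this example.

```python
from fractions import Fraction

# Row player's payoffs from Figure 5.33, in the order (Dove, Hawk).
A = [[4, 2],
     [8, 1]]

def u1(p, q):
    """Expected payoff of a strategy with dove proportion p against dove proportion q."""
    x, y = [p, 1 - p], [q, 1 - q]
    return sum(x[i] * A[i][j] * y[j] for i in range(2) for j in range(2))

x_star = Fraction(1, 5)
grid = [Fraction(k, 100) for k in range(101)]  # candidate mutations x

# Every x earns the same payoff against x*, since the equilibrium is completely mixed.
indifferent = all(u1(x, x_star) == u1(x_star, x_star) for x in grid)

# Condition (5.133): u1(x, x) < u1(x*, x) for every x different from x*.
condition_5_133 = all(u1(x, x) < u1(x_star, x) for x in grid if x != x_star)

# The gap u1(x*, x) - u1(x, x) equals (5x - 1)^2 / 5, in line with (5.134).
gap_matches = all(u1(x_star, x) - u1(x, x) == (5 * x - 1) ** 2 / 5 for x in grid)

print(indifferent, condition_5_133, gap_matches)
```

All three checks evaluate to `True` on this grid, in agreement with the conclusion that the mixed strategy with one-fifth doves is evolutionarily stable.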
Example 5.54 Consider another version of the Hawk–Dove game, in which the payoffs are as shown in Figure 5.34. This game has three symmetric equilibria: two pure equilibria, (Dove, Dove) and (Hawk, Hawk), and one mixed, $([\tfrac12(\text{Dove}), \tfrac12(\text{Hawk})], [\tfrac12(\text{Dove}), \tfrac12(\text{Hawk})])$. The pure equilibria (Dove, Dove) and (Hawk, Hawk) are strict equilibria, and hence the two pure strategies Dove and Hawk are evolutionarily stable strategies (Corollary 5.53).

                       Population
                     Dove      Hawk
Mutation    Dove     4, 4      1, 3
            Hawk     3, 1      2, 2

Figure 5.34 The expected number of offspring in encounters between two individuals in Example 5.54

The strategy $x^* = [\tfrac12(\text{Dove}), \tfrac12(\text{Hawk})]$ is not evolutionarily stable. To see this, denote $x = [1(\text{Dove}), 0(\text{Hawk})]$. Then $u_1(x^*, x^*) = 2\tfrac12 = u_1(x, x^*)$, and $u_1(x, x) = 4 > 3\tfrac12 = u_1(x^*, x)$. From Theorem 5.52 it follows that the strategy $[\tfrac12(\text{Dove}), \tfrac12(\text{Hawk})]$ is not evolutionarily stable.

We can conclude from this that the population would be stable against mutations if the population were comprised entirely of doves or entirely of hawks. Any other composition of the population would not be stable against mutations. In addition, if the percentage of doves is greater than 50%, doves will reproduce faster than hawks and take over the population. On the other hand, if the percentage of doves is less than 50%, doves will reproduce more slowly than hawks, and eventually disappear from the population. If the percentage of doves is exactly 50%, as a result of mutations or random changes in the population stemming from variability in the number of offspring, the percentage of doves will differ from 50% in one of the subsequent generations, and then one of the two types will take over the population.

Although in this example a population composed entirely of doves reproduces at twice the rate of a population composed entirely of hawks, both populations are evolutionarily stable. ◭
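The claim about which type reproduces faster on either side of 50% can be illustrated by computing the expected number of offspring of each type as a function of the proportion of doves. The following is a small sketch under the assumption that an individual meets a dove with probability `y` and a hawk with probability `1 - y`; the function name `offspring` is introduced here for illustration, and the payoffs are those of Figure 5.34.

```python
from fractions import Fraction

# Row player's payoffs from Figure 5.34, in the order (Dove, Hawk).
A = [[4, 1],
     [3, 2]]

def offspring(y):
    """Expected offspring of a dove and of a hawk when the proportion of doves is y."""
    dove = A[0][0] * y + A[0][1] * (1 - y)
    hawk = A[1][0] * y + A[1][1] * (1 - y)
    return dove, hawk

for y in (Fraction(2, 5), Fraction(1, 2), Fraction(3, 5)):
    dove, hawk = offspring(y)
    print(f"doves: {y}, dove offspring: {dove}, hawk offspring: {hawk}")
```

At 40% doves the hawks have more offspring (12/5 against 11/5), at 50% both types have 5/2, and at 60% the doves have more (14/5 against 13/5), consistent with the discussion above.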
Since Nash’s Theorem (Theorem 5.10, page 151) guarantees the existence of a Nash equilibrium, an interesting question arises: does an evolutionarily stable strategy always exist? The answer is negative. It may well happen that an evolutionary process has no evolutionarily stable strategies. The next example, which is similar to Rock, Paper, Scissors, is taken from Maynard Smith [1982].
Example 5.55 Consider the symmetric game in which each player has the three pure strategies appearing in Figure 5.35.

                                  Player II
                        Rock        Paper       Scissors
Player I    Rock        2/3, 2/3    0, 1        1, 0
            Paper       1, 0        2/3, 2/3    0, 1
            Scissors    0, 1        1, 0        2/3, 2/3

Figure 5.35 A game without an evolutionarily stable strategy

This game has only one Nash equilibrium (Exercise 5.70), which is symmetric, in which the players play the mixed strategy:

$$x^* = \left[\tfrac13(\text{Rock}), \tfrac13(\text{Paper}), \tfrac13(\text{Scissors})\right]. \tag{5.135}$$

The corresponding equilibrium payoff is $u_1(x^*, x^*) = \tfrac59$. We want to show that there is no evolutionarily stable strategy in this game. Since every evolutionarily stable strategy defines a symmetric Nash equilibrium, to ascertain that there is no evolutionarily stable strategy it suffices to check that the strategy $x^*$ is not an evolutionarily stable strategy.

The strategy $x^*$ is completely mixed, and hence every strategy yields the same payoff against it: $u_1(x, x^*) = u_1(x^*, x^*)$ for all $x \neq x^*$. Consider a mutation $x = [1(\text{Rock}), 0(\text{Paper}), 0(\text{Scissors})]$; condition (5.133) does not obtain for this mutation. To see this, note that $u_1(x, x) = \tfrac23$, while $u_1(x^*, x) = \tfrac29 + \tfrac13 = \tfrac59$, and hence $u_1(x, x) > u_1(x^*, x)$.

It is interesting to note that a biological system in which the number of offspring is given by the table in Figure 5.35 and the initial distribution of the population is $[\tfrac13(\text{Rock}), \tfrac13(\text{Paper}), \tfrac13(\text{Scissors})]$ will never attain population stability, and instead will endlessly cycle through population configurations (see Hofbauer and Sigmund [2003] or Zeeman [1980]). If, for example, through mutation the proportion of Rocks in the population were to increase slightly, their relative numbers would keep rising, up to a certain point. At that point, the proportion of Papers would rise, until that process too stopped, with the proportion of Scissors then rising. But at a certain point the rise in the relative numbers of Scissors would stop, with Rocks then increasing, and the cycle would repeat endlessly. Analyzing the evolution of such systems is accomplished using tools from the theory of dynamic processes. The interested reader is directed to Hofbauer and Sigmund [2003]. ◭
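The payoffs quoted in Example 5.55 can be recomputed directly from Figure 5.35. The sketch below is illustrative: the matrix `A` holds Player I's payoffs in the order (Rock, Paper, Scissors), and the helper `u1` is a name chosen for this example rather than notation from the text.

```python
from fractions import Fraction

third, two_thirds = Fraction(1, 3), Fraction(2, 3)

# Player I's payoffs from Figure 5.35, in the order (Rock, Paper, Scissors).
A = [[two_thirds, 0, 1],
     [1, two_thirds, 0],
     [0, 1, two_thirds]]

def u1(x, y):
    """Expected payoff of mixed strategy x against mixed strategy y."""
    return sum(x[i] * A[i][j] * y[j] for i in range(3) for j in range(3))

x_star = [third, third, third]
rock = [1, 0, 0]

print(u1(x_star, x_star))                # equilibrium payoff
print(u1(rock, x_star))                  # Rock earns the same against x*
print(u1(rock, rock), u1(x_star, rock))  # the comparison in condition (5.133)
```

The first two values coincide (both 5/9), while `u1(rock, rock)` is 2/3 and exceeds `u1(x_star, rock)`, which is 5/9, so condition (5.133) fails for the Rock mutation.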
5.9 Remarks
Exercise 5.13 is based on Alon, Brightwell, Kierstead, Kostochka, and Winkler [2006]. Exercise 5.27 is based on a discovery due to Lloyd Shapley, which indicates that the equilibrium concept has disadvantages (in addition to its advantages). A generalization of this result appears in Shapley [1994]. The Inspector Game in Exercise 5.28 is a special case of an example in Maschler [1966b], in which r on-site inspection teams may be sent, and there are n possible dates on which the Partial Test Ban Treaty can be abrogated. For a generalization of this model to the case in which there are several detectors, with varying probabilities of detecting what they are looking for, see Maschler [1967]. The interested reader can find a survey of several alternative models for the Inspector Game in Avenhaus, von Stengel, and Zamir [2002]. Exercise 5.29 is based on Biran and Tauman [2007]. Exercise 5.33 is based on an example in Diekmann [1985]. Exercise 5.34 is a variation of a popular lottery game conducted in Sweden by the Talpa Corporation. Exercise 5.44 is taken from Lehrer, Solan, and Viossat [2007]. Exercise 5.51 is from Peleg [1969]. Parts of Exercise 5.60 are taken from Altman and Solan [2006]. The authors thank Uzi Motro for reading and commenting on Section 5.8 and for suggesting Exercise 5.71. We also thank Avi Shmida, who provided us with the presentation of Exercise 5.72.
5.10 Exercises
5.1 Prove that if $S$ is a finite set then $\Delta(S)$ is a convex and compact set.

5.2 Let $A \subseteq \mathbb{R}^n$ and $B \subseteq \mathbb{R}^m$ be two compact sets. Prove that the product set $A \times B \subseteq \mathbb{R}^{n+m}$ is a compact set.

5.3 Let $A \subseteq \mathbb{R}^n$ and $B \subseteq \mathbb{R}^m$ be two convex sets. Prove that the product set $A \times B \subseteq \mathbb{R}^{n+m}$ is a convex set.

5.4 Show that every multilinear function $f : \Sigma \to \mathbb{R}$ is continuous.

5.5 Prove that for every player $i$ the set of extreme points of player $i$'s collection of mixed strategies is his set of pure strategies.

5.6 Prove that every two-player zero-sum game over the unit square with bilinear payoff functions is the extension to mixed strategies of a two-player game in which each player has two pure strategies.

5.7 Show that for every vector $\sigma_{-i}$ of mixed strategies of the other players, player $i$ has a best reply that is a pure strategy.

5.8 Answer the following questions for each of the following games, which are all two-player zero-sum games. As is usually the case in this book, Player I is the row player and Player II is the column player.
(a) Write out the mixed extension of each game.
(b) Find the value in mixed strategies, and all the optimal mixed strategies of each of the two players.

Game A
         L     R
   T    −1    −4
   B    −3     3

Game B
         L     R
   T     5     8
   B     5     1

Game C
         L     R
   T     5     4
   B     2     3

Game D
         L     R
   T     4     2
   B     2     9

Game E
         L     R
   T     5     4
   B     5     6

Game F
         L     R
   T     7     7
   B     3    10
5.9 Find the value of the game in mixed strategies and all the optimal strategies of both players in each of the following two-player zero-sum games, where Player I is the row player and Player II is the column player.

Game A
         L     R
   d    15    −8
   c    10    −4
   b     5    −2
   a    −3     8

Game B
         L     R
   T     2     6
   M     5     5
   B     7     4

Game C
         a     b     c     d
   T     5     3     4     0
   B    −3     2    −5     6

Game D
         L     M     R
   T     6     4     3
   B     3     7     9

Game E
         L     M     R
   T     6     4     3
   B     3     7     9
5.10 In each of the following items, find a two-player game in strategic form in which each player has two pure strategies, such that in the mixed extension of the game the payoff functions of the players are the specified functions. (Note that the games in parts (a) and (b) are zero-sum, but that the games in parts (c) and (d) are not zero-sum.) (a) U (x, y) = 5xy − 2x + 6y − 2. (b) U (x, y) = −2xy + 4x − 7y.
(c) $U_1(x, y) = 3xy - 4x + 5$, $U_2(x, y) = 7xy + 7x - 8y + 12$.
(d) $U_1(x, y) = 3xy - 3x + 3y - 5$, $U_2(x, y) = 7x - 8y + 12$.

5.11 For each of the graphs appearing in Figure 5.9 (page 157) find a two-player zero-sum game such that the graph of the functions $(U(x, s_{II}))_{s_{II} \in S_{II}}$ is the same as the graph in the figure. For each of these games, compute the value in mixed strategies, and all the optimal strategies of Player I.

5.12 A (finite) square matrix $A = (a_{i,j})_{i,j}$ is called anti-symmetric if $a_{i,j} = -a_{j,i}$ for all $i$ and $j$. Prove that if the payoff matrix of a two-player zero-sum game is anti-symmetric, then the value of the game in mixed strategies is 0. In addition, Player I's set of optimal strategies is identical to that of Player II, when we identify Player I's pure strategy given by row $k$ with Player II's pure strategy given by column $k$.

5.13 Let $G = (V, E)$ be a directed graph, where $V$ is a set of vertices, and $E$ is a set of edges. A directed edge from vertex $x$ to vertex $y$ is denoted by $(x, y)$. Suppose that the graph is complete, i.e., for every pair of vertices $x, y \in V$, either $(x, y) \in E$ or $(y, x) \in E$, but not both. In particular, $(x, x) \in E$ for all $x \in V$. In this exercise, we will prove that there exists a distribution $q \in \Delta(V)$ satisfying

$$\sum_{\{y \in V \colon (y, x) \in E\}} q(y) \ge \tfrac12, \quad \forall x \in V. \tag{5.136}$$

(a) Define a two-player zero-sum game in which the set of pure strategies of the two players is $V$, and the payoff function is defined as follows:

$$u(x, y) = \begin{cases} 0 & x = y, \\ 1 & x \neq y,\ (x, y) \in E, \\ -1 & x \neq y,\ (x, y) \notin E. \end{cases} \tag{5.137}$$

Prove that the payoff matrix of this game is an anti-symmetric matrix, and, using Exercise 5.12, deduce that its value in mixed strategies is 0.
(b) Show that every optimal strategy $q$ of Player II in this game satisfies Equation (5.136).
5.14 A mixed strategy $\sigma_i$ of player $i$ is called weakly dominated (by a mixed strategy) if it is weakly dominated in the mixed extension of the game: there exists a mixed strategy $\widehat{\sigma}_i$ of player $i$ satisfying
(a) For each strategy $s_{-i} \in S_{-i}$ of the other players:

$$U_i(\sigma_i, s_{-i}) \le U_i(\widehat{\sigma}_i, s_{-i}). \tag{5.138}$$

(b) There exists a strategy $t_{-i} \in S_{-i}$ of the other players for which

$$U_i(\sigma_i, t_{-i}) < U_i(\widehat{\sigma}_i, t_{-i}). \tag{5.139}$$
Prove that the set of weakly dominated mixed strategies is a convex set.

5.15 Suppose that a mixed strategy $\sigma_i$ of player $i$ strictly dominates another of his mixed strategies, $\widehat{\sigma}_i$. Prove or disprove each of the following claims:
(a) Player $i$ has a pure strategy $s_i \in S_i$ satisfying: (i) $\widehat{\sigma}_i(s_i) > 0$ and (ii) strategy $s_i$ is not chosen by player $i$ in any equilibrium.
(b) For each equilibrium $\sigma^* = (\sigma_i^*)_{i \in N}$ player $i$ has a pure strategy $s_i \in S_i$ satisfying (i) $\widehat{\sigma}_i(s_i) > 0$ and (ii) $\sigma_i^*(s_i) = 0$.
5.16 Suppose player i has a pure strategy si that is chosen with positive probability in each of his maxmin strategies. Prove that si is not weakly dominated by any other strategy (pure or mixed). 5.17 Suppose player i has a pure strategy si that is chosen with positive probability in one of his maxmin strategies. Is si chosen with positive probability in each of player i’s maxmin strategies? Prove this claim, or provide a counterexample.
5.18 Suppose player $i$ has a pure strategy $s_i$ that is not weakly dominated by any of his other pure strategies. Is $s_i$ chosen with positive probability in one of player $i$'s maxmin strategies? Prove this claim, or provide a counterexample.

5.19 Let $(a_{i,j})_{1 \le i,j \le n}$ be nonnegative numbers satisfying $\sum_{j \neq i} a_{i,j} = a_{i,i}$ for all $i$. Julie and Sam are playing the following game. Julie writes down a natural number $i$, $1 \le i \le n$, on a slip of paper. Sam does not see the number that Julie has written. Sam then guesses what number Julie has chosen, and writes his guess, which is a natural number $j$, $1 \le j \le n$, on a slip of paper. The two players simultaneously show each other the numbers they have written down. If Sam has guessed correctly, Julie pays him $a_{i,i}$ dollars, where $i$ is the number that Julie chose (and that Sam correctly guesses). If Sam was wrong in his guess ($i \neq j$), Sam pays Julie $a_{i,j}$ dollars. Depict this game as a two-player zero-sum game in strategic form, and prove that the value in mixed strategies of the game is 0.

5.20 Consider the following two-player zero-sum game.

                      Player II
                    L     C     R
Player I     T      3    −3     0
             M      2     6     4
             B      2     5     6

(a) Find a mixed strategy of Player I that guarantees him the same payoff against any pure strategy of Player II.
(b) Find a mixed strategy of Player II that guarantees him the same payoff against any pure strategy of Player I.
(c) Prove that the two strategies you found in (a) and (b) are the optimal strategies of the two players.
(d) Generalize this result: Suppose a two-player zero-sum game is represented by an $n \times m$ matrix.⁴ Suppose each player has an equalizing strategy, meaning a strategy guaranteeing him the same payoff against any pure strategy his opponent may play. Prove that any equalizing strategy is an optimal strategy.
(e) Give an example of a two-player zero-sum game in which one of the players has an equalizing strategy that is not optimal. Why is this not a contradiction to (d)?

5.21 In the following payoff matrix of a two-person zero-sum game, no player has an optimal pure strategy.

                   Player II
                    L     R
Player I     T      a     b
             B      c     d

What inequalities must the numbers $a, b, c, d$ satisfy? Find the value in mixed strategies of this game.

5.22 Prove that in any $n$-person game, at Nash equilibrium, each player's payoff is greater than or equal to his maxmin value.

5.23 The goal of this exercise is to prove that in a two-player zero-sum game, each player's set of optimal strategies is a convex set. Let $G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ be a two-player zero-sum game in which $N = \{I, II\}$. For each pair of mixed strategies $\sigma_I = [p_I^1(s_I^1), \ldots, p_I^{m_I}(s_I^{m_I})]$ and $\widehat{\sigma}_I = [\widehat{p}_I^1(s_I^1), \ldots, \widehat{p}_I^{m_I}(s_I^{m_I})]$, and each real number $\alpha \in [0, 1]$ in the unit interval, define a vector $q_I = (q_I^j)_{j=1}^{m_I}$ as follows:

$$q_I^j = \alpha p_I^j + (1 - \alpha)\,\widehat{p}_I^j. \tag{5.140}$$

(a) Prove that $q_I = (q_I^j)_{j=1}^{m_I}$ is a probability distribution.
(b) Define a mixed strategy $\tau_I$ of Player I as follows:

$$\tau_I = \left[q_I^1(s_I^1), \ldots, q_I^{m_I}(s_I^{m_I})\right]. \tag{5.141}$$

Prove that for every mixed strategy $\sigma_{II}$ of Player II:

$$U(\tau_I, \sigma_{II}) = \alpha U(\sigma_I, \sigma_{II}) + (1 - \alpha) U(\widehat{\sigma}_I, \sigma_{II}). \tag{5.142}$$

(c) We say that a strategy $\sigma_I$ of Player I guarantees payoff $v$ if $U(\sigma_I, \sigma_{II}) \ge v$ for every strategy $\sigma_{II}$ of Player II. Prove that if $\sigma_I$ and $\widehat{\sigma}_I$ guarantee Player I payoff $v$, then $\tau_I$ also guarantees Player I payoff $v$.
(d) Deduce that if $\sigma_I$ and $\widehat{\sigma}_I$ are optimal strategies of Player I, then $\tau_I$ is also an optimal strategy of Player I.
(e) Deduce that Player I's set of optimal strategies is a convex set in $\Delta(S_I)$.

⁴ This means that the matrix has $n$ rows (pure strategies of Player I) and $m$ columns (pure strategies of Player II).
5.24 The goal of this exercise is to prove that in a two-player zero-sum game, each player's set of optimal strategies, which we proved is a convex set in Exercise 5.23, is also a compact set. Let $(\sigma_I^k)_{k \in \mathbb{N}}$ be a sequence of optimal strategies of Player I, and for each $k \in \mathbb{N}$ denote $\sigma_I^k = [p_I^{k,1}(s_I^1), \ldots, p_I^{k,m_I}(s_I^{m_I})]$. Suppose that for each $j = 1, 2, \ldots, m_I$, the limit $p_I^{*,j} = \lim_{k \to \infty} p_I^{k,j}$ exists. Prove the following claims:
(a) The vector $(p_I^{*,j})_{j=1}^{m_I}$ is a probability distribution over $S_I$.
(b) Define a mixed strategy $\sigma_I^*$ as follows:

$$\sigma_I^* = \left[p_I^{*,1}(s_I^1), \ldots, p_I^{*,m_I}(s_I^{m_I})\right]. \tag{5.143}$$

Prove that $\sigma_I^*$ is also an optimal strategy of Player I.
(c) Deduce from this that Player I's set of optimal strategies is a compact subset of the set of mixed strategies $\Delta(S_I)$.

5.25 For each of the following games, where Player I is the row player and Player II is the column player:
(a) Write out the mixed extension of the game.
(b) Compute all the equilibria in mixed strategies.

Game A
         L         R
   T     1, 1      4, 0
   B     2, 10     3, 5

Game B
         L         R
   T     1, 2      2, 2
   B     0, 3      1, 1

Game C
         L         M         R
   T     1, 1      0, 2      2, 0
   B     0, 0      1, 0      −1, 3
5.26 For each of the following games, where Player I is the row player and Player II is the column player: (a) Find all the equilibria in mixed strategies, and all the equilibrium payoffs. (b) Find each player’s maxmin strategy. (c) What strategy would you advise each player to use in the game? L
R
T
5, 5
0, 8
B
8, 0
1, 1
L
R
T
9, 5
10, 4
B
8, 4
15, 6
Game A
L
R
T
5, 16
15, 8
B
16, 7
8, 15
Game B L
R
T
4, 12
5, 10
B
3, 16
6, 22
Game E
L
R
T
8, 3
10, 1
B
6, −6
3, 5
Game C L
R
T
2, 2
3, 3
B
4, 0
2, −2
Game F
Game D L
R
T
15, 3
15, 10
B
15, 4
15, 7
Game G
5.27 Consider the two-player game in the figure below, in which each player has three pure strategies.

                        Player II
                    L        C        R
Player I     T      0, 0     7, 6     6, 7
             M      6, 7     0, 0     7, 6
             B      7, 6     6, 7     0, 0

(a) Prove that $([\tfrac13(T), \tfrac13(M), \tfrac13(B)]; [\tfrac13(L), \tfrac13(C), \tfrac13(R)])$ is the game's unique equilibrium.
(b) Check that if Player I deviates to $T$, then Player II has a reply that leads both players to a higher payoff, relative to the equilibrium payoff. Why, then, will Player II not play that strategy?

5.28 The Inspector Game During the 1960s, within the framework of negotiations between the United States (US) and the Union of Soviet Socialist Republics (USSR) over nuclear arms limitations, a suggestion was raised that both countries commit to a moratorium on nuclear testing. One of the objections to this suggestion was the difficulty in supervising compliance with such a commitment. Detecting aboveground nuclear tests posed no problem, because it was easy to detect the radioactive fallout from a nuclear explosion conducted in the open. This was not true, however, with respect to underground tests, because it was difficult at the time to distinguish seismographically between an underground nuclear explosion and an earthquake. The US therefore suggested that in every case of suspicion that a nuclear test had been conducted, an inspection team be sent to perform on-site inspection. The USSR initially objected, regarding any inspection team sent by the US as a potential spy operation. At later stages in the negotiations, Soviet negotiators expressed readiness to accept three on-site inspections annually, while American negotiators demanded at least eight on-site inspections. The expected number of seismic events per year considered sufficiently strong to arouse suspicion was 300. The model presented in this exercise assumes the following:
• The USSR can potentially conduct underground nuclear tests on one of two possible distinct dates, labeled A and B, where B is the later date.
• The USSR gains nothing from choosing one of these dates over the other for conducting an underground nuclear test, and the US loses nothing if one date is chosen over another.
• The USSR gains nothing from conducting nuclear tests on both of these dates over its utility from conducting a test on only one date, and the US loses nothing if tests are conducted on both dates over its utility from conducting a test on only one date.
• The US may send an inspection team on only one of the two dates, A or B, but not on both.
• The utilities of the two countries from the possible outcomes are:
   • If the Partial Test Ban Treaty (PTBT) is violated by the USSR and the US does not send an inspection team: the US receives 0 and the USSR receives 1.
   • If the PTBT is violated by the USSR and the US sends an inspection team: the US receives 1 and the USSR receives 0.
   • If the PTBT is not violated, the US receives $\alpha$ and the USSR receives $\beta$, where $\alpha > 1$ and $0 < \beta < 1$ (whether or not the US sends an inspection team).

Answer the following questions:
(a) Explain why the above conditions are imposed on the values of $\alpha$ and $\beta$.
(b) Plot, in the space of the utilities of the players, the convex hull of the points $(0, 1)$, $(1, 0)$, and $(\alpha, \beta)$. The convex hull includes all the results of all possible lotteries conducted on pairs of actions undertaken by the players.
(c) List the pure strategies available to each of the two countries.
(d) Write down the matrix of the game in which the pure strategies permitted to the US (the row player) are:
   • A: Send an inspection team on date A
   • B: Send an inspection team on date B
and the pure strategies permitted to the USSR (the column player) are:
   • L: Conduct a nuclear test on date A
   • R: Do not conduct a nuclear test on date A. Conduct a nuclear test on date B, only if the US sent an inspection team on date A.
(e) Explain why the other pure strategies you wrote down in part (c) are either dominated by the strategies in part (d), or equivalent to them.
(f) Show that the game you wrote down in part (d) has only one equilibrium. Compute that equilibrium. Denote by $(v_I^*, v_{II}^*)$ the equilibrium payoff, and by $[x^*(A), (1 - x^*)(B)]$ the equilibrium strategy of the US.
(g) Add to the graph you sketched in part (b) the equilibrium payoff, and the payoff $U([x^*(A), (1 - x^*)(B)], R)$ (where $U = (U_I, U_{II})$ is the vector of the utilities of the two players). Show that the point $U([x^*(A), (1 - x^*)(B)], R)$ is located on the line segment connecting $(0, 1)$ with $(\alpha, \beta)$.
(h) Consider the following possible strategy of the US: play $[(x^* + \varepsilon)(A), (1 - x^* - \varepsilon)(B)]$, where $\varepsilon > 0$ is small, and commit to playing this mixed strategy.⁵ Show that the best reply of the USSR to this mixed strategy is to play strategy $R$. What is the payoff to the two players from the strategy vector $([(x^* + \varepsilon)(A), (1 - x^* - \varepsilon)(B)], R)$? Which of the two countries gains from this, relative to the equilibrium payoff?

⁵ This model in effect extends the model of a strategic game by assuming that one of the players has the option to commit to implementing a particular strategy. One way of implementing this would be to conduct a public spin of a roulette wheel in the United Nations building, and to commit to letting the result of the roulette spin determine whether an inspection team will be sent: if the result indicates that the US should send an inspection team on date A, the USSR will be free to deny entry to a US inspection team on date B, without penalty.
(i) Prove that the USSR can guarantee itself a payoff of $v_{II}^*$, regardless of the mixed strategy used by the US, when it plays its maxmin strategy. (j) Deduce from the last two paragraphs that, up to an order of ε, the US cannot expect to receive a payoff higher than the payoff it would receive from committing to play the strategy [(x∗ + ε)(A), (1 − x∗ − ε)(B)], assuming that the USSR makes no errors in choosing its strategy.

5.29 Suppose Country A constructs facilities for the development of nuclear weapons. Country B sends a spy ring to Country A to ascertain whether it is developing nuclear weapons, and is considering bombing the new facilities. The spy ring sent by Country B is of quality α: if Country A is developing nuclear weapons, Country B's spy ring will correctly report this with probability α, and with probability 1 − α it will report a false negative. If Country A is not developing nuclear weapons, Country B's spy ring will correctly report this with probability α, and with probability 1 − α it will report a false positive. Country A must decide whether or not to develop nuclear weapons, and Country B, after receiving its spy reports, must decide whether or not to bomb Country A's new facilities. The payoffs to the two countries appear in the following table.

                                   Country B
                             Bomb          Don't Bomb
Country A   Don't Develop    1/2, 1/2      3/4, 1
            Develop          0, 3/4        1, 0
(a) Depict this situation as a strategic-form game. Are there any dominating strategies in the game? (b) Verbally describe what it means to say that the quality of Country B's spy ring is α = 1/2. What if α = 1? (c) For each α ∈ [1/2, 1], find the game's set of equilibria. (d) What is the set of equilibrium payoffs as a function of α? What is the α at which Country A's maximal equilibrium payoff is obtained? What is the α at which Country B's maximal equilibrium payoff is obtained? (e) Assuming both countries play their equilibrium strategy, what is the probability that Country A will manage to develop nuclear weapons without being bombed?

5.30 Prove that in any two-player game,
\[ \max_{\sigma_I \in \Sigma_I} \min_{\sigma_{II} \in \Sigma_{II}} u_I(\sigma_I, \sigma_{II}) = \max_{\sigma_I \in \Sigma_I} \min_{s_{II} \in S_{II}} u_I(\sigma_I, s_{II}). \tag{5.144} \]
That is, given a mixed strategy of Player I, Player II can guarantee that Player I will receive the minimal possible payoff by playing a pure strategy, without needing to resort to a mixed strategy.
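The observation behind Exercise 5.30 can be checked numerically: for a fixed mixed strategy of Player I, Player I's expected payoff is a weighted average of his payoffs against Player II's pure strategies, so no mixed reply can push it below the worst pure reply. The following Python sketch illustrates this on a hypothetical payoff matrix `A` and a hypothetical strategy `x`; neither is taken from the text, and the check is an illustration rather than a proof.

```python
import random

def column_payoffs(A, x):
    """Player I's expected payoff against each pure column when he mixes rows with x."""
    return [sum(x[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]

def expected_payoff(A, x, y):
    """Player I's expected payoff when rows are mixed with x and columns with y."""
    return sum(y[j] * p for j, p in enumerate(column_payoffs(A, x)))

def random_mixed_strategy(m, rng):
    """Draw a random probability vector of length m."""
    weights = [rng.random() for _ in range(m)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical data, chosen only for illustration.
A = [[3, -1, 2],
     [0, 4, 1]]
x = [0.4, 0.6]

# Worst case for Player I over Player II's pure strategies.
best_pure = min(column_payoffs(A, x))

# Worst case found among many randomly drawn mixed strategies of Player II.
rng = random.Random(0)
samples = [expected_payoff(A, x, random_mixed_strategy(3, rng)) for _ in range(1000)]
print(best_pure, min(samples))
```

In this sketch the smallest sampled value never falls below `best_pure`, which is what Equation (5.144) leads one to expect.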
5.31 Let σ−i be a vector of mixed strategies of all players except for player i, in a strategic-form game. Let σi be a best reply of player i to σ−i. The support of σi is the set of all pure strategies given positive probability in σi (see Equation (5.64) on page 165). Answer the following questions.
(a) Prove that for any pure strategy si of player i in the support of σi,
\[ U_i(s_i, \sigma_{-i}) = U_i(\sigma_i, \sigma_{-i}). \tag{5.145} \]
(b) Prove that for any mixed strategy $\widehat{\sigma}_i$ of player i whose support is contained in the support of σi,
\[ U_i(\widehat{\sigma}_i, \sigma_{-i}) = U_i(\sigma_i, \sigma_{-i}). \tag{5.146} \]
(c) Deduce that player i's set of best replies to every mixed strategy of the other players σ−i is the convex hull of the pure strategies that give him a maximal payoff against σ−i. Recall that the convex hull of a set of points in a Euclidean space is the smallest convex set containing all of those points.

5.32 A game G = (N, (Si)i∈N, (ui)i∈N) is called symmetric if (a) each player has the same set of strategies: Si = Sj for each i, j ∈ N, and (b) the payoff functions satisfy
\[ u_i(s_1, s_2, \ldots, s_n) = u_j(s_1, \ldots, s_{i-1}, s_j, s_{i+1}, \ldots, s_{j-1}, s_i, s_{j+1}, \ldots, s_n) \tag{5.147} \]
for any vector of pure strategies s = (s1, s2, . . . , sn) ∈ S and for each pair of players i, j satisfying i < j. Prove that in every symmetric game there exists a symmetric equilibrium in mixed strategies: an equilibrium σ = (σi)i∈N satisfying σi = σj for each i, j ∈ N.

5.33 The Volunteer's Dilemma Ten people are arrested after committing a crime. The police lack sufficient resources to investigate the crime thoroughly. The chief investigator therefore presents the suspects with the following proposal: if at least one of them confesses, every suspect who has confessed will serve a one-year jail sentence, and all the rest will be released. If no one confesses to the crime, the police will continue their investigation, at the end of which each one of the suspects will receive a ten-year jail sentence. (a) Write down this situation as a strategic-form game, where the set of players is the set of people arrested, and the utility of each player (suspect) is 10 minus the number of years he spends in jail. (b) Find all the equilibrium points in pure strategies. What is the intuitive meaning of such an equilibrium, and under what conditions is it reasonable for such an equilibrium to be attained? (c) Find a symmetric equilibrium in mixed strategies. What is the probability that at this equilibrium no one volunteers to confess? (d) Suppose the number of suspects is not 10, but n. Find a symmetric equilibrium in mixed strategies. What is the limit, as n goes to infinity, of the probability that in a symmetric equilibrium no one volunteers? What can we conclude from this analysis for the topic of volunteering in large groups?
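One way to explore the Volunteer's Dilemma numerically is through the indifference condition of a symmetric mixed equilibrium. The sketch below is an illustration under stated assumptions rather than a worked solution: it assumes the utilities implied by the exercise (9 for a suspect who confesses, 10 for a released suspect, 0 for a ten-year sentence), assumes each suspect confesses independently with the same probability p, and solves for the p that makes confessing and staying silent equally good. All function and parameter names are hypothetical.

```python
def confess_probability(n, u_confess=9.0, u_free=10.0, u_convicted=0.0):
    """Probability p of confessing that makes a suspect indifferent, for n suspects.

    Staying silent yields u_free unless none of the other n - 1 suspects confesses,
    which happens with probability q = (1 - p) ** (n - 1). Indifference requires
    u_confess = u_free * (1 - q) + u_convicted * q.
    """
    q = (u_free - u_confess) / (u_free - u_convicted)
    return 1.0 - q ** (1.0 / (n - 1))

def silent_payoff(n, p, u_free=10.0, u_convicted=0.0):
    """Expected utility of staying silent when each other suspect confesses w.p. p."""
    q = (1.0 - p) ** (n - 1)
    return u_free * (1.0 - q) + u_convicted * q

def nobody_confesses_probability(n):
    """Probability that none of the n suspects confesses under the symmetric profile."""
    p = confess_probability(n)
    return (1.0 - p) ** n

for n in (2, 10, 100, 10**6):
    p = confess_probability(n)
    print(n, p, silent_payoff(n, p), nobody_confesses_probability(n))
```

Printing the last column for growing n gives a feel for the limit asked about in part (d).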
5.34 Consider the following lottery game, with n participants competing for a prize worth $M (M > 1). Every player may purchase as many numbers as he wishes in the range {1, 2, . . . , K}, at a cost of $1 per number. The set of all the numbers that have been purchased by only one of the players is then identified, and the winning number is the smallest number in that set. The (necessarily only) player who purchased that number is the lottery winner, receiving the full prize. If no number is purchased by only one player, no player receives a prize. (a) Write down every player's set of pure strategies and payoff function. (b) Show that a symmetric equilibrium exists, i.e., there exists an equilibrium in which every player uses the same mixed strategy. (c) For p1 ∈ (0, 1), consider the following mixed strategy σi(p1) of player i: with probability p1 purchase only the number 1, and with probability 1 − p1 do not purchase any number. What conditions must M, n, and p1 satisfy for the strategy vector in which every player i plays strategy σi(p1) to be a symmetric equilibrium? (d) Show that if at equilibrium there is a positive probability that player i will not purchase any number, then his expected payoff is 0. (e) Show that if M < n, meaning that the number of participants is greater than the value of the prize, then at equilibrium there is a positive probability that no player purchases a number. Conclude from this that at every symmetric equilibrium the expected payoff of every player is 0. (Hint: Show that if with probability 1 every player purchases at least one number, the expected number of natural numbers purchased by all the players together is greater than the value of the prize M, and hence there is a player whose expected payoff is negative.)

5.35 The set of equilibria is a subset of the product space Δ(S1) × Δ(S2) × · · · × Δ(Sn). Prove that it is a compact set. Is it also a convex set? If you answer yes, provide a proof; if you answer no, provide a counterexample.
5.36 Let $M_{n,m}$ be the space of matrices of order n × m representing two-player zero-sum games in which Player I has n pure strategies and Player II has m pure strategies. Prove that the function that associates with every matrix $A = (a_{ij}) \in M_{n,m}$ the value in mixed strategies of the game that it represents is continuous in $(a_{ij})$. Remark: The sequence of matrices $(A^k)_{k \in \mathbb{N}}$ in $M_{n,m}$, where $A^k = (a^k_{ij})$, converges to $A = (a_{ij})$ if
\[ a_{ij} = \lim_{k \to \infty} a^k_{ij}, \quad \forall i, j. \tag{5.148} \]
5.37 Let $A = (a_{ij})$ and $B = (b_{ij})$ be two n × m matrices representing two-player zero-sum games in strategic form. Prove that the difference between the value of A and the value of B is less than or equal to
\[ \max_{i=1,\ldots,n} \; \max_{j=1,\ldots,m} |a_{ij} - b_{ij}|. \tag{5.149} \]
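The bound in Exercise 5.37 can be illustrated on small examples. The Python sketch below is restricted to 2 × 2 matrices, where the value has a well-known closed form: if the pure maxmin and minmax coincide there is a saddle point; otherwise the value is (ad − bc)/(a + d − b − c). The matrix `A` is the one that appears in Exercise 5.57; the matrix `B` is a hypothetical perturbation of it. The helper names are assumptions made for this illustration, and the check is a numerical illustration, not a proof.

```python
from fractions import Fraction

def value_2x2(M):
    """Value in mixed strategies of the 2 x 2 zero-sum game M (payoffs to the row player)."""
    (a, b), (c, d) = [[Fraction(v) for v in row] for row in M]
    maxmin = max(min(a, b), min(c, d))   # best guaranteed payoff with pure rows
    minmax = min(max(a, c), max(b, d))   # best guaranteed cap with pure columns
    if maxmin == minmax:                 # saddle point in pure strategies
        return maxmin
    # No saddle point: both players mix, and the standard closed form applies.
    return (a * d - b * c) / (a + d - b - c)

def max_entry_gap(A, B):
    """Largest absolute difference between corresponding entries of A and B."""
    return max(abs(Fraction(A[i][j]) - Fraction(B[i][j]))
               for i in range(2) for j in range(2))

A = [[12, 8], [4, 16]]   # matrix from Exercise 5.57
B = [[13, 8], [4, 15]]   # hypothetical perturbation of A

gap = max_entry_gap(A, B)
difference = abs(value_2x2(A) - value_2x2(B))
print(value_2x2(A), value_2x2(B), difference, gap)
```

Exact rational arithmetic (`Fraction`) is used so that the saddle-point test is not affected by rounding.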
5.38 Find matrices A and B of order n × m representing two-player zero-sum games, such that the value of the matrix $C := \tfrac{1}{2}A + \tfrac{1}{2}B$ is less than the value of A and less than the value of B.
5.39 Let $\Gamma = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ and $\widehat{\Gamma} = (N, (S_i)_{i \in N}, (\widehat{u}_i)_{i \in N})$ be two strategic-form games with the same sets of pure strategies. Denote the maximal difference between the payoff functions of the two games by
\[ c = \max_{s \in S_1 \times \cdots \times S_n} \; \max_{i \in N} |u_i(s) - \widehat{u}_i(s)|. \tag{5.150} \]
We say that the set of equilibria of $\Gamma$ is close to the set of equilibria of $\widehat{\Gamma}$ if for every equilibrium $x^*$ of $\Gamma$ there is an equilibrium $\widehat{x}^*$ of $\widehat{\Gamma}$ such that
\[ |u_i(x^*) - \widehat{u}_i(\widehat{x}^*)| \le c, \quad \forall i \in N. \tag{5.151} \]
Find two games $\Gamma = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ and $\widehat{\Gamma} = (N, (S_i)_{i \in N}, (\widehat{u}_i)_{i \in N})$ such that the set of equilibria of $\Gamma$ is not close to the set of equilibria of $\widehat{\Gamma}$. Can such a phenomenon exist in two-player zero-sum games? (See Exercise 5.37.)
5.40 Let $\Gamma = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ and $\widehat{\Gamma} = (N, (S_i)_{i \in N}, (\widehat{u}_i)_{i \in N})$ be two strategic-form games with the same sets of players, and the same sets of pure strategies, such that $u_i(s) \ge \widehat{u}_i(s)$ for each strategy vector s ∈ S. Denote the multilinear extension of $\widehat{u}_i$ by $\widehat{U}_i$. Is it necessarily true that for each equilibrium $\sigma$ of $\Gamma$ there exists an equilibrium $\widehat{\sigma}$ of $\widehat{\Gamma}$ such that $U_i(\sigma) \ge \widehat{U}_i(\widehat{\sigma})$ for each player i ∈ N? In other words, when the payoffs increase, do the equilibrium payoffs also increase? Prove this claim, or find a counterexample.

5.41 Prove that in a two-player strategic-form game, the minmax value in mixed strategies of a player equals his maxmin value in mixed strategies.

5.42 Suppose that the following game has a unique equilibrium, given by a completely mixed strategy.

                  Player II
                  L        R
Player I    T     a, b     e, f
            B     c, d     g, h
Answer the following questions: (a) Prove that the payoff of each player at this equilibrium equals his maxmin value in mixed strategies. (b) Compute the equilibria in mixed strategies and the maxmin strategies in mixed strategies of the two players. Did you find the same strategies in both cases? 5.43 Prove that the only equilibrium in the following three-player game, where Player I chooses a row (T or B), Player II chooses a column (L or R), and Player III chooses a matrix (W or E), is (T , L, W ).
Matrix W:
         L          R
T     1, 1, 1    0, 1, 3
B     1, 3, 0    1, 0, 1

Matrix E:
         L          R
T     3, 0, 1    1, 1, 0
B     0, 1, 1    0, 0, 0
Guidance: First check whether there are equilibria in pure strategies. Then check whether there are equilibria in which two players play pure strategies, while the third plays a completely mixed strategy (meaning a strategy in which each one of his two pure strategies is chosen with positive probability). After that, check whether there are equilibria in which one player plays a pure strategy, and the other two play completely mixed strategies. Finally, check whether there are equilibria in which all the players play completely mixed strategies. Note the symmetry between the players; making use of the symmetry will reduce the amount of work you need to do. 5.44 In this exercise we will prove the following theorem: Theorem 5.56 A set E ⊆ R2 is the set of Nash equilibrium payoffs in a two-player game in strategic form if and only if E is the union of a finite number of rectangles of the form [a, b] × [c, d] (the rectangles are not necessarily disjoint from each other, and we do not rule out the possibility that in some of them a = b and/or c = d). For every distribution x over a finite set S, the support of x, which is denoted supp(x), is the set of all elements of S that have positive probability under x: supp(x) := {s ∈ S : x(s) > 0}.
(5.152)
(a) Let (x1, y1) and (x2, y2) be two equilibria of a two-player strategic-form game with payoffs (a, c) and (b, d) satisfying supp(x1) = supp(x2) and supp(y1) = supp(y2). Prove that for every 0 ≤ α, β ≤ 1 the strategy vector (αx1 + (1 − α)x2, βy1 + (1 − β)y2) is a Nash equilibrium with the same support, and with payoff (βa + (1 − β)b, αc + (1 − α)d). (b) Deduce that for any subset S′I of Player I's pure strategies, and any subset S′II of Player II's pure strategies, the set of Nash equilibrium payoffs yielded by strategy vectors (x, y) satisfying supp(x) = S′I and supp(y) = S′II is a rectangle. (c) Since the number of possible supports is finite, deduce that the set of equilibrium payoffs of every two-player game in strategic form is a union of a finite number of rectangles of the form [a, b] × [c, d]. (d) In this part, we will prove the converse of Theorem 5.56. Let K be a positive integer, and let $(a_k, b_k, c_k, d_k)_{k=1}^{K}$ be positive numbers satisfying $a_k \le b_k$ and $c_k \le d_k$ for all k. Define the set $A = \bigcup_{k=1}^{K} ([a_k, b_k] \times [c_k, d_k])$, which is the union of a finite number of rectangles (if $a_k = b_k$ and/or $c_k = d_k$, the rectangle is degenerate). Prove that the set of equilibrium payoffs in the following game in strategic form in which each player has 2K actions is A.
a1, b1    c1, b1    0, 0      0, 0      ...    0, b1     0, b1
a1, d1    c1, d1    0, 0      0, 0      ...    0, d1     0, d1
0, 0      0, 0      a2, b2    c2, b2    ...    0, b2     0, b2
0, 0      0, 0      a2, d2    c2, d2    ...    0, d2     0, d2
  ⋮         ⋮         ⋮         ⋮                ⋮         ⋮
a1, 0     c1, 0     a2, 0     c2, 0     ...    aK, bK    cK, bK
a1, 0     c1, 0     a2, 0     c2, 0     ...    aK, dK    cK, dK
5.45 In this exercise, we will show that Theorem 5.56 (page 206) only holds true in two-player games: when there are more than two players, the set of equilibrium payoffs is not necessarily a union of polytopes. Consider the following three-player game, in which Player I chooses a row (T or B), Player II chooses a column (L or R), and Player III chooses a matrix (W or E).
Matrix W:
         L          R
T     1, 0, 3    0, 0, 1
B     1, 1, 1    0, 1, 1

Matrix E:
         L          R
T     0, 1, 4    0, 0, 0
B     1, 1, 0    1, 0, 0
Show that the set of equilibria is
\[ \left\{ \big([x(T), (1 - x)(B)], [y(L), (1 - y)(R)], W\big) : 0 \le x, y \le 1, \; xy \le \tfrac{1}{2} \right\}. \tag{5.153} \]
Deduce that the set of equilibrium payoffs is
\[ \left\{ (y, x, 1 + 2xy) : 0 \le x, y \le 1, \; xy \le \tfrac{1}{2} \right\}, \tag{5.154} \]
and hence it is not the union of polytopes in $\mathbb{R}^3$. Guidance: First show that at every equilibrium, Player III plays his pure strategy W with probability 1, by ascertaining what the best replies of Players I and II are if he does not do so, and what Player III's best reply is to these best replies.

5.46 Find all the equilibria in the following three-player game, in which Player I chooses a row (T or B), Player II chooses a column (L or R), and Player III chooses a matrix (W or E).
Matrix W:
         L          R
T     0, 0, 0    1, 0, 0
B     0, 0, 1    0, 1, 0

Matrix E:
         L          R
T     0, 1, 0    0, 0, 1
B     1, 0, 0    0, 0, 0
5.47 Tom, Dick, and Harry play the following game. At the first stage, Dick or Harry is chosen, each with probability 1/2. If Dick has been chosen, he plays Game A in Figure 5.36, with Tom as his opponent. If Harry has been chosen, he plays Game B in Figure 5.36, with Tom as his opponent. Tom, however, does not know who his opponent is (and which of the two games is being played). The payoff to the player who is not chosen is 0.

Game A:
                Dick
                L        R
Tom     T       2, 5     0, 0
        B       0, 0     1, 1

Game B:
                Harry
                L        R
Tom     T       2, 5     0, 0
        B       0, 0     1, 1

Figure 5.36 The payoff matrices of the game in Exercise 5.47

Do the following:
(a) Draw the extensive form of the game.
(b) Write down the strategic form of the game.
(c) Find two equilibria in pure strategies.
(d) Find an additional equilibrium in mixed strategies.
5.48 In this exercise, Tom, Dick, and Harry are in a situation similar to the one described in Exercise 5.47, but this time the payoff matrices are those shown in Figure 5.37.

Game A:
                Dick
                L        R
Tom     T       0, 0     3, −3
        B       1, −1    0, 0

Game B:
                Harry
                L        R
Tom     T       5, −5    0, 0
        B       0, 0     1, −1

Figure 5.37 The payoff matrices of the game in Exercise 5.48
(a) Depict the game in extensive form. (b) Depict the game in strategic form. (c) Find all the equilibria of this game.

5.49 Prove that in every two-player game on the unit square that is not zero sum, and in which the payoff functions of the two players are bilinear (see Section 4.14.2 on page 123), there exists an equilibrium in pure strategies.

5.50 In this exercise, we generalize Theorem 5.11 (page 151) to the case in which the set of pure strategies of one of the players is countable. Let $\Gamma$ be a two-player zero-sum game in which Player I's set of pure strategies $S_I$ is finite, Player II's set of pure strategies $S_{II} = \{1, 2, 3, \ldots\}$ is a countable set, and the payoff function u is bounded. Let $\Gamma^n$ be a two-player zero-sum game in which Player I's set of pure strategies is $S_I$, Player II's set of pure strategies is $S_{II}^n = \{1, 2, \ldots, n\}$, and the payoff functions are identical to those of $\Gamma$. Let $v^n$ be the value of the game $\Gamma^n$, and let $\sigma_I^n \in \Delta(S_I)$ and $\sigma_{II}^n \in \Delta(S_{II})$ be the optimal strategies of the two players in this game, respectively.
(a) Prove that $(v^n)_{n \in \mathbb{N}}$ is a sequence of nonincreasing real numbers. Deduce that $v := \lim_{n \to \infty} v^n$ exists.
(b) Prove that each accumulation point $\sigma_I$ of the sequence $(\sigma_I^n)_{n \in \mathbb{N}}$ satisfies6
\[ \inf_{\sigma_{II} \in \Delta(S_{II})} U(\sigma_I, \sigma_{II}) \ge v. \tag{5.155} \]
(c) Prove that for each $n \in \mathbb{N}$, the mixed strategy $\sigma_{II}^n$ satisfies
\[ \sup_{\sigma_I \in \Delta(S_I)} U(\sigma_I, \sigma_{II}^n) \le v^n. \tag{5.156} \]
(d) Deduce that
\[ \sup_{\sigma_I \in \Delta(S_I)} \; \inf_{\sigma_{II} \in \Delta(S_{II})} U(\sigma_I, \sigma_{II}) = \inf_{\sigma_{II} \in \Delta(S_{II})} \; \sup_{\sigma_I \in \Delta(S_I)} U(\sigma_I, \sigma_{II}) = v. \]
(e) Find an example of a game $\Gamma$ in which the sequence $(\sigma_{II}^n)_{n \in \mathbb{N}}$ has no accumulation point.
(f) Show by a counterexample that (d) above does not necessarily hold when $S_I$ is also countably infinite.

5.51 In this exercise we will present an example of a game with an infinite set of players that has no equilibrium in mixed strategies. Let $(N, (S_i)_{i \in N}, (u_i)_{i \in N})$ be a game in strategic form in which the set of players is the set of natural numbers N = {1, 2, 3, . . .}, each player i ∈ N has two pure strategies $S_i = \{0, 1\}$, and player i's payoff function is
\[ u_i(s_1, s_2, \ldots) = \begin{cases} s_i & \text{if } \sum_{j \in N} s_j < \infty, \\ -s_i & \text{if } \sum_{j \in N} s_j = \infty. \end{cases} \tag{5.157} \]
(a) Prove that this game has no equilibrium in pure strategies.
6 Recall that $\sigma_I$ is an accumulation point of a sequence $(\sigma_I^n)_{n \in \mathbb{N}}$ if there exists a subsequence $(\sigma_I^{n_k})_{k \in \mathbb{N}}$ converging to $\sigma_I$.
(b) Using Kolmogorov's 0-1 Law,7 prove that the game has no equilibrium in mixed strategies.

5.52 Let $f : \mathbb{R}^{n+m} \to \mathbb{R}$ be a homogeneous function, i.e., f(cx, y) = cf(x, y) = f(x, cy), for every $x \in \mathbb{R}^n$, every $y \in \mathbb{R}^m$, and every $c \in \mathbb{R}$. Prove that the value $\widehat{v}$ of the two-player zero-sum game $\widehat{G} = (\{I, II\}, \mathbb{R}^n_+, \mathbb{R}^m_+, f)$ exists;8 that is, there exists $\widehat{v} \in \mathbb{R} \cup \{-\infty, \infty\}$ such that
\[ \widehat{v} = \sup_{x \in \mathbb{R}^n_+} \inf_{y \in \mathbb{R}^m_+} f(x, y) = \inf_{y \in \mathbb{R}^m_+} \sup_{x \in \mathbb{R}^n_+} f(x, y). \tag{5.158} \]
In addition, prove that the value of the game equals either 0, ∞, or −∞. Guidance: Consider first the game G = ({I, II}, X, Y, f), where
\[ X := \left\{ x \in \mathbb{R}^n : \sum_{i=1}^{n} x_i \le 1, \; x_i \ge 0 \;\; \forall i = 1, 2, \ldots, n \right\}, \tag{5.159} \]
\[ Y := \left\{ y \in \mathbb{R}^m : \sum_{j=1}^{m} y_j \le 1, \; y_j \ge 0 \;\; \forall j = 1, 2, \ldots, m \right\}. \tag{5.160} \]
Show that the game G has a value v; then show that: if v = 0 then $\widehat{v} = 0$; if v > 0 then $\widehat{v} = \infty$; and if v < 0 then $\widehat{v} = -\infty$.
7 Let $(X_i)_{i \in \mathbb{N}}$ be a sequence of independent random variables defined over a probability space $(\Omega, \mathcal{F}, p)$. An event A is called a tail event if it depends only on $(X_i)_{i \ge n}$, for each $n \in \mathbb{N}$. In other words, for any $n \in \mathbb{N}$, to ascertain whether ω ∈ A it suffices to know the values $(X_i(\omega))_{i \ge n}$, which means that we can ignore a finite number of the initial variables $X_1, X_2, \ldots, X_n$ (for any n). Kolmogorov's 0-1 law says that the probability of a tail event is either 0 or 1.
8 For every natural number n the set $\mathbb{R}^n_+$ is the nonnegative quadrant of $\mathbb{R}^n$: $\mathbb{R}^n_+ := \{x \in \mathbb{R}^n : x_i \ge 0, \; \forall i = 1, 2, \ldots, n\}$.

5.53 Show that every two-player constant-sum game is strategically equivalent to a zero-sum game. For the definition of strategic equivalence, see Definition 5.34 (page 174).

5.54 Prove that if (σI, σII) is the solution of the system of linear equations (5.70)–(5.79) (page 166), then (σI, σII) is a Nash equilibrium.

5.55 Suppose that the preferences of two players satisfy the von Neumann–Morgenstern axioms. Player I is indifferent between receiving $600 with certainty and participating in a lottery in which he receives $300 with probability 1/4 and $1,500 with probability 3/4. He is also indifferent between receiving $800 with certainty and participating in a lottery in which he receives $600 with probability 1/2 and $1,500 with probability 1/2. Player II is indifferent between losing $600 with certainty and participating in a lottery in which he loses $300 with probability 1/7 and $800 with probability 6/7. He is also indifferent between losing $800 with certainty and participating in a lottery in which he loses $300 with probability 1/8 and $1,500 with probability 7/8. (a) Find linear utility functions for the two players representing the preference relations of the players over the possible outcomes. The players play a game whose outcomes, in dollars paid by Player II to Player I, are given by the following matrix.

                  Player II
                  L          M          R
Player I    T     $300       $800       $1,500
            B     $1,500     $600       $300
(b) Determine whether the game is zero sum. (c) If you answered yes to the last question, find optimal strategies for each of the players. If not, find an equilibrium.

5.56 Which of the following games, where Player I is the row player and Player II is the column player, are strategically equivalent to two-player zero-sum games? For each game that is equivalent to a two-player zero-sum game, write explicitly the positive affine transformation that proves your answer.

Game A:
        L         R
T     11, 2     5, 4
B     −7, 8     17, 0

Game B:
        L         R
T     2, 7      4, 5
B     6, 3      −3, 12

Game C:
        L         C         R
T     0, 12     5, 16     4, 22
B     8, 9      2, 10     7, 11

Game D:
        L         C         R
T     11, 2     5, 4      −16, 11
B     −7, 8     17, 0     −1, 6

Game E:
        L         C         R
T     9, 5      3, 7      −18, 14
B     −9, 11    15, 3     −4, 9

Game F:
        L         C         R
T     9, 2      5, 6      12, −1
B     9, 8      1, 10     7, 4
5.57 (a) Find the value in mixed strategies and all the optimal strategies of each of the two players in the following two-player zero-sum game.
                  Player II
                  L       R
Player I    T     12      8
            B     4       16

(b) Increase the utility of Player II by 18, to get

                  Player II
                  L         R
Player I    T     12, 6     8, 10
            B     4, 14     16, 2
What are all the equilibria of this game? Justify your answer. (c) Multiply the utility of the first player in the original game by 2, and add 3, to get the following game.

                  Player II
                  L           R
Player I    T     27, −12     19, −8
            B     11, −4      35, −16

What are the equilibrium strategies and equilibrium payoffs in this game?

5.58 Prove Theorem 5.35 on page 175: let G and $\widehat{G}$ be two strategically equivalent strategic-form games. Every equilibrium in mixed strategies σ of G is an equilibrium in mixed strategies of $\widehat{G}$.

5.59 (a) Consider the following two-player game.

                  Player II
                  L        R
Player I    T     1, 0     −1, 1
            B     0, 1     0, 0

Show that the only equilibrium in the game is ([1/2(T), 1/2(B)], [1/2(L), 1/2(R)]).
(b) Consider next the two-player zero-sum game derived from the above game in which these payoffs are Player I's payoffs. Compute the value in mixed strategies of this game and all the optimal strategies of Player I. (c) Suppose that Player I knows that Player II is implementing strategy [1/2(L), 1/2(R)], and he needs to decide whether to implement the mixed strategy [1/2(T), 1/2(B)], which is his part in the equilibrium, or whether to implement instead the pure strategy B, which guarantees him a payoff of 0. Explain in what sense the mixed strategy [1/2(T), 1/2(B)] is equivalent to the pure strategy B, from Player I's perspective.

5.60 A strategic-form game with constraints is a quintuple (N, (Si, ui)i∈N, c, γ) where N is the set of players, Si is player i's set of pure strategies, ui : S → R is player i's payoff function, where S = ×i∈N Si, c : S → R is a constraint function, and γ ∈ R is a bound. Extend c to mixed strategies in the following way:
\[ C(\sigma) = \sum_{s \in S} \sigma_1(s_1)\sigma_2(s_2) \cdots \sigma_n(s_n)\, c(s). \tag{5.161} \]
In a game with constraints, the vectors of mixed strategies that the players can play are limited to those vectors of mixed strategies satisfying the constraints. Formally, a vector of mixed strategies σ = (σi)i∈N is called permissible if C(σ) ≤ γ. Games with constraints occur naturally when there is a resource whose use is limited. (a) Consider the following two-player zero-sum game, with the payoffs and constraints appearing in the accompanying figure. The bound is γ = 1.

Payoffs:
                  Player II
                  L       R
Player I    T     0       1
            B     −1      0

Constraints:
                  Player II
                  L       R
Player I    T     2       0
            B     0       0

Compute
\[ \max_{\sigma_I \in \Sigma_I} \; \min_{\{\sigma_{II} \in \Sigma_{II} \,:\, C(\sigma_I, \sigma_{II}) \le \gamma\}} U(\sigma_I, \sigma_{II}) \]
and
\[ \min_{\sigma_{II} \in \Sigma_{II}} \; \max_{\{\sigma_I \in \Sigma_I \,:\, C(\sigma_I, \sigma_{II}) \le \gamma\}} U(\sigma_I, \sigma_{II}). \]
(b) How many equilibria can you find in this game? The following condition in games with constraints is called the Slater condition. For each player i and every vector of mixed strategies of the other players σ−i
there exists a mixed strategy σi of player i such that C(σi, σ−i) < γ (note the strict inequality). The following items refer to games with constraints satisfying the Slater condition. (c) Prove that in a two-player zero-sum game with constraints
\[ \max_{\sigma_I \in \Sigma_I} \; \min_{\{\sigma_{II} \in \Sigma_{II} \,:\, C(\sigma_I, \sigma_{II}) \le \gamma\}} U(\sigma_I, \sigma_{II}) \;\ge\; \min_{\sigma_{II} \in \Sigma_{II}} \; \max_{\{\sigma_I \in \Sigma_I \,:\, C(\sigma_I, \sigma_{II}) \le \gamma\}} U(\sigma_I, \sigma_{II}). \]
Does this result contradict Theorem 5.40 on page 177? Explain. (d) Go back to the n-player case. Using the compactness of $\Sigma_i$ and $\Sigma_{-i}$, prove that for every player i,
\[ \sup_{\sigma_{-i} \in \Sigma_{-i}} \; \inf_{\{\sigma_i \in \Sigma_i \,:\, C(\sigma_i, \sigma_{-i}) \le \gamma\}} C(\sigma_i, \sigma_{-i}) < \gamma. \tag{5.162} \]
(e) Prove that for each strategy vector σ satisfying the constraints (i.e., C(σ) ≤ γ), for each player i, and each sequence of strategy vectors $(\sigma_{-i}^k)_{k=1}^{\infty}$ converging to $\sigma_{-i}$, there exists a sequence $(\sigma_i^k)_{k=1}^{\infty}$ converging to $\sigma_i$ such that $C(\sigma_i^k, \sigma_{-i}^k) \le \gamma$ for every k. (f) Using Kakutani's Fixed Point Theorem (Theorem 23.32 on page 939), show that in every strategic-form game with constraints there exists an equilibrium. In other words, show that there exists a permissible vector $\sigma^*$ satisfying the condition that for each player i ∈ N and each strategy $\sigma_i$ of player i, if $(\sigma_i, \sigma_{-i}^*)$ is a permissible strategy vector, then $U_i(\sigma_i, \sigma_{-i}^*) \le U_i(\sigma^*)$. Hint: To prove part (c), denote by v the value of the game without constraints, and prove that the left-hand side of the inequality is greater than or equal to v, and the right-hand side of the inequality is less than or equal to v.

5.61 Prove Theorem 5.45 on page 183: if information is added to Player I in a two-player zero-sum game, the value of the game in mixed strategies does not decrease.

5.62 Compute the (unique) equilibrium payoff in each of the following two-player extensive-form games. Which player gains, and which player loses, from the addition of information to Player I, i.e., when moving from Game A to Game B? Is the result of adding information here identical to the result of adding information in Example 5.47 (page 185)? Why?

[Figure: two extensive-form game trees, Game A and Game B. Surviving labels: Players I and II; actions L, M, R, l, r (and l1, r1, l2, r2 in Game B); terminal payoffs (20, 1), (10, 0), and (0, 10).]
5.63 Consider the following two-player game composed of two stages. In the first stage, one of the two following matrices is chosen by a coin toss (with each matrix chosen with probability 1/2). In the second stage, the two players play the strategic-form game whose payoff matrix is given by the matrix that has been chosen.

                  Player II
                  L           C           R
Player I    T     0, 0        1, −1       −1, 10
            B     −2, −2      −2, −2      −3, −12

                  Player II
                  L           C           R
Player I    T     −1, 1       −2, −1      −2, −11
            B     1, 1/2      −1, 0       −1, 10
For each of the following cases, depict the game as an extensive-form game, and find the unique equilibrium: (a) No player knows which matrix was chosen. (b) Player I knows which matrix was chosen, but Player II does not know which matrix was chosen. What effect does adding information to Player I have on the payoffs to the players at equilibrium?

5.64 Prove that in a two-player game, the maxmin value in mixed strategies of a player equals his minmax value in mixed strategies.

5.65 In this exercise we will prove von Neumann's Minmax Theorem (Theorem 5.11 on page 151), using the Duality Theorem from the theory of linear programming (see Section 23.3 on page 945 for a brief review of linear programming). Let G be a two-player zero-sum game in which Player I has n pure strategies, Player II has m pure strategies, and the payoff matrix is A. Consider the following linear program, in the variables $y = (y_j)_{j=1}^{m}$, in which c is a real number, and $\vec{c}$ is an n-dimensional vector, all of whose coordinates equal c:

Compute:     $Z_P := \min c$,
subject to:  $A y^{\top} \le \vec{c}$,
             $\sum_{j=1}^{m} y_j = 1$,
             $y \ge 0$.
(a) Write down the dual program. (b) Show that the set of all y satisfying the constraints of the primal program is a compact set, and conclude that ZP is finite. (c) Show that the optimal solution to the primal program defines a mixed strategy for Player II that guarantees him an expected payoff of at most ZP . (d) Show that the optimal solution to the dual program defines a mixed strategy for Player I that guarantees an expected payoff of at least ZD . (e) Explain why the Duality Theorem is applicable here. Since the Duality Theorem implies that ZP = ZD , deduce that ZP is the value of the game.
5.66 Prove the following claims for n-player extensive-form games: (a) Adding information to one of the players does not increase the maxmin or the minmax value of the other players. (b) Adding information to one of the players does not increase the minmax value of the other players. (c) Adding information to one of the players may have no effect on his maxmin value. (d) Adding information to one of the players may decrease the maxmin value of the other players. 5.67 Find all the equilibria of Game B in Figure 5.29 (page 185). What are the equilibria payoffs corresponding to these equilibria? 5.68 Find all the equilibrium points of the following games, and ascertain which of them defines an evolutionarily stable strategy.
In each game, the row player is the Mutation and the column player is the Population.

Game A:
                     Population
                     Dove      Hawk
Mutation    Dove     2, 2      8, 3
            Hawk     3, 8      7, 7

Game B:
                     Population
                     Dove      Hawk
Mutation    Dove     2, 2      1, 3
            Hawk     3, 1      7, 7

Game C:
                     Population
                     Dove      Hawk
Mutation    Dove     2, 2      0, 1
            Hawk     1, 0      7, 7

Game D:
                     Population
                     Dove      Hawk
Mutation    Dove     1, 1      1, 1
            Hawk     1, 1      1, 1

Game E:
                     Population
                     Dove      Hawk
Mutation    Dove     2, 2      8, 8
            Hawk     8, 8      7, 7

Game F:
                     Population
                     Dove      Hawk
Mutation    Dove     1, 1      1, 1
            Hawk     1, 1      2, 2
5.69 Suppose that a symmetric two-player game, in which each player has two pure strategies and all payoffs are nonnegative, is given by the following figure.

                  Player II
                  L        R
Player I    T     a, a     d, c
            B     c, d     b, b

What conditions on a, b, c, d guarantee the existence of an ESS?

5.70 Prove that the unique Nash equilibrium of Rock, Paper, Scissors (Example 4.3, on page 78) is
([1/3(Rock), 1/3(Paper), 1/3(Scissors)]; [1/3(Rock), 1/3(Paper), 1/3(Scissors)]).
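Part of the claim in Exercise 5.70 can be checked mechanically: against the uniform mixed strategy, every pure strategy earns the same expected payoff, so no deviation is profitable. The Python sketch below does this under the assumption that the payoffs are +1 for a win, −1 for a loss, and 0 for a tie; the payoffs of Example 4.3 are not reproduced in this section, so that scaling is an assumption. It also shows that a hypothetical biased strategy can be exploited. It does not establish uniqueness, which is what the exercise asks for.

```python
from fractions import Fraction

ACTIONS = ["Rock", "Paper", "Scissors"]
BEATS = {"Rock": "Scissors", "Paper": "Rock", "Scissors": "Paper"}

def payoff(a, b):
    """Payoff to the player choosing a against b (assumed +1 win, -1 loss, 0 tie)."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1

def payoff_against(mixed, a):
    """Expected payoff of the pure strategy a against the opponent's mixed strategy."""
    return sum(mixed[b] * payoff(a, b) for b in ACTIONS)

uniform = {a: Fraction(1, 3) for a in ACTIONS}
uniform_payoffs = {a: payoff_against(uniform, a) for a in ACTIONS}

# A hypothetical biased strategy, to contrast with the uniform one.
biased = {"Rock": Fraction(1, 2), "Paper": Fraction(1, 4), "Scissors": Fraction(1, 4)}
biased_payoffs = {a: payoff_against(biased, a) for a in ACTIONS}

print(uniform_payoffs)
print(biased_payoffs)
```

Since all entries of `uniform_payoffs` coincide, each player is indifferent among all of his strategies when the other plays uniformly, which is the best-reply condition for the strategy pair in the exercise.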
5.71 Suppose that the males and females of a particular animal species have two types of behavior: care for offspring, or abandonment of offspring. The expected number of offspring are presented in the following matrix.

                      Mother
                      Care              Abandon
Father    Care        V − c, V − c      αV − c, αV
          Abandon     αV, αV − c        0, 0
Explanation: V is the expected number of surviving offspring if they are cared for by both parents. If only one parent cares for the offspring, the expected number of surviving offspring is reduced to αV, 0 < α < 1. In addition, a parent who cares for his or her offspring invests energy and time into that care, which reduces the number of surviving offspring he or she has by c (because he or she has fewer mating encounters with other animals). Prove the following claims: (a) If V − c > αV and αV − c > 0 (which results in a relatively smaller investment, since c < αV and c < (1 − α)V), then the only evolutionarily stable strategy is Care, meaning that both parents care for their offspring. (b) If V − c < αV and αV − c < 0 (which results in a high cost for caring for offspring), the only evolutionarily stable strategy is Abandon, and hence both parents abandon their offspring. (c) If α < 1/2 (in this case (1 − α)V > αV, and investment in caring for offspring satisfies (1 − α)V > c > αV), there are two evolutionarily stable equilibria, Care and Abandon, showing that both Care and Abandon are evolutionarily stable strategies. Which equilibrium emerges in practice in the population depends on the initial conditions.
(d) If α > 1/2 (in this case αV > (1 − α)V, and investment in caring for offspring satisfies αV > c > (1 − α)V), the only evolutionarily stable equilibrium is the mixed strategy in which Care is chosen with probability $\frac{\alpha V - c}{(2\alpha - 1)V}$. Remark: The significance of α < 1/2 is that “two together are better than two separately.” The significance of α > 1/2 is that “two together are worse than two separately.”

5.72 A single male leopard can mate with all the female leopards on the savanna. Why, then, is every generation of leopards composed of 50% males and 50% females? Does this not constitute a waste of resources? Explain, using ideas presented in Section 5.8 (page 186), why the evolutionarily stable strategy is that at which the number of male leopards born equals the number of females born.9
9 In actual fact, the ratio of the males to females in most species is close to 50%, but not exactly 50%. We will not present here various explanations that have been suggested for this phenomenon.
6 Behavior strategies and Kuhn’s Theorem
Chapter summary In strategic-form games, a mixed strategy extends the player’s possibilities by allowing him to choose a pure strategy randomly. In extensive-form games random choices can be executed in two ways. The player can randomly choose a pure strategy for the whole play at the outset of the game; this type of randomization yields in fact the concept of mixed strategy in an extensive-form game. Alternatively, at every one of his information sets, the player can randomly choose one of his available actions; this type of randomization yields the concept of behavior strategy, which is the subject of this chapter. We study the relationship between behavior strategies and mixed strategies in extensive-form games. To this end we define an equivalence relation between strategies and we show by examples that there are games in which some mixed strategies do not have equivalent behavior strategies, and there are games in which some behavior strategies do not have equivalent mixed strategies. We then introduce the concept of perfect recall: a player has perfect recall in an extensive-form game if along the play of the game he does not forget any information that he knew in the past (regarding his moves, the other players’ moves, or chance moves). We prove Kuhn’s Theorem, which states that if a player has perfect recall, then any one of his behavior strategies is equivalent to a mixed strategy, and vice versa. It follows that a game in which all players have perfect recall possesses an equilibrium in behavior strategies.
As noted in previous chapters, extensive-form games and strategic-form games are not related in a one-to-one manner. In general, the extensive form is richer in detail, and incorporates “dynamic aspects” of the game that are not expressed in strategic form. Strategic-form games focus exclusively on strategies and outcomes. Given this, it is worthwhile to take a closer look at the concepts developed for the two forms of games and detect differences between them, if there are any, due to the different representations of the game. We have already seen that the concept of pure strategy, which is a fundamental element of strategic-form games, is also well defined in extensive-form games, where a pure strategy of a player is a function that maps each of his information sets to an action that is feasible at that information set.
In this chapter (only), we will denote the multilinear extension (expectation) of player i’s payoff function by $u_i$, rather than $U_i$, because $U_i$ will denote an information set of player i.
Example 6.1 Consider the two-player extensive-form game given in Figure 6.1.
[Figure 6.1 The game in Example 6.1; game tree not reproduced here.]
In this game, Player I has two information sets, $U_I^1$ and $U_I^2$, and four pure strategies:

$$S_I = \{T_1T_2,\ T_1B_2,\ B_1T_2,\ B_1B_2\}. \tag{6.1}$$

Player II has one information set, $U_{II}^1$, and two pure strategies:

$$S_{II} = \{t, b\}. \tag{6.2}$$
Mixed strategies are defined as probability distributions over sets of pure strategies. The concept of mixed strategy is therefore well defined in every game in which the set of pure strategies is a finite or countable set, whether the game is an extensive-form game or a strategic-form game. The sets of mixed strategies are¹

$$\Sigma_I = \Delta(S_I), \qquad \Sigma_{II} = \Delta(S_{II}). \tag{6.3}$$
One of the interpretations of the concept of mixed strategy is that it is a random choice of how to play the game. But there may be different ways of attaining such randomness. Clearly, if a player has only one move (only one information set), such as Player II in the game in Figure 6.1, there is only one way to implement a random choice of an action: to pick t with probability α, and b with probability 1 − α. That does indeed define a mixed strategy.
What about Player I, who has two information sets in the game in Figure 6.1? Suppose that he implements a mixed strategy, such as, for example, $\sigma_I = [\tfrac{1}{3}(T_1T_2),\ 0(T_1B_2),\ \tfrac{1}{3}(B_1T_2),\ \tfrac{1}{3}(B_1B_2)]$. Then he is essentially conducting a lottery at the start of the game, and then implementing the pure strategy that has been chosen by lottery. However, Player I has another, equally natural, alternative way to attain randomness: he can choose randomly between $T_1$ and $B_1$ when the play of the game arrives at his information set $U_I^1$, and then choose randomly between $T_2$ and $B_2$ when the play of the game arrives at his information set $U_I^2$. Such a strategy is described by two lotteries: $[\alpha(T_1), (1-\alpha)(B_1)]$ at $U_I^1$, and $[\beta(T_2), (1-\beta)(B_2)]$ at $U_I^2$. In other words, instead of randomly choosing a grand plan (a pure strategy) that determines his actions at each of his information sets, the player randomly chooses his action every time he is at a particular information set. Such a strategy is called a behavior strategy. ◭
Is there an essential difference between these two strategies? Can a player attain a higher payoff by using a behavior strategy instead of a mixed strategy? Alternatively, can he attain a higher payoff by using a mixed strategy instead of a behavior strategy? We will answer these questions in this chapter, and find conditions under which it makes no difference which of these alternative strategy concepts is used.

¹ Recall that for every finite set S, Δ(S) is the set of all probability distributions over S (Definition 5.1, page 146).
6.1 Behavior strategies
Definition 6.2 A behavior strategy of a player in an extensive-form game is a function mapping each of his information sets to a probability distribution over the set of possible actions at that information set.

Recall that we denote by $\mathcal{U}_i$ the collection of information sets of player i, and for every information set $U_i \in \mathcal{U}_i$, we denote by $A(U_i)$ the set of possible actions at $U_i$. A behavior strategy of player i in an extensive-form game is a function $b_i : \mathcal{U}_i \to \bigcup_{U_i \in \mathcal{U}_i} \Delta(A(U_i))$ such that $b_i(U_i) \in \Delta(A(U_i))$ for all $U_i \in \mathcal{U}_i$. Equivalently, a behavior strategy is a vector of probability distributions (lotteries), one per information set. This is in contrast with the single probability distribution (single lottery) defining a mixed strategy. The probability that a behavior strategy $b_i$ will choose an action $a_i \in A(U_i)$ at an information set $U_i$ is denoted by $b_i(a_i; U_i)$.

Recall that $\Sigma_i$ is player i’s set of mixed strategies; player i’s set of behavior strategies is denoted by $B_i$. What is the relationship between $B_i$ and $\Sigma_i$? Note first that in every case in which player i has at least two information sets at which he has at least two possible actions, the sets $B_i$ and $\Sigma_i$ are different mathematical structures – two sets in different spaces. This is illustrated in Example 6.1.
Example 6.1 (Continued) As noted above, in this example Player I’s behavior strategy is described by two lotteries: $b_I = ([\alpha(T_1), (1-\alpha)(B_1)],\ [\beta(T_2), (1-\beta)(B_2)])$. Equivalently, we can describe this behavior strategy by a pair of real numbers α, β in the unit interval. The set $B_I$ is thus equivalent to the set

$$\{(\alpha, \beta) : 0 \le \alpha \le 1,\ 0 \le \beta \le 1\}, \tag{6.4}$$

while $\Sigma_I$ is equivalent to the set

$$\Big\{(x_1, x_2, x_3, x_4) : x_j \ge 0,\ \sum_{j=1}^{4} x_j = 1\Big\}. \tag{6.5}$$

In other words, $\Sigma_I$ is equivalent to a subset of $\mathbb{R}^4$ (which is three-dimensional, due to the constraint $\sum_{j=1}^{4} x_j = 1$). By contrast, $B_I$ is equivalent to a subset of $\mathbb{R}^2$: the unit square $[0, 1]^2$. The fact that $\Sigma_I$ is of higher dimension than $B_I$ (three dimensions versus two dimensions) suggests that $\Sigma_I$ may be a “richer,” or a “larger,” set. In fact, in this example, for every behavior strategy one can define an “equivalent” mixed strategy: the behavior strategy

$$([\alpha(T_1), (1-\alpha)(B_1)],\ [\beta(T_2), (1-\beta)(B_2)]) \tag{6.6}$$

is equivalent to the mixed strategy

$$[\alpha\beta(T_1T_2),\ \alpha(1-\beta)(T_1B_2),\ (1-\alpha)\beta(B_1T_2),\ (1-\alpha)(1-\beta)(B_1B_2)]. \tag{6.7}$$
The sense in which these two strategies are equivalent is as follows: for each one of Player II’s mixed strategies, the probability of reaching a particular vertex of the tree when Player I uses the behavior strategy (α, β) of Equation (6.6) equals the probability of reaching that vertex when Player I uses the mixed strategy of Equation (6.7). ◭
To define formally the equivalence between a mixed strategy and a behavior strategy, we consider strategy vectors that consist of both mixed strategies and behavior strategies.

Definition 6.3 A mixed/behavior strategy vector is a vector of strategies $\sigma = (\sigma_i)_{i \in N}$ in which $\sigma_i$ can be either a mixed strategy or a behavior strategy of player i, for each i.

For every mixed/behavior strategy vector $\sigma = (\sigma_i)_{i \in N}$ and every vertex x in the game tree, denote by $\rho(x; \sigma)$ the probability that vertex x will be visited during the course of the play of the game when the players implement strategies $(\sigma_i)_{i \in N}$.

Example 6.4 Consider the two-player game depicted in Figure 6.2. In this figure, the vertices of the tree are denoted by $x_1, x_2, \ldots, x_{17}$.
[Figure 6.2 A two-player game, including the probabilities of arriving at each vertex; game tree not reproduced here.]
Suppose that the players implement the following mixed strategies:

$$\sigma_I = \big[\tfrac{3}{10}(B_1B_2B_3B_4),\ \tfrac{1}{10}(B_1T_2B_3B_4),\ \tfrac{4}{10}(T_1B_2B_3T_4),\ \tfrac{2}{10}(T_1B_2T_3T_4)\big], \tag{6.8}$$

$$\sigma_{II} = \big[\tfrac{3}{4}(t),\ \tfrac{1}{4}(b)\big]. \tag{6.9}$$
Given these mixed strategies, we have computed the probabilities that the play of the game will reach the various vertices of the game tree, and these probabilities are listed alongside the vertices in Figure 6.2. We began at the leaves of the tree; for example, the probability of arriving at leaf $x_{13}$ is the probability that Player I will play $B_1$ at vertex $x_1$, and $B_2$ at information set $\{x_6, x_7\}$, and that Player II will play b at information set $\{x_2, x_3\}$. From among the four pure strategies of Player I for which $\sigma_I$ assigns positive probability, Player I will play $B_1$ at vertex $x_1$ and $B_2$ at information set $\{x_6, x_7\}$ only at the pure strategy $(B_1B_2B_3B_4)$, with this pure strategy chosen by $\sigma_I$ with probability $\tfrac{3}{10}$. Since the mixed strategies of the two players (which are probability distributions over their pure strategy sets) are independent, the probability that the play of the game will reach the leaf $x_{13}$ is $\tfrac{3}{10} \times \tfrac{1}{4} = \tfrac{3}{40}$. We compute the probability of getting to a vertex that is not a leaf by recursion from the leaves to the root: the probability of getting to a vertex x is the sum of the probabilities of getting to one of the children of x. ◭
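The computation for leaf $x_{13}$ can be reproduced mechanically. The following Python sketch encodes the mixed strategies of Equations (6.8) and (6.9) with exact fractions; the tuple encoding of pure strategies and the helper name `prob_player_I` are illustrative assumptions, not notation from the text.

```python
from fractions import Fraction

# Mixed strategy of Player I, Equation (6.8): pure strategy -> probability.
sigma_I = {
    ("B1", "B2", "B3", "B4"): Fraction(3, 10),
    ("B1", "T2", "B3", "B4"): Fraction(1, 10),
    ("T1", "B2", "B3", "T4"): Fraction(4, 10),
    ("T1", "B2", "T3", "T4"): Fraction(2, 10),
}
# Mixed strategy of Player II, Equation (6.9).
sigma_II = {"t": Fraction(3, 4), "b": Fraction(1, 4)}


def prob_player_I(required_actions):
    """Total probability of the pure strategies containing all required actions."""
    return sum(
        (p for s, p in sigma_I.items() if set(required_actions) <= set(s)),
        Fraction(0),
    )


# Leaf x13 is reached when Player I plays B1 and B2 and Player II plays b.
# The two lotteries are independent, so the probabilities multiply.
p_x13 = prob_player_I({"B1", "B2"}) * sigma_II["b"]
print(p_x13)  # 3/40
```

Using `Fraction` rather than floating-point numbers keeps the result directly comparable with the values quoted in the example.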
Definition 6.5 A mixed strategy $\sigma_i$ and a behavior strategy $b_i$ of player i in an extensive-form game are equivalent to each other if for every mixed/behavior strategy vector $\sigma_{-i}$ of the players $N \setminus \{i\}$ and every vertex x in the game tree

$$\rho(x; \sigma_i, \sigma_{-i}) = \rho(x; b_i, \sigma_{-i}). \tag{6.10}$$
In other words, the mixed strategy $\sigma_i$ and the behavior strategy $b_i$ are equivalent if for every mixed/behavior strategy vector $\sigma_{-i}$, the two strategy vectors $(\sigma_i, \sigma_{-i})$ and $(b_i, \sigma_{-i})$ induce the same probability of arriving at each vertex in the game tree. In particular, $\rho(x; \sigma_i, \sigma_{-i}) = \rho(x; b_i, \sigma_{-i})$ for every leaf x. The probability $\rho(x; \sigma)$ that the vertex x will be visited during a play of the game equals the sum of the probabilities that the leaves that are descendants of x will be visited. It follows that to check that Equation (6.10) holds for every vertex x it suffices to check that it holds for every leaf of the game tree. It further follows from the definition that when the behavior strategy $b_i$ is equivalent to the mixed strategy $\sigma_i$, then for every mixed/behavior strategy vector $\sigma_{-i}$ of the other players the two strategy vectors $(\sigma_i, \sigma_{-i})$ and $(b_i, \sigma_{-i})$ lead to the same expected payoff (Exercise 6.6).

Theorem 6.6 If a mixed strategy $\sigma_i$ of player i is equivalent to a behavior strategy $b_i$, then for every mixed/behavior strategy vector $\sigma_{-i}$ of the other players and every player $j \in N$,

$$u_j(\sigma_i, \sigma_{-i}) = u_j(b_i, \sigma_{-i}). \tag{6.11}$$
Repeated application of Theorem 6.6 leads to the following corollary.

Corollary 6.7 Let $\sigma = (\sigma_i)_{i \in N}$ be a vector of mixed strategies. For each player i, let $b_i$ be a behavior strategy that is equivalent to $\sigma_i$, and denote $b = (b_i)_{i \in N}$. Then, for each player i,

$$u_i(\sigma) = u_i(b). \tag{6.12}$$
Example 6.4 (Continued) Given the probabilities calculated in Figure 6.2, the behavior strategy $b_I$ defined by

$$b_I = \big([\tfrac{3}{5}(T_1), \tfrac{2}{5}(B_1)],\ [\tfrac{1}{4}(T_2), \tfrac{3}{4}(B_2)],\ [\tfrac{1}{3}(T_3), \tfrac{2}{3}(B_3)],\ [1(T_4), 0(B_4)]\big) \tag{6.13}$$

is equivalent to the mixed strategy $\sigma_I$ defined in Equation (6.8),

$$\sigma_I = \big[\tfrac{3}{10}(B_1B_2B_3B_4),\ \tfrac{1}{10}(B_1T_2B_3B_4),\ \tfrac{4}{10}(T_1B_2B_3T_4),\ \tfrac{2}{10}(T_1B_2T_3T_4)\big]. \tag{6.14}$$

To see how behavior strategy $b_I$ was computed from the mixed strategy $\sigma_I$, suppose that Player II implements strategy $\sigma_{II} = [\tfrac{3}{4}(t), \tfrac{1}{4}(b)]$. The probability that the play of the game will arrive at each vertex x appears in the game tree in Figure 6.2. If behavior strategy $b_I$ is equivalent to the mixed strategy $\sigma_I$, then the probability that an action in a particular information set is chosen is the ratio between the probability of arriving at the vertex to which that action leads and the probability of arriving at the vertex at which the action is chosen. For example, in order to compute the probability at which the action $B_2$ is chosen in the information set $\{x_6, x_7\}$, we divide the probability $\tfrac{3}{40}$ of reaching vertex $x_{13}$ by the probability $\tfrac{1}{10}$ of reaching vertex $x_7$, to obtain $\frac{3/40}{1/10} = \tfrac{3}{4}$, corresponding to $[\tfrac{1}{4}(T_2), \tfrac{3}{4}(B_2)]$ in strategy $b_I$ (we obtain a similar result, of course, if we divide the probability $\tfrac{9}{40}$ of reaching vertex $x_{11}$ by the probability $\tfrac{3}{10}$ of reaching vertex $x_6$). To complete the construction of $b_I$ from the mixed strategy $\sigma_I$, similar computations need to be conducted at Player I’s other information sets, and it must be shown that these computations lead to the same outcome for all strategies $[\alpha(t), (1-\alpha)(b)]$ of Player II (Exercise 6.7). ◭
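The ratio rule used above can be sketched in Python. Because Player II's probabilities cancel in each ratio, the sketch works only with Player I's own lottery: the probability of an action at an information set is the weight of the pure strategies that contain Player I's earlier actions together with that action, divided by the weight of those that contain the earlier actions. The helper names are illustrative, and the lists of earlier actions for the information sets containing $T_3$ and $T_4$ (both assumed to follow $T_1$) are read off the residue of Figure 6.2 and are an assumption of this sketch.

```python
from fractions import Fraction

# Mixed strategy of Player I, Equation (6.8).
sigma_I = {
    ("B1", "B2", "B3", "B4"): Fraction(3, 10),
    ("B1", "T2", "B3", "B4"): Fraction(1, 10),
    ("T1", "B2", "B3", "T4"): Fraction(4, 10),
    ("T1", "B2", "T3", "T4"): Fraction(2, 10),
}


def weight(actions):
    """Total sigma_I-probability of pure strategies containing all given actions."""
    return sum(
        (p for s, p in sigma_I.items() if set(actions) <= set(s)),
        Fraction(0),
    )


def behavior_prob(action, prior_actions):
    """Probability of `action`, conditional on Player I's earlier actions."""
    return weight(set(prior_actions) | {action}) / weight(prior_actions)


# Earlier actions of Player I on the way to each information set
# (assumed tree structure, see the lead-in).
b_I = {
    "T1": behavior_prob("T1", []),
    "T2": behavior_prob("T2", ["B1"]),
    "T3": behavior_prob("T3", ["T1"]),
    "T4": behavior_prob("T4", ["T1"]),
}
print(b_I)
```

Under these assumptions the computed values agree with Equation (6.13). The division is well defined here because every information set of Player I is reached with positive probability under $\sigma_I$.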
Using a behavior strategy, instead of a mixed strategy, may be advantageous for two reasons: first, the set $B_i$ is “smaller,” and defined by fewer parameters, than the set $\Sigma_i$. For example, if the player has four information sets, with two actions at each information set (as happens in Example 6.4), the total number of pure strategies available is $2^4 = 16$, so that a mixed strategy involves 15 variables, as opposed to a behavior strategy, which involves only four variables (namely, the probability of selecting the first action in each one of the information sets). Secondly, in large extensive-form games, behavior strategies appear to be “more natural,” because in behavior strategies, players choose randomly between their actions at each information set at which they find themselves, rather than making one grand random choice of a “master plan” (i.e., a pure strategy) for the entire game, all at once.
This motivates the questions of whether each mixed strategy has an equivalent behavior strategy, and whether each behavior strategy has an equivalent mixed strategy. As the next two examples show, the answers to both questions may, in general, be negative.

Example 6.8 A mixed strategy that has no equivalent behavior strategy Consider the game in Figure 6.3, involving only one player.
[Figure 6.3 A game with a mixed strategy that has no equivalent behavior strategy; game tree not reproduced here.]
There are four pure strategies, $\{T_1T_2, T_1B_2, B_1T_2, B_1B_2\}$. We will show that there is no behavior strategy that is equivalent to the mixed strategy $\sigma_I = [\tfrac{1}{2}(T_1T_2),\ 0(T_1B_2),\ 0(B_1T_2),\ \tfrac{1}{2}(B_1B_2)]$. This mixed strategy induces the following probability distribution over the outcomes of the game:

$$\big[\tfrac{1}{2}(O_1),\ 0(O_2),\ 0(O_3),\ \tfrac{1}{2}(O_4)\big]. \tag{6.15}$$

A behavior strategy $([\alpha(T_1), (1-\alpha)(B_1)],\ [\beta(T_2), (1-\beta)(B_2)])$ induces the following probability distribution over the outcomes of the game:

$$[(1-\alpha)(1-\beta)(O_1),\ (1-\alpha)\beta(O_2),\ \alpha(1-\beta)(O_3),\ \alpha\beta(O_4)]. \tag{6.16}$$

If this behavior strategy were equivalent to the mixed strategy $\sigma_I$, they would both induce the same probability distributions over the outcomes of the game, so that the following equalities would have to hold:

$$\alpha\beta = \tfrac{1}{2}, \tag{6.17}$$
$$\alpha(1-\beta) = 0, \tag{6.18}$$
$$(1-\alpha)\beta = 0, \tag{6.19}$$
$$(1-\alpha)(1-\beta) = \tfrac{1}{2}. \tag{6.20}$$
But this system of equations has no solution: Equation (6.18) implies that either α = 0 or β = 1. If α = 0, Equation (6.17) does not hold, and if β = 1, Equation (6.20) does not hold. ◭
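A complementary way to see the inconsistency: every distribution of the form (6.16) satisfies $P(O_1)P(O_4) = P(O_2)P(O_3)$, since both sides equal $\alpha\beta(1-\alpha)(1-\beta)$, whereas the target distribution (6.15) gives $\tfrac{1}{4}$ on the left and 0 on the right. The Python sketch below checks the identity on a grid of exact fractions; the grid is only an illustration (the identity itself is the argument), and the function name is an assumption of this sketch.

```python
from fractions import Fraction


def outcome_distribution(alpha, beta):
    """Probabilities of (O1, O2, O3, O4) under a behavior strategy, Equation (6.16)."""
    return (
        (1 - alpha) * (1 - beta),
        (1 - alpha) * beta,
        alpha * (1 - beta),
        alpha * beta,
    )


# Target distribution induced by the mixed strategy, Equation (6.15).
target = (Fraction(1, 2), Fraction(0), Fraction(0), Fraction(1, 2))

grid = [Fraction(k, 20) for k in range(21)]
for alpha in grid:
    for beta in grid:
        o1, o2, o3, o4 = outcome_distribution(alpha, beta)
        # Product form: holds for every behavior strategy.
        assert o1 * o4 == o2 * o3

# The target violates the identity, so no (alpha, beta) can induce it.
target_violates_identity = target[0] * target[3] != target[1] * target[2]
print(target_violates_identity)  # True
```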
Example 6.9 The Absent-Minded Driver: a game with a behavior strategy that has no equivalent mixed strategy Consider the game in Figure 6.4, involving only one player, Player I. In this game, the player, when he comes to choosing an action, cannot recall whether or not he has chosen an action in the past. An illustrative story that often accompanies this example is that of an absent-minded driver, motoring down a road with two exits. When the driver arrives at an exit, he cannot recall whether it is the first exit on the road, or the second exit.
[Figure 6.4 The Absent-Minded Driver game; game tree not reproduced here.]
There are two pure strategies: T and B. The pure strategy T yields the outcome $O_3$, while the pure strategy B yields the outcome $O_1$. Since a mixed strategy is a probability distribution over the set of pure strategies, no mixed strategy can yield the outcome $O_2$ with positive probability. In contrast, the behavior strategy $[\tfrac{1}{2}(T), \tfrac{1}{2}(B)]$, where the player chooses one of the two actions with equal probability at each of the two vertices in his information set, leads to the following probability distribution over outcomes:

$$\big[\tfrac{1}{4}(O_1),\ \tfrac{1}{4}(O_2),\ \tfrac{1}{2}(O_3)\big]. \tag{6.21}$$
Since this probability distribution can never be the result of implementing a mixed strategy, we conclude that there is no mixed strategy equivalent to this behavior strategy. ◭
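The contrast between the two kinds of randomization in this example can be sketched in a few lines of Python. The sketch assumes, consistently with the text, that T at the first vertex yields $O_3$, B followed by T yields $O_2$, and B followed by B yields $O_1$; the function names are illustrative.

```python
from fractions import Fraction


def driver_outcomes(p_T):
    """Behavior strategy: T is chosen with probability p_T at every visit."""
    p_B = 1 - p_T
    return {"O1": p_B * p_B, "O2": p_B * p_T, "O3": p_T}


def mixed_outcomes(q_T):
    """Mixed strategy [q_T(T), (1 - q_T)(B)]: one lottery, same action at both vertices."""
    return {"O1": 1 - q_T, "O2": Fraction(0), "O3": q_T}


half = Fraction(1, 2)
print(driver_outcomes(half))  # O1: 1/4, O2: 1/4, O3: 1/2, as in Equation (6.21)
print(mixed_outcomes(half))   # O2 has probability 0 for every q_T
```

Whatever value of `q_T` is chosen, the mixed strategy assigns probability 0 to $O_2$, while the behavior strategy assigns it positive probability whenever $0 < p_T < 1$.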
6.2 Kuhn’s Theorem
Let us note that the player suffers from forgetfulness of a different kind in each of the above examples: in Example 6.8, when the player is about to take an action the second time, he cannot recall what action he chose the first time; he knows that he has made a previous move, but cannot recall what action he took. In Example 6.9, the player does not even recall whether or not he has made a move in the past (although he does know that if he did make a prior move, he necessarily must have chosen action B). What happens when the player is not forgetful? Will this ensure that every behavior strategy has an equivalent mixed strategy, and that every mixed strategy has an equivalent behavior strategy? As we will show in this section, the answer to these questions is affirmative.
6.2.1 Conditions for the existence of an equivalent mixed strategy to any behavior strategy

Let x be a vertex in the game tree that is not the root, and let $x_1$ be a vertex on the path from the root to x. The (unique) edge emanating from $x_1$ on the path from the root to x is called the action at $x_1$ leading to x. A pure strategy selects the same action at every vertex in each one of the corresponding player’s information sets. It follows that if the path from the root to x passes through two vertices $x_1$ and $\hat{x}_1$ that are in the same information set of player i, and if the action at $x_1$ leading to x differs from the action at $\hat{x}_1$ leading to x, then when player i implements a pure strategy the play of the game cannot arrive at x. For this reason, in Example 6.9 there is no pure strategy leading to the vertex $x_5$. Since a mixed strategy is a probability distribution over pure strategies, the probability that a play of the game will arrive at such a vertex x is 0 when player i implements any mixed strategy. In contrast, if all the players implement behavior strategies in which at every information set every possible action is played with positive probability, then for each vertex in the game tree there is a positive probability that the play of the game will reach that vertex. This leads to the following conclusion (Exercise 6.8).

Corollary 6.10 If there exists a path from the root to some vertex x that passes at least twice through the same information set $U_i$ of player i, and if the action leading in the direction of x is not the same action at each of the corresponding vertices, then player i has a behavior strategy that has no equivalent mixed strategy.

The last corollary will be used to prove the next theorem, which gives a necessary and sufficient condition for the existence of a mixed strategy equivalent to every behavior strategy. If every path emanating from the root passes through each information set at most once, then every behavior strategy has an equivalent mixed strategy.
Theorem 6.11 Let $\Gamma = (N, V, E, v_0, (V_i)_{i \in N \cup \{0\}}, (p_x)_{x \in V_0}, (\mathcal{U}_i)_{i \in N}, O, u)$ be an extensive-form game that satisfies the condition that at every vertex there are at least two actions. Every behavior strategy of player i has an equivalent mixed strategy if and only if each information set of player i intersects every path emanating from the root at most once.
In the game in Example 6.9, there is a path that twice intersects the same information set, and we indeed identified a behavior strategy of that game that has no equivalent mixed strategy. The theorem does not hold without the condition that there are at least two actions at each vertex (Exercise 6.9). We first prove that the condition in the statement of the theorem is necessary.

Proof of Theorem 6.11: the condition is necessary. Suppose that there exists a path from the root to a vertex x that intersects the same information set $U_i$ of player i at least twice. We will prove that there is a behavior strategy of player i that has no mixed strategy equivalent to it. Let $x_1$ and $\hat{x}_1$ be two distinct vertices in the above-mentioned information set that are located along the path (see Figure 6.5). Denote by a the action at $x_1$ leading to x, and by b an action at $\hat{x}_1$ that differs from a. Let $x_2$ be the vertex that the play of the game reaches if at vertex $\hat{x}_1$ player i chooses action b.
[Figure 6.5 The game tree in the proof of Theorem 6.11; game tree not reproduced here.]
The path from the root to $x_2$ passes through the vertices $x_1$ and $\hat{x}_1$, the action at $x_1$ leading to $x_2$ is a, and the action at $\hat{x}_1$ leading to $x_2$ is b. By Corollary 6.10 it follows that there is a behavior strategy of player i that has no mixed strategy equivalent to it, which is what we needed to show.

We now explain the idea underlying the proof of the second direction. The proof itself will be presented in Section 6.2.3 after we introduce several definitions. Let $b_i$ be a behavior strategy of player i. When the play of the game arrives at information set $U_i$, player i conducts a lottery based on the probability distribution $b_i(U_i)$ to choose one of the actions available at information set $U_i$. Player i could just as easily conduct this lottery at the start of the game, instead of waiting until he gets to the information set $U_i$. In other words, at the start of the game, the player can conduct a lottery for each one of his information sets $U_i$, using in each case the probability distribution $b_i(U_i)$, and then play the action thus chosen at each information set, respectively, if and when the play of the game reaches it. Since all the lotteries are conducted at the start of the game, we have essentially defined a mixed strategy that is equivalent to $b_i$.

This construction would not be possible without the condition that any path from the root intersects every information set at most once. Indeed, if there were a path intersecting the same information set of player i several times, then the mechanism described in the previous paragraph would require player i to choose the same action every time he gets to that information set. In contrast, a behavior strategy enables the player to choose his actions at the information sets independently every time the play of the game arrives at the information set. It follows that in this case the mixed strategy that the process defines is not equivalent to the behavior strategy $b_i$.

Before we prove the other direction of Theorem 6.11 (sufficiency), we present $\rho(x; \sigma)$, the probability that the play of the game reaches vertex x, as the product of probabilities, each of which depends solely on one player. This representation will serve us in several proofs in this section, as will the notation that we now introduce.
6.2.2 Representing ρ(x; σ) as a product of probabilities

For each decision vertex x of player i, denote by $U_i(x) \in \mathcal{U}_i$ the information set of player i containing x. For each descendant $\hat{x}$ of x denote by $a_i(x \to \hat{x}) \in A(U_i(x))$ the equivalence class containing the action leading from x to $\hat{x}$. This is the action that player i must choose at vertex x for the play of the game to continue in the direction of vertex $\hat{x}$. For each vertex x (not necessarily a decision vertex of player i) denote the number of vertices along the path from the root to x (not including x) at which player i is the decision maker by $L_i^x$, and denote these nodes by $x_i^1, x_i^2, \ldots, x_i^{L_i^x}$. In Example 6.4, $L_I^{x_{10}} = 2$, $x_I^1 = x_1$, $x_I^2 = x_6$, and

$$U_I(x_1) = \{x_1\}, \qquad U_I(x_6) = \{x_6, x_7\}. \tag{6.22}$$

Since an information set can contain several vertices on the path from the root to x, as happens in the Absent-Minded Driver game (Example 6.9), it is possible that $U_i(x_i^{l_1}) = U_i(x_i^{l_2})$ even when $l_1 \neq l_2$. In Example 6.9, $L_I^{x_4} = 2$ and

$$U_I(x_I^1) = U_I(x_I^2) = \{x_1, x_2\}. \tag{6.23}$$
What is the probability that under the strategy implemented by player i, he will choose the action leading to x at each one of the information sets preceding x? If player i implements behavior strategy $b_i$, this probability equals

$$\rho_i(x; b_i) := \begin{cases} \prod_{l=1}^{L_i^x} b_i\big(a_i(x_i^l \to x);\ U_i(x_i^l)\big) & \text{if } L_i^x > 0, \\ 1 & \text{if } L_i^x = 0. \end{cases} \tag{6.24}$$

If player i implements the mixed strategy $\sigma_i$, then $\sigma_i(s_i)$ is the probability that he chooses pure strategy $s_i$. Denote by $S_i^*(x) \subseteq S_i$ all of player i’s pure strategies under which at each information set $U_i(x_i^l)$, $1 \le l \le L_i^x$, he chooses the action $a_i(x_i^l \to x)$. The set $S_i^*(x)$ may be empty; since a pure strategy cannot choose two different actions at the same information set, this happens when the path from the root to x passes at least twice through the same information set of player i, and the action leading to x is not the same action in every case. When $S_i^*(x) \neq \emptyset$, the probability that player i chooses the actions leading to vertex x is

$$\rho_i(x; \sigma_i) := \sum_{s_i \in S_i^*(x)} \sigma_i(s_i). \tag{6.25}$$

When $S_i^*(x) = \emptyset$, this probability is defined by $\rho_i(x; \sigma_i) := 0$. Because the lotteries conducted by the players are independent, we get that for each mixed/behavior strategy vector σ and every vertex x,

$$\rho(x; \sigma) = \prod_{i \in N} \rho_i(x; \sigma_i). \tag{6.26}$$

We turn now to the proof of the second direction of Theorem 6.11.
6.2.3 Proof of the second direction of Theorem 6.11: sufficiency

We want to prove that if every path intersects each information set of player i at most once, then every behavior strategy of player i has an equivalent mixed strategy. A pure strategy of player i is a choice of an action from his action set at each of his information sets. Hence the set of pure strategies of player i is

$$S_i = \mathop{\times}_{U_i \in \mathcal{U}_i} A(U_i). \tag{6.27}$$

For every pure strategy $s_i$ of player i, and every information set $U_i$, the action that the player chooses at $U_i$ is $s_i(U_i)$. It follows that for every behavior strategy $b_i$ and every pure strategy $s_i$ of player i, $b_i(s_i(U_i); U_i)$ is the probability that under behavior strategy $b_i$, at each time that the play of the game reaches a vertex in information set $U_i$, player i chooses the same action that $s_i$ chooses at this information set. Given a behavior strategy $b_i$ of player i, we will now define a mixed strategy $\sigma_i$ that is equivalent to $b_i$. For every pure strategy $s_i$ of player i define the “probability that this strategy is chosen according to $b_i$” as

$$\sigma_i(s_i) := \prod_{U_i \in \mathcal{U}_i} b_i(s_i(U_i); U_i). \tag{6.28}$$
First, we will show that $\sigma_i := (\sigma_i(s_i))_{s_i \in S_i}$ is a probability distribution over $S_i$, and hence it defines a mixed strategy for player i. Since $\sigma_i(s_i)$ is a product of nonnegative numbers, $\sigma_i(s_i) \ge 0$ for every pure strategy $s_i \in S_i$. We now verify that $\sum_{s_i \in S_i} \sigma_i(s_i) = 1$. Indeed,

$$\sum_{s_i \in S_i} \sigma_i(s_i) = \sum_{s_i \in S_i} \Big( \prod_{U_i \in \mathcal{U}_i} b_i(s_i(U_i); U_i) \Big) \tag{6.29}$$

$$= \prod_{U_i \in \mathcal{U}_i} \sum_{a_i \in A(U_i)} b_i(a_i; U_i) \tag{6.30}$$

$$= \prod_{U_i \in \mathcal{U}_i} 1 = 1. \tag{6.31}$$
Equation (6.30) follows from changing the order of the product and the summation and from the assumption that every path intersects every information set at most once. Finally, we need to check that the mixed strategy $\sigma_i$ is equivalent to $b_i$. Let x be a vertex. We will show that for each mixed/behavior strategy vector $\sigma_{-i}$ of players $N \setminus \{i\}$,

$$\rho(x; b_i, \sigma_{-i}) = \rho(x; \sigma_i, \sigma_{-i}). \tag{6.32}$$

From Equation (6.26), we deduce that

$$\rho(x; b_i, \sigma_{-i}) = \rho_i(x; b_i) \times \prod_{j \neq i} \rho_j(x; \sigma_j), \tag{6.33}$$

and

$$\rho(x; \sigma_i, \sigma_{-i}) = \rho_i(x; \sigma_i) \times \prod_{j \neq i} \rho_j(x; \sigma_j). \tag{6.34}$$

It follows that in order to show that Equation (6.32) is satisfied, it suffices to show that

$$\rho_i(x; b_i) = \rho_i(x; \sigma_i). \tag{6.35}$$
Divide player i’s collection of information sets into two: $\mathcal{U}_i^1$, containing all the information sets intersected by the path from the root to x, and $\mathcal{U}_i^2$, containing all the information sets that are not intersected by this path. Since $S_i^*(x)$ is the set of pure strategies of player i in which he implements the action leading to vertex x in all information sets intersected by the path from the root to x,

$$\rho_i(x; \sigma_i) = \sum_{s_i \in S_i^*(x)} \sigma_i(s_i) \tag{6.36}$$

$$= \sum_{s_i \in S_i^*(x)} \prod_{U_i \in \mathcal{U}_i} b_i(s_i(U_i); U_i) \tag{6.37}$$

$$= \sum_{s_i \in S_i^*(x)} \Big( \prod_{U_i \in \mathcal{U}_i^1} b_i(s_i(U_i); U_i) \times \prod_{U_i \in \mathcal{U}_i^2} b_i(s_i(U_i); U_i) \Big). \tag{6.38}$$
Since $\mathcal{U}_i^1$ contains only the information sets $U_i(x_i^1), U_i(x_i^2), \ldots, U_i(x_i^{L_i^x})$, and since for every $l \in \{1, 2, \ldots, L_i^x\}$ the pure strategy $s_i \in S_i^*(x)$ instructs player i to play action $a_i(x_i^l \to x)$ at information set $U_i(x_i^l)$, we deduce, using Equation (6.24), that

$$\prod_{U_i \in \mathcal{U}_i^1} b_i(s_i(U_i); U_i) = \prod_{l=1}^{L_i^x} b_i\big(a_i(x_i^l \to x);\ U_i(x_i^l)\big) = \rho_i(x; b_i). \tag{6.39}$$
In particular, this product is independent of $s_i \in S_i^*(x)$. We can therefore move the product outside of the sum in Equation (6.38), yielding

$$\rho_i(x; \sigma_i) = \rho_i(x; b_i) \times \Big( \sum_{s_i \in S_i^*(x)} \prod_{U_i \in \mathcal{U}_i^2} b_i(s_i(U_i); U_i) \Big). \tag{6.40}$$
We will now show that the second element on the right-hand side of Equation (6.40) equals 1. The fact that $s_i$ is contained in $S_i^*(x)$ does not impose any constraints on the actions implemented by player i at the information sets in $\mathcal{U}_i^2$. For every sequence $(a_{U_i})_{U_i \in \mathcal{U}_i^2}$ at which $a_{U_i} \in A(U_i)$ is a possible action for player i at information set $U_i$ for all $U_i \in \mathcal{U}_i^2$, there is a pure strategy $s_i \in S_i^*(x)$ such that $a_{U_i} = s_i(U_i)$ for all $U_i \in \mathcal{U}_i^2$. Moreover, there is an injective mapping between the set of pure strategies $S_i^*(x)$ and the set of the sequences $(a_{U_i})_{U_i \in \mathcal{U}_i^2} \in \mathop{\times}_{U_i \in \mathcal{U}_i^2} A(U_i)$. Therefore,

$$\sum_{s_i \in S_i^*(x)} \prod_{U_i \in \mathcal{U}_i^2} b_i(s_i(U_i); U_i) = \sum_{(a_{U_i})_{U_i \in \mathcal{U}_i^2} \in \mathop{\times}_{U_i \in \mathcal{U}_i^2} A(U_i)} \ \prod_{U_i \in \mathcal{U}_i^2} b_i(a_{U_i}; U_i) = \prod_{U_i \in \mathcal{U}_i^2} \Big( \sum_{a_{U_i} \in A(U_i)} b_i(a_{U_i}; U_i) \Big) = \prod_{U_i \in \mathcal{U}_i^2} 1 = 1. \tag{6.41}$$

Equation (6.40) therefore implies that

$$\rho_i(x; \sigma_i) = \rho_i(x; b_i), \tag{6.42}$$

which is what we wanted to prove.
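The construction in the proof can be checked numerically on a small instance. The Python sketch below builds $\sigma_i$ from a behavior strategy by the product rule of Equation (6.28), then compares $\rho_i(x; \sigma_i)$ from Equation (6.25) with $\rho_i(x; b_i)$ from Equation (6.24). The information sets `U1`, `U2`, `U3`, their probabilities, and the vertex described by `path_to_x` are hypothetical data chosen for illustration. Representing the path as a dictionary (one action per information set) encodes the assumption that the path crosses each information set at most once.

```python
from fractions import Fraction
from itertools import product
from math import prod

# Hypothetical behavior strategy: information set -> {action: probability}.
b_i = {
    "U1": {"T1": Fraction(1, 3), "B1": Fraction(2, 3)},
    "U2": {"T2": Fraction(1, 4), "B2": Fraction(3, 4)},
    "U3": {"T3": Fraction(1, 2), "B3": Fraction(1, 2)},
}
info_sets = list(b_i)

# Equation (6.28): sigma_i(s_i) is the product of b_i(s_i(U_i); U_i) over all U_i.
sigma_i = {
    s: prod(b_i[U][a] for U, a in zip(info_sets, s))
    for s in product(*(b_i[U] for U in info_sets))
}


def rho_behavior(path_actions):
    """Equation (6.24): product of behavior probabilities along the path to x."""
    return prod((b_i[U][a] for U, a in path_actions.items()), start=Fraction(1))


def rho_mixed(path_actions):
    """Equation (6.25): total weight of the pure strategies in S_i^*(x)."""
    return sum(
        (
            p
            for s, p in sigma_i.items()
            if all(s[info_sets.index(U)] == a for U, a in path_actions.items())
        ),
        Fraction(0),
    )


# Hypothetical vertex x: its path crosses U1 (action B1) and U2 (action T2), not U3.
path_to_x = {"U1": "B1", "U2": "T2"}
print(sum(sigma_i.values()), rho_behavior(path_to_x), rho_mixed(path_to_x))
```

The free choice at `U3` sums out to 1, which is the content of Equation (6.41) on this instance.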
6.2.4 Conditions guaranteeing the existence of a behavior strategy equivalent to a mixed strategy

In this section, we present a condition guaranteeing that every mixed strategy has an equivalent behavior strategy. This requires formalizing when a player never forgets anything. During the play of a game, a player can forget many things:
• He can forget what moves he made in the past (as in Example 6.8).
• He can forget whether or not he made a move at all in the past (as in Example 6.9).
• He can forget things he knew at earlier stages of the games, such as the result of a chance move, what actions another player has played, which players acted in the past, or how many times a particular player played in the past.
The next definition guarantees that a player never forgets any of the items in the above list (Exercises 6.11–6.15). Recall that all the vertices in the same information set must have the same associated action set (Definition 3.23 on page 54).

Definition 6.12 Let $X = (x^0 \to x^1 \to \ldots \to x^K)$ and $\hat{X} = (\hat{x}^0 \to \hat{x}^1 \to \ldots \to \hat{x}^L)$ be two² paths in the game tree. Let $U_i$ be an information set of player i, which intersects each of these two paths at only one vertex: $X$ at $x^k$, and $\hat{X}$ at $\hat{x}^l$. We say that these two paths choose the same action at information set $U_i$ if $k < K$, $l < L$, and the action at $x^k$ leading to $x^{k+1}$ is identical to the action at $\hat{x}^l$ leading to $\hat{x}^{l+1}$, i.e., $a_i(x^k \to x^{k+1}) = a_i(\hat{x}^l \to \hat{x}^{l+1})$.

Definition 6.13 Player i has perfect recall if the following conditions are satisfied:
(a) Every information set of player i intersects every path from the root to a leaf at most once.
(b) Every two paths from the root that end in the same information set of player i pass through the same information sets of player i, and in the same order, and in every such information set the two paths choose the same action. In other words, for every information set $U_i$ of player i and every pair of vertices $x, \hat{x}$ in $U_i$, if the decision vertices of player i on the path from the root to x are $x_i^1, x_i^2, \ldots, x_i^L = x$ and his decision vertices on the path from the root to $\hat{x}$ are $\hat{x}_i^1, \hat{x}_i^2, \ldots, \hat{x}_i^{\hat{L}} = \hat{x}$, then $L = \hat{L}$, and $U_i(x_i^l) = U_i(\hat{x}_i^l)$, and $a_i(x_i^l \to x) = a_i(\hat{x}_i^l \to \hat{x})$ for all $l \in \{1, 2, \ldots, L\}$.

² In the description of a path in a game tree we list only the vertices, because the edges along the path are uniquely determined by those vertices.
A game is called a game with perfect recall if all the players have perfect recall.
Two games are shown in Figure 6.6. In Game A, every player has a single information set, and all the players have perfect recall. In Game B, in contrast, Player I has imperfect recall, because the two paths connecting the root to the vertices in information set {x3 , x4 } do not choose the same action in information set {x1 }. Player II, however, has perfect recall in this game.
Figure 6.6 Two games in extensive form (Game A: a game with perfect recall; Game B: a game with imperfect recall)
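The two conditions of Definition 6.13 can be phrased as a small check. The sketch below assumes a hypothetical encoding that is not part of the text: each information set of the player is mapped to a list with one "history" per vertex in the set, where a history is the tuple of (information set, action) pairs the player himself went through on the way from the root. The labels and the assignment of actions to $x_3$ and $x_4$ in the Game B encoding are assumptions made for illustration only.

```python
def has_perfect_recall(info_sets):
    """info_sets: {label: [history, ...]}, one history per vertex in the set.

    A history is the tuple of (information set, action) pairs that the
    player himself passed through on the way from the root to that vertex.
    """
    for label, histories in info_sets.items():
        # Condition (a): no path meets the same information set twice.
        if any(label == past for h in histories for past, _ in h):
            return False
        # Condition (b): same information sets, same order, same actions.
        if any(h != histories[0] for h in histories):
            return False
    return True

# A player with a single information set containing two vertices,
# neither preceded by a move of his own (as for every player in Game A).
single_info_set = {"U": [(), ()]}

# Player I in Game B (assumed labels): x3 and x4 share an information set
# but are reached after different actions at {x1}.
game_b_player_one = {
    "{x1}": [()],
    "{x3,x4}": [(("{x1}", "B1"),), (("{x1}", "T1"),)],
}

print(has_perfect_recall(single_info_set))    # True
print(has_perfect_recall(game_b_player_one))  # False
```

Comparing whole histories covers all three requirements of condition (b) at once: the same information sets, in the same order, with the same actions.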
Recall that $S_i^*(x)$ is the set of pure strategies of player i at which he chooses the actions leading to vertex x (see page 228). The definition of perfect recall implies the following corollary (Exercise 6.16).

Theorem 6.14 Let i be a player with perfect recall in an extensive-form game, and let $x$ and $x'$ be two vertices in the same information set of player i. Then $S_i^*(x) = S_i^*(x')$.

Theorem 6.15 (Kuhn [1957]) In every game in extensive form, if player i has perfect recall, then for every mixed strategy of player i there exists an equivalent behavior strategy.

Proof: We make use of the following notation: for each vertex x of player i, and each possible action a in $A(U_i(x))$, we denote by $x^a$ the vertex in the game tree that the play of the game reaches if player i chooses action a at vertex x. Let $\sigma_i$ be a mixed strategy of player i. Our goal is to define a behavior strategy $b_i$ equivalent to $\sigma_i$.

Step 1: Defining a behavior strategy $b_i$.
To define a behavior strategy $b_i$ we have to define, for each information set $U_i$ of player i, a probability distribution over the set of possible actions at $U_i$. So suppose $U_i$ is an information set of player i, and let x be a vertex in $U_i$. For each action $a_i \in A(U_i)$, the collection $S_i^*(x^{a_i})$ contains all the pure strategies $s_i$ in $S_i^*(x)$ satisfying $s_i(U_i) = a_i$. If $\sum_{s_i \in S_i^*(x)} \sigma_i(s_i) > 0$ define

$$b_i(a_i; U_i) := \frac{\sum_{s_i \in S_i^*(x^{a_i})} \sigma_i(s_i)}{\sum_{s_i \in S_i^*(x)} \sigma_i(s_i)}, \quad \forall a_i \in A(x). \qquad (6.43)$$
The numerator on the right-hand side of Equation (6.43) is the probability that player i will play the actions leading to $x^a$, and the denominator is the probability that player i will play the actions leading to x. It follows that the ratio between the two values equals the conditional probability that player i plays action a if the play reaches vertex x.

If $\sum_{s_i \in S_i^*(x)} \sigma_i(s_i) = 0$, by Theorem 6.14 it follows that $\sum_{s_i \in S_i^*(x')} \sigma_i(s_i) = 0$ for each vertex $x'$ in the information set of x. Therefore, when player i implements $\sigma_i$ the probability that the play of the game will visit the information set containing x is 0, i.e., $\rho_i(x; \sigma_i) = 0$. In this case, the definition of $b_i$, for information set $U_i$, makes no difference. For the definition of $b_i$ to be complete, we define in this case

$$b_i(a_i; U_i) = \frac{1}{|A(U_i)|}, \quad \forall a_i \in A(x). \qquad (6.44)$$
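The construction in Equations (6.43) and (6.44) can be sketched in code. The example below assumes a small hypothetical game with perfect recall that does not appear in the text: player i has an information set `U1` with actions T and B, and an information set `U2` (actions L and R) that is reached only after T. The names `INFO_SETS`, `ACTIONS`, `PATH_TO`, `mass`, and `behavior_from_mixed` are illustrative.

```python
from fractions import Fraction

INFO_SETS = ["U1", "U2"]                      # order of coordinates in a pure strategy
ACTIONS = {"U1": ["T", "B"], "U2": ["L", "R"]}
# Own actions player i must take to reach (any vertex of) each information set.
PATH_TO = {"U1": {}, "U2": {"U1": "T"}}

def mass(sigma, required):
    """sigma-probability of S_i^*(x): pure strategies choosing every required action."""
    return sum(
        (p for s, p in sigma.items()
         if all(s[INFO_SETS.index(u)] == a for u, a in required.items())),
        Fraction(0),
    )

def behavior_from_mixed(sigma):
    b = {}
    for u in INFO_SETS:
        denom = mass(sigma, PATH_TO[u])
        if denom > 0:   # Equation (6.43): conditional probability of each action
            b[u] = {a: mass(sigma, {**PATH_TO[u], u: a}) / denom for a in ACTIONS[u]}
        else:           # Equation (6.44): the information set is never reached
            b[u] = {a: Fraction(1, len(ACTIONS[u])) for a in ACTIONS[u]}
    return b

sigma = {("T", "L"): Fraction(1, 2), ("T", "R"): Fraction(1, 4),
         ("B", "L"): Fraction(1, 4), ("B", "R"): Fraction(0)}
b = behavior_from_mixed(sigma)
print(b["U1"]["T"], b["U2"]["L"])  # 3/4 2/3
```

Under this mixed strategy T is played with probability 3/4, and conditional on T the action L has probability (1/2)/(3/4) = 2/3, which is the ratio that Equation (6.43) prescribes.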
We now show that the definition of $b_i$ is independent of the vertex x chosen in information set $U_i$, so that the behavior strategy $b_i$ is well defined. It suffices to check the case $\sum_{s_i \in S_i^*(x)} \sigma_i(s_i) > 0$, because when $\sum_{s_i \in S_i^*(x)} \sigma_i(s_i) = 0$, the definition of $b_i$ (see Equation (6.44)) is independent of x; it depends only on $U_i$. Let then $x_1$ and $x_2$ be two different vertices in $U_i$. Since player i has perfect recall, Theorem 6.14 implies that $S_i^*(x_1) = S_i^*(x_2)$. Since $x_1$ and $x_2$ are in the same information set, the set of possible actions at $x_1$ equals the set of possible actions at $x_2$: $A(x_1) = A(x_2)$. If a is a possible action at these vertices, $x_1^a$ and $x_2^a$ are the vertices reached by the play of the game from $x_1$ and from $x_2$ respectively, if player i implements action a at these vertices. Using Theorem 6.14 we deduce that $S_i^*(x_1^a) = S_i^*(x_2^a)$. In particular, it follows that the numerator and denominator of Equation (6.43) are independent of the choice of vertex x in $U_i$.

Step 2: Showing that $b_i$ is a behavior strategy.
We need to prove that for every information set $U_i$ of player i, $b_i(U_i)$ is a probability distribution over $A(U_i)$, i.e., that $b_i(U_i)$ is a vector of nonnegative numbers summing to one. Equation (6.44) defines a probability distribution over $A(U_i)$ for the case $\sum_{s_i \in S_i^*(x)} \sigma_i(s_i) = 0$. We show now that when $\sum_{s_i \in S_i^*(x)} \sigma_i(s_i) > 0$, Equation (6.43) defines a probability distribution over $A(U_i)$. Since $\sigma_i(s_i) \geq 0$ for every pure strategy $s_i$, the numerator in Equation (6.43) is nonnegative, and hence $b_i(a_i; U_i) \geq 0$ for every action $a_i \in A(U_i)$. The sets $\{S_i^*(x^a) : a \in A(U_i)\}$ are disjoint, and their union is $S_i^*(x)$. It follows that

$$\sum_{a \in A(U_i)} \sum_{s_i \in S_i^*(x^a)} \sigma_i(s_i) = \sum_{s_i \in S_i^*(x)} \sigma_i(s_i). \qquad (6.45)$$

We deduce from Equations (6.43) and (6.45) that in this case

$$\sum_{a_i \in A(U_i)} b_i(a_i; U_i) = 1.$$
Step 3: Showing that $b_i$ is equivalent to $\sigma_i$.
Let $\sigma_{-i}$ be a mixed/behavior strategy vector of the other players, and let x be a vertex in the game tree (not necessarily a decision vertex of player i). We need to show that

$$\rho(x; b_i, \sigma_{-i}) = \rho(x; \sigma_i, \sigma_{-i}). \qquad (6.46)$$

As we saw previously, Equation (6.26) implies that

$$\rho(x; b_i, \sigma_{-i}) = \rho_i(x; b_i) \times \prod_{j \neq i} \rho_j(x; \sigma_j), \qquad (6.47)$$

and

$$\rho(x; \sigma_i, \sigma_{-i}) = \rho_i(x; \sigma_i) \times \prod_{j \neq i} \rho_j(x; \sigma_j). \qquad (6.48)$$

To show that Equation (6.46) is satisfied, it therefore suffices to show that

$$\rho_i(x; b_i) = \rho_i(x; \sigma_i). \qquad (6.49)$$
In words, we need to show that the probability that player i will play actions leading to x under $\sigma_i$ equals the probability that player i will do the same under $b_i$. Recall that $x_i^1, x_i^2, \ldots, x_i^{L_i^x}$ is the sequence of decision vertices of player i along the path from the root to x (not including the vertex x if player i is the decision maker there). If $L_i^x = 0$, then player i has no information set intersected by the path from the root to x, so $S_i^*(x) = S_i$. In this case, we have defined $\rho_i(x; b_i) = 1$ (see Equation (6.24)), and also

$$\rho_i(x; \sigma_i) = \sum_{s_i \in S_i^*(x)} \sigma_i(s_i) = \sum_{s_i \in S_i} \sigma_i(s_i) = 1. \qquad (6.50)$$

Hence Equation (6.49) is satisfied. Suppose, then, that $L_i^x > 0$. Every strategy of player i that chooses, at each information set $U_i(x_i^1), U_i(x_i^2), \ldots, U_i(x_i^l)$, the action leading to x is a strategy that does so at each information set $U_i(x_i^1), U_i(x_i^2), \ldots, U_i(x_i^{l-1})$ and at information set $U_i(x_i^l)$ chooses the action $a_l := a_i(x_i^l \to x)$. In other words,

$$S_i^*(x_i^{l+1}) = S_i^*(x_i^{l,a_l}). \qquad (6.51)$$

Since $b_i$ is a behavior strategy, Equation (6.24) implies that

$$\rho_i(x; b_i) = \prod_{l=1}^{L_i^x} b_i\bigl(a_l; U_i(x_i^l)\bigr). \qquad (6.52)$$

If $\rho_i(x; b_i) \neq 0$, then the definition of $b_i$ (Equation (6.43)) implies that

$$\rho_i(x; b_i) = \prod_{l=1}^{L_i^x} \frac{\sum_{s_i \in S_i^*(x_i^{l,a_l})} \sigma_i(s_i)}{\sum_{s_i \in S_i^*(x_i^l)} \sigma_i(s_i)}. \qquad (6.53)$$

From Equation (6.51) we deduce that

$$\sum_{s_i \in S_i^*(x_i^{l,a_l})} \sigma_i(s_i) = \sum_{s_i \in S_i^*(x_i^{l+1})} \sigma_i(s_i). \qquad (6.54)$$
It follows that the product on the right-hand side of Equation (6.53) is a telescopic product: the numerator in the l-th element of the product equals the denominator in the (l + 1)-th element of the product. This means that adjacent product elements cancel each other out. Note that $S_i^*(x_i^{l,a_l}) = S_i^*(x)$ is satisfied for $l = L_i^x$, so that canceling adjacent product elements in Equation (6.53) yields

$$\rho_i(x; b_i) = \frac{\sum_{s_i \in S_i^*(x)} \sigma_i(s_i)}{\sum_{s_i \in S_i^*(x_i^1)} \sigma_i(s_i)}. \qquad (6.55)$$

Recall that $x_i^1$ is player i’s first decision vertex on the path from the root to x. Since player i has no information set prior to $x_i^1$, every strategy of player i is in $S_i^*(x_i^1)$, i.e., $S_i^*(x_i^1) = S_i$. The denominator in Equation (6.55) therefore equals 1, so that

$$\rho_i(x; b_i) = \sum_{s_i \in S_i^*(x)} \sigma_i(s_i) = \rho_i(x; \sigma_i), \qquad (6.56)$$

which is what we claimed.

To wrap up, we turn our attention to the case $\rho_i(x; b_i) = 0$. From Equation (6.24), we deduce that $\rho_i(x; b_i)$ is given by a product of elements and therefore one of those elements vanishes: there exists l, $1 \leq l \leq L_i^x$, such that $b_i(a_l; U_i(x_i^l)) = 0$. From the definition of $b_i$ (Equation (6.43)) we deduce that $\sum_{s_i \in S_i^*(x_i^{l,a_l})} \sigma_i(s_i) = 0$. On the other hand, $S_i^*(x_i^{l,a_l}) \supseteq S_i^*(x)$ and therefore by Equation (6.25)

$$\rho_i(x; \sigma_i) = \sum_{s_i \in S_i^*(x)} \sigma_i(s_i) \leq \sum_{s_i \in S_i^*(x_i^{l,a_l})} \sigma_i(s_i) = 0. \qquad (6.57)$$

Hence Equation (6.49) is satisfied in this case.
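The telescoping step between Equations (6.53) and (6.55) can be illustrated with numbers. In the sketch below, `masses` is a hypothetical list (not taken from the text) holding the values of $\sum_{s_i \in S_i^*(x_i^l)} \sigma_i(s_i)$ for $l = 1, \ldots, L_i^x$, followed by the value of $\sum_{s_i \in S_i^*(x)} \sigma_i(s_i)$; the first entry is 1 because $S_i^*(x_i^1) = S_i$.

```python
from fractions import Fraction
from math import prod

# Hypothetical probabilities of the nested sets S_i^*(x_i^1), S_i^*(x_i^2),
# S_i^*(x_i^3), and finally S_i^*(x); each set contains the next one.
masses = [Fraction(1), Fraction(3, 4), Fraction(1, 2), Fraction(1, 8)]

# Each factor of Equation (6.53): the mass after the chosen action divided by
# the mass before it. By Equation (6.54) the numerator of factor l is the
# denominator of factor l + 1.
factors = [masses[l + 1] / masses[l] for l in range(len(masses) - 1)]

rho_b = prod(factors, start=Fraction(1))
print(factors)  # [Fraction(3, 4), Fraction(2, 3), Fraction(1, 4)]
print(rho_b)    # 1/8
```

The product of the three conditional probabilities equals the last mass divided by the first, which is the content of Equation (6.55).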
6.3 Equilibria in behavior strategies

By Nash’s Theorem (Theorem 5.10 on page 151) every finite extensive-form game has a Nash equilibrium in mixed strategies. In other words, there exists a vector of mixed strategies under which no player has a profitable deviation to another mixed strategy. An equilibrium in behavior strategies is a vector of behavior strategies under which no player has a profitable deviation to another behavior strategy. The next theorem states that to ensure the existence of a Nash equilibrium in behavior strategies, it suffices that all the players have perfect recall.

Theorem 6.16 If all the players in an extensive-form game have perfect recall then the game has a Nash equilibrium in behavior strategies.

Proof: Since an extensive-form game is by definition a finite game, Nash’s Theorem (Theorem 5.10 on page 151) implies that the game has a Nash equilibrium in mixed strategies $\sigma^* = (\sigma_i^*)_{i \in N}$. Since all the players in the game have perfect recall, we know from Kuhn’s Theorem (Theorem 6.15) that for each player i there exists a behavior strategy $b_i^*$ equivalent to $\sigma_i^*$. Corollary 6.7 then implies that

$$u_i(\sigma^*) = u_i(b^*), \quad \forall i \in N, \qquad (6.58)$$

where $b^* = (b_i^*)_{i \in N}$. We show now that no player can increase his expected payoff by deviating to another behavior strategy. Let $b_i$ be a behavior strategy of player i. From Theorem 6.11, there exists a mixed strategy $\sigma_i$ equivalent to $b_i$. Since $\sigma^*$ is an equilibrium in mixed strategies,

$$u_i(\sigma^*) \geq u_i(\sigma_i, \sigma_{-i}^*). \qquad (6.59)$$

Since $\sigma_i$ is equivalent to $b_i$, and for each $j \neq i$ the strategy $\sigma_j^*$ is equivalent to $b_j^*$, Corollary 6.7 implies that

$$u_i(\sigma_i, \sigma_{-i}^*) = u_i(b_i, b_{-i}^*). \qquad (6.60)$$

From Equations (6.58)–(6.60) we then have

$$u_i(b^*) = u_i(\sigma^*) \geq u_i(\sigma_i, \sigma_{-i}^*) = u_i(b_i, b_{-i}^*). \qquad (6.61)$$
In other words, player i cannot profit by deviating from $b_i^*$ to $b_i$, so that the strategy vector $b^*$ is an equilibrium in behavior strategies.

As the proof of the theorem shows, when a game has perfect recall, at each equilibrium in mixed strategies no player has a profitable deviation to a behavior strategy, and at each equilibrium in behavior strategies no player has a profitable deviation to a mixed strategy. Moreover, there exist equilibria at which some players implement mixed strategies and some players implement behavior strategies, and at each such equilibrium no player has a profitable deviation to either a mixed strategy or a behavior strategy.

The next example shows that when it is not the case that all players have perfect recall, the game may not have an equilibrium in behavior strategies.

Example 6.17 Figure 6.7 depicts a two-player zero-sum game.
Figure 6.7 The game in Example 6.17, in extensive form
This game may be interpreted as follows: Player I represents a couple, Jim and Jane. Player II is named Bill. At the first stage of the game, a winning card is handed either to Jim or to Bill, with equal probability. The player who receives the card may choose to show (“S” or “s”) his card, and receive a payoff of 1 from the other player (thus ending the play of the game), or to continue (“C” or “c”). If the player holding the winning card chooses to continue, Jane (who does not know who has the card) is called upon to choose between declaring end (“E”), and thus putting an end to the play of the game without any player receiving a payoff, or declaring double (“D”), which results in the player holding the winning card receiving a payoff of 2 from the other player.

In this game, Player I has imperfect recall, because the paths from the root to the two vertices in information set $U_I^2$ do not intersect the same information sets of Player I: one path intersects $U_I^1$, while the other does not intersect it. Player I’s set of pure strategies is {SD, SE, CD, CE}, and Player II’s set of pure strategies is {s, c}. The strategic form of this game is given in the matrix in Figure 6.8 (in terms of payments from Player II to I):
                    Player II
                    c        s
            CE      0      −1/2
Player I    CD      0       1/2
            SE     1/2       0
            SD    −1/2       0

Figure 6.8 The game in Example 6.17 in strategic form
The value of the game in mixed strategies is $v = \frac{1}{4}$, and an optimal mixed strategy guaranteeing this payoff to Player I is $\sigma_I = [0(CE), \frac{1}{2}(CD), \frac{1}{2}(SE), 0(SD)]$. Player II’s only optimal (mixed) strategy is $\sigma_{II} = [\frac{1}{2}(c), \frac{1}{2}(s)]$. To check whether the game has a value in behavior strategies we compute the minmax value $\overline{v}_b$ and the maxmin value $\underline{v}_b$ in behavior strategies. The minmax value in behavior strategies equals the minmax value in mixed strategies, since Player II has one information set. It follows that his set of behavior strategies $B_{II}$ equals his set of mixed strategies $\Sigma_{II}$ (Exercise 6.4). Since he can guarantee $\frac{1}{4}$ in mixed strategies he can guarantee $\frac{1}{4}$ in behavior strategies. Formally,

$$\overline{v}_b = \min_{b_{II} \in B_{II}} \max_{b_I \in B_I} U(b_I, b_{II}) \qquad (6.62)$$
$$= \min_{\sigma_{II} \in \Sigma_{II}} \max_{b_I \in B_I} U(b_I, \sigma_{II}) \qquad (6.63)$$
$$= \min_{\sigma_{II} \in \Sigma_{II}} \max_{s_I \in S_I} U(s_I, \sigma_{II}) \qquad (6.64)$$
$$= \min_{\sigma_{II} \in \Sigma_{II}} \max_{\sigma_I \in \Sigma_I} U(\sigma_I, \sigma_{II}) = v = \tfrac{1}{4}. \qquad (6.65)$$

Equation (6.64) holds because, as explained on page 179, it suffices to conduct maximization on the right-hand side of Equation (6.63) over the pure strategies of Player I, and Equation (6.65) holds because the function U is bilinear.
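The strategic form in Figure 6.8 and the value $v = \frac{1}{4}$ can be reproduced from the verbal description of Example 6.17. The following is a sketch under that description; the function and variable names (`payoff`, `matrix`, `guarantee_I`, `guarantee_II`) are illustrative and not from the text.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def payoff(jim, jane, bill):
    """Expected payment from Player II to Player I in Example 6.17."""
    # Jim holds the card: showing wins 1, continuing leaves the outcome to Jane.
    if_jim_holds = 1 if jim == "S" else (2 if jane == "D" else 0)
    # Bill holds the card: showing costs Player I 1, continuing leaves it to Jane.
    if_bill_holds = -1 if bill == "s" else (-2 if jane == "D" else 0)
    return HALF * if_jim_holds + HALF * if_bill_holds

rows = ["CE", "CD", "SE", "SD"]   # Player I: Jim's action followed by Jane's action
cols = ["c", "s"]                 # Player II: Bill's action
matrix = {r: {c: payoff(r[0], r[1], c) for c in cols} for r in rows}

sigma_I = {"CD": HALF, "SE": HALF}
sigma_II = {"c": HALF, "s": HALF}

# Payoff that sigma_I guarantees: the worst case over Player II's pure strategies.
guarantee_I = min(sum(p * matrix[r][c] for r, p in sigma_I.items()) for c in cols)
# Payoff that sigma_II concedes at most: the best case over Player I's pure strategies.
guarantee_II = max(sum(q * matrix[r][c] for c, q in sigma_II.items()) for r in rows)

print(guarantee_I, guarantee_II)  # 1/4 1/4
```

Since Player I can guarantee at least 1/4 and Player II can hold him to at most 1/4, the value in mixed strategies is 1/4.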
We now compute the maxmin value in behavior strategies $\underline{v}_b$. In other words, we will calculate Player I’s maxmin value when he is restricted to using only behavior strategies. A behavior strategy of Player I can be written as $b_I = ([\alpha(S), (1 - \alpha)(C)], [\beta(D), (1 - \beta)(E)])$. His expected payoff, when he plays $b_I$, depends on Player II’s strategy:

• If Player II plays s, Player I’s expected payoff is

$$\tfrac{1}{2}\bigl(\alpha + (1 - \alpha)(2\beta + 0(1 - \beta))\bigr) + \tfrac{1}{2}(-1) = (1 - \alpha)\bigl(\beta - \tfrac{1}{2}\bigr). \qquad (6.66)$$

• If Player II plays c, Player I’s expected payoff is

$$\tfrac{1}{2}\bigl(\alpha + (1 - \alpha)(2\beta + 0(1 - \beta))\bigr) + \tfrac{1}{2}\bigl(\beta(-2) + 0(1 - \beta)\bigr) = \alpha\bigl(\tfrac{1}{2} - \beta\bigr). \qquad (6.67)$$

Player I’s maxmin value in behavior strategies is therefore³

$$\underline{v}_b = \max_{\alpha, \beta} \min\bigl\{(1 - \alpha)\bigl(\beta - \tfrac{1}{2}\bigr), \; \alpha\bigl(\tfrac{1}{2} - \beta\bigr)\bigr\} = 0. \qquad (6.68)$$
To see that indeed $\underline{v}_b = 0$, note that if $\beta \leq \frac{1}{2}$, then the first element in the minimization in Equation (6.68) is nonpositive; if $\beta \geq \frac{1}{2}$, then the second element is nonpositive; and if $\beta = \frac{1}{2}$, both elements are zero. We conclude that $\overline{v}_b = \frac{1}{4} \neq 0 = \underline{v}_b$, and therefore the game has no value in behavior strategies.

Since the strategy $\sigma_I = [0(CE), \frac{1}{2}(CD), \frac{1}{2}(SE), 0(SD)]$ guarantees Player I an expected payoff of $\frac{1}{4}$, while any behavior strategy guarantees him at most 0, we confirm that there does not exist a behavior strategy equivalent to $\sigma_I$, which can also be proved directly (prove it!). The source of the difference between the two types of strategies in this case lies in the fact that Player I wants to coordinate his actions at his two information sets: ideally, Jane should play E if Jim plays S, and should play D if Jim plays C. This coordination is possible using a mixed strategy, but cannot be achieved with a behavior strategy, because in any behavior strategy the lotteries $[\alpha(S), (1 - \alpha)(C)]$ and $[\beta(D), (1 - \beta)(E)]$ are independent lotteries. ◭
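The claim that no behavior strategy guarantees Player I more than 0 can be spot-checked on a grid. The sketch below evaluates the two expressions from Equations (6.66) and (6.67); the grid resolution and the name `worst_case` are arbitrary choices made here, and a grid search is only a numerical illustration, not a proof.

```python
from fractions import Fraction

def worst_case(alpha, beta):
    """Payoff guaranteed by b_I = ([alpha(S), (1-alpha)(C)], [beta(D), (1-beta)(E)])."""
    half = Fraction(1, 2)
    against_s = (1 - alpha) * (beta - half)   # Equation (6.66)
    against_c = alpha * (half - beta)         # Equation (6.67)
    return min(against_s, against_c)

# All pairs (alpha, beta) with coordinates in {0, 1/20, ..., 1}.
grid = [Fraction(k, 20) for k in range(21)]
best = max(worst_case(a, b) for a in grid for b in grid)
print(best)  # 0
```

The maximum is attained, for example, at beta = 1/2, where both expressions vanish for every alpha.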
6.4 Kuhn’s Theorem for infinite games

In Section 6.2 we proved Kuhn’s Theorem when the game tree is finite. There are extensive-form games with infinite game trees. This can happen in two ways: when there is a vertex with an infinite number of children, and when there are infinitely long paths in the game tree. In this section we generalize the theorem to the case in which each vertex has a finite number of children and the game tree has infinitely long paths. Infinitely long paths exist in games that may never end, such as backgammon and Monopoly. In Chapters 13 and 14 we present models of games that may not end. Generalizing Kuhn’s Theorem to infinite games involves several technical challenges:
• The set of pure strategies has the cardinality of the continuum. Indeed, if for example player i has a countable number of information sets and in each of his information sets there are only two possible actions, a pure strategy of player i is equivalent to an infinite sequence of zeros and ones. The collection of all such sequences is equivalent to the interval [0, 1] of real numbers, which has the cardinality of the continuum. Since a mixed strategy is a probability distribution over pure strategies, we need to define a σ-algebra over the collection of all pure strategies in order to be able to define probability distributions over this set.
• In finite games, the equivalence of mixed strategies and behavior strategies was defined using the equivalence between the probabilities that they induce over the vertices of the game tree, and in particular over the set of leaves, which determines the outcome of the game. In infinite games, the outcome of the game may be determined by an infinitely long path in the game tree that corresponds to an infinitely long play of the game. It follows that instead of probability distributions induced over a finite set of leaves, in the case of an infinite game we need to deal with probability distributions induced over the set of paths in the game tree, which as we showed above has the cardinality of the continuum. This requires defining a measurable space over the set of plays of the game, that is, over the set of paths (finite and infinite) starting at the root of the tree.

3 Again, it suffices to conduct minimization over the pure strategies of Player II, which are c and s.

We first introduce several definitions that will be used in this section.

Definition 6.18 Let X be a set. A collection $\mathcal{Y}$ of subsets of X is a σ-algebra over X if (a) $\emptyset \in \mathcal{Y}$, (b) $X \setminus Y \in \mathcal{Y}$ for all $Y \in \mathcal{Y}$, and (c) $\cup_{i \in \mathbb{N}} Y_i \in \mathcal{Y}$ for every sequence $(Y_i)_{i \in \mathbb{N}}$ of elements in $\mathcal{Y}$.

De Morgan’s Laws imply that a σ-algebra is also closed under countable intersections: if $(Y_i)_{i \in \mathbb{N}}$ is a sequence of elements of $\mathcal{Y}$, then $\cap_{i \in \mathbb{N}} Y_i \in \mathcal{Y}$. For each family $\widehat{\mathcal{Y}}$ of subsets of X, the σ-algebra generated by $\widehat{\mathcal{Y}}$ is the smallest σ-algebra $\mathcal{Y}$ (with respect to set inclusion) satisfying $\widehat{\mathcal{Y}} \subseteq \mathcal{Y}$. The σ-algebra that we will use in the rest of this section is the σ-algebra of cylinder sets.
Definition 6.19 Let $(X_n)_{n \in \mathbb{N}}$ be a sequence of finite sets, and let $X^\infty := \times_{n \in \mathbb{N}} X_n$. A set $B \subseteq X^\infty$ is called a cylinder set if there exist $N \in \mathbb{N}$ and $(A_n)_{n=1}^N$, $A_n \subseteq X_n$ for all $n \in \{1, 2, \ldots, N\}$, such that $B = (\times_{n=1}^N A_n) \times (\times_{n=N+1}^\infty X_n)$. The σ-algebra of cylinder sets is the σ-algebra $\mathcal{Y}$ generated by the cylinder sets in $X^\infty$.

Definition 6.20 A measurable space is a pair $(X, \mathcal{Y})$ such that X is a set and $\mathcal{Y}$ is a σ-algebra over X. A probability distribution over a measurable space $(X, \mathcal{Y})$ is a function $p : \mathcal{Y} \to [0, 1]$ satisfying:

• $p(\emptyset) = 0$.
• $p(X \setminus Y) = 1 - p(Y)$ for every $Y \in \mathcal{Y}$.
• $p(\cup_{n \in \mathbb{N}} Y_n) = \sum_{n \in \mathbb{N}} p(Y_n)$ for any sequence $(Y_n)_{n \in \mathbb{N}}$ of pairwise disjoint sets in $\mathcal{Y}$.
The third property in the definition of a probability distribution is called σ-additivity. The next theorem follows from the Kolmogorov Extension Theorem (see, for example, Theorem A.3.1 in Durrett [2004]) and the Carathéodory Extension Theorem (see, for example, Theorem 13.A in Halmos [1994]). Given an infinite product of spaces $X^\infty = \times_{n \in \mathbb{N}} X_n$ and a sequence $(p^N)_{N \in \mathbb{N}}$ of probability distributions, where each $p^N$ is a probability distribution over the finite product $X^N := \times_{n=1}^N X_n$, the theorem presents a condition guaranteeing the existence of an extension of the probability distributions $(p^N)_{N \in \mathbb{N}}$ to $X^\infty$, i.e., a probability distribution p over $X^\infty$ whose marginal distribution over $X^N$ is $p^N$, for each $N \in \mathbb{N}$.
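Theorem 6.21 Let $(X_n)_{n \in \mathbb{N}}$ be a sequence of finite sets. Suppose that for each $N \in \mathbb{N}$ there exists a probability distribution $p^N$ over $X^N := \times_{n=1}^N X_n$ that satisfies

$$p^N(A) = p^{N+1}(A \times X_{N+1}), \quad \forall N \in \mathbb{N}, \ \forall A \subseteq X^N. \qquad (6.69)$$

Let $X^\infty := \times_{n \in \mathbb{N}} X_n$ and let $\mathcal{Y}$ be the σ-algebra of cylinder sets over $X^\infty$. Then there exists a unique probability distribution p over $(X^\infty, \mathcal{Y})$ extending $(p^N)_{N \in \mathbb{N}}$, i.e.,

$$p^N(A) = p(A \times X_{N+1} \times X_{N+2} \times \cdots), \quad \forall N \in \mathbb{N}, \ \forall A \subseteq X^N. \qquad (6.70)$$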
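The consistency condition in Equation (6.69) can be verified by brute force for small N. The sketch below assumes a hypothetical family of distributions that is not taken from the text: every $X_n$ is {0, 1} and $p^N$ is the law of N independent draws in which 1 has probability `Q`. The helper names `p`, `subsets`, and `consistent` are illustrative.

```python
from fractions import Fraction
from itertools import chain, combinations, product

X = (0, 1)              # every X_n is the same two-point set (an assumption of this sketch)
Q = Fraction(1, 3)      # hypothetical probability of drawing 1 in each coordinate

def p(N, A):
    """p^N(A) for a collection A of length-N tuples: independent draws with P(1) = Q."""
    return sum((Q ** sum(x) * (1 - Q) ** (N - sum(x)) for x in A), Fraction(0))

def subsets(points):
    """All subsets of a finite collection of points, as tuples."""
    points = list(points)
    return chain.from_iterable(combinations(points, k) for k in range(len(points) + 1))

def consistent(N):
    """Check Equation (6.69): p^N(A) = p^{N+1}(A x X_{N+1}) for every subset A of X^N."""
    for A in subsets(product(X, repeat=N)):
        cylinder = [x + (y,) for x in A for y in X]
        if p(N, A) != p(N + 1, cylinder):
            return False
    return True

print(all(consistent(N) for N in (1, 2, 3)))  # True
```

The check enumerates every subset A, so it is only feasible for very small N; it illustrates the hypothesis of Theorem 6.21 rather than its conclusion.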
When $(V, E, x^0)$ is a (finite or infinite) game tree, denote by H the set of maximal paths in the tree, meaning paths from the root to a leaf, and infinite paths from the root. For each vertex x denote by $H(x)$ the set of paths in H passing through x. Let $\mathcal{H}$ be the σ-algebra generated by the sets $H(x)$ for all $x \in V$. Recall that for each vertex x that is not a leaf the set of children of x is denoted by $C(x)$.

In this section we also make use of the following version of Theorem 6.21, which states that if there is an infinite tree such that each vertex x in the tree has an associated probability $p(x)$, and if these probabilities are consistent in the sense that the probability associated with a vertex equals the sum of the probabilities associated with its children, then there is a unique probability distribution over the set of maximal paths satisfying the property that the probability of the set of paths passing through vertex x equals $p(x)$. The proof of the theorem is left to the reader (Exercise 6.24).
Theorem 6.22 Let $(V, E, x^0)$ be a (finite or infinite) game tree such that $|C(x)| < \infty$ for each vertex x. Denote by H the set of maximal paths. Let $p : V \to [0, 1]$ be a function satisfying $p(x) = \sum_{x' \in C(x)} p(x')$ for each vertex $x \in V$ that is not a leaf. Then there exists a unique probability distribution $\widehat{p}$ over $(H, \mathcal{H})$ satisfying $\widehat{p}(H(x)) = p(x)$ for all $x \in V$.

6.4.1 Definitions of pure strategy, mixed strategy, and behavior strategy

Let G be an extensive-form game with an infinite game tree such that each vertex has a finite number of children. In such a game, as in the finite case, a pure strategy of player i is a function that associates each information set of player i with a possible action at that information set. A behavior strategy of player i is a function associating each one of his information sets with a probability distribution over the set of possible actions at that information set. Denote by $S_i = \times_{U_i \in \mathcal{U}_i} A(U_i)$ player i’s set of pure strategies and by $B_i = \times_{U_i \in \mathcal{U}_i} \Delta(A(U_i))$ his set of behavior strategies.

A mixed strategy is a probability distribution over the collection of pure strategies. When the game has finite depth,⁴ the set of pure strategies is a finite set and the set of player i’s mixed strategies $\Sigma_i$ is a simplex. When player i has an infinite number of information sets at which he has at least two possible actions the set of pure strategies $S_i$ has the cardinality of the continuum. To define a probability distribution over this set we need to define a σ-algebra over it. Let $\mathcal{S}_i$ be the σ-algebra of cylinder sets of $S_i$. The pair $(S_i, \mathcal{S}_i)$ is a measurable space and the set of probability distributions over it is the set of mixed strategies $\Sigma_i$ of player i.

4 The depth of a vertex is the number of edges in the path from the root to the vertex. The depth of a game is the maximum (or supremum) of the depth of all vertices in the game tree.
6.4.2 Equivalence between mixed strategies and behavior strategies

In a finite game of depth T, a mixed strategy $\sigma_i^T$ is equivalent to a behavior strategy $b_i^T$ if $\rho(x; \sigma_i^T, \sigma_{-i}^T) = \rho(x; b_i^T, \sigma_{-i}^T)$ for every mixed/behavior strategy vector $\sigma_{-i}^T$ of the other players and every vertex x in the game tree. In this section we extend the definition of equivalence between mixed and behavior strategies to infinite games. We begin by defining $\rho_i(x; \sigma_i)$ and $\rho_i(x; b_i)$, the probability that player i implementing either mixed strategy $\sigma_i$ or behavior strategy $b_i$ will choose actions leading to the vertex x at each vertex along the path from the root to x that is in his information sets. For each behavior strategy $b_i$ of player i, and each vertex x in the game tree,

$$\rho_i(x; b_i) := \prod_{l=1}^{L_i^x} b_i\bigl(a_i^l; U_i^l\bigr), \qquad (6.71)$$

where $L_i^x$ is the number of vertices along the path from the root to x that are in player i’s information sets (not including the vertex x, if at vertex x player i chooses an action), and $U_i^1, U_i^2, \ldots, U_i^{L_i^x}$ are the information sets containing these vertices (if there are several vertices along the path to x in the same information set $U_i$ of player i, then this information set will appear more than once in the list $U_i^1, U_i^2, \ldots, U_i^{L_i^x}$).

We now define the probability $\rho_i(x; \sigma_i)$ for a mixed strategy $\sigma_i$. For any $T \in \mathbb{N}$ let $G^T$ be the game that includes the first T stages of the game G.
• The set of vertices $V^T$ of $G^T$ contains all vertices of G with depth at most T.
• The information sets of each player i in $G^T$ are all nonempty subsets of $V^T$ that are obtained as the intersection of an information set in $\mathcal{U}_i^T$ with $V^{T-1}$; that is, an information set in $G^T$ contains only vertices whose depth is strictly less than T. This is because the vertices whose depth is T are leaves of $G^T$.

Denote by $\mathcal{U}_i^T$ the collection of player i’s information sets in the game G that have a nonempty intersection with $V^{T-1}$. With this notation, player i’s collection of information sets in the game $G^T$ is all the nonempty intersections of $V^{T-1}$ with a set in $\mathcal{U}_i^T$. Below, for any $T \in \mathbb{N}$, we identify each information set $U_i^T$ of player i in the game $G^T$ with the information set $U_i \in \mathcal{U}_i^T$ for which $U_i^T = V^{T-1} \cap U_i$. Since each vertex has a finite number of children, the set $V^T$ contains a finite number of vertices. To simplify the notation, an information set of player i in the game $G^T$, which is the intersection of $V^{T-1}$ and an information set $U_i$ of player i in the game G, will also be denoted by $U_i$. Since Kuhn’s Theorem does not involve the payoffs of a game, we will not specify the payoffs in the game $G^T$.

Player i’s set of pure strategies in the game $G^T$ is $S_i^T := \times_{U_i \in \mathcal{U}_i^T} A(U_i)$. For each mixed strategy $\sigma_i$ in the game G, let $\sigma_i^T$ be its marginal distribution over $S_i^T$. Then $\sigma_i^T$ is a mixed strategy in the game $G^T$. The sequence of probability distributions $(\sigma_i^T)_{T \in \mathbb{N}}$ satisfies the following property: the marginal distribution of $\sigma_i^T$ over $S_i^{T-1}$ is $\sigma_i^{T-1}$. It follows that for each vertex x whose depth is less than or equal to T we have $\rho_i(x; \sigma_i^{T_1}) = \rho_i(x; \sigma_i^{T_2})$ for all $T_1, T_2 \geq T$. Define for each vertex x

$$\rho_i(x; \sigma_i) := \rho_i\bigl(x; \sigma_i^T\bigr), \qquad (6.72)$$

where T is greater than or equal to the depth of x. Finally, define, for each mixed/behavior strategy vector σ,

$$\rho(x; \sigma) := \prod_{i \in N} \rho_i(x; \sigma_i). \qquad (6.73)$$
This is the probability that the play of the game reaches vertex x when the players implement the strategy vector σ. The following theorem, which states that every vector of strategies uniquely defines a probability distribution over the set of infinite plays, follows from Theorem 6.22 and the definition of ρ (Exercise 6.25).

Theorem 6.23 Let σ be a mixed/behavior strategy vector in a (finite or infinite) extensive-form game. Then there exists a unique probability distribution $\mu_\sigma$ over $(H, \mathcal{H})$ satisfying $\mu_\sigma(H(x)) = \rho(x; \sigma)$ for every vertex x.

Definition 6.24 A mixed strategy $\sigma_i$ of player i is equivalent to a behavior strategy $b_i$ of player i if, for every mixed/behavior strategy vector $\sigma_{-i}$ of the other players, $\mu_{(\sigma_i, \sigma_{-i})} = \mu_{(b_i, \sigma_{-i})}$.

Theorem 6.23 implies the following theorem.

Theorem 6.25 A mixed strategy $\sigma_i$ of player i is equivalent to his behavior strategy $b_i$ if for every mixed/behavior strategy vector $\sigma_{-i}$ of the other players and every vertex x we have $\rho(x; \sigma_i, \sigma_{-i}) = \rho(x; b_i, \sigma_{-i})$.
6.4.3 Statement of Kuhn’s Theorem for infinite games and its proof

The definition of a player with perfect recall in an infinite extensive-form game is identical to the definition for finite games (Definition 6.13 on page 231). If player i has perfect recall in a game G, then he also has perfect recall in the game $G^T$ for all $T \in \mathbb{N}$ (verify!).

Theorem 6.26 Let G be an extensive-form game with an infinite game tree such that each vertex in the game tree has a finite number of children. If player i has perfect recall, then for each mixed strategy of player i there is an equivalent behavior strategy and for each behavior strategy of player i there is an equivalent mixed strategy.

Proof: Let G be an extensive-form game with an infinite game tree such that each vertex in the game tree has a finite number of children. Let i be a player with perfect recall. We begin by proving one direction of the statement of the theorem: for each mixed strategy of player i there is an equivalent behavior strategy. Let $\sigma_i$ be a mixed strategy of player i in the game G. For each $T \in \mathbb{N}$, let $\sigma_i^T$ be the restriction of $\sigma_i$ to the game $G^T$; in other words, $\sigma_i^T$ is the marginal distribution of $\sigma_i$ over $S_i^T$. In the proof of Kuhn’s Theorem (Theorem 6.15 on page 232), we constructed an equivalent behavior strategy for any given mixed strategy in a finite extensive-form game. Let $b_i^T$ be the behavior strategy equivalent to the mixed strategy $\sigma_i^T$ in the game $G^T$, constructed according to that theorem. Since $\sigma_i^{T+1}$ is equivalent to $b_i^{T+1}$ in the game $G^{T+1}$, since the marginal distribution of $\sigma_i^{T+1}$ over $S_i^T$ is $\sigma_i^T$, and since $\sigma_i^T$ is equivalent to $b_i^T$ in the game $G^T$, it follows that for each vertex x whose depth is less than or equal to T,

$$\rho_i\bigl(x; b_i^{T+1}\bigr) = \rho_i\bigl(x; \sigma_i^{T+1}\bigr) = \rho_i\bigl(x; \sigma_i^T\bigr) = \rho_i\bigl(x; b_i^T\bigr). \qquad (6.74)$$

It follows that

$$b_i^{T+1}(U_i) = b_i^T(U_i), \quad \forall U_i \in \mathcal{U}_i^T. \qquad (6.75)$$

In other words, the behavior strategies $(b_i^T)_{T \in \mathbb{N}}$ are consistent, in the sense that every two of them coincide on information sets that are in the domain of both. Define a behavior strategy $b_i$ of player i by

$$b_i(U_i) := b_i^T(U_i), \quad \forall U_i \in \mathcal{U}_i, \qquad (6.76)$$

where T satisfies $U_i \in \mathcal{U}_i^T$. By Equation (6.75) it follows that $b_i(U_i)$ is well defined. We will prove that $\sigma_i$ and $b_i$ are equivalent in the game G. Let $\sigma_{-i} = (\sigma_j)_{j \neq i}$ be a mixed/behavior strategy vector of the other players. For each $T \in \mathbb{N}$, let $\sigma_j^T$ be the strategy $\sigma_j$ restricted to the game $G^T$. Denote $\sigma_{-i}^T = (\sigma_j^T)_{j \neq i}$. Since the strategies $\sigma_i^T$ and $b_i^T$ are equivalent in the game $G^T$,

$$\rho\bigl(x; \sigma_i^T, \sigma_{-i}^T\bigr) = \rho\bigl(x; b_i^T, \sigma_{-i}^T\bigr) \qquad (6.77)$$

for each vertex whose depth is less than or equal to T. By definition, it follows that for each vertex x

$$\rho(x; \sigma_i, \sigma_{-i}) = \rho(x; b_i, \sigma_{-i}). \qquad (6.78)$$

Theorem 6.25 implies that $\sigma_i$ and $b_i$ are equivalent strategies.

We now prove the other direction of the statement of the theorem. Let $b_i$ be a behavior strategy of player i. For each $T \in \mathbb{N}$ let $b_i^T$ be the restriction of $b_i$ to the collection of information sets $\mathcal{U}_i^T$. It follows that $b_i^T$ is a behavior strategy of player i in the game $G^T$. Since player i has perfect recall in the game $G^T$, and since the game $G^T$ is a finite game, there exists a mixed strategy $\sigma_i^T$ equivalent to $b_i^T$ in the game $G^T$. Since $\sigma_i^{T+1}$ is equivalent to $b_i^{T+1}$ in the game $G^{T+1}$, since the restriction of $b_i^{T+1}$ to $\mathcal{U}_i^T$ is $b_i^T$, and since $\sigma_i^T$ is equivalent to $b_i^T$ in the game $G^T$, it follows that $\sigma_i^T$ is the marginal distribution of $\sigma_i^{T+1}$ on $S_i^T$. By Theorem 6.21 (with respect to the product space $S_i = \times_{U_i \in \mathcal{U}_i} A(U_i)$) we deduce that there exists a mixed strategy $\sigma_i$ whose projection over $S_i^T$ is $\sigma_i^T$ for all $T \in \mathbb{N}$. Reasoning similar to that used in the first part of this proof shows that $\sigma_i$ and $b_i$ are equivalent strategies in the game G (Exercise 6.26).

Using methods similar to those presented in this section one can prove Kuhn’s Theorem for extensive-form games with game trees of finite depth in which every vertex has a finite or countable number of children. Combining that result with the proof of Theorem 6.26 shows that Kuhn’s Theorem holds in extensive-form games with game trees of infinite depth in which every vertex has a finite or countable number of children.
6.5 Remarks
The Absent-Minded Driver game appearing in Example 6.9 (page 225) was first introduced in Piccione and Rubinstein [1997], and an entire issue of the journal Games and Economic Behavior (1997, issue 1) was devoted to analyzing it. Item (b) of Exercise 6.17 is taken from von Stengel and Forges [2008].
6.6 Exercises
6.1 In each of the games in the following diagrams, identify which players have perfect recall. In each case in which there is a player with imperfect recall, indicate what the player may forget during a play of the game, and in what way the condition in Definition 6.13 (page 231) fails to obtain.

[Figure: game trees of Games A through H]
6.2 In each of the following games, find a mixed strategy equivalent to the noted behavior strategy.

(a) b_I = [1/3(T), 2/3(B)], in the game

[Figure: game tree for Exercise 6.2(a)]

(b) b_II = ([4/9(t1), 5/9(b1)], [3/5(t2), 2/5(b2)], [2/3(t3), 1/3(b3)]), in the game

[Figure: game tree for Exercise 6.2(b)]

(c) b_II = ([4/9(t1), 5/9(b1)], [1/4(t2), 3/4(b2)]), in the game

[Figure: game tree for Exercise 6.2(c), which includes a chance move with probabilities 3/5 and 2/5]
6.3 Identify the payoff that each player can guarantee for himself in each of the following two-player zero-sum games using mixed strategies and using behavior strategies.

[Figure: game trees of Games A through E]
6.4 Prove that if a player in an extensive-form game has only one information set, then his set of mixed strategies equals his set of behavior strategies.

6.5 Does there exist a two-player zero-sum extensive-form game that has a value in mixed strategies and a value in behavior strategies, but these two values are not equal to each other? Either prove that such a game exists or provide a counterexample.

6.6 Prove Theorem 6.6 (page 223): if b_i is a behavior strategy equivalent to the mixed strategy σ_i, then for every strategy vector σ_{−i},

u_j(σ_i, σ_{−i}) = u_j(b_i, σ_{−i}), ∀j ∈ N.    (6.79)
6.7 Prove that in Example 6.4 (page 222), for every mixed strategy σ_II of Player II, the probability distribution induced by (σ_I, σ_II) over the leaves of the game tree is identical with the probability distribution induced by (b_I, σ_II) over the leaves of the game tree. The mixed strategy σ_I is defined in Equation (6.14) and the behavior strategy b_I is defined in Equation (6.13) (page 224).
6.8 Prove Corollary 6.10 (page 226): if there exists a path from the root to the vertex x that passes at least twice through the same information set U_i of player i, and if the action leading to x is not the same action in each of these passes through the information set, then player i has a behavior strategy that has no equivalent mixed strategy.

6.9 Show that Theorem 6.11 does not hold without the condition that there are at least two possible actions at each vertex.

6.10 Explain why Equation (6.37) in the proof of Theorem 6.11 (page 230) does not necessarily hold when a game does not have perfect recall.

6.11 Prove that if a player does not know whether or not he has previously made a move during the play of a game, then he does not have perfect recall (according to Definition 6.13 on page 231).

6.12 Prove that if a player knows during the play of a game how many moves he has previously made, but later forgets this, then he does not have perfect recall (according to Definition 6.13 on page 231).

6.13 Prove that if a player knows during the play of a game which action another player has chosen at a particular information set, but later forgets this, then he does not have perfect recall (according to Definition 6.13 on page 231).

6.14 Prove that if a player does not know what action he chose at a previous information set in a game, then he has imperfect recall in that game (according to Definition 6.13 on page 231).

6.15 Prove that if at a particular information set in a game a player knows which player made the move leading to that information set, but later forgets this, then he does not have perfect recall (according to Definition 6.13 on page 231).

6.16 Prove that if x1 and x2 are two vertices in the same information set of player i, and if player i has perfect recall in the game, then S_i^*(x1) = S_i^*(x2). (See page 228 for the definition of the set S_i^*(x).)
6.17 Let U and Û be two information sets (they may both be the information sets of the same player, or of two different players). U will be said to precede Û if there exist a vertex x ∈ U and a vertex x̂ ∈ Û such that the path from the root to x̂ passes through x.

(a) Prove that if U is an information set of a player with perfect recall, then U does not precede U.

(b) Prove that in a two-player game without chance moves, where both players have perfect recall, if U precedes Û, then Û does not precede U.

(c) Find a two-player game with chance moves, where both players have perfect recall and there exist two information sets U and Û such that U precedes Û, and Û precedes U.

(d) Find a three-player game without chance moves, where all the players have perfect recall and there exist two information sets U and Û such that U precedes Û, and Û precedes U.
6.18 Find a behavior strategy equivalent to the given mixed strategies in each of the following games.

(a) s_I = [1/2(B1, B2), 1/2(T1, T2)], in the game

[Figure: game tree for Exercise 6.18(a)]

(b) s_I = [3/7(B1 B2 M3), 1/7(B1 T2 B3), 2/7(T1 B2 M3), 1/7(T1 T2 T3)] and s_II = [3/7(b1 b2), 1/7(b1 t2), 1/7(t1 b2), 2/7(t1 t2)], in the game

[Figure: game tree for Exercise 6.18(b), which includes chance moves]
6.19 (a) Let i be a player with perfect recall in an extensive-form game and let σ_i be a mixed strategy of player i. Suppose that there is a strategy vector σ_{−i} of the other players such that ρ(x; σ_i, σ_{−i}) > 0 for each leaf x in the game tree. Prove that there exists a unique behavior strategy b_i equivalent to σ_i.

(b) Give an example of an extensive-form game in which player i has perfect recall and there is a mixed strategy σ_i with more than one behavior strategy equivalent to it.

6.20 Let i be a player with perfect recall in an extensive-form game and let b_i be a behavior strategy of player i. Suppose that there is a strategy vector σ_{−i} of the other players such that ρ(x; b_i, σ_{−i}) > 0 for each leaf x in the game tree. Prove that there exists a unique mixed strategy σ_i equivalent to b_i.

6.21 In the following two-player zero-sum game, find the optimal behavior strategies of the two players. (Why must such strategies exist?)

[Figure: game tree for Exercise 6.21]
6.22 Compute the value of the following game, in mixed strategies, and in behavior strategies, if these values exist.
[Figure: game tree for Exercise 6.22]
6.23 (a) Compute the value in mixed strategies of the game below. (b) Compute what each player can guarantee using behavior strategies (in other words, compute each player’s security value in behavior strategies).
[Figure: game tree for Exercise 6.23]
6.24 Prove Theorem 6.22 (page 240).

6.25 Prove Theorem 6.23 (page 242): let σ be a mixed/behavior strategy vector in a (finite or infinite) extensive-form game. Then there exists a unique probability distribution μ_σ over (H, ℋ) satisfying μ_σ(H(x)) = ρ(x; σ) for each vertex x.

6.26 Complete the proof of the second direction of Kuhn’s Theorem for infinite games (Theorem 6.26, page 242): prove that the mixed strategy σ_i constructed in the proof is equivalent to the given behavior strategy b_i.
7
Equilibrium refinements
Chapter summary

The most important solution concept in noncooperative game theory is the Nash equilibrium. When games possess many Nash equilibria, we sometimes want to know which equilibria are more reasonable than others. In this chapter we present and study some refinements of the concept of Nash equilibrium. In Section 7.1 we study subgame perfect equilibrium, which is a solution concept for extensive-form games. The idea behind this refinement is to rule out noncredible threats, that is, “irrational” behavior off the equilibrium path whose goal is to deter deviations. In games with perfect information, a subgame perfect equilibrium always exists, and it can be found using the process of backward induction. The second refinement, presented in Section 7.3, is the perfect equilibrium, which is based on the idea that players might make mistakes when choosing their strategies. In extensive-form games there are two types of perfect equilibria corresponding to the two types of mistakes that players may make: one, called strategic-form perfect equilibrium, assumes that players may make a mistake at the outset of the game, when they choose the pure strategy they will implement throughout the game. The other, called extensive-form perfect equilibrium, assumes that players may make mistakes in choosing an action in each information set. We show by examples that these two concepts are different and prove that every extensive-form game possesses perfect equilibria of both types, and that every extensive-form perfect equilibrium is a subgame perfect equilibrium. The last concept in this chapter, presented in Section 7.4, is the sequential equilibrium in extensive-form games. It is proved that every finite extensive-form game with perfect recall has a sequential equilibrium. Finally, we study the relationship between the sequential equilibrium and the extensive-form perfect equilibrium.
When a game has more than one equilibrium, we may wish to choose some equilibria over others based on “reasonable” criteria. Such a choice is termed a “refinement” of the equilibrium concept. Refinements can be derived in both extensive-form games and strategic-form games. We will consider several equilibrium refinements in this chapter, namely, perfect equilibrium, subgame perfect equilibrium, and sequential equilibrium.

Throughout this chapter, when we analyze extensive-form games, we will assume that if the game has chance vertices, every possible move at every chance vertex is chosen with positive probability. If there is a move at a chance vertex that is chosen with probability 0, it, and all the vertices following it in the tree, may be omitted, and we may consider instead the resulting smaller tree.
7.1 Subgame perfect equilibrium
The concept of subgame perfect equilibrium, which is a refinement of equilibrium in extensive-form games, is presented in this section. In an extensive-form game, each strategy vector σ defines a path from the root to one of the leaves of the game tree, namely, the path that is obtained when each player implements the strategy vector σ. When the strategy vector σ is a Nash equilibrium, the path that is thus obtained is called the equilibrium path. If x is a vertex along the equilibrium path, and if Γ(x) is a subgame, then the strategy vector σ restricted to the subgame Γ(x) is also a Nash equilibrium because each profitable deviation for a player in the subgame Γ(x) is also a profitable deviation in the original game (explain why). In contrast, if the vertex x is not located along the equilibrium path (in which case it is said to be off the equilibrium path) then the strategy vector σ restricted to the subgame Γ(x) is not necessarily a Nash equilibrium of the subgame. The following example illustrates this point.

Example 7.1 Consider the two-player extensive-form game shown in Figure 7.1.
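[Figure 7.1: The extensive-form game in Example 7.1. At x1 Player I chooses A, which ends the game with payoff (1, 2), or B, which leads to x2; at x2 Player II chooses C, with payoff (0, 0), or D, with payoff (2, 1).]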
Figure 7.2 shows the corresponding strategic form of this game.

                   Player II
                   C        D
    Player I   A   1, 2     1, 2
               B   0, 0     2, 1

Figure 7.2 The strategic-form game, and two pure-strategy equilibria of the game
This game has two pure-strategy equilibria, (B, D) and (A, C). Player I clearly prefers (B, D), while Player II prefers (A, C). In addition, the game has a continuum of mixed-strategy equilibria: (A, [y(C), (1 − y)(D)]) for y ≥ 1/2, with payoff (1, 2), which is identical to the payoff of (A, C). Which equilibrium is more likely to be played?
The extensive form of the game is depicted again twice in Figure 7.3, with the thick lines corresponding to the two pure-strategy equilibria.

[Figure 7.3: The pure-strategy equilibria of the game in extensive form. Two copies of the game tree of Figure 7.1, one labeled “Equilibrium (B, D)” and the other labeled “Equilibrium (A, C)”.]
This game has one proper subgame, Γ(x2), in which only Player II is active. The two equilibria (A, C) and (B, D) induce different plays in this subgame. While the restriction of the equilibrium (B, D) to Γ(x2), namely D, is an equilibrium of the subgame Γ(x2), the restriction of the equilibrium (A, C) to Γ(x2), namely C, is not an equilibrium in this subgame, since a deviation to D is profitable for Player II. The vertex x2 is on the equilibrium path of (B, D), which is x1 → x2 → (2, 1), and it is not on the equilibrium path of (A, C), which is x1 → (1, 2). This is necessarily so, since if x2 were on the equilibrium path of (A, C) and Player II did not play an equilibrium in the subgame Γ(x2), then (A, C) could not be an equilibrium (why?).

Player II’s strategy at vertex x2 seems irrational: the equilibrium strategy calls on him to choose C, which yields him a payoff of 0, instead of D, which yields him a payoff of 1. This choice, in fact, is never actually made when this equilibrium is played, because Player I chooses A at vertex x1, but the equilibrium, as constructed, says that “if Player I were to choose B, then Player II would choose C.” This may be regarded as a threat directed by Player II to Player I: if you “dare” choose B, I will choose C, and then you will get 0 instead of 1, which you would get by choosing A. This threat is intended by Player II to persuade Player I to choose the action that leads to payoff (1, 2), which Player II prefers to (2, 1). Is this a credible threat?

Whether or not a threat is credible depends on many factors, which are not expressed in our model of the game: previous interaction between the players, reputation, behavioral norms, and so on. Consideration of these factors, which may be important and interesting, is beyond the scope of this book.
We will, however, consider what happens if we invalidate such threats, on the grounds that they are not “rational.” Another way of saying the same thing is: the restriction of the equilibrium (A, C) to the subgame Γ(x2), which begins at vertex x2 (the subgame in which only Player II has a move), yields the strategy C, which is not an equilibrium of that subgame. This observation led to the concept of subgame perfect equilibrium that we develop in this section. ◭
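The checks carried out by hand in Example 7.1 can be sketched in a few lines of Python. This is only an illustrative sketch: the payoff table is the one in Figure 7.2, while the function and variable names (`is_pure_nash`, `is_equilibrium_of_subgame_x2`, `a_is_best_reply`, `survivors`) are hypothetical names introduced here. The code lists the pure-strategy equilibria, tests which of them remain equilibria when restricted to the subgame at x2, and checks the threshold y ≥ 1/2 for the mixed-strategy equilibria.

```python
from fractions import Fraction

# Strategic form of Example 7.1 (Figure 7.2): payoffs[(row, col)] = (u_I, u_II).
payoffs = {
    ("A", "C"): (1, 2), ("A", "D"): (1, 2),
    ("B", "C"): (0, 0), ("B", "D"): (2, 1),
}
ROWS, COLS = ("A", "B"), ("C", "D")


def is_pure_nash(row, col):
    """True if neither player gains by a unilateral pure deviation."""
    u1, u2 = payoffs[(row, col)]
    row_is_best = all(payoffs[(r, col)][0] <= u1 for r in ROWS)
    col_is_best = all(payoffs[(row, c)][1] <= u2 for c in COLS)
    return row_is_best and col_is_best


def is_equilibrium_of_subgame_x2(col):
    """Only Player II moves in the subgame at x2, which is reached after B."""
    u2 = payoffs[("B", col)][1]
    return all(payoffs[("B", c)][1] <= u2 for c in COLS)


def a_is_best_reply(y):
    """Is A a best reply for Player I when Player II plays C with probability y?"""
    payoff_of_b = y * payoffs[("B", "C")][0] + (1 - y) * payoffs[("B", "D")][0]
    payoff_of_a = payoffs[("A", "C")][0]  # A yields 1 whatever Player II does
    return payoff_of_a >= payoff_of_b


pure_nash = [(r, c) for r in ROWS for c in COLS if is_pure_nash(r, c)]
# Equilibria whose restriction to the subgame at x2 is still an equilibrium.
survivors = [(r, c) for (r, c) in pure_nash if is_equilibrium_of_subgame_x2(c)]

print(pure_nash)
print(survivors)
print(a_is_best_reply(Fraction(1, 2)), a_is_best_reply(Fraction(1, 4)))
```

Exact fractions are used for y so that the boundary case y = 1/2 is not affected by floating-point rounding.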
Reinhard Selten [1965, 1973] suggested that the equilibria that should be chosen in extensive-form games are those equilibria that are also equilibria when restricted to each subgame. In other words, Selten suggested choosing those equilibria at which the actions of the players are still in equilibrium even when they are off the equilibrium path. By definition, a strategy σi tells player i which action to choose at each of his information sets, even at information sets that will not be arrived at during the play of the game that results from implementing σi (whether due to moves chosen by player i, or moves chosen by the other players). It follows that, for every strategy vector σ , it is possible to compute
the payoff of each player if the play of the game is at vertex x (even if the play has not arrived at x when the players implement σ). Denote by u_i(σ | x) player i’s payoff in the subgame Γ(x) when the players implement the strategy vector σ, if the play of the game is at vertex x. For example, in the game in Example 7.1, u_1((A, C) | x1) = 1 and u_1((A, C) | x2) = 0 (note that x2 is not reached when (A, C) is played). The payoff u_i(σ | x) depends only on the restriction of the strategy vector σ to the subgame Γ(x). We will therefore use the same notation to denote the payoff when σ is a strategy vector in Γ(x) and when in the strategy vector σ some of the strategies are in Γ and some in Γ(x).

Definition 7.2 A strategy vector σ* (in mixed strategies or behavior strategies) in an extensive-form game Γ is called a subgame perfect equilibrium if for every subgame, the restriction of the strategy vector σ* to the subgame is a Nash equilibrium of that subgame: for every player i ∈ N, every strategy σ_i, and every subgame Γ(x),

u_i(σ* | x) ≥ u_i((σ_i, σ*_{−i}) | x).    (7.1)
As we saw in Example 7.1, the equilibrium (A, C) is not a subgame perfect equilibrium. In contrast, the equilibrium (B, D) is a subgame perfect equilibrium: the choice D is an equilibrium of the subgame starting at x2. For each y ∈ [1/2, 1], the equilibrium in mixed strategies (A, [y(C), (1 − y)(D)]) is not a subgame perfect equilibrium, because the choice of C with positive probability is not an equilibrium of the subgame starting at x2. Note that in the strategic form of the game, Player II’s strategy C is (weakly) dominated by the strategy D, and hence the elimination of dominated strategies in this game eliminates the equilibrium (A, C) (and the mixed-strategy equilibria for y ∈ [1/2, 1]), leaving only the subgame perfect equilibrium (B, D). A solution concept based on the elimination of weakly dominated strategies, and its relation to the concept of subgame perfect equilibrium, will be studied in Section 7.3.

Remark 7.3 Since every game is a subgame of itself, by definition, every subgame perfect equilibrium is a Nash equilibrium. The concept of subgame perfect equilibrium is therefore a refinement of the concept of Nash equilibrium.

As previously stated, each leaf x in a game tree defines a sub-tree Γ(x) in which effectively no player participates. An extensive-form game that does not include any subgame other than itself and the subgames defined by the leaves is called a game without nontrivial subgames. For such games, the condition appearing in Definition 7.2 holds vacuously, and we therefore deduce the following corollary.

Theorem 7.4 In an extensive-form game without nontrivial subgames, every Nash equilibrium (in mixed strategies or behavior strategies) is a subgame perfect equilibrium.

For each strategy vector σ, and each vertex x in the game tree, denote by P_σ(x) the probability that the play of the game will visit vertex x when the players implement the strategy vector σ.
Theorem 7.5 Let σ* be a Nash equilibrium (in mixed strategies or behavior strategies) of an extensive-form game Γ, and let Γ(x) be a subgame of Γ. If P_{σ*}(x) > 0, then the strategy vector σ* restricted to the subgame Γ(x) is a Nash equilibrium (in mixed strategies or behavior strategies) of Γ(x).

This theorem underscores the fact that the extra conditions that make a Nash equilibrium a subgame perfect equilibrium apply to subgames Γ(x) for which P_σ(x) = 0, such as for example the subgame Γ(x2) in Example 7.1, under the equilibrium (A, C).

Proof: The idea behind the proof is as follows. If in the subgame Γ(x) the strategy vector σ* restricted to the subgame were not a Nash equilibrium, then there would exist a player i who could profit in that subgame by deviating from σ*_i to a different strategy, say σ_i′, in the subgame. Since the play of the game visits the subgame Γ(x) with positive probability, the player can profit in Γ by deviating from σ*_i, by implementing σ_i′ if the game gets to x.

We now proceed to the formal proof. Let Γ(x) be the subgame of Γ starting at vertex x and let σ* be a Nash equilibrium of Γ satisfying P_{σ*}(x) > 0. Let σ_i′ be a strategy of player i in the subgame Γ(x). Denote by σ_i the strategy¹ of player i that coincides with σ*_i except in the subgame Γ(x), where it coincides with σ_i′. Since σ* and (σ_i, σ*_{−i}) coincide at all vertices that are not in Γ(x),

P_{σ*}(x) = P_{(σ_i, σ*_{−i})}(x).    (7.2)
Denote by û_i the expected payoff of player i, conditional on the play of the game not arriving at the subgame Γ(x) when the players implement the strategy vector σ*. Then

u_i(σ*) = P_{σ*}(x) u_i(σ* | x) + (1 − P_{σ*}(x)) û_i.    (7.3)

Writing out the analogous equation for the strategy vector (σ_i, σ*_{−i}) and using Equation (7.2) yields (the same û_i appears, because the two strategy vectors coincide outside Γ(x))

u_i(σ_i, σ*_{−i}) = P_{(σ_i, σ*_{−i})}(x) u_i((σ_i′, σ*_{−i}) | x) + (1 − P_{(σ_i, σ*_{−i})}(x)) û_i    (7.4)
                  = P_{σ*}(x) u_i((σ_i′, σ*_{−i}) | x) + (1 − P_{σ*}(x)) û_i.    (7.5)

Since σ* is an equilibrium,

P_{σ*}(x) u_i(σ* | x) + (1 − P_{σ*}(x)) û_i = u_i(σ*)    (7.6)
                                             ≥ u_i(σ_i, σ*_{−i})    (7.7)
                                             = P_{σ*}(x) u_i((σ_i′, σ*_{−i}) | x) + (1 − P_{σ*}(x)) û_i.    (7.8)

Since P_{σ*}(x) > 0, one has

u_i(σ* | x) ≥ u_i((σ_i′, σ*_{−i}) | x).    (7.9)
¹ When σ*_i and σ_i′ are behavior strategies, the strategy σ_i coincides with σ*_i in player i’s information sets that are not in Γ(x), and with σ_i′ in player i’s information sets that are in the subgame Γ(x). When σ*_i and σ_i′ are mixed strategies, the strategy σ_i is defined as follows: every pure strategy s_i of player i is composed of the pair (s_i^1, s_i^2), in which s_i^1 associates a move with each of player i’s information sets in the subgame Γ(x), and s_i^2 associates a move with each of player i’s information sets that are not in the subgame Γ(x). Then σ_i(s_i^1, s_i^2) := σ_i′(s_i^1) · Σ_{ŝ_i : ŝ_i^2 = s_i^2} σ*_i(ŝ_i). Since Γ(x) is a subgame, every information set that is in Γ(x) does not contain vertices that are not in that subgame; hence the strategy σ_i is well defined in both cases.
Since this inequality is satisfied for each player i, and each strategy σ_i′ in the subgame Γ(x), the strategy vector σ* restricted to the subgame Γ(x) is a Nash equilibrium of Γ(x).

Recall that, given a mixed strategy σ_i of player i, we denote by σ_i(s_i) the probability that the pure strategy s_i will be chosen, and given a behavior strategy σ_i, we denote by σ_i(U_i; a_i) the probability that the action a_i will be chosen in the information set U_i of player i.

Definition 7.6 A mixed strategy σ_i of player i is called completely mixed if σ_i(s_i) > 0 for each s_i ∈ S_i. A behavior strategy σ_i of player i is called completely mixed if σ_i(U_i; a_i) > 0 for each information set U_i of player i, and each action a_i ∈ A(U_i).

A mixed strategy is completely mixed if under the strategy a player chooses each of his pure strategies with positive probability, and a behavior strategy is completely mixed if at each of his information sets, the player chooses with positive probability each of his possible actions at that information set. Since at each chance vertex every action is chosen with positive probability, if each player i uses a completely mixed strategy σ_i, then P_σ(x) > 0 for each vertex x in the game tree. This leads to the following corollary of Theorem 7.5.

Corollary 7.7 Let Γ be an extensive-form game. Then every Nash equilibrium in completely mixed strategies (behavior strategies or mixed strategies) is a subgame perfect equilibrium.

As Theorem 7.4 states, in games whose only subgames are the game itself, every Nash equilibrium is a subgame perfect equilibrium; in such cases, subgame perfection imposes no further conditions beyond the conditions defining the Nash equilibrium. In contrast, when a game has a large number of subgames, the concept of subgame perfection becomes significant, because a Nash equilibrium must meet a large number of conditions to be a subgame perfect equilibrium. The most extreme case of such a game is a game with perfect information.
Recall that a game with perfect information is a game in which every information set is composed of only one vertex. In such a game, every vertex is the root of a subgame. Example 7.8 Figure 7.4 depicts a two-player game with perfect information.
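[Figure 7.4: A subgame perfect equilibrium in a game with perfect information. At x1 Player I chooses a, leading to x2, or b, with payoff (1, 2). At x2 Player II chooses c, leading to x3, or d, leading to x4. At x3 Player I chooses e, f, or g, with payoffs (4, 5), (−10, 10), and (3, 7); at x4 Player I chooses h or i, with payoffs (−10, 10) and (−2, −4).]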
To find a subgame perfect equilibrium, start with the smallest subgames: those whose roots are vertices adjacent to the leaf vertices, in this case the subgames Γ(x3) and Γ(x4). The only equilibrium in the subgame Γ(x3) is the one in which Player I chooses e, because this action leads to the result (4, 5), which includes the best payoff Player I can receive in the subgame. Similar reasoning shows that the only equilibrium in the subgame Γ(x4) is the one in which Player I chooses i, leading to the result (−2, −4). We can now replace the subgame Γ(x3) with the result of its equilibrium, (4, 5), and the subgame Γ(x4) with the result of its equilibrium, (−2, −4). This yields the game depicted in Figure 7.5 (this procedure is called “folding” the game).
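[Figure 7.5: The folded game. At x1 Player I chooses a, leading to x2, or b, with payoff (1, 2); at x2 Player II chooses c, with payoff (4, 5), or d, with payoff (−2, −4).]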
Now, in the subgame starting at x2 , at equilibrium, Player II chooses c, leading to the result (4, 5). Folding this subgame leads to the game depicted in Figure 7.6.
[Figure 7.6: The game after further folding. At x1 Player I chooses a, with payoff (4, 5), or b, with payoff (1, 2).]
In this game, at equilibrium Player I chooses a, leading to the result (4, 5). Recapitulating all the stages just described gives us the subgame perfect equilibrium shown in Figure 7.4. ◭
This process is called backward induction (see Remark 4.50 on page 121). The process leads to the equilibrium ((a, e, i), c), which by construction is a subgame perfect equilibrium. Backward induction leads, in a similar way, to a subgame perfect equilibrium in pure strategies in every (finite) game with perfect information. We thus have the following theorem. Theorem 7.9 Every finite extensive-form game with perfect information has a subgame perfect equilibrium in pure strategies. The proof of the theorem is accomplished by backward induction on the subgames, from the smallest (starting from the vertices adjacent to the leaves) to the largest (starting from the root of the tree). The formal proof is left to the reader (Exercise 7.8). We will later show (Theorem 7.37 on page 271) that every extensive-form game with perfect recall has a subgame perfect equilibrium in behavior strategies. With regard to games with incomplete information, we can reuse the idea of “folding” a game to prove the following theorem. Theorem 7.10 Every extensive-form game with perfect recall has a subgame perfect equilibrium in mixed strategies. The proof of the theorem is left to the reader as an exercise (Exercise 7.15).
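The folding procedure of Example 7.8 can be sketched as a short recursive routine. The sketch below is illustrative only: the tree encoding, the names `TREE`, `fold`, and `plan`, and the rule for breaking ties are choices made here, and the assignment of payoffs to the actions f, g, and h is an assumption read off the order of the labels in Figure 7.4.

```python
# Game tree of Example 7.8. A decision vertex maps to (player, {action: child});
# a leaf is a payoff pair (u_I, u_II). Players are indexed 0 (I) and 1 (II).
# Assumption: the leaves reached by f, g and h follow the label order of Figure 7.4.
TREE = {
    "x1": (0, {"a": "x2", "b": (1, 2)}),
    "x2": (1, {"c": "x3", "d": "x4"}),
    "x3": (0, {"e": (4, 5), "f": (-10, 10), "g": (3, 7)}),
    "x4": (0, {"h": (-10, 10), "i": (-2, -4)}),
}


def fold(node, plan):
    """Return the payoff reached from `node` under backward induction,
    recording in `plan` the action chosen at every decision vertex."""
    if isinstance(node, tuple):  # a leaf: nothing left to decide
        return node
    player, moves = TREE[node]
    # The value of an action is the payoff of the folded subgame it leads to.
    values = {action: fold(child, plan) for action, child in moves.items()}
    # The mover maximizes his own payoff; ties go to the first such action.
    best = max(values, key=lambda action: values[action][player])
    plan[node] = best
    return values[best]


plan = {}
outcome = fold("x1", plan)
print(outcome)
print(plan)
```

Because `plan` records a choice at every decision vertex, including vertices off the resulting path such as x4, it describes a full strategy vector rather than only the path that is played. As Remark 7.12 notes, a different tie-breaking rule may lead to a different subgame perfect equilibrium; no ties arise in this particular tree.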
Remark 7.11 In the last two theorems we used the fact that an extensive-form game is finite by definition: the number of decision vertices is finite, and the number of actions at every decision vertex is finite. To prove Theorem 7.9, it is not necessary to assume that the game tree is finite; it suffices to assume that there exists a natural number L such that the length of each path emanating from the root is no greater than L. Without this assumption, the process of backward induction cannot begin, and these two theorems are not valid. These theorems do not hold in games that are not finite. There are examples of infinite two-player games that have no equilibria (see, for example, Mycielski [1992], Claim 3.1). Such examples are beyond the scope of this book. Exercise 7.16 presents an example of a game with imperfect information in which one of the players has a continuum of pure strategies, but the game has no subgame perfect equilibria. Remark 7.12 In games with perfect information, when the backward induction process reaches a vertex at which a player has more than one action that maximizes his payoff, any one of them can be chosen in order to continue the process. Each choice leads to a different equilibrium, and therefore the backward induction process can identify several equilibria (all of which are subgame perfect equilibria). Remark 7.13 The process of backward induction is in effect the game-theory version of the dynamic programming principle widely used in operations research. This is a very natural and useful approach to multistage optimization: start with optimizing the action chosen at the last stage, stage n, for every state of the system at stage n − 1. Continue by optimizing the action chosen at stage n − 1 for every state of the system at stage n − 2, and so on. Backward induction is a very convincing logical method. 
However, its use in game theory sometimes raises questions stemming from the fact that unlike dynamic optimization problems with a single decision maker, games involve several interacting decision makers. We will consider several examples illustrating the limits of backward induction in games. We first construct an example of a game that has an equilibrium that is not subgame perfect, but is preferred by both players to all the subgame perfect equilibria of the game.
Example 7.14 A two-player extensive-form game with two equilibria is depicted in Figure 7.7.
[Figure 7.7: A game with two equilibria. The game tree is shown twice, in one panel labeled “Subgame perfect equilibrium” and in another labeled “An equilibrium that is not subgame perfect”.]
Using backward induction, we find that the only subgame perfect equilibrium in the game is ((A, C, F), b), leading to the payoff (2, 2). The equilibrium ((B, C, E), t) leads to the payoff (4, 4) (verify that this is indeed an equilibrium). This equilibrium is not a subgame perfect equilibrium, since it calls on Player I to choose E in the subgame Γ(x4), which is not an equilibrium. This choice by Player I may be regarded as a threat to Player II: “if you choose b (in an attempt to get 6) instead of t, I will choose E and you will get 0.” What is interesting in this example is that both players have an “interest” in maintaining this threat, because it serves both of them: it enables them to receive the payoff (4, 4), which is preferred by both of them to the payoff (2, 2) that they would receive under the game’s only subgame perfect equilibrium. ◭
Example 7.15 The repeated Prisoner’s Dilemma Consider the Prisoner’s Dilemma game with the payoffs shown in Figure 7.8.

                   Player II
                   D        C
    Player I   D   1, 1     4, 0
               C   0, 4     3, 3

Figure 7.8 Prisoner’s Dilemma
Suppose two players play the Prisoner’s Dilemma game 100 times, with each player at each stage informed of the action chosen by the other player (and therefore also the payoff at each stage). We can analyze this game using backward induction: at equilibrium, at the 100th (i.e., the last) repetition, each of the players chooses D (which strictly dominates C), independently of the actions undertaken in the previous stages: for every other choice, a player choosing C can profit by deviating to D. This means that in the game played at the 99th stage, what the players choose has no effect on what will happen at the 100th stage, so that at equilibrium each player chooses D at stage 99, and so forth. Backward induction leads to the result that the only subgame perfect equilibrium is the strategy vector under which both players choose D at every stage.² In Exercise 7.9, the reader is asked to turn this proof idea into a formal proof. In fact, it can be shown that in every equilibrium (not necessarily subgame perfect equilibrium) in this 100-stage game the players play (D, D) in every stage (see Chapter 13). This does not seem reasonable: one would most likely expect rational players to find a way to obtain the payoff (3, 3), at least in the initial stages of the game, and not play (D, D), which yields only (1, 1), in every stage. A large number of empirical studies confirm that in fact players do usually cooperate during many stages when playing the repeated Prisoner’s Dilemma, in order to obtain a higher payoff than that indicated by the equilibrium strategy. ◭
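The dominance argument that drives the backward induction in Example 7.15 can be checked mechanically. The following sketch uses the stage payoffs of Figure 7.8; the names `d_strictly_dominates_c` and `total` are hypothetical, and the assumption that a player’s payoff in the repeated game is the sum of his stage payoffs is made here only for illustration.

```python
# Stage game of Figure 7.8: payoff[(a_I, a_II)] = (u_I, u_II).
payoff = {
    ("D", "D"): (1, 1), ("D", "C"): (4, 0),
    ("C", "D"): (0, 4), ("C", "C"): (3, 3),
}
ACTIONS = ("D", "C")
STAGES = 100


def d_strictly_dominates_c():
    """True if D pays each player strictly more than C against every action."""
    for other in ACTIONS:
        if payoff[("D", other)][0] <= payoff[("C", other)][0]:  # Player I
            return False
        if payoff[(other, "D")][1] <= payoff[(other, "C")][1]:  # Player II
            return False
    return True


def total(profile):
    """Summed payoffs when the same action pair is played at every stage
    (assumption: the repeated-game payoff is the sum of stage payoffs)."""
    u1, u2 = payoff[profile]
    return (STAGES * u1, STAGES * u2)


print(d_strictly_dominates_c())
print(total(("D", "D")), total(("C", "C")))
```

Under the summation assumption, the gap between the two totals is one way to quantify why the backward induction outcome strikes many readers as unreasonable.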
² This is a verbal description of the process of backward induction in a game tree with 100 stages. Writing out the formal backward induction process in full when the game tree is this large is, of course, not practical.

Example 7.16 The Centipede game The Centipede game that we saw in Exercise 3.12 on page 61 is also a two-player game with 100 stages, but unlike the repeated Prisoner’s Dilemma, in the Centipede game the actions of the players are implemented sequentially, rather than simultaneously: in the odd stages, t = 1, 3, . . . , 99, Player I has a turn, and he decides whether to stop the game (S) or to continue (C). If he stops the game at stage t, the payoff is (t, t − 1) (hence Player I receives t, and Player II receives t − 1), and if he instead chooses C, the game continues on to the next stage. In the even stages, t = 2, 4, . . . , 100, Player II has a turn, and he also chooses between stopping the game (S) and continuing (C). If he stops the game at stage t, the payoff is (t − 2, t + 1). If neither player chooses to stop in the first 99 stages, the game ends after 100 stages, with the payoff of 101 to Player I and 100 to Player II. The visual depiction of the game in extensive form explains why it is called the Centipede game (see Figure 7.9).
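[Figure 7.9: game tree. Player I moves at the odd stages 1, 3, . . . , 99 and Player II at the even stages 2, 4, . . . , 100; at each stage the mover chooses S (stop) or C (continue). Stopping at stages 1, 2, 3, 4, . . . , 99, 100 yields the payoffs (1, 0), (0, 3), (3, 2), (2, 5), . . . , (99, 98), (98, 101); if neither player ever stops, the payoff is (101, 100).]
Figure 7.9 The Centipede game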
What does backward induction lead to in this game? At stage 100, Player II should choose to stop the game: if he stops the game, he leaves the table with $101, while if the game continues he will only get $100. Since that is the case, at stage 99, Player I should stop the game: he knows that if he chooses to continue the game, Player II will stop the game at the next stage, and Player I will end up with $98, while if he stops, he walks away with $99. Subgame perfection requires him to stop at stage 99. A similar analysis obtains at every stage; hence the only subgame perfect equilibrium in the game is the strategy vector in which each player stops the game at every one of his turns. In particular, at this equilibrium, Player I stops the game at the first stage, and the payoff is (1, 0). This result is unreasonable: shrewd players will not stop the game and be satisfied with the payoff (1, 0) when they can both do much better by continuing for several stages. Empirical studies reveal that many people do indeed “climb the centipede” up to a certain level, and then one of them stops the game. It can be shown that at every Nash equilibrium of the game (not necessarily subgame perfect equilibrium), Player I chooses S at the first stage (Exercise 4.19 on page 134). ◭
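The stage-by-stage argument can be run mechanically. The following is a minimal sketch, assuming the payoff rules stated in Example 7.16; the function name solve_centipede and its return format are illustrative choices, not part of the text. It walks from stage 100 back to stage 1 and records the mover's preferred action at each stage.

```python
def solve_centipede(n_stages=100):
    """Backward induction on the Centipede game of Example 7.16.

    Returns (plan, value): plan[t] is "S" or "C" for the mover at stage t,
    and value is the payoff pair (Player I, Player II) reached under the plan.
    """
    # Payoff if nobody ever stops: (101, 100) in the 100-stage game.
    value = (n_stages + 1, n_stages)
    plan = {}
    for t in range(n_stages, 0, -1):
        if t % 2 == 1:
            mover, stop = 0, (t, t - 1)      # Player I moves at odd stages
        else:
            mover, stop = 1, (t - 2, t + 1)  # Player II moves at even stages
        # `value` is what continuing leads to; the mover compares his own coordinate.
        if stop[mover] > value[mover]:
            plan[t], value = "S", stop
        else:
            plan[t] = "C"
    return plan, value


plan, value = solve_centipede()
print(set(plan.values()))  # the only action that ever appears
print(value)               # payoff of the subgame perfect equilibrium
```

At every stage the mover gains exactly 1 by stopping rather than continuing, so no ties arise and the comparison with a strict inequality is unambiguous.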
7.2 Rationality, backward induction, and forward induction
The last two examples indicate that backward induction alone is insufficient to describe rational behavior. Kohlberg and Mertens [1986] argued that backward induction requires that at each stage every player looks only at the continuation of the game from that stage forwards, and ignores the fact that the game has reached that stage. But if the game has reached a particular vertex in the game tree, that fact itself gives information about the behavior of the other players, and this should be taken into account. For example, if I am playing the repeated Prisoner’s Dilemma, and at the third stage it transpires that the other player played C in the previous two stages, then I need to take this into account, beyond regarding it as “irrational.” Perhaps the other player is signaling that we should both play (C, C)? Similarly, if the Centipede game reaches the second stage, then Player
I must have deviated from equilibrium, and not have stopped the game at the first stage. It seems reasonable to conjecture that if Player II chooses not to stop the game at that point, then Player I will not stop at stage 3. Backward induction implies that Player I should stop at stage 3, but it also implies that he should stop at stage 1. If he did not stop then, why should he stop now? The approach that grants significance to the history of the game is called forward induction. We will not present a formal description of the forward induction concept, and instead only give an example of it.
Example 7.17 Consider the two-player extensive-form game depicted in Figure 7.10.
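[Figure 7.10: game tree, shown twice, once for each equilibrium. Player I chooses T, M, or B. Choosing B ends the game with payoff (2, 3). After T or M, Player II, who cannot tell which of the two was chosen, chooses t or b. The payoffs are (1, 5) after (T, t), (1, 1) after (T, b), (0, 0) after (M, t), and (3, 2) after (M, b). Panels: Equilibrium (B, t); Equilibrium (M, b).]
Figure 7.10 An extensive-form game with two subgame perfect equilibria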
This game has two equilibria in pure strategies:

• (B, t), with payoff (2, 3).
• (M, b), with payoff (3, 2).

Since the game has no nontrivial subgames, both equilibria are subgame perfect equilibria (Theorem 7.4). Is (B, t) a reasonable equilibrium? Is it reasonable for Player II to choose t? If he is called on to choose an action, that means that Player I has not chosen B, which would guarantee him a payment of 2. It is unreasonable for him to have chosen T, which guarantees him only 1, and he therefore must have chosen M, which gives him the chance to obtain 3. In other words, although Player II cannot distinguish between the two vertices in his information set, from the very fact that the game has arrived at the information set and that he is being called upon to choose an action, he can deduce, assuming that Player I is rational, that Player I has played M and not T. This analysis leads to the conclusion that Player II should prefer to play b, if called upon to choose an action, and (M, b) is therefore a more reasonable equilibrium. This convincing choice between the two equilibria was arrived at through forward induction. ◭
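The reasoning of Example 7.17 can be traced on the strategic form of the game. The following is a minimal sketch, assuming the payoffs of Figure 7.10; the helper names (pure_nash, strictly_dominated_rows, reply_after_elimination) are illustrative and do not come from the text. It lists the pure-strategy equilibria, finds the strategies of Player I that are strictly dominated, and computes Player II's best reply once those strategies are ruled out.

```python
# Strategic form of the game in Figure 7.10: (Player I, Player II) -> payoffs.
PAYOFFS = {
    ("T", "t"): (1, 5), ("T", "b"): (1, 1),
    ("M", "t"): (0, 0), ("M", "b"): (3, 2),
    ("B", "t"): (2, 3), ("B", "b"): (2, 3),
}
ROWS, COLS = ("T", "M", "B"), ("t", "b")


def pure_nash():
    """All pure-strategy profiles from which no player gains by deviating alone."""
    result = []
    for r in ROWS:
        for c in COLS:
            u1, u2 = PAYOFFS[(r, c)]
            row_ok = all(PAYOFFS[(r2, c)][0] <= u1 for r2 in ROWS)
            col_ok = all(PAYOFFS[(r, c2)][1] <= u2 for c2 in COLS)
            if row_ok and col_ok:
                result.append((r, c))
    return result


def strictly_dominated_rows():
    """Strategies of Player I that are strictly worse than some other pure strategy."""
    return [r for r in ROWS
            if any(all(PAYOFFS[(r2, c)][0] > PAYOFFS[(r, c)][0] for c in COLS)
                   for r2 in ROWS if r2 != r)]


def reply_after_elimination():
    """Player II's best action if only undominated moves can lead to his information set."""
    reaching = [r for r in ("T", "M") if r not in strictly_dominated_rows()]
    # In this game a single move survives, so Player II's belief is degenerate.
    (move,) = reaching
    return max(COLS, key=lambda c: PAYOFFS[(move, c)][1])


print(pure_nash())
print(strictly_dominated_rows())
print(reply_after_elimination())
```

The elimination step is only one simple way to formalize the verbal argument; the text deliberately gives no formal definition of forward induction.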
Inductive reasoning, and the inductive use of the concept of rationality, has the potential of raising questions regarding the consistency of rationality itself. Consider the game depicted in Figure 7.11.

[Figure 7.11: game tree. At x1, Player I chooses r, which ends the game with payoff (2, 1), or l. After l, Player II at x2 chooses b, which ends the game with payoff (1, 1), or a. After a, Player I at x3 chooses d, with payoff (0, 0), or c, with payoff (1, 2).]
Figure 7.11 A game with only one subgame perfect equilibrium

The only subgame perfect equilibrium of this game is ((r, c), a), which yields the payoff (2, 1). Why does Player II choose a at x2? Because if Player I is rational, he will then choose c, leading to the payoff (1, 2), which Player II prefers to the payoff (1, 1) that would result if he were to choose b. But is Player I really rational? Consider the fact that if the game has arrived at x2, and Player II is called upon to play, then Player I must be irrational: Player I must have chosen l, which yields him at most 1, instead of choosing r, which guarantees him 2. Then why should Player II assume that Player I will be rational at x3? Perhaps it would be more rational for Player II to choose b, and guarantee himself a payoff of 1, instead of running the risk that Player I may again be irrational and choose d, which will yield the payoff (0, 0).

The game depicted in Figure 7.12, which is just like the previous game except that the payoff (1, 1) is replaced by (3, 1), is even more problematic.

[Figure 7.12: game tree. Identical to Figure 7.11, except that Player II's action b at x2 ends the game with payoff (3, 1) instead of (1, 1).]
Figure 7.12 A game with only one subgame perfect equilibrium

This game also has only one subgame perfect equilibrium, ((r, c), a), yielding payoff (2, 1). Again, by backward induction, Player I will not choose l, which leads to the payoff (1, 2). Player II, at x2, must therefore conclude that Player I is irrational (because Player I must have chosen l at x1, which by backward induction leads to him getting 1, instead of r, which guarantees him a payoff of 2). And if Player I is irrational, then Player II may need to fear that if he chooses a, Player I will then choose d and the end result will be (0, 0). It is therefore possible that at x2, Player II will choose b, in order to guarantee himself a payoff of 1. But, if that is the case, Player I is better off choosing l at x1, because then he will receive 3, instead of 2, which is what choosing r gets him. So is Player I really irrational if he chooses l? Perhaps Player I’s choice of l is a calculated choice, aimed at making Player II think that he is irrational, and therefore leading Player II to choose b? Then which one of Player I’s choices at vertex x1 is rational, and which is irrational?
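The backward induction claims for Figures 7.11 and 7.12 can be checked with a small recursive solver. The following is a minimal sketch: the dictionary encoding of the tree and the names backward_induction and build_game are assumptions made for illustration. The solver presumes that the mover is never indifferent at a vertex, which holds in both games.

```python
def backward_induction(node):
    """Solve a finite perfect-information tree without ties.

    Returns (payoff, plan), where plan maps each decision vertex to the chosen action.
    """
    if "payoff" in node:
        return node["payoff"], {}
    plan, best_action, best_payoff = {}, None, None
    for action, child in node["moves"].items():
        payoff, sub_plan = backward_induction(child)
        plan.update(sub_plan)
        if best_payoff is None or payoff[node["player"]] > best_payoff[node["player"]]:
            best_action, best_payoff = action, payoff
    plan[node["name"]] = best_action
    return best_payoff, plan


def build_game(b_payoff):
    """Tree of Figure 7.11 for b_payoff = (1, 1) and of Figure 7.12 for b_payoff = (3, 1)."""
    x3 = {"name": "x3", "player": 0,
          "moves": {"d": {"payoff": (0, 0)}, "c": {"payoff": (1, 2)}}}
    x2 = {"name": "x2", "player": 1,
          "moves": {"b": {"payoff": b_payoff}, "a": x3}}
    return {"name": "x1", "player": 0,
            "moves": {"r": {"payoff": (2, 1)}, "l": x2}}


for b_payoff in ((1, 1), (3, 1)):
    payoff, plan = backward_induction(build_game(b_payoff))
    print(b_payoff, payoff, plan)
```

Both games produce the same plan, (r at x1, a at x2, c at x3). The solver cannot express the doubt raised in the text: in Figure 7.12, if Player II were to answer l with b, Player I would receive 3 rather than the 2 that r guarantees.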
7.3 Perfect equilibrium
This section presents the concept of perfect equilibrium. While subgame perfect equilibrium is a refinement of the concept of Nash equilibrium applicable only to extensive-form
games, perfect equilibrium is a refinement of the concept of Nash equilibrium that is applicable to extensive-form games and strategic-form games. After introducing the concept of subgame perfect equilibrium in 1965, Selten revisited it in 1975, using the following example.

Example 7.18 Consider the three-player game depicted in Figure 7.13.
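[Figure 7.13: game tree, shown twice, once for each equilibrium. Player I chooses T or B. After T, Player II chooses t, which ends the game with payoff (1, 1, 1), or b. Player III has a single information set, reached either after (T, b) or after B, where he chooses τ or β. The payoffs are (4, 4, 0) after (T, b, τ), (0, 0, 1) after (T, b, β), (3, 2, 2) after (B, τ), and (0, 0, 0) after (B, β). Panels: Equilibrium (T, t, β); Equilibrium (B, t, τ).]
Figure 7.13 A game in extensive form, along with two equilibria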
Since this game has no nontrivial subgames, every equilibrium of the game is a subgame perfect equilibrium. There are two equilibria in pure strategies:

• (T, t, β), with payoff (1, 1, 1).
• (B, t, τ), with payoff (3, 2, 2).

(Check that each of these two strategy vectors does indeed form a Nash equilibrium.) Selten argued that Player II’s behavior in the equilibrium (B, t, τ) is irrational. The reasoning is as follows: if Player II is called upon to play, that means that Player I misplayed, playing T instead of B, because at equilibrium he is supposed to play B. Since Player III is supposed to play τ at that equilibrium, if Player II deviates and plays b, he will get 4 instead of 1. ◭
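The check suggested in Example 7.18 can be carried out by enumeration. The following is a minimal sketch, assuming the payoffs of Figure 7.13; the strings "tau" and "beta" stand for τ and β, and the names outcome and is_nash are illustrative. It lists the pure-strategy equilibria and then compares Player II's two actions in the event that he is actually called upon to play while Player III plays τ.

```python
from itertools import product

# Pure strategies of Players I, II, III; "tau" and "beta" stand for τ and β.
STRATEGIES = (("T", "B"), ("t", "b"), ("tau", "beta"))


def outcome(s1, s2, s3):
    """Payoff vector of the game in Figure 7.13 under a pure-strategy profile."""
    if s1 == "B":
        return (3, 2, 2) if s3 == "tau" else (0, 0, 0)
    if s2 == "t":
        return (1, 1, 1)
    return (4, 4, 0) if s3 == "tau" else (0, 0, 1)


def is_nash(profile):
    """True if no player can gain by deviating alone to another pure strategy."""
    base = outcome(*profile)
    for i, options in enumerate(STRATEGIES):
        for alt in options:
            deviation = profile[:i] + (alt,) + profile[i + 1:]
            if outcome(*deviation)[i] > base[i]:
                return False
    return True


pure_equilibria = [p for p in product(*STRATEGIES) if is_nash(p)]
print(pure_equilibria)

# Selten's observation: if Player I has played T by mistake and Player III plays tau,
# Player II compares the payoff of t with the payoff of b.
payoff_t = outcome("T", "t", "tau")[1]
payoff_b = outcome("T", "b", "tau")[1]
print(payoff_t, payoff_b)
```

Checking deviations to pure strategies is enough here because expected payoffs are linear in each player's own mixed strategy.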
Selten introduced the concept of the “trembling hand,” which requires rational players to take into account the possibility that mistakes may occur, even if they occur with small probability. The equilibrium concept corresponding to this type of rationality is called “perfect equilibrium.” In an extensive-form game, a mistake can occur in two ways. A player may, at the beginning of the play of a game, with small probability mistakenly choose a pure strategy that differs from the one he intends to choose; such a mistake can cause deviations at every information set that is arrived at in the ensuing play. A second possibility is that the mistakes in different information sets are independent of each other; at each information set there is a small probability that a mistake will be made in choosing the action. As we will see later in this chapter, these two ways in which mistakes can occur lead to alternative perfect equilibrium concepts. The analysis of this solution concept therefore requires careful attention to these details. We will first present the concept of perfect equilibrium for strategic-form games, in Section 7.3.1, and present its extensive-form game version in Section 7.3.2.
7.3.1 Perfect equilibrium in strategic-form games

Definition 7.19 Let $\Gamma = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ be a game in strategic form in which the set of pure strategies of each player is finite. A perturbation vector of player $i$ is a vector $\varepsilon_i = (\varepsilon_i(s_i))_{s_i \in S_i}$ satisfying $\varepsilon_i(s_i) > 0$ for each $s_i \in S_i$, and

$$\sum_{s_i \in S_i} \varepsilon_i(s_i) \le 1, \quad \forall i \in N. \tag{7.10}$$

A perturbation vector is a vector $\varepsilon = (\varepsilon_i)_{i \in N}$, where $\varepsilon_i$ is a perturbation vector of player $i$ for each $i \in N$. For every perturbation vector $\varepsilon$, the $\varepsilon$-perturbed game is the game $\Gamma(\varepsilon) = (N, (\Sigma_i(\varepsilon_i))_{i \in N}, (u_i)_{i \in N})$ where player $i$'s strategy set is

$$\Sigma_i(\varepsilon_i) := \{\sigma_i \in \Sigma_i : \sigma_i(s_i) \ge \varepsilon_i(s_i), \ \forall s_i \in S_i\}. \tag{7.11}$$
In words, in the $\varepsilon$-perturbed game $\Gamma(\varepsilon)$, every pure strategy $s_i$ is chosen with probability greater than or equal to $\varepsilon_i(s_i)$. The condition in Equation (7.10) guarantees that the strategy set $\Sigma_i(\varepsilon_i)$ is not empty. Furthermore, $\Sigma_i(\varepsilon_i)$ is a compact and convex set (Exercise 7.17). The following theorem therefore follows from Theorem 5.32 (page 171).

Theorem 7.20 Every (finite) $\varepsilon$-perturbed game has an equilibrium; i.e., there exists a mixed-strategy vector $\sigma^* = (\sigma_i^*)_{i \in N}$ satisfying $\sigma_i^* \in \Sigma_i(\varepsilon_i)$ for each player $i \in N$, and

$$U_i(\sigma^*) \ge U_i(\sigma_i, \sigma_{-i}^*), \quad \forall i \in N, \ \forall \sigma_i \in \Sigma_i(\varepsilon_i). \tag{7.12}$$
Given a perturbation vector $\varepsilon$, denote by

$$M(\varepsilon) := \max_{i \in N,\, s_i \in S_i} \varepsilon_i(s_i) \tag{7.13}$$

the maximal perturbation in $\Gamma(\varepsilon)$, and by

$$m(\varepsilon) := \min_{i \in N,\, s_i \in S_i} \varepsilon_i(s_i) \tag{7.14}$$

the minimal perturbation. Note that $m(\varepsilon) > 0$.

Example 7.21 Consider the two-player game depicted in Figure 7.14.
                    Player II
                    L         R
Player I    T       1, 1      0, 0
            B       0, 0      0, 0

Figure 7.14 The strategic-form game of Example 7.21
This game has two pure strategy equilibria, (T, L) and (B, R). Consider now the $\varepsilon$-perturbed game, where the perturbation vector $\varepsilon = (\varepsilon_1, \varepsilon_2)$ is as follows:

$$\varepsilon_1(T) = \eta, \quad \varepsilon_1(B) = \eta^2, \quad \varepsilon_2(L) = \eta, \quad \varepsilon_2(R) = 2\eta,$$

where $\eta \in (0, \tfrac{1}{3}]$. Then

$$M(\varepsilon) = 2\eta, \quad m(\varepsilon) = \eta^2. \tag{7.15}$$

Since $m(\varepsilon) > 0$, all the strategies in $\Sigma_1(\varepsilon_1)$ and $\Sigma_2(\varepsilon_2)$ are completely mixed strategies. In particular, in the perturbed game, Player I’s payoff under T is always greater than his payoff under B: if Player I plays B, he receives 0, while if he plays T his expected payoff is positive. It follows that Player I’s best reply to every strategy in $\Sigma_2(\varepsilon_2)$ is to play T with the maximally allowed probability; this means that the best reply is $[(1 - \eta^2)(T), \eta^2(B)]$. Similarly, we can calculate that Player II’s expected payoff is greatest when he plays L, and his best reply to every strategy in $\Sigma_1(\varepsilon_1)$ is $[(1 - 2\eta)(L), 2\eta(R)]$. It follows that the only equilibrium in this $\varepsilon$-perturbed game is

$$\big([(1 - \eta^2)(T), \eta^2(B)], \ [(1 - 2\eta)(L), 2\eta(R)]\big). \tag{7.16}$$
◭

In Example 7.21 Player I’s pure strategy T weakly dominates his pure strategy B. In this case, when Player II is restricted to playing completely mixed strategies, the strategy T always leads to a higher payoff than the strategy B, and therefore at equilibrium Player I plays the pure strategy B with the minimal possible probability. This line of reasoning is generalized to the following theorem, whose proof is left to the reader.

Theorem 7.22 If $s_i$ is a weakly dominated strategy, then at every equilibrium $\sigma$ of the $\varepsilon$-perturbed game,

$$\sigma_i(s_i) = \varepsilon_i(s_i). \tag{7.17}$$
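The computation in Example 7.21 can be reproduced with exact arithmetic. The following is a minimal sketch, assuming the payoffs of Figure 7.14 and the perturbation vector of the example; the names constrained_best_reply and perturbed_equilibrium are illustrative. The helper presumes that the best pure strategy is unique, which holds here because every strategy in the perturbed game is completely mixed. The loop starts from an arbitrary feasible strategy of Player II and confirms that the resulting pair reproduces itself under best replies; it does not by itself establish uniqueness.

```python
from fractions import Fraction


def constrained_best_reply(pure_payoffs, eps):
    """Best reply within the perturbed strategy set.

    Every pure strategy receives its minimal weight eps[s]; all remaining
    probability goes to the pure strategy with the highest expected payoff.
    Assumes that this best pure strategy is unique.
    """
    best = max(pure_payoffs, key=pure_payoffs.get)
    reply = dict(eps)
    reply[best] = 1 - sum(w for s, w in eps.items() if s != best)
    return reply


def perturbed_equilibrium(eta):
    """Mutual best replies in the perturbed game of Example 7.21."""
    eps1 = {"T": eta, "B": eta ** 2}
    eps2 = {"L": eta, "R": 2 * eta}
    # Arbitrary feasible starting point for Player II (feasible since eta <= 1/3).
    sigma2 = {"L": eta, "R": 1 - eta}
    for _ in range(2):
        # Only (T, L) pays 1, so T earns sigma2(L) and L earns sigma1(T).
        sigma1 = constrained_best_reply({"T": sigma2["L"], "B": 0}, eps1)
        sigma2 = constrained_best_reply({"L": sigma1["T"], "R": 0}, eps2)
    return sigma1, sigma2


for eta in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    sigma1, sigma2 = perturbed_equilibrium(eta)
    print(eta, sigma1["B"], sigma2["R"])  # weights on the dominated strategies B and R
```

The weights on B and R equal their lower bounds for every value of η, as Theorem 7.22 predicts, and both shrink to zero as η decreases.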
Let $(\varepsilon^k)_{k \in \mathbb{N}}$ be a sequence of perturbation vectors satisfying $\lim_{k \to \infty} M(\varepsilon^k) = 0$: the maximal constraint converges to 0. Then for every completely mixed strategy $\sigma_i$ of player $i$, there exists $k_0 \in \mathbb{N}$ such that $\sigma_i \in \Sigma_i(\varepsilon_i^k)$ for every $k \ge k_0$. Indeed, denote $c := \min_{s_i \in S_i} \sigma_i(s_i) > 0$ and choose $k_0 \in \mathbb{N}$ such that $M(\varepsilon^k) \le c$ for all $k \ge k_0$. Then $\sigma_i \in \Sigma_i(\varepsilon_i^k)$ for every $k \ge k_0$. Since every mixed strategy in $\Sigma_i$ can be approximated by a completely mixed strategy (Exercise 7.18), we deduce the following theorem.

Theorem 7.23 Let $(\varepsilon^k)_{k \in \mathbb{N}}$ be a sequence of perturbation vectors satisfying $\lim_{k \to \infty} M(\varepsilon^k) = 0$. For every mixed strategy $\sigma_i \in \Sigma_i$ of player $i$, there exists a sequence $(\sigma_i^k)_{k \in \mathbb{N}}$ of mixed strategies of player $i$ satisfying the following two properties:

• $\sigma_i^k \in \Sigma_i(\varepsilon_i^k)$ for each $k \in \mathbb{N}$.
• $\lim_{k \to \infty} \sigma_i^k$ exists and equals $\sigma_i$.

The following theorem, which is a corollary of Theorem 7.23, states that the limit of equilibria in an $\varepsilon$-perturbed game, where the perturbation vectors $(\varepsilon^k)_{k \in \mathbb{N}}$ are positive and converge to zero, is necessarily a Nash equilibrium of the original unperturbed game.
Theorem 7.24 Let $\Gamma = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ be a strategic-form game. For each $k \in \mathbb{N}$, let $\varepsilon^k$ be a perturbation vector, and let $\sigma^k$ be an equilibrium of the $\varepsilon^k$-perturbed game $\Gamma(\varepsilon^k)$. If

1. $\lim_{k \to \infty} M(\varepsilon^k) = 0$,
2. $\lim_{k \to \infty} \sigma^k$ exists and equals the mixed strategy vector $\sigma$,

then $\sigma$ is a Nash equilibrium of the original game $\Gamma$.

Proof: To show that $\sigma$ is a Nash equilibrium, we need to show that no player can profit from a unilateral deviation. Let $\sigma_i'$ be a strategy of player $i$. By Theorem 7.23, there exists a sequence of strategies $(\sigma_i'^k)_{k \in \mathbb{N}}$ converging to $\sigma_i'$, and satisfying $\sigma_i'^k \in \Sigma_i(\varepsilon_i^k)$ for each $k \in \mathbb{N}$. Since $\sigma^k$ is an equilibrium in the $\varepsilon^k$-perturbed game $\Gamma(\varepsilon^k)$,

$$u_i(\sigma^k) \ge u_i(\sigma_i'^k, \sigma_{-i}^k). \tag{7.18}$$

By the continuity of the payoff function $u_i$,

$$u_i(\sigma) = \lim_{k \to \infty} u_i(\sigma^k) \ge \lim_{k \to \infty} u_i(\sigma_i'^k, \sigma_{-i}^k) = u_i(\sigma_i', \sigma_{-i}). \tag{7.19}$$

Since this inequality obtains for every player $i \in N$ and every mixed strategy $\sigma_i' \in \Sigma_i$, it follows that $\sigma$ is a Nash equilibrium.

A mixed strategy vector that is the limit of equilibria in perturbed games, where the perturbation vectors are all positive and converge to zero, is called a perfect equilibrium.

Definition 7.25 A mixed strategy vector $\sigma$ in a strategic-form game $(N, (S_i)_{i \in N}, (u_i)_{i \in N})$ is a perfect equilibrium if there exists a sequence of perturbation vectors $(\varepsilon^k)_{k \in \mathbb{N}}$ satisfying $\lim_{k \to \infty} M(\varepsilon^k) = 0$, and for each $k \in \mathbb{N}$ there exists an equilibrium $\sigma^k$ of $\Gamma(\varepsilon^k)$ such that

$$\lim_{k \to \infty} \sigma^k = \sigma. \tag{7.20}$$
The following corollary of Theorem 7.24 states that the concept of perfect equilibrium is a refinement of the concept of Nash equilibrium.

Corollary 7.26 Every perfect equilibrium of a finite strategic-form game is a Nash equilibrium.

The game in Example 7.21 (page 264) has two equilibria, (T, L) and (B, R). The equilibrium (T, L) is a perfect equilibrium: (T, L) is the limit of the equilibria given by Equation (7.16), as $\eta$ converges to 0. We will later show that (B, R) is not a perfect equilibrium. In Example 7.18 (page 263), the equilibrium (T, t, β) is a perfect equilibrium, but the equilibrium (B, t, τ) is not a perfect equilibrium (Exercise 7.30). The next theorem states that every finite game has at least one perfect equilibrium.

Theorem 7.27 Every finite strategic-form game has at least one perfect equilibrium.

Proof: Let $\Gamma$ be a finite strategic-form game, and let $(\varepsilon^k)_{k \in \mathbb{N}}$ be a sequence of perturbation vectors satisfying $\lim_{k \to \infty} M(\varepsilon^k) = 0$. For example, $\varepsilon_i^k(s_i) = \frac{1}{k|S_i|}$ for each player $i \in N$ and for each $s_i \in S_i$. Theorem 7.20 implies that for each $k \in \mathbb{N}$ the game $\Gamma(\varepsilon^k)$ has an equilibrium in mixed strategies $\sigma^k$. Since the space of mixed strategy vectors is compact (see Exercise 5.1 on page 194), the sequence $(\sigma^k)_{k \in \mathbb{N}}$ has a convergent subsequence $(\sigma^{k_j})_{j \in \mathbb{N}}$. Denote the limit of this subsequence by $\sigma$. Applying Theorem 7.24 to the sequence of perturbation vectors $(\varepsilon^{k_j})_{j \in \mathbb{N}}$, and to the sequence of equilibria $(\sigma^{k_j})_{j \in \mathbb{N}}$, leads to the conclusion that $\sigma$ is a perfect equilibrium of the original game.

As a corollary of Theorem 7.22, and from the definition of perfect equilibrium, we can deduce the following theorem (Exercise 7.22).

Theorem 7.28 In every perfect equilibrium, every (weakly) dominated strategy is chosen with probability zero.

In other words, no weakly dominated strategy can be a part of a perfect equilibrium. This means that, for example, in Example 7.21, the strategy vector (B, R) is not a perfect equilibrium, since B is a dominated strategy of Player I (and R is a dominated strategy of Player II). As Exercise 7.28 shows, the converse of this theorem is not true: it is possible for a Nash equilibrium to choose every dominated strategy with probability zero, but not to be a perfect equilibrium. The following theorem states that a completely mixed equilibrium must be a perfect equilibrium.

Theorem 7.29 Every equilibrium in completely mixed strategies in a strategic-form game is a perfect equilibrium.

Proof: Let $\sigma^*$ be a completely mixed equilibrium of a strategic-form game $\Gamma$. Then $c := \min_{i \in N} \min_{s_i \in S_i} \sigma_i^*(s_i) > 0$. Let $(\varepsilon^k)_{k \in \mathbb{N}}$ be a sequence of perturbation vectors satisfying $\lim_{k \to \infty} M(\varepsilon^k) = 0$. Since $\lim_{k \to \infty} M(\varepsilon^k) = 0$, it must be the case that $M(\varepsilon^k) < c$ for sufficiently large $k$. Hence for each such $k$, we may conclude that $\sigma_i^* \in \Sigma_i(\varepsilon_i^k)$ for every player $i$; i.e., $\sigma^*$ is a possible strategy vector in the game $\Gamma(\varepsilon^k)$. Let $K_0 \in \mathbb{N}$ be sufficiently large so that for each $k \ge K_0$, one has $\sigma_i^* \in \Sigma_i(\varepsilon_i^k)$ for every player $i$. Since $\Gamma(\varepsilon^k)$ has fewer strategies than $\Gamma$, Theorem 4.31 (page 107) implies that $\sigma^*$ is an equilibrium of $\Gamma(\varepsilon^k)$. We may therefore apply Theorem 7.24 to the sequence $(\varepsilon^k)_{k \ge K_0}$ and the constant sequence $(\sigma^*)_{k = K_0}^{\infty}$, to conclude that $\sigma^*$ is a perfect equilibrium, which is what we needed to show.
7.3.2 Perfect equilibrium in extensive-form games

Since every extensive-form game can be presented as a strategic-form game, the concept of perfect equilibrium, as defined in Definition 7.25, also applies to extensive-form games. This definition of perfect equilibrium for extensive-form games is called strategic-form perfect equilibrium. Theorem 7.27 implies the following corollary (Exercise 7.32).
Theorem 7.30 Every extensive-form game has a strategic-form perfect equilibrium.

In this section, we will study the concept of extensive-form perfect equilibrium, where the mistakes that each player makes in different information sets are independent of each other. We will limit our focus to extensive-form games with perfect recall. By Kuhn’s Theorem, in such games each behavior strategy has an equivalent mixed strategy, and the converse also holds. Let $\Gamma$ be an extensive-form game with perfect recall. Denote by $\mathcal{U}_i$ player $i$’s set of information sets. Recall that we denote player $i$’s set of possible actions at information set $U_i$ by $A(U_i)$. When we are dealing with behavior strategies, a perturbation vector $\delta_i$ of player $i$ is a vector associating a positive real number with each action $a_i \in \bigcup_{U_i \in \mathcal{U}_i} A(U_i)$ of player $i$, such that $\sum_{a_i \in A(U_i)} \delta_i(a_i) \le 1$ for each information set $U_i \in \mathcal{U}_i$. Let $\delta = (\delta_i)_{i \in N}$ be a set of perturbation vectors, one for each player. Denote the maximal perturbation in $\Gamma(\delta)$ by

$$M(\delta) := \max_{\{i \in N,\ a_i \in \bigcup_{U_i \in \mathcal{U}_i} A(U_i)\}} \delta_i(a_i), \tag{7.21}$$

and the minimal perturbation by

$$m(\delta) := \min_{\{i \in N,\ a_i \in \bigcup_{U_i \in \mathcal{U}_i} A(U_i)\}} \delta_i(a_i) > 0. \tag{7.22}$$

The game $\Gamma(\delta)$ is the extensive-form game such that player $i$’s set of strategies, denoted by $B_i(\delta_i)$, is the set of behavior strategies in which every action $a_i$ is chosen with probability greater than or equal to $\delta_i(a_i)$, that is,

$$B_i(\delta_i) := \Big\{ \sigma_i \in \times_{U_i \in \mathcal{U}_i} \Delta(A(U_i)) : \sigma_i(U_i; a_i) \ge \delta_i(a_i), \ \forall i \in N, \ \forall U_i \in \mathcal{U}_i, \ \forall a_i \in A(U_i) \Big\}. \tag{7.23}$$
Since every possible action at every chance vertex is chosen with positive probability, and since $m(\delta) > 0$, it follows that $P_\sigma(x) > 0$ for every vertex $x$, and every behavior strategy vector $\sigma = (\sigma_i)_{i \in N}$ in $\Gamma(\delta)$: the play of the game arrives at every vertex $x$ with positive probability. For each vertex $x$ such that $\Gamma(x)$ is a subgame, denote by $\Gamma(x; \delta)$ the subgame of $\Gamma(\delta)$ starting at the vertex $x$. Similarly to Theorem 7.5, we have the following result, whose proof is left to the reader (Exercise 7.33).

Theorem 7.31 Let $\Gamma$ be an extensive-form game, and let $\Gamma(x)$ be a subgame of $\Gamma$. Let $\delta$ be a perturbation vector, and let $\sigma^*$ be a Nash equilibrium (in behavior strategies) of the game $\Gamma(\delta)$. Then the strategy vector $\sigma^*$, restricted to the subgame $\Gamma(x)$, is a Nash equilibrium of $\Gamma(x; \delta)$.

Similar to Definition 7.25, which is based on mixed strategies, the next definition bases the concept of perfect equilibrium on behavior strategies.

Definition 7.32 A behavior strategy vector $\sigma$ in an extensive-form game $\Gamma$ is called an extensive-form perfect equilibrium if there exists a sequence of perturbation vectors $(\delta^k)_{k \in \mathbb{N}}$ satisfying $\lim_{k \to \infty} M(\delta^k) = 0$, and for each $k \in \mathbb{N}$ there exists an equilibrium $\sigma^k$ of $\Gamma(\delta^k)$, such that $\lim_{k \to \infty} \sigma^k = \sigma$ is satisfied.

These concepts, strategic-form perfect equilibrium and extensive-form perfect equilibrium, differ from each other: a strategic-form perfect equilibrium is a vector of mixed strategies, while an extensive-form perfect equilibrium is a vector of behavior strategies. Despite the fact that in games with perfect recall there is an equivalence between mixed strategies and behavior strategies (see Chapter 6), an extensive-form perfect equilibrium may fail to be a strategic-form perfect equilibrium. In other words, a vector of mixed strategies, each equivalent to a behavior strategy in an extensive-form perfect equilibrium, may fail to be a strategic-form perfect equilibrium (Exercise 7.36). Conversely, a strategic-form perfect equilibrium may not necessarily be an extensive-form perfect equilibrium (Exercise 7.37). The conceptual difference between these two concepts is similar to the difference between mixed strategies and behavior strategies: in a mixed strategy, a player randomly chooses a pure strategy at the start of a game, while in a behavior strategy he randomly chooses an action at each of his information sets. Underlying the concept of strategic-form perfect equilibrium is the assumption that a player may mistakenly choose, at the start of the game, a pure strategy different from the one he intended to choose. In contrast, underlying the concept of extensive-form perfect equilibrium is the assumption that a player may mistakenly choose an action different from the one he intended at any of his information sets. In extensive-form games where each player has a single information set, these two concepts are identical, because in that case the set of mixed strategies of each player is identical with his set of behavior strategies.
As stated above, Selten defined the concept of perfect equilibrium in order to further “refine” the concept of subgame perfect equilibrium in extensive-form games. We will now show that this is indeed a refinement: every extensive-form perfect equilibrium is a subgame perfect equilibrium in behavior strategies. (This result can also be proved directly.) Since every subgame perfect equilibrium is a Nash equilibrium (Remark 7.3 on page 254), we will then conclude that every extensive-form perfect equilibrium is a Nash equilibrium in behavior strategies. This result can also be proved directly; see Exercise 7.31.

Theorem 7.33 Let $\Gamma$ be an extensive-form game. Every extensive-form perfect equilibrium of $\Gamma$ is a subgame perfect equilibrium.

The analogous theorem for strategic-form perfect equilibrium does not obtain (see Exercise 7.37). Before we proceed to the proof of the theorem, we present a technical result analogous to Theorem 7.23, which states that every behavior strategy may be approximated by a sequence of behavior strategies in perturbed games, where the perturbations converge to zero. The proof of this theorem is left to the reader (Exercise 7.34).

Theorem 7.34 Let $(\delta^k)_{k \in \mathbb{N}}$ be a sequence of perturbation vectors satisfying $\lim_{k \to \infty} M(\delta^k) = 0$. For each behavior strategy $\sigma_i \in B_i$ of player $i$, there exists a sequence $(\sigma_i^k)_{k \in \mathbb{N}}$ of behavior strategies satisfying the following two properties:

• $\sigma_i^k \in B_i(\delta_i^k)$ for each $k \in \mathbb{N}$.
• $\lim_{k \to \infty} \sigma_i^k$ exists and equals $\sigma_i$.

Proof of Theorem 7.33: Let $\sigma^* = (\sigma_i^*)_{i \in N}$ be an extensive-form perfect equilibrium, and let $\Gamma(x)$ be a subgame (starting at vertex $x$). We will show that the restriction of $\sigma^*$ to this subgame is a subgame perfect equilibrium. By the definition of extensive-form perfect equilibrium, for each $k \in \mathbb{N}$ there exists a perturbation vector $\delta^k$, and an equilibrium $\sigma^k$ in the $\delta^k$-perturbed game satisfying $\lim_{k \to \infty} M(\delta^k) = 0$, and $\lim_{k \to \infty} \sigma^k = \sigma^*$. Theorem 7.31 implies that the strategy vector $\sigma^k$ is a Nash equilibrium in behavior strategies of the game $\Gamma(x; \delta^k)$. Let $\sigma_i'$ be a behavior strategy of player $i$. We will show that

$$u_i(\sigma^* \mid x) \ge u_i((\sigma_i', \sigma_{-i}^*) \mid x). \tag{7.24}$$

Theorem 7.34 implies that there exists a sequence $(\sigma_i'^k)_{k \in \mathbb{N}}$ of behavior strategies converging to $\sigma_i'$ and satisfying $\sigma_i'^k \in B_i(\delta_i^k)$ for each $k \in \mathbb{N}$. Since $\sigma^k$ is an equilibrium of the subgame $\Gamma(x; \delta^k)$,

$$u_i(\sigma^k \mid x) \ge u_i((\sigma_i'^k, \sigma_{-i}^k) \mid x). \tag{7.25}$$

Equation (7.24) is now derived from Equation (7.25) by using the continuity of the payoff function $u_i$ and passing to the limit as $k \to \infty$.

The next example shows that the converse of Theorem 7.33 does not obtain; a subgame perfect equilibrium need not be an extensive-form perfect equilibrium.
Example 7.35 Consider the two-player extensive-form game depicted in Figure 7.15. This game has two pure-strategy equilibria, (A, L) and (B, R). Each of these equilibria is a subgame perfect equilibrium, since the game has no nontrivial subgames (see Theorem 7.4).

[Figure 7.15: game tree, shown twice, once for each equilibrium. Player I chooses C, B, or A. Choosing A ends the game with payoff (1, 2). After C or B, Player II, who cannot tell which of the two was chosen, chooses R or L. The payoffs are (1, 3) after (C, R), (1, 1) after (C, L), (2, 1) after (B, R), and (0, 0) after (B, L). Panels: Equilibrium (A, L); Equilibrium (B, R).]
Figure 7.15 The game in Example 7.35, along with two of its equilibria
The equilibrium (A, L) is not an extensive-form perfect equilibrium. Indeed, since each player has a single information set, if (A, L) were an extensive-form perfect equilibrium it would also be a strategic-form perfect equilibrium (Exercise 7.39). But the strategy L is a weakly dominated strategy, and therefore Theorem 7.28 implies that it cannot form part of a strategic-form perfect equilibrium. Showing that (B, R) is an extensive-form perfect equilibrium is left to the reader (Exercise 7.47).
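The dominance argument of Example 7.35 can be checked on the strategic form of the game. The following is a minimal sketch, assuming the payoffs of Figure 7.15; the names weakly_dominates_for_II and pure_nash are illustrative. It confirms that R weakly dominates L for Player II and then discards the pure equilibria in which Player II uses a weakly dominated strategy.

```python
# Strategic form of the game in Figure 7.15: (Player I, Player II) -> payoffs.
PAYOFFS = {
    ("A", "L"): (1, 2), ("A", "R"): (1, 2),
    ("B", "L"): (0, 0), ("B", "R"): (2, 1),
    ("C", "L"): (1, 1), ("C", "R"): (1, 3),
}
ROWS, COLS = ("A", "B", "C"), ("L", "R")


def weakly_dominates_for_II(c1, c2):
    """True if c1 is never worse than c2 for Player II and sometimes strictly better."""
    diffs = [PAYOFFS[(r, c1)][1] - PAYOFFS[(r, c2)][1] for r in ROWS]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)


def pure_nash():
    """All pure-strategy profiles from which no player gains by deviating alone."""
    result = []
    for r in ROWS:
        for c in COLS:
            u1, u2 = PAYOFFS[(r, c)]
            if (all(PAYOFFS[(r2, c)][0] <= u1 for r2 in ROWS)
                    and all(PAYOFFS[(r, c2)][1] <= u2 for c2 in COLS)):
                result.append((r, c))
    return result


equilibria = pure_nash()
dominated_for_II = [c for c in COLS
                    if any(weakly_dominates_for_II(c2, c) for c2 in COLS if c2 != c)]
survivors = [e for e in equilibria if e[1] not in dominated_for_II]
print(equilibria, dominated_for_II, survivors)
```

Passing this filter is only a necessary condition (Theorem 7.28): the sketch rules out (A, L) but does not prove that (B, R) is an extensive-form perfect equilibrium, which is the content of Exercise 7.47.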
◭

Together with Theorem 7.33, the last example proves that the concept of extensive-form perfect equilibrium is a refinement of the concept of subgame perfect equilibrium. Note that in this example, a subgame perfect equilibrium that is not an extensive-form perfect equilibrium is given in pure strategies, and therefore the inclusion of the set of extensive-form perfect equilibria in the set of subgame perfect equilibria is a proper inclusion, even when only pure strategy equilibria are involved. Theorem 7.33 states that every extensive-form perfect equilibrium is a subgame perfect equilibrium, and therefore also a Nash equilibrium. It follows that if a game has no Nash equilibria in behavior strategies, then it has no extensive-form perfect equilibria. By Theorem 6.16 (page 235) this can happen only if the game does not have perfect recall. Example 6.17 (page 236) describes such a game. As we now show, a finite extensive-form game with perfect recall always has an extensive-form perfect equilibrium.

Theorem 7.36 Every finite extensive-form game with perfect recall has an extensive-form perfect equilibrium.

Proof: Let $\Gamma$ be a finite extensive-form game with perfect recall, and let $(\delta^k)_{k \in \mathbb{N}}$ be a sequence of perturbation vectors satisfying $\lim_{k \to \infty} M(\delta^k) = 0$. Since all the players have perfect recall, Theorem 6.16 (page 235) shows that $\Gamma(\delta^k)$ has an equilibrium $\sigma^k$ in behavior strategies. Since the space of behavior strategy vectors $\times_{i \in N} B_i$ is compact, the sequence $(\sigma^k)_{k \in \mathbb{N}}$ has a convergent subsequence $(\sigma^{k_j})_{j \in \mathbb{N}}$, converging to a limit $\sigma^*$. Then $\sigma^*$ is an extensive-form perfect equilibrium.

Theorems 7.36 and 7.33 lead to the following result.

Theorem 7.37 Every finite extensive-form game with perfect recall has a subgame perfect equilibrium in behavior strategies.
7.4 Sequential equilibrium
This section presents another equilibrium concept for extensive-form games, which differs from the three concepts we have studied so far in this chapter, subgame perfect equilibrium, strategic-form perfect equilibrium, and extensive-form perfect equilibrium. The subgame perfect equilibrium concept assumes that players analyze each game from the leaves to the root, with every player, at each of his information sets, choosing an action under the assumption that in each future subgame, all the players will implement equilibrium
strategies. The two perfect equilibrium concepts assume that each player has a positive, albeit small, probability of making a mistake, and that the other players take this into account when they choose their actions. The sequential equilibrium concept is based on the principle that at each stage of a game, the player whose turn it is to choose an action has a belief, i.e., a probability distribution, about which vertex in his information set is the true vertex at which the game is currently located, and a belief about how the play of the game will proceed given any action that he may choose. These beliefs are based on the information structure of the game (the information sets) and the strategies of the players. Given these beliefs, at each of his information sets, each player chooses the action that gives him the highest expected payoff. In this section we will deal only with games with perfect recall. We will later, in Example 7.60 (on page 283), remark on why it is unclear how the concept of sequential equilibrium can be generalized to games without perfect recall. Recall that a player’s behavior strategy in an extensive-form game is a function associating each of that player’s information sets with a probability distribution over the set of possible actions at that information set. Such a probability distribution is called a mixed action. Before we begin the formal presentation, we will look at an example that illustrates the concept of sequential equilibrium and the ideas behind it.
Example 7.38 Consider the two-player extensive-form game depicted in Figure 7.16.
Figure 7.16 The game in Example 7.38 and a strategy vector
The following pair of behavior strategies σ = (σI , σII ) is a Nash equilibrium of the game (see Figure 7.16, where the actions taken in this equilibrium appear as bold lines):
• Player I's strategy $\sigma_I$:
  • At information set $U_I^1$, choose the mixed action $[\tfrac{3}{12}(T), \tfrac{4}{12}(M), \tfrac{5}{12}(B)]$.
  • At information set $U_I^2$, choose $B_1$.
  • At information set $U_I^3$, choose $T_2$.
• Player II's strategy $\sigma_{II}$: Choose $b$.

(Check that this is indeed an equilibrium.) The strategy vector $\sigma$ determines, for each vertex $x$, the probability $P_\sigma(x)$ that a play of the game will visit this vertex:

$$P_\sigma(x_2) = \tfrac{4}{12}, \quad P_\sigma(x_3) = \tfrac{5}{12}, \quad P_\sigma(x_4) = 0, \quad P_\sigma(x_5) = \tfrac{4}{12}, \quad P_\sigma(x_6) = 0, \quad P_\sigma(x_7) = 0.$$
When Player II is called upon to play, he knows that the play of the game is located at information set $U_{II} = \{x_2, x_3\}$. Knowing Player I's strategy, he can calculate the conditional probability that each of the vertices in this information set is the actual position of the play:

$$P_\sigma(x_2 \mid U_{II}) = \frac{P_\sigma(x_2)}{P_\sigma(x_2) + P_\sigma(x_3)} = \frac{\frac{4}{12}}{\frac{4}{12} + \frac{5}{12}} = \frac{4}{9}, \tag{7.26}$$

and similarly, $P_\sigma(x_3 \mid U_{II}) = \frac{5}{9}$. The conditional probability $P_\sigma(\cdot \mid U_{II})$ is Player II's belief at information set $U_{II}$.

Similarly, when Player I is called to play at information set $U_I^2 = \{x_4, x_5\}$, he cannot distinguish between $x_4$ and $x_5$, but knowing Player II's strategy $\sigma_{II}$, Player I can ascribe probability 1 to the play of the game being at vertex $x_5$. Formally, this is the following conditional distribution:

$$P_\sigma(x_5 \mid U_I^2) = \frac{P_\sigma(x_5)}{P_\sigma(x_4) + P_\sigma(x_5)} = \frac{\frac{4}{12}}{0 + \frac{4}{12}} = 1, \tag{7.27}$$

with a similar calculation yielding $P_\sigma(x_4 \mid U_I^2) = 0$. The conditional distribution $P_\sigma(\cdot \mid U_I^2)$ is Player I's belief at information set $U_I^2$.

Does Player I also have a belief at information set $U_I^3 = \{x_6, x_7\}$? The answer to this question is negative, because if the players implement the strategy vector $\sigma$, the play of the game will not visit this information set. Formally, the conditional distribution $P_\sigma(\cdot \mid U_I^3)$ is undefined, because the probability $P_\sigma(U_I^3) = P_\sigma(x_6) + P_\sigma(x_7)$, which represents the probability that the play of the game will arrive at information set $U_I^3$, equals 0. ◭
For each strategy vector σ, and for each player i and each of his information sets Ui, denote $P_\sigma(U_i) := \sum_{x \in U_i} P_\sigma(x)$. Since the game has perfect recall, every path from the root passes through every information set at most once, and hence Pσ(Ui) is the probability that a play of the game will visit information set Ui when the players implement the strategy vector σ. As we saw, the strategy vector σ determines a belief Pσ(· | Ui) over the vertices of information set Ui, for each information set satisfying Pσ(Ui) > 0: when player i is called upon to play in information set Ui, his belief about the vertex at which the play of the game is located is given by the conditional distribution Pσ(· | Ui). Beliefs calculated this way from the strategy vector σ satisfy the property of consistency. In other words, the beliefs are consistent with the distribution Pσ over the vertices of the game tree, and with Bayes' formula for calculating conditional probability. We say that the strategy vector σ determines a partial belief system. The word partial denotes the
fact that the beliefs are defined only for some of the information sets; they are not defined for information sets that the strategy vector σ leads to with probability 0.
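The computation of the visit probabilities and of the induced partial belief system in Example 7.38 can be sketched in a few lines of Python. This is only an illustrative sketch: the dictionary `PARENT` encodes the part of the game tree between decision vertices as reconstructed from the probabilities and payoffs quoted in the example (an assumption, since the figure itself is not reproduced here), and the names `PARENT`, `INFO_SET`, `SIGMA`, `reach_prob`, and `induced_beliefs` are hypothetical.

```python
from fractions import Fraction as F

# child vertex -> (parent vertex, action taken at the parent).
# Reconstructed from the numbers quoted in Example 7.38 (an assumption).
PARENT = {
    "x2": ("x1", "M"), "x3": ("x1", "B"),
    "x4": ("x2", "m"), "x5": ("x2", "b"),
    "x6": ("x3", "t"), "x7": ("x3", "m"),
}

# Information sets are named by their owner and their vertices.
INFO_SET = {
    "x1": "I:{x1}",
    "x2": "II:{x2,x3}", "x3": "II:{x2,x3}",
    "x4": "I:{x4,x5}", "x5": "I:{x4,x5}",
    "x6": "I:{x6,x7}", "x7": "I:{x6,x7}",
}

# The behavior strategy vector sigma of the example.
SIGMA = {
    "I:{x1}": {"T": F(3, 12), "M": F(4, 12), "B": F(5, 12)},
    "II:{x2,x3}": {"t": F(0), "m": F(0), "b": F(1)},
    "I:{x4,x5}": {"T1": F(0), "B1": F(1)},
    "I:{x6,x7}": {"T2": F(1), "B2": F(0)},
}


def reach_prob(x):
    """P_sigma(x): probability that a play of the game visits vertex x."""
    if x not in PARENT:  # the root is always visited
        return F(1)
    parent, action = PARENT[x]
    return reach_prob(parent) * SIGMA[INFO_SET[parent]][action]


def induced_beliefs():
    """Partial belief system: defined only where P_sigma(U) > 0."""
    members = {}
    for x, u in INFO_SET.items():
        members.setdefault(u, []).append(x)
    beliefs = {}
    for u, vertices in members.items():
        total = sum(reach_prob(x) for x in vertices)
        if total > 0:  # Bayes' formula needs a positive denominator
            beliefs[u] = {x: reach_prob(x) / total for x in vertices}
    return beliefs
```

Under these assumptions, `induced_beliefs()` contains no entry for the information set {x6, x7}, which is exactly what makes the belief system partial.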
Example 7.38 (Continued) We will now explore the connection between an action chosen by a player at a given information set, and his belief at that information set. Player I, at information set $U_I^2 = \{x_4, x_5\}$, ascribes probability 1 to the play of the game being located at vertex $x_5$. He therefore regards action $B_1$ as being the optimal action for him: $B_1$ leads to a payoff 2, while $T_1$ leads to a payoff 0. Given his belief at $U_I^2$, Player I is rational in choosing $B_1$. If, in contrast, according to his belief at $U_I^2$ he had ascribed high probability to the play of the game being located at vertex $x_4$ (a probability greater than or equal to $\frac{2}{7}$), it would have been rational for him to choose $T_1$. This property, in which a player's strategy calls on him to choose an action maximizing his expected payoff at each information set, given his belief at that information set, is termed sequential rationality.

We will now check whether sequential rationality obtains at Player II's information set $U_{II}$ in the equilibrium we previously presented in this example. As we computed above, Player II's belief about the vertex at which the play of the game is located, given that it has arrived at information set $U_{II}$, is

$$P_\sigma(x_2 \mid U_{II}) = \tfrac{4}{9}, \qquad P_\sigma(x_3 \mid U_{II}) = \tfrac{5}{9}. \tag{7.28}$$
Given this belief, and the strategy vector $\sigma$, if Player II chooses action $b$, he receives a payoff of 2 with probability $\frac{4}{9}$ (if the play of the game is at vertex $x_2$) and a payoff of 1 with probability $\frac{5}{9}$ (if the play of the game is at vertex $x_3$). His expected payoff is therefore

$$\tfrac{4}{9} \times 2 + \tfrac{5}{9} \times 1 = \tfrac{13}{9}. \tag{7.29}$$

A similar calculation shows that if he chooses action $m$, his expected payoff is

$$\tfrac{4}{9} \times (-1) + \tfrac{5}{9} \times 0 = -\tfrac{4}{9}, \tag{7.30}$$

and if he chooses action $t$ his expected payoff is

$$\tfrac{4}{9} \times 2 + \tfrac{5}{9} \times 0 = \tfrac{8}{9}. \tag{7.31}$$
The strategy $\sigma_{II}$ calls on Player II to choose action $b$ at information set $U_{II}$, which does indeed maximize his expected payoff, relative to his belief. In other words, Player II's strategy is sequentially rational. We next ascertain that Player I's strategy is sequentially rational at information set $U_I^1$, containing a single vertex, $x_1$. When the play of the game arrives at information set $U_I^1$, Player I knows that $x_1$ must be the vertex at which the play of the game is located, because the information set contains only one vertex. The mixed action $[\tfrac{3}{12}(T), \tfrac{4}{12}(M), \tfrac{5}{12}(B)]$ maximizes Player I's expected payoff if and only if all three actions yield him the same expected payoff. This is due to the fact that the payoff is a linear function of the probabilities with which the various actions are implemented by Player I at information set $U_I^1$. We encountered a similar argument in the indifference principle (Theorem 5.18, page 160). The reader is asked to verify that each of these three actions yields the payoff 2, and therefore any mixed action implemented by Player I at information set $U_I^1$ satisfies sequential rationality, in particular the mixed action $[\tfrac{3}{12}(T), \tfrac{4}{12}(M), \tfrac{5}{12}(B)]$ implemented in $\sigma_I$. ◭
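The rationality checks of this example amount to comparing expected payoffs under a given belief. The following Python sketch reproduces Equations (7.29)–(7.31) and the threshold at Player I's information set {x4, x5}. The function names are hypothetical, Player II's payoffs are those quoted in the equations above, and Player I's payoffs at x4 (5 after T1, 0 after B1) are an assumption read from the leaves of Figure 7.16.

```python
from fractions import Fraction as F


def expected_payoff(belief, payoff_at_vertex):
    """Expected payoff of one action, given a belief over the vertices."""
    return sum(belief[x] * payoff_at_vertex[x] for x in belief)


def best_actions(belief, payoffs):
    """Return the value of every action and the set of maximizing actions."""
    values = {a: expected_payoff(belief, payoffs[a]) for a in payoffs}
    top = max(values.values())
    return values, {a for a, v in values.items() if v == top}


# Player II at U_II = {x2, x3}; continuation play follows sigma_I (B1 and T2).
belief_II = {"x2": F(4, 9), "x3": F(5, 9)}
payoffs_II = {
    "b": {"x2": 2, "x3": 1},
    "m": {"x2": -1, "x3": 0},
    "t": {"x2": 2, "x3": 0},
}
values_II, best_II = best_actions(belief_II, payoffs_II)


def best_at_x4_x5(q):
    """Rational actions of Player I at {x4, x5} when he assigns probability q to x4."""
    belief = {"x4": q, "x5": 1 - q}
    payoffs = {"T1": {"x4": 5, "x5": 0}, "B1": {"x4": 0, "x5": 2}}
    return best_actions(belief, payoffs)[1]
```

With these inputs `values_II` holds the three expected payoffs of Player II, and `best_at_x4_x5` switches from B1 to T1 as q crosses 2/7, with both actions rational at the threshold itself.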
In Example 7.38, we saw that the strategy vector σ induces a partial belief system over the players’ information sets, and that each player i’s strategy is sequentially rational at
each information set Ui for which the belief Pσ(· | Ui) is defined, i.e., at each information set at which the play of the game arrives with positive probability under the strategy vector σ.

The main idea behind the concept of sequential equilibrium is that the property of sequential rationality should be satisfied at every information set, including those information sets that are visited with probability 0 under the strategy vector σ. This requirement is similar to the requirement that the subgame perfect equilibrium be an equilibrium both on the equilibrium path, and off the equilibrium path. A sequential equilibrium therefore requires specifying players' beliefs at all information sets.

A sequential equilibrium is thus a pair (σ, μ), where $\sigma = (\sigma_i)_{i \in N}$ is a vector of behavior strategies, and μ is a complete belief system; i.e., with every player and every information set of that player, μ associates a belief: a distribution over the vertices of that information set. The pair (σ, μ) must satisfy two properties: the beliefs μ must be consistent with Bayes' formula and with the strategy vector σ, and σ must be sequentially rational given the beliefs μ.

The main stage in the development of the concept of sequential equilibrium is defining the concept of consistency of beliefs μ with respect to a given strategy vector σ. Doing this requires extending the partial belief system Pσ to every information set Ui for which Pσ(Ui) = 0. This extension is based on Selten's trembling hand principle, which was discussed in the section defining perfect equilibrium. Denote by $\mathcal{U}$ the set of the information sets of all the players.

Definition 7.39 A complete belief system μ is a vector $\mu = (\mu_U)_{U \in \mathcal{U}}$ associating each information set $U \in \mathcal{U}$ with a distribution over the vertices in U.

Definition 7.40 Let $\mathcal{U}' \subseteq \mathcal{U}$ be a partial collection of information sets. A partial belief system μ (with respect to $\mathcal{U}'$) is a vector $\mu = (\mu_U)_{U \in \mathcal{U}'}$ associating each information set $U \in \mathcal{U}'$ with a distribution over the vertices in U.

If μ is a partial belief system, denote by $\mathcal{U}_\mu$ the collection of information sets at which μ is defined. Although the definition of a belief system is independent of the strategy vector implemented by the players, we are interested in belief systems that are closely related to the strategy vector σ. The partial belief system that is induced by σ plays a central role in the definition of sequential equilibrium.

Definition 7.41 Let σ be a strategy vector. Let $\mathcal{U}_\sigma = \{U \in \mathcal{U} : P_\sigma(U) > 0\}$ be the collection of all information sets that the play of the game visits with positive probability when the players implement the strategy vector σ. The partial belief system induced by the strategy vector σ is the collection of distributions $\mu_\sigma = (\mu_{\sigma,U})_{U \in \mathcal{U}_\sigma}$, satisfying, for each $U \in \mathcal{U}_\sigma$,

$$\mu_{\sigma,U}(x) := P_\sigma(x \mid U) = \frac{P_\sigma(x)}{P_\sigma(U)}, \quad \forall x \in U. \tag{7.32}$$
Note that Uσ = Uμσ. To avoid using both notations, we will henceforth use only the notation Uμσ.
Remark 7.42 Since we have assumed that at each chance vertex, every action is chosen with positive probability, it follows that if all the strategies in the strategy vector σ are completely mixed, i.e., at each information set every action is chosen with positive probability, then Pσ(Ui) > 0 for each player i and each of his information sets Ui; hence in this case the belief system μσ is a complete belief system: it defines a belief at each information set in the game (Uμσ = U).

Recall that $u_i(\sigma \mid x)$ is the expected payoff of player i when the players implement the strategy vector σ, given that the play of the game is at vertex x. It follows that player i's expected payoff when the players implement the strategy vector σ, given that the game arrives at information set Ui and given his belief μ, is

$$u_i(\sigma \mid U_i, \mu) := \sum_{x \in U_i} \mu_{U_i}(x)\, u_i(\sigma \mid x). \tag{7.33}$$
Definition 7.43 Let σ be a vector of behavior strategies, μ be a partial belief system, and Ui ∈ Uμ be an information set of player i. The strategy vector σ is called rational at information set Ui, relative to μ, if for each behavior strategy $\sigma_i'$ of player i

$$u_i(\sigma \mid U_i, \mu) \geq u_i((\sigma_i', \sigma_{-i}) \mid U_i, \mu). \tag{7.34}$$
The pair (σ, μ) is called sequentially rational if for each player i and each information set Ui ∈ Uμ, the strategy vector σ is rational at Ui relative to μ.

As the following theorems show, there exists a close connection between the concept of sequential rationality and those of Nash equilibrium and subgame perfect equilibrium.

Theorem 7.44 In an extensive-form game with perfect recall, if the pair (σ, μσ) is sequentially rational, then the strategy vector σ is a Nash equilibrium in behavior strategies.

Proof: Let $i \in N$ be a player, and let $\sigma_i'$ be any behavior strategy of player i. We will prove that $u_i(\sigma) \geq u_i(\sigma_i', \sigma_{-i})$. We say that an information set $U_i$ of player i is highest if every path from the root to a vertex in $U_i$ does not pass through any other information set of player i. Denote by $\widehat{\mathcal{U}}_i$ the set of player i's highest information sets: any path from the root to a leaf that passes through an information set of player i necessarily passes through an information set in $\widehat{\mathcal{U}}_i$. Denote by $p^*_{\sigma,i}$ the probability that, when the strategy vector σ is played, a play of the game will not pass through any of player i's information sets, and denote by $u^*_{\sigma,i}$ player i's expected payoff, given that the play of the game according to the strategy vector σ does not pass through any of player i's information sets.

Note that $p^*_{\sigma,i}$ and $u^*_{\sigma,i}$ are independent of player i's strategy, since these values depend on plays of the game in which player i does not participate. Similarly, for every information set $U_i \in \widehat{\mathcal{U}}_i$, the probability $P_\sigma(U_i)$ is independent of player i's strategy, since these probabilities depend on actions chosen at vertices that are not under player i's control. Using this notation, we have that

$$P_\sigma(U_i) = P_{(\sigma_i', \sigma_{-i})}(U_i), \quad \forall U_i \in \widehat{\mathcal{U}}_i, \tag{7.35}$$

$$u_i(\sigma) = \sum_{U_i \in \widehat{\mathcal{U}}_i} P_\sigma(U_i)\, u_i(\sigma \mid U_i, \mu_\sigma) + p^*_{\sigma,i}\, u^*_{\sigma,i}, \tag{7.36}$$

$$u_i(\sigma_i', \sigma_{-i}) = \sum_{U_i \in \widehat{\mathcal{U}}_i} P_{(\sigma_i', \sigma_{-i})}(U_i)\, u_i((\sigma_i', \sigma_{-i}) \mid U_i, \mu_\sigma) + p^*_{\sigma,i}\, u^*_{\sigma,i} \tag{7.37}$$

$$= \sum_{U_i \in \widehat{\mathcal{U}}_i} P_\sigma(U_i)\, u_i((\sigma_i', \sigma_{-i}) \mid U_i, \mu_\sigma) + p^*_{\sigma,i}\, u^*_{\sigma,i}, \tag{7.38}$$

where Equation (7.38) follows from Equation (7.35). Since for every Ui ∈ Uμσ, the pair (σ, μσ) is sequentially rational at Ui,

$$u_i(\sigma \mid U_i, \mu_\sigma) \geq u_i((\sigma_i', \sigma_{-i}) \mid U_i, \mu_\sigma). \tag{7.39}$$

Equations (7.36)–(7.39) imply that

$$u_i(\sigma) \geq u_i(\sigma_i', \sigma_{-i}), \tag{7.40}$$

which is what we wanted to prove.
The following theorem, whose proof is left to the reader (Exercise 7.41), is the converse of Theorem 7.44. Theorem 7.45 If σ ∗ is a Nash equilibrium in behavior strategies, then the pair (σ ∗ , μσ ∗ ) is sequentially rational at every information set in Uμσ ∗ . In a game with perfect information, every information set contains only one vertex, and therefore when called on to make a move, a player knows at which vertex the play of the game is located. In this case, we denote by μ the complete belief system in which μU = [1(x)], for every information set U = {x}. The next theorem, whose proof is left to the reader (Exercise 7.42), characterizes subgame perfect equilibria using sequential rationality. Theorem 7.46 In a game with perfect information, a behavior strategy vector σ is a subgame perfect equilibrium if and only if the pair (σ, μ) is sequentially rational at each vertex in the game.
As previously stated, the main idea behind the sequential equilibrium refinement is to expand the definition of rationality to information sets Ui at which Pσ (Ui ) = 0. This is accomplished by the trembling hand principle: player i may find himself in an information set Ui for which Pσ (Ui ) = 0, due to a mistake (tremble) on the part of one of the players, and we require that even if this should happen, the player ought to behave rationally relative
to beliefs that are “consistent” with such mistakes. In other words, we extend the partial belief system μσ to a complete belief system μ that is consistent with the trembling hand principle, and we require that σ be sequentially rational not only with respect to μσ, but also with respect to μ.

Remark 7.47 A belief at an information set Ui is a probability distribution over the vertices in Ui, i.e., an element of the compact set $\Delta(U_i)$. A complete belief system is a vector of beliefs, one belief per information set, and therefore a vector in the compact set $\times_{U \in \mathcal{U}} \Delta(U)$. Since this set is compact, every sequence of complete belief systems has a convergent subsequence.

Definition 7.48 An assessment is a pair (σ, μ) in which $\sigma = (\sigma_i)_{i \in N}$ is a vector of behavior strategies, and $\mu = (\mu_U)_{U \in \mathcal{U}}$ is a complete belief system.

Definition 7.49 An assessment (σ, μ) is called consistent if there exists a sequence of completely mixed behavior strategy vectors $(\sigma^k)_{k \in \mathbb{N}}$ satisfying the following conditions:
(i) The strategies $(\sigma^k)_{k \in \mathbb{N}}$ converge to σ, i.e., $\lim_{k \to \infty} \sigma^k = \sigma$.
(ii) The sequence of beliefs $(\mu_{\sigma^k})_{k \in \mathbb{N}}$ induced by $(\sigma^k)_{k \in \mathbb{N}}$ converges to the belief system μ,

$$\mu(U) = \lim_{k \to \infty} \mu_{\sigma^k}(U), \quad \forall U \in \mathcal{U}. \tag{7.41}$$
Remark 7.50 If σ is a completely mixed behavior strategy vector, then μσ is a complete belief system (Remark 7.42). In this case, (σ, μσ ) is a consistent system. This follows directly from Definition 7.49, using the sequence (σ k )k∈N defined by σ k = σ for all k ∈ N. Remark 7.51 Since the strategies σ k in Definition 7.49 are completely mixed strategies, for every k ∈ N the belief system μσ k is a complete belief system (Remark 7.42), and hence the limit μ is also a complete belief system (Remark 7.47). Definition 7.52 An assessment (σ, μ) is called a sequential equilibrium if it is consistent and sequentially rational. Remark 7.53 By definition, if an assessment (σ, μ) is sequentially rational then it is rational at each information set at which the belief μ is defined. Since the belief system of an assessment is a complete belief system, it follows that a sequentially rational assessment (σ, μ) is rational at each information set. The following result, which is a corollary of Theorem 7.44, shows that the concept of sequential equilibrium is a refinement of the concept of Nash equilibrium. Theorem 7.54 In an extensive-form game with perfect recall, if the assessment (σ, μ) is a sequential equilibrium, then the strategy vector σ is a Nash equilibrium in behavior strategies. All of the above leads to: Theorem 7.55 In an extensive-form game with perfect recall, if σ is a Nash equilibrium in completely mixed behavior strategies, then (σ, μσ ) is a sequential equilibrium.
Proof: Remark 7.50 implies that the pair (σ, μσ ) is a consistent assessment. Theorem 7.45 implies that this assessment is sequentially rational.
Example 7.56 Consider the two-player extensive-form game depicted in Figure 7.17.
Figure 7.17 The game in Example 7.56, and a strictly dominant strategy of Player I
The strategy A strictly dominates Player I's two other strategies, and hence at every Nash equilibrium, Player I chooses A. This means that at any Nash equilibrium Player II's strategy has no effect at all on the play of the game, and hence in this game there is a continuum of equilibria in mixed strategies, $(A, [y(t), (1 - y)(b)])$, for $0 \leq y \leq 1$. We now compute all the sequential equilibria of the game. Since every sequential equilibrium is a Nash equilibrium (Theorem 7.54), every sequential equilibrium (σ, μ) satisfies $\sigma_I = A$. Player II's belief at her sole information set is therefore $\mu_{U_{II}} = (\mu_{U_{II}}(x_2), \mu_{U_{II}}(x_3))$. Is every belief $\mu_{U_{II}}$ part of a consistent complete belief system? The answer is positive: the only condition that the assessment (σ, μ) needs to satisfy in order to be a consistent assessment is $\sigma_I = A$. This follows directly from Definition 7.49, using the sequence

$$\sigma_I^k = \left[ \tfrac{k-1}{k}(A),\ \tfrac{\mu_{U_{II}}(x_2)}{k}(B),\ \tfrac{\mu_{U_{II}}(x_3)}{k}(C) \right]$$

for each $k \in \mathbb{N}$. We next check which actions of Player II are rational at information set $U_{II}$, given her belief there. If the play of the game is at information set $U_{II}$, action $t$ yields Player II the expected payoff $2\mu_{U_{II}}(x_2)$, and action $b$ yields the expected payoff $\mu_{U_{II}}(x_3)$. Since $\mu_{U_{II}}(x_2) + \mu_{U_{II}}(x_3) = 1$, we deduce the following:
• If $\mu_{U_{II}}(x_2) > \frac{1}{3}$, then the only action that is rational for Player II at information set $U_{II}$ is $t$.
• If $\mu_{U_{II}}(x_2) < \frac{1}{3}$, then the only action that is rational for Player II at information set $U_{II}$ is $b$.
• If $\mu_{U_{II}}(x_2) = \frac{1}{3}$, then every mixed action of Player II is rational at information set $U_{II}$.

In other words, the set of sequential equilibria consists of the following assessments:

• $\sigma_I = A$, $\sigma_{II} = t$, $\mu_{U_{II}} = [y(x_2), (1 - y)(x_3)]$ for $y > \frac{1}{3}$.
• $\sigma_I = A$, $\sigma_{II} = b$, $\mu_{U_{II}} = [y(x_2), (1 - y)(x_3)]$ for $y < \frac{1}{3}$.
• $\sigma_I = A$, $\sigma_{II} = [z(t), (1 - z)(b)]$ for $z \in [0, 1]$, $\mu_{U_{II}} = [\frac{1}{3}(x_2), \frac{2}{3}(x_3)]$.
◭
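The case analysis in Example 7.56 can be checked numerically. The sketch below is only an illustration: it assumes, as read from the leaves of Figure 7.17, that t pays Player II 2 at x2 and 0 at x3, while b pays her 0 at x2 and 1 at x3; the function name `rational_actions_II` is hypothetical.

```python
from fractions import Fraction as F


def rational_actions_II(y):
    """Rational actions at U_II under the belief [y(x2), (1 - y)(x3)]."""
    value = {
        "t": 2 * y,   # 2 at x2, 0 at x3 (assumed from the figure)
        "b": 1 - y,   # 0 at x2, 1 at x3 (assumed from the figure)
    }
    top = max(value.values())
    return {action for action, v in value.items() if v == top}
```

For example, `rational_actions_II(F(1, 2))` returns `{"t"}`, while at y = 1/3 both actions are returned, which corresponds to the third family of sequential equilibria listed above.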
Example 7.38 (Continued) We have seen that the following pair $(\sigma, \mu_\sigma)$ satisfies the properties of partial consistency, and sequential rationality, at every information set $U$ for which $P_\sigma(U) > 0$:

• Player I:
  • plays the mixed action $[\tfrac{3}{12}(T), \tfrac{4}{12}(M), \tfrac{5}{12}(B)]$ at $U_I^1 = \{x_1\}$.
  • chooses $B_1$ at information set $U_I^2 = \{x_4, x_5\}$.
  • chooses $T_2$ at information set $U_I^3 = \{x_6, x_7\}$.
• Player II chooses $b$.
• Player II's belief at information set $U_{II}$ is $[\tfrac{4}{9}(x_2), \tfrac{5}{9}(x_3)]$.
• Player I's belief at information set $U_I^2$ is $[1(x_5)]$.

We now show that $(\sigma, \mu_\sigma)$ can be extended to a sequential equilibrium. To do so, we need to specify what Player I's belief is at information set $U_I^3$. Denote this belief by $\mu_{U_I^3} = (\mu_{U_I^3}(x_6), \mu_{U_I^3}(x_7))$. Note that for each $\mu_{U_I^3}$, the assessment $(\sigma, \mu_\sigma, \mu_{U_I^3})$ is consistent. This is achieved by defining

$$\sigma_I^k = \sigma_I, \qquad \sigma_{II}^k = \left[ \tfrac{\mu_{U_I^3}(x_6)}{k}(t),\ \tfrac{\mu_{U_I^3}(x_7)}{k}(m),\ \tfrac{k-1}{k}(b) \right], \tag{7.42}$$

and using Definition 7.49. Finally, the action $T_2$ yields Player I the expected payoff $3\mu_{U_I^3}(x_7)$, and action $B_2$ yields him the expected payoff $2\mu_{U_I^3}(x_6)$. It follows that action $T_2$ is rational if $\mu_{U_I^3}(x_6) \leq \frac{3}{5}$. We deduce that the assessment $(\sigma, \mu_\sigma)$ can be expanded to a sequential equilibrium $(\sigma, \mu)$ if we add:

• the belief of Player I at information set $U_I^3$ is $[p(x_6), (1 - p)(x_7)]$, where $p \in [0, \frac{3}{5}]$.
◭
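The role of the perturbed strategies in Equation (7.42) can be illustrated with a short Python sketch. It assumes (as reconstructed from the example, not stated explicitly in the text) that x6 is reached when Player I plays B and Player II then plays t, and that x7 is reached when Player I plays B and Player II then plays m; it also assumes 0 < p < 1, so that both t and m receive positive probability. The function names are hypothetical.

```python
from fractions import Fraction as F


def belief_at_x6_x7(p, k):
    """Belief at {x6, x7} induced by the k-th perturbed strategy of Player II.

    p is the target probability of x6; Player II plays t with probability p/k
    and m with probability (1 - p)/k (assumed mapping of actions to vertices).
    """
    prob_B = F(5, 12)                  # Player I's probability of B at the root
    reach_x6 = prob_B * p / k          # B followed by t
    reach_x7 = prob_B * (1 - p) / k    # B followed by m
    total = reach_x6 + reach_x7
    return {"x6": reach_x6 / total, "x7": reach_x7 / total}


def T2_is_rational(p):
    """T2 pays 3 at x7 and 0 at x6; B2 pays 2 at x6 and 0 at x7."""
    return 3 * (1 - p) >= 2 * p
```

In this sketch the induced belief equals [p(x6), (1 − p)(x7)] for every k, so the limit in Definition 7.49 is immediate, and `T2_is_rational` holds exactly when p ≤ 3/5.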
Sequential equilibrium and extensive-form perfect equilibrium are similar but not identical concepts. The following theorem states that every extensive-form perfect equilibrium can be completed to a sequential equilibrium. Example 7.59, which is presented after the proof of the theorem, shows that the converse does not obtain, and therefore the concept of extensive-form perfect equilibrium is a refinement of the concept of sequential equilibrium.

Theorem 7.57 Let σ be an extensive-form perfect equilibrium in an extensive-form game with perfect recall $\Gamma$. Then σ can be completed to a sequential equilibrium: there exists a complete belief system $\mu = (\mu_U)_{U \in \mathcal{U}}$ satisfying the condition that the pair (σ, μ) is a sequential equilibrium.

Since by Theorem 7.30 (page 268) every finite extensive-form game with perfect recall has an extensive-form perfect equilibrium, we immediately deduce the following corollary of Theorem 7.57.

Corollary 7.58 Every finite extensive-form game with perfect recall has a sequential equilibrium.

Proof of Theorem 7.57: Since σ is an extensive-form perfect equilibrium, there exists a sequence $(\delta^k)_{k \in \mathbb{N}}$ of perturbation vectors satisfying $\lim_{k \to \infty} M(\delta^k) = 0$, and for each $k \in \mathbb{N}$, there exists an equilibrium $\sigma^k$ of the $\delta^k$-perturbed game $\Gamma(\delta^k)$, satisfying $\lim_{k \to \infty} \sigma^k = \sigma$. Theorem 7.55 implies that for each $k \in \mathbb{N}$, the assessment $(\sigma^k, \mu_{\sigma^k})$ is a sequential equilibrium in the game $\Gamma(\delta^k)$. By Remark 7.47, there exists an increasing sequence $(k_j)_{j \in \mathbb{N}}$ of natural numbers satisfying the condition that the sequence $(\mu_{\sigma^{k_j}})_{j \in \mathbb{N}}$ converges to a complete belief system μ. We deduce from this that (σ, μ) is a consistent assessment.

We now prove that (σ, μ) is a sequentially rational assessment. Let $i \in N$ be a player, $U_i$ be an information set of player i, and $\sigma_i'$ be a behavior strategy of player i. By Theorem 7.34, there exists a sequence $(\sigma_i'^k)_{k \in \mathbb{N}}$ of behavior strategies of player i converging to $\sigma_i'$ and satisfying the condition that for each $k \in \mathbb{N}$, the strategy $\sigma_i'^k$ is a possible strategy for player i in the game $\Gamma(\delta^k)$. Since the assessment $(\sigma^k, \mu_{\sigma^k})$ is a sequential equilibrium in the game $\Gamma(\delta^k)$, one has

$$u_i(\sigma^k \mid U_i, \mu_{\sigma^k}) \geq u_i((\sigma_i'^k, \sigma_{-i}^k) \mid U_i, \mu_{\sigma^k}). \tag{7.43}$$

From the continuity of the payoff function, and consideration of the subsequence $(k_j)_{j \in \mathbb{N}}$, we conclude that

$$u_i(\sigma \mid U_i, \mu) \geq u_i((\sigma_i', \sigma_{-i}) \mid U_i, \mu). \tag{7.44}$$

This completes the proof that the pair (σ, μ) is sequentially rational, and hence a sequential equilibrium.

We will now show that the converse of Theorem 7.57 does not hold: there exist games that have a sequential equilibrium of the form (σ, μ), where the strategy vector σ is not an extensive-form perfect equilibrium.

Example 7.59 Consider the two-player extensive-form game depicted in Figure 7.18. In this game there are two Nash equilibria in pure strategies: (T, t) and (B, b). Since every player has a single information set, the set of strategic-form perfect equilibria equals the set of extensive-form perfect equilibria. Since strategy T dominates strategy B (and strategy t dominates strategy b), only (T, t) is a strategic-form perfect equilibrium (see Theorem 7.28 on page 267). However, as we will now show, both (T, t) and (B, b) are the strategy vectors of sequential equilibria.
Figure 7.18 The game in Example 7.59, along with two sequential equilibria (left panel: Equilibrium (T, t); right panel: Equilibrium (B, b))
Under both equilibria, the play of the game visits every information set, and therefore the beliefs of the players in these equilibria are as follows:
• At the equilibrium (T, t), the beliefs of the players are $[1(x_1)]$ and $[1(x_2)]$ respectively.
• At the equilibrium (B, b), the beliefs of the players are $[1(x_1)]$ and $[1(x_3)]$ respectively.

We first show that the pair $((T, t), [1(x_1)], [1(x_2)])$ is a sequential equilibrium. To show that this pair is consistent define

$$\sigma^k = \left( \left[ \tfrac{k-1}{k}(T), \tfrac{1}{k}(B) \right], \left[ \tfrac{k-1}{k}(t), \tfrac{1}{k}(b) \right] \right), \quad \forall k \in \mathbb{N}. \tag{7.45}$$

Then $\mu_{\sigma^k}(U_{II}) = [\tfrac{k-1}{k}(x_2), \tfrac{1}{k}(x_3)]$ for all $k \in \mathbb{N}$, $\lim_{k \to \infty} \sigma^k = (T, t)$, and $\lim_{k \to \infty} \mu_{\sigma^k}(U_{II}) = [1(x_2)]$. This pair is sequentially rational because the payoff to each of the players is 1, which is the maximal payoff in the game.

We next show that the pair $((B, b), [1(x_1)], [1(x_3)])$ is also a sequential equilibrium. To show that this pair is consistent define

$$\sigma^k = \left( \left[ \tfrac{1}{k}(T), \tfrac{k-1}{k}(B) \right], \left[ \tfrac{1}{k}(t), \tfrac{k-1}{k}(b) \right] \right), \quad \forall k \in \mathbb{N}. \tag{7.46}$$

Then $\mu_{\sigma^k}(U_{II}) = [\tfrac{1}{k}(x_2), \tfrac{k-1}{k}(x_3)]$ for all $k \in \mathbb{N}$, $\lim_{k \to \infty} \sigma^k = (B, b)$, and $\lim_{k \to \infty} \mu_{\sigma^k}(U_{II}) = [1(x_3)]$. This pair is sequentially rational because Player I receives 0 whether he plays T or B, and given his belief at information set $\{x_2, x_3\}$, Player II receives 0 whether he plays t or plays b. ◭
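The contrast between the two equilibria of Example 7.59 can be made concrete with a small Python sketch. It uses only the payoffs stated in the example (Player II receives 1 after (T, t) and 0 at every other leaf); the function names are hypothetical, and the code is an illustration rather than a general algorithm.

```python
from fractions import Fraction as F


def belief_UII(prob_T):
    """Belief at U_II = {x2, x3} induced when Player I plays T with probability prob_T."""
    return {"x2": prob_T, "x3": 1 - prob_T}


def payoff_II(action, belief):
    """Player II's expected payoff at U_II: 1 only at x2 after t, 0 otherwise."""
    return belief["x2"] if action == "t" else F(0)


# Beliefs induced by the sequence (7.46), in which Player I plays T with probability 1/k.
beliefs_along_sequence = [belief_UII(F(1, k)) for k in (10, 100, 1000)]
limit_belief = {"x2": F(0), "x3": F(1)}
```

Along the sequence, t is strictly better than b for every k, so b is never a best reply in a perturbed game; only at `limit_belief` are the two actions equally good. This is consistent with (B, b) being part of a sequential equilibrium without being a perfect equilibrium.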
In summary, the main differences between the three refinements of Nash equilibrium in extensive-form games are as follows:
• A mixed strategy vector σ is a strategic-form perfect equilibrium if it is the limit of equilibria in completely mixed strategies $(\sigma^k)_{k \in \mathbb{N}}$ of a sequence of perturbed games, where the perturbations converge to zero.
• A mixed strategy vector σ is an extensive-form perfect equilibrium if it is the limit of equilibria in completely mixed behavior strategies $(\sigma^k)_{k \in \mathbb{N}}$ of a sequence of perturbed games, where the perturbations converge to zero.
• An assessment (σ, μ) is a sequential equilibrium if μ is the limit of a sequence of beliefs $(\mu_{\sigma^k})_{k \in \mathbb{N}}$ induced by a sequence of strategies $(\sigma^k)_{k \in \mathbb{N}}$ converging to σ in a sequence of games with perturbations converging to zero (the consistency property), and for each player i, at each of his information sets, $\sigma_i$ is a best reply to $\sigma_{-i}$ according to μ (the sequential rationality property).

As we saw in Example 7.59, if (σ, μ) is a sequential equilibrium then the strategy vector σ is not necessarily an extensive-form perfect equilibrium. This is due to the fact that the definition of extensive-form perfect equilibrium contains a condition that is not contained in the definition of sequential equilibrium: for σ to be an extensive-form perfect equilibrium, $\sigma^k$ must be an equilibrium of the corresponding perturbed game for every $k \in \mathbb{N}$; i.e., the sequential rationality property must obtain for every element of the sequence $(\sigma^k)_{k \in \mathbb{N}}$, while for (σ, μ) to be a sequential equilibrium, the sequential rationality property must hold only in the limit, σ.
The next example illustrates why it is not clear how to extend the definition of sequential equilibrium to games with imperfect recall.
Example 7.60 The Absent-Minded Driver Consider the Absent-Minded Driver game depicted in Figure 7.19, which we previously encountered in Example 6.9 (page 225). The game contains a single player, who cannot distinguish between the two vertices in the game tree, and hence, at any vertex cannot recall whether or not he has played in the past.
Figure 7.19 The Absent-Minded Driver game
The only Nash equilibrium in this game is T, because this strategy yields a payoff of 3, which is the game's highest payoff. We now check whether the concept of sequential equilibrium can be adapted to this example. We first need to contend with the fact that because there are paths that visit the same information set several times, we need to reconsider what a belief at a vertex means. Suppose that the player implements strategy $\sigma = [1(B)]$, in which he plays action B. The play of the game will visit the vertex $x_1$, and the vertex $x_2$, hence $p_\sigma(x_1) = p_\sigma(x_2) = 1$, and $p_\sigma(U) = 1$ holds for the information set $U = \{x_1, x_2\}$. It follows that Equation (7.32) does not define a belief system, because $p_\sigma(U) \neq p_\sigma(x_1) + p_\sigma(x_2)$. We therefore need to define the player's belief system as follows:

$$\mu_U(x_1) = \frac{p_\sigma(x_1)}{p_\sigma(x_1) + p_\sigma(x_2)} = \frac{1}{2}, \qquad \mu_U(x_2) = \frac{p_\sigma(x_2)}{p_\sigma(x_1) + p_\sigma(x_2)} = \frac{1}{2}. \tag{7.47}$$

In words, if the player implements strategy B, at his information set he ascribes equal probability to the play of the game being at either of the vertices $x_1$ and $x_2$. Is the concept of sequential equilibrium applicable in this game? We will show that the assessment $(B, [\frac{1}{2}(x_1), \frac{1}{2}(x_2)])$ is sequentially rational, and therefore is a sequential equilibrium according to Definition 7.52, despite the fact that the strategy B is not a Nash equilibrium. If Player I implements strategy B at his information set, his expected payoff is 2, because he believes the play of the game is located at either $x_1$ or $x_2$, with equal probability, and in either case, if he implements strategy B, his expected payoff is 2. If, however, Player I implements strategy T at this information set, his expected payoff is $\frac{3}{2}$: the player ascribes probability $\frac{1}{2}$ to the play of the game being located at vertex $x_1$, which yields a payoff of 3 if he implements strategy T, and he ascribes probability $\frac{1}{2}$ to the play of the game being located at vertex $x_2$, in which case he receives a payoff of 0 if he implements strategy T. It follows that $(B, [\frac{1}{2}(x_1), \frac{1}{2}(x_2)])$ is a sequentially rational assessment, despite the fact that B is not an equilibrium. Theorem 7.54 does not hold in games with imperfect recall because if there exists a path from the root that passes through two different vertices in the same information set U of a player, then when the player changes the action that he implements at U, he may also change his belief at U. This possibility is not taken into account in the definition of sequential equilibrium. ◭
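The tension in the Absent-Minded Driver example can be seen by computing both quantities side by side. The Python sketch below is illustrative only; it uses the payoffs stated in the example (T at x1 pays 3, T at x2 pays 0, B at x2 pays 2), and the function names are hypothetical.

```python
from fractions import Fraction as F


def ex_ante_payoff(p):
    """Expected payoff when the driver plays T with probability p at every visit."""
    # With probability p he exits at x1 (payoff 3); otherwise he reaches x2,
    # where T pays 0 and B pays 2.
    return 3 * p + (1 - p) * (p * 0 + (1 - p) * 2)


def payoff_given_belief(action, belief_x1):
    """Payoff computed as in the text: beliefs held fixed, B played elsewhere."""
    if action == "T":
        return belief_x1 * 3 + (1 - belief_x1) * 0
    return F(2)  # B yields 2 from either vertex when B is played throughout
```

With the belief fixed at 1/2, B looks better than T (2 against 3/2), yet `ex_ante_payoff(F(1))` exceeds `ex_ante_payoff(F(0))`: the belief-based comparison ignores that switching to T also changes which vertices are visited.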
7.5 Remarks
Sections 7.1 and 7.3 are based on the research conducted by Reinhard Selten, who was awarded the Nobel Memorial Prize in Economics in 1994 for his contributions to refinements of the Nash equilibrium. The concept of sequential equilibrium first appeared in Kreps and Wilson [1982]. The interested reader may find a wealth of material on the concepts of subgame perfect equilibrium, and perfect equilibrium, in van Damme [1987]. Exercises 7.6 and 7.7 are based on Glazer and Ma [1989]. Exercise 7.13 is based on Selten [1978]. Exercise 7.14 is a variation of an example appearing in Rubinstein [1982]. Exercise 7.16 is based on an example appearing in Harris, Reny, and Robson [1995]. Exercise 7.26 is based on an example appearing in van Damme [1987, page 28]. Exercise 7.37 is based on an example appearing in Selten [1975]. The game in Exercise 7.46 is taken from Selten [1975]. The game in Exercise 7.48 is based on a game appearing in Kreps and Ramey [1987]. Exercise 7.49 is taken from Kohlberg and Mertens [1986]. Exercise 7.52 is based on an example appearing in Banks and Sobel [1987]. Exercise 7.53 is based on an example appearing in Cho and Kreps [1987]. Exercise 7.54 is based on an example appearing in Camerer and Weigelt [1988].
7.6 Exercises
7.1 (a) What is the number of subgames in a game with perfect information whose game tree has eight vertices?
(b) What is the number of subgames in a game whose game tree has eight vertices and one information set, which contains two vertices (with all other information sets containing only one vertex)?
(c) What is the number of subgames in a game whose game tree has eight vertices, three of which are chance vertices?
7.2 Answer the following questions, for each of the following two-player zero-sum extensive-form games:
(a) Find all the equilibria obtained by backward induction.
(b) Describe the corresponding strategic-form game.
(c) Check whether there exist additional equilibria.
[Figure: game trees of Game A and Game B; the tree structure is not recoverable from the extracted text.]
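7.3 Find all the equilibria of the following two-player zero-sum game.
[Figure: game tree in which Player I moves first (actions a, b) and Player II moves second; the leaves reached by Player II's actions c, d, e, f carry the payoffs 12, 15, 8, 10, respectively.]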
Explain why one cannot obtain all the equilibria of the game by implementing backward induction.
7.4 Find all the subgame perfect equilibria of the following games.
[Figure: four extensive-form game trees, labeled Game A, Game B, Game C, and Game D; the tree structure is not recoverable from the extracted text.]
7.5 The Ultimatum game Allen and Rick need to divide $100 between them as follows: first Allen suggests an integer x between 0 and 100 (which is the amount of money he wants for himself). Rick, on hearing the suggested amount, decides whether to accept or reject. If Rick accepts, the payoff of the game is (x, 100 − x): Allen receives x dollars, and Rick receives 100 − x dollars. If Rick chooses to reject, neither player receives any money.
(a) Describe this situation as an extensive-form game.
(b) What is the set of pure strategies each player has?
(c) Show that any result (a, 100 − a), a ∈ {0, 1, . . . , 100}, is a Nash equilibrium payoff. What are the corresponding equilibrium strategies?
(d) Find all the subgame perfect equilibria of this game.
7.6 The Judgment of Solomon Elizabeth and Mary appear before King Solomon at his palace, along with an infant. Each woman claims that the infant is her child. The
child is “worth” 100 dinars to his true mother, but he is only “worth” 50 dinars to the woman who is not his mother. The king knows that one of these two women is the true mother of the child, and he knows the “value” that the true mother ascribes to the child, and the “value” that the impostor ascribes to the child, but he does not know which woman is the true mother, and which the impostor. To determine which of the two women is the true mother, the king explains to Elizabeth and Mary that he will implement the following steps: (i) He will ask Elizabeth whether the child is hers. If she answers negatively, the child will be given to Mary. If she answers affirmatively, the king will continue to the next step. (ii) He will ask Mary if the child is hers. If she answers negatively, the child will be given to Elizabeth. If she answers affirmatively, Mary will pay the king 75 dinars, and receive the child, and Elizabeth will pay the king 10 dinars. Answer the following questions: (a) Describe the mechanism implemented by the king using two extensive-form games: in one extensive-form game Elizabeth is the true mother of the child, and in the second extensive-form game Mary is the true mother of the child. (b) Prove that the mechanism implemented by the king guarantees that despite the fact that he does not know which of the above extensive-form games is being played, in each game the only subgame perfect equilibrium is the one under which the true mother gets her child and neither woman pays anything at all. (c) Find another equilibrium of each game, which is not the subgame perfect equilibrium. 7.7 The following is a generalization of the “Judgment of Solomon,” discussed in Exercise 7.6. Emperor Caligula wishes to grant a prize-winning horse as a gift to one of his friends, Claudius or Marcus. The value that Claudius ascribes to the horse is in the set {u1 , u2 , . . . , un }, and the value that Marcus ascribes to the horse is in the set {v1 , v2 , . . . , vm }. 
Each one of the emperor’s friends knows the precise value that he ascribes to the horse, and he also knows the precise value that the other friend ascribes to the horse, but the only thing that the emperor knows is that the value that each of his friends ascribes to the horse is taken from the appropriate set of possible values. The emperor wishes to give the horse to the friend who values the horse most highly, but does not want to take money from his friends. The emperor implements the following steps:
(i) Let ε > 0 be a positive number satisfying the condition that for each i and j, if \(u_i \neq v_j\) then \(|u_i - v_j| > \varepsilon\).
(ii) The emperor will ask Claudius if he values the horse at least as much as Marcus does. If Claudius answers negatively, the horse will be given to Marcus. If Claudius answers affirmatively, the emperor will continue to the next stage.
(iii) The emperor will ask Marcus if he values the horse more than Claudius does. If Marcus answers negatively, the horse will be given to Claudius. If Marcus
answers affirmatively, the two friends will each pay the emperor ε/4, and the emperor will continue to the next step.
(iv) Claudius will be called upon to suggest a value u ∈ {u1, u2, . . . , un}.
(v) Knowing Claudius’ suggested value, Marcus will be called upon to suggest a value v ∈ {v1, v2, . . . , vm}.
(vi) The individual who suggested the higher value receives the horse, with the emperor keeping the horse in case of a draw. The winner pays max{u, v} − 2ε for the horse. The loser pays nothing.
Answer the following questions:
(a) Describe the sequence of steps implemented by the emperor as an extensive-form game. Assume that at the start of the game the following move of chance is implemented, which determines the private value of the horse for each of the two friends: the private value of the horse for Claudius is chosen from the set {u1, u2, . . . , un} using the uniform distribution, and the private value of the horse for Marcus is chosen from the set {v1, v2, . . . , vm} using the uniform distribution.
(b) Prove that the only subgame perfect equilibrium of the game leads to the friend who values the horse the most receiving the horse (in case both friends equally value the horse, Claudius receives the horse).
7.8 Prove Theorem 7.9 on page 257: every (finite) extensive-form game with perfect information has a subgame perfect equilibrium in pure strategies.
7.9 Prove that in the 100-times-repeated Prisoner’s Dilemma game (see Example 7.15 on page 259), the only subgame perfect equilibrium is the one where both players choose D in all stages of the game (after every history of previous actions).
7.10 (a) Find all the equilibria of the following two-player game.

                  Player II
                  L        R
Player I    T    3, 0     1, 2
            B    2, 0     1, 5

(b) Suppose the players play the game twice; after the first time they have played the game, they know the actions chosen by both of them, and hence each player may condition his action in the second stage on the actions that were chosen in the first stage. Describe this two-stage game as an extensive-form game. (c) What are all the subgames of the two-stage game? (d) Find all the subgame perfect equilibria of the two-stage game. 7.11 The one-stage deviation principle for subgame perfect equilibria Recall that ui (σ | x) is the payoff to player i when the players implement the strategy vector σ , given that the play of the game has arrived at the vertex x.
Prove that a strategy vector \(\sigma^* = (\sigma_i^*)_{i \in N}\) in an extensive-form game with perfect information is a subgame perfect equilibrium if and only if for each player i ∈ N, every decision vertex x, and every strategy \(\sigma_i\) of player i that is identical to \(\sigma_i^*\) at every one of his decision vertices except for x,
\[ u_i(\sigma^* \mid x) \geq u_i((\sigma_i, \sigma_{-i}^*) \mid x). \tag{7.48} \]
Guidance: To prove that \(\sigma^*\) is a subgame perfect equilibrium if the condition above obtains, one needs to prove that the condition \(u_i(\sigma^* \mid x) \geq u_i((\sigma_i, \sigma_{-i}^*) \mid x)\) holds for every vertex x, every player i, and every strategy \(\sigma_i\). This can be accomplished by induction on the number of vertices in the game tree as follows. Suppose that this condition does not hold. Among all the triples \((x, i, \sigma_i)\) for which it does not hold, choose a triple such that the number of vertices where \(\sigma_i\) differs from \(\sigma_i^*\) is minimal. Denote by X the set of all vertices such that \(\sigma_i\) differs from \(\sigma_i^*\). By assumption, |X| ≥ 1. From the vertices in X, choose a “highest” vertex, i.e., a vertex such that every path from the root to it does not pass through any other vertex in X. Apply the inductive hypothesis to all the subgames beginning at the other vertices in X.
7.12 Principal-Agent game Hillary manages a technology development company. A company customer asks Hillary to implement a particular project. Because it is unclear whether or not the project is feasible, the customer offers to pay Hillary $2 million at the start of work on the project, and an additional $4 million upon its completion (if the project is never completed, the customer pays nothing beyond the initial $2 million payment). Hillary seeks to hire Bill to implement the project. The success of the project depends on the amount of effort Bill invests in his work: if he fails to invest effort, the project will fail; if he does invest effort, the project will succeed with probability p, and will fail with probability 1 − p. Bill assesses the cost of investing effort in the project (i.e., the amount of time he will need to devote to work at the expense of the time he would otherwise give to his family, friends, and hobbies) as equivalent to $1 million. Bill has received another job offer that will pay him $1 million without requiring him to invest a great deal of time and effort.
In order to incentivize Bill to take the job she is offering, Hillary offers him a bonus, to be paid upon the successful completion of the project, beyond the salary of $1 million. Answer the following questions: (a) Depict this situation as an extensive-form game, where Hillary first determines the salary and bonus that she will offer Bill, and Bill afterwards decides whether or not to take the job offered by Hillary. If Bill takes the job offered by Hillary, Bill then needs to decide whether or not to invest effort in working on the project. Finally, if Bill decides to invest effort on the project, a chance move determines whether the project is a success or a failure. Note that the salary and bonus that Hillary can offer Bill need not be expressed in integers. (b) Find all the subgame perfect equilibria of this game, assuming that both Hillary and Bill are risk-neutral, i.e., each of them seeks to maximize the expected payoff he or she receives.
(c) What does Hillary need to persuade Bill of during their job interview, in order to increase her expected payoff at equilibrium? 7.13 The Chainstore game A national chain of electronics stores has franchises in shopping centers in ten different cities. In each shopping center, the chainstore’s franchise is the only electronics store. Ten local competitors, one in each city, are each contemplating opening a rival electronics store in the local shopping center, in the following sequence. The first competitor decides whether or not to open a rival electronics store in his city. The second competitor checks whether or not the first competitor has opened an electronics store, and takes into account the national chainstore’s response to this development, before deciding whether or not he will open a rival electronics store in his city. The third competitor checks whether or not the first and second competitors have opened electronics stores, and takes into account the national chainstore’s response to these developments, before deciding whether or not he will open a rival electronics store in his city, and so on. If a competitor decides not to open a rival electronics store, the competitor’s payoff is 0, and the national chain store’s payoff is 5. If a competitor does decide to open a rival electronics store, his payoff depends on the response of the national chainstore. If the national chainstore responds by undercutting prices in that city, the competitor and the chainstore lose 1 each. If the national chainstore does not respond by undercutting prices in that city, the competitor and the national chainstore each receive a payoff of 3. (a) Describe this situation as an extensive-form game. (b) Find all the subgame perfect equilibria. (c) Find a Nash equilibrium that is not a subgame perfect equilibrium, and explain why it fails to be a subgame perfect equilibrium. 
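Backward induction on a small tree can be sketched in a few lines. The following is a minimal illustration, not a solution of the full ten-city game: it solves only the interaction in a single city, using the payoffs stated in Exercise 7.13. The tree encoding, the function name, and the action labels ("enter", "stay_out", "undercut", "accommodate") are assumptions made for this sketch.

```python
def backward_induction(node):
    """Solve a finite tree with perfect information.

    Returns (payoff vector, plan), where plan maps the name of each
    decision node to the action chosen there. Ties are broken in favor
    of the first action listed, so only one solution is reported.
    """
    if "payoffs" in node:  # leaf
        return node["payoffs"], {}
    player = node["player"]
    plan, best_action, best_payoffs = {}, None, None
    for action, child in node["children"].items():
        payoffs, sub_plan = backward_induction(child)
        plan.update(sub_plan)
        if best_payoffs is None or payoffs[player] > best_payoffs[player]:
            best_action, best_payoffs = action, payoffs
    plan[node["name"]] = best_action
    return best_payoffs, plan

COMPETITOR, CHAIN = 0, 1  # index of each player in a payoff vector

# One city: payoffs are (competitor, chainstore), as given in the exercise.
single_city = {
    "name": "competitor",
    "player": COMPETITOR,
    "children": {
        "stay_out": {"payoffs": (0, 5)},
        "enter": {
            "name": "chain",
            "player": CHAIN,
            "children": {
                "undercut": {"payoffs": (-1, -1)},
                "accommodate": {"payoffs": (3, 3)},
            },
        },
    },
}

stage_payoffs, stage_plan = backward_induction(single_city)
print(stage_payoffs, stage_plan)
```

Because ties are resolved arbitrarily, a routine of this kind reports one backward-induction solution only; finding all subgame perfect equilibria requires tracking every maximizer at each vertex.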
7.14 Alternating Offers game Debby and Barack are jointly conducting a project that will pay them a total payoff of $100. Every delay in implementing the project reduces payment for completing the project. How should they divide this money between them? The two decide to implement the following procedure: Debby starts by offering a division (xD , 100 − xD ), where xD is a number in [0, 100] representing the amount of money that Debby receives under the terms of this offer, while Barack receives 100 − xD . Barack may accept or reject Debby’s offer. If he rejects the offer, he may propose a counteroffer (yD , 99 − yD ) where yD is a number in [0, 99] representing the amount of money that Debby receives under the terms of this offer, while Barack receives 99 − yD . Barack’s offer can only divide $99 between the two players, because the delay caused by his rejection of Debby’s offer has reduced the payment for completing the project by $1. Debby may accept or reject Barack’s offer. If she rejects the offer, she may then propose yet another counteroffer, and so on. Each additional round of offers, however, reduces the amount of money available by $1: if the two players come to an agreement on a division after the kth offer has been passed between them, then they can divide only (101 − k) dollars between them. If the two players cannot come to any agreement, after 100 rounds of alternating offers, they drop plans to conduct the project jointly, and each receives 0.
Describe this situation as an extensive-form game, and find all of its subgame perfect equilibria. 7.15 Prove Theorem 7.10: every extensive-form game with perfect recall has a subgame perfect equilibrium in mixed strategies. 7.16 A game without a subgame perfect equilibrium Consider the four-player extensive-form game in Figure 7.20. In this game, Player I’s set of pure strategies is the interval [−1, 1]; i.e., Player I chooses a number a in this interval. The other players, Players II, III, and IV, each have two available actions, at each of their information sets. Figure 7.20 depicts only one subgame tree, after Player I has chosen an action. All the other possible subtrees are identical to the one shown here.
Figure 7.20 A game without subgame perfect equilibria [game tree: Player I chooses a; Player II chooses T or B; Player III chooses t1 or b1 (after T) and t2 or b2 (after B); Player IV chooses τ1 or β1 (after T) and τ2 or β2 (after B); the payoff vectors are not recoverable from the extracted text]
This game may be regarded as a two-stage game: in the first stage, Players I and II choose their actions simultaneously (where Player I chooses a ∈ [−1, 1], and Player II chooses T or B), and in the second stage, Players III and IV, after learning which actions were chosen by Players I and II, choose their actions simultaneously. Suppose that the game has a subgame perfect equilibrium, denoted by σ = (σI, σII, σIII, σIV). Answer the following questions:
(a) What are all the subgames of this game?
(b) What will Players III and IV play under σ, when a ≠ 0?
(c) What are the payoffs of Players III and IV, when a = 0?
(d) Denote by β the probability that Player II plays the pure strategy B, under strategy σII. Explain why there does not exist a subgame perfect equilibrium such that Player I plays a = 0, Player II plays β = 1/2, and if a = 0, Player III chooses t1 with probability 1/4, and chooses t2 with probability 1/8, while Player IV chooses τ1 and τ2 with probability 1.
(e) Depict the expected payoff of Player I as a function of a and β, in the case where a ≠ 0.
(f) Find the upper bound of the possible payoffs Player I can receive, in the case where a ≠ 0.
(g) What is Player I’s best reply when β < 1/2? What are the best replies of Players III and IV, given Player I’s strategy? What is Player II’s best reply to these strategies of Players I, III, and IV?
(h) What is Player I’s best reply when β > 1/2? What are the best replies of Players III and IV, given Player I’s strategy? What is Player II’s best reply to these strategies of Players I, III, and IV?
(i) Suppose that β = 1/2. What is the optimal payoff that Player I can receive? Deduce that under σ, Player I necessarily plays a = 0, and his payoff is then 0. What does this say about the strategies of Players III and IV? What is Player II’s best reply to these strategies of Players I, III, and IV?
(j) Conclude that this game has no subgame perfect equilibrium.
(k) Find a Nash equilibrium of this game.
This exercise does not contradict Theorem 7.37, which states that every finite extensive-form game with perfect recall has a subgame perfect equilibrium in behavior strategies, because this game is infinite: Player I has a continuum of pure strategies.
7.17 Prove that for each player i, and every vector of perturbations \(\varepsilon_i\), the set of strategies \(\Sigma_i(\varepsilon_i)\) (see Equation (7.11)) is compact and convex.
7.18 Prove that every mixed strategy \(\sigma_i \in \Sigma_i\) can be approximated by a completely mixed strategy; that is, for every δ > 0 there is a completely mixed strategy \(\sigma_i'\) of player i that satisfies \(\max_{s_i \in S_i} |\sigma_i(s_i) - \sigma_i'(s_i)| < \delta\).
7.19 Prove that the set of perfect equilibria of a strategic-form game is a closed subset of \(\times_{i \in N} \Sigma_i\).
7.20 Find all the perfect equilibria in each of the following games, in which Player I is the row player and Player II is the column player.

Game A
         L         C         R
T      1, 1      0, 0     −1, −2
M      0, 0      0, 0      0, −2
B     −2, 1     −2, 0     −2, −2

Game B
         L         M
T      1, 1      1, 0
B      1, 0      0, 1

7.21 Consider the following two-player strategic-form game:

         L         C         R
T      1, 2      3, 0      0, 3
M      1, 1      2, 2      2, 0
B      1, 2      0, 3      3, 0

(a) Prove that \(([x_1(T), x_2(M), (1 - x_1 - x_2)(B)], L)\) is a Nash equilibrium of this game if and only if \(\tfrac{1}{3} \leq x_1 \leq \tfrac{2}{3}\), \(0 \leq x_2 \leq 2 - 3x_1\), and \(x_1 + x_2 \leq 1\).
(b) Prove that the equilibria identified in part (a) are all the Nash equilibria of the game.
(c) Prove that if \(([x_1(T), x_2(M), (1 - x_1 - x_2)(B)], L)\) is a perfect equilibrium, then \(1 - x_1 - x_2 > 0\).
(d) Prove that for every \(x_1 \in (\tfrac{1}{3}, \tfrac{1}{2})\) the strategy vector \(([x_1(T), (1 - x_1)(M)], L)\) is a perfect equilibrium.
(e) Using Exercise 7.19 determine the set of perfect equilibria of this game.
7.22 Prove Theorem 7.28 (page 267): in a perfect equilibrium, every weakly dominated strategy is chosen with probability 0.
7.23 Let \(\sigma_1\) and \(\sigma_2\) be optimal strategies (in pure or mixed strategies) of two players in a two-player zero-sum game. Is \((\sigma_1, \sigma_2)\) necessarily a perfect equilibrium? If so, prove it. If not, provide a counterexample.
7.24 A pure strategy \(s_i\) of player i is said to be weakly dominated by a mixed strategy if player i has a mixed strategy \(\sigma_i\) satisfying:
(a) For each strategy \(s_{-i} \in S_{-i}\) of the other players,
\[ u_i(s_i, s_{-i}) \leq U_i(\sigma_i, s_{-i}). \tag{7.49} \]
(b) There exists a strategy \(t_{-i} \in S_{-i}\) of the other players satisfying
\[ u_i(s_i, t_{-i}) < U_i(\sigma_i, t_{-i}). \tag{7.50} \]
Prove that in a perfect equilibrium, every pure strategy that is weakly dominated by a mixed strategy is chosen with probability 0.
7.25 (a) Prove that (T, L) is the only perfect equilibrium in pure strategies of the following game.

                  Player II
                  L        M
Player I    T    6, 6     0, 0
            B    0, 4     4, 4

(b) Prove that in the following game, which is obtained from the game in part (a) by adding a dominated pure strategy to each player, (B, M) is a perfect equilibrium.

                  Player II
                  L        M        R
Player I    T    6, 6     0, 0     2, 0
            B    0, 4     4, 4     2, 0
            I    0, 0     0, 2     2, 2

7.26 In this exercise, we will show that in a three-player game a vector of strategies that makes use solely of strategies that are not dominated is not necessarily a perfect equilibrium. To do so, consider the following three-player game, where Player I chooses a row (T or B), Player II chooses a column (L or R), and Player III chooses a matrix (W or E).

Matrix W
         L            R
T      1, 1, 1      1, 0, 1
B      1, 1, 1      0, 0, 1

Matrix E
         L            R
T      1, 1, 0      0, 0, 0
B      0, 1, 0      1, 0, 0

(a) Find all the dominated strategies.
(b) Find all the Nash equilibria of this game.
(c) Find all the perfect equilibria of this game.
7.27 Prove that the following definition of perfect equilibrium is equivalent to Definition 7.25 (page 266).
Definition 7.61 A strategy vector σ is called a perfect equilibrium if there exists a sequence \((\sigma^k)_{k \in N}\) of vectors of completely mixed strategies satisfying:
• For each player i ∈ N, the limit \(\lim_{k \to \infty} \sigma_i^k\) exists and equals \(\sigma_i\).
• \(\sigma_i\) is a best reply to \(\sigma_{-i}^k\), for each k ∈ N, and each player i ∈ N.
7.28 Prove that, in the following game, (B, L) is a Nash equilibrium, but not a perfect equilibrium.

                  Player II
                  L        M        R
Player I    T    1, 1     3, 3     0, 0
            C    1, 1     0, 0     3, 3
            B    1, 1     1, 1     1, 1

7.29 Show that in the game in Example 7.35 (page 270) the equilibrium (B, R) is an extensive-form perfect equilibrium. Does this game have additional Nash equilibria? If so, which of them is also an extensive-form perfect equilibrium? Justify your answer.
7.30 Prove that, in the game in Example 7.18 (page 263), the equilibrium (T, t, β) is an extensive-form perfect equilibrium, but the equilibrium (B, t, τ) is not an extensive-form perfect equilibrium.
7.31 Prove directly the following theorem which is analogous to Corollary 7.26 (page 266) for extensive-form perfect equilibria: every extensive-form perfect equilibrium is a Nash equilibrium in behavior strategies. To prove this, first prove the analogous result to Theorem 7.24 for a sequence of equilibria in perturbed games \((\Gamma(\delta^k))_{k \in N}\).
7.32 Prove Theorem 7.30 (page 268): every finite extensive-form game has a strategic-form perfect equilibrium.
7.33 Prove Theorem 7.31 (page 268): let δ be a perturbation vector, let \(\sigma^*\) be a Nash equilibrium (in behavior strategies) in the game Γ(δ), and let Γ(x) be a subgame of Γ. Then the strategy vector \(\sigma^*\), restricted to the subgame Γ(x), is a Nash equilibrium (in behavior strategies) of Γ(x; δ).
7.34 Prove Theorem 7.34 (page 269): let \((\delta^k)_{k \in N}\) be a sequence of perturbation vectors satisfying \(\lim_{k \to \infty} M(\delta^k) = 0\). Then for every behavior strategy \(\sigma_i \in B_i\) of player i there exists a sequence \((\sigma_i^k)_{k \in N}\) of behavior strategies satisfying the following two properties:
• \(\sigma_i^k \in B_i(\delta_i^k)\) for each k ∈ N.
• \(\lim_{k \to \infty} \sigma_i^k\) exists and equals \(\sigma_i\).
7.35 Prove Theorem 7.36 (page 271): every finite extensive-form game with perfect recall has an extensive-form perfect equilibrium.
7.36 This exercise shows that an extensive-form perfect equilibrium is not necessarily a strategic-form perfect equilibrium. In the following game, find an extensive-form perfect equilibrium that is not a strategic-form perfect equilibrium.
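[Figure: extensive-form game with Player I's actions T1, B1, T2, B2, Player II's actions t, b, and payoff vectors (0, 0), (1, 1), (1/2, 1/2), (0, 0), (1, 1); the tree structure is not recoverable from the extracted text.]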
Does the game have another Nash equilibrium? Does it have another subgame perfect equilibrium? 7.37 This exercise proves the converse to what we showed in Exercise 7.36: a strategic-form perfect equilibrium is not necessarily an extensive-form perfect equilibrium.
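(a) Prove that the following game has a unique extensive-form perfect equilibrium.
[Figure: extensive-form game with Player I's actions T1, B1, T2, B2, Player II's actions t, b, and payoff vectors (1, 1), (0, 2), (0, 3), (2, 0); the tree structure is not recoverable from the extracted text.]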
(b) Show that this game has another equilibrium, which is a strategic-form perfect equilibrium. To do so, construct the corresponding strategic-form game, and show that it has more than one perfect equilibrium. (c) Does this game have a strategic-form perfect equilibrium that is not a subgame perfect equilibrium? 7.38 Show that the following game has a unique Nash equilibrium, and in particular a unique extensive-form perfect equilibrium and a unique strategic-form perfect equilibrium.
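[Figure: extensive-form game with Player I's actions A, B, C, Player II's actions L, R, and payoff vectors (1, 2), (1, 3), (2, 1), (3, 1), (0, 0); the tree structure is not recoverable from the extracted text.]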
7.39 Prove that in an extensive-form game in which every player has a single information set, every strategic-form perfect equilibrium is equivalent to an extensive-form perfect equilibrium, and that the converse also holds. 7.40 Consider the extensive-form game shown in Figure 7.21.
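Figure 7.21 A game in extensive form [game tree with vertices x1, . . . , x7, a chance move with probabilities 2/3 and 1/3, Player I's actions T, B, and Player II's actions t, b; the tree structure is not recoverable from the extracted text]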
For each of the following pairs, explain why it is not a consistent assessment of the game:
(a) \(([\tfrac{1}{2}(T), \tfrac{1}{2}(B)], t, [\tfrac{1}{2}(x_2), \tfrac{1}{2}(x_3)], [\tfrac{1}{4}(x_4), \tfrac{1}{4}(x_5), \tfrac{1}{2}(x_6)])\).
(b) \(([\tfrac{1}{2}(T), \tfrac{1}{2}(B)], b, [\tfrac{2}{3}(x_2), \tfrac{1}{3}(x_3)], [\tfrac{1}{3}(x_4), \tfrac{1}{3}(x_5), \tfrac{1}{3}(x_6)])\).
(c) \((T, t, [\tfrac{2}{3}(x_2), \tfrac{1}{3}(x_3)], [\tfrac{2}{3}(x_4), \tfrac{1}{3}(x_6)])\).
(d) \((T, t, [\tfrac{2}{3}(x_2), \tfrac{1}{3}(x_3)], [\tfrac{1}{2}(x_5), \tfrac{1}{2}(x_6)])\).
7.41 Prove Theorem 7.45 (page 277): if \(\sigma^*\) is a Nash equilibrium in behavior strategies, then the pair \((\sigma^*, \mu_{\sigma^*})\) is sequentially rational in every information set U satisfying \(P_{\sigma^*}(U) > 0\).
7.42 Prove Theorem 7.46 (page 277): in a game with perfect information, a vector of behavior strategies σ is a subgame perfect equilibrium if and only if the pair (σ, μ) is sequentially rational at every information set of the game, where μ is a complete belief system such that \(\mu_U = [1(x)]\) for every information set U = {x}.
7.43 List all the consistent assessments of the extensive-form game in Exercise 7.40 (Figure 7.21).
7.44 List all the consistent assessments of the extensive-form game in Example 7.17 (page 261).
7.45 List all the consistent assessments of the following extensive-form game.
[Figure: extensive-form game with a chance move (probabilities 3/5 and 2/5) and Player I's actions T1, B1, T2, B2; the tree structure is not recoverable from the extracted text.]
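7.46 List all the consistent assessments, and all the sequentially rational assessments of the following game.
[Figure: three-player extensive-form game with Player I's actions T, B, Player II's actions t, b, Player III's actions τ, β, and payoff vectors (1, 1, 1), (0, 0, 1), (4, 4, 0), (0, 0, 0), (3, 3, 2); the tree structure is not recoverable from the extracted text.]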
7.47 Find all the sequential equilibria of the game in Example 7.35 (page 270).
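7.48 Consider the following extensive-form game.
[Figure: extensive-form game with a chance move (probabilities 1/2 and 1/2), Player I's actions T1, B1, T2, B2, Player II's actions a, b, c, and payoff vectors (1, 0), (0, 3), (3, 0), (0, 2); the tree structure is not recoverable from the extracted text.]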
(a) Prove that in this game at every Nash equilibrium Player I plays (T1, B2).
(b) List all the Nash equilibria of the game.
(c) Which of these Nash equilibria can be completed to a sequential equilibrium, and for each such sequential equilibrium, what is the corresponding belief of Player II at his information sets? Justify your answer.
7.49 Find all the sequential equilibria of the following game.
[Figure: three-player extensive-form game with Player I's actions T, B, Player II's actions t, b, and Player III's actions τ, β; the tree structure and the assignment of payoff vectors to leaves are not recoverable from the extracted text.]
7.50 The following example shows that the set of sequential equilibria is sensitive to the way in which a player makes decisions: it makes a difference whether the player, when called upon to choose an action from among a set of three possible actions, eliminates the actions he will not choose one by one, or simultaneously. Consider the two extensive-form games below. Show that (2, 2) is a sequential equilibrium payoff in Game A, but not a sequential equilibrium payoff in Game B.
[Figure: two extensive-form game trees, labeled Game A and Game B, with Player I's actions T, M, R and Player II's actions t, b; the tree structure and payoff vectors are not recoverable from the extracted text.]
7.51 In an extensive-form game with perfect recall, is every Nash equilibrium part of a sequential equilibrium? That is, for every Nash equilibrium σ ∗ does there exist a belief system μ that satisfies the property that (σ ∗ , μ) is a sequential equilibrium? If yes, prove it. If not, construct a counterexample. 7.52 Pre-trial settlement A contractor is being sued for damages by a municipality that hired him to construct a bridge, because the bridge has collapsed. The contractor knows whether or not the collapse of the bridge is due to negligence on his part, or due to an act of nature beyond his control, but the municipality does not know which of these two alternatives is the true one. Both sides know that if the matter is settled by a court trial, the truth will eventually be uncovered. The contractor can try to arrive at a pre-trial settlement with the municipality. He has two alternatives: to make a low settlement offer, under which he pays the municipality $300,000, or a high offer, under which he pays the municipality $500,000. After the contractor has submitted a settlement offer, the municipality must decide whether or not to accept it. Both parties know that if the suit goes to trial, the contractor will pay lawyer fees of $600,000, and that, in addition to this expense, if the court finds him guilty of negligence, he will be required to pay the municipality $500,000 in damages. Assume that the municipality has no lawyer fees to pay. Answer the following questions: (a) Describe this situation as an extensive-form game, where the root of the game is a chance move that determines with equal probability whether the contractor was negligent or not. (b) Explain the significance of the above assumption, that a chance move determines with equal probability whether the contractor was negligent or not. (c) Find all the Nash equilibria of this game. (d) Find all the sequential equilibria of this game. 
(e) Repeat items (c) and (d) when the chance move selects whether the contractor was negligent or not with probabilities p and 1 − p respectively. 7.53 Signaling game Caesar is at a cafe, trying to choose what to drink with breakfast: beer or orange juice. Brutus, sitting at a nearby table, is pondering whether or not to challenge Caesar to a duel after breakfast. Brutus does not know whether Caesar is brave or cowardly, and he will only dare to challenge Caesar if Caesar is cowardly. If he fights a cowardly opponent, he receives one unit of utility, and he receives the same single unit of utility if he avoids fighting a brave opponent. In contrast, he loses one unit of utility if he fights a brave opponent, and similarly loses one unit of utility if he dishonors himself by failing to fight a cowardly opponent. Brutus ascribes probability 0.9 to Caesar being brave, and probability 0.1 to Caesar being a coward. Caesar has no interest in fighting Brutus: he loses 2 units of utility if he fights Brutus, but loses nothing if there is no fight. Caesar knows whether he is brave or cowardly. He can use the drink he orders for breakfast to signal his type, because it is commonly known that brave types receive one unit of utility if they drink beer (and receive nothing if they drink orange juice), while cowards receive one unit of
utility if they drink orange juice (and receive nothing if they drink beer). Assume that Caesar’s utility is additive; for example, he receives one unit of utility if he is brave, drinks beer, and avoids fighting Brutus. Answer the following questions:
(a) Describe this situation as an extensive-form game, where the root of the game tree is a chance move that determines whether Caesar is brave (with probability 0.9) or cowardly (with probability 0.1).
(b) Find all the Nash equilibria of the game.
(c) Find all the sequential equilibria of the game.
7.54 Henry seeks a loan to form a new company, and submits a request for a loan to Rockefeller. Rockefeller knows that a fraction p of the people asking him for loans are conscientious, and feel guilty if they default on their loans, while a fraction 1 − p of the people asking him for loans have no compunction about defaulting on their loans, but he does not know whether or not Henry is a conscientious borrower. Rockefeller is free to grant Henry a loan, or to refuse to give him a loan. If Henry receives the loan, he can decide to repay the loan, or to default. If Rockefeller refuses to loan money to Henry, both sides receive 10 units. If Rockefeller loans Henry the money he needs to form a company, and Henry repays the loan, Rockefeller receives 40 units, while Henry receives 60 units. If Rockefeller loans Henry the money he needs to form a company, but Henry defaults on the loan, Rockefeller loses x units, and Henry’s payoff depends on his type: if he is a conscientious borrower, he receives 0, but if he has no compunction about defaulting, he gains 150 units. Answer the following questions:
(a) Describe this situation as an extensive-form game, where the root of the game tree is a chance move that determines Henry’s type.
(b) Find all the Nash equilibria, and the sequential equilibria, of this game, in the following three cases: (i) p = 1/3, and x = 100. (ii) p = 0.1, and x = 50. (iii) p = 0, and x = 75.
7.55 The one-stage deviation principle for sequential equilibria Let (σ, μ) be a consistent assessment in an extensive-form game Γ with perfect recall. Prove that the assessment (σ, μ) is a sequential equilibrium if and only if for each player i ∈ N, and every information set $U_i$,

$$u_i(\sigma \mid U_i, \mu) \ge u_i((\hat{\sigma}_i, \sigma_{-i}) \mid U_i, \mu), \tag{7.51}$$

under every strategy $\hat{\sigma}_i$ that differs from $\sigma_i$ only at the information set $U_i$.
Guidance: To prove that if the condition holds then (σ, μ) is a sequential equilibrium, consider a player i and any information set $U_i$ of his, along with any strategy $\sigma_i'$. Show that $u_i(\sigma \mid U_i, \mu) \ge u_i((\sigma_i', \sigma_{-i}) \mid U_i, \mu)$. The proof of this inequality can be accomplished by induction on the number of information sets of player i over which $\sigma_i'$ differs from $\sigma_i$.
8 Correlated equilibria

Chapter summary
This chapter introduces the concept of correlated equilibrium in strategic-form games. The motivation for this concept is that players’ choices of pure strategies may be correlated due to the fact that they use the same random events in deciding which pure strategy to play. Consider an extended game that includes an observer who recommends to each player a pure strategy that he should play. The vector of recommended strategies is chosen by the observer according to a probability distribution over the set of pure strategy vectors, which is commonly known among the players. This probability distribution is called a correlated equilibrium if the strategy vector in which all players follow the observer’s recommendations is a Nash equilibrium of the extended game. The probability distribution over the set of strategy vectors induced by any Nash equilibrium is a correlated equilibrium. The set of correlated equilibria is a polytope that can be calculated as the solution set of a system of linear inequalities.
In Chapters 4, 5, and 7 we considered strategic-form games and studied the concept of equilibrium. One of the underlying assumptions of those chapters was that the choices made by the players were independent. In practice, however, the choices of players may well depend on factors outside the game, and therefore these choices may be correlated. Players can even coordinate their actions among themselves.

A good example of such correlation is the invention of the traffic light: when a motorist arrives at an intersection, he needs to decide whether to cross it, or alternatively to give right of way to motorists approaching the intersection from different directions. If the motorist were to use a mixed strategy in this situation, that would be tantamount to tossing a coin and entering the intersection based on the outcome of the coin toss. If two motorists approaching an intersection simultaneously use this mixed strategy, there is a positive probability that both of them will try to cross the intersection at the same time, which means that there is a positive probability that a traffic accident will ensue. In some states in the United States, there is an equilibrium rule that requires motorists to stop before entering an intersection, and to give right of way to whoever arrived at the intersection earlier. The invention of the traffic light provided a different solution: the traffic light informs each motorist which pure strategy to play, at any given time. The traffic light thus correlates the pure strategies of the players. Note that the traffic light does not, strictly speaking, choose a pure strategy for the motorist; it recommends a pure strategy. It is in the interest of each motorist to follow that recommendation, even if we suppose there are no traffic police watching, no cameras, and no possible court summons awaiting a motorist who disregards the traffic light’s recommendation.
The concept of correlated equilibrium, which is an equilibrium in a game where players’ strategies may be correlated, is the subject of this chapter. As we will show, correlation can be beneficial to the players.
8.1 Examples
Example 8.1 Battle of the Sexes Consider the Battle of the Sexes game, as depicted in Figure 8.1 (see also Example 4.21 on page 98). The game has three equilibria (verify that this is true):
1. (F, F): the payoff is (2, 1).
2. (C, C): the payoff is (1, 2).
3. ([2/3(F), 1/3(C)], [1/3(F), 2/3(C)]): in this equilibrium, every player uses mixed strategies. The row player plays [2/3(F), 1/3(C)]: he chooses F with probability two-thirds, and C with probability one-third. The column player plays [1/3(F), 2/3(C)]. The expected payoff in this case is (2/3, 2/3).
                 Player II
                 F        C
Player I    F    2, 1     0, 0
            C    0, 0     1, 2

Figure 8.1 The Battle of the Sexes
The first two equilibria are not symmetric; in each one, one of the players yields to the preference of the other player. The third equilibrium, in contrast, is symmetric and gives the same payoff to both players, but that payoff is less than 1, the lower payoff in each of the two pure equilibria. The players can correlate their actions in the following way. They can toss a fair coin. If the coin comes up heads, they play (F, F), and if it comes up tails, they play (C, C). The expected payoff is then (1½, 1½). Since (F, F) and (C, C) are equilibria, the process we have just described is an equilibrium in an extended game, in which the players can toss a coin and choose their strategies in accordance with the result of the coin toss: after the coin toss, neither player can profit by unilaterally deviating from the strategy recommended by the result of the coin toss. ◭
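The payoff comparison in Example 8.1 can be reproduced with a few lines of exact arithmetic. The following is a minimal sketch, assuming the payoffs of Figure 8.1; the names `U`, `expected_payoff`, `coin`, and `mixed` are illustrative and do not come from the text.

```python
from fractions import Fraction as F

# Payoffs of Figure 8.1: (row action, column action) -> (u1, u2).
U = {("F", "F"): (2, 1), ("F", "C"): (0, 0),
     ("C", "F"): (0, 0), ("C", "C"): (1, 2)}

def expected_payoff(dist):
    """Expected payoff pair under a distribution over action vectors."""
    return tuple(sum(p * U[s][k] for s, p in dist.items()) for k in range(2))

# Public fair coin: heads -> (F, F), tails -> (C, C).
coin = {("F", "F"): F(1, 2), ("C", "C"): F(1, 2)}

# Independent mixing of the third equilibrium:
# Player I plays F with probability 2/3, Player II plays F with probability 1/3.
x, y = F(2, 3), F(1, 3)
mixed = {("F", "F"): x * y, ("F", "C"): x * (1 - y),
         ("C", "F"): (1 - x) * y, ("C", "C"): (1 - x) * (1 - y)}

print(expected_payoff(coin))   # payoff of the public coin toss
print(expected_payoff(mixed))  # payoff of the mixed equilibrium
```

The coin toss puts no weight on the (0, 0) entries, which is why its payoff of 3/2 per player exceeds the 2/3 obtained under independent mixing.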
The reasoning behind this example is as follows: if we enable the players to conduct a joint (public) lottery, prior to playing the game, they can receive as an equilibrium payoff every convex combination of the equilibrium payoffs of the original game. That is, if we denote by V the set of equilibrium payoffs in the original game, every payoff in the convex hull of V is an equilibrium payoff in the extended game in which the players can conduct a joint lottery prior to playing the game. The question naturally arises whether it is possible to create a correlation mechanism, such that the set of equilibrium payoffs in the game that corresponds to this mechanism includes payoffs that are not in the convex hull of V . The following examples show that the answer to this question is affirmative.
Example 8.2 Consider the three-player game depicted in Figure 8.2, in which Player I chooses the row (T or B), Player II chooses the column (L or R), and Player III chooses the matrix (l, c, or r).
Matrix l:
         L          R
T     0, 1, 3    0, 0, 0
B     1, 1, 1    1, 0, 0

Matrix c:
         L          R
T     2, 2, 2    0, 0, 0
B     2, 2, 0    2, 2, 2

Matrix r:
         L          R
T     0, 1, 0    0, 0, 0
B     1, 1, 1    1, 0, 3

Figure 8.2 The payoff matrix of Example 8.2
We will show that the only equilibrium payoff of this game is (1, 1, 1), but there exists a correlation mechanism that induces an equilibrium payoff of (2, 2, 2). In other words, every player gains by using the correlation mechanism. Since (1, 1, 1) is the only equilibrium payoff of the original game, the vector (2, 2, 2) is clearly outside the convex hull of the original game’s set of equilibrium payoffs. Step 1: The only equilibrium payoff is (1, 1, 1). We will show that every equilibrium is of the form (B, L, [α(l), (1 − α)(r)]), for some 0 ≤ α ≤ 1. (Check that the payoff given by any strategy vector of this form is (1, 1, 1), and that each of these strategy vectors is indeed an equilibrium.) To this end we eliminate strictly dominated strategies (see definition 4.6 on page 86). We first establish that at every equilibrium there is a positive probability that the pair of pure strategies chosen by Players II and III will not be (L, c). To see this, when Player II plays L, strategy l strictly dominates strategy c for Player III, so it cannot be the case that at equilibrium Player II plays L with probability 1 and Player III plays c with probability 1. We next show that at every equilibrium, Player I plays strategy B. To see this, note that the pure strategy B weakly dominates T (for Player I). In addition, if the probability of (L, c) is not 1, strategy B yields a strictly higher payoff to Player I than strategy T . It follows that the pure strategy T cannot be played at equilibrium. Finally, we show that at every equilibrium Player II plays strategy L and Player III plays either l or r. To see this, note that after eliminating strategy T , strategy r strictly dominates c for Player III, hence Player III does not play c at equilibrium, and after eliminating strategy c, strategy L strictly dominates R for Player II. We are left with only two entries in the matrix: (B, L, l) and (B, L, r), both of which yield the same payoff, (1, 1, 1). 
Thus any convex combination of these two matrix entries is an equilibrium, and there are no other equilibria. Step 2: The construction of a correlation mechanism leading to the payoff (2, 2, 2). Consider the following mechanism that the players can implement:
• Players I and II toss a fair coin, but do not reveal the result of the coin toss to Player III.
• Players I and II play either (T, L) or (B, R), depending on the result of the coin toss.
• Player III chooses strategy c.

Under the implementation of this mechanism, the action vectors that are chosen (with equal probability) are (T, L, c) and (B, R, c), hence the payoff is (2, 2, 2). Finally, we check that no player has a unilateral deviation that improves his payoff. Recall that because the payoff function is multilinear, it suffices to check whether or not this is true for a deviation to a pure strategy. If Player III deviates and chooses l or r, his expected payoff is 1/2 × 3 + 1/2 × 0 = 1½, and hence he cannot gain from deviating. Players I and II cannot profit from deviating, because whatever the outcome of the coin toss is, the payoff to each of them is 2, the maximal payoff in the game. ◭
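The deviation check at the end of Example 8.2 can be carried out mechanically. The sketch below assumes the three payoff matrices of Figure 8.2; the dictionary layout `U[matrix][row][column]` and the helper `payoff_III` are illustrative choices, not notation from the text.

```python
from fractions import Fraction as F

# Payoffs of Figure 8.2: U[matrix][row][column] = (u1, u2, u3).
U = {
    "l": {"T": {"L": (0, 1, 3), "R": (0, 0, 0)}, "B": {"L": (1, 1, 1), "R": (1, 0, 0)}},
    "c": {"T": {"L": (2, 2, 2), "R": (0, 0, 0)}, "B": {"L": (2, 2, 0), "R": (2, 2, 2)}},
    "r": {"T": {"L": (0, 1, 0), "R": (0, 0, 0)}, "B": {"L": (1, 1, 1), "R": (1, 0, 3)}},
}
half = F(1, 2)
coin = {("T", "L"): half, ("B", "R"): half}  # coin seen only by Players I and II

def payoff_III(matrix):
    """Player III's expected payoff from a matrix choice, not knowing the coin."""
    return sum(p * U[matrix][row][col][2] for (row, col), p in coin.items())

print({m: payoff_III(m) for m in "lcr"})

# Players I and II know the coin result, so each outcome is checked separately
# (Player III plays c): no unilateral change of row or column raises the payoff.
for row, col in coin:
    assert all(U["c"][a][col][0] <= U["c"][row][col][0] for a in "TB")
    assert all(U["c"][row][b][1] <= U["c"][row][col][1] for b in "LR")
```

Player III obtains 2 from c and only 3/2 from l or r, precisely because he cannot tell which of (T, L) and (B, R) is being played.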
For the mechanism described in Example 8.2 to be an equilibrium, it is necessary that Players I and II know that Player III does not know the result of the coin toss. In other words, while every payoff in the convex hull of the set of equilibrium payoffs can be attained by a public lottery, to attain a payoff outside the convex hull of V it is necessary to conduct a lottery that is not public, in which case different players receive different partial information regarding the result of the lottery.
Example 8.3 The game of “Chicken” Consider the two-player non-zero-sum game depicted in Figure 8.3.
                 Player II
                 L        R
Player I    T    6, 6     2, 7
            B    7, 2     0, 0

Figure 8.3 The game of “Chicken”
The following background story usually accompanies this game. Two drivers are racing directly towards each other down a single-lane road. The first to lose his nerve and swerve off the road before the cars collide is the loser of the game, the “chicken.” In this case, the utility of the loser is 2, and the utility of the winner is 7. If neither player drives off the road, the cars collide, both players are injured, and they each have a utility of 0. If they both swerve off the road simultaneously, the utility of each of them is 6. The game has three equilibria (check that this is true):
1. The players play (T, R). The payoff is (2, 7).
2. The players play (B, L). The payoff is (7, 2).
3. The players play ([2/3(T), 1/3(B)], [2/3(L), 1/3(R)]). The payoff is (4⅔, 4⅔).
Consider the following mechanism, in which an outside observer gives each player a recommendation regarding which action to take, but the observer does not reveal to either player what recommendation the other player has received. The observer chooses between three action vectors, (T , L), (T , R), and (B, L), with equal probability (see Figure 8.4).
         L       R
T       1/3     1/3
B       1/3      0

Figure 8.4 The distribution that the observer uses to choose the action vector
After conducting a lottery to choose one of the three action vectors, the observer provides Player I with a recommendation to play the first coordinate of the vector that was chosen, and he provides Player II with a recommendation to play the second coordinate of that vector. For example, if the action vector (T, L) has been chosen, the observer recommends T to Player I and L to Player II. If Player I receives a recommendation to play T, the conditional probability that Player II has received a recommendation to play L is (1/3)/(1/3 + 1/3) = 1/2, which is also the conditional probability that he has received a recommendation to play R. In contrast, if Player I receives a recommendation to play B, he knows that Player II has received L as his recommended action.

We now show that neither player can profit by a unilateral deviation from the recommendation received from the observer. As we stated above, if the recommendation to Player I is to play T, Player II has received a recommendation to play L with probability 1/2, and a recommendation to play R with probability 1/2. Player I’s expected payoff if he follows the recommended strategy of T is therefore 1/2 × 6 + 1/2 × 2 = 4, while his expected payoff if he deviates and plays B is 1/2 × 7 + 1/2 × 0 = 3½. In this case, Player I cannot profit by unilaterally deviating from the recommended strategy. If the recommendation to Player I is to play B, then with certainty Player II has received a recommendation to play L. The payoff to Player I in this case is then 7 if he plays the recommended strategy B, and only 6 if he deviates to T. Again, in this case, Player I cannot profit by deviating from the recommended strategy. By symmetry, Player II similarly cannot profit by not following his recommended strategy. It follows that this mechanism induces an equilibrium in the extended game with an outside observer. The expected equilibrium payoff is

$$\tfrac{1}{3}(6, 6) + \tfrac{1}{3}(7, 2) + \tfrac{1}{3}(2, 7) = (5, 5), \tag{8.1}$$

which lies outside the convex hull of the three equilibrium payoffs of the original game, (2, 7), (7, 2), and (4⅔, 4⅔). (A quick way to become convinced of this is to notice that the sum of the payoffs in the vector (5, 5) is 10, while the sum of the payoffs in the three equilibrium payoffs is either 9 or 9⅓, both of which are less than 10.) ◭
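The conditional-probability computation of Example 8.3 can be sketched as follows, assuming the payoffs of Figure 8.3 and the distribution of Figure 8.4. The helper name `conditional_payoff_I` is hypothetical and is introduced only for this illustration.

```python
from fractions import Fraction as F

# Payoffs of Figure 8.3 and the observer's distribution of Figure 8.4.
U = {("T", "L"): (6, 6), ("T", "R"): (2, 7), ("B", "L"): (7, 2), ("B", "R"): (0, 0)}
third = F(1, 3)
p = {("T", "L"): third, ("T", "R"): third, ("B", "L"): third, ("B", "R"): F(0)}

def conditional_payoff_I(recommended, played):
    """Player I's expected payoff from `played`, given his recommendation."""
    marginal = sum(p[(recommended, c)] for c in "LR")
    return sum(p[(recommended, c)] / marginal * U[(played, c)][0] for c in "LR")

# For each recommendation, compare following it with deviating from it.
for rec in "TB":
    print(rec, {a: str(conditional_payoff_I(rec, a)) for a in "TB"})

# Unconditional expected payoff of the mechanism, as in Equation (8.1).
expected = tuple(sum(p[s] * U[s][k] for s in p) for k in range(2))
print(expected)
```

By the symmetry of the game and of the distribution, the same numbers are obtained for Player II.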
Examples 8.1 and 8.3 show that the way to attain high payoffs for both players is to avoid the “worst” payoff (0, 0). This cannot be accomplished if the players implement independent mixed strategies; it requires correlating the players’ actions. We have made the following assumptions regarding the extended game:
• The game includes an observer, who recommends strategies to the players.
• The observer chooses his recommendations probabilistically, based on a probability distribution that is commonly known to the players.
• The recommendations are private, with each player knowing only the recommendation addressed to him or her.
• The mechanism is common knowledge¹ among the players: each player knows that this mechanism is being used, each player knows that the other players know that this mechanism is being used, each player knows that the other players know that the other players know that this mechanism is being used, and so forth.

As we will see in the formal definition of correlated equilibria in the next section, the fact that the recommendations are privately provided to each player does not exclude the possibility that the recommendations may be public (in which case the recommendations to each player are identical), or that a player can deduce which recommendations the other players have received given the recommendation he has received, as we saw in Example 8.3: in the correlated equilibrium of the game of “Chicken,” if Player I receives the recommendation to play B, he can deduce that Player II’s recommended strategy is L.

¹ See Definition 4.9 (page 87). The formal definition of common knowledge is Definition 9.2 on page 321.

8.2 Definition and properties of correlated equilibrium
The concept of correlated equilibrium formally captures the sort of correlation that we saw in Example 8.3. In that example, we added an outside observer to the strategic game G who chooses a pure strategy vector, and recommends that each player play his part in this vector. We will now present the formal definition of this concept. To distinguish between the strategies in the strategic-form game G and the strategies in the game that includes the observer we will call pure strategies in G actions.

Let $G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ be a strategic-form game, where N is the set of players, $S_i$ is the set of actions of player i ∈ N, and $u_i : S \to \mathbb{R}$ is player i’s payoff function, where $S = \times_{i \in N} S_i$ is the set of action vectors. For every probability distribution p over the set S, define a game Γ*(p) as follows:
• An outside observer probabilistically chooses an action vector from S, according to the probability distribution p.
• To each player i ∈ N the observer reveals $s_i$, but not $s_{-i}$. In other words, the observer reveals to player i his coordinate in the action vector that was chosen; this is to be interpreted as the recommended action to play.
• Each player i chooses an action $s_i' \in S_i$ ($s_i'$ may be different from the action revealed by the observer).
• The payoff of each player i is $u_i(s_1', \ldots, s_n')$.

This describes an extensive-form game with information sets. A presentation of the extensive-form game corresponding to the game of “Chicken,” with the addition of the correlation mechanism described above, is shown in Figure 8.5. Near every chance move in the figure, we have noted the respective recommendation of the observer for that choice. The actions T1 and T2 in the figure correspond to the action T in the strategic-form game: T1 represents the possible action T when the observer’s recommendation is T; T2 represents the possible action T when the observer’s recommendation is B. Actions B1 and B2 similarly correspond to action B, and so forth.

The information revealed by the observer to player i will be termed a recommendation: the observer recommends that player i play the action $s_i$ in the original game. The player is not obligated to follow the recommendation he receives, and is free to play a different action (or to use a mixed action, i.e., to conduct a lottery in order to choose between several actions). A player’s pure strategy in an extensive-form game with information sets is a function that maps each of that player’s information sets to a possible action. Since every information set in the game Γ*(p) is associated with a recommendation of the observer, and the set of possible actions at each information set of player i is $S_i$, we obtain the following definition of a pure strategy in Γ*(p).
Definition 8.4 A (pure) strategy of player i in the game Γ*(p) is a function $\tau_i : S_i \to S_i$ mapping every recommendation $s_i$ of the observer to an action $\tau_i(s_i) \in S_i$.

Figure 8.5 The game of “Chicken,” for the probability distribution p given in Figure 8.4, in extensive form

Suppose the observer has recommended that player i play the action $s_i$. This fact enables player i to deduce the following regarding the recommendations that the other players have received: since the probability that player i receives recommendation $s_i$ is

$$\sum_{t_{-i} \in S_{-i}} p(s_i, t_{-i}), \tag{8.2}$$

the conditional probability that the observer has chosen the action vector $s = (s_i, s_{-i})$ is

$$p(s_{-i} \mid s_i) = \frac{p(s_i, s_{-i})}{\sum_{t_{-i} \in S_{-i}} p(s_i, t_{-i})}. \tag{8.3}$$

The conditional probability in Equation (8.3) is defined when the denominator is positive, i.e., when the probability that player i receives recommendation $s_i$ is positive. When $\sum_{t_{-i} \in S_{-i}} p(s_i, t_{-i}) = 0$, the probability that player i receives recommendation $s_i$ is zero, and in this case the conditional probability $p(s_{-i} \mid s_i)$ is undefined.

One strategy available to player i is to follow the observer’s recommendation. For each player i ∈ N, define a strategy $\tau_i^*$ by:

$$\tau_i^*(s_i) = s_i, \quad \forall s_i \in S_i. \tag{8.4}$$

Is the pure strategy vector $\tau^* = (\tau_1^*, \ldots, \tau_n^*)$, in which each player i follows the observer’s recommendation, an equilibrium? As might be expected, the answer to that question depends on the probability distribution p, as specified in the following theorem.
Theorem 8.5 The strategy vector $\tau^*$ is an equilibrium of the game Γ*(p) if and only if

$$\sum_{s_{-i} \in S_{-i}} p(s_i, s_{-i})\, u_i(s_i, s_{-i}) \ge \sum_{s_{-i} \in S_{-i}} p(s_i, s_{-i})\, u_i(s_i', s_{-i}), \quad \forall i \in N,\ \forall s_i, s_i' \in S_i. \tag{8.5}$$

Proof: The strategy vector $\tau^*$, in which each player follows the recommendation he receives, is an equilibrium if and only if no player i can profit by deviating to a strategy that differs from his recommendation. Equation (8.3) implies that the expected payoff that player i has under the strategy vector $\tau^*$, when his recommended action is $s_i$, is

$$\sum_{s_{-i} \in S_{-i}} \frac{p(s_i, s_{-i})}{\sum_{t_{-i} \in S_{-i}} p(s_i, t_{-i})} \times u_i(s_i, s_{-i}). \tag{8.6}$$

Suppose player i decides to deviate and play action $s_i'$ instead of $s_i$, while the other players follow the recommendations (i.e., play $\tau_{-i}^*$). The distribution of the actions of the other players is given by the conditional probability in Equation (8.3), and therefore player i’s expected payoff if he deviates to action $s_i'$ is

$$\sum_{s_{-i} \in S_{-i}} \frac{p(s_i, s_{-i})}{\sum_{t_{-i} \in S_{-i}} p(s_i, t_{-i})} \times u_i(s_i', s_{-i}). \tag{8.7}$$

This means that the strategy vector $\tau^*$ is an equilibrium if and only if for each player i ∈ N, for each action $s_i \in S_i$ for which $\sum_{s_{-i} \in S_{-i}} p(s_i, s_{-i}) > 0$, and for each action $s_i' \in S_i$:

$$\sum_{s_{-i} \in S_{-i}} \frac{p(s_i, s_{-i})}{\sum_{t_{-i} \in S_{-i}} p(s_i, t_{-i})} \times u_i(s_i, s_{-i}) \ge \sum_{s_{-i} \in S_{-i}} \frac{p(s_i, s_{-i})}{\sum_{t_{-i} \in S_{-i}} p(s_i, t_{-i})} \times u_i(s_i', s_{-i}). \tag{8.8}$$

When the denominator in this inequality is positive, we can multiply both sides by it to obtain Equation (8.5). When $\sum_{t_{-i} \in S_{-i}} p(s_i, t_{-i}) = 0$, Equation (8.5) holds true with equality: since $(p(s_i, t_{-i}))_{t_{-i} \in S_{-i}}$ are nonnegative numbers, it is necessarily the case that $p(s_i, t_{-i}) = 0$ for each $t_{-i} \in S_{-i}$, and hence both sides of the inequality in Equation (8.5) are identically zero.

We can now define the concept of correlated equilibrium.

Definition 8.6 A probability distribution p over the set of action vectors S is called a correlated equilibrium if the strategy vector $\tau^*$ is a Nash equilibrium of the game Γ*(p). In other words, for every player i ∈ N:

$$\sum_{s_{-i} \in S_{-i}} p(s_i, s_{-i})\, u_i(s_i, s_{-i}) \ge \sum_{s_{-i} \in S_{-i}} p(s_i, s_{-i})\, u_i(s_i', s_{-i}), \quad \forall s_i, s_i' \in S_i. \tag{8.9}$$
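Inequality (8.9) is a finite list of linear conditions on p, so it can be checked directly for any given distribution. The following is a minimal sketch of such a check for a finite game with any number of players; the function name `is_correlated_equilibrium` and the dictionary-based representation of u and p are assumptions made for this illustration, not notation from the text.

```python
from fractions import Fraction as F
from itertools import product

def is_correlated_equilibrium(actions, u, p):
    """Check inequality (8.9) for every player i and every pair (s_i, s_i').

    actions: list of action lists, one per player.
    u: dict mapping an action vector (tuple) to a tuple of payoffs.
    p: dict mapping an action vector (tuple) to its probability
       (vectors that are absent have probability 0).
    """
    for i, S_i in enumerate(actions):
        for s_i, dev in product(S_i, repeat=2):
            gain = 0  # right-hand side minus left-hand side of (8.9)
            for s in product(*actions):
                if s[i] != s_i:
                    continue
                deviated = s[:i] + (dev,) + s[i + 1:]
                gain += p.get(s, 0) * (u[deviated][i] - u[s][i])
            if gain > 0:  # deviating from s_i to dev is strictly profitable
                return False
    return True

# Usage: the game of "Chicken" (Figure 8.3) with the distribution of Figure 8.4,
# and with the degenerate distribution concentrated on (T, L).
actions = [["T", "B"], ["L", "R"]]
u = {("T", "L"): (6, 6), ("T", "R"): (2, 7), ("B", "L"): (7, 2), ("B", "R"): (0, 0)}
p = {("T", "L"): F(1, 3), ("T", "R"): F(1, 3), ("B", "L"): F(1, 3)}
print(is_correlated_equilibrium(actions, u, p))
print(is_correlated_equilibrium(actions, u, {("T", "L"): 1}))
```

The check works with the unnormalized form (8.9) rather than the conditional form (8.8), so recommendations of probability zero need no special treatment.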
Every strategy vector σ induces a probability distribution $p_\sigma$ over the set of action vectors S,

$$p_\sigma(s_1, \ldots, s_n) := \sigma_1(s_1) \times \sigma_2(s_2) \times \cdots \times \sigma_n(s_n). \tag{8.10}$$

Under a Nash equilibrium $\sigma^*$ the actions that each player chooses with positive probability are only those that give him maximal payoffs given that the other players implement the strategy vector $\sigma_{-i}^*$,

$$u_i(s_i, \sigma_{-i}^*) \ge u_i(s_i', \sigma_{-i}^*), \quad \forall s_i \in \mathrm{supp}(\sigma_i^*),\ \forall s_i' \in S_i. \tag{8.11}$$

This leads to the following theorem (whose proof is left to the reader in Exercise 8.2).

Theorem 8.7 For every Nash equilibrium $\sigma^*$, the probability distribution $p_{\sigma^*}$ is a correlated equilibrium.

As Theorem 8.7 indicates, correlated equilibrium is in a sense an extension of the Nash equilibrium concept. When we regard a Nash equilibrium $\sigma^*$ as a correlated equilibrium we mean the probability distribution $p_{\sigma^*}$ given by Equation (8.10). For example, the convex hull of the set of Nash equilibria is the set

$$\mathrm{conv}\{p_{\sigma^*} : \sigma^* \text{ is a Nash equilibrium}\} \subseteq \Delta(S). \tag{8.12}$$
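Theorem 8.7 can be illustrated numerically on one instance. The sketch below takes the mixed equilibrium of the game of “Chicken” from Example 8.3, builds the product distribution of Equation (8.10), and evaluates the gain from every deviation in the sense of inequality (8.9). The names `sigma1`, `sigma2`, `gain_I`, and `gain_II` are illustrative assumptions.

```python
from fractions import Fraction as F
from itertools import product

# The game of "Chicken" (Figure 8.3) and its mixed Nash equilibrium.
u = {("T", "L"): (6, 6), ("T", "R"): (2, 7), ("B", "L"): (7, 2), ("B", "R"): (0, 0)}
sigma1 = {"T": F(2, 3), "B": F(1, 3)}
sigma2 = {"L": F(2, 3), "R": F(1, 3)}

# Equation (8.10): the product distribution induced by (sigma1, sigma2).
p = {(a, b): sigma1[a] * sigma2[b] for a, b in product(sigma1, sigma2)}

def gain_I(rec, dev):
    """Right-hand side minus left-hand side of (8.9) for Player I."""
    return sum(p[(rec, b)] * (u[(dev, b)][0] - u[(rec, b)][0]) for b in sigma2)

def gain_II(rec, dev):
    """Right-hand side minus left-hand side of (8.9) for Player II."""
    return sum(p[(a, rec)] * (u[(a, dev)][1] - u[(a, rec)][1]) for a in sigma1)

gains = ([gain_I(r, d) for r, d in product(sigma1, repeat=2)]
         + [gain_II(r, d) for r, d in product(sigma2, repeat=2)])
print({s: str(q) for s, q in p.items()})
print([str(g) for g in gains])
```

Every gain is zero here because both players mix over all their actions and are therefore indifferent; this is one example only, and the general statement is the subject of Exercise 8.2.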
Since every finite strategic-form game has a Nash equilibrium, we deduce the following corollary.

Corollary 8.8 Every finite strategic-form game has a correlated equilibrium.

Theorem 8.9 The set of correlated equilibria of a finite game is convex and compact.

Proof: Recall that a half-space in $\mathbb{R}^m$ is defined by a vector $\alpha \in \mathbb{R}^m$ and a real number $\beta \in \mathbb{R}$, by the following equation:

$$H^+(\alpha, \beta) := \left\{ x \in \mathbb{R}^m : \sum_{i=1}^{m} \alpha_i x_i \ge \beta \right\}. \tag{8.13}$$

A half-space is a convex and closed set. Equation (8.9) implies that the set of correlated equilibria of a game is given by the intersection of a finite number of half-spaces. Since an intersection of convex and closed sets is convex and closed, the set of correlated equilibria is convex and closed. Since the set of correlated equilibria is a subset of the set of probability distributions Δ(S), it is a bounded set, and so we conclude that it is a convex and compact set.

Remark 8.10 A polytope in $\mathbb{R}^d$ is the convex hull of a finite number of points in $\mathbb{R}^d$. The minimal set of points satisfying the condition that the polytope is its convex hull is called the set of extreme points of the polytope. (For the definition of the extreme points of a general set see Definition 23.2 on page 917.) Every bounded set defined by the intersection of a finite number of half-spaces is a polytope, from which it follows that the set of correlated equilibria of a game is a polytope. Since there exist efficient algorithms for finding the extreme points of a polytope (such as the simplex algorithm), it is relatively easy to compute correlated equilibria, in contrast to computing Nash equilibria, which is computationally hard. (See, for example, Gilboa and Zemel [1989].)
Example 8.1 (Continued) Consider again the Battle of the Sexes, which is the two-player game shown in Figure 8.6.

                 Player II
                 F        C
Player I    F    2, 1     0, 0
            C    0, 0     1, 2

Figure 8.6 Battle of the Sexes
We will compute the correlated equilibria of this game. Denote a probability distribution over the action vectors by p = [α(F, F ), β(F, C), γ (C, F ), δ(C, C)]. Figure 8.7 depicts this distribution graphically.
                 Player II
                 F        C
Player I    F    α        β
            C    γ        δ

Figure 8.7 Graphic representation of the probability distribution p
For a probability distribution p = [α(F, F), β(F, C), γ(C, F), δ(C, C)] to be a correlated equilibrium, the following inequalities must be satisfied (see Equation (8.9)):

$$\alpha u_1(F, F) + \beta u_1(F, C) \ge \alpha u_1(C, F) + \beta u_1(C, C), \tag{8.14}$$
$$\gamma u_1(C, F) + \delta u_1(C, C) \ge \gamma u_1(F, F) + \delta u_1(F, C), \tag{8.15}$$
$$\alpha u_2(F, F) + \gamma u_2(C, F) \ge \alpha u_2(F, C) + \gamma u_2(C, C), \tag{8.16}$$
$$\beta u_2(F, C) + \delta u_2(C, C) \ge \beta u_2(F, F) + \delta u_2(C, F), \tag{8.17}$$
$$\alpha + \beta + \gamma + \delta = 1, \tag{8.18}$$
$$\alpha, \beta, \gamma, \delta \ge 0. \tag{8.19}$$

Entering the values of the game matrix into these equations, we get

$$2\alpha \ge \beta, \quad \delta \ge 2\gamma, \quad 2\delta \ge \beta, \quad \alpha \ge 2\gamma. \tag{8.20}$$

In other words, both α and δ must be at least 2γ and β/2. The set of possible payoffs of the game (the triangle formed by the coordinates (0, 0), (1, 2), and (2, 1)) is shown in Figure 8.8, with the game’s three Nash equilibrium payoffs ((1, 2), (2, 1), (2/3, 2/3)) along with the set of correlated equilibrium payoffs (the dark triangle formed by (1, 2), (2, 1), and (2/3, 2/3)). In this case, the set of correlated equilibrium payoffs is the convex hull of the Nash equilibrium payoffs.
Figure 8.8 The set of possible payoffs, the set of correlated equilibrium payoffs, and the Nash equilibrium payoffs of the game in Figure 8.1 ◭
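The four inequalities in (8.20) give a quick membership test for the set of correlated equilibria of the Battle of the Sexes. The sketch below applies the test to three distributions; the function name `satisfies_8_20` is hypothetical, and the arguments are ordered as p = [α(F, F), β(F, C), γ(C, F), δ(C, C)].

```python
from fractions import Fraction as F

def satisfies_8_20(alpha, beta, gamma, delta):
    """The four inequalities of (8.20) for p = [alpha, beta, gamma, delta]."""
    return (2 * alpha >= beta and delta >= 2 * gamma
            and 2 * delta >= beta and alpha >= 2 * gamma)

half, quarter = F(1, 2), F(1, 4)

# Public coin toss between (F, F) and (C, C), as in Example 8.1.
coin_ok = satisfies_8_20(half, 0, 0, half)

# Product distribution of the mixed equilibrium ([2/3(F), 1/3(C)], [1/3(F), 2/3(C)]).
mixed_ok = satisfies_8_20(F(2, 9), F(4, 9), F(1, 9), F(2, 9))

# Uniform distribution over the four action vectors.
uniform_ok = satisfies_8_20(quarter, quarter, quarter, quarter)

print(coin_ok, mixed_ok, uniform_ok)
```

For the mixed equilibrium all four inequalities hold with equality, whereas the uniform distribution violates δ ≥ 2γ and is therefore not a correlated equilibrium.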
Example 8.3 (Continued) The payoff matrix of the game in this example is shown in Figure 8.9.
                 Player II
                 L        R
Player I    T    6, 6     2, 7
            B    7, 2     0, 0

Figure 8.9 The game of “Chicken”
A probability distribution over the set of action vectors is again denoted by p = [α(T , L), β(T , R), γ (B, L), δ(B, R)] (see Figure 8.10).
                 Player II
                 L        R
Player I    T    α        β
            B    γ        δ

Figure 8.10 Graphic depiction of the probability distribution p
For the probability distribution p to be a correlated equilibrium (see Equation (8.9)), the following inequalities must be satisfied:

$$6\alpha + 2\beta \ge 7\alpha, \quad 7\gamma \ge 6\gamma + 2\delta, \quad 6\alpha + 2\gamma \ge 7\alpha, \quad 7\beta \ge 6\beta + 2\delta. \tag{8.21}$$

These inequalities imply that both β and γ must be at least 2δ and α/2. The set of possible payoffs of the game (the rhombus formed by the coordinates (0, 0), (7, 2), (2, 7), and (6, 6)) is shown in Figure 8.11, along with the game’s three Nash equilibrium payoffs ((7, 2), (2, 7), and (4⅔, 4⅔)), with their convex hull (the dark triangle) and the set of correlated equilibrium payoffs (the dark-grey rhombus formed by (3⅗, 3⅗), (7, 2), (2, 7), and (5¼, 5¼)).

Figure 8.11 The set of possible payoffs (light rhombus), the Nash equilibrium payoffs, the convex hull of the Nash equilibrium payoffs (dark triangle), and the correlated equilibrium payoffs (dark rhombus) of the game in Figure 8.3 ◭
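The corner payoffs of the set of correlated equilibrium payoffs can be recomputed from (8.21). The following sketch checks five candidate distributions p = [α(T, L), β(T, R), γ(B, L), δ(B, R)] against the inequalities and evaluates their expected payoffs; the candidate list and the helper names `satisfies_8_21` and `payoff` are assumptions made for this illustration.

```python
from fractions import Fraction as F

def satisfies_8_21(a, b, g, d):
    """Inequalities (8.21) for p = [a(T,L), b(T,R), g(B,L), d(B,R)]."""
    return (6 * a + 2 * b >= 7 * a and 7 * g >= 6 * g + 2 * d
            and 6 * a + 2 * g >= 7 * a and 7 * b >= 6 * b + 2 * d)

def payoff(a, b, g, d):
    """Expected payoff pair under p, using the matrix of Figure 8.9."""
    return (6 * a + 2 * b + 7 * g, 6 * a + 7 * b + 2 * g)

candidates = [
    (F(0), F(1), F(0), F(0)),              # pure equilibrium (T, R)
    (F(0), F(0), F(1), F(0)),              # pure equilibrium (B, L)
    (F(4, 9), F(2, 9), F(2, 9), F(1, 9)),  # mixed equilibrium, as a product distribution
    (F(1, 2), F(1, 4), F(1, 4), F(0)),     # no weight on (B, R), beta = gamma = alpha / 2
    (F(0), F(2, 5), F(2, 5), F(1, 5)),     # no weight on (T, L), beta = gamma = 2 * delta
]
for p in candidates:
    print(satisfies_8_21(*p), tuple(str(x) for x in payoff(*p)))
```

All five candidates satisfy (8.21). The last two yield the symmetric payoffs 21/4 and 18/5, the highest and lowest symmetric payoffs among them.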
Example 8.11 Consider the two-player game depicted in Figure 8.12, which resembles the Battle of the Sexes, but is not symmetric between the players. The game has three equilibria: (T, L), (B, R), and ([3/5(T), 2/5(B)], [2/3(L), 1/3(R)]).
                 Player II
                 L        R
Player I    T    1, 2     0, 0
            B    0, 0     2, 3

Figure 8.12 The payoff matrix of the game in Example 8.11
We will compute the correlated equilibria of the game. For a probability distribution over the set of action vectors p = [α(T, L), β(T, R), γ(B, L), δ(B, R)] to be a correlated equilibrium, the following inequalities must be satisfied (see Equation (8.9)):

$$\alpha \ge 2\beta, \tag{8.22}$$
$$2\delta \ge \gamma, \tag{8.23}$$
$$2\alpha \ge 3\gamma, \tag{8.24}$$
$$3\delta \ge 2\beta, \tag{8.25}$$
$$\alpha + \beta + \gamma + \delta = 1, \tag{8.26}$$
$$\alpha, \beta, \gamma, \delta \ge 0. \tag{8.27}$$

Note that the constraint α + β + γ + δ = 1 implies that:

$$2\delta \ge \gamma \iff \alpha + \beta + \tfrac{3}{2}\gamma \le 1, \tag{8.28}$$
$$3\delta \ge 2\beta \iff \alpha + \tfrac{5}{3}\beta + \gamma \le 1. \tag{8.29}$$

Figure 8.13 shows the sets defined by each of the four inequalities in Equations (8.22)–(8.25), along with the constraints that α, β, and γ be nonnegative, and that δ = 1 − α − β − γ ≥ 0. The intersection of these four sets is the set of correlated equilibria. To find this set, we will seek out its extreme points. The set of all the correlated equilibria is the subset of $\mathbb{R}^3$ defined by the intersection of eight half-spaces (Equations (8.22)–(8.25), along with the constraints that α ≥ 0, β ≥ 0, γ ≥ 0, and α + β + γ ≤ 1). Note that in this case, if α + (5/3)β + γ ≤ 1 then α + β + γ ≤ 1 (because β ≥ 0), and hence there is no need explicitly to require that α + β + γ ≤ 1. In addition, if we look at the hyperplanes defining the remaining seven half-spaces, we notice that three of them intersect at one point (there are $\binom{7}{3} = 35$ such intersection points, some of them identical to each other). Each such intersection point satisfying all the constraints is an extreme point.
Figure 8.13 The sets defined by the inequalities in Equations (8.22)–(8.25) (four panels, drawn in the coordinates (α, β, γ): α ≥ 2β; 2α ≥ 3γ; α + β + (3/2)γ ≤ 1; α + (5/3)β + γ ≤ 1)
A simple, yet tedious, calculation reveals that the set of all the correlated equilibria has five extreme points (recall that δ = 1 − α − β − γ):

$$(\alpha, \beta, \gamma) = (0, 0, 0), \tag{8.30}$$
$$(\alpha, \beta, \gamma) = (1, 0, 0), \tag{8.31}$$
$$(\alpha, \beta, \gamma) = \left(\tfrac{6}{11}, \tfrac{3}{11}, 0\right), \tag{8.32}$$
$$(\alpha, \beta, \gamma) = \left(\tfrac{1}{2}, 0, \tfrac{1}{3}\right), \tag{8.33}$$
$$(\alpha, \beta, \gamma) = \left(\tfrac{2}{5}, \tfrac{1}{5}, \tfrac{4}{15}\right). \tag{8.34}$$

It follows that the set of all the correlated equilibria is the smallest convex set containing these five points (see Figure 8.14). The three equilibrium points are: (T, L) corresponding to the point (1, 0, 0), (B, R) corresponding to the point (0, 0, 0), and ([3/5(T), 2/5(B)], [2/3(L), 1/3(R)]) corresponding to the point (2/5, 1/5, 4/15). In general, the Nash equilibria need not correspond to extreme points of the set of correlated equilibria.
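The “simple, yet tedious” calculation can be delegated to a short program. The sketch below enumerates all 35 triples of the seven bounding hyperplanes in the coordinates (α, β, γ), solves each 3×3 system exactly by Cramer's rule, and keeps the solutions that satisfy every constraint. The constraint encoding (a·x ≤ b with integer coefficients, obtained by clearing denominators in (8.28) and (8.29)) and the helper names `det3` and `solve3` are choices made for this illustration.

```python
from fractions import Fraction as F
from itertools import combinations

# Constraints in the form a . x <= b for x = (alpha, beta, gamma),
# with delta = 1 - alpha - beta - gamma eliminated.
constraints = [
    ((-1, 2, 0), 0),   # (8.22): alpha >= 2 beta
    ((-2, 0, 3), 0),   # (8.24): 2 alpha >= 3 gamma
    ((2, 2, 3), 2),    # (8.28): alpha + beta + (3/2) gamma <= 1
    ((3, 5, 3), 3),    # (8.29): alpha + (5/3) beta + gamma <= 1
    ((-1, 0, 0), 0),   # alpha >= 0
    ((0, -1, 0), 0),   # beta >= 0
    ((0, 0, -1), 0),   # gamma >= 0
]

def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(rows, rhs):
    """Solve a 3x3 linear system by Cramer's rule; None if it is singular."""
    d = det3(rows)
    if d == 0:
        return None
    solution = []
    for k in range(3):
        m = [list(r) for r in rows]
        for j in range(3):
            m[j][k] = rhs[j]  # replace column k by the right-hand side
        solution.append(F(det3(m), d))
    return tuple(solution)

vertices = set()
for triple in combinations(constraints, 3):
    x = solve3([a for a, _ in triple], [b for _, b in triple])
    if x is not None and all(
        sum(ai * xi for ai, xi in zip(a, x)) <= b for a, b in constraints
    ):
        vertices.add(x)

for v in sorted(vertices):
    print(tuple(str(c) for c in v))
```

Several triples lead to the same point, so the set `vertices` is smaller than the number of feasible triples; exact rational arithmetic avoids having to compare floating-point coordinates.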
Figure 8.14 The set of correlated equilibria of the game in Example 8.11 ◭

8.3 Remarks
This chapter is based on Aumann [1974], a major work in which the concept of correlated equilibrium was developed. The game in Exercise 8.21 was suggested by Yannick Viossat, in response to a question posed by Ehud Lehrer.
8.4 Exercises
8.1 What is the set of possible payoffs of the following game (the Battle of the Sexes game; see Example 8.1 on page 301) if:
(a) the players are permitted to decide, and commit to, the mixed strategies that each player will use;
(b) the players are permitted to make use of a public lottery that chooses a strategy vector and instructs each player which pure strategy to choose.

                  Player II
                  F        C
Player I    F     1, 2     0, 0
            C     0, 0     2, 1

8.2 Prove Theorem 8.7 on page 308: for every Nash equilibrium $\sigma^*$ in a strategic-form game, the probability distribution $p_{\sigma^*}$ that $\sigma^*$ induces on the set of action vectors S is a correlated equilibrium.

8.3 The set of all probability distributions $p_\sigma$ over the set of action vectors S that are induced by Nash equilibria σ is
$W := \{p_\sigma : \sigma \text{ is a Nash equilibrium}\} \subseteq \Delta(S).$ (8.35)
Prove that any point in the convex hull of W is a correlated equilibrium.

8.4 Prove that in every correlated equilibrium, the payoff to each player i is at least his maxmin value in mixed strategies,
$\underline{v}_i = \max_{\sigma_i \in \Sigma_i} \min_{\sigma_{-i} \in \Sigma_{-i}} U_i(\sigma_i, \sigma_{-i}).$ (8.36)

8.5 Given a strategic-form game $G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$, write out a linear program whose set of solution vectors is the set of correlated equilibria of the game.

8.6 Let $G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ and $\widehat{G} = (N, (S_i)_{i \in N}, (\widehat{u}_i)_{i \in N})$ be strategically equivalent games (see Definition 5.34 on page 174). What is the relation between the set of correlated equilibria of $G$ and the set of correlated equilibria of $\widehat{G}$? What is the relation between the set of correlated equilibrium payoffs of $G$ and the set of correlated equilibrium payoffs of $\widehat{G}$? Justify your answers.

8.7 Let $G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ be a game in strategic form, and let $\widehat{G}$ be the game derived from $G$ by a process of iterated elimination of strictly dominated strategies. What is the relation between the set of correlated equilibria of $G$ and the set of correlated equilibria of $\widehat{G}$? Justify your answer.

8.8 Find the correlated equilibrium that maximizes the sum of the players’ payoffs in Example 8.1 (page 301), and in Example 8.3 (page 303).

8.9 Find a correlated equilibrium whose expected payoff is $(\tfrac{40}{9}, \tfrac{36}{9})$ in the game of “Chicken” (Example 8.3 on page 303).

8.10 In the following game, compute all the Nash equilibria, and find a correlated equilibrium that is not in the convex hull of the Nash equilibria.

                  Player II
                  L        C        R
            T     0, 0     2, 4     4, 2
Player I    M     4, 2     0, 0     2, 4
            B     2, 4     4, 2     0, 0

8.11 Repeat Exercise 8.10 for the following game.

                  Player II
                  L        R
Player I    T     8, 8     4, 9
            B     9, 4     1, 1

8.12 In this exercise, we present an extension of the correlated equilibrium concept. Let $G = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ be a strategic-form game, and let $(M_i)_{i \in N}$ be finite sets of messages. For each probability distribution q over the product set $M := \times_{i \in N} M_i$ define a game $\Gamma^*_M(q)$ as follows:
• An outside observer chooses a vector of messages $m = (m_i)_{i \in N} \in M$ probabilistically, using the probability distribution q.
• The observer reveals $m_i$ to player $i \in N$, but not $m_{-i}$. In other words, the observer reveals to player i his coordinate in the vector of messages that has been chosen.
• Each player i chooses an action $s_i \in S_i$.
• Each player i has payoff $u_i(s_1, \ldots, s_n)$.
This is a generalization of the game $\Gamma^*(p)$, which is $\Gamma^*_M(q)$ for the case $M_i = S_i$ for every player i and q = p. Answer the following questions:
(a) What is the set of behavior strategies of player i in the game $\Gamma^*_M(q)$?
(b) Show that every vector of behavior strategies induces a probability distribution over the set of action vectors $S = \times_{i \in N} S_i$.
(c) Prove that at every Nash equilibrium of $\Gamma^*_M(q)$, the probability distribution induced on the set of pure strategy vectors S is a correlated equilibrium.

8.13 Show that there exists a unique correlated equilibrium in the following game, in which $a, b, c, d \in (-\tfrac{1}{4}, \tfrac{1}{4})$. Find this correlated equilibrium. What is the limit of the correlated equilibrium payoff as a, b, c, and d approach 0?

                  Player II
                  L        R
Player I    T     1, 0     c, 1 + d
            B     0, 1     1 + a, b

8.14 Let $s_i$ be a strictly dominated action of player i. Is there a correlated equilibrium under which $s_i$ is chosen with positive probability, i.e., $\sum_{s_{-i} \in S_{-i}} p(s_i, s_{-i}) > 0$? Justify your answer.

8.15 Prove that in a two-player zero-sum game, every correlated equilibrium payoff to Player I is the value of the game in mixed strategies.

8.16 In this and the following exercise, we will show that the result of Exercise 8.15 partially obtains for equilibrium strategies. Prove that if p is a correlated equilibrium of a two-player zero-sum game, then for every recommendation $s_I$ that Player I receives with positive probability, the conditional probability $(p(s_{II} \mid s_I))_{s_{II} \in S_{II}}$ is an optimal strategy for Player II. Deduce from this that the marginal distribution of p over the set of actions of each of the players is an optimal strategy for that player.

8.17 In the following two-player zero-sum game, find the value of the game, the optimal strategies of the two players, and the set of correlated equilibria. Does every correlated equilibrium lie in the convex hull of the product distributions that correspond to pairs of optimal strategies?

                  Player II
                  L    C    R
            T     0    0    1
Player I    M     1    1    0
            B     1    1    0

8.18 Prove that the set-valued function that assigns to every game its set of correlated equilibria is an upper semi-continuous mapping.² In other words, let $(G^k)_{k \in \mathbb{N}}$ be a sequence of games $G^k = (N, (S_i)_{i \in N}, (u_i^k)_{i \in N})$, all of which share the same set of players N and the same sets of actions $(S_i)_{i \in N}$. Further suppose that for each player i, the sequence of payoff functions $(u_i^k)_{k \in \mathbb{N}}$ converges to a limit $u_i$,
$\lim_{k \to \infty} u_i^k(s) = u_i(s), \quad \forall s \in S.$ (8.37)
Suppose that for each $k \in \mathbb{N}$ the probability distribution $p^k$ is a correlated equilibrium of $G^k$, and the sequence $(p^k)_{k \in \mathbb{N}}$ converges to a limit p,
$\lim_{k \to \infty} p^k(s) = p(s), \quad \forall s \in S.$ (8.38)
Prove that p is a correlated equilibrium of the game $(N, (S_i)_{i \in N}, (u_i)_{i \in N})$.

2 A set-valued function F : X → Y between two topological spaces is called upper semi-continuous if its graph Graph(F) = {(x, y) : y ∈ F(x)} is a closed set in the product space X × Y.

8.19 A Nash equilibrium $\sigma^* = (\sigma_i^*)_{i \in N}$ is called a strict equilibrium if for every player i and every action $s_i \in S_i$ satisfying $\sigma_i^*(s_i) = 0$,
$u_i(\sigma^*) > u_i(s_i, \sigma^*_{-i}).$ (8.39)
In words, if player i deviates by playing an action that is not in the support of $\sigma_i^*$ then he loses. A correlated equilibrium p is called a strict correlated equilibrium if the strategy vector $\tau^*$ is a strict equilibrium in the game $\Gamma^*(p)$. Answer the following questions:
(a) Does every game in strategic form have a strict correlated equilibrium? If your answer is yes, provide a proof. If your answer is no, provide a counterexample.
(b) Find all the strict correlated equilibria of the following two-player game.

                  Player II
                  L        R
Player I    T     4, 2     3, 4
            B     5, 1     0, 0

8.20 Harry (Player I) is to choose between the payoff vector (2, 1) and playing the following game, as a row player, against Harriet (Player II), the column player:

                  Player II
                  L        R
Player I    T     0, 0     1, 3
            B     4, 2     0, 0

(a) What are Harry’s pure strategies in this game? What are Harriet’s?
(b) What are the Nash equilibria of the game?
(c) What is the set of correlated equilibria of the game?

8.21 Let $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$ be positive numbers. Consider the two-player strategic game with the following payoff matrix.

                               Player II
            x_1, y_1   0, 0       0, 0       ...   0, y_1
            0, 0       x_2, y_2   0, 0       ...   0, y_2
Player I    0, 0       0, 0       x_3, y_3   ...   0, y_3
            ...        ...        ...        ...   ...
            x_1, 0     x_2, 0     x_3, 0     ...   x_n, y_n

(a) Find the set of Nash equilibria of this game.
(b) Prove that the set of correlated equilibria of this game is the convex hull of the set of Nash equilibria.

8.22 Let A and B be two sets in $\mathbb{R}^2$ satisfying:
• A ⊆ B;
• A is a union of a finite number of rectangles;
• B is the convex hull of a finite number of points.
Prove that there is a two-player strategic-form game satisfying the property that its set of Nash equilibrium payoffs is A, and its set of correlated equilibrium payoffs is B.
Hint: Make use of the game in Exercise 8.21, along with Exercise 5.44 in Chapter 5.

8.23 Let x, y, a, b be positive numbers. Consider the two-player strategic-form game with the following payoff matrix, in which Player I chooses a row, and Player II chooses a column.

x − 1, y + 1    0, 0            0, 0            x + 1, y − 1    0, y
x + 1, y − 1    x − 1, y + 1    0, 0            0, 0            0, y
0, 0            x + 1, y − 1    x − 1, y + 1    0, 0            0, y
0, 0            0, 0            x + 1, y − 1    x − 1, y + 1    0, y
x, 0            x, 0            x, 0            x, 0            a, b

(a) Find the set of Nash equilibria of this game.
(b) Find the set of correlated equilibria of this game.

9
Games with incomplete information and common priors
Chapter summary

In this chapter we study situations in which players do not have complete information on the environment they face. Due to the interactive nature of the game, modeling such situations involves not only the knowledge and beliefs of the players, but also the whole hierarchy of knowledge of each player, that is, knowledge of the knowledge of the other players, knowledge of the knowledge of the other players of the knowledge of other players, and so on. When the players have beliefs (i.e., probability distributions) on the unknown parameters that define the game, we similarly run into the need to consider infinite hierarchies of beliefs. The challenge of the theory was to incorporate these infinite hierarchies of knowledge and beliefs in a workable model.

We start by presenting the Aumann model of incomplete information, which models the knowledge of the players regarding the payoff-relevant parameters in the situation that they face. We define the knowledge operator, the concept of common knowledge, and characterize the collection of events that are common knowledge among the players. We then add to the model the notion of belief and prove Aumann’s agreement theorem: it cannot be common knowledge among the players that they disagree about the probability of a certain event.

An equivalent model to the Aumann model of incomplete information is a Harsanyi game with incomplete information. After presenting the game, we define two notions of equilibrium: the Nash equilibrium corresponding to the ex ante stage, before players receive information on the game they face, and the Bayesian equilibrium corresponding to the interim stage, after the players have received information. We prove that in a Harsanyi game these two concepts are equivalent. Finally, using games with incomplete information, we present Harsanyi’s interpretation of mixed strategies.

As we have seen, a very large number of real-life situations can be modeled and analyzed using extensive-form and strategic-form games. Yet, as Example 9.1 shows, there are situations that cannot be modeled using those tools alone.
Example 9.1 Consider the Matching Pennies game, which is depicted in Figure 9.1 in both extensive form and strategic form.

[Extensive form: Player I chooses T or B, and then Player II chooses L or R without observing Player I's choice. Strategic form:]

                  Player II
                  L         R
Player I    T     1, −1     −1, 1
            B     −1, 1     1, −1

Figure 9.1 The game of Matching Pennies, in extensive form and strategic form

Suppose that Player I knows that he is playing Matching Pennies, but believes that Player II does not know that the pure strategy R is available to her. In other words, Player I believes that Player II is convinced that she has only one pure strategy, L. Suppose further that Player II does in fact know that she (Player II) is playing Matching Pennies, with both pure strategies available. How can we model this game? Neither the extensive-form nor the strategic-form descriptions of the game enable us to model such a state of players’ knowledge and beliefs. If we try to analyze this situation using only the depictions of the game appearing in Figure 9.1, we will not be able to predict how the players will play, or recommend an optimal course of action. For example, as we showed on page 52, the optimal strategy of Player I playing Matching Pennies is the mixed strategy $[\tfrac{1}{2}(T), \tfrac{1}{2}(B)]$. But in the situation we have just described, Player I believes that Player II will play L, so that his best reply is the pure strategy T. Note that Player I’s optimal strategy depends only on how he perceives the game: what he knows about the game and what he believes Player II knows about the game. The way that Player II really perceives the game (which is not necessarily known to Player I) has no effect on the strategy chosen by Player I.

Consider next a slightly more complicated situation, in which Player I knows that he is playing Matching Pennies, he believes that Player II knows that she is playing Matching Pennies, and he believes that Player II believes that Player I does not know that the pure strategy B is available to him. Then Player I will believe that Player II believes that Player I will play strategy T, and he will therefore conclude that Player II will select strategy R, and Player I’s best strategy will therefore be B.
A similar situation obtains if there is incomplete information regarding some of the payoffs. For example, suppose that Player I knows that his payoff under the strategy profile (T, L) is 5 rather than 1, but believes that Player II does not know this, and that she thinks the payoff is 1. How should Player I play in this situation? Or consider an even more complicated situation, in which both Player I and Player II know that Player I’s payoff under (T, L) is 5, but Player II believes Player I does not know that she (Player II) knows this; Player II believes Player I believes Player II thinks the payoff is 1. ◭
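The best-reply reasoning in the first two situations of Example 9.1 can be checked numerically. The following is a minimal sketch, not part of the formal development: it takes Player I's payoffs from Figure 9.1, represents his belief about Player II's action as a probability distribution, and lists his payoff-maximizing pure actions. The names `U1`, `expected_payoff`, and `best_replies` are ours.

```python
from fractions import Fraction as F

# Player I's payoffs in Matching Pennies (Figure 9.1): rows T, B; columns L, R.
U1 = {("T", "L"): 1, ("T", "R"): -1, ("B", "L"): -1, ("B", "R"): 1}


def expected_payoff(row, belief):
    """Expected payoff to Player I of a pure action, given a belief over II's actions."""
    return sum(prob * U1[(row, col)] for col, prob in belief.items())


def best_replies(belief):
    """Pure actions of Player I that maximize his expected payoff under the belief."""
    values = {row: expected_payoff(row, belief) for row in ("T", "B")}
    best = max(values.values())
    return [row for row, value in values.items() if value == best]


print(best_replies({"L": F(1), "R": F(0)}))        # Player I is sure that II plays L
print(best_replies({"L": F(0), "R": F(1)}))        # Player I is sure that II plays R
print(best_replies({"L": F(1, 2), "R": F(1, 2)}))  # Player I thinks II mixes evenly
```

The computation uses only Player I's own payoffs and his belief about Player II's action, in line with the observation that the way Player II really perceives the game has no effect on the strategy chosen by Player I.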
Situations like those described in Example 9.1, in which players do not necessarily know which game is being played, or are uncertain about whether the other players know which game is being played, or are uncertain whether the other players know whether the other players know which game is being played, and so on, are called situations of “incomplete information.” In this chapter we study such situations, and see how they can be modeled and analyzed as games.

Notice that none of the situations described in Example 9.1 is well defined, as we have not precisely defined what the players know. For example, in the second case we did not specify what Player I knows about what Player II knows about what Player I knows about what Player II knows, and we did not touch upon what Player II knows. Consideration of hierarchies of levels of knowledge leads to the concept of common knowledge, which we touched upon in Section 4.5 (page 87). An informal definition of common knowledge is:

Definition 9.2 A fact F is common knowledge among the players of a game if all the players know F, all the players know that all the players know F, all the players know that all the players know that all the players know F, and so on (for every finite number of levels).¹

Definition 9.2 is incomplete, because we have not yet defined what we mean by a “fact,” nor have we defined the significance of the expression “knowing a fact.” These concepts will be modeled formally later in this chapter, but for now we will continue with an informal exposition.

So far we have seen that in situations involving several players, incomplete knowledge of the game that is being played leads us to consider infinite hierarchies of knowledge. In decision-making situations with incomplete information, the information that decision makers have usually cannot be captured by labeling a given fact as “known” or “unknown.” Decision makers often have assessments or beliefs about the truthfulness of various facts.
For example, when a person takes out a variable-rate loan he never has precise knowledge of the future fluctuations of the interest rate (which can significantly affect the total amount of loan repayment), but he may have certain beliefs about future rates, such as “I assign probability 0.7 to the event that there will be lower interest rates over the term of the loan.” To take another example, a company bidding for oil exploration rights in a certain geographical location has beliefs about the amount of oil likely to be found there and the depth of drilling required (which affects costs and therefore expected profits). A trial jury passing judgment on a defendant expresses certain collective beliefs about the question: is the defendant guilty as charged? For our purposes in this chapter, the source of such probabilistic assessments is of no importance. The assessments may be based on “objective” measurements such as geological surveys (as in the oil exploration example), on impressions (as in the case of a jury deliberating the judgment it will render in a trial), or on personal hunches and information published in the media (as in the example of the variable-rate loan). Thus, probability assessments may be objective or subjective.² In our models, a decision maker’s beliefs will be expressed by a probability distribution function over the possible values of parameters unknown to him.
1 A simple example of a fact that is common knowledge is a public event: when a teacher is standing before a class, that fact is common knowledge among the students, because every student knows that every student knows . . . that the teacher is standing before the class.
2 A formal model for deriving an individual’s subjective probability from his preferences was first put forward by Savage [1954], and later by Anscombe and Aumann [1963] (see also Section 2.8 on page 26).
The most widely accepted statistical approach for dealing with decision problems in situations of incomplete information is the Bayesian approach.³ In the Bayesian approach, every decision maker has a probability distribution over parameters that are unknown to him, and he chooses his actions based on his beliefs as expressed by that distribution. When several decision makers (or players) interact, knowing the probability distribution (beliefs) of each individual decision maker is insufficient: we also need to know what each one’s beliefs are about the beliefs of the other decision makers, what they believe about his beliefs about the others’ beliefs, and so on. This point is illustrated by the following example.

Example 9.1 (Continued) Returning to the Matching Pennies example, suppose that Player I attributes probability $p_1$ to the event: “Player II knows that R is a possible action.” The action that Player I will choose clearly depends on $p_1$, because the entire situation hinges on the value of $p_1$: if $p_1 = 1$, Player I believes that Player II knows that R is an action available to her, and if $p_1 = 0$, he believes that Player II does not know that R is possible at all. If $0 < p_1 < 1$, Player I believes that it is possible that Player II knows that R is an available strategy. But the action chosen by Player I also depends on his beliefs about the beliefs of Player II: because Player I’s action depends on $p_1$, it follows that Player II’s action depends on her beliefs about $p_1$, namely, on her beliefs about Player I’s beliefs. By the same reasoning, Player I’s action depends on his beliefs about Player II’s beliefs about his own beliefs, $p_1$. As in the case of hierarchy of knowledge, we see that determining the best course of action of a player requires considering an infinite hierarchy of beliefs. ◭
Adding beliefs to our model is a natural step, but it leads us to an infinite hierarchy of beliefs. The concepts of knowledge and of beliefs are closely intertwined in games of incomplete information. For didactic reasons, however, we will treat the two notions separately, considering first hierarchies of knowledge and then hierarchies of beliefs.
3 The Bayesian approach is named after Thomas Bayes, 1702–1761, a British clergyman and mathematician who formulated a special case of the rule now known as Bayes’ rule.

9.1 The Aumann model of incomplete information and the concept of knowledge

In this section we will provide a formal definition of the concept of “knowledge,” and then construct hierarchies of knowledge: what each player knows about what the other players know. We will start with an example to illustrate the basic elements of the model.

Example 9.3 Assume that racing cars are produced in three possible colors: gold, red, and purple. Color-blind individuals cannot distinguish between red and gold. Everyone knows that John is color-blind, but no one except Paul knows whether or not Paul is color-blind too. John and Paul are standing side by side viewing a photograph of the racing car that has just won first prize in the Grand Prix, and asking themselves what color it is. The parameter that is of interest in this example is the color of the car, which will later be called the state of nature, and we wish to describe the knowledge that the players possess regarding this parameter.

If the color of the car is purple, then both color-blind and non-color-blind individuals know that fact, so that both John and Paul know that the car is purple, and each of them knows that the other knows that the car is purple. If, however, the car is red or gold, then John knows that it is either red or gold. As he does not know whether or not Paul is color-blind, he does not know whether Paul knows the exact color of the car. Because Paul knows that John is color-blind, if the car is red or gold he knows that John does not know what the precise color is, and John knows that Paul knows this. We therefore need to consider six distinct possibilities (three possibilities per car color times two possibilities regarding whether or not Paul is color-blind):
• The car is purple and Paul is not color-blind. John and Paul both know that the car is purple, they each know that the other knows that the car is purple, and so on.
• The car is purple and Paul is color-blind. Here, too, John and Paul both know that the car is purple, they each know that the other knows that the car is purple, and so on.
• The car is red and Paul is not color-blind. Paul knows the car is red; John knows that the car is red or gold; John does not know whether or not Paul knows the color of the car.
• The car is gold and Paul is not color-blind. Paul knows the car is gold; John knows that the car is red or gold; John does not know whether or not Paul knows the color of the car.
• The car is red and Paul is color-blind. Paul and John know that the car is red or gold; John does not know whether or not Paul knows the color of the car.
• The car is gold and Paul is color-blind. Paul and John know that the car is red or gold; John does not know whether or not Paul knows the color of the car.

In each of these possibilities, both John and Paul clearly know more than we have explicitly written above. For example, in the latter four situations, Paul knows that John does not know whether Paul knows the color of the car. Each of the six cases is associated with what will be defined below as a state of the world, which is a description of a state of nature (in this case, the color of the car) and the state of knowledge of the players. Note that the first two cases describe the same state of the world, because the difference between them (Paul’s color-blindness) affects neither the color of the car, which is the parameter that is of interest to us, nor the knowledge of the players regarding the color of the car. ◭
The definition of the set of states of nature depends on the situation that we are analyzing. In Example 9.3 the color of the car was the focus of our interest, perhaps because a bet has been made regarding the color. Since the most relevant parameters in a game are the payoffs, in general we will want the states of nature to describe all the parameters that affect the payoffs of the players (these are therefore also called “payoff-relevant parameters”). For instance, if in Example 9.3 we were in a situation in which Paul’s color-blindness (or lack thereof) were to affect his utility, then color-blindness would be a payoff-relevant parameter and would comprise a part of the description of the state of nature. In such a model there would be six distinct states of nature, rather than three.

Definition 9.4 Let S be a finite set of states of nature. An Aumann model of incomplete information (over the set S of states of nature) consists of four components $(N, Y, (F_i)_{i \in N}, s)$, where:
• N is a finite set of players;
• Y is a finite set of elements called states of the world;⁴
• $F_i$ is a partition of Y, for each $i \in N$ (i.e., a collection of disjoint nonempty subsets of Y whose union is Y);
• $s : Y \to S$ is a function associating each state of the world with a state of nature.

4 We will later examine the case where Y is infinite, and show that some of the results obtained in this chapter also hold in that case.

The interpretation is that if the “true” state of the world is $\omega^*$, then each player $i \in N$ knows only the element of his partition $F_i$ that contains $\omega^*$. For example, if $Y = \{\omega_1, \omega_2, \omega_3\}$ and $F_i = \{\{\omega_1, \omega_2\}, \{\omega_3\}\}$, then player i cannot distinguish between $\omega_1$ and $\omega_2$. In other words, if the state of the world is $\omega_1$, player i knows that the state of the world is either $\omega_1$ or $\omega_2$, and therefore knows that the state of the world is not $\omega_3$. For this reason, the partition $F_i$ is also called the information of player i. The element of the partition $F_i$ that contains the state of the world ω is denoted $F_i(\omega)$. For convenience, we will use the expression “the information of player i” to refer both to the partition $F_i$ and to the partition element $F_i(\omega^*)$ containing the true state of the world.

Definition 9.5 An Aumann situation of incomplete information over a set of states of nature S is a quintuple $(N, Y, (F_i)_{i \in N}, s, \omega^*)$, where $(N, Y, (F_i)_{i \in N}, s)$ is an Aumann model of incomplete information and $\omega^* \in Y$.

The state $\omega^*$ is the “true state of the world” and each player knows the partition element $F_i(\omega^*)$ in his information partition that contains the true state. A situation of incomplete information describes a knowledge structure at a particular state of the world, i.e., in a particular reality. Models of incomplete information, in contrast, enable us to analyze all possible situations.

Example 9.3 (Continued) An Aumann model of incomplete information for this example is as follows:
• $N = \{\text{John}, \text{Paul}\}$.
• $S = \{\text{Purple Car}, \text{Red Car}, \text{Gold Car}\}$.
• $Y = \{\omega_{g,1}, \omega_{r,1}, \omega_{g,2}, \omega_{r,2}, \omega_p\}$.
• John’s partition is $F_J = \{\{\omega_{g,1}, \omega_{g,2}, \omega_{r,1}, \omega_{r,2}\}, \{\omega_p\}\}$.
• Paul’s partition is $F_P = \{\{\omega_{g,1}, \omega_{r,1}\}, \{\omega_{g,2}\}, \{\omega_{r,2}\}, \{\omega_p\}\}$.
• The function s is defined by $s(\omega_{g,1}) = s(\omega_{g,2}) = \text{Gold Car}$, $s(\omega_{r,1}) = s(\omega_{r,2}) = \text{Red Car}$, and $s(\omega_p) = \text{Purple Car}$.

The state of the world $\omega_p$ is associated with the situation in which the car is purple, in which case both John and Paul know that it is purple, and each of them knows that the other knows that the car is purple. It represents the two situations in the two first bullets on page 323, which differ only in whether Paul is color-blind or not. As we said before, these two situations are equivalent, and can be represented by the same state of the world, as long as Paul’s color-blindness is not payoff relevant, and hence is not part of the description of the state of nature.

The state of the world $\omega_{g,1}$ is associated with the situation in which the car is gold and Paul is color-blind, while the state of the world $\omega_{r,1}$ is associated with the situation in which the car is red and Paul is color-blind; in both these situations, Paul cannot distinguish which state of the world holds, because he is color-blind and cannot tell red from gold. The state of the world $\omega_{g,2}$ is associated with the situation in which the car is gold and Paul is not color-blind, while the state of the world $\omega_{r,2}$ is associated with the situation in which the car is red and Paul is not color-blind; in both these cases Paul knows the true color of the car. Therefore, $F_P(\omega_{g,2}) = \{\omega_{g,2}\}$, and $F_P(\omega_{g,1}) = \{\omega_{g,1}, \omega_{r,1}\}$.

As for John, he is both color-blind and does not know whether Paul is color-blind. He therefore cannot distinguish between the four states of the world $\{\omega_{g,1}, \omega_{r,1}, \omega_{g,2}, \omega_{r,2}\}$, so that $F_J(\omega_{g,1}) = F_J(\omega_{g,2}) = F_J(\omega_{r,1}) = F_J(\omega_{r,2}) = \{\omega_{g,1}, \omega_{r,1}, \omega_{g,2}, \omega_{r,2}\}$. The true state of the world is one of the possible states in the set Y. The Aumann model along with the true state of the world describes the actual situation faced by John and Paul. ◭
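To make the components of the model concrete, the model of Example 9.3 can be written down as plain data. The sketch below is one possible encoding and not a canonical one: the string labels ("g1" for the state in which the car is gold and Paul is color-blind, and so on) and the helper names `is_partition`, `information`, and `colors_considered_possible` are our own choices.

```python
# States of the world: "g1" stands for the state written with subscript g,1, and so on.
Y = {"g1", "r1", "g2", "r2", "p"}

# Information partitions of the two players.
PARTITIONS = {
    "John": [{"g1", "g2", "r1", "r2"}, {"p"}],
    "Paul": [{"g1", "r1"}, {"g2"}, {"r2"}, {"p"}],
}

# The function s: it maps each state of the world to a state of nature.
STATE_OF_NATURE = {
    "g1": "Gold Car", "g2": "Gold Car",
    "r1": "Red Car", "r2": "Red Car",
    "p": "Purple Car",
}


def is_partition(cells, universe):
    """True if the cells are nonempty, pairwise disjoint, and their union is the universe."""
    if any(len(cell) == 0 for cell in cells):
        return False
    covers = set().union(*cells) == universe
    disjoint = sum(len(cell) for cell in cells) == len(universe)
    return covers and disjoint


def information(player, omega):
    """The partition element of the player that contains the state omega."""
    for cell in PARTITIONS[player]:
        if omega in cell:
            return cell
    raise ValueError("unknown state of the world")


def colors_considered_possible(player, omega):
    """States of nature that the player cannot rule out when the true state is omega."""
    return {STATE_OF_NATURE[w] for w in information(player, omega)}


print(all(is_partition(cells, Y) for cells in PARTITIONS.values()))
print(sorted(information("Paul", "g1")))
print(sorted(colors_considered_possible("John", "g2")))
```

For instance, `colors_considered_possible("John", "g2")` contains both Gold Car and Red Car: John cannot rule out either color even when Paul sees that the car is gold.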
Definition 9.6 An event is a subset of Y.

In Example 9.3 the event $\{\omega_{g,1}, \omega_{g,2}\}$ is the formal expression of the sentence “the car is gold,” while the event $\{\omega_{g,1}, \omega_{g,2}, \omega_p\}$ is the formal expression of the sentence “the car is either gold or purple.” We say that an event A obtains in a state of the world ω if $\omega \in A$. It follows that if event A obtains in a state of the world ω and if $A \subseteq B$, then event B obtains in ω.

Definition 9.7 Let $(N, Y, (F_i)_{i \in N}, s)$ be an Aumann model of incomplete information, let i be a player, let $\omega \in Y$ be a state of the world, and let $A \subseteq Y$ be an event. Player i knows A in ω if
$F_i(\omega) \subseteq A.$ (9.1)

If $F_i(\omega) \subseteq A$, then in state of the world ω player i knows that event A obtains (even though he may not know that the state of the world is ω), because according to his information, all the possible states of the world, $F_i(\omega)$, are included in the event A.

Definition 9.8 Let $(N, Y, (F_i)_{i \in N}, s)$ be an Aumann model of incomplete information, let i be a player, and let $A \subseteq Y$ be an event. Define an operator $K_i : 2^Y \to 2^Y$ by⁵
$K_i(A) := \{\omega \in Y : F_i(\omega) \subseteq A\}.$ (9.2)

We will often denote $K_i(A)$, the set of all states of the world in which player i knows event A, by $K_i A$. Thus, player i knows event A in state of the world $\omega^*$ if and only if $\omega^* \in K_i A$. The definition implies that the set $K_i A$ equals the union of all the elements in the partition $F_i$ contained in A. The event $K_j(K_i A)$ (which we will write as $K_j K_i A$ for short) is the event that player j knows that player i knows A:
$K_j K_i A = \{\omega \in Y : F_j(\omega) \subseteq K_i A\}.$ (9.3)

5 The collection of all subsets of Y is denoted by $2^Y$.

Example 9.3 (Continued) Denote $A = \{\omega_p\}$, $B = \{\omega_{r,2}\}$, and $C = \{\omega_{r,1}, \omega_{r,2}\}$. Then

$K_J A = \{\omega_p\} = A$, $K_J B = \emptyset$, $K_J C = \emptyset$,
$K_P A = \{\omega_p\} = A$, $K_P B = \{\omega_{r,2}\}$, $K_P C = \{\omega_{r,2}\}$.

The content of the expression $K_P B = \{\omega_{r,2}\}$ is that only in state of the world $\omega_{r,2}$ does Paul know that event B obtains (meaning that only in that state of the world does he know that the car is red). The content of $K_J B = \emptyset$ is that there is no state of the world in which John knows that B obtains; i.e., he never knows that the car is red and that Paul is not color-blind. From this we conclude that
$K_J K_P C = K_J B = \emptyset.$ (9.4)
This means that there is no state of the world in which John knows that Paul knows that the car is red. In contrast, $\omega_p \in K_P K_J A$, which means that in state of the world $\omega_p$ Paul knows that John knows that the state of the world is $\omega_p$ (and in particular, that the car is purple). ◭
We can now present some simple results that follow from the above definition of knowledge. The first result states that if a player knows event A in state of the world ω, then it is necessarily true that ω ∈ A. In other words, if a player knows the event A, then A necessarily obtains (because the true state of the world is contained within it).⁶
Theorem 9.9 Ki A ⊆ A for every event A ⊆ Y and every player i ∈ N.
Proof: Let ω ∈ Ki A. From the definition of knowledge it follows that Fi (ω) ⊆ A. Since ω ∈ Fi (ω) it follows that ω ∈ A, which is what we needed to prove.
Our second result states that if event A is contained in event B, then the states of the world in which player i knows event A form a subset of the states of the world in which the player knows event B. In other words, in every state of the world in which a player knows event A, he also knows event B.
Theorem 9.10 For every pair of events A, B ⊆ Y , and every player i ∈ N,
A ⊆ B ⟹ Ki A ⊆ Ki B.   (9.5)
Proof: We will show that ω ∈ Ki A implies that ω ∈ Ki B. Suppose that ω ∈ Ki A. By definition, Fi (ω) ⊆ A, and because A ⊆ B, one has Fi (ω) ⊆ B. Therefore, ω ∈ Ki B, which is what we need to show.
Our third result⁷ says that if a player knows event A, then he knows that he knows event A, and conversely, if he knows that he knows event A, then he knows event A.
Theorem 9.11 For every event A ⊆ Y and every player i ∈ N, we have Ki Ki A = Ki A.
Proof: Theorems 9.9 and 9.10 imply that Ki Ki A ⊆ Ki A. We will show that the opposite inclusion holds, namely, if ω ∈ Ki A then ω ∈ Ki Ki A. If ω ∈ Ki A then Fi (ω) ⊆ A. Therefore, for every ω′ ∈ Fi (ω), we have ω′ ∈ Fi (ω′ ) = Fi (ω) ⊆ A. It follows that ω′ ∈ Ki A. As this is true for every ω′ ∈ Fi (ω), we deduce that Fi (ω) ⊆ Ki A, which implies that ω ∈ Ki Ki A. Thus, Ki A ⊆ Ki Ki A, which is what we wanted to prove.
6 In the literature, this is known as the “axiom of knowledge.”
7 One part of this theorem, namely, the fact that if a player knows an event, then he knows that he knows the event, is known in the literature as the “axiom of positive introspection.”
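Theorems 9.9, 9.10, and 9.11 can be checked by exhaustive enumeration on any small model. The sketch below assumes a hypothetical four-state set Y and a hypothetical partition F; the helper names `knows` and `all_events` are ours. Such a check illustrates the theorems on one model and is of course no substitute for the proofs.

```python
from itertools import chain, combinations


def knows(partition, event):
    """Return K_i(A): the union of the partition elements contained in A."""
    event = set(event)
    result = set()
    for block in partition:
        if set(block) <= event:
            result |= set(block)
    return result


def all_events(states):
    """Enumerate every subset of the (finite) set of states."""
    ordered = sorted(states)
    subsets = chain.from_iterable(
        combinations(ordered, r) for r in range(len(ordered) + 1)
    )
    return [set(subset) for subset in subsets]


# Hypothetical model: four states and one player's partition.
Y = {1, 2, 3, 4}
F = [{1, 2}, {3}, {4}]
events = all_events(Y)

# Theorem 9.9: K_i A is contained in A.
axiom_of_knowledge = all(knows(F, A) <= A for A in events)
# Theorem 9.10: A contained in B implies K_i A contained in K_i B.
monotone = all(
    knows(F, A) <= knows(F, B) for A in events for B in events if A <= B
)
# Theorem 9.11: K_i K_i A = K_i A.
positive_introspection = all(
    knows(F, knows(F, A)) == knows(F, A) for A in events
)
print(axiom_of_knowledge, monotone, positive_introspection)
```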
More generally, the knowledge operator Ki of player i satisfies the following five properties, which collectively are called Kripke’s S5 System:
1. Ki Y = Y : the player knows that Y is the set of all states of the world.
2. Ki A ∩ Ki B = Ki (A ∩ B): if the player knows event A and knows event B then he knows event A ∩ B.
3. Ki A ⊆ A: if the player knows event A then event A obtains.
4. Ki Ki A = Ki A: if the player knows event A then he knows that he knows event A, and vice versa.
5. (Ki A)^c = Ki ((Ki A)^c ): if the player does not know event A, then he knows that he does not know event A, and vice versa.⁸,⁹
Property 3 was proved in Theorem 9.9. Property 4 was proved in Theorem 9.11. The proof that the knowledge operator satisfies the other three properties is left to the reader (Exercise 9.1). In fact, Properties 1–5 characterize knowledge operators: for every operator K : 2^Y → 2^Y satisfying these properties there exists a partition F of Y that induces K via Equation (9.2) (Exercise 9.2).
Example 9.12 Anthony, Betty, and Carol are each wearing a hat. Hats may be red (r) or blue (b). Each one of the three sees the hats worn by the other two, but cannot see his or her own hat, and therefore does not know its color. This situation can be described by an Aumann model of incomplete information as follows:
• The set of players is N = {Anthony, Betty, Carol}.
• The set of states of nature is S = {(r, r, r), (r, r, b), (r, b, r), (r, b, b), (b, r, r), (b, r, b), (b, b, r), (b, b, b)}. A state of nature is described by three hat colors: that of Anthony’s hat (the left letter), of Betty’s hat (the middle letter), and of Carol (the right letter).
• The set of states of the world is Y = {ωrrr , ωrrb , ωrbr , ωrbb , ωbrr , ωbrb , ωbbr , ωbbb }.
• The function s : Y → S that maps every state of the world to a state of nature is defined by
s(ωrrr ) = (r, r, r),   s(ωrrb ) = (r, r, b),   s(ωrbr ) = (r, b, r),   s(ωrbb ) = (r, b, b),
s(ωbrr ) = (b, r, r),   s(ωbrb ) = (b, r, b),   s(ωbbr ) = (b, b, r),   s(ωbbb ) = (b, b, b).
The information partitions of Anthony, Betty, and Carol are as follows:
FA = {{ωrrr , ωbrr }, {ωrrb , ωbrb }, {ωrbr , ωbbr }, {ωrbb , ωbbb }},   (9.6)
FB = {{ωrrr , ωrbr }, {ωrrb , ωrbb }, {ωbrr , ωbbr }, {ωbrb , ωbbb }},   (9.7)
FC = {{ωrrr , ωrrb }, {ωrbr , ωrbb }, {ωbrr , ωbrb }, {ωbbr , ωbbb }}.   (9.8)
8 The first part of this property, i.e., the fact that if a player does not know an event, then he knows that he does not know it, is known in the literature as the “axiom of negative introspection.”
9 For any event A, the complement of A is denoted by A^c := Y \ A.
For example, when the state of the world is ωbrb , Anthony sees that Betty is wearing a red hat and that Carol is wearing a blue hat, but does not know whether his hat is red or blue, so that he knows that the state of the world is in the set {ωrrb , ωbrb }, which is one of the elements of his partition FA . Similarly, if the state of the world is ωbrb , Betty knows that the state of the world is in her partition element {ωbrb , ωbbb }, and Carol knows that the state of the world is in her partition element {ωbrr , ωbrb }. Let R be the event “there is at least one red hat,” that is,
R = {ωrrr , ωrrb , ωrbr , ωrbb , ωbrr , ωbrb , ωbbr }.   (9.9)
In which states of the world does Anthony know R? In which states does Betty know that Anthony knows R? In which states does Carol know that Betty knows that Anthony knows R? To begin answering the first question, note that in state of the world ωrrr , Anthony knows R, because
FA (ωrrr ) = {ωrrr , ωbrr } ⊆ R.   (9.10)
Anthony also knows R in each of the states of the world ωrrb , ωrbr , ωbrb , ωbrr , and ωbbr . In contrast, in the states ωrbb and ωbbb he does not know R, because
FA (ωrbb ) = FA (ωbbb ) = {ωrbb , ωbbb } ⊈ R.   (9.11)
In summary,
KA R = {ω ∈ Y : FA (ω) ⊆ R} = {ωrrr , ωrbr , ωrrb , ωbrb , ωbrr , ωbbr }.
The analysis here is quite intuitive: Anthony knows R if either Betty or Carol (or both) is wearing a red hat, which occurs in the states of the world in the set {ωrrr , ωrbr , ωrrb , ωbrb , ωbrr , ωbbr }. When does Betty know that Anthony knows R? This requires calculating KB KA R.
KB KA R = {ω ∈ Y : FB (ω) ⊆ KA R}
= {ω ∈ Y : FB (ω) ⊆ {ωrrr , ωrbr , ωrrb , ωbrb , ωbrr , ωbbr }}
= {ωrrr , ωbrr , ωrbr , ωbbr }.   (9.12)
For example, since FB (ωrbr ) = {ωrbr , ωrrr } ⊆ KA R we conclude that ωrbr ∈ KB KA R. On the other hand, since FB (ωbrb ) = {ωbrb , ωbbb } ⊈ KA R, it follows that ωbrb ∉ KB KA R. The analysis here is once again intuitively clear: Betty knows that Anthony knows R only if Carol is wearing a red hat, which only occurs in the states of the world {ωrrr , ωbrr , ωrbr , ωbbr }. Finally, we answer the third question: when does Carol know that Betty knows that Anthony knows R? This requires calculating KC KB KA R.
KC KB KA R = {ω ∈ Y : FC (ω) ⊆ KB KA R}
= {ω ∈ Y : FC (ω) ⊆ {ωrrr , ωbrr , ωrbr , ωbbr }} = ∅.   (9.13)
For example, since FC (ωrbr ) = {ωrbr , ωrbb } ⊈ KB KA R, we conclude that ωrbr ∉ KC KB KA R. In other words, there is no state of the world in which Carol knows that Betty knows that Anthony knows R. This is true intuitively, because as we saw previously, Betty knows that Anthony knows R only if Carol is wearing a red hat, but Carol does not know the color of her own hat. This analysis enables us to conclude, for example, that in state of the world ωrrr Anthony knows R, Betty knows that Anthony knows R, but Carol does not know that Betty knows that Anthony knows R. ◭
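The computations of Example 9.12 can be reproduced mechanically. The following sketch encodes each state of the world as a three-letter string (for instance `"brb"` for ωbrb); this encoding and the helper names `knows` and `partition_hiding` are assumptions made for the illustration.

```python
from itertools import product


def knows(partition, event):
    """Return K_i(A): the union of the partition elements contained in A."""
    event = set(event)
    result = set()
    for block in partition:
        if set(block) <= event:
            result |= set(block)
    return result


# States of the world: hat colors of Anthony, Betty, Carol (left to right).
Y = {"".join(colors) for colors in product("rb", repeat=3)}


def partition_hiding(position):
    """Partition of a player who sees every hat except the one at `position`."""
    blocks = {}
    for state in Y:
        seen = state[:position] + state[position + 1:]
        blocks.setdefault(seen, set()).add(state)
    return list(blocks.values())


F_A, F_B, F_C = partition_hiding(0), partition_hiding(1), partition_hiding(2)

R = Y - {"bbb"}                      # "there is at least one red hat"
K_A_R = knows(F_A, R)
K_B_K_A_R = knows(F_B, K_A_R)
K_C_K_B_K_A_R = knows(F_C, K_B_K_A_R)
print(sorted(K_A_R), sorted(K_B_K_A_R), sorted(K_C_K_B_K_A_R))
```

The three sets agree with KA R, Equation (9.12), and Equation (9.13), respectively.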
Note the distinction in Example 9.12 between states of nature and states of the world. The state of nature is the parameter with respect to which there is incomplete information: the colors of the hats worn by the three players. The state of the world includes in addition the mutual knowledge structure of the players regarding the state of nature. For example, the state of the world ωrrr says a lot more than the fact that all three players are wearing red hats; for example, in this state of the world Carol knows there is at least one red hat, Carol knows that Anthony knows that there is at least one red hat, and Carol does not know that Betty knows that Anthony knows that there is at least one red hat. In Example 9.12 there is a one-to-one correspondence between the set of states of nature S and the set of states of the world Y . This is so since the mutual knowledge structure is uniquely determined by the configuration of the colors of the hats.
Example 9.13 Arthur, Harry, and Tom are in a room with two windows, one facing north and the other facing south. Two hats, one yellow and one brown, are placed on a table in the center of the room. After Harry and Tom leave the room, Arthur selects one of the hats and places it on his head. Tom and Harry peek in, each through a different window, watching Arthur (so that they both know the color of the hat Arthur is wearing). Neither Tom nor Harry knows whether or not the other player who has left the room is peeking through a window, and Arthur has no idea whether or not Tom or Harry is spying on him as he places one of the hats on his head. An Aumann model of incomplete information describing this situation is as follows:
• N = {Arthur, Harry, Tom}.
• S = {Arthur wears the brown hat, Arthur wears the yellow hat}.
• There are eight states of the world, each of which is designated by two indices: Y = {ωb,∅ , ωb,T , ωb,H , ωb,TH , ωy,∅ , ωy,T , ωy,H , ωy,TH }. The left index of ω indicates the color of the hat that Arthur is wearing (which is either brown or yellow), and the right index indicates which of the other players has been peeking into the room (Tom (T), Harry (H), both (TH), or neither (∅)).
• Arthur’s partition contains two elements, because he knows the color of the hat on his head, but does not know who is peeking into the room: FA = {{ωb,∅ , ωb,H , ωb,T , ωb,TH }, {ωy,∅ , ωy,H , ωy,T , ωy,TH }}.
• Tom’s partition contains three elements, one for each of his possible situations of information: Tom has not peeked into the room; Tom has peeked into the room and seen Arthur wearing the brown hat; Tom has peeked into the room and seen Arthur wearing the yellow hat. His partition is thus FT = {{ωb,∅ , ωb,H , ωy,∅ , ωy,H }, {ωb,T , ωb,TH }, {ωy,T , ωy,TH }}. For example, if Tom has peeked and seen the brown hat on Arthur’s head, he knows that Arthur has selected the brown hat, but he does not know whether he is the only player who peeked (corresponding to the state of the world ωb,T ) or whether Harry has also peeked (state of the world ωb,TH ).
• Similarly, Harry’s partition is FH = {{ωb,∅ , ωb,T , ωy,∅ , ωy,T }, {ωb,H , ωb,TH }, {ωy,H , ωy,TH }}.
• The function s is defined by
s(ωb,∅ ) = s(ωb,T ) = s(ωb,H ) = s(ωb,TH ) = Arthur wears the brown hat;
s(ωy,∅ ) = s(ωy,T ) = s(ωy,H ) = s(ωy,TH ) = Arthur wears the yellow hat.
In this model, for example, if the true state of the world is ω∗ = ωb,TH , then Arthur is wearing the brown hat, and both Tom and Harry have peeked into the room. The event “Arthur is wearing the brown hat” is B = {ωb,∅ , ωb,T , ωb,H , ωb,TH }. Tom and Harry know that Arthur’s hat is brown only if they have peeked into the room. Therefore,
KT B = {ωb,T , ωb,TH },   KH B = {ωb,H , ωb,TH }.   (9.14)
Given Equation (9.14), since no element of Tom’s partition is contained in the set KH B, we conclude that KT KH B = ∅. In other words, in no state of the world does Tom know that Harry knows that Arthur is wearing the brown hat, and therefore, in particular, this is the case at the given state of the world, ωb,TH . We similarly conclude that KH KT B = ∅: in any state of the world, Harry does not know that Tom knows that Arthur is wearing the brown hat (and in particular this is the case at the true state of the world, ωb,TH ). This is all quite intuitive; Tom knows that Arthur is wearing the brown hat only if he has peeked into the room, but Harry does not know whether or not Tom has peeked into the room.
Note again the distinction between a state of nature and a state of the world. The objective fact about which the players have incomplete information is the color of the hat atop Arthur’s head. Each one of the four states of the world {ωy,∅ , ωy,H , ωy,T , ωy,TH } corresponds to the state of nature “Arthur wears the yellow hat,” yet they differ in the knowledge that the players have regarding the state of nature. In the state of the world ωy,∅ , Arthur wears the yellow hat, but Tom and Harry do not know that, while in state of the world ωy,H , Arthur wears the yellow hat and Harry knows that, but Tom does not know that. Note that in both of these states of the world Tom and Arthur do not know that Harry knows the color of Arthur’s hat, Harry and Arthur do not know whether or not Tom knows the color of the hat, and in each state of the world there are additional statements that can be made regarding the players’ mutual knowledge of Arthur’s hat. ◭
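The knowledge sets of Example 9.13 can be verified in the same way. In the sketch below a state such as ωb,TH is encoded by the string `"bTH"` and ωb,∅ by `"b0"`; the encoding and the helper name `knows` are assumptions of the illustration.

```python
def knows(partition, event):
    """Return K_i(A): the union of the partition elements contained in A."""
    event = set(event)
    result = set()
    for block in partition:
        if set(block) <= event:
            result |= set(block)
    return result


# First letter: hat color (b or y). Suffix: who peeked (0 = nobody, T, H, TH).
F_T = [{"b0", "bH", "y0", "yH"}, {"bT", "bTH"}, {"yT", "yTH"}]   # Tom
F_H = [{"b0", "bT", "y0", "yT"}, {"bH", "bTH"}, {"yH", "yTH"}]   # Harry

B = {"b0", "bT", "bH", "bTH"}        # "Arthur is wearing the brown hat"

K_T_B = knows(F_T, B)
K_H_B = knows(F_H, B)
K_T_K_H_B = knows(F_T, K_H_B)        # no element of Tom's partition lies inside K_H B
K_H_K_T_B = knows(F_H, K_T_B)
print(sorted(K_T_B), sorted(K_H_B), K_T_K_H_B, K_H_K_T_B)
```

The first two sets reproduce Equation (9.14), and the last two are empty.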
The insights gleaned from these examples can be formulated and proven rigorously.
Definition 9.14 A knowledge hierarchy among players in state of the world ω over the set of states of the world Y is a system of “yes” or “no” answers to each question of the form “in a state of the world ω, does player i1 know that player i2 knows that player i3 knows . . . that player il knows event A?” for any event A ⊆ Y and any finite sequence i1 , i2 , . . . , il of players¹⁰ in N.
The answer to the question “does player i1 know that player i2 knows that player i3 knows . . . that player il knows event A?” in a state of the world ω is affirmative if ω ∈ Ki1 Ki2 · · · Kil A, and negative if ω ∉ Ki1 Ki2 · · · Kil A. Since for every event A and every sequence of players i1 , i2 , . . . , il the event Ki1 Ki2 · · · Kil A is well defined and calculable in an Aumann model of incomplete information, every state of the world defines a knowledge hierarchy. We have therefore derived the following theorem.
Theorem 9.15 Every situation of incomplete information (N, Y, (Fi )i∈N , s, ω∗ ) uniquely determines a knowledge hierarchy over the set of states of the world Y in state of the world ω∗ .
For every subset C ⊆ S of the set of states of nature, we can consider the event that contains all states of the world whose state of nature is an element of C:
s⁻¹(C) := {ω ∈ Y : s(ω) ∈ C}.   (9.15)
For example, in Example 9.13 the set of states of nature {yellow} corresponds to the event {ωy,∅ , ωy,H , ωy,T , ωy,TH } in Y . Every subset of S is called an event in S. We define knowledge of events in S as follows: in a state of the world ω player i knows event C in S if and only if he knows the event s⁻¹(C), i.e., if and only if ω ∈ Ki (s⁻¹(C)). In the same manner, in state of the world ω player i1 knows that player i2 knows that player i3 knows . . . that player il knows event C in S if and only if in state of the world ω player i1 knows that player i2 knows that player i3 knows . . . that player il knows s⁻¹(C).
10 A player may appear several times in the chain i1 , i2 , . . . , il . For example, the chain player 2 knows that player 1 knows that player 3 knows that player 2 knows event A is a legitimate chain.
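Each question in a knowledge hierarchy can be answered by composing knowledge operators from the innermost outward. The following sketch does so for the model of Example 9.13, with a state such as ωb,TH encoded as `"bTH"` and ωb,∅ as `"b0"`; the encoding and the function names `knows`, `knows_chain`, and `preimage` are hypothetical.

```python
def knows(partition, event):
    """Return K_i(A): the union of the partition elements contained in A."""
    event = set(event)
    result = set()
    for block in partition:
        if set(block) <= event:
            result |= set(block)
    return result


def knows_chain(partitions, players, event):
    """Return K_{i1} K_{i2} ... K_{il} A for the chain players = [i1, ..., il]."""
    result = set(event)
    for player in reversed(players):     # apply the innermost operator first
        result = knows(partitions[player], result)
    return result


def preimage(s, C):
    """Return s^{-1}(C): the states of the world whose state of nature lies in C."""
    return {state for state, nature in s.items() if nature in C}


partitions = {
    "Arthur": [{"b0", "bH", "bT", "bTH"}, {"y0", "yH", "yT", "yTH"}],
    "Tom": [{"b0", "bH", "y0", "yH"}, {"bT", "bTH"}, {"yT", "yTH"}],
    "Harry": [{"b0", "bT", "y0", "yT"}, {"bH", "bTH"}, {"yH", "yTH"}],
}
s = {state: ("brown" if state[0] == "b" else "yellow")
     for block in partitions["Arthur"] for state in block}

B = preimage(s, {"brown"})
true_state = "bTH"
tom_knows = true_state in knows_chain(partitions, ["Tom"], B)
tom_knows_arthur_knows = true_state in knows_chain(partitions, ["Tom", "Arthur"], B)
tom_knows_harry_knows = true_state in knows_chain(partitions, ["Tom", "Harry"], B)
print(tom_knows, tom_knows_arthur_knows, tom_knows_harry_knows)
```

In state ωb,TH the first two answers are affirmative and the third is negative.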
Corollary 9.16 is a consequence of Theorem 9.15 (Exercise 9.10).
Corollary 9.16 Every situation of incomplete information (N, Y, (Fi )i∈N , s, ω∗ ) uniquely determines a knowledge hierarchy over the set of states of nature S in state of the world ω∗ .
Having defined the knowledge operators of the players, we next turn to the definition of the concept of common knowledge, which was previously defined informally (see Definition 9.2).
Definition 9.17 Let (N, Y, (Fi )i∈N , s) be an Aumann model of incomplete information, let A ⊆ Y be an event, and let ω ∈ Y be a state of the world. The event A is common knowledge in ω if for every finite sequence of players i1 , i2 , . . . , il ,
ω ∈ Ki1 Ki2 . . . Kil−1 Kil A.   (9.16)
That is, event A is common knowledge at state of the world ω if in ω every player knows event A, every player knows that every player knows event A, etc. In Examples 9.12 and 9.13 the only event that is common knowledge in any state of the world is Y (Exercise 9.12). In Example 9.3 (page 322) the event {ωp } (and every event containing it) is common knowledge in state of the world ωp , and the event {ωg,1 , ωg,2 , ωr,1 , ωr,2 } (and the event Y containing it) is common knowledge in every state of the world contained in this event.
Example 9.18 Abraham selects an integer from the set {5, 6, 7, 8, 9, 10, 11, 12, 13, 14}. He tells Jefferson whether the number he has selected is even or odd, and tells Ulysses the remainder left over from dividing that number by 4. The corresponding Aumann model of incomplete information depicting the induced situation of Jefferson and Ulysses is:
• N = {Jefferson, Ulysses}.
• S = {5, 6, 7, 8, 9, 10, 11, 12, 13, 14}: the state of nature is the number selected by Abraham.
• Y = {ω5 , ω6 , ω7 , ω8 , ω9 , ω10 , ω11 , ω12 , ω13 , ω14 }.
• The function s : Y → S is given by s(ωk ) = k for every k ∈ S.
• Since Jefferson knows whether the number is even or odd, his partition contains two elements, corresponding to the subset of even numbers and the subset of odd numbers in the set Y :
FJ = {{ω5 , ω7 , ω9 , ω11 , ω13 }, {ω6 , ω8 , ω10 , ω12 , ω14 }}.   (9.17)
• As Ulysses knows the remainder left over from dividing the number by 4, his partition contains four elements, one for each possible remainder:
FU = {{ω8 , ω12 }, {ω5 , ω9 , ω13 }, {ω6 , ω10 , ω14 }, {ω7 , ω11 }}.   (9.18)
In the state of the world ω6 , the event that the selected number is even, i.e., A = {ω6 , ω8 , ω10 , ω12 , ω14 }, is common knowledge. Indeed, KJ A = KU A = A, and therefore it follows that Ki1 Ki2 . . . Kil−1 Kil A = A for every finite sequence of players i1 , i2 , . . . , il . Since ω6 ∈ A, it follows from Definition 9.17 that in state of the world ω6 the event A is common knowledge among Jefferson and Ulysses. Similarly, in state of the world ω9 , the event that the selected number is odd, B = {ω5 , ω7 , ω9 , ω11 , ω13 }, is common knowledge among Jefferson and Ulysses (verify!). ◭
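The verification requested in Example 9.18 amounts to checking that each knowledge operator leaves the event unchanged. The sketch below identifies the state ωk with the integer k; this identification, the helper name `knows`, and the extra event D are assumptions added for illustration.

```python
def knows(partition, event):
    """Return K_i(A): the union of the partition elements contained in A."""
    event = set(event)
    result = set()
    for block in partition:
        if set(block) <= event:
            result |= set(block)
    return result


Y = set(range(5, 15))
F_J = [{5, 7, 9, 11, 13}, {6, 8, 10, 12, 14}]        # Jefferson: parity
F_U = [{8, 12}, {5, 9, 13}, {6, 10, 14}, {7, 11}]    # Ulysses: remainder mod 4

A = {6, 8, 10, 12, 14}               # "the selected number is even"
B = {5, 7, 9, 11, 13}                # "the selected number is odd"

# K_J A = K_U A = A, so every finite composition of K_J and K_U maps A to A.
even_is_fixed = knows(F_J, A) == A and knows(F_U, A) == A
odd_is_fixed = knows(F_J, B) == B and knows(F_U, B) == B

# A hypothetical event that is known by Ulysses but never by Jefferson.
D = {8, 12}                          # "the selected number is divisible by 4"
K_U_D = knows(F_U, D)
K_J_D = knows(F_J, D)
print(even_is_fixed, odd_is_fixed, sorted(K_U_D), sorted(K_J_D))
```

The event D shows the contrast: in state ω8 Ulysses knows D, but Jefferson does not, so D is not common knowledge there.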
Remark 9.19 From Definition 9.17 and Theorem 9.10 we conclude that if event A is common knowledge in a state of the world ω, then every event containing A is also common knowledge in ω.
Remark 9.20 The definition of common knowledge can be expanded to events in S: an event C in S is common knowledge in a state of the world ω if the event s⁻¹(C) is common knowledge in ω. For example, in Example 9.13 in state of the world ωb,TH the event (in the set of states of nature) “Arthur selects the brown hat” is not common knowledge among the players (verify!).
Remark 9.21 If event A is common knowledge in a state of the world ω, then in particular ω ∈ Ki A and so Fi (ω) ⊆ A for each i ∈ N. In other words, all players know A in ω.
Remark 9.22 We can also speak of common knowledge among a subset of the players M ⊆ N: in a state of the world ω, event A is common knowledge among the players in M if Equation (9.16) is satisfied for any finite sequence i1 , i2 , . . . , il of players in M.
Theorem 9.23 states that if there is a player who cannot distinguish between ω and ω′ , then every event that is common knowledge in ω is also common knowledge in ω′ .
Theorem 9.23 If event A is common knowledge in state of the world ω, and if ω′ ∈ Fi (ω) for some player i ∈ N, then the event A is also common knowledge in state of the world ω′ .
Proof: Suppose that ω′ ∈ Fi (ω) for some player i ∈ N. As the event A is common knowledge in ω, for any sequence i1 , i2 , . . . , il of players we have
ω ∈ Ki Ki1 Ki2 . . . Kil−1 Kil A.   (9.19)
Remark 9.21 implies that
Fi (ω) ⊆ Ki1 Ki2 . . . Kil−1 Kil A.   (9.20)
Since ω′ ∈ Fi (ω′ ) = Fi (ω) it follows that ω′ ∈ Ki1 Ki2 . . . Kil−1 Kil A. As this is true for any sequence i1 , i2 , . . . , il of players, the event A is common knowledge in ω′ .
We next turn to characterizing sets that are common knowledge. Given an Aumann model of incomplete information (N, Y, (Fi )i∈N , s), define the graph G = (Y, V ) in which the set of vertices is the set of states of the world Y , and there is an edge between vertices ω and ω′ if and only if there is a player i such that ω′ ∈ Fi (ω). Note that the condition defining the edges of the graph is symmetric: ω′ ∈ Fi (ω) if and only if Fi (ω) = Fi (ω′ ), if and only if ω ∈ Fi (ω′ ); hence G = (Y, V ) is an undirected graph. A set of vertices C in a graph is a connected component if the following two conditions are satisfied:
• For every ω, ω′ ∈ C, there exists a path connecting ω with ω′ , i.e., there exist ω = ω1 , ω2 , . . . , ωK = ω′ such that for each k = 1, 2, . . . , K − 1 the graph contains an edge connecting ωk and ωk+1 .
• There is no edge connecting a vertex in C with a vertex that is not in C.
The connected component of ω in the graph, denoted by C(ω), is the (unique) connected component containing ω.
Theorem 9.24 Let (N, Y, (Fi )i∈N , s) be an Aumann model of incomplete information and let G be the graph corresponding to this model. Let ω ∈ Y be a state of the world and let A ⊆ Y be an event. Then event A is common knowledge in state of the world ω if and only if A ⊇ C(ω).
Proof: First we prove that if A is common knowledge in ω, then C(ω) ⊆ A. Suppose then that ω′ ∈ C(ω). We want to show that ω′ ∈ A. From the definition of a connected component, there is a path connecting ω with ω′ ; we denote that path by ω = ω1 , ω2 , . . . , ωK = ω′ . We prove by induction on k that ωk ∈ A, and that A is common knowledge in ωk , for every 1 ≤ k ≤ K. For k = 1, because the event A is common knowledge in ω, we deduce that ω1 = ω ∈ A. Suppose now that ωk ∈ A and A is common knowledge in ωk . We will show that ωk+1 ∈ A and that A is common knowledge in ωk+1 . Because there is an edge connecting ωk and ωk+1 , there is a player i such that ωk+1 ∈ Fi (ωk ). It follows from Theorem 9.23 that the event A is common knowledge in ωk+1 . From Remark 9.21 we conclude that ωk+1 ∈ A. This completes the inductive step, so that in particular ω′ = ωK ∈ A.
Consider now the other direction: if C(ω) ⊆ A, then event A is common knowledge in state of the world ω. To prove this, it suffices to show that C(ω) is common knowledge in ω, because from Remark 9.19 it will then follow that any event containing C(ω), and in particular A, is also common knowledge in ω. Let i be a player in N. Because C(ω) is a connected component of G, for each ω′ ∈ C(ω), we have Fi (ω′ ) ⊆ C(ω). It follows that
\[ C(\omega) \supseteq \bigcup_{\omega' \in C(\omega)} F_i(\omega') \supseteq \bigcup_{\omega' \in C(\omega)} \{\omega'\} = C(\omega). \tag{9.21} \]
In other words, for each player i the set C(ω) is the union of all the elements of Fi contained in it. This implies that Ki (C(ω)) = C(ω). As this is true for every player i ∈ N, it follows that for every sequence of players i1 , i2 , . . . , il ,
ω ∈ C(ω) = Ki1 Ki2 · · · Kil C(ω),   (9.22)
and therefore C(ω) is common knowledge in ω.
The following corollary follows from Theorem 9.24 and Remark 9.19.
Corollary 9.25 In every state of the world ω ∈ Y , the event C(ω) is common knowledge among the players, and it is the smallest event that is common knowledge in ω.
For this reason, C(ω) is sometimes called the common knowledge component among the players in state of the world ω.
Remark 9.26 The proof of Theorem 9.24 shows that for each player i ∈ N, the set C(ω) is the union of the elements of Fi contained in it, and it is the smallest event containing ω that satisfies this property. The set of all the connected components of the graph G defines a partition of Y , which is called the meet of F1 , F2 , . . . , Fn . This is the finest partition that satisfies the property that each partition Fi is a refinement of it. We can therefore formulate Theorem 9.24 equivalently as follows. Let (N, Y, (Fi )i∈N , s) be an Aumann model of incomplete information. Event A is common knowledge in state of the world ω ∈ Y if and only if A contains the element of the meet containing ω.
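Theorem 9.24 turns the infinitely many conditions of Definition 9.17 into a finite graph computation. The following sketch computes the meet by a simple graph search and applies it to the model of Example 9.18, identifying the state ωk with the integer k; the function names `meet` and `is_common_knowledge` are hypothetical, and the search shown is only one of several ways to find connected components.

```python
def meet(states, partitions):
    """Return the connected components of the graph G as a list of sets."""
    unvisited = set(states)
    components = []
    while unvisited:
        start = unvisited.pop()
        component = {start}
        frontier = [start]
        while frontier:
            state = frontier.pop()
            for partition in partitions:
                for block in partition:
                    if state in block:
                        # Every state in F_i(state) is adjacent to state in G.
                        new_states = set(block) - component
                        component |= new_states
                        frontier.extend(new_states)
        components.append(component)
        unvisited -= component
    return components


def is_common_knowledge(event, state, states, partitions):
    """Theorem 9.24: A is common knowledge in w iff C(w) is contained in A."""
    component = next(c for c in meet(states, partitions) if state in c)
    return component <= set(event)


Y = set(range(5, 15))
F_J = [{5, 7, 9, 11, 13}, {6, 8, 10, 12, 14}]
F_U = [{8, 12}, {5, 9, 13}, {6, 10, 14}, {7, 11}]
evens = {6, 8, 10, 12, 14}

components = meet(Y, [F_J, F_U])
print(sorted(sorted(c) for c in components))
print(is_common_knowledge(evens, 6, Y, [F_J, F_U]))
print(is_common_knowledge({8, 12}, 8, Y, [F_J, F_U]))
```

Here the meet has two elements, the even states and the odd states, so the event "the number is even" is common knowledge in ω6, while the smaller event {ω8 , ω12 } is not common knowledge in ω8.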
9.2 The Aumann model of incomplete information with beliefs
The following model extends the Aumann model of incomplete information presented in the previous section.
Definition 9.27 An Aumann model of incomplete information with beliefs (over a set of states of nature S) consists of five elements (N, Y, (Fi )i∈N , s, P), where:
• N is a finite set of players;
• Y is a finite set of states of the world;
• Fi is a partition of Y , for each i ∈ N;
• s : Y → S is a function associating a state of nature to every state of the world;
• P is a probability distribution over Y such that P(ω) > 0 for each ω ∈ Y .
Comparing this definition to that of the Aumann model of incomplete information (Definition 9.4), we have added one new element, namely, the probability distribution P over Y , which is called the common prior. In this model, a state of the world ω∗ is selected by a random process in accordance with the common prior probability distribution P. After the true state of the world has been selected by this random process, each player i learns his partition element Fi (ω∗ ) that contains ω∗ . Prior to the stage at which private information is revealed, the players share a common prior distribution, which is interpreted as their belief about the probability that any specific state of the world in Y is the true one. After each player i has acquired his private information Fi (ω∗ ), he updates his beliefs. This process of belief updating is the main topic of this section. The assumption that all the players share a common prior is a strong assumption, and in many cases there are good reasons to doubt that it obtains. We will return to this point later in the chapter. In contrast, the assumption that P(ω) > 0 for all ω ∈ Y is not a strong assumption. As we will show, a state of the world ω for which P(ω) = 0 is one to which all the players assign probability 0, and it can be removed from consideration in Y . In the following examples and in the rest of this chapter, whenever the states of nature are irrelevant we will specify neither the set S nor the function s. Example 9.28 Consider the following Aumann model:
• The set of players is N = {I, II}.
• The set of states of the world is Y = {ω1 , ω2 , ω3 , ω4 }.
• The information partitions of the players are
FI = {{ω1 , ω2 }, {ω3 , ω4 }},   FII = {{ω1 , ω3 }, {ω2 , ω4 }}.   (9.23)
• The common prior P is
P(ω1 ) = 1/4,   P(ω2 ) = 1/4,   P(ω3 ) = 1/3,   P(ω4 ) = 1/6.   (9.24)
A graphic representation of the players’ partitions and the prior probability distribution is provided in Figure 9.2. Player I’s partition elements are marked by a solid line, while Player II’s partition elements are denoted by a dotted line.
Figure 9.2 The information partitions and the prior distribution in Example 9.28
What are the beliefs of each player about the state of the world? Prior to the chance move that selects the state of the world, the players have a common prior distribution over the states of the world. When a player receives information that indicates that the true state of the world is in the partition element Fi (ω∗ ), he updates his beliefs about the states of the world by calculating the conditional probability given his information. For example, if the state of the world is ω1 , Player I knows that the state of the world is either ω1 or ω2 . Player I’s beliefs are therefore
\[ P(\omega_1 \mid \{\omega_1, \omega_2\}) = \frac{P(\omega_1)}{P(\omega_1) + P(\omega_2)} = \frac{1/4}{1/4 + 1/4} = \frac{1}{2}, \tag{9.25} \]
and similarly
\[ P(\omega_2 \mid \{\omega_1, \omega_2\}) = \frac{P(\omega_2)}{P(\omega_1) + P(\omega_2)} = \frac{1/4}{1/4 + 1/4} = \frac{1}{2}. \tag{9.26} \]
In words, if Player I’s information is that the state of the world is in {ω1 , ω2 }, he attributes probability 1/2 to the state of the world ω1 and probability 1/2 to the state of the world ω2 . The tables appearing in Figure 9.3 are arrived at through a similar calculation. The upper table describes Player I’s beliefs, as a function of his information partition, and the lower table represents Player II’s beliefs as a function of his information partition.
Player I’s beliefs:
Player I’s information     ω1     ω2     ω3     ω4
{ω1 , ω2 }                 1/2    1/2    0      0
{ω3 , ω4 }                 0      0      2/3    1/3
Player II’s beliefs:
Player II’s information    ω1     ω2     ω3     ω4
{ω1 , ω3 }                 3/7    0      4/7    0
{ω2 , ω4 }                 0      3/5    0      2/5
Figure 9.3 The beliefs of the players in Example 9.28
For example, if Player II’s information is {ω2 , ω4 } (i.e., the state of the world is either ω2 or ω4 ), he attributes probability 3/5 to the state of the world ω2 and probability 2/5 to the state of the world ω4 . A player’s beliefs will be denoted by square brackets in which states of the world appear alongside the probabilities that are ascribed to them. For example, [3/5(ω2 ), 2/5(ω4 )] represents beliefs in which probability 3/5 is ascribed to state of the world ω2 , and probability 2/5 is ascribed to state of the world ω4 . The calculations performed above yield the first-order beliefs of the players at all possible states of the world. These beliefs can be summarized as follows:
• In state of the world ω1 the first-order belief of Player I is [1/2(ω1 ), 1/2(ω2 )] and that of Player II is [3/7(ω1 ), 4/7(ω3 )].
• In state of the world ω2 the first-order belief of Player I is [1/2(ω1 ), 1/2(ω2 )] and that of Player II is [3/5(ω2 ), 2/5(ω4 )].
• In state of the world ω3 the first-order belief of Player I is [2/3(ω3 ), 1/3(ω4 )] and that of Player II is [3/7(ω1 ), 4/7(ω3 )].
• In state of the world ω4 the first-order belief of Player I is [2/3(ω3 ), 1/3(ω4 )] and that of Player II is [3/5(ω2 ), 2/5(ω4 )].
Given the first-order beliefs of the players over Y , we can construct the second-order beliefs, by which we mean the beliefs each player has about the state of the world and the first-order beliefs of the other player. In state of the world ω1 (or ω2 ) Player I attributes probability 1/2 to the state of the world being ω1 and probability 1/2 to the state of the world being ω2 . As we noted above, when the state of the world is ω1 , the first-order belief of Player II is [3/7(ω1 ), 4/7(ω3 )], and when the state of the world is ω2 , Player II’s first-order belief is [3/5(ω2 ), 2/5(ω4 )]. Therefore:
• In state of the world ω1 (or ω2 ) Player I attributes probability 1/2 to the state of the world being ω1 and the first-order belief of Player II being [3/7(ω1 ), 4/7(ω3 )], and probability 1/2 to the state of the world being ω2 and Player II’s first-order belief being [3/5(ω2 ), 2/5(ω4 )].
We can similarly calculate the second-order beliefs of each of the players in each state of the world:
• In state of the world ω3 (or ω4 ) Player I attributes probability 2/3 to the state of the world being ω3 and the first-order belief of Player II being [3/7(ω1 ), 4/7(ω3 )], and probability 1/3 to the state of the world being ω4 and Player II’s first-order belief being [3/5(ω2 ), 2/5(ω4 )].
• In state of the world ω1 (or ω3 ) Player II attributes probability 3/7 to the state of the world being ω1 and the first-order belief of Player I being [1/2(ω1 ), 1/2(ω2 )], and probability 4/7 to the state of the world being ω3 and Player I’s first-order belief being [2/3(ω3 ), 1/3(ω4 )].
• In state of the world ω2 (or ω4 ) Player II attributes probability 3/5 to the state of the world being ω2 and the first-order belief of Player I being [1/2(ω1 ), 1/2(ω2 )], and probability 2/5 to the state of the world being ω4 and Player I’s first-order belief being [2/3(ω3 ), 1/3(ω4 )].
These calculations can be continued to arbitrarily high orders in a similar manner to yield belief hierarchies of the two players. ◭
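The first- and second-order beliefs listed above can be reproduced mechanically by conditioning the prior on partition elements. The following Python sketch is only an illustration: the encoding of ω1, ..., ω4 as the integers 1 to 4 and all function names are assumptions of the sketch, and the prior and partitions are those implied by the calculations in Example 9.28.

```python
from fractions import Fraction as F

# Prior and partitions consistent with the calculations in Example 9.28.
# States ω1..ω4 are encoded as 1..4 (an encoding chosen for this sketch).
PRIOR = {1: F(1, 4), 2: F(1, 4), 3: F(1, 3), 4: F(1, 6)}
PARTITIONS = {
    "I": [frozenset({1, 2}), frozenset({3, 4})],
    "II": [frozenset({1, 3}), frozenset({2, 4})],
}

def cell(player, state):
    """Partition element of `player` that contains `state`."""
    return next(c for c in PARTITIONS[player] if state in c)

def first_order(player, state):
    """Posterior of `player` over states, given his information at `state`."""
    c = cell(player, state)
    total = sum(PRIOR[x] for x in c)
    return {x: PRIOR[x] / total for x in sorted(c)}

def second_order(player, other, state):
    """Map each state `player` deems possible to the pair
    (its probability, first-order belief of `other` in that state)."""
    return {x: (prob, first_order(other, x))
            for x, prob in first_order(player, state).items()}

for w in (1, 2, 3, 4):
    print(w, first_order("I", w), first_order("II", w))
print(second_order("I", "II", 1))
```

Higher orders could be obtained by nesting the same construction once more, at the cost of less readable output.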
Theorem 9.29 says that in an Aumann model, knowledge is equivalent to belief with probability 1. The theorem, however, requires assuming that P(ω) > 0 for each ω ∈ Y; without that assumption the theorem’s conclusion does not obtain (Exercise 9.21). In Example 9.36 we will see that the conclusion of the theorem also fails to hold when the set of states of the world is infinite.
Theorem 9.29 Let (N, Y, (Fi)_{i∈N}, s, P) be an Aumann model of incomplete information with beliefs. Then for each ω ∈ Y, for each player i ∈ N, and for every event A ⊆ Y, player i knows event A in state of the world ω if and only if he attributes probability 1 to that event:
\[ P(A \mid F_i(\omega)) = 1 \iff F_i(\omega) \subseteq A. \tag{9.27} \]
Notice that the assumption that P(ω) > 0 for every ω ∈ Y, together with ω ∈ Fi(ω) for every ω ∈ Y, yields P(Fi(ω)) > 0 for each player i ∈ N and every state of the world ω ∈ Y, so that the conditional probability in Equation (9.27) is well defined.
Proof: Suppose first that Fi(ω) ⊆ A. Then
\[ P(A \mid F_i(\omega)) \ge P(F_i(\omega) \mid F_i(\omega)) = 1, \tag{9.28} \]
so that P(A | Fi(ω)) = 1. To prove the reverse implication, if P(A | Fi(ω)) = 1 then
\[ P(A \mid F_i(\omega)) = \frac{P(A \cap F_i(\omega))}{P(F_i(\omega))} = 1, \tag{9.29} \]
which yields P(A ∩ Fi(ω)) = P(Fi(ω)). From the assumption that P(ω′) > 0 for each ω′ ∈ Y we conclude that A ∩ Fi(ω) = Fi(ω), that is, Fi(ω) ⊆ A.
A situation of incomplete information with beliefs is a vector (N, Y, (Fi)_{i∈N}, s, P, ω∗) composed of an Aumann model of incomplete information with beliefs (N, Y, (Fi)_{i∈N}, s, P) together with a state of the world ω∗ ∈ Y. The next theorem follows naturally from the analysis we performed in Example 9.28, and it generalizes Theorem 9.15 and Corollary 9.16 to situations of belief.
Theorem 9.30 Every situation of incomplete information with beliefs (N, Y, (Fi)_{i∈N}, s, P, ω∗) uniquely determines a mutual belief hierarchy among the players over the states of the world Y, and therefore also a mutual belief hierarchy over the states of nature S.
The above formulation is not precise, as we have not formally defined what the term “mutual belief hierarchy” means. The formal definition is presented in Chapter 11, where we will show that each state of the world is in fact a pair, consisting of a state of nature and a mutual belief hierarchy among the players over the states of nature S. The inductive description of belief hierarchies, as presented in the examples above and the examples below, will suffice for this chapter. In Example 9.28 we calculated the belief hierarchy of the players in each state of the world. A similar calculation can be performed with respect to events.
Example 9.28 (Continued) Consider the situation in which ω∗ = ω1 and the event A = {ω2, ω3}. As Player I’s information in state of the world ω1 is {ω1, ω2}, the conditional probability that he ascribes to event A in state of the world ω1 (or ω2) is
\[ P(A \mid \{\omega_1, \omega_2\}) = \frac{P(A \cap \{\omega_1, \omega_2\})}{P(\{\omega_1, \omega_2\})} = \frac{P(\{\omega_2\})}{P(\{\omega_1, \omega_2\})} = \frac{\frac{1}{4}}{\frac{1}{4} + \frac{1}{4}} = \frac{1}{2}. \tag{9.30} \]
Because Player II’s information in state of the world ω1 is {ω1, ω3}, the conditional probability that he ascribes to event A in state of the world ω1 (or ω3) is
\[ P(A \mid \{\omega_1, \omega_3\}) = \frac{P(A \cap \{\omega_1, \omega_3\})}{P(\{\omega_1, \omega_3\})} = \frac{P(\{\omega_3\})}{P(\{\omega_1, \omega_3\})} = \frac{\frac{1}{3}}{\frac{1}{4} + \frac{1}{3}} = \frac{4}{7}. \tag{9.31} \]
Second-order beliefs can also be calculated readily. In state of the world ω1, Player I ascribes probability 1/2 to the true state being ω1, in which case the probability that Player II ascribes to event A is 4/7; he ascribes probability 1/2 to the true state being ω2, in which case the probability that Player II ascribes to event A is (1/4)/(1/4 + 1/6) = 3/5. These are Player I’s second-order beliefs about event A in state of the world ω1. We can similarly calculate the second-order beliefs of Player II, as well as all the higher-order beliefs of the two players. ◭
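The second-order beliefs about an event can be computed in the same way as the first-order ones. The Python sketch below is an illustration under stated assumptions: states ω1, ..., ω4 are encoded as 1 to 4, the function names are invented for this sketch, and the prior and partitions are those implied by Equations (9.30) and (9.31).

```python
from fractions import Fraction as F

# Prior and partitions consistent with Equations (9.30) and (9.31).
PRIOR = {1: F(1, 4), 2: F(1, 4), 3: F(1, 3), 4: F(1, 6)}
PARTITIONS = {
    "I": [frozenset({1, 2}), frozenset({3, 4})],
    "II": [frozenset({1, 3}), frozenset({2, 4})],
}
A = frozenset({2, 3})  # the event A = {ω2, ω3}

def cell(player, state):
    """Partition element of `player` that contains `state`."""
    return next(c for c in PARTITIONS[player] if state in c)

def prob(player, state, event):
    """Conditional probability P(event | F_player(state))."""
    c = cell(player, state)
    return sum(PRIOR[x] for x in c & event) / sum(PRIOR[x] for x in c)

def second_order(player, other, state, event):
    """Distribution, as seen by `player` at `state`, of the probability
    that `other` ascribes to `event`."""
    c = cell(player, state)
    total = sum(PRIOR[x] for x in c)
    dist = {}
    for x in c:
        q = prob(other, x, event)
        dist[q] = dist.get(q, 0) + PRIOR[x] / total
    return dist

print(prob("I", 1, A), prob("II", 1, A))
print(second_order("I", "II", 1, A))
print(second_order("II", "I", 1, A))
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the results can be compared directly with the fractions in the text.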
Example 9.31 Consider again the Aumann model of incomplete information with beliefs presented in Example 9.28, but now with the common prior given by
\[ P(\omega_1) = P(\omega_4) = \tfrac{1}{6}, \qquad P(\omega_2) = P(\omega_3) = \tfrac{1}{3}. \tag{9.32} \]
The partitions FI and FII are graphically depicted in Figure 9.4.
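[Figure: the four states with their prior probabilities, ω1 (1/6), ω2 (1/3), ω3 (1/3), ω4 (1/6), and the partition elements of Player I and Player II.]
Figure 9.4 The information partitions and the prior distribution in Example 9.31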
Since ω1 ∈ FI(ω2), ω2 ∈ FII(ω4), and ω4 ∈ FI(ω3) in the graph corresponding to this Aumann model, all states in Y are connected. Hence the only connected component in the graph is Y (verify!), and therefore the only event that is common knowledge in any state of the world ω is Y (Theorem 9.24). Consider now the event A = {ω2, ω3} and the situation in which ω∗ = ω1. What is the conditional probability that the players ascribe to A? Similarly to the calculation performed in Example 9.28,
\[ P(A \mid \{\omega_1, \omega_2\}) = \frac{P(A \cap \{\omega_1, \omega_2\})}{P(\{\omega_1, \omega_2\})} = \frac{P(\{\omega_2\})}{P(\{\omega_1, \omega_2\})} = \frac{\frac{1}{3}}{\frac{1}{6} + \frac{1}{3}} = \frac{2}{3}, \tag{9.33} \]
and we can also readily calculate that both players ascribe probability 2/3 to event A in each state of the world. Formally:
\[ \{\omega : q_I := P(A \mid F_I(\omega)) = \tfrac{2}{3}\} = Y, \qquad \{\omega : q_{II} := P(A \mid F_{II}(\omega)) = \tfrac{2}{3}\} = Y. \tag{9.34} \]
It follows from the definition of the knowledge operator that the event “Player I ascribes probability 2/3 to A” is common knowledge in each state of the world, and the event “Player II ascribes probability 2/3 to A” is also common knowledge in each state of the world. In other words, in this situation the probabilities that the two players ascribe to event A are both common knowledge and equal to each other. ◭
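Both claims of Example 9.31 (the graph has a single connected component, and each player ascribes the same probability to A in every state) can be checked numerically. The following Python sketch is illustrative only: the encoding of the states as 1 to 4, the breadth-of-reach routine `component`, and the other names are assumptions of this sketch rather than constructions from the text.

```python
from fractions import Fraction as F

# Common prior of Equation (9.32); partitions as in Example 9.28.
PRIOR = {1: F(1, 6), 2: F(1, 3), 3: F(1, 3), 4: F(1, 6)}
PARTITIONS = {
    "I": [frozenset({1, 2}), frozenset({3, 4})],
    "II": [frozenset({1, 3}), frozenset({2, 4})],
}
A = frozenset({2, 3})

def cell(player, state):
    """Partition element of `player` that contains `state`."""
    return next(c for c in PARTITIONS[player] if state in c)

def prob(player, state, event):
    """Conditional probability P(event | F_player(state))."""
    c = cell(player, state)
    return sum(PRIOR[x] for x in c & event) / sum(PRIOR[x] for x in c)

def component(state):
    """Connected component of `state`: all states reachable by repeatedly
    moving inside a partition element of some player."""
    comp, frontier = {state}, [state]
    while frontier:
        x = frontier.pop()
        for player in PARTITIONS:
            for y in cell(player, x) - comp:
                comp.add(y)
                frontier.append(y)
    return comp

# Set of values taken by each player's posterior of A over all states.
posteriors = {p: {prob(p, w, A) for w in PRIOR} for p in PARTITIONS}
print(component(1), posteriors)
```

A posterior that takes a single value on the whole connected component is exactly the situation in which that value is common knowledge.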
Is it a coincidence that the probabilities qI and qII that the two players assign to the event A in Example 9.31 are equal (both being 2/3)? Can there be a situation in which it is common knowledge that to the event A, Player I ascribes probability qI and Player II
ascribes probability qII, where qI ≠ qII? Theorem 9.32 asserts that this state of affairs is impossible.
Theorem 9.32 Aumann’s Agreement Theorem (Aumann [1976]) Let (N, Y, (Fi)_{i∈N}, s, P) be an Aumann model of incomplete information with beliefs, and suppose that n = 2 (i.e., there are two players). Let A ⊆ Y be an event and let ω ∈ Y be a state of the world. If the event “Player I ascribes probability qI to A” is common knowledge in ω, and the event “Player II ascribes probability qII to A” is also common knowledge in ω, then qI = qII.
Let us take a moment to consider the significance of this theorem before proceeding to its proof. The theorem states that if two players begin with “identical beliefs about the world” (represented by the common prior P) but receive disparate information (represented by their respective partition elements containing ω), then “they cannot agree to disagree”: if they agree that the probability that Player I ascribes to a particular event is qI, then they cannot also agree that Player II ascribes a probability qII to the same event, unless qI = qII. If they disagree regarding a particular fact (for example, Player I ascribes probability qI to event A and Player II ascribes probability qII to the same event), then the fact that they disagree cannot be common knowledge. Since we know that people often agree to disagree, we must conclude that either (a) different people begin with different prior distributions over the states of the world, or (b) people incorrectly calculate conditional probabilities when they receive information regarding the true state of the world.
Proof of Theorem 9.32: Let C be the connected component of ω in the graph corresponding to the given Aumann model. It follows from Theorem 9.24 that event C is common knowledge in state of the world ω. The event C can be represented as a union of partition elements in FI; that is, C = ∪_j F_I^j, where F_I^j ∈ FI for each j.
Since P(ω′) > 0 for every ω′ ∈ Y, it follows that P(F_I^j) > 0 for every j, and therefore P(C) > 0. The fact that Player I ascribes probability qI to the event A is common knowledge in ω. It follows that the event “Player I ascribes probability qI to A” contains the event C (Corollary 9.25), and therefore contains each one of the events F_I^j. This implies that for each of the sets F_I^j the conditional probability of A, given that Player I’s information is F_I^j, equals qI. In other words, for each j,
\[ P(A \mid F_I^j) = \frac{P(A \cap F_I^j)}{P(F_I^j)} = q_I. \tag{9.35} \]
As this equality holds for every j, and C is the union of the pairwise disjoint sets F_I^j, it follows from Equation (9.35) that
\[ P(A \cap C) = \sum_j P(A \cap F_I^j) = q_I \sum_j P(F_I^j) = q_I P(C). \tag{9.36} \]
We similarly derive that
\[ P(A \cap C) = q_{II} P(C). \tag{9.37} \]
Finally, since P(C) > 0, Equations (9.36) and (9.37) imply that qI = qII, which is what we wanted to show.
How do players arrive at a situation in which the probabilities qI and qII that they ascribe to a particular event A are common knowledge? In Example 9.31, each player calculates
the conditional probability of A given a partition element of the other player, and comes to the conclusion that no matter which partition element of the other player is used for the conditioning, the conditional probability turns out to be the same. That is why qi is common knowledge among the players for i = I, II. In most cases the conditional probability of an event is not common knowledge, because it varies from one partition element to another. We can, however, describe a process of information transmission between the players that guarantees that these conditional probabilities will become common knowledge when the process is complete (see Exercises 9.25 and 9.26). Suppose that each player publicly announces the conditional probability he ascribes to event A given the information (i.e., the partition element) at his disposal. After each player has heard the other player’s announcement, he can rule out some states of the world, because they are impossible: possible states of the world are only those in which the conditional probability that the other player ascribes to event A is the conditional probability that he publicly announced. Each player can then update the conditional probability that he ascribes to event A following the elimination of impossible states of the world, and again publicly announce the new conditional probability he has calculated. Following this announcement, the players can again rule out the states of the world in which the updated conditional probability of the other player differs from that which he announced, update their conditional probabilities, and announce them publicly. This can be repeated again and again. Using Aumann’s Agreement Theorem (Theorem 9.32), it can be shown that at the end of this process the players will converge to the same conditional probability, which will be common knowledge among them (Exercise 9.28).
Example 9.33 We now provide an example of the dynamic process just described. More examples can be found in Exercises 9.25 and 9.26. Consider the following Aumann model of incomplete information:
• N = {I, II}.
• Y = {ω1, ω2, ω3, ω4}.
• The information partitions of the players are
\[ \mathcal{F}_I = \{\{\omega_1, \omega_2\}, \{\omega_3, \omega_4\}\}, \qquad \mathcal{F}_{II} = \{\{\omega_1, \omega_2, \omega_3\}, \{\omega_4\}\}. \tag{9.38} \]
• The prior distribution is
\[ P(\omega_1) = P(\omega_4) = \tfrac{1}{3}, \qquad P(\omega_2) = P(\omega_3) = \tfrac{1}{6}. \tag{9.39} \]
The partition elements FI and FII are as depicted graphically in Figure 9.5.
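[Figure: the four states with their prior probabilities, ω1 (1/3), ω2 (1/6), ω3 (1/6), ω4 (1/3), and the partition elements of Player I and Player II.]
Figure 9.5 The information partitions and the prior distribution in Example 9.33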
Let A = {ω2, ω3}, and suppose that the true state of the world is ω3. We will now trace the dynamic process described above. Player I announces the conditional probability P(A | {ω3, ω4}) = 1/3 that he ascribes to event A, given his information. Notice that in every state of the world Player I ascribes probability 1/3 to event A, so that this announcement does not add any new information to Player II. Next, Player II announces the conditional probability P(A | {ω1, ω2, ω3}) = 1/2 that he ascribes to A, given his information. This enables Player I to learn that the true state of the world is not ω4, because if it were ω4, Player II would have ascribed conditional probability 0 to the event A. Player I therefore knows, after Player II’s announcement, that the true state of the world is ω3, and then announces that the conditional probability he ascribes to the event A is 1. This informs Player II that the true state of the world is ω3, because if the true state of the world were ω1 or ω2 (the two other possible states, given Player II’s information), Player I would have announced that he ascribed conditional probability 1/3 to the event A. Player II therefore announces that the conditional probability he ascribes to the event A is 1, and this probability is now common knowledge among the two players.
It is left to the reader to verify that if the true state of the world is ω1 or ω2, the dynamic process described above will lead the two players to common knowledge that the conditional probability of the event A is 1/3. ◭
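The announcement process of Example 9.33 can be simulated by refining each listener's partition according to the value the speaker would have announced in each state. The Python sketch below is one possible rendering, under these assumptions: the players speak alternately with Player I first (as in the example), states ω1, ..., ω4 are encoded as 1 to 4, and the helper names `refine` and `announce` are hypothetical.

```python
from fractions import Fraction as F

# Model of Example 9.33, Equations (9.38) and (9.39).
PRIOR = {1: F(1, 3), 2: F(1, 6), 3: F(1, 6), 4: F(1, 3)}
PARTITIONS = {
    "I": [frozenset({1, 2}), frozenset({3, 4})],
    "II": [frozenset({1, 2, 3}), frozenset({4})],
}
A = frozenset({2, 3})

def posterior(partition, state, event):
    """P(event | partition element containing `state`)."""
    c = next(c for c in partition if state in c)
    return sum(PRIOR[x] for x in c & event) / sum(PRIOR[x] for x in c)

def refine(listener, speaker, event):
    """Split each element of the listener's partition according to the
    value the speaker would announce in each state."""
    new = []
    for c in listener:
        by_value = {}
        for x in c:
            by_value.setdefault(posterior(speaker, x, event), set()).add(x)
        new.extend(frozenset(s) for s in by_value.values())
    return new

def announce(partitions, true_state, event, max_rounds=10):
    """Alternate public announcements until a full round changes nothing.
    Returns the list of (speaker, announced probability) pairs."""
    part = dict(partitions)
    log = []
    for _ in range(max_rounds):
        before = {k: set(v) for k, v in part.items()}
        for speaker, listener in (("I", "II"), ("II", "I")):
            log.append((speaker, posterior(part[speaker], true_state, event)))
            part[listener] = refine(part[listener], part[speaker], event)
        if {k: set(v) for k, v in part.items()} == before:
            break
    return log

print(announce(PARTITIONS, 3, A))
print(announce(PARTITIONS, 1, A))
```

The loop stops after a round in which no partition is refined, at which point both players repeat the same number; the final round is therefore a pure repetition of the previous announcements.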
Aumann’s Agreement Theorem has important implications regarding the rationality of betting between two risk-neutral players (or two players who share the same level of risk aversion). To simplify the analysis, suppose that the two players bet that if a certain event A occurs, Player II pays Player I one dollar, and if event A fails to occur, Player I pays Player II one dollar instead. Labeling the probabilities that the players ascribe to event A as qI and qII respectively, Player I should be willing to take this bet if and only if qI ≥ 1/2, with Player II agreeing to the bet if and only if qII ≤ 1/2. Suppose that Player I accepts the bet. Then the fact that he has accepted the bet is common knowledge, which means that the fact that qI ≥ 1/2 is common knowledge. By the same reasoning, if Player II agrees to the bet, that fact is common knowledge, and therefore the fact that qII ≤ 1/2 is common knowledge. Using a proof very similar to that of Aumann’s Agreement Theorem, we conclude that it is impossible for both facts to be common knowledge unless qI = qII = 1/2, in which case the expected payoff for each player is 0, and there is no point in betting (see Exercises 9.29 and 9.30).
Note that the agreement theorem rests on two main assumptions:
• Both players share a common prior over Y.
• The probability that each of the players ascribes to event A is common knowledge among them.
Regarding the first assumption, the common prior distribution P is part of the Aumann model of incomplete information with beliefs and it is used to compute the players’ beliefs given their partitions. As the following example shows, if each player’s belief is computed from a different probability distribution, we obtain a more general model in which the agreement theorem does not hold. We will return to Aumann models with incomplete information and different prior distributions in Chapter 10.
Example 9.34 In this example we will show that if the two players have different priors, Theorem 9.32 does not hold. Consider the following Aumann model of incomplete information:
• N = {I, II}.
• Y = {ω1, ω2, ω3, ω4}.
• The information that the two players have is given by
\[ \mathcal{F}_I = \{\{\omega_1, \omega_2\}, \{\omega_3, \omega_4\}\}, \qquad \mathcal{F}_{II} = \{\{\omega_1, \omega_4\}, \{\omega_2, \omega_3\}\}. \tag{9.40} \]
• Player I calculates his beliefs based on the following prior distribution:
\[ P_I(\omega_1) = P_I(\omega_2) = P_I(\omega_3) = P_I(\omega_4) = \tfrac{1}{4}. \tag{9.41} \]
• Player II calculates his beliefs based on the following prior distribution:
\[ P_{II}(\omega_1) = P_{II}(\omega_3) = \tfrac{2}{10}, \qquad P_{II}(\omega_2) = P_{II}(\omega_4) = \tfrac{3}{10}. \tag{9.42} \]
The only connected component in the graph corresponding to this Aumann model is Y (verify!), so that the only event that is common knowledge in any state of the world ω is Y. Let A = {ω1, ω3}. A quick calculation reveals that in each state ω ∈ Y
\[ P_I(A \mid F_I(\omega)) = \tfrac{1}{2}, \qquad P_{II}(A \mid F_{II}(\omega)) = \tfrac{2}{5}. \tag{9.43} \]
That is,
\[ \{\omega : q_I := P_I(A \mid F_I(\omega)) = \tfrac{1}{2}\} = Y, \qquad \{\omega : q_{II} := P_{II}(A \mid F_{II}(\omega)) = \tfrac{2}{5}\} = Y. \tag{9.44} \]
From the definition of the knowledge operator it follows that the facts that qI = 1/2 and qII = 2/5 are common knowledge in every state of the world. In other words, it is common knowledge in every state of the world that the players ascribe different probabilities to the event A. This does not contradict Theorem 9.32 because the players do not share a common prior. In fact, this result is not surprising; because the players start off by “agreeing” that their initial probability distributions diverge (and that fact is common knowledge), it is no wonder that it is common knowledge among them that they ascribe different probabilities to event A (after learning which partition element they are in). ◭
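The role of the common prior in Example 9.34 can be seen by computing the posteriors once with the two subjective priors and once with a single shared prior. The Python sketch below is illustrative: the integer encoding of the states and the function name `posterior_values` are assumptions of the sketch, while the partitions and priors are those of Equations (9.40) to (9.42).

```python
from fractions import Fraction as F

# Partitions (9.40) and subjective priors (9.41), (9.42) of Example 9.34.
PARTITIONS = {
    "I": [frozenset({1, 2}), frozenset({3, 4})],
    "II": [frozenset({1, 4}), frozenset({2, 3})],
}
PRIOR_I = {w: F(1, 4) for w in (1, 2, 3, 4)}
PRIOR_II = {1: F(2, 10), 2: F(3, 10), 3: F(2, 10), 4: F(3, 10)}
A = frozenset({1, 3})

def posterior_values(prior, player, event):
    """Set of values taken, over all states, by the posterior probability
    of `event` computed from `prior` and the partition of `player`."""
    values = set()
    for c in PARTITIONS[player]:
        values.add(sum(prior[x] for x in c & event) / sum(prior[x] for x in c))
    return values

# Different priors: each posterior is constant, but the constants differ.
print(posterior_values(PRIOR_I, "I", A), posterior_values(PRIOR_II, "II", A))
# With the same prior for both players, the constants coincide.
print(posterior_values(PRIOR_I, "I", A), posterior_values(PRIOR_I, "II", A))
```

Replacing Player II's prior by Player I's restores agreement, which is consistent with the common prior being the assumption that fails in this example.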
Example 9.35 In this example we will show that even if the players share a common prior, if the fact that “Player II ascribes probability qII to event A” is not common knowledge, Theorem 9.32 does not hold; that is, it is possible that qI ≠ qII. Consider the following Aumann model of incomplete information:
• N = {I, II}.
• Y = {ω1, ω2, ω3, ω4}.
• The players’ information partitions are
\[ \mathcal{F}_I = \{\{\omega_1, \omega_2\}, \{\omega_3, \omega_4\}\}, \qquad \mathcal{F}_{II} = \{\{\omega_1, \omega_2, \omega_3\}, \{\omega_4\}\}. \tag{9.45} \]
• The common prior distribution is
\[ P(\omega_1) = P(\omega_2) = P(\omega_3) = P(\omega_4) = \tfrac{1}{4}. \tag{9.46} \]
The partitions FI and FII are depicted graphically in Figure 9.6.
[Figure: the four states ω1, ω2, ω3, ω4, each with prior probability 1/4, and the partition elements of Player I and Player II.]
Figure 9.6 The partitions of the players in Example 9.35 and the common prior
The only connected component in the graph corresponding to this Aumann model is Y (verify!). Let A = {ω1, ω3}. In each state of the world, the probability that Player I ascribes to event A is qI = 1/2:
\[ \{\omega \in Y : q_I = P(A \mid F_I(\omega)) = \tfrac{1}{2}\} = Y, \tag{9.47} \]
and therefore the fact that qI = 1/2 is common knowledge in every state of the world.
In states of the world ω1, ω2, and ω3, Player II ascribes probability 2/3 to event A:
\[ \{\omega \in Y : q_{II} = P(A \mid F_{II}(\omega)) = \tfrac{2}{3}\} = \{\omega_1, \omega_2, \omega_3\} \subseteq Y, \tag{9.48} \]
and in state of the world ω4 he ascribes probability 0 to A. Since the only event that is common knowledge in any state of the world is Y, the event “Player II ascribes probability 2/3 to A” is not common knowledge in any state of the world. For that reason, the fact that qI ≠ qII does not contradict Theorem 9.32.
Note that in state of the world ω1, Player I knows that the state of the world is in {ω1, ω2}, and therefore he knows that Player II’s information is {ω1, ω2, ω3}, and thus he (Player I) knows that Player II ascribes probability qII = 2/3 to the event A. However, the fact that Player II ascribes probability qII = 2/3 to event A is not common knowledge among the players in the state of the world ω1. This is so because in that state of the world Player II cannot exclude the possibility that the state of the world is ω3 (he ascribes probability 1/3 to this). If the state of the world is ω3, Player I knows that the state of the world is in {ω3, ω4}, and therefore he (Player I) cannot exclude the possibility that the state of the world is ω4 (he ascribes probability 1/2 to this), in which case Player II knows that the state of the world is ω4, and then the probability that Player II ascribes to event A is 0 (qII = 0). Therefore, in state of the world ω1 Player II ascribes probability 1/3 to the fact that Player I ascribes probability 1/2 to Player II ascribing probability 0 to event A. Thus, in state of the world ω1 Player I knows that qII = 2/3, yet this event is not common knowledge among the players. ◭
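The distinction drawn in Example 9.35 between "Player I knows" and "it is common knowledge" can be checked with the knowledge operator, using the characterization that player i knows an event in ω exactly when Fi(ω) is contained in it. The Python sketch below is an illustration under assumptions: states are encoded as 1 to 4, and `common_knowledge` iterates the "everybody knows" operator to a fixed point, which is one standard way to compute the set of states at which an event is common knowledge in a finite model.

```python
from fractions import Fraction as F

# Model of Example 9.35, Equations (9.45) and (9.46).
STATES = {1, 2, 3, 4}
PRIOR = {w: F(1, 4) for w in STATES}
PARTITIONS = {
    "I": [frozenset({1, 2}), frozenset({3, 4})],
    "II": [frozenset({1, 2, 3}), frozenset({4})],
}
A = frozenset({1, 3})

def cell(player, state):
    """Partition element of `player` that contains `state`."""
    return next(c for c in PARTITIONS[player] if state in c)

def prob(player, state, event):
    """Conditional probability P(event | F_player(state))."""
    c = cell(player, state)
    return sum(PRIOR[x] for x in c & event) / sum(PRIOR[x] for x in c)

def knows(player, event):
    """States at which the player's partition element lies inside `event`."""
    return {w for w in STATES if cell(player, w) <= event}

def common_knowledge(event):
    """States at which `event` is common knowledge, obtained by iterating
    'every player knows' until the set stops shrinking."""
    current = set(event)
    while True:
        nxt = set.intersection(*(knows(p, current) for p in PARTITIONS))
        if nxt == current:
            return current
        current = nxt

# The event "Player II ascribes probability 2/3 to A".
E = {w for w in STATES if prob("II", w, A) == F(2, 3)}
print(E, knows("I", E), knows("II", knows("I", E)), common_knowledge(E))
```

In this model Player I knows E at ω1, but there is no state at which Player II knows that Player I knows E, so the iteration ends with the empty set.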
Before we proceed, let us recall that an Aumann model consists of two elements:
• The partitions of the players, which determine the information (knowledge) they possess.
• The common prior P that, together with the partitions, determines the beliefs of the players.
The knowledge structure in an Aumann model is independent of the common prior P. Furthermore, as we saw in Example 9.34, even when there is no common prior, and instead every player has a different subjective prior distribution, the underlying knowledge structure and the set of common knowledge events are unchanged. Not surprisingly, the Agreement Theorem (Theorem 9.32), which deals with beliefs, depends on the assumption of a common prior, while the common knowledge characterization theorem (Theorem 9.24, page 333) is independent of the assumption of a common prior.
9.3 An infinite set of states of the world
Thus far in the chapter, we have assumed that the set of states of the world is finite. What if this set is infinite? With regard to set-theoretic operations, in the case of an infinite set of states of the world we can make use of the same operations that we implemented in the finite case. On the other hand, dealing with the beliefs of the players requires using tools from probability theory, which in the case of an infinite set of states of the world means that we need to ensure that this set is a measurable space.
A measurable space is a pair (Y, F), with Y denoting a set, and F a σ-algebra over Y. This means that F is a family of subsets of Y that includes the empty set, is closed under complementation (i.e., if A ∈ F then A^c = Y \ A ∈ F), and is closed under countable unions (i.e., if (An)_{n=1}^∞ is a family of sets in F then ∪_{n=1}^∞ An ∈ F). An event is any element of F. In particular, the partitions of the players, Fi, are composed solely of elements of F. The collection of all the subsets of Y, 2^Y, is a σ-algebra over Y, and therefore (Y, 2^Y) is a measurable space. This is in fact the measurable space we used, without specifically mentioning it, in all the examples we have seen so far in which Y was a finite set. All the infinite sets of states of the world Y that we will consider in the rest of the section will be a subset of a Euclidean space, and the σ-algebra F will be the σ-algebra of Borel sets, that is, the smallest σ-algebra that contains all the relatively open sets¹¹ in Y.
The next example shows that when the set of states of the world is infinite, knowledge is not equivalent to belief with probability 1 (in contrast to the finite case; see Theorem 9.29 on page 336).
Example 9.36 Consider an Aumann model of incomplete information in which the set of players N = {I} contains only one player, the set of states of the world is Y = [0, 1], the σ-algebra F is the σ-algebra of Borel sets,¹² and the player has no information, which means that FI = {Y}. The common prior P is the uniform distribution over the interval [0, 1]. Since there is only one player and his partition contains only one element, the only event that the player knows (in any state of the world ω) is Y. Let A be the set of irrational numbers in the interval [0, 1], which is in F. As the set A does not contain Y, the player does not know A. But P(A | FI(ω)) = P(A | Y) = P(A) = 1 for all ω ∈ Y. ◭
11 When Y ⊆ R^d, a set A ⊆ Y is relatively open in Y if it is equal to the intersection of Y with an open set in R^d.
12 In this case the σ-algebra of Borel sets is the smallest σ-algebra that contains all the open intervals in [0, 1], and the intervals of the form [0, α) and (α, 1] for α ∈ (0, 1).
Next we show that when the set of states of the world is infinite, the very notion of knowledge hierarchy can be problematic. To make use of the knowledge structure, for every event A ∈ F the event Ki A must also be an element of F: if we can talk about the event A, we should also be able to talk about the event that “player i knows A.” Is it true that for every σ-algebra, every collection of partitions (Fi)_{i∈N} representing the information of the players, and every event A ∈ F, it is necessarily true that Ki A ∈ F? When the set of states of the world is infinite, the answer to that question is no. This is illustrated in the next example, which uses the fact that there is a Borel set in the unit square whose projection onto the first coordinate is not a Borel set in the interval [0, 1] (see Suslin [1917]).
Example 9.37 Consider the following Aumann model of incomplete information:
• There are two players: N = {I, II}.
• The space of states of the world is the unit square: Y = [0, 1] × [0, 1], and F is the σ-algebra of Borel sets in the unit square.
• For i = I, II, the information of player i is the i-th coordinate of ω; that is, for each x, y ∈ [0, 1] denote
\[ A_x = \{(x, y) \in Y : 0 \le y \le 1\}, \qquad B_y = \{(x, y) \in Y : 0 \le x \le 1\}. \tag{9.49} \]
Ax is the set of all points in Y whose first coordinate is x, and By is the set of all points in Y whose second coordinate is y. We then have
\[ \mathcal{F}_I = \{A_x : 0 \le x \le 1\}, \qquad \mathcal{F}_{II} = \{B_y : 0 \le y \le 1\}. \tag{9.50} \]
In words, Player I’s partition is the set of vertical sections of Y, and the partition of Player II is the set of horizontal sections of Y. Thus, for any (x, y) ∈ Y Player I knows the x-coordinate and Player II knows the y-coordinate. Let E ⊆ Y be a Borel set whose projection onto the x-axis is not a Borel set, i.e., the set
\[ F = \{x \in [0, 1] : \text{there exists } y \in [0, 1] \text{ such that } (x, y) \in E\} \tag{9.51} \]
is not a Borel set, and hence F^c = [0, 1] \ F is also not a Borel set in [0, 1]. Player I knows that the event E does not obtain precisely when the x-coordinate is not in F:
\[ K_I(E^c) = F^c \times [0, 1]. \tag{9.52} \]
This implies that despite the fact that the set E^c is a Borel set, the set of states of the world in which Player I knows the event E^c is not a Borel set. ◭
In spite of the technical difficulties indicated by Examples 9.36 and 9.37, in Chapter 10 we develop a general model of incomplete information that allows infinite sets of states of the world.
9.4 The Harsanyi model of games with incomplete information
In our treatment of the Aumann model of incomplete information, we concentrated on concepts such as mutual knowledge and mutual beliefs among players regarding the true
state of the world. Now we will analyze games with incomplete information, which are models in which the incomplete information is about the game that the players play. In this case, a state of nature consists of all the parameters that have a bearing on the payoffs, that is, the set of actions of each player and his payoff function. This is why the state of nature in this case is also called the payoff-relevant parameter of the game. This model was first introduced by John Harsanyi [1967], nine years prior to the introduction of the Aumann model of incomplete information, and was the first model of incomplete information used in game theory.
The Harsanyi model consists of two elements. The first is the games in which the players may participate, which will be called “state games,” and are the analog of states of nature in Aumann’s model of incomplete information. The second is the beliefs that the players have about both the state games and the beliefs of the other players. Since the information a player has about the game is incomplete, a player is characterized by his beliefs about the state of nature (namely, the state game) and the beliefs of the other players. Harsanyi called this characterization the type of the player. In fact, as we shall see, a player’s type in a Harsanyi model is equivalent to his belief hierarchy in an Aumann model.
Just as we did when studying the Aumann model of incomplete information, we will also assume here that the space of states of the world is finite, so that the number of types of each player is finite. We will further assume that every player knows his own type, and that the set of types is common knowledge among the players.
Example 9.38 Harry (Player I, the row player) and William (Player II, the column player) are playing a game in which the payoff functions are determined by one of the two matrices appearing in Figure 9.7.
William has two possible actions (t and b), while Harry has two or three possible actions (either T and B, or T , C, and B), depending on the payoff function.
State game G1
                William
                t       b
Harry     T     1, 0    0, 2
          B     0, 3    1, 0

State game G2
                William
                t       b
Harry     T     1, 1    1, 0
          C     0, 2    1, 1
          B     1, 0    0, 2

Figure 9.7 The state games in the game in Example 9.38
Harry knows the payoff function (and therefore in particular knows whether he has two or three actions available). William only knows that the payoff functions are given by either G1 or G2. He ascribes probability p to the payoff function being given by G1 and probability 1 − p to the payoff function being given by G2. This description is common knowledge among Harry and William.¹³
13 In other words, both Harry and William know that this is the description of the game, each one knows that the other knows that this is the description of the game, and so on.
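[Figure: game tree. A chance move (player 0) selects G1 with probability p or G2 with probability 1 − p; Player I then chooses T or B in G1, or T, C, or B in G2; Player II then chooses t or b. Leaf payoffs after t and b respectively: in G1, T leads to (1, 0) and (0, 2), B leads to (0, 3) and (1, 0); in G2, T leads to (1, 1) and (1, 0), C leads to (0, 2) and (1, 1), B leads to (1, 0) and (0, 2).]
Figure 9.8 The game in Example 9.38 in extensive form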
This situation can be captured by the extensive-form game appearing in Figure 9.8. In this game, Nature chooses G1 or G2 with probability p or 1 − p, respectively, with the choice known to Harry but not to William. In Figure 9.8 the state games are delineated by broken lines. Neither of the two state games is a subgame (Definition 3.11, page 45), because there are information sets that contain vertices from two state games. ◭
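From William's point of view, the chance move in Figure 9.8 means that his payoff from an action is an average over the two state games. The Python sketch below illustrates this with the payoffs read from the leaves of Figure 9.8; the dictionary layout, the notion of a "plan" assigning Harry one action per state game, and the function name are assumptions made for this illustration, not constructions from the text.

```python
from fractions import Fraction as F

# Payoffs (Harry, William), indexed by (Harry's action, William's action),
# as read from the leaves of the extensive-form game in Figure 9.8.
G1 = {("T", "t"): (1, 0), ("T", "b"): (0, 2),
      ("B", "t"): (0, 3), ("B", "b"): (1, 0)}
G2 = {("T", "t"): (1, 1), ("T", "b"): (1, 0),
      ("C", "t"): (0, 2), ("C", "b"): (1, 1),
      ("B", "t"): (1, 0), ("B", "b"): (0, 2)}

def william_expected_payoff(p, harry_plan, william_action):
    """William's expected payoff when G1 has probability p, G2 has
    probability 1 - p, and Harry plays harry_plan[G] in state game G."""
    u1 = G1[(harry_plan["G1"], william_action)][1]
    u2 = G2[(harry_plan["G2"], william_action)][1]
    return p * u1 + (1 - p) * u2

# Hypothetical numbers: p = 1/2, Harry plays B in G1 and C in G2.
plan = {"G1": "B", "G2": "C"}
for action in ("t", "b"):
    print(action, william_expected_payoff(F(1, 2), plan, action))
```

Harry, who knows the state game, simply reads his payoff from the relevant matrix; only William needs to average.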
The game appearing in Figure 9.8 is the game that Harsanyi suggested as the model for the situation described in Example 9.38. Such a game is called a Harsanyi game with incomplete information and is defined as follows.
Definition 9.39 A Harsanyi game with incomplete information is a vector (N, (Ti)_{i∈N}, p, S, (st)_{t∈×_{i∈N} Ti}) where:
• N is a finite set of players.
• Ti is a finite set of types for player i, for each i ∈ N. The set of type vectors is denoted by T = ×_{i∈N} Ti.
• p ∈ Δ(T) is a probability distribution over the set of type vectors¹⁴ that satisfies p(ti) := Σ_{t−i ∈ T−i} p(ti, t−i) > 0 for every player i ∈ N and every type ti ∈ Ti.
• S is a set of states of nature, which will be called state games.¹⁵ Every state of nature s ∈ S is a vector s = (N, (Ai)_{i∈N}, (ui)_{i∈N}), where Ai is a nonempty set of actions of player i and ui : ×_{i∈N} Ai → R is the payoff function of player i.
• st = (N, (Ai(ti))_{i∈N}, (ui(t))_{i∈N}) ∈ S is the state game for the type vector t, for every t ∈ T. Thus, player i’s action set in the state game st depends on his type ti only, and is independent of the types of the other players.
14 Recall that T−i = ×_{j≠i} Tj and t−i = (tj)_{j≠i}.
15 For the sake of notational convenience, every state game will be represented by a game in strategic form. Everything stated in this section also holds true in the case in which every state game is represented by a game in extensive form.
A game with incomplete information proceeds as follows:
• A chance move selects a type vector t = (t1, t2, ..., tn) ∈ T according to the probability distribution p.
• Every player i knows the type ti that has been selected for him (i.e., his own type), but does not know the types t−i = (tj)_{j≠i} of the other players.
• The players select their actions simultaneously: each player i, knowing his type ti, selects an action ai ∈ Ai(ti).
• Every player i receives the payoff ui(t; a), where a = (a1, a2, ..., an) is the vector of actions that have been selected by the players.
Player i, of type ti, does not know the types of the other players; he has a belief about their types. This belief is the conditional probability distribution p(· | ti) over the set T−i = ×_{j≠i} Tj of the vectors of the types of the other players. The set of actions that player i believes that he has at his disposal is part of his type, and therefore the set Ai(ti) of actions available to player i in the state game st is determined by ti only, and not by the types of other players. It is possible for a player, though, not to know the sets of actions that the other players have at their disposal, as we saw in Example 9.38. He does have beliefs about the action sets of the other players, which are derived from his beliefs about their types. The payoff of player i depends on the state game st and on the vector of actions a selected by the players, so that it depends on the vector of types t in two ways:
r The set of action vectors A(t) := ×i∈N Ai (ti ) in the state game st depends on t. r The payoff function ui (t) in the state game st depends on t; even when the sets of action Aj (tj ) do not depend on the players’ types, player i’s payoff depends on the types of all players. Because a player may not know for certain the types of the other players, he may not know for certain the state of nature, which in turn implies that he may not know for certain his own payoff function. In summary, a Harsanyi game is an extensive-form game consisting of several state games related to each other through information sets, as depicted in Figure 9.8. Remark 9.40 The Harsanyi model, defined in Definition 9.39, provides a tool to describe the incomplete information a player may have regarding the possible action sets of the other players, their utility functions, and even the set of other players active in the game: a player j who is not active in state game st has type tj such that Aj (tj ) contains only one action. This interpretation makes sense because the set of equilibria in the game is independent of the payoffs to such a type tj of player j (see Exercise 9.45). Remark 9.41 We note that in reality there is no chance move that selects one of the state games: the players play one, and only one, of the possible state games. The Harsanyi game is a construction we use to present the situation of interest to us, by describing the beliefs of the players about the game that they are playing. For instance, suppose that
in Example 9.38 the players play the game $G_1$. Since William does not know whether the game he is playing is $G_1$ or $G_2$, from his standpoint he plays a state game that can be either one of these games. Therefore, he constructs the extensive-form game that is described in Figure 9.8, and the situation that he faces is a Harsanyi game with incomplete information.

In the economic literature the Harsanyi model is often referred to as the ex ante stage¹⁶ model as it captures the situation of the players before knowing their types. The situation obtained after the chance move has selected the types of the players is referred to as the interim stage model. This model corresponds to an Aumann situation of incomplete information that captures the situation in a specific state of the world (Definition 9.5, page 324).

The next theorem generalizes Example 9.38, and states that any Harsanyi game with incomplete information can be described as an extensive-form game. Its proof is left to the reader (Exercise 9.35).

Theorem 9.42 Every (Harsanyi) game with incomplete information can be described as an extensive-form game (with moves of chance and information sets).
9.4.1 Belief hierarchies

In any Aumann model of incomplete information we can attach a belief hierarchy to every state of the world (Theorem 9.30, page 337). Similarly, in any Harsanyi game with incomplete information we can attach a belief hierarchy to every type vector. We illustrate this point by the following example.
Example 9.43 The residents of the town of Smallville live in a closed and supportive tight-knit community. The personality characteristic that they regard as most important revolves around the question of whether a person puts his family life ahead of his career, or his career ahead of his family. Kevin, the local matchmaker, approaches two of the residents, Abe and Sarah, informing them that in his opinion they would be well suited as a couple. It is well known in the community from past experience that Kevin tends to match a man who stresses his career with a woman who emphasizes family life, and a man who puts family first with a woman who is career-oriented, but there were several instances in the past when Kevin did not stick to that rule. The distribution of past matches initiated by Kevin is presented in the following diagram (Figure 9.9).
                Family Woman    Career Woman
Family Man          1/10            3/10
Career Man          4/10            2/10

Figure 9.9 Player types with prior distribution
16 The Latin expression ex ante means “before.”
The above verbal description can be presented as a Harsanyi model (without specifying the state games) in the following way:

• The set of players is $N = \{\text{Abe}, \text{Sarah}\}$.
• Abe's set of types is $T_A = \{\text{Careerist}, \text{Family}\}$.
• Sarah's set of types is $T_S = \{\text{Careerist}, \text{Family}\}$.
• Because this match is one of Kevin's matches, the probability distribution $p$ over $T = T_A \times T_S$ is calculated from past matches of Kevin; namely, it is the probability distribution given in Figure 9.9.
• Since the state game corresponding to each pair of types is not specified, we denote the set of states of nature by $S = \{s_{CC}, s_{CF}, s_{FC}, s_{FF}\}$, without specifying the details of each state game. For each state game, the left index indicates the type of Abe (“C” for Career man, “F” for Family man) and the right index indicates the type of Sarah (“C” for Career woman, “F” for Family woman).
As each player knows his own type, armed with knowledge of the past performance of the matchmaker (the prior distribution in Figure 9.9), each player can calculate the conditional probability of the type of the other player. For example, if Abe is a careerist, he can conclude that the conditional probability that Sarah is also a careerist is

$$p(\text{Sarah is a careerist} \mid \text{Abe is a careerist}) = \frac{\frac{2}{10}}{\frac{2}{10} + \frac{4}{10}} = \frac{1}{3},$$

while if Abe is a family man he can conclude that the conditional probability that Sarah is a careerist is

$$p(\text{Sarah is a careerist} \mid \text{Abe is a family man}) = \frac{\frac{3}{10}}{\frac{1}{10} + \frac{3}{10}} = \frac{3}{4}.$$

◭
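The conditional probabilities computed above can be reproduced mechanically from the prior in Figure 9.9. The following is a minimal sketch; the dictionary layout, the type labels "C" and "F", and the helper name `belief` are illustrative choices, not notation from the text.

```python
from fractions import Fraction

# Prior over (Abe's type, Sarah's type), taken from Figure 9.9.
# "C" = careerist, "F" = family-oriented (labels chosen for this sketch).
prior = {
    ("F", "F"): Fraction(1, 10),
    ("F", "C"): Fraction(3, 10),
    ("C", "F"): Fraction(4, 10),
    ("C", "C"): Fraction(2, 10),
}

def belief(prior, player, own_type):
    """Conditional distribution over the other player's type.

    player is 0 for Abe and 1 for Sarah; the result maps each type of
    the other player to p(other type | own type).
    """
    other = 1 - player
    rows = {t: p for t, p in prior.items() if t[player] == own_type}
    marginal = sum(rows.values())  # p(own type), assumed positive
    return {t[other]: p / marginal for t, p in rows.items()}

abe_careerist = belief(prior, 0, "C")
abe_family = belief(prior, 0, "F")
print(abe_careerist["C"])  # probability that Sarah is a careerist
print(abe_family["C"])
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the results can be compared directly with the fractions in the text.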
Given his type, every player can calculate the infinite belief hierarchy. The continuation of our example illustrates this.

Example 9.43 (Continued) Suppose that Abe's type is careerist. As shown above, in that case his first-order beliefs about the state of nature are $[\frac{2}{3}(s_{CF}), \frac{1}{3}(s_{CC})]$. His second-order beliefs are as follows: he ascribes probability $\frac{2}{3}$ to the state of nature being $s_{CF}$, in which case Sarah's beliefs are $[\frac{1}{5}(s_{FF}), \frac{4}{5}(s_{CF})]$ (this follows from a similar calculation to the one performed above; verify!), and he ascribes probability $\frac{1}{3}$ to the state of nature being $s_{CC}$, in which case Sarah's beliefs are $[\frac{3}{5}(s_{FC}), \frac{2}{5}(s_{CC})]$. Abe's higher-order beliefs can similarly be calculated. ◭
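The second-order beliefs in the continued example can be computed by nesting the conditional-probability calculation one level deeper. This is a sketch under the same illustrative encoding of Figure 9.9 as a dictionary; the function names are hypothetical.

```python
from fractions import Fraction

# Prior over (Abe's type, Sarah's type) from Figure 9.9; "C" = careerist, "F" = family.
prior = {
    ("F", "F"): Fraction(1, 10),
    ("F", "C"): Fraction(3, 10),
    ("C", "F"): Fraction(4, 10),
    ("C", "C"): Fraction(2, 10),
}

def belief(prior, player, own_type):
    """First-order belief: distribution over the other player's type."""
    other = 1 - player
    rows = {t: p for t, p in prior.items() if t[player] == own_type}
    marginal = sum(rows.values())
    return {t[other]: p / marginal for t, p in rows.items()}

def second_order(prior, player, own_type):
    """For each type of the other player: the probability ascribed to that
    type, together with the first-order belief that type would hold."""
    other = 1 - player
    first = belief(prior, player, own_type)
    return {ot: (prob, belief(prior, other, ot)) for ot, prob in first.items()}

# Abe (player 0) is a careerist.
abe = second_order(prior, 0, "C")
for sarah_type, (prob, sarah_belief) in abe.items():
    print(sarah_type, prob, sarah_belief)
```

Here Sarah's beliefs are expressed as distributions over Abe's type; combined with her own type they identify the state of nature.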
When we analyze a Harsanyi game without specifying the state game corresponding to each state of nature, we will refer to it as a Harsanyi model of incomplete information. Such a model is equivalent to an Aumann model of incomplete information in the sense that every situation of incomplete information that can be analyzed using one model can be analyzed using the other one: a partition element Fi (ω) of player i in an Aumann model is his type in a Harsanyi model. Let (N, (Ti )i∈N , p, S, (st )t∈×i∈N Ti ) be a Harsanyi model of incomplete information. An Aumann model of incomplete information describing the same structure of mutual information is the model in which the set of states of the world
is the set of type vectors that have positive probability

$$Y = \{t \in T : p(t) > 0\}. \tag{9.53}$$

The partition of each player $i$ is given by his type: for every type $t_i \in T_i$, there is a partition element $F_i(t_i) \in \mathcal{F}_i$, given as follows:

$$F_i(t_i) = \{(t_i, t_{-i}) : t_{-i} \in T_{-i},\ p(t_i, t_{-i}) > 0\}. \tag{9.54}$$

The common prior is $P = p$.

In the other direction, let $(N, Y, (\mathcal{F}_i)_{i \in N}, s)$ be an Aumann model of incomplete information over a set $S$ of states of nature. A corresponding Harsanyi model of incomplete information is given by the model in which the set of types of each player $i$ is the set of his partition elements $\mathcal{F}_i$:

$$T_i = \{F_i \in \mathcal{F}_i\}, \tag{9.55}$$

and the probability distribution $p$ is given by

$$p(F_1, F_2, \ldots, F_n) = P\Big(\bigcap_{i \in N} F_i\Big). \tag{9.56}$$
Note that the intersection in this equation may be empty, or it may contain only one state of the world, or it may contain several states of the world. If the intersection is empty, the corresponding type vector is ascribed a probability of 0. If the intersection contains more than one state of the world, then in the Aumann model of incomplete information no player can distinguish between these states. The Harsanyi model identifies all these states as one state, and ascribes to the corresponding type vector the sum of their probabilities.

This correspondence shows that a state of the world in an Aumann model of incomplete information is a vector containing the state of nature and the type of each player. The state of nature describes the payoff-relevant parameters, and the player's type describes his beliefs. This is why we sometimes write a state of the world $\omega$ in the following form (we will expand on this in Chapter 11):

$$\omega = (s(\omega); t_1(\omega), t_2(\omega), \ldots, t_n(\omega)), \tag{9.57}$$

where $s(\omega)$ is the state of nature and $t_i(\omega)$ is player $i$'s type in the state of the world $\omega$. The following conclusion is a consequence of Theorem 9.30 (page 337) and the equivalence between the Aumann model and the Harsanyi model.

Theorem 9.44 In a Harsanyi model of incomplete information, every state of the world $\omega = (s(\omega); t_1(\omega), t_2(\omega), \ldots, t_n(\omega))$ uniquely determines the belief hierarchy of each player over the state of nature and the beliefs of the other players.
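The passage from an Aumann model to a Harsanyi model in Equations (9.55)–(9.56) can be sketched in a few lines. The four-state model below (states `w1`..`w4`, a uniform prior, and the two partitions) is a hypothetical example invented for illustration; only the rule "probability of a type vector = probability of the intersection of its partition elements" comes from the text.

```python
from fractions import Fraction
from itertools import product

def harsanyi_prior(P, partitions):
    """Equation (9.56): p(F_1, ..., F_n) = P(intersection of the F_i).

    P maps each state of the world to its prior probability;
    partitions[i] is the list of partition elements (frozensets) of player i,
    which serve as the types of player i in the Harsanyi model.
    """
    p = {}
    for type_vector in product(*partitions):
        common = frozenset.intersection(*type_vector)
        # An empty intersection gets probability 0; several states are merged.
        p[type_vector] = sum((P[w] for w in common), Fraction(0))
    return p

# Hypothetical Aumann model with four equally likely states of the world.
P = {w: Fraction(1, 4) for w in ("w1", "w2", "w3", "w4")}
F1a, F1b = frozenset({"w1", "w2"}), frozenset({"w3", "w4"})
F2a, F2b = frozenset({"w1", "w2", "w3"}), frozenset({"w4"})

p = harsanyi_prior(P, [[F1a, F1b], [F2a, F2b]])
for types, prob in p.items():
    print(sorted(types[0]), sorted(types[1]), prob)
```

In this example the type vector `(F1a, F2a)` absorbs the two indistinguishable states `w1` and `w2`, while `(F1a, F2b)` corresponds to an empty intersection.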
9.4.2 Strategies and payoffs

In the presentation of a game with incomplete information as an extensive-form game, each type $t_i \in T_i$ corresponds to an information set of player $i$. It follows that a pure
strategy¹⁷ of player $i$ is a function $s_i : T_i \to \bigcup_{t_i \in T_i} A_i(t_i)$ that satisfies

$$s_i(t_i) \in A_i(t_i), \quad \forall t_i \in T_i. \tag{9.58}$$
In words, $s_i(t_i)$ is the action specified by the strategy $s_i$ for player $i$ of type $t_i$ (which is an action available to him as a player of type $t_i$). A mixed strategy of player $i$ is, as usual, a probability distribution over his pure strategies. A behavior strategy $\sigma_i$ of player $i$ is a function mapping each type $t_i \in T_i$ to a probability distribution over the actions available to that type. Notationally, $\sigma_i : T_i \to \bigcup_{t_i \in T_i} \Delta(A_i(t_i))$ that satisfies

$$\sigma_i(t_i) = (\sigma_i(t_i; a_i))_{a_i \in A_i(t_i)} \in \Delta(A_i(t_i)). \tag{9.59}$$
In words, $\sigma_i(t_i; a_i)$ is the probability that player $i$ of type $t_i$ chooses the action $a_i$. A Harsanyi game is an extensive-form game with perfect recall (Definition 9.39, page 347), and therefore by Kuhn's Theorem (Theorem 4.49, page 118) every mixed strategy is equivalent to a behavior strategy. For this reason, there is no loss of generality in using only behavior strategies, which is indeed what we do in this section.

Remark 9.45 Behavior strategy, as defined here, is a behavior strategy in a Harsanyi game in which the state game corresponding to $t$ is a strategic-form game. If the state game is an extensive-form game, then $A_i(t_i)$ is the set of pure strategies in that game, so that $\Delta(A_i(t_i))$ is the set of mixed strategies of this state game. In this case, a strategy $\sigma_i : T_i \to \bigcup_{t_i \in T_i} \Delta(A_i(t_i))$ in which $\sigma_i(t_i) \in \Delta(A_i(t_i))$ is not a behavior strategy of the Harsanyi game. Rather, a behavior strategy is a function $\sigma_i : T_i \to \bigcup_{t_i \in T_i} B_i(t_i)$ with $\sigma_i(t_i) \in B_i(t_i)$ for every $t_i \in T_i$, where $B_i(t_i)$ is the set of behavior strategies of player $i$ in the state game $s_{(t_i, t_{-i})}$, which is the same for all $t_{-i} \in T_{-i}$. The distinction between these definitions is immaterial to the presentation in this section, and the results obtained here apply whether the state game is given in strategic form or in extensive form.

If the vector of the players' behavior strategies is $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_n)$, and the vector of types that is selected by the chance move is $t = (t_1, \ldots, t_n)$, then each vector of actions $(a_1, \ldots, a_n)$ is selected with probability

$$\sigma_1(t_1; a_1) \times \sigma_2(t_2; a_2) \times \cdots \times \sigma_n(t_n; a_n). \tag{9.60}$$

Player $i$'s expected payoff, which we denote by $U_i(t; \sigma)$, is therefore

$$U_i(t; \sigma) := \mathbf{E}_\sigma[u_i(t)] = \sum_{a \in A(t)} \sigma_1(t_1; a_1) \times \cdots \times \sigma_n(t_n; a_n) \times u_i(t; a). \tag{9.61}$$
It follows that when the players implement strategy vector $\sigma$, the expected payoff in the game for player $i$ is

$$U_i(\sigma) := \sum_{t \in T} p(t)\, U_i(t; \sigma). \tag{9.62}$$
This is the expected payoff for player i at the ex ante stage, that is, before he has learned what type he is. After the chance move, the vector of types has been selected, and the
17 We use the notation si for a pure strategy of player i, and st for the state game that corresponds to the type vector t.
conditional expected payoff to player $i$ of type $t_i$ is

$$U_i(\sigma \mid t_i) := \sum_{t_{-i} \in T_{-i}} p(t_{-i} \mid t_i)\, U_i((t_i, t_{-i}); \sigma), \tag{9.63}$$

where

$$p(t_{-i} \mid t_i) = \frac{p(t_i, t_{-i})}{\sum_{t'_{-i} \in T_{-i}} p(t_i, t'_{-i})} = \frac{p(t_i, t_{-i})}{p(t_i)}. \tag{9.64}$$
This is the expected payoff of player $i$ at the interim stage. The connection between the ex ante (unconditional expected) payoff $U_i(\sigma)$ and the interim (conditional) payoff $(U_i(\sigma \mid t_i))_{t_i \in T_i}$ is given by the equation

$$U_i(\sigma) = \sum_{t_i \in T_i} p(t_i)\, U_i(\sigma \mid t_i). \tag{9.65}$$
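Indeed,

$$\sum_{t_i \in T_i} p(t_i)\, U_i(\sigma \mid t_i) = \sum_{t_i \in T_i} p(t_i) \sum_{t_{-i} \in T_{-i}} p(t_{-i} \mid t_i)\, U_i((t_i, t_{-i}); \sigma) \tag{9.66}$$

$$= \sum_{t_{-i} \in T_{-i}} \sum_{t_i \in T_i} p(t_i)\, p(t_{-i} \mid t_i)\, U_i((t_i, t_{-i}); \sigma) \tag{9.67}$$

$$= \sum_{t \in T} p(t)\, U_i(t; \sigma) \tag{9.68}$$

$$= U_i(\sigma). \tag{9.69}$$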
Equation (9.66) follows from Equation (9.63), Equation (9.67) is a rearrangement of sums, Equation (9.68) is a consequence of the definition of conditional probability, and Equation (9.69) follows from Equation (9.62).
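Equations (9.61)–(9.65) translate directly into code. The sketch below evaluates the state-game, ex ante, and interim payoffs for a small two-player game and checks the identity (9.65) numerically. The game itself (type names, payoff numbers, and behavior strategies) is hypothetical and chosen only for illustration; the data layout is likewise an assumption of this sketch.

```python
from fractions import Fraction as Fr
from itertools import product

def state_payoff(u, sigma, t, i):
    """U_i(t; sigma), Equation (9.61): sum over action vectors of the state game."""
    supports = [list(sigma[j][t[j]].items()) for j in range(len(t))]
    total = Fr(0)
    for combo in product(*supports):
        prob = Fr(1)
        for _, q in combo:
            prob *= q                      # product of the players' probabilities, (9.60)
        a = tuple(action for action, _ in combo)
        total += prob * u[t][a][i]
    return total

def ex_ante(p, u, sigma, i):
    """U_i(sigma), Equation (9.62)."""
    return sum((p[t] * state_payoff(u, sigma, t, i) for t in p), Fr(0))

def interim(p, u, sigma, i, ti):
    """U_i(sigma | t_i), Equations (9.63)-(9.64); assumes p(t_i) > 0."""
    p_ti = sum(q for t, q in p.items() if t[i] == ti)
    return sum((q / p_ti * state_payoff(u, sigma, t, i)
                for t, q in p.items() if t[i] == ti), Fr(0))

# Hypothetical game: player 0 has types "a", "b"; player 1 has the single type "c".
p = {("a", "c"): Fr(1, 3), ("b", "c"): Fr(2, 3)}
u = {
    ("a", "c"): {("T", "L"): (2, 1), ("T", "R"): (0, 0), ("B", "L"): (1, 3), ("B", "R"): (4, 2)},
    ("b", "c"): {("T", "L"): (0, 2), ("T", "R"): (3, 1), ("B", "L"): (1, 0), ("B", "R"): (2, 5)},
}
sigma = [
    {"a": {"T": Fr(1, 2), "B": Fr(1, 2)}, "b": {"T": Fr(1, 4), "B": Fr(3, 4)}},
    {"c": {"L": Fr(1, 3), "R": Fr(2, 3)}},
]

def marginal(p, i, ti):
    return sum(q for t, q in p.items() if t[i] == ti)

for i, types in enumerate([("a", "b"), ("c",)]):
    lhs = ex_ante(p, u, sigma, i)
    rhs = sum((marginal(p, i, ti) * interim(p, u, sigma, i, ti) for ti in types), Fr(0))
    print(i, lhs, rhs)  # the two values coincide, as Equation (9.65) asserts
```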
9.4.3 Equilibrium in games with incomplete information

As we pointed out, Harsanyi games with incomplete information may be analyzed at two separate points in time: at the ex ante stage, before the players know their types, and at the interim stage, after they have learned what types they are. Accordingly, two different types of equilibria can be defined. The first equilibrium concept, which is Nash equilibrium in Harsanyi games, poses the requirement that no player can profit by a unilateral deviation before knowing his type. The second equilibrium concept, called Bayesian equilibrium, poses the requirement that no player $i$ can profit by deviating at the interim stage, after learning his type $t_i$.

Definition 9.46 A strategy vector $\sigma^*$ is a Nash equilibrium if¹⁸ for each player $i$ and each strategy $\sigma_i$ of player $i$,

$$U_i(\sigma^*) \geq U_i(\sigma_i, \sigma^*_{-i}). \tag{9.70}$$
As every game with incomplete information can be described as an extensive-form game (Theorem 9.42), every finite extensive-form game has a Nash equilibrium in mixed
18 Recall that $\sigma^*_{-i} = (\sigma^*_j)_{j \neq i}$.
strategies (Theorem 5.13, page 152), and every mixed strategy is equivalent to a behavior strategy, we arrive at the following conclusion:

Theorem 9.47 Every game with incomplete information in which the set of types is finite and the set of actions of each type is finite has a Nash equilibrium in behavior strategies.

Remark 9.48 When the set of player types is countable, a Nash equilibrium is still guaranteed to exist (see Exercise 9.43). In contrast, when the set of player types is uncountable, it may be the case that all the equilibria involve strategies that are not measurable (see Simon [2003]).

Definition 9.49 A strategy vector $\sigma^* = (\sigma_1^*, \sigma_2^*, \ldots, \sigma_n^*)$ is a Bayesian equilibrium if for each player $i \in N$, each type $t_i \in T_i$, and each possible¹⁹ action $a_i \in A_i(t_i)$,

$$U_i(\sigma^* \mid t_i) \geq U_i((a_i, \sigma^*_{-i}) \mid t_i). \tag{9.71}$$
An equivalent way to define Bayesian equilibrium is by way of an auxiliary game, called the agent-form game.

Definition 9.50 Let $\Gamma = (N, (T_i)_{i \in N}, p, S, (s_t)_{t \in T})$ be a game with incomplete information. The agent-form game $\widehat{\Gamma}$ corresponding to $\Gamma$ is the following game in strategic form:

• The set of players is $\cup_{i \in N} T_i$: every type of each player in $\Gamma$ is a player in $\widehat{\Gamma}$.
• The set of pure strategies of player $t_i$ in $\widehat{\Gamma}$ is $A_i(t_i)$, the set of available actions of that type in the game $\Gamma$.
• The payoff function $u_{t_i}$ of player $t_i$ in $\widehat{\Gamma}$ is given by

$$u_{t_i}(a) := \sum_{t_{-i} \in T_{-i}} p(t_{-i} \mid t_i)\, u_i(t_i, t_{-i}; (a_j(t_j))_{j \in N}), \tag{9.72}$$

where $a = (a_j(t_j))_{j \in N, t_j \in T_j}$ denotes a vector of actions of all the players in $\widehat{\Gamma}$.

The payoff $u_{t_i}(a)$ of player $t_i$ in $\widehat{\Gamma}$ equals the expected payoff of player $i$ of type $t_i$ in the game $\Gamma$ when he chooses action $a_i(t_i)$, and for any $j \neq i$, player $j$ of type $t_j$ chooses action $a_j(t_j)$. The conditional probability in Equation (9.72) is well defined because we have assumed that $p(t_i) > 0$ for each player $i$ and each type $t_i$.

Note that every behavior strategy $\sigma_i = (\sigma_i(t_i))_{t_i \in T_i}$ of player $i$ in the game $\Gamma$ naturally defines a mixed strategy for the players in $T_i$ in the agent-form game $\widehat{\Gamma}$. Conversely, every vector of mixed strategies of the players in $T_i$ in the agent-form game $\widehat{\Gamma}$ naturally defines a behavior strategy $\sigma_i = (\sigma_i(t_i))_{t_i \in T_i}$ of player $i$ in the game $\Gamma$. Theorem 9.51 relates Bayesian equilibria in a game $\Gamma$ to the Nash equilibria in the corresponding agent-form game $\widehat{\Gamma}$. The proof of the theorem is left to the reader (Exercise 9.44).

Theorem 9.51 A strategy vector $\sigma^* = (\sigma_i^*)_{i \in N}$ is a Bayesian equilibrium in a game $\Gamma$ with incomplete information if and only if the strategy vector $(\sigma_i^*(t_i))_{i \in N, t_i \in T_i}$ is a Nash equilibrium in the corresponding agent-form game $\widehat{\Gamma}$.
19 In Equation (9.71), $U_i((a_i, \sigma^*_{-i}) \mid t_i)$ is the payoff of player $i$ of type $t_i$, when all other players use $\sigma^*$, and he plays action $a_i$.
As every game in strategic form in which the set of pure strategies available to each player is finite has a Nash equilibrium (Theorem 5.13, page 152), we derive the next theorem: Theorem 9.52 Every game with incomplete information in which the set of types is finite and the set of actions of each type is finite has a Bayesian equilibrium (in behavior strategies). As already noted, the two definitions of equilibrium presented in this section (Nash equilibrium and Bayesian equilibrium) express two different perspectives on the game: does each player regard the game prior to knowing his type or after knowing it? Theorem 9.53 states that these two definitions are in fact equivalent. Theorem 9.53 (Harsanyi [1967]) In a game with incomplete information in which the number of types of each player is finite, every Bayesian equilibrium is also a Nash equilibrium, and conversely every Nash equilibrium is also a Bayesian equilibrium. In other words, no player has a profitable deviation after he knows which type he is if and only if he has no profitable deviation before knowing his type. Recall that in the definition of a game with incomplete information we required that p(ti ) > 0 for each player i and each type ti ∈ Ti . This is essential for the validity of Theorem 9.53, because if there is a type that is chosen with probability 0 in a Harsanyi game, the action selected by a player of that type has no effect on the payoff. In particular, in a Nash equilibrium a player of this type can take any action. In contrast, because the conditional probabilities p(t−i | ti ) in Equation (9.64) are not defined for such a type, the payoff function of this type is undefined, and in that case we cannot define a Bayesian equilibrium. Proof of Theorem 9.53: The idea of the proof runs as follows. 
Because the expected payoff of player $i$ in a game with incomplete information is the expectation of the conditional expected payoff of all of his types $t_i$, and because the probability of each type is positive, it follows that every deviation that increases the expected payoff of any single type of player $i$ also increases the overall payoff for player $i$ in the game. In the other direction, if there is a deviation that increases the total expected payoff of player $i$ in the game, it must necessarily increase the conditional expected payoff of at least one type $t_i$.

Step 1: Every Bayesian equilibrium is a Nash equilibrium.

Let $\sigma^*$ be a Bayesian equilibrium. Then for each player $i \in N$, each type $t_i \in T_i$, and each action $a_i \in A_i(t_i)$,

$$U_i(\sigma^* \mid t_i) \geq U_i((a_i, \sigma^*_{-i}) \mid t_i). \tag{9.73}$$
Combined with Equation (9.65) this implies that for each pure strategy $s_i$ of player $i$ we have

$$U_i(s_i, \sigma^*_{-i}) = \sum_{t_i \in T_i} p(t_i)\, U_i((s_i(t_i), \sigma^*_{-i}) \mid t_i) \leq \sum_{t_i \in T_i} p(t_i)\, U_i(\sigma^* \mid t_i) = U_i(\sigma^*). \tag{9.74}$$
As this inequality holds for any pure strategy $s_i$ of player $i$, it also holds for any of his mixed strategies. This implies that $\sigma_i^*$ is a best reply to $\sigma^*_{-i}$. Since this is true for each player $i \in N$, we conclude that $\sigma^*$ is a Nash equilibrium.

Step 2: Every Nash equilibrium is a Bayesian equilibrium.

We will prove that if $\sigma^*$ is not a Bayesian equilibrium, then it is also not a Nash equilibrium. As $\sigma^*$ is not a Bayesian equilibrium, there is at least one player $i \in N$, type $t_i \in T_i$, and action $a_i \in A_i(t_i)$ satisfying

$$U_i(\sigma^* \mid t_i) < U_i((a_i, \sigma^*_{-i}) \mid t_i). \tag{9.75}$$

Consider a strategy $\widehat{\sigma}_i$ of player $i$ defined by

$$\widehat{\sigma}_i(t_i') = \begin{cases} \sigma_i^*(t_i') & \text{when } t_i' \neq t_i, \\ a_i & \text{when } t_i' = t_i. \end{cases} \tag{9.76}$$
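In words: strategy $\widehat{\sigma}_i$ is identical to strategy $\sigma_i^*$ except in the case of type $t_i$, who plays $a_i$ instead of $\sigma_i^*(t_i)$. Equations (9.65) and (9.75) then imply that

$$U_i(\widehat{\sigma}_i, \sigma^*_{-i}) = \sum_{t_i' \in T_i} p(t_i')\, U_i((\widehat{\sigma}_i, \sigma^*_{-i}) \mid t_i') \tag{9.77}$$

$$= \sum_{t_i' \neq t_i} p(t_i')\, U_i((\widehat{\sigma}_i, \sigma^*_{-i}) \mid t_i') + p(t_i)\, U_i((\widehat{\sigma}_i, \sigma^*_{-i}) \mid t_i) \tag{9.78}$$

$$= \sum_{t_i' \neq t_i} p(t_i')\, U_i((\sigma_i^*, \sigma^*_{-i}) \mid t_i') + p(t_i)\, U_i((a_i, \sigma^*_{-i}) \mid t_i) \tag{9.79}$$

$$> \sum_{t_i' \neq t_i} p(t_i')\, U_i((\sigma_i^*, \sigma^*_{-i}) \mid t_i') + p(t_i)\, U_i((\sigma_i^*, \sigma^*_{-i}) \mid t_i) \tag{9.80}$$

$$= \sum_{t_i' \in T_i} p(t_i')\, U_i(\sigma^* \mid t_i') = U_i(\sigma^*). \tag{9.81}$$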
Inequality (9.80) follows from Inequality (9.75) and the assumption that $p(t_i) > 0$ for each player $i$ and every type $t_i \in T_i$. From the chain of equations (9.77)–(9.81) we get

$$U_i(\widehat{\sigma}_i, \sigma^*_{-i}) > U_i(\sigma^*), \tag{9.82}$$

which implies that $\sigma^*$ is not a Nash equilibrium.
We next present two examples of games with incomplete information and calculate their Bayesian equilibria.
Example 9.54 Consider the following game with incomplete information:

• $N = \{\text{I}, \text{II}\}$.
• $T_{\text{I}} = \{\text{I}_1, \text{I}_2\}$ and $T_{\text{II}} = \{\text{II}\}$: Player I has two types and Player II has one type.
• $p(\text{I}_1, \text{II}) = p(\text{I}_2, \text{II}) = \frac{1}{2}$: the two types of Player I have equal probabilities.
• There are two states of nature corresponding to two state games in which each player has two possible actions, and the payoff functions are given by the matrices shown in Figure 9.10.
                        Player II
                        L        R
Player I      T1      1, 0     0, 2
              B1      0, 3     1, 0
The state game for t = (I1, II)

                        Player II
                        L        R
Player I      T2      0, 2     1, 1
              B2      1, 0     0, 2
The state game for t = (I2, II)

Figure 9.10 The state games in Example 9.54
Because the information each player has is his own type, Player I knows the payoff matrix, while Player II does not know it (see Figure 9.11).
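[Figure 9.11: game tree. A chance move selects type I1 or type I2 of Player I, each with probability 1/2; type I1 chooses between T1 and B1, and type I2 chooses between T2 and B2; Player II, whose four decision nodes form a single information set, chooses between L and R; the payoffs at the leaves are those of the state games in Figure 9.10.]

Figure 9.11 The game of Example 9.54 in extensive form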
Turning to the calculation of Bayesian equilibria in this game, given such an equilibrium, denote by [q(L), (1 − q)(R)] the equilibrium strategy of Player II, by [x(T1 ), (1 − x)(B1 )] the equilibrium strategy of Player I of type I1 , and by [y(T2 ), (1 − y)(B2 )] the equilibrium strategy of Player I of type I2 (see Figure 9.12).
                        Player II
                        q        1 − q
Player I      x       1, 0     0, 2
              1 − x   0, 3     1, 0
Strategies in state game t = (I1, II)

                        Player II
                        q        1 − q
Player I      y       0, 2     1, 1
              1 − y   1, 0     0, 2
Strategies in state game t = (I2, II)

Figure 9.12 The strategies of the players in the game of Example 9.54
We first show that $0 < q < 1$.

• If $q = 1$, then type $\text{I}_1$'s best reply is T ($x = 1$) and $\text{I}_2$'s best reply is B ($y = 0$). But Player II's best reply to this strategy is R ($q = 0$). It follows that $q = 1$ is not part of a Bayesian equilibrium.
• If $q = 0$, then type $\text{I}_1$'s best reply is B ($x = 0$) and $\text{I}_2$'s best reply is T ($y = 1$). But Player II's best reply to this strategy is L ($q = 1$). It follows that $q = 0$ is not part of a Bayesian equilibrium.
The conclusion is therefore that in a Bayesian equilibrium Player II's strategy must be completely mixed, so that he is necessarily indifferent between L and R. This implies that

$$\tfrac{1}{2} \cdot 3(1 - x) + \tfrac{1}{2} \cdot 2y = \tfrac{1}{2} \cdot 2x + \tfrac{1}{2}\,(y + 2(1 - y)), \tag{9.83}$$

giving us

$$x = \frac{1 + 3y}{5}. \tag{9.84}$$

Is every pair $(x, y)$ satisfying Equation (9.84) part of a Bayesian equilibrium? For $(x, y)$ to be part of a Bayesian equilibrium, it must be a best reply to $q$.
• If $q < \frac{1}{2}$, Player I's best reply is $x = 0$, $y = 1$, which does not satisfy Equation (9.84).
• If $q = \frac{1}{2}$, Player I's payoff is $\frac{1}{2}$ irrespective of what he plays, so that every pair $(x, y)$ is a best reply to $q = \frac{1}{2}$.
• If $q > \frac{1}{2}$, Player I's best reply is $x = 1$, $y = 0$, which does not satisfy Equation (9.84).

This leads to the conclusion that a pair of strategies $(x, y; q)$ is a Bayesian equilibrium if and only if $q = \frac{1}{2}$ and Equation (9.84) is satisfied. Since $x$ and $y$ are both in the interval $[0, 1]$ we obtain

$$\tfrac{1}{5} \leq x \leq \tfrac{4}{5}, \quad 0 \leq y \leq 1, \quad x = \frac{1 + 3y}{5}. \tag{9.85}$$
We have thus obtained a continuum of Bayesian equilibria $(x, y; q)$, in all of which Player I's payoff (of either type) is $\frac{1}{2}$, and Player II's payoff is

$$\tfrac{1}{2} \cdot 3(1 - x) + \tfrac{1}{2} \cdot 2y = \frac{12 + y}{10}. \tag{9.86}$$

◭
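The characterization of the equilibria in Example 9.54 can be checked numerically. The sketch below tests the Bayesian equilibrium condition (9.71) type by type, using the payoffs of Figure 9.10; the function names and the encoding of strategies by the numbers $x$, $y$, $q$ follow the example, but the code itself is only an illustrative check, not part of the original text.

```python
from fractions import Fraction as Fr

HALF = Fr(1, 2)

def type_payoffs(q):
    """Interim payoff of each pure action of Player I's types against [q(L), (1-q)(R)]."""
    return {"T1": q, "B1": 1 - q, "T2": 1 - q, "B2": q}

def payoff_II(x, y, action):
    """Player II's expected payoff from a pure action, as in Equation (9.83)."""
    if action == "L":
        return HALF * 3 * (1 - x) + HALF * 2 * y
    return HALF * 2 * x + HALF * (y + 2 * (1 - y))

def is_bayesian_equilibrium(x, y, q):
    pay = type_payoffs(q)
    u_I1 = x * pay["T1"] + (1 - x) * pay["B1"]
    u_I2 = y * pay["T2"] + (1 - y) * pay["B2"]
    u_II = q * payoff_II(x, y, "L") + (1 - q) * payoff_II(x, y, "R")
    # No type may gain by switching to any of its pure actions.
    return (u_I1 >= max(pay["T1"], pay["B1"])
            and u_I2 >= max(pay["T2"], pay["B2"])
            and u_II >= max(payoff_II(x, y, "L"), payoff_II(x, y, "R")))

for y in (Fr(0), Fr(1, 2), Fr(1)):
    x = (1 + 3 * y) / 5                       # Equation (9.84)
    print(x, y, is_bayesian_equilibrium(x, y, HALF), payoff_II(x, y, "L"))

print(is_bayesian_equilibrium(Fr(0), Fr(1), HALF))   # violates (9.84)
```

Exact fractions are used so that the indifference conditions hold with equality rather than up to rounding.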
Example 9.55 Cournot duopoly competition with incomplete information

Consider the duopoly competition described in Example 4.23 (page 99) when there is incomplete information regarding production costs. Two manufacturers, labeled 1 and 2, produce the same product and compete for the same market of potential customers. The manufacturers simultaneously select their production quantities, with demand determining the market price of the product, which is identical for both manufacturers. Denote by $q_1$ and $q_2$ the quantities respectively produced by manufacturers 1 and 2. The total quantity of products in the market is therefore $q_1 + q_2$. Assume that when the supply is $q_1 + q_2$ the price of each item is $2 - q_1 - q_2$. The per-item production cost for Manufacturer 1 is $c_1 = 1$, and it is common knowledge among the two manufacturers. The per-item production cost for Manufacturer 2 is known only to him, not to Manufacturer 1. All that Manufacturer 1 knows about it is that it is either $c_2^L = \frac{3}{4}$ (low cost) or $c_2^H = \frac{5}{4}$ (high cost), with equal probability. Note that the average production cost of Manufacturer 2 is 1, which is equal to Manufacturer 1's cost.
Let us find a Bayesian equilibrium of this game. This is a game with incomplete information in which the types of each manufacturer correspond to their production costs:²⁰

• $N = \{1, 2\}$.
• $T_1 = \{1\}$, $T_2 = \{\frac{3}{4}, \frac{5}{4}\}$.
• $p(1, \frac{3}{4}) = p(1, \frac{5}{4}) = \frac{1}{2}$.
• There are two states of nature, corresponding respectively to the type vectors $(1, \frac{3}{4})$ and $(1, \frac{5}{4})$. Each one of these states of nature corresponds to a state game in which the action set of each player is $[0, \infty)$ (each player can produce any nonnegative quantity of items), and the payoff functions are those we provide now.
Denote by $u_i(q_1, q_2^H, q_2^L)$ the net profit of Manufacturer $i$ as a function of the quantities of items produced by each type, where $q_1$ is the quantity produced by Manufacturer 1, $q_2^H$ is the quantity produced by Manufacturer 2 if his production costs are high, and $q_2^L$ is the quantity produced by Manufacturer 2 if his production costs are low. As Manufacturer 1 does not know the type of Manufacturer 2, his expected profit is

$$u_1(q_1, q_2^H, q_2^L) = \tfrac{1}{2} q_1 (2 - q_1 - q_2^H) + \tfrac{1}{2} q_1 (2 - q_1 - q_2^L) - c_1 q_1 = q_1 \left(2 - c_1 - q_1 - \tfrac{1}{2} q_2^H - \tfrac{1}{2} q_2^L\right). \tag{9.87}$$

The net profit of Manufacturer 2's two possible types is

$$u_2^H(q_1, q_2^H, q_2^L) = q_2^H (2 - q_1 - q_2^H) - c_2^H q_2^H = q_2^H (2 - c_2^H - q_1 - q_2^H), \tag{9.88}$$

$$u_2^L(q_1, q_2^H, q_2^L) = q_2^L (2 - q_1 - q_2^L) - c_2^L q_2^L = q_2^L (2 - c_2^L - q_1 - q_2^L). \tag{9.89}$$
Since each manufacturer has a continuum of actions, the existence of an equilibrium is not guaranteed. Nevertheless, we will assume that an equilibrium exists, and try to calculate it. Denote by $q_1^*$ the quantity of items produced by Manufacturer 1 at equilibrium, by $q_2^{*H}$ the quantity produced by Manufacturer 2 at equilibrium if his production costs are high, and by $q_2^{*L}$ the quantity he produces at equilibrium under low production costs. At equilibrium, every manufacturer maximizes his expected payoff given the strategy of the other manufacturer: $q_1^*$ maximizes $u_1(q_1, q_2^{*H}, q_2^{*L})$, $q_2^{*H}$ maximizes $u_2^H(q_1^*, q_2^H, q_2^{*L})$, and $q_2^{*L}$ maximizes $u_2^L(q_1^*, q_2^{*H}, q_2^L)$. Since $u_2^H$ is a quadratic function of $q_2^H$, and the coefficient of $(q_2^H)^2$ is negative, it has a maximum at the point where its derivative with respect to $q_2^H$ vanishes. This results in

$$q_2^H = \frac{\frac{3}{4} - q_1}{2}. \tag{9.90}$$

Similarly, we differentiate $u_2^L$ with respect to $q_2^L$, set the derivative to zero, and get

$$q_2^L = \frac{\frac{5}{4} - q_1}{2}. \tag{9.91}$$
20 Similar to remarks we made with respect to the Aumann model regarding the distinction between states of nature and states of the world, the type $t_2 = \frac{3}{4}$ in this Harsanyi model contains far more information than the simple fact that the per-unit production cost of Manufacturer 2 is $\frac{3}{4}$; it contains the entire belief hierarchy of Manufacturer 2 with respect to the production costs of both manufacturers. Production costs are states of nature, with respect to which there is incomplete information.
Finally, differentiate $u_1$ with respect to $q_1$ and set the derivative to zero, obtaining

$$q_1 = \frac{1 - \frac{1}{2} q_2^H - \frac{1}{2} q_2^L}{2}. \tag{9.92}$$

Insert Equations (9.90) and (9.91) in Equation (9.92) to obtain

$$q_1 = \frac{1 - \frac{1 - q_1}{2}}{2}, \tag{9.93}$$

or, in other words,

$$q_1^* = \tfrac{1}{3}. \tag{9.94}$$

This leads to

$$q_2^{*H} = \frac{\frac{3}{4} - \frac{1}{3}}{2} = \frac{5}{24}, \tag{9.95}$$

$$q_2^{*L} = \frac{\frac{5}{4} - \frac{1}{3}}{2} = \frac{11}{24}. \tag{9.96}$$
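The conclusion is that $(q_1^*, q_2^{*H}, q_2^{*L}) = (\frac{1}{3}, \frac{5}{24}, \frac{11}{24})$ is the unique Bayesian equilibrium of the game. Note that $q_2^{*H} < q_1^* < q_2^{*L}$: the high (inefficient) type produces less than Manufacturer 1, and the low (more efficient) type produces more than Manufacturer 1, whose production costs are the average of the production costs of the two types of Manufacturer 2. The profits gained by the manufacturers are

$$u_1\left(\tfrac{1}{3}, \tfrac{5}{24}, \tfrac{11}{24}\right) = \tfrac{1}{3}\left(2 - 1 - \tfrac{1}{3} - \tfrac{8}{24}\right) = \tfrac{1}{9}, \tag{9.97}$$

$$u_2^H\left(\tfrac{1}{3}, \tfrac{5}{24}, \tfrac{11}{24}\right) = \tfrac{5}{24}\left(\tfrac{3}{4} - \tfrac{1}{3} - \tfrac{5}{24}\right) = \left(\tfrac{5}{24}\right)^2, \tag{9.98}$$

$$u_2^L\left(\tfrac{1}{3}, \tfrac{5}{24}, \tfrac{11}{24}\right) = \tfrac{11}{24}\left(\tfrac{5}{4} - \tfrac{1}{3} - \tfrac{11}{24}\right) = \left(\tfrac{11}{24}\right)^2. \tag{9.99}$$

Therefore Manufacturer 2's expected profit is

$$\tfrac{1}{2}\left(\tfrac{5}{24}\right)^2 + \tfrac{1}{2}\left(\tfrac{11}{24}\right)^2 \approx 0.127. \tag{9.100}$$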
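The equilibrium quantities and profits of Example 9.55 can be reproduced with exact arithmetic. The sketch below solves the three first-order conditions (9.90)–(9.92) in closed form; substituting the best replies of Manufacturer 2's two types into Manufacturer 1's condition gives $q_1 = (a - 2c_1 + \bar{c}_2)/3$, where $a = 2$ is the demand intercept and $\bar{c}_2$ is the average cost of Manufacturer 2. The function name and the parameter `a` are illustrative, and the sketch assumes an interior solution (all quantities positive), which holds for the numbers of the example.

```python
from fractions import Fraction as Fr

def bayesian_cournot(c1, c2_low, c2_high, a=Fr(2)):
    """Solve the first-order conditions (9.90)-(9.92) for equally likely types.

    Assumes an interior equilibrium (all quantities positive).
    """
    c2_mean = (c2_low + c2_high) / 2
    q1 = (a - 2 * c1 + c2_mean) / 3
    q2_high = (a - c2_high - q1) / 2      # Equation (9.90)
    q2_low = (a - c2_low - q1) / 2        # Equation (9.91)
    return q1, q2_high, q2_low

c1, c2_low, c2_high = Fr(1), Fr(3, 4), Fr(5, 4)
q1, q2H, q2L = bayesian_cournot(c1, c2_low, c2_high)

# Profits, Equations (9.87)-(9.89), with price 2 - q1 - q2.
u1 = q1 * (2 - c1 - q1 - q2H / 2 - q2L / 2)
u2H = q2H * (2 - c2_high - q1 - q2H)
u2L = q2L * (2 - c2_low - q1 - q2L)
expected_u2 = (u2H + u2L) / 2

print(q1, q2H, q2L)
print(u1, u2H, u2L, float(expected_u2))
```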
The case in which Manufacturer 2 also does not know his exact production cost (but knows that the cost is either $\frac{3}{4}$ or $\frac{5}{4}$ with equal probability, and thus knows that his average production cost is 1) is equivalent to the case we looked at in Example 4.23 (page 99). In that case we derived the equilibrium $q_1^* = q_2^* = \frac{1}{3}$, with the profit of each manufacturer being $\frac{1}{9}$. Comparing that figure with Equation (9.97), we see that relative to the incomplete information case, Manufacturer 1's profit is the same. Using Equations (9.98)–(9.99) and the fact that $0.127 > \frac{1}{9}$, we see that Manufacturer 2's profit when he does not know his own type is smaller than his expected profit when he knows his type; the added information is advantageous to Manufacturer 2.

We also gain insight by comparing this situation to one in which the production cost of Manufacturer 2 is common knowledge among the two manufacturers. In that case, after the selection of Manufacturer 2's type by the chance move, we arrive at a game similar to a Cournot competition with complete information, which we solved in Example 4.23 (page 99). With probability $\frac{1}{2}$ the manufacturers face a Cournot competition in which $c_1 = 1$ and $c_2 = c_2^H = \frac{5}{4}$, and with probability
361
9.5 A possible interpretation of mixed strategies they face a Cournot competition in which c1 = 1 and c2 = c2L = 43 . In the first case, equilibrium 2 5 2 5 and q2∗ = 61 , with profits of u1 = 61 and u2 = 12 (verify!). In the second is attained at q1∗ = 12 2 1 1 ∗ ∗ case, equilibrium is attained at q1 = 4 and q2 = 2 , corresponding to profits of u1 = 41 and 2 u2 = 12 (verify!). The expected profits prior to the selection of the types is 1 2
u1 =
1 2
u2 =
1 2
1 2 4
1 2 6
+
1 2
+
1 2
5 2 12
1 2 2
= 81 ,
(9.101)
5 36 .
(9.102)
=
For comparison, we present in table form the profits attained by the manufacturers in each of the three cases dealt with in this example (with respect to the production costs of Manufacturer 2): Knowledge regarding Manufacturer 2’s type
Manufacturer 1’s profit
Manufacturer 2’s profit
Unknown to both manufacturers
1 9 1 9 1 8
1 9
Known only to Manufacturer 2 Known to both manufacturers
≈ 0.127 5 36
Note the following:
r Both manufacturers have an interest in Manufacturer 2’s type being common knowledge, as opposed to the situation in which that type is unknown to both manufacturers, because 18 > 19 5 and 36 > 19 . r Both manufacturers have an interest in Manufacturer 2’s type being common knowledge, as opposed to the situation in which that type is known solely to Manufacturer 2, because 18 > 19 5 and 36 > 0.127. This last conclusion may look surprising, because it says that Manufacturer 2 prefers that his private information regarding his production cost be exposed and made public knowledge. ◭
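The three information structures can be compared numerically. The sketch below is only illustrative: it assumes the inverse demand $2 - q_1 - q_2$ and interior equilibria, the helper `cournot` is a hypothetical name, and the figures for the case in which only Manufacturer 2 knows his type are taken from Equations (9.97)–(9.100) rather than recomputed.

```python
from fractions import Fraction as F

def cournot(c1, c2, a=F(2)):
    """Interior Cournot equilibrium with inverse demand a - q1 - q2.
    Returns (q1, q2, u1, u2); at such an equilibrium each profit equals the
    square of the producer's own quantity."""
    q1 = (a - 2 * c1 + c2) / 3
    q2 = (a - 2 * c2 + c1) / 3
    return q1, q2, q1 ** 2, q2 ** 2

half = F(1, 2)

# Type unknown to both: Manufacturer 2 acts on his average cost 1.
_, _, u1_unknown, u2_unknown = cournot(F(1), F(1))

# Type common knowledge: average the two complete-information games.
high = cournot(F(1), F(5, 4))
low = cournot(F(1), F(3, 4))
u1_common = half * high[2] + half * low[2]
u2_common = half * high[3] + half * low[3]

# Type known only to Manufacturer 2: values from Equations (9.97)-(9.100).
u1_private = F(1, 9)
u2_private = half * F(5, 24) ** 2 + half * F(11, 24) ** 2

print(u1_unknown, u1_private, u1_common)   # 1/9 1/9 17/144
print(u2_unknown, u2_private, u2_common)   # 1/9 73/576 5/36
```

Each row of output is weakly increasing from left to right, so both manufacturers do at least as well when more is known about Manufacturer 2’s cost.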
9.5 Incomplete information as a possible interpretation of mixed strategies
There are cases in which it is difficult to interpret or justify the use of mixed strategies in equilibria. Consider for example the following two-player game in which the payoff functions are given by the matrix in Figure 9.13. This game has only one Nash equilibrium, with Player I playing the mixed strategy $[\tfrac{3}{4}(T), \tfrac{1}{4}(B)]$ and Player II playing the mixed strategy $[\tfrac{1}{2}(L), \tfrac{1}{2}(R)]$. The payoff at equilibrium is 0 for both players. When Player II plays strategy $[\tfrac{1}{2}(L), \tfrac{1}{2}(R)]$, Player I is indifferent between T and B. If that is the case, why should he stick to playing a mixed strategy? And even if he does play a mixed strategy, why the mixed strategy $[\tfrac{3}{4}(T), \tfrac{1}{4}(B)]$? If he plays, for example, the pure strategy T, he guarantees himself a payoff of 0 without going through the bother of randomizing strategy selection.
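|                | L (Player II) | R (Player II) |
|----------------|---------------|---------------|
| T (Player I)   | 0, 0          | 0, −1         |
| B (Player I)   | 1, 0          | −1, 3         |

Figure 9.13 The payoff matrix of a strategic-form game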
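The mixed equilibrium of the game in Figure 9.13 follows from the two indifference conditions. The following sketch computes it with exact fractions; `indifference_mix` is a hypothetical helper, and it assumes that a completely mixed equilibrium exists (nonzero denominators and probabilities strictly between 0 and 1).

```python
from fractions import Fraction as F

# Payoffs of Figure 9.13: rows T, B for Player I; columns L, R for Player II.
U1 = [[F(0), F(0)], [F(1), F(-1)]]
U2 = [[F(0), F(-1)], [F(0), F(3)]]

def indifference_mix(u1, u2):
    """Completely mixed equilibrium of a 2x2 game.
    Returns (p, q): p is the probability of the first row and q is the
    probability of the first column."""
    # q makes Player I indifferent between his two rows.
    q = (u1[1][1] - u1[0][1]) / (u1[0][0] - u1[0][1] - u1[1][0] + u1[1][1])
    # p makes Player II indifferent between his two columns.
    p = (u2[1][1] - u2[1][0]) / (u2[0][0] - u2[1][0] - u2[0][1] + u2[1][1])
    return p, q

p, q = indifference_mix(U1, U2)
payoff_I = q * U1[0][0] + (1 - q) * U1[0][1]    # Player I's payoff from T
payoff_II = p * U2[0][0] + (1 - p) * U2[1][0]   # Player II's payoff from L

print(p, q)                  # 3/4 1/2
print(payoff_I, payoff_II)   # 0 0
```

Because each player is indifferent at these probabilities, any mixture is a best reply for him; only the opponent’s indifference pins down the particular probabilities 3/4 and 1/2.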
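|                | L (Player II) | R (Player II) |
|----------------|---------------|---------------|
| T (Player I)   | εα, εβ        | εα, −1        |
| B (Player I)   | 1, εβ         | −1, 3         |

Figure 9.14 The payoff matrix of Figure 9.13 with “noise”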
As we now show, this equilibrium can be interpreted as the limit of a sequence of Bayesian equilibria of games in which the players play pure strategies. The idea is to add incomplete information by injecting “noise” into the game’s payoffs; each player will know his own payoffs, but will be uncertain about the payoffs of the other player. To illustrate this idea, suppose the payoff function, rather than being known with certainty, is given by the matrix of Figure 9.14. In Figure 9.14, ε (the amplitude of the noise) is small and α and β are independently and identically distributed random variables over the interval [−1, 1], with the uniform distribution. Note that for ε = 0 the resulting game is the original game appearing in Figure 9.13. Suppose that Player I knows the value of α and Player II knows the value of β; i.e., each player has precise knowledge of his own payoff function. This game can be depicted as a game with incomplete information and a continuum of types, as follows:
• The set of players is N = {I, II}.
• The type space of Player I is TI = [−1, 1].
• The type space of Player II is TII = [−1, 1].
• The prior distribution over T is the uniform distribution over the square [−1, 1]².
• The state game corresponding to the pair of types (α, β) ∈ T := TI × TII is given by the matrix in Figure 9.14.
The Harsanyi game that we constructed here has a continuum of types. The definition of a Harsanyi game is applicable also in this case, provided the set of type vectors is a measurable space, so that a common prior distribution can be defined. In the example presented here, the set of type vectors is [−1, 1]², which is a measurable space (with the σ-algebra of Borel sets), and the common prior distribution is the uniform distribution.
The expected payoff and the conditional expected payoff are defined analogously to the definitions in Equations (9.62) and (9.63), by replacing the summation over T (in Equation (9.62)) or on T−i (in Equation (9.63)) with integration. To ensure that the expressions in these equations are meaningful, we need to require the strategies of the players to be measurable functions of their type, and the payoff functions have to be measurable as well (so that the expected payoffs are well defined). The definitions of Nash equilibrium and Bayesian equilibrium remain unchanged (Definitions 9.46 and 9.49). Since each player has a continuum of types, the existence of a Bayesian equilibrium is not guaranteed. We will, nevertheless, assume that there exists a Bayesian equilibrium and try to identify it. In fact, we will prove that there exists an equilibrium in which the strategies are threshold strategies: the player plays one action if his type is less than or equal to a particular threshold, and he plays the other action if his type is greater than this threshold.
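• Let $\alpha_0 \in [-1, 1]$, and let $s_I^{\alpha_0}$ be the following strategy:

$$s_I^{\alpha_0} = \begin{cases} T & \text{when } \alpha > \alpha_0, \\ B & \text{when } \alpha \le \alpha_0. \end{cases} \tag{9.103}$$

In words, if Player I’s type is “high” ($\alpha > \alpha_0$) he plays T, and if his type is “low” ($\alpha \le \alpha_0$) he plays B.

• Let $\beta_0 \in [-1, 1]$, and let $s_{II}^{\beta_0}$ be the following strategy:

$$s_{II}^{\beta_0} = \begin{cases} L & \text{when } \beta > \beta_0, \\ R & \text{when } \beta \le \beta_0. \end{cases} \tag{9.104}$$

In words, if Player II’s type is “high” ($\beta > \beta_0$) he plays L, and if his type is “low” ($\beta \le \beta_0$) he plays R.

Next, we will identify two values, $\alpha_0$ and $\beta_0$, for which the pair of strategies $(s_I^{\alpha_0}, s_{II}^{\beta_0})$ forms a Bayesian equilibrium. Since $P(\beta > \beta_0) = \frac{1 - \beta_0}{2}$ and $P(\beta \le \beta_0) = \frac{1 + \beta_0}{2}$, the expected payoff of Player I of type $\alpha$ facing strategy $s_{II}^{\beta_0}$ of Player II is

$$U_I\left(T, s_{II}^{\beta_0} \mid \alpha\right) = \varepsilon\alpha \tag{9.105}$$

if he plays T; and it is

$$U_I\left(B, s_{II}^{\beta_0} \mid \alpha\right) = \frac{1 - \beta_0}{2} \cdot 1 + \frac{1 + \beta_0}{2} \cdot (-1) = -\beta_0 \tag{9.106}$$

if he plays B. In order for $s_I^{\alpha_0}$ to be a best reply to $s_{II}^{\beta_0}$ the following conditions must hold:

$$\alpha > \alpha_0 \implies U_I\left(T, s_{II}^{\beta_0} \mid \alpha\right) \ge U_I\left(B, s_{II}^{\beta_0} \mid \alpha\right) \iff \varepsilon\alpha \ge -\beta_0, \tag{9.107}$$

$$\alpha \le \alpha_0 \implies U_I\left(T, s_{II}^{\beta_0} \mid \alpha\right) \le U_I\left(B, s_{II}^{\beta_0} \mid \alpha\right) \iff \varepsilon\alpha \le -\beta_0. \tag{9.108}$$

From this we conclude that at equilibrium,

$$\varepsilon\alpha_0 = -\beta_0. \tag{9.109}$$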
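We can similarly calculate that the expected payoff of Player II of type $\beta$ facing strategy $s_I^{\alpha_0}$ of Player I is

$$U_{II}\left(s_I^{\alpha_0}, L \mid \beta\right) = \varepsilon\beta, \tag{9.110}$$

$$U_{II}\left(s_I^{\alpha_0}, R \mid \beta\right) = \frac{1 - \alpha_0}{2} \cdot (-1) + \frac{1 + \alpha_0}{2} \cdot 3 = 1 + 2\alpha_0. \tag{9.111}$$

In order for $s_{II}^{\beta_0}$ to be a best reply against $s_I^{\alpha_0}$, the following needs to hold:

$$\beta > \beta_0 \implies U_{II}\left(s_I^{\alpha_0}, L \mid \beta\right) \ge U_{II}\left(s_I^{\alpha_0}, R \mid \beta\right) \iff \varepsilon\beta \ge 1 + 2\alpha_0,$$

$$\beta \le \beta_0 \implies U_{II}\left(s_I^{\alpha_0}, L \mid \beta\right) \le U_{II}\left(s_I^{\alpha_0}, R \mid \beta\right) \iff \varepsilon\beta \le 1 + 2\alpha_0.$$

From this we further deduce that at equilibrium

$$\varepsilon\beta_0 = 1 + 2\alpha_0 \tag{9.112}$$

must hold. The solution of Equations (9.109) and (9.112) is

$$\alpha_0 = -\frac{1}{2 + \varepsilon^2}, \qquad \beta_0 = \frac{\varepsilon}{2 + \varepsilon^2}. \tag{9.113}$$

The probability that Player I will play B is therefore

$$P_\varepsilon(B) = P\left(\alpha \le -\frac{1}{2 + \varepsilon^2}\right) = \frac{1 - \frac{1}{2 + \varepsilon^2}}{2} = \frac{1 + \varepsilon^2}{4 + 2\varepsilon^2}, \tag{9.114}$$

and the probability that Player II will play R is

$$P_\varepsilon(R) = P\left(\beta \le \frac{\varepsilon}{2 + \varepsilon^2}\right) = \frac{1 + \frac{\varepsilon}{2 + \varepsilon^2}}{2} = \frac{2 + \varepsilon + \varepsilon^2}{4 + 2\varepsilon^2}. \tag{9.115}$$

When ε approaches 0, that is, when we reduce the uncertainty regarding the payoffs down towards zero, we get

$$\lim_{\varepsilon \to 0} P_\varepsilon(B) = \tfrac{1}{4}, \tag{9.116}$$

$$\lim_{\varepsilon \to 0} P_\varepsilon(R) = \tfrac{1}{2}, \tag{9.117}$$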
which is the mixed strategy equilibrium in the original game that began this discussion. It follows that in the equilibrium $(s_I^{\alpha_0}, s_{II}^{\beta_0})$ each player implements a pure strategy. Moreover, for α ≠ α0, the action chosen by Player I of type α yields a strictly higher payoff than the action not chosen by him. Similarly, when β ≠ β0, the action chosen by Player II of type β yields a strictly higher payoff than the action not chosen by him. Harsanyi [1973] proposed this sort of reasoning as a basis for a new interpretation of mixed strategies. According to Harsanyi, a mixed strategy can be viewed as a pure strategy of a player that can be of different types. Each type chooses a pure strategy, and different types may choose different pure strategies. From the perspective of other players, who do not know the player’s type but rather have a belief (probability distribution) about the player’s type, it is as if the player chooses his pure strategy randomly; that is, he is implementing a mixed strategy. It is proved in Harsanyi [1973] that this result can be applied to n-player strategic-form games in which the set of pure strategies is finite. That paper also identifies
conditions guaranteeing that each equilibrium is the limit of equilibria in “games with added noise,” similar to those presented in the above example, as the amplitude of the noise approaches zero. We note that the same result obtains when the distribution of noise is not necessarily uniform over the interval [−1, 1]. Any probability distribution that is continuous over a compact, nonempty interval can be used (an example of such a probability distribution appears in Exercise 9.47).
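The threshold equilibrium of the noisy game of Figure 9.14 can be explored numerically. The sketch below is a plain illustration under the section’s assumptions (uniform noise on [−1, 1] and ε > 0): it evaluates the thresholds of Equation (9.113), checks the best-reply inequalities on a grid of types, and reports the probabilities of B and R. The function names and the grid size are assumptions made for this example.

```python
from fractions import Fraction as F

def thresholds(eps):
    """Solution of eps*alpha0 = -beta0 and eps*beta0 = 1 + 2*alpha0."""
    alpha0 = -1 / (2 + eps ** 2)
    beta0 = eps / (2 + eps ** 2)
    return alpha0, beta0

def prob_at_most(x):
    """P(Z <= x) for Z uniform on [-1, 1], valid for -1 <= x <= 1."""
    return (1 + x) / 2

def is_equilibrium(eps, grid=200):
    """Check the best-reply conditions of both players on an evenly spaced
    grid of types in [-1, 1]."""
    alpha0, beta0 = thresholds(eps)
    for k in range(grid + 1):
        t = F(2 * k, grid) - 1   # a type of either player
        # Player I: T yields eps*t, B yields -beta0; he plays T iff t > alpha0.
        if (t > alpha0) != (eps * t > -beta0):
            return False
        # Player II: L yields eps*t, R yields 1 + 2*alpha0; he plays L iff t > beta0.
        if (t > beta0) != (eps * t > 1 + 2 * alpha0):
            return False
    return True

for eps in (F(1, 2), F(1, 10), F(1, 1000)):
    a0, b0 = thresholds(eps)
    print(float(eps), is_equilibrium(eps),
          float(prob_at_most(a0)), float(prob_at_most(b0)))
```

As ε shrinks, the last two printed columns approach 1/4 and 1/2, the probabilities of B and R in the mixed equilibrium of Figure 9.13.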
9.6 The common prior assumption: inconsistent beliefs
As noted above, in both the Aumann and Harsanyi models, a situation of incomplete information can be assessed from two different perspectives: the ex ante stage, prior to the chance move selecting the state of the world (in the Aumann model) or the type vector (in the Harsanyi model), and the interim stage, after the chance move has selected the type vector and informed each player about his type, but before the players choose their actions. Prior to the selection of the state of the world, no player knows which information (the partition element in the Aumann model; the type in the Harsanyi model) he will receive; he only knows the prior distribution over the outcomes of the chance move. After the chance move, each player receives information, and updates his beliefs about the state of the world in the Aumann model (the distribution P conditioned on Fi(ω)) or about the types of the other players in the Harsanyi model (the distribution p conditioned on ti). The concept of interim beliefs is straightforward: a player’s interim beliefs are his beliefs after they have been updated in light of new information he has privately received. In real-life situations, a player’s beliefs may not be equal to his updated conditional probabilities for various reasons: errors in the calculation of conditional probability, lack of knowledge of the prior distribution, psychologically induced deviations from calculated probabilities, or in general any “subjective feeling” regarding the probability of any particular event, apart from any calculations. It therefore appears to be natural to demand that the interim beliefs be part of the fundamental data of the game, and not necessarily derived from prior distributions (whether or not those prior distributions are common).
This is not the case in the Aumann and Harsanyi models: the fundamental data in these models includes a common prior distribution, with the interim beliefs derived from the common prior through the application of Bayes’ rule. Assuming the existence of a common prior means adopting a very strong assumption. Can this assumption be justified? What is the exact role of the prior distribution p? Who, or what, makes that selection of the type vector (in the Harsanyi model) or the state of the world (in the Aumann model) at the beginning of the game? And how are the players supposed to “know” the prior distribution p that forms part of the game data? When player beliefs in the interim stage are derived from one common prior by way of Bayes’ rule, given the private information that the players receive, those beliefs are termed consistent beliefs. They are called consistent because they imply that the players’ beliefs about the way the world works are identical; the only thing that distinguishes players from each other is the information each has received. In that case there is
“no difference” between the Harsanyi depiction of the game and its depiction in the interim stage. Theorem 9.53 (page 355) states that the sets of equilibria in both depictions are identical (when the set of type vectors is finite). This means that the Aumann and Harsanyi models may be regarded as “convenient tools” for analyzing the interim stage, which is the stage in which we are really interested. If we propose that the most relevant stage for analyzing the game is the interim stage, in which each player is equipped with his own (subjective) interim beliefs, the next question is: can every system of interim stage beliefs be described by a Harsanyi game? In other words, given a system of interim stage beliefs, can we find a prior distribution p such that the belief of each type of each player is the conditional probability derived from p, given that the player is of that type? The next example shows that the answer to this question may be negative.
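To make the notion of consistent beliefs concrete, the following sketch derives interim beliefs from a common prior by Bayes’ rule in a two-player model with finitely many types. The prior used here is hypothetical and chosen only for illustration; it does not appear in the text, and `interim_beliefs` is an illustrative helper that assumes every type has positive prior probability.

```python
from fractions import Fraction as F

# A hypothetical common prior over type pairs (t_I, t_II).
PRIOR = {("I1", "II1"): F(3, 10), ("I1", "II2"): F(1, 10),
         ("I2", "II1"): F(2, 10), ("I2", "II2"): F(4, 10)}

def interim_beliefs(prior, player):
    """Conditional distributions p(t_-i | t_i) for a two-player type space.
    player is 0 for Player I and 1 for Player II."""
    marginal = {}
    for types, pr in prior.items():
        marginal[types[player]] = marginal.get(types[player], 0) + pr
    beliefs = {}
    for types, pr in prior.items():
        own, other = types[player], types[1 - player]
        beliefs.setdefault(own, {})[other] = pr / marginal[own]
    return beliefs

beliefs_I = interim_beliefs(PRIOR, 0)
beliefs_II = interim_beliefs(PRIOR, 1)
print(beliefs_I["I1"]["II1"], beliefs_I["I1"]["II2"])    # 3/4 1/4
print(beliefs_II["II2"]["I1"], beliefs_II["II2"]["I2"])  # 1/5 4/5
```

Beliefs produced this way are consistent by construction; the question raised in the text is the converse one, namely whether a given system of beliefs could have been produced by some prior.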
Example 9.56 Consider a model of incomplete information in which:
• there are two players: N = {I, II}, and
• each player has two types: TI = {I1, I2}, TII = {II1, II2}, and T = TI × TII = {I1 II1, I1 II2, I2 II1, I2 II2}.
Suppose that in the interim stage, before actions are chosen by the players, the mutual beliefs of the players’ types are given by the tables in Figure 9.15.
Player I’s beliefs (each row is the belief of one type of Player I):

|     | II1 | II2 |
|-----|-----|-----|
| I1  | 3/7 | 4/7 |
| I2  | 2/3 | 1/3 |

Player II’s beliefs (each column is the belief of one type of Player II):

|     | II1 | II2 |
|-----|-----|-----|
| I1  | 1/2 | 4/5 |
| I2  | 1/2 | 1/5 |

Figure 9.15 The mutual beliefs of the various types in the interim stage in Example 9.56

The tables in Figure 9.15 have the following interpretation. The first table describes the beliefs of the two possible types of Player I: Player I of type I1 ascribes probability 3/7 to the type of Player II being II1 and probability 4/7 to the type of Player II being II2. Player I of type I2 ascribes probability 2/3 to the type of Player II being II1 and probability 1/3 to the type of Player II being II2. The second table describes the beliefs of the two possible types of Player II. For example, Player II of type II1 ascribes probability 1/2 to the type of Player I being I1 and probability 1/2 to the type of Player I being I2. There is no common prior distribution p over T = TI × TII that leads to the beliefs described above. This can readily be seen with the assistance of Figure 9.16.
|     | II1 | II2 |
|-----|-----|-----|
| I1  | 2x  | 4x  |
| I2  | 2x  | x   |

Figure 9.16 Conditions that must be satisfied in order for a common prior in Example 9.56 to exist
In Figure 9.16, we have denoted x = p(I2, II2). In order for the beliefs of type I2 to correspond with the data in Figure 9.15, it must be the case that p(I2, II1) = 2x (because according to Figure 9.15 type I2 believes that the probability that Player II’s type is II1 is twice the probability that his type is II2). In order for the beliefs of type II1 to correspond with the data in Figure 9.15, it must be the case that p(I1, II1) = 2x, and in order for the beliefs of type II2 to correspond with the data in Figure 9.15, it must be the case that p(I1, II2) = 4x. But the beliefs of type I1 in Figure 9.15 are [3/7(II1), 4/7(II2)], while according to Figure 9.16 these beliefs would have to be [1/3(II1), 2/3(II2)]. ◭
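The argument of Figure 9.16 can be replayed mechanically. The sketch below assumes that a common prior, if one exists, gives every type vector positive probability; it fixes the weight of (I2, II2), propagates the ratios dictated by types I2, II1, and II2, and then compares the belief implied for type I1 with the one stated in Figure 9.15. The function name `candidate_prior` is illustrative.

```python
from fractions import Fraction as F

# Interim beliefs of Figure 9.15.
BELIEFS_I = {"I1": {"II1": F(3, 7), "II2": F(4, 7)},
             "I2": {"II1": F(2, 3), "II2": F(1, 3)}}
BELIEFS_II = {"II1": {"I1": F(1, 2), "I2": F(1, 2)},
              "II2": {"I1": F(4, 5), "I2": F(1, 5)}}

def candidate_prior(b1, b2):
    """Follow Figure 9.16: give (I2, II2) weight 1, propagate the belief
    ratios of types I2, II1 and II2, and normalize."""
    w = {("I2", "II2"): F(1)}
    # Type I2 fixes the ratio p(I2, II1) : p(I2, II2).
    w[("I2", "II1")] = w[("I2", "II2")] * b1["I2"]["II1"] / b1["I2"]["II2"]
    # Type II1 fixes the ratio p(I1, II1) : p(I2, II1).
    w[("I1", "II1")] = w[("I2", "II1")] * b2["II1"]["I1"] / b2["II1"]["I2"]
    # Type II2 fixes the ratio p(I1, II2) : p(I2, II2).
    w[("I1", "II2")] = w[("I2", "II2")] * b2["II2"]["I1"] / b2["II2"]["I2"]
    total = sum(w.values())
    return {k: v / total for k, v in w.items()}

p = candidate_prior(BELIEFS_I, BELIEFS_II)
row = p[("I1", "II1")] + p[("I1", "II2")]
implied_I1 = {"II1": p[("I1", "II1")] / row, "II2": p[("I1", "II2")] / row}

print(implied_I1["II1"], implied_I1["II2"])   # 1/3 2/3
print(implied_I1 == BELIEFS_I["I1"])          # False
```

The only candidate prior is (2/9, 4/9, 2/9, 1/9) on (I1 II1, I1 II2, I2 II1, I2 II2), and it implies the belief [1/3, 2/3] for type I1 rather than [3/7, 4/7], so the beliefs are inconsistent.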
The incomplete information situation described in the last example is a situation of inconsistent beliefs. Such a situation cannot be described by a Harsanyi model, and it therefore cannot be described by an Aumann model. Analyzing such situations requires extending the Harsanyi model, which is what we will do in the next chapter, where we will construct a model of incomplete information in which the beliefs of the types are part of the data of the game. The question is what can be said about models of situations with inconsistent beliefs. For one thing, the concept of Bayesian equilibrium is still applicable, also when players’ beliefs are inconsistent. In the definition of Bayesian equilibrium, the prior p has significance only in establishing the beliefs p(t−i | ti ) in Equation (9.63). That means that the definition is meaningful also when beliefs are not derived from a common prior. In the next chapter we will return to the topic of consistency, provide a formal definition of the concept, and define Bayesian equilibrium in general belief structures.
9.7 Remarks
Kripke’s S5 system was defined in Kripke [1963] (see also Geanakoplos [1992]). The concept of common knowledge first appeared in Lewis [1969] and was independently defined in Aumann [1976]. Theorem 9.32 (page 339) is proved in Aumann [1976], in which he also proves the Characterization Theorem 9.24 (page 333) in a formulation that is equivalent to that appearing in Remark 9.26 (page 333). The same paper presents a dynamic process that leads to a posterior probability that is common knowledge. A formal description of that dynamic process is given by Geanakoplos and Polemarchakis [1982]. Further developments of this idea can be found in many papers, including Geanakoplos and Sebenius [1983], McKelvey and Page [1986], and Parikh and Krasucki [1990]. John Harsanyi proposed the Harsanyi model of incomplete information in a series of three papers titled “Games of incomplete information played by Bayesian players” (Harsanyi [1967, 1968a, 1968b]), for which he was awarded the Nobel Memorial Prize in Economics in 1994. Harsanyi also proposed the interpretation of the concept of mixed strategies and mixed equilibria as the limit of Bayesian equilibria (in Harsanyi [1973]), as explained in Section 9.5 (page 361). For further discussions on the subject of the distinction between knowledge and probability-one belief, the reader is directed to Monderer and Samet [1989] and Vassilakis and Zamir [1993]. Exercise 9.14 is based on a question suggested by Ronen Eldan. Geanakoplos [1992] notes that the riddles on which Exercises 9.23 and 9.31 are based first appeared in Bollobás [1953]. Exercise 9.25 is proved in Geanakoplos and Polemarchakis [1982], from which Exercise 9.28 is also taken. Exercise 9.26 was donated to the authors by Ayala Mashiah-Yaakovi. Exercises 9.29 and 9.30 are from Geanakoplos and Sebenius [1983]. Exercise 9.33 is taken from Geanakoplos [1992]. Exercise 9.34 is the famous “coordinated attack problem,” studied in the field of distributed computing. The formulation of the exercise is from Halpern [1986]. Exercise 9.39 is from Harsanyi [1968a]. Exercise 9.40 is based on Spence [1974]. Exercise 9.41 is based on Akerlof [1970]. Exercise 9.46 is the “Electronic Mail game” of Rubinstein [1989]. Exercise 9.53 is based on Aumann [1987]. The authors thank Yaron Azrieli, Aviad Heifetz, Dov Samet, and Eran Shmaya for their comments on this chapter.
9.8 Exercises
In the exercises in this chapter, all announcements made by the players are considered common knowledge, and the game of each exercise is also considered common knowledge among the players.

9.1 Prove that the knowledge operator $K_i$ (Definition 9.8, page 325) of each player i satisfies the following properties:
(a) $K_i Y = Y$: player i knows that Y is the set of all states.
(b) $K_i A \cap K_i B = K_i(A \cap B)$: player i knows event A and knows event B if and only if he knows event A ∩ B.
(c) $(K_i A)^c = K_i((K_i A)^c)$: player i does not know event A if and only if he knows that he does not know event A.

9.2 This exercise shows that the Kripke S5 system characterizes the knowledge operator. Let Y be a finite set, and let $K : 2^Y \to 2^Y$ be an operator that associates with each subset A of Y a subset K(A) of Y. Suppose that the operator K satisfies the following properties:
(i) $K(Y) = Y$.
(ii) $K(A) \cap K(B) = K(A \cap B)$ for every pair of subsets A, B ⊆ Y.
(iii) $K(A) \subseteq A$ for every subset A ⊆ Y.
(iv) $K(K(A)) = K(A)$ for every subset A ⊆ Y.
(v) $(K(A))^c = K((K(A))^c)$ for every subset A ⊆ Y.

Associate with each ω ∈ Y a set F(ω) as follows:

$$F(\omega) := \bigcap \{A \subseteq Y,\ \omega \in K(A)\}. \tag{9.118}$$

(a) Prove that ω ∈ F(ω) for each ω ∈ Y.
(b) Prove that if ω′ ∈ F(ω), then F(ω) = F(ω′). Conclude from this that the family of sets F := {F(ω), ω ∈ Y} is a partition of Y.
(c) Let K ′ be the knowledge operator defined by the partition F : K ′ (A) = {ω ∈ Y : F (ω) ⊆ A}.
(9.119)
Prove that K ′ = K. (d) Which of the five properties listed above did you use in order to prove that K ′ = K? 9.3 Prove that in Kripke’s S5 system (see page 327), the fourth property, Ki Ki A = Ki A, is a consequence of the other four properties. 9.4 Consider an Aumann model of incomplete information in which N = {1, 2}, Y = {1, 2, 3, 4, 5, 6, 7}, F1 = {{1}, {2, 3}, {4, 5}, {6, 7}}, F2 = {{1, 2}, {3, 4}, {5, 6}, {7}}. Let A = {1} and B = {1, 2, 3, 4, 5, 6}. Identify the events K1 A, K2 A, K2 K1 A, K1 K2 A, K1 B, K2 B, K2 K1 B, K1 K2 B, K1 K2 K1 B, K2 K1 K2 B. 9.5 Emily, Marc, and Thomas meet at a party to which novelists and poets have been invited. Every attendee at the party is either a novelist or a poet (but not both). Every poet knows all the other poets, but every novelist does not know any of the other attendees, whether they are poets or novelists. What do Emily, Marc, and Thomas know about each other’s professions? Provide an Aumann model of incomplete information that describes this situation (there are several ways to do so). 9.6 I love Juliet, and I know that Juliet loves me, but I do not know if Juliet knows that I love her. Provide an Aumann model of incomplete information that describes this situation, and specify a state of the world in that model that corresponds to this situation (there are several possible ways of including higher-order beliefs in this model). 9.7 Construct an Aumann model of incomplete information for each of the following situations, and specify a state of the world in that model which corresponds to the situation (there are several possible ways of including higher-order beliefs in each model): (a) Mary gave birth to a baby, and Herod knows it. (b) Mary gave birth to a baby, and Herod does not know it. (c) Mary gave birth to a baby, Herod knows it, and Mary knows that Herod knows it. (d) Mary gave birth to a baby, Herod knows it, but Mary does not know that Herod knows it. 
(e) Mary gave birth to a baby, Herod does not know it, and Mary does not know whether Herod knows it or not. 9.8 Romeo composes a letter to Juliet, and gives it to Tybalt to deliver to Juliet. While on the way, Tybalt peeks at the letter’s contents. Tybalt gives Juliet the letter, and
Juliet reads it immediately, in Tybalt’s presence. Neither Romeo nor Juliet knows that Tybalt has read the letter. Answer the following questions relating to this story: (a) Construct an Aumann model of incomplete information in which all the elements of the story above regarding the knowledge possessed by Romeo, Tybalt, and Juliet regarding the content of the letter hold true (there are several possible ways to do this). Specify a state of the world in the model that corresponds to the situation described above. (b) In the state of the world you specified above, does Romeo know that Juliet has read the letter? Justify your answer. (c) In the state of the world you specified above, does Tybalt know that Romeo knows that Juliet has read the letter? Justify your answer. (d) Construct an Aumann model of incomplete information in which, in addition to the particulars of the story presented above, the following also holds: “Tybalt does not know that Juliet does not know that Tybalt read the letter,” and specify a state of the world in your model that corresponds to this situation. 9.9 George, John, and Thomas are standing first, second, and third in a line, respectively. Each one sees the persons standing in front of him. James announces: “I have three red hats and two white hats. I will place a hat on the head of each one of you.” After James places the hats, he asks Thomas (who can see the hats worn by John and George) if he knows the color of the hat on his own head. Thomas replies “no.” He then asks John (who sees only George’s hat) whether he knows the color of the hat on his own head, and he also replies “no.” Finally, he asks George (who cannot see any of the hats) if he knows the color of the hat on his own head. (a) Construct an Aumann model of incomplete information that contains 7 states of the world and describes this situation. 
(b) What are the partitions of George, John, and Thomas after James’s announcement and before he asked Thomas whether he knows the color of his hat? (c) What are the partitions of George and John after Thomas’s response and before John responded to James’s question? (d) What is George’s partition after hearing John’s response? (e) What is George’s answer to James’s question? Does this answer depend on the state of the world, that is, on the colors of the hats that the three wear? 9.10 Prove Corollary 9.16 (page 331): every situation of incomplete information (N, Y, (Fi)i∈N, s, ω∗) over a set of states of nature S uniquely determines a knowledge hierarchy among the players over the set of states of nature S in state of the world ω∗. 9.11 Consider an Aumann model of incomplete information in which N = {I, II}, Y = {1, 2, 3, 4, 5, 6, 7, 8, 9}, FI = {{1}, {2, 3}, {4, 5}, {6}, {7}, {8, 9}}, and FII = {{1}, {2, 5}, {3}, {4, 7}, {6, 9}, {8}}. What are the connected components in the graph corresponding to this Aumann model? Which events are common knowledge in state
of the world ω = 1? Which events are common knowledge in state of the world ω = 9? Which events are common knowledge in state of the world ω = 5? 9.12 Show that in Examples 9.12 (page 327) and 9.13 (page 329), in each state of the world, the only event that is common knowledge is Y . 9.13 Consider an Aumann model of incomplete information in which N = {I, II}, Y = {1, 2, 3, 4, 5, 6, 7, 8, 9}, FI = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}, and FII = {{1, 5}, {2, 6}, {3, 4}, {7}, {8, 9}}. Answer the following questions: (a) What are the connected components in the graph corresponding to the Aumann model? (b) Which events are common knowledge in state of the world ω = 1? In ω = 7? In ω = 8? (c) Denote by A the event {1, 2, 3, 4, 5}. Find the shortest sequence of players i1 , i2 , . . . , ik such that in the state of the world ω = 1 it is not the case that i1 knows that i2 knows that . . . ik−1 knows that ik knows event A. 9.14 A digital clock showing the hours between 00:00 to 23:59 hangs on a wall; the digits on the clock are displayed using straight lines, as depicted in the accompanying figure.
William and Dan are both looking at the clock. William sees only the top half of the clock (including the midline) while Dan sees only the bottom half of the clock (including the midline). Answer the following questions:
(a) At which times does William know the correct time?
(b) At which times does Dan know the correct time?
(c) At which times does William know that Dan knows the correct time?
(d) At which times does Dan know that William knows the correct time?
(e) At which times is the correct time common knowledge among William and Dan?
(f) Construct an Aumann model of incomplete information describing this situation. How many states of nature, and how many states of the world, are there in your model?
9.15 Prove that if in an Aumann model of incomplete information the events A and B are common knowledge among the players in state of the world ω, then the event A ∩ B is also common knowledge among the players in ω. 9.16 Given an Aumann model of incomplete information, prove that event A is common knowledge in every state of the world in A if and only if K1 K2 · · · Kn A = A, where N = {1, 2, . . . , n} is the set of players. 9.17 Prove that in an Aumann model of incomplete information with n players, every event that is common knowledge among the players in state of the world ω is
also common knowledge among any subset of the set of players (Remark 9.22, page 332). 9.18 Give an example of an Aumann model of incomplete information with a set of players N = {1, 2, 3} and an event A that is not common knowledge among all the players N, but is common knowledge among players {2, 3}. 9.19 (a) In state of the world ω, Andrew knows that Sally knows the state of nature. Does this imply that Andrew knows the state of nature in ω? Is the fact that Sally knows the state of nature common knowledge among Andrew and Sally in ω? (b) In every state of the world, Andrew knows that Sally knows the state of nature. Does this imply that Andrew knows the state of nature in every state of the world? Is the fact that Sally knows the state of nature common knowledge among Andrew and Sally in every state of the world? (c) In state of the world ω, Andrew knows that Sally knows the state of the world. Does this imply that Andrew knows the state of the world in ω? Is the fact that Sally knows the state of the world common knowledge among Andrew and Sally in ω? 9.20 Let (N, Y, (Fi )i∈N , s, P) be an Aumann model of incomplete information with beliefs, and let W ⊆ Y be an event. Prove that (N, W, (Fi ∩ W )i∈N , P(· | W )) is also an Aumann model of incomplete information with beliefs, where for each player i ∈ N
Fi ∩ W = {F ∩ W : F ∈ Fi }
(9.120)
is the partition Fi restricted to W , and P(· | W ) is the conditional distribution of P over W . 9.21 Prove that without the assumption that P(ω) > 0 for all ω ∈ Y , Theorem 9.29 (page 336) does not obtain. 9.22 This exercise generalizes Aumann’s Agreement Theorem to a set of players of arbitrary finite size. Given an Aumann model of incomplete information with beliefs (N, Y, (Fi )i∈N , s, P) with n players, suppose that for each i ∈ N, the fact that player i ascribes probability qi to an event A is common knowledge among the players. Prove that q1 = q2 = · · · = qn . Hint: Use Theorem 9.32 on page 339 and Exercise 9.17. 9.23 Three individuals are seated in a room. Each one of them is wearing a hat, which may be either red or white. Each of them sees the hats worn by the others, but cannot see his own hat (and in particular does not know its color). The true situation is that every person in the room is wearing a red hat. (a) Depict this situation as a Harsanyi model of incomplete information, where a player’s type is the color of his hat, and specify the vector of types corresponding to the true situation. (b) Depict this situation as an Aumann model of incomplete information, and specify the state of the world corresponding to the true situation.
(c) A stranger enters the room, holding a bell. Once a minute, he rings the bell while saying “If you know that the color of the hat on your head is red, leave this room immediately.” Does anyone leave the room after a few rings? Why? (d) At a certain point in time, the announcer says, “At least one of you is wearing a red hat.” He continues to ring the bell once a minute, requesting that those who know their hat to be red leave. Use the Aumann model of incomplete information to prove that after the third ring, all three hat-wearers will leave the room. (e) What information did the announcer add by saying that at least one person in the room was wearing a red hat, when this was known to everyone before the announcement was made? Hint: See Example 9.12 on page 327. (f) Generalize this result to n individuals (instead of 3).
9.24 Prove that in an Aumann model of incomplete information with a common prior P, if in a state of the world ω Player 1 knows that Player 2 knows A, then P(A | F1(ω)) = 1.
9.25 Consider an Aumann model of incomplete information with beliefs in which
N = {I, II}, Y = {1, 2, 3, 4, 5, 6, 7, 8, 9}, FI = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}, FII = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9}}, P(ω) = 1/9, ∀ω ∈ Y.
Let A = {1, 5, 9}, and suppose that the true state of the world is ω∗ = 9. Answer the following questions:
(a) What is the probability that Player I (given his information) ascribes to the event A? (b) What is the probability that Player II ascribes to the event A? (c) Suppose that Player I announces the probability you calculated in item (a) above. How will that affect the probability that Player II now ascribes to the event A? (d) Suppose that Player II announces the probability you calculated in item (c). How will that affect the probability that Player I ascribes to the event A, after hearing Player II’s announcement? (e) Repeat the previous two questions, with each player updating his conditional probability following the announcement of the other player. What is the sequence of conditional probabilities the players calculate? Does the sequence converge, or oscillate periodically (or neither)? (f) Repeat the above, with ω∗ = 8. (g) Repeat the above, with ω∗ = 6. (h) Repeat the above, with ω∗ = 4. (i) Repeat the above, with ω∗ = 1.
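The conditional probabilities requested in Exercise 9.25 all have the form P(A | Fi(ω)). A minimal sketch of that computation follows, using exact fractions; the function name `posterior` and the variable names are illustrative assumptions, not notation from the text, and the sketch assumes the partition block containing the state has positive prior probability.

```python
from fractions import Fraction

def posterior(prior, partition, event, state):
    """Return P(event | F(state)), where F(state) is the block of
    `partition` that contains `state` (hypothetical helper)."""
    block = next(b for b in partition if state in b)
    block_mass = sum(prior[w] for w in block)
    if block_mass == 0:
        raise ValueError("the block containing the state has prior probability 0")
    return sum(prior[w] for w in block if w in event) / block_mass

# The model of Exercise 9.25: uniform prior over Y = {1, ..., 9}.
prior = {w: Fraction(1, 9) for w in range(1, 10)}
F_I = [{1, 2, 3}, {4, 5, 6}, {7, 8, 9}]
F_II = [{1, 2, 3, 4}, {5, 6, 7, 8}, {9}]
A = {1, 5, 9}

# Probabilities ascribed to A at the true state of the world 9.
q_I = posterior(prior, F_I, A, 9)
q_II = posterior(prior, F_II, A, 9)
```

Changing the last argument of `posterior` gives the corresponding values for the other true states considered in items (f) through (i).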
Games with incomplete information and common priors
9.26 Repeat Exercise 9.25, using the following Aumann model of incomplete information with beliefs: N = {I, II}, Y = {1, 2, 3, 4, 5}, FI = {{1, 2}, {3, 4}, {5}}, FII = {{1, 3, 5}, {2}, {4}}, P(ω) = 1/5, ∀ω ∈ Y, for A = {1, 4} and ω∗ = 3.
9.27 Repeat Exercise 9.25 when the two players have different priors over Y :
PI (ω) = ω/45, ∀ω ∈ Y, (9.121)
PII (ω) = (10 − ω)/45, ∀ω ∈ Y. (9.122)
9.28 This exercise generalizes Exercise 9.25. Let (N, Y, FI , FII , s, P) be an Aumann model of incomplete information with beliefs in which N = {I, II} and let A ⊆ Y be an event. Consider the following process:
• Player I informs Player II of the conditional probability P(A | FI (ω)).
• Player II informs Player I of the conditional probability that he ascribes to event A given the partition element FII (ω) and Player I’s announcement.
• Player I informs Player II of the conditional probability that he ascribes to event A given the partition element FI (ω) and all the announcements so far.
• Repeat indefinitely.
Answer the following questions:
(a) Prove that the sequence of conditional probabilities that Player I announces converges; that the sequence of conditional probabilities that Player II announces also converges; and that both sequences converge to the same limit.
(b) Prove that after at most 2|Y | announcements the sequence of announcements made by the players becomes constant.
9.29 The “No Trade Theorem” mentioned on page 341 is proved in this exercise. Let (N, Y, FI , FII , s, P) be an Aumann model of incomplete information with beliefs where N = {I, II}, let f : Y → R be a function, and let ω∗ ∈ Y be a state of the world in Y . Suppose²¹ that the fact that E[f | FI ](ω) ≥ 0 is common knowledge in ω∗ , and that the fact that E[f | FII ](ω) ≤ 0 is also common knowledge in ω∗ . In other words, the events AI := {ω : E[f | FI ](ω) ≥ 0} and AII := {ω : E[f | FII ](ω) ≤ 0} are common knowledge in ω∗ . Prove that the event D := {ω ∈ Y : E[f | FI ](ω) = E[f | FII ](ω) = 0} is common knowledge in the state of the world ω∗ .
²¹ Recall that the conditional expectation E[f | FI ] is the function on Y that is defined by E[f | FI ](ω) := E[f | FI (ω)] for each ω ∈ Y .
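The announcement process of Exercise 9.28 can be simulated on a finite model. The sketch below is one possible way to do so, under stated assumptions: the prior is positive on every state, and an announcement is treated as publicly revealing the set of states at which the speaker would have announced the same number, so the listener's partition is refined accordingly. The function names and the stopping rule (stop once two consecutive announcements refine nothing) are illustrative choices, not taken from the text.

```python
from fractions import Fraction

def posterior(prior, partition, event, state):
    """P(event | block of `partition` containing `state`)."""
    block = next(b for b in partition if state in b)
    mass = sum(prior[w] for w in block)
    return sum(prior[w] for w in block if w in event) / mass

def refine(partition, labels):
    """Split every block into sub-blocks on which `labels` is constant."""
    refined = []
    for block in partition:
        groups = {}
        for w in block:
            groups.setdefault(labels[w], set()).add(w)
        refined.extend(groups.values())
    return refined

def announcement_process(prior, part_I, part_II, event, true_state):
    """Return the list of announced probabilities, Player I speaking first."""
    parts = [[set(b) for b in part_I], [set(b) for b in part_II]]
    announcements, speaker, quiet_steps = [], 0, 0
    while quiet_steps < 2:
        # What the speaker would announce at each state of the world.
        labels = {w: posterior(prior, parts[speaker], event, w) for w in prior}
        announcements.append(labels[true_state])
        listener = 1 - speaker
        refined = refine(parts[listener], labels)
        # A step is "quiet" if the listener learned nothing new.
        quiet_steps = quiet_steps + 1 if len(refined) == len(parts[listener]) else 0
        parts[listener] = refined
        speaker = listener
    return announcements

# The model of Exercise 9.25, used here only as input data.
prior = {w: Fraction(1, 9) for w in range(1, 10)}
F_I = [{1, 2, 3}, {4, 5, 6}, {7, 8, 9}]
F_II = [{1, 2, 3, 4}, {5, 6, 7, 8}, {9}]
history = announcement_process(prior, F_I, F_II, {1, 5, 9}, true_state=1)
```

Because every non-quiet step strictly increases the number of blocks in a partition of the finite set Y, the loop terminates, which mirrors the finiteness argument the exercise asks for.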
9.30 This exercise is similar to Exercise 9.25, but instead of announcing the probability of a particular event given their private information, the players announce whether or not the expectation of a particular random variable is positive or not, given their private information. This is meant to model trade between two parties to an agreement, as follows. Suppose that Ralph (Player 2) owns an oil field. He expects the profit from the oil field to be negative, and therefore intends to sell it. Jack is of the opinion that the oil field can yield positive profits, and is therefore willing to purchase it (for the price of $0). Jack and Ralph arrive at different determinations regarding the oil field because they have different information. We will show that no trade can occur under these conditions, because of the following exchange between the parties:
• Jack: I am interested in purchasing the oil field; are you interested in selling?
• Ralph: Yes, I am interested in selling; are you interested in purchasing?
• Jack: Yes, I am interested in purchasing; are you still interested in selling?
• And so on, until one of the two parties announces that he has no interest in a deal.
The formal description of this process is as follows. Let (N, Y, FI , FII , s, P) be an Aumann model of incomplete information with beliefs where N = {I, II}, let f : Y → R be a function, and let ω ∈ Y be a state of the world. f (ω) represents the profit yielded by the oil field at the state of the world ω. At each stage, Jack will be interested in the deal only if the conditional expectation of f given his information is positive, and Ralph will be interested in the deal only if the conditional expectation of f given his information is negative. The process therefore looks like this:
• Player I states whether or not E[f | FI ](ω) > 0 (implicitly doing so by expressing or not expressing interest in purchasing the oil field). If he says “no” (i.e., his expectation is less than or equal to 0), the process ends here.
• If the process gets to the second stage, Player II states whether his expectation of f , given the information he has received so far, is negative or not. The information he has includes FII (ω) and the affirmative interest of Player I in the first stage. If Player II now says “no” (i.e., his expectation is greater than or equal to 0), the process ends here.
• If the process has not yet ended, Player I states whether his expectation of f , given the information he has received so far, is positive or not. The information he has includes FI (ω) and the affirmative interest of Player II in the second stage. If Player I now says “no” (i.e., his expectation is less than or equal to 0), the process ends here.
• And so on. The process ends the first time either Player I’s expectation of f , given his information, is not positive, or Player II’s expectation of f , given his information, is not negative.
Show that this process ends after a finite number of stages. In fact, show that the number of stages prior to the end of the process is at most max{2|FI | − 1, 2|FII | − 1}.
9.31 Peter has two envelopes. He puts 10^k euros in one and 10^{k+1} euros in the other, where k is the outcome of the toss of a fair die. Peter gives one of the envelopes to
Mark and one to Luke (neither Mark nor Luke knows the outcome of the toss). Mark and Luke both go to their respective rooms, open the envelopes they have received, and observe the amounts in them. (a) Depict the situation as a model with incomplete information, where the state of nature is the amounts in Mark and Luke’s envelopes. (b) Mark finds 1,000 euros in his envelope, and Luke finds 10,000 euros in his envelope. What is the true state of the world in your model? (c) According to the information Mark has, what is the expected amount of money in Luke’s envelope? (d) According to the information Luke has, what is the expected amount of money in Mark’s envelope? (e) Peter enters Mark’s room and asks him whether he would like to switch envelopes with Luke. If the answer is positive, he goes to Luke’s room and informs him: “Mark wants to switch envelopes with you. Would you like to switch envelopes with him?” If the answer is positive, he goes to Mark’s room and tells him: “Luke wants to switch envelopes with you. Would you like to switch envelopes with him?” This process repeats itself as long as the answer received by Peter from Mark and Luke is positive. Use your model of incomplete information to show that the answers of Mark and Luke will be positive at first, and then one of them will refuse to switch envelopes. Who will be the first to refuse? Assume that each of the two would like to change envelopes if the conditional expectation of the amount of money in the other’s envelope is higher than the amount in his envelope. 9.32 The setup is just as in the previous exercise, but now Peter tells Mark and Luke that they can switch the envelopes if and only if both of them have an interest in switching envelopes: each one gives Peter a sealed envelope with “yes” or “no” written in it, and the switch is effected only if both envelopes read “yes”. What will be Mark and Luke’s answers after having properly analyzed the situation? Justify your answer. 
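The conditional expectations that drive the envelope exercises can be computed mechanically. The following sketch assumes, in addition to what the text states, that Peter hands the smaller envelope to either player with probability 1/2; the function name `expected_other` and the argument names are hypothetical. A player who finds 10^m euros knows that either k = m (he holds the smaller envelope) or k = m − 1 (he holds the larger one).

```python
from fractions import Fraction

def expected_other(m, prior_k):
    """Expected amount in the other envelope for a player who finds
    10**m euros; `prior_k` maps each possible k to P(k).
    Assumes the smaller envelope goes to either player with probability 1/2."""
    w_small = Fraction(prior_k.get(m, 0)) / 2      # k = m: own envelope is the smaller one
    w_large = Fraction(prior_k.get(m - 1, 0)) / 2  # k = m - 1: own envelope is the larger one
    total = w_small + w_large
    if total == 0:
        raise ValueError("an envelope with 10**m euros has probability 0")
    return (w_small * 10 ** (m + 1) + w_large * 10 ** (m - 1)) / total

# k is the outcome of a fair die.
fair_die = {k: Fraction(1, 6) for k in range(1, 7)}

# A player compares this number with the 10**m euros he holds.
print(expected_other(2, fair_die))
```

Only the ratio P(k = m) : P(k = m − 1) enters the computation, so the same function can be reused with any other distribution of k by passing a different `prior_k`.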
9.33 The setup is again as in Exercise 9.31, but this time Peter chooses the integer k randomly according to a geometric distribution with parameter 1/2, that is, P(k = n) = 1/2^n for each n ∈ N. How does this affect your answers to the questions in Exercise 9.31?
9.34 Two divisions of Napoleon’s army are camped on opposite hillsides, both overlooking the valley in which enemy forces have massed. If both divisions attack their enemy simultaneously, victory is assured, but if only one division attacks alone, it will suffer a crushing defeat. The division commanders have not yet coordinated a joint attack time. The commander of Division A wishes to coordinate a joint attack time of 6 am the following day with the commander of Division B. Given the stakes involved, neither commander will give an order to his troops to attack until he is absolutely certain that the other commander is also attacking simultaneously. The only way the commanders can communicate with each other is by courier. The travel
time between the two camps is an hour’s trek through enemy-held territory, exposing the courier to possible capture by enemy patrols. It turns out that on that night no enemy patrols were scouting in the area. How much time will pass before the two commanders coordinate the attack? Justify your answer.
9.35 Prove Theorem 9.42 on page 349: every game with incomplete information can be described as an extensive-form game.
9.36 Describe the following game with incomplete information as an extensive-form game. There are two players N = {I, II}. Each player has three types, TI = {I1 , I2 , I3 } and TII = {II1 , II2 , II3 }, with common prior:
p(Ik , IIl ) = k(k + l)/78, 1 ≤ k, l ≤ 3. (9.123)
The number of possible actions available to each type is given by the index of that type: the set of actions of Player I of type Ik contains k actions {1, 2, . . . , k}; the set of actions of Player II of type IIl contains l actions {1, 2, . . . , l}. When the type vector is (Ik , IIl ), and the vector of actions chosen is (aI , aII ), the payoffs to the players are given by uI (Ik , IIl ; aI , aII ) = (k + l)(aI − aII ), uII (Ik , IIl ; aI , aII ) = (k − l)aI aII .
(9.124)
For each player, and each of his types, write down the conditional probability that the player ascribes to each of the types of the other player, given his own type. 9.37 Find a Bayesian equilibrium in the game described in Example 9.38 (page 346). Hint: To find a Bayesian equilibrium, you may remove weakly dominated strategies. 9.38 Find a Bayesian equilibrium in the following game with incomplete information:
• N = {I, II}.
• TI = {I1 , I2 } and TII = {II1 }: Player I has two types, and Player II has one type.
• p(I1 , II1 ) = 1/3, p(I2 , II1 ) = 2/3.
• Every player has two possible actions, and state games are given by the following matrices:

The state game for t = (I1, II1):
                   Player II
                   L        R
Player I    T      2, 0     0, 3
            B      0, 4     1, 0

The state game for t = (I2, II1):
                   Player II
                   L        R
Player I    T      0, 3     3, 1
            B      2, 0     0, 1
9.39 Answer the following questions for the zero-sum game with incomplete information with two players I and II, in which each player has two types, TI = {I1 , I2 } and TII = {II1 , II2 }, the common prior over the type vectors is p(I1 , II1 ) = 0.4, p(I1 , II2 ) = 0.1, p(I2 , II1 ) = 0.2, p(I2 , II2 ) = 0.3, and the state games are given by
The state game for t = (I1, II1):
                   Player II
                   L        R
Player I    T      2        5
            B      –1       20

The state game for t = (I1, II2):
                   Player II
                   L        R
Player I    T      –24      –36
            B      0        24

The state game for t = (I2, II1):
                   Player II
                   L        R
Player I    T      28       15
            B      40       4

The state game for t = (I2, II2):
                   Player II
                   L        R
Player I    T      12       20
            B      2        13
(a) List the set of pure strategies of each player. (b) Depict the game in strategic form. (c) Calculate the value of the game and find optimal strategies for the two players. 9.40 Signaling games This exercise illustrates that a college education serves as a form of signaling to potential employers, in addition to expanding the knowledge of students. A young person entering the job market may be talented or untalented. Suppose that one-quarter of high school graduates are talented, and the rest untalented. A recent high school graduate, who knows whether or not he is talented, has the option of spending a year traveling overseas or enrolling at college (we will assume that he or she cannot do both) before applying for a job. An employer seeking to fill a job opening cannot know whether or not a job applicant is talented; all he knows is that the applicant either went to college or traveled overseas. The payoff an employer gets from hiring a worker depends solely on the talents of the hired worker (and not on his educational level), while the payoff to the youth depends on what he chose to do after high school, on his talents (because talented students enjoy their studies at college more than untalented students), and on whether or not he gets a job. These payoffs are described in the following tables (where the employer is the row player and the youth is the column player, so that a payoff vector of (x, y) represents a payoff of x to the employer and y to the youth).
Payoff matrix if youth is untalented:
                         Youth
                         Travel    Study
Employer    Hire         0, 6      0, 2
            Don’t Hire   3, 3      3, –3

Payoff matrix if youth is talented:
                         Youth
                         Travel    Study
Employer    Hire         8, 6      8, 4
            Don’t Hire   3, 3      3, 1
(a) Depict this situation as a Harsanyi game with incomplete information.
(b) List the pure strategies of the two players.
(c) Find two Bayesian equilibria in pure strategies.
9.41 Lemon Market This exercise illustrates that in situations in which a seller has more information than a buyer, transactions might not be possible. Consider a used car market in which a fraction q of the cars (0 ≤ q ≤ 1) are in good condition and 1 − q are in bad condition (lemons). The seller (Player 2) knows the quality of the car he is offering to sell while the buyer (Player 1) does not know the quality of the car that he is being offered to buy. Each used car is offered for sale at the price of $p (in units of thousands of dollars). The payoffs to the seller and the buyer, depending on whether or not the transaction is completed, are described in the following tables:

State game if car in good condition:
                Sell        Don’t Sell
Buy             6 – p, p    0, 5
Don’t Buy       0, 5        0, 5

State game if car in bad condition:
                Sell        Don’t Sell
Buy             4 – p, p    0, 0
Don’t Buy       0, 0        0, 0
Depict this situation as a Harsanyi game with incomplete information, and for each pair of parameters p and q, find all the Bayesian equilibria. 9.42 Nicolas would like to sell a company that he owns to Marc. The company’s true value is an integer between 10 and 12 (including 10 and 12), in millions of dollars. Marc has to make a take-it-or-leave-it offer, and Nicolas has to decide whether to accept the offer or reject it. If Nicolas accepts the offer, the company is sold, Nicolas’s payoff is the amount that he got, and Marc’s payoff is the difference between the company’s true value and the amount that he paid. If Nicolas rejects the offer, the company is not sold, Nicolas’s payoff is the value of the company, and Marc’s payoff is 0. For each one of the following three information structures, describe the situation as a game with incomplete information, and find all the Bayesian equilibria in the corresponding game. In each case, the description of the situation is common knowledge among the players. In determining Nicolas’s action set, note that Nicolas knows what Marc’s offer is when he decides whether or not to accept the offer.
(a) Neither Nicolas nor Marc knows the company’s true value; both ascribe probability 1/3 to each possible value.
(b) Nicolas knows the company’s true value, whereas Marc does not know it, and ascribes probability 1/3 to each possible value.
(c) Marc does not know the company’s worth and ascribes probability 1/3 to each possible value. Marc further ascribes probability p to the event that Nicolas knows the value of the company, and probability 1 − p to the event that Nicolas does not know the value of the company, and instead ascribes probability 1/3 to each possible value.
9.43 Prove that in each game with incomplete information with a finite set of players, where the set of types of each player is a countable set, and the set of possible actions of each type is finite, there exists a Bayesian equilibrium (in behavior strategies). Guidance: Suppose that the set of types of player i, Ti , is the set of natural numbers N. Denote T_i^k := {1, 2, . . . , k} and T^k = ×i∈N T_i^k . Let p^k be the probability distribution p conditioned on the set T^k :
$$p^k(t) = \begin{cases} \dfrac{p(t)}{p(T^k)} & t \in T^k, \\ 0 & t \notin T^k. \end{cases} \tag{9.125}$$
Prove that for a sufficiently large k, the denominator p(T^k ) is positive and therefore the probability distribution p^k is well defined. Show that for each k, the game in which the probability distribution over the types is p^k has an equilibrium, and any accumulation point of such equilibria, as k goes to infinity, is an equilibrium of the original game.
9.44 Prove Theorem 9.51 on page 354: a strategy vector σ∗ = (σi∗ )i∈N is a Bayesian equilibrium in a game Γ with incomplete information if and only if the strategy vector (σi∗ (ti ))i∈N,ti ∈Ti is a Nash equilibrium in the agent-form game Γ̂. (For the definition of an agent-form game, see Definition 9.50 on page 354.)
9.45 This exercise shows that in a game with incomplete information, the payoff function of an inactive type has no effect on the set of equilibria. Let Γ = (N, (Ti )i∈N , p, S, (st )t∈×i∈N Ti ), where st = (N, (Ai (ti ), ui (t))i∈N ) for each t ∈ ×i∈N Ti , be a game with incomplete information in which there exists a player j and a type tj∗ of player j such that |Aj (tj∗ )| = 1. Let Γ̂ be a game with incomplete information that is identical to Γ, except that the payoff function ûj (tj∗ ) of player j of type tj∗ may be different from uj (tj∗ ), that is, ûi (t; a) = ui (t; a) if tj ≠ tj∗ or i ≠ j . Show that the two games Γ and Γ̂ have the same set of Bayesian equilibria.
9.46 Electronic Mail game Let L > M > 0 be two positive real numbers. Two players play a game in which the payoff function is one of the following two, depending on the value of the state of nature s, which may be 1 or 2:
The state game for s = 1:
                   Player II
                   A         B
Player I    A      M, M      1, –L
            B      –L, 0     0, 0

The state game for s = 2:
                   Player II
                   A         B
Player I    A      0, 0      0, –L
            B      –L, 1     M, M
The probability that the state of nature is s = 2 is p < 1/2. Player I knows the true state of nature, and Player II does not know it. The players would clearly prefer to coordinate their actions and play (A, A) if the state of nature is s = 1 and (B, B) if the state is s = 2, which requires that both of them know what the true state is. Suppose the players are on opposite sides of the globe, and the sole method of communication available to them is e-mail. Due to possible technical communications disruptions, there is a probability of ε > 0 that any e-mail message will fail to arrive at its destination. In order to transfer information regarding the state of nature from Player I to Player II, the two players have constructed an automated system that sends e-mail from Player I to Player II if the state of nature is s = 2, and does not send any e-mail if the state is s = 1. To ensure that Player I knows that Player II received the message, the system also sends an automated confirmation of receipt of the message (by e-mail, of course) from Player II to Player I the instant Player I’s message arrives at Player II’s e-mail inbox. To ensure that Player II knows that Player I received the confirmation message, the system also sends an automated confirmation of receipt of the confirmation message from Player I to Player II the instant Player II’s confirmation arrives at Player I’s e-mail inbox. The system then proceeds to send an automated confirmation of the receipt of the confirmation of the receipt of the confirmation, and so forth. If any of these e-mail messages fail to arrive at their destinations, the automated system stops sending new messages. After communication between the players is completed, each player is called upon to choose an action, A or B. Answer the following questions:
(a) Depict the situation as a game with incomplete information, in which each type of each player is indexed by the number of e-mail messages he has received.
(b) Prove that the unique Bayesian equilibrium where Player I plays A when s = 1 is for both players to play A under all conditions. (c) How would you play if you received 100 e-mail confirmation messages? Explain your answer. 9.47 In the example described in Section 9.5 (page 361), for each ε ∈ [0, 1] find Bayesian in threshold strategies, where α has uniform distribution over the interval equilibria 1 2 1 1 , , and β has uniform distribution over the interval − . 4 2 3 3
9.48 In each of the two strategic-form games whose matrices appear below, find all the equilibria. For each equilibrium, describe a sequence of games with incomplete
information in which the amplitude of the noise converges to 0, and find Bayesian equilibria in pure strategies in each of these games, such that when the amplitude of the noise converges to 0, the probability that each of the players will choose a particular action converges to the corresponding probability in the equilibrium of the original game (see Section 9.5 on page 361). Player II L R
Player II L R 1, 5
T Player I
B
4, 1
2, 1
0, 3
Player I
T
3, 4
2, 2
B
1, 1
2, 1
Game B
Game A
9.49 Consider a Harsanyi game with incomplete information in which N = {I, II}, TI = {I1 , I2 }, and TII = {II1 , II2 }. The mutual beliefs of the types in this game in the interim stage, before actions are chosen, are

Player I’s beliefs:
          II1      II2
I1        1/4      3/4
I2        2/3      1/3

Player II’s beliefs:
          II1      II2
I1        3/11     9/13
I2        8/11     4/13
and the state games are given by

The state game for t = (I1, II1):
                   Player II
                   L      R
Player I    T      1      0
            B      0      0

The state game for t = (I1, II2):
                   Player II
                   L      R
Player I    T      0      1
            B      0      0

The state game for t = (I2, II1):
                   Player II
                   L      R
Player I    T      0      0
            B      1      0

The state game for t = (I2, II2):
                   Player II
                   L      R
Player I    T      0      0
            B      0      1
Are the beliefs of the players consistent? In other words, can they be derived from common prior beliefs? If you answer no, justify your answer. If you answer yes, find the common prior, and find a Bayesian equilibrium in the game. 9.50 Repeat Exercise 9.49, with the following mutual beliefs:
Player I’s beliefs:
          II1      II2
I1        1/3      2/3
I2        3/4      1/4

Player II’s beliefs:
          II1      II2
I1        3/5      1/6
I2        2/5      5/6
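Whether interim beliefs such as those in Exercises 9.49 and 9.50 can be derived from a common prior can be tested mechanically when every belief is strictly positive. The sketch below, for two players, builds the only possible candidate prior from Player I's beliefs and the first column of Player II's beliefs, and then checks it against all of Player II's beliefs. The function name `common_prior`, the table layout, and the example data (which come from a made-up prior, not from the exercises) are assumptions for illustration.

```python
from fractions import Fraction as F

def common_prior(bel_I, bel_II):
    """bel_I[k][l]: probability that type I_k ascribes to type II_l.
    bel_II[l][k]: probability that type II_l ascribes to type I_k.
    All entries are assumed strictly positive.
    Returns the common prior p[k][l], or None if the beliefs are inconsistent."""
    m, n = len(bel_I), len(bel_II)
    # Relative weights of Player I's types, forced by the beliefs of type II_0.
    w = [(bel_II[0][k] / bel_II[0][0]) * (bel_I[0][0] / bel_I[k][0]) for k in range(m)]
    raw = [[w[k] * bel_I[k][l] for l in range(n)] for k in range(m)]
    total = sum(sum(row) for row in raw)
    p = [[x / total for x in row] for row in raw]
    # Player I's beliefs hold by construction; verify Player II's.
    for l in range(n):
        column_mass = sum(p[k][l] for k in range(m))
        for k in range(m):
            if p[k][l] / column_mass != bel_II[l][k]:
                return None
    return p

# Hypothetical beliefs derived from the prior [[1/10, 2/10], [3/10, 4/10]].
example_I = [[F(1, 3), F(2, 3)], [F(3, 7), F(4, 7)]]
example_II = [[F(1, 4), F(3, 4)], [F(1, 3), F(2, 3)]]
recovered = common_prior(example_I, example_II)
```

With strictly positive beliefs a common prior, if it exists, is unique, which is why checking the single candidate suffices in this sketch.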
9.51 Two or three players are about to play a game: with probability 1/2 the game involves Players 1 and 2 and with probability 1/2 the game involves Players 1, 2, and 3. Players 2 and 3 know which game is being played. In contrast, Player 1, who participates in the game under all conditions, does not know whether he is playing against Player 2 alone, or against both Players 2 and 3. If the game involves Players 1 and 2 the game is given by the following matrix, where Player 1 chooses the row, and Player 2 chooses the column:

          L        R
T         0, 0     2, 1
B         2, 1     0, 0

with Player 3 receiving no payoff. If the game involves all three players, the game is given by the following two matrices, where Player 1 chooses the row, Player 2 chooses the column, and Player 3 chooses the matrix:

W:
          L           R
T         1, 2, 4     0, 0, 0
B         0, 0, 0     2, 1, 3

E:
          L           R
T         2, 1, 3     0, 0, 0
B         0, 0, 0     1, 2, 4

(a) What are the states of nature in this game?
(b) How many pure strategies does each player have in this game?
(c) Depict this game as a game with incomplete information.
(d) Describe the game in extensive form.
(e) Find two Bayesian equilibria in pure strategies.
(f) Find an additional Bayesian equilibrium by identifying a strategy vector in which all the players of all types are indifferent between their two possible actions.
9.52 This exercise generalizes Theorems 9.47 (page 354) and 9.53 (page 355) to the case where the prior distributions of the players differ. Let (N, (Ti )i∈N , (pi )i∈N , S, (st )t∈×i∈N Ti ) be a game with incomplete information where each player has a different prior distribution: for each i ∈ N, player i’s prior distribution is pi . For each strategy vector σ , define the payoff function Ui as
$$U_i(\sigma) := \sum_{t \in T} p_i(t)\, U_i(t; \sigma), \tag{9.126}$$
and the payoff of player i of type ti by
$$U_i(\sigma \mid t_i) := \sum_{t_{-i} \in T_{-i}} p_i(t_{-i} \mid t_i)\, U_i((t_i, t_{-i}); \sigma). \tag{9.127}$$
A strategy vector σ∗ is a Nash equilibrium if for every player i ∈ N and every strategy σi of player i,
$$U_i(\sigma^*) \ge U_i(\sigma_i, \sigma^*_{-i}), \tag{9.128}$$
and it is a Bayesian equilibrium if for every player i ∈ N, every type ti ∈ Ti , and every strategy σi of player i,
$$U_i(\sigma^* \mid t_i) \ge U_i(\sigma_i, \sigma^*_{-i} \mid t_i). \tag{9.129}$$
(a) Prove that a Nash equilibrium exists when the number of players is finite and each player has finitely many types and actions.
(b) Prove that if each player assigns positive probability to every type of every player, i.e., if pi (tj ) := Σ_{t−j ∈T−j} pi (tj , t−j ) > 0 for every i, j ∈ N and every tj ∈ Tj , then every Nash equilibrium is a Bayesian equilibrium, and every Bayesian equilibrium is a Nash equilibrium.
9.53 In this exercise, we explore the connection between correlated equilibrium (see Chapter 8) and games with incomplete information.
(a) Let Γ = (N, (Ti )i∈N , p, S, (st )t∈×i∈N Ti ) be a game with incomplete information, where the set of states of nature S contains only one state, which is a game in strategic form G = (N, (Ai )i∈N , (ui )i∈N ); that is, st = G for every t ∈ ×i∈N Ti . The game G is called “the base game” of Γ. Denote the set of action vectors by A = ×i∈N Ai . Every strategy vector σ in Γ naturally induces a distribution μσ over the vectors in A:
$$\mu_\sigma(a) = \sum_{t \in \times_{i \in N} T_i} p(t) \times \sigma_1(t_1; a_1) \times \sigma_2(t_2; a_2) \times \cdots \times \sigma_n(t_n; a_n). \tag{9.130}$$
Prove that if a strategy vector σ∗ is a Bayesian equilibrium of Γ, then the distribution μσ∗ defined in Equation (9.130) is a correlated equilibrium in the base game G.
(b) Prove that for every strategic-form game G = (N, (Ai )i∈N , (ui )i∈N ), and every correlated equilibrium μ in this game there exists a game with incomplete information Γ = (N, (Ti )i∈N , p, S, (st )t∈×i∈N Ti ) in which the set of states of nature S contains only one state, and that state corresponds to the base game, st = G for every t ∈ ×i∈N Ti , and there exists a Bayesian equilibrium σ∗ in the game Γ, such that μ(a) = Σ_{t ∈ ×i∈N Ti} p(t) × σ1∗ (t1 ; a1 ) × σ2∗ (t2 ; a2 ) × · · · × σn∗ (tn ; an ) for every a ∈ A.
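The distribution that a strategy vector induces over action vectors, as in Equation (9.130), is a finite sum and can be computed directly. The sketch below is a minimal illustration under assumed data structures: the prior is a dictionary from type vectors to probabilities, and `strategies[i][t_i][a_i]` is the probability that type t_i of player i plays a_i. All names and the two-player example are hypothetical.

```python
from fractions import Fraction
from itertools import product

def induced_distribution(prior, strategies, action_sets):
    """Distribution over action vectors induced by a behavior strategy
    vector: mu(a) = sum over t of p(t) * prod_i sigma_i(t_i; a_i)."""
    mu = {a: Fraction(0) for a in product(*action_sets)}
    for t, p_t in prior.items():
        for a in mu:
            weight = Fraction(p_t)
            for i, (t_i, a_i) in enumerate(zip(t, a)):
                # Actions not listed for a type are played with probability 0.
                weight *= strategies[i][t_i].get(a_i, 0)
            mu[a] += weight
    return mu

# Hypothetical example: two perfectly correlated types per player,
# each type playing a pure action.
prior = {("x", "x"): Fraction(1, 2), ("y", "y"): Fraction(1, 2)}
strategies = [
    {"x": {"T": 1}, "y": {"B": 1}},   # player 1
    {"x": {"L": 1}, "y": {"R": 1}},   # player 2
]
mu = induced_distribution(prior, strategies, [["T", "B"], ["L", "R"]])
```

In this example the types act as a private signal to each player, so the induced distribution places weight only on (T, L) and (B, R); whether it is a correlated equilibrium depends on the payoffs of the base game, which the sketch does not model.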
9.54 Carolyn and Maurice are playing the game “Chicken” (see Example 8.3 on page 303). Both Carolyn and Maurice know that Maurice knows who won the Wimbledon tennis tournament yesterday (out of three possible tennis players, Jim, John, and Arthur, who each had a probability of one-third of winning the tournament), but Carolyn does not know who won the tournament.
(a) Describe this situation as a game with incomplete information, and find the set of Bayesian equilibria of this game. (b) Answer the first two questions of this exercise, under the assumption that both Carolyn and Maurice only know whether or not Jim has won the tournament. (c) Answer the first two questions of this exercise, under the assumption that Maurice only knows whether or not Jim has won the tournament, while Carolyn only knows whether or not John has won the tournament.
10
Games with incomplete information: the general model
Chapter summary In this chapter we extend Aumann’s model of incomplete information with beliefs in two ways. First, we do not assume that the set of states of the world is finite, and allow it to be any measurable set. Second, we do not assume that the players share a common prior, but rather that the players’ beliefs at the interim stage are part of the data of the game. These extensions lead to the concept of a belief space. We also define the concept of a minimal belief subspace of a player, which represents the model that the player “constructs in his mind” when facing the situation with incomplete information. The notion of games with incomplete information is extended to this setup, along with the concept of Bayesian equilibrium. We finally discuss in detail the concept of consistent beliefs, which are beliefs derived from a common prior and thus lead to an Aumann or Harsanyi model of incomplete information.
Chapter 9 focused on the Aumann model of incomplete information, and on Harsanyi games with incomplete information. In both of those models, players share a common prior distribution, either over the set of states of the world or over the set of type vectors. As noted in that chapter, there is no compelling reason to assume that such a common prior exists. In this chapter, we will expand the Aumann model of incomplete information to deal with the case where players may have heterogeneous priors, instead of a common prior. The equilibrium concept we presented for analyzing Harsanyi games with incomplete information and a common prior was the Nash equilibrium. This is an equilibrium in a game that begins with a chance move that chooses the type vector. As shown in Chapter 9, every Nash equilibrium in a Harsanyi game is a Bayesian equilibrium, and conversely every Bayesian equilibrium is a Nash equilibrium. When there is no common prior, we cannot postulate a chance move choosing a type vector; hence the concept of Nash equilibrium is not applicable in this case. However, as we will show, the concept of Bayesian equilibrium is still applicable. We will study the properties of this concept in Section 10.5 (page 407).
10.1 Belief spaces
Recall that an Aumann model of incomplete information is given by a set of players N, a finite set Y of states of the world, a partition Fi of Y for each player i ∈ N, a set of states of nature S, a function s : Y → S mapping each state of the world to a state of nature,
and a common prior P over Y . The next definition extends this model to the case in which there is no common prior. Definition 10.1 Let N be a finite set of players, and let (S, 𝒮) be a measurable space of states of nature.¹ A belief space of the set of players N over the set of states of nature is an ordered vector Π = (Y, 𝒴, s, (πi )i∈N ), where:
• (Y, 𝒴) is a measurable space of states of the world.
• s : Y → S is a measurable function,² mapping each state of the world to a state of nature.
• For each player i ∈ N, πi : Y → Δ(Y ) is a function mapping each state of the world ω ∈ Y to a probability distribution over Y . We will denote the probability that player i ascribes to event E, according to the probability distribution πi (ω), by πi (E | ω).
We require the functions (πi )i∈N to satisfy the following conditions:
1. Coherency: for each player i ∈ N and each ω ∈ Y , the set {ω′ ∈ Y : πi (ω′ ) = πi (ω)} is measurable in Y , and
πi ({ω′ ∈ Y : πi (ω′ ) = πi (ω)} | ω) = 1. (10.1)
2. Measurability: for each player i ∈ N and each measurable set E ∈ 𝒴, the function πi (E | ·) : Y → [0, 1] is a measurable function.
As in the Aumann model of incomplete information, belief spaces describe situations in which there is a true state of the world ω∗ but the players may not know which state is the true state. At the true state of the world ω∗ each player i ∈ N believes that the true state of the world is distributed according to the probability distribution πi (ω∗ ). This probability distribution is called player i’s belief at the state of the world ω∗ . We assume that each player knows his own belief and therefore if at the state of the world ω∗ player i believes that the state of the world might be ω, then his beliefs at ω∗ and ω must coincide. Indeed, if his beliefs at ω differed from his beliefs at ω∗ , then he would be able to distinguish between these states, and therefore at ω∗ he could not ascribe a positive probability to the state of the world ω. It follows that at the state of the world ω∗ player i ascribes probability 1 to the set of states of the world at which his beliefs equal his belief at ω∗ , and therefore the support of πi (ω) is contained in the set {ω′ ∈ Y : πi (ω′ ) = πi (ω)}, for each state of the world ω ∈ Y . This is the reason we demand coherency in Definition 10.1. The measurability condition is a technical condition that is required for computing the expected payment in games with incomplete information that are modeled using belief spaces.
The concept “belief space” generalizes the concept “Aumann model of incomplete information” that was presented in Definition 9.27 (page 334). Every Aumann model of incomplete information is a belief space. To see this, let (N, Y, (Fi )i∈N , s, P) be an Aumann model of incomplete information. Let 𝒴 = 2^Y be the collection of all subsets
•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
1 A measurable space is a pair (X, 𝒳), where X is a set and 𝒳 is a σ-algebra over X; i.e., 𝒳 is a collection of subsets of X that includes the empty set, is closed under complementation, and is closed under countable intersections. A set in 𝒳 is called a measurable set. This definition was mentioned on page 344.
2 A function f : X → Y is measurable if the inverse image under f of every measurable set in Y is a measurable set in X. In other words, for each measurable set C in Y, the set f⁻¹(C) := {x ∈ X : f(x) ∈ C} is measurable in X.
Games with incomplete information: the general model
of Y. For each player i ∈ N and every ω ∈ Y, let πi(ω) = P(· | Fi(ω)); i.e., player i’s belief at state ω is the common prior P, conditioned on his information. It follows that (Y, 𝒴, s, (πi)i∈N) is a belief space equivalent to the original Aumann model: for every event A ⊆ Y, the probability that player i ascribes at every state of the world ω to event A is equal in both models (verify!). Since every Harsanyi model of incomplete information is equivalent to an Aumann model of incomplete information (see page 350), every Harsanyi model of incomplete information can be represented by a belief space.

Belief spaces generalize Aumann models of incomplete information with belief in the following ways:

1. The set of states of the world in an Aumann model of incomplete information is finite, while the set of states of the world in a belief space may be any measurable space.
2. The beliefs (πi)i∈N in a belief space are not necessarily derived from a prior P common to all the players.

In most of the examples in this chapter, the set of states of the world Y is finite. In those examples, we assume that 𝒴 = 2^Y: the σ-algebra over Y is the collection of all the subsets of Y.

Example 10.2 Let the set of players be N = {I, II}, and let the set of states of nature be S = {s1, s2}. Consider a belief space Y = {ω1, ω2, ω3}, where:

| State of the world | s(·) | πI(·) | πII(·) |
|---|---|---|---|
| ω1 | s1 | [2/3(ω1), 1/3(ω2)] | [1(ω1)] |
| ω2 | s1 | [2/3(ω1), 1/3(ω2)] | [1/2(ω2), 1/2(ω3)] |
| ω3 | s2 | [1(ω3)] | [1/2(ω2), 1/2(ω3)] |
The states of the world appear in the left-hand column of the table, the next column displays the state of nature associated with each state of the world, and the two right-hand columns display the beliefs of the players at each state of the world. At the state of the world ω1, Player II ascribes probability 1 to the state of nature being s1, while at the states ω2 and ω3 he ascribes probability 1/2 to each of the two states of nature. At each state of the world, Player I ascribes probability 1 to the true state of nature. As for the beliefs of Player I about the beliefs of Player II about the state of nature, at the state of the world ω3 he ascribes probability 1 to Player II ascribing equal probabilities to the two states of nature, while at the states of the world ω1 and ω2 he ascribes probability 2/3 to Player II ascribing probability 1 to the true state of nature, and probability 1/3 to Player II ascribing probability 1/2 to the true state of nature. The beliefs of the players can be calculated from the following common prior P:

$$P(\omega_1) = \tfrac{1}{2}, \quad P(\omega_2) = \tfrac{1}{4}, \quad P(\omega_3) = \tfrac{1}{4}, \tag{10.2}$$

and the following partitions of the two players (verify that this is true):

$$\mathcal{F}_{\mathrm{I}} = \{\{\omega_1, \omega_2\}, \{\omega_3\}\}, \quad \mathcal{F}_{\mathrm{II}} = \{\{\omega_1\}, \{\omega_2, \omega_3\}\}. \tag{10.3}$$

It follows that the belief space of this example is equivalent to an Aumann model of incomplete information. ◭
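The "verify" step in Example 10.2 can be carried out mechanically. The following is a minimal sketch, not part of the original text: the names `prior`, `partitions`, and `posterior` are illustrative assumptions, and the states ω1, ω2, ω3 are written as the strings "w1", "w2", "w3". It conditions the common prior of Equation (10.2) on each partition element of Equation (10.3) and reproduces the beliefs in the table.

```python
from fractions import Fraction as F

# Common prior (10.2) and partitions (10.3) of Example 10.2.
prior = {"w1": F(1, 2), "w2": F(1, 4), "w3": F(1, 4)}
partitions = {
    "I": [{"w1", "w2"}, {"w3"}],
    "II": [{"w1"}, {"w2", "w3"}],
}


def posterior(prior, partition, state):
    """Condition the prior on the partition element that contains `state`."""
    block = next(b for b in partition if state in b)
    mass = sum(prior[w] for w in block)
    return {w: prior[w] / mass for w in sorted(block)}


# beliefs[player][state] is the player's belief at that state of the world.
beliefs = {
    player: {w: posterior(prior, partition, w) for w in prior}
    for player, partition in partitions.items()
}

for player, table in beliefs.items():
    for w, dist in table.items():
        print(player, w, {k: str(v) for k, v in dist.items()})
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the comparison with the fractions in the table is not affected by rounding.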
10.1 Belief spaces
As the next example shows, however, it is not true that every belief space is equivalent to an Aumann model of incomplete information; in other words, there are cases in which the beliefs of the players (πi)i∈N cannot be calculated as conditional probabilities of a common prior.

Example 10.3 Let the set of players be N = {I, II}, and let the set of states of nature be S = {s1, s2}. Consider a belief space Y = {ω1, ω2} where:

| State of the world | s(·) | πI(·) | πII(·) |
|---|---|---|---|
| ω1 | s1 | [2/3(ω1), 1/3(ω2)] | [1/2(ω1), 1/2(ω2)] |
| ω2 | s2 | [2/3(ω1), 1/3(ω2)] | [1/2(ω1), 1/2(ω2)] |

In this space, at every state of the world Player I ascribes probability 2/3 to the state of nature being s1, while Player II ascribes probability 1/2 to the state of nature being s1. There is no common prior over Y that enables both of these statements to be true (verify that this is true). Since each player has the same belief at both states of the world, if there is an Aumann model of incomplete information describing this situation, the partition of each player must be the trivial partition: Fi = {Y} for all i ∈ N. With trivial partitions, each player’s belief would coincide with the common prior itself, which is impossible because the beliefs of the two players differ. ◭
As the next example shows, it is possible for the support of πi(ω) to be contained in the set {ω′ ∈ Y : πi(ω′) = πi(ω)}, but not equal to it.

Example 10.4 Let the set of players be N = {I, II}, and let the set of states of nature be S = {s1, s2}. Consider a belief space Y = {ω1, ω2, ω3}, where:

| State of the world | s(·) | πI(·) | πII(·) |
|---|---|---|---|
| ω1 | s1 | [1(ω1)] | [1/2(ω1), 1/2(ω2)] |
| ω2 | s2 | [1(ω1)] | [1/2(ω1), 1/2(ω2)] |
| ω3 | s2 | [1(ω3)] | [1/2(ω1), 1/2(ω2)] |

At both states of the world ω1 and ω2, Player I believes that the true state is ω1: the support of πI(ω1) is the set {ω1}, which is a proper subset of {ω′ ∈ Y : πI(ω′) = πI(ω1)} = {ω1, ω2}. Note that at the state of the world ω2, the state of nature is s2, but Player I believes that the state of nature is s1. ◭
The belief spaces described in Examples 10.3 and 10.4 are not equivalent to Aumann models of incomplete information, but they can be described as Aumann models in which every player has a prior distribution of his own. In Example 10.3, in both states of the world, Player I has a prior distribution [2/3(ω1), 1/3(ω2)], and Player II has a prior distribution [1/2(ω1), 1/2(ω2)]. The beliefs of the players in the belief space of Example 10.4 can also be computed as being derived from prior distributions in the following way (verify!). The beliefs of Player II can be derived from the prior

$$P_{\mathrm{II}}(\omega_1) = \tfrac{1}{2}, \quad P_{\mathrm{II}}(\omega_2) = \tfrac{1}{2}, \quad P_{\mathrm{II}}(\omega_3) = 0 \tag{10.4}$$

and the partition

$$\mathcal{F}_{\mathrm{II}} = \{Y\}. \tag{10.5}$$

The beliefs of Player I can be derived from any prior of the form

$$P_{\mathrm{I}}(\omega_1) = x, \quad P_{\mathrm{I}}(\omega_2) = 0, \quad P_{\mathrm{I}}(\omega_3) = 1 - x, \tag{10.6}$$

where x ∈ (0, 1), and the partition

$$\mathcal{F}_{\mathrm{I}} = \{\{\omega_1, \omega_2\}, \{\omega_3\}\}. \tag{10.7}$$
This is not coincidental: every belief space with a finite set of states of the world is an Aumann model of incomplete information in which every player has a prior distribution whose support is not necessarily all of Y, and the priors of the players may be heterogeneous.³ To see this, for the case that Y is finite define, for each player i, a partition Fi of Y based on his beliefs:

$$\mathcal{F}_i(\omega) = \{\omega' \in Y : \pi_i(\omega') = \pi_i(\omega)\}. \tag{10.8}$$

For each ω, the partition element Fi(ω) is the set of all states of the world at which the beliefs of player i equal his beliefs at ω: player i’s beliefs do not distinguish between the states of the world in Fi(ω). Define, for each player i ∈ N, a probability distribution Pi ∈ Δ(Y) as follows (verify that this is indeed a probability distribution):

$$P_i(A) = \frac{1}{|Y|} \sum_{\omega \in Y} \pi_i(A \mid \omega). \tag{10.9}$$

Then the belief πi(ω) of player i at the state of the world ω is the probability distribution Pi, conditioned on Fi(ω), which is his information at that state of the world (see Exercise 10.3):

$$\pi_i(A \mid \omega) = P_i(A \mid \mathcal{F}_i(\omega)), \quad \forall \omega \in Y, \ \forall A \in \mathcal{Y}. \tag{10.10}$$

It follows that every belief space Π = (Y, 𝒴, s, (πi)i∈N), where Y is a finite set, is equivalent to an Aumann model of incomplete information (N, Y, (Fi)i∈N, s, (Pi)i∈N) in which every player has a prior of his own.

Example 10.4 (Continued) Using Equation (10.9), we have

$$P_{\mathrm{I}} = \left[\tfrac{2}{3}(\omega_1), 0(\omega_2), \tfrac{1}{3}(\omega_3)\right], \quad P_{\mathrm{II}} = \left[\tfrac{1}{2}(\omega_1), \tfrac{1}{2}(\omega_2), 0(\omega_3)\right]. \tag{10.11}$$

In fact, the definition of Pi in Equation (10.9) can be replaced with any weighted average of the beliefs (πi(· | ω))ω∈Y, where all the weights are positive. The probability distribution of Equation (10.6) corresponds to the weights (y, x − y, 1 − x), where y ∈ (0, x) (verify!). ◭
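The construction in Equations (10.8) to (10.10) can be checked numerically on Example 10.4. The sketch below is an illustration only; the dictionary layout and the helper names (`full`, `partition_element`, `averaged_prior`, `conditional`) are assumptions made here, and the states are written as "w1", "w2", "w3". It builds each player's partition from his beliefs, averages the beliefs uniformly to obtain Pi, and confirms that conditioning Pi on Fi(ω) returns πi(ω).

```python
from fractions import Fraction as F

STATES = ["w1", "w2", "w3"]

# Beliefs of Example 10.4: belief[player][state] is a distribution over STATES
# (states missing from a dictionary have probability 0).
belief = {
    "I": {"w1": {"w1": F(1)}, "w2": {"w1": F(1)}, "w3": {"w3": F(1)}},
    "II": {w: {"w1": F(1, 2), "w2": F(1, 2)} for w in STATES},
}


def full(dist):
    """Write a sparse distribution as a tuple indexed like STATES."""
    return tuple(dist.get(w, F(0)) for w in STATES)


def partition_element(player, state):
    """F_i(state) of Equation (10.8): states with the same belief."""
    target = full(belief[player][state])
    return [w for w in STATES if full(belief[player][w]) == target]


def averaged_prior(player):
    """P_i of Equation (10.9): the uniform average of the player's beliefs."""
    n = len(STATES)
    return {
        w: sum(belief[player][v].get(w, F(0)) for v in STATES) / n
        for w in STATES
    }


def conditional(prior, block):
    """The prior conditioned on the event `block`, as a tuple over STATES."""
    mass = sum(prior[w] for w in block)
    return tuple(prior[w] / mass if w in block else F(0) for w in STATES)


priors = {player: averaged_prior(player) for player in belief}

# Equation (10.10): conditioning P_i on F_i(w) recovers the belief at w.
consistent = all(
    conditional(priors[p], partition_element(p, w)) == full(belief[p][w])
    for p in belief
    for w in STATES
)
print(priors["I"], priors["II"], consistent)
```

The resulting priors agree with Equation (10.11). Replacing the uniform weights 1/|Y| with any other strictly positive weights would be a small change to `averaged_prior`.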
••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
3 When the space of states of the world is infinite, additional technical assumptions are needed to ensure the existence of a prior distribution from which each player’s beliefs can be derived.
Just as in an Aumann model of incomplete information, we can trace all levels of beliefs for each player at any state of the world in a belief space. For example, consider Example 10.4 and write out Player I’s beliefs at the state of the world ω3. At that state, Player I ascribes probability 1 to the state of nature being s2; this is his first-order belief. He ascribes probability 1 to the state of nature being s2 and to Player II ascribing equal probability to the two possible states of nature; this is his second-order belief. Player I’s third-order belief at the state of the world ω3 is as follows: Player I ascribes probability 1 to the state of nature being s2, to Player II ascribing equal probability to the two states of nature, and to Player II believing that Player I ascribes probability 1 to the state of nature s1. We can similarly describe the beliefs of every player, at any order, at every state of the world.
10.2 Belief and knowledge
One of the main elements of the Aumann model of incomplete information is the partitions (Fi )i∈N defining the players’ knowledge operators. In an Aumann model, the players’ beliefs are derived from a common prior, given the information that the player has (i.e., the partition element Fi (ω)). In contrast, in a belief space, a player’s beliefs are given by the model itself. Since an Aumann model of incomplete information is a special case of a belief space, it is natural to ask whether a knowledge operator can be defined generally, in all belief spaces. As we saw in Equation (10.8), the beliefs (πi )i∈N of the players define partitions (Fi )i∈N of Y . A knowledge operator can then be defined using these partitions. When player i knows what the belief space is, he can indeed compute his partitions (Fi )i∈N and the knowledge operators corresponding to these partitions. As the next example shows, knowledge based on these knowledge operators is not equivalent to belief with probability 1. Example 10.5
Consider the belief space of a single player N = {I} over a set of states of nature S = {s1, s2}, shown in Figure 10.1.

| State of the world | s(·) | πI(·) |
|---|---|---|
| ω1 | s1 | [1(ω1)] |
| ω2 | s2 | [1(ω1)] |

Figure 10.1 The belief space in Example 10.5

In the belief space Π, the partition defined by Equation (10.8) contains a single element, and therefore the minimal knowledge element of Player I at every state of the world is {ω1, ω2}. In other words, at the state of the world ω1 Player I does not know that the state of the world is ω1. Thus, despite the fact that at the state of the world ω1 Player I ascribes probability 1 to the state of the world ω1, he does not know that this is the true state of the world. ◭
The assumption that a player knows the belief space is a strong assumption: at the state of the world ω1 in Example 10.5 the player ascribes probability 1 to the state of the world ω1. Perhaps he does not know that there is a state of the world ω2? We will assume that the only information that a player has is his belief, and that he does not know what the belief space is. In particular, different players may have different realities. In such a case, when a player does not know Π, he cannot compute the partitions (Fi)i∈N and the knowledge operators corresponding to these partitions, and therefore cannot compute the events that he knows.

Under these assumptions the natural operator to use in belief spaces is a belief operator and not a knowledge operator. Under a knowledge operator, if a player knows a certain fact it must be true. This requirement may not be satisfied by belief operators; a player may ascribe probability 1 to a ‘fact’ that is actually false. After we define this operator and study its properties we will relate it to the knowledge operator in Aumann models of incomplete information.

Definition 10.6 At the state of the world ω ∈ Y, player i ∈ N believes that an event A obtains if πi(A | ω) = 1. Denote

$$B_i A := \{\omega \in Y : \pi_i(A \mid \omega) = 1\}. \tag{10.12}$$
At the state of the world ω player i believes that event A obtains if he ascribes probability 1 to A. The event Bi A is the set of all states of the world at which player i believes event A obtains. The belief operator Bi satisfies four of the five properties of Kripke that a knowledge operator must satisfy (see page 327 and Exercise 10.8).

Theorem 10.7 For each player i ∈ N, the belief operator Bi satisfies the following four properties:

1. Bi Y = Y: At each state of the world, player i believes that Y is the set of states of the world.
2. Bi A ∩ Bi C = Bi(A ∩ C): If player i believes that event A obtains and he believes that event C obtains, then he believes that event A ∩ C obtains.
3. Bi(Bi A) = Bi A: If player i believes that event A obtains, then he believes that he believes that event A obtains.
4. (Bi A)^c = Bi((Bi A)^c): If player i does not believe that event A obtains, then he believes that he does not believe that event A obtains.

The knowledge operator Ki satisfies a fifth property: Ki A ⊆ A. This property is not necessarily satisfied by a belief operator: it is not always the case that Bi A ⊆ A. In other words, it is possible that ω ∈ Bi A but ω ∉ A. This means that a player may believe that the event A obtains despite the fact that the true state of the world is not in A; i.e., A does not obtain. This is the case in Example 10.4: for A = {ω1}, BI A = {ω1, ω2}: at the state of the world ω2 the player believes that A obtains, despite the fact that ω2 ∉ A.

The belief operator does satisfy the following additional property (Exercise 10.13). The analogous property for the knowledge operator is stated in Theorem 9.10 (page 326).

Theorem 10.8 For each player i ∈ N, and any pair of events A, C ⊆ Y, if A ⊆ C, then Bi A ⊆ Bi C.

In words, when a player believes that event A obtains, he also believes that every event containing A obtains.
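The belief operator of Definition 10.6 is easy to experiment with on a finite belief space. The following sketch is a brute-force illustration, not a proof: it encodes Player I's beliefs from Example 10.4 (with states written as "w1", "w2", "w3"; the function names `B` and `all_events` are assumptions made here), checks the four properties of Theorem 10.7 and the monotonicity of Theorem 10.8 over all eight events, and shows that Bi A ⊆ A fails for A = {ω1}.

```python
from fractions import Fraction as F
from itertools import combinations

STATES = ("w1", "w2", "w3")

# Player I's beliefs in Example 10.4: state -> distribution over states.
belief_I = {"w1": {"w1": F(1)}, "w2": {"w1": F(1)}, "w3": {"w3": F(1)}}


def B(belief, event):
    """Belief operator (10.12): states at which `event` has probability 1."""
    return frozenset(
        w for w in STATES
        if sum(p for v, p in belief[w].items() if v in event) == 1
    )


def all_events():
    """All subsets of the (finite) set of states of the world."""
    return [
        frozenset(c)
        for r in range(len(STATES) + 1)
        for c in combinations(STATES, r)
    ]


Y = frozenset(STATES)
events = all_events()

prop1 = B(belief_I, Y) == Y
prop2 = all(
    B(belief_I, A) & B(belief_I, C) == B(belief_I, A & C)
    for A in events for C in events
)
prop3 = all(B(belief_I, B(belief_I, A)) == B(belief_I, A) for A in events)
prop4 = all(
    Y - B(belief_I, A) == B(belief_I, Y - B(belief_I, A)) for A in events
)
monotone = all(
    B(belief_I, A) <= B(belief_I, C)
    for A in events for C in events if A <= C
)
print(prop1, prop2, prop3, prop4, monotone)  # True True True True True

A = frozenset({"w1"})
print(sorted(B(belief_I, A)))  # ['w1', 'w2']
print(B(belief_I, A) <= A)     # False: the fifth Kripke property fails
```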
Just as we defined the concept of common knowledge (Definition 9.2 on page 321), we can define the concept of common belief.

Definition 10.9 Let A ⊆ Y be an event and let ω ∈ Y. The event A is common belief among the players at the state of the world ω if at that state of the world every player believes that A obtains, every player believes that every player believes that A obtains, and so on. In other words, for every finite sequence i1, i2, . . . , il of players in N,

$$\omega \in B_{i_1} B_{i_2} \cdots B_{i_{l-1}} B_{i_l} A. \tag{10.13}$$
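For a finite belief space, Definition 10.9 can be evaluated by iteration. The sketch below is one possible way to do this, offered under stated assumptions rather than as the book's method. Write M(E) for the intersection of Bi E over all players i. By property 2 of Theorem 10.7, the intersection of Equation (10.13) over all sequences of length l equals M applied l times to A, and by property 3 together with Theorem 10.8 these sets shrink as l grows, so the iteration stabilizes after at most |Y| steps. The beliefs are those of Example 10.4; the names `believes`, `everyone_believes`, and `common_belief` are illustrative.

```python
from fractions import Fraction as F

STATES = frozenset({"w1", "w2", "w3"})

# Beliefs of both players in Example 10.4.
belief = {
    "I": {"w1": {"w1": F(1)}, "w2": {"w1": F(1)}, "w3": {"w3": F(1)}},
    "II": {w: {"w1": F(1, 2), "w2": F(1, 2)} for w in STATES},
}


def believes(player, event):
    """B_i(event): states at which the player gives `event` probability 1."""
    return frozenset(
        w for w in STATES
        if sum(p for v, p in belief[player][w].items() if v in event) == 1
    )


def everyone_believes(event):
    """M(event): the states at which every player believes `event`."""
    result = STATES
    for player in belief:
        result &= believes(player, event)
    return result


def common_belief(event):
    """States at which `event` is common belief (finite space, see lead-in)."""
    level = everyone_believes(frozenset(event))  # sequences of length 1
    for _ in range(len(STATES)):
        nxt = everyone_believes(level)  # prepend one more belief operator
        if nxt == level:
            break
        level = nxt
    return level


print(sorted(common_belief({"w1", "w2"})))  # ['w1', 'w2']
print(sorted(common_belief({"w1"})))        # []
print(common_belief(STATES) == STATES)      # True
```

In this example the event {ω1, ω2} is common belief at ω1 and at ω2 but not at ω3, while {ω1} is common belief nowhere, because Player II never ascribes probability 1 to it.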
It follows from Definition 10.9 and Theorem 10.7(1) that, in particular, the event Y is common belief among the players at every state of the world ω ∈ Y. The next theorem presents a sufficient condition for an event to be common belief among the players at a particular state of the world.

Theorem 10.10 Let ω ∈ Y be a state of the world. Let A ∈ 𝒴 be an event satisfying the following two conditions:

• πi(A | ω) = 1 for every player i ∈ N.
• πi(A | ω′) = 1 for every player i ∈ N and every ω′ ∈ A.

Then A is common belief among the players at ω.

Proof: The first condition implies that ω ∈ Bi A, and the second condition implies that A ⊆ Bi A, for each player i ∈ N. From this, and from repeated application of Theorem 10.8, we get for every finite sequence i1, i2, . . . , il of players:

$$\omega \in B_{i_1} A \subseteq B_{i_1} B_{i_2} A \subseteq \cdots \subseteq B_{i_1} B_{i_2} \cdots B_{i_{l-1}} A \subseteq B_{i_1} B_{i_2} \cdots B_{i_{l-1}} B_{i_l} A.$$

It follows that at the state of the world ω event A is common belief among the players.

When a belief space is equivalent to an Aumann model of incomplete information, the concept of knowledge is a meaningful one, and the question naturally arises as to whether there is a relation between knowledge and belief in this case. As we now show, the answer to this question is positive. Let Π = (Y, 𝒴, s, (πi)i∈N) (where Y is a finite set of states of the world) be a belief space that is equivalent to an Aumann model of incomplete information. In particular, there exists a probability distribution P over Y satisfying P(ω) > 0 for all ω ∈ Y, and there exist partitions (Fi)i∈N of Y such that

$$\pi_i(\omega) = P(\cdot \mid \mathcal{F}_i(\omega)), \quad \forall i \in N, \ \forall \omega \in Y. \tag{10.14}$$
The partition Fi in the Aumann model coincides with the partition defined by Equation (10.8) for the belief space Π (Exercise 10.14); hence the knowledge operator in the Aumann model is the same operator as the belief operator in the belief space. We therefore have the following theorem:

Theorem 10.11 Let Π be a belief space equivalent to an Aumann model of incomplete information. Then the belief operator in the belief space is the same operator as the knowledge operator in the Aumann model: For every i ∈ N, at the state of the world ω player i believes that event A obtains (in the belief space) if and only if he knows that event A obtains (in the Aumann model).
Note that for this result to obtain, it must be the case that P(ω) > 0 for every state of the world ω ∈ Y (Exercise 10.11). If Π = (Y, 𝒴, s, (πi)i∈N) is a belief space satisfying the condition that for player i ∈ N, Bi A ⊆ A for every event A ⊆ Y, then the operator Bi satisfies the five properties of Kripke, and is therefore a knowledge operator: there exists a partition Gi of Y such that Bi is the knowledge operator defined by Gi via Equation (9.2) on page 325 (Exercise 9.2, Chapter 9). Since this partition is simply the partition defined by Equation (10.8) (Exercise 10.9), the conclusion of Theorem 10.11 obtains in this case as well. We stress that this case is more general than the case in which a belief space is equivalent to an Aumann model of incomplete information, because this condition can be met in an Aumann model of incomplete information in which the players do not share a common prior (see Example 10.3). Nevertheless, in this case, the belief operator is also the same operator as the knowledge operator.
10.3 Examples of belief spaces
As stated above, the information that player i has at the state of the world ω is given by his belief πi(ω). We shall refer to this belief as the player’s type. A player’s type is thus a probability distribution over Y.

Definition 10.12 Let Π = (Y, 𝒴, s, (πi)i∈N) be a belief space. The type of player i at the state of the world ω is πi(ω). The set of all types of player i in a belief space is denoted by Ti and called the type set of player i:

$$T_i := \{\pi_i(\omega) : \omega \in Y\} \subseteq \Delta(Y). \tag{10.15}$$
The coherency requirement in Definition 10.1, and the definition of the belief operator Bi , together imply that at each state of the world ω, every player i ∈ N believes that his type is πi (ω): πi ({ω′ ∈ Y : πi (ω′ ) = πi (ω)} | ω) = 1.
(10.16)
We next present examples showing how situations of incomplete information can be modeled by belief spaces. We start with situations that can be modeled both by Aumann models of incomplete information, and by belief spaces.

Example 10.13 Complete information Suppose that a state of nature s0 ∈ S is common belief among the players in a finite set N = {1, 2, . . . , n}. The following belief space corresponds to this situation, where the set of states of the world, Y = {ω}, contains only one state, and all the players have the same beliefs:

| State of the world | s(·) | π1(·), · · · , πn(·) |
|---|---|---|
| ω | s0 | [1(ω)] |

In this space, every player i ∈ N has only one type, Ti = {[1(ω)]}. The beliefs of the players can be calculated from the common prior P defined by P(ω) = 1, hence this belief space is also an Aumann model of incomplete information, with the trivial partition Fi = {{ω}}, for every player i ∈ N. This situation can also be modeled by the following belief space, where Y = {ω1, ω2}:

| State of the world | s(·) | π1(·), · · · , πn(·) |
|---|---|---|
| ω1 | s0 | [1(ω1)] |
| ω2 | s0 | [1(ω2)] |

The two states of the world ω1 and ω2 are distinguished only by their names; they are identical from the perspective both of the state of nature associated with them and of the beliefs of the players about the state of nature: at both states of the world, the state of nature is s0, and at both states that fact is common belief. This is an instance of redundancy: two states of the world describe the same situation. If we eliminate the redundancy, we recover the former belief space. ◭

Example 10.14 Known lottery The set of players is N = {1, . . . , n}. The set of states of nature, S = {s1, s2},
contains two elements; a chance move chooses s1 with probability 1/3, and s2 with probability 2/3. This probability distribution is common belief among the players. The following belief space, where Y = {ω1, ω2}, corresponds to this situation:

| State of the world | s(·) | π1(·), · · · , πn(·) |
|---|---|---|
| ω1 | s1 | [1/3(ω1), 2/3(ω2)] |
| ω2 | s2 | [1/3(ω1), 2/3(ω2)] |

In this space, each player i ∈ N has only one type, Ti = {[1/3(ω1), 2/3(ω2)]}, and this fact is therefore common belief among the players. The beliefs of the players can be calculated from a common prior P defined by P(ω1) = 1/3 and P(ω2) = 2/3, and the partitions derived from the beliefs of the types, i.e., Fi = {{ω1, ω2}} for every player i ∈ N; hence this belief space is also an Aumann model of incomplete information. ◭

Example 10.15 Incomplete information on one side There are two players, N = {I, II}, and two states of nature, S = {s1, s2}; a chance move chooses s1 with probability p, and s2 with probability 1 − p. The state of the world that is chosen is known to Player I, but not to Player II. This description of the situation is common belief among the players. The following belief space, where Y = {ω1, ω2}, corresponds to this situation:

| State of the world | s(·) | πI(·) | πII(·) |
|---|---|---|---|
| ω1 | s1 | [1(ω1)] | [p(ω1), (1 − p)(ω2)] |
| ω2 | s2 | [1(ω2)] | [p(ω1), (1 − p)(ω2)] |

In this belief space, Player II has only one type, TII = {[p(ω1), (1 − p)(ω2)]}, while Player I has two possible types, TI = {[1(ω1)], [1(ω2)]}, because he knows the true state of nature. The beliefs of the players can be calculated from a common prior P given by P(ω1) = p and P(ω2) = 1 − p, and the partition FI = {{ω1}, {ω2}} (Player I knows which state of the world has been chosen) and FII = {Y} (Player II does not know which state of the world has been chosen). Note that in this example, the belief operator is the same as the knowledge operator (in accordance with Theorem 10.11). ◭
Example 10.16 Incomplete information about the information of the other player This example, which is similar to Example 10.2 (on page 388), describes a situation in which one of the players knows the true state of nature, but is uncertain whether the other player knows the true state of nature. Consider a situation with two players, N = {I, II}. A state of nature s1 or s2 is chosen by tossing a fair coin. Player I is informed which state of nature has been chosen. If the chosen state of nature is s2, only Player I is informed of that fact. If the chosen state of nature is s1, the coin is tossed again, in order to determine whether or not Player II is to be informed that the chosen state of nature is s1; Player I is not informed of the result of the second coin toss; hence in this situation, even though he knows the state of nature, he does not know whether or not Player II knows the state of nature. The belief space corresponding to this situation contains three states of the world Y = {ω1, ω2, ω3} and is given by:

| State of the world | s(·) | πI(·) | πII(·) |
|---|---|---|---|
| ω1 | s1 | [1/2(ω1), 1/2(ω2)] | [1(ω1)] |
| ω2 | s1 | [1/2(ω1), 1/2(ω2)] | [1/3(ω2), 2/3(ω3)] |
| ω3 | s2 | [1(ω3)] | [1/3(ω2), 2/3(ω3)] |

At the state of the world ω1, the state of nature is s1, and both Player I and Player II know this. At the state of the world ω2, the state of nature is s1, and Player I knows this, but Player II does not know this. The state of the world ω3 corresponds to the situation in which the state of nature is s2. Player I cannot distinguish between the states of the world ω1 and ω2. Player II cannot distinguish between the states of the world ω2 and ω3. The beliefs of the players can be derived from the probability distribution P,

$$P(\omega_1) = \tfrac{1}{4}, \quad P(\omega_2) = \tfrac{1}{4}, \quad P(\omega_3) = \tfrac{1}{2}, \tag{10.17}$$

given the partitions FI = {{ω1, ω2}, {ω3}} and FII = {{ω1}, {ω2, ω3}}. Notice that, as required, at every state of the world ω, every player ascribes probability 1 to the states of the world at which his beliefs coincide with his beliefs at ω. In this belief space, each player has two possible types:

$$T_{\mathrm{I}} = \left\{ \left[\tfrac{1}{2}(\omega_1), \tfrac{1}{2}(\omega_2)\right], \left[1(\omega_3)\right] \right\}, \tag{10.18}$$

$$T_{\mathrm{II}} = \left\{ \left[1(\omega_1)\right], \left[\tfrac{1}{3}(\omega_2), \tfrac{2}{3}(\omega_3)\right] \right\}. \tag{10.19}$$

What information does Player I of type [1/2(ω1), 1/2(ω2)] lack (at the states of the world ω1 and ω2)? He knows that the state of nature is s1, but he does not know whether Player II knows this: Player I ascribes probability 1/2 to the state of the world being ω1, at which Player II knows that the state of nature is s1, and he ascribes probability 1/2 to the state of the world being ω2, at which Player II does not know what the true state of nature is. Player I’s lack of information involves the information that Player II has. ◭

Example 10.17 Incomplete information on two sides (the independent case) There are two players, N = {I, II}, and four states of nature, S = {s11, s12, s21, s22}. One of the states of nature is chosen, using the probability distribution p defined by p(s11) = p(s12) = 1/6, p(s21) = p(s22) = 1/3.
Each player has partial information about the chosen state of nature: Player I knows only the first coordinate of the chosen state, while Player II knows only the second coordinate. The belief space corresponding to this situation contains four states of the world, Y = {ω11, ω12, ω21, ω22}, and is given by:

| State of the world | s(·) | πI(·) | πII(·) |
|---|---|---|---|
| ω11 | s11 | [1/2(ω11), 1/2(ω12)] | [1/3(ω11), 2/3(ω21)] |
| ω12 | s12 | [1/2(ω11), 1/2(ω12)] | [1/3(ω12), 2/3(ω22)] |
| ω21 | s21 | [1/2(ω21), 1/2(ω22)] | [1/3(ω11), 2/3(ω21)] |
| ω22 | s22 | [1/2(ω21), 1/2(ω22)] | [1/3(ω12), 2/3(ω22)] |

In this case, each player has two possible types:

$$T_{\mathrm{I}} = \{\mathrm{I}_1, \mathrm{I}_2\} = \left\{ \left[\tfrac{1}{2}(\omega_{11}), \tfrac{1}{2}(\omega_{12})\right], \left[\tfrac{1}{2}(\omega_{21}), \tfrac{1}{2}(\omega_{22})\right] \right\}, \tag{10.20}$$

$$T_{\mathrm{II}} = \{\mathrm{II}_1, \mathrm{II}_2\} = \left\{ \left[\tfrac{1}{3}(\omega_{11}), \tfrac{2}{3}(\omega_{21})\right], \left[\tfrac{1}{3}(\omega_{12}), \tfrac{2}{3}(\omega_{22})\right] \right\}. \tag{10.21}$$

Note that each player knows his own type: at each state of the world, each player ascribes positive probability only to those states of the world in which he has the same type. The beliefs of each player about the type of the other player are described in Figure 10.2.

The beliefs of Player I

| | II1 | II2 |
|---|---|---|
| I1 | 1/2 | 1/2 |
| I2 | 1/2 | 1/2 |

The beliefs of Player II

| | II1 | II2 |
|---|---|---|
| I1 | 1/3 | 1/3 |
| I2 | 2/3 | 2/3 |

Figure 10.2 The beliefs of each player about the type of the other player

The tables in Figure 10.2 state, for example, that Player I of type I2 ascribes probability 1/2 to Player II being of type II1, and probability 1/2 to his being of type II2. The beliefs of each player about the types of the other player do not depend on his own type, which is why this model is termed the “independent case.” This is a Harsanyi model of incomplete information, in which the common prior p over the set of type vectors, T = TI × TII, is the product distribution shown in Figure 10.3.

| | II1 | II2 |
|---|---|---|
| I1 | 1/6 | 1/6 |
| I2 | 1/3 | 1/3 |

Figure 10.3 The common prior in Example 10.17

The independence in this example is expressed in the fact that p is a product distribution over TI and TII. In summary, in this case the belief space is equivalent to a Harsanyi model of incomplete information. ◭
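The two claims of Example 10.17, that the type beliefs of Figure 10.2 follow from the prior of Figure 10.3 and that this prior is a product distribution, can be verified directly. The sketch below is illustrative only; the variable names (`prior`, `marg_I`, `beliefs_of_I`, and so on) are assumptions introduced here, and types are written as the strings "I1", "I2", "II1", "II2".

```python
from fractions import Fraction as F

types_I = ("I1", "I2")
types_II = ("II1", "II2")

# Common prior over type vectors (Figure 10.3).
prior = {
    ("I1", "II1"): F(1, 6), ("I1", "II2"): F(1, 6),
    ("I2", "II1"): F(1, 3), ("I2", "II2"): F(1, 3),
}

# Marginal probability of each type.
marg_I = {t: sum(prior[(t, u)] for u in types_II) for t in types_I}
marg_II = {u: sum(prior[(t, u)] for t in types_I) for u in types_II}

# Belief of each type about the other player's type: the prior conditioned
# on the player's own type.
beliefs_of_I = {
    t: {u: prior[(t, u)] / marg_I[t] for u in types_II} for t in types_I
}
beliefs_of_II = {
    u: {t: prior[(t, u)] / marg_II[u] for t in types_I} for u in types_II
}

# Independence: the prior equals the product of its marginals.
is_product = all(
    prior[(t, u)] == marg_I[t] * marg_II[u]
    for t in types_I for u in types_II
)

print(beliefs_of_I)
print(beliefs_of_II)
print(is_product)  # True
```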
Example 10.18 Incomplete information on two sides (the dependent case) This example is similar to Example 10.17, but here the probability distribution p according to which the state of nature is chosen is given by p(s11) = 0.3, p(s12) = 0.4, p(s21) = 0.2, p(s22) = 0.1. As in Example 10.17, the corresponding belief space has four states of the world, Y = {ω11, ω12, ω21, ω22}, and is given by:

| State of the world | s(·) | πI(·) | πII(·) |
|---|---|---|---|
| ω11 | s11 | [3/7(ω11), 4/7(ω12)] | [3/5(ω11), 2/5(ω21)] |
| ω12 | s12 | [3/7(ω11), 4/7(ω12)] | [4/5(ω12), 1/5(ω22)] |
| ω21 | s21 | [2/3(ω21), 1/3(ω22)] | [3/5(ω11), 2/5(ω21)] |
| ω22 | s22 | [2/3(ω21), 1/3(ω22)] | [4/5(ω12), 1/5(ω22)] |

The type sets are

$$T_{\mathrm{I}} = \{\mathrm{I}_1, \mathrm{I}_2\} = \left\{ \left[\tfrac{3}{7}(\omega_{11}), \tfrac{4}{7}(\omega_{12})\right], \left[\tfrac{2}{3}(\omega_{21}), \tfrac{1}{3}(\omega_{22})\right] \right\},$$

$$T_{\mathrm{II}} = \{\mathrm{II}_1, \mathrm{II}_2\} = \left\{ \left[\tfrac{3}{5}(\omega_{11}), \tfrac{2}{5}(\omega_{21})\right], \left[\tfrac{4}{5}(\omega_{12}), \tfrac{1}{5}(\omega_{22})\right] \right\}.$$

The mutual beliefs of the players are described in Figure 10.4.

The beliefs of Player I

| | II1 | II2 |
|---|---|---|
| I1 | 3/7 | 4/7 |
| I2 | 2/3 | 1/3 |

The beliefs of Player II

| | II1 | II2 |
|---|---|---|
| I1 | 3/5 | 4/5 |
| I2 | 2/5 | 1/5 |

Figure 10.4 The beliefs of each player about the types of the other player

These beliefs correspond to a Harsanyi model with incomplete information, with the common prior p described in Figure 10.5.

| | II1 | II2 |
|---|---|---|
| I1 | 0.3 | 0.4 |
| I2 | 0.2 | 0.1 |

Figure 10.5 The common prior in Example 10.18

This prior distribution can be calculated from the mutual beliefs described in Figure 10.4 as follows. Denote x = p(I1, II1). From the beliefs of type I1, we get p(I1, II2) = (4/3)x; from the beliefs of type II2, we get p(I2, II2) = (1/3)x; from the beliefs of type I2, we get p(I2, II1) = (2/3)x. From the beliefs of type II1, we get p(I1, II1) = x, which is what we started with. Since p is a probability distribution, x + (4/3)x + (1/3)x + (2/3)x = 1. Then x = 3/10, and we have indeed shown that the common prior of the Harsanyi model is the probability distribution appearing in Figure 10.5. The difference between this example and Example 10.17 is that in this case the common prior is not a product distribution over T = TI × TII. Equivalently, the beliefs of one player about the types of the other player depend on his own type: Player I of type I1 ascribes probability 3/7 to Player II being of type II1, while Player I of type I2 ascribes probability 2/3 to Player II being of type II1. ◭
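The calculation of the common prior in Example 10.18 can be followed step by step in code. This is a minimal sketch under the assumptions stated here: the mutual beliefs of Figure 10.4 are stored in two dictionaries (`belief_I`, `belief_II`, names chosen for illustration), p(I1, II1) is provisionally set to 1, the remaining three entries are obtained from likelihood ratios in the order used in the text, and the result is normalized.

```python
from fractions import Fraction as F

# Figure 10.4: the belief of each type about the other player's type.
belief_I = {
    "I1": {"II1": F(3, 7), "II2": F(4, 7)},
    "I2": {"II1": F(2, 3), "II2": F(1, 3)},
}
belief_II = {
    "II1": {"I1": F(3, 5), "I2": F(2, 5)},
    "II2": {"I1": F(4, 5), "I2": F(1, 5)},
}

x = F(1)  # unnormalized value of p(I1, II1)
p = {("I1", "II1"): x}

# Type I1 compares II2 with II1.
p[("I1", "II2")] = (
    p[("I1", "II1")] * belief_I["I1"]["II2"] / belief_I["I1"]["II1"]
)
# Type II2 compares I2 with I1.
p[("I2", "II2")] = (
    p[("I1", "II2")] * belief_II["II2"]["I2"] / belief_II["II2"]["I1"]
)
# Type I2 compares II1 with II2.
p[("I2", "II1")] = (
    p[("I2", "II2")] * belief_I["I2"]["II1"] / belief_I["I2"]["II2"]
)
# Type II1 compares I1 with I2; this must return to the starting value.
closing = p[("I2", "II1")] * belief_II["II1"]["I1"] / belief_II["II1"]["I2"]

total = sum(p.values())
prior = {types: value / total for types, value in p.items()}

print(closing == x)  # True: the four beliefs are mutually consistent
print({types: str(value) for types, value in prior.items()})
```

The normalized values are 3/10, 2/5, 1/10, and 1/5, which are the entries 0.3, 0.4, 0.1, and 0.2 of Figure 10.5.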
Example 10.19 Inconsistent beliefs This example studies a belief space in which the players hold inconsistent beliefs. This means that the beliefs cannot be derived from a common prior. Such a situation cannot be described by an Aumann model of incomplete information. There are two players, N = {I, II}, and four states of nature, S = {s11, s12, s21, s22}. The corresponding belief space has four states of the world, Y = {ω11, ω12, ω21, ω22}:

| State of the world | s(·) | πI(·) | πII(·) |
|---|---|---|---|
| ω11 | s11 | [3/7(ω11), 4/7(ω12)] | [1/2(ω11), 1/2(ω21)] |
| ω12 | s12 | [3/7(ω11), 4/7(ω12)] | [4/5(ω12), 1/5(ω22)] |
| ω21 | s21 | [2/3(ω21), 1/3(ω22)] | [1/2(ω11), 1/2(ω21)] |
| ω22 | s22 | [2/3(ω21), 1/3(ω22)] | [4/5(ω12), 1/5(ω22)] |

The type sets are

$$T_{\mathrm{I}} = \{\mathrm{I}_1, \mathrm{I}_2\} = \left\{ \left[\tfrac{3}{7}(\omega_{11}), \tfrac{4}{7}(\omega_{12})\right], \left[\tfrac{2}{3}(\omega_{21}), \tfrac{1}{3}(\omega_{22})\right] \right\},$$

$$T_{\mathrm{II}} = \{\mathrm{II}_1, \mathrm{II}_2\} = \left\{ \left[\tfrac{1}{2}(\omega_{11}), \tfrac{1}{2}(\omega_{21})\right], \left[\tfrac{4}{5}(\omega_{12}), \tfrac{1}{5}(\omega_{22})\right] \right\}.$$

The mutual beliefs of the players are described in Figure 10.6.

The beliefs of Player I

| | II1 | II2 |
|---|---|---|
| I1 | 3/7 | 4/7 |
| I2 | 2/3 | 1/3 |

The beliefs of Player II

| | II1 | II2 |
|---|---|---|
| I1 | 1/2 | 4/5 |
| I2 | 1/2 | 1/5 |

Figure 10.6 The beliefs of each player about the types of the other player

These mutual beliefs are the same beliefs as in Example 9.56 (page 366). As shown there, there does not exist a common prior in the Harsanyi model with these beliefs. Note that this example resembles Example 10.18, the only difference being the change of one of the types of Player II, namely, type II1, from [3/5(ω11), 2/5(ω21)] to [1/2(ω11), 1/2(ω21)]. These two situations, which are similar in their presentations as belief spaces, are in fact significantly different: one can be modeled by an Aumann or Harsanyi model of incomplete information, while the other cannot. ◭
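The contrast between Examples 10.18 and 10.19 can be made concrete with a single number. The sketch below rests on an assumption that holds for these two examples but is not a general test: with two types per player and all mutual beliefs strictly positive, a common prior with full support exists exactly when the product of the four likelihood ratios around the cycle (I1, II1) → (I1, II2) → (I2, II2) → (I2, II1) → (I1, II1) equals 1. The function name `cycle_product` and the dictionary layout are illustrative choices.

```python
from fractions import Fraction as F


def cycle_product(belief_I, belief_II):
    """Product of likelihood ratios around the cycle
    (I1,II1) -> (I1,II2) -> (I2,II2) -> (I2,II1) -> (I1,II1).
    Assumes two types per player and strictly positive beliefs."""
    return (
        belief_I["I1"]["II2"] / belief_I["I1"]["II1"]
        * belief_II["II2"]["I2"] / belief_II["II2"]["I1"]
        * belief_I["I2"]["II1"] / belief_I["I2"]["II2"]
        * belief_II["II1"]["I1"] / belief_II["II1"]["I2"]
    )


# Player I's beliefs are the same in Examples 10.18 and 10.19.
belief_I = {
    "I1": {"II1": F(3, 7), "II2": F(4, 7)},
    "I2": {"II1": F(2, 3), "II2": F(1, 3)},
}
# Player II's beliefs in Example 10.18 (Figure 10.4).
belief_II_consistent = {
    "II1": {"I1": F(3, 5), "I2": F(2, 5)},
    "II2": {"I1": F(4, 5), "I2": F(1, 5)},
}
# Player II's beliefs in Example 10.19 (Figure 10.6): only type II1 differs.
belief_II_inconsistent = {
    "II1": {"I1": F(1, 2), "I2": F(1, 2)},
    "II2": {"I1": F(4, 5), "I2": F(1, 5)},
}

print(cycle_product(belief_I, belief_II_consistent))    # 1
print(cycle_product(belief_I, belief_II_inconsistent))  # 2/3
```

A product different from 1 means that following the ratios around the cycle does not return to the starting value, so no single prior can generate all four beliefs.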
In general, if there exists a probability distribution p such that at any state of the world ω in the support of p, the beliefs of the player are calculated as a conditional probability via
\[ \pi_i(\omega) = p(\,\cdot \mid \{\omega' \in Y : \pi_i(\omega') = \pi_i(\omega)\}), \tag{10.22} \]
then p is called a consistent distribution, and every state of the world in the support of p is called a consistent state of the world. In that case, the collection of beliefs (πi)i∈N is called a consistent belief system (see also Section 10.6 on page 415). In the above example, all the states of the world in Y are inconsistent.
Ensuring consistency requires the existence of certain relationships between the subjective probabilities of the players, and, therefore, the dimension of the set of consistent belief systems is lower than the dimension of the set of all mutual belief systems. For example, in the examples above containing two players and two types for each player, the mutual belief system of the types contains four probability distributions, each over a two-element type set and hence each determined by a single number in [0, 1]; hence the set of mutual belief systems is isomorphic to [0, 1]^4. The consistency condition requires that any one of these four probability distributions be determined by the three others; hence the set of consistent belief systems is isomorphic to [0, 1]^3. In other words, within the set of all mutual belief systems, the relative dimension of the set of consistent belief systems is 0 (see Exercise 10.18).
Example 10.20 Infinite type space. There are two players N = {I, II} and the set of states of nature is S = [0, 1]^2. Player I is informed of the first coordinate of the chosen state of nature, while Player II is informed of the second coordinate. The beliefs of the players are as follows. If x is the value that Player I is informed of, he believes that the value Player II is informed of is taken from the uniform distribution over [0.9x, 0.9x + 0.1]. If y is the value that Player II is informed of, then if y ≤ 1/2, Player II believes that the value Player I is informed of is taken from the uniform distribution over [0.7, 1], and if y > 1/2, Player II believes that the value Player I is informed of is taken from the uniform distribution over [0, 0.3]. A belief space that corresponds to this situation is:
• The set of states of the world is Y = [0, 1]^2. A state of the world is denoted by ωxy = (x, y), where 0 ≤ x, y ≤ 1. For every (x, y) ∈ [0, 1]^2, the equation s(ωxy) = (x, y) holds.
• For every x ∈ [0, 1], Player I's belief πI(ωxy) is a uniform distribution over the set {(x, y) ∈ [0, 1]^2 : 0.9x ≤ y ≤ 0.9x + 0.1}, which is the interval [(x, 0.9x), (x, 0.9x + 0.1)].
• If y ≤ 1/2 then Player II's belief πII(ωxy) is the uniform distribution over the set {(x, y) ∈ [0, 1]^2 : 0.7 ≤ x ≤ 1} (which is the interval [(0.7, y), (1, y)]), and if y > 1/2 then the belief πII(ωxy) is the uniform distribution over the set {(x, y) ∈ [0, 1]^2 : 0 ≤ x ≤ 0.3} (which is the interval [(0, y), (0.3, y)]).
The type sets of the players are⁴
\[ T_I = \{ U[(x, 0.9x), (x, 0.9x + 0.1)] : 0 \le x \le 1 \}, \]
\[ T_{II} = \left\{ U[(0.7, y), (1, y)] : 0 \le y \le \tfrac{1}{2} \right\} \cup \left\{ U[(0, y), (0.3, y)] : \tfrac{1}{2} < y \le 1 \right\}. \]
0 for each player i ∈ N, then Ỹi(ω) = Ỹj(ω) for every pair of players i and j (hence Ỹi(ω) = Ỹ(ω) for every i ∈ N).
Proof: Let i ∈ N be a player. Since πi({ω} | ω) > 0, it follows from Equation (10.26) in Definition 10.28 that ω ∈ Ỹi(ω). Then Theorem 10.31 implies that Ỹj(ω) ⊆ Ỹi(ω) for every j ∈ N. Since this is true for any pair of players i, j ∈ N, the proof of the theorem is complete.
The next theorem states that the minimal belief subspace at a state of the world ω is simply the union of the true state of the world ω and the minimal belief subspaces of the players at that state. The proof of the theorem is left to the reader (Exercise 10.32).
Theorem 10.33 Let Π = (Y, 𝒴, s, (πi)i∈N) be a belief space in which Y is a finite set. Then for every state of the world ω ∈ Y,
\[ \widetilde{Y}(\omega) = \{\omega\} \cup \Bigl( \bigcup_{i \in N} \widetilde{Y}_i(\omega) \Bigr). \tag{10.28} \]
Remark 10.34 As shown in Example 10.26, when the set of states of the world has the cardinality of the continuum, the minimal belief subspace may not necessarily exist. If the set of states of the world is a topological space,⁶ define the minimal belief subspace of a player as follows. A belief subspace is an ordered vector Π̃ = (Ỹ, 𝒴|Ỹ, s, (πi)i∈N) satisfying Equation (10.25), and also satisfying the property that Ỹ is a closed set. Player i's minimal belief subspace at the state of the world ω is the belief subspace Π̃ = (Ỹ, 𝒴|Ỹ, s, (πi)i∈N) in which the set Ỹ is the smallest closed subset (with respect to set inclusion) among all the belief subspaces satisfying Equation (10.26).
⁶ A space Y is called a topological space if there exists a family of subsets T that are called open sets: the empty set is contained in T, the set Y is contained in T, the union of any set of elements of T is in T, and the intersection of a finite number of elements in T is also a set in T. A set A in a topological space is called closed if it is the complement of an open set.

When the set of states of the world Y is finite, there exists a characterization of belief subspaces. Define a directed graph G = (Y, E) in which the set of vertices is the set of states of the world Y, and there is a directed edge from ω1 to ω2 if and only if there exists a player i ∈ N for whom πi({ω2} | ω1) > 0. A set of vertices C in a directed graph is called a closed component if for each vertex ω ∈ C, every vertex connected to ω by a directed edge is also in C; i.e., if there exists an edge from ω to ω′, then ω′ ∈ C. The next theorem states that the set of belief subspaces is exactly the set of closed components in the graph G. The proof of the theorem is left to the reader (Exercise 10.33).
Theorem 10.35 Let Π = (Y, 𝒴, s, (πi)i∈N) be a belief space in which Y is a finite set of states of the world. A subset Ỹ of Y is a belief subspace if and only if Ỹ is a closed component in the graph G.
Denote the minimal closed set containing ω by C(ω). This set contains the vertex ω, all the vertices that are connected to ω by way of directed edges emanating from ω, all vertices that are connected to those vertices by directed edges, and so on. Since the graph G is finite, this is a finite process, and therefore the set C(ω) is well defined. Together with the construction of the set C(ω), Theorem 10.35 provides a practical method for calculating belief subspaces and minimal belief subspaces of the players. The next theorem provides a practical method of computing minimal belief subspaces at a particular state of the world. The proof of the theorem is left to the reader (Exercise 10.34).
Theorem 10.36 Let Π = (Y, 𝒴, s, (πi)i∈N) be a belief space in which Y is a finite set, let ω ∈ Y, and let i ∈ N. Then
\[ \widetilde{Y}_i(\omega) = \bigcup_{\{\omega' \,:\, \pi_i(\{\omega'\} \mid \omega) > 0\}} C(\omega'). \tag{10.29} \]
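The construction of C(ω) and Equation (10.29) translate directly into a graph search. The following is a minimal sketch under the assumption that Y is finite and that each player's belief is summarized by its support; the function names and the three-state space called toy are hypothetical illustrations, while the first data set encodes the supports of Example 10.19.

```python
from collections import deque

# Supports of the beliefs in Example 10.19: support[i][w] lists the states
# to which player i assigns positive probability at state w.
support = {
    "I":  {"w11": ["w11", "w12"], "w12": ["w11", "w12"],
           "w21": ["w21", "w22"], "w22": ["w21", "w22"]},
    "II": {"w11": ["w11", "w21"], "w12": ["w12", "w22"],
           "w21": ["w11", "w21"], "w22": ["w12", "w22"]},
}

def closed_set(start, support):
    """C(start): start plus every state reachable along directed edges of G."""
    seen, queue = {start}, deque([start])
    while queue:
        w = queue.popleft()
        for player_support in support.values():
            for nxt in player_support[w]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return seen

def minimal_belief_subspace(player, w, support):
    """Equation (10.29): union of C(w') over w' in the support of pi_i(w)."""
    result = set()
    for w2 in support[player][w]:
        result |= closed_set(w2, support)
    return result

# A hypothetical three-state space in which w3 is absorbing for both players.
toy = {
    "I":  {"w1": ["w1", "w2"], "w2": ["w1", "w2"], "w3": ["w3"]},
    "II": {"w1": ["w1"], "w2": ["w3"], "w3": ["w3"]},
}

print(sorted(minimal_belief_subspace("I", "w11", support)))
print(sorted(minimal_belief_subspace("I", "w2", toy)))
print(sorted(minimal_belief_subspace("II", "w2", toy)))
```

In the hypothetical space the two players have different minimal belief subspaces at the same state, because Player II's belief at w2 is concentrated on the absorbing state w3.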
Recognizing his own beliefs, player i can compute his own minimal belief subspace. To see this, note that since he knows πi(ω), he knows which states of the world are in the support of this probability distribution. Knowing the states of the world in the support, player i knows the beliefs of the other players at these states of the world; hence he knows which states of the world are in the supports of those beliefs. Player i can thus recursively construct the portion of the graph G relevant for computing Ỹi(ω). The construction is completed in a finite number of steps because Y is a finite set.
While player i can compute his minimal belief subspace Ỹi(ω) using his own beliefs, in order to compute the belief subspaces of the other players, (Ỹj(ω))j≠i, he needs to know their beliefs. Since player i does not know the true state of the world ω, he does not know the beliefs of the other players at that state, which means that he cannot compute the minimal belief subspaces of the other players. In Example 10.22 (page 404), at the states of the world ω2 and ω3, the belief of Player II is [1(ω3)]; hence Player II cannot distinguish between the two states of the world based on his beliefs. The minimal belief subspaces of Player I at the two states of the world are different:
\[ \widetilde{Y}_I(\omega_2) = Y, \qquad \widetilde{Y}_I(\omega_3) = \{\omega_3\}. \tag{10.30} \]
It follows that, based on his beliefs, Player II cannot know whether the minimal belief subspace of Player I is Y or {ω3 }.
10.5 Games with incomplete information
So far, we have discussed the structure of the mutual beliefs of players, and largely ignored the other components of a game, namely, the actions and the payoffs. In this section, we will define games with incomplete information without a common prior, and the concept of Bayesian equilibrium in such games.
Definition 10.37 A game with incomplete information is an ordered vector G = (N, S, (Ai)i∈N, Π), where:
• N is a finite set of players.
• S is a measurable space of states of nature. To avoid a surfeit of symbols, we will not mention the σ-algebra over S, or over other measurable sets that are defined below.
• Ai is a measurable set of possible actions of player i, for every i ∈ N.
• Each state of nature in S is a state game s = (N, (Ai(s))i∈N, (ui(s))i∈N), where Ai(s) ⊆ Ai is a nonempty measurable set of possible actions of player i, for each i ∈ N. We denote by A(s) = ×i∈N Ai(s) the set of vectors of possible actions in s. For each player i ∈ N the function ui(s) : A(s) → R is a measurable function assigning a payoff to player i in the state game s for each vector of possible actions.
• Π = (Y, 𝒴, s, (πi)i∈N) is a belief space of the players N over the set of states of nature S, satisfying the following condition: for every pair of states of the world ω, ω′ ∈ Y, if πi(ω) = πi(ω′), then Ai(s(ω)) = Ai(s(ω′)).
The last condition in Definition 10.37 implies that at each state of the world ω ∈ Y, player i's set of possible actions Ai(s(ω)) depends on ω, but only through his type πi(ω). Since the player knows his own type, he knows the set of possible actions Ai(s(ω)) available to him. Formally, consider the partition Fi of Y determined by player i's beliefs (see Equation (10.8) on page 390) given by the sets
\[ F_i(\omega) = \{\omega' \in Y : \pi_i(\omega') = \pi_i(\omega)\}, \tag{10.31} \]
and the knowledge operator defined by this partition. Define the event Ci(ω) = "player i's set of actions is Ai(s(ω))":
\[ C_i(\omega) := \{\omega' \in Y : A_i(s(\omega')) = A_i(s(\omega))\}. \tag{10.32} \]
Then the last condition in Definition 10.37 guarantees that at each state of the world ω, player i knows Ci(ω), i.e., Fi(ω) ⊆ Ci(ω) for each ω ∈ Y.
A game with incomplete information, therefore, is composed of a belief space Π, and a collection of state games, one for each state of nature in S. The information that each player i has at the state of the world ω is his type, πi(ω). As required in the Harsanyi model of incomplete information, the set of actions available to a player must depend solely on his type. Every Harsanyi game with incomplete information (see Definition 9.39 on page 347) is a game with incomplete information according to Definition 10.37 (Exercise 10.48). Player i's type set was denoted by
\[ T_i := \{\pi_i(\omega) : \omega \in Y\}. \tag{10.33} \]
To define the expected payoff, we assume that the graph of the function s ↦ A(s), defined by Graph(A) := {(s, a) : s ∈ S, a ∈ A(s)} ⊆ S × A, is a measurable set. We similarly assume that for each player i ∈ N, the function ui : Graph(A) → R is a measurable function.
Definition 10.38 A behavior strategy of player i in a game with incomplete information G = (N, S, (Ai)i∈N, Π) is a measurable function σi : Y → Δ(Ai), mapping every state of the world to a mixed action available at the state game that corresponds to that state of the world,⁷ and dependent solely on the type of the player. In other words, for each ω, ω′ ∈ Y,
\[ \sigma_i(\omega) \in \Delta(A_i(s(\omega))), \tag{10.34} \]
\[ \pi_i(\omega) = \pi_i(\omega') \implies \sigma_i(\omega) = \sigma_i(\omega'). \tag{10.35} \]
Since the mixed action σi(ω) of player i depends solely on his type ti = πi(ω), it can also be denoted by σi(ti). Because the type sets of the players may be infinite, strategies must be measurable functions in order for us to calculate the expected payoff of a player given his type. Let σ = (σi)i∈N be a strategy vector. Denote by
\[ \sigma(\omega) := (\sigma_i(\omega))_{i \in N} \in \mathop{\times}_{i \in N} \Delta(A_i(s(\omega))) \tag{10.36} \]
the vector of mixed actions of the players when the state of the world is ω. Player i's payoff under σ at the state of the world ω is⁸
\[ \gamma_i(\sigma \mid \omega) = \int_Y U_i(s(\omega'); \sigma(\omega')) \, d\pi_i(\omega' \mid \omega). \tag{10.37} \]
Since πi(ω) is player i's belief at the state of the world ω about the states of the world ω′ ∈ Y, the integral of the payoff function with respect to this probability distribution describes the expected payoff of the player at the state of the world ω, based on his subjective beliefs, and given the other players' strategies. To emphasize that the expected payoff of player i at the state of the world ω depends on the mixed action implemented by player i at ω, and is independent of mixed actions that he implements at other states of the world, we sometimes write γi(σi(ω), σ−i | ω) instead of γi(σi, σ−i | ω). We will now define the concept of Bayesian equilibrium in games with incomplete information.
Definition 10.39 A Bayesian equilibrium is a strategy vector σ* = (σ*i)i∈N satisfying
\[ \gamma_i(\sigma^* \mid \omega) \ge \gamma_i(\sigma_i(\omega), \sigma^*_{-i} \mid \omega), \quad \forall i \in N, \ \forall \sigma_i(\omega) \in \Delta(A_i(s(\omega))), \ \forall \omega \in Y. \tag{10.38} \]
⁷ Since a behavior strategy is a measurable function whose range is the space of mixed actions, we need to specify the σ-algebra over the space Δ(Ai) that we are using. The σ-algebra over this space is the σ-algebra induced by the weak topology (see Dunford and Schwartz [1999]). An alternative definition of a measurable function taking values in this space is: for each measurable set C ⊆ Ai, the function ω ↦ σi(C | ω) is a measurable function. In an infinite space, the existence of a behavior strategy requires that the function ω ↦ Ai(s(ω)) be measurable. Results appearing in Kuratowski and Ryll-Nardzewski [1965] imply that a sufficient condition for the existence of a behavior strategy is that (a) Ai is a complete metric space for every i ∈ N, (b) the function ω ↦ Ai(s(ω)) is a measurable function, and (c) for each state of nature s, the set Ai(s) is a closed set.
⁸ Recall that Ui is the multilinear extension of ui; see Equation (5.9) on page 147.
In other words, a Bayesian equilibrium is a strategy vector that satisfies the condition that based on his subjective beliefs, at no state of the world can a player profit by deviating from his strategy. As the next theorem states, a strategy vector is a Bayesian equilibrium if no player of any type can profit by deviating to any other action. The theorem is a generalization of Corollary 5.8 (page 149). In our formulation of the theorem, we will use the following notation. Let σ = (σj)j∈N be a strategy vector. For each player i ∈ N, each state of the world ω ∈ Y, and for each action ai,ω ∈ Ai(s(ω)), denote by (σ; ai,ω) the strategy vector at which every player j ≠ i implements strategy σj, and player i plays action ai,ω when his type is πi(ω).
Theorem 10.40 A strategy vector σ* = (σ*i)i∈N is a Bayesian equilibrium if and only if for each player i ∈ N, for each state of the world ω ∈ Y, and each action ai,ω ∈ Ai(s(ω)),
\[ \gamma_i(\sigma^* \mid \omega) \ge \gamma_i((\sigma^*; a_{i,\omega}) \mid \omega). \tag{10.39} \]
The proof of the theorem is left to the reader (Exercise 10.49).
Example 10.41 We consider a game that extends Example 10.19 (page 399), where beliefs are inconsistent. There are two players N = {I, II}, four states of nature S = {s11, s12, s21, s22}, and four states of the world, Y = {ω11, ω12, ω21, ω22}. The beliefs of the players, and the function s are given in Figure 10.10.

State of the world    s(·)    πI(·)                       πII(·)
ω11                   s11     [3/7(ω11), 4/7(ω12)]        [1/2(ω11), 1/2(ω21)]
ω12                   s12     [3/7(ω11), 4/7(ω12)]        [4/5(ω12), 1/5(ω22)]
ω21                   s21     [2/3(ω21), 1/3(ω22)]        [1/2(ω11), 1/2(ω21)]
ω22                   s22     [2/3(ω21), 1/3(ω22)]        [4/5(ω12), 1/5(ω22)]

Figure 10.10 The beliefs of the players and the function s in Example 10.41

The players' type sets are
\[ T_I = \{I_1, I_2\} = \left\{ \left[\tfrac{3}{7}(\omega_{11}), \tfrac{4}{7}(\omega_{12})\right], \left[\tfrac{2}{3}(\omega_{21}), \tfrac{1}{3}(\omega_{22})\right] \right\}, \tag{10.40} \]
\[ T_{II} = \{II_1, II_2\} = \left\{ \left[\tfrac{1}{2}(\omega_{11}), \tfrac{1}{2}(\omega_{21})\right], \left[\tfrac{4}{5}(\omega_{12}), \tfrac{1}{5}(\omega_{22})\right] \right\}. \tag{10.41} \]
The state games s11, s12, s21, and s22 are given in Figure 10.11. A behavior strategy of Player I is a pair (x, y), defined as:
• Play the mixed action [x(T), (1 − x)(B)] if your type is I1.
• Play the mixed action [y(T), (1 − y)(B)] if your type is I2.
Similarly, a behavior strategy of Player II is a pair (z, t), defined as:
• Play the mixed action [z(L), (1 − z)(R)] if your type is II1.
• Play the mixed action [t(L), (1 − t)(R)] if your type is II2.
We will now find a Bayesian equilibrium satisfying 0 < x, y, z, t < 1 (assuming one exists). In such an equilibrium, every player of each type must be indifferent between his two actions, given his beliefs about the type of the other player.
State game s11
        L       R
T       2, 0    0, 1
B       0, 0    1, 0

State game s12
        L       R
T       0, 0    0, 0
B       1, 1    1, 0

State game s21
        L       R
T       0, 0    0, 0
B       1, 1    0, 0

State game s22
        L       R
T       0, 0    2, 1
B       0, 0    0, 2

Figure 10.11 The payoff functions in Example 10.41
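If the players are indifferent between their actions, then:
\begin{align*}
\text{Player I of type } I_1 \text{ is indifferent between } B \text{ and } T: &\quad \tfrac{3}{7} \cdot 2z = \tfrac{3}{7}(1 - z) + \tfrac{4}{7}; \\
\text{Player I of type } I_2 \text{ is indifferent between } B \text{ and } T: &\quad \tfrac{1}{3} \cdot 2(1 - t) = \tfrac{2}{3} z; \\
\text{Player II of type } II_1 \text{ is indifferent between } R \text{ and } L: &\quad \tfrac{1}{2}(1 - y) = \tfrac{1}{2} x; \\
\text{Player II of type } II_2 \text{ is indifferent between } R \text{ and } L: &\quad \tfrac{4}{5}(1 - x) = \tfrac{1}{5}(y + 2(1 - y)).
\end{align*}
The solution to this system of equations is (verify!)
\[ x = \tfrac{3}{5}, \quad y = \tfrac{2}{5}, \quad z = \tfrac{7}{9}, \quad t = \tfrac{2}{9}. \tag{10.42} \]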
The mixed actions [3/5(T), 2/5(B)] for Player I of type I1, and [2/5(T), 3/5(B)] for Player I of type I2, and [7/9(L), 2/9(R)] for Player II of type II1, and [2/9(L), 7/9(R)] for Player II of type II2 therefore form a Bayesian equilibrium of this game.
This game has no "expected payoff," because there is no common prior distribution over Y. Nevertheless, one can speak about an "objective" expected payoff at each state of nature (calculated from the actions of the players at that state of nature). Denote by γi(s) the payoff of player i at the state game s. Denote the payoff matrix of player i at the state game skl by Gi,kl. The payoff γi(s11), for example, can be represented in vector form:
\[ \gamma_i(s_{11}) = (x, 1 - x) \, G_{i,11} \begin{pmatrix} z \\ 1 - z \end{pmatrix}. \tag{10.43} \]
A simple calculation gives the payoff of each player at each state of nature:
\begin{align*}
\gamma_I(\omega_{11}) &= \left(\tfrac{3}{5}, \tfrac{2}{5}\right) G_{I,11} \begin{pmatrix} 7/9 \\ 2/9 \end{pmatrix} = \tfrac{46}{45}, &
\gamma_{II}(\omega_{11}) &= \left(\tfrac{3}{5}, \tfrac{2}{5}\right) G_{II,11} \begin{pmatrix} 7/9 \\ 2/9 \end{pmatrix} = \tfrac{6}{45}, \\
\gamma_I(\omega_{12}) &= \left(\tfrac{3}{5}, \tfrac{2}{5}\right) G_{I,12} \begin{pmatrix} 2/9 \\ 7/9 \end{pmatrix} = \tfrac{18}{45}, &
\gamma_{II}(\omega_{12}) &= \left(\tfrac{3}{5}, \tfrac{2}{5}\right) G_{II,12} \begin{pmatrix} 2/9 \\ 7/9 \end{pmatrix} = \tfrac{4}{45}, \\
\gamma_I(\omega_{21}) &= \left(\tfrac{2}{5}, \tfrac{3}{5}\right) G_{I,21} \begin{pmatrix} 7/9 \\ 2/9 \end{pmatrix} = \tfrac{21}{45}, &
\gamma_{II}(\omega_{21}) &= \left(\tfrac{2}{5}, \tfrac{3}{5}\right) G_{II,21} \begin{pmatrix} 7/9 \\ 2/9 \end{pmatrix} = \tfrac{21}{45}, \\
\gamma_I(\omega_{22}) &= \left(\tfrac{2}{5}, \tfrac{3}{5}\right) G_{I,22} \begin{pmatrix} 2/9 \\ 7/9 \end{pmatrix} = \tfrac{28}{45}, &
\gamma_{II}(\omega_{22}) &= \left(\tfrac{2}{5}, \tfrac{3}{5}\right) G_{II,22} \begin{pmatrix} 2/9 \\ 7/9 \end{pmatrix} = \tfrac{56}{45}.
\end{align*}
Because the players do not know the true state game, the relevant expected payoff for a player is the subjective payoff he receives given his beliefs. For example, at the state of the world ω11 (or ω12) Player I believes that the state of the world is ω11 with probability 3/7, and ω12 with probability 4/7. Player I therefore believes that the state game is s11 with probability 3/7, and s12 with probability 4/7. His subjective expected payoff is therefore 3/7 × 46/45 + 4/7 × 18/45 = 2/3, and it is this payoff that he "expects" to receive at the state of the world ω11 (or ω12). Similarly, at the state of the world ω21 (or ω22), Player I "expects" to receive 2/3 × 21/45 + 1/3 × 28/45 = 14/27. At ω11 (or ω21) Player II "expects" to receive 1/2 × 6/45 + 1/2 × 21/45 = 3/10. At ω12 (or ω22) Player II "expects" to receive 4/5 × 4/45 + 1/5 × 56/45 = 8/25. Each of these subjective payoffs coincides with the common value of the two sides of the corresponding indifference equation, as it must. ◭
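The indifference conditions behind this equilibrium can be checked with exact arithmetic. The sketch below is illustrative only: the payoff matrices are transcribed from Figure 10.11 and the beliefs from Figure 10.6, while the data layout and the function name subjective_payoff are assumptions made for the example.

```python
from fractions import Fraction as F

# Payoff matrices (rows T, B; columns L, R), transcribed from Figure 10.11.
G = {
    "11": {"I": [[2, 0], [0, 1]], "II": [[0, 1], [0, 0]]},
    "12": {"I": [[0, 0], [1, 1]], "II": [[0, 0], [1, 0]]},
    "21": {"I": [[0, 0], [1, 0]], "II": [[0, 0], [1, 0]]},
    "22": {"I": [[0, 2], [0, 0]], "II": [[0, 1], [0, 2]]},
}

# Candidate equilibrium (10.42): x, y = P(T) for types I1, I2; z, t = P(L) for II1, II2.
x, y, z, t = F(3, 5), F(2, 5), F(7, 9), F(2, 9)
row = {"1": [x, 1 - x], "2": [y, 1 - y]}
col = {"1": [z, 1 - z], "2": [t, 1 - t]}

# Beliefs of each type over the other player's type (Figure 10.6).
belief = {
    ("I", "1"): {"1": F(3, 7), "2": F(4, 7)},
    ("I", "2"): {"1": F(2, 3), "2": F(1, 3)},
    ("II", "1"): {"1": F(1, 2), "2": F(1, 2)},
    ("II", "2"): {"1": F(4, 5), "2": F(1, 5)},
}

PURE = ([1, 0], [0, 1])  # first action (T or L), second action (B or R)

def subjective_payoff(player, own_type, own_action):
    """Expected payoff, under the type's own beliefs, of playing own_action
    (a pair of probabilities) against the other player's candidate strategy."""
    total = F(0)
    for other_type, prob in belief[(player, own_type)].items():
        if player == "I":
            k, l, r, c = own_type, other_type, own_action, col[other_type]
        else:
            k, l, r, c = other_type, own_type, row[other_type], own_action
        M = G[k + l][player]
        total += prob * sum(r[a] * M[a][b] * c[b] for a in range(2) for b in range(2))
    return total

for player, own_type in belief:
    first, second = (subjective_payoff(player, own_type, e) for e in PURE)
    print(player, own_type, first, second)
```

For every type, the two printed numbers are the subjective payoffs of that type's two pure actions; equality of the two is exactly the indifference condition used to derive (10.42).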
There are no general results concerning the existence of Bayesian equilibria in inconsistent models, but we do have the following result.
Theorem 10.42 Let G = (N, S, (Ai)i∈N, Π) be a game with incomplete information, where Y is a finite set of states of the world, and each player i has a finite set of actions Ai. Then G has a Bayesian equilibrium in behavior strategies.
Proof: To prove the theorem, we will define the agent-form game corresponding to G (see Definition 9.50, on page 354), and show that every Nash equilibrium of the agent-form game is a Bayesian equilibrium of G. Since Nash's Theorem (Theorem 5.10 on page 151) implies that there exists an equilibrium in the agent-form game, we will deduce that the given game G has a Bayesian equilibrium. Recall that the type set of player i is denoted Ti = {πi(ω) : ω ∈ Y}. The agent-form game corresponding to G is a strategic-form game Γ̂ = (N̂, (Ŝk)k∈N̂, (ûk)k∈N̂), where:
• The set of players is N̂ = {(i, ti) : i ∈ N, ti ∈ Ti}. In other words, each type of each player is a player in the agent-form game.
• The set of pure strategies of player (i, ti) ∈ N̂ is Ŝ(i,ti) := Ai(s(ω)), where ω is any state of the world satisfying ti = πi(ω).
A pure strategy σ(i,ti) of player (i, ti) in the agent-form game is a possible action of player i's type ti in G. It follows that a pure strategy vector σ = (σ(i,ti))(i,ti)∈N̂ is a prescription for what each type of each player should play; hence it is, in fact, also a pure strategy vector in G, in which for each i ∈ N, the vector (σ(i,ti))ti∈Ti is a pure strategy of player i. This means that the set of pure strategy vectors in G is equal to the set of pure strategy vectors in Γ̂, and the set of behavior strategy vectors in the game G equals the set of mixed strategy vectors in the game Γ̂.
• The payoff function of player (i, ti) is
\[ \hat{u}_{(i,t_i)}(\sigma) = \gamma_i(\sigma \mid \omega), \tag{10.44} \]
where ω is any state of the world satisfying ti = πi(ω). Since γi(σ | ω) depends on ω only via πi(ω), this expression depends only on ti, and therefore û(i,ti)(σ) is well defined.
Because the set of states of the world Y is finite, we deduce that the set of players N̂ in Γ̂ is also finite. Since every player i's set of actions Ai is finite, the agent-form game Γ̂ satisfies the conditions of Nash's Theorem (see Theorem 5.10 on page 151); hence it has an equilibrium σ* = (σ*(i,ti))(i,ti)∈N̂ in mixed strategies. Since the set of behavior strategy vectors of G equals the set of mixed strategy vectors of Γ̂, we can regard σ* = (σ*i(· | ti))i∈N,ti∈Ti as a vector of behavior strategies in G. The fact that σ* is a Bayesian equilibrium then follows from the definition of the agent-form game, and because σ* is a mixed strategy equilibrium of the agent-form game Γ̂.
The following examples look at games with incomplete information with infinite spaces of states of nature.
Example 10.43 Sealed-bid first-price auction⁹ An original van Gogh painting is being offered in a first-price sealed-bid auction, meaning that every buyer writes his bid on a slip of paper that is placed in a sealed envelope, which is then inserted into a box. After all buyers have submitted their bids, all of the envelopes in the box are opened and read. The buyer who has made the highest bid wins the painting, paying for it the amount that he offered. If more than one buyer bids the highest bid, a fair lottery is conducted among them to choose the winner. Every buyer has a private evaluation for the painting, which will be referred to as his private value for the object. This is the subjective value he ascribes to the painting; private values may differ from one buyer to the next. Only two buyers take part in this auction, Elizabeth and Charles. Each of them knows his or her private value, but not the private value of the other buyer.
Each buyer believes that the private value of the other buyer is uniformly distributed in the interval [0, 1], and this fact is common belief among the buyers. This situation can be modeled as a game with incomplete information, in the following way:
• The set of players is N = {Elizabeth, Charles}.
• The set of states of nature is S = {sx,y : 0 ≤ x, y ≤ 1}; the subscript x corresponds to Elizabeth's private value, and the subscript y corresponds to Charles's private value.
• Player i's set of actions is Ai(s) = [0, ∞); hence each player i can submit any nonnegative bid ai. The pair of bids submitted in the envelopes is therefore (aE, aC).
• Elizabeth's payoff function is
\[ u_E(s_{x,y}; a_E, a_C) = \begin{cases} x - a_E & \text{if } a_E > a_C, \\ \tfrac{1}{2}(x - a_E) & \text{if } a_E = a_C, \\ 0 & \text{if } a_E < a_C. \end{cases} \tag{10.45} \]
If Elizabeth wins the auction, her payoff is the difference between her private value and the amount of money she pays for the painting; if she does not win the auction, her payoff is 0. If both buyers submit the same bid, the winner of the auction is chosen at random between them, where each buyer has a probability of 1/2 of being chosen (this can be accomplished, for example, by tossing a fair coin). It follows that in this case Elizabeth's payoff is half of the difference between her private value and the sum of money she pays for the painting. Charles's payoff function is defined similarly.
• The space of the states of the world is Y = [0, 1]^2 with the σ-algebra generated by the Borel sets.
• The function s : Y → S is defined by s(x, y) = sx,y for all (x, y) ∈ Y.
• For each state of the world ω = (x, y), Elizabeth's belief, πE(x, y), is the uniform distribution over the set {(x, y′) : y′ ∈ [0, 1]} and Charles's belief, πC(x, y), is the uniform distribution over the set {(x′, y) : x′ ∈ [0, 1]}.

⁹ Auction theory is studied in greater detail in Chapter 12.
We will show that this game has a symmetric Bayesian equilibrium σ* = (σ*E, σ*C), in which both buyers make use of the same strategy: player i's bid is half of his private value. That is,
\[ \sigma^*_E(x, y) = \frac{x}{2}, \qquad \sigma^*_C(x, y) = \frac{y}{2}, \qquad \forall (x, y) \in [0, 1]^2. \tag{10.46} \]
Suppose that Charles uses strategy σ*C. We will show that Elizabeth's best reply is to bid half of her private value. Elizabeth's expected payoff if her private value is x, and her bid is aE, is
\begin{align*}
\gamma_E(a_E, \sigma^*_C \mid x) &= P\left(a_E > \tfrac{y}{2}\right) \times (x - a_E) \tag{10.47} \\
&= P(2a_E > y) \times (x - a_E) \tag{10.48} \\
&= \min\{2a_E, 1\} \times (x - a_E). \tag{10.49}
\end{align*}
As a function of aE, this is a quadratic function over aE ∈ [0, 1/2] (attaining a maximum at aE = x/2), and a linear function with a negative slope for aE ≥ 1/2. The graph of this function is shown in Figure 10.12.
[Figure 10.12 Elizabeth's payoff function γE(aE, σ*C | x) as a function of the bid aE. Left panel: the case x < 1/2. Right panel: the case x > 1/2.]
In both cases, the function attains a maximum at the point aE = x/2. It follows that a*E(x) = x/2 is the best reply to σ*C. By the symmetry of the game, bidding y/2 is likewise Charles's best reply to σ*E. Thus, σ* = (σ*E, σ*C) is a Bayesian equilibrium. ◭
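The best-reply claim can be checked numerically by maximizing the expression in (10.49) over a grid of bids. This is a rough sketch rather than a proof: the grid resolution and the function names are arbitrary choices made for illustration, and bids above 1 are omitted because they are never profitable for a private value in [0, 1].

```python
def expected_payoff(a, x):
    """Elizabeth's expected payoff (10.49) from bidding a with private value x,
    when Charles bids half of his uniformly distributed private value."""
    return min(2 * a, 1.0) * (x - a)

def best_bid(x, grid=10001):
    """Bid in [0, 1] maximizing the expected payoff on an evenly spaced grid."""
    bids = [k / (grid - 1) for k in range(grid)]
    return max(bids, key=lambda a: expected_payoff(a, x))

for x in (0.2, 0.5, 0.8, 1.0):
    print(x, best_bid(x))
```

Up to the grid resolution, the maximizing bid agrees with x/2 for each private value tried.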
Example 10.44 This is an example in which the beliefs of the players are inconsistent. There are two players N = {I, II}. The set of states of nature is S = {sx,y : 0 < x, y < 1}; the state games in S are depicted in Figure 10.13.

State game sx,y for x > y
        L        R
T       1, 0     0, 0
B       2, 1     1, −1

State game sx,y for x = y
        L        R
T       0, 0     0, 0
B       0, 0     0, 0

State game sx,y for x < y
        L        R
T       0, 1     1, 2
B       0, 0     −1, 1

Figure 10.13 The state games in Example 10.44
The set of states of the world is Y = (0, 1)^2, and the function s : Y → S is defined by s(x, y) = sx,y for every (x, y) ∈ Y. Player I is told the first coordinate x of the state of the world, and Player II is told the second coordinate y of the state of the world. Given the value z that is told to a player, that player believes that the value told to the other player is uniformly distributed over the interval (0, z). In other words, πI(x, y) is the uniform distribution over the line segment ((x, 0), (x, x)) and πII(x, y) is the uniform distribution over the line segment ((0, y), (y, y)). At every state of the world, Player I believes that x > y; hence he believes that action B strictly dominates action T. In a similar way, at every state of the world Player II believes that y > x; hence he believes that action R strictly dominates action L. It follows that the only equilibrium is that where Player I, of any type, plays B, and Player II, of any type, plays R. The equilibrium payoff is then (−1, 1) if x < y, (1, −1) if x > y, and (0, 0) if x = y. However, in every state of the world, each player believes that his payoff is 1. ◭
Example 10.45 We now consider a game similar to the game in Example 10.44, but with different state games, given in Figure 10.14; here, x represents the first coordinate of the state of nature, and y the second coordinate. Note that each player, after learning his type, knows his payoff function, but does not know the payoff function of the other player, even if he knows the strategy used by the other player, because he does not know the other player's type.

                  Player II
                  L        R
Player I    T     x, 0     0, y
            B     0, 1     1, 0

Figure 10.14 The state game s(x,y) in Example 10.45
We will seek a Bayesian equilibrium in which both players, of each type, use a completely mixed action. At such an equilibrium, every player of every type is indifferent between his two actions. Denote by σI(x) the probability that Player I of type x, who has received the information x, will choose action T, and by σII(y) the probability that Player II of type y, who has received the information y, will choose action L. Denote by Ux the uniform distribution over [0, x]. The payoff to Player I of type x if he plays the action T is then
\[ \gamma_I(T, \sigma_{II} \mid x) = \int_{y=0}^{x} x \, \sigma_{II}(y) \, dU_x(y) = x \int_{y=0}^{x} \sigma_{II}(y) \, dU_x(y), \tag{10.50} \]
and the payoff to Player I of type x if he plays the action B is then
\[ \gamma_I(B, \sigma_{II} \mid x) = \int_{y=0}^{x} (1 - \sigma_{II}(y)) \, dU_x(y) = 1 - \int_{y=0}^{x} \sigma_{II}(y) \, dU_x(y). \tag{10.51} \]
Player I of type x is indifferent between T and B if these two quantities are equal to each other, i.e., if
\[ (1 + x) \int_{y=0}^{x} \sigma_{II}(y) \, dU_x(y) = 1. \tag{10.52} \]
The density function of the distribution Ux equals 1/x in the interval [0, x], and it follows that in this interval dUx(y) = dy/x. After inserting this equality in Equation (10.52) and moving terms from one side of the equal sign to the other, we get
\[ \int_{y=0}^{x} \sigma_{II}(y) \, dy = \frac{x}{1 + x}. \tag{10.53} \]
Differentiating by x yields
\[ \sigma_{II}(x) = \frac{1}{(1 + x)^2}. \tag{10.54} \]
By replacing the variable x by y, which is the information that Player II receives, we deduce that σII(y) = 1/(1 + y)^2 is a strategy of Player II that makes Player I of any type indifferent between his two actions. In Exercise 10.54, the reader is asked to conduct a similar calculation to find a strategy of Player I that makes Player II of any type indifferent between his two actions. When each player implements a strategy that makes the other player indifferent between his two actions, we obtain an equilibrium (why is this true?). ◭
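The indifference of Player I under σII(y) = 1/(1 + y)^2 can be confirmed numerically by evaluating the two integrals (10.50) and (10.51). The sketch below uses a plain midpoint rule; the function names and the number of subintervals are assumptions made for illustration.

```python
def sigma_II(y):
    """Candidate strategy of Player II: probability of playing L given y."""
    return 1.0 / (1.0 + y) ** 2

def mean_sigma(x, n=20000):
    """Midpoint-rule approximation of the integral of sigma_II with respect
    to the uniform distribution U_x on [0, x]."""
    h = x / n
    return sum(sigma_II((k + 0.5) * h) for k in range(n)) * h / x

def payoffs(x):
    """Payoffs of Player I of type x from T and from B, per (10.50) and (10.51)."""
    m = mean_sigma(x)
    return x * m, 1.0 - m

for x in (0.1, 0.5, 0.9):
    print(x, payoffs(x))
```

For each type x tried, the two payoffs agree up to the integration error, and both are close to x/(1 + x), in line with (10.53).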
10.6 The concept of consistency
The concept of consistency in belief spaces was defined on page 399. A consistent belief space is one in which the beliefs of the players are derived from a common prior p over the types. In this section, we will study this concept in greater detail. For simplicity, we will deal here only with finite belief spaces, but all the results of this section also hold in belief spaces with a countably infinite number of states of the world. The definitions and results can be generalized to infinite belief spaces, often requiring only adding appropriate technical conditions. Denote the support of πi(ω) by Pi(ω),
\[ P_i(\omega) := \mathrm{supp}(\pi_i(\omega)) \subseteq Y. \tag{10.55} \]
This is the set of states of the world that are possible, in player i’s opinion, at the state of the world ω.
Definition 10.46 Let $\Pi = (Y, \mathcal{Y}, s, (\pi_i)_{i \in N})$ be a belief space, where $Y$ is a finite set. A belief subspace $\widetilde{Y}$ is consistent if there exists a probability distribution $p$ over $\widetilde{Y}$ such that, for every event $A \subseteq \widetilde{Y}$, for each player $i$, and for each $\omega \in \mathrm{supp}(p)$,

\[ p(P_i(\omega)) > 0 \quad \text{and} \quad \pi_i(A \mid \omega) = p(A \mid P_i(\omega)). \tag{10.56} \]

A probability distribution $p$ satisfying Equation (10.56) for every event $A \subseteq \widetilde{Y}$, for every player $i$ and for every $\omega \in \mathrm{supp}(p)$, is called a consistent distribution (over $\widetilde{Y}$), or a common prior.

In words, the belief of player $i$ at the state of the world $\omega$ is given by the conditional probability of $p$ given the information $P_i(\omega)$ that the player has at $\omega$. A consistent distribution, therefore, plays the same role as the common prior in the Aumann model of incomplete information. On page 399 we defined the concept of consistency by conditioning on the set $\{\omega' \in Y : \pi_i(\omega') = \pi_i(\omega)\}$ instead of the set $P_i(\omega)$ (see Equation (10.56)). The coherency requirement in the definition of a belief space guarantees that these two definitions are equivalent (Exercise 10.65). If $p$ is a consistent distribution, then every state of the world $\omega \in \mathrm{supp}(p)$ is called a consistent state of the world. Every state of the world that is not consistent is called inconsistent.

Remark 10.47 Let $\Pi = (Y, \mathcal{Y}, s, (\pi_i)_{i \in N})$ be a belief space, and let $\widetilde{\Pi} = (\widetilde{Y}, \widetilde{\mathcal{Y}}, s, (\pi_i)_{i \in N})$ be a consistent belief subspace of $\Pi$. Then $\Pi$ is also a consistent belief space. To see this, note that since $\widetilde{\Pi}$ is a consistent belief subspace, there exists a consistent distribution $\widetilde{p}$ over $\widetilde{Y}$. In that case, define a probability distribution $p$ over $Y$ by

\[ p(\omega) = \begin{cases} \widetilde{p}(\omega) & \text{if } \omega \in \widetilde{Y}, \\ 0 & \text{if } \omega \notin \widetilde{Y}. \end{cases} \tag{10.57} \]

This is a consistent distribution over $\Pi$ (Exercise 10.64).
The next example elucidates the concepts of consistent distribution and consistent state of the world. Example 10.48 Figure 10.15 depicts a belief space for the set of players N = {I, II} over the set of states of nature S = {s1 , s2 }. In this belief space, the set of states of the world is Y = {ω1 , ω2 }.
State of the world    s(·)    π_I(·)       π_II(·)
ω1                    s1      [1(ω1)]      [1(ω1)]
ω2                    s2      [1(ω1)]      [1(ω2)]

Figure 10.15 The belief space in Example 10.48

At the state of the world $\omega_1$, the fact that the state of nature is $s_1$ is common belief among the players. At the state of the world $\omega_2$, Player I believes that the fact that the state of nature is $s_1$ is common belief among the players, while Player II believes that the state of nature is $s_2$, and believes that Player I believes that the fact that the state of nature is $s_1$ is common belief among the players.

The belief subspace $\widetilde{Y} = \{\omega_1\}$ is a consistent belief subspace, with a consistent distribution $\widetilde{p} = [1(\omega_1)]$. Remark 10.47 shows that $\Pi$ is also a consistent belief space, with consistent distribution $p = [1(\omega_1)]$. The state of the world $\omega_1$ is contained in the support of $p$; therefore it is a consistent state of the world. The state of the world $\omega_2$, however, is inconsistent. Indeed, at that state of the world Player I ascribes probability 1 to the state of nature being $s_1$, while Player II ascribes probability 1 to the state of nature being $s_2$. There cannot, then, exist a probability distribution $p$ from which these two beliefs can be derived (verify!). ◭
Example 10.18 (Continued) Consider the event $A = \{\omega_{12}, \omega_{21}\}$ and the probability distribution $p$, as defined in Figure 10.5 on page 398. Then

\[ \pi_I(A \mid \omega_{11}) = \tfrac{4}{7}, \tag{10.58} \]

\[ p(A \mid P_I(\omega_{11})) = p(A \mid \{\omega_{11}, \omega_{12}\}) = \tfrac{4}{7}; \tag{10.59} \]

hence $\pi_I(A \mid \omega_{11}) = p(A \mid P_I(\omega_{11}))$. It can be shown that Equation (10.56) is satisfied for every event $A \subseteq Y$, for every player $i \in \{I, II\}$, and for every $\omega \in Y$ (Exercise 10.66); hence $p$ is a consistent distribution. ◭
Example 10.19 (Continued) The beliefs of the players are shown in Figure 10.6 on page 399. Consider the event $A = \{\omega_{12}, \omega_{21}\}$ and the probability distribution $p$ defined in Figure 10.5 on page 398. Then

\[ \pi_I(A \mid \omega_{11}) = \tfrac{4}{7}, \tag{10.60} \]

\[ p(A \mid P_I(\omega_{11})) = p(A \mid \{\omega_{11}, \omega_{12}\}) = \tfrac{4}{7}. \tag{10.61} \]

On the other hand,

\[ \pi_{II}(A \mid \omega_{11}) = \tfrac{1}{2}, \tag{10.62} \]

\[ p(A \mid P_{II}(\omega_{11})) = p(A \mid \{\omega_{11}, \omega_{21}\}) = \tfrac{2}{5}; \tag{10.63} \]

hence $\pi_{II}(A \mid \omega_{11}) \neq p(A \mid P_{II}(\omega_{11}))$. It follows that $p$ is not a consistent distribution. This is not surprising, since we proved that this belief space is not consistent. ◭
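The conditional probabilities used in the two continued examples can be recomputed with exact arithmetic. The sketch below assumes that the distribution p of Figure 10.5 assigns probabilities 3/10, 4/10, 2/10, and 1/10 to ω11, ω12, ω21, and ω22 (the values listed for this prior in Figure 10.16); the state labels and the helper `conditional` are illustrative names, not notation from the text.

```python
from fractions import Fraction as F

# Assumed prior p (values as listed in the "Probability" column of Figure 10.16).
P = {"w11": F(3, 10), "w12": F(4, 10), "w21": F(2, 10), "w22": F(1, 10)}


def conditional(prior, event, given):
    """Return p(event | given), where both events are sets of states of the world."""
    mass = sum(prior[w] for w in given)
    if mass == 0:
        raise ValueError("the conditioning event has probability zero")
    return sum(prior[w] for w in event & given) / mass


A = {"w12", "w21"}
p_given_I = conditional(P, A, {"w11", "w12"})   # p(A | P_I(w11))
p_given_II = conditional(P, A, {"w11", "w21"})  # p(A | P_II(w11))
print(p_given_I)   # 4/7
print(p_given_II)  # 2/5
```

The first value matches Player I's belief 4/7 in both examples, while the second differs from Player II's belief 1/2 in Example 10.19, which is the discrepancy recorded in Equations (10.62) and (10.63).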
It can be shown (Exercise 10.67) that the following definition is equivalent to the definition of a consistent distribution (see Definition 10.46).
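Definition 10.49 A probability distribution $p \in \Delta(Y)$ (over a finite set of states of the world $Y$) is called consistent if for each player $i \in N$,

\[ p = \sum_{\omega \in Y} \pi_i(\omega)\,p(\omega). \tag{10.64} \]

In other words,

\[ p(\omega') = \sum_{\omega \in Y} \pi_i(\{\omega'\} \mid \omega)\,p(\omega), \quad \forall \omega' \in Y. \tag{10.65} \]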
A probability distribution $p$ is consistent if, for every player $i$, it is the average according to $p$ of player $i$'s types (recall that a type is a probability distribution over $Y$). This gives us a new angle from which to regard the consistency condition: in any consistent system of beliefs, we can find weights for the states of the world (a probability distribution $p$), such that the weighted average of the type of each player is the same for all the players, and this average is exactly the probability distribution $p$. (Equations (10.64) and (10.65) refer to sums, and not integrals, because of the assumption that the set of states of the world is finite.)

Example 10.18 (Continued) To ascertain that $p$ is a consistent distribution according to Definition 10.49, we need to ascertain that Equation (10.64) is satisfied for each player $i \in N$. Each row in the table in Figure 10.16 describes a type of Player I at a state of the world. The right column shows the common prior, and the bottom row describes the weighted average of Player I's types.

             ω11     ω12     ω21     ω22     Probability
π_I(ω11)     3/7     4/7     0       0       3/10
π_I(ω12)     3/7     4/7     0       0       4/10
π_I(ω21)     0       0       2/3     1/3     2/10
π_I(ω22)     0       0       2/3     1/3     1/10
Average      3/10    4/10    2/10    1/10

Figure 10.16 The probability that each type of Player I ascribes to each state of nature in Example 10.18

When we take the weighted average of the rows of the table (i.e., compute the right-hand side of the equal sign in Equation (10.64)), we obtain the probability distribution $p$ with which we started (listed in the right-most column). We obtain a similar result with respect to Player II (Exercise 10.68); hence $p$ is a consistent distribution according to Definition 10.49. ◭
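The averaging condition of Equation (10.64) is easy to check mechanically for Player I. The following is a small sketch using the entries of Figure 10.16; the state labels `w11`, ..., `w22` and the function names are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction as F

STATES = ["w11", "w12", "w21", "w22"]

# Player I's type at each state of the world (rows of Figure 10.16).
PI_I = {
    "w11": {"w11": F(3, 7), "w12": F(4, 7), "w21": F(0), "w22": F(0)},
    "w12": {"w11": F(3, 7), "w12": F(4, 7), "w21": F(0), "w22": F(0)},
    "w21": {"w11": F(0), "w12": F(0), "w21": F(2, 3), "w22": F(1, 3)},
    "w22": {"w11": F(0), "w12": F(0), "w21": F(2, 3), "w22": F(1, 3)},
}

# The candidate common prior ("Probability" column of Figure 10.16).
P = {"w11": F(3, 10), "w12": F(4, 10), "w21": F(2, 10), "w22": F(1, 10)}


def average_type(beliefs, prior):
    """Right-hand side of Equation (10.65): the p-weighted average of the types."""
    return {t: sum(beliefs[w][t] * prior[w] for w in STATES) for t in STATES}


def satisfies_averaging(beliefs, prior):
    """True if the prior equals the weighted average of the types (Equation (10.64))."""
    return average_type(beliefs, prior) == prior


print(satisfies_averaging(PI_I, P))  # True
```

Exact fractions are used so that the comparison is an equality test rather than a floating-point approximation. Running the same check for Player II would require his types, which are not reproduced in this section.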
Definition 10.46 does not require the support of a consistent distribution $p$ to be all of $Y$. The next theorem states, however, that the support of such a probability distribution is also a consistent belief subspace.

Theorem 10.50 Let $\Pi = (Y, \mathcal{Y}, s, (\pi_i)_{i \in N})$ be a belief space in which $Y$ is a finite set. If $p$ is a consistent distribution over $Y$, then $\widetilde{Y} = \mathrm{supp}(p)$ is a consistent belief subspace.
The proof of Theorem 10.50 is left to the reader (Exercise 10.70). As the next example shows, it is possible for a consistent belief space to have several different consistent distributions.

Example 10.51 Consider the following belief space, with one player, $N = \{I\}$, two states of nature, $S = \{s_1, s_2\}$, and two states of the world, $Y = \{\omega_1, \omega_2\}$, where:

State of the world    s(·)    π_I(·)
ω1                    s1      [1(ω1)]
ω2                    s2      [1(ω2)]

That is, at $\omega_1$ the player believes that the state of nature is $s_1$, and at $\omega_2$ he believes that the state of nature is $s_2$. For each $\lambda \in [0, 1]$, the probability distribution $p^{\lambda}$, defined as follows, is consistent (verify!):

\[ p^{\lambda}(\omega_1) = \lambda, \quad p^{\lambda}(\omega_2) = 1 - \lambda. \tag{10.66} \]

◭
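The claim of Example 10.51 can be spot-checked with the averaging condition of Definition 10.49. This is only a sketch over a handful of sample values of λ, not a proof for all λ in [0, 1]; the labels `w1`, `w2` and the function names are illustrative assumptions.

```python
from fractions import Fraction as F

STATES = ("w1", "w2")

# The single player's type at each state of the world in Example 10.51.
PI_I = {
    "w1": {"w1": F(1), "w2": F(0)},
    "w2": {"w1": F(0), "w2": F(1)},
}


def p_lambda(lam):
    """The distribution of Equation (10.66) for a given weight lam in [0, 1]."""
    return {"w1": lam, "w2": 1 - lam}


def is_average_of_types(beliefs, prior):
    """Check Equation (10.65): prior(t) equals the sum over w of beliefs[w][t] * prior[w]."""
    return all(
        sum(beliefs[w][t] * prior[w] for w in STATES) == prior[t] for t in STATES
    )


# Sample values 0, 1/4, 1/2, 3/4, 1 (an arbitrary grid).
lambdas = [F(k, 4) for k in range(5)]
results = {lam: is_average_of_types(PI_I, p_lambda(lam)) for lam in lambdas}
print(all(results.values()))  # True
```

Because each type here is a point mass on its own state, the weighted average reproduces whatever weights are used, which is why every sampled λ passes.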
The belief space $Y = \{\omega_1, \omega_2\}$ in Example 10.51 properly contains two belief subspaces: $\{\omega_1\}$ and $\{\omega_2\}$. As the next theorem states, this is the only possible way in which multiple consistent distributions can arise.

Theorem 10.52 Let $\Pi = (Y, \mathcal{Y}, s, (\pi_i)_{i \in N})$ be a consistent belief space in which $Y$ is finite and which does not properly contain a belief subspace. Then there exists a unique consistent probability distribution $p$ whose support $\mathrm{supp}(p)$ is contained in $Y$.

By definition, for each consistent belief subspace $\widetilde{Y}$ there exists a consistent distribution $p$ satisfying $\mathrm{supp}(p) \subseteq \widetilde{Y}$. Theorem 10.52 states that if the belief subspace is minimal, then there exists a unique such probability distribution. It then follows from Theorem 10.50 that $\mathrm{supp}(p) = \widetilde{Y}$.

In order to prove Theorem 10.52, we will first prove the following auxiliary theorem. We adopt here the convention that $\frac{0}{0} = 1$.

Theorem 10.53 Let $(\alpha_k)_{k=1}^{n}$ be nonnegative numbers, let $(x_k, y_k)_{k=1}^{n}$ be a sequence of pairs of positive numbers, and let $C > 1$. If $\frac{x_k}{y_k} \le C$ for all $k \in \{1, 2, \ldots, n\}$, then $\frac{\sum_{k=1}^{n} \alpha_k x_k}{\sum_{k=1}^{n} \alpha_k y_k} \le C$. If, in addition, there exists $j \in \{1, 2, \ldots, n\}$ such that $\alpha_j > 0$ and $\frac{x_j}{y_j} < C$, then $\frac{\sum_{k=1}^{n} \alpha_k x_k}{\sum_{k=1}^{n} \alpha_k y_k} < C$.

Proof: If $\alpha_k = 0$ for each $k$, then both the numerator and the denominator in the expression $\frac{\sum_{k=1}^{n} \alpha_k x_k}{\sum_{k=1}^{n} \alpha_k y_k}$ are zero; hence, by the convention adopted above, their ratio equals 1, which is smaller than $C$. Suppose, therefore, that at least one of the numbers in $(\alpha_k)_{k=1}^{n}$ is positive. This implies that both the numerator and the denominator in the expression $\frac{\sum_{k=1}^{n} \alpha_k x_k}{\sum_{k=1}^{n} \alpha_k y_k}$ are positive. Since $\frac{x_k}{y_k} \le C$ for each $k \in \{1, 2, \ldots, n\}$, it follows that $x_k \le C y_k$. Because $(\alpha_k)_{k=1}^{n}$ are nonnegative numbers,

\[ \sum_{k=1}^{n} \alpha_k x_k \le C \sum_{k=1}^{n} \alpha_k y_k. \tag{10.67} \]
Since the sum on the right-hand side of the equation is positive, we can divide by that sum to get

\[ \frac{\sum_{k=1}^{n} \alpha_k x_k}{\sum_{k=1}^{n} \alpha_k y_k} \le C. \tag{10.68} \]

Next, suppose that there also exists $j \in \{1, 2, \ldots, n\}$ such that $\alpha_j > 0$ and $\frac{x_j}{y_j} < C$. In this case, we deduce that Equation (10.67) is satisfied as a strict inequality (verify!). It follows that Equation (10.68) is also satisfied as a strict inequality, which is what we sought to show.
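One way to carry out the verification of the strict inequality, offered here as a sketch that uses only the hypotheses of Theorem 10.53: since $\alpha_j > 0$ and $x_j < C y_j$, the $j$-th term satisfies $\alpha_j x_j < C \alpha_j y_j$, while every other term satisfies $\alpha_k x_k \le C \alpha_k y_k$. Summing gives

\[ \sum_{k=1}^{n} \alpha_k x_k = \alpha_j x_j + \sum_{k \ne j} \alpha_k x_k < C \alpha_j y_j + C \sum_{k \ne j} \alpha_k y_k = C \sum_{k=1}^{n} \alpha_k y_k. \]

Dividing both sides by the positive sum $\sum_{k=1}^{n} \alpha_k y_k$ preserves the strict inequality.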
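Proof of Theorem 10.52: Let $p$ and $\widehat{p}$ be two distinct consistent probability distributions, where the supports $\mathrm{supp}(p)$ and $\mathrm{supp}(\widehat{p})$ are contained in $Y$. Theorem 10.50 implies that $\mathrm{supp}(p)$ is a belief subspace. Since $Y$ is a minimal belief subspace, it must be the case that $\mathrm{supp}(p) = Y$. We similarly deduce that $\mathrm{supp}(\widehat{p}) = Y$ and therefore in particular $\mathrm{supp}(\widehat{p}) = \mathrm{supp}(p)$. Denote $C := \max_{\omega \in \mathrm{supp}(p)} \frac{p(\omega)}{\widehat{p}(\omega)}$. Since $\mathrm{supp}(\widehat{p}) = Y$, the denominator $\widehat{p}(\omega)$ is positive for all $\omega \in Y$; therefore $C$ is well defined. Since $p$ and $\widehat{p}$ are different probability distributions, it must be the case that $C > 1$. To see why, suppose that $C \le 1$. Then $p(\omega) \le \widehat{p}(\omega)$ for each $\omega \in Y$, and since $\sum_{\omega \in Y} p(\omega) = 1 = \sum_{\omega \in Y} \widehat{p}(\omega)$, we have $p(\omega) = \widehat{p}(\omega)$ for each $\omega \in Y$. Denote

\[ A := \left\{ \omega \in Y : \frac{p(\omega)}{\widehat{p}(\omega)} = C \right\}. \tag{10.69} \]

We now show that Theorem 10.53 implies that $\mathrm{supp}(\pi_i(\omega')) \subseteq A$ for each $\omega' \in A$. Let $\omega' \in A$ be an element of $A$, and write out the expressions in Theorem 10.53 with $(\pi_i(\{\omega'\} \mid \omega))_{\omega \in Y}$ as the set of nonnegative numbers $(\alpha_k)_{k=1}^{n}$, and $(p(\omega))_{\omega \in Y}$ and $(\widehat{p}(\omega))_{\omega \in Y}$ as, respectively, the sets of positive numbers $(x_k)_{k=1}^{n}$ and $(y_k)_{k=1}^{n}$. Since $p$ and $\widehat{p}$ are consistent distributions, with the aid of Definition 10.49, we get

\[ \frac{p(\omega')}{\widehat{p}(\omega')} = \frac{\sum_{\omega} \pi_i(\{\omega'\} \mid \omega)\,p(\omega)}{\sum_{\omega} \pi_i(\{\omega'\} \mid \omega)\,\widehat{p}(\omega)} = \frac{\sum_{k} \alpha_k x_k}{\sum_{k} \alpha_k y_k}. \tag{10.70} \]

If $\mathrm{supp}(\pi_i(\omega'))$ were not contained in $A$, then there would be a state $\omega \in \mathrm{supp}(\pi_i(\omega'))$ satisfying $\frac{p(\omega)}{\widehat{p}(\omega)} < C$. Then Theorem 10.53 would in turn imply that $\frac{p(\omega')}{\widehat{p}(\omega')} < C$, i.e., $\omega' \notin A$, which would be a contradiction. It follows that $\mathrm{supp}(\pi_i(\omega')) \subseteq A$ for each $\omega' \in A$; hence $A$ is a belief subspace (see Definition 10.21 on page 400). Since $Y$ is a minimal belief subspace, $A = Y$. We then deduce that $\frac{p(\omega)}{\widehat{p}(\omega)} = C > 1$ for each $\omega \in Y$, i.e., $p(\omega) > \widehat{p}(\omega)$. Summing over $\omega \in Y$, we get

\[ 1 = \sum_{\omega \in Y} p(\omega) > \sum_{\omega \in Y} \widehat{p}(\omega) = 1. \tag{10.71} \]

This contradiction establishes that $p = \widehat{p}$.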
The consistency presented in this section is an “objective” concept, by which we mean objective from the perspective of an outside observer, who knows the belief space $\Pi = (Y, \mathcal{Y}, s, (\pi_i)_{i \in N})$ and can verify whether or not a given state of the world $\omega \in Y$ is consistent according to Definition 10.46. But what are the beliefs of the players about
the consistency of a given state of the world? If a player believes that the state of the world is consistent, then he can describe the situation he is in as a Harsanyi game with incomplete information, and choose his actions by analyzing that game. The following theorem relates to this question.

Theorem 10.54 Let $\Pi = (Y, \mathcal{Y}, s, (\pi_i)_{i \in N})$ be a belief space, where $Y$ is a finite set, and let $\omega \in Y$ be a consistent state of the world. Then it is common belief among the players at $\omega$ that $\omega$ is consistent. In particular, every player at $\omega$ believes that $\omega$ is consistent.

Proof: Since $\omega$ is a consistent state of the world, there exists a consistent distribution $p$ over $Y$ satisfying $p(\omega) > 0$. Since $p(\omega) > 0$, Equation (10.56) implies that for each player $i \in N$,

\[ \pi_i(\{\omega\} \mid \omega) = p(\{\omega\} \mid P_i(\omega)) > 0. \tag{10.72} \]

It follows from Theorem 10.32 on page 405 that for each pair of players $i, j \in N$,

\[ \widetilde{Y}_j(\omega) = \widetilde{Y}_i(\omega), \tag{10.73} \]

where $\widetilde{Y}_i(\omega)$ is the minimal belief space of player $i$ at the state of the world $\omega$ (see Definition 10.30 on page 403). Note that $\omega \in \mathrm{supp}(\pi_i(\omega))$ for each player $i \in N$; hence $\omega \in \widetilde{Y}_i(\omega)$. Theorem 10.33 implies that $\widetilde{Y}_i(\omega) = \widetilde{Y}(\omega)$ for each $i \in N$, where $\widetilde{Y}(\omega)$ is the minimal belief space at the state of the world $\omega$ (see Definition 10.24 on page 401). Let $\widetilde{p}$ be the probability distribution $p$, conditioned on the set $\widetilde{Y}(\omega)$,

\[ \widetilde{p}(\omega') = \frac{p(\omega')}{p(\widetilde{Y}(\omega))}, \quad \forall \omega' \in \widetilde{Y}(\omega). \tag{10.74} \]

Then $\widetilde{p}$ is a consistent distribution (Exercise 10.72). It follows that $\widetilde{Y}(\omega)$ is a consistent belief subspace. As stated after Definition 10.21 (page 400), every belief subspace is common belief at every state of the world contained in it; hence the event $\widetilde{Y}(\omega)$ is common belief among the players at the state of the world $\omega$. In particular, it is common belief among the players at $\omega$ that $\omega$ is consistent.

Every player $i$, based only on his own private information (i.e., his type), can construct the minimal belief subspace $\widetilde{Y}_i(\omega)$ that includes, according to his beliefs, all the states of the world that are relevant to the situation he is in. If the state of the world $\omega$ is consistent, then the belief subspace $\widetilde{Y}_i(\omega)$ is also consistent, and, since it is a minimal belief subspace, there is a unique consistent distribution $p$ over it (Theorem 10.52), which the player can compute. In this case, the situation, according to player $i$'s beliefs, is equivalent to a Harsanyi model of incomplete information. That model is constructed by first selecting a state of the world according to the consistent distribution $p$ over $\widetilde{Y}_i(\omega)$; hence the situation, according to player $i$'s beliefs, is equivalent to the interim stage, which is the point in time at which every player knows his partition element, which contains the true state of the world. Every Aumann model, or equivalently Harsanyi model, is but an auxiliary construction that can be made by each of the players. The entire model thus constructed, including the space of types (which is computed from the belief subspace $\widetilde{Y}_i(\omega)$), and the probability distribution used to choose the types (which is derived from the consistent distribution $p$), is based on the private information of the player. In addition,
when $\omega$ is a consistent state of the world, Theorem 10.54 states that every player computes the same minimal belief space, and computes the same consistent distribution $p$, and it is common belief among the players that this is the case. In particular, all the players arrive at the same Aumann (or Harsanyi) model of incomplete information, and this model is common belief among them.

The next question that naturally arises is what are the beliefs of the players about the consistency of a given state of the world when that state of the world is inconsistent. As we will now show, in that case there are two possibilities: it is possible for a player to believe that the state of the world is inconsistent, and in some cases this may even be common belief among the players. However, it is also possible for a player (mistakenly) to believe that the state of the world is consistent, and it is even possible for all the players to believe that the state of the world is consistent, and for that fact to be common belief among the players.

Theorem 10.55 Let $\Pi = (Y, \mathcal{Y}, s, (\pi_i)_{i \in N})$ be a belief space, where $Y$ is a finite set, and let $\omega \in Y$ be an inconsistent state of the world. If $\pi_i(\{\omega\} \mid \omega) > 0$, then at the state of the world $\omega$ player $i$ believes that the state of the world is inconsistent.

Proof: The assumption that $\pi_i(\{\omega\} \mid \omega) > 0$ implies that $\omega \in \widetilde{Y}_i(\omega)$; hence $\widetilde{Y}_i(\omega)$ is a belief subspace containing $\omega$. Since $\omega$ is an inconsistent state of the world, there does not exist a consistent distribution $p$ over $\widetilde{Y}_i(\omega)$ satisfying $p(\omega) > 0$; hence player $i$, after calculating his minimal belief subspace $\widetilde{Y}_i(\omega)$, believes that the state of the world is inconsistent.
It follows from Theorem 10.55 that at an inconsistent state of the world $\omega$, player $i$ is liable (mistakenly) to believe that the state of the world is consistent only if he ascribes probability 0 to $\omega$; that is, $\pi_i(\{\omega\} \mid \omega) = 0$. This happens in fact in Example 10.27 on page 402, in which the state of the world $\omega_2$ is inconsistent, but at this state of the world, both players believe that the actual state of the world is consistent. In fact, in Example 10.27, at the state of the world $\omega_2$ it is common belief among the players that the state of the world is consistent, even though it is inconsistent (Exercise 10.80).

In Example 10.19 on page 399, every state of the world is inconsistent; hence the fact that any given state of the world is inconsistent is common belief among the players at every state of the world. The next theorem generalizes this example, and presents a sufficient condition that guarantees that the fact that a given state of the world is inconsistent is common belief among the players.

Theorem 10.56 Let $\Pi = (Y, \mathcal{Y}, s, (\pi_i)_{i \in N})$ be a belief space, where $Y$ is a finite set, and let $\omega \in Y$ be an inconsistent state of the world. If $\pi_i(\{\omega'\} \mid \omega') > 0$ for every player $i$ and every state of the world $\omega'$ in the minimal belief subspace $\widetilde{Y}(\omega)$ at $\omega$, then the fact that the state of the world is inconsistent is common belief among the players at $\omega$, and at every state of the world in $\widetilde{Y}(\omega)$.

Proof: Because $\omega \in \widetilde{Y}(\omega)$, the assumption implies that $\pi_i(\{\omega\} \mid \omega) > 0$ for every player $i \in N$. By Theorem 10.32 (page 405), $\widetilde{Y}(\omega) = \widetilde{Y}_i(\omega)$ for each player $i \in N$. It then follows from Theorem 10.55 that at every state of the world $\omega' \in \widetilde{Y}(\omega)$, every player believes that
the state of the world is inconsistent. In particular, it follows that at every state of the world $\omega' \in \widetilde{Y}(\omega)$ it is common belief that the state of the world is inconsistent.
There are cases in which at a given state of the world, every player believes that the state of the world is inconsistent, but this fact is not common belief among the players (Exercise 10.81). There are also cases in which some of the players believe that the state of the world is consistent while others believe that the state of the world is inconsistent (Exercise 10.82).

Most of the models of incomplete information used in the game theory literature are Harsanyi games. This means that nearly every model in published papers is described using a consistent belief system, despite the fact that, as we have seen, not only do inconsistent belief spaces exist, they comprise an “absolute majority” of situations of incomplete information: the set of consistent situations is a set of measure zero within the set of belief spaces. The main reason that consistent models are ubiquitous in the literature is that consistent models are presentable as extensive-form games, while situations of inconsistent beliefs cannot be presented as either extensive-form or strategic-form games. This makes the mathematical study of such situations difficult. It should, however, be reiterated that the central solution concept, Bayesian equilibrium, is computable and applicable in both consistent and inconsistent situations.
10.7 Remarks
Example 10.16 (page 396) appears in Sorin and Zamir [1985], under the name “Lack of information on one-and-a-half sides.” Results on Nash equilibria and Bayesian equilibria in games with incomplete information and general space of states of the world can be found in many papers. A partial list includes Milgrom and Weber [1985], Milgrom and Roberts [1990], van Zandt and Vives [2007], van Zandt [2007], and Vives [1990]. Exercises 10.37–10.40 are taken from Mertens and Zamir [1985]. The notion “belief with probability at least p” that appears in Exercise 10.15 was defined in Monderer and Samet [1989]. Discussion about the subject of probabilistic beliefs appeared in Gaifman [1986]. Exercise 10.62 is taken from van Zandt [2007]. The authors thank Yaron Azrieli, Aviad Heifetz, Dov Samet, and Eran Shmaya for their comments on this chapter.
10.8 Exercises
10.1 Let the set of players be $N = \{I, II\}$, the set of states of nature be $S = \{s_1, s_2\}$, and the set of states of the world be $Y = \{\omega_1, \omega_2, \omega_3\}$. The $\sigma$-algebra over $Y$ is $\mathcal{Y} = \{\emptyset, \{1\}, \{2, 3\}, \{1, 2, 3\}\}$, and the function $s$ mapping the states of the world to the states of nature is given by

\[ s(\omega_1) = s_1, \quad s(\omega_2) = s(\omega_3) = s_2. \tag{10.75} \]

For each of the following three belief functions, determine whether $(Y, \mathcal{Y}, s, (\pi_i)_{i \in N})$ is a belief space of $N$ over $S$. Justify your answers.
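(a) $\pi_I(\omega_1) = [1(\omega_1)]$, $\pi_I(\omega_2) = \pi_I(\omega_3) = [\tfrac{1}{3}(\omega_2), \tfrac{2}{3}(\omega_3)]$, $\pi_{II}(\omega_1) = \pi_{II}(\omega_2) = \pi_{II}(\omega_3) = [\tfrac{4}{7}(\omega_2), \tfrac{3}{7}(\omega_3)]$.
(b) $\pi_I(\omega_1) = \pi_I(\omega_2) = \pi_I(\omega_3) = [\tfrac{1}{4}(\omega_1), \tfrac{3}{4}(\omega_2)]$, $\pi_{II}(\omega_1) = [1(\omega_1)]$, $\pi_{II}(\omega_2) = \pi_{II}(\omega_3) = [\tfrac{1}{3}(\omega_2), \tfrac{2}{3}(\omega_3)]$.
(c) $\pi_I(\omega_1) = [\tfrac{1}{3}(\omega_1), \tfrac{1}{3}(\omega_2), \tfrac{1}{3}(\omega_3)]$, $\pi_I(\omega_2) = \pi_I(\omega_3) = [\tfrac{1}{6}(\omega_2), \tfrac{5}{6}(\omega_3)]$, $\pi_{II}(\omega_1) = [1(\omega_1)]$, $\pi_{II}(\omega_2) = \pi_{II}(\omega_3) = [1(\omega_3)]$.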
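10.2 Let $\Pi = (Y, \mathcal{Y}, s, (\pi_i)_{i \in N})$ and $\widetilde{\Pi} = (\widetilde{Y}, \widetilde{\mathcal{Y}}, \widetilde{s}, (\widetilde{\pi}_i)_{i \in N})$ be two belief spaces satisfying $Y \cap \widetilde{Y} = \emptyset$. Prove that $\widehat{\Pi} = (\widehat{Y}, \widehat{\mathcal{Y}}, \widehat{s}, (\widehat{\pi}_i)_{i \in N})$, as defined below, is a belief space.10

• $\widehat{Y} := Y \cup \widetilde{Y}$.
• $\widehat{\mathcal{Y}} := \{F \cup \widetilde{F} : F \in \mathcal{Y}, \widetilde{F} \in \widetilde{\mathcal{Y}}\}$.
• $\widehat{s}(\omega) = s(\omega)$ for every $\omega \in Y$ and $\widehat{s}(\widetilde{\omega}) = \widetilde{s}(\widetilde{\omega})$ for every $\widetilde{\omega} \in \widetilde{Y}$.
• $\widehat{\pi}_i(\omega) = \pi_i(\omega)$ for every $\omega \in Y$ and $\widehat{\pi}_i(\widetilde{\omega}) = \widetilde{\pi}_i(\widetilde{\omega})$ for every $\widetilde{\omega} \in \widetilde{Y}$.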
10 Every probability distribution $\pi$ over the space $Y$ can be regarded as a probability distribution over the space $Y \cup \widetilde{Y}$ such that $\pi(\widetilde{Y}) = 0$.

10.3 Prove Equation (10.10) on page 390.

10.4 Minerva ascribes probability 0.7 to Hercules being able to lift a massive rock, and she believes that Hercules believes that he can lift the rock. Construct a belief space in which the described situation is represented by a state of the world and indicate that state (more than one answer is possible).

10.5 Minerva ascribes probability 0.7 to Hercules being able to lift a massive rock, and she believes that Hercules believes that he can lift the rock. Hercules, in contrast, believes that if he attempts to lift the rock he will fail to do so. Construct a belief space in which the described situation is represented by a state of the world and indicate that state (more than one answer is possible).

10.6 Eric believes that it is common belief among him and Jack that the New York Mets won the baseball World Series in 1969. Jack ascribes probability 0.5 to the New York Mets having won the World Series in 1969, and to Eric believing that it is common belief among the two of them that the New York Mets won the World Series in 1969. Jack also ascribes probability 0.5 to the New York Mets not having won the World Series in 1969, and to Eric believing that it is not common belief among the two of them that the New York Mets won the World Series in 1969. Construct a belief space in which the described situation is represented by a state of the world and indicate that state (more than one answer is possible).

10.7 (a) Using two states of the world describe the following situation, specifying how each state differs from the other: “Roger ascribes probability 0.4 to the Philadelphia Phillies winning the World Series.”

(b) Add to the two states of the world that you listed above two more states of the world, and use the four states to construct a belief space in which the following situation is represented as a state of the world: “Jimmy ascribes probability 0.3 to the Philadelphia Phillies winning the World Series and to Roger ascribing probability 0.4 to the Philadelphia Phillies winning the World Series, and Jimmy ascribes probability 0.7 to the Philadelphia Phillies not winning the World Series and to Jimmy ascribing 0.4 to the Philadelphia Phillies winning the World Series.”

(c) Construct a belief space in which the following situation is represented by a state of the world and indicate that state. Roger says:

• I ascribe probability 0.3 to “the Philadelphia Phillies will win the World Series, and Jimmy ascribes probability 0.3 to the Philadelphia Phillies winning the World Series and to my ascribing 0.4 to the Philadelphia Phillies winning the World Series, and Jimmy ascribes probability 0.7 to the Philadelphia Phillies not winning the World Series and to my ascribing 0.4 to the Philadelphia Phillies winning the World Series.”
• I ascribe probability 0.2 to “the Philadelphia Phillies will win the World Series, and Jimmy ascribes probability 0.7 to the Philadelphia Phillies winning the World Series and to my ascribing 0.5 to the Philadelphia Phillies winning the World Series, and Jimmy ascribes probability 0.3 to the Philadelphia Phillies not winning the World Series and to my ascribing 0.5 to the Philadelphia Phillies winning the World Series.”
• I ascribe probability 0.4 to “the Philadelphia Phillies will win the World Series, and Jimmy ascribes probability 0.2 to the Philadelphia Phillies winning the World Series and to my ascribing 0.1 to the Philadelphia Phillies winning the World Series, and Jimmy ascribes probability 0.8 to the Philadelphia Phillies not winning the World Series and to my ascribing 0.1 to the Philadelphia Phillies winning the World Series.”
• I ascribe probability 0.1 to “the Philadelphia Phillies will win the World Series, and Jimmy ascribes probability 0.6 to the Philadelphia Phillies winning the World Series and to my ascribing 0.4 to the Philadelphia Phillies winning the World Series, and Jimmy ascribes probability 0.4 to the Philadelphia Phillies not winning the World Series and to my ascribing 0.3 to the Philadelphia Phillies winning the World Series.”

10.8 Prove Theorem 10.7: player $i$'s belief operator $B_i$ (see Definition 10.6 on page 392) satisfies the following properties:

(a) $B_i Y = Y$: player $i$ believes that $Y$ is the set of all states of the world.
(b) $B_i A \cap B_i C = B_i(A \cap C)$: if player $i$ believes that event $A$ obtains, and that event $C$ obtains, then he believes that event $A \cap C$ obtains.
(c) $B_i(B_i A) = B_i A$: if player $i$ believes that event $A$ obtains, then he believes that he believes that event $A$ obtains.
(d) $(B_i A)^c = B_i((B_i A)^c)$: if player $i$ does not believe that event $A$ obtains, then he believes that he does not believe that event $A$ obtains.

10.9 Let $\Pi$ be a belief space equivalent to an Aumann model of incomplete information and let $B_i$ be player $i$'s belief operator in $\Pi$ (see Definition 10.6 on page 392).
Prove that the knowledge operator $K_i$ that is defined by the partition via Equation (10.8) (page 390) is the same operator as the belief operator $B_i$.

10.10 In this exercise we show that the converse of the statement of Theorem 10.10 does not hold. Find an example of a belief space in which there exists a state of the world $\omega \in Y$, an event $A$ that is common belief among the players at the state of the world $\omega$, a player $i \in N$, and a state of the world $\omega' \in A$, such that $\pi_i(A \mid \omega') < 1$.

10.11 In this exercise we show that Theorem 10.11 on page 393 does not hold without the assumption that $P(\omega) > 0$ for every state of the world $\omega \in Y$. Prove that there exists an Aumann model of incomplete information in which the common prior $P$ satisfies $P(\omega) = 0$ in at least one state of the world $\omega \in Y$, such that the following claim holds: the knowledge operator in the Aumann model is not identical to the belief operator in the belief space equivalent to the Aumann model.

10.12 Describe in words the beliefs of the players about the state of nature and the beliefs of the other players about the state of nature in each of the states of the world in Example 10.4 (page 389).

10.13 Prove Theorem 10.8 (page 392): for each player $i \in N$ and every pair of events $A, C \subseteq Y$, if $A \subseteq C$, then $B_i A \subseteq B_i C$.

10.14 Let $\Pi = (Y, \mathcal{Y}, s, (\pi_i)_{i \in N})$ be a belief space equivalent to an Aumann model of incomplete information. Prove that the partition defined by Equation (10.8) (page 390) for the belief space $\Pi$ is the same partition as the partition $\mathcal{F}_i$ in the equivalent Aumann model.

10.15 For every player $i \in N$ and every real number $p \in [0, 1]$, define $B_i^p$ to be the operator mapping each set $E \in \mathcal{Y}$ to the set of states of the world at which player $i$ ascribes to $E$ probability equal to or greater than $p$,

\[ B_i^p(E) := \{\omega \in Y : \pi_i(E \mid \omega) \ge p\}. \tag{10.76} \]

Which of the following properties are satisfied by $B_i^p$ for $p \in [0, 1]$? For each property, either prove that it is satisfied, or present a counterexample. There may be different answers for different values of $p$.

(a) $B_i^p(Y) = Y$.
(b) $B_i^p(A) \subseteq A$.
(c) If $A \subseteq C$ then $B_i^p(A) \subseteq B_i^p(C)$.
(d) $B_i^p(B_i^p(A)) = B_i^p(A)$.
(e) $B_i^p((B_i^p(A))^c) = (B_i^p(A))^c$.
10.16 Prove that if $\widetilde{Y}$ is a belief subspace of a belief space $\Pi = (Y, \mathcal{Y}, s, (\pi_i)_{i \in N})$, then the event $\widetilde{Y}$ is common belief among the players at every state of the world $\omega \in \widetilde{Y}$.

10.17 Let $\Pi = (Y, \mathcal{Y}, s, (\pi_i)_{i \in N})$ be a belief space equivalent to an Aumann model of incomplete information. Prove that for each state of the world $\omega$, the common knowledge component among the players at $\omega$ (see page 333) is a belief subspace.
10.18 Consider Example 10.19 (page 399), and suppose that the beliefs of the types in that example are as follows, where $x, y, z, w \in [0, 1]$. For which values of $x$, $y$, $z$, and $w$ is the belief system of the players consistent?

The beliefs of Player I:

        II1     II2
I1      x       1 − x
I2      y       1 − y

The beliefs of Player II:

        II1     II2
I1      z       w
I2      1 − z   1 − w
10.19 Prove that the beliefs of the players in Example 10.20 (page 400) are inconsistent. Recall that when the set of states of the world is a topological space, we require that a belief subspace be a closed set (see Remark 10.34).

10.20 Consider the following belief space, where the set of players is $N = \{I, II\}$, and the set of states of nature is $S = \{s_1, s_2\}$.

State of the world ω1
s(·)
s1
ω2
s2
ω3
s2
πI (·) 3 5 (ω1 ), 5 (ω2 ) 2 3 5 (ω1 ), 5 (ω2 ) 2
[1(ω1 )]
πII (·) [1(ω1 )] [ 3 (ω2 ), 34 4 (ω2 ),
1 4 (ω3 )] 1 4 (ω3 )
(a) List the types of the two players at each state of the world in $Y$.
(b) Can the beliefs of the players be derived from a common prior? If so, what is that common prior? If not, justify your answer.

10.21 Boris believes that “it is common belief among me and Alex that Bruce Jenner won a gold medal at the Montreal Olympics,” while Alex believes that “it is common belief among me and Boris that Bruce Jenner won a silver medal at the Montreal Olympics.”

(a) Construct a belief space in which the described situation is represented by a state of the world and indicate that state.
(b) Prove that, in any belief space in which the set of states of the world is a finite set and contains a state $\omega$ describing the situation in this exercise, $\omega$ is not contained in the support of the beliefs of either Boris or Alex, at any state of the world. In other words, $\omega \notin \mathrm{supp}(\pi_i(\omega'))$ for any state of the world $\omega'$, for $i \in \{\text{Boris}, \text{Alex}\}$.

10.22 Laocoön declares: “I ascribe probability 0.6 to the Greeks attacking us from within a wooden horse.” Priam then declares: “I ascribe probability 0.7 to the Greeks attacking us from within a wooden horse.” After Priam’s declaration, is the fact that “Laocoön ascribes probability 0.6 to the Greeks attacking from within a wooden horse” common belief among the players? Justify your answer.

10.23 There are two players, $N = \{I, II\}$, and two states of nature $S = \{s_1, s_2\}$. A chance move chooses the state of nature, where $s_1$ is chosen with probability 0.4, and $s_2$ is
chosen with probability 0.6. Player I knows the true state of nature that has been chosen. A chance move selects a signal that is received by Player II. The signal depends on the state of nature, as follows: if the true state of nature is s1 , Player II receives signal R with probability 0.6, and signal L with probability 0.4; if the true state of nature is s2 , Player II receives signal M with probability 0.7, and signal L with probability 0.3. It follows that if Player II receives signal L, he does not know with certainty which state of nature has been chosen. If the state of nature that has been chosen is s2 , and Player II has received signal M, then Player I is informed of this with probability 0.2, and Player I is not informed of this with probability 0.8. This description is common belief among the players. Construct a belief space in which the described situation is represented by a state of the world and indicate that state. 10.24 Repeat Exercise 10.23, under the assumption that the players do not agree on the probability distribution according to which the state of nature is chosen; that is, there is no common prior over S: Player I believes that s1 is chosen with probability 0.4, while Player II believes that s1 is chosen with probability 0.5. The rest of the description of the situation is as in Exercise 10.23, and this description is common belief among the players. 10.25 John, Bob, and Ted meet at a party in which all the invitees are either novelists or poets (but no one is both a novelist and a poet). Every poet knows all the other poets, but no novelist knows any other attendee at the party, whether novelist or poet. Every novelist believes that one-quarter of the attendees are novelists. Construct a belief space describing the beliefs of John, Bob, and Ted about the others’ profession. 10.26 Walter, Karl, and Ferdinand are on the road to Dallas. They arrive at a fork in the road; should they turn right or left? 
Type t1 believes that “we should turn right, everyone here believes that we should turn right, everyone here believes that everyone here believes that we should turn right, etc.”: in other words, that type believes that turning right is called for, and believes that this is common belief among the three. Type t2 believes that “we should turn right, the two others believe that we should turn left, the two others believe that everyone here believes that we should turn left, the two others believe that everyone here believes that everyone here believes that we should turn left, etc.”: in other words, that type believes that turning right is called for, but believes that the other two believe that they should turn left and that this fact is common belief among the three. Type t3 does not know which way to turn, but believes that the two others know the right way to turn, and believes that the others believe that everyone knows the right way to turn: he believes “the probability that we should turn right is 1/2, and the probability that we should turn left is 1/2; if we should turn right, then the two others believe that we should turn right, and that this is common belief among everyone here, and if we should turn left, then the two others believe that we should turn left, and that this is common belief among everyone here.” Walter’s type is t1, Karl’s type is t2, and Ferdinand’s type is t3.
(a) Construct a belief space in which the described situation is represented by a state of the world and indicate that state. (b) What is the minimal belief subspace of each of the three players (at the state of the world in which Walter’s type is t1, Karl’s type is t2, and Ferdinand’s type is t3)?

10.27 Repeat Exercise 10.26, when all three players are of type t3.

10.28 In this exercise we show that when there is no common prior, it is possible to find a lottery that satisfies the property that each player has a positive expectation of profiting from the lottery, using his subjective probability belief. Let Π = (Y, 𝒴, s, (πi)i∈N) be a belief space of the set of players N = {I, II}, where the set of states of the world Y is finite. For each i ∈ N, define a set Pi of probability distributions over Y as follows:

$$P_i := \Big\{ \sum_{\omega \in Y} x_\omega \, \pi_i(\cdot \mid \omega) \;:\; \sum_{\omega \in Y} x_\omega = 1,\ x_\omega > 0 \ \ \forall \omega \in Y \Big\} \subset \Delta(Y). \tag{10.77}$$
This is the set of all convex combinations of the beliefs (πi (· | ω))ω∈Y of player i such that the weight given to every ω is positive.
(a) Prove that for every p ∈ Pi and every ω ∈ Y, the belief πi(· | ω) is the conditional probability distribution of p given Fi(ω) (for the definition of the set Fi(ω) see Equation (10.8) on page 390). In other words, if p were a common prior, the beliefs of player i would be given by πi. (b) Prove that the set Pi is an open and convex set in Δ(Y), for every i ∈ N. (c) Prove that if there is no common prior, then PI and PII are disjoint sets. (d) Using Exercise 23.46 (page 956) prove that there exist α ∈ ℝ^|Y| and β ∈ ℝ such that¹¹

$$\langle \alpha, p_I \rangle > \beta > \langle \alpha, p_{II} \rangle, \quad \forall p_I \in P_I,\ \forall p_{II} \in P_{II}. \tag{10.78}$$
(e) The beliefs of Players I and II about the state of the world are given by the probability distributions (πi)i∈N. The state of the world, while unknown to them today, will become known to them tomorrow. They decide that after the state of the world ω is revealed to them, Player II will pay Player I the sum α(ω) − β. If this quantity is negative, the payment will be from Player I to Player II. Prove that, given his subjective beliefs, the expected payoff of each player under this procedure is positive.

10.29 Prove Theorem 10.23 (page 401): given two belief subspaces Π̃1 = (Ỹ1, 𝒴|Ỹ1, s, (πi)i∈N) and Π̃2 = (Ỹ2, 𝒴|Ỹ2, s, (πi)i∈N) of a belief space Π = (Y, 𝒴, s, (πi)i∈N) satisfying Ỹ1 ∩ Ỹ2 ≠ ∅, prove that (Ỹ1 ∩ Ỹ2, 𝒴|Ỹ1∩Ỹ2, s, (πi)i∈N) is also a belief subspace of Π.

10.30 Prove that if there exists a minimal belief subspace, then it is unique.
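The betting scheme of Exercise 10.28(e) can be illustrated on a toy instance. The sketch below assumes a hypothetical two-state belief space in which each player holds the same belief at every state, so that each set Pi reduces to a single point. The numbers, and the particular choice of α and β, are illustrative assumptions and not part of the exercise.

```python
from fractions import Fraction as F

# Hypothetical beliefs over Y = {w1, w2}; the same at every state, so that
# P_I = {belief_I} and P_II = {belief_II}. The two differ, so no common prior.
belief_I = [F(7, 10), F(3, 10)]
belief_II = [F(2, 5), F(3, 5)]

# Assumed separating pair (alpha, beta): <alpha, p_I> > beta > <alpha, p_II>.
alpha = [F(1), F(0)]
beta = F(11, 20)


def inner(p, a):
    """Inner product <p, a> = sum over states of p(w) * a(w)."""
    return sum(pw * aw for pw, aw in zip(p, a))


# Player II pays Player I the amount alpha(w) - beta once w is revealed.
gain_I = inner(belief_I, alpha) - beta     # Player I's subjective expectation
gain_II = beta - inner(belief_II, alpha)   # Player II's subjective expectation

print(gain_I, gain_II)
```

In this instance both subjective expectations equal 3/20, so each player expects to profit from the same bet even though the payments sum to zero.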
¹¹ The inner product is given by $\langle p, \alpha \rangle = \sum_{\omega \in Y} p(\omega)\alpha(\omega)$.
10.31 Generalize Theorem 10.25 to the case in which the set of states of the world is countably infinite: let Π = (Y, 𝒴, s, (πi)i∈N) be a belief space in which the set of states of the world Y is countably infinite. Prove that there exists a minimal belief subspace at each state of the world ω ∈ Y.

10.32 Prove Theorem 10.33 (page 405): let Π = (Y, 𝒴, s, (πi)i∈N) be a belief space, where Y is a finite set. Then for each state of the world ω ∈ Y,

$$\tilde{Y}(\omega) = \{\omega\} \cup \Big( \bigcup_{i \in N} \tilde{Y}_i(\omega) \Big). \tag{10.79}$$

10.33 Prove Theorem 10.35 (page 406): let Π = (Y, 𝒴, s, (πi)i∈N) be a belief space, where Y is a finite set. Then the subset Ỹ of Y is a belief subspace if and only if Ỹ is a closed component in the graph G defined by Π.

10.34 Prove Theorem 10.36 on page 406: let Π = (Y, 𝒴, s, (πi)i∈N) be a belief space, where Y is a finite set, let ω ∈ Y, and let i ∈ N. For each state of the world ω, let C(ω) be the minimal closed component containing ω in the graph corresponding to Π. Prove that

$$\tilde{Y}_i(\omega) = \bigcup_{\{\omega' \,:\, \pi_i(\{\omega'\} \mid \omega) > 0\}} C(\omega'). \tag{10.80}$$

10.35 Present an example of a belief space with three players, and a state of the world ω satisfying Ỹ1(ω) ⊃ Ỹ2(ω) ⊃ Ỹ3(ω) (where all the set inclusions are strict inclusions).

10.36 Prove or disprove the following. There exists a belief space Π = (Y, 𝒴, s, (πi)i∈N), where Y is a finite set, and there are two players, i, j ∈ N, such that there exists a state of the world ω ∈ Y satisfying the property that Ỹi(ω) ∩ Ỹj(ω) is nonempty and strictly included in both Ỹi(ω) and Ỹj(ω).
10.37 In this exercise, suppose there are four states of nature, S = {s11, s12, s21, s22}. The information that Player I receives is the first coordinate of the state of nature chosen, while the information that Player II receives is the second coordinate. The conditional probabilities of the players, given their respective information, are given by the following table (the conditional probability of Player I appears in the left column, while the conditional probability of Player II appears in the top row of the table):

                (3/5, 2/5)ᵀ    (0, 1)ᵀ
(1, 0)          s11            s12
(2/3, 1/3)      s21            s22

The table is to be read as stating, e.g., that if Player I receives information indicating that the state of nature is contained in {s11, s12}, he believes with probability 1 that the state of nature is s11.
(a) Construct a belief space in which the described situation is represented by a state of the world and indicate that state.

Suppose that the state of nature is s12, and that ω is the corresponding state of the world. Answer the following questions: (b) What are the minimal belief subspaces ỸI(ω) and ỸII(ω) of the players? (c) Is ỸI(ω) = ỸII(ω)? (d) Is there a common prior p over S such that the players agree that the state of the world has been chosen according to p? (e) Is the state of the world ω ascribed positive probability by p?

10.38 Repeat Exercise 10.37, where this time there are nine states of nature S = {s11, s12, s13, s21, s22, s23, s31, s32, s33} and the beliefs of the players, given their information, are presented in the following table:

[Top row: Player II's three conditional probability vectors, one per column; entries not recoverable.]
(1, 0, 0)        s11   s12   s13
(2/3, 1/3, 0)    s21   s22   s23
(0, 0, 1)        s31   s32   s33

10.39 Repeat Exercise 10.37, but this time suppose the beliefs of each player of each type are:

                (0, 1)ᵀ   (1, 0)ᵀ
(1, 0)          s11       s12
(0, 1)          s21       s22

Parts (b)–(e) of Exercise 10.37 relate to a situation in which the true state of nature is s12. Can each player calculate the minimal belief subspace of the other player at each state of the world? Justify your answer.

10.40 Repeat Exercise 10.37, where S includes 20 states of nature, the true state of nature is s13, and the beliefs of the players are given in the following table:

[Top row: Player II's five conditional probability vectors, one per column; entries not recoverable.]
(1, 0, 0, 0, 0)        s11   s12   s13   s14   s15
(1/3, 2/3, 0, 0, 0)    s21   s22   s23   s24   s25
(0, 0, 0, 1/4, 3/4)    s31   s32   s33   s34   s35
(0, 0, 0, 1/4, 3/4)    s41   s42   s43   s44   s45
10.41 Suppose there are two players, N = {I, II}, and four states of nature, S = {s11, s12, s21, s22}. Player I’s information is the first coordinate of the state of nature, and Player II’s information is the second coordinate. The beliefs of each player, given his information, about the other player’s type are given by the following tables:

        II1    II2
I1      1/4    3/4
I2      4/9    5/9
The beliefs of Player I

        II1    II2
I1      1/5    3/8
I2      4/5    5/8
The beliefs of Player II

(a) Find a belief space describing this situation. (b) Is this belief space consistent? If so, describe this situation as an Aumann model of incomplete information.

10.42 Repeat Exercise 10.41 for the following beliefs of the players:

        II1    II2
I1      1/4    3/4
I2      1/5    4/5
The beliefs of Player I

        II1    II2
I1      2/3    1/2
I2      1/3    1/2
The beliefs of Player II
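For 2 × 2 type structures such as those in Exercises 10.41 and 10.42, one convenient consistency test compares cross ratios. If a common prior P with full support exists, then Player I's beliefs are the rows of P divided by the row sums, and Player II's beliefs are the columns of P divided by the column sums. Neither rescaling changes the cross ratio P11·P22 / (P12·P21), and conversely equal cross ratios allow such a prior to be rebuilt. The sketch below is only an illustration under the assumption that all table entries are positive; the function names are hypothetical, and the tables are entered as read from the two exercises.

```python
from fractions import Fraction as F


def cross_ratio(table):
    """Odds ratio (t11 * t22) / (t12 * t21) of a 2x2 table with positive entries."""
    (t11, t12), (t21, t22) = table
    return (t11 * t22) / (t12 * t21)


def has_common_prior(beliefs_I, beliefs_II):
    """Rows are Player I's types, columns are Player II's types.

    beliefs_I[i][j]  = probability that type I_i assigns to II_j (rows sum to 1)
    beliefs_II[i][j] = probability that type II_j assigns to I_i (columns sum to 1)
    """
    return cross_ratio(beliefs_I) == cross_ratio(beliefs_II)


# Tables as read from Exercise 10.41.
ex41_I = [[F(1, 4), F(3, 4)], [F(4, 9), F(5, 9)]]
ex41_II = [[F(1, 5), F(3, 8)], [F(4, 5), F(5, 8)]]

# Tables as read from Exercise 10.42.
ex42_I = [[F(1, 4), F(3, 4)], [F(1, 5), F(4, 5)]]
ex42_II = [[F(2, 3), F(1, 2)], [F(1, 3), F(1, 2)]]

print(has_common_prior(ex41_I, ex41_II))
print(has_common_prior(ex42_I, ex42_II))
```

Exact rational arithmetic is used so that the equality test is not affected by rounding.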
10.43 Calculate the minimal belief subspaces of the two players at each state of the world in Example 10.20 (page 400). Recall that when the set of states of the world is a topological space, a belief subspace is required to be a closed set (Remark 10.34).

10.44 Prove or disprove: There exists a belief space in which the set of states of the world contains K states, and there are 2^K − 1 different belief subspaces (in other words, every subset of states of the world, except for the empty set, constitutes a belief subspace).

10.45 Prove or disprove: There exists a belief space comprised of three states of the world and six different belief subspaces.

10.46 Prove or disprove: There exists a belief space with N = {I, II} and a finite set of states of the world containing a state of the world ω such that ỸI(ω) ⊉ ỸII(ω) and ỸII(ω) ⊉ ỸI(ω).

10.47 Prove or disprove: For each ω ∈ Y, and each player i ∈ N, the set Ỹi(ω) ∪ {ω} is a belief subspace.
10.48 Prove that every Harsanyi game with incomplete information (see Definition 9.39 on page 347) is a game with incomplete information according to Definition 10.37 on page 407. 10.49 Prove Theorem 10.40 (page 409): the strategy vector σ ∗ = (σi∗ )i∈N is a Bayesian equilibrium if and only if for each player i ∈ N, each state of the world ω ∈ Y ,
and each action ai,ω ∈ Ai(ω),

$$\gamma_i(\sigma^* \mid \omega) \ge \gamma_i((\sigma^*; a_{i,\omega}) \mid \omega). \tag{10.81}$$
10.50 In this exercise we show that for studying the set of Bayesian equilibria, one may assume that the action sets of the players are independent of the state of nature. Let G = (N, S, (Ai)i∈N, Π) be a game with incomplete information such that the payoff functions (ui)i∈N are uniformly bounded from below:

$$M := \inf_{i \in N} \inf_{s \in S} \inf_{a \in A(s)} u_i(s; a) > -\infty. \tag{10.82}$$

Let Ĝ = (N, S, (Âi)i∈N, Π) be the game with incomplete information defined as follows:

• The action sets of the players are independent of the state of nature: Âi(s) = Ai for every player i ∈ N and every state of nature s ∈ S.
• For each player i ∈ N, the payoff function ûi is a real-valued function defined over the set A = ×i∈N Ai and given by

$$\hat{u}_i(s; a) = \begin{cases} u_i(s; a) & a \in A(s), \\ M & a_i \in A_i(s),\ a \notin A(s), \\ M - 1 & a_i \notin A_i(s), \end{cases} \tag{10.83}$$

where A(s) = ×i∈N Ai(s). In other words, if at least one player j ≠ i chooses an action that is not in Aj(s), while player i chooses an action in Ai(s), player i receives payoff M, and if player i chooses an action that is not in Ai(s), he receives a payoff that is less than M.

Prove that the set of Bayesian equilibria of the game G coincides with the set of Bayesian equilibria of the game Ĝ.
10.51 Prove that there exists a Bayesian equilibrium (in behavior strategies) in every game with incomplete information in which the set of players is finite, the number of types of each player is countable, the number of actions of each type is finite, and the payoff functions are uniformly bounded.

10.52 In this exercise we generalize Corollary 4.27 (page 105) to Bayesian equilibria. Suppose that for every player i in a game with incomplete information there exists a strategy σi∗ that weakly dominates all his other strategies; in particular,

$$U_i(s(\omega); \sigma^*(\omega)) \ge U_i(s(\omega); \sigma^*_{-i}(\omega), a_i), \quad \forall i \in N,\ \forall \omega \in Y,\ \forall a_i \in A_i(s(\omega)).$$
Prove that the strategy vector σ∗ = (σi∗)i∈N is a Bayesian equilibrium.

10.53 This exercise presents an alternative proof of Theorem 10.42 (page 411), regarding the existence of Bayesian equilibria in finite games. Let G = (N, S, (Ai)i∈N, Π) be a game with incomplete information where the set of actions (Ai)i∈N is a finite set, and Π = (Y, 𝒴, s, (πi)i∈N) is a belief space with a finite set of states of the world Y. Define a strategic-form game Γ, where the set of players is N, the set of player i’s pure strategies is the set of functions σi that map each type of player i to an available action for that type, and player i’s payoff function wi is given by

$$w_i(\sigma) = \sum_{\omega \in Y} \gamma_i(\sigma \mid \omega). \tag{10.84}$$

(a) Prove that the game Γ has a Nash equilibrium in mixed strategies. (b) Deduce that the game Γ has a Nash equilibrium in behavior strategies. (c) Prove that the set of Nash equilibria in behavior strategies of the game Γ coincides with the set of Bayesian equilibria of the game G.

10.54 In Example 10.45 (page 414), find a strategy for Player I that guarantees that Player II, of any type, is indifferent between L and R.

10.55 Find a Bayesian equilibrium in pure strategies in the following two-player game. Are there any additional Bayesian equilibria? The set of states of nature is S = {s1, s2}, the set of players is N = {I, II}, and the belief space is given by:

State of the world   s(·)   π1(·)      π2(·)
ω1                   s1     [1(ω1)]    [1(ω1)]
ω2                   s1     [1(ω1)]    [1/2(ω2), 1/2(ω3)]
ω3                   s2     [1(ω4)]    [1/2(ω2), 1/2(ω3)]
ω4                   s2     [1(ω4)]    [1(ω4)]

The state games are as follows:

        L         R
T       0, 0      0, 1
B       −10, 0    1, 1
State game s1

        L         R
T       1, 2      −10, 0
B       0, 2      0, 0
State game s2

10.56 Find a Bayesian equilibrium in the game appearing in Exercise 9.39 (page 378), when each player has a different prior, as follows. The prior distribution of Player I is pI(I1, II1) = 0.4, pI(I1, II2) = 0.1, pI(I2, II1) = 0.2, pI(I2, II2) = 0.3. The prior distribution of Player II is pII(I1, II1) = 0.3, pII(I1, II2) = 0.2, pII(I2, II1) = 0.25, pII(I2, II2) = 0.25. Assume that these prior distributions are common knowledge among the players.

10.57 Ronald and Jimmy are betting on the result of a coin toss. Ronald ascribes probability 1/3 to the event that the coin shows heads, while Jimmy ascribes probability 3/4 to that event. The betting rules are as follows: Each of the two players writes on a slip of paper “heads” or “tails,” with neither player knowing what the other
player is writing. After they are done writing, they show each other what they have written. If both players wrote heads, or both players wrote tails, each of them receives a payoff of 0. If they have made mutually conflicting predictions, they toss the coin. The player who has made the correct prediction regarding the result of the coin toss receives $1 from the player who has made the incorrect prediction. This description is common knowledge among the two players.
(a) Depict this situation as a game with incomplete information.
(b) Are the beliefs of Ronald and Jimmy consistent? Justify your answer.
(c) If you answered the above question positively, find the common prior.
(d) Find a Bayesian equilibrium in this game (whether or not the beliefs of the players are consistent).
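To experiment with the betting game of Exercise 10.57, one can tabulate each player's subjective expected payoff and search for pure action pairs from which neither player wants to deviate. The following is a minimal sketch under the assumption that each player evaluates a bet using only his own probability of heads; the function names are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

# Subjective probabilities of heads, as stated in the exercise.
P_HEADS = {"Ronald": F(1, 3), "Jimmy": F(3, 4)}
ACTIONS = ("heads", "tails")


def subjective_payoff(player, own, other):
    """Expected payoff of `player`, computed with his own belief about the coin."""
    if own == other:
        return F(0)                       # matching predictions: no bet
    p = P_HEADS[player]
    p_correct = p if own == "heads" else 1 - p
    return p_correct - (1 - p_correct)    # win $1 if correct, lose $1 otherwise


def pure_equilibria():
    """Action pairs (Ronald, Jimmy) with no profitable unilateral deviation."""
    found = []
    for r, j in product(ACTIONS, repeat=2):
        ronald_ok = all(subjective_payoff("Ronald", r, j)
                        >= subjective_payoff("Ronald", d, j) for d in ACTIONS)
        jimmy_ok = all(subjective_payoff("Jimmy", j, r)
                       >= subjective_payoff("Jimmy", d, r) for d in ACTIONS)
        if ronald_ok and jimmy_ok:
            found.append((r, j))
    return found


print(pure_equilibria())
```

Because the players disagree about the coin, both subjective payoffs can be positive at the same action pair, which cannot happen in a zero-sum bet evaluated under a common prior.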
10.58 Find a Bayesian equilibrium in the game appearing in Exercise 9.50 (page 382). 10.59 In Exercise 9.42 (page 379), suppose that the players’ beliefs are:
• Marc thinks that the probability of every possible value is 1/3, and he believes that this is common belief among him and Nicolas.
• Nicolas knows that Marc’s beliefs are as described above, but he also knows that the true value of the company is 11.

Answer the following questions: (a) Can this situation be described as a Harsanyi game with incomplete information (with a common prior)? Justify your answer. (b) Find a Bayesian equilibrium of the game.

10.60 Find all the Bayesian equilibria of the following two-player game with incomplete information. The set of states of nature is S = {s1, s2, s3}, the set of players is N = {I, II}, the set of states of the world is Y = {ω1, ω2, ω3}, and the belief space is:

State of the world   s(·)   πI(·)                  πII(·)
ω1                   s1     [1/2(ω1), 1/2(ω2)]     [1(ω1)]
ω2                   s2     [1/2(ω1), 1/2(ω2)]     [1(ω1)]
ω3                   s3     [1(ω3)]                [1(ω3)]

The state games are as follows:

        L       R
T       4, 0    1, 1
B       1, 2    3, 0
State game s1

        L       R
T       0, 3    1, 5
B       1, 0    0, 2
State game s2

        L       R
T       0, 1    6, 4
B       7, 5    2, 3
State game s3

10.61 A Cournot game with inconsistent beliefs Each of two manufacturers i ∈ {I, II} must determine the quantity of a product xi to be manufactured (in thousands of units) for sale in the coming month. The unit price of the manufactured products depends on the total quantities both manufacturers produce, and is given by p = 2 − xI − xII. Each manufacturer knows his own unit production cost, but does not know the unit production cost of his rival. The unit production cost of manufacturer i may be high (ci = 5/4) or low (ci = 3/4). Manufacturer i’s profit is xi(p − ci). The first manufacturer ascribes probability 2/3 to the second manufacturer’s costs being high (and probability 1/3 to the second manufacturer’s costs being low). The second manufacturer ascribes probability 3/4 to the costs of both manufacturers being equal to each other (and probability 1/4 to their costs being different). Answer the following questions: (a) Describe this situation as a game with incomplete information. (b) Prove that the beliefs of the manufacturers are inconsistent. (c) Find all Bayesian equilibria in pure strategies in this game.

10.62 This exercise shows that when the players agree that every state of the world may obtain, the game is equivalent to a game with a common prior. Let G = (N, S, (Ai)i∈N, Π) be a game with incomplete information, where s = (N, (Ai(s))i∈N, (ui(s))i∈N) for every s ∈ S, Π = (Y, 𝒴, s, (πi)i∈N), the set of states of the world Y is finite, the set of states of nature S equals the set of states of the world, S = Y with s(ω) = ω, and each player i has a prior distribution Pi whose support is Y, and a partition Fi of Y such that πi(ω) = Pi(ω | Fi(ω)), for every player i and every state of the world ω. Let P be a probability distribution over Y whose support is Y. For each s ∈ S define a state game ŝ := (N, (Ai(s))i∈N, (ûi(s))i∈N), where

$$\hat{u}_i(a; \omega) := \frac{P_i(\omega)}{P(\omega)}\, u_i(a; \omega).$$

Let Ŝ be the collection of all state games ŝ defined in this way. Let Ĝ = (N, Ŝ, (Ai)i∈N, Π̂) be a game with incomplete information, where Π̂ = (Y, 𝒴, ŝ, (π̂i)i∈N), $\hat{s}(\omega) = \widehat{s(\omega)}$, and for every player i ∈ N and every ω ∈ Y, π̂i(ω) := P(ω | Fi(ω)). In words, the game Ĝ has a common prior equal to P. Each player i’s payoff function at the state of the world ω is his payoff function in the game G multiplied by the ratio Pi(ω)/P(ω). Prove that the set of Bayesian equilibria of the game G coincides with the set of Bayesian equilibria of the game Ĝ.
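The reweighting in Exercise 10.62 can be checked numerically on a single information set. Conditional on an element F of player i's partition, the expected payoff under P with payoffs multiplied by Pi(ω)/P(ω) equals the expected payoff under Pi multiplied by the positive constant Pi(F)/P(F), so comparisons between actions are unaffected. The sketch below uses hypothetical priors and payoffs chosen only for illustration.

```python
from fractions import Fraction as F

# Hypothetical data: three states, player i's prior P_i, a reference prior P
# (both with full support), and one information set of player i.
P_i = {"w1": F(1, 2), "w2": F(1, 3), "w3": F(1, 6)}
P = {"w1": F(1, 4), "w2": F(1, 4), "w3": F(1, 2)}
cell = ("w1", "w2")


def conditional_expectation(prior, payoff, cell):
    """Expected payoff under `prior`, conditioned on the event `cell`."""
    mass = sum(prior[w] for w in cell)
    return sum(prior[w] * payoff[w] for w in cell) / mass


def rescale(payoff):
    """Payoff in the modified game: u_hat(w) = P_i(w) / P(w) * u(w)."""
    return {w: P_i[w] / P[w] * value for w, value in payoff.items()}


# Player i's payoff at each state under two hypothetical action profiles.
u_a = {"w1": F(3), "w2": F(0), "w3": F(5)}
u_b = {"w1": F(1), "w2": F(2), "w3": F(-4)}

# The positive constant P_i(F) / P(F) relating the two conditional expectations.
factor = sum(P_i[w] for w in cell) / sum(P[w] for w in cell)

for payoff in (u_a, u_b):
    original = conditional_expectation(P_i, payoff, cell)
    modified = conditional_expectation(P, rescale(payoff), cell)
    print(original, modified, modified == factor * original)
```

Since the factor depends only on the information set and not on the action profile, a best reply in one game remains a best reply in the other.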
10.63 In this exercise, we relate the set of Bayesian equilibria in a game G with incomplete information to the set of Bayesian equilibria when we restrict the game to a belief subspace of the belief space of G. Let G = (N, S, (Ai)i∈N, Π) be a game with incomplete information, where Π = (Y, 𝒴, s, (πi)i∈N) is a belief space with a finite set of states of the world Y, and let Π̃ = (Ỹ, 𝒴|Ỹ, s, (πi)i∈N) be a belief subspace of Π.

(a) Prove that G̃ = (N, S, (Ai)i∈N, Π̃) is a game with incomplete information.
(b) Let σ∗ be a Bayesian equilibrium of G. Prove that σ∗|Ỹ, the strategy vector σ∗ restricted to the states of the world in Ỹ, is a Bayesian equilibrium of the game G̃.
(c) Let σ̃ be a Bayesian equilibrium of G̃. Prove that there exists a Bayesian equilibrium σ∗ of G satisfying σi∗(ω) = σ̃i(ω) for each player i ∈ N and each state of the world ω ∈ Ỹ.
10.64 Prove that the probability distribution defined by Equation (10.57) on page 416 is consistent over the belief space described in Remark 10.47. 10.65 Prove that for probability distributions p over Y whose support is a finite set in Equation (10.56) one may condition on the set {ω ′ ∈ Y : πi (ω′ ) = πi (ω)} instead of Pi (ω).
10.66 Prove that, in Example 10.18 (page 398), Equation (10.56) (page 416) is satisfied for each event A ⊆ Y, for each player i, and for each ω ∈ Y.

10.67 Prove that the two definitions of a consistent distribution, Definition 10.46 (page 416) and Definition 10.49 (page 418), are equivalent.

10.68 Prove that Equation (10.64) on page 418 is satisfied for Player II in Example 10.18 (page 398).

10.69 Verify that Equation (10.64) on page 418 is satisfied in Examples 10.17 (page 396) and 10.18 (page 398).

10.70 Prove Theorem 10.50 (page 418): if the set of states of the world is finite, and if p is a consistent distribution, then Ỹ = supp(p) is a consistent belief subspace.

10.71 Let Ỹ1 and Ỹ2 be two consistent belief subspaces of the same belief space Π, and let p1 and p2 be consistent distributions over these two subspaces, respectively. Prove that the set Ỹ1 ∪ Ỹ2 is also a consistent belief subspace, and that for each λ ∈ [0, 1] the probability distribution λp1 + (1 − λ)p2 is consistent. In addition, if for each i ∈ {1, 2} we expand pi to a probability distribution over Y by setting pi(ω) = 0 for every ω ∉ Ỹi, then for every λ ∈ [0, 1] the probability distribution λp1 + (1 − λ)p2 is consistent.

10.72 Let Π be a consistent belief space, and let p be a consistent distribution. Let ω ∈ Y be a state of the world satisfying p(Ỹ(ω)) > 0. Prove that the probability distribution p conditioned on the set Ỹ(ω) is consistent. Deduce that Ỹ(ω) is a consistent belief subspace.

10.73 Prove or disprove: Every finite belief space has a consistent belief subspace.

10.74 Prove or disprove: If Ỹi(ω) is inconsistent for some player i ∈ N, then Ỹ(ω) is also inconsistent.

10.75 Prove or disprove: If Ỹi(ω) is inconsistent for every player i ∈ N, then Ỹ(ω) is also inconsistent.

10.76 Prove or disprove: If Ỹ(ω) is inconsistent then there exists a player i ∈ N for whom Ỹi(ω) is inconsistent.
10.77 Provide an example of a belief space with three players, which contains a state of the world ω, such that the minimal belief subspaces of the players at ω are inconsistent, and differ from each other.
10.78 Provide an example of a belief space with three players, which contains a state of the world ω, such that the minimal belief subspaces of two of the players at ω are consistent and differ from each other, but the minimal belief subspace of the third player is inconsistent.

10.79 Prove that if Ỹ is a minimal consistent belief subspace, and Ỹ′ is an inconsistent belief subspace, then Ỹ′ ∩ Ỹ = ∅. Does the claim hold without the condition that Ỹ is minimal?
10.80 In Example 10.27 on page 402, show that at the state of the world ω2, the fact that the state of the world is consistent is common belief among the players.

10.81 Consider the following belief space, where the set of players is N = {I, II}, the set of states of nature is S = {s1, s2}, the set of states of the world is Y = {ω1, ω2, ω3}, and the beliefs of the players are given by the following table:

State of the world   s(·)   πI(·)                  πII(·)
ω1                   s1     [1/2(ω1), 1/2(ω2)]     [1(ω1)]
ω2                   s1     [1/2(ω1), 1/2(ω2)]     [1(ω3)]
ω3                   s2     [1(ω3)]                [1(ω3)]

(a) Prove that the state of the world ω3 is the only consistent state of the world in Y. (b) Prove that at the state of the world ω2, Player II believes that the state of the world is consistent. (c) Prove that at the state of the world ω1, both players believe that the state of the world is inconsistent. (d) Prove that at the state of the world ω1, the fact that the state of the world is inconsistent is not common belief among the players.

10.82 Find an example of a belief space, where the set of players is N = {I, II}, and there is a state of the world at which Player I believes that the state of the world is consistent, while Player II believes the state of the world is inconsistent.

10.83 At the state of the world ω, Player I believes that the state of the world is consistent, while Player II believes the state of the world is inconsistent. Is it possible that ω is a consistent state of the world? Justify your answer.

10.84 Prove or disprove: If it is common belief among the players at the state of the world ω that the state of the world is consistent, then Ỹi(ω) = Ỹj(ω) for every pair of players i, j ∈ N.
10.85 Find an example of a belief space where the set of players consists of two players, and there exists an inconsistent state of the world ω at which πi ({ω} | ω) = 0, and each player i believes that the state of the world is inconsistent.
10.86 Prove that if player i believes at the state of the world ω that the state of the world is consistent, then he believes that every player believes that the state of the world
is consistent. Deduce that in this case player i believes that it is common belief that the state of the world is consistent.

10.87 Two buyers are participating in a first-price auction. Each of them has a private value, located in [0, 1]. With regard to each of the following belief situations, in which the two buyers have symmetric beliefs, answer the following questions:

• Ascertain whether the beliefs of the buyers are consistent. Prove your reply.
• Find a Bayesian equilibrium.

(a) The buyer whose private value is x believes:
• If x ∈ [0, 1/2], the buyer believes that the private value of the other buyer is given by the uniform distribution over [0, 1/2].
• If x ∈ (1/2, 1], the buyer believes that the private value of the other buyer is given by the uniform distribution over [1/2, 1].

(b) A buyer whose private value is x believes:
• If x ∈ [0, 1/2], the buyer believes that the private value of the other buyer is given by the uniform distribution over [1/2, 1].
• If x ∈ (1/2, 1], the buyer believes that the private value of the other buyer is given by the uniform distribution over [0, 1/2].
11
The universal belief space
Chapter summary
In this chapter we construct the universal belief space, which is a belief space that contains all possible situations of incomplete information of a given set of players over a certain set of states of nature. The construction is carried out in a straightforward way. Starting from a given set of states of nature S and a set of players N we construct, step by step, the space of all possible hierarchies of beliefs of the players in N. The space of all possible hierarchies of beliefs of each player is proved to be a well-defined compact set T, called the universal type space. It is then proved that a type of a player is a joint probability distribution over the set S and the types of the other players. Finally, the universal belief space Ω is defined as the Cartesian product of S with n copies of T; that is, an element of Ω, called a state of the world, consists of a state of nature and a list of types, one for each player.

Chapters 9 and 10 focused on models of incomplete information and their properties. A belief space with a set of players N on a set of states of nature S is given by a set of states of the world Y, and, for each state of the world ω ∈ Y, a corresponding state of nature s(ω) ∈ S and a belief πi(ω) ∈ Δ(Y) for each player i ∈ N. As we saw, the players’ beliefs determine hierarchies of beliefs over the states of nature, that is, beliefs about the state of nature, beliefs about beliefs about the state of nature, beliefs about beliefs about beliefs about the state of nature, and so on (see Example 9.28 on page 334 for an Aumann model of incomplete information, Example 9.43 on page 350 for a Harsanyi model of incomplete information, and page 390 for a hierarchy of beliefs in a more general belief space). The players’ hierarchies of beliefs are thus derived from the model of incomplete information, and they are not an element of the model.

In reality, when individuals analyze a situation with incomplete information they do not write down a belief space. They do, however, have hierarchies of beliefs over the state of nature: an investor ascribes a certain probability to the event “the interest rate next year will be 3%,” he ascribes a possibly different probability to the event “the interest rate next year will be 3% and the other investor ascribes probability at least 0.7 to the interest rate next year being 3%,” and he similarly ascribes probabilities to events that involve higher levels of beliefs. It therefore seems more natural to have the belief hierarchies as part of the data of the situation. In other words, we wish to describe a situation of incomplete information by the set of states of nature S and the players’ belief hierarchies on S. Does such a description correspond to a belief space as defined in Chapter 10? This chapter is
devoted to the affirmative answer to this question: starting from belief hierarchies we will construct the belief space that yields these belief hierarchies. We will first define the concept of a belief hierarchy of a player, and construct a space Ω = Ω(N, S) containing all possible hierarchies of beliefs of the set of players N about the set of states of nature S. We will then prove that this space is a belief space. This will imply that every belief space of the set of players N on the set of states of nature S is a belief subspace of the space Ω. That is why the space Ω will be called the universal belief space. In constructing the universal belief space Ω, we will assume that the space of states of nature S is a compact set in a metric space. This assumption is satisfied, in particular, when S is a finite set, or when it is a closed and bounded set in a finite-dimensional Euclidean space.

Remark 11.1 As we will now show, the assumption that the space of states of nature is compact is not a strong assumption. Suppose that the players in N are facing a strategic-form game in which the set of actions of player i is Ai, which may be a finite or an infinite set. We argue that under mild assumptions, such a game can be presented as a point in a compact subset of a metric space. Denote by A = ×i∈N Ai the set of action vectors. Assume that the preference relation of each player satisfies the von Neumann–Morgenstern axioms, and that each player has a most-preferred and a least-preferred outcome (see Section 2.6 for a generalization of the von Neumann–Morgenstern axioms to infinite sets of outcomes). It follows that the preference relation of each player i can be presented by a bounded linear utility function ui. Since the utility function of every player is determined up to a positive affine transformation, we may suppose that the utility function of each player takes values in the range [0, 1].
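The normalization used in the remark can be sketched on a finite set of outcomes. The function below is a hypothetical helper, not something defined in the text: it applies the positive affine transformation u ↦ (u − min u)/(max u − min u), which sends the least-preferred outcome to 0 and the most-preferred outcome to 1 while representing the same preference relation.

```python
from fractions import Fraction as F


def normalize(utility):
    """Rescale a bounded utility function so that it takes values in [0, 1].

    `utility` maps outcomes to numbers. The map u -> (u - lo) / (hi - lo) is a
    positive affine transformation whenever hi > lo.
    """
    lo, hi = min(utility.values()), max(utility.values())
    if hi == lo:
        # The player is indifferent among all outcomes; a constant represents this.
        return {outcome: F(0) for outcome in utility}
    return {outcome: F(u - lo) / F(hi - lo) for outcome, u in utility.items()}


print(normalize({"a": 2, "b": 5, "c": 11}))
```

With the sample input, outcome "a" is mapped to 0, "b" to 1/3, and "c" to 1.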
As we saw in Chapters 9 and 10, a state of nature is a state game in strategic form that the players face. Suppose for now that the set of actions $A_i$ of player $i$ is common knowledge among the players. Then a state of nature is described by a vector of utility functions $(u_i)_{i \in N}$, i.e., by an element in $S := [0, 1]^{A}$: a list of payoff vectors for each action vector. When the sets of actions are finite, the set $S$ is compact, i.e., the set of states of nature is a compact set. When the sets of actions are compact (not necessarily finite), the set of states of nature is a compact set if we consider only state games in which the utility functions of all players are Lipschitz functions with a given constant.

Recall that for every set $X$ in a topological space, $\Delta(X)$ is the space of all probability distributions over $X$. We endow $\Delta(X)$ with the weak-∗ topology. In the weak-∗ topology, a sequence of distributions $(\mu_j)_{j \in \mathbb{N}}$ converges to a probability distribution $\mu$ if and only if for every continuous function $f : X \to \mathbb{R}$,
\[ \lim_{j \to \infty} \int_X f(x)\, d\mu_j(x) = \int_X f(x)\, d\mu(x). \tag{11.1} \]
This topology is a metric topology: there exists a metric over the set $\Delta(X)$ satisfying the property that the collection of open sets in the weak-∗ topology is identical with the collection of open sets generated by open balls in this metric.
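To make the convergence criterion (11.1) concrete, the following minimal Python sketch evaluates both sides for a hypothetical sequence of finitely supported distributions on $X = [0, 1]$. The sequence `mu(j)`, the candidate limit `LIMIT`, and the helper names are illustrative assumptions, not objects from the text. Checking a handful of test functions only illustrates the criterion; it does not prove convergence, which requires the equality for every continuous function.

```python
import math

def integrate(f, dist):
    """Integrate f against a finitely supported distribution.

    dist is a list of (point, probability) pairs.
    """
    return sum(prob * f(x) for x, prob in dist)

def mu(j):
    """Hypothetical sequence on X = [0, 1]: mass 1/2 at 1/j and 1/2 at 1 - 1/j."""
    return [(1.0 / j, 0.5), (1.0 - 1.0 / j, 0.5)]

# Candidate weak-* limit (an assumption of this sketch): mass 1/2 at 0 and at 1.
LIMIT = [(0.0, 0.5), (1.0, 0.5)]

def gap(f, j):
    """Absolute difference between the two integrals in (11.1) at index j."""
    return abs(integrate(f, mu(j)) - integrate(f, LIMIT))

# The gaps shrink as j grows for each continuous test function.
for f in (math.cos, math.exp, lambda x: x * x):
    print([gap(f, j) for j in (10, 1000, 100000)])
```

The supports of the distributions `mu(j)` never coincide with the support of the limit, which illustrates that weak-∗ convergence compares integrals of continuous functions rather than the probabilities of individual points.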
A fundamental property of this topology, which we will often make use of, is the following theorem, which follows from Riesz’s representation theorem.

Theorem 11.2 If $X$ is a compact set in a metric space, then $\Delta(X)$ is a compact metric space (in the weak-∗ topology).

For the proof of this theorem, see Conway [1990], Theorem V.3.1, and Claim III.5.4. Further properties of this topology will be presented, as needed, in the course of the chapter.
11.1 Belief hierarchies
We begin by constructing all the belief spaces in the most direct and general possible way. A player’s belief is described by a distribution over the parameter about which he is uncertain, i.e., over the state of nature. Denote by $X_k$ the space of all belief hierarchies of order $k$. In particular, $X_1$ includes all the possible beliefs of a player about the states of nature; $X_2$ includes all possible beliefs of a player about the state of nature, and about the beliefs of the other players about the state of nature; $X_3$ includes all the beliefs of order 1 and 2 and the beliefs about the second-order beliefs of all the other players, etc.

Definition 11.3 The space of belief hierarchies of order $k$ of a set of players $N$ on the set of states of nature $S$ is the space $X_k$ defined inductively as follows:
\[ X_1 := \Delta(S), \tag{11.2} \]
and for every $k \geq 2$,
\[ X_k := X_{k-1} \times \Delta\bigl(S \times (X_{k-1})^{n-1}\bigr) = X_{k-1} \times \Delta\bigl(S \times \underbrace{X_{k-1} \times X_{k-1} \times \cdots \times X_{k-1}}_{n-1 \text{ times}}\bigr). \tag{11.3} \]
An element μk ∈ Xk is called a belief of order k, or a belief hierarchy of order k. Every probability distribution over S can be a first-order belief of a player in a game. A second-order belief (or hierarchy) is a first-order belief and a joint distribution over the set of states of nature and the first-order beliefs of the other players. In general, a (k + 1)-order belief includes a belief of order k and a joint distribution over the vectors of length n composed of a state of nature and the n − 1 beliefs of order k of the other players. Note that the joint distribution over S × (Xk )n−1 is not necessarily a product distribution. This means that a player can believe that there is a correlation between the beliefs of order k of the other players, and between those beliefs and the state of nature. This can happen, for example, if the player believes that one or more of the other players knows the state of nature, or if some other players have common information on the state of nature. Since the first component of a (k + 1)-order belief is a k-order belief, and the first component of a k-order belief is a (k − 1)-order belief, and so on, a (k + 1)-order belief defines the player’s beliefs of order 1, 2, . . . , k. This is the reason that a (k + 1)-order belief is also called a “belief hierarchy of order k + 1.”
Example 11.4 Suppose that there are two players $N = \{\text{Benjamin}, \text{George}\}$, and two states of nature $S = \{s_1, s_2\}$. The space of belief hierarchies of order 1 of every player is $X_1 = \Delta(S)$, and a first-order belief is of the form $[p_1(s_1), (1 - p_1)(s_2)]$: “I ascribe probability $p_1$ to the state of nature being $s_1$, and probability $1 - p_1$ to the state of nature being $s_2$.” A second-order belief is an element in $X_2 = X_1 \times \Delta(S \times X_1)$, for example, “I ascribe probability $p_2$ to the state of nature being $s_1$, probability $1 - p_2$ to the state of nature being $s_2$ (an element of $X_1$), probability $\alpha_1$ to the state of nature being $s_1$ and the belief of the second player on the state of nature being $[q_1(s_1), (1 - q_1)(s_2)]$, probability $\alpha_2$ to the state of nature being $s_1$ and the belief of the second player on the state of nature being $[q_2(s_1), (1 - q_2)(s_2)]$, and probability $1 - \alpha_1 - \alpha_2$ to the state of nature being $s_2$ and the belief of the second player on the state of nature being $[q_3(s_1), (1 - q_3)(s_2)]$” (this is an element in $\Delta(S \times X_1)$). Note that each of these beliefs can be those of either Benjamin or George: the belief spaces, at any order, of the players are identical. ◭
While the concept of a belief hierarchy is an intuitive one, the detailed mathematical description of a belief hierarchy may be extremely cumbersome. Despite this, we can prove mathematical properties of belief hierarchies that will eventually enable us to construct the universal belief space, which is the space of all possible belief hierarchies.

Theorem 11.5 For each $k \in \mathbb{N}$, the set $X_k$ is a compact set in a metric space.

Proof: The theorem is proved by induction on $k$. Because $S$ is a compact set in a metric space, Theorem 11.2 implies that $X_1 = \Delta(S)$ is also a compact set in a metric space. Let $k \geq 1$, and suppose by induction that $X_k$ is a compact set. It follows that the set $S \times (X_k)^{n-1}$ is also compact, as the Cartesian product of compact sets in metric spaces. By Theorem 11.2 again, the set $\Delta(S \times (X_k)^{n-1})$ is a compact subset of a metric space in the weak-∗ topology. We deduce that the set $X_{k+1} = X_k \times \Delta(S \times (X_k)^{n-1})$, as the Cartesian product of two compact sets in metric spaces, is also a compact set in a metric space.

A $k$-order belief of a player is an element of $X_k$. Can every element in $X_k$ be an “acceptable” belief of a player? The answer to this question is negative.

Example 11.4 (Continued) In this example, where $N = \{\text{Benjamin}, \text{George}\}$ and $S = \{s_1, s_2\}$, an element in $X_1$ is of the form $[p_1(s_1), (1 - p_1)(s_2)]$, and every such element is an “acceptable” first-order belief of a player. We will show, however, that not every second-order belief is “acceptable.” A second-order belief of a player is a pair $\mu_2 = (\mu_1, \nu_1)$, where $\mu_1$ is a first-order belief of the player, and $\nu_1$ is a probability distribution over $S \times X_1$. In other words, $\nu_1$ is a probability distribution over vectors of the form $(s, \rho)$, where $s$ is a state of nature, and $\rho$ is a first-order belief of the other player. On page 443, we gave an example of a second-order belief of a player where:

• The first-order belief is $\mu_1 = [p_2(s_1), (1 - p_2)(s_2)]$.
• The distribution $\nu_1$ ascribes probability $\alpha_1$ to the state of nature being $s_1$ and the first-order belief of the second player being $[q_1(s_1), (1 - q_1)(s_2)]$.
• The distribution $\nu_1$ ascribes probability $\alpha_2$ to the state of nature being $s_1$ and the first-order belief of the second player being $[q_2(s_1), (1 - q_2)(s_2)]$.
• The distribution $\nu_1$ ascribes probability $1 - \alpha_1 - \alpha_2$ to the state of nature being $s_2$ and the first-order belief of the second player being $[q_3(s_1), (1 - q_3)(s_2)]$.

If a player’s belief is “acceptable,” we expect the player to be able to answer the question “what is the probability you ascribe to the state of nature being $s_1$?” When the second-order belief of the player is $\mu_2 = (\mu_1, \nu_1)$, he can answer this question in two different ways. On the one hand, $\mu_1$ is a first-order belief; i.e., it is a probability distribution over $S$, so that the answer to our question is the probability that $\mu_1$ ascribes to $s_1$. In the example above, that answer is $p_2$. On the other hand, $\nu_1$ is a probability distribution over $S \times X_1$, so that the answer to our question is the probability that the marginal distribution of $\nu_1$ over $S$ ascribes to $s_1$. In the above example, that answer is $\alpha_1 + \alpha_2$. For the probability that the player ascribes to the state of nature $s_1$ to be well defined, we must require that $p_2 = \alpha_1 + \alpha_2$. In general, for a second-order belief of a player to be “acceptable,” the marginal distribution of $\nu_1$ over $S$ must coincide with $\mu_1$, which is also a probability distribution over $S$.

Note that according to the player’s second-order belief, the probability that the other player ascribes to the state of nature being $s_1$ is $\alpha_1 q_1 + \alpha_2 q_2 + (1 - \alpha_1 - \alpha_2) q_3$. It follows that even if the player’s belief is “acceptable,” i.e., if $p_2 = \alpha_1 + \alpha_2$, if $\alpha_1 q_1 + \alpha_2 q_2 + (1 - \alpha_1 - \alpha_2) q_3 \neq p_2$, then the player believes that the other player ascribes a probability to the state of nature being $s_1$ that is different from the probability that he himself ascribes to that event. Thus, the inequality $\alpha_1 q_1 + \alpha_2 q_2 + (1 - \alpha_1 - \alpha_2) q_3 \neq p_2$ does not mean that the player’s belief is unacceptable, because the player may believe that the other player does not agree with him. ◭
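The requirement $p_2 = \alpha_1 + \alpha_2$ can be checked mechanically when all distributions have finite support. The sketch below is only an illustration under assumptions: the encoding of $\nu_1$ as a list of triples, the function names, and the particular numbers ($\alpha_1 = \alpha_2 = 1/4$, $q_1 = 1/3$, $q_2 = 2/3$, $q_3 = 1/5$) are hypothetical and do not come from the text.

```python
from fractions import Fraction as F

def marginal_on_states(nu1):
    """Marginal over S of nu1, a list of (state, q, prob) triples, where q is
    the probability that the other player ascribes to s1."""
    marg = {}
    for state, _q, prob in nu1:
        marg[state] = marg.get(state, F(0)) + prob
    return marg

def is_coherent(mu1, nu1):
    """The pair (mu1, nu1) is acceptable if the marginal of nu1 over S equals mu1."""
    marg = marginal_on_states(nu1)
    return all(mu1.get(s, F(0)) == marg.get(s, F(0)) for s in set(mu1) | set(marg))

def other_players_average(nu1):
    """alpha1*q1 + alpha2*q2 + (1 - alpha1 - alpha2)*q3 from the example."""
    return sum(prob * q for _state, q, prob in nu1)

# Hypothetical numbers: alpha1 = alpha2 = 1/4, q1 = 1/3, q2 = 2/3, q3 = 1/5.
NU1 = [("s1", F(1, 3), F(1, 4)), ("s1", F(2, 3), F(1, 4)), ("s2", F(1, 5), F(1, 2))]
MU1_COHERENT = {"s1": F(1, 2), "s2": F(1, 2)}    # p2 = alpha1 + alpha2
MU1_INCOHERENT = {"s1": F(2, 3), "s2": F(1, 3)}  # p2 != alpha1 + alpha2

print(is_coherent(MU1_COHERENT, NU1), is_coherent(MU1_INCOHERENT, NU1))
print(other_players_average(NU1))
```

With these numbers the first pair passes the check even though the average belief attributed to the other player differs from $p_2 = 1/2$, in line with the closing remark of the example.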
The condition $p_2 = \alpha_1 + \alpha_2$, which emerged in the above discussion, is a mathematical condition constraining the distributions that comprise a belief hierarchy. Its purpose is to ensure that the beliefs in a belief hierarchy do not contradict each other. This condition is called the coherency condition.

To define the coherency condition precisely, denote by $\mu_{k+1}$ a belief hierarchy of order $k + 1$, i.e., an element in $X_{k+1}$, for every $k \geq 0$. We will present conditions that ensure that such a hierarchy is coherent. Since by the inductive definition (Definition 11.3) an element of $X_{k+1}$ is $\mu_{k+1} \in X_k \times \Delta(S \times (X_k)^{n-1})$, we write $\mu_{k+1} = (\mu_k, \nu_k)$, where $\mu_k \in X_k$ and $\nu_k \in \Delta(S \times (X_k)^{n-1})$. We similarly write $\mu_k = (\mu_{k-1}, \nu_{k-1})$, where $\mu_{k-1} \in X_{k-1}$ and $\nu_{k-1} \in \Delta(S \times (X_{k-1})^{n-1})$. Note that¹
\[ \nu_k \in \Delta\bigl(S \times (X_k)^{n-1}\bigr) = \Delta\Bigl(S \times \bigl(X_{k-1} \times \Delta(S \times (X_{k-1})^{n-1})\bigr)^{n-1}\Bigr) = \Delta\Bigl(S \times (X_{k-1})^{n-1} \times \bigl(\Delta(S \times (X_{k-1})^{n-1})\bigr)^{n-1}\Bigr). \tag{11.4} \]
The marginal distribution of $\nu_k$ over $S \times (X_{k-1})^{n-1}$ is the player’s belief about the $(k - 1)$-order beliefs of the other players. For $\mu_{k+1}$ to be coherent, we require the marginal distribution of $\nu_k$ over $S \times (X_{k-1})^{n-1}$ to be equal to the probability distribution $\nu_{k-1}$ over $S \times (X_{k-1})^{n-1}$, which comprises part of $\mu_k$. We also require that the players believe that the beliefs of the other players are coherent: $\nu_k$ must ascribe probability 1 to the event that the lower-order beliefs of the other players are also coherent. These conditions together lead to the following inductive definition of $Z_k$, the set of all coherent belief hierarchies of order $k$ (for each $k \in \mathbb{N}$).

¹ The spaces $S \times \bigl(X_{k-1} \times \Delta(S \times (X_{k-1})^{n-1})\bigr)^{n-1}$ and $S \times (X_{k-1})^{n-1} \times \bigl(\Delta(S \times (X_{k-1})^{n-1})\bigr)^{n-1}$ in Equation (11.4) differ from each other in the order of the coordinates. Here and in the sequel we will relate to these spaces as if they were identical, identifying the corresponding coordinates in the two spaces.
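Definition 11.6 For each $k \in \mathbb{N}$, the space of coherent belief hierarchies of order $k$ is the space $Z_k$ defined inductively as follows:
\[ Z_1 := X_1 = \Delta(S), \tag{11.5} \]
\[ Z_2 := \bigl\{\mu_2 = (\mu_1, \nu_1) \in Z_1 \times \Delta(S \times (Z_1)^{n-1}) : \text{the marginal distribution of } \nu_1 \text{ over } S \text{ equals } \mu_1\bigr\}. \tag{11.6} \]
For each $k \geq 2$,
\[ Z_{k+1} := \bigl\{\mu_{k+1} = (\mu_k, \nu_k) \in Z_k \times \Delta(S \times (Z_k)^{n-1}) : \text{the marginal distribution of } \nu_k \text{ over } S \times (Z_{k-1})^{n-1} \text{ equals } \nu_{k-1}, \text{ where } \mu_k = (\mu_{k-1}, \nu_{k-1})\bigr\}. \tag{11.7} \]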
An element in the set Zk is called a coherent belief hierarchy of order k. In words, every belief of order 1 of a player is coherent; a second-order belief hierarchy μ2 = (μ1 , ν1 ) is a coherent belief hierarchy if the marginal distribution of ν1 over S equals μ1 ; for k ≥ 2, a (k + 1)-order belief hierarchy μk+1 = (μk , νk ) is a coherent belief hierarchy of order k + 1 if:
• $\mu_k = (\mu_{k-1}, \nu_{k-1})$ is a coherent belief hierarchy of order $k$.
• $\nu_k$ is a probability distribution over $S \times (Z_k)^{n-1}$.
• The marginal distribution of $\nu_k$ over $S \times (X_{k-1})^{n-1}$ equals $\nu_{k-1}$.

One can prove by induction that $Z_k \subseteq X_k$: every coherent belief hierarchy is a belief hierarchy (Exercise 11.3).

As mentioned before, the coherency condition requires the beliefs of a player to be well defined. If the coherency condition is not met, then, for example, the probability that the player ascribes to event $A$ may be $\tfrac{1}{3}$ according to his $k$-order belief, and $\tfrac{2}{5}$ according to his $l$-order belief. This is, of course, meaningless: the mathematical structure must reflect the intuition that the question “What is the probability that the player ascribes to an event $A$?” has an unequivocal answer.

To understand the content of the coherency condition, note that a belief hierarchy of order $k$ of any player defines a belief hierarchy for all orders $l$ less than $k$ for that player. Indeed, $\mu_k = (\mu_{k-1}, \nu_{k-1})$, where $\mu_{k-1} \in Z_{k-1}$ is the player’s belief hierarchy of order $k - 1$ and $\nu_{k-1} \in \Delta(S \times (Z_{k-1})^{n-1})$ is that player’s belief on the states of nature and on the belief hierarchies of order $k - 1$ of the other players. Similarly, $\mu_{k-1} = (\mu_{k-2}, \nu_{k-2})$, where $\mu_{k-2} \in Z_{k-2}$ is the player’s belief hierarchy of order $k - 2$ and $\nu_{k-2} \in \Delta(S \times (Z_{k-2})^{n-1})$ is that player’s belief about the states of nature and about the belief hierarchies of order $k - 2$ of the other players. Continuing in this way, we arrive at the conclusion that in effect a belief hierarchy of order $k$ is equivalent to a vector
\[ \mu_k = (\mu_1; \nu_1, \nu_2, \ldots, \nu_{k-1}), \quad k \geq 2, \tag{11.8} \]
where $\mu_1$ is the player’s belief about the state of nature, and $\nu_l \in \Delta(S \times (Z_l)^{n-1})$ is a probability distribution over the states of nature and the belief hierarchies of order $l$ of the other players, for all $1 \leq l < k$. As the next theorem states, the coherency condition guarantees that all of these distributions “agree” with each other. The proof of the theorem is left to the reader (Exercise 11.6).
Theorem 11.7 Let $\mu_k = (\mu_1; \nu_1, \nu_2, \ldots, \nu_{k-1}) \in Z_k$ be a coherent belief hierarchy of order $k$, and let $l_1, l_2$ be integers satisfying $1 \leq l_1 \leq l_2 \leq k$. Then:

1. The marginal distribution of $\nu_1$ over $S$ equals $\mu_1$.
2. The marginal distribution of $\nu_{l_2}$ over $S \times (Z_{l_1})^{n-1}$ is $\nu_{l_1}$.

The following theorem is a reformulation of Definition 11.6, and it details which pairs $(\mu_k, \nu_k)$ form coherent beliefs of order $k + 1$.

Theorem 11.8 Let $\mu_k = (\mu_1; \nu_1, \nu_2, \ldots, \nu_{k-1}) \in Z_k$ be a coherent belief hierarchy of order $k$, and let $\nu_k \in \Delta(S \times (X_k)^{n-1})$. The pair $(\mu_k, \nu_k)$ is a coherent belief hierarchy of order $k + 1$ if and only if the following conditions are met:

• $\nu_k$ ascribes probability 1 to $S \times (Z_k)^{n-1}$.
• For $k = 1$, the marginal distribution of $\nu_1$ over $S$ is $\mu_1$.
• For $k > 1$, the marginal distribution of $\nu_k$ over $S \times (X_{k-1})^{n-1}$ equals $\nu_{k-1}$, where $\mu_k = (\mu_{k-1}, \nu_{k-1})$.

From Theorem 11.8 it follows that if the belief of player $i$ is coherent, then for every finite sequence of players $i_1, i_2, \ldots, i_l$, player $i$ believes (ascribes probability 1) that player $i_1$ believes that player $i_2$ believes . . . that the belief hierarchy of order $k - l$ of player $i_l$ is coherent.
Example 11.9 In this example we present a situation of incomplete information and write down the belief hierarchy of one of the players. Phil wonders what the color of the famous Shwezigon Pagoda in Burma is, and whether his brother Don knows what it is. The states of nature are the possible colors of the pagoda: $s_b$ (blue), $s_g$ (gold), $s_p$ (purple), $s_r$ (red), $s_w$ (white), and so on. Phil does not know the color of the pagoda; he ascribes probability $\tfrac{1}{3}$ to the pagoda being red and probability $\tfrac{2}{3}$ to its being gold. Phil’s first-order belief is therefore
\[ \mu_1 = \bigl[\tfrac{1}{3}(s_r), \tfrac{2}{3}(s_g)\bigr] \in \Delta(S). \tag{11.9} \]
Phil also believes that if the pagoda is red, then Don ascribes probability $\tfrac{1}{2}$ to the pagoda being red and probability $\tfrac{1}{2}$ to its being blue. He also believes that if the pagoda is gold then Don ascribes probability 1 to its being gold. Phil’s second-order belief is $\mu_2 = (\mu_1, \nu_1)$ where $\mu_1$ is given by Equation (11.9) and
\[ \nu_1 = \Bigl[\tfrac{1}{3}\bigl(s_r, \bigl[\tfrac{1}{2}(s_r), \tfrac{1}{2}(s_b)\bigr]\bigr), \tfrac{2}{3}\bigl(s_g, [1(s_g)]\bigr)\Bigr] \in \Delta(S \times Z_1). \tag{11.10} \]
In addition, Phil believes that if the pagoda is red, then the following conditions are met:
• Don ascribes probability $\tfrac{1}{2}$ to “the pagoda is red, and Phil believes that the pagoda is purple.”
• Don ascribes probability $\tfrac{1}{2}$ to “the pagoda is blue, and Phil believes that the pagoda is red.”

Phil also believes that if the pagoda is gold, then Don ascribes probability 1 to “the pagoda is gold, and Phil ascribes probability 1 to the pagoda being white.” Phil’s third-order belief is $\mu_3 = (\mu_2, \nu_2)$ with $\mu_2$ as defined above and
\[ \nu_2 = \Bigl[\tfrac{1}{3}\Bigl(s_r, \bigl(\bigl[\tfrac{1}{2}(s_r), \tfrac{1}{2}(s_b)\bigr], \bigl[\tfrac{1}{2}(s_r, [1(s_p)]), \tfrac{1}{2}(s_b, [1(s_r)])\bigr]\bigr)\Bigr), \tfrac{2}{3}\Bigl(s_g, \bigl([1(s_g)], [1(s_g, [1(s_w)])]\bigr)\Bigr)\Bigr]. \tag{11.11} \]
$\nu_2$ is Phil’s belief about Don’s belief. In Equation (11.11), we see that Phil believes that if the state of nature is $s_r$, then Don’s first-order belief is $[\tfrac{1}{2}(s_r), \tfrac{1}{2}(s_b)]$ and Don’s second-order belief is $\bigl([\tfrac{1}{2}(s_r), \tfrac{1}{2}(s_b)], [\tfrac{1}{2}(s_r, [1(s_p)]), \tfrac{1}{2}(s_b, [1(s_r)])]\bigr)$. Phil also believes that if the state of nature is $s_g$, then Don’s first-order belief is $[1(s_g)]$ and Don’s second-order belief is $\bigl([1(s_g)], [1(s_g, [1(s_w)])]\bigr)$. We can now check the meaning of the coherence condition in this example. First, Phil’s belief is coherent:

• The marginal distribution of $\nu_1$ over $S$ is $\mu_1$.
• The projection of $\nu_2$ on $S \times Z_1$ is $\nu_1$.

Second, Phil believes that Don’s belief is coherent. Indeed, the second-order belief that Phil ascribes to Don is coherent in both the states of nature $s_r$ and $s_g$. Note that Phil indeed has beliefs about Don’s beliefs, but Don’s true beliefs are not expressed in Phil’s beliefs about Don’s beliefs; Don’s true beliefs may in fact differ from what Phil believes them to be. ◭
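The coherence checks of Example 11.9 can be reproduced with a few lines of Python. This is a sketch under assumptions: the dictionary and list encodings, the one-letter state labels, and all function names are hypothetical conveniences for finitely supported beliefs, not notation from the text.

```python
from fractions import Fraction as F

def freeze(belief):
    """Hashable encoding of a finitely supported first-order belief."""
    return tuple(sorted(belief.items()))

def marginal_on_states(nu):
    """Marginal over S of a list of (state, belief_of_other, prob) triples."""
    marg = {}
    for state, _other, prob in nu:
        marg[state] = marg.get(state, F(0)) + prob
    return marg

def as_table(nu1):
    """nu1 as a dict keyed by (state, first-order belief of the other player)."""
    table = {}
    for state, first, prob in nu1:
        key = (state, freeze(first))
        table[key] = table.get(key, F(0)) + prob
    return table

def project_to_first_order(nu2):
    """Marginal of nu2 over S x Z1: drop the other player's second-order part."""
    return as_table([(state, first, prob) for state, (first, _second), prob in nu2])

# Phil's beliefs from Example 11.9; states are abbreviated by color initial.
MU1 = {"r": F(1, 3), "g": F(2, 3)}
DON_RED = ({"r": F(1, 2), "b": F(1, 2)},
           [("r", {"p": F(1)}, F(1, 2)), ("b", {"r": F(1)}, F(1, 2))])
DON_GOLD = ({"g": F(1)}, [("g", {"w": F(1)}, F(1))])
NU1 = [("r", DON_RED[0], F(1, 3)), ("g", DON_GOLD[0], F(2, 3))]
NU2 = [("r", DON_RED, F(1, 3)), ("g", DON_GOLD, F(2, 3))]

phil_first_ok = marginal_on_states(NU1) == MU1
phil_second_ok = project_to_first_order(NU2) == as_table(NU1)
don_ok = all(marginal_on_states(second) == first for _s, (first, second), _p in NU2)
print(phil_first_ok, phil_second_ok, don_ok)
```

The three flags correspond, in order, to the two bullet conditions on Phil’s own belief and to the statement that the second-order belief Phil ascribes to Don is coherent in both states.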
Does there exist a coherent belief hierarchy of order $k$ for every $k$? Can every coherent belief hierarchy of order $k$ be extended to a coherent belief hierarchy of order $k + 1$; in other words, given a coherent belief hierarchy $\mu_k$ of order $k$, can we find a coherent belief hierarchy $\mu_{k+1}$ of order $k + 1$ such that $\mu_{k+1} = (\mu_k, \nu_k)$? The answer to these questions is yes. With respect to the first question, if $s_0 \in S$ is a given state of nature, then the following sentence defines a coherent belief hierarchy of order $k$ (Exercise 11.7): “I ascribe probability 1 to the state of nature being $s_0$, I ascribe probability 1 to the other players ascribing probability 1 to the state of nature being $s_0$, I ascribe probability 1 to each of the other players ascribing probability 1 to each of the other players ascribing probability 1 to the state of nature being $s_0$, and so on, up to level $k$.” The proof that every coherent belief hierarchy of order $k$ can be extended to a coherent belief hierarchy of order $k + 1$ is more complicated, and we will present it next. We start by showing that for every $k$, the set $Z_k$ is compact.

Theorem 11.10 For each $k \in \mathbb{N}$, the set $Z_k$ is compact in $X_k$.

Proof: The theorem is proved by induction on $k$. Start with $k = 1$. By definition, $Z_1 = \Delta(S)$, which is a compact set in a metric space (see the proof of Theorem 11.5). Let $k \geq 1$, and suppose by induction that $Z_k$ is a compact set in $X_k$. Since the set $X_{k+1}$ is compact in a metric space (Theorem 11.5), to prove that $Z_{k+1}$ is a compact set in $X_{k+1}$ it suffices to prove that the set $Z_{k+1}$ is a closed set. To this end we need to show that the limit of every convergent sequence of points in $Z_{k+1}$ is also in $Z_{k+1}$. This follows from the following two well-known facts regarding the weak-∗ topology:

• Let $(\mu^j)_{j \in \mathbb{N}}$ be a sequence of probability distributions over a space $X$, which converges in the weak-∗ topology to a probability distribution $\mu$, and satisfies, for a compact set $T \subseteq X$, the condition $\mu^j(T) = 1$ for all $j \in \mathbb{N}$. Then $\mu(T) = 1$ (this follows from Theorem 2.1 in Billingsley [1999]).
• Let $(\mu^j)_{j \in \mathbb{N}}$ be a sequence of probability distributions over a product space $X \times Y$ converging in the weak-∗ topology to a probability distribution $\mu$. Denote by $\nu^j$ the marginal distribution of $\mu^j$ over $X$, and by $\nu$ the marginal distribution of $\mu$ over $X$. Then the sequence $(\nu^j)_{j \in \mathbb{N}}$ converges in the weak-∗ topology to $\nu$ (see Theorem 2.8 in Billingsley [1999]).

Indeed, let $(\mu^j_{k+1})_{j \in \mathbb{N}}$ be a sequence of points in $Z_{k+1}$ converging to the limit $\mu_{k+1}$ in $X_{k+1}$. Denote $\mu^j_{k+1} = (\mu^j_k, \nu^j_k)$ and $\mu_{k+1} = (\mu_k, \nu_k)$. By Theorem 11.8, $\nu^j_k$ ascribes probability 1 to $S \times (Z_k)^{n-1}$, and the marginal probability distribution of $\nu^j_k$ over $S \times (Z_{k-1})^{n-1}$ is $\nu^j_{k-1}$ (for $k = 1$, the marginal distribution of $\nu^j_1$ over $S$ is $\mu^j_1$). By the two above-mentioned facts, these two properties also hold for the limits $\mu_k$ and $\nu_k$. By Theorem 11.8, we deduce that $\mu_{k+1} \in Z_{k+1}$, which is what we needed to show.

We are now ready to prove that every coherent belief hierarchy $\mu_k \in Z_k$ of order $k$ can be extended to a coherent belief hierarchy $\mu_{k+1} = (\mu_k, \nu_k) \in Z_{k+1}$. Since the set $Z_1$ is nonempty, it will follow from this in particular that for any $k \in \mathbb{N}$ the set $Z_k$ is nonempty.

Theorem 11.11 For any $k \in \mathbb{N}$ and every coherent belief hierarchy $\mu_k$ of order $k$ there exists $\nu_k \in \Delta(S \times (Z_k)^{n-1})$ such that the pair $(\mu_k, \nu_k)$ is a coherent belief hierarchy of order $k + 1$.

We will in effect be proving that there exists a continuous function $h_k : Z_k \to \Delta(S \times (Z_k)^{n-1})$ such that $(\mu_k, h_k(\mu_k)) \in Z_{k+1}$ for every $\mu_k \in Z_k$. If we define a function $f_k : Z_k \to Z_{k+1}$ by
\[ f_k(\mu_k) = (\mu_k, h_k(\mu_k)), \tag{11.12} \]
the function $f_k$ will be a continuous function associating every coherent belief hierarchy of order $k$ with a coherent belief hierarchy of order $k + 1$, such that the projection of $f_k$ to the first coordinate is the identity function.

Proof: We prove the existence of the continuous function $h_k$ by induction on $k$. We start with the case $k = 1$. Let $s_1 \in S$ be a state of nature. The distribution $[1(s_1)] \in \Delta(S)$ is a first-order belief hierarchy in which the player ascribes probability 1 to $s_1$. For each $\mu_1 \in Z_1 = \Delta(S)$, consider the product² distribution $\nu_1 := \mu_1 \otimes [1(s_1)]^{n-1}$ over $S \times (Z_1)^{n-1}$. The pair $\mu_2 := (\mu_1, \nu_1)$ is a second-order belief hierarchy: the player believes that the probability distribution of the state of nature is $\mu_1$, and that each of the other players ascribes probability 1 to the state of nature being $s_1$. Define a function $h_1 : Z_1 \to \Delta(S \times (Z_1)^{n-1})$ as follows:
\[ h_1(\mu_1) := \mu_1 \otimes [1(s_1)]^{n-1}. \tag{11.13} \]

² When $\mu_1 \in \Delta(X_1)$ and $\mu_2 \in \Delta(X_2)$ are two probability distributions, the product distribution $\mu_1 \otimes \mu_2 \in \Delta(X_1 \times X_2)$ is the unique probability distribution over $X_1 \times X_2$ that satisfies $(\mu_1 \otimes \mu_2)(A_1 \times A_2) = \mu_1(A_1) \cdot \mu_2(A_2)$ for every pair of measurable sets $A_1$ in $X_1$ and $A_2$ in $X_2$.
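For a finite set of states, the base case $h_1(\mu_1) = \mu_1 \otimes [1(s_1)]^{n-1}$ of Equation (11.13) can be sketched as follows. The encoding of distributions as dictionaries, the function names, and the sample numbers are assumptions made for illustration only; the sketch simply confirms that the marginal of the constructed $\nu_1$ over $S$ is $\mu_1$, which is the coherency requirement for order 2.

```python
from fractions import Fraction as F

def h1(mu1, s_ref, n):
    """Return nu1 = mu1 (x) [1(s_ref)]^(n-1) for a finitely supported mu1.

    The result maps (state, beliefs_of_the_other_players) to a probability,
    where every other player holds the point mass on s_ref.
    """
    point_mass = ((s_ref, F(1)),)      # hashable encoding of [1(s_ref)]
    others = (point_mass,) * (n - 1)   # one copy per other player
    return {(state, others): prob for state, prob in mu1.items() if prob > 0}

def marginal_on_states(nu1):
    """Marginal over S of a distribution produced by h1."""
    marg = {}
    for (state, _others), prob in nu1.items():
        marg[state] = marg.get(state, F(0)) + prob
    return marg

# Hypothetical first-order belief with n = 3 players and reference state "s1".
MU1 = {"s1": F(1, 4), "s2": F(3, 4)}
NU1 = h1(MU1, "s1", n=3)
print(marginal_on_states(NU1) == MU1)
```

Because the beliefs attributed to the other players do not depend on the state, the constructed $\nu_1$ is a product distribution; this is one convenient extension among many, since coherency constrains only the marginal over $S$.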
As we saw earlier, the pair $(\mu_1, h_1(\mu_1))$ is a coherent second-order belief hierarchy. Moreover, the function $h_1$ is continuous (why?). We have thus completed the proof for the case $k = 1$.

Suppose by induction that there exists a continuous function $h_k : Z_k \to \Delta(S \times (Z_k)^{n-1})$ satisfying $(\mu_k, h_k(\mu_k)) \in Z_{k+1}$ for all $\mu_k \in Z_k$. By Equation (11.12) this function defines a function $f_k : Z_k \to Z_{k+1}$. We now proceed to construct the function $h_{k+1}$. For every $k \in \mathbb{N}$ set $Y_k := S \times (Z_k)^{n-1}$. For every coherent belief hierarchy of order $k + 1$, $\mu_{k+1} = (\mu_k, \nu_k)$, the component $\nu_k$ is a probability distribution over $S \times (Z_k)^{n-1} = Y_k$. Note that
\[ Y_{k+1} = S \times (Z_{k+1})^{n-1} \subseteq S \times \bigl(Z_k \times \Delta(S \times (Z_k)^{n-1})\bigr)^{n-1} = S \times (Z_k)^{n-1} \times \bigl(\Delta(S \times (Z_k)^{n-1})\bigr)^{n-1} = Y_k \times (\Delta(Y_k))^{n-1}. \tag{11.14} \]
We will denote an element of $Y_k$ by $(s, (\mu_{k,j})_{j=1}^{n-1})$, where $\mu_{k,j} \in Z_k$ for all $j = 1, 2, \ldots, n - 1$. Using Equation (11.12), changing the order of the coordinates yields
\[ \bigl(s, (f_k(\mu_{k,j}))_{j=1}^{n-1}\bigr) = \bigl(s, (\mu_{k,j}, h_k(\mu_{k,j}))_{j=1}^{n-1}\bigr) = \bigl(s, (\mu_{k,j})_{j=1}^{n-1}, (h_k(\mu_{k,j}))_{j=1}^{n-1}\bigr); \tag{11.15} \]
i.e., the projection of $(s, (f_k(\mu_{k,j}))_{j=1}^{n-1})$ on $Y_k$ is $(s, (\mu_{k,j})_{j=1}^{n-1})$. For every measurable set $A \subseteq Y_k$ define a set $F_k(A) \subseteq Y_{k+1}$ as follows:
\[ F_k(A) := \bigl\{\bigl(s, (f_k(\mu_{k,j}))_{j=1}^{n-1}\bigr) : \bigl(s, (\mu_{k,j})_{j=1}^{n-1}\bigr) \in A\bigr\} \subseteq Y_{k+1}. \tag{11.16} \]
This set includes all the coherent belief hierarchy vectors of order $k + 1$ of the other players derived by expanding the coherent belief hierarchy vectors of order $k$ contained in $A$ by using $f_k$. By the induction assumption, $F_k(A)$ is not empty when $A \neq \emptyset$ (because $(s, (f_k(\mu_{k,j}))_{j=1}^{n-1}) \in F_k(A)$ for every $(s, (\mu_{k,j})_{j=1}^{n-1}) \in A$) and is contained in $S \times (Z_{k+1})^{n-1}$. Consider next the inverse function of $F_k$: for every measurable set $B \subseteq Y_{k+1}$ define
\[ F_k^{-1}(B) := \bigl\{\bigl(s, (\mu_{k,j})_{j=1}^{n-1}\bigr) : \bigl(s, (f_k(\mu_{k,j}))_{j=1}^{n-1}\bigr) \in B\bigr\} \subseteq Y_k. \tag{11.17} \]
This is the set of all elements of $Y_k$ that are mapped by $f_k$ to the elements of $B$. Since the function $f_k$ is continuous, it is in particular a measurable function, and therefore the set $F_k^{-1}(B)$ is also measurable.³ We next define an element $\nu_{k+1} \in \Delta(Y_{k+1})$ as follows: for every measurable set $B \subseteq Y_{k+1}$,
\[ \nu_{k+1}(B) := \nu_k(F_k^{-1}(B)). \tag{11.18} \]
Define the function $h_{k+1} : Z_{k+1} \to \Delta(Y_{k+1})$ by
\[ h_{k+1}(\mu_{k+1}) := \nu_{k+1}. \tag{11.19} \]
The probability distribution $\nu_{k+1}$ is a distribution over $Y_{k+1}$. By Equation (11.14), $\nu_{k+1}$ is also a probability distribution over the set $Y_k \times (\Delta(Y_k))^{n-1}$ whose support is $Y_{k+1}$. We need to check that the marginal distribution of $\nu_{k+1}$ over $Y_k$ is $\nu_k$. To do so, we consider a measurable set $A \subseteq Y_k$ and check that
\[ \nu_{k+1}\bigl(A \times (\Delta(Y_k))^{n-1}\bigr) = \nu_k(A). \tag{11.20} \]

³ To show that the set $F_k^{-1}(B)$ is measurable, it suffices to show that the function $f_k$ is a measurable function. We choose to show that this function is continuous because it is easier to do so than to show directly that it is measurable.
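Now,
\begin{align}
\nu_{k+1}\bigl(A \times (\Delta(Y_k))^{n-1}\bigr) &= \nu_k\bigl(F_k^{-1}(A \times (\Delta(Y_k))^{n-1})\bigr) \tag{11.21} \\
&= \nu_k\bigl(\bigl\{\bigl(s, (\mu_{k,j})_{j=1}^{n-1}\bigr) : \bigl(s, (f_k(\mu_{k,j}))_{j=1}^{n-1}\bigr) \in A \times (\Delta(Y_k))^{n-1}\bigr\}\bigr) \tag{11.22} \\
&= \nu_k(A). \tag{11.23}
\end{align}
Finally, we show that $h_{k+1}$ is a continuous function. Let $(\mu^l_{k+1})_{l \in \mathbb{N}}$ be a sequence of elements of $Z_{k+1}$ converging to the limit $\mu_{k+1}$ in the weak-∗ topology. Denote $\nu^l_{k+1} := h_{k+1}(\mu^l_{k+1})$ and $\nu_{k+1} = h_{k+1}(\mu_{k+1})$. We need to show that for every continuous function $g : Y_{k+1} \to \mathbb{R}$,
\[ \lim_{l \to \infty} \int_{S \times (Z_{k+1})^{n-1}} g\bigl(s, (\mu_{k+1,j})_{j=1}^{n-1}\bigr)\, d\nu^l_{k+1}\bigl(s, (\mu_{k+1,j})_{j=1}^{n-1}\bigr) = \int_{S \times (Z_{k+1})^{n-1}} g\bigl(s, (\mu_{k+1,j})_{j=1}^{n-1}\bigr)\, d\nu_{k+1}\bigl(s, (\mu_{k+1,j})_{j=1}^{n-1}\bigr). \tag{11.24} \]
This follows directly from $\mu_{k+1} = f_k(\mu_k)$, along with the fact that if $g$ and $f_k$ are continuous functions then the composition $g(s, (f_k(\mu_{k,j}))_{j=1}^{n-1})$ is a continuous function, where $\mu_{k+1,j} = (\mu_{k,j}, \nu_{k,j})$.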
11.2 Types
The sequence $(Z_k)_{k=1}^{\infty}$ of the spaces of coherent belief hierarchies has a special structure. Define a projection $\rho : Z_{k+1} \to Z_k$ as follows: if $\mu_{k+1} = (\mu_k, \nu_k) \in Z_{k+1}$, then $\rho(\mu_{k+1}) := \mu_k$. Theorem 11.11 implies that $\rho(Z_{k+1}) = Z_k$. Such a structure is called a projective structure, and it enables us to define the projective limit as follows.

Definition 11.12 The projective limit⁴ of the sequence of the spaces $(Z_k)_{k=1}^{\infty}$ is the space $T$ of all the sequences $(\mu_1, \mu_2, \ldots) \in \times_{k=1}^{\infty} Z_k$, where for every $k \in \mathbb{N}$ the belief hierarchy $\mu_k \in Z_k$ is the projection of the belief hierarchy $\mu_{k+1} \in Z_{k+1}$ on $Z_k$. In other words, there exists a distribution $\nu_k \in \Delta(S \times (Z_k)^{n-1})$ such that $\mu_{k+1} = (\mu_k, \nu_k)$. The projective limit $T$ is called the universal type space.

An element in the universal type space is a sequence of finite belief hierarchies, satisfying the condition that for each $k$, the belief hierarchy of order $k + 1$ is an extension of the belief hierarchy of order $k$. Such an element is called a type, a term due to Harsanyi.

Definition 11.13 An element $t = (\mu_1, \mu_2, \ldots) \in T$ is called a type.

A player’s type is sometimes called his “state of mind,” since it contains answers to all questions regarding the player’s beliefs (of any order) about the state of nature. A player’s belief hierarchy defines his beliefs to all orders: his beliefs about the state of nature, his beliefs about the beliefs of the other players, his beliefs about their second-order beliefs, and so on. We assume that a player’s type is all the relevant information that the player has about the situation, and in what follows we will relate to a type as all the information in a player’s possession.

⁴ The projective limit is also called the inverse limit. This definition is a special case of the more general definition of the projective limit of an infinite sequence of spaces on which a projective operator is defined, from which the name “projective limit” is derived.

Let $t = (\mu_1, \mu_2, \ldots)$ be a player’s type. Since the distribution $\mu_k$ is a coherent belief hierarchy of order $k$, as previously noted, it follows that for every list of players $i_1, i_2, \ldots, i_l$, the player believes that player $i_1$ believes that player $i_2$ believes that . . . believes that player $i_l$’s belief hierarchy of order $k - l$ is coherent. Since for every $k \in \mathbb{N}$, the first component of $\mu_{k+1}$ is $\mu_k$, a player of type $t$ believes that the fact that “the players’ beliefs are coherent” is common belief among the players (Definition 10.9 on page 393).

As the following example shows, when the set of players contains only one player, and there are two states of nature, the universal type space can be simply described. This observation is extended to any finite set of states of nature in Exercise 11.9.

Example 11.14 In this example, we will construct the universal type space when there is one player, $N = \{\mathrm{I}\}$, and two states of nature, $S = \{s_1, s_2\}$. By definition,
\[ X_1 = \Delta(S), \tag{11.25} \]
\[ X_k = (\Delta(S))^{k-1} \times \Delta(S) = (\Delta(S))^{k}, \quad \forall k \geq 2. \tag{11.26} \]
The coherency condition implies that the player’s type (there is only one player here) is entirely determined by his first-order beliefs. The universal type space in this case is homeomorphic to the set $[0, 1]$: for every $p \in [0, 1]$, the element $t_p$ corresponds to the type ascribing probability $p$ to the state of nature $s_1$, and probability $1 - p$ to the state of nature $s_2$. ◭
When the set of players contains two or more players, the mathematical structure of the universal type space is far more complicated, because in that case a second-order belief hierarchy is a distribution over distributions, a third-order belief hierarchy is a distribution over distributions over distributions, and so on. The only way to analyze universal type spaces tractably requires simplifying their mathematical description. We will therefore consider several mathematical properties of the universal type space $T$ that will be useful towards that end. Since a type is an element of the product space $\times_{k=1}^{\infty} Z_k$, a natural topology over the universal type space, which we will use, is the topology induced by the product topology on this space.

Theorem 11.15 The universal type space $T$ is a compact space.

Proof: As previously stated, every coherent belief hierarchy $\mu_k$ of order $k$ uniquely defines an element
\[ (\mu_1, \mu_2, \ldots, \mu_k) \in Z_1 \times Z_2 \times \cdots \times Z_k, \tag{11.27} \]
where $\mu_l$ is the projection of $\mu_k$ on $Z_l$ for every $l$, $1 \leq l \leq k$. Denote by $T_k \subseteq Z_1 \times Z_2 \times \cdots \times Z_k$ the set containing all $k$-order coherent belief hierarchies and their projections. $Z_k$ is a compact space for every $k \in \mathbb{N}$ (Theorem 11.10), and therefore the Cartesian product $Z_1 \times Z_2 \times \cdots \times Z_k$ is also compact. We will now show that $T_k \subseteq Z_1 \times Z_2 \times \cdots \times Z_k$ is a compact set. To see this, note that for every $l = 1, 2, \ldots, k$, the projection $\rho_{k,l} : Z_k \to Z_l$ is a continuous function; hence $T_k$, which is the image of the compact set $Z_k$ under the continuous mapping $(\rho_{k,1}, \rho_{k,2}, \ldots, \rho_{k,k})$, is a compact set. Tychonoff’s Theorem (see, for example, Theorem I.8.5 in Dunford and Schwartz [1988]) states that the (finite or infinite) Cartesian product of compact spaces is a compact space in the product topology. It follows that
\[ \widehat{T}_k := T_k \times Z_{k+1} \times Z_{k+2} \times \cdots \tag{11.28} \]
is also a compact set for every $k \in \mathbb{N}$. Since $T = \bigcap_{k \in \mathbb{N}} \widehat{T}_k$ we conclude that $T$, as the intersection of compact sets, is a compact set in $Z_1 \times Z_2 \times \cdots$.
The topology over $T$ is the collection of open sets in $T$. In order to study the probability distributions over $T$, it is necessary first to define a σ-algebra over $T$. A natural σ-algebra is the σ-algebra of the Borel sets: this is the minimal σ-algebra over $T$ that contains all the open sets in $T$.

The next theorem provides us with another way of defining the type of a player. It says that the type of a player is a probability distribution over the states of nature and the types of the other players.

Theorem 11.16 The universal type space $T$ satisfies⁵

\[ T = \Delta(S \times T^{n-1}). \tag{11.29} \]

To be more precise, we will prove that there exists a natural homeomorphism⁶ $\varphi : \Delta(S \times T^{n-1}) \to T$.

Proof: An element in $T$ is a vector of the form $(\mu_1, \mu_2, \ldots)$, satisfying $\mu_k = \rho(\mu_{k+1})$ for all $k \in \mathbb{N}$. In the proof we will use the following:

\[ S \times T^{n-1} \subseteq S \times \underbrace{(Z_1 \times Z_2 \times \cdots) \times \cdots \times (Z_1 \times Z_2 \times \cdots)}_{n-1 \text{ times}} = S \times (Z_1)^{n-1} \times (Z_2)^{n-1} \times \cdots . \tag{11.30} \]
Step 1: Definition of the function $\varphi : \Delta(S \times T^{n-1}) \to T$. We will show that every distribution $\lambda \in \Delta(S \times T^{n-1})$ uniquely determines an element $(\mu_1, \mu_2, \ldots) \in T$. The belief hierarchies of all finite orders are defined as follows. Let $\mu_1$ be the marginal distribution of $\lambda$ over $S$. For each $k \geq 1$, let $\nu_k$ be the marginal distribution of $\lambda$ over $S \times (Z_k)^{n-1}$ (see Equation (11.30)). Inductively define $\mu_{k+1} = (\mu_k, \nu_k)$ for each $k \geq 1$. To show that the resulting sequence $(\mu_1, \mu_2, \ldots)$ is a type in $T$, we need to show that for each $k \in \mathbb{N}$, the projection of $\mu_{k+1}$ on $Z_k$ is $\mu_k$, which follows from the definitions of $\nu_{k+1}$ and $\nu_k$ and from the fact that $T$ contains only coherent types.

Step 2: The function $\varphi$ is continuous. The claim obtains because if $(\lambda^l)_{l \in \mathbb{N}}$ is a sequence of probability distributions defined over the probability space $X \times Y$ converging to $\lambda$ in the weak-∗ topology, and if $\mu^l$ is the marginal distribution of $\lambda^l$ over $X$, then the sequence of distributions $(\mu^l)_{l \in \mathbb{N}}$ converges in the weak-∗ topology to the marginal distribution $\mu$ of $\lambda$ over $X$ (see Theorem 2.8 in Billingsley [1999]).

5 The σ-algebra over $S \times T^{n-1}$ is the product σ-algebra.
6 A homeomorphism between two spaces $X$ and $Y$ is a continuous bijection $f : X \to Y$, whose inverse $f^{-1} : Y \to X$ is also continuous.

Step 3: The function $\varphi$ is injective. We will show that $\lambda$ can be reconstructed from $\varphi(\lambda)$, for each $\lambda \in \Delta(S \times T^{n-1})$. Denote $\varphi(\lambda) = (\mu_1, \mu_2, \ldots)$. Recall that $\mu_{k+1} = (\mu_k, \nu_k)$, where $\nu_k$ is a probability distribution over the space $S \times (Z_k)^{n-1}$. Because $Z_k$ contains all the hierarchies of all orders $1, 2, \ldots, k$, it follows that $\nu_k$ is a probability distribution over the space $S \times (Z_1)^{n-1} \times (Z_2)^{n-1} \times \cdots \times (Z_{k-1})^{n-1}$. Since $\mu_k = (\mu_{k-1}, \nu_{k-1})$, the marginal distribution of $\nu_k$ over $S \times (Z_1)^{n-1} \times (Z_2)^{n-1} \times \cdots \times (Z_{k-2})^{n-1}$ is the probability distribution $\nu_{k-1}$. From the Kolmogorov Extension Theorem (see, for example, Theorem II.3.4 in Shiryaev [1995]) it follows that there exists a unique distribution $\lambda^*$ over the space $S \times (Z_1)^{n-1} \times (Z_2)^{n-1} \times \cdots$ satisfying the condition that for every $k \in \mathbb{N}$, the marginal distribution of $\lambda^*$ over $S \times (Z_1)^{n-1} \times (Z_2)^{n-1} \times \cdots \times (Z_{k-1})^{n-1}$ is $\nu_k$. Since the marginal distribution of $\lambda^*$ over $S \times (Z_1)^{n-1} \times (Z_2)^{n-1} \times \cdots \times (Z_{k-1})^{n-1}$ equals the marginal distribution of $\lambda$ over these spaces, the uniqueness of the extension implies that $\lambda = \lambda^*$.

Step 4: The function $\varphi$ is surjective. Let $(\mu_1, \mu_2, \ldots) \in T$ be a type. As we saw in Step 3, a type in $T$ defines a unique distribution $\lambda \in \Delta(S \times T^{n-1})$. The reader is asked to ascertain that $\varphi(\lambda)$ equals $(\mu_1, \mu_2, \ldots)$.

Step 5: The function $\varphi$ is a homeomorphism. Every continuous, injective, and surjective function $\varphi$ from a compact space to a Hausdorff space⁷ is a homeomorphism (see Claim I.5.8 in Dunford and Schwartz [1988]).
11.3 Definition of the universal belief space
Definition 11.17 The universal belief space is

\[ \Omega = \Omega(N, S) = S \times T^n. \tag{11.31} \]

By definition, the universal belief space is determined by the set of states of nature and by the number of players. To understand the meaning of Definition 11.17, write Equation (11.31) in the following form:

\[ \Omega = S \times \mathop{\times}_{i \in N} T_i, \tag{11.32} \]

where $T_i = T$ for all $i \in N$. The space $T_i$ is called player $i$’s type space. It is the same space for all the players, and is the universal type space. An element of $\Omega$ is a state of the world, and denoted by $\omega$ to distinguish it from the states of nature, which are elements of $S$. A state of the world is therefore a vector

\[ \omega = (s(\omega), t_1(\omega), t_2(\omega), \ldots, t_n(\omega)). \tag{11.33} \]

The first coordinate $s(\omega)$ is the state of nature at the state of the world $\omega$, and $t_i(\omega)$ is player $i$’s type at this state of the world. In other words, a state of the world is characterized by the state of nature $s(\omega)$, and the vector of types of the players, $(t_i(\omega))_{i \in N}$, at that state of the world. We will assume that all a player knows is his own type. While in the Aumann and Harsanyi models the belief hierarchies of the players can be computed at every state of the world, in the universal belief space these hierarchies are part of the data defining the state of the world: a state of the world consists of a state of nature and the players’ belief hierarchies.

7 A topological space is a Hausdorff space if (a) every set containing a single point is closed, and (b) for every pair of distinct points there exist two disjoint and open sets, each of which contains one point but not the other. The space $T$ is a Hausdorff space (Exercise 11.11).

Example 11.14 (Continued) We have seen that when $N = \{\mathrm{I}\}$ and $S = \{s_1, s_2\}$, the universal type space $T$ is homeomorphic to the interval $[0, 1]$. In this case, the universal belief space is $\Omega = \Omega(\{\mathrm{I}\}, S) = S \times [0, 1]$. For every $p \in [0, 1]$, at the state of the world $\omega = (s_1, p)$, the state of nature is $s_1$, and the player ascribes probability $p$ to the state of nature being $s_1$, and probability $1 - p$ to the state of nature being $s_2$. ◭
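The one-player universal belief space of Example 11.14 can be sketched in a few lines of Python. The sketch assumes that a state of the world is stored as a pair (state of nature, $p$), and that the player's belief at such a state is read as a distribution over states of the world that share the same $p$ (the reading made precise in Remark 11.18). The names `StateOfWorld`, `belief_over_worlds`, and `first_order_belief` are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Dict

STATES_OF_NATURE = ("s1", "s2")


@dataclass(frozen=True)
class StateOfWorld:
    s: str       # state of nature at this state of the world
    p: Fraction  # the player's type: probability he ascribes to s1

    def __post_init__(self) -> None:
        if self.s not in STATES_OF_NATURE:
            raise ValueError("unknown state of nature")
        if not 0 <= self.p <= 1:
            raise ValueError("p must lie in [0, 1]")


def belief_over_worlds(omega: StateOfWorld) -> Dict[StateOfWorld, Fraction]:
    """The player's type, viewed as a distribution over states of the world.

    All mass sits on states of the world with the same type p, so the
    marginal on the player's own type is degenerate.
    """
    return {
        StateOfWorld("s1", omega.p): omega.p,
        StateOfWorld("s2", omega.p): 1 - omega.p,
    }


def first_order_belief(omega: StateOfWorld) -> Dict[str, Fraction]:
    """Marginal of the belief on the states of nature."""
    out = {s: Fraction(0) for s in STATES_OF_NATURE}
    for world, prob in belief_over_worlds(omega).items():
        out[world.s] += prob
    return out
```

Note that the true state of nature `omega.s` plays no role in the belief: at both $(s_1, p)$ and $(s_2, p)$ the player holds the same belief, since all he knows is his own type.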
Remark 11.18 As we saw in Theorem 11.16, a type $t_i(\omega) \in T = \Delta(S \times T^{n-1})$ is a probability distribution over the vectors of the state of nature and the list of the $n - 1$ types of the other players. Since $\Omega = (S \times (\times_{j \neq i} T_j)) \times T_i$, and because at every state of the world $\omega$ every player $i$ knows his own type $t_i(\omega)$, we can regard $t_i(\omega)$ also as a probability distribution over $\Omega$, where the marginal distribution over $T_i$ is the degenerate distribution at the point $\{t_i(\omega)\}$. From here on, we will assume that $t_i(\omega)$ is indeed a probability distribution over $\Omega$.

Recall that a belief space of the set of players $N$ on the set of states of nature $S$ is an ordered vector $\Pi = (Y, \mathcal{Y}, s, (\pi_i)_{i \in N})$, where $(Y, \mathcal{Y})$ is a measurable space of states of the world, $s : Y \to S$ is a measurable function associating a state of nature with every state of the world, and $\pi_i : Y \to \Delta(Y)$ is a function associating a probability distribution over $Y$ with every state of the world and every player $i \in N$, which satisfies the conditions of coherency and measurability. The next two theorems justify the name “universal belief space” that we gave to $\Omega$: Theorem 11.19 states that $\Omega$ naturally defines a belief space, and Theorem 11.20 states that every belief space is a belief subspace of $\Omega$. It follows that the space $\Omega(N, S)$ contains all the possible situations of incomplete information of a set of players $N$ on a set of states of nature $S$. Denote by $\mathcal{Y}^*$ the product σ-algebra over the set $\Omega$ defined by Equation (11.32).

Theorem 11.19 The ordered vector $\Pi^* = (\Omega, \mathcal{Y}^*, s, (t_i)_{i \in N})$ is a belief space, where $\Omega$ is the universal belief space and $s$ and $(t_i)_{i \in N}$ are projections on the $n + 1$ coordinates of the state of the world (see Equation (11.33)).

Proof: We will show that all the conditions defining a belief space are satisfied. The space $(\Omega, \mathcal{Y}^*)$ is a measurable space. Since $\mathcal{Y}^*$ is the product σ-algebra, the coordinate projections $s$ and $t_i$ are measurable functions.
We will next show that the functions $(t_i)_{i \in N}$ satisfy the coherency condition (see Definition 10.1 on page 387). As stated in Remark 11.18, for every player $i \in N$ and every $\omega \in \Omega$, the type $t_i(\omega)$ is a probability distribution over $\Omega$. Since $t_i$ is a measurable function, the set $\{\omega' \in \Omega : t_i(\omega') = t_i(\omega)\}$ is measurable, and by definition, the probability distribution $t_i(\omega)$ ascribes probability 1 to this set, showing that the coherency condition is satisfied.

Finally, we check that for every $i \in N$, the function $t_i$ satisfies the measurability condition (see Definition 10.1 on page 387). To do so, we need to show that for every measurable set $E$ in $\Omega$, the function $t_i(E \mid \cdot) : \Omega \to [0, 1]$ is measurable. We prove this by showing that for every $x \in [0, 1]$, the set $G_x = \{\omega \in \Omega : t_i(E \mid \omega) > x\}$ is measurable. By the definition of the weak-∗ topology, for every continuous function $f : \Omega \to \mathbb{R}$ and every $x \in [0, 1]$, the set

\[ A_{f,x} := \left\{ \omega \in \Omega : \int_{\Omega} f(\omega') \, dt_i(\omega' \mid \omega) > x \right\} \tag{11.34} \]

is measurable. Let $F$ be the family of continuous functions $f : \Omega \to (0, \infty)$ satisfying the condition $f(\omega) > 1$ for all $\omega \in E$. Let $F_0$ be a countable dense subset of $F$ (why does such a set exist?). The intersection $\cap_{f \in F_0} A_{f,x}$, as the intersection of a countable number of measurable sets, is measurable, and is equal to $G_x$ (why?).

Theorem 11.20 Every belief space of a set of players $N$ on a set of states of nature $S$ is a belief subspace (see Definition 10.21 on page 400) of the universal belief space $\Omega(N, S)$ defined in Theorem 11.19. To be precise, every belief space $\Pi = (Y, \mathcal{Y}, s, (\pi_i)_{i \in N})$ is homomorphic to a belief subspace of the belief space $\Pi^*$, in the following sense: the belief hierarchy of every player $i$ at every state of the world $\omega \in Y$ equals his belief hierarchy at the state of the world in $\Pi^*$ corresponding to $\omega$, under the homomorphism.

Proof: Let $\Pi = (Y, \mathcal{Y}, s, (\pi_i)_{i \in N})$ be a belief space. As we stated on page 390, for every state of the world $\omega \in Y$ and every player $i \in N$, we can associate an infinite belief hierarchy that describes the beliefs of player $i$ at the state of the world $\omega$. Denote this belief hierarchy by $t_i(\omega)$. For each $\omega \in Y$, the vector $\varphi(\omega) := (s(\omega), t_1(\omega), t_2(\omega), \ldots, t_n(\omega))$ is a state of the world in the universal belief space. Note that if there are two states of the world $\omega, \omega' \in Y$ satisfying the conditions that the belief hierarchy of every player $i$ in $\omega$ equals his belief hierarchy in $\omega'$, and if these two states of the world are associated with the same state of nature, then $\varphi(\omega) = \varphi(\omega')$ (this happens, for example, in the second belief space in Example 10.13 on page 394). The definition implies that the belief hierarchy of every player $i$ at every state of the world $\omega \in Y$ equals his belief hierarchy in $\varphi(\omega)$. Consider the set

\[ \widehat{Y} := \{\varphi(\omega) : \omega \in Y\} \subseteq \Omega. \tag{11.35} \]

It is left to the reader to check that the set $\widehat{Y}$ is a belief subspace of $\Pi^*$ (Exercise 11.10).
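The mapping from a state of the world to the players' belief hierarchies can be made concrete for a finite belief space. The following Python sketch is only an illustration under stated assumptions: the three-state belief space below is hypothetical data (it is not taken from the text), hierarchies are truncated at order 2 rather than being infinite, and the helper names `freeze`, `pushforward`, `first_order`, `second_order`, `marginal_on_nature`, and `phi2` are invented for this example.

```python
from fractions import Fraction as Fr

PLAYERS = ("I", "II")

# Hypothetical finite belief space: states of the world w1, w2, w3.
nature = {"w1": "s1", "w2": "s2", "w3": "s2"}
beliefs = {
    "I": {
        "w1": {"w1": Fr(1)},
        "w2": {"w2": Fr(1, 3), "w3": Fr(2, 3)},
        "w3": {"w2": Fr(1, 3), "w3": Fr(2, 3)},
    },
    "II": {
        "w1": {"w1": Fr(1, 4), "w2": Fr(3, 4)},
        "w2": {"w1": Fr(1, 4), "w2": Fr(3, 4)},
        "w3": {"w3": Fr(1)},
    },
}


def freeze(dist):
    """Hashable, order-independent form of a finite distribution."""
    return tuple(sorted(((k, v) for k, v in dist.items() if v != 0), key=repr))


def pushforward(dist, f):
    """Distribution of f(w) when w is drawn according to dist."""
    out = {}
    for w, prob in dist.items():
        key = f(w)
        out[key] = out.get(key, 0) + prob
    return out


def first_order(i, w):
    """Player i's belief over the states of nature at w."""
    return freeze(pushforward(beliefs[i][w], lambda v: nature[v]))


def second_order(i, w):
    """Player i's belief over (state of nature, first-order beliefs of the others)."""
    others = [j for j in PLAYERS if j != i]
    return freeze(pushforward(
        beliefs[i][w],
        lambda v: (nature[v], tuple(first_order(j, v) for j in others)),
    ))


def marginal_on_nature(second):
    """Marginal over the states of nature of a frozen second-order belief."""
    out = {}
    for (s, _), prob in second:
        out[s] = out.get(s, 0) + prob
    return freeze(out)


def phi2(w):
    """Order-2 truncation of the map w -> (s(w), t_1(w), ..., t_n(w))."""
    return (nature[w], tuple((first_order(i, w), second_order(i, w)) for i in PLAYERS))
```

In this sketch the marginal over the states of nature of each second-order belief coincides with the corresponding first-order belief, which is the coherency requirement at order 2. Two states of the world with the same state of nature and the same truncated hierarchies would be sent by `phi2` to the same point, mirroring the remark in the proof that $\varphi$ need not be injective.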
Theorem 11.20 implies, for example, that in each of the examples in Section 10.3 (page 394), the belief space is a belief subspace of an appropriate universal belief space. For example, each of the belief spaces described in Examples 10.17 (page 396), 10.18 (page 398) and 10.19 (page 399), is a subspace of the universal belief space $\Omega(N, S)$, where $N = \{\mathrm{I}, \mathrm{II}\}$ and $S = \{s_{11}, s_{12}, s_{21}, s_{22}\}$.
11.4 Remarks
The universal belief space was first constructed and studied by Mertens and Zamir [1985]. Heifetz and Samet [1998] discuss a construction of the universal belief space using measure-theoretical tools, without any use of topological structures. Aumann [1999] constructs the universal belief space using a semantic approach. The reader interested in the weak-∗ topology is directed to Dunford and Schwartz [1988] (Chapter V.12), Conway [1990], or Billingsley [1999]. The authors thank Yaron Azrieli, Aviad Heifetz, Dov Samet, Boaz Klartag, John Levy, and Eran Shmaya for their comments on this chapter.
11.5 Exercises
11.1 Joshua and his army are planning to circle Jericho seven times. Will the wall come tumbling down? The king of Jericho reports that:

• I ascribe probability 0.8 to “the wall of the city will fall, and Joshua ascribes probability 0.6 to the wall falling.”
• I ascribe probability 0.2 to “the wall of the city will not fall, and Joshua ascribes probability 0.5 to the wall falling.”

Answer the following questions:
(a) What is the set of states of nature corresponding to the above description?
(b) What is the king’s first-order belief? What is his second-order belief?
(c) Can Joshua’s first-order belief be ascertained from the above description? Justify your answer.

11.2 Construct a belief space of the set of players $N = \{\text{Don}, \text{Phil}\}$ on the set of states of nature $S = \{s_b, s_g, s_p, s_r, s_w\}$ describing the situation in Example 11.9 (page 446), and indicate at which state of the world Phil’s belief hierarchy of order 3 is the hierarchy described in the example. There may be more than one correct answer.

11.3 Prove that $Z_k \subseteq X_k$ for each $k \geq 1$: every coherent belief hierarchy of order $k$ (Definition 11.6 on page 445) is a belief hierarchy of order $k$ (Definition 11.3 on page 442).
11.4 Consider the following belief space, where the set of players is $N = \{\mathrm{I}, \mathrm{II}\}$, and the set of states of nature is $S = \{s_1, s_2\}$:

State of the world   s(·)   πI(·)                  πII(·)
ω1                   s1     [1/2(ω1), 1/2(ω2)]     [1(ω1)]
ω2                   s2     [1/2(ω1), 1/2(ω2)]     [3/4(ω2), 1/4(ω3)]
ω3                   s1     [1(ω3)]                [3/4(ω2), 1/4(ω3)]

Write out the belief hierarchies of orders 1, 2, and 3 of Player I, at each state of the world.

11.5 Roger reports:
• I ascribe probability 0.3 to “the Philadelphia Phillies will win the World Series next year, Chris ascribes probability 0.4 to their winning the World Series and to me ascribing probability 0.4 that they will win the World Series, and Chris ascribes probability 0.6 to the Philadelphia Phillies not winning the World Series and to me ascribing probability 0.2 that they will win the World Series.”
• I ascribe probability 0.2 to “the Philadelphia Phillies will win the World Series next year, Chris ascribes probability 0.4 to their winning the World Series and to me ascribing probability 0.5 that they will win the World Series, and Chris ascribes probability 0.6 to the Philadelphia Phillies not winning the World Series and to me ascribing probability 0.8 that they will win the World Series.”
• I ascribe probability 0.4 to “the Philadelphia Phillies will win the World Series next year, Chris ascribes probability 0.2 to their winning the World Series and to me ascribing probability 0.1 that they will win the World Series, and Chris ascribes probability 0.8 to the Philadelphia Phillies not winning the World Series and to me ascribing probability 0.3 that they will win the World Series.”
• I ascribe probability 0.1 to “the Philadelphia Phillies will win the World Series next year, Chris ascribes probability 0.6 to their winning the World Series and to me ascribing probability 0.4 that they will win the World Series, and Chris ascribes probability 0.4 to the Philadelphia Phillies not winning the World Series and to me ascribing probability 0.7 that they will win the World Series.”

Answer the following questions:
(a) Construct a space of states of nature corresponding to the above description.
(b) What is Roger’s first-order belief? What is his second-order belief? What is his third-order belief?

11.6 Prove Theorem 11.7: let $\mu_k = (\mu_1; \nu_1, \nu_2, \ldots, \nu_{k-1}) \in Z_k$ be a coherent belief hierarchy of order $k$ and let $l_1, l_2$ be two integers such that $1 \leq l_1 \leq l_2 \leq k$. Then:
(a) The marginal distribution of $\nu_1$ over $S$ equals $\mu_1$.
(b) The marginal distribution of $\nu_{l_2}$ over $S \times (Z_{l_1})^{n-1}$ is $\nu_{l_1}$.

11.7 Let $s_0 \in S$ be a state of nature. Prove that the following sentence defines a coherent belief hierarchy of order $k$: “I ascribe probability 1 to the state of nature being $s_0$,
I ascribe probability 1 to all the other players ascribing probability 1 to the state of nature being $s_0$, I ascribe probability 1 to each of the other players ascribing probability 1 to every other player ascribing probability 1 to the state of nature being $s_0$, and so on, to level $k$.”

11.8 There are two players, $N = \{\mathrm{I}, \mathrm{II}\}$, and the space of states of nature is $S = \{s_1, s_2\}$. Ascertain for each of the following belief hierarchies of Player I whether or not it is a coherent belief hierarchy (of some order). Justify your answer.

(a) I ascribe probability $1/9$ to the state of nature being $s_1$.

(b)
• I ascribe probability $1/2$ to the state of nature being $s_1$.
• I also ascribe probability $2/3$ to the state of nature being $s_1$ and to me ascribing probability $3/4$ to the state of nature being $s_1$.
• I also ascribe probability $1/3$ to the state of nature being $s_1$ and to me ascribing probability $0$ to the state of nature being $s_1$.

(c)
• I ascribe probability $1/2$ to the state of nature being $s_1$.
• I also ascribe probability $1/2$ to the state of nature being $s_1$ and me ascribing probability $1/3$ to the state of nature being $s_1$.
• I also ascribe probability $1/2$ to the state of nature being $s_1$ and me ascribing probability $2/3$ to the state of nature being $s_1$.

(d)
• I ascribe probability $1/2$ to the state of nature being $s_1$.
• I also ascribe probability $2/3$ to the state of nature being $s_1$ and to Player II ascribing probability $3/4$ to the state of nature being $s_1$.
• I also ascribe probability $1/3$ to the state of nature being $s_1$ and to Player II ascribing probability $0$ to the state of nature being $s_1$.

(e)
• I ascribe probability $1/2$ to the state of nature being $s_1$.
• I also ascribe probability $1/3$ to the state of nature being $s_1$ and to Player II ascribing probability $1/4$ to the state of nature being $s_1$.
• I also ascribe probability $1/6$ to the state of nature being $s_1$ and to Player II ascribing probability $1/2$ to the state of nature being $s_1$.
• I also ascribe probability $1/2$ to the state of nature being $s_2$ and to Player II ascribing probability $2/3$ to the state of nature being $s_1$.

(f)
• I ascribe probability $1/2$ to the state of nature being $s_1$.
• I also ascribe probability $1/3$ to the state of nature being $s_1$ and to Player II ascribing probability $1/4$ to the state of nature being $s_1$.
• I also ascribe probability $1/6$ to the state of nature being $s_1$ and to Player II ascribing probability $1/4$ to the state of nature being $s_1$.
• I also ascribe probability $1/2$ to the state of nature being $s_2$ and to Player II ascribing probability $2/3$ to the state of nature being $s_1$.
• I also ascribe probability $1/3$ to the state of nature being $s_2$ and to Player II ascribing probability $1/4$ to the state of nature being $s_1$, and to me ascribing probability $2/5$ to the state of nature being $s_1$, and to Player II ascribing probability $3/4$ to the state of nature being $s_2$, and to me ascribing probability $4/5$ to the state of nature being $s_1$.
• I also ascribe probability $1/6$ to the state of nature being $s_1$ and to Player II ascribing probability $1/4$ to the state of nature being $s_1$, and to me ascribing probability $3/5$ to the state of nature being $s_1$, and to Player II ascribing probability $3/4$ to the state of nature being $s_2$, and to me ascribing probability $3/7$ to the state of nature being $s_1$.
• I also ascribe probability $1/2$ to the state of nature being $s_2$ and to Player II ascribing probability $1/4$ to the state of nature being $s_1$, and to me ascribing probability $4/5$ to the state of nature being $s_1$, and to Player II ascribing probability $3/4$ to the state of nature being $s_2$, and to me ascribing probability $2/5$ to the state of nature being $s_1$.

(g)
• I ascribe probability $1/2$ to the state of nature being $s_1$.
• I also ascribe probability $1/3$ to the state of nature being $s_1$ and to Player II ascribing probability $1/4$ to the state of nature being $s_1$.
• I also ascribe probability $1/6$ to the state of nature being $s_1$ and to Player II ascribing probability $1/2$ to the state of nature being $s_1$.
• I also ascribe probability $1/2$ to the state of nature being $s_1$ and to Player II ascribing probability $2/3$ to the state of nature being $s_1$.
• I also ascribe probability $1/2$ to the state of nature being $s_1$ and to Player II ascribing probability $1/3$ to the state of nature being $s_1$, and to me ascribing probability $2/5$ to the state of nature being $s_1$, and to Player II ascribing probability $2/3$ to the state of nature being $s_2$, and to me ascribing probability $4/5$ to the state of nature being $s_1$.
• I also ascribe probability $1/2$ to the state of nature being $s_2$ and to Player II ascribing probability $2/3$ to the state of nature being $s_1$, and to me ascribing probability $4/5$ to the state of nature being $s_1$, and to Player II ascribing probability $1/3$ to the state of nature being $s_2$, and to me ascribing probability $1/5$ to the state of nature being $s_1$.

(h)
• I ascribe probability $1/2$ to the state of nature being $s_1$.
• I also ascribe probability $1/3$ to the state of nature being $s_1$ and to Player II ascribing probability $1/4$ to the state of nature being $s_1$.
• I also ascribe probability $1/6$ to the state of nature being $s_1$ and to Player II ascribing probability $1/2$ to the state of nature being $s_1$.
• I also ascribe probability $1/2$ to the state of nature being $s_2$ and to Player II ascribing probability $2/3$ to the state of nature being $s_1$.
• I also ascribe probability $1/3$ to the state of nature being $s_1$, and to Player II ascribing probability $1/4$ to the state of nature being $s_1$ and to me ascribing probability $2/5$ to the state of nature being $s_1$, and to Player II ascribing probability $3/4$ to the state of nature being $s_2$ and to me ascribing probability $4/5$ to the state of nature being $s_1$.
• I also ascribe probability $1/6$ to the state of nature being $s_1$, and to Player II ascribing probability $1/2$ to the state of nature being $s_1$ and to me ascribing probability $4/5$ to the state of nature being $s_1$, and to Player II ascribing probability $1/2$ to the state of nature being $s_2$ and to me ascribing probability $3/5$ to the state of nature being $s_1$.
• I also ascribe probability $1/2$ to the state of nature being $s_2$, and to Player II ascribing probability $2/3$ to the state of nature being $s_1$ and to me ascribing probability $2/5$ to the state of nature being $s_1$, and to Player II ascribing probability $1/3$ to the state of nature being $s_2$ and to me ascribing probability $1/5$ to the state of nature being $s_1$.

11.9 There is a single player $N = \{\mathrm{I}\}$ and the set of states of nature is a finite set $S$. What is the universal type space in this case? What is the universal belief space $\Omega(N, S)$?

11.10 Complete the proof of Theorem 11.20 on page 455: prove that the set $\widehat{Y}$ that was defined in the proof of the theorem is a belief subspace of $\Pi^*$.

11.11 Prove that the universal type space $T$ is a Hausdorff space.
12 Auctions
Chapter summary

In this chapter we present the theory of auctions, which is considered to be one of the most successful applications of game theory, and in particular of games with incomplete information. We mainly study symmetric auctions with independent private values and risk-neutral buyers. An auction is presented as a game with incomplete information and the main interest is in the (Bayesian) equilibrium of this game, that is, in the bidding strategies of the buyers and in the expected revenue of the seller. A hallmark of this theory is the Revenue Equivalence Theorem, which states that in any equilibrium of an auction method in which (a) the winner is the buyer with the highest valuation for the auctioned item, and (b) any buyer who assigns private value 0 to the auctioned item pays nothing, the expected revenue of the seller is independent of the auction method. This theorem implies that a wide range of auction methods yield the seller the same expected revenue. We also prove that the expected revenue to the seller increases if all buyers are risk averse, and it decreases if all buyers are risk seeking.

The theory is then extended to selling mechanisms. These are abstract mechanisms to sell items to buyers that include, e.g., post-auction bargaining between the seller and the buyers who placed the highest bids. We prove the revelation principle for selling mechanisms, which allows us to consider only a simple class of mechanisms, called incentive-compatible direct selling mechanisms. We then prove the Revenue Equivalence Theorem for selling mechanisms, and identify the selling mechanism that yields the seller the highest expected profit. This turns out to be a sealed-bid second-price auction with a reserve price.
Auctions and tenders are mechanisms for the buying and selling of objects by way of bids submitted by potential buyers, with the auctioned object sold to the highest bidder. Auctions have been known since antiquity. The earliest mention of auctions appears in the fifth century BCE, in the writings of Herodotus (Book One, Clio 194):

  When they arrive at Babylon in their voyage and have disposed of their cargo, they sell by auction the ribs of the boat and all the straw.

Herodotus also tells of a Babylonian custom of selling young women by public auction to men seeking wives (Book One, Clio 196). In 193 CE, the Roman emperor Pertinax was assassinated by his Praetorian Guard. In an attempt to win the support of the guard and be crowned the next emperor, Titus Flavius Sulpicianus offered to pay 20,000 sesterces to each member of the guard. Upon hearing of Sulpicianus’ offer, Marcus Severus Didius Julianus countered with an offer of 25,000 sesterces to each member of the Praetorian Guard, and ascended to the throne; in effect, the Roman Empire had been auctioned to the highest bidder. Julianus did not live long to enjoy the prize he had won; within three months, three other generals laid claim to the crown, and Julianus was beheaded.

Auctions and tenders are ubiquitous nowadays. A very partial list of examples of objects sold in this way includes Treasury bills, mining rights, objects of fine art, bottles of wine, and repossessed houses. A major milestone in the history of auctions was achieved in the 1995 auctioning of the rights to radio-spectrum frequencies in the United States, which resulted in the federal government pocketing an unprecedented profit of 8.7 billion dollars. The main reasons for preferring auctions and tenders to other sales mechanisms are the speed with which deals are concluded, the revelation of information achieved by these mechanisms, and the prevention of improper conduct on the part of sales agents and purchasers (an especially important reason when the seller is a public body).

As we will show in this chapter, an auction is a special case of a game with incomplete information. Many of the games we encounter in daily life are highly complex. Even when the theory assures us that these games have equilibria, in most cases the equilibria are hard to compute, and it is therefore difficult to predict what buyers will do, or to advise them on the way to play. This is also true, in general, with respect to auctions. However, as we will show, under certain assumptions, it is possible to compute the equilibrium strategies in auctions, to describe how the equilibria will change if the parameters of the game are changed (e.g., the utility functions of the buyers), and to compare the expected outcomes (for both buyers and seller) when the rules of the game (or the auction method) are changed.
The theory developed in this chapter provides insights useful for participating in auctions and designing auctions. The theory of auctions is one of the most successful applications of game theory, and in particular of games with incomplete information. The theory is not simple, but it is very elegant. The combination of mathematical challenge with clear applicability makes the theory of auctions a central element of modern economic theory. In the literature auctions are classified in several ways:

• Open-bid or sealed-bid auction. In an open-bid auction, the buyers hear or see each other in real time, as bids are made, and are able immediately to offer counter bids. In a sealed-bid auction, all the buyers submit their bids simultaneously, and no buyer knows the offers made by the other buyers. Art objects are usually sold in open-bid auctions. Most public auctions conducted on the Web, and many large state-run auctions, such as the auctioning of radio-spectrum frequencies, are also open-bid auctions. In contrast, tenders for government contracts, and auctions for the sale of assets taken into receivership after a corporate bankruptcy, are usually conducted in sealed-bid auctions.
• Private value or common value. A buyer’s assessment of the worth of an object offered for sale is called a value. The literature on auction theory distinguishes between private values and common values. When the value of an object for a buyer is a private value, it is independent of the value as assessed by the other buyers. A private value is always known ahead of time to the buyer, with no uncertainty.
When the value is a common value, it is identical for all the buyers, but is unknown. This occurs, for example, in tenders for oil-drilling rights, where there is uncertainty regarding the amount of oil that can be extracted from the oil field, and in tenders for real-estate development rights, where there is uncertainty regarding the potential demand for apartments, and the final price at which the apartments will be sold. Most auctions share both characteristics to a certain extent: the value of any object, whether it is a valuable work of art or a drilling project, is never known with certainty. This unknown value is common among the buyers when measured in dollars and cents, but there is also a private component, determined by personal taste, financial resources and the future plans of the buyer. When the object offered for sale is, for example, a real-estate development or oil-drilling rights, the expected financial revenue that the project will yield is the common value component. When the object is a Treasury bond or shares in a company, the difference between the sale price and the purchase price is the common value component. This component is common to all the buyers, but is unknown to them, and each buyer may have different information (or different assessments) regarding this value. The financial abilities of the buyer, his future plans, and other possibilities available to him, should he fail to win the auction, also affect the value of an object to the buyer. These factors differ from one buyer to another, and what influences one buyer usually has no effect on another. This is the private component of the value of an object for a buyer. The literature includes general auction models that use general valuation functions, and take into account the possibility that the private information of the buyers regarding the common but unknown value of an object may be interdependent (see Milgrom and Weber [1982]).
• Selling a single object or several objects. Auctions differ with respect to the number of objects offered. Sometimes only one object is offered, such as a Chagall painting, a license to operate a television station for five years, or a letter from Marilyn Monroe to Elvis Presley. Sometimes, several copies of the same object are offered, such as batches of Treasury bonds, or shares in a company listed on the stock exchange. There are also cases in which several objects with different characteristics are offered at once. For example, in recent years some countries have conducted auctions of regional communication licenses (covering mobile telephone rights, broadcast radio rights, and so on), with licenses for different regions offered simultaneously.

In this chapter, we will focus on the case in which the buyers in an auction have independent private values, and only one object is offered for sale. This is the simplest case from the mathematical perspective. It is also historically the first case that was studied in the literature. Despite the simplicity of this model, the mathematical analysis is not trivial and the results are both elegant and applicable.

We close this introduction to the chapter with a remark on terminology. In previous chapters, the term “payoff” meant the expected payoff of a buyer. In this chapter, we will use the term “payment” to refer to the amount of money a buyer pays to the seller, and the term “profit” to denote the expected profit of the buyer, which is defined as the difference between the buyer’s expected utility from receiving the object (the probability that he will win the auction times the utility he receives from winning the object) and the expected payment the buyer pays the seller.
12.1 Notation
The participants in the auctions will be called buyers. For every random variable $X$, denote its cumulative distribution function by $F_X$. That is,

$$F_X(c) = P(X \le c), \quad \forall c \in \mathbb{R}. \tag{12.1}$$

If $X$ is a continuous random variable, denote its density function by $f_X$. In this case,

$$F_X(c) = \int_{-\infty}^{c} f_X(x)\,dx, \quad \forall c \in \mathbb{R}. \tag{12.2}$$
12.2 Common auction methods
The following list details the most common auction methods: 1. Open-bid ascending auction (English auction). This is the most common public auction. It is characterized by an auctioneer who publicly declares the price of the object offered for sale. The opening price is low, and as long as there are at least two buyers willing to pay the declared price, the auctioneer raises the price (either in discrete jumps, or in a continuous manner using a clock). Each buyer raises a hand as long as he is willing to pay the last price that the auctioneer has declared. The auction ends when all hands except one have been lowered, and the object is sold to the last buyer whose hand is still raised, at the last price declared by the auctioneer. If the auction ends in a draw (i.e., the last two or more buyers whose hands were raised drop out of the auction at the same time), a previously agreed rule (such as tossing a coin) is employed to determine who wins the object, which is then sold to the winner at the price that was current when they lowered their hands. Web-based auctions and auctions of works of art (such as those conducted at Sotheby’s and Christie’s), typically use this method. 2. Open-bid descending auction (Dutch auction). A Dutch auction operates in the reverse direction of the English auction. In this method, the auctioneer begins by declaring a very high price, higher than any buyer could be expected to pay. As long as no buyer is willing to pay the last declared price, the auctioneer lowers the declared price (either in discrete jumps or in a continuous manner using a clock), up to the point at which at least one buyer is willing to pay the declared price and indicates his readiness by raising his hand or pressing a button to stop the clock. If the price drops below a previously declared minimum, the auction is stopped, and the object on offer is not sold. 
Similarly to the English auction, a previously agreed rule is employed to determine who wins the auction if two or more buyers stop the clock at the same time. The flower auction at the Aalsmeer Flower Exchange, near Amsterdam, is conducted using this method.
3. Sealed-bid first-price auction. In this method, every buyer in the auction submits a sealed envelope containing the price he is willing to pay for the offered object. After all buyers have submitted their offers, the auctioneer opens the envelopes and reads the offers they contain. The buyer who has submitted the highest bid wins the offered object, and pays the price that he has bid. A previously agreed rule determines how to resolve draws. 4. Sealed-bid second-price auction (Vickrey auction). The sealed-bid second-price auction method is similar to the sealed-bid first-price auction method, except that the winner of the auction, i.e., the buyer who submitted the highest bid, pays the second-highest price among the bid prices for the offered object. A previously agreed-upon rule determines the winner in case of a draw, with the winner in this case paying what he bid (which is, in the case of a draw, also the second-highest bid). We mention here in passing several sealed-bid auction methods that, despite being important, will not be studied in detail in this book. In each of these methods, the winner of the auction is the buyer who has submitted the highest bid (if several buyers have submitted the same highest bid, the winner is determined by a previously agreed-upon rule). 1. A sealed-bid auction with a reserve price is a sealed-bid auction in which every bid that is lower than a minimal price, as determined by the seller, is disqualified. In a sealed-bid first-price auction with a reserve price, the winner of the auction pays the highest bid for the object; in a sealed-bid second-price auction with a reserve price, the winner pays either the second-highest bid for the object or the reserve price, whichever is higher. 2. An auction with an entry fee is a sealed-bid auction in which every buyer must pay an entry fee for participating in the auction, whether or not he wins the auction.
The winner of the auction also pays for the object he has won, in addition to the entry fee. In a sealed-bid first-price auction with an entry fee, the winner of the auction pays the highest bid for the object; in a sealed-bid second-price auction with an entry fee, the winner pays the second-highest bid for the object. In an auction with an entry fee, a buyer’s strategy is composed of two components: whether or not to participate in the auction (and pay the entry fee), and if so, how high a bid to submit. 3. An all-pay auction is a sealed- or open-bid auction in which every buyer pays the amount of money he has bid, whether or not he has won the object for sale. All-pay auctions are appropriate models for competitions, such as arms races between countries, or research and development competitions between companies racing to be the first to market with a new innovation. In these cases, all the buyers in the race, or competition, end up paying the full amounts of their investments, whether or not they win.
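The payment rules described above can be compared side by side in a short sketch. The function name `resolve_sealed_bid`, the rule labels, and the lowest-index tie rule are illustrative assumptions, not part of the text; entry fees are left out, and the reserve price is applied only to the first- and second-price rules.

```python
from typing import List, Optional, Tuple


def resolve_sealed_bid(
    bids: List[float], rule: str = "first", reserve: float = 0.0
) -> Tuple[Optional[int], List[float]]:
    """Return (winner index or None, payment of each buyer).

    rule is "first", "second", or "all-pay". Ties go to the lowest index,
    an assumed stand-in for the "previously agreed-upon rule".
    """
    n = len(bids)
    if rule == "all-pay":
        # Every buyer pays his own bid, whether or not he wins.
        winner = bids.index(max(bids))
        return winner, list(bids)
    if rule not in ("first", "second"):
        raise ValueError(f"unknown rule: {rule}")

    payments = [0.0] * n
    # Bids below the reserve price are disqualified.
    eligible = [i for i in range(n) if bids[i] >= reserve]
    if not eligible:
        return None, payments  # the object is not sold

    # max() returns the first maximal element, so ties go to the lowest index.
    winner = max(eligible, key=lambda i: bids[i])
    if rule == "first":
        payments[winner] = bids[winner]
    else:
        # Second-highest bid or the reserve price, whichever is higher.
        others = [bids[i] for i in range(n) if i != winner]
        payments[winner] = max(others + [reserve])
    return winner, payments
```

With bids `[3.0, 7.0, 5.0]`, the buyer with index 1 wins under every rule; only the payments differ.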
12.3 Definition of a sealed-bid auction with private values
In a sealed-bid auction, every buyer submits a bid, and the rules of the auction determine who wins the object for sale, and the amounts of money that the buyers (the winner, and
perhaps also the other buyers) must pay. The winner is usually the highest bidder, but it is possible to define auctions in which the winner is not necessarily the highest bidder.

Definition 12.1 A sealed-bid auction (with independent private values) is a vector $(N, (\mathcal{V}_i, F_i)_{i \in N}, p, C)$, where:

• $N = \{1, 2, \ldots, n\}$ is the set of buyers.
• $\mathcal{V}_i \subseteq \mathbb{R}$ is the set of possible private values of buyer $i$, for each $i \in N$. Denote by $\mathcal{V}_N := \mathcal{V}_1 \times \mathcal{V}_2 \times \cdots \times \mathcal{V}_n$ the set of vectors of private values.
• For each buyer $i \in N$ there is a cumulative distribution function $F_i$ over his set of private values $\mathcal{V}_i$.
• $p : [0, \infty)^N \to \Delta(N)$ is a function associating each vector of bids $b \in [0, \infty)^N$ with a distribution according to which the buyer who wins the auctioned object is identified.¹
• $C : N \times [0, \infty)^N \to \mathbb{R}^N$ is a function determining the payment each buyer pays, for each vector of bids $b \in [0, \infty)^N$, depending on which buyer $i^* \in N$ is the winner.

A sealed-bid auction is conducted as follows:

• The private value $v_i$ of each buyer $i$ is chosen randomly from the set $\mathcal{V}_i$, according to the cumulative distribution function $F_i$.
• Every buyer $i$ learns his private value $v_i$, but not the private values of the other buyers.
• Every buyer $i$ submits a bid $b_i \in [0, \infty)$ (depending on his private value $v_i$).
• The buyer who wins the auctioned object, $i^*$, is chosen according to the distribution $p(b_1, b_2, \ldots, b_n)$; the probability that buyer $i$ wins the object is $p_i(b_1, b_2, \ldots, b_n)$.
• Every buyer $i$ pays the sum $C_i(i^*; b_1, b_2, \ldots, b_n)$.

For simplicity we will sometimes denote an auction by $(p, C)$ instead of $(N, (\mathcal{V}_i, F_i)_{i \in N}, p, C)$. Note several points relating to this definition:
• The private values of the buyers are independent, and therefore the vector of private values $(v_1, v_2, \ldots, v_n)$ is drawn according to a product distribution, whose cumulative distribution function is $F^N := F_1 \times F_2 \times \cdots \times F_n$. A more general model would take into account the possibility of general joint distributions, thereby enabling the modeling of situations of interdependency between the private values of different buyers.
• In most of the auctions with which we are familiar, the winner of the auction is the highest bidder. In other words, if there is a buyer $i$ such that $b_i > \max_{j \neq i} b_j$, then $p(b_1, b_2, \ldots, b_n)$ is a degenerate distribution ascribing probability 1 to buyer $i$. If two (or more) buyers submit the same highest bid, a previously agreed-upon rule is implemented to determine the winner. That rule may be deterministic (for example, among the buyers who have submitted the highest bid, the winner is the buyer who submitted his bid first), or probabilistic (for example, the winner may be determined by the toss of a fair coin).
• In the most familiar payment functions, the winner pays either the highest, or the second-highest bid for the auctioned object. The payment function in the definition of a sealed-bid auction is more general, and enables the modeling of entry fees, favoritism (e.g., incentives for certain sectors), and all-pay auctions. It also enables the modeling of auctions with less-familiar rules, such as third-price auctions, in which the winner pays the third-highest bid.

¹ Recall that $\Delta(N) := \{x \in [0, 1]^N : \sum_{i \in N} x_i = 1\}$ is the set of all probability distributions over the set of buyers $N = \{1, 2, \ldots, n\}$.

The private value of buyer $i$ is a random variable whose cumulative distribution function is $F_i$. This random value is denoted by $V_i$. A sealed-bid auction can be presented as a Harsanyi game with incomplete information (see Section 9.4) in the following way:
• The set of players is the set of buyers $N = \{1, 2, \ldots, n\}$.
• Player $i$’s set of types is $\mathcal{V}_i$.
• The distribution over the set of type vectors is a product distribution with cumulative distribution function $F^N = F_1 \times F_2 \times \cdots \times F_n$.
• For each type vector $v \in \mathcal{V}_N$, the state of the world is the state game $s_v$, where buyer $i$’s set of actions is $[0, \infty)$ and for every action vector $x \in [0, \infty)^N$, buyer $i$’s profit is

$$p_i(x)\,v_i - \sum_{i^* \in N} p_{i^*}(x)\,C_i(i^*; x). \tag{12.3}$$
In words, if buyer $i$ is the winner, he receives $v_i$ (his private value for the object), and he pays $C_i(i^*; x)$ in any event (whether or not he is the winner), where $i^*$ is the winning buyer.

A formal definition of an open-bid auction depends on the specific method used in conducting the auction, and may be very complex. For example, in the most common open-bid auction method, the English auction, a buyer’s decision on whether or not to stop bidding at a certain moment depends on the identities of the other buyers, both those who have already quit the auction, and those who are still bidding, and the prices at which those who have already quit chose to stop bidding. We will not present a formal definition of an open-bid auction in this book.

Example 12.2 Sealed-bid second-price auction In a sealed-bid second-price auction, the winner is the highest bidder. If several buyers have submitted the highest bid then each of them has the same probability of winning: denote by $N(x) = \{i \in N : x_i = \max_{j \in N} x_j\}$ the set of buyers who have submitted the highest bid, and by $i^*$ the buyer who wins the auctioned object. Then:

$$p_i(x) = \begin{cases} 0 & i \notin N(x), \\ \dfrac{1}{|N(x)|} & i \in N(x), \end{cases} \tag{12.4}$$

$$C_i(i^*; x) = \begin{cases} 0 & i \neq i^*, \\ \max_{j \neq i} x_j & i = i^*. \end{cases} \tag{12.5}$$

Note that if at least two buyers have submitted the same highest bid, the auctioned object is sold at this highest bid; that is, if $|N(x)| \geq 2$, then $C_{i^*}(i^*; x) = \max_{j \in N} x_j$. ◭
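As a minimal sketch, the winning probabilities (12.4), the payment function (12.5), and the profit expression (12.3) of the sealed-bid second-price auction can be written out directly. The function names are illustrative assumptions, buyers are indexed from 0 rather than 1, and exact fractions are used so that ties are detected reliably; at least two buyers are assumed.

```python
from fractions import Fraction
from typing import List, Sequence


def top_bidders(x: Sequence[Fraction]) -> List[int]:
    """N(x): the buyers who submitted the highest bid."""
    top = max(x)
    return [i for i, xi in enumerate(x) if xi == top]


def win_prob(i: int, x: Sequence[Fraction]) -> Fraction:
    """p_i(x) as in (12.4): uniform over the highest bidders."""
    tied = top_bidders(x)
    return Fraction(1, len(tied)) if i in tied else Fraction(0)


def payment(i: int, winner: int, x: Sequence[Fraction]) -> Fraction:
    """C_i(i*; x) as in (12.5): only the winner pays, and he pays the
    highest bid among the other buyers."""
    if i != winner:
        return Fraction(0)
    return max(xj for j, xj in enumerate(x) if j != i)


def profit(i: int, v_i: Fraction, x: Sequence[Fraction]) -> Fraction:
    """Buyer i's profit as in (12.3)."""
    expected_payment = sum(
        win_prob(w, x) * payment(i, w, x) for w in range(len(x))
    )
    return win_prob(i, x) * v_i - expected_payment
```

For the bid vector (4, 6, 6), buyers 1 and 2 (0-based) each win with probability 1/2 and the price is 6, so a buyer with private value 10 has expected profit (10 − 6)/2 = 2.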
A pure strategy of buyer $i$ in a sealed-bid auction is a measurable function²

$$\beta_i : [0, \infty) \to [0, \infty). \tag{12.6}$$

² Recall that for every subset $X \subseteq \mathbb{R}$, a real-valued function $f : X \to [0, \infty)$ is measurable if for each number $y \in [0, \infty)$, the set $f^{-1}([0, y]) = \{x \in X : f(x) \leq y\}$ is a measurable set.
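If buyer $i$ uses pure strategy $\beta_i$, then when his type is $v_i$ he bids $\beta_i(v_i)$. If the buyers use the strategy vector $\beta = (\beta_i)_{i \in N}$, buyer $i$’s expected profit is

$$u_i(\beta) = \int_{\mathcal{V}_N} \Big( p_i(\beta_1(x_1), \ldots, \beta_n(x_n))\,x_i - \sum_{i^* \in N} p_{i^*}(\beta_1(x_1), \ldots, \beta_n(x_n))\,C_i(i^*; \beta_1(x_1), \ldots, \beta_n(x_n)) \Big)\,dF^N(x), \tag{12.7}$$

where $x_i$ is buyer $i$’s private value in the vector of private values $x = (x_1, \ldots, x_n)$.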
The next theorem points out a connection between two of the auction methods described above.

Theorem 12.3 The open-bid descending auction method is equivalent to the sealed-bid first-price auction method: both methods describe the same strategic-form game, with the same strategy sets and the same payoff functions.

Proof: The set of (pure) strategies in a sealed-bid first-price auction, for each buyer $i$, is the set of all measurable functions $\beta_i : \mathcal{V}_i \to [0, \infty)$. This set is also the set of buyer $i$’s strategies in an open descending auction. Indeed, a strategy of buyer $i$ is a function detailing how he should play at each of his information sets. An open descending auction ends when the clock is stopped. Hence, as long as the auction continues, his only information consists of the current price. A strategy of buyer $i$ then only needs to determine, for each of his possible private values, the announced price at which he will stop the clock (if no other buyer has stopped the clock before that price has been announced). In other words, every strategy of buyer $i$ is a measurable function $\beta_i : [0, \infty) \to [0, \infty)$.

Every strategy vector $\beta = (\beta_i)_{i \in N}$ leads to the same outcome in both auctions: in a sealed-bid first-price auction, the winning buyer is the one who submits the highest bid, $\max_{i \in N} \beta_i(v_i)$, and the price he pays for the auctioned object is his bid. In an open descending auction, the winning buyer is the one who stops the clock at the price $\max_{i \in N} \beta_i(v_i)$, and the price he pays for the auctioned object is that price. It follows that both types of auction correspond to the same strategic-form game.

Remark 12.4 Note that this equivalence obtains without any assumption on the information that each buyer has regarding the other buyers, their preferences, and their identities, or even the number of other buyers. Similarly, it does not depend on the assumption that the private values of the buyers are independent.
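The outcome equivalence in Theorem 12.3 can be illustrated with a small sketch, under assumptions that are not in the text: prices are integers (for example, cents), the descending clock moves in unit steps from a starting price above every stop price, and ties go to the buyer with the lowest index. The function names are hypothetical.

```python
from typing import List, Tuple


def first_price(bids: List[int]) -> Tuple[int, int]:
    """Sealed-bid first-price auction: the highest bidder wins and pays
    his own bid. Returns (winner index, price)."""
    price = max(bids)
    return bids.index(price), price


def dutch(stop_prices: List[int], start: int) -> Tuple[int, int]:
    """Descending clock in unit steps from `start`; buyer i stops the clock
    as soon as the announced price is at most stop_prices[i]."""
    for price in range(start, -1, -1):
        ready = [i for i, s in enumerate(stop_prices) if s >= price]
        if ready:
            return ready[0], price
    # Not reached when all stop prices are nonnegative.
    raise ValueError("no buyer stopped the clock")
```

If every buyer uses the same function of his private value in both formats, once as a sealed bid and once as a stop price, the two procedures return the same winner and the same price.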
We next present additional relations between auction methods based on the concept of equilibrium.
12.4 Equilibrium
Having defined games, buyers, strategies, and payoffs, we next introduce the concept of equilibrium. In actual fact, since an auction is a game with incomplete information (because the type of each buyer, which is his private value, is known to him but not to the other buyers), the concept of equilibrium introduced here is that of Bayesian equilibrium
(see Definition 9.49 on page 354), corresponding to the interim stage, when each buyer knows his private value. Let $\beta = (\beta_1, \beta_2, \ldots, \beta_n)$ be a strategy vector. Denote by

$$\beta_{-i}(x_{-i}) := (\beta_j(x_j))_{j \neq i} \tag{12.8}$$

the vector of bids of the buyers other than $i$, given their types and their strategies. Denote by $u_i(\beta; v_i)$ buyer $i$’s expected profit under the strategy vector $\beta$, when his private value is $v_i$,

$$u_i(\beta; v_i) := \int_{\mathcal{V}_{-i}} \Big( p_i(\beta_i(v_i), \beta_{-i}(x_{-i}))\,v_i - \sum_{i^* \in N} p_{i^*}(\beta_i(v_i), \beta_{-i}(x_{-i}))\,C_i(i^*; \beta_i(v_i), \beta_{-i}(x_{-i})) \Big)\,dF_{-i}(x_{-i}). \tag{12.9}$$

Here, $\mathcal{V}_{-i} := \times_{j \neq i} \mathcal{V}_j$ is the space of the vectors of the private values of all the buyers except for buyer $i$, and $F_{-i} := \times_{j \neq i} F_j$ is the cumulative distribution function of the multidimensional random variable $V_{-i} = (V_j)_{j \neq i}$. Note that the expected profit $u_i(\beta; v_i)$ depends on $\beta_i$, buyer $i$’s strategy, only via $\beta_i(v_i)$, the bid of buyer $i$ with private value $v_i$. We denote by $u_i(b_i, \beta_{-i}; v_i)$ the expected profit of buyer $i$ with private value $v_i$ when he submits bid $b_i$ and the other buyers use strategy vector $\beta_{-i}$.

Definition 12.5 A strategy vector $\beta^*$ is an equilibrium (or a Bayesian equilibrium) if for every buyer $i \in N$ and every private value $v_i \in \mathcal{V}_i$,

$$u_i(\beta^*; v_i) \geq u_i(b_i, \beta^*_{-i}; v_i), \quad \forall b_i \in [0, \infty). \tag{12.10}$$
In other words, β ∗ is an equilibrium if no buyer i with private value vi can profit by deviating from his equilibrium bid βi∗ (vi ) to another bid bi . Remark 12.6 Analyzing auctions using mixed strategies is beyond the scope of this book. In the auctions covered in this chapter, the distribution of the private value of each buyer is continuous, and we will show that under appropriate assumptions, equilibria in pure strategies exist in these auctions. Note that if β ∗ is an equilibrium in pure strategies, then no buyer can increase his payoff by deviating to a mixed strategy. To see this, recall that a mixed strategy is a distribution over pure strategies. If the buyer could increase his payoff by deviating to a mixed strategy, then he could do the same by deviating to one of the pure strategies in the support of the mixed strategy. This shows that every equilibrium in pure strategies is also an equilibrium in mixed strategies. In general, however, it is possible for all the equilibria in an auction to be equilibria in completely mixed strategies (see for example Vickrey [1961]). In Section 4.6 (page 91) we considered sealed-bid second-price auctions, and proved the following result (see Theorem 4.15 on page 92). Theorem 12.7 In a sealed-bid second-price auction, the strategy of buyer i in which he bids his private value weakly dominates all his other strategies.
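Theorem 12.7 can be sanity-checked on a finite grid. The sketch below is a brute-force check, not a proof: the grid, the number of competitors, the uniform tie-breaking rule, and the function names are all assumptions made for illustration. It verifies that bidding one's value is never worse than any other bid on the grid, and strictly better against at least one profile of competing bids.

```python
from fractions import Fraction
from itertools import product
from typing import List, Sequence


def second_price_profit(
    v: Fraction, b: Fraction, others: Sequence[Fraction]
) -> Fraction:
    """Expected profit of a buyer with value v who bids b against the bids
    `others`; ties at the top are split uniformly."""
    top = max(others)
    if b > top:
        return v - top
    if b < top:
        return Fraction(0)
    k = sum(1 for o in others if o == top)  # competitors tied at the top
    return (v - top) / (k + 1)


def truthful_weakly_dominates(grid: List[Fraction], n_others: int = 2) -> bool:
    """Check, over all values and bids on the grid, that bidding v is never
    worse than bidding b != v and is strictly better at least once."""
    profiles = list(product(grid, repeat=n_others))
    for v in grid:
        for b in grid:
            if b == v:
                continue
            gaps = [
                second_price_profit(v, v, o) - second_price_profit(v, b, o)
                for o in profiles
            ]
            if min(gaps) < 0 or max(gaps) <= 0:
                return False
    return True


GRID = [Fraction(k, 4) for k in range(9)]  # 0, 1/4, ..., 2
```

Calling `truthful_weakly_dominates(GRID)` returns `True` for this grid; overbidding hurts exactly when the highest competing bid lies between the value and the bid.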
Remark 12.8 As noted in Remark 12.4, Theorem 12.7 obtains under very few assumptions: we assume nothing regarding the behavior of the other buyers, the number of other buyers, or their identities. In other words, in a sealed-bid second-price auction, revealing your private value is a (weakly) dominant strategy. This is a great advantage that the sealed-bid second-price auction has over other auction methods: it incentivizes every buyer to reveal his true preferences, i.e., how much he truly is willing to pay for the object. From the seller’s perspective, this is an advantage, because he need not be concerned that the buyers will conceal their preferences and act as if they value the object less than they really do. Another, secondary advantage for the seller is that if a buyer who submitted a high bid does not win the auction, the seller, knowing his true preferences, might be able to offer him a similar object.

An important consequence of Theorem 12.7 is:

Theorem 12.9 In a sealed-bid second-price auction, the strategy vector in which every buyer’s bid equals his private value is an equilibrium.

Proof: As stated in Theorem 12.7, in a second-price auction, bidding the true value is a weakly dominant strategy. Corollary 4.27 (page 105; see also Exercise 10.52 on page 433) states that a vector of dominant strategies is an equilibrium, and therefore this strategy vector is a Bayesian equilibrium.

Although we have not defined a game corresponding to an open-bid ascending auction, and in particular have not defined a strategy in such an auction, it is possible to regard a behavior under which the buyer lowers his hand and no longer participates in the auction when the declared price reaches his private value as a “strategy” in this type of auction. We will show that this is a dominant strategy for such a buyer. Since we have not presented the necessary definitions, the proof here is not a formal proof.
Theorem 12.10 In an open ascending auction (English auction), the strategy of buyer i that calls on him to lower his hand when the declared price reaches his private value, weakly dominates all his other strategies. Proof: As long as the declared price is lower than buyer i’s private value, he receives 0 with certainty if he quits the auction. On the other hand, if he continues to bid, he stands to receive a positive profit (and certainly cannot lose). When the declared price equals buyer i’s private value, if he quits he receives 0 with certainty, but if he continues to bid he may win the auction and end up paying more for the object than he values it for. Here we are relying on the fact that buyer i knows his private value, and that this value is independent of the values of the other buyers, so that the information given by the timing that the other buyers choose for quitting the auction is irrelevant to his strategic considerations. Similarly to the proof of Theorem 12.9, and referring to Theorem 12.10, we can prove the following theorem. Theorem 12.11 In an open-bid ascending auction, the strategy vector in which every buyer lowers his hand when the declared price equals his private value is an equilibrium.
Remark 12.12 Note that in the dominant strategy equilibrium of the English auction established in Theorem 12.11, the winner of the object is the buyer with the highest private value and the selling price is the second highest private value. This is the same allocation and the same payment as in the dominant strategy equilibrium of the sealed-bid second-price auction established in Theorem 12.7.

Remark 12.13 There are other equilibria in sealed-bid second-price auctions, in addition to the equilibrium in which every buyer’s bid equals his private value. For example, if the private values of two buyers are independent and uniformly distributed over the interval [0, 1], the strategy vector in which buyer 1’s bid is b1 = 1 (for every private value v1 ), and buyer 2’s bid is b2 = 0 (for every private value v2 ), is an equilibrium (Exercise 12.4).
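The equilibrium in Remark 12.13 can be checked numerically on a grid. This sketch is a finite check rather than a proof, and it assumes a fair-coin tie rule, a grid of private values in [0, 1], and a grid of deviation bids in [0, 2]; the function names are hypothetical. Because the two bids are constant, each buyer's profit from a deviation does not depend on the other buyer's private value.

```python
from fractions import Fraction


def profit_2p(v: Fraction, own_bid: Fraction, rival_bid: Fraction) -> Fraction:
    """Second-price profit with two buyers; ties are decided by a fair coin."""
    if own_bid > rival_bid:
        return v - rival_bid
    if own_bid == rival_bid:
        return (v - rival_bid) / 2
    return Fraction(0)


VALUES = [Fraction(k, 10) for k in range(11)]      # private values in [0, 1]
DEVIATIONS = [Fraction(k, 10) for k in range(21)]  # candidate bids in [0, 2]


def constant_bids_form_equilibrium(bid1: Fraction, bid2: Fraction) -> bool:
    """True if neither buyer, whatever his value on the grid, gains by
    deviating from his constant bid to any bid on the deviation grid."""
    for v in VALUES:
        if any(profit_2p(v, d, bid2) > profit_2p(v, bid1, bid2) for d in DEVIATIONS):
            return False
        if any(profit_2p(v, d, bid1) > profit_2p(v, bid2, bid1) for d in DEVIATIONS):
            return False
    return True
```

The pair (1, 0) passes the check: buyer 1 already wins at price 0, and buyer 2 can win only by paying at least 1, which is never less than his value. By contrast, the pair (1/2, 0) fails, since buyer 2 with a value above 1/2 gains by outbidding buyer 1.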
12.5 The symmetric model with independent private values
In this section we will study models of sealed-bid auctions that satisfy the following assumptions:

(A1) Single object for sale: There is only one object offered for sale in the auction, and it is indivisible.
(A2) The seller is willing to sell the object at any nonnegative price.
(A3) There are $n$ buyers, denoted by $1, 2, \ldots, n$.
(A4) Private values: All buyers have the same set of possible private values $\mathcal{V}$. This set can be a closed bounded interval $[0, \bar{v}]$ or the set of nonnegative numbers $[0, \infty)$. Every buyer knows his private value of the object. The random values $V_1, V_2, \ldots, V_n$ of the private values of the buyers are independent and identically distributed. Denote by $F$ the common cumulative distribution function of the random variables $V_i$, $i = 1, 2, \ldots, n$. The support of this distribution is $\mathcal{V}$.
(A5) Continuity: For each $i$, the random variable $V_i$ is continuous, and its density function, which we denote by $f$, is continuous and positive (this is the density function of the cumulative distribution function $F$ of (A4)).
(A6) Risk neutrality: All the buyers are risk neutral, and therefore seek to maximize their expected profits.

We further assume that Assumptions (A1)–(A6) are common knowledge among the buyers (see Definition 9.17 on page 331). An auction model satisfying Assumptions (A1)–(A6) is called a symmetric auction with independent private values. This is the model studied in this section.

Since every buyer knows his own private value, any additional information, and in particular information regarding the private values of the other buyers, has no effect on his private value. That means that when buyer $i$’s private value is $v_i$, then if he wins the auctioned object at price $p$, his profit is $v_i - p$, whether or not he knows the private values of the other buyers.
In more general models in which buyers do not know with certainty the value of the auctioned object, the information a buyer has regarding the private values of the other buyers may be important to him, because it may be relevant to updating his
own private value. Note that even if, after the auction is completed, the winner knows only that he has won, and not the details of the private values of the other buyers, he still obtains information about the other buyers’ private values: he knows that the private values of the other buyers were sufficiently low for them not to submit bids higher than his bid. The assumption that V1 , V2 , . . . , Vn are identically distributed is equivalent to the statement that prior to the random selection of the private values, the buyers are symmetric; each buyer, in his strategic considerations, assumes that all of the other buyers are similar to each other and to him.
12.5.1 Analyzing auctions: an example

Definition 12.14 In a symmetric auction with independent private values, an equilibrium $(\beta_1^*, \beta_2^*, \ldots, \beta_n^*)$ is called a symmetric equilibrium if $\beta_i^* = \beta_j^*$ for all $1 \leq i, j \leq n$; that is, all buyers implement the same strategy.

When $\beta^* = (\beta_i^*)_{i \in N}$ is a symmetric equilibrium, we abuse notation and denote the common strategy also by $\beta^*$, that is, $\beta^* = \beta_i^*$ for every $i \in N$. Such a strategy is called a symmetric equilibrium strategy. We will denote by $\beta_{-i}^*$ the vector of strategies in which all buyers except buyer $i$ implement strategy $\beta^*$. We will sometimes denote the symmetric equilibrium strategy also by $\beta_i^*$ when we want to focus on the strategy implemented by buyer $i$.

Example 12.15 Two buyers with uniformly distributed private values³ Suppose that there are two buyers, and that $V_i$ has uniform distribution over $[0, 1]$ for $i = 1, 2$ (and by Assumption (A4) $V_1$ and $V_2$ are independent). We will show that in a sealed-bid first-price auction the following strategy is a symmetric equilibrium:

$$\beta_i^*(v_i) = \frac{v_i}{2}, \quad i = 1, 2. \tag{12.11}$$

This equilibrium calls on each buyer to submit a bid that is half of his private value. Suppose that buyer 2 implements this strategy. Then if buyer 1’s private value is $v_1$, and her submitted bid is $b_1$, her expected profit is

\begin{align}
u_1(b_1, \beta_2^*; v_1) &= u_1\left(b_1, \tfrac{V_2}{2}; v_1\right) \tag{12.12} \\
&= P\left(b_1 > \tfrac{V_2}{2}\right)(v_1 - b_1) \tag{12.13} \\
&= P(2b_1 > V_2)(v_1 - b_1) \tag{12.14} \\
&= \min\{2b_1, 1\}(v_1 - b_1). \tag{12.15}
\end{align}

This function is quadratic over the interval $b_1 \in [0, \tfrac{1}{2}]$ (attaining its maximum at $b_1 = \tfrac{v_1}{2}$), and linear, with a negative slope, when $b_1 \geq \tfrac{1}{2}$. The graph of the function $b_1 \mapsto u_1(b_1, \beta_2^*; v_1)$ is shown in Figure 12.1 for the case $v_1 \leq \tfrac{1}{2}$ and the case $v_1 > \tfrac{1}{2}$.
³ This example also appears on page 412 in Chapter 10.

Figure 12.1 The payoff to buyer 1, as a function of $b_1$, when buyer 2 implements $\beta_2^*$. [Two panels: the case $v_1 \leq \tfrac{1}{2}$ and the case $v_1 > \tfrac{1}{2}$; horizontal axis $b_1$, vertical axis $u_1(b_1, \beta_2^*; v_1)$.]
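The shape of the expected-profit function (12.15) can also be explored numerically. The following sketch searches a grid of bids for the one that maximizes buyer 1's expected profit when buyer 2 bids half of her value; the grid resolution and the function names are assumptions made for illustration.

```python
def expected_profit(b1: float, v1: float) -> float:
    """Buyer 1's expected profit as in (12.15): min{2*b1, 1} * (v1 - b1)."""
    return min(2.0 * b1, 1.0) * (v1 - b1)


def best_response(v1: float, steps: int = 1000) -> float:
    """Bid on the grid {0, 1/steps, ..., 1} that maximizes expected profit."""
    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=lambda b: expected_profit(b, v1))
```

For private values such as 0.3 and 0.8, one from each panel of Figure 12.1, the search returns 0.15 and 0.4 respectively, that is, half of the private value in both cases.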
In both cases, the function attains its maximum at the point $b_1 = \tfrac{v_1}{2}$. This implies that $b_1^*(v_1) = \tfrac{v_1}{2}$ is the best response to $\beta_2^*$, which in turn means that the strategy vector $\beta^* = (\beta_1^*, \beta_2^*)$ is a symmetric equilibrium.

We note that from our results so far we can observe that different auction methods have different equilibria:

• In the sealed-bid first-price auction in Example 12.15, a symmetric equilibrium is given by $\beta_i^*(v_i) = \tfrac{v_i}{2}$.
• In a sealed-bid second-price auction, a symmetric equilibrium is given by $\beta_i^*(v_i) = v_i$ (Theorem 12.9, and Exercise 12.3).
Which auction method is preferable from the perspective of the seller? To answer this question, we need to calculate the seller’s expected revenue in each of the two auction methods. The seller’s expected revenue equals the expected sale price. At the equilibrium that we have calculated, the expected sale price is

$$E\left[\max\left\{\tfrac{V_1}{2}, \tfrac{V_2}{2}\right\}\right] = \tfrac{1}{2} E[\max\{V_1, V_2\}]. \tag{12.16}$$

Denote $Z := \max\{V_1, V_2\}$. Since $V_1$ and $V_2$ are independent, and have uniform distribution over $[0, 1]$, the cumulative distribution function of $Z$ is, for $0 \leq z \leq 1$,

$$F_Z(z) = P(Z \leq z) = P(\max\{V_1, V_2\} \leq z) = P(V_1 \leq z) \times P(V_2 \leq z) = z^2. \tag{12.17}$$

It follows that the density function of $Z$ is

$$f_Z(z) = \begin{cases} 2z & \text{if } 0 \leq z \leq 1, \\ 0 & \text{otherwise.} \end{cases} \tag{12.18}$$

We deduce from this that the expected revenue is

$$\tfrac{1}{2} E[Z] = \tfrac{1}{2} \int_0^1 z f_Z(z)\,dz = \int_0^1 z^2\,dz = \tfrac{1}{3}. \tag{12.19}$$

The seller’s expected revenue in a sealed-bid second-price auction is given by $E[\min\{V_1, V_2\}]$. Note that

$$\min\{V_1, V_2\} + \max\{V_1, V_2\} = V_1 + V_2, \tag{12.20}$$

and hence

$$E[\min\{V_1, V_2\}] + E[\max\{V_1, V_2\}] = E[V_1] + E[V_2] = \tfrac{1}{2} + \tfrac{1}{2} = 1. \tag{12.21}$$

We have already calculated that $E[\max\{V_1, V_2\}] = E[Z] = \tfrac{2}{3}$, so $E[\min\{V_1, V_2\}] = \tfrac{1}{3}$. In other words, the seller’s expected revenue in a sealed-bid second-price auction is $\tfrac{1}{3}$. ◭
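The two expected revenues computed in Example 12.15 can be approximated numerically. The sketch below averages the sale price over a deterministic midpoint grid of pairs of private values in the unit square, assuming the equilibrium bids of the example (half the value in the first-price auction, the value itself in the second-price auction). The grid size and the function name are assumptions.

```python
from typing import Tuple


def expected_revenues(m: int = 400) -> Tuple[float, float]:
    """Approximate the seller's expected revenue in the first-price and
    second-price auctions by averaging over an m-by-m midpoint grid of
    private values (v1, v2) in [0, 1] x [0, 1]."""
    points = [(k + 0.5) / m for k in range(m)]
    first = 0.0
    second = 0.0
    for v1 in points:
        for v2 in points:
            first += max(v1 / 2, v2 / 2)  # winner pays his own bid v/2
            second += min(v1, v2)         # winner pays the losing bid
    return first / m**2, second / m**2
```

Both averages agree with 1/3 up to the discretization error of the grid, in line with (12.19) and (12.21).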
Corollary 12.16 In Example 12.15, in equilibrium, the expected revenue of the seller is the same, whether the auction method used is a sealed-bid first-price auction or second-price auction.

This result is surprising at first sight, because one “would expect” that the seller would be better off selling the object at the price of the highest bid submitted, rather than the second-highest bid. However, buyers in a sealed-bid first-price auction will submit bids that are lower than those they would submit in a sealed-bid second-price auction, because in a sealed-bid first-price auction the winner pays what he bids, while in a sealed-bid second-price auction the winner pays less than his bid. The fact that these two opposing elements (on one hand, the sale price in a sealed-bid first-price auction is the highest bid, while on the other hand, bids are lower in a sealed-bid first-price auction) cancel each other out and lead to the same expected revenue, is a mathematical result that is far from self-evident.

The equivalence between sealed-bid first-price auctions and open-bid descending auctions (Theorem 12.3), and the equivalence between the equilibrium payments in sealed-bid second-price auctions and open-bid ascending auctions (Remark 12.12), lead to the following corollary.

Corollary 12.17 In Example 12.15, all four auction methods presented, the sealed-bid first-price auction, sealed-bid second-price auction, open-bid ascending auction and open-bid descending auction, yield the seller the same expected revenue in equilibrium.

As we will see later, the equivalence of the seller’s expected revenue in these four auction methods follows from a more general result (Theorem 12.23), called the Revenue Equivalence Theorem.
12.5.2 Equilibrium strategies

In this section, we will compute the equilibria of several auction methods.

Definition 12.18 A symmetric equilibrium strategy β∗ is monotonically increasing if the higher the private value, the higher the buyer’s bid:
$$v < v' \implies \beta^*(v) < \beta^*(v'), \quad \forall v, v' \in V. \tag{12.22}$$
If β is a monotonically increasing symmetric equilibrium, the winner of the auction is the buyer with the highest private value. Since the distribution of V is continuous, the probability that two buyers have the same private value is 0. We proceed now to find monotonically increasing symmetric equilibria. Define
$$Y = \max\{V_2, V_3, \ldots, V_n\}. \tag{12.23}$$
This is a random variable, whose value equals the highest private value of buyers 2, . . . , n. From buyer 1’s perspective, this is the highest private value of his competitors. In a monotonically increasing symmetric equilibrium, buyer 1 wins the auction if and only if Y < V1 . (As we previously stated, the event Y = V1 has probability 0, so we ignore it, as it has no effect on the expected profit.) The following theorem identifies a specific symmetric equilibrium, in symmetric auctions with independent private values.
Theorem 12.19 In a symmetric auction with independent private values the following strategy defines a symmetric equilibrium:
$$\beta(v) := E[Y \mid Y \le v], \quad \forall v \in V \setminus \{0\}, \tag{12.24}$$
and β(0) := 0.

Proof: Step 1: β is a monotonically increasing function.
Recall that for every random variable X, and every pair of disjoint events A and B:
$$E[X \mid A \cup B] = P(A \mid A \cup B)\,E[X \mid A] + P(B \mid A \cup B)\,E[X \mid B]. \tag{12.25}$$
Note that by the assumption that the density function f is positive (Assumption (A5)), it follows that Y is a continuous random variable with positive density function fY. Let v be an interior point of V; that is, v > 0 and if V = [0, v̄] is a bounded interval, then v < v̄. For every δ > 0 satisfying v + δ ∈ V,
$$\begin{aligned}
\beta(v + \delta) &= E[Y \mid Y \le v + \delta] \\
&= P(Y \le v \mid Y \le v + \delta) \times E[Y \mid Y \le v] + P(v < Y \le v + \delta \mid Y \le v + \delta) \times E[Y \mid v < Y \le v + \delta] \\
&= \frac{P(Y \le v)}{P(Y \le v + \delta)}\,E[Y \mid Y \le v] + \frac{P(v < Y \le v + \delta)}{P(Y \le v + \delta)}\,E[Y \mid v < Y \le v + \delta].
\end{aligned} \tag{12.26}$$
Since the density function fY is positive, and since v is an interior point of V,
$$E[Y \mid Y \le v] \le v < E[Y \mid v < Y \le v + \delta]. \tag{12.27}$$
From Equation (12.26), we deduce that β(v + δ) is a weighted average of two numbers, which, by Equation (12.27), satisfy the property that one is strictly greater than the other. Since the density function fY is positive, and since v > 0, the weights of both terms are positive. It follows that β(v + δ) is strictly greater than the smaller of the two numbers, which is, according to Equation (12.27), E[Y | Y ≤ v]; that is,
$$\beta(v + \delta) > E[Y \mid Y \le v] = \beta(v). \tag{12.28}$$
Therefore, β is an increasing function.

Step 2: β is a continuous function.
We will first show that β is continuous at v = 0. This obtains because for each v ∈ V, v > 0,
$$0 \le \beta(v) = E[Y \mid Y \le v] \le v, \tag{12.29}$$
leading to lim_{v→0} β(v) = 0 = β(0). We next show that β is continuous at each interior point v of V. By the definition of conditional expectation,
$$\beta(v) = E[Y \mid Y \le v] = \frac{\int_0^v y f_Y(y)\,dy}{P(Y \le v)} = \frac{1}{F_Y(v)} \int_0^v y f_Y(y)\,dy. \tag{12.30}$$
Since the random variable Y is continuous, the cumulative distribution function FY is a continuous function. Since the density function fY is positive, FY(v) > 0 for all v > 0; that is, the denominator is not zero. It follows that β is the quotient of two continuous functions of v in which the function in the denominator is non-zero for v > 0, and hence it is a continuous function. Note that from Equation (12.30) we can deduce, by integrating by parts, that
$$F_Y(v)\,E[Y \mid Y \le v] = \int_0^v y f_Y(y)\,dy = v F_Y(v) - \int_0^v F_Y(y)\,dy. \tag{12.31}$$
This equation will be useful later in the proof.

Step 3: β is a symmetric equilibrium strategy.
Suppose that buyers 2, 3, . . . , n all implement strategy β. We will show that in that case, buyer 1’s best reply is the same strategy β. Let v1 be buyer 1’s private value. If V = [0, v̄] is a bounded interval, suppose that v1 < v̄, which occurs with probability 1 (why?). Since the function β is strictly increasing (Step 1) and continuous (Step 2), it has a continuous inverse β⁻¹. Buyer 1’s expected profit, when he bids b1, is
$$u_1(b_1, \beta_{-1}; v_1) = P(\beta(Y) < b_1)(v_1 - b_1). \tag{12.32}$$
If buyer 1 bids b1 = 0, with probability 1 he does not win the auction: since the density of Y is positive in the interval V, the probability that Y > 0 is 1, and since β is monotonically increasing, with probability 1 another buyer bids more than 0. We deduce that u1(0, β₋₁; v1) = 0. If buyer 1 bids b1 greater than or equal to his private value v1, by Equation (12.32) his expected payoff is nonpositive:
$$u_1(b_1, \beta_{-1}; v_1) \le 0, \quad \forall b_1 \ge v_1. \tag{12.33}$$
Since the density of Vi is positive in the set V, the probability P(β(Y) < b1) is positive for every b1 > 0. It follows that in the domain (0, v1) the function b1 ↦ u1(b1, β₋₁; v1) is the product of two positive functions and it is therefore a positive function. To summarize, we proved that u1(b1, β₋₁; v1) is positive for b1 ∈ (0, v1) and nonpositive for b1 ∉ (0, v1), and therefore the function b1 ↦ u1(b1, β₋₁; v1) attains its maximum at an interior point of [0, v1]. We next present the expected profit u1(b1, β₋₁; v1) in a more useful form:
\begin{align}
u_1(b_1, \beta_{-1}; v_1) &= P(\beta(Y) < b_1)(v_1 - b_1) \tag{12.34} \\
&= P(Y < \beta^{-1}(b_1))(v_1 - b_1) \tag{12.35} \\
&= F_Y(\beta^{-1}(b_1)) \times (v_1 - \beta(\beta^{-1}(b_1))) \tag{12.36} \\
&= F_Y(\beta^{-1}(b_1)) \times (v_1 - E[Y \mid Y \le \beta^{-1}(b_1)]). \tag{12.37}
\end{align}
Let b1 be an interior point of the interval β(V), which is the image of β, and denote z1 := β⁻¹(b1). Then b1 = β(z1), and hence
$$u_1(\beta(z_1), \beta_{-1}; v_1) = F_Y(z_1) \times (v_1 - E[Y \mid Y \le z_1]). \tag{12.38}$$
Denote the right-hand side of Equation (12.38) by h(z1):
\begin{align}
h(z_1) &:= F_Y(z_1) \times (v_1 - E[Y \mid Y \le z_1]) \tag{12.39} \\
&= F_Y(z_1)(v_1 - z_1) + F_Y(z_1) z_1 - F_Y(z_1) E[Y \mid Y \le z_1] \tag{12.40} \\
&= F_Y(z_1)(v_1 - z_1) + \int_0^{z_1} F_Y(y)\,dy, \tag{12.41}
\end{align}
where Equation (12.41) follows from Equation (12.31). To find the point b1 ∈ V at which the maximum of u1(b1, β₋₁; v1) is attained, it suffices to find the point z1 ∈ V at which the maximum of h(z1) is attained. To do so, differentiate h; the function h is differentiable over the interval (0, v1), and its derivative is
$$h'(z_1) = f_Y(z_1)(v_1 - z_1) - F_Y(z_1) + F_Y(z_1) = f_Y(z_1)(v_1 - z_1). \tag{12.42}$$
The derivative h′ equals zero at a single point, z1 = v1. Since fY is positive, h′ is positive for z1 < v1 and negative for z1 > v1, and therefore h attains its maximum at z1 = v1. In other words, buyer 1’s best reply, when his private value is v1 and the other buyers implement strategy β, is β⁻¹(b1) = z1 = v1, i.e., b1 = β(v1).

In summary, our results on equilibrium strategies in sealed-bid first-price and second-price auctions are as follows:

Corollary 12.20 In a symmetric sealed-bid auction with independent private values:

• β(v) = E[Y | Y ≤ v] is a symmetric equilibrium strategy in the sealed-bid first-price auction.
• β(v) = v is a symmetric equilibrium strategy in the sealed-bid second-price auction.

Example 12.15 (Continued) When there are two buyers, with private values uniformly distributed over [0, 1], β(v) = E[Y | Y ≤ v] = v/2. This is the symmetric equilibrium strategy we found on page 472. ◭
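The best-reply argument of Step 3 can be illustrated numerically. The sketch below assumes n buyers with private values uniformly distributed over [0, 1], for which FY(z) = z^(n−1) and E[Y | Y ≤ z] = (n − 1)z/n by direct computation. It then searches a grid for the "pretended value" z that maximizes h(z). The function names and the grid size are our own illustrative choices.

```python
def h(z, v1, n):
    """Expected profit of buyer 1 with value v1 who bids beta(z), uniform values on [0, 1]."""
    win_probability = z ** (n - 1)      # F_Y(z) for n - 1 uniform competitors
    bid = (n - 1) / n * z               # beta(z) = E[Y | Y <= z]
    return win_probability * (v1 - bid)

def best_pretended_value(v1, n, grid_size=10_000):
    """Grid search for the z in [0, 1] that maximizes h(z, v1, n)."""
    grid = [k / grid_size for k in range(grid_size + 1)]
    return max(grid, key=lambda z: h(z, v1, n))

for n in (2, 3, 5):
    for v1 in (0.2, 0.5, 0.9):
        print(n, v1, best_pretended_value(v1, n))
```

In each case the maximizer found on the grid should coincide with v1 up to the grid resolution, in line with Equation (12.42).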
We next compute the expected profits of the buyers and the seller in these two auction methods.

Theorem 12.21 In the symmetric equilibria given by Corollary 12.20, the expected payment that a buyer with private value v makes for the object is FY(v) × E[Y | Y ≤ v], in both sealed-bid first-price and second-price auctions.

Proof: At equilibrium in a sealed-bid second-price auction, a buyer with private value v submits a bid of v. He wins the auction with probability FY(v), and the expected amount he pays if he wins is E[Y | Y ≤ v]. His expected payment is therefore FY(v) × E[Y | Y ≤ v], as claimed above. At equilibrium in a sealed-bid first-price auction, a buyer with private value v submits a bid of E[Y | Y ≤ v]. He wins the auction with probability FY(v), and pays what he bid. His expected payment for the object is therefore also FY(v) × E[Y | Y ≤ v].

Corollary 12.22 In a symmetric sealed-bid auction with independent private values, at the symmetric equilibrium, the expected revenue of a seller in both sealed-bid first-price
and second-price auctions, is
$$\pi = n \int_V F_Y(v)\,E[Y \mid Y \le v]\,f(v)\,dv. \tag{12.43}$$
Proof: The expected payment of a buyer with private value v is FY(v)E[Y | Y ≤ v]. It follows that the expected payment made by each buyer is $\int_V F_Y(v)\,E[Y \mid Y \le v]\,f(v)\,dv$. Since the seller’s expected revenue is the sum of the expected payments of the n buyers, the result follows.
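Equation (12.43) can be evaluated numerically for a given distribution. The following sketch assumes V = [0, 1] and uses a simple midpoint rule. The function `seller_revenue` and its parameters are hypothetical names introduced here for illustration. For uniform private values the result can be compared with the closed form (n − 1)/(n + 1), which follows from a direct computation.

```python
def seller_revenue(cdf, density, n, steps=20_000):
    """Approximate pi = n * integral over [0, 1] of F_Y(v) E[Y | Y <= v] f(v) dv (Equation (12.43)).

    Uses Equation (12.31): F_Y(v) E[Y | Y <= v] = v F_Y(v) - integral_0^v F_Y(y) dy.
    """
    dv = 1.0 / steps
    integral_of_F_Y = 0.0   # running midpoint-rule value of integral_0^(k dv) F_Y(y) dy
    revenue = 0.0
    for k in range(steps):
        v = (k + 0.5) * dv                            # midpoint of the k-th cell
        F_Y = cdf(v) ** (n - 1)                       # F_Y = F^(n-1)
        integral_to_midpoint = integral_of_F_Y + F_Y * dv / 2
        payment = v * F_Y - integral_to_midpoint      # F_Y(v) E[Y | Y <= v]
        revenue += payment * density(v) * dv
        integral_of_F_Y += F_Y * dv
    return n * revenue

def uniform_cdf(v):
    return v

def uniform_density(v):
    return 1.0

for n in (2, 3, 10):
    print(n, seller_revenue(uniform_cdf, uniform_density, n), (n - 1) / (n + 1))
```

The two printed columns should agree to several decimal places. Any other distribution on [0, 1] can be supplied through the `cdf` and `density` arguments.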
12.5.3 The Revenue Equivalence Theorem

In the previous section, we saw that the symmetric and monotonically increasing equilibrium that we found in sealed-bid first-price and second-price auctions always yields the seller the same expected revenue. Is this coincidental, or is there a more general result implying this? As we shall see in the sequel, the Revenue Equivalence Theorem shows that there is indeed a more general result, establishing that the expected revenue of the seller is constant over a broad family of auction methods.

Recall that we denote by (p, C) a sealed-bid auction in which the winner is determined by the function p, and each buyer’s payment is determined by the function C. Let β : V → [0, ∞) be a monotonically increasing strategy. Denote by ei(vi) = e(p, C, β; vi) the expected payment that buyer i with private value vi pays in auction method (p, C), when all the buyers implement strategy β:
$$e_i(v_i) := \int_{V_{-i}} \sum_{i^* \in N} p_{i^*}(\beta(v_1), \beta(v_2), \ldots, \beta(v_n))\, C_i(i^*; \beta(v_1), \beta(v_2), \ldots, \beta(v_n))\, dF_{-i}(v_{-i}). \tag{12.44}$$
Theorem 12.23 Let β be a symmetric and monotonically increasing equilibrium in a sealed-bid symmetric auction with independent private values satisfying the following properties: (a) the winner of the auction is the buyer with the highest private value, and (b) the expected payment made by a buyer with private value 0 is 0. Then
$$e_i(v_i) = F_Y(v_i)\,E[Y \mid Y \le v_i]. \tag{12.45}$$
Property (a) of the equilibrium is known as the “efficiency condition”: an efficient auction is one in which, at equilibrium, the auctioned object is allocated to the buyer who most highly values it. Since the seller is willing to sell the auctioned object at any nonnegative price, and since the private values of the buyers are nonnegative, in an efficient auction the object is sold with probability 1. Under a symmetric and monotonically increasing equilibrium, the object is sold to the highest bidder. The expression on the right-hand side of Equation (12.45) is independent of the auction methods, and depends solely on the distribution of Y , which is determined by the distribution of the private values of the buyers. Theorem 12.23 states therefore that the expected payment that a buyer with private value v makes is independent of the auction method, and depends only on the distribution of the private values of the buyers. It follows that if n risk-neutral buyers are asked whether they prefer to participate in a sealed-bid first-price
auction, or a sealed-bid second-price auction, they have no reason to prefer one to the other. By integrating Equation (12.45) over buyer types, we deduce that the seller’s expected revenue is independent of the auction method:

Corollary 12.24 (The Revenue Equivalence Theorem) In a symmetric sealed-bid auction with independent private values, let β be a symmetric and monotonically increasing equilibrium satisfying the following properties: (a) the winner is the buyer with the highest private value, and (b) the expected payment of each buyer with private value 0 is 0. Then the seller’s expected revenue is
$$\pi = n \int_V e_i(v) f(v)\,dv, \tag{12.46}$$
where
$$e_i(v) = F_Y(v)\,E[Y \mid Y \le v]. \tag{12.47}$$
Proof of Theorem 12.23: Since the private values are independent and identically distributed, the cumulative distribution function of Y is
$$F_Y = F^{n-1}. \tag{12.48}$$
Let β be a symmetric and monotonically increasing equilibrium strategy in a sealed-bid auction (p, C). Let v1 ∈ V be a private value of buyer 1 (which is not 0 and is not v̄ if V = [0, v̄] is a bounded interval). If buyer 1 with private value v1 deviates from the strategy and plays as if his private value is z1, he wins only if z1 is higher than the private values of the other buyers, and the probability of that occurring is FY(z1). His expected profit in this case is
$$u_1(\beta(z_1), \beta_{-1}; v_1) = v_1 F_Y(z_1) - e_1(z_1). \tag{12.49}$$
Since β is an equilibrium, buyer 1’s best reply is z1 = v1. In other words, the function z1 ↦ u1(β(z1), β₋₁; v1) attains its maximum at z1 = v1, which is an interior point of V. We next prove that the function e1 is differentiable, and compute its derivative. Since the function z1 ↦ u1(β(z1), β₋₁; v1) attains its maximum at z1 = v1, for any pair of interior points v1, z1 in the interval V one has
$$v_1 F_Y(z_1) - e_1(z_1) = u_1(\beta(z_1), \beta_{-1}; v_1) \le u_1(\beta(v_1), \beta_{-1}; v_1) = v_1 F_Y(v_1) - e_1(v_1). \tag{12.50}$$
By exchanging the roles of z1 and v1, we deduce that for every pair of interior points v1, z1 in the interval V one has
$$z_1 F_Y(v_1) - e_1(v_1) = u_1(\beta(v_1), \beta_{-1}; z_1) \le u_1(\beta(z_1), \beta_{-1}; z_1) = z_1 F_Y(z_1) - e_1(z_1). \tag{12.51}$$
From Equations (12.50) and (12.51), by rearrangement, we have:
\begin{align}
e_1(v_1) - e_1(z_1) &\le (F_Y(v_1) - F_Y(z_1))\,v_1, \tag{12.52} \\
e_1(v_1) - e_1(z_1) &\ge (F_Y(v_1) - F_Y(z_1))\,z_1. \tag{12.53}
\end{align}
For z1 ≠ v1, dividing Equations (12.52) and (12.53) by v1 − z1, and taking the limit as z1 goes to v1, we get
$$\lim_{z_1 \to v_1} \frac{e_1(v_1) - e_1(z_1)}{v_1 - z_1} = v_1 f_Y(v_1), \quad \forall v_1 \in V,\ v_1 \notin \{0, \bar{v}\}. \tag{12.54}$$
In particular, e1 is a differentiable function and its derivative is e1′(v1) = v1 fY(v1) for every interior point v1 of V. Note that the derivative e1′ is independent of the auction method. Since e1(0) = 0, by integration, for every v1 ∈ V (including the extreme points) we get
$$e_1(v_1) = e_1(0) + \int_0^{v_1} e_1'(y)\,dy = \int_0^{v_1} y f_Y(y)\,dy = F_Y(v_1)\,E[Y \mid Y \le v_1], \tag{12.55}$$
which is what we wanted to prove.
We now show how to use the Revenue Equivalence Theorem to find symmetric equilibrium strategies in various auctions.

Theorem 12.25 Let β be a symmetric, monotonically increasing equilibrium strategy, satisfying β(0) = 0 in a symmetric sealed-bid first-price auction with independent private values. Then
$$\beta(v) = E[Y \mid Y \le v]. \tag{12.56}$$
This theorem complements Theorem 12.19, where we proved that β(v) = E[Y | Y ≤ v] is a symmetric equilibrium strategy that is monotonically increasing and satisfies β(0) = 0. Theorem 12.25 shows that this is the unique such symmetric equilibrium in sealed-bid first-price auctions.

Proof: Since the function β is monotonic, a buyer with private value v wins the auction if and only if his private value is higher than the private values of all the other buyers. It follows that the probability that a buyer with value v wins the auction is FY(v). If he wins, he pays his bid, meaning that he pays β(v). The expected payment that the buyer makes is therefore
$$e(v) = F_Y(v)\,\beta(v). \tag{12.57}$$
Since β satisfies the conditions of Theorem 12.23 (note that the condition that β(0) = 0 guarantees that at this equilibrium, e(0) = 0), Theorem 12.23 implies that
$$e(v) = F_Y(v)\,E[Y \mid Y \le v]. \tag{12.58}$$
Since FY(v) > 0 for every v > 0, from Equations (12.57)–(12.58) we get
$$\beta(v) = E[Y \mid Y \le v], \tag{12.59}$$
which is what we wanted to show.
The following theorem exhibits the equilibrium of an all-pay auction in which every buyer pays the amount of his bid, whether or not he wins the auctioned object (see page 465).
Theorem 12.26 Let β be a symmetric, monotonically increasing equilibrium strategy, satisfying β(0) = 0, in a symmetric sealed-bid all-pay auction with independent private values. Then
$$\beta(v) = F_Y(v)\,E[Y \mid Y \le v]. \tag{12.60}$$
Proof: In a sealed-bid all-pay auction, every buyer pays his bid, in any event, and it follows that the payment that a buyer with private value v makes is e(v) = β(v). Since the conditions of Theorem 12.23 are guaranteed by the monotonicity of β and the condition β(0) = 0, we deduce that e(v) = FY(v)E[Y | Y ≤ v]. It follows that β(v) = FY(v)E[Y | Y ≤ v], which is what we needed to prove.

Example 12.15 (Continued) A sealed-bid first-price auction with two buyers Consider a sealed-bid first-price auction with two buyers, where the private values of the buyers are independent and uniformly distributed over [0, 1]. We will compute the following:

• e(v), the expected payment of a buyer with a private value v.
• e, the buyer’s expected payment, before he knows his private value.
• π = ne, the seller’s expected revenue.

For each v ∈ [0, 1], one has FY(v) = v and fY(v) = 1, and we have seen that β(v) = E[Y | Y ≤ v] = v/2. Therefore,
\begin{align}
e(v) &= F_Y(v)\,E[Y \mid Y \le v] = \frac{v^2}{2}, \tag{12.61} \\
e &= \int_0^1 \frac{v^2}{2}\,dv = \left[\frac{v^3}{6}\right]_0^1 = \frac{1}{6}. \tag{12.62}
\end{align}
The seller’s expected revenue π is
$$\pi = 2 \cdot \frac{1}{6} = \frac{1}{3}, \tag{12.63}$$
as we computed directly on page 473. ◭
Example 12.27 A sealed-bid first-price auction with an arbitrary number of buyers Consider a symmetric sealed-bid first-price auction with n ≥ 2 buyers. The private values of the buyers are independent and uniformly distributed over [0, 1]. Then $F_Y(v) = v^{n-1}$ and $f_Y(v) = (n-1)v^{n-2}$, for each v ∈ [0, 1]. By Theorem 12.25, the symmetric equilibrium strategy is
$$\beta(v) = E[Y \mid Y \le v] = \frac{\int_0^v x f_Y(x)\,dx}{F_Y(v)} = \frac{\int_0^v (n-1) x^{n-1}\,dx}{v^{n-1}} = \frac{n-1}{n}\,v. \tag{12.64}$$
It follows that the expected payment of a buyer with private value v is
$$e(v) = F_Y(v)\,E[Y \mid Y \le v] = \frac{n-1}{n}\,v^n. \tag{12.65}$$
The buyer’s expected payment, before he knows his private value, is
$$e = \frac{n-1}{n} \int_0^1 v^n\,dv = \frac{n-1}{n} \cdot \frac{1}{n+1}, \tag{12.66}$$
and the seller’s expected revenue is
$$\pi = ne = \frac{n-1}{n+1}. \tag{12.67}$$
This value converges to 1 as n increases to infinity. Since the seller’s revenue equals the sale price, we deduce that the sale price converges to 1 as the number of buyers approaches infinity (explain intuitively why this should be expected). ◭
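The revenue equivalence across auction methods can be illustrated by simulation. The sketch below assumes n buyers with private values uniformly distributed over [0, 1], and uses the equilibrium bids (n − 1)v/n in the first-price auction (Equation (12.64)), v in the second-price auction (Corollary 12.20), and (n − 1)vⁿ/n in the all-pay auction (Theorem 12.26 combined with Equation (12.65)). The function name, sample size, and seed are illustrative choices of ours.

```python
import random

def simulate_formats(n, num_auctions=100_000, seed=1):
    """Average seller revenue under three auction formats, uniform values on [0, 1]."""
    rng = random.Random(seed)
    totals = {"first-price": 0.0, "second-price": 0.0, "all-pay": 0.0}
    for _ in range(num_auctions):
        values = sorted(rng.random() for _ in range(n))
        # First-price: the highest-value buyer wins and pays his bid (n - 1) / n * v.
        totals["first-price"] += (n - 1) / n * values[-1]
        # Second-price: buyers bid their values; the winner pays the second-highest value.
        totals["second-price"] += values[-2]
        # All-pay: every buyer pays his bid (n - 1) / n * v ** n.
        totals["all-pay"] += sum((n - 1) / n * v ** n for v in values)
    return {name: total / num_auctions for name, total in totals.items()}

results = simulate_formats(3)
for name, revenue in results.items():
    print(f"{name}: {revenue:.4f}")
```

For n = 3 all three estimates should be close to (n − 1)/(n + 1) = 1/2, up to sampling error.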
12.5.4 Entry fees

We have assumed, up to now, that participation in an auction is free, and that buyers therefore lose nothing in submitting bids. In this section, we explore, via examples, how adding entry fees for auctions may affect the strategies of buyers, and the seller’s expected revenue.

Example 12.28 Sealed-bid second-price auction with entry fee Consider a sealed-bid second-price auction with entry fee λ ∈ [0, 1]. In such an auction, a buyer may decide not to participate; for example, he may decline to participate if his private value is lower than the entry fee. If there is only one buyer submitting a bid, that buyer can win the auction by bidding 0. As in second-price auctions without entry fees, when a buyer decides to participate in a second-price auction with an entry fee, his bid will be his private value of the auctioned object. To formulate this claim precisely, denote the set of actions of each buyer by A = R+ ∪ {“no”}, where “no” means “don’t participate in the auction” and x ∈ R+ means “participate in the auction, pay the entry fee λ and bid the price x”. A (pure) strategy of buyer i is a measurable function βi : Vi → A. That is, when buyer i’s private value is vi he implements action βi(vi).

Theorem 12.29 In a sealed-bid second-price auction with entry fee, for every strategy βi of buyer i, the following strategy β̂i weakly dominates βi, if β̂i ≠ βi:
$$\widehat{\beta}_i(v) = \begin{cases} \text{“no”} & \text{if } \beta_i(v) = \text{“no”}, \\ v & \text{if } \beta_i(v) = x \in \mathbb{R}_+. \end{cases} \tag{12.68}$$
The proof of the theorem is similar to the proof of Theorem 4.15 (page 92); the proof is left to the reader (Exercise 12.22). Theorem 12.29 implies that to find an equilibrium in a sealed-bid second-price auction with entry fees, we have to find for each buyer the set of private values for which he will participate in the auction.

Suppose that there are two buyers, and that the private values V1 and V2 are independent and uniformly distributed over [0, 1]. Since the buyer knows his own private value before he submits his bid, if his private value is low, he will not participate in the auction. There must therefore exist a threshold value v0 such that no buyer with a private value below v0 will participate in the auction. Suppose that buyer 1’s private value equals the threshold V1 = v0. If the equilibrium is monotonic, this buyer will win the auction if and only if buyer 2 does not participate in the auction, since if buyer 2 participates, with probability 1 his private value V2 is greater than v0, and therefore buyer 2’s bid is greater than buyer 1’s private value v0. It follows that P(winning the auction | v0) = v0. On the other hand, when the private value of buyer 1 equals the threshold value v0, he is indifferent between participating and not participating. The buyer’s expected profit if he participates is
$$P(\text{winning the auction} \mid v_0) \times v_0 - \lambda = (v_0)^2 - \lambda, \tag{12.69}$$
and his profit if he does not participate is 0. We deduce that (v0)² − λ = 0, or v0 = √λ. An equilibrium strategy in this game is therefore
$$\beta(v) = \begin{cases} \text{“Don’t participate”} & \text{if } v < \sqrt{\lambda}, \\ v & \text{if } v \ge \sqrt{\lambda}. \end{cases} \tag{12.70}$$
The probability that each buyer will participate in the auction is 1 − √λ. To compute the seller’s expected revenue, denote Vmax = max{V1, V2}, and Vmin = min{V1, V2}.
• If Vmin ≥ v0, both buyers participate in the auction, the seller receives 2λ as entry fee, and the sale price of the auctioned object is Vmin.
• If Vmax < v0, no buyer will participate, and the seller’s revenue is 0.
• If Vmin < v0 ≤ Vmax, only one buyer participates in the auction, the seller receives λ as entry fee, and the sale price of the auctioned object will be 0.

The seller’s expected revenue, as a function of the entry fee λ, is therefore
$$\pi(\lambda) = P(V_{\min} \ge v_0)\,(2\lambda + E[V_{\min} \mid V_{\min} \ge v_0]) + P(V_{\min} < v_0 \le V_{\max}) \times \lambda. \tag{12.71}$$
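Now,
\begin{align}
F_{V_{\min}}(z) &= P(V_{\min} \le z) = z + (1 - z)z = z(2 - z), \tag{12.72} \\
f_{V_{\min}}(z) &= F'_{V_{\min}}(z) = 2(1 - z), \tag{12.73} \\
P(V_{\min} \ge v_0) &= (1 - v_0)^2, \tag{12.74} \\
P(V_{\min} < v_0 \le V_{\max}) &= 2P(V_1 < v_0 \le V_2) = 2v_0(1 - v_0), \tag{12.75} \\
E[V_{\min} \mid V_{\min} \ge v_0] &= \frac{1}{P(V_{\min} \ge v_0)} \int_{v_0}^1 v f_{V_{\min}}(v)\,dv
= \frac{1}{(1 - v_0)^2} \int_{v_0}^1 2v(1 - v)\,dv = \frac{2v_0 + 1}{3}. \tag{12.76}
\end{align}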
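By inserting the values of Equations (12.74)–(12.76) in (12.71), and using the fact that v0 = √λ, we get
\begin{align}
\pi(\lambda) &= (1 - \sqrt{\lambda})^2 \left(2\lambda + \frac{2\sqrt{\lambda} + 1}{3}\right) + 2\sqrt{\lambda}(1 - \sqrt{\lambda}) \times \lambda \tag{12.77} \\
&= \frac{1 - \sqrt{\lambda}}{3} \left( (1 - \sqrt{\lambda})(1 + 2\sqrt{\lambda} + 6\lambda) + 6\lambda\sqrt{\lambda} \right) \tag{12.78} \\
&= \frac{(1 - \sqrt{\lambda})(4\lambda + \sqrt{\lambda} + 1)}{3}. \tag{12.79}
\end{align}
This is a concave function of λ, satisfying π(0) = 1/3 and π(1) = 0. Differentiating the function π, we have:
$$\pi'(\lambda) = \tfrac{1}{3}\left[(1 - \sqrt{\lambda})(4\lambda + \sqrt{\lambda} + 1)\right]' = \tfrac{1}{3}\left[1 + 3\lambda - 4\lambda^{3/2}\right]' = \tfrac{1}{3}\left(3 - 6\sqrt{\lambda}\right) = 1 - 2\sqrt{\lambda}. \tag{12.80}$$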
The derivative π′ vanishes at λ∗ = 1/4, where
$$\pi\left(\tfrac{1}{4}\right) = \tfrac{1}{3}\left(1 - \tfrac{1}{2}\right)\left(4 \cdot \tfrac{1}{4} + \tfrac{1}{2} + 1\right) = \tfrac{1}{3} \cdot \tfrac{1}{2} \cdot \tfrac{5}{2} = \tfrac{5}{12} > \tfrac{1}{3}. \tag{12.81}$$
Because π(1/4) is greater than π(0) and greater than π(1), the function π attains its maximum at the point λ∗ = 1/4, and hence the entry fee maximizing the seller’s expected revenue is λ∗ = 1/4. We conclude that in this case, a sealed-bid second-price auction with entry fee 1/4 yields the seller an expected revenue that is greater than what he can receive from a sealed-bid second-price auction without entry fees. ◭
Remark 12.30 The fact that the seller’s expected revenue from a sealed-bid second-price auction with entry fee is greater than his expected revenue from a sealed-bid second-price auction without entry fee does not contradict the Revenue Equivalence Theorem (Theorem 12.23), because the auction with entry fees does not satisfy the efficiency property: if both buyers have private values lower than the participation threshold v0 = √λ, the auctioned object is not sold, despite the fact that there is a buyer willing to pay for it.
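The computation in Example 12.28 can be cross-checked numerically. The sketch below evaluates the closed form of Equation (12.79), searches a grid of entry fees for its maximum, and compares the closed form with a simulation of the equilibrium of Equation (12.70) for two buyers with uniform private values. The function names, grid, sample size, and seed are our own illustrative assumptions.

```python
import math
import random

def revenue_closed_form(fee):
    """Equation (12.79): expected revenue with entry fee `fee`, two uniform buyers."""
    s = math.sqrt(fee)
    return (1 - s) * (4 * fee + s + 1) / 3

def revenue_simulated(fee, num_auctions=200_000, seed=2):
    """Simulate the equilibrium of Equation (12.70) and average the seller's revenue."""
    rng = random.Random(seed)
    threshold = math.sqrt(fee)  # v0: buyers with value below v0 stay out
    total = 0.0
    for _ in range(num_auctions):
        participants = [v for v in (rng.random(), rng.random()) if v >= threshold]
        total += fee * len(participants)       # entry fees collected
        if len(participants) == 2:
            total += min(participants)         # the winner pays the second-highest bid
        # A lone participant wins at price 0, so only his entry fee is collected.
    return total / num_auctions

best_fee = max((k / 1000 for k in range(1001)), key=revenue_closed_form)
print("revenue-maximizing fee on the grid:", best_fee)
print("closed form at 1/4:", revenue_closed_form(0.25))
print("simulation at 1/4: ", revenue_simulated(0.25))
```

The grid search should return the fee 1/4, and both revenue figures should be close to 5/12, in agreement with Equation (12.81).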
12.6 The Envelope Theorem
Recall that, given the strategy of the other buyers β−i, the expected profit of buyer i with private value vi who submits a bid bi, is ui(bi, β−i; vi). The expected profit of the buyer is the difference between the product of the probability he will win the auction and his private value of the auctioned object, and his expected payment to the seller:
$$u_i(b_i, \beta_{-i}; v_i) = P(\text{buyer } i \text{ wins the auction} \mid b_i, \beta_{-i}) \times v_i - E[\text{buyer } i\text{’s payment to the seller} \mid b_i, \beta_{-i}]. \tag{12.82}$$
Both the probability of winning the auction and buyer i’s payment depend on the bid bi that he submits (and the strategies of the other buyers), but not on his private value. The function ui(bi, β−i; vi) is therefore linear in vi. At equilibrium, the buyer’s bid maximizes his expected profit. The buyer’s profit at equilibrium is therefore
$$u_i^*(\beta_{-i}; v_i) := \max_{b_i \ge 0} u_i(b_i, \beta_{-i}; v_i). \tag{12.83}$$

Figure 12.2 The function ui(bi, β−i; vi), for different values of bi, and the upper envelope u∗i(β−i; vi) (the bold curve)
This function is called the upper envelope, because if we draw the function vi ↦ ui(bi, β−i; vi) for every bi, then u∗i(β−i; vi) is the upper envelope of this family of linear functions. Figure 12.2 shows some of these linear functions vi ↦ ui(bi, β−i; vi) for various values of bi, along with the upper envelope. Denote by bi∗(vi) a value of bi at which the maximum of ui(bi, β−i; vi) for a given vi is obtained:
$$u_i^*(\beta_{-i}; v_i) = u_i(b_i^*(v_i), \beta_{-i}; v_i). \tag{12.84}$$
That is, bi∗ is a best reply of the buyer to the strategy vector β−i. Assuming that the function ui is differentiable⁴ and the function vi ↦ bi∗(vi) is also differentiable, the function vi ↦ u∗i(β−i; vi) is differentiable, and its derivative is
$$\frac{\partial u_i^*}{\partial v_i}(\beta_{-i}; v_i) = \frac{\partial u_i}{\partial v_i}(b_i, \beta_{-i}; v_i)\Big|_{b_i = b_i^*(v_i)} + \frac{\partial u_i}{\partial b_i}(b_i, \beta_{-i}; v_i)\Big|_{b_i = b_i^*(v_i)} \cdot \frac{d b_i^*(v_i)}{d v_i}. \tag{12.85}$$
If for each private value vi, $\frac{\partial u_i}{\partial b_i}(b_i, \beta_{-i}; v_i)\big|_{b_i = b_i^*(v_i)} = 0$ at the maximal point bi∗(vi), then the second term is zero, leading to the following conclusion, called the Envelope Theorem, which has many applications in economics.
Theorem 12.31 (The Envelope Theorem) Let bi∗ be a best response of buyer i to the strategy vector β−i of the other buyers; i.e., bi∗ satisfies Equation (12.84). If the function ui(bi, β−i; vi) is differentiable, and the function vi ↦ bi∗(vi) is differentiable and satisfies $\frac{\partial u_i}{\partial b_i}(b_i, \beta_{-i}; v_i)\big|_{b_i = b_i^*(v_i)} = 0$ for every private value vi, then
$$\frac{\partial u_i^*}{\partial v_i}(\beta_{-i}; v_i) = \frac{\partial u_i}{\partial v_i}(b_i, \beta_{-i}; v_i)\Big|_{b_i = b_i^*(v_i)}. \tag{12.86}$$
Remark 12.32 The condition $\frac{\partial u_i}{\partial b_i}(b_i, \beta_{-i}; v_i)\big|_{b_i = b_i^*(v_i)} = 0$ holds if 0 < bi∗(vi) < ∞ because this is the first-order condition for a local maximum. It follows that it is necessary to check that it holds only if the maximum is at an extreme point, i.e., only if bi∗(vi) = 0.
To apply the chain rule, the function vi #→ bi∗ (vi ) has to be differentiable. That means that the equilibrium strategy must be differentiable. A symmetric equilibrium β ∗ is called a differentiable symmetric equilibrium if the function β ∗ is differentiable.
4 A real-valued multi-variable function is differentiable if it is continuously differentiable (i.e., its derivative is continuous) with respect to each variable. This is equivalent to it being differentiable in every direction in the space of variables.
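The Envelope Theorem can be illustrated numerically in a simple case. The sketch below assumes the setting of Example 12.15: buyer 2’s private value is uniformly distributed over [0, 1] and he bids half of it in a sealed-bid first-price auction. It computes the upper envelope by a grid search over buyer 1’s bids and compares a finite-difference estimate of its slope with the partial derivative of the profit with respect to the private value, which here is the probability of winning at the best bid. All function names, the bid grid, and the step size are illustrative choices of ours.

```python
def profit(bid, value):
    """Buyer 1's expected profit when buyer 2's value is uniform on [0, 1] and he bids half of it."""
    win_probability = min(2 * bid, 1.0)   # P(V2 / 2 < bid)
    return win_probability * (value - bid)

BIDS = [k / 10_000 for k in range(10_001)]  # grid of candidate bids in [0, 1]

def best_bid(value):
    """A bid on the grid that maximizes the expected profit for the given value."""
    return max(BIDS, key=lambda bid: profit(bid, value))

def upper_envelope(value):
    """The maximal expected profit over the bid grid."""
    return profit(best_bid(value), value)

def envelope_slope(value, step=1e-3):
    """Central finite-difference estimate of the derivative of the upper envelope."""
    return (upper_envelope(value + step) - upper_envelope(value - step)) / (2 * step)

for value in (0.3, 0.6, 0.9):
    partial_wrt_value = min(2 * best_bid(value), 1.0)  # partial derivative of the profit in the value, at the best bid
    print(value, envelope_slope(value), partial_wrt_value)
```

In each row the two numbers should agree up to the discretization error, as Equation (12.86) predicts; in this example both are approximately equal to the private value itself.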
Example 12.33 Sealed-bid first-price auction Consider a sealed-bid first-price auction with n buyers. Suppose that the private values of the buyers are independent of each other, that they are all in the unit interval [0, 1], and that they share the same cumulative distribution function F. Assuming that there exists a monotonically increasing and differentiable symmetric equilibrium strategy β∗, we can compute it using the Envelope Theorem. Recall that β∗−i is the strategy vector in which all the buyers, except for buyer i, use the strategy β∗. The expected profit of buyer i with private value vi who submits a bid bi is
\begin{align}
u_i(b_i, \beta^*_{-i}; v_i) &= P(\text{the buyer wins the auction} \mid b_i, \beta^*_{-i}) \times (v_i - b_i) \tag{12.87} \\
&= \left(F((\beta^*)^{-1}(b_i))\right)^{n-1} \times (v_i - b_i). \tag{12.88}
\end{align}
Since β∗ is an equilibrium, βi∗(vi) is the best response of buyer i with private value vi to β∗−i, and therefore bi∗(vi) = βi∗(vi). If β∗ is differentiable, then (β∗)⁻¹ is also differentiable, and then ui(bi, β∗−i; vi) is differentiable. Since the strategy β∗ is monotonically increasing, and because β∗(0) ≥ 0 and β∗(1) ≤ 1, it follows that 0 < β∗(vi) < 1 for all vi ∈ (0, 1), and therefore $\frac{\partial u_i}{\partial b_i}(b_i, \beta^*_{-i}; v_i)\big|_{b_i = \beta_i^*(v_i)} = 0$. The Envelope Theorem implies that
$$\frac{\partial u_i^*}{\partial v_i}(\beta^*_{-i}; v_i) = \frac{\partial u_i}{\partial v_i}(b_i, \beta^*_{-i}; v_i)\Big|_{b_i = \beta_i^*(v_i)} = (F(v_i))^{n-1}. \tag{12.89}$$
Note that u∗i(β∗−i; 0) = 0, i.e., the profit of a buyer with private value 0 is 0. By integrating, we get
$$u_i^*(\beta^*_{-i}; v_i) = \int_0^{v_i} (F(x_i))^{n-1}\,dx_i. \tag{12.90}$$
From this equation, along with Equation (12.88), for bi = bi∗(vi) = β∗(vi), we get
$$(F(v_i))^{n-1}\left(v_i - \beta^*(v_i)\right) = \int_0^{v_i} (F(x_i))^{n-1}\,dx_i. \tag{12.91}$$
After moving terms from one side of the equals sign to the other, we have:
$$\beta^*(v_i) = v_i - \frac{\int_0^{v_i} (F(x_i))^{n-1}\,dx_i}{(F(v_i))^{n-1}}. \tag{12.92}$$
In other words, if a monotonically increasing, differentiable, and symmetric equilibrium exists, it is necessarily given by Equation (12.92). Recall that, according to Theorem 12.19, in a symmetric sealed-bid first-price auction with independent private values, the symmetric equilibrium is β∗(v) = E[Y | Y ≤ v]. It follows that in the case before us, in which the distribution of the private values of the buyers is the uniform distribution over [0, 1], this expression must be equal to the expression given by Equation (12.92). The reader is asked to check directly, in Exercise 12.24, that these two expressions are indeed equal to each other. ◭
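The agreement between Equation (12.92) and the expression E[Y | Y ≤ v] of Theorem 12.19 can be checked numerically. The sketch below approximates both expressions with midpoint sums, once for the uniform distribution and once, as an additional assumption of ours, for the cumulative distribution function F(x) = x² on [0, 1]. The function names and the number of integration steps are illustrative.

```python
def bid_from_envelope(cdf, v, n, steps=20_000):
    """Equation (12.92): v - integral_0^v F(x)^(n-1) dx / F(v)^(n-1), by the midpoint rule."""
    dx = v / steps
    integral = sum(cdf((k + 0.5) * dx) ** (n - 1) for k in range(steps)) * dx
    return v - integral / cdf(v) ** (n - 1)

def bid_from_conditional_mean(cdf, v, n, steps=20_000):
    """E[Y | Y <= v] approximated by a sum of cell midpoints weighted by increments of F_Y."""
    dy = v / steps
    total = 0.0
    for k in range(steps):
        lo, hi = k * dy, (k + 1) * dy
        total += (lo + hi) / 2 * (cdf(hi) ** (n - 1) - cdf(lo) ** (n - 1))
    return total / cdf(v) ** (n - 1)

def uniform_cdf(x):
    return x

def squared_cdf(x):
    return x * x   # hypothetical non-uniform example: F(x) = x^2 on [0, 1]

for cdf in (uniform_cdf, squared_cdf):
    print(bid_from_envelope(cdf, 0.8, 3), bid_from_conditional_mean(cdf, 0.8, 3))
```

For the uniform distribution with n = 3 both values should be close to (n − 1)v/n, in agreement with Equation (12.64). For F(x) = x² the two approximations should again agree with each other.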
Example 12.34 Sealed-bid first-price auction with a reserve price An auction with a reserve price ρ is an auction in which every bid below ρ is invalid. Consider a sealed-bid first-price auction with a reserve price ρ ∈ [0, 1] and two buyers whose private values are independent and uniformly distributed over [0, 1]. What is the symmetric equilibrium strategy β∗? A buyer with a private value lower than or equal to ρ cannot profit no matter what bid he makes. Using the Envelope Theorem we can find a symmetric, monotonically increasing and differentiable equilibrium strategy satisfying
487
12.6 The Envelope Theorem β ∗ (v1 ) = v1 for all v1 ∈ [0, ρ]. This choice is arbitrary and it guarantees that a bid by a buyer whose private value is less than ρ is invalid.5 Step 1: ρ ≤ β ∗ (v1 ) < v1 for all v1 ∈ (ρ, 1]. u1 (b1 , β ∗ ; v1 ) = 0 for all b1 < ρ, and u1 (b1 , β ∗ ; v1 ) ≤ 0 for all b1 ≥ v1 . For all b1 ∈ (ρ, v1 ), u1 (b1 , β ∗ ; v1 ) ≥ P(V2 < ρ)(b1 − ρ) > 0.
(12.93)
It follows that the maximum of the function b1 #→ u1 (b1 , β ∗ ; v1 ) is attained at a point in the interval [ρ, v1 ). Step 2: ρ < β ∗ (v1 ) < v1 for all v1 ∈ (ρ, 1]. Suppose by contradiction that there exists v1 ∈ (ρ, 1] such that β ∗ (v1 ) = ρ and let v1 ∈ (ρ, v1 ). Since β ∗ is a monotonic strategy, v1 ) < β ∗ (v1 ) = ρ, β ∗ (
(12.94)
in contradiction to Step 1. Step 3: Computing β ∗ . For v1 ∈ (ρ, 1] the maximum of the function b1 #→ u1 (b1 , β ∗ ; v1 ) is attained at a point in the interval 1 (b , β ∗ ; v1 )|b1 =β ∗ (v1 ) = (ρ, v1 ), this is a local maximum, and since β ∗ is a differentiable function, ∂u ∂b1 1 0. By the Envelope Theorem: ∂u∗1 ∗ ∂u1 (β ; v1 ) = (b1 , β ∗ ; v1 )|b1 =β ∗ (v1 ) . ∂v1 ∂v1
(12.95)
Since the distribution of V2 is the uniform distribution over [0, 1], and since β ∗ is monotonically increasing, u1 (b1 , β ∗ ; v1 ) = P(β ∗ (V2 ) < b1 ) × (v1 − b1 ) = (β ∗ )−1 (b1 ) × (v1 − b1 ).
(12.96)
∂u1 (b1 , β ∗ ; v1 ) = (β ∗ )−1 (b1 ), ∂v1
(12.97)
Therefore,
and Equations (12.82) and (12.95)–(12.97) imply that ∂u∗1 ∗ (β ; v1 ) = (β ∗ )−1 (β ∗ (v1 )) = v1 . ∂v1
(12.98)
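For $v_1 \le \rho$, the profit is zero: $u_1^*(\beta^*; v_1) = 0$. By integration, we get
$$u_1^*(\beta^*; v_1) = \int_\rho^{v_1} \frac{\partial u_1^*}{\partial t_1}(\beta^*; t_1)\,dt_1 = \int_\rho^{v_1} t_1\,dt_1 = \left[\frac{(t_1)^2}{2}\right]_\rho^{v_1} = \frac{(v_1)^2}{2} - \frac{\rho^2}{2}. \tag{12.99}$$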
On the other hand, the buyer’s profit $u_1^*(\beta^*; v_1)$ can be computed directly: in a symmetric, monotonically increasing equilibrium, buyer 1’s profit is the probability that the private value of buyer 2 is lower than $v_1$ times the profit $(v_1 - \beta^*(v_1))$ if he wins:
$$u_1^*(\beta^*; v_1) = u_1(\beta^*(v_1), \beta^*; v_1) = v_1 (v_1 - \beta^*(v_1)), \quad \forall v_1 \in (\rho, 1]. \tag{12.100}$$
⁵ In this example we have two buyers, and therefore the vector $\beta^*_{-1}$ of the strategies played in the symmetric equilibrium by all players except Player 1 is $\beta^*$. We therefore write $\beta^*$ instead of $\beta^*_{-1}$.
From Equations (12.99)–(12.100) we conclude that
$$\beta^*(v_1) = \frac{v_1}{2} + \frac{\rho^2}{2 v_1}, \quad \forall v_1 \in (\rho, 1]. \tag{12.101}$$
Step 4: Computing the seller’s expected revenue.
We have shown that if there exists a monotonically increasing and differentiable symmetric equilibrium strategy in the interval $(\rho, 1)$ then that strategy is defined by Equation (12.101). The strategy that we found is indeed differentiable. To see that it is monotonically increasing, we look at its derivative:
$$(\beta^*)'(v_1) = \frac{1}{2} - \frac{\rho^2}{2 (v_1)^2}, \quad \forall v_1 \in (\rho, 1]. \tag{12.102}$$
We see that for $v_1 \in (\rho, 1]$ it is indeed the case that $(\beta^*)'(v_1) > 0$. Note that for $\rho = 0$ (an auction without a minimum price), by Equation (12.101), $\beta^*(v_1) = \frac{v_1}{2}$, which is the solution that we found for sealed-bid first-price auctions without a reserve price (Example 12.15 on page 472).
What is the seller’s expected revenue? Computing this requires first computing each buyer’s expected payment. Buyer 1’s payment is 0 when $v_1 \le \rho$. If $v_1 > \rho$, he wins only if $v_1 > v_2$ (an event that occurs with probability $v_1$), and then he pays $\beta^*(v_1)$ (we are ignoring the possibility that $v_1 = v_2$, which occurs with probability 0). The expected payment of buyer 1 is, therefore,
$$e = \int_\rho^1 v_1 \beta^*(v_1)\,dv_1 = \int_\rho^1 v_1 \left(\frac{v_1}{2} + \frac{\rho^2}{2 v_1}\right) dv_1 = \left[\frac{(v_1)^3}{6} + \frac{\rho^2 v_1}{2}\right]_\rho^1 = \frac{1}{6} + \frac{\rho^2}{2} - \frac{2}{3}\rho^3. \tag{12.103}$$
Since there are two buyers, the seller’s expected revenue is
$$\pi(\rho) = 2e = \frac{1}{3} + \rho^2 - \frac{4}{3}\rho^3. \tag{12.104}$$
Note that $\pi(0) = \frac{1}{3}$: in a sealed-bid first-price auction without a reserve price, the seller’s expected revenue is $\frac{1}{3}$. Similarly, $\pi(1) = 0$: when the reserve price is 1, with probability 1 no buyer wins the object and the seller’s expected revenue is 0. What is the reserve price that maximizes the seller’s expected payoff? To compute that, differentiate the function $\pi$, and set the derivative to 0:
$$0 = \pi'(\rho) = 2\rho - 4\rho^2. \tag{12.105}$$
It follows that the reserve price that maximizes the seller’s expected revenue is $\rho = \frac{1}{2}$ (the other root, $\rho = 0$, is a local minimum of $\pi$), at which the seller’s expected revenue is $\pi(\frac{1}{2}) = \frac{5}{12}$. Since $\frac{5}{12} \ge \frac{1}{3}$, introducing a reserve price is beneficial for the seller. Note that $\frac{5}{12}$ is also the seller’s expected revenue in a sealed-bid second-price auction with entry fee $\frac{1}{4}$ (see Example 12.28). ◭
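As a quick numerical check of this example, the following Python sketch evaluates the equilibrium bid of Equation (12.101) and the revenue function of Equation (12.104), and searches a grid of reserve prices for the maximizer. The function names and the grid resolution are illustrative choices and do not come from the text.

```python
def bid(v, rho):
    """Equilibrium bid (12.101) for a private value v and reserve price rho."""
    if v <= rho:
        return v  # the arbitrary convention beta*(v) = v on [0, rho]
    return v / 2 + rho ** 2 / (2 * v)


def revenue(rho):
    """Seller's expected revenue (12.104) with two uniform buyers."""
    return 1 / 3 + rho ** 2 - (4 / 3) * rho ** 3


# Search a grid of reserve prices in [0, 1] for the revenue maximizer.
grid = [k / 1000 for k in range(1001)]
best_rho = max(grid, key=revenue)

print("best reserve price on the grid:", best_rho)
print("revenue at that reserve price:", revenue(best_rho))
print("revenue without a reserve price:", revenue(0.0))
```

The grid search is only a sanity check of the first-order condition (12.105); it is not needed for the argument itself.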
12.7 Risk aversion
One of the underlying assumptions of our analysis so far has been that the buyers participating in auctions are risk neutral, and therefore their goal is to maximize their expected profits. What happens if we drop this assumption? In this section, we will see how risk-averse buyers behave in sealed-bid first-price and second-price auctions. We will
consider auction models satisfying Assumptions (A1)–(A5), thus omitting the risk neutrality Assumption (A6). For simplicity we maintain the term “symmetric auction with independent private values” for this model. Suppose that the buyers satisfy the von Neumann–Morgenstern axioms with respect to their utility for money (for a review of utility theory, see Chapter 2). In addition, suppose that each buyer has the same monotonically increasing utility function for money, U : R → R, satisfying U (0) = 0. If buyer i’s private value is vi , and his bid is bi , the buyer’s profit is:
• 0, if he does not win the auction.
• $v_i - b_i$, if he wins.
If we denote by $\alpha_{b_i}$ the probability that the buyer wins the auction if he bids $b_i$, then when he bids $b_i$, he is effectively facing the lottery:
$$[\alpha_{b_i}(v_i - b_i), (1 - \alpha_{b_i})0]. \tag{12.106}$$
Since the buyer’s preference relation satisfies the von Neumann–Morgenstern axioms, and since $U(0) = 0$, his utility from this lottery is
$$U[\alpha_{b_i}(v_i - b_i), (1 - \alpha_{b_i})0] = \alpha_{b_i} U(v_i - b_i). \tag{12.107}$$
Recall (see Section 2.7 on page 23), that a buyer is risk-averse if his utility function for money U is concave, is risk-neutral if his utility function U is linear, and is risk-seeking if his utility function U is convex. In a sealed-bid second-price auction, the strategy β(v) = v
(12.108)
still weakly dominates all other strategies, even if the buyers are risk-averse (or risk-seeking). The reasoning behind this is the same reasoning behind the similar conclusion we presented in the case of risk-neutral buyers, using the fact that U is monotonic (see Exercise 12.25). The situation is quite different in a sealed-bid first-price auction.
Theorem 12.35 Consider a symmetric sealed-bid first-price auction with independent private values. Suppose that each buyer has the same utility function for money U that is monotonically increasing, differentiable, and strictly concave. Let $\gamma$ be a monotonically increasing, differentiable, and symmetric equilibrium strategy satisfying $\gamma(0) = 0$. Then $\gamma$ is the solution of the differential equation
$$\gamma'(v) = (n - 1) \times \frac{f(v)}{F(v)} \times \frac{U(v - \gamma(v))}{U'(v - \gamma(v))}, \quad \forall v > 0, \tag{12.109}$$
with initial condition γ (0) = 0. Proof: Since γ is monotonically increasing, if buyer i bids bi , and the other buyers implement strategy γ , the probability that buyer i wins the auction is (F (γ −1 (bi )))n−1 . Since U (0) = 0, the buyer’s utility is ui (bi , γ−i ; vi ) = (F (γ −1 (bi )))n−1 × U (vi − bi ).
(12.110)
We will first check that for $v_i > 0$, the maximum of this function is attained at a value $b_i$ in the interval $(0, v_i)$. This is accomplished by showing that $u_i(0, \gamma_{-i}; v_i) = u_i(v_i, \gamma_{-i}; v_i) = 0$ and $u_i(b_i, \gamma_{-i}; v_i) < 0$ for each $b_i > v_i$, while $u_i(b_i, \gamma_{-i}; v_i) > 0$ for each $b_i \in (0, v_i)$.
• If buyer i’s bid is $b_i = v_i$, his utility from winning is $U(0) = 0$, and therefore $u_i(v_i, \gamma_{-i}; v_i) = 0$.
• Since $\gamma(0) = 0$ it follows that $F(\gamma^{-1}(0)) = 0$, and therefore $u_i(0, \gamma_{-i}; v_i) = 0$.
• For $b_i > v_i$ we have $v_i - b_i < 0$; since U is monotonically increasing and $U(0) = 0$, it follows that $U(v_i - b_i) < 0$. By Assumptions (A4) and (A5), $F(\gamma^{-1}(b_i)) > 0$ for every $b_i > v_i$, and therefore $u_i(b_i, \gamma_{-i}; v_i) < 0$ for every $b_i > v_i$.
• Finally, we show that $u_i(b_i, \gamma_{-i}; v_i) > 0$ for every $b_i \in (0, v_i)$. For every $b_i > 0$, since $\gamma$ is monotonically increasing, $\gamma^{-1}(b_i) > 0$, and Assumptions (A4) and (A5) imply that $F(\gamma^{-1}(b_i)) > 0$. Since the utility function is monotonically increasing, $U(v_i - b_i) > 0$, and therefore $u_i(b_i, \gamma_{-i}; v_i) > 0$.
We deduce that the maximum of the function $b_i \mapsto u_i(b_i, \gamma_{-i}; v_i)$ is indeed attained at a point in the open interval $(0, v_i)$. We next differentiate the function $u_i$ in Equation (12.110) (which is differentiable because both U and $\gamma$ are differentiable), yielding
$$\frac{\partial u_i}{\partial b_i}(b_i, \gamma_{-i}; v_i) = (n - 1) \frac{f(\gamma^{-1}(b_i))}{\gamma'(\gamma^{-1}(b_i))} (F(\gamma^{-1}(b_i)))^{n-2} U(v_i - b_i) - (F(\gamma^{-1}(b_i)))^{n-1} U'(v_i - b_i). \tag{12.111}$$
Since the strategy $\gamma$ is a symmetric equilibrium strategy, the maximum of this function is attained at $b_i = \gamma(v_i)$; and at that point, the derivative vanishes. Thus, by substituting $b_i = \gamma(v_i)$ in Equation (12.111) one has
$$0 = \frac{\partial u_i}{\partial b_i}(b_i, \gamma_{-i}; v_i)\Big|_{b_i = \gamma(v_i)} = (n - 1) \frac{f(v_i)}{\gamma'(v_i)} (F(v_i))^{n-2} U(v_i - \gamma(v_i)) - (F(v_i))^{n-1} U'(v_i - \gamma(v_i)). \tag{12.112}$$
Because $v_i > 0$, and by Assumption (A5), $F(v_i) > 0$. Dividing Equation (12.112) by the factor $(F(v_i))^{n-2}$, and rearranging the remaining equation, yields Equation (12.109).
We now prove that when all the buyers are risk averse and have the same utility function, the submitted bids in equilibrium are higher than the bids that would be submitted by risk-neutral buyers. The intuition behind this result is that risk-averse buyers are more concerned about not winning the auction, and therefore they submit higher bids than risk-neutral buyers.
Theorem 12.36 Suppose that in a symmetric sealed-bid first-price auction with independent private values, each buyer’s utility function U is monotonically increasing, differentiable, strictly concave, and satisfies $U(0) = 0$. Let $\gamma$ be a monotonically increasing, differentiable, symmetric equilibrium strategy satisfying $\gamma(0) = 0$, and let $\beta$ be a monotonically increasing, differentiable, symmetric equilibrium strategy satisfying $\beta(0) = 0$ in the auction when the buyers are risk-neutral. Then $\gamma(v) > \beta(v)$ for each $v > 0$.
[Figure 12.3: $\frac{U(v)}{v} > U'(v)$ for a strictly concave function U. The figure plots U and marks, at the point v, the slopes $\frac{U(v)}{v}$ and $U'(v)$.]
Proof: Theorem 12.35 implies that
$$\gamma'(v) = (n - 1) \frac{f(v)}{F(v)} \times \frac{U(v - \gamma(v))}{U'(v - \gamma(v))}, \quad \forall v > 0. \tag{12.113}$$
Since the strategy $\beta$ also satisfies the conditions of Theorem 12.35, this strategy, for risk-neutral buyers, satisfies Equation (12.113) with utility function $U(v) = v$:
$$\beta'(v) = (n - 1) \frac{f(v)}{F(v)} \times (v - \beta(v)), \quad \forall v > 0. \tag{12.114}$$
Since U is a strictly concave function, and $U(0) = 0$, it follows that $U'(v) < \frac{U(v)}{v}$ (see Figure 12.3), or equivalently, $\frac{U(v)}{U'(v)} > v$. It follows that
$$\gamma'(v) = (n - 1) \frac{f(v)}{F(v)} \times \frac{U(v - \gamma(v))}{U'(v - \gamma(v))} > (n - 1) \frac{f(v)}{F(v)} \times (v - \gamma(v)). \tag{12.115}$$
To show that $\gamma(v) > \beta(v)$ for each $v > 0$, note that if $v_0 > 0$ satisfies $\gamma(v_0) \le \beta(v_0)$, then
$$\gamma'(v_0) > (n - 1) \frac{f(v_0)}{F(v_0)} \times (v_0 - \gamma(v_0)) \tag{12.116}$$
$$\ge (n - 1) \frac{f(v_0)}{F(v_0)} \times (v_0 - \beta(v_0)) = \beta'(v_0) > 0. \tag{12.117}$$
Define $\delta(v) := \gamma(v) - \beta(v)$ for all $v \in V$. Equations (12.116)–(12.117) show that $\delta'(v) > 0$ for each $v > 0$ such that $\delta(v) \le 0$. It follows that if there exists $v_0 > 0$ such that $\delta(v_0) \le 0$, then $\delta(v) < 0$ for each $v \in [0, v_0)$. Since $\delta(0) = \gamma(0) - \beta(0) = 0$, there does not exist $v_0 > 0$ such that $\delta(v_0) \le 0$. In other words, $\gamma(v) > \beta(v)$ for each $v > 0$.
In the model used in this section, in which all the buyers have the same utility function for money, the bids submitted by the buyers in the symmetric equilibrium of the sealed-bid first-price auctions are higher if all buyers are risk averse, which implies that the seller’s expected revenue is higher. In contrast, in sealed-bid second-price auctions, the bids submitted by buyers in the symmetric equilibrium equal their private values, whether
they are risk-averse or risk-neutral, and hence the seller’s expected revenue is equal in either case. This leads to the following corollary.
Corollary 12.37 In a symmetric sealed-bid auction with independent private values, when buyers are risk averse, and they all have the same monotonically increasing, differentiable, and strictly concave utility function, the seller’s expected revenue in the symmetric equilibrium is higher in a sealed-bid first-price auction than in a sealed-bid second-price auction.
In particular, this proves that the Revenue Equivalence Theorem does not apply when the buyers are risk averse. The converse corollary can similarly be proved for risk-seeking buyers (Exercise 12.26): In a symmetric sealed-bid auction with independent private values, when the buyers are risk seeking and they all have the same monotonically increasing, differentiable, and strictly convex utility function, the seller’s expected revenue in the symmetric equilibrium is lower in a first-price auction than in a second-price auction.
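To illustrate Theorems 12.35 and 12.36, the sketch below works under two assumptions that are not taken from the text: private values are uniform over $[0, 1]$ (so that $f(v)/F(v) = 1/v$), and the utility for nonnegative profits is $U(x) = x^a$ with $0 < a < 1$ (so that $U(x)/U'(x) = x/a$). Under these assumptions the linear strategy $\gamma(v) = \frac{n-1}{n-1+a} v$ is a candidate solution of Equation (12.109); the code checks that it satisfies the equation and that it exceeds the risk-neutral bid obtained with $a = 1$. The function names are hypothetical.

```python
def gamma(v, n, a):
    """Candidate equilibrium bid for n buyers with the assumed utility U(x) = x**a."""
    return (n - 1) / (n - 1 + a) * v


def rhs_12_109(v, g, n, a):
    """Right-hand side of (12.109) for uniform values: f/F = 1/v and U/U' = (v - g)/a."""
    return (n - 1) * (1.0 / v) * (v - g) / a


n, a = 3, 0.5
slope = (n - 1) / (n - 1 + a)  # gamma is linear in v, so gamma'(v) equals this constant

for v in (0.1, 0.5, 0.9):
    residual = slope - rhs_12_109(v, gamma(v, n, a), n, a)
    print(v, gamma(v, n, a), gamma(v, n, 1.0), residual)
```

Each printed row shows the value, the risk-averse bid, the risk-neutral bid, and the residual of the differential equation, which should vanish up to rounding.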
12.8 Mechanism design
We have presented up to now several auction methods, which we analyzed by computing an equilibrium in each auction, and studying its properties. The advantages and disadvantages of a particular auction method were then judged by the properties of its equilibrium. A natural question that arises is whether one can plan an auction method, or more generally a selling mechanism, that can be expected to yield a “desired outcome.” In other words, what we seek is a selling mechanism whose equilibrium has “desired properties,” such as efficiency and maximizing the revenue of the seller. In this section we will study mechanism design, a subject that focuses on these sorts of questions. Definition 12.38 A selling problem is a vector (N; (Vi , fi )i∈N ) such that:
• $N = \{1, 2, \ldots, n\}$ is a set of buyers.
• $V_i$ is a bounded interval $[0, \bar{v}]$ or an infinite interval $[0, \infty)$.
• $f_i : V_i \to [0, \infty)$ is a density function, i.e., $\int_{V_i} f_i(v)\,dv = 1$.
A selling problem serves as a model for the following situation:
• A seller wishes to sell an indivisible object, whose value for the seller is normalized to be 0.
• The set of buyers is $N = \{1, 2, \ldots, n\}$. The buyers are all risk-neutral, and each seeks to maximize the expected value of his profit.
• The private value of buyer i is a random variable $V_i$ with values in the interval $V_i$. This is a continuous random variable whose density function is $f_i$. The random variables $(V_i)_{i \in N}$ are independent random variables that do not necessarily have the same distribution; we do not rule out the possibility that $f_i \ne f_j$ for different buyers i and j.
• Each buyer knows his private value and does not know the private values of the other buyers, but he knows the distributions of the random variables $(V_j)_{j \ne i}$.
Denote by $V_N := V_1 \times V_2 \times \cdots \times V_n$ the space of all possible vectors of private values. Since the private values of the buyers are independent, the joint density function of the vector $V = (V_1, V_2, \ldots, V_n)$ is
$$f_V(v) = \prod_{i \in N} f_i(v_i). \tag{12.118}$$
Denote:
$$f_{-i}(v_{-i}) := \prod_{j \ne i} f_j(v_j). \tag{12.119}$$
$f_{-i}$ is the joint density function of $V_{-i} = (V_j)_{j \ne i}$. Since the private values are independent, this is also the marginal density of V over $V_{-i}$, conditioned on $V_i$, where $V_{-i} := \times_{j \ne i} V_j$ is the set of all possible private value vectors of all the buyers except for buyer i.
Definition 12.39 A selling mechanism for a selling problem $(N, (V_i, f_i)_{i \in N})$ is a vector $((\Theta_i, q_i, \mu_i)_{i \in N})$ where, for each buyer $i \in N$:
1. $\Theta_i$ is a measurable space⁶ of messages that buyer i can send to the seller. The space of the message vectors is $\Theta = \Theta_1 \times \Theta_2 \times \cdots \times \Theta_n$.
2. $q_i : \Theta \to [0, 1]$ is a function mapping each vector of messages to the probability that buyer i wins the object. (Necessarily, $\sum_{i \in N} q_i(\theta) \le 1$, for every $\theta \in \Theta$.)
3. $\mu_i : \Theta \to \mathbb{R}$ is a function mapping every vector of messages to the payment that buyer i makes to the seller (whether or not he wins the object).
If $\sum_{i \in N} q_i(\theta) < 1$ for a particular $\theta \in \Theta$, then when the message vector received by the seller is $\theta$ there is a positive probability that the object will not be sold (and will therefore remain in the possession of the seller). Given a selling mechanism, we define the following game:
• The set of players (buyers) is N.
• Each buyer $i \in N$ chooses a message $\theta_i \in \Theta_i$. Denote $\theta = (\theta_1, \theta_2, \ldots, \theta_n)$.
• Buyer i wins the object with probability $q_i(\theta)$. The object remains in the possession of the seller with probability $1 - \sum_{i \in N} q_i(\theta)$.
• Every buyer i pays to the seller the amount $\mu_i(\theta)$.
The space of messages in a selling mechanism may be very complex: if the selling mechanism includes negotiations, the message space may include the buyer’s first offer, his second offer to every counteroffer of the seller, and so on. From now on we study selling mechanisms for a given selling problem.
⁶ Recall that $\Theta_i$ is a measurable space if it has an associated σ-algebra, i.e., a collection of subsets of $\Theta_i$ containing the empty set that is closed under countable unions and set complementation.
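Example 12.40 Sealed-bid first-price auction
A sealed-bid first-price auction with risk-neutral buyers can be presented as a selling mechanism as follows:
• $\Theta_i = [0, \infty)$: buyer i’s message is a nonnegative number; this is buyer i’s bid.
• Denote by
$$N(\theta) := |\{i \in N : \theta_i = \max_{j \in N} \theta_j\}| \tag{12.120}$$
the number of buyers who submit the highest bid.⁷ Buyer i wins the object with probability
$$q_i(\theta) = \begin{cases} \frac{1}{N(\theta)} & \text{if } \theta_i = \max_{j \in N} \theta_j, \\ 0 & \text{if } \theta_i < \max_{j \in N} \theta_j. \end{cases} \tag{12.121}$$
• The payment that buyer i makes is
$$\mu_i(\theta) = \begin{cases} \frac{\theta_i}{N(\theta)} & \text{if } \theta_i = \max_{j \in N} \theta_j, \\ 0 & \text{if } \theta_i < \max_{j \in N} \theta_j. \end{cases} \tag{12.122}$$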
This description differs from the auction descriptions we previously presented only in the payment that the buyer who submits the highest bid makes, and only in the case that several buyers submit the same bid. In the description of an auction as a selling mechanism, all the buyers who submitted the highest bid equally share the cost of that bid whether or not they finally get the object, while in the description on page 465, only the winner of the object pays its full price, which equals his bid. But this difference does not change the strategic considerations of the buyers: when there are several buyers submitting the same highest bid and the winner of the object is chosen from among them according to the uniform distribution, in both cases the expected payment of a buyer who submitted the highest bid is the amount that he bid, divided by $N(\theta)$, and his probability of winning is $\frac{1}{N(\theta)}$. Since the buyers are risk-neutral and the goal of each buyer is to maximize his expected profit, the strategic considerations of the buyers, under both definitions, are unchanged. ◭
Example 12.41 Sealed-bid second-price auction
Similar to a sealed-bid first-price auction, a sealed-bid second-price auction with risk-neutral buyers can also be presented as a selling mechanism. The only difference is in the payment function, $\mu$, which is given as follows:
$$\mu_i(\theta) = \begin{cases} \frac{\max_{j \ne i} \theta_j}{N(\theta)} & \text{if } \theta_i = \max_{j \in N} \theta_j, \\ 0 & \text{if } \theta_i < \max_{j \in N} \theta_j. \end{cases} \tag{12.123}$$
Again, $N(\theta)$ is the number of buyers who submitted the highest bid. ◭
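The two examples can be written out as plain functions. The following Python sketch implements the allocation rule (12.121) and the payment rules (12.122) and (12.123) for a message vector given as a list of at least two bids; the function names are illustrative and do not appear in the text.

```python
def highest_bidders(theta):
    """Indices of the buyers whose message equals the maximal bid."""
    top = max(theta)
    return [i for i, t in enumerate(theta) if t == top]


def q(theta):
    """Winning probabilities (12.121): ties are split uniformly."""
    winners = highest_bidders(theta)
    return [1 / len(winners) if i in winners else 0.0 for i in range(len(theta))]


def mu_first_price(theta):
    """Payments (12.122): each highest bidder pays his bid divided by N(theta)."""
    winners = highest_bidders(theta)
    return [theta[i] / len(winners) if i in winners else 0.0 for i in range(len(theta))]


def mu_second_price(theta):
    """Payments (12.123): each highest bidder pays the highest competing bid divided by N(theta)."""
    winners = highest_bidders(theta)
    payments = []
    for i in range(len(theta)):
        if i in winners:
            others = theta[:i] + theta[i + 1:]  # bids of all buyers except i
            payments.append(max(others) / len(winners))
        else:
            payments.append(0.0)
    return payments


bids = [0.3, 0.7, 0.5]
print(q(bids), mu_first_price(bids), mu_second_price(bids))
```

With a tie for the highest bid, both payment rules coincide, because the highest competing bid then equals the buyer's own bid.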
As the following theorem states, every sealed-bid auction is an example of a selling mechanism. The proof of the theorem is left to the reader (Exercise 12.28).
Theorem 12.42 Every sealed-bid auction with risk-neutral buyers can be presented as a selling mechanism.
⁷ Recall that for every finite set A, the number of elements in A is denoted by |A|.
The game that corresponds to a selling mechanism is a Harsanyi game with incomplete information (see Definition 9.39 on page 347).
• The set of players is the set of buyers $N = \{1, 2, \ldots, n\}$.
• Player i’s set of types is $V_i$. Denote $V_N := \times_{i \in N} V_i$.
• The distribution of the set of type vectors is a product distribution, whose density is $f_V$.
• For each type vector $v \in V_N$, the state of nature $s_v$ is the state game defined by:
◦ player i’s set of actions is $\Theta_i$;
◦ for each vector of actions $\theta \in \Theta$, buyer i’s utility is
$$u_i(v; \theta) = q_i(\theta) v_i - \mu_i(\theta). \tag{12.124}$$
In other words, the buyer pays $\mu_i(\theta)$ in any event, and if he wins the auctioned object, he receives $v_i$, his value of the object.
A pure strategy for player i is a measurable function $\beta_i : V_i \to \Theta_i$. For each strategy vector $\beta_{-i} = (\beta_j)_{j \ne i}$ of the other buyers, denote by $u_i(\theta_i, \beta_{-i}; v_i)$ buyer i’s expected profit, when his private value is $v_i$, and he sends message $\theta_i$:
$$u_i(\theta_i, \beta_{-i}; v_i) = \int_{V_{-i}} u_i(v_i; \beta_1(v_1), \ldots, \beta_{i-1}(v_{i-1}), \theta_i, \beta_{i+1}(v_{i+1}), \ldots, \beta_n(v_n)) f_{-i}(v_{-i})\,dv_{-i}. \tag{12.125}$$
The definition of a Bayesian equilibrium of the game with incomplete information that corresponds to a selling mechanism is as follows (see Definition 9.49 on page 354).
Definition 12.43 A vector $\beta = (\beta_1, \beta_2, \ldots, \beta_n)$ of strategies is a (Bayesian) equilibrium if for each buyer $i \in N$, and every private value $v_i \in V_i$,
$$u_i(\beta_i(v_i), \beta_{-i}; v_i) \ge u_i(\theta_i, \beta_{-i}; v_i), \quad \forall \theta_i \in \Theta_i. \tag{12.126}$$
A simple set of mechanisms is the set of direct selling mechanisms, in which the set of messages of each buyer is his set of private values.
Definition 12.44 A selling mechanism $(\Theta_i, q_i, \mu_i)_{i \in N}$ is called direct if $\Theta_i = V_i$ for each buyer $i \in N$.
A direct selling mechanism is a mechanism in which every buyer is required to report a private value; he may report his true private value or make up any other value to report. We will denote a direct selling mechanism by $(q, \mu)$ for short, where $q = (q_i)_{i \in N}$ and $\mu = (\mu_i)_{i \in N}$. When a mechanism is direct, a possible strategy that a buyer may use is to report his true private value:
$$\beta_i^*(v_i) = v_i, \quad \forall v_i \in V_i. \tag{12.127}$$
We refer to this as the “truth-telling” strategy.
Definition 12.45 A direct selling mechanism $(q, \mu)$ is incentive compatible if the vector $\beta^* = (\beta_i^*)_{i \in N}$ of truth-telling strategies $\beta_i^*(v_i) = v_i$ is an equilibrium.
The reason we use the term “incentive compatible” is because if β ∗ is an equilibrium, then each buyer has an incentive to report his private value truthfully: he cannot profit by lying in reporting his private value. This property is analogous to the nonmanipulability property that we will discuss in Chapter 21, on social choice theory.
The direct selling mechanism depicted in Example 12.41 for a sealed-bid second-price auction is an incentive compatible mechanism, because in a sealed-bid second-price auction, the strategy vector $\beta^*$ in which every buyer’s bid equals his private value is an equilibrium. In contrast, the direct selling mechanism depicted in Example 12.40, corresponding to a sealed-bid first-price auction, is not incentive compatible, because in a sealed-bid first-price auction the strategy vector $\beta^*$ is not an equilibrium. As the next example shows, when the buyers are symmetric, it is nevertheless possible to describe sealed-bid first-price auctions as incentive-compatible direct selling mechanisms.
Example 12.46 Sealed-bid first-price auction: another representation as a selling mechanism
Consider a symmetric sealed-bid first-price auction that satisfies Assumptions (A4)–(A6). Let $\beta = (\beta_i)_{i \in N}$ be an equilibrium of the auction. Consider the following direct selling mechanism:
• $\Theta_i = [0, \infty)$: buyer i’s message is a nonnegative number.
• The probability that buyer i wins the auctioned object is
$$q_i(\theta) = \begin{cases} \frac{1}{N(\theta)} & \text{if } \theta_i = \max_{j \in N} \theta_j, \\ 0 & \text{if } \theta_i < \max_{j \in N} \theta_j. \end{cases} \tag{12.128}$$
• The expected payment that buyer i makes is
$$\mu_i(\theta) = \begin{cases} \beta_i(\theta_i) & \text{if } \theta_i = \max_{j \in N} \theta_j, \\ 0 & \text{if } \theta_i < \max_{j \in N} \theta_j. \end{cases} \tag{12.129}$$
In words, a buyer submitting the highest bid pays the expected sum that he would pay under equilibrium $\beta$ when the private values are $(\theta_i)_{i \in N}$. Since $\beta$ is a symmetric equilibrium strategy, the strategy vector $\beta^*$, under which each buyer reports his private value, is an equilibrium in this selling mechanism, and therefore in particular this selling mechanism is incentive compatible. ◭
12.8.1 The revelation principle
The idea in Example 12.46 can be generalized to any selling mechanism: let $(\Theta_i, \hat{q}_i, \hat{\mu}_i)_{i \in N}$ be a selling mechanism, and let $\hat{\beta}$ be an equilibrium of this mechanism. We can then define a direct selling mechanism $(q, \mu)$, as follows: if the buyers report the private values vector $v = (v_i)_{i \in N}$, the mechanism computes what message $\hat{\beta}_i(v_i)$ the buyer would have sent in the original mechanism, and then proceeds exactly as that mechanism would have done under those messages: for each buyer $i \in N$,
$$q_i(v) := \hat{q}_i(\hat{\beta}_1(v_1), \ldots, \hat{\beta}_n(v_n)), \tag{12.130}$$
$$\mu_i(v) := \hat{\mu}_i(\hat{\beta}_1(v_1), \ldots, \hat{\beta}_n(v_n)). \tag{12.131}$$
The mechanism $(q, \mu)$ is schematically described in Figure 12.4. Since the strategy vector $\hat{\beta}$ is an equilibrium of the mechanism $(\Theta_i, \hat{q}_i, \hat{\mu}_i)_{i \in N}$, the strategy vector $\beta^*$ according to which every buyer reports his true private value is an equilibrium of the mechanism $(q, \mu)$. This leads to the following theorem, which is called the revelation principle.
[Figure 12.4: A selling mechanism, along with an incentive-compatible direct selling mechanism that is equivalent to it. In the diagram, the reports $v_1, v_2, \ldots, v_n$ entering $(q, \mu)$ are mapped by $\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_n$ to messages $\theta_1, \theta_2, \ldots, \theta_n$, which are passed to the mechanism $(\Theta_i, \hat{q}_i, \hat{\mu}_i)_{i \in N}$.]
Theorem 12.47 (Myerson [1979]) Let $(\Theta_i, \hat{q}_i, \hat{\mu}_i)_{i \in N}$ be a selling mechanism, and let $\hat{\beta}$ be an equilibrium of this mechanism. There exists an incentive-compatible direct selling mechanism $(q, \mu)$ satisfying that the outcome of the original mechanism under $\hat{\beta}$ is identical to the outcome of $(q, \mu)$ under $\beta^*$ (which is the truth-telling equilibrium):
$$\hat{q}(\hat{\beta}_1(v_1), \hat{\beta}_2(v_2), \ldots, \hat{\beta}_n(v_n)) = q(v_1, v_2, \ldots, v_n), \quad \forall (v_1, \ldots, v_n) \in V_N, \tag{12.132}$$
$$\hat{\mu}(\hat{\beta}_1(v_1), \hat{\beta}_2(v_2), \ldots, \hat{\beta}_n(v_n)) = \mu(v_1, v_2, \ldots, v_n), \quad \forall (v_1, \ldots, v_n) \in V_N. \tag{12.133}$$
The proof of the theorem is left to the reader as an exercise (Exercise 12.30). The theorem’s importance stems from the fact that incentive-compatible direct selling mechanisms are simple and easy to work with, resulting in simpler mathematical analysis. The space of messages in a generic selling mechanism may be quite large, and the revelation principle simplifies the effort required to analyze selling mechanisms. It implies that it suffices to consider only incentive-compatible direct selling mechanisms, because every general selling mechanism has an incentive-compatible, direct mechanism that is equivalent to it in the sense of Theorem 12.47.
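The construction behind the revelation principle is a composition of functions, which the following Python sketch makes explicit. It takes as the original mechanism a first-price rule, and as the equilibrium the strategy of bidding half of one’s value, which is the equilibrium for two buyers with uniform values mentioned after Equation (12.102). The helper `make_direct` and the other names are hypothetical and serve only as an illustration.

```python
def first_price(theta):
    """Original (indirect) mechanism: first-price allocation and payments, ties split."""
    top = max(theta)
    winners = [i for i, t in enumerate(theta) if t == top]
    q_hat = [1 / len(winners) if i in winners else 0.0 for i in range(len(theta))]
    mu_hat = [theta[i] * q_hat[i] for i in range(len(theta))]  # bid / N(theta) for highest bidders
    return q_hat, mu_hat


def make_direct(mechanism, strategies):
    """Direct mechanism in the spirit of (12.130)-(12.131): map reports to messages, then run the mechanism."""
    def direct(values):
        messages = [beta(v) for beta, v in zip(strategies, values)]
        return mechanism(messages)
    return direct


# Assumed equilibrium of the two-buyer first-price auction with uniform values.
beta_hat = [lambda v: v / 2, lambda v: v / 2]
direct = make_direct(first_price, beta_hat)

q, mu = direct([0.8, 0.4])
print(q, mu)
```

In the direct mechanism the buyers report values rather than bids; the mechanism itself performs the shading that the buyers would have performed in the original auction.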
12.8.2 The Revenue Equivalence Theorem
In this section we prove a general Revenue Equivalence Theorem for selling mechanisms, which includes the Revenue Equivalence Theorem for auctions as a special case (Theorem 12.23 on page 478). To that end, we first introduce some new notation, and prove intermediate results. Consider a direct selling mechanism $(q, \mu)$. When buyer i reports that his private value is $x_i$, and the other buyers report their true private values, the probability that buyer i wins the object is
$$Q_i(x_i) = \int_{V_{-i}} q_i(x_i, v_{-i}) f_{-i}(v_{-i})\,dv_{-i}, \tag{12.134}$$
and the expected payment he makes is
$$M_i(x_i) = \int_{V_{-i}} \mu_i(x_i, v_{-i}) f_{-i}(v_{-i})\,dv_{-i}. \tag{12.135}$$
Because the private values of the buyers are independent, these two quantities are independent of buyer i’s true private value, and depend solely on the message $x_i$ that he reports. Buyer i’s expected profit, when his true private value is $v_i$ and he reports $x_i$, is
$$u_i(x_i, \beta^*_{-i}; v_i) = Q_i(x_i) v_i - M_i(x_i). \tag{12.136}$$
In other words, the buyer’s expected profit is the probability that he wins the auction, times his private value, less the expected payment that he makes. Given this, the following equation is obtained:
$$u_i(x_i, \beta^*_{-i}; v_i) = Q_i(x_i) v_i - M_i(x_i) \tag{12.137}$$
$$= Q_i(x_i) x_i - M_i(x_i) + Q_i(x_i)(v_i - x_i) \tag{12.138}$$
$$= u_i(x_i, \beta^*_{-i}; x_i) + Q_i(x_i)(v_i - x_i). \tag{12.139}$$
Denote buyer i’s expected profit when he reports his true private value by
$$W_i(v_i) = u_i(v_i, \beta^*_{-i}; v_i). \tag{12.140}$$
Inserting $x_i = v_i$ into Equation (12.136), one has
$$W_i(v_i) = Q_i(v_i) v_i - M_i(v_i). \tag{12.141}$$
Theorem 12.48 A direct selling mechanism (q, μ) is incentive compatible if and only if Wi (vi ) ≥ Wi (xi ) + Qi (xi ) (vi − xi ) , ∀i ∈ N, ∀vi ∈ Vi , ∀xi ∈ Vi .
(12.142)
Proof: A direct selling mechanism $(q, \mu)$ is incentive compatible if and only if equilibrium is attained when all buyers report their true private values. This means that for each buyer i, each private value $v_i \in V_i$, and each possible report $x_i \in V_i$,
$$u_i(v_i, \beta^*_{-i}; v_i) \ge u_i(x_i, \beta^*_{-i}; v_i). \tag{12.143}$$
Since $u_i(v_i, \beta^*_{-i}; v_i) = W_i(v_i)$ (Equation (12.140)), Equations (12.137)–(12.139) imply that Equation (12.143) is equivalent to
$$W_i(v_i) \ge u_i(x_i, \beta^*_{-i}; x_i) + Q_i(x_i)(v_i - x_i) = W_i(x_i) + Q_i(x_i)(v_i - x_i). \tag{12.144}$$
In other words, $(q, \mu)$ is incentive compatible if and only if Equation (12.144) holds, which is what we needed to prove.
The following theorem yields an explicit formula for computing a buyer’s expected profit in an incentive-compatible direct selling mechanism.
Theorem 12.49 Let $(q, \mu)$ be an incentive-compatible direct selling mechanism. Then for each $v_i \in V_i$,
$$W_i(v_i) = W_i(0) + \int_0^{v_i} Q_i(t_i)\,dt_i \tag{12.145}$$
$$= -M_i(0) + \int_0^{v_i} Q_i(t_i)\,dt_i. \tag{12.146}$$
We see from Equation (12.146) that buyer i’s expected profit depends on the payment he makes for the object, $M_i$, only through $M_i(0)$, the sum that he pays when the private value that he reports is 0. Inserting Equation (12.141) into Equation (12.146), one gets
$$M_i(v_i) = M_i(0) + Q_i(v_i) v_i - \int_0^{v_i} Q_i(t_i)\,dt_i. \tag{12.147}$$
This equation is an explicit formula for a buyer’s expected payment under the truth-telling equilibrium $\beta^*$ as a function of $M_i(0)$, and of his probability of winning.
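Equation (12.147) can be checked numerically in a simple case. The Python sketch below assumes n buyers with values uniform over $[0, 1]$, an allocation rule that awards the object to the highest value (so that $Q(v) = v^{n-1}$), and $M(0) = 0$; these modelling choices, the midpoint-rule integrator, and the function names are all illustrative. The result is compared with the winning probability times the standard risk-neutral first-price bid $\frac{n-1}{n} v$.

```python
def integrate(func, lo, hi, steps=10_000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(func(lo + (k + 0.5) * width) for k in range(steps)) * width


def expected_payment(Q, v, M0=0.0):
    """Expected payment from (12.147): M(v) = M(0) + Q(v) v - integral of Q over [0, v]."""
    return M0 + Q(v) * v - integrate(Q, 0.0, v)


n = 3


def Q(v):
    """Probability that n - 1 independent uniform values all lie below v."""
    return v ** (n - 1)


for v in (0.25, 0.5, 1.0):
    closed_form = (n - 1) / n * v ** n  # Q(v) times the bid (n - 1) v / n
    print(v, expected_payment(Q, v), closed_form)
```

The two printed columns should agree up to the discretization error of the integrator.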
Proof of Theorem 12.49: Note that if Equation (12.145) is satisfied, then Equation (12.146) also holds, since
$$W_i(0) = u_i(0, \beta^*_{-i}; 0) = Q_i(0) \times 0 - M_i(0) = -M_i(0). \tag{12.148}$$
To prove Equation (12.145), we first prove that the function Qi is monotonically nondecreasing. Since (q, μ) is an incentive-compatible mechanism, Theorem 12.48 implies that Wi (vi ) − Wi (xi ) ≥ Qi (xi ) (vi − xi ) , ∀i ∈ N, ∀vi ∈ Vi , ∀xi ∈ Vi .
(12.149)
In particular: Wi (vi ) − Wi (xi ) ≥ Qi (xi ) (vi − xi ) , ∀vi ≥ xi .
(12.150)
Reversing the roles of xi and vi in Equation (12.149), and re-inserting vi ≥ xi , one gets Wi (xi ) − Wi (vi ) ≥ Qi (vi ) (xi − vi ) , ∀vi ≥ xi .
(12.151)
Multiplying both sides of this inequality by −1 yields: Wi (vi ) − Wi (xi ) ≤ Qi (vi ) (vi − xi ) , ∀vi ≥ xi .
(12.152)
Note the resemblance between the inequalities in Equations (12.150) and (12.152): the only difference is the argument of Qi , and the direction of the inequality sign. Equations (12.150) and (12.152) imply that for each xi and vi , Qi (xi ) (vi − xi ) ≤ Wi (vi ) − Wi (xi ) ≤ Qi (vi ) (vi − xi ) , ∀vi ≥ xi .
(12.153)
For vi > xi , we can divide Equation (12.153) by vi − xi , which yields If vi > xi then Qi (vi ) ≥ Qi (xi ).
(12.154)
That is, the function $Q_i$ is a monotonically nondecreasing function, and is therefore in particular integrable. We next turn to the proof that Equation (12.145) is satisfied. Let $v_i \in V_i$, $v_i > 0$, and consider the integral $\int_0^{v_i} Q_i(t_i)\,dt_i$ as the limit of Riemann sums of the function $Q_i$. Divide
the interval $[0, v_i]$ into L intervals of length $\delta = \frac{v_i}{L}$; denote by $z^k = (k + 1)\delta$ the rightmost (upper) end of the k-th interval, and by $x^k = k\delta$ its leftmost (lower) end. Inserting $v_i = z^k$ and $x_i = x^k$ in Equation (12.153) and summing over $k = 0, 1, \ldots, L - 1$ yields
$$\sum_{k=0}^{L-1} Q_i(x^k)(z^k - x^k) \le \sum_{k=0}^{L-1} (W_i(z^k) - W_i(x^k)) \le \sum_{k=0}^{L-1} Q_i(z^k)(z^k - x^k). \tag{12.155}$$
The middle series is a telescopic series that sums to $W_i(v_i) - W_i(0)$. The left series is a Riemann sum, where the value of the function is taken to be its value at the leftmost end of each interval, and the right series is a Riemann sum, where the value of the function is taken to be its value at the rightmost end of the interval. By increasing L (letting $\delta$ approach 0), both the right series and the left series converge to $\int_0^{v_i} Q_i(t_i)\,dt_i$, which yields
$$\int_0^{v_i} Q_i(t_i)\,dt_i = W_i(v_i) - W_i(0), \tag{12.156}$$
and hence Equation (12.145) is satisfied.
Corollary 12.50 Let $(q, \mu)$ and $(\hat{q}, \hat{\mu})$ be two incentive-compatible direct selling mechanisms defined over the same selling problem $(N, (V_i, f_i)_{i \in N})$ and satisfying:
• $q = \hat{q}$: the rule determining the winner is identical in both mechanisms.
• $\mu_i(0, v_{-i}) = \hat{\mu}_i(0, v_{-i})$: a buyer who reports that his private value is 0 pays the same sum in both mechanisms.
Then at the truth-telling equilibrium $\beta^*$ the expected profit of each buyer is the same in both mechanisms:
$$u_i(v_i, \beta^*_{-i}; v_i) = \hat{u}_i(v_i, \beta^*_{-i}; v_i), \tag{12.157}$$
where $\hat{u}_i(v_i, \beta^*_{-i}; v_i)$ is buyer i’s expected profit under equilibrium $\beta^*$ using the selling mechanism $(\hat{q}, \hat{\mu})$.
Proof: Since both mechanisms apply identical rules for determining the winner, the probability that a buyer with private value vi wins is equal in both mechanisms: the second term in Equation (12.146) is therefore the same in both mechanisms. Since in both cases a buyer who reports 0 pays the same amount, the first term in Equation (12.146) is also the same in both mechanisms. Equation (12.146) therefore implies that the expected profit of each buyer is equal in both mechanisms. Since every auction is, in particular, a selling mechanism, Corollary 12.50 implies the Revenue Equivalence Theorem (Corollary 12.24, page 479) as the reader is asked to prove in Exercise 12.31.
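A small numerical comparison illustrates the revenue equivalence that follows from the corollary. The Python sketch below assumes n buyers with values uniform over $[0, 1]$ and compares the seller’s expected revenue when each buyer’s expected payment is computed from the first-price rule (win with probability $v^{n-1}$ and pay the standard risk-neutral bid $\frac{n-1}{n} v$) and from the second-price rule (pay the highest competing value whenever it is below v). The distributional assumption, the midpoint-rule integrator, and the function names are illustrative and do not come from the text.

```python
def integrate(func, lo, hi, steps=500):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(func(lo + (k + 0.5) * width) for k in range(steps)) * width


def payment_first_price(v, n):
    """Expected payment: win with probability v**(n-1) and pay the bid (n-1) v / n."""
    return v ** (n - 1) * (n - 1) * v / n


def payment_second_price(v, n):
    """Expected payment: the highest competing value t, with density (n-1) t**(n-2), whenever t < v."""
    return integrate(lambda t: t * (n - 1) * t ** (n - 2), 0.0, v)


def seller_revenue(payment, n):
    """Sum over the n buyers of the expected payment, averaged over uniform values."""
    return n * integrate(lambda v: payment(v, n), 0.0, 1.0)


n = 4
print(seller_revenue(payment_first_price, n), seller_revenue(payment_second_price, n))
```

Both mechanisms award the object to the buyer with the highest value and charge nothing to a buyer who reports 0, so the two printed revenues should agree up to the discretization error.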
12.9 Individually rational mechanisms
A direct selling mechanism is called individually rational if at the truth-telling equilibrium β ∗ , the expected profit of each buyer is nonnegative.
Definition 12.51 A direct selling mechanism is called individually rational if $W_i(v_i) \ge 0$ for each buyer $i$ and for every $v_i \in V_i$.

If $W_i(v_i) < 0$, buyer $i$ with private value $v_i$ will not want to participate, because by doing so he is liable to lose. Therefore, assuming that the equilibrium $\beta^*$ is attained, a buyer cannot lose by participating in a sale by way of a direct and individually rational selling mechanism.

Theorem 12.52 An incentive-compatible direct selling mechanism is individually rational if and only if $M_i(0) \le 0$, for each buyer $i \in N$.

Proof: From Equations (12.145)–(12.146):
$$W_i(v_i) = -M_i(0) + \int_0^{v_i} Q_i(t_i)\,dt_i. \tag{12.158}$$
Since the function Qi is nonnegative, the right-hand side of the equation is minimal when vi = 0. That is, Wi (vi ) is minimal at vi = 0. Therefore, the mechanism is individually rational if and only if 0 ≤ Wi (0) = −Mi (0), that is, if and only if Mi (0) ≤ 0, for each buyer i ∈ N, which is what we needed to prove.
12.10 Finding the optimal mechanism
In this section, we will find the incentive-compatible and individually rational mechanism that maximizes the seller’s expected revenue. We will assume the following condition, which is similar to Assumption (A5):

(B) For every buyer $i$, the density function $f_i$ of buyer $i$’s private values is positive over the interval $V_i$.

Remark 12.53 Changing a density function at a finite (or countable) number of points does not affect the distribution. Therefore, if a density function is zero at a finite number of points, it can be changed to satisfy Assumption (B).

Define, for each buyer $i$, a function $c_i : V_i \to \mathbb{R}$ as follows:
$$c_i(v_i) = v_i - \frac{1 - F_i(v_i)}{f_i(v_i)}. \tag{12.159}$$
This function depends only on the distribution of the buyer’s private values. From Assumption (B), the function ci is well defined over the interval Vi .
Example 12.54 The private value is uniformly distributed over [0, 1]
Since $f_i(v_i) = 1$ and $F_i(v_i) = v_i$ for each $v_i \in [0, 1]$:
$$c_i(v_i) = v_i - \frac{1 - v_i}{1} = 2v_i - 1. \tag{12.160}$$
◭
Example 12.55 The distribution of the private values over [0, 1] is given by the cumulative distribution function $F_i(v_i) = v_i(2 - v_i)$
In this case, $f_i(v_i) = 2(1 - v_i)$, and therefore
$$c_i(v_i) = v_i - \frac{1 - v_i(2 - v_i)}{2(1 - v_i)} = \frac{3v_i - 1}{2}. \tag{12.161}$$
◭
We will first prove the following claim:

Theorem 12.56 $E[c_i(V_i)] = 0$ for each buyer $i \in N$.

Proof:
$$E[c_i(V_i)] = \int_{V_i} \left(v_i - \frac{1 - F_i(v_i)}{f_i(v_i)}\right) f_i(v_i)\,dv_i = E[V_i] - \int_{V_i} (1 - F_i(v_i))\,dv_i = 0, \tag{12.162}$$
where the last equality obtains because for every nonnegative random variable $X$,
$$E[X] = \int_0^{\infty} (1 - F_X(x))\,dx. \tag{12.163}$$
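Theorem 12.56 can be checked numerically on the two examples above. The sketch below is only an illustration under stated assumptions: the helper names (`integrate`, `virtual_value`, `expected_virtual_value`) are hypothetical, and the integral is approximated with a simple midpoint rule over $V_i = [0, 1]$.

```python
def integrate(g, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return sum(g(a + (k + 0.5) * h) for k in range(steps)) * h


def virtual_value(v, F, f):
    """c(v) = v - (1 - F(v)) / f(v), as in Equation (12.159)."""
    return v - (1.0 - F(v)) / f(v)


def expected_virtual_value(F, f, a=0.0, b=1.0):
    """E[c(V)], the integral of c(v) f(v) over [a, b]."""
    return integrate(lambda v: virtual_value(v, F, f) * f(v), a, b)


# Example 12.54: uniform distribution over [0, 1].
uniform = (lambda v: v, lambda v: 1.0)
# Example 12.55: F(v) = v (2 - v), f(v) = 2 (1 - v).
example_12_55 = (lambda v: v * (2 - v), lambda v: 2 * (1 - v))

if __name__ == "__main__":
    print(expected_virtual_value(*uniform))
    print(expected_virtual_value(*example_12_55))
```

Both printed values should be zero up to the integration error. The midpoint rule never evaluates the integrand at $v_i = 1$, where the density of Example 12.55 vanishes.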
Using the functions $(c_i)_{i \in N}$, we define a direct selling mechanism $(q^*, \mu^*)$.

Definition 12.57 Define the direct mechanism $(q^*, \mu^*)$ as follows:
$$q_i^*(v) = \begin{cases} 0 & c_i(v_i) \le 0, \\ 0 & c_i(v_i) < \max_{j \in N} c_j(v_j), \\ \dfrac{1}{|\{l : c_l(v_l) = \max_{j \in N} c_j(v_j)\}|} & c_i(v_i) = \max_{j \in N} c_j(v_j) > 0. \end{cases} \tag{12.164}$$
$$\mu_i^*(v) = v_i q_i^*(v) - \int_0^{v_i} q_i^*(t_i, v_{-i})\,dt_i. \tag{12.165}$$
In other words, buyer $i$ wins the object (with positive probability) only if $c_i(v_i)$ is positive and maximal. First, we will show that if the functions $c_i$ are nondecreasing, then the mechanism $(q^*, \mu^*)$ is incentive compatible and individually rational. Then we will show that, under the same condition, $(q^*, \mu^*)$ maximizes the seller’s expected revenue among the incentive-compatible and individually rational direct selling mechanisms.

Theorem 12.58 If for each buyer $i \in N$ the function $c_i$ is nondecreasing, then the direct mechanism $(q^*, \mu^*)$ is incentive compatible and individually rational.

Proof: In the direct mechanism $(q^*, \mu^*)$, when each buyer reports his true private value, for each buyer $i$ whose private value is $v_i$ denote by $Q_i^*(v_i)$ buyer $i$’s probability of winning the object, by $M_i^*(v_i)$ the expected payment that buyer $i$ makes, and by $U_i^*(v_i)$ his expected profit in this case. Also denote by $u_i^*(x_i, \beta^*_{-i}; v_i)$ the expected profit of buyer $i$ with private value $v_i$ if he reports $x_i$ while all other buyers truthfully report their private values.
Step 1: The function $Q_i^*$ is nondecreasing: $Q_i^*(t_i) \ge Q_i^*(x_i)$ for all $t_i \ge x_i$.
Since the function $c_i$ is nondecreasing, the definition of $q^*$ (Equation (12.164)) implies that the greater the buyer’s private value, the greater his probability of winning, i.e., for each buyer $i$, for any $t_i \ge x_i$ and for any $v_{-i} \in V_{-i}$,
$$q_i^*(t_i, v_{-i}) \ge q_i^*(x_i, v_{-i}). \tag{12.166}$$
Integrating over $V_{-i}$ yields
$$\int_{V_{-i}} q_i^*(t_i, v_{-i}) f_{-i}(v_{-i})\,dv_{-i} \ge \int_{V_{-i}} q_i^*(x_i, v_{-i}) f_{-i}(v_{-i})\,dv_{-i}. \tag{12.167}$$
By the definition of $Q_i^*$ (see Equation (12.134)),
$$Q_i^*(t_i) \ge Q_i^*(x_i), \quad \forall i \in N, \ \forall t_i \ge x_i. \tag{12.168}$$

Step 2: $u_i^*(x_i, \beta^*_{-i}; v_i) = Q_i^*(x_i)(v_i - x_i) + \int_0^{x_i} Q_i^*(t_i)\,dt_i$ for each $i \in N$ and every $x_i \in V_i$.
By definition (see Equation (12.136)),
$$u_i^*(x_i, \beta^*_{-i}; v_i) = Q_i^*(x_i) v_i - M_i^*(x_i), \tag{12.169}$$
and
$$\begin{aligned} M_i^*(x_i) &= \int_{V_{-i}} \mu_i^*(x_i, v_{-i}) f_{-i}(v_{-i})\,dv_{-i} \\ &= \int_{V_{-i}} \left( x_i q_i^*(x_i, v_{-i}) - \int_0^{x_i} q_i^*(t_i, v_{-i})\,dt_i \right) f_{-i}(v_{-i})\,dv_{-i} \\ &= x_i Q_i^*(x_i) - \int_0^{x_i} Q_i^*(t_i)\,dt_i, \end{aligned} \tag{12.170}$$
where the first equality follows from Equation (12.135), the second equality follows from the definition of $\mu^*$ (Equation (12.165)), and the third equality follows from the definition of $Q_i^*$ and from changing the order of integration. Inserting this equation into Equation (12.169), one has
$$u_i^*(x_i, \beta^*_{-i}; v_i) = Q_i^*(x_i)(v_i - x_i) + \int_0^{x_i} Q_i^*(t_i)\,dt_i, \tag{12.171}$$
as claimed. This equation obtains for every $x_i \in V_i$, and in particular, for $x_i = v_i$, it becomes $u_i^*(v_i, \beta^*_{-i}; v_i) = \int_0^{v_i} Q_i^*(t_i)\,dt_i$.
Step 3: The mechanism $(q^*, \mu^*)$ is an incentive-compatible direct selling mechanism.
The mechanism is incentive compatible if and only if reporting the truth is an equilibrium:
$$u_i^*(v_i, \beta^*_{-i}; v_i) \ge u_i^*(x_i, \beta^*_{-i}; v_i), \quad \forall v_i, x_i \in V_i. \tag{12.172}$$
By substituting the expression for $u_i^*(x_i, \beta^*_{-i}; v_i)$ obtained in Equation (12.171) into both the left-hand side and the right-hand side of Inequality (12.172) we deduce that the mechanism is incentive compatible if and only if
$$Q_i^*(x_i)(v_i - x_i) + \int_0^{x_i} Q_i^*(t_i)\,dt_i \le \int_0^{v_i} Q_i^*(t_i)\,dt_i, \quad \forall v_i, x_i \in V_i. \tag{12.173}$$
The last inequality holds if and only if
$$Q_i^*(x_i)(v_i - x_i) \le \int_{x_i}^{v_i} Q_i^*(t_i)\,dt_i, \quad \forall v_i, x_i \in V_i. \tag{12.174}$$
By Step 1 the function $Q_i^*$ is nondecreasing. Thus, if $x_i \le v_i$ then $Q_i^*(x_i) \le Q_i^*(t)$ for all $t \in [x_i, v_i]$, and therefore
$$Q_i^*(x_i)(v_i - x_i) = \int_{x_i}^{v_i} Q_i^*(x_i)\,dt_i \le \int_{x_i}^{v_i} Q_i^*(t_i)\,dt_i \tag{12.175}$$
and Equation (12.174) holds. If $x_i > v_i$,
$$Q_i^*(x_i)(x_i - v_i) = \int_{v_i}^{x_i} Q_i^*(x_i)\,dt_i \ge \int_{v_i}^{x_i} Q_i^*(t_i)\,dt_i, \tag{12.176}$$
and therefore
$$Q_i^*(x_i)(v_i - x_i) = -Q_i^*(x_i)(x_i - v_i) \le -\int_{v_i}^{x_i} Q_i^*(t_i)\,dt_i = \int_{x_i}^{v_i} Q_i^*(t_i)\,dt_i,$$
and Equation (12.174) also holds.

Step 4: The mechanism $(q^*, \mu^*)$ is individually rational.
Since the mechanism is direct and incentive compatible, by Theorem 12.52 it suffices to show that $M_i^*(0) \le 0$ for every buyer $i \in N$. By Equation (12.165), for every $v_{-i} \in V_{-i}$,
$$\mu_i^*(0, v_{-i}) = 0 \cdot q_i^*(0, v_{-i}) - \int_0^0 q_i^*(t_i, v_{-i})\,dt_i = 0. \tag{12.177}$$
It follows that
$$M_i^*(0) = \int_{V_{-i}} \mu_i^*(0, v_{-i}) f_{-i}(v_{-i})\,dv_{-i} = 0, \tag{12.178}$$
and therefore in particular $M_i^*(0) \le 0$.
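Step 3 can be checked numerically in a simple special case. The sketch below assumes $n$ buyers whose private values are independent and uniformly distributed over $[0, 1]$, so that $c_i(v_i) = 2v_i - 1$ and a report $x$ wins exactly when $x > \frac12$ and $x$ exceeds all rival values, giving $Q^*(x) = x^{n-1}$ for $x > \frac12$ and $Q^*(x) = 0$ otherwise. The function names and the grid search are illustrative assumptions, not part of the proof.

```python
def Q_star(x, n, reserve=0.5):
    """Winning probability of report x against n - 1 truthful uniform[0, 1] rivals."""
    return x ** (n - 1) if x > reserve else 0.0


def integral_Q_star(x, n, reserve=0.5):
    """Closed form of the integral of Q_star over [0, x]."""
    if x <= reserve:
        return 0.0
    return (x ** n - reserve ** n) / n


def profit(x, v, n):
    """u*(x; v) = Q*(x) (v - x) + integral of Q* over [0, x], Equation (12.171)."""
    return Q_star(x, n) * (v - x) + integral_Q_star(x, n)


def best_report_gain(v, n, grid=1000):
    """Largest gain over truth-telling found among the reports k / grid."""
    truthful = profit(v, v, n)
    return max(profit(k / grid, v, n) for k in range(grid + 1)) - truthful


if __name__ == "__main__":
    for n in (2, 3, 4):
        for v in (0.1, 0.5, 0.62, 0.9):
            print(n, v, best_report_gain(v, n))
```

Every printed gain should be zero up to rounding: no report on the grid does better than reporting the true value, as Inequality (12.172) requires.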
The next theorem shows that if the functions $(c_i)_{i \in N}$ are monotonically nondecreasing, then the mechanism $(q^*, \mu^*)$ is optimal from the seller’s perspective.

Theorem 12.59 Consider the selling problem $(N, (V_i, f_i)_{i \in N})$ and suppose that the functions $(c_i)_{i \in N}$ defined by Equation (12.159) are monotonically nondecreasing. Then the mechanism $(q^*, \mu^*)$ defined by Equations (12.164)–(12.165) maximizes the seller’s expected revenue within all incentive-compatible, individually rational direct selling mechanisms.

Proof: The seller’s revenue is the sum of the payments made by the buyers. His expected revenue, which we will denote by $\pi$, is therefore
$$\pi = \sum_{i \in N} E[M_i(V_i)]. \tag{12.179}$$
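From⁸ Equation (12.147),
$$\begin{aligned} E[M_i(V_i)] &= \int_0^{\bar v_i} M_i(v_i) f_i(v_i)\,dv_i \\ &= M_i(0) + \int_0^{\bar v_i} Q_i(v_i) v_i f_i(v_i)\,dv_i - \int_0^{\bar v_i} \left( \int_0^{v_i} Q_i(t_i)\,dt_i \right) f_i(v_i)\,dv_i. \end{aligned} \tag{12.180}$$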
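Changing the order of integration in the last term, and using the fact that $F_i(\bar v_i) = 1$, yields:
$$\begin{aligned} \int_0^{\bar v_i} \left( \int_0^{v_i} Q_i(t_i)\,dt_i \right) f_i(v_i)\,dv_i &= \int_0^{\bar v_i} \left( \int_{t_i}^{\bar v_i} Q_i(t_i) f_i(v_i)\,dv_i \right) dt_i \\ &= \int_0^{\bar v_i} Q_i(t_i)(1 - F_i(t_i))\,dt_i. \end{aligned} \tag{12.181}$$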
It follows that
$$E[M_i(V_i)] = M_i(0) + \int_0^{\bar v_i} Q_i(v_i) v_i f_i(v_i)\,dv_i - \int_0^{\bar v_i} Q_i(t_i)(1 - F_i(t_i))\,dt_i \tag{12.182}$$
$$= M_i(0) + \int_0^{\bar v_i} Q_i(v_i) f_i(v_i) \left( v_i - \frac{1 - F_i(v_i)}{f_i(v_i)} \right) dv_i \tag{12.183}$$
$$= M_i(0) + \int_0^{\bar v_i} Q_i(v_i) c_i(v_i) f_i(v_i)\,dv_i \tag{12.184}$$
$$= M_i(0) + \int_{V_N} q_i(v) c_i(v_i) f_V(v)\,dv. \tag{12.185}$$
Equation (12.184) follows from the definition of $c_i$, and Equation (12.185) holds by Equation (12.134) (page 497) together with the fact that the private values are independent. By summing over $i \in N$, we deduce that the seller should maximize the quantity
$$\pi = \sum_{i \in N} M_i(0) + \int_{V_N} \left( \sum_{i \in N} q_i(v) c_i(v_i) \right) f_V(v)\,dv. \tag{12.186}$$
The first term depends only on $(M_i(0))_{i \in N}$, i.e., only on $(\mu_i)_{i \in N}$, and the second term depends only on $(q_i)_{i \in N}$. To maximize $\pi$ it therefore suffices to maximize each term separately. Start with the second term. For each $v \in V$, consider $\sum_{i \in N} q_i(v) c_i(v_i)$. What are the coefficients $(q_i(v))_{i \in N}$ that maximize this sum?
• If $c_i(v_i) < 0$ for every $i \in N$, the sum is maximized when $q_i(v) = 0$ for each $i \in N$.
• If $\max_{i \in N} c_i(v_i) \ge 0$, the maximum of the sum (under the constraint $\sum_{i \in N} q_i(v) \le 1$) is $\max_{i \in N} c_i(v_i)$: give positive weights (summing to 1) only to those buyers for whom $c_i(v_i)$ is maximal.
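The two cases above amount to a pointwise rule that can be sketched in a few lines. The code below is an illustration only; the function names `optimal_allocation` and `virtual_surplus` are hypothetical. Given the list of values $c_i(v_i)$ at a fixed profile $v$, it returns the winning probabilities of Equation (12.164) and lets one compare the resulting sum $\sum_{i \in N} q_i(v) c_i(v_i)$ with that of any other feasible weights.

```python
def optimal_allocation(c_values):
    """Winning probabilities q*(v) of Equation (12.164), given the list of c_i(v_i)."""
    best = max(c_values)
    if best <= 0:
        # No buyer has a positive c_i(v_i): the object is not sold.
        return [0.0] * len(c_values)
    winners = [i for i, c in enumerate(c_values) if c == best]
    share = 1.0 / len(winners)
    # Buyers with maximal c_i(v_i) share the object with equal probability.
    return [share if i in winners else 0.0 for i in range(len(c_values))]


def virtual_surplus(q, c_values):
    """The sum of q_i(v) * c_i(v_i) over all buyers."""
    return sum(qi * ci for qi, ci in zip(q, c_values))


if __name__ == "__main__":
    c = [0.4, 0.4, 0.1]
    q_star = optimal_allocation(c)
    print(q_star, virtual_surplus(q_star, c))
    print(virtual_surplus([0.3, 0.3, 0.4], c))  # another feasible choice, smaller sum
```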
⁸ When $V_i$ is not bounded we denote $\bar v_i = \infty$ and $F_i(\bar v_i) := \lim_{v_i \to \infty} F_i(v_i)$.
Since $(q_i^*)_{i \in N}$ have been defined to satisfy these conditions (see Equation (12.164)), it follows that the second term in Equation (12.186) is maximal for $q = q^*$. We next turn to the first term in Equation (12.186). By Theorem 12.52, $M_i(0) \le 0$ for every individually rational direct selling mechanism. Therefore, the first term is not greater than 0. As we proved in Step 4 above (Equation (12.178)), $M_i^*(0) = 0$ for all $i \in N$, and therefore $\mu^*$ maximizes the first term in Equation (12.186). It follows that there is no incentive-compatible, individually rational direct selling mechanism that yields the seller an expected revenue greater than the expected revenue yielded by $(q^*, \mu^*)$.

Example 12.60 Private values distributed over different intervals
Suppose that there are two buyers, where buyer 1’s private value is uniformly distributed over [0, 1], and buyer 2’s private value is uniformly distributed over [0, 2]. As we saw in Example 12.54, $c_1(v_1) = 2v_1 - 1$, and therefore $c_1(v_1) < 0$ if and only if $v_1 < \frac12$. For buyer 2, $f_2(v_2) = \frac12$ and $F_2(v_2) = \frac{v_2}{2}$, and therefore $c_2(v_2) = v_2 - \frac{1 - v_2/2}{1/2} = 2v_2 - 2$. It follows that $c_2(v_2) < 0$ if and only if $v_2 < 1$. Since
$$c_1(v_1) > c_2(v_2) \iff 2v_1 - 1 > 2v_2 - 2 \iff v_1 > v_2 - \tfrac12, \tag{12.187}$$
the optimal allocation rule $q^*$ is defined as follows:
• If $v_1 < \frac12$ and $v_2 < 1$, the object is not sold ($q_i^*(v) = 0$ for $i = 1, 2$).
• If $v_1 \ge \frac12$ and $v_1 > v_2 - \frac12$, buyer 1 wins the object ($q_1^*(v) = 1$ and $q_2^*(v) = 0$).
• If $v_2 > 1$ and $v_2 > v_1 + \frac12$, buyer 2 wins the object ($q_2^*(v) = 1$ and $q_1^*(v) = 0$).
• If $v_1 \ge \frac12$, $v_2 \ge 1$ and $v_1 = v_2 - \frac12$, each buyer wins the object with probability $\frac12$ ($q_1^*(v) = q_2^*(v) = \frac12$).
[Figure 12.5 The function $q^*$ in Example 12.60. Horizontal axis: $v_1$, from 0 to 1; vertical axis: $v_2$, from 0 to 2. Regions shown: object not sold; object sold to buyer 1; object sold to buyer 2; region of inefficiency.]
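To compute the payment function we first compute
$$\int_0^{v_1} q_1^*(t_1, v_2)\,dt_1 = \begin{cases} 0 & v_1 < \frac12, \\ 0 & v_1 < v_2 - \frac12, \\ v_1 - \max\left\{\frac12, v_2 - \frac12\right\} & v_1 \ge \frac12 \text{ and } v_1 \ge v_2 - \frac12. \end{cases} \tag{12.188}$$
$$\int_0^{v_2} q_2^*(v_1, t_2)\,dt_2 = \begin{cases} 0 & v_2 < 1, \\ 0 & v_2 < v_1 + \frac12, \\ v_2 - \max\left\{1, v_1 + \frac12\right\} & v_2 \ge 1 \text{ and } v_2 \ge v_1 + \frac12. \end{cases} \tag{12.189}$$
Equation (12.165) yields
$$\mu_1^*(v_1, v_2) = \begin{cases} 0 & \text{Buyer 1 does not win,} \\ \max\left\{\frac12, v_2 - \frac12\right\} & \text{Buyer 1 wins.} \end{cases} \tag{12.190}$$
$$\mu_2^*(v_1, v_2) = \begin{cases} 0 & \text{Buyer 2 does not win,} \\ \max\left\{1, v_1 + \frac12\right\} & \text{Buyer 2 wins.} \end{cases} \tag{12.191}$$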
Note that in this case, each buyer has a different minimum price: $\frac12$ for buyer 1 and 1 for buyer 2. Note also that when $\frac12 < v_1 < v_2 < v_1 + \frac12$ (the shaded area in Figure 12.5), buyer 1 wins the object, despite the fact that his private value is less than buyer 2’s private value. In other words, the selling mechanism that maximizes the seller’s expected revenue is not efficient: the winner is not necessarily the buyer with the highest private value. ◭
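The mechanism of Example 12.60 is small enough to simulate. The following sketch is an illustration under stated assumptions: the function names are hypothetical, ties (which have probability zero) are given to buyer 2 rather than split by lottery, and the expectations are estimated by Monte Carlo sampling with a fixed seed. It returns the winner and payment at a profile $(v_1, v_2)$ and compares the average payment with the average of $\max\{0, c_1(v_1), c_2(v_2)\}$, which Equation (12.186) predicts to be equal in expectation.

```python
import random


def mechanism_12_60(v1, v2):
    """Return (winner, payment) for Example 12.60; winner is 1, 2, or None."""
    c1 = 2 * v1 - 1  # c_1 for buyer 1, uniform over [0, 1]
    c2 = 2 * v2 - 2  # c_2 for buyer 2, uniform over [0, 2]
    if max(c1, c2) <= 0:
        return None, 0.0  # the object is not sold
    if c1 > c2:
        return 1, max(0.5, v2 - 0.5)  # Equation (12.190)
    # Simplification: a tie c1 == c2 is awarded to buyer 2.
    return 2, max(1.0, v1 + 0.5)  # Equation (12.191)


def simulate(samples=200_000, seed=0):
    """Monte Carlo estimates of (expected revenue, expected max(0, c1, c2))."""
    rng = random.Random(seed)
    revenue = surplus = 0.0
    for _ in range(samples):
        v1, v2 = rng.uniform(0, 1), rng.uniform(0, 2)
        revenue += mechanism_12_60(v1, v2)[1]
        surplus += max(0.0, 2 * v1 - 1, 2 * v2 - 2)
    return revenue / samples, surplus / samples


if __name__ == "__main__":
    print(mechanism_12_60(0.75, 1.0))  # buyer 1 wins although v1 < v2
    print(simulate())
```

Integrating the payments (12.190) and (12.191) directly gives an expected revenue of $\frac{1}{6} + \frac{23}{48} = \frac{31}{48} \approx 0.646$, so both estimates should be close to that value.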
When private values are independent and identically distributed, Theorem 12.59 leads to the following corollary.

Corollary 12.61 If the private values of the buyers are independent and identically distributed, and if the functions $(c_i)_{i \in N}$ are monotonically nondecreasing, the incentive-compatible direct selling mechanism that maximizes the seller’s expected revenue is a sealed-bid second-price auction with a reserve price.

Proof: Since the private values of the buyers are identically distributed, $c_i = c_j =: c$ for every pair of buyers $i, j$. The function $c$ is monotonically nondecreasing, and therefore a buyer for whom $c(v_i)$ is maximal is one whose private value $v_i$ is maximal. Denote
$$\rho^* = \inf\{t_i \in V_i : c_i(t_i) > 0\}, \tag{12.192}$$
which is independent of $i$, since $V_i = V_j$ for all $i, j \in N$. To simplify the analysis, suppose that $c_i$ is a continuous function, and therefore $c_i(\rho^*) = 0$. By the definition of $q^*$ (Equation (12.164)), the buyer who wins the object is the one who submits the highest bid, as long as that bid is greater than $\rho^*$:
$$q_i^*(v) = \begin{cases} 0 & \text{if } v_i \le \rho^* \text{ or } v_i < \max_{j \in N} v_j, \\ \dfrac{1}{|\{l : v_l = \max_{j \in N} v_j\}|} & \text{if } v_i > \rho^* \text{ and } v_i = \max_{j \in N} v_j. \end{cases} \tag{12.193}$$
We next calculate $\mu_i^*(v) = v_i q_i^*(v) - \int_0^{v_i} q_i^*(t_i, v_{-i})\,dt_i$ (see Equation (12.165) on page 502):
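• If $v_i \le \rho^*$ or $v_i < \max_{j \ne i} v_j$, then $q_i^*(v) = 0$, and $q_i^*(t_i, v_{-i}) = 0$ for each $t_i \in [0, v_i)$, and hence in this case
$$\mu_i^*(v) = v_i q_i^*(v) - \int_0^{v_i} q_i^*(t_i, v_{-i})\,dt_i = 0. \tag{12.194}$$
In other words, a buyer whose bid is lower than the highest bid or less than or equal to $\rho^*$ pays nothing.
• If $v_i > \rho^*$ and $v_i \ge \max_{j \ne i} v_j$, then $q_i^*(v) = \frac{1}{|\{l : v_l = \max_{j \in N} v_j\}|}$. Moreover, $q_i^*(t_i, v_{-i}) = 0$ for $t_i < \max\{\rho^*, \max_{j \ne i} v_j\}$, and $q_i^*(t_i, v_{-i}) = 1$ for $\max\{\rho^*, \max_{j \ne i} v_j\} < t_i < v_i$ (an interval that is empty when the highest bid is tied). Hence, in this case,
$$\mu_i^*(v) = v_i q_i^*(v) - \int_0^{v_i} q_i^*(t_i, v_{-i})\,dt_i = \frac{\max\{\rho^*, \max_{j \ne i} v_j\}}{|\{l : v_l = \max_{j \in N} v_j\}|}. \tag{12.195}$$
In words, a buyer who alone submits the highest bid, provided it is greater than $\rho^*$, pays the larger of $\rho^*$ and the second-highest bid, and buyers who tie for the highest bid share the payment to the seller equally.

In summary,
$$\mu_i^*(v) = v_i q_i^*(v) - \int_0^{v_i} q_i^*(t_i, v_{-i})\,dt_i = \begin{cases} 0 & v_i \le \rho^* \text{ or } v_i < \max_{j \in N} v_j, \\ \dfrac{\max\{\rho^*, \max_{j \ne i} v_j\}}{|\{l : v_l = \max_{j \in N} v_j\}|} & v_i > \rho^* \text{ and } v_i = \max_{j \in N} v_j. \end{cases} \tag{12.196}$$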
In other words, (q ∗ , μ∗ ) is a sealed-bid second-price auction with a reserve price ρ ∗ .
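As an illustration of Corollary 12.61, the sketch below implements a sealed-bid second-price auction with a reserve price and estimates the seller's expected revenue by simulation. It assumes truthful bids and private values that are independent and uniformly distributed over $[0, 1]$, in which case $c(v) = 2v - 1$ and $\rho^* = \frac12$. The function names are hypothetical, and ties (which have probability zero) are broken by index rather than by lottery.

```python
import random


def second_price_with_reserve(bids, reserve):
    """Return (winner_index, payment); winner_index is None if no bid exceeds the reserve."""
    top = max(bids)
    if top <= reserve:
        return None, 0.0
    winner = bids.index(top)  # simplification: ties broken by index, not by lottery
    others = [b for i, b in enumerate(bids) if i != winner]
    # The winner pays the larger of the reserve price and the highest rival bid.
    return winner, max([reserve] + others)


def expected_revenue(n, reserve, samples=200_000, seed=0):
    """Monte Carlo estimate of the seller's expected revenue with truthful bids."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        values = [rng.random() for _ in range(n)]
        total += second_price_with_reserve(values, reserve)[1]
    return total / samples


if __name__ == "__main__":
    print(expected_revenue(2, 0.5))  # reserve price rho* = 1/2
    print(expected_revenue(2, 0.0))  # no reserve price
```

For two buyers the exact expected revenues are $\frac{5}{12}$ with the reserve price $\frac12$ and $\frac13$ without a reserve price, so the first estimate should be the larger one.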
12.11 Remarks
The first use of game theory to study auctions was by the economist William Vickrey [1961, 1962]. Vickrey, 1914–96, was awarded the Nobel Memorial Prize in Economics in 1996 for his contributions to the study of incentives when buyers have different information, and the implications incentives have on auction theory.

The results in Section 12.7 are based on Holt [1980]. In this chapter we studied symmetric sealed-bid auctions with independent private values. The theory of asymmetric sealed-bid auctions and sealed-bid auctions in which the private values of the buyers are not independent is mathematically complex. The interested reader is directed to Milgrom and Weber [1982], Lebrun [1999], Maskin and Riley [2000], Fibich, Gavious, and Sela [2004], Reny and Zamir [2004], and Kaplan and Zamir [2011, 2012].

The significance of mechanism design in economic theory was recognized by the Nobel Prize Committee in 2007, when it awarded the prize to three researchers who were instrumental in developing mechanism design, Leonid Hurwicz, Eric Maskin, and Roger Myerson. The revelation principle was proved by Myerson [1979], and the structure of the optimal mechanism was proved by Myerson [1981]. The reader interested in further deepening his understanding of auction theory and mechanism design is directed to Krishna [2002], Milgrom [2004], or Klemperer [2004].
Exercise 12.1 is based on Wolfstetter [1996]. Exercises 12.19, 12.45 and 12.46 are based on examples that appear in Krishna [2002]. Exercises 12.20 and 12.21 are based on Kaplan and Zamir [2012]. The authors wish to thank Vijay Krishna for answering questions during the composition of this chapter. Many thanks are due to the students in the Topics in Game Theory course that was conducted at Tel Aviv University in 2005, for their many comments on this chapter, with special thanks going to Ronen Eldan and Ayala Mashiah-Yaakovi.
12.12 Exercises
In each of the exercises in this chapter, assume that buyers are risk-neutral, unless it is explicitly noted otherwise.

12.1 Selling an object at the monopoly price Andrew is interested in selling a rare car (whose value in his eyes we will normalize to 0). Assume there are $n$ buyers and that buyer $i$’s private value of the car, $V_i$, is uniformly distributed over [0, 1]. The private values of the buyers are independent. Instead of conducting an auction, Andrew intends to set a price for the car, publicize this price, and sell the car only to buyers who are willing to pay this price; if no buyer is willing to pay the price, the car will not be sold, and if more than one buyer is willing to pay the price, the car will be sold to one of them based on a fair lottery that gives each of them equal probability of winning. Answer the following questions:
(a) Find Andrew’s expected revenue as a function of the price $x$ that he sets.
(b) Find the price $x^*$ that maximizes Andrew’s expected revenue.
(c) What is the maximal expected revenue that Andrew can obtain, as a function of $n$?
(d) Compare Andrew’s maximal revenue with the revenue he would gain if he sells the car by way of a sealed-bid first-price auction. For which values of $n$ does a sealed-bid first-price auction yield a higher revenue?

12.2 (a) Explain what a buyer in an open-bid decreasing auction knows when the current announced price is $x$ that he did not know prior to the auction.
(b) Explain what a buyer in an open-bid increasing auction knows when the current announced price is $x$ that he did not know prior to the auction.

12.3 Prove that in a symmetric sealed-bid second-price auction with independent private values the only monotonically increasing, symmetric equilibrium is the equilibrium in which every buyer submits a bid equal to his private value.

12.4 Suppose that $V = [0, \bar v]$ is a bounded interval. Show that in a symmetric sealed-bid second-price auction with independent private values the strategy vector under which buyer 1 bids $\bar v$ and all the other buyers bid 0 is an (asymmetric) equilibrium. Is it also an equilibrium in a sealed-bid first-price auction? Justify your answer. Is there an equilibrium in an open-bid ascending auction that is analogous to this equilibrium?

12.5 Consider a sealed-bid second-price auction where the private values of the buyers are independent and identically distributed with the uniform distribution over [0, 1]. Show that the strategy under which a buyer bids his private value does not strictly dominate all his other strategies.

12.6 Brent and Stuart are the only buyers in a sealed-bid first-price auction of medical devices. Brent knows that Stuart’s private value is uniformly distributed over [0, 2], and that Stuart’s strategy is $\beta(v) = \frac{v}{3} + \frac{v^2}{3}$.
(a) What is Brent’s optimal strategy?
(b) What is Brent’s expected payment if he implements this optimal strategy (as a function of his own private value)?

12.7 (a) Suppose that the private values of two buyers in a sealed-bid first-price auction are independent and uniformly distributed over the set {0, 1, 2}. In other words, each buyer has three possible private values. The bids in the auction must be nonnegative integers. Find all the equilibria.
(b) Find all the equilibria, under the same assumptions, when the auction is a sealed-bid second-price auction.
(c) Compare the seller’s expected revenue under both auction methods. What have you discovered?

12.8 Denote by $E^{I}$ the seller’s revenue in a sealed-bid first-price auction, and by $E^{II}$ the seller’s revenue in a sealed-bid second-price auction. Find the variance of $E^{I}$ and of $E^{II}$ when there are $n$ buyers, whose private values are independent and uniformly distributed over [0, 1]. Which of the two is the lesser?

12.9 Consider a sealed-bid second-price auction with $n$ buyers whose private values are independent and uniformly distributed over [0, 1]. Find an asymmetric equilibrium at which every buyer wins with probability $\frac{1}{n}$. Can you find an equilibrium at which every buyer $i$ wins with probability $\alpha_i$, for any collection of nonnegative numbers $\alpha_1, \alpha_2, \ldots, \alpha_n$ whose sum is 1?

12.10 Consider a sealed-bid first-price auction with three buyers, where the private values of the buyers are independent and uniformly distributed over [0, 2].
(a) Find a symmetric equilibrium of the game.
(b) What is the seller’s expected revenue?

12.11 Prove that in a symmetric sealed-bid auction with independent private values the random variable $Y = \max\{V_2, V_3, \ldots, V_n\}$ is a continuous random variable and its density function $f_Y$ is a positive function.

12.12 Which of the following auction methods satisfy the conditions of the Revenue Equivalence Theorem (Corollary 12.24 on page 479): sealed-bid first-price auctions (see Example 12.15 on page 472), sealed-bid second-price auctions, sealed-bid first-price auctions with a reserve price (see Example 12.34 on page 486), sealed-bid second-price auctions with a reserve price, sealed-bid second-price auctions with entry fees (see Example 12.28 on page 482). When only some of the conditions are satisfied, specify which conditions are not satisfied, and justify your answer.

12.13 Compute the seller’s expected revenue in a sealed-bid second-price auction with a reserve price $\rho$, in which there are two buyers whose private values are independent and uniformly distributed over [0, 1]. Compare the results here with the results we computed for sealed-bid first-price auctions with a reserve price $\rho$ (Equation (12.104) on page 488).

12.14 (a) Compute the seller’s expected revenue in a sealed-bid first-price auction with a reserve price $\rho$, with $n$ buyers whose private values are independent and uniformly distributed over [0, 1].
(b) What is the reserve price $\rho^*$ that maximizes the seller’s expected revenue?
(c) Repeat items (a) and (b) for a sealed-bid second-price auction with a reserve price $\rho$.
(d) Did you obtain the same results in both cases? Explain why.
(e) Compare the expected revenue computed here with the expected revenue of a seller who is selling the object by setting the monopoly price (see Exercise 12.1). Which expected revenue is higher?
(f) What does the optimal reserve price in items (b) and (c) above converge to when the number of buyers increases to infinity?

12.15 Consider a symmetric sealed-bid first-price auction with independent private values with $n$ buyers whose cumulative distribution function $F$ is given by $F(v) = v^2$. Find the symmetric equilibrium, compute $e_i(v_i)$ (the expected payment of buyer $i$ if his private value is $v_i$), compute $e_i$ (the payment that buyer $i$ makes), and compute $\pi$ (the seller’s expected revenue).

12.16 Repeat Exercise 12.15 when the cumulative distribution function of each buyer $i$’s private value is $F_i(v) = v^3$.

12.17 Consider a sealed-bid second-price auction with entry fee $\lambda$ and $n$ buyers, whose private values are independent and uniformly distributed over [0, 1].
(a) Find a symmetric equilibrium.
(b) What is the seller’s expected revenue?
(c) Which entry fee maximizes the seller’s expected revenue?
(d) What value does the optimal entry fee approach as the number of buyers increases to $\infty$?

12.18 Repeat Exercise 12.17 when the auction method conducted is a sealed-bid first-price auction with entry fee $\lambda$.

12.19 In this exercise, using Theorem 12.23 (page 478), compute a symmetric equilibrium $\beta$ in a sealed-bid third-price auction, with $n$ buyers whose private values are independent; each $V_i$ has uniform distribution over [0, 1]. The winner of this auction is the buyer submitting the highest bid, and he pays the third-highest bid.⁹ If several buyers have submitted the highest bid, the winner is chosen from among them by a fair lottery granting each equal probability of winning. Denote the highest bid from among $V_2, V_3, \ldots, V_n$ by $Y$, and the second-highest bid from among $V_2, V_3, \ldots, V_n$ by $W$. Denote by $F_i$ the cumulative distribution function of $V_i$, and by $f_i$ its density function.
(a) Prove that for every $v_1 \in (0, 1]$, the conditional cumulative distribution function of $W$, given $Y \le v_1$, is
$$F_{(W \mid Y \le v_1)}(w) = (F_i(w))^{n-2} \times \frac{(n-1)F_i(v_1) - (n-2)F_i(w)}{(F_i(v_1))^{n-1}}, \quad \forall w \in [0, v_1]. \tag{12.197}$$
(b) Compute the conditional density function $f_{(W \mid Y \le v_1)}$.
(c) Denote
$$h(y) = (n-2)(F_1(y))^{n-3} f_1(y), \quad \forall y \in [0, 1]. \tag{12.198}$$
The Revenue Equivalence Theorem implies that the expected payment of buyer 1 with private value $v_1 \in (0, 1]$ is given by $F_Y(v_1) E[\beta(W) \mid Y \le v_1]$. Conclude from this that
$$\int_0^{v_1} \beta(y)(n-1)h(y)(F_1(v_1) - F_1(y))\,dy = \int_0^{v_1} y f_Y(y)\,dy, \quad \forall v_1 \in (0, 1]. \tag{12.199}$$
(d) Differentiate Equation (12.199) by $v_1$, and show that
$$\int_0^{v_1} \beta(y)h(y)\,dy = v_1 (F_1(v_1))^{n-2}, \quad \forall v_1 \in (0, 1). \tag{12.200}$$
(e) Differentiate Equation (12.200) by $v_1$, and show that the solution to this equation is
$$\beta(v_1) = v_1 + \frac{F_1(v_1)}{(n-2) f_1(v_1)}, \quad \forall v_1 \in (0, 1). \tag{12.201}$$
(f) Under what conditions is $\beta$ a symmetric equilibrium?

⁹ If only two buyers have submitted the highest bid, the next-highest bid is the sum of money that the winner pays for the auctioned object. If three buyers have submitted the highest bid, that bid is the amount of money the winner pays for the auctioned object.

12.20 Consider a sealed-bid first-price auction with two buyers whose private values are independent; the private value of buyer 1 has uniform distribution over the interval [0, 3], and the private value of buyer 2 has uniform distribution over the interval [3, 4]. Answer the following questions:
(a) Prove that the following pair of strategies form an equilibrium:
$$\beta_1(v_1) = 1 + \frac{v_1}{2}, \tag{12.202}$$
$$\beta_2(v_2) = \frac12 + \frac{v_2}{2}. \tag{12.203}$$
513
12.12 Exercises
(b) Is the probability that buyer 2 wins the auction equal to 1? (c) Compute the seller’s expected revenue if the buyers implement the strategies (β1 , β2 ). (d) Compute the seller’s expected revenue in a sealed-bid second-price auction. Is that the same expected revenue as in part (c) above? 12.21 Consider a sealed-bid second-price auction with two buyers whose private values are independent; the private value of buyer 1 has uniform distribution over the interval [0, m + z], and the private value of buyer 2 has uniform distribution over , 3m + z], where m, z > 0. Show that the following pair of strategies the interval [ 3m 2 2 form an equilibrium. β1 (v1 ) =
β2 (v2 ) =
v1 2 v2 2
+ m2 , +
m . 4
(12.204) (12.205)
Note that these equilibrium strategies are independent of z. This is a generalization of Exercise 12.20 (which is the special case in which m = 2, z = 1). 12.22 Prove Theorem 12.29 (page 482): in a sealed-bid second-price auction with a weakly dominates reserve price, for each buyer strategy β the following strategy β β, if β = β, β(v) = “no”, (v) = “no” β (12.206) v β(v) = x.
12.23 Consider a sealed-bid second-price auction with two buyers, whose private values are independent; buyer 1’s private value is uniformly distributed over [0, 1], and buyer 2’s private value is uniformly distributed over [0, 2]. (a) For each buyer, find all weakly dominant strategies. (b) Consider the equilibrium in which every buyer bids his private value. What is the probability that buyer 1 wins the auction, under this equilibrium? What is the seller’s expected revenue in this case? (c) Prove that at any equilibrium β = (β1 , β2 ) satisfying β1 (v) = β2 (v) for all v ∈ [0, 1], one has β1 (v) = β2 (v) = v for all v ∈ [0, 1]. (d) Are there equilibria at which a buyer, whose private value vi is less than 1, does not submit the bid βi (vi ) = vi ?
12.24 (a) Prove that if F is a cumulative distribution function over [0, 1], and if Y is the maximum of n − 1 independent random variables with cumulative distribution function F , then 5v (F (x))n−1 dx E[Y | Y ≤ v] = v − 0 . (12.207) (F (v))n−1 Hint: Differentiate the function x(F (x))n−1 . (b) Use Equation (12.207) to write explicitly the symmetric equilibrium β ∗ in a symmetric sealed-bid first-price auction with independent private
514
Auctions
values with n buyers when V = [0, 1] and (a) F (x) = x, (b) F (x) = x 2 , and (c) F (x) = x(2 − x). 12.25 Prove that in a sealed-bid second-price auction, the strategy under which a buyer submits a bid equal to his private value weakly dominates all his other strategies, also when the buyer is risk-averse or risk-seeking. 12.26 Prove that in a symmetric sealed-bid auction with independent private values in which all the buyers are risk-seeking, and have the same strictly convex, differentiable, and monotonically increasing utility function, the seller’s expected revenue is lower in a sealed-bid first-price auction than in a sealed-bid second-price auction. 12.27 Suppose that in a sealed-bid first-price auction there are n buyers whose private values are independent and uniformly distributed over [0, 1]. Suppose further that the utility function of all buyers is U (x) = x c . (a) Find the values c for which the buyers are risk-averse, risk-neutral, and riskseeking. (b) Prove that a symmetric equilibrium γ must satisfy the following differential equation: γ ′ (x) =
n − 1 f (x) (x − γ (x)). c F (x)
(c) Show that the following strategy is a symmetric equilibrium: 5 v n−1 F c (x)dx γ (v) = v − 0 n−1 . F c (v)
(12.208)
(12.209)
(d) Compute the symmetric equilibrium γ for the case that F is the uniform distribution over [0, 1].
(e) Compare the strategy that you found for arbitrary c, with the symmetric equilibrium in the case in which the buyers are risk-neutral (see Exercise 12.24). Ascertain that risk-averse buyers submit higher bids than risk-neutral buyers, and that risk-seeking buyers submit lower bids than risk-neutral buyers.
(f) What is the seller’s expected revenue as a function of c? Is this an increasing function?
This exercise shows that a symmetric equilibrium in a sealed-bid first-price auction with n buyers, where the utility function of each buyer is $U(x) = x^c$, is also a symmetric equilibrium in the same auction with $\frac{n-1}{c} + 1$ risk-neutral buyers.¹⁰ In other words, a risk-averse buyer behaves like a risk-neutral buyer in an auction with more buyers, and therefore increases his bid.
¹⁰ Under the assumption that $\frac{n-1}{c}$ is an integer.
12.28 Prove Theorem 12.42 on page 494: every sealed-bid auction with risk-neutral buyers can be presented as a selling mechanism.
12.29 Can there be more than one equilibrium for an incentive-compatible direct selling mechanism? If your answer is yes, present an example. If not, justify your answer.
If your answer is yes, do all these equilibria yield the same expected revenue for the seller? Justify your answer.
12.30 Prove the revelation principle (Theorem 12.47 on page 496): let $(\Theta_i, \hat{q}_i, \hat{\mu}_i)_{i \in N}$ be a selling mechanism, and $\hat{\beta}$ be an equilibrium of this mechanism. Then the outcome of this mechanism under $\hat{\beta}$ is identical to the outcome under $\beta^*$ (the equilibrium when all buyers reveal their true values) in mechanism (q, μ), defined by Equations (12.132)–(12.133).
12.31 Let (p, C) and $(\hat{p}, \hat{C})$ be two symmetric sealed-bid auctions with independent private values defined over the same set N of risk-neutral buyers. Suppose that in both auctions the winner of the auction is the buyer who submits the highest bid, and a buyer who submits a bid of 0 pays nothing. Let β and $\hat{\beta}$ be symmetric and monotonically increasing equilibrium strategies in (p, C) and in $(\hat{p}, \hat{C})$, respectively (for the same distributions of private values). Using Corollary 12.50 (page 500) prove that the seller’s expected revenue is the same under both equilibria, and the buyer’s expected profit given his private value is also identical in both auctions.
12.32 Consider the following cumulative distribution function, where k ∈ N:
$$F_i(v_i) = (v_i)^k, \qquad 0 \le v_i \le 1. \tag{12.210}$$
Compute the function $c_i$ (see Equation (12.159) on page 501). For which values of k is the function $c_i$ monotonically increasing?
12.33 Suppose that $V_i$ is distributed according to the exponential distribution with parameter λ (i.e., $V_i = [0, \infty)$ and $f_i(v_i) = \lambda e^{-\lambda v_i}$ for each $v_i \ge 0$). Compute the function $c_i$. Is $c_i$ monotonically increasing?
12.34 For each of the following auctions and their respective equilibria β, construct an incentive-compatible direct selling mechanism whose truth-telling equilibrium $\beta^*$ is equivalent to the equilibrium β.
(a) Auction method: sealed-bid second-price auction; equilibrium $\beta = (\beta_i)_{i \in N}$, where $\beta_i(v_i) = v_i$ for each buyer i and each $v_i \in V_i$.
(b) Auction method: a sealed-bid second-price auction in which $V_i = [0, 1]$ for every buyer i; equilibrium $\beta = (\beta_i)_{i \in N}$, where $\beta_1(v_1) = 1$ for all $v_1 \in [0, 1]$ and $\beta_i(v_i) = 0$ for each buyer $i \ne 1$ and all $v_i \in [0, 1]$.
(c) Auction method: sealed-bid first-price auction with n = 2, and the private values of the buyers are independent and uniformly distributed over [0, 1]; equilibrium $\beta = (\beta_i)_{i=1,2}$, given by $\beta_i(v_i) = \frac{v_i}{2}$.
(d) Auction method: sealed-bid first-price auction with a reserve price ρ and two buyers whose private values are independent and uniformly distributed over the interval [0, 1]; equilibrium given in Example 12.34 (page 486).
12.35 Suppose that there are n buyers whose private values are independent and uniformly distributed over [0, 1]. Answer the following questions:
(a) What is the individually rational, incentive-compatible direct selling mechanism that maximizes the seller’s expected revenue?
(b) In this mechanism, what is each buyer’s probability of winning the object, assuming that each buyer reports his true private value?
(c) What is the seller’s expected revenue in this case?
12.36 Repeat Exercise 12.35 for the case in which there are two buyers, and their private values V1 and V2 are independent; V1 is uniformly distributed over [0, 2] and V2 is uniformly distributed over [0, 3].
12.37 Repeat Exercise 12.35 for the case in which there are two buyers, and their private values V1 and V2 are independent; V1 is uniformly distributed over [0, 1], and V2 is distributed over [0, 1] according to the cumulative distribution function $F_2(v) = v^2$.
12.38 What is the seller’s expected revenue in a sealed-bid first-price auction in which there are two buyers, and their private values V1 and V2 are independent; V1 is uniformly distributed over [0, 2] and V2 is uniformly distributed over [0, 3]? Hint: Use the Revenue Equivalence Theorem.
12.39 Repeat Exercise 12.38 for the case in which there are two buyers, and their private values V1 and V2 are independent; V1 is uniformly distributed over [0, 1] and V2 is uniformly distributed over [0, 3].
12.40 Partition the following list of auction methods into groups, each of which contains methods whose symmetric equilibrium yields the same expected revenue for the seller. In all these methods consider the symmetric case with independent private values and a symmetric and monotonically increasing equilibrium.
(a) Sealed-bid first-price auction.
(b) Sealed-bid first-price auction with a reserve price ρ.
(c) Sealed-bid first-price auction with an entry fee λ.
(d) Sealed-bid second-price auction.
(e) Sealed-bid second-price auction with a reserve price ρ.
(f) Sealed-bid second-price auction with an entry fee λ.
(g) Sealed-bid third-price auction (in which the winning buyer is the one who submits the highest bid, and the price he pays for the object is the third-highest bid; see Exercise 12.19).
(h) Sealed-bid all-pay auction.
(i) Sealed-bid all-pay auction with a reserve price ρ.
12.41 Among the auction methods listed in Exercise 12.40, do there exist methods in which a buyer’s submitted bid in an equilibrium may be greater than his private value? Justify your answer.
12.42 Suppose that there are n buyers participating in an auction, where the private values V1, V2, ..., Vn are independent and identically distributed, with the cumulative distribution function $F_i(v_i) = (v_i)^2$ for $v_i \in [0, 1]$. Which auction maximizes the seller’s expected revenue? What is the seller’s expected revenue in that auction?
12.43 Suppose there are n buyers participating in an auction, where the private values V1, V2, ..., Vn are independent and identically distributed, with the cumulative distribution function
$$F_i(v_i) = \tfrac{1}{2} v_i + \tfrac{1}{2} (v_i)^2, \qquad v_i \in [0, 1]. \tag{12.211}$$
Answer the following questions:
(a) Which auction maximizes the seller’s expected revenue?
(b) What is the seller’s expected revenue in this auction? (To answer this, it suffices to write the formula for the seller’s expected revenue, and specify the values of the variables. There is no need to compute the formula explicitly.)
(c) Is the seller’s expected revenue monotonically increasing as the number of buyers in the auction increases? Justify your answer.
12.44 Suppose there are n buyers participating in an auction, where the private values V1, V2, ..., Vn are independent and for each i ∈ {1, 2, ..., n}, $V_i$ is uniformly distributed over $[0, \bar{v}_i]$. Suppose further that $\bar{v}_1 < \bar{v}_2 < \cdots < \bar{v}_n$. Answer the following questions:
(a) Which selling mechanism maximizes the seller’s expected revenue?
(b) What is the seller’s expected revenue under this mechanism?
(c) What is the probability that buyer n wins the object under this mechanism?
In the last two items, it suffices to write down the appropriate formula, with no need to solve it explicitly.
12.45 Suppose there are two buyers with independent private values uniformly distributed over [0, 1]. Buyer 2 faces budget limitations, and the maximal sum that he can bid is $\frac{1}{4}$. Buyer 1 is free of budget limitations. Answer the following questions:
(a) Find an equilibrium if the buyers are participating in a sealed-bid second-price auction.
(b) Compute the seller’s expected revenue, given the equilibrium you found.
Consider what happens if the buyers participate instead in a sealed-bid first-price auction. To avoid a situation in which buyer 1 bids a price $\frac{1}{4} + \varepsilon$, where ε > 0 is very small, define the function p according to which if both buyers bid $\frac{1}{4}$, buyer 1 is declared the winner (and if both buyers submit an identical bid that is lower than $\frac{1}{4}$, each of them is chosen the winner with equal probability $\frac{1}{2}$).
Answer the following questions, and justify your answers:
(c) Is the following strategy vector $(\beta_1, \beta_2)$ an equilibrium?
$$\beta_1(v_1) = \frac{v_1}{2}, \qquad \beta_2(v_2) = \min\left\{\frac{v_2}{2}, \frac{1}{4}\right\}. \tag{12.212}$$
(d) If your answer to item (c) is negative, find a nondecreasing equilibrium.
(e) Let $(\beta_1^*, \beta_2^*)$ be a nondecreasing equilibrium. What are $\beta_1^*(0)$ and $\beta_2^*(0)$? Is $\beta_1^*(1) = \beta_2^*(1)$?
(f) Does Corollary 12.50 (page 500) enable you to deduce that the seller’s expected revenue under the nondecreasing equilibrium in this case equals the expected revenue that you found in item (b)?
(g) Does Theorem 12.59 (page 504) enable you to deduce that the individually rational, incentive-compatible direct selling mechanism that maximizes the seller’s expected revenue is a sealed-bid second-price auction with a reserve price?
12.46 This exercise explores the case in which the number of buyers in an auction is unknown. Suppose that there are N potential buyers whose private values V1, V2, ..., VN are independent and have identical continuous distribution over [0, 1]. Denote by F the common cumulative distribution function. The number of buyers participating in this auction is unknown to any of the participating buyers. Each buyer ascribes probability $p_n$ to the event that there are n participating buyers, in addition to himself, where $\sum_{n=0}^{N-1} p_n = 1$. Note that each buyer has the same belief about the distribution of the number of buyers in the auction.
(a) Find a symmetric equilibrium of this situation when the selling takes place by way of a sealed-bid second-price auction. Explain why this is an equilibrium.
(b) Denote $G^{(n)}(z) = (F(z))^n$. Prove that the expected payment of a participating buyer, whose private value is v, is
$$\sum_{n=0}^{N-1} p_n \, G^{(n)}(v) \, E\left[ Y_1^{(n)} \mid Y_1^{(n)} < v \right], \tag{12.213}$$
where $Y_1^{(n)}$ is the maximum of n independent random variables sharing the same cumulative distribution function F.
(c) Prove the Revenue Equivalence Theorem in this case: Consider a symmetric sealed-bid auction with independent private values, and let β be a monotonically increasing symmetric equilibrium satisfying the assumptions that (a) the winner is the buyer submitting the highest bid, and (b) the expected payment of a buyer whose private value is 0, is 0. Then the expected payment of a buyer whose private value is v is given by Equation (12.213).
(d) Compute a symmetric equilibrium strategy when the selling takes place by way of a sealed-bid first-price auction.
(e) Explain how Theorem 12.59 (page 504) can be used to show that the optimal selling mechanism in this case is a sealed-bid second-price auction with a reserve price.
13 Repeated games
Chapter summary
In this chapter we present the model of repeated games. A repeated game consists of a base game, which is a game in strategic form, that is repeated either finitely or infinitely many times. We present three variants of this model:
• The finitely repeated game, in which each player attempts to maximize his average payoff.
• The infinitely repeated game, in which each player attempts to maximize his long-run average payoff.
• The infinitely repeated game, in which each player attempts to maximize his discounted payoff.
For each of these models we prove a Folk Theorem, which states that under some technical conditions the set of equilibrium payoffs is (or approximates) the set of feasible and individually rational payoffs of the base game. We then extend the Folk Theorems to uniform equilibria for discounted infinitely repeated games and to uniform ε-equilibria for finitely repeated games. The former is a strategy vector that is an equilibrium in the discounted game, for every discount factor sufficiently close to 1, and the latter is a strategy vector that is an ε-equilibrium in all sufficiently long finite games.
In the previous chapters, we dealt with one-stage games, which model situations where the interaction between the players takes place only once, and once completed, it has no effect on future interactions between the players. In many cases, interaction between players does not end after only one encounter; players often meet each other many times, either playing the same game over and over again, or playing different games. There are many examples of situations that can be modeled as multistage interactions: a printing office buys paper from a paper manufacturer every quarter; a tennis player buys a pair of tennis shoes from a shop in his town every time his old ones wear out; baseball teams play each other several times every season. When players repeatedly encounter each other in strategic situations, behavioral phenomena emerge that are not present in one-stage games.
• The very fact that the players encounter each other repeatedly gives them an opportunity to cooperate, by conditioning their actions in every stage on what happened in previous stages. A player can threaten his opponent with the threat “if you do not cooperate now, in the future I will take actions that harm you,” and he can carry out this threat, thus “punishing” his opponent. For example, the manager of a printing office can inform a paper manufacturer that if the price of the paper he purchases is not reduced by 10% in the future, he will no longer buy paper from that manufacturer.
• Repeated games enable players to develop reputations. A sporting goods shop can develop a reputation as a quality shop, or a discount store.
In this chapter, we present the model of repeated games. This is a simple model of games in which players play the same base game time and again. In particular, the set of players, the actions available to the players, and their payoff functions do not change over time, and are independent of past actions. This assumption is, of course, highly restrictive, and it is often unrealistic: in the example above, new paper manufacturers enter the market, existing manufacturers leave the market, there are periodic changes in the price of paper, and the quantity of paper that printers need changes over time. This simple model, however, enables us to understand some of the phenomena observed in multistage interactions. The more general model, where the actions of the players and their payoff functions may change from one stage to another, is called the model of “stochastic games.” The reader interested in learning more about stochastic games is directed to Filar and Vrieze [1997] and Neyman and Sorin [2003].
13.1 The model
A repeated game is constructed out of the base game Γ that defines it, i.e., the game that the players play at each stage. We will assume that the base game is given in strategic form $\Gamma = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$, where N = {1, 2, ..., n} is the set of players, $S_i$ is the set of actions¹ available to player i, and $u_i : S \to \mathbb{R}$ is the payoff function of player i in the base game, where $S = S_1 \times S_2 \times \cdots \times S_n$ is the set of action vectors. In repeated games, the players encounter each other again and again, playing the same strategic-form game Γ each time. The complete description of a repeated game needs to include the number of stages that the game is played. In addition, since the players receive a payoff at each stage, we need to specify how the players value the sequence of payoffs that they receive, i.e., how each player compares each payoff sequence to another payoff sequence. We will consider three cases:
• The game lasts a finite number of stages T, and every player wants to maximize his average payoff.
• The game lasts an infinite number of stages, and every player wants to maximize the upper limit of his average payoffs.
• The game lasts an infinite number of stages, and each player wants to maximize the time-discounted sum of his payoffs.
Denote by
$$M := \max_{i \in N} \max_{s \in S} |u_i(s)| \tag{13.1}$$
the maximal absolute value of the payoffs received by the players in one stage. Recall that the set of distributions over a set $S_i$ is $\Sigma_i = \Delta(S_i)$, the product set of these sets is $\Sigma = \Sigma_1 \times \Sigma_2 \times \cdots \times \Sigma_n$, and $U_i : \Sigma \to \mathbb{R}$ is the multilinear extension of the payoff functions $u_i$ (defined over S; see page 147). By definition, a strategy instructs a player how to play throughout the game. The definition of a strategy in finite repeated games, and infinitely repeated games, will be presented when these games are defined.
¹ In this chapter we will call the elements of $S_i$ “actions,” and reserve the term “strategy” for strategies in the repeated game.
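The two quantities just defined can be computed directly for a small base game. The following is a minimal sketch, not part of the original text: it uses the Prisoner's Dilemma payoffs that appear in Example 13.1 as illustrative data, and the names `PAYOFFS`, `max_abs_payoff`, and `expected_payoffs` are hypothetical helpers introduced only for this illustration.

```python
from itertools import product

# Illustrative base game: the one-stage Prisoner's Dilemma of Example 13.1.
# Keys are action vectors (Player I, Player II); values are payoff vectors.
PAYOFFS = {
    ("D", "D"): (1, 1), ("D", "C"): (4, 0),
    ("C", "D"): (0, 4), ("C", "C"): (3, 3),
}

def max_abs_payoff(payoffs):
    """M = max over players i and action vectors s of |u_i(s)|."""
    return max(abs(u) for vector in payoffs.values() for u in vector)

def expected_payoffs(payoffs, mixed):
    """Multilinear extension U(sigma): the expected payoff vector when each
    player i independently draws his action from the distribution mixed[i]."""
    n = len(mixed)
    totals = [0.0] * n
    for s in product(*(m.keys() for m in mixed)):
        prob = 1.0
        for i, action in enumerate(s):
            prob *= mixed[i][action]          # independent randomization
        for i in range(n):
            totals[i] += prob * payoffs[s][i]
    return tuple(totals)

M = max_abs_payoff(PAYOFFS)
uniform = {"D": 0.5, "C": 0.5}
U = expected_payoffs(PAYOFFS, [uniform, uniform])
```

Under these assumptions, `M` is the largest absolute one-stage payoff, and `U` is the payoff vector when both players mix uniformly between their two actions.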
13.2 Examples
The following example will be referenced often in this chapter, to illustrate definitions and to explain claims.
Example 13.1 Repeated Prisoner’s Dilemma Recall that the Prisoner’s Dilemma is a one-stage two-player game, depicted in Figure 13.1.
                      Player II
                      D          C
Player I      D       1, 1       4, 0
              C       0, 4       3, 3
Figure 13.1 The one-stage Prisoner’s Dilemma
For both players, action D strictly dominates action C, so the only equilibrium of the base game is (D, D). Consider the case in which the players play the Prisoner’s Dilemma twice, and the second time the game is played, they both know which actions were chosen the previous time they played the game. When this situation is depicted as an extensive-form game (see Figure 13.2), the game tree has information sets representing the fact that at each stage the players choose their actions simultaneously. In Figure 13.2, the total payoff of each player over the two stages is indicated at the leaves of the game tree, where the upper number is the total payoff of Player I, and the lower number is the total payoff of Player II. In this figure, and several other figures in this chapter, the depicted tree “grows” from top to bottom, rather than left to right, for the sake of saving space on the page.
Figure 13.2 The two-stage Prisoner’s Dilemma, represented as an extensive-form game [game tree not reproduced]
What are the equilibria of this game? A direct inspection reveals that the strategy vector in which the players repeat the one-stage equilibrium (D, D) at both stages is an equilibrium of the two-stage game. This is a special case of a general claim that states that every strategy vector where in every stage the players play an equilibrium of the base game is an equilibrium of the T-stage game (Theorem 13.6). We argue now that at every equilibrium of the two-stage repeated game, the players play (D, D) in both stages. To see this, suppose instead that there exists an equilibrium at which, with positive probability, the players do not play (D, D) at some stage. Let t ∈ {1, 2} be the last stage in which there is positive probability that the players will not play (D, D), and suppose that in this event, Player I does not play D in stage t. This means that if the game continues after stage t the players will play (D, D). We will show that this strategy cannot be an equilibrium strategy.
Case 1: t = 1. Consider the strategy of Player I at which he plays D in both stages. We will show that this strategy grants him a higher payoff. Since D strictly dominates C, Player I’s payoff rises if he switches from C to D in the first stage. And since, by assumption, after stage t the players play (D, D) (since stage t is the last stage in which they may not play (D, D)), Player I’s payoff in the second stage was supposed to be 1. By playing D in the second stage, Player I’s payoff is either 1 or 4 (depending on whether Player II plays D or C);² in either case, Player I cannot lose in the second stage. The sum total of Player I’s payoffs therefore rises.
Case 2: t = 2. Consider the strategy of Player I at which he plays in the first stage what the original strategy tells him to play, and in the second stage he plays D. Player I’s payoff in the first stage does not change, but because D strictly dominates C, his payoff in the second stage does increase. The sum total of Player I’s payoffs therefore increases.
² Even if t = 1 is the last stage in which one of the players plays C with positive probability, it is still possible that if both players play D in the first stage, then Player II will play C in the second stage with positive probability. To see this, consider the following strategy vector. In the first stage, both players play C. In the second stage, Player I plays D, and Player II plays D if Player I played C in the first stage, and he plays C if Player I played D in the first stage. In this case, if neither player deviates, the players play (C, C) in the first stage, and (D, D) in the second stage; but if Player I plays D in the first stage, then Player II plays C in the second stage.
Note that despite the fact that at every equilibrium of the two-stage repeated game the players play (D, D) in every stage, it is possible that at equilibrium, the strategy C is used off the equilibrium path; that is, if a player does deviate from the equilibrium strategy, the other player may play C with positive probability. For example, consider the following strategy $\sigma_1$:
• Play D in the first stage.
• In the second stage, play as follows: if in the first stage the other player played D, play D in the second stage; otherwise play $[\frac{1}{8}(C), \frac{7}{8}(D)]$ in the second stage.
Direct inspection shows that the strategy vector $(\sigma_1, \sigma_1)$, in which both players play strategy $\sigma_1$, is an equilibrium of the two-stage repeated game. By the same rationale used here to show that in the two-stage repeated Prisoner’s Dilemma at equilibrium the players play (D, D) in both stages, it can be shown that in the T-stage repeated Prisoner’s Dilemma, at equilibrium, the players play (D, D) in every stage (Exercise 13.6). ◭
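The two claims of this example can be checked by brute force, at least in part. The sketch below is an illustration under stated assumptions rather than a proof: it enumerates only pure strategies of the two-stage game (so it says nothing about mixed equilibria), records the play induced by every pure equilibrium, and then checks that no pure deviation of Player I improves on playing $\sigma_1$ against $\sigma_1$ (by symmetry the same holds for Player II). All names in the code are hypothetical.

```python
from fractions import Fraction
from itertools import product

A = ("D", "C")
U = {("D", "D"): (1, 1), ("D", "C"): (4, 0),
     ("C", "D"): (0, 4), ("C", "C"): (3, 3)}
HISTORIES = list(product(A, A))   # first-stage action pairs (Player I, Player II)

def pure_strategies():
    """A pure strategy: a first-stage action plus a reply to each history."""
    for first in A:
        for replies in product(A, repeat=len(HISTORIES)):
            yield first, dict(zip(HISTORIES, replies))

STRATS = list(pure_strategies())  # 2 * 2**4 = 32 pure strategies per player

def play(s1, s2):
    """Induced play and total payoffs when Player I uses s1 and Player II uses s2."""
    h = (s1[0], s2[0])
    second = (s1[1][h], s2[1][h])
    total = tuple(U[h][i] + U[second][i] for i in range(2))
    return (h, second), total

def is_pure_equilibrium(s1, s2):
    """True if neither player has a profitable pure deviation."""
    _, (v1, v2) = play(s1, s2)
    if any(play(d, s2)[1][0] > v1 for d in STRATS):
        return False
    return not any(play(s1, d)[1][1] > v2 for d in STRATS)

equilibrium_plays = {play(s1, s2)[0]
                     for s1 in STRATS for s2 in STRATS
                     if is_pure_equilibrium(s1, s2)}

def payoff_vs_sigma1(s1):
    """Expected total payoff of Player I, using pure strategy s1, against sigma_1."""
    h = (s1[0], "D")                              # sigma_1 plays D in stage 1
    if s1[0] == "D":
        mix = {"D": Fraction(1)}                  # reply to D: play D
    else:
        mix = {"C": Fraction(1, 8), "D": Fraction(7, 8)}
    a = s1[1][h]
    return U[h][0] + sum(p * U[(a, b)][0] for b, p in mix.items())

best_reply_value = max(payoff_vs_sigma1(s) for s in STRATS)
```

In this sketch `equilibrium_plays` collects the on-path plays of all pure equilibria, and `best_reply_value` is the most Player I can obtain against $\sigma_1$; comparing it with the total payoff 2 that $(\sigma_1, \sigma_1)$ yields shows whether a profitable pure deviation exists.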
As we saw, in the finitely repeated Prisoner’s Dilemma, at every equilibrium the players play (D, D) in every stage. Does this extend to every repeated game? That is, does every equilibrium strategy of a repeated game call on the players to play a one-stage equilibrium in every stage? The following example shows that the answer is negative: in general, the set of equilibria of repeated games is a much richer set. Example 13.2 Repeated Prisoner’s Dilemma, with the possibility of punishment Consider the two-player game given in Figure 13.3, where each player has three possible actions.
                      Player II
                      D          C          P
Player I      D       1, 1       4, 0       −1, 0
              C       0, 4       3, 3       −1, 0
              P       0, −1      0, −1      −2, −2
Figure 13.3 The repeated Prisoner’s Dilemma, with the possibility of punishment
This game is similar to the Prisoner’s Dilemma in Example 13.1, with the addition of a third action P to each player, yielding low payoffs for both players. Note that action P (which stands for Punishment) is strictly dominated by action D, and therefore by Theorem 4.35 (page 109) we can eliminate it without changing the set of equilibria of the base game. After eliminating P for both players, we are left with the one-stage Prisoner’s Dilemma, whose only equilibrium is (D, D). It follows that the only equilibrium of the base game in Figure 13.3 is (D, D). As previously stated, when the players play an equilibrium of the base game in every stage, the resulting strategy vector is an equilibrium of the repeated game. It follows that in the two-stage repeated game in this example, playing (D, D) in both stages is an equilibrium. In contrast with the standard repeated Prisoner’s Dilemma, there are additional equilibria in this repeated game. The strategy vector at which both players play the following strategy is an equilibrium:
• Play C in the first stage.
• If your opponent played C in the first stage, play D in the second stage. Otherwise, play P in the second stage.
If both players play this strategy, they will both play C in the first stage, and D in the second stage, and each player’s total payoff will be 4 (in contrast to the total payoff 2 that they receive under the equilibrium of playing (D, D) in both stages). Since action D weakly dominates both of the other actions, no player can gain by deviating from D in the second stage alone. A player who deviates in the first stage from C to D gets a payoff of 4 in the first stage, but he will then get at most −1 in the second stage (because his opponent will play P in the second stage), and so in sum total he loses: his total payoff when he deviates is at most 3, which is less than his total payoff of 4 at the equilibrium. By deviating to P in the first stage, the deviator also loses.
This example illustrates that in a repeated game, the players can threaten each other, by adopting strategies that call on them to punish a player in later stages, if at some stage that player deviates from a particular action. The greater the number of stages in the repeated game, the greater opportunity players have to punish each other. In general, this increases the number of equilibria. The last equilibrium in this example is not a subgame perfect equilibrium (see Section 7.1 on page 252), since the use of the action P is not part of an equilibrium in the subgame starting in the second stage. We will see later in this chapter that repeated games may have additional equilibria that are subgame perfect.
Note that there is a proliferation of pure strategies in repeated games, compared to one-stage games. For example, in the one-stage game in Figure 13.3, every player has three pure strategies, D, C, and P. In the two-stage game, every player has $3 \times 3^9 = 3^{10} = 59{,}049$ pure strategies: there are three actions available to the player in the first stage, and in the second stage his strategy is given by a function from the pair of actions played in the first stage, i.e., from $\{D, C, P\}^2$ to $\{D, C, P\}$. In the three-stage repeated game, every player has $3 \times 3^9 \times 3^{(3^4)} = 3^{91}$ pure strategies: the number of possible strategies in the first two stages is as calculated above, and in the third stage the player’s strategy is given by a function from $\{D, C, P\}^4$ to $\{D, C, P\}$: for every pair of action vectors that were played in the first two stages, the player needs to decide what to play in the third stage. ◭
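The deviation argument of Example 13.2 can be reproduced numerically. The following is a minimal sketch under the assumption that the opponent follows the strategy described above; since that strategy is pure, a deviator's total payoff depends only on his first-stage action and on the action he plays after the resulting history, so nine combinations cover every pure deviation. The names `U1`, `opponent_reply`, and `total` are hypothetical.

```python
A = ("D", "C", "P")

# Player I's payoffs in the base game of Figure 13.3, indexed by
# (Player I's action, Player II's action).
U1 = {
    ("D", "D"): 1, ("D", "C"): 4, ("D", "P"): -1,
    ("C", "D"): 0, ("C", "C"): 3, ("C", "P"): -1,
    ("P", "D"): 0, ("P", "C"): 0, ("P", "P"): -2,
}

def opponent_reply(my_first):
    """Second-stage action of an opponent who follows the proposed strategy:
    D if I played C in the first stage, and the punishment P otherwise."""
    return "D" if my_first == "C" else "P"

def total(first, second):
    """Player I's total payoff when the opponent plays C in the first stage
    and then replies according to the proposed strategy."""
    return U1[(first, "C")] + U1[(second, opponent_reply(first))]

values = {(a, b): total(a, b) for a in A for b in A}
on_path = values[("C", "D")]                       # follow the strategy
best_after_D = max(values[("D", b)] for b in A)    # deviate to D in stage 1
best_after_P = max(values[("P", b)] for b in A)    # deviate to P in stage 1

# After a first-stage deviation by Player II to D, the strategies prescribe
# (P, D) in the second stage; Player I would rather play D than P there,
# which is why the equilibrium is not subgame perfect.
punishment_is_costly = U1[("P", "D")] < U1[("D", "D")]
```

Under these assumptions, comparing `on_path` with `best_after_D` and `best_after_P` reproduces the comparison made in the text, and `punishment_is_costly` records the reason the threat is not credible in the second-stage subgame.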
In general, the size of each player’s space of strategies grows super-exponentially with the number of stages in the repeated game (Exercise 13.1). This growth has two consequences. A positive consequence is that it leads to complex and interesting equilibria. In Example 13.2, we found an equilibrium that grants a higher average payoff to the two players than their payoff when they repeat the only equilibrium of the one-stage game. A negative consequence is that, due to the complications inherent in the proliferation of strategies, it becomes practically impossible to find all the equilibria of repeated games with many stages. For this reason, we will not attempt to compute all equilibria of repeated games. We will instead look for asymptotic results, as the number of repetitions grows; we will seek approximations to the set of equilibrium payoffs, without trying to find all possible equilibrium payoffs; and we will be interested in special equilibria that can easily be described.
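The counts given in Example 13.2 follow from one formula: a pure strategy chooses one action for every history of length less than T, and there are $|S|^t$ histories of length t. The sketch below assumes, for simplicity, that every player has the same number of actions; the function name and its parameters are hypothetical.

```python
def num_pure_strategies(num_actions, num_players, stages):
    """Number of pure strategies of one player in the T-stage repeated game,
    assuming every player has `num_actions` actions in the base game.

    There are (num_actions ** num_players) ** t histories of length t, and a
    pure strategy picks one action after each history of length 0, ..., T-1.
    """
    action_vectors = num_actions ** num_players
    histories = sum(action_vectors ** t for t in range(stages))
    return num_actions ** histories

two_stage = num_pure_strategies(3, 2, 2)     # the game of Figure 13.3, T = 2
three_stage = num_pure_strategies(3, 2, 3)   # the same base game, T = 3
```

The exponent is a geometric sum in the number of stages, so the count itself grows doubly exponentially, which is the super-exponential growth mentioned above.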
13.3 The T-stage repeated game
In this section we will study the equilibria of a T-stage repeated game $\Gamma_T$ that is based on a strategic-form game Γ. Our goal is to characterize the limit set of equilibrium payoffs as T goes to infinity. We will also construct, for each vector x in the limit set of equilibrium payoffs, and for each sufficiently large natural number T, an equilibrium in the T-stage repeated game that yields a payoff close to x.
13.3.1 Histories and strategies
Since players encounter each other repeatedly in repeated games, they gather information as the game progresses. The information available to every player at stage t + 1 is the actions played by all the players in the first t stages of the game. We will therefore define, for every t ≥ 0, the set of t-stage histories as
$$H(t) := S^t = \underbrace{S \times S \times \cdots \times S}_{t \text{ times}}. \tag{13.2}$$
For t = 0, we identify H(0) := {∅}, where ∅ is the history at the start of the game, which contains no actions. A history in H(t) will sometimes be denoted by $h^t$, and sometimes by $(s^1, s^2, \ldots, s^t)$, where $s^j = (s_i^j)_{i \in N}$ is the vector of actions played in stage j. A behavior strategy for player i is an action plan that instructs the player which mixed action to play after every possible history.
Definition 13.3 A behavior strategy for player i in a T-stage game is a function associating a mixed action with each history of length less than T:
$$\tau_i : \bigcup_{t=0}^{T-1} H(t) \to \Sigma_i. \tag{13.3}$$
The set of behavior strategies of player i in a T-stage game is denoted by $B_i^T$. Equivalently, we can define a behavior strategy of player i as a sequence $\tau_i = (\tau_i^{t+1})_{t=0}^{T-1}$ of functions, where $\tau_i^{t+1} : H(t) \to \Sigma_i$ instructs the player what to play in stage t + 1, for each t ∈ {0, 1, ..., T − 1}.
Remark 13.4 When a T-stage repeated game is depicted as an extensive-form game, a pure strategy is a function $\tau_i : \bigcup_{t=0}^{T-1} H(t) \to S_i$. A mixed strategy is a distribution over pure strategies (Definition 5.3 on page 147). We have assumed that every player knows which actions were played at all previous stages; i.e., every player has perfect recall (see Definition 6.13 on page 109). By Kuhn’s Theorem (Theorem 6.16 on page 235) it follows that every mixed strategy is equivalent to a behavior strategy, and we can therefore consider only behavior strategies, which are more convenient to use in this chapter.
Example 13.1 (Continued) Consider the two-stage Prisoner’s Dilemma. Two (behavior) strategies are written in Figure 13.4, one for each player. The notation $\tau_I(DC) = [\frac{2}{3}(D), \frac{1}{3}(C)]$ means that after history DC (which occurs if in the first stage Player I plays D, and Player II plays C), Player I plays the mixed action $[\frac{2}{3}(D), \frac{1}{3}(C)]$ in the second stage.
$\tau_I(\emptyset) = [\frac{1}{2}(D), \frac{1}{2}(C)]$, $\tau_I(DD) = D$, $\tau_I(DC) = [\frac{2}{3}(D), \frac{1}{3}(C)]$, $\tau_I(CD) = [\frac{1}{4}(D), \frac{3}{4}(C)]$, $\tau_I(CC) = C$.
$\tau_{II}(\emptyset) = C$, $\tau_{II}(DD) = [\frac{3}{4}(D), \frac{1}{4}(C)]$, $\tau_{II}(DC) = [\frac{1}{2}(D), \frac{1}{2}(C)]$, $\tau_{II}(CD) = C$, $\tau_{II}(CC) = D$.
Figure 13.4 Strategies for both players in the two-stage Prisoner’s Dilemma ◭
Given the strategies $(\tau_i)_{i \in N}$ of the players, denote by $\tau = (\tau_1, \tau_2, \ldots, \tau_n)$ the vector of the players’ strategies. Denote by $\tau_i(s_i)$ the probability that player i plays action $s_i$ in the first stage, and by $\tau_i(s_i \mid s^1, \ldots, s^{t-1})$ the conditional probability that player i plays action $s_i$ in stage t, given that the players have played $(s^1, \ldots, s^{t-1})$ in the first t − 1 stages.
Example 13.1 (Continued) If the players play according to the strategies $\tau_I$ and $\tau_{II}$ that we defined in Figure 13.4 in the two-stage Prisoner’s Dilemma, we can associate with every branch in the game tree the probability that it will be chosen in a play of the game. These probabilities are shown in Figure 13.5. The figure also shows, by each leaf of the game tree, the probability that the leaf will be arrived at if the players play strategies $\tau_I$ and $\tau_{II}$.
Figure 13.5 The probabilities attached to each play of the game, under the strategies $(\tau_I, \tau_{II})$ [game tree not reproduced; the leaves reached with positive probability have probabilities 1/6, 1/6, 1/12, 1/12, and 1/2, and all other leaves have probability 0] ◭
The collection of all the possible plays of the T-stage game is $S^T = H(T)$. As can be seen in Figure 13.5, every strategy vector τ naturally induces a probability measure $P_\tau$ over H(T). The probability of every play of the game $(s^1, s^2, \ldots, s^T)$ is the probability that if the players play according to strategy τ, the resulting play of the game will be this history. Formally, for every action vector $s^1 = (s_1^1, \ldots, s_n^1) \in S$, define
$$P_\tau(s^1) = \tau_1(s_1^1) \times \tau_2(s_2^1) \times \cdots \times \tau_n(s_n^1). \tag{13.4}$$
This is the probability that the action vector played in the first stage is $s^1$, and it equals the product, over the players $i$, of the probability that player $i$ plays action $s_i^1$. More generally, for every $t$, $2 \le t \le T$, and every finite history $(s^1, s^2, \ldots, s^t) \in S^t$, define by induction
$$P_\tau(s^1, \ldots, s^t) = P_\tau(s^1, \ldots, s^{t-1}) \times \tau_1(s_1^t \mid s^1, \ldots, s^{t-1}) \times \tau_2(s_2^t \mid s^1, \ldots, s^{t-1}) \times \cdots \times \tau_n(s_n^t \mid s^1, \ldots, s^{t-1}).$$
This means that the probability that under $\tau$ the players play the action vectors $s^1, s^2, \ldots, s^t$ in the first $t$ stages is the probability that the players play $s^1, s^2, \ldots, s^{t-1}$ in the first $t-1$ stages, times the conditional probability that they play the action vector $s^t$ in stage $t$, given that they played $s^1, s^2, \ldots, s^{t-1}$ in the first $t-1$ stages. This formula for $P_\tau$ expresses the fact that the mixed action that a player implements in any given stage can depend on the actions that he or other players played in previous stages, but the random choices of the players made simultaneously in each stage are independent of each other. The case in which there may be correlation between the actions chosen by the players was addressed in Chapter 8, where we studied the concept of correlated equilibrium.
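The inductive definition of $P_\tau$ can be traced numerically. The following Python sketch is an illustration only. It encodes strategies of the two-stage Prisoner’s Dilemma as the probability of playing D after each history: Player II’s values follow Figure 13.4, while Player I’s values are inferred from the probabilities listed in Figure 13.6 (his entries after the histories DD and CD are arbitrary placeholders, since those histories are reached with probability 0). The names `P_DEFECT`, `action_prob`, and `play_probability` are hypothetical.

```python
from fractions import Fraction as F
from itertools import product

ACTIONS = ("D", "C")

# Probability of playing D after each history. A history is a tuple of stage
# outcomes written "XY", with Player I's action X first.
# Assumption: Player I's entries for ("DD",) and ("CD",) are placeholders.
P_DEFECT = {
    "I": {(): F(1, 2), ("DD",): F(1), ("DC",): F(2, 3), ("CD",): F(1), ("CC",): F(0)},
    "II": {(): F(0), ("DD",): F(3, 4), ("DC",): F(1, 2), ("CD",): F(0), ("CC",): F(1)},
}


def action_prob(player, action, history):
    """Conditional probability that `player` plays `action` after `history`."""
    p_d = P_DEFECT[player][history]
    return p_d if action == "D" else 1 - p_d


def play_probability(play):
    """P_tau of the play (s^1, ..., s^t), built stage by stage as in the induction."""
    prob = F(1)
    for t, outcome in enumerate(play):
        history = play[:t]  # what was played in the stages before stage t + 1
        prob *= action_prob("I", outcome[0], history)
        prob *= action_prob("II", outcome[1], history)
    return prob


plays = [(a + b, c + d) for a, b, c, d in product(ACTIONS, repeat=4)]
distribution = {play: play_probability(play) for play in plays}
positive = {play: p for play, p in distribution.items() if p > 0}
```

Under these assumptions, exactly five of the sixteen plays have positive probability, namely 1/6, 1/6, 1/12, 1/12, and 1/2, and the probabilities of all plays sum to 1.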
13.3.2 Payoffs and equilibria
In repeated games, the players receive a payoff in every stage of the game. Denote the payoff received by player $i$ in stage $t$ by $u_i^t$, and denote the vector of payoffs to the players in stage $t$ by $u^t = (u_1^t, \ldots, u_n^t)$. Then, during the course of a play of the game, player $i$ receives the sequence of payoffs $(u_i^1, u_i^2, \ldots, u_i^T)$. We assume that every player seeks to maximize the sum total of these payoffs or, equivalently, seeks to maximize the average of these payoffs. As previously noted, every strategy vector $\tau$ induces a probability measure $P_\tau$ over $H(T)$. Denote the corresponding expectation operator by $E_\tau$; i.e., for every function $f : H(T) \to \mathbb{R}$, the expectation of $f$ under $P_\tau$ is denoted by $E_\tau[f]$:
$$E_\tau[f] = \sum_{(s^1, \ldots, s^T) \in H(T)} P_\tau(s^1, \ldots, s^T)\, f(s^1, \ldots, s^T). \tag{13.5}$$
Player $i$’s expected payoff in stage $t$, under the strategy vector $\tau$, is $E_\tau[u_i^t]$. Denote player $i$’s average expected payoff in the first $T$ stages under the strategy vector $\tau$ by
$$\gamma_i^T(\tau) := E_\tau\left[\frac{1}{T} \sum_{t=1}^{T} u_i^t\right] = \frac{1}{T} \sum_{t=1}^{T} E_\tau[u_i^t]. \tag{13.6}$$
Example 13.1 (Continued) Figure 13.5 provides the probability of every play of the game under the strategy pair $(\tau_I, \tau_{II})$. The table in Figure 13.6 presents the plays of the game that are obtained with positive probability in the left column, the probability that each play is obtained in the middle column, and the payoff to the players, under that play of the game, in the right column. Each play of the game is written from left to right, with the actions implemented by the players in the first stage appearing first, followed by the actions implemented by the players in the second stage. Player I’s action appears to the left of Player II’s action.
Play of the game      Probability   Payoff
(D, C), (D, D)        1/6           (5, 1)
(D, C), (D, C)        1/6           (8, 0)
(D, C), (C, D)        1/12          (4, 4)
(D, C), (C, C)        1/12          (7, 3)
(C, C), (C, D)        1/2           (3, 7)
Figure 13.6 The probability of every play of the game, and the corresponding payoff, under the strategy pair $(\tau_I, \tau_{II})$
It follows that the expected payoff of the two players is
$$\tfrac{1}{6} \times (5, 1) + \tfrac{1}{6} \times (8, 0) + \tfrac{1}{12} \times (4, 4) + \tfrac{1}{12} \times (7, 3) + \tfrac{1}{2} \times (3, 7) = \left(4\tfrac{7}{12},\, 4\tfrac{1}{4}\right). \tag{13.7}$$
◭
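The arithmetic in Equation (13.7) can be checked with exact fractions. The short Python sketch below takes the probabilities and payoffs from Figure 13.6; the variable names are illustrative. Because the payoffs listed in that figure are totals over the two stages, the average expected payoff of Equation (13.6) is half of the vector in (13.7).

```python
from fractions import Fraction as F

# (probability, total payoff to Player I, total payoff to Player II) per play,
# copied from Figure 13.6.
rows = [
    (F(1, 6), 5, 1),
    (F(1, 6), 8, 0),
    (F(1, 12), 4, 4),
    (F(1, 12), 7, 3),
    (F(1, 2), 3, 7),
]

total_probability = sum(p for p, _, _ in rows)
expected_total = (
    sum(p * a for p, a, _ in rows),  # Player I
    sum(p * b for p, _, b in rows),  # Player II
)
# Average expected payoff gamma^T with T = 2 stages.
average = tuple(x / 2 for x in expected_total)
```

Here `expected_total` equals (55/12, 17/4), that is, (4 7/12, 4 1/4), and `average` equals (55/24, 17/8).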
Definition 13.5 Let $\Gamma = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ be a base game. The $T$-stage game $\Gamma_T$ corresponding to $\Gamma$ is the game $\Gamma_T = (N, (B_i^T)_{i \in N}, (\gamma_i^T)_{i \in N})$. The strategy vector $\tau^* = (\tau_1^*, \ldots, \tau_n^*)$ is a (Nash) equilibrium of $\Gamma_T$ if for each player $i \in N$, and each strategy $\tau_i \in B_i^T$,
$$\gamma_i^T(\tau^*) \ge \gamma_i^T(\tau_i, \tau_{-i}^*). \tag{13.8}$$
The vector $\gamma^T(\tau^*)$ is called an equilibrium payoff of the repeated game $\Gamma_T$. The following theorem states that a strategy vector at which in each stage the players play a one-stage equilibrium is an equilibrium of the $T$-stage game.
Theorem 13.6 Let $\Gamma = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ be a base game, and let $\Gamma_T$ be its corresponding repeated $T$-stage game. Let $\sigma^1, \sigma^2, \ldots, \sigma^T$ be equilibria of $\Gamma$ (not necessarily different equilibria). Then the strategy vector $\tau^*$ in $\Gamma_T$, at which in each stage $t$, $1 \le t \le T$, every player $i \in N$ plays the mixed action $\sigma_i^t$, is an equilibrium.
Proof: The strategy vector $\tau^*$ is an equilibrium, because no player can profit by deviating. No player can profit in a stage in which he deviates from equilibrium, because by definition in such a stage the players implement an equilibrium of the base game. In addition, his deviation in any stage cannot influence the future actions of the other players, because they are playing according to a strategy that depends only on the stage $t$, not on the history $h^t$. Formally, let $i \in N$ be a player, and let $\tau_i$ be any strategy of player $i$ in $\Gamma_T$. We will show that $\gamma_i^T(\tau_i, \tau_{-i}^*) \le \gamma_i^T(\tau^*)$; i.e., player $i$ does not profit by deviating from $\tau_i^*$ to $\tau_i$. For each $t$, $1 \le t \le T$, the mixed action vector $\sigma^t$ is an equilibrium of $\Gamma$. Therefore, for each history $h^{t-1} \in H(t-1)$,
$$u_i(\sigma^t) \ge u_i(\tau_i(h^{t-1}), \sigma_{-i}^t). \tag{13.9}$$
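This implies that
$$\begin{aligned}
E_{\tau_i, \tau_{-i}^*}[u_i^t] &= \sum_{h^{t-1} \in H(t-1)} P_{\tau_i, \tau_{-i}^*}(h^{t-1})\, u_i(\tau_i(h^{t-1}), \tau_{-i}^*(h^{t-1})) && (13.10) \\
&= \sum_{h^{t-1} \in H(t-1)} P_{\tau_i, \tau_{-i}^*}(h^{t-1})\, u_i(\tau_i(h^{t-1}), \sigma_{-i}^t) && (13.11) \\
&\le \sum_{h^{t-1} \in H(t-1)} P_{\tau_i, \tau_{-i}^*}(h^{t-1})\, u_i(\sigma^t) && (13.12) \\
&= u_i(\sigma^t) \sum_{h^{t-1} \in H(t-1)} P_{\tau_i, \tau_{-i}^*}(h^{t-1}) = u_i(\sigma^t). && (13.13)
\end{aligned}$$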
The last equality follows from the fact that the sum total of the probabilities of all $(t-1)$-stage histories is 1, and therefore $E_{\tau_i, \tau_{-i}^*}[u_i^t] \le u_i(\sigma^t)$. Since under $\tau^*$ the expected payoff of player $i$ in stage $t$ is $u_i(\sigma^t)$, averaging over the $T$ stages of the game shows that $\gamma_i^T(\tau_i, \tau_{-i}^*) \le \frac{1}{T} \sum_{t=1}^{T} u_i(\sigma^t) = \gamma_i^T(\tau^*)$, which is what we wanted to show. Since $\gamma_i^T(\tau_i, \tau_{-i}^*) \le \gamma_i^T(\tau^*)$ for every strategy $\tau_i$ of player $i$, and for every player $i$, we deduce that $\tau^*$ is an equilibrium.
By repeating the same equilibrium in every stage, we get the following corollary.
Corollary 13.7 Let $\Gamma$ be a base game, and let $\Gamma_T$ be the corresponding repeated $T$-stage game. Every equilibrium payoff of $\Gamma$ is also an equilibrium payoff of $\Gamma_T$.
13.3.3 The minmax value
Recall that $U_i$ is the multilinear extension of $u_i$ (Equation (5.9), page 147). The minmax value of player $i$ in the base game $\Gamma$ is (Equation (4.51), page 113):
$$\bar{v}_i = \min_{\sigma_{-i} \in \times_{j \ne i} \Sigma_j} \; \max_{\sigma_i \in \Sigma_i} U_i(\sigma_i, \sigma_{-i}). \tag{13.14}$$
This is the value that the players $N \setminus \{i\}$ cannot prevent player $i$ from attaining: for any vector of mixed actions $\sigma_{-i}$ they implement, player $i$ can receive at least $\max_{\sigma_i \in \Sigma_i} U_i(\sigma_i, \sigma_{-i})$, which is at least $\bar{v}_i$. Every mixed strategy vector $\sigma_{-i}$ satisfying
$$\bar{v}_i = \max_{\sigma_i \in \Sigma_i} U_i(\sigma_i, \sigma_{-i}) \tag{13.15}$$
is called a punishment strategy vector against player $i$, because if the players $N \setminus \{i\}$ play $\sigma_{-i}$, they guarantee that player $i$’s average payoff will not exceed $\bar{v}_i$. Similarly to what we saw in Equation (5.25) (page 151), for every mixed action vector $\sigma_{-i} \in \Sigma_{-i}$ there exists a pure action $s_i' \in S_i$ of player $i$ satisfying $U_i(s_i', \sigma_{-i}) \ge \bar{v}_i$ (why?). The next theorem states that at every equilibrium of the repeated game, the payoff to each player $i$ is at least $\bar{v}_i$. The discussion above and the proof of the theorem imply that the minmax value of each player $i$ in the $T$-stage game is $\bar{v}_i$ (Exercise 13.8).
Theorem 13.8 Let $\tau^*$ be an equilibrium of $\Gamma_T$. Then $\gamma_i^T(\tau^*) \ge \bar{v}_i$ for each player $i \in N$.
Proof: We will show that for every strategy vector $\tau$ (not necessarily an equilibrium vector) there exists a strategy $\tau_i^*$ of player $i$ (which depends on $\tau_{-i}$) satisfying $\gamma_i^T(\tau_i^*, \tau_{-i}) \ge \bar{v}_i$.
It follows, in particular, that if $\tau$ is an equilibrium, then
$$\gamma_i^T(\tau) \ge \gamma_i^T(\tau_i^*, \tau_{-i}) \ge \bar{v}_i, \tag{13.16}$$
which is what the theorem claims. We now construct such a strategy $\tau_i^*$ explicitly, for any given $\tau_{-i}$. Recall that when $\tau$ is a strategy vector, $\tau_j(h)$ is the mixed action that player $j$ plays after history $h$, and $\tau_{-i}(h) = (\tau_j(h))_{j \ne i}$ is the mixed action vector that the players $N \setminus \{i\}$ play after history $h$. As previously noted, for every history $h \in \bigcup_{t=0}^{T-1} H(t)$ there is an action $s_i'(h) \in S_i$ such that $U_i(s_i'(h), \tau_{-i}(h)) \ge \bar{v}_i$. Let $\tau_i^*$ be a strategy of player $i$ under which, after every history $h$, he plays the action $s_i'(h)$. Then for every $t \in \{1, 2, \ldots, T\}$,
$$\begin{aligned}
E_{\tau_i^*, \tau_{-i}}[u_i^t] &= \sum_{h^{t-1} \in H(t-1)} P_{\tau_i^*, \tau_{-i}}(h^{t-1})\, u_i(\tau_i^*(h^{t-1}), \tau_{-i}(h^{t-1})) && (13.17) \\
&= \sum_{h^{t-1} \in H(t-1)} P_{\tau_i^*, \tau_{-i}}(h^{t-1})\, u_i(s_i'(h^{t-1}), \tau_{-i}(h^{t-1})) && (13.18) \\
&\ge \sum_{h^{t-1} \in H(t-1)} P_{\tau_i^*, \tau_{-i}}(h^{t-1})\, \bar{v}_i = \bar{v}_i. && (13.19)
\end{aligned}$$
The last equality follows from the fact that the sum total of the probabilities of all the possible histories at time period $t$ is 1. In words, the expected payoff in stage $t$ is at least $\bar{v}_i$. By averaging over the $T$ stages of the game, we conclude that the expected average of the payoffs is at least $\bar{v}_i$:
$$\gamma_i^T(\tau_i^*, \tau_{-i}) = \frac{1}{T} \sum_{t=1}^{T} E_{\tau_i^*, \tau_{-i}}[u_i^t] \ge \frac{1}{T} \sum_{t=1}^{T} \bar{v}_i = \bar{v}_i, \tag{13.20}$$
which is what we wanted to show.
Define a set of payoff vectors $V$ by
$$V := \left\{ x \in \mathbb{R}^N : x_i \ge \bar{v}_i \text{ for each player } i \in N \right\}. \tag{13.21}$$
This is the set of payoff vectors at which every player receives at least his minmax value. The set is called the set of individually rational payoffs. Theorem 13.8 implies that the set of equilibrium payoffs is contained in $V$.
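For a two-player base game in which the opponent has exactly two actions, the minmax value of Equation (13.14) can be computed exactly. The Python sketch below is an illustration under that restriction only (it is not a general method for $n$ players): the opponent’s mixed action is a single number $q$, the best-reply payoff is a maximum of linear functions of $q$, and its minimum is attained at $q = 0$, at $q = 1$, or where two of the lines cross. The function name `minmax_value` and the data layout are hypothetical; the payoffs used in the example are those of the Prisoner’s Dilemma (Figure 13.9) and of the game of Chicken (Figure 13.7).

```python
from fractions import Fraction as F


def minmax_value(payoff):
    """Minmax value of a player whose opponent has exactly two actions.

    payoff maps each of the player's actions to a pair of integers:
    (payoff against the opponent's first action, payoff against the second).
    """
    rows = list(payoff.values())

    def best_reply_payoff(q):
        # q is the probability that the opponent plays their first action.
        return max(q * r[0] + (1 - q) * r[1] for r in rows)

    # The best-reply payoff is convex and piecewise linear in q, so its minimum
    # is attained at an endpoint or at a crossing point of two of the lines.
    candidates = {F(0), F(1)}
    for r in rows:
        for s in rows:
            slope_diff = (r[0] - r[1]) - (s[0] - s[1])
            if slope_diff != 0:
                q = F(s[1] - r[1], slope_diff)
                if 0 <= q <= 1:
                    candidates.add(q)
    return min(best_reply_payoff(q) for q in candidates)


# Player I in the Prisoner's Dilemma: payoffs against (D, C).
pd_value = minmax_value({"D": (1, 4), "C": (0, 3)})
# Player I in the game of Chicken: payoffs against (L, R).
chicken_value = minmax_value({"T": (6, 2), "B": (7, 0)})
```

The sketch returns 1 for the Prisoner’s Dilemma and 2 for Chicken, in agreement with the minmax values quoted in the text for these two games.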
13.4 Characterization of the set of equilibrium payoffs of the T-stage repeated game
For every set of vectors $\{x_1, \ldots, x_K\}$ in $\mathbb{R}^N$, denote by $\mathrm{conv}\{x_1, \ldots, x_K\}$ the smallest convex set that contains $\{x_1, \ldots, x_K\}$. The players play some action vector $s$ in $S$ in each stage; hence the payoff vector in each stage is one of the vectors $\{u(s), s \in S\}$. In particular, the average payoff of the players, which is equal to $\frac{1}{T} \sum_{t=1}^{T} u(s^t)$, is necessarily located in the convex hull of these vectors (because it is a weighted average of the vectors in this set), which we denote by $F$:
$$F := \mathrm{conv}\{u(s), s \in S\}. \tag{13.22}$$
This set is called the set of feasible payoffs. We thus have $\gamma^T(\tau) \in F$ for every strategy vector $\tau$. Using the last remark, and Theorem 13.8, we deduce that the set of equilibrium payoffs is contained in the set $F \cap V$ of feasible and individually rational payoff vectors. As we now show, if the base game satisfies a certain technical condition, then for every feasible and individually rational payoff vector $x$ there exists an equilibrium payoff vector of the $T$-stage game that is close to it, for sufficiently large $T$. The technical condition that is needed here is that, for every player $i$, it is possible to find an equilibrium of the base game at which the payoff to player $i$ is strictly greater than his minmax value.
Theorem 13.9 (The Folk Theorem³) Suppose that for every player $i \in N$ there exists an equilibrium $\beta(i)$ in the base game $\Gamma = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ satisfying $u_i(\beta(i)) > \bar{v}_i$. Then for every $\varepsilon > 0$ there exists $T_0 \in \mathbb{N}$ such that for every $T \ge T_0$, and every feasible and individually rational payoff vector $x \in F \cap V$, there exists an equilibrium $\tau^*$ of the $T$-stage game $\Gamma_T$ whose corresponding payoff is $\varepsilon$-close to $x$ (in the maximum norm⁴):
$$\|\gamma^T(\tau^*) - x\|_\infty < \varepsilon. \tag{13.23}$$
Under every equilibrium $\beta$ of the base game, $u_i(\beta) \ge \bar{v}_i$ for every player $i$ (as implied by Theorem 13.8 for $T = 1$). The condition of the theorem requires furthermore that, for every player $i$, there exist an equilibrium at which that inequality is a strict inequality.
Remark 13.10 One can choose the minimal length $T_0$ in Theorem 13.9 to be independent of $x$. To see this, note that since $F \cap V$ is a compact set, given $\varepsilon$ there exists a finite set $x^1, x^2, \ldots, x^J$ of vectors in $F \cap V$ such that the distance between each vector $x \in F \cap V$ and at least one of the vectors $x^1, x^2, \ldots, x^J$ is at most $\frac{\varepsilon}{2}$:
$$\max_{x \in F \cap V} \; \min_{1 \le j \le J} \|x - x^j\|_\infty \le \frac{\varepsilon}{2}. \tag{13.24}$$
Denote by $T_0(x^j, \frac{\varepsilon}{2})$ the size of $T_0$ in Theorem 13.9 corresponding to $x^j$ and $\frac{\varepsilon}{2}$. Let $x \in F \cap V$, and let $j_0 \in \{1, 2, \ldots, J\}$ be an index satisfying $\|x - x^{j_0}\|_\infty \le \frac{\varepsilon}{2}$. By the triangle inequality, every equilibrium $\tau$ of the $T$-stage repeated game satisfying $\|\gamma^T(\tau) - x^{j_0}\|_\infty \le \frac{\varepsilon}{2}$ also satisfies $\|\gamma^T(\tau) - x\|_\infty \le \varepsilon$. It follows that the statement of Theorem 13.9 holds for $x$ and $\varepsilon$ with $T_0 := \max_{1 \le j \le J} T_0(x^j, \frac{\varepsilon}{2})$, and this $T_0$ is independent of $x$.
³ The name of the Folk Theorem is borrowed from the analogous theorem (see Theorem 13.17) for infinitely repeated games, which was well known in the scientific community for many years, despite the fact that it was not formally published in any journal article, and hence it was called a “folk theorem.” The theorem is now usually ascribed to Aumann and Shapley [1994]. The Folk Theorem for finite games, Theorem 13.9, was proved by Benoit and Krishna [1985].
⁴ The maximum norm over $\mathbb{R}^n$ is defined as follows: $\|x\|_\infty = \max_{i=1,2,\ldots,n} |x_i|$ for each vector $x \in \mathbb{R}^n$.
13.4.1 Proof of the Folk Theorem: example
Before we prove the theorem, we present an example that illustrates the proof. Consider the two-player game in Figure 13.7 (this is the game of Chicken; see Example 8.3 on page 303). The minmax value of both players is 2. The punishment strategy against Player I is $R$, and the punishment strategy against Player II is $B$.
                 Player II
                 L        R
Player I    T    6, 6     2, 7
            B    7, 2     0, 0
Figure 13.7 The payoff matrix of the game of Chicken
The game has two equilibria in pure strategies, $(T, R)$ and $(B, L)$, with payoffs $(2, 7)$ and $(7, 2)$ respectively (we will not use the equilibrium in mixed strategies). If we denote
$$\beta(I) = (B, L), \qquad \beta(II) = (T, R), \tag{13.25}$$
we deduce that the condition of Theorem 13.9 holds (because $u_i(\beta(i)) = 7 > 2 = \bar{v}_i$ for $i \in \{I, II\}$). The payoff vector $(3, 3)$ is in $F$, since $(3, 3) = \frac{1}{2}(0, 0) + \frac{1}{2}(6, 6)$. It is also in $V$, because both of its coordinates are greater than or equal to 2, which is the minmax value of both players. It is therefore in $F \cap V$. We will now construct an equilibrium of the 100-stage game, whose average payoff is close to $(3, 3)$. If the players play $(T, L)$ in odd-numbered stages (yielding the payoff $(6, 6)$ in every odd-numbered stage) and play $(B, R)$ in even-numbered stages (yielding the payoff $(0, 0)$ in every even-numbered stage), the average payoff is $(3, 3)$. This does not yet constitute an equilibrium, because every player can profit by deviating at every stage. Because this sequence of actions is deterministic, any deviation from it is immediately detected, and the other player can then implement the punishment strategy. The punishment strategy guarantees that the deviating player receives at most 2 in every stage after the deviation, which is less than the average of 3 that he can receive if he avoids deviating. Because the repeated game in this case is finite, a threat to implement a punishment strategy is effective only if there are sufficiently many stages left to guarantee that the loss imposed on a deviating player is greater than the reward he stands to gain by deviating. If, for example, a player deviates in the last stage, he cannot be punished because there are no more stages, and he therefore stands to gain by such a deviation. This detail has to be taken into consideration in constructing an equilibrium. We now describe a strategy vector defined by a basic plan of action and a punishment strategy. The basic plan of action is depicted in Figure 13.8, and consists of 49 cycles, each comprised of two stages, along with a tail-end that is also comprised of two stages.
In the first 98 stages, the players alternately play the action vectors $(T, L)$ and $(B, R)$, thereby guaranteeing that the average payoff in these stages is $(3, 3)$, with the average payoff in all 100 stages close to $(3, 3)$. In these stages, they play according to a deterministic plan of action; hence if one of them deviates from this plan, the other immediately takes note of this deviation. Once one player deviates at a certain stage, the other player implements the punishment strategy against the deviator, from the next stage on: if Player II deviates, Player I plays $B$ from the next stage to the end of the play of the game. If Player I deviates, Player II plays $R$ from the next stage to the end of the play of the game. In the last two stages of the basic plan of action, the players play the pure strategy equilibria $\beta(I)$ and $\beta(II)$ (in that order).
Stage                  1   2   3   4   ...   97   98   99   100
Player I’s actions     T   B   T   B   ...   T    B    B    T
Player II’s actions    L   R   L   R   ...   L    R    L    R
Player I’s payoff      6   0   6   0   ...   6    0    7    2
Player II’s payoff     6   0   6   0   ...   6    0    2    7
Figure 13.8 An equilibrium in the 100-stage game of Chicken
We now show that this strategy vector is an equilibrium yielding an average payoff that is close to $(3, 3)$. Indeed, if the players follow this strategy vector, the average payoff is
$$\tfrac{49}{100}(6, 6) + \tfrac{49}{100}(0, 0) + \tfrac{1}{100}(7, 2) + \tfrac{1}{100}(2, 7) = (3.03, 3.03), \tag{13.26}$$
which is close to $(3, 3)$. We next turn to ascertaining that Player I cannot gain by deviating (ascertaining that Player II cannot gain by deviating is conducted in a similar way). In each of the last two stages (the tail-end of the action plan), the two players play an equilibrium of the base game, and therefore Player I cannot gain by deviating in those stages. Suppose, therefore, that Player I deviated during one of the first 98 stages. In the cycle at which he deviates for the first time, he can gain at most 3, relative to the payoff he would receive at that cycle by following the basic action plan. To see this, note that if he deviates in the second stage of the cycle (playing $T$ instead of $B$), he gains 2 at that stage. If he deviates in the first stage of the cycle (playing $B$ instead of $T$), he gains 1 at that stage, and if he then plays $T$ instead of $B$ in the second stage of the cycle he gains 2 at that stage, and in total he gains 3 at that cycle ($7 + 2$ instead of $6 + 0$ according to the basic plan). In each of the following cycles he loses (because he receives at most 2 in every stage of the cycle, instead of receiving 6 in the first stage and 0 in the second stage of the cycle, as he would receive under the basic plan of action). Finally, at stage 99 he loses 5: he will receive at most 2 rather than the 7 that he receives in the basic plan of action (see Figure 13.8). In sum total, the deviation leads to a loss of at least $5 - 3 = 2$, relative to the payoff he would receive by following the basic action plan, and therefore Player I cannot gain by deviating. In the construction depicted here, we have split the stages into cycles of length 2, because the payoff $(3, 3)$ is the average of two payoff vectors of the matrix.
If we had wanted to construct an equilibrium with a payoff that is, say, close to $(3\tfrac{1}{2}, 4\tfrac{3}{4})$ (which is also in $F \cap V$), then, since $(3\tfrac{1}{2}, 4\tfrac{3}{4}) = \tfrac{1}{4}(0, 0) + \tfrac{1}{2}(6, 6) + \tfrac{1}{4}(2, 7)$, we would have constructed an equilibrium using cycles of length 4: except for the last stages, the players would repeatedly play the action vectors
$$(B, R),\ (T, L),\ (T, L),\ (T, R). \tag{13.27}$$
We can mimic the construction above whenever the target payoff can be obtained as the weighted average of the payoff vectors in the matrix, with rational weights.
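Turning rational weights into a cycle is mechanical, as the following Python sketch illustrates for the payoff matrix of Figure 13.7. The function names `build_cycle` and `average_payoff` are hypothetical, and `math.lcm` with several arguments assumes Python 3.9 or later. The cycle length is taken to be the least common multiple of the denominators of the weights, and each action vector is repeated in proportion to its weight.

```python
from fractions import Fraction as F
from math import lcm

# Payoffs (Player I, Player II) of the game of Chicken in Figure 13.7.
PAYOFF = {("T", "L"): (6, 6), ("T", "R"): (2, 7), ("B", "L"): (7, 2), ("B", "R"): (0, 0)}


def build_cycle(weights):
    """List of action vectors in which each one appears in proportion to its weight."""
    K = lcm(*(w.denominator for w in weights.values()))
    cycle = []
    for action_vector, w in weights.items():
        cycle += [action_vector] * int(w * K)
    return cycle


def average_payoff(cycle):
    """Exact average payoff vector over one cycle."""
    return tuple(F(sum(PAYOFF[s][i] for s in cycle), len(cycle)) for i in range(2))


weights = {("B", "R"): F(1, 4), ("T", "L"): F(1, 2), ("T", "R"): F(1, 4)}
cycle = build_cycle(weights)
```

With these weights the sketch produces the cycle of length 4 listed in (13.27), whose average payoff is (7/2, 19/4).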
Since the target payoff is in $F$, it can always be obtained as a weighted average of payoffs. If the weights are irrational, we need to approximate them using rational weights. The role of the tail-end (the last two stages in the above example) is to guarantee that a deviating player loses. During the course of the tail-end, the players cyclically play the equilibria $\beta(1), \ldots, \beta(n)$. The expected payoff of each player $i$ under each of these equilibria is greater than or equal to $\bar{v}_i$ (because they are equilibria) and under $\beta(i)$ it is strictly greater than $\bar{v}_i$. That is why, if the other players punish player $i$ by reducing his payoff to $\bar{v}_i$, he loses in the tail-end. The tail-end needs to be sufficiently long for the total loss to be greater than the maximal gain that a player can obtain by deviating. On the other hand, the tail-end needs to be sufficiently short, relative to the length of the game, for the overall payoff to be close to the target payoff (which is the average payoff in a single cycle). In the formulation of the Folk Theorem, the equilibrium payoff does not equal the target payoff $x$; the best we can do is obtain a payoff that is close to it. This stems from two reasons:
1. The existence of the tail-end, in which the payoff is not the target payoff.
2. It may be the case that $x$ cannot be expressed as the weighted average of payoff vectors of the matrix using rational weights, which then requires approximating these weights using rational weights.
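The accounting in the 100-stage example can be replayed in a few lines. The Python sketch below is only an illustration of that example: it builds the basic plan of Figure 13.8, computes its average payoff, and then, for every possible first deviation of Player I in the first 98 stages, compares his total payoff from that stage on (deviation payoff plus the best he can get against the punishment $R$ in all later stages) with what the basic plan would have given him. The function names are hypothetical, and only Player I is checked; Player II’s case is symmetric.

```python
from fractions import Fraction as F

# Payoffs (Player I, Player II) of the game of Chicken in Figure 13.7.
U = {("T", "L"): (6, 6), ("T", "R"): (2, 7), ("B", "L"): (7, 2), ("B", "R"): (0, 0)}


def basic_plan(cycles=49):
    """(T,L),(B,R) repeated `cycles` times, then the tail-end beta(I), beta(II)."""
    return [("T", "L"), ("B", "R")] * cycles + [("B", "L"), ("T", "R")]


def average_payoff(plan):
    return tuple(F(sum(U[s][i] for s in plan), len(plan)) for i in range(2))


def best_gain_player_one(plan, tail=2):
    """Largest total gain of Player I over all first deviations in the cyclic part,
    assuming Player II plays R in every stage after the deviation."""
    on_path = [U[s][0] for s in plan]
    punished = max(U[(a, "R")][0] for a in ("T", "B"))  # best reply to R
    gains = []
    for t0 in range(len(plan) - tail):
        own, other = plan[t0]
        deviation = "B" if own == "T" else "T"
        total = U[(deviation, other)][0] + punished * (len(plan) - t0 - 1)
        gains.append(total - sum(on_path[t0:]))
    return max(gains)


plan = basic_plan()
```

For this plan the average payoff is (303/100, 303/100), as in (13.26), and the best deviation changes Player I’s total payoff by −2, so every deviation in the first 98 stages costs him at least 2.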
13.4.2 Detailed proof of the Folk Theorem
We will now generalize the construction in the example of the previous section to all repeated games. For every real number $c$, denote by $\lceil c \rceil$ the least integer that is greater than or equal to $c$, and by $\lfloor c \rfloor$ the greatest integer that is less than or equal to $c$. Recall that $M = \max_{i \in N} \max_{s \in S} |u_i(s)|$ is the maximal payoff of the game (in absolute value).
Step 1: Determining the cycle length. We first show that every vector in $F$ can be approximated by a weighted average of the vectors $(u(s))_{s \in S}$, with rational weights sharing the same denominator. The proof of the following theorem is left to the reader (Exercise 13.13).
Theorem 13.11 For every $K \in \mathbb{N}$ and every vector $x \in F$ there are nonnegative integers $(k_s)_{s \in S}$ summing to $K$ satisfying
$$\left\| \sum_{s \in S} \frac{k_s}{K} u(s) - x \right\|_\infty \le \frac{M \times |S|}{K}. \tag{13.28}$$
For $\varepsilon > 0$ and $x \in F \cap V$, let $K$ be a natural number satisfying $K \ge \frac{2M \times |S|}{\varepsilon}$ and let $(k_s)_{s \in S}$ be nonnegative integers summing to $K$ satisfying Equation (13.28). If the players implement cycles of length $K$, and in each cycle they play each action vector $s \in S$ exactly $k_s$ times, then the average payoff over the course of the cycle is $\sum_{s \in S} \frac{k_s}{K} u(s)$, and the distance between this average payoff and $x$ is at most $\frac{M \times |S|}{K}$.
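One possible way to obtain integers $(k_s)$ as in Theorem 13.11 is sketched below in Python; it is an assumption of this illustration, not necessarily the argument intended in Exercise 13.13. Starting from any representation $x = \sum_s \alpha_s u(s)$ with convex weights $\alpha_s$, round each $\alpha_s K$ down and hand the remaining units to the entries with the largest fractional parts. Each $k_s / K$ then differs from $\alpha_s$ by at most $1/K$, so the payoff error is at most $M \times |S| / K$. The function names and the example weights are hypothetical; the payoffs are those of Figure 13.7.

```python
from fractions import Fraction as F
from math import floor


def integer_weights(alpha, K):
    """Nonnegative integers k_s summing to K with |k_s / K - alpha_s| <= 1 / K.

    alpha maps action vectors to exact convex weights (Fractions summing to 1).
    """
    k = {s: floor(a * K) for s, a in alpha.items()}
    leftover = K - sum(k.values())  # an integer between 0 and len(alpha) - 1
    # Give the leftover units to the entries with the largest fractional parts.
    by_remainder = sorted(alpha, key=lambda s: alpha[s] * K - k[s], reverse=True)
    for s in by_remainder[:leftover]:
        k[s] += 1
    return k


def max_norm_error(alpha, k, K, payoff):
    """Maximum-norm distance between sum_s (k_s/K) u(s) and sum_s alpha_s u(s)."""
    players = range(len(next(iter(payoff.values()))))
    return max(
        abs(sum((F(k[s], K) - alpha[s]) * payoff[s][i] for s in alpha)) for i in players
    )


PAYOFF = {("T", "L"): (6, 6), ("T", "R"): (2, 7), ("B", "L"): (7, 2), ("B", "R"): (0, 0)}
alpha = {("T", "L"): F(1, 3), ("T", "R"): F(1, 3), ("B", "L"): F(1, 3)}  # x = (5, 5)
K = 10
k = integer_weights(alpha, K)
error = max_norm_error(alpha, k, K, PAYOFF)
```

In this example $M = 7$ and $|S| = 4$, so the bound in (13.28) is $28/10$; the error produced by the sketch is well below it.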
Step 2: Defining the strategy vector $\tau^*$. We next define a strategy vector $\tau^*$ of the $T$-stage game, which depends on two variables, $R$ and $L$, to be defined later. The $T$ stages of the game are divided into $R$ cycles of length $K$ and a tail of length $L$:
$$T = RK + L. \tag{13.29}$$
These variables will be set in such a way that the following two properties are satisfied: $R$ will be sufficiently large for the average payoff according to $\tau^*$ to be close to $x$, and $L$ will be sufficiently large for $\tau^*$ to be an equilibrium. In each cycle, the players play every action vector $s \in S$ exactly $k_s$ times. In the tail-end, the players cycle through the equilibria $\beta(1), \ldots, \beta(n)$. In other words, each player $j$ plays the mixed action $\beta_j(1)$ in the first stage, and in stages $n + 1$, $2n + 1$, etc., of the tail-end; he plays the mixed action $\beta_j(2)$ in the second stage, and in stages $n + 2$, $2n + 2$, etc., of the tail-end, and so on. The basic plan that we have defined for the first $RK$ stages is deterministic: the players do not choose their actions randomly in these stages. It follows that if a player deviates from the basic plan in one of the first $RK$ stages, this deviation is detected by the other players. In this case, from the next stage on, the other players punish the deviator: at every subsequent stage they implement a punishment strategy vector against the deviator. If a player deviates for the first time in one of the $L$ final stages, the other players do not punish him, and instead continue cycling through the equilibria $\{\beta(i)\}_{i \in N}$.
Step 3: The constraints on $R$ and $L$ needed to ensure that the distance between the average payoff under $\tau^*$ and $x$ is at most $\varepsilon$. Suppose that the players implement the strategy vector $\tau^*$. Given the choice of $(k_s)_{s \in S}$, the distance between the average payoff in every cycle of length $K$ and $x$ is at most $\frac{M \times |S|}{K}$. This also holds true for any integer number of repetitions of the cycle. By the choice of $K$, one has $\frac{M \times |S|}{K} \le \frac{\varepsilon}{2}$, and hence the distance between the average payoff in the first $RK$ stages and $x$ is at most $\frac{\varepsilon}{2}$. If the length of the tail-end $L$ is small relative to $RK$, the average payoff in the entire game will be close to $x$. We will ascertain that if
$$L \le \frac{KR\varepsilon}{4M}, \tag{13.30}$$
then the distance between the average payoff in the entire game and $x$ is at most $\varepsilon$. Indeed, the distance between the average payoff in the first $RK$ stages and $x$ is at most $\frac{\varepsilon}{2}$, and the distance between the average payoff in the last $L$ stages and $x$ is at most $2M$. Therefore the average payoff in the entire game is within $\varepsilon$ of $x$, as long as
$$\frac{RK \cdot \frac{\varepsilon}{2} + 2ML}{T} \le \varepsilon. \tag{13.31}$$
Since $T = RK + L > RK$, it suffices to require that
$$\frac{RK \cdot \frac{\varepsilon}{2} + 2ML}{RK} \le \varepsilon, \tag{13.32}$$
and this inequality is equivalent to Equation (13.30).
Step 4: $\tau^*$ is an equilibrium. Suppose that player $i$ first deviates from the basic plan at stage $t_0$. We will ascertain here that his average payoff cannot increase by such a deviation. Suppose first that $t_0$ is in the tail-end: $t_0 > RK$. Since throughout the tail the players play an equilibrium of the base game at every stage, player $i$ cannot increase his average payoff by such a deviation. Suppose next that $t_0 \le RK$. Then player $i$’s deviation triggers a punishment strategy against him from stage $t_0 + 1$. It follows that from stage $t_0 + 1$ player $i$’s payoff at each stage is at most his minmax value $\bar{v}_i$. If $L \ge n$, by the condition that $u_i(\beta(i)) > \bar{v}_i$ we deduce that at each $n$ consecutive stages in the tail-end, player $i$ loses by the deviation at least $u_i(\beta(i)) - \bar{v}_i$, relative to his payoff at the equilibrium strategy. Denote $\delta_i = u_i(\beta(i)) - \bar{v}_i > 0$, and $\delta = \min_{i \in N} \delta_i > 0$. The maximal profit that player $i$ can gain by deviating up to stage $RK$ is $2KM$: because the payoffs are between $-M$ and $M$, player $i$ can gain at most $2M$ by deviating in any single stage; hence in a cycle in which he deviates, a player can gain⁵ at most $2KM$. The player cannot gain in any of the subsequent cycles, because the average payoff in a cycle under the equilibrium strategy is $x$, while if a player deviates, he receives at most $\bar{v}_i$, and $\bar{v}_i \le x_i$. For a punishment to be effective, we need to require that the tail-end be sufficiently long to ensure that the losses at the tail-end exceed the possible gains in the cycle in which the deviation occurred:
$$\left\lfloor \frac{L}{n} \right\rfloor \delta > 2KM. \tag{13.33}$$
In this calculation, we have rounded down $L/n$. In every $n$ stages of the tail-end, every player is punished only once. If $L$ is not divisible by $n$, some of the players are punished $\lfloor \frac{L}{n} \rfloor$ times, and some are punished $\lceil \frac{L}{n} \rceil$ times. Equation (13.33) gives us the required minimal length of the tail-end
$$L > n \left(1 + \frac{2KM}{\delta}\right). \tag{13.34}$$
The length of the tail-end, $L$, cannot be constant for all $T$, because $T - L$ needs to be divisible by $K$. It suffices to use tail-ends whose length is at least $n\left(1 + \frac{2KM}{\delta}\right)$, and at most $n\left(1 + \frac{2KM}{\delta}\right) + K$.
Step 5: Establishing $T_0$. The length of the game, $T$, satisfies $T = RK + L$. From Equation (13.30), we need to require that $R \ge \frac{4ML}{K\varepsilon}$, i.e., $T = RK + L \ge L\left(1 + \frac{4M}{\varepsilon}\right)$. This, along with Equation (13.34), implies that the length of the game must satisfy
$$T > n \left(1 + \frac{2KM}{\delta}\right) \left(1 + \frac{4M}{\varepsilon}\right). \tag{13.35}$$
⁵ If a player deviates at any stage, from the next stage on his one-stage expected payoff is at most his minmax value, but it is possible that in the basic plan during the cycle there may be stages in which his payoff is less than his minmax value. For example, in the equilibrium constructed in the example in Section 13.4.1 (page 531), in the even stages the payoff to each player is 0, while the minmax value of each player is 2. It is therefore possible for a player to gain at more than one stage by deviating.
We can therefore set $T_0$ to be the value of the right-hand side of Equation (13.35). This concludes the proof of Theorem 13.9.
Remark 13.12 As mentioned above, the only equilibrium payoff in the finitely repeated Prisoner’s Dilemma is $(1, 1)$. This does not contradict Theorem 13.9, because the conditions of the theorem do not hold in this case: the only equilibrium of the one-stage Prisoner’s Dilemma is $(D, D)$, and the payoff to both players at this equilibrium is 1, which is the minmax value of both players. The proof of the uniqueness of the equilibrium payoff in the $T$-stage Prisoner’s Dilemma is based on the existence of a last stage in the game. In the next section we will study repeated games of infinite length, and show that in that case, the repeated Prisoner’s Dilemma has more than one equilibrium payoff.
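To get a feeling for the sizes involved in the proof of Theorem 13.9, the bounds of Steps 1, 4, and 5 can be evaluated for a concrete game. The Python sketch below does this for the game of Chicken of Section 13.4.1, for which $M = 7$, $|S| = 4$, $n = 2$, and $\delta = 7 - 2 = 5$; the choice $\varepsilon = 1/10$ and the function name `folk_theorem_parameters` are assumptions of this illustration. The bounds are sufficient rather than tight: the hand-built equilibrium of Figure 13.8 uses a tail-end of only two stages in a 100-stage game.

```python
from fractions import Fraction as F
from math import ceil, floor


def folk_theorem_parameters(M, num_action_vectors, n, delta, eps):
    """Cycle length K, smallest admissible tail length, and the bound (13.35)."""
    # Step 1: smallest natural number K with K >= 2 M |S| / eps.
    K = ceil(F(2 * M * num_action_vectors) / eps)
    # Step 4: smallest integer L with L > n (1 + 2KM / delta), as in (13.34).
    tail_bound = n * (1 + F(2 * K * M) / delta)
    L_min = floor(tail_bound) + 1
    # Step 5: right-hand side of (13.35).
    T0_bound = tail_bound * (1 + F(4 * M) / eps)
    return K, L_min, T0_bound


K, L_min, T0_bound = folk_theorem_parameters(M=7, num_action_vectors=4, n=2, delta=5, eps=F(1, 10))
```

For these inputs the sketch gives $K = 560$, a tail-end of at least 3139 stages, and a right-hand side of (13.35) equal to 881,778.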
13.5 Infinitely repeated games
As noted above, the strategy vector constructed in the previous section is highly dependent on the length of the game: it cannot be implemented unless the players know the length of the game. However, it is often the case that the length of a repeated game is not known ahead of time. For example, the owner of a tennis-goods shop does not know if or when he will sell his shop, tennis players do not know when they will stop playing tennis, nor if or when they will move to another town. Infinitely repeated games can serve to model finite but extremely long repeated games, in which (a) the number of stages is unknown, (b) the players ascribe no importance to the last stage of the game, or (c) at every stage the players believe that the game will continue for several more stages. In this section, we will present a model of infinitely repeated games, and characterize the set of equilibria of such games. The definitions in this section are analogous to the definitions in the section on T -stage games. As the next example shows, extending games to an infinite number of repetitions leads to new equilibrium payoffs: payoff vectors that cannot be obtained as limits of sequences of equilibrium payoffs in finite games whose lengths increase to infinity.
Example 13.1 (Continued) Recall the repeated Prisoner’s Dilemma, given by the payoff matrix in Figure 13.9.
                 Player II
                 D        C
Player I    D    1, 1     4, 0
            C    0, 4     3, 3
Figure 13.9 The Prisoner’s Dilemma
Consider the repeated Prisoner’s Dilemma in the case where the players repeat playing the basic game ad infinitum. In this case, every player receives an infinite sequence of payoffs: one payoff per stage of the game. We will assume that every player strives to maximize the limit of the average payoff he receives. Certain technical issues, such as what happens when the limit of the average payoffs does not exist, will be temporarily ignored (we will consider this issue later in this chapter). Whereas the only equilibrium payoff of the $T$-stage repeated game is $(1, 1)$, in the infinitely repeated game there are additional equilibrium payoffs. Let us look, for example, at the following strategy: in the first stage play $C$. In every subsequent stage, if the other player chose $C$ in every stage since the game started, choose $C$ in the current stage; otherwise choose $D$. This is an unforgiving strategy that is called the Grim-Trigger Strategy: as long as the opponent cooperates, you also cooperate, but if he fails to cooperate once, defect forever from that point on in the game. When both players implement this strategy, no player has an incentive to deviate. To see why, note that at this strategy vector, every player receives 3 in each stage; hence every player’s average payoff is 3. If a player deviates at some stage and plays $D$, he receives 4 in that stage, instead of 3, but from the next stage on, the other player plays $D$, and then the most that the deviating player receives in every stage is 1. In particular, the limit of the average payoff of the deviating player is at most 1. Thus, the payoff vector $(3, 3)$ is an equilibrium payoff of the infinitely repeated game, despite not being an equilibrium payoff of the $T$-stage game.
◭
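The effect of the Grim-Trigger Strategy on long-run averages can be seen in a finite simulation. The Python sketch below is an illustration only: it plays the Prisoner’s Dilemma of Figure 13.9 for a large but finite number of stages, once with both players using grim trigger and once with Player I switching to $D$ from a given stage on. The deviating strategy `defect_from` and all function names are hypothetical, and a finite average only approximates the limit discussed in the text.

```python
from fractions import Fraction as F

# Payoffs (Player I, Player II) of the Prisoner's Dilemma in Figure 13.9.
U = {("D", "D"): (1, 1), ("D", "C"): (4, 0), ("C", "D"): (0, 4), ("C", "C"): (3, 3)}


def grim_trigger(opponent_history):
    """Play C as long as the opponent has always played C; otherwise play D."""
    return "C" if all(a == "C" for a in opponent_history) else "D"


def defect_from(stage):
    """Hypothetical deviation: follow grim trigger, but play D from `stage` on."""
    def strategy(opponent_history):
        current_stage = len(opponent_history) + 1
        return "D" if current_stage >= stage else grim_trigger(opponent_history)
    return strategy


def average_payoffs(strategy_one, strategy_two, T):
    """Exact average payoffs over the first T stages."""
    history_one, history_two = [], []
    totals = [0, 0]
    for _ in range(T):
        a1, a2 = strategy_one(history_two), strategy_two(history_one)
        totals[0] += U[(a1, a2)][0]
        totals[1] += U[(a1, a2)][1]
        history_one.append(a1)
        history_two.append(a2)
    return F(totals[0], T), F(totals[1], T)


cooperative = average_payoffs(grim_trigger, grim_trigger, 1000)
deviating = average_payoffs(defect_from(5), grim_trigger, 1000)
```

Over 1000 stages the cooperative averages are exactly (3, 3), while a deviation at stage 5 gives Player I an average of 1011/1000, which approaches 1 as the number of stages grows.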
Definition 13.13 A behavior strategy for player i (in the infinitely repeated game) is a function mapping every finite history to a mixed action:
\[ \tau_i : \bigcup_{t=0}^{\infty} H(t) \to \Sigma_i. \tag{13.36} \]
The collection of all of player i’s strategies in the infinitely repeated game is denoted by \(\mathcal{B}_i^{\infty}\).
Remark 13.14 If \(\tau_i(h) \in S_i\) for every finite history \(h \in \bigcup_{t=1}^{\infty} H(t)\), then the strategy \(\tau_i\) is a pure strategy. Note that even when the sets of actions of the players are finite, the set of pure strategies available to them has the cardinality of the continuum.
Denote by H(∞) the collection of all possible plays of the infinitely repeated game:
\[ H(\infty) = S^{\mathbb{N}}. \tag{13.37} \]
An element of this set is an infinite sequence \((s^1, s^2, \ldots)\) of action vectors, where \(s^t = (s_i^t)_{i \in N}\) is the action vector of the players in stage t. The results in Section 6.4 (page 238) show that every vector of behavior strategies \(\tau = (\tau_i)_{i \in N}\) induces a probability distribution \(P_\tau\) over the set H(∞) (which, together with the σ-algebra of cylinder sets, forms a measurable space; see Section 6.4 for the definitions of these notions). We denote by \(E_\tau\) the expectation operator that corresponds to the probability distribution \(P_\tau\).
To define an infinitely repeated game, we need to define, in addition to the sets of strategies of the players, their payoff functions. One way to try doing this is by taking the limit of Equation (13.6) as T goes to infinity, but this limit may not necessarily exist. In this section, we will define infinitely repeated games, and equilibria in infinitely repeated games, without explicitly defining payoff functions in such games. In the next section we will define discounted payoff functions for infinitely repeated games, and study the corresponding equilibrium notion, which turns out to be different from the equilibrium concept presented in this section.
Definition 13.15 Let \(\Gamma = (N, (S_i)_{i \in N}, (u_i)_{i \in N})\) be a game in strategic form. The infinitely repeated game \(\Gamma_\infty\) corresponding to Γ is the game whose set of players is N, and each player i’s set of strategies is \(\mathcal{B}_i^{\infty}\).
For every stage \(t \in \mathbb{N}\), and every player \(i \in N\), denote by \(u_i^t\) player i’s payoff in stage t. We next define the concept of the equilibrium of \(\Gamma_\infty\).
Definition 13.16 A strategy vector \(\tau^*\) is an equilibrium (of the infinitely repeated game \(\Gamma_\infty\)), with a corresponding equilibrium payoff \(x \in \mathbb{R}^N\), if with probability 1 according to \(P_{\tau^*}\), for each player \(i \in N\), the limit
\[ \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} u_i^t \tag{13.38} \]
exists, and for each strategy \(\tau_i\) of player i,
\[ E_{\tau^*}\left[ \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} u_i^t \right] = x_i \geq E_{\tau_i, \tau_{-i}^*}\left[ \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} u_i^t \right]. \tag{13.39} \]
In words, a strategy vector \(\tau^*\) is an equilibrium with payoff x if (a) the average payoff under \(\tau^*\) converges, (b) the expectation of its limit is x, and (c) no player can profit by deviating. Since there is no guarantee that every deviation leads to a well-defined limit of the average payoffs, we require that the expectation of the limit superior of the average payoffs after a deviation be not greater than the equilibrium payoff. The following theorem characterizes the set of equilibrium payoffs of the infinitely repeated game.
Theorem 13.17 (The Folk Theorem for the infinitely repeated game \(\Gamma_\infty\)) The set of equilibrium payoffs of \(\Gamma_\infty\) is the set F ∩ V.
That is, every payoff vector that is in the convex hull of the payoffs {u(s), s ∈ S} and is individually rational (i.e., that is not less than the minmax value \(\bar{v}_i\) for each player i, which a player cannot be prevented from attaining) is an equilibrium payoff. Note that the Folk Theorem for \(\Gamma_\infty\) and the Folk Theorem for \(\Gamma_T\) differ in two respects. First, in \(\Gamma_\infty\) we need not approximate a payoff to within ε; exact payoffs can be obtained. Second, in the finite repeated game, we required that for every player i there exist an equilibrium of the base game that gives player i a payoff that is strictly greater than his minmax value; this requirement is not needed for the Folk Theorem for \(\Gamma_\infty\). The differences between these two theorems are illustrated in the repeated Prisoner’s Dilemma. In that example, for every \(T \in \mathbb{N}\) the only equilibrium payoff of \(\Gamma_T\) is (1, 1) (see Example 13.1 on page 521), while according to Theorem 13.17, the set of equilibrium payoffs of \(\Gamma_\infty\) is the set W, shown in Figure 13.10.
Figure 13.10 The set of equilibrium payoffs of the infinitely repeated Prisoner’s Dilemma (figure not reproduced: the set W, drawn with both axes marked at 0, 1, 3, and 4)
Example 13.1 (Continued) Consider again the repeated Prisoner’s Dilemma of Figure 13.9. We will show that (1, 2), for example, is an equilibrium payoff of the infinitely repeated game. We do so by constructing an equilibrium leading to this payoff. Note first that
\[ (1, 2) = \tfrac{5}{8}(1, 1) + \tfrac{1}{8}(3, 3) + \tfrac{2}{8}(0, 4). \tag{13.40} \]
Define the pair of strategies \(\tau^* = (\tau_I^*, \tau_{II}^*)\) that repeatedly cycle through the action vectors
\[ (D, D), (D, D), (D, D), (D, D), (D, D), (C, C), (C, D), (C, D), \tag{13.41} \]
unless a player deviates, in which case the other player switches to the punishment action D. Formally:
• Player I repeatedly cycles through the actions D, D, D, D, D, C, C, C.
• Player II repeatedly cycles through the actions D, D, D, D, D, C, D, D.
• If one of the players deviates, and fails to play the action he is supposed to play under this plan, the other player chooses D in every subsequent stage of the game, forever.

Direct inspection shows that \(\tau^*\) is an equilibrium of the infinitely repeated game. By Equation (13.40) the average payoff at this equilibrium converges to (1, 2). We can similarly obtain every payoff vector in F ∩ V that is representable as a weighted average of the vectors \((u(s))_{s \in S}\), with rational weights, as an equilibrium payoff of the infinitely repeated game. When a payoff vector x ∈ F ∩ V cannot be represented as a weighted average with rational weights, we can use Theorem 13.11 to approximate x by way of a weighted average with rational weights. In other words, for every \(k \in \mathbb{N}\) we can find rational coefficients \((\lambda_s(k))_{s \in S}\) with denominator k such that
\[ \Bigl\| x - \sum_{s \in S} \lambda_s(k) u(s) \Bigr\|_\infty < \frac{1}{k}. \tag{13.42} \]
In this case, in the basic action plan, the players play in blocks, where the k-th block has k stages: in the first block the players play to obtain the average \(\sum_{s \in S} \lambda_s(1) u(s)\), in the second block they play to obtain the average \(\sum_{s \in S} \lambda_s(2) u(s)\), and so on. Recall that for every sequence \((z^k)_{k \in \mathbb{N}}\) of real numbers converging to z, the sequence of averages \(\bigl(\frac{1}{n} \sum_{k=1}^{n} z^k\bigr)_{n \in \mathbb{N}}\) also converges to z. Since the average payoff of the k-th block approaches x as k increases, the average payoff of the infinitely repeated game approaches x. If one of the players deviates, the other player plays D in every stage from that stage on, forever, and hence no player can profit by deviating. ◭
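The convergence of the average payoff along the cyclic plan of Equation (13.41) can be verified with exact arithmetic. This is only an illustrative sketch; the names `CYCLE` and `average_over` are introduced here and do not come from the text.

```python
from fractions import Fraction

# Stage payoffs of Figure 13.9:
# (action of Player I, action of Player II) -> (payoff to I, payoff to II).
PAYOFFS = {("D", "D"): (1, 1), ("D", "C"): (4, 0),
           ("C", "D"): (0, 4), ("C", "C"): (3, 3)}

# The cycle of action vectors in Equation (13.41).
CYCLE = [("D", "D")] * 5 + [("C", "C"), ("C", "D"), ("C", "D")]


def average_over(n_stages):
    """Average payoff vector over the first n_stages stages of the cyclic plan."""
    total_1 = total_2 = 0
    for t in range(n_stages):
        u1, u2 = PAYOFFS[CYCLE[t % len(CYCLE)]]
        total_1 += u1
        total_2 += u2
    return Fraction(total_1, n_stages), Fraction(total_2, n_stages)


if __name__ == "__main__":
    print(average_over(8))    # one full cycle
    print(average_over(803))  # a horizon that ends in the middle of a cycle
```

At the end of every full cycle the average is exactly (1, 2); in the middle of a cycle it differs from (1, 2) by an amount that shrinks as the horizon grows.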
The construction of the equilibrium strategy in the above example can be generalized to any repeated game, thus proving Theorem 13.17. The proof is left to the reader (Exercise 13.23). As stated at the beginning of this section, one reason to study infinitely repeated games is to obtain insights into very long finitely repeated games. To present the connection between infinitely repeated games and finite games, we define the concept of ε-equilibrium of finite games.
Definition 13.18 Let ε > 0, and let \(T \in \mathbb{N}\). A strategy vector \(\tau^*\) is an ε-equilibrium of \(\Gamma_T\) if for each player \(i \in N\) and any strategy \(\tau_i \in \mathcal{B}_i^T\),
\[ \gamma_i^T(\tau^*) \geq \gamma_i^T(\tau_i, \tau_{-i}^*) - \varepsilon. \tag{13.43} \]
If \(\tau^*\) is an ε-equilibrium of \(\Gamma_T\), it is perhaps possible for a player to profit by deviating, but his profit will be no greater than ε. The smaller ε is, the less motivation a player has to deviate. When ε = 0, we recover the definition of equilibrium, in which case no player has any motivation to deviate. If there is a cost for deviating, and the cost exceeds ε, then even at an ε-equilibrium, deviating is unprofitable. In this sense, ε-equilibria satisfy the property of being “almost stable,” where “almost” is measured by ε.
For every strategy vector τ in \(\Gamma_\infty\), and every \(T \in \mathbb{N}\), we can define the restriction of τ to the first T stages of the game. To avoid a plethora of symbols, we will denote such a restricted strategy vector by the same symbol, τ. A stronger formulation of the Folk Theorem for \(\Gamma_\infty\) relates the equilibria of the infinitely repeated game to ε-equilibria in long finitely repeated games. The proof of the theorem is left to the reader (Exercise 13.25).
Theorem 13.19 For every ε > 0 and every vector x ∈ F ∩ V there exist a strategy vector τ in \(\Gamma_\infty\) and \(T_0 \in \mathbb{N}\) that satisfy the following:
1. τ is an equilibrium of \(\Gamma_\infty\).
2. τ is an ε-equilibrium of \(\Gamma_T\), for all \(T \geq T_0\).
Example 13.1 (Continued) On page 522, we saw that the only equilibrium payoff of the T-stage repeated Prisoner’s Dilemma is (1, 1). It follows that for every payoff vector x ∈ F ∩ V that is not (1, 1), the corresponding strategy vector τ constructed on page 540, which is an equilibrium of the infinitely repeated game, is not an equilibrium of the T-stage repeated game, for any \(T \in \mathbb{N}\). However, for every ε > 0, for T sufficiently large, the strategy vector τ is an ε-equilibrium of the T-stage repeated game with an average payoff close to x. In other words, every payoff vector x ∈ F ∩ V can be supported by an ε-equilibrium in the T-stage repeated game, provided T is large enough (Exercise 13.24). ◭
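To make the notion of ε-equilibrium concrete, consider the grim-trigger pair in the T-stage Prisoner’s Dilemma of Figure 13.9, which yields each player an average payoff of 3. The sketch below is an illustration under stated assumptions rather than a proof: it only examines deviations of the form “cooperate until some stage, then defect from that stage on,” and the function names are hypothetical.

```python
from fractions import Fraction


def deviation_gain(T, first_defection):
    """Average gain in the T-stage game from first defecting at stage
    `first_defection` (1-indexed) against grim trigger, then defecting forever.

    Conforming yields 3 per stage. The deviator gets 3 before the deviation,
    4 in the deviation stage, and 1 in each stage that follows.
    """
    conforming_total = 3 * T
    deviating_total = 3 * (first_defection - 1) + 4 + 1 * (T - first_defection)
    return Fraction(deviating_total - conforming_total, T)


def max_deviation_gain(T):
    """Largest average gain over the deviations of the form considered above."""
    return max(deviation_gain(T, t) for t in range(1, T + 1))


if __name__ == "__main__":
    for T in (10, 100, 1000):
        print(T, max_deviation_gain(T))
```

Within this family the most profitable deviation is to defect in the last stage only, for an average gain of 1/T, so the gain falls below any fixed ε > 0 once T exceeds 1/ε.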
As the following example shows, it is not the case that every equilibrium of \(\Gamma_\infty\) is an ε-equilibrium of every sufficiently long finitely repeated game.
Example 13.20 Consider the two-player zero-sum game in Figure 13.11.
                     Player II
                     L        R
Player I     T       0       −1
             B       1        0

Figure 13.11 The base game in Example 13.20
The pure action B strictly dominates T, and R strictly dominates L. Elimination of strictly dominated actions reveals that the value of the (finitely or infinitely) repeated game is 0. Consider the following pair of strategies \(\tau^* = (\tau_I^*, \tau_{II}^*)\) in the infinitely repeated game:
• \(\tau_I^*\) instructs Player I to play T up to stage \((t_{II})^2\) and to play B thereafter, where \(t_{II}\) is the first stage in which Player II plays R (\(t_{II} = \infty\) if Player II plays L in every stage).
• \(\tau_{II}^*\) instructs Player II to play L up to stage \((t_I)^2\) and to play R thereafter, where \(t_I\) is the first stage in which Player I plays B (\(t_I = \infty\) if Player I plays T in every stage).
Thus, Player I plays T and checks whether Player II plays L (in which case the payoff is 0). As long as Player II plays L, Player I plays T. If at a certain stage (which we denote by \(t_{II}\)) Player II first plays R, Player I continues to play T for several stages (up to stage \((t_{II})^2\)) and from that stage on he punishes Player II by playing B in every subsequent stage. Player II’s strategy is defined analogously. At strategy vector \(\tau^*\), the players play (T, L) in every stage, and the payoff, in the infinitely repeated game, is 0. If one of the players (say Player I) deviates he may receive the higher payoff of 1 for several stages (the stages between \(t_I\) and \((t_I)^2\)), but afterwards he can receive at most 0 in every stage. The limit superior of the average payoff of Player I in \(\Gamma_\infty\) is therefore less than or equal to 0 even when he deviates: he cannot profit by deviating. In particular, \((\tau_I^*, \tau_{II}^*)\) is an equilibrium of the infinitely repeated game. However, in finite games, \((\tau_I^*, \tau_{II}^*)\) is not an ε-equilibrium for ε close to 0: for example, in the 99-stage repeated game, if Player I deviates from \(\tau_I^*\), and plays B from stage 10 onwards, his average payoff is \(\frac{90}{99}\). It follows that a deviation yields Player I a profit of \(\frac{90}{99}\). In general, in the T-stage game, playing against strategy \(\tau_{II}^*\), Player I has a deviation yielding him a payoff of \(\frac{T - \lceil \sqrt{T} \rceil}{T}\) (Exercise 13.26). Similarly, playing against strategy \(\tau_I^*\), Player II has a deviation yielding him a payoff of \(-\frac{T - \lceil \sqrt{T} \rceil}{T}\). It follows that when ε is sufficiently small, \((\tau_I^*, \tau_{II}^*)\) is not an ε-equilibrium in any finite repeated game \(\Gamma_T\), for large T. ◭
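The gap between the infinite game and long finite games in Example 13.20 can be reproduced by simulating Player II’s strategy against a deviation by Player I. The following is a small sketch under stated assumptions: `tau_II` encodes the strategy described in the example, while `deviation_average` (Player I plays T before a chosen stage and B from that stage on) and `ceil_sqrt` are hypothetical helpers introduced for illustration.

```python
import math
from fractions import Fraction

# Payoffs to Player I in Figure 13.11 (the game is zero-sum).
PAYOFF_I = {("T", "L"): 0, ("T", "R"): -1, ("B", "L"): 1, ("B", "R"): 0}


def tau_II(history_I):
    """Play L up to stage (t_I)^2 and R thereafter, where t_I is the first
    stage in which Player I played B (L forever if he never played B)."""
    stage = len(history_I) + 1  # current stage, 1-indexed
    if "B" not in history_I:
        return "L"
    t_I = history_I.index("B") + 1
    return "L" if stage <= t_I ** 2 else "R"


def deviation_average(T, first_B):
    """Average payoff of Player I in the T-stage game when he plays T before
    stage first_B and B from stage first_B on, against tau_II."""
    history_I, total = [], 0
    for stage in range(1, T + 1):
        action_II = tau_II(history_I)
        action_I = "B" if stage >= first_B else "T"
        total += PAYOFF_I[(action_I, action_II)]
        history_I.append(action_I)
    return Fraction(total, T)


def ceil_sqrt(T):
    """Smallest integer whose square is at least T (for T >= 1)."""
    return math.isqrt(T - 1) + 1


if __name__ == "__main__":
    print(deviation_average(99, 10))           # the deviation discussed above
    print(deviation_average(10_000, 10))       # same deviation, longer game
    print(deviation_average(400, ceil_sqrt(400)))
```

The same deviation stage that is very profitable in the 99-stage game yields almost nothing over 10,000 stages, whereas a deviation stage chosen as a function of T keeps the profit close to 1.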
13.6 The discounted game
In the definition of the T -stage game, we assumed that every player seeks to maximize his average expected payoff at every stage of the game, or, equivalently, that every player seeks to maximize the expected sum of his payoffs. This means that if, say, John receives
$10,000 today, and Paul receives $10,000 in a year from now, their situations are considered identical. Such an assumption is not appealing: in reality, if John invests his $10,000 in a bank account yielding, say, 5% annual interest, he will have $10,500 in one year, and thus will be better off than Paul. This is the reason that economic models usually assume that players maximize not the sum of their payoffs over time, but the discounted sum of their payoffs, where the discount rate takes into account the interest that players can receive over time for their money. Discounted repeated games are presented in this section. For mathematical convenience, we will consider only infinitely repeated games. The assumption that all games are infinite is applicable in realistic models, because when payoffs are time-discounted, payoffs in far-off stages have a negligible effect on the discounted sum of the payoffs. We will study the equilibria of such games, and compare them to the equilibria of the finitely and infinitely repeated games presented in the previous sections.
Definition 13.21 Let λ ∈ [0, 1) be a real number, and let \(\tau = (\tau_i)_{i \in N}\) be a strategy vector in an infinitely repeated game. The λ-discounted payoff to player i under strategy vector τ is
\[ \gamma_i^\lambda(\tau) := E_\tau\left[ (1 - \lambda) \sum_{t=1}^{\infty} \lambda^{t-1} u_i^t \right]. \tag{13.44} \]
The constant λ is called the discount factor. The coefficient \(\lambda^{t-1}\) that multiplies the stage payoff \(u_i^t\) in Equation (13.44) expresses the fact that a payoff of $1 tomorrow is equivalent to a payoff of $λ today, a payoff of $1 in two days is equivalent to a payoff of $λ² today, and so on. Since \((1 - \lambda) \sum_{t=1}^{\infty} \lambda^{t-1} = 1\), the discounted payoff is the weighted average of the daily payoffs, where the weights decrease exponentially. When λ = 0, players’ payoffs in \(\Gamma_\lambda\) equal their payoffs in the first stage of the game, and the discounted repeated game is essentially equivalent to the (one-stage) base game. When λ is close to zero, 1 − λ (the weight associated with the first stage) is large relative to λ (the total weight associated with the payoffs in the subsequent stages), and the first-stage payoff is the most important one: players attach more importance to today’s payoff, and are willing to forgo high payoffs in the future. When λ is close to 1, the weight associated with stage t is very close to that of stage t + 1, and hence the players exhibit “patience”: each player values tomorrow’s payoff almost as much as he values today’s payoff.
Since the sum total of the weights is 1, the λ-discounted payoff \(\gamma_i^\lambda\) may be viewed as an “expected payoff per stage.” This can be seen in two different ways:
1. If the payoffs are 1 in each stage, we want the “average payoff” per stage to be 1, and indeed the discounted sum in this case is \((1 - \lambda) \sum_{t=1}^{\infty} \lambda^{t-1} = 1\).
2. We can also interpret the discount factor λ as the probability that the game will continue to the next stage. In other words, at every stage there is probability 1 − λ that the game will end, and probability λ that the game will continue. It follows that \(\frac{1}{1-\lambda}\) is the expected number of stages to the end of the game, and the probability that the game will get to stage t is \(\lambda^{t-1}\). With this interpretation, the sum on the right-hand side
of Equation (13.44) is the total expected payoff in the game divided by the expected number of stages.
Finally, since the definition in Equation (13.44) captures the per-stage payoff to player i, it allows us to compare equilibrium payoffs in discounted games, for different discount factors, and to compare these payoffs with equilibrium payoffs in finitely and infinitely repeated games.
Definition 13.22 Let \(\Gamma = (N, (S_i)_{i \in N}, (u_i)_{i \in N})\) be a base game, and let λ ∈ [0, 1). The discounted game \(\Gamma_\lambda\) (with discount factor λ) corresponding to Γ is the game \(\Gamma_\lambda = (N, (\mathcal{B}_i^{\infty})_{i \in N}, (\gamma_i^\lambda)_{i \in N})\).
It follows that a strategy vector \(\tau^*\) is an equilibrium of \(\Gamma_\lambda\) if for each player \(i \in N\) and each strategy \(\tau_i\),
\[ \gamma_i^\lambda(\tau^*) \geq \gamma_i^\lambda(\tau_i, \tau_{-i}^*). \tag{13.45} \]
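The normalization by 1 − λ in Equation (13.44) and the interpretation of λ as a continuation probability can be illustrated numerically. The sketch below is only an approximation under stated assumptions: it handles a deterministic stream of stage payoffs, truncates the infinite sums after finitely many stages, and uses hypothetical function names.

```python
def discounted_payoff(stage_payoffs, lam):
    """(1 - lam) * sum over t >= 1 of lam**(t-1) * u_t, for a finite prefix
    of a deterministic payoff stream (enumerate starts at exponent 0)."""
    return (1 - lam) * sum(lam ** k * u for k, u in enumerate(stage_payoffs))


def expected_number_of_stages(lam, horizon=5000):
    """Expected length of a game that continues after each stage with
    probability lam: the game lasts exactly t stages with probability
    lam**(t-1) * (1 - lam). The sum is truncated at `horizon` terms."""
    return sum(t * lam ** (t - 1) * (1 - lam) for t in range(1, horizon + 1))


if __name__ == "__main__":
    lam = 0.9
    print(discounted_payoff([1] * 2000, lam))      # weights sum (almost) to 1
    print(expected_number_of_stages(lam))          # compare with 1 / (1 - lam)
    print(discounted_payoff([5, 7, 9], 0))         # lam = 0: first stage only
```

For λ = 0.9 the truncation error is negligible, so the first two values agree with 1 and 1/(1 − λ) = 10 up to rounding; for λ closer to 1 a longer horizon would be needed.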
When \(\tau^*\) is an equilibrium of \(\Gamma_\lambda\), the vector \(\gamma^\lambda(\tau^*) = (\gamma_i^\lambda(\tau^*))_{i \in N}\) is an equilibrium payoff of \(\Gamma_\lambda\). The minmax value of each player i in \(\Gamma_\lambda\) is his minmax value in the base game Γ (Exercise 13.31), and an equilibrium payoff of a player is at least his minmax value (Theorem 5.42 on page 180). Therefore, \(\gamma_i^\lambda(\tau^*) \geq \bar{v}_i\) for each player \(i \in N\).
So far we have seen two ways to model long repeated games, using the infinitely repeated game \(\Gamma_\infty\) and using finite repeated games with duration T that increases to infinity. As we have seen in point 2 above, in a λ-discounted model we can interpret the quantity \(\frac{1}{1-\lambda}\) as the expected duration of the game. Since this quantity goes to infinity as λ goes to 1, a third way to model a long repeated game is by λ-discounted games with a discount factor λ that goes to 1. A natural question that arises concerns the limit of the set of λ-discounted equilibrium payoffs as λ goes to 1. In view of the Folk Theorem for infinitely repeated games, can we prove an analogous result for discounted games, that is, is it true that every vector x ∈ F ∩ V is the limit of λ-discounted equilibrium payoffs, as the discount factor goes to 1? The following example shows that this is not the case.
Example 13.23 Consider the three-player base game given in Figure 13.12, in which Player I has three actions {T, M, B}, Player II has two actions {L, R}, and Player III is a dummy player who has only one action, which has no effect on the payoffs (and is not mentioned throughout the example).
                       Player II
                       L            R
Player I     T     0, 2, 5      0, 0, 0
             M     0, 1, 0      2, 0, 5
             B     1, 1, 0      1, 1, 0

Figure 13.12 The base game in Example 13.23
The minmax values of the players are 1, 1, and 0 respectively, and the set F ∩ V of the feasible and individually rational payoffs is the line segment [(1, 1, 0) − (1, 1, 5)] (verify!). We will now show that (1, 1, 0) is the only equilibrium payoff in the discounted game \(\Gamma_\lambda\), for any discount factor λ ∈ [0, 1). Let \(\tau^*\) be an equilibrium of the discounted game \(\Gamma_\lambda\). We first show that \(\gamma_I^\lambda(\tau^*) = \gamma_{II}^\lambda(\tau^*) = 1\). Indeed, \(\gamma_i^\lambda(\tau^*) \geq \bar{v}_i = 1\) for i ∈ {I, II}. On the other hand, the sum of the payoffs to Players I and II is at most 2 in all entries of the payoff matrix, and therefore \(\gamma_I^\lambda(\tau^*) + \gamma_{II}^\lambda(\tau^*) \leq 2\). Consequently, \(\gamma_I^\lambda(\tau^*) = \gamma_{II}^\lambda(\tau^*) = 1\), as claimed. Since the sum of payoffs for Players I and II at (M, L) and (T, R) is strictly less than 2, these two pairs of actions are chosen with probability 0 at the equilibrium \(\tau^*\); otherwise, we would have \(\gamma_I^\lambda(\tau^*) + \gamma_{II}^\lambda(\tau^*) < 2\), which contradicts \(\gamma_I^\lambda(\tau^*) = \gamma_{II}^\lambda(\tau^*) = 1\). For any t ≥ 0 and any history \(h_t \in H(t)\) denote by \(\gamma_i^\lambda(\tau^* \mid h_t)\) the conditional discounted future payoff of player i (from stage t + 1 on) given the history \(h_t\), under the equilibrium \(\tau^*\):
\[ \gamma_i^\lambda(\tau^* \mid h_t) := E_{\tau^*}\left[ (1 - \lambda) \sum_{j=1}^{\infty} \lambda^{j-1} u_i^{t+j} \,\Big|\, h_t \right]. \tag{13.46} \]
The arguments provided above show that \(\gamma_i^\lambda(\tau^* \mid h_t) = 1\) for i ∈ {I, II}, for any history \(h_t\) that has positive probability under \(\tau^*\). To prove that \(\gamma_{III}^\lambda(\tau^*) = 0\) we will show that the pairs of actions (T, L) and (M, R) are chosen under \(\tau^*\) with probability 0. Assume by contradiction that the action pair (T, L) is chosen with positive probability α > 0 at some stage t ≥ 0 after the history \(h_t \in H(t)\). Since (M, L) and (T, R) are played with probability 0, it follows that at the history \(h_t\), the action pair (B, L) is played with probability 1 − α. Therefore,
\begin{align}
1 &= \gamma_{II}^\lambda(\tau^* \mid h_t) \tag{13.47} \\
  &= \alpha \bigl( (1 - \lambda) \times 2 + \lambda \times \gamma_{II}^\lambda(\tau^* \mid (h_t, (T, L))) \bigr) + (1 - \alpha) \bigl( (1 - \lambda) \times 1 + \lambda \times \gamma_{II}^\lambda(\tau^* \mid (h_t, (B, L))) \bigr) \tag{13.48} \\
  &= \alpha \bigl( (1 - \lambda) \times 2 + \lambda \times 1 \bigr) + (1 - \alpha) \bigl( (1 - \lambda) \times 1 + \lambda \times 1 \bigr), \tag{13.49}
\end{align}
which implies that α = 0 (the right-hand side equals 1 + α(1 − λ), and λ < 1), in contradiction to our assumption that α > 0. This proves that the action pair (T, L) is played with probability 0 under \(\tau^*\), and similarly the action pair (M, R) is played with probability 0 under \(\tau^*\). This concludes the proof that \(\gamma^\lambda(\tau^*) = (1, 1, 0)\). The fact that at the only equilibrium payoff every player’s payoff is his minmax value is a coincidence. Indeed, replacing the payoffs 5 in Figure 13.12 by (−5) does not affect our proof that (1, 1, 0) is the only equilibrium payoff, while the minmax value of Player III changes to (−5). ◭
The Folk Theorem for finitely repeated games required a technical condition on the base game: for every player i there exists an equilibrium β(i) of the base game that yields player i a payoff higher than his minmax value. In our construction of equilibria in the repeated game, the mixed actions \((\beta(i))_{i=1}^{n}\) were played in the last stages of the game, to ensure that a player who deviates along the play will lose when punished. To obtain a Folk Theorem for discounted games one also needs a technical condition on the base game for the same purpose. The condition that we will require is weaker than the one that appears in the Folk Theorem for finite games (Exercise 13.41).
Theorem 13.24 (The Folk Theorem for discounted games) Let Γ be a base game in which there exists a vector \(\hat{x} \in F \cap V\) that satisfies \(\hat{x}_i > \bar{v}_i\) for every player \(i \in N\). For every ε > 0 there exists \(\lambda_0 \in [0, 1)\) such that for every \(\lambda \in [\lambda_0, 1)\) and every vector
x ∈ F ∩ V, there exists an equilibrium \(\tau^*\) of \(\Gamma_\lambda\) satisfying⁶
\[ \| \gamma^\lambda(\tau^*) - x \|_\infty < \varepsilon. \tag{13.50} \]
The condition that appears in the statement of the theorem ensures that there is a convex combination of the entries of the payoff matrix, using rational weights, that is close to x, and yields each player a payoff that is strictly higher than his minmax value. If, as we did in the proof of Theorem 13.17, we construct a strategy vector with a basic plan in which the players play according to this convex combination, then, for λ sufficiently close to 1, for every t, the λ-discounted payoff of every player, from stage t on, is strictly higher than his minmax value, allowing the players to punish a deviator. The reader is asked to complete the details of the proof (Exercise 13.42).
13.7 Uniform equilibrium
As we said before, players may not always know the number of stages a repeated game will have:
• A young professional baseball player knows that at a certain age he will retire from the sport, but does not know exactly when that day will come.
• We are all players in “the game of life,” whose length is unknown and differs among the players.

Similarly, players do not always know the discount factor of the game:

• Although the prime interest rate is common knowledge, we do not know what the interest rate will be next year, in two years, or a decade from today.
• Suppose the government is interested in selling a state-owned company. What discount rate should be used? Computing a reasonable discount rate in such cases can be very complicated.

In the examples above, the discount rate and the exact length of the game are unknown. How should a player play in this case? In this section we will present concepts enabling us to study this question, and to arrive at results that are independent of the exact value of the discount factor, or the exact length of the game. To do so, we introduce the concept of uniform equilibrium, first for discounted games, then for finite games, and later we will see the relation between the two.
Definition 13.25 A strategy vector \(\tau^*\) is called a uniform equilibrium for discounted games if \(\lim_{\lambda \to 1} \gamma^\lambda(\tau^*)\) exists, and there exists \(\lambda_0 \in [0, 1)\) such that \(\tau^*\) is an equilibrium of \(\Gamma_\lambda\) for every discount factor \(\lambda \in [\lambda_0, 1)\). The limit \(\lim_{\lambda \to 1} \gamma^\lambda(\tau^*)\) is called a uniform equilibrium payoff for discounted games.
\(\tau^*\) is therefore a uniform equilibrium for discounted games if it is an equilibrium of every game in which the discount factor is sufficiently close to 1; that is, the players are sufficiently patient.
6 Recall that the maximum norm over \(\mathbb{R}^n\) is defined as \(\|x\|_\infty = \max_{i=1,2,\ldots,n} |x_i|\) for every vector \(x \in \mathbb{R}^n\).
Do uniform equilibria for discounted games exist? As we will see (Theorem 13.27), there are many uniform equilibria for discounted games.
Example 13.26 Consider the two-player game in Figure 13.13.
                     Player II
                     L        R
Player I     T     3, 1     0, 0
             B     1, 2     4, 3

Figure 13.13 The payoff matrix of the game in Example 13.26
The minmax value of Player I is \(\bar{v}_I = 2\), and the punishment strategy against him is \([\frac{2}{3}(L), \frac{1}{3}(R)]\). Player II’s minmax value is \(\bar{v}_{II} = 1\), and the punishment strategy against him is T. We will show that in the example above,
\[ \bigl( 3\tfrac{1}{2},\, 2\tfrac{5}{8} \bigr) = \tfrac{7}{8} \times (4, 3) + \tfrac{1}{8} \times (0, 0) \tag{13.51} \]
is a uniform equilibrium payoff for discounted games. To do so, we will show that the following pair of strategies is a discounted equilibrium for every discount factor sufficiently close to 1:
• Player I plays B at the first stage, and as long as Player II plays R, Player I repeatedly cycles through the following sequence of actions: B, B, B, B, B, B, B, T (the action B played at the first stage is the beginning of the first cycle).
• Player II plays R at the first stage, and as long as Player I cycles through the sequence of actions B, B, B, B, B, B, B, T, Player II plays R.

If neither player deviates from this strategy, the discounted sum of the payoffs of the first eight stages of the game is
\[ (1 + \lambda + \lambda^2 + \cdots + \lambda^6)(4, 3) + \lambda^7 (0, 0) = \frac{1 - \lambda^7}{1 - \lambda} \cdot (4, 3). \tag{13.52} \]
Therefore, the discounted payoff is
\begin{align}
& (1 - \lambda)\bigl( (4, 3) + \lambda (4, 3) + \lambda^2 (4, 3) + \cdots + \lambda^6 (4, 3) + \lambda^7 (0, 0) + \lambda^8 (4, 3) + \lambda^9 (4, 3) + \cdots + \lambda^{14} (4, 3) + \lambda^{15} (0, 0) + \cdots \bigr) \notag \\
&\quad = (1 - \lambda) \times \frac{1 - \lambda^7}{1 - \lambda} (1 + \lambda^8 + \lambda^{16} + \cdots) \cdot (4, 3) \tag{13.53} \\
&\quad = \frac{1 - \lambda^7}{1 - \lambda^8} \cdot (4, 3). \tag{13.54}
\end{align}
Applying L’Hôpital’s Rule, the limit of this value, as λ approaches 1, is
\[ \lim_{\lambda \to 1} \frac{1 - \lambda^7}{1 - \lambda^8} \cdot (4, 3) = \lim_{\lambda \to 1} \frac{-7\lambda^6}{-8\lambda^7} \times (4, 3) = \tfrac{7}{8} \times (4, 3) = \bigl( 3\tfrac{1}{2},\, 2\tfrac{5}{8} \bigr). \tag{13.55} \]
Neither player can profit by deviating in the stages in which the players play (B, R), because in these stages each receives his maximal possible payoffs. To guarantee that neither player
can profit by deviating in the stages in which the players play (T, R), we add the following punishments:
• If Player II deviates for the first time in stage t, from stage t + 1 onwards Player I always plays T (which is Player I’s punishment strategy against Player II).
• If Player I deviates for the first time in stage t, from stage t + 1 onwards Player II always plays the mixed action \([\frac{2}{3}(L), \frac{1}{3}(R)]\) (which is Player II’s punishment strategy against Player I).
We next seek discount factors λ for which this strategy vector is a λ-discounted equilibrium. If Player I deviates in stage t, where the players are supposed to play (T, R), he receives in that stage a payoff of 4 instead of 0, for a net profit of 4. In contrast, from the next stage onwards, his expected payoff is 2. Player I’s λ-discounted payoff from stage t onwards is therefore⁷
\[ (1 - \lambda)(4 + \lambda \times 2 + \lambda^2 \times 2 + \lambda^3 \times 2 + \cdots) = 4(1 - \lambda) + 2\lambda = 4 - 2\lambda. \tag{13.56} \]
If Player I had not deviated in stage t, his discounted payoff from stage t onwards would be
\[ (1 - \lambda) \times 0 + \lambda \times \frac{1 - \lambda^7}{1 - \lambda^8} \times 4, \tag{13.57} \]
because if in stage t the players are supposed to play (T, R), then a new cycle of length 8 begins in stage t + 1, and therefore the λ-discounted payoff from stage t + 1 onwards equals (up to multiplication by \(\lambda^t\)) the λ-discounted payoff from the first stage. The deviation is unprofitable only if the payoff, when no deviation occurs, is greater than or equal to the payoff when a deviation occurs:
\[ \lambda \times \frac{1 - \lambda^7}{1 - \lambda^8} \times 4 \geq 4 - 2\lambda. \tag{13.58} \]
Multiplying both sides of the expression by \(1 - \lambda^8\), we deduce that the following must hold:
\[ 4\lambda - 4\lambda^8 \geq 4 - 4\lambda^8 - 2\lambda + 2\lambda^9. \tag{13.59} \]
For λ = 1, both sides of the expression equal zero. Differentiating the left-hand side and setting λ = 1 yields 4 − 32 = −28, while differentiating the right-hand side and setting λ = 1 yields −32 − 2 + 18 = −16. Therefore, an interval \((\lambda_0, 1)\) exists such that for every discount factor λ in the interval, the left-hand side of Equation (13.59) is greater than the right-hand side of Equation (13.59). Canceling the common term \(-4\lambda^8\), the inequality in Equation (13.59) is equivalent to \(3\lambda - 2 - \lambda^9 \geq 0\), and one can check that it holds as a strict inequality for every λ ∈ (0.677, 1). If Player II deviates in stage t, where the players are supposed to play (T, R), he receives a payoff of 1 instead of 0 in that stage, for a net profit of 1. In contrast, from the next stage onwards his payoff is bounded by 1. It follows that the λ-discounted payoff of Player II from stage t onwards is at most 1. In contrast, if Player II does not deviate, his payoff from stage t onwards is \((1 - \lambda) \times 0 + \lambda \times \frac{1 - \lambda^7}{1 - \lambda^8} \times 3\). Deviating is not profitable if
\[ (1 - \lambda) \times 0 + \lambda \times \frac{1 - \lambda^7}{1 - \lambda^8} \times 3 \geq 1. \tag{13.60} \]
It can be shown that this holds for all λ ≥ 0.334. We deduce from this that the pair of strategies defined above form a λ-discounted equilibrium for all λ > max{0.334, 0.677} = 0.677; hence it is a uniform equilibrium for discounted games. ◭
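The limit in Equation (13.55) and the ranges of discount factors in which Inequalities (13.58) and (13.60) hold can be explored numerically. The following is a rough sketch, not the authors’ computation: the bisection helper `threshold` and all function names are hypothetical, and the bracketing intervals are chosen by hand on the assumption that each payoff gap changes sign exactly once inside them.

```python
def cycle_payoff(lam, stage_value):
    """lam-discounted payoff of the 8-stage cycle that yields stage_value in
    seven stages and 0 in the eighth, as in Equation (13.54)."""
    return (1 - lam ** 7) / (1 - lam ** 8) * stage_value


def gap_player_I(lam):
    """Left-hand side minus right-hand side of Inequality (13.58)."""
    return lam * cycle_payoff(lam, 4) - (4 - 2 * lam)


def gap_player_II(lam):
    """Left-hand side minus right-hand side of Inequality (13.60)."""
    return lam * cycle_payoff(lam, 3) - 1


def threshold(gap, lo, hi, iterations=60):
    """Bisection for the point where gap changes sign,
    assuming gap(lo) < 0 <= gap(hi) and a single sign change on [lo, hi]."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    print(cycle_payoff(0.999999, 4), cycle_payoff(0.999999, 3))
    print(threshold(gap_player_I, 0.5, 0.99))
    print(threshold(gap_player_II, 0.1, 0.9))
```

Evaluating `cycle_payoff` at a discount factor very close to 1 gives values near 3.5 and 2.625, in line with Equation (13.55); the two bisection results approximate the smallest discount factors at which neither player gains by deviating in a (T, R) stage.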
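7 For all \(a, b \in \mathbb{R}\), \((1 - \lambda)\bigl( \lambda^{t-1} a + \lambda^{t} b + \lambda^{t+1} b + \cdots \bigr) = \lambda^{t-1}\bigl( (1 - \lambda) a + \lambda b \bigr)\).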
Theorem 13.27 (The Folk Theorem for uniform equilibrium in discounted games) Let Γ be a base game in which there exists a vector \(\hat{x} \in F \cap V\) that satisfies \(\hat{x}_i > \bar{v}_i\) for every player \(i \in N\). For every ε > 0, and every x ∈ F ∩ V, there exists a strategy vector \(\tau^*\) in the discounted repeated game such that:
1. \(\tau^*\) is a uniform equilibrium for discounted games.
2. \(\bigl\| \lim_{\lambda \to 1} \gamma^\lambda(\tau^*) - x \bigr\|_\infty < \varepsilon\).
In words, for each x ∈ F ∩ V there exists a uniform equilibrium for discounted games \(\tau^*\) satisfying the property that the limit of discounted payoffs \(\lim_{\lambda \to 1} \gamma^\lambda(\tau^*)\) is approximately x. The strategy vector \(\tau^*\) satisfying the conditions of the theorem, similarly to the case in the proof of Theorem 13.17 (page 539), is of the grim-trigger type, with a basic plan that ensures that the payoff vector is close to x. The complete proof of this theorem is left to the reader (Exercise 13.44).
The concept of uniform equilibrium can also be defined for long finite games. We will see in Theorem 13.32 that the two concepts of uniform equilibrium are related.
Definition 13.28 Let ε ≥ 0. A strategy vector \(\tau^*\) in an infinitely repeated game is a uniform ε-equilibrium for finite games if the limit \(\lim_{T \to \infty} \gamma^T(\tau^*)\) exists, and there exists \(T_0 \in \mathbb{N}\) such that \(\tau^*\) is an ε-equilibrium of \(\Gamma_T\), for every \(T \geq T_0\). The limit \(\lim_{T \to \infty} \gamma^T(\tau^*)\) is called a uniform ε-equilibrium payoff.
At every uniform 0-equilibrium for finite games (i.e., the case in which ε = 0), from some stage onwards, in every stage, the players play an equilibrium of the base game (Exercise 13.48). Consequently, the set of uniform 0-equilibrium payoffs is the convex hull of the set of Nash equilibrium payoffs of the base game. For ε > 0, however, the set of uniform ε-equilibrium payoffs is much larger; that is, the Folk Theorem holds.
Theorem 13.29 (The Folk Theorem for uniform equilibrium in finite games) For every ε > 0, and every x ∈ F ∩ V, there exists a strategy vector \(\tau^*\) such that:
1. \(\tau^*\) is a uniform ε-equilibrium for finite games.
2. \(\bigl\| \lim_{T \to \infty} \gamma^T(\tau^*) - x \bigr\|_\infty < \varepsilon\).
The proof of the theorem, which is similar to the proof of Theorem 13.17 (page 539), is left to the reader (Exercise 13.49). We now turn our attention to comparing the concepts of uniform equilibrium for discounted games and uniform ε-equilibrium for finite games.
For this purpose, we will first find a connection between finite averages and discounted sums.
Theorem 13.30 Let $(x_t)_{t=1}^\infty$ be a bounded sequence of numbers. Denote the average of the first $T$ elements of this sequence by
$$S_T = \frac{1}{T} \sum_{t=1}^{T} x_t, \quad \forall T \in \mathbb{N}, \tag{13.61}$$
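and the discounted sum by
$$A(\lambda) = (1 - \lambda) \sum_{t=1}^{\infty} \lambda^{t-1} x_t, \quad \forall \lambda \in [0, 1). \tag{13.62}$$
Also denote
$$\alpha_T(\lambda) = (1 - \lambda)^2 \lambda^{T-1} T, \quad \forall T \in \mathbb{N},\ \forall \lambda \in [0, 1). \tag{13.63}$$
Then, for all $\lambda \in [0, 1)$,
$$A(\lambda) = \sum_{T=1}^{\infty} \alpha_T(\lambda) S_T. \tag{13.64}$$
Note that $\sum_{T=1}^{\infty} \alpha_T(\lambda) = 1$:
\begin{align}
\sum_{T=1}^{\infty} \alpha_T(\lambda) &= (1 - \lambda)^2 \sum_{T=1}^{\infty} T \lambda^{T-1} \tag{13.65} \\
&= (1 - \lambda)^2 \frac{d}{d\lambda} \left( \sum_{T=1}^{\infty} \lambda^{T} \right) \tag{13.66} \\
&= (1 - \lambda)^2 \frac{d}{d\lambda} \left( \frac{1}{1 - \lambda} \right) = 1, \tag{13.67}
\end{align}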
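where Equation (13.66) follows from the Bounded Convergence Theorem (see Theorem 16.4 in Billingsley [1999]). Thus, Equation (13.64) states that $A(\lambda)$ is a weighted average of $(S_T)_{T \in \mathbb{N}}$.
Proof: The proof of the theorem is accomplished by the following sequence of equalities:
\begin{align}
A(\lambda) &= (1 - \lambda) \sum_{t=1}^{\infty} \lambda^{t-1} x_t \tag{13.68} \\
&= (1 - \lambda) \sum_{t=1}^{\infty} \sum_{k=t}^{\infty} (\lambda^{k-1} - \lambda^{k}) x_t \tag{13.69} \\
&= (1 - \lambda) \sum_{k=1}^{\infty} (\lambda^{k-1} - \lambda^{k}) \sum_{t=1}^{k} x_t \tag{13.70} \\
&= \sum_{k=1}^{\infty} (1 - \lambda)(\lambda^{k-1} - \lambda^{k}) k S_k \tag{13.71} \\
&= \sum_{k=1}^{\infty} (1 - \lambda)^2 \lambda^{k-1} k S_k \tag{13.72} \\
&= \sum_{k=1}^{\infty} \alpha_k(\lambda) S_k. \tag{13.73}
\end{align}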
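The identity (13.64) and the fact that the weights $\alpha_T(\lambda)$ sum to 1 can be checked numerically. The following is only a sanity-check sketch under stated assumptions: the test sequence (the pattern 1, 0, 0 repeated), the discount factor 0.99, and the truncation length N are arbitrary illustrative choices, not taken from the text, and the infinite sums are truncated at N terms, where the neglected tails are far below floating-point precision.

```python
def cesaro_averages(x):
    """Return [S_1, ..., S_n], where S_T is the mean of the first T terms of x."""
    averages, running_sum = [], 0.0
    for T, value in enumerate(x, start=1):
        running_sum += value
        averages.append(running_sum / T)
    return averages


def discounted_sum(x, lam):
    """A(lam) = (1 - lam) * sum_t lam**(t-1) * x_t, truncated to len(x) terms."""
    return (1.0 - lam) * sum(lam ** (t - 1) * v for t, v in enumerate(x, start=1))


def weight(T, lam):
    """alpha_T(lam) = (1 - lam)**2 * lam**(T-1) * T, as in Equation (13.63)."""
    return (1.0 - lam) ** 2 * lam ** (T - 1) * T


# Hypothetical bounded test sequence: 1, 0, 0, 1, 0, 0, ...
N = 5000
x = [1.0 if t % 3 == 1 else 0.0 for t in range(1, N + 1)]
lam = 0.99

S = cesaro_averages(x)
lhs = discounted_sum(x, lam)                                   # A(lam)
rhs = sum(weight(T, lam) * S[T - 1] for T in range(1, N + 1))  # sum of alpha_T * S_T
total_weight = sum(weight(T, lam) for T in range(1, N + 1))    # should be close to 1

print(lhs, rhs, total_weight)
```

For this particular sequence the discounted sum also has the closed form $1/(1 + \lambda + \lambda^2)$, which gives an independent check on the truncated computation.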
Equation (13.70) follows by changing the order of summation (why can the order of summation be changed in this case?), and Equation (13.71) follows from Equation (13.61).
The next theorem is a consequence of Theorem 13.30.
Theorem 13.31 (Hardy and Littlewood) Every bounded sequence of real numbers $(x_t)_{t=1}^\infty$ satisfies
\begin{align}
\liminf_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} x_t &\leq \liminf_{\lambda \to 1} \sum_{t=1}^{\infty} (1 - \lambda) \lambda^{t-1} x_t \tag{13.74} \\
&\leq \limsup_{\lambda \to 1} \sum_{t=1}^{\infty} (1 - \lambda) \lambda^{t-1} x_t \tag{13.75} \\
&\leq \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} x_t. \tag{13.76}
\end{align}
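In particular, if the limit $\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} x_t$ exists, then the limit $\lim_{\lambda \to 1} \sum_{t=1}^{\infty} (1 - \lambda) \lambda^{t-1} x_t$ also exists, and both limits are equal. Using the notation of Theorem 13.30, Theorem 13.31 states that
$$\liminf_{T \to \infty} S_T \leq \liminf_{\lambda \to 1} A(\lambda) \leq \limsup_{\lambda \to 1} A(\lambda) \leq \limsup_{T \to \infty} S_T. \tag{13.77}$$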
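A small worked instance may help fix ideas. The alternating sequence below is an illustrative choice, not one used in the text; for it, both the finite averages and the discounted sums converge, so all four quantities in (13.77) coincide.

```latex
% Illustrative sequence (an assumption for this example): x_t = 1 for odd t, x_t = 0 for even t.
\begin{align*}
S_T &= \frac{1}{T} \left\lceil \frac{T}{2} \right\rceil \xrightarrow[T \to \infty]{} \frac{1}{2}, \\
A(\lambda) &= (1 - \lambda) \sum_{k=0}^{\infty} \lambda^{2k}
            = \frac{1 - \lambda}{1 - \lambda^2}
            = \frac{1}{1 + \lambda} \xrightarrow[\lambda \to 1]{} \frac{1}{2}.
\end{align*}
```

Here the odd stages $t = 2k + 1$ contribute $\lambda^{t-1} = \lambda^{2k}$, which is why the discounted sum is a geometric series in $\lambda^2$.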
Proof: We will prove Equation (13.76):
$$\limsup_{\lambda \to 1} \sum_{t=1}^{\infty} (1 - \lambda) \lambda^{t-1} x_t \leq \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} x_t. \tag{13.78}$$
The proof of Equation (13.74) can be accomplished in a similar manner, or by considering the sequence $(y_t)_{t \in \mathbb{N}}$ defined by $y_t := -x_t$ for every $t \in \mathbb{N}$. Equation (13.75) requires no proof. Theorem 13.30 implies that for all $T_0 \in \mathbb{N}$,
$$A(\lambda) = \sum_{T=1}^{\infty} \alpha_T(\lambda) S_T = \sum_{T=1}^{T_0 - 1} \alpha_T(\lambda) S_T + \sum_{T=T_0}^{\infty} \alpha_T(\lambda) S_T. \tag{13.79}$$
Denote $C := \limsup_{T \to \infty} S_T$, and let $\varepsilon > 0$ be any positive real number. Let $T_0$ be sufficiently large such that $S_T \leq C + \varepsilon$ for all $T \geq T_0$. Note that
$$\sum_{T=1}^{T_0} \alpha_T(\lambda) = (1 - \lambda)^2 \sum_{T=1}^{T_0} \lambda^{T-1} T < (1 - \lambda)^2 (T_0)^2, \tag{13.80}$$
where the last inequality follows from the fact that $\lambda \in [0, 1)$. In particular, when $\lambda$ approaches 1 the sum $\sum_{T=1}^{T_0 - 1} \alpha_T(\lambda)$ approaches 0, and therefore the first sum in Equation (13.79) also converges to 0. The second sum is bounded by $C + \varepsilon$. We therefore have
$$\limsup_{\lambda \to 1} A(\lambda) \leq C + \varepsilon = \limsup_{T \to \infty} S_T + \varepsilon. \tag{13.81}$$
Since this inequality holds for all $\varepsilon > 0$, we deduce that $\limsup_{\lambda \to 1} A(\lambda) \leq \limsup_{T \to \infty} S_T$, which is what we wanted to prove.
Analogously to the definition for finite games (Definition 13.18), a strategy vector $\tau^*$ is an ε-equilibrium in the discounted game $\Gamma_\lambda$ if no player can profit more than ε by deviating:
$$\gamma_i^\lambda(\tau^*) \geq \gamma_i^\lambda(\tau_i, \tau_{-i}^*) - \varepsilon, \quad \forall i \in N,\ \forall \tau_i \in \mathcal{B}_i^\infty. \tag{13.82}$$
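Both the proof of Theorem 13.31 above and the choice of $\lambda_0$ in the proof of Theorem 13.32 below rely on the estimate (13.80): for a fixed cutoff $T_0$, the total weight that the discounted sum places on the first $T_0$ averages vanishes as $\lambda$ approaches 1. The sketch below illustrates this numerically; the cutoff `T0 = 50` and the list of discount factors are hypothetical values chosen only for the illustration.

```python
def head_weight(T0, lam):
    """Sum of alpha_T(lam) = (1 - lam)**2 * lam**(T-1) * T over T = 1, ..., T0."""
    return (1.0 - lam) ** 2 * sum(lam ** (T - 1) * T for T in range(1, T0 + 1))


def bound(T0, lam):
    """Right-hand side of Equation (13.80): (1 - lam)**2 * T0**2."""
    return (1.0 - lam) ** 2 * T0 ** 2


T0 = 50  # hypothetical cutoff, for illustration only
lams = [0.9, 0.99, 0.999, 0.9999]
heads = [head_weight(T0, lam) for lam in lams]
bounds = [bound(T0, lam) for lam in lams]

for lam, head, cap in zip(lams, heads, bounds):
    print(f"lambda={lam}: head weight={head:.6f}, bound={cap:.6f}")
```

The bound is crude for discount factors far from 1 (it can exceed 1, while the head weight never does), but it is enough for the limiting argument, since it tends to 0 as $\lambda \to 1$ for every fixed $T_0$.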
Theorem 13.31 enables us to establish the following connection between uniform ε-equilibria for finite games and uniform ε-equilibria for discounted games.
Theorem 13.32 Let $\tau^*$ be a uniform ε-equilibrium for finite games. Then, for every $\delta > 0$, there exists $\lambda_0 \in [0, 1)$ such that for every $\lambda \in [\lambda_0, 1)$, the strategy vector $\tau^*$ is an $(\varepsilon + 2\delta)$-equilibrium of the discounted game with discount factor $\lambda$: for every player $i \in N$, and every strategy $\tau_i$,
$$\gamma_i^\lambda(\tau^*) \geq \gamma_i^\lambda(\tau_i, \tau_{-i}^*) - \varepsilon - 2\delta. \tag{13.83}$$
Proof: Let $\tau^*$ be a uniform ε-equilibrium for finite games, and let $\delta > 0$. Recall that $M$ is a bound on the payoffs of the base game, and for each player $i$ denote by $C_i := \lim_{T \to \infty} \gamma_i^T(\tau^*)$ the limit of his payoffs in the finite games. By Theorem 13.31,
$$C_i = \lim_{\lambda \to 1} \gamma_i^\lambda(\tau^*). \tag{13.84}$$
Let $T_0$ be sufficiently large such that for each $T \geq T_0$, one has (a) the strategy vector $\tau^*$ is an ε-equilibrium of the $T$-stage game, and (b) $|C_i - \gamma_i^T(\tau^*)| < \delta$. Let $\lambda_0$ be sufficiently close to 1 such that $\sum_{T=1}^{T_0} \alpha_T(\lambda_0) \leq \frac{\delta}{M}$ (see Equation (13.80)). Let $i$ be a player, and let $\tau_i$ be a strategy of player $i$. We will show that for $\lambda$ sufficiently close to 1, player $i$ cannot profit more than $\varepsilon + 2\delta$ by deviating to any strategy $\tau_i$. Denote the expected payoff in stage $t$, when player $i$ deviates to $\tau_i$, by
$$x_t = \mathbf{E}_{\tau_i, \tau_{-i}^*}[u_i(a^t)]. \tag{13.85}$$
The average of $x_1, x_2, \ldots, x_T$ equals the payoff under $(\tau_i, \tau_{-i}^*)$ in the $T$-stage game:
$$\gamma_i^T(\tau_i, \tau_{-i}^*) = \frac{\sum_{t=1}^{T} x_t}{T}. \tag{13.86}$$
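For each $\lambda \in [\lambda_0, 1)$,
\begin{align}
\gamma_i^\lambda(\tau_i, \tau_{-i}^*) &= \sum_{T=1}^{\infty} \alpha_T(\lambda) \gamma_i^T(\tau_i, \tau_{-i}^*) \tag{13.87} \\
&= \sum_{T=1}^{T_0 - 1} \alpha_T(\lambda) \gamma_i^T(\tau_i, \tau_{-i}^*) + \sum_{T=T_0}^{\infty} \alpha_T(\lambda) \gamma_i^T(\tau_i, \tau_{-i}^*) \tag{13.88} \\
&\leq \delta + \sum_{T=T_0}^{\infty} \alpha_T(\lambda) \gamma_i^T(\tau_i, \tau_{-i}^*) \tag{13.89} \\
&\leq \delta + \sum_{T=T_0}^{\infty} \alpha_T(\lambda) \gamma_i^T(\tau^*) + \varepsilon \tag{13.90} \\
&\leq \delta + C_i + \delta + \varepsilon \tag{13.91} \\
&= \lim_{\lambda \to 1} \gamma_i^\lambda(\tau^*) + \varepsilon + 2\delta. \tag{13.92}
\end{align}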
Equation (13.87) holds by Theorem 13.30, Equation (13.89) holds because $\lambda \in [\lambda_0, 1)$ and by the choice of $\lambda_0$, and Equation (13.90) holds because $\tau^*$ is an ε-equilibrium of the $T$-stage game for every $T \geq T_0$. Equation (13.91) holds because $|C_i - \gamma_i^T(\tau^*)| < \delta$ for every $T \geq T_0$, and Equation (13.92) follows from Equation (13.84). It follows that $\tau^*$ is an $(\varepsilon + 2\delta)$-equilibrium of the λ-discounted game, and this holds for all $\lambda \in [\lambda_0, 1)$.
We have already seen in Example 13.20 that an equilibrium of $\Gamma_\infty$ is not necessarily an ε-equilibrium of long finite games, and therefore not necessarily a uniform ε-equilibrium for finite games. The following example shows that a uniform equilibrium for discounted games is not necessarily a uniform ε-equilibrium for finite games, or an equilibrium of $\Gamma_\infty$.
Example 13.33 Let $(x_t)_{t=1}^\infty$ be a sequence of zeros and ones satisfying
$$\limsup_{T \to \infty} \frac{\sum_{t=1}^{T} x_t}{T} > \limsup_{\lambda \to 1} (1 - \lambda) \sum_{t=1}^{\infty} \lambda^{t-1} x_t. \tag{13.93}$$
For details on how to construct such a sequence, see Exercise 13.50. Let $c$ be a real number satisfying
$$\limsup_{T \to \infty} \frac{\sum_{t=1}^{T} x_t}{T} > c > \limsup_{\lambda \to 1} (1 - \lambda) \sum_{t=1}^{\infty} \lambda^{t-1} x_t. \tag{13.94}$$
Consider the two-player game in Figure 13.14. In this game, the payoff to Player II is 2, under every action vector. As we will now show, (c, 2) is a uniform equilibrium payoff of discounted repeated games, but is not a uniform ε-equilibrium payoff of finite games, for ε > 0 sufficiently small. Since under every circumstance Player II receives 2 in every stage of the repeated game, to prove that a pair of strategies is an equilibrium it is sufficient to show that Player I cannot profit by deviating.

                   Player II
                D       E       F
Player I   A   0, 2    0, 2    c, 2
           B   0, 2    1, 2    c, 2

Figure 13.14 The payoff matrix of the game in Example 13.33
Define the following strategy $\sigma_{II}$ of Player II:
• In the first stage, play F.
• If in the first stage Player I played A, play F in all of the remaining stages of the game.
• If in the first stage Player I played B, play D or E in all of the remaining stages of the game, according to the above-mentioned sequence $(x_t)_{t=1}^\infty$: if $x_t = 0$, play D in stage $t$, and if $x_t = 1$, play E in stage $t$.
The strategy $\sigma_{II}$ does not depend on Player I’s actions after the first stage. For Player I, therefore, every strategy $\sigma_I$ is weakly dominated by the strategy in which Player I’s action in the first stage is the same as that of $\sigma_I$, and from the second stage onwards his action is always B. It follows that Player I’s best reply to $\sigma_{II}$ is either $\sigma_I^A$, where Player I plays A in the first stage, and B in every other stage, or $\sigma_I^B$, where he plays B in every stage, including the first stage.
The strategy vector $(\sigma_I^A, \sigma_{II})$ is a uniform equilibrium for discounted games with payoff $(c, 2)$. To see this, note that since $\gamma_I^\lambda(\sigma_I^A, \sigma_{II}) = c$, while $\gamma_I^\lambda(\sigma_I^B, \sigma_{II}) = (1 - \lambda) \sum_{t=1}^{\infty} \lambda^{t-1} x_t$, Equation (13.94) implies that for a discount factor sufficiently close to 1, one has $\gamma_I^\lambda(\sigma_I^A, \sigma_{II}) > \gamma_I^\lambda(\sigma_I^B, \sigma_{II})$, and therefore Player I has no profitable deviation.
We next show that $(\sigma_I^A, \sigma_{II})$ is not a uniform ε-equilibrium for finite games, for $\varepsilon > 0$ sufficiently small. Set
$$\varepsilon_0 := \frac{1}{2} \left( \limsup_{T \to \infty} \frac{\sum_{t=1}^{T} x_t}{T} - c \right).$$
We will show that there exists an increasing sequence $(T_k)_{k \in \mathbb{N}}$ such that $(\sigma_I^A, \sigma_{II})$ is not an ε-equilibrium of the $T_k$-stage game, for every $k \in \mathbb{N}$ and every $\varepsilon \in (0, \varepsilon_0)$. By Equation (13.94), there exists an increasing sequence $(T_k)_{k \in \mathbb{N}}$ such that for every $k \in \mathbb{N}$,
$$\gamma_I^{T_k}(\sigma_I^B, \sigma_{II}) = \frac{\sum_{t=1}^{T_k} x_t}{T_k} > c + \varepsilon_0 > c = \gamma_I^{T_k}(\sigma_I^A, \sigma_{II}). \tag{13.95}$$
Therefore, for every $k \in \mathbb{N}$, by deviating in the $T_k$-stage game to $\sigma_I^B$, Player I’s profit is more than $\varepsilon_0$. It follows that $(\sigma_I^A, \sigma_{II})$ is not an ε-equilibrium in $\Gamma_{T_k}$ for every $k \in \mathbb{N}$ and every $\varepsilon \in (0, \varepsilon_0]$. We further note that it follows from this discussion that $(\sigma_I^A, \sigma_{II})$ is also not an equilibrium in the infinitely repeated game (Exercise 13.52). ◭
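The gap between the two sides of (13.93) can be seen numerically on a concrete 0/1 sequence. The sketch below uses a block sequence with doubling block lengths ($x_t = 1$ exactly when $\lfloor \log_2 t \rfloor$ is even); this is an illustrative assumption and not necessarily the construction intended in Exercise 13.50. The computation is over a finite horizon and a finite grid of discount factors, so it suggests the strict inequality rather than proving it, and the value `c = 0.6` is a hypothetical choice of the constant in (13.94).

```python
def block_sequence(n):
    """x_t = 1 if floor(log2(t)) is even, else 0, for t = 1, ..., n."""
    return [1 if (t.bit_length() - 1) % 2 == 0 else 0 for t in range(1, n + 1)]


def max_cesaro_average(x, start):
    """Largest average S_T of the first T terms over start <= T <= len(x)."""
    best, running_sum = 0.0, 0
    for T, value in enumerate(x, start=1):
        running_sum += value
        if T >= start:
            best = max(best, running_sum / T)
    return best


def discounted_sum(x, lam):
    """(1 - lam) * sum_t lam**(t-1) * x_t, truncated to len(x) terms."""
    total, power = 0.0, 1.0
    for value in x:
        total += power * value
        power *= lam
    return (1.0 - lam) * total


N = 2 ** 15
x = block_sequence(N)

# Discount factors 1 - 1/n for n = 100, 110, ..., 1000; lam**N is negligible for all of them.
lams = [1.0 - 1.0 / n for n in range(100, 1001, 10)]

cesaro_max = max_cesaro_average(x, start=1000)
abel_max = max(discounted_sum(x, lam) for lam in lams)
c = 0.6  # hypothetical constant playing the role of c in Equation (13.94)

print(f"largest finite average seen:  {cesaro_max:.4f}")
print(f"largest discounted sum seen:  {abel_max:.4f}")
```

On this sequence the finite averages keep returning to values near 2/3 (at the end of each block of ones), while the discounted sums stay close to 1/2, so a constant such as 0.6 separates them over the range examined. In the game of Example 13.33 this is the situation in which Player I’s deviation to $\sigma_I^B$ is unprofitable under discounting yet profitable in the $T_k$-stage games.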
13.8 Discussion
There is a wealth of literature on repeated games, and many variations of this model have been studied. One line of inquiry has focused on the subject of punishment. The equilibrium strategies we have defined in this section are unforgiving: once a player deviates, he is punished by the other player for the rest of the game. Because a punishment
strategy is liable to lower the payoff of not only the player who is being punished but also other players in the game, it is reasonable to ask whether players whose interests are harmed by a punishment strategy will join in implementing it. Considerations such as these have led to the study of subgame perfect equilibria in repeated games (for a discussion on the notion of subgame perfect equilibrium in an extensive-form game, see Section 7.1 on page 252). In repeated games, a strategy vector τ ∗ is a subgame perfect equilibrium if after every finite history (whether or not the players arrive at that history if they implement τ ∗ ), the play of the game that ensues from that stage onwards is an equilibrium of the subgame starting at that point. A proof of the Folk Theorem under this definition of equilibrium appears in Aumann and Shapley [1994], Rubinstein [1979], Fudenberg and Maskin [1986], and Gossner [1995]. There are several other variations on the theme of repeated games that have been studied in the literature. These include what happens when: (1) players do not observe the actions implemented by other players, and instead receive only a signal that depends on the actions of all the players (see, e.g., Lehrer [1989], [1990], and [1992], and Gossner and Tomala [2007]); (2) players do not know their payoff functions (see, e.g., Megiddo [1980]); and (3) at the start of the game, a payoff function is chosen from a set of possible payoff functions, and the players receive partial information regarding which payoff function is chosen (see Aumann and Maschler [1995] and Section 14.7 on page 590).
13.9 Remarks
Exercise 13.7 is based on a result appearing in Benoit and Krishna [1985]. Exercise 13.34 is based on Rubinstein [1982]. Exercise 13.38 is based on Neyman [1985]. Exercise 13.50 is based on Liggett and Lippman [1969]. Exercise 13.51 is based on Example H.1 in Filar and Vrieze [1997]. The game appearing in Exercise 13.53 is known as the Big Match. It was first described in Gillette [1957], and was extensively studied in Blackwell and Ferguson [1968]. A review on repeated games with complete information can be found in Sorin [1992]. For a presentation of repeated games with incomplete information see Aumann and Maschler [1995] and Sorin [2002]. For a presentation of repeated games with private monitoring see Mailath and Samuelson [2006]. More information on Tauberian Theorems, of which Theorem 13.31 (page 551) is an example, can be found in Korevaar [2004]. The authors thank Abraham Neyman for clarifications provided during the composition of this chapter.
13.10 Exercises
13.1 Compute the number of pure strategies a player has in a T -stage game with n players, where the number of actions of each player i in the base game is |Si | = ki .
13.2 Artemis and Diana are avid hunters. They devote Tuesdays to their shared hobby. On Monday evening, each of them separately writes, on a slip of paper, whether or not he or she will go hunting, and whether he or she wants to be the lead hunter, or the second hunter. They then meet and each reads what the other wrote. If at least one of the two is not interested in going hunting, or both of them want the same role (lead hunter or second hunter), they do not go hunting on Tuesday. If they are both interested in a hunt, one of them wants to be the lead hunter, and the other wants to be second hunter, they do go hunting on Tuesday. The utility of being lead hunter is 2. The utility of being second hunter is 1, and the utility of not going hunting is 0. Answer the following questions for this situation of repeated interaction:
(a) Write down the base game for this situation.
(b) Find all the equilibria of the one-stage game (the base game).
(c) Find all the equilibria of the two-stage game.
13.3 Repeat Exercise 13.2 for the following situation. Mark and Jim are neighbors, and are employed in the same place of work. They start work at the same hour every day, but their working day ends at different hours. Each has the option of going to work by train, or by bus. Every morning, each of them decides the mode of transportation by which he will get to work that day. Each of them “gains” 5 when they travel to work together, and each “gains” 0 if they travel by different modes of transportation. Taking the bus costs 1, and taking the train costs 2. Mark enjoys a 50% reduction on train tickets. The utility each of them receives is the difference between what he gains during the ride to work, and the cost of the ticket. For example, Jim’s utility from taking the bus with Mark is 4.
13.4 Repeat Exercise 13.2 for the following situation. There are two pubs in a neighborhood. Three friends, Andrew, Mike, and Ron, like to cap off their working days with a beer at the pub. Each of them gains a utility of 2 when drinking with only one other friend, a utility of 1 when the three drink together, and a utility of 0 when drinking alone. Every day each of them independently decides which of the two pubs in the neighborhood he will go to, for a drink.
13.5 Prove or disprove the following claim: let $\tau$ be a strategy vector in $\Gamma_T$ where, for each history $h$, the mixed action vector $\tau(h)$ is an equilibrium of the base game $\Gamma$. Then $\tau$ is an equilibrium of the game $\Gamma_T$. Compare this result with Theorem 13.6 (page 528), where the equilibrium of the base game that is played at any stage is independent of the history.
13.6 Prove that at every equilibrium of the $T$-stage Prisoner’s Dilemma, both players play D in every stage.
13.7 Let $\Gamma = (N, (S_i)_{i \in N}, (u_i)_{i \in N})$ be a game in strategic form that has a unique equilibrium, and let $\Gamma_T$ be the $T$-stage repeated game corresponding to $\Gamma$. Prove that $\Gamma_T$ has a unique subgame perfect equilibrium. Is it possible for $\Gamma_T$ to have an additional Nash equilibrium? Justify your answer.
13.8 Prove that the minmax value of each player i in the T -stage repeated game is equal to his minmax value v i in the base game. 13.9 In the following two-player zero-sum game (see Figure 13.15), find the value of the T -stage repeated game, and the optimal strategies of the two players for every T ∈ N. What is the limit of the values of the T -stage games, as T goes to infinity? Player I’s set of actions is AI = {T , B}, and Player II’s set of actions is AII = {L, R}.
• If the players choose the pair of actions (T, L), Player II pays Player I the sum of $1, and the players play the repeated game in Figure 13.15(A).
• If the players choose the pair of actions (T, R), Player II pays Player I the sum of $4, and the players play the repeated game in Figure 13.15(B).
• If the players choose the pair of actions (B, L), Player II pays Player I the sum of $2, and the players play the repeated game in Figure 13.15(B).
• If the players choose the pair of actions (B, R), Player II pays Player I the sum of $0, and the players play the repeated game in Figure 13.15(A).

A:
               Player II
               L     R
Player I   T   4     0
           B   1     5

B:
               Player II
               L     R
Player I   T   1     0
           B   0     1

Figure 13.15 The payoff matrix of the game in Exercise 13.9
13.10 In this exercise, we will prove that the payoff received by a player in a $T$-stage repeated game is a linear function of the probabilities under which he chooses his pure strategies.
(a) Let $\tau_i^1, \ldots, \tau_i^L$ be all the pure strategies of player $i$ in the $T$-stage repeated game. Prove that for every behavior strategy $\tau_i$ of player $i$ (see Definition 13.3 on page 525) in the repeated game, there exist nonnegative numbers $\alpha_1, \ldots, \alpha_L$, whose sum is 1, such that for each strategy vector $\tau_{-i}$ of the other players,
$$\gamma_i^T(\tau_i, \tau_{-i}) = \sum_{l=1}^{L} \alpha_l \gamma_i^T(\tau_i^l, \tau_{-i}). \tag{13.96}$$
(b) What are the coefficients $(\alpha_l)_{l=1}^L$?
13.11 Consider the following base game.

               Player II
               L        R
Player I   T   −1, 3    4, 0
           B   1, −1    0, 2

What is the limit of the average payoffs in the infinitely repeated game corresponding to this game, when the players implement the following strategies?
(a) In even stages, Player I plays T, and in odd stages he plays B. In stages divisible by 3 Player II plays L, and in all other stages he plays R.
(b) In even stages, Player I plays T, and in odd stages he plays B. Player II plays as follows. In the first stage he plays L. At any other stage he plays R if Player I played T in the previous stage; otherwise he plays the mixed action $[\frac{1}{4}(L), \frac{3}{4}(R)]$ in the current stage.
(c) Player I plays $[\frac{2}{3}(T), \frac{1}{3}(B)]$ in every stage. Player II plays as follows. In the first stage he plays L. At any other stage he plays R if Player I played T in the previous stage; otherwise he plays the mixed action $[\frac{1}{4}(L), \frac{3}{4}(R)]$ in the current stage.
13.12 For each of the infinitely repeated games corresponding to the following base games, plot on the same graph in $\mathbb{R}^2$ the sets $F$ and $F \cap V$, where the x-axis represents Player I’s payoff, and the y-axis represents Player II’s payoff.

Game A:
               Player II
               L        R
Player I   T   −1, 1    1, −1
           B   1, −1    −1, 1

Game B:
               Player II
               L       R
Player I   T   6, 6    2, 7
           B   7, 2    0, 0

Game C:
               Player II
               L       C       R
Player I   T   0, 0    2, 4    4, 2
           M   4, 2    0, 0    2, 4
           B   2, 4    4, 2    0, 0

Game D:
               Player II
               L       R
Player I   T   4, 2    2, 3
           B   1, 0    0, 1

13.13 Prove Theorem 13.11 on page 534: for every $K \in \mathbb{N}$ and every vector $x \in F$ there are nonnegative integers $(k_s)_{s \in S}$ summing to $K$ satisfying
$$\left\| \sum_{s \in S} \frac{k_s}{K} u(s) - x \right\|_\infty \leq \frac{M \times |S|}{K}. \tag{13.97}$$
13.14 Suppose that each player $i$ in a two-player zero-sum base game $\Gamma$ has a unique optimal mixed strategy $x_i$. Prove that in the $T$-stage repeated game $\Gamma_T$, at each equilibrium in behavior strategies, at each stage each player implements the mixed strategy $x_i$.
13.15 In the 1,000,000-stage repeated game of the following base game, describe an equilibrium whose payoff is within 0.01 of (5, 6), and an equilibrium whose payoff is within 0.01 of (4, 3).

               Player II
               L       R
Player I   T   6, 6    2, 7
           B   7, 2    0, 0

13.16 Consider the infinitely repeated game of the following base game.

               Player II
               L       R
Player I   T   4, 6    2, 8
           B   7, 3    1, 0

Suppose that, in this game, Player I implements the following strategy $\sigma_I$. In the first stage, he plays the mixed action $[\frac{2}{3}(T), \frac{1}{3}(B)]$. In every stage $t > 1$, he plays a mixed action that is determined by the action that Player II played in the previous stage: if in stage $t$ Player II played L, then in stage $t + 1$, Player I plays the mixed action $[\frac{1}{2}(T), \frac{1}{2}(B)]$, while if in stage $t$ Player II played R, then in stage $t + 1$ Player I plays the mixed action $[\frac{3}{4}(T), \frac{1}{4}(B)]$. Player II is considering which of the following four strategies to implement: (a) play L in every stage, (b) play R in every stage, (c) play L in odd stages, and R in even stages, (d) play R in odd stages, and L in even stages. What is the limit of the average payoffs of each of the players when Player I implements strategy $\sigma_I$ and Player II implements each of the above four strategies?
13.17 For the base game in Exercise 13.15 describe an equilibrium of the infinitely repeated game that yields a payoff of $(4\frac{1}{3}, 2\frac{1}{3})$.
13.18 For each of the following base games determine whether or not (2, 1) is an equilibrium payoff of the corresponding infinitely repeated game. If it is an equilibrium
payoff, describe an equilibrium leading to that payoff. If not, justify your answer. In these games, Player I is the row player and Player II is the column player.

Game A:
       L       R
T    0, 0    2, 2
B    1, 1    0, 0

Game B:
       L       R
T    1, 3    4, 0
B    2, 0    0, 1

Game C:
       L       R
T    1, 3    4, 0
B    3, 0    1, 1

Game D:
       L       M       R
T    0, 1    1, 0    3, 1
B    3, 1    0, 2    0, 3

13.19 In the infinitely repeated game of the following base game, describe an equilibrium leading to the payoff $(2, 3\frac{2}{3})$.

               Player II
               L       R
Player I   T   1, 4    2, 5
           B   0, 1    3, 2

13.20 In the following three-player base game $\Gamma$ Player I chooses the row (T or B), Player II chooses the column (L or R), and Player III chooses the matrix (W or E). Describe an equilibrium in the infinitely repeated game, based on $\Gamma$, for which the resulting payoff is $(2, 1, 2\frac{1}{2})$.

Matrix W:
       L          R
T    1, 0, 2    1, 1, 0
B    3, 0, 3    3, 2, 1

Matrix E:
       L          R
T    2, 0, 0    2, 1, 2
B    1, 2, 1    1, 2, 3

13.21 One of the payoffs in the following base game is a parameter labeled $x$. For every $x \in [0, 1]$ find the set of equilibrium payoffs in the infinitely repeated game based on this game.

       L       R
T    1, 1    0, 2
B    0, 1    x, 3/2

13.22 Find an example of an infinitely repeated game, and a strategy vector $\tau$ in this game satisfying (a) $\tau$ is an equilibrium for every finite game $\Gamma_T$ with a corresponding payoff of $\gamma^T(\tau)$ and (b) the limit $\lim_{T \to \infty} \gamma^T(\tau)$ does not exist.
13.23 Prove the Folk Theorem for infinitely repeated games (Theorem 13.17 on page 539). Guidance: For each $K \in \mathbb{N}$, approximate $x$ by a weighted average of vectors in the payoff matrix, with weights that are nonnegative and rational, with denominator
$K$. Construct a strategy vector in which the players play in blocks, such that the length of the $K$-th block is $K$ stages, and in the $K$-th block the players play in such a way that the average of the payoffs is approximately $x$. If at a certain stage, a player deviates from the action he is supposed to play at that stage, he is punished from the next stage onwards by a punishment strategy.
13.24 Prove directly that the statement of Theorem 13.19 (page 541) holds with respect to the strategy vector $\tau^*$ defined in Example 13.1 (page 540): $\tau^*$ is an ε-equilibrium of the $T$-stage game, for every $T$ sufficiently large.
13.25 Prove the strong formulation of the Folk Theorem for infinitely repeated games (Theorem 13.19 on page 541).
13.26 In the game in Example 13.20, prove that for every $T \in \mathbb{N}$, Player I has a strategy in $\Gamma_T$ yielding the payoff $\frac{T - \lceil \sqrt{T} \rceil}{T}$ when Player II uses the strategy $\tau_{II}^*$ defined in the example.
13.27 Let $N$ be a set of players, and let $(S_i)_{i \in N}$ be finite sets of actions of the players. Let $u : S \to \mathbb{R}^N$ and $u' : S \to \mathbb{R}^N$ be two payoff functions. Consider a variation of the repeated game, in which in odd stages the payoff function is $u$, and in even stages the payoff function is $u'$.
(a) Write the analogous theorem to Theorem 13.9 in this model.
(b) Write the analogous theorem to Theorem 13.17 in this model.
13.28 Repeat Exercise 13.27, under the following variation of the game: in each stage, one of the payoff functions is chosen randomly (each payoff function is chosen with probability $\frac{1}{2}$, independently of the payoff functions and the actions of the players in previous stages), and the players are informed of the chosen payoff functions before they choose their actions in each stage.
13.29 Repeat Exercise 13.28 for the case where the players are not informed of the payoff function chosen in each stage.
13.30 Repeat Exercise 13.28 for the case where only Player 1 is informed of the payoff function chosen in each stage (with the other players not informed of the chosen payoff function).
13.31 Prove that for every discount factor λ ∈ [0, 1), the minmax value of each player i in the λ-discounted game Γ_λ is equal to his minmax value in the base game Γ.
13.32 Compute the λ-discounted payoff in each of the three cases (a), (b), (c) of Exercise 13.11.
13.33 Cartel game A cartel is an association of players who coordinate their actions in order to attain better results than the players could attain if they acted individually. In this exercise we will show that players can indeed profit by forming a cartel, and check whether a cartel can be stable. Consider the following Cournot competition (see Example 4.23 on page 99): there are n luxury car manufacturers. The manufacturing cost of each car is
$100,000 (for each manufacturer) and the consumer price of each such car is $200,000 − (x_1 + x_2 + · · · + x_n), where x_i is the number of cars manufactured annually by manufacturer i. For computational ease, we assume below that x_i can be any nonnegative real number (not necessarily an integer). Answer the following questions:
(a) Describe the situation as a strategic-form game, where a pure strategy of each manufacturer is the number of cars he manufactures annually.
(b) Prove that this game has a unique symmetric equilibrium (that is, an equilibrium x in which x_i = x_j for all i and j), at which x_i = 100,000/(n + 1).
(c) Suppose that the manufacturers decide to form a cartel, and to determine jointly the number of cars that each of them will manufacture, in order to maximize the profit of each of them. Prove that to maximize this profit, the manufacturers need to manufacture collectively 50,000 cars; hence if they divide this number equally between them they will each manufacture 50,000/n cars. In other words, the cartel limits the number of cars manufactured by each member to a number that is lower than the number of cars manufactured at equilibrium (assuming that n > 1). Show that despite the lower manufacturing numbers, the profit of each manufacturer under the cartel’s quotas is higher than his profit at the equilibrium strategy.
(d) Consider next the discounted repeated game of the above-described base game. Is the strategy vector at which each manufacturer manufactures 50,000/n cars in each stage an equilibrium of Γ_λ, for every λ ∈ [0, 1)? Justify your answer.
(e) For each manufacturer i define a strategy τ_i as follows:
• In the first stage, manufacture 50,000/n cars.
• For each t > 1, the number of cars to manufacture in stage t is determined as follows:
  – if in each of the previous stages every manufacturer manufactured 50,000/n cars, manufacture 50,000/n cars in stage t;
  – otherwise, manufacture 100,000/(n + 1) cars in stage t.
For which value of n, and which discount factor λ, is the strategy vector τ = (τ_1, . . . , τ_n) an equilibrium of the game Γ_λ? What can we conclude regarding the stability of cartels, given these results?
(f) Are there similarities between the repeated Prisoner’s Dilemma (see Example 13.1 on page 521) and the cartel game of this exercise? If so, what are they? Which equilibria of the repeated Prisoner’s Dilemma correspond to the equilibria described in items (b) and (e) of this exercise?
13.34 Alternating offers game Barack and Joe can together implement a project that will jointly yield them a profit of $100. How should they divide this sum of money between them? They decide to implement the following mechanism: Barack will offer Joe a split of (x, 100 − x), where x is a number in the interval [0, 100], signifying the amount of money that Barack will receive under this offer. Joe may accept or reject this offer. If he accepts, this will be the final split. If he rejects the offer, the next day he proposes a counteroffer (y, 100 − y), where y is a number
in the interval [0, 100], signifying the amount of money that Barack will receive, under this offer. Barack may accept or reject this offer. If he accepts, this will be the final split. If he rejects the offer, the next day he proposes a counteroffer, and so on. Every delay in implementing the project reduces the profit they will receive: if the two of them agree on a division of the money $(x, 100 - x)$ on day $n$, Barack’s payoff is $\beta^{n-1} \times x$, and Joe’s payoff is $\beta^{n-1} \times (100 - x)$, where $\beta \in (0, 1)$ is the discount factor in the game (in other words, $100(\frac{1}{\beta} - 1)$ is the daily interest rate in the game). Depict this situation as an extensive-form game, and find all the subgame perfect equilibria of the game.
13.35 In the two-player zero-sum game in Exercise 13.9, find the value of the discounted game, and the optimal strategy of both players for any discount factor $\lambda \in [0, 1)$. What is the limit of the discounted values, as the discount factor converges to 1? Is the limit equal to the limit you computed in Exercise 13.9 for the values of the $T$-stage game?
13.36 Find an example of a repeated game, and a strategy vector $\tau$, such that (a) $\tau$ is an equilibrium of the discounted game for every $\lambda \in [0, 1)$, and (b) the limit $\lim_{\lambda \to 1} \gamma^\lambda(\tau)$ does not exist.
13.37 Suppose two players are playing the repeated Prisoner’s Dilemma. Prove that if the discount factor $\lambda$ is sufficiently close to 1, the strategy vector at which the players implement the grim-trigger strategy, i.e., every player plays C as long as the other player plays C, and otherwise plays D, is a λ-discounted equilibrium.
13.38 A strategy in an infinitely repeated game has recall $k$ if the action a player chooses in stage $t$ depends only on the actions that were played in stages $t - 1, t - 2, \ldots, t - k$ (and is independent of the actions played in earlier stages, and of the number of the stage $t$). Formally, a strategy $\tau_i$ has recall $k$ if for every $t \geq k$ we have $\tau_i(a^1, a^2, \ldots, a^{t-1}) = \tau_i(\hat{a}^1, \hat{a}^2, \ldots, \hat{a}^{t-1})$ whenever $(a^{t-k}, a^{t-k+1}, \cdots, a^{t-1}) = (\hat{a}^{t-k}, \hat{a}^{t-k+1}, \cdots, \hat{a}^{t-1})$.
(a) How many pure strategies of recall $k$ does each player have?
(b) Can the grim-trigger strategy be implemented by a pure strategy with recall $k$? Justify your answer.
(c) Prove that in the $T$-stage repeated Prisoner’s Dilemma, when the players are limited to playing only strategies with recall $k$ (where $k + 1 < T$), (3, 3) is an equilibrium payoff.
(d) For which triples $k$, $l$, and $T$ is (3, 3) an equilibrium payoff in the $T$-stage repeated Prisoner’s Dilemma, where Player I is limited to strategies with recall $k$, and Player II is limited to strategies with recall $l$?
13.39 Suppose two players are playing the repeated Prisoner’s Dilemma with an unknown number of stages; after each stage, a lottery is conducted, such that with probability $1 - \beta$ the game ends with no further stages conducted, and with probability $\beta$ the game continues to another stage, where $\beta \in [0, 1)$ is a given real number. Each
player’s goal is to maximize the sum total of payoffs received over all the stages of the game. Prove that if $\beta$ is sufficiently close to 1, the strategy vector in which at the first stage every player plays C, and in each subsequent stage each player plays C if the other player played C in the previous stage, and he plays D otherwise, is an equilibrium. This strategy is called the Tit-for-Tat strategy.
13.40 In this exercise, we will show that in a discounted two-player zero-sum game in which the discount factors of the two players are different from each other, the payoff to each player at every equilibrium is the value of the base game. Consider the two-player zero-sum repeated game based on the following base game.

       L        R
T    −1, 1    1, −1
B    1, −1    −1, 1

Assume that the discount factor of Player I is $\lambda$ throughout this exercise (except in section (j)), and that the discount factor of Player II is $\lambda^2$, where $\lambda \in [0, 1)$. Answer the following questions:
(a) What is the value in mixed strategies $v$ of the base game?
(b) What is the discounted payoff to each player under the following pair of strategies, as a function of the parameter $t_0 \in \mathbb{N}$:
• Player I plays T in each stage of the game.
• Player II plays L in the first $t_0$ stages, and always R afterwards.
(c) Find $t_0$ such that the sum of the payoffs of the two players is maximized. What is the sum of the payoffs for this $t_0$? In the solution here, assume that $t_0$ may be any nonnegative real number.
(d) Prove that the pair of strategies in which Player I plays the mixed action $[\frac{1}{2}(T), \frac{1}{2}(B)]$ at each stage, and Player II plays the mixed action $[\frac{1}{2}(L), \frac{1}{2}(R)]$ at each stage, is an equilibrium in this discounted game.
Let $\tau^* = (\tau_I^*, \tau_{II}^*)$ be any equilibrium of this discounted game. For $t_0 \in \mathbb{N}$, denote by $A(\lambda, t_0)$ the λ-discounted payoff under strategy vector $\tau^*$ starting from stage $t_0$:
$$A(\lambda, t_0) = (1 - \lambda) \sum_{t=t_0}^{\infty} \mathbf{E}_{\tau^*}[u_t] \lambda^{t - t_0 - 1}. \tag{13.98}$$
(e) Prove that for every $t_0 \in \mathbb{N}$, the following holds: $A(\lambda, t_0) \geq v$ and $A(\lambda^2, t_0) \leq v$.
(f) Prove that for every $t_0 \in \mathbb{N}$ and every $\lambda \in [0, 1)$, the following holds:
\[ A(\lambda, t_0) = A(\lambda^2, t_0) + \sum_{k=1}^{\infty} \lambda^k (1 - \lambda) A(\lambda^2, t_0 + k). \tag{13.99} \]
(g) Deduce from the last two items that $A(\lambda, t_0) = v$ for every $t_0 \in \mathbb{N}$, and from this further deduce that $\mathbf{E}_{\tau^*}[u^t] = 0$ for every $t \in \mathbb{N}$.
(h) Prove that at each equilibrium of this discounted game, the discounted payoff of each player is $v$.
(i) Does the result of item (c) contradict the result of item (h)? Explain.
(j) Generalize the result of item (h) to any discounted game and any pair of discount factors: if $\tau^* = (\tau_I^*, \tau_{II}^*)$ is an equilibrium of a two-player zero-sum game in which the discount factor of Player I is $\lambda_I$ and the discount factor of Player II is $\lambda_{II}$, then $A(\lambda_i, t_0) = v$ for $i \in \{I, II\}$, for every $t_0 \in \mathbb{N}$, where $v$ is the value in mixed strategies of the base game. In particular, at any equilibrium, the discounted payoff of each player (at his discount factor) is the value of the base game.
13.41 Prove that the condition in Theorem 13.9 (page 531) implies the condition in Theorem 13.24 (page 545): if for every player $i$ there exists an equilibrium $\beta(i)$ in the base game for which $\beta_i(i) > \bar{v}_i$, then there exists a vector $x \in F \cap V$ satisfying $x_i > \bar{v}_i$ for every $i \in N$.
13.42 Prove the Folk Theorem for discounted games (Theorem 13.24 on page 545).
13.43 Show (by finding appropriate strategy vectors) that the payoff vectors mentioned in Exercises 13.19 and 13.20 are payoffs of uniform equilibria for discounted games.
13.44 Prove the Folk Theorem for uniform equilibrium in discounted games (Theorem 13.27 on page 549).
13.45 In this exercise, we define the uniform value of two-player zero-sum games. Let $\Gamma$ be a two-player zero-sum base game. The real number $v$ is called the uniform value (for the finite games $(\Gamma_T)_{T \in \mathbb{N}}$) if for each $\varepsilon > 0$ there exist strategies $\tau_I^*$ of Player I and $\tau_{II}^*$ of Player II in $\Gamma_\infty$, and an integer $T_0$, such that the following condition is satisfied: for each $T \geq T_0$, and each pair of strategies $(\tau_I, \tau_{II})$ in $\Gamma_T$,
\[ \gamma^T(\tau_I, \tau_{II}^*) \leq v + \varepsilon, \quad \text{and} \quad \gamma^T(\tau_I^*, \tau_{II}) \geq v - \varepsilon. \tag{13.100} \]
Prove that the uniform value for finite games equals the value of the base game.
13.46 Repeat Exercise 13.43 for uniform ε-equilibria for finite games, for every ε > 0.
13.47 Let $E_T$ be the set of equilibrium payoffs of a $T$-stage repeated game $\Gamma_T$.
(a) Prove that $E_T \subseteq E_{kT}$ for every $T \in \mathbb{N}$ and for every $k \in \mathbb{N}$.
(b) Prove⁸ that $\frac{T}{T+1} E_T + \frac{1}{T+1} E_1 \subseteq E_{T+1}$ for every $T \in \mathbb{N}$.
•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
8 For every pair of sets S1 and S2 in Rk , and every real number α, the sets αS1 and S1 + S2 are defined by αS1 := {αx : x ∈ S1 } and S1 + S2 := {x + y : x ∈ S1 , y ∈ S2 }.
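(c) For every set $S \subseteq \mathbb{R}^k$, the set $\overline{S}$ denotes the closure of $S$: the smallest closed set containing $S$. Let
\[ E_\infty := \limsup_{T \to \infty} E_T = \bigcap_{T \in \mathbb{N}} \overline{\bigcup_{k \geq T} E_k}. \tag{13.101} \]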
The set $E_\infty$ is the upper limit of the sets $(E_T)_{T \in \mathbb{N}}$, and it includes all the partial limits of the sequences $(x_t)_{t \in \mathbb{N}}$, where $x_t \in E_t$ for each $t \in \mathbb{N}$. Prove, using items (a) and (b), that $E_T \subseteq E_\infty$ for every $T \in \mathbb{N}$, and in particular that $E_\infty$ is not empty. Furthermore, prove that for every $x \in E_\infty$ and every $\varepsilon > 0$, there exists $T_0 \in \mathbb{N}$ such that for every $T \geq T_0$ there exists $y \in E_T$ satisfying $\|x - y\|_\infty \leq \varepsilon$. In other words, the sets $(E_T)_{T \in \mathbb{N}}$ “approach” $E_\infty$ as $T$ goes to infinity.
13.48 Prove that in every uniform 0-equilibrium for finite games, from some stage onwards the players play an equilibrium of the base game at each stage.
13.49 Prove the Folk Theorem for uniform equilibrium in finite games (Theorem 13.29 on page 549).
13.50 In this exercise we prove the existence of a sequence $(x_t)_{t=1}^{\infty}$ of zeros and ones satisfying
\[ \liminf_{T \to \infty} \frac{\sum_{k=1}^{T} x_k}{T} < \liminf_{\lambda \to 1} (1 - \lambda) \sum_{t=1}^{\infty} \lambda^{t-1} x_t. \tag{13.102} \]
Let $(q_t)_{t \in \mathbb{N}}$ be a sequence of natural numbers. Define a sequence $(p_t)_{t \in \mathbb{N}}$ as follows:
\[ p_1 := 0, \tag{13.103} \]
\[ p_t := q_1 + q_2 + \cdots + q_{t-1}. \tag{13.104} \]
Define a sequence $(x_t)_{t \in \mathbb{N}}$ as follows:
\[ x_t = \begin{cases} 1 & \text{when there exists } k \text{ such that } 2p_k < t \leq 2p_k + q_k, \\ 0 & \text{otherwise.} \end{cases} \tag{13.105} \]
In words, the first $q_1$ elements of the sequence $(x_t)_{t \in \mathbb{N}}$ equal 1, the next $q_1$ elements of the sequence equal 0, the next $q_2$ elements of the sequence equal 1, the next $q_2$ elements of the sequence equal 0, and so on.
(a) Prove that $\liminf_{T \to \infty} \frac{\sum_{k=1}^{T} x_k}{T} = \frac{1}{2}$.
(b) Denote $A(\lambda) = (1 - \lambda) \sum_{t=1}^{\infty} \lambda^{t-1} x_t$. Prove that $A(\lambda) = \sum_{k=1}^{\infty} \lambda^{2p_k} (1 - \lambda^{q_k})$.
(c) Denote $\alpha_k = \lambda^{p_k} - \lambda^{p_{k+1}}$ for every $k \in \mathbb{N}$. Using item (b) above, prove that
\[ A(\lambda) = \frac{1}{2} \left( \sum_{k=1}^{\infty} (\alpha_k)^2 + 1 \right). \tag{13.106} \]
(d) Let $\varepsilon \in (0, \frac{1}{4})$, and define $c := \frac{\ln(\varepsilon)}{\ln(1 - \sqrt{\varepsilon})}$. Prove that $c > 2$.
(e) Suppose that the sequence $(q_t)_{t \in \mathbb{N}}$ satisfies $q_k > \frac{2p_k}{c - 2}$ for every $k \in \mathbb{N}$. Define
\[ a_k := \frac{|\ln(1 - \sqrt{\varepsilon})|}{q_k}, \qquad b_k := \frac{|\ln(\varepsilon)|}{2p_k}. \tag{13.107} \]
Prove that limk→∞ bk = 0, and that for every k ∈ N, (a) bk+1 < bk for every k ∈ N, (b) cqk > 2pk + 2qk , and (c) ak ε2 .
(13.108)
(h) Deduce, with the aid of item (c) above, that $\liminf_{\lambda \to 1} A(\lambda) \geq \frac{\varepsilon^2 + 1}{2}$.
(i) Deduce that Equation (13.102) holds for the sequence $(x_t)_{t \in \mathbb{N}}$ defined in item (d) above.
(j) Construct a sequence $(y_t)_{t=1}^{\infty}$ of zeros and ones satisfying
\[ \limsup_{T \to \infty} \frac{\sum_{t=1}^{T} y_t}{T} > \limsup_{\lambda \to 1} (1 - \lambda) \sum_{t=1}^{\infty} \lambda^{t-1} y_t. \tag{13.109} \]
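Such a sequence was used in Example 13.33.
13.51 Consider the following sequence $(x_t)_{t \in \mathbb{N}}$: 1, −1, 2, −2, 3, −3, . . . , i.e., $x_{2t} = -t$ and $x_{2t-1} = t$ for every $t \in \mathbb{N}$. Compute
\[ \limsup_{T \to \infty} \frac{\sum_{k=1}^{T} x_k}{T}, \qquad \liminf_{T \to \infty} \frac{\sum_{k=1}^{T} x_k}{T}, \qquad \text{and} \qquad \lim_{\lambda \to 1} (1 - \lambda) \sum_{t=1}^{\infty} \lambda^{t-1} x_t. \]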
13.52 Prove that the strategy vector (σIA , σII ) defined in Example 13.33 (page 553) is not an equilibrium of the infinitely repeated game. 13.53 David and Tom play the following game, over T stages. In each stage David chooses a color, either red or yellow, and Tom guesses which color David chose. If Tom guesses “red,” he pays David one dollar if he guessed incorrectly, and receives one dollar from David if he guessed correctly. If, however, Tom guesses “yellow,” he pays David a dollar in that stage and in every subsequent stage of the game if he guessed incorrectly, and he receives a dollar from David in that stage and in every subsequent stage of the game if he guessed correctly. Note that this is not a repeated game, because if the first time that Tom guesses “yellow” is in stage t, the payoffs in all the stages after t depend on Tom’s choice in stage t. (a) Prove that the only equilibrium payoff when T = 1 is (0, 0). (b) Prove that the only equilibrium payoff when T = 2 is (0, 0). (c) Prove that the only equilibrium payoff for every T is (0, 0). 13.54 Consider the game in Exercise 13.53 with T = ∞. Let x and y be two numbers in the interval [0, 1]. Suppose that in each stage David chooses “red” with probability
x and “yellow” with probability 1 − x, and in each stage Tom guesses “red” with probability y and “yellow” with probability 1 − y.
(a) Compute the expected λ-discounted payoff in this infinite game as a function of x and y, for each λ ∈ [0, 1).
(b) Conclude that, if the players are restricted to these i.i.d. strategies, (0, 0) is a λ-discounted equilibrium payoff, for each λ ∈ [0, 1). What are the corresponding equilibrium strategies?
14
Repeated games with vector payoffs
Chapter summary This chapter is devoted to a theory of repeated games with vector payoffs, known as the theory of approachability, developed by Blackwell in 1956. Blackwell considered two-player repeated games in which the outcome is an m-dimensional vector of attributes, and the goal of each player is to control the average vector of attributes. The goal can be either to approach a given target set S ⊆ Rm, that is, to ensure that the distance between the vector of average attributes and the target set S converges to 0, or to exclude the target set S, that is, to ensure that the distance between the vector of average attributes and S remains bounded away from 0. If a player can approach the target set we say that the set is approachable by the player, whereas if the player can exclude the target set we say that it is excludable by that player. Clearly, a set cannot be both approachable by one player and excludable by the other player. We provide a geometric condition that ensures that a set is approachable by a player, and show that any convex set is either approachable by one player or excludable by the other player. Two applications of the theory of approachability are provided: it is used, respectively, to construct an optimal strategy for the uninformed player in two-player zero-sum repeated games with incomplete information on one side, and to construct a no-regret strategy in sequential decision problems with experts.
In Chapter 13 we studied repeated games in which the payoff to each player in every stage was a real number representing the player’s utility. In this chapter we will look at two-player repeated games in which the outcome in every stage is not a pair of payoffs, but a vector in the m-dimensional Euclidean space Rm . These games correspond to situations in which the outcome of an interaction between the players is comprised of several incommensurable factors. For example, an employment contract between an employee and an employer may specify the number of hours the employee is to commit to the job; the salary the employee will receive; and the number of days of annual leave granted to the employee. As we saw in Chapter 2 on utility theory, under certain assumptions it is possible to associate each outcome with a real number representing the utility of the outcome, thereby translating the situation into a game with payoffs in real numbers. But we may not know the players’ utility functions. In addition, we may at times be interested in controlling each variable separately, as is done for example in physics problems, where pressure and temperature may be controlled separately. The model of repeated games with
vector payoffs was first presented by Blackwell [1956]. The first part of this chapter is based on that paper. When the outcome of an interaction to each player is a payoff, each player tries to maximize the average of the payoffs he receives. When the outcome is a vector in Rm , maximizing one coordinate may come at the expense of another coordinate. We therefore speak of target sets in the space of vector payoffs: each player tries either to cause the average of his payoffs to approach a target set (i.e., a certain subset of Rm ) or to exclude a target set. In Chapters 9 and 10 we studied Bayesian games; these are games with incomplete information whose payoffs depend on the state of nature, which can have a finite number of values. In Section 14.7 (page 590) we will study two-player zero-sum repeated games with incomplete information regarding the state of nature using the model of repeated games with vector payoffs: every pair of actions in such a game is associated with a vector of payoffs composed of the payoff for each possible state of nature. In this way, we can monitor the average payoff for every possible state of nature, even if the state of nature is not known by all the players. An example of such an application appears in Section 14.7 (page 590). In Section 14.8 (page 600), we will present an additional application of the model of repeated games with vector payoffs to the study of dynamic decision problems with experts.
14.1 Notation
In this chapter we will work in $\mathbb{R}^m$, the m-dimensional Euclidean space. We will sometimes term $x \in \mathbb{R}^m$ a “vector,” and sometimes a “point.” The zero vector in $\mathbb{R}^m$ is denoted by $0$. Recall that for a finite set $A$, we denote by $\Delta(A)$ the set of probability distributions over $A$. The inner product in $\mathbb{R}^m$ is denoted as follows. For every pair of vectors $x, y \in \mathbb{R}^m$,
\[ \langle x, y \rangle := \sum_{l=1}^{m} x_l y_l. \tag{14.1} \]
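The inner product is symmetric, $\langle x, y \rangle = \langle y, x \rangle$, and bilinear; i.e., it is a linear function in each of its variables. That is, for every $\alpha, \beta \in \mathbb{R}$ and every $x, x_1, x_2, y, y_1, y_2 \in \mathbb{R}^m$,
\[ \langle \alpha x_1 + \beta x_2, y \rangle = \alpha \langle x_1, y \rangle + \beta \langle x_2, y \rangle, \tag{14.2} \]
and
\[ \langle x, \alpha y_1 + \beta y_2 \rangle = \alpha \langle x, y_1 \rangle + \beta \langle x, y_2 \rangle. \tag{14.3} \]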
The norm of a vector $x \in \mathbb{R}^m$, denoted by $\|x\|$, is the Euclidean norm, given by
\[ \|x\| := \langle x, x \rangle^{1/2} = \sqrt{\sum_{l=1}^{m} (x_l)^2}, \tag{14.4} \]
and the distance function between vectors is
\[ d(x, y) := \|x - y\| = \langle x - y, x - y \rangle^{1/2} = \sqrt{\sum_{l=1}^{m} (x_l - y_l)^2}. \tag{14.5} \]
If $C \subseteq \mathbb{R}^m$ is a set, and $x \in \mathbb{R}^m$ is a vector, the distance between $x$ and $C$ is given by
\[ d(x, C) := \inf_{y \in C} d(x, y). \tag{14.6} \]
It follows that the distance between a point $x$ and a set $C$ equals the distance between $x$ and the closure of $C$, and $d(x, C) = 0$ for every $x$ in the closure of $C$. The triangle inequality states that
\[ d(x, y) + d(y, z) \geq d(x, z), \quad \forall x, y, z \in \mathbb{R}^m. \tag{14.7} \]
Equivalently,
\[ \|x\| + \|y\| \geq \|x + y\|. \tag{14.8} \]
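The Cauchy–Schwarz inequality states that
\[ \|x\|^2 \|y\|^2 \geq \langle x, y \rangle^2. \tag{14.9} \]
The following inequalities also hold (Equation (14.11) follows from the Cauchy–Schwarz inequality):¹
\[ d(x + y, x + z) = d(y, z), \quad \forall x, y, z \in \mathbb{R}^m, \tag{14.10} \]
\[ d(x + y, z + w) \leq d(x, z) + d(y, w), \quad \forall x, y, z, w \in \mathbb{R}^m, \tag{14.11} \]
\[ d(\alpha x, \alpha y) = \alpha d(x, y), \quad \forall x, y \in \mathbb{R}^m, \ \forall \alpha > 0, \tag{14.12} \]
\[ d(x, y) \leq 2M\sqrt{m}, \quad \forall M > 0, \ \forall x, y \in [-M, M]^m. \tag{14.13} \]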
If $C \subseteq \mathbb{R}^m$ is a set, and $x, y \in \mathbb{R}^m$ are vectors, then (Exercise 14.1)
\[ d(x, C) \leq d(x, y) + d(y, C). \tag{14.14} \]
All the vectors are considered to be row vectors. If x is a row vector, then x ⊤ is the corresponding column vector. Since we are studying two-player games, for every player k ∈ {1, 2}, we will denote by −k the player who is not player k. In particular, the notation σ−k denotes a strategy of the player who is not k.
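The notation of this section can be tried out numerically. The following Python lines are a minimal sketch, not part of the original text: the function names `inner`, `norm`, `dist`, and `dist_to_set` are hypothetical, and `dist_to_set` assumes that the set C is finite and nonempty, so that the infimum in Equation (14.6) is a minimum.

```python
import math

def inner(x, y):
    """Inner product <x, y> of Equation (14.1)."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Euclidean norm of Equation (14.4)."""
    return math.sqrt(inner(x, x))

def dist(x, y):
    """Distance d(x, y) of Equation (14.5)."""
    return norm([a - b for a, b in zip(x, y)])

def dist_to_set(x, C):
    """d(x, C) of Equation (14.6), assuming C is a finite nonempty list of points."""
    return min(dist(x, y) for y in C)

x = (3.0, 4.0)
y = (0.0, 0.0)
C = [(0.0, 0.0), (3.0, 0.0)]

print(dist(x, y))         # distance between x and the origin
print(dist_to_set(x, C))  # distance between x and the finite set C
# Inequality (14.14): d(x, C) <= d(x, y) + d(y, C)
print(dist_to_set(x, C) <= dist(x, y) + dist_to_set(y, C))
```

For infinite sets C the minimum has to be replaced by a problem-specific computation of the infimum.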
•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
1 For every set $A \subseteq \mathbb{R}$, and natural number $m$, the set $A^m \subseteq \mathbb{R}^m$ is defined as follows: $A^m = \underbrace{A \times A \times \cdots \times A}_{m \text{ times}} = \{(x_1, x_2, \ldots, x_m) \in \mathbb{R}^m : x_i \in A, \ i = 1, 2, \ldots, m\}$.
14.2 The model
Definition 14.1 A repeated (two-player) game with (m-dimensional) vector payoffs is given by two action sets $\mathcal{I} = \{1, 2, \ldots, I\}$ and $\mathcal{J} = \{1, 2, \ldots, J\}$ of Players 1 and 2, respectively,² and a payoff function $u : \mathcal{I} \times \mathcal{J} \to \mathbb{R}^m$.
As previously stated, the vectors in $\mathbb{R}^m$ are not necessarily payoffs; they are various attributes of the outcome of the game. Despite this, we use the term “payoff function” for $u$, both for convenience and because of the analogy to games with scalar payoffs (the case m = 1). It will sometimes be convenient to present the payoff function $u$ as a matrix of order $I \times J$, whose elements are vectors in $\mathbb{R}^m$.
The game proceeds in stages as follows. In stage $t$ ($t = 1, 2, \ldots$), each one of the players chooses an action: Player 1 chooses action $i^t \in \mathcal{I}$, and Player 2 chooses action $j^t \in \mathcal{J}$. As in the model of repeated games, we will assume that every player knows what the other player chose in previous stages. A behavior strategy of Player 1 is a function associating a mixed action with each history of actions
\[ \sigma_1 : \bigcup_{t=1}^{\infty} (\mathcal{I} \times \mathcal{J})^{t-1} \to \Delta(\mathcal{I}). \tag{14.15} \]
Similarly, a behavior strategy of Player 2 is a function
\[ \sigma_2 : \bigcup_{t=1}^{\infty} (\mathcal{I} \times \mathcal{J})^{t-1} \to \Delta(\mathcal{J}). \tag{14.16} \]
Kuhn’s Theorem for infinite games (Theorem 6.26 on page 242) states that every mixed strategy has an equivalent behavior strategy and vice versa. It therefore suffices to consider only behavior strategies here, because they are more natural than mixed strategies. The word strategy in this chapter will be shorthand for “behavior strategy.” By Theorem 6.23 (page 242), every pair of strategies $(\sigma_1, \sigma_2)$ induces a probability measure $\mathbf{P}_{\sigma_1, \sigma_2}$ over the set of infinite plays, i.e., over $(\mathcal{I} \times \mathcal{J})^{\mathbb{N}}$. The expectation operator corresponding to this probability distribution is denoted by $\mathbf{E}_{\sigma_1, \sigma_2}$. Denote the payoff in stage $t$ by $g^t = u(i^t, j^t) \in \mathbb{R}^m$, and the average payoff up to stage $T$ by³
\[ \bar{g}^T = \frac{1}{T} \sum_{t=1}^{T} g^t = \frac{1}{T} \sum_{t=1}^{T} u(i^t, j^t) \in \mathbb{R}^m. \tag{14.17} \]
We next define the concept of an approachable set, the central concept of this chapter.
••••••••••••••••••••••••••••••••••••••••••••••••••
2 For convenience, we use in this chapter the notation $\mathcal{I}$ and $\mathcal{J}$ for the action sets of the players, instead of $A_1$ and $A_2$.
3 While in one-stage games the payoff is defined to be the expected payoff according to the mixed actions of the players, in repeated games the payoff in each stage $t$ is the actual payoff $u(i^t, j^t)$ of that stage (and not the expected payoff according to the mixed actions at that stage). In this chapter we will be interested in the average payoff $\bar{g}^T$, as opposed to its expectation.
Definition 14.2 A nonempty set $C \subseteq \mathbb{R}^m$ is called approachable by player $k$ if there exists a strategy $\sigma_k$ of player $k$ such that for every $\varepsilon > 0$ there exists $T \in \mathbb{N}$ such that for every strategy $\sigma_{-k}$ of the other player
\[ \mathbf{P}_{\sigma_k, \sigma_{-k}} \left( d(\bar{g}^t, C) < \varepsilon, \ \forall t \geq T \right) > 1 - \varepsilon. \tag{14.18} \]
In this case we say that $\sigma_k$ approaches $C$ for player $k$.
A set is approachable by a player if that player can guarantee that for any strategy used by the other player, the average payoff approaches the set with probability 1 uniformly. In particular, this implies that
\[ \mathbf{P}_{\sigma_k, \sigma_{-k}} \left( \lim_{t \to \infty} d(\bar{g}^t, C) = 0 \right) = 1. \tag{14.19} \]
The convergence of the average payoff to $C$ is uniform; i.e., the rate at which the average payoff approaches this set (meaning the relation between $\varepsilon$ and $T$ in Equation (14.18)) is independent of the strategy used by the rival player. The dual to Definition 14.2 relates to the situation in which player $k$ can guarantee that the distance between the average payoff and the target set is positive and bounded away from 0.
Definition 14.3 A nonempty set $C \subseteq \mathbb{R}^m$ is called excludable by player $k$ if there exists $\delta > 0$ such that the set $\{x \in \mathbb{R}^m : d(x, C) \geq \delta\}$ is approachable by player $k$.
If the strategy $\sigma_k$ of player $k$ approaches the set $\{x \in \mathbb{R}^m : d(x, C) \geq \delta\}$ for some $\delta > 0$, we say that $\sigma_k$ excludes the set $C$ for player $k$.
14.3 Examples
When m = 1, the outcome at each stage is a real number. If we interpret this number as the payoff to Player 1, and the negative of this number as the payoff to Player 2, then this model is equivalent to the model of repeated two-player zero-sum games. If v is the value of the one-stage game, then [v, ∞) is an approachable set for Player 1, and (−∞, v] is an approachable set for Player 2. The players’ approaching strategies are stationary strategies, in which each player plays an optimal strategy of the one-stage game at each stage (independently of the history of play). It follows that for every δ > 0, the set (−∞, v − δ] is an excludable set for Player 1, and the set [v + δ, ∞) is an excludable set for Player 2. This example shows that one may regard the model of repeated games with vector payoffs as a generalization of the model of two-player zero-sum games. Blackwell [1956], in fact, presented his model in such a way.
Example 14.4 Consider a game where m = 2, each player has two possible actions, and the payoff function u is given by the matrix in Figure 14.1.

                     Player 2
                   L          R
Player 1    T    (0, 0)     (0, 0)
            B    (1, 1)     (1, 0)

Figure 14.1 The game in Example 14.4
The set C1 = {(0, 0)}, containing only the vector (0, 0) (see Figure 14.2), is approachable by Player 1: if Player 1 plays T in every stage, he guarantees that the average payoff is (0, 0). The set C2 = {(1, x) : 0 ≤ x ≤ 1} (see Figure 14.2) is also approachable by Player 1: if Player 1 plays B in every stage, he guarantees that the average payoff is in C2 .
Figure 14.2 Three sets approachable by Player 1 in Example 14.4 (three panels: the set C1, the set C2, and the set C3)
It is interesting to note that the set $C_3 = \{(x, 1 - x) : \frac{1}{2} \leq x \leq 1\}$ (see Figure 14.2) is also approachable by Player 1. The following strategy of Player 1 guarantees that the average payoff approaches this set:
• If $\bar{g}^{t-1}$, the average payoff up to stage $t - 1$, is located on or above the diagonal $x_1 + x_2 = 1$, i.e., if $\bar{g}_1^{t-1} + \bar{g}_2^{t-1} \geq 1$, then play T in stage $t$.
• If $\bar{g}^{t-1}$, the average payoff up to stage $t - 1$, is located below the diagonal $x_1 + x_2 = 1$, i.e., if $\bar{g}_1^{t-1} + \bar{g}_2^{t-1} < 1$, then play B in stage $t$.
In Exercise 14.8 we present a guided proof of the fact that the set $C_3$ is indeed approachable by Player 1. ◭
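The strategy described for the set C3 can be explored numerically. The following Python lines are a minimal sketch under stated assumptions, not the guided proof of Exercise 14.8: the names `simulate`, `dist_to_c3`, and `opponents` are hypothetical, the action in stage 1 (where no average payoff exists yet) is arbitrarily set to B, and only three illustrative behaviors of Player 2 are tried. Because the payoffs are integers, the condition that the average lies on or above the diagonal is checked exactly on the payoff totals.

```python
import random

# Payoff matrix of Example 14.4 (rows T, B of Player 1; columns L, R of Player 2).
U = {("T", "L"): (0, 0), ("T", "R"): (0, 0),
     ("B", "L"): (1, 1), ("B", "R"): (1, 0)}

def dist_to_c3(point):
    """Euclidean distance from `point` to the segment C3 from (1/2, 1/2) to (1, 0)."""
    ax, ay, bx, by = 0.5, 0.5, 1.0, 0.0
    dx, dy = bx - ax, by - ay
    s = ((point[0] - ax) * dx + (point[1] - ay) * dy) / (dx * dx + dy * dy)
    s = max(0.0, min(1.0, s))  # clamp the projection onto the segment
    cx, cy = ax + s * dx, ay + s * dy
    return ((point[0] - cx) ** 2 + (point[1] - cy) ** 2) ** 0.5

def simulate(num_stages, player2, seed=0):
    """Play the strategy of Player 1 against `player2`; return (average payoff, distance to C3)."""
    rng = random.Random(seed)
    total = [0, 0]
    for t in range(1, num_stages + 1):
        if t == 1:
            i = "B"  # assumption: the text leaves the first action unspecified
        else:
            # average up to stage t-1 is on or above x1 + x2 = 1
            # exactly when total1 + total2 >= t - 1
            i = "T" if total[0] + total[1] >= t - 1 else "B"
        j = player2(t, rng)
        g = U[(i, j)]
        total[0] += g[0]
        total[1] += g[1]
    avg = (total[0] / num_stages, total[1] / num_stages)
    return avg, dist_to_c3(avg)

# Three illustrative behaviors of Player 2 (assumptions, not from the text).
opponents = {
    "always L": lambda t, rng: "L",
    "always R": lambda t, rng: "R",
    "uniform random": lambda t, rng: rng.choice("LR"),
}

for name, opponent in opponents.items():
    avg, d = simulate(1000, opponent)
    print(name, avg, d)
```

In each of these runs the sum of the two payoff totals never differs from the number of stages by more than 1, which is why the average payoff stays close to the diagonal; a simulation of this kind illustrates the claim but does not replace the proof.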
14.4 Connections between approachable and excludable sets
The following claims, whose proofs are left to the reader, state several simple properties that follow from the definitions (Exercise 14.4).
Theorem 14.5 The following two claims hold:
1. If strategy σk approaches a set C for player k, then it approaches the closure of C for that player.
2. If strategy σk excludes a set C for player k, then it excludes the closure of C for that player.
Let $M \geq \frac{1}{2}$ be a bound on the norm of the payoffs in the game:
\[ \|u(i, j)\| \leq M, \quad \forall i \in \mathcal{I}, \ \forall j \in \mathcal{J}. \tag{14.20} \]
In particular, $\|u(i^t, j^t)\| \leq M$ in every stage $t$. The triangle inequality implies that
\[ \|\bar{g}^T\| \leq M, \quad \forall T \in \mathbb{N}. \tag{14.21} \]
In words, the average payoff is located in the ball with radius M around the origin. Therefore, if the average payoff approaches a particular set, it must approach the intersection of that set and the ball of radius M around the origin. Similarly, if a player can guarantee that the distance between the average payoff and a particular set is positive and bounded away from 0, then he can guarantee that the distance between the average payoff and the intersection of that set and the ball of radius M around the origin is positive and bounded away from 0. This insight is expressed in the next theorem, whose proof is left to the reader (Exercise 14.5).
Theorem 14.6 The following two claims hold:
1. A closed set C is approachable by a player if and only if the set $\{x \in C : \|x\| \leq M\}$ is approachable by the player.
2. A closed set C is excludable by a player if and only if the set $\{x \in C : \|x\| \leq M\}$ is excludable by the player.
The following theorem relates to sets containing approachable sets, and to subsets of excludable sets (Exercise 14.6).
Theorem 14.7 The following two claims hold:
1. If strategy σk approaches a set C for player k, then it approaches every superset of C for that player.
2. If strategy σk excludes a set C for player k, then it excludes every subset of C for that player.
We close this section with the following theorem (Exercise 14.7).
Theorem 14.8 A set C cannot be both approachable by one player and excludable by the other player.
Theorem 14.8 expresses the opposing interests of the players in this model, as in the model of two-player zero-sum games. In the next section we will present a geometric condition for the approachability of a set, which we then use to prove that every closed and convex set is either approachable by one player, or excludable by the other player.
14.5 A geometric condition for the approachability of a set
If in stage $t$ Player 1 plays the mixed action $p$ and Player 2 plays the mixed action $q$, the expected payoff in that stage is⁴
\[ U(p, q) := \sum_{i,j} p_i u(i, j) q_j, \tag{14.22} \]
which is a vector in $\mathbb{R}^m$. For every mixed action $p \in \Delta(\mathcal{I})$ of Player 1, define the set
\[ R_1(p) := \{U(p, q) : q \in \Delta(\mathcal{J})\} = \left\{ \sum_{i,j} p_i u(i, j) q_j : q \in \Delta(\mathcal{J}) \right\} \subseteq \mathbb{R}^m. \tag{14.23} \]
Thus, if Player 1 plays the mixed action $p$, the expected payoff in the current stage is in the set $R_1(p)$. As we will show (Theorem 14.19, page 585), for every $p \in \Delta(\mathcal{I})$, the strategy of Player 1 in which he plays the mixed action $p$ in every stage approaches the set $R_1(p)$. The reason for this is that when Player 1 implements the mixed action $p$ in every stage, the expected payoff in each stage is located in $R_1(p)$, independently of the action implemented by Player 2. Since the set $R_1(p)$ is convex, it follows that for every $T \in \mathbb{N}$ the expectation of the average payoff up to stage $T$ is also in $R_1(p)$. As we will later show, this further implies, by way of a variation of the strong law of large numbers, that the average payoff $\bar{g}^T$ approaches $R_1(p)$ as $T$ increases to infinity. Similarly, for every mixed action $q \in \Delta(\mathcal{J})$ of Player 2, define
\[ R_2(q) := \{U(p, q) : p \in \Delta(\mathcal{I})\} = \left\{ \sum_{i,j} p_i u(i, j) q_j : p \in \Delta(\mathcal{I}) \right\} \subseteq \mathbb{R}^m. \tag{14.24} \]
Just as for $R_1(p)$, for every $q \in \Delta(\mathcal{J})$, the strategy of Player 2 in which he plays the mixed action $q$ in every stage approaches the set $R_2(q)$ (Theorem 14.19, page 585).
Example 14.9 Consider the game with two-dimensional payoffs in Figure 14.3.

                     Player 2
                   L          R
Player 1    T    (3, 0)     (5, 2)
            B    (0, 1)     (4, 4)

Figure 14.3 The game in Example 14.9
••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••
4 Here, and in the rest of this chapter, a sum $\sum_{i,j}$ will be understood to mean the double sum $\sum_{i \in \mathcal{I}} \sum_{j \in \mathcal{J}}$.
Figure 14.4 depicts the sets R1 (p) and R2 (q) for several values of p and q. For simplicity, when Player 1 has two actions, T and B, we will identify every number p in the interval [0, 1] with the mixed action [p(T ), (1 − p)(B)]. When Player 2 has two actions, L and R, we will identify every number q in the interval [0, 1] with the mixed action [q(L), (1 − q)(R)].
Figure 14.4 The sets $R_1(p)$ and $R_2(q)$ in Example 14.9 (left panel: the sets $R_1(p)$ for $p = 0, \frac{1}{4}, \frac{1}{2}, \frac{3}{4}, 1$; right panel: the sets $R_2(q)$ for $q = 0, \frac{1}{4}, \frac{1}{2}, \frac{3}{4}, 1$)
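Since each player has two actions in Example 14.9, every set R1(p) is the line segment between U(p, L) and U(p, R), and every set R2(q) is the line segment between U(T, q) and U(B, q). The following Python lines are a minimal sketch that computes these endpoints from Equation (14.22); the names `expected_payoff`, `r1_endpoints`, and `r2_endpoints` are hypothetical and not part of the original text.

```python
# Payoff matrix of Example 14.9: rows T, B (Player 1), columns L, R (Player 2).
U = [[(3, 0), (5, 2)],
     [(0, 1), (4, 4)]]

def expected_payoff(p, q):
    """U(p, q) of Equation (14.22); p and q are probability vectors over rows and columns."""
    m = len(U[0][0])
    return tuple(
        sum(p[i] * q[j] * U[i][j][l] for i in range(len(p)) for j in range(len(q)))
        for l in range(m)
    )

def r1_endpoints(p_top):
    """Endpoints U(p, L), U(p, R) of the segment R1(p), where p_top is the probability of T."""
    p = (p_top, 1 - p_top)
    return expected_payoff(p, (1, 0)), expected_payoff(p, (0, 1))

def r2_endpoints(q_left):
    """Endpoints U(T, q), U(B, q) of the segment R2(q), where q_left is the probability of L."""
    q = (q_left, 1 - q_left)
    return expected_payoff((1, 0), q), expected_payoff((0, 1), q)

for value in (0, 0.25, 0.5, 0.75, 1):
    print(value, r1_endpoints(value), r2_endpoints(value))
```

For instance, for p = 1/2 the segment R1(p) runs from (1.5, 0.5) to (4.5, 3), and for q = 1/2 the segment R2(q) runs from (4, 1) to (2, 2.5).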
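◭
Definition 14.10 A hyperplane $H(\alpha, \beta)$ in $\mathbb{R}^m$ is defined by
\[ H(\alpha, \beta) := \{x \in \mathbb{R}^m : \langle \alpha, x \rangle = \beta\}, \tag{14.25} \]
where $\alpha \in \mathbb{R}^m$ and $\beta \in \mathbb{R}$.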
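Denote
\[ H^+(\alpha, \beta) = \{x \in \mathbb{R}^m : \langle x, \alpha \rangle \geq \beta\} \tag{14.26} \]
and
\[ H^-(\alpha, \beta) = \{x \in \mathbb{R}^m : \langle x, \alpha \rangle \leq \beta\}. \tag{14.27} \]
$H^+(\alpha, \beta)$ and $H^-(\alpha, \beta)$ are the half-spaces defined by the hyperplane $H(\alpha, \beta)$. Note that $H^+(\alpha, \beta) \cap H^-(\alpha, \beta) = H(\alpha, \beta)$. Figure 14.5 depicts the hyperplane $H((2, 1), 2)$ in $\mathbb{R}^2$, and the two corresponding half-spaces. By definition (see Corollary 14.23),
\[ H^+(\alpha, \beta) = H^-(-\alpha, -\beta). \tag{14.28} \]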
For every $x, y \in \mathbb{R}^m$, the hyperplane $H(x - y, \langle x - y, y \rangle)$ is the hyperplane passing through the point $y$, and perpendicular to the line passing through $x$ and $y$ (Exercise 23.35 on page 954). For example, in the case m = 2 described in Figure 14.6, the slope of the line passing through $x$ and $y$ is $\frac{y_2 - x_2}{y_1 - x_1}$. We now show that the slope of the hyperplane $H(x - y, \langle x - y, y \rangle)$, which in this case is a line, is $-\frac{y_1 - x_1}{y_2 - x_2}$, and therefore this line is perpendicular to the line passing through $x$ and $y$. Choose a point $z = (z_1, z_2) \neq y$ on the hyperplane $H(x - y, \langle x - y, y \rangle)$. Then $z$ satisfies
\[ z_1 (x_1 - y_1) + z_2 (x_2 - y_2) = \langle x - y, y \rangle = (x_1 - y_1) y_1 + (x_2 - y_2) y_2. \tag{14.29} \]
Figure 14.5 The hyperplane $H((2, 1), 2)$ in $\mathbb{R}^2$, which is the line $2x_1 + x_2 = 2$, with the half-spaces $H^+((2, 1), 2)$ and $H^-((2, 1), 2)$
Figure 14.6 The hyperplane $H(y - x, \langle y - x, y \rangle)$
This further implies that the slope of the line connecting $z$ and $y$, which is the hyperplane $H$, is
\[ \frac{z_2 - y_2}{z_1 - y_1} = -\frac{x_1 - y_1}{x_2 - y_2}, \tag{14.30} \]
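which is what we needed to show.
Definition 14.11 Let $C \subseteq \mathbb{R}^m$ be a set, and let $x \notin C$ be a point in $\mathbb{R}^m$. A hyperplane $H(\alpha, \beta)$ is said to separate $x$ from $C$ if:
1. $x \in H^+(\alpha, \beta) \setminus H(\alpha, \beta)$ and $C \subseteq H^-(\alpha, \beta)$, or
2. $x \in H^-(\alpha, \beta) \setminus H(\alpha, \beta)$ and $C \subseteq H^+(\alpha, \beta)$.
In words, a hyperplane $H(\alpha, \beta)$ separates $x$ from $C$ if (i) $\langle x, \alpha \rangle > \beta$ and $\langle y, \alpha \rangle \leq \beta$ for all $y \in C$, or (ii) $\langle x, \alpha \rangle < \beta$ and $\langle y, \alpha \rangle \geq \beta$ for all $y \in C$.
As in Chapter 13, denote by $F$ the convex hull of all possible one-stage payoffs:
\[ F = \mathrm{conv}\{u(i, j), (i, j) \in \mathcal{I} \times \mathcal{J}\}. \tag{14.31} \]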
Note that the average payoff $\bar{g}^t$, as a weighted average of vectors in the set $\{u(i, j), (i, j) \in \mathcal{I} \times \mathcal{J}\}$, is necessarily in the convex set $F$.
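The hyperplane H(x − y, ⟨x − y, y⟩) and the separation condition of Definition 14.11 can be checked numerically on small examples. The following Python lines are a minimal sketch; the names `hyperplane_through`, `side`, and `separates` are hypothetical, the points are chosen only for illustration, and `separates` assumes that the set C is given as a finite list of points.

```python
def inner(a, b):
    """Inner product of two vectors."""
    return sum(s * t for s, t in zip(a, b))

def hyperplane_through(x, y):
    """Return (alpha, beta) describing the hyperplane H(x - y, <x - y, y>)."""
    alpha = tuple(xi - yi for xi, yi in zip(x, y))
    return alpha, inner(alpha, y)

def side(alpha, beta, z):
    """+1 if <alpha, z> > beta, -1 if <alpha, z> < beta, 0 if z lies on H(alpha, beta)."""
    value = inner(alpha, z) - beta
    return (value > 0) - (value < 0)

def separates(alpha, beta, x, C):
    """Definition 14.11 for a finite list C: x strictly on one side, C weakly on the other."""
    sx = side(alpha, beta, x)
    return sx != 0 and all(side(alpha, beta, y) != sx for y in C)

x, y = (3, 1), (1, 2)
alpha, beta = hyperplane_through(x, y)
print(alpha, beta)                 # normal vector x - y and level <x - y, y>
print(side(alpha, beta, y))        # y lies on the hyperplane
z = (2, 4)                         # another point on the hyperplane
print(inner(alpha, (z[0] - y[0], z[1] - y[1])))  # x - y is perpendicular to z - y
print(separates(alpha, beta, x, [(1, 2), (0, 1), (0, 3)]))
```

In this illustration the line through x and y has slope −1/2 and the hyperplane has slope 2, in agreement with Equation (14.30).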
As previously noted, the set $R_1(p)$ will be proved to be approachable by Player 1, for every $p \in \Delta(\mathcal{I})$. By Theorem 14.7, any half-space containing at least one of the sets $(R_1(p))_{p \in \Delta(\mathcal{I})}$ is also approachable by Player 1. This observation leads to the concept of a “B-set.” A set $C$ is a B-set for Player 1 if each half-space in a certain collection of half-spaces contains a set $R_1(p)$ for some $p \in \Delta(\mathcal{I})$.
Definition 14.12 A closed set $C \subseteq \mathbb{R}^m$ is a B-set for Player 1 if for every point $x \in F \setminus C$ there exist a point $y = y(x, C) \in C$ and a mixed action $p = p(x, C) \in \Delta(\mathcal{I})$ of Player 1 satisfying:
1. $y$ is a point in $C$ that is closest to $x$:
\[ d(x, y) = d(x, C). \tag{14.32} \]
2. The hyperplane $H(y - x, \langle y - x, y \rangle)$ separates $x$ from $R_1(p)$:
\[ R_1(p) \subseteq H^+(y - x, \langle y - x, y \rangle), \tag{14.33} \]
\[ x \in H^-(y - x, \langle y - x, y \rangle) \setminus H(y - x, \langle y - x, y \rangle). \tag{14.34} \]
Remark 14.13 The hyperplane $H(y - x, \langle y - x, y \rangle)$ satisfies the following three properties (Exercise 23.35 on page 954):
1. $y \in H(y - x, \langle y - x, y \rangle)$.
2. This hyperplane is perpendicular to $y - x$, that is, $\langle y - x, z - y \rangle = 0$ for all $z \in H(y - x, \langle y - x, y \rangle)$.
3. $y$ is the point in $H(y - x, \langle y - x, y \rangle)$ that is closest to $x$, that is, $\langle z - x, z - x \rangle > \langle y - x, y - x \rangle$ for all $z \in H(y - x, \langle y - x, y \rangle)$, $z \neq y$.
Similarly, for a given hyperplane $H$ and a point $x \notin H$, if $y \in H$ is the point in $H$ that is closest to $x$, then $H = H(y - x, \langle y - x, y \rangle)$ (Exercise 23.36 on page 954).
Note that the condition in Definition 14.12 requires that for every $x$ there exist a point $y$ and a mixed action $p$ of Player 1 satisfying (1) and (2); for a given mixed action $p$, it is not the case that every point $y$ satisfying (1) also satisfies (2). In Figure 14.7, there are two points $y, y'$ in $C$ that are the closest points to $x$. The hyperplane $H(y - x, \langle y - x, y \rangle)$, containing $y$, separates $x$ from $R_1(p)$. In contrast, the hyperplane $H(y' - x, \langle y' - x, y' \rangle)$ does not separate $x$ from $R_1(p)$.
The definition of a B-set for Player 2 is analogous to Definition 14.12: a set $C$ is a B-set for Player 2 if for each point $x \in F \setminus C$ there exists a mixed action $q \in \Delta(\mathcal{J})$ of Player 2 such that the hyperplane $H(y - x, \langle y - x, y \rangle)$ separates $x$ from $R_2(q)$, where $y \in C$ is a point in $C$ that is closest to $x$. The following theorem presents a geometric condition that guarantees the approachability of a set by a particular player.
Theorem 14.14 (Blackwell [1956]) If a set $C$ is a B-set for player $k$, then it is approachable by player $k$.
The converse may not hold: there are sets approachable by a player $k$ that are not B-sets for player $k$ (Exercise 14.15).
Figure 14.7 The hyperplane $H(y - x, \langle y - x, y \rangle)$ separates $x$ from $R_1(p)$
Figure 14.8 The idea behind the proof of Theorem 14.14
The intuition behind the proof (for Player 1) is depicted in Figure 14.8. Consider the strategy of Player 1 under which he plays the mixed action $p(\bar{g}^t, C)$ in every stage $t + 1$. The hyperplane identified by the definition of a B-set is the hyperplane tangent to $C$ at the point $y = y(\bar{g}^t, C)$ in $C$ that is closest to $\bar{g}^t$. Suppose that $\bar{g}^t$, the average payoff up to stage $t$, is outside $C$, and let $y = y(\bar{g}^t, C)$ and $p = p(\bar{g}^t, C)$ be the point in $C$ and the mixed action of Player 1, respectively, satisfying conditions (1) and (2) in Definition 14.12, for $x = \bar{g}^t$. If Player 1 plays the mixed action $p$, the expected payoff in stage $t + 1$, denoted in the figure by $f^{t+1}$, is in $R_1(p)$, and therefore the expected value of $\bar{g}^{t+1}$ is located on the line connecting $\bar{g}^t$ with $f^{t+1}$. We will show that the expected distance between $\bar{g}^{t+1}$ and $C$ is smaller than the distance between $\bar{g}^t$ and $C$. Finally, we will show that if the expected distance to $C$ goes to 0, the distance itself also goes to 0, with probability 1. We now turn to the formal proof of the theorem.
Proof: We will prove the theorem for Player 1. The proof for Player 2 is similar. From Theorem 14.6 (page 575) we may assume without loss of generality that for every $y \in C$,
\[ \|y\| \leq M, \tag{14.35} \]
581
14.5 The approachability of a set
and, in particular, the absolute value of every coordinate of y is less than or equal to M. We will first define a strategy σ1∗ for Player 1, and then prove that it guarantees that the average payoff approaches the set C. In the first stage, the strategy σ1∗ chooses any action. For each t ≥ 1 the strategy σ1∗ instructs Player 1 to play as follows in stage t + 1:
• If $\bar g^t \in C$, the definition of $\sigma_1^*$ is immaterial (play any action).
• If $\bar g^t \notin C$, the strategy $\sigma_1^*$ instructs the player to choose the mixed action $p(\bar g^t, C)$ (as defined in Definition 14.12).
Denote by $d^t = d(\bar g^t, C)$ the distance between the average payoff up to stage $t$ and the set $C$. We wish to show that for every strategy $\sigma_2$ of Player 2, the distance $d^t$ converges to zero, with probability 1, and that the rate of convergence can be bounded, independently of the strategy of Player 2.
Lemma 14.15 For every strategy $\sigma_2$ of Player 2, and for every $t \in \mathbb{N}$,
\[ \mathbf{E}_{\sigma_1^*,\sigma_2}[(d^t)^2] \leq \frac{4M^2}{t}. \tag{14.36} \]
Proof: We will prove the claim by induction on $t$. Since the payoffs are bounded by $M$, one has $d^t = d(\bar g^t, C) \leq 2M$ for all $t \in \mathbb{N}$: the distance between the average payoff and the set $C$ is not greater than twice the maximal payoff. In particular, $(d^1)^2 \leq 4M^2$, so Equation (14.36) holds for $t = 1$. Assume by induction that Equation (14.36) holds for $t$; we will prove that it holds for $t + 1$. The average payoff up to stage $t + 1$ is a weighted average (i) of the average payoff $\bar g^t$ up to stage $t$, and (ii) of the payoff $g^{t+1}$ in stage $t + 1$:
\[ \bar g^{t+1} = \frac{1}{t+1} \sum_{l=1}^{t+1} g^l = \frac{t}{t+1} \times \frac{1}{t} \sum_{l=1}^{t} g^l + \frac{1}{t+1} g^{t+1} = \frac{t}{t+1} \bar g^t + \frac{1}{t+1} g^{t+1}. \tag{14.37} \]
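The weighted-average identity (14.37) is what allows the average payoff to be updated stage by stage without storing the whole history. A minimal sketch follows; the stage payoffs are hypothetical values in the plane and the function name is introduced here only for illustration.

```python
def update_average(avg, t, payoff):
    """Average over t + 1 stages, given the average over t stages and the new stage payoff."""
    return [(t * a + g) / (t + 1) for a, g in zip(avg, payoff)]

# Hypothetical stage payoffs g^1, ..., g^4 (vectors in the plane).
payoffs = [[1.0, 0.0], [0.0, 2.0], [3.0, -1.0], [0.0, 1.0]]

avg = payoffs[0]                     # the average after one stage is the first payoff
for t in range(1, len(payoffs)):     # t = number of stages already averaged
    avg = update_average(avg, t, payoffs[t])

# Direct computation of the average, for comparison with the recursive one.
direct = [sum(g[k] for g in payoffs) / len(payoffs) for k in range(2)]
```

Up to rounding, `avg` and `direct` agree, which is the content of the identity.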
We wish to show that the expected value of $d^{t+1}$, the distance between $\bar g^{t+1}$ and $C$, shrinks. If $\bar g^t \in C$, then $y^t = \bar g^t$. If $\bar g^t \notin C$, let $y^t = y(\bar g^t, C)$ be the corresponding closest point in $C$ and denote by $p^t = p(\bar g^t, C)$ the mixed action that Player 1 plays in stage $t + 1$. Since $y^t \in C$, one has $d(\bar g^{t+1}, C) \leq d(\bar g^{t+1}, y^t)$, leading to
\[ (d^{t+1})^2 = (d(\bar g^{t+1}, C))^2 \leq (d(\bar g^{t+1}, y^t))^2 = \|y^t - \bar g^{t+1}\|^2. \tag{14.38} \]
By Equation (14.37), the right-hand side of Equation (14.38) is
\[ \left\| \frac{t}{t+1} (y^t - \bar g^t) + \frac{1}{t+1} (y^t - g^{t+1}) \right\|^2. \tag{14.39} \]
Since $d^t = \|y^t - \bar g^t\|$ and $\|y^t - g^{t+1}\| \leq 2M$, using Equations (14.38) and (14.39), this implies that
\[ (d^{t+1})^2 \leq \left( \frac{t}{t+1} \right)^2 (d^t)^2 + \frac{4M^2}{(t+1)^2} + \frac{2t}{(t+1)^2} \langle y^t - g^{t+1}, y^t - \bar g^t \rangle. \tag{14.40} \]
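The passage from (14.39) to (14.40) is the expansion of a squared norm. As a sketch, write $a = y^t - \bar g^t$ and $b = y^t - g^{t+1}$ (shorthand introduced here, not in the text), where $\bar g^t$ is the average payoff up to stage $t$ and $g^{t+1}$ is the payoff in stage $t + 1$. Then, using $\|a\| = d^t$ and $\|b\| \leq 2M$:

```latex
\begin{align*}
\left\| \tfrac{t}{t+1}\, a + \tfrac{1}{t+1}\, b \right\|^2
  &= \left( \tfrac{t}{t+1} \right)^2 \|a\|^2
     + \tfrac{1}{(t+1)^2}\, \|b\|^2
     + \tfrac{2t}{(t+1)^2}\, \langle b, a \rangle \\
  &\leq \left( \tfrac{t}{t+1} \right)^2 (d^t)^2
     + \tfrac{4M^2}{(t+1)^2}
     + \tfrac{2t}{(t+1)^2}\, \langle b, a \rangle .
\end{align*}
```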
Taking conditional expectation on both sides of Equation (14.40), conditioned on the history $h^t$ up to stage $t$, yields
\[ \mathbf{E}_{\sigma_1^*,\sigma_2}[(d^{t+1})^2 \mid h^t] \leq \left( \frac{t}{t+1} \right)^2 \mathbf{E}_{\sigma_1^*,\sigma_2}[(d^t)^2 \mid h^t] + \frac{4M^2}{(t+1)^2} + \frac{2t}{(t+1)^2} \mathbf{E}_{\sigma_1^*,\sigma_2}[\langle y^t - g^{t+1}, y^t - \bar g^t \rangle \mid h^t]. \tag{14.41} \]
We now show that the third element on the right-hand side of Equation (14.41) is nonpositive. If $\bar g^t \in C$, then $y^t = \bar g^t$, in which case the third element equals 0. If $\bar g^t \notin C$, then, because $C$ is a B-set for Player 1, it follows from the definition of $p^t$ that $R_1(p^t) \subset H^+(y^t - \bar g^t, \langle y^t - \bar g^t, y^t \rangle)$. Since in stage $t + 1$ Player 1 plays the mixed action $p^t$, the expected payoff in stage $t + 1$, which is $\mathbf{E}_{\sigma_1^*,\sigma_2}[g^{t+1} \mid h^t]$, is located in $R_1(p^t)$, and therefore in $H^+(y^t - \bar g^t, \langle y^t - \bar g^t, y^t \rangle)$. It follows that
\[ \langle y^t - \bar g^t, \mathbf{E}_{\sigma_1^*,\sigma_2}[g^{t+1} \mid h^t] \rangle \geq \langle y^t - \bar g^t, y^t \rangle. \tag{14.42} \]
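Since the inner product is symmetric and bilinear, and since the average payoff $\bar g^t$ and the point $y^t$ are determined given the history $h^t$, we get
\[ \begin{aligned} \mathbf{E}_{\sigma_1^*,\sigma_2}[\langle y^t - g^{t+1}, y^t - \bar g^t \rangle \mid h^t] &= \langle y^t - \mathbf{E}_{\sigma_1^*,\sigma_2}[g^{t+1} \mid h^t], y^t - \bar g^t \rangle \\ &= \langle y^t - \mathbf{E}_{\sigma_1^*,\sigma_2}[g^{t+1} \mid h^t], y^t \rangle - \langle y^t - \mathbf{E}_{\sigma_1^*,\sigma_2}[g^{t+1} \mid h^t], \bar g^t \rangle \leq 0. \end{aligned} \tag{14.43} \]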
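The final inequality in (14.43) may be easier to see after regrouping the terms around the vector $y^t - \bar g^t$, where $\bar g^t$ is the average payoff up to stage $t$. The following rearrangement is a sketch of that step; it uses only the symmetry and bilinearity of the inner product together with Equation (14.42), and the subscript $\sigma_1^*, \sigma_2$ on the expectation is omitted for brevity.

```latex
\begin{align*}
\langle y^t - \mathbf{E}[g^{t+1} \mid h^t],\; y^t - \bar g^t \rangle
  &= \langle y^t - \bar g^t,\; y^t \rangle
     - \langle y^t - \bar g^t,\; \mathbf{E}[g^{t+1} \mid h^t] \rangle \\
  &\leq 0 \qquad \text{by (14.42)} .
\end{align*}
```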
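Since the third element on the right-hand side of Equation (14.41) is nonpositive, we get
\[ \mathbf{E}_{\sigma_1^*,\sigma_2}[(d^{t+1})^2 \mid h^t] \leq \left( \frac{t}{t+1} \right)^2 \mathbf{E}_{\sigma_1^*,\sigma_2}[(d^t)^2 \mid h^t] + \frac{4M^2}{(t+1)^2}. \tag{14.44} \]
Taking the expectation over $h^t$ on both sides of Equation (14.44) yields
\[ \mathbf{E}_{\sigma_1^*,\sigma_2}[(d^{t+1})^2] = \mathbf{E}_{\sigma_1^*,\sigma_2}\left[ \mathbf{E}_{\sigma_1^*,\sigma_2}[(d^{t+1})^2 \mid h^t] \right] \leq \left( \frac{t}{t+1} \right)^2 \mathbf{E}_{\sigma_1^*,\sigma_2}[(d^t)^2] + \frac{4M^2}{(t+1)^2}. \tag{14.45} \]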
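By the inductive hypothesis, $\mathbf{E}_{\sigma_1^*,\sigma_2}[(d^t)^2] \leq \frac{4M^2}{t}$, and therefore
\[ \mathbf{E}_{\sigma_1^*,\sigma_2}[(d^{t+1})^2] \leq \left( \frac{t}{t+1} \right)^2 \frac{4M^2}{t} + \frac{4M^2}{(t+1)^2} = \frac{4M^2}{t+1}, \tag{14.46} \]
which is what we wanted to show.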
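The strategy in the proof and the bound of Lemma 14.15 can be illustrated by simulation. The sketch below rests on several assumptions that are not in the text: a hypothetical 2×2 game whose vector payoff records how much better each fixed row would have done than the row played, the target set $C$ taken to be the nonpositive quadrant (unbounded, whereas the proof assumes a bounded set without loss of generality), and a closed-form mixed action proportional to the positive part of the average payoff, which can be checked to satisfy conditions (a) and (b) for this particular game. With `sample_actions=False` the stage payoff is replaced by its expectation under the mixed action, so the recursion (14.40) with a nonpositive last term holds along the path and the bound $4M^2/t$ (here $M = 1$) can be verified deterministically.

```python
import math
import random

# Hypothetical scalar payoffs (rows: Player 1, columns: Player 2), entries in [0, 1].
A = [[1.0, 0.0],
     [0.0, 1.0]]

def vector_payoff(i, j):
    """Toy vector payoff: gain of each fixed row k over the row i actually played."""
    return [A[k][j] - A[i][j] for k in range(2)]

def distance_to_C(x):
    """Distance from x to C = {z : z <= 0 in every coordinate}."""
    return math.sqrt(sum(max(c, 0.0) ** 2 for c in x))

def mixed_action(x):
    """Mixed action for x outside C: weights proportional to the positive part of x."""
    pos = [max(c, 0.0) for c in x]
    total = sum(pos)
    if total == 0.0:              # x is in C, so the choice is immaterial
        return [0.5, 0.5]
    return [c / total for c in pos]

def simulate(T, sample_actions=True, seed=0):
    """Return the distances d^1, ..., d^T of the average payoff from C."""
    rng = random.Random(seed)
    avg = [0.0, 0.0]              # running average payoff
    distances = []
    for t in range(T):            # t stages have been played so far
        p = mixed_action(avg)
        j = rng.randrange(2)      # an arbitrary (here random) choice of Player 2
        if sample_actions:
            i = 0 if rng.random() < p[0] else 1
            g = vector_payoff(i, j)
        else:                     # expected stage payoff under p
            g = [sum(p[i] * vector_payoff(i, j)[k] for i in range(2))
                 for k in range(2)]
        avg = [(t * a + c) / (t + 1) for a, c in zip(avg, g)]
        distances.append(distance_to_C(avg))
    return distances

expected_path = simulate(2000, sample_actions=False, seed=1)
sampled_path = simulate(2000, sample_actions=True, seed=1)
```

In the expected-payoff run, every entry of `expected_path` at stage $t$ is at most $2/\sqrt{t}$. The sampled run is random and is only covered by the bound in expectation, so no pathwise claim is made for it.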
Recall that Markov's inequality states that for every nonnegative random variable $X$, and for every $c > 0$,
\[ \mathbf{P}(X \geq c) \leq \frac{\mathbf{E}(X)}{c}. \tag{14.47} \]
By Lemma 14.15 and Markov's inequality (with $c = \frac{2M}{\sqrt{t}}$), we deduce that the probability that $d^t$ is large is small (for large $t$):
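Corollary 14.16 For every strategy $\sigma_2$ of Player 2,
\[ \mathbf{P}_{\sigma_1^*,\sigma_2}\left( (d^t)^2 \geq \frac{2M}{\sqrt{t}} \right) \leq \frac{2M}{\sqrt{t}}, \tag{14.48} \]
and therefore
\[ \mathbf{P}_{\sigma_1^*,\sigma_2}\left( d^t \geq \frac{\sqrt{2M}}{t^{1/4}} \right) \leq \frac{2M}{\sqrt{t}}. \tag{14.49} \]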
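For completeness, the computation behind Corollary 14.16 can be written out. It is a direct application of Markov's inequality to the nonnegative random variable $(d^t)^2$ with $c = 2M/\sqrt{t}$, followed by the bound of Lemma 14.15; the subscript $\sigma_1^*, \sigma_2$ is omitted for brevity.

```latex
\begin{align*}
\mathbf{P}\!\left( (d^t)^2 \geq \frac{2M}{\sqrt{t}} \right)
  &\leq \frac{\mathbf{E}[(d^t)^2]}{2M/\sqrt{t}}
   \leq \frac{4M^2}{t} \cdot \frac{\sqrt{t}}{2M}
   = \frac{2M}{\sqrt{t}} .
\end{align*}
```

Since $d^t \geq 0$, the events $\{(d^t)^2 \geq 2M/\sqrt{t}\}$ and $\{d^t \geq \sqrt{2M}/t^{1/4}\}$ coincide, which gives the second inequality of the corollary.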
This corollary relates to the distance between $\bar g^t$ and the set $C$ in stage $t$. We are interested in showing that this distance is small for all large $t$, i.e., that there exists $T$ sufficiently large such that from stage $T$ onwards, the distance $d^t$ remains small. In other words, while in Lemma 14.15 we show that the expected values of the random variables $((d^t)^2)_{t \in \mathbb{N}}$ converge to 0, and therefore the sequence $(d^t)_{t \in \mathbb{N}}$ converges in probability to 0, we now wish to show that convergence occurs almost surely. Although this can be proved using the strong law of large numbers for uncorrelated random variables, we will present a direct proof of convergence, without appealing to the law of large numbers.
Lemma 14.17 For every $\varepsilon > 0$, there exists a number $T$ sufficiently large such that for every strategy $\sigma_2$ of Player 2,
\[ \mathbf{P}_{\sigma_1^*,\sigma_2}(d^t < \varepsilon, \ \forall t \geq T) > 1 - \varepsilon. \tag{14.50} \]
In particular, this implies that the set $C$ is approachable by Player 1. Therefore, proving Lemma 14.17 will complete the proof of Theorem 14.14.
Proof: Let $\varepsilon > 0$. By⁵ Equation (14.49), for $t = l^3$,
\[ \mathbf{P}_{\sigma_1^*,\sigma_2}\left( d^{l^3} \geq \frac{\sqrt{2M}}{l^{3/4}} \right) \leq \frac{2M}{l^{3/2}}. \tag{14.51} \]
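Let $L \in \mathbb{N}$. Summing Equation (14.51) over $l \geq L$ yields
\[ \mathbf{P}_{\sigma_1^*,\sigma_2}\left( d^{l^3} \geq \frac{\sqrt{2M}}{l^{3/4}} \text{ for some } l \geq L \right) \leq 2M \sum_{l=L}^{\infty} \frac{1}{l^{3/2}}. \tag{14.52} \]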
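The right-hand side of (14.52) is the tail of a convergent series, so it can be made as small as desired by taking $L$ large. The sketch below shows one way to find a sufficient $L$ explicitly. It assumes the standard integral comparison, which bounds the sum of $l^{-3/2}$ over $l \geq L$ by $2/\sqrt{L-1}$ for $L \geq 2$; the function names and the numerical values of $M$ and $\varepsilon$ are hypothetical, and the default `minimum=7` mirrors the additional requirement on the starting index imposed later in the proof.

```python
import math

def tail_upper_bound(L):
    """Upper bound on the sum of l**(-3/2) over l >= L (valid for L >= 2),
    obtained by comparing with the integral of x**(-3/2) from L - 1 to infinity."""
    return 2.0 / math.sqrt(L - 1)

def sufficient_L(M, eps, minimum=7):
    """An index L >= minimum for which 2 * M * tail_upper_bound(L) <= eps."""
    return max(minimum, math.ceil(1 + (4.0 * M / eps) ** 2))

# Hypothetical values of the payoff bound M and the tolerance eps.
L0 = sufficient_L(M=1.0, eps=0.5)
bound_at_L0 = 2 * 1.0 * tail_upper_bound(L0)
```

The index returned is sufficient but not the smallest possible, because the integral comparison is not tight.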
Consider the complement of the event on the left-hand side of Equation (14.52):
\[ \mathbf{P}_{\sigma_1^*,\sigma_2}\left( d^{l^3} < \frac{\sqrt{2M}}{l^{3/4}}, \ \forall l \geq L \right) \geq 1 - 2M \sum_{l=L}^{\infty} \frac{1}{l^{3/2}}. \tag{14.53} \]
Since the series $\sum_{l=1}^{\infty} \frac{1}{l^{3/2}}$ converges, there exists $L_0$ sufficiently large such that $1 - 2M \sum_{l=L_0}^{\infty} \frac{1}{l^{3/2}} \geq 1 - \varepsilon$. For the remainder of the proof, we will also require that $L_0 \geq 7$. We next prove the following lemma.
Lemma 14.18 If $d^{l^3}$