Trends in Mathematics
Leon A. Petrosyan Vladimir V. Mazalov Nikolay A. Zenkevich Editors
Frontiers of Dynamic Games Game Theory and Management, St. Petersburg, 2020
Trends in Mathematics Trends in Mathematics is a series devoted to the publication of volumes arising from conferences and lecture series focusing on a particular topic from any area of mathematics. Its aim is to make current developments available to the community as rapidly as possible without compromise to quality and to archive these for reference. Proposals for volumes can be submitted using the Online Book Project Submission Form at our website www.birkhauser-science.com. Material submitted for publication must be screened and prepared as follows: All contributions should undergo a reviewing process similar to that carried out by journals and be checked for correct use of language which, as a rule, is English. Articles without proofs, or which do not contain any significantly new results, should be rejected. High quality survey papers, however, are welcome. We expect the organizers to deliver manuscripts in a form that is essentially ready for direct reproduction. Any version of TEX is acceptable, but the entire collection of files must be in one particular dialect of TEX and unified according to simple instructions available from Birkhäuser. Furthermore, in order to guarantee the timely appearance of the proceedings it is essential that the final version of the entire material be submitted no later than one year after the conference.
More information about this series at https://link.springer.com/bookseries/4961
Editors Leon A. Petrosyan St. Petersburg State University St. Petersburg, Russia
Vladimir V. Mazalov Institute of Applied Mathematical Research Russian Academy of Sciences Petrozavodsk, Russia
Nikolay A. Zenkevich Graduate School of Management St. Petersburg State University St. Petersburg, Russia
ISSN 2297-0215    ISSN 2297-024X (electronic)
Trends in Mathematics
ISBN 978-3-030-93615-0    ISBN 978-3-030-93616-7 (eBook)
https://doi.org/10.1007/978-3-030-93616-7
Mathematics Subject Classification: 91Axx

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2021

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This book is published under the imprint Birkhäuser, www.birkhauser-science.com, by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
The content of this volume is based on selected talks given at the International Conference "Game Theory and Management (GTM2020)" held in Saint Petersburg, Russia, on October 05–09, 2020. The meeting was organized by St. Petersburg State University and the International Society of Dynamic Games (Russian Chapter). In the volume, two sorts of contributions prevail: chapters mainly concerned with applications of game theory, and chapters where the theoretical background is developed. In the chapter by Encarnación Algaba and René van den Brink, an overview of the most common models of communication and hierarchy restrictions in cooperative transferable utility games is given. Players in these cooperative games take different positions in the network structure. Special attention is given to communication graph games. The authors illustrate these network structures by applying them to cooperative games with restricted cooperation and games with a permission structure. They also discuss the more general union stable systems as models of communication networks. In the chapter by A. Azamov, T. Ibaydullaev, and G. Ibragimov, a differential pursuit-evasion game of kind with simple motion between a team of pursuers and one evader moving along the edges of a regular simplex is considered. Two basic theorems are formulated and proven: one concerns the number of pursuers necessary for pointwise capture of the evader on a finite time interval, and the other gives sufficient conditions for the evader to escape capture. In the chapter by A. Buratto and S. Taboubi, the problem of private labels is studied from a game-theoretic perspective. In a bilateral monopoly channel, two scenarios are considered. In the first scenario, the retailer distributes only the manufacturer's national brand, and in the second scenario, a second brand is added.
In the corresponding differential game model, the equilibria under both scenarios are computed. In the chapter by I. Dahmouni and U. Sumaila, a game-theoretic fishery model is considered under three different scenarios: fishing only, fishing and aquaculture, and the monopoly case. For each case, a two-stage dynamic game is constructed, and the Nash equilibrium and cooperative strategies are found in explicit form.
With respect to what remains in the waters as unexploited fish stock, three possible options are considered: (1) total depletion, (2) limited depletion, and (3) no significant fishing effect. A proposition outlines the conditions under which each of these options may occur. In the chapter by D.N. Fedyanin, epistemic models based on bounded observation are constructed. The construction rests on the assumption that agent i considers that agent j observes an event p if agent i observes p and observes that agent j observes p. A model of generational change for Cournot competition with predictions and memory is investigated. A general method to calculate equilibria is proposed, and its strong and weak points are discussed. In the chapter by M. Geraskin, a duopoly model with linear demand and nonlinear agent cost functions is investigated as a multilevel Stackelberg game. It is also proven that if one of the agents has a concave cost function, multilevel Stackelberg leadership leads to a positive variation, and a bifurcation arises in the system of two agents. The cause of atypical responses in the duopoly is demonstrated. In the chapter by V.V. Gusev, the so-called vertex cover game on a network is defined as a cooperative game whose characteristic function is constructed in a special way. For different types of networks, the Shapley-Shubik index in the corresponding vertex cover games is calculated. The author applies the Shapley-Shubik index to estimate the efficiency of vertices in the vertex cover game, and uses it to demonstrate the properties of an array of surveillance cameras when the cameras are arranged in proportion to the values of the Shapley-Shubik index in the vertex cover game. In the chapter by I.
Konnov, an approximate decomposable penalty method is applied to generalized noncooperative games with joint constraints and restrictions on player share allocation. The approach makes it possible to replace the initial problem with a sequence of approximate penalized Nash equilibrium problems together with an upper-level variational inequality. Convergence of approximate solutions to the initial game problem is established under rather weak conditions. In the chapter by A. Korolev, stochastic parameters are introduced into a network game with production and knowledge externalities. Players' productivities have deterministic and stochastic components. The chapter studies the adjustment dynamics that occur in the process of unifying different regular networks. Explicit expressions for the dynamics of network agents in the form of Brownian random processes are obtained, and a qualitative analysis of solutions is carried out. In the chapter by K.V. Kozlov, G.A. Ougolnitsky, A.B. Usov, and M.K. Malsagov, a static three-level resource allocation game model under corruption is proposed. This is a hierarchical game of the "principal-agents" type with additional feedback on bribes offered by the lower-level agents to the middle-level agent. The agents on the middle and lower levels of the hierarchy make their decisions simultaneously and independently of each other. A numerical solution of the game is found, and experiments with different model parameters are presented.
In the chapter by Nikolay A. Krasovskii and Alexander M. Tarasyev, an analysis of the behavior of equilibrium trajectories is carried out for dynamic systems arising from solutions of bimatrix games. In the framework of guaranteed solutions, algorithms for constructing the value functions, positional strategies, and equilibrium trajectories are proposed. The equilibrium trajectories of the replicator dynamics relating to the theory of evolutionary games are analyzed. The dynamic systems generated by strategies of best replies are also considered. The chapter by V. Kuka and E. Ianovski investigates so-called strategic voting, in which voters misrepresent their preferences to obtain an outcome favorable to themselves and, in doing so, force a suboptimal outcome on society. It is interesting to know what information is needed to carry out strategic voting, and the authors quantify the amount of information needed to manipulate the election. For a special k-approval election, it is proved that complete information about the profile is needed, which, as the authors suggest, is unrealistic. In the chapter by D. Lozovanu and S. Pickl, stochastic games are considered where a single player controls the transition probabilities. The problem of existence and determination of stationary Nash equilibria in the corresponding average stochastic game is investigated in the case when the set of states and the set of actions in the game are finite. It is shown that all stationary equilibria in this case can be obtained from a specially defined auxiliary noncooperative static game in normal form. In their chapter, I.M. Orlov and S.S. Kumacheva use game-theoretic methods to analyze corrupt agents acting not in isolation but as parts of a bigger hierarchical structure, in the hope of obtaining insights that may help combat corruption in organizations.
A subhierarchical two-level game-theoretic model of bribery is proposed, a particular example with six officials on three levels is solved via computer simulation, directions for minimizing corruption and mild corruption are suggested, and a cooperative extension of the game is also considered. In the chapter by L. Petrosyan, D. Yeung, and Y. Pankratova, a differential nonzero-sum game on a network with infinite duration is considered. In the cooperative setting, the characteristic function is constructed based upon the payoffs computed along the cooperative trajectory and the possibility of cutting connections at each time instant of the game. As a result, the corresponding game is convex, and the Shapley value belongs to the core of the game and is time consistent. The proposed approach essentially simplifies the computation of the characteristic function and the Shapley value. An example is presented. In the chapter by Z. Wang, F. Yao, O. Petrosian, and H. Gao, the application of a new class of differential games with continuous updating to a low-carbon chain is considered. It is proven that the cooperative and feedback Nash equilibrium strategies converge to the corresponding strategies in the game with continuous updating as the number of updating instants tends to infinity. The key point is that players can adjust their strategies according to changing information about the conditions of the game.
The GTM2020 program committee thanks all the authors for their active cooperation and participation during the preparation of this volume. The organizers of the conference also gratefully acknowledge the financial support given by Saint Petersburg State University. Last but not least, we thank the reviewers for their outstanding contribution.

Saint Petersburg, Russia    L. A. Petrosyan
Petrozavodsk, Russia    V. V. Mazalov
Saint Petersburg, Russia    N. A. Zenkevich
Contents
Networks, Communication and Hierarchy: Applications to Cooperative Games . . . 1
Encarnación Algaba and René van den Brink
Game with Slow Pursuers on the Edge Graphs of Regular Simplexes . . . 35
Abdulla Azamov, Tolanbay Ibaydullaev, and Gafurjan Ibragimov
Are Retailers’ Private Labels Always Detrimental to National Brand Manufacturers? A Differential Game Perspective . . . 49
Alessandra Buratto and Sihem Taboubi
A Two-Stage Fishery Game with an Aquaculture Facility . . . 79
Ilyass Dahmouni and Ussif Rashid Sumaila
Ordering in Games with Reduced Memory and Planning Horizon of Players . . . 99
Denis N. Fedyanin
Problems of Calculation Equilibria in Stackelberg Nonlinear Duopoly . . . 119
Mikhail I. Geraskin
The Vertex Cover Game . . . 137
Vasily V. Gusev
Application of the Decomposable Penalty Method to a Class of Generalized Nash Equilibrium Problems . . . 149
Igor Konnov
Regular Networks Unification in Games with Stochastic Parameters . . . 167
Alexei Korolev
Simulation Modeling of the Resource Allocation Under Economic Corruption . . . 189
Kirill V. Kozlov, Guennady A. Ougolnitsky, Anatoly B. Usov, and Mukharbeck Kh. Malsagov
Analysis of Equilibrium Trajectories in Dynamic Bimatrix Games . . . 203
Nikolay A. Krasovskii and Alexander M. Tarasyev
Manipulation of k-Approval Under De Re Knowledge . . . 219
Valeriia Kuka and Egor Ianovski
An Approach for Determining Stationary Equilibria in a Single-Controller Average Stochastic Game . . . 235
Dmitrii Lozovanu and Stefan Pickl
Hierarchical Model of Corruption: Game-Theoretic Approach . . . 251
Ivan M. Orlov and Suriya Sh. Kumacheva
Differential Network Games with Infinite Duration . . . 269
Leon Petrosyan, David Yeung, and Yaroslavna Pankratova
Differential Game Model Applied in Low-Carbon Chain with Continuous Updating . . . 279
Zeyang Wang, Fanjun Yao, Ovanes Petrosian, and Hongwei Gao
List of Contributors
Encarnación Algaba Matemática Aplicada II and Instituto de Matemáticas de la Universidad de Sevilla (IMUS), Escuela Superior de Ingenieros, Camino de los Descubrimientos, Sevilla, Spain
Abdulla Azamov Institute of Mathematics named after V.I. Romanowsky, Tashkent, Uzbekistan
Alessandra Buratto Department of Mathematics “Tullio Levi-Civita”, University of Padova, Padova, Italy
Ilyass Dahmouni Department of Fisheries and Oceans Canada, Ottawa, ON, Canada
Denis N. Fedyanin ICS RAS, Moscow, Russia; HSE University, Moscow, Russia
Hongwei Gao School of Mathematics and Statistics, Qingdao University, Qingdao, P.R. China
Mikhail I. Geraskin Institute of Economics and Management, Samara National Research University named after academician S.P. Korolev, Samara, Russia
Vasily V. Gusev HSE University, St. Petersburg, Russian Federation
Egor Ianovski International Laboratory of Game Theory and Decision Making, HSE University, St. Petersburg, Russia
Tolanbay Ibaydullaev Andijan State University named after Z.M. Babur, Andijan, Uzbekistan
Gafurjan Ibragimov Department of Mathematics and INSPEM, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
Igor Konnov Kazan Federal University, Kazan, Russia
Alexei Korolev National Research University Higher School of Economics at St. Petersburg, St. Petersburg, Russia
Kirill V. Kozlov I.I. Vorovich Institute of Mathematics, Mechanics and Computer Sciences, Southern Federal University, Rostov-on-Don, Russia
Nikolay A. Krasovskii N.N. Krasovskii Institute of Mathematics and Mechanics of UrB of RAS, Yekaterinburg, Russia
Valeriia Kuka International Laboratory of Game Theory and Decision Making, HSE University, St. Petersburg, Russia
Suriya Sh. Kumacheva St. Petersburg State University, St. Petersburg, Russia
Dmitrii Lozovanu Institute of Mathematics and Computer Science, Chisinau, Moldova
Mukharbeck Kh. Malsagov Ingush State University, Magas, Russia
Ivan M. Orlov St. Petersburg State University, St. Petersburg, Russia
Guennady A. Ougolnitsky I.I. Vorovich Institute of Mathematics, Mechanics and Computer Sciences, Southern Federal University, Rostov-on-Don, Russia
Yaroslavna Pankratova St. Petersburg State University, Saint-Petersburg, Russia
Ovanes Petrosian School of Automation, Qingdao University, Qingdao, China; St. Petersburg State University, St. Petersburg, Russia
Leon Petrosyan St. Petersburg State University, Saint-Petersburg, Russia
Stefan Pickl Universität der Bundeswehr München, Neubiberg-München, Germany
Ussif Rashid Sumaila Fisheries Economics Research Unit, The University of British Columbia, Vancouver, BC, Canada
Sihem Taboubi GERAD and Marketing Department, HEC Montreal, Montreal, QC, Canada
Alexander M. Tarasyev N.N. Krasovskii Institute of Mathematics and Mechanics of UrB of RAS, Yekaterinburg, Russia; Ural Federal University named after the first President of Russia B.N. Yeltsin, Yekaterinburg, Russia
Anatoly B. Usov I.I. Vorovich Institute of Mathematics, Mechanics and Computer Sciences, Southern Federal University, Rostov-on-Don, Russia
René van den Brink Department of Economics and Tinbergen Institute, VU University Amsterdam, Amsterdam, the Netherlands
Zeyang Wang St. Petersburg State University, St. Petersburg, Russia; School of Mathematics and Computer Science, Yan’an University, Shaanxi, China
Fanjun Yao School of Business, Qingdao University, Qingdao, P.R. China
David Yeung Hong Kong Shue Yan University, Hong Kong, China
Networks, Communication and Hierarchy: Applications to Cooperative Games Encarnación Algaba and René van den Brink
Abstract Agents participating in different kinds of organizations usually take different positions in some network structure. Two well-known network structures are hierarchies and communication networks. We give an overview of the most common models of communication and hierarchy restrictions in cooperative games, compare different network structures with each other, and discuss network structures that combine communication as well as hierarchical features. Throughout the survey, we illustrate these network structures by applying them to cooperative games with restricted cooperation.

Keywords Networks · Games · Communication · Hierarchy · Cooperative TU-game · Shapley value
1 Introduction

A cooperative game with transferable utility, or simply a TU-game, consists of a finite set of players and, for every subset (coalition) of players, a worth representing the total payoff that the coalition can obtain by cooperating. A main question is how to allocate the worth that can be earned by all players cooperating together over the individual players. A (single-valued) solution is a function that assigns to every game a payoff vector whose components are the individual payoffs to each player. A solution is efficient if it assigns to every game a payoff vector such that the sum of the payoffs is equal to the worth of the ‘grand coalition’ consisting of all players.
E. Algaba Matemática Aplicada II and Instituto de Matemáticas de la Universidad de Sevilla (IMUS), Escuela Superior de Ingenieros, Sevilla, Spain e-mail: [email protected] R. van den Brink () Department of Economics and Tinbergen Institute, VU University Amsterdam, Amsterdam, The Netherlands e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 L. A. Petrosyan et al. (eds.), Frontiers of Dynamic Games, Trends in Mathematics, https://doi.org/10.1007/978-3-030-93616-7_1
One of the most applied efficient solutions for cooperative TU-games is the Shapley value (Shapley [63]), which is exposed and highlighted by Algaba et al. [14, 15]. In its classical interpretation, a TU-game describes a situation in which every coalition S (i.e. subset) of N can be formed and earn its worth. In the literature, various restrictions on coalition formation have been developed. A general network structure can be represented by any subset of the power set of N. Usually, in the literature, these network structures satisfy certain properties. Two main forms of restricted cooperation that have been studied are communication restrictions and hierarchies. Myerson [54] introduced the well-known model of a communication graph game, which consists of a TU-game and an undirected (communication) graph, where it is assumed that only coalitions that are connected in the communication graph are feasible. A restricted game is defined where the worth of every feasible (i.e. connected) coalition equals its worth in the original game, while the worth of a nonconnected coalition equals the sum of the worths of its maximally connected subsets (also known as components). Further, he showed that the solution that assigns to every communication graph game the Shapley value of the restricted game is the only solution that satisfies component efficiency (meaning that every maximally connected subset of players earns its own worth) and fairness (meaning that deleting a communication link between two players has the same effect on the individual payoffs of these two players). Algaba et al. [2, 3] introduce and analyze union stable systems, network structures satisfying the property that the union of every pair of nondisjoint feasible coalitions is also feasible, a property that is satisfied by the set of connected coalitions of any undirected graph.
This led to a generalization of the characterization of the Shapley value for communication graph games.1 A model that studies restrictions in cooperation arising from hierarchies is that of a game with a permission structure. In those games it is assumed that the players are part of a hierarchical organization, where some players might need permission or approval from other players before they are allowed to cooperate. Two approaches to games with a permission structure are considered. In the conjunctive approach, as developed in Gilles et al. [44] and van den Brink and Gilles [73], it is assumed that each player needs permission from all its predecessors before it is allowed to cooperate with other players. This implies that a coalition is feasible if and only if for every player in the coalition it holds that all its predecessors belong to the coalition. Alternatively, in the disjunctive approach, as developed in Gilles and Owen [43] and van den Brink [66], it is assumed that each player (except the top players) needs permission from at least one of its predecessors before it is allowed to cooperate. Consequently, a coalition is feasible if and only if every (non-top) player in the coalition has at least one predecessor who also belongs to the coalition. In Algaba et al. [6] it is shown that the sets of feasible coalitions arising from these
1 For union stable systems where all singletons are feasible, Algaba et al. [4] unified two important lines of restricted cooperation: the one introduced by Myerson [54] and the one initiated by Faigle [39]. Moreover, the relationship among union stable systems and hypergraphs is established by Algaba et al. [7].
permission structures are antimatroids, well-known combinatorial structures representing hierarchies; see Dilworth [36] and Edelman and Jamison [38]. A set of feasible coalitions is an antimatroid if it contains the empty set, satisfies accessibility (meaning that every nonempty feasible coalition has at least one player that can leave the coalition such that the result is a feasible subcoalition) and is union closed (meaning that the union of two feasible coalitions is also feasible). An overview of games with a permission structure is given in van den Brink [71].2 In the field of restricted cooperation, van den Brink [70] made clear the distinction between hierarchies and communication networks by showing that the network structures that can be the set of connected coalitions in some undirected graph are exactly those network structures that, besides containing the empty set, satisfy the above-mentioned union stability and 2-accessibility (meaning that every feasible coalition with two or more players has at least two players that can leave the coalition such that the remaining set of players is still a feasible coalition). So, compared to communication feasible sets (network structures that can be obtained as the set of connected coalitions in some undirected graph), antimatroids satisfy a stronger union property (since union closedness implies union stability) but a weaker accessibility property (since 2-accessibility implies accessibility). In view of this last result, Algaba et al. [13] considered networks that are represented by network structures satisfying the weaker of the two union and accessibility properties considered above, i.e. union stability (from communication feasible sets) and accessibility (from antimatroids). They study these so-called accessible union stable network structures. Obviously, all sets of connected coalitions of some (undirected) communication graph as well as all antimatroids fall into this class.
It is also shown that augmenting systems (Bilbao [24]) are accessible union stable network structures, but not every accessible union stable network structure is an augmenting system. This brings us to the conclusion that, under union stability, augmentation implies accessibility (i.e. if a coalition can grow from the empty set to the grand coalition by letting players enter one by one, then also from the grand coalition players can leave one by one until we reach the empty set), but not the other way around. Throughout this paper, we compare these different network structures by considering their impact on coalition formation in cooperative TU-games with restricted cooperation. We specifically consider the solution that assigns to every cooperative game on a certain network structure the Shapley value of an associated restricted game where the worths are generated only by feasible coalitions. This survey is organized as follows. Section 2 gives some basic definitions on cooperative TU-games. Section 3 considers games with restrictions in cooperation
2 As an alternative to restricting the coalitions (i.e. sets of players) that are feasible, Faigle and Kern [40] introduce games under precedence constraints, where (1) the game is defined on a restricted domain that is determined by the hierarchy, and (2) the possible orders in which coalitions can be formed are restricted. In this setting, Algaba et al. [12] define a class of new values in which the removal of certain ‘irrelevant’ players does not affect the payoffs of the remaining players. A comparison between the two models is given in Algaba and van den Brink [1].
arising from communication restrictions, while Sect. 4 considers restrictions arising from hierarchies. In Sect. 5, we combine communication and hierarchical restrictions in one type of network structure. Finally, Sect. 6 contains concluding remarks.
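The set-system properties discussed in this introduction (union stability, union closedness, accessibility, and 2-accessibility) are easy to check mechanically for small families of coalitions. The following Python sketch is our illustration and not part of the chapter; the example family F is a hypothetical conjunctive permission structure with a top player 1 and two subordinates 2 and 3, so feasible coalitions must contain all predecessors of their members.

```python
from itertools import combinations

def union_stable(F):
    # union of every pair of non-disjoint feasible coalitions is feasible
    return all(A | B in F for A, B in combinations(F, 2) if A & B)

def union_closed(F):
    # union of every pair of feasible coalitions is feasible
    return all(A | B in F for A, B in combinations(F, 2))

def accessible(F):
    # every nonempty feasible coalition has a player whose removal stays feasible
    return all(any(S - {i} in F for i in S) for S in F if S)

def two_accessible(F):
    # every feasible coalition of size >= 2 has at least two such players
    return all(sum(S - {i} in F for i in S) >= 2 for S in F if len(S) >= 2)

# Hypothetical permission structure 1 -> 2, 1 -> 3 (conjunctive approach).
F = {frozenset(s) for s in [(), (1,), (1, 2), (1, 3), (1, 2, 3)]}
print(union_closed(F), accessible(F))  # True True  (F is an antimatroid)
print(two_accessible(F))               # False ({1, 2}: only player 2 can leave)
```

On this F, union closedness and accessibility both hold, confirming that the conjunctive permission structure yields an antimatroid, while 2-accessibility fails, so F cannot be the set of connected coalitions of any communication graph.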
2 Cooperative TU-Games and Restricted Cooperation

The main goal of this survey is to review and compare various network structures, specifically those modeling communication networks and hierarchies. Moreover, we will illustrate most concepts by their effect on coalition formation in cooperative games. First, in this section, we give some basic preliminaries on cooperative transferable utility games.

A situation in which a finite set of players can obtain certain payoffs by cooperation can be described by a cooperative game with transferable utility, or simply a TU-game, being a pair (N, v), where N ⊆ ℕ is a finite set of players and v : 2^N → R is a characteristic function on N satisfying v(∅) = 0. For every coalition S ⊆ N, v(S) is the worth of coalition S, meaning that the members of coalition S can obtain a total payoff of v(S) by agreeing to cooperate. Since we take the player set N to be fixed, we denote a TU-game (N, v) just by its characteristic function v. We denote the collection of all TU-games on player set N by G^N. A payoff vector of an n-player TU-game v ∈ G^N is an n-dimensional vector x ∈ R^N giving a payoff x_i ∈ R to any player i ∈ N. A (single-valued) solution for TU-games is a mapping f that assigns to every game v ∈ G^N a payoff vector f(v) ∈ R^N. One of the most well-known solutions for TU-games is the Shapley value (Shapley [63]) given by

  Sh_i(v) = Σ_{S ⊆ N : i ∈ S} [(|N| − |S|)!(|S| − 1)! / |N|!] (v(S) − v(S \ {i}))   for all i ∈ N.

Equivalently, the Shapley value can be written as the average marginal contribution vector over all permutations (orders of entrance) of the players:

  Sh_i(v) = (1 / |N|!) Σ_{π ∈ Π(N)} (v({j ∈ N | π(j) ≤ π(i)}) − v({j ∈ N | π(j) < π(i)}))   for all i ∈ N,

where Π(N) is the collection of all permutations π : N → N of the player set.3 For an update on theoretical and applied results about the potential and versatility of
3 Another common expression of the Shapley value uses the so-called Harsanyi dividends (see Harsanyi [47]), but we will not use that in this survey.
this appealing value, see Algaba et al. [14]. Likewise, a review of the main classical properties can be found in Algaba et al. [15].

In a TU-game, any subset S ⊆ N is assumed to be able to form a coalition and earn the worth v(S). However, in most economic and political organizations not every set of participants can form a feasible coalition. Therefore, cooperative game theory models have been developed that take account of restrictions on coalition formation. This is modeled by considering a set of feasible coalitions F ⊆ 2^N that need not contain all subsets of the player set N. For a finite set N, a network structure on N is a pair (N, F), where F ⊆ 2^N is a family of subsets. The sets belonging to F are called feasible. A network structure F ⊆ 2^N without any further requirement can be seen as the most general type of network structure, and is called a conference structure in Myerson [55]. However, in applications a network structure intuitively satisfies certain properties. In this survey, we will consider some of such properties that we encounter in applications in Economics and Operations Research. A triple (N, v, F) with v ∈ G^N and F ⊆ 2^N is a game with restricted cooperation. Here, the game v represents the earnings that coalitions can obtain, and F denotes the network structure determining the cooperation possibilities. Again, since we take the player set to be fixed, we denote a game with restricted cooperation (N, v, F) by (v, F).
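The permutation form of the Shapley value in this section translates directly into code. The following sketch is our illustration (the function and game names are ours, not from the survey); it enumerates all |N|! orders of entrance and averages marginal contributions, so it is only practical for small player sets.

```python
from itertools import permutations

def shapley_value(n_players, v):
    """Shapley value via the permutation (marginal-contribution) formula.

    v maps a frozenset coalition to its worth, with v(frozenset()) == 0.
    """
    players = list(range(1, n_players + 1))
    value = {i: 0.0 for i in players}
    orders = list(permutations(players))
    for order in orders:
        entered = frozenset()
        for i in order:
            # marginal contribution of i when entering after its predecessors
            value[i] += v(entered | {i}) - v(entered)
            entered = entered | {i}
    # average over all |N|! orders of entrance
    return {i: value[i] / len(orders) for i in players}

# Hypothetical 3-player game: any coalition containing players 1 and 2
# earns 1, so player 3 is a null player.
v = lambda S: 1.0 if {1, 2} <= S else 0.0
print(shapley_value(3, v))  # {1: 0.5, 2: 0.5, 3: 0.0}
```

By symmetry, players 1 and 2 split the worth of the grand coalition equally, and the null player 3 receives zero, as the classical axioms of the Shapley value require.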
3 Communication Restrictions

In this section, we consider communication networks represented by undirected graphs and the more general union stable systems, and consider their role in defining cooperation networks in cooperative games.
3.1 Communication Graphs

One of the most well-known models of restricted cooperation is that of games with communication restrictions, as introduced in Myerson [54], see also Owen [60]. In this model, a communication network on the set of players in a cooperative game is given, and a coalition S is feasible if and only if the players in S are connected within this communication network. This communication network is represented by an undirected graph on the set of players. An undirected graph is a pair (N, L) where N is the set of nodes and L ⊆ {{i, j} | i, j ∈ N, i ≠ j} is a collection of subsets of N such that each element of L contains precisely two elements. The elements of L represent bilateral communication links and are referred to as edges or links. Since in this paper the nodes in a graph represent the positions of players in a communication network, we refer to the nodes as players. If there is a link between two players, we call
E. Algaba and R. van den Brink
Fig. 1 Communication graph (N, L) of Example 1
them neighbours. A sequence of k different players (i1, . . . , ik) is a path in (N, L) if {ih, ih+1} ∈ L for h = 1, . . . , k − 1. Two distinct players i and j, i ≠ j, are connected in graph (N, L) if there is a path (i1, . . . , ik) with i1 = i and ik = j. A coalition S ⊆ N is connected in graph (N, L) if every pair of players in S is connected by a path that only contains players from S, i.e. for every i, j ∈ S, i ≠ j, there is a path (i1, . . . , ik) such that i1 = i, ik = j and {i1, . . . , ik} ⊆ S. In other words, a coalition S is connected in (N, L) if the subgraph (S, L(S)), with L(S) = {{i, j} ∈ L | {i, j} ⊆ S} being the set of links between players in S, is connected. A maximally connected subset of coalition S in (N, L) is called a component of S in that graph, i.e. T ⊆ S is a component of S in (N, L) if and only if (1) T is connected in (N, L(S)), and (2) for every h ∈ S \ T the coalition T ∪ {h} is not connected in (N, L(S)). A sequence of players (i1, . . . , ik), k ≥ 2, is a cycle in (N, L) if (i1, . . . , ik) is a path in (N, L) and {ik, i1} ∈ L. A graph (N, L) is cycle-free when it does not contain any cycle.

Example 1 Consider the communication graph (N, L) on N = {1, . . . , 5} given by L = {{1, 2}, {1, 3}, {2, 4}, {3, 4}, {4, 5}}, see Fig. 1. Players 1 and 5 are connected by two paths: (1, 2, 4, 5) and (1, 3, 4, 5). Coalition {1, 4, 5} has two components: {1} and {4, 5}. This communication graph has a cycle (1, 2, 4, 3).
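The components of a coalition defined above can be computed by a simple search over the links that lie inside the coalition. A minimal Python sketch, using the graph of Example 1:

```python
def components(S, L):
    """Maximal connected subsets of coalition S in (N, L), using only
    links whose endpoints both lie in S (i.e. components of (S, L(S)))."""
    S = set(S)
    links = [e for e in L if e <= S]
    comps, unseen = [], set(S)
    while unseen:
        stack = [unseen.pop()]
        comp = set(stack)
        while stack:
            i = stack.pop()
            for e in links:
                if i in e:
                    (j,) = e - {i}
                    if j not in comp:
                        comp.add(j)
                        unseen.discard(j)
                        stack.append(j)
        comps.append(frozenset(comp))
    return comps

L1 = [frozenset(e) for e in ({1, 2}, {1, 3}, {2, 4}, {3, 4}, {4, 5})]
print(components({1, 4, 5}, L1))  # two components: {1} and {4, 5}
```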
3.1.1 Communication Graph Games

A triple (N, v, L) with (N, v) a TU-game and (N, L) an undirected communication graph is called a communication graph game. Again, since we take the player set to be fixed, we denote a communication graph game (N, v, L) just by (v, L). In the communication graph game (v, L), players can cooperate if and only if they are able to communicate with each other, i.e. a coalition S is feasible if and only if it is connected in (N, L). In other words, the set of feasible coalitions in a communication graph game (N, v, L) is the set of coalitions FL ⊆ 2N given by

FL = {S ⊆ N | S is connected in (N, L)}.    (1)
We refer to this network structure as the communication feasible set of communication graph (N, L). Myerson [54] introduced the restricted game of a communication
graph game (v, L) as the TU-game vL in which every feasible coalition S can earn its worth v(S). Whenever S is not feasible it can earn the sum of the worths of its components in (N, L). Denoting the set of components of S ⊆ N in (N, L) by CL (S), the restricted game vL corresponding to communication graph game (v, L) is given by
vL(S) = Σ_{T∈CL(S)} v(T)  for all S ⊆ N.    (2)
Note that CL(S) is a partition of S. A solution for communication graph games assigns a payoff vector to every communication graph game. Applying any TU-game solution to the restricted game vL gives a solution for communication graph games (v, L). The solution μ given by Myerson [54] is obtained by taking for every communication graph game the Shapley value of the corresponding restricted game, i.e., μ(v, L) = Sh(vL). This solution was later named the Myerson value for communication graph games.

Example 2 Consider the communication graph game (v, L) with L as given in Example 1 and game v given by

v(S) = 1 if {1, 5} ⊆ S, and 0 otherwise.
This can represent a supply chain where player 5 is a retailer who can generate worth (normalized to be equal to 1) when she has a product in store. To get the product from the factory (player 1) one of the two distributors (players 2 or 3) can be used to bring the product to the wholesaler (player 4) who delivers the product to the retailer. The Myerson restricted game is the game (N, vL) given by

vL(S) = 1 if S ∈ {{1, 2, 4, 5}, {1, 3, 4, 5}, {1, 2, 3, 4, 5}}, and 0 otherwise.
So, the coalitions that generate the worth one are those coalitions that connect the retailer with the factory. Applying the Shapley value to this game gives the payoffs according to the Myerson value:

μ(v, L) = (3/10, 1/20, 1/20, 3/10, 3/10).
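The Myerson value of Example 2 can be reproduced by restricting the game to components and then applying the permutation formula of the Shapley value. A self-contained Python sketch (brute force over all 5! orders, which is fine at this size):

```python
from itertools import permutations
from math import factorial

def components(S, L):
    # maximal connected subsets of S using only links inside S
    S = set(S)
    links = [e for e in L if e <= S]
    comps, unseen = [], set(S)
    while unseen:
        comp, stack = set(), [unseen.pop()]
        while stack:
            i = stack.pop()
            comp.add(i)
            for e in links:
                if i in e:
                    (j,) = e - {i}
                    if j in unseen:
                        unseen.discard(j)
                        stack.append(j)
        comps.append(frozenset(comp))
    return comps

def myerson_value(players, v, L):
    """Shapley value of the restricted game vL(S) = sum of v(T)
    over the components T of S (equation (2))."""
    vL = lambda S: sum(v(T) for T in components(S, L))
    val = {i: 0.0 for i in players}
    for order in permutations(players):
        S = set()
        for i in order:
            before = vL(S)
            S.add(i)
            val[i] += vL(S) - before
    return {i: val[i] / factorial(len(players)) for i in players}

L1 = [frozenset(e) for e in ({1, 2}, {1, 3}, {2, 4}, {3, 4}, {4, 5})]
v = lambda S: 1.0 if {1, 5} <= set(S) else 0.0
mu = myerson_value([1, 2, 3, 4, 5], v, L1)
# mu = (3/10, 1/20, 1/20, 3/10, 3/10), matching Example 2
```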
One can argue whether the allocation of the payoffs in the example above is fair and reasonable. One way to motivate solutions is by giving axiomatic characterizations. Myerson [54] axiomatized his value by component efficiency and fairness. Component efficiency means that every maximally connected set of players (component) earns exactly its worth. Fairness means that deleting (or
adding) a link has the same effect on the payoffs of the two players on that link.4 Another axiomatization of the Myerson value with another kind of fairness axiom, component efficiency, a kind of null player property and additivity can be found in Selcuk and Suzuki [62].5
3.2 Union Stable Systems

Union stable systems are generalizations of communication graphs where the smallest unit of cooperation might have a size larger than 2. Let N = {1, . . . , n} be a finite set of players or nodes, and F ⊆ 2N a network structure of feasible coalitions. Union stable systems are introduced and developed in Algaba et al. [2, 3] as generalizations of communication networks, where for every two nondisjoint feasible coalitions, also their union is feasible.

Definition 1 A network structure F ⊆ 2N is a union stable system if it satisfies

(union stability) S, T ∈ F with S ∩ T ≠ ∅ implies that S ∪ T ∈ F.
Union stable systems can be used to model restricted cooperation where the smallest units of cooperation are not the singletons and edges as with communication graphs, but it can be that a coalition with more than two players is feasible while none of its subsets are feasible. Many real world situations find their natural framework in these structures, as shown in the next example, which is taken from Algaba et al. [10].

Example 3 Consider a situation where player 1 is a homeowner who wants to sell his/her house. Player 1 has signed a contract with a real estate agent represented by player 2. So, player 1 can only sell his/her house by means of player 2. There are two buyers, players 3 and 4. In this application, the family of feasible coalitions that can generate a surplus are only those which make it possible for the seller to sell his/her house. Therefore, the coalitions which can trade are those coalitions that contain the homeowner, the real estate agent and at least one of the buyers:

F = {∅, {1, 2, 3}, {1, 2, 4}, {1, 2, 3, 4}}.    (3)
4 For TU-games, van den Brink [68] axiomatized the Shapley value by efficiency, the null player property and a fairness axiom that requires that the payoffs of two players change by the same amount if to a game v we add another game w such that the two players are symmetric in game w. Deleting an edge from a communication graph, the two players on the deleted edge are symmetric in the difference game between the two communication restricted games.
5 Another type of motivation for a solution is by strategic implementation. A strategic implementation of the Myerson value can be found in Slikker [65], who modified the bidding mechanism of Pérez-Castrillo and Wettstein [61] for the Shapley value to communication graph games.
Notice that for every communication graph (N, L), the set of connected coalitions FL is a union stable system as considered in Algaba et al. [2, 3].6 Obviously, if S and T are two connected coalitions with S ∩ T ≠ ∅, then there is a path from every player in S to every player in S ∩ T, and there is a path from every player in S ∩ T to every player in T, and therefore there is a path from every player in S to every player in T, i.e. S ∪ T is connected. However, a union stable system cannot always be modeled by a communication graph. (A characterization of the union stable systems that can be the set of connected coalitions in a communication graph can be found in Sect. 5.)
3.2.1 The Supports of a Union Stable System

Similar to how the set of edges of an undirected graph determines the set of all connected coalitions, a union stable system can be fully determined by its basis, as introduced in Algaba et al. [2, 3]. For each union stable system F, the following set is well-defined:

E(F) = {G ∈ F : G = A ∪ B, A ≠ G, B ≠ G, A, B ∈ F, A ∩ B ≠ ∅},

being the set of those coalitions in F that can be written as the union of two other feasible coalitions. The set B(F) = F \ E(F) is called the basis of F, and the elements of B(F) are called supports of F, i.e. these are the coalitions that cannot be written as the union of two other feasible coalitions. Inductively, the following families are defined7

G(0) = B(F), G(m) = {S ∪ T : S, T ∈ G(m−1), S ∩ T ≠ ∅}, (m = 1, 2, . . .).

Notice that G(0) ⊆ G(m−1) ⊆ G(m) ⊆ F, since F is union stable. So, starting with the basis G(0) = B(F), the collection G(1) is obtained by adding to G(0) all coalitions that can be obtained as the union of any pair of nondisjoint coalitions in G(0), G(2) is obtained by adding to G(1) all coalitions that can be obtained as the union of any pair of nondisjoint coalitions in G(1), etc., continuing until all unions of nondisjoint pairs of coalitions in the set also belong to the set. In fact, Algaba et al. [2] define G by G = G(k), where k is the smallest integer such that G(k+1) = G(k). We remark that the basis B(F) is the minimal subset of the union stable system F whose closure under this procedure is F.

As mentioned, the set of connected coalitions in an undirected graph, or communication feasible set, is a union stable system. The basis of a communication feasible set is exactly the set of edges (feasible coalitions of size two) and singletons (feasible coalitions of size one). The components of an undirected graph can be generalized to
6 Another application of union stable systems can be found in Algaba et al. [19].
7 Algaba et al. [2] inductively apply the union stability operator to any network structure G ⊆ 2N, which always ends up in a union stable system.
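The basis B(F) and the inductive closure G(0), G(1), . . . can be sketched directly from their definitions. A minimal Python illustration; the line graph 1-2-3 at the end is a hypothetical example:

```python
from itertools import combinations

def closure(sets):
    """Union stable closure: repeatedly add the union of every
    nondisjoint pair until nothing new appears."""
    F = set(map(frozenset, sets))
    changed = True
    while changed:
        changed = False
        for A, B in combinations(list(F), 2):
            if A & B and (A | B) not in F:
                F.add(A | B)
                changed = True
    return F

def basis(F):
    """Supports B(F): feasible coalitions that are NOT the union of two
    other nondisjoint feasible coalitions."""
    F = set(map(frozenset, F))
    E = {G for G in F
         if any(A != G and B != G and A & B and (A | B) == G
                for A, B in combinations(F, 2))}
    return F - E

# Connected coalitions of the line graph 1-2-3:
FL = [{1}, {2}, {3}, {1, 2}, {2, 3}, {1, 2, 3}]
print(basis(FL))                                       # singletons and edges
print(closure(basis(FL)) == set(map(frozenset, FL)))   # True
```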
union stable systems as follows. Let F ⊆ 2N be a network structure and let S ⊆ N. A set T ⊆ S is called an F-component of S if (1) T ∈ F, and (2) there exists no T′ ∈ F such that T ⊂ T′ ⊆ S. In other words, the F-components of S are the maximal feasible coalitions that belong to F and are contained in S. We denote by CF(S) the collection of the F-components of S. Union stable systems can be characterized in terms of the F-components of a coalition in the following way: the network structure F ⊆ 2N is union stable if and only if for every S ⊆ N with CF(S) ≠ ∅, the F-components of S are a collection of pairwise disjoint subsets of S, see Algaba et al. [2]. So, if F is a union stable system such that for every i ∈ N there is an S ∈ F with i ∈ S, then the F-components of N form a partition of the player set N.
3.2.2 Cooperative Games on a Union Stable System

A union stable cooperation structure, or a game on a union stable system, is a triple (N, v, F) where N = {1, . . . , n} is the set of players, (N, v) is a TU-game and F is a union stable system. Again, we assume that the player set is fixed and denote a game on union stable system (N, v, F) just as (v, F). For convenience, we assume from now on that the underlying game (N, v) is zero-normalized, i.e., v({i}) = 0 for all i ∈ N. Let B(F) be the basis of F and C(F) = {B ∈ B(F) : |B| ≥ 2}. Extending the approach of Myerson [54] for communication graph games to games on a union stable system, the F-restricted game, vF : 2N → R, is defined on the player set N by vF(S) = Σ_{T∈CF(S)} v(T), where vF(S) = 0 if CF(S) = ∅, as developed in Algaba et al. [2, 3]. A solution for games on a union stable system is a map that assigns a payoff vector to each game on a union stable system. The Myerson value for communication graph games can be extended straightforwardly by applying the Shapley value to the associated F-restricted game, see van den Nouweland et al. [57] and Algaba et al. [3].8 Given a game on a union stable system (v, F), the Myerson value,9 denoted by μ(v, F) ∈ RN, is defined by

μ(v, F) = Sh(vF).

8 From a computational point of view, Algaba et al. [8] study the complexity of the Myerson value for games on a union stable system by means of the Harsanyi dividends. Polynomial time algorithms for computing the Myerson value in weighted voting games restricted by a tree are given in Fernández et al. [41].
9 The Myerson value for games on a union stable system is a particular case of the class of Harsanyi power solutions for games on a union stable system, as introduced by Algaba et al. [11], which generalizes the class of Harsanyi power solutions for communication graph games presented by van den Brink et al. [77]. Moreover, the class of Harsanyi power solutions has been studied, for a particular case of union stable systems derived from the family of winning coalitions associated with a voting game, in Algaba et al. [19]. In fact, this setting allows for studying situations in which there exists a feedback between the economic influence of each coalition of agents and its political power.
In order to characterize the Myerson value using a fairness axiom, the following axioms are introduced in this framework. Component efficiency of a solution on a class of games on union stable systems states that for every game with restricted cooperation in this class, the total payoff to every component equals its worth. A player i ∈ N is called a component dummy in union stable system F if this player does not belong to any maximal component of the grand coalition, i.e., i ∉ ∪_{M∈CF(N)} M. A solution satisfies component dummy if every component dummy is assigned a zero payoff. For games on a union stable system, Algaba et al. [3] presented a generalized version of the fairness axiom presented in Myerson [54], requiring that for union stable systems all players in a support B lose or gain the same amount if the support B and all coalitions that are obtained by union stability using support B are deleted. They characterized the Myerson value for games on a union stable system by component efficiency, component dummy and fairness. Notice that this generalizes the axiomatization of Myerson [54] for communication graph games, since component efficiency and fairness boil down to the corresponding axioms for communication graph games, while component dummy becomes void since in a communication graph all singletons are connected/feasible.10 Another axiom that is used in characterizing the Shapley value for TU-games, but also on various network restrictions, is balanced contributions and its variations. We briefly get back to this in Sect. 5.
4 Hierarchies

In this section, we review some models of hierarchical cooperation structures, specifically permission structures and the more general antimatroids, and consider their role in defining cooperation networks in cooperative games.
4.1 Permission Structures

A model that studies restrictions in coalition formation arising from hierarchies is that of a game with a permission structure. In those games, it is assumed that players are part of a hierarchical organization in which there are players that need permission or approval from certain other players before they are allowed to cooperate. For a finite set of players N such a hierarchical organization is represented by an irreflexive directed graph (N, D) with D ⊆ N × N such that
10 Other characterizations of the Myerson value for games on a union stable system can be found in Algaba et al. [10].
(i, i) ∉ D for all i ∈ N, referred to as a permission structure on N. Again, since we take the player set to be fixed, we denote a permission structure (N, D) just by its binary relation D. The directed links (i, j) ∈ D are called arcs. The players in FD(i) = {j ∈ N | (i, j) ∈ D} are called the successors or followers of player i, while the players in PD(i) = {j ∈ N | (j, i) ∈ D} are called the predecessors of i. A sequence of different players (i1, . . . , ik) is a directed path between players i and j, i ≠ j, in a permission structure D if i1 = i, ik = j and (ih, ih+1) ∈ D for all 1 ≤ h ≤ k − 1. A permission structure D is acyclic if there exists no directed path (i1, . . . , ik) with (ik, i1) ∈ D. Note that in an acyclic permission structure there can be more than one directed path from player i to player j ≠ i. Also note that in an acyclic permission structure D, there always exists at least one player with no predecessors, i.e. TOP(D) = {i ∈ N | PD(i) = ∅} ≠ ∅. We refer to these players as the top-players in the permission structure.

The first two approaches to games with a permission structure that have been considered in the literature are the following. In the conjunctive approach, as developed in Gilles et al. [44] and van den Brink and Gilles [73], it is assumed that each player needs permission from all its predecessors in order to cooperate. This implies that a coalition S ⊆ N is feasible if and only if for every player in S all its predecessors belong to S. The set of feasible coalitions in this approach is therefore given by

Φ^c_D = {S ⊆ N | PD(i) ⊆ S for all i ∈ S},

which we refer to as the conjunctive feasible set of D. Alternatively, in the disjunctive approach, as developed in Gilles and Owen [43] and van den Brink [66], it is assumed that each player (except the top-players) needs permission from at least one of its predecessors before it is allowed to cooperate with other players. Consequently, a coalition is feasible if and only if every player in the coalition (except the top-players) has at least one predecessor who also belongs to the coalition. Thus, the feasible coalitions are the ones in the set

Φ^d_D = {S ⊆ N | PD(i) ∩ S ≠ ∅ for all i ∈ S \ TOP(D)},
which we refer to as the disjunctive feasible set of D.

Example 4 Consider the permission structure D on N = {1, 2, 3, 4} given by D = {(1, 2), (1, 3), (2, 4), (3, 4)}, see Fig. 2. Then, Φ^c_D = {∅, {1}, {1, 2}, {1, 3}, {1, 2, 3}, {1, 2, 3, 4}} and Φ^d_D = Φ^c_D ∪ {{1, 2, 4}, {1, 3, 4}}.

Both the conjunctive and disjunctive feasible sets are union closed, i.e. for every two feasible sets, also the union is feasible. This gives rise to an interesting difference between the communication feasible sets in Myerson [54] as mentioned in Sect. 3, and the conjunctive and disjunctive feasible sets arising from permission structures. Whereas in a communication graph every coalition can be partitioned
Fig. 2 Permission structure D of Example 4
into maximally connected subsets (components), in the conjunctive and disjunctive feasible sets every coalition has a unique largest feasible subset.11
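The conjunctive and disjunctive feasible sets can be enumerated directly from their definitions. A minimal Python sketch, reproducing Example 4:

```python
from itertools import chain, combinations

def feasible_sets(N, D, mode):
    """Enumerate the conjunctive or disjunctive feasible coalitions of a
    permission structure D (a set of arcs (predecessor, successor))."""
    pred = {i: {j for (j, k) in D if k == i} for i in N}
    top = {i for i in N if not pred[i]}
    def feasible(S):
        if mode == "conjunctive":
            return all(pred[i] <= S for i in S)       # ALL predecessors present
        return all(pred[i] & S for i in S - top)      # at least ONE predecessor
    subsets = chain.from_iterable(combinations(N, r) for r in range(len(N) + 1))
    return {frozenset(S) for S in subsets if feasible(frozenset(S))}

N = {1, 2, 3, 4}
D = {(1, 2), (1, 3), (2, 4), (3, 4)}
Fc = feasible_sets(N, D, "conjunctive")
Fd = feasible_sets(N, D, "disjunctive")
print(Fd - Fc)  # {{1, 2, 4}, {1, 3, 4}}, as in Example 4
```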
4.1.1 Games with a Permission Structure

A game with a permission structure is a triple (N, v, D), where N = {1, . . . , n} is the set of players, (N, v) is a TU-game and (N, D) is a permission structure. Again, we assume that the player set is fixed and denote a game with permission structure (N, v, D) just as the pair (v, D). As mentioned above, union closedness of the conjunctive and disjunctive feasible sets implies that every coalition has a unique largest feasible subset in each of the two approaches. An approach using restricted games, similar to the approach of Myerson [54] for communication graph games (see Sect. 3), assigns to every coalition in a game with a permission structure the worth of its largest feasible subset. This gives two restricted games. The conjunctive restriction r^c_{v,D} of v on D assigns to every coalition the worth of its largest conjunctive feasible subset. Similarly, the disjunctive restriction r^d_{v,D} assigns to every coalition the worth of its largest disjunctive feasible subset. A solution for games with a permission structure assigns a payoff vector to every game with a permission structure. Similar to Myerson's approach [54] to communication graph games, we can apply any TU-game solution to the restricted games. Here, we focus on the Shapley value,12 yielding two solutions, the conjunctive, respectively disjunctive, (Shapley) permission value:

ϕ^c(v, D) = Sh(r^c_{v,D}) and ϕ^d(v, D) = Sh(r^d_{v,D}).
11 In van den Brink et al. [76] union closed systems are considered, where the only requirement on a set of feasible coalitions is that it is union closed. They exploit the property that in such network structures every feasible coalition has a unique largest feasible subset.
12 Core properties are considered in Derks and Gilles [34].
Example 5 Consider the game with permission structure (N, v, D) on N = {1, 2, 3, 4} with permission structure D as given in Example 4 and game v given by

v(S) = 1 if 4 ∈ S, and 0 otherwise.

This models a situation where player 4 needs to be activated in order to earn a worth of 1, but 4 needs approval from its predecessors. The conjunctive restricted game is given by

r^c_{v,D}(S) = 1 if S = {1, 2, 3, 4}, and 0 otherwise,

with corresponding conjunctive permission value payoffs

ϕ^c(v, D) = (1/4, 1/4, 1/4, 1/4).

The disjunctive restricted game is given by

r^d_{v,D}(S) = 1 if S ∈ {{1, 2, 4}, {1, 3, 4}, {1, 2, 3, 4}}, and 0 otherwise,

with corresponding disjunctive permission value payoffs

ϕ^d(v, D) = (5/12, 1/12, 1/12, 5/12).
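The two permission values of Example 5 can be verified by brute force: build the restricted game via the largest feasible subset and take the Shapley value of it. A self-contained Python sketch:

```python
from itertools import chain, combinations, permutations
from math import factorial

def shapley(players, v):
    val = {i: 0.0 for i in players}
    for order in permutations(players):
        S = set()
        for i in order:
            before = v(frozenset(S))
            S.add(i)
            val[i] += v(frozenset(S)) - before
    return {i: val[i] / factorial(len(players)) for i in players}

def permission_value(N, D, v, mode):
    """Shapley value of the restricted game that assigns to each coalition
    the worth of its largest (conjunctive or disjunctive) feasible subset."""
    pred = {i: {j for (j, k) in D if k == i} for i in N}
    top = {i for i in N if not pred[i]}
    def feasible(S):
        if mode == "conjunctive":
            return all(pred[i] <= S for i in S)
        return all(pred[i] & S for i in S - top)
    def largest_feasible_subset(S):
        # feasible sets are union closed, so the union of all feasible
        # subsets of S is itself the largest feasible subset of S
        subs = chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))
        out = set()
        for T in map(set, subs):
            if feasible(T):
                out |= T
        return frozenset(out)
    return shapley(sorted(N), lambda S: v(largest_feasible_subset(S)))

N, D = {1, 2, 3, 4}, {(1, 2), (1, 3), (2, 4), (3, 4)}
v = lambda S: 1.0 if 4 in S else 0.0
phi_c = permission_value(N, D, v, "conjunctive")   # (1/4, 1/4, 1/4, 1/4)
phi_d = permission_value(N, D, v, "disjunctive")   # (5/12, 1/12, 1/12, 5/12)
```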
Several axiomatizations of the conjunctive and disjunctive permission value can be found in the literature. Here, we just focus on a difference between the two permission values via an axiom similar to Myerson's fairness for communication graph games, as mentioned in Sect. 3.13 In van den Brink [66] an axiomatization of the disjunctive permission value is given, showing that the disjunctive permission value satisfies disjunctive fairness, which implies that deleting (or adding) an arc (such that the successor on the arc has at least one other predecessor) has the same effect on the payoffs of the two
13 Axiomatizations of the two permission values using conjunctive, respectively disjunctive, fairness together with efficiency, additivity, the inessential player property, the necessary player property and (weak) structural monotonicity can be found in van den Brink [66], van den Brink [67]. These axioms have a natural interpretation in several applications such as, for example, polluted river problems of Ni and Wang [56] and Dong et al. [37], see van den Brink et al. [80].
players on that arc.14 On the other hand, in van den Brink [67] an axiomatization of the conjunctive permission value is given, showing that the conjunctive permission value satisfies conjunctive fairness, implying that deleting (or adding) an arc has the same effect on the payoffs of the successor on the arc and every 'other' predecessor of this successor.15
4.2 Antimatroids

Algaba et al. [6] show that the conjunctive and disjunctive feasible sets in acyclic permission structures are antimatroids. Antimatroids were introduced by Dilworth [36] as particular examples of semimodular lattices. Since then, several authors have obtained the same concept by abstracting various combinatorial situations (see Korte et al. [50] and Edelman and Jamison [38]).

Definition 2 A network structure A ⊆ 2N is an antimatroid if it satisfies the following properties:

(feasible empty set) ∅ ∈ A,
(union closedness) S, T ∈ A implies that S ∪ T ∈ A,
(accessibility) S ∈ A with S ≠ ∅ implies that there exists i ∈ S such that S \ {i} ∈ A.

Union closedness means that the union of two feasible coalitions is also feasible. Accessibility means that every nonempty feasible coalition has at least one player that can leave such that the set of remaining players is a feasible subcoalition. Besides these characterizing properties, in a game theory context the network structure is usually such that every player/node belongs to at least one feasible coalition:

(normality) for every i ∈ N there exists an S ∈ A such that i ∈ S.
Note that normality and union closedness imply that N ∈ A. In the following we refer to normal antimatroids simply as antimatroids. The conjunctive and disjunctive feasible sets corresponding to an acyclic permission structure are antimatroids.

Theorem 1 (Algaba et al. [6]) If D is an acyclic permission structure on N, then Φ^c_D and Φ^d_D are antimatroids on N.

The next question is whether antimatroids are really more general than permission structures. First, we exactly characterize those antimatroids that can be the conjunctive
or disjunctive feasible set of some permission structure. It turns out that conjunctive feasible sets are exactly those that are closed under intersection. These are well-known structures, also known as poset antimatroids.16

Theorem 2 (Algaba et al. [6]) Let A be an antimatroid. There is an acyclic permission structure D such that A = Φ^c_D if and only if A satisfies

(intersection closedness) S, T ∈ A implies that S ∩ T ∈ A.
An alternative way to characterize poset antimatroids is by using paths. An extreme player of S ∈ A is a player i ∈ S such that S \ {i} ∈ A. So, extreme players are those players that can leave a feasible coalition S keeping feasibility. By accessibility, every feasible coalition in an antimatroid has at least one extreme player. Coalition S ∈ A is a path in A if it has a unique extreme player. The path S ∈ A is an i-path in A if it has i ∈ N as unique extreme player. The paths form the basis of an antimatroid in the sense that every feasible coalition in an antimatroid is either a path, or can be written as the union of other feasible coalitions. So, if we know the paths, then we generate the full antimatroid by applying the union operator.

Example 6 Consider the permission structure of Example 4. The paths of the conjunctive feasible set Φ^c_D are {1}, {1, 2}, {1, 3} and {1, 2, 3, 4}. The paths of the disjunctive feasible set Φ^d_D are {1}, {1, 2}, {1, 3}, {1, 2, 4} and {1, 3, 4}.

In Example 6, we see that the conjunctive feasible set has exactly four paths, one for each player. This property characterizes the conjunctive feasible sets among all antimatroids.

Theorem 3 (Algaba et al. [6]) Let A be an antimatroid. There is an acyclic permission structure D such that A = Φ^c_D if and only if for every player i ∈ N there is a unique i-path in A.

Obviously, the disjunctive feasible set does not satisfy this property. Looking at Example 6, we see that {1, 2, 4} and {1, 3, 4} are both paths of player 4. On the other hand, in the disjunctive feasible set Φ^d_D we see that, given a path, leaving out the unique extreme player, we have again a path, see for example the sequence of coalitions {1, 2, 4}, {1, 2}, {1}, and ∅, in Examples 4 and 6.
In this example, this is not satisfied by the conjunctive feasible set Φ^c_D since deleting the unique extreme player from the path {1, 2, 3, 4}, we are left with {1, 2, 3}, which is not a path since both players 2 and 3 are extreme players (it is the union of the feasible coalitions
16 Games on intersection closed set systems are studied in Beal et al. [23]. Intersection closedness is one of the characterizing properties of the important network structures called convex geometries. Convex geometries are a combinatorial abstraction of convex sets introduced by Edelman and Jamison [38]. A network structure G ⊆ 2N is a convex geometry if it satisfies the following properties: (1) (feasible empty set) ∅ ∈ G, (2) (intersection closedness) S, T ∈ G implies that S ∩ T ∈ G, and (3) (augmentation′) S ∈ G with S ≠ N implies that there exists i ∈ N \ S such that S ∪ {i} ∈ G. (In Sect. 5, we consider another augmentation property.)
{1, 2} and {1, 3}). It turns out that this 'path property' is typical for disjunctive feasible sets. In fact, we need a stronger property to characterize them.

Theorem 4 (Algaba et al. [6]) Let A be an antimatroid. There is an acyclic permission structure D such that A = Φ^d_D if and only if
1. Every path S has a unique feasible ordering, i.e. S = (i1 > · · · > it) such that {i1, . . . , ik} ∈ A for all 1 ≤ k ≤ t. Furthermore, the union of these orderings for all paths is a partial ordering of N.
2. If S, T and S \ {i} are paths such that the extreme player of T equals the extreme player of S \ {i}, then T ∪ {i} ∈ A.

Next we show that antimatroids are really more general than permission structures by giving an example of an antimatroid that does not satisfy the properties of Theorems 3 and 4 that characterize the conjunctive and disjunctive feasible sets, respectively.

Example 7 (Ordered Partition Voting) Consider player set N = {1, 2, 3, 4, 5}. Suppose that the player set is partitioned into two levels: Level 1 consists of players 1, 2 and 3, while Level 2 consists of players 4 and 5. Suppose that all subsets of Level 1 are feasible, but every subset of Level 2 needs approval of a majority (two-player) coalition of Level 1. So, the set of feasible coalitions is

A = { ∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3},
      {1, 2, 4}, {1, 2, 5}, {1, 3, 4}, {1, 3, 5}, {2, 3, 4}, {2, 3, 5},
      {1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 4, 5}, {1, 3, 4, 5}, {2, 3, 4, 5}, {1, 2, 3, 4, 5} }.
This is an antimatroid. However, it is not a conjunctive feasible set (poset antimatroid) since {1, 2, 4}, {1, 3, 4} and {2, 3, 4} are all paths of player 4. It is also not a disjunctive feasible set since taking out the unique extreme player (4) from the path {1, 2, 4} gives coalition {1, 2}, which is not a path.

It is not difficult to prove that the conjunctive and disjunctive approaches coincide if and only if the permission structure is a forest.

Theorem 5 (Algaba et al. [6]) Let D be an acyclic permission structure. Then Φ^c_D = Φ^d_D if and only if |PD(i)| ≤ 1 for all i ∈ N.

Games with a permission tree (i.e. a connected forest) are studied in Álvarez-Mozos et al. [20] and van den Brink et al. [78, 79].17 Peer group games where, moreover, the original game is an additive or inessential game, are studied in Brânzei
17 Although van den Brink et al. [79] consider permission tree games, they distinguish solutions that are based on a communication or hierarchy approach. For cycle-free communication graph games, Demange [32] introduces the so-called hierarchical outcomes, which are extreme points of the core of the restricted game if the game is superadditive and the graph is cycle-free. Nonemptiness of the core of a superadditive, cycle-free graph game was shown, independently, in Demange [31] and Le Breton et al. [51]. Herings et al. [48] consider the average of the hierarchical outcomes.
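The defining properties of Definition 2 are easy to check mechanically. A small Python sketch verifying that the feasible set A of Example 7 is an antimatroid but not intersection closed (so, by Theorem 2, not a conjunctive feasible set):

```python
from itertools import combinations

def is_antimatroid(A):
    """Check feasible empty set, union closedness and accessibility."""
    A = set(map(frozenset, A))
    if frozenset() not in A:
        return False
    if any((S | T) not in A for S, T in combinations(A, 2)):
        return False
    return all(any(S - {i} in A for i in S) for S in A if S)

# Example 7 (ordered partition voting):
A7 = [set(), {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3},
      {1, 2, 4}, {1, 2, 5}, {1, 3, 4}, {1, 3, 5}, {2, 3, 4}, {2, 3, 5},
      {1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 4, 5}, {1, 3, 4, 5}, {2, 3, 4, 5},
      {1, 2, 3, 4, 5}]

print(is_antimatroid(A7))                            # True
print(frozenset({1, 4}) in set(map(frozenset, A7)))  # False: {1,2,4} ∩ {1,3,4} is not feasible
```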
E. Algaba and R. van den Brink
et al. [27] and in Brânzei et al. [26] for the special case of a rooted line-graph. These games have many applications in Economics and Operations Research, as we will mention in our concluding remarks.
4.2.1 Games on an Antimatroid

A game on an antimatroid is a triple (N, v, A) where (N, v) is a TU-game, and A is an antimatroid on player set N. Since we take the player set to be fixed, we denote a game on an antimatroid just as a pair (v, A). The antimatroid is the set of feasible coalitions in the game, and thus reflects the restricted cooperation possibilities. Since the conjunctive and disjunctive feasible sets derived from an acyclic permission structure are antimatroids, this model generalizes the games with an acyclic permission structure. As mentioned before, by union closedness every coalition has a unique largest feasible subset. For antimatroids, Korte et al. [50] introduce the interior operator int_A : 2^N → A that assigns to every set its largest feasible subset, i.e.

int_A(S) = ⋃_{T ∈ A, T ⊆ S} T for all S ⊆ N.   (4)
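As an illustration (ours, not from the survey), the interior operator of (4) can be computed directly from an explicitly listed antimatroid; the helper name `interior` is our own. Union closedness is what makes the union of all feasible subsets itself the largest feasible subset:

```python
from itertools import combinations

def subsets(X):
    X = list(X)
    return [frozenset(c) for k in range(len(X) + 1) for c in combinations(X, k)]

# The antimatroid of Example 7 (ordered partition voting)
A = {T | U for T in subsets({1, 2, 3}) for U in subsets({4, 5})
     if not U or len(T) >= 2}

def interior(A, S):
    """int_A(S): the union of all feasible subsets of S, cf. eq. (4).
    This IS the largest feasible subset because A is union closed."""
    S = frozenset(S)
    result = frozenset()
    for T in A:
        if T <= S:          # T is a feasible subset of S
            result |= T     # union closedness keeps the running union feasible
    return result

print(sorted(interior(A, {1, 4, 5})))  # [1]: players 4, 5 lack majority approval
print(sorted(interior(A, {1, 2, 4})))  # [1, 2, 4]: already feasible
```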
Using this operator^18 we can easily generalize the definition of the conjunctive and disjunctive restricted game for games with a permission structure to games on antimatroids. The restriction of game v on antimatroid A is the game v^A that assigns to every coalition the worth of its largest feasible subset, and thus is given by v^A(S) = v(int_A(S)) for all S ⊆ N. A solution f for games on antimatroids assigns a payoff vector to every game on an antimatroid (v, A) on N. We consider the solution Sh that assigns to every game on an antimatroid (v, A) the Shapley value of the restricted game, i.e. Sh(v, A) = Sh(v^A). Algaba et al. [6] introduce a fairness axiom for games on an antimatroid that generalizes both conjunctive and disjunctive fairness, by requiring that deleting a feasible coalition from an antimatroid, such that what is left is still an
^18 The interior operator is characterized by: (1) int_A(∅) = ∅, (2) int_A(S) ⊆ S, (3) if S ⊆ T then int_A(S) ⊆ int_A(T), (4) int_A(int_A(S)) = int_A(S), and (5) if i, j ∈ int_A(S) and j ∉ int_A(S \ {i}), then i ∈ int_A(S \ {j}).
Networks, Communication and Hierarchy: Applications to Cooperative Games
antimatroid, has the same effect on the payoffs of all players in the coalition that is deleted.^{19,20,21} Finally, notice that an antimatroid is a union stable system, since union closedness implies union stability.^22
5 Communication and Hierarchies In this section, we compare different communication and hierarchy structures and explore the possibilities to combine communication features with hierarchy.
5.1 Comparing Communication and Hierarchies

Let F ⊆ 2^N be an arbitrary network structure. Since all singletons in a communication graph are connected, it follows that communication feasible sets arising from communication graphs contain the empty set and satisfy normality, i.e. every player belongs to at least one feasible coalition. Further, communication feasible sets also satisfy accessibility. They even satisfy the stronger 2-accessibility, meaning that every feasible coalition with two or more players has at least two players that can each leave the coalition such that the set of remaining players is a feasible coalition. Communication feasible sets are not union closed (as is illustrated by the two connected coalitions {1, 2} and {5} in Example 1, whose union is not connected). However, as mentioned before (see Sect. 3), communication feasible sets satisfy the weaker union stability. In van den Brink [70] it is shown that a network structure
^19 In antimatroid A, the path S is covered by path T if S ⊂ T with |T| = |S| + 1, and the unique extreme player of T is the player in T \ S. For a coalition S to be deleted leaving behind an antimatroid, the deleted coalition should be a path (otherwise union closedness will be violated) that is not covered by a path (otherwise accessibility will be violated, since if path S is covered by path T ⊃ S with |T| = |S| + 1, then after deleting S from the antimatroid, T has no extreme player).
^20 One can see that applying this fairness to the sets Φ_D^c, respectively Φ_D^d, of a game with permission structure gives the corresponding conjunctive, respectively disjunctive, fairness as follows. Deleting an arc (i, j), with |P_D(j)| ≥ 2, in a permission structure leads to more (respectively less) feasible coalitions in Φ_D^c (respectively Φ_D^d), and every coalition that is ‘gained’ (respectively ‘lost’) contains player j and all other predecessors h ∈ P_D(j) \ {i} (respectively player i).
^21 They characterize the Shapley value for games on an antimatroid by this fairness axiom together with axioms generalizing efficiency, additivity, the inessential player property and the necessary player property for games with a permission structure as mentioned in Footnote 11.
^22 For results on games on antimatroids we refer to Algaba et al. [5, 6]. For antimatroids that are not normal, similar results can be stated restricted to the class of players that belong to at least one feasible coalition.
is a communication feasible set if and only if it contains the empty set and satisfies normality, union stability and 2-accessibility.

Theorem 6 (van den Brink [70]) Let F ⊆ 2^N be a normal network structure on N. Then F is the communication feasible set of some communication graph if and only if it satisfies the following properties:
(feasible empty set) ∅ ∈ F,
(union stability) S, T ∈ F with S ∩ T ≠ ∅ implies that S ∪ T ∈ F,
(2-accessibility) S ∈ F with |S| ≥ 2 implies that there exist i, j ∈ S, i ≠ j, such that S \ {i} ∈ F and S \ {j} ∈ F.

Usually the set of links in an undirected communication graph/communication feasible set, being the coalitions of size two, is considered as the basis of a communication network. Note that by repeated application of 2-accessibility until we are left with coalitions of size two and one, we can generate these bilateral links from any communication feasible set. Also note that, given 2-accessibility, normality implies that {i} ∈ F for all i ∈ N, as is the case for communication feasible sets.

Adding other properties characterizes the sets of connected coalitions in special graphs. For example, adding closedness under intersection yields those communication feasible sets arising from cycle-complete communication graphs.^23 Other special types of graphs, such as cycle-free graphs and line-graphs, are characterized in van den Brink [70]. For example,

1. a communication feasible set is the set of connected coalitions in some communication line-graph if and only if it satisfies path union stability (meaning that the union of every pair of paths that have a nonempty intersection is also a path^24);
2. a communication feasible set is the set of connected coalitions in some cycle-free communication graph if and only if it satisfies weak path union stability (meaning that the union of two feasible paths that have an endpoint in common is also a path);
3.
a communication feasible set is the set of connected coalitions in some communication tree (i.e. a connected cycle-free graph) if and only if it satisfies weak path union stability and connectedness (meaning that for every pair of players there is a feasible coalition containing both players).

Comparing Theorem 6 with Definition 2, we conclude that communication feasible sets are characterized by properties similar to the ones used in defining antimatroids. Besides normality and feasibility of the empty set, both satisfy an accessibility and a union property. Obviously, 2-accessibility implies accessibility, and therefore communication feasible sets satisfy a stronger accessibility property. However, antimatroids satisfy union closedness instead of union stability, and therefore antimatroids satisfy a stronger union property.

^23 A graph is cycle-complete if, whenever there is a cycle, the subgraph on that cycle is complete.
^24 The context makes clear if we consider paths in a communication graph or paths in an antimatroid.
Table 1 Comparing communication with hierarchy

  Antimatroids    |     | Communication
  ∅ is feasible   |     | ∅ is feasible
  union closed    |  ⇒  | union stable
  accessible      |  ⇐  | 2-accessible
5.2 Accessible Union Stable Network Structures

Considering Theorem 6, a natural network structure that combines hierarchy with communication restrictions is obtained by considering network structures that satisfy the weaker union and accessibility properties that characterize antimatroids and communication feasible sets, see Table 1. This gives the following network structure, introduced in Algaba et al. [13].

Definition 3 A network structure F ⊆ 2^N is an accessible union stable network structure if it satisfies the following properties:
(feasible empty set) ∅ ∈ F,
(union stability) S, T ∈ F with S ∩ T ≠ ∅ implies that S ∪ T ∈ F,
(accessibility) S ∈ F with S ≠ ∅ implies that there exists i ∈ S such that S \ {i} ∈ F.

Obviously, antimatroids and communication feasible sets are accessible union stable network structures. In these network structures, union stability reflects communication in the sense that players that belong to the intersection of two coalitions can generate communication through the full (i.e. union) coalition. Accessibility reflects asymmetry between the players, specifically between the players that can, and those that cannot, leave the coalition while keeping feasibility. Thus, accessible union stable network structures seem to be a natural model for organizations with communication as well as hierarchy features. By definition, we have the following obvious proposition.

Proposition 1
(i) An accessible union stable network structure is an antimatroid if and only if it is union closed.
(ii) A normal accessible union stable network structure is a communication feasible set if and only if it satisfies 2-accessibility.

Since accessible union stable network structures generalize communication feasible sets as well as antimatroids, they can help us to study organizations that have hierarchical as well as communication features.
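The properties in Definition 3 translate directly into checkable predicates. A minimal Python sketch (our own code and helper names, not from the survey), reusable for any explicitly listed network structure:

```python
def is_union_stable(F):
    """S, T feasible with S ∩ T ≠ ∅ must imply S ∪ T feasible."""
    F = {frozenset(S) for S in F}
    return all(S | T in F for S in F for T in F if S & T)

def is_accessible(F):
    """Every nonempty feasible coalition must have an extreme player."""
    F = {frozenset(S) for S in F}
    return all(any(S - {i} in F for i in S) for S in F if S)

def is_union_closed(F):
    """Stronger than union stability: no nonempty-intersection proviso."""
    F = {frozenset(S) for S in F}
    return all(S | T in F for S in F for T in F)

# connected coalitions of the line graph 1-2-3: accessible and union stable;
# by Proposition 1(i) they would form an antimatroid only if also union closed
F = [(), (1,), (2,), (3,), (1, 2), (2, 3), (1, 2, 3)]
print(is_union_stable(F), is_accessible(F), is_union_closed(F))  # True True False
```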
Another class that contains communication feasible sets and antimatroids is the class of augmenting systems introduced by Bilbao [24].

Definition 4 A network structure F ⊆ 2^N is an augmenting system if it satisfies the following properties:
(feasible empty set) ∅ ∈ F,
(union stability) S, T ∈ F with S ∩ T ≠ ∅ implies that S ∪ T ∈ F,
(augmentation) S, T ∈ F with S ⊂ T implies that there exists i ∈ T \ S such that S ∪ {i} ∈ F.

Algaba et al. [9] establish that an augmenting system in which all singletons are feasible is a communication feasible set. Augmentation^25 establishes that, whenever there are two feasible coalitions such that one is contained in the other, we can keep adding players from the ‘bigger’ coalition to the ‘smaller’ coalition one by one, such that after each addition the new coalition is feasible. This property can be used in defining solutions for games that are based on marginal vectors, such as the Shapley value. Assuming that the ‘grand coalition’ N is feasible one can, starting with the empty set and adding one player at each step, define a sequence of feasible coalitions ending up in the ‘grand coalition’. This means that we can always define a permutation π : {1, . . . , n} → N such that {π(1), . . . , π(k)} is a feasible coalition for every k ∈ {1, . . . , n}. We will see later that the same cannot be done for arbitrary accessible union stable network structures.

Players who can be joined to a feasible coalition S ∈ F while keeping feasibility are called augmentation players, i.e. a player i ∈ N \ S with S ∪ {i} ∈ F is called an augmentation player of coalition S in F. Note that augmentation implies accessibility. Indeed, for T ∈ F with T ≠ ∅, by the augmentation property there exists a sequence of coalitions T_0, T_1, . . . , T_t, with T_h ∈ F and |T_h| = h for 0 ≤ h ≤ t, such that ∅ = T_0 ⊂ T_1 ⊂ · · · ⊂ T_{t−1} ⊂ T_t = T. Therefore, there exists a player i ∈ T such that T \ {i} = T_{t−1} ∈ F. This shows that augmenting systems satisfy accessibility, and therefore they are accessible union stable network structures.

Proposition 2 (Algaba et al. [13]) If F ⊆ 2^N is an augmenting system then F is an accessible union stable network structure.
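The growth argument behind augmentation is constructive: if N ∈ F, greedily adding an augmentation player at each step yields a permutation π with every prefix feasible. A hedged Python sketch (ours; it assumes F really is an augmenting system with N feasible, otherwise the `next(...)` call raises StopIteration):

```python
def feasible_order(N, F):
    """Grow from ∅ to N one augmentation player at a time."""
    F = {frozenset(S) for S in F}
    order, S = [], frozenset()
    while S != frozenset(N):
        # augmentation (applied with T = N) guarantees such a player exists
        i = next(j for j in set(N) - S if S | {j} in F)
        order.append(i)
        S |= {i}
    return order

# connected coalitions of the line graph 1-2-3 form an augmenting system
F = [(), (1,), (2,), (3,), (1, 2), (2, 3), (1, 2, 3)]
order = feasible_order({1, 2, 3}, F)
# every prefix {π(1), ..., π(k)} is feasible
print(all(frozenset(order[:k]) in {frozenset(s) for s in F}
          for k in range(len(order) + 1)))  # True
```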
However, accessibility does not imply augmentation, and therefore not every accessible union stable network structure is an augmenting system, see Example 8. So, interpreting augmentation as the possibility to grow and accessibility as the possibility to shrink: the possibility to grow from the empty set to the grand coalition by players entering one by one implies that we can shrink from the grand coalition to the empty set by players leaving one by one, but the possibility to shrink does not imply the possibility to grow. This has an important impact on solutions that are defined using marginal vectors, such as the Shapley value. The next example, taken from Algaba et al. [13], discusses an application of an accessible union stable network structure that is neither an augmenting system nor an antimatroid nor a communication feasible set.

Example 8 (Exploring and Careful Societies) Let K = {1, 2} be a society of explorers, and M = {3, 4, 5} a society of careful players. Assume that all coalitions
^25 Notice the difference with the augmentation property that is used as a defining property of a convex geometry, see Footnote 14.
within each society are feasible, but a coalition containing players from both societies can only be formed if it contains all players of M (and any subset of K). Therefore, we can consider society K as a society of ‘explorers’ who each can go individually or with any group to the ‘outside world’, and society M as a ‘careful’ society consisting of players who can only go out together. The corresponding set of feasible coalitions is

F = {∅, {1}, {2}, {1, 2}, {3}, {4}, {5}, {3, 4}, {3, 5}, {4, 5}, {3, 4, 5},
     {1, 3, 4, 5}, {2, 3, 4, 5}, {1, 2, 3, 4, 5}},

where the first row of F contains K, M and all their subsets, while the second row contains the coalitions with players from both societies.

This network structure is union stable since (i) any union of two coalitions that are both a subset of K or both a subset of M is feasible, and (ii) the union of two nondisjoint feasible coalitions together containing players from K and M must contain all players from M: at least one of these two coalitions must itself contain players from both K and M, and thus contains all players from M, so the union contains all players from M and is therefore feasible. The network structure is accessible since (i) for each non-empty feasible coalition that is a subset of K or M, every player in this coalition is an extreme player (i.e., it can be deleted keeping feasibility), and (ii) for every feasible coalition that contains players from both K and M, each player from K can be deleted, resulting in a coalition that still contains all players from M and therefore is feasible.

To show that the network structure F is not a communication feasible set, consider for example the feasible coalition {2, 3, 4, 5}. This coalition contains one player from K, player 2, and all players from M. But since none of the players of M can be deleted, player 2 is the only extreme player, so the network structure does not satisfy 2-accessibility. To show that the network structure F is not an antimatroid, consider for example coalitions {1} and {3}.
These are proper subsets of K, respectively M, so their union contains a player from K and a player from M but does not contain all players from M, and therefore is not feasible, showing that F does not satisfy union closedness. Finally, to show that the network structure F is not an augmenting system, consider coalitions K = {1, 2} and K ∪ M = {1, 2, 3, 4, 5}. Then no single player from M can be added to K to get a feasible coalition, since the players of M only join K as a group, showing that F does not satisfy augmentation.
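These claims about Example 8 can be checked exhaustively. A Python sketch (ours, not from the survey); the construction of F encodes "within one society, or all of M plus any part of K":

```python
from itertools import combinations

def subsets(X):
    X = list(X)
    return {frozenset(c) for k in range(len(X) + 1) for c in combinations(X, k)}

K, M = frozenset({1, 2}), frozenset({3, 4, 5})
F = {S for S in subsets(K | M) if S <= K or S <= M or M <= S}

# union stable, but not union closed ({1} ∪ {3} is infeasible)
assert all(S | T in F for S in F for T in F if S & T)
assert frozenset({1, 3}) not in F

# accessible, but not 2-accessible: {2,3,4,5} has 2 as its unique extreme player
assert all(any(S - {i} in F for i in S) for S in F if S)
S = frozenset({2, 3, 4, 5})
assert [i for i in sorted(S) if S - {i} in F] == [2]

# not an augmenting system: no single player of M can be added to K
assert not any(K | {i} in F for i in M)
print("all checks passed")
```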
5.2.1 The Supports of Accessible Union Stable Network Structures Since accessible union stable network structures are union stable systems, they can be described by the supports as defined in Sect. 3.
Example 9 Consider the accessible union stable network structure F on N = {1, 2, 3, 4, 5} given by

F = {∅, {1}, {2}, {4}, {5}, {1, 2}, {3, 4}, {4, 5}, {2, 3, 4},
     {3, 4, 5}, {1, 2, 3, 4}, {2, 3, 4, 5}, N}.
The supports of F are given by

B(F) = {∅, {1}, {2}, {4}, {5}, {1, 2}, {3, 4}, {4, 5}, {2, 3, 4}}.

We introduced accessible union stable network structures as a model that generalizes communication feasible sets as well as antimatroids in such a way that the two defining properties reflect communication (union stability), respectively hierarchy (accessibility). An interesting question is to see how these features influence the basis of the structure. As mentioned in Sect. 3, the supports of a communication feasible set are exactly those elements that have cardinality one or two, the first type being the singletons and the second type being the links or edges of the communication graph. The full communication feasible set is obtained from the supports by repeatedly applying the union stability operator. An antimatroid can be fully described by its paths, being those feasible coalitions that have exactly one extreme player, and applying the union operator. By accessibility, every nonempty feasible coalition in an accessible union stable network structure has an extreme player. For accessible union stable network structures it turns out that every support either has cardinality at most two or is a path. (In the accessible union stable network structure of Example 9, the supports with cardinality at most two are those in B(F) \ {{2, 3, 4}}, while the support {2, 3, 4} is a path.)

Proposition 3 (Algaba et al. [13]) Let F ⊆ 2^N be an accessible union stable network structure. If B ∈ B(F) with |B| > 2 then B is a path.

The reverse is not true, i.e., not every path with more than two players is a support.

Example 10 Consider the set N = {1, 2, 3, 4} and the accessible union stable network structure given by

F = {∅, {1}, {2}, {4}, {1, 2}, {3, 4}, {2, 3, 4}, N}.

Its basis is B(F) = {{1}, {2}, {4}, {1, 2}, {3, 4}, {2, 3, 4}}.
Since the only extreme player of the ‘grand coalition’ N is player 1, the grand coalition is a path, but it is not a support, since it is the union of {1, 2} and {2, 3, 4}.

This ‘hybrid’ form of the basis makes accessible union stable network structures in some sense more difficult to handle than the other structures mentioned before. Specifically, (1) the basis of a union stable system is formed by the supports, being the coalitions that are not the union of two nondisjoint feasible coalitions, (2) communication feasible sets are determined by the singletons and the
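The basis computation of Example 10 can be automated. In the sketch below (our own code, restricted for simplicity to nonempty coalitions), a support is taken to be a feasible coalition that is not the union of two nondisjoint feasible coalitions different from itself, which is exactly how N fails to be a support here:

```python
def supports(F):
    """Nonempty feasible coalitions that are NOT the union of two
    nondisjoint feasible coalitions different from themselves."""
    F = {frozenset(S) for S in F}
    return {B for B in F if B and not any(
        S | T == B for S in F for T in F
        if S & T and S != B and T != B)}

def extreme_players(F, S):
    F = {frozenset(T) for T in F}
    S = frozenset(S)
    return {i for i in S if S - {i} in F}

# Example 10
F = [(), (1,), (2,), (4,), (1, 2), (3, 4), (2, 3, 4), (1, 2, 3, 4)]
B = supports(F)
# N = {1,2,3,4} is a path (unique extreme player 1) but not a support:
print(extreme_players(F, {1, 2, 3, 4}), frozenset({1, 2, 3, 4}) in B)  # {1} False
```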
links, while (3) antimatroids are described by their paths. This is useful, specifically for the fairness type of axioms for solutions for games with restricted cooperation, which compare two different structures where one is obtained by deleting certain feasible coalitions from the other. In the case of union stable systems, after deleting a support what is left is still a union stable system. For antimatroids, after taking out any path that is not covered by another path, what is left is still an antimatroid. In the mentioned literature it is shown that in these structures we have enough possibilities to delete coalitions and apply the fairness axiom mentioned earlier in this survey to obtain uniqueness in combination with some other axioms. In the next subsection, we mention some problems that we encounter when applying fairness to games on accessible union stable network structures.
5.2.2 Cooperative Games on Accessible Union Stable Network Structures

A game on an accessible union stable network structure is a triple (N, v, F) where (N, v) is a TU-game, and F ⊆ 2^N is an accessible union stable network structure. Again, since we take the player set to be fixed, we denote a game on an accessible union stable network structure (N, v, F) by (v, F). Since accessible union stable network structures are union stable systems, we can directly apply the approach of Sect. 3, and define a restricted game associated to games on an accessible union stable network structure as follows. Let v : 2^N → R be a cooperative game and let F ⊆ 2^N be an accessible union stable network structure. The restricted game v^F : 2^N → R is defined by

v^F(S) = Σ_{T ∈ C_F(S)} v(T) for all S ⊆ N.
Notice that, if F is an accessible union stable network structure, then for every S ⊆ N such that C_F(S) = ∅, we have v^F(S) = 0. If F is a communication feasible set, then the game v^F is the graph-restricted game of Myerson [54] and Owen [60], see Sect. 3. Since an antimatroid A is union closed, every subset S ⊆ N has a unique component given by the interior operator int_A(S), see (4). The restricted game v^A : 2^N → R is then the game defined by v^A(S) = v(int_A(S)), see Sect. 4.

A solution for games on an accessible union stable network structure is a function that assigns a payoff vector to every game on an accessible union stable network structure. Following the previous sections, we consider the solution that assigns to every such game the Shapley value of the corresponding restricted game: ϕ(v, F) = Sh(v^F) for every game on an accessible union stable network structure (v, F). Clearly, the value ϕ for games on accessible union stable network structures generalizes the Myerson value for communication graph games and the (conjunctive and disjunctive) permission value for games with a permission structure.
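For small examples, ϕ can be computed by brute force. A Python sketch (our own code; the enumeration over all n! orders is only practical for tiny N), where `components` returns the maximal feasible subsets C_F(S):

```python
from itertools import permutations
from math import factorial

def components(F, S):
    """C_F(S): the maximal nonempty feasible subsets of S."""
    feas = [T for T in F if T and T <= frozenset(S)]
    return [T for T in feas if not any(T < U for U in feas)]

def restricted(v, F):
    """v^F(S) = sum of v over the components of S."""
    return lambda S: sum(v(T) for T in components(F, S))

def shapley(N, v):
    """Average of marginal vectors over all orders (brute force)."""
    phi = {i: 0.0 for i in N}
    for pi in permutations(N):
        S = frozenset()
        for i in pi:
            phi[i] += v(S | {i}) - v(S)
            S |= {i}
    return {i: x / factorial(len(N)) for i, x in phi.items()}

# line graph 1-2-3 with v(S) = 1 iff |S| >= 2: ϕ is the Myerson value here
F = [frozenset(s) for s in [(1,), (2,), (3,), (1, 2), (2, 3), (1, 2, 3)]]
v = lambda S: 1.0 if len(S) >= 2 else 0.0
phi = shapley([1, 2, 3], restricted(v, F))
print(round(sum(phi.values()), 6))  # 1.0  (component efficiency at N)
```

The middle player 2 receives the largest payoff, reflecting its position as the communication hub of the line.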
As mentioned before, if F is a union stable system then the components of N in F form a partition of a subset of N. (If F is, moreover, normal then the components form a partition of N.) Notice that we cannot just apply fairness since, besides union stability, we now also need to take care that, after deleting a coalition/support, the remaining network structure still satisfies accessibility.

Before considering a fairness type of axiom for games on an accessible union stable network structure, we consider a balanced contributions type of axiom. Balanced contributions axioms balance the mutual dependence of two players on each other in the sense that they equalize the effect of the removal of one player on the payoff of the other player. Given a network structure F ⊆ 2^N and a player i ∈ N, the network structure F_{−i} = {S ∈ F | i ∉ S} consists of all those feasible coalitions in F which do not contain player i. This operation has the nice feature that the reduced structure F_{−i} is still an accessible union stable network structure.

Proposition 4 (Algaba et al. [13]) If F ⊆ 2^N is an accessible union stable network structure and i ∈ N, then F_{−i} is an accessible union stable network structure.

This proposition allows us to define the following axiom. A solution f for games on an accessible union stable network structure has balanced contributions if for every game on an accessible union stable network structure (v, F) and any two players i, j ∈ N with i ≠ j, we have

f_i(v, F) − f_i(v, F_{−j}) = f_j(v, F) − f_j(v, F_{−i}).

The ‘freedom’ to ‘isolate’ any player in an accessible union stable network structure gives us the positive characterization result that the Shapley value is the unique value for games on accessible union stable network structures that satisfies component efficiency, component dummy, and has balanced contributions, see Algaba et al. [13].
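Proposition 4 can be illustrated on the structure of Example 9: deleting all coalitions containing, say, player 3 leaves both defining properties intact. A self-contained check (our own code, not from the survey):

```python
F = {frozenset(s) for s in [(), (1,), (2,), (4,), (5,), (1, 2), (3, 4),
                            (4, 5), (2, 3, 4), (3, 4, 5), (1, 2, 3, 4),
                            (2, 3, 4, 5), (1, 2, 3, 4, 5)]}
F_minus_3 = {S for S in F if 3 not in S}  # F_{-3}

# still union stable and accessible (Proposition 4)
assert all(S | T in F_minus_3 for S in F_minus_3 for T in F_minus_3 if S & T)
assert all(any(S - {i} in F_minus_3 for i in S) for S in F_minus_3 if S)
print(sorted(map(sorted, F_minus_3)))  # [[], [1], [1, 2], [2], [4], [4, 5], [5]]
```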
Above, using balanced contributions, we considered the effect on the payoffs of one player of deleting all coalitions containing another particular player from the set of feasible coalitions. For accessible union stable network structures, we can define various types of fairness axioms. Here, we consider the following. Comparing the payoffs of two players, we can, similarly to balanced contributions, consider the effect on their payoffs when we delete all coalitions containing both players. So, for an accessible union stable network structure F and two players i, j ∈ N, we consider the network structure

F_{−ij} = {S ∈ F | {i, j} ⊄ S}
being the collection of feasible coalitions in F that do not contain both players i and j. Next, we define a version of fairness where we delete all coalitions containing two particular players, and require the payoffs of these two players to change by the same amount. Then a solution f for games on an accessible union stable network structure satisfies fairness if

f_i(v, F) − f_i(v, F_{−ij}) = f_j(v, F) − f_j(v, F_{−ij})

for all games on an accessible union stable network structure (v, F) and i, j ∈ N such that F_{−ij} is an accessible union stable network structure. The restriction that F_{−ij} is an accessible union stable network structure implies that not all feasible coalitions can be deleted. It turns out that F being accessible implies that F_{−ij} is accessible.

Proposition 5 (Algaba et al. [13]) If F ⊆ 2^N is an accessible network structure then F_{−ij} is accessible.

However, for an arbitrary accessible union stable network structure F the network structure F_{−ij} need not be union stable, as the following example shows.

Example 11 Consider the accessible union stable network structure F of Example 8. Take a player from K and one from M, for example players 2 and 4. Then F_{−24} = F \ {{2, 3, 4, 5}, {1, 2, 3, 4, 5}}, which is not union stable since {1, 2} and {1, 3, 4, 5} both belong to F_{−24} but their union does not.

This creates problems for axiomatizing the Shapley value for games on an accessible union stable network structure, since we are restricted in the coalitions that can be deleted from an accessible union stable network structure. Notice that fairness as defined above for the class of accessible union stable network structures is not the same as Myerson’s fairness for communication graph games, since if F = F_L for some communication graph L then F_{−ij}, in general, is not the set of connected coalitions in L \ {{i, j}}. Consider, for example, the communication graph L on N = {1, 2, 3, 4} given by L = {{1, 2}, {2, 3}, {3, 4}, {1, 4}}.
Then {1, 2, 3, 4} ∈ F_{L\{{1,2}}} \ (F_L)_{−12}. Therefore, on the class of communication graph games this axiom is not the same as Myerson’s fairness. However, it can be shown that for (sets of connected coalitions in) cycle-free graphs the two fairness axioms are the same. Therefore, it is known that for these set systems, fairness and component efficiency characterize the Shapley value. Similarly, it follows that for the class of games on cycle-free accessible union stable network structures, the Shapley value is characterized by component efficiency, component dummy and fairness.
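The failure of union stability in Example 11, and hence the obstacle to applying fairness there, is easy to confirm exhaustively; F below is the structure of Example 8 (our own code):

```python
F = {frozenset(s) for s in [(), (1,), (2,), (1, 2), (3,), (4,), (5,),
                            (3, 4), (3, 5), (4, 5), (3, 4, 5),
                            (1, 3, 4, 5), (2, 3, 4, 5), (1, 2, 3, 4, 5)]}
F_24 = {S for S in F if not frozenset({2, 4}) <= S}  # F_{-24}

assert F_24 == F - {frozenset({2, 3, 4, 5}), frozenset({1, 2, 3, 4, 5})}
# {1,2} and {1,3,4,5} survive and intersect in {1}, but their union is gone:
A, B = frozenset({1, 2}), frozenset({1, 3, 4, 5})
assert A in F_24 and B in F_24 and A & B
assert A | B not in F_24  # union stability fails for F_{-24}
print("union stability fails, as in Example 11")
```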
6 Concluding Remarks

The main goal of this survey is to review and compare various network structures, specifically those that model communication networks and hierarchies. We illustrated most concepts by their effect on coalition formation in cooperative (transferable utility) games. We discussed (undirected) communication graphs and the more general union stable systems as models of communication networks, and we discussed permission structures and the more general antimatroids as models of hierarchical structures. Also, we discussed accessible union stable network structures as a model that combines communication with hierarchy. In the survey, we already mentioned several other network structures and their relation with the structures reviewed here, such as union closed network structures, convex geometries, intersection closed network structures and augmenting systems. Other models in the literature are, for example, the games in coalition structure of Aumann and Drèze [22] and Owen [59], where the player set is partitioned into a-priori unions (which can also be modeled as an undirected graph where there is a link between two players if and only if they belong to the same a-priori union), or its generalization to level structures in Winter [81], where there is a sequence of coalition structures, each one finer than the previous one.

It is interesting to observe that cooperative games themselves are a generalization of undirected graphs. In fact, cooperative games are a generalization of hypergraphs. To be precise, a hypergraph is a simple cooperative game, being a game (N, v) with v(S) ∈ {0, 1} for all S ⊆ N. A simple cooperative game then represents hypergraph (N, F), F ⊆ 2^N, if v(S) = 1 if S ∈ F, and v(S) = 0 otherwise. In game terminology, S is called a coalition, while in graph terminology it is called a hyperlink. Undirected graphs are a special type of hypergraphs where the hyperlinks have size one or two.
Representing an undirected graph (N, L) by its collection F_L of connected coalitions, this can also be represented by the so-called 2-additive game (N, v) given by (1) v(S) = 1 if |S| = 2 and S ∈ L, (2) v(S) = 0 if |S| = 2 and S ∉ L, or |S| ≤ 1, and (3) v(S) = Σ_{T ⊆ S, |T| = 2} v(T) if |S| > 2, see Deng and Papadimitriou [33]. In this context, undirected graphs are usually called bilateral graphs or networks. Summarizing, the class of undirected bilateral graphs is contained in the class of hypergraphs, which is contained in the class of cooperative games. Specifically, hypergraphs are simple cooperative games, while undirected bilateral graphs are 2-additive simple cooperative games. Obviously, 2-additive cooperative games (that are not simple) coincide with the so-called weighted undirected graphs where the links have weights expressing their importance.^26
^26 Applications of weighted graphs, considering weights on links as well as on nodes, can be found in, for example, Lindelauf et al. [52], who measure the importance of terrorists based on their centrality in a terrorist network.
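The 2-additive representation of an undirected graph can be written down directly from clauses (1)-(3): for |S| > 2 the worth is simply the number of links inside S. A Python sketch (our own function name, not from the survey), using the 4-cycle from Sect. 5.2.2 as the example graph:

```python
from itertools import combinations

def two_additive(L):
    """The 2-additive game of an undirected graph with link set L:
    v(S) = 1 for links, 0 for non-links and singletons, and for |S| > 2
    the sum of v over all two-player subsets, i.e. the number of links in S."""
    links = {frozenset(l) for l in L}
    def v(S):
        if len(S) <= 1:
            return 0
        if len(S) == 2:
            return 1 if frozenset(S) in links else 0
        return sum(v(set(T)) for T in combinations(S, 2))
    return v

v = two_additive([(1, 2), (2, 3), (3, 4), (1, 4)])  # the 4-cycle
print(v({1, 2}), v({1, 3}), v({1, 2, 3}), v({1, 2, 3, 4}))  # 1 0 2 4
```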
A very general approach to games with restricted cooperation is followed by Derks and Peters [35], who consider a restriction as a mapping ρ : 2^N → 2^N that assigns to every coalition an associated ‘feasible’ coalition. One of the characterizing properties of a restriction ρ is that ρ(ρ(S)) = ρ(S) for all S ⊆ N.^27 This approach contains the games on a union closed system, and thus games on an antimatroid and games with a permission structure, where to every coalition it assigns the largest feasible subset. It does not contain the communication graph games, but, for example, it also does not contain the games with a local permission structure of van den Brink and Dietz [72], where the role of a player as value generator is separated from its role as authorizer in the sense that a player can give permission to its subordinates to act, but still needs permission from its own predecessors to be active.^28

In this survey, we focussed on comparing different network structures. We want to mention that in many applications in Economics and Operations Research, also looking at more specific network structures gives insight into these Economics and OR problems. Without going into detail, we mention some of these applications. Regarding communication networks, van den Brink et al. [75] show that, for example, the river games of Ambec and Sprumont [21] and the sequencing games of Curiel et al. [28, 29] can be seen as communication graph games where the underlying communication graph is a line graph. Looking at it in this way, we can see that, for example, the downstream incremental solution for river games in Ambec and Sprumont [21] and the drop-out monotonic solution for sequencing games in Fernández et al. [42] boil down to the same solution, being the marginal vector where players enter consecutively in the order of the line.

Regarding applications of hierarchies, Brânzei et al. [26] and Brânzei et al.
[27] show several applications of peer group games, being games with a permission structure where the permission structure is a rooted tree and the game is an additive game, such as the auction games of Graham et al. [46] and the ATM games of Bjorndal et al. [25]. Other applications are, for example, the polluted river problems of Ni and Wang [56] and Dong et al. [37], mentioned in Footnote 11.29 Other approaches to additive games on a rooted tree can be found in, e.g., the hierarchical ventures in Hougaard
27 The other characterizing properties are that ρ(S) ⊆ S for all S ⊆ N, and ρ(S) ⊆ ρ(T) for all S ⊆ T ⊆ N. 28 For example, in the line permission tree D = {(1, 2), (2, 3)} on N = {1, 2, 3} the feasible part of coalition {2, 3} is the singleton {3}, since its direct predecessor is in the coalition, while the predecessor of 2 is not in the coalition. In this case ρ({1, 2, 3}) = {1, 2, 3}, ρ({1, 2}) = {1, 2}, ρ({1, 3}) = ρ({1}) = {1}, ρ({2, 3}) = {3} and ρ({2}) = ρ({3}) = ∅. Thus ρ(ρ({2, 3})) = ρ({3}) = ∅ ≠ {3} = ρ({2, 3}). 29 Although they are not games with a permission structure, the airport games of Littlechild and Owen [53] and the joint liability problems of Dehez and Ferey [30] are the duals of auction games and polluted river games, respectively. This also makes the graph game approach useful for these applications, in particular for self-dual solutions such as the Shapley value. A comparison between these four applications, based on anti-duality relations, can be found in Oishi et al. [58].
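The feasible-part mapping of footnote 28 can be checked mechanically. The sketch below is our own illustration (the function name is ours, not from the survey): a player remains in the feasible part of a coalition exactly when all of its direct predecessors belong to the coalition, and iterating ρ shows that this local permission restriction is not idempotent, which is why it falls outside the Derks and Peters [35] framework.

```python
def feasible_part(arcs, coalition):
    """Feasible part of a coalition in a local permission structure:
    player j stays in S exactly when every direct predecessor of j
    (every i with (i, j) an arc) also belongs to S."""
    S = set(coalition)
    preds = {}
    for i, j in arcs:
        preds.setdefault(j, set()).add(i)
    return {j for j in S if preds.get(j, set()) <= S}

# the line permission tree D = {(1, 2), (2, 3)} on N = {1, 2, 3}
D = {(1, 2), (2, 3)}
rho = lambda S: feasible_part(D, S)

print(rho({2, 3}))       # {3}: 3's predecessor 2 is present, 2's predecessor 1 is not
print(rho(rho({2, 3})))  # set(), so rho(rho(S)) != rho(S): idempotence fails
```

In a union closed system or an ordinary permission structure the analogous mapping is idempotent, which is exactly one of the properties singled out by Derks and Peters [35].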
E. Algaba and R. van den Brink
et al. [49].30 A way to generalize peer group games is to keep a rooted permission tree but allow any game, giving the permission tree games mentioned in the paragraph after Theorem 5. A special case of a permission tree game that is not a peer group game is given by the hierarchically structured firms in van den Brink [69], being games with a permission structure where the permission structure is a rooted tree and the game is any convex game on the 'lowest level' of the hierarchy (i.e. players with no successor), while all players that are not on the lowest level are null players in the game, with the interpretation that the lowest level players are workers who generate value, while the other players are managers who do not physically produce, but organize the production process. The above mentioned applications are just some of the applications of games on communication or hierarchy networks that one can find in the literature. The usefulness of the Myerson graph game approach is also shown in van den Brink and Pintér [74], who show that axiomatizations of the Shapley value for TU-games are invalid on the class of assignment games (see Shapley and Shubik [64]) in the sense that they do not give uniqueness, but Myerson's component efficiency and fairness do characterize the Shapley value for assignment games (when we consider these games as restricted to the bipartite graph where two players are linked if and only if one is a seller and the other is a buyer). An important relation between network structures that we did not consider in this survey is the duality relation. The dual structure of a network structure F ⊆ 2^N is the structure F^d given by F^d = {S ⊆ N | N \ S ∈ F}. It is, for example, well known that the dual of an antimatroid is a convex geometry, see Footnote 14. Algaba et al. 
[13] show that the dual structures of accessible union stable network structures form a class of network structures that contains all convex geometries.31 Acknowledgments Authors are grateful to the editors for the invitation to participate in this volume and two anonymous referees for their revision. Financial support from the Ministerio de Ciencia, Innovación y Universidades (MCIU), the Agencia Estatal de Investigación (AEI) and the Fondo Europeo de Desarrollo Regional (FEDER) under the project PGC2018-097965-B-I00 is gratefully acknowledged.
References 1. Algaba, E., van den Brink, R.: The Shapley value and games with hierarchies. In: Handbook of the Shapley Value, pp. 49–74 (2019)
30 The introduction of specific networks, for instance the coloured networks in the setting of TU-games in Algaba et al. [16], allows us to analyze profit sharing problems in an intermodal transport system. More general networks are the so-called labelled networks as presented by Algaba et al. [17], which curiously coincide with the set of museum pass games of Ginsburgh and Zang [45], as shown in Algaba et al. [18]. 31 Specifically, they satisfy (1) feasible empty set, (2) augmentation, and (3) a weaker intersection property requiring that S, T ∈ F with S ∪ T = N implies that S ∩ T ∈ F.
2. Algaba, E., Bilbao, M., Borm, P., López, J.: The position value for union stable systems. Math. Methods Oper. Res. 52, 221–236 (2000) 3. Algaba, E., Bilbao, M., Borm, P., López, J.: The Myerson value for union stable structures. Math. Methods Oper. Res. 54, 359–371 (2001) 4. Algaba, E., Bilbao, M., López, J.: A unified approach to restricted games. Theory Decis. 50, 333–345 (2001) 5. Algaba, E., Bilbao, M., van den Brink, R., Jiménez-Losada, A.: Axiomatizations of the Shapley value for games on antimatroids. Math. Methods Oper. Res. 57, 49–65 (2003) 6. Algaba, E., Bilbao, M., van den Brink, R., Jiménez-Losada, A.: Cooperative games on antimatroids. Discrete Math. 282, 1–15 (2004) 7. Algaba, E., Bilbao, J.M., López, J.: The position value in communication structures. Math. Methods Oper. Res. 59, 465–477 (2004) 8. Algaba, E., Bilbao, M., Fernández, J., Jiménez, N., López J.: Algorithms for computing the Myerson value by dividends. In: Moore, K.B. (ed.) Discrete Mathematics Research Progress, pp. 1–13 (2007) 9. Algaba, E., Bilbao, J.M, Slikker, M.: A value for games restricted by augmenting systems. SIAM J. Discrete Math. 24, 992–1010 (2010) 10. Algaba, E., Bilbao, J.M, van den Brink, R., López J.J.: The Myerson value and superfluous supports in union stable systems. J. Optim. Theory Appl. 155, 650–668 (2012) 11. Algaba, E., Bilbao, J.M., van den Brink, R.: Harsanyi power solutions for games on union stable systems. Ann. Oper. Res. 225, 27–44 (2015) 12. Algaba, E., van den Brink, R., Dietz, C.: Power measures and solutions for games under precedence constraints. J. Optim. Theory Appl. 172, 1008–1022 (2017) 13. Algaba, E., van den Brink, R., Dietz, C.: Network structures with hierarchy and communication. J. Optim. Theory Appl. 179, 265–282 (2018) 14. Algaba, E., Fragnelli, V., Sánchez-Soriano, J.: Handbook of the Shapley Value. CRC Press, Taylor and Francis Group, New York (2019) 15. 
Algaba, E., Fragnelli, V., Sánchez-Soriano, J.: The Shapley value, a paradigm of fairness. In: Handbook of the Shapley Value, pp. 17–29 (2019) 16. Algaba, E., Fragnelli, V., Llorca, N., Sánchez-Soriano, J.: Horizontal cooperation in a multimodal public transport system: The profit allocation problem. Eur. J. Oper. Res. 275, 659–665 (2019) 17. Algaba, E., Fragnelli, V., Llorca, N., Sánchez-Soriano, J.: Labeled network allocation problems. An application to transport systems. In: Lecture Notes in Computer Science, 11890 LNCS, pp. 90–108 (2019) 18. Algaba, E., Béal, Fragnelli, V., Llorca, N., Sánchez-Soriano, J.: Relationship between labeled network games and other cooperative games arising from attributes situations. Econ. Lett. 185, 108708 (2019) 19. Algaba, E., Béal, S., Rémila, E., Solal, P.: Harsanyi power solutions for cooperative games on voting structures. Int. J. Gen. Syst. 48, 575–602 (2019) 20. Álvarez-Mozos, M., van den Brink, R., van der Laan G., Tejada, O.: From hierarchies to levels: new solutions for games with hierarchical structure. Int. J. Game Theory 46, 1089–1113 (2017) 21. Ambec, S., Sprumont, Y.: Sharing a river. J. Econ. Theory 107, 453–462 (2002) 22. Aumann, R.J., Drèze J.: Cooperative games with coalition structures. Int. J. Game Theory 3, 217–237 (1974) 23. Béal S., Moyouwou, I., Rémila E., Solal, P.: Cooperative games on intersection closed systems and the Shapley value. Math. Soc. Sci. 104, 15–22 (2020) 24. Bilbao, J.M.: Cooperative games under augmenting systems. SIAM J. Discrete Math. 17, 122– 133 (2003) 25. Bjorndal, E., Hamers, H., Koster, M.: Cost allocation in a bank ATM network. Math. Methods Oper. Res. 59, 405–418 (2004) 26. Brânzei, R., Fragnelli, V., Tijs, S.: Tree connected line graph peer group situations and line graph peer group games. Math. Methods Oper. Res. 55, 93–106 (2002)
27. Brânzei, R., Solymosi, T., Tijs, S.: Strongly essential coalitions and the nucleolus of peer group games. Int. J. Game Theory 33, 447–460 (2005) 28. Curiel, I., Potters, J., Rajendra Prasad, V., Tijs, S., Veltman, B.: Cooperation in one machine scheduling. ZOR Methods Models Oper. Res. 38, 113–129 (1993) 29. Curiel, I., Potters, J., Rajendra Prasad, V., Tijs, S., Veltman, B.: Sequencing and cooperation. Oper. Res. 54, 323–334 (1994) 30. Dehez, P., Ferey, S.: How to share joint liability: a cooperative game approach. Math. Soc. Sci. 66, 44–50 (2013) 31. Demange, G.: Intermediate preferences and stable coalition structures. J. Math. Econ.23, 45–58 (1994) 32. Demange, G.: On group stability in hierarchies and networks. J. Polit. Econ. 112, 754–778 (2004) 33. Deng, X., Papadimitriou, C.H.: On the complexity of cooperative solution concepts. Math. Oper. Res. 19, 257–266 (1994) 34. Derks, J.M., Gilles, R.P.: Hierarchical organization structures and constraints on coalition formation. Int. J. Game Theory 24, 147–163 (1995) 35. Derks, J.M, Peters, H.: A Shapley value for games with restricted coalitions. Int. J. Game Theory 21, 351–360 (1993) 36. Dilworth, R.P.: Lattices with unique irreducible decompositions. Ann. Math. 41, 771–777 (1940) 37. Dong, B., Ni, D., Wang Y.: Sharing a polluted river network. Environ. Res. Econ. 53, 367–387 (2012) 38. Edelman, P.H., Jamison, R.E.: The theory of convex geometries. Geometr. Dedicata 19, 247– 270 (1985) 39. Faigle, U.: Cores of games with restricted cooperation. Z. Oper. Res. 33, 405–422 (1989) 40. Faigle, U., Kern, W.: The Shapley Value for cooperative games under precedence constraints. Int. J. Game Theory 21, 249–266 (1992) 41. Fernández, J., Algaba., E., Bilbao, J., Jiménez, A., Jiménez, N., López, J.: Generating functions for computing the Myerson value. Ann. Oper. Res. 109, 143–158 (2002) 42. Fernández, C., Borm, P., Hendrickx, R., Tijs, S.: Drop-out monotonic rules for sequencing situations. Math. Methods Oper. Res. 
61, 501–504 (2005) 43. Gilles, R.P., Owen, G.: Cooperative games and disjunctive permission structures, Department of Economics, Virginia Polytechnic Institute and State University, Blacksburg, Virginia (1994) 44. Gilles, R.P., Owen, G., van den Brink, R.: Games with permission structures: the conjunctive approach. Int. J. Game Theory 20, 277–293 (1992) 45. Ginsburgh, V., Zang, I.: The museum pass game and its value. Games Econ. Behav. 43, 322– 325 (2003) 46. Graham, D.A., Marshall, R.C., Richard, J.F.: Differential payments within a bidder coalition and the Shapley value. Am. Econ. Rev. 80, 493–510 (1990) 47. Harsanyi, J.C.: A bargaining model for cooperative n-person games. In: Tucker, A.W., Luce, R.D. (eds.) Contributions to the Theory of Games IV, pp. 325–355. Princeton University Press, Princeton, NJ (1959) 48. Herings, P.J.J., van der Laan, G., Talman, A.J.J.: The average tree solution for cycle free graph games. Games Econ. Behav. 62, 77–92 (2008) 49. Hougaard, J.L., Moreno-Ternero, J.D., Tvede, M., Osterdal, L.P.: Sharing the proceeds from a hierarchical venture. Games Econ. Behav. 102, 98–110 (2017) 50. Korte, B., Lovász, L., Schrader, R.: Greedoids. Springer, Berlin (1991) 51. Le Breton, M., Owen, G., Weber, S.: Strongly balanced cooperative games. Int. J. Game Theory 20, 419–427 (1992) 52. Lindelauf, R.H.A., Hamers, H., Husslage, B.G.M.: Cooperative game theoretic centrality analysis of terrorist networks: The cases of Jemaah Islamiyah and Al Qaeda. Eur. J. Oper. Res. 229, 230–238 (2013)
53. Littlechild, S.C., Owen, G.: A simple expression for the Shapley value in a special case. Manag. Sci. 20, 370–372 (1973) 54. Myerson, R.B.: Graphs and cooperation in games. Math. Oper. Res. 2, 225–229 (1977) 55. Myerson, R.B.: Conference structures and fair allocation rules. Int. J. Game Theory 9, 169–182 (1980) 56. Ni, D., Wang, Y.: Sharing a polluted river. Games Econ. Behav. 60, 176–186 (2007) 57. Nouweland, A., Borm, P., Tijs, S.: Allocation rules for hypergraph communication situations. Int. J. Game Theory 20, 255–268 (1992) 58. Oishi, T., Nakayama, M., Hokari, T., Funaki, Y.: Duality and anti-duality in TU games applied to solutions, axioms, and axiomatizations. J. Math. Econ. 63, 44–53 (2016) 59. Owen, G.: Values of games with a priori unions. In: Henn, R., Moeschlin,O. (eds.) Mathematical Economics and Game Theory, pp. 76–88. Springer, Berlin (1977) 60. Owen, G.: Values of graph-restricted games. SIAM J. Algebraic Discrete Methods 7, 210–220 (1986) 61. Pérez-Castrillo, D., Wettstein, D.: Bidding for the surplus: a non-cooperative approach to the Shapley Value. J. Econ. Theory 100, 274–294 (2001) 62. Selcuk, O., Suzuki, T.: An axiomatization of the Myerson value. Contrib. Game Theory Manag. 7, 341–348 (2014) 63. Shapley, L.S.: A value for n-person games. In: Kuhn, H.W., Tucker, A.W. (eds.) Contributions to the Theory of Games, vol. 2, pp. 307–317. Princeton University Press, Princeton (1953) 64. Shapley, L.S., Shubik, M.: The assignment game I: the core. Int. J. Game Theory 1, 111–130 (1972) 65. Slikker, M.: Bidding for surplus in network allocation problems. J. Econ. Theory 137, 493–511 (2007) 66. van den Brink, R.: An axiomatization of the disjunctive permission value for games with a permission structure. Int. J. Game Theory 26, 27–43 (1997) 67. van den Brink, R.: An Axiomatization of the Conjunctive Permission Value for Games with a Hierarchical Permission Structure, in: Logic, Game Theory and Social Choice (ed. H. de Swart), pp. 125–139 (1999) 68. 
van den Brink, R.: An axiomatization of the Shapley value using a fairness property. Int. J. Game Theory 30, 309–319 (2001) 69. van den Brink, R.: Vertical wage differences in hierarchically structured firms. Soc. Choice Welfare 30, 225–243 (2008) 70. van den Brink, R.: On hierarchies and communication. Soc. Choice Welfare 39, 721–735 (2012) 71. van den Brink, R.: Games with a permission structure: a survey on generalizations and applications. TOP 25, 1–33 (2017) 72. van den Brink, R., Dietz, C.: Games with a local permission structure: separation of authority and value generation. Theory Decision 76, 343–361 (2014) 73. van den Brink, R., Gilles, R.P.: Axiomatizations of the conjunctive permission value for games with permission structures. Games Econ. Behav. 12, 113–126 (1996) 74. van den Brink, R., Pintér, M.: On Axiomatizations of the Shapley Value for assignment games. J. Math. Econ. 60, 110–114 (2015) 75. van den Brink, R., van der Laan, G., Vasil’ev, V.: Component efficient solutions in line-graph games with applications. Econ. Theory 33, 349–364 (2007) 76. van den Brink, R., Katsev, I., van der Laan, G.: Axiomatizations of two types of Shapley Values for games on union closed systems. Econ. Theory 47, 175–188 (2011) 77. van den Brink, R., van der Laan, G., Pruzhansky, V.: Harsanyi power solutions for graphrestricted games. Int. J. Game Theory 40, 87–110 (2011) 78. van den Brink, R., Herings, P.J.J., van der Laan, G., Talman, A.J.J.: The average tree permission value for games with a permission tree. Econ. Theory 58, 99–123 (2015) 79. van den Brink, R., Dietz, C., van der Laan, G., Xu, G.: Comparable characterizations of four solutions for permission tree games. Econ. Theory 63, 903–923 (2017)
80. van den Brink, R., He, S., Huang, J.-P.: Polluted river problems and games with a permission structure. Games Econ. Behav. 108, 182–205 (2018) 81. Winter, E.: A value for cooperative games with levels structure of cooperation. Int. J. Game Theory 18, 227–240 (1989)
Game with Slow Pursuers on the Edge Graphs of Regular Simplexes Abdulla Azamov, Tolanbay Ibaydullaev, and Gafurjan Ibragimov
Abstract We study a differential game of kind between several pursuing points and one evading point moving along the edges of a regular simplex of dimension d. It is assumed that the maximum speed of the evader is twice the maximum speed of the pursuers. An exact mathematical formulation of the problem is given by introducing special classes of strategies adapted for games on graphs. It is proved that if the number of pursuers is at least [d/2] + 1, then the game is completed in favor of the pursuers, and otherwise in favor of the evader. Keywords Differential game · Many pursuers · Evader · Games on graphs · Strategy · Control
1 Introduction An important class of differential games consists of pursuit-evasion games of the Lion & Man type. The Lion & Man game of the Hungarian mathematician R. Rado [26] was the starting point for the study of such differential games. Since such games are considered on an infinite time interval, the exact formulation of the game requires a special approach. This is due to the fact that in pursuit-evasion differential games the following situation is possible: there is an initial position such that the completion of the pursuit is possible, but it is impossible to complete the game in any finite time interval (or, equivalently,
A. Azamov Institute of Mathematics named after V.I. Romanowsky, Tashkent, Uzbekistan T. Ibaydullaev Andijan State University named after Z.M. Babur, Andijan, Uzbekistan G. Ibragimov () Universiti Putra Malaysia, Department of Mathematics and INSPEM, Serdang, Selangor, Malaysia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 L. A. Petrosyan et al. (eds.), Frontiers of Dynamic Games, Trends in Mathematics, https://doi.org/10.1007/978-3-030-93616-7_2
A. Azamov et al.
the evader is not able to prevent the capture, but is able to delay the moment of capture arbitrarily long). Moreover, such points can form entire regions [4], which is why it is impossible to apply the methods of approximation by games on finite time intervals. Pursuit-evasion games on finite graphs are distinguished by the fact that the indicated phenomenon is impossible for them; moreover, an alternative holds in a strengthened form: either there exists T such that the pursuit is completed on the time interval [0, T] regardless of the initial state, or evasion of capture is possible over an infinite time interval, of course, with the exception of those initial states in which the evader at the very beginning of the game may be trapped by the pursuers. It should be noted that there are two types of dynamic games on graphs. Games on abstract graphs, where points move from one vertex to an adjacent one by jumping, constitute one type [13–16, 19, 25, 29, 32]. Such games are better called multi-move games. In games of the other type, points move along the edges of a given graph embedded in a Euclidean space [1–3, 9–11, 17]. Both types of games have minimax forms, each being a model of the problem of searching for a moving object [5, 17, 20]. Note that there are various statements of pursuit-evasion games, see, for example, [21, 29]. The purpose of the present paper is to study pursuit-evasion differential games on the edge graphs of regular simplexes. A regular simplex of dimension d is defined as a subset of the Euclidean space IR^{d+1} by the following relations
x_1 + x_2 + · · · + x_{d+1} = 1/√2,   x_j ≥ 0,   j = 1, 2, . . . , d + 1.
Its edges of length 1 form a complete graph Σ^d with d + 1 vertices. Being interested only in the topological scheme and the lengths of the edges, Σ^d can be embedded in IR^3 by replacing rectilinear edges with broken lines of length 1. Let n pursuing points P_k, k = 1, 2, . . . , n, move along the graph Σ^d, whose velocities do not exceed in absolute value ρ, ρ > 0, and one evading point Q whose velocity does not exceed σ, σ > 0. These data define the game, which we denote (Σ^d, ρ, σ). To give an exact statement, one should introduce the classes of the players' strategies and, for each pair of strategies U, V and the initial state ξ, indicate how the trajectories of the points are generated, and then introduce the payoff function. As a result, for each ξ we obtain a normal form, and the solution of the game will be its equilibrium situation (saddle point). If we take the capture time as the payoff function, then we deal with the degree problem (in R. Isaacs' terminology). If we are only interested in the possibility or impossibility of capture, then we come to the problem of kind. It is extremely rare that the degree problem can be solved explicitly. The present work is mainly devoted to the problem of kind. There is another approach to differential games based on Control Theory, according to which the pursuit and evasion problems are considered separately. In the present paper, we show that both approaches are equivalent for the games considered here, while in general this is unknown.
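As a quick numerical check (our own sketch, not part of the paper), vertex j of the simplex can be taken as the point (1/√2)e_j; all pairwise distances between vertices then equal 1, so the edges of Σ^d indeed form a complete graph on d + 1 vertices with unit edge lengths:

```python
import itertools
import math

def simplex_vertices(d):
    # vertex j of the regular d-simplex: (1/sqrt(2)) * e_j, j = 1, ..., d+1
    c = 1 / math.sqrt(2)
    return [tuple(c if i == j else 0.0 for i in range(d + 1))
            for j in range(d + 1)]

def edge_lengths(d):
    # lengths of all (d+1)d/2 edges of the complete graph Sigma^d
    V = simplex_vertices(d)
    return [math.dist(V[a], V[b])
            for a, b in itertools.combinations(range(d + 1), 2)]

print(all(abs(L - 1.0) < 1e-12 for L in edge_lengths(5)))  # True
```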
Further, we note that in the pursuit problem it is natural to impose a condition that gives some advantage to the pursuer, for example, that the latter has a higher speed than the evader. Similarly, in the evasion problem, such an advantage should be given to the evader. In games on graphs, the restriction on the movement of points along the graph is in itself an advantage for the pursuer; therefore, it is natural to expect that the pursuit problem makes sense even when the maximum speed of the pursuing points is less than the maximum speed of the evader. Moreover, the features of our game allow us to obtain its solution in the form of a pair of winning strategies for the pursuit and evasion problems solved separately [18, 31]. This will be demonstrated for the game (Σ^d, ρ, σ) when ρ = σ/2. Suppose that we have already clarified the concepts of “pursuer wins” and “evader wins” in the game (Σ^d, ρ, σ). Let N_d denote the smallest number of pursuers for which the pursuing player wins. Then the following statement holds. Theorem 1 Let ρ = σ/2. Then N_d = [d/2] + 1.
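Reading [d/2] as the integer part of d/2, Theorem 1 can be tabulated for small dimensions (a trivial sketch of ours; the function name is illustrative):

```python
def minimal_pursuers(d):
    # N_d = [d/2] + 1 from Theorem 1 (the case rho = sigma/2)
    return d // 2 + 1

print([minimal_pursuers(d) for d in (1, 2, 3, 4, 5)])  # [1, 2, 2, 3, 3]
```

In particular N_3 = 2, which is the case n = 2 treated in the base of induction of the proof below.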
2 Statement of the Problem The game involves two players: the pursuer controlling the movements of points x^[1], x^[2], . . . , x^[n], n ≥ 2, and the evader controlling the movement of the point y. For brevity, a set of n vectors will be denoted by the corresponding capital letter, for example, X = (x^[1], x^[2], . . . , x^[n]) ∈ (Σ^d)^n. Further, the (n + 1)-tuple (X, y) = (x^[1], x^[2], . . . , x^[n], y) ∈ (Σ^d)^{n+1} will be called a state. An absolutely continuous mapping x^[k](·) : [0, +∞) → Σ^d satisfying the condition |dx^[k](t)/dt| ≤ ρ_k a.e. is called the trajectory of the pursuing point x^[k]. Similarly, an absolutely continuous mapping y(·) : [0, +∞) → Σ^d satisfying the condition |dy(t)/dt| ≤ σ a.e. is called the trajectory of the evading point. The set of trajectories of all pursuers and the evader defines a play in the game. (Naturally, the notion of a trajectory includes the starting position of the corresponding point.) If the pursuer chooses the trajectories x^[k](·) of all points x^[k] and then, knowing them, the evader y chooses his own trajectory y(·), then we have a search problem [5, 17, 27, 28]. However, we are dealing with a differential game here. This concept was introduced into the lexicon by R. Isaacs. Since the publication of his book [22], the theory of differential games has received comprehensive development, including dozens of published monographs. At the same time, the question “What is a differential game?” remains relevant. If the game is considered on a fixed time interval [0, T], then the approach based on the multi-step approximation of a differential game works successfully, and the circumstances are greatly simplified [18, 31]. It should be noted that in the monograph [23] the theory of differential games on a finite time interval was brought up to the existence theorem for optimal mixed strategies.
As regards pursuit-evasion differential games on an infinite time interval [0, +∞), such comprehensive results have not yet been obtained. L.S. Pontryagin proposed effective methods for solving pursuit games on a finite time interval (see, for example, [8]) and evasion games on an infinite time interval [33]. In general, a third case can also take place: on any finite time interval it is possible to evade capture, but on the infinite interval this is impossible [4, 34]. In this regard, games on graphs have a simple structure: either from any initial state it is possible to complete the pursuit on a finite fixed time interval, or it is possible to evade capture on the infinite time interval. In the latter case, trivial initial states where the evader is trapped by the pursuers should be excluded. For this, it is sufficient that at the initial time the evading point is at one of the vertices and the pursuing points are at positive distances from it. Unlike the search problem, in a differential game the players must and have the right to manage their points based on current information, which includes the values X(t) and y(t) at the current time t for the evader and, for the pursuer, the value v(t) = dy(t)/dt in addition to X(t), y(t). The asymmetry in the awareness of the players is explained by the fact that the condition for the end of the game is a pointwise capture, that is, x_k(t) = y(t) for some k and t, k = 1, 2, . . . , m, t ≥ 0. For an exact answer to the question “What is a pursuit-evasion differential game?” it is necessary to explain how the trajectories X(t) and y(t) are generated. This is realized by introducing classes of players' strategies, which allows us to write the differential game in the normal form of J. von Neumann [31]. At present, there are many approaches to the problem of normalizing a differential game (see, for example, [4, 18, 23, 24, 31, 33, 34]). 
In the present paper we propose a relatively simple normalization method based on the specifics of games on graphs. Let a connected graph Γ with rectifiable edges of certain lengths be given. Since we are only interested in the topological structure of the graph and the lengths of its edges, we can assume that Γ is embedded in IR^3 and all edges are broken lines. Definition 1 The mapping Q that assigns to each state (X, y) a vector v, |v| ≤ σ, and a positive number δ is called a strategy of the evader. If Q(X, y) = (v, δ), then we call the pair (v, δ) the prescription of the strategy Q for the point y at the state (X, y). Definition 2 The mapping P assigning to each collection (X, y, v, δ), |v| ≤ σ, the collection (U, ε) = (u_1, u_2, . . . , u_m, ε), where |u_k| ≤ ρ_k, k = 1, 2, . . . , m, 0 < ε ≤ δ, is called the pursuer's strategy. If P(X, y, v, δ) = (U, ε), then we call (U, ε) the prescription of the strategy P at the position (X, y, v). The strategies of the players are called admissible if they do not take the moving points off the graph Γ. For this, it is sufficient that the prescriptions satisfy the conditions: 1. x_k ∈ Γ implies that x_k + t u_k ∈ Γ for t ∈ [0, ε], 2. y ∈ Γ implies that y + t v ∈ Γ for t ∈ [0, δ].
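Identifying a single edge with the unit segment [0, 1], condition 1 reduces to a check at the two endpoints of the time interval, since the motion is linear (a minimal sketch of our own; admissibility on the whole graph Γ must in addition handle the passage through vertices):

```python
def admissible_on_edge(x, u, eps):
    """Does the linear motion t -> x + t*u stay on the unit edge [0, 1]
    for all t in [0, eps]?  For linear motion it suffices to check the
    two endpoints of the time interval."""
    return 0.0 <= x <= 1.0 and 0.0 <= x + eps * u <= 1.0

print(admissible_on_edge(0.5, 1.0, 0.4))  # True: the point ends at 0.9
print(admissible_on_edge(0.5, 1.0, 0.6))  # False: the point would leave the edge
```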
Note that the admissibility of the pursuer's strategy and of the evader's strategy are independent. In the sequel, by a strategy we mean only an admissible strategy. The set of all admissible strategies of the pursuer (evader) is denoted by P (respectively, Q). Let us explain the reason why, in the prescription of the pursuer's strategy, the duration of its action ε is not chosen equal to δ. The structure of the graph and the response to the evader's prescription may require the pursuer to change his velocity vector. In this case, the evader is allowed after time δ to pass to another prescription. A pair of strategies (P, Q) ∈ P × Q together with the given initial positions X0 = (x_1^0, x_2^0, . . . , x_m^0) and y0 forms a play in the game. In our case, this concept is equivalent to a pair of trajectories X(t) = X(t; X0, y0, P, Q), y(t) = y(t; X0, y0, P, Q), which are generated as described below. Let (v0, δ0) = Q(X0, y0), (U0, ε0) = P(X0, y0, v0, δ0). Set t0 = 0, t1 = t0 + ε0, and for t ∈ [t0, t1] put X(t) = X0 + tU0, y(t) = y0 + tv0. Now let X1 = X(t1), y1 = y(t1). From these data, the strategies under consideration generate the values (v1, δ1) = Q(X1, y1), (U1, ε1) = P(X1, y1, v1, δ1), and the trajectories are continued by the formulas y(t) = y1 + (t − t1)v1, X(t) = X1 + (t − t1)U1 for t ∈ [t1, t2], where t2 = t1 + ε1. Continuing this process, we obtain infinite sequences t_n, v_n, δ_n, U_n, ε_n, and the trajectories y(t) and X(t) are defined for t ∈ [t0, t_n], n = 1, 2, . . .. By construction, t_n is a strictly increasing sequence. There are two possible cases. Case (i): t_n is unbounded. Then the construction of the trajectories y(t) and X(t) is complete. Case (ii): t_n is bounded. 
In the second case, the trajectories can be continued by the method of transfinite induction [35] (for various applications to the theory of dynamical games see [4, 6, 13, 34]), but instead we restrict ourselves to considering strategies for which inf_n δ_n > 0, inf_n ε_n > 0, which guarantees Case (i). At first glance such a condition narrows the capacity of the players, but in reality the winning strategies constructed below satisfy it. From now on we include this condition in the notion of admissible strategies. We call t_n the moments of choice and [t_n, t_{n+1}) the intervals of the prescriptions. Let us now set TX0,y0 (P, Q) = inf{t | ∃k : x_k(t; X0, y0, P, Q) = y(t; X0, y0, P, Q)}.
(1)
(Note that inf ∅ = +∞.) Function (1), together with the classes of admissible strategies of the players, defines for each pair of initial points X0, y0 the normal form (TX0,y0, P, Q) of the game according to J. von Neumann, which completes the formalization of the game. The resulting normal form corresponds to the Isaacs degree problem. Its solution, i.e. an equilibrium situation, can rarely be found.
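The play-generation procedure described above can be mimicked in a few lines (a schematic of our own: positions are scalars rather than points of the graph, and the strategies are plain callables returning the prescriptions of Definitions 1 and 2):

```python
def play(P, Q, X0, y0, horizon):
    """Alternate prescriptions (v, delta) of the evader and (U, eps),
    eps <= delta, of the pursuer; all points move linearly on each
    interval [t_n, t_{n+1}]."""
    t, X, y = 0.0, list(X0), y0
    while t < horizon:
        v, delta = Q(X, y)
        U, eps = P(X, y, v, delta)
        eps = min(eps, delta, horizon - t)  # truncate at the horizon
        X = [x + eps * u for x, u in zip(X, U)]
        y += eps * v
        t += eps
    return X, y

# one pursuer of speed 1 chasing an evader of speed 1/2 on a line
pursuer = lambda X, y, v, delta: ([1.0], delta)
evader = lambda X, y: (0.5, 0.1)
X, y = play(pursuer, evader, X0=[0.0], y0=1.0, horizon=2.0)
```

With a pursuer of speed 1 and an evader of speed 1/2 starting one unit apart, the gap 1 − t/2 closes exactly at the horizon t = 2.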
Replacing the payoff function TX0,y0 (P, Q) by its characteristic function

τX0,y0 (P, Q) = { 0, if TX0,y0 (P, Q) < ∞;  1, otherwise }
we get the normal form (τX0,y0, P, Q) of the game corresponding to the problem of kind, with only two outcomes, i.e. a game without draws. It can be reformulated in the form traditional for the control theory approach, according to which the problem of finding an equilibrium point is replaced by two problems. Let X0, y0 be fixed. Pursuit Problem Construct a strategy P̂ of the pursuer such that for any strategy Q of the evader ∃k, ∃t : x^[k](t; X0, y0, P̂, Q) = y(t; X0, y0, P̂, Q). The Evasion Problem (Evasion from Capture) Construct a strategy Q̂ of the evader such that for any strategy P of the pursuer ∀k, ∀t : x^[k](t; X0, y0, P, Q̂) ≠ y(t; X0, y0, P, Q̂). P̂ (respectively, Q̂) is called a strategy that guarantees the solvability of the pursuit (respectively, evasion) problem or, in short, a strategy that guarantees the pursuit (evasion). Obviously, for each fixed pair of initial points X0, y0, at most one of these problems is solvable: either the pursuit problem or the evasion problem. However, it does not logically follow from the definitions that one of them is always solvable. This phenomenon is based on the fact that the game is considered on the infinite time interval [0, +∞) [4]. If for every initial state one of these problems has a positive solution, then it is said that the Krasovskii alternative holds [23].
3 Main Result Let us return to the game on the simplex Γ = Σ^d. For this case, the alternative holds in a strengthened form. Note that we are considering a game with n pursuers. We call the initial state X0, y0 nontrivial if y0 is a vertex of Σ^d and x_k^0 ≠ y0, k = 1, 2, . . . , n. Theorem 2 Let ρ ≥ σ/2 and n ≥ [d/2] + 1. Then there exist a positive number T_d and a strategy P̂ in the game (Σ^d, ρ, σ) such that for any initial positions X0, y0 and any evader's strategy Q one has ∃k, ∃t : t ≤ T_d and x_k(t; X0, y0, P̂, Q) = y(t; X0, y0, P̂, Q).
Game with Slow Pursuers on the Edge Graphs of Regular Simplexes
Theorem 3 Let one of the following conditions be satisfied: (a) ρ ≤ (1/2)σ and n < [d/2] + 1, or (b) ρ < (1/2)σ and n ≤ [d/2] + 1. Then, in the game (Σd, ρ, σ) there exists a strategy Q̂ of the evader such that for any nontrivial initial positions X0, y0 and any strategy P of the pursuers we have

∀k, ∀t : xk(t; X0, y0, P, Q̂) ≠ y(t; X0, y0, P, Q̂).
3.1 Proof of Theorem 2

Proof In constructing the pursuit strategy we use the P-strategy, first introduced in [30] and since used by many researchers in simple-motion pursuit differential games. We apply induction on d.

I. Base of Induction If d = 1 or 2, then the conclusion of Theorem 2 is obvious; moreover, the numbers T1 = 1/ρ and T2 = 3/(2ρ) satisfy it. Let now d = 3, i.e. we have a regular tetrahedron ABCD. It suffices to prove the theorem for n = 2. Let us start constructing the required strategy P̂. This is done by specifying a vector u of prescriptions, and the choice of δ is always obvious: either δ = ε, or δ is equal to the time it takes for at least one of the pursuers to reach a vertex. Let us first bring the pursuing points to convenient positions (we call this stage the initial stage). Let x[1] ∈ AB at the initial time. The points x[1] and x[2] are prescribed to occupy the midpoint S of the edge AB and the vertex D, respectively. If this happens at some time t1, then it is easy to see that t1 ≤ 3/(2ρ). There are two types of location for the point y(t1).

Case 1: y(t1) lies on a side of the base ABC. Let y(t1) belong to the broken line SAC (Fig. 1) (the case where y(t1) belongs to the broken line SBC is analyzed similarly). Let the point x[1] move towards the vertex A. On some time intervals the point y can move along the edge AD. Therefore, at the time t2 = t1 + AS/ρ ≤ 2/ρ either x[1](t2) = A and y(t2) ∈ AD, in which case the point y finds itself between two pursuers on this edge, or a time t′2 (less than t2) occurs such that the segment with ends x[1](t′2) and y(t′2) becomes parallel to the bisector CS.
In the latter case, starting from the moment t′2, the point x[2] is prescribed to move towards the vertex C, and the point x[1] moves in accordance with the P-strategy [12, 30] (this is possible due to the condition ρ ≥ (1/2)σ, since the projection of the point y ∈ AC onto the segment AS moves at least twice as slowly as the point y itself; see [7] for details). Then, according to the main property of the P-strategy, the segment with ends x[1](t) and y(t) remains parallel to the bisector CS. Note that in this case, on some time intervals, the point y can move along the edge CD, and the P-strategy then becomes directly inapplicable. Therefore, we correct it as follows: if (α, β) is one of such intervals, then it is easy to see that x[1](α) = S, and we set x[1](t) = S for t ∈ [α, β].
A. Azamov et al.
Fig. 1 The point y(t1 ) is on the base ABC
Since the P-strategy will not allow the point y to penetrate the edges BD and AD, and x[2] will pursue it further, either x[1](t3) = y(t3) or x[2](t3) = y(t3) for some t3, t3 ≤ t2 + 2/ρ ≤ 4/ρ.

Case 2: The point y(t1) lies on one of the lateral edges AD, BD, CD. If y(t1) ∈ AD, then the situation essentially repeats the previous case: the point x[1] moves towards the vertex A, the point y is forced to go to the edge AC, and the situation y ∈ AC occurs. If y(t1) ∈ BD, then the same reasoning works. Finally, if y(t1) ∈ CD (see Fig. 2), then starting from the time t1 the point x[1] must move in accordance with the corrected P-strategy, as described in the previous case, and the pursuer x[2] should go to the vertex C. Then the point x[2] reaches the vertex C at the time t2 = t1 + 1/ρ. If y is not caught by the pursuer x[2] by the time t2, then y(t3) ∈ AC or y(t3) ∈ BC at some t3 > t2. For t > t2, let x[2] continue chasing y, moving towards the vertex A or B, respectively. This, in combination with the P-strategy for x[1], ensures the completion of the pursuit by the time T3 = 7/(2ρ). Thus, Theorem 2 is true for the cases d = 2 and 3.

II. Step of Induction Let the conclusion of the theorem be true for some d, d ≥ 2. Also, we assume that the initial stage is defined for this game. Let us show that

n = [(d + 2)/2] + 1 = [d/2] + 2
pursuing points are able to capture the evader in the game on the simplex Σ d+2 = A0 A1 . . . Ad+2 . Without loss of generality, we can assume x [n] (0) ∈ Ad+1 Ad+2 . The sub-simplex A0 A1 . . . Ad is called the base of the simplex, and the open (in the
Fig. 2 The point y(t1 ) is on a lateral edge of ABCD
Fig. 3 The initial stage on the simplex A0 A1 A2 A3 A4 A5
topology of Σd+2) subgraph Σd+2 \ Σd is called the superstructure. The point x[n] will stay on the superstructure until a certain time. First, we describe the initial stage of constructing the pursuers' strategies. Assign x[n] to occupy the midpoint R of the edge Ad+1Ad+2, and let the points x[1], x[2], . . . , x[n−1] go to the base and take there the corresponding positions prescribed, at the initial stage, by the strategy in the game with (n − 1) pursuers on the simplex Σd. In Fig. 3, A0A1A2A3 is the base of the simplex A0A1A2A3A4A5, where d = 3. This completes the initial stage in the game with n pursuers on the
simplex Σd+2. It is easy to verify that for the duration t1 of this stage we have the estimate t1 ≤ 3/(2ρ). If at the time t1 the point y is on the superstructure, then the pursuer x[n] is assigned to displace y from the edge [Ad+1, Ad+2] and then to follow the prescription: if y is on an edge with one end Ad+1 (respectively, with the end Ad+2) and the other end at the base, then x[n] must occupy the vertex Ad+1 (respectively, Ad+2). Let x[n] occupy Ad+1 (or Ad+2) at some time t2. If the point y is on the base, then x[n] waits at the point R. All this is possible since ρ ≥ σ/2. Thus, at the time t2, t2 ≤ t1 + 1/ρ ≤ 5/(2ρ), the inclusion y(t2) ∈ Σd+2 \ [Ad+1, Ad+2] holds. We now introduce the "shadow" point ỹ by the formula
ỹ = y, if y ∈ Σd,
ỹ = Al, if y ∈ (Al, Ad+1) ∪ (Al, Ad+2).
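For intuition only, the shadow map can be sketched in Python. The encoding is my own assumption: a point of the edge graph of Σd+2 is either a vertex ('vertex', i) or an interior edge point ('edge', i, j, s) with i < j and 0 < s < 1, where the indices 0, . . . , d label base vertices and d+1, d+2 the two apexes.

```python
def shadow(pos, d):
    """Shadow point of the paper: y-tilde = y if y lies on the base
    Sigma^d, and y-tilde = A_l if y lies on an open lateral edge
    (A_l, A_{d+1}) or (A_l, A_{d+2}).  Points of [A_{d+1}, A_{d+2}]
    are excluded, since after the time t2 the evader cannot be there."""
    if pos[0] == 'vertex':
        if pos[1] <= d:
            return pos                 # base vertex: shadow is y itself
        raise ValueError("y on [A_{d+1}, A_{d+2}] is excluded after t2")
    _, i, j, s = pos                   # interior point of edge (A_i, A_j)
    if j <= d:
        return pos                     # base edge: shadow is y itself
    if i <= d:
        return ('vertex', i)           # lateral edge: shadow is A_i
    raise ValueError("y on (A_{d+1}, A_{d+2}) is excluded after t2")
```

For d = 3, a point on the lateral edge (A0, A5) is mapped to the vertex A0, while any point of the base is left fixed.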
Thus, any trajectory of the point y generates a trajectory of the point ỹ and, moreover, ỹ remains on the base Σd all the time. Note that if ỹ ∈ (AiAj) for some i, j ∈ {0, 1, . . . , d}, then ỹ = y. By the induction hypothesis, the pursuing points x[1], x[2], . . . , x[n−1] are able to complete the capture of ỹ within the time Td. If xk(t∗) = ỹ(t∗) and ỹ(t∗) ∈ (AiAj) for some k ∈ {1, . . . , (n − 1)} and i, j ∈ {0, 1, . . . , d}, then, as noted above, ỹ(t∗) = y(t∗) and hence xk(t∗) = y(t∗). If ỹ(t∗) = Al for some l ∈ {0, 1, . . . , d}, then y(t∗) ∈ (Al, Ad+1) ∪ (Al, Ad+2). This means that the real evading point y is clamped on both sides by the points xk and x[n] and hence will be caught. For the capture time we obtain the estimate

Td+2 ≤ t2 + Td + 2/ρ ≤ Td + 9/(2ρ).
Considering separately the cases of even and odd d, based on the initial values T2 = 3/(2ρ) and T3 = 7/(2ρ), we obtain the formula

Td = (9d − 13)/(4ρ), if d is odd,
Td = (9d − 12)/(4ρ), if d is even.
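The even/odd closed form can be checked against the recursion Td = Td−2 + 9/(2ρ) with the bases T2 = 3/(2ρ) and T3 = 7/(2ρ). The following Python sketch is my own verification aid, not part of the proof; exact rational arithmetic avoids floating-point noise.

```python
from fractions import Fraction

def T_closed(d, rho):
    """Closed-form bound from the paper: (9d-13)/(4*rho) for odd d,
    (9d-12)/(4*rho) for even d."""
    num = 9 * d - 13 if d % 2 else 9 * d - 12
    return Fraction(num, 4) / rho

def T_recursive(d, rho):
    """Unfold T_d = T_{d-2} + 9/(2*rho), starting from
    T_2 = 3/(2*rho) (even d) or T_3 = 7/(2*rho) (odd d)."""
    base = 3 if d % 2 else 2
    T = Fraction(7 if d % 2 else 3, 2) / rho
    for _ in range((d - base) // 2):
        T += Fraction(9, 2) / rho
    return T

rho = Fraction(1, 2)
assert all(T_closed(d, rho) == T_recursive(d, rho) for d in range(2, 21))
```

For example, with ρ = 1/2 the sketch gives T2 = 3, T3 = 7, T4 = 12, matching the closed form.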
3.2 Proof of Theorem 3

Proof Let the point y(0) be at the vertex A0 at the initial time. By assumption, the initial state is nontrivial; therefore, all pursuers are at a positive distance from A0. Let r be the smallest of these distances. We put r0 = min{r, 1/3}. Assume that the
Fig. 4 The simplex Σ 4
evader remains in its place until at least one of the pursuers is at the distance r0 from the vertex A0 at some time t0 (it might be t0 = 0 as well). Let x[1] be such a pursuer (see Fig. 4, where d = 4). The point y can pass from the vertex A0 to any of the vertices A1, A2, . . . , Ad of the simplex Σd in the time 1/σ. Let condition (a) of the theorem be satisfied. Then, for the speeds of the pursuers x2, x3, . . . , xn, we have ρ ≤ (1/2)σ. Therefore, in the time 1/σ, each of the pursuing points x2, x3, . . . , xn can reach only one of the vertices A1, . . . , Ad of Σd if ρ < (1/2)σ, and in the case of equality ρ = σ/2 it can reach not more than two vertices. Since x1 cannot reach any of the vertices A1, . . . , Ad in the time 1/σ, and since by condition (a) of the theorem 2(n − 1) < d, the point y can reach at least one of the vertices A1, A2, . . . , Ad before any of the pursuing points. Let us now consider the case where condition (b) is satisfied. Thus, ρ < (1/2)σ. As we concluded above, each of the pursuing points can reach only one of the d vertices in the time 1/σ, and the evader can safely go to one of the remaining d − n ≥ 1 vertices. Thus, in both cases, the evading point can move from one vertex to another infinitely many times, avoiding capture. The proof of Theorem 3 is complete.
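The vertex-counting argument behind both cases can be captured in a few lines. The Python sketch below is my own illustration of the proof's bookkeeping, under the paper's assumptions: one pursuer starts near A0 and blocks no vertex, and every other pursuer reaches at most one vertex within the time 1/σ if ρ < σ/2, and at most two if ρ = σ/2.

```python
def evader_escapes(d, n, rho, sigma):
    """Counting argument of Theorem 3 (sketch): the evader sitting at a
    vertex of Sigma^d escapes to an adjacent vertex whenever the other
    pursuers can block strictly fewer than d of its d neighbours."""
    if rho > sigma / 2:
        return False          # outside the slow-pursuer regime of Theorem 3
    per_pursuer = 2 if rho == sigma / 2 else 1
    blocked = per_pursuer * (n - 1)   # pursuer x1 (near A_0) blocks nothing
    return blocked < d

# Condition (a): rho <= sigma/2 and n <  [d/2] + 1
# Condition (b): rho <  sigma/2 and n <= [d/2] + 1
```

With d = 5, n = 2, ρ = σ/2 (condition (a)) or d = 4, n = 3, ρ < σ/2 (condition (b)), fewer than d neighbouring vertices can be blocked, so the evader escapes.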
4 Conclusion

In the present paper, using the features of pursuit and evasion games on graphs, a convenient formalization of such games has been proposed by reducing them to the von Neumann normal form. The case where the maximum speed of the pursuing
points is less than the maximum speed of the evading point has been studied. The solutions of pursuit and evasion games of several pursuers and one evader on the simplex Σd have been obtained in explicit form. The strategies of the pursuers in the pursuit game and the strategy of the evader in the evasion game have been constructed explicitly. To construct the strategies of the pursuers, we have used ideas such as the strategy of parallel approach and the representation of the (d + 2)-dimensional simplex as a superstructure over the d-dimensional simplex, which can be useful in graph game theory.

Acknowledgments The present research was partially supported by the National Fundamental Research Grant Scheme FRGS of Malaysia FRGS/1/2017/STG06/UPM/02/9. Code Project 0101-17-1921FR.
References

1. Andreae, T.: Note on a pursuit game played on graphs. Discrete Appl. Math. 9(2), 111–115 (1984)
2. Andreae, T.: A search problem on graphs which generalizes some group testing problems with two defectives. Combinatorics of ordered sets (Oberwolfach, 1988). Discrete Math. 88(2–3), 121–127 (1991)
3. Andreae, T., Hartenstein, F., Wolter, A.: A two-person game on graphs where each player tries to encircle his opponent's men. Theoret. Comput. Sci. (Math Games) 215, 305–323 (1999)
4. Azamov, A.A.: On an alternative for pursuit-evasion games in an infinite time interval. PMM USSR 50(4), 428–432 (1986). (Prikl. Matem. Mechan. 26(4), 561–566 (1986))
5. Azamov, A.A.: Lower bound for the advantage coefficient in the graph search problem. Differential Equations 44(12), 1764–1767 (2008)
6. Azamov, A.A.: Foundations of the theory of discrete games. Niso Poligraf, Tashkent (2011)
7. Azamov, A.A., Ibaydullaev, T.: A pursuit-evasion differential game with slow pursuers on the edge graph of simplexes I. Math. Game Theory Appl. 12(4), 7–23 (2020). https://doi.org/10.17076/mgta_2020_4_23
8. Azamov, A.A., Iskanadjiev, I.M.: Pontryagin's alternating integral for differential inclusions with counteraction. Contrib. Game Theory Manag. V, 33–44 (2012)
9. Azamov, A.A., Kuchkarov, A.Sh., Holboyev, A.G.: The pursuit-evasion game on the 1-skeleton graph of the regular polyhedron. III. Mat. Teor. Igr Pril. 11(4), 5–23 (2019)
10. Azamov, A.A., Kuchkarov, A.Sh., Holboyev, A.G.: The pursuit-evasion game on the 1-skeleton graph of the regular polyhedron. I. Mat. Teor. Igr Pril. 7(3), 3–15 (2015)
11. Azamov, A.A., Kuchkarov, A.Sh., Holboyev, A.G.: The pursuit-evasion game on the 1-skeleton graph of the regular polyhedron. II. Mat. Teor. Igr Pril. 8(4), 3–13 (2016)
12. Azamov, A.A., Samatov, B.T.: The Π-strategy: analogies and applications. Contrib. Game Theory Manag. IV, 33–46 (2011)
13. Berge, C.: Colloque sur la Théorie des Jeux (French). Held in Brussels, 29–30 May 1975. Cahiers Centre Études Recherche Opér. 18(1–2), 1–253. Institut de Statistique, Université Libre de Bruxelles, Brussels (1976)
14. Bonato, A., Nowakowski, R.J.: The Game of Cops and Robbers on Graphs. Student Mathematical Library, vol. 61. American Mathematical Society, Providence (2011)
15. Bonato, A., Golovach, P., Hahn, G., Kratochvíl, J.: The capture time of a graph. Discrete Math. 309(18), 5588–5595 (2009)
16. Bulgakova, M.A., Petrosyan, L.A.: Multistage games with pairwise interactions on full graph. Mat. Teor. Igr Pril. 11(1), 3–20 (2019)
17. Fomin, F.V., Thilikos, D.M.: An annotated bibliography on guaranteed graph searching. Theoret. Comput. Sci. 399, 236–245 (2008)
18. Friedman, A.: Differential Games. Wiley, New York, 350 p. (1971)
19. Gavenčiak, T.: Cop-win graphs with maximum capture-time. Discrete Math. 310(10–11), 1557–1563 (2010)
20. Golovach, P.A., Petrov, N.N., Fomin, F.V.: Search in graphs. Proc. Steklov Inst. Math. Control Dynamic Syst. suppl. 1, 90–103 (2000)
21. Ibragimov, G.I., Luckraz, Sh.: On a characterization of evasion strategies for pursuit-evasion games on graphs. J. Optim. Theory Appl. 175, 590–596 (2017). https://doi.org/10.1007/s10957-017-1155-7
22. Isaacs, R.: Differential Games. John Wiley & Sons, New York (1965)
23. Krasovskii, N.N.: Theory of Control of Motion: Linear Systems. Nauka, Moscow (1968)
24. Krasovskii, N.N., Subbotin, A.I.: Game-Theoretical Control Problems. Springer, New York (1988)
25. Kummer, B.: Spiele auf Graphen (German). International Series of Numerical Mathematics, vol. 44. Birkhäuser Verlag, Basel, 94 p. (1980)
26. Littlewood, J.E.: Littlewood's Miscellany (Bollobás, B., ed.). Cambridge University Press, Cambridge (1986)
27. Mamatov, A.R.: An algorithm for solving a two-person game with information transfer. Comput. Math. Math. Phys. 50(6), 1699–1704 (2010)
28. Mamatov, A.R.: Algorithm for solving one game of two persons with the transfer of information. J. Comput. Math. Math. Phys. 46(10), 1784–1789 (2006)
29. Nowakowski, R.J.: Unsolved problems in combinatorial games. Games of No Chance 5, pp. 125–168. Math. Sci. Res. Inst. Publ., vol. 70. Cambridge University Press, Cambridge (2019)
30. Petrosyan, L.A.: On a family of games of survival. Soviet Math. Doklady 161(1), 52–54 (1965)
31. Petrosyan, L.A.: Differential Games of Pursuit. World Scientific, Singapore (1993)
32. Petrosyan, L.A., Sedak, A.A.: Multi-step network game with full information. Math. Theory Games Appl. 1(2), 66–81 (2009)
33. Pontryagin, L.S.: Selected Works. Nauka, Moscow (1988)
34. Pshenichny, B.N., Ostapenko, V.V.: Differential Games. Naukova Dumka, 262 p. (1992)
35. Sierpinski, W.: Cardinal and Ordinal Numbers. Polish Scientific Publishers, Warsaw, 491 p. (1965)
Are Retailers’ Private Labels Always Detrimental to National Brand Manufacturers? A Differential Game Perspective Alessandra Buratto and Sihem Taboubi
Abstract We study the competition between national and private brands (or private labels) in a vertical channel structure. Our main objective is to analyze the impacts of the private label’s existence on strategies, sales, and profits of the members and the whole channel. We use a differential game, where the control variables are price and non-price marketing decisions, and investigate two scenarios. The first one, used as a benchmark, considers an exclusive retailer that distributes only a national brand provided by a manufacturer. The latter invests in national advertising to build its brand’s goodwill. In the second scenario, the retailer owns a private label that competes with the national brand. By computing the results under both scenarios, we provide answers to the following research questions: (1) What should the price and the non-price marketing strategies be, with and without the private label? (2) How do they compare? (3) Is the presence of a private label always profitable for the retailer and harmful to the manufacturer? One of our main results indicates that the manufacturer is not necessarily always hurt by the private label, as the existing literature suggests. Keywords Differential games · Private labels · Channels · Feedback Stackelberg equilibrium
1 Introduction According to the Private Label Manufacturers Association (PLMA), the most recent statistics in the retail industry indicate that for every four products sold on the US market, one product is a private label (PL). The results of their nationwide survey A. Buratto () Department of Mathematics “Tullio Levi-Civita”, University of Padova, Padova, Italy e-mail: [email protected] S. Taboubi GERAD and Marketing Department HEC Montreal, Montreal, QC, Canada e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 L. A. Petrosyan et al. (eds.), Frontiers of Dynamic Games, Trends in Mathematics, https://doi.org/10.1007/978-3-030-93616-7_3
in 2019 shows that this proportion is expected to increase in the future. Indeed, two thirds of the respondents evaluated the quality of PLs as "as good if not better than the national brand version of the same product," and 25% of them plan to buy more PLs in the year ahead. Another report, published by Nielsen the same year, estimates the annual sales of PLs at about $180 billion, with a market share that increases at a faster rate than that of national brands.1 In such a context, it is important for national brand manufacturers, which formerly dealt with exclusive retailers or competed only with other national brands, to be prepared to face the competition driven by retailers' private labels. These could be a serious threat to national brand manufacturers for various reasons. First, because the retail price of PLs is often lower than that of national brands.2 Second, since retailers control the shelf space in their stores, they can decide to allocate less space to national brands in order to give their store brand more visibility. Finally, retailers may be tempted to put more effort into local promotional activities to boost their own brand instead of promoting the national brand. How manufacturers and retailers set their prices and the levels of the other marketing decision variables, depending on whether or not the retailer owns a store brand, is a very important question in this context, particularly because the presence of price and non-price variables allows channel members to use more than one "instrument" to react to or to influence the other's decisions and outcomes. This paper explores this issue in a dynamic setting. It examines the price and non-price marketing decisions (e.g., national and local advertising efforts) taken by national brand manufacturers and by retailers, depending on whether or not the latter own a PL. Our main objective is to provide answers to the following questions: (1) What should the prices and the non-price marketing efforts be?
(2) How do channel members' strategies, profits, and sales for both products compare when the retailer offers (or not) a PL? (3) Is the presence of a PL always profitable for the retailer and harmful to the manufacturer? To the best of our knowledge, most of the game-theoretical literature that models the interactions between national brand manufacturers and retailers selling PLs is static, and in most cases price and non-price marketing decisions are investigated separately. In particular, no study considers a dynamic setting where a manufacturer fixes the transfer price and invests in building its brand's goodwill while, at the same time, a retailer controls the retail price and the local promotional activities of both brands. Only two studies simultaneously examine price and non-price decisions, but both of them consider only the scenario where a national brand and a PL compete in the market, and they focus on issues related to the type of advertising between brands (i.e., competitive or informative advertising). Hence, the results obtained in these studies do not provide answers to the research questions (1)–(3) that we address in this study.
1. PLMA webpage (https://www.plma.com), retrieved March 29th, 2021.
2. According to the PLMA, audits comparing the prices of PLs vs national brands for 100 market basket items revealed that consumers saved an average of 72 cents per item (about $32 billion on groceries) by buying PLs instead of national brands.
The paper proceeds as follows. In the next section, we provide a brief review of the literature that has studied marketing channels, and where the effects of private labels are analyzed. In Sect. 3, we introduce our model and the scenarios investigated. In Sect. 4, we compute the equilibria and compare the results. Section 5 concludes.
2 Literature Review The existing literature investigating the impacts of PLs on the profits and strategies of vertical channel members can be classified into two broad lines of research: (a) studies that developed empirical models and where scanner data were used for validation purposes; and (b) theoretical models adopting a game-theoretical framework to model this issue. These models are mainly static, but recent research has extended the study of this topic to a dynamic framework in order to account for the carryover effects of some marketing decisions and the multiple interactions taking place in the marketing channel.
2.1 Empirical Models Most of the papers in this line of research used scanner data from large supermarket chains reporting sales, prices, marketing activities, and/or market shares for one or multiple product categories where a retailer had introduced its own PL. In [19], the author finds that retailer prices in the market increase as the number of branded products increases. However, when the new brand introduced into the market is a private label, the retail prices of the national brands decrease. In [4], the authors found mixed results among the different product categories, meaning that prices of the national brands could increase or decrease when facing the additional competition of a PL. In [6] are tested the effects of private label introduction on transfer prices and margins in two product categories. In one of the categories, the PL competes only with one national brand, while the second product category is also characterized by competition among several national brands. They found that the transfer prices of the existing firms decreased for both cases. Furthermore, they observed that the increase in the retailer’s margin for the national brand coincided with the introduction of the PL into the market. The link between both observations allowed the authors to conclude that PLs shift the bargaining power from manufacturers to retailers, since the competition introduced by the PL pushes the manufacturers to reduce their transfer prices. The impact of PL’s introduction on the shift of power from the manufacturer to the retailer was also confirmed for the majority of the products examined in [7]. Also [6] analyzed the impacts of PL introduction on the non-price marketing variables. They reported that the retailer’s promotional
spending for the national brand was cut by almost 50%, while promotional spending for the PL was fixed at a higher level (w.r.t. the national brand). According to the authors, the increase in the retailer's margin and the decrease in the national brand's promotional spending confirm the change in the nature of the interactions between channel members, with the benefits going to the retailer. Is the presence of a PL always harmful for the manufacturer? Pauwels and Srinivasan [18] is one of the empirical studies that investigated this question. The authors found that manufacturers can benefit from the introduction of a PL. This result contradicts most of the findings in the literature using analytical models to examine this topic. According to Ru et al. [21], the discrepancies between the empirical and the analytical results on this issue can be explained by the data used in the empirical studies. Indeed, the data used in these studies are often obtained from large retail chains. Retailers in this context are more powerful than manufacturers, while in the game-theoretical models the hypothesis adopted in most of the studies considers the manufacturer as the Stackelberg leader.
2.2 Game-Theoretical Models The game-theoretic literature on PLs is mainly made up of papers adopting a static setting. In [20] it is examined a situation where a PL is launched in a market where two incumbent national brands compete. They used a model where they considered prices (i.e., wholesale and retail) as the sole decision variables and tested their results with scanner data. They demonstrated that a higher cross-price elasticity between the national brands and the store brand increases the retailer’s profit. This result is attributed to the fact that higher levels of competition lead to high sales of the PL, but also to an increase in its retail price, which still remains lower than the prices of the national brands. They also observed that the sales of the national brands decrease with the presence of the PL. Hence, the increase in the retailer’s profit for the whole product category is mainly driven by the introduction of the PL, since the authors found that the retailer’s profit from selling the national brands decreases under this condition. Another interesting result in [20] states that, when only two national brands are sold at the retailer’s store, the retailer is only interested in introducing its PL if the competition between the existing national brands is low. A very important remark can be added about the existing game theoretic literature on the effects of PLs on the strategies of firms in the distribution channel. This literature often considers the decision for the retailer to introduce or not its PL as endogenous. This means that the retailer, considered in most of the cases as a follower, observes the decisions of the manufacturer, and decides consequently whether or not to launch its brand. Hence, the PL becomes a strategic instrument for the retailer. 
Consequently, the manufacturer fixes its strategies by considering the launch of a store brand as a threat and, in some situations, could try to counterbalance it by offering some incentives. Karray and Zaccour [12] demonstrated, for example, that a cooperative advertising program mitigates the
negative effects of launching a PL. In [9], the authors examined the case where a national brand manufacturer could choose to distribute its product through a channel composed of a unique retailer or of two competing retailers. In the latter situation, the manufacturer could fix the same or different wholesale prices for the two retailers (i.e., uniform versus flexible wholesale pricing). They demonstrated that retailers are less interested in launching a PL (1) when there is no competition at their level, or (2) when the manufacturer chooses to implement a uniform pricing strategy among its competing retailers. Mills [13] argued that the introduction of a PL is used by the retailer to counterbalance the inefficiencies of decentralized channels. Indeed, the PL allows the retailer to capture a portion of the channel profit that would be lost otherwise, because the competitive effect caused by the PL's presence leads to a drop in retail prices and, hence, mitigates the double-marginalization problem. According to the author, PLs not only increase the channel's total profit, but also help the retailer to strengthen its channel power and to increase its profits, to the detriment of the manufacturer. Indeed, the manufacturer may be tempted to fix higher transfer prices if the national brand is the only product sold in the market. With the threat of PLs, the manufacturer is pressured to reduce the transfer price, since the introduction of a PL is only viable for the retailer when the national product is so expensive that the margin the retailer gets from selling it becomes too low. Under such circumstances, the retailer might be motivated to introduce another brand in order to gain a higher margin. In such a case, launching a PL rather than introducing a second national brand allows the retailer to better reach this objective.
According to Mills [13], the retailer's increase in profit surpasses the manufacturer's profit loss, and the consumer surplus is higher when a PL is available in a product category. Additional evidence of the positive effect of PL introduction on the retailer's profit and negotiation power is provided in [16], where it is also found that PLs lower the manufacturer's profit and increase the consumer surplus. The authors proved that retailers benefit from a lower transfer price when they introduce a PL, even in the situation where multiple national brands compete at the store level. In [21], the authors questioned the hypothesis on channel power by investigating the case where the retailer is the channel leader. They found that under a retailer-led Stackelberg game, the manufacturer could benefit from the introduction of a PL. This result is explained by the fact that the presence of the PL changes the strategic interaction between the manufacturer and the retailer, which switches from strategic substitutability to strategic independence when the PL is introduced. In [23], the authors also introduced the advertising effect by considering that the main difference between national brands and PLs is the goodwill (i.e., brand equity or image) that national brand manufacturers build through advertising. Hence, advertising not only leads to an increase in consumers' willingness to pay, but also creates heterogeneity by distinguishing between consumers that are brand seekers and those that are product seekers. The retailer takes advantage of this psychologically based differentiation by offering a PL as an alternative to the highly advertised national brand. This allows it to capture the portion of the market that has a lower willingness to pay. The authors predict an increase in the average price of
a product category that includes a PL when the portion of product seekers is small and the cost of advertising is high. Karray and Zaccour [12] were among the first to bring the topic of PLs into the dynamic games literature. They considered an infinite-horizon game taking place in a bilateral monopoly where the manufacturer's national advertising builds the brand's goodwill. The authors examined a setting similar to the one we are studying in this paper, but considered constant margins. In their study, the retailer controls the promotional efforts for the national and store brands; these efforts aim at increasing the sales of both brands. The results in [12] confirm that the presence of a PL reduces the manufacturer's profits, to the retailer's benefit. Furthermore, the authors found that the presence of a PL has no effect on the manufacturer's national advertising investment, but does affect the retailer's local advertising efforts for both brands: the retailer systematically reduces its effort to promote the national brand and can even stop promoting it whenever its marginal revenues are found to be lower than the marginal revenues driven by the PL. On the other hand, as long as the unit margin of the PL is higher than that of the national brand, the retailer will always promote its PL. The authors suggest, as an extension of their model, introducing prices as endogenous control variables. Our paper answers this call by simultaneously considering price and non-price marketing decisions in a setting similar to the one used in [12]. In the literature, price and non-price marketing variables have seldom been introduced in the same model, despite the fact that they often interact with each other and have different impacts on the optimal strategies of the channel members.
The main arguments given by the authors for this have been that they wanted to focus on the effect of only one of these variables or wanted to keep their model simple in order to obtain analytical results.3 The only exceptions are [1, 2, 10] and [11]: they investigate price and advertising decisions in a channel where a PL and a national brand are sold. Surprisingly, [1, 2] and [10] do not answer the questions of how prices, efforts, and individual and total channel profits compare when the retailer owns (or not) a PL. Indeed, in [1, 2], the authors investigated the pricing and advertising strategies when the retailer offers both a PL and a national brand. While [2] computed Feedback Nash equilibria, the information structure in [1] is feedback Stackelberg. In both studies, the focus of the authors is to characterize channel members’ strategies for the particular case where each member’s advertising efforts hurt the goodwill stock of the other. Karray and Martín-Herrán in [10] also examined the interactions between pricing and advertising decisions of channel members when a PL competes with a national brand. They considered two situations in which the advertising for one brand could have either a competitive or complementary (i.e., informative) impact on the other brand’s sales. One of the main results in their study states that the relationship between advertising and pricing strategies depends on the nature of the advertising effect. In particular, when the retailer’s advertising negatively affects the national
3 The introduction of additional variables is often a challenge for model tractability.
Are Private Labels Always Detrimental to National Brand Manufacturers?
brand's sales, the manufacturer reacts to the increase in the retailer's advertising for the store brand by lowering the advertising effort and transfer price for its national brand. The retailer's reaction to an increase in the manufacturer's advertising effort when the latter is competitive depends on the price competition level and the strength of the competitive advertising effects. When advertising contributes to an increase in both brands' sales (i.e., is informative), prices and advertising efforts are set at higher levels for the national and the store brands, and revenues increase for both channel members. Although interesting, these results are obtained only under the scenario where both the PL and the national brand compete in the market. Hence, they do not indicate how, in this scenario, prices, advertising, or profits compare to their respective values when the national brand does not face competition from a PL. One of the few studies that investigated these questions in a setting where the retailer and the manufacturer control price and non-price marketing variables is [11]. The authors examined various scenarios where the manufacturer and the retailer fix their strategies either simultaneously or sequentially, the sequence being either to fix prices first and then the advertising levels, or to fix the advertising levels first and then the pricing strategies. The timing of decisions is a strategic tool that can be used to deter the retailer from launching a PL, since the profitability of introducing a PL for each channel member is affected by the sequence of these decisions. As in the previous studies examining the impact of PL introduction on channel members' profits, they found that the manufacturer always incurs losses when a PL is introduced by the retailer, regardless of the timing of decisions. On the other hand, the presence of the PL is not necessarily beneficial to the retailer.
The main difference between our study and [11] is that we consider that, under the scenario where the retailer owns a store brand, this decision is made regardless of the interactions that the retailer has with the manufacturer. In other words, the launch of a PL is not a strategic decision in our setting. Furthermore, our study examines the impact of the PL's presence by considering a dynamic model where the marketing decisions have immediate and long-term effects on sales and on the brand's goodwill.
3 The Model

We consider a bilateral monopoly where the manufacturer (M) is the channel leader,4 and we investigate two scenarios. In the first scenario, the retailer (R) distributes only the manufacturer's national brand, denoted by N. In the second scenario, we add a second brand, owned by the retailer as a PL and denoted by the subscript P. We follow the literature and consider that the PL manufacturer is a non-strategic
4 This is a standard hypothesis in the game-theoretic channel literature. Amrouche et al. in [1] give a list of papers where this hypothesis is used.
player, and denote by cP > 0 and cN > 0 the production cost of the PL and of the national brand, respectively.5

Table of Notation
aN(t) — National advertising effort
aNR(t) — Local advertising effort for the national brand
aPR(t) — Local advertising effort for the PL brand
cN — Production cost of the national brand
cP — Production cost of the PL brand
δ — Goodwill decay rate
G0 — Initial national brand goodwill
G(t) — National brand's goodwill
pN(t) — Retail price of the national brand
pP(t) — Retail price of the PL brand
w(t) — Transfer price
The manufacturer controls the transfer price w(t) and the national advertising investment aN(t) for the national brand. National advertising encompasses all advertising efforts at the national level (e.g., TV and online advertising). These investments have long-term effects since they contribute to the building of the brand's goodwill. The latter is considered as the state variable and is denoted by G(t). It evolves according to the [17] dynamics given by

Ġ(t) = aN(t) − δG(t),   G(0) = G0 ≥ 0,   (1)
where the parameter δ > 0 is the goodwill decay rate.
In both scenarios, we consider that the retailer controls retail prices and local advertising for the products carried at its store. We denote by pi(t) and aiR(t) the retail price and the local advertising for brand i, where i ∈ {N, P}. Local advertising, as opposed to national advertising, encompasses several promotional activities in the retail store, devoted to increasing the sales of the brand at the store level. These hypotheses on the effects of local and national advertising are plausible, since retailers are often interested in commercial activities designed to increase traffic and sales for the whole product category sold in their stores, while the responsibility of investing in the brand's goodwill is often left to the manufacturers. Another argument justifying these hypotheses is that retailers carry various brands and cannot invest in national advertising for all of them, unless they benefit from incentives offered by manufacturers (e.g., cooperative advertising programs). The local and national advertising costs for both products are given by the following expression:

C(aj) = (1/2)(aj)²,   j ∈ {N, NR, PR}.   (2)

5 The retailer bears this cost when it offers the PL.
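As an aside (not part of the paper), the long-run behavior implied by the goodwill dynamics (1) is easy to illustrate numerically: under a constant advertising effort aN, the stock converges to the steady state aN/δ. A minimal Python sketch, with illustrative parameter values rather than the paper's benchmark:

```python
# Euler simulation of the goodwill dynamics (1): G'(t) = a_N - delta*G(t).
# Parameter values here are illustrative, not the paper's benchmark.
delta, a_N, G0 = 0.1, 0.5, 0.0   # decay rate, constant national advertising, initial stock
dt, T = 0.01, 200.0              # time step and horizon

G = G0
for _ in range(int(T / dt)):
    G += (a_N - delta * G) * dt  # G(t + dt) ~ G(t) + G'(t) * dt

# Under constant advertising, G(t) converges to a_N / delta = 5.0.
print(round(G, 6))
```

The simulated stock approaches aN/δ regardless of G0, which is why the papers in this stream focus on steady-state comparisons of the goodwill-dependent strategies.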
Under scenario I, when only the national brand is sold in the marketplace, we use the following expression for the demand function:

Q_N^I(t) = α − βpN(t) + θaNR(t) + ρG(t).   (3)

When the retailer introduces its private brand (i.e., scenario II), we use the following demand functions:6

Q_N^II(t) = α − βpN(t) + γpP(t) + θaNR(t) − τaPR(t) + ρG(t),   (4)

Q_P(t) = α − βpP(t) + γpN(t) + θaPR(t) − τaNR(t).   (5)

Here α, β, γ, θ, ρ, and τ are all positive parameters: α denotes the market potential for the product category; β is the impact of a brand's retail price on its own sales; and γ captures the cross-price effect (i.e., price competition). The parameter θ represents the impact of the retailer's local advertising for each brand on the sales of that brand, while τ captures the impact of the retailer's local advertising for one brand on the sales of the other (i.e., advertising competition). A standard hypothesis in the economic literature states that a product's demand is more sensitive to its own price and advertising effort than to those of its competitor. Hence, the model's parameters compare as follows:

β > γ,   θ > τ.   (6)
Three remarks can be added about Eqs. (4) and (5). First, for simplicity, we consider that the direct and cross-effects of prices and local advertising on demand are the same for both brands.7 Second, we consider that the retailer does not invest in national advertising activities to build goodwill for its private brand. Third, we add the term ρG(t) to the national brand's demand specification in Eq. (4) in order to give the national brand higher baseline sales with respect to the PL. This captures the fact that consumers always prefer buying the well-known national brand instead of the PL whenever the retailer decides to sell both brands at the same price and to spend the same effort on local advertising.8
6 The superscripts (I, II) in the expressions of QN refer to scenarios I and II.
7 Note that in the literature, direct and cross-price effects among brands can be taken either as similar or as different. Raju et al. [20], Sayman et al. [22], and Amrouche et al. [1] are some examples of studies where these parameters are considered symmetric, as we do in our study. By using this hypothesis, we focus on a situation where the PL is positioned very close to the national brand with respect to price and perceived quality. This could be the case for premium PLs. This hypothesis should not have an impact on our qualitative results or on the structure of the game we study. It can easily be relaxed, but at the cost of very long analytical expressions.
8 In [16], the authors attribute the fact that consumers often prefer national brands to PLs to several factors: one of them is the superior perceived quality of national brands w.r.t. PLs; another is the investment made by branded-product manufacturers in image-building advertising.
We consider that both the manufacturer's and the retailer's objective under the two scenarios is the maximization of their discounted profit streams over an infinite horizon, assuming a common discount rate r > 0. In what follows, we will omit the time argument when no confusion may arise. The manufacturer's and retailer's optimization problems in the two scenarios read as follows:

• Scenario I:

max_{w, aN} J_M^I = ∫_0^{+∞} e^{−rt} [ (w − cN) Q_N^I − (1/2)(aN)² ] dt,

max_{pN, aNR} J_R^I = ∫_0^{+∞} e^{−rt} [ (pN − w) Q_N^I − (1/2)(aNR)² ] dt,

where Q_N^I is given by (3).

• Scenario II:

max_{w, aN} J_M^II = ∫_0^{+∞} e^{−rt} [ (w − cN) Q_N^II − (1/2)(aN)² ] dt,

max_{pN, pP, aNR, aPR} J_R^II = ∫_0^{+∞} e^{−rt} [ (pN − w) Q_N^II + (pP − cP) Q_P − (1/2)(aNR)² − (1/2)(aPR)² ] dt,

where Q_N^II and Q_P are given by (4) and (5), respectively.
4 The Results

In this section, we compute the equilibria under scenarios I and II and compare the results in order to answer the research questions addressed in this study. In both scenarios, we consider a feedback information structure. Hence, we look for prices and advertising strategies that depend on the state variable (i.e., the goodwill stock). We consider the retailer to be the follower; therefore, in the following subsections, we first compute its best reaction function, and then the feedback Stackelberg equilibrium strategies of both players.
4.1 Scenario I

The first-order optimality conditions (FOC) of the HJB equation associated with the follower's optimization problem (see Eq. (14) in Appendix 1) are

α − 2βpN + θaNR + ρG + βw = 0,   (7)

θ(pN − w) − aNR = 0.   (8)

Note that the second equation gives the following equality: aNR = θ(pN − w). This result indicates that the retailer will advertise the national brand locally9 only if the unit margin that it gets from selling this brand is positive (pN − w > 0). Otherwise, the retailer won't spend any local advertising effort to promote the national brand. By solving the system of Eqs. (7) and (8), we obtain the following expressions for the retailer's reaction functions:

p̄N(w, G) = (ρG + (β − θ²)w + α) / (2β − θ²),   āNR(w, G) = θ(α − βw + ρG) / (2β − θ²),   (9)

where the second-order optimality condition requires that 2β − θ² > 0.
One interesting piece of information that can be obtained from the first reaction function is the type of strategic interaction linking the manufacturer's and the retailer's pricing strategies. According to [15], channel members' pricing strategies may be substitutes, complements, or independent, depending on the sign of the derivative ∂pN/∂w. The first reaction function implies that ∂p̄N/∂w = (β − θ²)/(2β − θ²). Since this quantity could be positive, negative, or null, we can conclude that, depending on how the price and local advertising effects on demand compare, each channel member may or may not modify its pricing strategy in the same or in the opposite direction as the price variations of the other channel member. The second expression in (9) clearly indicates that the retailer decreases its local advertising investment in promoting the national brand whenever the manufacturer decides to increase the national brand's transfer price: ∂āNR/∂w = −βθ/(2β − θ²) < 0.

9 Here pN − w > 0; thus the solution is interior.
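The reaction functions (9) can be recovered mechanically from the first-order conditions (7) and (8). A small SymPy verification sketch (ours, not part of the paper):

```python
import sympy as sp

alpha, beta, theta, rho, G, w = sp.symbols('alpha beta theta rho G w', positive=True)
pN, aNR = sp.symbols('p_N a_NR')

foc_price = alpha - 2*beta*pN + theta*aNR + rho*G + beta*w   # Eq. (7)
foc_adv   = theta*(pN - w) - aNR                             # Eq. (8)

sol = sp.solve([foc_price, foc_adv], [pN, aNR], dict=True)[0]

# Expected reaction functions, Eq. (9):
pN_bar  = (rho*G + (beta - theta**2)*w + alpha) / (2*beta - theta**2)
aNR_bar = theta*(alpha - beta*w + rho*G) / (2*beta - theta**2)

print(sp.simplify(sol[pN] - pN_bar))     # 0
print(sp.simplify(sol[aNR] - aNR_bar))   # 0
```

Both differences simplify to zero, confirming that the linear system (7)-(8) yields exactly the pair in (9).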
Proposition 1 Assuming an interior solution, the manufacturer's and the retailer's pricing and advertising strategies at equilibrium under scenario I are given by the following expressions:

w^I(G) = (ρG + α + βcN) / (2β),   a_N^I(G) = M1 G + M2,

p_N^I(G) = [ρ(3β − θ²)G + β(3α + cN(β − θ²)) − αθ²] / [2β(2β − θ²)],

a_NR^I(G) = θ(ρG + α − βcN) / [2(2β − θ²)].

The channel members' value functions are

V_M^I(G) = (1/2)M1 G² + M2 G + M3,   V_R^I(G) = (1/2)R1 G² + R2 G + R3,   (10)

where the expressions of M1, M2, M3, R1, R2, and R3 are given in Appendix 1.

Proof See Appendix 1.

Proposition 1 indicates that both channel members' strategies are linear in the state variable and that the value functions are quadratic. These results are expected because of the structure of the game. Moreover, it is easy to observe that the national brand's transfer price increases when the goodwill level increases. This means that a manufacturer selling a brand with a high goodwill will charge a higher transfer price with respect to the price it would charge if its brand image were low. The expressions of p_N^I(G) and a_NR^I(G) indicate that the national brand's retail price and the retailer's local advertising effort to promote the national brand both increase with the level of the brand's goodwill.
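As a numerical sanity check (ours, not in the paper), the closed-form strategies of Proposition 1 can be evaluated at the benchmark parameters of Sect. 4.3.1 and verified against the retailer's reaction functions (9); the goodwill level G = 1 is an illustrative choice:

```python
# Scenario-I equilibrium strategies of Proposition 1, evaluated at the benchmark
# parameters of Sect. 4.3.1 (G = 1 is an illustrative goodwill level).
alpha, beta, theta, rho, cN = 4.0, 1.0, 1.0, 0.01, 0.01
G = 1.0

wI   = (rho*G + alpha + beta*cN) / (2*beta)
pNI  = (rho*(3*beta - theta**2)*G + beta*(3*alpha + cN*(beta - theta**2))
        - alpha*theta**2) / (2*beta*(2*beta - theta**2))
aNRI = theta*(rho*G + alpha - beta*cN) / (2*(2*beta - theta**2))

# The equilibrium must lie on the retailer's reaction functions (9):
pN_bar  = (rho*G + (beta - theta**2)*wI + alpha) / (2*beta - theta**2)
aNR_bar = theta*(alpha - beta*wI + rho*G) / (2*beta - theta**2)

print(round(wI, 4), round(pNI, 4), round(aNRI, 4))   # 2.01 4.01 2.0
```

Substituting w^I(G) back into the reaction functions (9) reproduces p_N^I(G) and a_NR^I(G), so the leader's strategy and the follower's response are mutually consistent.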
4.2 Scenario II

In scenario II, we compute the channel members' strategies and profits in the situation where the retailer offers a PL on top of selling the national brand. The results are obtained by following the same steps as in the previous scenario. An interesting result to be reported here relates to the conditions for the retailer to allocate local advertising effort to both brands (i.e., the national brand and the PL), and the condition under which the retailer might allocate more (or less) local advertising effort to promote one brand at the expense of the other. Indeed, by computing the partial derivatives of the retailer's HJB equation (see Eq. (15) in Appendix 2)
with respect to aNR and aPR and equating them to zero, we obtain

āNR = θ(pN − w) − τ(pP − cP),   (11)

āPR = θ(pP − cP) − τ(pN − w).   (12)
This result indicates that the retailer will allocate local advertising effort to a brand (aNR > 0 or aPR > 0) only if the marginal revenue resulting from promoting it surpasses the marginal loss from promoting the other brand. A similar condition is obtained in [12] for the case where channel members make only advertising decisions while margins are constant. Furthermore, by computing the difference between (11) and (12),

āNR − āPR = (θ + τ)((pN − w) − (pP − cP)),

we can conclude that the retailer will always allocate more (less) local advertising effort to promote the brand with the highest (lowest) profit margin.
The results under scenario II are given in the following proposition.

Proposition 2 Assuming an interior solution, the manufacturer's and the retailer's strategies at equilibrium under scenario II are given by the following expressions:

w^II(G) = (−ρY G + T) / (2U),   a_N^II(G) = N1 G + N2,

a_NR^II(G) = (C1 G + D1) / (2SU),   p_P(G) = (C2 G + D2) / (2SU),

a_PR(G) = (C3 G + D3) / (2SU),   p_N^II(G) = (C4 G + D4) / (2SU),

where C1, C2, C3, C4, D1, D2, D3, and D4 are parameters given in Appendix 2, and

Y = 2γθτ − (U + γ²(θ² + τ²))/β,
T = αK + cN U + 2cP Z,
U = 2β³ − β²(θ² + τ²) − 2βγ(γ − 2θτ) − γ²(θ² + τ²) > 0,
S = 4β² − 4β(θ² + τ²) − 4γ² + 8γθτ + (θ² − τ²)² > 0.

For the parameters K and Z, see Appendix 2. The channel members' value functions are

V_M^II(G) = (1/2)N1 G² + N2 G + N3,   V_R^II(G) = (1/2)Z1 G² + Z2 G + Z3,   (13)

where the expressions of the parameters N1, N2, and N3 are given in Appendix 2.10

10 The coefficients Z1, Z2, and Z3 have very long expressions. We do not write them explicitly, since they do not affect the optimal strategies. We need their values in the numerical section to compute the optimal profits through the value functions. In any case, the Mathematica code for generating the results is available from the authors upon request.
Proof See Appendix 2.
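A quick consistency check on the constants of Proposition 2 (our sketch, not part of the paper): when γ = τ = 0, scenario II should collapse to scenario I, and indeed U reduces to β²(2β − θ²) and S to (2β − θ²)², i.e., multiples of the scenario-I second-order condition. In SymPy:

```python
import sympy as sp

beta, gamma, theta, tau = sp.symbols('beta gamma theta tau', positive=True)

# U and S as given in Proposition 2
U = (2*beta**3 - beta**2*(theta**2 + tau**2)
     - 2*beta*gamma*(gamma - 2*theta*tau) - gamma**2*(theta**2 + tau**2))
S = (4*beta**2 - 4*beta*(theta**2 + tau**2) - 4*gamma**2
     + 8*gamma*theta*tau + (theta**2 - tau**2)**2)

# Without competition (gamma = tau = 0) the scenario-I conditions reappear:
U0 = U.subs({gamma: 0, tau: 0})
S0 = S.subs({gamma: 0, tau: 0})
print(sp.simplify(U0 - beta**2*(2*beta - theta**2)))   # 0
print(sp.simplify(S0 - (2*beta - theta**2)**2))        # 0
```

So the positivity requirements U > 0 and S > 0 are the scenario-II counterparts of the condition 2β − θ² > 0 found in scenario I.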
The results in Proposition 2 indicate that all prices and all national and local advertising strategies, for both the national brand and the PL, are linear in the goodwill stock. The expressions of the parameters C1, C2, C3, and C4 do not allow us to determine whether they are positive or negative. Hence, no additional comments on how channel members' strategies compare, or on how they are affected by the goodwill, can be made on the basis of these analytical expressions. As in scenario I, the channel members' value functions are quadratic in the goodwill stock.
4.3 Comparing Results

To provide answers to this paper's research questions (ii) and (iii), it is necessary to compare the strategies and profits of both channel members under scenarios I and II. Although we obtained analytical solutions for both scenarios, comparing the results analytically was challenging because of the complicated expressions obtained under scenario II. Hence, we chose to perform numerical simulations, in which we fixed some of the model's parameters and examined the impacts of two key parameters: the price and advertising competition effects (i.e., the parameters γ and τ, respectively). In the next subsection, we provide additional details about these simulations and the results obtained. We are interested in the long-term behavior of prices, advertising efforts, goodwill stocks, brand demands, and profits. Therefore, the values of the variables under investigation are compared at the steady state.
4.3.1 Numerical Illustration

We start by computing the results under both scenarios using a benchmark case,11 with the following values for the model's parameters. Some parameters (the discount rate, decay rate, and production costs) are taken from Amrouche et al. [1]. The other parameters are used to calibrate the model in order to obtain admissible and realistic values.12 The demand parameters are α = 4, β = 1, θ = 1, γ ∈ [0, 1[, τ ∈ [0, 1[. The cost parameters are cN = 0.01, cP = 0. The dynamic parameters are ρ = 0.01 and δ = 0.1. The discount rate is r = 0.03.

11 All the claims obtained in this section have been checked for robustness after varying the values of the parameters α, δ, ρ, cP and the initial goodwill level G0.
12 E.g., since PL retail prices are generally equal to or lower than the retail prices of national brands, we chose a set of parameters that reflects this reality.
Fig. 1 Feasibility region in the (γ, τ) space
Note that with this set of parameters, we normalize to 1 the values of β and θ, which capture the effects of own price and own local advertising on a product's demand, and we allow the parameters γ and τ to vary in the interval [0, 1[, according to (6). The choice to vary these parameters is justified by their importance in capturing the competition between the PL and the national brand. In all our simulations, we checked the stability conditions and considered only the results guaranteeing the positiveness of demands, strategies, margins, and profits.13 All the figures have been obtained using the same set of parameter values reported above. Figure 1 shows the feasible region in the space defined by γ and τ where all these constraints are fulfilled, together with the sufficient second-order optimality conditions. We generated a non-uniform grid of points (denser close to the border of the feasible region) at which the different values of the analyzed variables are computed. Moreover, a mesh of size 0.05 was used to check for possible monotonic behavior. The following figures are obtained by computing the difference between the results obtained under scenario II and scenario I. In each figure, the "++" ("− −") regions correspond to the feasible values of γ and τ such that the value of the represented variable is greater (smaller) in scenario II, i.e., in the presence of the PL, w.r.t. its value in scenario I.
13 We carried out the simulations using Mathematica 11.1.1; the code for generating the numerical results is available from the authors upon request.
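The closed-form positivity conditions U > 0 and S > 0 from Proposition 2 give a necessary part of the Fig. 1 feasibility scan; the full region in the paper additionally requires positive demands, margins, strategies, and profits, which involve the appendix coefficients. A Python sketch of this partial scan on the same 0.05 mesh:

```python
# Scan of (gamma, tau) on a 0.05 grid, keeping points where the closed-form
# positivity conditions of Proposition 2 hold. This reproduces only a necessary
# part of the Fig. 1 feasibility region; the paper additionally checks positive
# demands, strategies, margins, and profits.
beta, theta = 1.0, 1.0   # benchmark normalization of Sect. 4.3.1

def necessary_conditions(gamma, tau):
    U = (2*beta**3 - beta**2*(theta**2 + tau**2)
         - 2*beta*gamma*(gamma - 2*theta*tau) - gamma**2*(theta**2 + tau**2))
    S = (4*beta**2 - 4*beta*(theta**2 + tau**2) - 4*gamma**2
         + 8*gamma*theta*tau + (theta**2 - tau**2)**2)
    return U > 0 and S > 0

grid = [round(0.05*i, 2) for i in range(20)]          # gamma, tau in [0, 1[
region = [(g, t) for g in grid for t in grid if necessary_conditions(g, t)]

print(necessary_conditions(0.0, 0.0), necessary_conditions(0.9, 0.0))   # True False
```

The origin (no competition) always passes, while strong price competition without offsetting advertising interaction (e.g., γ = 0.9, τ = 0) violates U > 0, consistent with the bounded shape of the region in Fig. 1.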
Fig. 2 Variation of aNM
Figure 2 illustrates how the manufacturer adjusts its national advertising investments when the retailer owns a PL, with respect to the case where the retailer sells only the national brand. Figure 3 shows the effects of the PL's presence on the national brand's goodwill. The results indicate that, depending on the combination of values of the parameters capturing price competition (γ) and local advertising competition (τ), the manufacturer of the national brand could react to the presence of the PL by either increasing or decreasing its investments in national advertising. As a result, when the retailer has a PL, the brand's goodwill could be either higher or lower than its level in the absence of the PL. This result is not in line with the finding in [12] that the manufacturer's national advertising investment is not affected by the presence or absence of the PL. Since our study and [12] share multiple similarities with respect to the game structure and channel setting, we attribute this difference to the endogeneity of prices in our model and their interplay with the non-price marketing decisions. The "++" region in Fig. 2 shows that there are some combinations where the manufacturer's best reaction to the presence of a PL is to invest more in national advertising (with respect to the situation where it holds a monopolistic power at the
Fig. 3 Variation of G
retailer's store). More particularly, we can see that such a reaction can be observed only for parameter combinations characterized by an advertising competition effect that is lower than the price competition effect.14 For all combinations where τ is higher than γ, the manufacturer reduces the national advertising for its brand. Furthermore, Fig. 2 indicates that the region where the manufacturer invests less in national advertising under the presence of the PL is bigger than the region where it invests more. Stated differently, there are more situations where the manufacturer invests less in national advertising than situations where it does the opposite. Since the brand's goodwill is built through national advertising investments, the results we observe in Fig. 3 are expected: in the region where the manufacturer reduces the national advertising investment, we observe a lower level of the national brand's goodwill in the presence of the PL, while in the region where the manufacturer reacts to the presence of the PL by fixing a higher level for its national advertising investment, the goodwill stock increases.

14 Note, though, that there are also many combinations of γ and τ with τ < γ for which the manufacturer decreases the national advertising investment.
Fig. 4 Variation of w
The result in Fig. 4 reveals that, depending on the same combinations of values of the parameters γ and τ identified in the previous figures, the manufacturer of the national brand could react to the presence of the PL by fixing either a higher or a lower transfer price. Interestingly, this result contradicts previous results in the channels literature, which state that the pressure resulting from introducing a PL always forces the manufacturer to concede a portion of its profits to the retailer by decreasing the transfer price. As mentioned in Sect. 2, Chen [5] finds that the manufacturer should reduce its transfer price whenever the national brand faces a high level of price competition. The "++" region in Fig. 4 indicates that there are some combinations of the values of γ and τ for which the manufacturer's best reaction to the presence of the PL is to increase the transfer price. Such a reaction can be observed in the same region where the manufacturer invests more in national brand advertising and where the goodwill stock is higher when the retailer holds a PL. Hence, the only situations where the manufacturer could increase the national brand's transfer price are observed when the competitive effect of local advertising is lower than the price competition effect. For all combinations where τ is higher than γ, that is, in the region where the manufacturer reduces its investments in national advertising
Fig. 5 Variation of pNR
and the goodwill stock is lower, the manufacturer fixes a lower transfer price when the retailer holds a PL. Here again, we can observe that the region where the manufacturer decreases the transfer price under the presence of a PL is bigger than the region where it increases the transfer price. Stated differently, there are more situations where the manufacturer decreases the transfer price than situations where it does the opposite. Figure 5 illustrates the difference in the national brand's retail price under both scenarios. The results indicate that the retailer may either increase or decrease the retail price, depending on the combinations of the parameters γ and τ in the feasibility region. The "− −" zone shows all the parameter combinations for which the retailer fixes a lower retail price for the national brand when selling a PL in the market. The results clearly show that the retailer fixes a lower price for the national brand only when the price competition is low (γ < 0.6) and τ > γ. In all situations where the price competition is higher than 0.6, the retailer always fixes a higher price for the national brand. Interestingly, we can observe, by comparing the "− −" zone in Fig. 5 with the "− −" zone in Fig. 4, that the retailer decreases the retail price when carrying a PL only in the situations where the manufacturer reduces the national brand's transfer price. However, the retailer does not always pass through
Fig. 6 Variation of aNR
this transfer price reduction to consumers (i.e., the "− −" zone in Fig. 4 is bigger than the "− −" zone in Fig. 5). Finally, comparing the "++" zones in both figures indicates that, in all the situations where the manufacturer fixes a higher transfer price for the national brand, the retailer reacts by setting a higher retail price for the national brand. Figure 6 depicts the difference in local advertising. It indicates that when the retailer carries a PL, it may allocate either higher or lower local advertising to the national brand (w.r.t. scenario I). In any case, our simulations indicate that a retailer carrying a PL always spends more local advertising effort to promote its own brand. This is because the PL's unit margin is higher than the unit margin of the national brand; this condition is demonstrated in Sect. 4.2, and its fulfillment is observed in all our simulations. Furthermore, our results reveal that the retailer always reacts to an increase in the manufacturer's national advertising and transfer price by increasing the local advertising effort, but does not always reduce its local advertising for the national brand in the mirror situation. Since the "++" region is bigger in Fig. 6 than in Fig. 2, we can observe that there are many parameter combinations in which the manufacturer decreases its national advertising investment and transfer price,
Fig. 7 Variation of National brand demand
while the retailer increases its local advertising effort for the national brand. Finally, the results indicate that an increase in the retailer's local advertising is observed only for admissible parameter values involving a τ lower than γ. Figure 7 shows that the demand for the national brand could also be higher or lower when a PL is available on the market. Interestingly, the "− −" and "++" zones in this figure coincide with the "− −" and "++" zones in Figs. 2, 3, and 4. Hence, the increase in the national brand's demand, observed mainly in situations where γ is higher than τ, can be attributed to the manufacturer's higher investment in national advertising in this parameter region and the resulting increase in the brand's goodwill. The effect of the increase in the brand's goodwill on the demand for the national brand seems to compensate for the increase in the national brand's retail price in this region. An additional observation (not depicted in the figure) indicates that, in all the numerical simulations, if a retailer owns a PL, the demand for the PL is always higher than the demand for the national brand. This result can be attributed to the fact that the retail price of the PL is always lower than that of the national brand, and the retailer always allocates more local advertising effort to the PL than to the national brand.
Fig. 8 Variation of manufacturer’s profits
The analysis of the variations in the individual and total channel profits indicates that, for all the admissible parameter values, the retailer always benefits from having its own store brand. This result is consistent with the previous results reported in the literature review section15 (see, e.g., [20]). Since the retailer's margin is higher for the PL than for the national brand, the retailer's profit stemming from the PL sales in scenario II is higher than the profit the retailer obtains from selling the national brand under this same scenario. Furthermore, comparing the retailer's profit stemming from sales of the national brand, with and without a PL, reveals that the national brand is more profitable for the retailer when it is sold exclusively in its store. When the retailer carries a PL, the national brand becomes less profitable than the PL. For the manufacturer, we find mixed results, depending on the combinations of the parameters γ and τ. These results are illustrated in Fig. 8.
15 The only exception is the study of Karray and Martín-Herrán [11], where the authors found situations where the retailer could not benefit from the presence of a PL.
This figure captures a very interesting result that diverges from the existing literature comparing the impact of PLs on the strategies and profits of channel members. Indeed, unlike previous results, which have suggested that the manufacturer always loses profits, to the PL-owning retailer's benefit, our results indicate that there are situations where the manufacturer can also benefit from the presence of the PL. These situations correspond to some of the parameter combinations characterized by a higher value of γ than of τ. The increase in the manufacturer's profit in the "++" region can be attributed to the rise in the demand for the national brand and in the transfer price in this same region (as depicted in Figs. 7 and 4). Finally, our numerical simulations indicate that, for all the admissible values of the parameters γ and τ, the presence of a PL increases the overall channel efficiency. This result is consistent with previous results in the literature. It indicates that, despite the fact that there are some regions in which the manufacturer's profit is hurt by the presence of a PL, the increase in the retailer's profit compensates for the decrease in the manufacturer's profit. This means that in the "− −" region in Fig. 8, the decrease in the manufacturer's profit is smaller than the increase in the retailer's profit resulting from the launch of the PL.
5 Conclusion

This paper investigates price and non-price marketing decisions (i.e., national and local advertising) in a bilateral monopoly where a retailer may or may not own a private label in addition to distributing a national brand. We build a dynamic model that takes into account the carryover effects of the manufacturer's national advertising, and study two scenarios in which the national brand does and does not face competition from a private label. One of our key results indicates that the presence of the PL is not always detrimental to the national brand manufacturer, as has been suggested in the existing literature. More specifically, we find that, when the competitive effect of local advertising between the PL and the national brand is lower than the competitive effect of prices, the national brand's manufacturer can charge a higher transfer price to the retailer carrying a PL (with respect to the situation where the retailer sells only the national brand). This result raises questions about the main reasons advanced by Mills [13] for why retailers launch PLs. In that study, where only pricing decisions are taken into account, the author considers PLs as counterstrategies used by retailers to push manufacturers to reduce transfer prices. We explain the difference in the results by the fact that our study considers that the manufacturer and the retailer have multiple control variables that allow them to find the appropriate combination of instruments when maximizing their objectives. By considering the interplay between the price and non-price marketing variables, each channel member uses both instruments to exercise its influence on the decisions and outcomes of the other channel member and of the channel.
72
A. Buratto and S. Taboubi
These results highlight the importance of considering both the price and the non-price marketing variables in this literature, as suggested in [12]. The following are a few possible extensions of this study:
• To investigate whether some incentive strategies can be implemented by the manufacturer in the regions where the introduction of the PL hurts its profits. For example, Mills [14] examined whether or not the manufacturer can employ some counterstrategies in order to appropriate a portion of the profit surplus resulting from the PL's launch. He demonstrated that the use of a coupons program and the implementation of a two-part tariff (e.g., quantity discounts or retail price maintenance) can allow the manufacturer to reach this objective, while lump-sum payments (e.g., shelf payments and slotting allowances) are not effective in counterbalancing the retailer's PL strategy. In a dynamic setting, Karray and Zaccour [12] found that a cooperative advertising program can not only mitigate the manufacturer's profit loss, but is also Pareto-improving for both channel members. Chen [5] investigated this question in a static setting where the channel members make price and advertising decisions, and found that channel members will benefit from and implement a cooperative advertising program in a similar context only when the level of competition between the national brand and the PL is high.
• To examine the situation where the retailer also invests in national advertising to build goodwill for its brand, and, as in [1, 2], to consider that the national advertising investments of one channel member could hurt the goodwill stock of the other channel member.
• To consider a second national brand manufacturer in the channel, in a dynamic game involving price and non-price marketing variables.
Competition here will be observed in the horizontal and vertical channel structures (i.e., competition between the two national brands, and competition between the national brand and the retailer’s PL) • To investigate the case where the introduction of the private label is a strategic decision made by the retailer and to examine if various leadership roles and decision sequences could have an impact on such a decision.
Appendix 1

We compute here the feedback Stackelberg solution (see [8, p. 142] and [3, p. 373]), assuming that both the leader and the follower solve stagewise dynamic programming problems. Since the retailer is the follower, we start by writing its HJB equation:

rV_R^I(G) = max_{p_N, a_NR} { (p_N − w)(α − βp_N + θa_NR + ρG) − ½ a_NR² + (dV_R^I(G)/dG)(a_N − δG) }.   (14)
Computing the partial derivatives of the RHS of Eq. (14) with respect to p_N and a_NR, and setting them equal to zero, gives the retailer's reaction functions under scenario I, given by Eqs. (9). As the leader, the manufacturer maximizes its profit functional subject to the retailer's reaction functions. The Hamilton-Jacobi-Bellman (HJB) equation associated with the manufacturer's optimization problem is

rV_M^I(G) = max_{w, a_N} { (w − c_N)(α − β p̄_N + θ ā_NR + ρG) − ½ a_N² + (dV_M^I(G)/dG)(a_N − δG) }.

We substitute the retailer's reaction functions (9) into it, compute the derivatives w.r.t. the manufacturer's control variables w and a_N, and obtain

w(G) = (ρG + α + βc_N)/(2β),   a_N(G) = ∂V_M^I(G)/∂G.
The sufficient conditions for a stationary feedback Stackelberg equilibrium require us to find bounded and continuously differentiable functions, V_R^I(G) and V_M^I(G), for the retailer and the manufacturer, respectively, which satisfy for all G(t) ≥ 0 the HJB equations obtained after the substitution of w(G) and a_N(G). Guided by the model's linear-quadratic structure, we conjecture that the functions V_R^I(G) and V_M^I(G) are quadratic and given by the expressions (10) in the proposition. The coefficients R₁, R₂, R₃, M₁, M₂ and M₃ are obtained by identification after replacing V_R^I(G) and V_M^I(G), as well as their first derivatives, into the HJB equations. There are two triplets of coefficients associated with the manufacturer's problem, (M₁⁺, M₂⁺, M₃⁺) and (M₁⁻, M₂⁻, M₃⁻), where

M₁⁺ = ½ [ (r + 2δ) + √( (r + 2δ)² − 2ρ²/(2β − θ²) ) ],

M₁⁻ = ½ [ (r + 2δ) − √( (r + 2δ)² − 2ρ²/(2β − θ²) ) ],

M₂⁺ = (α − c_N β)ρ / [ (2β − θ²) ( r − √( (r + 2δ)² − 2ρ²/(2β − θ²) ) ) ],

M₂⁻ = (α − c_N β)ρ / [ (2β − θ²) ( r + √( (r + 2δ)² − 2ρ²/(2β − θ²) ) ) ],
M₃⁺ = (α − c_N β)² [ r² + 2rδ + 2δ² − r √( (r + 2δ)² − 2ρ²/(2β − θ²) ) ] / { 2r (2β − θ²)² [ r − √( (r + 2δ)² − 2ρ²/(2β − θ²) ) ]² },

M₃⁻ = (α − c_N β)² [ r² + 2rδ + 2δ² + r √( (r + 2δ)² − 2ρ²/(2β − θ²) ) ] / { 2r (2β − θ²)² [ r + √( (r + 2δ)² − 2ρ²/(2β − θ²) ) ]² }.
Before finding the coefficients of the value function associated with the retailer's problem, observe that the motion equation depends only on the manufacturer's coefficients, more precisely only on M₁ and M₂; in fact,

G′(t) = (M₁ − δ)G(t) + M₂.

The stability of the steady state depends on the sign of (M₁ − δ). Observing that M₁⁺ − δ > 0 for all n-tuples of parameter values, we can infer that any solution obtained with the triplet (M₁⁺, M₂⁺, M₃⁺) is unstable; therefore, we continue our analysis with the triplet (M₁⁻, M₂⁻, M₃⁻). The stability condition M₁⁻ − δ < 0 holds if and only if 2δ(δ + r) − ρ²/(2β − θ²) > 0. The coefficients of the retailer's value function associated with this triplet are
R₁ = ρ² / [4(2β − θ²)(r + 2δ − 2M₁⁻)],

R₂ = ρ ( ρM₂⁻ + (α − βc_N)(2(δ − M₁⁻) + r) ) / [4(2β − θ²)(r + 2δ − 2M₁⁻)(r + δ − M₁⁻)],   and

R₃ = (α − c_N β)² / [8r(2β − θ²)] + ρM₂⁻(α − c_N β) / [4r(2β − θ²)(r + δ − M₁⁻)] + ρ²(M₂⁻)² / [4r(r + 2δ − 2M₁⁻)(r + δ − M₁⁻)].
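The coefficients above are easy to sanity-check numerically. The following sketch is not part of the original paper, and the parameter values are hypothetical, chosen only for illustration; it verifies that M₁⁺ and M₁⁻ are the two roots of the quadratic produced by the identification step, and that the stability condition stated above is exactly the condition under which M₁⁻ − δ < 0:

```python
import math

# Hypothetical parameter values, used only for this illustrative check
r, delta, rho, beta, theta = 0.05, 0.10, 0.10, 1.00, 0.50

disc = (r + 2 * delta) ** 2 - 2 * rho ** 2 / (2 * beta - theta ** 2)
assert disc > 0  # required for the coefficients to be real-valued

M1_plus = 0.5 * ((r + 2 * delta) + math.sqrt(disc))
M1_minus = 0.5 * ((r + 2 * delta) - math.sqrt(disc))

# Both are roots of  M1^2 - (r + 2*delta)*M1 + rho^2/(2*(2*beta - theta^2)) = 0,
# the quadratic that the identification step produces for M1.
for M1 in (M1_plus, M1_minus):
    residual = M1 ** 2 - (r + 2 * delta) * M1 + rho ** 2 / (2 * (2 * beta - theta ** 2))
    assert abs(residual) < 1e-12

# Stability of G'(t) = (M1 - delta)G + M2 under the "minus" root is equivalent
# to the condition 2*delta*(delta + r) - rho^2/(2*beta - theta^2) > 0.
condition = 2 * delta * (delta + r) - rho ** 2 / (2 * beta - theta ** 2)
assert (M1_minus - delta < 0) == (condition > 0)
assert M1_plus - delta > 0  # the "plus" triplet always yields an unstable path
```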
Finally, plugging the derivatives of the value functions into the expressions for w(G) and a_N(G) provides the channel members' strategies at the equilibrium displayed in Proposition 1.
Appendix 2

We follow the same steps as in Appendix 1. The retailer's HJB equation is

rV_R^II(G) = max_{p_N, p_P, a_NR, a_PR} { (p_N − w)(α − βp_N + γp_P + θa_NR − τa_PR + ρG) + (p_P − c_P)(α − βp_P + γp_N + θa_PR − τa_NR) − ½ a_NR² − ½ a_PR² + (dV_R^II(G)/dG)(a_N − δG) }   (15)
Solving the FOCs with respect to the control variables p_N, p_P, a_NR, and a_PR, we obtain the retailer's reaction functions:

p̄_N(w, G) = [ α(2(β + γ) − (θ + τ)²) + c_P(2βθτ − γ(θ² + τ²)) + Γw + ρ(2β − θ² − τ²)G ] / S,

p̄_P(w, G) = [ α(2(β + γ) − (θ + τ)²) + Γc_P + (2βθτ − γ(θ² + τ²))w + 2(γ − θτ)ρG ] / S,

ā_NR(w, G) = α(θ − τ) / [2(β − γ) − (θ − τ)²] + c_P [2(β² − γ²)τ + (γθ − βτ)(τ² − θ²)] / S + [−2β²θ + βθ(θ² − τ²) + γ(2γθ − θ²τ + τ³)] w / S + ρ(2βθ − θ³ − 2γτ + θτ²)G / S,

ā_PR(w, G) = α(θ − τ) / [2(β − γ) − (θ − τ)²] − c_P [2(β² − γ²)θ + (γτ − βθ)(θ² − τ²)] / S + [−γθ³ + 2β²τ − 2γ²τ + βθ²τ + γθτ² − βτ³] w / S + ρ(2γθ − 2βτ − θ²τ + τ³)G / S,

where S = (2β + 2γ − θ² − 2θτ − τ²)(2β − 2γ − θ² + 2θτ − τ²) ≠ 0 and Γ = 2β² − 2γ² − 3βθ² + θ⁴ + 6γθτ − 3βτ² − 2θ²τ² + τ⁴.
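The reaction functions can be spot-checked numerically. The sketch below is not part of the original paper and uses hypothetical parameter values; it verifies by central finite differences that all four first-order conditions of the bracketed term in (15) vanish at (p̄_N, p̄_P, ā_NR, ā_PR):

```python
# Hypothetical parameter values
alpha, beta, gamma, theta, tau, rho = 10.0, 2.0, 0.5, 0.4, 0.2, 0.3
cP, w, G = 1.0, 3.0, 1.0

S = (2*beta + 2*gamma - theta**2 - 2*theta*tau - tau**2) * \
    (2*beta - 2*gamma - theta**2 + 2*theta*tau - tau**2)
Gam = (2*beta**2 - 2*gamma**2 - 3*beta*theta**2 + theta**4 + 6*gamma*theta*tau
       - 3*beta*tau**2 - 2*theta**2*tau**2 + tau**4)
assert S > 0  # second-order optimality condition

pN = (alpha*(2*(beta + gamma) - (theta + tau)**2)
      + cP*(2*beta*theta*tau - gamma*(theta**2 + tau**2))
      + Gam*w + rho*(2*beta - theta**2 - tau**2)*G) / S
pP = (alpha*(2*(beta + gamma) - (theta + tau)**2) + Gam*cP
      + (2*beta*theta*tau - gamma*(theta**2 + tau**2))*w
      + 2*(gamma - theta*tau)*rho*G) / S
# the advertising best responses follow from the FOCs in a_NR and a_PR
aNR = theta*(pN - w) - tau*(pP - cP)
aPR = theta*(pP - cP) - tau*(pN - w)

def objective(p_n, p_p, a_nr, a_pr):
    # bracketed term of the retailer's HJB (15), without the dV/dG part
    return ((p_n - w)*(alpha - beta*p_n + gamma*p_p + theta*a_nr - tau*a_pr + rho*G)
            + (p_p - cP)*(alpha - beta*p_p + gamma*p_n + theta*a_pr - tau*a_nr)
            - 0.5*a_nr**2 - 0.5*a_pr**2)

# central finite differences: all four partial derivatives vanish at the optimum
eps = 1e-6
point = [pN, pP, aNR, aPR]
for k in range(4):
    up, down = point[:], point[:]
    up[k] += eps
    down[k] -= eps
    grad = (objective(*up) - objective(*down)) / (2 * eps)
    assert abs(grad) < 1e-6
```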
The second-order optimality conditions require that S > 0. Let us substitute the reaction functions into the manufacturer's HJB equation:
rV_M^II(G) = max_{w, a_N} { (w − c_N)(α − β p̄_N + γ p̄_P + θ ā_NR − τ ā_PR + ρG) − ½ a_N² + (dV_M^II(G)/dG)(a_N − δG) }   (16)
and compute w^II(G) and a_N^II(G) by setting the partial derivatives of its RHS equal to zero:

w^II(G) = (αK + c_N U + 2c_P Z) / (2U) + [ ρ(U + γ(−2βθτ + γ(θ² + τ²))) / (2βU) ] G,   (17)

a_N^II(G) = dV_M^II(G)/dG,   (18)

where

K = 2β² − β(θ + τ)² + γ((θ + τ)² − 2γ),
U = 2β³ − β²(θ² + τ²) − 2βγ(γ − 2θτ) − γ²(θ² + τ²),
Z = β²(γ + θτ) − βγ(θ² + τ²) + γ²(θτ − γ).

The feedback strategies w^II(G) and a_N^II(G) in Proposition 2 are obtained from (17) and (18) upon the substitutions Y = 2γθτ − (U + γ²(θ² + τ²))/β and T = αK + c_N U + 2c_P Z. The second-order optimality conditions require that S > 0 and U > 0. Now we substitute the strategies w^II(G) and a_N^II(G) into Eqs. (15) and (16) and conjecture that V_R^II(G) and V_M^II(G) are quadratic and given by the expressions (14). Under this assumption, (18) becomes a_N^II(G) = N₁G + N₂. The coefficients N₁, N₂, N₃, Z₁, Z₂, Z₃ can be obtained by identification after replacing V_R^II(G) and V_M^II(G), as well as their first derivatives, into the HJB equations. For the manufacturer's value function we obtain the two triplets of coefficients (N₁⁺, N₂⁺, N₃⁺) and (N₁⁻, N₂⁻, N₃⁻), where

N₁⁺ = ½(r + 2δ + R),   N₁⁻ = ½(r + 2δ − R),   N₂⁺ = YρF / [SU(r − R)],   N₂⁻ = YρF / [SU(r + R)],

N₃⁺ = (1/(4rSU)) [ (2c_P Z − c_N U + Kα)² + 2F²Y²ρ² / (SU(r − R)²) ],

N₃⁻ = (1/(4rSU)) [ (2c_P Z − c_N U + Kα)² + 2F²Y²ρ² / (SU(r + R)²) ],

with R = √( (r + 2δ)² − 2Y²ρ²/(SU) ) and

F = c_N U − α(β − γ)(2β + 2γ − (θ + τ)²) + 2c_P [ γ²(γ − θτ) − β²(γ + θτ) + βγ(θ² + τ²) ].
In order to study the stability of the solution, we substitute the strategy a_N^II(G) into the state equation and obtain

Ġ(t) = (N₁ − δ)G(t) + N₂.

This is a first-order linear differential equation, so its solution is stable if and only if (N₁ − δ) < 0. Observing that N₁⁺ − δ > 0 for all n-tuples of parameter values, we can infer that any solution obtained with the triplet (N₁⁺, N₂⁺, N₃⁺) is unstable; therefore, we continue our analysis with the triplet (N₁⁻, N₂⁻, N₃⁻). The stability condition N₁⁻ − δ < 0 holds if and only if δ(δ + r) − Y²ρ²/(2SU) > 0.
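The equivalence between the stability condition and the sign of N₁⁻ − δ can be checked numerically. This sketch is not part of the original paper; S, U and Y below are hypothetical stand-ins for the composite expressions defined above:

```python
import math

# Hypothetical stand-ins for the composite terms S, U (both required positive by
# the second-order conditions) and Y, plus structural parameters r, delta, rho.
r, delta, rho = 0.05, 0.10, 0.20
S, U, Y = 3.0, 1.5, -0.8

R = math.sqrt((r + 2 * delta) ** 2 - 2 * Y ** 2 * rho ** 2 / (S * U))
N1_plus = 0.5 * (r + 2 * delta + R)
N1_minus = 0.5 * (r + 2 * delta - R)

# N1- - delta = (r - R)/2, hence stability of G'(t) = (N1 - delta)G + N2 under
# the "minus" triplet is equivalent to delta*(delta + r) - Y^2 rho^2/(2 S U) > 0.
condition = delta * (delta + r) - Y ** 2 * rho ** 2 / (2 * S * U)
assert (N1_minus - delta < 0) == (condition > 0)
assert N1_plus - delta > 0  # the "plus" triplet is always unstable
```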
Analogously, from the manufacturer's HJB equation we obtain the feedback strategies in Proposition 2, with the following coefficients for the value function:

C₁ = 2Uρ(2βθ − θ³ − 2γτ + θτ²) − YρA,
D₁ = 2KU α(θ − τ)/(β − γ) + 2c_P UB + TA,
C₂ = 2Uρ(2γθ − 2βτ − θ²τ + τ³) − YρB,
D₂ = 2KU α(θ − τ)/(β − γ) + 2c_P UA + TB,
C₃ = 4Uρ(γ − θτ) + Yρ(γθ² + γτ² − 2θτβ),
D₃ = 2Uα(2β + 2γ − (θ + τ)²) + 2c_P UB − T(γθ² + γτ² − 2θτβ),
C₄ = −2Uρ(−2β + θ² + τ²) − YρB,
D₄ = 2Uα(2β + 2γ − (θ + τ)²) + 2c_P U(γθ² + γτ² − 2θτβ) + TB,
A = γτ³ − βθτ² − γθ²τ + βθ³ + 2γ²θ − 2β²θ,
B = −γθ³ + βτθ² + γτ²θ − βτ³ − 2γ²τ + 2β²τ.
References

1. Amrouche, N., Martín-Herrán, G., Zaccour, G.: Feedback Stackelberg equilibrium strategies when the private label competes with the national brand. Ann. Oper. Res. 164, 79–95 (2008). https://doi.org/10.1007/s10479-008-0320-7
2. Amrouche, N., Martín-Herrán, G., Zaccour, G.: Pricing and advertising of private and national brands in a dynamic marketing channel. J. Optim. Theory Appl. 137(3), 465–483 (2008). https://doi.org/10.1007/s10957-007-9340-8
3. Başar, T., Olsder, G.J.: Dynamic Noncooperative Game Theory, 2nd edn. Society for Industrial and Applied Mathematics, Philadelphia (1998). https://doi.org/10.1137/1.9781611971132
4. Bonfrer, A., Chintagunta, P.K.: Store brands: Who buys them and what happens to retail prices when they are introduced? Rev. Ind. Organ. 24(2), 195–218 (2004). https://doi.org/10.1023/B:REIO.0000033352.19694.4a
5. Chen, J.: The manufacturer's co-op advertising counterstrategy to private label. In: International Conference on E-Product E-Service and E-Entertainment, Henan, 2010, pp. 1–4 (2010). https://doi.org/10.1109/ICEEE2010.5660103
6. Chintagunta, P., Bonfrer, A., Song, I.: Investigating the effects of store-brand introduction on retailer demand and pricing behavior. Manag. Sci. 48(10), 1242–1267 (2002)
7. Chung, H., Lee, E.: Effect of store brand introduction on channel price leadership: An empirical investigation. J. Retail. 94(1), 21–32 (2018). https://doi.org/10.1016/j.jretai.2017.10.001
8. Dockner, E., Jorgensen, S., Van Long, N., Sorger, G.: Differential Games in Economics and Management Science. Cambridge University Press, Cambridge (2000)
9. Jin, Y., Wu, X., Hu, Q.: Interaction between channel strategy and store brand decisions. Eur. J. Oper. Res. 256(3), 911–923 (2017). https://doi.org/10.1016/j.ejor.2016.07.00
10. Karray, S., Martín-Herrán, G.: A dynamic model for advertising and pricing competition between national and store brands. Eur. J. Oper. Res. 193(2), 451–467 (2009)
11. Karray, S., Martín-Herrán, G.: Fighting store brands through the strategic timing of pricing and advertising decisions. Eur. J. Oper. Res. 275(2), 635–647 (2019). https://doi.org/10.1016/j.ejor.2018.11.06
12. Karray, S., Zaccour, G.: A differential game of advertising for national and store brands. In: Haurie, A., Zaccour, G. (eds.) Dynamic Games: Theory and Applications, pp. 213–229. Springer US, Boston, MA (2005)
13. Mills, D.: Why retailers sell private labels. J. Econ. Manag. Strategy 4, 509–528 (1995). https://doi.org/10.1111/j.1430-9134.1995.00509.x
14. Mills, D.: Private labels and manufacturer counterstrategies. Eur. Rev. Agricult. Econ. 26(2), 125–145 (1999). https://doi.org/10.1093/erae/26.2.125
15. Moorthy, K.S.: Strategic decentralization in channels. Mark. Sci. 7(4), 335–355 (1988). https://doi.org/10.1287/mksc.7.4.335
16. Narasimhan, C., Wilcox, R.T.: Private labels and the channel relationship: A cross-category analysis. J. Business 71(4), 573–600 (1998)
17. Nerlove, M., Arrow, K.: Optimal advertising policy under dynamic conditions. Economica 29(114), 129–142 (1962). https://doi.org/10.2307/2551549
18. Pauwels, K., Srinivasan, S.: Who benefits from store brand entry. Mark. Sci. 23, 364–390 (2004)
19. Putsis, W.P.: An empirical study of the effect of brand proliferation on private label – national brand pricing behavior. Rev. Ind. Organ. 12(3), 355–371 (1997). https://doi.org/10.1023/A:1007704421589
20. Raju, J.S., Sethuraman, R., Dhar, S.K.: The introduction and performance of store brands. Manag. Sci. 41(6), 957–978 (1995)
21. Ru, J., Shi, R., Zhang, J.: Does a store brand always hurt the manufacturer of a competing national brand? Prod. Oper. Manag. 24(2), 272–286 (2015). https://doi.org/10.1111/poms.12220
22. Sayman, S., Hoch, S.J., Raju, J.S.: Positioning of store brands. Mark. Sci. 21(4), 378–397 (2002)
23. Soberman, D.A., Parker, P.M.: Private labels: psychological versioning of typical consumer products. Int. J. Ind. Organ. 22(6), 849–861 (2004)
A Two-Stage Fishery Game with an Aquaculture Facility Ilyass Dahmouni and Ussif Rashid Sumaila
Abstract In recent years, aquaculture has become a global practice, widely adopted by the majority of countries in response to the growing demand for seafood products. Yet, as wild fisheries continue to operate, careful planning is required to ensure that farmed fish are introduced to the market in an environmentally sound manner. In this paper, we develop sufficient conditions for overfishing to occur when fishermen face a future aquaculture installation that allows the commercialization of substitutes for wild marine products. Considering different scenarios in a two-stage dynamic model, we show that, depending on the model parameters, aquaculture activities can harm marine biodiversity and become an incentive for over-exploitation of the resource. Moreover, this situation may prevail even when the aquaculture facility is owned by the only fisherman in the industry.

Keywords Dynamic games · Fisheries · Two-stage games · Aquaculture · Blue paradox
1 Introduction

In 2019, and for the first time in the history of mankind, global aquaculture production surpassed fishing. The FAO 2020 report [3] states that aquaculture production is estimated at no less than 90 million tons, which is more than the total amount of fish landed in all ports around the globe. As the fastest-expanding food industry in the world, aquaculture has the potential to become a pivotal player in responding to the anticipated growth in demand for fish-based products. This production is expected to reach 186 million tons in 2030, where tilapia and
I. Dahmouni () Department of Fisheries and Oceans, Ottawa, ON, Canada U. R. Sumaila The University of British Columbia, Fisheries Economics Research Unit, Vancouver, BC, Canada e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 L. A. Petrosyan et al. (eds.), Frontiers of Dynamic Games, Trends in Mathematics, https://doi.org/10.1007/978-3-030-93616-7_4
79
80
I. Dahmouni and U. R. Sumaila
shrimp are the fastest-growing farmed species, particularly in India, Latin America and the Caribbean, and Southeast Asia [9]. While this historical level of production has both ecological and economic advantages for all stakeholders, it introduces a new type of competition in the seafood industry, namely between fishermen and fish farmers. As a consequence, the establishment of aquaculture facilities must not be detrimental to the sustainability of wild fish, although it has been argued by Anderson et al. [1] that its development is being driven by ecological factors. In particular, if the quantity of farmed fish put on the market is continuously increasing, some fishermen may speed up their overfishing habits. In practice, such a phenomenon can be observed in different economies: Van Der Ploeg and Withagen [16] have shown that fossil fuels are depleted faster in an economy with a clean backstop, referred to as "the green paradox" effect. McDermott et al. [13] studied one of the largest marine reserves in the world, based in the Phoenix Islands, and showed that fishermen more than doubled their fishing effort once this area was designated for eventual protected status. They estimated that, at the global level, this could temporarily increase the share of overexploited fisheries up to 72%; this effect is termed "the blue paradox". Aquaculture production is expected to grow further, bringing with it many challenges and opportunities, particularly in developed countries. Not surprisingly, this topic has attracted growing interest from researchers (see [6, 8, 11, 14, 15]). Martinell et al. [12] studied the impact that the closure of high-seas fisheries will have on wild fish and its implications for fishmeal destined for aquaculture. In this paper, we study the effect of an aquaculture facility announcement on the harvested amount of wild fish.
Without losing sight of the generality of our model, and in order to give some insight into the variation in catches, we limit ourselves to a two-period game model. Since its appearance four decades ago as the first attempt to model fisheries interactions using game theory, the Great Fish War of Levhari and Mirman [10] has set the stage for a large number of contributions, as this theory has much to offer in bringing insightful results to this field and its concerns. For a comprehensive review of game theory and its applications to fisheries, see [2, 4, 5]. We examine three different scenarios: first, we consider the reference case where only wild fishing exists; then, we study the case in which the aquaculture product is introduced and sold in the same market as the wild fish; finally, we consider a third scenario where a sole owner in the industry controls both types of fish products. This approach is intended to answer the research questions that we raise, which can be summarized under the following points:

1. Under what conditions will an increase in catches occur when fishermen are confronted with a future aquaculture facility allowing the marketing of substitutes for their wild marine products?
2. Considering different modes of play, how does aquaculture harm the resource stock and pose a threat to biodiversity?
3. Would the effect of aquaculture be reduced by market power practices?

In addressing the above questions, this study aims to define the conditions under which the existence of aquaculture may or may not trigger an increase in the current catch. In general terms, our results can be summarized by the following conclusions:
(1) aquaculture has an impact on existing wild harvests; (2) the monopoly does not necessarily do better than competition; and (3) this topic deserves to be explored in greater depth with a technically more sophisticated model. The rest of this paper is organized as follows: In Sect. 2 we present the model, and in Sect. 3 we provide the solutions for the cooperative and non-cooperative strategies in all scenarios. In Sect. 4 we discuss our results and characterize the conditions associated with the exhaustion of the fish stock under each scenario. Finally, Sect. 5 concludes the paper and highlights some possibilities for future extensions.
2 The Bio-economic Model

We consider a two-period game model where the set of players is composed of n different fishing firms exploiting a wild fish stock. In an open-access setting, player i ∈ {1, . . . , n} chooses h_it, the amount of fish to harvest at stage t ∈ {0, 1}. When fishing activities are in place, the biological reproduction of the fish stock between the two periods is described by the following difference equation:

x_{t+1} = x_t(1 + α) − Σ_{i=1}^n h_it,   (1)

where x_t ≥ 0 is the fish stock at time t and α ∈ [0, 1] is the fish birth rate; the initial stock x(0) = x₀ is given.¹ When harvesting the wild fish, players face a per-unit cost: for all t, with c_w ∈ (0, 1/x_t), fishing the quantity h_it entails a cost that is decreasing in the level of the available stock, given by²

C_it(h_it, x_t) = h_it(1 − c_w x_t)   (2)
Fishers have access to a common market where the landed fish is sold at the equilibrium price P_t, set according to the following inverse demand function:

P_t = a − b Σ_{i=1}^n h_it,   (3)

¹ In order to avoid the exhaustion of the fish stock at the end of the first period, we assume that x₀ > Σ_{i=1}^n h_i0.
² Restricting the parameter c_w to the subset (0, 1/min{x₀, x₁}) guarantees a non-negative cost in both periods.
where a > 1 and b > 1 are time-invariant parameters. The revenue of player i is then given by

R_it = h_it (a − b Σ_{i=1}^n h_it),   (4)
Assuming profit-maximizing behavior, player i optimizes the discounted sum of her profits over the two periods, that is,

max_{h_i0, h_i1} Σ_{t=0}^1 δ^t (R_it − C_it) + S(x₁),   (5)

subject to (1), where δ ∈ (0, 1) is the discount factor and S(x₁) = s·x₁ is the linear salvage value, with s a fixed parameter. Moving forward, we shall consider the following scenarios:

• W (Wild fishing only): In this baseline scenario, we consider that there is only wild fishing and that aquaculture does not exist. Using the above model parameters, the business-as-usual benchmark is provided in a common-pool resource context.
• A (Wild fishing and Aquaculture): This case assumes that, in addition to the open-access wild fishery, there is an aquaculture product available in period 1. The farmed product y₁ is produced by a single firm at a per-unit cost c_a. We consider that the n players and the farmer play simultaneously.
• M (Monopoly): This scenario describes the framework where, together with the wild exploitation, the product of aquaculture belongs to a single company. In the second period the player's total production is given by h₁ + y₁.

As previously mentioned, the identification and comparison of the results based on these scenarios enable us to investigate the biological and economic effects of aquaculture on existing fishing activities. On the one hand, the analysis is made in terms of the harvested quantities and their associated salvage value; on the other hand, in terms of the generated profits. For example, comparing the results of Scenario A with those obtained in Scenario W provides a measure of the impact of aquaculture on the fishing activity, while comparing them with those obtained in Scenario M provides an assessment of the extent of market power in such a context.
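To fix ideas, the primitives of the model, Eqs. (1)-(5), can be sketched in a few lines of code. This is a minimal illustration, not part of the paper, and all parameter values are hypothetical:

```python
# Hypothetical parameter values satisfying the model's restrictions
n, alpha, a, b, cw, delta, s = 3, 0.2, 10.0, 2.0, 0.05, 0.9, 1.0
x0 = 10.0

def next_stock(x, harvests):
    # Eq. (1): x_{t+1} = x_t (1 + alpha) - sum of harvests
    return x * (1 + alpha) - sum(harvests)

def profit(h_i, total_h, x):
    # Revenue (4) minus cost (2): h_i (a - b H) - h_i (1 - cw x)
    return h_i * (a - b * total_h) - h_i * (1 - cw * x)

# A symmetric harvest plan: every player takes h[t] in period t
h = {0: 0.5, 1: 0.6}
x1 = next_stock(x0, [h[0]] * n)
assert x1 > 0            # the stock survives period 0 (x0 > n h0)
assert cw * x1 < 1       # the cost restriction cw < 1/x1 holds

# Discounted payoff of one player, Eq. (5), with salvage S(x1) = s x1
payoff = profit(h[0], n * h[0], x0) + delta * profit(h[1], n * h[1], x1) + s * x1
assert payoff > 0
```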
3 Solutions In this section, we provide the equilibrium strategies and their associated fish stock levels. These results will provide us with necessary conditions under which full, partial and no exhaustion of the fish stock occur. Prior to presenting the results
in different contexts related to the scenarios mentioned above, we first define some parameters in Eqs. (6)-(15) below, where the superscripts (N, C) refer respectively to "Nash" and "Cooperation", while (W, A, M) denote the scenarios for which the solution is derived. Further in this section, we derive equilibrium solutions where the period-0 harvesting strategies, in all scenarios and play modes, are of the linear form h₀(x₀) = Φx₀ + Γ. While the Γ parameters represent the intercept of the linear harvest curve, the Φ parameters reflect its slope, i.e., the marginal increase in harvest for each additional unit of fish stock at the start of the game.

Φ^NW = [2δn c_w²(1 + α)(bn − 1) + 2b c_w] / [(b²(1 + n)² + 2δn²c_w²)(bn − 1)]   (6)

Γ^NW = [b(1 + n)(a − 1 − δ²ns(1 + α)) − nδc_w(nδs + 2 + δs + 2bn(a − 1 − δs))] / [(b²(1 + n)² + 2δn²c_w²)(bn − 1)]   (7)

Φ^CW = [2δn c_w²(1 + α)(bn − 1) + b(n + 1)c_w] / [(2b²(1 + n) + 2δn²c_w²)(bn − 1)]   (8)

Γ^CW = [2b(a − 1 − δ²ns(1 + α)) − nδc_w(nδs + 2 + δs + 2bn(a − 1 − δs))] / [(2b²(1 + n) + 2δn²c_w²)(bn − 1)]   (9)

Φ^NA = [4δn c_w²(1 + α)(2bn − 1) + b(2 + n)c_w] / [(b²(1 + n)(n + 2) + 2δn²c_w²)(4bn − 1)]   (10)

Γ^NA = [b(n + 2)(a − 1 − δ²ns(1 + α)) − nδc_w(2(1 + n) + (a + c_a)(1 + 4bn) + (4δs + 4)(1 − 2bn))] / [(b²(1 + n)(n + 2) + 2δn²c_w²)(4bn − 1)]   (11)

Φ^CA = [δn c_w²(1 + α)(2b − 1) + 3nb c_w] / [(b²(1 + n)(n + 2) + 2δn²c_w²)(4bn − 1)]   (12)

Γ^CA = [3nb(a − 1 − δ²ns(1 + α)) − nδc_w((a + c_a)(1 + 4bn) + (2nδs + 4)(1 − 2bn))] / [(b²(1 + n)(n + 2) + 2δn²c_w²)(4bn − 1)]   (13)

Φ^M = 3δc_w²(1 + α)(5b − 1) / (9b − δc_w²)   (14)

Γ^M = [9(ab − 1) + 3δc_w(a + c_a − 2 + 2δs) − 3δbs(1 + α) − (4/3)c_w(2a − c_a − 1 − δs)] / [2(9b − δc_w²)]   (15)
3.1 Scenario W: Fishing Only

Suppose that no aquaculture is taking place and wild fishing is the only option in the economy. Assuming that each player's catch is proportional to the available stock, we specify and compare non-cooperative and cooperative solutions. We first seek a feedback-Nash equilibrium. Then, when the n fishermen cooperate, we jointly maximize their profits. Analytical results are provided in the propositions below.

Proposition 1 Assuming an interior solution, the unique feedback-Nash equilibrium harvesting strategies are given by {h_i0^NW(x₀), h_i1^NW(x₁)}, and the equilibrium state dynamics are given by x₁^NW, such that

h_i0^NW(x₀) = Φ^NW x₀ + Γ^NW   (16)

h_i1^NW(x₁) = (a − 1 + c_w x₁^NW − δs) / ((1 + n)b)   (17)

x₁^NW = x₀(1 + α) − Σ_{i=1}^n h_i0^NW(x₀)   (18)
Proof See appendix.
An interesting result at this point is that, with the linear-quadratic structure of the game, the solution strategies presented in Proposition 1 are linear in the fish stock. Moreover, the harvesting strategy in period 1 is an increasing function of the stock, since

∂h_i1^NW(x₁)/∂x₁^NW = c_w / ((1 + n)b) > 0.

However, this result does not apply to the first period, as the sign of ∂h_i0^NW(x₀)/∂x₀ = Φ^NW is ambiguous and will therefore vary depending on the values of each parameter. Having the same form as the non-cooperative solution, the grand-coalition strategies and stock dynamics are presented in the following proposition.

Proposition 2 Assuming an interior solution, the unique pair of strategies under joint fishing is given by {h_i0^CW(x₀), h_i1^CW(x₁)}, and the equilibrium state dynamics are given by x₁^CW, such that

h_i0^CW(x₀) = Φ^CW x₀ + Γ^CW   (19)

h_i1^CW(x₁) = (a − 1 + c_w x₁^CW − nδs) / (2nb)   (20)

x₁^CW = x₀(1 + α) − Σ_{i=1}^n h_i0^CW(x₀)   (21)
Proof See appendix.
Comparing the results of Propositions 1 and 2 gives rise to the following comments: for n = 1 both results are equal, and each player takes into account the number of players involved in the fishery. Moreover, in the second period, cooperative harvesting is always lower than in the non-cooperative framework, i.e., h_i1^CW < h_i1^NW, since for all n > 1 the marginal response of the harvest to the stock satisfies ∂h_i1^CW(x₁)/∂x₁^CW − ∂h_i1^NW(x₁)/∂x₁^NW = c_w/(2nb) − c_w/((1 + n)b) < 0. Consequently, joint fishing leaves a larger fish stock in the long term, S(x₁^NW) < S(x₁^CW). However, this result cannot be claimed in all cases for the first period and will depend on the values of the model parameters.
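A quick numerical illustration of this second-period comparison follows. The sketch is not part of the paper; the parameter values are hypothetical, and the stock-dependent cost saving is written with a positive sign, consistent with the positive derivative c_w/((1 + n)b) discussed above:

```python
# Hypothetical parameter values with a > 1, b > 1, n > 1
a, b, n, cw, delta, s, x1 = 10.0, 2.0, 3, 0.1, 0.9, 1.0, 5.0

h_NW = (a - 1 + cw * x1 - delta * s) / ((1 + n) * b)      # Eq. (17)
h_CW = (a - 1 + cw * x1 - n * delta * s) / (2 * n * b)    # Eq. (20)

# Cooperation lowers second-period catches, leaving a larger salvage value
assert h_CW < h_NW

# The marginal response to the stock is also smaller under cooperation:
# cw/(2nb) < cw/((1+n)b) for every n > 1
assert cw / (2 * n * b) < cw / ((1 + n) * b)
```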
3.2 Scenario A: Fishing and Aquaculture

Suppose now that at t = 0 an aquaculture facility is advertised as being operational in the second period (t = 1) and will produce a farmed fish product that consumers perceive as a perfect substitute for the wild fish. We consider that the quantity of farmed fish is predefined and equal to y₁^A; this assumption is motivated by the fact that aquaculture companies invest in assets capable of producing a precise quantity, with a result known in advance. Each unit of farmed fish costs c_a > 0 and is sold in the same market as the wild fish. The inverse demand in period 1 is now given by

P₁^A = a − b Σ_{i=1}^n h_i1^A − b y₁^A,   (22)

Firm i optimizes the discounted sum of her profits over the two periods,

max_{h_i0^A, h_i1^A} Σ_{t=0}^1 δ^t (R_it^A − C_it^A) + S(x₁),   (23)

with R_i1^A = h_i1^A (a − b Σ_{i=1}^n h_i1^A − b y₁^A), while the aquaculture supplier optimizes her profit in period 1 as follows:

max_{y₁^A} R₁^A − C₁^A = y₁^A (P₁^A − c_a),   (24)
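The simultaneous solution of the two problems above is reported in closed form later in this section (Lemmas 1-2 and Proposition 3). As a consistency check, the fixed point of the two best responses can be computed directly. This sketch is not part of the paper and uses hypothetical parameter values:

```python
# Hypothetical parameter values
a, b, n, cw, delta, s, ca, x1 = 10.0, 2.0, 3, 0.1, 0.9, 1.0, 2.0, 5.0

# Best-response iteration on the two optimality conditions:
#   farmer:    y = (a - n b h - ca) / (2 b)
#   fisherman: h = (a - 1 + cw x1 - delta s - b y) / ((1 + n) b)
h = y = 0.0
for _ in range(200):  # a contraction here, so the iteration converges
    y = (a - n * b * h - ca) / (2 * b)
    h = (a - 1 + cw * x1 - delta * s - b * y) / ((1 + n) * b)

# Closed forms reported in Proposition 3 (Eqs. (28)-(29))
h_closed = (a + 2 * (cw * x1 - delta * s - 1) + ca) / ((2 + n) * b)
y_closed = (a - (1 + n) * ca - n * (cw * x1 - delta * s - 1)) / ((2 + n) * b)

assert abs(h - h_closed) < 1e-9 and abs(y - y_closed) < 1e-9
```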
In the next propositions we present the solutions when fishermen fish individually or jointly. First, let us define the best-response functions of the farmer and of the fishermen in the following lemmas, where l ∈ {N, C}:³

Lemma 1 The aquaculture farmer's optimal strategy is given by

y₁^lA(h_i1^lA) = (a − nb h_i1^lA(y₁^lA) − c_a) / (2b)   (25)

Lemma 2 Player i's optimal strategy is given by

h_i1^lA(y₁^lA) = (a − 1 + c_w x₁^lA − δs − b y₁^lA(h_i1^lA)) / ((1 + n)b)   (26)
Proof This result is derived from the simultaneous maximization of the two problems in Eqs. (23) and (24). The solution is straightforward and follows from the first-order conditions.

Proposition 3 Assuming an interior solution, the unique feedback-Nash equilibrium harvesting strategies are given by {h_i0^NA(x₀), h_i1^NA(x₁), y₁^NA(x₁)}, and the equilibrium state dynamics are given by x₁^NA, such that

h_i0^NA(x₀) = Φ^NA x₀ + Γ^NA   (27)

h_i1^NA(x₁) = (a + 2(c_w x₁^NA − δs − 1) + c_a) / ((2 + n)b)   (28)

y₁^NA = (a − (1 + n)c_a − n(c_w x₁^NA − δs − 1)) / ((2 + n)b)   (29)

x₁^NA = x₀(1 + α) − Σ_{i=1}^n h_i0^NA(x₀)   (30)
Proof The proof for the harvesting strategies is similar to that of Proposition 1, solving for $n+1$ players with the new price expression $P_1^A$ for period 1 and the best-response functions given in Eqs. (25) and (26).

In the next proposition, we present the solutions when the $n$ fishermen cooperate among themselves but not with the monopoly.

Proposition 4 Assuming an interior solution, the equilibrium strategies when the fishermen fish jointly are given by $\{h_{i0}^{CA}(x_0),\, h_{i1}^{CA}(x_1),\, y_1^{CA}(x_1)\}$, and the
3 Case l = {C} refers to cooperation among fishermen who form a joint coalition that plays noncooperatively against the aquaculturist.
A Two-Stage Fishery Game with an Aquaculture Facility
equilibrium state dynamics are given by $x_1^{CA}$ such that

$$h_{i0}^{CA}(x_0) = \Phi^{CA} x_0 + \Gamma^{CA}, \qquad (31)$$

$$h_{i1}^{CA}(x_1) = \frac{a + 2\left(c_w x_1^{CA} - n\delta s - 1\right) + c_a}{3nb}, \qquad (32)$$

$$y_1^{CA}(x_1) = \frac{a - 2c_a - \left(c_w x_1^{CA} - n\delta s - 1\right)}{3b}, \qquad (33)$$

$$x_1^{CA} = x_0(1+\alpha) - \sum_{i=1}^{n} h_{i0}^{CA}(x_0). \qquad (34)$$
Proof The results of Proposition 4 follow from calculations similar to those of Proposition 3, carried out in a two-player game framework (the coalition of fishermen versus the farmer).

As in Scenario W, the catches of the players in the joint fishery are lower than those under non-cooperation in period 1, $h_{i1}^{CA} < h_{i1}^{NA}$; consequently, the quantity of farmed fish is higher when the fishermen cooperate, $y_1^{CA} > y_1^{NA}$. A direct effect of joint fishing is that it leaves a larger fish stock at the end of the horizon, $S(x_1^{NA}) < S(x_1^{CA})$. The overall effect of aquaculture on the strategies of the players and on the fish stock is summarized in the following proposition.

Proposition 5 Compared to Scenario W, while the effect of aquaculture on the harvested quantity in the first period is ambiguous, the fishermen are encouraged to harvest more in the second period if the cost of fishing is high enough, that is, if the following conditions hold:

- $h_{i1}^{NA} > h_{i1}^{NW}$ if $c_w > \dfrac{a + \delta s - (1+n)c_a}{2n\left(x_1^{NA} - x_1^{NW}\right) + n x_1^{NA}}$;
- $h_{i1}^{CA} > h_{i1}^{CW}$ if $c_w > \dfrac{a + n^2\delta s - n(2c_a + 1)}{n\left(3x_1^{CW} - 2x_1^{CA}\right)}$.

One possible interpretation of this case is that the value society assigns to the fish stock at the end of the horizon is lower when aquaculture takes place.
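As a quick numerical cross-check of the period-1 formulas in Propositions 3 and 4 (for a given period-1 stock $x_1$, so first-period feedback is ignored), the closed forms (28)-(29) and (32)-(33) can be recovered by iterating the best responses of the fishermen, or of their coalition, against the farmer's best response (Lemma 1) to a fixed point. The parameter values below are illustrative and not from the chapter.

```python
# Sketch: verify the period-1 closed forms of Props. 3-4 by best-response
# iteration. Parameter values are illustrative (not from the chapter).
a, b, n = 10.0, 1.0, 2          # demand intercept/slope, number of fishermen
ca, cw = 2.0, 0.04              # farming cost, harvest-cost parameter
delta, s, x1 = 0.9, 0.5, 5.0    # discount factor, scrap value, period-1 stock

def farmer_br(h):               # Lemma 1: y = (a - n*b*h - ca) / (2b)
    return (a - n * b * h - ca) / (2 * b)

def fisher_br(y):               # Lemma 2: individual Nash play
    return (a - 1 + cw * x1 - delta * s - b * y) / ((1 + n) * b)

def coalition_br(y):            # joint symmetric play, cf. FOC (52)
    return (a - b * y - 1 + cw * x1 - n * delta * s) / (2 * b * n)

def fixed_point(br_h):
    h = y = 0.0
    for _ in range(200):        # alternating best responses; a contraction here
        h = br_h(y)
        y = farmer_br(h)
    return h, y

h_na, y_na = fixed_point(fisher_br)
h_ca, y_ca = fixed_point(coalition_br)

# Closed forms (28)-(29) and (32)-(33)
h28 = (a + 2 * (cw * x1 - delta * s - 1) + ca) / ((2 + n) * b)
y29 = (a - (1 + n) * ca - n * (cw * x1 - delta * s - 1)) / ((2 + n) * b)
h32 = (a + 2 * (cw * x1 - n * delta * s - 1) + ca) / (3 * n * b)
y33 = (a - 2 * ca - (cw * x1 - n * delta * s - 1)) / (3 * b)

assert abs(h_na - h28) < 1e-9 and abs(y_na - y29) < 1e-9
assert abs(h_ca - h32) < 1e-9 and abs(y_ca - y33) < 1e-9
assert h_ca < h_na and y_ca > y_na   # cooperation: less wild catch, more farmed
print(h_na, y_na, h_ca, y_ca)
```

For these values the fixed points also reproduce the qualitative ordering stated above: the coalition harvests less wild fish and the farmer supplies more.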
3.3 Scenario M: Monopoly Case

Finally, we consider the case where a monopoly has total control over both the wild fishery and the aquaculture facility. This scenario is interesting for two reasons: first, it determines the socially optimal strategies and, second, it allows an assessment of the effect of market power and of how the property rights over the aquaculture facility will (or will not) affect the stock of the resource. In this respect, we consider that the monopoly maximizes the discounted sum of her profits
in both periods:

$$\max_{h_0,\, h_1,\, y_1} \sum_{t=0}^{1} \delta^t \left( R_t^M - C_t^M \right) + S(x_1), \qquad (35)$$

where the revenue and the cost in the second period are given by

$$R_1^M = \left( h_1^M + y_1^M \right) \left( a - b h_1^M - b y_1^M \right), \qquad (36)$$

$$C_1^M = \left( 1 - c_w x_1^M \right) h_1^M + c_a y_1^M. \qquad (37)$$
Proposition 6 Assuming an interior solution, the single-owner strategies are given by $\{h_0^M(x_0),\, h_1^M(x_1),\, y_1^M(x_1)\}$, and the equilibrium state dynamics are given by $x_1^M$ such that

$$h_0^M(x_0) = \Phi^M x_0 + \Gamma^M, \qquad (38)$$

$$h_1^M(x_1) = \frac{a + c_a + 2\left(c_w x_1^M - \delta s - 1\right)}{3b}, \qquad (39)$$

$$y_1^M(x_1) = \frac{a + 1 - 2c_a - c_w x_1^M + \delta s}{3b}, \qquad (40)$$

$$x_1^M(x_0) = x_0(1+\alpha) - h_0^M(x_0). \qquad (41)$$

Proof The proof is similar to that of Proposition 4, in a one-player setting.
Obviously, in the absence of aquaculture, the monopoly will harvest less than the grand coalition in order to leverage her market power and extract additional rent from buyers, but this amount is still less than the sum of all catches in the context of Nash strategies, i.e., for all $t$ we have $\sum_{i=1}^{n} h_{it}^{CW}(x_t^{CW}) < h_t^M(x_t^M) < \sum_{i=1}^{n} h_{it}^{NW}(x_t^{NW})$. In contrast to Scenario A, the monopoly will fish more in period 2 under the following condition: $c_w > \frac{a(3-2n) + (4n+3)\delta s + 4n - 2nc_a - 3}{4n x_1^M - 3 x_1^{CW}}$. This means that the classical economists' idea that the monopoly is "environmentally friendly" can be rejected out of hand in this context, since the possibility of rearing fish could have a negative impact on the monetary value of the fish left uncaught at the end of the horizon.4
4 Note that when $s = 0$ and $c = 1 - c_w x$, the strategies derived for the second period in all scenarios correspond to the solution in [7, Section 8.5], that is, $h_1 = \frac{a-c}{b(1+n)}$. Furthermore, the solution for Eq. (28) also corresponds if we consider $N = n+1$ players and assume $c_a = c$, i.e., $h_1 = \frac{a - 2c + c_a}{b(2+n)} = \frac{a-c}{b(1+1+n)} = \frac{a-c}{b(1+N)}$.
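The reduction claimed in the footnote can be checked mechanically; the parameter values below are arbitrary illustrative choices.

```python
# Sketch: check the footnote's reduction of Eq. (28) to the Cournot solution
# h1 = (a - c)/(b(1 + N)) of [7] when s = 0, c = 1 - cw*x, ca = c, N = n + 1.
# Parameter values are arbitrary.
a, b, n = 10.0, 1.0, 3
cw, x1 = 0.1, 4.0
s = 0.0
delta = 0.9                      # irrelevant once s = 0
c = 1 - cw * x1                  # wild-fish unit cost in period 1
ca = c                           # farmed fish assumed equally costly
N = n + 1                        # fishermen plus the aquaculturist

h28 = (a + 2 * (cw * x1 - delta * s - 1) + ca) / ((2 + n) * b)

assert abs(h28 - (a - 2 * c + ca) / (b * (2 + n))) < 1e-12
assert abs(h28 - (a - c) / (b * (1 + N))) < 1e-12
print(h28)  # equals (a - c)/(b(1 + N))
```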
4 Discussion

From an environmental perspective, and with respect to what remains in the water as unexploited fish stock, there are three possible outcomes: (i) total depletion if $s(x_1) = 0$; (ii) limited depletion if $0 < s(x_1) < x_0$; and (iii) no significant fishing effect if $s(x_1) = x_0$. The following proposition outlines the conditions under which each of these alternatives occurs.

Proposition 7 The exhaustion of the resource is characterized as follows, with $l = \{N, C\}$ and $j = \{W, A, M\}$:

- If $x_1 = \sum_{i=1}^{n} h_{i1}^{lj}$, then full exhaustion.
- If $x_1 - \sum_{i=1}^{n} h_{i1}^{lj} < x_0$, then partial exhaustion.
- If $x_1 - \sum_{i=1}^{n} h_{i1}^{lj} = x_0$, then no significant fishing.

The intuition behind this proposition is that the decision to leave a quantity of stock in the water at the end of the horizon should be based on the stock level reaching a specific threshold: the remaining stock is left unharvested when the present value of the future marginal environmental damage from fishing it exceeds the monetary value of the harvested stock. The conditions resulting from Proposition 7 are listed in Table 1, which displays the levels of fishing cost characterizing the impact of the model parameters on the stock of the resource at the end of the horizon. As indicated in Proposition 7, there are three possibilities in each of the scenarios.

From a dynamic game angle, it would be interesting to consider a broader model with many aquaculture facilities and multiple fish species in an infinite-horizon analysis. In addition, the assumption of perfect substitution between wild and farmed fish deserves to be relaxed in order to obtain a more realistic picture. A final remark is the possibility of examining a rebuilding scenario, i.e., a situation where the stock of the resource is larger at the end of the game than at its initial level, that is, when $x_1 - \sum_{i=1}^{n} h_{i1}^{lj} > x_0$. Figure 1 illustrates these findings for the monopoly case.
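The trichotomy of Proposition 7, plus the rebuilding case of the final remark, amounts to comparing the end-of-horizon stock with $x_0$; a direct transcription is below (the function name is ours).

```python
# Sketch: classify the depletion outcome of Proposition 7 from the period-1
# stock x1, the period-1 harvests, and the initial stock x0.
def depletion_outcome(x1, harvests, x0):
    remaining = x1 - sum(harvests)   # stock left in the water at the horizon
    if remaining == 0:               # exact comparisons, as in the proposition
        return "full exhaustion"
    if remaining < x0:
        return "partial exhaustion"
    if remaining == x0:
        return "no significant fishing"
    return "rebuilding"              # remaining > x0, cf. the final remark

assert depletion_outcome(10.0, [4.0, 6.0], 8.0) == "full exhaustion"
assert depletion_outcome(10.0, [3.0, 2.0], 8.0) == "partial exhaustion"
assert depletion_outcome(10.0, [1.0, 1.0], 8.0) == "no significant fishing"
assert depletion_outcome(12.0, [1.0, 1.0], 8.0) == "rebuilding"
```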
Fig. 1 The results are robust to the values of the model parameters. For instance, under the assumption $x_0 < \frac{2(1 - 2c_a + 2\delta S)}{3b}$, the set of feasible solutions is represented by the gray area: complete depletion occurs at the intersection of this area with the right-hand-side curve, while its intersection with the left-hand-side curve represents non-significant fishing outcomes. The two intersection segments exchange positions when $x_0 > \frac{2(1 - 2c_a + 2\delta S)}{3b}$
Table 1 Thresholds on the fishing cost $c_w$ delimiting full, partial, and no depletion in each scenario (entries illegible in the source)
… since $b > \frac{1}{n}$, $n > 0$ and $\delta > 0$.
Proof of Proposition 2 As in the proof of Proposition 1, we solve the two-period model backward. Denote by $V(x)$ the joint value function and by $V_1(x)$ the joint value function in period 1.
Second-Period Equilibrium Problem The optimization problem of the joint coalition in period 1 is as follows:

$$V_1(x) = \max_{h_{11},\dots,h_{n1}} \left\{ \sum_{i=1}^{n} \left[ h_{i1}\left(a - b\sum_{i=1}^{n} h_{i1}\right) - h_{i1}\left(1 - c_w x_1\right) \right] + \delta s\left(x_1(1+\alpha) - \sum_{i=1}^{n} h_{i1}\right) \right\}.$$
Assuming an interior solution, the first-order optimality conditions are

$$a - 2bn h_{i1} - (1 - c_w x_1) - n\delta s = 0, \quad i = 1, \dots, n. \qquad (52)$$

The equilibrium harvest of player $i$ in period 1 is given by

$$h_{i1}^{CW} = \frac{a - 1 + c_w x_1 - n\delta s}{2bn} = \frac{a - 1 + c_w\left(x_0(1+\alpha) - n h_{i0}\right) - n\delta s}{2bn}. \qquad (53)$$

To ensure that the stock of fish is not exhausted, the following condition must hold:

$$x_1 > n h_{i1}^{CW} \iff x_1 > \frac{a - 1 - n\delta s}{2b - c_w}. \qquad (54)$$
Checking the second-order condition gives $-2bn < 0$, since $b > 1$ and $n \geq 1$. Substituting $h_{i1}^{CW}$ into $V_1(x)$, we get

$$\begin{aligned} V_1^{CW} = {} & \frac{a}{2b}\left[a - 1 + c_w\left(x_0(1+\alpha) - n h_{i0}\right) - n\delta s\right] - \frac{n}{2}\left[a - 1 + c_w\left(x_0(1+\alpha) - n h_{i0}\right) - n\delta s\right]^2 \\ & - \frac{1}{2b}\left[a - 1 + c_w\left(x_0(1+\alpha) - n h_{i0}\right) - n\delta s\right]\left[1 - c_w\left(x_0(1+\alpha) - n h_{i0}\right)\right] \\ & + n\delta s(1+\alpha)\left(x_0(1+\alpha) - n h_{i0}\right) - \frac{n\delta s}{2b}\left[a - 1 + c_w\left(x_0(1+\alpha) - n h_{i0}\right) - n\delta s\right]. \end{aligned}$$
Overall Equilibrium Problem The joint overall optimization problem is given by

$$V(x) = \max_{h_{10},\dots,h_{n0}} \sum_{i=1}^{n} \left[ h_{i0}\left(a - b\sum_{i=1}^{n} h_{i0}\right) - h_{i0}\left(1 - c_w x_0\right) \right] + \delta V_1^{CW}(\cdot). \qquad (55)$$

The first-order equilibrium conditions yield

$$\begin{aligned} 0 = {} & a - 2b h_{i0} - b(n-1) h_{i0} - (1 - c_w x_0) - \frac{a\delta n c_w}{2b} \\ & + \delta n^2 c_w \left[a - 1 + c_w x_0(1+\alpha) - n c_w h_{i0} - n\delta s\right] \\ & - \frac{\delta}{2b}\left[n c_w^2\left(a - 2 + 2 c_w x_0(1+\alpha) - n\delta s\right) - 2 n^2 c_w^2 h_{i0}\right] \\ & - n^2\delta^2 s(1+\alpha) + \frac{\delta^2 s n^2 c_w}{2b}. \end{aligned} \qquad (56)$$
Define $\Phi^{CW}$ and $\Gamma^{CW}$ as

$$\Phi^{CW} = \frac{2\delta n c_w^2(1+\alpha)(bn-1) + b(n+1)c_w}{2b^2(1+n) + 2\delta n^2 c_w^2(bn-1)}, \qquad (57)$$

$$\Gamma^{CW} = \frac{2b\left(a - 1 - \delta^2 n s(1+\alpha)\right) - n\delta c_w\left(n\delta s + 2 + \delta s + 2bn(a - 1 - \delta s)\right)}{2b^2(1+n) + 2\delta n^2 c_w^2(bn-1)}. \qquad (58)$$

Then the above conditions give

$$h_{i0}^{CW}(x_0) = \Phi^{CW} x_0 + \Gamma^{CW}. \qquad (59)$$

Checking the second-order condition gives $-\frac{2n^2 c_w^2 \delta}{2b} < 0$, since $b > \frac{1}{n}$, $n > 0$ and $\delta > 0$.

Acknowledgments The authors would like to thank the editors for their outstanding efforts in making this book possible amidst the difficult circumstances of the year 2020, as well as the two anonymous reviewers for their helpful comments.
References

1. Anderson, J.L., Asche, F., Garlock, T.: Economics of aquaculture policy and regulation. Annu. Rev. Res. Econ. 11, 101–123 (2019)
2. Bailey, M., Sumaila, U.R., Lindroos, M.: Application of game theory to fisheries over three decades. Fisheries Res. 102(1–2), 1–8 (2010)
3. FAO: The State of World Fisheries and Aquaculture 2020. Sustainability in Action. Rome (2020). https://doi.org/10.4060/ca9229en
4. Grønbæk, L., Lindroos, M., Munro, G., Pintassilgo, P.: Game theory and fisheries. Fisheries Res. 203, 1–5 (2018)
5. Hannesson, R.: Game theory and fisheries. Annu. Rev. Resour. Econ. 3(1), 181–202 (2011)
6. Hoagland, P., Jin, D., Kite-Powell, H.: The optimal allocation of ocean space: aquaculture and wild-harvest fisheries. Marine Res. Econ. 18(2), 129–147 (2003)
7. Intriligator, M.D.: Mathematical Optimization and Economic Theory. Society for Industrial and Applied Mathematics, Philadelphia (2002)
8. Jensen, F., Nielsen, M., Nielsen, R.: Increased competition for aquaculture from fisheries: does improved fisheries management limit aquaculture growth? Fisheries Res. 159, 25–33 (2014)
9. Kobayashi, M., Msangi, S., Batka, M., Vannuccini, S., Dey, M.M., Anderson, J.L.: Fish to 2030: the role and opportunity for aquaculture. Aquacult. Econ. Manag. 19(3), 282–300 (2015)
10. Levhari, D., Mirman, L.J.: The great fish war: an example using a dynamic Cournot-Nash solution. Bell J. Econ. 11(1), 322–334 (1980)
11. Liu, Y., Volpe, J., Sumaila, U.R.: Ecological and economic impact assessment of sablefish aquaculture in British Columbia. Fisheries Centre, University of British Columbia (2005)
12. Martinell, D.P., Cashion, T., Parker, R., Sumaila, U.R.: Closing the high seas to fisheries: possible impacts on aquaculture. Marine Policy 115, 103854 (2020)
13. McDermott, G.R., Meng, K.C., McDonald, G.G., Costello, C.J.: The blue paradox: preemptive overfishing in marine reserves. Proc. Natl. Acad. Sci. 116(12), 5319–5325 (2019)
14. Natale, F., Hofherr, J., Fiore, G., Virtanen, J.: Interactions between aquaculture and fisheries. Marine Policy 38, 205–213 (2013)
15. Safran, P. (ed.): Fisheries and Aquaculture, Volume V. EOLSS Publications (2009)
16. Van der Ploeg, F., Withagen, C.: Is there really a Green Paradox? J. Environ. Econ. Manag. 64(3), 342–363 (2012)
Ordering in Games with Reduced Memory and Planning Horizon of Players

Denis N. Fedyanin

Abstract We suggest and investigate a model of generational change for Cournot competition with predictions and memory. We then describe a general method to calculate equilibria and discuss its weak and strong points. Numerical experiments were conducted and confirmed the importance of periodic solutions, and analytical solutions were found for some periodic cases. These results appear to lay a foundation for solving control problems and for a better understanding of generalizations of the Stackelberg game.

Keywords Cournot competition · Epistemic models · Memory restrictions · Theory of mind

1 Introduction

Let n agents choose their strategies in a given order. Every agent observes the strategies of h previous agents and believes that the game will stop after the next p agents have chosen their strategies. We use "agents" and "players" as synonyms. The main aim of our research is to construct an epistemic model for the game and find its equilibria. We use a model from the theory of mind, based on the storyboard of the Sally-Ann (location false belief) task [11], an experiment showing that people tend to treat the non-observable as not having happened. Let some actions happen in a room. People who are in the room know who is in the room and which actions have happened. Suppose there are pairs (action, time), and each agent i has a subset of times during which she is present in the room. We can then construct the beliefs of each agent.
D. N. Fedyanin () ICS RAS, Moscow, Russian Federation HSE University, Moscow, Russian Federation © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 L. A. Petrosyan et al. (eds.), Frontiers of Dynamic Games, Trends in Mathematics, https://doi.org/10.1007/978-3-030-93616-7_5
Stackelberg's game is an example, if we put $n = 2$, $p = 1$, $h = 1$ [7]. The first player chooses his production volume $x_1$; then the second player chooses his production volume $x_2$. The utilities of the players are $u_i = x_i(A - x_1 - x_2) - c_i(x_i)$. The first player in a Stackelberg game looks at the last known agent and predicts her choice by the best response; this choice depends on the strategy of the previous player. The Cournot competition, which is related to the Stackelberg game [2], produces a system of linear equations. The solution depends on the observed strategies of the agents, and the dependence is a linear function. Using linear recursive methods, we can construct an analytical solution that is a polynomial combination of power functions. We suggest the following modifications: assign two parameters, memory and planning horizon (prediction) p, to each player, build the corresponding epistemic models for the players, and calculate the equilibria (Figs. 1 and 2). The activity of the agents is synchronized by a step generator. The game can be interpreted as follows: the duration of an action is h, the duration of an agent's presence in the room is p, and a new agent enters the room every step. It can also be interpreted as a cultural shift during the change of generations. We consider only the case in which the utility function of every player depends only on the player's own strategy and on the sum of the strategies of all players. We modify a game of this class and focus on the modified game. We suggest three modifications. Following the idea of the Stackelberg competition model, we introduce an ordering of the players' decisions. We consider the ordering as a parameter and investigate the role of this parameter in the players' decisions and utility functions [1, 6, 9, 10]. The second and third modifications assign two types to each player: a player has memory and planning horizons. Memory is how many preceding players' decisions the player can observe or remember.
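For the quadratic-cost specialization $c_i(x_i) = x_i^2/r$ used later in the paper, the Stackelberg case $n = 2$, $p = 1$, $h = 1$ can be solved by backward induction; the sketch below is an illustration with values of our choosing, and the grid search is only a brute-force check of the leader's first-order condition.

```python
# Sketch: Stackelberg duopoly (n = 2, p = 1, h = 1) with utilities
# u_i = x_i (A - x1 - x2) - x_i^2 / r, solved by backward induction.
A, r = 1.0, 1.0
k = r / (2 * (r + 1))            # follower's best-response slope

def follower(x1):                # x2 = argmax u2 given x1
    return k * (A - x1)

# Leader: maximize u1(x1) = x1 (A - x1 - follower(x1)) - x1^2 / r.
# FOC: (1 - k)(A - 2 x1) - 2 x1 / r = 0.
x1_star = (1 - k) * A / (2 * (1 - k) + 2 / r)
x2_star = follower(x1_star)

# Brute-force check of the leader's optimum on a fine grid.
def u1(x1):
    return x1 * (A - x1 - follower(x1)) - x1 ** 2 / r

x1_grid = max((i * 1e-5 for i in range(100001)), key=u1)

assert abs(x1_grid - x1_star) < 1e-3
print(x1_star, x2_star)
```

With $A = r = 1$ this gives $x_1^* = 3/14 \approx 0.2143$ for the leader and $x_2^* \approx 0.1964$ for the follower.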
If a player makes her decision late enough and her memory is small enough, then the first player's strategy is neglected by the player. For the sake of simplicity, we assume that the player behaves as if the first player had chosen the zero strategy. The planning horizon is how far the player looks into the future, i.e., when the game will stop from her point of view. There are ways to handle players' beliefs about other players' planning horizons, similar to k-level epistemic models; however, we avoid these difficulties in our model. We use the assumption that each player fixes her belief about when the game starts and stops and considers it accurate and shared knowledge [3, 5, 8]. We found an equilibrium for several general cases of the modified game and investigated these modes. We can use the linear recursive method to find the solution when we have memory h, equal types r, and prediction horizon p. The general method is then to predict the best action of agent j + p; his action depends on the actions of all previous agents, each of which is in turn a function of its own predecessors, and so on. From this we obtain the best responses (Figs. 3, 4, and 5).
Fig. 1 Schematic illustration of the Sally-Ann false-belief theory-of-mind task from Frontiers in Human Neuroscience [4]. Panels: (1) Sally has a black box and Ann has a white box; Sally has a white marble that she puts in her box. (2) Sally goes for a walk. (3) Ann takes the marble out of Sally's box and puts it in her own box. (4) Sally comes back and wants to play with the marble. (5) Where will Sally look for her marble? Adults usually say that Sally will look in the black box, though small children think that Sally will look in the white box first [11]
2 Model of the Game

There is a set $N = \{1, \dots, m\}$ of agents; each agent $i$ has a type $(p, h)$. The actual utility function of agent $i$ is

$$f_i = x_i\left(A - \sum_{j \in N} x_j\right) - \frac{x_i^2}{r_i}, \qquad (1)$$
Fig. 2 Schematic illustration of the Sally-Ann false-belief theory-of-mind task from Frontiers in Human Neuroscience [4]. Adults usually say that Sally will look in the black box, though small children think that Sally will look in the white box first [11]
Fig. 3 Structure of the scope of an agent k+3 within the chain … ← k ← (k+1) ← (k+2) ← (k+3) ← (k+4) ← (k+5) ← …: the memory window lies behind the agent and the planning horizon lies ahead

Fig. 4 Scopes of agent k+3 and of the phantom agents in his mind

Fig. 5 Scopes of agent k+3 and of the phantom agents in his mind (cont.)
where $A > 0$ is the market volume, $r_i$ is a parameter showing how expensive it is for the agent to increase production, and $x_i \geq 0$ is the production volume chosen by agent $i$. Agent $i+1$ chooses his strategy right after agent $i$, and agent 0 chooses his strategy first. However, agent $i$ has a different belief about his
utility function. He believes that it is

$$f_i = x_i\left(A - \sum_{j \in \{\max\{0,\, i-h\},\dots,\, i+p\}} x_j\right) - \frac{x_i^2}{r_i}. \qquad (2)$$
Let us denote $S_i = \{\max\{0,\, i-h\}, \dots, i+p\}$. For any sequence of agents from $N$, say $j_1, \dots, j_m$, agent $j_1$ believes that agent $j_2$ believes that … agent $j_m$ has the utility function

$$f_{j_m} = x_{j_m}\left(A - \sum_{j \in \bigcap_{b \in \{j_1,\dots,j_m\}} S_b} x_j\right) - \frac{x_{j_m}^2}{r_{j_m}}. \qquad (3)$$

The difficulty is to handle all the beliefs carefully in order to find an equilibrium.
3 Method of Equilibria Calculation

Agent $i$ starts his calculation from the last agent in his scope, agent $i+p$. Agent $i+p$ makes his decision based on the history observable from agent $i$'s point of view, which depends on the memory $h$. It is described by the best-response expression

$$x_{i+p} = BR\!\left(A - \sum_{k=i+p-h}^{i+p-1} x_k\right). \qquad (4)$$

Then we calculate the response of the preceding agent, $i+p-1$, for whom we have

$$x_{i+p-1} = BR\!\left(A - \sum_{k=i+p-h-1}^{i+p-2} x_k - x_{i+p}\right), \qquad (5)$$

and then replace the decision of agent $i+p$ by its expression from the previous calculation:

$$x_{i+p-1} = BR\!\left(A - \sum_{k=i+p-h-1}^{i+p-2} x_k - BR\!\left(A - \sum_{k=i+p-h}^{i+p-1} x_k\right)\right). \qquad (6)$$
For the sake of simplicity, in what follows we consider only the case $r_i = r_j = r$.
The expression can be expanded using the formula for the best response in our game:

$$x_{i+p-1} = BR\!\left(A - \sum_{k=i+p-h-1}^{i+p-2} x_k - \frac{Ar}{2(r+1)} + \frac{r}{2(r+1)}\sum_{k=i+p-h}^{i+p-2} x_k + \frac{r}{2(r+1)}\, x_{i+p-1}\right). \qquad (7)$$

We use the following property of the best response in our game to simplify some results:

$$BR(A+s) = BR(A) + \frac{sr}{2(r+1)}. \qquad (8)$$
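Property (8) is just the linearity of the best response $BR(z) = \frac{rz}{2(r+1)}$ in this game; a one-line check with illustrative values:

```python
# Sketch: verify the shift property (8), BR(A + s) = BR(A) + s*r/(2(r+1)),
# for the linear best response BR(z) = r*z / (2(r+1)) of this game.
r = 0.3

def BR(z):
    return r * z / (2 * (r + 1))

for A in (0.0, 1.0, 5.0):
    for s in (-0.4, 0.1, 2.0):
        assert abs(BR(A + s) - (BR(A) + s * r / (2 * (r + 1)))) < 1e-12
```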
We get

$$x_{i+p-1} = BR\!\left(A - \sum_{k=i+p-h-1}^{i+p-2} x_k - \frac{Ar}{2(r+1)} + \frac{r}{2(r+1)}\sum_{k=i+p-h}^{i+p-2} x_k + \frac{r}{2(r+1)}\, x_{i+p-1}\right), \qquad (9)$$

$$x_{i+p-1} = BR\!\left(A - \sum_{k=i+p-h-1}^{i+p-2} x_k - \frac{Ar}{2(r+1)} + \frac{r}{2(r+1)}\sum_{k=i+p-h}^{i+p-2} x_k\right) + \frac{r^2}{4(r+1)^2}\, x_{i+p-1}, \qquad (10)$$

$$x_{i+p-1} = \frac{4(r+1)^2}{4(r+1)^2 - r^2}\, BR\!\left(A - \sum_{k=i+p-h-1}^{i+p-2} x_k - \frac{Ar}{2(r+1)} + \frac{r}{2(r+1)}\sum_{k=i+p-h}^{i+p-2} x_k\right), \qquad (11)$$

and then

$$x_{i+p-1} = \frac{4(r+1)^2}{4(r+1)^2 - r^2}\, BR\!\left(\frac{r+2}{2(r+1)}\left(A - \sum_{k=i+p-h}^{i+p-2} x_k\right) - x_{i+p-h-1}\right). \qquad (12)$$

These expressions illustrate the difficulty of the calculations for a large horizon p. One can show that an equilibrium strategy of an agent is a linear combination of the strategies of the others. Since the utility function has the form

$$f_k = x_k\left(S_k - \sum_{j=i_1^k}^{i_2^k} x_j\right) - \frac{x_k^2}{r}, \qquad (13)$$

we have, at the extremum point,

$$\frac{\partial f_k}{\partial x_k} = \left(S_k - \sum_{j=i_1^k}^{i_2^k} x_j - x_k \sum_{j=i_1^k}^{i_2^k} \frac{\partial x_j}{\partial x_k}\right) - \frac{2x_k}{r}. \qquad (14)$$
We can define

$$x_k = \sum_{j=i_1^k}^{i_2^k} q_j^k x_j, \qquad (15)$$

which, due to the linearity of the best response, leads to

$$\frac{\partial x_j}{\partial x_k} = q_k^j, \qquad (16)$$

$$q_k^j = 0, \quad j < k. \qquad (17)$$

So we have

$$\frac{\partial f_k}{\partial x_k} = \left(S_k - \sum_{j=i_1^k,\, j\neq k}^{i_2^k} x_j - x_k \sum_{j=i_1^k,\, j\neq k}^{i_2^k} q_k^j - x_k\right) - \frac{2x_k}{r} = 0. \qquad (18)$$

Let us find the extremum point for the next agent:

$$S_k - \sum_{j=i_1^k,\, j\neq k}^{i_2^k} x_j = x_k\left(\frac{2}{r} + \sum_{j=i_1^k,\, j\neq k}^{i_2^k} q_k^j + 1\right). \qquad (19)$$

The extremum point is

$$x_k = \left(\frac{2}{r} + \sum_{j=i_1^k,\, j\neq k}^{i_2^k} q_k^j + 1\right)^{-1}\left(S_k - \sum_{j=i_1^k,\, j\neq k}^{i_2^k} x_j\right). \qquad (20)$$

So we can explicitly write the expression for calculating the other coefficients if we are given some initial ones:

$$q_j^k = -\left(\frac{2}{r} + \sum_{j'=i_1^k,\, j'\neq k}^{i_2^k} q_k^{j'} + 1\right)^{-1}, \quad j < k. \qquad (21)$$

These coefficients for the specific example are shown in the tables of the numerical section.
3.1 Numerical Example

Solving the system for arbitrary r is very complicated. However, it is still linear and simple for any given type r, and thus we have conducted three experiments for different fixed r. The general form might be calculated recursively using the theory of linear recursive series. Details of one of these experiments are shown in Tables 1, 2, and 3. The solution has the form

$$x_i = c_3(-0.0595801 - 0.558819i)^t + c_1(-0.530537)^t + c_2\, 0.744028^t + c_4(-0.0595801 + 0.558819i)^t - 0.132865$$

for $r = 0.3$, $h = 3$, $p = 6$, where the constants should be determined by three initial strategies (Fig. 6).
Table 1 Partial derivatives $g_i^j$ of strategies by the choices of previous players in the scope of the agent k

        A      xk−3   xk−2   xk−1   xk     xk+1   xk+2
xk+6    0.24   0.00   0.00   0.00   0.00   0.00   0.00
xk+5    0.20   0.00   0.00   0.00   0.00   0.00  −0.27
xk+4    0.18   0.00   0.00   0.00   0.00  −0.29  −0.23
xk+3    0.16   0.00   0.00   0.00  −0.31  −0.26  −0.20
xk+2    0.13   0.00   0.00  −0.30  −0.25  −0.21   0.00
xk+1    0.10   0.00  −0.30  −0.25  −0.21   0.00   0.00
xk      0.09  −0.30  −0.24  −0.20   0.00   0.00   0.00

Table 2 Partial derivatives $g_i^j$ of strategies by the choices of previous players in the scope of a player k (cont.)

        xk+3   xk+4   xk+5   xk+6
xk+6   −0.24  −0.24  −0.24   0.00
xk+5   −0.20  −0.20   0.00   0.00
xk+4   −0.18   0.00   0.00   0.00
xk+3    0.00   0.00   0.00   0.00
xk+2    0.00   0.00   0.00   0.00
xk+1    0.00   0.00   0.00   0.00
xk      0.00   0.00   0.00   0.00

Table 3 Partial derivatives $g_i^k$ of the strategies of player k by his observed strategies of previous players, for different types r

r       A      xk−3   xk−2   xk−1
0.10    0.04  −0.05  −0.04  −0.04
0.20    0.05  −0.09  −0.08  −0.07
0.30    0.07  −0.12  −0.11  −0.10
0.40    0.07  −0.16  −0.14  −0.12
0.50    0.08  −0.19  −0.16  −0.14
0.60    0.08  −0.22  −0.19  −0.16
0.70    0.08  −0.24  −0.21  −0.18
0.80    0.09  −0.27  −0.23  −0.19
0.90    0.09  −0.30  −0.24  −0.20
4 No Predictions

We now consider a zero prediction horizon, meaning that an agent believes the game will stop right after his own choice. There is an example with memory h, horizon p, and period T, where "period" means that $x_{T+k} = x_k$ for any k (Fig. 7).
Fig. 6 Strategies for three examples
Fig. 7 Agent k+3. Strategies and periodic properties (the period T links $x_k$ to $x_{T+k}$ within the chain … ← $x_k$ ← $x_{k+1}$ ← $x_{k+2}$ ← $x_{k+3}$ ← $x_{k+4}$ ← $x_{k+5}$ ← …)
4.1 Periodic Solutions When the Period T = 1

We consider equal strategies of the agents, which corresponds to period T = 1. The best response of agent i is

$$x_i^* = \frac{r}{2(r+1)}\left(A - \sum_{j=i-h}^{i-1} x_j^*\right). \qquad (22)$$

Since the strategies are all equal, the sum of h of them is just the strategy multiplied by h, so

$$x_i^* = \frac{r}{2(r+1)}\left(A - h x_i^*\right). \qquad (23)$$

This can be rewritten in a simpler form as a function of h,

$$x_i^*(h) = \frac{Ar}{2(r+1) + rh}, \qquad (24)$$

and as a function of r,

$$x_i^*(r) = \frac{A}{2+h}\left(1 - \frac{2}{(2+h)r + 2}\right). \qquad (25)$$

These expressions help to construct optimal control if the control centre can control h or r.
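The steady state (24) can also be seen in simulation: with zero prediction, each agent best-responds to the sum of the last h observed strategies (missing early agents treated as zero, as assumed in the introduction). With the illustrative values $A = 1$, $r = 0.01$, $h = 10$, the history converges quickly to $x^* = \frac{Ar}{2(r+1)+rh}$, matching the levels visible in Fig. 10.

```python
# Sketch: simulate the p = 0 dynamics x_i = r/(2(r+1)) * (A - sum of the
# last h strategies) and compare the limit with the T = 1 value (24).
A, r, h, steps = 1.0, 0.01, 10, 200
c = r / (2 * (r + 1))

xs = []
for i in range(steps):
    window = xs[max(0, i - h):i]        # absent predecessors count as zero
    xs.append(c * (A - sum(window)))

x_star = A * r / (2 * (r + 1) + r * h)  # Eq. (24)
assert abs(xs[-1] - x_star) < 1e-9
print(x_star)  # ≈ 0.004717
```

Here the map is a strong contraction (c·h ≈ 0.05), so the damped oscillations die out within a few steps; for larger r the oscillations persist longer, as in Figs. 8 and 9.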
4.2 Periodic Solutions for Even Memory and Period T = 2

Let us assume that the memory parameter is even, so we can denote $h = 2k$, where $k$ is a natural number. The sum of the two strategies satisfies

$$x^* + y^* = \left(\frac{r_x}{2(r_x+1)} + \frac{r_y}{2(r_y+1)}\right)\left(A - k(x^* + y^*)\right). \qquad (26)$$

This leads to expressions for these strategies based on the expression of the best response:

$$x^* = \frac{r_x A}{2(r_x+1)}\left(1 - \frac{k\left(\frac{r_x}{2(r_x+1)} + \frac{r_y}{2(r_y+1)}\right)}{1 + k\left(\frac{r_x}{2(r_x+1)} + \frac{r_y}{2(r_y+1)}\right)}\right), \qquad (27)$$

$$y^* = \frac{r_y A}{2(r_y+1)}\left(1 - \frac{k\left(\frac{r_x}{2(r_x+1)} + \frac{r_y}{2(r_y+1)}\right)}{1 + k\left(\frac{r_x}{2(r_x+1)} + \frac{r_y}{2(r_y+1)}\right)}\right). \qquad (28)$$

We have

$$x_i^* = \frac{r_x}{2(r_x+1)}\left(A - k x_i^* - k x_{i+1}^*\right), \qquad (29)$$

$$x_{i+1}^* = \frac{r_y}{2(r_y+1)}\left(A - k x_i^* - k x_{i+1}^*\right), \qquad (30)$$

$$\frac{x_i^*}{y_i^*} = \frac{r_x(r_y+1)}{r_y(r_x+1)}, \qquad (31)$$

if $k = 1$. If $r_x = r_y$, there are no $T = 2$ solutions that are not $T = 1$ solutions.
4.3 Periodic Solutions for Odd Memory and Period T = 2

Let us assume that the memory parameter is odd, so we denote $h = 2k+1$, and denote the strategies

$$x^* = x_i^*, \qquad (32)$$

$$y^* = x_{i+1}^*. \qquad (33)$$

Using the expression for the best response, one can write the expressions for these strategies separately:

$$x^* = \frac{r}{2(r+1)}\left(A - (k+1)x^* - k y^*\right), \qquad (34)$$

$$x^* - y^* = \frac{r}{2(r+1)}\left(y^* - x^*\right), \qquad (35)$$

$$y^* = \frac{r}{2(r+1)}\left(A - k x^* - (k+1) y^*\right). \qquad (36)$$

There are still no $T = 2$ solutions that are not $T = 1$ solutions unless $\frac{r}{2(r+1)} = -1$. However, this happens only when $r = -2/3$, which is not acceptable.
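The collapse to T = 1 can also be observed numerically: iterating the pair (34) and (36) from deliberately unequal starting points converges to equal strategies at the level (24). The parameter values below are illustrative; convergence holds here because the iteration map is a contraction.

```python
# Sketch: iterate the odd-memory T = 2 system (34), (36) and observe the
# collapse to the T = 1 solution (24). Parameter values are illustrative.
A, r, k = 1.0, 0.5, 2            # memory h = 2k + 1 = 5
c = r / (2 * (r + 1))

x, y = 0.2, 0.0                  # deliberately unequal start
for _ in range(500):             # simultaneous update of both strategies
    x, y = c * (A - (k + 1) * x - k * y), c * (A - k * x - (k + 1) * y)

h = 2 * k + 1
x_star = A * r / (2 * (r + 1) + r * h)   # Eq. (24)
assert abs(x - y) < 1e-9                  # no genuine T = 2 cycle survives
assert abs(x - x_star) < 1e-9
print(x)  # ≈ 0.0909
```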
4.4 Example for the Periodic Solutions When Period T = 3

Let us denote the memory $h = Tk + m$; then one can write the following expression:

$$x_i^* = \frac{r_i}{2(r_i+1)}\left(A - k \sum_{j=i}^{i+T-1} x_j^* - \sum_{j=i}^{i+m-1} x_j^*\right). \qquad (37)$$

It leads to

$$\sum_{j} x_j^* = \sum_{i} \frac{r_i}{2(r_i+1)}\left(A - k \sum_{j} x_j^*\right) - m \sum_{j} x_j^*, \qquad (38)$$

$$\sum_{j} x_j^* = \frac{A \sum_i \frac{r_i}{2(r_i+1)}}{m + 1 + k \sum_i \frac{r_i}{2(r_i+1)}} = \frac{A}{k}\left(1 - \frac{m+1}{m + 1 + k \sum_i \frac{r_i}{2(r_i+1)}}\right). \qquad (39)$$

If $h \bmod 3 = 1$, then we have

$$x_i^* = \frac{r}{2(r+1)}\left(A - k x_i^* - k x_{i+1}^* - k x_{i+2}^* - x_i^*\right), \qquad (40)$$

$$x_{i+1}^* = \frac{r}{2(r+1)}\left(A - k x_i^* - k x_{i+1}^* - k x_{i+2}^* - x_{i+1}^*\right), \qquad (41)$$

$$x_{i+2}^* = \frac{r}{2(r+1)}\left(A - k x_i^* - k x_{i+1}^* - k x_{i+2}^* - x_{i+2}^*\right). \qquad (42)$$

There are no $T = 3$ solutions here that are not $T = 1$ solutions. If $h \bmod 3 = 0$, then

$$x_i^* = x_{i+1}^* = x_{i+2}^* = \frac{r}{2(r+1)}\left(A - k x^* - k y^* - k z^*\right), \qquad (43)$$

which means there are no $T = 3$ solutions here that are not $T = 1$ solutions.
4.5 Numerical Examples Which Show the Importance of Periodic Solutions

Modelling shows that zero prediction generates periodic solutions without imposing any special periodicity restrictions (Figs. 8, 9, and 10).
5 Prediction of One Step

The utility function can be rewritten in this case as follows:

$$f_i = x_i\left(A - \sum_{j=i-h}^{i} x_j - \frac{r_{i+1}}{2(r_{i+1}+1)}\left(A - \sum_{j=i-h}^{i} x_j + x_{i-h}\right)\right) - \frac{x_i^2}{r_i}. \qquad (44)$$

The best-response function for this case is

$$x_i = \frac{A - \sum_{j=i-h}^{i-1} x_j - \frac{r_{i+1}}{2(r_{i+1}+1)}\left(A - \sum_{j=i-h}^{i-1} x_j + x_{i-h}\right)}{2 - \frac{r_{i+1}}{r_{i+1}+1} + \frac{2}{r_i}}, \qquad (45)$$

$$x_i = \frac{\left(A - \sum_{j=i-h}^{i-1} x_j\right)\left(1 - \frac{r_{i+1}}{2(r_{i+1}+1)}\right) - \frac{r_{i+1}}{2(r_{i+1}+1)}\, x_{i-h}}{2 - \frac{r_{i+1}}{r_{i+1}+1} + \frac{2}{r_i}}. \qquad (46)$$
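The closed form (46) can be checked against a direct numerical maximization of the utility (44); the history values below are arbitrary illustrative choices.

```python
# Sketch: check the one-step-prediction best response (46) by brute-force
# maximization of the utility (44). History values are arbitrary.
A, r = 1.0, 0.5
k = r / (2 * (r + 1))                 # successor's best-response slope
S = 0.3                               # sum x_{i-h} + ... + x_{i-1}
x_old = 0.05                          # x_{i-h}, dropped from the successor's window

# Closed form (46): x = ((1-k)(A - S) - k x_old) / (2(1-k) + 2/r)
x_cf = ((1 - k) * (A - S) - k * x_old) / (2 * (1 - k) + 2 / r)

def f(x):                             # utility (44), with x_{i+1} substituted
    succ = k * (A - S - x + x_old)    # phantom successor's myopic reply
    return x * (A - S - x - succ) - x ** 2 / r

x_grid = max((i * 1e-5 for i in range(100001)), key=f)
assert abs(x_grid - x_cf) < 1e-3
print(x_cf)  # ≈ 0.10147
```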
Fig. 8 Histories: 10, 40, 100, 200 steps for type r = 0.99 (panels h = 10, 40, 100, 200; p = 0). Values of strategies are on the vertical axis, and the number of the agent is on the horizontal axis

Fig. 9 Histories: 10, 40, 100, 200 steps for type r = 0.50 (panels h = 10, 40, 100, 200; p = 0). Values of strategies are on the vertical axis, and the number of the agent is on the horizontal axis

Fig. 10 Histories: 10, 40, 100, 200 steps for type r = 0.01 (panels h = 10, 40, 100, 200; p = 0). Values of strategies are on the vertical axis, and the number of the agent is on the horizontal axis
5.1 Example for the Period T = 3

Let $r_i = r$ and let the period be 3; then we can modify the expression,

$$x_i = \frac{\left(A - k\sum_{j=i}^{i+T} x_j - \sum_{j=i}^{i+m} x_j\right)\left(1 - \frac{r_{i+1}}{2(r_{i+1}+1)}\right) - \frac{r_{i+1}}{2(r_{i+1}+1)}\, x_{i-h}}{2 - \frac{r_{i+1}}{r_{i+1}+1} + \frac{2}{r_i}}, \qquad (47)$$

and simplify it:

$$x_i = \frac{\left(A - k\sum_{j=i}^{i+T} x_j - \sum_{j=i}^{i+m} x_j\right)\left(1 - \frac{r}{2(r+1)}\right) - \frac{r}{2(r+1)}\, x_{i-h}}{2 - \frac{r}{r+1} + \frac{2}{r}}. \qquad (48)$$

The sum $\sum_{j=i}^{i+T} x_j$ then equals the sum of

$$\left(A - k\sum_{j=i}^{i+T} x_j - m\sum_{j=i}^{i+T} x_j\right)\frac{1 - \frac{r}{2(r+1)}}{2 - \frac{r}{r+1} + \frac{2}{r}}\, T \qquad (49)$$

and

$$-\frac{\frac{r}{2(r+1)}}{2 - \frac{r}{r+1} + \frac{2}{r}}\sum_{j=i}^{i+T} x_j, \qquad (50)$$

which means that

$$\sum_{j=i}^{i+T} x_j = \frac{AT\,\frac{1 - \frac{r}{2(r+1)}}{2 - \frac{r}{r+1} + \frac{2}{r}}}{1 + \frac{\frac{r}{2(r+1)}}{2 - \frac{r}{r+1} + \frac{2}{r}} + (k+m)\,T\,\frac{1 - \frac{r}{2(r+1)}}{2 - \frac{r}{r+1} + \frac{2}{r}}}, \qquad (51)$$

so we have

$$\sum_{j=i}^{i+T} x_j = \frac{AT\left(1 - \frac{r}{2(r+1)}\right)}{T - (k+m)\left(1 - \frac{r}{2(r+1)}\right)}. \qquad (52)$$

This expression provides an explicit form for the sum, which is extremely useful for solving control problems in which the control centre's utility function is, as usual, a sum of the agents' strategies.
6 Discussion

Using a generalization of a classic experiment from the theory of mind, we constructed epistemic models from bounded observation. The construction is based on the assumption that agent i considers that agent j observes an event p if agent i observes the event and agent i observes that agent j observes it. This is a very strong assumption and has not been verified in experiments with more than three agents, but there are reasons to suppose that it holds; we plan to conduct experiments to check it. We found that this idea can be applied to a more general case if we replace agents by generations who act at the same time, e.g., over years or decades. We found that the modelling yields solutions similar to periodic ones, but an open question remains: is it possible to achieve absolutely periodic dynamics, and what initial and final boundary conditions are needed to obtain it?

Acknowledgments This work/article is an output of a research project implemented as part of the Basic Research Program at the National Research University Higher School of Economics (HSE University).
D. N. Fedyanin
Problems of Calculation Equilibria in Stackelberg Nonlinear Duopoly

Mikhail I. Geraskin
Abstract A duopoly model with a linear demand function and nonlinear cost functions of the agents is considered. The game with multilevel Stackelberg leadership is investigated. We analyze conjectural variations, i.e., the agent's assumptions about changes in the counterparty's actions that optimize the latter's utility function. For an arbitrary Stackelberg leadership level, a formula for calculating the conjectural variations of the agents is derived. The main insights are as follows: (1) the variations depend not only on the leadership level but also on the product of the concavity/convexity indicators of the cost functions; (2) if at least one of the agents has a concave cost function, then the variations can be not only negative but also positive, and are not bounded in absolute value, i.e., bifurcations can occur.

Keywords Duopoly · Stackelberg game · Power cost function · Multi-level leadership
1 Introduction

Conjectural variations [2] are a key aspect of the firms' game in the duopoly market and the basis for calculating the Cournot-Nash equilibrium [4, 14]. In the case of volume competition, the conjectural variation characterizes the agent's expectation of the change in the counterparty's action (i.e., supply) that optimizes the latter's utility function given the chosen action of the former. In the linear duopoly model, if the counterparty ignores the agent in accordance with the Cournot model, then the agent becomes the Stackelberg leader [16]. The leader chooses the optimal response on the basis of a negative conjectural variation, i.e., she assumes that the counterparty's supply decreases as the leader's supply grows. In the game of "struggle for leadership" [8], both agents choose actions taking into account
M. I. Geraskin
Samara National Research University named after academician S.P. Korolev, Institute of Economics and Management, Samara, Russia
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
L. A. Petrosyan et al. (eds.), Frontiers of Dynamic Games, Trends in Mathematics,
https://doi.org/10.1007/978-3-030-93616-7_6
the negative conjectural variations. In this case, both agents become Stackelberg leaders, and their supply volumes increase. As a consequence, in equilibrium the market volume increases and the price decreases. Therefore, in the case of linear cost functions, the negative conjectural variation characterizes the typical optimal response in the duopoly game. However, in the nonlinear model, in the case of concave cost functions, i.e., positive return to scale, atypical optimal responses may occur. We investigate the reasons for the occurrence of positive conjectural variations and pose the following research question: to find the conditions under which these variations are positive and to explain how this phenomenon affects the limits of the variations.

In analytical form, the conjectural variations were calculated in the duopoly model with linear demand and cost functions [1, 5, 9, 11, 12, 17, 18]. In the duopoly model with nonlinear cost functions [3, 13], an analytical form of the variations was derived in games with a first-level Stackelberg leader [7]. In the case of higher-level leadership, the variations were calculated from a system of nonlinear equations at each level. The problem of multilevel leadership is relevant to the analysis of reflexive games [15], in which different systems of agents' assumptions generate a differentiation of the game results. We study the signs of the conjectural variations and the boundaries of their change, which makes it possible to assess the influence of the leadership level on the players' behavior for an arbitrary Stackelberg leadership level.
2 Formulation of Problem

The nonlinear model of choosing the optimal agent actions in the duopoly has the following form:

$$Q_i^* = \arg\max_{Q_i\ge 0}\Pi_i(Q,Q_i) = \arg\max_{Q_i\ge 0}\left[(a-bQ)Q_i - C_{Fi} - B_iQ_i^{\beta_i}\right],\quad i=1,2, \qquad (1)$$

$$Q = Q_1 + Q_2,$$
where $Q_i$, $\Pi_i$ are the action (supply volume) and utility function (profit) of agent $i$; $Q$ is the aggregate market volume; $C_{Fi}>0$, $B_i>0$, $\beta_i\in(0,2)$ are the coefficients of a cost function of the type $C_i(Q_i)=C_{Fi}+B_iQ_i^{\beta_i}$, where $C_{Fi}$ is the fixed cost; $a>0$, $b>0$ are the coefficients of the inverse demand function; and the symbol "*" indicates optimal values. In the general case, the agents' cost functions may be either concave for $0<\beta_i<1$ or convex for $1<\beta_i<2$. The first case corresponds to the positive effect of return to scale and is observed at the stage of a firm's appearance; the second case characterizes the negative effect of return to scale and arises in large firms [19].
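As a numerical illustration of model (1), the sketch below evaluates an agent's profit and computes a zero-level (Cournot, zero conjectural variation) best response by ternary search. All parameter values and the rival's supply are illustrative assumptions, not taken from the chapter.

```python
# Profit of agent i in model (1): Pi_i = (a - b*Q)*Q_i - C_Fi - B_i*Q_i**beta_i.
# Illustrative (assumed) parameters; beta in (1, 2) gives a convex cost function.
a, b = 100.0, 1.0
B, beta, CF = 2.0, 1.5, 10.0

def profit(qi, qj):
    Q = qi + qj                       # aggregate market volume Q = Q1 + Q2
    return (a - b * Q) * qi - CF - B * qi ** beta

def best_response(qj, lo=1e-9, hi=100.0, iters=200):
    """Zero-level (follower) response: maximize the unimodal profit in qi."""
    for _ in range(iters):            # ternary search on a unimodal function
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if profit(m1, qj) < profit(m2, qj):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

qi = best_response(20.0)
# The first-order condition of type (2) with x_ij = 0 should hold at the optimum:
foc = a - b * (qi + 20.0) - b * qi - B * beta * qi ** (beta - 1)
print(round(qi, 3), round(foc, 8))
```

At the returned point the first-order condition vanishes, confirming that the search found the interior optimum.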
The Nash equilibrium in model (1) is determined from the necessary extremum conditions of the following form:

$$\frac{\partial \Pi_i(Q_i, x_{ij})}{\partial Q_i} = 0,\quad i,j=1,2, \qquad (2)$$
where $x_{ij} = \partial Q_j/\partial Q_i$ is the conjectural variation in the reaction equation of agent $i$, i.e., the assumed change in the supply of agent $j$ in response to a unit increment in the supply of agent $i$. The Stackelberg leadership level of the agent is determined as follows. A zero level corresponds to the follower agent $i$; it occurs if, in equation $i$ of system (2), $x_{ij(0)}=0$, where the lower index in brackets indicates the leadership level $r$. The first leadership level of agent $i$ arises if, in equation $i$ of system (2), the variation $x_{ij(1)}$ is calculated by differentiating the other equation of (2) with respect to $Q_i$, with $x_{ji(0)}=0$ in that equation. An arbitrary leadership level $r$ of agent $i$ occurs if, in equation $i$ of system (2), the variation $x_{ij(r)}$ is calculated by differentiating the other equation of (2) with respect to $Q_i$, with $x_{ji}=x_{ji(r-1)}$ in that equation. For model (1) of agent $i$ at leadership level $r$, the solutions of system (2) satisfy the following equations [6]:

$$a - bQ - bQ_i\left(1+x_{ij(r)}\right) - B_i\beta_iQ_i^{\beta_i-1} = 0,\quad Q_i>0,\quad i,j=1,2, \qquad (3)$$
subject to

$$u_i - x_{ij(r)} < 0,\quad i,j=1,2, \qquad (4)$$
taking into account the following notation:

$$u_i = -2 - \frac{B_i\beta_i(\beta_i-1)Q_i^{\beta_i-2}}{b} < 0,\quad i=1,2, \qquad (4a)$$
where $u_i(\cdot)$ is a function characterizing the influence of the nonlinearity of the cost function on the unimodality of the utility function of agent $i$ (when $u_i=-2$, system (3) is linear). Condition (4) characterizes the fulfillment of a sufficient maximum condition (i.e., a unimodality condition) of function (1) of agent $i$. In comparison with the linear model, in which condition (4) has the form $-2 - x_{ij(r)} < 0$, in the nonlinear model (1) the utility function of the agent may not be unimodal for the following reasons: (i) the influence of environmental actions, i.e., situations when $x_{ij(r)} < -2$; (ii) the positive effect of return to scale of the agent's cost function. Therefore, we introduce the assumption that the rate of the marginal cost reduction with increasing return to scale (i.e., $\beta_i < 1$) is less than the rate of decline in the price with increasing supply:

$$|MC_{iQ}| = B_i\beta_i\,|\beta_i-1|\,Q_i^{\beta_i-2} < b.$$

$$x_{i(r)} \begin{cases} <0, & \text{if } f_{i(r-2)}>0 \ \vee\ \left(f_{i(r-2)}<0 \wedge \alpha_{i(r)}<0\right),\\ >0, & \text{if } f_{i(r-2)}<0 \wedge \alpha_{i(r)}>0; \end{cases} \qquad (8a)$$

(b) is bounded in absolute value under the following conditions:

$$|x_{i(r)}| \begin{cases} <|u_j|^{-1}, & \text{if } f_{i(r-2)}>0,\\ <A_{i(r)}^{-1}, & \text{if } f_{i(r-2)}<0, \end{cases} \qquad (8b)$$

where $f_{i(r-2)} = u_i - x_{i(r-2)}$, $\alpha_{i(r)} = |f_{i(r-2)}|^{-1} - |u_j|$, $A_{i(r)} = \min_{x_{i(r-2)}\in\Omega_i(u_1,u_2)} |\alpha_{i(r)}|$, and $\Omega_i(u_1,u_2)$ is the set of possible values of $x_{i(r-2)}$ for given values $u_1$, $u_2$.
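The quantities entering (4a) are straightforward to compute. The sketch below evaluates u_i for two illustrative (assumed) parameter sets, one convex-cost and one concave-cost agent, and checks the follower's unimodality condition (4):

```python
# u_i from (4a): u_i = -2 - B_i*beta_i*(beta_i - 1)*Q_i**(beta_i - 2) / b.
def u(B, beta, Q, b=1.0):
    return -2.0 - B * beta * (beta - 1.0) * Q ** (beta - 2.0) / b

u_convex  = u(B=2.0, beta=1.5, Q=25.0)   # 1 < beta < 2: negative return to scale, u < -2
u_concave = u(B=2.0, beta=0.5, Q=25.0)   # 0 < beta < 1: positive return to scale, u > -2
print(round(u_convex, 3), round(u_concave, 3))

# Unimodality condition (4) for a follower (x_ij(0) = 0): u_i - 0 < 0.
assert u_convex < -2.0 < u_concave < 0.0
```

With concave costs the indicator lies in (−2, 0), which is exactly the regime in which the atypical (positive) variations discussed below can appear.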
Formulas (8) enable us to predict the sign and the limit value of the conjectural variation at leadership level r on the basis of the variation value at leadership level r − 2. The parameter fi(r−2) is positive if the value xi(r−2) at leadership level r − 2 is negative and exceeds |ui| in absolute value, i.e., in the case of significantly negative xi(r−2); if the variation xi(r−2) is negative and near zero (less than |ui| in absolute value) or positive, then the parameter fi(r−2) is negative. If the absolute deviation of |xi(r−2)| from |ui| is small compared to |ui|, then the parameter αi(r) is positive; otherwise this parameter is negative. Consequently, the sign of agent i's variation at leadership level r is determined by the ratio of the parameters ui, uj and the value of the variation at leadership level r − 2 as follows: if the value xi(r−2) is significantly negative, or positive but with the absolute deviation of |xi(r−2)| from |ui| great in comparison with |uj|, then the variation xi(r) is negative; if xi(r−2) has a positive or close-to-zero negative value and the absolute deviation of |xi(r−2)| from |ui| is small compared to |uj|, then the variation xi(r) is positive. In absolute value, the variation xi(r) is limited by the small number |uj|^(−1) in the case of significantly negative xi(r−2); if xi(r−2) has a positive or close-to-zero negative value, then the modulus of the variation is limited by the large number Ai(r)^(−1).

Next, we carry out numerical experiments. If at least one of the values |ui| < 2, i = 1, 2, i.e., if at least one of the agents has positive return to scale, the conjectural variations can be positive and exceed unity in absolute value [6]. According to (5a), |ui| ≥ 1, i = 1, 2; therefore, Table 1 presents the values of the parameter u2 for different values of u1 and y, with the allowable values shown in bold. Further, we prove that, for r = 3, 4, 5, bifurcations are observed.
In Table 1, for r = 3, 4, 5, the values of the co-concavity indicator are italicized. With an increase in the co-concavity indicator y, the values of u1 and u2 converge to unity in absolute value, i.e., the concavity of the agents' cost functions increases. A decrease in the parameter y corresponds to a weakening of the concavity and a transition to the linear dependence; moreover, one of the agents can have a convex cost function. In accordance with formula (7), for a given value of y, if P(r+1) is close to zero, peaks of the conjectural variation are observed (Fig. 1). In this case, bifurcation points with respect to r occur: at y = 0.5, the peaks of the variations for r = 3, 7, 11 jump discontinuously from −∞ to ∞; this process is not shown on the graph because of the discreteness of r. With an increase in the value of y, the oscillatory character of the dependence P(r)(y) strengthens, and the number of zeros of this function increases (Fig. 2). Next, we consider the dependences of P(r) on the parameter y for the leadership levels r = 1, . . . , 11, which are presented in Figs. 3, 4, and 5. Analysis of the graphs demonstrates that the dependencies P(r)(y) are smooth functions, which makes it possible to estimate their zeros and the ranges of their signs. For a given leadership level r, the equality P(r+1)(y) = 0 corresponds to an infinitely large estimate of the conjectural variation, i.e., ψ(r∞)(y∞) → ∞, where y∞, r∞ are the values of the co-
Table 1 Dependency u2(u1, y)

| y \ u1 | −1    | −1.1  | −1.2  | −1.3  | −1.4  | −1.5  | −1.6  | −1.7  | −1.8  | −1.9  | −2    |
|--------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| 0.251  | −3.98 | −3.62 | −3.32 | −3.06 | −2.85 | −2.66 | −2.49 | −2.34 | −2.21 | −2.10 | −1.99 |
| 0.333  | −3.00 | −2.73 | −2.50 | −2.31 | −2.15 | −2.00 | −1.88 | −1.77 | −1.67 | −1.58 | −1.50 |
| 0.35   | −2.86 | −2.60 | −2.38 | −2.20 | −2.04 | −1.90 | −1.79 | −1.68 | −1.59 | −1.50 | −1.43 |
| 0.382  | −2.62 | −2.38 | −2.18 | −2.01 | −1.87 | −1.75 | −1.64 | −1.54 | −1.45 | −1.38 | −1.31 |
| 0.45   | −2.22 | −2.02 | −1.85 | −1.71 | −1.59 | −1.48 | −1.39 | −1.31 | −1.23 | −1.17 | −1.11 |
| 0.5    | −2.00 | −1.82 | −1.67 | −1.54 | −1.43 | −1.33 | −1.25 | −1.18 | −1.11 | −1.05 | −1.00 |
| 0.55   | −1.82 | −1.65 | −1.52 | −1.40 | −1.30 | −1.21 | −1.14 | −1.07 | −1.01 | −0.96 | −0.91 |
| 0.65   | −1.54 | −1.40 | −1.28 | −1.18 | −1.10 | −1.03 | −0.96 | −0.90 | −0.85 | −0.81 | −0.77 |
| 0.75   | −1.33 | −1.21 | −1.11 | −1.03 | −0.95 | −0.89 | −0.83 | −0.78 | −0.74 | −0.70 | −0.67 |
| 0.85   | −1.18 | −1.07 | −0.98 | −0.90 | −0.84 | −0.78 | −0.74 | −0.69 | −0.65 | −0.62 | −0.59 |
| 0.95   | −1.05 | −0.96 | −0.88 | −0.81 | −0.75 | −0.70 | −0.66 | −0.62 | −0.58 | −0.55 | −0.53 |
| 1      | −1.00 | −0.91 | −0.83 | −0.77 | −0.71 | −0.67 | −0.63 | −0.59 | −0.56 | −0.53 | −0.50 |
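Each entry of Table 1 follows from the definition of the co-concavity indicator y = (u1·u2)^(−1), i.e., u2 = 1/(y·u1); a short script reproduces the table:

```python
# Table 1: u2(u1, y) = 1/(y*u1), since y = 1/(u1*u2).
ys  = [0.251, 0.333, 0.35, 0.382, 0.45, 0.5, 0.55, 0.65, 0.75, 0.85, 0.95, 1.0]
u1s = [-1.0, -1.1, -1.2, -1.3, -1.4, -1.5, -1.6, -1.7, -1.8, -1.9, -2.0]
table = {(y, u1): round(1.0 / (y * u1), 2) for y in ys for u1 in u1s}
print(table[(0.5, -2.0)], table[(0.251, -1.0)], table[(0.35, -1.6)])
```

For example, the entry at y = 0.35, u1 = −1.6 is −1.79.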
Fig. 1 Dependencies of P(r) ,P(r+1) and variations on leadership level at y = 0.505 with bifurcation points at r = 3, 7, 11 (at y = 0.495 values x1(3) , x1(7) , x1(11) are negative)
Fig. 2 Dependences of P(r) on leadership level for different y
concavity indicator y and the leadership level r, for which the variation is unlimited. In addition, the following estimates are important for practice: Y20 is the range of y, for which ψ(r) (y) ≤ 20; Y1 is the range of y, for which ψ(r) (y) ≤ 1; ψmax is the maximum value of the estimate; Y+ is the range of y, for which Si(r) > 0, respectively, Y− = {[0.25; 0.95] \ Y+ } is the range of y, for which Si(r) < 0. We define a pair y∞ , r∞ as a bifurcation point, because at these values a fundamental change in the state of the system occurs.
Fig. 3 Dependences of P(r) on y at leadership levels r = 1, . . . , 5
Fig. 4 Dependences of P(r) on y at leadership levels r = 5, . . . , 8
Table 2 presents a generalization of the estimate ψ and the ranges of the positive variation Y+ according to the data in Figs. 3, 4, and 5 for practically emerging leadership levels. An analysis of the signs and limitations (Table 2) leads to the following conclusions.

1. In the admissible range y ∈ [0.25, 0.95], at the first leadership level the modulus of the variation is limited by unity, i.e., Y1(1) = [0.25, 0.95]; at the second level, the variation is limited by 20, and values less than unity are not achieved, i.e., Y1(2) = ∅, Y20(2) = [0.25, 0.95]. From the third level, the variation can take infinitely large values, i.e., the function P(r+1) has zeros in the range y ∈ [0.25, 0.95]: at r = 3, 4, 5 there is one value y∞, at r = 6, 7, 8 there are two values y∞, and then three values.
Fig. 5 Dependences of P(r) on y at leadership levels r = 8, . . . , 11
2. From the third leadership level, the variation can take infinitely large values, which demonstrates the bifurcation in r. The leadership level of the bifurcation point satisfies the following regularity: r∞(r) = r + (r + 1)n ∀r ≥ 3, where n ∈ Z; and, at y∞, the value ψ(r) takes equal values at the corresponding levels (Fig. 1), i.e., ψ(r)(y∞) = ψ(r+(r+1)n)(y∞) ∀r ≥ 3.

3. The bifurcations occur simultaneously for both agents, because, according to formula (7), the bifurcation factor ψ(r)(y) is the same for both agents.

4. The conjectural variations at the first two leadership levels are negative. From the third leadership level, the variations can be positive. The range of positive values Y+ is limited on one side by the bifurcation point y∞(r) and on the other side either by the right border y = 0.95 for r = 3, 6, 9 or by the bifurcation point of the previous level for r = 4, 5, 7, 8. Consequently, for r = 3, 6, 9, the variations are positive under strong concavity of the cost functions, while for r = 4, 5, 7, 8 this feature arises even under weak concavity.
Table 2 Signs and limitations of conjectural variation

| r | ψmax | Y20 (ψ(r)(y) ≤ 20) | Y1 (ψ(r)(y) ≤ 1) | Y+ | y∞ (ψ → ∞) | r∞ |
|---|------|--------------------|------------------|----|------------|----|
| 1 | 1 | [0.25; 0.95] | [0.25; 0.95] | ∅ | – | – |
| 2 | 20 | [0.25; 0.95] | ∅ | ∅ | – | – |
| 3 | ∞ | [0.25; 0.48] ∪ [0.51; 0.95] | [0.66; 0.95] | (0.5; 0.95] | 0.5 | 3, 7, 11, … = 3 + 4n |
| 4 | ∞ | [0.25; 0.376] ∪ [0.388; 0.95] | [0.44; 0.95] | (0.382; 0.5) | 0.382 | 4, 9, 14, … = 4 + 5n |
| 5 | ∞ | [0.25; 0.33] ∪ [0.336; 0.95] | [0.36; 0.95] | (0.333; 0.382) | 0.333 | 5, 11, 17, … = 5 + 6n |
| 6 | ∞ | [0.25; 0.306] ∪ [0.31; 0.632] ∪ [0.654; 0.95] | [0.323; 0.38] ∪ [0.79; 0.95] | (0.643; 0.95] | 0.308; 0.643 | 6, 13, 20, … = 6 + 7n |
| 7 | ∞ | [0.25; 0.292] ∪ [0.295; 0.494] ∪ [0.506; 0.95] | [0.57; 0.95] | (0.5; 0.643) | 0.293; 0.5 | 7, 15, 23, … = 7 + 8n |
| 8 | ∞ | [0.25; 0.282] ∪ [0.284; 0.422] ∪ [0.436; 0.95] | [0.289; 0.31] ∪ [0.465; 0.65] | (0.426; 0.5) | 0.283; 0.426 | 8, 17, 26, … = 8 + 9n |
| 9 | ∞ | [0.25; 0.276] ∪ [0.277; 0.379] ∪ [0.385; 0.714] ∪ [0.734; 0.95] | [0.85; 0.95] | (0.724; 0.95] | 0.276; 0.382; 0.724 | 9, 19, 29, … = 9 + 10n |
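The bifurcation values y∞ in Table 2 are the zeros of P(r+1) on [0.25, 0.95]. Using the polynomial form of P(r) derived in the Appendix, they can be recovered numerically; the sketch below uses plain bisection:

```python
from math import factorial

def P(r, y):
    """P_(r) in the polynomial form derived in the Appendix."""
    return sum((-1) ** t * y ** t / factorial(t)
               * factorial(r - t - 1) / factorial(r - 2 * t - 1)
               for t in range((r + 1) // 2))

def bisect(f, lo, hi, iters=100):
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

# x_i(r) is unbounded where P_(r+1)(y) = 0 (the bifurcation points of Table 2):
y3 = bisect(lambda y: P(4, y), 0.25, 0.95)   # y_inf(3)
y4 = bisect(lambda y: P(5, y), 0.25, 0.95)   # y_inf(4)
y5 = bisect(lambda y: P(6, y), 0.25, 0.95)   # y_inf(5)
print(round(y3, 3), round(y4, 3), round(y5, 3))
```

The three roots found, 0.5, 0.382, and 0.333, match y∞(3), y∞(4), and y∞(5) in Table 2.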
Fig. 6 Dependences of P(r)/P(r+1) on y at r = 3, 4, 5 with bifurcation points at y∞(3) = 0.5, y∞(4) = 0.382, y∞(5) = 0.333
For r = 3, 4, 5, the bifurcation points are illustrated in Fig. 6: taking into account that $\operatorname{sign} x_{i(r)} = -\operatorname{sign}\left(P_{(r)}/P_{(r+1)}\right)$, the figure proves that in the neighborhood of the bifurcation point the left limit of the variation is negative and the right limit is positive, i.e.,

$$\lim_{y\to y_{\infty[r]}-0} x_{i(r)} = -\infty,\qquad \lim_{y\to y_{\infty[r]}+0} x_{i(r)} = +\infty.$$
According to method (8), an analysis of the signs and limits of the conjectural variation leads to the following conclusions (Fig. 7). First, at r = 6 the variation becomes positive, because at r = 4 the variation belongs to the range $x_{i(r-2)}\in(-1.6, 0)$, i.e.,

$$f_{i(r-2)} < 0 \wedge \alpha_{i(r)} < 0 \Rightarrow x_{i(r)} < 0,\quad r=1,\ldots,5,$$

$$f_{i(r-2)} < 0 \wedge \alpha_{i(r)} > 0 \Rightarrow x_{i(r)} > 0,\quad r=6.$$

Second, at r = 7, 14, the variation is limited by the value $|u_2|^{-1}$, because $f_{i(r-2)}>0$ for r = 7, 14; at the other levels, it is limited by the value $A_{i(r)}^{-1} = 3.41$, because $f_{i(r-2)}<0$ for $r \neq 7, 14$, i.e.,

$$|x_{1(r)}| < |u_2|^{-1} \text{ if } f_{i(r-2)} > 0\ (r = 7, 14);\qquad |x_{1(r)}| < A_{i(r)}^{-1} \text{ if } f_{i(r-2)} < 0\ (r \neq 7, 14).$$
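The picture in Fig. 7 can be reproduced by iterating the recursion behind (9) and (10), x1(r) = 1/(u2 − 1/(u1 − x1(r−2))), from the base cases x1(0) = 0 (follower) and x1(1) = 1/u2. The sketch below runs it at the same u1 = −1.6, u2 = −2:

```python
# Iterate x1(r) = 1/(u2 - 1/(u1 - x1(r-2))) at u1 = -1.6, u2 = -2 (case of Fig. 7).
u1, u2 = -1.6, -2.0
x = {0: 0.0, 1: 1.0 / u2}            # base cases: follower and first-level leader
for r in range(2, 21):
    x[r] = 1.0 / (u2 - 1.0 / (u1 - x[r - 2]))

print([round(x[r], 3) for r in (5, 6, 7, 14)])
```

The run shows negative variations for r = 1, …, 5, the positive jump at r = 6, and small values at r = 7 and r = 14, where f1(r−2) > 0.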
Fig. 7 Signs and limits of first agent’s conjectural variation at u1 = −1.6, u2 = −2
4 Conclusion

The conjectural variation in a duopoly characterizes the agent's optimal response to an increase in the counterparty's supply. As a rule, this variation is negative, because the aggregate demand curve is decreasing: the agent, in order to increase his profit, intends to increase the price by reducing the supply. However, if one of the agents has a concave cost function, the game with multilevel Stackelberg leadership leads to a positive variation. We prove that this transition occurs abruptly, i.e., a bifurcation arises in the system of two agents. We establish two bifurcation factors that must act simultaneously for this discontinuous jump:

1. at the bifurcation leadership level, the variation falls to −∞ and increases to ∞, and then stabilizes at a positive finite value at the next level, if the co-concavity indicator takes a bifurcation value;
2. for the bifurcation co-concavity indicator, the left limit of the variation falls to −∞ and the right limit increases to ∞, if the leadership level corresponds to the bifurcation value.

We explain the reason for the bifurcation as follows. At the appropriate leadership level and for the specific co-concavity, the positive return to scale of the agents leads to the following process. At first, the market volume grows so sharply that it is optimal for both agents to reduce the supply volume without bound. Then the market volume drops so sharply that it is optimal for the agents to increase the supply volume without bound. Thus, our study demonstrates the cause of atypical responses in the duopoly. In addition, for the practically applicable leadership levels, we identify the set of co-concavity parameters at which bifurcations are observed.
We see the prospects for the development of our research in substantiating the following reasonable hypothesis: for an arbitrary number of oligopoly agents, the conjectural variations are calculated similarly to (7) on the basis of the formula $x_{ij(r)} = u_j^{-1}\,\frac{P_{(r)}}{P_{(r+1)}}$, where $y = \prod_{i\in N} u_i^{-1}$. Accordingly, our regularities are fulfilled at the indicated bifurcation points.
Appendix

Proof (of Proposition 1) We transform the expansion (6a) of the continued fraction (6) in convergent fractions as follows:

$$p_{1(r)} = \begin{cases} u_1^{r-\tau}u_2^{\tau-1} & \text{for } r=1,2,\\ u_1^{r-\tau}u_2^{\tau-1}\left[1-(r-2)u_1^{-1}u_2^{-1}\right] & \text{for } r=3,4,\\ u_1^{r-\tau}u_2^{\tau-1}\left[1-(r-2)u_1^{-1}u_2^{-1}+\tfrac{1}{2}(r-3)(r-4)u_1^{-2}u_2^{-2}\right] & \text{for } r=5,6,\\ u_1^{r-\tau}u_2^{\tau-1}\left[1-(r-2)u_1^{-1}u_2^{-1}+\tfrac{1}{2}(r-3)(r-4)u_1^{-2}u_2^{-2}-\tfrac{1}{6}(r-6)(r-5)(r-4)u_1^{-3}u_2^{-3}\right] & \text{for } r=7,8, \end{cases}$$

$$q_{1(r)} = \begin{cases} u_1^{r-\tau}u_2^{\tau} & \text{for } r=1,\\ u_1^{r-\tau}u_2^{\tau}\left[1-(r-1)u_1^{-1}u_2^{-1}\right] & \text{for } r=2,3,\\ u_1^{r-\tau}u_2^{\tau}\left[1-(r-1)u_1^{-1}u_2^{-1}+\tfrac{1}{2}(r-2)(r-3)u_1^{-2}u_2^{-2}\right] & \text{for } r=4,5,\\ u_1^{r-\tau}u_2^{\tau}\left[1-(r-1)u_1^{-1}u_2^{-1}+\tfrac{1}{2}(r-2)(r-3)u_1^{-2}u_2^{-2}-\tfrac{1}{6}(r-5)(r-4)(r-3)u_1^{-3}u_2^{-3}\right] & \text{for } r=6,7. \end{cases}$$
Substituting these formulas in (6a), and taking into account the fact that $\frac{u_1^{r-\tau}u_2^{\tau-1}}{u_1^{r-\tau}u_2^{\tau}} = u_2^{-1}$ and the notation $y = u_1^{-1}u_2^{-1}$, we obtain the following expression for the variation of the first agent: $x_{1(r)} = u_2^{-1}\,\frac{P_{(r)}}{Q_{(r)}}$, where

$$P_{(r)} = \begin{cases} 1 & \text{for } r=1,2,\\ 1-(r-2)y & \text{for } r=3,4,\\ 1-(r-2)y+\tfrac{1}{2}(r-3)(r-4)y^2 & \text{for } r=5,6,\\ 1-(r-2)y+\tfrac{1}{2}(r-3)(r-4)y^2-\tfrac{1}{6}(r-6)(r-5)(r-4)y^3 & \text{for } r=7,8, \end{cases}$$

$$Q_{(r)} = \begin{cases} 1 & \text{for } r=1,\\ 1-(r-1)y & \text{for } r=2,3,\\ 1-(r-1)y+\tfrac{1}{2}(r-2)(r-3)y^2 & \text{for } r=4,5,\\ 1-(r-1)y+\tfrac{1}{2}(r-2)(r-3)y^2-\tfrac{1}{6}(r-5)(r-4)(r-3)y^3 & \text{for } r=6,7. \end{cases}$$
Similar reasoning for the second agent leads to the following formula for its variation: $x_{2(r)} = u_1^{-1}\,\frac{P_{(r)}}{Q_{(r)}}$.
We write the formulas $P_{(r)}$, $Q_{(r)}$ in the following generalized form:

$$P_{(r)} = \sum_{t=0}^{\lceil r/2\rceil - 1} \frac{(-1)^t y^t}{t!}\prod_{\gamma=t+1}^{2t}(r-\gamma),\qquad Q_{(r)} = \sum_{t=0}^{\lceil (r+1)/2\rceil - 1} \frac{(-1)^t y^t}{t!}\prod_{\gamma=t}^{2t-1}(r-\gamma).$$

Because $\prod_{\gamma=t+1}^{2t}(r-\gamma) = \frac{(r-t-1)!}{(r-2t-1)!}$ and $\prod_{\gamma=t}^{2t-1}(r-\gamma) = \frac{(r-t)!}{(r-2t)!}$, these formulas take the following form [10]:

$$P_{(r)} = \sum_{t=0}^{\lceil r/2\rceil - 1} \frac{(-1)^t y^t}{t!}\,\frac{(r-t-1)!}{(r-2t-1)!},\qquad Q_{(r)} = \sum_{t=0}^{\lceil (r+1)/2\rceil - 1} \frac{(-1)^t y^t}{t!}\,\frac{(r-t)!}{(r-2t)!}.$$
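These generalized forms are easy to spot-check numerically; the sketch below compares them with the piecewise expressions and verifies the identity Q(r) = P(r+1) at sample points:

```python
from math import factorial

def P(r, y):
    return sum((-1) ** t * y ** t / factorial(t)
               * factorial(r - t - 1) / factorial(r - 2 * t - 1)
               for t in range((r + 1) // 2))

def Q(r, y):
    return sum((-1) ** t * y ** t / factorial(t)
               * factorial(r - t) / factorial(r - 2 * t)
               for t in range(r // 2 + 1))

# Agreement with the piecewise forms, e.g. P(5) = 1 - 3y + y^2 and Q(3) = 1 - 2y:
assert abs(P(5, 0.4) - (1 - 3 * 0.4 + 0.4 ** 2)) < 1e-12
assert abs(Q(3, 0.4) - (1 - 2 * 0.4)) < 1e-12

# The identity Q(r) = P(r+1) at sample points:
for r in range(1, 9):
    for y in (0.25, 0.5, 0.75):
        assert abs(Q(r, y) - P(r + 1, y)) < 1e-12
print("Q(r) = P(r+1) holds for r = 1..8")
```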
Comparison of these formulas proves that $Q_{(r)} = P_{(r+1)}$; therefore,

$$x_{1(r)} = u_2^{-1}\,\frac{P_{(r)}}{P_{(r+1)}},\qquad x_{2(r)} = u_1^{-1}\,\frac{P_{(r)}}{P_{(r+1)}},$$

and, in general, these expressions are written as (7).

Proof (of Proposition 2) We introduce the function $f_{j(r-1)} = u_j - x_{j(r-1)}$; then formula (6) has the following form:

$$x_{i(r)} = f_{j(r-1)}^{-1}. \qquad (9)$$
An analysis of the function $f_{2(r-1)}$ demonstrates that $f_{2(0)} = u_2$, $f_{2(1)} = u_2 - \frac{1}{u_1} = u_2 - f_{1(0)}^{-1}$, $f_{2(2)} = u_2 - \frac{1}{u_1 - \frac{1}{u_2}} = u_2 - f_{1(1)}^{-1}$, etc.; therefore, $f_{j(r-1)} = u_j - f_{i(r-2)}^{-1}$. For the variation $x_{1(r)}$, we consider the function

$$f_{2(r-1)} = u_2 - f_{1(r-2)}^{-1} = u_2 - \left(u_1 - x_{1(r-2)}\right)^{-1}. \qquad (10)$$
The following cases are possible:

(i) if $f_{1(r-2)} > 0$, i.e., $x_{1(r-2)} < u_1$, then $f_{2(r-1)} < 0$; therefore, according to (9),

$$x_{1(r)} < 0,\quad |x_{1(r)}| < 1,\quad \lim_{f_{1(r-2)}\to\infty}|x_{1(r)}| = |u_2|^{-1}; \qquad (11)$$

(ii) if $f_{1(r-2)} < 0$, i.e., $x_{1(r-2)} > u_1$, then, according to (10), two options are possible:
(iii) for $|f_{1(r-2)}| < |u_2|^{-1}$, the inequality $f_{2(r-1)} > 0$ holds; therefore,

$$x_{1(r)} > 0,\quad x_{1(r)} = \left(|f_{1(r-2)}|^{-1} - |u_2|\right)^{-1},\quad |x_{1(r)}| > 1,\quad \lim_{f_{1(r-2)}\to|u_2|^{-1}}|x_{1(r)}| = \infty; \qquad (12)$$

(iv) for $|f_{1(r-2)}| > |u_2|^{-1}$, the inequality $f_{2(r-1)} < 0$ holds; therefore,

$$x_{1(r)} < 0,\quad x_{1(r)} = \left(|f_{1(r-2)}|^{-1} - |u_2|\right)^{-1},\quad |x_{1(r)}| > 1,\quad \lim_{f_{1(r-2)}\to|u_2|^{-1}}|x_{1(r)}| = \infty. \qquad (13)$$
We introduce the function $\alpha_{1(r)} = |f_{1(r-2)}|^{-1} - |u_2| = |u_1 - x_{1(r-2)}|^{-1} - |u_2|$ and assume that the minimum value of this function equals $A_{1(r)} = \min_{x_{1(r-2)}\in\Omega_1(u_1,u_2)}|\alpha_{1(r)}|$, where $\Omega_1(u_1,u_2)$ is the set of admissible values of the variation $x_{1(r-2)}$ for given values $u_1$, $u_2$. Then conditions (9)–(12) can be written as follows:

$$x_{1(r)} \begin{cases} < 0, & \text{if } f_{1(r-2)} > 0 \ \vee\ \left(f_{1(r-2)} < 0 \wedge \alpha_{1(r)} < 0\right),\\ > 0, & \text{if } f_{1(r-2)} < 0 \wedge \alpha_{1(r)} > 0; \end{cases}$$

$$|x_{1(r)}| \begin{cases} < |u_2|^{-1}, & \text{if } f_{1(r-2)} > 0,\\ < A_{1(r)}^{-1}, & \text{if } f_{1(r-2)} < 0. \end{cases}$$
Similar reasoning for the second agent leads to general notation (8).
References

1. Askar, S., Alnowibet, K.: Nonlinear oligopolistic game with isoelastic demand function: rationality and local monopolistic approximation. Chaos Solit. Fractal. 84, 15–22 (2016)
2. Bowley, A.L.: The Mathematical Groundwork of Economics. Oxford University Press, Oxford (1951)
3. Cavalli, F., Naimzada, A., Tramontana, F.: Nonlinear dynamics and global analysis of a heterogeneous Cournot duopoly with a local monopolistic approach versus a gradient rule with endogenous reactivity. Commun. Nonlinear Sci. Numer. Simulat. 23(1–3), 245–262 (2015)
4. Cournot, A.A.: Researches into the Mathematical Principles of the Theory of Wealth. Hafner, London (1960) (Original 1838)
5. Currarini, S., Marini, M.A.: Sequential play and cartel stability in Cournot oligopoly. Appl. Math. Sci. 7(1–4), 197–200 (2012)
6. Geraskin, M.I.: Analysis of conjectural variations in a nonlinear model of Stackelberg duopoly. Mathematical Methods in Engineering and Technology, MMTT 5, 81–84 (2020)
7. Geraskin, M.I., Chkhartishvili, A.G.: Game-theoretic models of an oligopoly market with nonlinear agent cost functions. Autom. Remote Control 78(9), 1631–1650 (2017)
8. Intriligator, M.D.: Mathematical Optimization and Economic Theory. Prentice-Hall, Englewood Cliffs (1971)
9. Karmarkar, U.S., Rajaram, K.: Aggregate production planning for process industries under oligopolistic competition. Eur. J. Oper. Res. 223(3), 680–689 (2012)
10. Korn, G., Korn, T.: Mathematical Handbook for Scientists and Engineers: Definitions, Theorems, and Formulas for Reference and Review. McGraw-Hill, New York (1968)
11. Ledvina, A., Sigar, R.: Oligopoly games under asymmetric costs and an application to energy production. Math. Financ. Econ. 6(4), 261–293 (2012)
12. Naimzada, A.K., Sbragia, L.: Oligopoly games with nonlinear demand and cost functions: two boundedly rational adjustment processes. Chaos Solit. Fractal. 29(3), 707–722 (2006)
13. Naimzada, A., Tramontana, F.: Two different routes to complex dynamics in an heterogeneous triopoly game. J. Difference Equations Appl. 21(7), 553–563 (2015)
14. Nash, J.: Non-cooperative games. Ann. Math. 54, 286–295 (1951)
15. Novikov, D.A., Chkhartishvili, A.G.: Reflexion and Control: Mathematical Models. CRC Press, London (2014)
16. Stackelberg, H.: Market Structure and Equilibrium, 1st edn. Translated by Bazin, Urch and Hill. Springer (2011) (Original 1934)
17. Sun, F., Liu, B., Hou, F., Gui, L., Chen, J.: Cournot equilibrium in the mobile virtual network operator oriented oligopoly offloading market. In: IEEE Int. Conf. Communicat., ICC, No. 7511340 (2016)
18. Vasin, A.: Game-theoretic study of electricity market mechanisms. Procedia Comput. Sci. 31, 124–132 (2014)
19. Walters, A.A.: Production and cost functions: an econometric survey. Econometrica 31(1), 23–44 (1963)
The Vertex Cover Game

Vasily V. Gusev
Abstract The paper describes the vertex cover game (Gusev, Omega 97:102102, 2020) and studies its properties. The peculiarity of this game is that it takes into account all vertex covers of the graph. Since the number of covers is large, methods for decomposing and analyzing the game are developed.

Keywords Game theory · Vertex cover · Power index
1 Introduction

There is a concept of the vertex cover of an undirected graph in graph theory [17]. A vertex cover of a graph $G = \langle N, E\rangle$, $E \subseteq \{\{a,b\} \mid a,b \in N\}$, is any subset $S$ of the set of graph vertices $N$ such that every edge of the graph is incident to at least one vertex in the set $S$. We are interested in the vertex cover with a minimal number of vertices, the so-called minimum vertex cover of a graph. Let us list some applications of the problem of finding the minimum vertex cover.

Vertex cover in transport networks. Let the undirected graph G be a transport network. Each vertex is a crossroad, and each edge is a road. If surveillance cameras are deployed at the vertices of the minimum vertex cover, then every road portion will be monitored by a camera and the costs of purchasing cameras will be minimized [1, 23].

Vertex cover in a society. Let each vertex of the graph G be a person. If there is a conflict between two people, then there is an edge between the vertices representing these people. Denote by S the minimum vertex cover. There are no
V. V. Gusev
HSE University, St. Petersburg, Russian Federation
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 L. A. Petrosyan et al. (eds.), Frontiers of Dynamic Games, Trends in Mathematics, https://doi.org/10.1007/978-3-030-93616-7_7
conflicts between people in the set N\S, and the size of this set is maximal [7, 19]. To determine the power of vertices in the vertex cover, we use methods of cooperative game theory. In game-theoretic terms, each vertex can be regarded as an individual player; we consider the terms 'vertex' and 'player' to be synonyms. A vertex cover is a coalition of players. If a group of vertices contains at least one vertex cover, the payoff of this group is one; otherwise it is zero. The characteristic function of the cooperative game takes only two values: 0 and 1 [6]. Shapley suggested using cooperative game theory for measuring the influence of parties. The application of Shapley values and cooperative games to allocating resources can be found in [4]. An application of the Shapley value to transport and computer networks is described in [14]. The Shapley-Shubik index is also used in benefit allocation in games on electricity networks [24], in the analysis of hierarchical structures [12], and in the study of human behavior [20]. In this study, the Shapley-Shubik index is applied to estimate the power of graph vertices with regard to vertex covers. Methods of decision-making on networks can be found in [8]. Problems of resource allocation on networks are studied in [5].
2 Game Formation

Let us introduce several concepts to define the cooperative vertex cover game. A vertex cover $S$ of an undirected graph $G = \langle N, E\rangle$ is a subset of $N$ such that $\forall (u,v) \in E \Rightarrow u \in S$ or $v \in S$ [11, 16]. The minimum vertex cover of the graph $G = \langle N, E\rangle$ is the vertex cover consisting of the smallest possible number of vertices. A vertex cover $S$ of the graph $G$ is called a least vertex cover if, for all $i \in S$, the set $S \setminus \{i\}$ is not a vertex cover. Denote by $M(G)$ the set of least vertex covers of the graph $G$.

Definition 1 Let $G = \langle N, E\rangle$, $E \neq \emptyset$, be an undirected graph and $M(G)$ the set of least vertex covers of the graph $G$. A simple game $\langle N, v\rangle$ is a vertex cover game of $G$ if $W^m(v) = M(G)$, that is,

$$v(K) = \begin{cases} 1, & \text{if } \exists A \in M(G): A \subseteq K;\\ 0, & \text{otherwise,} \end{cases}\qquad \forall K \subseteq N.$$
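As an illustration (our own sketch, not part of the paper), the objects of the definition above can be computed by brute force for a small graph: enumerate all covers, keep the least ones to obtain M(G), and evaluate the characteristic function v.

```python
from itertools import combinations

def is_cover(S, edges):
    """S covers the graph if every edge has at least one endpoint in S."""
    return all(a in S or b in S for a, b in edges)

def least_vertex_covers(nodes, edges):
    """M(G): covers S such that no S \\ {i}, i in S, is still a cover."""
    covers = [frozenset(c) for r in range(len(nodes) + 1)
              for c in combinations(nodes, r) if is_cover(frozenset(c), edges)]
    return {S for S in covers
            if all(not is_cover(S - {i}, edges) for i in S)}

def v(K, M):
    """Characteristic function of the vertex cover game."""
    return 1 if any(A <= frozenset(K) for A in M) else 0

# 4-vertex path 1-2-3-4
edges = [(1, 2), (2, 3), (3, 4)]
M = least_vertex_covers([1, 2, 3, 4], edges)
print(sorted(sorted(A) for A in M))   # [[1, 3], [2, 3], [2, 4]]
print(v({1, 3}, M), v({1, 4}, M))     # 1 0
```

For the path, the least covers are {1,3}, {2,3}, and {2,4}, so any coalition containing one of them wins.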
The Vertex Cover Game
139
Proposition 1 shows how the characteristic function of the vertex cover game can be represented as an intersection of simple monotone characteristic functions.

Proposition 1 Let Y = {1, 2, . . . , r}, and let G = ⟨N, E⟩, G_j = ⟨N, E_j⟩, j ∈ Y, be undirected graphs, where E = ⋃_{j∈Y} E_j. Then, ∀K ⊆ N we have

v(K) = (v_1 ∧ v_2 ∧ . . . ∧ v_r)(K),  (1)

where ⟨N, v⟩ and ⟨N, v_j⟩, j ∈ Y, are vertex cover games with W^m(v) = M(G), W^m(v_j) = M(G_j).
Proposition 1 is based on the properties of vertex covers of a graph. No papers have been found that analyze simple games in which the set of minimal winning coalitions is the set of least vertex covers of a graph. Applications of the union and intersection of simple games can be found in [2].

Consider two simple games ⟨N, v⟩, ⟨N, w⟩ such that |W^m(v)| = |W^m(w)| = a and ∀A ∈ W^m(v), ∀B ∈ W^m(w): A ⊄ B and B ⊄ A. Then

W^m(v ∨ w) = {A | A ∈ W^m(v) or A ∈ W^m(w)},  W^m(v ∧ w) = {A ∪ B | A ∈ W^m(v) and B ∈ W^m(w)}.

Hence |W^m(v ∧ w)| = a², and |W^m(v ∨ w)| = 2a. Since the equality (v ∧ w)(K) = v(K) + w(K) − (v ∨ w)(K) holds for simple games, instead of considering the characteristic function with a² minimal winning coalitions we can simultaneously consider three games, the total number of minimal winning coalitions in the three games being 4a. Knowing the representation of the characteristic function as a conjunction of simple games, one can thus work with games having a smaller number of minimal winning coalitions.

A value φ has the linearity property if φ(αv + βw) = αφ(v) + βφ(w), α, β ∈ R, where v, w are characteristic functions. If the characteristic function of a cooperative vertex cover game can be represented as a linear combination of characteristic functions, then the linearity property can be used to calculate the Shapley-Shubik index. The original graph can be decomposed into subgraphs so as to fulfill the conditions of Proposition 1. Then, using (v ∧ w)(K) = v(K) + w(K) − (v ∨ w)(K), the characteristic function of the vertex cover game can be represented as a linear combination of other functions. It is convenient to use the decomposition procedure if a graph is large enough, e.g. a network or a communication graph. With this approach, it is not necessary to know the minimal winning coalitions of the original characteristic function.
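The counting claims above are easy to verify by brute force. The following sketch (illustrative, not from the paper) encodes a simple game by its minimal winning coalitions and checks |W^m(v ∨ w)| = 2a, |W^m(v ∧ w)| = a², and the identity (v ∧ w)(K) = v(K) + w(K) − (v ∨ w)(K) for a small example with a = 2.

```python
from itertools import combinations

def game(Wm):
    """Simple game from minimal winning coalitions: v(K)=1 iff some A ⊆ K."""
    return lambda K: 1 if any(A <= K for A in Wm) else 0

def minimal_winning(v, players):
    """Recover W^m(v): winning coalitions with no removable player."""
    wins = [frozenset(c) for r in range(len(players) + 1)
            for c in combinations(sorted(players), r) if v(frozenset(c))]
    return {S for S in wins if all(not v(S - {i}) for i in S)}

players = frozenset(range(1, 7))
# two games, a = 2 minimal winning coalitions each, none nested in another
Wv = {frozenset({1, 2}), frozenset({3, 4})}
Ww = {frozenset({5}), frozenset({6})}
v, w = game(Wv), game(Ww)
join = lambda K: max(v(K), w(K))     # v ∨ w
meet = lambda K: min(v(K), w(K))     # v ∧ w

assert len(minimal_winning(join, players)) == 2 * 2   # |W^m(v∨w)| = 2a
assert len(minimal_winning(meet, players)) == 2 * 2   # |W^m(v∧w)| = a²
for r in range(len(players) + 1):                     # the linear identity
    for c in combinations(sorted(players), r):
        K = frozenset(c)
        assert meet(K) == v(K) + w(K) - join(K)
print("all identities verified")
```

The identity holds because min(p, q) + max(p, q) = p + q for 0/1 values, which is exactly what the decomposition argument exploits.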
The same is true for other linear values in games, such as the Banzhaf value, the Owen value, and others.
140
V. V. Gusev
Proposition 2 A simple game ⟨N, v⟩ is a vertex cover game on the graph G if and only if there exist simple games ⟨N, v_l⟩, l ∈ {1, 2, . . . , r}, with W^m(v_l) = {{i_l}, {k_l}}, for which the equality

v(K) = (v_1 ∧ v_2 ∧ . . . ∧ v_r)(K)  (2)

holds, and G = ⟨N, E⟩, E = {{i_l, k_l} | 1 ≤ l ≤ r}.

Decompose the characteristic function v over the basis (Möbius transformation) [22], and set u(x) = Σ_{S⊆N} λ_S(v) ∏_{i∈S} x_i, ∀x ∈ {0, 1}^n, where λ_S(v) = Σ_{R⊆S} (−1)^{|S|−|R|} v(R). In that case, if f : [0, 1]^n → R is a multilinear extension of u : {0, 1}^n → R, then φ_i(N, v) = ∫₀¹ (∂f/∂x_i)(t, t, . . . , t) dt [15, 21]. If the minimal winning coalitions do not intersect, the following statement is true.

Proposition 3 Let W^m(v) = {A_1, A_2, . . . , A_m}, |A_j| = a_j, j = 1, 2, . . . , m; ∀i, j, i ≠ j: A_i ∩ A_j = ∅. Then, in the simple game ⟨N, v⟩, the Shapley-Shubik index of the player k ∈ N is calculated by the formula

φ_k(v) = ∫₀¹ x^{a_i − 1} ∏_{j=1, j≠i}^{m} (1 − x^{a_j}) dx,
where k ∈ A_i, a_i = |A_i|, i ∈ {1, 2, . . . , m}.

Proof Fix the player k. Since the minimal winning coalitions do not intersect, there exists only one minimal winning coalition containing the player k. Denote this coalition by A_i. Let L ⊆ {1, 2, . . . , m}, L ≠ ∅, i ∈ L. Then

|⋃_{j∈L} A_j| = |A_i| + |⋃_{j∈L\{i}} A_j| = a_i + Σ_{j∈L\{i}} a_j.

The following sequence of equations is valid:

φ_k(v) = Σ_{L⊆{1,2,...,m}: k∈⋃_{j∈L} A_j} (−1)^{|L|−1} / |⋃_{j∈L} A_j|
       = Σ_{L⊆{1,2,...,m}: k∈⋃_{j∈L} A_j} (−1)^{|L|−1} / (a_i + Σ_{j∈L\{i}} a_j)
       = Σ_{L⊆{1,2,...,m}\{i}} (−1)^{|L|} / (a_i + Σ_{j∈L} a_j)
       = Σ_{L⊆{1,2,...,m}\{i}} (−1)^{|L|} ∫₀¹ x^{a_i − 1 + Σ_{j∈L} a_j} dx
       = ∫₀¹ ( Σ_{L⊆{1,2,...,m}\{i}} (−1)^{|L|} x^{a_i − 1 + Σ_{j∈L} a_j} ) dx
       = ∫₀¹ x^{a_i − 1} ( Σ_{L⊆{1,2,...,m}\{i}} (−1)^{|L|} x^{Σ_{j∈L} a_j} ) dx
       = ∫₀¹ x^{a_i − 1} ∏_{j=1, j≠i}^{m} (1 − x^{a_j}) dx.
The lemma is thus proven.

Example 1 Let N be the set of players, and 1 ∈ N. W^m(v) = {A_1, . . . , A_m} is the set of minimal winning coalitions, and the elements of the set W^m(v) fulfill the following restrictions:

1. ∀i, j, i ≠ j: A_i ∩ A_j = {1};
2. |A_i| = i + 1.

E.g., A_1 = {1, 2}, A_2 = {1, 3, 4}, A_3 = {1, 5, 6, 7}, etc. Find the limit payoff of player 1 in the game ⟨N, v⟩, where the number of minimal winning coalitions tends to infinity. We get

lim_{m→∞} φ_1(v) = lim_{m→∞} ∫₀¹ [1 − ∏_{k=1}^{m} (1 − x^{a_k − 1})] dx = 1 − ∫₀¹ ∏_{k=1}^{∞} (1 − x^k) dx
                 = 1 − (4π√3 / √23) · (sinh(π√23/3) / cosh(π√23/2)) ≈ 0.6316.
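The limit value can be checked numerically (an independent sanity check, not from the paper): truncate the infinite product, integrate it by a midpoint rule, and compare with the closed form above.

```python
import math

def truncated_product(x, terms=300):
    """∏_{k=1}^{terms} (1 − x^k), truncating the infinite product."""
    p, xk = 1.0, 1.0
    for _ in range(terms):
        xk *= x
        p *= 1.0 - xk
        if p == 0.0:              # underflow: later factors cannot matter
            break
    return p

# midpoint rule for ∫₀¹ ∏_{k≥1} (1 − x^k) dx
n = 4000
integral = sum(truncated_product((i + 0.5) / n) for i in range(n)) / n

s23 = math.sqrt(23.0)
closed_form = (4 * math.pi * math.sqrt(3.0) / s23
               * math.sinh(math.pi * s23 / 3) / math.cosh(math.pi * s23 / 2))

print(round(1 - integral, 4))     # 0.6316
assert abs(integral - closed_form) < 5e-4
```

Truncating at 300 factors is harmless here: wherever the tail factors differ noticeably from 1, the product itself has already underflowed to a negligible value.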
We find that as the number of minimal winning coalitions in the set W m (v) increases, the payoff of player 1 tends to a finite limit. Some limit theorems for the Penrose-Banzhaf value can be found in [18]. The Shapley-Shubik indices for the linear graph consisting of n vertices, n = 2, 3, . . . , 10 are given in Table 1. Numbers from 2 to 10 in the first line indicate the number of vertices in the linear graph. Numbers from 1 to 10 in the first column are
Table 1 Solution for the linear graph

Player |  n=2 |  n=3 |  n=4 |   n=5 |   n=6 |   n=7 |     n=8 |      n=9 |     n=10
   1   |  1/2 |  1/6 |  1/6 |  7/60 |  1/10 |  1/12 |  61/840 |   23/360 |     2/35
   2   |  1/2 |  2/3 |  1/3 | 17/60 | 13/60 | 11/60 | 131/840 |   43/315 |   61/504
   3   |    – |  1/6 |  1/3 |   1/5 | 11/60 |  3/20 |  37/280 | 293/2520 | 263/2520
   4   |    – |    – |  1/6 | 17/60 | 11/60 |   1/6 |  39/280 | 311/2520 |   23/210
   5   |    – |    – |    – |  7/60 | 13/60 |  3/20 |  39/280 | 151/1260 |   34/315
   6   |    – |    – |    – |     – |  1/10 | 11/60 |  37/280 | 311/2520 |   34/315
   7   |    – |    – |    – |     – |     – |  1/12 | 131/840 | 293/2520 |   23/210
   8   |    – |    – |    – |     – |     – |     – |  61/840 |   43/315 | 263/2520
   9   |    – |    – |    – |     – |     – |     – |       – |   23/360 |   61/504
  10   |    – |    – |    – |     – |     – |     – |       – |        – |     2/35
the players' numbers. The index is the highest for the vertices connected to the end vertices.

Let G = ⟨N, E⟩ be a star graph, for which E = {{1, 2}, {1, 3}, . . . , {1, n}}. Consider the vertex cover game ⟨N, v⟩ of G, where W^m(v) = {{1}, {2, 3, . . . , n}}. Let us calculate the Shapley-Shubik index of each player. The elements of the set of minimal winning coalitions do not intersect each other, wherefore Proposition 3 can be applied:
φ_1(v) = ∫₀¹ (1 − x^{n−1}) dx = 1 − 1/n,

φ_i(v) = ∫₀¹ x^{n−2} (1 − x) dx = 1/(n(n − 1)),  i ≠ 1.
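These closed forms can be confirmed with the standard coalition-sum formula for the Shapley value, φ_i = Σ_{S⊆N\{i}} |S|!(n−|S|−1)!/n! · (v(S∪{i}) − v(S)). A small exact check for the star with n = 5 (our own verification, not from the paper):

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

def shapley(n, v):
    """Exact Shapley value via the coalition-sum (marginal) formula."""
    phi = {}
    for i in range(1, n + 1):
        others = [p for p in range(1, n + 1) if p != i]
        total = Fraction(0)
        for r in range(n):
            weight = Fraction(factorial(r) * factorial(n - r - 1), factorial(n))
            for c in combinations(others, r):
                S = frozenset(c)
                total += weight * (v(S | {i}) - v(S))
        phi[i] = total
    return phi

n = 5
edges = [(1, j) for j in range(2, n + 1)]          # star with center 1
v = lambda S: 1 if all(a in S or b in S for a, b in edges) else 0

phi = shapley(n, v)
print(phi[1], phi[2])                              # 4/5 1/20
assert phi[1] == 1 - Fraction(1, n)                # 1 − 1/n
assert phi[2] == Fraction(1, n * (n - 1))          # 1/(n(n−1))
```

Here v(S) = 1 exactly when S covers every edge, which is equivalent to S containing a least vertex cover.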
Proposition 4 Let G = ⟨N, E⟩ be a complete bipartite graph, L ∪ R = N, L ∩ R = ∅, E = {{a, b} | a ∈ L, b ∈ R}. Then the Shapley-Shubik index of the player i ∈ N in the vertex cover game ⟨N, v⟩ has the following form:

φ_i(v) = { 1/|L| − 1/(|L| + |R|),  i ∈ L;
           1/|R| − 1/(|L| + |R|),  i ∈ R. }
Proof For a complete bipartite graph, W^m(v) = {L, R} is valid. The elements of the set of minimal winning coalitions do not intersect with one another, wherefore Proposition 3 can be applied:

φ_i(v) = ∫₀¹ x^{|L|−1} (1 − x^{|R|}) dx = 1/|L| − 1/(|L| + |R|),  i ∈ L;

φ_j(v) = ∫₀¹ x^{|R|−1} (1 − x^{|L|}) dx = 1/|R| − 1/(|L| + |R|),  j ∈ R.
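For disjoint minimal winning coalitions, the integral of Proposition 3 expands by inclusion-exclusion into the finite sum Σ_{L⊆{1,...,m}\{i}} (−1)^{|L|} / (a_i + Σ_{j∈L} a_j), which can be evaluated exactly. The sketch below (ours, not from the paper) uses this expansion to confirm Proposition 4 for K_{2,3}.

```python
from fractions import Fraction
from itertools import combinations

def prop3_index(sizes, i):
    """Index of a player in coalition A_i, minimal winning coalitions being
    pairwise disjoint with sizes a_1, ..., a_m: inclusion-exclusion expansion
    of ∫₀¹ x^(a_i−1) ∏_{j≠i} (1 − x^(a_j)) dx, computed exactly."""
    others = [a for j, a in enumerate(sizes) if j != i]
    total = Fraction(0)
    for r in range(len(others) + 1):
        for c in combinations(others, r):
            total += Fraction((-1) ** r, sizes[i] + sum(c))
    return total

# K_{2,3}: the minimal winning coalitions are the sides L (size 2), R (size 3)
phiL = prop3_index([2, 3], 0)     # index of a player in L
phiR = prop3_index([2, 3], 1)     # index of a player in R
print(phiL, phiR)                 # 3/10 2/15
assert phiL == Fraction(1, 2) - Fraction(1, 5)   # 1/|L| − 1/(|L|+|R|)
assert phiR == Fraction(1, 3) - Fraction(1, 5)   # 1/|R| − 1/(|L|+|R|)
assert 2 * phiL + 3 * phiR == 1                  # efficiency
```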
Proposition 5 Let G = ⟨N, E⟩, E = {{1, 2}} ∪ {{1, a_p}}_{p=1}^{k} ∪ {{2, b_q}}_{q=1}^{r}, ∀p, q: a_p ≠ 2, b_q ≠ 1, a_p ≠ b_q. Then the Shapley-Shubik index of the player i ∈ N in the vertex cover game ⟨N, v⟩ has the following form:

φ_i(v) = { 1/2 − 1/(k + 2) + 1/(r + 1) − 1/(r + 2),  i = 1;
           1/2 − 1/(r + 2) + 1/(k + 1) − 1/(k + 2),  i = 2;
           1/(k + 1) − 1/(k + 2),                    i = a_p, p ∈ {1, . . . , k};
           1/(r + 1) − 1/(r + 2),                    i = b_q, q ∈ {1, . . . , r}. }
Proof Decompose the graph G = ⟨N, E⟩ into the subgraphs G_1 = ⟨N, E_1⟩, G_2 = ⟨N, E_2⟩, where E_1 = {{1, 2}} ∪ {{1, a_p}}_{p=1}^{k}, E_2 = {{2, b_q}}_{q=1}^{r}. Then W^m(v_1) = {{1}, {2, a_1, . . . , a_k}}, W^m(v_2) = {{2}, {b_1, . . . , b_r}}, W^m(v_1 ∨ v_2) = {{1}, {2}, {b_1, . . . , b_r}}. Apply Proposition 1, the linearity property of the Shapley-Shubik index, and Proposition 3. We get

φ(v) = φ(v_1 ∧ v_2) = φ(v_1) + φ(v_2) − φ(v_1 ∨ v_2).

φ_1(v) = ∫₀¹ (1 − x^{k+1}) dx − ∫₀¹ (1 − x)(1 − x^r) dx = 1/2 − 1/(k + 2) + 1/(r + 1) − 1/(r + 2);

φ_2(v) = ∫₀¹ x^k (1 − x) dx + ∫₀¹ (1 − x^r) dx − ∫₀¹ (1 − x)(1 − x^r) dx = 1/2 − 1/(r + 2) + 1/(k + 1) − 1/(k + 2);

φ_{a_p}(v) = ∫₀¹ x^k (1 − x) dx = 1/(k + 1) − 1/(k + 2).

Calculate φ_{b_q}(v). Since W^m(v_1 ∨ v_2) = {{1}, {2}, {b_1, . . . , b_r}}, |{1}| = 1, |{2}| = 1, |{b_1, . . . , b_r}| = r, then

φ_{b_q}(v) = ∫₀¹ x^{r−1} (1 − x) dx − ∫₀¹ x^{r−1} (1 − x)(1 − x) dx
           = ∫₀¹ x^{r−1} (1 − x) dx − ∫₀¹ x^{r−1} (1 − x)² dx = 1/(r + 1) − 1/(r + 2).
An example of the graph used in Proposition 5 is shown in Fig. 1. We get φ_1(v) = 34/105 ≈ 0.32, φ_2(v) = 57/140 ≈ 0.41, φ_{a_p}(v) = 1/20 = 0.05, φ_{b_q}(v) = 1/42 ≈ 0.02, p ∈ {1, 2, 3}, q ∈ {1, 2, 3, 4, 5}.
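These numbers follow from the formulas of Proposition 5 by plain arithmetic, which is easy to re-check exactly:

```python
from fractions import Fraction as F

k, r = 3, 5
phi1 = F(1, 2) - F(1, k + 2) + F(1, r + 1) - F(1, r + 2)
phi2 = F(1, 2) - F(1, r + 2) + F(1, k + 1) - F(1, k + 2)
phia = F(1, k + 1) - F(1, k + 2)      # each of the k vertices a_p
phib = F(1, r + 1) - F(1, r + 2)      # each of the r vertices b_q

print(phi1, phi2, phia, phib)         # 34/105 57/140 1/20 1/42
assert phi1 == F(34, 105) and phi2 == F(57, 140)
assert phia == F(1, 20) and phib == F(1, 42)
assert phi1 + phi2 + k * phia + r * phib == 1   # the indices sum to one
```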
Fig. 1 Star graph with two centers, k = 3, r = 5: vertices a_1, a_2, a_3 are adjacent to vertex 1, vertices b_1, . . . , b_5 are adjacent to vertex 2, and vertices 1 and 2 are adjacent to each other
3 Application of the Shapley-Shubik Index for Estimating the Efficiency of Vertices in the Vertex Cover of a Graph

Let the graph G be a transport network. A vertex in this graph is a crossroads, an edge is a road. The task is to optimally distribute surveillance cameras. Knowing the power of graph vertices, one can deploy the cameras accordingly. Surveillance cameras are to provide full coverage of the transport network. If the existing cameras capture the transport network entirely, no further budget allocations are needed to purchase new cameras.

Let us consider all possible rearrangements of graph vertices. Let σ denote a rearrangement of vertices. In the rearrangement σ, enumerate the vertices as 1, 2, . . . , n. Denote by σ(k) the set of vertices occupying in the rearrangement σ the positions before and including the vertex with the number k. The coalition of vertices σ(k) in the rearrangement σ is losing if it does not cover the transport network, and winning otherwise. If σ(k − 1) is a losing coalition and σ(k) is a winning one, then the vertex numbered k is called pivotal for the given rearrangement. Vertices occupying positions preceding the pivotal vertex in the rearrangement do not cover the network. Vertices after the pivotal vertex make no further contribution, since the transport network is already covered. Hence, the essential question when arranging cameras is whether a vertex is the pivotal one. Knowing this, the efficiency of each vertex can be calculated by the formula

φ_i = (the number of rearrangements in which the vertex i is pivotal with regard to vertex covers) / n!,

where n! is the number of all possible rearrangements of n vertices. The value of φ_i is the Shapley-Shubik index for the vertex cover game. The higher the number of rearrangements in which a vertex is pivotal, the higher the power of this vertex. Denote by SG_n the set of all simple games with n players.
Since the vertex cover game is a simple game and the efficiency axiom (for all v ∈ SG_n, Σ_{i=1}^{n} φ_i(v) = 1 [9]) is fulfilled for the Shapley-Shubik index, the power of each vertex is not less than zero and the sum of all values is equal to unity.
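The formula above can be implemented literally by enumerating all n! rearrangements. The sketch below (our own, not from the paper) does this for the 4-vertex linear graph and reproduces the n = 4 column of Table 1.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def pivotal_index(n, edges):
    """Per vertex, count rearrangements in which it first completes a cover."""
    def covers(S):
        return all(a in S or b in S for a, b in edges)
    pivotal = dict.fromkeys(range(1, n + 1), 0)
    for order in permutations(range(1, n + 1)):
        S = set()
        for i in order:
            S.add(i)
            if covers(S):           # i turned the losing coalition winning
                pivotal[i] += 1
                break
    return {i: Fraction(c, factorial(n)) for i, c in pivotal.items()}

# 4-vertex linear graph (path); compare with the n = 4 column of Table 1
phi = pivotal_index(4, [(1, 2), (2, 3), (3, 4)])
print(phi)
assert phi == {1: Fraction(1, 6), 2: Fraction(1, 3),
               3: Fraction(1, 3), 4: Fraction(1, 6)}
```

The enumeration is exponential in n, of course; for large transport networks one would sample rearrangements instead of enumerating them.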
Let us demonstrate what properties an array of surveillance cameras will have if the cameras are arranged proportionately to the values of the Shapley-Shubik index in a vertex cover game. The Shapley-Shubik index satisfies the null player, anonymity, symmetry, and transfer axioms.

Null player axiom: for any v ∈ SG_n and any i ∈ N, if i is a null player in the game v, then φ_i(v) = 0. The player i ∈ N is called a null player if v(S) = v(S \ {i}) for all S ⊆ N with i ∈ S. If the vertex degree is 0, this means there are no roads running across the given crossroads. In the vertex cover game, such a vertex is a null player. Owing to the null player property, cameras will not be deployed in vertices where they are unnecessary.

Anonymity axiom: for all v ∈ SG_n, any permutation π of N, and any i ∈ N, φ_i(πv) = φ_{π(i)}(v), where (πv)(S) := v(π(S)). The numbers assigned to vertices have no effect on the distribution of cameras. If the vertex numbering scheme is changed but the transport network topology remains the same, the distribution of cameras will not be affected. Owing to the anonymity axiom, all vertices are in an equal position.

The Shapley-Shubik index satisfies the symmetry axiom: if i, j ∈ N, i ≠ j, and v(S ∪ {i}) = v(S ∪ {j}) ∀S ⊆ N \ {i, j}, then φ_i(v) = φ_j(v). If there are two vertices symmetrical with respect to the graph's vertex covers, then these vertices have equal Shapley-Shubik indices in the vertex cover game. This axiom ensures that symmetric vertices are allocated equal numbers of surveillance cameras.

Transfer axiom: for any v, w ∈ SG_n such that v ∨ w ∈ SG_n, φ(v) + φ(w) = φ(v ∧ w) + φ(v ∨ w). This axiom implies that when winning coalitions are added, changes in the solution of the game depend only on the added coalitions. This interpretation of the axiom can be found in [10].
If, for instance, a new road appears in the transport network or, vice versa, a road is closed, the changes in the distribution of cameras will depend solely on the respective changes in the graph topology, and not on any other factors.
4 Conclusions

The article is an overview of the main results from [13]. Games on graphs have become a popular field in game theory, because the solution of applied problems requires the analysis of transport, communication, or computer networks. As society and social interactions develop, this field is moving even further into the foreground, and questions of the centrality and significance of objects in a network structure come up. Relying on graph theory and methods of cooperative game theory, a graph decomposition technique for Shapley-Shubik index estimation was suggested. This procedure permits representing the characteristic function of the original game as a linear combination of simpler characteristic functions. Using this approach, the Shapley-Shubik index was calculated in the cooperative vertex cover game for a transport network and some classes of graphs.
The cooperative game in this paper depends on the vertex covers of the graph. Since there also exist other covers of a graph, such as the edge cover, it is interesting to consider the corresponding simple games, too. If the covers are interrelated, is there also a relationship between the respective simple games? The usual procedure in mathematical models of resource allocation is that a functional is composed and then optimized; one of the properties of the solution is that the composed functional reaches the required value. Cooperative game theory takes a different approach to resource allocation: the optimal distribution fulfills several axioms, and these axioms have an applied interpretation. Where a graph has several minimum vertex covers, a measure of centrality can be composed based only on the minimum vertex covers.

Acknowledgments The paper was prepared within the framework of the HSE University Basic Research Program.
References

1. Alrasheed, H.: δ-Hyperbolicity and the core-periphery structure in graphs. In: Machine Learning Techniques for Online Social Networks, pp. 23–43. Springer, Cham (2018)
2. Alvarez-Mozos, M., Alonso-Meijide, J.M., Fiestras-Janeiro, M.G.: On the externality-free Shapley-Shubik index. Games Econ. Behav. 105, 148–154 (2017)
3. Alonso-Meijide, J.M., Freixas, J., Molinero, X.: Computation of several power indices by generating functions. Appl. Math. Comput. 219(8), 3395–3402 (2012)
4. An, Q., Wen, Y., Ding, T., Li, Y.: Resource sharing and payoff allocation in a three-stage system: Integrating network DEA with the Shapley value method. Omega 85, 16–25 (2019)
5. Bhadury, J., Mighty, E.J., Damar, H.: Maximizing workforce diversity in project teams: A network flow approach. Omega 28(2), 143–153 (2000)
6. Brams, S.J., Affuso, P.J.: Power and size: A new paradox. Theory Decis. 7(1–2), 29–56 (1976)
7. Chen, N.: On the approximability of influence in social networks. SIAM J. Discrete Math. 23(3), 1400–1415 (2009)
8. Dong, Y., Liu, Y., Liang, H., Chiclana, F., Herrera-Viedma, E.: Strategic weight manipulation in multiple attribute decision making. Omega 75, 154–164 (2018)
9. Dubey, P.: On the uniqueness of the Shapley value. Int. J. Game Theory 4(3), 131–139 (1975)
10. Dubey, P., Einy, E., Haimanko, O.: Compound voting and the Banzhaf index. Games Econ. Behav. 51(1), 20–30 (2005)
11. Filiol, E., Franc, E., Gubbioli, A., Moquet, B., Roblot, G.: Combinatorial optimisation of worm propagation on an unknown network. Int. J. Comput. Sci. 2, 124 (2007)
12. Gallardo, J.M., Jiménez, N., Jiménez-Losada, A.: A Shapley measure of power in hierarchies. Inf. Sci. 372, 98–110 (2016)
13. Gusev, V.V.: The vertex cover game: Application to transport networks. Omega 97, 102102 (2020)
14. Hadas, Y., Gnecco, G., Sanguineti, M.: An approach to transportation network analysis via transferable utility games. Transp. Res. B Methodol. 105, 120–143 (2017)
15. Harsanyi, J.C.: A simplified bargaining model for the n-person cooperative game. Int. Econ. Rev. 4(2), 194–220 (1963)
16. Hu, S., Li, R., Zhao, P., Yin, M.: A hybrid metaheuristic algorithm for generalized vertex cover problem. Memetic Comput. 10(2), 165–176 (2018)
17. Karp, R.M.: Reducibility among combinatorial problems. In: Complexity of Computer Computations, pp. 85–103. Springer, Boston, MA (1972)
18. Kurz, S.: A note on limit results for the Penrose-Banzhaf index (2018). Available at SSRN 3229289
19. Lusher, D., Koskinen, J., Robins, G. (eds.): Exponential Random Graph Models for Social Networks: Theory, Methods, and Applications, p. 337. Cambridge University Press, Cambridge (2013)
20. Molinero, X., Riquelme, F., Serna, M.: Cooperation through social influence. Eur. J. Oper. Res. 242(3), 960–974 (2015)
21. Owen, G.: Game Theory, 3rd edn. Academic Press, San Diego (1995)
22. Shapley, L.S.: A value for n-person games. Contrib. Theory Games 2(28), 307–317 (1953)
23. Tamura, H., Sugawara, H., Sengoku, M., Shinoda, S.: Multiple cover problem on undirected flow networks. Electron. Commun. Japan (Part III: Fundamental Electronic Science) 84(1), 67–74 (2001)
24. Wu, Q., Ren, H., Gao, W., Ren, J.: Benefit allocation for distributed energy network participants applying game theory based solutions. Energy 119, 384–391 (2017)
Application of the Decomposable Penalty Method to a Class of Generalized Nash Equilibrium Problems Igor Konnov
Abstract We consider an extension of a noncooperative game problem where players have joint binding constraints. In this case, justification of a generalized equilibrium point needs a reasonable mechanism for attaining it. We propose a penalty method and shares allocation of right-hand sides, which replaces the initial problem with a sequence of the usual Nash equilibrium problems together with an upper level variational inequality as a master problem. This approach also takes into account additional constraints that can be imposed on the share allocation at the upper (regulator) level. Convergence of solutions of approximate penalized problems to a solution of the initial equilibrium problem is established under natural conditions. Keywords Noncooperative games · Joint constraints · Generalized equilibrium points · Shares allocation · Approximate penalty method · Variational inequality
1 Introduction

The custom l-person noncooperative game is determined by particular strategy sets X_i ⊆ R^{n_i} and payoff (utility) functions f_i : X → R of all the players for i = 1, . . . , l, where

X = X_1 × · · · × X_l,  n = Σ_{i=1}^{l} n_i.
Then the well known Nash equilibrium point serves as its most popular solution concept; see [1]. We recall that a point x* = (x_1^*, . . . , x_l^*)^⊤ ∈ X is said to be a Nash
I. Konnov () Department of System Analysis and Information Technologies, Kazan Federal University, Kazan, Russia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 L. A. Petrosyan et al. (eds.), Frontiers of Dynamic Games, Trends in Mathematics, https://doi.org/10.1007/978-3-030-93616-7_8
149
equilibrium point for the game, if

f_i(x*_{−i}, y_i) ≤ f_i(x*)  ∀y_i ∈ X_i, i = 1, . . . , l;  (1)
where we set (x_{−i}, y_i) = (x_1, . . . , x_{i−1}, y_i, x_{i+1}, . . . , x_l). At the same time, there exist a number of game-theoretic models where the players have joint (binding) constraints; see e.g. [2–9]. In this generalized setting the players have the joint constraint set

Ỹ = { x ∈ R^n | Σ_{i=1}^{l} h_i(x_i) ≤ b },

where h_i : R^{n_i} → R^m is a given vector function with the coordinates h_{ij} : R^{n_i} → R, j = 1, . . . , m, for i = 1, . . . , l, and b ∈ R^m is a fixed vector, as an addition to the set X. That is, the players have the common feasible set

Y = X ∩ Ỹ.  (2)
A point x* = (x_1^*, . . . , x_l^*)^⊤ ∈ Y is said to be a generalized Nash equilibrium point for this game, if

f_i(x*_{−i}, y_i) ≤ f_i(x*)  ∀(x*_{−i}, y_i) ∈ Y, i = 1, . . . , l;  (3)
see e.g. [2, 3, 7]. The main question of this extension consists in implementation of the constrained equilibrium concept within the custom noncooperative framework, where players are independent and make their choices simultaneously. In fact, the presence of the binding constraints requires certain treaties or concordant actions of the players, thus contradicting the above assumptions. Rather recently, the right-hand side decomposition approach was suggested for generalized noncooperative games in [9]. Within this approach, the initial problem is treated as a two-level one by using a share allocation procedure, which leads to a variational inequality as a master problem, whereas a usual noncooperative game problem is solved at the lower level. Further, a decomposable penalty method that overcomes certain drawbacks of the streamlined allocation procedure was suggested in [9, 10]. In fact, the penalty method makes the lower level problems well defined under mild assumptions and yields a single-valued approximation of the master problem in the monotone case. Application of this method for production problems with common pollution regulation was described in [11]. In this paper, we present further development of the decomposable penalty method for generalized noncooperative games. Namely, we admit approximate solutions of the lower level game problem, which enables us to create implementable variants of the method. Besides, we now take into account additional constraints that can be imposed on the share allocation at the upper (regulator) level. Nevertheless, we show convergence of approximate penalized solutions to a solution of the initial problem under natural conditions.
2 Constrained Shares Allocations

We first fix our basic assumptions.

(A1) Each strategy set X_i ⊆ R^{n_i} is convex and compact, and each utility function f_i is concave in its i-th variable x_i and continuous, for i = 1, . . . , l. Also, h_{ij} : R^{n_i} → R, j = 1, . . . , m, i = 1, . . . , l, are convex functions.

The two-level share allocation methods from [9, 10] for problem (3) are based on the following simple transformation of the joint constraint set:

Ỹ = { x ∈ R^n | ∃u ∈ R^{ml} : Σ_{i=1}^{l} u_i = b, h_i(x_i) ≤ u_i, i = 1, . . . , l },

where u = (u_1, . . . , u_l)^⊤ gives a partition of the right-hand side vector b, so that u_i ∈ R^m is the explicit share of the i-th player, i = 1, . . . , l. For any fixed partition u, the players have no joint constraints and hence are able to find the corresponding Nash equilibrium point parametrized by u. Then the upper level regulator, who can also assign additional penalties for violation of particular shares, finds the optimal shares, which yield a generalized Nash equilibrium point. In such a way, inserting the upper level regulator leads to the standard Nash equilibrium problem (NEP for short) setting for the players.

In this paper, we consider the situation when some additional restrictions may be imposed on the regulator's share allocation choice, in the sense that the regulator partitions must belong to a set U_0. For instance, it can be defined as follows:

U_0 = { u ∈ R^{ml} | d̲_i ≤ u_i ≤ d̄_i, i = 1, . . . , l },

but may in general have some other format. Hence, we can define the whole set of feasible partitions:

V = { u ∈ U_0 | Σ_{i=1}^{l} u_i = b }.
Similarly, we replace the set Ỹ with the set

E = { x ∈ R^n | ∃u ∈ V : h_i(x_i) ≤ u_i, i = 1, . . . , l }

and define

D = X ∩ E;
cf. (2). Instead of problem (3) we now have to find a point x* = (x_1^*, . . . , x_l^*)^⊤ ∈ D such that

f_i(x*_{−i}, y_i) ≤ f_i(x*)  ∀(x*_{−i}, y_i) ∈ D, i = 1, . . . , l.  (4)
These new conditions need rather weak additional assumptions.

(A2) The set U_0 ⊆ R^{ml} is convex and closed, and the common feasible set D is nonempty.

Lemma 1 If (A1) and (A2) are fulfilled, then the set E is convex and closed.

Proof Take arbitrary points x′, x″ ∈ E. Then there exist points u′, u″ ∈ V such that h_i(x′_i) ≤ u′_i and h_i(x″_i) ≤ u″_i for i = 1, . . . , l. Take any α ∈ [0, 1] and set u(α) = αu′ + (1 − α)u″; then u(α) ∈ V since V is convex. Next, set x(α) = αx′ + (1 − α)x″; then x(α) ∈ X and for each index i we have

h_i(x_i(α)) ≤ αh_i(x′_i) + (1 − α)h_i(x″_i) ≤ αu′_i + (1 − α)u″_i = u_i(α),

hence x(α) ∈ E and E is convex.

Take an arbitrary sequence {x^k} ⊂ E and suppose that {x^k} → x̄. Then there exists the corresponding sequence {u^k} ⊂ V such that h_i(x_i^k) ≤ u_i^k for i = 1, . . . , l. If {u^k} is unbounded, i.e. ‖u^k‖ → ∞, there exists at least one pair of indices j and t such that u_{jt}^k → −∞ due to the balance equality. However, this contradicts the boundedness of the sequence {x^k}. Therefore, the sequence {u^k} is also bounded and has limit points. Without loss of generality we can now suppose that {u^k} → ū as k → ∞. Then ū ∈ V since V is closed. Besides, we have h_i(x̄_i) ≤ ū_i for i = 1, . . . , l, which means that x̄ ∈ E and E is closed.

Following the Nikaido-Isoda approach from [12], we can also define the normalized equilibrium problem (EP for short) of finding a point x* = (x_1^*, . . . , x_l^*)^⊤ ∈ D such that

Φ(x*, y) ≥ 0  ∀y ∈ D,  (5)

where

Φ(x, y) = Ψ(x, x) − Ψ(x, y),  Ψ(x, y) = Σ_{i=1}^{l} f_i(x_{−i}, y_i);
its solutions will be called normalized equilibrium points. From the assumptions in (A1) and (A2) it follows that Φ : X × X → R is an equilibrium bi-function, i.e., Φ(x, x) = 0 for every x ∈ X, besides, Φ(x, ·) is convex for each x ∈ X and Φ(·, ·) is continuous. It should be noted that each normalized equilibrium point is a solution to problem (4), but the reverse assertion is not true in general. Due to Lemma 1, the set D is nonempty, convex and compact. Hence, we can obtain the existence result for EP (5) via a proper adjustment of the classical Ky Fan inequality assertion from [13]. In turn, this yields the existence result for problem (4). Proposition 1 If (A1) and (A2) are fulfilled, then EP (5) has a solution.
3 Decomposable Penalty Method

We will consider the auxiliary penalized EP: Find a pair w(τ) = (x(τ), u(τ)) ∈ X × V, τ > 0, such that

Φ_τ(w(τ), w) = Φ(x(τ), x) + τ[P(w) − P(w(τ))] ≥ 0  ∀w = (x, u) ∈ X × V,  (6)

where

P(w) = Σ_{i=1}^{l} P_i(w_i),  P_i(w_i) = P_i(x_i, u_i) = 0.5‖[h_i(x_i) − u_i]_+‖²,  i = 1, . . . , l.
Here and below [t]_+ denotes the projection of a point t ∈ R^s onto the non-negative orthant R^s_+. This means that we take the most popular quadratic differentiable penalty functions for violation of particular share allocations, but the same approach can be applied with other functions. We can show that approximate solutions of the penalized EP (6) converge to the normalized equilibrium points defined in (5) as τ → +∞. However, we intend to first show that each penalized EP (6) is equivalent to a two-level problem where a usual NEP is solved at the lower level. We need an auxiliary property on equivalent formulations of optimization problems from [14]. We recall that a function ϕ : R^s → R is called isotone if for any points u, v with u ≥ v it holds that ϕ(u) ≥ ϕ(v).

Lemma 2 ([14, Proposition 5]) Suppose that W is a nonempty, convex, and closed set in R^t, μ : R^t → R is a convex function, H : R^t → R^s is a continuous mapping with convex components H_i : R^t → R, i = 1, . . . , s, and ϕ : R^s → R is a differentiable convex function that is isotone on a convex set W̄ ⊇ H(W). Then the optimization problem

min_{v∈W} { μ(v) + ϕ(H(v)) }
is equivalent to the mixed variational inequality: Find a point v* ∈ W such that

μ(v) − μ(v*) + ⟨ϕ′(H(v*)), H(v) − H(v*)⟩ ≥ 0  ∀v ∈ W.

Let us now define the mixed EP: Find a pair w(τ) = (x(τ), u(τ)) ∈ X × V such that

Φ(x(τ), x) + τ Σ_{i=1}^{l} ⟨[h_i(x_i(τ)) − u_i(τ)]_+, h_i(x_i) − h_i(x_i(τ))⟩
+ τ Σ_{i=1}^{l} ⟨[h_i(x_i(τ)) − u_i(τ)]_+, u_i(τ) − u_i⟩ ≥ 0  ∀w = (x, u) ∈ X × V.  (7)
If w(τ) = (x(τ), u(τ)) is a solution of problem (6), we can temporarily set φ(x) = Φ(x(τ), x) and obtain

φ(x) − φ(x(τ)) + τ[P(w) − P(w(τ))] ≥ 0  ∀w = (x, u) ∈ X × V,

i.e. w(τ) is a solution of the corresponding optimization problem. Applying now Lemma 2, we conclude that w(τ) solves (7). The reverse assertion is obtained similarly due to Lemma 2. Moreover, problem (7) is clearly equivalent to the system: Find a pair w(τ) = (x(τ), u(τ)) ∈ X × V such that

Φ(x(τ), x) + τ Σ_{i=1}^{l} ⟨[h_i(x_i(τ)) − u_i(τ)]_+, h_i(x_i) − h_i(x_i(τ))⟩ ≥ 0  ∀x ∈ X,  (8)

Σ_{i=1}^{l} ⟨[h_i(x_i(τ)) − u_i(τ)]_+, u_i(τ) − u_i⟩ ≥ 0  ∀u ∈ V.  (9)

We now collect the obtained properties.

Lemma 3 Let the conditions in (A1) and (A2) be fulfilled. Then problems (6), (7), and (8)–(9) are equivalent.

Given a point u ∈ V, we can solve only problem (8) in x, which is to find x(u) ∈ X such that

Φ(x(u), x) + τ Σ_{i=1}^{l} ⟨[h_i(x_i(u)) − u_i]_+, h_i(x_i) − h_i(x_i(u))⟩ ≥ 0  ∀x ∈ X.  (10)
Let X(u) denote the whole solution set of this problem. For each x(u) ∈ X(u) we set g(u) = (g_1(u), . . . , g_l(u))^⊤, where

g_i(u) = −[h_i(x_i(u)) − u_i]_+,  i = 1, . . . , l.  (11)

Thus we can define the mapping value

G(u) = { g(u) | x(u) ∈ X(u) }.

Bearing in mind (9), we now define the variational inequality (VI for short): Find a point u* ∈ V such that

∃g(u*) ∈ G(u*) :  ⟨g(u*), u − u*⟩ ≥ 0  ∀u ∈ V.  (12)
Proposition 2 Suppose (A1) and (A2) are fulfilled.
(i) If a point u* solves VI (12), then there exists a point x* = x(u*) ∈ X(u*) such that g(u*) is defined in (11) at u = u* and the pair w* = (x*, u*) is a solution of problem (6).
(ii) If a pair w(τ) = (x(τ), u(τ)) is a solution of problem (6), then the point u* = u(τ) solves VI (12).

The assertions follow directly from the definitions and Lemma 3. Furthermore, problem (10) appears to be equivalent to the following penalized EP: Find x(u) ∈ X such that

Φ(x(u), y) + 0.5τ Σ_{i=1}^{l} ( ‖[h_i(y_i) − u_i]_+‖² − ‖[h_i(x_i(u)) − u_i]_+‖² ) ≥ 0  ∀y ∈ X.  (13)
This property can be obtained similarly by the application of Lemma 2. However, (13) does not contain coupled variables and hence is equivalent to the NEP: Find x(u) ∈ X such that

f̃_i(x_{−i}(u), y_i) ≤ f̃_i(x(u))  ∀y_i ∈ X_i, i = 1, . . . , l;  (14)

where the i-th player has the penalized utility function

f̃_i(x) = f_i(x) − 0.5τ‖[h_i(x_i) − u_i]_+‖²;  (15)

cf. (1). Therefore, X(u) is precisely the set of Nash equilibrium points in (14)–(15), and we have obtained the basic equivalence result.
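To illustrate the penalized utilities (15) on a toy instance (our own example with made-up data, not from the paper): one player with f(x) = x − 0.5x² on X = [0, 2], scalar constraint h(x) = x, and share u. The penalized best response maximizes a concave one-dimensional function and has a closed form; it is pulled from the unpenalized optimum x = 1 toward the share u as τ grows.

```python
def best_response(u, tau):
    """Maximize f̃(x) = x − 0.5x² − 0.5·tau·(max(x − u, 0))² over X = [0, 2].
    For u ≥ 1 the penalty is inactive at the optimum x = 1; for u < 1 the
    stationarity condition 1 − x − tau·(x − u) = 0 gives the interior point."""
    if u >= 1.0:
        return 1.0
    return (1.0 + tau * u) / (1.0 + tau)

u = 0.5                                  # the player's assigned share
for tau in (1.0, 10.0, 1000.0):
    print(tau, best_response(u, tau))    # the response is pulled toward u

assert abs(best_response(0.5, 1000.0) - 0.5) < 1e-2
assert best_response(1.5, 5.0) == 1.0
```

This matches the intended behavior of the penalty scheme: for large τ the players essentially respect their shares, while for small τ they trade constraint violation against utility.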
Theorem 1 Suppose (A1) and (A2) are fulfilled.
(i) If a point u* solves VI (12), then there exist a solution x* = x(u*) of NEP (14)–(15) and an element g(u*) defined in (11) at u = u* such that the pair w* = (x*, u*) is a solution of problem (6).
(ii) If a pair w(τ) = (x(τ), u(τ)) is a solution of problem (6), then the point u* = u(τ) solves VI (12) and the point x(τ) is a solution of NEP (14)–(15) at u = u(τ).

We conclude that VI (12) related to the parametric NEP (14)–(15) yields a solution of the penalized game problem. Observe that NEP (14)–(15) always has a solution, again due to the adjustment of the Ky Fan inequality property from [13].

Proposition 3 If (A1) and (A2) are fulfilled, then NEP (14)–(15) has a solution for any u ∈ V and τ > 0.

In addition, we intend to establish the existence result for the penalized EP (6). Since the set V is unbounded in general, we will prove a proper coercivity condition for EP (6).

Proposition 4 If (A1) and (A2) are fulfilled, then EP (6) has a solution for each τ > 0.

Proof Fix a point x̃ ∈ D. Then there exists a point ũ ∈ V such that h_i(x̃_i) ≤ ũ_i, i = 1, . . . , l. Set w̃ = (x̃, ũ) and choose any τ > 0. Next, take any infinite sequence {w^k} with w^k = (x^k, u^k) ∈ X × V such that ‖w^k − w̃‖ → ∞ as k → ∞. We proceed to show that there exists a subsequence {w^{k_s}} such that

Φ_τ(w^{k_s}, w̃) → −∞  as k_s → ∞.  (16)

Since the set X is bounded, ‖x^k − x̃‖ ≤ C < ∞. Hence ‖u^k − ũ‖ → ∞. It follows that there exists at least one pair of indices j and t such that u_{jt}^{k_s} → −∞ due to the balance equality, for some subsequence {u^{k_s}}. But the sequence {x^k} is bounded, hence ‖[h_j(x_j^{k_s}) − u_j^{k_s}]_+‖² → +∞, which implies (16). Now the existence result follows from Theorem 1 in [15].

Within the proposed approach, the upper level system regulator determines the right share allocation vector u(τ) for each chosen penalty parameter τ as a solution of VI (12). The regulator can send some trial vectors u to the players, so that they must only correct their utility functions in conformity with (15) and then determine the usual Nash equilibrium point in (14). Properly changing the penalty parameter should lead to a solution of the initial problem (4).
Application of the Penalty Method to Equilibrium Problems
157
4 Convergence of the Penalty Method

In this section, we intend to substantiate the penalty method with approximately solved penalized problems. In fact, finding an exact solution of EP (6) is very difficult. Given numbers τ > 0 and ε > 0, we take the approximate problem of finding a pair w(τ, ε) = (x(τ, ε), u(τ, ε)) ∈ X × V such that

Φ_{τ,ε}(w(τ, ε), w) = Φ(x(τ, ε), x) + τ[P(w) − P(w(τ, ε))] + ε ≥ 0  ∀w = (x, u) ∈ X × V.  (17)
Thus, the approximate penalty method creates the trajectory {w(τ, ε)} for approximation of a solution of EP (5) as τ → +∞ and ε → 0. Observe that no concordance rules between τ and ε are necessary here. For any sequences {τ_k} and {ε_k} we will set w^k = (x^k, u^k) = w(τ_k, ε_k) for brevity.
Theorem 2 Suppose that (A1) and (A2) are fulfilled and the sequences {τ_k} and {ε_k} satisfy

{τ_k} ↗ +∞,  {ε_k} ↘ 0.  (18)
Then:
(i) EP (17) has a solution for any τ > 0 and ε > 0;
(ii) each sequence {w^k} of solutions of EP (17) has limit points, and all these limit points are solutions of EP (5).
Proof Assertion (i) clearly follows from Proposition 4. By (i), the sequence {w^k} is well-defined. We have to show that it is bounded. Take a solution x∗ of EP (5); it exists due to Proposition 1. Then there exists a point u∗ ∈ V such that h_i(x_i∗) ≤ u_i∗, i = 1, . . . , l. Set w∗ = (x∗, u∗); then

−ε_k ≤ Φ(x^k, x∗) + τ_k[P(w∗) − P(w^k)] = Φ(x^k, x∗) − τ_k P(w^k).

If the sequence {w^k} were unbounded, we would have ‖x^k‖ ≤ C < ∞ and ‖u^k‖ → ∞. Along the lines of the proof of Proposition 4 we would also obtain P(w^k) → +∞, which yields a contradiction. Therefore, the sequence {w^k} is bounded and has limit points. Let w̄ = (x̄, ū) be an arbitrary limit point of {w^k}, i.e. {w^{k_s}} → w̄. Then, by definition,

0 ≤ P(w^{k_s}) ≤ τ_{k_s}^{−1} Φ(x^{k_s}, x) + P(w) + τ_{k_s}^{−1} ε_{k_s}  ∀w = (x, u) ∈ X × V.
Take any w = (x, u) ∈ D × V. Then P(w) = 0 and

0 ≤ P(w̄) ≤ lim_{s→∞} P(w^{k_s}) ≤ 0,
i.e. P(w̄) = 0 and w̄ ∈ D × V. Next, for each x ∈ D we can take u ∈ V such that P(w) = P(x, u) = 0 and obtain

Φ(x^{k_s}, x) − τ_{k_s} P(w^{k_s}) = Φ(x^{k_s}, x) + τ_{k_s}[P(w) − P(w^{k_s})] ≥ −ε_{k_s}.

It now follows that

Φ(x̄, x) ≥ lim sup_{s→+∞} Φ(x^{k_s}, x) ≥ lim sup_{s→+∞} [τ_{k_s} P(w^{k_s}) − ε_{k_s}] ≥ 0.

Therefore x̄ solves EP (5) and assertion (ii) is true.
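For intuition about the schedule (18), consider the special case Φ(x, y) = f(y) − f(x), for which EP (5) reduces to minimizing f over the feasible set. The sketch below is an illustration only: the one-dimensional objective f(x) = x² and the constraint x ≥ 1 are made-up choices (with the quadratic penalty 0.5[1 − x]²_+ playing the role of P), and each penalized problem is solved inexactly to gradient accuracy ε_k while τ_k increases and ε_k decreases, as in Theorem 2.

```python
# Illustrative only: approximate penalty scheme of Theorem 2 for the special
# case Phi(x, y) = f(y) - f(x), i.e. penalized minimization.  The objective
# f(x) = x^2 and the constraint x >= 1 are hypothetical toy choices.

def solve_penalized(tau, eps, x0=0.0):
    """Minimize f(x) + 0.5*tau*max(1 - x, 0)**2 by gradient descent,
    stopping once the gradient norm is below eps (an inexact solution)."""
    x = x0
    step = 1.0 / (2.0 + tau)  # reciprocal Lipschitz constant of the gradient
    for _ in range(100000):
        grad = 2.0 * x - tau * max(1.0 - x, 0.0)
        if abs(grad) <= eps:
            break
        x -= step * grad
    return x

# Schedule (18): tau_k increasing to +infinity, eps_k decreasing to 0.
xs = [solve_penalized(tau=2.0 ** k, eps=1.0 / 2.0 ** k) for k in range(12)]
print(xs[-1])  # approaches the constrained minimizer x* = 1
```

Here the exact penalized minimizer is τ/(τ + 2), so the inexact trajectory converges to the constrained solution without any coupling between the two parameter sequences, mirroring Theorem 2.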
5 Inexact Solution Method for Penalized Problems

Theorems 1 and 2 enable one to solve the basic penalized EP (6) approximately with a suitable iterative method applied to the master VI (12). It was shown in [10] that the mapping G in (12) possesses a strengthened monotonicity property if the bi-function Φ is monotone, i.e.

Φ(x′, x″) + Φ(x″, x′) ≤ 0

for each pair of points x′, x″ ∈ X. Then VI (12) admits more efficient solution methods. However, the monotonicity property of the bi-function Φ in noncooperative games may be rather restrictive. At the same time, we can utilize the weaker monotonicity property of the mapping F defined by taking the proper partial derivatives of Φ. Utilization of this mapping was suggested in [3]. For the sake of simplicity, we consider the differentiable case and take the following additional assumptions.
(A3) Each utility function f_i is differentiable in its i-th variable x_i on X for i = 1, . . . , l. The mapping F defined by

F(x) = ∂Φ(x, y)/∂y |_{y=x}

is monotone on X.
Lemma 4 Let the conditions in (A1)–(A3) be fulfilled. Then problem (10) is equivalent to the following mixed type VI: Find x(u) ∈ X such that

⟨F(x(u)), y − x(u)⟩ + τ Σ_{i=1}^{l} ⟨[h_i(x_i(u)) − u_i]_+, h_i(y_i) − h_i(x_i(u))⟩ ≥ 0  ∀y ∈ X.  (19)
This property can be obtained similarly to that of Lemma 3. Hence, X(u) is also the solution set of problem (19).
Proposition 5 Suppose (A1)–(A3) are fulfilled. Then the mapping G is single-valued and co-coercive, i.e.

⟨u′ − u″, G(u′) − G(u″)⟩ ≥ ‖G(u′) − G(u″)‖²

for all u′, u″ ∈ V.
Proof Take arbitrary points u′, u″ ∈ V and set x′ = x(u′), g′ ∈ G(u′), v′ = h(x′) − u′ and x″ = x(u″), g″ ∈ G(u″), v″ = h(x″) − u″. It follows from (19) that

⟨F(x′), x″ − x′⟩ − τ Σ_{i=1}^{l} ⟨g′_i, h_i(x″_i) − h_i(x′_i)⟩ ≥ 0,
⟨F(x″), x′ − x″⟩ − τ Σ_{i=1}^{l} ⟨g″_i, h_i(x′_i) − h_i(x″_i)⟩ ≥ 0;

hence

τ Σ_{i=1}^{l} ⟨g′_i − g″_i, h_i(x′_i) − h_i(x″_i)⟩ ≥ ⟨F(x′) − F(x″), x′ − x″⟩ ≥ 0

since F is monotone. It follows that

0 ≤ Σ_{i=1}^{l} ⟨g′_i − g″_i, [h_i(x′_i) − u′_i] − [h_i(x″_i) − u″_i]⟩ + Σ_{i=1}^{l} ⟨g′_i − g″_i, u′_i − u″_i⟩
= −Σ_{i=1}^{l} ⟨[v′_i]_+ − [v″_i]_+, v′_i − v″_i⟩ + ⟨g′ − g″, u′ − u″⟩
≤ −Σ_{i=1}^{l} ‖[v′_i]_+ − [v″_i]_+‖² + ⟨g′ − g″, u′ − u″⟩
= −‖g′ − g″‖² + ⟨g′ − g″, u′ − u″⟩,

and therefore G is co-coercive and single-valued.
It should be noted that the mapping G is single-valued even if this is not the case for the mapping u ↦ X(u). Hence, we can rewrite VI (12) as follows: Find a point u∗ ∈ V such that

⟨G(u∗), u − u∗⟩ ≥ 0  ∀u ∈ V.  (20)

Nevertheless, since each computation of the value G(u) requires a solution of NEP (14)–(15), we describe an approximate version of the projection method for VI (20). Denote by π_V(u) the projection of a point u onto V.
Method (APM). Choose a point u⁰ ∈ V and sequences of non-negative numbers {δ_s} and {δ′_s} such that

Σ_{s=0}^{∞} (δ_s + δ′_s) < +∞.  (21)

At each s-th iteration, s = 0, 1, . . ., we have a point u^s ∈ V and find the next point u^{s+1} ∈ V in conformity with the conditions

‖u^{s+1} − ũ^{s+1}‖ ≤ δ_s,  (22)

where ũ^{s+1} = π_V[u^s − g^s] and

‖g^s − G(u^s)‖ ≤ δ′_s.  (23)

Thus, the method admits both the approximate projection step in (22) and the inexact computation of values of the mapping G in (23) within the control rule (21). The approximate projection allows us to implement the method even in the case when the set V is defined with the help of nonlinear constraints. Besides, this makes the implementation considerably simpler in the linear case as well. The substantiation of convergence of Method (APM) follows directly from Theorem 1.4 in [16, Chapter V] and Proposition 5.
Proposition 6 Suppose (A1)–(A3) are fulfilled. Then the sequence {u^s} converges to a solution u∗ of VI (20).
We now discuss condition (23) in more detail under assumptions (A1)–(A3). Fix any point u ∈ V and a number σ > 0 and define the approximate problem for VI (19): Find x̃(u) ∈ X such that

⟨F(x̃(u)), y − x̃(u)⟩ + τ Σ_{i=1}^{l} ⟨[h_i(x̃_i(u)) − u_i]_+, h_i(y_i) − h_i(x̃_i(u))⟩ + σ ≥ 0  ∀y ∈ X.  (24)
Take any solutions x′ = x(u) and x″ = x̃(u) of problems (19) and (24), respectively, and set g′ = (g′_1, . . . , g′_l)ᵀ with g′_i = −[h_i(x′_i) − u_i]_+ for i = 1, . . . , l, and g″ = (g″_1, . . . , g″_l)ᵀ with g″_i = −[h_i(x″_i) − u_i]_+ for i = 1, . . . , l. It follows from (19) and (24) that

⟨F(x′), x″ − x′⟩ − τ Σ_{i=1}^{l} ⟨g′_i, h_i(x″_i) − h_i(x′_i)⟩ ≥ 0,
⟨F(x″), x′ − x″⟩ − τ Σ_{i=1}^{l} ⟨g″_i, h_i(x′_i) − h_i(x″_i)⟩ + σ ≥ 0;

hence

τ Σ_{i=1}^{l} ⟨g′_i − g″_i, h_i(x′_i) − h_i(x″_i)⟩ + σ ≥ ⟨F(x′) − F(x″), x′ − x″⟩ ≥ 0

since F is monotone. It follows that

σ/τ ≥ Σ_{i=1}^{l} ⟨g′_i − g″_i, [u_i − h_i(x′_i)] − [u_i − h_i(x″_i)]⟩ ≥ Σ_{i=1}^{l} ‖g′_i − g″_i‖² = ‖g′ − g″‖².

Therefore,

‖g′ − g″‖ ≤ √(σ/τ).

We conclude that the desired accuracy δ′_s in (23) can be attained via approximate solution of NEP (14)–(15) at u = u^s, which is equivalent to the mixed VI (19) due to Lemma 3, so that it is sufficient to find a point x̃(u) satisfying (24) at u = u^s with

√(σ/τ) ≤ δ′_s.

This means that there exist rather simple iterative procedures for solving noncooperative game problems with joint constraints both for strategies and for shares allocation within the proposed approach.
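Method (APM) itself is straightforward to state in code. The sketch below applies it to a hypothetical co-coercive affine mapping G on a box V; the toy mapping, the box, and the error sequences are illustrative assumptions, not the implicitly defined mapping of the share-allocation problem. Projections are computed exactly (δ_s = 0), while the values of G are perturbed deterministically within a summable tolerance, as allowed by (23).

```python
# A minimal sketch of Method (APM) for a toy co-coercive mapping G;
# the affine G, the box V and the error levels are illustrative
# assumptions only.

def proj_box(u, lo=0.0, hi=3.0):
    # exact projection onto the box V = [lo, hi]^2, so delta_s = 0 in (22)
    return [min(max(ui, lo), hi) for ui in u]

def G(u, u_star=(1.0, 2.0)):
    # co-coercive with modulus 1: <u - v, G(u) - G(v)> >= ||G(u) - G(v)||^2
    return [0.5 * (ui - si) for ui, si in zip(u, u_star)]

u = [3.0, 0.0]
for s in range(60):
    delta2 = 0.5 ** s                    # summable error level in (23)
    g = [gi + delta2 for gi in G(u)]     # inexact value of G(u^s)
    u = proj_box([ui - gi for ui, gi in zip(u, g)])

print(u)  # close to the solution u* = (1, 2) of VI (20)
```

Since the evaluation errors are summable, the perturbed iteration still converges, which is exactly the point of the control rule (21).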
6 An Illustrative Example

In order to illustrate the properties of the decomposable penalty method, we give a simple example of an oligopolistic type model, which describes common utilization of a natural resource (e.g. water) and follows the lines of those in [17, Chapter II]. The model involves a system of n companies (players) which utilize common water resources, so that the total water consumption volume must be bounded above by a fixed number b > 0 within a given time period. If the i-th company receives the water volume x_i, it obtains the benefit h_i(x_i), whereas the price of water consumption depends on the total consumption volume, i.e. p = p(σ_x), where

σ_x = Σ_{j=1}^{n} x_j.

Then the profit function of the i-th company is defined by

f_i(x) = h_i(x_i) − x_i p(σ_x)

for i = 1, . . . , n, where x = (x_1, . . . , x_n)ᵀ. The i-th company's water consumption is also bounded above by a fixed number a_i > 0, hence it has the strategy set X_i = [0, a_i]; the joint constraint set is defined by

Y = { x ∈ Rⁿ | Σ_{i=1}^{n} x_i ≤ b },

and n = l. It follows that the model can be formulated as the generalized noncooperative game (2)–(3). For the sake of simplicity, we first take U_0 = Rⁿ; then Y = D and problem (4) coincides with (2)–(3). We can also take the normalized EP (5) where the bi-function Φ is defined as follows:

Φ(x, y) = Σ_{i=1}^{n} (h_i(x_i) − h_i(y_i)) − σ_x p(σ_x) + Σ_{i=1}^{n} y_i p( Σ_{j≠i} x_j + y_i ).  (25)

The streamlined application of the penalty method would consist in imposing penalties instead of the explicit common constraint in the set Y. Take the custom quadratic penalty function

P̃(x) = 0.5 [ Σ_{i=1}^{n} x_i − b ]_+².
Then, given a number τ > 0, we consider the problem of finding a point x(τ) ∈ X such that

Φ(x(τ), x) + τ[P̃(x) − P̃(x(τ))] ≥ 0  ∀x ∈ X;

cf. (6). However, this EP cannot be decomposed into a NEP due to the non-separability of the penalty function. If we modify each utility function as f̄_i(x) = f_i(x) − τP̃(x) and remove the set Y in (2), we obtain the usual NEP of form (1); see e.g. [7]. However, each company would then incur additional charges after any violation of the common constraint, regardless of its individual contribution, which does not seem fair.
We now describe the application of the proposed decomposable penalty method. The basic penalized auxiliary problem (6) leads to the equivalent two-level decomposable problem, as indicated in Theorem 1. For any u ∈ V, we can solve the NEP: Find x(u) ∈ X such that

f̃_i(x_{−i}(u), y_i) ≤ f̃_i(x(u))  ∀y_i ∈ X_i, i = 1, . . . , n;  (26)

where the i-th player has the penalized utility function

f̃_i(x) = f_i(x) − 0.5τ [x_i − u_i]²_+, i = 1, . . . , n.

If X(u) denotes the solution set of this problem, then for each x(u) ∈ X(u) we set g(u) = −[x(u) − u]_+ and define the mapping value G(u) = {g(u) | x(u) ∈ X(u)} for the upper level VI (12). In this way we obtain a completely decomposable and flexible procedure corresponding to the custom noncooperative game framework. In order to provide the assumptions in (A1) and (A2), we only have to impose conditions on the functions p and h_i. We make rather broad assumptions that follow those in [18].
(B1) Each function h_i : R₊ → R is concave and continuous for i = 1, . . . , n. The cost function p : R₊ → R is concave and continuous, and the function μ : R₊ → R defined by μ(σ) = σp(σ) is convex.
Then (see [18, Lemma 1]) the function μ̃(α) = αp(σ + α) is convex on R₊ for each fixed σ ≥ 0. It follows that each utility function f_i is concave in its i-th variable x_i and continuous for i = 1, . . . , n. Therefore, (A1) and (A2) hold true. The assumptions in (A3) will only require differentiability.
(B2) The functions p and h_i, i = 1, . . . , n, are differentiable.
We now give two natural examples of the cost function p satisfying conditions (B1) and (B2):

p(σ) = ασ^β, α > 0, β ∈ (0, 1];  p(σ) = ln(1 + σ).

Condition (B2) implies that each utility function f_i is differentiable. Due to (B1), each utility function f_i is concave in its i-th variable, hence Φ(x, ·) in (25) is convex for each x ∈ X. Besides, (B1) also gives that Φ(·, y) in (25) is concave for each y ∈ X; see e.g. [19, Proposition 12.4]. It follows that all the assumptions in (A3) are fulfilled; see e.g. [19, Proposition 11.8]. Due to Proposition 5 the mapping G is then single-valued and co-coercive. These properties essentially simplify the solution of the upper level VI (12) or (20), since now any Nash equilibrium point in (26) for a chosen u ∈ V gives the same value g(u). They also provide convergence of Method (APM), where the projection step in (22) can be calculated exactly. However, if the set V involves additional regulator restrictions defined by complex constraints, we can apply the same method with approximate projection steps.
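Under these conditions the whole two-level scheme can be sketched numerically. In the sketch below, the benefit functions h_i(x) = d_i x − 0.5x², the linear price p(σ) = σ, and all numerical data are hypothetical choices satisfying (B1)–(B2); the set V is assumed, for illustration only, to be the balance hyperplane {u : u_1 + · · · + u_n = b}. The inner NEP (26) is solved by Gauss–Seidel best responses (available in closed form for these quadratic payoffs), and the shares are updated by the projection method for VI (12) with unit step.

```python
# Sketch of the decomposable penalty method of Sect. 6 on the water-sharing
# game.  Hypothetical data: n = 3 firms, benefits h_i(x) = d_i*x - 0.5*x^2,
# linear price p(s) = s, caps a_i = cap and total volume b; V is assumed to
# be the balance hyperplane {u : sum(u) = b}.  None of these numbers come
# from the paper; they merely satisfy assumptions (B1)-(B2).
n, d, cap, b, tau = 3, [4.0, 5.0, 6.0], 3.0, 2.0, 50.0

def best_response(i, x, u):
    """Maximizer of f_i(x) - 0.5*tau*[x_i - u_i]_+^2 over [0, cap].
    With these payoffs, the unpenalized part has slope d_i - 3*x_i - s."""
    s = sum(x) - x[i]
    cand = (d[i] - s) / 3.0                      # root of unpenalized part
    if cand > u[i]:                              # penalty active beyond u_i
        cand = (d[i] - s + tau * u[i]) / (3.0 + tau)
    return min(max(cand, 0.0), cap)

def nash(u, sweeps=500, tol=1e-12):
    """Gauss-Seidel best-response iteration for the penalized NEP (26)."""
    x = list(u)
    for _ in range(sweeps):
        shift = 0.0
        for i in range(n):
            xi = best_response(i, x, u)
            shift = max(shift, abs(xi - x[i]))
            x[i] = xi
        if shift < tol:
            break
    return x

def proj_V(u):                                   # projection onto sum(u) = b
    c = (sum(u) - b) / n
    return [ui - c for ui in u]

u = [b / n] * n                                  # equal initial shares
for _ in range(4000):                            # projection method for (12)
    x = nash(u)
    g = [-max(x[i] - u[i], 0.0) for i in range(n)]   # g(u) = -[x(u) - u]_+
    u = proj_V([u[i] - g[i] for i in range(n)])

x = nash(u)
print([round(v, 3) for v in u], round(sum(x), 3))
# shares end up ordered by marginal benefit; total consumption stays near b
```

Each outer step asks the players only for an ordinary Nash equilibrium of the penalized game, while the regulator redistributes the shares; with a fixed τ the total consumption exceeds b only by an O(1/τ) penalty slack.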
7 Conclusions

In this paper, an approximate decomposable penalty method was applied to generalized noncooperative games with joint constraints and restrictions on the allocation of player shares. Since the streamlined penalty method leads to an equilibrium problem with coupled variables, the method involves an equivalent transformation to a completely decomposable problem. This approach enabled us to replace the initial problem with a sequence of approximate penalized Nash equilibrium problems together with an upper level variational inequality. Convergence of inexact solutions of these auxiliary problems to a solution of the initial game problem was established under natural conditions.
Acknowledgments This paper has been supported by the Kazan Federal University Strategic Academic Leadership Program ("PRIORITY-2030").
References 1. Nash, J.: Non-cooperative games. Ann. Math. 54, 286–295 (1951) 2. Debreu, G.: A social equilibrium existence theorem. Proc. Nat. Acad. Sci. U. S. A. 38, 886–893 (1952) 3. Rosen, J.B.: Existence and uniqueness of equilibrium points for concave n-person games. Econometrica 33, 520–534 (1965)
4. Zukhovitskii, S.I., Polyak, R.A., Primak, M.E.: Two methods of search for equilibrium points of n-person concave games. Soviet Math. Doklady 10, 279–282 (1969)
5. Ichiishi, T.: Game Theory for Economic Analysis. Academic Press, New York (1983)
6. Contreras, J., Klusch, M., Krawczyk, J.B.: Numerical solutions to Nash–Cournot equilibria in coupled constraint electricity markets. IEEE Trans. Power Syst. 19, 195–206 (2004)
7. Facchinei, F., Kanzow, C.: Generalized Nash equilibrium problems. 4OR 5, 173–210 (2007)
8. Pang, J.-S., Scutari, G., Facchinei, F., Wang, C.: Distributed power allocation with rate constraints in Gaussian parallel interference channels. IEEE Trans. Inform. Theory 54, 3471–3489 (2008)
9. Konnov, I.V.: Shares allocation methods for generalized game problems with joint constraints. Set-Valued Variat. Anal. 24, 499–516 (2016)
10. Konnov, I.V.: Decomposable penalty method for generalized game problems with joint constraints. Optimization. https://doi.org/10.1080/02331934.2020.1793153
11. Allevi, E., Gnudi, A., Konnov, I.V., Oggioni, G.: Decomposition method for oligopolistic competitive models with common environmental regulation. Ann. Oper. Res. 268, 441–467 (2018)
12. Nikaido, H., Isoda, K.: Note on noncooperative convex games. Pacific J. Math. 5, 807–815 (1955)
13. Fan, K.: A minimax inequality and applications. In: Shisha, O. (ed.) Inequalities III, pp. 103–113. Academic Press, New York (1972)
14. Konnov, I.V.: An approximate penalty method with descent for convex optimization problems. Russ. Math. (Iz. VUZ) 63(7), 41–55 (2019)
15. Blum, E., Oettli, W.: From optimization and variational inequalities to equilibrium problems. Math. Stud. 63, 127–149 (1994)
16. Gol'shtein, E.G., Tret'yakov, N.V.: Modified Lagrange Functions. Nauka, Moscow (1989) [Engl. transl.: John Wiley and Sons, New York, 1996]
17. Okuguchi, K., Szidarovszky, F.: The Theory of Oligopoly with Multi-product Firms. Springer-Verlag, Berlin (1990)
18. Murphy, F.H., Sherali, H.D., Soyster, A.L.: A mathematical programming approach for determining oligopolistic market equilibrium. Math. Program. 24, 92–106 (1982)
19. Konnov, I.V.: Equilibrium Models and Variational Inequalities. Elsevier, Amsterdam (2007)
Regular Networks Unification in Games with Stochastic Parameters Alexei Korolev
Abstract In this paper, stochastic parameters are introduced into the network game model with production and knowledge externalities. This model was formulated by V. Matveenko and A. Korolev and generalizes the two-period Romer model. Agents' productivities have deterministic and Wiener components. The paper studies the adjustment dynamics which occurs in the process of unifying three regular networks. Explicit expressions for the dynamics of network agents in the form of Brownian random processes are obtained, and a qualitative analysis of the solutions of the stochastic system is carried out.

Keywords Network games · Differential games · Brownian motion · Stochastic differential equations · Ito's lemma · Heterogeneous agents · Productivity
1 Introduction

Recent decades have seen the development of research areas such as the analysis of social networks, the economics of networks, and games on networks (e.g. [2–6, 8]). Numerous theoretical results in these areas have come to be widely used in the analysis of real-life networks such as the Internet, the relations of people in collectives and settlements, the relations of countries, etc. However, so far insufficient attention has been paid in the literature to production networks. In [9] a model with production and knowledge externalities with two time periods is considered which generalizes the Romer model [12]; the latter essentially examines a special case of the complete network. The agents, who receive an endowment in the first period, are located at the nodes of a network of arbitrary form. The endowment is divided between investment in knowledge and consumption. The consumption of the second period is determined by production, which depends both on one's own investments and on the investments made by the closest neighbors
A. Korolev () National Research University Higher School of Economics, St. Petersburg, Russia © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 L. A. Petrosyan et al. (eds.), Frontiers of Dynamic Games, Trends in Mathematics, https://doi.org/10.1007/978-3-030-93616-7_9
167
168
A. Korolev
in the network. The utility of an agent is determined by her consumption in the two periods. In [9] the concept of Nash equilibrium with externalities is introduced where, as in the usual Nash equilibrium, agents maximize their gain (utility) and none of the agents finds it beneficial to change her behavior if the others do not change theirs. However, in this model it is assumed that the agent is not able to change her behavior completely arbitrarily, as the concept of Nash equilibrium allows, but is to a certain extent attached to the equilibrium situation of the game. Specifically, in [9] it is assumed that the agent makes a decision while in a certain environment formed by her and her network neighbors, and although she is involved in shaping the environment at the time of making the decision, the environment is considered by the agent as exogenously given. However, in [9] only networks with homogeneous agents were considered. In [10] a generalization of the model [9] is examined in a situation where the productivities of the agents may differ. Conditions are established under which the agent behaves in equilibrium in one way or another: passively (does not invest), actively (invests part of the income), or hyperactively (invests all income). Conditions for the existence of an inner equilibrium (i.e., an equilibrium with active agents) are also found for several networks, and a theorem on comparing the utilities of agents is proved. In addition, [10] introduced the dynamics of networks in discrete time, defined the concept of dynamic stability of equilibria, and studied the transient processes that occur when networks are combined, also in discrete time. In [11] the dynamics of networks with production and externalities of knowledge and the concept of dynamic stability of equilibria are considered in continuous time. However, in all of the above-mentioned works, the network parameters were deterministic.
The contribution of this paper consists in the description of the transition dynamics in the stochastic case, when agent productivity has both deterministic and Brownian components. We examine the behavior of a single agent and of a dyad. It turns out that the boundaries between the various scenarios of an agent's behavior in the stochastic case are shifted in comparison with the deterministic case.
The content of the article is as follows. We begin by describing our main model and listing some of its properties. The second section contains a description of the main model and a review of previous results. Definitions of Nash equilibrium in a network with production and externalities of knowledge and of the dynamic stability of equilibrium are given. The second section also provides a characterization of the behaviors of agents, that is, the main analysis tools; at the end of the section, the new stochastic model is described. The third section considers the dynamics of a single agent in the deterministic and stochastic cases, i.e. when an agent has constant productivity and when her productivity consists of two terms: a deterministic one and a stochastic (Wiener) process. The third section also contains previous results concerning the transition dynamics in a triregular network in the deterministic case. The fourth section describes the adjustment dynamics in a triregular network in the stochastic case. Explicit expressions for the dynamics of agents in the form of Brownian random processes are obtained (Theorem 1).
Regular Networks Unification in Games with Stochastic Parameters
169
The paper provides a qualitative analysis of the solutions of the systems of stochastic equations (Corollary 1). The fifth section summarizes the results and lists possible directions for future research.
2 Deterministic Model and Review of Previous Results

We begin by describing our main model and listing some of its properties, since our new model differs from the original one only in the stochastic nature of the parameters. The basic model under consideration (the deterministic version) was formulated in [9] and represents the transfer to the network of the generalized Romer model [12]. There is a network (undirected graph) with n nodes, i = 1, 2, . . . , n; each node represents an agent. In period 1 each agent i possesses an initial endowment of good, e, and uses it partially for consumption in the first period of life, c_1^i, and partially for investment into knowledge, k_i:

c_1^i + k_i = e,  i = 1, 2, . . . , n.

Investment immediately transforms one-to-one into knowledge, which is used in the production of good for consumption in the second period of life, c_2^i. Preferences of agent i are described by the quadratic utility function

U_i(c_1^i, c_2^i) = c_1^i (e − a c_1^i) + b_i c_2^i,

where a is a satiation coefficient and b_i > 0 is a parameter characterizing the value of comfort and health in the second period of life compared with consumption in the first period. It is assumed that c_1^i ∈ [0, e] and that the utility is increasing and concave (the marginal utility decreases) with respect to c_1^i. These assumptions are equivalent to the condition 0 < a < 1/2. The environment of agent i is the sum of investments by the agent herself and her neighbors:

K_i = k_i + K̃_i,  K̃_i = Σ_{j∈N(i)} k_j,

where N(i) is the set of neighboring nodes of node i; K̃_i will be called the pure externality. Production in node i is described by the production function

F(k_i, K_i) = B_i k_i K_i,  B_i > 0,
which depends on the state of knowledge in the i-th node, k_i, and on the environment, K_i; B_i is a technological coefficient. We will denote the product b_i B_i by A_i and assume that a < A_i. Since an increase of either of the parameters b_i, B_i promotes an increase of the second period consumption, we will call A_i "productivity". We will assume that A_i ≠ 2a, i = 1, 2, . . . , n. If A_i > 2a, we will say that the i-th agent is productive, and if A_i < 2a, we will say that the i-th agent is unproductive. Three ways of behavior are possible: agent i is called passive if she makes zero investment, k_i = 0 (i.e. consumes the whole endowment in period 1); active if 0 < k_i < e; hyperactive if she makes the maximal possible investment e (i.e. consumes nothing in period 1). Let us consider the following game. The players are the agents i = 1, 2, . . . , n. Possible actions (strategies) of player i are values of investment k_i from the segment [0, e]. A Nash equilibrium with externalities (for shortness, equilibrium) is a profile of knowledge levels (investments) (k_1∗, k_2∗, . . . , k_n∗) such that each k_i∗ is a solution of the following problem P(K_i) of maximization of the i-th player's utility given the environment K_i:
U_i(c_1^i, c_2^i) → max over c_1^i, c_2^i, k_i

subject to

c_1^i ≤ e − k_i,  c_2^i ≤ F(k_i, K_i),  c_1^i ≥ 0, c_2^i ≥ 0, k_i ≥ 0,

where the environment K_i is defined by the profile (k_1∗, k_2∗, . . . , k_n∗):

K_i = k_i∗ + Σ_{j∈N(i)} k_j∗.

The first two constraints of problem P(K_i) are evidently satisfied as equalities at the optimum point. Substituting them into the objective function, we obtain a new function (the payoff function, or indirect utility function):

V_i(k_i, K_i) = U_i(e − k_i, F(k_i, K_i)) = (e − k_i)(e − a(e − k_i)) + A_i k_i K_i
= e²(1 − a) − k_i e(1 − 2a) − a k_i² + A_i k_i K_i.  (1)

If all players' solutions are internal (0 < k_i < e), i.e. all players are active, the equilibrium will be referred to as an inner equilibrium. Otherwise it will be referred to as a corner equilibrium. A corner equilibrium in which the level of knowledge at each vertex is 0 or e, i.e. all players are passive or hyperactive, will be called a purely corner equilibrium.
Clearly, the inner equilibrium (if it exists for given values of the parameters) is defined by the system

D_1 V_i(k_i, K_i) = 0,  i = 1, 2, . . . , n,  (2)

or, according to (1),

D_1 V_i(k_i, K_i) = e(2a − 1) − 2a k_i + A_i K_i = 0,  i = 1, 2, . . . , n.  (3)

Let us introduce the following notation: Ã is the diagonal matrix with the numbers A_1, A_2, . . . , A_n on the main diagonal, I is the unit n × n matrix, and M is the network adjacency matrix. In this matrix M_ij = M_ji = 1 if there is an edge connecting vertices i and j, and M_ij = M_ji = 0 otherwise. It is assumed that M_ii = 0 for all i = 1, 2, . . . , n. The system of Eqs. (3) takes the form

(Ã − 2aI)k + ÃMk = ē,  (4)

where k = (k_1, k_2, . . . , k_n)ᵀ and ē = (e(1 − 2a), e(1 − 2a), . . . , e(1 − 2a))ᵀ. In [10] the adjustment dynamics is introduced in discrete time; it can begin after a small deviation of the agents' strategies from the equilibrium position, or after a unification of networks, each of which was in equilibrium prior to the unification. We imagine the dynamics in the network in such a way that each agent maximizes utility by choosing the level of investment, while considering the environment as given. Hence, if k_i^n = 0 and D_1 V_i(k_i, K_i)|_{k_i=0} ≤ 0, then k_i^{n+1} = 0, and if k_i^n = e and D_1 V_i(k_i, K_i)|_{k_i=e} ≥ 0, then k_i^{n+1} = e. In all other cases, i.e. at all internal points of the interval [0, e], as well as if k_i^n = 0 and D_1 V_i(k_i, K_i)|_{k_i=0} > 0, or if k_i^n = e and D_1 V_i(k_i, K_i)|_{k_i=e} < 0, the behavior of the agent is determined by the relation

−2a k_i^{n+1} + A_i K_i^n − e(1 − 2a) = 0,

i.e.

k_i^{n+1} = (A_i/2a) k_i^n + (A_i/2a) K̃_i^n − e(1 − 2a)/2a.
In [11] the dynamics in continuous time were modeled, also for the deterministic model, as follows. Definition 1 ([11], Definition 5) Each agent maximizes her utility by choosing a level of investment; at the moment of decision-making she considers her environment as exogenously given. Correspondingly, if ki (t) = 0, where t is an arbitrary moment, and D1 Vi (ki , Ki )|ki =0 ≤ 0, then k˙i (t) = 0, and if ki (t) = e
and D_1 V_i(k_i, K_i)|_{k_i=e} ≥ 0, then k̇_i(t) = 0; in all other cases, k_i(t) solves the differential equation

k̇_i = (A_i/2a) K̃_i + ((A_i − 2a)/2a) k_i − e(1 − 2a)/2a.

Definition 2 ([11], Definition 6) The equilibrium is called dynamically stable if, after a small deviation of one of the agents from the equilibrium, a dynamics starts which returns the system to the initial state. In the opposite case the equilibrium is called dynamically unstable.
The next natural step is the rejection of determinism and the transition to stochastic parameters of the agents in our model. The assumption of a stochastic component in the agents' productivities seems quite realistic in our view, while the endowments of the agents remain constants in this framework. In the new model the productivity of each agent has not only a deterministic component A_i but also a Brownian (Wiener) component α_i W_t^i (with W_0^i = 0). Thus, the full productivity of the i-th agent is now equal to A_i + α_i W_t^i. The other assumptions remain the same as in our main model. In this case, the indirect utility function (1) takes the form

V_i(k_i, K_i) = e²(1 − a) − k_i e(1 − 2a) − a k_i² + (A_i + α_i W_t^i) k_i K_i,

and Eq. (3) is transformed as follows:

D_1 V_i(k_i, K_i) = e(2a − 1) − 2a k_i + (A_i + α_i W_t^i) K_i = 0,  i = 1, 2, . . . , n.

Thus, in discrete time the equation of dynamics of agent i now is

k_i^{n+1} = ((A_i + α_i W_n^i)/2a) k_i^n + ((A_i + α_i W_n^i)/2a) K̃_i^n − e(1 − 2a)/2a,

or

k_i^{n+1} − k_i^n = ((A_i + α_i W_n^i)/2a − 1) k_i^n + ((A_i + α_i W_n^i)/2a) K̃_i^n − e(1 − 2a)/2a.

Turning to continuous time, we have

k̇_i = ((A_i + α_i W_t^i)/2a − 1) k_i + ((A_i + α_i W_t^i)/2a) K̃_i − e(1 − 2a)/2a,

or, in differential form,

dk_i = ((A_i/2a) − 1) k_i dt + (α_i/2a) k_i dW_t^i + (A_i/2a) K̃_i dt + (α_i/2a) K̃_i dW_t^i − (e(1 − 2a)/2a) dt.
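For a dyad (two agents who are each other's only neighbors, so that K̃_i is the partner's knowledge), the differential form above can be simulated by the Euler–Maruyama scheme. The sketch below is illustrative only: the parameter values are made-up assumptions, and the clipping to [0, e] is a simplification of the boundary rule of Definition 1. With α_i = 0 the scheme reduces to the deterministic dynamics, whose interior equilibrium k∗ = e(1 − 2a)/(2(A − a)) for equal productivities is dynamically unstable, so deterministic trajectories starting above (below) it are driven to the corner e (respectively 0).

```python
import random

# Euler-Maruyama simulation of the dyad dynamics dk_i above; the numbers
# (a, e, A_i, alpha_i, horizon) are illustrative assumptions only.
a, e = 0.25, 1.0
A = [0.75, 0.75]          # deterministic productivities
dt, steps = 1e-3, 20000

def simulate(k0, alpha, seed=0):
    rng = random.Random(seed)
    k = list(k0)
    for _ in range(steps):
        dW = [rng.gauss(0.0, dt ** 0.5) for _ in range(2)]
        new = []
        for i in range(2):
            Kt = k[1 - i]  # pure externality: the other agent's knowledge
            drift = (A[i] / (2*a) - 1.0) * k[i] + A[i] / (2*a) * Kt \
                    - e * (1 - 2*a) / (2*a)
            diff = alpha[i] / (2*a) * (k[i] + Kt)
            # clip to [0, e]: a simplification of the corner rule of Def. 1
            new.append(min(max(k[i] + drift * dt + diff * dW[i], 0.0), e))
        k = new
    return k

print(simulate([0.6, 0.6], alpha=[0.0, 0.0]))  # deterministic: tends to [1, 1]
print(simulate([0.6, 0.6], alpha=[0.1, 0.1]))  # one stochastic trajectory
```

In the symmetric deterministic case the dynamics reduces to k̇ = 2k − 1 (for these parameter values, with k∗ = 0.5), so the starting point 0.6 is driven to the hyperactive corner e, while a start at 0.4 would be driven to the passive corner 0; the Wiener term can push a trajectory across this boundary, which is exactly the shift of scenarios discussed in the introduction.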
3 Adjustment Dynamics in Triregular Networks with Heterogeneous Agents (Deterministic Case) Definition 3 Let the set of nodes 1, 2, . . . , n be decomposed into disjoint classes in such way that any nodes belonging the same class have the same productivity and the same numbers of neighbors from each class. The classes will be referred as types of nodes. Type i is characterized by productivity Ai and by vector t i = (ti1 , ti2 , . . . , tik ), where tij is the number of neighbors of type j for any node of type i. Let us describe an algorithm of subdivision of the set of nodes of network into types. Let s be a current number of subsets of subdivision. Initially s is the number of various productivities. Iteration of the Algorithm Consider nodes of the first subset. If all of them have the same numbers of neighbors in each subset 1, 2, . . . , s, then the first subset is not changed. In the opposite case, we divide the first subset into new subsets in such way that all nodes of each new subset have the same vector of numbers of neighbors in subsets. We proceed in precisely same way with the second, the third, . . . , the s-th subset. If on the present iteration the number of subsets s have not changed, then the algorithm finishes its work. If s has increased, then the new iteration is executed. The number of subsets s does not decrease in process of the algorithm. Since s is bounded from above by the number of nodes in the network, the algorithm converges. It is clear that the algorithm divides the set of nodes into the minimal possible number of classes. Definition 4 A network in which each node has the same degree (number of neighbors) is referred as regular. Definition 5 Let us consider a regular network consisting of three types of agents with productivities Ai and vectors ti = (ti1 , ti2 , ti3 ), i = 1, 2, 3; A1 > A2 > A3 . Let the following conditions be satisfied: t11 + 1 = t21 = t31 = n1 , t12 = t22 + 1 = t32 = n2 , t13 = t23 = t33 + 1 = n3 . 
Such a network will be referred to as a triregular network of degree (n_1, n_2, n_3). Triregularity seems to be a natural specification of regularity. In a triregular network, any agent has n_i links with type i (i = 1, 2, 3). (Since each agent of type i is "linked" in some sense with herself, she has only n_i − 1 links with other agents of type i.)
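The subdivision of Definition 3 can be sketched as an iterative partition refinement; the helper below is an illustrative implementation (the names `adj` for the adjacency list and `prod` for the productivity map are assumptions, not notation from the text):

```python
# Sketch of the subdivision algorithm from Definition 3: starting from one
# subset per distinct productivity, nodes are repeatedly split by their
# vectors of neighbor counts until the number of subsets s stops growing.
from collections import defaultdict

def node_types(adj, prod):
    # Initial subdivision: one subset per distinct productivity.
    label = {v: prod[v] for v in adj}
    while True:
        # Signature of a node: its current label plus the numbers of its
        # neighbors in each current subset.
        sig = {}
        for v in adj:
            counts = defaultdict(int)
            for u in adj[v]:
                counts[label[u]] += 1
            sig[v] = (label[v], tuple(sorted(counts.items())))
        new_label = {v: sig[v] for v in adj}
        # Stop when the number of subsets has not changed on this iteration.
        if len(set(new_label.values())) == len(set(label.values())):
            return new_label
        label = new_label
```

For example, on a path 0-1-2 with equal productivities the algorithm separates the two end nodes from the middle node, since their numbers of neighbors differ.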
A special case of a triregular network is a complete network with n_1 + n_2 + n_3 nodes, obtained as the junction of three complete networks with n_1, n_2, and n_3 nodes.

Definition 6 An equilibrium (or any other situation) is called symmetric if all players of the same type choose the same action (make the same investment).

In a triregular network, let each agent of type i invest k_0^i in the initial time period (i = 1, 2, 3). Correspondingly, the environment (common for all agents) in the initial period is K = n_1k_0^1 + n_2k_0^2 + n_3k_0^3. Assume that for each i (i = 1, 2, 3) either k_0^i = 0 and D_1V_1(k_i, K)|_{k_i=0} > 0, or k_0^i = e and D_1V_1(k_i, K)|_{k_i=e} < 0, or k_0^i ∈ (0, e). Then Definition 1 implies that the dynamics is described by the system of differential equations

\dot{k}_1 = \frac{n_1A_1-2a}{2a}k_1 + \frac{n_2A_1}{2a}k_2 + \frac{n_3A_1}{2a}k_3 + \frac{e(2a-1)}{2a},
\dot{k}_2 = \frac{n_1A_2}{2a}k_1 + \frac{n_2A_2-2a}{2a}k_2 + \frac{n_3A_2}{2a}k_3 + \frac{e(2a-1)}{2a},    (5)
\dot{k}_3 = \frac{n_1A_3}{2a}k_1 + \frac{n_2A_3}{2a}k_2 + \frac{n_3A_3-2a}{2a}k_3 + \frac{e(2a-1)}{2a},

with initial conditions

k_i(0) = k_0^i,  i = 1, 2, 3.    (6)
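Since the right-hand side of (5) is affine in (k_1, k_2, k_3), a rest point of the unconstrained system can be computed directly by solving a linear system. The sketch below uses illustrative parameter values for a, e, n_i, A_i (assumptions, not values from the text):

```python
import numpy as np

# Illustrative parameters (assumed): a, e, degrees n_i, productivities A_i.
a, e = 1.0, 1.0
n = np.array([3.0, 4.0, 5.0])
A = np.array([3.0, 2.0, 1.5])

# Coefficient matrix of system (5): entry (i, j) = n_j * A_i / (2a),
# minus 1 on the diagonal; constant term e(2a-1)/(2a) in every equation.
M = np.outer(A, n) / (2 * a) - np.eye(3)
c = np.full(3, e * (2 * a - 1) / (2 * a))

# Rest point of the unconstrained dynamics: solve M k* + c = 0.
k_star = np.linalg.solve(M, -c)
```

The computed k* makes the right-hand side of (5) vanish; whether it is an admissible equilibrium additionally depends on the constraint k_i ∈ [0, e].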
Triregular networks are often found in practice. An example is the interaction of firms or non-profit organizations, when an institution of each type requires a certain number of partners of each type to work successfully. We could also consider the more general situation of m-regularity.

Definition 7 Consider a regular network consisting of m types of agents with productivities A_i and vectors t_i = (t_{i1}, t_{i2}, ..., t_{im}), i = 1, 2, ..., m; A_1 > A_2 > ... > A_m. Let the following conditions be satisfied:

t_{11} + 1 = t_{21} = ... = t_{m1} = n_1,
t_{12} = t_{22} + 1 = ... = t_{m2} = n_2,
...,
t_{1m} = t_{2m} = ... = t_{mm} + 1 = n_m.

Such a network will be referred to as an m-regular network of degree (n_1, n_2, ..., n_m). In an m-regular network, let each agent of type i invest k_0^i in the initial time period (i = 1, 2, ..., m).
Assume that for each i (i = 1, 2, ..., m) either k_0^i = 0 and D_1V_1(k_i, K)|_{k_i=0} > 0, or k_0^i = e and D_1V_1(k_i, K)|_{k_i=e} < 0, or k_0^i ∈ (0, e). Then the dynamics in the m-regular network is described by the following system of differential equations:

\dot{k}_1 = \frac{n_1A_1-2a}{2a}k_1 + \frac{n_2A_1}{2a}k_2 + \dots + \frac{n_mA_1}{2a}k_m + \frac{e(2a-1)}{2a},
\dot{k}_2 = \frac{n_1A_2}{2a}k_1 + \frac{n_2A_2-2a}{2a}k_2 + \dots + \frac{n_mA_2}{2a}k_m + \frac{e(2a-1)}{2a},    (7)
...
\dot{k}_m = \frac{n_1A_m}{2a}k_1 + \frac{n_2A_m}{2a}k_2 + \dots + \frac{n_mA_m-2a}{2a}k_m + \frac{e(2a-1)}{2a},

with initial conditions

k_i(0) = k_0^i,  i = 1, 2, ..., m.    (8)
4 Adjustment Dynamics in Triregular Networks with Heterogeneous Agents (Stochastic Case)

Consider three regular networks that are in some initial state. The productivity of each agent has a constant and a random (Brownian) component. The first regular network, of degree t_{11}, consists of agents with productivity A_1 + α_1W_t^1 each; the second regular network, of degree t_{22}, consists of agents with productivity A_2 + α_2W_t^2 each; and the third regular network, of degree t_{33}, consists of agents with productivity A_3 + α_3W_t^3 each. Before joining, each of the original networks exhibited homophily, i.e., all agents in each of the networks made the same investment in knowledge: k_1^0 in the first network, k_2^0 in the second, and k_3^0 in the third.

Let these three networks be combined at time t = 0 to form a triregular network of degree (n_1, n_2, n_3) in accordance with Definition 5. Agents of the first network become agents of the first type in the unified triregular network, agents of the second network become agents of the second type, and agents of the third network become agents of the third type.

So our three regular networks, with different agent productivities and initial investments in knowledge k_1^0, k_2^0, and k_3^0, respectively, are combined into a single triregular network. We assume that the changes in the productivity of the agents are caused by the same random influences in the economy, and that the sizes of the random components are proportional to the constant terms of the agents' productivity. In other words, the productivity of agents of the first type equals A_1 + α_1W_t, the productivity of agents of the second type equals A_2 + α_2W_t, and
the productivity of agents of the third type equals A_3 + α_3W_t, while

\frac{A_1}{\alpha_1} = \frac{A_2}{\alpha_2} = \frac{A_3}{\alpha_3}.    (9)

Then homophily will also hold in the combined triregular network: agents of the same type behave in the same way at any given time. The dynamics in such a network is described by the system of stochastic differential equations

\dot{k}_j = \sum_{i=1}^{3}\frac{n_i(A_j+\alpha_jW_t)}{2a}k_i - k_j - \frac{e(1-2a)}{2a},  j = 1, 2, 3,

or, in differential form,

dk_j = \Bigl(\sum_{i=1}^{3}\frac{n_iA_j}{2a}k_i - k_j - \frac{e(1-2a)}{2a}\Bigr)dt + \sum_{i=1}^{3}\frac{n_i\alpha_j}{2a}k_i\,dW_t,  j = 1, 2, 3.    (10)

The matrix notation of system (10) has the form

dk = Ak\,dt + \alpha k\,dW_t + \varepsilon\,dt,    (11)

where k = (k_1, k_2, k_3)^T, dk = (dk_1, dk_2, dk_3)^T,

A = \begin{pmatrix} \frac{n_1A_1}{2a}-1 & \frac{n_1A_1}{2a} & \frac{n_1A_1}{2a}\\ \frac{n_2A_2}{2a} & \frac{n_2A_2}{2a}-1 & \frac{n_2A_2}{2a}\\ \frac{n_3A_3}{2a} & \frac{n_3A_3}{2a} & \frac{n_3A_3}{2a}-1 \end{pmatrix},
\alpha = \frac{1}{2a}\begin{pmatrix} n_1\alpha_1 & n_1\alpha_1 & n_1\alpha_1\\ n_2\alpha_2 & n_2\alpha_2 & n_2\alpha_2\\ n_3\alpha_3 & n_3\alpha_3 & n_3\alpha_3 \end{pmatrix},
\varepsilon = -\frac{e(1-2a)}{2a}\begin{pmatrix}1\\1\\1\end{pmatrix}.
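A quick way to inspect sample paths of the linear SDE (11) is the Euler–Maruyama scheme. The sketch below assembles the drift and diffusion matrices from system (10); all parameter values are illustrative assumptions, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumed). alpha_i proportional to A_i keeps (9).
a, e = 1.0, 1.0
n = np.array([3.0, 4.0, 5.0])
A_prod = np.array([3.0, 2.0, 1.5])
alpha_prod = 0.1 * A_prod

# From system (10): coefficient of k_i in the equation for k_j is
# n_i A_j / (2a) for the drift and n_i alpha_j / (2a) for the noise.
A = np.outer(A_prod, n) / (2 * a) - np.eye(3)
al = np.outer(alpha_prod, n) / (2 * a)
eps = -np.full(3, e * (1 - 2 * a) / (2 * a))

def simulate(k0, T=1.0, steps=2000):
    """Euler-Maruyama scheme for dk = A k dt + alpha k dW_t + eps dt."""
    dt = T / steps
    k = np.array(k0, dtype=float)
    for _ in range(steps):
        dW = rng.normal(0.0, np.sqrt(dt))
        k = k + (A @ k + eps) * dt + (al @ k) * dW
    return k

k_T = simulate([0.2, 0.2, 0.2])
```

The scheme only approximates the (unconstrained) linear dynamics; in the model proper, the investments are additionally confined to [0, e].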
Theorem 1 In the stochastic case the dynamics of the investments in knowledge of the agents in the triregular network is

k_j(t) = \frac{e(1-2a)\bigl(2n_jA_j-\sum_{i\neq j}n_iA_i\bigr)}{2a\sum_{i=1}^{3}n_iA_i}
+ \Biggl(\frac{k_j^0\sum_{i\neq j}n_iA_i-n_jA_j\sum_{i\neq j}k_i^0}{\sum_{i=1}^{3}n_iA_i}-\frac{e(1-2a)\bigl(2n_jA_j-\sum_{i\neq j}n_iA_i\bigr)}{2a\sum_{i=1}^{3}n_iA_i}\Biggr)\exp\{-t\}
+ \frac{n_jA_j}{\sum_{i=1}^{3}n_iA_i}\exp\Biggl\{\Biggl(\frac{\sum_{i=1}^{3}n_iA_i}{2a}-\frac{\bigl(\sum_{i=1}^{3}n_i\alpha_i\bigr)^2}{8a^2}-1\Biggr)t+\frac{\sum_{i=1}^{3}n_i\alpha_i}{2a}W_t\Biggr\}
\times\Biggl[\sum_{i=1}^{3}k_i^0-\frac{3e(1-2a)}{2a}\int_0^t\exp\Biggl\{-\Biggl(\frac{\sum_{i=1}^{3}n_iA_i}{2a}-\frac{\bigl(\sum_{i=1}^{3}n_i\alpha_i\bigr)^2}{8a^2}-1\Biggr)\tau-\frac{\sum_{i=1}^{3}n_i\alpha_i}{2a}W_\tau\Biggr\}d\tau\Biggr],    (12)

where j = 1, 2, 3.

Proof It is clear that the matrices A and α commute in view of (9); therefore, for the matrix exponentials, the relation exp{At} exp{αW_t} = exp{At + αW_t} holds, and we can solve the matrix equation (11) by multiplying from the left by the matrix exponential

\exp\Bigl\{-At-\alpha W_t+\frac{\alpha^2}{2}t\Bigr\}.
Denote for brevity

\Psi = -At-\alpha W_t+\frac{\alpha^2}{2}t.

Then we have

d(\exp\{\Psi\}k) = \exp\{\Psi\}dk + d\exp\{\Psi\}\cdot k + d\exp\{\Psi\}\cdot dk
= \exp\{\Psi\}(Ak\,dt+\alpha k\,dW_t+\varepsilon\,dt) + \exp\{\Psi\}\Bigl(-A\,dt-\alpha\,dW_t+\frac{\alpha^2}{2}dt+\frac{\alpha^2}{2}dt\Bigr)k - \exp\{\Psi\}\alpha^2k\,dt = \exp\{\Psi\}\varepsilon\,dt.

Thus, Eq. (11) takes the form

d\Bigl(\exp\Bigl\{-At-\alpha W_t+\frac{\alpha^2}{2}t\Bigr\}k\Bigr) = \exp\Bigl\{-At-\alpha W_t+\frac{\alpha^2}{2}t\Bigr\}\varepsilon\,dt,

and therefore the solution of the matrix equation (11) can be written as

k(t) = \exp\Bigl\{At+\alpha W_t-\frac{\alpha^2}{2}t\Bigr\}k_0 + \exp\Bigl\{At+\alpha W_t-\frac{\alpha^2}{2}t\Bigr\}\int_0^t\exp\Bigl\{-A\tau-\alpha W_\tau+\frac{\alpha^2}{2}\tau\Bigr\}d\tau\,\varepsilon.    (13)

Notice that

\frac{\alpha^2}{2} = \frac{1}{8a^2}\begin{pmatrix}n_1\alpha_1&n_1\alpha_1&n_1\alpha_1\\ n_2\alpha_2&n_2\alpha_2&n_2\alpha_2\\ n_3\alpha_3&n_3\alpha_3&n_3\alpha_3\end{pmatrix}^2 = \frac{\sum_{i=1}^{3}n_i\alpha_i}{8a^2}\begin{pmatrix}n_1\alpha_1&n_1\alpha_1&n_1\alpha_1\\ n_2\alpha_2&n_2\alpha_2&n_2\alpha_2\\ n_3\alpha_3&n_3\alpha_3&n_3\alpha_3\end{pmatrix}.
The eigenvalues of the matrix

A-\frac{\alpha^2}{2} = \begin{pmatrix} \frac{n_1A_1}{2a}-\frac{n_1\alpha_1\sum_{i=1}^{3}n_i\alpha_i}{8a^2}-1 & \frac{n_1A_1}{2a}-\frac{n_1\alpha_1\sum_{i=1}^{3}n_i\alpha_i}{8a^2} & \frac{n_1A_1}{2a}-\frac{n_1\alpha_1\sum_{i=1}^{3}n_i\alpha_i}{8a^2}\\ \frac{n_2A_2}{2a}-\frac{n_2\alpha_2\sum_{i=1}^{3}n_i\alpha_i}{8a^2} & \frac{n_2A_2}{2a}-\frac{n_2\alpha_2\sum_{i=1}^{3}n_i\alpha_i}{8a^2}-1 & \frac{n_2A_2}{2a}-\frac{n_2\alpha_2\sum_{i=1}^{3}n_i\alpha_i}{8a^2}\\ \frac{n_3A_3}{2a}-\frac{n_3\alpha_3\sum_{i=1}^{3}n_i\alpha_i}{8a^2} & \frac{n_3A_3}{2a}-\frac{n_3\alpha_3\sum_{i=1}^{3}n_i\alpha_i}{8a^2} & \frac{n_3A_3}{2a}-\frac{n_3\alpha_3\sum_{i=1}^{3}n_i\alpha_i}{8a^2}-1 \end{pmatrix}

are, as one can easily see,

\lambda_1 = -1,  \lambda_2 = -1,  \lambda_3 = \frac{\sum_{i=1}^{3}n_iA_i}{2a}-\frac{\bigl(\sum_{i=1}^{3}n_i\alpha_i\bigr)^2}{8a^2}-1.

As eigenvectors we can obviously take

e_1 = (1, -1, 0)^T,  e_2 = (0, 1, -1)^T,

and

e_3 = (n_1A_1, n_2A_2, n_3A_3)^T, or, in view of (9), e_3 = (n_1\alpha_1, n_2\alpha_2, n_3\alpha_3)^T.

The eigenvalues of the matrix α are

\mu_1 = 0,  \mu_2 = 0,  \mu_3 = \frac{\sum_{i=1}^{3}n_i\alpha_i}{2a},
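The stated spectra can be checked numerically. The sketch below builds the drift and diffusion matrices from system (10) with illustrative parameter values satisfying condition (9) (all numbers are assumptions), and verifies the eigenvalues of A − α²/2 together with the commutation of A and α used in the proof of Theorem 1:

```python
import numpy as np

# Illustrative parameters with alpha_i proportional to A_i, so (9) holds.
a = 1.0
n = np.array([3.0, 4.0, 5.0])
A_prod = np.array([3.0, 2.0, 1.5])
alpha_prod = 0.1 * A_prod

A = np.outer(A_prod, n) / (2 * a) - np.eye(3)
al = np.outer(alpha_prod, n) / (2 * a)
B = A - al @ al / 2

S_A = float(n @ A_prod)          # sum n_i A_i
S_al = float(n @ alpha_prod)     # sum n_i alpha_i
lam3 = S_A / (2 * a) - S_al**2 / (8 * a**2) - 1

# Eigenvalues of A - alpha^2/2 are -1, -1 and lambda_3.
eig = np.sort(np.linalg.eigvals(B).real)
assert np.allclose(eig, np.sort([-1.0, -1.0, lam3]))

# Under (9), A and alpha commute, which the proof of Theorem 1 relies on.
assert np.allclose(A @ al, al @ A)
```

Note that the two eigenvalue assertions hold regardless of which of the two index conventions is used for the rank-one part, since a matrix and its transpose share the same spectrum.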
and we can choose the same e_1, e_2, and e_3 as eigenvectors as for the matrix A − α²/2.
Therefore, to reduce the matrices (A − α²/2)t and αW_t to diagonal form we can use the same transition matrices

S = \begin{pmatrix}1&0&n_1A_1\\ -1&1&n_2A_2\\ 0&-1&n_3A_3\end{pmatrix},
S^{-1} = \frac{1}{\sum_{i=1}^{3}n_iA_i}\begin{pmatrix}n_2A_2+n_3A_3&-n_1A_1&-n_1A_1\\ n_3A_3&n_3A_3&-(n_1A_1+n_2A_2)\\ 1&1&1\end{pmatrix},

so we get

\Bigl(A-\frac{\alpha^2}{2}\Bigr)t+\alpha W_t = S(Jt+\Lambda W_t)S^{-1},

where

J = \begin{pmatrix}-1&0&0\\ 0&-1&0\\ 0&0&\frac{\sum_{i=1}^{3}n_iA_i}{2a}-\frac{(\sum_{i=1}^{3}n_i\alpha_i)^2}{8a^2}-1\end{pmatrix},
\Lambda = \begin{pmatrix}0&0&0\\ 0&0&0\\ 0&0&\frac{\sum_{i=1}^{3}n_i\alpha_i}{2a}\end{pmatrix},

and correspondingly

\exp\Bigl\{\Bigl(A-\frac{\alpha^2}{2}\Bigr)t+\alpha W_t\Bigr\} = S\begin{pmatrix}\exp\{-t\}&0&0\\ 0&\exp\{-t\}&0\\ 0&0&\exp\Bigl\{\Bigl(\frac{\sum_{i=1}^{3}n_iA_i}{2a}-\frac{(\sum_{i=1}^{3}n_i\alpha_i)^2}{8a^2}-1\Bigr)t+\frac{\sum_{i=1}^{3}n_i\alpha_i}{2a}W_t\Bigr\}\end{pmatrix}S^{-1}.    (14)
Substituting (14) into (13) and multiplying out the matrix products, we obtain

\begin{pmatrix}k_1(t)\\ k_2(t)\\ k_3(t)\end{pmatrix}
= \frac{(n_2A_2+n_3A_3)k_1^0-n_1A_1(k_2^0+k_3^0)}{\sum_{i=1}^{3}n_iA_i}\exp\{-t\}\begin{pmatrix}1\\-1\\0\end{pmatrix}
+ \frac{n_3A_3(k_1^0+k_2^0)-(n_1A_1+n_2A_2)k_3^0}{\sum_{i=1}^{3}n_iA_i}\exp\{-t\}\begin{pmatrix}0\\1\\-1\end{pmatrix}
+ \frac{\sum_{i=1}^{3}k_i^0}{\sum_{i=1}^{3}n_iA_i}\exp\Biggl\{\Biggl(\frac{\sum_{i=1}^{3}n_iA_i}{2a}-\frac{(\sum_{i=1}^{3}n_i\alpha_i)^2}{8a^2}-1\Biggr)t+\frac{\sum_{i=1}^{3}n_i\alpha_i}{2a}W_t\Biggr\}\begin{pmatrix}n_1A_1\\ n_2A_2\\ n_3A_3\end{pmatrix}
- \frac{e(1-2a)(n_2A_2+n_3A_3-2n_1A_1)}{2a\sum_{i=1}^{3}n_iA_i}(1-\exp\{-t\})\begin{pmatrix}1\\-1\\0\end{pmatrix}
- \frac{e(1-2a)(2n_3A_3-n_1A_1-n_2A_2)}{2a\sum_{i=1}^{3}n_iA_i}(1-\exp\{-t\})\begin{pmatrix}0\\1\\-1\end{pmatrix}
- \frac{3e(1-2a)}{2a\sum_{i=1}^{3}n_iA_i}\int_0^t\exp\Biggl\{\Biggl(\frac{\sum_{i=1}^{3}n_iA_i}{2a}-\frac{(\sum_{i=1}^{3}n_i\alpha_i)^2}{8a^2}-1\Biggr)(t-\tau)+\frac{\sum_{i=1}^{3}n_i\alpha_i}{2a}(W_t-W_\tau)\Biggr\}d\tau\begin{pmatrix}n_1A_1\\ n_2A_2\\ n_3A_3\end{pmatrix}.    (15)
By grouping the terms in (15) and writing the resulting vector expression coordinate-wise, we get expression (12).

In the stochastic case, we do not have as many equilibria as in the deterministic case. An equilibrium in this case is a point in the phase space to which the (stochastic) transition process converges as t tends to infinity. Thus, in our stochastic model, the very concept of an unstable equilibrium loses its meaning.

To perform a qualitative analysis of solutions of systems of stochastic equations, we need the following theorem.

Theorem 2 ([7], Law of the Iterated Logarithm) For a one-dimensional Brownian motion W_t, the equality

\limsup_{t\to\infty}\frac{W_t}{\sqrt{2t\ln\ln t}} = 1 \quad a.s.

holds. As usual, the abbreviation a.s. means almost surely.
For the analysis of the solution behavior it is important for us to know, in the case

\sum_{i=1}^{3}n_iA_i > \frac{\bigl(\sum_{i=1}^{3}n_i\alpha_i\bigr)^2}{4a}+2a,

whether the values of the terms with integrals on the right-hand side of (12) reach, as t → ∞, the value \sum_{i=1}^{3}k_i^0, i.e., whether the random process

\int_0^t\exp\Biggl\{\Biggl(-\frac{\sum_{i=1}^{3}n_iA_i}{2a}+\frac{(\sum_{i=1}^{3}n_i\alpha_i)^2}{8a^2}+1\Biggr)\tau-\frac{\sum_{i=1}^{3}n_i\alpha_i}{2a}W_\tau\Biggr\}d\tau

reaches the value \frac{2a\sum_{i=1}^{3}k_i^0}{3e(1-2a)}.

Remark 1 The density of the random variable

\int_0^\infty\exp\Biggl\{\Biggl(-\frac{\sum_{i=1}^{3}n_iA_i}{2a}+\frac{(\sum_{i=1}^{3}n_i\alpha_i)^2}{8a^2}+1\Biggr)\tau-\frac{\sum_{i=1}^{3}n_i\alpha_i}{2a}W_\tau\Biggr\}d\tau,

if \sum_{i=1}^{3}n_iA_i > \frac{(\sum_{i=1}^{3}n_i\alpha_i)^2}{4a}+2a, is, according to [1],

f(k) = \frac{1}{\Gamma\Bigl(\frac{4a(\sum_{i=1}^{3}n_iA_i-2a)}{(\sum_{i=1}^{3}n_i\alpha_i)^2}-1\Bigr)}\Biggl(\frac{8a^2}{(\sum_{i=1}^{3}n_i\alpha_i)^2}\Biggr)^{\frac{4a(\sum_{i=1}^{3}n_iA_i-2a)}{(\sum_{i=1}^{3}n_i\alpha_i)^2}-1}\exp\Biggl\{-\frac{8a^2}{(\sum_{i=1}^{3}n_i\alpha_i)^2}\cdot\frac{1}{k}\Biggr\}\,k^{-\frac{4a(\sum_{i=1}^{3}n_iA_i-2a)}{(\sum_{i=1}^{3}n_i\alpha_i)^2}}.
Then, with the probability \tilde{P} that the process

\frac{3e(1-2a)}{2a}\int_0^t\exp\Biggl\{\Biggl(-\frac{\sum_{i=1}^{3}n_iA_i}{2a}+\frac{(\sum_{i=1}^{3}n_i\alpha_i)^2}{8a^2}+1\Biggr)\tau-\frac{\sum_{i=1}^{3}n_i\alpha_i}{2a}W_\tau\Biggr\}d\tau

never reaches the value \sum_{i=1}^{3}k_i^0, we have

\lim_{t\to\infty}k_1(t) = e,  \lim_{t\to\infty}k_2(t) = e,  \lim_{t\to\infty}k_3(t) = e,

and with probability 1 − \tilde{P}, respectively,

\lim_{t\to\infty}k_1(t) = 0,  \lim_{t\to\infty}k_2(t) = 0,  \lim_{t\to\infty}k_3(t) = 0.
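The hitting probability for the exponential functional above can be estimated by straightforward Monte Carlo simulation. In the sketch below, `lmbda` and `mu` stand for the drift and noise coefficients of the integrand and `threshold` for the critical value; the numerical values are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(1)

def prob_reach(lmbda, mu, threshold, T=10.0, steps=2000, paths=500):
    """Monte Carlo estimate of the probability that the functional
    int_0^T exp(-lmbda*tau - mu*W_tau) dtau reaches `threshold`."""
    dt = T / steps
    t = np.arange(steps) * dt
    hits = 0
    for _ in range(paths):
        W = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), steps - 1))))
        hits += np.sum(np.exp(-lmbda * t - mu * W)) * dt >= threshold
    return hits / paths

# In the notation of the text, lmbda and mu would be built from n_i, A_i,
# alpha_i, and the threshold would be 2a * sum(k_i^0) / (3e(1-2a));
# then 1 - prob_reach(...) approximates the probability P~ above.
p = prob_reach(lmbda=1.0, mu=0.5, threshold=5.0)
```

With a positive decay rate `lmbda` (the case treated above), the integrand dies out, so large thresholds are reached only with small probability.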
Remark 2 Note that for an m-regular network it would be quite difficult to obtain a result similar to Theorem 1. Indeed, consider m regular networks that are in some initial state. The productivity of each agent has a constant and a random (Brownian) component. The first regular network, of degree t_{11}, consists of agents with productivity A_1 + α_1W_t^1 each; the second, of degree t_{22}, consists of agents with productivity A_2 + α_2W_t^2 each; and so on, up to the m-th regular network, of degree t_{mm}, which consists of agents with productivity A_m + α_mW_t^m each. Before joining, each of the original networks exhibited homophily, i.e., all agents in each network made the same investment in knowledge: k_1^0 in the first network, k_2^0 in the second, and so on, up to k_m^0 in the m-th network. Let these m networks be combined at time t = 0 to form an m-regular network of degree (n_1, n_2, ..., n_m) in accordance with Definition 7. Agents of the first network become agents of the first type in the unified m-regular network, agents of the second network become agents of the second type, and so on, agents of the m-th network become agents of the m-th type. Let, as before,

\frac{A_1}{\alpha_1} = \frac{A_2}{\alpha_2} = \dots = \frac{A_m}{\alpha_m}.    (17)

Then the dynamics in such a network is described by a system of m stochastic differential equations similar to (10). It is clear that as the eigenvectors of this system of equations we can choose

e_1 = (1, -1, 0, \dots, 0)^T,  e_2 = (0, 1, -1, 0, \dots, 0)^T,

and so on, and

e_m = (n_1A_1, n_2A_2, \dots, n_mA_m)^T, or, in view of (17), e_m = (n_1\alpha_1, n_2\alpha_2, \dots, n_m\alpha_m)^T.

But after that, in order to obtain an explicit solution of the system of m stochastic equations, we would have to substitute an expression similar to (14) into an expression similar to (13) by hand, and then again perform all the calculations by hand to obtain an m-dimensional expression similar to (15), which is very time-consuming.
5 Conclusion

The paper develops the model of [9, 10]. The contribution of this paper consists in the description of the transition dynamics in the stochastic case, when agents' productivity has both a deterministic and a Brownian component. In previous research, the transition dynamics between dynamically stable equilibrium states in the network was considered only in the deterministic case. It turns out that the boundaries of the various scenarios of an agent's behavior (and the behavior itself)
in the stochastic case are shifted in comparison with the deterministic case. We obtained an explicit expression for the dynamics of the agents in the form of Brownian random processes (Theorem 1). The research also provides a qualitative analysis of the solutions of the stochastic equations and their systems: it has been established in which direction and with what probability the random process will develop, and to which state it will converge. The next task is to study the transition dynamics in stochastic networks with arbitrary correlation functions between the Wiener components of the different parameters.
References

1. Borodin, A.N., Salminen, P.: Handbook of Brownian Motion: Facts and Formulae. Birkhäuser, Basel (1996)
2. Bramoullé, Y., Kranton, R.: Public goods in networks. J. Econ. Theory 135, 478–494 (2007)
3. Galeotti, A., Goyal, S., Jackson, M.O., Vega-Redondo, F., Yariv, L.: Network games. Rev. Econ. Stud. 77, 218–244 (2010)
4. Granovetter, M.S.: The strength of weak ties. Am. J. Sociol. 78, 1360–1380 (1973)
5. Jackson, M.O.: Social and Economic Networks. Princeton University Press, Princeton (2008)
6. Jackson, M.O., Zenou, Y.: Games on networks. In: Young, P., Zamir, S. (eds.) Handbook of Game Theory, vol. 4, pp. 95–163. Elsevier, Amsterdam (2014)
7. Lamperti, J.: Stochastic Processes. Springer, Berlin (1977)
8. Martemyanov, Y.P., Matveenko, V.D.: On the dependence of the growth rate on the elasticity of substitution in a network. Int. J. Process Manage. Benchmark. 4(4), 475–492 (2014)
9. Matveenko, V.D., Korolev, A.V.: Network game with production and knowledge externalities. Contrib. Game Theory Manage. 8, 199–222 (2015)
10. Matveenko, V., Korolev, A., Zhdanova, M.: Game equilibria and unification dynamics in networks with heterogeneous agents. Int. J. Eng. Bus. Manage. 9, 1–17 (2017)
11. Matveenko, V., Garmash, M., Korolev, A.: Game equilibria and transition dynamics in triregular networks. Contrib. Game Theory Manage. 9, 113–128 (2018)
12. Romer, P.M.: Increasing returns and long-run growth. J. Polit. Econ. 94, 1002–1037 (1986)
Simulation Modeling of the Resource Allocation Under Economic Corruption Kirill V. Kozlov, Guennady A. Ougolnitsky, Anatoly B. Usov, and Mukharbeck Kh. Malsagov
Abstract Resource allocation competitions often involve corruption. We analyze a static model of the production of a public good that takes into account the corrupt distribution of resources and private interests. The game is formalized as a hierarchical three-level resource allocation model under corruption with independent decision makers. We use the method of qualitatively representative scenarios in simulation modeling to analyze the model. An algorithm for applying the method to the specified game is presented, the numerical solution of the game is found, and experiments with different model parameters are reported.

Keywords Difference Stackelberg games · Economic corruption · Resource allocation · Simulation modeling
1 Introduction

Most papers on the modeling of corruption are based on Gary Becker's idea [3] that the struggle with any crime makes sense when the utility of preventing the crime is greater than the respective costs. This economic approach was tailored to corruption by Susan Rose-Ackerman [16–18] and many other authors, for example Shleifer and Vishny [20]. Comprehensive reviews are presented by Aidt [1], the International Handbook [19], and Jain [9]. As a rule, all the named papers use static game-theoretic models in normal form or multi-step games. A substantially smaller number of papers is based on optimal control models or differential game-theoretic models of corruption [8].
K. V. Kozlov · G. A. Ougolnitsky () · A. B. Usov Southern Federal University, I.I. Vorovich Institute of Mathematics, Mechanics and Computer Sciences, Rostov-on-Don, Russia e-mail: [email protected]; [email protected]; [email protected] M. Kh. Malsagov Ingush State University, Magas, Russia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 L. A. Petrosyan et al. (eds.), Frontiers of Dynamic Games, Trends in Mathematics, https://doi.org/10.1007/978-3-030-93616-7_10
Later publications continue this tendency. Kolokoltsov and Malafeev [10] use the mean-field games approach, which describes the interaction of a very big number of rational agents, particularly under the impact of a special "key" player. Nikolaev [12] takes into consideration the moral level of the controllers during inspections. A concept of modeling corruption in hierarchical organizational systems, based on the theory of sustainable management in active systems, has been developed under the supervision of one of the authors, and its main propositions are supported by specific models. Static and dynamic models of administrative and economic corruption in one-level, two-level, and three-level control systems have been built and investigated. The distinctions between the named types of corruption, as well as between capture and extortion, are substantiated from the point of view of mathematical formalization. Indicators of the tractability and greed of a bribe-taker are proposed. The dependencies of the bribe-giver's behavior on the model parameters are found, and the respective recommendations on corruption deterrence are formulated [2, 4, 6]. Models of resource allocation under corruption are studied in [5, 7]. Methods of optimization and optimal control theory are used for the investigation of the mentioned models: the Pontryagin maximum principle, Hamilton–Jacobi–Bellman equations, and numerical methods. Simulation modeling also seems to be a very useful technique, though it is quite rarely applied in this domain. Ougolnitsky and Usov [13] proposed the method of qualitatively representative scenarios in simulation modeling (the QRS SM method). Its idea is to reduce the complete enumeration to a very small number of control scenarios that gives a sufficiently complete description of the dynamics of the controlled system. The method is a heuristic one, and the representative scenarios are determined based on specific features of the modeled system.
This hypothesis is checked formally by the conditions of internal and external stability of the set of scenarios. The condition of internal stability is checked by complete enumeration, and the condition of external stability by a sufficiently big number of experimental comparisons. The method demonstrated good results for models of fisheries [14], river water quality control [15], and the struggle with economic corruption [11, 21]. The contribution of this paper consists in the investigation of a static three-level resource allocation model under corruption. It is supposed that the agents on the middle and lower levels of the hierarchy make their decisions simultaneously and independently of each other. As a result, we receive a hierarchical game of the type "Principal-agents" with additional feedbacks on bribes proposed by the agents of the lower level to the agent of the middle level. The solution of this game is found by the QRS SM method, because the feedbacks make an analytical solution intractable. The model is a static one, which allows us to concentrate on the peculiarities of the QRS SM method without an analysis of complicated dynamic setups. To the best of our knowledge, this approach is applied here for the first time. In Sect. 2 the mathematical model is presented. In Sect. 3 we describe the QRS SM method and the algorithm of its application to the solution of our problem. The main Sect. 4 contains the results of the numerical experiments. Section 5 concludes.
2 Mathematical Setup of the Problem

We consider a three-level control system that consists of a Principal, a supervisor, and several active agents. The Principal P allocates a resource between the agents proportionally to their qualifications or output capabilities (types). After receiving the resource, an agent produces a product and gets an income. Due to the big number of agents and the amount of work, the Principal is unable to control the system in real time and delegates a part of his authority to the supervisor S. In fact, there are several supervisors, but for simplicity we consider only one in this model. In exchange for a bribe, the supervisor can increase the allocated share of the bribe-giving agent at the expense of the other agents, who propose a smaller bribe or do not give one at all. The Principal struggles with corruption. The effectiveness of the struggle is characterized by the probability of catching the bribe-taker (supervisor), and the Principal incurs costs proportional to this probability. Given the probability of her being caught, the supervisor chooses a degree of corruption, i.e., a value of distortion of the initial objective resource allocation. At the same time, the agents, independently of each other and of the degree of corruption, choose their bribe proposals. The model has the following form:

g_p(z, y, b) = \Bigl(1-p-\sum_{j=1}^{n}\lambda_j\Bigr)\sum_{i=1}^{n}a_ir_i(b)-c(z) \to \max,  0 \le z \le 1;    (1)

g_s(z, y, b) = p\sum_{i=1}^{n}a_ir_i(b)+(1-z-Mz)b_{max}r_{max} \to \max,  0 \le y \le R-r^{max};    (2)

g_i(z, y, b) = (1-b_i)\lambda_ia_ir_i(b) \to \max,  0 \le b_i \le 1,  i = 1, \dots, n.    (3)

Here a_i is an output factor (type) of the i-th agent, a_i > 0, \sum_{j=1}^{n}a_j = 1; λ_i is the share of the agent's income from the product realization; R > 0 is the Principal's resource ($). The initial allocation of the resource between the agents follows the rule

r_i = \frac{a_iR}{\sum_{j=1}^{n}a_j},  i = 1, \dots, n;  r^{max} = \max_{1\le j\le n}\{r_j\}.    (4)

So, the function g_i describes the income of the i-th agent. In fact, a_ir_i(b) is the production function of the agent, λ_ia_ir_i(b) gives its share in the produced income, and (1 − b_i)λ_ia_ir_i(b) reflects the final value with consideration of a possible bribe. The supervisor's income g_S consists of two summands. The first summand is her official reward, which is a share p of the total product of all agents (naturally, all
values are expressed in money). The second summand is the supervisor's corruptive gain, given the fact that if the bribe-taker is caught, then the gain is confiscated and an additional penalty is charged. At last, the Principal's income g_P is his share in the total product minus three values: the supervisor's reward, the agents' shares, and the anti-corruption struggle costs. In the presence of corruption, the supervisor distorts the allocation (4) by the rule

r_i(b) = \begin{cases} r_{max} = r_i+y, & b_i = b_{max} = \max_{1\le j\le n}\{b_j\},\\ r_i-\frac{y}{n-1}, & \text{otherwise}. \end{cases}    (5)

If \mathrm{Arg}\max_j b_j = \{i_1, \dots, i_k\}, then b_i = b_{max} for the agent with a_i = a_{max} = \max_{j\in\{i_1,\dots,i_k\}}\{a_j\}.

Here y is the corruption surplus ($); b_i is the share of the "kickback" of the i-th agent to the supervisor; p is the share of the official supervisor's reward; M >> 1 is the penalty for a bribe in the case of catching; z is the probability of the supervisor being caught by the Principal; c(z) is the cost function of the struggle with corruption, with c(0) = 0, \lim_{z\to 1}c(z) = \infty, and \frac{dc(z)}{dz} > 0.

The solution of the game (1)–(3) s.t. (4)–(5) is a scenario (z^*, y^*, b^*), where (y^*, b^*) is the best response of the supervisor and the agents to the Principal's strategy z^*, i.e.,

g_p^* = g_p(z^*, y^*, b^*) = \sup_{z\in[0,1]}\inf_{(y,b)\in BR(z)}g_p(z, y, b).    (6)

Here b^* = (b_1^*, b_2^*, \dots, b_n^*) is a Nash equilibrium in the game of the agents that provides the agents' best response, in the sense of (3), to the Principal's strategy.
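The allocation rules (4)–(5), together with the tie-breaking convention, can be sketched as follows (the function name and the guard "distort only when some bribe is offered" are illustrative conventions, not taken from the text):

```python
# Sketch of allocation rules (4)-(5): initial proportional shares, then the
# supervisor's distortion in favor of the maximal-bribe agent; ties are broken
# toward the largest output factor a_i.

def allocate(a, b, R, y):
    """Initial shares r_i = a_i R / sum(a_j); the winner gets r_i + y,
    every other agent loses y / (n - 1)."""
    n = len(a)
    r = [ai * R / sum(a) for ai in a]
    b_max = max(b)
    if b_max > 0:
        # Tie-breaking: the largest a_i among the maximal-bribe proposers.
        winner = max((i for i in range(n) if b[i] == b_max), key=lambda i: a[i])
        r = [r[i] + y if i == winner else r[i] - y / (n - 1) for i in range(n)]
    return r

r = allocate(a=[0.35, 0.33, 0.32], b=[0.15, 0.15, 0.0], R=1000.0, y=100.0)
# The distortion only redistributes the resource: the total is still R.
assert abs(sum(r) - 1000.0) < 1e-9
# Agents 0 and 1 tie on the bribe; agent 0 wins (a_0 > a_1) and gets the surplus.
assert r[0] == 350.0 + 100.0
```

Note that the distortion is a pure redistribution: the y given to the winner is exactly offset by the y/(n − 1) taken from each of the other n − 1 agents.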
3 Method of Investigation

The solution of the Stackelberg game (1)–(3) s.t. (4)–(5) is found by the method of qualitatively representative scenarios (QRS SM) [13]. Let Ω = Z × Y × B_1 × ... × B_n be the set of outcomes in the game (1)–(3). The QRS SM method supposes that Z = Z^{QRS}, Y = Y^{QRS}, B_i = B_i^{QRS} for all i = 1, 2, ..., n, where the sets B_i^{QRS}, Y^{QRS}, and Z^{QRS} contain qualitatively representative strategies (scenarios) of agent i, the supervisor, and the Principal, respectively. Suppose also that each set B_i^{QRS} (i = 1, 2, ..., n) contains a small number K of elements, and the sets Y^{QRS} and Z^{QRS} contain a small number L of elements: |B_i^{QRS}| = K and |Y^{QRS}| = |Z^{QRS}| = L. Then B_1^{QRS} × ... × B_n^{QRS} × Y^{QRS} × Z^{QRS} = QRS is a QRS-set of the game that includes m = |QRS| = |Y^{QRS}| \cdot |Z^{QRS}| \cdot \prod_{i=1}^{n}|B_i^{QRS}| = L^2 \cdot K^n elements. Each representative scenario of the game (z, y, b)^{(k)} ∈ QRS, k = 1, 2, ..., m, has the form (z, y, b)^{(k)} = (z^{(k)}, y^{(k)}, b_1^{(k)}, ..., b_n^{(k)}), with z^{(k)} ∈ Z^{QRS}, y^{(k)} ∈ Y^{QRS}, b_i^{(k)} ∈ B_i^{QRS}, i = 1, 2, ..., n.

A set QRS = {(z, y, b)^{(1)}, (z, y, b)^{(2)}, ..., (z, y, b)^{(m)}} is called the QRS-set of a difference Stackelberg game with precision Δ if:

(a) for any two elements (z, y, b)^{(i)}, (z, y, b)^{(j)} ∈ QRS of this set

|g_p^{(i)} - g_p^{(j)}| > Δ;    (7)

(b) for any other element (z, y, b)^{(l)} ∉ QRS there is an element (z, y, b)^{(j)} ∈ QRS such that

|g_p^{(l)} - g_p^{(j)}| ≤ Δ.    (8)

Here g_p^{(i)}, g_p^{(j)}, g_p^{(l)} are the respective payoffs of the Principal (in the sense of (1)); g_p^{(s)} = g_p(z^{(s)}, y^{(s)}, b_1^{(s)}, b_2^{(s)}, ..., b_n^{(s)}), s = i, j, l; Δ > 0 is a sufficiently small constant that determines the precision. Therefore, qualitatively representative strategies imply an essential difference between the Principal's payoffs, and the difference between one of the representative scenarios and any other scenario is not essential in this sense. Condition (7) is checked by complete enumeration, and condition (8) is checked by a sufficiently big number of experimental (numerical) comparisons.

In the proposed method, the initial QRS sets are chosen, to an extent, arbitrarily, using some heuristic reasoning. As a rule, if the set of feasible control values is the segment [0, 1], then a strict or approximate uniform partition of it is used. The idea is to consider both extreme and middle control scenarios. If the internal or external stability conditions for the initial QRS sets do not hold, then these sets are modified using the same heuristic reasoning. In the frame of this approach, in our model it is initially assumed that the Principal chooses his strategy z from the set Z^{QRS} = {0, 0.2, 0.5, 0.9}. If z ≤ 0.2, then the supervisor chooses her control value y from the set Y_L^{QRS} = {0, (R − r^{max})/4, (R − r^{max})/2}, otherwise from the set Y_H^{QRS} = {(R − r^{max})/2, 3(R − r^{max})/4, R − r^{max}}. So, the smaller the probability of catching the bribe-taker, the greater the corruptive action. The agents choose their strategies from the sets B^{QRS} = B_i^{QRS} = {0, 0.15, 0.3}, i = 1, ..., n. The conditions of qualitative representativeness (7)–(8) are checked for the set Z^{QRS}. If they are not satisfied, then the set Z^{QRS} is corrected; otherwise, a solution of the game (1)–(3) s.t. (4)–(5) on the set QRS = Z^{QRS} × Y^{QRS} × (B^{QRS})^n is found. Then an analysis of sensitivity to the form of the function c(z) and the values of the parameters λ_i and p is conducted.

An algorithm for building the solution of the model (1)–(5) by the QRS SM method is the following one.
1. Give the form and numerical values of all input functions and parameters of the investigated model. 2. For an initial QRS-set check the conditions of internal and external stability (7)–(8). The specific value of constant Δ is chosen such that the difference between Principal’s payoffs for the strategies from the QRS set be not less than 10% of the mean payoff, and in the comparison with strategies that do not belong to the QRS set—not greater than 10% of the mean pay-off. If necessary then the initial QRS set is extended or reduced. In the case of extension add to the QRS set new strategies that are qualitatively different from the point of view of the Principal’s payoff. They are chosen from the strategies that are situated between the previously taken strategies. 3. Take one of the qualitatively representative Principal’s scenarios as his current strategy. 4. Take one of the qualitatively representative supervisor’s scenarios as her current strategy. 5. Find a Nash equilibrium in the game of agents by enumeration of their qualitatively representative scenarios. The Nash equilibrium is the best response of the agents in the sense of (3) to the supervisor’s strategy chosen from her QRS-set. 6. Compare the current (n + 1)-tuple (supervisor’s strategy and the respective agents’ best responses) with the best for the supervisor (in the sense of (2)) respective (n + 1)-tuple. If necessary then make a current supervisor’s strategy the optimal one. 7. If the supervisor’s QRS-set is not exhausted then take her new strategy and go to step 4, otherwise go to step 6. 8. Compare the current (n + 2)-tuple (Principal’s strategy and the respective supervisor’s and agents’ best responses) with the best for the Principal (in the sense of (1)) respective (n + 2)-tuple. If necessary then make a current Principal’s strategy the optimal one. 9. If the Principal’s QRS-set is not exhausted then take his new strategy and go to step 3, other-wise go to step 8. 10. 
The solution of the game has the form $(z^\ast, y^\ast, (b_i^\ast)_{i=1}^n)$, where $z^\ast \in Z^{QRS}$, $y^\ast \in Y^{QRS}$, $(b_i^\ast)_{i=1}^n \in (B^{QRS})^n$.
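The nested enumeration in steps 3–9 can be sketched in code. This is a minimal illustration, not the authors' implementation: the payoff functions `g_p`, `g_s`, `g_a` stand in for the functionals (1)–(3) defined earlier in the paper, and all names are assumptions of this sketch.

```python
from itertools import product

def solve_qrs(Z, Y, B, n, g_p, g_s, g_a):
    """Enumerate the QRS scenarios: for every Principal strategy z and
    supervisor strategy y, find the agents' Nash equilibrium by full
    enumeration, then pick the best tuples bottom-up (steps 3-9)."""
    def agents_nash(z, y):
        # A bribe profile b is a Nash equilibrium if no agent can gain
        # by a unilateral deviation to another representative bribe bb.
        for b in product(B, repeat=n):
            if all(g_a(i, z, y, b) >= g_a(i, z, y, b[:i] + (bb,) + b[i + 1:])
                   for i in range(n) for bb in B):
                return b
        return None

    best = None  # (Principal's payoff, z, y, agents' bribes)
    for z in Z:
        best_s = None  # supervisor's best reply to z
        for y in Y:
            b = agents_nash(z, y)
            if b is None:
                continue
            if best_s is None or g_s(z, y, b) > g_s(z, best_s[0], best_s[1]):
                best_s = (y, b)
        if best_s is None:
            continue
        y, b = best_s
        if best is None or g_p(z, y, b) > best[0]:
            best = (g_p(z, y, b), z, y, b)
    return best
```

With toy payoffs (each agent prefers the larger bribe, the Principal prefers larger z and y) the routine returns the top corner of the grid, which illustrates the bottom-up logic of the method.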
4 Numerical Calculations

For simplicity we consider a hierarchical control system with three agents.

Example 1 (Internal and External Stability) Let us consider in detail the process of calculations on the QRS set. Set the following parameter values: M = 10, R = 1000 ($), c(z) = z², p = 0.2. Fix for the agents the values a1 = 0.35, a2 = 0.33, a3 = 0.32, and λ1 = 0.15, λ2 = 0.3, λ3 = 0.2.
Simulation Modeling of the Resource Allocation Under Economic Corruption

Table 1 Numerical results for the set Z^QRS

No  z    y      b1    b2    b3    gp, $
1   0    0      0     0     0.15  50.1
2   0    162.5  0     0     0.15  49.6
3   0    162.5  0     0.15  0     50
4   0    162.5  0.15  0     0     50.7
5   0    325    0     0     0.15  49.1
6   0    325    0     0.15  0     49.8
7   0    325    0.15  0     0     51.3
8   0.2  325    0.15  0     0     51.2
9   0.2  487.5  0     0     0.15  48.6
10  0.2  487.5  0     0.15  0     49.7
11  0.2  487.5  0.15  0     0     51.9
12  0.2  650    0     0     0.15  48.1
13  0.2  650    0     0.15  0     49.5
14  0.2  650    0.15  0     0     52.5
15  0.5  325    0     0     0.15  48.8
16  0.5  325    0.15  0     0     51
17  0.5  487.5  0     0     0.15  48.4
18  0.5  487.5  0     0.15  0     49.4
19  0.5  487.5  0.15  0     0     51.6
20  0.5  650    0     0     0.15  47.9
21  0.5  650    0     0.15  0     49.3
22  0.5  650    0.15  0     0     52.3
23  0.9  325    0     0     0.15  48.3
24  0.9  325    0     0.15  0     49
25  0.9  325    0.15  0     0     50.5
26  0.9  487.5  0     0     0.15  47.8
27  0.9  487.5  0     0.15  0     48.9
28  0.9  487.5  0.15  0     0     51.1
29  0.9  650    0     0     0.15  47.3
30  0.9  650    0.15  0     0     51.7
To prove the internal stability of the set Z^QRS = {0, 0.2, 0.5, 0.9} it is necessary to enumerate all strategies from the set G = Z^QRS × Y^QRS × (B^QRS)³. Since the Principal's payoffs repeat for some strategies (the resource goes to the agent who proposes the greater bribe), Table 1 lists only the distinct Principal's payoffs, whose number is much smaller. Table 1 shows that the value Δ = 0.1 provides the internal stability condition (7) for the set Z^QRS. To check the external stability condition, let us calculate the Principal's payoffs on the set P = {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}. Similarly to Table 1, Table 2 contains the scenarios with distinct Principal's payoffs. Table 2 shows that the previously chosen value Δ = 0.1 also provides the external stability condition (8) for the set Z^QRS.
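Conditions (7)–(8) themselves are stated earlier in the paper and are not restated here. The sketch below assumes the reading suggested by Tables 1 and 2 — Principal's payoffs inside the candidate set differ by at least Δ, and every payoff produced by an outside scenario lies within Δ of some inside payoff — and is only an illustration of the check, not the authors' code.

```python
def internally_stable(payoffs, delta):
    """Assumed reading of condition (7): any two distinct scenario
    payoffs in the candidate set differ by at least delta."""
    vals = sorted(payoffs)
    return all(b - a >= delta for a, b in zip(vals, vals[1:]))

def externally_stable(inside, outside, delta):
    """Assumed reading of condition (8): every outside scenario is
    payoff-close (within delta) to some scenario already in the set."""
    return all(any(abs(g - h) <= delta for h in inside) for g in outside)
```

If both checks fail in the expected direction, step 2 of the algorithm prescribes extending or reducing the initial QRS set.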
Table 2 Numerical results for the set P

No  z    y      b1    b2    b3    gp, $
1   0    0      0     0     0.15  50.1
2   0    162.5  0     0     0.15  49.6
3   0    162.5  0     0.15  0     50
4   0    162.5  0.15  0     0     50.7
5   0    325    0     0     0.15  49.1
6   0    325    0     0.15  0     49.8
7   0    325    0.15  0     0     51.3
8   0.1  162.5  0     0.15  0     49.9
9   0.2  162.5  0     0     0.15  49.5
10  0.2  162.5  0.15  0     0     50.6
11  0.2  325    0.15  0     0     51.2
12  0.3  325    0     0     0.15  49
13  0.3  325    0     0.15  0     49.7
14  0.3  487.5  0     0     0.15  48.5
15  0.3  487.5  0.15  0     0     51.8
16  0.3  650    0     0     0.15  48
17  0.3  650    0.15  0     0     52.4
18  0.4  325    0     0     0.15  48.9
19  0.4  325    0.15  0     0     51.1
20  0.4  487.5  0     0     0.15  48.4
21  0.4  487.5  0.15  0     0     51.7
22  0.4  650    0     0.15  0     49.4
23  0.4  650    0.15  0     0     52.3
24  0.5  325    0     0     0.15  48.8
25  0.5  325    0.15  0     0     51
26  0.5  487.5  0.15  0     0     51.6
27  0.5  650    0     0     0.15  47.9
28  0.5  650    0     0.15  0     49.3
29  0.6  325    0     0     0.15  48.7
30  0.6  325    0.15  0     0     50.9
31  0.6  487.5  0     0     0.15  48.2
32  0.6  487.5  0.15  0     0     51.5
33  0.6  650    0     0     0.15  47.8
34  0.6  650    0     0.15  0     49.2
35  0.6  650    0.15  0     0     52.1
36  0.7  325    0     0     0.15  48.6
37  0.7  325    0.15  0     0     50.8
38  0.7  487.5  0     0     0.15  48.1
39  0.7  487.5  0.15  0     0     51.4
40  0.7  650    0     0     0.15  47.6
41  0.7  650    0.15  0     0     52
42  0.8  650    0     0     0.15  47.5
43  0.8  650    0.15  0     0     51.9
44  0.9  325    0     0     0.15  48.3
45  0.9  325    0.15  0     0     50.5
46  0.9  650    0     0     0.15  47.3
Certainly, this is not a strict proof by complete enumeration: the result holds only for the comparison of the scenarios from the set Z^QRS with outside scenarios from the set P. However, a sufficiently large number of comparisons makes the conclusion about stability quite reliable.

For the chosen parameter values the payoffs of the Principal and the supervisor differ essentially. In this case, the solution of the game (1)–(3) is the following set of scenarios from the QRS set: (z*, y*, b*) = (0.5, 650, {0.3, 0, 0.3}), where b* = {0.3, 0, 0.3} is the Nash equilibrium in the game of agents in normal form, i.e., their best response to the supervisor's strategy y* = 650. The respective payoffs of the Principal, the supervisor, and the agents are: g1 = 36.8, g2 = 0, g3 = 0, gS = 70, gP = 52.3.

Example 2 Take z ∈ Z^QRS = {0, 0.2, 0.5, 0.9}, M = 10, R = 1000. Fix for the agents a1 = 0.2, a2 = 0.5, a3 = 0.3, and λ1 = 0.1, λ2 = 0.1, λ3 = 0.1, p = 0.4. Find a solution of the game (1)–(3) on the set G = Z^QRS × Y^QRS × (B^QRS)³. Given these parameter values we obtain the following results: (z*, y*, {b1*, b2*, b3*}) = (0.5, 500, {0, 0.3, 0.3}), g1* = 0, g2* = 35, g3* = 0, gs* = 200, gp* = 0.5.

Example 3 Let us study the model's sensitivity to the form of the function c(z). Fix the following parameter values: λ1 = 0.1, λ2 = 0.2, λ3 = 0.1, p = 0.2, a1 = 0.3, a2 = 0.2, a3 = 0.4, R = 1000, M = 10. The result is quite expected: Table 3 shows that the Principal's payoff depends on the behavior of the function c(z) on the segment [0, 1] rather than on its rate of growth. Moreover, the form of the function does not affect the optimal strategies in this case.

Example 4 Let us study the variation of the payoffs of the Principal and the supervisor depending on their shares and the penalty value. To vary the Principal's share we vary the shares of the supervisor and the agents.
Set R = 1000, and fix for the agents a1 = 0.35, a2 = 0.33, a3 = 0.32, c(z) = z² (Table 4).
Table 3 Model's sensitivity to the form of the function c(z)

No  c(z)  z*   y*     b1*  b2*  b3*  gp*, $
1   z/2   0.5  555.6  0.3  0    0.3  159.8
2   z     0.5  555.6  0.3  0    0.3  159.5
3   2z    0.5  555.6  0.3  0    0.3  159
4   100z  0.5  555.6  0.3  0    0.3  110
5   z²    0.5  555.6  0.3  0    0.3  159.8
6   z³    0.5  555.6  0.3  0    0.3  159.9
7   z⁵    0.5  555.6  0.3  0    0.3  160
8   z⁷    0.5  555.6  0.3  0    0.3  160
9   z¹⁰   0.5  555.6  0.3  0    0.3  160
10  e^z   0.5  555.6  0.3  0    0.3  158.4
As expected, an increase of the Principal's share increases his payoff. Also, an increase of the supervisor's payoff does not change the strategies of the supervisor and the agents, which again increases the Principal's payoff. An increase of the penalty implies greater supervisor's losses but does not change the optimal strategies of the players. Starting from a certain value of the parameter M, the supervisor's losses become commensurable with the increase of the penalty.

Example 5 Now let us study the dependence of the model solutions on the parameters λi when p = 0.2, with consideration of the inequality $\sum_{i=1}^{n} \lambda_i \le 1 - p$. The other parameter values are borrowed from Example 4 (Table 5).

It is not astonishing that the biggest payoff of the Principal corresponds to the smallest shares of the agents and the maximal bribes. Here the agents attain a Nash equilibrium only if the supervisor allocates the amount of resource R − r_i^max. In this case the winner receives the total resource, and the other agents lose nothing.

Notice that the Nash equilibrium in the previous examples is not unique. An equilibrium is attained when the supervisor allocates the resource in the amount R − r_i^max, or the total resource goes to the agent who proposes the maximal bribe. The other agents receive nothing, and their payoffs equal 0 for any bribe. Table 6 shows that there are several solutions of the problem for the parameter values from Example 4. All the scenarios presented in Table 6 are Nash equilibria, and the respective payoffs of all players coincide. Indeed, if the kickbacks are equal then the agent with the greatest value of the parameter ai wins (the second agent in this case), and this agent receives the total resource.
Table 4 Payoffs of the Principal and the supervisor in dependence on their shares and the penalty value

No  M       p    z    gs*, $    gp*, $
1   0.1     0.1  0.6  188.5     199.8
2   0.1     0.2  0.5  228.5     159.8
3   0.1     0.3  0.4  268.5     119.7
4   0.1     0.4  0.3  308.5     79.7
5   0.1     0.5  0.2  348.5     39.7
6   0.1     0.6  0.1  388.5     −0.2
7   0.1     0.7  0    428.5     −40.2
8   1       0.1  0.6  175       199.8
9   1       0.2  0.5  215       159.8
10  1       0.3  0.4  255       119.7
11  1       0.4  0.3  295       79.7
12  1       0.5  0.2  335       39.7
13  1       0.6  0.1  375       −0.2
14  1       0.7  0    415       −40.2
15  10      0.1  0.6  40        199.8
16  10      0.2  0.5  80        159.8
17  10      0.3  0.4  120       119.7
18  10      0.4  0.3  160       79.7
19  10      0.5  0.2  200       39.7
20  10      0.6  0.1  240       −0.2
21  10      0.7  0    280       −40.2
22  100     0.1  0.6  −1310     199.8
23  100     0.2  0.5  −1270     159.8
24  100     0.3  0.4  −1230     119.7
25  100     0.4  0.3  −1190     79.7
26  100     0.5  0.2  −1150     39.7
27  100     0.6  0.1  −1110     −0.2
28  100     0.7  0    −1070     −40.2
29  1000    0.1  0.6  −14,810   199.8
30  1000    0.2  0.5  −14,770   159.8
31  1000    0.3  0.4  −14,730   119.7
32  1000    0.4  0.3  −14,690   79.7
33  1000    0.5  0.2  −14,650   39.7
34  1000    0.6  0.1  −14,610   −0.2
35  1000    0.7  0    −14,570   −40.2
36  10,000  0.1  0.6  −149,810  199.8
37  10,000  0.2  0.5  −149,770  159.8
38  10,000  0.3  0.4  −149,730  119.7
39  10,000  0.4  0.3  −149,690  79.7
40  10,000  0.5  0.2  −149,650  39.7
41  10,000  0.6  0.1  −149,610  −0.2
42  10,000  0.7  0    −149,570  −40.2
Table 5 Dependence of the model solutions on the parameters λi

No  λ1   λ2   λ3   z*   y*   b1*  b2*  b3*  g1*, $  g2*, $  g3*, $  gp*, $  gs*, $
1   0.1  0.1  0.1  0.5  500  0.3  0.3  0.3  0       35      0       299.8   50
2   0.1  0.1  0.2  0.5  500  0.3  0.3  0.3  0       35      0       249.8   50
3   0.1  0.1  0.3  0.5  500  0.3  0.3  0.3  0       35      0       199.8   50
4   0.1  0.2  0.1  0.5  500  0.3  0.3  0.3  0       70      0       249.8   50
5   0.1  0.2  0.2  0.5  500  0.3  0.3  0.3  0       70      0       199.8   50
6   0.1  0.2  0.3  0.5  500  0.3  0.3  0.3  0       70      0       149.7   50
7   0.1  0.3  0.1  0.5  500  0.3  0.3  0.3  0       105     0       199.8   50
8   0.1  0.3  0.2  0.5  500  0.3  0.3  0.3  0       105     0       149.7   50
9   0.1  0.3  0.3  0.5  500  0.3  0.3  0.3  0       105     0       99.7    50
10  0.2  0.1  0.1  0.5  500  0.3  0.3  0.3  0       35      0       249.8   50
11  0.2  0.1  0.2  0.5  500  0.3  0.3  0.3  0       35      0       199.8   50
12  0.2  0.1  0.3  0.5  500  0.3  0.3  0.3  0       35      0       149.7   50
13  0.2  0.2  0.1  0.5  500  0.3  0.3  0.3  0       70      0       199.8   50
14  0.2  0.2  0.2  0.5  500  0.3  0.3  0.3  0       70      0       149.7   50
15  0.2  0.2  0.3  0.5  500  0.3  0.3  0.3  0       70      0       99.7    50
16  0.2  0.3  0.1  0.5  500  0.3  0.3  0.3  0       105     0       149.8   50
17  0.2  0.3  0.2  0.5  500  0.3  0.3  0.3  0       105     0       99.8    50
18  0.2  0.3  0.3  0.5  500  0.3  0.3  0.3  0       105     0       49.7    50
19  0.3  0.1  0.1  0.5  500  0.3  0.3  0.3  0       35      0       199.8   50
20  0.3  0.1  0.2  0.5  500  0.3  0.3  0.3  0       35      0       149.7   50
21  0.3  0.1  0.3  0.5  500  0.3  0.3  0.3  0       35      0       99.7    50
22  0.3  0.2  0.1  0.5  500  0.3  0.3  0.3  0       70      0       149.8   50
23  0.3  0.2  0.2  0.5  500  0.3  0.3  0.3  0       70      0       99.8    50
24  0.3  0.2  0.3  0.5  500  0.3  0.3  0.3  0       70      0       49.7    50
25  0.3  0.3  0.1  0.5  500  0.3  0.3  0.3  0       105     0       99.7    50
26  0.3  0.3  0.3  0.5  500  0.3  0.3  0.3  0       105     0       −0.3    50
Table 6 The same payoffs of the players

z*   y*   b1*   b2*  b3*   r1  r2    r3  g1*, $  g2*, $  g3*, $  gp*, $  gs*, $
0.5  500  0     0.3  0     0   1000  0   0       35      0       149.7   −14,650
0.5  500  0     0.3  0.15  0   1000  0   0       35      0       149.7   −14,650
0.5  500  0     0.3  0.3   0   1000  0   0       35      0       149.7   −14,650
0.5  500  0.15  0.3  0     0   1000  0   0       35      0       149.7   −14,650
0.5  500  0.15  0.3  0.15  0   1000  0   0       35      0       149.7   −14,650
0.5  500  0.15  0.3  0.3   0   1000  0   0       35      0       149.7   −14,650
0.5  500  0.3   0.3  0     0   1000  0   0       35      0       149.7   −14,650
0.5  500  0.3   0.3  0.15  0   1000  0   0       35      0       149.7   −14,650
0.5  500  0.3   0.3  0.3   0   1000  0   0       35      0       149.7   −14,650
5 Conclusion

We verified in several examples the internal and external stability conditions (7)–(8) for the set Z^QRS. A solution of the game (1)–(3) subject to (4)–(5) is found by means of the QRS SM method on the set QRS = Z^QRS × Y^QRS × (B^QRS)³. We analyzed the dependence of the results on the parameters as well as on the form of the function c(z).

The numerical results allow us to conclude that in the modeled system the agents with the maximal output capability are interested in giving the maximal bribe. In this case the supervisor allocates the maximal resource to such an agent. Also, this setup generates several Nash-equilibrium best responses of the agents to the supervisor's strategy. The optimal payoffs of the Principal and the supervisor in these equilibria are the same. Thus, we obtained several solutions of the game (1)–(3) that differ only in the agents' strategies.

It is interesting to investigate an improvement of the decision rules that determine the winner when the kickbacks are equal; probably, they should not choose the agent with the maximal output capability. When the values of the penalty and the catching-probability coefficients are small, all agents are interested in giving as big a bribe as possible in order to become the winner. Most solutions generate such outcomes in spite of the different values of the agents' types and of the share and initial resource of the Principal.

Acknowledgments The work is supported by the Russian Foundation for Basic Research, project No. 20-31-90041.
Analysis of Equilibrium Trajectories in Dynamic Bimatrix Games Nikolay A. Krasovskii and Alexander M. Tarasyev
Abstract The paper is devoted to the analysis of the behavior of equilibrium trajectories for game dynamic systems arising in the solution of bimatrix games. At the first stage, the approach is considered based on the ideas of guaranteed strategies in the sense of N.N. Krasovskii. In the framework of guaranteed solutions, we propose algorithms for constructing the value functions, positional strategies and equilibrium trajectories using the definition of the dynamic Nash equilibrium. At the second stage, we analyze equilibrium trajectories of the replicator dynamics relating to the theory of evolutionary games. At the third stage, we examine the dynamic system generated by the strategies of best replies similar to the Cournot model. The comparison is carried out for the objective indices of the equilibrium trajectories of all three dynamic systems. It is shown that the characteristics of the trajectories of the dynamic Nash equilibrium are better than the properties of the trajectories of the replicator dynamics or the best reply dynamics. In addition, numerical experiments are implemented for the so-called mixed dynamics, in which the first player uses the guaranteed strategy and the strategy of the second player is formed either by the replicator formulas or by the best reply dynamics. The simulation results for the mixed dynamics demonstrate that the values of players' payoff functionals at the final points of trajectories are better in comparison with the indices for trajectories of the replicator dynamics and the best reply dynamics, and even better than at the final point of the dynamic Nash equilibrium. Keywords Dynamic Nash equilibrium · Replicator dynamics · Best reply dynamics · Guaranteed control strategies · Equilibrium trajectories
N. A. Krasovskii () N.N. Krasovskii Institute of Mathematics and Mechanics of UrB of RAS, Yekaterinburg, Russia e-mail: [email protected] A. M. Tarasyev N.N. Krasovskii Institute of Mathematics and Mechanics of UrB of RAS, Yekaterinburg, Russia Ural Federal University named after the first President of Russia B.N. Yeltsin, Yekaterinburg, Russia e-mail: [email protected]; [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 L. A. Petrosyan et al. (eds.), Frontiers of Dynamic Games, Trends in Mathematics, https://doi.org/10.1007/978-3-030-93616-7_11
1 Introduction

In the paper, an analysis of the behavior of equilibrium trajectories is implemented for game dynamic systems arising in the construction of solutions of bimatrix games [2, 26].

At the first stage, for the dynamic bimatrix game generated by the approaches of differential-evolutionary games [9, 18], the solution is considered based on the ideas of guaranteed strategies in the sense of N.N. Krasovskii [10–12]. Within this methodology, algorithms are proposed for constructing the value functions, positional strategies and equilibrium trajectories in the framework of the dynamic Nash equilibrium [8]. These solutions are based on the value functions in problems with the infinite horizon and possess the property of foresight (forecasting) for the long-term objectives of the players. Let us note that the value functions are constructed in the framework of the theory of generalized (minimax, viscosity) solutions of Hamilton-Jacobi equations [4, 24]. In particular, constructions of the conjugate derivatives are used for the verification of stability properties [25]. Equilibrium trajectories are synthesised using characteristics of the Hamilton-Jacobi equations, the construction of which is based on the principles of optimality [13–17, 23].

At the second stage, an analysis is provided for equilibrium trajectories of the replicator dynamics, which relates to the theory of evolutionary games [3, 5, 6, 20, 27]. The construction of the replicator dynamics expresses the evolutionary dynamics of an object called a replicator, which has means of making more or less accurate copies of itself. The main idea is that replicators whose fitness is larger (smaller) than the average fitness of the population will increase (decrease) their share in the population.

At the third stage, we investigate the dynamic construction based on the strategies of best replies, analogous to the Cournot model [7].
In evolutionary game theory, best reply dynamics represents a class of strategy updating rules, where players' strategies in the next round are determined by their best reply to some subset of the population; in these models players choose only the reply that would give them the highest payoff in the next round. The solutions of the replicator dynamics and the best reply dynamics are related to the short-term interests of the players and do not take into account the long-term prediction of the values of the players' payoffs.

A comparison of the objective indices of the equilibrium trajectories of all three dynamic systems is carried out. It is demonstrated that the properties of the dynamic Nash equilibrium trajectories are better than the properties of the trajectories of the replicator dynamics and the best reply dynamics. In addition, computational experiments are carried out for mixed dynamics, where the guaranteed dynamic Nash equilibrium strategy is chosen for one of the players, and a strategy of either the replicator dynamics or the best reply dynamics is selected for the second player. It is shown that the guaranteed dynamic Nash equilibrium strategy successfully "copes" with the replicator dynamics strategies
or the strategies of the best reply dynamics and drives equilibrium trajectories to the preferred domains on the infinite time horizon. The simulation results for the mixed dynamics demonstrate that the values of players' payoff functionals at the final points of trajectories are better in comparison with the indices for trajectories of the replicator dynamics and the best reply dynamics, and even better than at the final point of the dynamic Nash equilibrium. Thus, the idea is realized of shifting the trajectories of the dynamic bimatrix game from the static Nash equilibrium to the Pareto maximum points of the vector criterion [21, 22].
2 Evolutionary Game Dynamics: Dynamic Nash Equilibrium

2.1 Model Dynamics: Payoff Functions

Let us consider the system of differential equations which describes the dynamics of behavior of two players:

$$\dot{x}(t) = -x(t) + u(t), \qquad x(t_0) = x_0, \qquad \dot{y}(t) = -y(t) + v(t), \qquad y(t_0) = y_0. \tag{1}$$
Here the parameters x = x(t), 0 ≤ x ≤ 1, and y = y(t), 0 ≤ y ≤ 1, determine the probabilities with which the selected players hold to their chosen strategies. For example, the parameter x is the probability that the first player chooses the first strategy (respectively, (1 − x) is the probability that he holds to the second strategy). The parameter y stands for the probability of choosing the first strategy by the second player (respectively, (1 − y) is the probability that he holds to the second strategy). The control parameters u = u(t) and v = v(t) satisfy the conditions 0 ≤ u ≤ 1, 0 ≤ v ≤ 1 and are signals recommending a change of strategies by the players. For example, the value u = 0 (v = 0) corresponds to the signal: "change the first strategy to the second". The value u = 1 (v = 1) corresponds to the signal: "change the second strategy to the first". The value u = x (v = y) corresponds to the signal: "keep the previous strategy". An important property of the dynamics (1) is that the square (x, y) ∈ [0, 1] × [0, 1] is a strongly invariant set: any trajectory of the dynamics (1) which starts in the square survives in it on the infinite time horizon. Let us assume that the players' payoffs are described by the matrices

$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \qquad B = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}.$$
Terminal quality functionals are determined as the mathematical expectations of payoffs given by the corresponding matrices A and B in the bimatrix game and can be interpreted as "local" interests of the players:

$$g_A(x(T), y(T)) = C_A x(T) y(T) - \alpha_1 x(T) - \alpha_2 y(T) + a_{22}$$

at a given time moment T. Here the parameters C_A, α_1, α_2 are determined according to classical bimatrix game theory [1]:

$$C_A = a_{11} - a_{12} - a_{21} + a_{22}, \qquad \alpha_1 = a_{22} - a_{12}, \qquad \alpha_2 = a_{22} - a_{21}.$$

The quality functional g_B of the second player and the parameters C_B, β_1, β_2 are determined analogously from the coefficients of the matrix B. The "global" interests J_A^∞ of the first player are determined as limit relations for the quality functionals on an infinite planning horizon:

$$J_A^\infty = [J_A^-, J_A^+], \quad J_A^- = J_A^-(x(\cdot), y(\cdot)) = \liminf_{t \to \infty} g_A(x(t), y(t)), \quad J_A^+ = J_A^+(x(\cdot), y(\cdot)) = \limsup_{t \to \infty} g_A(x(t), y(t)), \tag{2}$$

calculated along the trajectories (x(·), y(·)) of the system (1). For the second player the "global" interests J_B^∞ are determined symmetrically. The solution of the dynamic game is considered based on optimal control theory [22] and differential game theory [7]. Following the papers [3, 5–7, 9], we present the definition of the dynamic Nash equilibrium in the class of positional strategies (feedbacks) U = u(t, x, y, ε), V = v(t, x, y, ε).
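The dynamics (1) and the "local" payoff g can be checked numerically with a simple Euler scheme. The sketch below is illustrative, not the authors' code: the constant controls are arbitrary, and the payoff spot-check uses the parameters C_A = 11.25, α_1 = 3, α_2 = 1.25, a_22 = 3 of the investment game considered below in Sect. 2.5.

```python
def simulate(x0, y0, u, v, T=10.0, dt=1e-3):
    """Euler integration of x' = -x + u, y' = -y + v on [0, T].
    u and v are feedbacks (x, y) -> control value in [0, 1]."""
    x, y = x0, y0
    for _ in range(int(T / dt)):
        x += dt * (-x + u(x, y))
        y += dt * (-y + v(x, y))
    return x, y

def g(C, a1, a2, a22, x, y):
    """Expected payoff g(x, y) = C*x*y - a1*x - a2*y + a22."""
    return C * x * y - a1 * x - a2 * y + a22

# With constant controls the trajectory converges to (u, v),
# staying inside the invariant square [0, 1] x [0, 1].
x, y = simulate(0.9, 0.1, lambda x, y: 0.25, lambda x, y: 0.75)
```

Note that g(1, 1) = C − α_1 − α_2 + a_22 recovers a_11, consistent with the definitions of C_A, α_1, α_2 above.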
2.2 Dynamic Nash Equilibrium

Let ε > 0 and (x_0, y_0) ∈ [0, 1] × [0, 1]. The pair of feedbacks U^0 = u^0(t, x, y, ε), V^0 = v^0(t, x, y, ε) is called a Nash equilibrium at the initial point (x_0, y_0) if for any other feedbacks U = u(t, x, y, ε), V = v(t, x, y, ε) the inequalities

$$J_A^-(x^0(\cdot), y^0(\cdot)) \ge J_A^+(x_1(\cdot), y_1(\cdot)) - \varepsilon, \qquad J_B^-(x^0(\cdot), y^0(\cdot)) \ge J_B^+(x_2(\cdot), y_2(\cdot)) - \varepsilon$$

hold for any trajectories (x^0(·), y^0(·)) ∈ X(x_0, y_0, U^0, V^0), (x_1(·), y_1(·)) ∈ X(x_0, y_0, U, V^0), (x_2(·), y_2(·)) ∈ X(x_0, y_0, U^0, V).
Here the symbol X stands for the set of trajectories which start from the initial point (x_0, y_0) and are generated by the corresponding strategies (U^0, V^0), (U, V^0), (U^0, V) (see [12]).
2.3 Auxiliary Zero-Sum Games

For the construction of the desired equilibrium feedbacks U^0, V^0 we use the results of [3]. According to this approach, we construct the equilibrium using optimal feedbacks for the differential games Γ_A = Γ_A^- ∪ Γ_A^+ and Γ_B = Γ_B^- ∪ Γ_B^+ with the payoffs J_A^∞ and J_B^∞. In the game Γ_A the first player maximizes the functional J_A^-(x(·), y(·)) with a guarantee, using the feedback U = u(t, x, y, ε), while the second player, on the contrary, tries to minimize the functional J_A^+(x(·), y(·)) using the feedback V = v(t, x, y, ε). Vice versa, in the game Γ_B the second player maximizes the functional J_B^-(x(·), y(·)) with a guarantee, and the first player minimizes the functional J_B^+(x(·), y(·)).

Let us introduce the following notations. The feedbacks solving the problems of guaranteed maximization of the payoff functionals J_A^- and J_B^- are denoted by u_A^0 = u_A^0(t, x, y, ε) and v_B^0 = v_B^0(t, x, y, ε), respectively. Such feedbacks provide the guaranteed maximization of the players' payoffs in the long run and can be called "positive" feedbacks. By u_B^0 = u_B^0(t, x, y, ε) and v_A^0 = v_A^0(t, x, y, ε) we denote the feedbacks which are most unfavorable to the opposing players; namely, those feedbacks minimize the payoff functionals J_B^+ and J_A^+ of the opposite players, respectively. Let us call these feedbacks "punishing".

The dynamic Nash equilibrium can be constructed by pasting the "positive" feedbacks u_A^0, v_B^0 and the "punishing" feedbacks u_B^0, v_A^0 according to the relations [3]:

$$U^0 = \begin{cases} u_A^\varepsilon(t), & \text{if } \|(x, y) - (x_\varepsilon(t), y_\varepsilon(t))\| < \varepsilon, \\ u_B^0(x, y), & \text{otherwise}, \end{cases} \qquad V^0 = \begin{cases} v_B^\varepsilon(t), & \text{if } \|(x, y) - (x_\varepsilon(t), y_\varepsilon(t))\| < \varepsilon, \\ v_A^0(x, y), & \text{otherwise}. \end{cases}$$
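The pasting rule above can be expressed as a higher-order function: given a "positive" control, a "punishing" control and the equilibrium trajectory t ↦ (x_ε(t), y_ε(t)), the equilibrium feedback follows the positive branch while the state stays in the ε-tube of the trajectory, and punishes otherwise. This is an illustrative sketch; all names are assumptions.

```python
def paste(positive, punishing, traj, eps):
    """Build U0 (or, symmetrically, V0): play the 'positive' control
    near the equilibrium trajectory traj(t) = (x_eps(t), y_eps(t)),
    and the 'punishing' control outside the eps-tube."""
    def feedback(t, x, y):
        xe, ye = traj(t)
        # Euclidean distance of the current state from the trajectory
        if ((x - xe) ** 2 + (y - ye) ** 2) ** 0.5 < eps:
            return positive(t, x, y)
        return punishing(x, y)
    return feedback
```

A deviation by the opponent pushes the state out of the tube, which switches the feedback to the punishing branch; this is exactly what enforces the ε-Nash property of the pasted pair.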
2.4 Value Functions for "Positive" Feedbacks

The main role in constructing the dynamic Nash equilibrium belongs to the "positive" feedbacks u_A^0, v_B^0, which maximize the values g_A, g_B with a guarantee on the infinite time horizon T → ∞. For this purpose the value functions w_A, w_B are constructed in zero-sum games on the infinite horizon. Based on the method of generalized characteristics of Hamilton-Jacobi equations, we obtain an analytic structure of the value function.
Step 1 On the first step, we construct smooth components of the value function with a fixed time moment T of the game termination. The value function is determined along the characteristics of the Hamilton-Jacobi equations, which are straight lines directed to the vertices of the square of possible game situations and are generated by the marginal values of the controls u and v.

Step 2 On the second step, continuous pasting of these smooth components is implemented, and the conditions of u- and v-stability are checked at the points of nonsmoothness on the basis of the technique of differential inequalities for Hamilton-Jacobi equations [24, 25].

Step 3 On the third step, we calculate lower envelopes of these terminal value functions by the time parameter T of the game termination. To this end, we calculate derivatives of these functions by the time parameter T, equate these derivatives to zero, exclude the parameter T, and obtain stationary smooth components of the value function on the infinite horizon.

Step 4 On the fourth step, we check that these smooth components satisfy the stationary Hamilton-Jacobi equation.

Step 5 On the fifth step, using the smooth components we paste the continuous function, for which we verify the properties of u- and v-stability based on the technique of conjugate derivatives [25].

A more detailed description of the algorithm in the 2 × 2-dimensional case is provided in the paper [17]. In the case when C_A > 0 the value function w_A is determined by the system of four functions:

$$w_A(x, y) = \psi_A^i(x, y), \quad i = 1, \ldots, 4,$$
$$\psi_A^1(x, y) = a_{22} - \frac{(\alpha_1 x + \alpha_2 y)^2}{4 C_A x y}, \qquad \psi_A^2(x, y) = a_{11} - \frac{\big( (C_A - \alpha_1)(1 - x) + (C_A - \alpha_2)(1 - y) \big)^2}{4 C_A (1 - x)(1 - y)},$$
$$\psi_A^3(x, y) = C_A x y - \alpha_1 x - \alpha_2 y + a_{22}, \qquad \psi_A^4(x, y) = \frac{a_{22} C_A - \alpha_1 \alpha_2}{C_A} = \omega_A. \tag{3}$$
Here ω_A is the value of the static game with the matrix A. The value function is pasted from the smooth components ψ_A^i(x, y), i = 1, …, 4, in (3) and generates the switching line K_A of the optimal control strategy for the first player at the points of pasting of the components ψ_A^1 and ψ_A^4 and at the points of pasting of the components ψ_A^2 and ψ_A^4:

$$K_A = K_A^1 \cup K_A^2, \tag{4}$$
$$K_A^1 = \Big\{ (x, y) \in [0, 1] \times [0, 1] : \; \frac{\alpha_1}{C_A} \le y \le 1, \; 0 \le x \le \frac{\alpha_2}{C_A}, \; y = \frac{\alpha_1}{\alpha_2}\, x \Big\},$$
$$K_A^2 = \Big\{ (x, y) \in [0, 1] \times [0, 1] : \; 0 \le y \le \frac{\alpha_1}{C_A}, \; \frac{\alpha_2}{C_A} \le x \le 1, \; y = -\frac{C_A - \alpha_1}{C_A - \alpha_2}\,(1 - x) + 1 \Big\}.$$
For the construction of the guaranteed strategy by the switching line K_A we introduce the description of the domain D_A where the value of the guaranteed control is u = 1:

$$D_A = \Big\{ (x, y) \in [0, 1] \times [0, 1] : \; y \ge \frac{\alpha_1}{\alpha_2}\, x, \; y \ge -\frac{C_A - \alpha_1}{C_A - \alpha_2}\,(1 - x) + 1 \Big\}.$$

The guaranteed strategy of the first player generated by the switching line K_A is constructed by the following rule:

$$u(x, y) = \begin{cases} 1, & \text{if } (x, y) \in D_A, \\ 0, & \text{otherwise}. \end{cases} \tag{5}$$
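The switching rule (5) reduces to two linear inequalities describing D_A and is straightforward to implement. The sketch below is illustrative; the spot checks use the investment-game parameters C_A = 11.25, α_1 = 3, α_2 = 1.25 introduced later in Sect. 2.5.

```python
def guaranteed_u(x, y, CA, a1, a2):
    """Guaranteed feedback (5): u = 1 inside the domain D_A bounded
    from below by the lines y = (a1/a2)*x and
    y = -((CA - a1)/(CA - a2))*(1 - x) + 1, and u = 0 outside."""
    in_DA = (y >= (a1 / a2) * x) and \
            (y >= -((CA - a1) / (CA - a2)) * (1 - x) + 1)
    return 1 if in_DA else 0
```

For the matrix A of Sect. 2.5 the corner (0, 1) lies above both lines (control u = 1), while the corner (1, 0) lies below the line y = (α_1/α_2) x (control u = 0).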
The value function w_B(x, y), the switching line K_B = K_B^1 ∪ K_B^2, and the guaranteed strategy v = v(x, y) for the second player with the payoff matrix B are constructed analogously. It is shown that the value function w_A has the properties of u-stability and v-stability [16], which can be expressed in the form of conjugate derivatives [25].
2.5 Example of the Investment Game

For definiteness and to demonstrate the behavior of equilibrium trajectories in dynamic bimatrix games, we present the following payoff matrices of two players on the financial markets of stocks and bonds. These matrices reflect data on the markets of stocks and bonds in the USA [19]. Matrix A corresponds to the behavior of traders who play on an increase of the rate and are called bulls. Matrix B corresponds to the behavior of traders who play on
decrease of the rate and are called bears. Matrix parameters mean the yield on stocks and bonds expressed as interest rates. Let us present the matrices A, B and their main “game” parameters [26]: A=
10 0 , 1.75 3
B=
−5 3 . 10 0.5
CA = a11 − a12 − a21 + a22 = 11.25, α2 = a22 − a21 = 1.25, α1 = a22 − a12 = 3,

xA = α2/CA = 0.11, yA = α1/CA = 0.27,

CB = b11 − b12 − b21 + b22 = −17.5, β1 = b22 − b12 = −2.5, β2 = b22 − b21 = −9.5,

xB = β2/CB = 0.54, yB = β1/CB = 0.14.
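As a check on the arithmetic, these parameters can be recomputed directly from the matrices A and B. The following Python snippet (names are ours) reproduces the stated values.

```python
# Recompute the "game" parameters of the example directly from A and B.
A = [[10.0, 0.0], [1.75, 3.0]]
B = [[-5.0, 3.0], [10.0, 0.5]]

def game_parameters(M):
    """C = m11 - m12 - m21 + m22, plus the shifts m22 - m12 and m22 - m21."""
    c = M[0][0] - M[0][1] - M[1][0] + M[1][1]
    return c, M[1][1] - M[0][1], M[1][1] - M[1][0]

c_a, alpha1, alpha2 = game_parameters(A)   # 11.25, 3.0, 1.25
c_b, beta1, beta2 = game_parameters(B)     # -17.5, -2.5, -9.5
x_a, y_a = alpha2 / c_a, alpha1 / c_a      # saddle point S_A
x_b, y_b = beta2 / c_b, beta1 / c_b        # saddle point S_B
print(round(x_a, 2), round(y_a, 2))        # 0.11 0.27
print(round(x_b, 2), round(y_b, 2))        # 0.54 0.14
```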
In Fig. 1 we present the saddle points SA and SB of the static games, the point of the static Nash equilibrium NE, and the switching lines KA and KB generated by the value functions. Equilibrium trajectories T1, T2 and T3, which we call trajectories of the dynamic Nash equilibrium, start from the initial points IP1, IP2 and IP3, then move along the characteristics of the Hamilton-Jacobi equations, meet the switching lines, on which a sliding mode arises, and converge to the point of the dynamic Nash equilibrium, which we call the market equilibrium ME.
Fig. 1 Trajectories of the dynamic Nash equilibrium
The values of the payoff functionals of the players at the point of the market equilibrium ME are: gA(ME) = 2.68, gB(ME) = 2.86. Let us note that these values majorate the payoffs at the point of the static Nash equilibrium NE: gA(NE) = 2.68, gB(NE) = 1.86.
3 Replicator Dynamics

Let us present the general view of replicator dynamics for the dynamic bimatrix game (see, for example, [3, 6, 27]):

u̇i = ui ((Av)i − (u, Av)), v̇j = vj ((Bu)j − (v, Bu)), 1 ≤ i, j ≤ n.  (6)

Here the vectors u = (u1, . . . , un) and v = (v1, . . . , vn) describe the system state. The symbols (Av)i and (Bu)j stand for the fitness of the corresponding type. The average fitness is determined as follows:

(u, Av) = ∑_{i=1}^{n} ui (Av)i,  (v, Bu) = ∑_{i=1}^{n} vi (Bu)i.
The system (6) is consistent with one of the basic principles of Darwinism: the reproductive success of an individual or a group depends on the advantage of its own fitness over the average fitness of the population. Let us present the main characteristics of replicator systems of the type (6). The Jacobi matrix at the stationary point (the static Nash equilibrium) in the general case has the form [6]:

J = ( 0  C
      D  0 ),

where 0 is the zero submatrix of size (n − 1) × (n − 1), and C, D are submatrices formed by some constant coefficients. The characteristic polynomial of the system has the form: p(λ) = det(λ²I − DC). From the structure of the characteristic polynomial it follows that in the two-dimensional case the system cannot have a stationary point of the focus type or the node type.
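In the 2 × 2 case this structure can be checked directly: the Jacobian at the stationary point has zero diagonal, so λ² = J12 J21 and the eigenvalues come as a real pair (saddle) or a purely imaginary pair (center). A short Python check for the example parameters (variable names are ours):

```python
# Linearization of the 2x2 replicator system at NE = (x*, y*) with
# x* = beta2/C_B, y* = alpha1/C_A (example parameters).  The Jacobian has
# zero diagonal, so lambda^2 = J12 * J21: the eigenvalues are a real pair
# (saddle) or a purely imaginary pair (center), never a focus or a node.
import math

C_A, ALPHA1 = 11.25, 3.0
C_B, BETA2 = -17.5, -9.5
xs, ys = BETA2 / C_B, ALPHA1 / C_A

J12 = xs * (1 - xs) * C_A        # d(xdot)/dy at NE
J21 = ys * (1 - ys) * C_B        # d(ydot)/dx at NE
lam_sq = J12 * J21
print(lam_sq < 0)                # True: purely imaginary pair, a center
omega = math.sqrt(-lam_sq)       # rotation frequency near NE
```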
Fig. 2 Equilibrium trajectories of the replicator dynamics
For the dynamic bimatrix 2 × 2 game the replicator dynamics can be rewritten in the form of a system of differential equations of the second order:

ẋ(t) = x(t)(1 − x(t))(CA y(t) − α1),
ẏ(t) = y(t)(1 − y(t))(CB x(t) − β2),
x(t0) = x0, y(t0) = y0.  (7)
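A rough forward-Euler integration of system (7) illustrates the cycling behavior for the example matrices: the trajectory oscillates around NE = (β2/CB, α1/CA) = (0.54, 0.27) instead of converging. This is an illustrative sketch under ad hoc step-size and horizon choices (names are ours), not the authors' computation.

```python
# Forward-Euler integration of system (7) for the example parameters.
C_A, ALPHA1 = 11.25, 3.0
C_B, BETA2 = -17.5, -9.5

x, y = 0.75, 0.65                  # initial point IP2
dt = 1e-4
lo_x = hi_x = x
for _ in range(200_000):           # integrate up to t = 20
    dx = x * (1.0 - x) * (C_A * y - ALPHA1)
    dy = y * (1.0 - y) * (C_B * x - BETA2)
    x, y = x + dt * dx, y + dt * dy
    lo_x, hi_x = min(lo_x, x), max(hi_x, x)

print(0.0 < x < 1.0 and 0.0 < y < 1.0)   # True: the unit square is invariant
print(hi_x - lo_x > 0.2)                 # True: wide swings, no convergence to NE
```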
In Fig. 2 we present trajectories of the replicator dynamics which start from the initial points IP1, IP2 and IP3 and circulate around the point of the static Nash equilibrium NE. The guaranteed values of the functional (2) on the trajectories of the replicator dynamics are much worse than at the point of the static Nash equilibrium NE and, certainly, worse than at the point of the market equilibrium ME. For example, on the trajectory that starts from the initial point IP2 = (0.75, 0.65) the guaranteed value of the functional gA is

lim inf_{t→∞} gA(x(t), y(t)) = 1.04,

and the guaranteed value of the functional gB is

lim inf_{t→∞} gB(x(t), y(t)) = −0.11.
Let us note that there exist other payoff functionals for the players, for example, average integral payoffs [1]. In this article, however, the comparison for such functionals is not carried out and is a subject for future research.
4 Best Reply Dynamics

This section is devoted to the analysis of the best reply dynamics in the bimatrix game. The game dynamics have the same form as the control dynamics (1). The only difference in this construction is that the switching lines are provided by the short-term interests of the players, namely, by the fitness functions gA(x, y) and gB(x, y). It means that the switching lines for the control strategies of the best reply dynamics are generated by the static acceptable situations x = xB, y = yA, which form the point of the static Nash equilibrium NE = (xB, yA). Let us note that the short-term fitness functions of the players gA(x, y) and gB(x, y) differ essentially from the long-term interests presented by the value functions wA(x, y) and wB(x, y). Thus, the best reply dynamics is defined by the control system:

ẋ = −x + u, ẏ = −y + v, x(t0) = x0, y(t0) = y0,

where the control strategies u = u(x, y), v = v(x, y) are given by the formulas:

u(x, y) = 1 if yA ≤ y ≤ 1, and 0 if 0 ≤ y < yA;
v(x, y) = 1 if 0 ≤ x < xB, and 0 if xB ≤ x ≤ 1.  (8)
</gr-replace>
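The convergence of the best reply dynamics to the static Nash equilibrium can be illustrated with a small Euler simulation. This is a sketch under ad hoc step-size and horizon choices (names are ours); the switching rules are those of formula (8) with the example parameters.

```python
# Euler simulation of the best reply dynamics (8); the trajectory spirals
# into the static Nash equilibrium NE = (x_B, y_A) of the example game.
X_B = -9.5 / -17.5          # = beta2 / C_B, approx. 0.54
Y_A = 3.0 / 11.25           # = alpha1 / C_A, approx. 0.27

x, y, dt = 0.9, 0.9, 1e-3   # initial point and step size
for _ in range(40_000):     # integrate up to t = 40
    u = 1.0 if y >= Y_A else 0.0    # first player's switching rule
    v = 1.0 if x < X_B else 0.0     # second player's switching rule
    x, y = x + dt * (-x + u), y + dt * (-y + v)

print(abs(x - X_B) < 0.05 and abs(y - Y_A) < 0.05)   # True: converged to NE
```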
In Fig. 3 we present the equilibrium trajectory TR of the best reply dynamics. It starts from the initial point IP, moves along the characteristics of the Hamilton-Jacobi equations, switches from one characteristic to another at the switching lines x = xB, y = yA, and converges to the point of the static Nash equilibrium NE. Let us recall that the values of the players' payoffs at the point of the market equilibrium ME majorate the players' payoffs at the point of the static Nash equilibrium. Thus, the trajectories of the dynamic Nash equilibrium provide better results at the point of the market equilibrium ME than the trajectories of the best reply dynamics at the static Nash equilibrium NE.
5 Mixed Dynamics

In this section we consider mixed dynamics. In the first scenario, we consider the case when the first player uses the guaranteed strategy with the switching line KA (4) that has the form ug(t) = u(x(t), y(t))
Fig. 3 Equilibrium trajectory of the best reply dynamics
Fig. 4 Equilibrium trajectories of the mixed dynamics: Scenario 1
(5), and the strategy of the second player is formed by the replicator dynamics (7):

ẋ(t) = −x(t) + ug(t),
ẏ(t) = y(t)(1 − y(t))(CB x(t) − β2),
x(t0) = x0, y(t0) = y0.
In Fig. 4 we present the switching line KA for the control of the first player and the switching line x = xB for the control of the second player formed by the replicator dynamics. Trajectories of the mixed dynamics are denoted by the symbols T1, T2,
Fig. 5 Equilibrium trajectories of the mixed dynamics: Scenario 2
T3. They start from the initial points IP1, IP2, IP3, move along the characteristics of the Hamilton-Jacobi equations, meet the switching lines, where they change their behavior, and converge to the final point FP. In the second scenario, we consider the case when the first player uses the guaranteed strategy with the switching line KA (4) that has the form ug(t) = u(x(t), y(t)) (5), and the strategy of the second player is formed by the best reply dynamics vb(t) = v(x(t), y(t)) (8):

ẋ(t) = −x(t) + ug(t),
ẏ(t) = −y(t) + vb(t),
x(t0) = x0, y(t0) = y0.
In Fig. 5 we present the switching line KA for the control of the first player and the switching line x = xB for the control of the second player formed by the best reply dynamics. Trajectories of the mixed dynamics are marked by the symbols T1, T2, T3. They start from the initial points IP1, IP2, IP3, then move along the characteristics of the Hamilton-Jacobi equations, meet the switching lines, where they change their behavior, and converge to the final point FP. Let us note that the values of the players' payoff functionals at the final point FP of the motion of the equilibrium trajectories of the mixed dynamics for both scenarios are: gA(FP) = 5.29 and gB(FP) = 3.55. These values majorate the players' payoffs at the point of the static Nash equilibrium NE and even at the point of the market equilibrium ME.
6 Comparison Results for Equilibrium Trajectories

In this section we demonstrate the comparison results for the equilibrium trajectories of the considered dynamics: the dynamics of the dynamic Nash equilibrium, the replicator dynamics, the best reply dynamics and the mixed dynamics. Let us provide the values of the players' payoff functionals gA and gB at the final points of the motion of the equilibrium trajectories. At the final point ME of the motion of the trajectories of the dynamic Nash equilibrium the values are: gA(ME) = 2.68 and gB(ME) = 2.86. At the final point of the motion of the trajectory of the best reply dynamics (which is the point of the static Nash equilibrium NE) the values are: gA(NE) = 2.68 and gB(NE) = 1.86. For the replicator dynamics, cyclic equilibrium trajectories arise around the point of the static Nash equilibrium NE. For example, on the trajectory that starts from the initial point IP2 = (0.75, 0.65) the guaranteed values of both functionals are:

lim inf_{t→∞} gA(x(t), y(t)) = 1.04,  lim inf_{t→∞} gB(x(t), y(t)) = −0.11.
At the final point FP of the motion of the trajectories of the mixed dynamics in both scenarios the values are: gA(FP) = 5.29 and gB(FP) = 3.55. Summarizing, let us note that the values of the players' payoff functionals at the final points of the motion of the trajectories of the mixed dynamics are the best in comparison with the indices for the trajectories of the replicator dynamics and the best reply dynamics, and are even better than at the final point of the dynamic Nash equilibrium.
7 Conclusion

For the considered 2 × 2 dynamic bimatrix game we provide the analysis of the behavior of equilibrium trajectories. First, the dynamic Nash equilibrium trajectories are constructed within the approach of guaranteed strategies in the sense of N.N. Krasovskii and generalized (minimax, viscosity) solutions of Hamilton-Jacobi equations. Second, the analysis is provided for the replicator dynamics, whose trajectories generate cycles around the static Nash equilibrium. Third, the dynamic construction based on the strategies of best replies is investigated and its convergence to the static Nash equilibrium is outlined. Fourth, computational experiments are carried out for the mixed dynamics in which we couple strategies of the considered dynamics: strategies of the dynamic Nash equilibrium against the replicator dynamics, and strategies of the dynamic Nash equilibrium against the best
reply dynamics. Finally, the comparison results for the equilibrium trajectories of the considered dynamics are presented. In future research, we plan to provide a comparative analysis of the equilibrium trajectories of dynamic bimatrix games with average integral payoff functionals [1].

Acknowledgments The first author, Nikolay A. Krasovskii, is supported by the Russian Science Foundation (Project No. 19-11-00105).
References 1. Arnold, V.I.: Optimization in mean and phase transitions in controlled dynamical systems. Funct. Anal. Appl. 36(2), 83–92 (2002). https://doi.org/10.1023/A:1015655005114 2. Basar, T., Olsder, G.J.: Dynamic Noncooperative Game Theory. Academic press, London (1982) 3. Bratus, A.S., Novozhilov, A.S, Platonov, A.P.: Dynamic Systems and Models of Biology. Fizmatlit, Moscow (2010) 4. Crandall, M.G., Lions, P.L.: Viscosity solutions of Hamilton-Jacobi equations. Trans. Am. Math. Soc. 277(4), 1–42 (1983) 5. Friedman, D.: Evolutionary games in economics. Econometrica 59(3), 637–666 (1991) 6. Hofbauer, J., Sigmund, K.: The Theory of Evolution and Dynamic Systems. Cambridge University, Cambridge (1988) 7. Intriligator, M.: Mathematical Optimization and Economic Theory. Prentice–Hall, New York (1971) 8. Kleimenov, A.F.: Nonantagonistic Positional Differential Games. Nauka, Yekaterinburg (1993) 9. Kolmogorov, A.N.: On analytical methods in probability theory. Uspekhi Matematicheskih Nauk 5, 5–41 (1938) 10. Krasovskii, N.N.: Control of Dynamic System. Nauka, Moscow (1985) 11. Krasovskii, A.N., Krasovskii, N.N.: Control Under Lack of Information. Birkhauser, Boston (1995) 12. Krasovskii, N.N., Subbotin, A.I.: Game-Theoretical Control Problems. Springer, New York (1988) 13. Krasovskii, A.A., Taras’ev, A.M.: Dynamic optimization of investments in the economic growth models. Autom. Remote Control 68(10), 1765–1777 (2007) 14. Krasovskii, N.A., Tarasyev, A.M.: Decomposition algorithm of searching equilibria in a dynamic game. Autom. Remote Control 76(10), 1865–1893 (2015). https://doi.org/10.1134/ S0005117915100136 15. Krasovskii, N.A., Tarasyev, A.M.: Equilibrium trajectories in dynamical bimatrix games with average integral payoff functionals. Math. Game Theory Appl. 8(2), 58–90 (2016) 16. Krasovskii, N.A., Tarasyev, A.M.: Mechanism for shifting Nash equilibrium trajectories to cooperative Pareto solutions in dynamic bimatrix games. Contrib. Game Theory Manage. 
13, 218–243 (2020) 17. Krasovskii, N.A., Kryazhimskiy, A.V., Tarasyev, A.M.: Hamilton–Jacobi equations in evolutionary games. Proc. Inst. Math. Mech. UrB RAS 20(3), 114–131 (2014) 18. Kryazhimskii, A.V., Osipov, Yu.S.: On differential–evolutionary games. Proc. Steklov Inst. Math. 211, 234–261 (1995) 19. MarketWatch: Stock Market News—Financial News. https://www.marketwatch.com/ 20. Maynard Smith, J.: Game theory and the evolution of fighting. In: Maynard Smith, J. (ed.) On Evolution, pp. 8–28. Edinburgh University Press, Edinburgh (1972)
21. Mazalov, V.V., Rettieva, A.N.: Asymmetry in a cooperative bioresource management problem. Large-scale Syst. Control 55, 280–325 (2015) 22. Petrosjan, L.A., Zenkevich, N.A.: Conditions for sustainable cooperation. Autom. Remote Control 76(10), 1894–1904 (2015). https://doi.org/10.1134/S0005117915100148 23. Pontryagin, L.S., Boltyanskii, V.G., Gamkrelidze, R.V., Mischenko, E.F.: The Mathematical Theory of Optimal Processes. Interscience Publishers, New York (1962) 24. Subbotin, A.I.: Generalized Solutions of First Order PDEs: The Dynamical Optimization Perspective. Birkhauser, Boston (1995) 25. Subbotin, A.I., Tarasyev, A.M.: Conjugate derivatives of the value function of a differential game. Doklady AN SSSR 283(3), 559–564 (1985) 26. Vorobyev, N.N.: Game Theory for Economists and System Scientists. Nauka, Moscow (1985) 27. Yakushkina, T.S.: On spatial replicator systems for bimatrix games. Vestnik Moscow Univ. Comput. Math. Cybern. 15(1), 19–27 (2016)
Manipulation of k-Approval Under De Re Knowledge

Valeriia Kuka and Egor Ianovski
Abstract Strategic voting is a pervasive issue in social choice—voters misrepresent their preferences to obtain a favourable outcome to themselves and, in doing so, force a suboptimal outcome on society. The Gibbard-Satterthwaite theorem established that all non-trivial procedures are vulnerable to strategic voting, but that does not mean that all are equally vulnerable. One approach to limiting the effects of strategic voting is to consider the epistemic burden of strategy formation— how much does a voter need to know in order to vote strategically? Unfortunately the notion of knowledge can be formalised in many equally intuitive ways, and it is not at all obvious what is the right definition of knowledge for voting. In this paper we consider the concept of de re knowledge of manipulation, proposed by Van Ditmarsch et al. (Strategic voting and the logic of knowledge (2013). arXiv preprint:1310.6436), and quantify the amount of knowledge a manipulator needs to manipulate a k-approval election. We find that the amount needed is a lot, bordering on complete knowledge of the profile, which suggests that this notion is too stringent for wide applicability. Keywords Choice · Voting theory · Strategic voting.
1 Introduction

Voting theory studies the situation where a set of agents with ordinal preferences over a set of alternatives have to decide on a single alternative for all of society. As the name suggests, this includes voting (citizens express preferences over candidates, a president is chosen based on the ballots), but can also describe sporting events (judges rank athletes, medals are awarded), group recommendation (recommender systems suggest films for individuals, results are aggregated to
V. Kuka () · E. Ianovski HSE University, International Laboratory of Game Theory and Decision Making, St. Petersburg, Russia e-mail: [email protected]; [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 L. A. Petrosyan et al. (eds.), Frontiers of Dynamic Games, Trends in Mathematics, https://doi.org/10.1007/978-3-030-93616-7_12
suggest a film to the group), and ensemble algorithms (machine learning algorithms predict whether what just jumped in front of the car is a human, a dog, or a sensor error, and the car either swerves into the gutter or keeps driving). Since an agent’s sincere rating of the alternatives is his private information, if the agents are rational and self-interested, a voting situation is also a game: an agent submits an order over the alternatives—not necessarily his sincere order—with the aim to obtain an outcome that is maximally preferable. If a voter submits an order that is different from his sincere preferences, he is said to engage in strategic voting, or manipulation. A voting rule where strategic voting is not possible (i.e. sincere behaviour is a weakly dominant strategy) is said to be strategy-proof. A strategy-proof procedure would be desirable for various reasons. In a political context, democracy is based on the principle of one-man-one-vote, but if strategic behaviour is possible then better organised, strategically inclined agents will have a greater impact on the outcome than their naïve counterparts. In sport, to assess the relative merits of the athletes we would wish the judges to rank them sincerely, and not with the aim of making their preferred athlete win. In ensemble computing, strategic behaviour by an algorithm could lead to sub-optimal decisions which, in the case of a self-driving car, could be disastrous. Unfortunately, the Gibbard-Satterthwaite theorem established that non-trivial strategy-proof procedures do not exist. However, that is not to say that all procedures are equally susceptible to strategic behaviour—the theorem merely states that for every procedure, there exists some voting situation where an agent can manipulate. It says nothing about how frequent these situations are, or how difficult it is for an agent to execute them. 
One approach to circumvent the Gibbard-Satterthwaite theorem is by considering the epistemic burden of manipulation—how much must an agent know about the preferences of other agents in order to vote strategically? Consider the situation with five voters and four candidates below:

Ann: a ≻ b ≻ c ≻ d
Bill: a ≻ b ≻ c ≻ d
Tom: b ≻ d ≻ c ≻ a
Jessica: c ≻ d ≻ b ≻ a
Ron: d ≻ c ≻ b ≻ a
The plurality voting rule assigns one point to a candidate each time the candidate is ranked first, and elects the candidate with the most points. Under this rule a is the winner with two votes. However, Ron has the potential to manipulate the election. His preferred candidate, d, has no hope of winning, but Ron prefers b to a, and if he ranks b first then there will be a tie between a and b. We did not specify a tiebreaking procedure, but if b has a chance of winning the tie, then this is clearly a profitable deviation for Ron.
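Ron's plurality manipulation can be checked mechanically. The following Python sketch (names are ours) tallies first-place votes before and after Ron's deviation:

```python
from collections import Counter

# Each ballot lists the candidates from best to worst.
profile = {
    "Ann":     ["a", "b", "c", "d"],
    "Bill":    ["a", "b", "c", "d"],
    "Tom":     ["b", "d", "c", "a"],
    "Jessica": ["c", "d", "b", "a"],
    "Ron":     ["d", "c", "b", "a"],
}

def plurality_scores(profile):
    """One point per ballot for the candidate ranked first."""
    return Counter(ballot[0] for ballot in profile.values())

sincere = plurality_scores(profile)
print(sincere["a"], sincere["b"])        # 2 1: a wins

# Ron ranks b first instead; a and b now tie at two points each.
profile["Ron"] = ["b", "d", "c", "a"]
after = plurality_scores(profile)
print(after["a"], after["b"])            # 2 2
```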
But how much must Ron know in order to use this strategy? Clearly if Ron knows nothing about the preferences of the other voters, he has no hope of finding a strategic vote; he may as well vote sincerely. Yet if Ron knows the first choice of every voter, that gives him all the information he needs to manipulate the election. Now consider the Borda rule, which assigns three points to the first position, two to the second, one to the third. The winner is b with 9 points, followed by c with 8, d with 7, a with 6. Ron can still manipulate—by submitting the vote c ≻ d ≻ a ≻ b, c will gain a point and b will lose a point, making c the unique winner. However, if Ron only knew the first choice of every voter then, intuitively, he would not know how to act—he would not even know what the outcome of the election is. It turns out that this intuition is not trivial to formalise. To even state what it means to manipulate an election, when a voter does not know the outcome, is not obvious. In the literature, many authors have settled on the notion of de re knowledge of weak manipulation—a voter can manipulate an election if there exists a vote P that will improve on the sincere election outcome in some situation consistent with his knowledge, and in no situation will voting P be worse. In other words, voting P is a weakly dominant strategy. This is certainly an appealing notion—it would be reasonable to assume that a rational agent would not use a weakly dominated strategy—but it is not without its problems. Consider a situation with m candidates, and a voter who knows only that his first choice a is universally reviled by all the other voters. Under any reasonable voting rule, a would not win, so the voter has a weakly dominant strategy to vote for someone else. However, depending on the voting rule this could be almost any of the (m − 1)(m − 1)! ballots that do not rank a first. Saying that such a voter can manipulate the election is stretching reality.
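The Borda computation above can be verified the same way. A short Python sketch (names are ours):

```python
# Ballots best to worst; Borda gives 3, 2, 1, 0 points by position.
profile = [
    ["a", "b", "c", "d"],  # Ann
    ["a", "b", "c", "d"],  # Bill
    ["b", "d", "c", "a"],  # Tom
    ["c", "d", "b", "a"],  # Jessica
    ["d", "c", "b", "a"],  # Ron, sincere
]

def borda_scores(profile):
    m = len(profile[0])
    scores = {}
    for ballot in profile:
        for pos, cand in enumerate(ballot):
            scores[cand] = scores.get(cand, 0) + (m - 1 - pos)
    return scores

s = borda_scores(profile)
print(s["b"], s["c"], s["d"], s["a"])    # 9 8 7 6: b wins

# Ron's strategic ballot c > d > a > b: c gains a point, b loses one.
profile[4] = ["c", "d", "a", "b"]
t = borda_scores(profile)
print(t["c"], t["b"])                    # 9 8: c is now the unique winner
```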
In this paper we consider the consequences of the stronger notion of de re knowledge of manipulation—that voting P is a strictly dominant strategy—on the family of k-approval voting procedures. We show that with a sufficient number of voters, knowing one (or, with a larger number of voters, all but one) disapproved candidate of every voter is insufficient to give the manipulator knowledge of manipulation. With knowledge of approved candidates, the key factor is whether it is known that some candidates are universally approved; if not, the manipulator cannot have knowledge of manipulation.
1.1 Our Contribution

We investigate the notion of de re knowledge of manipulation in the context of manipulating approval voting. We consider two families of information sets—Dx, where the manipulator knows x disapproved candidates of each voter, and Ax, where the manipulator knows x approved candidates of each voter—and consider the circumstances under which this information is insufficient to have de re knowledge of manipulation.
We find that if n > k and k < m − 1, then D1 will never give the manipulator de re knowledge of manipulation under k-approval. If it is also the case that n > m, the result can be strengthened to show that Dm−k−1—the knowledge of all but one disapproved candidate of each voter—is also insufficient for de re knowledge. With the knowledge of approved candidates, we show that if the manipulator knows that no candidate is approved unanimously, then Ax is insufficient knowledge for manipulation for any x < k ≤ 3. Overall, these results suggest that the notion of de re knowledge of manipulation is too stringent to be of wide applicability in voting. Intermediate notions between de re knowledge of weak manipulation and de re knowledge proper ought to be sought.
2 Related Work

Ever since the Gibbard-Satterthwaite theorem established that no non-trivial voting rule can be strategy-proof [9, 14], scholars have searched for ways to get around it. One of the first was Gibbard himself, who showed that if we generalise the notion of a choice function to include chance [7, 8, 18], a continuum of strategy-proof procedures is possible [10].¹ But the probabilistic setting raises the question: what does it mean to vote strategically if a voter cannot be sure of the outcome? To exploit the full power of the model, we need to introduce cardinal utilities into what was originally a purely ordinal setting, and impose some sort of expected utility maximisation on the voters. This can have great theoretical advantages: the study of voting equilibria, rather than a naïve single-manipulator model, becomes a lot more feasible [12], and if we allow a choice function to operate on the utilities directly, a whole new world lies open [1]. But it is not without its problems. There are the standard issues of the unknowability and incomparability of subjective utility, and amusingly Ferejohn and Fiorina [6] argue that, if voting has non-zero cost, an expected utility maximiser would not vote at all. If we wish to maintain deterministic functions and ordinal preferences, the probabilistic model loses some of its appeal. Instead, scholars in computer science and formal logic have devised a wide range of methods to model an agent who does not know the state of the world, and does not even have a Bayesian prior. Xia and Conitzer [16] introduce the model of partial orders. Here it is assumed that only certain binary preferences of a voter are known. For example, there may have been a survey that asked voters to rank a and b, and a second b and c. As a
¹ Gibbard interpreted this result negatively, as he showed all these procedures are some mixture of dictatorship and majority voting. We do not share his pessimism; random dictatorship is responsive, anonymous, majoritarian in expectation, and arguably the only feasible implementation of sortition in a modern context. The only thing it shares with dictatorship is the name.
consequence we may know that a given voter prefers a to b and c to b, but that will tell us nothing about how they compare a and c. Xia and Conitzer [16, 17] consider the complexity of determining whether it is possible for a given candidate to win in this model. Conitzer et al. [4] extend the analysis to the complexity of finding a strategic vote. To model strategic voting under uncertainty they introduce the notion of a dominating manipulation, which proved to be a very popular notion with later authors (this is the same as de re knowledge of weak manipulation, discussed below). A dominating manipulation is a vote that will improve on the election outcome in some state the manipulator considers possible, and can never harm the outcome. Thus using a dominating manipulation whenever one is available is a weakly dominant strategy. Meir et al. [11] model voter uncertainty under plurality voting by assuming all voters have access to a given poll, but voters differ in their belief in the accuracy of the poll. For example, if the poll shows 60 votes for candidate a, voter 1 may consider the range 55–65 possible, while voter 2 is less certain and considers that scores from 30 to 90 may be possible. Voters vote for an undominated candidate given their beliefs. An equilibrium is sought by iteratively presenting the voters with the new scores, and allowing them to change their vote. Chopra et al. [3] assume a voter knows the preferences of some voters, but not others. This knowledge is represented as a graph on the set of voters. The model includes a strategising function f, which specifies how a voter reacts to this information. Examples of f could include voting sincerely; choosing a favourite from the two candidates with the highest known scores; switching to a candidate that will win under the assumption that all like-minded voters act likewise; and so on. It is shown that any f converges to equilibrium on a directed, acyclic graph.
Reijngoud and Endriss [13] study specific opinion polls, which communicate certain information about the profile to the voters, such as the scores, the winner, or the tournament graph. The voter considers all profiles consistent with this information possible. Chevaleyre et al. [2] study the relationship between the information content of various opinion polls, and Endriss et al. [5] the extent to which such polls allow agents to manipulate. Van Ditmarsch et al. [15] present a formalisation of strategic voting under uncertainty in the language of epistemic logic. In the course of their analysis they emphasise the fact that it is far from clear what it means to manipulate an election if a voter does not know the outcome, and cannot even assign probabilities to it. In particular, they identify at least four notions:

1. De re knowledge of manipulation: the voter knows that, whatever the state of the world, voting P′i is preferable to voting Pi.
2. De re knowledge of weak manipulation: the voter knows that in some states of the world voting P′i will improve the election outcome, and in no state will it worsen it.
3. De dicto knowledge of manipulation: the voter knows that in every state of the world there exists some strategic P′i that will improve the election outcome, but this may be a different strategy in each state.
4. De dicto knowledge of weak manipulation: an even weaker concept.

These insights are interesting because "de dicto" knowledge, while certainly a form of knowledge, is entirely useless for actually changing the election outcome; moreover, even de re knowledge of weak manipulation—the notion used by Reijngoud and Endriss [13], Endriss et al. [5], and Conitzer et al. [4]—seems a bit weak. Simply knowing that his top choice will not win the election is enough to give a voter de re knowledge of weak manipulation under many voting rules, but without a concrete idea of how to improve the situation, it is odd to say the voter can manipulate.
3 Preliminaries

Consider a set of n voters and m candidates. We will denote voters by integers and candidates by lower case letters. A profile is an n-tuple of linear orders over the candidates, P = (P1, . . . , Pn). Pi are the preferences of voter i, which we will also denote by ≻i in infix notation. A voting rule is a function that takes a profile to a single candidate: the election winner. In this paper we will be concerned with the k-approval family of voting rules, under lexicographic tie-breaking.

Definition 1 Let k be an integer, 1 ≤ k < m, and ≻T an order over the candidates. If voter i ranks a candidate c in the top k positions, we say i approves of c, and disapproves otherwise. The k-approval rule (with tie-breaking via ≻T) is the voting rule that assigns one point to a candidate each time that candidate is approved, and elects the candidate with the largest number of points; ties are broken by choosing the candidate ranked higher by ≻T. If the winner, c, has strictly more points than any other candidate, we say that c wins uniquely; otherwise c wins via tie-breaking.

The special case of 1-approval is known as plurality; the case of (m − 1)-approval as antiplurality, or veto. We use P−i to denote the preferences of all voters other than i, and thus (P′i, P−i) denotes the profile obtained from P = (P1, . . . , Pn) by replacing Pi with P′i. We say that voter i can manipulate the voting rule f at profile P if there exists a P′i such that:

f(P′i, P−i) ≻i f(P)
(1)
That is, if i swaps his vote from Pi to P′i, and all other voters keep their votes the same, the outcome in (P′i, P−i) will be strictly better for i than in P. We call such a P′i a strategic vote.
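A brute-force search for strategic votes under k-approval makes these definitions concrete. The sketch below is ours (function names and the tie-break order are our own choices); with the tie-break d ≻T c ≻T b ≻T a, a plurality manipulation by Ron exists in the five-voter example from the introduction.

```python
from itertools import permutations

def k_approval_winner(profile, k, tiebreak):
    """k-approval with lexicographic tie-breaking: one point per ballot
    that ranks a candidate in the top k; ties go to the candidate
    appearing earlier in `tiebreak`."""
    scores = {c: 0 for c in tiebreak}
    for ballot in profile:
        for c in ballot[:k]:
            scores[c] += 1
    best = max(scores.values())
    return next(c for c in tiebreak if scores[c] == best)

def strategic_votes(profile, i, k, tiebreak):
    """All ballots for voter i whose outcome i strictly prefers to the sincere one."""
    sincere = k_approval_winner(profile, k, tiebreak)
    rank = {c: r for r, c in enumerate(profile[i])}   # lower rank = more preferred
    deviations = []
    for ballot in permutations(tiebreak):
        new_profile = profile[:i] + [list(ballot)] + profile[i + 1:]
        if rank[k_approval_winner(new_profile, k, tiebreak)] < rank[sincere]:
            deviations.append(ballot)
    return deviations

# Plurality (k = 1) on the five-voter example; Ron is voter index 4.
profile = [["a","b","c","d"], ["a","b","c","d"], ["b","d","c","a"],
           ["c","d","b","a"], ["d","c","b","a"]]
tiebreak = ["d", "c", "b", "a"]
print(k_approval_winner(profile, 1, tiebreak))   # a: the sincere winner
devs = strategic_votes(profile, 4, 1, tiebreak)
print(len(devs))   # 12: every ballot ranking b or c first is a strategic vote
```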
In the case of k-approval, the only relevant aspect of the change from Pi to P′i is which candidates went from approved to disapproved or vice versa. Thus if i approves of c in Pi but disapproves of c in P′i, we say i demotes c. In the converse situation, i promotes c. To economise on variable names, from hereon we will assume that all manipulation is carried out by voter 1, whom we call the manipulator; the other, sincere voters will be referred to as s-voters. An information set for voter 1 is a set I of profiles such that P1 = P′1 for every P, P′ ∈ I. That is, voter 1's preferences are the same in every profile in I, but the s-voters' preferences can change. This is interpreted as the set of states voter 1 considers possible, in the situation where he does not possess complete information of the other voters' preferences. The special case of complete information is where I is a singleton, and the case of complete ignorance is where I consists of all profiles that agree on voter 1's preferences. Given that I is the information set of the manipulator, we say that the manipulator has de re knowledge of manipulation if:

∃P′1 ∀P ∈ I : f(P′1, P−1) ≻1 f(P)
(2)
That is voting P1 will be a successful manipulation in every state the manipulator considers possible. In this paper we will consider two specific types of information sets where the manipulator has information about which candidates are approved or disapproved. The information sets will come in two forms, Dx and Ax . The intuition that Dx is meant to capture is that the manipulator knows, for every voter i, some x candidates that i disapproves of. Likewise, Ax means that the manipulator knows x approved choices for each voter. To formalise this we will parametrise Dx (Ax ) with n−1 sets. That is, Dx (D2 , . . . , Dn ) is the information set where the manipulator knows that voter i disapproves of Di , |Di | = x. However, since we will never need to compare two information sets, we will simplify notation by dropping the parameters from the text and write simply Dx in lieu of Dx (D2 , . . . , Dn ), and taking it as given that Di refers to the disapproved candidates of voter i under the information set Dx . Definition 2 Let x be an integer. Let Di be a set of x candidates, for 1 < i ≤ n. Let Dx be the information set (with parameters D2 , . . . , Dn ) such that in every P ∈ Dx , voter i disapproves of Di in P . Thus D1 represents the knowledge of one disapproved candidate of every voter, Dm−k−1 the knowledge of all but one disapproved candidate, and Dm−k gives effectively complete knowledge of a k-approval election. For convenience, D0 represents having no knowledge whatsoever. Let Ai be a set of x candidates, for 1 < i ≤ n. Let Ax be the information set (with parameters A2 , . . . , An ) such that in every P ∈ Ax , voter i approves of Ai in P .
226
V. Kuka and E. Ianovski
Thus A1 represents the knowledge of one approved candidate of every voter, Ak−1 the knowledge of all but one approved candidate, and Ak gives effectively complete knowledge of a k-approval election.
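For small elections, de re knowledge under Dx can be tested directly from condition (2): enumerate every completion of the s-voters' ballots consistent with the known disapprovals, and look for a single ballot that beats the sincere vote in all of them. This is an illustrative sketch of ours (the encoding of Dx as a list of sets, and all function names, are our own assumptions):

```python
from itertools import permutations, product

def winner(profile, k, tiebreak):
    score = {c: 0 for c in tiebreak}
    for order in profile:
        for c in order[:k]:
            score[c] += 1
    best = max(score.values())
    return next(c for c in tiebreak if score[c] == best)

def de_re_knowledge(pref1, k, tiebreak, disapproved):
    """Return a ballot witnessing (2), or None. disapproved[j] is the set of
    candidates s-voter j is known to rank outside the top k."""
    rank = {c: pos for pos, c in enumerate(pref1)}
    options = [[o for o in permutations(tiebreak) if not set(o[:k]) & D]
               for D in disapproved]                       # consistent ballots per s-voter
    states = [list(bs) for bs in product(*options)]        # every profile in D_x
    for ballot in permutations(pref1):
        if ballot == tuple(pref1):
            continue
        if all(rank[winner([ballot] + P, k, tiebreak)] <
               rank[winner([tuple(pref1)] + P, k, tiebreak)] for P in states):
            return ballot
    return None

# plurality, 3 candidates, 2 voters, the s-voter known to disapprove a: no witness exists
print(de_re_knowledge(('a', 'b', 'c'), 1, ('a', 'b', 'c'), [{'a'}]))   # None
# with complete knowledge (|D_j| = m - k) a witness can exist
print(de_re_knowledge(('c', 'b', 'a'), 1, ('a', 'b', 'c'), [{'a', 'c'}, {'b', 'c'}]))
```

In the second call the manipulator knows both s-voters' full ballots (one approves b, the other a), and demoting his sincere choice c in favour of b is a successful manipulation in every state.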
4 Knowledge of Manipulation

4.1 Knowledge of Disapproved Candidates

We begin with the case of D1, and seek to prove the following theorem:

Theorem 1 Let k < m − 1 and n > k. The information set D1 will not give the manipulator de re knowledge of manipulation.

To simplify the process we need to introduce some notation. Observe that as far as the scores of the candidates are concerned, all the relevant information contained in Dx can be reduced to m integers, n1, …, nm, where ni is the number of voters (including the manipulator) who are known to disapprove of ci. Thus n − ni is the upper bound on the score of ci in any P ∈ Dx. We are interested in the candidates for whom this upper bound is the highest. Let nmin = min ni, and call all candidates ci with ni = nmin promising. An extreme profile for a promising ci is a P ∈ Dx where every s-voter who is not known to disapprove of ci approves of ci. Clearly ci will be one of the highest-ranking candidates at P, with n − nmin points.

Observe that after manipulation, only one of four types of candidates can win at P. Namely:

1. A promising candidate that the manipulator approves of. Call such a candidate a champion.
2. A promising candidate that the manipulator disapproves of. Call such a candidate an opponent.
3. A candidate ci with ni = nmin + 1 that the manipulator approves of. Call such a candidate a runner-up.
4. A candidate ci with ni = nmin + 1 that the manipulator disapproves of. Call such a candidate a challenger.

Observe also that the only way to manipulate at P in favour of a champion is to demote another champion; the only way to manipulate in favour of a runner-up is to demote champions or runner-ups; the only way to manipulate in favour of an opponent is to promote that opponent; and the only way to manipulate in favour of a challenger is to promote that challenger.
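The four types depend only on the counts ni and on which candidates the manipulator approves, so they can be computed directly (an illustrative sketch of ours with hypothetical names; candidates with ni > nmin + 1 are irrelevant and receive no label):

```python
def classify(n, approved):
    """n[c] = number of voters known to disapprove c (manipulator included);
    approved = the candidates the manipulator approves of."""
    n_min = min(n.values())
    label = {}
    for c, nc in n.items():
        if nc == n_min:
            label[c] = 'champion' if c in approved else 'opponent'
        elif nc == n_min + 1:
            label[c] = 'runner-up' if c in approved else 'challenger'
    return label

print(classify({'a': 0, 'b': 0, 'c': 1, 'd': 2}, {'a', 'c'}))
# {'a': 'champion', 'b': 'opponent', 'c': 'runner-up'}  -- d can never win after manipulation
```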
There is a certain symmetry in the information set that we will exploit:

Lemma 1 If there exists a P ∈ Dx where a champion (opponent) ci wins uniquely, then for any champion (opponent) cj there exists a P′ ∈ Dx where cj wins uniquely.
Proof Let ci be a champion (opponent) and let P be a profile where ci wins uniquely. Construct P′ as follows:

1. For every s-voter v such that ci, cj ∉ Dv, obtain Pv′ by swapping ci and cj in Pv.
2. For every s-voter v such that ci, cj ∈ Dv, leave the preferences unchanged, Pv′ = Pv.
3. Since ci and cj are both champions (opponents), observe that the number of s-voters v for whom ci ∈ Dv, cj ∉ Dv, is equal to the number of voters w for whom ci ∉ Dw, cj ∈ Dw. If this were not the case, then it could not be true that ni = nj. As such, let v1, …, vq, w1, …, wq be all such voters, and obtain Pv′r by swapping ci and cj in Pwr, and Pw′r by swapping ci and cj in Pvr.

In the resulting P′ the scores of ci and cj will be reversed, and the scores of any other candidate will be unchanged. Thus cj will win uniquely in P′.

Crucial to the case of D1 is the fact that we can always construct a profile where any set of q ≤ k candidates get less than their maximum possible score.

Lemma 2 Let s1, …, sq be a set of q ≤ k candidates, n > k, and ni < n − 1 for all si. There exists a P ∈ D1 such that the score of each si is no greater than n − ni − 1.

Proof Let s1, …, sq′ be the subset of these candidates for whom ni > 0. If q′ = 1, have s1 demoted by an arbitrary s-voter for whom s1 ∉ Dv; by the assumption that ni < n − 1, such a voter exists. Otherwise, for each sj let vj be an s-voter for whom sj ∈ Dvj. Have v1 demote sq′, v2 demote s1, and so on. This guarantees that these candidates cannot attain more than n − ni − 1 points.

For the remaining q − q′ candidates, observe that we have n − 1 − q′ untouched s-voters remaining, for none of whom si ∈ Dv. Since n > k ≥ q, we can have each candidate demoted by an arbitrary s-voter.

This gives us all we need to begin.

Proof of Theorem 1 Suppose, for contradiction, that there exists a profile P such that the manipulator has de re knowledge of manipulation in D1.
That is, there exists a strategic vote L1 ≠ P1, such that voting L1 is strictly preferable to voting P1 in every state in D1. We proceed by cases on whether or not there exists an extreme profile where a promising ci wins uniquely.²

² To see that this is not necessarily the case, consider 1-approval with 3 candidates and 2 voters. The manipulator ranks a ≻1 b ≻1 c, and knows that the other voter disapproves of a. Every profile in the information set will be a tie.
Case 1: There exists an extreme P ∈ D1 where a promising ci wins uniquely.

First, we claim that ci cannot be a champion. Suppose otherwise. The only way the manipulator can manipulate here is by demoting ci, and making ci lose the tie against a preferred runner-up. Let s1, …, sq be the runner-ups, of whom there are at most k − 1. Construct P′ by demoting the runner-ups as in Lemma 2. If one such demotion operation does not change the election outcome, call it a safe demotion; else an unsafe demotion. Let P* be the intermediate profile after applying all the safe demotions. Observe that such a demotion will not lower the score of any candidate other than a runner-up, hence the sincere winner at P* is either a champion or an opponent.

If ci is still the sincere winner, i.e. P′ = P*, then it is only possible to manipulate in favour of a runner-up, but by construction they all have at most n − nmin − 2 points at P′, so manipulation is not possible.

If ci is in a tie at P* and loses the tie, then if the sincere election outcome is a champion cj, the only way the manipulator can manipulate is to demote cj in favour of some champion cq. This must mean that the manipulator does not demote cq. However, this leads to a contradiction: by Lemma 1 there exists an extreme profile where cj is a unique winner, and by assumption L1 is a strategic vote at that profile also. Hence, L1 must demote all champions. If the sincere election outcome is an opponent cj, then the only way the manipulator can manipulate is by promoting an opponent or challenger cq. But then consider P*: the sincere election outcome is the champion ci, while cq is either tied with ci or is one point behind. After manipulation, ci will lose one point and cq will gain one and win.

Case 2: Every extreme P ∈ D1 incurs a tie.

We proceed by cases based on whether a champion wins in some extreme profile after manipulation.

Case 2.1: A champion ci wins in some extreme profile after manipulation.
The manipulator cannot increase the score of any champion, or decrease the score of any opponent, so it must be the case that ci beats all non-champions in a tie, but previously lost the tie to a less preferred champion. Observe that it is possible to construct an extreme profile where ci is in a tie with cj, for every promising cj: if k ≥ 2, simply make sure cj and ci are approved wherever possible. If k = 1, then it is not possible for every extreme profile to have a tie: if ci is approved wherever possible but still does not win a plurality election, that must mean the number of s-voters for whom ci ∈ Dv is equal to the rest, and hence ni = n − ni, n = 2ni. But if half of all voters disapprove of ci, yet ci nevertheless minimises ni, then there can be only two candidates, and manipulation is not possible. Thus the only way this manipulation can work is if voter 1 demotes all champions that beat ci in a tie.
We claim there cannot exist any opponents. If cj were an opponent, then there would exist a P where ci and cj are tied. If we obtain P′ by having one s-voter demote ci, then after manipulation cj gets full score and all champions get one less, so the winner would be an opponent.

Given that there are no opponents, every tie is between champions. There are at most k champions, so by Lemma 2 we can construct a profile where every champion but ci gets less than the maximum score. ci is the unique winner at this profile, contradicting our assumption.

Case 2.2: No champion wins at extreme profiles after manipulation.

Here voter 1 must manipulate by promoting opponents and challengers. He can only promote k of these. Let these be s1, …, sk. Using Lemma 2, construct an extreme P′ by having one s-voter demote each of these. Suppose c′ won in P sincerely, but after manipulation si wins. This must mean that either si was an opponent who lost to c′ via tie-breaking, or a challenger that beat c′ via tie-breaking. In P′, the sincere winner is some c* that is at least as high in the tie-breaking as c′. After manipulation, no challenger can match c* by score, and any opponent will lose the tie because they lost to c′ in P. Both cases lead to contradiction.

This establishes the theorem.

The above argument fails for general Dx because Lemma 2 no longer holds. However, if we relax the conditions of the theorem, we can prove a similar result for the almost complete knowledge of Dm−k−1.

Theorem 2 Let n > m. The information set Dm−k−1 will not give the manipulator de re knowledge of manipulation.

Proof Let ci be the promising candidate ranked highest in the tie-breaking. We claim that it is possible to construct an extreme P for ci where every other champion, opponent, runner-up, and challenger does not attain maximum score. To do this, we must show that it is possible to find a unique s-voter v for each such candidate c such that c ∉ Dv, and construct a profile where v demotes c.
Consider a bipartite graph with such candidates on the left and s-voters on the right. A candidate c is adjacent to an s-voter v just if c ∉ Dv. To find an appropriate sequence of demotions, it suffices to find a candidate-perfect matching in this graph. By Hall's theorem, a candidate-perfect matching exists just if for every subset of candidates W we have |W| ≤ |NG(W)|, where NG(W) is the set of s-voters adjacent to some c ∈ W.

Suppose, for contradiction, that this fails: there exists a W for which |W| > |NG(W)|. Let r = |W| be the number of these candidates, r′ = |NG(W)| the number of adjacent s-voters, and n′ = n − 1 the total number of s-voters. Since no candidate in W is adjacent to an s-voter outside of NG(W), it must be the case that each of these n′ − r′ s-voters disapproves of all r candidates in W.

Consider now the quantity of the manipulator's knowledge. For each of n − 1 s-voters, the manipulator knows m − k − 1 disapproved candidates. We can visualise this as a table with (n − 1)(m − k − 1) entries. Of these, we know that n′ − r′ s-voters
disapprove of the r candidates in W. This leaves n′(m − k − 1) − r(n′ − r′) unfilled entries in the table, and at least m − r candidates to distribute among them. We know that each candidate ci is disapproved ni times. To account for the possibility of being disapproved of by the manipulator, each candidate must appear in the table at least ni − 1 times. Since each candidate in W appears in the table at least n′ − r′ times, and for these candidates ni ≤ nmin + 1, this means every remaining candidate must appear in the table at least n′ − r′ − 2 times. Since the minimum must be smaller than the average, this gives us:

(n′(m − k − 1) − r(n′ − r′)) / (m − r) ≥ n′ − r′ − 2    (3)
n′(m − k − 1) − r(n′ − r′) ≥ (m − r)(n′ − r′ − 2)    (4)
n′m − n′k − n′ − rn′ + rr′ ≥ n′m − mr′ − 2m − rn′ + rr′ + 2r    (5)
−n′k − n′ ≥ −mr′ − 2m + 2r    (6)
mr′ + 2m − 2r ≥ n′k + n′    (7)
mr′ + 2m − 2r ≥ n′(k − 1) + 2n′    (8)

which is impossible since n > m and k − 1 ≥ r′.

As a result we can construct a profile where every champion, opponent, runner-up, and challenger other than ci fails to attain maximum score. In the resulting profile ci wins uniquely, and even if the manipulator forces a tie with another promising candidate, ci will win the tie. Thus the manipulator cannot alter the election outcome.
4.2 Knowledge of Approved Candidates

With Ax, it is indeed possible to construct cases where the manipulator has de re knowledge of manipulation. Suppose k ≥ 2, x ≥ 2 and the manipulator knows c1 and c2 are approved by every voter. If c1 ≻1 c2, c2 is first in tie-breaking, and c1 second, then demoting c2 will ensure the winner changes from c2 to c1 in every possible state.

However, this is an extremely marginal situation: it requires that two candidates are unanimously approved of by the voter base, as well as restrictions on their position in tie-breaking. We shall show that if no candidate is known to be universally approved of, then the manipulator cannot have de re knowledge of manipulation.

To ease notation, call the top k candidates in the manipulator's preference order the good candidates, and let ai be the number of times candidate ci is known to be approved of. Observe that only a good candidate can attain n points in a sincere profile; thus the sincere winners in extreme profiles can only be good candidates.
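The marginal case above can be verified by brute force (a self-contained sketch under our own encoding, not code from the paper): take m = 3, k = 2, two s-voters both known to approve c1 and c2, tie-breaking c2 ≻T c1 ≻T c3, and a manipulator with c1 ≻1 c2 ≻1 c3.

```python
from itertools import permutations, product

def winner(profile, k, tiebreak):
    score = {c: 0 for c in tiebreak}
    for order in profile:
        for c in order[:k]:
            score[c] += 1
    best = max(score.values())
    return next(c for c in tiebreak if score[c] == best)

k, tiebreak = 2, ('c2', 'c1', 'c3')            # c2 first in tie-breaking, c1 second
sincere = ('c1', 'c2', 'c3')                   # manipulator: c1 > c2 > c3
demote_c2 = ('c1', 'c3', 'c2')                 # the strategic vote: push c2 below the line
# every s-voter ballot consistent with A_j = {c1, c2}: both approved, c3 disapproved
consistent = [o for o in permutations(tiebreak) if 'c3' not in o[:k]]
states = list(product(consistent, repeat=2))   # two s-voters

assert all(winner([sincere] + list(P), k, tiebreak) == 'c2' for P in states)
assert all(winner([demote_c2] + list(P), k, tiebreak) == 'c1' for P in states)
print("demoting c2 elects c1 in all", len(states), "states")
```

Sincerely, c1 and c2 tie with full score and c2 wins via tie-breaking; demoting c2 makes c1 the unique winner in every state, so the manipulator has de re knowledge of manipulation.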
Lemma 3 Let Ax be such that for all ci, ai < n. For any given set G of m − k good candidates, there exists a P ∈ Ax where these candidates do not get n points.

Proof Observe that any voter has enough disapproval slots to demote every candidate in G if need be. It is thus sufficient that for every ci ∈ G, there exists some voter j for whom ci ∉ Aj. But this must be the case because ai < n.

Theorem 3 Suppose 3 ≥ k > x, and let Ax be such that for all ci, ai < n. The information set Ax will not give the manipulator de re knowledge of manipulation.

Proof Suppose g1, …, gk are the good candidates, with tie-breaking g1 ≻T · · · ≻T gk. For every gi ∈ {g1, …, gm−k+1} there exists a P where gi gets n points and g1, …, gi−1 get less than n: simply have every voter approve gi, and use Lemma 3 on g1, …, gi−1. In this profile the sincere winner is gi.

If gi is the manipulator's favourite candidate, manipulation is not possible and thus the manipulator does not have de re knowledge of manipulation. If gi is not his favourite candidate, then the only possible strategic vote would involve demoting gi. In order to manipulate at all these profiles, then, the manipulator would have to demote each of g1, …, gm−k+1. But that is impossible, because there are only m − k positions at the bottom of the ballot.
5 Discussion

We began with the observation that de re knowledge of weak manipulation is excessively weak, since marginal information, such as knowing that the manipulator's top choice is disapproved of, is enough to give the manipulator "knowledge" of manipulation, yet no concrete plan of how to manipulate. However, our results suggest that de re knowledge may be too strong. Ak−1 is a colossal amount of information (the manipulator knows all but one of the other voters' approved candidates), yet it turns out that if no candidate is universally approved (which is a reasonable assumption) then this information does not serve the manipulator, just because it is conceivable that in some possible world, no matter how unlikely, the election is already decided. We suggest two ways to address this, if we are to stay in the possible-worlds framework favoured by computer scientists.

The first is based on the non-constructive nature of de re weak knowledge. We have no qualms with the notion of a weakly dominant strategy, and if an agent can use a weakly dominant strategy, then we agree that he should. The trouble with weakly dominant strategies is that there may be many of them, and in the case of strategic voting this is certainly the case. An agent has m! possible votes available to him, and as soon as he knows his sincere vote is wasted, almost all of these can become viable. It would be interesting to study a notion where the manipulator has a unique (or effectively unique, in the case of voting rules with a lot of redundant information) weakly dominant strategy. It does not have to always improve on the election outcome, but at least
the manipulator knows for a fact that voting this way, and only this way, dominates sincere behaviour.

The second stems from the proof techniques used in our theorems. Effectively, we sought to construct a profile in the information set where the manipulator cannot alter the election outcome, and hence it does not matter whether he manipulates or not. This is a well-known issue (the paradox of the rational voter): in a large election, the probability of one voter making a difference tends to zero. As a consequence, the probability of a single manipulator altering the outcome tends to zero as well. Since de re knowledge of manipulation requires that every profile in the information set be manipulable, it requires that the manipulator know that all possible worlds fall into the extremely narrow category of worlds where a single voter can make a difference. The way to address this problem is to consider manipulation by a group: coalitional manipulation. It could prove that with many manipulators, de re knowledge is a reasonable notion after all.

Acknowledgments This work is supported by the Russian Science Foundation under grant 20-71-00029.
References

1. Barbera, S., Bogomolnaia, A., Van Der Stel, H.: Strategy-proof probabilistic rules for expected utility maximizers. Math. Social Sci. 35(2), 89–103 (1998)
2. Chevaleyre, Y., Maudet, N., Lang, J., Ravilly-Abadie, G., et al.: Compiling the votes of a subelectorate. In: Proceedings of IJCAI-2009 (2009)
3. Chopra, S., Pacuit, E., Parikh, R.: Knowledge-theoretic properties of strategic voting. In: European Workshop on Logics in Artificial Intelligence, pp. 18–30. Springer, Berlin (2004)
4. Conitzer, V., Walsh, T., Xia, L.: Dominating manipulations in voting with partial information. In: Twenty-Fifth AAAI Conference on Artificial Intelligence (2011)
5. Endriss, U., Obraztsova, S., Polukarov, M., Rosenschein, J.S.: Strategic voting with incomplete information. In: AAAI Press/International Joint Conferences on Artificial Intelligence (2016)
6. Ferejohn, J.A., Fiorina, M.P.: The paradox of not voting: a decision theoretic analysis. Am. Political Sci. Rev. 68(2), 525–536 (1974)
7. Fishburn, P.C.: Even-chance lotteries in social choice theory. Theory Decis. 3(1), 18–40 (1972a)
8. Fishburn, P.C.: Lotteries and social choices. J. Econ. Theory 5(2), 189–207 (1972b)
9. Gibbard, A.: Manipulation of voting schemes: a general result. Econometrica 41(4), 587–601 (1973)
10. Gibbard, A.: Manipulation of schemes that mix voting with chance. Econometrica 45(3), 665–681 (1977)
11. Meir, R., Lev, O., Rosenschein, J.S.: A local-dominance theory of voting equilibria. CoRR, abs/1404.4688 (2014)
12. Myerson, R., Weber, R.: A theory of voting equilibria. Am. Political Sci. Rev. 87, 102–114 (1993)
13. Reijngoud, A., Endriss, U.: Voter response to iterated poll information. In: Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems, vol. 2, pp. 635–644. International Foundation for Autonomous Agents and Multiagent Systems (2012)
14. Satterthwaite, M.A.: Strategy-proofness and Arrow's conditions: existence and correspondence theorems for voting procedures and social welfare functions. J. Econ. Theory 10(2), 187–217 (1975)
15. Van Ditmarsch, H., Lang, J., Saffidine, A.: Strategic voting and the logic of knowledge. arXiv preprint arXiv:1310.6436 (2013)
16. Xia, L., Conitzer, V.: Determining possible and necessary winners under common voting rules given partial orders. AAAI 8, 196–201 (2008)
17. Xia, L., Conitzer, V.: Determining possible and necessary winners given partial orders. J. Artif. Intell. Res. 41, 25–67 (2011)
18. Zeckhauser, R.: Majority rule with lotteries on alternatives. Q. J. Econ. 83(4), 696–703 (1969)
An Approach for Determining Stationary Equilibria in a Single-Controller Average Stochastic Game Dmitrii Lozovanu and Stefan Pickl
Abstract In this paper the problem of the existence and determination of stationary Nash equilibria in a single-controller average stochastic game is considered. The set of states and the set of actions in the game are assumed to be finite. We show that all stationary equilibria for such a game can be obtained from an auxiliary noncooperative static game in normal form where the payoffs are quasi-monotonic (quasi-convex and quasi-concave) with respect to the corresponding strategies of the players and graph-continuous in the sense of Dasgupta and Maskin. Based on this we present a proof of the existence of stationary equilibria in a single-controller average stochastic game and propose an approach for determining the optimal stationary strategies of the players.

Keywords Single-controller stochastic game · Average payoff criteria · Stationary strategies · Stationary Nash equilibrium
1 Introduction

Single-controller average stochastic games represent a class of stochastic games with average payoffs in which the transition probabilities are controlled by one player only. The relationships of this class of games with some practical and classical optimization problems have been shown by Vrieze [22] and Filar [5]. Parthasarathy and Raghavan [14] first considered single-controller stochastic games and showed the existence of stationary Nash equilibria in the case of two-player games with discounted and average payoffs. For the zero-sum single-controller average stochastic game, Vrieze [21] and Hordijk and Kallenberg [8] showed a linear programming reduction procedure for determining stationary equilibria. Algorithms for determining stationary Nash equilibria in such single-controller average stochastic games have been developed in [4, 6, 16, 17, 19, 21]. The problem of the existence and determination of stationary equilibria for a single-controller average stochastic game in the general case has been considered by Raghavan and Syed [17]. They proved the existence of stationary equilibria for the general case of a single-controller average stochastic game and proposed an algorithm for determining a stationary Nash equilibrium. However, the theoretical computational complexity of the algorithm was not estimated.

In this paper we propose a new approach for studying the problem of the existence and determination of stationary Nash equilibria for a single-controller average stochastic game in the general case. We show that all stationary equilibria for the general case of a single-controller game can be obtained from an auxiliary noncooperative static game in normal form where the payoffs are quasi-monotonic (quasi-convex and quasi-concave) with respect to the corresponding strategies of the players and graph-continuous in the sense of Dasgupta and Maskin [2]. Based on this we present a new proof of the existence of stationary equilibria in a single-controller average stochastic game and show how to construct the auxiliary noncooperative static game that gives all stationary Nash equilibria for a single-controller average stochastic game.

D. Lozovanu
Institute of Mathematics and Computer Science, Chisinau, Moldova
e-mail: [email protected]

S. Pickl
Universität der Bundeswehr München, München, Germany
e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2021
L. A. Petrosyan et al. (eds.), Frontiers of Dynamic Games, Trends in Mathematics,
https://doi.org/10.1007/978-3-030-93616-7_13
2 Formulation of a Single-Controller Average Stochastic Game We consider an m-player single-controller stochastic game in which the first player controls the transition probabilities. First we present the framework of the game and then we will specify the formulation of the game when players use stationary strategies for the selection of the actions in the states.
2.1 The Framework of a Single-Controller Average Stochastic Game

An m-player single-controller stochastic game consists of the following elements:

– a state space X (which we assume to be finite);
– a finite set A^i(x) of actions in each state x ∈ X for an arbitrary player i ∈ {1, 2, …, m};
– a step reward f^i(x, a) with respect to each player i ∈ {1, 2, …, m} in each state x ∈ X and for an arbitrary action vector a = (a^1, a^2, …, a^m) ∈ ∏_{i=1}^{m} A^i(x);
– a transition probability function p^1 : X × ⋃_{x∈X} A^1(x) × X → [0, 1] of the first player that gives the transition probabilities p^{a^1}_{x,y} from an arbitrary x ∈ X to an arbitrary y ∈ X for a^1 ∈ A^1(x), where ∑_{y∈X} p^{a^1}_{x,y} = 1, ∀x ∈ X, ∀a^1 ∈ A^1(x);
– a starting state x0 ∈ X.
The game starts in the state x_0 at the moment of time t = 0, where the players simultaneously and independently fix their actions a^i_0 ∈ A^i(x_0), i = 1, 2, …, m. After that the players receive the corresponding rewards f^i(x_0, a_0), i = 1, 2, …, m in x_0 for the given action vector a_0 = (a^1_0, a^2_0, …, a^m_0). Then the game passes randomly to a state x_1 ∈ X according to the probability distribution {p^{a^1_0}_{x_0,y}}_{y∈X}. At the moment of time t = 1 the players observe the state x_1 ∈ X and again select simultaneously and independently their actions a^i_1 ∈ A^i(x_1), i = 1, 2, …, m in the state x_1 and receive the corresponding rewards f^i(x_1, a_1), i = 1, 2, …, m for the given action vector a_1 = (a^1_1, a^2_1, …, a^m_1). Then the game passes randomly to a state x_2 ∈ X according to the probability distribution {p^{a^1_1}_{x_1,y}}_{y∈X}. In general, at the moment of time t the players observe the state x_t ∈ X, fix their actions a^i_t ∈ A^i(x_t), i = 1, 2, …, m in x_t and receive the corresponding rewards f^i(x_t, a_t), i = 1, 2, …, m in x_t for the given action vector a_t = (a^1_t, a^2_t, …, a^m_t). Such a play of the game produces a sequence of states and actions x_0, a_0, x_1, a_1, …, x_t, a_t, … that defines a stream of stage rewards f^1(x_t, a_t), f^2(x_t, a_t), …, f^m(x_t, a_t), t = 0, 1, 2, ….

The average single-controller stochastic game is the game with payoffs of the players

ω^i_{x_0} = liminf_{t→∞} E[ (1/t) ∑_{τ=0}^{t−1} f^i(x_τ, a_τ) ],  i = 1, 2, …, m,

where E is the expectation operator with respect to the probability measure induced by the Markov process with actions chosen by the first player and given starting state x_0. Each player in this game has the aim to maximize his average reward per transition.
2.2 A Single-Controller Average Stochastic Game in Stationary Strategies

Now we will specify the formulation of a single-controller average stochastic game when the players use stationary strategies for the selection of the actions in the states. A strategy of player i ∈ {1, 2, …, m} in a single-controller stochastic game is a mapping s^i that provides for every state x_t ∈ X a probability distribution over the
set of actions A^i(x_t). If these probabilities take only values 0 and 1, then s^i is called a pure strategy, otherwise s^i is called a mixed strategy. If these probabilities depend only on the state x_t = x ∈ X (i.e. s^i does not depend on t), then s^i is called a stationary strategy, otherwise s^i is called a non-stationary strategy.

So, we will regard a pure stationary strategy s^i of the player i ∈ {1, 2, …, m} as a mapping s^i : x → a^i ∈ A^i(x) for x ∈ X that determines an action a^i ∈ A^i(x) for every x ∈ X, i.e. s^i(x) = a^i for x ∈ X. This means that we can identify a pure stationary strategy s^i(x) of player i with the set of boolean variables s^i_{x,a^i} ∈ {0, 1}, where for a given x ∈ X, s^i_{x,a^i} = 1 if and only if player i fixes the action a^i ∈ A^i(x). Thus, the set of pure stationary strategies S^i of player i can be regarded as the set of solutions of the following system:

∑_{a^i ∈ A^i(x)} s^i_{x,a^i} = 1,  ∀x ∈ X;
s^i_{x,a^i} ∈ {0, 1},  ∀x ∈ X, ∀a^i ∈ A^i(x).

If in this system we change the restrictions s^i_{x,a^i} ∈ {0, 1} for x ∈ X, a^i ∈ A^i(x) to the conditions 0 ≤ s^i_{x,a^i} ≤ 1, then we obtain the set of stationary strategies in the sense of Shapley [20], where s^i_{x,a^i} is treated as the probability of the choice of the action a^i by player i every time the state x is reached by any route in the dynamic stochastic game. This means that the set of mixed stationary strategies S̄^i of player i corresponds to the set of solutions of the system

∑_{a^i ∈ A^i(x)} s^i_{x,a^i} = 1,  ∀x ∈ X;
s^i_{x,a^i} ≥ 0,  ∀x ∈ X, ∀a^i ∈ A^i(x).
Let s = (s^1, s^2, …, s^m) ∈ S̄ = S̄^1 × S̄^2 × ⋯ × S̄^m be a profile of mixed strategies s^1, s^2, …, s^m of the players. Taking into account that the transition probabilities in the game are controlled only by player 1, the dynamics of the game is determined by a stochastic matrix P^{s^1} = (p^{s^1}_{x,y}) induced by the strategy s^1, i.e. the elements p^{s^1}_{x,y} are calculated as follows:

p^{s^1}_{x,y} = ∑_{a^1 ∈ A^1(x)} s^1_{x,a^1} p^{a^1}_{x,y}  for x, y ∈ X.    (1)
If Q^{s^1} = (q^{s^1}_{x,y}) is the limiting probability matrix of P^{s^1}, then the average payoffs per transition ω^1_{x_0}(s), ω^2_{x_0}(s), …, ω^m_{x_0}(s) for the players are determined as follows:

ω^i_{x_0}(s) = ∑_{y∈X} q^{s^1}_{x_0,y} f^i(y, s),  i = 1, 2, …, m,    (2)
where

f^i(y, s) = ∑_{(a^1,a^2,…,a^m) ∈ A(y)} ( ∏_{k=1}^{m} s^k_{y,a^k} ) f^i(y, (a^1, a^2, …, a^m))    (3)
expresses the average payoff (immediate reward) in the state y ∈ X of player i when the corresponding stationary strategies s^1, s^2, …, s^m have been applied by players 1, 2, …, m in y.

The functions ω^1_{x_0}(s), ω^2_{x_0}(s), …, ω^m_{x_0}(s) on S̄ = S̄^1 × S̄^2 × ⋯ × S̄^m, defined according to (2), (3), determine a game in normal form [13] that we denote by ⟨{S̄^i}_{i=1,m}, {ω^i_{x_0}(s)}_{i=1,m}⟩. This game corresponds to the single-controller average stochastic game in mixed stationary strategies that in extended form is determined by the tuple (X, {A^i(x)}_{x∈X, i=1,m}, {f^i(x, a)}_{i=1,m}, p, x_0). The functions ω^1_{x_0}(s), ω^2_{x_0}(s), …, ω^m_{x_0}(s) on S = S^1 × S^2 × ⋯ × S^m determine the game ⟨{S^i}_{i=1,m}, {ω^i_{x_0}(s)}_{i=1,m}⟩ that corresponds to the single-controller average stochastic game in pure strategies. In the extended form this game is also determined by the tuple (X, {A^i(x)}_{x∈X, i=1,m}, {f^i(x, a)}_{i=1,m}, p, x_0).
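Formulas (1) and (2) can be illustrated numerically. The sketch below is our own (a hypothetical two-state game; the helper `limiting_matrix` approximates Q^{s^1} by the Cesàro average (1/t) ∑_{τ<t} P^τ, which converges for any stochastic matrix). It builds P^{s^1} from a mixed stationary strategy of player 1 and evaluates an average payoff:

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def limiting_matrix(P, t=2000):
    """Cesaro approximation Q ~ (1/t) * sum_{tau=0}^{t-1} P^tau."""
    n = len(P)
    power = [[float(i == j) for j in range(n)] for i in range(n)]   # P^0 = I
    Q = [[0.0] * n for _ in range(n)]
    for _ in range(t):
        for i in range(n):
            for j in range(n):
                Q[i][j] += power[i][j] / t
        power = matmul(power, P)
    return Q

# two states; in state 0 player 1 has actions {0: stay, 1: move}, in state 1 one action
p = {0: {0: [1.0, 0.0], 1: [0.0, 1.0]}, 1: {0: [1.0, 0.0]}}   # p[x][a] = row p^a_{x,.}
s1 = {0: {0: 0.5, 1: 0.5}, 1: {0: 1.0}}                       # mixed stationary strategy

# equation (1): P^{s1}_{x,y} = sum_a s1_{x,a} * p^a_{x,y}
P = [[sum(s1[x][a] * p[x][a][y] for a in p[x]) for y in range(2)] for x in range(2)]
Q = limiting_matrix(P)
f = [3.0, 0.0]                                                # immediate rewards f(y, s)
# equation (2): omega_{x0} = sum_y Q_{x0,y} * f(y, s)
omega = [sum(Q[x0][y] * f[y] for y in range(2)) for x0 in range(2)]
print([round(w, 2) for w in omega])                           # [2.0, 2.0]
```

Here P^{s^1} = [[0.5, 0.5], [1, 0]], whose stationary distribution is (2/3, 1/3), so both starting states yield the average reward (2/3)·3 = 2.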
mixed stationary strategies then the payoff functions ωθi (s) =
θx ωxi (s), i = 1, 2, . . . , m
x∈X
on S define a game in normal form {Si }i=1,m , {ωθi (s)}i=1,m " that in the extended form is determined by (X, {Ai (x)}x∈X,i=1,m, {f i (x, a}i=1,m , p, θ ). In the case θx = 0, ∀x ∈ X \ {x0 }, θxo = 1 the considered game becomes a single-controller stochasic game with fixed starting state x0 .
D. Lozovanu and S. Pickl
3 Preliminaries

An important result that we shall use in the paper concerns the existence of Nash equilibria in noncooperative games with quasi-concave and graph-continuous payoffs (see Dasgupta and Maskin [2]).

A function $f: S \to \mathbb{R}^1$ on a convex set $S \subseteq \mathbb{R}^n$ is quasi-concave [1] if for all $s', s'' \in S$ and all $\lambda \in [0, 1]$ it holds that

$$f(\lambda s' + (1 - \lambda) s'') \ge \min\{f(s'), f(s'')\}.$$

If for all $s', s'' \in S$ and all $\lambda \in [0, 1]$ it holds that $f(\lambda s' + (1 - \lambda) s'') \le \max\{f(s'), f(s'')\}$, then the function $f: S \to \mathbb{R}^1$ is called quasi-convex. A function $f: S \to \mathbb{R}^1$, $S \subseteq \mathbb{R}^n$, which is quasi-concave and quasi-convex is called quasi-monotonic. A detailed characterization of quasi-convex, quasi-concave and quasi-monotonic functions with an application to linear-fractional programming problems can be found in [1].

Let $\langle \{S_i\}_{i=\overline{1,m}}, \{f^i(s)\}_{i=\overline{1,m}} \rangle$ be an $m$-player game in normal form, where $S_i \subseteq \mathbb{R}^{n_i}$, $i = 1, 2, \dots, m$, represent the corresponding sets of strategies of the players $1, 2, \dots, m$, and $f^i: \prod_{j=1}^{m} S_j \to \mathbb{R}^1$, $i = 1, 2, \dots, m$,
represent the corresponding payoffs of these players. Let $s = (s^1, s^2, \dots, s^m)$ be a profile of strategies of the players, $s \in S = \prod_{j=1}^{m} S_j$, and define $s_{-i} = (s^1, s^2, \dots, s^{i-1}, s^{i+1}, \dots, s^m)$ and $S_{-i} = \prod_{j=1 (j \ne i)}^{m} S_j$, where $s_{-i} \in S_{-i}$. Thus, for an arbitrary $s \in S$ we can write $s = (s^i, s_{-i})$.

Fan [3] extended the well-known equilibrium result of Nash [13] to games with quasi-concave payoffs. He proved the following theorem:

Theorem 1 Let $S_i \subseteq \mathbb{R}^{n_i}$, $i = 1, 2, \dots, m$, be non-empty, convex and compact sets. If each payoff $f^i: S \to \mathbb{R}^1$, $i \in \{1, 2, \dots, m\}$, is continuous on $S = \prod_{j=1}^{m} S_j$ and quasi-concave with respect to $s^i$ on $S_i$, then the game $\langle \{S_i\}_{i=\overline{1,m}}, \{f^i(s)\}_{i=\overline{1,m}} \rangle$ possesses a Nash equilibrium.

Dasgupta and Maskin [2] considered a class of games with upper semi-continuous, quasi-concave and graph-continuous payoffs.

Definition 1 The payoff $f^i: \prod_{j=1}^{m} S_j \to \mathbb{R}^1$ of the game $\langle \{S_i\}_{i=\overline{1,m}}, \{f^i(s)\}_{i=\overline{1,m}} \rangle$ is upper semi-continuous if for any sequence $\{s_k\} \subseteq S = \prod_{j=1}^{m} S_j$ such that $\{s_k\} \to s$ it holds that $\limsup_{k \to \infty} f^i(s_k) \le f^i(s)$.
Definition 2 The payoff $f^i: \prod_{j=1}^{m} S_j \to \mathbb{R}^1$ of the game $\langle \{S_i\}_{i=\overline{1,m}}, \{f^i(s)\}_{i=\overline{1,m}} \rangle$ is graph-continuous if for all $\overline{s} = (\overline{s}^i, \overline{s}_{-i}) \in S = \prod_{j=1}^{m} S_j$ there exists a function $F^i: S_{-i} \to S_i$ with $F^i(\overline{s}_{-i}) = \overline{s}^i$ such that $f^i(F^i(s_{-i}), s_{-i})$ is continuous at $s_{-i} = \overline{s}_{-i}$.

Dasgupta and Maskin [2] proved the following theorem.

Theorem 2 Let $S_i \subseteq \mathbb{R}^{n_i}$, $i = 1, 2, \dots, m$, be non-empty, convex and compact sets. If each payoff $f^i: \prod_{j=1}^{m} S_j \to \mathbb{R}^1$, $i \in \{1, 2, \dots, m\}$, is quasi-concave with respect to $s^i$ on $S_i$, upper semi-continuous with respect to $s$ on $S = \prod_{j=1}^{m} S_j$ and graph-continuous, then the game $\langle \{S_i\}_{i=\overline{1,m}}, \{f^i(s)\}_{i=\overline{1,m}} \rangle$ possesses a Nash equilibrium.

In the following we shall use this theorem for the case when each payoff $f^i(s^i, s_{-i})$, $i \in \{1, 2, \dots, m\}$, is quasi-monotonic with respect to $s^i$ on $S_i$ and graph-continuous. In this case the reaction correspondence of player $i$,

$$\phi^i(s_{-i}) = \{\hat{s}^i \in S_i \mid f^i(\hat{s}^i, s_{-i}) = \max_{s^i \in S_i} f^i(s^i, s_{-i})\}, \quad i = 1, 2, \dots, m,$$

is compact- and convex-valued, and therefore the upper semi-continuity condition on the functions $f^i(s)$, $i = 1, 2, \dots, m$, in Theorem 2 can be relaxed. So, in this case the theorem can be formulated as follows.

Theorem 3 Let $S_i \subseteq \mathbb{R}^{n_i}$, $i = \overline{1,m}$, be non-empty, convex and compact sets. If each payoff $f^i: \prod_{j=1}^{m} S_j \to \mathbb{R}^1$, $i \in \{1, 2, \dots, m\}$, is quasi-monotonic with respect to $s^i$ on $S_i$ and graph-continuous, then the game $\langle \{S_i\}_{i=\overline{1,m}}, \{f^i(s)\}_{i=\overline{1,m}} \rangle$ possesses a Nash equilibrium.
4 Some Auxiliary Results for an Average Markov Decision Problem in Stationary Strategies

Let us consider a Markov decision process determined by a tuple $(X, \{A(x)\}_{x \in X}, \{f(x,a)\}_{x \in X, a \in A(x)}, p)$, where $X$ is a finite set of states; $A(x)$ is a finite set of actions in $x \in X$; $f(x, a)$ is a step reward in $x \in X$ for $a \in A(x)$; and $p: X \times \bigcup_{x \in X} A(x) \times X \to [0, 1]$ is a probability transition function that satisfies the condition $\sum_{y \in X} p^a_{x,y} = 1$, $\forall x \in X, a \in A(x)$. We consider the problem of determining the optimal stationary strategies (optimal stationary policies) in such a process and propose an approach for determining the optimal strategies that we shall use for a single-controller average stochastic game.
4.1 A Linear Programming Approach for an Average Markov Decision Problem
It is well known (see [9, 15]) that the optimal policies (optimal stationary strategies) for an average Markov decision problem can be found on the basis of linear programming. A linear programming model that allows one to determine the optimal randomised strategies (optimal stationary strategies) in an average Markov decision problem with finite state and action spaces is the following:

Maximize

$$\varphi_\theta(\alpha, \beta) = \sum_{x \in X} \sum_{a \in A(x)} f(x, a)\, \alpha_{x,a} \qquad (4)$$

subject to

$$\begin{cases} \sum_{a \in A(y)} \alpha_{y,a} - \sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\, \alpha_{x,a} = 0, & \forall y \in X; \\ \sum_{a \in A(y)} \alpha_{y,a} + \sum_{a \in A(y)} \beta_{y,a} - \sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\, \beta_{x,a} = \theta_y, & \forall y \in X; \\ \alpha_{x,a} \ge 0, \; \beta_{y,a} \ge 0, & \forall x \in X, a \in A(x), \end{cases} \qquad (5)$$

where $\theta_y$ for $y \in X$ represent arbitrary positive values that satisfy the condition $\sum_{y \in X} \theta_y = 1$. Here $\theta_y$ for $y \in X$ can be treated as the probabilities of choosing the starting state $y \in X$ in the decision problem. In the case $\theta_y = 1$ for $y = x_0$ and $\theta_y = 0$ for $y \in X \setminus \{x_0\}$ we obtain the linear programming model for an average Markov decision problem with fixed starting state $x_0$.

The linear programming model (4), (5) corresponds to the multichain case of an average Markov decision problem; in the unichain case restrictions (5) can be replaced by

$$\begin{cases} \sum_{a \in A(y)} \alpha_{y,a} - \sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\, \alpha_{x,a} = 0, & \forall y \in X; \\ \sum_{y \in X} \sum_{a \in A(y)} \alpha_{y,a} = 1; \\ \alpha_{y,a} \ge 0, & \forall y \in X, a \in A(y), \end{cases} \qquad (6)$$
because in the linear programming model (4), (5) the restrictions

$$\sum_{a \in A(y)} \alpha_{y,a} + \sum_{a \in A(y)} \beta_{y,a} - \sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\, \beta_{x,a} = \theta_y, \quad \forall y \in X,$$

with the condition $\sum_{y \in X} \theta_y = 1$, generalize the constraint $\sum_{y \in X} \sum_{a \in A(y)} \alpha_{y,a} = 1$ in (6).
In [15] the following relationship between feasible solutions of problem (4), (5) and stationary strategies in the average Markov decision problem determined by the tuple $(X, \{A(x)\}_{x \in X}, \{f(x,a)\}_{x \in X, a \in A(x)}, p)$ is shown. Let $(\alpha, \beta)$ be a feasible solution of the linear programming problem (4), (5) and denote $X_\alpha = \{x \in X \mid \sum_{a \in A(x)} \alpha_{x,a} > 0\}$. Then $(\alpha, \beta)$ possesses the property that $\sum_{a \in A(x)} \beta_{x,a} > 0$ for $x \in X \setminus X_\alpha$, and a stationary strategy $s_{x,a}$ that corresponds to $(\alpha, \beta)$ is determined as

$$s_{x,a} = \begin{cases} \dfrac{\alpha_{x,a}}{\sum_{a \in A(x)} \alpha_{x,a}} & \text{if } x \in X_\alpha; \\[2ex] \dfrac{\beta_{x,a}}{\sum_{a \in A(x)} \beta_{x,a}} & \text{if } x \in X \setminus X_\alpha, \end{cases} \qquad (7)$$

where $s_{x,a}$ expresses the probability of choosing the action $a \in A(x)$ in the state $x \in X$.
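Rule (7) is a pair of normalizations, which the following sketch makes explicit. The feasible point $(\alpha, \beta)$ below is hypothetical and chosen only to exercise both branches of (7); it is not an actually solved instance of (4), (5):

```python
# Sketch of rule (7): recovering a stationary strategy s_{x,a} from a
# feasible solution (alpha, beta) of the linear program (4), (5).
alpha = {0: {0: 3.0, 1: 1.0}, 1: {0: 0.0}}   # alpha[x][a], illustrative
beta  = {0: {0: 0.0, 1: 0.0}, 1: {0: 0.5}}   # beta[x][a], illustrative

def strategy_from_solution(alpha, beta):
    s = {}
    for x in alpha:
        total_alpha = sum(alpha[x].values())
        if total_alpha > 0:                  # x in X_alpha: normalize alpha
            s[x] = {a: v / total_alpha for a, v in alpha[x].items()}
        else:                                # x in X \ X_alpha: normalize beta
            total_beta = sum(beta[x].values())
            s[x] = {a: v / total_beta for a, v in beta[x].items()}
    return s

s = strategy_from_solution(alpha, beta)
# s[0] == {0: 0.75, 1: 0.25} and s[1] == {0: 1.0}
```

State 0 belongs to $X_\alpha$ and is normalized over $\alpha$; state 1 does not, and is normalized over $\beta$, as the property above guarantees $\sum_a \beta_{x,a} > 0$ there.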
4.2 A Nonlinear Programming Model and a Quasi-Monotonic Programming Approach

Using the relationship between feasible solutions of problem (4), (5) and the corresponding stationary strategies (7), it is shown in [10] that an average Markov decision problem in terms of stationary strategies can be formulated as follows:

Maximize

$$\psi_\theta(s, q, w) = \sum_{x \in X} \sum_{a \in A(x)} f(x, a)\, s_{x,a}\, q_x \qquad (8)$$
subject to

$$\begin{cases} q_y - \sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\, s_{x,a}\, q_x = 0, & \forall y \in X; \\ q_y + w_y - \sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\, s_{x,a}\, w_x = \theta_y, & \forall y \in X; \\ \sum_{a \in A(y)} s_{y,a} = 1, & \forall y \in X; \\ s_{x,a} \ge 0, \; \forall x \in X, \forall a \in A(x); \quad w_x \ge 0, \; \forall x \in X, \end{cases} \qquad (9)$$

where $\theta_y$ are the same values as in problem (4), (5) and $s_{x,a}, q_x, w_x$ for $x \in X$, $a \in A(x)$ represent the variables that must be found.

It is easy to observe that for fixed $s_{x,a}$, $x \in X$, $a \in A(x)$, system (9) uniquely determines $q_x$ for $x \in X$ and determines $w_x$ for $x \in X$ up to an additive constant in each recurrent class of $P^s = (p^s_{x,y})$ (see [15]). Therefore the notation $\psi_\theta(s, q, w)$ in (8) can be changed to

$$\psi_\theta(s) = \sum_{x \in X} \sum_{a \in A(x)} f(x, a)\, s_{x,a}\, q_x. \qquad (10)$$
Based on the results above we can now show that an average Markov decision problem in stationary strategies can be represented as a quasi-monotonic programming problem.

Theorem 4 Let an average Markov decision problem be given and consider the function

$$\psi_\theta(s) = \sum_{x \in X} \sum_{a \in A(x)} f(x, a)\, s_{x,a}\, q_x, \qquad (11)$$

where $q_x$ for $x \in X$ satisfy the condition

$$\begin{cases} q_y - \sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\, s_{x,a}\, q_x = 0, & \forall y \in X; \\ q_y + w_y - \sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\, s_{x,a}\, w_x = \theta_y, & \forall y \in X. \end{cases} \qquad (12)$$

Then on the set $S$ of solutions of the system

$$\begin{cases} \sum_{a \in A(x)} s_{x,a} = 1, & \forall x \in X; \\ s_{x,a} \ge 0, & \forall x \in X, a \in A(x), \end{cases} \qquad (13)$$
the function $\psi_\theta(s)$ depends only on $s_{x,a}$ for $x \in X$, $a \in A(x)$, and $\psi_\theta(s)$ is quasi-monotonic on $S$ (i.e. $\psi_\theta(s)$ is quasi-convex and quasi-concave on $S$ [1]).

This theorem has been formulated and proved in [10]. Based on this theorem we can conclude that an average Markov decision problem in stationary strategies represents a quasi-monotonic programming problem in which it is necessary to maximize the quasi-monotonic function (11), (12) on the set of solutions of system (13). In the unichain case of the average Markov decision problem the function (11), (12) does not depend on $\theta_y$, $y \in X$, i.e. the problem is determined by $(X, \{A(x)\}_{x \in X}, \{f(x,a)\}_{x \in X}, p)$. In this case from Theorem 4 we obtain as a corollary the following result:

Corollary 1 Let an average Markov decision problem be given and consider the function

$$\psi(s) = \sum_{x \in X} \sum_{a \in A(x)} f(x, a)\, s_{x,a}\, q_x,$$

where $q_x$ for $x \in X$ satisfy the condition

$$\begin{cases} q_y - \sum_{x \in X} \sum_{a \in A(x)} p^a_{x,y}\, s_{x,a}\, q_x = 0, & \forall y \in X; \\ \sum_{y \in X} q_y = 1. \end{cases}$$

Then on the set $S$ of solutions of the system

$$\begin{cases} \sum_{a \in A(x)} s_{x,a} = 1, & \forall x \in X; \\ s_{x,a} \ge 0, & \forall x \in X, a \in A(x), \end{cases}$$
the function $\psi(s)$ depends only on $s_{x,a}$ for $x \in X$, $a \in A(x)$, and $\psi(s)$ is quasi-monotonic on $S$ (i.e. $\psi(s)$ is quasi-convex and quasi-concave on $S$).

Remark 1 In Theorem 4, $\psi_\theta(s)$ expresses the average reward per transition in a Markov decision problem when the starting position is chosen randomly according to a given distribution $\{\theta_x\}$ on $X$ and when a stationary strategy $s \in S$ is applied. Therefore $\psi_\theta(s) = \omega_\theta(s)$. In Corollary 1, $\psi(s)$ expresses the average reward per transition for a Markov decision problem with the unichain property [12, 18]. Therefore $\psi(s) = \omega(s)$, $\forall s \in S$, and $\omega(s)$ does not depend on the starting position.
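In the unichain case the quantities in Corollary 1 and Remark 1 are easy to compute explicitly: $q$ is the stationary distribution of $P^s$ and $\psi(s)$ averages the step rewards over it. The sketch below uses purely illustrative data and a power iteration in place of solving the linear system for $q$:

```python
# Illustrative sketch of Corollary 1: average reward psi(s) of a stationary
# strategy s in a unichain average Markov decision problem.
p = {0: {0: [0.9, 0.1], 1: [0.2, 0.8]}, 1: {0: [0.5, 0.5]}}  # p[x][a][y]
f = {0: {0: 1.0, 1: 4.0}, 1: {0: 2.0}}                       # step rewards f(x, a)
s = {0: {0: 0.5, 1: 0.5}, 1: {0: 1.0}}                       # stationary strategy

n = len(p)
# P^s induced by s, mixing the controlled rows as in formula (1).
P = [[sum(s[x][a] * p[x][a][y] for a in p[x]) for y in range(n)] for x in range(n)]

q = [1.0 / n] * n
for _ in range(2000):          # power iteration: q <- q P converges to the
    q = [sum(q[x] * P[x][y] for x in range(n)) for y in range(n)]  # stationary q

# psi(s) = sum_x sum_a f(x, a) * s_{x,a} * q_x
psi = sum(f[x][a] * s[x][a] * q[x] for x in p for a in p[x])
```

For this data $q = (10/19,\, 9/19)$ and $\psi(s) = 43/19 \approx 2.263$; since the chain is unichain, the value does not depend on the starting state, in line with Remark 1.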
5 Stationary Equilibria for a Single-Controller Average Stochastic Game

In this section we show that for an arbitrary $m$-player single-controller average stochastic game a Nash equilibrium in stationary strategies exists. Using Theorem 4 we show that a single-controller average stochastic game determined by a tuple $(X, \{A^i(x)\}_{x \in X, i=\overline{1,m}}, \{f^i(x,a)\}_{i=\overline{1,m}}, p, \{\theta_y\}_{y \in X})$ can be formulated in terms of stationary strategies as follows.

Let $S_i$, $i \in \{1, 2, \dots, m\}$, be the set of solutions of the system

$$\begin{cases} \sum_{a^i \in A^i(x)} s^i_{x,a^i} = 1, & \forall x \in X; \\ s^i_{x,a^i} \ge 0, & \forall x \in X, a^i \in A^i(x), \end{cases} \qquad (14)$$
that determines the set of stationary strategies of player $i$. Each $S_i$ is a convex compact set and an arbitrary extreme point of it corresponds to a basic solution $s^i$ of system (14), where $s^i_{x,a^i} \in \{0, 1\}$, $\forall x \in X$, $a^i \in A^i(x)$, i.e. each basic solution of this system corresponds to a pure stationary strategy of player $i$.

On the set $S = S_1 \times S_2 \times \dots \times S_m$ we define $m$ payoff functions as follows:

$$\omega^i_\theta(s) = \sum_{x \in X} \sum_{(a^1, a^2, \dots, a^m) \in A(x)} \prod_{k=1}^{m} s^k_{x,a^k} \cdot f^i(x, (a^1, a^2, \dots, a^m)) \cdot q_x, \quad i = 1, 2, \dots, m, \qquad (15)$$

where $q_x$ for $x \in X$ are determined uniquely from the following system of linear equations

$$\begin{cases} q_y - \sum_{x \in X} \sum_{a^1 \in A^1(x)} s^1_{x,a^1} \cdot p^{a^1}_{x,y} \cdot q_x = 0, & \forall y \in X; \\ q_y + w_y - \sum_{x \in X} \sum_{a^1 \in A^1(x)} s^1_{x,a^1} \cdot p^{a^1}_{x,y} \cdot w_x = \theta_y, & \forall y \in X, \end{cases} \qquad (16)$$

for a fixed $s = (s^1, s^2, \dots, s^m) \in S$. The functions $\omega^i_\theta(s^1, s^2, \dots, s^m)$, $i = 1, 2, \dots, m$, represent the payoffs of the single-controller average stochastic game in normal form $\langle \{S_i\}_{i=\overline{1,m}}, \{\omega^i_\theta(s)\}_{i=\overline{1,m}} \rangle$. Here $\theta_y$, $y \in X$, represent arbitrary fixed nonnegative values where $\sum_{y \in X} \theta_y = 1$. If $\theta_y = 0$, $\forall y \in X \setminus \{x_0\}$, and $\theta_{x_0} = 1$, then we obtain an average stochastic game in normal form $\langle \{S_i\}_{i=\overline{1,m}}, \{\omega^i_{x_0}(s)\}_{i=\overline{1,m}} \rangle$ when the starting state $x_0$ is fixed, i.e. $\omega^i_\theta(s^1, s^2, \dots, s^m) = \omega^i_{x_0}(s^1, s^2, \dots, s^m)$, $i = 1, 2, \dots, m$. In this case the game is determined by $(X, \{A^i(x)\}_{x \in X, i=\overline{1,m}}, \{f^i(x,a)\}_{i=\overline{1,m}}, p, x_0)$.
If $\theta_y > 0$, $\forall y \in X$, and $\sum_{y \in X} \theta_y = 1$, then we obtain an average stochastic game in which the play starts in the states $y \in X$ with probabilities $\theta_y$. In this case for the payoffs of the players in the game in normal form we have

$$\omega^i_\theta(s^1, s^2, \dots, s^m) = \sum_{y \in X} \theta_y\, \omega^i_y(s^1, s^2, \dots, s^m), \quad i = 1, 2, \dots, m.$$
Let $\langle \{S_i\}_{i=\overline{1,m}}, \{\omega^i_\theta(s)\}_{i=\overline{1,m}} \rangle$ be the non-cooperative game in normal form that corresponds to the average stochastic positional game in stationary strategies determined by $(X, \{A^i(x)\}_{x \in X, i=\overline{1,m}}, \{f^i(x,a)\}_{i=\overline{1,m}}, p, \{\theta_y\}_{y \in X})$, where $S_i$ and $\omega^i_\theta(s^1, s^2, \dots, s^m)$, $i = 1, 2, \dots, m$, are defined according to (14)–(16). The payoffs in this game may be discontinuous; however, each payoff function $\omega^i_\theta(s^1, s^2, \dots, s^m)$ of player $i \in \{1, 2, \dots, m\}$ is quasi-monotonic (quasi-convex and quasi-concave) with respect to the strategy $s^i$ and graph-continuous in the sense of Dasgupta and Maskin [2].

Theorem 5 The game $\langle \{S_i\}_{i=\overline{1,m}}, \{\omega^i_\theta(s)\}_{i=\overline{1,m}} \rangle$ possesses a Nash equilibrium $s^* = (s^{1*}, s^{2*}, \dots, s^{m*}) \in S$ which is a Nash equilibrium in stationary strategies for the single-controller average stochastic game determined by $(X, \{A^i(x)\}_{x \in X, i=\overline{1,m}}, \{f^i(x,a)\}_{i=\overline{1,m}}, p, \{\theta_y\}_{y \in X})$. If $\theta_y > 0$, $\forall y \in X$, then $s^* = (s^{1*}, s^{2*}, \dots, s^{m*})$ is a Nash equilibrium in stationary strategies for the single-controller average stochastic game with an arbitrary starting state $y \in X$.

Proof The proof of the theorem is similar to the proof of Theorem 9.6 from [10] for the case of average stochastic positional games. So, we need to verify that $\langle \{S_i\}_{i=\overline{1,m}}, \{\omega^i_\theta(s)\}_{i=\overline{1,m}} \rangle$ satisfies the conditions of Theorem 3, i.e. we have to show that each payoff $\omega^i_\theta(s^i, s_{-i})$ is quasi-monotonic with respect to $s^i$ on the convex and compact set $S_i$, and graph-continuous. Indeed, if players $1, 2, \dots, i-1, i+1, \dots, m$ fix their strategies $\hat{s}^k \in S_k$, $k \ne i$, then we obtain an average Markov decision problem with respect to $s^i \in S_i$ in which it is necessary to maximize the average reward function $\varphi^i(s^i) = \omega^i_\theta(s^i, \hat{s}_{-i})$. According to Theorem 4 the function $\varphi^i(s^i) = \omega^i_\theta(s^i, \hat{s}_{-i})$ possesses the property that it is quasi-monotonic with respect to $s^i$ on $S_i$.
Additionally we can observe that if for the payoff $\omega^i_\theta(s^i, s_{-i})$ we consider the function $F^i: S_{-i} \to S_i$ such that $F^i(s_{-i}) = \hat{s}^i \in \overline{\omega}^i(s_{-i})$ for $s_{-i} \in S_{-i}$, $i \in \{1, 2, \dots, m\}$, where

$$\overline{\omega}^i(s_{-i}) = \{\hat{s}^i \in S_i \mid \omega^i_\theta(\hat{s}^i, s_{-i}) = \max_{s^i \in S_i} \omega^i_\theta(s^i, s_{-i})\},$$

then the function $\omega^i_\theta(F^i(s_{-i}), s_{-i})$ is continuous at $s_{-i} = \overline{s}_{-i}$ for an arbitrary $(\overline{s}^i, \overline{s}_{-i}) \in S$. So, $\omega^i_\theta(s)$ is graph-continuous and according
to Theorem 3 the game $\langle \{S_i\}_{i=\overline{1,m}}, \{\omega^i_\theta(s)\}_{i=\overline{1,m}} \rangle$ possesses a Nash equilibrium $s^* \in S$. This Nash equilibrium is a Nash equilibrium in stationary strategies for the single-controller average stochastic game determined by $(X, \{A^i(x)\}_{x \in X, i=\overline{1,m}}, \{f^i(x,a)\}_{i=\overline{1,m}}, p, \{\theta_y\}_{y \in X})$. $\square$

Remark 2 In the unichain case of the transition probability matrix $P^{s^1}$ the system (16) is transformed into the following system

$$\begin{cases} q_y - \sum_{x \in X} \sum_{a^1 \in A^1(x)} s^1_{x,a^1} \cdot p^{a^1}_{x,y} \cdot q_x = 0, & \forall y \in X; \\ \sum_{x \in X} q_x = 1, \end{cases} \qquad (17)$$

and the payoffs $\omega^i(s)$, $i = 1, 2, \dots, m$, are determined by (15), (17). In [11, 18] it is shown that for the considered case of a single-controller average stochastic game there exists a Nash equilibrium in pure stationary strategies.

Based on the results above we can propose the following approach for determining a stationary Nash equilibrium for a single-controller average stochastic game. We construct the auxiliary noncooperative game $\langle \{S_i\}_{i=\overline{1,m}}, \{\omega^i_\theta(s)\}_{i=\overline{1,m}} \rangle$, where $S_i$ and $\omega^i_\theta(s)$, for $i = 1, 2, \dots, m$, are determined according to (14)–(16). Then, by applying a fixed-point iterative algorithm (see [2]), we determine a Nash equilibrium for this game, which is a stationary equilibrium for the single-controller average stochastic game. The proposed approach for determining stationary Nash equilibria in a single-controller average stochastic game differs from the approaches of [7, 21], in which it is necessary to solve quadratic programming or linear complementarity problems.
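The fixed-point idea behind such an iterative algorithm can be illustrated on a toy game. The sketch below is not the authors' algorithm and ignores the structure of the auxiliary game; it merely iterates pure best responses in a hypothetical bimatrix game until the current profile is a best response to itself, i.e. a Nash equilibrium:

```python
# Hypothetical illustration of a fixed-point (best-response) iteration on a
# small two-player game in normal form; the payoffs are illustrative.
A = [[3, 0], [5, 1]]   # payoffs of player 1 (chooses the row)
B = [[3, 5], [0, 1]]   # payoffs of player 2 (chooses the column)

def best_response_iteration(A, B, max_iter=100):
    i, j = 0, 0                                            # initial pure profile
    for _ in range(max_iter):
        bi = max(range(len(A)), key=lambda r: A[r][j])     # best response of 1
        bj = max(range(len(B[0])), key=lambda c: B[i][c])  # best response of 2
        if (bi, bj) == (i, j):
            return i, j                # fixed point: a pure Nash equilibrium
        i, j = bi, bj
    return None                        # the iteration may also cycle

eq = best_response_iteration(A, B)     # here it converges to (1, 1)
```

Theorem 3 guarantees that an equilibrium of the auxiliary game exists; the convergence of any particular iteration scheme, in general over mixed strategies, is a separate question addressed by the cited literature.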
6 Conclusion

A new approach for studying the problem of the existence and determination of stationary Nash equilibria for a single-controller average stochastic game is proposed. The approach is based on a reduction of the problem of determining stationary equilibria of a single-controller average stochastic game to the problem of determining Nash equilibria in an auxiliary noncooperative static game in which the payoffs are quasi-monotonic with respect to the strategies of the corresponding players and graph-continuous in the sense of Dasgupta and Maskin. Based on this, a new proof of the existence of stationary Nash equilibria for single-controller average stochastic games in the general case is obtained. A stationary Nash equilibrium for a single-controller average stochastic game determined by a tuple $(X, \{A^i(x)\}_{x \in X, i=\overline{1,m}}, \{f^i(x,a)\}_{i=\overline{1,m}}, p, \{\theta_y\}_{y \in X})$ can be found
by computing a Nash equilibrium for the auxiliary noncooperative static game $\langle \{S_i\}_{i=\overline{1,m}}, \{\omega^i_\theta(s)\}_{i=\overline{1,m}} \rangle$ from Sect. 5.

Acknowledgments The authors are grateful to the referee for the valuable suggestions and remarks that contributed to improving the presentation of the paper.
References

1. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
2. Dasgupta, P., Maskin, E.: The existence of equilibrium in discontinuous economic games. Rev. Econ. Stud. 53, 1–26 (1986)
3. Fan, K.: Applications of a theorem concerning sets with convex sections. Math. Ann. 163, 189–203 (1966)
4. Filar, J.A.: On stationary equilibria of a single-controller stochastic game. Math. Program. 30, 313–325 (1984)
5. Filar, J.A.: Quadratic programming and the single-controller stochastic game. J. Math. Anal. Appl. 113, 136–147 (1986)
6. Filar, J.A., Raghavan, T.E.S.: A matrix game solution to a single-controller stochastic game. Math. Oper. Res. 9, 356–362 (1984)
7. Filar, J.A., Schultz, T.A.: Bilinear programming and structured stochastic games. J. Optim. Theory Appl. 53, 85–104 (1987)
8. Hordijk, A., Kallenberg, L.S.: Linear programming and Markov games. In: Moeschlin, O., Pallaschke, D. (eds.) Game Theory and Mathematical Economics, pp. 307–319. North-Holland, Amsterdam (1984)
9. Kallenberg, L.S.: Markov Decision Processes. University of Leiden, Leiden (2016)
10. Lozovanu, D.: Stationary Nash equilibria for average stochastic positional games. In: Petrosyan, L.A., et al. (eds.) Frontiers of Dynamic Games. Static and Dynamic Game Theory: Foundations and Applications, pp. 139–163. Birkhäuser (2018)
11. Lozovanu, D.: Pure and mixed stationary Nash equilibria for average stochastic positional games. In: Petrosyan, L.A., et al. (eds.) Frontiers of Dynamic Games. Static and Dynamic Game Theory: Foundations and Applications, pp. 131–156. Birkhäuser (2019)
12. Lozovanu, D., Pickl, S.: Optimization and Multiobjective Control of Time-Discrete Systems. Springer, Berlin (2009)
13. Nash, J.: Non-cooperative games. Ann. Math. 54, 286–295 (1951)
14. Parthasarathy, T., Raghavan, T.E.S.: An orderfield property for stochastic games when one player controls transition probabilities. J. Optim. Theory Appl. 33, 375–392 (1981)
15. Puterman, M.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, Hoboken (2005)
16. Raghavan, T.E.S.: Finite-step algorithms for single-controller and perfect information stochastic games. In: Neyman, A., Sorin, S. (eds.) Stochastic Games and Applications. NATO Science Series, vol. 570, pp. 227–251 (2003)
17. Raghavan, T.E.S., Syed, Z.: Computing stationary Nash equilibria of undiscounted single-controller stochastic games. Math. Oper. Res. 27(2), 384–400 (2002)
18. Rogers, P.: Nonzero-Sum Stochastic Games. PhD thesis, University of California, Berkeley, Report ORC 69-8 (1969)
19. Rosenberg, D., Solan, E., Vieille, N.: Stochastic games with a single controller and incomplete information. SIAM J. Control Optim. 43(1), 86–110 (2004)
20. Shapley, L.: Stochastic games. Proc. Natl. Acad. Sci. USA 39, 1095–1100 (1953)
21. Vrieze, O.J.: Linear programming and undiscounted stochastic games in which one player controls transitions. OR Spectrum 3, 29–35 (1981)
22. Vrieze, O.J.: Stochastic games, practical motivation and the orderfield property for special classes. In: Neyman, A., Sorin, S. (eds.) Stochastic Games and Applications. NATO Science Series, vol. 570, pp. 215–225 (2003)
Hierarchical Model of Corruption: Game-Theoretic Approach Ivan M. Orlov and Suriya Sh. Kumacheva
Abstract Corruption occurs in relations between people and companies, i.e. agents who, in order to benefit from it, have to make strategic decisions. This makes it possible to use the game-theoretic apparatus to analyze it, and there are many scientific works on the topic, yet they mostly model corruption as a game between two or at most three players. This research differs in its approach: it broadens the scope and analyzes corrupt agents acting not in isolation but as parts of a bigger hierarchical structure, in the hope of obtaining insights that may help to combat corruption in organizations. A subhierarchical two-stage game-theoretic model of embezzlement and bribery is created, a particular example with six officials on three levels is solved via computer simulation, settings for corruption minimization and mild corruption minimization are derived, and a cooperative extension is suggested.

Keywords Corruption · Embezzlement · Inspection · Bribery · Hierarchical game · Equilibria · Corruption minimizing strategies
1 Introduction

In [7] the Briber–Bribee game (based on Zimbabwean public sector corruption) is studied, with a mixed Nash equilibrium solution based on the values of costs and incomes. The way to affect these values is again top-down and varies from policy recommendations to educating the officials. Work [8] focuses on a Committee–Department embezzlement game (based on Chinese corruption) and comes to conclusions similar to Shenje's. In [11] corruption in multinational companies is studied, resulting in a tetrad of conditions needed for its success: existence of an opportunity for
I. M. Orlov () · S. Sh. Kumacheva St. Petersburg State University, Faculty of Applied Mathematics and Control Processes, St. Petersburg, Russia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 L. A. Petrosyan et al. (eds.), Frontiers of Dynamic Games, Trends in Mathematics, https://doi.org/10.1007/978-3-030-93616-7_14
corrupt action; small risk of negative repercussions; willingness to engage in corrupt activity; capability to act in a corrupt way.

In [9] the extensive-form game between Client, Official and Inspector is studied in great depth (analysis, two extensions, three player types, laboratory experiments), and an improvement of the previous models is suggested by making the probabilities of actions endogenous, with a mixed equilibrium being the solution and asymmetric penalties (with a focus on officials) being the anti-corruption measure. The structure of this game inspired the inspection stage of the present research.

The work [4] presents a multi-stage hierarchical game of corruption in the form of tax evasion and auditor bribing, which inspired the model of this study. The work considers a three-level structure: administration, inspector, taxpayers. Taxpayers declare their level of income and choose the size of the bribe, the administration chooses the probabilities of auditing and reauditing, and the inspector chooses to accept or reject the suggested bribe. The solution suggests that the administration should choose probabilities of auditing and reauditing that depend on the tax, penalty and fine rates, and that taxpayers should declare their true level of income. An extension for inspection mistakes is also considered.
Hierarchy is comprised of triplet “principal-supervisor-agent”. In the first system the supervised competition for resources is considered and allocations in situations of no bribes and Nash equilibrium in simultaneous bribing game of n agents are suggested with comparison between corrupt and non-corrupt cases considered for n = 2. The condition for bribing to be unprofitable for the supervisor is provided. In the second system the electricity provider (principal) sends the inspector (supervisor) to check whether the client company (agent) declares their consumption truthfully (which is akin to tax evasion problem). The condition for the agent to report the actual consumption and the ways for the principal to ensure this condition are given. In their latest work [3] they study a three-level extended active system of an incorruptable principal and one or several supervisors and several agents maximizing their payoffs in the conditions of a possible corruption in a controlled dynamic system via social and private interests coordination engine (SPICE) models. The supervisor allocates a resource between the agents who divide it between their private activities and the production of a common social good. If bribed by the agents, he can increase the allocated part. The principal punishes the supervisor in case of the found bribe. The game is solved by qualitatively representative scenarios (QRS) of simulation modeling, the suggested solution is Nash equilibrium in open-
Hierarchical Model of Corruption: Game-Theoretic Approach
253
loop strategies for agents and Stackelberg equilibrium in open-loop strategies for supervisor. The corruption minimization condition is to make the bribe-taking unprofitable to the supervisor via increase in the probability of being caught and the fine for the bribe. Without the corruption agent’s payoffs increase while supervisor’s decrease. The results are given in both general and numerical forms. In [10] corruption (taxpayers’ evasion and bribing the inspector) in transition economies is discussed with Russia being an example. Their model depicts a hierarchical game: homogenous population of taxpayers with income distributed according to some density function, each taxpayer declares a level of income that maximizes their utility and the authority chooses the audit probability that does the same for it. The non-corrupt models of progressive tax and progressive and linearly dependent on undeclared income fines are studied. The corrupt model includes homogeneous taxpayers with two possible levels of income (high and low), inspecting auditor which can be bribed and center which tries to maximize its payoff—sum of all taxes and fines minus costs of inspection (auditor checks taxpayer) and reinspection (center checks auditor on a declared low taxpayer). Mathematical solutions based on parameters (size of tax, fine, bribe, costs of inspection and reinspection) is suggested. Authors also describe possible applications of their results to the Russian economy, they give the optimal audit probability for the rates of 1997, the cut-off difference between a priori and declared profit, probabilistic cut-off for enterprises to be audited selection, warning on the irrelevance of the assumptions in case of organized corruption.
2 Model Description

Corruption is modeled as a hierarchical game consisting of two stages: embezzlement and inspection. The assumptions are that players are risk-neutral and utility-maximizing, and that only monetary payoffs are considered. The model was previously considered in [5, 6] (Figs. 1 and 2).

The hierarchy is a directed graph (Fig. 1) with the following meaning of the links:
1. Only ij: i is the superior of j.
2. Only ji: i is the subordinate of j.
3. Both ij and ji: i and j are colleagues (equals).
4. Neither ij nor ji: i and j are unrelated (they are on different levels with no superior-subordinate relationships).
In the first stage the state (the non-corrupt official $O_{m,0}$) allocates an amount of money $M_m$. This money goes down the hierarchy of officials, with each of them having a chance to embezzle some of it before passing it to the subordinates. The optimal stealing is

$$S^*_{n,i} = \frac{G_n - M_n}{N_n}, \qquad (1)$$
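With the numbers of the example considered later (Fig. 3, with two officials per level), formula (1) can be evaluated directly; a minimal sketch:

```python
# Sketch of formula (1): optimal embezzlement S*_{n,i} = (G_n - M_n) / N_n.
def optimal_stealing(G_n, M_n, N_n):
    """G_n enters level n, M_n must leave it, N_n officials share the rest."""
    return (G_n - M_n) / N_n

# Level 3 of the example: G3 = 3,000,000, M3 = 2,000,000, two officials.
s3 = optimal_stealing(3_000_000, 2_000_000, 2)   # 500,000, as S_{3,i} in Table 2
# Levels 2 and 1: G = 1,000,000, M = 750,000, two officials each.
s2 = optimal_stealing(1_000_000, 750_000, 2)     # 125,000
```

The results agree with the values $S_{3,i} = 500{,}000$ and $S_{2,i} = S_{1,i} = 125{,}000$ listed in Table 2.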
Fig. 1 Hierarchy of officials
where $M_n$ is the cut-off value, i.e. the minimal amount of money that needs to leave level $n$ in order to create at least a semblance of work; $G_n$ is the amount of money entering level $n$; and $N_n$ is the number of officials on level $n$.

Any $S_{n,i} > S^*_{n,i}$ is not optimal, since the official $O_{n,i}$ either breaks the cut-off condition or steals from a colleague. Any $S_{n,i} < S^*_{n,i}$ is not optimal, since it is possible to get more. It is also important to note that $S^*_{n,i}$ is optimal from the risk-neutral and utility-maximizing perspective only in case it is possible to bribe the inspector with an amount of money less than the amount stolen; otherwise, there is no profit in stealing.

In the second stage (Fig. 2) the inspector checks one official $i$ on level $n$ for corruption, or none. The inspector has perfect technology, so if there was an embezzlement, it will be revealed. An official cannot know for sure whether the inspection came to them directly or via exposure from one of their subordinates, and can either bribe ($B$), not bribe ($NB$) or expose their superior ($E$) (except for the officials on level $m-1$, who cannot expose). The payoffs at the ends, in the format $E_X: U_{j,k}; U_{1,i}; U_I$, are as follows (Table 1):

$E_1$: $W_{j,k} + S_{j,k}$; $W_{1,i} + S_{1,i}$; $W_I$
$E_2$: $W_{j,k} + S_{j,k}$; $W_{1,i} + \kappa_{1,i} S_{1,i} - F(W_{1,i}, S_{1,i})$; $W_I + R(S_{1,i}) - Ci_1$
$E_3$: $W_{j,k} + S_{j,k}$; $W_{1,i} + \kappa_{1,i} S_{1,i} - (F(W_{1,i}, S_{1,i}) + B_{1,i} + Fb(B_{1,i}))$; $W_I + R(S_{1,i}) - Ci_1$
$E_4$: $W_{j,k} + S_{j,k}$; $W_{1,i} + S_{1,i} - B_{1,i}$; $W_I + B_{1,i} - Ci_1 - Cu(S_{1,i})$
$E_5$: $W_{j,k} + \kappa_{j,k} S_{j,k} - F(W_{j,k}, S_{j,k})$; $W_{1,i} + \kappa_{1,i} S_{1,i} - \theta_{1,i} F(W_{1,i}, S_{1,i})$; $W_I - (Ci_1 + Ci_j) + R(S_{1,i}) + R(S_{j,k})$
Fig. 2 Graph of the inspection stage (decision nodes of the officials $O_{1,i}$, $O_{j,k}$ with actions $B$, $NB$, $E$ and of the inspector $I$ with actions Acc, Rej; chance moves with the effective inspection probabilities $\alpha^{eff}$; terminal nodes $E_1$–$E_7$)
Table 1 Descriptions of the ends

End  Description
1    No inspection.
2    Subordinate is inspected, no bribe.
3    Subordinate is inspected, bribe is rejected.
4    Subordinate is inspected, bribe is accepted.
5    Boss is exposed by the subordinate, no bribe.
6    Boss is exposed by the subordinate, bribe is rejected.
7    Boss is exposed by the subordinate, bribe is accepted.
$E_6$: $W_{j,k} + \kappa_{j,k} S_{j,k} - (F(W_{j,k}, S_{j,k}) + B_{j,k} + Fb(B_{j,k}));\ W_{1,i} + \kappa_{1,i} S_{1,i} - \theta_{1,i} F(W_{1,i}, S_{1,i});\ W_I - (Ci_1 + Ci_j) + R(S_{1,i}) + R(S_{j,k})$

$E_7$: $W_{j,k} + S_{j,k} - B_{j,k};\ W_{1,i} + S_{1,i};\ W_I + B_{j,k} - (Ci_1 + Ci_j + Cu(S_{1,i}) + Cu(S_{j,k}))$

The further outcomes are similar to $E_5$, $E_6$, $E_7$. The probability of inspection is proportional to the total amount of stealing up to this level. Level $n$ is inspected only if level $n+1$ has not been inspected. From the inspector's point of view, all officials on one level are equivalent.

$$S_n = \sum_{i=0}^{N_n - 1} S_{n,i} \tag{2}$$

$$\alpha_n = \frac{\sum_{j=n}^{m-1} S_j}{M_m}, \qquad \alpha_n^{\mathrm{eff}} = \alpha_n \prod_{k=n+1}^{m} (1 - \alpha_k) \tag{3}$$

$$\alpha_m^{\mathrm{eff}} = \alpha_m = 0 \tag{4}$$

$$\alpha_{m-1}^{\mathrm{eff}} = (1 - \alpha_m)\,\alpha_{m-1} = \alpha_{m-1} \tag{5}$$

$$\alpha_{n,i} = \frac{\alpha_n}{N_n}, \qquad \alpha_{n,i}^{\mathrm{eff}} = \frac{\alpha_n^{\mathrm{eff}}}{N_n} \tag{6}$$

$$\alpha_{n,i}^{+} = \alpha_{n,i}^{\mathrm{eff}} + \sum_{(j,k)\in SE(n,i)} \alpha_{j,k}^{\mathrm{eff}}, \tag{7}$$

where $(m, l) \in SE(n, i) : (m, l) \in \mathrm{subs}(n, i)\ \&\ A_{m,l} = E$ (Tables 2 and 3).
Table 2 Values of officials' characteristics

| $O_{n,i}$ | $W_{n,i}$ | $S_{n,i}$ | $\kappa_{n,i}$ | $\theta_{n,i}$ | $\alpha_{n,i}$ | $B_{n,i}$ | $F(S_{n,i})$ | $Fb(B_{n,i})$ |
| (3, i) | 90,000 | 500,000 | 0.600 | − | 0.167 | 150,000 | 1,620,000 | 5,625,000 |
| (2, i) | 40,000 | 125,000 | 0.300 | 0.010 | 0.208 | 62,500 | 720,000 | 2,812,500 |
| (1, i) | 40,000 | 125,000 | 0.300 | 0.010 | 0.250 | 62,500 | 720,000 | 2,812,500 |
Table 3 Values of inspector's characteristics

| $W_I$ | $Ci_{\{1,2\}}$ | $Ci_3$ | $R(S_{\{1,2\},i})$ | $R(S_{3,i})$ | $Cu(S_{\{1,2\},i})$ | $Cu(S_{3,i})$ |
| 70,000 | 10,000 | 25,000 | 40,000 | 75,000 | 5000 | 12,500 |
Fig. 3 Hierarchy of officials in the example (officials (3, 0), (3, 1), (2, 0), (2, 1), (1, 0), (1, 1) above Contractors 0–3; G1 = 1,000,000, M1 = 750,000; G2 = 1,000,000, M2 = 750,000; G3 = 3,000,000, M3 = 2,000,000; G4 = 3,000,000, M4 = 3,000,000)
3 Example

The game cannot be solved via backward induction, since an official does not know for sure the characteristics and utilities of their boss and of the inspector. In order to solve it, an example was constructed (Fig. 3) and simulation code in Python¹ was written and executed. The analysis of the results yields the stable outcome via the following process (the assumption is that all officials are self-interested, utility-maximizing and incapable of communicating with each other):
1. Find the action yielding maximal utility for bosses.
2. Find the best response of subordinates to step 1.
3. Find the best response of bosses to step 2.
The listing can be found at https://github.com/ib-rain/corruption.
Table 4 Results of simulation for the initial settings

|        | OptOpt_EB | OptOpt_BB | NoneOpt_NBB | OptNone_BNB | NoneNone_NBNB |
| (3, 0) | 523,136 | 564,934 | 565,055 | 90,000 | 90,000 |
| (3, 1) | 535,972 | 565,004 | 564,835 | 90,000 | 90,000 |
| (2, 0) | 165,000 | 156,277 | 40,000 | 162,407 | 40,000 |
| (2, 1) | 165,000 | 156,294 | 40,000 | 162,405 | 40,000 |
| (1, 0) | 165,000 | 158,935 | 40,000 | 160,187 | 40,000 |
| (1, 1) | 165,000 | 158,975 | 40,000 | 160,240 | 40,000 |
| I      | 156,602 | 131,233 | 105,137 | 81,219 | 70,000 |
| State  | 1,090,000 | 1,090,000 | 1,590,000 | 2,090,000 | 2,590,000 |
| LoC    | 0.500 | 0.500 | 0.333 | 0.167 | 0.000 |
4. Repeat until there are no deviations.

(OptOpt_BB → OptOpt_EB → OptOpt_EB in the case of Table 4.)

Or:
1. Find the action yielding maximal utility for subordinates.
2. Find the best response of bosses to step 1.
3. Find the best response of subordinates to step 2.
4. Repeat until there are no deviations.
(OptOpt_EB → OptOpt_EB in the case of Table 4.)

The stable outcome is the one in which all officials steal optimally, subordinates expose, bosses bribe, and the inspector accepts the bribe.

Proposition 1 The obtained equilibrium cannot be called Nash, since, due to the lack of information about the inspector's payoffs, an official cannot choose the optimal bribe. We suggest the notion of a Nash-like equilibrium:

$$(S^*_{n,i}, B^*_{n,i}, A^*_{n,i}) = \operatorname{argmax}\{\,U_{n,i}(S_{n,i}, B_{n,i}, A_{n,i}) \mid B_{n,i} \ge B^{\upsilon}_{n,i}\,\}.$$
In that equilibrium, officials maximize their utility within the confines of not knowing three important things: the utility function of the inspector, the action and the bribe size of their boss, and the optimal bribe size. They have only a hypothesis $B^{\upsilon}_{n,i}$ of the minimal sufficient bribe; they are not able to suggest a lesser bribe (because they believe it would surely be rejected).
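The iterated best-response search for the stable outcome described above can be sketched as follows. This is a minimal illustration; `best_boss` and `best_sub` are hypothetical stand-ins for the utility-maximization steps, not the authors' actual simulation code:

```python
def iterate_best_responses(best_boss, best_sub, initial, max_rounds=100):
    """Alternate best responses until no player deviates (a fixed point)."""
    boss_action, sub_action = initial
    for _ in range(max_rounds):
        new_boss = best_boss(sub_action)
        new_sub = best_sub(new_boss)
        if (new_boss, new_sub) == (boss_action, sub_action):
            return boss_action, sub_action  # stable outcome reached
        boss_action, sub_action = new_boss, new_sub
    raise RuntimeError("no stable outcome within max_rounds")
```

With a toy pair of responses in which bosses always bribe and subordinates expose a bribing boss, the iteration mirrors the OptOpt_BB → OptOpt_EB → OptOpt_EB pattern of Table 4.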
4 Anti-Corruption Settings

In order to minimize corruption, the bribe must be rejected. That will cause the official to lose the part of the steal that was not hidden and to pay fines, which are supposed to discourage them from stealing in the first place. The ultimate decision (to accept or reject the bribe) is
made by the inspector. Since they maximize their utility, it depends on which action yields the most profit, i.e. the sign of the inequality (8).
$$U_I(Acc)\;?\;U_I(Rej) \;\rightarrow\; B_{n,i} - \sum_{(l,j)\in T} Cu(S_{l,j}) \;?\; \sum_{(l,j)\in T} R_I(S_{l,j}) \tag{8}$$

The corruption is minimized when

$$B_{n,i} - \sum_{(l,j)\in T} Cu(S_{l,j}) \le \sum_{(l,j)\in T} R_I(S_{l,j}) \tag{9}$$

$$\sum_{(l,j)\in T} \big[ R(S_{l,j}) + Cu(S_{l,j}) \big] \ge B_{n,i} \tag{10}$$

At the same time, the size of the bribe is chosen by the official: in order for them not to be corrupt, they must get no more from stealing and bribing than from not doing so:

$$U_{n,i}(S^*_{n,i}, B^*_{n,i}, B) - U_{n,i}(0, 0, NB) = S^*_{n,i} - \alpha^{+}_{n,i} B^*_{n,i} \le 0 \tag{11}$$
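The inspector's side of inequality (8) reduces to a simple comparison; a hedged sketch, with the reward and cover-up costs R and Cu passed in as parameters (the function names are ours, not the authors'):

```python
def inspector_gain_accept(bribe, chain_steals, Cu):
    # accepting: keep the bribe, but pay the cover-up cost for every
    # official in the inspected chain T
    return bribe - sum(Cu(s) for s in chain_steals)

def inspector_gain_reject(chain_steals, R):
    # rejecting: collect the reward for every revealed steal in T
    return sum(R(s) for s in chain_steals)

def inspector_accepts(bribe, chain_steals, R, Cu):
    """Sign of inequality (8): accept iff strictly more profitable."""
    return inspector_gain_accept(bribe, chain_steals, Cu) > \
           inspector_gain_reject(chain_steals, R)
```

With the default level-1/2 values of Table 3 (R = 40,000, Cu = 5,000), a bribe of 45,001 is accepted while 45,000 is not, consistent with the default value of $B_{opt\text{-}s}$ = 45,001 in Table 5.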
By connecting (10) and (11), we get the anti-corruption setting condition

$$\sum_{(l,j)\in T} \big[ R(S_{l,j}) + Cu(S_{l,j}) \big] \ge \frac{S^*_{n,i}}{\alpha^{+}_{n,i}} \quad \forall T, \tag{12}$$

that must be satisfied in the best case for $T = \{O_{n,i}\}$ and in the worst case for $T = \{O_{n,i}, O_{j,k}, O_{l,p}, \ldots\}$, $O_{j,k} \in SE(n,i)$, $O_{l,p} \in SE(j,k)$.

In order to be accepted, the bribe for the inspected chain $T$ must satisfy

$$B_{opt\,T} > \sum_{(l,j)\in T} \big[ R(S_{l,j}) + Cu(S_{l,j}) \big] \tag{13}$$

$$B_{opt\,T}(\zeta) = \sum_{(l,j)\in T} \big[ R(S_{l,j}) + Cu(S_{l,j}) \big] + \zeta \tag{14}$$

For the corruption minimization, it must hold that

$$B_{opt\,T}(\zeta) \ge \frac{S^*_{n,i}}{\alpha^{+}_{n,i}} \tag{15}$$

All conclusions valid for $\zeta = x > 0$ are valid for any $\zeta > x$.
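Conditions (12) and (14) can be checked numerically; a small sketch with illustrative numbers that are ours, not taken from the tables:

```python
def deters(chain_rewards_and_costs, S_star, alpha_plus):
    """Condition (12): the sum of R + Cu over the chain T must be at least
    S* / alpha+, so that optimal stealing plus bribing is unprofitable."""
    return sum(chain_rewards_and_costs) >= S_star / alpha_plus

def b_opt(chain_rewards_and_costs, zeta):
    """Eq. (14): the minimal accepted bribe exceeds the chain total by zeta."""
    return sum(chain_rewards_and_costs) + zeta
```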
Let us provide an example. There are three possible types of chains in the studied hierarchy:

$$T_s = \{O_{2,i}\};\ \{O_{1,i}\}, \qquad T_b = \{O_{3,i}\}, \qquad T_{ch} = \{O_{2,i}, O_{3,0}\};\ \{O_{1,i}, O_{3,1}\}, \qquad i = 0, 1.$$

For simplicity, since levels 1 and 2 are alike (and officials within them are identical), suppose

$$S^*_{1,i} = S^*_{2,i} = S_s, \qquad B^*_{1,i} = B^*_{2,i} = B_s.$$

Since it has already been established that it is optimal for the subordinates to expose their bosses, fighting corruption in chains $T_s$ is senseless: no matter how big the needed bribe is, they will not pay it. It is more useful to fight corruption first in chain $T_{ch}$ (make being exposed unprofitable for bosses), then in $T_b$ (make being directly inspected unprofitable for bosses), and then in $T_s$ under the circumstances of $S_3 = 0$, following the logic of a bigger bribe for bigger stealing. It is possible to formulate three settings, each stricter than the previous one (Table 5). All obtained settings were simulated 500,000 times with utilities being averaged. The code execution results are presented in Table 6 and Fig. 4 via charts of the "corrupt utility" calculated as

$$CU_X = U_X - W_X \tag{16}$$
Due to the assumptions that officials are not able to communicate and do not know the characteristics of each other and of the inspector, the averages from the stable solutions are chosen to represent the settings. The setting changes reduce corruption, and it is possible to eliminate corruption in the model entirely, but the means required are extreme.
5 Mild Anti-Corruption Settings

The values of the settings in Table 5 might be considered extreme or impossible to implement in real life, so let us limit the optimal bribe size:

$$B_{opt\,T} \le S^*_{n,i}$$

With that limitation, we have four possible settings (including the default one), which we will name zettings to avoid confusion (Table 7).
With that limitation, we have four possible settings (including default), which we will name zettings to avoid confusion (Table 7): The settings changes reduce corruption, decrease revenue for On,i and increase for I (Table 8 and Fig. 5), which might also be beneficial since focusing the corrupt money in one place simplifies control. Mild Corruption Minimization is less extreme, effective, but less so than Corruption Minimization.
Table 5 Corruption minimization settings

| Setting | $R(S_{\{1,2\},i})$ | $Cu(S_{\{1,2\},i})$ | $R(S_{3,i})$ | $Cu(S_{3,i})$ | $B_{opt-ch}$ | $B_{opt-b}$ | $B_{opt-s}$ | T | $B_{opt-T}$ |
| Default | 40,000.0 | 5000.0 | 75,000.0 | 11,250.0 | 131,251.0 | 86,251.0 | 45,001.0 | – | − |
| 1 | 60,000.0 | 20,000.0 | 875,000.0 | 429,615.4 | 1,384,616.4 | 1,304,615.4 | 80,000.0 | ch | 1,384,615.4 |
| 2 | 60,000.0 | 20,000.0 | 2,000,000.0 | 1,000,000.0 | 3,080,000.0 | 3,000,000.0 | 80,000.0 | b | 3,000,000.0 |
| 3 | 2,000,000.0 | 1,000,000.0 | 3,250,000.0 | 2,500,000.0 | 8,750,000.0 | 5,750,000.0 | 3,000,000.0 | s | 3,000,000.0 |
Table 6 Change in corrupt utility after corruption minimization

| AVG | def | s1 | s2 | s3 | def → s1 | def → s2 | def → s3 |
| (3, 0) | 143,336.69 | 0.00 | 0.00 | 0.00 | −100.00% | −100.00% | −100.00% |
| (3, 1) | 147,691.36 | 0.00 | 0.00 | 0.00 | −100.00% | −100.00% | −100.00% |
| (2, 0) | 109,345.65 | 80,560.62 | 80,554.08 | 0.00 | −26.32% | −26.33% | −100.00% |
| (2, 1) | 109,236.93 | 80,548.62 | 80,554.81 | 0.00 | −26.26% | −26.26% | −100.00% |
| (1, 0) | 96,252.46 | 78,231.37 | 78,253.14 | 0.00 | −18.72% | −18.70% | −100.00% |
| (1, 1) | 96,099.14 | 78,242.55 | 78,230.66 | 0.00 | −18.58% | −18.59% | −100.00% |
| I | 36,663.69 | 11,026.42 | 11,018.90 | 0.00 | −69.93% | −69.95% | −100.00% |
Fig. 4 Average "corrupt utility" of players after corruption minimization
6 Cooperative Element

Bosses need some way of protecting themselves from subordinates. One way is to form a coalition of two or more officials in which: members cannot expose each other; members' steals are divided among them according to the stated allocation rule; the bribe (in the case when one of the members is inspected) is compiled collectively.

Joining a coalition brings advantages and disadvantages. The advantages are: insurance against being exposed; better coordination in terms of stealing amounts (irrelevant in the model, but possibly important in real life); more certainty in terms of the sufficient bribe (the grand coalition knows exactly how the inspection happened); a bigger bribe (and thus a lower chance of it being rejected) with fewer problems conjuring one up for each of the members, at least potentially. The disadvantages are: higher chances of being inspected; higher fines for organized group felonies; an allocation that might not be favourable for some members.
Table 7 Mild corruption minimization zettings

| Zetting | $R(S_{\{1,2\},i})$ | $Cu(S_{\{1,2\},i})$ | $R(S_{3,i})$ | $Cu(S_{3,i})$ | $B_{opt-ch}$ | $B_{opt-b}$ | $B_{opt-s}$ | T | $B_{opt-T}$ |
| Default | 40,000 | 5000 | 75,000 | 11,250 | 131,251 | 86,251 | 45,001 | – | – |
| 1 | 70,000 | 35,000 | 270,000 | 124,999 | 500,000 | 395,000 | 105,001 | ch | 500,000 |
| 2 | 0 | 0 | 300,000 | 199,999 | 500,000 | 500,000 | 1 | b | 500,000 |
| 3 | 85,000 | 39,999 | 250,000 | 125,000 | 500,000 | 375,001 | 125,000 | s | 125,000 |
Table 8 Change in utilities after mild corruption minimization

| AVG | def | z1 | z3 | def → z1 | def → z3 | z1 → z3 |
| (3, 0) | 143,336.69 | 69,432.38 | 69,307.13 | −51.56% | −51.65% | −0.18% |
| (3, 1) | 147,691.36 | 79,857.00 | 79,864.25 | −45.93% | −45.92% | 0.01% |
| (2, 0) | 109,345.65 | 76,485.57 | 76,168.38 | −30.05% | −30.34% | −0.41% |
| (2, 1) | 109,236.93 | 76,497.06 | 76,163.28 | −29.97% | −30.28% | −0.44% |
| (1, 0) | 96,252.46 | 75,106.62 | 74,542.06 | −21.97% | −22.56% | −0.75% |
| (1, 1) | 96,099.14 | 75,127.30 | 74,531.44 | −21.82% | −22.44% | −0.79% |
| I | 36,663.69 | 70,989.22 | 71,822.14 | 93.62% | 95.89% | 1.17% |
Fig. 5 Average “corrupt utility” of players after mild corruption minimization
Not every group of officials can form a coalition. For example, take the pair {(1, 0), (2, 0)}. They do not "know" each other: there are no ties connecting them directly, so it must be hard for them to communicate, and the former cannot expose the latter because they are not in a "superior-subordinate" relationship; forming this coalition is senseless and should not be possible. We suggest the rule "any official with a direct or indirect connection (a path in the hierarchy graph) to another can be in a coalition with them". In other words, no disconnected components are allowed in the coalition. For example, the coalition {(2, 0), (3, 0), (3, 1)} is possible, but {(2, 0), (3, 0), (1, 0)} is not. It is possible to build twenty-four different coalitions according to this rule.

Coalitions are characterized by:

• the set of coalition members, its subsets and their sizes:

$$C = \bigcup_{(n,i)\in C} \{(n,i)\} = \bigcup_{n\in C} C_n, \quad N_C = |C|, \quad C_j = \bigcup_{(j,i)\in C} \{(j,i)\}, \quad N_{C,j} = \sum_{(j,i)\in C} 1 = |C_j| \le N_j,$$
• the partial utility of a member according to the rule $R$ (the part the official gets from stealing and potentially coalitionally bribing):

$$RU^{C}_{n,i} = U_{n,i}(S_{n,i}, 0, BC) - W_{n,i},$$

• coalitional actions: members of a coalition never expose, always bribe jointly and cannot refrain from stealing (if they do not want to steal, it is better for them not to join the coalition in the first place):

$$S_{n,i} > 0 \;\&\; A_{n,i} = BC \quad \forall (n,i) \in C,$$

• the coalitional stealing

$$S_C = \sum_{(n,i)\in C} S_{n,i},$$

• the coalitional bribe $B_C$,

• the chance of inspection

$$\alpha_C = \sum_{(n,i)\in C} \alpha^{+}_{n,i}.$$

This chance can also be portrayed as the vector of probabilities $\alpha_C = (\alpha_{ch}; \alpha_b; \alpha_s)$, since any official but the ultimate subordinate is unsure about the source of the inspection (and there is more than one official in the coalition). The same applies to the coalitional bribe: $B_C = (B_{ch}; B_b; B_s)^T$. From that we get $\alpha_C B_C = \alpha_{ch} B_{ch} + \alpha_b B_b + \alpha_s B_s$. In the non-cooperative case, for a boss every term goes into $\alpha_{ch}$, since they cannot know the source of the inspection. For a subordinate, every term goes into $\alpha_s$, since there is no other way for them to be inspected but directly. If the inspector accepts the bribe, the coalition loses only the bribe; if he does not, the coalition loses the bribe and every coalition member suffers the fine for organized stealing:

$$U^{C}_{n,i}(A_I) = RU^{C}_{n,i} - \begin{cases} 0 & \text{if } A_I = Acc, \\ Fcs(S_C) + Fcb(B_C) & \text{if } A_I = Rej, \end{cases} \tag{17}$$

where $Fcs(S_C)$ and $Fcb(B_C)$ are the fines for coalitional stealing and bribing.
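The expected coalitional bribe cost $\alpha_C B_C$ and the member utility (17) can be written out directly; a minimal sketch, with parameter names of our choosing:

```python
def expected_bribe_cost(alpha_C, B_C):
    """alpha_C * B_C = a_ch*B_ch + a_b*B_b + a_s*B_s: a dot product of the
    inspection-source probabilities and the bribes prepared per source."""
    return sum(a * b for a, b in zip(alpha_C, B_C))

def member_utility(RU, inspector_accepted, Fcs, Fcb):
    """Eq. (17): on rejection every member also pays the fines for
    coalitional stealing (Fcs) and coalitional bribing (Fcb)."""
    return RU if inspector_accepted else RU - (Fcs + Fcb)
```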
7 Conclusion

The study of the literature shows that works which do not take the hierarchical relations of the players into account and analyze 'simple' games between two or three agents are more common; a similar claim is made in [2]. The difference between this study and the hierarchical studies [2, 4] lies in the construction of the hierarchy: in those works the hierarchies are of the "administration-inspector-client" type with no differentiation in the last class, while this work focuses on the "superior-subordinate" type (which provides the feature of a subordinate having the ability to expose the bigger stealer, for example, their superior), with the inspector being outside the hierarchy. Another difference is in the future development of the cooperative element. A resemblance can be found in the absence of corruption at the highest level of the hierarchy and in the hierarchy itself.

The model of hierarchical corruption is built. It consists of two stages: at the first stage each official decides how much money he or she embezzles; at the second stage the inspector investigates the stealing and the inspected official chooses the size of the bribe and the action (bribe, not bribe or expose). The particular case with three levels and six officials is built and solved via computer simulation. The result is an equilibrium in which each inspected official from level 1 or 2 exposes his boss, who then gives a sufficient bribe to the inspector, and each inspected official from level 3 gives a sufficient bribe. This equilibrium situation is pessimistic, because corruption is not punished but causes even greater corruption. Corruption minimization settings that reduce corruption and are capable of eliminating it are suggested, alongside mild corruption minimization settings which are less extreme but still effective, even though less so. The independently obtained results and the ways of obtaining them are surprisingly similar to [3].

Cooperation between officials is introduced into the model as a means to protect the bosses from the subordinates. Further work will be the simulation of the cooperative element and the identification of ways to combat corruption both with and without coalitions.
References

1. Gorbaneva, O.I., Ougolnitsky, G.A.: Price of anarchy and control mechanisms in models of concordance of public and private interests. Math. Game Theory Appl. 7, 50–73. Inst. of App. Math. Res. of the KarRC RAS, Petrozavodsk (2015) [in Russian]
2. Gorbaneva, O.I., Ougolnitsky, G.A., Usov, A.B.: Models of corruption in hierarchical control systems. Control Sci. 1, 2–10. Russian Academy of Sciences, Moscow (2015) [in Russian]
3. Gorbaneva, O.I., Usov, A.B., Ougolnitsky, G.A.: Mechanisms of struggle with corruption in dynamic social and private interests coordination engine models. In: Petrosyan, L.A., Zenkevich, N.A. (eds.) Contributions to Game Theory and Management, vol. 12, pp. 140–150. St. Petersburg University, St. Petersburg (2019)
4. Kumacheva, S.Sh.: The Strategy of Tax Control in Conditions of Possible Mistakes and Corruption of Inspectors. In: Petrosyan, L.A., Zenkevich, N.A. (eds.) Contributions to Game Theory and Management, vol. 6, pp. 264–273. St. Petersburg University, St. Petersburg (2013)
5. Orlov, I.M.: Example of Solving a Corruption Game with Hierarchical Structure. In: Control Processes and Stability, vol. 7, pp. 402–407. Publishing House Fedorova G.V., St. Petersburg (2020)
6. Orlov, I.M., Kumacheva, S.Sh.: Hierarchical Model of Corruption: Game-theoretic Approach. In: Mathematical Control Theory and Its Applications Proceedings, pp. 269–271. CSRI ELEKTROPRIBOR, St. Petersburg (2020) [in Russian]
7. Shenje, T.: Investigating the mechanism of corruption and bribery behavior: a game-theoretical methodology. Dyn. Res. J. J. Econ. Finance 1, 1–6 (2016)
8. Song, Y., Zhu, M., Wang, H.: Game-theoretic approach for anti-corruption policy between investigating committee and inspected departments in China. In: International Conference on Applied Mathematics, Simulation and Modelling, pp. 452–455. Atlantis Press, Beijing (2016)
9. Spengler, D.: Detection and Deterrence in the Economics of Corruption: a Game Theoretic Analysis and some Experimental Evidence. University of York, York (2014)
10. Vasin, A., Panova, E.: Tax Collection and Corruption in Fiscal Bodies. Economics Education and Research Consortium Working Paper Series, No. 99/10 (2000)
11. Zyglidopoulos, S., Dieleman, M., Hirsch, P.: Playing the Game: Unpacking the Rationale for Organizational Corruption in MNCs. J. Manage. Inq. 29, 338–349 (2019)
Differential Network Games with Infinite Duration Leon Petrosyan, David Yeung, and Yaroslavna Pankratova
Abstract In the paper, infinite-horizon differential games on networks are considered. The cooperative version of the game is proposed, and a special type of characteristic function is introduced. It is proved that the constructed cooperative game is convex. Using the properties of the payoff functions and the constructed characteristic function, the Shapley value is computed. It is also proved that in this special class of differential games the Shapley value is time-consistent. In the non-cooperative case, the Nash equilibrium is considered as the solution concept. Moreover, a special subclass of Nash equilibria, based on threat and punishment strategies, is derived. Additionally, we compute the Price of Stability (PoS).

Keywords Differential Network Games · Nash equilibrium · Price of Stability · The Shapley Value
1 Introduction

This paper is a generalization of the paper [10], where a differential non-zero-sum game with prescribed duration on a network was considered (see also [7, 13]). Here we consider differential games [5] with an infinite horizon. Coordinating players in a network to maximize their joint gain and distributing the cooperative gains in a dynamically stable solution is a topic of ongoing research (see [3, 6, 8, 15]). The Shapley value [12] is credited to be one of the best solutions in attributing a fair gain to each player in a complex situation like a network. As we mentioned earlier [11], the determination of the worth of the subsets of players (the characteristic function) in the Shapley value is not indisputably unique. In addition, in a differential game, the
L. Petrosyan · Y. Pankratova () St. Petersburg State University, St. Petersburg, Russia e-mail: [email protected]; [email protected] D. Yeung Hong Kong Shue Yan University, Hong Kong, China © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 L. A. Petrosyan et al. (eds.), Frontiers of Dynamic Games, Trends in Mathematics, https://doi.org/10.1007/978-3-030-93616-7_15
worth of the coalitions of players changes as the game evolves. In this paper, as in [9–11], in computing the values of the characteristic function for coalitions, we maintain the cooperative strategies for all players and evaluate the worth of the coalitions along the cooperative trajectory. Thus, the worth of a coalition S is its cooperative payoff with the exclusion of the gains obtained through network connections from players outside the coalition. It is worth noting that this new type of characteristic function is time-consistent. As mentioned in [9–11], the time consistency property of the characteristic function is not shared by the existing characteristic functions in differential games (see [4, 10, 14]). This approach enables us to measure the worth of coalitions under the process of cooperation instead of under min-max confrontation [2] or a Nash non-cooperative stance. Also, the newly introduced characteristic function is convex, which implies that the Shapley value belongs to the core and thus that the core is non-empty. The example demonstrates how the new approach simplifies the computation of the Shapley value. It is proved that the cooperative outcome can be strategically supported, and a Nash equilibrium is constructed with the resulting payoffs of the players coinciding with their payoffs under cooperation.
2 Differential Network Games

Consider a class of $n$-person differential games on networks. The players are connected in a network system. We use $N = \{1, 2, \ldots, n\}$ to denote the set of players in the network. The nodes of the network represent the players from the set $N$. We also denote the set of nodes by $N$, and denote the set of all arcs in network $N$ by $L$. The arcs in $L$ are the arcs $(i, j) \in L$ for players $i, j \in N$. For notational convenience, we denote the set of players connected to player $i$ as $K(i) = \{j : \text{arc}(i, j) \in L\}$, for $i \in N$.

Let $x^i(\tau) \in R^m$ be the state variable of player $i \in N$ at time $\tau$, and $u^i(\tau) \in U^i \subset R^k$ the control variable of player $i \in N$. Every player $i \in N$ can cut the connection with any other player from the set $K(i)$ at any instant of time. The state dynamics of the game is

$$\dot{x}^i(\tau) = f^i(x^i(\tau), u^i(\tau)), \quad x^i(t_0) = x^i_0, \quad \text{for } \tau \in [t_0, \infty) \text{ and } i \in N. \tag{1}$$

The function $f^i(x^i, u^i)$ is continuously differentiable in $x^i$ and $u^i$. For notational convenience, we use $x(t)$ to denote the vector $(x^1(t), x^2(t), \cdots, x^n(t))$. As usual in classical differential game theory, we suppose that at each time instant $t \in [t_0, \infty)$ players $i \in N$ have information about this time instant and the state variable $x(t)$, and based on this information choose their controls and the action to cut or to keep the connection with players from $K(i)$.
Let $x_0 = (x^1_0, \ldots, x^n_0)$ and denote this game by $\Gamma(x_0)$. The payoff function of player $i$ depends upon his state variable, his own control variable and the state variables of the players from the set $K(i)$. The payoff of player $i$ is given as

$$K_i(x_0, u^1, \ldots, u^n) = \sum_{j \in K(i)} \int_{t_0}^{\infty} e^{-\rho(\tau - t_0)} h^{j}_{i}(x^i(\tau), x^j(\tau))\,d\tau, \quad i \in N; \tag{2}$$

if $K(i) = \emptyset$ (if player $i$ has no connections with other players), then $K_i(x_0, u^1, \ldots, u^n) = 0$.

The term $h^{j}_{i}(x^i(\tau), x^j(\tau))$ is the instantaneous gain that player $i$ can obtain through the network link with player $j \in K(i)$. The functions $h^{j}_{i}(x^i(\tau), x^j(\tau))$, for $j \in K(i)$, are positive. From (2) we can see that the payoff of player $i$ is computed as a sum of the payoffs he gets interacting with players $j \in N\setminus\{i\}$.
2.1 Cooperation and Characteristic Function

Consider the cooperative version of the game. The value of the characteristic function for the grand coalition $N$ is defined as the maximum value of the joint payoff

$$\sum_{i\in N}\sum_{j\in K(i)} \int_{t_0}^{\infty} e^{-\rho(\tau-t_0)} h^{j}_{i}(x^i(\tau), x^j(\tau))\,d\tau \tag{3}$$

subject to dynamics (1).

Denote by $\bar{x}(t) = (\bar{x}^1(t), \bar{x}^2(t), \cdots, \bar{x}^n(t))$ and by $\bar{u}(t) = (\bar{u}^1(t), \bar{u}^2(t), \cdots, \bar{u}^n(t))$ the optimal cooperative trajectory and the optimal cooperative control in the problem of maximizing (3) subject to (1). The maximized joint cooperative payoff $V(x_0, t_0, N)$ involving all players can then be expressed as

$$\sum_{i\in N}\sum_{j\in K(i)} \int_{t_0}^{\infty} e^{-\rho(\tau-t_0)} h^{j}_{i}(\bar{x}^i(\tau), \bar{x}^j(\tau))\,d\tau = \max_{u^1, u^2, \cdots, u^n} \sum_{i\in N}\sum_{j\in K(i)} \int_{t_0}^{\infty} e^{-\rho(\tau-t_0)} h^{j}_{i}(x^i(\tau), x^j(\tau))\,d\tau = V(x_0, t_0, N)$$

subject to dynamics (1).

Next, we consider distributing the cooperative payoff to the participating players under an agreeable scheme. Given that the contributions of an individual player to the joint payoff through linked players can be diverse, the Shapley value [12] provides one of the best solutions in attributing a fair gain to each player in a
complex network. One of the contentious issues in using the Shapley value is the determination of the worth of subsets of players (the characteristic function). In this section, we present a new formulation of the worth of coalition $S \subset N$. In computing the values of the characteristic function for coalitions, we evaluate the contributions of the players in the process of cooperation and maintain the cooperative strategies for all players along the cooperative trajectory. In particular, we evaluate the worth of the coalitions along the cooperative trajectory as

$$V(x_0, t_0, S) = \sum_{i\in S}\sum_{j\in K(i)\cap S} \int_{t_0}^{\infty} e^{-\rho(\tau-t_0)} h^{j}_{i}(\bar{x}^i(\tau), \bar{x}^j(\tau))\,d\tau. \tag{4}$$

The idea of this approach was first proposed in [1]. Note that the worth of coalition $S$ is measured by the sum of the payoffs of the players in the coalition in the cooperation process, with the exclusion of gains from players outside coalition $S$. Thus, the characteristic function reflecting the worth of coalition $S$ in (4) is formulated along the cooperative trajectory $\bar{x}(t)$. Similarly, the characteristic function at time $t \in [t_0, \infty)$ can be evaluated as

$$V(\bar{x}(t), t, S) = \sum_{i\in S}\sum_{j\in K(i)\cap S} \int_{t}^{\infty} e^{-\rho(\tau-t)} h^{j}_{i}(\bar{x}^i(\tau), \bar{x}^j(\tau))\,d\tau. \tag{5}$$
An important property of the above characteristic function as a measure of the worth of a coalition in the Shapley value is given below.

Proposition 1 The characteristic function defined by (4) and (5) is convex.

The proof of Proposition 1 is similar to that of the corresponding proposition from [11], where differential games with a finite time horizon were considered.

We can see that

$$V(x_0, t_0, N) = \sum_{i\in N}\sum_{j\in K(i)} \int_{t_0}^{t} e^{-\rho(\tau-t_0)} h^{j}_{i}(\bar{x}^i(\tau), \bar{x}^j(\tau))\,d\tau + e^{-\rho(t-t_0)} \sum_{i\in N}\sum_{j\in K(i)} \int_{t}^{\infty} e^{-\rho(\tau-t)} h^{j}_{i}(\bar{x}^i(\tau), \bar{x}^j(\tau))\,d\tau = \sum_{i\in N}\sum_{j\in K(i)} \int_{t_0}^{t} e^{-\rho(\tau-t_0)} h^{j}_{i}(\bar{x}^i(\tau), \bar{x}^j(\tau))\,d\tau + V(\bar{x}(t), t, N)\,e^{-\rho(t-t_0)} \tag{6}$$
and

$$V(x_0, t_0, S) = \sum_{i\in S}\sum_{j\in K(i)\cap S} \int_{t_0}^{t} e^{-\rho(\tau-t_0)} h^{j}_{i}(\bar{x}^i(\tau), \bar{x}^j(\tau))\,d\tau + e^{-\rho(t-t_0)} \sum_{i\in S}\sum_{j\in K(i)\cap S} \int_{t}^{\infty} e^{-\rho(\tau-t)} h^{j}_{i}(\bar{x}^i(\tau), \bar{x}^j(\tau))\,d\tau = \sum_{i\in S}\sum_{j\in K(i)\cap S} \int_{t_0}^{t} e^{-\rho(\tau-t_0)} h^{j}_{i}(\bar{x}^i(\tau), \bar{x}^j(\tau))\,d\tau + V(\bar{x}(t), t, S)\,e^{-\rho(t-t_0)}, \quad S \subset N. \tag{7}$$

Equations (6)–(7) can be interpreted as the time-consistency property of the newly introduced characteristic function. It is necessary to mention that this property is similar to the one considered in [10, 11] and is not shared by the existing characteristic functions in differential games. As we can see, in our case the worth of coalitions is measured under the process of cooperation instead of under min-max confrontation or a Nash non-cooperative stance. Any individual player or coalition attempting to act independently will have their links to other players in the network cut off. Because of this, in the worst case the players outside S will cut their connections with the players from S, and the players from S will get positive payoffs only by interacting with other players from S.
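The identity in (6)–(7) is simply the splitting of a discounted integral at an intermediate time t. A quick numerical sanity check, where a hypothetical constant gain h ≡ 1 and a truncated horizon T stand in for the infinite-horizon integral:

```python
import math

def discounted_payoff(h, a, b, rho, t0, steps=100_000):
    # midpoint-rule approximation of the integral of e^{-rho(tau - t0)} h(tau)
    # over [a, b]
    dt = (b - a) / steps
    return sum(
        math.exp(-rho * (a + (k + 0.5) * dt - t0)) * h(a + (k + 0.5) * dt) * dt
        for k in range(steps)
    )

h = lambda tau: 1.0                      # hypothetical instantaneous gain
rho, t0, t, T = 0.1, 0.0, 10.0, 50.0     # T truncates the infinite horizon
whole = discounted_payoff(h, t0, T, rho, t0)
split = (discounted_payoff(h, t0, t, rho, t0)
         + math.exp(-rho * (t - t0)) * discounted_payoff(h, t, T, rho, t))
```

Both sides agree, as in (6): the tail after t, discounted back to t0, completes the head integral.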
3 Dynamic Shapley Value

In this section, we develop a dynamic Shapley value imputation using the defined characteristic function. Now, we consider allocating the grand coalition cooperative network gain $V(x_0, t_0, N)$ to the individual players according to the Shapley value imputation. Player $i$'s payoff under cooperation would become

$$Sh_i(x_0, t_0) = \sum_{\substack{S\subset N,\\ S\ni i}} \frac{(|S|-1)!\,(n-|S|)!}{n!}\, \big[ V(x_0, t_0, S) - V(x_0, t_0, S\setminus\{i\}) \big], \quad \text{for } i \in N. \tag{8}$$
Invoking (8), in our case we can obtain the cooperative payoff of player $i$ under the Shapley value as

$$Sh_i(x_0, t_0) = \sum_{\substack{S\subset N,\\ S\ni i}} \frac{(|S|-1)!\,(n-|S|)!}{n!} \times \Bigg[ \sum_{l\in S}\sum_{j\in K(l)\cap S} \int_{t_0}^{\infty} e^{-\rho(\tau-t_0)} h^{j}_{l}(\bar{x}^l(\tau), \bar{x}^j(\tau))\,d\tau - \sum_{l\in S\setminus\{i\}}\ \sum_{j\in K(l)\cap S\setminus\{i\}} \int_{t_0}^{\infty} e^{-\rho(\tau-t_0)} h^{j}_{l}(\bar{x}^l(\tau), \bar{x}^j(\tau))\,d\tau \Bigg] \tag{9}$$
However, in a dynamic framework, the agreed-upon optimality principle for sharing the gain has to be maintained throughout the cooperation duration (see [14]) for a dynamically consistent solution. Applying the Shapley value imputation in (9) to any time instant $t \in [t_0, \infty)$, we obtain:

$$Sh_i(\bar{x}(t), t) = \sum_{\substack{S\subset N,\\ S\ni i}} \frac{(|S|-1)!\,(n-|S|)!}{n!} \times \Bigg[ \sum_{l\in S}\sum_{j\in K(l)\cap S} \int_{t}^{\infty} e^{-\rho(\tau-t)} h^{j}_{l}(\bar{x}^l(\tau), \bar{x}^j(\tau))\,d\tau - \sum_{l\in S\setminus\{i\}}\ \sum_{j\in K(l)\cap S\setminus\{i\}} \int_{t}^{\infty} e^{-\rho(\tau-t)} h^{j}_{l}(\bar{x}^l(\tau), \bar{x}^j(\tau))\,d\tau \Bigg] \tag{10}$$
The Shapley value imputation in (9)–(10) is based on the characteristic function evaluated along the optimal cooperative trajectory, and it attributes the contributions of the players under the optimal cooperation process. Indeed, it can be regarded as an optimal-trajectory dynamic Shapley value. In addition, the Shapley value imputation (9)–(10) fulfils the property of time consistency.

Proposition 2 The Shapley value imputation in (9)–(10) satisfies the time consistency property.

Proof By direct computation we get

$$Sh_i(x_0, t_0) = \sum_{\substack{S\subset N,\\ S\ni i}} \frac{(|S|-1)!\,(n-|S|)!}{n!} \times \Bigg[ \sum_{l\in S}\sum_{j\in K(l)\cap S} \int_{t_0}^{t} e^{-\rho(\tau-t_0)} h^{j}_{l}(\bar{x}^l(\tau), \bar{x}^j(\tau))\,d\tau - \sum_{l\in S\setminus\{i\}}\ \sum_{j\in K(l)\cap S\setminus\{i\}} \int_{t_0}^{t} e^{-\rho(\tau-t_0)} h^{j}_{l}(\bar{x}^l(\tau), \bar{x}^j(\tau))\,d\tau \Bigg] + e^{-\rho(t-t_0)}\, Sh_i(\bar{x}(t), t),$$
for $t \in [t_0, \infty)$, which exhibits the time consistency property of the Shapley value imputation. This is the first time that a Shapley value measure itself in a dynamic framework fulfils the property of time consistency (see the existing dynamic Shapley value measures, which do not share this property, in [4, 14]). Using this Shapley value formulation, the cooperative game solution automatically satisfies the condition of time consistency.
4 Strategic Support of Cooperation

Using the structure of the game, we can construct a Nash equilibrium with the same outcome as under cooperation.

Definition 1 The Nash equilibrium $(u^{1*}(\cdot), \ldots, u^{i*}(\cdot), \ldots, u^{n*}(\cdot))$ is called the best Nash equilibrium if the joint payoff of the players in this equilibrium is maximal among the joint payoffs in all other Nash equilibria; this maximal joint payoff is denoted by $W(N)$.

Proposition 3 The cooperative outcome in the game $\Gamma(x_0)$ can be achieved in a Nash equilibrium.

Proof Consider the following behaviour of the players in $\Gamma(x_0)$. Suppose that all players, starting from the time instant $t_0$, choose the cooperative controls $\bar{u}(t) = (\bar{u}^1(t), \ldots, \bar{u}^i(t), \ldots, \bar{u}^n(t))$; then the game will evolve along the cooperative trajectory $\bar{x}(t)$. If at some time instant $\bar{t}$ the state of the game is seen to be outside the cooperative trajectory, $x(\bar{t}) \neq \bar{x}(\bar{t})$, then the players' strategies from the time instant $\bar{t}$ prescribe cutting the connections with their neighbours. Since the motion trajectory of the game is a continuous function, the time instant $\bar{t}$ can be very close to the first moment of the deviation of player $i_0 \in N$ from the cooperative behaviour $\bar{u}^{i_0}(t)$.

Now suppose that one player $i_0 \in N$ deviates at some time instant $t'$ from the cooperative behaviour $\bar{u}^{i_0}(t)$ ($u^{i_0}(t') \neq \bar{u}^{i_0}(t')$). Then, if $f^{i_0}(x^{i_0}(t'), u^{i_0}(t')) \neq f^{i_0}(\bar{x}^{i_0}(t'), \bar{u}^{i_0}(t'))$, the motion of player $i_0 \in N$ will change, and as a result $x(t) \neq \bar{x}(t)$ for any $t > t'$ sufficiently close to $t'$. Since the players have information about the state variable $x$ at each time instant of the game, there will exist $\bar{t}$ close to $t'$ ($\bar{t} > t'$) such that they will be informed that $x(\bar{t}) \neq \bar{x}(\bar{t})$. Then, according to the prescribed behaviour, each of the players $i \in N\setminus\{i_0\}$ will cut the connections with all his neighbours. As a result, all players, including the deviating player $i_0$, will get a payoff equal to 0 till the end of the game.

Since the functions $h^{j}_{i} > 0$ for $j \in K(i)$ and the game has infinite duration, it is clear that player $i_0 \in N$ will lose by deviating from the cooperative trajectory. This means that the proposed behaviour constitutes a Nash equilibrium in $\Gamma(x_0)$ with the cooperative outcome.
L. Petrosyan et al.
Fig. 1 4-player network game (players 1–2, 1–4, 2–3, 2–4 and 3–4 are linked; players 1 and 3 are not linked)
Definition 2 The Price of Stability is defined as

$$ S = \frac{W(N)}{V(N)}, $$

where V(N) is the maximal joint payoff of the players in the game and W(N) is the joint payoff of the players in the best Nash equilibrium.

It follows from Proposition 3 that the Price of Stability S in Γ(x₀) is equal to 1, since by Proposition 3 there exists a best Nash equilibrium whose joint payoff equals the maximal joint payoff V(N).

Example 1 Consider the following 4-player network game (see Fig. 1). For simplicity of notation, we denote the gain that player i can obtain through the network link with player j ∈ K(i) as

$$ \alpha_{ij}(\bar x(t), t) = \int_t^{\infty} e^{-\rho(\tau - t_0)}\, h_i^j(\bar x^i(\tau), \bar x^j(\tau))\, d\tau, \tag{11} $$

for t ∈ [t₀, ∞). Using (11), we can rewrite the formula of the Shapley value as

$$ Sh_i(x_0, t_0) = \sum_{\substack{S \subset N,\\ S \ni i}} \frac{(|S|-1)!\,(n-|S|)!}{n!} \left[ \sum_{l \in S} \sum_{j \in K(l) \cap S} \alpha_{lj}(x_0, t_0) - \sum_{l \in S \setminus \{i\}} \sum_{j \in K(l) \cap S \setminus \{i\}} \alpha_{lj}(x_0, t_0) \right]. $$

For this network structure the characteristic function is defined as

$$ V(\{1,2\}) = \alpha_{12} + \alpha_{21}, \quad V(\{1,3\}) = 0, \quad V(\{1,4\}) = \alpha_{14} + \alpha_{41}, \quad V(\{2,3\}) = \alpha_{23} + \alpha_{32}, $$
Differential Network Games with Infinite Duration
$$ V(\{2,4\}) = \alpha_{24} + \alpha_{42}, \quad V(\{3,4\}) = \alpha_{34} + \alpha_{43}, $$
$$ V(\{1,2,3\}) = \alpha_{12} + \alpha_{21} + \alpha_{23} + \alpha_{32}, \quad V(\{1,3,4\}) = \alpha_{14} + \alpha_{41} + \alpha_{34} + \alpha_{43}, $$
$$ V(\{1,2,4\}) = \alpha_{12} + \alpha_{21} + \alpha_{14} + \alpha_{41} + \alpha_{24} + \alpha_{42}, $$
$$ V(\{2,3,4\}) = \alpha_{23} + \alpha_{32} + \alpha_{24} + \alpha_{42} + \alpha_{34} + \alpha_{43}, $$
$$ V(\{1,2,3,4\}) = \alpha_{12} + \alpha_{21} + \alpha_{14} + \alpha_{41} + \alpha_{23} + \alpha_{32} + \alpha_{24} + \alpha_{42} + \alpha_{34} + \alpha_{43}. $$

Computing the Shapley value we get

$$ Sh_1 = \frac{\alpha_{12} + \alpha_{21} + \alpha_{14} + \alpha_{41}}{2}, \qquad Sh_2 = \frac{\alpha_{12} + \alpha_{21} + \alpha_{23} + \alpha_{32} + \alpha_{24} + \alpha_{42}}{2}, $$
$$ Sh_3 = \frac{\alpha_{23} + \alpha_{32} + \alpha_{34} + \alpha_{43}}{2}, \qquad Sh_4 = \frac{\alpha_{14} + \alpha_{41} + \alpha_{34} + \alpha_{43} + \alpha_{24} + \alpha_{42}}{2}. $$
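Each player thus receives exactly half of the α-gains on his own links. Assuming the characteristic function is the sum of the α-values over links internal to a coalition, this closed form can be checked by brute-force enumeration of coalitions; the sketch below uses arbitrary illustrative α values (hypothetical, not from the text), with the link set mirroring Fig. 1, where players 1 and 3 are not linked.

```python
from itertools import combinations
from math import factorial

# Illustrative alpha values alpha[(i, j)]: discounted gain that player i
# obtains through the link with neighbour j (arbitrary positive numbers).
alpha = {(1, 2): 3.0, (2, 1): 1.0, (1, 4): 2.0, (4, 1): 4.0,
         (2, 3): 5.0, (3, 2): 2.5, (2, 4): 1.5, (4, 2): 0.5,
         (3, 4): 2.0, (4, 3): 3.0}

def v(S):
    """Characteristic function: sum of alpha_ij over links internal to S."""
    return sum(a for (i, j), a in alpha.items() if i in S and j in S)

def shapley(i, players):
    """Brute-force Shapley value of player i for the game v."""
    n = len(players)
    others = [p for p in players if p != i]
    total = 0.0
    for r in range(n):
        for T in combinations(others, r):
            S = set(T) | {i}
            w = factorial(len(S) - 1) * factorial(n - len(S)) / factorial(n)
            total += w * (v(S) - v(set(T)))
    return total

players = [1, 2, 3, 4]
sh = {i: shapley(i, players) for i in players}

# Closed form from the example: half of the alphas on the player's own links.
half = {i: sum(a for (l, j), a in alpha.items() if l == i or j == i) / 2
        for i in players}
```

By additivity of the Shapley value over the per-link games, the brute-force values coincide with the `half` dictionary, and they sum to V(N) (efficiency).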
5 Conclusions

In the paper, differential cooperative games on networks with infinite duration are defined and investigated. The special case is considered in which each player's payoff depends only upon his own actions and the actions of his neighbours. A form for measuring the worth of coalitions for the Shapley value imputation, similar to the one proposed in [10, 11], is developed. In computing this type of characteristic function, we evaluate the contributions of the players to the process of cooperation and maintain the cooperative strategies for all players along the cooperative trajectory. Using this characteristic function, the dynamic Shapley value solution is derived. It is worth noting that the resulting Shapley value itself fulfils the property of time consistency. Another important point is the applicability of the proposed solution, based on the defined characteristic function, to a wide class of dynamic real-life game-theoretic problems. This is because of the simple method used for the computation of the characteristic function and the Shapley value. The only limitation is connected with the level of complexity of the maximization problem [4], but this type of problem is classical in modern control theory and there is a wide variety of effective methods to solve it (dynamic programming, the Pontryagin maximum principle, variational methods). It is proved that in the case under consideration the cooperative outcome can be achieved in a specially constructed Nash equilibrium.
References

1. Bulgakova, M., Petrosyan, L.: About one multistage non-antagonistic network game. Vestnik Sankt-Peterburgskogo Universiteta, Prikladnaya Matematika, Informatika, Protsessy Upravleniya 5(4), 603–615 (2019). https://doi.org/10.21638/11702/spbu10.2019.415
2. Cao, H., Ertin, E., Arora, A.: MiniMax equilibrium of networked differential games. ACM Trans. Auton. Adapt. Syst. 3(4) (2008). https://doi.org/10.1145/1452001.1452004
3. Gao, H., Pankratova, Y.: Cooperation in dynamic network games. Contrib. Game Theory Manage. 10, 42–67 (2017)
4. Gromova, E.: The Shapley value as a sustainable cooperative solution in differential games of three players. In: Recent Advances in Game Theory and Applications, Static and Dynamic Game Theory: Foundations and Applications. Springer, Berlin (2016). https://doi.org/10.1007/978-3-319-43838-2_4
5. Isaacs, R.: Differential Games. Wiley, New York (1965)
6. Meza, M.A.G., Lopez-Barrientos, J.D.: A differential game of a duopoly with network externalities. In: Petrosyan, L.A., Mazalov, V.V. (eds.) Recent Advances in Game Theory and Applications. Birkhäuser, Basel (2016). https://doi.org/10.1007/978-3-319-43838-2
7. Pai, H.M.: A differential game formulation of a controlled network. Queueing Syst. Theory Appl. 64(4), 325–358 (2010)
8. Petrosyan, L.A.: Cooperative differential games on networks. Trudy Inst. Mat. i Mekh. UrO RAN 16(5), 143–150 (2010)
9. Petrosyan, L.A., Tur, A.V.: Cooperative optimality principles in differential games on networks. Autom. Remote Control 82(6), 1095–1106 (2021)
10. Petrosyan, L.A., Yeung, D.W.K.: Shapley value for differential network games: theory and application. J. Dyn. Games 8(2), 151–166 (2021). https://doi.org/10.3934/jdg.2020021
11. Petrosyan, L.A., Yeung, D., Pankratova, Y.B.: Cooperative differential games with partner sets on networks. Trudy Inst. Mat. i Mekh. UrO RAN 27(3), 286–295 (2021). https://doi.org/10.21538/0134-4889-2021-27-3-286-295
12. Shapley, L.S.: A value for n-person games. In: Kuhn, H., Tucker, A. (eds.) Contributions to the Theory of Games, pp. 307–317. Princeton University Press, Princeton (1953)
13. Wie, B.W.: A differential game model of Nash equilibrium on a congested traffic network. Networks 23, 557–565 (1993)
14. Yeung, D.W.K., Petrosyan, L.A.: Subgame Consistent Cooperation: A Comprehensive Treatise. Springer, Berlin (2016)
15. Zhang, H., Jiang, L.V., Huang, S., Wang, J., Zhang, Y.: Attack-defense differential game model for network defense strategy selection. IEEE Access 7, 50618–50629 (2019). https://doi.org/10.1109/ACCESS.2018.2880214
Differential Game Model Applied in Low-Carbon Chain with Continuous Updating Zeyang Wang, Fanjun Yao, Ovanes Petrosian, and Hongwei Gao
Abstract This paper is devoted to the application of a class of differential games with continuous updating to the low-carbon chain. It is shown that the optimal control (cooperative strategies) and the feedback Nash equilibrium strategies uniformly converge to the corresponding strategies in the game model with continuous updating as the number of updating instants converges to infinity. With respect to the traditional model, this method gives players the chance to adjust their strategies according to the information in the near future; the key point is that players can adjust their strategies using the current time information. In this paper we study the low-carbon chain differential game model with continuous updating for the first time.

Keywords Differential games · Low-carbon chain · Continuous updating · Cooperative differential games · Non-cooperative differential games
This work is supported by Postdoctoral International Exchange Program of China and the National Natural Science Foundation of China (Grant No. 71571108) Z. Wang Saint Petersburg State University, St. Petersburg, Russia School of Mathematics and Computer Science, Shaanxi, Yan’an University, Yan’an, China F. Yao School of Business, Qingdao University, Qingdao, P.R. China O. Petrosian () School of Automation, Qingdao University, Qingdao, China Faculty of Applied Mathematics and Control Processes, St. Petersburg University, St. Petersburg, Russia e-mail: [email protected] H. Gao School of Mathematics and Statistics, Qingdao University, Qingdao, P.R. China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2021 L. A. Petrosyan et al. (eds.), Frontiers of Dynamic Games, Trends in Mathematics, https://doi.org/10.1007/978-3-030-93616-7_16
1 Introduction

With the advent of the low-carbon era, consumers' environmental awareness is further enhanced, and low-carbon tourism and low-carbon consumption have become new fashions. There is much literature on the low-carbon supply chain and on the closed-loop supply chain, which returns end-of-life/use products to a collector. Most of it uses game theory to characterize the behavior of the different agents, each pursuing its own profit. There are two streams of game theory: static games and dynamic games. Dynamic games can better describe changes in reality because their state equations are influenced by current and future decisions. Xia et al. [1] incorporate consumer low-carbon awareness and social preferences, including relationship and status preferences, to investigate the channel members' decision-making and performance, and find that improving consumer low-carbon awareness is beneficial for carbon emission reduction and for both channel members' utilities. Ma et al. [2] investigate a low-carbon tourism supply chain from the online-to-offline (O2O) perspective. De Giovanni and Zaccour [3] investigate return functions and coordination mechanisms, two key issues in closed-loop supply chains. Wang et al. [4] study a retailer-led low-carbon supply chain under altruistic preference. All these papers use differential games to study the supply chain and set the planning horizon as infinite, or finite with a constant discount parameter. In practice, however, players know only a small part of this information. So in our paper we use a differential game with continuous updating to investigate the low-carbon supply chain.

Most real life conflicting processes evolve continuously in time, and their participants continuously receive updated information and adapt. For such processes an approach was proposed that allows for constructing more realistic models, namely, games with dynamic updating [5, 6] and games with continuous updating [7, 8].
Fundamental models previously considered in the theory of differential games are related to problems defined for a fixed time interval (players have all the information for a closed time interval) [9], problems over an infinite time interval with discounting (players have information for an infinite time interval) [10], problems defined for a random time interval (players have information over a given time interval, but the terminating instant is a random variable) [11]. Furthermore, one of the first works in the theory of differential games was devoted to the pursuer and evader game (the player’s payoff depends on when the opponent is caught) [12]. In all the above models and suggested solutions, it is assumed that players at the beginning of the game know all the information about the dynamics of the game (motion equations) and about the preferences of players (payoff functions). However, this approach does not take into account the fact that, in many real life processes, players at the initial instant do not know all the information about the game. Thus, existing approaches cannot be directly used to construct a range of real life game-theoretic models. The class of differential games with continuous updating was considered in the papers [7, 8, 13–19], here it is supposed that the updating process evolves continuously in time. For the first time this class of games was considered in the
papers [7, 8]. In [7], the system of Hamilton-Jacobi-Bellman equations is derived for the feedback Nash equilibrium with continuous updating. The paper [8] is devoted to the class of autonomous linear-quadratic differential games with continuous updating, where feedback strategies are considered; there the convergence of the Nash equilibrium strategies and of the related trajectories under dynamic updating to those under continuous updating is proved. Later, in the paper [14], the class of cooperative differential games with transferable utility is studied using Hamilton-Jacobi-Bellman equations. There the construction of the characteristic function with continuous updating and several related theorems are presented, and the property of strong time-consistency of the cooperative solution with continuous updating is proved. Another result related to Hamilton-Jacobi-Bellman equations with continuous updating concerns the class of cooperative differential games with nontransferable utility [15]. The paper [19] is devoted to a detailed study of a resource extraction game in both cooperative and non-cooperative settings. In the paper [16], the explicit form of the Nash equilibrium for the differential game with continuous updating is derived by using the Pontryagin maximum principle. The paper [18] studies the cooperative setting using the Pontryagin maximum principle. Further papers on the class of linear-quadratic differential games with continuous updating are devoted to open-loop Nash equilibrium strategies [13] and to the cooperative setting with the corresponding form of the characteristic function with continuous updating [17]. Moreover, convergence results are also obtained for the non-autonomous case. The class of non-cooperative differential games with continuous updating was considered in the papers [7, 8], where it is supposed that the updating process evolves continuously in time.
In the paper [7] the system of Hamilton-Jacobi-Bellman equations is derived for the Nash equilibrium in a game with continuous updating. In the paper [8] the class of linear-quadratic differential games with continuous updating is considered and the explicit form of the Nash equilibrium is derived. Theoretical results are illustrated on a game model of non-renewable resource extraction presented in the paper [20]. In this paper we select a classical model from the management area and study it taking continuous updating into consideration. Potentially, this method will be more useful for explaining the low-carbon chain in real life. The paper is structured as follows. In Sect. 2 the initial model of the supply chain is presented. Section 3 introduces the differential game model with continuous updating and two kinds of solutions of a differential game with continuous updating: the cooperative solution and the Nash equilibrium with continuous updating. In Sect. 4 the low-carbon chain model with continuous updating is presented, and the corresponding solutions in the cooperative and non-cooperative cases are derived. The results of the modeling approach based on continuous updating and the related simulation results obtained with MATLAB are presented in Sect. 5. The conclusion is drawn in Sect. 6.
2 Initial Model of Low-Carbon Chain

Consider a game-theoretical model of a low-carbon supply chain with two asymmetric players [21]. The game involves two players, a manufacturer and a retailer, and is defined on the time interval [0, ∞). The strategy of the manufacturer is the emission reduction effort e(t), and the retailer's strategy is the promotion effort a(t). The manufacturer's goodwill G(t) depends on the emission reduction effort and on the promotion; the initial goodwill at the beginning of the game is G₀. Following the classical Nerlove-Arrow goodwill dynamics [22], the dynamics of low-carbon goodwill is characterized as

$$ \dot G(t) = \theta e(t) + \eta a(t) - \delta G(t). \tag{1} $$
Assuming that consumers tend to buy products with a high level of emission reduction, a good brand image and a low price, market demand is influenced by consumers' low-carbon preference, the retail price and the low-carbon goodwill. Outright [23] divided the market demand influencing factors into price and non-price factors. We also divide the market demand influencing factors into price and non-price, and let them influence demand through a separable multiplicative form. Therefore, the product market demand function is

$$ D(t) = \gamma G(t)(\alpha - \beta p(t)), \tag{2} $$

where p(t) is the retail price at time t; γ > 0 and β > 0 stand for the consumers' low-carbon preference coefficient and price sensitivity coefficient, respectively, which indicate the extent to which low-carbon goodwill and the retail price affect market demand. Assuming that the unit production cost, independent of emission reduction, is a constant c, and following the general cost convexity assumption [24], the manufacturer's emission reduction cost function and the retailer's promotion cost function can be expressed as

$$ C_m(t) = \frac{k_m}{2} e^2(t), \tag{3} $$

$$ C_r(t) = \frac{k_r}{2} a^2(t), \tag{4} $$
where km > 0 and kr > 0 represent the emission reduction cost coefficient and promotion cost coefficient, respectively.
Payoff functions of the players have the following forms, where w(t) is the wholesale price and ρ > 0 is the discount rate:

$$ \max_{e(t),\, w(t)} J_m = \int_0^{\infty} e^{-\rho t}\,[(w(t) - c)\,\gamma G(t)(\alpha - \beta p(t)) - C_m(t)]\, dt, \tag{5} $$

$$ \max_{a(t),\, p(t)} J_r = \int_0^{\infty} e^{-\rho t}\,[(p(t) - w(t))\,\gamma G(t)(\alpha - \beta p(t)) - C_r(t)]\, dt. \tag{6} $$
According to the two propositions in [21], the optimal solutions in the traditional model are:

Remark 1 In the cooperative scenario (hereafter abbreviated as model C), the system's strategies of the supply chain are

$$ e^C = \frac{(\alpha - \beta c)^2 \theta \gamma}{4\beta k_m (\rho + \delta)}, \qquad a^C = \frac{(\alpha - \beta c)^2 \eta \gamma}{4\beta k_r (\rho + \delta)}, \qquad p^{C*} = \frac{\alpha + \beta c}{2\beta}, $$

where c is the production cost. The total profit of the supply chain is V^C(G) = c₁G^C + c₂, where c₁ and c₂ stand for the coefficients of the value function. The time trajectory of goodwill is G^C(t) = (G₀ − G^C_∞)e^{−δt} + G^C_∞, where

$$ G^C_{\infty} = \frac{\gamma (\alpha - \beta c)^2 (\theta^2 k_r + \eta^2 k_m)}{4\beta\delta k_m k_r (\rho + \delta)} $$

is the steady-state goodwill.

Remark 2 In the non-cooperative scenario (hereafter abbreviated as model N), the manufacturer's optimal strategies are

$$ e^N = \frac{(\alpha - \beta c)^2 \theta \gamma}{8\beta k_m (\rho + \delta)}, \qquad w^N = \frac{\alpha + \beta c}{2\beta}, $$

and the retailer's optimal strategies are

$$ a^N = \frac{(\alpha - \beta c)^2 \eta \gamma}{16\beta k_r (\rho + \delta)}, \qquad p^N = \frac{3\alpha + \beta c}{4\beta}. $$

The manufacturer's profit is V_m^N(G) = h₁G + h₂, where h₁ and h₂ stand for the coefficients of the value function. The retailer's profit is V_r^N(G) = l₁G + l₂. The time trajectory of goodwill is G^N(t) = (G₀ − G^N_∞)e^{−δt} + G^N_∞, where

$$ G^N_{\infty} = \frac{\gamma (\alpha - \beta c)^2 (2\theta^2 k_r + \eta^2 k_m)}{16\beta\delta k_m k_r (\rho + \delta)} $$

is the steady-state goodwill.
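To illustrate Remarks 1 and 2, the following sketch (with hypothetical parameter values, not taken from [21]) evaluates both steady-state goodwill levels and the exponential trajectory G(t) = (G₀ − G_∞)e^{−δt} + G_∞. Note that cooperation always yields the higher steady state, since G^C_∞/G^N_∞ = 4(θ²k_r + η²k_m)/(2θ²k_r + η²k_m) > 1 for any positive parameters.

```python
import math

# Hypothetical illustrative parameters (not from the paper).
alpha_, beta, gamma, c = 10.0, 1.0, 0.8, 2.0   # demand and cost parameters
theta, eta = 1.2, 0.9                          # effort effectiveness in (1)
delta, rho = 0.3, 0.05                         # goodwill decay and discount rates
km, kr, G0 = 2.0, 1.5, 1.0                     # cost coefficients, initial goodwill

# Common factor of both steady-state expressions.
base = gamma * (alpha_ - beta * c) ** 2 / (beta * delta * km * kr * (rho + delta))
G_inf_C = base * (theta ** 2 * kr + eta ** 2 * km) / 4        # cooperative (model C)
G_inf_N = base * (2 * theta ** 2 * kr + eta ** 2 * km) / 16   # non-cooperative (model N)

def goodwill(t, G_inf):
    """Goodwill trajectory G(t) = (G0 - G_inf) * exp(-delta * t) + G_inf."""
    return (G0 - G_inf) * math.exp(-delta * t) + G_inf
```

The trajectory starts at G₀ and converges monotonically to the steady state at rate δ, in both scenarios.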
Next we will compare our solutions obtained with continuous updating with the solutions above. In the next section the calculation of optimal strategies (controls) is presented for two basic classes of differential games: cooperative and non-cooperative differential games.
3 General Differential Game Model with Continuous Updating

Consider an n-player differential game Γ(x₀, t₀, t₀+T) defined on the interval [t₀, t₀+T], where 0 < T < +∞. The motion equation is given by the system

$$ \dot x_{t_0}(s) = f(s, x_{t_0}, u^{t_0}), \quad x_{t_0}(t_0) = x_0, \quad x_{t_0} \in R^l, $$
$$ u^{t_0} = (u_1^{t_0}, \ldots, u_n^{t_0}), \quad u_i^{t_0} = u_i^{t_0}(s, x) \in U_i \subset \operatorname{comp} R^k, \tag{7} $$
for which the conditions of existence, uniqueness, and continuity of the solution x_{t₀}(t) for any admissible measurable controls u₁^{t₀}(·), …, uₙ^{t₀}(·) are satisfied. A closed-loop control u_i^{t₀}(s, x) is a strategy of player i. The payoff function of player i is defined as follows:

$$ K_i^{t_0}(x_0, t_0, T; u^{t_0}) = \int_{t_0}^{t_0+T} h_i[s, x_{t_0}(s), u^{t_0}(s, x)]\, ds, \quad i \in N, \tag{8} $$
where x_{t₀}(s), u^{t₀}(s, x) are the trajectory and strategies in the game Γ(x₀, t₀, t₀+T), and ẋ_{t₀}(s) is the derivative with respect to s. The functions h_i(t, x, u), i = 1, …, n, and f(t, x, u) are integrable.

Subgame of Differential Game with Continuous Updating Consider the n-player differential game Γ(x, t, t+T), t ∈ [t₀, +∞), defined on the interval [t, t+T], where 0 < T < +∞. The motion equation for the subgame Γ(x, t, t+T) has the form:

$$ \dot x_t(s) = f(s, x_t, u^t), \quad x_t(t) = x, \quad x_t \in R^l, $$
$$ u^t = (u_1^t, \ldots, u_n^t), \quad u_i^t = u_i^t(s, x) \in U_i \subset \operatorname{comp} R^k, \quad s \in [t, t+T]. \tag{9} $$

The payoff function of player i for the subgame Γ(x, t, t+T) has the form:
$$ K_i^t(x, t; u^t) = \int_t^{t+T} h_i[s, x_t(s), u^t(s, x)]\, ds, \quad i \in N, \tag{10} $$
where x_t(s), u^t(s, x) are the trajectories and strategies in the game Γ(x, t, t+T).

A differential game with continuous updating is developed according to the following rule: the current time t ∈ [t₀, +∞) evolves continuously, and as a result players continuously obtain new information about the motion equations and payoff functions in the game Γ(x, t, t+T). The strategy profile u(t, x) in a differential game with continuous updating has the form:

$$ u(t, x) = u^t(s, x)|_{s=t}, \quad t \in [t_0, +\infty), $$
(11)
where u^t(s, x), s ∈ [t, t+T], is the strategy profile in the subgame Γ(x, t, t+T). Obviously, in some classes of games the optimal u^t(s, x) (cooperative strategies, Nash
equilibrium, Pareto optimal solution) at some point can prove to be non-unique, non-measurable, or even nonexistent. The trajectory x(t) in a differential game with continuous updating is determined in accordance with

$$ \dot x(t) = f(t, x, u), \quad x(t_0) = x_0, \quad x \in R^l, $$
(12)
where u = u(t, x) are the strategies in the game with continuous updating (11). We suppose that a strategy with continuous updating obtained using (11) is admissible, i.e. that the problem (12) has a unique and continuable solution. The essential difference between a game model with continuous updating and a classic differential game with prescribed duration Γ(x₀, t₀, T) is that players in the initial game are guided by the payoffs they will eventually obtain on the interval [t₀, T], whereas in the case of a game with continuous updating, at a time instant t they orient themselves toward the expected payoffs (10), which are calculated based on the information defined on the interval [t, t+T], i.e., the information they have at the time instant t.
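The construction u(t, x) = u^t(s, x)|_{s=t} can be made concrete on a toy scalar problem (our own illustration, not taken from the paper): for ẋ = u with running cost x² + u² on each moving window [t, t+T̄] and zero terminal weight, the window problem's scalar Riccati equation −dP/ds = 1 − P², P(t+T̄) = 0, has solution P^t(s) = tanh(t + T̄ − s), so the continuously updated feedback is the stationary control u(t, x) = −tanh(T̄)x.

```python
import math

T_bar = 2.0  # length of the information horizon (illustrative)

def window_gain(t, s):
    """Riccati solution P^t(s) of -dP/ds = 1 - P^2, P(t + T_bar) = 0,
    for the window problem on [t, t + T_bar]: P^t(s) = tanh(t + T_bar - s)."""
    return math.tanh(t + T_bar - s)

def u_updating(t, x):
    """Continuously updated strategy: take the window feedback at s = t."""
    return -window_gain(t, t) * x   # = -tanh(T_bar) * x, independent of t
```

The window-optimal gain depends on how much of the horizon remains, but evaluating it at s = t, as the updating rule prescribes, always leaves the full horizon T̄ ahead; hence the resulting feedback is stationary.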
3.1 Cooperative Game Model with Continuous Updating

In a cooperative differential game model with transferable utility there are two problems: how to define cooperative behavior via cooperative strategies, and how to define cooperative solutions. Both problems should be addressed using the continuous updating approach. In the framework of continuously updated information it is important to model the players' behavior; in this paper we consider the case of cooperative strategies. For the class of differential games with continuous updating we shall define them in the following form. Consider the concept of generalized cooperative strategies

$$ \tilde u^*(t, s, x) = (\tilde u_1^*(t, s, x), \ldots, \tilde u_n^*(t, s, x)), \quad t \in [t_0, +\infty), \ s \in [t, t+T], \tag{13} $$

which we are going to use further for constructing strategies u*(t, x), where x is the state, t is the current time, and s is the imagined time. The strategies ũ*(t, s, x) for any fixed t are defined in the following way:

$$ \tilde u^*(t, s, x) = (\tilde u_1^*(t, s, x), \ldots, \tilde u_n^*(t, s, x)) = \arg\max_{\tilde u_1, \ldots, \tilde u_n} \sum_{i=1}^n K_i^t(x, t; \tilde u), \tag{14} $$
where for a fixed current time instant t the optimization problem (14) is constructed on the basis of the information available to players at the time instant t, i.e., over the interval [t, t+T]. Suppose that the maximum in (14) is achieved on a set of admissible strategies ũ*(t, s, x) with fixed t.

Definition 1 A strategy profile ũ*(t, s, x) = (ũ₁*(t, s, x), …, ũₙ*(t, s, x)) is a generalized cooperative strategy profile in a game with continuous updating if for any fixed t ∈ [t₀, +∞) the strategy profile ũ*(t, s, x) is a cooperative strategy profile in the game Γ(x, t, t+T).

It is important to notice that the generalized cooperative strategy ũ*(t, s, x) for a fixed time t is a function of s and x, where s is defined on the interval [t, t+T]. Using generalized cooperative strategies, it is possible to define the cooperative strategies that are realized in a game model with continuous updating.

Definition 2 A strategy profile u*(t, x) is called cooperative strategies with continuous updating if it is defined in the following way:

$$ u^*(t, x) = \tilde u^*(t, s, x)|_{s=t} = (\tilde u_1^*(t, s, x)|_{s=t}, \ldots, \tilde u_n^*(t, s, x)|_{s=t}), \quad t \in [t_0, +\infty), \tag{15} $$

where ũ*(t, s, x) are the generalized cooperative strategies specified in Definition 1.

The trajectory x*(t) corresponding to the cooperative strategies with continuous updating u*(t, x) can be obtained from the system

$$ \dot x(t) = f(t, x, u^*), \quad x(t_0) = x_0, \quad x \in R^l. \tag{16} $$

For each differential game Γ(x, t, t+T) defined on the interval [t, t+T], define a cooperative trajectory that is to be realized in the game at time instant t in position x based on the information available at time instant t. Denote the corresponding cooperative trajectory x̃ₜ*(s) as the trajectory that is the solution of the following system:

$$ \dot x_t(s) = f(s, x_t, \tilde u^*(t, s, x_t)), \quad x_t(t) = x, \ x(t_0) = x_0, \ x_t \in R^l, \ s \in [t, t+T], \tag{17} $$

with ũ*(t, s, x) taken at fixed current time t.
3.2 Nash Equilibrium in Game with Continuous Updating

In the framework of continuously updated information, it is important to model the behavior of players. To do this, we use the concept of Nash equilibrium in feedback strategies. However, for the class of differential games with continuous updating, we would like to have it in the following form:

• for any fixed t ∈ [t₀, +∞), u^{NE}(t, x) = (u₁^{NE}(t, x), …, uₙ^{NE}(t, x)) at the instant t coincides with the Nash equilibrium in the game (9), (10) defined on the interval [t, t+T].
However, direct application of classical approaches to the definition of the Nash equilibrium in feedback strategies is not possible; consider two intervals [t, t+T] and [t+ε, t+T+ε],