Intelligent Robots and Autonomous Agents
Ronald C. Arkin, editor
Behavior-Based Robotics, Ronald C. Arkin, 1998
Robot Shaping: An Experiment in Behavior Engineering, Marco Dorigo and Marco Colombetti, 1998
Layered Learning in Multiagent Systems: A Winning Approach to Robotic Soccer, Peter Stone, 2000
Evolutionary Robotics: The Biology, Intelligence, and Technology of Self-Organizing Machines, Stefano Nolfi and Dario Floreano, 2000
Reasoning about Rational Agents, Michael Wooldridge, 2000
Introduction to AI Robotics, Robin R. Murphy, 2000
Strategic Negotiation in Multiagent Environments, Sarit Kraus, 2001
Strategic Negotiation in Multiagent Environments
Sarit Kraus
A Bradford Book
The MIT Press
Cambridge, Massachusetts
London, England
© 2001 Massachusetts Institute of Technology
All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.
This book was set in Times Roman by Interactive Composition Corporation [LaTeX2e] and was printed and bound in the United States of America.
Library of Congress Cataloging-in-Publication Data
Kraus, Sarit.
Strategic negotiation in multiagent environments / Sarit Kraus.
p. cm. — (Intelligent robots and autonomous agents)
"A Bradford book."
ISBN 0-262-11264-7 (hc. : alk. paper)
1. Intelligent agents (Computer software) I. Title. II. Series.
QA 76.76.I58 K75 2001
006.3—dc21
00-065370
To my family
Contents

Preface
Acknowledgments
1 Introduction
1.1 Rational Self-Interested Agents
1.2 Motivating Examples
1.3 Characteristics That Differentiate Negotiation Protocols
1.4 An Overview of This Book
1.5 Game Theory Concepts
2 The Strategic-Negotiation Model
2.1 Rubinstein's Protocol of Alternating Offers
2.2 Assumptions
2.3 The Agents' Utility Functions
2.4 Negotiation Strategies
2.5 Subgame Perfect Equilibria
2.6 Discussion of the Protocol
2.7 Negotiation Models in Distributed Artificial Intelligence
3 Negotiations about Data Allocation
3.1 Description of the Data Allocation Environment
3.2 Negotiation Analysis—Complete Information
3.3 Complexity and Heuristics Methods for the Allocation Problem
3.4 Dataset Allocation—The Incomplete Information Case
3.5 Distributed File Allocation
3.6 Approaches to Protocols for the Incomplete Information Case
4 Negotiations about Resource Allocation
4.1 Problem Description
4.2 Attributes of the Utility Functions
4.3 Complete Information
4.4 Incomplete Information about the Opponent
4.5 Multiple-Encounter Negotiation Protocol
4.6 Other Approaches to the Resource Allocation Problem
5 Negotiations about Resource Allocation with Multiple Attributes
5.1 Description of the Environment
5.2 Subgame Perfect Equilibrium Strategies
5.3 Simulation Results
6 Negotiations about Task Distribution
6.1 Bilateral Negotiations
6.2 Multiple Agents
6.3 Task Distribution in DAI
7 Negotiations about How to Reduce Pollution
7.1 Problem Description
7.2 Attributes of the Utility Functions
7.3 Complete Information
7.4 Incomplete Information
7.5 Market Mechanisms and AI Methods for Pollution Control
8 Negotiation during a Hostage Crisis
8.1 The Simulation Scenario
8.2 Negotiations between Only the Sikhs and India
8.3 Pakistan Participates in the Negotiations
8.4 Negotiation Analysis in Political Science
9 Economic and Game-Theoretic Models for Cooperation
9.1 Auctions
9.2 Market-Oriented Programming
9.3 Coalition Formation
9.4 Contracting
10 Conclusions and Future Directions
Appendix A: Suggested Background Reading
Appendix B: Glossary of Notations
Notes
References
Index
Preface
In recent years it has become clear that computer systems do not work in isolation. Rather, computer systems are increasingly acting as elements in a complex, distributed community of people and systems. In order to fulfill their tasks, computer systems must cooperate and coordinate their activities with other systems and with people. This phenomenon has been emphasized by the huge growth and success of the Internet. However, cooperation and coordination are needed almost everywhere computers are used. Examples, to mention a few, include health institutions, electricity networks, electronic commerce, robotic systems, digital libraries, and even military units. This book is concerned with the cooperation and coordination of intelligent systems (agents) that are self-interested and that are probably owned by different organizations or individuals. Conflicts between such agents frequently arise. Negotiation is one of the main mechanisms for reaching agreements between entities. In this book I present the strategic-negotiation model, which can be used for the development of autonomous agents that reach mutually beneficial agreements efficiently in complex environments. It integrates game-theory and economic techniques and heuristic methods of artificial intelligence. The strategic-negotiation model is based on Rubinstein’s model of alternating offers, where agents exchange offers until they reach an agreement or until one of them opts out of the negotiations. It provides a unified solution to a wide range of multiagent coordination and cooperation problems. The model’s applications to the data allocation problem in information servers, to the resource allocation and task distribution problems, to the pollution allocation problem, and to the hostage crisis scenario are presented in this book. In all these domains the strategic-negotiation model provides the agents with ways to reach mutually beneficial agreements without delay. The book contains both theoretical and experimental results. All readers are advised to read sections 1.1–1.4 of chapter 1 and all of chapter 2. These chapters deal with the basic ideas and definitions of the strategic-negotiation model. Section 1.5 lays out the basic concepts of game theory and may help readers who lack a background in game theory to understand and to attain a better insight into the technical parts of the book. Readers who are more implementation-oriented may skip the sections in those chapters that present the theorems and their proofs, especially those presented in chapters 4 and 5, and should focus instead on the description of each domain and the associated simulation results.
Acknowledgments
This book is the result of an interdisciplinary project conducted jointly with Jonathan Wilkenfeld, Professor of Government and Politics at the University of Maryland at College Park. The project has been performed over the last ten years at Bar-Ilan University and at the University of Maryland at College Park. I would like to thank Jonathan for his cooperation and for his willingness to accept new ideas and methods from other disciplines. I would also like to thank for their cooperation all the students and colleagues who have participated in this project over the years. In particular, I would like to acknowledge the contribution of Rina Azoulay-Schwartz, Orna Shechter, Freitsis Esfir, and Gilad Zlotkin, whose work is reported in this book, and thank them for their cooperation and hard work. The project was funded, in part, by NSF under grants IIS-9820657 and IIS-9907482. The following chapters have been derived, in part, from published or to be published works as noted: 1. Chapter 3 is based on R. Azuolay-Schwartz and S. Kraus, Negotiation on Data Allocation in Multi-Agent Environments, Autonomous Agents and MultiAgent Systems Journal, 2001 (to appear). 2. Chapters 4 and 6 are based on S. Kraus, J. Wilkenfeld, and G. Zlotkin, Multiagent Negotiation under Time Constraints, Artificial Intelligence Journal, 75(2):297–345, 1995, and on S. Kraus, Beliefs, Time, and Incomplete Information in Multiple Encounter Negotiations among Autonomous Agents, Annals of Mathematics and Artificial Intelligence, 20(1–4):111–159, 1997. 3. Chapter 8 is based on S. Kraus and J. Wilkenfeld, A Strategic Negotiations Model with Applications to an International Crisis, IEEE Transactions on Systems, Man, and Cybernetics, 23(1):313–323, 1993, and on J. Wilkenfeld, S. Kraus, K. Holley, and M. Harris, GENIE: A Decision Support System for Crisis Negotiations, International Journal on Decision Support Systems, 14:369–391, 1995. I would like to thank Daniel Lehmann for suggesting in 1984 that I work on automated negotiation, when the idea that computers could negotiate seemed to be science fiction. In addition, I would like to thank Ariel Rubinstein for introducing me to his seminal work on the bargaining game of alternatingoffers, fifteen years ago. I have cooperated with and worked with more than sixty researchers over the years. They have contributed to my understanding of the concepts of agents and automated negotiation. They have taught me how to cooperate and negotiate.
First, many thanks to Jack Minker and Barbara Grosz for their wisdom, cooperation, and encouragement. I also learned tremendously from my current and my former students. In addition to the students mentioned above, Onn Shehory, Osher Yadgar, Meirav Hadad, Esther David, Haim Fayerstein, Penina HozWeiss, Amir Evenchik, Maier Fenster, Galit Lemel, and Ariel Stollman have been involved in my research on cooperation and coordination in multiagent systems. My current collaborators include V. S. Subrahmanian, Barbara Grosz, Jim Hendler, Don Perlis, Tara Santmire, Charles Ortiz, J¨urgen Dix, Claudia Goldman, John Grant, Piero Bonati, Ruth Ben-Yashar, Dave Sulivann, and Sanmay Das. It has been a pleasure working with them. I have also cooperated with Michael Morreau, Orly Kremien, Katia Sycara, Jeff Rosenschein, Shlomo Argamon-Engelson, Tanya Plotkin, Samir Khuller, Alissa Glass, Luke Hunsberger, Fatma Ozcan, Rob Ross, Kim Holley, Michael Harris, Tomas Eiter, Madhura Nirkhe, Eithan Ephrati, Yoav Shoham, Becky Thomas, Chita Baral, Menachem Magidor, and David Etherington. I am grateful to all of them and all the other researchers of the multiagent community with whom I conducted very helpful discussions on the issues of the book over the years. The members and the staff of the department of mathematics and computer science at Bar-Ilan University and of the Institute for Advanced Computer Studies at University of Maryland at College Park have provided vast support throughout the implementation of this research. In particular, Larry Davis and Johanna Weinstein introduced me to Jonathan Wilkenfeld and together with Joseph Ja’Ja’ supported and encouraged this interdisciplinary research. Thanks also to Amihood Amir and Martin Golumbic for their support. I would like to thank Jonathan Wilkenfeld and Claudia Goldman for reading previous drafts and providing helpful comments. Peggy Weinreich provided editorial assistance and Moshe Katz provided programming assistance. In addition, I would like to thank Bob Prior, Judy Feldmann, and Katherine Innis of the MIT Press for their assistance. Finally, I would like to thank my family for their support and encouragement.
1 Introduction
One of the greatest challenges for computer science is to build computer systems that can work together. The integration of automated systems has always been a challenge, but as computers have become more sophisticated, the demands for coordination and cooperation have become increasingly important and pervasive. It is not only basic level components, such as printers, disks, and CPUs, but also high-level complex systems that need to coordinate and cooperate. Cooperation and coordination are required for those complex problems that cannot be solved by a single system in isolation, but require several systems to work together interactively. For example, providing a user with an integrated picture of his investment portfolio over time, using the information resources already available over the Internet, requires the collaboration of a set of intelligent systems (Sycara and Zeng 1996). Furthermore, there are heterogeneous intelligent systems that were built in isolation, and their cooperation may be necessary to achieve a new common goal. For example, intelligent personal agents, each of whom is responsible for scheduling the meetings of the person for whom it works, may need to cooperate to schedule a joint meeting of all the people for whom they work (Sen, Haynes, and Arora 1997). In other situations, several autonomous systems may work in the same environment, on different goals, and may need to coordinate their activities or to share resources. They may also benefit from cooperation. For example, information servers (possibly owned by different organizations), which need to store very large documents, may unite and decide that together they will keep only one copy of each document. They will need to decide where each shared document will be located. Other examples of intelligent systems that need to cooperate and coordinate their activities include automated systems (agents) that monitor electricity transformation networks (Jennings 1995; Brazier et al. 1998) and multiagent systems that support hospital patient scheduling (Decker and Li 1998). Additional examples include computers that handle complex problems with distributed information that cannot be solved efficiently by a single system alone, and robots or teams of robotic systems (Balch and Arkin 1995) or helicopter pilot agents (Kaminka and Tambe 2000) that must cooperate in hostile environments to reach a common goal. Transportation centers that deliver packages can cooperate to reduce expenses (Sandholm and Lesser 1997) even though they are autonomous, since they are working in the same environment, and they can benefit from cooperation. Problems of coordination and cooperation are not unique to automated systems; they exist at multiple levels of activity in a wide range of populations. People pursue their own goals through communication and cooperation with
other people or machines. Countries and nations must cooperate and coordinate their activities to increase the well-being of their people (Axelrod 1984). Since the earliest days people have used negotiation as a means to compromise to reach mutually beneficial agreements. In social sciences there are two main approaches to the development of theories relating to negotiation. The first approach is the formal theory of bargaining (e.g., Roth 1979; Osborne and Rubinstein 1990), constituting a formal, gametheoretic approach that provides clear analyses of various situations and precise results concerning the strategy a negotiator should choose. However, this approach can be applied only to situations that satisfy very restricted assumptions. In particular, this approach assumes that the agents act rationally, have large computation capabilities, and follow strict negotiation protocols. The second approach, the negotiation guides approach, comprises informal theories that attempt to identify possible general beneficial strategies for a negotiator. The works based on this approach advise a negotiator how to behave in order to reach beneficial results in a negotiation (see, e.g., Raiffa 1982; Fisher and Ury 1981; Druckman 1977; Karrass 1970; Johnson 1993; Hall 1993). These negotiation guides do not presuppose the strong restrictions and assumptions presented in the game-theoretic models. Applying these methods to automated systems is more difficult than using the first approach, however, since they use neither formal theories nor strategies.1 In this book I adopt the formal game-theoretic approach. I present a strategicnegotiation model and apply it to various domains for coordination and cooperation. During strategic negotiations, negotiators communicate their respective desires and they compromise to reach mutually beneficial agreements. This strategic negotiation is a process that may include several iterations of offers and counteroffers. A major goal in the development of the strategic-negotiation model has been to reduce overhead costs resulting from the time spent on planning and negotiation. This is necessary since one of the presumed difficulties in using negotiation as a way of reaching mutual benefit is that negotiation is a costly and time-consuming process and, consequently, it may increase the overhead of coordination (see Bond and Gasser 1988a). Thus, in the presence of time constraints, time spent on planning and negotiation should be taken into consideration. The strategic-negotiation model provides a unified solution to a wide range of problems. It is appropriate for dynamic real-world domains. In this book, I describe the application of the strategic-negotiation model to data allocation
problems in information servers, resource allocation and task distribution problems, and the pollution allocation problem. In all these domains the strategic-negotiation model provides the negotiators with ways to reach mutually beneficial agreements without delay. The application of the strategic-negotiation model to high-pressure, human crisis negotiations will also be presented. The strategic-negotiation model has been applied to domains representing a wide range of possible applications. A resource allocation mechanism is important in any environment in which systems and/or people need to share resources. Examples include almost any workshop, such as a car shop. A conflict resolution mechanism concerning resource usage is also needed when organizations such as airlines need to share a scarce resource, for instance, airport resources. In addition, resource allocation is important in organizations and governmental systems. In such domains task allocation mechanisms are also essential. For example, task assignment is necessary in self-management maintenance groups of an airline (Hackman 1991) or in research teams. The information servers' case represents increasingly important areas of computer and information sciences, namely, digital libraries and the Internet. In these domains servers with different interests may interact when there is no central controller, and thus use of the strategic-negotiation model for resolving conflicts seems very promising. The problem of data allocation in information-server environments is similar to the problem of deciding where to store parts in an integrated national network of spare-parts inventories within a large company such as IBM. Human negotiation during a crisis is a clear example of a situation where negotiation is costly and support is needed. Other examples include international negotiations and buying and selling on the Internet.
1.1 Rational Self-Interested Agents
The strategic-negotiation model was developed to address problems in distributed artificial intelligence (DAI), an area concerned with how automated agents can be constructed to interact in order to solve problems effectively. In the last few years, there have been several attempts to define an agent (e.g., Etzioni and Weld 1995; Wooldridge and Jennings 1995a; Foner 1993; Moulin and Chaib-Draa 1996; Jennings and Wooldridge 1998; Subrahmanian et al. 2000). For example, Etzioni and Weld (Etzioni and Weld 1995) require an agent to be goal-oriented, collaborative, flexible, and capable of making independent
decisions on when to act. In addition, they determined that an agent should be a continuously running process and be able to engage in complex communication with other agents, including people. It should automatically customize itself to the preferences of its user and to changes in the environment. Subrahmanian et al. (2000) concentrate on the interaction of an agent with other agents and the environment. They define a software agent as a body of software that:
• provides one or more useful services that other agents may use under specified conditions;
• includes a description of the services offered by the software, which may be accessed and understood by other agents;
• includes the ability to act autonomously without requiring explicit direction from a human being;
• includes the ability to describe succinctly and declaratively how an agent determines what actions to take even though this description may be kept hidden from other agents; and
• includes the ability to interact with other agents—including humans—either in a cooperative or in an adverse manner, as appropriate.
There are two aspects to the development of agents: what is the architecture of each agent, and how do they interconnect, coordinate their activities, and cooperate. There are many approaches to the development of a single agent (see, e.g., the survey in Wooldridge and Jennings 1995a). These approaches can be divided into three main categories (Wooldridge and Jennings 1995b):
1. deliberative architectures
2. reactive architectures
3. hybrid architectures
A deliberative agent architecture is one that contains an explicitly represented, symbolic model of the world, and one in which decisions (e.g., about what actions to perform) are made via logical (or at least pseudo-logical) reasoning, based on pattern matching and symbol manipulation. Examples of such architectures include the intelligent resource-bounded machine architecture (IRMA) (Bratman, Israel, and Pollack 1988), HOMER (Vere and Bickmore 1990), Agent0 (Shoham 1993), Etzioni's softbots for UNIX environments (Doorenbos, Etzioni, and Weld 1997), the Kasbah system (Chavez and Maes 1996), and
many others. The main criticism of this approach is that the computational complexity of symbol manipulation is very high, and some key problems appear to be intractable. A reactive architecture is usually defined as one that does not include any kind of central symbolic world model and does not use any complex symbolic reasoning. One of the first architectures of this type is Brooks’s subsumption architecture (Brooks 1985). Another such architecture is Maes’s agent network architecture (Maes 1990). These types of agents work efficiently when they are faced with “routine” activities. Many researchers suggest that neither a completely deliberate nor a completely reactive approach is suitable for building agents. They use hybrid systems, which attempt to combine the deliberate and the reactive approaches, for example, the PRS architecture (Georgeff 1987) and the TouringMachine (Ferguson 1992). Muller (1999) presents additional types of agents (e.g., layered agents, interacting agents) and tries to assist readers in deciding which agent architecture to choose for a specific application. This book concentrates only on the second aspect of the development of agents: the ability of the agent to coordinate its activity with other agents and to cooperate with them. I provide an independent module for strategic negotiation, and thus, I am willing to adopt any definition or model of a single agent. My only assumptions are that the agents can communicate with each other using a predefined language, that they have some computation and memory resources, and that the negotiation module can be added to the agents. In some applications we will consider people as agents and assume that they can follow the negotiation protocols and strategies, if they wish. An important issue in the development of a coordination and cooperation module is the level of cooperation among the agents: there are cooperative agents, which work toward satisfying the same goal, and there are environments where the agents are self-interested and try to maximize their own benefits.2 There are intermediary cases where self-interested agents join together to work toward a joint goal. In this book we study the interactions among self-interested, rational, and autonomous agents. The agents do not share a common goal, and each agent has its own preferences and acts according to them. Cooperation and coordination techniques may be required in various environments and situations where the agents act according to their own preferences and do not share a goal. For example, in situations where airplanes belonging to different airlines need to share the limited resources of the same airport, it is
necessary to find a mechanism that will give priority to planes with less fuel on board (Rosenschein and Zlotkin 1994). Other examples include an electronic market populated with automated agents representing different enterprises that buy and sell products (e.g., Chavez and Maes 1996; Fischer et al. 1996; Tsvetovatyy and Gini 1996; Zeng and Sycara 1998); information servers that form coalitions for answering queries (Klusch and Shehory 1996); and autonomous agents that negotiate to reach agreements for providing a service by one agent to another (Sierra, Faratin, and Jennings 1997). In all these examples the agents are self-interested and try to maximize their own benefits. The strategic-negotiation model is applicable to these types of examples. 1.2
1.2 Motivating Examples
Two motivating examples are presented in order to illustrate different settings where negotiation among automated agents is required. These examples will be revisited throughout the book to illustrate basic concepts and the results that will be provided. The first example involves software agents, and the second mobile robots. EXAMPLE 1.2.1 (DATA ALLOCATION IN LARGE DATABASES) There are several information servers in different geographical areas. Each server stores data, which has to be accessible by clients not only in its geographical area but also in other areas. The topics of interest to each client change over time, and the set of clients may also change over time. Periodically, new data arrive at the system and have to be located at one of the servers in the distributed system. Each server is independent and has its own commercial interests. The servers would like to cooperate with each other in order to make more information available to their clients. Since each server has its own preferences regarding possible data allocations, its interests may conflict with the interests of some of the other servers. A specific example of a distributed information system is the data and information system component of the earth-observing system (EOSDIS) of NASA (NASA 1996). It is a distributed system that supports archival data and distribution of data at multiple and independent data centers (called DAACs). The current policy for data allocation in NASA is static: each DAAC specializes in some topics. When new data arrive at a DAAC, the DAAC checks if the data are relevant to one of its topics, and, if so, it uses criteria such as storage cost to
determine whether or not to store the data in its database. The DAAC communicates with other DAACs in those instances in which the data item encompasses the topics of multiple DAACs, or when a data item presented to one DAAC is clearly in the jurisdiction of another DAAC, and then a discussion takes place among the relevant DAACs’ managers. However, this approach does not take into consideration the location of the information clients, and this may cause delays and excessive transmission costs if data are stored far from their potential users. Moreover, this method can cause rejection of data if they do not fall within the criteria of any DAAC, or if they fall under the criteria of a DAAC that cannot support this new product because of budgetary problems. 1.2.2 (AGENTS ON MARS) NASA has embarked on a scientific mission to Mars that involves sending several mobile robots. The European Space Agency (ESA) has also sent several mobile robots to Mars. Both NASA’s and ESA’s robots work in the same environment. The missions of the robots involve collecting earth samples from different locations and at different depths. A robot will need one or more digging tools to complete its tasks. The tools were sent to Mars by a third company, which charges NASA and ESA according to their use of the equipment. As a result, it might be beneficial for the two groups of robots to share resources. It may also be the case, say, that the NASA group’s antenna is damaged during landing, and it is expected that communications between NASA’s center and its robots on Mars will be down for repairs for one day. NASA can use a weaker and less reliable backup line, but this would mean diverting this line from other costly space experiments, and consequently the expense of using this line is very high. NASA would like to share the use of the ESA line during the one-day period so that it can conduct its planned research program. Only one group can use the line at a time, and that line will be in use for the entire duration of the particular experiment. A negotiation ensues between the two labs over division of use of the ESA line, during which time the ESA has sole access to the line, and NASA cannot conduct any of its experiments (except by use of the very expensive backup). Given the 1997 success of NASA’s Pathfinder on Mars (Golombek et al. 1997), and the 1994 success of DANTE II in exploring the crater on the Mt. Spurr volcano in Alaska, it seems that such scenarios may be realistic in the near future (see also Berns 1998). NASA’s current plan is to send a pair of wheeled robots to search for evidence of water on Mars in 2003.
1.3 Characteristics That Differentiate Negotiation Protocols
Evaluation of the results of negotiations in the systems considered is not easy. Since the agents are self-interested, when a negotiation is said to be successful we must ask “successful for whom?” since each agent is concerned only with its own benefits or losses from the resolution of the negotiation. Nevertheless, there are some parameters that can be used to evaluate different protocols. Only those protocols that satisfy the following conditions are considered. Distribution: The decision-making process should be distributed. There should be no central unit or agent required to manage the process. Symmetry: The coordination mechanism should not treat agents differently in light of nonrelevant attributes. In the situations we consider, the agents’ utility functions and their role in the encounter are the relevant attributes. All other attributes, such as an agent’s color, name, or manufacturer, are not relevant. That is, symmetry implies that given a specific situation, the replacement of an agent with another that is identical with respect to the above attributes will not change the outcome of the negotiation. The first requirement is desirable since we consider self-interested agents and it may be difficult for such agents to agree on a centralized controller that will be a fair mediator. In addition, a centralized controller may become a performance bottleneck. The symmetry restriction will encourage the designers of the agents to adopt the negotiation protocol. The following parameters will be used to evaluate the results of the negotiation that are presented in this book. Negotiation time: Negotiations that end without delay are preferable to negotiations that are time-consuming. It is assumed that a delay in reaching an agreement causes an increase in the cost of communication and computation time spent on the negotiation. We want to prevent the agents from spending too much time on negotiation resulting in not keeping to their timetables for satisfying their goals. Efficiency: It is preferred that the outcome of the negotiations will be efficient. It increases the number of agents that will be satisfied by the negotiation results and the agents’ satisfaction levels from the negotiation results.
Thus it is preferable that the agents reach Pareto optimal agreements.3 In addition, if there is an agreement that is better for all the agents than opting out, then it is preferred that the negotiations will end with an agreement. Simplicity: Negotiation processes that are simple and efficient are preferable to complex processes. Being a “simple strategy” means that it is feasible to build it into an automated agent. A “simple strategy” is also one that an agent will be able to compute in a reasonable amount of time. Stability: A set of negotiation strategies for a given set of agents is stable if, given that all the other agents included in the set are following their strategies, it is beneficial to an agent to follow its strategy too. Negotiation protocols that have stable strategies are more useful in multiagent environments than protocols that are unstable. If there are stable strategies, we can recommend to all agent designers to build the relevant strategies into their agents. No designer will benefit by building agents that use any other strategy. Money transfer: Money transfer may be used to resolve conflicts. For example, a server may “sell” a data item to another server when relocating this item. This can be done by providing the agents with a monetary system and with a mechanism for secure payments. Since maintaining such a monetary system requires resources and efforts, negotiation protocols that do not require money transfers are preferred. 1.4
1.4 An Overview of This Book
This book is organized as follows. Chapter 2 introduces the reader to the main components of the strategic-negotiation model. It presents Rubinstein’s protocol of alternating offers, defines basic concepts such as agreements and strategies, and introduces the concept of equilibrium that will be used to identify strategies for the negotiating agents. Chapter 3 presents the application of the strategic-model to the data-allocation problem. It considers situations characterized by complete as well as incomplete information, and proves that the negotiation model yields better results than the static allocation policy currently used for data allocation for servers in distributed systems. Chapter 4 considers resource allocation in environments where agents need to share a common resource. It considers situations with both complete and
incomplete information, single encounters and multiple encounters. It will be shown that in all the situations considered in this chapter, the negotiations end no later than in the second time period, and usually with an agreement. Chapter 5 continues to study the problem of resource allocation where the agents have goals with deadlines that they need to meet. Emphasis is placed on the issue of multiple attributes of the negotiation and it is shown that in these settings the negotiation ends with no delay. Simulation results reveal that our mechanism performs as well as a centralized scheduler and also has the property of balancing the resources’ usage. Chapter 6 presents the application of the strategic model to the task distribution problem. Here again negotiation ends with no delay. Chapter 7 considers the pollution sharing problem in which plants should reach agreements on pollution reduction because of external factors such as weather. The application of the strategic-negotiation model to the problem is presented and a comparison is made with market-based methods in situations where the agents have incomplete information. Chapter 8 presents the application of the strategic-negotiation model to a hostage crisis situation. Chapter 9 presents a detailed survey of other game-theory and economicsbased models of cooperation and compares them with the strategic-model of negotiation. The book concludes with future directions for automated negotiation. The appendix includes an annotated bibliography of suggested background readings and a glossary of the notations used in the book. 1.5
1.5 Game Theory Concepts
This introductory chapter informally describes the main concepts of game theory. Readers who are familiar with game theory can skip this chapter.4 Some of the general concepts described in this section will be redefined and used in the strategic-negotiation model. Game theory is the study of decision making in multiperson situations, where the outcome depends on everyone’s choice. The goal of each participant is to achieve well-defined objectives, while taking into account that the other participants are doing the same and that all their actions affect each other. This is in contrast to decision theories and the theory of competitive equilibrium that are used in economics, in which the other participants’ actions are considered
as an environmental parameter, and the effect of the decision maker’s actions on the other participants is not taken into consideration. Game theory, as it is used in this book, is a modeling tool, not an axiomatic system. The main concepts of game theory will be described using a simplified case of a hostage crisis scenario that is presented in (Kraus and Wilkenfeld 1993). The scenario is based on the hypothetical hijacking of a commercial airliner enroute from Europe to India and its forced landing in Pakistan. The passengers are predominantly Indian and the hijackers are known to be Sikhs. The hijackers demand the release of up to 800 Sikh prisoners from Indian security prisons (see Kraus et al. 1992). The three parties must consider several possible outcomes: India or Pakistan launches military operations to free the hostages; the hijackers blow up the plane with themselves aboard; India and the Sikhs negotiate a deal involving the release of security prisoners in exchange for the hostages; Pakistan and the Sikhs negotiate a safe passage agreement; or the hijackers give up. The details of the negotiation process will not be considered here since they are not needed to demonstrate the main concepts of game theory. The negotiation process is detailed in chapter 8. 1.5.1
1.5.1 Describing a Game
The essential elements of a game are players, actions, information, strategies, payoffs, outcome, and equilibria. The players, actions, and outcomes are collectively referred to as the rules of the game, and the modeler’s objective is to use the rules of the game to determine the equilibrium. The players are the individuals who make decisions. It is assumed that each player’s objective is to maximize the expected value of his own payoff, which is measured in some utility scale. In the hostage crisis scenario, the players are India, Pakistan, and the Sikhs. Each player has a set of objectives, which we identified, and a certain number of utility points is associated with each (see Kraus et al. 1992; Wilkenfeld et al. 1995). Passive individuals, like the UK in this example, who may observe the situation without trying to change anyone’s behavior, are not players. There are essentially three ways to present a social interaction as a game. Extensive form: The most complete description is the extensive form. It details the various stages of the interaction, the conditions under which a player has to move, the information an agent holds at different stages, and the motivation of the players.
Strategic form: More abstract is the strategic form (or normal form) representation of a game. Here one notes all possible strategies of each agent together with the payoffs that result from strategic choices of all the players. In strategic forms, many details of the extensive form have been omitted. Coalitional form: The coalitional form (or characteristic function form) is a description of social interactions where binding agreements can be made and enforced. Binding agreements allow groups of players, or coalitions, to commit themselves to actions that may be against the interest of individual players once the agreement is carried out. The components of the strategic form will be presented first and then the extensive form will be described. The coalitional form is discussed in, for example, (Kahan and Rapoport 1984). 1.5.2
1.5.2 Strategic Games
In a strategic game each player chooses his final plan of action, and these choices are made simultaneously.5 The model consists of a finite set of N players. An action or a move by player i, denoted a_i, is a choice he can make. Player i's action set, Act_i = {a_i}, is the entire set of actions available to him. An action profile is an ordered set a = {a_j}, (j = 1, . . . , N), where a_j is an action of player j. The set of all possible action profiles is denoted as A. For example, in a simplified scenario of the hostage crisis, we have Act_Ind = {Operation, Deal, Nothing}, Act_Pak = {Operation, Safe passage, Nothing}, and Act_Sik = {Blow, Deal, Safe passage}. A possible action profile is (Deal, Nothing, Deal), where India and the Sikhs reach an agreement, and Pakistan chooses to do nothing. For each player i, there is a payoff function U_i : A → IR (also called a utility function), which represents its preferences on the set of action profiles.6 A strategic game in which there are two players can be described easily in a table like that given in figure 1.1. One player's actions are identified with the
rows and the other with the columns. The two numbers in the cell formed by row r and column c are the players' payoffs when the row player chooses r and the column player chooses c; in figure 1.1, the first number in each cell is the payoff of the column player and the second is the payoff of the row player. In the simple example of figure 1.1 the row player is the Sikhs and the column player is India. India can choose between trying to reach a deal (Deal) and launching an operation (Op). The Sikhs can choose between trying to reach a deal (Deal) and blowing up the plane (Blow). If both India and the Sikhs choose Deal, the utility of India7 is 1 and that of the Sikhs is 2. If, for example, the Sikhs choose Deal and India chooses Op, then the utility of the Sikhs is −2 and that of India is 0. The solution concept most commonly used to predict the outcome of a game in game theory is Nash equilibrium (Nash 1953; Luce and Raiffa 1957). This notion captures a steady state of a strategic game, in which each player holds the correct expectation about the other players' behavior and acts rationally.8 An action profile a is a Nash equilibrium of a strategic game, if each player i does not have a different action yielding an outcome that it prefers to that generated when it chooses a_i, given that every other player j chooses a_j. To put it briefly: no agent can profitably deviate, given the actions of the other players. For example, in the game in figure 1.1, (Deal, Deal) is a Nash equilibrium. If India changes to Op, then, given that the Sikhs choose Deal, its utility will decrease to 0. Similarly, if the Sikhs choose Blow, given that India chooses Deal, their utility will be reduced to −1. Note that Deal is always better to India than Op; however, Blow is better for the Sikhs than Deal when India launches a military operation (Op).

              India: Deal    India: Op
Sikhs: Deal     (1, 2)         (0, −2)
Sikhs: Blow     (−2, −1)       (−3, 0)

Figure 1.1 An example of a two-player strategic game in which each player has two actions.
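The equilibrium reasoning above can be checked mechanically. The following Python sketch is an illustration only, not part of the book's model: it encodes the payoff matrix of figure 1.1 and enumerates the pure-strategy Nash equilibria by testing whether either player could profit from a unilateral deviation (the function and variable names are mine).

```python
from itertools import product

# Payoffs from figure 1.1, keyed by (India's action, Sikhs' action).
# Each value is (India's payoff, Sikhs' payoff).
PAYOFFS = {
    ("Deal", "Deal"): (1, 2),
    ("Deal", "Blow"): (-2, -1),
    ("Op",   "Deal"): (0, -2),
    ("Op",   "Blow"): (-3, 0),
}
INDIA_ACTIONS = ("Deal", "Op")
SIKH_ACTIONS = ("Deal", "Blow")

def is_nash(india_action, sikh_action):
    """True if neither player can gain by deviating unilaterally."""
    u_india, u_sikhs = PAYOFFS[(india_action, sikh_action)]
    india_can_gain = any(
        PAYOFFS[(a, sikh_action)][0] > u_india for a in INDIA_ACTIONS
    )
    sikhs_can_gain = any(
        PAYOFFS[(india_action, a)][1] > u_sikhs for a in SIKH_ACTIONS
    )
    return not (india_can_gain or sikhs_can_gain)

equilibria = [p for p in product(INDIA_ACTIONS, SIKH_ACTIONS) if is_nash(*p)]
print(equilibria)  # [('Deal', 'Deal')] -- the equilibrium discussed above
```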
1.5.3 Games in Extensive Form
The strategic form is too abstract to model multiple situations, including a negotiation process. The extensive form, on the other hand, is the most explicit description of a game. It notes the sequence of moves, all possible states of information, and the choices at different stages available to all players of the game. That is, the model allows us to study solutions in which each player can consider his plan of action, not only at the beginning of the game, but also at any point of time at which he has to make a decision. An extensive-form game is a tree together with functions that assign labels to its nodes and edges. An extensive game based on a simple scenario of the hostage crisis is presented in figure 1.2. In this scenario, India acts before the Sikhs. It needs to choose between offering a deal or launching an operation (Op).
Figure 1.2 An example of an extensive-form game.
The deal can be high (DealH) or low (DealL), in which India will release a high number of Sikh prisoners or a low number of Sikh prisoners, respectively. India prefers DealL to DealH, while the Sikhs prefer DealH. If India chooses Op the game ends: if the weather conditions are good, then the operation will succeed and India’s utility will be 5 and the Sikhs’ utility will be −3. However, if the weather conditions are bad, the operation will fail and India and the Sikhs’ utilities will be −3 and 0, respectively. India does not know the weather conditions when making a decision. If India chooses DealH or DealL, then it is the Sikhs’ turn to move. They can either accept the deal (Yes) or blow up the plane (Blow), or if India chose DealL they can make a counteroffer (DealH). In the first two cases the game ends. If the Sikhs choose DealH, then it is India’s turn again. It can either accept the deal (Yes) or launch an operation. The tree in figure 1.2 specifies all these details. In general, the tree of an n-person extensive-form game consists of nodes (the points) and edges that connect between nodes. The leaves of the tree
(i.e., nodes without edges that start there) are called terminal nodes and represent possible ways that the game could end. Each possible sequence of events that could occur in the game is represented by a path of edges from the root to one of these terminal nodes. Each nonterminal node has a player label. The nodes with a player-label i are decision nodes that are controlled by player i. For example, in figure 1.2, it is India’s turn to make a decision in the nodes labeled Ind. The labels on the edges that start at a decision node specify the possible moves of the player of the node. For example, consider in figure 1.2 the three edges that start at the highest node at the left that is labeled by Ind. These edges are labeled with Op, DealH, and DealL, respectively, specifying that India can choose between launching a military operation, or offering deals. There may be events that are determined by chance, that is, they are not under the control of any of the players (e.g., the weather conditions in the hostage crisis scenario). They are represented by nodes labeled c and are called chance nodes. Each edge that starts at a chance node has a label that specifies its probability. At each chance node, these chance probabilities of the edges are nonnegative numbers that sum to 1. In figure 1.2, the root of the tree (i.e., the first node) is a chance node. The edges leaving it are labeled 0.4 and 0.6. Intuitively, this means that there is a probability of 0.4 that the weather will be good, and a probability of 0.6 that the weather will be bad. Each terminal node of the tree is associated with a label that specifies a vector of n numbers that represent the payoffs of the players.9 For example, in figure 1.2 there are 14 terminal nodes. The pair of numbers at each of the terminal nodes represents the payoffs that India and the Sikhs would get if the events of the path ending at this node will occur. For example, if the weather is good and India launches a military operation, then the terminal node labeled 5, −3 (the left top node) will be reached, indicating that India’s utility in this case is 5 and the Sikhs’ utility is −3. The extensive form can also take into consideration the uncertainty of the players about the game. A player may be unable to observe the moves of opponents or the chance moves. Since actions lead from nodes to other nodes, a player who cannot observe the action of another player will not know at which decision node he is located. For example, in India’s first move in the hostage crisis scenario, India doesn’t know the weather conditions, that is, it doesn’t know whether it is in the upper node or the lower node. To show that a player may not be able to distinguish between nodes, one joins the decision nodes that are indistinguishable in a set, called an information set. In diagramming the game trees, information sets are indicated by joining the particular nodes
with a circle. In figure 1.2, the first two nodes of India are joined together by a circle, indicating that India cannot distinguish between them. An information set contains only nodes of the same player, and each node can belong to only one information set. In addition, the possible moves of a player in all the nodes that belong to the same information set are exactly the same. A sequence of events leading to a node is its history. It is generally assumed that the players in an extensive-form game have perfect recall. This assumption asserts that whenever a player moves, he remembers all the information that he knew earlier in the game, including all of his own past moves in the history. A strategy for a player in an extensive-form game is a function that specifies an action for every history after which the player chooses an action. Thus a strategy specifies for a player which action to take at each of its decision nodes. A strategy profile is an ordered set of strategies, one for each player. For example, a possible strategy for India is for it to launch a military operation at its turn to move. As in the strategic games, the Nash equilibrium concept is also used for predicting the outcome of extensive-form games. A strategy profile F = (f_1, . . . , f_N) is a Nash equilibrium of an extensive game if each player i does not have a different strategy yielding an outcome that it prefers to that generated when it chooses f_i, given that every other player j chooses f_j. For example, the strategy profile in which India always launches a military operation (chooses Op) whenever it is its turn to move, and the Sikhs blow up the plane whenever it is their turn to move, is a Nash equilibrium. If India moves first and chooses Op, the Sikhs' action is not relevant anymore. Thus deviating from Blow will not improve their expected utility. Given that the Sikhs' strategy is Blow, if India deviates and does not choose Op, but rather chooses DealH or DealL, the game will terminate with Blow. India's expected utility in such a case is −2, which is lower than its expected utility from Op, which is 0.4 × 5 + 0.6 × (−3) = 0.2. This equilibrium demonstrates that the use of Nash equilibrium in extensive-form games may lead to an absurd equilibrium, since accepting DealH or DealL by the Sikhs yields a higher utility than blowing up the plane. Additional concepts of equilibrium will be discussed and applied to the strategic model.
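The expected-utility comparison behind this equilibrium can be reproduced in a few lines. The sketch below is an illustration, not part of the book: it evaluates India's first move against the Sikhs' "always Blow" strategy, using the chance probabilities and payoffs of figure 1.2 (the function and variable names are mine).

```python
# Weather lottery and India's payoffs, taken from figure 1.2 and the text above.
P_GOOD, P_BAD = 0.4, 0.6
U_OP_GOOD_WEATHER = 5    # operation succeeds in good weather
U_OP_BAD_WEATHER = -3    # operation fails in bad weather
U_PLANE_BLOWN_UP = -2    # India's utility if the Sikhs blow up the plane

def india_expected_utility(first_move, sikhs_always_blow=True):
    """India's expected utility for a first move against 'always Blow' Sikhs."""
    if first_move == "Op":
        # The game ends at once; only the chance node (the weather) matters.
        return P_GOOD * U_OP_GOOD_WEATHER + P_BAD * U_OP_BAD_WEATHER
    if sikhs_always_blow:
        # Any deal offer (DealH or DealL) is answered by Blow.
        return U_PLANE_BLOWN_UP
    raise NotImplementedError("only the 'always Blow' Sikh strategy is modeled")

print(india_expected_utility("Op"))     # 0.4*5 + 0.6*(-3) = 0.2 (up to rounding)
print(india_expected_utility("DealH"))  # -2, so Op is India's best reply to Blow
```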
2 The Strategic-Negotiation Model
In this chapter, we consider situations where a set of agents need to reach an agreement on a given issue. The strategic-negotiation model consists of a protocol for the agents interactions, the utility functions of the agents, and the agents’ strategies. The negotiation protocol is simple: one of the agents makes an offer. The other responds by either accepting the offer, rejecting it, or opting out of the negotiation (Rubinstein 1982). The negotiation ends if all the agents accept the offer or if one of them opts out. If the negotiation proceeds then another agent makes an offer and so on. The utility functions of the agents are used to evaluate different possible terminations of the negotiation. The strategy of an agent specifies which negotiation actions to take in various situations. This chapter discusses the general properties of each part of the model. The application of the model to different domains will be presented in the following chapters, and the modifications and the restrictions to the general model that are needed to enhance a specific application will be discussed. 2.1
2.1 Rubinstein's Protocol of Alternating Offers
The strategic-negotiation model is based on Rubinstein’s model of alternating offers (Rubinstein 1982). In our strategic model there are N agents, Agents = {A1 , . . . , A N }. The agents need to reach an agreement on a given issue. It is assumed that the agents can take actions in the negotiation only at certain times in the set T = {0, 1, 2 . . .} that are determined in advance and are known to the agents. In each period t ∈ T of the negotiation, if the negotiation has not terminated earlier, the agent whose turn it is to make an offer at time t will suggest a possible agreement (with respect to the specific negotiation issue), and each of the other agents may either accept the offer (choose Yes), reject it (choose No), or opt out of the negotiation (choose Opt). If an offer is accepted by all the agents (i.e., all of them choose Yes), then the negotiation ends, and this offer is implemented. If at least one of the agents opts out of the negotiation, then the negotiation ends and a conflictual outcome results. If no agent has chosen “Opt,” but at least one of the agents has rejected the offer, the negotiation proceeds to period t + 1, and the next agent makes a counteroffer, the other agents respond, and so on. In most of the chapters of the book we assume that an agent responding to an offer is not informed of the other responses during the current negotiation period. We call this protocol a simultaneous response protocol.1 j (t) will denote the agent that makes an offer at time period t.
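Read operationally, the protocol is a simple loop. The following Python sketch is one possible rendering of it, assuming each agent object exposes a propose and a respond method; this interface, and the round-robin choice of proposer, are illustrative assumptions rather than something defined in the book. The return values mirror the outcomes (s, t), (Opt, t), and Disagreement defined later in this section.

```python
YES, NO, OPT = "Yes", "No", "Opt"

def negotiate(agents, max_periods=1000):
    """One run of the alternating-offers protocol described above.

    Each agent is assumed to expose propose(t) -> offer and
    respond(offer, t) -> YES / NO / OPT (an illustrative interface).
    Responses are collected before any agent sees the others' answers,
    mirroring the simultaneous response protocol.
    """
    n = len(agents)
    for t in range(max_periods):          # the model imposes no bound on periods;
        proposer = agents[t % n]          # the cap only keeps the sketch finite
        offer = proposer.propose(t)
        responses = [a.respond(offer, t) for a in agents if a is not proposer]
        if any(r == OPT for r in responses):
            return ("Opt", t)             # someone opted out: conflictual outcome
        if all(r == YES for r in responses):
            return (offer, t)             # unanimous acceptance: implement (s, t)
        # otherwise at least one No and no Opt: continue to period t + 1
    return "Disagreement"
```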
Notation        Description
Agents          The set of agents.
N               The number of agents.
T               Negotiation time periods.
S               The set of possible agreements.
U_i             Agent i's utility function.
Possible_i^t    The possible agreements that are not worse for agent i in period t than opting out in period t.
Possible^t      The set of agreements at step t that are not worse for any agent than opting out.
s̃^{i,t}         The best agreement for i in Possible^t.
ŝ^{i,t}         The worst agreement for i in Possible^t.
f_i             Agent i's strategy.
Figure 2.1 A brief description of the notations defined in this chapter and used in the following chapters.
Consider the problem of data allocation presented in example 1.2.1 of chapter 1. In this example the agents negotiate to reach an agreement that specifies the location of all the relevant data items. In the first time period, the first server offers an allocation, and the other agents either accept the offer, reject it, or opt out of the negotiation. If an offer is accepted by all the agents, then the negotiation ends and the proposed allocation is implemented. If at least one of the agents opts out of the negotiation, then a predefined conflict allocation is implemented, as described in Schwartz and Kraus (1997). If no agent has chosen “Opt” but at least one of the agents has rejected the offer, the negotiation proceeds to the next time period and another agent proposes an allocation, the other agents respond, and so on. In the strategic-negotiation model there are no rules that bind the agents to any specific strategy. We make no assumptions about the offers the agents make during the negotiation. In particular, the agents are not bound to any previous offers that have been made. After an offer is rejected, an agent whose turn it is to suggest a new offer can decide whether to make the same offer again or propose a new offer. The protocol provides a framework for the negotiation process and specifies the termination condition, but there is no limit on the number of periods. A fair and reasonable method for deciding on the order in which agents will make offers is to arrange them randomly in a specific order before the negotiation begins.2 That is, the agents will be labeled randomly A1 , . . . , A N . At each time t, j (t) will be Ai where i is equal to (t mod N ) + 1.
Another implementation issue deals with ensuring that if there are more than two agents, an agent responding to an offer is not informed of the other responses during the current negotiation period. If the agents broadcast their responses, one may receive the other responses before it responds and thus gain additional information. The simplest way to prevent this is to ensure that each agent sends its response before it reads the new mail it received after the point in time when the offer had been broadcast. This restriction can be verified by checking the data maintained by the operating systems or the communication system. Another possibility is that each agent sends an encoded response, and will send the key to decode it only after all the other messages have been received. The set of possible agreements is called S. An outcome of the negotiation may be that an agreement s ∈ S will be reached at time t ∈ T . This outcome is denoted by a pair (s, t). When one of the agents opts out of the negotiations at time period t ∈ T , the outcome is (Opt, t). For example, in the data allocation scenario (example 1.2.1), an agreement is an allocation that assigns each data item to one of the servers. In this case S is the set of all possible allocations. The symbol Disagreement indicates a perpetual disagreement, that is, the negotiation continues forever without reaching an agreement and without any of the agents opting out. 2.2
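The "encoded response" idea mentioned above is essentially a commit-then-reveal exchange. The sketch below illustrates one standard way to realize it with a cryptographic hash; it is an illustration of the idea, not a mechanism specified in the book, and the function names are mine.

```python
import hashlib
import secrets

def commit(response: str):
    """Commit to a response: broadcast the digest now, keep response + nonce secret."""
    nonce = secrets.token_hex(16)
    digest = hashlib.sha256(f"{nonce}:{response}".encode()).hexdigest()
    return digest, (response, nonce)

def verify(digest: str, response: str, nonce: str) -> bool:
    """Check a revealed response against the commitment broadcast earlier."""
    return hashlib.sha256(f"{nonce}:{response}".encode()).hexdigest() == digest

# Each agent broadcasts only its digest, waits until every digest has arrived,
# and then reveals (response, nonce).  A reveal that fails verify() is rejected,
# so no agent can change its answer after seeing the others' responses.
digest, secret = commit("Yes")
print(verify(digest, *secret))           # True
print(verify(digest, "Opt", secret[1]))  # False: the commitment pins down "Yes"
```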
Assumptions
The following assumptions will be valid for all situations considered in this book.

1. Rationality: The agents are rational; they try to maximize their utilities and behave according to their preferences.

2. Agents avoid opting out: When an agent's utilities derived from accepting an offer and from opting out are the same, it will accept the offer.

3. Commitments are kept: If an agreement is reached, both sides will honor it.

4. No long-term commitments: Each negotiation stands alone. An agent cannot commit itself to any future activity other than the agreed-upon action.

5. Common beliefs: Assumptions (1)–(4) are common beliefs.
Additional assumptions will be made explicit as needed given specific constraints of a given domain.
2.3 The Agents' Utility Functions
We assume that agents care only about the nature of the outcome of the negotiation and the time at which the outcome is reached, and not about the sequence of offers and counteroffers that leads to the agreement; that is, there is no "decision-regret" (see Raiffa 1982). In particular, we assume that agent A_i ∈ Agents has a utility function over all possible outcomes: U^i : ((S ∪ {Opt}) × T) ∪ {Disagreement} → IR. The nature of the utility function depends on the specific domain of the negotiation. The time and resources spent on the negotiations also affect this utility.

In previous work on formal models of multiagent negotiation (e.g., Rosenschein and Zlotkin 1994) and in game theory, the source of the utility functions or the preferences of the agents, which is the basis of such models, is rarely discussed. It is assumed that each agent knows its utility function (and has some knowledge of its opponents' utility functions). However, a designer of an automated agent is required to provide the agent with a utility function or a preference relation; otherwise, formal models cannot be used for automated agents. We will discuss the design of specific utility functions for several domains later in the book.

Let o denote a possible outcome of the negotiation. The utility functions of the agents belong to one of the following categories.

Fixed losses/gains per time unit: U^i(o, t) = U^i(o, 0) + t · C_i. In this case the agent has a utility function with a constant cost (i.e., C_i < 0) or gain (C_i > 0) due to delay. Here, every agent bears a fixed cost or gain for each period. The fixed cost can be, for example, due to negotiation costs (e.g., communication costs). The fixed gain may be obtained, for example, by using a resource that is the subject of the negotiation.

Time-constant discount rate: U^i(o, t) = δ_i^t · U^i(o, 0), where 0 < δ_i < 1. In the case of a time-constant discount rate, every agent i has a fixed discount rate 0 < δ_i < 1.

Models with a financial system with an interest rate r: U^i(o, t) = (1/(1+r))^t · U^i(o, 0) + C · ((1+r)/r) · (1 − (1/(1+r))^t), where 0 ≤ r < 1. Suppose that a monetary system3 exists in which each agent is able to borrow or to lend money at the current interest rate r. Using the interest rate r, U^i(o, t) is evaluated as the net present value (NPV) of the future utility of the outcome, computed with
respect to the interest rate r. The NPV is used in financial systems in order to find the value of an investment. It is computed by discounting the cash flows at the firm's "opportunity cost" of capital (Copeland and Weston 1992). In this case, the utility of an outcome o reached at time t is:

U^i(o, t) = Σ_{t'=t}^{∞} U^i(o, 0) / (1 + r)^{t'}.

It is easy to show that U^i(o, t) = (1/(1+r))^t · U^i(o, 0). In addition, it is assumed that there is a constant gain (or loss) over the course of the negotiation over time. Taking r into consideration, the constant gain (or loss) is expressed by C · ((1+r)/r) · (1 − (1/(1+r))^t).

Finite-horizon models with fixed losses per time unit: U^i(o, t) = U^i(o, 0) · (1 − t/N̂) − t · C for t ≤ N̂, C ∈ R. This utility function is applicable when it is known in advance that the outcome of the negotiation is valid for N̂ periods and that at the first step of the negotiation the agent can gain U^i(o, 0). As in the previous case, it is also assumed that there is a constant gain (or loss) from the negotiation over time.

In the data allocation example, utility functions of the servers take into consideration factors such as storage costs, retrieval costs, distances between servers, negotiation costs, and so on, as presented in Schwartz and Kraus (1997). The servers in this case will prefer to agree on an allocation earlier, rather than later (i.e., C in the functions above is negative), for two reasons. First, an earlier agreement enables earlier use of the new data items; this increases the servers' benefits due to queries, since they can use the new data items. Furthermore, a delay in reaching an agreement causes an increase in the cost of communication and computation time spent on the negotiation.

The agents' time preferences and the preferences between agreements and opting out are the driving force of the model. They will influence the outcome of the negotiation. In particular, agents will not reach an agreement that is not at least as good as opting out for all of them; otherwise, the agent that prefers opting out over the agreement will opt out.

Definition 2.3.1 (Agreements that are preferred over opting out) For every period t ∈ T and agent A_i ∈ Agents, let Possible_i^t = {s^t | s^t ∈ S, U^i((s^t, t)) ≥ U^i((Opt, t))} be the set of all the possible agreements that are not worse for
agent A_i in period t than opting out in period t. Possible^t is the set of agreements that are preferred by all the agents over opting out, i.e., Possible^t = ∩_{A_i ∈ Agents} Possible_i^t.
In the inequality in the above definition, we allow the utility from (s^t, t) to be greater than or equal to the utility from opting out because, according to the assumption that "agents avoid opting out," an agent will also accept an agreement if it gains equal utility from both. For example, in the data allocation scenario (example 1.2.1), Possible^t includes all the allocations that, if implemented by time t, will not yield any agent a utility lower than its utility at time t from the conflict allocation that is implemented if one of the agents opts out of the negotiation.

There are situations in which one of the agents has greater negotiation power than the others. For example, when one of the agents gains while the others lose over time, the set of agreements that could be reached is still Possible^t. However, the stronger agent may be able to force the others to reach the agreement that is the best for it, among the agreements in Possible^t, by continuing to offer this agreement and refusing any other proposal.

Definition 2.3.2 (The best possible agreement) If Possible^t is not empty, let the agreement s̃^{i,t} ∈ Possible^t be the agreement that satisfies U^i((s̃^{i,t}, t)) = max_{s∈Possible^t} U^i((s, t)).

In some of the domains considered in this book, there are only two agents in the environment, A_i and A_j, and their preferences are strictly conflicting, i.e., U^i((s, t)) ≥ U^i((s', t')) iff U^j((s, t)) ≤ U^j((s', t')). In a strictly conflicting preferences situation, the best agreement for an agent that is still not worse for the other agent than opting out is actually the worst agreement for the other agent. For example, in the agents on Mars case (example 1.2.2 of chapter 1), if two robots need the same resource at the same time, and each would like to use it as much as possible, then their preferences are conflicting. The worst agreement for agent A_i that is still not worse than opting out for the other agents is defined in the next definition.

Definition 2.3.3 (The worst possible agreement) If Possible^t is not empty, let the agreement ŝ^{i,t} ∈ Possible^t be the agreement that satisfies U^i((ŝ^{i,t}, t)) = min_{s∈Possible^t} U^i((s, t)).

It is easy to see that in a strictly conflicting environment of agents A_i and A_j, if Possible^t is not empty, then ŝ^{i,t} = s̃^{j,t}.
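To make these definitions concrete, here is a small sketch that computes Possible^t and the best and worst agreements for a two-agent resource-division setting with time-constant discount rates. The agreement space, the utility numbers, the discount rates, and the opting-out values are all invented for the example.

# Sketch of Possible^t, s~(i,t), and s^(i,t) for a hypothetical two-agent case.
AGENTS = ["A1", "A2"]
S = [0.0, 0.25, 0.5, 0.75, 1.0]      # agreements: A1's share of a divisible resource
DELTA = {"A1": 0.9, "A2": 0.8}       # time-constant discount rates (0 < delta < 1)
OPT_VALUE = {"A1": 0.3, "A2": 0.2}   # assumed value of opting out at time 0

def utility(agent, agreement, t):
    """U^i((s, t)) with a time-constant discount rate."""
    base = agreement if agent == "A1" else 1.0 - agreement
    return (DELTA[agent] ** t) * base

def utility_opt(agent, t):
    """U^i((Opt, t)), here also discounted over time (an assumption of the example)."""
    return (DELTA[agent] ** t) * OPT_VALUE[agent]

def possible(t):
    """Possible^t: agreements not worse than opting out for every agent."""
    return [s for s in S
            if all(utility(a, s, t) >= utility_opt(a, t) for a in AGENTS)]

def best_agreement(agent, t):        # s~(i,t)
    return max(possible(t), key=lambda s: utility(agent, s, t))

def worst_agreement(agent, t):       # s^(i,t)
    return min(possible(t), key=lambda s: utility(agent, s, t))

if __name__ == "__main__":
    print(possible(0))                                  # -> [0.5, 0.75]
    print(best_agreement("A1", 0), worst_agreement("A1", 0))   # -> 0.75 0.5

With these numbers the best agreement for A1 is the worst one for A2, matching the strictly conflicting case discussed above.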
2.4 Negotiation Strategies
An agent's negotiation strategy specifies what the agent should do next, for each sequence of offers s_0, s_1, . . . , s_t. In other words, for the agent whose turn it is to make an offer, the strategy specifies which offer to make next. That is, it indicates to the agent which offer to make at t + 1, if in periods 0 until t the offers s_0, . . . , s_t had been made and were rejected by at least one of the agents, but none of them has opted out. Similarly, in time periods when it is the agent's turn to respond to an offer, the strategy specifies whether to accept the offer, reject it, or opt out of the negotiation.

Formally, a strategy can be expressed as a sequence of functions. The domain of the t'th element of a strategy is a sequence of offers of length t and its range is the set {Yes, No, Opt} ∪ S. For each A_i ∈ Agents, let f_i = {f_i^t}_{t=0}^{∞}, such that for i = j(t), f_i^t : S^t → S, and for i ≠ j(t), f_i^t : S^t × S → {Yes, No, Opt}.

A strategy profile is a collection of strategies, one for each agent (Osborne and Rubinstein 1994). Let Length(f_1, . . . , f_N) be the length of the negotiation process where each agent A_i adopts f_i. Let Last(f_1, . . . , f_N) be the last offer (if there is such an offer). Note that Last(f_1, . . . , f_N) may be either Opt or an agreement.

Definition 2.4.1 (Outcome of the negotiation) Given a strategy profile f_1, . . . , f_N, the outcome function of the negotiation is defined as:

Outcome(f_1, . . . , f_N) = Disagreement, if Length(f_1, . . . , f_N) = ∞;
Outcome(f_1, . . . , f_N) = (Last(f_1, . . . , f_N), Length(f_1, . . . , f_N) − 1), otherwise.

Given a specific situation, we would like to be able to find simple strategies that we could recommend to all agents such that no agent will benefit by using another strategy. We will define this notion formally in the next section.

2.5 Subgame Perfect Equilibria
We are now ready to consider the problem of how a rational agent will choose its negotiation strategy. A useful notion is the Nash Equilibrium (Nash 1953; Luce and Raiffa 1957), which is defined as follows: Definition 2.5.1 (Nash equilibrium) A strategy profile F is a Nash equilibrium of a model of alternating offers, if each Ai does not have a different strategy
yielding an outcome that it prefers to that generated when it chooses f_i, given that every other agent A_j chooses f_j. Briefly: no agent can profitably deviate, given the actions of the other agents.

This means that if all the agents use the strategies specified for them in the strategy profile of the Nash equilibrium, then no agent has a motivation to deviate and use another strategy. However, the use of the Nash equilibrium in a model of alternating offers can lead to absurd Nash equilibria (Tirole 1988): an agent may use a threat that would not be carried out if the agent were actually put in the position to do so, because carrying out the threat would give the agent a lower payoff than it would get by not taking the threatened action. This is because Nash equilibrium strategies may be in equilibrium only at the beginning of the negotiation, but may be unstable in intermediate stages. The concept of subgame perfect equilibrium (SPE) (Osborne and Rubinstein 1994) is a stronger notion, and will be used in order to analyze the negotiation:

Definition 2.5.2 (Subgame perfect equilibrium) A strategy profile is a subgame perfect equilibrium of a model of alternating offers if the strategy profile induced in every subgame is a Nash equilibrium of that subgame.

This means that at any step of the negotiation process, no matter what the history is, no agent has a motivation to deviate and use any strategy other than that defined in the strategy profile. This book will also consider situations of incomplete information. However, when there is incomplete information there is no proper subgame. In the incomplete information situation the sequential equilibrium (Kreps and Wilson 1982), which takes the beliefs of the agents into consideration, will be used. This concept will be defined formally in chapter 4.

2.6 Discussion of the Protocol
The protocol of the strategic-negotiation model is one of alternating offers. Rubinstein (1982) proposed the model of alternating offers, where two players need to reach an agreement on the partition of a pie of size 1 and they cannot opt out of the negotiation. In particular, an agreement in Rubinstein (1982) is a pair (s1 , s2 ) where s1 , s2 ∈ IR + and s1 + s2 = 1. Rubinstein considered situations of fixed discount costs over time and of fixed discount factors. In all cases the negotiation ends with no delay. He extended this model to the case of incomplete information about time preferences in Rubinstein (1985).
There, an agreement may be reached in the first or second time period. Since then, many variations of Rubinstein's model of alternating offers have been studied in the game-theory literature. For example, Shaked and Sutton (1984) propose a version of the model in which a player can opt out of negotiations. A detailed review of the model of alternating offers can be found in Osborne and Rubinstein (1990). However, to use this model in our agents' applications, several important modifications of the game-theory models are required. These mainly concern the way time influences the preferences of the agents, the consideration of discrete agreements, the possibility that both agents can opt out, and the preferences of the agents over opting out. Also, the complexity of finding an equilibrium is important when these models are applied to multiagent environments.

One can oppose the strategic-negotiation approach for solving conflicts among agents by offering a centralized algorithm, or an algorithm in which conflicts like those considered in this book are solved before the system starts running. However, such algorithms are not appropriate in problems where one or more of the following conditions hold:

• Agents do not agree on any entity (oracle) who will resolve their conflict; therefore, a centralized solution is not acceptable;

• The system is dynamic and therefore a predefined conflict resolution (e.g., a timetable for resource usage) cannot be used;

• A centralized solution may become a performance bottleneck. The bottleneck appears in the optimization phase, where all possibilities must be checked in order to arrive at an optimal solution; and

• There is incomplete information and no centralized entity has all the relevant information.
In noncentralized environments the interaction among agents requires protocols that regulate those interactions. Usually, as more restrictions are forced upon the agents by the protocol, the amount of communication required to reach a beneficial agreement decreases. Yet protocols that impose many restrictions may be contradictory to the rationality of an individual agent. In environments of heterogeneous, self-interested rational agents, protocols must be agreed upon and enforced by the agents’ designers. For this, deviation from the protocols must be revealable and penalizable by the interacting agents, or protocols must be self-enforced. The designers of agents will agree upon protocols, provided that they are not advantageous to any particular type of
agent and leave enough opportunity for the individual agents to utilize their resources and strengths. Therefore, the negotiation protocol that is presented in this book places very few restrictions on the agents' interactions and allows the agents to choose specific strategies that will enable them to improve their individual outcomes.

An agent that negotiates with another agent may have incomplete information about its opponent's utility function and may not be sure of how the opponent will evaluate an offer or how it might compare an offer with other options. One of the main factors in such negotiations is the agents' beliefs about their opponents, and about their opponents' beliefs, and so on. Questions such as "what is the other agent's reservation price?" "how will my opponent respond if I reject his offer?" and "is it worthwhile for me to pretend to be someone else?" are common among negotiators. These questions become even more crucial when time is valuable and there are several agents with whom it is possible to cooperate. Under these conditions, the agents may not be able to determine exactly what their opponents' beliefs are and therefore they will be unable to negotiate to the best of their capacity. In some situations, an agent needs to decide whom to negotiate with and how to estimate the possible results from negotiations with the other agents. Our model will be extended later to be applicable to situations with incomplete information.

2.7 Negotiation Models in Distributed Artificial Intelligence
Negotiation has been used in DAI both in distributed problem solving (DPS) where the agents are cooperative and in Multiagent Systems (MA) where the agents are self-interested. Several works in DPS use negotiation for distributed planning and distributed search for possible solutions for hard problems. For example, Conry et al. (1991) suggest multistage negotiation to solve distributed constraint satisfaction problems when no central planner exists. Moehlman, Lesser, and Buteau (1992) use negotiation as a tool for distributed planning: each agent has certain important constraints, and it tries to find a feasible solution using a negotiation process. They applied this approach in the Phoenix fireman array. Lander and Lesser (1992) use negotiation search, which is a multistage negotiation as a means of cooperation while searching and solving conflicts among the agents. The above approaches have been developed for DPS systems, and as such are not applicable to our problems.
For the MA environments, Rosenschein and Zlotkin (1994) have identified three distinct domains where negotiation is applicable and have found a different strategy for each domain: (i) the task-oriented domain: finding ways in which agents can negotiate and come to agreement, and allocating their tasks in a way that is beneficial to everyone; (ii) the state-oriented domain: finding actions that change the state of the "world" and serve the agents' goals; and (iii) the worth-oriented domain: same as the above, but, in this domain, the decision is taken according to the maximum utility the agents gain from the states. Our model belongs to the worth-oriented domain. However, in all of the above domains, time plays no explicit role in the agents' utility functions; therefore, the strategies of Rosenschein and Zlotkin (1994) cannot be used in our model. Sycara (1990; 1987) has presented a model of negotiation that combines case-based reasoning and optimization of multi-attribute utilities. In her work agents try to influence the goals and intentions of their opponents. Kraus and Lehmann (1995) developed an automated Diplomacy player that negotiates and plays well in actual games against human players. Sierra et al. (1997) present a model of negotiation for autonomous agents to reach agreements about the provision of service by one agent to another. Their model defines a range of strategies and tactics, distilled from intuitions about good behavioral practice in human negotiation, that agents can employ to generate offers and evaluate proposals. We apply a formal game-theory model. Zeng and Sycara (1998) consider negotiation in a marketing environment with a learning process in which the buyer and the seller update their beliefs about the opponent's reservation price4 using the Bayesian rule. We consider situations of complete information, but where the utility functions of the agents are more complex, and finding stable strategies depends on the specification of the environment. Sandholm and Lesser (1995b) discuss issues such as levels of commitment that arise in automated negotiation among self-interested agents whose rationality is bounded by computational complexity. These issues are presented in the context of iterative task allocation negotiations, while we consider other problems in addition to task allocation.
3 Negotiations about Data Allocation
Given the enormous volume of data stored in modern distributed information-retrieval systems, information servers will benefit from cooperation with one another. For example, they can share documents by storing only one copy of each document in one of the servers that they all use, in order to save space; they can transfer clients from one server to the other if there is a heavy load on one server, thus increasing the clients' satisfaction while the servers do not need to buy more resources such as CPUs; and they can distribute computations that are needed for answering queries in order to provide faster responses to the clients, given their resources. This chapter considers environments where the servers share documents and thus need to decide where to locate data that are available to them.

The data allocation problem is studied in environments where the servers are autonomous and self-interested. That is, each server in the environment is independent and has its own commercial interests, but would like to cooperate with the other information servers in order to increase its own benefits and make more information available to its clients. This is the case when each server is owned by a different organization, as is common today on the internet. There are also situations where an organization's policy is to create autonomous servers with independent budgets in order to increase the organization's benefits. We assume that there is no central controller to resolve conflicts among the servers. In order to have a central controller, the servers (or their designers or owners) would have to agree in advance, and the central controller would have to be operated and maintained over time. This is difficult in environments like the internet. Furthermore, a central controller may become a bottleneck and thus decrease the performance of the servers.

In this chapter, the application of the strategic-negotiation model to the data allocation problem is presented. Using this model, the servers have simple and stable negotiation strategies that result in efficient agreements without delays. We show that the proposed methods yield better results than the static allocation policy currently used in EOSDIS (see example 1.2.1), which is a specific example of the environment under consideration. In particular, we show that when servers negotiate to reach an agreement on the allocation of data items and they have complete information, various agreements can be reached. We prove that for any possible allocation of the data items that is not worse for any of the agents than opting out, there is a set of stable strategies (one for each server) that leads to this outcome. That is, suppose alloc is a specific allocation that all the servers prefer to opting out of the negotiation. A strategy for each of the servers can be designed such that the strategy profile
will be an equilibrium. If the servers use this strategy profile, the negotiations will end at the first time period with the agreement alloc. The details of the allocations that are not worse for any of the agents than opting out depend on the specific settings of the environment in a given negotiation session. Thus there is no way to identify these allocations in advance. In addition, there are usually several allocations that are not worse for any of the agents than opting out. Finding all these allocations is intractable. In addition, after identifying these allocations the servers should agree on one of them as the basis for the negotiation.1 Of course, each of the servers may prefer a different allocation because it may yield a higher utility. A mechanism by which the servers can choose one of these profiles of stable strategies is presented. It leads to satisfactory results for all of the servers. In this mechanism each server proposes an allocation, and the one that maximizes a social-welfare criterion (e.g., the sum of the servers' utilities) is selected. We propose several heuristic search algorithms to be used by the servers to find such allocations.

There are situations where the servers have incomplete information about each other. I consider such situations and add a preliminary step to the strategic negotiation where the servers reveal some of their private information. When the servers use the proposed revelation mechanism, it is beneficial for them to truthfully report their private information. After the preliminary step, the negotiation continues as in the complete information case and yields better results for all the servers than the static allocation policy currently used in EOSDIS. Thus the overall process in this case is as follows. First, each server broadcasts its private information. If a lie is detected, then the liar is punished by the group. In the next step each server searches for an allocation, and then each of them simultaneously proposes one. The allocation that maximizes the predefined social-welfare criterion is selected. Then the servers construct the equilibrium strategies based on the chosen allocation and they start the negotiation using the alternating-offers protocol. In the first step of the negotiation, the first agent proposes the selected allocation, and the others accept it.

In addition to the theoretical results, simulation results that demonstrate the effect of different parameters of the environment on the negotiation results are also presented. For example, when the servers are more willing to store data locally, better agreements can be reached. The reason for this is that in such situations there are fewer constraints on finding agreements that are better for all the servers than opting out, and it is easier to find a beneficial compromise. The servers are more willing to store data locally when the storage costs and the cost of delivering locally stored documents to other servers are low.
3.1 Description of the Data Allocation Environment
In the environment considered in this chapter, there are several information servers connected by a communication network. Each server is located in a different geographical area and receives queries from clients in its area. A server, in response to a client's query, sends back information stored locally or information stored by another server, which it retrieves from that server for a price. In this model, each server has its own interests and wants to maximize its own utility. Its utility function is described below and considers the location of each data item, including data that are stored by any other server. This section formally presents the data allocation problem and defines its basic components. First, the environment in which the negotiation will take place is defined.

Definition 3.1.1 A data allocation environment (DAE) is a tuple ⟨. . . , DS, U⟩, where DS is the set of datasets to be allocated and U = {U^i} is the set of the servers' utility functions, defined, as in chapter 2, over agreements, Opt, and Disagreement (so that U^i(Disagreement) is server i's utility from perpetual disagreement).

The definition of the utility function of a server will be presented in stages. First, the utility from one dataset will be specified. Then, the utility of an allocation at the first step of the negotiation will be discussed, and finally, the utility when there is a delay in reaching an agreement will be presented.

3.1.2.1 Utility from One Dataset

The utility function of an agent from a given allocation of a set of datasets is the combination of its utility from the assignment of each dataset. Taking the above parameters into consideration, and the assumption that only one copy of each document is stored by all the servers, the utility for a server from the assignment of one dataset, ds, to a certain location, loc, is given by the following attribute.

Attribute 3.1.1 For each server i ∈ SERV, ds ∈ DS, and loc ∈ SERV:

V^i(ds, loc) = local(ds), if loc = i; and V^i(ds, loc) = remote(ds, loc), otherwise,

where

local(ds) = usage(i, ds) · query price − storage cost · dataset size(ds) − Σ_{d∈SERV} usage(d, ds) · distance(d, i) · answer cost

and

remote(ds, loc) = usage(i, ds) · query price − usage(i, ds) · distance(i, loc) · retrieve cost.

3.1.2.2 Utility from an Allocation with No Delay in Negotiation

The utility of a server from an allocation consists of the utility from the assignment of each dataset. U^i will first be introduced while ignoring the cost of negotiation delay.

Attribute 3.1.2 For each server i ∈ SERV and alloc ∈ S: U^i(alloc, 0) = Σ_{x∈DS} V^i(x, alloc(x)).
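A minimal sketch of Attribute 3.1.1 and Attribute 3.1.2, assuming the usage, distance, and size parameters referred to above are supplied as tables; the price and cost constants below are placeholders invented for the example, not values from the book.

# Sketch of V^i(ds, loc) and U^i(alloc, 0) for the data allocation environment.
QUERY_PRICE = 10.0
STORAGE_COST = 0.01      # per unit of dataset size
ANSWER_COST = 0.002      # per unit of distance, per query answered for another area
RETRIEVE_COST = 0.003    # per unit of distance, per query retrieved remotely

def v(i, ds, loc, servers, usage, distance, dataset_size):
    """V^i(ds, loc): server i's utility from dataset ds being stored at server loc."""
    if loc == i:   # local(ds)
        return (usage[i][ds] * QUERY_PRICE
                - STORAGE_COST * dataset_size[ds]
                - sum(usage[d][ds] * distance[d][i] * ANSWER_COST for d in servers))
    # remote(ds, loc)
    return (usage[i][ds] * QUERY_PRICE
            - usage[i][ds] * distance[i][loc] * RETRIEVE_COST)

def u0(i, alloc, servers, usage, distance, dataset_size):
    """U^i(alloc, 0): the sum of V^i over all allocated datasets (no negotiation delay)."""
    return sum(v(i, ds, loc, servers, usage, distance, dataset_size)
               for ds, loc in alloc.items())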
According to the above definition, the utility from the assignment of one dataset is independent of the assignment of the other datasets. This property reflects that we do not take into consideration the overall load of the system, which may cause delays in transferring information and which yields higher
transfer costs. However, a severe load on one server is prevented, since, as is presented later, the servers reach fair agreements.

3.1.2.3 Effect of Negotiation Time on the Utility Function

For a server participating in the negotiation process, the time when an agreement is reached is very important, for two reasons. First, there is the cost of communication and computation time spent on the negotiation. Second, there is the loss of unused information: until an agreement is reached, new documents cannot be used. Thus, since the servers receive payment for answering queries, they wish to reach an agreement as soon as possible. Usage of a stored dataset is considered to decrease over time, since the information becomes less current. In our environment, if there is a constant discount ratio of dataset usage and storage cost, then there is also a constant discount ratio of the utility from an allocation. Thus the utility function of an agreement depends on the details of the agreement and on its time. In particular, we consider functions with both fixed losses per time unit and a time-constant discount rate. We assume that the usage of each dataset by each area is reduced over time, with a discount rate of 0 < δ < 1, and the storage cost per dataset is reduced with the same discount rate. We will also consider environments with a monetary system, as discussed in section 2.3. In this case we will also assume that there is a fixed cost for the negotiation.

A1d Utility over time: The utility functions have one of the following forms:

Constant discount rate: Consider a server i ∈ SERV. For alloc ∈ S and t ∈ T, U^i(alloc, t) = δ^t · U^i(alloc, 0) − t · C, where C > 0 is the constant negotiation cost for each time delay, and 0 ≤ δ < 1.

Monetary system with an interest rate r: Consider a server i ∈ SERV. For alloc ∈ S and t ∈ T, U^i(alloc, t) = (1/(1+r))^t · U^i(alloc, 0) − C · ((1+r)/r) · (1 − (1/(1+r))^t), where 0 ≤ r < 1 and C > 0.

3.1.3 Properties of the Utility Functions
A few lemmas on the properties of the utility functions will be proven in this section. They will be used later for finding which strategy profile is in equilibrium. First, an additional assumption is made with respect to the utility of the conflict allocation that is implemented if an agent opts out of the negotiation. The conflict allocation may be better for an agent than some other allocations in S. However, in assumption A2d below we assume that the utility from the conflict allocation is never negative. This attribute is necessary in order to ensure
that servers prefer reaching an agreement sooner rather than later when their utility function is as described in section 3.1.2.

A2d Opting out at the first period: The utility of each server from the conflict allocation in the first time period is always greater than or equal to zero.

If the conflict allocation means no allocation of any dataset, then the utility that a server derives from the conflict allocation is always zero. However, the assumption also states that in the case of static allocation, the utility for each server is positive. A positive utility from the conflict allocation can be obtained when the price of queries is sufficiently high.

The following lemma proves that reaching an agreement earlier is better than reaching it later, whenever the utility derived by the server from this agreement in the first time period of the negotiation is nonnegative. It also states that the utility derived from the conflict allocation (denoted conflict alloc) decreases over time. Finally, it states that the relation between the utilities of two offers that yield positive utility at time 0 is independent of time. This trait is important for the structure of our negotiation process.

Lemma 3.1.1 Consider a model that satisfies assumptions A1d and A2d. For every t1, t2 ∈ T, i ∈ SERV, and alloc1, alloc2 ∈ S the following hold:

Agreements over time: If U^i(alloc1, 0) ≥ 0 and t1 < t2, then U^i(alloc1, t2) ≤ U^i(alloc1, t1).

Opting out costs over time: If t1 < t2, then U^i(conflict alloc, t2) ≤ U^i(conflict alloc, t1).

Relations between offers: If t1 < t2, U^i(alloc1, t1) > 0, and U^i(alloc2, t1) > 0, then U^i(alloc1, t1) > U^i(alloc2, t1) iff U^i(alloc1, t2) > U^i(alloc2, t2).

Proof: The proof is straightforward using the assumptions and the definition of the utility functions. We will demonstrate it by proving the last claim, the relations between offers. The two cases of A1d are considered.

Constant discount ratio of dataset usage and storage cost: for all alloc ∈ S, t ∈ T, and i ∈ SERV, U^i(alloc, t) = δ^t · U^i(alloc, 0) − t · C, with 0 < δ < 1 and C ≥ 0. Thus, if U^i(s1, t1) > U^i(s2, t1), then δ^{t1} · U^i(s1, 0) − t1 · C > δ^{t1} · U^i(s2, 0) − t1 · C. Since δ^{t1} > 0, U^i(s1, 0) > U^i(s2, 0). Thus, δ^{t2} · U^i(s1, 0) > δ^{t2} · U^i(s2, 0),
and δ^{t2} · U^i(s1, 0) − t2 · C > δ^{t2} · U^i(s2, 0) − t2 · C. Therefore, U^i(s1, t2) > U^i(s2, t2).

Monetary system with an interest rate r: for all alloc ∈ S, t ∈ T, and i ∈ SERV, U^i(alloc, t) = (1/(1+r))^t · U^i(alloc, 0) − C · ((1+r)/r) · (1 − (1/(1+r))^t), with C ≥ 0 and 0 < r < 1. Since 0 < r < 1, also 0 < 1/(1+r) < 1. If U^i(s1, t1) > U^i(s2, t1), then

(1/(1+r))^{t1} · U^i(s1, 0) − C · ((1+r)/r) · (1 − (1/(1+r))^{t1}) > (1/(1+r))^{t1} · U^i(s2, 0) − C · ((1+r)/r) · (1 − (1/(1+r))^{t1}).

Since (1/(1+r))^{t1} > 0, U^i(s1, 0) > U^i(s2, 0). Thus, (1/(1+r))^{t2} · U^i(s1, 0) > (1/(1+r))^{t2} · U^i(s2, 0), and therefore (1/(1+r))^{t2} · U^i(s1, 0) − C · ((1+r)/r) · (1 − (1/(1+r))^{t2}) > (1/(1+r))^{t2} · U^i(s2, 0) − C · ((1+r)/r) · (1 − (1/(1+r))^{t2}), so U^i(s1, t2) > U^i(s2, t2).
The following is an important corollary of the above lemma concerning the relationships over time between the utility of any offer and the utility of opting out.

Corollary 3.1.1 (Conflict allocation and other offers) Consider a model that satisfies assumptions A0d–A2d. For each i ∈ SERV, alloc ∈ S, and t1, t2 ∈ T, if U^i(alloc, t1) > U^i(conflict alloc, t1), then U^i(alloc, t2) > U^i(conflict alloc, t2).

As was discussed in chapter 2, only agreements that are not worse for all agents than opting out can be reached. Since the relation between the utilities of offers and the utility of the conflict allocation that is implemented when one of the servers opts out does not change over time in the data allocation environment, the set of offers that are not worse for agent i than the conflict allocation is independent of the negotiation time. In the following, definition 2.3.1 is adapted to fit this property. That is, we define the set of offers that are not worse for agent i than the conflict allocation, and the set of offers that are not worse than the conflict allocation for all the agents.

Definition 3.1.2 (Offers that are not worse than the conflict allocation) For every t ∈ T and i ∈ SERV, we define

Possible_i^t = {alloc | alloc ∈ S and U^i(alloc, t) ≥ U^i(conflict alloc, t)}.

Since, for every t, t' ∈ T and i ∈ SERV, Possible_i^t = Possible_i^{t'}, we set Possible_i = Possible_i^t.
The set of all offers that are individually rational, namely the offers that are not worse than the conflict allocation for all the agents, is Possible = ∩_{i∈SERV} Possible_i.
The strategies that the servers will use in the negotiation are identified in the next section.

3.2 Negotiation Analysis—Complete Information
In all the situations considered, the servers are uncertain about the future usage of the datasets. This section considers an environment with complete information, in which the servers know the expected usage of each dataset by the clients located in each area, but do not know the actual future usage. Section 3.4 considers an environment with incomplete information in which each server does not know the expected usage of each dataset by clients of each area; each server knows only the past usage of the datasets stored locally, and the past usage of datasets by clients in its area.

3.2.1 Multiple Equilibria
Here we will show that when the servers negotiate to reach an agreement on the allocation of the datasets and they have complete information, a large number of agreements can be reached. In particular, for any possible allocation of the datasets that is not worse for any of the agents than the conflict allocation, there is a subgame-perfect equilibrium that leads to this outcome. The proof is based on an idea proposed by H. Haller (1986). The intuition behind this proof is as follows. Suppose the strategy profile is constructed to lead to alloc. If all the agents accept alloc and opt out if a different proposal is made, then, since the agent that makes an offer prefers alloc to opting out, it is better for it to offer alloc. Therefore, suppose the agent whose turn it is to make an offer proposes alloc, and all the agents but one, i, accept it. Since agent i prefers alloc over opting out, it is not worthwhile for it to opt out. If agent i rejects the offer, the best option it can expect in the next time period is reaching alloc. But, since all the agents lose over time, it is better for agent i to accept the offer now than to reject it and, in the best scenario, reach agreement alloc in the next time period.6

Theorem 3.2.1 Consider a model that satisfies assumptions A0d–A2d. If there are N ≥ 3 agents, then for each offer alloc ∈ Possible, i.e., an offer that is not worse for any of the agents than opting out, there is a subgame-perfect
equilibrium of the alternating-offers negotiation with the outcome alloc offered and unanimously accepted in period 0.

Proof: For any time t, let us define a rule R_t to be used at time t by agent i = j(t) and a rule E_t to be used by the agents that are not j(t):

For i = j(t):
R_t: Offer alloc.

For agent i ≠ j(t):
E_t: Accept, if alloc was offered; opt out, if an offer different from alloc was made.

These rules define a strategy f_i for any agent i. The claim is that (f_1, . . . , f_N) is a subgame-perfect equilibrium with the outcome that alloc is offered and unanimously accepted in period 0. Clearly, (f_1, . . . , f_N) leads to this outcome. It remains to be shown that (f_1, . . . , f_N) is a subgame-perfect equilibrium. Let t ≥ 0 and i ∈ SERV, and consider a subgame starting in period t.

Offer case: If i has to make an offer, a violation of R_t by i leads to the other agents opting out. Thus the conflict allocation is implemented, but this is not better (and probably worse) for i than reaching the agreement alloc. Therefore, in that case i has no better response than offering alloc.

Responding case: If agent i has to respond, then either R_t was violated by j(t), or it was followed. If R_t was violated, then, since N ≥ 3, there is an agent k ∉ {j(t), i} who will opt out of the negotiation according to the strategy f_k. Since i is not decisive, deviating from opting out won't improve its outcome. If R_t was followed by j(t), who offered alloc: if i accepts, then, since all the other agents will accept it too according to their strategies, the offer is accepted and i's utility is U^i((alloc, t)). If i opts out, the opting-out solution will be implemented, but i prefers alloc over opting out. If i rejects, then it cannot get more than U^i((alloc, t + 1)): if it deviates from R_s at a certain s > t, it will get U^i(Opt); otherwise, it will get U^i(alloc, t + l) with l ≥ 1. But i prefers alloc today, since it is losing over time. Hence accepting alloc is optimal. Therefore, following E_t is optimal for i in the responding case.

Consequently, for any i ∈ SERV, following the rules is optimal in any subgame, if the other
agents follow the rules. Hence (f_1, . . . , f_N) is a subgame-perfect equilibrium.

According to the above result, the order in which the agents make offers does not influence the outcome of the negotiation. However, the above theorem shows that the number of equilibria can be very large. The selection of one equilibrium in such situations is very difficult. Since the agents are self-motivated, there is no single equilibrium that is the "best" for all the agents. In addition, one might say that the strategies in the equilibria of the theorem are strange: given an allocation x, the strategy of agent i in the equilibrium associated with x makes i opt out for any offer that is different from x, even though the offer is better for i than opting out. However, the agent will follow this strategy if it knows that the other agents will also follow it, and given that the others follow these strategies it cannot benefit from deviation. That is, if it is known that this equilibrium was chosen by all the agents, deviation by one of them is not beneficial.

Game theory has developed equilibrium selection theories that can distinguish between Nash equilibria (e.g., Harsanyi and Selten 1988; Huyck et al. 1990; Cooper et al. 1990). A significant amount of work has been performed on the evolution of conventions in games that are played repeatedly within a population (e.g., Young 1993a, 1993b; Kandori, Mailath, and Rob 1993; Bhaskar 1997). These conventions lead to the selection of one equilibrium by the agents. Another approach to selecting an equilibrium is to use "cheap talk," which may be roughly defined as nonbinding, non-payoff-relevant preplay communication (Farrell 1988; Jamison 1997). We propose that the convention for selecting an equilibrium be agreed upon by the designers of the agents rather than evolve from repeated interactions. It should lead to an equilibrium that is Pareto optimal. We discuss the proposed convention in the next section and, owing to the complexity of the data allocation problem, we use a "cheap-talk" stage to enable the agents to start the negotiation without a long delay.
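The rules R_t and E_t of theorem 3.2.1 translate directly into a strategy object. The sketch below reuses the placeholder propose/respond interface of the protocol sketch in chapter 2 and only illustrates the strategies themselves, not the equilibrium argument.

# Sketch of the equilibrium strategies of theorem 3.2.1 for a chosen target
# allocation: the proposer always offers the target (rule R_t); a responder
# accepts the target and opts out on anything else (rule E_t).

class TargetAllocationStrategy:
    def __init__(self, target_alloc):
        self.target = target_alloc

    def propose(self, t, history):            # rule R_t
        return self.target

    def respond(self, t, offer, history):     # rule E_t
        return "Yes" if offer == self.target else "Opt"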
3.2.2 Choosing the Allocation
As was shown, if the agents follow the negotiation protocol described above, any offer x, which gives each server at least its conflict utility, has a subgame perfect equilibrium, which leads to the acceptance of x during the first period of the negotiation. People use conventions that emerge over time and we propose that the designers of the agents will agree on a convention. Since the allocation set and the servers’ utilities strongly depend on the exact settings of the
negotiation session, a convention cannot be a specific equilibrium but should be a mechanism for choosing one. In particular, we propose that the designers of the servers agree in advance on a joint technique for choosing x that on average will give each agent a beneficial outcome. Thus, in the long run, this convention is beneficial. We propose that the designers decide on a mechanism that will find an allocation that must, at the very least, give each agent its conflict utility and, under these constraints, maximize some social-welfare criterion, such as the sum of the servers' utilities, or the generalized Nash product of the servers' utilities (Nash 1950), i.e., Π_i (U^i(x) − U^i(conflict alloc)).7 These methods will lead to Pareto-optimal allocations. However, the designers may agree on other methods as well, preferably ones that will lead to Pareto-optimal outcomes. Note that the utility functions of the servers will be calculated according to the usage revealed by the servers in the revelation process. In addition to mechanisms for choosing x, the designers will provide their agents with the strategies of theorem 3.2.1, which are in perfect equilibrium and lead to the acceptance of a chosen allocation in the first negotiation period.8

The main problem with the mechanisms mentioned above is, as we will prove in section 3.3.2, that finding an allocation maximizing a welfare criterion with restrictions on the servers' utilities is NP-complete.9 Thus searching for an optimal solution is not practical in a large system. Hence a possible tractable mechanism is to search for suboptimal solutions. If the problem can be solved in a reasonable amount of time or if there is a known deterministic algorithm that achieves good results, then each agent can run the same algorithm and find the same x, which will be the allocation offered and accepted in the negotiation. Nevertheless, as will be described below, randomized methods may be more beneficial than deterministic ones for all the agents. However, the agents must jointly agree on the same allocation x, and if they use a random mechanism, each will find a different allocation.

To solve this problem, we divide the negotiation protocol into two phases in situations where randomized methods are beneficial. In the first phase, each server will search for an allocation using any search algorithm and resources it has. All the agents will simultaneously broadcast their findings to the other agents at the end of the phase, and the one with the highest value of the social-welfare criterion agreed upon by the designers will be designated as the chosen x. Using game-theory concepts, the exchange of messages in this phase can be referred to as cheap talk. In the second phase, the negotiation will be performed using the perfect equilibrium strategies with respect to x.
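A sketch of the first-phase selection step, assuming the agreed criterion is the sum of the servers' utilities and that utility(i, alloc) and conflict_utility(i) are supplied as callables; these names are placeholders, not the book's notation.

# Sketch of choosing x among the allocations broadcast in the first phase:
# keep only candidates that give every server at least its conflict utility,
# then pick the one maximizing the agreed social-welfare criterion.

def social_welfare(alloc, servers, utility):
    return sum(utility(i, alloc) for i in servers)

def choose_allocation(proposals, servers, utility, conflict_utility):
    acceptable = [alloc for alloc in proposals
                  if all(utility(i, alloc) >= conflict_utility(i) for i in servers)]
    if not acceptable:
        return None   # no acceptable proposal; the conflict allocation remains
    return max(acceptable, key=lambda a: social_welfare(a, servers, utility))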
In the first phase, all the agents try to maximize the social-welfare criterion. However, when using such a protocol, where each agent makes an offer and the one that maximizes a welfare criterion is chosen, it may be worthwhile for a server that has significantly more computational power than the others to deviate and try to find an offer that is good primarily for itself, and only secondarily so for society. But, if the servers have similar computation powers, then their strategies that do not maximize the social-welfare criterion are not stable. If we promise a bonus that is high enough for the server whose suggestion has been accepted in the first phase, then maximizing the social-welfare criterion becomes the best strategy for all agents. In any event, even if the servers do deviate and each tries to maximize its own utility function, the results obtained are still good. In section 3.3.4.3 I present the result of testing the case in which each server maximizes its own utility. The results are still not worse than those obtained by the static allocation.

3.3 Complexity and Heuristics Methods for the Allocation Problem

3.3.1 The Optimization Problem
In section 3.2.1 it was proven that any offer not worse for any of the servers than the conflict allocation can be implemented. We suggest choosing an allocation that maximizes a given social criterion. The allocation should be not worse for all the agents than the conflict allocation. The formal problem of finding an appropriate allocation is presented below. The term guaranteed utility_i will be used to denote the utility that any selected offer should guarantee for server i. In particular, guaranteed utility_i is the utility for server i from the conflict allocation, since, if it will not derive at least the same utility, it will prefer to opt out rather than accept the offer.

Suppose that the designers of the agents have agreed on a function f as the social-welfare criterion. To find a specific offer that will be acceptable in the negotiation process at time period t, an agent has to find the best allocation according to function f that yields at least guaranteed utility_i for each agent i. That is, it needs to consider the allocations in the set Possible that maximize f:

alloc^t = argmax_{alloc∈Possible} f(alloc).

The following functions are considered:

• The sum of the servers' utilities: Σ_{i∈SERV} U^i(alloc, t). In particular, for alloc^0, Σ_{i∈SERV} U^i(alloc, 0) is used.

• The generalized Nash product of the servers' utilities: Π_{i∈SERV} (U^i(alloc, t) − U^i(conflict alloc, t)).
Similarly, the relevant case is when t = 0.

An algorithm that the agent will use in order to find such an allocation remains to be designed. However, this problem is intractable. Thus efficient algorithms for this problem do not exist, and therefore heuristic and suboptimal approaches for solving it will be presented. The proof that finding an allocation that maximizes function f under the constraints is NP-complete is presented formally below.

3.3.2 NP-Completeness
In the following section, the decision versions of the problems described above will be defined and will be proven to be NP-complete. The following two problems will be defined: maximizing the sum of the agents' utilities, and maximizing the generalized Nash product of the servers' utilities.

3.3.2.1 SBDA: Sum Best Dataset Allocation The decision version of this maximization problem is:

Instance: the set SERV, a set DS with n ∈ IN datasets, the utility functions {U^i}, {guaranteed utility_i}, and C ∈ R.

Question: find out whether there is an allocation alloc where each server i ∈ SERV gets at least guaranteed utility_i, and the value of the sum of utilities Σ_{i∈SERV} U^i(alloc, 0) is greater than C.

3.3.2.2 PBDA: Product Best Dataset Allocation The decision version of this maximization problem is:

Instance: the set SERV, a set DS with n ∈ IN datasets, the utility functions {U^i}, {guaranteed utility_i}, and C ∈ R.

Question: find out whether there is an allocation alloc where each server i ∈ SERV gets at least guaranteed utility_i, and the value of the product Π_{i∈SERV} (U^i(alloc, 0) − guaranteed utility_i) is greater than C.
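Both decision problems come with an obvious polynomial-time verifier, which is what the membership-in-NP part of the proof below relies on. A sketch, with the utility function and the guaranteed-utility values assumed as inputs:

import math

# Sketch of the polynomial-time checks behind SBDA and PBDA: verify the
# guaranteed-utility constraints and compare the welfare value with the bound C.

def sbda_check(alloc, servers, utility, guaranteed, c):
    """True iff alloc gives every server its guaranteed utility and the sum exceeds C."""
    if any(utility(i, alloc) < guaranteed[i] for i in servers):
        return False
    return sum(utility(i, alloc) for i in servers) > c

def pbda_check(alloc, servers, utility, guaranteed, c):
    """Same constraints, but with the generalized Nash product compared to C."""
    if any(utility(i, alloc) < guaranteed[i] for i in servers):
        return False
    return math.prod(utility(i, alloc) - guaranteed[i] for i in servers) > c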
The decision problem is no more difficult than the corresponding optimization problem. Clearly, if we could find an allocation that maximizes the sum of utilities in polynomial time, then we could also solve the associated decision
problem in polynomial time. All we need to do is find the maximum utility and compare it to C. Thus, if we could demonstrate that SBDA is NP-complete, we would know that the original optimization problem is at least as difficult, and the same holds for the other problem.

Theorem 3.3.1 SBDA and PBDA are NP-complete.
Proof: SBDA and PBDA are in NP: when a solution alloc is given, it can be checked in polynomial time to ensure that it is better for all the agents than the initial allocation, and that the sum, or product, is higher than the required value C. In order to prove that they are NP-hard, a reduction from multiprocessor scheduling is presented. The multiprocessor scheduling (MS) problem is defined in (Garey and Johnson 1979) as follows:

Instance: A finite set A of n tasks, a length l(a) ∈ Z+ for each task a ∈ A, a number m ∈ Z+ of processors, and a deadline D ∈ Z+.

Question: Is there a partition A = A_1 ∪ A_2 ∪ . . . ∪ A_m of A into m disjoint sets such that max_{1≤j≤m} {Σ_{a∈A_j} l(a)} ≤ D?

The problem is NP-complete, and its proof follows immediately from the partition problem. The reduction from MS to SBDA and to PBDA is as follows. Each server corresponds to one processor, and each dataset corresponds to one task. The utility function for server i from each dataset location will be D/n − l(a) when the dataset is allocated to it, D/n if it is allocated to another server, and −D if it is not allocated at all. Furthermore, for each server i, guaranteed utility_i = 0. Finally, C is taken to be 0 for all the maximization cases.

If there is an allocation that provides each server with at least its conflict utility, then each server has a utility of at least 0. This means that all the datasets are allocated, and that for each server i, Σ_{a: alloc(a)=i} l(a) ≤ D; otherwise the utility of i would be negative. Conversely, if there is a partition of the tasks that meets the deadline, then the same allocation will be feasible for SBDA and PBDA. This is because all the datasets are allocated and, for each server i, Σ_{a∈A_i} l(a) is at most D, so the utility of server i, which equals n · D/n − Σ_{a∈A_i} l(a) = D − Σ_{a∈A_i} l(a), is at least 0.
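The construction used in the proof can be spelled out directly. The following sketch builds the utility function of the reduction from an MS instance; the function names and the list representation of an allocation are choices made for the example, not the book's notation.

# Sketch of the reduction of theorem 3.3.1: from a multiprocessor-scheduling
# instance (task lengths, m processors, deadline D) to an SBDA/PBDA instance.
# An allocation is a list alloc with alloc[ds] a server index or None.

def reduce_ms(lengths, m, deadline):
    n = len(lengths)                       # one dataset per task
    servers = list(range(m))               # one server per processor

    def utility(i, alloc):
        """U^i(alloc, 0) under the per-dataset values used in the proof."""
        total = 0.0
        for ds, loc in enumerate(alloc):
            if loc is None:
                total -= deadline                       # -D if not allocated at all
            elif loc == i:
                total += deadline / n - lengths[ds]     # D/n - l(a) if stored locally
            else:
                total += deadline / n                   # D/n if stored elsewhere
        return total

    guaranteed = {i: 0.0 for i in servers}              # guaranteed utility_i = 0
    c = 0.0                                             # the bound C
    return servers, utility, guaranteed, c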
3.3.3 Implementation of Search Methods
In the previous section I proved the NP-completeness of maximizing the sum and the product under the constraints on the utilities of each of the agents. Hence
obtaining an optimal allocation of datasets to servers is not computationally feasible. Thus we use a suboptimal method in order to find alloc. This section proposes heuristic algorithms that can be used for finding suboptimal solutions to each of the problems above. We have tested several algorithms:

Backtracking algorithm: a deterministic algorithm (Prosser 1993), which in the worst case needs time exponential in the number of datasets to be allocated in order to find the first solution.

Backtracking on subproblems: this algorithm uses the backtracking algorithm on subproblems and merges the solutions into one allocation. That is, if there are M datasets to distribute, they are arbitrarily divided into L subsets and the backtracking algorithm is run for each subgroup separately.

Hill-climbing: the method of "random-restart hill-climbing" (Minton et al. 1992) is used. The hill-climbing search starts with an arbitrary allocation and attempts to improve it; that is, it tries to change assignments of datasets in order to satisfy more constraints, until there is no way to improve the solution. Random-restart hill-climbing conducts a series of hill-climbing searches from randomly generated initial states, running each until it halts or until it makes no significant progress, and saves the best result found so far from all of the searches. There are several stop criteria that can be used, such as running a fixed number of iterations, or running until the best saved result has not been improved for a certain number of iterations. We run the algorithm for a fixed number of CPU time units, in order to enable comparison of its results with the results of the other algorithms we have implemented. The algorithm is presented in (Schwartz 1997).

Genetic algorithm: a genetic algorithm (Goldberg 1989) for the data allocation problem. The simulated population contains a set of individuals. Each individual has a data structure that describes its genetic structure; the genetic structure, in our case, is an allocation. Given a population of individuals corresponding to one generation, the algorithm simulates natural selection and reproduction to obtain the next generation.

The simulations showed that when there is a time constraint, the backtracking-on-subproblems algorithm achieves better results than the regular backtracking algorithm. The simulations described below show that the hill-climbing algorithm, which is nondeterministic, achieved significantly better results than the other algorithms used. We propose using backtracking on subproblems for large problems when a deterministic algorithm has to be used; otherwise, the hill-climbing algorithm should be used.
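A sketch of the random-restart hill-climbing variant described above, assuming the objective callable encodes the chosen social-welfare criterion (with the guaranteed-utility constraints enforced, e.g., by a large penalty); the neighborhood move simply reassigns a single dataset.

import random

def hill_climb(datasets, servers, objective, restarts=20, max_passes=500):
    """Random-restart hill-climbing: keep the best allocation found over all restarts."""
    best_alloc, best_val = None, float("-inf")
    for _ in range(restarts):
        alloc = {ds: random.choice(servers) for ds in datasets}   # random initial state
        value = objective(alloc)
        for _ in range(max_passes):
            improved = False
            for ds in datasets:                      # try reassigning one dataset
                for srv in servers:
                    if srv == alloc[ds]:
                        continue
                    old = alloc[ds]
                    alloc[ds] = srv
                    new_value = objective(alloc)
                    if new_value > value:
                        value, improved = new_value, True
                    else:
                        alloc[ds] = old              # undo a non-improving move
            if not improved:
                break                                # local optimum reached
        if value > best_val:
            best_alloc, best_val = dict(alloc), value
    return best_alloc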
3.3.4 Simulation Evaluation
The environment's attributes, including the usage frequency of datasets by different areas, the distance between any two servers, and so on, were randomly generated. We implemented the conflict allocation as the static allocation, that is, each dataset is stored by the server that contains datasets with similar topics. However, we did not take into consideration the "none" option; the static allocation, as well as the allocation algorithms, ensures that each dataset will be allocated to a server. Most of the simulations used a measurement that excludes the gains from queries and the storage costs, since their influence on the sum of the servers' utilities does not depend on a specific allocation. In particular, vcosts(alloc) denotes the variable costs of an allocation, which consist of the transmission costs resulting from the flow of queries. Formally, given an allocation, its variable costs are defined as follows:

vcosts(alloc) = Σ_{ds∈DS} Σ_{i∈SERV} usage(i, ds) · distance(i, alloc(ds)) · (answer cost + retrieve cost).
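A sketch of the vcosts measure and of the vcost ratio discussed in the next paragraph; the usage and distance tables are assumed inputs and the two cost constants are placeholders.

# Sketch of the variable-cost measure used in the simulations.
ANSWER_COST = 0.002
RETRIEVE_COST = 0.003

def vcosts(alloc, servers, usage, distance):
    """Variable (transmission) costs of an allocation, summed over datasets and servers."""
    return sum(usage[i][ds] * distance[i][loc] * (ANSWER_COST + RETRIEVE_COST)
               for ds, loc in alloc.items()
               for i in servers)

def vcost_ratio(negotiated_alloc, static_alloc, servers, usage, distance):
    """Ratio between the variable costs with negotiation and with the static allocation."""
    return (vcosts(negotiated_alloc, servers, usage, distance)
            / vcosts(static_alloc, servers, usage, distance))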
The actual measurement used is called the vcost ratio, and refers to the ratio between the variable costs when using negotiation and the variable costs when using the static allocation mechanism. The efficiency of the negotiation technique increases as the vcost ratio decreases. We ran simulations on various configurations of the environment in order to demonstrate the applicability of our techniques in various settings. The number of servers was 11, since this was the number of servers at NASA at the time of the simulations (but we also considered environments where the number of servers varied between 3 and 14). The number of datasets to be allocated varied between 20 and 200. All the outcomes that are presented are the average of the results obtained from runs of randomly generated environments according to the specification of the specific simulation set, as described in the subsequent sections.

3.3.4.1 Performance of the Algorithms

The purpose of the first set of tests was to compare the performance of the proposed algorithms. First, we ran simulations with randomly generated environments, with 11 servers and 20–200 datasets to be allocated, for a limited time, where the time limit grows
"hill-climbing" "backtracking-on-subproblems" "genetic" 0.98 0.96 0.94
vcost ratio
0.92 0.9 0.88 0.86 0.84 0.82 0.8 20
40
60
80
100 120 datasets number
140
160
180
Figure 3.1 vcost ratio as a function of the number of datasets when the different algorithms attempt to maximize the average utility of the servers.
First, we ran simulations with randomly generated environments, with 11 servers and 20–200 datasets to be allocated, for a limited time, where the time limit grows linearly as the number of datasets increases. All the algorithms attempted to maximize the sum of the servers' utilities. Figure 3.1 presents the vcost ratio as a function of the number of datasets for the different algorithms used. Note that the performance of an algorithm improves as the vcost ratio decreases. For all the configurations considered, the backtracking algorithm was not able to find any allocation better than the conflict allocation. Thus the vcost ratio equaled one for all these cases. The graphs in the figure show the behavior of the backtracking on subproblems algorithm, the hill-climbing algorithm, and the genetic algorithm. The hill-climbing algorithm achieved significantly better results than the other algorithms used. We also tested the algorithms on their maximization of the generalized Nash product, that is, the objective function was Πi∈SERVERS (U i (alloc) − U i (conflict alloc)).
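The two objective functions compared here can be written compactly. In this sketch U is a hypothetical function returning server i's utility for an allocation, and conflict_alloc stands for the static (conflict) allocation; both names are illustrative.

import math

def sum_of_utilities(alloc, servers, U):
    # Social welfare criterion 1: the sum of the servers' utilities.
    return sum(U(i, alloc) for i in servers)

def generalized_nash_product(alloc, conflict_alloc, servers, U):
    # Social welfare criterion 2: the product of each server's gain
    # over its utility from the conflict (static) allocation.
    return math.prod(U(i, alloc) - U(i, conflict_alloc) for i in servers)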
Figure 3.2 The Nash product of the servers' utilities when the number of datasets varies.
Figure 3.2 presents the average of the Nash product for the backtracking on subproblems algorithm, the hill-climbing algorithm, and the genetic algorithm as a function of the number of datasets in the environments considered. Again, the simple backtracking algorithm did not find appropriate allocations in the cases studied, while the hill-climbing algorithm achieved the highest Nash product, namely, the best results. In conclusion, in the simulations the hill-climbing algorithm achieved the best results. Moreover, the genetic algorithm requires setting several parameters that the designers cannot choose in advance for every possible environment; since its performance depends on these parameters, it is not the preferred algorithm for the agents' automated negotiation.
3.3.4.2 The Influence of the Objective Function First, the effect of the choice of a social-welfare criterion on the hill-climbing algorithm's results is studied more closely. We ran simulations with the hill-climbing algorithm on randomly generated environments with 11 servers and 30 new datasets.
Table 3.1
Comparison of different social-welfare criteria. The third column (CU) indicates the average of the relative dispersion of the utility among the agents (std/mean), and the last one specifies the average of the relative dispersion of the added benefit (CI) of the negotiation among the agents.

Method                                            Vcost ratio   CU      CI
static allocation                                 1             0.148   *
maximizing the sum of the servers' utilities      0.77          0.118   0.94
maximizing the generalized Nash product           0.78          0.116   0.87
optimal solution without constraints              0.71          0.171   1.65
The social welfare criteria studied are: the sum of the servers’ utilities, the generalized Nash product,10 and the sum of the servers’ utilities, without any constraint on the utility obtained by each server, which leads to the optimal solution of a centralized system (optimal). These results were compared with the servers’ results from the static allocation, which can be obtained without negotiation (and also served as our opting-out outcome). The results are presented in table 3.1. The second column specifies the average of vcost ratio. The third column (CU) indicates the average of the relative dispersion of the utility among the agents (std/mean), and the last one specifies the average of the relative dispersion of the added benefit (CI) of the negotiation among the agents. Note that as the values of CU and CI decrease the fairness of the allocations increases. The results indicate that maximizing the sum achieves a slightly lower vcost ratio than does maximizing the Nash product and is not far from the optimal solution. Maximizing the Nash product achieves a lower dispersion of the utilities and gains due to the negotiation, and they both do much better than the optimal with respect to dispersion of the utility among the servers. 3.3.4.3 Influence of Agent’s Deviation Section 3.2.2 presents a protocol in which each server will make an offer that maximizes a predefined social welfare criterion, and the best offer according to that criterion will be agreed upon during the negotiation. Recall that a server may try to find an offer that is good primarily for itself, and only secondarily for society. It is important to know how such a deviation influences the negotiation results. In our test, we assume that each server tries to maximize its own utility, under the strict constraints of giving all the servers at least their conflict utility. The regulation is not changed, that is, the allocation with the highest sum of utilities is chosen. Deviation of all the agents toward maximizing their own utility function is the
worst possible case with regard to the maximization of the welfare criteria. We compare the results obtained in this case with the results when each server tries to maximize the social welfare criterion, that is, the sum of the servers' utilities. We performed 100 runs on randomly generated environments, and the results are presented in figure 3.3. The first bar, "self," represents the vcost ratio obtained when each server tries to maximize its own utility function. The second bar, "sum," represents the vcost ratio when each server tries to maximize the sum of the servers' utilities, and the third bar, "optimal," is the lower bound of the vcost ratio, that is, the result of maximizing the sum of the servers' utilities while ignoring the hard constraints. The results obtained when all the agents try to maximize their own benefits ("self") are still significantly better than the static allocation.

Figure 3.3 Effect of deviation on the results: This figure presents the value of vcost ratio for different situations. The first bar shows vcost ratio obtained when each server tries to maximize its own utility, ignoring the social welfare criterion of maximizing the average of the servers' utilities. The second bar shows vcost ratio obtained when maximizing the average of the servers' utilities, with constraints on the utility of each server. The third bar shows vcost ratio obtained when maximizing the average of the servers' utilities, without any constraint.
3.3.4.4 Influence of Parameters of the Environment on the Results The next set of runs examines the influence of several parameters of the environment on the results obtained when using negotiations. The hill-climbing algorithm is used in these runs as well. We examined how the change in several parameters of the environment influences vcost ratio. Each set of simulations kept all but one of the parameters fixed. The randomly generated environments of the simulation runs include 11 servers and 100 datasets. We chose answer cost to be similar to retrieve cost, and thus, in general, theorized that each server prefers to store datasets in remote servers rather than locally. We examined the effect of the distribution of the usage of datasets by servers. First, we considered situations in which, given a mean usage frequency, the usage frequency of each dataset by each server has a uniform distribution between 0 and 2·mean.
Figure 3.4 Changing frequency usage: This graph presents the vcost ratio obtained when the average frequency usage varies.
As presented in the graph of figure 3.4, changing the mean did not significantly influence the variable cost ratio. The reason for this may be the type of environments considered. The cost parameters in the simulation cause the transmission cost to be the significant component of the costs in the system. A change in the mean usage amounts to a linear transformation of the transmission cost, which does not significantly influence the results or the measurements. The graph in figure 3.5 presents vcost ratio as a function of the standard deviation of the usage when the usage is drawn from a normal distribution with a mean of 0.4. In this case, the variable cost ratio decreases as the std of the usage increases; that is, negotiation is more beneficial when the usage of the datasets is more dispersed. The bar of figure 3.5 describes the results when the usage is distributed uniformly, rather than normally with varied std as illustrated in the graph. In the case of uniform distribution, the usage frequency is generated between 0 and 0.8. Thus its standard deviation is 0.23094. In fact, the results obtained when the usage is uniformly distributed are similar to the results obtained with a normal distribution with std between 0.21 and 0.24.
Figure 3.5 Changing frequency standard deviation: This graph presents the vcost ratio obtained when the standard deviation of the usage frequency varies. The bar describes the results when the usage is distributed uniformly.
Moreover, as the standard deviation of the usage frequency increases, the quality of the results obtained increases. The reasoning behind this is that as the standard deviation of the usage decreases, the usage of a dataset by different areas is similar, and the specific location of each dataset becomes less important. We also studied the effect of the distances between any two servers on the vcost ratio. We examined a uniform distribution between given minimal (min) and maximal (max) distances, and a normal distribution with a given std and mean of (min + max)/2. As presented in figure 3.6, when the max distance (and thus the mean) of the distribution increases, negotiation becomes slightly more beneficial. However, as presented in figure 3.7, the variable cost ratio decreases as the std of the distances increases; that is, negotiation is of greater benefit when the distances between the servers are more dispersed. The explanation for this result is simple. As the standard deviation of the distances becomes smaller, the distances between any two servers become similar, and the location of each dataset becomes less important.
Figure 3.6 Changing distance: This graph presents the vcost ratio obtained when the average distance between servers varies.
Figure 3.7 Changing std of distance: This graph presents the vcost ratio obtained when the standard deviation of the distances between sites varies. The bar describes the results when distances are distributed uniformly.
The first bar in figure 3.7 represents the results when the distances are uniformly distributed. In this case the average standard deviation is 2742.41, and we obtained similar results to those obtained when the std was 3000 in the normal distribution case. In addition, we examined the effect of cost factor changes on the variable cost ratio. Changing the query price did not influence the results significantly. However, as figure 3.8 demonstrates, as the retrieve cost increases, the benefit of negotiation also increases.11 Intuitively, this can be explained as follows. In the initial environment, the servers prefer not to store the datasets locally. However, when the retrieve cost increases, the servers are more willing to store datasets locally, and thus there are fewer constraints on possible allocations and better agreements can be reached. As presented in figure 3.9, as answer cost increases, the benefit of negotiation decreases; the datasets' mean size and the storage costs have the same influence on the results.12
Figure 3.8 Changing retrieval cost: This graph presents the vcost ratio obtained when the retrieve cost varies.
Figure 3.9 Changing answer cost: This graph presents the vcost ratio obtained when the answer cost varies.
This may be the result of the fact that increases in the answer cost cause the storage of datasets to be less beneficial, and thus it is more difficult to find a good allocation, since the constraints are stronger and the improvement rate is lower. The number of servers and datasets also influences the results. As shown in figure 3.10, when the number of servers is kept fixed, the results of the hill-climbing algorithm are more beneficial as the number of datasets increases. This is similar to the results presented in figure 3.1, where a slightly different configuration was considered. On the other hand, figure 3.11 demonstrates that when the number of datasets is fixed, the results are worse as the number of servers increases. The reason for the first observation is probably that dataset allocations are independent of one another. Thus, as the number of datasets increases, there are more possibilities for increasing the benefit for all the servers. On the other hand, when the number of servers increases, there are more constraints on finding possible allocations and it is much more difficult to find a suboptimal allocation.
Figure 3.10 Changing the number of datasets: This graph presents the vcost ratio obtained when the number of datasets increases.
Figure 3.11 Changing the number of servers: This graph presents the vcost ratio obtained when the number of servers increases.
3.4 Dataset Allocation—The Incomplete Information Case
In the previous sections, all the servers are assumed to have complete information about each other. In particular, they all know the expectations about future usage of each dataset by the clients located near each server. In real-world situations, if there is neither a central statistical unit nor the ability to enforce true reporting, this assumption is not valid. Thus we will now consider a distributed allocation mechanism for environments in which the agents have private information about the usage of the datasets: each server knows only the usage level of the clients associated with it and the usage level of the datasets stored in its database. We will add a revelation mechanism prior to negotiations that enforces truthtelling in most of the cases. The revelation mechanism includes the following steps. At the first step, all the agents are asked to report, simultaneously, all of
their statistical information about past usage of datasets. Given this information, each server calculates the expected usage function, that is, the expected usage of each dataset by the clients around each server. After this step, the negotiations proceed as in the complete information case. To avoid manipulative reports in the first step, we have to ensure that when using such a revelation process no server will have an incentive to lie. In presenting the revealing protocol and discussing its properties, the notion of a local dataset with respect to server i will denote a dataset stored in server i. The concept of a remote dataset w.r.t. server i will denote a dataset stored in another server. Furthermore, the local users of server i are the users located in its geographical area. We propose that the servers use the following protocol:
1. Each server i will broadcast the following: (a) for each dataset ds, the past usage of ds by server i; and (b) for each server j, j ≠ i, and for each local dataset ds with respect to i, the past usage of ds by j (the servers send these data simultaneously).
2. Each server will process the information obtained from the other servers, and then negotiations will take place, as in the case of complete information.
No communication is permitted between the servers before and during step 1. Situations in which two servers give different reports concerning the same fact are defined as follows.
Definition 3.4.1 Conflicting reports: reports of server i and server j are in conflict if there is a local dataset ds w.r.t. server i, such that i's and j's reports on the usage of dataset ds by server j are different.
Conflicting reports by servers i and j indicate that at least one of the servers is lying. In such cases, a high penalty for both servers provides an incentive for truthful reporting in most of the reports. The penalty imposed on servers i and j should be distributed evenly among the other servers. The following lemma shows that a server will tell the truth about its own usage of remote datasets and about the usage of other servers of its local datasets.
Lemma 3.4.1 If the servers follow the prenegotiation protocol described above, and there is a high penalty exacted for servers with conflicting reports, then there is a Nash equilibrium where each server is telling the truth about its usage level of remote datasets and about other servers' usage of its local datasets.
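Before turning to the proof, here is a minimal sketch of the conflict check in step 1. It assumes a nested mapping in which reports[i][j][ds] is the usage of dataset ds by server j as announced by server i; the function names and data layout are ours, chosen only for illustration, not part of the protocol definition.

def conflicting_reports(report_i, report_j, local_datasets_of_i, j):
    # Definition 3.4.1: reports of servers i and j are in conflict if they
    # disagree on the usage, by server j, of some dataset local to i.
    # report_x[j][ds] is server j's usage of ds as announced by server x.
    return any(report_i[j][ds] != report_j[j][ds]
               for ds in local_datasets_of_i)

def find_conflicts(reports, local_datasets):
    # Return all pairs of servers whose reports are in conflict; each such
    # pair would then be charged the high penalty discussed in the text.
    servers = list(reports)
    conflicts = set()
    for i in servers:
        for j in servers:
            if i != j and conflicting_reports(reports[i], reports[j],
                                              local_datasets[i], j):
                conflicts.add(frozenset((i, j)))
    return conflicts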
Proof: Each server can tell or not tell the truth about its usage of each remote dataset. In addition, each server can tell or not tell the truth about the others' usage of each of its local datasets. Suppose that all servers always tell the truth. If server i changes its report about the usage of one remote dataset, and thereby does not tell the truth, then the difference will be revealed immediately, since the server where the remote dataset is stored will report truthfully. Server i will be given a high penalty, and thus it is not worthwhile to lie. If server i changes its report about the usage of one of its local datasets by one of the other servers, and thereby does not tell the truth, then this difference will be revealed immediately, since the server whose usage had been changed will report truthfully according to its strategy. Thus server i will be given a high penalty and again, in this case, it is not worthwhile to lie. The problematic case is that of a server's reports on the usage of its own local datasets by queries of its local users. In this case the server is able to lie about the usage of local datasets since there are no other reports on the same facts, and hence a lie will not be immediately revealed. One possibility is to ignore all reports of any server about its own usage of a local dataset. This yields inefficiency in the negotiation outcome, but it prevents manipulation by the servers. If the servers do use the data about self-usage of local datasets, a server may change its reports if it expects a higher utility when doing so. However, if it reports a lower usage of a dataset ds than the real one, the other servers will believe that its retrieval costs for documents of datasets similar to ds are lower than the actual retrieval costs. When using an allocation mechanism that maximizes the sum or the product of the servers' utilities, such a lie may lead to an allocation in which those datasets will be stored far from the liar (since it reported a low usage), and thus the liar's utility will be lower than its utility from the allocation that might have been chosen had it told the truth. If a server reports heavier usage of a dataset ds than the real one, it may cause datasets similar to ds to be stored by the liar. This may cause the liar's storage and answer costs to be higher than the retrieval costs would have been if the datasets had been stored elsewhere (based on an honest report). Moreover, in such cases, there may be datasets similar to ds that are allocated remotely according to the static allocation. In this case, if server i reports heavier usage of ds, it causes the other servers to believe that it has higher retrieval costs for those datasets that are similar to ds than it really has. Thus the other servers believe that i's utility from opting out is lower than it actually is. Therefore, the agreed upon allocation
may be worse for the liar than opting out, and again, worse than the situation reached when reporting the truth. In conclusion, without knowing the exact reports of the other servers concerning their usage of their local datasets, lying may harm the server. Moreover, even if it has an estimate of the other servers' usage, it needs unreasonable computation time in order to compute which lie is beneficial, since the problem of finding an optimal allocation itself is NP-complete, and the problem of finding which lie will lead to an allocation that is more beneficial than if it had told the truth is, in general, much more complicated. Thus finding a beneficial lie is intractable. In summary, the proposed two-step mechanism leads to the same solution as in the case of complete information. However, telling the truth about self-usage of a local dataset is not always stable, and a penalty system is needed in order to induce truthful reports for most of the information. In the next sections we will discuss problems related to the data allocation problem and discuss other approaches for dealing with incomplete information.

3.5 Distributed File Allocation
Our research is closely related to the file allocation problem in distributed environments. The file allocation problem deals with how to distribute files and documents among computers in order to optimize system performance. The various file allocation models are presented as optimization problems in terms of objective functions and constraints. The file allocation problem is known to be difficult, and many contributions have been made toward solving it. The file allocation problem was first investigated by Chu (1969), who developed a model for obtaining the minimum overall operating costs of the system, subject to constraints on the expected access time and on the available storage capacity at each node; the number of copies of each file was assumed to be fixed. Chu defined a generalized model and described it as a nonlinear zero-one programming problem. He suggested using integer programming methods to solve the problem, but the complexity of these methods was unreasonable. Casey (1972) considered a more complex environment, in which the number of copies of each file was not assumed to be fixed. The difference between retrieval and update transactions was stressed: while retrieval transactions are routed to only one copy of the file, update transactions are routed to all
the copies. Eswaran (1974) proved that Casey’s problem is NP-complete and suggested using a heuristic approach. Dowdy and Foster (1982) gave an extended description as well as mathematical formulations of the objective function and the constraints that may be considered in the file allocation problem, and they described some known solution techniques. The data allocation problem considered here is a simple case. Only one copy of each file is allowed, as discussed below. Thus there is no difference between retrieved and updated transactions (any transaction is called a “query”), and the objective function of each server is simple and does not consider the system load. However, the problem we consider is different from the classic file allocation problems, since the servers in our case are self-interested. Thus the problem is much more difficult to solve. Any solution has to consider each server involved, and we have to prove that each solution is really stable against servers’ manipulations. Apers (1988) investigates the problem of the allocation of the data of a database to the sites of a communication network, and he also mentions the distributed version of the problem. He suggests that in the distributed system, the problem solving should be done in a distributed fashion and each site should solve part of the problem. Du and Maryanski (1988) consider the problem of data allocation in a clientserver environment, which may change dynamically when replications are allowed. They propose a greedy algorithm, using data from accounting systems located in the database servers. In the first stage, each server provides a list of files it would like to obtain copies of, and a list of files it would like to delete. In the second stage, a central greedy algorithm is used to reallocate files. Their research is important to us from several perspectives. First, the article deals with a dynamic distributed database, and our environment is also dynamic. Second, the first stage of their algorithm is done in a distributed way and the servers are assumed to have their own interests, although there is a central unit that decides on the final allocation using a central objective function. In some of the cases above, we proposed that the agents will find allocations that maximize a social welfare criterion. Even in such situations our problem is more difficult than the classical file allocation problem. In particular, the file allocation problem has many variations (Ceri, Martella, and Pelagatti 1982; Apers 1988; Du and Maryanski 1988), but most of them deal with maximizing a global function under storage limitations. That is, the goal is to optimize the performance of the distributed system as a whole, but it is assumed that
the amount of memory that can be effectively installed at a single node is limited (Ceri, Martella, and Pelagatti 1982). The problem we are dealing with is different. We have a global function to maximize, as in the classic file allocation problem. However, the constraints associated with server i are not only related to the files located at server i, but are also related to all the files in the system. That is, we do not assume constraints on the storage space, but we have constraints on the utility of each server, which is a function of the overall allocation. Thus we cannot automatically adopt the solutions of the file allocation problem as solutions for our problem. Ceri, Martella, and Pelagatti (1982) suggest a solution method for the file allocation problem that is based on the isomorphism of the multiple choice constraint knapsack problem. They suggest a solution method founded on an operations research method for solving the knapsack problem. Our problem is equivalent to another version of the knapsack problem, the multidimensional 0/1 Knapsack problem (Weingartner and Ness 1967). However, the algorithms presented in (Weingartner and Ness 1967) that achieve the optimal solution are intractable. The classical approach in operations research (Hillier and Lieberman 1995) to problems such as the different variations of the 0/1 knapsack problem is by a “branch and bound” algorithm. The assignment of an object to a knapsack is considered a binary variable, and the branch and bound algorithm assigns a value for each binary variable and backtracks when required. In our problem, we may consider xi, j to be a boolean variable, indicating that dataset i is allocated at server j. However, in the variation studied, only one copy of a dataset is allowed. Thus, if xi, j = 1, then xi,k = 0 for each k = j. This makes the branch and bound algorithm simpler, since whenever 1 is assigned to one of the variables, all the variables related to the same dataset are cleared. Alternatively, we actually used in this chapter a numerical variable, indicating for each dataset the number of the server where it is assigned, and run a search algorithm with backtracking in order to find the assignment for each variable. Another approach to the file allocation problem is to use a greedy algorithm. Du and Maryanski (1988), propose using a distributed candidate selection algorithm that uses the historic data from each server. Then, an allocation algorithm is run to choose the optimal assignment using heuristic benefit functions and a greedy search strategy. However, the algorithm is based on the fact that the constraints are on storage, and on the minimal number of copies of each file. In this case, deleting and then adding a file will not usually violate the storage
constraints, whereas in our case, moving a file from one location to another may violate the constraints of a certain server. Another approach was taken by Siegelmann and Frieder (1992), who used genetic algorithms for the multiprocessor document allocation problem, and by March and Rho (1995) for the case of database design. In our work we also implemented a genetic algorithm and compared its outcome to other approaches. Another perspective on the problem is to consider it a constraint-satisfaction problem (CSP), and to solve it by using constraint satisfaction methods. The classic algorithm for solving CSP problems is a tree search algorithm with backtracking (Frost and Dechter 1994). There are several improvements to this solution that make the search more efficient (Prosser 1993; Minton et al. 1992). Our problem can be considered as a CSP, but it has a special structure, since each constraint involves all the variables. Therefore some of the improvements to the basic backtracking algorithm are irrelevant to our model. In our simulations we tested the behavior of a backtracking algorithm and compared it with the other methods we used for large problems. Another approach for solving CSP is to use a min-conflict hill-climbing algorithm (Minton et al. 1992). This approach was developed for binary constraint satisfaction problems, where each constraint involves at most two variables. It starts with a random assignment of values to the variables. Then it selects one variable whose value is in conflict with the value of other variables, and assigns it a different value that minimizes the number of variables in conflict with the chosen variable. We adapted this algorithm for our problem, where each constraint involves all the servers, and tested it for different configurations. In summary, there are several different methods for solving problems similar to ours. We have considered three different approaches: backtracking, genetic algorithms, and hill-climbing. There are no results of simulations for problems identical to ours that are discussed in the literature, so we cannot compare our detailed results to others. However, the algorithms we used are adapted from algorithms for similar problems, as described above. We also checked the behavior of the algorithms for different sizes of problems and different values of parameters. The main question is whether our results can be used when trying to maximize other utility functions. This seems plausible if the utility function is additive, as it is in our case. However, more research is required in order to test the algorithms for different structures of utility functions—for example, for nonadditive utility functions, or the role of other parameters of the utility function.
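The min-conflicts adaptation described above can be sketched as follows, under the simplifying assumptions that a "conflict" is a server whose utility constraint is violated and that server_utility and conflict_utility are supplied externally (both names are ours, not the book's).

import random

def min_conflicts_allocation(datasets, servers, server_utility,
                             conflict_utility, max_steps=10000):
    # Min-conflicts search adapted to the data allocation problem, where
    # every constraint (server i's utility must reach its conflict utility)
    # involves the whole allocation rather than just two variables.
    def violated(alloc):
        return [i for i in servers
                if server_utility(i, alloc) < conflict_utility(i)]

    # Start from a random complete assignment of datasets to servers.
    alloc = {ds: random.choice(servers) for ds in datasets}
    for _ in range(max_steps):
        if not violated(alloc):
            return alloc                      # all constraints satisfied
        ds = random.choice(datasets)          # pick a variable to repair
        # Reassign ds to the server that minimizes the number of violated
        # constraints over the whole allocation.
        best = min(servers,
                   key=lambda srv: len(violated({**alloc, ds: srv})))
        alloc[ds] = best
    return None                               # no satisfying allocation found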
3.6 Approaches to Protocols for the Incomplete Information Case
In this section we survey approaches for reaching agreements in incomplete information environments.
• Voting protocols (Dummett 1984): The agents decide on the location of each dataset separately. For each dataset, there are N possible locations. Each agent grades the possibilities, and the "winner" is chosen according to a voting protocol. This approach is problematic in our environment for several reasons. First, in general, all ordinal voting protocols13 are problematic to some extent. Arrow (1951) provided a set of axioms to encode the minimum requirements of justice, fairness, and rationality that we might ask of a constitution, and he proved that there is no voting protocol that maintains this set of axioms. Second, a voting protocol does not ensure any lower bound for the utility of each agent, so an agent may attain lower utility than in the case of opting out. Third, the simple voting protocols are not protected against manipulating agents that do not report honestly. Thus these protocols are not appropriate for our environment.
• Incentive compatibility mechanisms: The agents reveal their hidden information, and the mechanism chooses an agreement that meets the social welfare criterion, but also motivates the servers to report honestly. Below we survey the relevant incentive compatible mechanisms and explain why we did not apply any of them to our problem. Additional discussion on related mechanisms can be found in chapter 9.
– Clarke tax mechanism: this mechanism (Clarke 1971; Groves 1973) uses a revelation step in which each agent reveals its cardinal preferences; the result reached is that which maximizes the sum of the agents’ utilities, and the agents have to pay taxes in a way that ensures that the interest of each agent is always to tell the exact truth in the revelation step. The mechanism was suggested for use in multiagent systems by (Ephrati and Rosenschein 1996). However, this mechanism results in a waste of social welfare, since the taxes taken from the agents are wasted. There is also no central “agency” in our environment to collect the taxes, and if we allow the taxes to be given back to the agents by some agreed-upon process, it will affect their considerations and yield lies. Another problem is that the Clarke tax mechanism is not safe against coalition formation: two or more agents may coordinate their reports in such a way that they will win without paying anything.14 This mechanism also does not ensure that each server will accept at least its conflict utility.
– Lottery mechanism: Myerson (1979) considers a problem of an arbitrator trying to select a collective choice for a group of individuals when the arbitrator does not have complete information about their preferences. A typical arbitrator's solution is a procedure in which the arbitrator first asks every player for some information about himself and then selects a choice, or a probability distribution over the possibilities, using the information the players have given him. The final result is determined by a lottery, with the probability distribution selected by the arbitrator according to the reports. Since the arbitrator cannot force the players to give truthful responses, he must design the choice mechanism so that it provides an incentive to be truthful. Such a mechanism is called a Bayesian incentive-compatible mechanism. Myerson suggests choosing the Bayesian incentive-compatible mechanism that maximizes the generalized Nash product, as suggested by Nash (1950). Implementing such a mechanism for our application is problematic since there is no central arbitrator to design the mechanism and decide on an agreement. Moreover, a Bayesian incentive-compatible mechanism requires knowledge about the distribution of the types of each server, for example, which types of preferences are possible and the probability for each type. Such knowledge is not available in our case since there is no common knowledge about the type of distribution of the servers.
• Negotiation without revelation: Several economic models explain delay in the negotiation process as the result of uncertainty. Kennan and Wilson (1993) give a brief overview of several models of incomplete information. All of these models assume incomplete information about the value of time or the value of agreements, when an agreement refers to, for example, wages or the price of some item. These models are no help in our case, where an agreement refers to a decision about allocating several indivisible goods. In our case, in contrast to the cases considered by economic literature, delays will not reveal the agent's value for an allocation, since the result is complex and composed of more than one dimension.
Wellman (1993) uses market-oriented programming as an approach to distributed computation based on market price mechanisms (for more discussion see section 9.2). Several allocation problems can be solved efficiently and in a distributed fashion by using the computational economy model and finding its competitive equilibrium: flow problems, the allocation of computation time, the allocation of computational resources, the provision of distributed information services, and so on.
Mullen and Wellman (1995) and Wellman (1993) use the idea of market-oriented programming to solve a problem of information service provision (for more discussion see section 9.2). In this model, a popular information service (of which the canonical example is Blue-Skies, a weather-information server based at the University of Michigan) is available on the Internet, and local agents decide whether to serve as mirror sites for the service. In their economic model, there are producers such as carriers that produce transmission of information through the network, manufacturers that provide CPU access and disk storage, and mirror providers that have the capability of transferring local storage and other resources into provision of the information service at their local site. A consumer is an individual end user or an aggregate of all the users at a particular site. Mullen and Wellman suggest a competitive-market pricing of the transmission price (when no mirror site is established) and the price of establishing mirror sites. The competitive approach is applicable only when there are several units of each kind of goods, since it is not rational for the consumers and producers to ignore the effect of their behavior on the prices when they actually have an influence. In our case, each data item is unique, so a competitive approach cannot be used. Furthermore, in our model, there are few servers involved, so the competitive assumption does not hold. Thus we propose negotiations for our problem.
• Bidding mechanism: A bidding mechanism cannot be applied in environments in which all the agents care about all the details of all the agreements that are reached between any agents. Thus it is not applicable in the environment considered in this chapter, in which each agent cares where each dataset is stored, and not only whether it is stored locally or not. However, the bidding mechanism can be used for the data allocation problem in an environment where each server cares only whether a dataset is stored locally or not, but does not care about the exact location of each dataset. We compare the bidding mechanism and the negotiation model in section 9.1.2.
In summary, in this chapter we proposed solving the data allocation problem using the strategic-negotiation model. We proved that negotiation is beneficial and that an agreement will be reached during the first time period. For situations of agents with incomplete information, a revelation process was added to the protocol, after which a negotiation takes place, as in the complete information case.
4
Negotiations about Resource Allocation
In some domains, owing to limited resources, agents must share common resources (e.g., communication lines, roads, bridges, clean air). In other domains, where resources are unlimited, agents may still mutually benefit from sharing a common resource, since resources may be expensive (e.g., printers, satellites). In such situations there is competition for a valuable resource, with each agent seeking a larger share of the resource. Thus sharing a common resource requires a coordination mechanism that will manage the use of the resource. When there is no central controller to manage the resource, applying the strategic-negotiation model is beneficial. For example, in the case of the robots on Mars, two robots may need the same resource simultaneously, and it is not efficient to allow the operators on Earth to resolve the conflict. Also, the development of a central operating system for each resource may not be possible or beneficial. Therefore, one option is for the agents to negotiate to reach an agreement on the usage of the resource. In this chapter, we consider situations of bilateral negotiations, where one agent already has access to the resource and is using it during the negotiation process, while the other agent is waiting to use the resource. When an agent opts out of the negotiation, it causes damage to the resource. We first consider the case of complete information, and we then extend the model to deal with situations of incomplete information. We conclude with cases of multiple encounters. In all cases the negotiation ends no later than the second time period, and usually in an agreement. These results are surprising because it is not clear why the agent holding the resource would grant access to another agent. It turns out that the threat of the waiting agent to opt out of the negotiation, which could cause damage to the resource, generates a loss for the agent using the resource, thus forcing it to reach an agreement that is better for the waiting agent than opting out. However, the agent holding the resource will only agree to the best agreement for itself that will also prevent the other agent from opting out. Opting out may occur when there is incomplete information that leads to a misunderstanding. However, this rarely occurs.

4.1 Problem Description
Suppose some group of agents share a joint resource. The joint resource can be used only by one agent at a time. An agreement is sought so that all the agents will be able to use the resource. An agreement is a schedule that divides the usage of the resource among the agents.1 There is a certain cost associated
with the time that elapses between the time that the resource is needed by an agent and the time the agent actually gains access to the resource. This cost depends on the internal state of the agent, that is, its task load, its disk space, and so on. We consider a bilateral negotiation between two agents that need the same resource. One of them—the Attached Agent (A)—is using a resource that another agent—the Waiting Agent (W )—needs. Consequently, W starts a negotiation process to obtain access to the resource.2 During the negotiation process, A continues to hold the resource and to work on its goal. We consider situations where A and W should divide M units of a resource or the time usage of the resource. An agreement is an ordered pair (s A , sW ), where s A + sW = M. si is agent i's portion of the resource.3 Note that s A solely determines sW (and vice versa), and thus there is only one attribute to an agreement. Hence, formally, S = {(s A , sW ) | s A , sW ∈ IN, s A ≥ 0, sW ≥ 0, s A + sW = M}. When one of the agents opts out of the negotiation, there may be some damage to the resource. If the resource is damaged, A's session (of using the resource) is interrupted, both agents need to wait, and they then may try to obtain the resource. Assume that W is the first agent to make an offer. This is a reasonable assumption since A is using the resource and does not have a motive to start the negotiations. It is therefore reasonable to ask what motivation the agent that is actually using the resource has to participate in a negotiation process (since it already has the resource and can accomplish its goals). We believe that this agent has several reasons for entering the negotiation:
• Future negotiation: In the future, the attached agent might be in a situation in which it will need a resource that is possessed by the other agent.
• Waiting Agent's threat: If the attached agent does not negotiate, the other agent might cause damage to the resource, so the attached agent will not be able to accomplish its goal.
• Costless process: We assume that the negotiation is not costly to the attached agent. It may require only some computation resources.
The agents’ preferences over different agreements and the agreements they prefer over opting out play an important role in the manner in which the negotiation ends. Furthermore, changes in the agents’ preferences over time will change their strategies in the negotiation and, as a result, the agreements they are willing to reach. In the next section we make several assumptions about these preferences.
4.2 Attributes of the Utility Functions
As in previous chapters, the negotiation may end with an agreement from the set S, at some time period t ∈ T . It can also end if one of the agents opts out of the negotiation (i.e., chooses Opt). The negotiation may also continue forever without reaching an agreement and without any of the agents opting out, which is called Disagreement. We assume that agent i ∈ Agents has a utility function over all possible outcomes: U i : ((S ∪ {Opt}) × T ) ∪ {Disagreement} → IR. We present a number of assumptions concerning the utility functions of the agents in the resource allocation case.4
A0r Disagreement is the worst outcome: For each x ∈ (S ∪ {Opt}) × T and i ∈ Agents: U i (Disagreement) < U i (x).
The agents prefer any possible outcome over disagreement. Assumption A0r is similar to A0d of the data allocation problem (see section 3.1.2). In both cases we assume that continuing the negotiations indefinitely is the worst outcome for all agents. Condition A1r presented below requires that among agreements reached in the same period, agent i prefers larger portions of the resource.
A1r The resource is valuable: For all t ∈ T , r, s ∈ S and i ∈ Agents: ri > si ⇒ U i ((r, t)) > U i ((s, t)).5
For agreements that are reached within the same time period, each agent prefers to get a larger portion of the resource. This assumption is very different from the ones presented in the data allocation case. There, the negotiation is on the allocation of datasets instead of on the allocation of a resource. A server does not always prefer to have a larger number of datasets. In some cases, storing a dataset locally may yield losses. The next assumption expresses the agents' attitudes toward time. W loses over time while A gains over time.
A2r Cost/benefits over time: For any t1 , t2 ∈ T , s ∈ S and i ∈ Agents, if t1 < t2 , then U W ((s, t1 )) ≥ U W ((s, t2 )) and U A ((s, t1 )) ≤ U A ((s, t2 )).
By comparing A2r with A1d one can easily see the difference between the servers in the data-allocation environment and the agents in the resource-allocation environment. Here, one of the agents loses over time while the other gains over time, whereas in the data-allocation case, all the agents lose over time. We assume that the agents have a utility function with a constant cost or gain due to delay. Every agent bears a fixed cost for each period. That is, agent A
has a constant time gain, c A > 0, and each agent W has a constant time loss, cW < 0.
A3r Agreement's cost over time: Each agent i ∈ {W, A} has a number ci such that for all t1 , t2 ∈ T and s, s̄ ∈ S, U i ((s, t1 )) ≥ U i ((s̄, t2 )) iff (si + ci t1 ) ≥ (s̄i + ci t2 ), where cW < 0 and c A > 0.
We assume that agent A gains over time (c A > 0) and that agent W loses over time (cW < 0); that is, agent W prefers to obtain any given number of units sooner rather than later, while agent A prefers to obtain any given number of units later rather than sooner. Notice that assumptions A1r and A2r are simple inferences from assumption A3r . We would still like to be able to distinguish between the two properties of the utility functions: one is the desirability of the resource, and the second is the monotonic cost over time. Assumption A3r should be compared with assumption A1d of the data-allocation case (section 3.1.2). Here, we consider a simpler utility function, that is, a utility function with a constant cost or gain due to delay. In the data-allocation case the utility function has both a constant cost due to delay and a constant discount rate. The next assumption concerns the utility of opting out. W prefers opting out sooner rather than later, and A always prefers opting out later rather than sooner, since it continues to use the resource during the negotiations. This is because A gains over time while W loses over time. For this reason A would never opt out. A would prefer that agent W opt out in the next time period over itself opting out in the current time period.
A4r Cost of opting out over time: For any t ∈ T , U W ((Opt, t)) > U W ((Opt, t + 1)) and U A ((Opt, t)) < U A ((Opt, t + 1)).
The above assumption puts few restrictions on the utility from opting out. In particular, the change in the utility from opting out due to delay may be different than the change in the utility from reaching agreements due to delay. This is different from the data-allocation case, where opting out is actually implemented as one of the possible agreements, and thus its utility behaves just like the utility gained from agreements. Even though agent A gains over time, an agreement will be reached after two periods at most, since agent W can threaten to opt out at any given time. This threat is the driving force of the negotiation process toward an agreement. If there is some agreement s that A prefers at time t over W 's opting out in the next period t + 1, then it may agree to s. Therefore the main factor that plays
a role in reaching an agreement is the best agreement for agent A in a given period t that is still preferable for W to opting out in time period t. Recall from section 2.3 that this agreement is denoted by s̃ A,t ∈ S. If agent A will not agree to such an agreement, its opponent has no other choice but to opt out. Recall from section 2.3 that Possiblet is the set of agreements at step t that are not worse for any agent than opting out. If Possiblet is not empty, U i ((s̃ i,t , t)) = maxs∈Possiblet U i ((s, t)). In the current chapter, if Possiblet is not empty then there is only one minimal s̃ i,t . This is because of assumption A1r above. Recall that ŝ i,t is the worst agreement for i in Possiblet . Since in this chapter we consider a strictly conflicting environment of agents, ŝ W,t = s̃ A,t . In the rest of the chapter we will use ŝ W,t rather than s̃ A,t since it facilitates the discussion when the agent has incomplete information. Agent A's loss from opting out is greater than that of W , since A's session (of using the resource) is interrupted. Thus we make the following assumption.
A5r Range for agreement: For every t ∈ T ,
• If Possiblet+1 ≠ ∅, then Possiblet ≠ ∅.
• If Possiblet+1 ≠ ∅, then U W ((ŝ W,t , t)) ≥ U W ((ŝ W,t+1 , t + 1)), U W ((Opt, t)) ≥ U W ((ŝ W,t+1 , t + 1)), and U A ((ŝ W,t+1 , t + 1)) ≥ U A ((ŝ W,t , t)).
• If Possiblet ≠ ∅, then U A ((ŝ W,t , t)) ≥ U A ((Opt, t + 1)).
A6r Possible agreement: Possible0 ≠ ∅ and Possible1 ≠
∅. Such an assumption wasn’t necessary in the data allocation case. There, the conflict allocation always belongs to Possiblet and thus there is always an opportunity for agreement. Given the above properties of the agents’ utility functions, we show below that an agreement will be reached no later than in the second time period of the negotiation. We first demonstrate the assumptions about the agents’ utility functions, using the agents on Mars example. EXAMPLE 3 NASA and the European Space Agency (ESA) have embarked on a joint scientific mission to Mars involving separate mobile labs launched from a single shuttle in orbit around the planet. Each agency has contracts with a number of companies for conducting the experiments. These experiments were preprogrammed prior to launch. Arrangements were made prior to launch for the sharing of some equipment to avoid duplication and excess weight on the mission. Instructions to begin each experiment must be sent from Earth. NASA’s antenna was damaged during landing, and it is expected that communications between the United States and its lab on Mars will be down for repairs for one day (1440 minutes) of the planned five-day duration of the mission. NASA can use a weaker and less reliable backup line, but this involves diverting this line from other costly space experiments, and thus the expense of using this line is very high for NASA. NASA would like to share the use of ESA’s line during the one-day period so that it can conduct its planned research program. Only one research group can use the line at a time, and that line will be in use for the entire duration of the particular experiment. A negotiation ensues between the automated agents that represent the two labs on Mars over division of use of the ESA’s line, during which time ESA has sole access to the line, and NASA cannot conduct any of its experiments (except by use of the very expensive backup). By prearrangement, ESA is using some of the U.S. equipment for their experiments and are earning $5000 per minute. While the Europeans cannot conduct any of their experiments without some of the U.S. equipment, the United States could conduct some of its experiments without ESA equipment. The United States is losing $3000 per minute during the period in which they must rely on their backup communications line. An agreement between NASA and ESA to share the communications line will result in a $1000 gain per period (minute) for each group. If an agreement on sharing the line is not reached, NASA can threaten to opt out of the arrangement. In this case, NASA will be able to conduct a small portion of its experiments by using all of its equipment and none of ESA’s
equipment, and by using the backup communications line. NASA's overall gain will be $550,000, but it will lose $1000 for every minute of the negotiation. If NASA opts out, the Europeans will not be able to continue their experiments (without NASA's equipment) and their gain will be restricted to whatever they had gained at the point NASA opted out. If the Europeans opt out, they will need to pay NASA $100,000 for use of the U.S. equipment up to that point. Note that the Europeans play the role of A (attached to the communication line) and NASA plays the role of W (waiting for the line). One dollar is the smallest unit of currency in this example. Formally:
• U e ((s, t)) = 1000se + 5000t, U e ((Optn , t)) = 5000t, U e ((Opte , t)) = 5000t − 100000
• U n ((s, t)) = 1000sn − 3000t, U n ((Optn , t)) = 550000 − 1000t, U n ((Opte , t)) = −1000t
• M = 1440
Consider assumptions A1r –A6r . Assumption A1r is valid in this case since the resource is valuable to both sides. NASA is losing over time and ESA is gaining over time; thus A2r is true. ce = 5 and cn = −3, and thus A3r is valid. NASA prefers to opt out sooner rather than later since it loses 1000 per time unit, and ESA prefers opting out later rather than sooner since it gains 5000 per time unit; thus A4r is true. ESA always prefers any agreement over opting out. Concerning NASA, ŝ n,t = (890 − 2t, 550 + 2t). Substituting ŝ n,t in U n we obtain U n ((ŝ n,t , t)) = 550,000 − 1000t, and in U e we obtain U e ((ŝ n,t , t)) = 890,000 + 3000t. It is easy to see that all the inequalities of A5r and that of A6r hold. In the subsequent sections we show that in cases where the agents' utility functions satisfy the above assumptions, the negotiation will end in the second period at the latest. We first consider the case of complete information where the agents always reach an agreement.
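This claim can be checked numerically. The following sketch encodes the two utility functions from the example and verifies, for a few periods, that ŝ n,t = (890 − 2t, 550 + 2t) is the worst agreement NASA still prefers to opting out, and that the A5r inequalities hold. The helper names are ours, chosen only for this check.

M = 1440

def U_e(s_e, t):             # ESA's utility from an agreement at time t
    return 1000 * s_e + 5000 * t

def U_n(s_n, t):             # NASA's utility from the same agreement
    return 1000 * s_n - 3000 * t

def U_n_opt(t):              # NASA's utility from opting out at time t
    return 550000 - 1000 * t

def s_hat_n(t):
    # Worst agreement for NASA that it still prefers to opting out at t:
    # the smallest s_n with U_n(s_n, t) >= U_n_opt(t).
    s_n = min(s for s in range(M + 1) if U_n(s, t) >= U_n_opt(t))
    return (M - s_n, s_n)    # (ESA's portion, NASA's portion)

for t in range(5):
    assert s_hat_n(t) == (890 - 2 * t, 550 + 2 * t)
    # A5r: NASA prefers s_hat at t to s_hat at t+1, and prefers opting out
    # at t to s_hat at t+1; ESA's preference between the two is reversed.
    assert U_n(s_hat_n(t)[1], t) >= U_n(s_hat_n(t + 1)[1], t + 1)
    assert U_n_opt(t) >= U_n(s_hat_n(t + 1)[1], t + 1)
    assert U_e(s_hat_n(t + 1)[0], t + 1) >= U_e(s_hat_n(t)[0], t)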
4.3 Complete Information
In the analysis of negotiation when the agents have complete information, under the assumptions made in the previous section, two cases are considered. In the first case an agent loses less per period while waiting than it can gain per period while using the resource. In the second situation, an agent loses more
while waiting for the resource than it can gain while using the resource. For this second agent, sharing a resource with others is not efficient. Therefore, it prefers to have its own private resource if possible. However, in some cases the agents have no choice but to share a resource (e.g., a road junction or another expensive resource). We first consider the case where W loses less over time than A can gain over time. In such a case, for any offer, if it is big enough, it is possible to find an offer in the future that will be better for both sides, that is, so that both agents have positive total gain. Although it might appear that such an assumption will cause long delays in reaching an agreement, we will prove that in fact the delay will be at most one period, since W may opt out. However, since better agreements for both parties can be found in the future, the agreement that is reached is not Pareto-optimal over time. The reasoning behind this proof is as follows. If it is not agent W's turn to make an offer in some time period t, it can always opt out and gain utility similar to that of ŝ^{W,t} (actually, between ŝ^{W,t}_W and ŝ^{W,t}_W − 1). So, in time period t − 1, W will never make a better offer to A than ŝ^{W,t}_A + |c_W|, where its utility is equal to or greater than its utility from opting out in the next period with the addition of W's loss over time (note that c_W < 0). But A will refuse such an offer, since A prefers waiting a period and offering W ŝ^{W,t} ∈ S. This offer will prevent W from opting out, and if W accepts the offer, A's utility will be equal to the utility from ŝ^{W,t}_A + c_A, which is better for A than ŝ^{W,t}_A + |c_W| since |c_W| < c_A. Thus an agreement will not be achieved when it is W's turn to make an offer, but there is still the possibility of an agreement in the next period. On the other hand, if A offers W something less preferred by W than ŝ^{W,t}, W will opt out since it will never receive in any given time period in the future t′ anything more than ŝ^{W,t′}. To prevent W from opting out, A should offer ŝ^{W,t}, which is acceptable to W. Since W is the first agent to make an offer, the agreement will be reached in the second period with ŝ^{W,1}. The second case considers the situation where agent W's losses over time are greater than agent A's gains. In this model, for any agreement in period t ∈ T, there is no other agreement in the future that both agents will prefer over this agreement. On the other hand, if an agent's portion of agreement s in period t is sufficiently small, one can find an agreement in a period earlier than t that both agents prefer to s in period t. According to our assumptions, this property will cause the agents to reach an agreement in the first period. In each period in this case, if an agreement exists that agent W prefers to opting out, an agreement exists that agent A cannot reject. The idea is that agent
W will accept or make an offer only if it is better for it than opting out. If A receives an offer such that there is no better agreement for it in the future, and it is also better for W than opting out, and if A prefers this offer over W's opting out in the next period, it must accept this offer. Otherwise, if this agreement is rejected, W should opt out as soon as possible, since it cannot expect to do any better than opting out. But if A prefers the proposed agreement over W's opting out in the next time period, it should accept the offer. Such an agreement, that is, the one that will be reached in some period t ∈ T where there is still a possibility for reaching an agreement in the next time period, will be at most (from W's point of view) ŝ^{W,t}. The reason for this is that if there is still a possibility for an agreement in t + 1, A wants to delay reaching an agreement. By offering ŝ^{W,t}, A prevents W from opting out and gains another period of time. On the other hand, A won't accept anything worth less than ŝ^{W,t+1}, since A can always wait until the next period, gain a period, and reach such an agreement. Therefore, in a given time period t + 1, A won't accept anything worth less than ŝ^{W,t+1}_A + c_A. But this agreement is not worse to A than anything that is acceptable to W in the future. On the other hand, since W loses over time more than A can gain, this agreement is better for W than anything it can attain in the future. The next theorem is a formal statement of the above.

Theorem 4.3.1 (An agreement will be reached in the first or second period) If the model satisfies assumptions A0r–A6r and the agents use SPE strategies then:

W loses more than A can gain. If |c_W| > c_A and ⌊ŝ^{W,1}_A + c_A⌋ ≤ M, then W will offer (⌊ŝ^{W,1}_A + c_A⌋, ⌈ŝ^{W,1}_W − c_A⌉) in the first time period of the negotiation and A will accept its offer.

W loses less than A can gain. Otherwise, any offer made by W at the first time period will be rejected by A. In the next time period, A will make a counteroffer (ŝ^{W,1}_A, ŝ^{W,1}_W), which will be accepted by W.

We will demonstrate the negotiation process and the results by means of the Mars example.

EXAMPLE 4 We return to the example of the mission to Mars. Recall that ŝ^{n,t} = (890 − 2t, 550 + 2t). Since c_e = 5 and c_n = −3, an agreement will be reached in the second period (period 1) with (888, 552). Note that there are agreements in the future that both agents prefer to reaching the agreement (888, 552) in the second period. This is because ESA gains more
over time than NASA loses over time. For example, the agreement (878, 562) in the fourth time period (period 3) is better for both agents than (888, 552) in the second time period. The problem is that there is no way that NASA can be sure that when the fourth time period arrives, ESA will offer the agreement (878, 562). In that time ESA needs to offer only (884, 556) in order to prevent NASA from opting out, and ESA has no motivation to offer more.

4.3.1 A Comparison of the Resource Allocation with the Data Allocation Results

The result of the negotiation on the resource allocation considered in this section is different from that of the data allocation considered in section 3.2. In the resource allocation an agreement will be reached either in the first time period with (⌊ŝ^{W,1}_A + c_A⌋, ⌈ŝ^{W,1}_W − c_A⌉) or in the second time period with (ŝ^{W,1}_A, ŝ^{W,1}_W). In the data allocation case an agreement will always be reached in the first time period. In particular, there is a subgame perfect equilibrium for any possible allocation of the datasets that is not worse to any of the agents than the conflict allocation (theorem 3.2.1). The difference is due to several distinctions between the two cases:

• In the data allocation case we assumed that there are at least three agents, while in the resource allocation case there are only two agents. If there are two agents in the data allocation environment, theorem 3.2.1 is no longer valid.

• In the data allocation case all the agents are losing over time, while in the resource allocation case one is losing over time and one is gaining over time. The results of theorem 3.2.1 of the data allocation domain are not valid if the agents are not all losing over time.
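The two cases of theorem 4.3.1 can be summarized in a small decision procedure. The sketch below is our own illustration (not code from the book): it takes |c_W|, c_A, the components of ŝ^{W,1}, and M, and returns the period and agreement the theorem predicts; with the Mars values (|c_n| = 3 < c_e = 5, ŝ^{n,1} = (888, 552)) it returns the second-period agreement (888, 552).

```python
import math

def predicted_outcome(c_W_abs, c_A, s_hat_A, s_hat_W, M):
    """Outcome predicted by theorem 4.3.1 (assumptions A0r-A6r, SPE strategies).
    s_hat_A, s_hat_W are A's and W's components of s^{W,1}."""
    if c_W_abs > c_A and math.floor(s_hat_A + c_A) <= M:
        # W loses more than A can gain: agreement in the first period (period 0).
        return 0, (math.floor(s_hat_A + c_A), math.ceil(s_hat_W - c_A))
    # W loses less than A can gain: W's first offer is rejected and A's
    # counteroffer s^{W,1} is accepted in the second period (period 1).
    return 1, (s_hat_A, s_hat_W)

print(predicted_outcome(3, 5, 888, 552, 1440))   # -> (1, (888, 552))
```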
4.3.2 Simulation Evaluation
We developed a simulation of the robots example for checking how different parameters of the environment influence the outcome of the strategic negotiation and for comparing the strategic-negotiation outcomes with resource allocation by a mediator. We assume that the mediator will use the earliest and the most popular solution to the bargaining problem, which is Nash's axiomatic approach (Nash 1950). That is, it will choose the allocation that maximizes the generalized Nash product of the agents' utilities: (U^A((x, 0)) − U^A((Opt_A, 0))) × (U^W((x, 0)) − U^W((Opt_W, 0))). Recall that the negotiation in the robots example is over the division of 1440 usage time periods of the resource, that is, M = 1440, and that the utility
functions of the agents are of the form (where NASA plays the role of W and ESA plays the role of A):

A's utility function:
• U^A((s, t)) = G_A·s_A + C_A·t
• U^A((Opt_W, t)) = C_A·t
• U^A((Opt_A, t)) = C_A·t − O_A

W's utility function:
• U^W((s, t)) = G_W·s_W − C_W·t
• U^W((Opt_W, t)) = O_W − C′_W·t
• U^W((Opt_A, t)) = −C′_W·t
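To make the mediator's rule concrete, here is a minimal sketch (ours, not the simulation code used for the experiments) that searches the integer allocations of the M = 1440 time units and returns the one maximizing the generalized Nash product at t = 0. The parameter values follow the first set of simulations described below (figure 4.1: G_A = G_W = 1,000, O_W = 550,000); O_A = 100,000 is an assumption carried over from Example 3, since the text does not restate it for the simulations.

```python
M = 1440                       # usage time units to divide
G_A = G_W = 1000               # benefit per unit of resource time
O_W = 550_000                  # W's outcome from opting out
O_A = 100_000                  # assumed penalty for A when A opts out (Example 3)

def u_A(s_A):                  # A's utility from agreement (s, 0)
    return G_A * s_A

def u_W(s_W):                  # W's utility from agreement (s, 0)
    return G_W * s_W

def nash_allocation():
    """Allocation (s_A, s_W) maximizing (U^A - U^A(Opt_A)) * (U^W - U^W(Opt_W))."""
    d_A, d_W = -O_A, O_W       # the agents' opting-out utilities at period 0
    best, best_val = None, float("-inf")
    for s_W in range(M + 1):
        s_A = M - s_W
        gain_A, gain_W = u_A(s_A) - d_A, u_W(s_W) - d_W
        if gain_A >= 0 and gain_W >= 0 and gain_A * gain_W > best_val:
            best, best_val = (s_A, s_W), gain_A * gain_W
    return best

print(nash_allocation())       # W's Nash share exceeds A's, as in figure 4.1
```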
In our simulations we tested how the changes in the following parameters influence the outcomes of the negotiations: the benefits from using the resource per time period, G_A and G_W; W's cost of the negotiation per time period, C_W; A's benefits over time during the negotiations, C_A; W's utility from opting out, O_W; and W's loss over time when opting out, C′_W. It turned out that only changes in G_W, C_W, and O_W affected the outcome of the negotiations. Therefore, we present only the simulations in which we varied those parameters. We found that in all our experiments (see figures 4.1 and 4.2) W's share in the time usage of the resource (and hence its utility) is larger when the Nash solution is used than when strategic negotiation is used. On the other hand, A is better off when the strategic-negotiation model is used, since A's utility from the outside offer (i.e., opting out) is relatively lower than that of W. Since the Nash solution's allocation depends heavily on the difference between the outcome from possible agreements and the outside offer, A's share is relatively lower when the Nash solution is used. However, in the strategic negotiation, A, who is attached to the resource, has a stronger position in the negotiation than W, and thus gets a larger share of the resource time when the strategic negotiation is used. Figure 4.1 presents the way the allocation of the usage time of the resource changes as a function of W's utility from opting out. In this set of simulations, the profit for the agents from one time period of the usage of the resource was 1,000. A's gain from using the resource during the negotiation, C_A, was 5,000, and W's loss during the negotiation, C_W, was 3,000. Thus these situations can be attributed to the second case of theorem 4.3.1, that is, W loses less than A gains over time and an agreement will be reached in the second time period.
[Figure 4.1 plot: A's Nash part, W's Nash part, A's negotiation part, and W's negotiation part (in time periods) versus W's Utility from Opting Out.]
Figure 4.1 The agents' share of the resource usage time as a function of W's utility from opting out (at period 0) when there is complete information (O_W). The x axis shows the utility from opting out in dollars and the y axis' units are time periods.
[Figure 4.2 plot: A's Nash part, W's Nash part, A's negotiation part, and W's negotiation part (in time periods) versus loss over time (in dollars).]
Figure 4.2 The agents’ share of the resource usage time as a function of W ’s loss over time when opting out when there is complete information. The x axis shows the loss over time in dollars and the y axis’ units are time periods.
W's loss over time when opting out, C′_W, was 1,000, and its utility from opting out, O_W, varied between 100,000 and 1,000,000. The results presented in figure 4.1 show that W's share of using the resource increases as its utility from opting out increases in both the strategic-negotiation model and the Nash solution cases. In the strategic-negotiation model, this increase occurs because ŝ^{W,t}_W increases. In the Nash solution, the increase is a result of the need to maintain a positive U^W((x, 0)) − U^W((Opt_W, 0)) and the attempt to maximize the product. Of course, since W's share increases, A's share decreases as W's utility from opting out increases. In the Nash solution case, W's share is always larger than that of A. In the strategic negotiation, A's share is larger until W's utility from opting out reaches 850,000. Figure 4.2 illustrates the differences between the outcomes of the agents in the two solution methods. W's cost over time when opting out, C′_W, varied between 500 and 4,500. O_W was equal to 550,000 and A's gain during the negotiations was 2,000. The other parameters were as in the previous set of simulations. Thus this set of simulations can be attributed to the first case of theorem 4.3.1, that is, W loses over time more than A can gain and an agreement will be reached in the first time period. As in the previous case, W prefers the Nash solution to the negotiation, and A prefers the strategic-negotiation model. This figure also demonstrates that W's loss over time when opting out only slightly affects the outcome. Figure 4.3 presents the agents' share in the agreement of the resource usage time as a function of the benefits from one unit of usage time in the agreement. In this set of simulations, G_W varied between 500 and 4,500. A's profit during
[Figure 4.3 plot: A's Nash part, W's Nash part, A's negotiation part, and W's negotiation part (in time periods) versus benefits from one unit of resource time.]
Figure 4.3 The agents' share in the agreement as a function of the agents' benefits from one unit of usage time of the agreement when there is complete information. The x axis shows G_W in dollars and the y axis units are time periods.
the negotiation is 1,000 and the other parameters are the same as in the previous simulations. Thus this set of simulations can be attributed to the first case of theorem 4.3.1. The results shown in figure 4.3 demonstrate that W's share in the agreement decreases as its benefits from one unit of usage time increase in both solution methods. This is because W's utility from an agreement should be equal to or a little higher than its utility from opting out. If the benefits from one unit increase, then the number of units that are needed to yield W's same utility decreases. Note that in such situations A's utility increases as the agents' benefits from one unit of usage time increase: both its share and its benefits from each unit increase. W's utility, however, doesn't change much.

4.4 Incomplete Information about the Opponent
An agent that negotiates with another agent on resource allocation may have incomplete information about its opponent's utility function and may not be sure how the opponent will evaluate an offer or how it might compare an offer with other options. In this section we extend our model to deal with agent negotiations in which agents have incomplete information. We will assume that there is a finite set of agent types characterized by their capabilities (e.g., their disk space, computational power, payment agreements). These characteristics produce a different utility function for each type of agent. As before, we assume that each agent i has a utility function over all possible outcomes: U^i : {S ∪ {Opt}} × T ∪ {Disagreement} → IR. In addition, each agent has some probabilistic beliefs about the types of the other agents, and about the other agents' beliefs about themselves and about other agents. These beliefs may be updated over time, during negotiations between the agents. Formally, we denote the possible types of the agents Type = {1, . . . , k}. We assume that the details of those types are mutually believed by the agents and that the set of agents is Agents = {W_1, W_2, . . . , W_k, A_1, . . . , A_k}. That is, in a given encounter, W_i, i ∈ {1, . . . , k}, negotiates with A_j, j ∈ {1, . . . , k}; i and j may be different or may be equal. Neither agent necessarily knows the type of its opponent.

4.4.1 Beliefs, Histories, and Strategies
An agent's negotiation strategy is, in general, any function from the history of the negotiation to its next move. However, in situations of incomplete information, the agent's strategies also take its beliefs into consideration. To
formally define a strategy for agents with incomplete information we must define the notions of a history and of an agent's belief. We consider two steps in each time period. In the first step one of the agents makes an offer and in the second step the other agent responds by accepting the offer, rejecting it, or opting out.

Definition 4.4.1 (History) For any time period t ∈ T and a negotiation step j ∈ {1, 2} of this time period, let H(t, j) be the history through step j of time period t of the negotiation. That is, H(t, 1) is a sequence of t proposals and t − 1 responses, while in H(t, 2) there are t proposals and t responses.

For example, suppose that there are two agents and M = 50. If in the first time period agent i proposes (30, 20), which is rejected by agent j, then H(1, 1) = {(30, 20)} and H(1, 2) = {(30, 20), No}. If in the second time period agent j proposes (25, 25), which is accepted by agent i, then H(2, 1) = {(30, 20), No, (25, 25)} and H(2, 2) = {(30, 20), No, (25, 25), Yes}.

Definition 4.4.2 (System of beliefs) A system of beliefs of agent i is a function ℘_i(H) that is a probability distribution over the types of i's opponents as a function of the history. That is, ℘_i(H) = {(φ^j_1, . . . , φ^j_k) | j ∈ Agents \ {i}} describes agent i's belief about its opponents' types according to a given history of offers and counteroffers H. That is, φ^j_l, l ∈ {1, . . . , k}, indicates that agent i believes with probability φ^j_l that the type of j is l.

For example, suppose there are two agents i and j and three types of agents in the environment, and suppose that before the negotiation starts agent i believes that with a probability of 1/2 its opponent is of type 1, with a probability of 1/4 it is of type 2, and with a probability of 1/4 its opponent is of type 3. That is, ℘_i(∅) = {(1/2, 1/4, 1/4)}. Now suppose i receives an offer s from its opponent j. i may now change its beliefs. For example, it may conclude that its opponent cannot be of type 3, but rather there is a probability of 2/3 that it is of type 1 and a probability of 1/3 that it is of type 2. That is, ℘_i({s}) = {(2/3, 1/3, 0)}. A strategy for an agent i specifies an action in the set {Yes, No, Opt} ∪ S for every system of beliefs and possible history after which this agent has to take an action.
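The history and belief objects of definitions 4.4.1 and 4.4.2 map directly onto simple data structures. The sketch below is ours and purely illustrative: a history is kept as the list of proposals and responses, and a belief vector is updated by Bayes' rule; with the prior (1/2, 1/4, 1/4) and an offer that type 3 would never make (and types 1 and 2 are equally likely to make), it reproduces the posterior (2/3, 1/3, 0) of the example.

```python
# H(2, 2) for the M = 50 example: two proposals and two responses.
history = [(30, 20), "No", (25, 25), "Yes"]

def bayes_update(belief, likelihoods):
    """Update the probabilities over the opponent's types given the probability
    that each type would have made the observed move (Bayes' rule)."""
    joint = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(joint)
    return [x / total for x in joint] if total > 0 else belief

prior = [1/2, 1/4, 1/4]
posterior = bayes_update(prior, [1.0, 1.0, 0.0])   # type 3 never makes this offer
print(posterior)                                   # [0.666..., 0.333..., 0.0]
```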
4.4.2 Sequential Equilibrium
The main questions here pertain to how an agent uses its beliefs during the negotiation, how it updates its beliefs according to the information it gathers
during the negotiation process, and how an agent influences its opponents' beliefs. We examine these problems in several situations using the notion of sequential equilibrium (Kreps and Wilson 1982), which requires that in each time period any agent's strategy will be optimal given its opponents' strategies, the history up to the given time period, and its beliefs. The agent's beliefs may change over time, but must remain consistent with the history. To state the requirement that an agent's strategy be optimal for every history, we must specify its beliefs about the other agents' types. The notion of sequential equilibrium therefore requires the specification of two elements: the profile of strategies and the beliefs of the agents. This means that when the number of agents is n and the number of possible types is k then a sequential equilibrium (S.E.) is a sequence of nk strategies (i.e., k strategies for each agent for any possible type, 1_1, 1_2, . . . , 1_k, . . . , n_1, . . . , n_k) and a system of beliefs with the following properties (Osborne and Rubinstein 1990): each agent has a belief about its opponents' type. At each negotiation period t the strategy for agent i is optimal given its current belief and its opponents' possible strategies in the S.E. At each negotiation step t, each agent's belief is consistent with the history of the negotiation. In other words, the agent's belief may change over time, but it must remain consistent with the history. We assume that each agent in a negotiation interaction has an initial system of beliefs. While the agent's beliefs may change over time, the agent's type, which is characterized by its capabilities and goals, does not change over time, as we explain below. A sequence of nk strategies, one for each possible agent, leads to a probability distribution over the outcomes. For example, if agent i believes with a probability of φ that its opponent j is of type 2, then i expects that with a probability of φ the outcome is determined by the strategy specified for i and the strategy specified in the sequential equilibrium for j_2. If i believes that j's type is k with a probability of φ′, then it assumes that with a probability of φ′ the outcome will be the result of j's usage of the strategy that is specified in the sequential equilibrium for type k and its own strategy. The agents use expected utilities to compare the outcomes. We impose three conditions on the sequence of strategies and the agent's system of beliefs (Osborne and Rubinstein 1990).7

• Sequential rationality—The optimality of agent i's strategy after any history H depends on the strategies of its opponents given their types and their system of beliefs. This means that agent i will try to maximize its expected utility with
regard to the strategies of its opponents and its beliefs about the probabilities of its opponents' type according to the given history.

• Consistency—Agent i's belief ℘_i(H) should be consistent with its initial belief ℘_i(∅) and with the possible strategies of its opponents. Whenever possible, an agent should use Bayes' rule to update its beliefs. If, after any history, all the strategies of agent j in the given sequence of strategies, regardless of agent j's type, indicate that it has to take the same action (e.g., reject an offer, make the same counteroffer), and this action is indeed taken by agent j, then agent i's beliefs remain as they were before the action was taken. If only one of the strategies of j, for example, that of type l, specifies that a given action should be taken (e.g., making an offer s), and the action is indeed taken (e.g., s is offered by j), then i believes with a probability of 1 that j's type is indeed l. The agent uses the same reasoning about its opponent j's beliefs based on the given sequence of strategies and updates j's beliefs in a similar way. (A small sketch of this updating rule appears after this list.) To demonstrate this requirement we return to the above example, where there are three types of agents in the environment. Suppose i's original belief is ℘_i(∅) = {(1/2, 1/4, 1/4)} as above, and suppose that the strategies of j_1, j_2, and j_3 indicate that in the beginning all of them will make an offer s; then i's beliefs cannot be changed if it indeed receives the offer s. However, if the strategies of j_1 and j_2 specify the offer s, but the strategy of j_3 specifies the offer s′, then if i receives the offer s′ it believes that its opponent is of type 3. That is, ℘_i({s′}) = {(0, 0, 1)}.

• Never dissuaded once convinced—Once an agent is convinced of its opponent's type with a probability of 1, or convinced that its opponent cannot be of a specific type, that is, the probability of this type is 0, it is never dissuaded from its view. The condition implies, for example, that in the above example, once agent i reaches the conclusion that its opponent is j_3, it cannot revise its belief, even if agent j subsequently deviates from j_3's strategy. From this point on, i has perfect information on agent j_3 and is sure how j will respond to its offers and which counteroffers it will make.8
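A minimal sketch of the consistency and never-dissuaded-once-convinced conditions (our own illustration; the "prescribed" actions passed in stand for what each type's equilibrium strategy dictates at this point): if every type prescribes the observed action, beliefs are left unchanged; if exactly one type prescribes it, the belief collapses to that type and is never revised afterward; otherwise the belief is renormalized over the consistent types.

```python
def update_belief(belief, prescribed, observed, convinced=None):
    """belief: probability per type; prescribed: action each type's equilibrium
    strategy dictates here; observed: action actually taken.
    Returns (new_belief, convinced_type)."""
    if convinced is not None:                    # never dissuaded once convinced
        return belief, convinced
    consistent = [i for i, a in enumerate(prescribed) if a == observed]
    if len(consistent) == len(belief):           # all types act alike: no news
        return belief, None
    if len(consistent) == 1:                     # only one type takes this action
        i = consistent[0]
        return [1.0 if j == i else 0.0 for j in range(len(belief))], i
    new = [b if i in consistent else 0.0 for i, b in enumerate(belief)]
    total = sum(new)
    return ([b / total for b in new], None) if total > 0 else (belief, None)

# Types 1 and 2 prescribe offer "s", type 3 prescribes "s'"; observing "s'"
# convinces the agent that its opponent is of type 3, as in the example above.
print(update_belief([1/2, 1/4, 1/4], ["s", "s", "s'"], "s'"))   # ([0.0, 0.0, 1.0], 2)
```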
Definition 4.4.3 (Sequential equilibrium) A sequential equilibrium is a sequence of nk strategies and a system of beliefs, for any i ∈ Type, that satisfy the conditions of sequential rationality, consistency, and never dissuaded once convinced. Using this formal definition of sequential equilibrium and the negotiation protocol, we will analyze different negotiation situations.
4.4.3 Attributes of the Utility Functions
We presume that assumptions A0r–A6r of section 4.2 are valid for each type of agent playing the role of A and W, respectively. Recall from assumption A3r that we assume that the agents have a utility function with a constant cost or gain due to delay. Every agent bears a fixed cost for each period. That is, in the complete information case, agent A has a constant time gain c_A > 0, and agent W has a constant time loss, c_W < 0. In the incomplete information case, A3r should be slightly revised. Each agent of type i ∈ Type may have a different constant c_{A_i} > 0 when it plays the role of A, and when it plays the role of W, its constant is c_{W_i} < 0. When A_i and W_j negotiate, the exact values c_{A_i} and c_{W_j} are private information. That is, agent A_i knows its personal gain c_{A_i}, but may not know c_{W_j}, although it knows that it is one of k values. We will consider the situation where it is common belief that |c_{W_k}| ≤ |c_{W_{k−1}}| ≤ · · · ≤ |c_{W_1}| < c_{A_k} ≤ · · · ≤ c_{A_1}. That is, agent W_k loses less than agent W_1 while waiting for the resource. Agent A_k also gains less than A_1 while using the resource. Both agents lose less while waiting than they can gain while using the resource. We also assume that for any time period t, ŝ^{W_k,t}_A < ŝ^{W_{k−1},t}_A < · · · < ŝ^{W_1,t}_A.9 That is, W_k is more willing to opt out (than to reach an agreement) than W_1. We will also assume that for all i ∈ Type, ŝ^{W_i,t}_A − ŝ^{W_i,t+1}_A < c_{A_k}. We show that in situations that satisfy conditions A0r–A6r, a sequence of strategies that comprise a sequential equilibrium exists.10
4.4.4 Negotiation Ends in the Second Period
If the above assumptions hold and the agents use sequential equilibrium strategies, then the negotiation will end in the second period. There is a high probability that the agents will reach an agreement in this period. The exact probability and the details of the agreement depend on agent A’s initial belief. If A believes with high probability that W ’s type is i and its type is actually i, then the probability that Wi will opt out is low. The probability that an agreement will be reached depends also on A’s type. As the difference between A’s utility from an agreement and its utility from opting out decreases, the probability that W will opt out decreases. We will show that all agents that play the role of W (regardless of their type) will try to deceive their opponents and behave as the strongest agent Wk in the first period. Agent A will ignore their offer, and will make its counteroffer in the second period based on its initial belief and its type. In most cases this offer will be accepted, as will be explained below. The remainder of this chapter
describes these results. We first define another notion that captures the agent's belief of how strong its opponent is.

Definition 4.4.4 (The strongest an opponent can be believed to be) Let ℘_m(h) be the system of beliefs of agent m after history h. Let n′ be the maximal n ∈ Type such that φ^j_n ≠ 0. n′ is the strongest agent that m believes its opponent may be.

In the next lemma we show the exact agreements each of the agents makes or accepts in a given time period t. The first part of the lemma (1) indicates that W_i will not offer A anything better than its possible utility from opting out in the next period, with the addition of W_i's loss over time. It can always wait another time period and opt out. That is, W_i won't offer A anything better than its offer in the situation of full information. In the second part of the lemma (2) we show that A will behave toward W no better than it will toward the strongest type A believes W may be. That is, if A believes that W can't be stronger than W_j, it won't offer it more than it would offer W_j when there is full information and W is indeed W_j. The third part (3) of the lemma indicates that if W_i receives an offer worth less to it than opting out (less than ŝ^{W_i,t}), it will opt out. We show that if W_i rejects this offer it won't receive any future offers better than this one. But it prefers opting out over reaching this agreement. On the other hand, if W_i is offered an agreement at least as good as opting out, it should accept it. W_i won't be offered any agreement better than that (part (4) of the lemma).

Lemma 4.4.1 (Agreements that are accepted and agreements that are rejected) Suppose agent W is of type i ∈ Type and A is of type j ∈ Type, and the agents' utility functions satisfy A0r–A6r. Let n_A ∈ Type be the strongest type A believes its opponent can be (as defined in definition 4.4.4). If both W and A use their sequential equilibrium strategies then the following holds:

1. The best offer to A that may be made by W_i: W_i will not offer A in step t more than ŝ^{W_i,t+1}_A + |c_{W_i}|.

2. The best agreement for agent W that may be made or accepted by A_j: A_j will not accept anything less than ŝ^{W_{n_A},t+1}_A + c_{A_j} in step t and won't offer anything more than ŝ^{W_{n_A},t}.

3. W_i will opt out: If in step t the offer that A_j makes to W_i is less than ŝ^{W_i,t}_W, then W_i opts out in step t.
4. The offers that will be accepted by W_i: If in step t, A_j makes an offer s such that s_W ≥ ŝ^{W_i,t}_W, W_i accepts the offer.

Proof:
1. It is clear that any type of W won't offer A more than ŝ^{W_i,t+1}_A + |c_{W_i}| since it can always wait until the next time period, opt out, and achieve a better outcome.
2. When it is A's turn to make an offer in a given time period t, and it believes that n_A is the strongest type W may be, then it will never offer anything better for W than ŝ^{W_{n_A},t}. This offer will prevent any type of W that A believes W can be from opting out. This is the main goal of A; if W rejects its offer but doesn't opt out, A earns another time period of using the resource. If A is offered anything less than ŝ^{W_{n_A},t+1}_A + c_{A_j} of the resource in time period t it should reject this offer; it can always wait another time period and offer W ŝ^{W_{n_A},t+1}, and since it gains c_{A_j} over time, its utility will be at least as large as the utility of ŝ^{W_{n_A},t+1}_A + c_{A_j} at time t.

3. This will be proved by induction on the number of types (|Type|). Note that U^{W_i}((Opt, t)) > U^{W_i}((ŝ^{W_i,t+1}, t + 1)) by assumption (A5r) and since this is a discrete case.

Base case (only two types, k = 2): If there are only two types, it is clear by the second part of the lemma (2) that in any future time period t′ A won't offer W more than ŝ^{W_2,t′}. However, W_2 prefers opting out now over the possibility of getting ŝ^{W_2,t′} in future time periods t′. The only way for A to prevent W_2 from opting out now is by offering at least ŝ^{W_2,t}, which is the worst agreement for W_2 that is still better than opting out. That is, if W_2 is offered anything less than ŝ^{W_2,t} at time t it will opt out. Suppose A offers less than ŝ^{W_1,t}. It is clear that in this situation W_2 will opt out since ŝ^{W_1,t}_W < ŝ^{W_2,t}_W. So, if W doesn't opt out after receiving such an offer, it will be clear to A that its opponent is W_1. Thus by (2) it is clear that in the future (t′) A won't offer W anything better for W than ŝ^{W_1,t′}, and W_1 prefers opting out now over ŝ^{W_1,t′} in time t′.

Inductive case (k = k′ + 1): Suppose the assumption is correct for |Type| = k′, and a new type i is added. If A offers something less than ŝ^{W_i,t}, all the other types of W that are stronger than i opt out, by the induction hypothesis. Therefore, if W_i doesn't opt out, A will know that its opponent is at most i, and won't offer anything better in the future (t′) than ŝ^{W_i,t′} (by (2)).

4. Similar to (3).
Based on this lemma we prove that all agents that play the role of W, regardless of their type, will behave as the strongest type when it is their turn to make an offer. From (1) of lemma 4.4.1, it is clear that agent W_i won't offer A anything more than ŝ^{W_i,t+1}_A + |c_{W_i}|. If so, if there is an agent that offers more, A can conclude that its type is weaker than i. But in such a case, A is better off waiting and offering W an agreement that it would have made if its opponent had been weak. It isn't worthwhile for an agent to reveal that it is weak in the first period, since it can only lose from this exposure. Therefore, all types of agent W behave as the strongest ones. When it is A's turn to make an offer, it should try to maximize its expected utility. Thus it needs to calculate for which agreement, according to its beliefs, its expected utility will be the highest. If it offers its opponent an agreement that fits W_i (ŝ^{W_i,t}), and its opponent is stronger than i, it will opt out. If it is of type i or weaker than i it will accept the offer.

Lemma 4.4.2 (W will pretend to be strong and A will behave according to its expected utility) Suppose agent W is of type i ∈ Type, A is of type j ∈ Type, and the agents' utility functions satisfy A0r–A6r. If both W and A use their sequential equilibrium strategies, then the following properties hold:

(i) An agent of type i ∈ Type will accept any offer greater than or equal to ŝ^{W_i,t}_W.

(ii) All agents playing the role of W, regardless of their types, will offer A ŝ^{W_k,t+1}_A + |c_{W_k}| amount of the resource in any time period t.

(iii) Suppose A has a probability belief of (φ^A_1, . . . , φ^A_k), where φ^A_1 + · · · + φ^A_k = 1. Let Expect(ŝ^{W_i,t}) = (φ_1 + · · · + φ_i)·U^A((ŝ^{W_i,t}, t)) + (φ_{i+1} + · · · + φ_k)·U^A((Opt, t)). Let î ∈ Type be such that Expect(ŝ^{W_î,t}) is maximal over any i ∈ Type. A will offer ŝ^{W_î,t} in time period t when it is its turn to make an offer.

Proof:
(i)
The proof is clear by (4) of Lemma 4.4.1.
(ii) The proof is by induction on the number of types.

Base case (k = |Type| = 2): By (1) of Lemma 4.4.1 it is clear that W_2 will not offer anything more to A than ŝ^{W_2,1}_A + |c_{W_2}|, which will be rejected by A according to (2) of Lemma 4.4.1. So if W_1 offers something better than ŝ^{W_2,1}_A + |c_{W_2}|, A can conclude that W_1's type is 1, i.e., n_A = 1. But, by (2) of the lemma, it will reject the offer, and in the next period it will offer W_1 nothing more than ŝ^{W_1,1}. Therefore W_1 should prefer to pretend to be W_2.
Inductive case (k = |Type| = k′ + 1): Suppose the induction hypothesis is correct for k′ types, and suppose another type i′ is added that is weaker than the previous types. From the induction hypothesis it is clear that all of the types that play the role of W will pretend to be strong; therefore, if W_{i′} behaves differently, it will reveal its type to A, which will reject its offer and won't offer it anything better than ŝ^{W_{i′},t+1} in the future. So, if i′ plays the role of W, it should pretend to be strong.

(iii) By (3) of lemma 4.4.1, it is clear that if A offers ŝ^{W_i,1} in period 1, all the agents that are stronger than i will opt out, while the others will accept it. Thus A calculates its expected utility from all its options, and chooses the best option for itself.

The final results of this section are presented in the following theorem.

Theorem 4.4.2 (Either an agreement will be reached in the second period, or W will opt out) Suppose agent W is of type i ∈ Type and A is of type j ∈ Type, and the agents' utility functions satisfy A0r–A6r. Let î ∈ Type be such that Expect(ŝ^{W_î,1}) is maximal over any i ∈ Type (where Expect is defined as in Lemma 4.4.2 (iii)). If both W and A use their sequential equilibrium strategies, then in the first period (period 0) all types of agents playing the role of W will offer A ŝ^{W_k,1}_A + |c_{W_k}| amount of the resource. A will reject the offer. In period 1 A will offer ŝ^{W_î,1}. If W is at most of type î it will accept the offer. Otherwise, it will opt out.

Proof:
The proof is clear from lemma 4.4.2.
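Lemma 4.4.2(iii) and theorem 4.4.2 reduce A's second-period decision to a short expected-utility maximization. The sketch below is our own illustration with hypothetical names: given A's belief over the k types (ordered from weakest, 1, to strongest, k), A's utility from the agreement ŝ^{W_i,1} for each i, and A's utility if W opts out, it returns the type î whose offer maximizes Expect.

```python
def best_offer_type(belief, u_agree, u_opt):
    """belief[i-1]: A's probability that W is of type i (1 = weakest);
    u_agree[i-1]: A's utility from agreement s^{W_i,1} if it is accepted;
    u_opt: A's utility if W opts out. Returns the target type i-hat (1-based)."""
    def expect(i):                 # types 1..i accept s^{W_i,1}; stronger types opt out
        p_accept = sum(belief[:i])
        return p_accept * u_agree[i - 1] + (1 - p_accept) * u_opt
    return max(range(1, len(belief) + 1), key=expect)
```

With two types this is exactly the comparison carried out by e_h in Example 5 below.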
We would like to indicate that A_j may revise its beliefs about its opponent's type after each negotiation session following the conditions of the sequential equilibrium (definition 4.4.3). If in the second period W accepts an offer equal to ŝ^{W_i,1}_W, A concludes that W is at most of type i. If A offers ŝ^{W_i,1} and W opts out of the negotiation, A concludes that it is of type greater than i. Using this additional information about W, agent A can update its beliefs about other agents. For example, if A knows that there is at most one agent in the system that is of type 1, and it finds out that its opponent from the previous interaction is of type 1, then it can adjust its beliefs about the types of other agents. The updated belief will be used in future interactions. This is the only case we have studied in which one of the agents may actually opt out. However, as more interactions occur, more information about one another is collected, and less opting out will occur in the future.
EXAMPLE 5 We return to the example of the mission to Mars. Suppose that each of the labs (agents) on Mars does not know the exact details of the contracts the other has with companies. There are two possibilities for the contracts: high (h) and low (l). If the type of contracts ESA holds is h, then its utility functions are similar to those of Example 4. It gains $5000 per minute during the negotiation and gains $1000 per minute when sharing the line with NASA. If NASA also holds contracts of type h, then its utility functions are also similar to those of example 4. NASA loses $3000 per minute during the negotiation period and gains $1000 per minute when sharing the line with the Europeans. If NASA opts out its overall gain will be $550,000, but it will also lose $1000 per minute during the negotiation. However, if ESA’s contracts are of type l, then it only gains $4000 per minute while using the line by itself. If NASA’s contracts are of type l it only loses $2000 per minute while negotiating. But if NASA opts out, its overall gain is only $450,000. NASA still negotiates for the usage of ESA’s line in the next 24 hours (i.e., M = 1440) from the time the negotiation ends. Formally, let s ∈ S, t ∈ T :
Type h:
• U^{e_h}((s, t)) = 1000·s_e + 5000t
• U^{e_h}((Opt_n, t)) = 5000t
• U^{e_h}((Opt_e, t)) = 5000t − 100000
• U^{n_h}((s, t)) = 1000·s_n − 3000t
• U^{n_h}((Opt_n, t)) = 550000 − 1000t
• U^{n_h}((Opt_e, t)) = −1000t
• ŝ^{n_h,t} = (890 − 2t, 550 + 2t)

Type l:
• U^{e_l}((s, t)) = 1000·s_e + 4000t
• U^{e_l}((Opt_n, t)) = 4000t
• U^{e_l}((Opt_e, t)) = 4000t − 100000
• U^{n_l}((s, t)) = 1000·s_n − 2000t
• U^{n_l}((Opt_n, t)) = 450000 − 1000t
• U^{n_l}((Opt_e, t)) = −1000t
• ŝ^{n_l,t} = (990 − t, 450 + t)
In this example, |c_{n_l}| = 2 < |c_{n_h}| = 3 < c_{e_l} = 4 < c_{e_h} = 5, as required by the revised version of A3r. The other assumptions can be verified as in Example 3. Let us assume that ESA (playing the role of A) is of type h and NASA (playing the role of W) is of type l. We denote them by e_h and n_l. We consider two cases. Suppose e_h believes that with a probability of 0.5 its opponent is of type h and with a probability of 0.5 its opponent is of type l, i.e., φ^e_l = 0.5 and φ^e_h = 0.5. According to theorem 4.4.2, in the first period, n_l will pretend to be of type h and will offer e_h (ŝ^{n_h,1}_e + |c_{n_h}|, ŝ^{n_h,1}_n − |c_{n_h}|) = (893, 547). e_h will reject the offer. In the second time period, e_h compares offer (888, 552), which will be accepted by both types, with offer (988, 452), which will be accepted
by type l, but after such an offer, if W is of type h, it will opt out. Since its expected utility from offering (888, 552) is higher, it makes this offer, which is accepted by n_l. However, suppose e_h believes only with a probability of 0.1 that its opponent is of type h, and with a probability of 0.9 that its opponent is of type l. The behavior of n_l in the first period is similar to the previous case. It pretends to be h. However, in the second period, e_h's expected utility from (988, 452) is higher than from (888, 552), and therefore it makes this offer to W, which is accepted by n_l.
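As a quick check of our own on the period-1 comparison just described (the utilities follow the Example 5 table, with e_h's gain of $1,000 per allotted minute plus $5,000 for the one elapsed minute):

```python
# e_h's expected utility in period 1 from the "low" offer (988, 452), as a
# function of its belief phi_l that the opponent is of type l.
def expected_low_offer(phi_l):
    u_if_l = 1000 * 988 + 5000          # accepted by n_l
    u_if_h = 5000                       # n_h opts out: U^{e_h}((Opt_n, 1))
    return phi_l * u_if_l + (1 - phi_l) * u_if_h

u_safe = 1000 * 888 + 5000              # (888, 552) is accepted by both types

for phi_l in (0.5, 0.9):
    print(phi_l, expected_low_offer(phi_l), u_safe)
# phi_l = 0.5: 499000 < 893000, so e_h offers (888, 552);
# phi_l = 0.9: 894200 > 893000, so e_h offers (988, 452), as stated above.
```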
4.4.5 Comparison with Data Allocation with Incomplete Information
In the resource allocation with incomplete information case we proposed using sequential equilibrium, while in the case of data allocation with incomplete information we changed the protocol and added a revelation mechanism (see section 3.4). At the first step of the revelation mechanism of the data allocation case, all the agents are asked to report, simultaneously, all of their statistical information about past usage of datasets. Most of the information will be reported by two agents. In such cases, a high penalty for both servers in case of conflicting reports provides an incentive for truth-telling in most of the reports in the data allocation case. However, in the resource allocation case, each agent is familiar only with its own utility function and there is no overlapping of private information in the reports. Thus it is harder to develop a mechanism with an incentive for truth-telling. On the other hand, in the resource allocation case we assume that there is a finite number of types of agents and that the agents have beliefs about the type of their opponent. In the data allocation case the number of possible utility functions is not restricted since it depends on the usages of the data. Therefore, sequential equilibrium can be used in the resource allocation case and not in the data allocation case. Because of these differences, the negotiation ends with an agreement in the first time period in all the situations considered in the data allocation environment, while in the resource allocation environment one of the agents may opt out of the negotiations in some situations.
4.4.6 Simulation Evaluation
We used the simulation of the robots example described in section 4.3.2 to evaluate the results of the strategic-negotiation when A has incomplete information about W .
In the first two sets of experiments, described in figures 4.4 and 4.5, we consider situations where W can be either of type low (l) or high (h). If W is of type l, then its outcome when opting out is 100,000. We varied the outcome of W of type h from opting out between 200,000 and 1,000,000. The other parameters were the same as in the first set of simulations in the complete information case (figure 4.1). In the first set of simulations of the incomplete information case, considered in figure 4.4, W's type was high. In some cases agreement was reached and the graphs specify the share of each agent in the time usage of the resource according to the agreement that had been reached (i.e., a number between 0 and 1,400). However, in the case where opting out occurred (when A offered ŝ^{W_l,1}), the graphs specify the agents' utility from opting out divided by the benefits for the agents from one unit of using the resource (i.e., U^i((Opt, 1))/G_i for i ∈ {W, A}). We tested the cases where A believes with a probability of 0.2, 0.4, 0.6, 0.8, or 1 that W is of type h. Thus the first five points in each graph of figure 4.4 specify the outcome when W_h's utility from opting out was 200,000. The first point specifies the outcome for the 0.2 case, the second point specifies the outcome for the 0.4 case, and so on. The sixth through the eleventh points specify the outcomes for the case when W_h's utility from opting out was 300,000, and
[Figure 4.4 plot: A's part and W's part as a function of W_h's utility from opting out (200,000–900,000).]
Figure 4.4 The agents’ share of the resource usage time as a function of the utility of Wh from opting out and A’s probabilistic belief about W ’s type. In these simulations W ’s type was actually high. A’s probabilistic beliefs that W is of type h that were considered for each value of W ’s utility from opting out were 0.2, 0.4, 0.6, 0.8, and 1.
so on. From these results we can see that when W_h's utility from opting out was relatively low (i.e., 200,000 and 300,000), an agreement was reached regardless of A's beliefs about W's type. However, when W_h's utility from opting out was higher and A believed with only low probability that W is of type h, opting out occurred. For example, when W_h's utility from opting out was 400,000 and A believed only with a probability of 0.2 that W was of type h, then it offered W ŝ^{W_l,1} and W opted out (since it was of type h). Note that while A's normalized utility from opting out was very close to 0 (i.e., its utility from opting out, 5,000, divided by its benefit from the usage of one unit of time, 1,000), W's normalized utility was very close to its utility when an agreement was reached under the same circumstances. In the second set of simulations of incomplete information (see figure 4.5), W's type was low. In all these cases an agreement was reached and the graphs specify the share of each agent of the usage of the resource in the agreement. As in the previous case, A's beliefs concerning W's type being high varied between 0.2 and 1, and A usually treated W as type h. However, when the difference between W_l and W_h increased there were situations in which it treated W as of type l. In particular, when W_h's utility from opting out was between 400,000 and 600,000 and A's belief that W is of type h was
[Figure 4.5 plot: A's part and W's part (in time periods) as a function of W_h's utility from opting out and A's probabilistic belief that W is of type h.]
Figure 4.5 The agents’ share of the resource usage time as a function of the utility of Wh from opting out and A’s probabilistic belief about W ’s type. In these simulations W ’s type was actually low. A’s probabilistic beliefs that W is of type h that were considered for each value of W ’s utility from opting out were 0.2, 0.4, 0.6, 0.8, and 1.
only 0.2, A offered ŝ^{W_l,1}, which was accepted by W (who was of type l). In these cases, A's share in the usage of the resource was higher than in the other cases. Similarly, better agreements for A were reached when W_h's utility from opting out was between 700,000 and 800,000 and A's belief that W is of type h was either 0.2 or 0.4, and when W_h's utility from opting out was 900,000 and the probability was 0.2, 0.4, or 0.6. It is clear that if A had had complete information, ŝ^{W_l,1} would have been reached in all the cases of this set of simulations. Because of the incomplete information the average utility of A was decreased by 22.5% from its utility in similar situations in which there was complete information. W's utility was almost four times higher than in the case in which A had complete information. The big difference between the agents is due to the small share of W_l in an agreement. This can be demonstrated when W_h's utility from opting out was 400,000. In this case, ŝ^{l,1}_W was around 100 and ŝ^{h,1}_W was around 400, that is, four times higher. However, ŝ^{l,1}_A was close to 1,300 and ŝ^{h,1}_A was around 1,000, a decrease of 23%. In the third set of simulations we considered situations where there were five possible types of W: W_1 was the lowest, W_5 the highest. The utility levels from opting out for W_1, W_2, W_3, W_4, and W_5 were 100,000, 300,000, 500,000, 600,000, and 700,000, respectively. We ran 30,000 iterations. In each iteration the type of W was chosen and it was matched with A, who does not know its type. The following cases were considered:

• There was an equal distribution of W's types, and A believed that each type of W had an equal probability, that is, it believed that there was a probability of 1/5 that W's type was i. In this case, A always offered the agreement it would have offered the highest type (i.e., ŝ^{W_5,1}), and the negotiations always ended with an agreement.

• There was an equal distribution of W's types, but A's beliefs about W's type were random. That is, A didn't have an accurate estimation of W's type. In this case A's utility was 8.6% lower than if it had always offered for the highest type. This demonstrates that having mistaken beliefs reduces A's benefits.

• The probabilities that W would be of type 1, 2, 3, 4, or 5 were 0.4, 0.3, 0.15, 0.1, and 0.05, respectively, and A's beliefs were correct. That is, A believed that with probability 0.4 W's type was 1, with probability 0.3 W's type was 2, and so on. In these settings A always offered for W_2, that is, ŝ^{W_2,1}. A's utility was 8.1% higher than if it had offered for the highest type, that is, ŝ^{W_5,1}, even though in 30% of the cases W opted out.
4.5 Multiple-Encounter Negotiation Protocol
There are cases in which the agents may need to meet several times in order to negotiate. In these situations, future encounters play an important role in the negotiations. Agents with incomplete information may take actions to influence their opponent's beliefs, so that they will be able to benefit from future encounters. Furthermore, they may take actions designed to gather information. In this section we study these situations.

4.5.1 Strategies, Histories, and Beliefs
We will adjust the negotiation protocol used in previous sections to apply to the multiple-encounter negotiation situation. The notion of history defined in definition 4.4.1 describes the progress of the negotiation in a single-encounter negotiation. In multiple encounters, given two agents, we use a sequence of histories to describe the encounters between them over time. For i, j ∈ Agents, if the agents negotiate m times, we denote the sequence of their histories by H_{i,j} = H_1, . . . , H_m. Furthermore, we assume that interactions with other agents will not change the agents' beliefs about one another.11 Therefore, if there are several encounters, we assume that the beliefs of the agent at the beginning of encounter H_q, 1 < q ≤ m, before the negotiation starts, will be the same as at the end of the encounter H_{q−1}, after the negotiations have ended and the agents have revised their beliefs accordingly. Furthermore, while in the single-encounter case we are able to find strategies in equilibrium in which the agents take only definite actions (pure strategies), in this case there are sometimes no pure strategies that are in equilibrium, and an agent may choose its actions randomly (mixed strategies). The notions of pure and mixed strategies were proposed by von Neumann (Luce and Raiffa 1957). A pure strategy specifies an action for an agent that can be either a proposal or a response, given the history and the system of beliefs. A mixed strategy requires the agent to draw a random number with a probability distribution specified by the strategy, and then decide accordingly on the action it will take. These mixed strategies will be used to find stable solutions in situations where there are no stable pure strategies.

Definition 4.5.5 (Strategies) A pure strategy for agent i specifies an action in the set {Yes, No, Opt} ∪ S for every system of beliefs and possible history after which this agent has to take an action (as defined in section 4.4.1). A mixed strategy for an agent specifies a probability distribution over actions rather than just an action as in the pure strategies.
When the agents choose randomly between several pure strategies, the expected utility from all of the chosen pure strategies should be the same. Otherwise, they will not agree to randomize but will prefer one pure strategy to the other. The concepts of pooling and separating equilibria are very useful in analyzing situations of multiple encounters and reputation (see chap. 8 of Fudenberg and Tirole 1991). Suppose that an agent that plays a given role in an interaction has different utility functions. Each utility function is associated with a different type of that agent. If all types of a given agent pick the same strategy in all states, the equilibrium is pooling. Otherwise, it is separating—it is possible to identify the agent's type from its actions. There can also be hybrid or semi-separating equilibria, where an agent may randomize between pooling and separating. We use these concepts later in the section.

4.5.2 Two Agents Involved in Two Encounters
We assume that assumptions A0r–A6r are valid also in the case of multiple encounters. According to assumption A5r (range for agreement), agent A always prefers ŝ^{W,t} to opting out, regardless of its own type and regardless of W's type. Therefore, in this section we consider cases in which A_j's type does not play an important role, meaning that A_j's actions do not depend on its exact type. This simplifies our discussion.12 Furthermore, we will assume that there are only two types of agents in the environment, "high" and "low," that is, Type = {h, l}. These types satisfy the following condition: |c_{W_h}| ≤ |c_{W_l}| ≤ c_{A_h} ≤ c_{A_l} and ŝ^{W_h,t}_A < ŝ^{W_l,t}_A < ŝ^{A_l,t}_A ≤ ŝ^{A_h,t}_A. This means that agent h would prefer to opt out more often than agent l. The threat of h to opt out in case it gets low offers causes its opponent to offer higher agreements. In addition, A will prefer to negotiate with W_l rather than with W_h, since ŝ^{W_h,t}_A < ŝ^{W_l,t}_A and therefore U^{A_j}((ŝ^{W_h,t}, t)) < U^{A_j}((ŝ^{W_l,t}, t)). As mentioned above, a pure strategy for agent i specifies an action in the set {Yes, No, Opt} ∪ S for every possible sequence of histories and appropriate system of beliefs. Since the agents' belief at the end of one history is similar to the one at the beginning of the next history in the sequence, there is no need for a strategy to be a function of all the histories in a sequence of histories. Therefore, a strategy for a sequence of histories will be composed of strategies that are functions of one history in the sequence. Furthermore, since in this section we concentrate on the effect of multiple encounters, we will not describe in detail the agent's strategies for each history in the sequence. We will use the strategies described in section 4.4 as the basic
components of strategies of sequences of histories. We will identify strategies that form a sequential equilibrium with the actual events that occur. For example, given a specific encounter, when we say that A_j will offer ŝ^{W_h,1}, we mean that in every time period when it is A_j's turn to make an offer, it will offer ŝ^{W_h,t}, and if it receives an offer smaller than ŝ^{W_h,t}_A + c_{A_j} it will reject it. However, since by the second time period, given W's strategy, either an agreement with ŝ^{W_h,1} or ŝ^{W_l,1} will be reached, or W will opt out, we will characterize the strategies by the behavior in the second time period, that is, ŝ^{W_l,1}, ŝ^{W_h,1}, or Opt. Thus, in the rest of this chapter, the main factors that play a role are the agents' utilities in the second time period of each encounter. To make this section more readable, we will use short notations for these utilities. U^i_o denotes the utility of agent i from outcome o. We will use l for (ŝ^{W_l,1}, 1), h for (ŝ^{W_h,1}, 1), and O for (Opt, 1). That is, o can be either l, h, or O; i can be either A, for agent A, l for agent W_l, or h for agent W_h. That is, U^A_o denotes A's utility from outcome o, U^l_o denotes W_l's utility from o, and U^h_o denotes W_h's utility from outcome o. For example, U^l_l denotes U^{W_l}((ŝ^{W_l,1}, 1)) and U^l_O denotes U^{W_l}((Opt, 1)). These notations are summarized in figure 4.6.
Short Notation
Utility
Explanation
Ull Uhl U Ol UlA UhA U OA
U Wl (ˆs Wl ,1 , 1) U Wl (ˆs Wh ,1 , 1) U Wl (Opt, 1) U A j (ˆs Wl ,1 , 1) U A j (ˆs Wh ,1 , 1) U A j (Opt, 1)
Wl ’s utility for agreement sˆ Wl ,1 in period 1 Wl ’s utility for agreement sˆ Wh ,1 in period 1 Wl ’s utility for opting out in period 1 A j ’s utility for agreement sˆ Wl ,1 in period 1 A j ’s utility for agreement sˆ Wh ,1 in period 1 A j ’s utility if W opts out in period 1
Figure 4.6 Short notation. Note that U OA < UhA < UlA and U Ol < Ull < Uhl .
Negotiations about Resource Allocation
97
Thus suppose that there is some probability that the agents will meet again and negotiate in similar situations in the future. In particular, we assume that the agent that plays the role of A in the current negotiation will play the same role in the future. In some cases, A j may take action in order to find out what W ’s type is. On the other hand, W may want to influence A j ’s beliefs in order to benefit in future encounters and may be willing to lose now in order to increase its overall expected utility from all encounters. Such situations may arise in the mission to Mars example (example 5), if there is some probability that NASA’s antenna will require additional repairs. Suppose that both agents believe with a probability of 0 ≤ β ≤ 1 that they will meet again in a similar situation in the future. We also assume that any given type of agent has the same beliefs concerning its opponent. We will denote by φ j , j ∈ Type, A j ’s beliefs that W is of type l. For example, all the agents Ah believe initially that their opponents are of type l with a probability of φ h , and of type h with a probability of 1 − φ h .13 In the next sections we will characterize the situations of negotiation by two agents in two encounters using different conditions. Note that the conditions stated in the following sections complement one another, and provide us with a large range of situations. The conditions of the following sections consist of inequalities on the utility functions of the agents. When an inequality in the following sections is with respect to agent A the letter A will appear in the condition’s title. For example, AA1, presented in the next section, denotes an inequality involving agent’s A’s utility function. If an inequality is denoted by 1, its reverse will be denoted by 2. For example, the reverse of inequality AA1 is denoted by AA2. As we mentioned above, these two inequalities cover all the possibilities of A’s utility functions, besides the one that yields equality. 4.5.2.1 A’s Expectation from sˆWh , t Is Higher than from sˆWl , t We first consider the case where A j ’s expected utility (regardless of its type) in a single encounter from offering sˆ Wh ,t is greater than offering sˆ Wl ,t (when it is A’s turn to make an offer). As was discussed in section 4.4, if offered sˆ Wh ,t , both types of W will accept the offer, but if offered sˆ Wl ,t , Wh will opt out. Therefore, in the current section we assume that the following holds: AA1 ∀ j ∈ Type, t < 3, φ j U A j (ˆs Wl ,t , t) + (1 − φ j )U A j (Opt, t) < U A j (ˆs Wh ,t , t). We assume that the expected utility of A from offering sˆ Wl ,t in a single encounter situation (i.e., φ j U A j (ˆs Wl ,t , t) + (1 − φ j )U A j (Opt, t)) is lower than its utility from offering sˆ Wh ,t (i.e., U A j (ˆs Wh ,t , t)).
98
Chapter 4
In particular, the utility for A j from offering sˆ Wl ,1 is UlA if W is of type l (which A j believes with a probability of φ j ) and U OA if W is of type h (which A j believes with a probability of 1 − φ j ). Thus the expected utility for A j from offering sˆ Wl ,1 is smaller than the utility of offering sˆ Wh ,1 which will be accepted by W regardless of its type and will undoubtedly provide A j with UhA in the second time period of the negotiation. The main question is, if another encounter is possible with a probability of β, is it worthwhile for A j to offer sˆ Wl ,1 in the first encounter? In such cases, if the offer will be accepted, A j may know for sure that its opponent’s type is l and use its findings in the second encounter. However, if its opponent is of type h, it will opt out. In that case, A j should compare its expected utility from offering sˆ Wh ,1 in both encounters, that is, UhA + βUhA , with offering sˆ Wl ,1 in the first encounter. A j can then decide according to the results whether to offer sˆ Wl ,1 again or sˆ Wh ,1 , that is, φ j [UlA + βUlA ] + (1 − φ j )[U OA + βUhA ]. In the following theorem we consider the situation where the possible loss for A j from offering sˆ Wl ,1 rather than sˆ Wh ,1 in the first encounter is greater than the possible gain for A j from finding out that W is of type l and then reaching the agreement sˆ Wl ,1 . Formally, AA1.1 UhA − [φ j UlA + (1 − φ j )U OA ] > φ j β[UlA − UhA ]. This assumption means that the difference between A’s expected utility in the first encounter from offering sˆ Wh ,t (i.e., UhA ) and its expected utility from offering sˆ Wl ,t in the first encounter, (i.e., [φ j UlA + (1 − φ j )U OA ]), is greater than the difference between A’s expected utility in the second encounter (if occurs) from offering sˆ Wh ,t (i.e., UlA ) and from offering sˆ Wl ,t (i.e., UhA ) when W is of type l. The probability of the event that the second encounter will occur and that W will be of type l is φ j β. In the next theorem we show that if AA1.1 holds both encounters will always end with an agreement. Theorem 4.5.3 (A j does not gain sufficiently from information) If the model satisfies assumptions A0r –A6r , AA1, and AA1.1 and the agents use sequential equilibrium strategies, then A j will offer sˆ Wh ,1 in the second time period of both encounters. This offer will be accepted by both types of W . Proof: If A j offers sˆ Wh ,1 in the first encounter, W will accept the offer regardless of its type. On the other hand, if A j offers sˆ Wl ,1 , it is clear from the discussion of section 4.4 that Wh will opt out and Wl will consider accepting
Negotiations about Resource Allocation
99
the agreement. If Wl accepts the agreement, then A j will realize that W ’s type is l and can use it in the second encounter (if it occurs) by offering sˆ Wl ,1 , which will be accepted by Wl . Thus, to conclude that A j will offer sˆ Wh ,1 , we need to show that: UhA + βUhA ≥ φ j [UlA + βUlA ] + (1 − φ j )[U OA + βUhA ]; but this is clear from assumption AA1.1. The equilibrium in the above theorem is a pooling equilibrium. Since A offers sˆ Wh ,1 , both types of W will take the same actions, and agent A will not be able to obtain additional information on W in these encounters. We now consider situations in which the inequality of AA1.1 is reversed14 and make additional assumptions about Wl ’s utility function. AA1.2 UhA − [φ j UlA + (1 − φ j )U OA ] < φ j β[UlA − UhA ]. This assumption states that the difference between A’s expected utility in the first encounter from offering sˆ Wh ,1 in period 1 (i.e., UhA ) and its expected utility from offering sˆ Wl ,t in period 1 in the first encounter (i.e., [φ j UlA + (1 − φ j )U OA ]) is lower than the difference between A’s expected utility in the second encounter (if occurs) from offering sˆ Wh ,1 (i.e., UlA ) and from offering sˆ Wl ,1 (i.e., UhA ) when W is of type l. The probability of the event that the second encounter will occur and that W will be of type l is φ j β. The next inequality states that the possible loss for Wl from opting out in the first encounter rather than accepting sˆ Wl ,1 is less than the possible gain for Wl in the second encounter, given β, from offering Uhl rather than Ull . AW1 Ull − U Ol < β[Uhl − Ull ]. This assumption states that the difference between Wl ’s utility in the first encounter from (ˆs Wl ,1 , 1) (i.e., Ull ) and its utility from opting out in the first time period (i.e., U Ol ), is lower than its expected utility from (ˆs Wh ,1 , 1) (i.e., Uhl ) and (ˆs Wl ,1 , 1) (i.e., Ull ), in the second encounter, if it occurs (with probability β). If AA1.2 holds there may be situations where it is worthwhile for A j to offer sˆ Wl ,1 , depending on Wl ’s behavior. If AW1 holds, it is worthwhile for Wl to pretend to be Wh by opting out when offered sˆ Wl ,1 in the first encounter to conceal its type. This will ensure that W will be offered sˆ Wh ,1 in the second encounter. This is stated in the following theorem. Theorem 4.5.4 (A j may benefit from information, but Wl conceals its type) If the model satisfies assumptions A0r –A6r , AA1, AA1.2, and AW1, and the agents use sequential equilibrium strategies, then A j will offer sˆ Wh ,1 in both encounters. This offer will be accepted by both types of W .
100
Chapter 4
Proof: If Wl is offered sˆ Wl ,1 in the first encounter, it should consider whether to accept the offer and thus reveal its type (because Wh will never accept this offer), or opt out and receive sˆ Wh ,1 in the next encounter too (if it occurs). That is, if (1 + β)Ull < U Ol + βUhl then Wl should opt out in the first encounter if it receives sˆ Wl ,1 . However, this could be concluded from AW1. If Wl will opt out in the first encounter if offered sˆ Wl ,1 , it is better for A j to offer sˆ Wh ,1 in the first encounter (since A j cannot learn anything by offering sˆ Wl ,1 ), so that both encounters will end with an agreement sˆ Wh ,1 in the second time period.15 Next, we consider situations where the reverse of inequality AW1 holds.16 AW2 Ull − U Ol > β[Uhl − Ull ]. This assumption states that the difference between Wl ’s utility in the first encounter from (ˆs Wl ,1 , 1) (i.e., Ull ) and its utility from opting out in the first time period (i.e., U Ol ) is greater than its expected utility from (ˆs Wh ,1 , 1) (i.e., Uhl ) and (ˆs Wl ,1 , 1) (i.e., Ull ), in the second encounter, if it occurs (with probability β). If AW2 holds, it is worthwhile for Wl to accept sˆ Wl ,1 in the first encounter. In this situation, it is worthwhile for A j to offer sˆ Wl ,1 in the first encounter and to find out what W ’s type is. Theorem 4.5.5 (A j may benefit from information, and Wl reveals its type) If the model satisfies assumptions A0r –A6r , AA1, AA1.2, and AW2, and the agents use sequential equilibrium strategies, then A j will offer sˆ Wl ,1 in the first encounter and decide on its offer in the next encounter according to W ’s behavior in the first one. If W opts out, A j will then offer sˆ Wh ,1 in the second encounter and if W accepts the offer in the first encounter, A j will then offer it sˆ Wl ,1 again in the second encounter. Proof: If Ull − U Ol ≥ β(Uhl − Ull ) then it isn’t worthwhile for Wl to opt out in the first encounter even if it receives an offer of sˆ Wl ,1 . In such situations it is worthwhile for A j to try to find out W ’s type by offering sˆ Wl ,1 . This offer will be accepted by Wl , but Wh will opt out. The equilibrium in the above theorem is a separating equilibrium. At the end of the first encounter A will find out what W ’s type is. We demonstrate the different cases of this section in the following example. EXAMPLE 6 We return to the example of the missions to Mars. Suppose the utility functions of NASA and ESA are exactly as described in Example 5 and
Negotiations about Resource Allocation
101
that ESA (playing the role of A) is of type h and that this is also known to NASA. In addition, ESA believes with a probability of 0.85 that NASA is of type l and with a probability of 0.15 that NASA is of type h (i.e., φ h = 0.85). In this example: Ull = 449,000, Uhl = 550,000, U Ol = 449,000, UlA = 994,000, UhA = 893,000 and U OA = 5,000. It is easy to verify that AA1 holds. Suppose the probability that NASA’s antenna will break again and that an additional encounter will occur is β = 0.5. Then AA1.1 holds and there will be an agreement in both encounters, regardless of the real type of NASA. If the probability that NASA’s antenna will break again is 0.7, then condition AA1.2 holds. However, since in this case AW1 also holds, the results will be similar to the case with probability β = 0.5, that is, the negotiation will end with an agreement in both encounters, regardless of the actual type of NASA. To summarize, the cases that we consider cover almost all the possibilities of utility functions when A’s expectation from sˆ Wh ,t is higher than from sˆ Wl ,t . The only case that is not considered is of situations of equality in the conditions. In all the cases considered in this section, the negotiations end in the second time period in both encounters. Also, the second encounter will always end with an agreement. However, some of the first encounters will end with opting out. This seems to be a rare situation, since this may happen only if the probability of the second encounter (i.e., β) is very low and Wl ’s utilities from sˆ Wl ,1 and sˆ Wh ,1 are very close in value, considering that Ull − U Ol is very close to zero. 4.5.2.2 Aj ’s Expectation from sˆWh , t Is Lower than from sˆWl , t In this section we consider the case complementary to the one described above, where, in a single encounter, the expected utility for A j , regardless of its type, from offering sˆ Wl ,t is greater than from offering sˆ Wh ,t , that is: AA2 φ j U A j (ˆs Wl ,t , t) + (1 − φ j )U A j (Opt, t) > U A j (ˆs Wh ,t , t). We assume that the expected utility of A from offering sˆ Wl ,t in a single encounter situation (i.e., φ j U A j (ˆs Wl ,t , t) + (1 − φ j )U A j (Opt, t)) is higher than its utility from offering sˆ Wh ,t (i.e., U A j (ˆs Wh ,t , t)). In this situation, if there is a single encounter, A j will offer sˆ Wl ,t and will always have the opportunity to get information on W ’s type. Therefore, in such situations, during the first negotiation encounter, Wl wants to convince A j that it is of type h. If it succeeds, in the next encounter A j will treat W as Wh and not as Wl , as if its beliefs are not changed.17 The only way that Wl can convince A j that its type is h is by opting out if it gets an offer less than sˆ Wh ,t . As we explained above, if Wh is offered less than sˆ Wh ,t then it opts out, since opting
102
Chapter 4
out is better for Wh than an offer that is less than sˆ Wh ,t . If Wh rejects the offer (chooses No) it will not reach a better agreement in the future. Therefore, if Wl wants to convince A j that it is of type h it should also opt out. However, it is not always rational for Wl to pretend to be Wh and to opt out. It depends on condition AW2, described in the previous section, in which the difference between Wl ’s utility from sˆ Wl ,1 and (Opt, 1) is greater than the difference between sˆ Wl ,1 and sˆ Wh ,1 multiplied by β. In particular, if AW2 holds, it is not rational for Wl to pretend to be Wh , as stated in the following theorem. Theorem 4.5.6 (Wl accepts sˆ Wl ,1 and reveals its type) If the model satisfies assumptions A0r –A6r , AA2, AW2, and the agents use sequential equilibrium strategies, A j will offer sˆ Wl ,1 in the second time period of the first encounter. Wh will opt out and will be offered sˆ Wh ,1 in the second encounter, which it will accept. Wl will accept sˆ Wl ,1 and will be offered the same in the second encounter and will accept it. Proof: If A j knows that W is of type l, it will offer sˆ Wl ,t in any time period of the negotiation. On the other hand, since Wh will not accept an offer of sˆ Wl ,t if W accepts such an offer, A j can conclude that it is of type l. Therefore, Wl should compare accepting sˆ Wl ,1 in both encounters with opting out in the first encounter and accepting sˆ Wh ,1 in the next one (if it occurs). Thus, if Ull + βUll ≥ U Ol + βUhl , Wl should accept sˆ Wl ,1 . But this inequality is clear from AW2 of the theorem. Since in this section we assume that A j prefers to obtain information (i.e., AA2 holds), theorem 4.5.6 follows. As in theorem 4.5.5, the equilibrium in the above theorem is a separating equilibrium and A j will find out W ’s type in the second encounter. If AW1 holds rather than AW2, then the situation is more complicated. In these cases, Wl ’s expected utility from opting out in the second time period of the first encounter, and accepting an offer in the next encounter as Wh , is greater than its expected utility from accepting an offer as Wl in both encounters. However, if Wl will always behave as Wh , then A j will not change its beliefs if it observes behavior typical to Wh , since it knows that both Wl and Wh behave similarly,18 that is, opt out when offered sˆ Wl ,1 . Thus there is no sequential equilibrium with pure strategies, the agents should use mixed strategies, and the equilibrium is a hybrid equilibrium. As mentioned above, when the agents choose to randomize between several pure strategies the expected utility from all of these strategies should be the same; otherwise they will not agree to randomize but rather prefer one pure
Negotiations about Resource Allocation
103
strategy over the other. In our case, when Wl is offered sˆ Wl ,1 it should randomize between accepting the offer and opting out. Thus its expected utility in both cases should be the same. If A j observes opting out, it should randomize in the second encounter between offering sˆ Wl ,1 again or offering sˆ Wh ,1 . We denote the probability that Wl will opt out if offered sˆ Wl ,1 in the first encounter pW , and the probability that A j will offer sˆ Wh ,1 in the second encounter if W opts out of the first one p A . The probability in which Wl and A j should randomize their strategies in a sequential equilibrium is stated in the next lemma. Lemma 4.5.1 (Probabilities of mixed strategies of Wl and A j ) If the model satisfies assumptions A0r –A6r , AA2, AW1, and the agents use sequential equilibrium strategies (with mixed strategies) then pA = and pW
U l − U Ol
ll β Uh − Ull
(1 − φ j ) UhA − U OA
= φ j UlA − UhA
(4.1)
(4.2)
Proof: Suppose that if Wl receives an offer sˆ Wl ,1 in the first encounter, it will opt out with a probability of pW . Since Wh always opts out in such situations, when A j observes opting out after proposing sˆ Wl ,1 , it will update its beliefs about W ’s type, and using Bayes’s rule it will conclude that W ’s type is l j pW with a probability of 1−φφj +φ j p . In the next encounter (if one occurs), A j will W randomly choose between offering sˆ Wl ,1 and sˆ Wh ,1 if the expected utilities from both offers are the same. If it offers sˆ Wh ,1 , then W will accept the offer regardless of its type and A j ’s expected utility will be UhA . If it offers sˆ Wl ,1 in the second encounter, Wl will accept the offer and Wh will opt out. Using A j ’s updated j φ j pW pW A A belief, its expected utility is 1−φφj +φ j p Ul + (1 − 1−φ j +φ j p )U O . Since A j ’s W W expected utilities from both sˆ Wh ,1 and sˆ Wl ,1 should be the same, we can conclude j (1−φ j )(UhA −U OA ) pW φ j pW A A A . that 1−φφj +φ j p Ul + (1 − 1−φ j +φ j p )U O = Uh and thus p W = φ j (UlA −UhA ) W W Wl ,1 in the first encounter, it will choose randomly If Wl receives the offer sˆ between opting out and accepting the offer only if its expected utility from both are the same. If it accepts the offer and reveals its type, it will be offered sˆ Wl ,1 also in the next encounter; thus Wl ’s expected utility in this case is Ull + βUll . If it opts out with a probability of pW , then in the second encounter A j will offer
104
Chapter 4
sˆ Wh ,1 with a probability of p A and will offer sˆ Wl ,1 with a probability of 1 − p A . Thus Wl ’s expected utility in this case is U Ol + β( p A Uhl + (1 − p A )Ull ). We require that U Ol + β( p A Uhl + (1 − p A )Ull ) = Ull + βUll and we conclude that U l −U l p A = β(Ul l −UOl ) . h l It still remains to be shown that 0 ≤ p A ≤ 1 and that 0 ≤ pW ≤ 1. From our assumptions, Ull > U Ol and Uhl > Ull , it is clear that p A > 0, and from AW1 it is clear that p A < 1. Similarly, since UhA > U OA and UlA > UhA pW > 0, and given AA2, it is clear that pW < 1. Finally, we must verify that under the above mixed strategies, it is still worthwhile for A j to offer sˆ Wl ,1 in the first encounter, where Wh will opt out and Wl will choose randomly between opting out or accepting. This is considered in the following theorem. Theorem 4.5.7 (Mixed strategies and pure strategies) If the model satisfies assumptions A0r –A6r , AA2, AW1, and the agents use sequential equilibrium strategies (with mixed strategies), then if AA23: φ j (1 − pW )(UlA + βUlA ) + (1 − φ j + φ j pW )[U OA + βp A UhA + β(1 − p A )(φ j UlA + (1 − φ j )U OA )] > UhA + β(φ j UlA + (1 − φ j )U OA ) then: First encounter: A j will offer sˆ Wl ,1 in period 1 of the first encounter; Wh will (1−φ j )(UhA −U OA ) always opt out and Wl will opt out with a probability of pW = φ j (U A −U A l h ) and with a probability of 1 − pW will accept the offer. If W accepts the offer, agent A will believe with a probability of 1 that W ’s type is l. If W opts U A −U A out, A will believe with a probability of UhA −U OA that W ’s type is l. l
O
Second encounter: If A believes that W ’s type is l with a probability of 1, then it will offer sˆ Wl ,1 which will be accepted by W .19 U l −U l Otherwise, A j will offer sˆ Wl ,1 with a probability of p A = β(Ul l −UOl ) and h
l
with a probability of 1 − p A it will offer sˆ Wh ,1 . Wl will accept the offer, but Wh will opt out. If inequality AA23 does not hold, A j will offer sˆ Wh ,1 in the first encounter and sˆ Wl ,1 in the second one. Proof: Most of the proof is clear from lemma 4.5.1 and the discussion in section 4.4. It remains to be shown that if inequality AA23 holds, then A j will
Negotiations about Resource Allocation
105
offer sˆ Wl ,1 in the first encounter. According to lemma 4.5.1, if A j offers sˆ Wl ,1 in the first encounter, then it believes with a probability of φ j (1 − pW ) that its offer will be accepted; Wl reveals its type and its overall expected utility in this case is (UlA + βUlA ). A j also believes with a probability of 1 − φ j + φ j pW that W will opt out (either because it is Wh or because it is Wl that opts out with a probability of pW ). In this case its utility in the first encounter will be U OA and in the second encounter A j will offer sˆ Wh ,1 with a probability of p A and with a probability of 1 − p A will offer sˆ Wl ,1 . If it offers sˆ Wh ,1 , its offer will be accepted by both agents; however, Wh will opt out if offered sˆ Wl ,1 . To summarize, A j ’s expected utility from offering sˆ Wl ,1 in the first encounter is: A
j φ p ) U O + βp A UhA φ j (1 − pW ) Ul A + βUlA + (1 − φ j + W
j A j A + β(1 − p A ) φ Ul + (1 − φ )U O . If A j offers sˆ Wh ,1 it will be accepted by W regardless of its type, and A j ’s beliefs will not be changed. According to AA2 in the second encounter, A j will offer sˆ Wl ,1 ; thus A j ’s expected outcome in this case is UhA + β(φ j UlA + (1 − φ j )U OA ). Therefore, A j will offer sˆ Wl ,1 in the first encounter if the following holds:
A + φ j (1 − pW ) UlA + βU l
(1 − φ j + φ j pW ) U OA + βp A U hA + β(1 − p A ) φ j UlA + (1 − φ j )U OA > UhA + β φ j UlA + (1 − φ j )U OA .
According to AA23 this inequality holds.
It is useful to characterize situations where the condition AA2.3 of the above theorem holds. Especially since p A includes Wl ’s utility, it is useful to know whether A j ’s decision depends on Wl ’s utility or not. We found that A j ’s decision depends on its own utilities, A j ’s original belief that W ’s type is l (φ j ), and on the probability that the agents will meet again (β). Lemma 4.5.2 If the model satisfies assumptions A0r –A6r , AA2, AW1 and the agents use sequential equilibrium strategies (with mixed strategies) and
A Uh − U OA UlA − U OA A A j A j A j Ul − Uh + (1 − φ )U O + β(1 − φ )Ul > (1 − φ ) UlA − UhA (4.3) then A j will offer sˆ Wl ,1 in the first encounter.
106
Chapter 4
Proof: After substituting pW and p A according to their definitions in inequality 4.3, we obtain the following:
(1 − φ j ) UhA − U OA A A A (4.4) Uh + βUh − 1 − Ul + βUlA A A Ul − Uh
βUhA Uhl − U Ol (1 − φ j ) UhA − U OA A j
UO + − .1 − φ + UlA − UhA β Uhl − Ull
U l − U Ol j A
φ Ul + (1 − φ j )U OA < 0. + β(1 − l l l β Uh − Ul
After some manipulations of the above, one can conclude that inequality 4.3 holds. Table 4.1 summarizes the results of this section. In the following examples we demonstrate the situations described in this section. 7 We return to the example of the mission to Mars. Suppose that two robots, one from NASA called RobotN and one from ESA called RobotE, need to share a digging tool. They need to divide its usage for 10 hours, but there is some probability that they will need to negotiate again in the near future on the usage of the same digging tool. In both cases RobotN is attached to the tool. Their utility functions are presented in table 4.2. RobotN’s type is known, but RobotE’s type is not known. It can be either l or h, where the utility of Wl from opting out is lower than the utility of Wh from opting out. We assume that RobotN believes with a probability of 34 that RobotE’s type EXAMPLE
Table 4.1 A summary of all the cases of two encounters. Case
Conditions
Results
1
AA1 & AA1.1
(ˆs Wh ,1 , 1) in both encounters
2
AA1 & AA1.2 & AW1
(ˆs Wh ,1 , 1) in both encounters
3
AA1 & AA1.2 & AW2
First encounter: If Wl , then (ˆs Wl ,1 , 1); if Wh , then (Opt,1). Second encounter: If Wl , then (ˆs Wl ,1 , 1); if Wh , then (ˆs Wh ,1 , 1)
4
AA2 & AW2
First encounter: If Wl , then (ˆs Wl ,1 , 1); if Wh , then (Opt,1).
5
AA2 & AW1 & AA23
Second encounter: If Wl , then (ˆs Wl ,1 , 1); if Wh , then (Opt,1). mixed strategies
6
AA2 & AW1 & ¬A A23
First encounter: (ˆs Wh ,1 , 1). Second encounter: If Wl , then (ˆs Wl ,1 , 1); if Wh , (Opt,1)
Negotiations about Resource Allocation
107
Table 4.2 Utility functions of the robots in example 7 el and eh denote ESA’s robot of type l and h respectively. n denotes NASA’s robot. Type h
Type l
U eh ((s, t))
U el ((s, t)) = sn − 2t U el ((Opte , t)) = 2.5 − t U el ((Optn , t)) = −t
= su − 3t e , t)) = 4.5 − t U eh ((Optn , t)) = −t
U eh ((Opt
sˆ el ,t = (7 − ⌊ 2t ⌋, 3 + t)
sˆeh ,t = (5 − 2t, 5 + 2t) U n ((s, t))
= se + 6t U n ((Opte , t)) = 3t
U n ((Optn , t)) = −100
is l. We have UlA = 12, UhA = 9, U OA = 3, Uhl = 5, Ull = 2, and U Ol = 1.5. In this situation, the expectation of RobotN, which plays the role of A, from sˆ Wh ,t is lower than that from sˆ Wl ,t in a single encounter, that is, AA2 holds. 1 Suppose β = 10 . In this situation AW2 holds, and according to theorem 4.5.6 RobotN (playing the role of A) will offer sˆ Wl ,1 = (6, 4) in the first encounter. If RobotE (playing the role of W ) is of type l, it will accept the offer sˆ Wl ,1 = (6, 4) and will receive a similar offer in the second encounter (if it occurs). If RobotE is of type h it will opt out in the first encounter and will accept (3, 7) in the second encounter. In both cases, at the end of the first encounter RobotN will know RobotE’s type. Suppose β = 12 . In this situation AW1 holds and the reverse of inequality AA23 is true. Therefore, according to theorem 4.5.7, RobotN will offer (3, 7) in the first encounter and will offer (6, 4) in the second encounter. Suppose β = 12 as before, but UlA = 20. In this situation AW1 still holds, but inequality AA23 also holds. According to theorem 4.5.7, if RobotE’s type is l and it is offered (6, 4) in the first encounter it should choose randomly between accepting the offer and opting out in the first encounter; with a probability of pW = 52 it will opt out, and with a probability of 35 it will accept the offer. RobotN will offer (6, 4) in the first encounter. In the second encounter, with a probability of p A = 31 it will offer (3, 7) and with 23 it will offer W (6, 4). In the previous section, where AA1 holds, we were able to find pure sequential equilibrium strategies, whereas in this section, where AA2 holds, in some situations the agents will need to use mixed strategies. The reason for this behavior is that when AA1 holds, Wl , whose utility from opting out is lower than that of Wh , does not try to change A’s belief in the first encounter; it only tries not to reveal its type. However, when AA2 holds, Wl tries to decrease A’s
108
Chapter 4
probabilistic belief that its type is l. In several of these situations there is no sequential equilibrium of pure strategies. 4.5.3
Simulation Evaluation
We used our simulation environment to study the agents’ performance in situations of multiple encounters. We chose the parameters of the agents’ utility functions based on example 7. We ran two sets of simulations; in both cases the agents had to divide ten periods of the usage of the resource between them. 4.5.3.1 Two Encounters: Simulation 1 In the first set of simulations, A’s gains during one time period of the negotiation (C A ) was 20. The benefits from using the resource per time period, G i , was 9; W ’s loss during the negotiation, C W , was 10; and W ’s loss over time when an agreement is reached, C ′W , was 7. There were two types of W : low and high. Wl ’s utility from opting out (at the first time period) was 20. Wh ’s utility from opting out varied between 35 and 65. We ran 30,000 iterations. In each iteration there was a second encounter with a probability of 0.6, that is, β = 0.6. In each iteration, with a probability of 0.5, W was of type l, and with a probability of 0.5 it was of type h. When W ’s type was h, its utility from opting out was chosen randomly to be between 35 and 65. In a given iteration, A didn’t know W ’s type but knew the utility functions of both types. The six possible cases of the two encounter situation occurred with the following percentages: 1
A A1&A A1.1
20%
2
A A1&A A1.2&AW 1
10%
3
A A1&A A1.2&AW 2
0%
4
A A2&AW 2
0%
5
A A2&AW 1&A A23
40%
6
A A2&AW 1&7A A23
30%
That is, cases where Wl was willing to accept a low offer in pure strategies (cases 3 and 4) didn’t occur at all in our simulation. In 60 percent of the iterations, A suggested sˆ Wh ,1 in the first encounter (cases 1, 2, and 6), which was accepted by W regardless of its type. In the second encounter A suggested sˆ Wh ,1 in only 44 percent of the iterations (cases 1 and 2 and some cases of 5; see 4.7). In 40 percent of the iterations, A offered sˆ Wl ,1 in the first encounter (case 5). In 50.4 percent of these iterations, W ’s type was h and it opted out. In addition, in 10.5 percent of the cases, where W ’s type was l, it pretended to be h and
Negotiations about Resource Allocation
First Encounter Second Encounter
109
A Suggested sˆ Wl ,1
A Suggested sˆ Wh ,1
W Opted Out
Wl Opted Out
40% 56%
60% 44 %
25.4% 24.2%
5.25% 0%
Figure 4.7 Outcomes of the negotiations of bilateral negotiations, two types of W with equal probability (φ = 0.5), with two possible encounters (β = 0.6.); The results are of simulation 1 in which there was a large difference between Wl and Wh ’s utility from Opt.
opted out. That is, 25.4 percent of all the iterations the negotiations in the first encounter ended with opting out. Thus in none of the iterations could A conclude that W is of type l with a probability of 0, since it couldn’t know whether it was Wh that opted out or whether it was Wl who was pretending. However, when it observed opting out, it slightly increased its probabilistic belief that W is of type h. In the second encounter (if it occurred), the number of times that A offered Wl ,1 sˆ increased. This was because of case 6 (30 percent), in which A offered sˆ Wh ,1 in the first encounter, but only sˆ Wl ,1 in the second encounter. In addition, in 65.2 percent of the iterations of case 5 (40 percent), A also offered sˆ Wl ,1 . Thus the overall percentage of offering sˆ Wl ,1 was 56 percent. However, even though the number of times A offered sˆ Wl ,1 increased, the number of times in which the negotiations ended with opting out only slightly decreased to 24.2 percent. This is because in 53.6 percent of the cases in which the first encounter ended with opting out, A offered sˆ Wh ,1 in the second encounter. In 79.4 percent of these cases, W was actually of type h. The overall utility for A from using the sequential equilibrium strategies rather than using the safe strategy of always offering sˆ Wh ,1 increased, on average, by 10.7 percent. 4.5.3.2 Two Encounters: Simulation 2 To obtain most of the six possible cases of the two encounters situations, we increased Wl ’s utility from opting out from 20 to 33 and varied Wh ’s utility from opting out between 38 and 65. The other parameters were the same as in the previous case. Again, we ran 30,000 iterations, and the six possible cases of the two encounter situation occurred with the following percentages: 1
A A1&A A1.1
40%
2
A A1&A A1.2&AW 1
0%
3
A A1&A A1.2&AW 2 10%
110
Chapter 4
First Encounter Second Encounter
A Suggested sˆ Wl ,1
A Suggested sˆ Wh ,1
W Opted Out
Wl Opted Out
30% 50.2%
70% 49.8%
17.5% 21.4%
2.45% 0%
Figure 4.8 Outcome of the negotiations of bilateral negotiations, two types of W with equal probability (φ = 0.5), with two possible encounters (β = 0.6.); simulation 2: a slight difference between Wl and Wh ’s utility from Opt.
4
A A2&AW 2
0%
5
A A2&AW 1&A A23
20%
6
A A2&AW 1&7A A23
30%
In this set of simulations, since cases 1 and 6 happened in 70 percent of the iterations in which A offered sˆ Wh ,1 , A didn’t have many opportunities to learn W ’s type. However, in case 3 (10 percent), it could undoubtedly determine whether W ’s type was high or low. Also, in 7.55 percent of the cases where Wl accepted sˆ Wl ,1 , A could know its type for sure. In 57.9 percent of the iterations in which opting out occurred in the first encounter, A offered sˆ Wh ,1 , and in 90.1 percent W was Wh . So, even though A did very well in those cases where it had an option to learn, since in 30 percent of the iterations (case 6) A didn’t try to learn and offered sˆ Wl ,1 in the second encounter, the number of opting out increased from the first encounter. Furthermore, in this set of simulations, the overall utility for A from using the sequential equilibrium strategies rather than using the safe strategy of always offering sˆ Wh ,1 increased. However, given the above discussion, it increased by only 4.7 percent. 4.5.4
Extensions of the Model
The situations considered in the previous section are relatively simple: there are two types of agents and only two possible encounters. In this section we discuss possible extensions of the model. Two agents and more than two encounters There are situations in which the agents may meet more than twice. Suppose the agents believe that in addition to the current encounter there is positive probability for m encounters and that the (independent) probability for each of these encounters is βi , i = 1, . . . , m, respectively. This assumption is valid if the probability of the need for a resource in one time period does not depend on the probability of using the resource in another time period. For
Negotiations about Resource Allocation
111
example, the probability that the communication system will be down on one day does not depend on whether it was down on the previous day, assuming that the problem is fixed by the end of each day. Similarly, the fact that there are excessive customer requests on one day does not help in predicting the situation on the next day. It mainly depends on the behavior of the market, the day of the week, and so on. Thus, in our analysis, agents do not update their beliefs about future interactions. This case is similar to that of two encounters, but is more complicated. It was considered in (Kraus 1997). Many resources Suppose that there are several resources in the environment and at any given time, only two agents may share the same resource, after the agents have reached a detailed agreement. There may be two types of resources: available ones, and resources that are already in use by other agents.20 In such an environment, when an agent needs a resource, it may check if there is such a resource that is not in use. However, if all the resources of the type that is needed are already in use, it may find the resources that are being used by only one agent, and based on its beliefs about their types and its utility for using the specific resources, it can decide with whom to start the negotiations. We assume that an agent cannot negotiate with more than one agent concurrently. Let us assume that there is only one encounter, and we call the agent that is waiting for a resource W . For any resource R and an agent A R that uses it, W has certain probabilistic beliefs about its type and about the belief of A R about W ’s type. For each of these types j ∈ Type, W computes its expected utility from negotiation with A Rj . Then, using its own beliefs about the type of its opponent, W computes the overall expected utility from R. After computing the expected utility for all the resources, W chooses the one with the highest expected utility and negotiates according to the strategies of the previous section. After choosing a resource, it is easy to prove that the agent may not change its decision, that is, it will not stop negotiating with one agent and start a new negotiation process with another agent for a different resource. More than two types of agents Suppose there are more than two types of agents. This situation is similar to when there are only two, but the agents need to take more options into consideration. If there is only one encounter and A j offers sˆ Wr ,1 in the second time period, then if W ’s type is i ≤ r it will accept the offer. If i > r , Wi will opt out. Suppose that the maximum expected utility for A j in such a case is when it offers sˆ Wr ,1 and that it isn’t worthwhile for A j
112
Chapter 4
to offer less in order to gain information. If so, A j will offer sˆ Wr ,1 , and if it is accepted, A j will know that W ’s type is at most r and will update its belief accordingly using Bayes’s rules. It is easy to prove that in such a case A j will offer sˆ Wr ,1 in the second encounter as well. However, if W opts out, A j may conclude that W ’s type is greater than r and update its belief. In such a case, in the next encounter A j will choose an x such that x > r and will offer sˆ Wr ,1 . The question, as in the case of two types, is that if W ’s type is less than r is it worthwhile for W to opt out or is there an equilibrium of pure strategies, or should the agents use mixed strategies? In the case where there are no pure strategies, the process of identifying the probabilities of the mixed strategies is similar to the case where there are only two types of agents. However, it will require solving more equations. We conclude this section with a discussion on mixed strategies in MA systems. 4.5.5
Mixed Strategies in MA Systems
In this section we propose that the agents use mixed strategies when there is no equilibrium with pure strategies. Mixed strategies that are in sequential equilibrium are not as intuitive as pure strategies in equilibrium, and many game theorists and economists prefer to restrict themselves to pure strategies in games that have both pure and mixed strategies in equilibrium. Similarly, we suggest using mixed strategies only when there is no equilibrium with pure strategies. We claim that using mixed strategies for automated agents is a helpful technique. Game theorists and economists try to model and estimate human behavior (for example, Rasmusen 1989). One of their main objections to mixed strategies is that people in the real world do not take random actions. This observation is not applicable in MA systems, where all agents are automated and the designers can come to a general agreement that their agents use mixed strategies.21 Even in the case where some of the agents are human, the automated agent can treat the mixed strategies as good descriptions of human behavior in the sense that the actions appear random to observers, even if the human agent him- or herself has always been sure what action he or she would take. For example, if there are several types of human agents, each will take a different action, and the automated agent will have probabilistic beliefs about the human’s type. Moreover, explicitly random actions are not uncommon among humans. For
Negotiations about Resource Allocation
113
example, the IRS’s heuristics for deciding which tax return to audit include random actions. Another objection to the usage of mixed strategies is that an agent that selects mixed strategies must always be indifferent to two pure strategies. Even a small deviation from the probabilities specified by the equilibrium destroys the equilibrium, while this deviation does not change the agents’ expected utility. That is, to maintain the equilibrium, a player must select a particular strategy from strategies it is indifferent to. It seems that in the case of automated agents, the designers can agree in advance on such behavior. Zlotkin and Rosenschein (1991) also consider a certain type of probabilistic actions. They propose the notion of mixed deals in order to resolve conflicts in task distribution. A mixed deal is defined to be a pair of plans PA and PB and a probability p. If the agents reach this deal, then with a probability of p, agent A will do PA and agent B will carry out PB , and with a probability of 1− p, A will do PB , and B will carry out PA . That is, Zlotkin and Rosenschein’s protocol requires that the agents need to draw the random number jointly. The expected utility of an agent from PA and PB is different and there should be a mechanism to force them to carry out their promises after they jointly draw the random number.22 Note that Zlotkin and Rosenschein’s concept is very different from ours.23 We propose to use only pure deals. An agent chooses a strategy randomly, in private, and is motivated by the property that the expected utilities of the strategies it mixes are the same. Furthermore, using Zlotkin and Rosenschein’s mixed deals won’t provide stability in our case. If an agent agrees on a mixed deal, it thereby reveals its type. This is not acceptable in the cases where it considers mixed strategies. 4.6
Other Approaches to the Resource Allocation Problem
The problem of resource allocation has been extensively studied in operations systems, distributed systems, and real-time systems. Here we briefly discuss some of these works and then discuss the research on resource allocation in distributed artificial intelligence. Resource allocation problems, in general, can be categorized as the determination of the scheduled times for project activities that either (1) level the resource requirements, subject to a constraint on the project duration; (2) minimize the project duration, subject to constraints on the availability of resources; or
114
Chapter 4
(3) minimize the total cost of the resources and the penalties due to project delay. Because of their complexity, optimal solutions, using mathematical programming, have very limited utility. Because of this lack of success with optimization procedures, attention has primarily been focused on heuristic procedures that produce feasible schedules. These procedures are classified as “serial” or “parallel” approaches, depending on whether the priorities are determined only once before the activity of scheduling begins or during scheduling. Davis (1966, 1973), and Davis and Patterson (1975) have studied and reviewed this problem extensively and have evaluated a large number of heuristics for prioritizing the activities during the scheduling process. Their primary conclusion is that, while no single heuristic will always provide the best schedule, the rule of scheduling the activities with least slack first (or the equivalent rule of minimum latest start time) has the best average performance, when preknowledge is available. The operations research point of view meets the distributed problem solving (DPS) definition, in that they both attempt to maximize the performance of the system. In our model, which is a multi-agent system (MA), the problem of formulating the optimal resource constrained case is complex because explicit criteria are lacking with respect to the optimal use of a resource. In distributed systems, parallel activities take place at the same time, with a possible need for the same resources. As in DPS, all computers cooperate to achieve a common goal, for example, performance improvement. Loadsharing algorithms try to improve performance by distributing goals among the components more evenly. Such algorithms can be beneficial, as shown in (Eager, Lazowska, and Zahorjan 1986), (Zhou 1988), and (Kremien, Kramer, and Magee 1993). Most of the benefits of load sharing occur as a result of good initial placement decisions, preferably at the component level. 4.6.1
Scheduling in Real-Time Systems
Another approach to the resource allocation problem can be found in realtime systems, where resources such as memory, CPU, and so on, should be accessed by several processes at the same time. Even though the resource allocation problem seems to be similar to the problem of synchronization of accesses to a common resource, straightforward techniques (such as mutual exclusion) cannot be used. The classical theory of coordination using critical section techniques assumes that the agent that is currently using the resource (i.e., it is inside its critical section) must of its own “free will” release the resource. However, in our domain, the agents are not assumed to be benevolent. Even when an agent no longer needs the resource, it has no motivation
Negotiations about Resource Allocation
115
to release it. The agent that holds the resource may need it again in the near future and may not want to risk a long and expensive wait until the next time it obtains the resource. Most of the work in the classical scheduling theory deals with static scheduling (e.g., Ramamritham, Stanlovic, and Shiah 1990). In static scheduling, the scheduling algorithm has complete knowledge of the task set and its constraints, such as deadlines, computation times, precedence constraints, and future release times. This set of assumptions is realistic for many real-time systems. For example, a simple laboratory experiment or a simple process-control application might have a fixed set of sensors and actuators and a well-defined environment and processing requirements; the static scheduling algorithm operates on this set of tasks and produces a single schedule that is fixed for all times. If all future release times are known when the algorithm is developing the schedule, then it is still a static algorithm. A dynamic scheduling algorithm has complete knowledge of currently active tasks, but new task activations, not known to the algorithm when it is scheduling the current set, may arrive. Therefore, the schedule changes over time. For example, teams of robots cleaning up a chemical spill or military command and control applications require dynamic scheduling. As described in (Stankovic et al. 1995), there are few known results for real-time dynamic scheduling algorithms, among them (Cheng, Stanlovic, and Ramamritham 1986; Zhao, Ramamritham, and Stanlovic 1987; Rosu, Schwan, and Jha 1997; Czumaj and Stemann 1997; and Alanyali and Hajek 1997). The important distinction between static and dynamic scheduling is what is known about the algorithm’s performance in each case. For example, consider earliest-deadline-first (EDF) scheduling. When applied to static scheduling, EDF is optimal in many situations, but when applied to dynamic scheduling on multiprocessors, it is not. 4.6.2
Resource Allocation in MA Systems
Other work in the DAI community dealing with the resource allocation problem includes, for example (Conry, Meyer, and Lesser 1988; Kuwabara and Lesser 1989) which present a multistage negotiation protocol that is useful for cooperatively resolving resource allocation conflicts arising in distributed networks of semiautonomous problem-solving nodes. Lesser, Pvalin, and Durfee (1988) address tradeoffs in resource allocation and real-time performance, and develop a mechanism for resource allocation based on the criticality of tasks. Kornfeld and Hewitt (1981) propose resource allocation using specialist “sponsor” agents; and Chandrasekn (1981) proposes resource allocation via resource pricing.
116
Chapter 4
Sengupta and Findler (1992) discuss dynamic scoping and constrained latticelike structures for distributed resource allocation in multiagent environments. Chavez, Moukas, and Maes (1997) developed a market-based control system, named Challenger, that performs distributed resource allocation (in particular, allocation of CPU time). All these works are applicable to DPS environments, but not ours, as we consider MA environments. In summary, in this chapter, we considered situations of bilateral negotiations for resource allocation, where one agent already has access to the resource while the other agent is waiting to use the resource. Situations of complete information and situations of incomplete information were considered, and cases of both single and multiple encounters were studied. In all cases, negotiation ends no later than in the second time period, and usually with an agreement.
5
Negotiations about Resource Allocation with Multiple Attributes
In chapter 4, we considered the abstract problem of resource allocation. We made several assumptions about the agents’ utility functions, but we did not model other aspects of the environment, for example, why the agents need the resource, what the specifications of their utility functions are, and so on. A general model of agents and their goals and the resources they need is presented in this chapter. We continue to consider bilateral negotiations in which one agent uses the resource during the negotiations, while the other agent waits to gain access to the resource. We formally define the exact utility functions for the agents. Such utility functions enable us to devise offers that can be compared to other scheduling techniques via simulations. Furthermore, in the previous chapter, the agents we considered had to divide a given number of time periods of resource usage between them. Therefore, the decision of how many time periods one of them gets to use the resource determined uniquely the number of time periods that the second agent would use the resource. Thus there was only one attribute to the negotiation. In this chapter the available number of time periods of resource usage is not restricted. However, each agent would like to obtain all the time periods that it needs, and as soon as possible. Thus there are two issues to the negotiations: how many time periods each will obtain and when. In particular, it is assumed that the agent that holds the resource will obtain its time periods first, that is, will continue to hold the resource for the number of time periods that is specified in the agreement. Only then will the waiting agent get access to the resource for the number of time periods that is specified in the agreement. That is, an agreement consists of two parts—the time slice of the agent that holds the resource and continues to use it, and the time slice of the agent that would like to gain access to the resource. The waiting agent will gain access to the resource only after the time slice of the first agent has ended. Thus, even though increasing the time slice of the waiting agent can be done without decreasing the slice of the attached agent, the increase may not be beneficial to the waiting agent. It may be too late for it because it needs to wait until the other agent finishes using the resource. In other situations, when the waiting agent does not have a deadline, it does not care if the slice of the agent that holds the resource will be increased, given that its part will be large enough. We have identified strategies that are in subgame-perfect equilibrium in various situations. When the agents follow these strategies, the negotiation ends in the first or the second periods of the negotiations.
118
5.1
Chapter 5
Description of the Environment
There are several agents in the environment that need to satisfy goals, which are possibly given to them sequentially. We refer to this environment as the resource allocation environment (RAE). Time constraints and a deadline are associated with each goal. As in the previous chapter, we consider bilateral negotiation between two agents that need the same resource. One of them— the Attached Agent (A)—is using a resource that another agent—the Waiting Agent (W )—needs. So, W starts a negotiation process to obtain access to the resource. A has possibly performed work before W started the negotiation. During the negotiation process, A continues to hold the resource and to work toward its goal. W cannot move to another goal and return to this goal later, since the goals are given in a meaningful order. For example: W cannot start taking soil samples before digging a hole. Since each goal has a deadline (as will be described below), each negotiation step reduces W ’s chances of accomplishing its goal. Therefore W loses over time. A usually gains over time, unless it needs to stop working on its goal without finishing some minimal requirements with respect to its goal. When one of the agents opts out, both agents wait q time steps, until the damage to the resource is fixed, and then both of them may try to obtain the resource. In some cases, it may be more beneficial for one of the agents to leave the negotiation, rather than to opt out and wait for q time steps. Thus the negotiation has an additional possible action: an agent can leave the negotiation. In this case, the other agent can continue (or start) using the resource immediately and the agent who leaves no longer tries to satisfy its current goal. Thus, if opting out occurs, both agents may continue to compete for the resource after q time periods and may satisfy their goals, whereas if one of the agents leaves, there is no more competition and only one agent, the one that didn’t leave, may satisfy its goal. An offer that is made in the negotiation refers to two aspects of the use of the resource, as shown in the next definition. Definition 5.1.1 (Agreement) An offer that could become an agreement is a pair < s, n >, such that s is the number of steps that A will continue to keep the resource, that is, the number of steps W has to wait.
•
•
n is the number of steps W gains to keep the resource.
The set of possible offers is S = {(s, n) ∈ IN 2 : s ≥ 0 , n ≥ 0}.
Negotiations about Resource Allocation with Multiple Attributes
119
Note that in this case, s does not determine n uniquely (as in the previous chapter). Rather, both s and n are subject to negotiation. In an agreement, (s, n), both s and n are not restricted. However, if s is too large, then W will not have enough time to perform its goal before its deadline. If n is too large, then since holding a resource without using it is costly, it will not be profitable for W . Implementing an agreement means that A will continue to work for s time periods and then W will work for n. We will see that even after an implementation of an agreement the agents’ goals may be only partially satisfied and they may try to obtain additional access to the resource. We demonstrate these RAE environments using a variation of the mission to Mars example. EXAMPLE 8 We return to the example of the mission to Mars. This time NASA has embarked on a scientific mission to Mars that involves sending several mobile robots. The European Space Agency (ESA) has also sent several mobile robots to Mars. Both NASA’s and ESA’s robots work in the same environment. The missions of the robots involve collecting soil samples from different locations and at different depths. To satisfy its goal, a robot will need one of the digging tools. The tools were sent to Mars by a third company, which charges NASA and ESA according to their use of the equipment, but does not schedule its usage. When a robot encounters a situation in which it needs a resource that is used by another robot, it may start a negotiation session.
Since the option of Leave was added to the negotiation protocol, we need to make the following additional assumption:

Leaving vs. opting out and possible agreements: When an agent's utility from leaving and from opting out is the same, it will not opt out of the negotiation. Similarly, when an agent's utility from leaving and from accepting an offer is the same, it will not accept the offer but will leave.

The reason for this assumption is that an agent would like the resource to stay free so it can use it for future goals. Opting out causes damage to the resource and an agreement leads to a usage of the resource; thus both limit the availability of the resource. Leaving the negotiation, on the other hand, does not limit the availability of the resource. Thus, if the utilities from the three options are the same, leaving is preferred.

5.1.1 Goals and Resources
An agent needs a resource in order to fulfill one of its goals, which is formally defined in the following definition.1
Table 5.1
A short description of all notations used in this chapter. The index i refers to either agent A or W.

Goal
  g          Goal identification.  (Def. 5.1.2)
  t_min      Minimum time periods needed for working in order to get paid for this goal.
  t_max      Maximum time periods needed for working on this goal.
  dl         Deadline: the number of time units from the goal's arrival time in which the goal is still relevant.
  m          Payment per time period.
  r          The resource needed for this goal.
  done_i     Periods agent i has been working on the goal so far.
  g^A        Current goal A is working on: < g^A, t^A_min, t^A_max, dl^A, m^A, r >.
  g^W        Goal W wants to work on: < g^W, t^W_min, t^W_max, dl^W, m^W, r >.

Offer
  < s, n >   An offer (possible agreement).  (Def. 5.1.1)
  s          The time periods A gets to keep the resource according to an agreement.
  n          The time periods W gets to use the resource according to an agreement.
  s̃^{A,t}    Best agreement for A in time period t that is not worse for W than opting out.  (Def. 5.1.3)
  s^t_A      The additional time periods A needs in time period t to completely accomplish its goal.  (Def. 5.1.4)

General
  O^l        The latest offer made in the negotiations.  (Sec. 5.2)
  q          Time periods needed for repairing the resource after opting out.  (Sec. 5.1)
  c          Cost for W per negotiation period.  (Sec. 5.1.2)

Time periods
  t^W_neo    The earliest time period in which agent W does not have enough time to perform t^W_min before its deadline after Opt.  (Sec. 5.2.1)
  t^W_ne     The earliest time period in which W does not have enough time to perform t^W_min before its deadline.
  t̂^A        The time in which A would finish working on its goal and prefers to Leave.
  t̂^W        The earliest time in which W prefers to leave over any other option.
  t̂          The earliest time period between t̂^W and t̂^A.
Definition 5.1.2 (Goal) A goal is a tuple with six elements < g, t_min, t_max, dl, m, r >, where

1. g is a unique goal identification number, which enables easy reference to each goal. We denote the set of all the goal identification numbers by GI. We will sometimes refer to the goal using its identification number and vice versa.
2. t_min denotes that an agent must work a minimum of at least t_min time periods on the goal without interruptions in order to receive any payment. That is, this is the minimal amount of time required to partially satisfy the goal.
3. t_max indicates that it is not beneficial for an agent to work more than a maximum of t_max time periods. That is, after t_max time periods, the goal is fully satisfied.
4. dl specifies the deadline for accomplishing this goal, i.e., the number of time units from the time the agent receives the goal until the goal stops being relevant.
5. m is the payment the agent receives per time period, but it is given only when the minimum time periods have been completed. The maximal payment that could be made is m · t_max, for working on the goal for t_max time periods. No payment will be made for additional time periods. The payment is made after a reduction of the fee for using the resource needed for this goal.
6. r is the resource needed for this goal. It is costly to hold it.

We assume that there is a monetary system in the environment that is used for the payments mentioned above, as well as for other costs described below.2 We also assume that the agents try to maximize their payments and minimize their costs.

An agent will try to work on the goal for t_max steps, but it may not be able to do so because the resource may not be available for t_max periods before the deadline. The work toward achieving a goal can be stopped and resumed later, on the condition that the agent has worked for t_min periods on it. However, if it is stopped before the agent has worked on it for at least t_min time periods, the agent will need to start from the beginning if it would like to satisfy the goal. In environments where goals are defined with an exact number of time periods, we set t_min = t_max. In environments in which working toward a goal for any number of time periods is beneficial, t_min = 0.

If the resource needed for the goal is busy, the agent that attempts to fulfill the goal must negotiate access to it, as we will describe in the next section.
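As a concrete illustration of Definition 5.1.2, here is a small Python sketch of the goal tuple; the class is ours, and the two sample goals are the ones used in the robots-on-Mars example later in this chapter.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Goal:
    """A goal <g, t_min, t_max, dl, m, r> as in Definition 5.1.2."""
    g: str       # unique goal identification
    t_min: int   # minimal uninterrupted work needed before any payment is received
    t_max: int   # work beyond t_max periods brings no further benefit
    dl: int      # deadline, in time units from the moment the goal is received
    m: int       # payment per productive time period
    r: str       # the resource needed for this goal

# The two goals negotiated over in the robots-on-Mars example:
G1 = Goal("G1", t_min=15, t_max=70, dl=79, m=4, r="1002")  # RobotN's goal (agent A)
G2 = Goal("G2", t_min=10, t_max=20, dl=30, m=4, r="1002")  # RobotE's goal (agent W)
```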
5.1.2 The Utility Functions
As in the previous chapters we assume that each agent has a utility function: U i : S ∪ {Opt, LeaveW , LeaveA } × G I × IN × T → IR. In this chapter a utility function associates with each of the possible outcomes the expected benefits of the agent from the outcome. The possible outcomes are
an agreement (i.e., a member of S), opting out, and Leave by one of the agents. The expected benefits depend on the goal identification (i.e., a member of GI), the number of periods the agent has been working on the goal until the negotiation started (done_i), and the time period of the negotiation (i.e., a member of T). When done_i and g^i are clear from the context or do not influence the agent's utility, we will not specify them; that is, we will write U^W(Leave_W, t) instead of U^W(Leave_W, g^W, done^W, t).

The utility functions of the agents in the environment considered in this chapter may depend on a variety of factors: the specifications of their current goal, the amount of time already spent in attempting to satisfy this goal, benefits for working on the goal, costs for holding resources, future usage of the negotiated resource for the same goal and for future assigned goals, and future needs of other agents. Taking all these factors into account would make the utility function very complicated, so we will concentrate on the factors that seem to play the most important role in evaluating the outcome of the negotiation. In particular, we will ignore the effects of future assigned goals and consider only some of the influences of other agents (i.e., not the ones that participate in the negotiation), since the negotiating agent is uncertain about these factors, and they play a less important role than the information about the current goal. However, when two outcomes have the same utility, an agent will prefer the one in which the resource will be free in the future, as much as possible, to be used for future goals.

Thus the utility functions of the agents apply several functions that specify possible costs and benefits of the agents and which are used in the computations of the utility of an agent.3

Productive time steps (prod): Suppose an agent is working on a goal g and it is possible for it to obtain access to the resource needed for this goal for n time steps, starting at time t. The function prod specifies the time the agent can actually use toward satisfying its goal, given the goal's deadline and the other parameters of the goal. When calculating the productive steps of an agent, several parameters should be taken into consideration. If an agent cannot perform at least the minimal number of steps required to start receiving payment (t_min), then it receives no payment, and therefore its productive steps equal zero. Thus consider the case where the agent can take at least the minimal number of steps required to start receiving payment for that goal. These steps are all productive only under two conditions: first, that they do not exceed the maximum number of steps needed for this goal; and second, that enough time remains before the goal's deadline is reached.
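One possible reading of prod as code is sketched below, reusing the Goal sketch above; how previously completed work counts toward t_min is left implicit in the text, so the treatment here is an assumption.

```python
def prod(goal, done, t, n):
    """Productive steps out of n consecutive steps of access starting at time t,
    for an agent that has already worked `done` periods on `goal` (a sketch)."""
    # Steps beyond the remaining work or beyond the deadline are never productive.
    usable = max(0, min(n, goal.t_max - done, goal.dl - t))
    # If the minimal requirement cannot be reached, no payment is received at all,
    # so no step counts as productive.  (Assumption: prior uninterrupted work on
    # the goal counts toward t_min.)
    if done + usable < goal.t_min:
        return 0
    return usable

print(prod(G1, done=4, t=0, n=20))   # 20: RobotN's share under the agreement <20, 10>
print(prod(G2, done=0, t=20, n=10))  # 10: RobotE's share, which just meets t_min = 10
```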
Payment function (P_p): We assume that the agent receives a fixed payment for every productive step it works toward satisfying its goal. The payment is made by the entity that requested the satisfaction of the goal.

Costs for holding an unused resource (H_i): It is beneficial for an agent to hold a resource only if it can use it for fulfilling one of its goals. However, some agreements may lead the agent to hold a resource even if it cannot use it.4 The function that specifies the cost of holding an unused resource for agent i ∈ {W, A} is denoted by H_i.

Negotiation cost (P_n): We assume that W has a fixed cost for each negotiation step. The cost is due to communication costs, etc., which it needs to pay. Since W initiates the negotiations, we assume that it needs to pay these costs and not A.

Possible productive steps after an agreement (f^a_i): After an agreement is implemented, an agent may try to gain access to the resource again. If successful, it can use additional time steps to fulfill its goal, and may obtain additional payment. These future steps may be productive only if they are prior to the goal's deadline (dl) and they do not exceed the maximal number of steps required for the goal (t_max). In addition, the agent must be able to complete at least t_min steps with no interruptions, i.e., either due to the agreement, or after the agreement was implemented. Note that these possible steps will be used by the agent only if it actually gains access to the resource after the agreement is implemented.

Possible productive steps after opting out (f^o_i): For an agent to determine its expected utility from opting out, it must estimate the number of possible productive steps when the resource will be accessible again (i.e., after q time steps from the time of opting out). The main question here is whether the deadline will be reached before the resource recovers, i.e., whether dl^i − q − t < 0. If so, no productive steps after opting out are available.

Possible productive steps after Leave_W (f^l_A): Finally, we consider the case where W leaves the negotiation. In this case A can continue working on its goal until it has worked for t^A_max periods or until its deadline. That is, its future steps are the maximal number it needs. Note that W does not have any future steps after it leaves, and similarly when A leaves it does not have future steps.
Probability of gaining access to the resource after an event (p_i): As mentioned above, an agent may be able to work on its goal after opting out or after an agreement has been implemented5 only if it gains access to the resource. The probability that agent i will actually gain access to the resource after an agreement is implemented or after the recovery period from opting out is over is denoted by p_i ∈ [0, 1], i ∈ {A, W}. To simplify the model, we assume that after one of the agents gains access to the resource after an agreement is implemented or after the recovery period from opting out is over, the other agent will leave.

We demonstrate the agents' utility function using the following example.

EXAMPLE 9 We return to the example of the robots on Mars (example 1.2.2). Suppose that at some given time, a robot sent by NASA, called RobotN, is working on goal G1, which is to dig a hole in Mars' surface using a special digging tool denoted 1002. Its task should take at least 15 minutes and should not last more than 70 minutes (the rock's structure is unknown to the researchers, and therefore a time interval is given instead of an exact amount of time). The deadline of this goal is set to 79 minutes. The payment for each productive minute is 4. We formally specify this goal by < G1, 15, 70, 79, 4, 1002 >. Similarly, RobotE is a robot sent by ESA, and one of its goals is to dig a small hole in another location. We specify this goal by < G2, 10, 20, 30, 4, 1002 >. That is, t^W_min = 10, t^W_max = 20, W's deadline is 30, and its payment, similar to RobotN, for each productive minute is 4.

RobotN has started working on its goal first. It has already been working for four minutes (done^A = 4) when RobotE wants to start working on its goal and realizes that RobotN is using the resource it needs. This situation requires negotiation, where RobotN is the Attached agent and RobotE is the Waiting agent. Opting out in this case causes the resource to go out of use for eight minutes (q = 8). After opting out or after an agreement is implemented (i.e., A has worked for s time periods and W for n time periods), if the agents weren't able to fully satisfy their goals and they still have enough time (i.e., their deadlines haven't arrived yet), they may try to get access to the resource. We assume that both robots have an equal probability of gaining access to the tool in such a case (p_W = p_A = 0.5). Since RobotE initiates the negotiation, it has costs of 2 per minute for the communication.

Let us consider the utility of the robots from a specific possible agreement. Suppose the robots agree at the beginning of the negotiation (i.e., t = 0) that
RobotN will continue working for an additional 20 minutes and will then allow RobotE to use the tool for 10 minutes, that is, < 20, 10 >. This will allow both RobotN and RobotE to perform their minimal required time periods, but will allow RobotN to perform for a few more time periods before the agreement is implemented, and possibly additional time periods after the agreement is implemented. In particular, RobotN, which plays the role of A, is allowed to work 20 minutes according to the agreement. Thus its utility from this work is 20 · 4 = 80. Working for 20 minutes is more than its minimal requirement (t^A_min = 15), but even with the 4 minutes RobotN had worked before the negotiation started (done^A = 4), it is less than its maximal time (t^A_max = 70). RobotN has enough time to work an additional 20 minutes since its deadline is in 75 minutes (note that the original deadline was 79 minutes from the time it started working on the goal, but it has already worked for 4 minutes). After the agreement is implemented, that is, after RobotN has worked for 20 minutes and RobotE has worked for 10 minutes, RobotN will have 45 minutes until its deadline (which together with the minutes worked before the agreement will give it 69 minutes, i.e., almost its maximum). However, the probability that it will be able to work these 45 minutes is only 0.5. Thus its expected utility from working after the agreement is implemented is 0.5 · 45 · 4 = 90. Summing these utilities we obtain that U^A(< 20, 10 >, G1, 4, 0) = 170.

The only utility for RobotE, which plays the role of W, is attained from reaching the agreement. It does not have future time periods since its deadline will arrive after the agreement is implemented, that is, in 20 + 10 minutes. Thus its payments are 10 · 4 = 40. Since the agreement is reached during the first time period of the negotiation, it does not have any additional costs. Thus U^W(< 20, 10 >, G2, 0, 0) = 40.

Opting out at the beginning of the negotiation (t = 0) is worse for RobotN than the agreement < 20, 10 >. If opting out occurs, RobotN won't have any productive time periods before it. However, it will be able to perform most of its maximal required time, that is, 67 minutes, after the resource becomes available again and before its deadline arrives. In this case, it will gain access to the resource only with a probability of 0.5. Thus U^A(Opt, G1, 4, 0) = 0.5 · 67 · 4 = 134 (assuming, as we have, that no additional negotiation session will be started by RobotE). RobotE, which plays the role of W, will have the chance to work its total maximal time after opting out, but again only with a probability of 0.5. Thus U^W(Opt, G2, 0, 0) = 0.5 · 20 · 4 = 40. This is equal to the utility from the agreement < 20, 10 >, where it works for 10 minutes with certainty.
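The arithmetic of example 9 can be retraced mechanically. The script below only mirrors the numbers worked out above (payment 4 per productive minute, p_A = p_W = 0.5, q = 8, and a restart-from-scratch rule after opting out, which is what the 67-minute figure implies); it is not a general implementation of the utility functions.

```python
M, P, Q = 4, 0.5, 8      # payment per minute, re-access probability, repair time

# The agreement <s, n> = <20, 10> at t = 0.
s, n = 20, 10
done_A, tmax_A, left_A = 4, 70, 75   # RobotN: 4 minutes done, 75 left to its deadline

u_A_now    = s * M                                        # 20 * 4 = 80
future_A   = min(left_A - (s + n), tmax_A - done_A - s)   # min(45, 46) = 45
u_A_future = P * future_A * M                             # 0.5 * 45 * 4 = 90
print(u_A_now + u_A_future)                               # 170.0 = U^A(<20,10>, G1, 4, 0)

print(n * M)                                 # 40 = U^W(<20,10>, G2, 0, 0); no future steps

# Opting out at t = 0: the resource is unusable for Q minutes, and work interrupted
# below t_min is lost, so RobotN would start again from scratch.
print(P * min(left_A - Q, tmax_A) * M)       # 0.5 * 67 * 4 = 134.0 = U^A(Opt, G1, 4, 0)
print(P * min(30 - Q, 20) * M)               # 0.5 * 20 * 4 = 40.0  = U^W(Opt, G2, 0, 0)
```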
5.1.3 Comparison with the Single Attribute Case
In this chapter we consider a situation of agents working on goals (referred to as the "environment with goals"), which is more complex than the situations of resource allocation discussed in the previous chapter. However, since both chapters consider the problem of resource allocation, it is possible to check whether assumptions A0r–A6r of chapter 4 hold for the "environment with goals" case.

A0r: Assumption A0r states that disagreement is the worst outcome for W and the best outcome for A. It is easy to see that in the environment with goals, disagreement is the worst outcome for W as in A0r. However, since A needs to pay for holding the resource when it is not in use, disagreement, that is, holding the resource forever, is also the worst outcome for A. That is, from the time that A finishes working for t^A_max steps, or its deadline arrives, the negotiation is costly for it and negotiating forever is the worst outcome. Thus, in the example of robots on Mars, if RobotN is working on < G1, 15, 70, 79, 4, 1002 > as in example 9, when it has already worked for 4 minutes before the negotiation started, then after negotiating for 66 minutes, RobotN pays 2 per step.

A1r: This condition specifies that the resource is valuable, and each agent would like a larger share of the resource (or its usage time). In our case, both agents want, in general, more time to work on their goals. However, this desire is constrained by their deadlines and t^i_max. Thus A1r is valid only for some agreements. For example, RobotN of the above example does not want more than 66 minutes to work on its goal.

A2r: The condition A2r requires that W loses over time and that A gains over time, with respect to agreements. As explained in the discussion on W's preferences, W is losing over time; thus A2r is valid with respect to W. A gains over time only in some cases, depending on the specific agreement and the time of the negotiation, as discussed in the section on A's preferences.

A3r: In this assumption we require that agent A has a utility function with a constant gain due to delay and agent W has a utility function with a constant cost due to delay. The simple requirement of A3r is not valid in our case. W does have a constant cost of the negotiation. However, gaining additional time proportional to this loss is not always beneficial, given the specification of W's goal.
In particular, there are situations where n_2 > n_1, but U^W(< s, n_2 >, t) < U^W(< s, n_1 >, t). For example, if RobotE is working on < G2, 10, 20, 30, 4, 1002 >, then for n_1 = 20 and n_2 = 25, U^W(< 10, 20 >, 0) > U^W(< 10, 25 >, 0). Similarly, such a constant does not exist in A's case.

A4r: This assumption indicates that W prefers opting out sooner rather than later, while A prefers opting out later rather than sooner. This assumption is correct for W, but not for A, as discussed below in the section on A's preferences (section 5.1.5.2).

A5r: This assumption requires that if there are some agreements that agent W prefers over opting out, then agent A also prefers at least one of those agreements over W's opting out, even in the next period (assuming the condition of "agents avoiding opting out" holds). The agreement < 0, 0 > is always preferred by both A and W over opting out. Since we assume that q ≥ 1, under most conditions < 0, 0 > will also be preferred by W over opting out in the next time period. However, if A has finished working on its goal, this is not the case.

A6r: The last assumption of chapter 4 indicates that there is an agreement in the first step of the negotiation that is preferred by both agents to opting out. This assumption is valid in the current case, since (0, 0) is such an agreement. However, note that in this chapter we also assume that the agents can leave the negotiation without causing damage to the resource. Leaving the resource during the first time period may, in some cases, induce the same utility for W as (0, 0).

As can be seen from our discussion, the utility functions of the agents in the current chapter do not satisfy assumptions A0r–A6r. Thus we can't use the results achieved in the previous chapter here. Nevertheless, close analysis of this case shows that an agreement may be reached at the close of the first time period, as will be proved in the next section.

5.1.4 Acceptable Agreements
The main question in identifying strategies that are in subgame-perfect equilibrium is which offers may be acceptable to the agents. It is clear that such offers must be better than (or equal to) opting out or leaving for both agents. Otherwise, they will prefer the other options to reaching an agreement. The negotiation time also plays an important role in the identification of the offers that may be acceptable. A is attached to the resource and doesn't lose
over time; it will try to prolong the negotiation. In addition, A's utility from opting out may be much lower than W's utility from opting out. W loses over time and would like to reach an agreement as soon as possible. Thus the only way W can make A accept an offer is by threatening to opt out. Therefore, as in the previous chapters, acceptable offers must be better for W and A than opting out, that is, members of Possible_t. From these agreements, the one that is best for A may be reached. As we will show later, in most of the cases only such an agreement will be offered and accepted when the agents are using strategies that are in subgame-perfect equilibrium. In chapter 2 we formally defined such an offer for a given time t and denoted it by s̃^{A,t}. We restate definition 2.3.2 below.

Definition 5.1.3 (Acceptable offer) If Possible_t is not empty, let the agreement s̃^{A,t} = < s, n > be the best agreement for A, at time period t, which is still not worse for W than opting out. That is, U^i((s̃^{i,t}, t)) = max_{s ∈ Possible_t} U^i((s, t)).

Note that Possible_t is never empty. The agreement < 0, 0 > is always not worse for both agents than opting out: it allows both agents the possibility of obtaining the resource again immediately with probability p_i, which is better than trying to gain access to the resource after q time periods, as in opting out. However, in some cases, the utility for an agent from < 0, 0 > is equal to its utility from leaving the negotiation. This occurs when it doesn't have enough time to work on its goal for t^i_min before its deadline.

Note also that in this chapter s̃^{A,t} ≠ ŝ^{W,t}. That is, the best agreement for A at period t that is still not worse for both agents than opting out may not be the worst agreement for W among the agreements that are still not worse for both agents than opting out. This is because there is no strict competition between the two agents as in the previous chapter.

s̃^{A,t} has an important role in A's and W's strategies that are in equilibrium, as we will see in the following sections. To find such strategies, we studied s̃^{A,t} thoroughly, under the different conditions and cases the agents may be involved in. The next definition deals with A's needs: how many additional time periods it needs in period t in order to accomplish its goal, given the specifications of its goal and the current time period. At a given time period t of the negotiations, A has already worked for t + done^A time periods. In the best case, it will be able to work for enough periods to reach its maximal requirement, that is, t^A_max − done^A − t. However, its deadline may arrive before this, and thus it will be able to work only for the time left until its deadline, that is, dl^A − done^A − t. This is summarized in the following definition.
Definition 5.1.4 (s^t_A) s^t_A = min{dl^A − done^A − t, t^A_max − done^A − t}.
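Stated as code, the definition is a one-line helper; the sample values below are the robots-on-Mars figures from example 9 and are only illustrative.

```python
def s_t_A(t, done_A, t_max_A, dl_A):
    """s_A^t = min{dl^A - done^A - t, t_max^A - done^A - t}  (Definition 5.1.4)."""
    return min(dl_A - done_A - t, t_max_A - done_A - t)

print(s_t_A(0, done_A=4, t_max_A=70, dl_A=79))    # 66 minutes still needed at t = 0
print(s_t_A(13, done_A=4, t_max_A=70, dl_A=79))   # 53 minutes still needed at t = 13
```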
When A works for an additional s^t_A time periods, after t time periods of negotiation, it will have accomplished the feasible part of its goal. Therefore, it is easy to prove the following lemma.

Lemma 5.1.2 (< s^t_A, 0 > is the best agreement for A.) For all t, n, and s' ≠ s^t_A, U^A(< s', n >, g^A, done^A, t) ≤ U^A(< s^t_A, 0 >, g^A, done^A, t).

Thus s^t_A is an extreme point of A's utility function regardless of t. Schechter (1996) identified the values of s̃^{A,t} given the agents' utility functions. Here we simply demonstrate these findings using an example and present our findings concerning the agents' preferences.

EXAMPLE 10 We return to the example of the robots on Mars (example 9) and compute s̃^{A,0}. Recall that RobotN's goal is < G1, 15, 70, 79, 4, 1002 >, that is, t^A_min = 15, t^A_max = 70, and the deadline is 79. The goal of RobotE is < G2, 10, 20, 30, 4, 1002 >, that is, t^W_min = 10, t^W_max = 20, and the deadline is 30. In this case, RobotE, which plays the role of W, has enough time to work for at least its minimal required time (t^W_min = 10) after opting out, that is, dl^W − q ≥ t^W_min, where dl^W = 30 and q = 8. Furthermore, W can accomplish its goal completely by working on it for t^W_max time periods after opting out, since it has more than t^W_max = 20 time periods left after opting out.

RobotN plays the role of A. At the beginning of the negotiations, the number of minutes left for RobotN to work in order to fully accomplish its goal is 66. This is because the goal requires 70 minutes and it has already worked for 4 minutes. The 66 minutes are productive since it has enough time until its deadline. That is, s^0_A = 66. However, if RobotN continues working for 66 minutes, RobotE will not be able to even partially accomplish its goal, since its deadline is 30 minutes. RobotE should consider two possible agreements, < 20, 10 > and < 0, 0 >, which are better for RobotE than opting out. In < 20, 10 >, A will work until the last minute at which RobotE is still able to work its minimal required time periods. In < 0, 0 >, A will give up the resource and has a probability of 0.5 of gaining it back immediately. As calculated in example 9, U^A(< 20, 10 >, G1, 4, 0) = 170. It is easy to see that U^A(< 0, 0 >, G1, 4, 0) = 140. Intuitively this means that RobotN will prefer to continue working for 20 minutes, then giving the tool to RobotE for 10 minutes and waiting, and only after it finishes will it try to gain access to
the tool. This option is better than leaving the resource immediately and having a chance equal to that of RobotE to gain the tool, and to work possibly for 70 minutes. This is the case even though in the first possibility, that is, < 20, 10 >, it will not be able to satisfy its maximal requirements. Thus s̃^{A,0} = < 20, 10 >.

It is useful to observe how s̃^{A,t} changes over time. At periods 1 and 2, the conditions do not change significantly, and thus s̃^{A,1} = < 19, 10 > and s̃^{A,2} = < 18, 10 >. At time 3, RobotE does not have enough time before its deadline to do t^W_max = 20 after opting out, but still s̃^{A,3} = < 17, 10 >, allowing RobotE to work for its minimal requirement before its deadline. The situation changes only at time 13, when RobotE does not have enough time to work for its minimal requirement before its deadline after opting out. In this case, s̃^{A,t} = < s^t_A, 0 >, that is, s̃^{A,13} = < 53, 0 >, since A has worked for 4 minutes before the negotiations started (done^A = 4) and it continues to work for 13 minutes during the negotiations.

Note that RobotN's utility from s̃^{A,t} does not change over time until t = 13. However, s̃^{A,13} at period 13 (U^A(< 53, 0 >, G1, 4, 13) = 66 · 4 = 264) is better than s̃^{A,0} at the first time period (U^A(< 20, 10 >, G1, 4, 0) = 170). RobotE's utility from s̃^{A,t} is reduced over time. In particular, U^W(< 20, 10 >, G2, 0, 0) = 40; since RobotE pays 2 per minute for negotiation costs, U^W(< 19, 10 >, G2, 0, 1) = 40 − 2 = 38 and U^W(< 18, 10 >, G2, 0, 2) = 36. Its utility from s̃^{A,13} is very low, since it does not gain any productive working time but still needs to pay for the negotiation, that is, U^W(< 53, 0 >, G2, 0, 13) = −26.
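The switch in s̃^{A,t} described in example 10 can be reproduced with a few lines of code. The sketch follows the case analysis given in the text for this particular example and is not a general algorithm for computing s̃^{A,t}: while W could still complete t^W_min after opting out, A's best acceptable offer leaves W exactly its minimal requirement; afterwards A simply demands s^t_A.

```python
def s_tilde_A(t, dl_W=30, q=8, tmin_W=10, done_A=4, tmax_A=70, dl_A=79):
    """s~(A,t) for example 10 (a sketch of the case analysis in the text)."""
    if dl_W - q - t >= tmin_W:              # W's threat of opting out still has force
        return (dl_W - tmin_W - t, tmin_W)  # <20, 10>, <19, 10>, ... as t grows
    s_A = min(dl_A - done_A - t, tmax_A - done_A - t)
    return (s_A, 0)                         # from t = 13 on: <53, 0>, <52, 0>, ...

for t in (0, 1, 2, 3, 12, 13):
    print(t, s_tilde_A(t))
# RobotE's utility from these offers falls by its negotiation cost of 2 per minute:
# 40 at t = 0, 38 at t = 1, 36 at t = 2, ..., and -26 from <53, 0> at t = 13.
```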
The Agents’ Preferences in Respect to Different Outcomes
An important question that we consider before specifying the strategies for A and W that are in equilibrium is the agents' preferences between opting out and s̃^{A,t}, and the way these preferences change over time. Schechter (1996) presents a detailed discussion and formal results concerning this issue. Here we only summarize the most interesting results and present a lemma that is needed for the proofs in the next section.

5.1.5.1 W's Preferences

In general, W loses over time. Thus we assume that the following properties are true concerning its preferences.

W prefers opting out now, rather than later: If W opts out now, it has the option of more future time periods to accomplish its goal. In addition, if it
spends less time negotiating because it opts out earlier, its negotiation costs are lower.

W prefers s̃^{A,t} now to s̃^{A,t+1} later: W's share in s̃^{A,t} does not increase over time. In addition, it pays for the negotiation.

W prefers s̃^{A,t} now to opting out later: This is the result of W's preference for opting out sooner rather than later and of the fact that s̃^{A,t} is always not worse than opting out at time t.

W prefers < 0, 0 > to opting out: When opting out, W needs to wait q ≥ 1 time periods before it can try to gain access to the resource (and has a probability of p_W of gaining it); when the agreement < 0, 0 > is implemented, it can try immediately to gain access to the resource (and again has a probability of p_W of gaining it).

The following lemma indicates that once W has no preference between opting out and leaving, or prefers leaving to opting out, this weak preference for leaving persists for the rest of the negotiation. This lemma is used in the next section, and its proof follows from the definition of W's utility for opting out and leaving.

Lemma 5.1.3 (Leave_W vs. Opt.) For all t, t' ∈ T, if U^W(Leave_W, g^W, done^W, t) ≥ U^W(Opt, g^W, done^W, t), then for t' > t, U^W(Leave_W, g^W, done^W, t') ≥ U^W(Opt, g^W, done^W, t').

5.1.5.2 A's Preferences

In general, A gains over time when its deadline has not been reached and it hasn't satisfied its goal completely. However, there are situations in which it will prefer an event sooner rather than later. Thus we assume that the following properties are true concerning its preferences.

Opting out now vs. opting out later: One can mistakenly assume that A prefers to opt out as late as possible, since it is the one that holds the resource and therefore profits over time. However, there are periods of time in which A loses over time: when A has not yet performed t^A_min periods, it does not get paid for each negotiation period and therefore loses over time (i.e., it must pay for holding its resource). During this period, A will prefer to opt out, and save the payment for holding its resource, rather than reach an agreement that will not allow it to work for t^A_min time periods. Another case in which A loses over time is after it has completed its maximum periods. Any additional negotiation causes it to overpay for holding the resource when it doesn't actually need it. Thus it will
prefer to leave or opt out. In all other cases, A gains over time and prefers to opt out later rather than sooner.

Even though A may lose over time, that is, its utility from opting out earlier exceeds its utility from opting out later, if the loss is because A hasn't yet performed t^A_min time periods and it still has enough time until its deadline to work on the goal for t^A_min time periods, there is a t' in the future at which A gains greater utility from opting out, compared to its present utility. Therefore, in such cases it will be better for A to delay its opting out until that time period and thereby gain more utility.

A has no preference between s̃^{A,t} = < s^t_A, n > now and s̃^{A,t+1} = < s^{t+1}_A, n > later: When s̃^{A,t} = < s^t_A, n > for some n ∈ IN and s^t_A > 0, A gains its maximal utility.6 As t increases, s^t_A decreases respectively. Therefore, instead of s̃^{A,t} = < s^t_A, n >, we have s̃^{A,t+1} = < s^t_A − 1, n >. But since A is working on its goal during the negotiation, its utility remains the same.

A prefers s̃^{A,t} = < 0, 0 > now to s̃^{A,t+1} = < 0, 0 > later: s̃^{A,t} is equal to < 0, 0 > when A does not benefit from holding the resource, either because it hasn't reached t^A_min yet, or because it has accomplished its goal, that is, it has already worked for t^A_max time periods but is still paying for the resource. In both cases, it is better for A to accept < 0, 0 > sooner rather than later and to cut its losses. Note, however, that in some cases the utility from < 0, 0 > may be equal to the utility from leaving now.

s̃^{A,t} is not worse for A than opting out now: By the definition of s̃^{A,t}, it belongs to Possible_t, which consists of agreements that are not worse for both agents than opting out.

5.2 Subgame Perfect Equilibrium Strategies
Having discussed the agents’ preferences regarding different outcomes of the negotiation, we are ready to define the strategies that should be used in the negotiation process. The details of the strategies that are in SPE depend on the agents’ preferences regarding s˜ A,t , opting out, and leaving. We will present the strategies that are in subgame perfect equilibrium and show that negotiation ends, at most, after two negotiation time periods. As mentioned above, W loses over time. In particular, its utility from s˜ A,t decreases as t increases. However, this is not always the situation for A: it may prefer s˜ A,t at time t to s˜ A,t+1 at t + 1 or vice versa. Or it may have no preference
between s̃^{A,t} at time t and s̃^{A,t+1} at t + 1. The situations differ from each other in the relationship between A's and W's utility functions, for example, which one reaches its deadline first. This factor will play an important role in the details of the strategies that are in subgame-perfect equilibrium.

5.2.1 Time Periods when the Negotiation Ends
Before discussing the strategies that are in equilibrium, we identify the time periods in which it is clear that the negotiation will end.

Lemma 5.2.1 (A's strategy when s^t_A = 0.) When s^t_A = 0, if it is A's turn to respond to an offer and the negotiation has not ended at t − 1, then A will leave the negotiation.

In the proofs of the rest of the lemmas and theorems, we will concentrate only on time periods t in which s^t_A > 0, since from the previous lemma it is clear that the negotiation will not continue after that point. The first time period in which s^t_A = 0 is denoted t̂^A. In addition, there are situations where W will always leave the negotiation. This is when its utility from leaving is not lower than its utility from s̃^{A,t}.

Lemma 5.2.2 (Strategies when Leave_W is better for W than s̃^{A,t}.) Suppose U^W(Leave_W, g^W, done^W, t) ≥ U^W(s̃^{A,t}, g^W, done^W, t). If it is W's turn to respond to an offer and the negotiation has not ended at t − 1, then W will leave. If it is A's turn to respond to an offer, s^t_A > 0, and the negotiation has not ended at t − 1, then A will say no and offer s̃^{A,t+1}.

To simplify the notation, we define t^W_ne = dl^W − t^W_min + 1 and t^W_neo = dl^W − t^W_min − q + 1. Intuitively, these two points in time indicate changes in W's preferences for different actions: when reaching t^W_neo, W has no preference between opting out and leaving, since, from that point on, it will not have enough time to work t^W_min time periods after waiting q time periods. When reaching t^W_ne, W has no preference between leaving and accepting s̃^{A,t} either, since it will not have enough time to work t^W_min time periods if s̃^{A,t} is implemented. In some situations, W becomes indifferent between s̃^{A,t} and leaving for some t^W_neo ≤ t < t^W_ne. We denote the earliest time period in which this happens by t̂^W. For example, suppose dl^W = 14, q = 6, and t^W_min = 5. In this case, t^W_neo = 14 − 5 − 6 + 1 = 4 and t^W_ne = 14 − 5 + 1 = 10. It is easy to see that for every t ≥ t^W_neo, s̃^{A,t} = < s^t_A, 0 > (Schechter 1996). Suppose that at time t^W_neo = 4, s^4_A = 3. Then W's utility from s̃^{A,t} is better at that time than that from leaving, since according to the agreement, after A leaves the resource, W will
still have 7 time periods to try to work on its goal. Thus t̂^W = 10. However, if, for example, s^4_A = 8, leaving is better for W at period 4 than s̃^{A,4}; thus t̂^W = 4. Note, however, that in the first case, where s^4_A = 3, A will leave the negotiation before the earliest time in which W prefers to leave (t̂^W = 10); that is, the time in which A would finish working, t̂^A, is earlier than t̂^W.
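The two thresholds are simple to compute; the sketch below checks them against the dl^W = 14 example above and against RobotE's goal from example 9.

```python
def w_thresholds(dl_W, t_min_W, q):
    """t_neo^W and t_ne^W as defined in section 5.2.1."""
    t_neo = dl_W - t_min_W - q + 1  # from here on, opting out leaves W too little time
    t_ne  = dl_W - t_min_W + 1      # from here on, even the acceptable offer does so
    return t_neo, t_ne

print(w_thresholds(14, 5, 6))    # (4, 10), the example used above
print(w_thresholds(30, 10, 8))   # (13, 21) for RobotE's goal <G2, 10, 20, 30, 4, 1002>
```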
5.2.2 Time Periods Near the End of the Negotiation
From Lemmas 5.2.1 and 5.2.2 it is clear that the negotiation will end at the earlier of the earliest time in which W prefers to leave (t̂^W) and the time in which A would finish working (t̂^A), if it hasn't ended before then. We will try to construct the strategies of the agents in reverse from these points. Note that according to the assumptions "leaving is preferred over opting out and agreements" of section 5.1 and "agents avoid opting out" of section 2.2, if W's utility from leaving and opting out is the same, it will not opt out. Similarly, if W's utility from accepting an offer and opting out is the same, it will not opt out, and if its utility from leaving and accepting an offer is the same, it will not accept the offer. As discussed above, the logic behind this behavior is to make the resource available as much as possible. However, it adds more cases to the strategies, which are specified in this section.

5.2.2.1 W Prefers Leaving to s̃^{A,t} in a Time Period before s^t_A Becomes 0 (t̂^W < t̂^A)

We consider the case in which the earliest time in which W prefers to leave is earlier than the time in which A would finish working, that is, t̂^W < t̂^A. That is, W prefers to leave the negotiation because it cannot gain more by accepting s̃^{A,t} before A finishes working on its goal. We first specify the strategies for A and W at t̂^W − 1. Intuitively, since W will leave either at t̂^W or at t̂^W + 1 (depending on when it should respond to an offer), A, which prefers Leave_W over any other option, will try to delay the end of the negotiation to t̂^W. However, W, which prefers leaving or opting out as early as possible, will opt out or leave if it does not receive a better offer. In the rest of this chapter, the latest offer made is denoted O^l.

Lemma 5.2.3 (Strategies at t̂^W − 1) Suppose the negotiation hasn't ended at t̂^W − 2.

W's strategy: If at time t̂^W − 1 it is W's turn to respond to an offer, then it will use the following strategy:
1. If U^W(Leave_W, t̂^W − 1) < U^W(Opt, t̂^W − 1), then if U^W(Opt, t̂^W − 1) ≤ U^W(O^l, t̂^W − 1), then say Yes. Otherwise, Opt.
2. Otherwise, if U W (LeaveW , tˆW − 1) ≥ U W (Opt, tˆW − 1), then if U W (LeaveW , tˆW − 1) < U W (O l , tˆW − 1), then say Yes. Otherwise, Leave. A’s strategy: If at time tˆW − 1, it is A’s turn to respond to an offer, then it says No and offers s˜ A,t+1 . The reasoning behind the above lemma is as follows. In W ’s strategy, step 1 considers the case where opting out is better for W than leaving. In this case, W will compare opting out with accepting the offer made by A (i.e., O l ) and choose the best option for itself. In the second step, leaving is better than opting out, and thus W will choose between leaving and accepting A’s offer. A’s strategy above is very simple. It would like W to leave in the next time period, and thus rejects its current offer knowing that it will leave in the next time period. We next consider the agents’ behavior at time tˆW − 2. If it is W ’s turn to respond to an offer at tˆW − 2, then it will try to end the negotiation at that time period. If the negotiation continues to the next time period, A will make W leave at tˆW . Since W loses over time, it will prefer reaching an agreement, leaving, or opting out in tˆW − 2. If it is A’s turn to respond to an offer, it will try to make W leave the negotiation. If it is not possible to make W leave (since opting out is better for W than leaving), A will try to prevent W from opting out by either accepting its offer or by offering s˜ A,t+1 , which is not worse for W than opting out. This is stated formally in the next lemma. Lemma 5.2.4 (Strategies at tˆW − 2.) Suppose the negotiation hasn’t ended at tˆW − 3, and t = tˆW − 2. W’s strategy: If it is W ’s turn to respond to an offer, then it will use the following strategy: 1. If U W (LeaveW , t) < U W (Opt, t), then if U W (Opt, t) ≤ U W (O l , t), then say Yes. Otherwise, Opt. 2. Otherwise, if U W (LeaveW , t) ≥ U W (Opt, t), then if U W (LeaveW , t) < U W (O l , t), then say Yes. Otherwise, Leave. A’s strategy: If t = tˆW − 2 and it is A’s turn to respond to an offer, then it will use the following strategy: 1. If U W (Opt, t + 1) ≤ U W (LeaveW , t + 1), then say No and offer < s tA , 0 >. 2. Otherwise, if U A (Leave A , t) ≥ U A (Opt, t) ≥ U A (O l , t) and U A (Leave A , t) ≥ U A (˜s A,t+1 , t + 1), then Leave.
3. Otherwise, if U A (O l , t) < U A (˜s A,t+1 , t + 1) and U A (Opt, t) ≤ U A (˜s A,t+1 , t + 1), then say No and offer s˜ A,t+1 . 4. Otherwise, if U A (O l , t) = U A (˜s A,t+1 , t + 1), then (a) If O l = s˜ A,t , then say Yes, (b) Otherwise, say No and offer s˜ A,t+1 . 5. Otherwise, if U A (O l , t) > U A (˜s A,t+1 , t + 1) and U A (O l , t) ≥ U A (Opt, t), then say Yes. 6. Otherwise, Opt. In the lemma above (lemma 5.2.4), W ’s strategy is simpler than A’s (while in the previous lemma, A’s strategy was simpler). W ’s strategy is exactly as in period tˆW − 1 (lemma 5.2.3): it chooses the best option between opting out, leaving, and accepting A’s offer. This is because if the negotiation continues to tˆW − 1, it will be A’s turn and A will delay the negotiation and make W leave, as in tˆW . In the first step of A’s strategy above, the case where W prefers leaving is considered. In this case, A would like to delay the negotiation to the next time period, where W will leave. In the second step A checks whether it should leave. This may occur if A interrupting its work in the next time period will not enable it to work for tmin before its deadline. In step 3 A compares its utility from accepting W ’s current offer, its expected utility in the next time period (i.e., its utility from s˜ A,t+1 at t +1), and its expected utility from opting out now. If the future option is the best, it will not accept the offer, and wait for W to accept its offer s˜ A,t+1 in the next time period. Step 4 considers the case in which W ’s offer is as good as the future option of s˜ A,t+1 . To make W offer s˜ A,t and not an offer that will keep the resource busy, it will accept the offer only when it is equal to s˜ A,t . Step 5 considers the case in which W ’s offer is better than the future option and opting out. Thus the best option for A is to accept the offer. If none of the conditions of steps 1–5 holds, A would conclude that opting out is the best option, as indicated in step 6. 5.2.2.2 stA Becomes 0 Before the First Period in which W Prefers LeaveW to s˜A, t (tˆA < tˆW ) We now consider the case where the time in which A would finish working is earlier than the earliest time in which W prefers to leave (tˆ A < tˆW ). That is, A would like to leave the negotiation before W . We know
from lemma 5.2.1 that if it is A’s turn to respond to an offer at tˆ A , it will leave. In the next lemma, we consider the case where t = tˆ A , but it is W ’s turn to respond. Lemma 5.2.5 (W ’s strategy at tˆ A ) Suppose the negotiation hasn’t ended at tˆ A − 1. If at t = tˆ A it is W ’s turn to respond to an offer O l , then it will use the following strategy: 1. If U W (O l , t) ≥ U W (Opt, t) and U W (O l , t) ≥ U W (Leave A , t + 1), then say Yes. 2. Otherwise, if U W (Opt, t) > U W (Leave A , t + 1), then Opt. 3. Otherwise, say No and offer s˜ A,t+1 . In W ’s strategy of the lemma above, W compares Leave A in the next time period, A’s offer, and opting out. If A’s offer is the best (step 1), then it will accept the offer. If opting out is the best option (step 2), then W will opt out. Otherwise, (step 3) it will wait until the next period by saying no and making a counteroffer. The next time period to be considered in our backward induction is tˆ A − 1. Lemma 5.2.6 (Strategies at tˆ A − 1) tˆ A − 2.
Suppose the negotiation hasn’t ended at
W’s strategy: If at time tˆ A − 1 it is W ’s turn to respond to an offer O l , then it will use the following strategy: 1. Opt is better for W than LeaveW : If U W (LeaveW , tˆ A − 1) < U W (Opt, tˆ A − 1), then (a) If U W (Leave A , tˆ A ) ≥ U W (Opt, tˆ A − 1), then i. If U W (O l , tˆ A − 1) ≥ U W (Leave A , tˆ A ), then say Yes. A
ii. Otherwise, say No and suggest s˜ A,tˆ . (b) Otherwise, if U W (O l , tˆ A − 1) ≥ U W (Opt, tˆ A − 1), then say Yes. (c) Otherwise, Opt. 2. W prefers to leave now than A’s leaving in the next time period: Otherwise, if U W (LeaveW , tˆ A − 1) ≥ U W (Leave A , tˆ A ), then (a) If U W (LeaveW , tˆ A − 1) < U W (O l , tˆ A − 1), then say Yes. (b) Otherwise, Leave.
138
Chapter 5
3. Accepting A’s offer is the best option for W: Otherwise, if U W (O l , tˆ A − 1) > U W (Leave A , tˆ A ), then say Yes. 4. Waiting for A to leave is the best option for W: Otherwise, say No A and offer s˜ A,tˆ . A’s strategy: If in time tˆ A − 1 it is A’s turn to respond to an offer, then it will use the following strategy: A
1. Accepting W’s offer is A’s best option: If [U A (O l , tˆ A −1) = U A (˜s A,tˆ , tˆ A ) A A and O l = s˜ A,tˆ −1 ] OR U A (O l , tˆ A − 1) > U A (˜s A,tˆ , tˆ A ), then say Yes. A
2. s˜A,tˆ in the next time period is the best option: Otherwise, say No A and suggest s˜ A,tˆ . In the above lemma, if it is W ’s turn to respond to A’s offer at tˆ A − 1, it will choose between Leave A in the next time period, opting out now, LeaveW now, or accept A’s offer. Step 1 considers the case where opting out now is better for W than LeaveW . In step 1a, Leave A in the next time period is better for W than opting out. Thus W compares A’s offer and Leave A . In step 1b, the case where opting out is better than Leave A is considered, and thus W compares opting out and A’s offer. The rest of the steps consider the case in which W prefers leaving now to opting out. Step 2 considers the rare case where W prefers to leave now to waiting for A to leave during the next time period. Thus it compares accepting A’s offer and LeaveW . In step 3, A’s offer is better than Leave A in the next time period, which is better than LeaveW now, which, in turn, is better than opting out. The case when Leave A in the next time period is the best option is considered in step 4. That is, U W (LeaveW , tˆ A − 1) ≥ U W (Opt, tˆ A − 1), but W W U W (LeaveW , tˆ A −1) < U W (Leave A , tˆ A ). This may occur when tneo . < tˆ A < tne A’s strategy in the lemma above, when it needs to respond to W ’s offer at A tˆ A − 1, is simpler. First, it is easy to see that s˜ A,tˆ = < 0, 0 > and that if it is offered by A at tˆA , it will be accepted by W . Thus A will compare W ’s offer and A A s˜ A,tˆ in the next time period. Note that if its utility from s˜ A,tˆ and O l is the same, l A,tˆ A −1 it will accept the offer only if O = s˜ in order to keep the resource free as possible. We will continue by considering the strategies at tˆ A − 2. We will specify the strategies of tˆ A − 2 in as general a way as possible, in order to be able to go on to extend it to earlier time periods.
Negotiations about Resource Allocation with Multiple Attributes
139
Lemma 5.2.7 (Strategies at tˆ A − 2) Suppose the negotiation hasn’t ended by tˆ A − 3. W’s strategy: If at time t = tˆ A − 2 it is W ’s turn to respond to an offer, it will use the following strategy: 1. A prefers s˜A, t+1 at t + 1 than s˜A, t+2 at t + 2: If U A (˜s A,t+1 , t + 1) ≥ U A (˜s A,t+2 , t + 2), then (a) If U W (O l , t) ≥ U W (˜s A,t+1 , t + 1) and U W (O l , t) ≥ U W (Opt, t), then say Yes. (b) Otherwise, if U W (LeaveW , t) ≥ U W (Opt, t), then i. If U W (LeaveW , t) ≥ U W (˜s A,t+1 , t + 1), then Leave. ii. Otherwise, say No and suggest s˜ A,t+1 . (c) Otherwise, if U W (˜s A,t+1 , t + 1) ≥ U W (Opt, t), then say No and offer s˜ A,t+1 . (d) Otherwise, Opt. 2. LeaveW is not worse for W than Opt: Otherwise, if U W (LeaveW , t) ≥ U W (Opt, t), then (a) If U W (O l , t) ≥ U W (˜s A,t+2 , t + 2) and U W (O l , t) > U W (LeaveW , t), then say Yes. (b) Otherwise, if U W (LeaveW , t) ≥ U W (˜s A,t+2 , t + 2), then Leave. (c) Otherwise, say No and offer s˜ A,t+1 . 3. Accepting A’s offer now is the best option: Otherwise, if U W (O l , t) ≥ U W (˜s A,t+2 , t + 2) and U W (O l , t) ≥ U W (Opt, t), then say Yes. A 4. Waiting for s˜A, tˆ at tˆA is the best option: Otherwise, if U W (˜s A,t+2 , t + 2) ≥ U W (Opt, t), then say No and offer s˜ A,t+1 . 5. Opt is the best option: Otherwise, Opt. A’s strategy: If in time t = tˆ A − 2 it is A’s turn to respond to an offer, then it will use the following strategy: 1. LeaveW is not worse for W than Opt in the next time period: If U W (Opt, t + 1) ≤ U W (LeaveW , t + 1), then say No and offer < s tA , 0 >. 2. LeaveA is the best option: Otherwise, if U A (Leave A , t) ≥ U A (Opt, t) ≥ U A (O l , t) and U A (Leave A , t) ≥ U A (˜s A,t+1 , t + 1), then Leave.
140
Chapter 5
3. Accepting W’s offer is the best option: Otherwise, if U A (O l , t) < U A (˜s A,t+1 , t + 1) and U A (Opt, t) ≤ U A (˜s A,t+1 , t + 1), then say No and offer s˜ A,t+1 . 4. A has no preference between W’s offer now and s˜A, t+1 in the next time period: Otherwise, if U A (O l , t) = U A (˜s A,t+1 , t + 1), then (a) If O l = s˜ A,t , then say Yes, (b) Otherwise, say No and offer s˜ A,t+1 . 5. Accepting W’s offer is the best option: Otherwise, if U A (O l , t) > U A (˜s A,t+1 , t + 1) and U A (O l , t) ≥ U A (Opt, t), then say Yes. 6. Opting out is the best option: Otherwise, Opt. In the first step of W ’s strategy of the lemma above, the case where A loses over time (i.e., A prefers s˜ A,t+1 at t +1 to s˜ A,t+2 at t +2) is considered. In this case, if W offers A s˜ A,t+1 in the next time period (t = tˆ A − 1), it will be accepted. Thus W will compare accepting A’s offer, O l , opting out, and LeaveW . In step 1a, accepting A’s offer is the best option, and thus W says Yes. In step 1bi, LeaveW is the best option, and in steps 1bii and 1c, s˜ A,t+1 is the best option. The situations in which A does not lose over time are considered in steps 2–5 of W ’s strategy above. In Step 2, LeaveW is not worse for W than opting out. In this case, W cannot threaten A with opting out, and thus A will try to delay the negotiations until t + 2 = tˆ A . Thus W will compare O l (step 2a), leaving now (step 2b), and s˜ A,t+2 at tˆ A = n + 2 (step 2c). In steps 3, 4, and 5, the situations where opting out is better for W than LeaveW are considered. W will choose between accepting A’s offer (step 3), waiting until tˆ A = n + 2 and accepting s˜ A,t+2 (step 4), and opting out (step 5). Note that when t = tˆ A − 2, A will not leave in the next time period, and thus W will not consider this option. The logic behind A’s strategy in the lemma above is as follows. A prefers that W leave. This is possible only if LeaveW is not worse for W than opting out. This situation is handled in step 1. In this case, A rejects W ’s strategy, waiting for it to leave in the next time period. Otherwise, A will compare accepting W ’s offer, O l , s˜ A,t+1 in the next time period, leaving now, and opting out now. As in previous cases, A can be sure that s˜ A,t+1 will be accepted. Also, as in the previous lemma, if it has no preference between O l now and s˜ A,t+1 in the next period, and both are better than opting out, it will accept O l only if it is equal to s˜ A,t . Deviating will not increase its expected utility, and it is assumed that A prefers to leave the resource free, as much as possible.
Negotiations about Resource Allocation with Multiple Attributes
141
5.2.3 Possible Agreements when A Prefers s˜A, t+1 at t + 1 to s˜A, t+2 at t + 2 (Losing Over Time) Before we continue to specify the strategies for A and W , we will discuss more closely the case in which A’s expected utility from s˜ A,t at period t is higher than its expected utility of s˜ A,t+1 at period t + 1. We name this case “A loses over time.” As mentioned in section 5.1.5.2, this is the case when A has A not yet performed tmin time periods, it does not get paid for any period of the A,t A . In this case, W has negotiation, and s˜ will not let it finish working for tmin more negotiation power than in the cases where A does not lose over time. In particular, it may try to offer A something that is better for W than s˜ A,t . As can be seen in step 4 in A’s strategies of lemma 5.2.4 and lemma 5.2.7, in cases where opting out is better for W than leaving, A, in time t, compares its offer with s˜ A,t+1 , and if s˜ A,t is better for A than s˜ A,t+1 , there is some freedom for W here, in turn, regarding what to offer. The exact agreement for W that will be better for A than s˜ A,t+1 depends on the cost of holding the resource. We denote this agreement with s¯ W,t . However, during time periods that are prior to those in which W offers s¯ W,t , A will take this into consideration when it makes an offer. If it would like W to accept an offer at time t (since A loses over time), it will need to offer W an agreement that is better for W than s¯ W,t+1 and not just s˜ A,t . Otherwise, W will reject the offer and suggest s¯ W,t+1 . Thus we define s¯ W,t backward from the first time that A suggests s˜ A,t+1 to W . It is important to notice that when A is losing over time, tˆW ≤ tˆ A . The backward definition of s¯ W,t will start from tˆ. We define tˆ to be the latest time period such that tˆ < tˆW and U W (LeaveW , tˆ) < U W (Opt, tˆ). In addition, we require that it be A’s turn W to make an offer at time period tˆ − 1. It is easy to see that tˆ < tneo , and that it W W W is either tneo − 1 or tneo − 2, depending on whether tneo is even or odd. Definition 5.2.1 Base case (t = tˆ): s¯ W,tˆ = s˜ A,tˆ. W’s turn to respond (t is odd): For any t < tˆ, if t is odd then s¯ W,n is the best agreement for A in time t that is still better for W than s¯ W,t+1 at time t + 1 and is not worse for W than opting out at time t. A’s turn to respond (t is even): For any t < tˆ, if t is even, then s¯ W,n is the best agreement for W in time t that is still better for A than s¯ W,t+1 at time t + 1 and is not worse for A than opting out at time t. Note that there are situations in which s¯ W,t = s˜ A,t for several time periods before tˆ. To demonstrate the way the values of s¯ W,t can be determined, we consider a revised version of the example of the robots on Mars.
142
Chapter 5
11 Consider the case where the goal of RobotN is < G1, 40, 80, 99, A A = 40, tmax = 80 and its deadline is 99, but the other 4, 1002 >, that is, tmin details of the situation are exactly as in example 9. In particular, RobotE’s goal is < G2, 10, 20, 30, 4, 1002 >, and the agents pay 2 for holding the resource without doing productive work. We first need to determine s˜ A,0 . This situation is similar to that of example 10, since RobotE’s goal hasn’t changed and s 0A = 76 is even larger than in the original situation. As in the original case, we need to compare < 20, 10 > and < 0, 0 >. However, in this case, RobotN cannot perform the minimal required time needed for its goal in 20 minutes. This A is since tmin = 40 and RobotN has worked only for 4 minutes before the negotiation. Thus its utility is only from future time periods after the agreement is implemented. Also, it will then be able to work only for 65 minutes if it gains access to the resource when < 20, 10 > is implemented. However, it needs to pay for holding the resource for 20 minutes, since it is not productive during this time. So, U A (< 20, 10 >, 0) = 0.5 · 65 · 4 − 2 · 20 = 90. On the other hand, the agreement < 0, 0 > will allow it to work for 80 minutes before its deadline and without paying for holding the resource; thus U A (< 0, 0 >, 0) = 0.5 · 80 · 4 = 160. We can conclude that s˜ A,0 = < 0, 0 > and that it will remain < 0, 0 > up to period 13 of the negotiation, at which time RobotE will not have had enough time to perform even its minimal requirements after opting out. During this time, RobotN’s utility from s˜ A,t decreases over time, since it needs to pay for holding the resource, that is, U A (< 0, 0 >, 1) = 0.5 · 80 · 4 − 2 = 158. W W ˆW W , tne , t , and tˆ A . tneo = 30 − 8 − We will now consider the values of tneo W A 10 + 1 = 13, tne = 30 − 10 + 1 = 21, and tˆ = 76. To compute tˆW , we must first compute s˜ A,13 . Since at t = 13, W has fewer future time periods W A,13 than tmin after opting out, s 13 = < 67, 0 >. It is easy to see that A = 67 and s˜ W W W A,13 , 13), and therefore tˆW = 13 and tˆ = 12. Thus, U (Leave , 13) ≥ U (˜s A,12 W,12 since s˜ = < 0, 0 >, s¯ = < 0, 0 >. At time 12, RobotN can still do its 80 time periods, and since it needs to pay for holding the resource for 12 time periods, then U A (< 0, 0 >, 12) = 80 · 4 · 0.5 − 2 · 12 = 136. RobotE’s utility from this is U W (< 0, 0 >, 12) = 18 · 4 · 0.5 − 2 · 12 = 12. To compute s¯ W,11 , we need to consider that in order for it to be better for W than < 0, 0 >, W W must obtain at least tmin = 10. However, the utility for RobotN from < 0, 10 > at time 12 is 73 · 4 · .5 − 2 · 13 = 124, which is lower than that of < 0, 0 > at time 12. Thus s¯ W,11 = < 0, 0 >, and therefore also s¯ W,10 = s˜ A,10 = < 0, 0 >. This is the case, that is, s¯ W,t = s˜ A,t = < 0, 0 >, until t = 5. Since at time 5 it is A’s turn to make an offer and s¯ W,6 = < 0, 0 >, s¯ W,5 is also equal to < 0, 0 >, where U A (< 0, 0 >, 5) = 150 and U W (< 0, 0 >, 5) = 30. However, at time EXAMPLE
period 4, W can offer something that is better for it than < 0, 0 >. In particular, s¯ W,4 = < 0, 11 >, where U A (< 0, 11 >, 4) = 152 and U W (< 0, 11 >, 4) = 11 · 4 + 9 · 4 · 0.5 − 2 · 4 = 54. Similarly, s¯ W,3 = < 0, 11 >, s¯ W,2 = < 0, 13 >, s¯ W,1 = < 0, 13 >, and s¯ W,0 = < 0, 15 >.

In this example, s¯ W,t enables RobotN to make use of its maximal future time periods, similar to its situation in s˜ A,t. However, if the payment for holding the resource is higher, or p A is lower, that is, the losses over time are larger than the gain from additional working time periods, A's share in s¯ W,t may decrease relative to s˜ A,t.

5.2.4 Specification of the Subgame Perfect Equilibrium Strategies
We are now ready to continue with the specification of the strategies that are in equilibrium. First, we will present the strategies for any time period except the first one. For the last possible steps of the negotiations, we will combine the strategies for both tˆ W − 3 and tˆ A − 3, taking into consideration the differences in the strategies for tˆ W − 2 and tˆ A − 2. We denote by tˆ the minimum of tˆ W and tˆ A. Next, we will consider the special case of the first time period, during which W cannot opt out of the negotiation. Finally, we will combine all our results and present the subgame-perfect equilibrium strategies for the entire negotiation. We will conclude this section with a short discussion of the results.

5.2.4.1 Negotiations Starting at the Second Time Period (t = 1)  In the next lemma, we specify the strategies for any time period 0 < t ≤ tˆ − 3.

Lemma 5.2.8 (Strategies at 0 < t ≤ tˆ − 3.) Consider the negotiation at period t such that 0 < t ≤ tˆ − 3, and suppose that the negotiation hasn't ended at period t − 1.

W's strategy: If at t it is W's turn to respond to an offer, then it will use the following strategy:

1. LeaveW is not worse for W than Opt in two time periods: If U W (LeaveW , t + 2) ≥ U W (Opt, t + 2), then
(a) If U W (< s t+1 A , 0 >, t + 1) ≥ U W (Opt, t), then
i. If U W (< s t+1 A , 0 >, t + 1) ≤ U W (O l , t), then say Yes.
ii. Otherwise, say No and offer s˜ A,t+1 . (b) If U W (LeaveW , t) < U W (Opt, t), then if U W (Opt, t) ≤ U W (O l , t), then say Yes. Otherwise, Opt.
(c) Otherwise, if U W (LeaveW , t) ≥ U W (Opt, t), then if U W (LeaveW , t) < U W (O l , t), then say Yes. Otherwise, Leave. 2. Waiting for LeaveA is the best option: Otherwise, if U A (Leave A , t + 1) ≥ U A (Opt, t + 1) ≥ U A (˜s A,t+1 , t + 1) and U A (Leave A , t + 1) ≥ U A (˜s A,t+2 , t + 2), then if U W (O l , t) < U A (Leave A , t + 1) then say No and offer s˜ A,t+2 . 3. Accepting A’s offer now is the best option: Otherwise, if U W (O l , t) ≥ U W (Opt, t) and [ [U A (˜s A,t+1 , t + 1) > U A (˜s A,t + 2 , t + 2) and U W (O l , t) > U W (¯s W,t + 1 , t + 1)] OR [U A (˜s A,t+1 , t + 1) = U A (˜s A,t+2 , t + 2) and U W (O l , t) ≥ U W (˜s A,t+1 , t + 1)] OR [U A (˜s A,t+1 , t + 1) < U A (˜s A,t+2 , t + 2) and U W (O l , t) ≥ U A (˜s A,t+2 , t + 2)]], then say Yes. 4. A prefers s˜A, t+1 at t + 1 to s˜A, t+2 at t + 2 (losing over time): Otherwise, if [U A (˜s A, t+1 , t + 1) > U A (˜s A, t+2 , t + 2) and U W (Opt, t) ≤ U W (¯s W,t+1 , t + 1)], then say No and offer s¯ W,t+1 . 5. A has no preference between s˜A, t+1 at t + 1 and s˜A, t + 2 at t + 2: Otherwise, if U A (˜s A,t+1 , t + 1) = U A (˜s A,t+2 , t + 2) and U W (Opt, t) ≤ U W (˜s A,t+1 , t + 1), then say No and offer s˜ A,t+1 . 6. A prefers s˜A, t+2 at t + 2 to s˜A, t+1 at t + 1 (gaining over time): Otherwise, if [U A (˜s A,t+1 , t + 1) < U A (˜s A,t+2 , t + 2) and U W (Opt, t) ≤ U W (˜s A,t+2 , t + 2), then say No and offer s˜ A,t+1 . 7. Opt is the best option: Otherwise, Opt. A’s strategy: If it is A’s turn to respond to an offer at period t, then it will use the following strategy: 1. LeaveW is not worse for W than Opt: If U W (Opt, t + 1) ≤ U W (LeaveW , t + 1), then say No and offer < s t+1 A , 0 >. 2. LeaveA is the best option: Otherwise, if U A (Leave A , t) ≥ U A (Opt, t) ≥ U A (O l , t) and U A (Leave A , t) ≥ U A (˜s A,t+1 , t + 1), then Leave. 3. A prefers s˜A, t+1 at t + 1 to s˜A, t+2 at t + 2 (losing over time): Otherwise, if U A (˜s A,t+1 , t + 1) > U A (˜s A,t+2 , t + 2), then (a) If U A (O l , t) > U A (¯s W,t+1 , t + 1) and U A (O l , t) ≥ U A (O pt, t), then say Yes.
(b) Otherwise if U A (O l , t) < U A (¯s W,t+1 , t + 1) and U A (Opt, t) ≤ U A (¯s W,t+1 , t + 1), then say No and offer s¯ W,t+1 . (c) Otherwise, Opt. 4. s˜A, t+1 is the best option: Otherwise, if U A (O l , t) < U A (˜s A,t+1 , t + 1) and U A (Opt, t) ≤ U A (˜s A,t+1 , t + 1), then say No and offer s˜ A,t+1 . 5. A has no preference between s˜A, t+1 and W’s offer: Otherwise, if U A (O l , t) = U A (˜s A,t+1 , t + 1), then (a) If O l = s˜ A,t , then say Yes. (b) Otherwise, say No and offer s˜ A,t+1 . 6. W’s offer is the best option: Otherwise, if U A (O l , t) > U A (˜s A,t+1 , t + 1) and U A (O l , t) ≥ U A (Opt, t), then say Yes. 7. Opt is the best option: Otherwise, Opt. The proof of the lemma is by backward induction on t. As in previous lemmas, W ’s position in the negotiation is weak when its expected utility from leaving is not higher than its expected utility from opting out. It cannot threaten A with opting out and thus will not gain anything better than < s t+1 A , 0 > in the future. This situation is handled in step 1 of W ’s strategy. Thus W will compare opting out now, leaving now, accepting A’s current offer, and reaching an agreement of < s t+1 A , 0 > in the next time period. In step 2, W checks whether there is a possibility that A will leave in the next time period and whether it is worth to it to wait for A to leave. In step 3, W compares A’s offer with possible future outcomes. For W to accept the offer, it needs to be the best option and also better than opting out. The possible future outcomes depend on whether A gains over time, that is, A prefers s˜ A,t+2 at t + 2 to s˜ A,t+1 at t + 1 or losing over time (i.e., A prefers s˜ A,t+1 at t + 1 than s˜ A,t+2 at t + 2). If A loses over time, W can offer s¯ W,t , which will be accepted by A in the next time period. If A has no preference between s˜ A,t+1 and s˜ A,t+2 , then W can offer s˜ A,t+1 , which will be accepted by A. If A gains over time, the best W can hope for in the future is s˜ A,t+2 . If accepting O l now is better than these possible outcomes, W will accept it. In steps 4–7, W compares the above possible outcomes with opting out. If the relevant outcome is better than opting out, then W will say no and will make an offer. If opting out is better, W will opt out. The logic behind A’s strategy in the lemma above is as follows. As in the previous lemmas, if LeaveW in the next time period is not worse for W than opting out in the next time period (step 1), then A will wait until the next time
period for W to leave. In step 2, A considers leaving. Otherwise, if A loses over time, that is, A prefers s˜ A,t+1 at t + 1 to s˜ A,t+2 at t + 2, it will choose between accepting W ’s offer (step 3a), s¯ W,t+1 in the next time period (step 3b), and opting out now (step 3c). The rest of the strategy (steps 4–6) considers the situations in which A does not lose over time. In these situations, A will compare A’s offer, s˜ A,t+1 , in the next time period with opting out. Note that if A has no preference between O l and s˜ A,t+1 , it will insist that O l = s˜ A,t in order to keep the resource as free as possible. 5.2.4.2 Negotiation During the First Time Period The first period of the negotiation is different from the others since W cannot opt out but can only make an offer or leave the negotiation. In the next lemma we specify W ’s strategy in this period. Lemma 5.2.9 (W ’s strategy during the first period (t = 0).) In the first period of the negotiation, W will use the following strategy: 1. LeaveW in the second period (t = 1) is not worse for W than Opt: If U W (Opt, 1) ≤ U W (LeaveW , 1), then (a) If U W (LeaveW , 0) ≥ U W (< s 1A , 0 >, 1), then Leave. (b) Otherwise, offer s˜ A,0 . 2. Waiting for LeaveA is the best option: Otherwise, if U A (Leave A , 0) ≥ U A (Opt, 0) ≥ U A (˜s A,0 , 0) and U A (Leave A , 0) ≥ U A (˜s A,1 , 1), then offer s˜ A,0 . 3. A prefers s˜A, 0 at 0 to s˜A, 1 at 1 (losing over time): Otherwise, if U A (˜s A,0 , 0)> U A (˜s A,1 , 1), then offer s¯ W,0 . 4. A has no preference between s˜A, 0 now and s˜A, 1 in the next time period: Otherwise, if U A (˜s A,0 , 0) = U A (˜s A,1 , 1), then offer s˜ A,0 . 5. A prefers s˜A, t+2 at t + 2 to s˜A, t+1 at t + 1 (gaining over time): Otherwise, if U A (˜s A,0 , 0) < U A (˜s A,1 , 1) and U W (˜s A,1 , 1) > U W (LeaveW , 0), then offer s˜ A,0 . 6. Leave is the best option: Otherwise, Leave. The logic behind the strategy in the previous lemma is that A will accept an offer from W only if it believes that W can threaten it with opting out in the next time period. The first step of the strategy treats the case in which it is better for W to leave in the second time period (t = 1) than to opt out. In this case, W ’s threat to opt out is not credible. A, who prefers LeaveW over any other option, will not
accept any offer from W and will offer < s 1A , 0 > in the next time period. This will allow A to work for its maximal possible time periods and then leave the resource free. Thus, in this case, W compares < s 1A , 0 > at time period 1 with leaving now, and decides either to leave (1a) or to make some offer it knows will be rejected. The second step of the strategy considers the situation where A will leave the negotiation when W approaches it. The third step of the strategy considers the situation where A is losing over time, that is, A prefers s˜ A,0 at 0 to s˜ A,1 at 1. As explained above, W can offer s¯ W,0 , which will be accepted by A. If A does not lose, but also does not gain over time (step 4), it will accept s˜ A,0 . If A gains over time (step 5), W ’s offer will not be accepted. A will offer s˜ A,1 in the next time period, which will be better for W than opting out. Thus W is left with two options: accepting s˜ A,1 during the next time period, or leaving now. In the first case, it will offer s˜ A,0 . If the second case occurs, it will leave. We can now combine the strategies presented in the lemmas above and specify the strategies that are in subgame-perfect equilibrium. Theorem 5.2.1 (Strategies) The strategies below are in subgame perfect equilibrium given our assumptions. A’s strategy: 1. A has finished working on its goal: If s tA = 0, then Leave. 2. LeaveW is not worse for W than Opt: Otherwise, if U W (Opt, t + 1) ≤ U W (LeaveW , t + 1), then say No and offer < s t+1 A , 0 >. 3. A prefers (˜sA, t+1 , t + 1) to (˜sA, t+2 , t + 2) (losing over time): Otherwise, if U A (˜s A,t+1 , t + 1) > U A (˜s A,t+2 , t + 2), then (a) If U A (O l , t) > U A (¯s W,t+1 , t + 1) and U A (O l , t) ≥ U A (Opt, t), then say Yes. (b) Otherwise if U A (O l , t) < U A (¯s W,t+1 , t + 1) and U A (Opt, t) ≤ U A (¯s W,t+1 , t + 1), then say No and offer s¯ W,t+1 . (c) Otherwise, Opt. 4. Accepting W’s offer is the best option: Otherwise, if U A (O l , t) > U A (˜s A,t+1 , t + 1) and U A (O l , t) ≥ U A (Opt, t), then say Yes. 5. s˜A, t+1 in the next time period is the best option: Otherwise, if U A (O l , t) < U A (˜s A,t+1 , t + 1) and U A (Opt, t) ≤ U A (˜s A,t+1 , t + 1), then say No and offer s˜ A,t+1 .
6. A has no preference between accepting W’s offer now and s˜A, t+1 in the next time period: Otherwise, if U A (O l , t) = U A (˜s A,t+1 , t + 1), then (a) If O l = s˜ A,t , then say Yes. (b) Otherwise, say no and offer s˜ A,t+1 . 7. Opting out is the best option: Otherwise, Opt. W’s strategy: t = 0: 1. LeaveW in the second period (t = 1) is not worse for W than Opt: If U W(Opt, 1) ≤ U W(LeaveW , 1), then (a) U W(LeaveW , 0) ≥ U W(< s 1A , 0 >, 1), then Leave. (b) Otherwise, offer s˜ A,0 . 2. Waiting for LeaveA is the best option: Otherwise, if U A (Leave A , 0) ≥ U A (Opt, 0) ≥ U A (˜s A,0 , 0) and U A (Leave A , 0) ≥ U A (˜s A,1 , 1), then offer s˜ A,0 . 3. A prefers (˜sA, 0 , 0) to (˜sA, 1 , 1) (losing over time): Otherwise, if U A (˜s A,0 , 0) > U A (˜s A,1 , 1), then offer s¯ W,0 . 4. A has no preference between s˜A, 0 now and s˜A, 1 in the next time period: Otherwise, if U A (˜s A,0 , 0) = U A (˜s A,1 , 1), then offer s˜ A,0 . 5. A prefers s˜A, t+2 at t + 2 to s˜A, t+1 at t + 1 (gaining over time): Otherwise, if U A (˜s A,0 , 0) < U A (˜s A,1 , 1) and U W(˜s A,1 , 1) > U W(LeaveW , 0), then offer s˜ A,0 . 6. Leave is the best option: Otherwise, Leave. t > 0: 1. LeaveW is not worse for W than Opt in two time periods: If U W(LeaveW , t + 2) ≥ U W(Opt, t + 2), then W (a) If U W(< s t+1 A , 0 >, t + 1) ≥ U (Opt, t), then W l i. If U W(< s t+1 A , 0 >, t + 1) ≤ U (O , t), then say Yes.
ii. Otherwise, say No and offer s˜ A,t+1 . (b) If U W(LeaveW , t) < U W(Opt, t), then if U W(Opt, t) ≤ U W(O l , t), then say Yes. Otherwise, Opt. (c) Otherwise, if U W(LeaveW , t) ≥ U W(Opt, t), then if U W(LeaveW , t)< U W(O l , t), then say Yes. Otherwise, Leave.
2. Waiting for LeaveA is the best option: Otherwise, if U A (Leave A , t + 1) ≥ U A (Opt, t + 1) ≥ U A (˜s A,t+1 , t + 1) and U A (Leave A , t + 1) ≥ U A (˜s A,t+2 , t + 2), then if U W (O l , t) < U A (Leave A , t + 1) then say No and offer s˜ A,t+2 .

3. Accepting A's offer now is the best option: Otherwise, if U W (O l , t) ≥ U W (Opt, t) and [ [U A (˜s A,t+1 , t + 1) > U A (˜s A,t+2 , t + 2) and U W (O l , t) > U W (¯s W,t+1 , t + 1)] OR [U A (˜s A,t+1 , t + 1) = U A (˜s A,t+2 , t + 2) and U W (O l , t) ≥ U W (˜s A,t+1 , t + 1)] OR [U A (˜s A,t+1 , t + 1) < U A (˜s A,t+2 , t + 2) and U W (O l , t) ≥ U A (˜s A,t+2 , t + 2)] ], then say Yes.

4. A prefers (˜sA,t+1 , t + 1) to (˜sA,t+2 , t + 2) (losing over time): Otherwise, if U A (˜s A,t+1 , t + 1) > U A (˜s A,t+2 , t + 2) and U W (Opt, t) ≤ U W (¯s W,t+1 , t + 1), then say No and offer s¯ W,t+1 .

5. A has no preference between (˜sA,t+1 , t + 1) and (˜sA,t+2 , t + 2): Otherwise, if U A (˜s A,t+1 , t + 1) = U A (˜s A,t+2 , t + 2) and U W (Opt, t) ≤ U W (˜s A,t+1 , t + 1), then say No and offer s˜ A,t+1 .

6. A prefers (˜sA,t+2 , t + 2) to (˜sA,t+1 , t + 1) (gaining over time): Otherwise, if U A (˜s A,t+1 , t + 1) < U A (˜s A,t+2 , t + 2) and U W (Opt, t) ≤ U W (˜s A,t+2 , t + 2), then say No and offer s˜ A,t+1 .

7. Opt is the best option: Otherwise, Opt.

Proof: The proof is clear from the previous lemmas.

It is important to discuss the ways the negotiations may end according to the above theorem:

W will leave without starting a negotiation process: If, in the second period of the negotiation, W does not have enough time to do tmin W before its deadline after opting out, that is, U W (Opt, 1) ≤ U W (LeaveW , 1), then it considers leaving, because A will reject any offer and make a counteroffer < s 1A , 0 >. If W prefers leaving in period 0 over such an agreement, it will leave. Another situation in which W may leave before even starting the negotiations is when A gains over time, that is, if U A (˜s A,0 , 0) < U A (˜s A,1 , 1) and W prefers leaving over (˜s A,1 , 1).

A will leave when W approaches it: If A hasn't worked for tmin A before W approaches it, and it does not have enough time to finish working for tmin A before it needs to give W the resource so that W will be able to work for tmin W before W's deadline, and it does not have enough time to work on its goal after opting out, then it will leave in the first time period of the negotiations.
An agreement will be reached in the first time period: If in the second time period (1), W prefers opting out to leaving and A loses utility over time or its utility is not changing over time, then an agreement will be reached in the first period of the negotiation. An agreement will be reached in the second time period: If, in the second time period (1), W prefers opting out to leaving, A gains over time, and W prefers < s˜ A,1 , 1 > to leaving before starting the negotiation, then the negotiation will end during the second period of the negotiation. 5.2.5
Examples
We will consider five examples of negotiation to demonstrate the cases where A prefers (˜s A,t+1 , t + 1) to (˜s A,t+2 , t + 2) (losing over time), A prefers (˜s A,t+2 , t + 2) to (˜s A,t+1 , t + 1) (gaining over time), and A has no preference between (˜s A,t+1 , t + 1) and (˜s A,t+2 , t + 2). We will also consider an example where W leaves before starting the negotiation and an example where A leaves when W approaches it.

5.2.5.1 A has No Preference Between (˜sA,t+1 , t + 1) and (˜sA,t+2 , t + 2)  We will demonstrate the negotiation in this case using the original Mars example.

EXAMPLE 12  We return to the example of the robots on Mars (examples 9 and 10). Recall that RobotN's goal is < G1, 15, 70, 79, 4, 1002 >, that is, tmin A = 15, tmax A = 70 and its deadline is 79, and that of RobotE is < G2, 10, 20, 30, 4, 1002 >. As calculated in example 10, s˜ A,0 = < 20, 10 >, s˜ A,1 = < 19, 10 >, s˜ A,2 = < 18, 10 >, etc., until period tneo W = 13, where s˜ A,13 = < 53, 0 >. Recall that RobotN's utility from s˜ A,t does not change over time until t = 13. RobotE's utility from s˜ A,t decreases over time. However, its utility from opting out also decreases over time. As was computed in example 9, U W (Opt, G2, 0, 0) = 40, U W (Opt, G2, 0, 1) = 38 and U W (Opt, G2, 0, 2) = 36. Furthermore, U W (Opt, G2, 0, 3) = 19 · 4 · .5 − 2 · 3 = 32; U W (Opt, G2, 0, 4) = 18 · 4 · .5 − 2 · 4 = 28; U W (Opt, G2, 0, 12) = 10 · 4 · .5 − 2 · 12 = −4; U W (Opt, G2, 0, 13) = −2 · 13 = −26; U W (LeaveW , G2, 0, 0) = 0; U W (LeaveW , G2, 0, 1) = −2.

Since tneo W = 13 and U A (˜s A,0 , 0) = U A (˜s A,1 , 1), case 4 of W's strategy for t = 0 in theorem 5.2.1 should be used. Thus RobotE will offer s˜ A,0 = < 20, 10 >, which will be accepted by RobotN.
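The opting-out values quoted above follow a simple pattern. The short sketch below reproduces them; the payoff rule it encodes (a payment of m/2 = 2 per productive period after opting out, a repair delay of q = 8, and a cost of 2 per elapsed period) is inferred from the worked numbers in examples 9 and 12 rather than quoted from the text.

```python
# Illustrative check of RobotE's utility from opting out in example 12,
# using the goal < G2, 10, 20, 30, 4, 1002 > and q = 8.  The payoff rule
# below is an assumption reconstructed from the worked numbers.

T_MIN, T_MAX, DEADLINE, M = 10, 20, 30, 4
Q, PERIOD_COST = 8, 2

def u_w_opt(t):
    usable = DEADLINE - t - Q                      # periods left after the resource is repaired
    work = min(T_MAX, usable) if usable >= T_MIN else 0
    return work * M * 0.5 - PERIOD_COST * t

for t in (0, 1, 2, 3, 4, 12, 13):
    print(t, u_w_opt(t))    # 40, 38, 36, 32, 28, -4, -26, as in examples 9 and 12
```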
5.2.5.2 A's Utility from (˜sA,t , t) Is Not Lower than That from (˜sA,t+1 , t + 1)  We will demonstrate this case using a modification of the example of the robots on Mars.

EXAMPLE 13  We return to the example of the robots on Mars (example 9) but assume that RobotN works on goal < G1, 13, 27, 30, 4, 1002 >, i.e., tmin A = 13, tmax A = 27 and the deadline is 30. RobotE needs to work on < G2, 2, 12, 20, 4, 1002 >, that is, tmin W = 2, tmax W = 12, and the deadline is 20. The other details remain as in example 9, for example, done A = 4.

RobotN would like to work for an additional 23 minutes to satisfy its maximal possible requirement; that is, s 0A = 23. However, this will prevent RobotE from satisfying even its minimal requirement. We shall now compute s˜ A,t. In the first time period, W can accomplish its goal completely by working for tmax W time periods after opting out, but after s 0A periods W will not have enough time before its deadline to accomplish tmax W periods. We need to compare RobotN's utilities from the following: < 10, 2 >, < 14, 6 > and < 0, 0 >. U A (< 10, 2 >, 0) = 10 · 4 + 13 · 4 · .5 = 66; U A (< 14, 6 >, 0) = 14 · 4 + 6 · 4 · .5 = 68 and U A (< 0, 0 >, 0) = 26 · 4 · .5 = 52. Thus s˜ A,0 = < 14, 6 >. RobotE's utility from this agreement is U W (< 14, 6 >, 0) = 6 · 4 = 24. Its utility from opting out is U W (Opt, 0) = 12 · 4 · .5 = 24.

At time period 1, RobotE does not have enough time to complete its maximal requirement after opting out before its deadline. In this period, s 1A = 22. We need to compare < 8, 0 >, whose utility at period 1 to RobotN is 64; < 10, 2 >, whose utility is 68; < 13, 5 >, whose utility is 70; and < 0, 0 >, whose utility is 50. Thus s˜ A,1 = < 13, 5 >. RobotE's utility is U W (< 13, 5 >, 1) = 4 · 5 + 1 · 4 · 0.5 − 2 = 20. Its utility from opting out is U W (Opt, 1) = 11 · 4 · .5 − 2 = 20.

In the next time period, t = 2 and s 2A = 21, and again we will compare < 8, 0 >, < 10, 2 >, < 13, 5 >, and < 0, 0 >. RobotN achieves maximum utility from < 13, 5 >, and it is 72 (i.e., RobotN's utility continues to increase). Thus s˜ A,2 = < 13, 5 >. RobotE's utility from s˜ A,2 is 16, and its utility from opting out is also 16. Note that in this time period RobotE's utility from leaving is much lower than from opting out. Thus, according to theorem 5.2.1, it offers s˜ A,0, but RobotN, which prefers s˜ A,1, will say No and will offer s˜ A,1. RobotE will accept the offer and the negotiation will end in the second round of the negotiation.
5.2.5.3 A Prefers (˜sA,t+1 , t + 1) to (˜sA,t+2 , t + 2) (Losing Over Time)  A prefers (˜s A,t+1 , t + 1) to (˜s A,t+2 , t + 2) when it can't work for at least tmin A and make W an offer that will prevent it from opting out. This case happens when W's deadline expires (so it cannot work tmin W) before A can work for (at least) tmin A time periods on its goal, that is, dl W − tmin W < tmin A − done A. We have already considered this situation in example 11. We will present the resolution of the negotiation in this case in the following example.
EXAMPLE 14  We return to the example of the robots on Mars when the situation is exactly as in example 11. That is, the goal of RobotN is < G1, 40, 80, 99, 4, 1002 >, and the goal of RobotE is < G2, 10, 20, 30, 4, 1002 >. We have shown in example 11 that A's utility from s˜ A,t at period t is higher than its utility from s˜ A,t+1 at period t + 1 and that s¯ W,0 = < 0, 15 >. Since tneo W = 13, U W (Opt, 1) > U W (LeaveW , 1), and since A loses over time, we are in case 3 of W's strategy, when t = 0, in theorem 5.2.1. Thus RobotE will offer < 0, 15 >, which A will accept according to step 3a of its strategy. In summary, the negotiation will end in the first period of the negotiation with an agreement.
5.2.5.4 W Leaves Before Starting the Negotiation There are situations in which, regardless of whether A gains or loses over time, W will leave the negotiation. We demonstrate this case in the following example. EXAMPLE 15 We modify the example of the robots on Mars. Suppose that RobotN’s goal is < G1, 15, 70, 79, 4, 1002 >, as in example 9, but RobotE’s deadline is earlier: < G2, 10, 20, 15, 4, 1002 >. The rest of the details of the example are as in example 9, for example, q = 8. In this case, in the first time period RobotE does not have enough time to work for 10 time periods after opting out before the deadline. Thus RobotE’s utility from opting out and LeaveW in the first time period is equal to zero. RobotN would like to work for an additional 66 time periods, will not accept any offer, and will offer < 66, 0 > to RobotE in the second time period of the negotiation. RobotE prefers to leave rather than to accept such an offer. Thus RobotE will leave before starting the negotiation process.
5.2.5.5 A Leaves when W Approaches It
EXAMPLE 16  We modify the example of the robots on Mars. Suppose that RobotN's goal is < G1, 10, 12, 12, 4, 1002 >, that is, tmin A = 10, tmax A = 12, and its deadline is 12, and it has worked on its goal for 5 time periods (i.e., done A = 5) before RobotE starts the negotiation. That is, when W starts the negotiation, there are only 7 time periods left until A's deadline. RobotE's goal is < G2, 10, 12, 14, 4, 1002 > and q = 2.
If A continues working on its goal and finishes its minimal number of time periods, that is, works for an additional 5 time periods, W will not be able to complete its own minimal time periods, since its deadline is in 14 time periods and it needs to work for at least 10 time periods. Thus s˜ A,0 = < 0, 0 >. In addition, if A interrupts its work, it will not be able to work for its minimal number of time periods (i.e., for 10 periods), because its deadline is in 7 time periods when the negotiation begins. Thus A's utilities from opting out, from leaving, and from the agreement s˜ A,0 = < 0, 0 > at time period 0 are the same. Its utility from s˜ A,1 = < 0, 0 > at time period 1 is even lower, since it needs to pay for using the resource. However, W prefers to opt out rather than leave at time period 1, since after opting out it will need to wait for 2 time periods and will then still have enough time to work on its goal for the minimal number of time periods before its deadline. Thus, when W approaches A with the offer s˜ A,0 = < 0, 0 >, A will leave, knowing that otherwise W will opt out in the next time period, and given that it prefers leaving now to W's opting out in the next time period.

5.3 Simulation Results
This section presents the simulation results of the protocol and of the strategies discussed in the previous sections. We will see how different variables, such as the cost of opting out and the goals' deadlines, affect the performance of the algorithms. We present the experiments conducted and conclude by analyzing the experimental results.

Several simplifying assumptions are required to apply the formal model and to make a reasonable implementation feasible. First, it is assumed that communication is foolproof and that the message delay time is known to all agents. Second, in order to carry out coordination activities, agents share a global clock reference. However, we would like to emphasize that the system is distributed in such a way that there is no one process that manages the others; each process acts independently.

The experiments were conducted with a system comprising three agents: A1 , A2 , and A3 . Each agent has 50 goals, G 1 , . . . , G 50 , to accomplish. The agents' utility functions satisfied our assumptions, and their details are presented in (Schechter 1996). The goals arrive in an arbitrary order and therefore cannot be scheduled a priori. Each goal has minimum and maximum computation times and a deadline. Such a scenario was tested for 10 runs, to guarantee that all results presented were significant to the 0.05 level or below (p < 0.05).
Table 5.2
Values of the parameters used in simulation.

          Field   Description                                                      Values
Goal      tmin    Minimum time periods needed for working in order to be paid      [1, 10]
                  for a goal.
          tmax    Maximum time periods needed for working on a goal.               [tmin + 1, 20]
          dl      Deadline to accomplish a goal.                                   [tmax + 1, tmax + 10]
          m       Payment per time period.                                         [4, 10]
Resource  c       Cost per time unit when using the resource.                      [1, 3]
          l       Cost per time unit when holding the resource.                    [0, c − 1]
General   q       Time periods needed to repair the resource after opting out.     1, 3, 6
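A generator along the following lines could produce the goal and resource parameters of table 5.2. It is a sketch only, since the actual generator used in the experiments (Schechter 1996) is not shown here, and it assumes integer-valued uniform sampling within the listed ranges.

```python
# Illustrative sampling of one goal and the resource costs per table 5.2 (assumption:
# parameters are drawn uniformly over the integer ranges shown in the table).
import random

def draw_goal():
    t_min = random.randint(1, 10)
    t_max = random.randint(t_min + 1, 20)
    dl = random.randint(t_max + 1, t_max + 10)   # deadline
    m = random.randint(4, 10)                    # payment per time period
    return {"t_min": t_min, "t_max": t_max, "dl": dl, "m": m}

def draw_resource_costs():
    c = random.randint(1, 3)         # cost per time unit when using the resource
    l_cost = random.randint(0, c - 1)  # cost per time unit when holding the resource
    return c, l_cost

goals = [draw_goal() for _ in range(50)]   # 50 goals per agent, arriving in arbitrary order
q = random.choice([1, 3, 6])               # repair delay after opting out
```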
Table 5.2 shows the values of all parameters used in the experiments. Note that q was tested for different values. All other parameters were created using a uniform distribution.

5.3.1 Metrics
Classical scheduling theory typically uses metrics, such as minimizing the sum of completion times to evaluate different resource allocation algorithms. When deadlines are considered, as in our model, they are usually added as constraints—for example, creating a minimum schedule length subject to the constraint that all tasks must meet their deadlines. We have compared our algorithms to a well-known algorithm, Earliest-Deadline-First (EDF), which has been shown to achieve good scheduling results (Buttazzo 1997; Ronen 1995). Our model of negotiation was planned to maximize the agents’ utility function, and therefore we will use it as the main metric for evaluation and comparison. We also count the number of negotiation sessions and the number of goals that were not completed successfully. The following are formal definitions of the metrics we used in our experiments. Definition 5.3.1 Utility score: The utility gained when running a session, divided by the maximum utility that could be gained when each agent has all needed resources, is the utility score. The ratio will be represented as a percentage. Since the EDF algorithm tries to maximize the number of goals that were completed successfully, we add the following metric to our tests.
Definition 5.3.2 Abandoned goals: The percentage of goals that the agents are forced to abandon due to deadlines are called abandoned goals. The following metric is an attribute of the environment, which specifies the load of the system—for example, how many times the agents needed to use a strategy to solve a conflict for sharing resources. We chose environments with heavy loads, because we believe that in such environments our model is applicable. Definition 5.3.3 Negotiation/Alternations: The number of times a conflict arises between two agents over the usage of a common resource is called Negotiation/Alternations. The conflict is solved either by negotiation or by alternation, depending on the model used. 5.3.2
Results and Discussion
Recall that our model is designed to maximize the utility function of each agent rather than the utility of the agents as a group. Therefore, both the utility score of each agent and the average score were considered. We ran the simulations with different values of the q parameter, that is, the time delay due to opting out: how many time periods are needed to repair the damaged resource. As table 5.3 shows, our model gives a fair share of the resources to all agents. This is even though agents that play the role of A in the negotiations usually obtain a higher utility. However, since the agents have an equal probability of playing the role of A and W in the simulations, fair distribution is obtained. In addition, we can see that agents chose to opt out very rarely. This happened only in cases of incomplete information that we did not consider in our theoretical work. When an agent opts out during negotiation, both the attached agent, which is currently using the resource, and the agent that opted out must wait q time periods until the resource is repaired.

Table 5.3
Experimental results with different values of q.

      Utility Score (%)                     Opting Outs    Abandoned Goals    Nego. Sessions
q     A1    A2    Average    St. Dev.       out of 30      out of 30          (%)
1     88    94    91         5              1              9.6                43
3     94    87    90         5              1              9.6                43
6     84    93    89         5              0              9.6                43

Table 5.4
Experimental results comparing Negotiation and EDF.

                      Negotiation (q = 1)          EDF
Metric                Average    St.Dev.           Average    St.Dev.     Conclusion
Utility score         91%        5.2%              91%        3.5%        Same
Abandoned goals       9.6        3.1               8.4        2.6         Same
Nego./Alternation     21.2       4.5               15.5       1.7         Less in EDF

However, under the EDF
model, when preemption occurs no payment is made by either side: neither by the attached agent nor by the new agent that gained access to the resource. Another interesting development is the evolution of the agents’ achievements under different opting-out constraints. As shown in table 5.3, as q increases, the agents’ average utility score decreases slightly. However, the frequency of opting outs decreases, meaning that the agents avoid opting out when it hurts their achievements. The difference between the agents’ scores, that is, A1 vs. A2, is insignificant. It is a result of applying goals with different attributes. In the second set of experiments, we used the same data sets for two algorithms: our negotiation model and the EDF algorithm. As shown in table 5.4, our model did as well as EDF, with small differences in the standard deviation. In addition to its achievements with respect to the utility score metric, our algorithm outperforms EDF in other aspects. First, the EDF algorithm can be used only in cases where the central agent has complete information on all agents competing for the resource. Our model also supports problems in which agents have only probabilistic beliefs about their opponents’ possible goals.7 Another advantage of our model over EDF is its flexibility. Each agent may opt out and reject its opponents’ offers. Such an option does not exist in EDF, which forces the agents to accept decisions. Furthermore, we must be sure that any agent that might be created in the future, will obey the EDF scheduler. In our model, which uses negotiation as a way of sharing resources, each agent accepts only the communication protocol, without any need for additional rules concerning the schedule itself. We believe that a system with fewer rules is more flexible and thus would be able to meet more future needs, as they arise. The third set of experiments was designed to study the effect of different parameters on the negotiation results (table 5.5).8 We first considered the case i in which the minimum time period (tmin ) is zero, meaning that the agent gets paid from the first working time period. The average utility score remains the same as before, but opting out increased. Next, we studied a case in which each goal
Table 5.5
Experimental results of studying the effect of different parameters on the negotiation results.

Index   Description         Utility Score (q = 1)   Changes in Metrics
1       tmin W = 0          91%                     Number of opt-outs increases.
2       tmin W = tmax W     91%                     Number of abandoned goals decreases.
3       Pt (t) = 0          91%                     Remain the same.
requires a specific amount of time to be achieved, and the agent gets paid only when it accomplishes the maximum periods (i.e., tmin i = tmax i). Again, the average utility score remains the same, but the number of abandoned goals increases. The last modification in our model was to eliminate the effect of the negotiation cost. The results show that there was no difference between the two scenarios.

In summary, this chapter considered bilateral negotiation for resource allocation. One agent, A, uses the resource during the negotiations, while the other agent, W, waits to gain access to the resource. An agreement consists of two parts—the number of steps that A will continue to keep the resource, and the number of steps that W gains to keep the resource. The case of complete information was studied, in which the negotiation ends in the first or the second period of the negotiations.
6 Negotiations about Task Distribution
Suppose a set of autonomous agents has a common goal it wants to satisfy as soon as possible. The common goal is formed by overlapping the goals of the individual agents. To satisfy the goal, costly actions must be taken, and no agent can satisfy the goal without reaching an agreement with the other agents. Each of the agents wants to minimize its costs, that is, prefers to do as little as possible. Note, then, that even though the agents have the same goal (under our simplified assumptions), there is actually a conflict of interests. We consider simple situations where the agents need to divide M equal tasks between them and they lose over time. We consider both bilateral and multiple agents’ situations. In both cases, under appropriate conditions, an agreement will be reached without any delay even though the agents are allowed to opt out of the negotiations. No agent will opt out of the negotiations since there are always agreements in the beginning of the negotiation that are better for all the agents than opting out. 6.1
Bilateral Negotiations
In this section we deal with the case of task distribution between two agents. We will use the following example to demonstrate our ideas. EXAMPLE 17 Suppose there are two agents that are responsible for the delivery of the electronic newsletters of two different companies. The delivery is done by phone (for example, by fax machines). The expenses of the agents depend only on the number of phone calls made. Therefore, if there is someone who subscribes to both companies’ newsletters, the two newsletters may be delivered by one of the agents for the price of only one phone call. Thus delivering newsletters to the common subscribers is a shared goal of the two agents. The agents may negotiate over the distribution of the common subscriptions. Each of the agents can opt out of the negotiations and deliver all of its own newsletters by itself.
In the task allocation problem, an agreement is similar to an agreement in chapter 4. It is an ordered pair (s1 , s2 ), where s1 + s2 = M. si is agent i's portion of the labor. Hence, formally, S = {(s1 , s2 ) | s1 , s2 ∈ IN, s1 ≥ 0, s2 ≥ 0, s1 + s2 = M}. We call the environments considered in this chapter task distribution environments (TDE).
Attributes of the Utility Functions
As in previous chapters, we assume that agent i ∈ Agents has a utility function over all possible outcomes: U i : {S ∪ {Opt}} × T ∪ {Disagreement} → IR.
Although agents in the resource allocation problem would like to attain a larger share of the resource, in this chapter we assume that each agent prefers to do as little as possible. Therefore, condition A1r (which was appropriate for the resource allocation case) is modified to condition A1t . A1t requires that among agreements reached in the same period, agent i prefers smaller numbers of units si . A1t Actions are costly: For all t ∈ T , r, s ∈ S and i ∈ Agents: ri > si ⇒ U i ((r, t)) < U i ((s, t)). For agreements that are reached within the same time period, each agent prefers to perform a smaller portion of the labor. In contrast to the resource allocation case, as in the data allocation problem, time is valuable to both sides as stated in the following assumption. A2t Time is valuable: For all t1 , t2 ∈ T , s ∈ S and i ∈ Agents if t1 < t2 , U i ((s, t1 )) ≥ U i ((s, t2 )). The next assumption greatly simplifies the behavior of the utility function for agreements. It requires that the difference in utility between (s1 , t1 ) and (s2 , t2 ) depends only on s1 , s2 , and the differences between t1 and t2 . As in the resource allocation problem, we will consider the case of constant delay. We assume that both agents lose over time. A3t Agreement’s cost over time: Each agent i ∈ Agents has a number ci > 0 such that: for all r, s ∈ S and for all t1 , t2 ∈ T , U i ((s, t1 )) ≥ U i ((r, t2 )) iff (si − ci t1 ) ≥ (ri − ci t2 ). Note that assumption A3t does not hold for Opt. We also assume that both agents prefer to opt out sooner than later. Formally: A4t Opting out costs more over time: For t1 , t2 ∈ T and i ∈ Agents, if t1 < t2 then U i ((Opt, t1 )) > U i ((Opt, t2 )). Note that the agents have no notion of group-rationality, and when making a decision whether or not to opt out, only the individual utility is taken into consideration.1 We make no assumptions concerning the preferences of an agent for opting out versus an agreement, which enables us to consider different types of opting out. Formally, there is no fixed s ∈ S such that for every t ∈ T , U i ((s, t)) = U i ((Opt, t)) as in (Shaked and Sutton 1984). As in the previous chapters, the main factor that plays a role in reaching an agreement when agents can opt out of the negotiation is the best agreement for agent i in a given period t that is still preferable to j than opting out in time
period t, that is, s˜ i,t ∈ S. If Possiblet is not empty, then there will be only one maximal s˜ i,t . This is because of assumption (A1t ) above. In addition, for i ≠ j, sˆ i,t = s˜ j,t . An agreement may be reached only if there is at least one agreement that both agents prefer to opting out, that is, only if Possiblet ≠ ∅.

We will now introduce two additional assumptions that will ensure that an agreement will be reached.

A5t Agreements vs. Opting Out: For every t ∈ T , i ∈ Agents, if U i ((s, t)) > U i ((Opt, t)) and t ≥ 1, then U i ((s, t − 1)) > U i ((Opt, t − 1)).

Assumption A5t indicates that if an agreement is preferred over opting out in some time period, it will also be preferred over opting out in the previous time periods. That is, the set of acceptable agreements for an agent does not increase over time.

As in the resource allocation problem, an additional assumption similar to A6r is necessary to ensure that an agreement is possible at least in the first period. That is, there is an agreement that both agents prefer to opting out. We rename this assumption A6t . We will assume that there is some time period T in which there is no agreement that is acceptable for both agents over opting out, as in the data allocation case. This time period may be viewed as a deadline.

A6t Time period when an agreement is not possible: A time period T exists where PossibleT = ∅. We denote the earliest of these time periods Tˆ and assume that Tˆ ≠ 0. That is, Possible0 ≠ ∅.

6.1.2 Agreement is Guaranteed with No Delay
As in the previous chapters, an agreement will be reached without delay even though the agents have the option to opt out of the negotiation in any step, if the agents use perfect equilibrium strategies. No agent will opt out since there are always agreements in the beginning of the negotiation that are better for both agents than opting out. If the agents use SPE strategies, there is an agreement offered in the first time period by the first agent to make an offer that will be preferred by its opponents over all possible future outcomes. As in the data allocation case when the agents lose over time, the main driving force for the agent to reach an agreement in this case is the cost of the negotiation time. The agents’ attitudes toward opting out versus reaching agreements will affect only the details of the actual agreement that is reached; they won’t drive any of the agents to opt out.
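To make the objects used below concrete, the following sketch computes Possiblet , s˜ i,t , and Tˆ for a bilateral task distribution problem with the linear utilities that also appear in example 18 later in this section; the payoff constants are illustrative assumptions, not part of the formal model.

```python
# Illustrative computation of Possible^t, s~^{i,t}, and T^ for a bilateral TDE
# with linear utilities (constants borrowed from example 18 below; assumptions only).

M = 100                                    # number of shared tasks

def u1(s, t):  return 170 - s[0] - 2 * t   # agent 1's utility from agreement s at time t
def u2(s, t):  return 200 - s[1] - 2 * t
def u1_opt(t): return 200 - M - t          # agent 1's utility from opting out at t
def u2_opt(t): return 225 - M - t

def possible(t):
    # agreements both agents strictly prefer to opting out at t
    return [(s1, M - s1) for s1 in range(M + 1)
            if u1((s1, M - s1), t) > u1_opt(t) and u2((s1, M - s1), t) > u2_opt(t)]

def s_tilde(i, t):
    # best agreement for agent i at t that is still acceptable to its opponent
    key = (lambda s: u1(s, t)) if i == 1 else (lambda s: u2(s, t))
    return max(possible(t), key=key)

t_hat = next(t for t in range(10 * M) if not possible(t))
print(t_hat, s_tilde(1, 0), s_tilde(2, 0))   # 22, (26, 74), (69, 31)
```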
As the first step in proving the existence of such an agreement, we will now prove that under the above assumptions, if the negotiation has not ended prior to Tˆ , an agreement will be reached in the period immediately prior to this period, that is, in Tˆ − 1. The main reason for this is that both agents in the period prior to this period will try to avoid opting out and will agree to the worst agreement for themselves that is still better than opting out.

Lemma 6.1.1 (An agreement will be reached prior to the time period when an agreement is no longer possible.) All the subgame perfect equilibrium (SPE) strategies of a model satisfying A0t –A6t satisfy the following: If it is agent i's turn in time period Tˆ − 1, then using its SPE strategy it will suggest s˜ i,Tˆ−1 . The other agent will accept the offer.

Proof: First note that by (A6t ), Tˆ ≠ 0, and therefore Tˆ − 1 ∈ T . Now, suppose that it is agent i's turn to make an offer at time period Tˆ − 1. It is clear that an agreement won't be reached after this period. Therefore, since disagreement is the worst outcome (A0t ), the negotiation process will end with one of the agents opting out. Actually, since the agents prefer opting out sooner rather than later (A4t ), agent i will opt out in the next time period. But, by (A6t ), in time period Tˆ − 1 there are still some agreements that both agents prefer over opting out (at least one). Agent i can choose the best agreement from its point of view and agent j has no other choice but to accept this offer. The best agreement from agent i's point of view is s˜ i,Tˆ−1 .

In the rest of this section, we assume that agent 1 is the first agent to make an offer. Since agent 1's and agent 2's positions are similar, all the results can also be proved when agent 2 is the first to make an offer.

We now define the agreement that will be offered by an agent during its turn to make an offer. This agreement will be acceptable to the other agent, which will lead to the termination of the negotiation with no delay. The intuition behind this definition is that in each step, the agent whose turn it is to make an offer considers the possible agreement that can be reached in the following time periods. It may offer an agreement that will be better for the other agent than what that other agent can attain in the next periods. However, the offer will be the worst agreement of the possible future agreements. That is, since the agents lose over time, the first agent will offer the second agent the possible agreement in the next period minus the second agent's losses over time. The starting point is Tˆ − 1, where the agreement that will be signed is clear from the previous lemma. The definition depends on whether Tˆ is even or odd. Note that in the case of bilateral negotiation for i, j ∈ Agents, i ≠ j, sˆ j,t = s˜ i,t is the best
agreement for agent i that is still better to agent j than opting out. As defined in A3t , ci is the constant cost of delay of agent i.

Definition 6.1.1 Acceptable agreements

Tˆ is even. Suppose it is agent 2's turn to make an offer in time period Tˆ − 1 (i.e., Tˆ − 1 is odd). Let us define x Tˆ−1 = s˜ 2,Tˆ−1 . For any t ∈ T , t = Tˆ − k, 1 < k ≤ Tˆ :
if t is even, we define x t = ( s˜1 2,Tˆ−1 − (k/2)c2 + (k/2 − 1)c1 , s˜2 2,Tˆ−1 + (k/2)c2 − (k/2 − 1)c1 );
if t is odd, we define x t = ( s˜1 2,Tˆ−1 − ((k − 1)/2)c2 + ((k − 1)/2)c1 , s˜2 2,Tˆ−1 + ((k − 1)/2)c2 − ((k − 1)/2)c1 ).

Tˆ is odd. In this case it is agent 1's turn to make an offer in time period Tˆ − 1. Let us define x Tˆ−1 = s˜ 1,Tˆ−1 . For any t ∈ T , t = Tˆ − k, 1 < k ≤ Tˆ :
if t is even, we define x t = ( s˜1 1,Tˆ−1 − ((k − 1)/2)c2 + ((k − 1)/2)c1 , s˜2 1,Tˆ−1 + ((k − 1)/2)c2 − ((k − 1)/2)c1 );
if t is odd, we define x t = ( s˜1 1,Tˆ−1 + (k/2)c1 − (k/2 − 1)c2 , s˜2 1,Tˆ−1 + (k/2 − 1)c2 − (k/2)c1 ).

If Tˆ is even, that is, if it is agent 2's turn to make an offer before the period where no agreement can be reached, agent 2 has a small advantage. If Tˆ is odd, agent 1 has an advantage. However, since s˜ 1,Tˆ−1 and s˜ 2,Tˆ−1 are quite close, the advantage in both cases is small.

We will show by induction on k that if the agents follow their perfect equilibrium strategies, the agent whose turn it is to make an offer will offer x t and the other agent will accept this offer. The main idea behind the proof is that both agents prefer x t to opting out. Furthermore, both agents prefer x t in time period t to x t+1 at time period t + 1. And x t is the best such agreement for the agent whose turn it is to make an offer in time period t. In particular, since in Tˆ − 1 the agreement will be x Tˆ−1 (as we proved in lemma 6.1.1), it is clear that in Tˆ − 2, x Tˆ−2 is the best option for the agent whose turn it is to make an offer in time period Tˆ − 2; and similarly in previous time periods.

A7t Losses due to opting out vs. losses resulting from agreement:
1. For any t < Tˆ , if t > 0, s˜ i,t ∈ Possiblet−1 .
2. For any t < Tˆ , if t > 0, s˜1 2,t − s˜1 2,t−1 ≤ (1/2)(c2 − c1 ) and s˜2 1,t − s˜2 1,t−1 ≤ (1/2)(c1 − c2 ).

Note that the first part (1) indicates that the best agreement for agent i in the next time period that is still better to both agents than opting out in the next time period is also preferred by both agents to opting out in the current time period. If c1 = c2 then the second part (2) is always true.
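The backward construction of definition 6.1.1 can also be computed directly as a recursion in which each step shifts the split by the responder's delay cost. The sketch below is illustrative; the value s˜ 2,Tˆ−1 and the costs are assumed to be given, and the sample numbers anticipate example 18 later in this chapter.

```python
# Illustrative recursion behind definition 6.1.1: x^{T^-1} equals the s~ value of the
# agent who moves at T^-1, and each earlier period shifts the split by the responder's
# delay cost (s~ and c1, c2 are assumptions supplied by the caller).

def acceptable_agreements(T_hat, s_tilde_last, c1, c2):
    """s_tilde_last is s~^{2,T^-1} when T_hat is even and s~^{1,T^-1} when it is odd."""
    x = {T_hat - 1: s_tilde_last}
    for t in range(T_hat - 2, -1, -1):
        a1, a2 = x[t + 1]
        if t % 2 == 0:
            # agent 1 offers at even t: agent 2 accepts up to c2 extra tasks now
            # rather than wait one period
            x[t] = (a1 - c2, a2 + c2)
        else:
            # agent 2 offers at odd t: agent 1 accepts up to c1 extra tasks now
            x[t] = (a1 + c1, a2 - c1)
    return x

# With the numbers of example 18 (T^ = 22, s~^{2,21} = (48, 52), c1 = c2 = 2):
print(acceptable_agreements(22, (48, 52), 2, 2)[0])   # (46, 54)
```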
In the following lemmas we describe the proofs for the case where Tˆ is even. The proofs where Tˆ is odd are similar. We first prove that x t is preferred by both agents over opting out.

Lemma 6.1.2 (x t is acceptable) If the model satisfies assumptions A0t –A7t then for any t ∈ T , t = Tˆ − k, 1 < k ≤ Tˆ , U i ((x t , t)) > U i ((Opt, t)).

Proof: The proof is based on backward induction on t.

Base case (t = Tˆ − 2): In this case, t is even and x Tˆ−2 = ( s˜1 2,Tˆ−1 − c2 , sˆ2 1,Tˆ−1 + c2 ).

We first show that the hypothesis is correct for agent 1. By (A5t ), since U 1 ((˜s 2,Tˆ−1 , Tˆ − 1)) > U 1 ((Opt, Tˆ − 1)), also U 1 ((˜s 2,Tˆ−1 , Tˆ − 2)) > U 1 ((Opt, Tˆ − 2)). But, since actions are costly (A1t ), agent 1 prefers a smaller portion of the task, and if its part is decreased by c2 , it will prefer the resulting agreement to the original one. Thus U 1 (( s˜1 2,Tˆ−1 − c2 , sˆ2 1,Tˆ−1 + c2 ), Tˆ − 2) > U 1 ((Opt, Tˆ − 2)).

We now show the hypothesis for agent 2. By (A7t ), U 2 ((˜s 2,Tˆ−1 , Tˆ − 1)) > U 2 ((˜s 1,Tˆ−2 , Tˆ − 2)) > U 2 ((Opt, Tˆ − 2)). By (A3t ), it is clear that U 2 (( s˜1 2,Tˆ−1 − c2 , sˆ2 1,Tˆ−1 + c2 ), Tˆ − 2) ≥ U 2 ((˜s 2,Tˆ−1 , Tˆ − 1)), and we can conclude that U 2 (( s˜1 2,Tˆ−1 − c2 , sˆ2 1,Tˆ−1 + c2 ), Tˆ − 2) > U 2 ((Opt, Tˆ − 2)).

Induction case (t < Tˆ − 2): Suppose the hypothesis is true for any t ′ , t < t ′ ≤ Tˆ − 2, and let t = Tˆ − k.

1. If t is even, x t = ( s˜1 2,Tˆ−1 − (k/2)c2 + (k/2 − 1)c1 , s˜2 2,Tˆ−1 + (k/2)c2 − (k/2 − 1)c1 ), which is actually (x1 t+1 − c2 , x2 t+1 + c2 ).
For agent 1, by the induction hypothesis, U 1 ((x t+1 , t + 1)) > U 1 ((Opt, t + 1)) and by (A5t ) U 1 ((x t+1 , t)) > U 1 ((Opt, t)). By (A3t ) it is clear that U 1 ((x1 t+1 − c2 , x2 t+1 + c2 ), t) > U 1 ((Opt, t)).
For agent 2, by (A7t ) U 2 ((˜s 2,Tˆ−1 , Tˆ − 1)) > U 2 ((˜s 1,Tˆ−2 , Tˆ − 2)) and by (A3t ) s˜2 2,Tˆ−1 < s˜2 1,Tˆ−2 − c2 and s˜2 2,Tˆ−1 + (k/2)c2 − (k/2 − 1)c1 < s˜2 1,Tˆ−2 − c2 + (k/2)c2 − (k/2 − 1)c1 . It is enough to show that s˜2 1,Tˆ−2 − c2 + (k/2)c2 − (k/2 − 1)c1 < s˜2 1,Tˆ−k , that is, s˜2 1,Tˆ−2 − s˜2 1,Tˆ−k < c2 − (k/2)c2 + (k/2 − 1)c1 . But c2 − (k/2)c2 + (k/2 − 1)c1 = (k − 2)((1/2)(c1 − c2 )), and by (A7t ) we can conclude that U 2 ((x t , t)) > U 2 ((Opt, t)).

2. If t is odd, x t = ( s˜1 2,Tˆ−1 − ((k − 1)/2)c2 + ((k − 1)/2)c1 , s˜2 2,Tˆ−1 + ((k − 1)/2)c2 − ((k − 1)/2)c1 ), which is actually (x1 t+1 + c1 , x2 t+1 − c1 ).
The proof for agent 2 is similar to the proof for agent 1 when t is even. For agent 1, we need to show that s˜1 2,Tˆ−1 − ((k − 1)/2)c2 + ((k − 1)/2)c1 < s˜1 2,Tˆ−k , i.e., s˜1 2,Tˆ−1 − s˜1 2,Tˆ−k < ((k − 1)/2)(c2 − c1 ). This is clear by (A7t ).
We now prove that both agents prefer x t in period t to x t+1 in t + 1. This is due to the construction of the x t s. For example, if t is even then agent 1 will do less in x t than in x t+1 . So, it is clear that agent 1 prefers x t in period t (agent 1 also gains a period). Agent 2 needs to gain exactly what it loses over time, and therefore the utility is the same from both options.

Lemma 6.1.3 (x t is preferred to x t+1 ) If the model satisfies assumptions A0t –A7t then for any t ∈ T , t = Tˆ − k, 1 < k ≤ Tˆ , U i ((x t , t)) ≥ U i ((x t+1 , t + 1)).

Proof: If t is even then x1 t = x1 t+1 − c2 and by (A1t ) the claim is clear for agent 1. For agent 2, x2 t = x2 t+1 + c2 and the claim is clear by (A3t ); similarly when t is odd.

We now state our final results for this section.

Theorem 6.1.1 (Agreement will be reached in the first period) If the model satisfies assumptions A0t –A7t and the agents follow their perfect equilibrium strategies, then:

If Tˆ is even, agent 1 will offer agent 2 ( s˜1 2,Tˆ−1 − (Tˆ/2)c2 + (Tˆ/2 − 1)c1 , s˜2 2,Tˆ−1 + (Tˆ/2)c2 − (Tˆ/2 − 1)c1 ) in the first period, and agent 2 will accept the offer.

If Tˆ is odd, agent 1 will offer agent 2 ( s˜1 1,Tˆ−1 − ((Tˆ − 1)/2)c2 + ((Tˆ − 1)/2)c1 , s˜2 1,Tˆ−1 + ((Tˆ − 1)/2)c2 − ((Tˆ − 1)/2)c1 ) in the first period, and agent 2 will accept the offer.
Proof: This is clear from the above lemmas. EXAMPLE 18 We return to the example of the newsletter deliverers. Two electronic newsletters (N1 and N2) are delivered by separate delivery services (D1 and D2). The publisher of N1 pays D1 $200 for the delivery of one edition of N 1 to all its subscribers, and the publisher of N2 pays D2 $225 per delivery. Each delivery to any subscriber (i.e., a phone call to the subscriber’s server) costs D1 or D2 $1, and each loses $1 for each time period. There are M subscribers with subscriptions to both N1 and N2, and there are substantial savings to a delivery service if one or the other can deliver both newsletters. In the event that there is an agreement between D1 and D2 for joint deliveries to the M joint subscribers, then the publisher of N1 will pay D1 $170, and the publisher of N2 will pay D2 $200 (the lower prices reflect the fact that there are competing advertisers in the two newsletters, and consequently their joint delivery may detract from the sales impact of each newsletter). They must still pay $1 per phone call to the server and will lose $2 over time during the negotiations because of a
higher penalty from the publisher. An agreement s ∈ S is a pair (s1 , s2 ) where s1 , s2 ∈ IN and s1 + s2 = M. One dollar is the smallest unit of currency in this example. Formally:

U 1 ((Opt, t)) = 200 − M − t and U 1 ((s, t)) = 170 − s1 − 2t
U 2 ((Opt, t)) = 225 − M − t and U 2 ((s, t)) = 200 − s2 − 2t.

Suppose M = 100. Then s˜ 2,t = (69 − t, 31 + t), s˜ 1,t = (26 + t, 74 − t) and Tˆ = 22.2 Since Tˆ is even, according to theorem 6.1.1 the agreement that will be reached in the first period is (46, 54).
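As a quick check of theorem 6.1.1 against this example (an illustrative calculation, not part of the original text), the first-period offer can be computed directly from the closed form with Tˆ = 22, c1 = c2 = 2, and s˜ 2,21 = (48, 52):

```python
# Plugging example 18 into theorem 6.1.1 (T^ even): the first-period offer is
# ( s~1 - (T^/2)c2 + (T^/2 - 1)c1 ,  s~2 + (T^/2)c2 - (T^/2 - 1)c1 ).
T_hat, c1, c2 = 22, 2, 2
s1, s2 = 69 - (T_hat - 1), 31 + (T_hat - 1)      # s~^{2,21} = (48, 52)
offer = (s1 - (T_hat // 2) * c2 + (T_hat // 2 - 1) * c1,
         s2 + (T_hat // 2) * c2 - (T_hat // 2 - 1) * c1)
print(offer)   # (46, 54), as stated in the text
```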
6.2 Multiple Agents
So far, we have assumed that only two agents participate in each interaction when there is a task to distribute. We now expand this assumption by extending the framework to the case of more than two agents, as in the data allocation instance. In this section we assume that the agents have full information. We assume that a set of agents wants to satisfy a common goal. As in the bilateral case, the goal may originate from the goals of the individual agents. All agents can take part in satisfying the goal, but they all need to agree on the schedule. There are no side payments, that is, no private deals can be reached among the agents. An additional option that we do not deal with in this chapter involves one of the agents opting out, and the remaining agents reaching an agreement. In the rest of this chapter when we refer to several agents we mean more than two. EXAMPLE 19 Suppose there are several electronic newsletters (more than two) that are delivered by separate delivery service agents. The delivery is done by phone (either by fax machines or electronic mail). The expenses of the agents depend only on the number of phone calls made. There are several subscribers that subscribe to all the newsletters. The delivery to the common subscribers is the common goal of the agents. However, each agent may still need to deliver its newsletter to its private subscribers. All delivery agents negotiate over the distribution of the common subscriptions. Each of the agents can opt out of the negotiations and deliver all of its own newsletters. The agents are paid according to the time of the delivery (the faster the better).
We assume that the common goal consists of M equal subtasks to be fulfilled and that an agreement is a tuple (s1 , . . . , sn ), where si ∈ IN and s1 + · · · + sn = M. si is agent i’s portion of the labor.
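For the multiagent case, the set of distributions that every agent prefers (here taken weakly) to opting out, the n-agent analogue of Possiblet , can be enumerated along the following lines. This is a sketch only; the utility and opting-out functions are assumed to be supplied.

```python
# Illustrative enumeration of the n-agent distributions of M subtasks that all agents
# weakly prefer to opting out at period t (utility callbacks are assumptions).
from itertools import combinations

def distributions(M, n):
    # all tuples (s1, ..., sn) of non-negative integers summing to M (stars and bars)
    for cuts in combinations(range(M + n - 1), n - 1):
        parts, prev = [], -1
        for c in cuts:
            parts.append(c - prev - 1)
            prev = c
        parts.append(M + n - 2 - prev)
        yield tuple(parts)

def possible(t, M, n, U, U_opt):
    # U(i, s, t): agent i's utility from agreement s at t; U_opt(i, t): from opting out
    return [s for s in distributions(M, n)
            if all(U(i, s, t) >= U_opt(i, t) for i in range(n))]
```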
We assume that the properties of the agents' utility functions in the bilateral case still hold in the current case of multiple agents. For example, as in the bilateral negotiations, for agreements that are reached within the same time period each agent prefers to perform a smaller portion of the labor. However, the utilities of the agents from agreements in which their parts are equivalent may be different. That is, ri = si does not necessarily imply that U i ((r, t)) = U i ((s, t)). Other parameters, such as the quality of the performance of the other agents, may also play a role.

6.2.1 Negotiation Ends with No Delay
We are able to show that the results of section 6.1 are also valid when there are more than two agents in the environment and when all agents have veto power. That is, if one of the agents opts out of the negotiation, the common goal cannot be performed. When the agents use the protocol of simultaneous response, then the results of the data allocation case in chapter 3 are valid. That is, for any possible distribution of the tasks that is not worse for any of the agents than opting out there is a subgame-perfect equilibrium that leads to this outcome without delay. The complexity of finding a distribution that maximizes the sum of the agents’ utilities depends on the type of the utility functions. When the utility functions satisfy assumptions A0t –A7t , then methods such as the simplex can be used (see chapter 7). In this chapter we consider the protocol of sequential responses in which an agent responding to an offer is informed of the responses of the preceding agents (assuming that the agents are ordered). We assume that the same order as that used for making offers is used for responding to an offer. When the agents use this protocol, as in the bilateral negotiation on task distribution, the main driving force behind the agents’ reaching an agreement in this case is the cost of the negotiation time. The agents’ attitudes toward opting out versus reaching an agreement will affect only the details of the actual agreement that will be reached, but won’t drive any of the agents to opt out. We first show that in such a case if the game has not ended in prior periods, then an agreement will be reached in the period prior to that in which there is no agreement acceptable to all agents, that is, in period Tˆ − 1. Lemma 6.2.1 An agreement will be reached prior to the time period when an agreement is no longer possible (N agent version; sequential response) Suppose the model satisfies A0t –A2t , A4t –A5t and the agents respond to an offer sequentially. If the agents are using their subgame perfect equilibrium strategies, the negotiation process is not over until time Tˆ − 1 and it is agent
168
Chapter 6
ˆ
i’s turn to make an offer. Then it will offer s˜ i,T −1 , which is the agreement that ˆ satisfies U i ((˜s i,T −1 , t)) = maxs∈Possiblet U i ((s, Tˆ −1)). We will denote s˜ i,T −1 by sˆ. All the other agents will accept the offer.3 Proof: In period Tˆ and later, there is no agreement that is acceptable to all the agents. Therefore, the only possible outcome after that time period is either opting out or disagreement. Since disagreement is the worst outcome (A1t ) and the agents prefer opting out sooner rather than later, at least one of the agents will opt out at time period Tˆ . But all the agents prefer an agreement ˆ from PossibleT −1 over opting out in the next period. Since it is i’s turn, it can choose the best agreement from its point of view, offer it, and all the agents will accept it. We now show that in each time period previous to Tˆ there is a set of possible agreements acceptable to all the agents. The agent whose turn it is to make an offer should choose the best of these agreements according to its utility function and make this offer. We first define the sets of acceptable agreements by induction of t. In the period before Tˆ , this set contains only sˆ. In the prior period (Tˆ − 2), it includes all the agreements in this time period that all agents prefer to opting out and to sˆ in Tˆ − 1. The best agreement for the agent whose turn it is to make an offer is chosen from this set. This value is used as the basis in computing the acceptable agreements set in the prior period, Tˆ − 3, that is, the acceptable agreements in Tˆ − 3 are those that are better for the agents than this value and also better than opting out; and similarly for prior periods. ˆ
Definition 6.2.1 Acceptable agreements Let x T −1 = sˆ (where sˆ is as defined in lemma 6.2.1). For each t ∈ T , t < Tˆ − 1 let X t include all the agreements that satisfy the following condition: s ∈ X t iff s ∈ Possiblet and for any j ∈ Agents U j ((s, t)) ≥ U j ((x t+1 , t + 1)). If it is i’s turn to make an offer in time period t, we define x t = maxU i X t . This definition is sound, since X t is not empty for any time period before Tˆ − 1. We will prove this in the next lemma. The intuition behind the proof is that x t+1 always belongs to X t , since the agents lose over time, and if an agreement is preferred to opting out at a given time period, it is also preferred to opting out in previous time periods. Lemma 6.2.2 Acceptable agreements do exist If the model satisfies conditions A1t –A2t , A4t –A5t then for t ∈ T , t < Tˆ − 1, X t =
∅.
Negotiations about Task Distribution
169
Proof: We show by backward induction on t that for all t < Tˆ , X t = ∅. Base case (t = Tˆ − 1): By (A2t ) ∀i ∈ Agents, U i ((ˆs , Tˆ − 2)) > U i ((ˆs , Tˆ − 1)) and by (A4t ) it is clear that ˆ U i ((ˆs , Tˆ − 2)) > U i ((Opt, Tˆ − 2)), and therefore sˆ ∈ X T −2 . Inductive case (t < Tˆ − 1): By the induction hypothesis, X t+1 = ∅. Therefore, x t+1 is well defined. But, by (A2t ) and (A4t ), similar to the base case, it is easy to show that x t+1 ∈ X t . We now show that in any time period all the agents will accept x t , and the agent whose turn it is to make an offer will also offer x t . The intuition behind this is that the agents prefer x t to opting out and x t is better than any agreement that can be reached in the future. Lemma 6.2.3 x t is offered and accepted If the model satisfies conditions A0t – A2t , A4t –A5t , then in any time period t < Tˆ − 1, the agents will accept any offer s ∈ X t and the agent whose turn it is to make an offer will offer x t . Proof:
The proof is by backward induction on t.
Base case (t = Tˆ − 2): By lemma 6.2.1, it is clear that the agents won’t reach ˆ any agreement in the future better than x T −1 . However, by the definition ˆ t of X (definition 6.2.1), any agreement in X T −2 is better for the agents than ˆ x T −1 in time period Tˆ − 1 and than opting out at Tˆ − 2. Therefore, they should accept these offers. ˆ
On the other hand, any agreement that does not belong to X T −2 won’t be accepted by at least one of the agents, since it will prefer to wait another period and receive sˆ or even opt out. But since i, similar to the other agents, prefers ˆ the agreements of X T −2 to this possibility, it should offer an agreement from this set. However, since it is i’s turn to make an offer, it has the opportunity to choose the best one from its point of view. Inductive case (t < Tˆ − 2): By the induction hypothesis, if an agreement isn’t reached in this time period, the outcome of the negotiation process will be (x t+1 , t + 1). But the agreements of X t at time period t are preferred by all agents to (x t+1 , t + 1) and to opting out at t; the proof proceeds as in the base case. We summarize our results with the following theorem.
170
Chapter 6
Theorem 6.2.1 If the model satisfies conditions A0t –A2t , A4t –A5t , and the agents use their perfect equilibrium strategies, then in the first time period agent 1 will offer x 0 , and all other agents will accept the offer. Proof: The proof is clear by lemma 6.2.3. EXAMPLE 20 We return to the example of the newsletter deliverers. Three electronic newsletters (N1, N2, and N3) are delivered by separate delivery services (D1, D2, and D3). The payment arrangements for D1 and D2 are as previously discussed in example 18, that is, the publisher of N1 pays D1 $200 per delivery of one edition, and the publisher of N2 pays D2 $225 per delivery of one edition. The publisher of N3 pays D3 $250 per delivery of one edition. As was the case for D1 and D2, each delivery to a given subscriber (i.e., a phone call to this subscriber’s server) also costs D3 $1, and each loses $1 for each time period. There are M subscribers with subscriptions to all newsletters (i.e., N1, N2, and N3), and as in example 18 there are substantial savings to a delivery service if one of the agents can deliver all newsletters to the same subscribers. If an agreement among D1, D2, and D3 for joint deliveries to the M joint subscribers is reached, then the publisher of N3 will pay D3 only $215 per delivery of an edition; and, as in the previous example, in such an event the publisher of N1 will pay D1 $170, and the publisher of N2 will pay D2 $200. They must still pay $1 per phone call to the server and will lose $2 for any negotiation time period. Notice that in this example, only the number of phone calls to the subscribers made by a delivery agent plays a role in its payments and not the distribution of the rest of the subscribers between the other two agents. Formally:
U 1 ((Opt, t)) = 200 − M − t and U 1 ((s, t)) = 170 − s1 − 2t U 2 ((Opt, t)) = 225 − M − t and U 2 ((s, t)) = 200 − s2 − 2t U 3 ((Opt, t)) = 250 − M − t and U 3 ((s, t)) = 215 − s3 − 2t 3,t 2,t Suppose M = 100. Then sˆ 1,t 1 = 69 − t, sˆ 2 = 74 − t and sˆ 3 = 64 − t. Note i,t that for all i ∈ Agents, sˆ is not unique in this case. Tˆ = 36, and it is D3’s turn to make an offer in the time period prior to ˆ T . In this period, D1 is willing to deliver up to 34 newsletters, if an agreement will be reached and D2 is willing to deliver up to 39 newsletters. So, x 35 = (34, 39, 27). It is easy to compute that whenever it is D1’s turn to make an offer (t is divided by 3), x t = (31, 38, 31); that when it is D2’s turn to make an offer, x t = (35, 36, 29); and that when it is D3’s turn to make an offer (prior
Negotiations about Task Distribution
171
to time period 35), x t = (33, 40, 27). Therefore, in the first time period (0), D1 will offer (31,38,31), and the other agents will accept its offer. 6.3
Task Distribution in DAI
Most of the research on task distribution in DAI was performed in the area of distributed problem solving systems. The Contract Net is a high-level communication protocol for Distributed Problem Solving systems (Smith and Davis 1983; Malone et al. 1988; Sandholm 1993). It enables the distribution of tasks through decentralized negotiation processes between individual agents and a manager working on behalf of the task force. The main problem with this approach is that it usually does not adequately address the problem of minimization of the work time, and it incurs significant overhead during the negotiation process. However, the research on the Contract Net model considers problems that we do not, such as task decomposing. Cammarata, McArthur, and Steeb (1983) suggest different strategies for cooperation that are applicable in the context of collision avoidance in air traffic control systems. They evaluate their strategies via simulations, using criteria such as communications required, processing time, and separation errors. Lesser, Durfee, and colleagues (Durfee 1988; Durfee and Lesser 1987; Decker and Lesser 1993) have developed a model where agents exchange partial solutions, at various levels of detail, to construct global solutions. Using simulations, they examined strategies for communication of data and hypotheses and different group structures and policies to make the overall performance of the system more efficient. Lesser and Erman’s model of a distributed interpretation system is able to function effectively even though processing nodes have inconsistent and incomplete information (Lesser and Erman 1980). Carver et al. present sophisticated models that support complex and dynamic interactions between agents (Carver, Cvetanovic, and Lesser 1991) in domains such as aircraft monitoring. The problem of distributed dynamic task allocation by a set of cooperative agents is considered in (Kraus and Plotkin 2000). There are different types of tasks that arrive dynamically to the system. Each of the agents can satisfy only a subset of the tasks. The main goal of the agents is to maximize the overall performance of the system and to fulfill the tasks as soon as possible. The agents are modeled using a stochastic closed queueing network, and algorithms for determining a distributed policy of optimal task allocation
172
Chapter 6
and finding the optimal effort levels of the agents, subject to certain constraints, are presented. None of the research discussed above considers self-interested agents who need to cooperate, as we do. Several formal models of shared plans, teamwork, and integration of information agents have been developed (e.g., (Grosz and Kraus 1996; Jennings 1995; Huhns et al. 1994; Sonenberg et al. 1992)). They deal mainly with restrictions on the design and behavior of members of a group that will lead to cooperative behavior. Some of these works also discuss possibilities for subactions distribution. Jennings, for example, suggests that there will be an organizer for any joint action. This organizer is responsible for selecting an agent for each subaction. The exact timing of the subactions is determined based on the temporal ordering of the subactions and the other commitments of the agents, and it is reached through mutual agreement between the agents and the organizer. Shehory and Kraus (1998, 1995) consider situations where each task should be attached to a group of agents that will perform the task. The set of tasks is known in advance and the agents try to optimize the overall performance of the system. Shehory and Kraus suggest that the agents form coalitions in order to perform tasks, and present a distributed algorithm with a low ratio bound and with low computational complexity. The problem of task scheduling4 and its relationship to system performance is one of the major research issues in distributed computer systems (see (Graham et al. 1979; Coffman 1976; Tanaev, Sotskov, and Strusevich 1994) as surveys). The main difference between task scheduling in distributed systems and the problem we consider is that task-scheduling tries to maximize the performance of the overall system, while in our case the agents are self-interested, and each tries to maximize its own performance. In addition, in the literature on task scheduling, the tasks are scheduled by a central controller while the strategic-negotiation model is distributed. Grigg and Petro (1997) propose a market-based solution for the distribution of small software tasks over the Internet. They consider an open market where agents join and leave the market, whereas we considere negotiations among a fixed set of agents reaching an agreement on a well-defined set of tasks. An example of task distribution is the “delivery domain” (Wellman 1992; Zlotkin and Rosenschein 1993; Sandholm 1993; Fischer and Kuhn 1993). A group of delivery companies can reduce their overall and individual delivery costs by coordinating their deliveries. Each delivery requirement is a single task. Delivery coordination is actually the exchanging of tasks. One company,
Negotiations about Task Distribution
173
for example, that needs to make a delivery from A to B and a delivery from C to D can execute other deliveries from A to B with no extra cost. Therefore, it may agree to exchange its C-to-D delivery with another A-to-B delivery. In summary, the mechanisms presented in this chapter allow multiple-delivery companies and other agents working in multiagent environments to reach an efficient and mutually beneficial agreement on task distribution without delay.
7
Negotiations about How to Reduce Pollution
In chapters 3 through 6 we described the application of the strategic-negotiation model to various domains. There are other game-theory based models that can be used for agent application, which we discuss in chapter 9. In particular, the market-oriented programming approach (section 9.2) can be used for resolving conflicts when there are many agents and the conflict is over the allocation of several items. In this chapter we present the pollution allocation problem and will compare situations in which it is beneficial to apply the strategic-negotiation model with situations in which using market mechanisms (Wellman 1993) is more appropriate. The problem under consideration in this chapter concerns the need to reduce air pollution for a short time period in light of external factors (such as weather). The emission of several pollutants by manufacturing plants should be reduced by a certain percentage with short notice. The solution currently being implemented is simply the reduction of emission by each plant by an appropriate required percentage. But it may be less costly to reduce certain pollutant emissions of one plant more than another plant. Therefore, plants can reach an agreement with respect to emission of various pollutants and by which percentage each must be reduced. The plants will be represented by automated agents that will negotiate to reach an agreement on pollution reduction. Each agent wants to reach an agreement that will maximize its profit and that controls the total emission of each pollutant so that it does not exceed its maximal permissible concentration in the atmosphere. For situations of complete information we will use the strategic-negotiation model, and for situations of incomplete information we will examine market mechanisms. 7.1
Problem Description
Consider the following situation: there are some closely grouped plants in an industrial region. In the process of manufacturing its products, a plant emits various pollutants into the atmosphere. There are norms restricting the maximal emission of each pollutant for each plant. The level of pollution must always be below these norms. We refer to the situation when only these norms must be adhered to as the usual circumstances. Sometimes there is a need to reduce pollution further for some period of time because of unusual external factors, such as weather (high humidity, wind blowing toward residential areas). We refer to this situation as special circumstances. In this case plants receive new norms and need to adjust their production to these new norms. The solution
176
Chapter 7
currently implemented is simply the reduction of each norm by the appropriate percentage, that is, each plant has to reduce its norms for each pollutant proportionally to the general reduction needed for this pollutant. We refer to this solution as the default solution. It is possible that for one plant it is less costly to reduce the emission of one pollutant while for another it is less costly to reduce the emission of another pollutant. So plants can negotiate to reach more beneficial agreements about the emission of which pollutants and by which percentage each of them must be reduced. All the plants must agree on the new allocation of the emissions in order that it will be implemented. If a consensus is not reached, the default solution is implemented. EXAMPLE 21 There are three plants, A, B, and C, that are located in the same industrial region, and there are three pollutants, α, β, and γ , that they emit while manufacturing their products. The plants have the following constraints under usual circumstances:
Plant A: The maximal emission allowed under usual circumstances for plant A of α is 80 kg, of β 90 kg, and of γ , 120 kg. Plant B: The maximal emission allowed under regular circumstances for plant B of α is 130 kg, of β is 90 kg, and of γ , 160 kg. Plant C: The maximal emission allowed under regular circumstances for plant C of α is 70 kg, of β 60 kg, and of γ , 190 kg. Suppose that owing to high humidity prediction the emissions in the industrial region of the plants have to be reduced in the following way: α has to be reduced by 10%, β by 20%, and γ by 30%. Thus if the default solution is implemented, the default constraints in these special circumstances will be: Plant A: α - 72 kg, β - 72 kg, γ - 84 kg; Plant B: α - 117 kg, β - 72 kg, γ - 112 kg; Plant C: α - 63 kg, β - 48 kg, γ - 133 kg. However, as will be demonstrated later, it may be the case that different constraints will be beneficial to all the plants. The formal definition of such an environment is presented in the following definition.
Negotiations about How to Reduce Pollution
177
Definition 7.1.1 A pollution allocation environment (PAE) is a tuple where PLANTS: A set of n plants PLANTS = { pl1 , pl2 , . . . , pln } is a set of closely located plants in an industrial region. Each plant has its own interests and tries to maximize its own utility. We assume that there are more than two plants. PRODUCTS: A set of n sets of products PRODUCTS = {PRODUCTS1 , . . . , PRODUCTSn } where PRODUCTSi = {producti1 , . . . , productim i } is the set of m i products produced by plant pli . We denote by z¯ i the vector of the amounts of the products that are produced by pli , where z i j denotes the amount of producti j produced by plant pli . We assume that z i j can be any nonnegative number. This assumption is reasonable if the amount is measured, for example, by kilograms. Since different plants may have different technologies for producing the same kind of products and therefore have different profits and emit different pollutants while producing the same product, we always refer to the products produced by different plants as different kinds of products. UTILITIES: A set of the basic utility functions of the plants UTILITIES = {Ub1 , . . . , Ubn }, where Ubi is the basic utility function of pli . It specifies pli ’s profit from a given production level.1 That is, given z i1 , . . . , z im i , Ubi (z i1 , . . . , z im i ) specifies pli ’s profits from producing of each product producti j the amount of z i j .2 POLLUTANTS (POLLUT): A set of pollutants POLLUT = { pol1 , pol2 , . . . , polk }. We will use the following notations: em i j (¯z i ) is the amount of pol j emitted by plant pli if it produces z¯ i . em i (¯z i ) is the vector (em i1 (¯z i ), . . . , em ik (¯z i )).
•
em j (¯z 1 , . . . , z¯ n ) denotes the amount of the pollutant pol j emitted by all the plants when they produce the amounts z¯ 1 , . . . , z¯ n , i.e., em j (¯z 1 , . . . , z¯ n ) = n z i ). i=1 em i j (¯ •
We assume that the emission of a given pollutant from producing one item of a certain product is constant, that is, the functions em i j are linear homogeneous functions. Experts confirm that this assumption is reasonable.
178
Chapter 7
CONSTRAINTS: A set of CONSTRAINTS = {CONSTu , CONSTs } where CONSTu = < a¯ 1 , . . . , a¯ n >, where a¯ i = < ai1 , . . . , aik > where ai j is the maximal amount of pollutant pol j that can be emitted by plant pli under usual circumstances. •
The maximal amount of pollutant pol j that can be emitted by all plants in n usual circumstances is a j , i.e., a j = i=1 ai j .
•
CONSTs = < b1 , . . . , bk >, where b j is the maximal amount of pollutant pol j that can be emitted by all plants under special circumstances.
•
Under usual circumstances, if pli produces z¯ i , then em i j (¯z i ) ≤ ai j . In the above definition the utility function of a plant is a function of the amount of products of the plant. Since our problem is to reallocate the permissions to emit pollutants, it is useful also to define the basic utility function of a plant as a function of the amount of pollutants that a plant emits. In particular, we want to know the maximal possible utility that an agent pli can obtain given a set of constraints, ci1 , . . . , cik . This utility can be expressed formally as the following constrained optimization problem: i (ci1 , . . . , cik ) = max zi1 ∈IR,...,zimi ∈IR Ubi (z i1 , . . . , z im i ) where Umax ∀ j, 1 ≤ j ≤ k, em i j (¯z i ) ≤ ci j
(7.1)
We illustrate the above definitions with the following example. EXAMPLE
22 Let us return to example 21 to illustrate definition 7.1.1.
PLANTS: There are three plants in the example, A, B, and C, that is, PLANTS = {A, B, C}. PRODUCTS: Plant A manufactures two kinds of products, ϕ1 and ψ1 . Plant B also manufactures two kinds of products, ϕ2 and ψ2 . Plant C manufactures three kinds of products, ϕ3 , ψ3 and χ3 . Thus PRODUCTS = {{ϕ1 , ψ1 }, {ϕ2 , ψ2 }, {ϕ3 , ψ3 , χ3 }}. UTILITIES: Let us assume that for each unit of ϕ1 plant A has a profit of $10,000 and for each unit of ψ1 its profit is $8,000. Then the utility of plant A in thousands of dollars is: Ub1 (< z 11 , z 12 >) = 10z 11 + 8z 12 . Plant B has a profit of $6,000 per unit of ϕ2 and $11,000 per unit of ψ2 . So, Ub2 (< z 21 , z 22 >) = 6z 21 + 11z 22 . Plant C has a profit of $12,000 for each unit of ϕ3 , $5,000 per unit of ψ3 and $7,000 for per unit of χ3 . Thus Ub3 (< z 31 , z 32 , z 33 >) = 12z 31 + 5z 32 + 7z 33 .
Negotiations about How to Reduce Pollution
179
POLLUTANTS: There are three kinds of pollutants in the example: α, β, and γ . So POLLUT = {α, β, γ }. Plant A emits 2 kg of pollutant α and 5 kg of pollutant γ while producing one unit of ϕ1 and 3 kg of pollutant β and 3 kg of pollutant γ while producing one unit of ψ1 . Plant B emits 4 kg of pollutant β and 2 kg of pollutant γ while producing one unit of ϕ2 and 3 kg of pollutant α and 4 kg of pollutant γ while producing one unit of ψ2 . Plant C emits 3 kg of pollutant α while producing one unit of ϕ3 , 2 kg of pollutant α and 6 kg of pollutant β while producing one unit of ψ2 , and 2 kg of pollutant β and 3 kg of pollutant γ while producing one unit of χ3 . We assume as earlier that all the pollution functions are linear and that the emission of a certain pollutant while producing certain amounts of a product is equal to the number of units (or part of units) of the product multiplied by the emission of that pollutant while producing one item of that product. Then: •
em 1 (¯z 1 , z¯ 2 , z¯ 3 ) = 2z 11 + 3z 22 + 3z 31 + 2z 32
•
em 2 (¯z 1 , z¯ 2 , z¯ 3 ) = 3z 12 + 4z 21 + 6z 32 + 2z 33
•
em 3 (¯z 1 , z¯ 2 , z¯ 3 ) = 5z 11 + 3z 12 + 2z 21 + 4z 22 + 3z 33
CONSTRAINTS: As specified in example 21 the plants should follow the following constraints under usual circumstances: Plant A: The maximal emission allowed under usual circumstances for plant A of α is 80 kg, of β 90 kg, and of γ 120 kg. Plant B: The maximal emission allowed under regular circumstances for plant B of α is 130 kg, of β is 90 kg, and of γ 160 kg. Plant C: The maximal emission allowed under usual circumstances for plant C of α is 70 kg, of β 60 kg, and of γ 190 kg. Therefore CONSTu = ,,< 70, 60, 190 >>. Summarizing the allowance of each plant for each pollutant we acquire the general constraints for pollutants α, β and γ : a1 = 280, a2 = 240 and a3 = 470. Suppose the emissions of α have to be reduced by 10%, emissions of β by 20%, and emissions of γ by 30%. That is, CONSTs = < 252, 192, 329 >. Plants can negotiate to reach an agreement about how to distribute the reduction of the special circumstances constraints. Thus negotiation leads to an allocation of the special circumstances constraints. The following is a definition of an allocation.
180
Chapter 7
Definition 7.1.2 (Pollution allocation) Given total special circumstances constraints < b1 , . . . , bk >, an allocation is a vector of vectors n , . . . , < bn1 , . . . , bnk >> such that i=1 bi j ≤ b j . alloci is the vector associated with pli , i.e., < bi1 , . . . , bik >. The set of all the possible allocations is S—the set of possible agreements. Among the problems that we consider in this book, the variation of the data allocation problem considered in section 9.1.2 is most similar to the pollution allocation problem. Both cases consider negotiations between multiple agents (more than two) who lose over time. In addition, in section 9.1.2, each server is concerned about the data stored locally, but has no preferences concerning the exact storage location of data stored in remote servers. Similarly, each plant is concerned about the amounts of pollutants it is permitted to emit according to the agreed upon allocation, and do not care how the remaining amounts are distributed among the other plants. However, while in the data allocation problem the negotiations concern the allocation of unique items, in the pollution allocation problem there are several units of each pollutant to be allocated. As in the data allocation problem, one of the main factors that influences the outcome of the negotiation is the conflict alloc, that is, which allocation will be implemented if agents do not reach any agreement. Definition 7.1.3 (Conflict allocation) The conflict allocation is the default allocation that is currently implemented under special circumstances where each constraint changes proportionally to the total change. conflict alloci = c c < bi1 , . . . , bik > is the allocation of plant pli where ∀1 ≤ i ≤ n, ∀1 ≤ j ≤ k, bj c bi j = ai j a j . The goal of each plant is to maximize its own profit. As we will see later, negotiation usually leads to an allocation of the total permitted amount of each pollutant, so that all plants will make a bigger profit than in the case of the default solution. The main question is what amounts of products a plant should manufacture given an allocation in order to maximize its utility. This question can be answered by solving the optimization problem specified in 7.1. This is demonstrated with the following example. 23 Let us return to example 22. The conflict alloc (i.e., the default solution) in this case is: , < 117, 72, 112 >, < 63, 48, 133 >>. If the conflict alloc is implemented, then the best strategy for plant A will be to produce 2.4 units of ϕ1 and 24 units of ψ1 . The best strategy for plant B EXAMPLE
Negotiations about How to Reduce Pollution
181
in such a case will be to produce 18 units of ϕ2 and 19 units of ψ2 . For plant C, the highest utility will be obtained by producing 21 units of ϕ3 and 24 units of χ3 . In this case the profit of plant A will be $216,000, the profit of plant B will be $317,000, and the profit of plant C will be $420,000. There are other allocations that if implemented will increase the utility of all the agents. 7.2
Attributes of the Utility Functions
As in previous chapters, we assume that a plant has a utility function over all possible outcomes: U i : { | S ∪ {Opt} | × T } ∪ {Disagreement} → IR. In the previous section we specified the basic utility of a plant for a given allocation without taking into account the negotiation cost. We extend these functions to take the negotiation time into consideration by assuming that U i (alloc, 0) = i Umax (alloci ). Since the conflict allocation is also an allocation, these functions can be used for evaluating the option of opting out. We present two assumptions concerning the utility functions of the agents in the case of the pollution allocation, which are similar to the ones presented in the data allocation case. A0p Disagreement is the worst outcome: For every alloc ∈ S, t ∈ T , and pli ∈ PLANTS, U i (alloc, t) > U i (Disagreement). The plants (or the agents that represent them) must agree on an allocation before the time at which they must start reducing their emissions. Thus the pollution allocation environment has a finite horizon where the negotiation must terminate after Tˆ periods. If the negotiation has not ended by then, the conflict allocation will be implemented at Tˆ . During negotiation the plants lose over time as specified in the following assumption. A1p Utility Over Time: For all i ∈ PLANTS and t ∈ T such that t ≤ Tˆ , and for all possible outcomes of the negotiation, o, U i (o, t) = U i (o, 0) · (1 − t/Tˆ ). 7.3
Complete Information
We first consider the case in which the plants have complete information about each other. That is, the utility functions of all the agents are known. We will discuss the application of the strategic-negotiation model in such situations. Note that since all the plants must agree on the new allocation, if one of them opts out of the negotiation, the conflict allocation will be implemented.
182
7.3.1
Chapter 7
Formal Analysis
When there is complete information and the agents use the protocol of simultaneous response, the results of the data allocation case in chapter 3 and of the task distribution case with simultaneous protocol in chapter 6 are valid. That is, for the protocol of simultaneous response, for any possible allocation of the pollution reduction that is not worse for any of the plants than the conflict allocation there is a subgame-perfect equilibrium that leads to this outcome without delay. As in the data allocation case (chapter 3), we propose that the designers of the agents agree in advance on a joint technique for choosing an allocation x that will be the basis for the subgame-perfect equilibrium strategies. We propose that the designers decide on a mechanism that will find an allocation that must, at the very least, give each agent its conflict utility and under these constraints maximizes some social-welfare criterion, such as the sum of the plants’ utilities or the generalized Nash product of the plants’ utilities (Nash 1950), that is, i (U i (alloci ) − U i (conflict alloci )). The main difference between the data allocation case and the pollution allocation case is that because of the linearity of the utility functions of the plants, finding an allocation that maximizes the sum of the plants’ utilities can be done in polynomial time, whereas in the data allocation case the problem is NPcomplete. Thus there is no need for two phases of the negotiations as there was in the data allocation case (see section 3.2.2). All the plants will run the same maximization algorithm locally and will find the same x, which will be the allocation offered and which will be accepted during the negotiations. However, maximizing the generalized Nash product is also intractable in the pollution allocation case, and therefore nondeterministic methods may be used. Here, a two-phase negotiation protocol is needed as in the data allocation case. If the plants use a protocol of sequential response, then the results of section 6.2.1 are applicable. That is, the agents should compute the acceptable agreements x t for 0 ≤ t < Tˆ as defined in lemma 6.2.1 by backward induction. Following theorem 6.2.1 in the first time period, agent 1 will offer x 0 and all the other agents will accept the offer. x 0 can be computed because the utility functions are linear and are continuous functions of the amounts produced by the plants. Note, however, that the utility functions are neither linear nor continuous i functions of the allocations. That is, Ubi is linear and continuous, but Umax is not. Thus in both the sequential and the simultaneous protocols the negotiations will end without delay. To compare the differences between the agreements reached in each of the cases, we developed a simulation of the pollution allocation environment, as discussed in the next section.
Negotiations about How to Reduce Pollution
7.3.2
183
Simulation Evaluation
To compare the various methods in the simulation of the pollution allocation environment, we varied the number of plants, the number of products per plant, and the number of types of pollutants. We determined how changes in these parameters influence the average utility of the plants, the standard deviation, and the Nash product. Other attributes, including the profit from producing a unit of each type of product and the amount and types of the pollutants emitted while producing a particular type of product, were randomly generated. We first discuss the assumptions and parameter choices of the simulation. • The number of plants, the number of products produced by each plant, and the maximal number of pollutants in the simulations were varied between 5 and 20 (the exact values were 5, 7, 10, 15, 20). The default value for each of these parameters was 5. • The maximal number of negotiation periods was 60 (i.e., T ˆ = 60).
We selected randomly which pullutants are emitted during the manufacturing of a given product. In particular, we generated randomly whether the manufacturing of a given product causes the emission of a given pollutant. For every type of pollutant, with probability 21 , manufacturing the given product causes the emission of the given pollutant. •
If the production of a given product causes the emission of a given pollutant (this happened with probability 21 ), the amount per unit was set randomly between 1 and 10. •
• The profit per unit for each product type was generated randomly between 0 and 9.
The constraint of the usual circumstances for each type of pollutants of each plant was set randomly between 50 and 100. •
The constraints of the special circumstances were determined by generating a random number between 0.5 and 1 for each of the pollutants. The constraint for a given plant was set as the regular constraint multiplied by the randomly generated number. •
As assumed above, the plant profit is a linear function of the number of units of each product manufactured by that plant. The profit of a plant is the sum of the profits from all of the products manufactured by that plant. The profit from a particular product is the profit from one unit of this product multiplied by the number of produced units. The pollution emission is a linear function of the number of units of each product. The amount of each pollutant of a plant is
184
Chapter 7
the sum of amounts of that pollutant from all of the products manufactured by this plant. The amount of a specific pollutant from a particular product is the amount of the given pollutant from one unit of the given product multiplied by the number of produced units. Given the above functions, the utility of the conflict allocation at the first time period was computed using the Simplex method. For a given plant the utility is the maximal possible profit for this plant under the given special circumstances constraints on the pollutants. 7.3.2.1 Optimization Methods Used in the Simulation For the simultaneous response protocol, we considered two possible social-welfare criteria for the pollution control problem with complete information: the maximization of the generalized Nash product that yields a “fair” solution and the maximization of the sum of the utilities that are at least equal to the utility from the conflict allocation for all the plants. Different techniques to find the solution that maximizes these criteria were examined. The Simplex method (Schrijver 1986) was used to maximize the sum of the plants’ utilities. We call this method MaxSum. Simplex is a method for linear optimization that leads to a good average performance time.
•
Praxis (Press et al. 1986) and random restart Hill Climbing (see section 3.3.3) were used to explicitly maximize the Nash Product. Maximizing the Nash product is a nonlinear optimization problem, and Praxis and the Hill Climbing methods search for near-optimal solutions.
•
We also considered the possibility of side payments, that is, where plants can pay one another for permits to emit some pollutants. In this case we examined the possibility of maximizing both the sum of the plants’ utilities and the Nash product of the plants’ utilities. According to this method the algorithm first finds the solution that maximizes the sum of the plants’ utilities without considering the conflict allocation utilities of the plants. Then, the profit is redistributed in the following way: every plant receives its conflict allocation utility and the rest of the common profit is divided equally among all the plants. This technique is call MaxSumNash.
•
For the sequential response protocol we used BackTracking, that is, computing x 0. 7.3.2.2 Complexity of the Methods Most of the methods use the Simplex method (Schrijver 1986). The average complexity of the Simplex method is
Negotiations about How to Reduce Pollution
185
l) O(h(l + h) · min{ h2 , 2l , (h + }) (Schrijver 1986) where h is the number of con8 straints and l is the number of variables of the maximization problem. Given this, we can compute the average complexity of the methods. Remember that n is the number of plants, k is the number of pollutants, and m i is the number of products of plant pli . We define m = max pli ∈PLANTS m i .
MaxSumNash: The method includes one appliance of Simplex with l = n · m variables and h = k constraints. The k constraints are one for each pollutant pol j indicating that the overall emitting amount of pol j cannot be higher than b j . With respect to the variables, there is a need for one variable per product for each of the plants. So the complexity of MaxSumNash is O((k 2 + kmn) · , k +8nm }). min{ k2 , nm 2 MaxSum: As in MaxSumNash, the MaxSum applies the Simplex method only once. However, there are more constraints here than in the MaxSumNash. In addition to the constraints on the maximal amount of pollution allowed, there are constraints stating that each of the plants must obtain at least its conflict allocation utility. Thus there are h = k + n constraints and l = nm variables, and the complexity is: O(((k + n)2 + nm(k + n)) · min{ k +2 n , nm , k + n8+ mn }). 2 Hill Climbing: This method runs for a fixed number (5000) of iterations. Each iteration consists of changing each product amount by δ and computing a goal function after each change to determine which local change will lead to the maximal improvement of the goal function. The complexity of computing the goal function is O(k · n · m), and there are n · m products. Thus the complexity of the Hill Climbing is O(5000 · k · n 2 · m 2 ). BackTracking: There are 60 negotiation steps (i.e., Tˆ = 60). Each negotiation step includes one application of the Simplex method with l = nk variables, one for each product of each plant. The number of constraints is h = k + n, one for each plant specifying, at step t, that it will receive at least its utility from x t+1 and its conflict allocation utility, and one for each pollutant, specifying for each pol j that the overall emission will not be higher than b j (similar to MaxSum). Thus the average complexity of the BackTracking method is O(60 ∗ ((k + n)2 + mn(k + n)) · min{ k +2 n , n·m , k + n8+ nm }). 2 NashPraxis: This method runs until convergence or until it performs the maximal number of iterations (10,000). Each iteration includes computing the goal function with complexity O(k · n · m).
186
Chapter 7
As can be seen, all the methods require, on average, polynomial time of k, n, and m. The main difference is in the constants that depends mainly on the number of times each method applies Simplex. Since in our simulations k, n, and m were varied between 5 and 20, the constants of 60 and 5,000 above led to a different time performance in our simulations. 7.3.2.3 Performance of the Methods The results presented in this section were obtained by performing 150 runs of the simulation for each given method, with a given number of plants, a given number of products per plant, and a given number of pollutants. In every particular experiment we varied one of the parameters from 5 to 20 (the exact values were 5, 7, 10, 15, 20) and the other two remained unchanged and equal to 5. The graphs in figure 7.1 present the average utility per plant as a function of the number of plants, number of products, and the number of pollutants, using the five methods described above. The graphs in figure 7.2 present the respective results for the Nash product. As could be expected, when the performance of the methods were evaluated using the average utility of the plants, the methods that strive to optimize the sum of the plants’ utilities, such as MaxSum and MaxSumNash, provide better results than the other methods. Another criterion that influences the attempt to maximize the sum of the utilities is the constraint that every plant will obtain at least its conflict allocation utility. So, if there is an allocation that leads to a higher sum of the utilities but does not yield the conflict allocation profit to some of the plants, it will not be selected because of this constraint. However, methods in which side payments are permitted, such as MaxSumNash, do not have to reject allocations for this reason. They can maximize the sum and then redistribute the profit by side payments in a way that provides each plant at least its conflict allocation utility. Thus, as can be seen in the graphs, MaxSumNash always yields the highest average utility. The second best is obtained by MaxSum. The worst results are obtained by the Nash Hill Climbing and the Nash Praxis methods, which attempt to optimize the Nash Product, rather than the sum of the utilities. They are general methods that do not use the linearity of the utility functions and the pollution functions. We also compared the standard deviation of each of the above methods. The lowest standard deviation of the utilities was obtained when using MaxSumNash and NashPraxis, since they try to maximize the Nash Product, which is motivated by fairness. The highest standard deviation was obtained by MaxSum, since it maximizes only the sum of the utilities (taking into account the constraints) and does not take into consideration the Nash Product or any other
1200
Utility per Plant
1000 800
MaxSum Nash Praxis BackTracking Nash Hill climbing MaxSumNash
600 400 200 0 5
10
15
20
Number of Plants
1200
Utility per Plant
1000 800
MaxSum Nash Praxis BackTracking Nash Hill climbing MaxSumNash
600 400 200 0 5
10
15
20
Number of Products
500 450
Utility per Plant
400 350 300
MaxSum
250
Nash Praxis
200
BackTracking
150
Nash Hill climbing
100
MaxSumNash
50 0 5
10
15
20
Number of Pollutants
Figure 7.1 The change in the average utility of the plants as a function of the number of plants, the number of products and the number of pollutants.
1E+63 1E+54 Nash Product
1E+45
MaxSum Nash Praxis
1E+36
BackTracking 1E+27
Hill climbing
1E+18
MaxSumNash
1E+09 1 5
10
15
20
Number of Plants
1E+16
Nash Product
1E+12
MaxSum Nash Praxis BackTracking
1E+08
Hill climbing MaxSumNash
10000
1 5
10
15
20
Number of Products
Nash Product
1E+12
MaxSum Nash Praxis BackTracking Hill climbing MaxSumNash
1E+08
10000
1 5
10
15
20
Number of Pollutants
Figure 7.2 The change in the Nash product of the plants as a function of the number of plants, the number of products and the number of pollutants.
Negotiations about How to Reduce Pollution
189
criteria that would lead to fairness. So, there may be agents that obtain only their conflict allocation utility, while others obtain much more. When the Nash Product is used for the evaluation of the methods, it is expected that the best results would be obtained by methods whose goals are to maximize the Nash Product. These methods are MaxSumNash, Hill Climbing, and Praxis. As can be seen in the graphs of figure 7.2, MaxSumNash yields the highest Nash Product, since it finds the optimal allocation with respect to the Nash product. BackTracking and Nash Hill Climbing provide the second best results. The success of the BackTracking method is surprising, since it does not try to maximize the Nash product. However, this can be explained by the fact that in each iteration, the BackTracking method tries to maximize the utility of a different agent. That is, when computing x t , it maximizes the utility of the agent that it is its turn to make an offer at period t given the relevant constraints, and when computing x t+1 it maximizes the utility of the agent that it is its turn to make an offer at period t + 1 given the relevant constraints, and so on. The disappointing results of the NashPraxis can be explained by the fact that NashPraxis is a general maximization problem that is not adapted to the pollution allocation problem and does not use the linearity of the utility functions. As presented in figure 7.2, MaxSum usually yields zero Nash Product since it maximizes the sum of the utilities and is not concerned with the fairness of the selected solution. Therefore, there may be plants that will receive only their conflict allocation utility and thus the Nash Product will be equal to zero. Consider the average time performance of the methods in our simulations, and note that k, n, and m varied in our simulations between 5 and 20. As could be expected, Nash Hill Climbing is the slowest method, because it always has to perform 5,000 iterations. NashPraxis and BackTracking require similar average times. Even though the maximal number of iterations that are allowed in NashPraxis is 10,000, it usually converges much earlier and requires an average time similar to BackTracking. The fastest methods are MaxSumNash and MaxSum, which require only one iteration. 7.3.2.4 Influence of the Parameters of the Environment on the Results As the number of plants increases, the average utility that is obtained by MaxSum and MaxSumNash improves significantly. The average utility that is obtained by BackTracking improves slightly when the number of plants increases, the average utility of NashPraxis does not change, and the average utility of Nash Hill Climbing yields a slightly lower utility as the number of the plants
190
Chapter 7
increases (see figure 7.1). The reason for this behavior might be that when the number of plants increases there are more possibilities of satisfying the overall constraints and thus MaxSum and MaxSumNash yield a much higher average utility. On the other hand, a larger number of plants makes the optimization problem more difficult, which explains the suboptimal behavior of BackTracking, Nash Hill Climbing and NashPraxis. The average utility obtained by all the methods increases as the number of products per plants increases. When the number of products per plant increases, each plant has much more flexibility with respect to the given constraints. So, each plant can use the permission to emit pollutants more completely and with greater benefit for itself. When the number of pollutants increases, the average utility obtained by all the methods decreases. This is because when the number of pollutants increases, the number of constraints increases, and therefore all the methods yield a worse average utilities. 7.3.2.5 Discussion As a result of our experiments we reached the following conclusions concerning the methods described in the previous subsections: NashPraxis Advantages: 1. It is relatively fast. 2. It does not use the linearity of the utility functions and the constraints and therefore can be used if these are not linear. Limitations: 1. It yields low values of the Nash product and the sum of the utilities. 2. It can be used only for small numbers of plants, products, and pollutants, since it does not converge for larger numbers. Nash Hill Climbing Advantages: 1. It does not use the linearity of the utility functions and the constraints and therefore can be used if these are not linear. 2. It yields relatively high values of the Nash Product. 3. It yields low standard deviation of the plants’ utilities (i.e., it is fair).
Negotiations about How to Reduce Pollution
191
Limitations: 1. It is very slow. 2. It does not necessarily find the optimal values of the sum of the plant’s utilities or the Nash product. MaxSum Advantages: 1. It is very fast. 2. It yields the optimal sum of the plants’ utilities. 3. It can be used for a large number of plants, products, and pollutants. 4. It is polynomial. Limitations: 1. It takes into consideration the linearity of the utility functions and the constraints. 2. The value of the Nash product of the allocations obtained by this method is usually equal to zero. 3. It yields a high standard deviation of the plants’ utilities (i.e., it is unfair). MaxSumNash Advantages: 1. It is very fast. 2. It reaches the optimal values of the sum of the plants’ utilities and the Nash product. 3. It can be used for large numbers of plants, products, and pollutants. 4. It is polynomial. 5. It yields a very low standard deviation of the plants’ utilities (i.e., it is fair). Limitations: 1. It uses the linearity of the utility functions and the constraints. 2. It requires side payments.
192
Chapter 7
BackTracking Advantages: 1. It is relatively fast. 2. It provides high values of the Nash product and the sum of the utilities. 3. It yields a low standard deviation of the plants’ utilities (i.e., it is fair). 4. It can be used for a large number of plants, products, and pollutants. 5. It is polynomial. 6. It can be used in the case of a sequential response to an offer and thus it does not require the agents’ designers to decide on a policy for choosing an offer before the negotiation. 7. It can also be used in the simultaneous response protocol, as a welfare criterion. Limitations: 1. It uses the linearity of the utility functions and of the constraints. (This limitation can be abolished by using a nonlinear optimization technique instead of Simplex at each step, but this will lead to a worse performance and the results that will be obtained by the method will be less reliable.) 2. It does not necessarily find the optimal values of the sum of the plant’s utilities or the Nash product. We suggest using the MaxSumNash method for computing the welfare criteria in cases of simultaneous response to an offer if side payments are permitted. If side payments are not permitted then the plants have to agree prior to the negotiation to use either BackTracking or MaxSum as the social-welfare criterion. In the case of sequential response the BackTracking method should be used. 7.4
Incomplete Information
In the previous sections, we assumed that the plants have complete information about each other. In particular, they know each others’ utility functions (profit of producing goods). In real-world situations such assumptions are usually not valid. One possibility is to use a revelation mechanism as in the data allocation case (chapter 3.4) where each plant announces its utility and pollution functions. This is problematic since revealing such information usually conflicts with the plants’
Negotiations about How to Reduce Pollution
193
interests. In addition, it is hard to verify the truthfulness of the announcements of the agents in the pollution allocation problem. It is not similar to the data allocation problem, where most of the information on the usages is stored by two agents and thus lies are revealed. In the pollution allocation problem, the information of each plant is kept secret. Another possibility that we checked experimentally is that the plants declare their preferences concerning each of the pollutants. The problem is that there are dependencies between the pollutants, since manufacturing a unit of a specific product is usually bound to the emission of several pollutants. The approach presented in section 4.4 —maintaining a final set of agents’ types, in which each agent has some probabilistic beliefs about the types of the other agents and about the other agents’ beliefs about themselves and about other agents—could be considered for the pollution allocation problem. However, this approach is also problematic in our case because there are many agents and the negotiation has many interconnected issues. Another approach is to use a bidding mechanism. In chapter 9 (section 9.1.2) we will describe its usage for a version of the data allocation problem in which each server is concerned only about the data stored locally. Such a mechanism is not applicable in the pollution allocation problem since there are several units of each pollutant and many buyers and sellers for each type of pollutant. The second price sealed bid that is discussed in section 9.1.2 is appropriate for one seller of one unit for each good and thus is not applicable to the pollution allocation problem. Double auctions could be used, but because of the interdependencies between the pollutants, market mechanisms seem more appropriate. We investigated three market mechanisms for allocating pollution emissions in the case of incomplete information. The first attempts to find the competitive equilibrium of the market. See, e.g., (Varian 1992) for a detailed description. That is, it searches for a vector ρ of length k of prices—one for each pollutant, and for a pollution allocation, (b¯ 1 , . . . , b¯ n ), where b¯ i is a vector of the amounts of pollutants pli would be allowed to emit, that satisfies the following constraints: For each plant pli , b¯ i = (bi1 , . . . , bik ) is feasible and maximizes pli ’s utility. ¯bi is feasible if, given the prices ρ, pli ’s cost does not exceed the value of its c c , . . . , bik ). That is: conflict allocation, i.e., the value of (bi1 •
′ i (b¯ i ) where b¯ i = argmaxbi ′ ∈IR k Umax k j=1
ρ j bi j ≤
k j=1
ρ j bicj
(7.2) (7.3)
194
Chapter 7
The total amount of pollution will not exceed the special circumstances constraints. That is:
•
∀pol j ∈ POLLUT,
n
bi j ≤ b j
(7.4)
i=1
In the selected allocation, each agent will be allowed to emit bi j of pol j . Note that it is assumed that each plant behaves competitively—that is, it takes prices as given independently of its actions. Competition is also assumed to be perfect; for example, the price of each pollutant is the same for all the plants and all the transactions. However, even under the above assumptions, a competitive equilibrium does not always exist in our domain. It is known that such an equilibrium exists when the aggregate demand function is continuous and satisfies the Walras Law (Varian 1992). Leon Walras (1954) proposed a way to compute this equilibrium through a price-adjustment process he called tatonnement. A sufficient condition for convergence of the tatonnement is gross substitutability: if the price of one of the goods rises, then the demand for the other goods does not decrease. In our environment this condition does not hold. Moreover, the aggregate demand function of the pollution domain is not continuous because of the dependencies between the pollutants with respect to the production of items. Thus in our environment competitive equilibrium does not always exist, and if it exists, the tatonnement mechanism will not always converge. Also, the WALRAS mechanism (Wellman 1993), which is based on the tatonnement mechanism and was developed for resource allocation, does not always converge in our domain. The noncontinuity of the demand function can be easily demonstrated using the following example (as will be demonstrated below). EXAMPLE 24 Suppose plant pli manufactures only one product. It emits 3 units of the pollutant pol j when it manufactures one unit of its product, i.e., em i j (1) = 3. Its utility from manufacturing one unit of its product is $12. It is easy to see that when the price is less or equal to $4, pli would be willing to “buy,” i.e., to be allocated, as much units of the pollutants as possible. However, when the price is higher than $4 its demand drops to zero.
We refer to the tatonnement-based mechanism as the competitive equilibrium market mechanism (CEM). Since competitive equilibrium does not always exist, and even if it exists, the tatonnement-based mechanism does not always
Negotiations about How to Reduce Pollution
195
converge, and even if it converges, the demand is not always equal to zero (as will be demonstrated in example 25), we also examine two greedy market mechanisms. In the tatonnement-based mechanism, the pollutants are allocated only after the process is terminated, whereas in the greedy mechanisms, the pollutants are redistributed in each cycle of the mechanism. In one of the greedy market mechanisms a monetary transaction is performed after each cycle, which we refer to as market-clearing with intermediate transactions (MCIT). In the second greedy mechanism, two pollutants are exchanged after each cycle. We refer to this mechanism as market-clearing intermediate exchange (MCIE). Non-tatonnement mechanisms were also studied by economists Negishi (1962). As in our MCIE mechanism, intermediate exchanges are performed, but the problem of the choice of the specific exchange mechanism is left open. The economists require that an intermediate exchange will occur if, and only if, at least one individual gains by the exchange and no individual loses. This requirement holds in our case even though usually a plant does not obtain all its demands in the exchange step. This is because only one pollutant is exchanged in each cycle, and that the pollution functions are continuous, linear, and homogeneous. At the beginning of the process, the plants are allowed to emit pollutants according to the conflict allocation in all of the mechanisms. The purpose of the processes is to redistribute these allocations. As in the tatonnement mechanism, there is an auctioneer that collects the plants’ demands and determines the appropriate prices. In each cycle of the three mechanisms the auctioneer chooses one of the pollutants randomly and tries to determine its clearing price while keeping the prices of the other pollutants fixed. It uses a binary search to find the clearing price.3 The overall process terminates when the prices do not change for a predefined number of iterations, or when it reaches the predefined maximal number of iterations. As mentioned above, ρ denotes the vector of length k of prices—one for each pollutant. We use the notation μρ¯ j to denote the vector that is exactly like ρ except that ρ j is replaced by μ. 7.4.1
The Competitive Equilibrium Market Mechanism
In this section we present algorithms for finding the market clearing prices, for allocating the pollutants according to the demands of the agents giving these prices, and for determining how a plant should compute its demands given a set of prices. Note that a plant’s demand may be negative, indicating that it would like to “sell” some of its pollutants. The performance of the first two algorithms, that is, the algorithms for finding the market clearing prices and for allocating the pollutants according to the demands of the agents giving these
196
Chapter 7
prices, can be distributed, and the third one, that is, the algorithm for specifying how a plant should compute its demands given a set of prices, is done by each plant; for clarification purposes we present them with no distribution. The following is the procedure used to find the clearing prices. Algorithm 7.4.1 1. Initialization of ρ: initialize the vector ρ with random nonnegative numbers.4 2. Find the global clearing prices: Until Max Num steps were performed or ρ has not been changed, for Max Change steps: (a) Choose randomly pol j from POLLUT. (b) Initialization: Low = 0, High = MaxPrice. (c) Search for the clearing price of polj , given ρ: While High-Low > do: i. μ =
High−Low 2
ii. Ask the plants for their demand for pol j , when the prices are μρ¯ j . Denote by di j the demand of agent i. n iii. Sum up the demands, i.e., dem j = i=1 di j . If dem j ≥ 0 then Low = μ. Otherwise, High = μ.
(d) Replace ρ j with Low in ρ.
3. Verification: Verify for all pli that its vecror of demands (di1 , . . . , dik ) is feasible, i.e., satisfies 7.3. That is, given ρ, pli ’s cost does not exceed the value of its conflict allocation, i.e., kj=1 di j ρ j ≤ kj=1 bicj ρ j .
The binary search is performed in the While loop of step 2c above. In each iteration of the loop, price μ is considered. If for this price the demand is higher than the supply, that is, the sum of di j is positive, then price μ is too low and it is assigned to LOW. If the supply is higher than the demand, then the price is too high and μ is assigned to HIGH. A cycle terminates when the prices LOW and HIGH are very close. We do not aim at zero demand because the functions are not continuous and an allocation with zero demand may not exist. Thus, when the process terminates, the demand may be quite high. When the demand is not zero, the demand of all the plants will not be satisfied, and some allocation procedure is needed. The available pollution permits are divided proportionally according to the demands of the plants.
Negotiations about How to Reduce Pollution
197
In particular, for each pol j ∈ POLLUT, the agents with negative demands— the ones that would like to emit less than their conflict allocation amount, that is, di j < 0—will be allocated bicj + di j . Thus di j 0}. The iterative algorithm used to distribute the supply of pol j to the agents with positive demands is as follows: 1. Initialization: tmp = di j 0 di j ρ j ≤ bˆ il and if di j < 0 then |di j | ≤ bˆ i j . After ρ j is determined in step 2d of the algorithm, a step of exchanges of pol j with poll is performed and bˆ i j and bˆ il are updated accordingly. The decision of how much a plant can buy in each cycle is made in the same manner as in the MCIT’s case. That is, only some percentage of the amount that can be exchanged is actually exchanged. The maximization problem that a plant would need to solve given that pol j should be exchanged with poll consists of the following parts. It will need to choose the amounts that it will produce of each product, that is, x¯ i = (xi1 , . . . , xim i ), that will maximize its utility using function Ubi (see 7.10 below). There are two constraints on the choice of x¯ i . First, for any pollutant polh not equal to pol j or poll , the amounts that will be emitted during the production of x¯ i , that is, em i h (x¯ i ) should not exceed the current allocation of pli of this pollutant, that is, bˆ i h (see 7.11). Second, pli can exchange pol j and poll according to the price ρ j so the amounts that will be emitted during the production of x¯ i will not exceed the new allocation after the exchange. In particular, given the current allocation of bˆ i j and bˆ il , pli will need to exchange (em i j (x¯ i ) − bˆ i j ) of pol j with (bˆ il − em il (x¯ i )) of poll , where one unit of pol j will be exchanged for ρ j units of poll (see 7.12). Formally, the maximization problem is as follows: Ubi (xi1 , . . . , xim i ) where
(7.10)
∀h, h = j, h =
l, em i h (x¯ i ) ≤ bˆ i h and
(7.11)
ρ j (em i j (x¯ i ) − bˆ i j ) ≤ (bˆ il − em il (x¯ i ))
(7.12)
max
xi j ∈IR,...,xim i ∈IR
Then it needs to return em i j (x¯ i ) − bˆ i j as its demand. That is, it varies the amounts of the products that it will produce, pol j and poll , looking for the maximal utility, given the current amounts of the other pollutants that it has, while satisfying the constraint that it will be able to exchange the pollutants according to the exchange price. 7.4.4
Simulation Evaluation
The goal of the experiments was to compare the results obtained by the different market mechanisms and to examine the influence of the number of the plants, the number of the pollutants and the number of the products on the results. We also compared the market mechanisms with the strategic-negotiation model.
202
Chapter 7
In particular, we compared the market mechanisms with MaxSumNash, which is applicable when there is a monetary system and complete information and the protocol of simultaneous response is used, and with BackTracking, which can be applied to the sequential response protocol with complete information (section 7.3.2). 7.4.4.1 Complexity of the Market Mechanisms All the market mechanisms run until convergence or until Max Num number of steps are performed (in our simulations, Max Num was 10,000). We will now discuss the complexity of each cycle of the mechanisms. Each cycle includes a binary search in a set of 10,000 possible prices, and thus there are approximately 17 iterations in the search. At each of these 17 iterations each plant has to solve its optimization problem to determine its demand. In all the mechanisms Simplex is used for solving these optimization problems. As mentioned above, the l) )) average complexity of the Simplex method is O(h(l + h) · min( h2 , 2l , (h + 8 (Schrijver 1986), where h is the number of constraints and l is the number of variables. We will discuss how many variables and how many constraints are needed for the optimization problem of each mechanism. In the simulations we used more variables than those used above in the maximization problems, in order explicitly to state the relations between the production and the pollution emission. CEM: In each iteration of each cycle, each plant applies Simplex with l = k+m variables, one for each pollutant, and one for each product, and h = k + 1 inequalities, one for each pollutant, indicating that the emission couldn’t be more than the amounts that are available for the agent at the current prices, and one for budget constraint (inequality 7.6). So, the average complexity of the m }). CEM is O(17n(2k 2 + mk) · min{ k2 , 2k + 8 MCIT: In each iteration of each cycle, Simplex is applied with l = m + 1 variables, one for each product, and one for the demand of the pollutant that is the subject of the current cycle, and h = k inequalities, one for each pollutant, indicating that the emission couldn’t be more than the amounts that will be available to the agent after the current transaction. Thus the complexity of MCIT is O(n(k 2 + mk) · min{ k2 , m+1 , k+m+1 }). 2 8 MCIE: In each iteration of each cycle, there is one application of Simplex for each plant with l = m + 2 variables, one for each product and two for the pollutants that may be exchanged. The number of inequalities in each application of Simplex is h = k + 1, one for each pollutant, indicating that the
Negotiations about How to Reduce Pollution
203
emission couldn’t be more than the amounts that will be available to the agent after the exchange, and one equation, indicating that the amount of the first exchanged pollutant is equal to its price multiplied by the amount of the opposite of the second exchanged pollutant. Thus the average complexity of MCIE is O(n(k 2 + mk) · min{ k +2 1 , m 2+ 2 , k + m8 + 3 }). The average complexity of this mechanism is similar to the average complexity of the MCIT. For the complexity of MaxSumNash and BackTracking methods, see section 7.3.2.2. The complexity of the cycles of all the mechanisms is the same. The differences in the time performance stem from the convergence rate. Usually MCIE and CEM converge more slowly than MCIT, and therefore have worse time performance. 7.4.4.2 Performance of the Methods The results presented in the figures are taken from 150 runs. As could be expected the highest average utility of the plants and the highest Nash Product were obtained by MaxSumNash (see figures 7.3 and 7.4). This is because MaxSumNash finds the optimal sum of the utilities and the optimal Nash Product, as was explained in the complete information case (section 7.3.2). However, it requires complete information and a monetary system. Between the market mechanisms CEM yields the worst results. The reason for this is that in our environment there are strong interdependencies between the commodities (pollutants), and the performance of the CEM mechanism in such conditions is poor. The MCIT mechanism usually yields better results than the MCIE mechanism because MCIT allows money exchange. It is surprising that the MCIT mechanism’s results are usually very close to the MaxSumNash. The MaxSumNash finds the optimal solution when it has complete information, whereas the MCIT mechanism yields results very near to the optimal with incomplete information. The highest Nash Product is obtained by MaxSumNash, and CEM yields the worst results. The results of the other methods depend on the parameters. MaxSumNash yields the highest Nash Product because it searches for optimal Nash Product. CEM yields the worst results because the aggregate demand function is not continuous and there is no gross substitutability—two conditions that are needed for CEM to converge to a solution with zero demand. In particular, a plant may be allowed to emit much less than it would like, and thus its utility may be lower than its conflict allocation utility. In this case, the conflict allocation is implemented and thus the Nash Product is equal to zero.
1200
Utility per Plant
1000 BackTracking
800
MCIE MaxSumNash
600
CEM
400
MCIT
200 0 5
10
15
20
Number of Plants 1200
Utility per Plant
1000 BackTracking
800
MCIE
600
MaxSumNash CEM
400
MCIT
200 0 5
10
15
20
Utility per Plant
Number of Products
500 450 400 350 300 250 200 150 100 50 0
BackTracking MCIE MaxSumNash CEM MCIT
5
10
15
20
Number of Pollutants Figure 7.3 The change in the average utility of the plants as a function of the number of plants, the number of products and the number of pollutants when there is incomplete information.
1.00E+58
Nash Product
1.00E+49 BackTracking MCIE MaxSumNash CEM MCIT
1.00E+40 1.00E+31 1.00E+22 1.00E+13 1.00E+04 5
10
15
20
Number of Plants
1.00E+16
Nash Product
1.00E+12
BackTracking MCIE MaxSumNash
1.00E+08
CEM MCIT
1.00E+04
1.00E+00 5
10
15
20
Number of Products
1.00E+13
Nash Product
1.00E+10
BackTracking MCIE MaxSumNash CEM MCIT
1.00E+07
1.00E+04
1.00E+01 5
10
15
20
Number of Pollutants
Figure 7.4 The change in the Nash product of the plants as a function of the number of plants, the number of products, and the number of pollutants when there is incomplete information.
206
Chapter 7
The method with the highest standard deviation of the utilities is BackTracking, MCIE mechanism, MCIT, and MaxSumNash, in decreasing order. MaxSumNash has the lowest standard deviation because it finds the optimal Nash Product and thus the standard deviation of the utilities is equal to the standard deviation of the conflict allocation. We also considered the average time performance of the methods. On average, the MCIT is the fastest mechanism. The time required for MCIE and CEM is quite similar; CEM is slightly better when the number of products is high, and MCIE is better when the number of pollutants is high. 7.4.4.3 Influence of the Parameters of the Environment on the Results The influence of the parameters on the market mechanisms is similar to their influence on the methods considered in the complete information case. As in the complete information case, when the number of the plants increases, all the results of all the methods improve (see figure 7.3). The increase in the results of MaxSumNash and that of the MCIT mechanism is much larger than that of BackTracking, the MCIE mechanism, and the CEM mechanism. This is because when the number of plants increases there are more opportunities to reallocate the pollutants while satisfying the constraints, and therefore all the results improve. On the other hand, a larger number of plants makes the optimization problem more difficult, which explains the behavior of BackTracking, MCIE, and CEM. These do not use a monetary system and thus finding a near optimal solution is more difficult for them. When the number of products per plant increases all the methods yield a higher average utility, as in the complete information case. This is because when the number of products per plant increases each plant gains much more flexibility with respect to the given constraints. So each plant can use the permissions to emit pollutants more completely and with larger benefits for itself. As the number of pollutants increases, all the methods yield a worse average utility. This is because in such cases the number of constraints increases. 7.4.4.4 Discussion As a result of our experiments we reached the following conclusions concerning the methods, described in the previous subsections: CEM Advantages: 1. It reaches the competitive equilibrium if there is gross substitutability and the demand functions are continuous. 2. It does not require side payments.
Negotiations about How to Reduce Pollution
207
Limitations: 1. It obtains poor results if the above conditions are not satisfied. We do not recommend this method for the pollution allocation problem. MCIT Advantages: 1. It is fast. 2. If the number of the pollutants is relatively small, the average utility that it obtains is close to the optimal solution and its Nash Product is also close to optimal. 3. It can be used for large numbers of plants and products. Limitations: 1. It requires a monetary system for side payments. 2. It yields much worse results when the number of pollutants is large. MCIE Advantages: 1. It does not require a monetary system for side payments. 2. It yields a relatively high average utility and high values of the Nash product. 3. It can be used for large numbers of plants, products, and pollutants. Limitations: 1. It is the slowest of the three market mechanisms presented. In summary, if side payments are permitted and the number of pollutants is small (less or equal to 7), the MCIT mechanism should be used. Otherwise, the MCIE mechanism should be used. We do not recommend the CEM mechanism for the pollution allocation problem, but if side payments are not permitted and the number of pollutants is very small (less than 5) this mechanism may be considered. As mentioned above, all these results are valid when the plants act competitively: each plant takes the prices as given, neglecting any impact of its own behavior on the prices. This assumption is reasonable when there are many agents in the environment. Ygge (1998) presents a method for investigating the
208
Chapter 7
maximum possible gain of speculation in markets. He also presents a strategy for a speculative agent, which drives the market to an equilibrium where the agent’s maximal advantage from speculation materializes. This strategy can be used by the agents in our domain. However, to implement this strategy an agent must have some information about other agents’ demand functions (perfect information, biased beliefs or probability distributions on the other agents’ supply/demand functions), which is usually not available to the plants in our environment. 7.4.4.5 Comparison of Market Mechanisms with the Strategic-Negotiation Model The market mechanisms require many more interactions between the agents than the strategic negotiations. In all the cases of the strategic model we consider, an agreement is reached with no delay. However, in the market mechanisms, the agents should interact until the market converges, or until the maximal number of cycles are performed. Each of these cycles also requires several interactions. This may be very time-consuming. In addition, the market mechanisms need some centralized auctioneer. The auctioneer’s tasks can be distributed, but it will require a lot of broadcasting. On the other hand, the market mechanisms are applicable in incomplete information cases, whereas the application of the strategic-negotiation model in such cases is limited. 7.5
Market Mechanisms and AI Methods for Pollution Control
The problem of air pollution has become urgent in the second half of the twentieth century and the early twenty-first century. Production involving a large variety of chemical elements and technological processes has resulted in increased amounts of pollution emitted to the atmosphere as well as a qualitative change in the emission composition. New pollutants that do not exist in nature have appeared. Air pollution has a harmful influence on human health. All these have demonstrated a necessity for pollution control. To guarantee the observance of norms of maximal permissible concentrations of pollutants it is necessary to calculate the maximal quantity of each pollutant that each plant can emit in normal and in unfavorable meteorological conditions. The main sources of air pollution are electric utilities, industry, and transport. In the last few years new methods of aggregating pollution emission have become popular. These methods are referred to as emissions trading programs. Under such programs an “emissions budget” is developed for all the sources in a defined area. The area could be a state, a group of states, an entire country,
Negotiations about How to Reduce Pollution
209
or the entire world. The emissions budget is then divided among emitting companies. Depending on the program, each company’s share of the budget can be based on historical emissions, capacity, or units produced. Political and legislative deliberations also influence the distribution of the allocations. Over time, the overall budget and each company’s allocation declines. To operate within the declining budget, a firm can undercontrol and buy credits, emit less than permitted or overcontrol and sell credits, use a mixture of controls and credit acquisitions, or curtail operations on a schedule that is parallel to the declining budget. Emissions trading allows firms the flexibility to select costeffective solutions to achieve established environmental goals and encourages them to pursue cost-effective emission reduction strategies and to develop the means by which emissions can be reduced inexpensively. The emission trading implemented uses allowance auctions where permits to emit some pollutant are sold. Below we review examples of emission trading programs directed at reducing air pollution. An interesting program is the Acid Rain Program of the United States Environmental Protection Agency (EPA) (Agency 2000; Cramton 2000). The overall goal of the Acid Rain Program is to achieve significant environmental and public health benefits through reductions in emissions of sulfur dioxide (SO2 ) and nitrous oxides (NOx), the primary causes of acid rain. To achieve this goal at the lowest cost to society, the program employs both traditional and innovative market-based approaches for controlling air pollution. In addition, the program encourages energy efficiency and pollution prevention. The allowance allocation proceeds in the following way. The initial allocation of the allowance is based on past history. Allowances may be bought, sold, or banked. Any person may acquire allowances and participate in the trading system. The EPA also holds yearly auctions of allowances. To supply the auctions with allowances, the EPA sets aside a Special Allowance Reserve of approximately 2.8 percent of the total annual allowances allocated to all the units. Private allowance holders may also offer their allowances for sale at the EPA auctions. The General Accounting Office recently reported that the allowance trading system could save as much as $3 billion per year—over 50 percent—compared with a command and control approach typical of previous environmental protection programs. Greenhouse gas air credit trading is another emission trading program (Fitzgerald 2000) that has emerged as a mechanism to reduce greenhouse gas (GHG) emissions such as carbon dioxide and methane. The emission trading markets are for long-term periods, the negotiation time is large, and the trade is done by people. In this chapter we consider the reduction
210
Chapter 7
of emission for a short term, the negotiation period is very short, and thus, we propose that it could be done automatically by agents. The following papers discuss emissions trading methods for long-term periods. Ledyard and Szakaly-Moore (1993) focus on choosing a permit trading mechanism that is both economically efficient and politically viable. They examine static allocation of permits, auctions without initial allocation of permits, and different types of auctions that involve an initial allocation of permits based on past history. They claim that the only mechanism that is both economically efficient and politically viable is the auction with an initial allocation of permits. Cramton and Kerr (1998) argue that an auction is preferred to grandfathering (giving companies permits based on historical output or emissions), because it allows reduced tax distortions, provides more flexibility in distribution of costs, provides greater incentives for innovation, and reduces the need for politically contentious arguments over the allocation of rents. A review of works on game theory and artificial intelligence dealing with pollution control is given below. Mayoh (1996) uses genetic algorithms to build a model of the enforcement dilemma problem: how much effort should the government put into enforcing the environmental regulation? If the government agencies put forth too little effort, firms will be tempted to skimp, pollute, and risk the occasional fine. If government agencies put forth too much effort, firms will avoid polluting, but the government will waste money. Kwerel (1977) examines the incentives of firms to deceive the regulatory authority when confronted with two standard pollution control policies: a fixed number of transferable licenses for pollution and effluent charge per unit of pollution. He proposes a new scheme that aims to induce cost-minimizing firms to reveal the true costs of cleaning up pollution. Dasgupta, Hammond, and Maskin (1980) develop a model of pollution control, providing classification in terms of the number of rounds of communication between the regulator and the firms. Kryazhimskii et al. (1998) consider a noncooperative multiplayer game in which the governments of neighboring countries trade emission reductions. They prove the existence of market equilibrium and study algorithms for searching for a market equilibrium. The algorithms are interpreted as repeated auctions in which the auctioneer has no information on the countries’ costs and benefits and no government has information on the costs and benefits of other countries. In each round of the auction, the auctioneer offers individual prices for
Negotiations about How to Reduce Pollution
211
emission reductions and observes countries’ best replies. While we consider a model of several pollutants in which the utility of a plant depends on its permits to emit all the pollutants and there are interdependencies between the pollutants, Kryazhimskii et al. consider a problem where the governments consider the reduction in emission of only one pollutant, but allow different exchange rates of the same pollutant in the market. In conclusion, this chapter discusses the pollution allocation problem. The problem is how to reduce pollution for a short time period. For situations of complete information we propose using the strategic-negotiation model in which the negotiations end with no delay. For the incomplete information case, we proposed market mechanisms. Reaching a solution using the market mechanisms is delayed, but solutions very close to optimal can be obtained even when the plant’s have no information on the others’ utility functions.
8
Negotiation during a Hostage Crisis
In the previous sections we presented the application of the strategic negotiation model to problems solvable by automated agents. In this chapter, we demonstrate its usage for human crisis negotiations. We will analyze the hostage crisis (HC) scenario that was briefly described in section 1.5. Recall that the hostage crisis scenario is based on the hypothetical hijacking of an Indian airliner by Sikh terrorists and its forced landing in Pakistan. The three parties (India, Pakistan, Sikhs) consider several possible outcomes: India or Pakistan launch military operations to free the hostages; the hijackers blow up the plane with themselves aboard; India and the Sikhs negotiate a deal involving the release of a number of security prisoners in Indian jails in exchange for the hostages; Pakistan and the Sikhs negotiate a safe passage agreement; or the hijackers give up. Each party to the negotiation has a set of objectives, and a certain number of utility points is associated with each (see Kraus et al. 1992). There are several possible uses for the application of the strategic-negotiation model to a hostage crisis. The hostage crisis results presented in this chapter have been used for the development of a negotiation simulation environment (Wilkenfeld et al. 1995) This simulation environment has been used for training people involved in negotiation. In particular, students, the German police, and the U.S. foreign office personnel have used it successfully. In addition, the simulation environment has been used to study the impact of cognitive complexity of decision makers on their behavior in crisis negotiation situations (Santmire et al. 1998). It is interesting to compare results of an actual hostage crisis, where actual people negotiate with each other, with the theoretical results presented in this chapter. However, predicting the human behavior is not the goal of the research. The theoretical results can also be used for the development of an automated agent that will negotiate with human players in a hostage crisis simulation environment. This agent could be used to train people in handling negotiations. 8.1
The Simulation Scenario
The following is a brief description of the hostage crisis scenario. As mentioned above, the scenario is based on the hypothetical hijacking of a commercial airliner en route from Europe to India and its forced landing at Karachi International Airport. The passengers are predominantly Indians, but there are a number of other nationals aboard. The hijackers are known to be Sikh. The
214
Chapter 8
hijackers demand the release of an undetermined number of Sikh prisoners from Indian security prisons and safe passage for the hijackers to an as yet undisclosed destination (for additional details see Kraus and Wilkenfeld 1990). The hostage crisis as specified was selected as a typical case of multiparty negotiation. Although this hypothetical case is quite specific in details, the formal model is general. The choice of a real historical case would have increased the complexity of the model while at the same time reducing its potential generalizability. Once the case was chosen, it was reduced to its essential characteristics. For example, this model consists of only three players: the terrorists (the Sikhs), India, and Pakistan (the latter plays the role of third party or mediator). We could have added additional players such as the United States or China, but we feel that these three adequately represent the most important types of players and their interests in such a negotiation. Similarly, we could have increased the number of options available to each player—India could have had the option of kidnapping a prominent Sikh leader, in addition to its two options of reaching an agreement with the Sikhs or launching a military operation. Here again, we assume that the added complexity that additional options would entail would not add appreciably to the reliability or generalizability of the model. India, the Sikhs (hijackers), and Pakistan must consider six possible outcomes: 1. India launches a military operation to free the hostages. 2. Pakistan launches a military operation to free the hostages. 3. The Sikhs blow up the plane with all on aboard. 4. India and the Sikhs negotiate a deal involving the release of prisoners in Indian jails, release of hostages, and safe passage for the Sikhs. 5. Pakistan and the Sikhs negotiate a deal involving release of the hostages and safe passage for the Sikhs. 6. The Sikhs give up. Each party to the negotiation has a set of objectives and a certain number of utility points associated with each (see Kraus and Wilkenfeld 1990). Shortterm objectives pertain to the resolution or management of the immediate crisis, while long-term objectives have to do with the consequences of the policy of that actor once the immediate situation has been resolved.
Negotiation during a Hostage Crisis
215
For India, short-term objectives involve the safe return of the passengers and an acceptable level of casualties among Indian military personnel in the event of military action. For the Sikhs, short-term objectives include the release of prisoners held in Indian jails, the release of the hostages, and safe passage for the Sikhs. Pakistan is cast in the role of mediator or facilitator, and has no exclusively short-term goals. Among India’s major long-term goals is a cluster of factors relating to the credibility of its deterrence of terrorism, its overall strategic interests, and its experience in counterterrorism. For the Sikhs, long-term objectives include damage to India’s internal and external image, damage to India’s deterrence of terrorism, and damage to India’s relations with the United States and Pakistan. For both India and the Sikhs, the long-term consequences are considerably more important than the resolution of the immediate situation. As we have indicated, all of Pakistan’s objectives are long-term in nature. By far the most important Pakistani objective is to demonstrate its control of the situation and to maintain its internal image. Also of critical importance is Pakistan’s ability to emerge from the crisis with its relations with other countries intact. Combining the range of utility points associated with each objective with the six possible outcomes listed above generates a matrix that yields a point output total for the various outcomes. In the case of three of these outcomes—an Indian or Pakistani military operation, and a terrorist decision to blow up the plane—probabilities are attached to the success or failure of such actions. The concept of the passage of time is incorporated into the model in two ways. First, it provides a reference point for the calculation of utilities and probabilities. Second, time is a factor for the three parties, since the passage of time affects each of them differentially. In general, time works in favor of the Sikhs and against India and Pakistan. This latter aspect of time sets up a complex negotiation dynamic for the crisis. In general, time affects the following aspects of the model: (1) the probability of success of an Indian or Pakistani military operation (having to do with whether the operation is launched in daylight or at night, time available for preparation of troops, deteriorating weather conditions, and the condition of the Sikhs and the hostages); (2) the extent of publicity for the Sikhs’ message; and (3) India and Pakistan’s internal and external images. This scenario is analyzed in the subsequent sections of this chapter. First, we consider a simplified version where Pakistan does not play a role in the negotiations and then we discuss how the results change when Pakistan participates in the negotiations.
216
8.2
Chapter 8
Negotiations between Only the Sikhs and India
We assume that there are two players: the “initiator” of the crisis—the Sikh hijackers (sik) and the “participant” (against its will) in the crisis—India (ind). Thus the set of agents is defined as: Agents = {sik, ind}. We assume that India holds M Sikh security prisoners, and an agreement between the hijackers and India is a pair (ssik , sind ) where ssik , sind ∈ IN , ssik ≥ 1 and ssik + sind = M. As in the previous chapters, the set of agreements is called S. That is, an agreement between the hijackers and India is the division of the M prisoners between them. In the hostage crisis situation of (Wilkenfeld et al. 1995): M = 800 (Sikh prisoners in Indian jails). We assume that the Sikhs opt out by blowing up the plane and India opts out by launching a military operation. These actions may succeed or fail yielding different points to the participants. We assume that the players are risk-neutral and our discussion take into consideration only the expected utility for the players from a given outcome (Kraus and Wilkenfeld 1993). As discussed in section 2.3 and defined in previous chapters, each player i = sik, ind has a utility function U i : {S ∪ {Optind , Optsik } × T } ∪ {Disagreement} → IR. Note that U i (Opt j , t) specifies the expected utility for i resulting from the opting out of j. We have identified several conditions that the utility function of the players in the hostage crisis satisfy. We assume that these conditions are known to all players. That is, we have developed a model of complete information. 8.2.1
The Sikhs Gain Over Time and India Loses Over Time
There are several similarities between the hostage crisis situation and the case of the resource allocation problem presented in chapter 4, where one agent uses the resource during the negotiations while the other waits for access. In both cases there are two parties that negotiate over the division of M units. In the hostage crisis the players negotiate the division of the M prisoners, and in the resource allocation case they negotiate the division of M time units of resource usage. In both cases, one of the agents gains over time, agent A in the resource allocation problem and the Sikhs in the hostage crisis, and one of the agents loses over time, agent W in the resource allocation problem and India in the hostage crisis. We restate the assumptions of chapter 4 below, informally, when possible. A0h Disagreement is the worst outcome.
Negotiation during a Hostage Crisis
217
A1h The prisoners are desirable: For agreements that are reached within the same time period, each agent prefers to obtain a larger number of prisoners. A2h Agreement’s cost over time: India prefers reaching a given agreement sooner rather than later, while the Sikhs prefer to reach a given agreement later rather than sooner. A3h Agreement’s cost over time: We consider the model of fixed loses/gains per time unit. Thus India has a number cind < 0 and the Sikhs have a number csik > 0 such that: For any s, r ∈ S and t1 , t2 ∈ T , U i (s, t1 ) ≥ U i (s, t2 ) iff (si + ci ∗ t1 ) ≥ (ri + ci ∗ t2 ). A4h Opting out over time: India prefers to opt out sooner rather than later, and the Sikhs prefer to opt out later rather than sooner. Note that assumption (A2h ) does not hold for Opt, and the preferences of the players for opting out in different periods of time do not change in a stationary way. Furthermore, the preferences of a player for opting out versus an agreement fluctuate across periods of time in a nonstationary fashion. In the case of the hostage crisis this is because of different rates of change over time in the probabilities associated with success or failure of the actions taken when opting out. A5h Range for agreement: For every t ∈ T , •
The property of the nonemptiness of Possiblet is monotonic.
If there is still a possibility to reach an agreement in the next time period (i.e., Possiblet+1 is not empty), then India prefers opting out at time period t or agreeing to the worst agreement for India that is still better for India than opting out at time t (i.e., sˆ ind,t ) to waiting until the next time period (t + 1) and reaching the worst agreement for India that is still better for India than opting out at period t + 1 (i.e., sˆ ind,t+1 ). However, the Sikhs’ preferences are opposite. The Sikhs prefer reaching the agreement sˆ ind,t+1 at time period t + 1 to reaching the agreement sˆ ind,t at period t. •
If it is still possible to reach an agreement, then the Sikhs prefer sˆ ind,t at time period t to opting out in the next time period. •
A6h Possible agreement: In the first two time periods, there is an agreement that is preferable to both the Sikhs and India than opting out.
218
Chapter 8
Since assumptions A0h –A6h are similar to A0r –A6r , theorem 4.3.1 is valid in the hostage crisis. That is, the results of the negotiations depend on the relationships between the losses of the Sikhs and India over time. If India loses ind,1 more than the Sikhs can gain over time and sˆsik +csik ≤ M, then the negotiation ind,1 ind,1 will end in the first time period with (⌊ˆssik + csik ⌋, ⌈ˆsind − csik ⌉). If India loses less than the Sikhs can gain over time, the negotiations will end in the second time period with the agreement sˆ ind,1 . 8.2.2 The Sikhs’ Situation Changes from Winning to Losing Over Time, and India Loses Over Time Analyzing the hostage crisis more carefully, we can see that there is a point in the negotiations where the hijackers stop gaining and start losing over time. This can be the result of factors such as a shift in media sympathy from the plight of the Sikh prisoners to the deteriorating circumstances of the hostages on board the aircraft. So we consider a revision of assumptions A3h and A4h . We assume that the Sikhs through period Tc , prefer any agreement later rather than sooner, but then they prefer any agreement sooner rather than later. In the simulations that we conducted Tc was equal to 10 (Wilkenfeld et al. 1995). ′ In particular, the Sikhs have two constants, csik > 0 and csik < 0, such that: let 1 2 1 t1 , t2 ∈ T , and for i = 1, 2, ti = ti + ti where if ti ≥ Tc , ti = Tc , otherwise ti2 = 0. For any (ssik , sind ), (rsik , rind ) ∈ S, U sik ((ssik , sind ), t1 ) ≥ U sik ((rsik , rind ), t2 ) ′ ′ ∗ t12 ) ≥ (rsik + csik ∗ t21 + csik ∗ t22 ). We also assume that iff (ssik + csik ∗ t11 + csik even when the Sikhs are losing over time, their losses are less than the losses of ′ India. That is, |csik | < |cind |. We also assume that after Tc , the Sikhs prefer to opt sooner rather than later, but until Tc they prefer to opt out later rather than sooner. The results of the previous section when the Sikhs gain over time, which are based on theorem 4.3.1, are also valid in the current case. This is mainly because the Sikhs’ losses are less than those of India (Kraus and Wilkenfeld 1993). Therefore, what India can gain from the subgame starting in period Tc is not ind,Tc +1 ind,Tc +1 +csik ⌋, ⌈ˆsind −csik ⌉) if it is her turn to make an offer at better than (⌊ˆssik ind,Tc ˆ time period Tc , or s if it is the Sikhs’ turn to make an offer at time period Tc . 8.3
Pakistan Participates in the Negotiations
We assume that in addition to India and the Sikhs, Pakistan also participates in the negotiations. Thus the set of agents is defined as: Agents = {sik, ind, pak}. Unlike previous chapters where agreements are reached between all the agents,
Negotiation during a Hostage Crisis
219
in this version of the hostage crisis a set of possible agreements exists between all
j possible pairs of players. Thus we assume that the set Si, j , i, j ∈ Agents, i = includes the possible agreements between players i and j. We also assume that Si, j = S j,i , and as before we refer to the set of all possible agreements by S. In analyzing the hostage crisis case, we identified special conditions in regard to the sets of possible agreements. In this case Sind, pak = ∅, that is, there is no possible agreement between India and Pakistan that can end the crisis. There is only one possible agreement between the Sikhs and Pakistan (hostages are freed and hijackers are granted free passage). We add the following assumption: A7h Preferences for agreements: While the Sikhs prefer any agreement between themselves and India (i.e., victory for the Sikhs) to any agreement between themselves and Pakistan (i.e., defeat for the Sikhs), both India and Pakistan prefer a Sikh/Pakistan agreement at any time to any Sikh/India agreement or to opting out. A8h Possible agreement: The Sikhs and Pakistan prefer any possible agreement over opting out. In the situations in which the revised version of assumptions A0h –A8h hold, there are two possible subgame-perfect equilibria, depending on the exact relations between the utility of the players from the agreement between the Sikhs and Pakistan and the possible agreements between the Sikhs and India and the way these utilities change over time. In one subgame-perfect equilibrium, the negotiations will end at Tc with an agreement between the Sikhs and Pakistan (Kraus and Wilkenfeld 1991). In the second equilibrium, the negotiations will end in the first or second time period with an agreement between the Sikhs and India. 8.4
Negotiation Analysis in Political Science
The analysis of negotiation and bargaining behavior in crisis situations has fallen predominantly in the domain of political science. Studies in this area focus on deterrence (George and Smoke 1974), the bargaining process itself (Snyder and Diesing 1977), cross-national models of crisis decision making (Brecher 1978; Stein and Tanter 1980), cognitive closure and crisis management (Lebow 1981), quantitative analysis of bargaining (Leng 1988), and studies of crisis prevention (George 1983). Comprehensive statistical analysis of the behavior of states in crises is reported in (Brecher, Wilkenfeld, and
220
Chapter 8
Moser 1988; Wilkenfeld, Brecher and Moser 1988; Brecher and Wilkenfeld 1989, 1997). Another related work was conducted by Fraser and Hipel (1979). They developed a formal method that permits a rapid assessment of complex conflict situations for the purpose of finding a resolution of a conflict. The output from the analysis includes possible stable solutions to the conflict. Comparing their work with ours, we model the process of the negotiation itself, taking into account the passage of time during the negotiation. Our analysis provides for the players negotiation strategies that are in perfect equilibrium. The question of whether or not to negotiate with terrorists has been discussed in the media, governments, and research for many years. It was assumed that one should never bargain with terrorists since such negotiations encourage terrorism by making it a profitable activity. In (Sandler, Tschirhart, and Cauley 1983) a general formal model for decision making for governments and terrorists in situations such as hostage crises is presented. They focus on the utility functions of the actors, taking into account constraints on available resources and their risk attitudes. While we focus on the effect of time on the actors’ utility, they do not take the time preferences into consideration. One of their conclusions that is consistent with ours is that a no negotiation strategy is not the best in all situations. In this chapter we presented such a situation. Lapan and Sandler (1988) use an economic analysis in a simple game-theory framework to study the conditions under which a government would want to precommit itself to a no-negotiation strategy. They show that according to their formal model, only in a limited number of cases is the no-negotiation strategy recommended. In the scenario we analyzed in this chapter, negotiation is shown to be beneficial.
9
Economic and Game-Theoretic Models for Cooperation
In this book we have focused on the strategic-negotiation model as a method that agents can use to cooperate and to coordinate their activities. However, there are other game theory and economic models that were adapted for agents, and we present a short survey in this chapter. When relevant, these models are compared with the strategic-negotiation model. 9.1
Auctions
In the environments we consider, agents try to reach agreements concerning the distribution of a set of items. For example, in the information server environment discussed in chapter 3, the agents need to decide on the allocation of datasets, that is, the items under consideration are datasets. In resolving conflicts on the usage of a resource (chapter 4), an agreement should be reached on the time periods to be assigned to each agent, that is, the items to be distributed are time periods. When the agents need to decide on task assignments (chapter 6), then the items under discussion are tasks, and a decision should be made about which agent will execute a given task. Most of these conflicts can be resolved efficiently by providing the agents with a monetary system, modeling them as buyers and sellers, and resolving the conflicts using a money transfer. For example, a server may “sell” a dataset to another server when relocating this dataset; a subcontractor may be paid in order to carry out a task. In this chapter we discuss the use of auctions for “buying” and “selling” items to resolve conflicts. Auctions have become an area of increased interest since a huge volume of economic transactions is conducted through these public sales. The formation of virtual electronic auction houses on the internet (Guttman and Maes 1998) such as eBay (eBay 2001) has even increased the interest in auctions. There are two patterns of interactions in auctions. The most common are oneto-many auction protocols (Sandholm 1993; Andersson and Sandholm 1998; Gimenez-Funes, Godo, and Rodriguez-Aguilar 1998), where one agent initiates an auction and a number of other agents can bid in the auction, and many-to-many auction protocols (Wurman, Walsh, and Wellman 1998), where several agents initiate an auction and several other agents can bid in the auction. Given the pattern of interaction, the first issue to determine is the type of protocols to use in the auction (Klemperer 1999). Given the protocol, the agents need to decide on their bidding strategy. There are several types of one-to-many auctions that are used, including the English auction, first-price sealed-bid auction, second-price sealed-bid
222
Chapter 9
(Vickery auction), and the Dutch auction. The English auction is an ascending auction, in which the price is successively raised until only one bidder remains, and that bidder wins the item at the final price. In one variant of the English auction the auctioneer calls successively higher prices until only one willing bidder remains, and the number of active bidders is publicly known at all times. In other variants the bidders call out prices themselves, or have the bids submitted electronically and the best current bid is posted. The first-price sealed bid auction is a sealed-bid auction in which the buyer making the highest bid claims the object and pays the amount he has bid. The second-price auction is a sealed-bid auction in which the buyer making the highest bid claims the object, but pays only the amount of the second-highest bid. In the Dutch auction, the auctioneer begins by naming a very high price and then lowers it continuously until some bidder stops the auction and claims the object for that price. In real-world situations, each auction has its advantages and drawbacks (Klemperer 1999; Monderer and Tennenholtz 2000). The MAGMA system (Tsvetovatyy and Gini 1996; Tsvetovatyy et al. 1997) is an architecture for an agent-based virtual market. It includes a communication infrastructure, mechanisms for the storage and transfer of items, banking and monetary transactions, and economic mechanisms for producer-consumer transactions. Currently, MAGMA uses the Vickrey mechanism as a negotiation method, but it can incorporate other protocols and economic mechanisms. The Vickrey auction (Vickrey 1961) is widely used in DAI (Rosenschein 1986; Huberman and Clearwater 1995; Schwartz and Kraus 1998) and in research on electronic commerce (Tsvetovatyy and Gini 1996; Tsvetovatyy et al. 1997) for the case of one-to-many auctions. Under some assumptions, this protocol is incentive compatible, which means that each bidder has incentives to bid truthfully. Sandholm (1996) surveys the existing auction protocols and discusses certain known and new limitations of the protocol for multiagent systems, such as the possibility of bidder collusion and a lying auctioneer. There are situations in which the value of some item to a bidder depends on which other items he or she wins. In such cases, bidders may want to submit bids for combinations of items. Such auctions are called combinatorial auctions. The main problem in combinatorial auctions is to determine the revenue-maximizing set of nonconflicting bids. The general problem is NP-complete. Several researchers have been trying to develop polynomial algorithms, either for specific cases (e.g., (Rothkopf, Pekec, and Harstad 1995)) or for finding suboptimal solutions (e.g., Lehmann, O’Callaghan and Shoham 1999; Fujishima, Leyton-Brown, and Shoham 1999).
Economic and Game-Theoretic Models for Cooperation
223
Double auction is the most well-known auction protocol for many-to-many auctions. In a double auction, buyers and sellers are treated symmetrically, with buyers submitting bids and sellers submitting minimal prices (Wilson 1985). There are several algorithms used for matching buyers and sellers and for determining the transaction price. Preferably, the protocol will be incentive compatible, individual rational, and Pareto optimal (Wurman, Walsh, and Wellman 1998). As we discussed in 3.6, an auction is incentive compatible if the agents optimize their expected utilities by bidding their true valuations of the goods. An auction is individual rational if participating in an auction does not make an agent worse off than not participating. 9.1.1
The Comparison of Auctions with the Strategic-Negotiation Model
The strategic-negotiation model does not require the use of money transfers to resolve conflicts and thus is preferred to auction models according to the “money transfer” criteria of section 1.3. Furthermore, the strategic-negotiation model provides a mechanism that allows all the agents to be involved in all the details of the agreement with respect to all the agents, even when there are more than two agents in the environment. The auction mechanism is preferred when the agents care only about the details that are related to them and do not care about the details that are related to the other agents. For example, if in the task allocation problem an agent is concerned only with the tasks that it should perform, then auctions can be used. However, if the identity of the agents that perform all the tasks affects its expected utility, and it would thus like to influence the decision on how the tasks will be distributed among the agents in order to maximize its expected utility, then auctions cannot be used. However, if there is incomplete information, auctions may be more applicable than the strategic-negotiation model. To apply the strategic-negotiation model the agents must have some beliefs about their opponents’ utility functions. This is not required when applying auctions to resolve conflicts (even though it may be beneficial for an agent to know the reservation prices of the others). 9.1.2
Auctions vs. Strategic-Negotiation Model in Data Allocation
In (Schwartz and Kraus 1998) we considered a different variation of the data allocation problem. In the model described there, each server is concerned about the data stored locally, but has no preferences concerning the exact storage location of data stored in remote servers. This situation occurs, for example, when clients send their queries directly to the server that stores the documents they need. For such an environment, we proposed an auction mechanism with the
224
Chapter 9
second-price sealed bid protocol in order to attain efficient results. We formally proved that our method is stable and yields honest bids, and the simulations demonstrated the quality of the auction mechanism. The auction mechanism is not applicable in the environments that we considered here in chapter 3 where each server is concerned with the exact location of each dataset. In the auction mechanism, if a server would like to store a dataset it can make a high bid; however, there is no way for a server to influence the location of datasets that are not stored locally. Nonetheless, the strategic-negotiation model can be used in the environments considered in (Schwartz and Kraus 1998), where an auction mechanism is used as a solution method. The strategic-negotiation model does not require the use of a monetary system and transfer utilities, which are needed for the auction mechanisms. Furthermore, the negotiation model guarantees for each server, at the very least, its utility from the conflict allocation. However, the usage of the strategic-negotiation model by servers that are not concerned with the exact location of datasets not stored locally requires more computational resources than the usage of the auction mechanism by these servers. In addition, the results of simulations that we conducted showed that in such situations, the servers obtain, on average, lower utility when they use the strategic negotiation than when they use the auction mechanism. In situations where the servers have incomplete information about each other, the strategic-negotiation model requires a preliminary step in which the servers reveal some of their private information about data usage. To enforce truthful reports, a punishment mechanism for detecting liars should be applied. When auctions are used, no information revelation or punishment mechanisms are needed in the case of incomplete information. In conclusion, the auction mechanism is not efficient in environments where each server is concerned with the exact location of each dataset considered in this book. However, using it in environments where each server cares only about stored locally datasets yields fair and efficient solutions even when there is incomplete information, but requires a monetary system and does not ensure a minimal utility as guaranteed by the strategic-negotiation model. 9.2
Market-Oriented Programming
Market-oriented programming is an approach to distributed computation based on market price mechanisms (Wellman 1993; Wellman and Wurman 1998;
Economic and Game-Theoretic Models for Cooperation
225
Gerber, Russ, and Vierke 1999). In chapter 7 we discuss the application of market mechanisms to the pollution allocation problem. Here, we briefly survey its other applications. The idea of market-oriented programming is to exploit the institution of markets and models of them, and to build computational economies to solve particular problems of distributed resource allocation. This is inspired in part by economists’ metaphors of market systems “computing” the activities of the agents involved, and also by artificial intelligence researchers’ view of modules in a distributed system as autonomous agents. In market-oriented programming the researchers take these metaphors literally, and implement the distributed computation directly as a market price system. That is, the modules, or agents, interact in a very restricted manner—by offering to buy or sell quantities of commodities at fixed unit prices. When this system reaches equilibrium, the computational market has indeed computed the allocation of resources throughout the system, and dictates the activities and consumptions of the various modules (http://ai.eecs.umich.edu/people/wellman/MOP.html). Note that this approach does not necessarily require money transfer, as the strategic-negotiation model, and it is applicable when there is incomplete information. However, it is applicable only when there are several units of each kind of good and the number of agents is large. Otherwise, it is not rational for the agents to ignore the effects of their behavior on the prices, when it actually has an influence. Another issue is that there are situations in which reaching an equilibrium may be time consuming, and the system may not even converge (Wellman and Wurman 1998). It also requires some mechanism to manage the auctions (possibly a distributed mechanism, one for each type of good.) A survey and a general discussion on the market-programming approach can be found in (Wellman and Wurman 1998; Wellman 1996). Also, http://www2.elec.qmw.ac.uk/∼mikeg/text.html is a market-based multiagent systems resource page. The application of market-programming to the pollution allocation problem and additional related work is discussed in chapter 7. 9.3
9.3 Coalition Formation
Another important way for agents to cooperate is by creating coalitions (Sandholm and Lesser 1997; Shehory and Kraus 1998; Shehory and Kraus 1999). The formation of coalitions for executing tasks is useful both in
multiagent systems (MA) and distributed problem solving (DPS) environments. However, in DPS there is usually no need to motivate the individual agent to join a coalition. The agents can be built to try to maximize the overall performance of the system. Thus the only problem is which coalitions should be formed (i.e., the coalition structure) in order to maximize the overall expected utility of the agents. However, finding the coalition structure that maximizes the overall utility of the system is NP-complete. In MA systems of self-interested agents, an agent will join a coalition only if it gains more by joining the coalition than it could gain otherwise. Thus, in addition to the issue of the coalition structure, the problem of dividing the coalition's joint utility is very important.

Game-theory techniques for coalition formation can be applied to this problem. Work in game theory such as (Rapoport 1970; Shimomura 1995; Vohra 1995; Zhou 1994) describes which coalitions will form in N-person games under different settings and how the players will distribute the benefits of the cooperation among themselves. This is done by applying several related stability notions such as the core, the Shapley value, and the kernel (Kahan and Rapoport 1984). Each of these stability notions is motivated by a different method of measuring the relative strengths of the participating agents. However, the game-theory solutions to the coalition formation problem do not take into consideration the constraints of a multiagent environment, such as communication costs and limited computation time, and they do not present algorithms for coalition formation.

Shehory and Kraus (1999) consider coalition formation among self-interested agents in order to satisfy goals. Both the coalition structure and the utility division problems are handled. An anytime algorithm is developed for forming coalitions that satisfy a stability criterion based on the kernel. The properties of this algorithm are examined via simulations, which show that the model increases the benefits of the agents within a reasonable time period, and that additional coalition formation provides more benefits to the agents. Klusch and Shehory (1996) apply these results to the formation of coalitions among information agents. Sandholm et al. (1999) focus on establishing a worst-case bound on the coalition structure quality while searching only a small fraction of the coalition structures. They show that there is a minimal number of structures that should be searched in order to establish a bound. They present an anytime algorithm that establishes a tight bound within this minimal amount of search. If the algorithm is allowed to search further, it can progressively lower (i.e., tighten) the bound.
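For readers unfamiliar with the stability notions mentioned above, the following sketch computes the Shapley value of a small, hypothetical three-agent characteristic-function game by averaging each agent's marginal contribution over all orders in which the grand coalition can be assembled. It illustrates one way of dividing a coalition's joint utility; it is not one of the algorithms cited in this section.

    from itertools import permutations

    def shapley_values(agents, v):
        """Shapley value of a characteristic-function game.

        agents: list of agent names.
        v:      function mapping a frozenset of agents -> coalition value.
        Each agent receives its marginal contribution averaged over all
        join orders of the grand coalition.
        """
        totals = {a: 0.0 for a in agents}
        orders = list(permutations(agents))
        for order in orders:
            coalition = frozenset()
            for a in order:
                totals[a] += v(coalition | {a}) - v(coalition)
                coalition = coalition | {a}
        return {a: totals[a] / len(orders) for a in agents}

    # Hypothetical symmetric game: any pair earns 6, the grand coalition 9.
    def v(coalition):
        return {0: 0, 1: 0, 2: 6, 3: 9}[len(coalition)]

    print(shapley_values(["a1", "a2", "a3"], v))  # 3.0 for each agent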
Sandholm and Lesser (1995a) develop a coalition-formation model for bounded-rational agents and present a general classification of coalition games. They concentrate on the problem of computing the value of a coalition, and in their model this value depends on the computation time available to the agents. Zlotkin and Rosenschein (1994) study the problem of utility division in subadditive task-oriented domains, which are a subset of the task-oriented domains (see section 2.7). They consider only the grand coalition structure, where all the agents belong to the same coalition, and provide a linear algorithm that guarantees each agent an expected utility equal to its Shapley value. Ketchpel (1994) presents a utility distribution mechanism designed for similar situations in which it is uncertain how much utility a coalition will obtain. The strategic-negotiation model could be used for the utility distribution problem if the agents form a grand coalition. Shehory and Kraus (1998) discuss coalition formation for performing tasks in DPS environments; thus only the coalition structure problem is considered. Efficient distributed algorithms with low ratio bounds and low computational complexities are presented. Both coalition formation in which each agent is a member of exactly one coalition and overlapping coalitions are considered.
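To make the coalition-structure problem concrete, the sketch below exhaustively enumerates all partitions of a small set of agents and returns the structure with the highest total value. It is meant only to show the search space involved (the number of structures grows with the Bell numbers, which is why the bounds and heuristics discussed above matter); the coalition values are hypothetical.

    def partitions(agents):
        """Yield every partition (coalition structure) of a list of agents."""
        if not agents:
            yield []
            return
        first, rest = agents[0], agents[1:]
        for sub in partitions(rest):
            # place `first` in each existing coalition in turn ...
            for i in range(len(sub)):
                yield sub[:i] + [sub[i] + [first]] + sub[i + 1:]
            # ... or in a coalition of its own
            yield sub + [[first]]

    def best_structure(agents, value):
        """Return the structure maximizing the sum of coalition values."""
        return max(partitions(agents),
                   key=lambda struct: sum(value(frozenset(c)) for c in struct))

    # Hypothetical values: pairs cooperate well, larger groups pay overhead.
    def value(coalition):
        return {1: 1.0, 2: 5.0, 3: 5.5, 4: 6.0}[len(coalition)]

    print(best_structure(["a", "b", "c", "d"], value))  # two pairs, total value 10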
9.4 Contracting
An agent may try to contract out some of the tasks that it cannot perform by itself, or that may be performed more efficiently by other agents. One self-interested agent may convince another self-interested agent to help it with its task by promising a reward. The main question in such a setting is how one agent can convince another agent to do something for it when the agents are self-interested and do not share a global task. Furthermore, if the contractor-agent can choose different levels of effort when carrying out the task, how can the manager-agent convince the contractor-agent to carry out the task with the level of effort that the manager prefers, without requiring the manager's close observation? The issue of incentive contracting has been investigated in economics and game theory during the last three decades (e.g., Arrow 1985; Ross 1973; Rasmusen 1989; Grossman and Hart 1983; Hirshleifer and Riley 1992; Laffont and Tirole 1993). These works in economics and game theory consider
different types of contracts for different applications. Examples include contracts between a firm and an employee or employees (e.g., Nalebuff and Stiglitz 1983; Baiman and Demski 1980; Banerjee and Beggs 1989; Macho-Stadler and Pérez-Castrillo 1991); a government and taxpayers (e.g., Caillaud et al. 1988); a landlord and a tenant (e.g., Arrow 1985); an insurance company and a policy holder (e.g., Rubinstein and Yaari 1983; Harris and Raviv 1978; Spence and Zeckhauser 1971; Landsberger and Meilijson 1994); a buyer and a seller (e.g., Matthews 1983; Myerson 1983); a government and firms (e.g., McAfee and McMillan 1986); stockholders and management (e.g., Arrow 1985); and a professional and a client (Shavell 1979). In these situations there are usually two parties. The first party (called "the agent" in the economics literature) must choose an action or a level of effort from a number of possibilities, thereby affecting the outcomes of both parties. The second party (called "the principal") has the additional function of prescribing payoff rules. Before the first party (i.e., the agent) chooses the action, the principal determines a rule (i.e., a contract) that specifies the fee to be paid to the other party as a function of the principal's observations. Despite the similarity of the above applications, they differ in several aspects, such as the amount of information that is available to the parties, the observations that are made by the principal, and the number of agents involved. Several concepts and techniques are applied to the principal-agent paradigm in the relevant economics and game-theory literature.

A well-known framework for automated contracting is the Contract Net protocol (Smith and Davis 1981, 1983), mentioned in section 6.3. It was developed for DPS environments where all the agents work on the same goal. In the Contract Net protocol a contract is an explicit agreement between an agent that generates a task (the manager) and an agent that is willing to execute the task (the contractor). The manager is responsible for monitoring the execution of a task and processing the results of its execution, whereas the contractor is responsible for the actual execution of the task. The manager of the task announces the task's existence to other agents. Available agents (potential contractors) then evaluate the task announcements made by several managers and submit bids for the tasks they are suited to perform. Since all the agents have a common goal and are designed to help one another, there is no need to motivate an agent to bid for tasks or to do its best in executing a task if its bid is chosen. The main problems addressed by (Smith and Davis 1981, 1983; Smith 1980) are task decomposition, subtask distribution, and synthesis of the overall solution.
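As an illustration of one announce-bid-award round of the Contract Net protocol described above, consider the following sketch. The task, the bidders, and the rule of awarding the task to the bidder reporting the lowest estimated cost are hypothetical simplifications of the protocol.

    def contract_net_round(task, contractors):
        """One Contract Net round: announce a task, collect bids, award it.

        task:        identifier of the announced task.
        contractors: dict mapping contractor name -> bidding function; a
                     bidding function returns an estimated cost for the
                     task, or None if the contractor is not suited to it.
        Returns (winner, cost), or (None, None) if nobody bids.
        """
        bids = {}
        for name, bid_fn in contractors.items():   # task announcement
            cost = bid_fn(task)                    # each contractor evaluates it
            if cost is not None:
                bids[name] = cost                  # submit a bid
        if not bids:
            return None, None
        winner = min(bids, key=bids.get)           # award to the best bid
        return winner, bids[winner]

    # Hypothetical contractors with different capabilities.
    contractors = {
        "c1": lambda task: 8.0 if task == "deliver-parcel" else None,
        "c2": lambda task: 6.5,                    # bids on every task
        "c3": lambda task: None,                   # currently unavailable
    }
    print(contract_net_round("deliver-parcel", contractors))  # ('c2', 6.5)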
The Contract Net has been used in various domains (Parunak 1995; Ohko, Hiraki, and Anzai 1995; Malone et al. 1988; Sen and Durfee 1996). For example, a modified version of the Contract Net protocol for competitive agents in the transportation domain is presented in (Sandholm 1993). It provides a formalization of the bidding and awarding processes, based on marginal cost calculations according to local agent criteria. More important, an agent will submit a bid for a set of delivery tasks only if the maximum price mentioned in the tasks' announcement is greater than the cost of the deliveries for that agent. A simple motivation technique is presented to convince agents to make bids: the actual price of a contract is halfway between the price mentioned in the task announcement and the bid price.

Kraus (1996) considers contracting in various situations of automated-agent environments: certainty vs. uncertainty, full information vs. partial information, symmetric information vs. asymmetric information, and bilateral situations vs. environments containing more than two automated agents. For each of these situations, appropriate economic mechanisms and techniques that can be used for contracting in automated-agent environments are drawn from the game-theory and economics literature. In all the situations considered, the agent that designs the contract is provided with techniques to maximize its expected utility, given the constraints of the other agent(s). Sandholm and his colleagues (Sandholm, Sikka, and Norden 1999; Sandholm and Zhou 2000) have developed a method called leveled commitment contracts, in which each party can unilaterally decommit from a contract by paying a predetermined penalty. They show that such contracts improve expected social welfare even when the agents decommit strategically in Nash equilibrium.

In summary, game-theory and economics techniques seem very useful for the development of automated agents. The choice of the specific technique for a given domain depends on the specification of the domain: whether the agents are self-interested, the number of agents in the environment, the type of agreement that they need to reach, and the amount and type of information the agents have about each other.
10 Conclusions and Future Directions
The strategic-negotiation model provides a unified solution to a wide range of coordination and cooperation problems. It satisfies the conditions of distribution and symmetry stated in section 1.3. We have demonstrated its advantages in solving the data allocation problem, the resource allocation and task distribution problems, and the pollution allocation problem. Recall the five parameters (section 1.3) that should be used to evaluate negotiation results. The results presented in this book have been evaluated according to these five parameters, as summarized below.

Negotiation Time: Negotiations that end without delay are preferred to negotiations that are time-consuming. In the data allocation domain (chapter 3) the agents reach an agreement with no delay. However, this requires a preliminary phase of one round of broadcast messages to identify the allocation that serves as the basis of the equilibrium strategies. If there is incomplete information, then an additional revelation phase, consisting of one round of broadcast messages, is required. In the resource allocation domain (chapters 4 and 5) the negotiations end in the first or the second time period. However, if there is incomplete information, the negotiations may end with one of the agents opting out. In the task allocation domain (chapter 6) and the pollution domain (chapter 7) the negotiations end in an agreement with no delay. However, if the simultaneous response protocol is used, an additional phase of broadcast messages is needed, as in the data allocation domain. Using the sequential protocol does not require any additional communication between the agents. Note that the market mechanism, which was proposed for the pollution domain when there is incomplete information (chapter 7), requires many iterations of communication between the agents.

Efficiency: The outcome of the negotiations is preferably efficient. In almost all the chapters, efficiency is achieved by the agents reaching Pareto-optimal agreements. However, in the data allocation domain (chapter 3), and when the simultaneous response protocol is used in the pollution domain (chapter 7), the agents reach only suboptimal agreements because of complexity problems. In addition, in one case in the resource allocation domain, in order to end the negotiation with no delay and prevent agents from opting out, the agreement reached is not Pareto-optimal.

Simplicity: Negotiation processes that are simple and efficient are preferable to processes that are complex.
The protocol of alternating offers of the strategic-negotiation model is very simple. The strategies for the agents presented in this book can be found in polynomial time. To overcome complexity problems in the data allocation and pollution domains, an integration of heuristic methods into the strategic-negotiation model is required.

Stability: All chapters of the book identified strategies that are in subgame perfect equilibrium or sequential equilibrium.

Money transfer: Money transfer is not needed in the strategic-negotiation model. However, if money transfer is possible, using it can improve the negotiation results. This was demonstrated in the pollution allocation problem, where the MaxSumNash technique, which requires money transfer, yields the best results (section 7.3).

We have shown that the strategic-negotiation model can be useful in real-world domains, since it satisfies all of the above preferences. Currently, we are working on applying the strategic-negotiation model to the development of an automated agent that could negotiate with human players in a crisis situation similar to the hostage crisis. The agent will help train people in handling negotiations. This requires extending the model so that it takes strategic delays into consideration (Ma and Manove 1993).

Another possible direction is to use the strategic-negotiation model in electronic commerce, which is becoming an increasingly important trading mechanism. In order to negotiate efficiently and successfully, buyers and sellers need information about the issue under negotiation, the other parties, and the environment. However, the information that is available to buyers and sellers in electronic markets is incomplete. Thus we expect that the incorporation of learning techniques into the strategic-negotiation model is needed for this domain. For example, learning the market price of an item under negotiation can be useful for both the buyer and the seller. The learning can be done by reviewing previous events (Gimenez-Funes, Godo, and Rodriguez-Aguilar 1998), or by observing the behavior of the other participants in the negotiation (Zeng and Sycara 1998; Grossman and Perry 1986; Chatterjee and Samuelson 1987). We have already studied the effect of learning on auctions in the version of the data allocation problem discussed in section 9.1.2 (Azoulay-Schwartz and Kraus 2000). There, each server learns the expected usage of each data item from information about past usage of its own data items. We implemented this type of learning process using neural networks. Simulations showed that as a
server learns more, it improves its own utility, with only a minor improvement, or a minor decrease (in an environment where information was missing), in the utility of the other servers. We also found that if all the servers learn more, the average utility of the servers increases, but if there is incomplete information, excess learning could reduce the average utility. We believe that appropriate learning will also improve the results of negotiation in electronic commerce and will lead to successful automated agents in this complex domain.
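The experiments above used neural networks to learn expected usage. Purely as an illustration of the kind of estimator involved, the sketch below predicts the next period's usage of a data item with an exponential moving average over hypothetical past usage counts; the smoothing factor and the data are assumptions, not the parameters used in those experiments.

    def ema_forecast(usage_history, alpha=0.3):
        """Exponentially weighted moving-average forecast of next-period usage.

        usage_history: past per-period usage counts of one data item.
        alpha:         smoothing factor in (0, 1]; larger values weight
                       recent periods more heavily.
        """
        if not usage_history:
            raise ValueError("need at least one observation")
        estimate = float(usage_history[0])
        for count in usage_history[1:]:
            estimate = alpha * count + (1 - alpha) * estimate
        return estimate

    # Hypothetical usage counts of one data item over ten periods.
    past_usage = [12, 15, 14, 18, 20, 19, 22, 21, 25, 24]
    print(round(ema_forecast(past_usage), 2))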
Appendix A: Suggested Background Reading
The following books, articles, and sites provide additional background for interested readers.

Artificial Intelligence
Russell and Norvig (1995) is a textbook on artificial intelligence. It provides a short discussion of intelligent agents and their rationality. Other introductory books on artificial intelligence include Nilsson 1998; Charniak and McDermott 1985; Winston 1984; Rich and Knight 1991; and Ginsberg 1993. The Institute for Information Technology, National Research Council of Canada, maintains a list of artificial intelligence resources such as books, journals, and conferences at http://ai.iit.nrc.ca/ai point.html.

Distributed Artificial Intelligence (DAI)
Weiss (1999) is a textbook on multiagent systems. Each chapter of the book was written by a different researcher. The chapters most relevant to the present book are chapter 2 (section 2.3 on agent interaction protocols) and chapter 5 on distributed rational decision making. There are several books that include collections of papers on multiagent systems, e.g., O'Hare and Jennings 1996; Bond and Gasser 1988c; and Huhns and Singh 1997. In addition, there is a series of books that includes the papers presented at the International Workshops on Agent Theories, Architectures, and Languages (Wooldridge and Jennings 1995c; Wooldridge, Muller, and Tambe 1996; Muller, Wooldridge, and Jennings 1997; Singh, Rao, and Wooldridge 1998; Jennings and Lesperance 2000). The proceedings of the International Conference on Multi-Agent Systems (ICMAS) are also relevant. UMBC AgentWeb (http://www.cs.umbc.edu/agents/) maintains information, resources, newsletters, and mailing lists relating to intelligent agents. Michael Wooldridge maintains a list of agent-related pages (http://www.csc.liv.ac.uk/~mjw/links).

Game Theory
There are many books on game theory. Luce and Raiffa (1957) is a classic that discusses the main motivations of game theory and surveys its early results. Osborne and Rubinstein (1994) is a graduate-level introductory text on game theory. Rasmusen (1989) is a book about noncooperative game theory and asymmetric information. It is written from the point of view of an applied theoretical economist rather than that of a game theorist. A more advanced book on relevant topics is Aumann and Hart (1992). Other textbooks that require a strong mathematical background include Myerson (1991) and Fudenberg and Tirole (1991). Other useful books are those written for
social scientists, such as Shubik 1982; Eichberger 1993; and Friedman 1986. NetEC (http://netec.wustl.edu) provides information on printed working papers, data about electronic working papers, code for economics and econometrics, and world wide web resources in economics. All the datasets can be queried in a common search system.

Bargaining and Negotiations
There are several negotiation "guides" that comprise informal theories that attempt to identify possible strategies for a negotiator and to assist him or her in achieving good results; for example, Fisher and Ury 1981; Druckman 1977; Karrass 1970; Johnson 1993; and Hall 1993. Raiffa (1982) discusses the practical side of negotiating with a simple mathematical analysis. Lewicki, Saunders, and Minton (1999) explore the major concepts and theories of the psychology of bargaining and negotiation, and the dynamics of interpersonal and intergroup conflict and its resolution. The literature on the axiomatic approach to bargaining is surveyed in Roth (1979). InterNeg maintains reference materials on negotiation and negotiation support, including FAQs (frequently asked questions), bibliographies, software catalogues, glossaries, and world wide web resources in negotiations (http://interneg.carleton.ca/interneg/reference).

Others
Cormen, Leiserson, and Rivest (1990) is a textbook that provides a comprehensive introduction to the modern study of computer algorithms. Garey and Johnson (1979) is an introduction to NP-complete problems. French (1986) includes suitable background reading on decision theory. Yoon and Hwang (1995) present a compact survey of multiple attribute decision making. The Collection of Computer Science Bibliographies (http://liinwww.ira.uka.de/bibliography/index.html#about) is a collection of bibliographies of scientific literature in computer science from various sources, covering most aspects of computer science. The collection currently contains more than one million references (mostly to journal articles, conference papers, and technical reports).
Appendix B: Glossary of Notations
Notation and description (relevant environments in parentheses):

A: Attached agent (RAE, HC)
A_i: Agent i
Agents: The set of agents
a_ij: The maximal amount of pollutant pol_j that can be emitted by plant pl_i under usual circumstances (PAE)
a_j: The maximal amount of pollutant pol_j that can be emitted by all the plants under usual circumstances (PAE)
b^c_ij: The default amount of emission of pollutant pol_j by plant pl_i (PAE)
b_j: The maximal amount of pollutant pol_j that can be emitted by all plants under special circumstances (PAE)
c_i: The cost for agent i per negotiation period
DAE: Data allocation environment
δ: Time discount rate
dl: Deadline: the number of time units from the goal's arrival time in which the goal is still relevant (RAE)
done_i: Periods agent i has been working on its goal so far (RAE)
DS: The set of the data items to be allocated (DAE)
em_ij: A function that associates with a production vector the amount of the pollutant pol_j emitted by plant pl_i when producing the amounts specified in the vector (PAE)
em_j: A function that associates with the production vectors of all the plants the amount of pol_j emitted by all the plants when producing the amounts specified in the vectors (PAE)
f_i: Agent i's strategy
g: Goal identification (RAE)
g^A: A's goal — <g^A, t^A_min, t^A_max, dl^A, m^A, r> (RAE)
g^W: W's goal — <g^W, t^W_min, t^W_max, dl^W, m^W, r> (RAE)
H(t, j): The history through step j of time period t of the negotiation
HC: Hostage crisis
j(t): The agent that makes an offer at time period t
M: The number of units of the resource/tasks to be distributed in the negotiation
m_i: The number of products produced by plant pl_i (PAE)
N: The number of agents
O^l: The latest offer made in the negotiations
PAE: Pollution allocation environment
p_i: The probability of gaining access to the resource after an event (RAE)
℘_i(H): A probability distribution of i's opponents as a function of the history
φ^j_l: The probability belief of agent i that the type of agent j is l
pl_i: Plant i (PAE)
pol_j: Pollutant j (PAE)
Possible^t_i: The possible agreements that are not worse for agent i in period t than opting out in period t
Possible^t: The set of agreements at period t that are not worse for any agent than opting out
q: Time periods needed for repairing the resource after opting out (RAE)
r: Interest rate
RAE: Resource allocation environment
S: The set of possible agreements
s̃^{i,t}: The best agreement for i in Possible^t
ŝ^{i,t}: The worst agreement for i in Possible^t
s^t_A: The additional time periods A needs in time period t to completely accomplish its goal (RAE)
SERV: A set of information servers (DAE)
T: Negotiation time periods
T̂: The earliest time period where Possible^T̂ = ∅ (TDE, PAE)
t_max: Maximum time periods needed for working on a given goal (RAE)
t_min: Minimum time periods needed for working on a given goal in order to get paid for that goal (RAE)
t^W_neo: The earliest time period in which agent W does not have enough time to perform t^W_min before its deadline after Opt (RAE)
t^W_ne: The earliest time period in which W does not have enough time to perform t^W_min before its deadline (RAE)
t̂_A: The time in which A would finish working on its goal and prefers to Leave (RAE)
t̂_W: The earliest time in which W prefers to leave over any other option (RAE)
t̂: The earliest time period between t̂_W and t̂_A (RAE)
TDE: Task distribution environment
U^i: Agent i's utility function
U^l_l: W_l's utility for agreement ŝ^{W_l,1} in period 1 (RAE)
U^l_h: W_l's utility for agreement ŝ^{W_h,1} in period 1 (RAE)
U^l_O: W_l's utility for opting out in period 1 (RAE)
U^A_l: A_j's utility for agreement ŝ^{W_l,1} in period 1 (RAE)
U^A_h: A_j's utility for agreement ŝ^{W_h,1} in period 1 (RAE)
U^A_O: A_j's utility if W opts out in period 1 (RAE)
vcosts(alloc): The variable costs of a data allocation (DAE)
vcost_ratio: The ratio of the variable costs when using negotiation and the variable costs when using the static allocation (DAE)
W: Waiting agent (RAE, HC)
x^t: Acceptable agreement; computed using backward induction (TDE, PAE)
z̄_i: The vector of the amounts of the products that are produced by pl_i (PAE)
z_ij: The amount of product_ij produced by plant pl_i (PAE)
Notes
Chapter 1

1. These methods can be used in domains where people interact with each other and with automated systems, and in situations where automated systems interact in environments without predefined regulations. These informal models can serve as guides for the development of negotiation heuristics (Kraus and Lehmann 1995) or as a basis for the development of a logical negotiation model (Kraus, Sycara, and Evenchik 1998).
2. Research in DAI is divided into two basic classes: distributed problem solving (DPS) and multiagent systems (MA) (Bond and Gasser 1988b). Cooperative agents belong to the DPS class, while self-interested agents belong to the MA class.
3. An agreement is Pareto optimal if there is no other agreement that dominates it, i.e., there is no other agreement that is better for some of the agents and not worse for the others.
4. The introductory chapters of Rasmusen 1989; Myerson 1991; Osborne and Rubinstein 1994; and Eichberger 1993 were referenced while writing this section. They are recommended for further reading.
5. The strategic-negotiation model presented in this book is an extensive game.
6. Sometimes the consequence of an action profile is affected by an exogenous random variable whose actual value is not known to the players before they take their actions. For example, the weather conditions at the exact time in which India launches its military operation in Pakistan may determine whether the operation is a success or a failure. These weather conditions are not known to India. The strategic form can be extended to deal with such situations (Rasmusen 1989; Myerson 1991; Osborne and Rubinstein 1994; Eichberger 1993).
7. This simple game does not model the scenario presented in Kraus and Wilkenfeld 1993 and Wilkenfeld et al. 1995.
8. Note that while game theoreticians try to predict the outcome of games, we try to find strategies for our agents.
9. In order to provide the intuition behind the concepts and to simplify the general discussion, only finite games have been described here. As defined in the context of the strategic-negotiation model, in games with an infinite number of nodes, one has to associate payoffs with (possibly infinite) sequences of actions.
Chapter 2

1. In chapters 6 and 7 a sequential protocol is considered. In this protocol an agent responding to an offer is informed of the responses of the preceding agents (assuming that the agents are arranged in a specific order).
2. A distributed algorithm for randomly ordering the agents can be based on the methods of Ben-Or and Linial (1985).
3. Note that this assumption does not require money transfer between the agents. It only provides a way to evaluate outcomes over time.
4. The reservation price of the seller is the price below which the seller refuses to sell. The reservation price of the buyer is the price above which the buyer refuses to buy.
Chapter 3

1. In game-theory terminology, the game has multiple equilibria, and the problem of the players is to converge to one of them.
2. If the datasets are small it is possible to extend the model to permit multiple copies of each dataset.
3. In the remainder of the chapter a server and its agent will be used interchangeably, and we will assume that Agents = SERV.
4. This cost is relevant only in environments in which each client is connected to the nearest server, which responds to all of its queries. In cases where the clients are autonomous and send their queries directly to the server that has the answer, retrieve cost = 0.
5. For simplicity, we assume that storage space is not restricted.
6. This result can be used in other environments where there are at least three agents and the utility functions of the agents satisfy assumptions A0_d–A2_d. Other domains where these results can be applied include scheduling problems for self-interested agents where one complete schedule should be found (e.g., Sen, Haynes, and Arora 1997); complex task allocations (e.g., task assignment in self-management maintenance groups of an airline [Hackman 1991]); and air-pollution reduction by several companies when weather conditions require it.
7. This solution was proved by Nash to be the unique solution, given certain basic axioms and when the set of possible outcomes of the bargaining problem is compact and convex. In our situation, the set of allocations is not convex, but we use the generalized Nash product as a reasonable bargaining outcome.
8. It is possible to reach similar results using other negotiation protocols (e.g., Rosenschein and Zlotkin 1994). However, our protocol puts fewer restrictions on the agents' strategies, which is preferable in environments of autonomous agents. I do not restrict the length of the negotiation, and I allow each server in turn to propose an offer.
9. A problem is in the class of NP problems if there is some algorithm that can guess a solution and then verify whether or not the guess is correct in polynomial time. The class of NP-complete problems includes the hardest problems in the class NP. It has been proven that either all NP-complete problems are in P or none of them are. Currently, only exponential deterministic algorithms are known to handle NP-complete problems (Garey and Johnson 1979).
10. The product maximization method was proven in Nash (1950) to be the unique solution that satisfies certain basic axioms.
11. The retrieve cost is measured in dollars per distance unit, i.e., retrieve cost = 0.001 means that transmission of 1 document over 1000 distance units costs 1 dollar for the side that receives the document.
12. The answer cost is measured in the same units as the retrieve cost, i.e., dollars per distance unit.
13. This refers to mechanisms where each agent reports an order over the possibilities, but not a cardinal value for each possibility.
14. Nevertheless, the mechanisms we use are also not safe against coalitions.
Chapter 4

1. Our model is also applicable in the case where the resource itself can actually be divided between the agents. This case does not differ significantly from the case where only the resource usage time can be divided.
2. We assume that even if both agents try to obtain access to the resource simultaneously, only one will actually be able to gain access and become the Attached agent.
3. Throughout the remainder of the chapter, A's portion in an agreement will be written first.
4. In the case of incomplete information, A3r will be slightly changed.
5. For all s ∈ S and i ∈ Agents, s_i is agent i's portion of the resource.
6. Note that T = {0, 1, 2, . . .}, i.e., the periods start from 0.
7. Kreps and Wilson (1982) imposed an additional, stronger restriction. They required that the beliefs of the agent are the limit of a sequence of rational beliefs. All the original sequential equilibria satisfied our conditions, but there are a few equilibria according to our definition that do not satisfy Kreps and Wilson's stronger requirement.
8. See Madrigal, Tan, and Werlang (1987) for a discussion of this requirement. In some situations, this requirement may cause the elimination of equilibria. We leave the relaxation of this requirement for future work.
9. As defined above, for i, j ∈ Type, c_{W_i} denotes agent W_i's loss over time and c_{A_j} denotes agent A_j's gain over time. ŝ^{W_i,t} is the worst agreement for agent W_i that is still better than opting out. The subscript A, i.e., ŝ_A^{W_i,t}, indicates A's portion of the resource in agreement ŝ^{W_i,t}. In the following, A3r will also include the assumptions specified here.
10. It is enough to assume that ŝ_A^{W_k,t} + |c_{W_k}| < ŝ_A^{W_{k-1},t} + |c_{W_{k-1}}| < · · · < ŝ_A^{W_1,t} + |c_{W_1}|.
11. This is the situation when there are only two agents in the environment or when the type of one agent does not depend on the types of the others. That is, we do not consider the cases where, by learning that an agent is of type h, e.g., it can also be concluded that others are less likely to be of the same type.
12. From a practical point of view, these are situations of asymmetric information.
13. This assumption is reasonable if there is a finite number of beliefs about other agents. Then, we can divide any type into subtypes according to its beliefs about its opponent.
14. Note that inequalities AA1.1 and AA1.2 below cover all possibilities of A's utility functions, besides the one that yields equality.
15. Note that all these results are valid only under the assumption that there may be only two encounters.
16. Note that inequalities AW1 and AW2 cover all possibilities of W_l's utility functions, besides the one that yields equality.
17. Note that in the previous section, W_l just wanted to maintain the current situation. In this case, W_l tries to convince A_j that its type is h.
18. Note that there is a paradox. If A_j is not influenced by both agents behaving as W_h, it is not rational for W_l to behave as W_h. Therefore, if A_j observes W_h's behavior, it may conclude that it is W_h. However, if A_j's beliefs are affected by W's behavior, it is worthwhile for W_l also to pretend to be W_h.
19. Note that A's belief is correct in this case; W's type is l.
20. The agents may use a Test and Set mechanism. This will prevent the situation in which two agents would like to attain access at exactly the same time.
21. Note that this does not require a central design of agents. However, it requires the development of certain standards for an MA environment.
22. This concept is similar to the notion of correlated equilibrium (Aumann 1974), which in most situations requires a contract enforcement mechanism.
23. Similarly, correlated equilibrium is different from mixed strategies.
Chapter 5

1. A summary of the notations used in this chapter appears in table 5.1.
2. This assumption does not require side payments between the agents during the negotiations. It only provides a way to evaluate the agents' goals and costs.
3. Details on the following functions can be found in Schechter (1996).
4. A simple example that demonstrates holding an unused resource is as follows: the maximum time periods needed for A to work toward its goal in order to get paid for it is 10 (i.e., t^A_max = 10), and the agents reach an agreement (15, 23). A will need to hold the resource for at least 5 extra time periods without being able to use it. Another example is as follows: the minimum time periods needed for A to work toward its goal in order to get paid for that goal is 20 (i.e., t^A_min = 20), and A has not been working on the goal before W starts the negotiation (i.e., done_A = 0). An agreement (10, 34) is reached in the first time period. A will not be paid for the 10 time periods because, according to the agreement, it will need to stop working before being able to work the minimal required time.
5. If an agreement (s, t) has been implemented, it means that A had continued to work for s time periods, and then W had worked for n time periods. However, it may be the case that even then, both W and A have not fulfilled their goals.
6. s_A^t is the maximum number of time periods needed by A at period t in order to accomplish the feasible part of its goal.
7. We still need to work on the theoretical aspects of such problems, as an enhancement of the model we present here.
8. See Schechter (1996) for a formal analysis of this issue.
Chapter 6

1. A discussion of intention reconciliation in the context of teamwork can be found in (Sullivan et al. 1999; Sullivan et al. 2000).
2. We note that U^i((ŝ^{i,t}, t)) > U^i((Opt, t)) (see definition 2.3.1).
3. For simplification, we assume that only one such maximal agreement exists. This is the case either if the only factor determining the utility for an agent is its own portion of the task (see example 20), or if the quality of the performance of the other agents yields a different utility for each agreement. Without this uniqueness assumption, if there are several agreements that have the same maximal utility for agent i, it is difficult for the other agents to predict which offer agent i will make. These agreements may have different utilities for the other agents. We can assume that in such situations, agent i chooses any one of the maximal agreements with equal probability, and that the other agents behave according to their expected utilities.
4. This problem is also called task assignment, task allocation, or task distribution.
Chapter 7

1. This utility is obtained when the plant emits the amounts specified by the functions em_ij below. We assume that the em_ij functions do not change during the time under consideration. The question of how a change in the functions em_ij (e.g., using "cleaner" production procedures) influences the utility of the plants is beyond the scope of this chapter.
2. These utility functions do not take the negotiation time or any other coordination costs into account. We will modify them later, when relevant.
3. We use a binary search since it is not possible for the plants to compute their demand curves as in WALRAS (Wellman 1993), because the functions are not continuous. However, similar to WALRAS, the auctioneer adjusts individual prices to clear, rather than adjusting the entire price vector by some increment as in the tatonnement mechanism.
4. In the current implementation, the initial prices are between 0.5 and 2. More research is needed to study how a change to this range would influence the results.
References
Agency, United States Environmental Protection. 2000. Acid rain program. http://www.epa. gov/acidrain. Alanyali, M. and B. Hajek. 1997. Analysis of simple algorithms for dynamic load balancing. Math. Oper. Res. 22(4):840–871. Andersson, M. R. and T. W. Sandholm. 1998. Contract Types for Optimal Task Allocation: II Experimental Results. In Sandip Sen, editor, AAAI 1998 Spring Symposium: Satisficing Models, Stanford University, California. Technical Report SS-98-05, The AAAI Press, Menlo Park, California. Apers, P. M. G. 1988. Data allocation in distributed database systems. ACM Transactions on Database Systems, 13(3):263–304. Arrow, K. J. 1951. Social Choice and Individual Values (Collected Papers of K. J. Arrow, vol. 1). Basil Blackwell, Cambridge, Massachusetts. Arrow, K. J. 1985. The economics of agency. In J. Pratt and R. Zeckhauser, editors, Principals and Agents: The Structure of Business. Harvard Business School Press, Cambridge, Massachusetts, pages 37–51. Aumann, R. J. 1974. Subjectivity and correlation in randomized strategies. Journal of Mathematical Economics 1(1):67–96. Aumann, R. J. and S. Hart, editors. 1992. Handbook of game theory with economic applications. North-Holland, Amsterdam. Axelrod, R. 1984. The Evolution of Cooperation. Basic Books, New York. Azoulay-Schwartz, R. and S. Kraus. 2000. Assessing usage patterns to improve data allocation via auctions. In Proceedings of ICMAS-2000, pages 47–54. IEEE Computer Society Press, Los Alamitos, California. Baiman, S. and J. Demski. 1980. Economically optimal performance evaluation and control systems. Journal of Accounting Research 18:184–220. Balch, T. and R. C. Arkin. 1995. Motor schema-based formation control for multiagent robot teams. In Proceedings of the First International Conference on Multiagent Systems (ICMAS-95), pages 10–16. The AAAI Press, Menlo park, California. Banerjee, A. and A. Beggs. 1989. Efficiency in hierarchies: Implementing the first-best solution by sequential actions. The Rand Journal of Economics 20(4):637–645. Ben-Or, M. and N. Linial. 1985. Collective coin flipping, robust voting games and minima of banzhaf values. In Proceedings 26th IEEE Symposium on the Foundations of Computer Science, pages 408–416. IEEE Computer Society Press, Los Alamitos, California. Berns, K. 1998. Walking machine catalogue. http://www.fzi.de/ipt/WMC/preface/walking machines katalog.html. Bhaska, V. 1997. Breaking the symmetry: Optimal conventions in repeated symmetric games. In The 17th Arne Ryde Symposium on Focal Points: Coordination, Complexity and Communication in Strategic Contexts. The Arne Ryde Foundation, Helsingborg, Sweden. Bond, A. H. and L. Gasser. 1988a. An analysis of problems and research in DAI. In A. H. Bond and L. Gasser, editors, Readings in Distributed Artificial Intelligence. Morgan Kaufmann, San Mateo, California, pages 3–35. Bond, A. H. and L. Gasser, editors. 1988b. Readings in Distributed Artificial Intelligence. Morgan Kaufmann, San Mateo, California. Bratman, M. E., D. J. Israel, and M. E. Pollack. 1988. Plans and resource-bounded practical reasoning. Computational Intelligence 4(4):349–355. Brazier, F., F. Cornelissen, R. Gustavsson, C. M. Jonker, O. Lindeberg, B. Polak, and J. Treur. 1998. Agents negotiating for load balancing of electricity use. In Proceedings of the 18th international
conference on distributed computing systems (ICDCS’98), pages 622–629. IEEE Computer Society Press, los Alamitos, California. Brecher, M., editor. 1978. Studies in Crisis Behavior. Transaction Books, New Brunswick, New Jersey. Brecher, M. and J. Wilkenfeld. 1989. Crisis, Conflict, and Instability. Pergamon Press, Oxford. Brecher, M. and J. Wilkenfeld. 1997. A Study of Crisis. University of Michigan Press, Ann Arbor, Michigan. Brecher, M., J. Wilkenfeld, and S. Moser. 1988. Crises in the Twentieth Century, vol. #I: Handbook of International Crises. Pergamon Press, Oxford. Brooks, Rodney. 1985. A robust layered control system for a mobile robot. Technical Report AI Memo 865, Artificial Intelligence Laboratory, MIT, Cambridge, Massachusetts, September. Buttazzo, Giorgio C. 1997. Hard Real-Time Computing Systems. Kluwer Academic Publishers, Boston, Massachusetts. Caillaud, B., R. Guesnerie, P. Rey, and J. Tirole. 1988. Government intervention in production and incentives theory: A review of recent contributions. Rand Journal of Economics 19(1):1–26. Cammarata, S., D. McArthur, and R. Steeb. 1983. Strategies of cooperation in distributed problem solving. In Proceedings of IJCAI-83, pages 767–770. Morgan Kaufmann, San Mateo, California. August. Carver, N., Z. Cvetanovic, and V. Lesser. 1991. Sophisticated cooperation in FA/C distributed problem solving systems. In Proceedings of AAAI-91, pages 191–198. AAAI Press, Menlo Park, California. Casey, R.G. 1972. Allocation of copies of a file in an information network. In Proceedings of AFIPS SJCC, AFIPS Press, Washington D.C. Ceri, S., G. Martella, and G. Pelagatti. 1982. Optimal file allocation in a computer network: A solution method based on the knapsack problem. Computer Networks 6:345–317. Chandrasekn, B. 1981. Natural and social system metaphors for distributed problem solving: Introduction to the issue. IEEE Transaction on Systems Man and Cybernetics 11(1):1–5. Charniak, E. and D. McDermott. 1985. Introduction to Artificial Intelligence. Addison-Wesley, Reading, Massachusetts. Chatterjee, K. and L. Samuelson. 1987. Bargaining with two-sided incomplete information: An infinite horizon model with alternating offers. Review of Economic Studies, 54:175–192. Chavez, A. and P. Maes. 1996. Kasbah: An agent marketplace for buying and selling goods. In The First International Conference on the Practical Application of Intelligent Agents and Multi Agents Technology, pages 75–90. The Practical Application Company, Lancashire, U.K. Chavez, A., A. Moukas, and P. Maes. 1997. Challenger: A multi-agent system for distributed resource allocation. In W. Lewis Johnson and Barbara Hayes-Roth, editors, Proceedings of the 1st International Conference on Autonomous Agents, pages 323–331, ACM Press, New York. Cheng, S., J. Stanlovic, and K. Ramamritham. 1986. Dynamic scheduling of groups of tasks with precedence constraints in distributed, hard real-time systems. In Real-Time Systems Symp., pages 166–174. IEEE Computer Society Press, Los Alamitos, California. Chu, W. W. 1969. Optimal file allocation in a multiple computer system. IEEE Trans. on Computers C-18:885–889. Clarke, E. 1971. Multipart pricing of public goods. Public Choice 8:19–33. Coffman, E. G., editor. 1976. Computer and Job-Shop Scheduling Theory. John Wiley & Sons, New York. Conry, S. E., R. A. Meyer, and V. R. Lesser. 1988. Multistage negotiation in distributed planning. In A. H. Bond and L. Gasser, editors, Readings in Distributed Artificial Intelligence. 
Morgan Kaufmann Publishers, San Mateo, California, pages 367–384.
Conry, S. E., K. Kuwabara, V. R. Lesser, and R. A. Meyer. 1991. Multistage negotiation for distributed satisfaction. IEEE Transactions on Systems, Man, and Cybernetics, Special Issue on Distributed Artificial Intelligence 21(6):1462–1477. Cooper, R. W., D. V. DeJong, R. Forsythe, and T. W. Ross. 1990. Selection criteria in coordination games: Some experimental results. The American Economic Review 80(1):218–233. Copeland, T. E. and J. F. Weston. 1992. Financial Theory and Corporate Policy. Addison-Wesley, Reading, Massachusetts. Cormen, T. H., C. E. Leiserson, and R. L. Rivest. 1990. Introduction to Algorithms. MIT Press, Cambridge, Massachusetts. Cramton, P. 2000. A review of markets for clean air: The U.S. acid rain program. Journal of Economic Literature 38:627–633. Cramton, P. and S. Kerr. 1998. Tradable carbon permit auctions: How and why to auction not grandfather. Working Papers, Economics Department, University of Maryland, College Park, Maryland. Czumaj, A. and V. Stemann. 1997. Randomized allocation processes. In Proceedings of 38th Annual Symposium on Foundations of Computer Science, pages 194–203. IEEE Computer Society Press, Los Alamitos, California. Dasgupta, P., P. Hammond, and E. Maskin. 1980. On imperfect information and optimal pollution control. Review of Economic Studies 47:857–860. Davis, E. W. 1966. Resource allocation in project network models—A survey. Journal of Industrial Engineering, 17(4):33–41. Davis, E. W. 1973. Project scheduling under resource constraints—Historical review and categorization of procedures. AIIE Trans. 5(4):297–313. Davis, E. W. and J. H. Patterson. 1975. A comparison of heuristic and optimum solutions in resource constrained project scheduling. Management Science 21(8):944–955. Decker, K. and V. Lesser. 1993. A one-shot dynamic coordination algorithm for distributed sensor networks. In Proceedings of AAAI-93, pages 210–216. The AAAI Press, Menlo Park, California. Decker, K. and J. Li. 1998. Coordinated hospital patient scheduling. In Proceedings of ICMAS98, pages 104–11. IEEE Computer Society Press, Los Alamitos, California. Doorenbos, R. B., O. Etzioni, and D. S. Weld. 1997. A scalable comparision-shopping agent for the world-wide web. In W. L. Johnson and B. Hayes-Roth, editors, Proceedings of Autonomous Agents-97, pages 39–48, ACM Press, Washington D.C. Dowdy, L. W. and D. V. Foster. 1982. Comparative models of the file assignment problem. Computing Survey 14 (2):289–313. Druckman, D. 1977. Negotiations. Sage, London. Du, X. and Fred J. Maryanski. 1988. Data allocation in a dynamically reconfigurable environment. In Proceedings of the IEEE Fourth Int. Conf. Data Engineering, pages 74–81. IEEE Computer Society, Los Alamitos, California. Dummett, M. 1984. Voting Procedures. Clarendon Press, Oxford. Durfee, E. H. 1988. Coordination of Distributed Problem Solvers. Kluwer Academic Publishers, Boston. Durfee, E. H. and V. R. Lesser. 1987. Global plans to coordinate distributed problem solvers. In Proceedings of IJCAI-87, pages 875–883. Morgan Kaufmann, San Mateo, California. Eager, D. L., E. D. Lazowska, and J. Zahorjan. 1986. Adaptive load sharing in homogeneous distributed systems. IEEE Trans. Software Engineering, 12(5):662–675. eBay. 2001. eBay—Your Personal Trading Community. http://www.ebay.com. Eichberger, J. 1993. Game Theory for Economics. Academic Press, San Diego, California. Ephrati, E. and J. S. Rosenschein. 1996. Deriving consensus in multiagent systems. Artificial Intelligence, 87(1–2):21–74.
Eswaran, K. P. 1974. Placement of records in a file and file allocation in a computer network. In Proceedings of the IFIP Congress on Information Processing, pages 304–307. North Holland, Amsterdam, The Netherlands. Etzioni, O. and D. S. Weld. 1995. Intelligent agents on the internet: Fact, fiction, and forecast. IEEE Expert 10(4):44–49. Farrell, J. 1988. Meaning and credibility in cheap-talk games. In M. Dempster, editor, Mathematical Models in Economics. Oxford University Press, Oxford. Ferguson, I. A. 1992. Touring Machines: An Architecture for Dynamic, Rational, Mobile Agents. Ph.D. thesis, University of Cambridge, Clare Hall, U.K. Available as Tech Report No. 273. Fischer, K. and N. Kuhn. 1993. A DAI approach to modeling the trasportation domain. Technical Report RR-93-25, Deustsches Forschungszentrum fur Kunstliche Intelligenz GmbH. Fischer, K., J. P. M¨uller, I. Heimig, and A. Scheer. 1996. Intelligent agents in virtual enterprises. In The First International Conference on the Practical Application of Intelligent Agents and Multi Agents Technology. The Practical Application Company, Lancashire, U.K. Fisher, R. and W. Ury. 1981. Getting to Yes: Negotiating Agreement without Giving In. Houghton Mifflin, Boston. Fitzgerald, C. 2000. Emissions trading. http://www.cantor.com/ebs. Foner, L. N. 1993. What’s an agent anyway? A sociological case study. Technical Report Agents Memo 93-01, MIT, Media Laboratory. Fraser, N. M. and K. W. Hipel. 1979. Solving complex conflicts. IEEE Transaction on Systems Man and Cybernetics 9(12):805–816. French, S. 1986. Decision Theory: An Introduction to the Mathematics of Rationality. Ellis Horwood. Friedman, J. W. 1986. Game Theory with Applications to Economics. Oxford University Press, Oxford. Frost, D. and R. Dechter. 1994. In search of the best constraint satisfaction search. In Proceedings of the Twelfth National Conference on Artificial Intelligence, pages 301–306. The AAAI Press, Menlo Park, California. Fudenberg, D. and J. Tirole. 1991. Game Theory. MIT Press, Cambridge, Massachusetts. Fujishima, Y., K. Leyton-Brown, and Y. Shoham. 1999. Taming the computational complexity of combinatorial auctions: Optimal and approximate approaches. In Dean Thomas, editor, Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI-99-Vol1), pages 548–553, Morgan Kaufmann, San Francisco, California. Garey, M. R. and D. S. Johnson. 1979. Computers and Intractability: A Guide to the Theory of NP-completeness. W. H. Freedman, New York. George, A. L. 1983. Managing US–Soviet Rivalry: Problems of Crisis Prevention. Westview Press, Boulder. George, A. L. and R. Smoke. 1974. Deterrence in American Foreign Policy. Columbia University Press, New York. Georgeff, M. P. 1987. Actions, processes, and causality. In Reasonings about Actions and Plans: Proceedings of the 1986 Workshop, pages 99–122. Morgan Kaufmann, San Francisco, California. Gerber, C., C. Russ, and G. Vierke. 1999. On the suitability of market-based mechanisms for telematics applications. In Proceedings of the Third International Conference on Autonomous Agents (Agents ’99), pages 408–409. ACM Press, New York, NY. Gimenez-Funes, E., L. Godo, and J. A. Rodriguez-Aguilar. 1998. Designing bidding strategies for trading agents in electronic commerce. In ICMAS98, pages 136–143. IEEE Computer Society, Los Alamitos, California.
Ginsberg, M. 1993. Essentials of artificial intelligence. Morgan Kaufmann, San Francisco. Goldberg, D.E. 1989. Genetic Algorithms in Search Optimization and Machine Learning. AddisonWesley, Reading, Massachusetts. Golombek, M. P., R. A. Cook, T. Economou, W. M. Folkner, A. F. Haldemann, P. H. Kallemeyn, J. M. Knudsen, R. M. Manning, H. J. Moore, T. J. Parker, R. Rieder, J. T. Schofield, P. H. Smith, and R. M. Vaughan. 1997. Overview of the Mars pathfinder mission and assessment of landing site predictions. Science 278(5):1743–1748. Graham, R. L., E. L. Lawler, J. K. Lenstera, and A. H. G. Rinnooy Kan. 1979. Optimization and approximation in deterministics sequencing and scheduling: A survey. Annals of Discrete Mathematics 5:287–326. Grigg, I. and C. C. Petro. 1997. Using electronic markets to achieve efficient task distribution. In R. Hirschfeld, editor, Financial Cryptography: First International Conference, FC ’97, volume 1318 of Lecture Notes in Computer Science. Springer-Verlag, Berlin, pages 329–339. Grossman, S. and O. Hart. 1983. An analysis of the principal-agent problem. Econometrica 51(1): 7–45. Grossman, S. and M. Perry. 1986. Sequential bargaining under asymmetric information. Journal of Economic Theory 39:120–154. Grosz, B. J. and S. Kraus. 1996. Collaborative plans for complex group activities. Artificial Intelligence Journal 86(2):269–357. Groves, T. 1973. Incentives in teams. Econometrica 41(4):617–631. Guttman, R. H. and P. Maes. 1998. Cooperative vs. competitive multi-agent negotiations in retail electronic commerce. In The Second International Workshop on Cooperative Information agents (CIA98), pages 135–147. Springer-Verlag, Berlin, Germany. Hackman, J. R., editor. 1991. Groups That Work (and Those That Don’t). Jossey-Bass, San Francisco, California. Hall, Lavinia, editor. 1993. Negotiation: Strategies for Mutual Gain. Sage, Beverly Hills. Haller, H. 1986. Non-cooperative bargaining of n ≥ 3 players. Economics Letters 22:11–13. Harris, M. and A. Raviv. 1978. Some results on incentive contracts with applications to education and employment, health insurance, and law enforcement. The American Economic Review 68(1):20–30. Harsanyi, J. C. and R. Selten. 1988. General theory of equilibrium selection in games. MIT Press, Cambridge, Massachusetts. Hillier, F. S. and G. J. Lieberman. 1995. Introduction to Operations Research. McGraw-Hill, New York. Hirshleifer, J. and J. Riley. 1992. The Analytics of Uncertainty and Information. Cambridge University Press, Cambridge. Huberman, B. and S. H. Clearwater. 1995. A multi-agent system for controlling building environments. In Proceedings of ICMAS-95, pages 171–176. Huhns, M. and M. Singh, editors. 1997. Readings in Agents. Morgan Kaufmann, San Francisco, California. The AAAI Press, Menlo, California. Huhns, Michael N., Munindar P. Singh, Tomasz Ksiezyk, and Nigel Jacobs. 1994. Global information management via local autonomous agents. In Proceedings of the 13th International Workshop on Distributed Artificial Intelligence, pages 153–174. AAAI Technical Report, WS-94-02, AAAI Press, Menlo Park, California. Huyck, John B. Van, Raymond, C. Battalio, and Richard O. Beil. 1990. Tacit coordination games, strategic uncertainty, and coordination failure. The American Economic Review 80(1):234–248.
Jamison, J. C. 1997. Valuable cheap-talk and equilibrium selection. In The 17th Arne Ryde Symposium on Focal Points: Coordination, Complexity and Communication in Strategic Contexts. The Arne Ryde Foundation, Helsingborg, Sweden. Jennings, N. and Y. Lesperance, editors. 2000. Agents for Broadcasting Environment. SpringerVerlag, Berlin. Jennings, N. R. and M. J. Wooldridge. 1998. Applications of intelligent agents. In Agent Technology Foundations, Applications, and Markets. Springer-Verlag, Berlin. Jennings, Nick R. 1995. Controlling cooperative problem solving in industrial multi-agent systems using joint intentions. Artificial Intelligence Journal 75(2):1–46. Johnson, R. 1993. Negotiation basics. Sage, Newbury Park. Kahan, J. P. and A. Rapoport. 1984. Theories of coalition formation. Lawrence Erlbaum, Hillsdale, New Jersey. Kaminka, G. A. and M. Tambe. 2000. Robust agent teams via socially attentive monitoring. Journal of Artificial Intelligence Research 12:105–147. Kandori, M., G. J. Mailath, and R. Rob. 1993. Learning, mutation, and long-run equilibria in games. Econometrica 61(1):29–56. Karrass, C. L. 1970. The Negotiating Game: How to Get What You Want. Thomas Crowell, New York. Kennan, J. and R. Wilson. 1993. Bargaining with private information. Journal of Economic Literature 31:45–104. Ketchpel, S. P. 1994. Forming coalitions in the face of uncertain rewards. In Proceedings of AAAI94, pages 414–419. The AAAI Press, Menlo Park, California. Klemperer, P. 1999. Auction theory: A guide to literature. Journal of Economic Surveys, 13(3):227– 286. Klusch, M. and O. Shehory. 1996. A polynomial kernel-oriented coalition formation algorithm for rational information agents. In Proceedings of ICMAS-96, pages 157–164. The AAAI Press, Menlo Park, California. Kornfeld, W. and C. Hewitt. 1981. The scientific community metaphor. IEEE Transactions on Systems Man and Cybernetics 11(1):24–33. Kraus, S. 1996. An overview of incentive contracting. Artificial Intelligence Journal 83(2):297– 346. Kraus, S. 1997. Beliefs, time, and incomplete information in multiple encounter negotiations among autonomous agents. Annals of Mathematics and Artificial Intelligence 20(1–4):111–159. Kraus, S. and D. Lehmann. 1995. Designing and building a negotiating automated agent. Computational Intelligence 11(1):132–171. Kraus, S. and T. Plotkin. 2000. Algorithms of distributed task allocation for cooperative agents. Theoretical Computer Science 242(1-2):1–27. Kraus, S., K. Sycara, and A. Evenchik. 1998. Reaching agreements through argumentation: a logical model and implementation. Artificial Intelligence 104(1-2):1–69. Kraus, S. and J. Wilkenfeld. 1990. Modeling a hostage crisis: Formalizing the negotiation process. Technical Report UMIACS TR 90-19 CS TR 2406, Institute for Advanced Computer Studies, University of Maryland. Kraus, S. and J. Wilkenfeld. 1991. Negotiations over time in a multiagent environment: Preliminary report. In Proceedings of IJCAI-91, pages 56–61. Morgan Kaufmann, San Francisco, California. Kraus, S. and J. Wilkenfeld. 1993. A strategic negotiations model with applications to an international crisis. IEEE Transactions on Systems Man and Cybernetics 23(1):313—323. Kraus, S., J. Wilkenfeld, M. Harris, and E. Blake. 1992. The hostage crisis simulation. Simulations and Games 23(4):398–416.
Kremien, O., J. Kramer, and J. Magee. 1993. Scalable and adaptive load-sharing for distributed systems. IEEE Parallel and Distributed Technology 1(3):62–70.
Kreps, D. and R. Wilson. 1982. Sequential equilibria. Econometrica 50:863–894.
Kryazhimskii, A., A. Nentjes, S. Shibayev, and A. Tarasyev. 1998. Searching market equilibria under uncertain utilities. Technical Report IR-98-00, International Institute for Applied Systems Analysis, Austria.
Kuwabara, K. and V. Lesser. 1989. Extended protocol for multistage negotiation. In Proceedings of the Ninth Workshop on Distributed Artificial Intelligence, pages 129–161. (Unpublished collection.)
Kwerel, E. 1977. To tell the truth: Imperfect information and optimal pollution control. Review of Economic Studies 44(3):595–601.
Laffont, J. and J. Tirole. 1993. A Theory of Incentives in Procurement and Regulation. The MIT Press, Cambridge, Massachusetts.
Lander, Susan E. and Victor R. Lesser. 1992. Customizing distributed search among agents with heterogeneous knowledge. In Proceedings of the First International Conference on Information and Knowledge Management, pages 335–344. ACM Press, New York.
Landsberger, M. and I. Meilijson. 1994. Monopoly insurance under adverse selection when agents differ in risk aversion. Journal of Economic Theory 63:392–407.
Lapan, H. E. and T. Sandler. 1988. To bargain or not to bargain: That is the question. The American Economic Review 78(2):16–21.
Lebow, R. N. 1981. Between Peace and War. Johns Hopkins University Press, Baltimore.
Ledyard, J. and K. Szakaly-Moore. 1993. Designing organizations for trading pollution rights. Working paper, California Institute of Technology.
Lehmann, Daniel, Liadan Ita O'Callaghan, and Yoav Shoham. 1999. Truth revelation in rapid, approximately efficient combinatorial auctions. In Proceedings of the ACM Conference on Electronic Commerce (EC'99), pages 96–102. ACM Press, New York.
Leng, R. 1988. Crisis learning games. American Political Science Review 82(1):179–194.
Lesser, V. R. and L. D. Erman. 1980. Distributed interpretation: A model and experiment. IEEE Transactions on Computers 29(12):1144–1163.
Lesser, V. R., J. Pavlin, and E. H. Durfee. 1988. Approximate processing in real-time problem solving. AI Magazine 9(1):49–61.
Lewicki, R., D. M. Saunders, and J. W. Minton. 1999. Negotiation. Irwin/McGraw-Hill, Boston, Massachusetts.
Luce, R. D. and H. Raiffa. 1957. Games and Decisions. John Wiley and Sons, New York.
Ma, C. A. and M. Manove. 1993. Bargaining with deadlines and imperfect player control. Econometrica 61(6):1313–1339.
Macho-Stadler, I. and J. Pérez-Castrillo. 1991. Moral hazard and cooperation. Economics Letters 35:17–20.
Madrigal, V., T. Tan, and R. Werlang. 1987. Support restrictions and sequential equilibria. Journal of Economic Theory 43:329–334.
Maes, P. 1990. Situated agents can have goals. In P. Maes, editor, Designing Autonomous Agents. MIT Press, Cambridge, Massachusetts, pages 49–70.
Malone, T. W., R. E. Fikes, K. R. Grant, and M. T. Howard. 1988. Enterprise: A market-like task scheduler for distributed computing environments. In B. A. Huberman, editor, The Ecology of Computation, pages 177–205. North Holland, Amsterdam, The Netherlands.
March, S. T. and S. Rho. 1995. Allocating data and operations to nodes in distributed database design. IEEE Transactions on Knowledge and Data Engineering 7(2):305–317.
Matthews, S. 1983. Selling to risk-averse buyers with unobservable tastes. Journal of Economic Theory 30:370–400.
Mayoh, B. 1996. Artificial life and pollution control: Explorations of a genetic algorithm system on the highly parallel Connection Machine. In D. Bjorner, M. Broy, and I. V. Pottosin, editors, Perspectives of System Informatics. Springer-Verlag, Berlin.
McAfee, R. P. and J. McMillan. 1986. Bidding for contracts: A principal-agent analysis. The Rand Journal of Economics 17(3):326–338.
Minton, S., M. D. Johnston, A. B. Philips, and P. Laird. 1992. Minimizing conflicts: A heuristic repair method for constraint satisfaction and scheduling problems. Artificial Intelligence 58:161–205.
Moehlman, T., V. Lesser, and B. Buteau. 1992. Decentralized negotiation: An approach to the distributed planning problem. Group Decision and Negotiation 2:161–191.
Monderer, D. and M. Tennenholtz. 2000. Optimal auctions revisited. Artificial Intelligence Journal 120:29–42.
Moulin, B. and B. Chaib-Draa. 1996. An overview of distributed artificial intelligence. In G. M. P. O'Hare and N. R. Jennings, editors, Foundations of Distributed Artificial Intelligence. John Wiley & Sons, New York, pages 3–55.
Mullen, T. and M. Wellman. 1995. A simple computational market for network information services. In Proceedings of the First International Conference on Multiagent Systems (ICMAS-95), pages 283–289. The AAAI Press, Menlo Park, California.
Muller, J. P. 1999. The right agent (architecture) to do the right thing. In J. P. Muller, M. P. Singh, and A. S. Rao, editors, Intelligent Agents V. Springer-Verlag, Berlin, pages 211–226.
Muller, J. P., M. J. Wooldridge, and N. R. Jennings, editors. 1997. Intelligent Agents III: Proceedings of the Third International Workshop on Agent Theories, Architectures, and Languages. Springer-Verlag, Berlin.
Myerson, R. 1979. Incentive compatibility and the bargaining problem. Econometrica 47(1):61–73.
Myerson, R. 1983. Mechanism design by an informed principal. Econometrica 51:1767–1798.
Myerson, R. B. 1991. Game Theory: Analysis of Conflict. Harvard University Press, Cambridge, Massachusetts.
Nalebuff, B. and J. Stiglitz. 1983. Information, competition, and markets. American Economic Review 73(2):278–283.
NASA. 1996. EOSDIS Home Page. http://www-v0ims.gsfc.nasa.gov/v0ims/index.html.
Nash, J. F. 1950. The bargaining problem. Econometrica 18:155–162.
Nash, J. F. 1953. Two-person cooperative games. Econometrica 21:128–140.
Negishi, T. 1962. The stability of a competitive economy: A survey article. Econometrica 30(4):635–669.
Nilsson, N. J. 1998. Artificial Intelligence: A New Synthesis. Morgan Kaufmann, San Mateo, California.
O'Hare, G. M. P. and N. R. Jennings, editors. 1996. Foundations of Distributed Artificial Intelligence. John Wiley & Sons, New York.
Ohko, T., K. Hiraki, and Y. Anzai. 1995. Reducing communication load on contract net by case-based reasoning—Extension with directed contract and forgetting. In Proceedings of the First International Conference on Multi-Agent Systems. MIT Press, Cambridge, Massachusetts.
Osborne, M. J. and A. Rubinstein. 1990. Bargaining and Markets. Academic Press, San Diego, California.
Osborne, M. J. and A. Rubinstein. 1994. A Course in Game Theory. MIT Press, Cambridge, Massachusetts.
Parunak, H. van Dyke. 1987. Manufacturing experience with the Contract Net. In M. Huhns, editor, Distributed Artificial Intelligence, pages 285–310. Pitman Publishing, London, and Morgan Kaufmann, San Mateo, California.
Press, W. H., B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. 1986. Numerical Recipes: The Art of Scientific Computing. Cambridge University Press, Cambridge.
Prosser, P. 1993. Hybrid algorithms for the constraint satisfaction problem. Computational Intelligence 9:268–299.
Raiffa, H. 1982. The Art and Science of Negotiation. Harvard University Press, Cambridge, Massachusetts.
Ramamritham, K., J. A. Stankovic, and P. Shiah. 1990. Efficient scheduling algorithms for real-time multiprocessor systems. IEEE Transactions on Parallel and Distributed Systems 1(2):184–194.
Rapoport, A. 1970. N-Person Game Theory. University of Michigan Press, Ann Arbor, Michigan.
Rasmusen, E. 1989. Games and Information. Basil Blackwell, Cambridge, Massachusetts.
Rich, E. and K. Knight. 1991. Artificial Intelligence. McGraw-Hill, New York.
Ronen, Y. 1995. The use of operating-systems and operations-research techniques in meta-level control. Master's thesis, Intelligent Systems Program, University of Pittsburgh, August.
Rosenschein, J. S. 1986. Rational Interaction: Cooperation Among Intelligent Agents. Ph.D. thesis, Stanford University.
Rosenschein, J. S. and G. Zlotkin. 1994. Rules of Encounter: Designing Conventions for Automated Negotiation Among Computers. MIT Press, Cambridge, Massachusetts.
Ross, S. 1973. The economic theory of agency: The principal's problem. The American Economic Review 63(2):134–139.
Rosu, D., K. Schwan, S. Yalamanchili, and R. Jha. 1997. On adaptive resource allocation for complex real-time applications. In Proceedings of the 18th IEEE Real-Time Systems Symposium, pages 320–329. IEEE Computer Society Press, Los Alamitos, California.
Roth, A. E. 1979. Axiomatic Models of Bargaining. Springer-Verlag, Berlin.
Rothkopf, M. H., A. Pekec, and R. M. Harstad. 1995. Computationally manageable combinatorial auctions. Technical Report 95-09, DIMACS, April 19.
Rubinstein, A. 1982. Perfect equilibrium in a bargaining model. Econometrica 50(1):97–109.
Rubinstein, A. 1985. A bargaining model with incomplete information about preferences. Econometrica 53(5):1151–1172.
Rubinstein, A. and M. Yaari. 1983. Repeated insurance contracts and moral hazard. Journal of Economic Theory 30:74–97.
Russell, S. J. and P. Norvig. 1995. Artificial Intelligence: A Modern Approach. Prentice Hall, Englewood Cliffs, New Jersey.
Sandholm, T. 1993. An implementation of the contract net protocol based on marginal cost calculations. In Proceedings of AAAI-93, pages 256–262. The AAAI Press, Menlo Park, California.
Sandholm, T., K. Larson, M. R. Andersson, O. Shehory, and F. Tohmé. 1999. Coalition structure generation with worst-case guarantees. Artificial Intelligence Journal 111:209–238.
Sandholm, T., S. Sikka, and S. Norden. 1999. Algorithms for optimizing leveled commitment contracts. In Proceedings of IJCAI-99, pages 535–540. Morgan Kaufmann, San Francisco, California.
Sandholm, T. and Y. Zhou. 2000. Surplus equivalence of leveled commitment contracts. In Proceedings of ICMAS-2000, pages 247–254. IEEE Computer Society, Los Alamitos, California.
Sandholm, T. W. 1996. Limitations of the Vickrey auction in computational multiagent systems. In International Conference on Multiagent Systems (ICMAS-96), pages 299–306. The AAAI Press, Menlo Park, California.
Sandholm, T. W. and V. R. Lesser. 1995a. Coalition formation among bounded rational agents. In Proceedings of IJCAI-95, pages 662–669. Morgan Kaufmann, San Francisco, California.
Sandholm, T. W. and V. R. Lesser. 1995b. Issues in automated negotiation and electronic commerce: Extending the contract net framework. In First International Conference on Multiagent Systems (ICMAS-95), pages 328–335. The AAAI Press, Menlo Park, California.
Sandholm, T. W. and V. R. Lesser. 1997. Coalitions among computationally bounded agents. Artificial Intelligence 94(1–2):99–137. Special issue on Principles of Multi-Agent Systems.
Sandler, T., J. T. Tschirhart, and J. Cauley. 1983. Theoretical analysis of transnational terrorism. The American Political Science Review 77(1):36–54.
Santmire, T. E., J. Wilkenfeld, S. Kraus, K. Holley, T. E. Santmire, and K. S. Gleditsch. 1998. Differences in cognitive complexity levels among negotiators and crisis outcomes. Political Psychology 19(4):721–748.
Schechter, O. 1996. Sharing resources through negotiation in multi-agent environments. Master's thesis, Bar-Ilan University, Ramat-Gan, Israel.
Schrijver, A. 1986. Theory of Linear and Integer Programming. John Wiley & Sons, Chichester, England.
Schwartz, R. 1997. Negotiation about data allocation in distributed systems. Master's thesis, Bar-Ilan University, Ramat-Gan, Israel.
Schwartz, R. and S. Kraus. 1997. Negotiation on data allocation in multi-agent environments. In Proceedings of AAAI-97, pages 29–35. The AAAI Press, Menlo Park, California.
Schwartz, R. and S. Kraus. 1998. Bidding mechanisms for data allocation in multi-agent environments. In Munindar P. Singh, Anand S. Rao, and Michael J. Wooldridge, editors, Intelligent Agents IV: Agent Theories, Architectures, and Languages. Springer-Verlag, Berlin, pages 61–75.
Sen, S., T. Haynes, and N. Arora. 1997. Satisfying user preferences while negotiating meetings. International Journal of Human-Computer Studies 47(3):407–427.
Sen, S. and E. Durfee. 1996. A contracting model for flexible distributed scheduling. Annals of Operations Research 65:195–222.
Sengupta, U. and N. V. Findler. 1992. Multi-agent planning and collaboration in dynamic resource allocation. In J. Hendler, editor, Artificial Intelligence Planning Systems: Proceedings of the First International Conference (AIPS 92), pages 305–306. Morgan Kaufmann, San Francisco, California.
Shaked, A. and J. Sutton. 1984. Involuntary unemployment as a perfect equilibrium in a bargaining model. Econometrica 52(6):1351–1364.
Shavell, S. 1979. Risk sharing and incentives in the principal and agent relationship. Bell Journal of Economics 10:55–79.
Shehory, O. and S. Kraus. 1995. Task allocation via coalition formation among autonomous agents. In Proceedings of IJCAI-95, pages 655–661. Morgan Kaufmann, San Francisco, California.
Shehory, O. and S. Kraus. 1998. Methods for task allocation via agent coalition formation. Artificial Intelligence 101(1–2):165–200.
Shehory, O. and S. Kraus. 1999. Feasible formation of stable coalitions among autonomous agents in non-super-additive environments. Computational Intelligence 15(3):218–251.
Shimomura, K. I. 1995. The bargaining set and coalition formation. Technical Report 95-11, Brown University, Department of Economics.
Shoham, Y. 1993. Agent-oriented programming. Artificial Intelligence 60(1):51–92.
Shubik, M. 1982. Game Theory in the Social Sciences: Concepts and Solutions. MIT Press, Cambridge, Massachusetts.
Siegelmann, H. and O. Frieder. 1992. Document allocation in multiprocessor information retrieval systems. Technical Report IA-92-1, George Mason University.
Sierra, C., P. Faratin, and N. Jennings. 1997. A service-oriented negotiation model between autonomous agents. In Proceedings of the 8th European Workshop on Modeling Autonomous Agents in a Multi-Agent World (MAAMAW-97), pages 17–35. Springer-Verlag, Berlin.
Singh, M., A. Rao, and M. J. Wooldridge, editors. 1998. Intelligent Agents IV: Agent Theories, Architectures, and Languages: 4th International Workshop. Springer, New York.
Smith, R. and R. Davis. 1981. Framework for cooperation in distributed problem solvers. IEEE Transactions on Systems, Man, and Cybernetics C-29(12):61–70.
Smith, R. G. 1980. The contract net protocol: High-level communication and control in a distributed problem solver. IEEE Transactions on Computers 29:1104–1113.
Smith, R. G. and R. Davis. 1983. Negotiation as a metaphor for distributed problem solving. Artificial Intelligence 20:63–109.
Snyder, G. H. and P. Diesing. 1977. Conflict Among Nations: Bargaining, Decision Making and System Structure in International Crises. Princeton University Press, Princeton, New Jersey.
Sonenberg, E., G. Tidhar, E. Werner, D. Kinny, M. Ljungberg, and A. Rao. 1992. Planned team activity. Technical Report 26, Australian Artificial Intelligence Institute, Australia.
Spence, M. and R. Zeckhauser. 1971. Insurance, information, and individual action. The American Economic Review 61(1):380–391.
Stankovic, J. A., M. Spuri, M. Di Natale, and G. C. Buttazzo. 1995. Implications of classical scheduling results for real-time systems. IEEE Computer 28(6):16–25.
Stein, J. G. and R. Tanter. 1980. Rational Decision Making: Israel's Security Choices 1967. Ohio State University Press, Columbus.
Subrahmanian, V. S., P. Bonatti, J. Dix, T. Eiter, S. Kraus, F. Ozcan, and R. Ross. 2000. Heterogeneous Agent Systems: Theory and Implementation. MIT Press, Cambridge, Massachusetts.
Sullivan, D. G., A. Glass, B. J. Grosz, and S. Kraus. 1999. Intention reconciliation in the context of teamwork: An initial empirical investigation. In M. Klusch, O. Shehory, and G. Weiss, editors, Cooperative Information Agents III. Springer-Verlag, Berlin, pages 138–151.
Sullivan, D. G., B. J. Grosz, and S. Kraus. 2000. Intention reconciliation by collaborative agents. In Proceedings of ICMAS-2000, pages 293–300. IEEE Computer Society Press, Los Alamitos, California.
Sycara, K. and D. Zeng. 1996. Coordination of multiple intelligent software agents. International Journal of Intelligent and Cooperative Information Systems 5:181–211.
Sycara, K. P. 1987. Resolving Adversarial Conflicts: An Approach to Integrating Case-Based and Analytic Methods. Ph.D. thesis, School of Information and Computer Science, Georgia Institute of Technology.
Sycara, K. P. 1990. Persuasive argumentation in negotiation. Theory and Decision 28:203–242.
Tanaev, V. S., Y. N. Sotskov, and V. A. Strusevich. 1994. Scheduling Theory, Multi-Stage Systems. Kluwer Academic Publishers, The Netherlands.
Thompson, L. 1990. Negotiation behavior and outcomes: Empirical evidence and theoretical issues. Psychological Bulletin 108(3):515–532.
Tirole, Jean. 1988. The Theory of Industrial Organization. MIT Press, Cambridge, Massachusetts.
Tsvetovatyy, M., M. Gini, B. Mobasher, and Z. Wieckowski. 1997. Magma: An agent-based virtual market for electronic commerce. Applied Artificial Intelligence 6:501–523. Special issue on intelligent agents.
Tsvetovatyy, M. B. and M. Gini. 1996. Toward a virtual marketplace: Architectures and strategies. In The First International Conference on the Practical Application of Intelligent Agents and Multi-Agent Technology, pages 597–614. The Practical Application Company, Lancashire, U.K.
Varian, H. R. 1992. Microeconomic Analysis, third edition. W. W. Norton, New York.
Vere, S. and T. Bickmore. 1990. A basic agent. Computational Intelligence 6:41–60.
Vickrey, William. 1961. Counterspeculation, auctions, and competitive sealed tenders. Journal of Finance 16:8–37.
Vohra, R. 1995. Coalitional non-cooperative approaches to cooperation. Technical Report 95-6, Brown University, Department of Economics.
Walras, L. 1954. Elements of Pure Economics (W. Jaffé, tr.). George Allen and Unwin, London, U.K.
Weingartner, H. M. and D. N. Ness. 1967. Methods for the solution of the multi-dimensional 0/1 knapsack problem. Operations Research 15(1):83–103.
Weiss, G., editor. 1999. Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence. MIT Press, Cambridge, Massachusetts.
Wellman, M. 1992. A general-equilibrium approach to distributed transportation planning. In Proceedings of AAAI-92, pages 282–289. The AAAI Press, San Mateo, California.
Wellman, M. 1993. A market-oriented programming environment and its application to distributed multicommodity flow problems. Journal of Artificial Intelligence Research 1:1–23.
Wellman, M. 1996. Market-oriented programming: Some early lessons. In S. Clearwater, editor, Market-Based Control: A Paradigm for Distributed Resource Allocation. World Scientific, Singapore.
Wellman, M. P. and P. R. Wurman. 1998. Market-aware agents for a multiagent world. Robotics and Autonomous Systems 24:115–125.
Wilkenfeld, J., M. Brecher, and S. Moser. 1988. Crises in the Twentieth Century, Vol. II: Handbook of Foreign Policy Crises. Pergamon Press, Oxford.
Wilkenfeld, J., S. Kraus, K. Holley, and M. Harris. 1995. Genie: A decision support system for crisis negotiations. Decision Support Systems 14:369–391.
Wilson, R. 1985. Incentive efficiency of double auctions. Econometrica 53(5):1101–1116.
Winston, P. H. 1984. Artificial Intelligence. Addison-Wesley, Reading, Massachusetts.
Wooldridge, M., J. Muller, and M. Tambe, editors. 1996. Intelligent Agents II: Proceedings of the Second International Workshop on Agent Theories, Architectures, and Languages. Springer-Verlag, Berlin.
Wooldridge, M. J. and N. R. Jennings. 1995a. Agent theories, architectures and languages: A survey. In Intelligent Agents. Springer-Verlag, Berlin, pages 1–39.
Wooldridge, M. J. and N. R. Jennings, editors. 1995b. Intelligent Agents. Springer-Verlag, Berlin.
Wurman, P. R., W. E. Walsh, and M. P. Wellman. 1998. Flexible double auctions for electronic commerce: Theory and implementation. Decision Support Systems 24:17–27.
Ygge, F. 1998. Market-Oriented Programming and Its Application to Power Load Management. Ph.D. thesis, Lund University.
Yoon, K. P. and C. L. Hwang. 1995. Multiple Attribute Decision Making: An Introduction. Sage, Thousand Oaks.
Young, P. 1993a. An evolutionary model of bargaining. Journal of Economic Theory 59:145–168.
Young, P. 1993b. The evolution of conventions. Econometrica 61(1):57–84.
Zeng, Dajun and Katia Sycara. 1998. Bayesian learning in negotiation. International Journal of Human-Computer Studies 48:125–141.
Zhao, W., K. Ramamritham, and J. Stankovic. 1987. Preemptive scheduling under time and resource constraints. IEEE Transactions on Computers 36(8):949–960.
Zhou, L. 1994. A new bargaining set of an n-person game and endogenous coalition formation. Games and Economic Behavior 6:512–526.
Zhou, S. 1988. A trace-driven simulation study of dynamic load balancing. IEEE Transactions on Software Engineering 14(9):1327–1341.
Zlotkin, G. and J. S. Rosenschein. 1991. Cooperation and conflict resolution via negotiation among autonomous agents in noncooperative domains. IEEE Transactions on Systems, Man, and Cybernetics 21(6):1317–1324. Special issue on Distributed Artificial Intelligence.
Zlotkin, G. and J. S. Rosenschein. 1993. A domain theory for task oriented negotiation. In Proceedings of IJCAI-93, pages 416–422. Morgan Kaufmann, San Francisco, California.
Zlotkin, G. and J. S. Rosenschein. 1994. Coalition, cryptography, and stability: Mechanisms for coalition formation in task oriented domains. In Proceedings of AAAI-94, pages 432–437. The AAAI Press, Menlo Park, California.
Index
acid rain, 209 action, 11–13, 15–17, 27, 81, 83, 94, 95, 97, 112, 113, 118, 133, 159, 160, 164, 172, 215–217, 228 action profile, 12, 13 agent, 3, 4–6, 17, 67, 213 A, 68–71, 74, 75, 84, 118 W , 68, 69, 71, 74, 75, 77, 80, 84, 118 architecture, 4 automated, 3, 20 autonomous, 5 beliefs, 26, 27 bounded rational, 227 deliberative architecture, 4 designer, 40, 41, 48 goal, 159 hybrid architecture, 4, 5 interacting, 5 layered, 5 negotiation strategy, 80 negotiator, 26, 32, 122 personal, 1 preferences, 5, 68, 131, 132, 160 rational, 2, 3, 5, 19, 23, 25 reactive architecture, 4, 5 self-interested, 3, 5, 6, 8, 25–27, 29, 61, 172, 226, 227, 229 services, 4 type, 80, 95 strongest, 84 utility function, 33 Agents, 17, 32, 69, 80, 81, 94, 159–162, 168–170, 216, 218, 219, 242 agreement, 17, 19, 21, 24, 30–39, 54, 64–67, 68, 69, 71, 72, 75, 85, 88, 116, 117, 118, 119, 123, 127, 129, 150, 152, 157, 159, 161, 162, 213, 216, 217, 221 acceptable, 163 allocation, 30 cost, 160 cost over time, 70 efficiency, 29 mutually beneficial, 2, 3, 173 resource, 67, 69, 74, 92, 217 allocation, 18, 29, 30, 32, 32, 33, 34 alternating offers, 17, 23, 24, 32, 39 alternating offers protocol, 30 Andersson, M. R., 221, 226 anytime algorithm, 226 Anzai, Y., 229 Apers, P. M. G., 61 Arkin, Ronald C., 1 Arrow, Kenneth J., 64, 227, 228 artificial intelligence, 235
Attached Agent, 68, 68, 118, 124 auction, 209, 210, 221–225 combinatorical, 222 double, 223 Dutch, 222 English, 222 first price sealed bid, 222 first-price sealed-bid, 221 protocol, 223 second-price sealed bid, 222, 224 strategy, 221 Vickery, 222 virtual house, 221 auctioneer, 195, 201, 208 Aumann, Robert J., 113, 235 Axelrod, Robert, 2 Azoulay-Schwartz, Rina, 18, 21, 45, 232 backtracking algorithm, 45, 45, 48, 63, 184, 185, 189, 190, 192, 203, 206 backtracking algorithm on sub-problems, 45, 45, 47, 48 Baiman, S., 228 Balch, Tucker, 1 Banerjee, A., 228 bargaining, 2, 219, 236, 242 Bayesian rule, 27 Beggs, A., 228 belief, 80 probabilistic, 80, 193 beliefs, 94 system of, 81 Ben-Or, Michael, 18 Berns, K., 7 Bhaska, V., 40 Bickmore, T., 4 bid, 221 bidding mechanism, 66, 193 Blake, Elizabeth, 213 Bonatti, Piero, 3 Bond, Alan H., 2, 235, 241 Bratman, Michael E., 4 Brazier, Frances, 1 Brecher, Michael, 219, 220 broadcast, 30, 41, 58 Brooks Rodney, 5 Buttazzo, Giorgio C., 154 buyer, 221–223, 228 c A , 70, 74–76 cW , 70 Caillaud, B., 228 Cammarata, Stephanie, 171 Carver, N., 171
case-based reasoning, 27 Casey, R. G., 60 central controller, 29, 67 central cotroller, 29 centralized controller, 8 centralized solution, 25 Ceri, S., 33, 61, 62 Charniak, Eugean, 235 Chatterjee, K., 232 Chavez, Andrea, 4, 6, 116 cheap talk, 40, 41 Chu, W. W., 60 Clarke, E., 64 clearing prices, 195–197, 199 client, 29, 31, 33, 38, 57, 58, 242 coalition, 6, 225 core, 226 grand, 227 kernel, 226 Shapley value, 226 coalition formation, 225–227 Coffman, E. G., 172 commitment, 27, 172 common belief, 19 communication, 4 communication cost, 21 communication network, 31, 61, 66 competition perfect, 194 competitive equilibrium, 10, 193, 194, 206 Competitive Equilibrium Market mechanism (CEM), 194, 195, 199, 200, 202, 203, 206, 207 complete information, 29, 30, 38, 66, 73, 175, 181, 182, 184, 202, 203, 206, 211 complexity, 25, 42, 172, 202 computation time, 227 conflict, 3, 9, 25, 26, 155, 159, 175, 221, 223 conflict allocation, 18, 32, 35–37, 47, 76, 180, 181, 182, 184, 185, 193, 195–199, 203, 206 conflicting reports, 58 Conry, Susan E., 26, 115 consistency, 83 constant gain, 21 constraint satisfaction problem, 63 constraints, 176, 179, 185, 189, 190, 200–202, 206 special circumstances, 176, 180, 183, 184 usual circumstances, 176, 179 contract, 227 Contract Net protocol, 171, 228 contractor, 228 controller, 3 convention, 40
cooperation, 1, 2, 5, 231 coordination, 1, 2, 5, 8, 231 Copeland, Thomas E., 21 Cormen, T. H., 236 cost, 159, 160, 172 answer, 33 communication, 8, 35 computation, 8, 35 negotiation, 35 retrieval, 33 storage, 7, 30, 33, 35, 36, 46, 54 counteroffer, 2, 17, 20 Cramton, P., 209, 210 Crowston, K., 171 Dasgupta, P., 210 data, 29 data allocation, 2, 18, 61, 90, 167, 221 Data Allocation Environment, 31 data allocation problem, 29, 31, 40, 45, 66, 76, 160, 180–182, 193 database, 57, 61, 63 dataset, 31, 32–36, 38, 43–46, 48, 51, 53, 54, 56–58, 62, 66, 221 local, 38, 54, 58, 58, 59, 60 old, 31, 32 remote, 58, 58, 59 usage, 35, 38, 46, 48, 51, 52, 57–59, 90 Davis, R., 171, 228 deadline, 118, 120–126, 128–133, 142, 149–154, 161 Dechter, Rina, 63 decision making, 10 distributed, 8 decision theory, 10, 236 decision-regret, 20 Decker, Keith S., 1, 171 delivery company, 172 delivery time, 33 demand function, 194, 203, 208 Demski, J., 228 deviation, 13, 16, 24, 25, 39, 40, 42, 49, 50 Diesing, P., 219 Diplomacy, 27 disagreement, 19, 31, 34, 69, 126, 159, 162, 168, 181, 216 distance virtual, 33 Distributed Artificial Intelligent, 3, 235 distributed constraint satisfaction problem, 26 distributed file allocation, 33, 60 Distributed Problem Solving, 26, 171, 226–228, 241 distributed systems, 114, 172
261
Foster, D. V., 33, 61 Fraser, N. M., 220 French, Simon, 236 Frieder, Ophir, 63 Friedman, J. W., 236 Frost, Daniel, 63 Fudenberg, Drew, 95, 235 game, 11–13, 15 coalitional form, 12 extensive form, 11, 13, 14, 16 chance move, 15 chance node, 15 decision node, 15 information set, 15 path, 15 strategy, 16 strategy profile, 16 terminal nodes, 15 history, 16 N-person, 226 player, 11, 12 strategic form, 12, 13 game theory, 2, 10, 11, 25, 27, 30, 40, 41, 112, 175, 221, 226, 235, 241 concepts, 10 Garey, Michael R., 41, 44, 236 Gasser, Les, 2, 235, 241 generalized Nash product, 41, 43, 48, 49, 65, 76, 77, 182–184, 186, 188–192, 205, 207, 242 genetic algorithm, 45, 47, 48, 63 George, A. L., 219 Georgeff, Michael P., 5 Gerber, C., 225 Gimenez-Funes, E., 221, 232 Gini, Maria, 6 Ginsberg, M., 235 Glass, Alyssa, 244 Gleditsch, Kristian S., 213 goal, 27, 68, 117, 118, 120, 121–124, 129, 142, 153, 157 tmax , 121 tmin , 121 deadline, 121 identification number, 120 Godo, L., 232 Goldberg, D. E., 45 Golombek, Michele P., 7 Graham, R. L., 172 greedy algorithm, 61, 62 Grigg, I., 172 Grossman, Sanford J., 227, 232 Grosz, Barbara J., 172, 244
262
group, 172 Groves, T., 64 Guttman, R. H., 221 Hackman, Richard J., 3, 242 Hall, Lavinia, 2, 236 Haller, Hans, 38 Harris, Michael, 213, 216, 218 Harris, Milton, 228 Harsanyi, John C., 40 Hart, Oliver D., 227, 235 heuristic, 30, 42, 45 Hewit, Carl E., 115 hill-climbing algorithm, 45, 45, 47, 48, 51, 56, 63, 184–186, 189, 190 Hillier, F. S., 62 Hipel, K. W., 220 Hiraki, K., 229 Hirshleifer, J., 227 history, 81, 94 Holley, Kim, 213, 216, 218 Hostage Crisis, 11, 13, 213, 214, 216–219, 232 Huhns, Michael N., 172, 235 Hwang, C. L., 236 incentive compatible, 64, 222, 223 Bayesian, 65 incomplete information, 24, 26, 30, 38, 57, 64–66, 80, 90, 175, 192, 203–205, 211, 223, 224, 231 individual rational, 223 information server, 3, 29, 221 intention, 27 interest rate, 20, 21, 35, 37 Internet, 1, 3, 29, 66, 172, 221 Israel, David J., 4 Jamison, J. C., 40 Jennings, Nick R., 1, 3, 4, 6, 27, 172, 235 Johnson, David S., 41, 44, 236 Johnson, R., 2, 236 Kahan, James P., 12, 226 Kaminka, Gal A., 1 Kandori, M., 40 Karrass, Chester L., 2, 236 Kennan, J., 65 Kerr. S, 210 Ketchpel, S. P., 227 Klusch, Matthias, 6, 226 Knapsack problem, 62 Knight, K., 235 Kornfeld, William A., 115
Kraus, Sarit, 3, 11, 13, 18, 21, 27, 171, 172, 213, 214, 216, 218, 219, 225–227, 229, 232, 241, 244 Kreps, David M., 82 Kryazhimskii A., 210 Kuhn, N., 172 Kuwabara, Kazuhiro, 26, 115 Kwerel, E., 210 Laffont, J., 227 Lander, Susan E., 26 Landsberger, M., 228 Lapan, Harvey E., 220 Larson, K., 226 learning, 27, 232 Lebow, R. N., 219 Ledyard, John, 210 Lehmann, Daniel, 27, 241 Leng, R., 219 Lesperance, Yves, 235 Lesser, Victor R., 26, 27, 115, 171, 225, 227 Lewicki, R., 236 Li, Jin, 1 liar, 59 Lieberman, G. J., 62 Linial, Nati, 18 Luce, Robert D., 13, 23, 94, 235 Ma, C. A., 232 Macho-Stadler, I., 228 Maes, Patti, 4–6, 116, 221 Malone, Thomas W., 171, 229 manager, 228 Manove, M., 232 March, S. T., 63 market mechanism, 175, 193, 195, 199, 201, 202, 208, 211 Market-Clearing with Intermediate Exchange (MCIE), 195, 199, 200, 202, 203, 206, 207 Market-Clearing with Intermediate Transactions (MCIT), 195, 199–203, 206, 207 market-oriented programming, 65, 66, 175, 224, 225 Maryanski, Fred J., 33, 61, 62 Matthews, S., 228 Mayoh, B., 210 McArthur, David, 171 McDermott, Drew, 235 McAfee, R. P., 228 McMillan, J., 228
mechanism bidding, 66 Clarke tax, 64 incentive compatible, 64 lottery, 65 mediator, 8, 76, 214, 215 Meilijson, I., 228 Minton, S., 45, 63 mixed strategies, 94, 103–105, 107, 112, 113 Moehlman, T., 26 monetary system, 9, 20, 121, 200, 221, 224 money transfer, 9, 221, 223, 225, 232 Moser, S., 220 Mullen, T., 66 Muller, Jourge P., 5, 6, 235 Multi-Agent Systems, 226 multiagent system, 235 Multiagent Systems, 26 Myerson, Robert B., 10, 12, 65, 228, 235 Nalebuff, B., 228 NASA, 46, 72, 73, 77, 89, 97, 100, 101, 106, 119 Nash equilibrium, 13, 13, 16, 16, 23, 40, 58 Nash, John F., 13, 23, 40, 41, 65, 76, 182–184, 242 near-optimal solutions, 184 Negotiation leave, 239 negotiation, 2, 3, 5–8, 11, 13, 25, 26, 31, 73, 117–119, 131, 143, 146, 154–156, 167, 182, 183, 231, 236 beginning, 30, 124, 125, 129 beliefs, 26 bilateral, 67, 68, 117, 118, 157, 159, 162, 166 cost, 2, 34, 35, 120, 123, 126, 130, 217 efficiency, 8, 231 end, 133–135, 149, 157 finite horizon, 21 guides, 2 heuristics, 241 leave, 118, 120, 122, 127, 131–135, 137, 138, 140, 145–147, 149–153 multiple-encounters, 94 no revelation, 65 opt out, 17 outcome, 19, 21, 23, 40, 122, 132 process, 18 protocol, 8, 9, 17, 26, 119 resource, 117 strategy, 9, 29, 133 time, 2, 8, 127, 161, 231 negotiator, 2 Ness, D. N., 62
Nilsson, Nils, 235 Norden, S., 229 norm, 175, 208 norms, 175 Norvig, Peter, 235 NP-complete, 41, 43, 60, 61, 182, 222, 226, 236 O’Hare, Greg M. P., 235 offer, 2, 17, 18, 20, 37–39, 68, 74, 118 Ohko, T., 229 operations research, 62, 114 opt out, 17, 19, 21, 25, 69, 71, 88, 118, 119, 131, 132, 143, 153, 156, 160 optimal solution, 49, 62 opting out, 19, 21, 29, 30, 36–39, 67, 70, 160, 161 cost, 36 utility, 37, 59, 64 Osborne, Martin J., 2, 10, 12, 23–25, 82, 235 Ozcan Fatma, 3 ℘i (H ), 81 P´erez-Castrillo, J., 228 Pareto optimal, 40, 41, 74, 223 Parunak, van Dyke H., 229 Pavlin, Jasmina, 115, 171 perfect recall, 16 Perry, Motty, 232 Petro, C. C., 172 planning distributed, 26 plant, 175–180, 182–185, 189, 244 player, 13, 16 action, 12 Plotkin, Tanya, 171 pollutant, 175–179, 184–186, 190–195, 197–202, 204, 206–209 pollution, 175, 179, 182, 184–186, 192, 193, 196, 202, 208–210, 231 pollution allocation, 3, 179, 231 Pollution Allocation Environment, 177 pollution allocation problem, 175, 180, 193, 207, 225 Possiblet , 22, 71, 128, 132 Press, W. H., 184 price, 31, 223, 224 private information, 30, 57 probabilistic belief, 80 Prosser, Patrick, 45, 63 protocol, 5, 8, 9, 17, 18, 24, 25, 156, 222 auction, 221 pure strategies, 94, 102, 104, 108, 112, 113 pure strategy, 95
query, 29, 31, 33, 46, 61, 242 price, 33, 54 ρ, 195 Raiffa, Howard, 2, 13, 20, 23, 94, 235, 236 Rao, Anand, 235 Rapoport, Amnon, 12, 226 Rasmusen, Eric, 10, 12, 112, 227, 235 Raviv, Artur, 228 real-time systems, 114 reputation, 95 reservation price, 26, 27, 223 resource, 29, 41, 65, 66, 68, 117, 221 usage, 67, 68, 77–80, 91–93, 108, 117, 126, 155 resource allocation, 3, 113, 117, 126, 154, 157, 160, 231 distributed, 116, 225 resource allocation problem, 216 revelation, 65 revelation mechanism, 30, 57, 66, 90, 192 Rho, S., 63 Rich, E., 235 Riley, J., 227 robot, 1, 67, 76, 90, 106, 107, 115, 150 Rodriguez-Aguilar, J. A., 232 Ronen, Yagil, 154 Rosenschein, Jeff S., 6, 20, 27, 64, 113, 172, 227, 242 Ross, Robert, 3 Ross, Stephen A., 227 Roth, Alvin E., 2, 236 Rubinstein, Ariel, 2, 9, 10, 12, 17, 23–25, 32, 82, 228, 235 Russ, C., 225 Russel, Stuart J., 235 S, 19, 32, 33, 35, 68, 121, 159, 161 sˆ W,t , 71, 74, 75, 79, 85, 128 sˆ i,t , 22, 128 s˜ W,t , 71 s˜i,t , 22, 22, 120, 128, 128, 161 Samuelson, Larry, 232 Sandholm, Tuomas W., 1, 27, 171, 172, 221, 225–227, 229 Sandler, Todd, 220 Santmire, Tara E., 213 Santmire, Toni E., 213 Schrijver, A., 184, 202 search, 26, 30, 41 binary, 196, 202, 244 distributed, 26 seller, 221, 223, 228 Selten, Reinhard, 40
Sen, Sandip, 1, 229, 242 Sengupta, Uttam, 116 sequential equilibrium, 24, 81–83, 83, 87, 88, 90, 96, 98, 99, 103–105 108–110, 112 sequential rationality, 82 sequential response protocol, 167, 182, 184, 192, 202, 241 SERV, 31 server, 29–31, 56 distance, 33, 46, 52, 53 geographical area, 31 Shaked, A., 25, 160 shared plans, 172 Shavell, S., 228 Shechter, Orna, 129, 130, 133, 153, 244 Shehory, Onn, 1, 6, 172, 225–227 Shimomura, K. I., 226 Shoham, Yoav, 4 Shubik, M., 236 side payments, 166, 184, 186, 191, 192, 206, 207, 243 Siegelmann, Hava T., 63 Sierra, Carles, 6, 27 Sikka, S., 229 Simplex, 184–186, 202 simulation, 30, 46, 48, 51, 63, 76, 90, 108, 117, 153, 155, 182, 183, 186, 189, 198, 201, 202, 213, 232 simultaneous response protocol, 17, 182, 184, 192, 202, 231 Singh, Munindar P., 235 Smith, Raid G., 171, 228 Smoke, R., 219 Snyder, G. H., 219 social-welfare criterion, 30, 41, 42, 48–50, 61, 64, 184 SPE, 24, 75, 161, 162 special circumstances, 178, 179, 183, 184, 194 Spence, M., 228 stability, 9, 232 standard deviation, 52, 54, 156, 183, 186 distance, 53, 54 usage, 52, 53 state information, 13 state-oriented domain, 27 static allocation, 29, 30, 32, 36, 42, 46, 49 steady state, 13 Steeb, Randall, 171 Stein, J. G., 219 Stiglitz, J., 228 strategic negotiation, 30, 67 strategic negotiation model, 213
strategic-negotiation model, 2, 3, 6, 17, 17, 18, 29, 66, 175, 181, 202, 208, 211, 221, 223, 224, 231 strategies stability, 9 strategy, 5, 9, 16–18, 23, 26, 29, 80, 94, 133, 135–138, 140, 145–147 mixed, 94 pure, 94 strategy profile, 23, 23, 24, 30 subgame perfect equilibrium, 23, 24, 24, 38–40, 76, 117, 127, 128, 132, 133, 143, 147, 162, 167, 182 Subrahmanian, V.S., 3 Sullivan, Dave G., 244 Sutton, J., 25, 160 Sycara, Katia P., 1, 2, 6, 27, 232 symbolic model, 4 system of beliefs, 81 Szakaly-Moore, Kristin, 210 Tˆ , 161–163, 166–169 T , 17, 19, 21, 71, 74, 75, 80, 89, 121, 131, 217 Type, 80 Tambe, Milind, 1, 235 Tanaev, V. S., 172 Tanter, R., 219 task, 159, 221, 227 task allocation, 171 task distribution, 3, 160, 171–173, 182, 231 task-oriented domain, 27 tatonnement, 194, 195, 244 teamwork, 172, 244 termination condition, 18 terrorists, 213, 214, 220 threat, 24, 68 time period, 18, 19, 23, 32, 33, 42, 69, 71, 76, 161, 217 first, 18, 25, 30, 36, 40, 66, 71, 75, 76, 79, 81, 90, 108, 117, 125, 127, 130, 146, 150, 217, 218, 244 second, 67, 71, 76, 79, 81, 89, 96, 98, 100–102, 111, 116, 117, 217, 218, 244 time preference, 21 Tirole, Jean, 24, 95, 227, 228, 235 Tohm, F., 226 tree, 13–15 Tsvetovatyy Maksim B., 6 U i , 18, 20, 20, 21, 22, 31, 33–36, 39, 42, 43, 69, 71, 80, 121, 128, 159–161, 167–169, 216, 239, 242
uncertainty usage, 38 Ury, William, 2, 236 user preferences, 4 usual circumstances, 178–180, 183 utility, 11 coalition, 226 conflict, 32 utility function, 8, 17, 20, 20, 21, 27, 31, 33, 35, 43, 69, 90, 117, 121, 126, 129, 155, 167, 178, 181, 181, 186, 211, 216 additive, 63 agreement, 35 allocation, 34 assumptions, 33, 69, 99 attributes, 84 basic, 177 constant cost of delay, 69, 84 constant gain of delay, 69, 84 constraints, 178 fixed time cost, 20 linear, 182, 189, 190 one dataset, 34 opponent, 26 properties, 35 resource allocation, 69, 90 robot, 77, 106, 108 server, 21, 31, 33, 41 textbf, 159 time constant discount rate, 20, 24, 35 time effect, 35 Varian, H. R., 193, 194 vcosts(alloc), 46 vcosts ration, 46 Vere, S., 4 Vierke, G., 225 Vohra, R., 226 voting protocol, 64 Waiting Agent, 68, 118 Wallras, Leon, 194 Walsh, W. E., 221, 223 weather, 175 Weingartner, H. M., 62 Weiss, Gerhard, 235 Weld, Daniel S., 3, 4 Wellman, Michael P., 65, 66, 172, 175, 194, 221, 223, 225, 244 Weston, Fred J., 21 Wilkenfeld, Jonathan, 11, 13, 213, 214, 216, 218–220 Wilson, R., 65
Wilson, Robert, 82 Winston, P. H., 235 Wooldridge, Michael, 3, 4, 235 worth-oriented domain, 27 Wurman, P. R., 221, 223, 225 Yaari, M., 228 Ygge, F., 207
Yoon, K. P., 236 Young, Peyton, 40 Zeckhauser, R., 228 Zeng, Dajun, 1, 6, 27, 232 Zhou, L., 226 Zhou, Y., 229 Zlotkin, Gilad, 6, 20, 27, 113, 172, 227, 242