Recent Advances in Agent-based Negotiation: Formal Models and Human Aspects 9811604703, 9789811604706

This volume comprises carefully selected and reviewed outcomes of the 12th International Workshop on Agent-based Complex Automated Negotiations (ACAN 2019), held in conjunction with IJCAI 2019.


English Pages 131 [125] Year 2021


Table of contents :
Preface
Contents
Editors and Contributors
Human Aspects in Agent-based Negotiation
Effect of Awareness of Other Side's Gain on Negotiation Outcome, Emotion, Argument, and Bidding Behavior
1 Introduction
2 Related Work
3 Human–Human Negotiation
3.1 Awareness of Opponent's Gain
3.2 Emotions in Negotiation
3.3 Argumentation in Negotiation
4 Bidding Behavior
5 Structured Human Negotiations
6 Experimental Evaluation
6.1 Experimental Setup
6.2 Analysis of Negotiation Outcome
6.3 Analysis of Arguments
6.4 Analysis of Bidding Behavior
6.5 Analysis of Emotion
6.6 Analysis of Questionnaire
7 Conclusion and Future Work
References
Facilitation in Abstract Argumentation with Abstract Interpretation
1 Introduction
1.1 Abstract Interpretation as Facilitation
2 Technical Preliminaries
2.1 Abstract Argumentation
2.2 Order and Galois Connection for Abstract Interpretation
3 Argumentation Frameworks for Abstraction
3.1 Lattices
3.2 Abstraction and Concretisation
3.3 Computation of Abstract Space Argumentation Frameworks from a Concrete Space Argumentation Framework
3.4 Preferred Sets in Concrete and Abstract Spaces
3.5 Comparisons to Dung Preferred Semantics and cf2 Semantics, and Observations
4 Conclusion with Related Work
4.1 Related Work
4.2 Conclusion
References
How to Recognize and Explain Bidding Strategies in Negotiation Support Systems
1 Introduction
2 Related Work
3 Typical Bidding Strategies
4 Bids, Utilities and Moves
4.1 Optimal Bidding Strategy Recognition
5 Expectations and Aberrations
6 Generating Explanations
7 Evaluation
7.1 Preparation
7.2 Conditions
7.3 Metrics
8 Results
8.1 Pilot 1
8.2 Pilot 2
8.3 Full Experiment
9 Conclusion
References
Negotiation Frameworks, Strategies, and Recommenders
NegMAS: A Platform for Situated Negotiations
1 Introduction
2 Situated Negotiations
3 System Design
3.1 Issues and Outcomes
3.2 Utility Functions
3.3 Negotiators
3.4 Controllers
3.5 Agents
3.6 Mechanisms
3.7 Worlds
4 Tools and Common Components
5 Applications: Focus on SCML
6 Using NegMAS for Developing SCM Agents
7 Conclusions
References
Can a Reinforcement Learning Trading Agent Beat Zero Intelligence Plus at Its Own Game?
1 Introduction
2 Trading Agents
2.1 ZI Agents
2.2 ZIP Agents
2.3 RL Agents (ZIQ+)
3 Setup
4 Results
5 Conclusions
References
Negotiation in Hidden Identity: Designing Protocol for Werewolf Game
1 Introduction
2 Background
2.1 Werewolf Game: Hidden Identity in Communication Game
2.2 Game AI Studies: From Chess to Werewolf
3 Model of Werewolf Games
3.1 Basic Rule on Close-Rule Werewolf Game
3.2 Lack of Objective Resources
3.3 Reasoning for Modeling the Intentions of Others
3.4 Persuasion as Modeling Self from the Perspective of Others
3.5 Requirements for a Werewolf Game Protocol
4 Werewolf Game Protocol
4.1 Word
4.2 Sentence
4.3 Operator
4.4 Grammar Notes
4.5 About Omitting Subjects (UNSPEC)
4.6 Example Sentences
5 Conclusion
References
Multi-Agent Recommender System
1 Introduction
2 Literature Review
2.1 Machine Learning (ML)
2.2 Recommender Systems (RS)
3 The MARS Recommender System
3.1 MARS Architecture
3.2 Training and Test Data
3.3 The Manager Agent
4 Experimental Evaluation
4.1 Dataset
4.2 Data Pre-processing
4.3 Implementation Environment
4.4 Evaluation Metrics
5 Performance Analysis of MARS
5.1 Performance of the Components MARS
5.2 A Comparison of MARS with Other Systems
6 Conclusion and Future Work
References

Studies in Computational Intelligence 958

Reyhan Aydoğan · Takayuki Ito · Ahmed Moustafa · Takanobu Otsuka · Minjie Zhang   Editors

Recent Advances in Agent-based Negotiation Formal Models and Human Aspects

Studies in Computational Intelligence Volume 958

Series Editor Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland

The series “Studies in Computational Intelligence” (SCI) publishes new developments and advances in the various areas of computational intelligence—quickly and with a high quality. The intent is to cover the theory, applications, and design methods of computational intelligence, as embedded in the fields of engineering, computer science, physics and life sciences, as well as the methodologies behind them. The series contains monographs, lecture notes and edited volumes in computational intelligence spanning the areas of neural networks, connectionist systems, genetic algorithms, evolutionary computation, artificial intelligence, cellular automata, self-organizing systems, soft computing, fuzzy systems, and hybrid intelligent systems. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution, which enable both wide and rapid dissemination of research output. Indexed by SCOPUS, DBLP, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science.

More information about this series at http://www.springer.com/series/7092

Reyhan Aydoğan · Takayuki Ito · Ahmed Moustafa · Takanobu Otsuka · Minjie Zhang
Editors

Recent Advances in Agent-based Negotiation: Formal Models and Human Aspects

Editors Reyhan Aydoğan Department of Computer Science Özyeğin University Istanbul, Turkey

Takayuki Ito Graduate School of Informatics Kyoto University Kyoto, Japan

Ahmed Moustafa Department of Computer Science Nagoya Institute of Technology Nagoya, Japan

Takanobu Otsuka Department of Computer Science Nagoya Institute of Technology Nagoya, Japan

Minjie Zhang School of Computing and Information Technology The University of Wollongong Wollongong, NSW, Australia

ISSN 1860-949X ISSN 1860-9503 (electronic) Studies in Computational Intelligence ISBN 978-981-16-0470-6 ISBN 978-981-16-0471-3 (eBook) https://doi.org/10.1007/978-981-16-0471-3 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface

Agent-based negotiations in which intelligent software agents negotiate with their human counterparts or other software agents in order to reach consensus in a given context have been widely studied in the field of multiagent systems and artificial intelligence. The prominence and wide-spread adoption of automated negotiation techniques have attracted plenty of attention from a number of researchers in multiple areas. Those areas include agreement technology, mechanism design, electronic commerce, recommender systems, supply chain management, and other related areas. Toward this end, the ACAN series of workshops was incepted in order to facilitate and share the ideas among the automated negotiation researchers and to foster collaboration inside the community. The Twelfth International Workshop on Agent-based Complex Automated negotiations (ACAN 2019) was held in conjunction with IJCAI 2019. In this regard, this book covers the selected works that were presented in the ACAN 2019 workshop that cover the human aspects of negotiation and the recent advances in negotiation frameworks and strategies. This book consists of the following parts: – Part 1: Human Aspects in Agent-based Negotiation, and – Part 2: Negotiation Frameworks, Strategies, and Recommenders Finally, we would like to extend our sincere thanks to all authors. This book would not have been possible without the valuable support and contributions of those who cooperated with us. Istanbul, Turkey Kyoto, Japan Nagoya, Japan Nagoya, Japan Wollongong, Australia November 2020

Reyhan Aydoğan Takayuki Ito Ahmed Moustafa Takanobu Otsuka Minjie Zhang


Contents

Human Aspects in Agent-based Negotiation

Effect of Awareness of Other Side's Gain on Negotiation Outcome, Emotion, Argument, and Bidding Behavior (Onat Güngör, Umut Çakan, Reyhan Aydoğan, and Pinar Özturk) . . . 3

Facilitation in Abstract Argumentation with Abstract Interpretation (Ryuta Arisaka, Jérémie Dauphin, and Takayuki Ito) . . . 21

How to Recognize and Explain Bidding Strategies in Negotiation Support Systems (Vincent J. Koeman, Koen Hindriks, Jonathan Gratch, and Catholijn M. Jonker) . . . 35

Negotiation Frameworks, Strategies, and Recommenders

NegMAS: A Platform for Situated Negotiations (Yasser Mohammad, Shinji Nakadai, and Amy Greenwald) . . . 57

Can a Reinforcement Learning Trading Agent Beat Zero Intelligence Plus at Its Own Game? (Davide Bianchi and Steve Phelps) . . . 77

Negotiation in Hidden Identity: Designing Protocol for Werewolf Game (Hirotaka Osawa, Takashi Otsuki, Claus Aranha, and Fujio Toriumi) . . . 87

Multi-Agent Recommender System (Abdullah Alhejaili and Shaheen Fatima) . . . 103

Editors and Contributors

About the Editors Dr. Reyhan Aydoğan is Assistant Professor in the Department of Computer Science at Özyeğin University and guest researcher in the Interactive Intelligence Group at TU Delft. She received her PhD. degree in 2011 in Computer Engineering from Boğaziçi University, Turkey, after which she joined the Interactive Intelligence Group at Delft University of Technology as a postdoctoral researcher. As a guest researcher, she visited the Center of Collective Intelligence at MIT in 2013; Intelligence Systems Group at Norwegian University of Science and Technology in 2015 and Nagoya Institute of Technology in 2017. Her research focuses on the modeling, development and analysis of intelligent agents that integrate different aspects of intelligence such as reasoning, decision making and learning. She is well-known for her research on qualitative preference modeling, automated negotiating agents and negotiation protocols. Besides autonomous agents, she also designs and develops decision support systems in particular negotiation support systems. Her aim is to support human decision makers in complex and dynamic environments, which also requires the design of effective human computer interaction (e.g. preference elicitation). Her career project is about human-robot negotiation. Dr. Takayuki Ito is Professor in the Department of Social Informatics at Kyoto University. He received his Doctor of Engineering from Nagoya Institute of Technology, Japan in 2000. He was a JSPS research fellow, an associate professor of JAIST, and a visiting scholar at USC/ISI, Harvard University, and MIT. He received the JSAI Achievement Award, the JSPS Prize, the Fundamental Research Award of JSSST, the Prize for Science and Technology of the Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science, and Technology (MEXT), the Young Scientists’ Prize of the Commendation for Science and Technology by the MEXT, the Nagao Special Research Award of IPSJ, the Best Paper Award of AAMAS2006, the 2005 Best Paper Award of


JSSST, and the Super Creator Award of 2004 IPA Exploratory Software Creation Project. He was a JST PREST Researcher, and a principal investigator of the Japan Cabinet Funding Program for Next Generation World-Leading Researchers. He is currently principal investigator of JST CREST project. Dr. Ahmed Moustafa is an Associate Professor at Nagoya Institute of Technology. He received his Ph.D. in Computer Science from the University of Wollongong, Australia. He is a member of the Japan Society of Artificial Intelligence, IEEE Computer Society, Australia Computer Society, Service Science Society of Australia. He was a visiting researcher in University of Adelaide, Auckland University of Technology and Data61, Australia. His main research interests include complex automated negotiation, multiagent reinforcement learning, trust and reputation in multiagent societies, deep reinforcement learning, service oriented computing, collective intelligence, intelligent transportation systems and data mining. Dr. Takanobu Otsuka is Associate Professor in the Department of Computer Science at Nagoya Institute of Technology, Japan. He received his M.E. and Doctor of Engineering from the Nagoya Institute of Technology, Japan, in 2011 and 2016, respectively. Between 2012 and 2015, he was an Assistant Professor at Nagoya Institute of Technology. From 2015 and 2016, he was a visiting researcher at University of California Irvine, USA. His main research interests include IoT, multi-agent systems, intelligent agents, distributed system, and software engineering on offshoring. Prof. Minjie Zhang received her Ph.D. from the University of New England, Australia in 1996. Dr. Zhang is a full professor, the Director of Centre for Big Data Analytics and Intelligent Systems, and the Research Chair in Computer Science and IT at the University of Wollongong (UOW), Australia. She is the author/co-author of over 260 research papers. She has organized over 20 international workshops and conferences. As a co-editor, she has edited 14 books and 4 special issues with reputable journals. Her research interests include multi-agent systems and their applications in complex domains, distributed artificial intelligence, smart modeling and simulation in complex systems, agent-based grid/cloud computing, and smart grid systems.

Contributors

Abdullah Alhejaili, Department of Computer Science, Loughborough University, Loughborough, UK; Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
Claus Aranha, Faculty of Engineering, Information and Systems, University of Tsukuba, Tsukuba, Japan
Ryuta Arisaka, Nagoya Institute of Technology, Nagoya, Japan
Reyhan Aydoğan, Department of Computer Science, Özyeğin University, Istanbul, Turkey; Interactive Intelligence Group, Delft University of Technology, Delft, The Netherlands
Davide Bianchi, King's College London, London, UK
Umut Çakan, Department of Computer Science, Özyeğin University, Istanbul, Turkey
Jérémie Dauphin, University of Luxembourg, Esch-sur-Alzette, Luxembourg
Shaheen Fatima, Department of Computer Science, Loughborough University, Loughborough, UK
Jonathan Gratch, USC Institute for Creative Technologies, Playa Vista, CA, USA
Amy Greenwald, Brown University, Providence, USA
Onat Güngör, Department of Computer Science, Özyeğin University, Istanbul, Turkey
Koen Hindriks, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
Takayuki Ito, Kyoto University, Kyoto, Japan
Catholijn M. Jonker, Interactive Intelligence Group, Delft University of Technology, Delft, The Netherlands; LIACS, Leiden University, Leiden, The Netherlands
Vincent J. Koeman, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
Yasser Mohammad, NEC Inc., Minato-ku, Japan; Assiut University, Asyut, Egypt
Shinji Nakadai, NEC Inc., Minato-ku, Japan; AIST, Tsukuba, Japan
Hirotaka Osawa, Faculty of Engineering, Information and Systems, University of Tsukuba, Tsukuba, Japan
Takashi Otsuki, Graduate School of Science and Engineering, Yamagata University, Yamagata, Japan
Pinar Özturk, Norwegian University of Science and Technology, Trondheim, Norway
Steve Phelps, Mesonomics Ltd., London, UK
Fujio Toriumi, Graduate School of Engineering, The University of Tokyo, Bunkyo City, Japan

Human Aspects in Agent-based Negotiation

Effect of Awareness of Other Side's Gain on Negotiation Outcome, Emotion, Argument, and Bidding Behavior
Onat Güngör, Umut Çakan, Reyhan Aydoğan, and Pinar Özturk

Abstract Designing agents that aim to negotiate with human counterparts requires taking additional factors into account. In this work, we analyze the main elements of human negotiations in a structured human experiment. In particular, we study the effect of negotiators being aware of the other side's gain on their bidding behavior and on the negotiation outcome. We compare negotiations in two settings, one of which allows human negotiators to see their opponent's utility while the other does not. Furthermore, we study what kinds of emotional states are expressed and what kinds of arguments are sent in those setups. We rigorously discuss the findings from our experiments. Keywords Negotiation · Human negotiator · Emotion · Argument · Bidding behavior · Gain awareness

O. Güngör (B) · U. Çakan · R. Aydoğan, Department of Computer Science, Özyeğin University, Istanbul, Turkey; e-mail: [email protected]; U. Çakan e-mail: [email protected]; R. Aydoğan e-mail: [email protected]
R. Aydoğan, Interactive Intelligence Group, Delft University of Technology, Delft, The Netherlands
P. Özturk, Norwegian University of Science and Technology, Trondheim, Norway; e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021. R. Aydoğan et al. (eds.), Recent Advances in Agent-based Negotiation, Studies in Computational Intelligence 958, https://doi.org/10.1007/978-981-16-0471-3_1

1 Introduction In the broadest sense, negotiation resolves conflicts and finds mutually acceptable solutions among two or more parties with different preferences [21]. Negotiations are part of human lives, such as selling a house, arranging a travel plan, and so on. With the advancements in artificial intelligence, autonomous agents can alleviate the burden

of negotiating if they can make rational decisions during negotiations [12]. Such agents may negotiate with each other or with a human. Especially, for agents which are capable of negotiating with human counterparts, it is of utmost importance to understand the human behavior [18]. A deep analysis of human–human negotiations provides valuable insights into the design of such negotiating agents. For instance, the offers are the most prevalent “language” in negotiations and indicate how self-centered or cooperative the players are. As illustrated by Axelrod [2], in iterative games (one can conceive negotiations as such games), players adapt their behavior according to opponents’ behavior. Particularly, “Tit for Tat” [9], a negotiation strategy where the player imitates the behavior of the opponent in the last turn, has proven to improve cooperation among players. In this paper, we analyze the bidding behavior of the human negotiators in different settings by particularly aiming to measure the effect of observing the other side’s gain on their bidding strategy. Human negotiators do not only express their offers during the negotiation. Emotions also play a significant role in their decision-making process. To illustrate, consider a scenario where Joe (human negotiator) believes that his bids (so far) should have given some signal to his opponent (Mary) and he expects a more empathetic offer from her. If Mary’s next offer doesn’t comply with this expectation, Joe may be upset, and this emotion could be reflected in his next offer. There are several studies investigating the effect of emotions in negotiation [20, 23]. For instance, Melo et al. [8] share that people tend to concede more to a person showing anger than a person mostly expressing their happiness. Accordingly, our work investigates the effect of awareness of the other side’s gain on participant’s emotional state changes. Furthermore, the role of argumentation in negotiations has been recognized and studied well [1, 7]. Kraus et al. [13] analyze different types of arguments based on their effects such as promise, reward, threat as well as supporting arguments. They also introduce a taxonomy of arguments for negotiation. However, this taxonomy is limited where we observe some of the arguments our human subjects used in our experiments cannot be classified. We have therefore revised and adapted the given taxonomy and elaborated on the relationships between argument types and bidding behaviors. In order to design more human-likely negotiating agents, we perform a deep analysis of human–human negotiations in a structured experimental setup. Fundamentally, we incorporate the following dimensions into our work: 1. Awareness of Opponent’s Gain: In negotiation, participants mostly know their own gain in terms of the utility of the agreement for themselves. In order to investigate the effect of knowing your opponent’s utility on bidding behavior, we design two interfaces: one of them allows participants to see the other side’s utility while the other interface does not. We observed that 67% of negotiations reached higher or the same social welfare when participants know each other’s utilities compared to the case they do not. Moreover, some participants have a more tendency to competing behavior when they are aware of the other side’s gain. Some participants expressing a neutral emotional state in the case of knowing only


their own utility are inclined to express other emotional states such as frustration and pleasure. 2. Emotion: Expressed emotion towards an opponent’s previous offer may affect the whole negotiation process. In our analysis, we examine if we can see this effect on ultimate utility values. We observed that participants reaching a low utility expressed more frustration during their negotiation. 3. Argument: We introduce a classification of arguments particularly for human negotiation and our results indicated that participants who are aware of the other side’s utility are more likely to provide arguments that explain the motivation underlying the offer for both sides while a number of self-explanatory arguments are higher where they cannot see opponent’s utility. Moreover, the ones who do not know their opponent’s gain did not provide any rewarding argument. Also, the number of arguments that seem threatening is twofold in the ones who know/see the opponent’s gain. The rest of the paper is organized as follows: Sect. 2 lists the relevant studies in comparison to our work and outlines the differences between them. Section 3 discusses the main elements influencing human negotiation. Section 4 explains how we identify the bidding behavior of participants in our analysis. Section 5 describes the negotiation tool we designed that is used in our experiments. Section 6 first presents our experimental setup and then provides a detailed analysis of the experiments from different perspectives. Section 7 concludes the paper with future work directions.

2 Related Work Bosse and Jonker [5] develop the current benchmark of artificial negotiators. They analyze human and computer behavior in multi-issue negotiation by conducting two different sets of experiments. They evaluate the results of the experiments based on predetermined performance (e.g., fairness of deals) as well as step properties (e.g., the number of concession steps taken) using the SAMIN negotiation environment introduced by Bosse et al. [6]. In the first experimental setup, they compare human–human negotiations with computer–computer ones while in the second setup they compare human–computer negotiations with computer–computer negotiations. Results of their work demonstrate three facts: (1) the computer–computer negotiations have the fairest outcome, (2) computers make more unfortunate moves (i.e., making an offer decreasing both sides’ utility), and (3) humans act more diversely. This indicates a need for a better understanding of human–human negotiations in order to develop more human-likely negotiating agents. Our work paves the way for a deeper understanding of human–human negotiations. Malhotra and Bazerman [16] make a connection between psychological influence and negotiation by presenting psychological principles for a negotiation environment. They propose 13 different psychological tactics including punishment, giving something in return, providing a reason, etc. to influence the opponent. In our work, we


investigate what kind of arguments are used for these purposes. Melo et al. [8] study the effect of emotions in human–computer negotiations. They found that people concede more to an agent that expresses anger than to one that expresses happiness; they also consider the way an emotion is expressed (non-verbal vs. verbal). Haim et al. [10] analyze negotiation behavior across cultures. They predict negotiation behaviors using machine learning methods. Their ultimate aim is to build an agent that is capable of learning to negotiate with people from different cultures. Accordingly, they conduct human–computer negotiation experiments in three different countries, namely, the United States, Lebanon, and Israel. They conclude that cultural differences have a significant effect in predicting negotiation behavior. Although it is an important aspect, we do not focus on the effect of culture in this work. Lin and Kraus [14] question whether an automated agent is capable of negotiating with humans. They identify the main challenges and review current approaches for automated agents that can learn human-related factors (e.g., bounded rationality, incomplete information) and the opponent's model. By studying seven different agents designed for negotiating with their human counterparts, they identify common features to be used in designing a new agent. Similarly, Oshrat et al. [19] present an automated agent which can negotiate with humans efficiently. They focus on modeling human opponents from past negotiations. The proposed agent is compared with QOAgent [15] and it achieves higher utility values. This provides us further motivation for a deeper analysis of human–human negotiations. Mell and Gratch [17] develop a human-agent negotiation tool, IAGO, which enables emotion exchange using emoji as well as arguments and is designed for settings where an agent negotiates with a human counterpart. Our negotiation tool, in contrast, is intended for a deep analysis of human–human negotiations in order to design agents that negotiate with humans effectively; in that sense, the two tools are complementary to each other.

3 Human–Human Negotiation The process of human negotiation is steered by several factors such as awareness of the opponent’s gain and emotion, arguments exchanged during the negotiation, and so on. We briefly discuss those elements in the following parts.

3.1 Awareness of Opponent’s Gain Being aware of the opponent’s gain may affect the human negotiator’s decisions during the negotiation. On one hand, this can impact negotiation process positively. For instance, it may cause better judgment for the offers made in terms of fairness, hence the likelihood of the acceptance by the opponent. That is, the negotiator can


better anticipate opponent’s responses to the offers which can be adapted accordingly. On the other hand, it may also have a negative impact on the negotiator’s behavior. For example, if a negotiator is preoccupied with fairness and observes that the opponent’s bid is not fair at all, then the negotiator may have a tendency to be less cooperative which may create a challenge for reaching an agreement. In our experiments, we study this effect by testing two environments in which one of the environments allows the negotiators to observe the utility of their opponent while the other environment does not.

3.2 Emotions in Negotiation Human decision-making can be highly influenced by people’s moods. In any negotiation context, this effect can be observed. The bidding behavior of a negotiating party may cause a change in the other side’s emotions. To illustrate, if you receive a humiliating offer, then you may become upset or frustrated. Consequently, this will change your bidding behavior which is triggered by your emotion. That may cause you to give a less flattering offer. In our work, we consider five different emotions: positive (pleasant, very pleasant), neutral, and negative (unpleasant, frustrated). Furthermore, observing the opponent’s emotions may help the negotiator to guess the opponent’s next moves and it can be considered as feedback to revise your offers towards finding a consensus.

3.3 Argumentation in Negotiation Types of arguments exchanged during the negotiation can give some clues about the behavior of the human negotiator; therefore, in this section, we focus on how to classify given arguments. There are many related works in the literature that specifically work on identifying argument types. Kraus et al. [13] propose the idea of using argumentation for achieving cooperation and agreements. They present six distinctive argument types (i.e., categories) from weakest to strongest: appeal to prevailing practice, counterexample, appeal to past promise, appeal to self-interest, the promise of future reward, and threat. Amgoud and Prade [1] introduce another classification which consists of three main categories: threats, rewards, and explanatory. In threats, a negotiator forces the opponent to behave in a certain way; in rewards, a negotiator proposes a reward in order to make the offer accepted; and in explanatory arguments, a negotiator gives some reasons to make the opponent believe the offer. Furthermore, Sierra et al. [22] group argument types under three categories similar to [1]. They classify arguments as threatening, rewarding, or appealing. Note that the explanatory arguments in those studies are intended only for opponents (i.e., stating the benefits for the opponent); however, negotiators can also explain an offer from their own perspective as well as from a mutual-benefit perspective.


Fig. 1 Argument types framework

After a detailed analysis of our experiments, we observe that self-explanatory and both-explanatory arguments are not covered in those works. Therefore, we add those categories in order to cover all arguments provided in our experiments, as depicted in Fig. 1. In our argument framework there are three main types of arguments:

Explanatory: These arguments provide a reason why the given offer is acceptable. Explanatory arguments should be analyzed from three different perspectives:
1. Self: Arguments are provided solely from the player's own perspective. It is likely for a player to provide arguments using such a self-centered approach. E.g., “I have been there before and I did not like it”.
2. Opponent: Arguments are based on the opponent's actions, and they try to modify the behavior of the opponent. Giving a counterexample is a good tactic, where a player tells the opponent that the current action contradicts the opponent's past action(s). E.g., “Museums are not suitable for us, you did not like museums the last time”.
3. Both: These arguments consider both sides of the negotiation. They can be thought of as a way to increase the social welfare. They can be categorized as:
   a. Promoting: The argument aims to glorify the offer. E.g., “Festival is very nice, we will be having so much fun”.
   b. Demoting: The player shows the infeasibility of the current offer. E.g., “It is not possible to make a holiday in Stockholm given 300 euros and 7 days”.

Rewarding: Rewarding arguments aim at convincing the opponent to do something by offering a reward. E.g., “If you accept Stockholm, I will increase the budget”.

Threatening: These arguments force an agent to behave in a certain way. They can take different forms; for instance, “you should do α; otherwise, I will do β” is one type of threatening argument. E.g., “This is your last chance to accept. I'm NOT gonna concede, ever again”.
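For annotating negotiation transcripts, the taxonomy above can be encoded directly as a small data structure. The following Python sketch is only an illustration of how the categories of Fig. 1 could be represented; the class and field names are ours and are not part of the paper.

```python
from dataclasses import dataclass
from enum import Enum


class ArgumentType(Enum):
    """Argument categories from the framework depicted in Fig. 1."""
    EXPLANATORY_SELF = "explanatory/self"                       # reasons from the speaker's own perspective
    EXPLANATORY_OPPONENT = "explanatory/opponent"               # reasons based on the opponent's (past) actions
    EXPLANATORY_BOTH_PROMOTING = "explanatory/both/promoting"   # glorifies the offer for both sides
    EXPLANATORY_BOTH_DEMOTING = "explanatory/both/demoting"     # shows the infeasibility of the current offer
    REWARDING = "rewarding"                                     # offers a reward in exchange for acceptance
    THREATENING = "threatening"                                 # forces the opponent to behave in a certain way


@dataclass
class Argument:
    """A single argument uttered during a negotiation, tagged by a human annotator."""
    text: str
    arg_type: ArgumentType


# Example annotations taken from the sample utterances in the text above.
examples = [
    Argument("I have been there before and I did not like it.", ArgumentType.EXPLANATORY_SELF),
    Argument("Museums are not suitable for us, you did not like museums the last time.",
             ArgumentType.EXPLANATORY_OPPONENT),
    Argument("Festival is very nice, we will be having so much fun.",
             ArgumentType.EXPLANATORY_BOTH_PROMOTING),
    Argument("If you accept Stockholm, I will increase the budget.", ArgumentType.REWARDING),
]
```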


4 Bidding Behavior In this section, we present how we classify the bidding behaviors of a human negotiator in a systematic way. In the literature, Thomas [24] proposes the Thomas–Kilmann Conflict Mode Instrument, which has five different behavior modes for coping with conflict situations. These modes are determined based on the degree of assertiveness (i.e., satisfying own concerns) and cooperativeness (i.e., satisfying the other person's concerns) of humans. The offered modes are competing (high assertiveness & low cooperativeness), collaborating (high assertiveness & high cooperativeness), compromising (mediocre assertiveness & mediocre cooperativeness), avoiding (low assertiveness & low cooperativeness), and accommodating (low assertiveness & high cooperativeness). Based on the aforementioned model, Baarslag et al. [4] introduce another classification, which is based on the player's concession rate against particular opponents. Those are inverter (i.e., inverts opponent behavior), conceder, competitor, and matcher (i.e., matches opponent behavior). This model requires agents negotiating with the same type of agents, which is not possible in our case. Therefore, we need to come up with a mathematical model to classify the bidding behavior based on the Thomas–Kilmann model by taking two dimensions into account:

1. Assertiveness: It measures the individual's attempts to satisfy their own concerns/preferences. Specific to our experiments, we calculate assertiveness by considering players' own utility values for the offers they make. We categorize assertiveness into three classes: high (bid utility between 68 and 100), mediocre (utility between 34 and 67), and low (utility between 0 and 33). By applying majority voting on the classification of each offer made by the human negotiator, we decide the level of assertiveness. For example, if there are 5 High, 2 Mediocre, and 1 Low, then the assertiveness is considered as High.

2. Cooperativeness: Cooperativeness is the measurement of the individual's attempts to find a mutual agreement. In our work, we consider that the cooperativeness of the negotiators can be determined based on their sensitivity to their opponent's preferences. Therefore, we adopt the sensitivity calculation in Eq. 1 proposed by Hindriks et al. [11]:

Sensitivity_a(t) = (%Fortunate + %Nice + %Concession) / (%Selfish + %Unfortunate + %Silent)    (1)

Sensitivity is calculated by taking into account the percentages of the negotiator’s different moves. A move is determined based on the utility difference of the negotiator’s subsequent offers for both sides. There are six different move types (fortunate, nice, concession, selfish, unfortunate, and silent). Table 1 demonstrates the calculation of move types of a player where Us and Uo represent the utility difference for negotiator itself and that for opponent, respectively.


Table 1 Move specification of a negotiator [11]

Move         Self difference (Us)   Opponent difference (Uo)
Silent       Us = 0                 Uo = 0
Nice         Us = 0                 Uo > 0
Fortunate    Us > 0                 Uo > 0
Unfortunate  Us <= 0                Uo < 0
Concession   Us < 0                 Uo >= 0
Selfish      Us > 0                 Uo <= 0

If sensitivity > 1, we consider the player as cooperative (C); if sensitivity < 1, we classify the player as uncooperative (U). Otherwise, the player is considered as neutral (N) to the opponent's preferences. In order to decide on the bidding behavior of each player, we need to combine the assertiveness and cooperativeness results according to our classification depicted in Fig. 2. Note that according to the Thomas–Kilmann Conflict Mode Instrument model, it is hard to differentiate some behaviors formally (e.g., which behavior should be assigned to a neutral assertiveness level and uncooperative behavior). Therefore, we extend this model as seen in Table 2.
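The move classification of Table 1, the sensitivity ratio of Eq. 1, and the assertiveness levels described above translate directly into code. The sketch below is our own minimal reading of those rules (function names and edge-case handling are ours); the final behavior label would still be obtained by combining the two dimensions according to Fig. 2 and Table 2, which are not reproduced here.

```python
from collections import Counter
from typing import List


def classify_move(d_self: float, d_opp: float) -> str:
    """Classify one move from the utility differences between a negotiator's two consecutive
    offers: d_self for the negotiator's own utility, d_opp for the opponent's (Table 1, [11])."""
    if d_self == 0 and d_opp == 0:
        return "silent"
    if d_self == 0 and d_opp > 0:
        return "nice"
    if d_self > 0 and d_opp > 0:
        return "fortunate"
    if d_self > 0 and d_opp <= 0:
        return "selfish"
    if d_self < 0 and d_opp >= 0:
        return "concession"
    return "unfortunate"  # remaining case: d_self <= 0 and d_opp < 0


def sensitivity(moves: List[str]) -> float:
    """Eq. 1: ratio of cooperative move percentages to the remaining move percentages."""
    counts, total = Counter(moves), len(moves)
    pct = lambda m: 100.0 * counts[m] / total
    numerator = pct("fortunate") + pct("nice") + pct("concession")
    denominator = pct("selfish") + pct("unfortunate") + pct("silent")
    return float("inf") if denominator == 0 else numerator / denominator  # no uncooperative moves at all


def cooperativeness(moves: List[str]) -> str:
    s = sensitivity(moves)
    if s > 1:
        return "cooperative"    # C
    if s < 1:
        return "uncooperative"  # U
    return "neutral"            # N


def assertiveness(own_utilities: List[float]) -> str:
    """Majority vote over the negotiator's own offer utilities (high: 68-100, mediocre: 34-67, low: 0-33)."""
    def level(u: float) -> str:
        return "high" if u >= 68 else ("mediocre" if u >= 34 else "low")
    return Counter(level(u) for u in own_utilities).most_common(1)[0][0]


# Example: 5 high offers, 2 mediocre, 1 low -> assertiveness is "high", as in the text.
print(assertiveness([90, 85, 80, 75, 70, 50, 40, 20]))
```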

5 Structured Human Negotiations Observing and analyzing human–human negotiations can give valuable insights for designing human-likely negotiating agents. In this work, we develop a negotiation tool, which allows human negotiators to negotiate with each other in a more structured way, particularly by following the alternating offers protocol proposed by Aydogan et al. [3]. Human negotiators can exchange offers in a turn-taking fashion and they can also send arguments to persuade their opponents and share their current emotional state. Note that our tool supports only bilateral negotiations.


Fig. 2 Bidding behavior classification

Since one of our aims is to find out whether being aware of the other side’s gain has an influence on human negotiator’s bidding behavior, we design two interfaces that are almost identical to each other except one of the interfaces enables negotiators to observe their opponent’s current gain (in terms of utility) while other interface hides this information. Figure 3 shows the bidding interface showing both sides’ utilities for the chosen offer. The human participants can choose the values for each issue by using the drop boxes while making their offers. They can express their emotions and provide an argument about their opponent’s previous offer or related to their current offer to convince their opponent. Negotiation is governed by alternating offer protocol in which bid exchanges continue in a turn-taking fashion until players reach an agreement or the given deadline. The time limit for each negotiation is set to 20 min. Additionally, players are informed about how many minutes left for the negotiation at specific time intervals (10 min, 5 min, 2 min, and 1 min). In our negotiation tool, participants can choose one of the five emotional states: positive (pleasant, very pleasant), neutral and negative (unpleasant, frustrated). Players can provide any type of argument to their opponents. Participants in their turn can see their opponent’s offer as well as their emotional state and arguments at that time. Consequently, they can assess whether their opponent is pleasant or unhappy about their previous offer. The given arguments may help to convince the current player or help them understand each other’s preferences.
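A bare-bones version of the turn-taking loop implemented by the tool could look as follows. This is a hypothetical sketch based only on the protocol description above (alternating offers, a 20-minute deadline, five emotional states, and optional free-text arguments); the data structures and names are ours, not the tool's actual implementation.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Union

EMOTIONS = ("very pleasant", "pleasant", "neutral", "unpleasant", "frustrated")


@dataclass
class Offer:
    issue_values: Dict[str, str]    # e.g. {"destination": "Stockholm", "duration": "7 days", ...}
    emotion: str = "neutral"        # one of EMOTIONS, expressed about the opponent's previous offer
    argument: Optional[str] = None  # optional free-text argument sent with the offer


def run_negotiation(get_action: Callable[[str, Optional[Offer]], Union[Offer, str]],
                    deadline_seconds: float = 20 * 60) -> Optional[Offer]:
    """Alternating offers between players 'A' and 'B' until one accepts or the deadline passes.
    get_action(player, standing_offer) returns either a counter-offer or the string 'accept'."""
    start, standing_offer, turn = time.time(), None, "A"
    while time.time() - start < deadline_seconds:
        action = get_action(turn, standing_offer)
        if action == "accept" and standing_offer is not None:
            return standing_offer    # agreement: each side scores its own utility of this offer
        standing_offer = action      # otherwise the counter-offer becomes the standing offer
        turn = "B" if turn == "A" else "A"
    return None                      # deadline reached: both sides obtain a score of 0
```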


Fig. 3 Bidding screen

6 Experimental Evaluation In this section, we present our experimental setup, and we analyze the experiments by considering different dimensions.

6.1 Experimental Setup In order to set up well-structured negotiation experiments, we prepare two negotiation scenarios on a travel domain. There are four issues: destination, duration, budget, and amusement type. All possible values for each issue are specified and the total number of possible outcome is equal to 320. The preferences of the participants are represented by a simple additive utility function. That is, the utility for each value of an issue is given between 0 and 100, and the overall utility of a given offer is calculated by the sum of the utility of issue values specified in the given bid.
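As an illustration of the additive utility function described above, consider the following sketch. The issue values and the per-value utilities are hypothetical placeholders (chosen so that the best possible offer sums to 100, the maximum score); only the aggregation rule, summing the utilities of the chosen issue values, comes from the setup.

```python
# Hypothetical per-value utilities for one participant in the travel domain (placeholder numbers).
utility_table = {
    "destination": {"Stockholm": 30, "Barcelona": 10},
    "duration":    {"7 days": 25, "10 days": 15},
    "budget":      {"300 euros": 5, "1000 euros": 20},
    "amusement":   {"festival": 25, "museum": 10},
}

def offer_utility(offer: dict) -> int:
    """Additive utility: the sum of the utilities of the issue values chosen in the offer."""
    return sum(utility_table[issue][value] for issue, value in offer.items())

bid = {"destination": "Stockholm", "duration": "10 days", "budget": "1000 euros", "amusement": "festival"}
print(offer_utility(bid))  # 30 + 15 + 20 + 25 = 90
```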


For our human–human negotiation experiments, we recruit 24 students at Özyeğin University (29.1% female, 70.8% male; 54% M.Sc., 38% bachelor, 8% Ph.D.; 63% between 21 and 25, 29% between 26 and 30, 4% between 18 and 20, 4% between 31 and 35 years old). It is worth noting that participants gave their informed consent before participating in the experiment, and the work is part of a research project approved by the Özyeğin University Research Ethics Committee (REC). As an incentive mechanism, we provide coffee gift cards to the most successful participants. Each participant is asked to negotiate in both scenarios: in the first scenario they can only see their own utilities, while in the second scenario they can also see the utility of their opponents. Note that the utility functions given to participants are different in the two scenarios in order to decrease the learning effect. Besides, the utility functions are generated from the same utility distribution so that we can compare the utility of agreements fairly. Furthermore, in order to decrease the learning effect further, we use randomization: half of the participants start with the first scenario (Group 1) while the other half starts with the second scenario (Group 2). Before the experiment, we gave a live demo illustrating how to use the negotiation tool. For each negotiation session, participants are given 20 min; if participants cannot reach an agreement within the specified time, both sides obtain a score of 0 (100 being the maximum). For all negotiation sessions, we log all negotiation-related information (e.g., the offer made at each round and the elapsed time while making this offer) for our further analysis. After a group is done with their negotiations, they are asked to fill out a questionnaire form so that we can get feedback about their negotiation.

6.2 Analysis of Negotiation Outcome As our first main result, we observe that 23 out of 24 negotiation sessions are ended up with an agreement. Figure 4 shows the utilities that players received in both scenarios. On the x-axis, we have the player number, and on the y-axis we have corresponding utility values. Here, blue bars denote the results of the first scenario in which players cannot see each other’s utility [non-observable scenario (NOS)] and orange bars depict the result of the second scenario where an opponent is able to see other’s utility [observable scenario (OS)]. We discover that 33% of the players received higher utility in OS while 46% of the players received higher utility in NOS. Overall, 21% of the players received the same score. We apply mixed-ANOVA statistical test to see if there is a significant difference in gained utilities between two scenarios. We have not observed a significant effect of scenario (OS and NOS ) on the received player utilities [F(1, 22) = 0.48, p = 0.16]. In addition to that, we observe that there is no significant difference between player’s starting with OS or NOS [F(1, 22) = 1.54, p = 0.23]. We also measure the social welfare. When we compare the social welfare (i.e., the sum of both players’ utilities) in both settings, we observe that participants obtained higher social welfare in OS in %50 of negotiations while in NOS %33 of negotiations they reached higher social welfare.
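For readers who want to reproduce this kind of test, a mixed ANOVA with scenario as the within-subject factor and starting group as the between-subject factor can be run, for example, with the pingouin package (an assumption on our part; any mixed-ANOVA routine would do). The sketch below uses made-up data and hypothetical column names, not the study's measurements.

```python
import pandas as pd
import pingouin as pg  # assumed available; provides a mixed_anova convenience function

# One row per participant per scenario (long format); all values are placeholders.
df = pd.DataFrame({
    "participant": [1, 1, 2, 2, 3, 3, 4, 4],
    "scenario":    ["NOS", "OS"] * 4,             # within-subject factor
    "group":       [1, 1, 1, 1, 2, 2, 2, 2],      # which scenario was played first
    "utility":     [70, 65, 55, 80, 90, 60, 45, 75],
})

aov = pg.mixed_anova(data=df, dv="utility", within="scenario",
                     subject="participant", between="group")
print(aov)
```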


Fig. 4 Player scores for different negotiation scenarios

6.3 Analysis of Arguments By using our proposed framework explained in Sect. 3.3, we make a detailed analysis of arguments provided in our experiments. We analyze the arguments one by one based on three main argument types and subcategories combined. The results can be seen in Fig. 5. This figure demonstrates the percentages of different argument types with respect to each scenario. In total, in NOS 67 different arguments are provided by participants, while this number is 56 in OS. It is remarkable that both explanatory arguments are seen more in OS while self-explanatory arguments are observed more in NOS. This implies that when players are aware of the other side’s gain, it is more likely for them to provide arguments which concern both sides of the negotiation. Moreover, players did not provide any rewarding argument in NOS while this number is 2 in OS. The number of threatening arguments is twofold in OS (6 vs 3). Other than these results, it is important to share the fact that threatening arguments are provided towards the end of the negotiation as expected.

6.4 Analysis of Bidding Behavior We calculate the assertiveness and cooperativeness of each participant in both settings. We observe that while 79% of the participants are highly assertive in NOS, 96% of them are highly assertive in OS. Both in NOS and OS 50% of the participants are sensitive to their opponent’s preferences. Additionally, 25% of the participants


Fig. 5 Argument analysis for different negotiation scenarios

are insensitive in NOS while 12.5% of them are insensitive in OS. This shows that some of the participants increase their sensitivity while they can see their opponent’s utility. Based on Table 2, we classify each participant’s behavior demonstrated at Table 3. Remarkably, we do not have any compromising behavior. The percentage of participants who demonstrate competing behavior is increased by 12% when they negotiate in OS. There is a slight increase in the percentage of participants who are collaborating in their OS. Avoiding behavior can only be observed in NOS. When we compare NOS and OS, accommodating behavior is almost disappeared in OS.

6.5 Analysis of Emotion We calculate percentages of all emotional states for each player. Table 5 shows corresponding percentages of each emotional state for both NOS and OS scenarios. Although there is no significant effect of the scenario on the emotional states, we observe there are more unpleasant emotional state exchanged on average in NOS than OS(%24 versus %18) while more frustrated emotional states are observed on average in OS (%11 versus %4). In addition, we also compare participant’s each emotional state percentages in both scenarios and compute the number of participants whose emotional state percentage is higher in NOS than the percentage in OS and vice versa. Table 4 shows, for each emotional state, how many participants expresses higher percentage of that particular emotional state in their NOS than OS. For example, the percentage of frustration emotion is higher for 4 participants in their NOS negotiation compared to OS ones while that of 7 participants is higher in their OS negotiations.


Table 3 Behavior analysis

Player  NOS             OS              Group
1       Avoiding        Competing       1
2       Competing       Collaborating   1
3       Competing       Competing       1
4       Collaborating   Competing       1
5       Avoiding        Accommodating   1
6       Competing       Collaborating   1
7       Competing       Competing       1
8       Collaborating   Competing       1
9       Collaborating   Competing       1
10      Collaborating   Collaborating   1
11      Collaborating   Collaborating   1
12      Competing       Collaborating   1
13      Competing       Competing       2
14      Competing       Competing       2
15      Collaborating   Collaborating   2
16      Avoiding        Collaborating   2
17      Collaborating   Competing       2
18      Collaborating   Competing       2
19      Collaborating   Collaborating   2
20      Competing       Collaborating   2
21      Competing       Collaborating   2
22      Accommodating   Collaborating   2
23      Accommodating   Competing       2
24      Competing       Competing       2

Table 4 Comparison of emotional state expressions for individuals (number of participants whose percentage of the given emotional state is higher in NOS than in OS, and vice versa)

Emotional state   NOS   OS
Frustrated        4     7
Unpleasant        11    8
Neutral           11    8
Pleasant          6     8
Very pleasant     1     1

Players were more neutral in emotion in NOS than when in OS (11 versus 8). Being aware of the opponent’s gain may change the negotiator’s emotional state. Besides, players show a tendency to express stronger emotional states when they observe their opponents’ utility. To exemplify, players were more frustrated in OS than when in NOS (7 versus 4). In the questionnaire taken after their negotiation, some participants reported that they get frustrated more when they receive unfair offers. Note that players can detect unfairness only in OS. On the other hand, it is observed that more players have a higher percentage of unpleasant emotions in their

Table 5 Emotion percentages for each player (each row lists players 1 to 24 in order, followed by the Average and STDEV over all players)

Neutral, NOS: 67 17 38 43 33 0 80 60 100 75 50 33 67 50 100 50 78 88 75 67 100 67 100 67 (Average 63, STDEV 26)
Neutral, OS: 100 0 40 60 50 0 80 50 100 67 100 100 33 25 50 60 50 20 33 67 100 100 80 60 (Average 59, STDEV 30)
Pleasant, NOS: 0 17 0 14 33 33 0 0 0 0 25 33 0 0 0 50 11 0 0 0 0 0 0 0 (Average 9, STDEV 14)
Pleasant, OS: 0 0 0 0 50 0 20 0 0 33 0 0 0 0 0 0 33 40 33 0 0 0 20 20 (Average 10, STDEV 15)
Unpleasant, NOS: 17 67 38 14 33 67 20 40 0 25 0 33 33 50 0 0 0 13 25 33 0 33 0 33 (Average 24, STDEV 20)
Unpleasant, OS: 0 100 60 0 0 0 0 0 0 0 0 0 67 75 17 20 17 20 0 33 0 0 0 20 (Average 18, STDEV 28)
Very pleasant, NOS: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 11 0 0 0 0 0 0 0 (Average 1, STDEV 2)
Very pleasant, OS: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 33 0 0 0 0 0 (Average 1, STDEV 6)
Frustrated, NOS: 17 0 25 29 0 0 0 0 0 0 25 0 0 0 0 0 0 0 0 0 0 0 0 0 (Average 4, STDEV 9)
Frustrated, OS: 0 0 0 40 0 100 0 50 0 0 0 0 0 0 33 0 0 20 0 0 0 0 0 0 (Average 11, STDEV 23)

NOS compared to their OS (11 versus 8). These unpleasant emotions may stem from getting a utility under their expectation irrespective of what their opponent’s gains. Regarding the correlation between particular emotions and received agreement utilities, we observe a weak positive relation between neutral emotion and utilities in NOS (R = 0.23) and a weak negative correlation between neutral emotion and utilities in OS (R = –0.2544). Furthermore, there is a moderate negative relation between frustrated emotion and utilities in NOS (R = –0.61) as expected. Also, there is a weak positive relation between positive emotion and utilities OS (R = 0.245). We can obtain more reliable results if we increase the number of participants.
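The reported R values are plain Pearson correlation coefficients between a participant's percentage of a given emotional state and the utility that participant obtained. A minimal sketch with made-up numbers (not the study's data):

```python
import numpy as np

# Percentage of "frustrated" states per participant and the utility each obtained in NOS;
# all numbers are placeholders chosen only to illustrate the computation.
frustrated_pct = np.array([17, 0, 25, 29, 0, 0, 25, 0])
utility        = np.array([40, 80, 35, 30, 75, 70, 45, 85])

r = np.corrcoef(frustrated_pct, utility)[0, 1]  # Pearson correlation coefficient
print(round(r, 2))  # negative, i.e. more frustration goes together with lower utility
```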


Fig. 6 Negotiation behavior questionnaire results

6.6 Analysis of Questionnaire Figure 6 shows the number of participants who take into consideration the underlying elements during their negotiation, according to their response to our questionnaire. 23 out of 24 players considered their own utility. It is seen that most of the players consider the notion of fairness, their own utilities, opponent’s offers in their negotiation. A few people consider the opponent’s gesture and emotions while 71% of the participants consider arguments.

7 Conclusion and Future Work In this work, we conduct a structured human negotiation experiment and analyze the results from different perspectives. We mainly investigate how knowing the opponent’s utility affects the bidding behavior of players in the sequel, the negotiation outcome, emotional state, and arguments. Furthermore, we provide a classification for the arguments exchanged during the negotiation as well as bidding behaviors and use them in our experimental setup. The analysis of our experiment results indicates that observing the opponents’ utility for the given offers has a positive effect on social welfare. In particular, in 62% of OS negotiations, participants obtained higher social welfare values than what is obtained in NOS experiments. Observing opponents’ utilities affected the results in several other ways as well. For example, some participants exhibit more competing behavior when they are aware of the other side’s gains.


Some participants expressing a neutral emotional state in the NOS cases are inclined to express other emotional states such as frustration and pleasant on observable cases. We observed also that participants reaching a low utility in the deal, expressed more frustration during their negotiation. Regarding the arguments used in the experiments, we find that in OS cases where both could see each other’s utilities for a given offer, players provide reasons involving both parties (i.e., from a fairness perspective) while in the NOS cases arguments are of a more self-explanatory nature, i.e., justification of themselves. Moreover, the ones who do not know their opponent’s gain did not provide any rewarding argument. Also, the number of arguments that seem threatening is twofold in the ones who know the opponent’s gain. It is worth noting that if we had more participants, more trustworthy results could be obtained. Structured human–human experiments have provided interesting findings. However, because of the relatively small number of participants, the results were not conclusive enough. Further investigation with the participation of more subjects needs to be conducted to obtain more conclusive results. In future experiments, we will also investigate additional factors such as remaining time in negotiation, power differences between players. Acknowledgements This work has been supported by a grant of The Scientific and Research Council of Turkey (TÜB˙ITAK) with grant number 118E197. The contents of this article reflect the ideas and positions of the authors and do not necessarily reflect the ideas or positions of TÜB˙ITAK.

References 1. Amgoud, L., Prade, H.: Generation and evaluation of different types of arguments in negotiation. In: NMR, pp. 10–15 (2004) 2. Axelrod, R.: The Evolution of Cooperation. Basic, New York (1984) 3. Aydogan, R., Festen, D., Hindriks, K.V., Jonker, C.M.: Alternating offers protocols for multilateral negotiation. In: Fujita, K., Bai, Q., Ito, T., Zhang, M., Ren, F., Aydogan, R., Hadfi, R. (eds.) Modern Approaches to Agent-Based Complex Automated Negotiation, pp. 153–167. Springer, Berlin (2017) 4. Baarslag, T., Hindriks, K., Jonker, C.: Towards a quantitative concession-based classification method of negotiation strategies. In: International Conference on Principles and Practice of Multi-agent Systems, pp. 143–158. Springer (2011) 5. Bosse, T., Jonker, C.M.: Human vs. computer behavior in multi-issue negotiation. In: Rational, Robust, and Secure Negotiation Mechanisms in Multi-agent Systems (RRS’05), pp. 11–24. IEEE (2005) 6. Bosse, T., Jonker, C.M., Treur, J.: Experiments in human multi-issue negotiation: analysis and support. In: Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004, pp. 671–678. IEEE (2004) 7. Carabelea, C.: Adaptive agents in argumentation-based negotiation. In: ECCAI Advanced Course on Artificial Intelligence, pp. 180–187. Springer (2001) 8. de Melo, C.M., Carnevale, P.J., Gratch, J.: The effect of expression of anger and happiness in computer agents on negotiations with humans. In: 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2011), Taipei, Taiwan, 2–6 May 2011, vol. 1–3, pp. 937–944 (2011)


9. Faratin, P., Sierra, C., Jennings, N.R.: Negotiation decision functions for autonomous agents. Robot. Auton. Syst. 24(3–4), 159–182 (1998) 10. Haim, G., Gal, Y., Kraus, S., Blumberg, Y.: Learning human negotiation behavior across cultures. In: HuCom International Working Conference on Human Factors and Computational Models in Negotiation (2010) 11. Hindriks, K.V., Jonker, C.M., Tykhonov, D.: Let’s DANS! an analytic framework of negotiation dynamics and strategies. Web Intell. Agent Syst. 9(4), 319–335 (2011) 12. Jonker, C.M., Aydogan, R., Baarslag, T., Fujita, K., Ito, T., Hindriks, K.V.: Automated negotiating agents competition (ANAC). In: AAAI, pp. 5070–5072 (2017) 13. Kraus, S., Sycara, K., Evenchik, A.: Reaching agreements through argumentation: a logical model and implementation. Artif. Intell. 104(1–2), 1–69 (1998) 14. Lin, R., Kraus, S.: Can automated agents proficiently negotiate with humans? Commun. ACM 53(1), 78–88 (2010) 15. Lin, R., Kraus, S., Wilkenfeld, J., Barry, J.: An automated agent for bilateral negotiation with bounded rational agents with incomplete information. Front. Artif. Intell. Appl. 141, 270 (2006) 16. Malhotra, D., Bazerman, M.H.: Psychological influence in negotiation: an introduction long overdue (2008) 17. Mell, J., Gratch, J.: IAGO: interactive arbitration guide online (demonstration). In: Proceedings of the 2016 International Conference on Autonomous Agents and Multiagent Systems, Singapore, 9–13 May 2016, pp. 1510–1512 (2016) 18. Mell, J., Gratch, J., Baarslag, T., Aydogan, R., Jonker, C.M.: Results of the first annual humanagent league of the automated negotiating agents competition. In: Proceedings of the 18th International Conference on Intelligent Virtual Agents, pp. 23–28. ACM (2018) 19. Oshrat, Y., Lin, R., Kraus, S.: Facing the challenge of human-agent negotiations via effective general opponent modeling. In: Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, pp. 377–384 (2009) 20. Pietroni, D., Van Kleef, G.A., De Dreu, C.K.W., Pagliaro, S.: Emotions as strategic information: effects of other’s emotional expressions on fixed-pie perception, demands, and integrative behavior in negotiation. J. Exp. Soc. Psychol. 44(6), 1444–1454 (2008) 21. Raiffa, H.: The Art and Science of Negotiation. Harvard University Press, Harvard (1982) 22. Sierra, C., Jennings, N.R., Noriega, P., Parsons, S.: A framework for argumentation-based negotiation. In: International Workshop on Agent Theories, Architectures, and Languages, pp. 177–192. Springer (1997) 23. Sinaceur, M., Tiedens, L.Z.: Get mad and get more than even: when and why anger expression is effective in negotiations. J. Exp. Soc. Psychol. 42(3), 314–322 (2006) 24. Thomas, K.W.: Thomas-Kilmann conflict mode. TKI Profile and Interpretive Report, pp. 1–11 (2008)

Facilitation in Abstract Argumentation with Abstract Interpretation Ryuta Arisaka, Jérémie Dauphin, and Takayuki Ito

Abstract Cycles of attacking arguments pose non-trivial issues in abstract argumentation. In particular, when arguments in a cycle cannot be determined either to be accepted or rejected, their acceptance statuses simply become undecided. The undecided statuses can then propagate out of the cycles to other parts of the graph, contaminating even more arguments with the undecided status. This is less than desirable from the perspective of obtaining as much useful information as possible. As a remedy, we draw inspiration from facilitation in real-life argumentation, where abstractive summarisation of conflicting arguments assists a positive progression of the discussion. Just as facilitation of real-life argumentation requires awareness of certain semantic relations among arguments, we need argument graphs together with a lattice-like ontological semantic structure over the arguments. With this semantic-argument-graphic hybrid approach, we show that, even where no arguments in a cycle can be sensibly selected, we can say more about the acceptability of arguments in an argumentation graph.

1 Introduction

Consider the following scenario: the members of a board of directors are gathered in a meeting to decide the future general strategy of their company.



• a1: 'We should focus on improving our business organization structure, because it determines our economic conduct.' (to shorten, focusOnOs and OsDeterminesEc) is advanced by one member.
• a2: 'We should focus on improving our market performance, because it determines our business organization structure.' (focusOnMp and MpDeterminesOs) is then advanced by another member, as an attack on a1.
• a3: 'We should focus on improving our economic conduct, because it determines our market performance.' (focusOnEc and EcDeterminesMp) is then given in response to a2.
• a5: 'Our firm needs 1 billion dollars revenue this fiscal year', meanwhile, is an argument expressed by another member.
• a4: 'Let our company just sink into bankruptcy!' (focusOnLiq), another member impatiently declares in response, against which, however, all the first three speakers promptly express dissent with their arguments.

We can represent this argumentation as AF1 of Fig. 1, with five arguments a1, ..., a5. In Dung's abstract argumentation theory [7], according to one acceptance judgement [5], an argument is accepted if and only if all its attackers are rejected; it is rejected if and only if there exists an attacker that is accepted; and it is undecided (neither accepted nor rejected) otherwise. Following these instructions, every argument in AF1 is undecided, and so we gain almost no information about the acceptability statuses of the arguments. In real-life argumentation, such inconclusive argumentation is seen often. Because of that, there normally are facilitators acting to make progress in the argumentation [9, 11]. Periodically supplying a summary of the discussion is considered an effective facilitation strategy [10]. In our example, a1, a2 and a3 are all for the benefit of their company's growth. Noticing that, a facilitator may aggregate the three arguments into a new argument ax: 'We should focus on our company's further growth' (focusOnImp), to derive AF2 in Fig. 1. The aggregation extracts from the three arguments the part that involves no conflict while preserving the attack on a4 (since focusing on the company's growth opposes the choice of liquidation). This transformation constitutes progress in the argumentation. Indeed, we obtain that ax is accepted, a4 is rejected and a5 is accepted. The acceptability statuses of a4 and, in particular, a5 of AF1 have been sharpened.

Fig. 1 Left: original argumentation framework AF1 . Right: abstracted argumentation framework AF2
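To make the contrast concrete, the following minimal Python sketch (not from the chapter) computes the acceptance judgement just described as a labelling; the exact attack edges of AF1 are our own assumption read off the narrative and Fig. 1.

```python
# Minimal sketch: propagate the acceptance judgement of [5] as a labelling.
# Assumed edges of AF1: a1 -> a3 -> a2 -> a1 is the cycle; a1, a2, a3 attack a4;
# a4 attacks a5.  AF2 replaces the cycle with ax.
def labelling(args, attacks):
    label = {a: "undec" for a in args}
    attackers = {a: {x for (x, y) in attacks if y == a} for a in args}
    changed = True
    while changed:
        changed = False
        for a in args:
            if label[a] != "undec":
                continue
            if all(label[b] == "out" for b in attackers[a]):
                label[a] = "in"; changed = True    # all attackers rejected
            elif any(label[b] == "in" for b in attackers[a]):
                label[a] = "out"; changed = True   # some attacker accepted
    return label

AF1 = ({"a1", "a2", "a3", "a4", "a5"},
       {("a2", "a1"), ("a3", "a2"), ("a1", "a3"),
        ("a1", "a4"), ("a2", "a4"), ("a3", "a4"), ("a4", "a5")})
AF2 = ({"ax", "a4", "a5"}, {("ax", "a4"), ("a4", "a5")})

print(labelling(*AF1))  # every argument stays 'undec'
print(labelling(*AF2))  # ax: 'in', a4: 'out', a5: 'in'
```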


1.1 Abstract Interpretation as Facilitation

The particular facilitation we saw is reminiscent of abstract interpretation [6], a methodology known in static program analysis, which maps concrete space semantics to abstract space semantics and does inferences in the latter space to say something about the former space. In the above, AF1 in Fig. 1 is the concrete space argumentation, while AF2 in Fig. 1 is an abstract space argumentation. The abstract semantics is typically coarser than the concrete semantics; in our example, the detail of what exactly their company should focus on for the company's growth was abstracted away. In return, we were able to conclude in abstract space that ax and a5 are accepted and, moreover, that a4 is rejected. Compared to existing methodologies to deal with cycles, e.g. cf2 [3], which can give an accepted status to a5 only by enforcing acceptance of one of a1, a2 and a3 (which rejects a4), our approach does not require acceptance of any arguments within the cycle in favour of the others.

In contrast to those existing graph-theoretical approaches, abstract interpretation applies to semantic information. Thus, abstract argumentation plus abstract interpretation is a semantic-argument-graphic hybrid approach, which we propose in order to sharpen arguments' acceptability statuses with respect to a given semantic structure. Specifically, we use a lattice ordered by an ontological belongs-to relation among the included entities. In our setting, the semantic relation among a1, ..., a4, ax is as visualised in Fig. 2. The dotted arrow with a filled box denotes that the argument on the tail side of the arrow belongs to the argument on the tip side. It shows that each of focusOnOs, focusOnEc and focusOnMp belongs to focusOnImp. None of the three belongs to the other two, and none of them attacks or is attacked by focusOnImp. Importantly, focusOnLiq does not belong to focusOnImp, which ensures that abstraction is kept among the arguments in the cycle only. Moreover, focusOnImp attacks focusOnLiq, which ensures that abstraction preserves the attack relation to focusOnLiq.

Fig. 2 AF1 and some ontological abstract-concrete relation over its arguments. focusOnImp is a more abstract argument than focusOnOs, focusOnMp, and focusOnEc. None of the three is more abstract or more concrete than the other two. focusOnLiq is not a concrete instance of focusOnImp

In this work, we apply an abstract-interpretation-inspired method to argumentation frameworks, albeit without strictly observing the usual sound over-approximation requirement (for that, readers may be referred to [1] for the first proper materialisation of abstract interpretation, in the ordinary sense of static analysis, within formal argumentation).


Technical preliminaries are given in Sect. 2. Our formal framework is developed in Sect. 3, where we discuss in detail the conditions that should be satisfied for abstraction. We also make comparisons to Dung's preferred semantics and the cf2 semantics as existing alternatives.

2 Technical Preliminaries

2.1 Abstract Argumentation

Let 𝒜 be a class of abstract entities, and let ℛ be all the binary relations over 𝒜. Let ℛ_A for some A ⊆ 𝒜 be the subclass of ℛ that contains all members R of ℛ (but nothing else) that satisfy: if (a1, a2) ∈ R, then a1, a2 ∈ A. An argumentation framework is a 2-tuple (A, R) for A ⊆fin 𝒜 and R ∈ ℛ_A. An argument a1 is said to attack another argument a2 if and only if, or iff, (a1, a2) ∈ R. A subset A1 of A is said to defend ax ∈ A iff, for each ay ∈ A attacking ax, there is some az ∈ A1 such that az attacks ay. A subset A1 of A is said to be conflict-free iff no member of A1 attacks a member of A1; an admissible set iff it is conflict-free and defends all the members of A1; and a preferred set (extension) iff it is a set-theoretically maximal admissible set. There are other types of admissible sets, and an interested reader will find details in [2, 7]. We say that an argument is sceptically accepted iff it is in all preferred sets and credulously accepted iff it is in at least one preferred set.
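As a small illustration of these definitions, the following brute-force Python sketch (our own, suitable only for tiny frameworks) enumerates conflict-free and admissible sets and returns the preferred sets.

```python
# Minimal sketch: brute-force preferred sets of a finite framework (A, R).
from itertools import combinations

def preferred_sets(A, R):
    A = list(A)
    def attacks(x, y): return (x, y) in R
    def conflict_free(S):
        return not any(attacks(x, y) for x in S for y in S)
    def defends(S, a):
        # every attacker of a is itself attacked by some member of S
        return all(any(attacks(z, b) for z in S) for b in A if attacks(b, a))
    admissible = [set(S) for k in range(len(A) + 1)
                  for S in combinations(A, k)
                  if conflict_free(S) and all(defends(S, a) for a in S)]
    # preferred = set-theoretically maximal admissible sets
    return [S for S in admissible if not any(S < T for T in admissible)]

AF2 = ({"ax", "a4", "a5"}, {("ax", "a4"), ("a4", "a5")})
print(preferred_sets(*AF2))   # [{'ax', 'a5'}]
```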

2.2 Order and Galois Connection for Abstract Interpretation

Let L1 and L2 be ordered sets, ordered by ≤1 and ≤2, respectively. Let α be an abstraction function that maps each element of L1 onto an element of L2, and let γ be a concretisation function that maps each element of L2 onto an element of L1. α(l1) for l1 ∈ L1 is said to be an abstraction of l1 in L2, and γ(l2) for l2 ∈ L2 is said to be a concretisation of l2 in L1. If α(l1) ≤2 l2 implies l1 ≤1 γ(l2) and vice versa for every l1 ∈ L1 and every l2 ∈ L2, then the pair of α and γ is said to be a Galois connection. A Galois connection is contractive: α ∘ γ(l2) ≤2 l2 for every l2 ∈ L2, and extensive: l1 ≤1 γ ∘ α(l1) for every l1 ∈ L1. Also, both α and γ are monotone, with α ∘ γ ∘ α = α and γ ∘ α ∘ γ = γ. An ordered set L, ordered by a partial order ≤, is a complete lattice just when it is closed under join and meet for every L′ ⊆ L. Every finite lattice is a complete lattice.
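For intuition, the Galois-connection condition can be checked exhaustively on small finite posets; the two toy posets and the α/γ maps in the sketch below are our own assumptions, not taken from the chapter.

```python
# Minimal sketch: test alpha(l1) <=2 l2  iff  l1 <=1 gamma(l2) by enumeration.
def is_galois_connection(L1, le1, L2, le2, alpha, gamma):
    return all(le2(alpha(x), y) == le1(x, gamma(y)) for x in L1 for y in L2)

# Concrete poset: subsets of {1, 2} under inclusion; abstract poset: 0 <= 1.
L1 = [frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2})]
L2 = [0, 1]
alpha = lambda s: 0 if not s else 1                      # is the set non-empty?
gamma = lambda v: frozenset() if v == 0 else frozenset({1, 2})
print(is_galois_connection(L1, lambda a, b: a <= b,
                           L2, lambda a, b: a <= b, alpha, gamma))   # True
```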


3 Argumentation Frameworks for Abstraction

Technically speaking, to apply abstract interpretation for some given (A, R) as we showed in Sect. 1, a facilitator prepares a larger A′ such that A ⊆ A′ ⊆fin 𝒜, and an R′ ∈ ℛ_A′ such that (1) R ⊆ R′ and (2) for any a1, a2 ∈ A, if (a1, a2) ∈ R, then also (a1, a2) ∈ R′, and if (a1, a2) ∉ R, then also (a1, a2) ∉ R′. The second condition for R′ ensures that it is conservative over R for the members of A.

3.1 Lattices

Let (L2, ≤, ⋁, ⋀) be a finite lattice, and let f : A′ → L2 be such that, for each l2 ∈ L2, if l2 is neither a top nor a bottom element, then there exists some a ∈ A′ such that f(a) = l2, i.e. f is surjective except possibly for the two extreme elements of L2. This function is basically a semantic interpretation of A′, which could be some chosen ontology representation with a belongs-to relation among entities. For example, in Sect. 1, focusOnMp, focusOnEc and focusOnOs all belonged to focusOnImp, which should place focusOnImp onto a higher part of L2 than the three, i.e. f(focusOnEc), f(focusOnMp), f(focusOnOs) ≤ f(focusOnImp). The elements in L2 form an abstract lattice. In comparison, we define the elements of a concrete lattice to be just sets of arguments in A′ ordered by set inclusion. Specifically, let low : L2 → 2^L2 be such that: low(l2) := {l2} if l2 is the bottom element of L2; otherwise low(l2) := {x ∈ L2 | x < l2 and there is no y ∈ L2 with x < y < l2}. We let (L1, ⊑, ⋁, ⋀) be another complete lattice where L1 := 2^A′ and ⊑ satisfies:

• x ⊑ y if x ⊆ y.
• x ⊑ y and y ⊑ x iff: x = {a1, ..., an} and y = {a1, ..., a_{i−1}, a′1, ..., a′m, a_{i+1}, ..., an} with low(f(ai)) = {f(a′1), ..., f(a′m)}.

The lattices in Fig. 3 illustrate the second condition with low. We point out that low(f(focusOnImp)) = {f(focusOnMp), f(focusOnEc), f(focusOnOs)} in L2. Hence, {focusOnImp} and {focusOnMp, focusOnEc, focusOnOs} are equivalent in L1 (which is indeed a quotient lattice). This equivalence reflects the following interpretation of ours of arguments in A′. Any argument a1 ∈ A′ has concrete instances a2, ..., ai (i.e. they belong to a1) if f(a2), ..., f(ai) are children of f(a1) in the abstract lattice. If, here, f(a2), ..., f(ai) are all the children of f(a1), our interpretation is that mentioning f(a1) is just a shorthand for mentioning all of f(a2), ..., f(ai), i.e. both mean the same thing with respect to the structure of (L2, ≤, ⋁, ⋀). It is because of this property that we place all equivalent sets of arguments at the same position in L1.


3.2 Abstraction and Concretisation

Now, let α : L1 → L2 be the abstraction function, and let γ : L2 → L1 be the concretisation function. We require: α(l1) = ⋁_{au ∈ l1} f(au); and γ(l2) = {x ∈ A′ | f(x) ∈ low(l2)}. The intuition, in particular of γ, is as we described in Sect. 3.1 just above. Note that γ(l2) is an empty set when low(l2) does not contain any f(a) for a ∈ A′. We say that ax is the best abstraction of {a1, ..., an} iff f(ax) = α({a1, ..., an}); more generally, we say that ax is an abstraction of a1, ..., an iff α({a1, ..., an}) ≤ f(ax). We say that {a1, ..., an} is the most general concretisation of ax iff {a1, ..., an} = γ(f(ax)); more generally, we say that {a1, ..., an} is a concretisation of ax iff {a1, ..., an} ⊆ γ(f(ax)).

Fig. 3 Illustration of a concrete lattice and an abstract lattice


Proposition 1 (Galois connection) For every l1 ∈ L1 and every l2 ∈ L2, we have α(l1) ≤ l2 iff l1 ⊑ γ(l2).

Proof If: Suppose l2 < α(l1), i.e. l2 < ⋁_{au ∈ l1} f(au) by the definition of α (< is the standard abbreviation for the strict order). Then we have γ(l2) ⊂ l1, a contradiction. Suppose instead that l2 and α(l1) are not comparable in ≤; then clearly l1 ⋢ γ(l2), a contradiction. Only if: Suppose γ(l2) ⊂ l1; then there exists at least one e in l1 which is not in any set equivalent to γ(l2) under ⊑. Then, by the definition of α, we have l2 < α(l1), a contradiction. □

Example 2 In Fig. 3, low(f(focusOnImp)) = {f(focusOnMp), f(focusOnEc), f(focusOnOs)}. We see that, for instance, {focusOnMp, focusOnEc} is mapped to f(focusOnImp) by α, as α({focusOnMp, focusOnEc}) = f(focusOnMp) ⋁ f(focusOnEc). Hence, it so happens that focusOnImp is (the best) abstraction of {focusOnMp, focusOnEc}. Meanwhile, γ(f(focusOnImp)) = {focusOnMp, focusOnEc, focusOnOs} = X. Since (α, γ) is a Galois connection, α(X) = f(focusOnImp) again.

Proposition 3 If ax is the best abstraction of a set A of arguments, then every abstraction of A is an abstraction of ax.

Proposition 4 (Existence) There exists at least one abstraction for every set of arguments.

Proof L2 is a complete lattice. □
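The following Python sketch reproduces Example 2 computationally; the Hasse diagram of the abstract lattice is our own assumption based on Fig. 3, and f is taken to be the identity on argument names.

```python
# Minimal sketch: alpha, gamma and low() of Sects. 3.1-3.2 on an assumed lattice
# bot < focusOnOs, focusOnMp, focusOnEc, focusOnLiq; the first three < focusOnImp;
# focusOnImp, focusOnLiq < top.
PARENTS = {"bot": {"focusOnOs", "focusOnMp", "focusOnEc", "focusOnLiq"},
           "focusOnOs": {"focusOnImp"}, "focusOnMp": {"focusOnImp"},
           "focusOnEc": {"focusOnImp"}, "focusOnImp": {"top"},
           "focusOnLiq": {"top"}, "top": set()}
ARGS = {"focusOnOs", "focusOnMp", "focusOnEc", "focusOnLiq", "focusOnImp"}

def up(x):                       # all lattice elements >= x
    seen, todo = {x}, [x]
    while todo:
        for p in PARENTS[todo.pop()]:
            if p not in seen:
                seen.add(p); todo.append(p)
    return seen

def join(xs):                    # least upper bound in L2
    common = set.intersection(*(up(x) for x in xs))
    return next(z for z in common if common <= up(z))

def low(l2):                     # children of l2 (or {l2} for the bottom element)
    kids = {x for x in PARENTS if l2 in PARENTS[x]}
    return kids or {l2}

def alpha(args): return join(set(args))                 # join of f-images
def gamma(l2):   return {x for x in ARGS if x in low(l2)}

print(alpha({"focusOnMp", "focusOnEc"}))   # focusOnImp (the best abstraction)
print(gamma("focusOnImp"))                 # {'focusOnOs', 'focusOnMp', 'focusOnEc'}
print(alpha(gamma("focusOnImp")))          # focusOnImp again, as in Example 2
```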

However, some abstractions, including the top element of L2 if it is in {f(a) ∈ L2 | a ∈ A′}, can be so general that all arguments are abstracted by them. For our example from Sect. 1, we can consider the argument 'Argumentation is taking place', for instance. Not only does it aggregate a1, ..., a3, it also aggregates a4 and a5. Needless to say, when we reason about argumentation, we would normally not be interested in such overly general abstractions, since the whole point of argumentation theory is to be able to judge which set(s) of arguments may be acceptable when the others are unacceptable. A given argumentation should not be trivialised by such a big summary argument.

3.2.1 Conditions for Conservative Abstraction

Hence, a few conditions ought to be defined in order to ensure conservative abstraction. We assume that those elements of L2 that are so abstract that they could abstract all arguments in A′ into a single argument form a non-empty upper set M of L2: M (⊆ L2) is an upper set iff, if x ∈ M and x ≤ y both hold, then y ∈ M. The intuition is that once we find some f(a) in L2 that is so general, then any f(a1) such that f(a) ≤ f(a1) is also so general. For example, if f('Argumentation is taking place.') in L2 is so general, then f(az) for any az that further abstracts 'Argumentation is taking place' is also so general.

Let us say that there is a path from an argument a1 to an argument a2 iff either a1 attacks a2, or else there is a path from a1 to some argument a3 which attacks a2. Let us say that a set A1 of arguments is a strongly connected component in (A, R) iff (1) there is a path from any a1 ∈ A1 to any a2 ∈ A1 and (2) there exists no A1 ⊂ Ax ⊂ A such that Ax satisfies (1). We say that abstraction ax of a set A1 ⊆ A of arguments is: valid iff there exists a strongly connected component As ⊆ A such that A1 ⊆ As (abstraction ax is over arguments in a strongly connected component); non-trivial iff f(ax) ∉ M (abstraction cannot be too general); compatible iff (a, ax) ∉ R′ for any a ∈ A1 ∪ {ax} (abstraction ax should not be conflicting); and attack-preserving iff (1) (a, ax) ∈ R′ for every a ∈ A\A1 attacking a member of A1, and (2) (ax, a) ∈ R′ for every a ∈ A\A1 attacked by a member of A1.

For intuition, validity ensures that abstraction is over arguments in a cycle. The condition of compatibility dictates that abstraction ax is not feasible when ax is self-attacking or when there is an attack relation between ax and any member of A.


In the former, such abstraction defeats the whole point of making the acceptability statuses of arguments more informative, and in the latter, ax is a conflicting abstraction for A1. The role of attack-preservation is to ensure that the attack relation between abstraction ax of A1 ⊆ A and immediately connected arguments (external attackers of, or attackees to, some member of A1) remains unchanged, that is to say, to ensure that abstraction ax influences only the members of A1.

Proposition 5 (Independence) Let ω and θ each be one of the propositions {ax is valid, ax is non-trivial, ax is compatible, ax is attack-preserving}. ω materially implies θ iff ω = θ.

We say that abstraction ax of a set A1 of arguments is conservative iff it is valid, non-trivial, compatible and attack-preserving. We say that abstraction ay of a set A1 of arguments is the most conservative iff (1) ay is conservative and (2) there exists no conservative abstraction az (≠ ay) of A1 such that ay is an abstraction of az. In general, the best and the most conservative abstractions do not coincide.

Proposition 6 For a given (A, R), A′, f, R′ and (L2, ≤, ⋁, ⋀), we have
• The most conservative abstraction of A1 ⊆ A is not necessarily the best abstraction of A1.
• An abstraction ay of a conservative abstraction ax is not necessarily conservative.
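A minimal sketch (our own illustration, not the authors' implementation) of testing the four conditions for a candidate abstraction; mutual reachability is used as a simplified stand-in for the strongly-connected-component requirement, and f, M, R′ are supplied by the caller.

```python
# Minimal sketch: is abstraction ax of A1 within (A, R) conservative?
def reaches(x, y, R):
    seen, todo = set(), [x]
    while todo:
        c = todo.pop()
        for (u, v) in R:
            if u == c and v not in seen:
                seen.add(v); todo.append(v)
    return y in seen

def is_conservative(ax, A1, A, R, Rprime, f, M):
    valid = all(reaches(x, y, R) for x in A1 for y in A1)   # A1 inside one cycle/SCC
    nontrivial = f(ax) not in M                              # not overly general
    compatible = all((a, ax) not in Rprime for a in A1 | {ax})
    attackers = {a for a in A - A1 if any((a, b) in R for b in A1)}
    attackees = {a for a in A - A1 if any((b, a) in R for b in A1)}
    attack_preserving = (all((a, ax) in Rprime for a in attackers) and
                         all((ax, a) in Rprime for a in attackees))
    return valid and nontrivial and compatible and attack_preserving

# Usage idea: is_conservative("ax", {"a1","a2","a3"}, A, R, Rprime, f, M)
# for the running AF1 example, with Rprime extending R by ax's attacks on a4.
```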

3.3 Computation of Abstract Space Argumentation Frameworks from a Concrete Space Argumentation Framework

We present an algorithm which computes, for a given argumentation framework (the concrete space argumentation framework), the corresponding abstract space argumentation frameworks with respect to (A′, R′, f, (L2, ≤, ⋁, ⋀)), i.e. the choice made for facilitation. Algorithm 1 is given below; informally, it just keeps replacing, where possible at all, a part of, or an entire, cycle with an abstract argument, for all possibilities. Concerning Line 9: for a set of arguments A1 in a given argumentation framework, we say that A2 ⊆ A1 is a maximal subset of A1 that satisfies conservative abstraction iff (1) there exists a conservative abstraction of A2 and (2) there exists no A3 that satisfies both (2A) A2 ⊂ A3 ⊂ A1 and (2B) there exists a conservative abstraction of A3.

Proposition 7 (Complexity) Algorithm 1 runs at worst in exponential time.

Proof Strongly connected components are known to be computable in linear time (Line 5). Line 9 is computable at worst in exponential time. With n arguments, we can over-estimate that the for loop executes at most n times, the first while loop at most (C(n, n/2))^n times and the second while loop at most C(n, n/2) times, where C(n, k) is the binomial coefficient. □


Algorithm 1 Computation of the set of abstract space argumentation frameworks for a given concrete space argumentation framework

Require: Ω with or without a subscript is a set of argumentation frameworks; Ω.addSet(X) adds X into Ω, but duplicates are assumed to be discarded.
1:  function DeriveAbs((A, R), A′, R′, f, (L2, ≤, ⋁, ⋀))
2:    Ω ← an empty set.
3:      ▷ abstract space argumentation frameworks to be added to Ω
4:    Ω.addSet((A, R))      ▷ Initially only (A, R) is in Ω
5:    Σ ← all distinct sets of arguments in (A, R) that are strongly connected.
6:    for all A in Σ do
7:      Ω1 ← Ω               ▷ Copy Ω
8:      Ω ← an empty set.    ▷ Reset
9:      Δ ← the set of all maximal subsets of A that satisfy conservative abstraction.
10:     while Ω1 is not empty do
11:       while Δ is not empty do
12:         (A1, R1) ← the 1st element of Ω1
13:         Ax ← the 1st element of Δ      ▷ Ax ⊆ A1
14:         ax ← the best abstraction of Ax
15:         Replace Ax in (A1, R1) with ax to obtain (A2, R2).
16:         Ω.addSet((A2, R2))
17:         Remove the 1st element of Δ
18:       end while
19:       Remove the 1st element of Ω1
20:     end while
21:   end for
22:   return Ω
23: end function

3.4 Preferred Sets in Concrete and Abstract Spaces

We now subject preferred sets in concrete space to those in abstract space for more clues on arguments' acceptability in concrete space. To make precise the domains and ranges of the functions that appear in this subsection, we let F denote the class of all argumentation frameworks (A, R) for A ⊆fin 𝒜; sem denote 2^(2^𝒜) (this is the class that the preferred semantics of a given argumentation framework belongs to); and σ : sem → sem be such that σ(X) consists of every maximal member of X. Clearly, σ(X) ⊆ X holds generally, and X ⊆ σ(X) only if every member of X is maximal in X. Now, let g_p : 2^F → 2^sem be such that g_p(X) is the set of all the preferred semantics of (A, R) ∈ X (the procedure for computing the preferred semantics of a given argumentation framework is found in the literature). Further, for any A ⊆fin 𝒜, let g_γ^A : 2^sem → 2^sem be such that g_γ^A(Φ) = {σ({A′1, ..., A′n}) | ∃{A1, ..., An} ∈ Φ. ∀1 ≤ i ≤ n. A′i = Ai ∩ A}. This is functionally a projection function on a set of preferred semantics, retaining only those arguments found in A. For example, with A = {a1, a2}, g_γ^A({{{a1, a3}}, {{a4}, {a2, a5}}}) = {{{a1}}, {{a2}}}.
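A small Python sketch of the projection g_γ^A, checked against the example just given; the function names are our own.

```python
# Minimal sketch: project abstract-space preferred semantics onto arguments A,
# keeping only maximal projected sets (the sigma operator).
def sigma(X):
    return [S for S in X if not any(S < T for T in X)]

def g_gamma(A, semantics_family):
    # semantics_family: a set of preferred semantics, each a collection of sets
    return [sigma([set(E) & set(A) for E in sem]) for sem in semantics_family]

A = {"a1", "a2"}
family = [[{"a1", "a3"}], [{"a4"}, {"a2", "a5"}]]
print(g_gamma(A, family))   # [[{'a1'}], [{'a2'}]], matching the example above
```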


Fig. 4 Relating abstract and concrete preferred semantics

The underlying motivation behind it, however, comes from the point that abstraction has taken place to encapsulate irresolvable cyclic conflicts, which gives us a reason to not accept any member of γ(f(ax)). But this is achievable precisely when g_γ^A is the projection function as defined above. Finally, we denote Algorithm 1 by g_α.

Figure 4 illustrates on the one hand g_p({(Ac, Rc)}) for an argumentation framework (Ac, Rc) in concrete space, which gives us the preferred semantics of (Ac, Rc) (= the set of all preferred sets of (Ac, Rc)), and on the other hand g_γ^Ac ∘ g_p ∘ g_α(Θ) with Θ ≡ ((A, R), A′, R′, f, (L2, ≤, ⋁, ⋀)), which also gives us a set of all preferred sets in concrete space, but through abstraction. The abstract transformations proceed by transforming the given concrete space argumentation framework into a set of abstract space argumentation frameworks (g_α(Θ)), deriving the preferred semantics for each of them (g_p ∘ g_α(Θ)), and projecting them onto the concrete space arguments Ac (g_γ^Ac ∘ g_p ∘ g_α(Θ)), so that comparisons to the preferred semantics obtained directly within concrete space can be made. In particular, we can learn that (1) an argument deemed credulously/sceptically acceptable within concrete space is positively approved by the abstract space preferred semantics, thus gaining more confidence in the set members being acceptable; and (2) arguments not deemed acceptable within concrete space, i.e. those that are not in any preferred set, are negatively approved also by the abstract space preferred sets, thus gaining more confidence in those arguments not being acceptable. But also (3) arguments deemed credulously or sceptically acceptable within concrete space may be questioned when their acceptability is not inferred from any abstract space preferred set, and, on the other hand, (4) arguments deemed not acceptable within concrete space may be credulously/sceptically implied by abstract space preferred set(s).

To formally sum it up, given an argumentation framework AF ≡ (Ac, Rc), we say that, with respect to a given Θ ≡ (A′, R′, f, (L2, ≤, ⋁, ⋀)), an argument that is deemed credulously/sceptically acceptable in concrete space is:
• +approved iff, for some/every member X of g_γ^Ac ∘ g_p ∘ g_α(Θ) (similarly for the other three notions below), it belongs to some/every member A of X;
• questioned iff, for every member X of g_γ^Ac ∘ g_p ∘ g_α(Θ), it belongs to no member A of X.
And we say that, with respect to the given Θ, an argument that is deemed not acceptable in concrete space is:
• −approved iff, for every member X of g_γ^Ac ∘ g_p ∘ g_α(Θ), it belongs to no member A of X;
• credulously/sceptically implied iff, for some/every member X of g_γ^Ac ∘ g_p ∘ g_α(Θ), it belongs to some/all member(s) A of X.

3.5 Comparisons to Dung Preferred Semantics and cf2 Semantics, and Observations

We conclude this section with comparisons to Dung's preferred semantics and the cf2 semantics [3]. Let us first consider AF1 in Fig. 1 and the lattices in Fig. 3, and let us denote g_γ^A ∘ g_p ∘ g_α for any A ⊆fin 𝒜 by G^A. Then we have {∅} for g_p({(Ac, Rc)}) (i.e. (Ac, Rc) has only one Dung preferred set, which is the empty set); {{a1, a5}, {a2, a5}, {a3, a5}} for cf2((Ac, Rc)); while {{{a5}}} for G^Ac(Θ) with Θ ≡ ((Ac, Rc), A′, R′, f, (L2, ≤, ⋁, ⋀)) (as we have already shown the only abstract space argumentation framework, AF2, in Fig. 1, we omit the derivation process). That is, {a5} is the only preferred set of (Ac, Rc) derivable via the abstract transformation. By comparison between g_p({(Ac, Rc)}) and G^Ac(Θ), we observe that a1, a2, a3, a4 are all −approved, while a5 is implied. Hence, in this case, with respect to Θ, we might say that Dung's preferred semantics behaves more conservatively than necessary. On the other hand, by comparison between cf2((Ac, Rc)) and G^Ac(Θ), we observe that cf2((Ac, Rc)) accepts one of the arguments in the odd cycle, which is more liberal than necessary with respect to Θ, since no arguments in AF1 could break the preference pre-order focusOnOs < focusOnMp < focusOnEc < focusOnOs among the three arguments. Therefore, for AF1, Dung semantics seems to give a false negative to a5's acceptability, while cf2 seems to give false positives to the acceptability of one of a1, a2, a3. If those acceptability semantics aim to answer 'Which arguments should be (credulously) accepted?', false negatives only signal omission, but false positives signal unintuitive results and are less desirable.

Let us, however, consider AF3 in Fig. 5, borrowed from [3]. Let (Ac, Rc) denote AF3. Assume Ac consists of:
a1 The downpour has been relentless since the morning (DP for short).
a2 It was burning hot today (Bn).
a3 All our employees ran a pleasant full marathon today (Fm).
a4 Nobody stayed indoor (NoId).
a5 Many enjoyed TV shows at home (TV).


Fig. 5 Top left: an argumentation framework AF3 . Bottom-left: AF3 ’s abstraction. Right: an abstract lattice L 2


Assume Rc to be as shown in the top left of Fig. 5. Let us say that a facilitator recognises (1) that a downpour and the burning sun relate under the harsh weather; (2) that the harsh weather and indoor activities such as watching TV shows relate under harsh-weather activity (that is, an activity to do under a harsh weather condition); (3) that harsh-weather and mild-weather activities do not go together; and (4) that a full marathon is good under a mild weather. Thus, he/she conceives: (1) ay: The weather was harsh today. (HW); (2) ax: Indoor activities were popular today due to the harsh weather. (HWA); (3) aw: Many enjoyed indoor activities. (Id); (4) av: Activities under a mild weather were popular today. (WMA). Hence, he/she sees A′ = Ac ∪ {HW, HWA, Id, WMA} and (L2, ≤, ⋁, ⋀) as in Fig. 5. We assume M = {⊤} (just the top element of L2), and any nodes below f(Fm), f(TV), f(Dp), f(Bn) not explicitly shown there are still assumed to be there. Further, he/she notices (a) that the harsh weather and a mild-weather activity do not go together, and (b) that the harsh weather opposes the absence of indoor activities, thus letting R′ = R ∪ {(HW, Fm), (Fm, HW), (HW, NoId), (Id, NoId)}.

Here we have {{a5}} for g_p({(Ac, Rc)}), and {{a1, a4}, {a1, a5}, {a2, a5}, {a3, a4}, {a3, a5}} for cf2((Ac, Rc)). Meanwhile, for G^Ac(Θ) with Θ ≡ ((Ac, Rc), A′, R′, f, (L2, ≤, ⋁, ⋀)), note that {a1, a2} is, first of all, a maximal subset of {a1, a2, a3} for which ay is valid (abstraction is over arguments in a cycle), non-trivial (f(HW) is not in M) and compatible ((a1, ay), (a2, ay), (ay, a1), (ay, a2), (ay, ay) ∉ R′). It is also the most conservative. Hence, the argumentation framework shown under AF3 in Fig. 5 is the abstract space argumentation framework of AF3 with respect to Θ. Consequently, G^Ac(Θ) = {{{a3, a4}, {a3, a5}}}. Therefore, in this example, G^Ac(Θ), too, credulously accepts an argument in the odd cycle, as cf2((Ac, Rc)) does.

It is safe to observe that whether the traditional Dung semantics or cf2 is more appropriate depends not just on an argument graph but also on the semantic relation among the arguments in the graph, and that the combination of abstract argumentation and abstract interpretation is one viable methodology to address this problem around cycles in argumentation frameworks.



4 Conclusion with Related Work

4.1 Related Work

As far as we are aware, this is the first study that applies the idea of abstract interpretation to abstract argumentation theory, taking inspiration from facilitation (although abstract interpretation for sound over-approximation, that is, abstract interpretation in the ordinary sense of static program analysis, is not explicitly identified here; for that, [1] seems to be the first within formal argumentation). Odd-sized cycles have been a popular topic of research in the literature for some time, as they tend to make the acceptability statuses of arguments in the cycle itself, and even in other parts of a given argumentation graph, undecided. Noting the difference between the preferred and the grounded semantics, Baroni et al. [3] proposed to accept maximal conflict-free subsets of a cycle for gaining more acceptable arguments off an odd-length cycle, which led to the cf1/cf2 semantics. They are regarded as improvements on the more traditional naive semantics [4]. They also weaken Dung defence around strongly connected components of an argumentation framework into SCC-recursiveness. The stage2 semantics, which took inspiration from cf2, is another approach with a similar SCC-recursive aspect, but it is based on the stage semantics [12] rather than the naive semantics, the incentive being to maximise range (the range of a set of arguments is the set itself plus all arguments it attacks). The fundamental motivation of those semantics was to treat an odd-length cycle in a similar manner to an even-length cycle. As we showed, however, specialisation of Dung semantics without regard to the semantic relation among arguments in a given argumentation framework is not fully generalisable. To an extent, the fact that any such systematic resolution of the acceptability of cyclic arguments based only on an argumentation graph is tricky relates to the fact that attacking arguments in a cycle can be contrarily [8] but not necessarily contradictorily opposing. As is known in linguistics, dealing with contrary relations is, in general, difficult in Fregean logic. However, with abstract interpretation, we can take advantage of the semantic information of arguments, by which uniform treatment of cycles comes into reach.

4.2 Conclusion

We introduced abstract interpretation into argumentation frameworks. Our formulation shows that it is also a powerful methodology in AI reasoning. We believe that more and more attention will be directed towards semantic-argument-graph hybrid studies within the argumentation community, and we hope that our work will provide one fruitful research direction.


Studies on other aspects of facilitation in argumentation should also be worthwhile.

Acknowledgements We thank the reviewers for their comments. The first and the third authors were partially supported by JST CREST Grant JPMJCR15E1, and the first author is also additionally supported by the AIP challenge program, Japan.

References
1. Arisaka, R., Ito, T.: Abstract interpretation in formal argumentation: with a Galois connection for abstract dialectical frameworks and may-must argumentation (first report). CoRR (2020). arXiv:2007.12474
2. Baroni, P., Giacomin, M.: On principle-based evaluation of extension-based argumentation semantics. Artif. Intell. 171(10–15), 675–700 (2007)
3. Baroni, P., Giacomin, M., Guida, G.: SCC-recursiveness: a general schema for argumentation semantics. Artif. Intell. 168, 162–210 (2005)
4. Bondarenko, A., Dung, P.M., Kowalski, R.A., Toni, F.: An abstract, argumentation-theoretic approach to default reasoning. Artif. Intell. 93(1–2), 63–101 (1997)
5. Caminada, M.: On the issue of reinstatement in argumentation. In: JELIA, pp. 111–123 (2006)
6. Cousot, P., Cousot, R.: Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In: Conference Record of the Fourth Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Los Angeles, California, pp. 238–252. ACM Press, New York (1977)
7. Dung, P.M.: On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming, and n-person games. Artif. Intell. 77(2), 321–357 (1995)
8. Horn, L.R.: A Natural History of Negation, 2nd edn. The University of Chicago Press, Chicago (2001)
9. Ito, T.: Towards agent-based large-scale decision support system: the effect of facilitator. In: HICSS (2018)
10. MacKnight, C.B.: Teaching critical thinking through online discussions. Educ. Q. 38–41 (2000)
11. Rovai, A.P.: Facilitating online discussions effectively. Internet High. Educ. 10(1), 77–88 (2007)
12. Verheij, B.: Two approaches to dialectical argumentation: admissible sets and argumentation stages. In: NAIC, pp. 357–368 (1996)

How to Recognize and Explain Bidding Strategies in Negotiation Support Systems Vincent J. Koeman, Koen Hindriks, Jonathan Gratch, and Catholijn M. Jonker

Abstract Effective use of negotiation support systems depends on the system's capability to explain itself to the user. This paper introduces the notion of an explanation matrix and an aberration detection mechanism for bidding strategies. The aberration detection mechanism detects whether one of the negotiating parties deviates from their expected behaviour, i.e. when a bid falls outside the range of expected behaviour for a specific strategy. The explanation matrix is used to decide when to explain which aberrations to the user. The idea is that the user, when understanding the aberration, can take effective action to deal with it. We implemented our aberration detection and explanation mechanisms in the Pocket Negotiator (PN). We evaluated our work experimentally in a task in which participants were asked to identify their opponent's bidding strategy, under different explanation conditions. As the number of correct guesses increases with explanations, these experiments indirectly show the effectiveness of our aberration detection mechanism. Our experiments with over 100 participants show that suggesting consistent strategies is more effective than explaining why observed behaviour is inconsistent. An extended abstract of this article can be found in [15].



Keywords Negotiation support systems · Recognizing · Explanation · Bidding strategies

1 Introduction

Negotiation support systems aim to assist human negotiators in their complex decision-making processes aimed at reaching an agreement to exchange goods or services. One such system, the Pocket Negotiator (PN) [10], states its goal as 'to enhance the negotiation skills and performance of the user ... through synergy between the human negotiator and the Pocket Negotiator'. The PN supports the major activities of a negotiation: modelling the interests of the user and the opponent, bidding, and closing. Techniques currently used to provide this support include preference elicitation methods [3, 20], visualization of the negotiation space [14], and multi-criteria optimization techniques for advising what to bid and when to accept the opponent's offer [2].

This paper is the first in a line of research to develop a full-fledged explanation framework for negotiation support systems. We decided to start with the explanation of what happens during the bidding phase of a negotiation, as the effectiveness of the bidding strategies determines, to a large extent, the utility of the negotiation outcome. During the bidding phase, the support currently provided by the PN consists of an interface with a range of options and tools; see Fig. 1. The snapshot is taken at a moment when the user has just received a bid from the opponent. Besides bid suggestions, PN provides an intuitive bid analysis in the form of the horizontal red bars in the upper right corner of Fig. 1. That same bid is also presented as a dot in the visualization of the bid space and its Pareto Optimal Frontier. Expert negotiators use this interface to quickly create bids, by either clicking on the points projected on the Pareto Optimal Frontier or by asking for a bid suggestion. Bid suggestions are generated by a bidding support agent. The user can pick any of a number of typical bidding strategies provided by PN. Finally, the visualization provides an overview of the bids made by the user and by the other party.

Fig. 1 An example of a bidding phase in the pocket negotiator

Note that the visualization of the bid space is based on an estimation of the preference profile of the opponent, and on the current view of the negotiator of his/her own preferences. If the estimation is wrong, then so is the visualization. Furthermore, the bid suggestions, and the advice of the support agent on when and what to accept, depend on that estimation. To get the most out of the interface in Fig. 1, the human user and the bidding support agent have to be able to collaborate at a high knowledge level about what is going on in the negotiation. The team needs to make a sophisticated analysis of the bidding by both parties, creating an understanding of why a player makes this particular bid now. Although the reader is referred to the state of the art in human negotiation theories for a thorough discussion of these questions, see, e.g. [17, 21, 22], we provide an example here. Suppose that one of the players, say the opponent, bids below the Pareto Optimal Frontier; then we need to know why. Several reasons come to mind:


1. The opponent doesn't realize his bid is not Pareto optimal. This might be the case if the opponent is human, as humans in multi-issue negotiations often find this difficult.
2. The opponent's preferences might differ from what we estimated.
3. It might be a tactic to play a bit unpredictably.
4. All of the above might hold at the same time.

The different cases ask for different actions on our side, and thus we need to identify which case holds. Similarly, users might seem to deviate intentionally or unintentionally from their chosen bidding strategy. If intentionally, our negotiation support agent should know about this, so that it can match its advice and support activities to that strategy. If it happens unintentionally, alerting the user might be the best support to give. We wrote 'seems to deviate', as it might also be the case that the preferences of a user change or are for some other reason different from the preferences entered in the negotiation support system. Again, it is important to discover this as quickly as possible, and to make sure that the team of human negotiator and negotiation support system has a shared understanding of what is going on. Now we have come to the core of the problem: a negotiation support system can only discuss these matters with the user if it can explain to the user what we wrote here. Furthermore, the need to discuss this can only be established if the agent is capable of detecting and analysing these and other strange behaviours that we decided to call aberrations.


This motivates the need for aberration detection and explanation mechanisms that we introduce in this paper, and for which we present experimental results. In Sect. 2, we discuss the state-of-the-art literature relevant for this work. Section 3 discusses the most characteristic and typical negotiation strategies used in real life. We use these strategies to further focus our research. The concepts already discussed informally in Sect. 3 are formalized and extended in Sect. 4 to form the basis for the analytical framework that forms the core of our aberration detection and explanation mechanisms as presented in Sects. 5 and 6. The experimental setup for the evaluation of our mechanisms is presented in Sect. 7. The experimental results are presented in Sect. 8. Conclusions can be found in Sect. 9.

2 Related Work

Explanations are currently employed in many sub-fields of artificial intelligence, such as justifying autonomous agent behaviour, debugging machine learning models, explaining medical decision-making, and explaining the predictions of classifiers [18]. Reference [8] identifies, however, that allowing users of negotiation support systems to 'trust the system through co-participation, transparency, and proper representation' is still an open challenge. For negotiation agents representing humans specifically, the authors identify that a user's trust and willingness to relinquish control is conditional on a sufficient understanding of the agent's reasoning and the consequences of its actions. Reference [24] focuses on explaining the preferences of a user and his or her opponent in the Pocket Negotiator. The authors propose a mechanism to analyse discrepancies between the system's mental model and the user's (assumed) mental model. However, aside from addressing a different sub-topic within negotiation support than we do, generating the content of the explanations and evaluating their effectiveness are also not addressed in that work. Reference [19] states that 'artificial agents should be equipped with explanation and argumentation capabilities in order to be able to convince their users of the validity of their recommendations'. Reference [23] identifies seven possible aims of such explanations: transparency, scrutability, trust, effectiveness, persuasiveness, efficiency, and satisfaction. The authors also consider these aims as criteria for good explanations, among which trade-offs are inevitable. The goal of an explanation should thus be carefully considered. Reference [18] argues that explanations in AI should be contrastive, selective, non-probabilistic, and social.

Although most research on 'opponent modelling' in (automated) negotiation focuses on determining the preferences of the opponent [7], in this work we focus explicitly on determining the (bidding) strategy that an opponent uses. Reference [7] identifies two main approaches: regression analysis and time-series forecasting. Specific implementations are, however, either overly simplistic (e.g. classifying an opponent as 'positive' when its average concession rate exceeds some pre-set amount [16]) or opaque (e.g. using techniques like neural networks and Markov chains). In this work, we aim to devise an approach that balances the level of sophistication with the degree of explainability, focusing on increasing a (novice) human negotiator's understanding of the opponent's strategy rather than determining that strategy as accurately as possible.


Using a negotiation support system as a training tool for novice negotiators, as [13] do for example, shares similarities with our aim of providing insight into the bidding strategies of opponents in those systems, as information about (digital) negotiations is to be conveyed to a novice user in both situations. Current work in the field of training is, however, mainly focused on evaluating the (actions of the) participant itself, e.g. focusing on factors such as making efficient concessions and avoiding early commitment. The explanation mechanism developed in this paper for opponent strategy recognition could be relevant for negotiation training, but we do not explicitly examine that aspect here.

3 Typical Bidding Strategies

As we study negotiation support systems for human negotiators, the number of rounds in a negotiation is low, with the highest numbers typically found in the markets of Northern Africa, where people enjoy haggling and thus the process takes much longer than in the USA, where the number of rounds of bidding is typically no more than 3. In that light, the essence of human negotiation strategies can be captured by the following four typical strategies as identified in [5]: Hardheaded ('Tough negotiator that does not yield easily', i.e. makes mostly silent moves or small concessions), Conceder ('Nice negotiator that tends to move towards you', i.e. generally makes concessions), Tit-for-Tat ('Somewhat mirrors the moves you are making', i.e. responds with the same type of moves) and Random ('Does not follow any of the other strategies'). Examples of these bidding strategies in action are depicted in Fig. 2. Note that in that picture, the user (playing the 'me' role) is playing a strategy that allows her to differentiate between these four typical strategies, and, in particular, between the Conceder and the Tit-for-Tat strategy.

Note that the literature on automated negotiating agents, see, e.g. [4, 6], is full of all kinds of sophisticated bidding strategies. However, the core of most of these strategies is formed by (combinations of) two commonly used negotiation tactics: time-dependent tactics and behaviour-dependent tactics, in which some aspect of randomness is used to prevent the strategy from becoming too predictable. The Conceder and Hardheaded strategies fall under the time-dependent tactics, and Tit-for-Tat is a behaviour-dependent tactic.

It is quite a challenge to recognize the essence of someone's negotiation strategy during a negotiation of only a few rounds. To be able to develop a mechanism to do so that works independently of the domain of negotiation and independently of the opponent one is playing, we need a way to abstract away from the exact details of bids and offers. The following section presents an abstract framework to do so.


Fig. 2 Typical bidding strategies

4 Bids, Utilities and Moves

This section presents the notation and definitions for bids, utilities and moves as used in the remainder of this article. N = {H, O} is the set of negotiators, where H denotes the human participant and O the opponent. Variable a ∈ N ranges over the negotiators ('agents'). B denotes the bid space for the negotiation, and B_i^a ∈ B denotes the bid made by agent a in round i. Let u_a : B → [0, 1] denote the utility function of agent a (i.e. a's preferences); then m_a ⊆ B = {b ∈ B | ∀b′ ∈ B : u_a(b′) ≤ u_a(b)} is the set of bids that have maximum utility for agent a. These are the so-called maximum utility bids for the agent. U_N = [0, 1]^|N| denotes the multi-dimensional utility space (i.e. all possible bids in a domain) over the negotiators in set N, where |N| denotes the number of elements in N. The bids made by the negotiators are mapped to U_N according to the utility functions of the negotiators by the function υ : B → U_N defined by υ(b) = (u_H(b), u_O(b)). A bid sequence β_{i,j}^a = (B_i^a, ..., B_j^a) is the sequence of bids made by an agent a ∈ N from round i up to and including round j, where i ≤ j, in the negotiation. (For simplicity, we disregard the possibility of using information from a previous encounter with the same opponent here.)


A move μ is a pair (b, b′) of two sequential bids b, b′ ∈ B made by the same agent. Any negotiating party can make offers that, from the perspective of an agent a, can be seen as concessions, selfish moves or silent moves. For any two agents a, a′ ∈ N and any move μ = (b, b′) ∈ B × B, we define the following:

• [move size] σ_a(μ) = u_a(b′) − u_a(b) is the size of the move (i.e. the difference in utility) according to a.
• [silent moves] sil_a^δ(a′, μ) if |σ_a(μ)| ≤ δ, which means that agent a considers the move with a size of less than δ made by agent a′ to be a silent move.
• [concession moves] We differentiate between
  ◦ conc_a^δ(a′, μ) if a ≠ a′ ∧ σ_a(μ) > δ, which means that, from the perspective of a, agent a′ conceded at least δ. The generic case conc_a refers to conc_a^0.
  ◦ conc_a^δ(a, μ) if σ_a(μ) < 0 ∧ |σ_a(μ)| ≥ δ, which means that a conceded at least δ.
• [selfish moves] self_a^δ(a, μ) if σ_a(μ) > δ, which means that a thinks he made a selfish move of at least δ in size.
• [move types] The parametrized characterizing relations defined above also define sets M_a of move types according to agent a: M_a = {sil_a^δ1, conc_a^δ2, self_a^δ3}, for given δ parameters. Thus, it is the set of all move types that satisfy the corresponding predicates.

These notions are inspired by the Dynamics Analysis of Negotiation Strategies (DANS) framework of [11], which we simplified by modelling unfortunate moves as selfish moves, and fortunate/nice moves as concessions. The next section illustrates the effectiveness of our abstract framework by introducing an optimal strategy detection algorithm that is based on this framework.
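The move classification above can be sketched in a few lines of Python; the function and bid names below are illustrative assumptions, not the Pocket Negotiator implementation.

```python
# Minimal sketch: classify a move mu = (b, b') as seen through agent a's utility.
def move_size(u_a, mu):
    b, b_next = mu
    return u_a(b_next) - u_a(b)

def move_type(u_a, mu, own_move, delta=0.05):
    s = move_size(u_a, mu)
    if abs(s) <= delta:
        return "silent"
    if own_move:                                   # a judging its own move
        return "concession" if s < 0 else "selfish"
    return "concession" if s > 0 else "selfish"    # a judging the opponent's move

# Example: H judges a move of opponent O that raises H's utility by 0.2.
u_H = {"bid1": 0.3, "bid2": 0.5}.get
print(move_type(u_H, ("bid1", "bid2"), own_move=False))   # 'concession'
```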

4.1 Optimal Bidding Strategy Recognition

The following strategy correctly determines the opponent's strategy in three to four rounds, unless the opponent plays the random strategy, as that strategy could theoretically behave consistently with a different strategy over multiple moves:

Round 1: Randomly select B_1^H from m_H.
Round 2:
  Bidding: Randomly select B_2^H from {b ∈ B | τ_H^H((B_1^H, b)) = conc_H^δ}, where δ corresponds to a moderate concession, e.g. δ = .1.
  Analysis: After O has made a bid, compare the first moves of both players to form a first hypothesis Hyp. Let t^O = τ_H^O(β_{1,2}^O).



  If σ_H(β_{1,2}^O) ≤ σ_H(β_{1,2}^H) ∧ t^O ∈ {conc_H, sil_H}, then Hyp := {HH, R}.
  If σ_H(β_{1,2}^O) ≥ σ_H(β_{1,2}^H), then Hyp := {CC, R, TT}.

Round 3: The bid of H depends on the analysis of round 2.
  Bidding in case Hyp = {HH, R}: Randomly select B_3^H from {b ∈ B | τ_H^H((B_2^H, b)) = conc_H^δl}, where δl is the boundary of a large concession, e.g. δl > 0.2.
  Bidding in case Hyp = {CC, TT, R}: Randomly select B_3^H from {b ∈ B | τ_H^H((B_2^H, b)) = self_H^δm}, where δm is the boundary of a moderately selfish move.
  Analysis in case Hyp = {HH, R}: Let t^O = τ_H^O(β_{2,3}^O). If t^O ∈ {self_H, conc_H} and σ_H(β_{2,3}^O) ≤ σ_H(β_{2,3}^H), then conclude that O plays Hardheaded, and stop; else conclude that O plays Random.
  Analysis in case Hyp = {CC, TT, R}: Let t^O = τ_H^O(β_{2,3}^O). If t^O = self_H, then O does not play Conceder, so we update Hyp := {TT, R}. If t^O ∈ {sil_H, conc_H}, then O does not play Tit-for-Tat, so we update Hyp := {CC, R}.

Round 4: Only needed if round 3 ended without conclusions.
  H's bid: Randomly select B_4^H from {b ∈ B | τ_H^H((B_3^H, b)) = sil_H}.
  Conclusions: Let t^O = τ_H^O(β_{3,4}^O).
  Analysis in case Hyp = {TT, R}: If t^O = sil_H, then this does not fit with R and we conclude that O is playing Tit-for-Tat; else we conclude that O is playing Random.
  Analysis in case Hyp = {CC, R}: If t^O ∈ {sil_H, conc_H}, then we conclude that O plays Conceder; else we conclude that O plays Random.
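The hypothesis-elimination idea behind this protocol can be sketched as follows; this is our own simplified rendering, which ignores the δ thresholds and the move sizes.

```python
# Simplified sketch: narrow down the opponent's strategy from observed move types.
# opp_moves: opponent's moves as judged by H; own_moves: H's own move types.
def recognise(opp_moves, own_moves):
    hyp = {"Hardheaded", "Conceder", "Tit-for-Tat", "Random"}
    for opp, own in zip(opp_moves, own_moves):
        if opp != "silent":
            hyp.discard("Hardheaded")      # HH is expected to stay (nearly) silent
        if opp != "concession":
            hyp.discard("Conceder")        # CC is expected to keep conceding
        if opp != own:
            hyp.discard("Tit-for-Tat")     # TT is expected to mirror H's move type
    return hyp                             # Random is concluded only by elimination

# H's selfish move in round 2 separates Conceder (keeps conceding) from Tit-for-Tat.
print(recognise(["concession", "concession", "concession"],
                ["concession", "selfish", "concession"]))   # {'Conceder', 'Random'}
```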

5 Expectations and Aberrations

As our aim is to pro-actively discuss bids with respect to a user's expectation ('guess') of the bidding strategy of the opponent, a mechanism is needed that can detect when a bid deviates from that strategy. The mechanism should be sensitive to the user's estimation of the opponent's bidding strategy, which we refer to as the assumption in the remainder of this paper. A deviation can only be detected if an expectation can also be formulated about the types of move that a negotiator would play if he or she were to play a certain strategy.

Let S be a set of bidding strategies. We define an expectation function ρ : S × ℕ × N → ⋃_{r ∈ ℕ}(P(M))^r to be a function that, given a strategy s ∈ S, a finite number of rounds r ∈ ℕ, and a negotiator a ∈ N, produces a sequence of length r of expected move types from M_a corresponding to s.


Strategy descriptions should be specific enough to derive the δ parameters of the move types, and the behaviour over the rounds and in relation to possible deadlines and/or discount factors. For each of the four typical strategies [5], Hardheaded ('Tough negotiator that does not yield easily', i.e. makes mostly silent moves or small concessions), Conceder ('Nice negotiator that tends to move towards you', i.e. generally makes concessions), Tit-for-Tat ('Somewhat mirrors the moves you are making', i.e. responds with the same type of moves) and Random ('Does not follow any of the other strategies', i.e. makes concessions or selfish or silent moves randomly), we give an example for four rounds of negotiation in which the role of the δ parameters is ignored. In each example, the human user is the first to bid in a round. If the human user H estimates the opponent O to play a Hardheaded strategy (denoted HH ∈ S) for four rounds, then ρ(HH, 4, H) = ({sil_H}, {sil_H}, {sil_H}). Similarly, if the strategy is estimated to be a Conceder strategy CC, then ρ(CC, 4, H) = ({conc_H}, {conc_H}, {conc_H}). A Random strategy R would yield the set of all possible move types per move: ρ(R, 4, H) = (M_H, M_H, M_H). Note that the ρ function for a Tit-for-Tat strategy (denoted TT) can only be determined if the bidding strategy of the human user is also given, or if a move type sequence for the same rounds of the human user is provided. Therefore, in the case of TT, the function is called as ρ(⟨TT, (conc_H, self_H, conc_H)⟩, 4, H) = ({conc_H}, {self_H}, {conc_H}).

In order to detect aberrations, we need to compare the move types of the actually made moves with the expected move types. For this, we define a set of functions τ_a^{a′} for all a, a′ ∈ N over bid sequences as follows:

∀μ ∈ B × B:  τ_a^{a′}(μ) = sil_a if sil_a(a′, μ);  conc_a if conc_a(a′, μ);  self_a if self_a(a′, μ).

∀a, a′ ∈ N, ∀i, j ∈ ℕ, ∀β_{i,j}^{a′}:
τ_a^{a′}(β_{i,j}^{a′}) = (τ_a^{a′}((B_i^{a′}, B_{i+1}^{a′})), ..., τ_a^{a′}((B_{j−1}^{a′}, B_j^{a′})))
Aberration detection is now as simple as checking, for each element of τ_a^{a′}(β^{a′}), whether it occurs in the corresponding element of ρ(s, r, a). By setting the δ parameters appropriately, minor deviations can be ignored.
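A minimal sketch of ρ and the aberration check for the four typical strategies; the strategy names and the tit-for-tat handling are our own simplification of the definitions above.

```python
# Minimal sketch: expectation function rho and aberration detection.
def rho(strategy, rounds, own_moves=None):
    if strategy == "Hardheaded":
        return [{"silent"}] * (rounds - 1)
    if strategy == "Conceder":
        return [{"concession"}] * (rounds - 1)
    if strategy == "Random":
        return [{"silent", "concession", "selfish"}] * (rounds - 1)
    if strategy == "Tit-for-Tat":            # mirrors the user's own move types
        return [{m} for m in own_moves]
    raise ValueError(strategy)

def aberrations(expected, observed):
    # indices of observed moves whose type is not among the expected types
    return [i for i, (exp, obs) in enumerate(zip(expected, observed))
            if obs not in exp]

expected = rho("Conceder", 4)
observed = ["concession", "selfish", "concession"]
print(aberrations(expected, observed))   # [1]: the selfish move is an aberration
```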

6 Generating Explanations

Now that we have a method to indicate a party's bid as deviating from the user's assumption of that party's strategy, and a classification of the deviation in terms of a direction and size according to (a simplification of) the DANS framework, we need to convey this information to the user.


Table 1 The aberration explanation matrix for our tit-for-tat expectation function (according to Template 1)

Our μ−1 = Silent; expected μ = Silent:
  Actual μ = Concession: A tit-for-tat player would typically not respond with a conceding move to your inaction
  Actual μ = Selfish: A tit-for-tat player would typically not respond with a selfish move to your inaction
Our μ−1 = Concession; expected μ = Concession (equal):
  Actual μ = Silent: A tit-for-tat player would typically not respond with inaction to your conceding move
  Actual μ = Concession (smaller): A tit-for-tat player would typically not respond with a conceding move that is much smaller than your concession
  Actual μ = Concession (larger): A tit-for-tat player would typically not respond with a conceding move that is much larger than your concession
  Actual μ = Selfish: A tit-for-tat player would typically not respond with a selfish move to your conceding move
Our μ−1 = Selfish; expected μ = Selfish (equal):
  Actual μ = Silent: A tit-for-tat player would typically not respond with inaction to your selfish move
  Actual μ = Concession: A tit-for-tat player would typically not respond with a conceding move to your selfish move
  Actual μ = Selfish (smaller): A tit-for-tat player would typically not respond with a selfish move that is much smaller than your selfish move
  Actual μ = Selfish (larger): A tit-for-tat player would typically not respond with a selfish move that is much larger than your selfish move

To this end, we propose the use of aberration explanation matrices, which visualize the expectation function and provide an explanation for every combination (i.e. aberration) of the expected move type(s) and size(s) and the actual move type(s) and size(s) of the opponent. As an example, Table 1 shows the aberration explanation matrix for the expectation function ρ(⟨TT, μ−1⟩, 2, H), which provides explanations for aberrations from an expectation of the tit-for-tat strategy. The matrix is set up according to the following template (Template 1): 'An expected strategy player would typically not respond with an actual μ to your μ−1', where expected strategy and actual are parameters to be instantiated. For simplicity, we leave move size information out. Note that we use μ−1 to signify the last two bids of our user, i.e. defining a decrease of our own utility by x as a concession towards the opponent of size r = −x; any bids before those last two are not used. Moreover, we use 'smaller' and 'larger' here to mean that the difference between the expected r (which is equal to the user's own r in μ−1) and the actual r is larger than 10% (δ = 0.1).


For each supported negotiation strategy, an explanation matrix should be provided, establishing a design from which the implementation can be constructed. As discussed in Sect. 7, the results from two pilot studies encouraged us to design an additional explanation template. The idea of the second template is to suggest to the user which strategies would be consistent with the observed behaviour, instead of only pointing out that the behaviour is not consistent with the user's current guess, as is done in Template 1. The alternative explanation template that we used is Template 2: 'Responding with an actual μ to your μ−1 is more consistent with consistent strategies', where actual and consistent are parameters to be instantiated.
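A small sketch of how the two templates might be realized in code is given below. The dictionary keys, template strings, and the rules that decide which strategies count as "consistent" are illustrative assumptions of ours, not the actual Pocket Negotiator implementation.

```python
# Illustrative lookup of aberration explanations (Template 1) and
# consistent-strategy suggestions (Template 2).
RESPONSE_LABEL = {"silent": "inaction", "concession": "a conceding move", "selfish": "a selfish move"}
OUR_LABEL = {"silent": "your inaction", "concession": "your conceding move", "selfish": "your selfish move"}

def explain_aberration(expected_strategy: str, our_last_move: str, actual_move: str) -> str:
    """Template 1: point out why the observed move does not fit the guessed strategy."""
    return (f"A {expected_strategy} player would typically not respond with "
            f"{RESPONSE_LABEL[actual_move]} to {OUR_LABEL[our_last_move]}")

def suggest_consistent(our_last_move: str, actual_move: str) -> str:
    """Template 2: name the strategies that would be consistent with the observed move."""
    consistent = {"random"}                   # a random player is consistent with any move
    if actual_move == our_last_move:
        consistent.add("tit-for-tat")
    if actual_move == "silent":
        consistent.add("hardheaded")
    if actual_move == "concession":
        consistent.add("conceder")
    return (f"Responding with {RESPONSE_LABEL[actual_move]} to {OUR_LABEL[our_last_move]} "
            f"is more consistent with {', '.join(sorted(consistent))}")

print(explain_aberration("tit-for-tat", "concession", "selfish"))
print(suggest_consistent("concession", "selfish"))
```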

7 Evaluation

This section describes our evaluation of the aberration detection mechanism and explanation matrix we introduced. If we tried to introduce our mechanisms at once for all aspects of the bidding phase, the experiments would have to cover too many variables for a meaningful evaluation, and the number of participants required would make the experiment infeasible. Finally, if we simply let participants negotiate, we cannot control how often aberrations occur, or whether they occur at all. Therefore, we designed the experiment in a way that greatly reduces the number of variables we test for and that gives us control over the aberrations that occur in the experiment. We decided to test the participants' understanding of the typical bidding strategies discussed in Sect. 3. In a between-subject setup, participants negotiated against automated opponents. The bidding strategy used by the automated opponents (agents) varied over the well-known bidding strategies. The participants were asked to identify the bidding strategy of the opponent. We controlled the variation over the bidding strategies, as well as whether or not the participant was supported by our explanation mechanism. We evaluated the effectiveness of this mechanism in improving a participant's understanding of the opponent's bidding strategy. We hypothesized that our explanation mechanism improves a PN user's understanding of a negotiation, and specifically, of the strategy that the other party uses. Pilot experiments showed that this depends, more than expected, on the contents of an explanation (of an aberration): for example, suggesting consistent strategies is more effective than explaining why observed behaviour is inconsistent. Therefore, we ultimately evaluated our hypothesis that our explanation mechanism based on aberrations increases a user's understanding of the opponent's strategy through controlled between-subjects experiments, in which one group did not receive such explanations, a second group received explanations of why a chosen strategy seemed less likely to fit, and a third group received explanations about which strategies would be consistent with the behaviour of the opponent. All participants were tasked with negotiating against a (computer-controlled) opponent that employed one


of the four defined strategies, with the goal of finding out which strategy this opponent was playing.

7.1 Preparation

In the experiments, each participant first received short definitions of the four possible negotiation strategies, and a brief training in the use of the PN itself. It was made clear that the goal was to determine the opponent's strategy, regardless of the outcome of the negotiation itself. All negotiations were performed in the multi-issue Jobs domain (see Fig. 3), which was selected because it is easily understandable for novice users while still providing enough complexity, and thus flexibility and variation, in the negotiations. The issues and values in this domain could be explored by the user in the PN; all issue weights and valuations were fixed for both parties, i.e. all preferences were fully known from the start and never changed. Each participant was asked to perform at least four negotiation sessions. In the first four negotiations, each participant played against each possible opponent at least once, in a random order. Participants were not informed of the fact that each opponent would only be encountered once. In all sessions, the participant's experiment condition did not change. Based on the optimal bidding strategy presented in Sect. 4.1, which requires three to four rounds of bidding, we allowed each participant sufficient room with at most ten rounds per negotiation. Our evaluation results show that this is indeed sufficient.

7.2 Conditions

During a negotiation, the opponent would never accept a bid (i.e. the opponent never ended the negotiation); only the participant could end the negotiation, when he or she was convinced of having successfully identified the strategy of the opponent (which also happened automatically after the ten-bid limit). This was known to the participants. The participant's assumption about the opponent's strategy was requested after each move of the opponent, as illustrated in Fig. 3. As the participant always had to start the negotiation with an opening bid, the first move of the opponent was already a response to the participant's first move. Thus, with the participant always making the opening bid, the first assumption about the opponent's strategy was requested after four bids (i.e. a move from both parties). If the opponent were to start the negotiation, a participant would have only three bids in total to base his or her first estimate on, which we considered too much of a guess. Participants were not informed of the correctness of their assumptions about the opponent's strategy at any point during the experiment. Moreover, the order in which the four strategies (i.e. assumption options) were displayed was randomized in each negotiation session. In the explanation conditions, the request for selecting the strategy the user thinks the opponent is employing was potentially accompanied by an explanation, as detailed in this paper.


Fig. 3 The display of an offer of an opponent in the modified PN

Note that such an explanation was always based on the participant's previous selection of the opponent's strategy, as it would otherwise be too easy to just 'try all buttons' and see how the system responds.

7.3 Metrics

Each bid, and each selection of an assumption, was logged. Moreover, after each negotiation, participants were asked to rate on a five-point Likert scale (i) how sure he or she was about the determination of the opponent's strategy and (ii) how well he or she was assisted by the system in making this determination. Before starting the first negotiation, participants had to rate their prior knowledge of negotiations (on a scale of 1–10) and indicate what kind of moves they would expect from each of the four strategies (with percentages). We did this in order to measure the participant's understanding of the four negotiation strategies, and posed the same questions in a post-questionnaire. For hardheaded, we counted the answer as correct when silent moves got the largest portion, along with a non-zero portion for concessions. For tit-for-tat, each portion had to be at least 20%. For random, each portion had to be at least 30%. Finally, for conceding, concession moves had to have the largest portion. In the post-questionnaire, we also asked how difficult the user found the task.
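The correctness criteria above can be restated as a few simple rules over the reported percentages per move type. The function below is an illustrative re-statement of those rules with names of our own choosing; it is not code from the study itself.

```python
# Illustrative check of a participant's expected-move percentages per strategy,
# following the correctness criteria described above.
def answer_correct(strategy: str, silent: float, concession: float, selfish: float) -> bool:
    portions = {"silent": silent, "concession": concession, "selfish": selfish}
    if strategy == "hardheaded":
        # Silent moves must get the largest portion, with a non-zero portion for concessions.
        return silent == max(portions.values()) and concession > 0
    if strategy == "tit-for-tat":
        return all(p >= 20 for p in portions.values())
    if strategy == "random":
        return all(p >= 30 for p in portions.values())
    if strategy == "conceder":
        return concession == max(portions.values())
    raise ValueError(f"unknown strategy: {strategy}")

print(answer_correct("hardheaded", silent=70, concession=20, selfish=10))  # True
print(answer_correct("random", silent=50, concession=30, selfish=20))      # False
```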


8 Results

This section describes and discusses the results of our experiments. Two pilots were held with relatively small groups, after which a large-scale online experiment was performed.

8.1 Pilot 1

To determine the suitability of the experimental setup and our software for the goals of our evaluation, we performed an exploratory pilot study with 11 participants, all male post-graduates in the department of the authors. Compared to the final setup described above, in this first setting only a post-questionnaire was held, in which the questions about 'What kinds of moves would you expect ...' were not posed. Furthermore, we included the question 'You had at most 10 bidding rounds to identify the strategy of each opponent. Was this sufficient?'. Finally, the participant's existing knowledge of negotiation was requested on a Likert scale (instead of on a scale of 1 to 10 as in the final setup). The results of this pilot are summarized in Fig. 4. On average, each participant negotiated five times. No technical problems were encountered. In about 80% of the negotiations, the final answer on which strategy the opponent was playing was correct; 6 out of the 11 participants even achieved a 100% score on this in the pilot. These high scores are also apparent in the questionnaire results (see Fig. 4), as the participants were very sure about their answers (μ = 4.4) and did not find the task very difficult (μ = 2.0). The condition (i.e. receiving explanations or not) did not have any significant effect, which we believe was due both to the small sample size (only five people received explanations) and to the low difficulty of the task for this highly educated group.

8.2 Pilot 2

Following the inconclusive results from the initial pilot, an additional pilot was held with a mixed-gender group of 39 third-year bachelor's students following a minor on negotiation. For this pilot, the opponents were tweaked in order to slightly increase the difficulty of the task, and, as mentioned above, the single post-questionnaire was split into pre- and post-questionnaires, to which questions were added in order to measure the participants' understanding of the negotiation strategies. In addition, as it was clear from the initial pilot that the limit of ten bids was more than sufficient, both from the related questionnaire question (μ = 4.3) and the fact that on average only five bids were made per negotiation (in about 2.5 min), the question related to this fact was removed. In the second pilot, on average, seven bids were made per negotiation (also in about 2.5 min).


Fig. 4 Results from the first pilot (N = 11)

Fig. 5 Results from the second pilot (N = 30)

As illustrated in Fig. 5, the results from the second pilot were in some sense the opposite of the results of the initial pilot. The 30 students that completed the task correctly identified only 39% of the opponents. As there are just four options to pick from, it can safely be concluded that the participants performed very poorly; this was also indicated by the participants themselves, who were less sure (μ = 3.7) and felt the task was more difficult (μ = 3.2). Just like in the initial pilot, no significant results based on the condition were found. Due to this fact and space constraints, further results from the pilots are not discussed here.


8.3 Full Experiment

Based on the two pilots, we introduced the 'new' explanation strategy in which strategies that would be consistent with an aberration are identified. Our main reason for doing this is that the original explanations that detail the aberrations only gave participants more knowledge about the strategy they had currently guessed, while this new form would also communicate information about one or more other strategies. In addition, in order to gain more participants from more varied backgrounds, we decided to perform a large-scale online experiment. Therefore, instead of the face-to-face training given in both pilots, this part was digitalized.2 To gain a sufficient number of participants, we made use of Amazon Mechanical Turk [1], 'a marketplace for work that requires human intelligence'. In order to ensure high-quality work from participants with whom we could not have direct interaction, a number of measures were taken (that are common practice [12]):

• Only participants who had performed at least 1000 tasks with at least a 99% acceptance rate were allowed in.
• Only participants from English-speaking countries were allowed to participate.
• In the pre-questionnaire, besides questions ensuring informed consent, questions were added to verify that the participant understood the training. Participants that did not answer these questions correctly were prevented from continuing in the experiment.
• A unique code was generated upon completion; participants submitting incorrect codes were rejected.

Out of 198 'turkers' that started the task, 84 completed the experiment successfully.3 31% of the participants were female. The main results of the experiment are shown in Fig. 6. Participants correctly identified the strategy of 44% of their opponents, using 6.7 bids on average (in about 2 minutes). Independent-sample t-tests were used to identify differences between participants that received any form of explanation, the 'old' inconsistent-behaviour explanations, the 'new' consistent-strategy explanations, or no explanations at all. Participants receiving any form of explanation on average had a 23.2% (±11.4%) better score against opponents playing a random strategy (t(79) = 2.029, p = 0.046) than participants that received no explanations at all. Moreover, such participants had a 13.5% (±6.4%) better score for correctly specifying the hardheaded strategy in the post-questionnaire (t(79) = 2.098, p = 0.039) as well. Participants receiving the 'new' form of explanation on average had a 15.3% (±5.7%) better score against any opponent (t(79) = 2.691, p = 0.009). As no significant difference was found for the 'old' form of explanation, we conclude that suggesting consistent strategies is more effective than explaining why observed behaviour is inconsistent. Participants receiving the 'new' form of explanation on average also had a 46.2% (±10.9%) better score against opponents playing a random strategy (t(79) = 4.253, p = 0).

2 The training for our experiment can be found at anonymized.

3 These numbers fall within the expected range for MTurk experiments of this type [9].


Fig. 6 Results from the final experiment (N = 81)

Such a difference was not found for the other opponents, perhaps because there is more overlap in their behaviours. This is especially true for the hardheaded and the conceding opponent (which mainly vary their concession rate), but also for the tit-for-tat opponent if the participant mainly performs concessions him or herself, a sub-optimal strategy. In addition, participants receiving the 'new' form of explanation on average had a 24.8% (±11.8%) better score in their second negotiation (t(77) = 2.103, p = 0.039) and a 26.4% (±11.6%) better score in their third negotiation (t(78) = 2.269, p = 0.026) than the other participants. Interestingly, such differences were not found for the first and fourth negotiations, suggesting a relatively steep learning curve. No further significant mean differences were found based on the explanation conditions. However, further correlation analysis suggests that participants that on average made more bids in a negotiation identified the opponent correctly more often (r = 0.233, p = 0.037), but were also less sure of their answer each time (r = −0.388, p = 0). Besides the 'new' form of explanations, as mentioned above, no other factor directly correlates with the number of correct answers of a participant. Interestingly, just like in both pilots, participants that indicated they were more knowledgeable about negotiations at the start did not perform significantly better. However, such participants seemed to perform fewer bids (r = −0.327, p = 0.003), to be more sure about their answers (r = 0.297, p = 0.007), and to feel more assisted (r = 0.304, p = 0.006) regardless. Being more sure (μ = 3.9) and feeling more assisted (μ = 3.7) positively correlate in general (r = 0.361, p = 0.001). Finally, no significant impact was found of either the order of opponents in the four negotiations or the ordering of the strategies in the interface in each negotiation, both of which were randomized.


9 Conclusion

If we can automatically detect that the user or the opponent seems to deviate from a strategy, that our opponent model might be wrong, or that the user or the opponent might have changed their preferences or simply made a mistake, this opens the possibility of pro-actively discussing these strategies with the user. The technology we introduce in this paper has been developed with the aim of supporting human negotiators in gaining insight into the bidding strategy of the opponent and into their own bidding behaviour. The core technology we developed consists of two aspects: aberration detection, and the notion of an explanation matrix. The aberration detection mechanism identifies when a bid falls outside the range of expected behaviour for a specific strategy. The explanation matrix is used to decide when to provide which explanations. We evaluated our work experimentally in a task in which participants are asked to identify their opponent's strategy in the Pocket Negotiator. On a technical note, our explanation mechanism made it easy for us to experiment with different types of explanations, as these could quickly be implemented in our explanation matrices. As the number of correct guesses increases with explanations, these experiments indirectly show the effectiveness of our aberration detection mechanism. Our experiments show that suggesting consistent strategies is more effective than explaining why observed behaviour is inconsistent.

Future Work

Our evaluations used a single negotiation domain and four negotiation strategies. Although we believe the domain and strategies are representative, the effects of using more complex domains and/or strategies can be examined. Finally, note that our work is also applicable to evaluating the bids a user makes him or herself, e.g. for confirming that a user's bids comply with that user's intended strategy (as set in the system) and providing an explanation when this is not the case (before a bid is actually made).

References

1. Amazon. Mechanical Turk (2018). https://www.mturk.com/. Accessed 16 Nov 2018
2. Baarslag, T.: What to bid and when to stop. Ph.D. thesis, Delft University of Technology (2014)
3. Baarslag, T., Gerding, E.H.: Optimal incremental preference elicitation during negotiation. In: Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI'15, pp. 3–9. AAAI Press (2015)
4. Baarslag, T., Hindriks, K.V., Jonker, C.M., Kraus, S., Lin, R.: The first automated negotiating agents competition (ANAC 2010). In: Ito, T., Zhang, M., Robu, V., Fatima, S., Matsuo, T. (eds.) New Trends in Agent-Based Complex Automated Negotiations. Studies in Computational Intelligence, vol. 383, pp. 113–135. Springer, Berlin (2012)
5. Baarslag, T., Fujita, K., Gerding, E.H., Hindriks, K., Ito, T., Jennings, N.R., Jonker, C., Kraus, S., Lin, R., Robu, V., Williams, C.R.: Evaluating practical negotiating agents: results and analysis of the 2011 international competition. Artif. Intell. 198, 73–103 (2013)


6. Baarslag, T., Aydoğan, R., Hindriks, K.V., Fujita, K., Ito, T., Jonker, C.M.: The automated negotiating agents competition, 2010–2015. AI Mag. 36(4), 115–118 (2015)
7. Baarslag, T., Hendrikx, M.J.C., Hindriks, K.V., Jonker, C.M.: Learning about the opponent in automated bilateral negotiation: a comprehensive survey of opponent modeling techniques. Auton. Agents Multi-Agent Syst. 30(5), 849–898 (2016)
8. Baarslag, T., Kaisers, M., Gerding, E.H., Jonker, C.M., Gratch, J.: When will negotiation agents be able to represent us? The challenges and opportunities for autonomous negotiators. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence, IJCAI'17, pp. 4684–4690. AAAI Press (2017)
9. Goodman, J.K., Cryder, C.E., Cheema, A.: Data collection in a flat world: the strengths and weaknesses of Mechanical Turk samples. Behav. Decis. Mak. 26(3), 213–224 (2012)
10. Hindriks, K.V., Jonker, C.M.: Creating human-machine synergy in negotiation support systems: towards the pocket negotiator. In: Proceedings of the 1st International Working Conference on Human Factors and Computational Models in Negotiation, HuCom'08, New York, NY, USA, pp. 47–54. ACM (2009)
11. Hindriks, K.V., Jonker, C.M., Tykhonov, D.: Let's DANS! An analytic framework of negotiation dynamics and strategies. Web Intell. Agent Syst. 9(4), 319–335 (2011)
12. Ipeirotis, P.G., Provost, F., Wang, J.: Quality management on Amazon Mechanical Turk. In: Proceedings of the ACM SIGKDD Workshop on Human Computation, HCOMP'10, New York, NY, USA, pp. 64–67. ACM (2010)
13. Johnson, E., Gratch, J., DeVault, D.: Towards an autonomous agent that provides automated feedback on students' negotiation skills. In: Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, AAMAS'17, Richland, SC, USA, pp. 410–418. International Foundation for Autonomous Agents and Multiagent Systems (2017)
14. Jonker, C.M., Hindriks, K.V., Wiggers, P., Broekens, J.: Negotiating agents. AI Mag. 33(3), 79–91 (2012)
15. Koeman, V., Hindriks, K.V., Gratch, J., Jonker, C.: Recognising and explaining bidding strategies in negotiation support systems, extended abstract. In: Agmon, N., Taylor, M.E., Elkind, E., Veloso, M. (eds.) Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2019, pp. 2063–3064. International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org) (2019)
16. Lee, W.-P.: Towards agent-based decision making in the electronic marketplace: interactive recommendation and automated negotiation. Expert Syst. Appl. 27(4), 665–679 (2004)
17. Lewicki, R.J., Saunders, D.M., Barry, B., Minton, J.W.: Essentials of Negotiation. McGraw-Hill, Boston (2003)
18. Miller, T.: Explanation in artificial intelligence: insights from the social sciences. Artif. Intell. 267, 1–38 (2019)
19. Moulin, B., Irandoust, H., Bélanger, M., Desbordes, G.: Explanation and argumentation capabilities: towards the creation of more persuasive agents. Artif. Intell. Rev. 17(3), 169–222 (2002)
20. Pommeranz, A., Broekens, J., Wiggers, P., Brinkman, W.-P., Jonker, C.M.: Designing interfaces for explicit preference elicitation: a user-centered investigation of preference representation and elicitation process. User Model. User-Adapt. Interact. 22(4), 357–397 (2012)
21. Raiffa, H.: The Art and Science of Negotiation: How to Resolve Conflicts and Get the Best Out of Bargaining. Harvard University Press, Cambridge (1982)
22. Raiffa, H., Richardson, J., Metcalfe, D.: Negotiation Analysis: The Science and Art of Collaborative Decision Making. Harvard University Press, Cambridge (2003)
23. Tintarev, N., Masthoff, J.: Explaining recommendations: design and evaluation. In: Ricci, F., Rokach, L., Shapira, B. (eds.) Recommender Systems Handbook, pp. 353–382. Springer, Boston (2015)
24. Van De Kieft, I., Jonker, C.M., Van Riemsdijk, M.B.: Explaining negotiation: obtaining a shared mental model of preferences. In: Proceedings of the 24th International Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems Conference on Modern Approaches in Applied Intelligence - Volume Part II, IEA/AIE'11, pp. 120–129. Springer, Berlin (2011)

Negotiation Frameworks, Strategies, and Recommenders

NegMAS: A Platform for Situated Negotiations

Yasser Mohammad, Shinji Nakadai, and Amy Greenwald

Abstract Most research in automated negotiation focuses on strategy development in preset scenarios where decisions about what to negotiate about, whom to negotiate with, and on which issues are given to the agents. Moreover, in many cases, the agents’ utility functions are predefined, static, and independent of other negotiations. NegMAS (Negotiations Managed by Agent Simulations/Negotiation Multiagent System) was developed to facilitate the research and development of autonomous agents that operate in a rich multiagent system where negotiations are paramount, such as a supply chain. The richness of the setting creates what we call situated negotiations, where negotiations naturally interdepend and agents’ utility functions arise endogenously from system dynamics. This paper introduces NegMAS—a platform for autonomous negotiation within a rich simulated multiagent system—and evaluates its use in a sample application.

1 Introduction

Negotiation is everywhere. It provides an important mechanism for collaboration, and is one of the main mechanisms for achieving agreements among parties with overlapping interests. Automated negotiation involves autonomous agents, either


negotiating among themselves or with humans on behalf of their (ultimately human) users. Automated negotiation has a rich and long history with roots in economics [27], game theory [25], machine learning [30], and multiagent systems [7]. Despite this long history, relatively few platforms exist for conducting research in automated negotiations. The most well-known such platform is GENIUS (General Environment for Negotiation with Intelligent multi-purpose Usage Simulation) [20], which was designed to alleviate difficulties in the process of developing general (i.e., domain-independent) automated negotiators. GENIUS provides an analytical toolbox with several tools to analyze negotiation domains, agent performance, and negotiation outcomes. Moreover, it comes equipped with a rich variety of negotiation domains, utility functions, and negotiation strategies. GENIUS is the main platform used in the Automated Negotiating Agents Competition (ANAC), which ran its tenth incarnation in 2019. GENIUS is implemented in Java and was open sourced in 2018. Whereas in a typical GENIUS negotiation a domain is specified and each agent is assigned a utility function to try to maximize, GENIUS was recently enhanced to support negotiations where agents have only partial knowledge of the utility function of their users. A related platform is Jupiter [10], which allows agents developed in Python to engage in GENIUS negotiations, but using only one of the negotiation protocols GENIUS supports. Another related project is the Pocket Negotiator, developed by the same group as GENIUS [14], as a support system for people engaging in nonautomated bilateral negotiations involving multiple issues. Yet another related platform is Invite, developed for research and training purposes by the InterNeg research center at Concordia University [15]. This platform is designed to support human–human negotiations, by providing analytic methods, communication facilities, and graphical tools depicting a simulated bargaining process [16]. Commercial platforms for fully automated negotiation, or even negotiation support systems, are still relatively uncommon. An example of a commercial negotiation support system is ContractRoom [6]. This platform provides easy-to-use tools that enable human negotiators to reach agreements faster, such as a mechanism that facilitates online collaboration. This mechanism is augmented with artificial intelligence, but the platform is still mostly a human–human negotiation support system; it does not venture into the realm of automated negotiations. The aforementioned platforms were designed to support the development of general negotiation strategies that work effectively in a wide variety of situations, and they succeed in achieving this goal. The variety of agents submitted to ANAC over a decade supports this claim, especially for the GENIUS platform. Nevertheless, our goal in this work is to model still more general negotiations, by building a platform that supports rich simulations of a multiagent system (e.g., an economy) in which negotiations are paramount, such as a supply chain. The richness of the setting gives rise to what we call situated negotiations, where negotiations naturally interdepend and agents' utility functions arise endogenously from system dynamics. Moreover, as is natural in the environments we aim to simulate, all decisions pertaining to a negotiation are autonomous in our platform, including decisions about when to negotiate,


what issues to negotiate about, with whom to negotiate, and even what protocols to use for these negotiations. Existing platforms were not designed to handle situated negotiations. All of the aforementioned platforms and systems share some limitations in situated negotiation settings, which NegMAS is intended to address. First, agents are limited to engaging in either a single negotiation or a set of independent negotiations. In some settings, this limitation can be crippling. Consider a single buyer in a market with multiple sellers. If the buyer can engage in multiple negotiations with the various sellers simultaneously, or sequentially, the situation is characterized by interdependent utility functions across multiple negotiations, something other platforms and systems fail to model, perhaps because of the inherent complexity in reasoning about the utility of one negotiation when it depends on the success of other current or future negotiations. Second, the utility function is usually assumed to be static for the duration of the negotiation. While this assumption may be justified in many cases, situating the negotiator within the real world (or a simulated environment) with intrinsic utility functions arising from the system dynamics necessitates treating utility functions as dynamic entities. Consider once again the single buyer, multiple sellers scenario; once an agreement is reached with any of the sellers, the utility value of all other potential agreements may change drastically. Finally, no other platform or system supports autonomous agents reasoning about which negotiations to engage in and what issues to negotiate about. NegMAS is intended to complement existing automated negotiation platforms (e.g., GENIUS) by addressing the structural issues stemming from the nature of situated negotiations. Moreover, NegMAS is developed as an open-source public project1; as such, it is open to contributions from the whole research community. It provides a common API that supports multiple programming languages (currently Python and Java). Finally, it is designed to work either as a stand-alone system or as a client to a distributed system, implementing the same API, thus providing a scalable solution. The rest of this paper is organized as follows: Sect. 2 describes what we mean by situated negotiations, and justifies our goal of building a system to handle them, based on their prevalence in the real world. Section 3 explains the design philosophy and design decisions made in NegMAS to support situated negotiations. Section 4 briefly presents the analytic tools available in NegMAS. In Sect. 5, we showcase NegMAS' capabilities, by describing a rich sample application.

2 Situated Negotiations

Automated negotiation is usually studied without regard to the environment in which the negotiation takes place [1, 5, 27]. From an engineering perspective, this approach

1 NegMAS is available from https://www.github.com/yasserfarouk/negmas.


can be justified by the separation of concerns design principle. In some situations, the outside environment can indeed be abstracted away, by embedding appropriate assumptions in the utility function or opponent model. For example, outside pressure to reach agreement can be modeled by imposing a time limit on the negotiation, an exponential discount factor on the utility function, as part of the opponent model, or in the reservation value assumed for failure to reach agreement. In some other scenarios—the ones NegMAS is intended to model—the effect of the environment on negotiators' options and behavior is harder to ignore. As an example, consider the matching market in the Navy detailing system, which allocates sailors to job vacancies [19]. In this system, vacancies are published and sailors apply to fill them. Commanders then select sailors to fill these jobs by conducting concurrent bilateral negotiations with applicants. In the Navy's system, the effect of one negotiation i on another negotiation j can be summarized in the reservation value employed in j, assuming enough information is available to accurately predict the agreement to be reached in i. Indeed, if historical data and distinguishing features of the negotiating partners (e.g., sailors) are available, it is possible to use this information to estimate (i.e., learn) the values of agreements offline, thereby decoupling concurrent negotiations. Relying on fixed reservation values for the duration of a negotiation is suboptimal, however, as it does not allow the negotiator to update their behavior based on the expected outcome of open negotiations as they proceed. Moreover, sufficient information is rarely available to decouple negotiations in advance; instead, dynamic modification of the reservation value (or other aspects of the negotiation) should happen during the negotiation process. A more business-like example of a complex negotiation scenario is a factory conducting concurrent bilateral negotiations with multiple suppliers of raw materials and multiple possible consumers of the factory's products. In this case, not only is the reservation value of a negotiation affected by other ongoing negotiations but the utility of all potential agreements is as well. Utility elicitation during negotiation provides another example in which an outside factor (the user the agent represents, in this case) affects the utility function, and as a result impacts what should be deemed acceptable behavior on the part of the negotiator [2, 4, 21, 23]. What all of these scenarios share is an effect of factors external to the negotiation on various aspects of the negotiations, particularly the utility function, that should entail a change in the negotiator's behavior. These scenarios are called situated negotiations in this paper. Some of their more salient features are:

• Negotiation management: agents are not only required to decide how to negotiate, but also whether to negotiate, when, and with whom.
• Simultaneous and sequential negotiations: negotiations interdepend, and thus it is suboptimal to consider multiple negotiations independently of one another.
• Embedded utility functions that emerge endogenously from real-world circumstances, or from the simulation environment in which the agent operates.
• Dynamic utility functions that are subject to change during the negotiation.
• Uncertain or unknown utility functions that are not known by the negotiating agents with certainty, or are only partially known.


NegMAS was developed to facilitate and ultimately advance research in situated negotiations, by providing an easy-to-use, extensible, portable platform for modeling and analyzing situated negotiations. To achieve this goal, NegMAS allows the designer to develop arbitrarily complex multiagent simulations of real-world scenarios, and provides the tools necessary to build negotiation mechanisms and negotiation strategies for agents embedded in these simulations. The complexity of the negotiation scenario (i.e., simultaneity, sequentiality, utility function interdependence, nonstationarity, etc.) arises naturally as a consequence of this embedding.

3 System Design

This section presents an overview of NegMAS' key components and their interactions. As a general design principle, NegMAS is intended to make common cases easy to implement while keeping less common cases possible, assuming sensible default settings for most parameters of its components.

3.1 Issues and Outcomes

A possible agreement of a negotiation is called an outcome and is denoted by ω. The set of all possible outcomes of a negotiation, Ω, is called the outcome space, and can be represented in three ways in NegMAS. The simplest, most abstract representation is an enumerated outcome space, a list enumerating countably many outcomes, i.e., Ω = {ω_i | i ∈ {1, . . ., n}}. The most structured representation is a dimensional outcome space, which is defined as the Cartesian product of a finite set of issues I. The range of each issue is flexible; it can be finite or infinite. NegMAS supports partial outcome specification for dimensional outcome spaces, meaning specification of values for only a partial set of the outcomes, which is used in some mediated protocols [18]. The third type of outcome space supported by NegMAS is a structured outcome space. Issues and outcomes can be structured into graphs that negotiation agents and mechanisms can use to guide their behavior. For example, defining a chain over issues can be used to indicate the order of negotiation for negotiation protocols that consider issues serially [8]. Defining a directed acyclic graph over outcomes can be used to encode probabilistic independence relationships among them [23, 26]. Any object can be an outcome of a negotiation in NegMAS, including numbers, strings, and more complex objects. Outcome ranges specifying both continuous and discontinuous parts of the outcome space are also supported, along with functions to efficiently test outcome membership in ranges, random generation of outcomes inside or outside ranges, enumeration or sampling of outcomes from outcome ranges, etc.
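As a plain-Python illustration (not the NegMAS API itself), the difference between an enumerated and a dimensional outcome space can be sketched as follows; the issue names and value ranges are invented for the example.

```python
# Illustrative outcome spaces: an enumerated space is just a list of outcomes,
# while a dimensional space is the Cartesian product of the issues' value ranges.
from itertools import product

# Enumerated outcome space: countably many explicit outcomes.
enumerated_space = ["agreement_a", "agreement_b", "agreement_c"]

# Dimensional outcome space over three issues (finite ranges for the example).
issues = {
    "price": [10, 12, 14, 16],
    "quantity": list(range(1, 6)),
    "delivery": ["immediate", "one_week"],
}
dimensional_space = [dict(zip(issues, values)) for values in product(*issues.values())]

print(len(dimensional_space))   # 4 * 5 * 2 = 40 outcomes
print(dimensional_space[0])     # {'price': 10, 'quantity': 1, 'delivery': 'immediate'}
```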


3.2 Utility Functions

In contrast to existing negotiation environments, utility functions in NegMAS are active entities that evolve over time. They are implemented as objects in the stand-alone version and as processes in the distributed version. This is necessary in situated negotiations, because of the interdependencies among utility functions in simultaneous and sequential negotiations as well as the need to model dynamic utility functions resulting from the dynamics of the simulation.

Three utility function interfaces are supported in NegMAS: cardinal, comparative, and ranking interfaces. Cardinal utility functions implement a mapping from any possible outcome (or partial outcome) to a utility value. Utility values can be either real numbers or any probabilistic distribution over real numbers (e.g., uniform, Gaussian, etc.). Formally, a cardinal utility function U^c is defined as

$$U^c : \Omega \rightarrow \mathbb{R} \cup P, \quad \text{where } P = \Bigl\{\, p : \mathbb{R} \rightarrow (0,1) \;\Big|\; \int_{-\infty}^{\infty} p(x)\,dx = 1 \,\Bigr\}.$$

For normalized utility functions, the range of integration and the domain of p can be limited to the limits of normalization (usually (0, 1)). Allowing cardinal utility functions to return probability distributions instead of simple values simplifies the implementation of agents that negotiate under uncertainty.

Comparative utility functions implement only a comparison operator between any two outcomes, allowing for indifference. Formally, a comparative utility function U' is defined as

$$U' : \Omega^2 \rightarrow \{\succ, \prec, \approx\} \cup P', \quad \text{where } P' = \Bigl\{\, p' : \{\succ, \prec, \approx\} \rightarrow (0,1) \;\Big|\; \textstyle\sum_{x \in \{\succ, \prec, \approx\}} p'(x) = 1 \,\Bigr\},$$

and ≻, ≺, and ≈ represent better, worse, and same, respectively. Once again, the flexibility to return a probability distribution has applications in negotiation under uncertainty.

Ranking utility functions implement a ranking function that returns, for any list of outcomes, a partial ordering over them. Formally, a ranking utility function U^r is defined as U^r : Ω^n → (⪰, Ω), where (⪰, Ω) is a partial ordering of outcomes in Ω. Currently, NegMAS does not support probability distributions over orderings.

Cardinal utility functions that return real values for all outcomes in Ω are called crisp utility functions. Given a (probabilistic) cardinal utility function u_p, one can create a crisp utility function u by applying a collapsing function that maps probability distributions to real values. Built-in collapsing functions include the mean (the default), median, min, and max. Custom collapsing functions can also be defined easily. A crisp cardinal utility function u implies a comparative utility function u', where

$$u'(\omega_1, \omega_2) = \begin{cases} \succ & u(\omega_1) > u(\omega_2) \\ \approx & u(\omega_1) = u(\omega_2) \\ \prec & u(\omega_1) < u(\omega_2) \end{cases} \qquad (1)$$

A probabilistic cardinal utility function returns a probability distribution p_ω for every ω ∈ Ω. In this case, the implied comparative utility function is u'(ω1, ω2) = p', where


$$p'(\succ) = \frac{\displaystyle\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \delta\bigl[p_{\omega_1}(x) > p_{\omega_2}(y)\bigr]\, p_{\omega_1}(x)\, p_{\omega_2}(y)\, dx\, dy}{\displaystyle\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} p_{\omega_1}(x)\, p_{\omega_2}(y)\, dx\, dy}.$$

Here, δ[E] is one if E is true and zero otherwise, and p'(≈) and p'(≺) are defined similarly. NegMAS implements this transformation for crisp utility functions and for probabilistic utility functions with impulse, uniform, or Gaussian distributions. Generating a ranking utility function from a crisp cardinal utility function is simply a sort operation, which takes O(n log n) operations, where n is the number of outcomes in the ranking. (For probabilistic utility functions, a collapsing function is applied first.) Generating a ranking utility function from a comparative utility function is also supported, using bubble sort, which takes O(n²) operations, assuming a consistent set of comparisons (i.e., no cycles, so that it induces a linear order). As NegMAS automates all of these transformations, it is only necessary to implement a single interface in any utility function, and then the others—in the following order: probabilistic utility functions, crisp utility functions, comparative utility functions, ranking utility functions—become available for immediate use by negotiators. In the future, inferring probabilistic utility functions from comparative and ranking utility functions will also be supported, using, say, ordinal regression or Gaussian processes. Currently, NegMAS supports the following types of cardinal utility functions: linear utility functions, generalized additive independence (GAI) models [9], hyperrectangle utility functions [13] and nonlinear combinations thereof, and, more generally, any nonlinear mapping from the outcome space to utilities. General support for time-discounted cardinal utility functions, with both linear and exponential discounting, is also available. Defining new utility function types involves overriding a single method.

Utility Operators

NegMAS supports the concept of a utility operator, a functional that operates on a utility function, and other parameters as necessary, and returns a value. Examples of utility operators are the max and min operators, which return the limits of a utility function; the outcome_with_utility operator, which returns an outcome with a utility near some given value; and the normalize operator, which returns a new utility function normalized so that its values lie in a given range. The latter two examples apply only to cardinal utility functions; however, the uneg2 library—a sister library of NegMAS—induces a cardinal utility function from a comparative or ranking utility function. Utility operators provide a modular way to implement common operations used by multiple negotiation strategies. An important feature of utility operators in NegMAS is that they can be redefined for each kind (i.e., cardinal, ranking, or comparative) of utility function and each type (e.g., linear, GAI, hyperrectangle, etc.) of cardinal utility function to optimize accuracy versus speed. Moreover, the operators can be parameterized to control this trade-off. The rest of this section provides an example that highlights this trade-off.

2 Available at https://www.github.com/yasserfarouk/uneg.
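Referring back to Eq. (1), the cascade of interfaces can be illustrated in plain Python: given a crisp cardinal utility function, Eq. (1) yields a comparative function, and sorting by utility yields a ranking function. This is a conceptual sketch with invented names, not NegMAS code.

```python
# Conceptual sketch of deriving comparative and ranking interfaces from a
# crisp cardinal utility function, following Eq. (1).
from typing import Any, Callable, List, Sequence

def make_comparative(u: Callable[[Any], float]) -> Callable[[Any, Any], str]:
    """Implied comparative function: 'better', 'same', or 'worse' for (w1, w2)."""
    def compare(w1, w2):
        if u(w1) > u(w2):
            return "better"
        if u(w1) == u(w2):
            return "same"
        return "worse"
    return compare

def make_ranking(u: Callable[[Any], float]) -> Callable[[Sequence[Any]], List[Any]]:
    """Implied ranking function: sort outcomes by utility, an O(n log n) operation."""
    return lambda outcomes: sorted(outcomes, key=u, reverse=True)

# Example with a toy crisp utility over (price, quantity) outcomes.
u = lambda outcome: 0.7 * outcome[0] + 0.3 * outcome[1]
compare, rank = make_comparative(u), make_ranking(u)
print(compare((1.0, 0.2), (0.4, 0.9)))             # 'better'
print(rank([(0.1, 0.9), (1.0, 0.2), (0.5, 0.5)]))  # best outcome first
```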


The max Operator

The max operator, when given a utility function, returns both the outcome with maximal utility and its utility value. It takes as input one parameter, α ∈ [0, 1], which specifies how accurate the operator should be (i.e., the speed/accuracy trade-off). The most general (and slowest) implementation of this operator samples without replacement n outcomes from the outcome space, calculates their utilities, and returns the maximal value sampled together with the corresponding outcome. The number of samples n is α × |Ω|, where α is the accuracy and |Ω| is the size of the outcome space (saturated at 1e6 for continuous outcome spaces). Note that setting α = 1.0 yields the exact maximum for any discrete utility function. Specialized implementations of the max operator are provided for different types of cardinal utility functions to improve the accuracy/speed trade-off. Consider a linear utility function u defined as

$$u(\omega) = \sum_{i=1}^{m} \beta_i\, \omega_i,$$

where ω_i is the value of issue i under outcome ω, the β_i are a set of weights, and m is the number of issues. Now the max operator is the solution to the following optimization problem:

$$\arg\max_{\omega}\ u(\omega) \quad \text{s.t.} \quad \omega_i \in R_i,$$

where Ri is the range of issue i. This is a simple linear optimization problem which NegMAS solves using the simplex algorithm in O(m) steps. Note that in this simple case the accuracy parameter is ignored. Specialized implementations are also provided for GAI and linear hyperrectangle utility functions. These approaches are heuristic and iterative; the accuracy parameter is used to control the speed/accuracy trade-off as usual. More generally, all utility function operators provide a default (usually inefficient) implementation. Because of NegMAS’ object-oriented design, it is trivial to add new utility functions, and to extend operators to support specialized implementations of them.
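A sampling-based version of the max operator, as described above, might look like the sketch below. It is a generic illustration with names and defaults of our own, not the NegMAS implementation; in particular, the direct per-issue maximization for the linear case is an alternative simple solution for box constraints, whereas NegMAS is described above as using the simplex algorithm.

```python
# Illustrative sampling-based max operator: trade accuracy for speed via alpha.
import random
from typing import Any, Callable, Sequence, Tuple

def max_operator(u: Callable[[Any], float], outcomes: Sequence[Any],
                 alpha: float = 1.0) -> Tuple[Any, float]:
    """Return (outcome, utility) of the best outcome among a sample of size
    alpha * |outcomes|; alpha = 1.0 gives the exact maximum for a discrete space."""
    n = max(1, int(alpha * len(outcomes)))
    sample = random.sample(list(outcomes), n)   # sampling without replacement
    best = max(sample, key=u)
    return best, u(best)

def max_linear(weights: Sequence[float], ranges: Sequence[Sequence[float]]):
    """For a linear utility over independent issue ranges, pick the best value per issue."""
    outcome = [max(r) if w >= 0 else min(r) for w, r in zip(weights, ranges)]
    return outcome, sum(w * v for w, v in zip(weights, outcome))

print(max_operator(lambda o: -(o - 7) ** 2, range(20), alpha=0.5))
print(max_linear([0.6, -0.4], [[0, 5, 10], [1, 2, 3]]))  # ([10, 1], 5.6)
```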

3.3 Negotiators

Negotiations occur between negotiators. All negotiator types define capabilities that are matched with the requirements of the negotiation protocol before agents are allowed to join negotiation mechanisms. This makes it possible to define negotiation strategies that are applicable across multiple negotiation protocols. All negotiators define a set of callbacks that can be used to update the negotiator's internal state or behavior based on salient events during the negotiation, including the negotiation's start and end, a round's start and end, errors, and utility function updates.


It is not possible to define general-purpose negotiators in NegMAS independent of a negotiation protocol. NegMAS provides implementations of simple negotiation strategies for the SAOP in the bilateral [5] and multilateral cases [1], including the time-based aspiration-level strategy with exponential and polynomial aspiration functions [7], and the naive version of the tit-for-tat strategy described in Baarslag et al. [3]. Beyond these built-in negotiators, NegMAS can also access most negotiation agents defined in the GENIUS platform [20] through a GeniusNegotiator class that allows these agents to participate in negotiation sessions running on NegMAS. Note, however, that since NegMAS supports richer simulation environments than GENIUS, GENIUS negotiators are not always applicable: e.g., they assume static utility functions.
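To give a flavour of how negotiators, utility functions, and a mechanism fit together, the snippet below follows the style of the NegMAS quick-start examples. The class names (SAOMechanism, AspirationNegotiator, MappingUtilityFunction) are taken from the library's public documentation, but exact constructor signatures may differ between NegMAS versions, so treat this as an approximate sketch rather than a verified recipe.

```python
# Approximate sketch of running a stand-alone SAOP negotiation in NegMAS,
# in the spirit of the library's quick-start examples (API details may vary by version).
from negmas import SAOMechanism, AspirationNegotiator, MappingUtilityFunction

# A session over ten enumerated outcomes, limited to 100 rounds.
session = SAOMechanism(outcomes=10, n_steps=100)

# Two time-based (aspiration) negotiators with opposing preferences over the outcomes.
buyer_ufun = MappingUtilityFunction(lambda outcome: 1.0 - outcome[0] / 9.0)
seller_ufun = MappingUtilityFunction(lambda outcome: outcome[0] / 9.0)

session.add(AspirationNegotiator(name="buyer"), ufun=buyer_ufun)
session.add(AspirationNegotiator(name="seller"), ufun=seller_ufun)

state = session.run()    # run the negotiation to completion
print(state.agreement)   # the agreed outcome, or None if no agreement was reached
```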

3.4 Controllers

Negotiators can participate in but one negotiation at a time. This means that they cannot support concurrent negotiation, which is one of the defining features of situated negotiations. NegMAS thus provides a controller entity capable of orchestrating the behavior of multiple negotiators (its children). Any method that is implemented by the controller takes precedence over the same method implemented by any of its negotiators. This way, controllers can decide to delegate some of their activities to negotiators, while still maintaining centralized control. Controllers enable a variety of implementations of concurrent interdependent negotiations in NegMAS. In the most centralized implementation, one controller takes control of the behaviors in all negotiations, based on a utility function defined for that controller (the micromanager CEO model). In the most distributed implementation, independent empowered negotiators are created, each of which has its own utility function and makes autonomous decisions in its own individual negotiation. Utility functions can then be linked together through shared state, which enables them to respond dynamically to changes in the state of other negotiations. Figure 1 depicts these two extreme possibilities. Controllers allow agents to view related negotiations from the lens of a unified managing entity. Consider an agent representing a factory that is engaging in multiple concurrent negotiations to secure raw materials for some products it intends to sell later, for which multiple sell contracts already exist. Such a factory can create a single controller for each set of negotiations related to a single sell contract. In so doing, it becomes easy to evaluate the marginal utility of any hypothetical agreement being negotiated by any of the negotiators controlled by a single controller. Moreover, agreements that are reached by one controller do not impact the utility functions of any other negotiations not under the control of that controller, leading to increased efficiency.


Types of Controllers

NegMAS provides several built-in controller types covering common cases that face agents engaged in multiple concurrent negotiations. The SyncController synchronizes the behavior of multiple negotiators, allowing centralized control of their behaviors. When used in many-to-many negotiations, SyncControllers face the possibility of non-trivial deadlocks (e.g., agent A is negotiating with B and C and agent B is negotiating with A and D, with agent A waiting for two offers from B and C and agent B waiting for two offers from A and D). A simple loop-breaking mechanism is used to handle these situations. The MetaNegotiatorController uses a single negotiation strategy for all concurrent negotiations. It takes care of instantiating these negotiators and appropriately invoking them. A simple example is the AspirationMetaNegotiatorController, which uses a time-based negotiation strategy for all negotiations. Finally, the SingleAgreementController manages a set of negotiators, deferring the local decisions about each individual negotiation to its negotiator, while still guaranteeing that at most one agreement can be reached.

3.5 Agents

The main actor in NegMAS is the agent. An agent represents an autonomous entity that has well-defined objectives (which can, but need not, be explicitly encoded in a utility function). Figure 1 shows an example of an agent which, using a controller and two independent negotiators, is engaged in four simultaneous negotiations. Agents in NegMAS interact within a simulation that is part of a world (see Sect. 3.7). Within a world, agents can access public information as well as their own private state, and can execute actions as well as engage in negotiations. Agents are responsible for deciding what negotiations to engage in, which utility functions to use, and how to change their utility functions based on changes in the world simulation, their internal state, or outcomes of other negotiations. Moreover, agents may be required to perform other activities not directly related to negotiation to achieve their objectives. For example, an agent representing a factory manager needs to control the production lines in that factory based on the results of its negotiations.

3.6 Mechanisms

Negotiations are conducted based on a negotiation protocol, which encodes the rules of engagement for negotiators. Negotiation protocols are the primary mechanisms in NegMAS. A mechanism is an entity that controls the interactions among negotiators. Beyond negotiation protocols, mechanisms can also represent auctions. Mechanisms define a set of requirements that must be satisfied by any negotiator that joins them. In addition to defining a set of requirements, mechanisms also have to define two operations: initialization and a round operation. Mechanisms are run


Fig. 1 An agent engaging in four negotiations in four separate mechanisms using one controller to link two of these negotiations, and two additional autonomous negotiators with interdependent utility functions that depend on a common state. Note that all three negotiators' utility functions depend on the agent's utility function, which serves to coordinate the behavior of the three

by executing the round operation until it returns a special stop symbol or until a time limit is reached. Time limits can be defined for the complete mechanism session or for each round and each negotiator's action. This feature simplifies implementation of bounded-rationality negotiators, where the bound is imposed by computational considerations. At the time of writing, the following mechanisms had been implemented in NegMAS: the Stacked Alternating Offers Protocol (SAOP) as an example of a non-mediated negotiation protocol [1], the Single Text Negotiation Protocol (ST) [18] as an example of a mediated protocol, first-price and second-price auctions as examples of one-shot mechanisms, and an English auction as an example of a dynamic auction [29]. Adding new mechanisms to NegMAS involves implementing only a single method. Mechanisms in NegMAS are dynamic in the sense that negotiators can join and leave negotiations at any time. Specific mechanisms may disable this feature (for example, one-shot auctions). Negotiators join NegMAS mechanism sessions with predefined roles that are set by the mechanism. This makes it possible to implement mechanisms that treat different agents differently. For example, auction mechanisms have two roles: auctioneer and participant. Finally, mechanisms in NegMAS can be chained, with the outcomes of one mechanism determining the outcome space for the next mechanism in the chain.

Stacked Alternating Offers Protocol

SAOP is a non-mediated negotiation protocol in which negotiators are ordered and take turns in a round-robin fashion. When it is a negotiator's turn, it can either accept the current offer, reject and counter it, or leave the negotiation, effectively ending it. The negotiation ends with failure if any negotiator leaves, or if a predefined timeout condition is met (either a predefined number of


rounds or seconds have passed). NegMAS implements an extended version of this protocol which supports additional features useful for situated negotiations. First, NegMAS supports obfuscating the order of negotiators to avoid creating an ultimatum game [11] in the last round of a negotiation, where the first agent to make an offer knows that an offer above an opponent's reservation value will be accepted, if the opponent behaves rationally. In NegMAS, all negotiators are invited to place offers in the first round; then all but one of these offers are discarded, the remaining offer serves as the first offer, and the negotiation continues as usual. In this way, none of the agents know who placed the first offer—or, more importantly, who will be the first to place the last. As a result, the negotiation does not reduce to an ultimatum game in the last round, even if reservation values are common knowledge. This order-obfuscation feature is optional, so it can be turned on or off for each negotiation. Second, NegMAS supports dynamic entry into and exit from negotiations. More specifically, negotiators are free to enter or exit negotiations at any time. A standing offer becomes an agreement once all present negotiators accept it. Third, NegMAS supports SAOP with noncommittal offers, so that negotiators can reject their own offer, even if all the other negotiators accept that offer. This capability has been implemented before in concurrent negotiation protocols with time-limited decommitment messages [28], but to our knowledge, it is not typical of SAOP. Finally, NegMAS allows a negotiator, on its turn, to reject the current offer without having to propose a counter-offer, instead asking the next negotiator to propose a counter-offer—even if the next negotiator is the same one who just proposed! This is useful for negotiators that implement an acceptance strategy but no bidding strategy (i.e., they know what to accept but they do not know how to make their own offers).

Single Text Protocol

In the ST family of protocols, a facilitator repeatedly proposes a potential agreement to all negotiators and receives their feedback, based on which it produces the next potential agreement. This process repeats until an agreement is reached, or a timeout condition is met. Different ST protocols differ in the way the facilitator generates offers, and the type of feedback received from the negotiators [13, 17, 18]. Implementing an ST protocol for NegMAS requires overriding a single next_offer method. At the time of writing, two specific variations had been implemented: VetoST and HillClimbingST. In both cases, a negotiator need only implement a single is_better method, capable of comparing any two potential outcomes in the outcome space. VetoST chooses the next action based on the votes of the negotiators. If a majority of negotiators accept the proposed outcome, VetoST creates the next proposal by randomly changing the value of one issue in that outcome; otherwise, it randomly changes the value of one issue in the last outcome that received a majority of the votes. If no outcome has yet received a majority of votes, it changes the value of a randomly chosen issue. The HillClimbingST algorithm differs only in the way it generates the next offer. If all agents preferred a proposed outcome, HillClimbingST generates the next offer by applying a change in the same direction to the same issue; otherwise, it changes a different randomly chosen issue (Fig. 2).
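The veto-style single-text loop can be paraphrased in a few lines of plain Python. The sketch below is a deliberately simplified stand-in (our own names, a majority vote, and random single-issue mutation) and not the NegMAS VetoST implementation; each negotiator is represented only by an is-better comparison, as described above.

```python
# Simplified single-text (veto-style) mediation loop: the facilitator mutates one
# issue at a time and keeps the last proposal that a majority of negotiators accepted.
import random

def veto_single_text(issues, is_better_funcs, n_steps=1000, seed=0):
    rng = random.Random(seed)
    current = {name: rng.choice(values) for name, values in issues.items()}
    accepted = dict(current)
    for _ in range(n_steps):
        votes = sum(f(current, accepted) for f in is_better_funcs)
        if votes * 2 > len(is_better_funcs):      # majority prefers the new text
            accepted = dict(current)
        # Propose the next text by changing the value of one randomly chosen issue.
        issue = rng.choice(list(issues))
        current = dict(accepted)
        current[issue] = rng.choice(issues[issue])
    return accepted

issues = {"price": list(range(10, 21)), "quantity": list(range(1, 6))}
is_better = [
    lambda new, old: new["price"] <= old["price"],   # buyer: lower price is better
    lambda new, old: new["price"] >= old["price"],   # seller: higher price is better
]
print(veto_single_text(issues, is_better))
```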


Fig. 2 The main components of a NegMAS world

3.7 Worlds The simulation environments within which agents operate in NegMAS are called worlds. As all worlds use the same interface, some common functionality is provided. These include a public bulletin board on which common information available to all agents in the world is posted, and summary statistics calculations. The world also provides contract persistence (i.e., saving contracts even after a simulation ends), name resolution services, logging, and statistics calculation. Each world contains a simulator that is responsible for running the environment. Moreover, the world simulator executes all mechanisms within it. Agents can affect the simulation through actions that the world defines. Designing a new environment in NegMAS can be achieved by overriding a few basic functions that control the simulation. New statistics can be extracted from the simulation centrally, or through a modular monitoring system. The latter consists of user-defined objects that process all events raised by any entity in the system.
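To picture how these responsibilities fit together, here is a deliberately generic world loop. All class and method names below are hypothetical illustrations, not the NegMAS World API.

class SimpleWorld:
    """Illustrative world loop: a bulletin board, running mechanisms, and monitors.

    All names here are hypothetical; the real NegMAS World class differs.
    """

    def __init__(self, n_steps):
        self.n_steps = n_steps
        self.bulletin_board = {}      # public information visible to all agents
        self.mechanisms = []          # active negotiation sessions
        self.monitors = []            # user-defined event processors
        self.contracts = []           # persisted agreements

    def step(self):
        for mechanism in list(self.mechanisms):
            result = mechanism.step()             # the world drives all mechanisms
            if result is not None:                # an agreement was reached
                self.contracts.append(result)
                self.mechanisms.remove(mechanism)
                self.emit("contract_signed", result)

    def emit(self, event, data):
        for monitor in self.monitors:             # modular monitoring system
            monitor(event, data)

    def run(self):
        for _ in range(self.n_steps):
            self.step()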

4 Tools and Common Components Beyond the main building blocks described so far, NegMAS also provides common tools that can be used by system designers and developers to implement new world simulations, mechanisms, agents, negotiators, and controllers. This section provides an overview of some of the most important of these tools. Analytic Tools NegMAS provides designers of negotiators, agents, and mechanisms access to several analytic and visualization tools that they can use to better understand the domains they create.


Fig. 3 Visualization of a negotiation between a buyer and a seller using time-based negotiators

Analytic tools for outcome spaces include random generation of valid and invalid outcomes with respect to any combination of outcome ranges or issue spaces, grid and random sampling of outcomes, cardinality evaluation, etc. The library also provides functions that calculate the Pareto-frontier, social welfare, the Nash bargaining point, etc. A set of tools for visualizing world simulations, including negotiation requests, negotiation results, and contract signing and execution is also provided. Figure 3 depicts the default visualization of a sample negotiation conducted between a buyer and a seller. Tournament Management NegMAS provides a common tournament management interface which can be used to run tournaments among agents in any world by implementing just four components: a configuration generator to generate different world configurations, an assigner that assigns competitors to these worlds, a world generator that builds world simulations given configurations together with complete assignments, and a score calculator that calculates the scores of agents based on related world simulations. Tournaments can be run serially, in parallel utilizing any fraction of a single-machine’s cores or in a distributed environment over many servers. The system logs all activities in all simulations for use in post-tournament analysis. Language Neutrality Even though NegMAS is implemented in Python, the API was designed to be easy to port to any object-oriented language. At the time of writing, the API was also implemented in Java (JNegMAS) allowing agents and negotiators to be implemented in either Java or Python. Moreover, a REST API that supports a language-neutral implementation of agents, negotiators, and mechanisms is under development.
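As a rough illustration of what such analytic helpers compute, the snippet below derives a Pareto frontier and a Nash bargaining point from a list of utility tuples. It is a generic sketch under the textbook definitions (with disagreement utilities assumed to be zero), not the NegMAS implementation.

def pareto_frontier(points):
    """Return the outcomes (as utility tuples) not dominated by any other outcome."""
    frontier = []
    for p in points:
        dominated = any(all(q[i] >= p[i] for i in range(len(p))) and q != p
                        for q in points)
        if not dominated:
            frontier.append(p)
    return frontier

def nash_point(points, disagreement=(0.0, 0.0)):
    """Return the Pareto outcome maximizing the product of gains over disagreement."""
    def nash_product(p):
        gains = [max(u - d, 0.0) for u, d in zip(p, disagreement)]
        prod = 1.0
        for g in gains:
            prod *= g
        return prod
    return max(pareto_frontier(points), key=nash_product)

# Example: utility pairs (buyer utility, seller utility) for three outcomes
print(nash_point([(0.9, 0.2), (0.6, 0.6), (0.2, 0.9)]))   # -> (0.6, 0.6)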


5 Applications: Focus on SCML NegMAS is still young, yet it is already being used actively for the research and development of negotiation agents. It has been used as a platform for preference elicitation research [22], where the ability to model uncertainties in utility functions is especially important. It has also been used in path planning for self-interested robots [12], where it is useful to be able to handle a large number of concurrent negotiations efficiently. Most notably, however, it was used in both 2019 [24] and 2020 as the platform for the Supply Chain Management League (SCML) conducted as part of the Automated Negotiation Agents Competition held in conjunction with the International Joint Conference on AI (IJCAI). In this application, NegMAS was used to implement a scenario that epitomizes situated negotiations: autonomous agents have to decide when to negotiate, about what, and with whom, using dynamic utility functions that emerge endogenously from the simulation rather than being dictated from outside the system. SCML, much like the NegMAS platform in which it was built, was created with the intent of increasing the relevance of automated negotiation research, by going beyond "context-free" negotiation scenarios, where agents make decisions in just a few independent, static negotiations, to situated negotiations, which arise when negotiations are embedded in complex, dynamic environments. One distinguishing feature of SCML, and of situated negotiations more broadly, is the fact that agents' utility functions are endogenous, meaning they are the product of the system's evolution and hence cannot be dictated to agents in advance of running the simulation. In SCML specifically, agents represent factory owners, who negotiate to secure raw materials, which they then convert to products and sell. It is thus an agent's job to devise utilities for various possible agreements, given its unique production capabilities, and then to negotiate with other agents to contract those that are most favorable to it. In SCML, a major determiner of an agent's wealth, and hence its final score, is its ability to capitalize on its position in the market by negotiating successfully. Agents Participants in SCML develop factory manager agents. The goal of each factory manager agent is to accrue as much wealth (i.e., profit) as possible. All agents can buy and sell products based on agreements that they reach and then sign as contracts. Such agreements are generated through bilateral negotiations using a variant of the SAOP typically used in ANAC competitions. The sequences of offers and counteroffers in these negotiations are private to the negotiators. An offer must specify a buyer, a seller, a product, a quantity, a delivery time, and a unit price. When a contract comes due, the simulator tries to execute it (i.e., move products from the seller's inventory to the buyer's, and move money from the buyer's wallet to the seller's). If this execution fails, a breach of contract can occur. Breaches can also occur if either party decides not to honor the contract.
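The content of an SCML offer and the execute-or-breach step can be sketched as follows. This is an illustrative simplification with assumed field and function names; it is not the SCML simulator code.

from dataclasses import dataclass

@dataclass
class Contract:
    # Fields mirror what an SCML offer must specify (names are illustrative).
    buyer: str
    seller: str
    product: str
    quantity: int
    delivery_time: int   # simulation step at which the contract comes due
    unit_price: float

def execute(contract, inventories, wallets):
    """Try to execute a due contract; return None on success or a breach record."""
    total = contract.quantity * contract.unit_price
    enough_goods = inventories[contract.seller].get(contract.product, 0) >= contract.quantity
    enough_money = wallets[contract.buyer] >= total
    if not (enough_goods and enough_money):
        # The simulator would publish this breach on the bulletin board.
        return {"perpetrator": contract.seller if not enough_goods else contract.buyer,
                "level": 1.0}
    inventories[contract.seller][contract.product] -= contract.quantity
    inventories[contract.buyer][contract.product] = (
        inventories[contract.buyer].get(contract.product, 0) + contract.quantity)
    wallets[contract.buyer] -= total
    wallets[contract.seller] += total
    return None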


Fig. 4 Comparison of the performance of various SCML agents, showing score as a function of production level (i.e., the number of steps in the chain from the agent to the raw materials)

Public and Private Information Each factory manager agent is assigned private production capability. Contracts are also private; they are not posted on the bulletin board. However, whenever an agent breaches a contract, the breach is published on the bulletin board, on a breach list, which indicates the perpetrator and the level of the breach. In addition to the breach list, which may help agents decide who not to trade with, quarterly reports are also published on the bulletin board listing, for each factory manager agent, their assets, including their balance and the value of the products in their inventory (valued at endogenously determined trading prices).
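The two kinds of public records described above can be pictured as simple structures posted on the bulletin board; the field names below are illustrative assumptions rather than the actual SCML data model.

from dataclasses import dataclass

@dataclass
class BreachRecord:
    perpetrator: str     # the agent that failed to honor the contract
    level: float         # severity of the breach

@dataclass
class QuarterlyReport:
    agent: str
    balance: float             # cash held by the factory manager
    inventory_value: float     # products valued at current trading prices

    @property
    def assets(self):
        return self.balance + self.inventory_value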

6 Using NegMAS for Developing SCM Agents In this section, we briefly demonstrate the power of NegMAS, by reporting the results of an in-house SCML tournament that was run with built-in SCML agents. The tournament module was used to run a tournament between these agents to evaluate their relative strength. The tournament was run by creating 500 random world configurations, assigning agents to different factories, and then rotating the agents’ assigned factories, ensuring fair comparison. Five different agents were developed for this experiment: Random An agent that engages in buy and sell negotiation randomly. BuyCheapSellExpensive An agent that treats each negotiation independently, trying to maximize the price for sell contracts and minimize it for buy contracts. This agent is a baseline; it does not take into account the negotiation situation as it is defined in this paper. MovingRange This agent’s name stems from the fact that it aims to buy and sell target quantities for some future horizon, which by default is 10% of the length of the game. This agent then manages all procurement (buying) and all sales negotiations in a dependent fashion, using two AspirationMetaNegotiatorController(s), one to manage selling and the other to manage buying. Note that this agent only


partially accounts for the situation, because it fails to relate its buying and selling decisions during negotiation. It does, however, relate them when it adjusts its target quantities. Decentralizing This agent instantiates a MovingRange agent with a horizon of 1 on the current and all future days of the game. Decisions are coordinated only through replanning: i.e., the adjustment of target quantities between negotiations. IndDecentralizing This agent achieves the same goal as Decentralizing by utilizing another feature of NegMAS, namely dynamic utility functions. Rather than centralizing all decisions regarding negotiations, it uses dynamic utility functions, which pass information implicitly among negotiators using shared state. Figure 4 compares the performance of the aforementioned SCML agents at various levels of the supply chain. As shown, Decentralizing and IndDecentralizing are the clear winners, with MovingRange not that far behind. Further, more detailed analyses are necessary to differentiate the behavior of these three agents. Nonetheless, the fact that these agents achieve higher scores than the other competitors supports the claim that designing agents to take advantage of the situation is a key to success in situated negotiation scenarios like SCML. The BuyCheapSellExpensive agent which ignores the negotiation context only manages to outperform an agent that behaves randomly.

7 Conclusions This paper presents NegMAS, a new platform for studying situated negotiations, which provides a suite of building blocks and tools for developing autonomous agents that can engage in simultaneous and sequential negotiations within a wider world simulation, where utility functions arise endogenously and are naturally dynamic. Acknowledgements Amy Greenwald was supported in part by NSF Award CMMI-1761546.

References 1. Aydo˘gan, R., Festen, D., Hindriks, K.V., Jonker, C.M.: Alternating offers protocols for multilateral negotiation. In: Modern Approaches to Agent-based Complex Automated Negotiation, pp. 153–167. Springer (2017) 2. Baarslag, T., Gerding, E.H.: Optimal incremental preference elicitation during negotiation. In: IJCAI, pp. 3–9 (2015) 3. Baarslag, T., Hindriks, K., Jonker, C.: A tit for tat negotiation strategy for real-time bilateral negotiations. In: Complex Automated Negotiations: Theories, Models, and Software Competitions, pp. 229–233. Springer (2013) 4. Baarslag, T., Kaisers, M.: The value of information in automated negotiation: a decision model for eliciting user preferences. In: Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, pp. 391–400. International Foundation for Autonomous Agents and Multiagent Systems (2017)


5. Chatterjee, K., Samuelson, W.: Bargaining under incomplete information. Oper. Res. 31(5), 835–851 (1983) 6. Contract room platform (2019), https://www.contractroom.com/ 7. Faratin, P., Sierra, C., Jennings, N.R.: Negotiation decision functions for autonomous agents. Robot. Auton. Syst. 24(3–4), 159–182 (1998) 8. Fatima, S.S., Wooldridge, M., Jennings, N.R.: Multi-issue negotiation under time constraints. In: Proceedings of the First International Joint Conference on Autonomous Agents and Multiagent Systems: Part 1, pp. 143–150 (2002) 9. Fishburn, P.C.: Interdependence and additivity in multivariate, unidimensional expected utility theory. Int. Econ. Rev. 8(3), 335–342 (1967) 10. Fukui, T., Ito, T.: A proposal of automated negotiation simulator “jupiter” for negotiating agents using machine learning. In: The 11th International Workshop on Automated Negotiations (2018) 11. Güth, W., Schmittberger, R., Schwarze, B.: An experimental analysis of ultimatum bargaining. J. Econ. Behav. Organ. 3(4), 367–388 (1982) 12. Inotsume, H., Aggarewal, A., Higa, R., Nakadai, S.: Path negotiation for self-interested multirobot vehicles in shared space. In: Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2020) 13. Ito, T., Hattori, H., Klein, M.: Multi-issue negotiation protocol for agents: exploring nonlinear utility spaces. IJCAI 7, 1347–1352 (2007) 14. Jonker, C.M., Aydo˘gan, R., Baarslag, T., Broekens, J., Detweiler, C.A., Hindriks, K.V., Huldtgren, A., Pasman, W.: An introduction to the pocket negotiator: a general purpose negotiation support system. In: Multi-Agent Systems and Agreement Technologies, pp. 13–27. Springer (2016) 15. Kersten, G.E.: Are procurement auctions good for society and for buyers? In: Joint International Conference on Group Decision and Negotiation, pp. 30–40. Springer (2014) 16. Kersten, G., Noronha, S.: Negotiation via the world wide web: a cross-cultural study of decision making. Group Decis. Negot. 8(3), 251–279 (1999) 17. Klein, M., Faratin, P., Sayama, H., Bar-Yam, Y.: Negotiating complex contracts. Group Decis. Negot. 12(2), 111–125 (2003) 18. Klein, M., Faratin, P., Sayama, H., Bar-Yam, Y.: Protocols for negotiating complex contracts. IEEE Intell. Syst. 18(6), 32–38 (2003) 19. Li, C., Giampapa, J., Sycara, K.: Bilateral negotiation decisions with uncertain dynamic outside options. IEEE Trans. Syst. Man Cybern. Part C (Applications and Reviews) 36(1), 31–44 (2006) 20. Lin, R., Kraus, S., Baarslag, T., Tykhonov, D., Hindriks, K., Jonker, C.M.: Genius: an integrated environment for supporting the design of generic automated negotiators. Comput. Intell. 30(1), 48–70 (2014). https://doi.org/10.1111/j.1467-8640.2012.00463.x 21. Mohammad, Y., Nakadai, S.: Fastvoi: efficient utility elicitation during negotiations. In: International Conference on Principles and Practice of Multi-Agent Systems (PRIMA), pp. 560–567. Springer (2018) 22. Mohammad, Y., Nakadai, S.: Optimal value of information based elicitation during negotiation. In: Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, pp. 242–250. AAMAS ’19, International Foundation for Autonomous Agents and Multiagent Systems (2019) 23. Mohammad, Y., Nakadai, S.: Utility elicitation during negotiation with practical elicitation strategies. In: IEEE SMC (2018) 24. Mohammad, Y., Viqueira, E.A., Ayerza, N.A., Greenwald, A., Nakadai, S., Morinaga, S.: Supply chain management world. 
In: Baldoni, M., Dastani, M., Liao, B., Sakurai, Y., Zalila Wenkstern, R. (eds.) PRIMA 2019: Principles and Practice of Multi-Agent Systems, pp. 153– 169. Springer International Publishing, Cham (2019) 25. Nash Jr., J.F.: The bargaining problem. Econometrica: Journal of the Econometric Society pp. 155–162 (1950) 26. Robu, V., Somefun, D., La Poutré, J.A.: Modeling complex multi-issue negotiations using utility graphs. In: Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multiagent Systems, pp. 280–287 (2005)


27. Rubinstein, A.: Perfect equilibrium in a bargaining model. Econometrica: Journal of the Econometric Society pp. 97–109 (1982) 28. Williams, C.R., Robu, V., Gerding, E.H., Jennings, N.R.: Negotiating concurrently with unknown opponents in complex, real-time domains. In: Proceedings of the Twentieth European Conference on Artificial Intelligence, pp. 834–839 (2012) 29. Wurman, P.R., Wellman, M.P., Walsh, W.E.: A parametrization of the auction design space. Games Econ. Behav. 35(1–2), 304–338 (2001) 30. Zeng, D., Sycara, K.: Bayesian learning in negotiation. Int. J. Hum. Comput. Stud. 48(1), 125–141 (1998)

Can a Reinforcement Learning Trading Agent Beat Zero Intelligence Plus at Its Own Game? Davide Bianchi and Steve Phelps

Abstract We develop a simple trading agent that extends Cliff's Zero Intelligence Plus (ZIP) strategy for trading in continuous double auctions. The ZIP strategy makes trading decisions using a deterministic, hand-crafted, and fixed mapping between states of the market, corresponding to properties of recent order-flow, and actions on the agent's limit price (raise price, lower price, do nothing). In contrast, we situate the ZIP decision rules within a larger policy space by associating probabilities with each state–action pair, and allow the agent to learn the optimal policy by adjusting these probabilities using Q-learning. Because we use the same state and action space as ZIP, the resulting policies have virtually identical computational overhead to the original strategy, but with the advantage that we can retrain our strategy to find optimal policies in different trading environments. Using empirical methods, we found that our reinforcement learning agents are able to outperform the deterministic ZIP agent in at least one scenario. Keywords Zero-intelligence traders · Automated trading · Reinforcement learning

1 Introduction Autonomous trading agents, also known as algo-traders, are responsible for a significant fraction of global financial trades due to their speed and scalability [1]. The seminal example of an algo-trader is the Zero Intelligence Plus (ZIP) strategy [2] which is able to trade in continuous double-auction markets, characteristic of most


modern financial exchanges. The ZIP strategy can gradually adjust its price so that it consistently earns a profit on each trade. ZIP uses a set of simple decision rules based on feedback from recent trades to adjust the trader's limit price. These decision rules were hand-crafted by Cliff, the designer of the algorithm [2], and the strategy has been shown to consistently outperform human traders [3]. One of the advantages of the ZIP strategy is its simplicity, which allows its state vector and decision function to be computed using a small amount of memory and a small number of clock cycles, thus making it suitable for real-time high-frequency trading, e.g. by implementing it on an FPGA [4]. A question that naturally arises is whether strategies with a commensurate architectural footprint to ZIP can outperform it. We investigate this question by situating the ZIP decision rules within a larger state–action space, but using the same state vector, and the same action space, as the original strategy. We use a reinforcement-learning framework to search for optimal policies within this space using Q-learning.

2 Trading Agents 2.1 ZI Agents Zero Intelligence (ZI) agents were originally used by Gode and Sunder [5] to show that a continuous double auction intrinsically drives the system to a very high allocative efficiency. The ZI agents used in this paper, ZI constrained (ZI-c), submit a random shout with the constraint that they only engage in profit-making deals. ZI-c buyers can only bid in the interval between zero and their limit price, while ZI-c sellers can only shout in the interval between their limit price and the maximum price.
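A ZI-c shout therefore takes one line per side; the sketch below assumes uniform sampling and illustrative parameter names.

import random

def zic_shout(is_buyer, limit_price, max_price):
    """Zero-Intelligence-Constrained shout: random, but never loss-making."""
    if is_buyer:
        return random.uniform(0.0, limit_price)       # bid at or below the limit
    return random.uniform(limit_price, max_price)     # ask at or above the limit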

2.2 ZIP Agents Gode and Sunder showed that ZI-c agents are able to reach the equilibrium price in a market where supply and demand are symmetrical [5]. However, it was later discovered by Cliff that zero-intelligence traders are not able to allow the market to reach the equilibrium price if supply and demand are not symmetrical [2]. He proposed an agent called Zero Intelligence Plus (ZIP) that employs a very basic machine learning algorithm coupled with human-designed heuristics to adjust its margin, and hence its internal price, based on the price and type of the last shout. Let agent i's private limit price be λ_i; then its shout price p_i is related to its profit margin μ_i(t) by

p_i = λ_i (1 ± μ_i(t)),     (1)

Can a Reinforcement Learning Trading Agent Beat Zero Intelligence …

79

where the plus sign is used for sellers and the minus sign for buyers. In fact, in order to make a profit, sellers have to sell for more than their internal evaluation of the asset (limit price), and vice versa for buyers. The machine learning technique used by ZIP agents is based on the Widrow–Hoff (W-H) rule with the addition of a momentum-based rule to help smooth out the stochastic fluctuations of the market. The price update rule is

p_i(t + 1) = p_i(t) ± z_i(t),     (2)
z_i(t + 1) = γ_i z_i(t) + (1 − γ_i) Δ_i(t),     (3)

where the plus sign is for buyers and the minus for sellers in the top equation. In the second equation, γ_i is the momentum term that balances memory and adaptability. Δ_i is the difference between p_i and a target price given by the last trade price plus noise, multiplied by a learning rate. The inertial factor γ and the learning rate β are drawn from a random interval at the initialisation of the agent. The extreme values of this interval are parameters of the model, together with the extreme values of the two parameters that determine the linear stochastic function that gives the target price. This price adaptation algorithm has been subjected to numerous attempts to improve it, either by searching the parameter space [6] or by extending it from its original 8 dimensions all the way up to 60 [7]. However, to the best of our knowledge, the heuristic itself has never been optimised. ZIP agents decide to raise or lower their limit prices using a state–action table. The states correspond to properties of recent order-flow and trades, and the actions correspond to changes in the limit price. In the original formulation of the strategy, the state–action table was hand-coded by Cliff as a deterministic mapping. We reformulated these deterministic rules in a reinforcement-learning fashion by associating a probability with each state–action pair such that the probabilities for each state sum to one. Given a state of the system (properties of the last shout), the agent can then choose its action randomly according to the specified probability distribution. For the original ZIP strategy, the probabilities are always 0 and 1, and the probability matrix is fixed by the designer of the algorithm. In contrast, we relax this constraint for our agents, and we search for optimal strategies by finding the probabilities which maximise the surplus of our agents over many episodes. The state space is composed of 1 ternary and 4 binary variables:
• DEAL or NO_DEAL: the last shout was accepted or not
• p > q, p < q, or p = q: the agent's private price is greater than, smaller than, or equal to the last shout
• BID or OFFER: the last shout was a bid or an offer
• WILL or NO_WILL: the agent is willing to shout or not. The willingness condition is decided by two factors: not having shouted already during the day and being able to make a profit given the shouted price.
• ACTIVE or NOT_ACTIVE: the agent has not sold all its commodities already.
The action space consists of the actions:


Fig. 1 The rules of a ZIP agent seen as a decision table. Yellow is probability 1, purple probability 0

• raise: raise profit
• lower: lower profit
• do nothing: keep the previous profit value.
The resulting space consists of 48 states, each with the same action set A(s) of cardinality 3. The original deterministic ZIP strategy is depicted in Fig. 1.
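Seen this way, the whole heuristic is a 48-by-3 stochastic matrix, and the original ZIP table is the special case in which every row is one-hot. The sketch below shows this representation; the state encoding and names are our own illustration, not Cliff's code.

import itertools
import random

ACTIONS = ("raise", "lower", "do_nothing")

# Enumerate the 48 states: 1 ternary variable (price comparison) x 4 binary variables.
STATES = list(itertools.product(
    ("DEAL", "NO_DEAL"),
    ("GREATER", "SMALLER", "EQUAL"),     # p > q, p < q, p = q
    ("BID", "OFFER"),
    ("WILL", "NO_WILL"),
    ("ACTIVE", "NOT_ACTIVE"),
))
assert len(STATES) == 48

def uniform_policy():
    """Initial ZIQ+ policy: every action equally likely in every state."""
    return {s: {a: 1.0 / len(ACTIONS) for a in ACTIONS} for s in STATES}

def act(policy, state):
    """Sample an action according to the per-state probability distribution."""
    actions, probs = zip(*policy[state].items())
    return random.choices(actions, weights=probs)[0]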

2.3 RL Agents (ZIQ+) Reinforcement learning agents have to indirectly solve a Markov Decision Process (MDP). In our case, the ZIQ+ agents use Q-learning to choose the optimal action. The agents we designed have the same state space and action space as ZIP agents, but at time 0 their probability table is uniform and they have to learn the matrix Q_{s_t,a_t} (Q-table) that dictates, through the softmax function, the action a_t to take at time t given that the system is in state s_t. The update rule is given by

Q_{s_t,a_t} ← (1 − η) · Q_{s_t,a_t} + η · ( r_t + γ · max_a Q_{s_{t+1},a} ),     (4)

where η is the learning rate, γ is the discount factor, and r_t is the reward at time t. The agent, at any time, will sense the state of the system and select the action to take accordingly. The action to take is selected according to the probabilities obtained using a softmax activation potential with temperature τ. After the decision process, the selected action is performed using the same price update rule as ZIP (Eq. 2).
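The learning step can be sketched directly from Eq. (4) and the softmax selection rule; the data layout (a nested dictionary keyed by state and action) is an assumption made for illustration.

import math
import random

def q_update(Q, s, a, reward, s_next, eta, gamma):
    """Apply the Q-learning update of Eq. (4) to a nested-dict Q-table."""
    best_next = max(Q[s_next].values())
    Q[s][a] = (1.0 - eta) * Q[s][a] + eta * (reward + gamma * best_next)

def softmax_action(Q, s, tau):
    """Choose an action with probability proportional to exp(Q / tau)."""
    actions = list(Q[s])
    weights = [math.exp(Q[s][a] / tau) for a in actions]
    return random.choices(actions, weights=weights)[0]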


Table 1 The number of agents per type, the theoretical equilibrium price P_0, and the number of training steps N_t for the different environments. The subscripts are Q for ZIQ+, Z for ZIP, and ZI for ZI-c

Environment        N_Q    N_Z    N_ZI    P_0     N_t (10^6)
sS&D    CL          5      4      4      175$     2
        SL          1     10      4      200$     3
aS&D    CL         10      1      0      220$     2
        SL          1     10      0      220$     4

3 Setup The ZIQ+ agents have been trained in four different environments (Table 1). There is a symmetrical supply and demand scenario (sS&D) and an asymmetrical one (aS&D). For each of those, the Q-learning agents have been trained both in a competitive learning environment, indicated as CL, in which several ZIQ+ agents learn simultaneously, and in a solo learning environment, indicated as SL, in which a single ZIQ+ agent learns. For asymmetrical S&D there are no ZI agents present, since it has been shown that they cannot reach the equilibrium price in this scenario [2]. For both symmetrical S&D environments, the supply and demand curves are straight lines with equations p = 25q and p = 25(N + 1) − 25q, respectively. For the asymmetrical environments the supply slope is 30 and the demand slope is −10. The market is a simplified version of a continuous double auction with the NYSE rule: traders shout prices at any time but can only improve on the previous bid or offer. Unaccepted bids and offers are cancelled after any transaction. Each experiment consists of 30 trading days. A trading day starts with each agent holding 1 unit of the product they are trading. The private price is kept during the days but is reset to a random value for each experiment. The ZIQ+ agents learn the trading rules starting with a Q-table engineered to produce a decision table equal to ZIP with noise ε, in a way similar to an ε-greedy policy. They then update their Q-tables according to (4), where the reward function r_t is defined as follows. If during the trading day the agent has concluded a deal, then the reward is equal to the profit; otherwise the reward is r = −50 or r = 0, depending on whether or not the agent's limit price allowed it to make a deal. The learning rate η is adjusted during the progress of the experiment x according to the rule η(x) = max(η_min, η_i · 0.96^(100x)). The exploration/exploitation balance is managed by annealing the temperature of the softmax activation function. The inverse temperature β = 1/τ is evolved according to β(x) = β_i (1 − x) + β_f. The parameters are set as follows: η_min = 0.003, η_i = 1, τ_i = 100, τ_f = 0.05, and ε = 0.01. The ZIQ+ agents are trained for a number of experiments N_t specified in Table 1 for each environment. At the end of training, the ZIQ+ decision tables are fixed and the market is sampled for another 500 experiments.
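The learning-rate schedule and the reward definition above translate almost verbatim into code; the sketch below simply restates them (the argument names are ours).

def learning_rate(x, eta_min=0.003, eta_init=1.0):
    """eta(x) = max(eta_min, eta_init * 0.96**(100*x)) for experiment progress x in [0, 1]."""
    return max(eta_min, eta_init * 0.96 ** (100 * x))

def reward(made_deal, profit, limit_allowed_deal):
    """End-of-day reward: the profit if a deal was made; otherwise -50 if the
    limit price allowed a deal that was missed, and 0 if it did not."""
    if made_deal:
        return profit
    return -50.0 if limit_allowed_deal else 0.0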


Table 2 The market efficiency μ and the surplus per agent S, expressed as S ± σ_S, achieved by the different agents in the four environments

Environment      μ_tot   μ_ZIQ+   μ_ZIP   μ_ZI    S_tot       S_ZIQ+      S_ZIP       S_ZI
sS&D    CL       0.79    0.84     0.79    0.75    64.0±0.5    62±1        68±1        63±1
        SL       0.82    0.90     0.83    0.76    76.2±0.6    88±9        77±1        71±1
aS&D    CL       0.91    0.91     0.91    n.a.    81.1±0.4    80.2±0.5    90±4        n.a.
        SL       0.91    0.92     0.91    n.a.    81.3±0.4    83±4        81.1±0.5    n.a.

Fig. 2 Profit dispersion during the trading days for the different environments: (a) symm S&D CL, (b) asymm S&D CL, (c) symm S&D SL, (d) asymm S&D SL. The solid line is the average over the sampling experiments, the shaded area is the standard deviation, and the dotted lines are the 95% confidence intervals

4 Results In order to benchmark the performance of the ZIQ+ agents, we are going to use well-known metrics such as profit dispersion and market efficiency together with the raw profit [2]. Additionally, to measure the similarity of the two strategies we define the following metric. Given two N_s-by-N_a matrices P and Q, of which any row can be considered a probability distribution, the similarity score is

s(P, Q) = 1 − (1 / (2 N_s)) · Σ_{s=1}^{N_s} Σ_{a=1}^{N_a} |P_{sa} − Q_{sa}|.
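Computing the score from two probability tables is straightforward; the sketch below follows the definition above, with the matrices represented as nested lists.

def similarity(P, Q):
    """Similarity between two Ns x Na row-stochastic matrices:
    1 minus half the mean L1 distance between corresponding rows."""
    n_states = len(P)
    total = sum(abs(p - q) for row_p, row_q in zip(P, Q) for p, q in zip(row_p, row_q))
    return 1.0 - total / (2.0 * n_states)

# Identical tables give 1; completely disjoint one-hot rows give 0.
print(similarity([[1, 0, 0], [0, 1, 0]], [[1, 0, 0], [0, 1, 0]]))   # -> 1.0
print(similarity([[1, 0, 0], [0, 1, 0]], [[0, 1, 0], [0, 0, 1]]))   # -> 0.0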

This score will assume, on average, the value of 1/3 if two random matrices are compared and 1 if the compared matrices are identical. In terms of allocative efficiency, the ZIQ+ agents perform at least as well as the ZIP agents in the same market, with equality only in the asymmetric competitive learning case (Table 2). These observations are confirmed by the profit dispersion, where ZIQ+ consistently outperforms ZIP during the whole trading period with the exception of the aS&D CL case (Fig. 2). Notably, in the symmetric market ZIQ+


Fig. 3 Surplus distribution for the different environments: (a) symm S&D CL, (b) asymm S&D CL, (c) symm S&D SL, (d) asymm S&D SL

hinders the performance of ZIP agents, with the latter having a bigger profit dispersion than the random agents (ZI-c) (Fig. 2a and c). The surplus per agent, the quantity that drives the reinforcement learning agents, shows ZIQ+ outperforming ZIP only when it is trained as a single learner (Table 2). The bigger average surplus is achieved by broadening the support of the distribution. The ZIQ+ agents are willing to accept a large quantity of bad deals, as visible from the peak near zero in Figs. 3d and 2c, in exchange for the capability to access high-profit deals. The long tails of the distribution are the cause of the higher average profit and high variance. This effect is not present in the competitive learning environments. Finally, we can analyse the similarity between the ZIP strategy and the ones learned by the ZIQ+ agents. Probably the most important outcome of this experiment is that, in this framework, ZIP's strategy is not stable. As reported in Table 3, all the agents deviate from the ZIP strategy, with the biggest difference occurring in the solo learning cases. Interestingly, all of the strategies learned by ZIQ+ agents are semi-deterministic: their probability tables mainly consist of ones and zeros. To quantitatively describe this phenomenon, we can compute the Shannon entropy of the probability of choosing action a, P(a), for every state and average it over the whole strategy. The maximum value of these strategy entropies across all ZIQ+ agents is very small (Table 5), thus supporting the qualitative statement. Such a feature can be very important for designing extremely fast agents. In fact, deterministic policies can be encoded in if-else statements, do not need pseudo-random number generation, and can be implemented on dedicated hardware (FPGAs or ASICs).


Table 3 Similarity score between the ZIP and Q-learning agents' decision tables in each of the four environments. The standard deviation of the result is absent in the high signal (SL) case since there is only one RL buyer and one RL seller

Similarity score          Seller(s)       Buyer(s)
sS&D    CL                0.92 ± 0.03     0.93 ± 0.04
        SL                0.87            0.90
aS&D    CL                0.93 ± 0.04     0.90 ± 0.03
        SL                0.9             0.88

Table 4 Similarity score between the RL agents' decision tables in the high noise (CL) environment for both symmetrical and asymmetrical supply and demand. S-S stands for seller–seller similarity, B-B for buyer–buyer, and S-B for seller–buyer; B-S is not present since the score function is symmetric with respect to its inputs

Similarity score          S-S             B-B
sS&D    CL                0.91 ± 0.02     0.89 ± 0.03
aS&D    CL                0.88 ± 0.05     0.84 ± 0.05

Table 5 The maximum value over strategies of the average entropy of the probability table

Max entropy       sS&D CL    sS&D SL    aS&D CL    aS&D SL
Buyer             1e−8       1e−8       1e−10      9e−6
Seller            3e−7       8e−8       2e−4       8e−3

5 Conclusions We reinterpreted the update rules of one of the most used simple trading agents (ZIP), and we tried to understand whether a reinforcement learning agent (ZIQ+) could outperform it by borrowing its price-update mechanism while learning the behavioural rules. The results show that this is indeed possible, with ZIQ+ agents able to outperform ZIP. The better performance is frequency dependent: the environments tested showed that ZIQ+ agents learn much better as solo learners than when they have to compete with others. It would be necessary to study the agents in evolutionary environments to understand the correct mixture for learning, or to change their goal to a market-wide metric rather than egoistic profit in order to analyse cooperation. Ultimately, we have shown that in the strategy space we defined ZIP is not the optimum, but it is yet to be verified whether the strategies found by the ZIQ+ agents are stable and are attractors for the learning dynamics. Another direction forward would be applying the same approach to agents that use more than the last-shout information, such as the GD agent of Gjerstad and Dickhaut [8].


References 1. Wellman, M., Rajan, U.: Ethical issues for autonomous trading agents. Minds Mach. 27(4), 609–624 (2017) 2. Cliff, D., Bruten, J.: Zero is Not Enough: On the Lower Limit of Agent Intelligence for Continuous Double Auction Markets (1997) 3. Das, R., Hanson, J., Kephart, J., Tesauro, G.: Agent-human interactions in the continuous double auction. In: International Joint Conference on Artificial Intelligence, vol. 17(1), pp. 1169–1178, Lawrence Erlbaum Associates Ltd. (2001) 4. Leber, C., Geib, B., Litz, H.: High frequency trading acceleration using FPGAs. In: 21st International Conference on Field Programmable Logic and Applications, pp. 317–322 (2011) 5. Gode, D., Sunder, S.: Allocative efficiency of markets with zero-intelligence traders: market as a partial substitute for individual rationality. J. Polit. Econ. 101(1), 119–137 (1993) 6. Cliff, D.: Genetic optimization of adaptive trading agents for double-auction markets. In: Proceedings of the IEEE/IAFE/INFORMS 1998 Conference on Computational Intelligence for Financial Engineering (CIFEr) (Cat. No. 98TH8367), pp. 252–258. IEEE, New York (1998) 7. Cliff, D.: ZIP60: further explorations in the evolutionary design of trader agents and online auction-market mechanisms. IEEE Trans. Evol. Comput. 13, 3–18 (2009) 8. Gjerstad, S., Dickhaut, J.: Price formation in double auctions. Games Econ. Behav. 22(1), 1–29 (1998)

Negotiation in Hidden Identity: Designing Protocol for Werewolf Game Hirotaka Osawa, Takashi Otsuki, Claus Aranha, and Fujio Toriumi

Abstract The automation of complex negotiations is required to coordinate various AIs. In complex negotiations, the goals of each agent may not be shared, because some agents may benefit from not disclosing their own information during negotiation. This uncooperative situation requires an agent to have the ability to infer the intentions of the others from the communication, and also to persuade the others through communication. Such negotiation in hidden identity is a pioneering area of negotiation research. In the past, the authors have proposed an AI competition based on the Werewolf game in order to research negotiation in hidden identity. To eliminate noise caused by the difficulty of handling natural language, it is necessary to create a model of communication between agents in such a competition. In this paper, we analyze the elements necessary for communication between agents in the Werewolf game and design the protocol between agents based on this analysis. Keywords Werewolf game · Agent communication protocol · Hidden identity


1 Introduction Complex automated negotiation becomes a more important research field as our social activities are supported more and more by automated AI systems. One of the challenges in complex automated negotiation is to identify the intentions of the other negotiators. Currently, each AI acts to follow its own goal and maximize its own interests; however, these goals are not always disclosed to others, because such openness invites exploitation. For example, a health-related AI might recommend dietary products to help you maintain your health. However, this AI could be designed with a hidden intention to sell you specific products related to its own interest. Several researchers, as well as other interested parties, have tried to make guidelines requiring AIs to inform consumers appropriately of their intentions [7]. However, it is difficult to force companies to clearly explain their AIs' intentions. Such an environment would be categorized as a hidden identity situation in game studies [12]. In hidden identity situations, each player does not disclose their intentions, and so each player needs to identify the others' intentions purely from their communications. To study the hidden identity problem in negotiation, we have applied "Are you a Werewolf?", also called the werewolf game, as an AI competition. Werewolf is a communication game, known worldwide, about finding spies during discussion. The cover story of the werewolf game (also known as "Mafia") is as follows: "It's a story about a village. Werewolves have arrived who can change into and eat humans. The werewolves have the same form as humans during the day, and attack the villagers one-by-one every night. Fear, uncertainty, and doubt towards the werewolves begin to grow. The villagers decide that they must execute those who are suspected of being werewolves, one by one...". This game provides a clear example of the hidden identity problem that will arise in complex automated negotiation. The winner of the Werewolf game is decided solely through discussions. Consequently, game players need to read the intentions of their opponents and to persuade them by using their cognitive faculties. In contrast to a perfect-information game, players in the Werewolf game hide considerable information. Every player attempts to determine this hidden information by using the other players' conversations and behaviors, while trying to hide their own information to accomplish the game's objective. The challenge in using the Werewolf game as a competition is to model the communication between agents. In the Werewolf game, each agent has no explicit resources. Instead, a conversation is held about each agent's hidden identity, and each agent uses the result to infer the others' roles and persuade each other. To eliminate the noise caused by the difficulty of handling natural language, modeling of the communication between agents in such a competition is required. In this paper, we explain the latest communication protocol, which extends previous approaches [13] to describe others' intentions recursively using a limited set of factors. Section 2 summarizes the origins and history of the Werewolf game and explains how AI game studies have also focused on werewolf games. Section 3 describes the model of the Werewolf game, and the key challenges of hidden identity problems regarding describing the reasoning and persuasion structures to be dealt with in the Werewolf


game. Thus, we summarize the requirements for the protocol. Section 4 describes the proposed protocol for competition based on the discussion up to Sect. 3.

2 Background 2.1 Werewolf Game: Hidden Identity in Communication Game Communication game is a large category within board games. In a communication game, the success or failure of the game depends on the information exchange between players. Some well-known examples of communication games include Monopoly and Catan, where negotiation between players has a large role in determining victory or defeat. In Monopoly and Catan, players communicate with each other to change their gains. Werewolf is the most extreme form of such a game, because there is little objective information, or determinants, other than “Communication”. The objective information of a werewolf is just the contents of each person’s utterances, and meta-information like the number of days, the players that performed certain actions, etc. Furthermore, if we consider the player’s communication, there is no way to prove “objectively” the content of what a player says. Here, we use the term “objectively” to mean “When you look at something from outside that is not in the game” (In the simplest game of Werewolf, the role of the player is revealed at their death, but in modern Werewolf Game such as the Werewolf BBS in which a seer role appear, this objective announcement of the role is not made during the game). Werewolf is a popular party game played worldwide including several different variations like “Are You a Werewolf?” and “Lupus in Tabula”. The game is also known as “Mafia”, where it has a similar game structure but with less fantastic decoration. Dmitriy Davydov, known as the creator of “Mafia”, described the game factor as a conflict between an informed minority and an uninformed majority [10]. At the start of the Werewolf game, each player is secretly and randomly assigned a role affiliated with one of the two teams: the werewolves and the villagers. There are two phases: night and day. At night, the werewolves “attack” a player in the villager team. During the day, surviving players discuss to reach a consensus (by voting) on the elimination of another player, hopefully a werewolf. The objective of the werewolves is to kill off all the villagers without being killed themselves. The objective of the villagers is to ascertain who the werewolves are and to kill them. The victory condition for the villagers is to kill all the werewolves. For the werewolves, the victory condition is to kill enough villagers so that they become equal or fewer in number to the werewolves. There are two styles for playing Werewolf. The first includes face-to-face play by using game cards. The other is to play online using web applications, or a BBS-type platform. Large BBS services exist in Japan for playing Werewolf and there are more


than a thousand logs of Werewolf games. Some academic studies have focused on the analysis of BBS game logs. Inaba et al. [8] found that half of all communications in BBS games could be represented with a simple protocol. This information was used to form a simplified representation of the essence of the game for the AI Werewolf Competition. In the "closed" rule version, which is applied in BBS-based werewolf games and also in our competition, a player can never know another player's role exactly, because the allocated roles are never revealed. Thus, a basic course of action for the villager players is to discover werewolves through conversation, because they do not know who the werewolves are. In contrast, the werewolf players know who the werewolves are. Therefore, a basic course of action for the werewolf players is to engage in deceptive conversation, without the villagers learning about their roles.

2.2 Game AI Studies: From Chess to Werewolf AI game-playing competitions have been a part of AI research from the beginning [1]. Several two-player board games with perfect information, such as Checkers, Othello, Chess, and Go, have been used to test new algorithms [5, 9]. In these games, all information is observable by both players. An AI system must only handle the condition of the board and does not need to determine a competitor's thought processes. These complete-information games are outdated as AI game competitions. On the contrary, there are several unsolved games in the multi-player and incomplete-information game categories. Card games have information that cannot be observed by other players [4]. This is also an important field in AI research. Poker is one of the best-known examples, on which several theoretical analyses have been conducted [2]. Other incomplete-information games, including Bridge and the two-player version of Dou Zi Zhu (a popular game in China), have also been studied [6, 14]. Compared to these games, the special characteristic of the Werewolf game is that players' roles and reward tables are hidden and not shared with all players. Thus, the game requires more social intelligence to estimate the roles and internal states of other players. Although their information cannot be observed by other players, each player's role in the aforementioned games is determined before the game starts and is known to all players. In contrast, a player's role in the Werewolf game is hidden from the other players and is only revealed at the end of the game. This type of situation requires more intelligence because each player (especially a villager) needs to hold multiple world models for the other players' actions. It also suggests that a stable strategy does not exist, because if some action suggests that a player supports the villagers, a werewolf will mimic this action. Inaba analyzed how the theory of play changed over 10 years in the online werewolf game called the "Werewolf bulletin board system (BBS)" [8]. In addition, this game requires persuasion of other players. This type of intelligence requires two levels of the Theory of Mind: the expectation of other players' expectations [3]. All these considerations suggest that research on the Werewolf game will lead to several new findings in the field of AI.


3 Model of Werewolf Games 3.1 Basic Rule on Close-Rule Werewolf Game We classify Werewolf games into two types, open-rule and close-rule, and focus on close-rule werewolf games in this study. In an open-rule werewolf game, the role of a dead player is revealed to everyone when that player is removed from the game. In close-rule Werewolf games, the role of a player is not revealed until the game ends. This situation makes communication more complex, as all negotiation processes are based on hidden identities. The game proceeds in alternating phases of day and night. During the day, all players discuss who the werewolves are. Simultaneously, players who have special abilities (which we discuss later) lead discussions that produce advantages for their respective teams by using the information derived from their abilities. After a certain period, players execute one player who is suspected of being a werewolf, chosen by majority voting. The executed player leaves the game and cannot play anymore. During the night, werewolf players can attack one townsfolk-team player. The attacked player is killed and eliminated from the game. In addition, players who have special abilities can use those abilities during the night phase. The day and night phases alternate until the winning conditions are met. Villager players must be able to detect the lies of the werewolf players. In addition, it is important to persuade other players by using information obtained through special abilities. Furthermore, a crucially important point for the werewolf team is to manipulate the discussion to the team's advantage. Occasionally they must impersonate a role and obfuscate the conditions and evidence. There are many variations of the rules and roles of the Werewolf game. Therefore, we use the following basic set of roles for the sake of simplicity:
• Villager: Villager team. A player with this role has no special ability.
• Seer: Villager team. A seer can inspect a player every night phase to ascertain whether or not that player is a werewolf.
• Bodyguard: Villager team. A bodyguard can choose a player every night phase and protect that player against an attack by a werewolf.
• Medium: Villager team. A medium can ascertain whether a player who was executed during the previous day phase was a werewolf.
• Werewolf: Werewolf team. Werewolves can attack one human player during each night phase. They all decide on a single player to attack together through a vote, and zero or one human dies each night. BBS-type games also allow werewolves to talk to each other simultaneously during the day, and we use the same rule in this AI game.
• Possessed: Werewolf team. Werewolves do not know who the possessed player is. The possessed has no special ability. This role secretly cooperates with the werewolves because a werewolf-team victory is also regarded as a victory for the possessed player.
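For reference, the role set and its team assignment can be captured in a few lines; this is only an illustrative encoding of the list above, not the competition server's code.

from enum import Enum

class Role(Enum):
    VILLAGER = "villager team, no special ability"
    SEER = "villager team, inspects one player per night"
    BODYGUARD = "villager team, protects one player per night"
    MEDIUM = "villager team, learns the species of the last executed player"
    WEREWOLF = "werewolf team, votes on one player to attack each night"
    POSSESSED = "werewolf team, no special ability, wins with the werewolves"

WEREWOLF_TEAM = {Role.WEREWOLF, Role.POSSESSED}

def team_of(role):
    return "werewolf" if role in WEREWOLF_TEAM else "villager"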


3.2 Lack of Objective Resources Unlike other negotiation games, the only resource shared by the players in the Werewolf game is the knowledge about the rules of the game, which include the number of each role and game procedure shown in Sect. 3.1. In other words, there is no objective information known to the players during the game play. This pressures players to utilize and obtain information from communication among themselves. For example, suppose that you are a villager and your friends are watching the game from outside (=they can only observe the communication). In this case, it is impossible to prove to your friend that you are a human or even on the villager team during game play. Of course, it is obvious to a werewolf player in the game that you are not a werewolf. If a seer in the village divined you, the seer could know that you are human, and the seer could make a statement that you are a human. However, from the point of view of your friends outside the game, your real role is not known until the end of the game, regardless of any declarations in the game (Fig. 1). In the werewolf game, objective information is provided to a limited number of individuals (such as werewolves), and no information is shared among all of the participants. This is a fundamental principle of the close-rule Werewolf game. Players can’t blindly trust someone and delegate their choice. Such a feature of the Werewolf game puts players in a situation where they prove themselves only through their own communication and make their decisions on their own. This induces reasoning and persuasion of each player as shown in later subsections.

3.3 Reasoning for Modeling the Intentions of Others When there is no objective information to support the contents of a statement, the context of the speaker is a valuable source of information. In other words, “Put oneself in the other’s shoes”. The information exchanged by the werewolves is not objective information, but only subjective information (i.e., context-sensitive information). Thus, the weighting of the different contexts must ultimately be done and evaluated by each player individually (Fig. 2).

Fig. 1 Information non-objectivity from an outside perspective


Fig. 2 Reasoning in players: Each player estimates possible combinations of roles

Results from seer and medium are valuable clues to estimate the intention of another player. Even if a player claims falsely to be a seer, their result gives some clue to their context. Of course, a real seer doesn’t always tell the truth, and there are situations where a real seer needs to lie. Villagers gather information. Wolves provide deceptive information. The villagers must cut the branches of the possible scenarios, and the werewolves must create new possible branches. Broadly speaking, the werewolf game is a fight for information between humans and werewolves, where branches are created by logic, and cut or increased through persuasion. That said, the werewolf knows who the human players are, so a werewolf that plays perfectly could ’imitate perfectly’ as a seer or a medium. No matter how much information you accumulate, this principle will not change. The only way to tell the difference between the real and the fake is through information from communication and context.

3.4 Persuasion as Modeling Self from the Perspective of Others The persuasion described in this section is stronger than the reasoning in Sect. 3.3. The final likelihood for each context varies from player to player, depending on the data on which the estimation is based and on the abilities of each player. Therefore, each person's view of the world is different. In the Werewolf game, however, the decision of the village as a whole must be finalized day by day. Players who disagree with each other eventually need to put their opinions together and make rational decisions. The keywords here are persuasion and trust rather than reasoning. Villagers and werewolves must win trust if they want to survive. To get your point across, you need to explain to others that your point of view is trustworthy. At this stage, it is important not only to model others but also to model oneself from the viewpoint of others and to produce reliable statements (Fig. 3).


Fig. 3 Modeling self from the perspective of others

3.5 Requirements for a Werewolf Game Protocol Based on the above considerations, the protocol requirements between players in this study are as follows:
• Convenience: The protocol should have sufficient elements to describe the necessary statements.
• Uniqueness: The protocol should allow the same representation to be expressed in as simple a description as possible.
• Recursiveness: The protocol must be able to describe context recursively, allowing nested structures.

4 Werewolf Game Protocol Players have natural language conversations, but it is difficult for AI agents to deal with natural languages directly. Unlike natural language statements that describe objective events in the environment, natural language statements that identify partners in a Werewolf game are highly context-sensitive, and it is difficult to apply existing semantic recognition techniques to them. To achieve the above requirements, we propose a protocol design for communication between players. There are agent belief description methods such as BDI logic, but these methods are biased toward describing knowledge internal to the agent and are not used for communication with others [11]. In order to play the Werewolf game in a mixed human-agent environment or between agents, it is necessary to design a unique protocol that can communicate an agent's estimation of the internal model of another.


With a view to future automation in artificial intelligence, these structures should be chosen in the simplest possible form while maintaining the descriptive power of natural language. The grammatical structure of the language defined by the proposed protocol consists of six types of element: basic words and modifiers, basic sentences that describe a certain event by combining words, utterances formed by connecting basic sentences, logical operators that describe the logical relationships between utterances, and control structures that describe the relationships between meanings. For each utterance, a talk number and the speaker are recorded. In this study, we referred to the results of previous research by Inaba et al. on the Werewolf BBS [8]. In that research, the meaning of each utterance in the Werewolf game is determined by tagging it. However, in order to accurately describe the conversation in a game, it is necessary to consider its grammatical structure as well. In this study, we use these tags to define the grammatical structure of the protocol required for Werewolf game conversations.

4.1 Word
The Werewolf protocol defines a word as the unit of meaning. A word can be one of the following:
• subject: an ‘agent identifier’ (e.g., Agent1), or UNSPEC (omitted), or ANY
• target: an ‘agent identifier’, or ANY (undefined)
• role: one of the 6 valid roles (VILLAGER, SEER, MEDIUM, BODYGUARD, WEREWOLF, POSSESSED) or ANY
• species: one of the 2 valid teams (HUMAN, WEREWOLF) or ANY
• verb: one of 15 valid verbs (specified below)
• talk number: a unique id for each sentence (composed of [day number] and [talk id])
The keyword ANY can be specified for the following word categories: subject, target, role, and species. In that case, it is treated as a wildcard that can correspond to any valid option within the category.
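To make these word categories concrete, here is a minimal Python sketch (not part of the official protocol definition) that encodes them as constants; the verb list is inferred from the sentence types of Sect. 4.2, and the class and variable names are our own.

```python
from enum import Enum


class Role(Enum):
    VILLAGER = "VILLAGER"
    SEER = "SEER"
    MEDIUM = "MEDIUM"
    BODYGUARD = "BODYGUARD"
    WEREWOLF = "WEREWOLF"
    POSSESSED = "POSSESSED"
    ANY = "ANY"            # wildcard over all roles


class Species(Enum):
    HUMAN = "HUMAN"
    WEREWOLF = "WEREWOLF"
    ANY = "ANY"            # wildcard over both species


# The 15 verbs correspond to the sentence types of Sect. 4.2
# (13 atomic sentences plus the flow-control sentences OVER and SKIP).
VERBS = [
    "ESTIMATE", "COMINGOUT",                      # knowledge / intent
    "DIVINATION", "GUARD", "VOTE", "ATTACK",      # game actions
    "DIVINED", "IDENTIFIED", "GUARDED",           # results of past actions
    "VOTED", "ATTACKED",
    "AGREE", "DISAGREE",                          # (dis)agreement
    "OVER", "SKIP",                               # conversation flow
]

# Special subject values: UNSPEC (omitted subject) and ANY (wildcard).
UNSPEC = "UNSPEC"
ANY = "ANY"
```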

4.2 Sentence There are 13 atomic sentences. Each sentence is composed of multiple words. The sentences can be grouped as described below.


4.2.1 Sentences that Express Knowledge or Intent
• ‘subject’ ESTIMATE ‘target’ ‘role’: The subject states their belief about the role of the target.
• ‘subject’ COMINGOUT ‘target’ ‘role’: The subject states their knowledge about the role of the target.

4.2.2 Sentences About Actions of the Werewolf Game
• ‘subject’ DIVINATION ‘target’: The subject divines the target.
• ‘subject’ GUARD ‘target’: The subject guards the target.
• ‘subject’ VOTE ‘target’: The subject votes against the target.
• ‘subject’ ATTACK ‘target’: The subject attacks the target.

4.2.3 Sentences About the Result of Past Actions
• ‘subject’ DIVINED ‘target’ ‘species’: The subject used the seer’s action on the target and obtained the result species.
• ‘subject’ IDENTIFIED ‘target’ ‘species’: The subject used the medium’s action on the dead target and obtained the result species.
• ‘subject’ GUARDED ‘target’: The subject guarded the target.
• ‘subject’ VOTED ‘target’: The subject voted against the target.
• ‘subject’ ATTACKED ‘target’: The subject attacked the target.

4.2.4 Sentences that Express Agreement or Disagreement
• ‘subject’ AGREE ‘talk number’
• ‘subject’ DISAGREE ‘talk number’

4.2.5 Sentences Related to the Flow of the Conversation
• OVER: “I have nothing else to say”—implies agreement to terminate the current day’s conversation.
• SKIP: “I have nothing to say now”—implies a desire to continue the current day’s conversation.
These two sentences can only be used as single statements, never nested in other statements.


4.3 Operator There are eight operators. Each operator is used to connect sentences and express their relationships. They can be grouped as follows:

4.3.1 Operators for Directed Requests of Action and Information
• ‘subject’ REQUEST ‘target’ (‘sentence’): The subject requests that the target acts according to the sentence, or acts so that the state described by the sentence is achieved. If the sentence uses ANY in its composition, then any appropriate expansion of ANY is acceptable as the object of the REQUEST.
• ‘subject’ INQUIRE ‘target’ (‘sentence’): The subject questions the target about the sentence. If ANY is not used in the sentence, the target is simply being asked whether it agrees with the sentence or not. If ANY is used in the sentence, the target is being asked to reply with the appropriate word to replace ANY.

4.3.2 Reasoning Operators
• ‘subject’ BECAUSE (‘sentence1’) (‘sentence2’): The subject states that sentence1 is the reason for sentence2.

4.3.3 Time Indication Operators
• ‘subject’ DAY ‘day_number’ (‘sentence’): The subject indicates that ‘sentence’ took place on ‘day_number’. This is commonly used together with BECAUSE.

4.3.4 Logic Operators
• ‘subject’ NOT (‘sentence’): Negates the sentence.
• ‘subject’ AND (‘sentence1’) (‘sentence2’) (‘sentence3’) . . .: Claims that all sentences are true.
• ‘subject’ OR (‘sentence1’) (‘sentence2’) (‘sentence3’) . . .: Claims that at least one sentence is true.
• ‘subject’ XOR (‘sentence1’) (‘sentence2’): Claims that exactly one of sentence1 and sentence2 is true.
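As a sketch of how the recursive grammar might be handled in code, the snippet below represents a statement as a nested list of words and sub-sentences and serialises it into the parenthesised text form used in the examples of Sect. 4.6. The representation and helper name are assumptions, not part of the protocol specification.

```python
def serialize(sentence):
    """Render a nested sentence as protocol text.

    A sentence is a list of tokens; a token is either a plain word (str)
    or a nested sentence, which is wrapped in parentheses when rendered.
    """
    parts = []
    for token in sentence:
        if isinstance(token, (list, tuple)):
            parts.append("(" + serialize(token) + ")")
        else:
            parts.append(str(token))
    return " ".join(parts)


# "Agent1 REQUEST Agent2 (GUARD Agent3)"
print(serialize(["Agent1", "REQUEST", "Agent2", ["GUARD", "Agent3"]]))

# "Agent1 AND (COMINGOUT Agent1 SEER) (DIVINED Agent2 WEREWOLF)"
print(serialize(["Agent1", "AND",
                 ["COMINGOUT", "Agent1", "SEER"],
                 ["DIVINED", "Agent2", "WEREWOLF"]]))
```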


4.4 Grammar Notes
An agent statement can be composed of one or more sentences. In this case, multiple sentences are separated by parentheses, as when using the AND or OR operators. Sentences that follow an operator should always be delimited by parentheses.

4.5 About Omitting Subjects (UNSPEC)
It is possible to omit the subject of a sentence using the UNSPEC word. In cases where omitting the subject does not change the meaning of the sentence, we recommend that the subject be omitted. However, note that every agent should be able to interpret sentences in both the full and the shortened format. When the subject is omitted, if the sentence is in the widest scope (i.e. the sentence comes at the beginning of the agent’s statement), the omitted subject should be interpreted as the speaking agent. If the subject is in a sentence of narrower scope (a nested sentence), the interpretation of the omitted subject depends on the type of the parent sentence, as follows:
• REQUEST, INQUIRE: the omitted subject is to be interpreted as the target of the parent sentence.
• Other cases: the omitted subject is to be interpreted as the subject of the parent sentence.
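The interpretation rules above translate directly into code. The following sketch assumes the nested-list representation of the previous example, with an omitted subject written explicitly as "UNSPEC"; the function name and representation are illustrative only.

```python
def resolve_subject(sentence, speaker, parent=None):
    """Return the agent to use as subject when it is omitted (UNSPEC).

    - Widest scope (no parent sentence): the speaking agent.
    - Nested under REQUEST or INQUIRE: the target of the parent sentence.
    - Nested under any other sentence: the subject of the parent sentence.
    """
    subject = sentence[0]
    if subject != "UNSPEC":
        return subject                      # subject given explicitly
    if parent is None:
        return speaker                      # widest scope
    parent_subject, parent_verb = parent[0], parent[1]
    if parent_verb in ("REQUEST", "INQUIRE"):
        return parent[2]                    # target of the parent sentence
    return parent_subject


# "REQUEST Agent2 (GUARD Agent3)" spoken by Agent1:
parent = ["UNSPEC", "REQUEST", "Agent2", ["UNSPEC", "GUARD", "Agent3"]]
print(resolve_subject(parent, speaker="Agent1"))              # -> Agent1
print(resolve_subject(parent[3], "Agent1", parent=parent))    # -> Agent2
```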

4.6 Example Sentences

4.6.1 General Examples
“COMINGOUT Agent1 SEER”: The speaker declares that Agent1 is a seer.
“Agent1 COMINGOUT Agent1 SEER”: Agent1 declares that Agent1 is a seer.
“DIVINED Agent1 HUMAN”: The speaker has at some point used the seer’s ability on Agent1, and obtained the “Human” result.


“Agent1 DIVINED Agent2 WEREWOLF”: Agent1 has at some point used the seer’s ability on Agent2, and obtained the “Werewolf” result.
“REQUEST Agent2 (DIVINATION Agent3)”: The speaker desires that Agent2 uses the seer’s ability on Agent3. (Note: this is identical to “REQUEST Agent2 (Agent2 DIVINATION Agent3)”.)
“GUARD Agent2”: The speaker will use the Bodyguard’s ability on Agent2.
“Agent1 REQUEST Agent2 (GUARD Agent3)”: Agent1 desires that Agent2 uses the Bodyguard role’s ability on Agent3.

4.6.2 Requesting the Agreement of Other Agents
“REQUEST Agent1 (ESTIMATE Agent2 [role])”: The speaker is asking that Agent1 change their mind about Agent2, and consider them to be [role]. (e.g., “Alice, would you consider that Bob might be a Werewolf?”)
“REQUEST ANY (ESTIMATE Agent1 [role])”: The speaker is asking that anyone change their mind about Agent1, and consider them to be [role]. (e.g., “Everyone! You should believe that Anna is a Werewolf!”)

4.6.3 Requesting Game Actions
“REQUEST Agent1 (DIVINATION Agent2)”: The speaker requests that Agent1 use the Seer’s divination action on Agent2.
“REQUEST ANY (GUARD Agent1)”: The speaker requests that anyone who is a Bodyguard use their protection ability on Agent1.
“REQUEST ANY (VOTE Agent1)”:


The speaker requests that anyone vote against Agent1. (e.g., “Let’s all vote against Agent1!”)
“REQUEST Agent1 (ATTACK Agent2)”: The speaker requests that Agent1 use the werewolf kill ability on Agent2. This is particularly useful when werewolves are discussing strategy during the night negotiation period.

4.6.4 Requesting an Assumed Result of Actions
“REQUEST Agent1 (DIVINED Agent2 [species])”
“REQUEST Agent1 (GUARDED Agent2)”
“REQUEST ANY (IDENTIFIED Agent1 [species])”
In these sentences, the speaker is requesting that Agent1 (or any agent, in the last case) behave as if they had performed and received the respective result for a role’s special action (Divined, Guarded, or Identified). This is particularly useful for werewolves who wish to coordinate lies about having particular roles during the night negotiation period. (e.g., “Agent2, you should pretend that you are a Seer, and that you divined that Agent1 (me) is a Villager”.)

4.6.5 Examples of Agreement Request
“REQUEST Agent1 (AGREE [talk number])”: The speaker is requesting that Agent1 agree with the statement specified by [talk number].
“REQUEST ANY (DISAGREE [talk number])”: The speaker is requesting that everyone disagree with the statement specified by [talk number]. (e.g., “Everyone, please disregard talk number X”)

4.6.6 Interpretation of BECAUSE Sentences
“Agent2 BECAUSE (DAY 1 (Agent1 VOTE Agent2)) (VOTE Agent1)”: Because Agent1 voted against Agent2 (myself) on Day 1, I will vote against Agent1.


4.6.7 Interpretation of INQUIRE Sentences
“Agent2 INQUIRE Agent1 (VOTED ANY)”: Agent2 wants to know who Agent1 voted against.
“Agent2 INQUIRE Agent1 (VOTE ANY)”: Agent2 wants to know who Agent1 will vote against.
“Agent2 INQUIRE Agent1 (ESTIMATE Agent2 WEREWOLF)”: Agent2 wants to know if Agent1 considers Agent2 (itself) to be a werewolf.

4.6.8 Interpretation of ANY Sentences
The ANY word is equivalent to expanding all possible substitutions and connecting them using the OR operator. For example:
“Agent2 INQUIRE Agent1 (VOTED ANY)” is equivalent to:
“Agent2 INQUIRE Agent1 (OR (VOTED Agent1) (VOTED Agent2) (VOTED Agent3) ...)”
“REQUEST ANY (DIVINED [agent] [species])” is equivalent to:
“OR (REQUEST Agent1 (DIVINED [agent] [species])) (REQUEST Agent2 (DIVINED [agent] [species])) ...”
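This equivalence can be applied mechanically. The sketch below expands a single ANY occurring in a simple sentence into an OR over all agents in the game; it handles one ANY at a time and uses the nested-list representation of the earlier sketches, so it is a simplification rather than a full implementation.

```python
def expand_any(sentence, agents):
    """Expand one ANY word into an OR over all agent identifiers.

    sentence: a flat sentence such as ["VOTED", "ANY"].
    agents:   the agent identifiers in the game, e.g. ["Agent1", "Agent2", "Agent3"].
    Returns a nested OR sentence if ANY is present, else the sentence unchanged.
    """
    if "ANY" not in sentence:
        return sentence
    idx = sentence.index("ANY")
    alternatives = []
    for agent in agents:
        expanded = list(sentence)
        expanded[idx] = agent      # substitute one concrete agent for ANY
        alternatives.append(expanded)
    return ["OR"] + alternatives


# "VOTED ANY" over three agents becomes
# OR (VOTED Agent1) (VOTED Agent2) (VOTED Agent3)
print(expand_any(["VOTED", "ANY"], ["Agent1", "Agent2", "Agent3"]))
```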

5 Conclusion
In this study, we examined the features of the Werewolf game: the absence of objective information, the reasoning and persuasion that this absence demands, the modeling of the context of other players, and the modeling of the self from the viewpoint of others. Based on these characteristics, we designed a conversation protocol that is needed when people and agents play the Werewolf game together. In future work, we will classify how the protocol is used by each agent participating in the Werewolf AI competition and, based on this classification, analyse which expressions are useful for negotiation under hidden identities.


References
1. Abramson, B.: Control strategies for two-player games. ACM Comput. Surv. 21(2), 137–161 (1989). https://doi.org/10.1145/66443.66444
2. Billings, D., Burch, N., Davidson, A., Holte, R., Schaeffer, J., Schauenberg, T., Szafron, D.: Approximating game-theoretic optimal strategies for full-scale poker. In: International Joint Conference on Artificial Intelligence, pp. 661–668 (2003)
3. Dias, J.A., Reis, H., Paiva, A.: Lie to me: virtual agents that lie. In: AAMAS ’13, International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, pp. 1211–1212 (2013)
4. Ganzfried, S., Sandholm, T.: Game theory-based opponent modeling in large imperfect-information games. In: The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 2. AAMAS ’11, International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, pp. 533–540 (2011)
5. Gelly, S., Kocsis, L., Schoenauer, M., Sebag, M., Silver, D., Szepesvári, C., Teytaud, O.: The grand challenge of computer Go. Commun. ACM 55(3), 106–113 (2012). https://doi.org/10.1145/2093548.2093574
6. Ginsberg, M.L.: GIB: imperfect information in a computationally challenging game. J. Artif. Intell. Res. 14, 303–358 (2001)
7. IEEE: Ethically aligned design, versions 1 and 2. https://standards.ieee.org/develop/indconn/ec/autonomous_systems.html. Accessed: 2017-06-19
8. Inaba, M., Toriumi, F., Takahashi, K., et al.: The statistical analysis of werewolf game data. In: Proceedings of Game Programming Workshop, pp. 144–147 (2012)
9. Krawiec, K., Szubert, M.G.: Learning n-tuple networks for Othello by coevolutionary gradient search. In: Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation - GECCO’11. ACM Press (2011). https://doi.org/10.1145/2001576.2001626
10. Margaret Robertson, W.M.: Werewolf: how a parlour game became a tech phenomenon. https://www.wired.co.uk/article/werewolf (2010)
11. Rao, A.S., Georgeff, M.P.: BDI agents: from theory to practice. In: Proceedings of the First International Conference on Multi-Agent Systems (ICMAS-95), pp. 312–319 (1995)
12. Taylor, D.P.: Investigating approaches to AI for trust-based, multi-agent board games with imperfect information. Discov. Inven. Appl. (1) (2014)
13. Toriumi, F., Osawa, H., Inaba, M., Katagami, D., Shinoda, K., Matsubara, H.: AI wolf contest—development of game AI using collective intelligence—. In: Computer Games, pp. 101–115. Springer (2016)
14. Whitehouse, D., Powley, E.J., Cowling, P.I.: Determinization and information set Monte Carlo tree search for the card game Dou Di Zhu. In: 2011 IEEE Conference on Computational Intelligence and Games (CIG’11). IEEE (Aug 2011). https://doi.org/10.1109/cig.2011.6031993

Multi-Agent Recommender System Abdullah Alhejaili and Shaheen Fatima

Abstract A recommender agent (RA) provides users with recommendations about products/services. Recommendations are made on the basis of information available about the products/services and the users, and this process typically involves making predictions about user preferences and matching them with product attributes. Machine learning methods are being studied extensively to design RAs. In this approach, a model is learnt from historical data about trading (i.e. data about products and the users buying them). There are numerous different learning methods, and how accurately a method can make a recommendation depends on the adopted method and also on the use of historical data. Given this, we propose a multi-agent recommender system called MARS which combines various different machine learning methods. Within MARS, different agents are designed to make recommendations using different machine learning methods. Since different agents use different machine learning methods, the recommendations they make may be conflicting. Negotiation is used to come to an agreement on a recommendation. Negotiation is conducted using a contract-net protocol. The performance of MARS is evaluated in terms of recommendation error. The results of simulations show that MARS outperforms five existing recommender systems. Keywords Recommender system · Multi-agent system · Machine learning

A. Alhejaili (B) · S. Fatima
Department of Computer Science, Loughborough University, Loughborough, UK
e-mail: [email protected]; [email protected]
S. Fatima
e-mail: [email protected]
A. Alhejaili
Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
R. Aydoğan et al. (eds.), Recent Advances in Agent-based Negotiation, Studies in Computational Intelligence 958, https://doi.org/10.1007/978-981-16-0471-3_7


1 Introduction
Online shopping websites such as Amazon offer thousands of products, which makes it difficult for users to search them manually. Automated tools for assisting users in their search are much needed. A recommender agent (RA) is such a tool: it provides users with recommendations on products and services such as articles, books, music, movies, restaurants, financial services and legal services [1]. A good recommendation can save users time and effort. It can also benefit sellers by increasing sales.
A number of automated tools for making recommendations are in use. Most of them work on the basis of information available about products and users. The information is filtered, using simple methods such as content-based filtering or collaborative filtering, to find which products will be relevant to which users. With recent developments in the field of machine learning, there is much scope for improving filtering methods in terms of accuracy and thereby increasing user satisfaction [2]. If historical trading data about users and the products they bought is available, then supervised learning can be used to learn patterns in user behaviours and thereby make predictions about what products are recommendable. Supervised learning involves using historical data for training and generating a hypothesis and then using the learnt hypothesis to predict a recommendation [3]. Thus, an RA can be designed to make recommendations by learning patterns in historical data.
Data needed for machine learning can be recorded from various sources such as the Internet, mobile devices and the Internet of Things. The wide range of available data sources enables more features to be included in the data analysis in order to produce more accurate recommendations. For example, information about user personalities can significantly improve the performance [4]. Such data can be very useful because it enables a more accurate recommendation. However, the data can change dynamically over time because user preferences change and new users may enter the market. This makes the process of making accurate and up-to-date recommendations more challenging.
In order to address this problem, we propose a multi-agent recommender system (MARS). MARS is comprised of six agents, and it makes a 3-level recommendation: strong, medium or weak. In more detail, MARS is comprised of six agents A1 , . . . , A6 . Each individual agent that comprises MARS is by itself an RA. However, different agents are designed using different machine learning methods. Specifically, A1 is designed using random forests (RF), A2 using neural networks (NN), A3 using support vector machines (SVM), A4 using k-nearest neighbour (KNN), and A5 using Naive Bayes (NB). Thus, several recommendations are produced within MARS. Since different agents use different machine learning methods, the recommendations they make may be conflicting. Negotiation is used to come to an agreement on a recommendation. Negotiation is conducted using a contract-net protocol [5].
Putting our work in the context of related literature, although there is some existing work on using machine learning methods for recommender agents, these methods focused on very few learning methods (see Sect. 2 for details). In contrast, MARS


combines five different learning methods using a multi-agent approach and uses negotiation in order to improve the accuracy of recommendations. Through simulations, we demonstrate that MARS performs better than five existing machine learning-based recommender systems (see Sect. 5.2 for details on the results and evaluation of MARS).
The paper is organised as follows. Section 2 reviews the related literature on recommender systems and machine learning. Section 3 introduces our proposed method. Section 4 describes the setting for our experiments and the dataset used to evaluate our method. Section 5 presents the results and their analysis. Section 6 is the concluding section.

2 Literature Review In this section, we review the related literature. Section 2.1 is a brief review of machine learning methods. Section 2.2 is a review of existing recommender systems.

2.1 Machine Learning (ML)
Relying on statistical methods, machine learning lies at the core of artificial intelligence and data science, where the main task is to intelligently analyse the data and generate knowledge that helps in making decisions and understanding the data [6]. More precisely, supervised ML methods are well known for their ability to make predictions when dealing with labelled data. These include random forests, k-nearest neighbour, naive Bayes, support vector machines and neural networks.
Random forest (RF) [7] is an ensemble method that builds a large number of simple decision trees. Instead of building a single but huge decision tree and using it to make predictions, a number of smaller decision trees are built. Each one of these trees is then used to make a prediction, while a majority rule is applied for making the final prediction.
Support vector machines (SVM) [8] have the advantage of being able to deal better with some high-dimensional data, which is not the case with several other methods. However, SVM has some limitations, such as the difficulty of interpreting the resulting model [9].
The k-nearest neighbour (k-NN) [10] classifier classifies a data point based on the classes of its k nearest neighbours. For example, if a majority of the k nearest neighbours are classified as class c, then that data point is classified to the same class c. The method is very good for multi-class classification but is sensitive to the value of k.
The naive Bayes algorithm [11] is a simple classifier based on Bayes' rule, and it assumes that all features are independent given the value of the class variable.


Neural networks [12] learn and model the relationships between features and labels, typically by using three layers: input layer, hidden layer and output layer, although the number of hidden layers may be more in the case of deep learning [13]. Data passes from one layer to the next after applying activation functions and weights. Training the network involves the adjustment of weights to achieve accurate predictions. These are generally very good for multi-class classification. Each learning method has advantages and disadvantages. The suitability of a method also depends on the type of data. It is therefore important to combine the strengths of several methods for enhancing the prediction accuracy, and this is what we do in MARS.

2.2 Recommender Systems (RS)
Two main types of methods have been used in the existing recommender systems in the literature: collaborative filtering and content-based filtering. Some other approaches combine these two methods. We describe all these methods below.
Collaborative filtering (CF) [14] is an RS design approach in which information about the ratings given by users to products is used to extract any similarities between different users. Each product has a product identifier, and for each user that rated the product, a user identifier and the rating. Then, similarities between different users are extracted based on their shared liked/disliked products. For example, a product that was not rated by a user (X) is recommended to him/her if it was liked by another user (Y), where X and Y liked similar products in the past. In [15], a collaborative filtering RS model is proposed using neural networks. One of the limitations of this work is that it cannot make recommendations for new users (cold-start).
Content-based (CB) filtering [16] is another approach for designing RSs. In this approach, historical data (comprised of user and product attributes together with ratings given by users to products) are used. The similarity between products is extracted based on their attributes, and similarity between users is also extracted based on their attributes. A recommendation is made based on these similarities. For example, a product P can be recommended to a user U if he/she liked another product Q, where P and Q have similar attributes.
Hybrid recommender systems [17] combine CF with CB to enhance their advantages and overcome their limitations. [18] proposed a hybrid collaborative filtering model with neural networks for feature modelling and matrix factorization for rating prediction. Thus, only one machine learning method was used. In MARS, we combine five different machine learning methods.
[19] proposed a Content-boosted Collaborative Filtering neural NETwork (CCCFNet). Using neural networks, their model can combine collaborative filtering


and content-based filtering in a unified framework. The method uses cross-domain data from several sources.
A multi-agent recommender system was proposed in [20] using association rules and collaborative filtering. Their two-agent system adopts only one type of ML method, i.e. association rules, and deals with only binary data. The difference between [20] and MARS is that we use five different ML methods and provide a 3-class recommendation. Another multi-agent-based recommender system was proposed in [21]. This uses a market-based approach utilising a reinforcement learning approach proposed by [22]. In contrast, MARS utilises five different supervised learning methods.
PMF: [23] presented a collaborative filtering model called probabilistic matrix factorization (PMF) to address the issue of learning from imbalanced and sparse recommendation datasets, e.g. recommending items to users with little rating history. The model formulates recommendation as a regression problem. MARS differs from this approach as it makes a 3-class recommendation, i.e. (strong/medium/weak), and combines five different learning methods.
CTR: [24] proposed the collaborative topic regression (CTR) model, which combines traditional collaborative filtering with topic modelling using latent Dirichlet allocation (LDA) [25]. Although this recommendation model can deal with the product cold-start problem, it cannot deal with the user cold-start problem. While this model uses two methods, MARS combines several ML methods.
CDL: [26] proposed the Collaborative Deep Learning model (CDL), an RS that uses only neural networks. In contrast, in MARS, we combine five different ML methods to enhance their advantages.
ConvMF: [27] proposed an RS called convolutional matrix factorization (ConvMF), which integrates a convolutional neural network (CNN) and probabilistic matrix factorization (PMF). The main advantage of this model is its ability to enhance the recommendation accuracy by using additional textual information about items. The limitation of this approach is that the text information may not always be available. In MARS, we only use data about users and products, without any additional textual data about products, and we combine five different ML methods.
R-ConvMF: [28] is an extension of the work in [27]. It also uses textual data about products in addition to other product and user information and improves recommendation accuracy by better understanding user–item relationships. This method requires additional text information, which is not required in MARS.
In summary, all the above-mentioned existing methods use very few ML methods for their recommender systems. In addition, some require additional textual data in order to make an accurate recommendation. The advantage of MARS is that it combines several ML methods and uses negotiation to enhance the accuracy of its recommendation. A second advantage of MARS is that it does not require any additional text data to make accurate recommendations. A summary of the comparison of recommender system methods is given in Table 1. The results of our simulations show that MARS makes more accurate recommendations than existing methods.


Table 1 A comparison summary of recommender system methods

Method | Methodology | Number of agents | Information required | Type of recommendation
PMF | Matrix Factorization | One | User ID, product ID and rating | Regression
CTR | Hierarchical Bayesian, Support Vector Machine and Matrix Factorization | One | User attributes, product attributes and rating | Binary
CDL | Neural Networks | One | User attributes, product attributes, rating and textual information | Regression
ConvMF | Matrix Factorization, Neural Networks | One | User attributes, product attributes, rating and textual information | Regression
R-ConvMF | Matrix Factorization, Neural Networks | One | User attributes, product attributes, rating and textual information | Regression
MARS | Neural Networks, Random Forest, Support Vector Machines, k-Nearest Neighbour, Naive Bayes | Five | User attributes, product attributes and rating | 3-level

3 The MARS Recommender System
MARS is comprised of six agents that work together to decide upon a 3-level recommendation: strong, medium or weak. The agents interact by means of cooperative negotiation using the standard contract-net protocol [5]. One of the six agents is the manager agent and the remaining five are contractors.

3.1 MARS Architecture The six agents A1 , . . . , A6 are organised as shown in Fig. 1. Agent A6 is the manager and the agents A1 , . . . , A5 are the contractors. Each contractor agent takes data about


Fig. 1 MARS Architecture. D is defined in Table 2

product and user (details about product and user data are in Sect. 3.2) as input and outputs the strength with which to recommend the product to the user. Each one of these five agents therefore outputs one of three values: strong, medium or weak. Thus, each of the agents A1 , . . . , A5 is itself a recommender agent. However, they are all designed using different machine learning methods (details below). This means that even if the input is the same to all of them, their outputs (i.e. strength of recommendation) may differ. The manager agent, A6 , resolves these differences by negotiation. The outputs of each of the agents A1 , . . . , A5 are given as input to the manager agent, which then makes a final recommendation (strong, medium or weak). The agents use machine learning as follows:
• A1 : (RF) Random forest-based recommender agent.
• A2 : (NN) Neural network-based recommender agent.
• A3 : (SVM) Support vector machine-based recommender agent.
• A4 : (KNN) k-Nearest neighbour-based recommender agent.
• A5 : (NB) Naive Bayes-based recommender agent.
• A6 : Manager recommender agent.

The manager solicits the recommendations of all the contractors and then resolves any conflicts using Algorithm 1 (see Sect. 3.3 for Algorithm 1). Section 3.2 is a description of user and product data (i.e. the input that is given to the agents A1 , . . . , A5 as shown in Fig. 1).
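As an illustration only, the five contractor agents could be instantiated with scikit-learn (one of the libraries listed in Sect. 4.3) roughly as follows. The chapter does not specify hyperparameters or which library implements each agent, so the choices below (including using MLPClassifier for the neural network agent) are assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB


def build_contractors():
    """One classifier per contractor agent A1..A5 (hyperparameters assumed)."""
    return {
        "A1": RandomForestClassifier(n_estimators=100),  # RF
        "A2": MLPClassifier(max_iter=500),               # NN
        "A3": SVC(),                                     # SVM
        "A4": KNeighborsClassifier(n_neighbors=5),       # k-NN
        "A5": GaussianNB(),                              # NB
    }


def train_contractors(contractors, X_train, y_train):
    # y_train holds the 3-level labels 0 (weak), 1 (medium), 2 (strong).
    for agent in contractors.values():
        agent.fit(X_train, y_train)


def contractor_recommendations(contractors, x):
    """Each contractor outputs a recommendation level for one test example x."""
    return {name: int(agent.predict([x])[0])
            for name, agent in contractors.items()}
```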


3.2 Training and Test Data
The data was downloaded from MovieLens-1M (https://grouplens.org/datasets/movielens/1m). This contains historical data gathered about users and movies. Each user is described in terms of certain attributes (such as age, gender, etc.) and each movie is also described in terms of certain attributes (such as genre, title, etc.). Associated with each user–movie pair is a rating (an integer between 1 and 5; 1 is the lowest rating and 5 the highest) which the user gave to the movie. This data is used to train the agents A1 , . . . , A5 .
Before training, the data is pre-processed by converting the ratings. A rating in the raw data downloaded from MovieLens can take one of 5 possible values: 1 to 5. However, in most recommender systems, a recommendation is binary (yes/no) or ternary (strong/medium/weak). Hence, we converted the 5-valued ratings to 3-valued ratings as follows: each rating that is either 1 or 2 is converted to ‘weak’, each rating that is either 3 or 4 is converted to ‘medium’, and each rating that is 5 is converted to ‘strong’. Internally, the three ratings weak, medium and strong are represented as 0, 1 and 2, respectively. More details about pre-processing are in Sect. 4.2.
After pre-processing, the data is a set of examples. Each example is comprised of three fields (see Sect. 4.1 for details about the data set):
1. data (i.e. user attributes) about a user U,
2. data (i.e. movie attributes) about a movie M, and
3. the rating (strong/medium/weak) the user U gave to the movie M.
A set of such examples comprises the training data. Once trained, testing of agents can begin. The test data (which is different from the training data) contains a set of examples. Each example in the test data contains the following two fields:
1. data (i.e. user attributes) about a user U, and
2. data (i.e. movie attributes) about a movie M.
Thus, each example in the test data is a pair, one element of which is the attributes of a user and the other is the attributes of a movie. Note that each example in the test data contains user and movie attributes but does not contain a rating. It is the task of each recommender agent (A1 , . . . , A5 ) to take each test example (i.e. user attributes–movie attributes pair) as input and generate as output a recommendation (strong/medium/weak) which indicates how strongly the movie is to be recommended to the user. Recommendation in MARS is thus a three-class classification.
Note that since the contractor agents A1 , . . . , A5 use different machine learning methods, their output recommendations can be different. More importantly, A1 , . . . , A5 differ in terms of their performance, i.e. how accurately they are able to recommend. We measure the performance accuracy of agents A1 , . . . , A5 in terms of root mean squared error (see Sect. 4.4 for details about this performance metric). Our results of testing showed that the agents in decreasing order of accuracy are


A1 ≻ A2 ≻ A3 ≻ A4 ≻ A5. The most accurate is A1 and the least accurate is A5 . Since A1 , . . . , A5 differ in terms of their output recommendations and also their accuracies, the manager agent combines their recommendations so as to optimise the accuracy of its final recommendation.

3.3 The Manager Agent
The task for the manager is to take as input each of the five output recommendations of the agents A1 , . . . , A5 and decide upon a final recommendation. This is done in the following steps:
1. Broadcast an announcement to all contractors that a recommendation must be made for the test data.
2. Solicit bids from the contractor agents A1 , . . . , A5 . (At this stage all five contractor agents send bids.)
3. Award the recommendation task to each one of the five contractors. (Each contractor agent now starts to work on the test data and makes a recommendation.)
4. Solicit the output recommendations from A1 , . . . , A5 .
5. Combine the recommendations of A1 , . . . , A5 into one final recommendation by resolving any disagreements in the recommendations of the contractors (this is described in detail below).
Now, if all contractor agents produce identical recommendations, then that will be the manager's final recommendation. However, if there is disagreement between the recommendations of the contractors, then the disagreements are resolved as follows.
Resolving disagreements: Disagreement between the contractors can arise in one of the following two possible ways:
• Case 1: There is a clear majority, i.e. at least 3 out of the 5 contractors make the same recommendation. In this case, the final recommendation is decided by majority.
• Case 2: A pair of distinct contractors, say (Aa , Ab ), agree on a recommendation, i.e. both recommend Ra . Another pair of distinct contractors, say (Ac , Ad ), agree on a recommendation, i.e. both recommend Rc , where Ra ≠ Rc . Recall that the contractors in their decreasing order of accuracies are A1 ≻ A2 ≻ A3 ≻ A4 ≻ A5 , so the top three most accurate contractors are A1 , . . . , A3 . If Aa ∈ {A1 , . . . , A3 } and Ab ∈ {A1 , . . . , A3 }, then the final recommendation is Ra . Otherwise, if Ac ∈ {A1 , . . . , A3 } and Ad ∈ {A1 , . . . , A3 }, then the final recommendation is Rc . Otherwise, if only one agent of the pair (Aa , Ab ) belongs to the set {A1 , . . . , A3 } and only one agent of the pair (Ac , Ad ) belongs to the set {A1 , . . . , A3 }, then the recommendation of the most accurate of these two agents is the final recommendation.


Algorithm 1 Manager agent A6
Input: The recommendations R1 , . . . , R5 of the contractors A1 , . . . , A5
Output: The recommendation R (see Fig. 1)
1:  if consensus between at least 3 contractors then
2:    R is decided by majority
3:    return R
4:  else
5:    if (Aa , Ab ) recommend Ra and (Ac , Ad ) recommend Rc then
6:      if Aa ∈ {A1 , . . . , A3 } and Ab ∈ {A1 , . . . , A3 } then
7:        R = Ra
8:      else if Ac ∈ {A1 , . . . , A3 } and Ad ∈ {A1 , . . . , A3 } then
9:        R = Rc
10:     else
11:       Suppose Aa belongs to {A1 , . . . , A3 } and Ac belongs to {A1 , . . . , A3 }, and let Ax be the most accurate of Aa and Ac
12:       R = Rx
13:     end if
14:     return R
15:   end if
16: end if

The above method for resolving disagreements is formalised in Algorithm 1.
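A direct Python rendering of Algorithm 1 might look as follows; the dictionary keys, the explicit accuracy ordering and the function name are ours, and recommendations are the internal labels 0 (weak), 1 (medium) and 2 (strong).

```python
from collections import Counter

ACCURACY_ORDER = ["A1", "A2", "A3", "A4", "A5"]   # decreasing accuracy (Sect. 3.2)
TOP3 = set(ACCURACY_ORDER[:3])


def manager_decision(recs):
    """recs maps contractor name ('A1'..'A5') to its recommendation (0, 1 or 2)."""
    counts = Counter(recs.values())
    value, count = counts.most_common(1)[0]
    if count >= 3:                          # Case 1: clear majority
        return value

    # Case 2: two pairs agree on different recommendations (a 2-2-1 split).
    pairs = {rec: [a for a, r in recs.items() if r == rec]
             for rec, c in counts.items() if c == 2}
    for rec, agents in pairs.items():
        if all(a in TOP3 for a in agents):
            return rec                      # a whole pair lies in the top three
    # Otherwise, pick the most accurate top-three member among the two pairs.
    candidates = [a for agents in pairs.values() for a in agents if a in TOP3]
    best = min(candidates, key=ACCURACY_ORDER.index)
    return recs[best]


# A1 and A4 recommend 2 (strong), A2 and A5 recommend 1 (medium), A3 recommends 0.
print(manager_decision({"A1": 2, "A2": 1, "A3": 0, "A4": 2, "A5": 1}))  # -> 2
```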

4 Experimental Evaluation

4.1 Dataset
For training and testing A1 , . . . , A5 , we use data from MovieLens 1M. This is a dataset of anonymous ratings given by users to movies. It contains data from the year 2000 onwards. There are about 1 million ratings from 6000 users on 4000 movies. The raw data is comprised of the following tables:
• User data table: For each user, there are five attributes: UserID, gender, age, occupation and zip-code.
• Movie data table: For each movie, there are three attributes: MovieID, title and genres (there are 18 possible genres including action, comedy, etc.).
• Rating data table: This contains ratings given by users to movies (as UserID, MovieID, rating, timestamp).
We combined the data from the above-mentioned three tables and encoded it as follows. Each training example has the following 42 features:
• gu: 18 features for the genres of interest to the user, one feature corresponding to each genre. Each of these features takes a value between 0 and 1; the higher the value, the higher the degree for that genre.


• gm: 18 features for the genres to which a particular movie belongs, one feature corresponding to each genre. Each of these features takes a value between 0 and 1; the higher the value, the higher the degree for that genre.
• uGen: A feature for user gender: 1 digit with 2 possible values, 0 (male) and 1 (female).
• uAge: A feature for user age: 1 digit with 7 possible values representing the age group to which the user corresponds.
• uOcc: A feature for user occupation: 1 digit with 21 possible values representing the occupation to which the user corresponds.
• mAge: A feature for movie age: 1 digit with 8 possible values representing the age group for a movie.
• mRA: A feature for rating: 1 digit representing the average of the ratings that the movie received from all users.
• mRN: A feature for the number of times the movie was rated: 1 digit representing the number of times that the movie was rated.
All these features were normalised, so each one is represented as a number between 0 and 1.

4.2 Data Pre-processing
Three types of pre-processing were applied to the raw data:
1. Converting the rating that users gave to movies. In the raw data, the rating is an integer between 1 and 5. Each rating is converted to an integer between 0 and 2 representing the recommendation level, as described in Sect. 3.2.
2. Normalising the feature values.
3. Adding two new features, F1 and F2, to each example in the training and test data. The feature F1 was obtained by clustering the data to extract information about similarities between the examples. We used the Gaussian Mixture Model (GMM) [29] clustering algorithm to cluster the examples into 10 clusters. Each example then belongs to one of these 10 clusters, so F1 can take one of 10 possible values: 1, . . . , 10. These values are normalised, so F1 is a number between 0 and 1. The feature F2 was added to indicate the similarity between the movie genres and the genres of interest to the user. Genre can take one or more of 18 possible values (such as action, comedy, etc.) and is encoded as an integer i; for instance, i = 1 indicates action, i = 2 indicates comedy, and so on. Then gu_i, a number between 0 and 1, indicates a user's level of interest in the ith genre, and gm_i, also a number between 0 and 1, indicates the degree to which a movie belongs to the ith genre. The similarity between a user's interest in the various genres and the genres of a movie is calculated using Eq. 1 as follows:

Sim = \sum_{i=1}^{18} (1 - |gu_i - gm_i|)    (1)

Table 2 A list of features in the training data D (after pre-processing), with three training examples; D is shown in Fig. 1. The columns are grouped as: user attributes (uGen, uAge, uOcc), user genre interests (gu_1, . . . , gu_18), movie attributes (mAge, mRA, mRN), movie genres (gm_1, . . . , gm_18), the added features F1 (GMM cluster) and F2 (genre similarity), and the label y (recommendation level 0, 1 or 2). All feature values are normalised to lie between 0 and 1.
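A minimal sketch of these pre-processing steps, using scikit-learn's GaussianMixture for F1 and Eq. 1 for F2, is given below. It is not the authors' code: the column layout of X (first 18 columns gu, next 18 gm) and the scaling of F2 by 18 are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import MinMaxScaler


def convert_rating(r):
    """Map a 1-5 star rating to a 3-level label: 0 weak, 1 medium, 2 strong."""
    if r <= 2:
        return 0
    return 1 if r <= 4 else 2


def genre_similarity(gu, gm):
    """F2 per Eq. 1: Sim = sum_i (1 - |gu_i - gm_i|) over the 18 genres."""
    return float(np.sum(1.0 - np.abs(np.asarray(gu) - np.asarray(gm))))


def add_features(X):
    """Append F1 (GMM cluster, normalised) and F2 (genre similarity) to X.

    X is assumed to be an (n_examples x 42) array whose first 18 columns are
    the user genre features gu and the next 18 the movie genre features gm.
    """
    X = MinMaxScaler().fit_transform(X)                     # normalise features
    clusters = GaussianMixture(n_components=10).fit_predict(X)
    f1 = (clusters + 1) / 10.0                              # cluster id scaled to (0, 1]
    f2 = np.array([genre_similarity(row[:18], row[18:36]) for row in X])
    f2 = f2 / 18.0                                          # assumed scaling of Sim to [0, 1]
    return np.column_stack([X, f1, f2])
```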

4.3 Implementation Environment
Simulations for training and testing were run on a macOS High Sierra machine with the following specifications:
• Version: 10.14.4 (17G65).
• Processor: 2.8 GHz Intel Core i7.
• Memory: 16 GB 2133 MHz LPDDR3.
• Graphics: Radeon Pro 555 2048 MB, Intel HD Graphics 630 1536 MB.

Programming was done in Python 3.6 using machine learning libraries such as Keras, Tensorflow and scikit-learn.

4.4 Evaluation Metrics
The performance of the contractor agents A1 , . . . , A5 and the manager A6 is evaluated in terms of the root mean squared error (RMSE), which is given by Eq. 2:

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2}    (2)

where n is the total number of test examples, \hat{y}_i is MARS's predicted recommendation for the ith example and y_i is the actual recommendation from the data set for the ith example.
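As a worked illustration of Eq. 2 on the 3-level labels, a minimal NumPy version (array names illustrative):

```python
import numpy as np


def rmse(y_pred, y_true):
    """Root mean squared error over the 3-level labels (0, 1, 2), per Eq. 2."""
    y_pred, y_true = np.asarray(y_pred), np.asarray(y_true)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


# Predictions [2, 1, 0, 2] against actual [2, 2, 0, 1] give sqrt(2/4) ~ 0.707.
print(rmse([2, 1, 0, 2], [2, 2, 0, 1]))
```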

5 Performance Analysis of MARS Since MARS is comprised of six component agents (A1 , . . . , A6 ) and each one of them is itself a recommender agent, we analysed its performance by comparing the performances of all the component agents. We also compared the performance of MARS with the five recommender systems (listed below) from the literature. In more detail, we conducted the following two different analyses: 1. A comparative analysis of the performances of the component agents in MARS (see Table 3). Each contractor is a single-agent recommender system, and the manager together with the five contractors is a multi-agent recommender system. Details in Sect. 5.1.


2. A comparative analysis of the performance of MARS (in terms of the final recommendation made by the manager agent) with the performances of the following five recommender systems from the literature:
• PMF [23]: Probabilistic Matrix Factorization.
• CTR [24]: Collaborative Topic Regression.
• CDL [26]: Collaborative Deep Learning.
• ConvMF [27]: Convolutional Matrix Factorization.
• R-ConvMF [28]: Robust Convolutional Matrix Factorization.

Details are in Sect. 5.2.

5.1 Performance of the Components of MARS
The performance of MARS was evaluated in terms of the RMSE for each of the individual contractor agents A1 , . . . , A5 and also that of the combined result of the manager A6 . The results are shown in Table 3. As seen, the RMSE for A1 , i.e. the contractor agent implemented using random forests, is 0.6118. For A2 , i.e. the contractor agent implemented using neural networks, it is 0.6171. For A3 , i.e. the contractor agent implemented using support vector machines, it is also 0.6171. For A4 , i.e. the contractor agent implemented using k-nearest neighbour, it is 0.6869. For A5 , i.e. the contractor agent implemented using naive Bayes, it is 0.7989. However, for the final recommendation made by the manager agent A6 , the RMSE is the lowest at 0.6033. Thus, using a multi-agent approach resulted in a reduced error of recommendation.

5.2 A Comparison of MARS with Other Systems
MARS outperformed all the other five methods from the literature (see Table 4). This is despite the fact that MARS makes a 3-level (strong/medium/weak) recommendation, while many others make a binary (yes/no) recommendation.

Table 3 A comparison of RMSE for each agent within MARS

Agent | Role | RMSE
A1 | Contractor | 0.6118
A2 | Contractor | 0.6171
A3 | Contractor | 0.6171
A4 | Contractor | 0.6869
A5 | Contractor | 0.7989
A6 | Manager | 0.6033


Table 4 A comparison of RMSE for MARS and five other methods

Method | RMSE
PMF | 0.8971
CTR | 0.8969
CDL | 0.8879
ConvMF | 0.8531
R-ConvMF | 0.8470
MARS (A6) | 0.6033

Thus, one of the advantages of MARS is that its error of predicted recommendation is lower than that of the other methods. Since MARS combines several different machine learning approaches and uses negotiation to make a final recommendation, it is able to make a more accurate recommendation than the other methods. Another advantage of MARS is that it can make recommendations without using any additional textual data about products; the other methods rely on such additional information to make a recommendation. The following is a summary of the main advantages of MARS:
• MARS provides a three-level recommendation (i.e., strong/medium/weak), while the other methods provide either a binary (yes/no) recommendation or a regression recommendation.
• Compared to other methods such as CDL, ConvMF and R-ConvMF, MARS requires less input data but can still make a more accurate recommendation. MARS generates its recommendation based only on the data downloaded from MovieLens and does not require any additional textual data about products.
• Since MARS uses a multi-agent approach, it is robust against failures of individual agents.
• Since MARS uses a multi-agent approach, it can be trained in less time, as all five contractor agents can be trained in parallel.

6 Conclusion and Future Work
We proposed a multi-agent recommender system (MARS) for making 3-level (strong/medium/weak) recommendations. The individual agents within MARS use different machine learning methods to make a recommendation. Negotiation is used to come to an agreement on a recommendation. The performance of MARS is evaluated in terms of recommendation error. The results of simulations show that the recommendation error for MARS is lower than that of five other recommender systems from the literature. Reverse engineering can be used for identifying the optimal ML model for making recommendations. Thus, we plan to address this issue in the future by reverse


engineering MARS. In addition, datasets from different recommendation domains, such as book recommendation, are planned to be included in the evaluation of MARS in the future.

References
1. Ricci, F., Rokach, L., Shapira, B.: Introduction to recommender systems handbook. In: Ricci, F., Rokach, L., Shapira, B., Kantor, P. (eds.) Recommender Systems Handbook 2011, pp. 1–35. Springer, Boston, MA (2011). https://doi.org/10.1007/978-0-387-85820-3_1
2. Zhang, S., Yao, L., Sun, A., Tay, Y.: Deep learning based recommender system: a survey and new perspectives. ACM Comput. Surv. 52(1), 5:1–5:38 (2019)
3. Flach, P.: Machine Learning: The Art and Science of Algorithms that Make Sense of Data, 1st edn. Cambridge University Press, Cambridge, United Kingdom (2012)
4. Nguyen, T.T., Maxwell Harper, F., Terveen, L., Konstan, J.A.: User personality and user satisfaction with recommender systems. Inf. Syst. Front. 20(6), 1173–1189 (2018)
5. Smith, R.G.: The contract net protocol: high-level communication and control in a distributed problem solver. IEEE Trans. Comput. C-29(12), 1104–1113 (1980)
6. Jordan, M.I., Mitchell, T.M.: Machine learning: trends, perspectives, and prospects. Science 349(6245), 255–260 (2015)
7. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001)
8. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
9. Wu, X., Kumar, V., Ross Quinlan, J., Ghosh, J., Yang, Q., Motoda, H., McLachlan, G.J., Ng, A., Liu, B., Yu, P.S., Zhou, Z.-H., Steinbach, M., Hand, D.J., Steinberg, D.: Top 10 algorithms in data mining. Knowl. Inf. Syst. 14(1), 1–37 (2008)
10. Bhatia, N., Vandana: Survey of nearest neighbor techniques. Int. J. Comput. Sci. Inf. Secur. 8(2), 302–305 (2010)
11. Zhang, H.: The optimality of Naive Bayes. In: Proceedings of the Seventeenth International Florida Artificial Intelligence Research Society Conference, pp. 562–568. AAAI Press, Florida, USA (2004)
12. Jain, A.K., Mao, J., Mohiuddin, K.M.: Artificial neural networks: a tutorial. Computer 29(3), 31–44 (1996)
13. Qiu, J., Wu, Q., Ding, G., Xu, Y., Feng, S.: A survey of machine learning for big data processing. EURASIP J. Adv. Signal Process. 2016(67), 1–16 (2016)
14. Su, X., Khoshgoftaar, T.M.: A survey of collaborative filtering techniques. Adv. Artif. Intell. 2009, 1–19 (2009)
15. He, X., Liao, L., Zhang, H., Nie, L., Hu, X., Chua, T.-S.: Neural collaborative filtering. In: Proceedings of the 26th International Conference on World Wide Web, pp. 173–182. International World Wide Web Conferences Steering Committee, Perth, Australia (2017)
16. Lops, P., de Gemmis, M., Semeraro, G.: Content-based recommender systems: state of the art and trends. In: Ricci, F., Rokach, L., Shapira, B., Kantor, P. (eds.) Recommender Systems Handbook, pp. 73–105. Springer, Boston, MA (2011). https://doi.org/10.1007/978-0-387-85820-3_3
17. Burke, R.: Hybrid recommender systems: survey and experiments. User Model. User-Adap. Inter. 12(4), 331–370 (2002)
18. Dong, X., Yu, L., Wu, Z., Sun, Y., Yuan, L., Zhang, F.: A hybrid collaborative filtering model with deep structure for recommender systems. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 1309–1315. AAAI Press, California, USA (2017)
19. Lian, J., Zhang, F., Xie, X., Sun, G.: CCCFNet: a content-boosted collaborative filtering neural network for cross domain recommender systems. In: Proceedings of the 26th International Conference on World Wide Web Companion, pp. 817–818. International World Wide Web Conferences Steering Committee, Perth, Australia (2017)


20. Morais, A.J., Oliveira, E., Jorge, A.M.: A multi-agent recommender system. In: Omatu, S., De Paz Santana, J., González, S., Molina, J., Bernardos, A., Rodríguez, J. (eds.) Distributed Computing and Artificial Intelligence 2016, Advances in Intelligent and Soft Computing, vol. 151, pp. 281–288. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28765-7_33
21. Moon, S.K., Simpson, T.W., Kumara, S.R.T.: An agent-based recommender system for developing customized families of products. J. Intell. Manufact. 20(6), 649–650 (2009)
22. Tran, T., Cohen, R.: A reputation-oriented reinforcement learning strategy for agents in electronic marketplaces. Comput. Intell. 18(4), 550–565 (2002)
23. Salakhutdinov, R., Mnih, A.: Bayesian probabilistic matrix factorization using Markov Chain Monte Carlo. In: Proceedings of the 25th International Conference on Machine Learning, pp. 880–887. ACM, Helsinki, Finland (2008)
24. Wang, C., Blei, D.M.: Collaborative topic modeling for recommending scientific articles. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 448–456. ACM, San Diego, California, USA (2011)
25. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3(Jan), 993–1022 (2003)
26. Wang, H., Wang, N., Yeung, D.-Y.: Collaborative deep learning for recommender systems. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1235–1244. ACM, Sydney, NSW, Australia (2015)
27. Kim, D., Park, C., Oh, J., Lee, S., Yu, H.: Convolutional matrix factorization for document context-aware recommendation. In: Proceedings of the 10th ACM Conference on Recommender Systems, pp. 233–240. ACM, Boston, Massachusetts, USA (2016)
28. Kim, D., Park, C., Oh, J., Yu, H.: Deep hybrid recommender systems via exploiting document context and statistics of items. Inf. Sci. 417, 72–87 (2017)
29. Reynolds, D.: Gaussian mixture models. In: Li, S.Z., Jain, A. (eds.) Encyclopedia of Biometrics. Springer, Boston, MA (2009). https://doi.org/10.1007/978-0-387-73003-5