Social Dilemmas, Institutions, and the Evolution of Cooperation

Table of contents:
Preface
Contents
Part I: Foundations
Introduction
Micro-Macro Models in Sociology: Antecedents of Coleman’s Diagram
Part II: Institutions
The Kula Ring of Bronislaw Malinowski: Simulating the Co-Evolution of an Economic and Ceremonial Exchange System
From the Savannah to the Magistrate’s Court
The Dependence of Human Cognitive and Motivational Processes on Institutional Systems
Social Dilemmas and Solutions in Immunizations
Part III: Social Norms
When Do People Follow Norms and When Do They Pursue Their Interests?
Personal Exposure to Unfavorable Environmental Conditions: Does it Stimulate Environmental Activism?
Cooperation and Career Chances in Science
Social Dilemmas in Science: Detecting Misconduct and Finding Institutional Solutions
The Interplay of Social Status and Reciprocity
Part IV: Peer-Sanctioning
Types of Normative Conflicts and the Effectiveness of Punishment
Social Status and Peer-Punishment: Findings from Two Road Traffic Field Experiments
The Double Edge of Counter-Sanctions. Is Peer Sanctioning Robust to Counter-Punishment but Vulnerable to Counter-Reward?
Diffusion of Responsibility in Norm Enforcement
Endogenous Peer Punishment Institutions in Prisoner’s Dilemmas: The Role of Noise
Part V: Trust and Trustworthiness
Cooperation and Distrust – a Contradiction?
Signaling Theory Evolving: Signals and Signs of Trustworthiness in Social Exchange
Trust and Promises as Friendly Advances
Online Reputation in eBay Auctions: Damaging and Rebuilding Trustworthiness Through Feedback Comments from Buyers and Sellers
Part VI: Game Theory
Nash Dynamics, Meritocratic Matching, and Cooperation
A Note on the Strategic Determination of the Required Number of Volunteers
Is No News Bad News? A Hostage Trust Game with Incomplete Information and Fairness Considerations of the Trustee
Part VII: Experimental Methods
When Prediction Fails
Measuring Social Preferences on Amazon Mechanical Turk
Repetition Effects in Laboratory Experiments
Notes on the Editors and Contributors

Ben Jann, Wojtek Przepiorka (Eds.)

Social Dilemmas, Institutions, and the Evolution of Cooperation

ISBN 978-3-11-047195-3
e-ISBN (PDF) 978-3-11-047297-4
e-ISBN (EPUB) 978-3-11-047069-7

Library of Congress Cataloging-in-Publication Data
A CIP catalog record for this book has been applied for at the Library of Congress.

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de.

© 2017 Walter de Gruyter GmbH, Berlin/Boston
Cover image: © Zoonar RF/Zoonar/Thinkstock
Typesetting: le-tex publishing services GmbH, Leipzig
Printing and binding: CPI books GmbH, Leck
Printed on acid-free paper
Printed in Germany
www.degruyter.com

For Andreas, our teacher, colleague, and friend.

Preface

The starting point of this book project goes back several years, when one of us discussed the idea of compiling a Festschrift for Andreas Diekmann with the late Norman Braun of LMU Munich. The idea went into hibernation for some time, popping up now and again but mostly remaining in a deep sleep. In the spring of 2015, at a meeting of the Model Building and Simulation section of the German Sociological Association in Leipzig, a gentle nudge by Werner Raub and Thomas Voss infused the idea with a breath of life. Since neither of us is fond of the somewhat outdated Festschrift format, we decided to design a book with a focus we consider important, and about which the research community surrounding Andreas has much to say.

The book’s topic lies at the core of sociological interest, as it deals with the most fundamental aspects of human sociality. Our choice of topic was also a natural one, given Andreas’ many important contributions to it. However, it is important to mention that Andreas has made many significant contributions to various other fields of sociological research. We therefore hope that our colleagues who were not asked, or who were unable to contribute to this volume due to a lack of thematic fit, will nevertheless join in with this book’s merry celebration of Andreas’ great achievements.

On a more serious note, to ensure high scientific quality, all chapters in this volume have been peer reviewed (single-blind), typically receiving two reviews per chapter. Most contributors served as reviewers for one of the other chapters, and we would like to thank them, as well as Vincenz Frey, Diego Gambetta, and Lukas Norbutas, who acted as external reviewers, for their valuable services. Their reviews resulted in significant improvements to the articles included in the book. We also thank the authors for their patience and willingness to take the extra effort of making detailed revisions to their contributions. All chapters were edited by a professional proofreading service; we would like to thank James Disley and his team for their prompt and high-quality work. Furthermore, we thank Stefan Ilic and Tina Laubscher for their help with the technical processing and preparation of the chapters for the publisher.

Most importantly, however, we would like to thank Andreas for being our teacher, mentor and supporter, but also our sharpest critic, friend, and inspiration.

March 2017


Ben Jann and Wojtek Przepiorka

Contents

Preface | VII

Part I: Foundations
Ben Jann and Wojtek Przepiorka
Introduction | 3
Werner Raub and Thomas Voss
Micro-Macro Models in Sociology: Antecedents of Coleman’s Diagram | 11

Part II: Institutions
Rolf Ziegler
The Kula Ring of Bronislaw Malinowski: Simulating the Co-Evolution of an Economic and Ceremonial Exchange System | 39
Manuel Eisner, Aja Louise Murray, Denis Ribeaud, Margit Averdijk, and Jean-Louis van Gelder
From the Savannah to the Magistrate’s Court | 61
Siegwart Lindenberg
The Dependence of Human Cognitive and Motivational Processes on Institutional Systems | 85
Ulrich Mueller
Social Dilemmas and Solutions in Immunizations | 107

Part III: Social Norms
Karl-Dieter Opp
When Do People Follow Norms and When Do They Pursue Their Interests? | 119
Peter Preisendörfer
Personal Exposure to Unfavorable Environmental Conditions: Does it Stimulate Environmental Activism? | 143


Christiane Gross, Monika Jungbauer-Gans, and Natascha Nisic
Cooperation and Career Chances in Science | 165
Katrin Auspurg and Thomas Hinz
Social Dilemmas in Science: Detecting Misconduct and Finding Institutional Solutions | 189
Ulf Liebe and Andreas Tutić
The Interplay of Social Status and Reciprocity | 215

Part IV: Peer-Sanctioning
Heiko Rauhut and Fabian Winter
Types of Normative Conflicts and the Effectiveness of Punishment | 239
Ben Jann and Elisabeth Coutts
Social Status and Peer-Punishment: Findings from Two Road Traffic Field Experiments | 259
Andreas Flache, Dieko Bakker, Michael Mäs, and Jacob Dijkstra
The Double Edge of Counter-Sanctions. Is Peer Sanctioning Robust to Counter-Punishment but Vulnerable to Counter-Reward? | 279
Fabian Winter and Axel Franzen
Diffusion of Responsibility in Norm Enforcement | 303
Nynke van Miltenburg, Vincent Buskens, and Werner Raub
Endogenous Peer Punishment Institutions in Prisoner’s Dilemmas: The Role of Noise | 327

Part V: Trust and Trustworthiness
Margit E. Oswald and Corina T. Ulshöfer
Cooperation and Distrust – a Contradiction? | 357
Wojtek Przepiorka and Joël Berger
Signaling Theory Evolving: Signals and Signs of Trustworthiness in Social Exchange | 373


Manuela Vieth and Jeroen Weesie
Trust and Promises as Friendly Advances | 393
Chris Snijders, Marcin Bober, and Uwe Matzat
Online Reputation in eBay Auctions: Damaging and Rebuilding Trustworthiness Through Feedback Comments from Buyers and Sellers | 421

Part VI: Game Theory
Heinrich H. Nax, Ryan O. Murphy, and Dirk Helbing
Nash Dynamics, Meritocratic Matching, and Cooperation | 447
Friedel Bolle
A Note on the Strategic Determination of the Required Number of Volunteers | 471
Thomas Gautschi
Is No News Bad News? A Hostage Trust Game with Incomplete Information and Fairness Considerations of the Trustee | 481

Part VII: Experimental Methods
Hartmut Esser
When Prediction Fails | 505
Marc Höglinger and Stefan Wehrli
Measuring Social Preferences on Amazon Mechanical Turk | 527
Roger Berger and Bastian Baumeister
Repetition Effects in Laboratory Experiments | 547

Notes on the Editors and Contributors | 567

Part I: Foundations

Ben Jann and Wojtek Przepiorka

Introduction

The question of how cooperation and social order can evolve from a Hobbesian state of nature of a “war of all against all” (Hobbes [1668] 1982) has always been at the core of social scientific inquiry (e.g., Axelrod 1984; Bowles and Gintis 2011; Durkheim [1893] 1997; Ellickson 1991; Ostrom 1990; Sennett 2012). Various approaches exist for addressing this question, but the theoretical lens through which we view the phenomena presented in this book is methodological individualism (Weber [1922] 2013; Coleman 1990). Methodological individualism reminds us that cooperation and social order are macro-sociological phenomena that can be, and need to be, explained as a result of the goal-oriented behaviors of actors. A key insight from this point of view is that social dilemmas constitute a pivotal analytical paradigm that can be used by social scientists to investigate the origins of conflict, competition, and cooperation among humans. Social dilemmas are therefore well suited as the micro-foundational building blocks for a stringent theoretical explanation of cooperation and social order (Kollock 1998).

Social dilemmas are “situations of strategic interdependence in which the decisions of individually rational actors lead to an inferior outcome for all or some parties than the decisions of ‘collectively rational’ actors. Collective rationality means that actors, had they an opportunity to communicate and agree on a binding contract, should agree on a combination of actions leading to a welfare-enhancing outcome” (Diekmann and Przepiorka 2016:1311). Social dilemmas, therefore, are at the core of the problem of social cooperation. Their analysis identifies when and why humans may struggle to achieve a collectively rational solution, what societal benefits they forgo in their struggle, and what mechanisms and measures may help overcome the struggle.

Social dilemmas are usually described in game theoretic terms. Apart from the well-known Prisoner’s Dilemma, there are many examples of social dilemmas (McAdams 2009; Raub, Buskens, and Corten 2015). Social dilemmas are mostly studied in small groups, such as dyadic interactions with trust at stake, or voluntary contribution situations in which a public good is produced only if enough group members contribute their resources. However, social dilemmas also occur on a large scale. The common pool resource dilemma is often used to describe the clash between individual and collective rationality in environmental issues such as fishery, land use, and traffic (Ostrom 1990). Coordination problems, in which actors must agree on one of many welfare-enhancing outcomes, also fall under the definition of social dilemmas.

The gap between the individual and collective rationality inherent in social dilemmas (Rapoport 1974) creates a demand for the regulation of actors’ behavior (Coleman 1990; Voss 2001). Such regulations can be formal and manifest themselves in terms of legal codes and other institutions providing selective incentives. Regulations can also, however, be informal, and emerge as social norms that prescribe or proscribe certain behavior that is enforced by positive and negative peer-sanctions (Hechter and Opp 2001). In general, whether formal or informal, such regulations can be understood as institutions, defined as “humanly devised constraints that shape human interaction” (North 1990:3). A specific type is social norms, defined as rules guiding social behavior, the deviation from (or adherence to) which is negatively (or positively) sanctioned (e.g., Bicchieri 2006). Both social norms and other institutions are subject to change, due to technological innovations, policies and external shocks such as economic, political and environmental crises. These changes, in turn, affect the ways in which we interact on a small and on a large scale, and how we cooperate. Thus, while the notion of social dilemmas makes problems of cooperation comprehensible, the analysis of institutions is key to understanding the evolution of cooperation.

Against this backdrop, the chapters compiled in this book give an overview of state-of-the-art research on social dilemmas, institutions, and the evolution of cooperation. The book covers (1) theoretical analyses of social dilemmas such as trust, public goods, common-pool resource and coordination dilemmas; (2) the role of formal and informal organizations, social norms, and institutions in shaping how individuals interact to overcome social dilemmas; and (3) empirical studies conducted in the laboratory and the field, as well as agent-based simulations that investigate how cooperation evolves in human groups. The book is divided into seven parts, and comprises 26 chapters written by distinguished scholars and experts in the field.

Part I: Introduction

After this introductory chapter, the book begins with an article by Raub and Voss (Chapter 2), who remind us of the theoretical roots of European rational-choice sociology. While James Coleman can safely be called the father of sociological rational choice theory, some of his ideas, and in particular the famous Coleman-boat, turn out to have originated in Europe. Raub and Voss provide an overview of the European antecedents of the Coleman-boat and thereby place the contributions in this book against the context of the history of thought. At the same time, by discussing various micro-macro models, Raub and Voss introduce the basic ideas of methodological individualism.

Part II: Institutions

The second part starts with a description of one of the most peculiar institutions in human social history – the Kula Ring. The Kula Ring was a system of economic and ceremonial exchanges among islander tribes in the Western Pacific, and was first described by the ethnographer Bronisław Malinowski at the beginning of the last century. It is believed that the ceremonial gift exchanges that took place alongside economic exchanges served the function of maintaining peace among the tribal societies.


How the Kula Ring came into existence, however, is a yet unsolved puzzle. Ziegler (Chapter 3) devises a simulation model that aims at identifying the conditions under which the Kula Ring may have evolved.

Perhaps less peculiar, but at least as important in terms of their origins as the Kula Ring, are criminal justice institutions. Starting from the observation that there is substantial cross-cultural variability in what is considered to be a crime, Eisner, Murray, Ribeaud, Averdijk, and van Gelder (Chapter 4) challenge the widespread view that the evolved human psychology has had a major bearing on the prevalence of criminal justice institutions. The authors emphasize procedural fairness and legitimacy as inherent properties of criminal justice institutions with a positive effect on maintaining social order. These properties distinguish criminal justice institutions from mere centralized punishment systems.

In a similar vein, Lindenberg (Chapter 5) argues that institutions do not regulate actors’ behavior via selective incentives alone, but also have a bearing on actors’ perceptions of the situation at hand, their beliefs about other actors, and their own motivations. The author employs goal-framing theory to demonstrate how institutions regulate actors’ behavior via their influence on these actors’ overarching goals. He argues that the functioning of institutions crucially depends on how they influence an actor’s three overarching normative, gain and hedonic goals. To function well, institutions must make actors’ normative goals salient and activate internalized norms to prompt actors to act according to the legitimate rules these institutions define.

In the last chapter in Part II, Mueller (Chapter 6) gives several examples of large-scale social dilemmas related to vaccinations against infectious agents, and shows how the use of social dilemma theory can help to devise institutions that solve the dilemmas inherent in vaccination decisions.

Part III: Social Norms

What one ought to do often clashes with what one would like to do. In Chapter 7, Opp challenges the common view that the effects of one’s inclination to follow social norms and one’s own interests on behavior are additive. He suggests instead that one’s inclination to follow social norms decreases with one’s inclination to follow one’s own interests (and vice versa). He uses survey data on protest participation during the East German Revolution in 1989 to test his proposition about the interaction effect of social norms and own interests.

Own interests, however, may also instigate collective action and normative change. In his contribution, Preisendörfer (Chapter 8) addresses the question of whether adverse environmental conditions trigger affected actors’ activism, aiming at improving their environmental conditions and changing social norms through policy interventions. To test his proposition, the author uses the Swiss Environmental Survey, which combines answers to the survey questions with objectively measured environmental conditions in the neighborhood of the respondents.


Social norms change as a result of institutional change. In academia, large-scale cooperative projects involving many researchers are increasingly promoted. At the same time, individual researchers compete for scarce, long-term academic positions and are evaluated based on their individual performance. Cooperating with others on a common project or working alone is therefore a dilemma with which young researchers are increasingly confronted. Gross, Jungbauer-Gans, and Nisic (Chapter 9) investigate whether the norm to cooperate with others has increased over time, and whether researchers who cooperate with others are more successful in securing long-term academic positions.

Another clash between social norms and own interests in academia manifests itself every now and then in prominent cases of scientific misconduct. For social norms regarding scientific conduct to be established and maintained, norm violations must be negatively sanctioned with a certain probability; detecting and documenting scientific misconduct is a necessary precondition for the application of sanctions. In Chapter 10, Auspurg and Hinz devise and empirically evaluate new methods of fraud detection in scientific publications.

Reciprocity is arguably the strongest social norm gluing a society together. Liebe and Tutić (Chapter 11) conduct two quasi-experiments to explore the interplay of reciprocity and social status. They use the sequential dictator game to measure reciprocity and manipulate social status by allowing participants from different schools to interact with each other.

Part IV: Peer-Sanctioning

Peer-sanctioning can be effective in establishing norm compliance, if a significant proportion of actors are willing to sanction their deviant peers at a cost to themselves. However, the effectiveness of peer-sanctioning may not only depend on the number of potential sanctioners, but also on the conflict potential inherent in actors’ divergent normative expectations. In Chapter 12, Rauhut and Winter distinguish four types of normative conflict that can arise from actors’ diverging expectations regarding the type of a social norm, and the extent to which others should adhere to that norm. Additionally, the more the actors to whom the social norm is directed are distinct from the actors who benefit from the social norm, the less effective peer-sanctioning will be in establishing norm compliance.

In line with the idea of normative conflict, Jann and Coutts (Chapter 13) provide empirical evidence of the extent of negative peer-sanctioning in an inventive field experiment. They show how larger differences in actors’ social status can lead to more aggressive peer-sanctions of traffic norm violations. In their experiment, a confederate seemingly unintentionally blocks a road with his or her car, and the time until the blocked driver honks is measured. Actors’ social status is measured by the type of car.

A factor that has been shown to hamper the effectiveness of peer-sanctioning to promote norm compliance is negative counter-sanctioning (i.e., retaliation for punished deviations). In Chapter 14, Flache, Bakker, Mäs, and Dijkstra suggest that this may also be true for positive counter-sanctioning (i.e., reciprocation of rewarded adherence). The authors stage a laboratory experiment in which they test this conjecture by varying whether subjects can reward or punish their peers, and whether these sanctions occur anonymously or are attached to subjects’ “identities” for the duration of the experiment.

Winter and Franzen (Chapter 15) take up the following important but often neglected question concerning peer-sanctioning: who is going to sanction the norm breaker? A group of actors who experience a norm violation by another actor may face a coordination dilemma in which only one actor is required to sanction the norm breaker. The authors show that this coordination dilemma can lead to the diffusion of responsibility such that the likelihood of a sanction decreases with the number of actors. They test this hypothesis by means of a laboratory experiment with the multi-responder ultimatum game.

In the last decade or so, research has shown how centralized sanctioning institutions could have evolved to substitute for the costly and therefore often inefficient peer-sanctioning mechanism. In these studies, subjects can choose whether they want to be in an environment with or without the possibility of peer-sanctioning. Environments employing a peer-sanctioning institution turn out to be more successful and therefore attract more actors over time. In Chapter 16, van Miltenburg, Buskens, and Raub conduct a laboratory experiment to investigate whether this result also holds if the cooperation of actors is observed by their group members with some noise; with noise, defectors are observed as cooperators (and vice versa) with a small probability.

Part V: Trust and Trustworthiness

While social norms have been called the “cement” of society, trust has been called society’s “lubricant”. If actors know they can trust each other, social dilemmas can be overcome more effectively because fewer transaction costs accrue from the regulation of actors’ behavior. Under what conditions, however, can and do actors trust each other? Taking a psychological perspective, Oswald and Ulshöfer (Chapter 17) distinguish between trust and distrust as two states of mind that are activated depending on the risks involved in a social dilemma. In situations with small stakes, trust is the default state of mind, whereas distrust is the default in situations with larger stakes. The authors argue that actors are more skeptical about their interaction partners in the state of distrust. This in turn can have a positive effect, as actors are less gullible and more alert to signs and signals of trustworthiness in their interaction partners.

Conversely, consistent with a rational choice perspective, Przepiorka and Berger (Chapter 18) start with the assumption that actors are in a state of mind in which they process information about their interaction partners’ trustworthiness in an accurate way, irrespective of whether stakes are low or high. After giving a precise definition of the trust dilemma, Przepiorka and Berger outline the potential of signaling theory to explain trust and trustworthiness in social exchange. They argue that the conceptual distinction between signals and signs, and between the production and display of signals and signs, can make signaling theory more broadly applicable in the social sciences in general and in sociological scholarship in particular.

Vieth and Weesie (Chapter 19) make the point more explicit that not only stakes determine whether someone is trustworthy or trustful, but also one’s interaction partners’ discernable intentions. That is, the mere act of trusting someone and promises of trustworthiness can induce actors to be respectively more trustworthy or more trustful. The authors conduct a laboratory experiment to test these conjectures, employing a nested game design, which allows disentangling subjects’ motives and intentions in the trust dilemma.

In the last chapter of Part V, Snijders, Bober, and Matzat (Chapter 20) highlight once more why it is so important to study the causes of trust and trustworthiness. More and more social interactions are taking place online. In peer-to-peer online markets such as eBay, anonymous traders exchange goods and services across large geographic distances. The functioning of these online market platforms crucially depends on electronic reputation systems, which allow traders to rate each other after finished transactions and, in this way, to produce valuable information about the trustworthiness of potential exchange partners. The authors conduct an online choice experiment in which they present subjects with eBay-like offers of digital cameras to address two important but understudied questions: how much do positive and negative text messages, as compared to positive and negative star-ratings, affect potential buyers’ trust in online sellers, and how can sellers rebuild their trustworthiness in their responses to negative text messages?

Part VI: Game Theory

The three chapters in this part, although partly inspired by empirical evidence that conflicts with theoretical predictions, are purely game theoretical. Nax, Murphy, and Helbing (Chapter 21) make a case for the reconsideration of simple learning models to explain actors’ behavior in social dilemmas. In their analysis, they focus on a set of public goods dilemmas and the meritocratic matching mechanism. Meritocratic matching is a mechanism by which cooperators tend to be grouped with cooperators and defectors tend to be grouped with defectors, thereby making cooperation among self-regarding actors possible under certain conditions. The authors argue that simple learning models explain actors’ behavior in these types of games better than preference-based models.

In a step-level public goods dilemma, the voluntary contribution of m actors is required to produce the public good for the entire group of n ≥ m actors. It is usually assumed that m is common knowledge. In his analysis, Bolle (Chapter 22) assumes that m is only known to a requestor (e.g., editors asking authors for their contributions to a collective volume) who is not part of the group, can communicate this information to the other actors, and benefits if the public good is produced. Bolle outlines the conditions under which the requestor has an incentive strategically to misrepresent the required number of volunteers.

In the last chapter of Part VI, Gautschi (Chapter 23) analyses the trust dilemma with hostage posting. “Hostages” are costly commitment devices that can be used by actors to overcome the trust dilemma. The author observes that in experiments with hostage trust games, not posting a hostage results in less cooperation than in a trust dilemma in which hostage posting is not an option. Given the equivalence of the trust dilemma without hostage posting, and the trust dilemma with hostage posting in which no hostage is placed, this empirical finding needs an explanation.

Part VII: Experimental Methods

Framing-effects are empirical effects of the “name of the game”. For example, actors behave differently in an otherwise identical social dilemma, which in one case is called the community game and in another is called the Wall Street game. Esser (Chapter 24) reviews the different ways in which experimental social scientists with a proclivity for rational choice theory have reacted to framing-effects. He thereby introduces the so-called model of frame selection, which offers a way to integrate framing-effects into a broader notion of rational choice theory.

Actors with social preferences are more inclined to cooperate in social dilemmas, but not all actors have social preferences and actors’ preferences are not directly observable. However, institutions that take actors’ preferences into account can be more effective in providing solutions to social dilemmas. The design of such institutions thus requires that actors’ preferences can be measured. In Chapter 25, Höglinger and Wehrli evaluate the reliability and validity of a recently devised measure of social preferences, the SVO slider measure, by means of an online experiment.

Laboratory experiments with university students as participants have been the main approach to empirical research on social dilemmas. Berger and Baumeister (Chapter 26) address the question of how far our knowledge about behavior in social dilemmas could be influenced, and even biased, by the way laboratory experiments are conducted. In particular, the authors look at the effect of subjects’ repeated participation in similar laboratory experiments on these subjects’ behavior in social dilemmas.

Bibliography

[1] Axelrod, Robert. 1984. The Evolution of Cooperation. New York: Basic Books.
[2] Bicchieri, Christina. 2006. The Grammar of Society: The Nature and Dynamics of Social Norms. Cambridge: Cambridge University Press.
[3] Bowles, Samuel, and Herbert Gintis. 2011. A Cooperative Species: Human Reciprocity and its Evolution. Princeton: Princeton University Press.
[4] Coleman, James S. 1990. Foundations of Social Theory. Cambridge, MA: The Belknap Press of Harvard University Press.
[5] Diekmann, Andreas, and Wojtek Przepiorka. 2016. “‘Take One for the Team!’ Individual Heterogeneity and the Emergence of Latent Norms in a Volunteer’s Dilemma.” Social Forces 93(3):1309–1333.
[6] Durkheim, Emile. [1893] 1997. The Division of Labor in Society. New York: Free Press.
[7] Ellickson, Robert C. 1991. Order without Law: How Neighbors Settle Disputes. Cambridge: Harvard University Press.
[8] Hechter, Michael, and Karl-Dieter Opp, eds. 2001. Social Norms. New York: Russell Sage Foundation.
[9] Hobbes, Thomas. [1668] 1982. Leviathan. London: Penguin Classics.
[10] Kollock, Peter. 1998. “Social Dilemmas: The Anatomy of Cooperation.” Annual Review of Sociology 24:183–214.
[11] McAdams, Richard H. 2009. “Beyond the Prisoners’ Dilemma: Coordination, Game Theory, and Law.” Southern California Law Review 82(2):209–258.
[12] North, Douglass C. 1990. Institutions, Institutional Change and Economic Performance. Cambridge: Cambridge University Press.
[13] Ostrom, Elinor. 1990. Governing the Commons: The Evolution of Institutions for Collective Action. Cambridge: Cambridge University Press.
[14] Rapoport, Anatol. 1974. “Prisoner’s Dilemma – Recollections and Observations.” Pp. 17–34 in Game Theory as a Theory of Conflict Resolution, edited by A. Rapoport. Dordrecht: Reidel.
[15] Raub, Werner, Vincent Buskens, and Rense Corten. 2015. “Social Dilemmas and Cooperation.” Pp. 597–626 in Handbuch Modellbildung und Simulation in den Sozialwissenschaften, edited by N. Braun and N. J. Saam. Wiesbaden: Springer VS.
[16] Sennett, Richard. 2012. Together: The Rituals, Pleasures and Politics of Cooperation. New Haven: Yale University Press.
[17] Voss, Thomas. 2001. “Game-Theoretical Perspectives on the Emergence of Social Norms.” Pp. 105–136 in Social Norms, edited by M. Hechter and K.-D. Opp. New York: Russell Sage Foundation.
[18] Weber, Max. [1922] 2013. Economy and Society. Berkeley: University of California Press.

Werner Raub and Thomas Voss

Micro-Macro Models in Sociology: Antecedents of Coleman’s Diagram

Abstract: In sociology, Coleman’s diagram (the “Coleman-boat” or “Coleman’s bathtub”) has become the standard way of representing micro-macro links. Coleman’s work on micro-macro links and his diagram have several “predecessors.” Many of these were developed by European sociologists and appeared (roughly) between 1970 and 1980, quite some years ahead of Coleman’s contributions. Much of this work, while often pioneering, has hardly been noticed outside Europe and is largely forgotten. Moreover, it has typically been published in languages like German, Dutch, and French that are not easily accessible to sociologists who tend to focus on scholarly literature in English, if only because of their lack of proficiency in other languages. We present a brief overview of some of the relevant work.

1 Introduction

Coleman’s diagram for depicting micro-macro models (Figure 1) is arguably among his best known contributions to sociology and certainly to social theory. It seems to have first appeared in a hard-to-find journal (Coleman 1984) and, a bit later, in some of his better-known programmatic publications in the second half of the 1980s (Coleman 1986a:347; Coleman 1986b:1322; Coleman 1987a: passim, with 1987a a variant of his 1984 paper).¹ The diagram gained prominence in Chapter 1 of his magnum opus Foundations of Social Theory (Coleman 1990). The underlying theoretical approach took shape in two earlier strands of his work: his exchange model and theory of collective decisions (see 1964a for his early work in this field, 1973, and 1990 for comprehensive treatments), and, as we shall see, in his contributions to mathematical sociology (in particular Coleman 1964b). The diagram reflects the “logic” of “purposive action explanations” (Coleman 1986b; Coleman 1990: Chapter 1) of social phenomena, including purposive action explanations that use rigorous rationality assumptions. Such explanations are employed, for example, in Coleman’s formal theory of social exchange and collective decisions, in much of economics and in the “economic approach to human behavior” (e.g., Becker 1976). Work on social dilemmas, institutions, and cooperation often exemplifies the same spirit, including work on such topics based on game-theoretic tools. More generally, the diagram represents approaches to the explanation of social phenomena that focus on micro-macro links, irrespective of whether or not purposive action assumptions are used. These are approaches related to “methodological individualism” (Coleman 1986b; Coleman 1990: Chapter 1; also see Udehn 2001 for an overview). More recent approaches trying to explain aggregate-level regularities, such as behavioral and experimental game theory (Camerer 2003), “analytical sociology” (Hedström 2005), agent-based computational modeling (e.g., Macy and Flache 2009), and “sociology as a population science” (Goldthorpe 2016), likewise refer to Coleman’s diagram or employ roughly the same logic of explanation. The diagram is meanwhile influential also in other social sciences, such as demography (see Billari 2015 for a perspective similar to Goldthorpe 2016).

¹ The bibliography of Coleman’s works in Clark (1996), albeit somewhat incomplete, is useful for a search. We screened the bibliography, checked various “suspect publications,” and Coleman (1984) seems to be the earliest “hit.”

Note: Support for Raub was provided by NWO (PIONIER-program “The Management of Matches”; grants S 96–168 and PGS 50–370). Raub acknowledges the hospitality of Nuffield College, University of Oxford.

While Coleman’s diagram has become the more or less standard exposition of micro-macro models, it has various “predecessors,” mostly developed by European sociologists, that appeared quite some time earlier (five to fifteen years or more), roughly in the 1970s and the early 1980s. Unfortunately, if only from the perspective of getting priority and originality issues right, these predecessors are by now largely forgotten. From the perspective of the history and sociology of science, this is not surprising. It is an illustration of a regularity known as Stigler’s law of eponymy: “No scientific discovery is named after its original discoverer” (Stigler 1999:277; see also Merton 1973 who, also according to Stigler himself, may claim priority with respect to Stigler’s law). In our case, this may be due to the success of Coleman’s diagram, deriving from its simplicity and intuitive appeal. It may likewise result from the fact that Coleman, while aware of at least some of these predecessors, did not refer to them. In addition, the original literature is often not in English, with few (if any) English translations.

Our contribution aims primarily, though not exclusively, at “history of ideas.” We provide an overview of some predecessors, highlighting their relationship with Coleman’s diagram. We embed this in brief sketches of some intellectual background for these predecessors and some comments on ramifications for formal model building and empirical research in sociology. With good reason, the approach to sociological and social science theory that is the focus of this contribution is quite opposed to conceiving of social theory as a kind of history of ideas; in other words, it is opposed to social theory as a series of chapters detailing the ideas of “great sociologists and their lesser contemporaries.” What is more, and again understandably, engaging in social theory in the sense of developing testable explanations of social phenomena is typically preferred to spending much time and effort on the history of ideas (see Merton’s 1957:4 distinction between “systematics” and the “history of sociological theory,” and his preference for focusing on the former). Still, every now and then it seems useful to straighten out the development of ideas, certainly so when it comes to pioneering ideas that have been overlooked or forgotten. We do not aim at contributing to recently revived discussions on methodological individualism and related approaches which by and large refer to ontological ideas inspired by work in the philosophy of mind, nor do we wish to focus on concepts of causality implied by Coleman’s diagram (e.g., List and Spiekermann 2013; Little 2012; see Voss 2017 for a discussion of some of these issues).

We start with an overview of Coleman’s diagram and its main features, including an example that highlights the logic of explanations in line with the diagram. We then turn to micro-macro models before Coleman. Subsequently, we briefly discuss how micro-macro models are related to formal model building and offer some comments on theoretical models and empirical research. Concluding remarks follow.

2 Coleman’s micro-macro model

In Coleman’s diagram,² “macro” refers, in Coleman’s terminology, to social systems such as a family, a city, a business firm, a school, or a society (Coleman 1986a:346), whereas “micro” refers to individuals.³ The macro-level thus refers to collective phenomena that are described by concepts referring to properties of social systems, such as the size of a group. In terms of the number of actors involved, “macro” may refer not only to large but also to small social systems such as a dyad, a triad, or a small group. The micro-level refers to properties of individuals, such as their preferences, their information, or behavior. Hence, the distinction “micro” versus “macro” corresponds to the distinction “individual” versus “collective.”

A: Macro-conditions ----(4)----> D: Macro-outcomes
        |                               ^
       (1)                             (3)
        v                               |
B: Micro-conditions ----(2)----> C: Micro-outcomes

Fig. 1: Coleman’s diagram.

² For our sketch, see Raub, Buskens and Van Assen (2011).
³ There are also well-known examples of micro-macro models with corporate actors (Coleman 1990: Part III and IV) at the micro-level.

Nodes A and D represent propositions describing macro-conditions and, respectively, macro-outcomes. Arrow 4 represents propositions about an empirical regularity at the macro-level, for example an association between macro-conditions and macro-outcomes. Macro-outcomes denoted by Node D as well as empirical regularities denoted by Arrow 4 represent explananda at the macro-level. Node B represents propositions describing micro-conditions. These propositions thus refer to “independent variables” in assumptions about regularities of individual behavior or, more ambitiously, in a theory of individual behavior. Arrow 1 represents assumptions on how social conditions affect these variables. For example, social conditions such as networks or institutions but also prices can be conceived as opportunities or, conversely, constraints that affect the feasible alternatives between which actors can choose. Social conditions likewise shape the incentives associated with various feasible alternatives and shape actors’ information. Various labels have been suggested for such assumptions on macro-to-micro relations. We follow Lindenberg (1981; Wippler and Lindenberg 1987) and label them “bridge assumptions.” Node C represents micro-outcomes and the explanandum at the micro-level, namely, descriptions of individual behavior. Assumptions about regularities of individual behavior or a theory of individual behavior are represented by Arrow 2. Thus, Arrow 2 represents a micro-theory. Finally, Arrow 3 represents assumptions on how actors’ behavior generates macro-outcomes. Again following Lindenberg (1977; Wippler and Lindenberg 1987), we use “transformation rules” as a label for such assumptions on micro-to-macro relations.

It is evident from the diagram that the explanandum at the micro-level (descriptions of individual behavior) follows from an explanans comprising assumptions on individual behavior (Node B, Arrow 2), macro-conditions (Node A), and bridge assumptions (Arrow 1). The explananda at the macro-level, that is, descriptions of macro-outcomes (Node D) and macro-regularities (Arrow 4), follow from an explanans comprising assumptions on individual behavior (Node B, Arrow 2), macro-conditions (Node A), as well as bridge assumptions (Arrow 1) and transformation rules (Arrow 3). The diagram clearly indicates that sociological explanations focus on macro-phenomena as explananda and that such explanations try to highlight macro-conditions rather than exclusively micro-conditions as part of the explanans. Thus, such explanations follow the “minimal program of sociology” (Lindenberg 1977) that has been set forth already by Durkheim in his Rules of Sociological Method: social facts should be causally explained by other social facts.

Note that “micro-macro” is ambiguous from the perspective of Coleman’s diagram. In a narrow sense, “micro-macro” can refer exclusively to Arrow 3. In a broader sense, “micro-macro” can refer to explaining macro-outcomes (Node D) and macro-regularities (Arrow 4) using assumptions on individual behavior (Node B, Arrow 2), macro-conditions (Node A), as well as bridge assumptions (Arrow 1) and transformation rules (Arrow 3). We use “micro-macro” in this broader sense. Hence, we avoid cumbersome terminology like “macro-micro-macro” and systematically refer to assumptions represented by Arrow 3 as “transformation rules.”

Consider a paradigmatic example of a micro-macro problem, namely the production of collective goods and the empirical regularity at the macro-level that group size is often negatively related to the production of collective goods (Olson 1971). The core feature of a collective good is that, once available, actors who did not contribute to its production cannot be excluded from its consumption. This induces the free rider problem: when the costs of an individual contribution are high compared to the marginal effects of such a contribution on individual benefits from the good, actors face incentives not to contribute. Assume now that there are no “selective incentives” such that additional individual benefits do depend on individual contributions to the production of the collective good. Then, Olson argued, collective good production will depend on group size in the sense that large groups with a common interest of group members concerning the production of the collective good will typically suffer from a less than optimal production of the good. The relation between group size and collective good production at the macro-level should not, however, be considered as a simple macro-law. Rather, this relationship depends on a number of specific conditions such as the absence of selective incentives, the production function for the collective good, and others (see, e.g., Sandler 1992).

Diekmann’s (1985) Volunteer’s Dilemma (VOD) is a formal model of a set of conditions that imply the group size effect. The bystander intervention and diffusion of responsibility problem (Darley and Latané 1968) is Diekmann’s (1985) example of a social situation for which VOD is a reasonable model. In this situation, actors witness an accident or a crime. Everybody would feel relieved if at least one actor helped the victim by, for example, calling the police. However, providing help is costly and each actor might be inclined to abstain from helping, hoping that someone else will do so instead. VOD captures these features in a non-cooperative game with N actors.⁴ In a non-cooperative game, intuitively speaking, actors are unable to incur binding and enforceable agreements or unilateral commitments with respect to certain behaviors. More specifically, in the VOD, actors are unable to incur binding and enforceable agreements or commitments to contribute to the production of the collective good (such as calling the police). Actors have binary choices. They decide simultaneously and independently whether or not to contribute to the collective good: each actor, when choosing, is not informed about the choices of the other actors. The good is costly and will be provided if at least one actor – a “volunteer” – decides to contribute. Contributions by more than one actor are feasible and then each actor pays the full costs of providing the good but contributions of more than one actor do not further improve the utility level of any actor. A core feature of VOD is that the costs (K) of contributing to the collective good are smaller than the gains (U) from the good. The matrix in Figure 2 summarizes the normal form of the game. In this figure, the rows represent an actor’s strategies, namely, to contribute (CONTR) or not to contribute (DON’T); the columns indicate the number of other actors who contribute; and the cells represent an actor’s payoff as a function of his⁵ own strategy and the number of other actors who contribute.

⁴ See, for example, Heifetz (2012) for details on game-theoretic concepts and assumptions and a discussion of Diekmann’s VOD (Heifetz 2012:211–214).
⁵ Throughout, we use “he” and “his” to facilitate readability without intending any gender-bias.


             Number of other actors choosing CONTR
             0        1        2        …        N–1
CONTR        U–K      U–K      U–K      …        U–K
DON’T        0        U        U        …        U

Fig. 2: Diekmann’s (1985) Volunteer’s Dilemma (U > K > 0; N ≥ 2).
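The payoff structure in Fig. 2 is simple enough to be written down as a small function. The following sketch is only an illustration of the matrix above; the function name and the example values U = 2 and K = 1 are ours, not the chapter’s.

def vod_payoff(contribute, n_other_contributors, U=2.0, K=1.0):
    # Payoff of one actor in the Volunteer's Dilemma of Fig. 2 (requires U > K > 0).
    # A contributor always pays the full cost K and enjoys the good: U - K.
    # A non-contributor free-rides (payoff U) if at least one other actor
    # contributes, and gets 0 otherwise.
    assert U > K > 0
    if contribute:
        return U - K
    return U if n_other_contributors >= 1 else 0.0

# Reproduce the rows of Fig. 2 for 0, 1, 2, 3 other contributors (with U = 2, K = 1):
for contribute, label in [(True, "CONTR"), (False, "DON'T")]:
    print(label, [vod_payoff(contribute, k) for k in range(4)])
# CONTR [1.0, 1.0, 1.0, 1.0]
# DON'T [0.0, 2.0, 2.0, 2.0]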

In terms of Coleman’s diagram (see Figure 3 for an illustration), both being a non-cooperative game and group size are macro-conditions represented by Node A in the diagram. The macro-outcome of interest, represented by Node D, is the probability (P) that the collective good will be provided. Arrow 4 now represents the relation between group size and the probability that the collective good will be provided. Node B represents the micro-conditions (a) that each actor can choose between CONTR and DON’T, (b) actors’ information, namely, that actors, when choosing, are not aware of the other actors’ choices,⁶ and (c) actors’ preferences as represented by their payoffs. Note that the normal form of the game includes bridge assumptions (Arrow 1) on macro-micro transitions. Namely, the normal form includes a specification of how an actor’s payoff depends on own choices as well as those of all other actors: that is, the normal form specifies the structure of actors’ interdependence.

VOD with group size N ----(4)----> Probability P* of collective good production
          |                                        ^
         (1)                                      (3)
          v                                        |
Individual incentive to CONTR ----(2)----> Individual probability p* of CONTR

Fig. 3: Micro-macro diagram for Diekmann’s Volunteer’s Dilemma.

⁶ Strictly speaking, we would have to specify the extensive form, including the game tree, rather than only the normal form of VOD, to make its information structure explicit.

Game-theoretic rationality assumptions, such as the assumption of equilibrium behavior, are micro-level assumptions represented by Arrow 2 in Coleman’s diagram. In an equilibrium, each actor’s strategy maximizes own payoffs, given the strategies of the other actors. VOD has N equilibria in pure strategies. These are the strategy combinations with exactly one volunteer choosing CONTR with probability 1, while all other actors choose DON’T with probability 1. In each of these equilibria, the collective good is provided with certainty. However, the equilibria involve a bargaining problem, since each actor prefers the equilibria with another actor as the volunteer over the equilibrium where he himself is the volunteer. Moreover, while the game is symmetric, the N equilibria in pure strategies require that actors do not choose the same strategies. It is a natural assumption that rational actors play a symmetric equilibrium in the sense of choosing the same strategies in a symmetric game. It can be shown that VOD has a unique symmetric equilibrium in mixed strategies such that each actor chooses CONTR with probability

p* = 1 − (K/U)^(1/(N−1))
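The equilibrium probability can be recovered from the usual indifference argument, using only the quantities defined above (U, K, N). The following derivation is a sketch of that standard argument, not a quotation from the chapter: in a symmetric mixed equilibrium with individual contribution probability p, an actor must be indifferent between CONTR (which yields U − K for sure) and DON’T (which yields U only if at least one of the other N − 1 actors contributes).

\[
U - K \;=\; U\,\bigl(1-(1-p)^{N-1}\bigr)
\quad\Longleftrightarrow\quad
(1-p)^{N-1} \;=\; \frac{K}{U}
\quad\Longleftrightarrow\quad
p^{*} \;=\; 1-\Bigl(\frac{K}{U}\Bigr)^{\frac{1}{N-1}} .
\]

The macro-outcome in Figure 3 then follows directly: the probability that at least one of the N actors volunteers is

\[
P^{*} \;=\; 1-(1-p^{*})^{N} \;=\; 1-\Bigl(\frac{K}{U}\Bigr)^{\frac{N}{N-1}} ,
\]

which decreases in N and approaches 1 − K/U as N grows (for example, with the illustrative values U = 2 and K = 1: P* = 0.75 for N = 2 and P* ≈ 0.58 for N = 5). This is the group-size effect represented by Arrow 4.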

Tab. 4: Results of caliper tests for the three groups of articles.

                                    Authors w/o retr.    Authors w/ retr.    Retractions
                                    (N = 30 articles)    (N = 30 articles)   (N = 15 articles)
5 % Caliper
  N articles^a                      24                   24                  12
  Percentage with OC > 50 %^b       58.3 %               87.5 %              91.7 %
  N with sign. caliper test
    5 % significance level          2                    0                   0
    10 % significance level         2                    2                   1
10 % Caliper
  N articles^a                      25                   28                  13
  Percentage with OC > 50 %^b       60.0 %               92.9 %              92.3 %
  N with sign. caliper test
    5 % significance level          1                    3                   2
    10 % significance level         2                    7                   5
15 % Caliper
  N articles^a                      25                   28                  13
  Percentage with OC > 50 %^b       72.0 %               96.4 %              92.3 %
  N with sign. caliper tests
    5 % significance level          2                    7                   4
    10 % significance level         5                    9                   6

^a Some articles do not have any observations falling in the respective caliper. These articles are omitted.
^b Percentage of articles which have more than 50 % of observations in the over-caliper (OC).

(again see Table 4). For the articles in the control group of authors without retractions, fewer articles show significant caliper tests, which is probably mainly caused by these articles not showing such a clear over-representation of z-values in the over-caliper.

7 Summary and conclusions Throughout history, science – a systematic mode of scrutinizing the state of the world conducted in certain institutional settings, such as universities, academies, and research institutes – has repeatedly been affected by major and minor scandals relating to fraud. We have argued that the incentive structure of the scientific world triggers individual misconduct of all kinds (such as fabricating data, manipulating data, selective reporting of results, etc.). Two social dilemmas were identified: (1) a collective good problem, from the perspective of scientists, who can either cooperate (‘be honest’), with the risk that other scientists do not (‘be sloppy or even manipulate’), and gain advantage when it comes to rewards; and (2) a volunteer’s dilemma, from the perspective of the scientific community, regarding taking on the effort of inspecting

210 | Katrin Auspurg and Thomas Hinz

suspicious data, carrying the risks of whistle-blowing, etc. Solutions for both dilemmas require institutional measures that change the payoff structure by minimizing the possible gain from misconduct and creating positive incentives to engage in more rigorous surveillance. As we have argued, while sanctions imposed for misconduct are enforceable, the decisive factor is a higher probability that misconduct can be detected. In the natural sciences, repeated trials of replications often work as an effective detection mechanism for misconduct. In the social sciences, replications are also a promising method of detecting misconduct. However, replication studies are still rare, for reasons that will be discussed below. In this article, we tested whether simple statistical methods can serve as tools to check for manipulations in the social sciences, as long as researchers use numbers to represent their results and run statistical analyses (including significance tests) to prove their hypotheses. Our contribution to the existing literature lies in contrasting research results which are known to be fabricated (from articles that have been retracted) with research results from an unsuspicious control group. The Benford digit test has often been applied to detect fabricated results in regression analyses, but it failed in our analyses of articles in the world of social psychologists for two reasons. First, in most cases the smallest unit of analysis (single articles) contained too few numbers to have sufficient statistical power. Second, and more importantly, the digits of test statistics under focus (mainly F- and t-tests with restricted numbers of degrees of freedom) seemed not to be distributed according to Benford’s law at all. Our result is in line with other scholars who have been very skeptical about the Benford test’s potential to detect fraud in science and other fields (for example manipulating electoral votes; see for instance Deckert, Myagkov, and Ordeshook 2011). The caliper test, however, seems to be a promising candidate, not only in investigating publication bias but also in showing sensitivity in relation to manipulated data or results. Particularly for the retracted articles, the caliper test clearly rejected the null hypothesis of an even distribution of test scores around the critical threshold for statistical significance. The test logic is, of course, based on the assumption that manipulation and fraud in the social sciences occurs mainly in the form of data trimming (trimming results to statistical significance). However, as is argued in this chapter, this assumption seems very plausible, because falsifiers also have to show plausible values for all variables used in their analyses (test as well as control variables), so that non-significant but plausible data are probably used as a starting point and are then trimmed using small manipulations to show outstanding and significant results. Although the low number of cases hampers clear diagnosis regarding single articles, the overall pattern nevertheless suggests that authors in our control group show a lower (but still marked) tendency to manipulate data with the objective of presenting statistically significant results. When applying the caliper test at the level of single articles, one has to be aware of the fact that one single conspicuous case is not sufficient to prove manipulation. But as our analyses showed, many of the articles by all three authors known to have

Social Dilemmas in Science: Detecting Misconduct and Finding Institutional Solutions

|

211

fabricated or manipulated data were affected. Under the assumption that test scores are continuously distributed (no manipulations), our results are extremely unlikely. Nevertheless, future research should address the statistical characteristics of the (repeated) caliper test more precisely. In addition, one important prerequisite for conducting the caliper test is the stringent reporting of the number of cases, coefficients, standard errors and test statistics, without rounding coefficients and standard errors (Freese 2014). Retracted and non-retracted articles written by authors known to have engaged in falsification differed regarding the number of test statistics reported in the articles (with more tests reported in the retracted articles), but showed only marginal differences regarding the indication of data manipulations measured by the caliper test. Probably only the lower number of test statistics hindered the commission’s ability to detect manipulations in the as yet unretracted articles as well. Full transparency regarding data and test statistics not only makes data manipulation more difficult (and costly), but also increases the chance that the scientific community will be able to detect abnormalities (for instance, the test power of the caliper test is increased).

The irregularities found by the caliper test might to some extent also have been caused by the selection process of journals, with reviewers and editors preferring significant over non-significant results for publication. However, with a general publication bias in the form of a rejection of insignificant results, it would be unlikely that this form of censoring would particularly affect the small calipers around the statistical threshold for significance. It is more likely that several test values that fail to reach statistical significance do not get published, regardless of whether they fall far from statistical significance or just below this threshold. For these reasons, the caliper test is probably a tool that can be used to detect small data manipulations made by authors rather than publication bias triggered by reviewers or editors.

Finally, some conclusions on the institutional organization of the scientific domain should be discussed. It is obvious that all kinds of measures which increase the actual and subjectively expected probability of manipulations being detected should be supported. A core element is replication, or at least the chance of studies being replicated. Journals and funding agencies have the power to create transparent data access for published articles. This includes transparency regarding syntax files and case selection. In general, replications deserve a higher level of acceptance within the scientific community (i.e., by providing journal space in high-impact journals for replication studies). Given an effective and severe sanction mechanism, this would directly reduce the incentives for engaging in manipulation. But what is the solution for the volunteer’s dilemma? Research institutions, such as the German Research Foundation, or journals could implement one measure that has already been suggested by Andreas Diekmann (2011): research assistants in these institutions could randomly select some articles for replication. The risk of being selected for these controls should effectively reduce incentives to engage in scientific misconduct. Some journals have already adopted such policies (for instance, all authors submitting to the American Journal of Political Science have to upload all data and analysis files, and final acceptance is contingent upon successful replication of the results; for more details see https://ajps.org/2015/03/26/the-ajps-replication-policy-innovations-and-revisions/). Such random checks, which are common in many other fields (such as security screening at airports), evidently need resources to be implemented. To overcome the volunteer’s dilemma, peer reviewing in journals would also benefit from explicitly rewarding thorough and in-depth reviews. Again, resources are necessary to create incentives for a higher quality review system. For the sake of good research practice in general, and not only in the social sciences, funding agencies should consider devoting a certain proportion of their budget to the support of quality assurance within science. This would most certainly be a better investment of money than some forms of project funding.
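To make the logic of the caliper test discussed above concrete, the following minimal sketch (in Python; not taken from the original study) counts reported test statistics that fall just above and just below the conventional critical value of 1.96 and applies a binomial test. The caliper width of 0.10 and the example z-values are purely illustrative assumptions.

from scipy.stats import binomtest

def caliper_test(z_values, critical=1.96, caliper=0.10):
    # Count test statistics just over and just under the significance threshold.
    # Absent manipulation (and threshold-specific censoring), both narrow bands
    # should be filled roughly equally often, so "over" follows a binomial with p = 0.5.
    over = sum(critical <= abs(z) < critical + caliper for z in z_values)
    under = sum(critical - caliper <= abs(z) < critical for z in z_values)
    test = binomtest(over, n=over + under, p=0.5, alternative="greater")
    return over, under, test.pvalue

# Hypothetical z-statistics recomputed from reported coefficients and standard errors:
z_reported = [2.01, 1.97, 2.03, 1.89, 2.45, 1.99, 0.75, 2.02, 3.10]
print(caliper_test(z_reported))

An unusually large surplus of values just above the threshold is then taken as an indication of irregularities, which is one reason why complete reporting of coefficients and standard errors (rather than rounded values) matters for the power of the test.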

Bibliography

[1] Auspurg, Katrin, and Thomas Hinz. 2011. “What Fuels Publication Bias? Theoretical and Empirical Analyses of Risk Factors Using the Caliper Test.” Jahrbücher für Nationalökonomie und Statistik 231(5/6):636–660.
[2] Auspurg, Katrin, Thomas Hinz, and Andreas Schneck. 2014. “Ausmaß und Risikofaktoren des Publication Bias in der deutschen Soziologie.” Kölner Zeitschrift für Soziologie und Sozialpsychologie 66(4):549–573.
[3] Bauer, Johannes, and Jochen Gross. 2011. “Difficulties Detecting Fraud? The Use of Benford’s Law on Regression Tables.” Jahrbücher für Nationalökonomie und Statistik 231(5/6):733–748.
[4] Becker, Gary S. 1968. “Crime and Punishment: An Economic Approach.” Journal of Political Economy 76(2):169–217.
[5] Bekkers, René. 2012. “Risk Factors for Fraud and Academic Misconduct in the Social Sciences.” Working Paper. Center for Philanthropic Studies. Amsterdam: University of Amsterdam.
[6] Benford, Frank. 1938. “The Law of Anomalous Numbers.” Proceedings of the American Philosophical Society 78(4):551–572.
[7] Berning, Carl C., and Bernd Weiß. 2016. “Publication Bias in the German Social Sciences: An Application of the Caliper Test to Three Top-Tier German Social Science Journals.” Quality & Quantity: International Journal of Methodology 50(2):901–917.
[8] Dawes, Robyn M. 1980. “Social Dilemmas.” Annual Review of Psychology 31(1):169–193.
[9] Dawes, Robyn M., and David M. Messick. 2000. “Social Dilemmas.” International Journal of Psychology 35(2):111–116.
[10] Deckert, Joseph, Mikhail Myagkov, and Peter C. Ordeshook. 2011. “Benford’s Law and the Detection of Election Fraud.” Political Analysis 19(3):245–268.
[11] Diamond, Arthur M. 1996. “The Economics of Science.” Knowledge and Policy 9(2/3):6–49.
[12] Diekmann, Andreas. 1985. “Volunteer’s Dilemma.” Journal of Conflict Resolution 29(4):605–610.
[13] Diekmann, Andreas. 2005. “Betrug und Täuschung in der Wissenschaft. Datenfälschung, Diagnoseverfahren, Konsequenzen.” Schweizerische Zeitschrift für Soziologie 31(1):7–30.
[14] Diekmann, Andreas. 2007. “Not the First Digit! Using Benford’s Law to Detect Fraudulent Scientific Data.” Journal of Applied Statistics 34(3):321–329.


[15] Diekmann, Andreas. 2011. “Are Most Published Research Findings False?” Jahrbücher für Nationalökonomie und Statistik 231(5/6):628–635. [16] Diekmann, Andreas, and Ben Jann. 2010. “Benford’s Law and Fraud Detection. Facts and Legends.” German Economic Review 11(3):397–401. [17] Diekmann, Andreas, and Peter Preisendörfer. 2003. “Green and Greenback: The Behavioral Effects of Environmental Attitudes in Low-Cost and High-Cost Situations.” Rationality and Society 15(4):441–472. [18] Durtschi, Cindy, William Hillison, and Carl Pacini. 2004. “The Effective Use of Benford’s Law to Assist in Detecting Fraud in Accounting Data.” Journal of Forensic Accounting 5(1):17–34. [19] Engel, Christoph. 2015. “Scientific Disintegrity as a Public Bad.” Perspectives on Psychological Science 10(3):361–379. [20] Feigenbaum, Susan, and David M. Levy. 1993. “The Market for (Ir)Reproducible Econometrics.” Accountability in Research 3(1):25–43. [21] Feigenbaum, Susan, and David M. Levy. 1996. “The Technological Obsolescence of Scientific Fraud.” Rationality and Society 8(3):261–276. [22] Fisher, Ronald A. 1926. “The Arrangement of Field Experiments.” Journal of the Ministry of Agriculture of Great Britain 33:503–513. [23] Freese, Jeremy. 2014. “Defending the Decimals: Why Foolishly False Precision Might Strengthen Social Science.” Sociological Science 1:532–541. [24] Gerber, Alan S., and Neil Malhotra. 2008a. “Publication Bias in Empirical Sociological Research: Do Arbitrary Significance Levels Distort Published Results?” Sociological Methods & Research 37(1):3–30. [25] Gerber, Alan S., and Neil Malhotra. 2008b. “Do Statistical Reporting Standards Affect What Is Published? Publication Bias in Two Leading Political Science Journals.” Quarterly Journal of Political Science 3(3):313–326. [26] Günnel, Stefan, and Karl-Heinz Tödter. 2009. “Does Benford’s Law Hold in Economic Research and Forecasting?” Empirica 36(3):273–292. [27] Hein, Jürgen, Rico Zobrist, Christoph Konrad, and Guido Schuepfer. 2012. “Scientific Fraud in 20 Falsified Anesthesia Papers. Detection Using Financial Auditing Methods.” Der Anaesthesist 61(6):543–549. [28] Hill, Theodore P. 1995. “A Statistical Derivation of the Significant-Digit Law.” Statistical Science 10(4):354–363. [29] Hill, Theodore P. 1998. “The First Digit Phenomenon.” American Scientist 86(4):358–363. [30] Kerr, Norbert L. 1998. “HARKing: Hypothesizing after the Results are Known.” Personality and Social Psychology Review 2(3):196–217. [31] Kollock, Peter. 1998. “Social Dilemmas: The Anatomy of Cooperation.” Annual Review of Sociology 24(1):183–214. [32] Lacetera, Nicola, and Lorenzo Zirulia. 2011. “The Economics of Scientific Misconduct.” The Journal of Law, Economics, & Organization 27(3):568–603. [33] Merton, Robert K. 1957. “Priorities in Scientific Discovery: A Chapter in the Sociology of Science.” American Sociological Review 22(6):635–659. [34] Merton, Robert K. 1961. “Singletons and Multiples in Scientific Discovery: A Chapter in the Sociology of Science.” Proceedings of the American Philosophical Society 105(5):470–486. [35] Merton, Robert K. 1973. “The Normative Structure of Science.” Pp. 267–278 in The Sociology of Science: Theoretical and Empirical Investigations, edited by R. K. Merton. Chicago: University of Chicago Press. [36] Nigrini, Mark J. 1996. “A Taxpayer Compliance Application of Benford’s Law.” The Journal of the American Taxation Association 18(1):72–91. [37] Rapoport, Anatol. 1998. Decision Theory and Decision Behaviour. London: Palgrave MacMillan.


[38] Reich, Eugenie Samuel. 2009. Plastic Fantastic. How the Biggest Fraud in Physics Shook Up the Scientific World. New York: Palgrave MacMillan.
[39] Schäfer, Christin, Jörg-Peter Schräpler, Klaus-Robert Müller, and Gert G. Wagner. 2005. “Automatic Identification of Faked and Fraudulent Interviews in the German SOEP.” Schmollers Jahrbuch – Journal of Applied Social Science Studies 125(1):183–193.
[40] Simonsohn, Uri. 2013a. “Just Post It: The Lesson from Two Cases of Fabricated Data Detected by Statistics Alone.” Psychological Science 24(10):1875–1888.
[41] Simonsohn, Uri. 2013b. “Just Post It: The Lesson from Two Cases of Fabricated Data Detected by Statistics Alone. Supplementary Material.” Retrieved March, 2016 (http://opim.wharton.upenn.edu/~uws/papers/supplemental_just_post_it.pdf).
[42] Simonsohn, Uri, Leif D. Nelson, and Joseph P. Simmons. 2014. “P-Curve: A Key to the File-Drawer.” Journal of Experimental Psychology: General 143(2):534–547.
[43] Steen, R. Grant, Arturo Casadevall, and Ferric C. Fang. 2013. “Why Has the Number of Scientific Retractions Increased?” PLOS ONE 8(7):e68397. doi:10.1371/journal.pone.0068397.
[44] Stephan, Paula E. 2010. “The Economics of Science.” Pp. 217–273 in Handbook of the Economics of Innovation, Vol. 1, edited by B. H. Hall and N. Rosenberg. Amsterdam: Elsevier.
[45] Stroebe, Wolfgang, Tom Postmes, and Russell Spears. 2012. “Scientific Misconduct and the Myth of Self-Correction in Science.” Perspectives on Psychological Science 7(6):670–688.
[46] Tödter, Karl-Heinz. 2009. “Benford’s Law as an Indicator of Fraud in Economics.” German Economic Review 10(3):339–351.
[47] van Kolfschooten, Frank. 2014. “Fresh Misconduct Charges Hit Dutch Social Psychology.” Science 344(6184):566–567.
[48] Xie, Yu. 2014. “‘Undemocracy’: Inequalities in Science.” Science 344(6186):809–810.
[49] Ziman, John M. 1984. “Rules and Norms.” Pp. 81–87 in An Introduction to Science Studies: The Philosophical and Social Aspects of Science and Technology, edited by J. M. Ziman. Cambridge: Cambridge University Press.

Ulf Liebe and Andreas Tutić

The Interplay of Social Status and Reciprocity

Abstract: While it is a well-established fact that social-status hierarchies and reciprocity norms can promote, regulate and stabilize social order, little is known about the interplay of social status and reciprocity. In two quasi-experiments, we investigate the extent of reciprocity and how it varies across groups with different social statuses. We use the sequential dictator game introduced by Diekmann (2004) to measure reciprocity. Differences in social status are indicated by school type (low vs. high, Study 1) and occupation in a specific sector (nursing student vs. nurse vs. physician, Study 2). In both experiments, we find strong evidence for reciprocity. However, groups with higher social status donate more and receive less than groups with lower social status. Our findings show that the effects of social status are not crowded out by direct reciprocity and have implications for the study of social order and cooperation.

Note: We thank Elias Naumann for his support in conducting the quasi-experiment on occupational status, and Sascha Grehl, Helena Schmidt, and Melis Kirgil for editing the graphs and typesetting. We would also like to thank an anonymous reviewer for his/her comments and suggestions. Further, financial support from the German Research Foundation (DFG LI 1730/2-1) and the Alexander von Humboldt Foundation (“Feodor Lynen Research Fellowship”, Andreas Tutić) is also gratefully acknowledged.

1 Introduction and theoretical considerations

Reciprocity is one of the most influential notions in the social and, more generally, the behavioral sciences. The term refers to the common observation that actors tend to reward favorable acts and to retaliate against unfavorable acts by other agents. The mutual exchange of favors is one of the generic forms of reciprocity and is nicely captured by the following quote from Hume (1978:520, 1739–1740): “’Tis profitable for us both, that I shou’d labour with you to-day, and that you shou’d aid me to-morrow.” While there are different approaches to explaining reciprocity, the existence of the behavioral tendency per se and its consequences for the organization of society as a whole are beyond doubt.

Consider the problem of a bilateral exchange over time in some kind of natural environment, where no formal institutions such as enforceable contracts or court law are present. At a first point in time, Party A receives goods from Party B and the converse act falls due at a second (later) point in time. Given reciprocity, Party B should expect Party A to fulfill her obligations at the second point in time and, thus, has an incentive to act cooperatively at the first point in time. In the absence of reciprocity, the exchange is unlikely to take place, since Party B should expect Party A to act egoistically and not hand over the goods later.

Even in the presence of formal institutions, actors might prefer to organize social and economic activities without referring to these and instead rely on reciprocity. That is, reciprocity and formal institutions are to some extent substitutes for the provision of social order and, therefore, the question arises which combination of factors provides a given level of order at the lowest cost. For example, Ellickson (1991) describes a community of farmers in Shasta County, California, where the damages paid as a result of cattle trespassing are effectively governed by informal norms which are actually in contradiction to legal norms. Complying with these informal norms can be interpreted as a consequence of reciprocity: I obey these rules, even if this is to my detriment in the case at hand, because I expect that you will do the same in future contingencies. In her famous case studies, Ostrom (1990) provides examples where the imposition of formal institutions on self-organized communities that deal with the effective governance of common-pool resources actually leads to a breakdown of social order. From these empirical observations we can learn that a welfare-enhancing organization of society must consider the potential power of reciprocity.

1.1 Theoretical approaches to reciprocity

Since the publication of the seminal article by Gouldner (1960), the concept of reciprocity has received a great deal of theoretical and empirical attention in sociology, (social) psychology, biology, political science, and economics (see, for example, van Dijk 2015; Kolm and Ythier 2006 for a comprehensive overview also regarding earlier traditions in anthropology). Whereas some scholars argue that reciprocation is a consequence of internalized norms (e.g., Gouldner 1960; Perugini et al. 2003), others have successfully developed theoretical models to explain the emergence of reciprocity. First, in the literature on repeated games, game theorists identified conditions under which agents can rely on strategies that can be interpreted as reciprocal behavior (e.g., “tit for tat”) to achieve efficient payoffs in social dilemmas (e.g., Rapoport and Chammah 1965; Trivers 1971; Friedman 1977; Axelrod 1984; Nowak and Sigmund 1993). In these types of models, it is typically assumed that agents have materialistic-egoistic preferences: that is, each actor only cares about her own material outcome. The basic mechanism for the possibility of cooperation (Taylor 1983) in these models is as follows. It is in my own best material interest to cooperate now, because if I do not, future gains will be lost, and the threat of these losses is credible. The second branch of game-theoretical literature which is relevant for reciprocity works with social preferences and/or more complex solution concepts (e.g., Rabin 1993; Fehr and Schmidt 1999; Bolton and Ockenfels 2000; Dufwenberg and Kirchsteiger 2004; Falk and Fischbacher 2006). For example, Fehr and Schmidt (1999) assume that actors have a “preference” for egalitarianism and consequently there is a trade-off between this and the ego’s material outcome. Since reciprocity tends to equalize the material outcomes, this model is a first step towards an explanation of reciprocity. Winter et al. (2009) only recently introduced the concept of a “mental equilibrium”, which is reminiscent of preliminary models by Gauthier (1978), Hegselmann, Raub and Voss (1986), and Raub


and Voss (1990). This allows explaining not only the behavior that actors display in games, such as social dilemma games, but also the narratives (the “morals”) which somehow justify this behavior. With respect to one of the most important forms of a social dilemma – the prisoner’s dilemma – Winter et al. (2009) are able to show that any narrative that supports a cooperative outcome in the one-shot prisoner’s dilemma is reciprocal. So, in a sense, game-theoretical models do not only provide hints for an explanation of reciprocal acts but also for the popularity of reciprocity as a value (“an eye for an eye”, the golden rule, etc.).
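For readers unfamiliar with the model mentioned above, the two-player version of the inequity-aversion utility function proposed by Fehr and Schmidt (1999) can be written as follows (the notation here is ours, not the chapter's):

U_i(x_i, x_j) = x_i - \alpha_i \max\{x_j - x_i, 0\} - \beta_i \max\{x_i - x_j, 0\}, \qquad \beta_i \le \alpha_i, \; 0 \le \beta_i < 1,

where \alpha_i weighs disadvantageous and \beta_i advantageous inequality. A second-round dictator with \beta_i > 0 is willing to give up some of her own money in order to reduce a payoff gap in her favor, which is why such preferences can generate reciprocity-like back-transfers.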

1.2 Experimental evidence on reciprocity

At the same time, a great variety of experimental studies document the power of reciprocity (for an overview, see Fehr and Gächter 2000 or Fehr and Gintis 2007). Diekmann (2004), for example, introduced the so-called sequential dictator game to account for the empirical extent of reciprocity effects in laboratory settings. In the simple dictator game, a participant (the “dictator”) receives a certain amount of money from the experimenter and has to allocate this money between another participant (the “recipient”) and herself. Any allocation can be chosen by the dictator, including keeping all the money for herself. The sequential dictator game consists of two simple dictator games played in a row, where the first-round dictator is the second-round recipient and vice versa. The first-round recipient is informed of the first-round allocation before she makes her decision as the second-round dictator. Diekmann (2004) provides experimental evidence on the behavior in the sequential dictator game. It turns out that the behavior of the second-round dictator is highly influenced by the first-round allocation: the higher the first-round donation from the dictator to the recipient, the higher the second-round donation. This effect is the “power of reciprocity”. The importance of reciprocity is further supported by experimental studies on sequential games (see Yamagishi and Kiyonari 2000; Clark and Sefton 2001; Fehr, Fischbacher, and Gächter 2002; Ben-Ner et al. 2004). In some of these studies, evidence of behavior in simple games is compared with evidence of behavior in sequential games. A typical example of this kind of research is a study on the so-called in-group bias conducted by Yamagishi and Kiyonari (2000). Participants are split into two groups by some prima facie irrelevant criterion (minimal group paradigm). It is well known that this grouping of participants induces an in-group bias in the prisoner’s dilemma. Cooperation rates or levels are higher if two participants from the same group are matched. It turns out that this in-group effect vanishes in the sequential version of the prisoner’s dilemma. More specifically, if the participants know that the other player is informed of their behavior before the other player decides whether to cooperate or defect, then the participants, on aggregate, do not differentiate according to the group membership of the other player. In other words, cooperation levels


of first-round players in the prisoner’s dilemma are unaffected by the grouping of the players. Yamagishi and Kiyonari (2000) plausibly argue that the latter finding is due to reciprocity and back this idea with direct evidence on the expectation of the participants from a follow-up questionnaire. First-round players expect the second-round players to reciprocate their decision. This interpretation seems even more appropriate since levels of cooperation are generally higher in the sequential than in the simple prisoner’s dilemma. The major finding of their study can therefore be stated as follows: reciprocity crowds out the in-group bias. Such a superiority of one behavioral “determinant” (e.g., information of individual behavior) over another (e.g., information of group behavior) has also been found in other experimental studies (e.g., Cason and Mui 1998; Albert et al. 2007; Servatka 2009).

1.3 Reciprocity and social status

In this contribution, we take a first step in exploring the interplay between social status and reciprocity. As indicated above, past research has documented the power of reciprocity in a wide variety of social interaction situations such as dictator games and prisoner’s dilemmas. Compared to the rich literature on reciprocity, very little experimental research on the effects of social status on social interaction has been conducted, but existing evidence indicates that status groups do show significant behavioral differences in a variety of social situations. For instance, Piff et al. (2010; 2012) demonstrate status effects in dictator games and trust games. Kumru and Vesterlund (2010) as well as Simpson, Willer and Ridgeway (2012) provide evidence that high-status actors contribute faster and overall more towards the provision of public goods in laboratory experiments. Most relevant for the present contribution is the quasi-experiment by Liebe and Tutić (2010), which explored the effect of social status on altruistic giving in simple dictator games. Over 600 ninth-graders participated in this study. Objective status was measured by the type of school attended. Since the German schooling system is hierarchically structured and life chances critically depend on the type of school attended, this measure of social status is reasonable. Students attending four types of school played simple dictator games in which the status of the recipient, measured by type of school attended, was varied. It turns out that the social status of the dictator and the status of the recipient matter in the simple dictator game: the higher the status of the dictator, the higher her donation. The higher the status of the recipient, the less she receives. Of all relevant theoretical concepts (that is, altruism, warm-glow giving [e.g., Margolis 1982; Andreoni 1989; Andreoni 1990], in-group bias [e.g., Tajfel et al. 1971; Tajfel and Turner 1986], and noblesse oblige [e.g., Homans 1961]), the observed pattern of donations fits altruism best. Heuristically, this can be seen as follows: if actors have altruistic preferences, then we should expect high-status actors to donate more than low-status actors because, generally speaking, high-status actors control


more resources than low-status actors, which might have an income effect in the microeconomic sense. Further, low-status recipients are expected to receive higher donations, because an actor with altruistic preferences can induce higher welfare gains by donating one increment of money more to them than by donating the increment to a high-status recipient. To shed light on the interplay of social status and reciprocity, we will build on our previous study as well as on Diekmann’s (2004) approach to uncover the power of reciprocity, while studying how both social status and reciprocity affect behavior in sequential dictator games. Our major research interest lies in the question of whether social status and reciprocity maintain their effects as observed in previous research, where either status or reciprocity (but not both) was varied. To achieve this, we rely on an experimental design in which both of these social forces can potentially shape and determine observable behavior. In a sequential dictator game, second-round dictators are not only informed of the size of the first-round donation, but also about the social status of the first-round dictator. This design allows us to answer questions such as: does the status of the first-round dictator even matter for second-round donations, or does the size of the first-round donation completely determine second-round donations? Is the power of reciprocity diminished by the fact that subjects can condition their donating behavior based on the status of the recipients? Against the background of similar studies that explore reciprocity by comparing behavior in sequential games to behavior in simple games (e.g., Yamagishi and Kiyonari 2000), which typically find a crowding-out of other behavioral determinants by reciprocity, we expect that the effects of social status are not present in the sequential dictator game. Hypothesis 1 (Pure Reciprocity). Neither the social status of the dictator, nor the social status of the recipient, impacts on donating behavior in the sequential dictator game. In case Hypothesis 1 is not supported and we find status effects in the sequential dictator game, we can go one step further and provide additional hypotheses which can be traced back to our theoretical interpretation of the findings in Liebe and Tutić (2010). As indicated above, we find the concept of altruistic motivations compelling as an explanation for the effects of social status in the simple dictator game. The mechanism proposed by altruism for the effects of the social status of the dictator is quite different from the mechanism invoked to explain the effects of the status of the recipient. Altruistic motivations might imply that high-status dictators donate more than low-status dictators because high-status actors control more resources, and if the good “altruistic act” is normal, microeconomics suggests that the demand for this good is positively correlated with the control of resources. This mechanism is in principle compatible with reciprocal behavior, since the subjects in general do not show full reciprocity and second-round donations are typically lower than first-round donations (cf. Diekmann 2004). It follows that there is some room for variation in the overall level of donations


between the different types of dictators which can be filled by the stipulated income effect. Hypothesis 2 (Income Effect). High-status dictators donate more than low-status dictators in the sequential dictator game. With respect to the effects of the social status of the recipient, a completely different kind of mechanism applies: low-status recipients receive more than high-status recipients because a given amount of donated money is more productive for the well-being of a low-status recipient than for the well-being of a high-status recipient. In the following we refer to this mechanism as the welfare effect. Hypothesis 3 (Welfare Effect). Low-status recipients receive more than high-status recipients in the sequential dictator game. Of course, the welfare effect presupposes that the dictator is willing to donate at all. Now, consider some level of overall willingness to donate; empirically, we can simply calculate the average donation of some particular participant to all types of recipients. The income effect used to explain the effects of the social status of the dictator simply claims that this average is positively correlated with status. The welfare effect puts some additional structure on the vector of donations of our particular actor with her particular willingness to donate. Reciprocity demands, in general, an alternative structure of the vector of donations, a structure that might very well be more compelling than the one demanded by the welfare effect. After all, the welfare effect is “activated” only by the membership of the recipients in some social group, whereas reciprocity resorts to concrete behavioral information of a particular recipient. In a nutshell, reciprocity might not crowd out the effects of social status of the dictator, because it says little, in contrast to full reciprocity, about the average level of donations. However, with respect to the effects of the social status of the recipient, reciprocity is a direct rival for the welfare effect in structuring an actor’s vector of donations. This reasoning leads us to suspect that if social status influences donating behavior in the sequential dictator game, the effects of the status of the dictator should outweigh the effects of the status of the recipient. Hypothesis 4 (Outweighing Effect). The social status of the dictator affects donating behavior in the sequential dictator game more strongly than the social status of the recipient. In the remainder of this chapter, we test these hypotheses on the interplay of social status and reciprocity in two quasi-experiments.


2 Study background and method

In this section, we describe the two quasi-experiments (Shadish, Cook, and Campbell 2002; Berger and Wolbring 2015) and give an overview of the experimental design as well as of the measurement of social status. We conclude with some methodological remarks on our experimental procedure.

2.1 A quasi-experiment with school type as an indicator of social status (Study 1)

Sequential dictator games (SDG) were conducted with 618 ninth-graders (aged between 14 and 18 years old) attending a Hauptschule (193), a Realschule (151), a Gymnasium (195), or a Privatgymnasium (79) in the city of Berlin and several cities in the federal state of Brandenburg, Germany.¹ In our experiments, the 618 ninth-graders played three rounds of an SDG in the role of the second-round dictator and allocated 10 € each time between themselves and a recipient. Donations of the first-round dictators were controlled as part of the experimental design: that is, the first-round dictators were fictional players. Donations were varied between 2 €, 5 €, and 8 €. Further, first-round dictators differed with respect to the type of school they attended (Hauptschule, Realschule, Gymnasium).² In a sociological sense, the German educational system partitions pupils into status groups by grouping them in different school types, and there is much empirical evidence that the average pupil of a lower-ranking school fares worse than the average pupil of a higher-ranking school in terms of life income as well as future educational and occupational opportunities (Klein 2005). However, it might be argued that social status groups are not constituted by prospects of income, education, and occupation alone but also by family background. In this respect several studies, including the PISA study, have revealed that social origin (the parents’ social class) is a strong predictor of “pupils’ type of school”, especially with respect to the Gymnasium, the highest educational level (e.g., Baumert and Schümer 2001). Children from a lower social class have worse chances of entering a Gymnasium or Privatgymnasium than

1 Compared to the other school types, we had a lower number of subjects attending a Privatgymnasium due to the fact that there are considerably fewer private schools than public schools and this was not compensated by a lower drop-out rate among the former. 2 The German school system is complex. After elementary school (mostly till the fourth grade), the German school system differentiates between three main types of school: Hauptschule, Realschule, and Gymnasium. Additionally, there are also private schools (Privatschule). The Hauptschule leads to a diploma after 8 years, the Realschule to a diploma after 10 years, and the Gymnasium and Privatgymnasium to a diploma after 13 years (in some federal states after 12 years), qualifying for university admission or matriculation.


children from a higher social class (e.g., parents with higher education and a better financial background). It can, therefore, safely be said that the current German school system establishes a status hierarchy. This status hierarchy is not only an objective categorization but also mostly recognized by the pupils themselves. For example, ninth-graders of a Hauptschule know that it is difficult for them to get vocational training or find a job at all compared to pupils from other school types. In contrast, pupils of a Gymnasium or a Privatgymnasium know that they have good occupational chances after finishing school. The Realschule lies in between these two extremes.³ Taken together, type of school can be seen as a reliable status-group category. With regard to life chances (future educational and occupational opportunities) there is a clear hierarchy from low to high: Hauptschule, Realschule, Gymnasium, and Privatgymnasium. Thus, we collected data on the effects of both status of the dictator and the recipient.

2.1.1 Procedure

We obtained permission from the local authorities, headmasters, and subsequently from the teachers of the schools to conduct experiments with ninth-graders in their schools. The paper-and-pencil experiments were carried out from May 2008 to March 2009 in the classroom and lasted less than 45 minutes (a lesson). If more than one class in a school took part in the experiment, we ensured that no class received information on the experiment before actual participation. Thus, there was no possibility that members of one class could inform members of the other about the experiment. Each participant received an envelope including several blocks of sheets which gave instructions, the measurement instruments for the simple as well as the sequential dictator game, and a short questionnaire. Before the participants filled in each block, the experimenter gave instructions and the participants were allowed to ask questions. For the SDG, participants were informed that they had to make three decisions. The experimenter explained that each decision would be made in response to another pupil from a different school, who had to allocate 10 € between herself and the participant.

3 In the follow-up questionnaire of our experiment, we asked the participants about the perceived social status of pupils of all school types (including their own) using a status ladder based on the MacArthur Scales (Adler et al. 2000). The status ladder assessed the subjective placement of the pupils from the different school types within German society (i.e., on a scale from 1, at the bottom of the ladder, to 10, at the top of the ladder, which position, on average, a pupil of each school type reaches in her life). The findings support the status hierarchy from low to high: the ranking revealed the order of Hauptschule (mean value of 4.01), Realschule (6.04), Gymnasium (7.89), and Privatgymnasium (8.46). This ordering did not vary with respect to the type of school the participant attended.


Before each of the three decisions, the participant was informed of the respective allocation of the other, first-round dictator. Further, the experimenter told the participants that they would get the donation from the first-round dictator plus the amount of money they kept for themselves as a result of their own decision, while the first-round dictator received what she had kept for herself plus the donation from the participant. It was emphasized that only the participant could make these decisions, and that other pupils could not influence the outcome. Moreover, participants were asked to make each of the three decisions independently of the other decisions. It was pointed out that anonymity was guaranteed. Participants were further informed that they would only receive the payoff from one of the three decisions, that the relevant decision would be chosen at random (using drawings from a bag containing numbered billiard balls by the experimenter), and that they would be paid anonymously (based on identification numbers randomly assigned to the survey instruments). In addition to the explanations by the experimenter, the participants received the instructions in written form. Note that the instructions did not contain any information about the school type of the recipients. Participants acquired the knowledge of the school type attended by the recipient at the time they had to decide on their donations. To facilitate a better understanding of our measurement instruments, we present an experimental task worded thus:⁴

A Gymnasiast has allocated 10 Euro between herself and you. The Gymnasiast attends grade 9 in a Gymnasium in Berlin/Brandenburg. From her 10 Euro the Gymnasiast has kept 5 Euro for herself and has allocated 5 Euro to you. Now you receive 10 Euro. You can allocate these 10 Euro between you and this Gymnasiast. It is your decision alone. How do you decide? Please enter the amounts of money in the places marked! Use full Euro amounts, that is, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 Euro. Of 10 Euro I keep ___ Euro for me and give ___ Euro to the Gymnasiast.
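As a simple illustration of the payoff rule explained to the participants (this sketch is ours and not part of the original materials; all names are illustrative), the final earnings from a realized decision can be computed as follows:

ENDOWMENT = 10  # euros available to each dictator

def sdg_payoffs(first_round_donation, second_round_donation):
    # The first-round dictator earns what she kept plus the back-donation;
    # the participant earns the first-round donation plus what she keeps.
    first_round_dictator = (ENDOWMENT - first_round_donation) + second_round_donation
    participant = first_round_donation + (ENDOWMENT - second_round_donation)
    return first_round_dictator, participant

print(sdg_payoffs(5, 3))  # a 5 euro donation answered with 3 euro yields (8, 12)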

We varied both the type of school the recipient attended and the donation of the first-round dictator. Each participant made one decision for each possible type of school the recipient might attend, and for each of the three first-round donations; however, each subject in total made only three decisions in the SDG, because not all combinations of type of first-round dictator and first-round donation were presented to each

4 Note that the German language allows the usage of neutral words with respect to gender. In the following translation of the experimental tasks, we substitute words referring to females for the gender-neutral German expressions.


subject. A fully randomized design was chosen to rule out order effects. To restrict the number of different orderings and thereby ensure that each ordering was supplied to at least one participant from any status group, we eliminated first-round dictators attending a Privatgymnasium from the design. Since there are three types of first-round dictators and three first-round donations there are nine possible combinations concerning the first decision in the SDG of each subject. Regarding the second decision, there are two types of first-round dictators and two first-round donations left (because we ruled out repetitions), which means that four combinations are possible. Given the combinations on the first and second decision in the SDG, only one combination remains with respect to the third decision. Consequently, there are (3⋅3)⋅(2⋅2)⋅(1⋅1) = 36 different orderings. Each of these orderings was supplied to at least two subjects from any status group.
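The counting argument for the number of orderings can be checked directly. The following short enumeration (illustrative code, not part of the study materials) pairs the three first-round dictator types with the three first-round donations such that no type and no donation is repeated across a participant's three decisions:

from itertools import permutations

schools = ["Hauptschule", "Realschule", "Gymnasium"]
donations = [2, 5, 8]

# Each ordering assigns one dictator type and one donation to each of the
# three decisions, without repeating a type or a donation.
orderings = [tuple(zip(s, d))
             for s in permutations(schools)
             for d in permutations(donations)]

print(len(orderings))  # 36, i.e. (3*3) * (2*2) * (1*1)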

2.2 A quasi-experiment with occupation as an indicator of social status (Study 2) In Study 2, we used the same experimental setup as in Study 1, except this time the status groups were defined by three different occupations (professional roles) in a hospital setting: nursing students (n = 68), nurses (n = 65), and physicians (n = 23).⁵ The experiments were conducted in a nursing school and several hospitals in a big city, as well as a medium-sized town in Germany, in 2009. In total 157 students and employees participated in the experiments. Each participant made three decisions in the role of the second-round dictator, and we experimentally varied the occupation of the recipient (nursing student, nurse, physician) and first-round donations (2 €, 5 €, 8 €). The stake size was 10 €. Occupational positions are a crucial dimension of social stratification. They determine individuals’ income, class position and social prestige (Hoffmeyer-Zlotnik and Geis 2003; Ganzeboom and Treiman 1996). This is illustrated by the International Socio-Economic Index of Occupational Status (ISEI) which is a measure of socio-economic status (Ganzeboom, De Graaf, and Treiman 1992). It can take values between 16 and 90. Based on the ISEI, nursing students have a value of 38; the corresponding values for nurses and physicians are 43 and 88, respectively. While the status difference between nursing students and nurses is smaller than that between nurses and physicians, the ISEI values show a clear status hierarchy. This ranking

5 The differences in group size are mainly explained by differences in the response rates, which amounted to 100 % for nursing students, 59 % (65 of 111) for nurses and 26 % (23 of 87) for physicians. We cannot exclude the possibility that more pro-social physicians participated in the experiment. However, it seems that time constraints were a decisive factor stopping physicians from participating in our study.


of occupations also holds true if we use the Standard International Occupational Prestige Scale as a measure of social status (SIOPS, Ganzeboom and Treiman 1996).

2.2.1 Procedure The instructions, experimental tasks and follow-up questionnaire were distributed in the weekly meetings/briefings of the physicians and nurses and in class for the nursing students. Before handing over the experimental tasks, the experimenter explained the dictator game. Subjects were told that the other person was also working in a hospital, but not the same one (these instructions were also part of the written material). This way the anonymity of individual decisions was stressed. The subjects returned the experimental material in an envelope after class if they were nursing students, or at the next weekly meeting/briefing if they were nurses or physicians. We used exactly the same payment method as in Study 1. The money was returned in envelopes that were linked to the respondent by a random identification number. In the following, we give an example of the exact wording of an experimental task as used in the study: A nurse has allocated 10 Euro between herself and you. From her 10 Euro the nurse has kept 8 Euro for herself and has allocated 2 Euro to you. Now you receive 10 Euro. You can allocate these 10 Euro between you and this nurse. It is your decision alone. How do you decide? Please enter the amounts of money in the places marked! Use full Euro amounts, that is, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 Euro. Of 10 Euro I keep ___ Euro for me and give ___ Euro to the nurse.

In the other versions of the experimental tasks, we varied the occupational position of first-round dictators and first-round donations. To avoid order effects, the sequence of decisions was randomized. All possible combinations of status groups and first-round donations were used.

2.3 Methodological remarks

The recipients in the dictator games in Studies 1 and 2 actually did not exist: that is, our design involved deception. We used deception for at least three reasons. First, it allowed us to control the behavior of the first-round dictator (i.e., whether 2 €, 5 €, or 8 € were donated). The hyperfair condition of 8 € is interesting from a theoretical point of view but is unlikely to occur in actual decisions. Controlling first-round donations


avoided a lack of necessary variance concerning first-round donations in the data. Second, in our field setting it was difficult to match participants from different (nursing) schools since there was limited time for conducting experiments during class. Third, the presence of a projection bias was likely (cf. Loewenstein, O’Donoghue, and Rabin 2003). Subjects might prefer to receive payoffs immediately. Real matching across schools and hospitals would mean that payments had to be made after a couple of days or a week. Students, for example, might take the experimental tasks less seriously if they received their earnings on another day. Disciplines in the social sciences differ with respect to the question of whether the deception of subjects is ethically acceptable in experimental research (cf. Barrera and Simpson 2012). Our approach of using deception and debriefing subjects afterwards parallels that of other researchers in the field (e.g., Piff et al. 2010; Piff et al. 2012) and accords with ethical guidelines in psychology (e.g., the code of ethics and conduct of the British Psychological Society).⁶

3 Descriptive and multivariate results

In this section we provide evidence of the effects of reciprocity and social status in the sequential dictator game. We present descriptive results which are consecutively backed by multivariate analyses.

3.1 Descriptive statistics

Figure 1a gives an overview of average donations in the SDG in Study 1. The upper panel shows the average donations grouped by the status group of the dictator and first-round donation. The lower panel depicts the average donations grouped by the status group of the recipient and first-round donation. Figure 1b is similarly structured and refers to Study 2.⁷ In Study 1, for example, dictators attending a Privatgymnasium who received 8 € from the first-round dictator donated approximately 5 € on average. In Study 2, for example, nurses received on average about 3.2 € from dictators who had received 2 € as a first-round donation.

6 The debriefing method varied. Schools (head of school and teachers) and hospitals received a short summary of the study, including the aim of the experiments, the methods and the main results. In hospitals this information was given by the experimenter in the team meeting after the participants had received the payments. 7 Each figure draws on all subjects who have no missing values in any of their three decisions in the SDG. Regarding Study 1, this applied to 627 subjects (199 attendants of a Hauptschule, 153 attendants of a Realschule, 195 attendants of Gymnasium, and 80 attendants of a Privatgymnasium). With respect to Study 2, this applied to 156 subjects (68 nursing students, 65 nurses, and 23 physicians).

[Figure 1 about here: grouped bar charts (with 95 % confidence intervals) of average second-round donations by first-round donation (2 €, 5 €, 8 €); upper panels show donations by status group of the dictator, lower panels donations received by status group of the recipient; panel (a) Study 1 (Hauptschule, Realschule, Gymnasium, Privatgymnasium), panel (b) Study 2 (nursing students, nurses, physicians).]

Fig. 1: (a) Reciprocity and status effects in the sequential dictator game in Study 1. (b) Reciprocity and status effects in the sequential dictator game in Study 2.

As expected, we found clear evidence of reciprocity in both studies. That is, the higher the donation of the first-round dictator, the higher the donation of the second-round dictator. With respect to the effects of reciprocity, two observations seem remarkable. First, subjects do not fully reciprocate first-round donations: that is, average second-round donations are generally below the respective first-round donations. This phenomenon has already been documented by Diekmann (2004). Our second observation slightly contradicts the latter study: while Diekmann finds that subjects do reciprocate hyperfair donations to a considerable degree, our data revealed a higher reactivity to unfair than to hyperfair donations. More specifically, although the absolute difference between the unfair first-round donation (2 € condition) and the fair first-round donation (5 € condition) equals the absolute difference between the hyperfair (8 € condition) and the fair first-round donation (5 € condition), the difference in average second-round donations is much lower in the latter case.


In addition to these more or less well-documented effects of reciprocity on behavior in the sequential dictator game, Figures 1a and 1b depict fairly strong effects of status. To see this, note that in the absence of status effects, the bars inside each group defined by first-round donations should be of approximately equal height. However, in many instances we find descriptively large and, as indicated by the non-overlapping confidence intervals, also statistically significant differences. In contradiction to previous studies on sequential games (Yamagishi and Kiyonari 2000), therefore, here direct reciprocity does not crowd out other factors which influence behavior in the respective simple game. Apparently, status matters in the sequential dictator game. This is contrary to Hypothesis 1 (pure reciprocity). More specifically, consider the upper panels of Figures 1a and 1b, which depict the effects of the status of the second-round dictator in Studies 1 and 2, respectively. Inside each group defined by first-round donations, the bars generally increase in height from left to right. There is only one exception to this pattern: Figure 1a (first-round donation = 5 €, donations by pupils of a Privatgymnasium and of a Gymnasium). This observation backs a straightforward interpretation: The higher the status of the dictator, the higher her donation. This finding is reminiscent of our results for the simple dictator game (Liebe and Tutić 2010). It confirms the prediction by altruism and is therefore in line with the income effect proposed in Hypothesis 2. The lower panels of Figures 1a and 1b show the effects of the status of the recipient. Here the pattern is less clear-cut but still discernible. With the exception of the groups defined by first-round donations of 2 € and 5 € in Study 1, the bars generally decrease in height from left to right. That is, the higher the status of the recipient, the less she receives in donation. This finding also confirms our previous results on the effects of the status of the recipient in the simple dictator game (Liebe and Tutić 2010) and supports the welfare effect suggested in Hypothesis 3. Based solely on the visual inspection of Figures 1a and 1b, it is hard to tell whether reciprocity or status bears a greater impact on donating behavior in the SDG. Against the background of Hypothesis 1 (Pure Reciprocity), it is per se surprising to find instances that might suggest that status occasionally trumps reciprocity regarding effect sizes. For instance, given a first-round donation of 8 €, the group with the highest status on average donates approximately 2 € more than the group with the lowest status in Study 1. At the same time, attendants of a Hauptschule only donate approximately 1 € more when confronted with a first-round donation of 8 € instead of 2 €. Of course, when we compare other bars in Figures 1a and 1b, a different conclusion is suggested. To determine more systematically the relative impact of reciprocity, the status of the dictator, and the status of the recipient, we conduct analyses of variance for both studies where the variance of the second-round donations in the sequential dictator game is explained by variables measuring the first-round donation (experimental condition) and two variables measuring the status of the dictator (first study: 1–4 scale; second study: 1–3 scale) and recipient (1–3 scale). The analyses draw on the same cases underlying Figures 1a and 1b (see Footnote 7). With respect to Study 1, the model ex-


plains 17.51 % of the variance of the donations in the sequential dictator game (total sum of squares = 11182.06). Using partial eta squared (η2 ) as our measure of effect strength, we find that reciprocity accounts for 13.93 % of the variance in donations, the social status of the dictator accounts for 5.81 %, and the social status of the recipient explains only 0.2 %. Regarding study 2, the model explains 21.49 % of the variance in donations (total sum of squares = 3246.23). Reciprocity accounts for 10.57 %, whereas the status of the dictator explains 6.28 % and the status of the recipient 7.72 %. So, while it is fair to say that reciprocity is the most important determinant of behavior in the sequential dictator game, our analysis shows that the effects of social status are not crowded out but remain present. This clearly contradicts Hypothesis 1 (Pure Reciprocity). While Hypothesis 2 (Income Effect) and Hypothesis 3 (Welfare Effect) are supported, we find mixed evidence for Hypothesis 4 (Outweighing Effect). As predicted by the latter, compared to the status of the recipient, the status of the dictator shows stronger explanatory power in Study 1. Yet in Study 2, the status effects of the recipient slightly outweigh status effects of the dictator. In the following section, we back these central results with multivariate analyses which, next to an examination of the effects of personal characteristics of the subjects, provide significance tests for reciprocity as well as status effects.
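A hedged sketch of how such an analysis of variance with partial eta squared might be computed is given below; the data frame, column names, and the synthetic stand-in data are our own assumptions for illustration and are not the authors' files or code.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Synthetic stand-in for the long-format data (one row per decision).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "donation": rng.integers(0, 11, n),
    "first_round": rng.choice([2, 5, 8], n),
    "dictator_status": rng.integers(1, 5, n),   # 1-4 scale as in Study 1
    "recipient_status": rng.integers(1, 4, n),  # 1-3 scale
})

model = smf.ols("donation ~ C(first_round) + dictator_status + recipient_status",
                data=df).fit()
aov = anova_lm(model, typ=2)

# Partial eta squared: SS_effect / (SS_effect + SS_residual).
ss_resid = aov.loc["Residual", "sum_sq"]
aov["partial_eta_sq"] = aov["sum_sq"] / (aov["sum_sq"] + ss_resid)
print(aov[["sum_sq", "F", "PR(>F)", "partial_eta_sq"]])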

3.2 Multivariate analyses

Table 1 contains the results of three ordinary least squares (OLS) regression models for each study, where the donation of the second-round dictator is regressed on several sets of independent variables. Due to missing values we had to omit some of our subjects from these regressions, all of which rely on 1,842 observations (614 subjects) in Study 1 and 459 observations in Study 2 (153 subjects). To account for the problem of biased standard errors which arises because every subject makes three decisions, we report adjusted (robust) standard errors based on the Huber–White sandwich estimator. Models 1a and 1b document the power of reciprocity. The experimental conditions – the donation of the first-round dictator – serve as dummy variables, and the condition of 5 € is used as a reference. As already explained in the last subsection, mean donations are ranked as predicted by reciprocity. All differences in mean donations are highly statistically significant in Studies 1 and 2, except the hyperfair 8 € condition in Study 2, which is weakly statistically significant at the 10 % level. In addition to the reciprocity variables, Models 2a and 2b contain dummy variables measuring the status of both the dictator and the recipient, where the attendants of a Hauptschule (Study 1) and nursing students (Study 2) are used as a reference. First of all, note that the coefficients of the reciprocity variables only change marginally compared with Models 1a and 1b, because our experimental design rules out any correlation between the reciprocity and the status variables. As expected, the

Tab. 1: Multivariate results for the sequential dictator game.

School type as status indicator (Study 1)

                                             Model 1a          Model 2a          Model 3a
Reciprocity (First round 5 € ref.)
  First round 2 €                            −1.55 (−18.01)    −1.55 (−17.97)    −1.55 (−17.95)
  First round 8 €                             0.47 (5.19)       0.47 (5.17)       0.47 (5.16)
Status of the recipient (Hauptschule ref.)
  Realschule                                                   −0.14 (−1.56)     −0.14 (−1.56)
  Gymnasium                                                    −0.26 (−2.61)     −0.26 (−2.61)
Status of the dictator (Hauptschule ref.)
  Realschule                                                    0.90 (4.55)       0.92 (4.74)
  Gymnasium                                                     1.23 (7.11)       1.11 (6.36)
  Privatgymnasium                                               1.35 (6.24)       1.14 (5.23)
Personal characteristics
  Gender (1 = woman)                                                              0.37 (2.68)
  Generalized trust                                                               0.20 (2.55)
  Social cohesion                                                                 0.21 (2.82)
Constant                                      3.51 (41.76)      2.86 (19.71)      1.51 (4.54)
R²                                            0.12              0.18              0.20
N (subjects)                                  1842 (614)        1842 (614)        1842 (614)

Occupation as status indicator (Study 2)

                                             Model 1b          Model 2b          Model 3b
Reciprocity (First round 5 € ref.)
  First round 2 €                            −1.57 (−6.73)     −1.53 (−7.13)     −1.53 (−7.11)
  First round 8 €                             0.36 (1.72)       0.35 (1.76)       0.35 (1.75)
Status of the recipient (Nursing stud. ref.)
  Nurses                                                       −0.53 (−2.55)     −0.53 (−2.54)
  Physicians                                                   −1.64 (−6.20)     −1.64 (−6.18)
Status of the dictator (Nursing stud. ref.)
  Nurses                                                        0.75 (2.49)       0.71 (2.47)
  Physicians                                                    1.75 (3.70)       1.79 (3.86)
Personal characteristics
  Gender (1 = woman)                                                              0.89 (2.20)
  Generalized trust                                                               0.45 (3.03)
  Social cohesion                                                                −0.15 (−0.89)
Constant                                      4.54 (26.70)      4.69 (20.78)      3.36 (4.63)
R²                                            0.10              0.22              0.26
N (subjects)                                  459 (153)         459 (153)         459 (153)

Notes: Reported are the results of ordinary least squares (OLS) regression models with adjusted standard errors taking into account that each subject made three decisions (Huber–White sandwich estimator). t-values in parentheses. All effects are statistically significant, at least at p < 0.05, except the effect of first-round 8 € donations in Study 2 (p < 0.10) and the effect for recipients who attended a Realschule in Study 1 (p < 0.10).



coefficients of the status variables are ordered as predicted by altruism: the higher the status of the dictator, the higher her donation (the income effect). The higher the status of the recipient, the less she receives in donation (the welfare effect). Additionally, the model documents that all coefficients referring to differences between the reference group (pupils of a Hauptschule or nursing students, respectively) and the other status groups, both as dictators and as recipients, are highly statistically significant, with a sole exception. However, we have to stress that in Study 1 the number of significant coefficients of the status variables depends on the choice of the reference group. If we choose one of the extreme poles in the status hierarchy, a higher number of significant coefficients results. In other words, the differences in the middle of the status hierarchy are quite small and, consequently, typically not significant in Study 1. In Study 2 – the hospital setting – the number of statistically significant status effects does not depend on the choice of the reference category. Taken together, against Hypothesis 1 (Pure Reciprocity) and in line with Hypothesis 2 (Income Effect) as well as Hypothesis 3 (Welfare Effect), Models 2a and 2b provide clear evidence that the effects of social status are present if reciprocity as a behavioral principle is available. Looking further at the t-values regarding the effects of the status of the dictator and the recipient in Models 2a and 2b, we find the aforementioned mixed evidence for Hypothesis 4 (Outweighing Effect). Compared with the values for the status of the recipient, the values for the status of the dictator are larger in Study 1 but smaller in Study 2 (especially with respect to the status group "physicians"). Finally, Models 3a and 3b add three variables which measure individual characteristics of the subjects. Many studies on social interaction show gender differences in donations. There is a tendency for women to donate more than men (Engel 2011; Camerer 2003 refers to mixed evidence). Generalized trust and social cohesion are suggested by the literature on social capital as predictors of donation behavior (e.g., Brooks 2005; Putnam 2000).⁸

8 In Study 1, 45 % of the subjects were female (Gender = 1), and 55 % were male (Gender = 0). In Study 2, the corresponding figures were 78 % and 22 %. For doctors, the share of females (57 %) was considerably lower compared with that of nursing students and nurses. Generalized Trust is based on answers on a 5-point scale given to the question: "Generally speaking, to what extent would you say that most people can be trusted?" (1 = "Most people can be trusted" to 5 = "Need to be very careful"). Based on reversed coding (i.e., higher values meaning higher levels of trust), the mean value of this variable was 2.80 (std. dev. = 0.90) in Study 1 and 2.80 (std. dev. = 0.96) in Study 2. Social Cohesion is based on the following statement answered on a 5-point scale: "There is a strong solidarity within my school class/hospital ward." (1 = "strongly disagree" to 5 = "strongly agree"). The mean value of this variable was 3.29 (std. dev. = 0.96) in Study 1 and 3.97 (std. dev. = 0.84) in Study 2, indicating a tendency towards strong social cohesion.

Since all of the subjects had to donate in any experimental condition and to any type of recipient, the coefficients of the reciprocity variables and of the status variables with respect to the status of the recipient are (almost) identical in Models 2a,b and 3a,b. As can be seen, the personal characteristics of the subjects do not correlate too strongly with the status of the dictator and, consequently, no coefficient of
the latter variables changes sign or loses statistical significance. Hence, our results on the effects of reciprocity and social status as described with respect to Models 2a,b are still valid under control of the personal characteristics. Models 3a,b show that, in our experiment, women donate more than men and generalized trust influences donations positively. Social cohesion has a positive and statistically significant effect in Model 3a (Study 1) and is a non-significant predictor in Model 3b (Study 2). That is, the more a subject is inclined to place trust in other people in Studies 1 and 2, and the higher the perceived level of integration in a subject’s school class in Study 1, the more she donates. These findings are noteworthy, since the literature on social capital suggests that a higher degree of reciprocity should be expected in more cohesive and trustworthy social groups, communities and societies (e.g., Putnam 1993). This idea can be traced back to Models 3a,b as follows: while it is true that reciprocity in the sense of varying second-round donations in response to varying first-round donations cannot be explained by generalized trust and social cohesion qua experimental design, we already mentioned that the subjects do not reciprocate the full amount of first-round donations. Since both variables influence second-round donations positively, we interpret Model 3 as confirming evidence for the popular idea that generalized trust and social cohesion are positively associated with reciprocity.
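To make the structure of these models concrete, the following is a minimal sketch of the kind of OLS specification reported as Models 3a,b. It is not the authors' code: the variable names are hypothetical and the simulated data exist only so the example runs; the coefficients are meaningless.

```python
# Schematic only: simulated data and hypothetical variable names, not the
# authors' dataset or exact specification.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 400
df = pd.DataFrame({
    "donation_round1": rng.integers(0, 11, n),                # first-round donation received
    "status_dictator": rng.choice(["low", "middle", "high"], n),
    "status_recipient": rng.choice(["low", "middle", "high"], n),
    "female": rng.integers(0, 2, n),
    "trust": rng.integers(1, 6, n),                           # generalized trust, reversed coding
    "cohesion": rng.integers(1, 6, n),                        # perceived social cohesion
})
# Second-round donations: partial reciprocation plus noise (arbitrary numbers).
df["donation_round2"] = (0.6 * df["donation_round1"]
                         + 0.4 * df["trust"] + rng.normal(0, 1.5, n)).clip(0, 10)

model = smf.ols("donation_round2 ~ donation_round1 + C(status_dictator) "
                "+ C(status_recipient) + female + trust + cohesion", data=df).fit()
print(model.params)
```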

4 Discussion and conclusion

This study provides evidence of the effects of reciprocity and social status in the sequential dictator game. As expected, reciprocity has a great impact on second-round donations: the higher the first-round donation, the higher the second-round donation. At the same time, the effects of social status are present in the sequential dictator game: the higher the status of the dictator, the higher her donation. The higher the status of the recipient, the less she receives in donation. Our findings speak against the assumption of pure reciprocity effects (Hypothesis 1) and for status effects – the income effect (Hypothesis 2) and the welfare effect (Hypothesis 3) – which can be explained by altruism. Two facts of the interplay of reciprocity and social status deserve some attention. First, reciprocity has a greater influence on behavior than the social status of both the dictator and the recipient. Second, we find only mixed evidence for the proposed outweighing effect (Hypothesis 4) that the effects of the status of the recipient are weakened more by the applicability of reciprocity than the effects of the status of the dictator. Taken together, our study demonstrates that reciprocity as a behavioral regularity is an important device for the organization of society. It can serve as a functional substitute for formal institutions and is the backbone of many informal social institutions, such as social norms. However, our study on the interplay of reciprocity and social status shows that even in situations where reciprocity is applicable, such as in
sequential and repeated games, behavioral effects of social status are still present and not crowded out by reciprocity. This finding raises the question whether and how social status can also serve as a substitute for formal institutions. Indeed, George C. Homans (1961) and Peter M. Blau (1955; 1964) already highlighted the importance of social status as a medium for exchange. In his famous monograph "The Dynamics of Bureaucracy", Blau (1955) describes in detail how more competent colleagues provide professional advice for less competent fellows in exchange for social status. Coleman (1990:130) goes as far as arguing that social status serves as a substitute for money in systems of social exchange: "Differential status is universal in social systems; in fact, the awarding of status to balance unequal transactions or to make possible half-transactions appears to be the most widespread functional substitute for money in social and political systems." Hence, social order can benefit from both the power of reciprocity and the power of social status.

Finally, we want to hint at an important direction for future research on the interplay of social status and reciprocity. In this contribution we have focused on the question of whether social status affects donating behavior in the sequential dictator game on top of reciprocity. We have remained silent on the question of what happens when status groups differ in reciprocal behavior. In fact, we have theoretical reasons for believing that high-status actors are more inclined towards direct reciprocity, whereas low-status actors are more prone to engage in indirect (i.e., generalized) reciprocity. Put briefly, drawing on social exchange theory (e.g., Molm, Collett, and Schaefer 2007) and the theory of communal and exchange relationships (cf. Clark and Mills 2011), we theorize that high-status actors typically maintain more professional contacts which are characterized by direct reciprocity. Conversely, the networks of low-status actors contain much higher frequencies of close contacts such as family members and friends. As a consequence, low-status actors are more often engaged in systems of generalized exchange. Since direct exchange on the one hand and generalized systems of exchange on the other are governed by different norms of prosociality, we expect that high-status actors tend towards direct reciprocity, whereas low-status actors tend towards indirect reciprocity. Future research on the interplay of social status and reciprocity should focus on the question of whether status groups differ in reciprocal behavior. The hypothesis delineated above can serve as a point of reference for this research. The argument per se and the underlying mechanism suggest the need to measure aspects of the social capital of the subjects. It is also highly desirable to follow Diekmann's (2004) lead in studying direct reciprocity with the sequential dictator game and develop a basic game that can serve as a measurement instrument for indirect reciprocity.


Bibliography

Adler, Nancy E., Elissa S. Epel, Grace Castellazzo, and Jeannette R. Ickovics. 2000. “Relationship of Subjective and Objective Social Status with Psychological and Physiological Functioning: Preliminary Data in Healthy, White Women.” Health Psychology 19(6):586–592. Albert, Max, Werner Güth, Erich Kirchler, and Boris Maciejovsky. 2007. “Are We Nice(r) to Nice(r) People? An Experimental Analysis.” Experimental Economics 10(1):53–69. Andreoni, James. 1989. “Giving with Impure Altruism: Applications to Charity and Ricardian Equivalence.” Journal of Political Economy 97(6):1447–1458. Andreoni, James. 1990. “Impure Altruism and Donations to Public Goods: A Theory of WarmGlow Giving.” Economic Journal 100(401):464–477. Axelrod, Robert M. 1984. The Evolution of Cooperation. New York: Basic Books. Barrera, David, and Brent Simpson. 2012. “Much Ado About Deception: Consequences of Deceiving Research Participants in the Social Sciences.” Sociological Methods & Research 41(3):383–413. Baumert, Jürgen, and Gundel Schümer. 2001. “Familiäre Lebensverhältnisse, Bildungsbeteiligung und Kompetenzerwerb.” Pp. 323–407 in PISA 2000. Basiskompetenzen Von Schülerinnen und Schülern im Internationalen Vergleich, edited by J. Baumert, E. Klieme, M. Neubrand, M. Prenzel, U. Schiefele, W. Schneider, P. Stanat, K.-J. Tillmann, and M. Weiß. Opladen: Leske & Buderich. Ben-Ner, Avner, Louis Putterman, Fanmin Kong, and Dan Magan. 2004. “Reciprocity in a TwoPart Dictator Game.” Journal of Economic Behavior & Organization 53(3):333–352. Berger, Roger, and Tobias Wolbring. 2015. “Kontrafaktische Kausalität und eine Typologie sozialwissenschaftlicher Experimente.” Pp. 34–52 in Experimente in den Sozialwissenschaften. Sonderband 22, Soziale Welt, edited by M. Keuschnigg, and T. Wolbring. Baden-Baden: Nomos. Blau, Peter M. 1955. The Dynamics of Bureaucracy. Chicago: University of Chicago Press. Blau, Peter M. 1964. Exchange and Power in Social Life. New York: Wiley. Bolton, Gary E., and Axel Ockenfels. 2000. “ERC: A Theory of Equity, Reciprocity, and Competition.” American Economic Review 90(1):166–193. Brooks, Arthur C. 2005. “Does Social Capital Make You Generous?” Social Science Quarterly 86(1):1–15. Camerer, Colin F. 2003. Behavioral Game Theory. Experiments in Strategic Interaction. Princeton: Princeton University Press. Cason, Timothy N., and Vai-Lam Mui. 1998. “Social Influence in the Sequential Dictator Game.” Journal of Mathematical Psychology 42(2–3):248–265. Clark, Kenneth, and Martin Sefton. 2001. “The Sequential Prisoner’s Dilemma: Evidence on Reciprocation.” The Economic Journal 111(468):51–68. Clark, Margaret S., and Judson R. Mills. 2011. “A Theory of Communal (and Exchange) Relationships.” Pp. 232–250 in Handbook of Theories of Social Psychology, edited by P. A. M. V. Lange, A. W. Kruglanski, and E. T. Higgins. Thousand Oaks: Sage. Coleman, James S. 1990. Foundations of Social Theory. New York: Belknap Press. Diekmann, Andreas. 2004. “The Power of Reciprocity. Fairness, Reciprocity, and Stakes in Variants of the Dictator Game.” Journal of Conflict Resolution 48(4):487–505. Dufwenberg, Martin, and Georg Kirchsteiger. 2004. “A Theory of Sequential Reciprocity.” Games and Economic Behavior 47(2):268–298. Ellickson, Robert C. 1991. Order Without Law. Cambridge, MA: Harvard University Press.


[22] Engel, Christoph. 2011. “Dictator Games: a Meta Study.” Experimental Economics 14(4):583– 610. [23] Falk, Armin, and Urs Fischbacher. 2006. “A Theory of Reciprocity.” Games and Economic Behavior 54(2):293–315. [24] Fehr, Ernst, Urs Fischbacher, and Simon Gächter. 2002. “Strong Reciprocity, Human Cooperation and the Enforcement of Social Norms.” Human Nature 13(1):1–25. [25] Fehr, Ernst, and Simon Gächter. 2000. “Fairness and Retaliation: The Economics of Reciprocity.” The Journal of Economic Perspectives 14(3):159–181. [26] Fehr, Ernst, and Herbert Gintis. 2007. “Human Motivation and Social Cooperation: Experimental and Analytical Foundations.” Annual Review of Sociology 33(1):43–64. [27] Fehr, Ernst, and Klaus M. Schmidt. 1999. “A Theory of Fairness, Competition, and Cooperation.” The Quarterly Journal of Economics 114(3):817–868. [28] Friedman, James W. 1977. Oligopoly and the Theory of Games. Amsterdam: North-Holland. [29] Ganzeboom, Harry, Paul De Graaf, and Donald Treiman. 1992. “A Standard International SocioEconomic Index of Occupational Status.” Social Science Research 21(1):1–56. [30] Ganzeboom, Harry, and Donald Treiman. 1996. “Internationally Comparable Measures of Occupational Status for the 1988 International Standard Classification of Occupations.” Social Science Research 25(3):201–239. [31] Gauthier, David P. 1978. Morals by Agreement. Oxford: Clarendon Press. [32] Gouldner, Alvin W. 1960. “The Norm of Reciprocity: A Preliminary Statement.” American Sociological Review 25(2):61–178. [33] Hegselmann, Rainer, Werner Raub, and Thomas Voss. 1986. “Zur Entstehung der Moral aus natürlichen Neigungen.” Analyse & Kritik 8(2):150–177. [34] Hoffmeyer-Zlotnik, Jürgen, and Alfons Geis. 2003. “Berufsklassifikation und Messung des beruflichen Status/ Prestige.” ZUMA Nachrichten 27(52):125–138. [35] Homans, George C. 1961. Social Behavior: Its Elementary Forms. New York: Harcourt. [36] Hume, David. [1739–1740] 1978. A Treatise of Human Nature. Oxford: Clarendon Press. [37] Klein, Thomas. 2005. Sozialstrukturanalyse. Reinbek: Rowohlt. [38] Kolm, Serge-Christophe, and Jean M. Ythier, eds. 2006. Handbook of the Economics of Giving, Altruism and Reciprocity. Amsterdam: Elsevier. [39] Kumru, Cagri S., and Lise Vesterlund. 2010. “The Effect of Status on Charitable Giving.” Journal of Public Economic Theory 12(4):709–735. [40] Liebe, Ulf, and Andreas Tutić. 2010. “Status Groups and Altruistic Behavior in Dictator Games.” Rationality & Society 22(3):353–380. [41] Loewenstein, George, Ted O’Donoghue, and Matthes Rabin. 2003. “Projection Bias in Predicting Future Utility.” Quarterly Journal of Economics 118(4):1209–1248. [42] Margolis, Howard. 1982. Selfishness, Altruism, and Rationality: A Theory of Social Choice. Chicago: University of Chicago Press. [43] Molm, Linda D., Jessica L. Collett, and David R. Schaefer. 2007. “Building Solidarity through Generalized Exchange: A Theory of Reciprocity.” American Journal of Sociology 113(1):205–242. [44] Nowak, Martin, and Karl Sigmund. 1993. “A Strategy of Win-Stay, Lose-Shift that Outperforms Tit-for-Tat in the Prisoner’s Dilemma Game.” Nature 364(6432):56–58. [45] Ostrom, Elinor. 1990. Governing the Commons. The Evolution of Institutions for Collective Action. Cambridge: Cambridge University Press. [46] Perugini, Marco, Marcello Gallucci, Fabio Presaghi, and Anna P. Ercolani. 2003. “The Personal Norm of Reciprocity.” European Journal of Personality 17(4):251–283. [47] Piff, Paul K., Michael W. Kraus, Stéphane Côté, Bonnie H. 
Cheng, and Dacher Keltner. 2010. “Having Less, Giving More: The Influence of Social Class on Prosocial Behavior.” Journal of Personality and Social Psychology 99(5):771–784.


[48] Piff, Paul K., Daniel M. Stancato, Stéphane Côté, Rodolfo Mendoza-Denton, and Dacher Keltner. 2012. “Higher Social Class Predicts Increased Unethical Behavior.” Proceedings of the National Academy of Sciences 109:4086–4091. [49] Putnam, Robert D. 1993. Making Democracy Work. Civic Traditions in Modern Italy. Princeton: Princeton University Press. [50] Putnam, Robert D. 2000. Bowling Alone. The Collapse and Revival of American Community. New York: Simon & Schuster. [51] Rabin, Matthew. 1993. “Incorporating Fairness into Game Theory and Economics.” American Economic Review 83(5):1281–1302. [52] Rapoport, Anatol, and Albert M. Chammah. 1965. Prisoner’s Dilemma. Ann Arbor: The University of Michigan Press. [53] Raub, Werner, and Thomas Voss. 1990. “Individual Interests and Moral Institutions. An Endogenous Approach to the Modification of Preferences.” Pp. 81–117 in Social Mechanisms: Their Emergence, Maintenance and Effects, edited by M. Hechter, K.-D. Opp, and R. Wippler. Berlin: De Gruyter. [54] Servatka, Maroš. 2009. “Separating Reputation, Social Influence, and Identification Effects in a Dictator Game.” European Economic Review 53(2):197–209. [55] Shadish, William R., Thomas D. Cook, and Donald T. Campbell. 2002. Experimental and QuasiExperimental Designs for Generalized Causal Inference. Boston / New York: Houghton Mifflin. [56] Simpson, Brent, Robb Willer, and Cecilia L. Ridgeway. 2012. “Status Hierarchies and the Organization of Collective Action.” Sociological Theory 30(3):149–166. [57] Tajfel, Henri, Michael Billig, Robert Bundy, and Claude Flament. 1971. “Social Categorization in Intergroup Behavior.” European Journal of Social Psychology 1(2):149–178. [58] Tajfel, Henri, and John C. Turner. 1986. “The Social Identity Theory of Intergroup Behavior.” Pp. 7–24 in Psychology of Intergroup Relations, edited by S. Worchel, and W. G. Austin. Chicago: Nelson-Hall. [59] Taylor, Michael. 1983. The Possibility of Cooperation. Cambridge: Cambridge University Press. [60] Trivers, Robert L. 1971. “The Evolution of Reciprocal Altruism.” The Quarterly Review of Biology 46(1):35–57. [61] Van Dijk, Eric. 2015. “The economics of prosocial behavior.” Pp. 86–99 in Oxford handbook of prosocial behaviour, edited by D. A. Schroeder, and W. G. Graziano. Oxford: Oxford University Press. [62] Winter, Eyal, Ignacio Garcia-Jurado, Jose Mendez-Naya, and Luciano Mendez-Naya. 2009. “Mental Equilibrium and Rational Emotions.” Discussion Paper 521, Center for Rationality and Interactive Decision Theory, Hebrew University, Jerusalem. [63] Yamagishi, Toshio, and Toko Kiyonari. 2000. “The Group as the Container of Generalized Reciprocity.” Social Psychology Quarterly 63(2):116–132.

| Part IV: Peer-Sanctioning

Heiko Rauhut and Fabian Winter

Types of Normative Conflicts and the Effectiveness of Punishment

Abstract: While the current literature focuses on how social norms generate cooperation, the issue of norm-related conflict deserves more attention. We develop a new typology of normative conflict by combining Coleman's (1990) distinction between conjoint and disjoint norms with our own classification of commitment-related and content-related normative conflicts (Winter, Rauhut, and Helbing 2012). We outline a theory of how the four resulting types of normative conflict can be ordered. We provide real-life examples and typical game-theoretical conceptualizations of the four cases and suggest how they can be sorted according to their conflict potential and the extent to which conflict can be resolved by punishment. We then discuss a prototypical laboratory study for each of the types, and show how our theoretical arguments can be applied. We conclude with a discussion of how previously anomalous empirical results can be re-thought and understood in light of our theoretical reasoning. Finally, we give suggestions for prospective empirical micro-level corroborations and for mechanism design.

1 Introduction

Social norms have a pivotal role in sociology. They can serve as a "lubricant" of social order and facilitate social interaction in coordination problems such as which side of the road to drive on, which greeting to use or what clothing to wear in which context. They can also solve cooperation problems by prescribing contributions to collective goods such as a clean environment, a safe neighborhood, or public infrastructure. Scholars of different schools of thought seem to converge around the idea that social norms emerge because they have positive consequences for society. In the functionalist approach, norms bridge the tension between individual self-interest and the functional prerequisites of society (Durkheim 1997; Parsons 1968; Dahrendorf 1977). The rational-choice literature also argues that norms emerge when there is a demand for them (Ullmann-Margalit 1977; Coleman 1990). Such a demand typically arises in situations where everybody has an interest in all others cooperating while preferring not to cooperate oneself.

Note: We thank Nikos Nikiforakis and Hironori Otsubo for allowing us to reanalyze their data, the Nature publishing group for the right to reprint one figure from Fehr and Gächter (2002), and two anonymous reviewers for their helpful comments. Heiko Rauhut acknowledges support by the SNSF Starting Grant BSSGI0_155981. Correspondence should be addressed to HR or FW. Both authors contributed equally to this work and are listed intentionally in alphabetical order.

https://doi.org/10.1515/9783110472974-012


Mechanisms such as expected future interactions (Axelrod 1984), credible signals of long-term interests in mutual social exchange (Gambetta 2009), or reputation-seeking (Nowak and Sigmund 1998; Sigmund 2010; Wedekind and Milinski 2000; Berger and Rauhut 2014) can explain cooperative behavior even among rational egoists. Interestingly, the emphasis in the current literature is on the positive societal effects of social norms: “The view that norms are created to prevent negative externalities, or to promote positive ones, is virtually canonical in the rational choice literature” (Hechter and Opp 2001). In contrast to the rich literature about the positive effects of social norms on cooperation, we concentrate on the largely neglected argument that social norms can also generate conflict. Members of the same group can hold profoundly different normative expectations of what ought to be done. This phenomenon, referred to as “normative conflict”, generates conflict rather than cooperation. If ego holds a different norm to alter, she can do everything right and have the best intentions to cooperate, but nevertheless find that her behavior is conceived of as improper. They fall into conflict, despite both being convinced of having behaved adequately. We start by introducing the concept of normative conflict by extending Coleman’s (1990) conceptualization of norms. We give an introduction to social norms and cooperation, and exemplify how norms prescribe how target actors ought to behave to benefit the beneficiaries of the norm. We first focus on cases where all involved actors share the same norm. Normative conflict, in this case, is about the level of normative commitment: how much should each actor sacrifice her self-interest to comply with the norm? The second kind of normative conflict is about the normative content: which kind of behavior is prescribed or proscribed in a given situation? For example, people may hold exclusive norms of cooperation, such as equality versus equity norms. Our main argument in this article is that punishment has different effects in these different types of normative conflicts. The standard case is the first type of conflict about the level of commitment. Here, research shows that punishment helps foster cooperation. Our idea is that if people agree which norm to follow, punishment typically helps push low contributors towards more cooperation. However, if people do not agree which behavior should be conducted, that is, which normative content should apply, punishment often has detrimental effects. In other words, if people disagree about the kind of normative behavior that should be followed, punishment leads to counter-punishment, feuds, and long-lasting conflicts. We develop our theoretical argument of the effectiveness of punishment based on real-life examples and evidence from experiments. We believe that our proposed typology of normative conflict is helpful in re-reading the evidence of norm enforcement, and that we shed new light on the question of when punishment is effective and when it is ineffective in promoting cooperation.


2 A typology of norm-related conflicts

Social norms define rules of how one ought to behave in a certain situation. To be more precise, in norm-relevant situations, almost every member of a population believes that almost every other member has certain behavioral expectations. This implies that norms are directed at certain actions, which can be called focal actions (Coleman 1990:246). The expectations about focal actions are directed towards targets of the norm (equivalently one may say target actors or norm targets). Target actors are defined by Coleman (1990:247) as follows: "For any norm, there is a certain class of actors whose actions or potential actions are the focal actions. [. . . ] I will call members of such a class targets of the norm, or target actors." Most norms benefit a certain group of actors, who are called beneficiaries of the norm. These beneficiaries typically hold the norm and are potential sanctioners of the target actors. Coleman (1990:247) defines beneficiaries as "a class of actors who would benefit from the norm, potentially hold the norm, and are potential sanctioners of the target actors. These are actors who [. . . ] assume the right to partially control the focal actions and are seen by others [. . . ] to have this right." In summary, target actors are individuals who are forced to restrict their self-interest to follow the norm, while beneficiaries are individuals who benefit from general adherence to this norm. Following the above definitions, we define a social norm as follows. A social norm is a commonly shared behavioral expectation among beneficiaries and targets of a norm of how one ought to behave in a norm-relevant situation, which is enforced by sanctions in case of norm violations (see also Winter, Rauhut, and Helbing 2012).¹

Whereas the current debate is dominated by the argument that social norms solve the problem of cooperation among rational egoists, we argue that social norms can also trigger conflict. In our view, conflict can emerge from two sources. First, conflict can emerge if target actors and beneficiaries of a norm belong to different groups with different interests. We call this structural conflict. Second, conflict can emerge if actors apply contradicting norms in the same norm-relevant situation. We call these conflicts normative conflicts, and distinguish commitment-related from content-related normative conflicts. In commitment-related conflicts, actors disagree about the extent to which the norms should restrain their self-interest. In content-related conflicts, actors disagree about which norm should be followed in which situation.

1 A related definition is suggested in the sociological tradition by Elster (1989:105): “A norm [. . . ] is the propensity to feel shame and to anticipate sanctions by others at the thought of behaving in a certain, forbidden way. [. . . ] This propensity becomes a social norm when and to the extent that it is shared with other people.” In economics, similar definitions are used. Fehr and Gächter (2000:166) define a social norm as follows: “It is 1) a behavioral regularity; that is 2) based on a socially shared belief of how one ought to behave; which triggers 3) the enforcement of the prescribed behavior by informal social sanctions.”


2.1 Structural conflicts

Much of the current literature focuses on the case of conjoint norms, where the beneficiaries and targets of the norm belong to the same set of actors.² In the case of doping, all athletes are the target and likewise benefit from the anti-doping norm. From an individual perspective, doping yields a relative advantage at the price of damaging one's health. Whereas many cyclists prefer to accept this price, the relative advantage vanishes if all athletes dope and end up with bad health, which is (paradoxically) the same relative position compared to the situation in which nobody dopes. As for many examples of conjoint norms, the social norm bridges the cleavage between self-interest and collective good and can be modeled as a prisoner's dilemma. The case of non-aggression norms in the trench warfare of the First World War represents another, by now classic, example of conjoint norms. Here, French and German soldiers reduced their mortality risk by complying with strong behavioral norms to conduct mutual fake assaults and show mutual respect of war interceptions (Ashworth 1980; Axelrod 1984).

For some norms, the targets of a norm and the beneficiaries fall apart. In this case of disjoint norms, the separation is typically associated with opposing interests and causes conflict instead of cooperation. We can observe such conflict of interest between parents as the beneficiaries of a certain norm and their children as the target of the norm. Coleman (1990:245) gives the example of a high school girl who is asked by her friends to join them in smoking marijuana. Whereas her friends disdain her reluctance, her parents disapprove of her consent. In the area of gender differences, many norms are disjoint. Consider the norms that women should not pursue a profession, should not practice polygamous sex, or should not engage in politics. It seems that such norms are targeted towards women to the benefit of men. The conflict of interest between the beneficiaries and targets of a norm might even be more pronounced in the case of norms proscribing racial or homosexual discrimination.

We define structural conflict as the conflict of interest between the beneficiaries and targets of a disjoint norm. Both beneficiaries and targets share the same behavioral expectation of how one ought to behave in any given norm-relevant situation. Nevertheless, only the beneficiary profits from norm-compliant behavior, which is produced by the target of the norm at their own cost (Figure 1). Structural conflicts do not necessarily depend on specific norms, but are an inherent property of some form of heterogeneity in a situation's social structure. Asymmetries between actors, like gender or a parent-child relationship, allow for diverging behavioral expectations and form a necessary condition for the emergence of structural conflict. Whether or not a structural conflict might exist can thus already be inferred by taking a close look at the actors and their current social context, even before considering their specific social norms.

2 The typology of conjoint and disjoint norms was introduced by Coleman (1990:247ff.).
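The doping example can be made concrete with a small payoff sketch. The numbers below are our own illustrative choices, not taken from the chapter; they merely reproduce the prisoner's dilemma structure described above.

```python
# Illustrative payoff numbers (our own, not from the chapter): the doping
# example as a two-athlete prisoner's dilemma. "comply" = respect the
# anti-doping norm, "dope" = violate it.
# Each cell gives (payoff athlete A, payoff athlete B).
payoffs = {
    ("comply", "comply"): (3, 3),  # equal chances, good health
    ("comply", "dope"):   (0, 4),  # B gains a relative advantage
    ("dope",   "comply"): (4, 0),
    ("dope",   "dope"):   (1, 1),  # advantage cancels out, both damage their health
}

def best_reply(opponent_action):
    """A's payoff-maximizing action against a fixed action of B."""
    return max(["comply", "dope"], key=lambda a: payoffs[(a, opponent_action)][0])

for b_action in ["comply", "dope"]:
    print(f"If B plays {b_action!r}, A's best reply is {best_reply(b_action)!r}")
# Doping is a dominant strategy, yet (dope, dope) leaves both worse off than
# (comply, comply): the structure underlying a conjoint cooperation norm.
```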


Notes: The left image illustrates conjoint norms. All targets of the norm benefit from norm-compliant behavior. The right image illustrates disjoint norms that prescribe or proscribe certain behaviors of target actors, which benefit a different set of actors. The intermediate case between conjoint and disjoint norms is displayed in the middle. Fig. 1: Structural conflicts by different types of norms (Source: Authors’ compilation).

2.2 Normative conflicts

The specification of normative conflicts requires distinguishing two factors that generate behavioral expectations:³ the kind of action that should be undertaken and the intensity of that action. We term the first element "normative content", defined as the kind of behavior that is prescribed or proscribed in a given situation. It provides information about which of the situation's characteristics should be evaluated when choosing an action. We term the second element "level of normative commitment". This indicates that social norms usually require an actor to restrict self-interest in favor of another person's or group's wellbeing. Consequently, we define this element as the extent to which an actor should sacrifice self-interest to comply with the norm. The level of normative commitment is not fixed. While some norms may require strong restrictions, others are less demanding.

The idea of content-related normative conflicts can be illustrated by the following examples. When it comes to performance-related salaries, blue-collar employees often consider harmful working conditions as an important determinant, while white-collar employees stress value creation (Hyman and Brough 1975). In another study, soldiers differed over whether military merits or the fact of being married with children ought to be considered important for deciding early demobilization after World War II (Stouffer 1949). Alternatively, a group of employees in a firm may call for equal pay in contrast to a second group demanding a payment scheme based on added value. Thus, attributes such as working conditions, family status or children may serve as normative "cues" which determine the allocation of scarce goods (such as money or demobilization). Consequently, we define normative conflict as a transaction failure resulting from actors holding at least partially exclusive normative expectations. The distinction between content and commitment of a norm enables us to classify conflicts based on distinct contents versus distinct commitments. Normative conflicts are interesting inasmuch as they describe situations in which actors adhere to social norms, believe themselves to be behaving correctly, and nevertheless experience conflicts.

3 See also Winter, Rauhut, and Helbing 2012:920f.


Obviously, it is possible to imagine combinations of structural and normative conflicts. For example, there can be norm-relevant situations in which the same group of beneficiaries favors different disjoint norms, which would benefit them to the same extent. Thus, they do not agree on whether norm A or norm B is the appropriate norm that should be demanded from the norm targets to please the beneficiaries. Note that we concentrate only on the pure cases in this chapter.

3 The theory on the effectiveness of peer punishment for different types of norm-related conflicts

Our typology of structural and normative conflicts can be cross-tabulated two by two. This yields four cases. The cross-tabulation is depicted in Table 1 and visualized in Figure 2. This typology is helpful in systematizing theoretical and empirical research on social norms. We illustrate this by schematizing research on the effectiveness of peer punishment for the promotion of cooperation norms. We conjecture that peer punishment is more effective for commitment-related than for content-related conflicts. It is also more effective in the absence of structural conflicts, where norms are conjoint rather than disjoint. Our reasoning suggests the following order for the effectiveness of punishment: commitment-related conflicts over conjoint norms, commitment-related conflicts over disjoint norms, content-related conflicts over conjoint norms, and then content-related conflicts over disjoint norms. This order is conceptualized by the arrow in Figure 2.

Tab. 1: Effectiveness of punishment.

                                Normative conflict
Structural conflict             Commitment-related conflict    Content-related conflict
No (conjoint norm)              Very high                      Low
Yes (disjoint norm)             High                           Very low

Notes: This 2 × 2 typology yields four cases, for which typical examples are listed at each branch of the tree diagram. To the right of the examples are typical game-theoretical conceptualizations of the interaction structure. The four cases are ordered by increasing potential for normative conflicts and decreasing effectiveness of peer punishment. This order is conceptualized by the arrow on the right. Fig. 2: Typology of normative conflicts by commitment versus content related conflicts when structural conflict is present or absent (Source: Authors' compilation).

We illustrate our reasoning by giving examples and typical game-theoretical conceptualizations for each of the four cases. The first case of commitment-related conflicts over conjoint norms is the simplest and most prototypical one. A classic example is environmental protection (Diekmann and Preisendörfer 2003). All benefit if everybody contributes to environmental protection. The beneficiaries are also the target actors of the norm of eco-cooperation. The typical conflict is how much to contribute to eco-cooperation. In other words, people can disagree about the level of normative commitment. For example, is it sufficient to buy energy-saving lamps, or should one

also buy a fuel-thrifty car, or refrain from owning a personal car, or even abstain from flying to holiday destinations? A typical abstract conceptualization of these commitment-related conflicts is a public goods game where people can contribute more or less to a common pool, from which all group members benefit equally. Peer-punishment is most effective in this case, since it “only” coordinates the cooperation level. The second case of commitment-related conflicts over disjoint norms can be illustrated by the example of parental bargaining over a one-sided career break for childrearing. When expecting a child, a couple may be interested in one partner keeping to his or her career track to earn sufficient money for the family, while the other partner takes a career break to raise the child. In this case, one parent is the target of the norm and is expected to invest time and energy for child rearing, with the consequence of sacrificing some career advantages. The beneficiary, on the other hand, can continue his or her career. Normative expectations in this case are one-sided, so that this case satisfies the conditions of a disjoint norm. Beneficiary and target actor may bargain about how much the target actor should invest in child-rearing and how many career


options it is tolerable to lose. Disagreement may therefore emerge about the level of normative commitment the target actor is expected to fulfill. The strategic interaction structure may be generalized to an abstract ultimatum game. A proposer can decide how to distribute a common pie and a responder can accept or reject. Rejection can be regarded as altruistic punishment, since the pie is lost to both parties. The structurally weaker responder often adheres to a fairness norm and rejects offers that are too low. This norm is disjoint, since target actor and beneficiary fall apart. We expect punishment to be less effective for the enforcement of a requested level of commitment for the latter case of disjoint norms, compared to the former case of conjoint norms. The reason is that the conflict of interests in disjoint norms hampers the alignment of a mutually agreed level of commitment. In conjoint norms, there is no conflict of interests; both parties "merely" have to coordinate on how much self-interest should be restrained to benefit everybody in the group.

The third case of content-related conflicts over conjoint norms can be exemplified by distinct norms of environmental protection. Take the case of global climate protection. Some parties may argue that heavy polluters should contribute larger shares to global climate protection than low polluters. In contrast, other parties may adhere to an equality principle and may demand that all parties should contribute equally to global climate protection. This case exemplifies conjoint norms, since all target actors benefit equally from a cleaner and more protected global environment. However, target actors disagree about which normative contents should be followed to protect the environment. In more abstract terms, the strategic interaction structure can be conceptualized by a heterogeneous public goods game. For example, target actors can have different production costs to produce the same level of the public good. To stay with our example, in countries with a lower technological level, the fulfillment of certain environmental guidelines comes at higher relative costs compared to countries with a high technological level.⁴ We expect punishment to be less effective here than in the former cases, since disagreement about normative principles is harder to resolve compared to disagreement about the level at which commonly agreed principles should be adhered to.

The fourth case of content-related conflicts over disjoint norms is illustrated by parental bargaining over different educational principles in a family that divides labor between child-rearing and breadwinning. This situation describes a disjoint norm, where the child-raiser is the target actor of the norm to invest time and energy for child-rearing. The educator and the breadwinner may, however, disagree about the educational principles.

4 A comparable conflict over contents can be modeled by different per capita returns that target actors receive from the same production levels at the same production costs from all contributing target actors (e.g., Nikiforakis, Noussair, and Wilkening 2012). We will discuss this case in the next section.


For example, one may favor an anti-authoritarian, and the other an authoritarian, style. The underlying motive of both styles may be similar inasmuch as both are geared towards making the best of the education of the child – they are just different means to serve this end. In more abstract terms, the fourth case can be conceptualized by a heterogeneous ultimatum game. For example, a proposer and a responder can be heterogeneous in their contributions to a common pool, which needs to be divided. A high-contributing responder may demand more than equal shares from the common pool, while a low-contributing proposer has a structural advantage and may insist on equal shares. This yields a disjoint normative situation between proposer (beneficiary) and responder (target actor), where both adhere to different normative contents (an equity versus an equality norm). This represents an abstract model of a structurally advantaged breadwinner, who requests that the child-rearer follow his or her favored norm.

We argue that the conflict is largest in this fourth case: there is disagreement about the content of the norm, and there is a structural conflict of interests between target actor and beneficiary. Therefore, we expect punishment to be least effective in these situations. Comparing cases one and two with cases three and four, we expect disagreement about the level of commitment to be more easily resolvable than disagreement about normative contents. In commitment-related conflicts, everybody agrees about the normative principles. Punishment "merely" helps to align contribution levels in the group. In content-related conflicts, however, people disagree about which principle should be followed to produce the public good. This is a more fundamental conflict, where punishment is likely to provoke counter-punishment, feuds, and barely resolvable cleavages. This reasoning leads us to the proposed order of the level of conflict and the effectiveness of punishment for the four types of norm-related conflicts.

One theoretical reason for the order of the level of conflict and the effectiveness of punishment is that the types can also be ordered by the number of potential conflicts. The number of potential conflicts is increasing from the first to the last type. Commitment-related conflicts over conjoint norms have one source of conflict: the level of commitment. Commitment-related conflicts over disjoint norms have two sources of conflict: the level of commitment and the structural conflict (beneficiary vs. target actor). Content-related conflicts over conjoint norms also have two sources of conflict: the level of commitment and the content. Finally, content-related conflicts over disjoint norms have three sources of conflict: the level of commitment, the content and the structure (beneficiary vs. target actor).
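The counting argument can be summarized in a few lines. The sketch below merely restates the four cases and their sources of conflict as just described; it encodes the chapter's ordering rather than any new claim.

```python
# A compact restatement of the typology: each case with its potential sources
# of conflict, in the order proposed in the text.
cases = [
    ("commitment-related, conjoint", {"commitment"}),
    ("commitment-related, disjoint", {"commitment", "structure"}),
    ("content-related, conjoint",    {"commitment", "content"}),
    ("content-related, disjoint",    {"commitment", "content", "structure"}),
]

for name, sources in cases:
    print(f"{name:32s} sources of conflict: {len(sources)} ({', '.join(sorted(sources))})")
# The number of potential conflict sources rises from 1 to 3, mirroring the
# proposed order of conflict potential and of decreasing punishment
# effectiveness (very high, high, low, very low in Table 1).
```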


4 Experimental evidence on the effectiveness of peer punishment for different types of norm-related conflicts

In the following, we systematize experimental research on the effectiveness of punishment. This is done by discussing exemplary findings for each type of normative conflict.

4.1 Commitment-related conflicts over conjoint norms


A classic study on the effectiveness of punishment in public-good provisions is Fehr and Gächter (2002). In this study, groups of four could invest in a linear public good with a marginal per capita return of 0.4. This creates a situation where everybody's egoistic incentive is to contribute nothing, since every contributed monetary unit returns only 0.4 to the contributor. However, if everybody contributes, everybody earns more: when each group member contributes one unit, every member receives 4 ⋅ 0.4 = 1.6 units in return. This creates a conjoint cooperation norm, since group members are beneficiaries and target actors for the contribution to the public good. In one condition, group members could punish others after having seen their contribution level. In this way, different levels of commitment to the cooperation norm could be coordinated. In most cases, high contributors punished low contributors. This increased the commitment to almost full contributions. In a condition without punishment, however, cooperation decreased substantially (Figure 3). This finding has been replicated several times and has become a textbook result in behavioral game theory (e.g., Camerer 2003). In light of our theory, the study demonstrates the high effectiveness of punishment for commitment-related disagreements about how much to contribute to a conjoint cooperation norm.
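The incentive structure can be spelled out in a few lines. The sketch below uses the group size and marginal per capita return from the study; the endowment of 20 monetary units per period is assumed here for illustration.

```python
# A minimal sketch of the payoff logic in a linear public goods game with
# group size 4 and marginal per capita return (MPCR) 0.4, as in Fehr and
# Gaechter (2002). The endowment of 20 MUs is an illustrative assumption.
ENDOWMENT, MPCR, GROUP_SIZE = 20, 0.4, 4

def payoff(own_contribution, others_contributions):
    """Earnings: what you keep plus your share of everyone's contributions."""
    total = own_contribution + sum(others_contributions)
    return ENDOWMENT - own_contribution + MPCR * total

# Free-riding is individually optimal: each contributed unit costs 1 and
# returns only 0.4 to the contributor ...
print(payoff(0, [20, 20, 20]))   # 44.0
print(payoff(20, [20, 20, 20]))  # 32.0
# ... yet universal contribution beats universal free-riding, because each
# contributed unit returns 4 * 0.4 = 1.6 to the group as a whole.
print(payoff(0, [0, 0, 0]))      # 20.0
print(payoff(20, [20, 20, 20]))  # 32.0
```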

Fig. 3: Peer sanctioning enables cooperation norms in public-goods problems. Mean cooperation (MUs) over six periods without punishment and with punishment (Source: Fehr and Gächter 2002).


4.2 Commitment-related conflicts over disjoint norms

Disjoint norms are situations in which beneficiaries and target actors of a norm fall apart. The ultimatum game offers an abstract conceptualization of this conflict of interests. A proposer decides how much of a common pie to distribute to a responder. The responder can accept or reject. Rejection destroys all payments for both parties. The structurally stronger proposer is the target of a fairness norm to split equally (50 : 50). The structurally weaker responder benefits from this norm, because a rational and egoistic proposer would offer the smallest possible amount, which a rational and egoistic responder would accept (since this is more than nothing). A fairness norm can nevertheless be sustained in this disjoint situation for two reasons (Fehr and Schmidt 1999). First, the proposer splits close to equal if she is inequality-averse and prefers equal outcomes compared to unequal, but higher, personal earnings. Second, the proposer could believe that the responder is sufficiently inequality-averse and prefers equal zero earnings compared to unequal positive earnings. This also generates a fairness norm. Several studies support both arguments: proposers offer substantial amounts even without a rejection possibility, and the proposers' violations of a fairness norm are often punished by the responders' rejections (Camerer 2003).

Avrahami et al. (2013) conducted a frequently repeated ultimatum game experiment with changing partners. This design allows us to study the evolution of fairness norms and the effectiveness of punishment for norm enforcement. We reanalyzed their data to yield some support for our conjecture about the effectiveness of punishment. Our analysis showed that the adherence to a fairness norm of 50 : 50 quickly and strongly converges towards consensus (Figure 4 left). Violations of this norm are punished by rejections (Figure 4 right). Since the proportion of multilateral norm adherence strongly increases, the occurrence of punishment decreases over time, suggesting that the norm reproduces itself over and over again.

This experiment simulates an abstract scenario of commitment-related conflicts in disjoint norms. Mostly, the proposer and the responder agree that the proposer should offer some part of the pie to the responder. However, both can disagree about the proportion, that is, about the level of commitment to the fairness norm of a fully equal split. We argue that in disjoint norms, the conflict of interests between beneficiary and target actor makes punishment less effective compared to conjoint norms. Some evidence for this argument can be deduced from a comparison of Figures 3 and 4. In disjoint norms (the ultimatum game), the norm takes longer to evolve and breaks down at the end. The evolution of the conjoint cooperation norm in public-good provisions is faster and there is no endgame effect (i.e., no breakdown of cooperation).
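The inequality-aversion argument can be illustrated with the two-player utility function of Fehr and Schmidt (1999); the parameter values below are illustrative choices, not estimates from any of the studies discussed.

```python
# A two-player sketch of the inequality-aversion logic in Fehr and Schmidt
# (1999). Parameter values are illustrative, not estimates.
def fehr_schmidt_utility(own, other, alpha, beta):
    """Material payoff minus an envy term (alpha) and a guilt term (beta)."""
    envy = alpha * max(other - own, 0.0)
    guilt = beta * max(own - other, 0.0)
    return own - envy - guilt

PIE = 10.0
alpha_responder = 0.9   # illustrative degree of envy

# The responder accepts an offer only if accepting yields at least the
# utility of rejecting, which is 0 (both earn nothing).
for offer in [1, 2, 3, 4, 5]:
    u = fehr_schmidt_utility(offer, PIE - offer, alpha_responder, beta=0.25)
    print(offer, round(u, 2), "accept" if u >= 0 else "reject")
# With alpha = 0.9 the responder rejects offers of 3 or less out of 10 and
# accepts from 4 upwards; a proposer who anticipates this (or is
# inequality-averse herself) offers close to an equal split.
```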

Notes: The grey line in the left panel shows the proportion of unilateral full adherence to the equality norm over time (proposer offers 50 : 50). The black line shows bilateral full adherence to the equality norm (proposer offers and responder demands 50 : 50). The right panel shows the proportion of responders' peer sanctioning of violations of the fairness norm over time (rejections by responders). Fig. 4: Convergence of a disjoint fairness norm in a repeated ultimatum game (Source: Own compilation of reanalyzed data from Avrahami et al. 2013).

4.3 Content-related conflicts over conjoint norms

We conjecture that content-related conflicts are stronger than commitment-related conflicts. The dispute is not only about how much self-interest should be sacrificed to comply with the norm. It is a conflict about different principles, and about different conceptions of how to produce the norm. As in the argument above, we expect more conflict for disjoint than for conjoint norms.

An experimental implementation of content-related conflicts over conjoint norms is given by Nikiforakis, Noussair, and Wilkening (2012). As in the experiment by Fehr and Gächter (2002) discussed above, they designed a public-goods experiment with groups of four, where individuals could invest up to 20 monetary units in a public good in each period. The marginal per capita return was always such that individual contributions yielded lower returns, but collective contributions yielded higher average group earnings, creating a social dilemma. There was a baseline punishment condition like the one in Fehr and Gächter (2002). We call this condition "no normative conflict with punishment" (Figure 5). In this condition, there was a symmetric marginal per capita return of 0.4 for all group members. This yielded an individual return of 0.4 for each contributed unit and a 60 % group benefit from every unit contributed by others.

Notes: The left panel shows the probability of punishment and counter-punishment in collective-goods games with only commitment-related conflicts (white) and with content-related normative conflicts (black). The right panel shows the dynamical consequences of punishment and counter-punishment in terms of average collective-good provisions. The lines refer to a symmetric game (without content conflicts; upper line), an asymmetric game (with content conflicts; middle line) and a control treatment without punishment (lower line). Fig. 5: Normative conflict leads to feuds and less effective punishment (Source: Own compilation based on the data by Nikiforakis, Noussair, and Wilkening 2012).

Extending previous experiments, counter-punishment was allowed. This means punished individuals could punish back, which could again be retaliated against, and so forth. Hence, feuds in terms of punishment series were allowed. This treatment was contrasted with an asymmetric public goods game with punishment. We call this treatment "normative conflict with punishment" (Figure 5). The asymmetry was implemented in terms of different per capita returns. Prior to the experiment, subjects competed in a real-effort task for advantageous positions in the public goods game. Winners were selected to receive high marginal per capita returns (0.5), and losers were selected to receive low marginal per capita returns (0.3). This created a situation in which winners had higher returns from public-good contributions than losers. In this sense, winners had a stronger interest in the public good than losers.

The asymmetry in returns created normative conflicts between two possible contribution norms. First, actors could adhere to a libertarian norm and demand equal contributions from all group members (which would result in higher earnings for winners). Alternatively, actors could adhere to an equality (redistribution) norm and demand that all group members should earn equally (requiring higher contributions from winners). To put it differently, the first norm prescribed equal inputs (and implied

unequal outputs). The second norm prescribed equal outputs (and implied unequal inputs). Both treatments were compared with a control condition in which no punishment was implemented. Otherwise, this condition was similar to the last one mentioned inasmuch as per capita returns were asymmetric. We call this treatment "normative conflict without punishment".

The left panel in Figure 5 shows punishment and counter-punishment probabilities for both punishment treatments. Counter-punishment is about three times as likely and about 70 % more severe in the asymmetric treatment with normative conflict over contents (black bars) compared to the symmetric treatment with disagreement over commitments only (white bars). Counter-punishment can be regarded as an indicator of normative conflict for the following reason. If the punished party adheres to a different norm from the punisher, punishment is unjustified from the perspective of the punished party. A normatively adequate response is counter-punishment. In this sense, normative conflicts are measurable by punishment feuds.

The macro-level consequences of normative conflicts and counter-punishments are lower levels of cooperation. This is demonstrated by the right panel of Figure 5. The contributions in public-goods problems with normative conflicts (middle line) are considerably lower than in the condition without normative conflicts (upper line). This is due to more and harsher counter-punishments in the case of normative conflicts. Both treatments can be compared to a version without the possibility of punishment (right panel, lowest line, "normative conflict, no punishment"). Without punishment, normative conflicts cannot be resolved and cooperation breaks down completely. It is noteworthy that the breakdown of cooperation is stronger than in the symmetric version without punishment reported by Fehr and Gächter (2002).

Kingsley (2016) replicated this finding on the adverse effects of content-related normative conflict in a similar study. More importantly, however, he showed that punishment loses its effectiveness even without the possibility of counter-punishment. Taken together, these results indicate that persistent content-related conflicts destroy cooperation more severely than commitment-related conflicts.
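To see why the two norms pull in different directions, consider a small numerical sketch. The marginal per capita returns (0.5 and 0.3) are taken from the description above; the group composition of two winners and two losers and the endowment of 20 units are our own illustrative assumptions.

```python
# A sketch of how the two contribution norms clash in the asymmetric game.
# MPCRs are from the text; group composition (two of each type) and the
# endowment of 20 units are illustrative assumptions.
ENDOWMENT = 20
MPCR = {"winner": 0.5, "loser": 0.3}

def earnings(contributions):
    """contributions: dict mapping member id to (type, contribution)."""
    total = sum(c for _, c in contributions.values())
    return {i: ENDOWMENT - c + MPCR[t] * total for i, (t, c) in contributions.items()}

# Libertarian norm: equal inputs (everyone contributes 20) imply unequal outputs.
equal_inputs = {1: ("winner", 20), 2: ("winner", 20), 3: ("loser", 20), 4: ("loser", 20)}
print(earnings(equal_inputs))   # winners earn 40.0, losers earn 24.0

# Equality norm: equal outputs require unequal inputs (winners contribute more).
equal_outputs = {1: ("winner", 14), 2: ("winner", 14), 3: ("loser", 6), 4: ("loser", 6)}
print(earnings(equal_outputs))  # everyone earns 26.0; winners now carry more of the load
# Which norm a member prefers depends on her type, which is why punishment can
# be read as unjustified by the punished party and invites counter-punishment.
```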

4.4 Content-related conflicts over disjoint norms

We argue that the strongest conflict with the least effective punishment is the case of content-related conflicts over disjoint norms. Here, people disagree about the normative rule, and beneficiary and target actors have different interests. One example where people disagree about normative contents is when they have put different levels of effort into a collective good or have experienced different outcomes from it. A case where norms are disjoint is given if targets do not benefit from the norm. An exemplary abstract strategic setting of this kind is a heterogeneous ultimatum game.


Winter, Rauhut, and Helbing (2012) conducted such a heterogeneous ultimatum game experiment. Participants engaged in a real-effort task several days before the experiment. This yielded different monetary endowments for proposers and responders. These different endowments were based on different levels of effort. People could specify their offers to responders and their least acceptable offer from proposers for both roles (the “strategy vector method”). They were then assigned roles and partners, who typically had different endowments to contribute to the common pie. About half of the participants acted according to an equality norm. As proposers, they offered an equal split to the responders, and as responders, they demanded an equal split. The other half of the participants, however, acted according to an equity norm. As proposers, their offers were proportional to their effort. They offered less to the responders if the responders had contributed less than themselves. Likewise, they offered more to the responders if the responders had contributed more than they had. About half of the responders followed this pattern and demanded offers that were proportional to their level of effort. This norm can be regarded as an alternative fairness rule where outcome is proportional to input. The two different norms generate conflict if the proposer has contributed more than the responder, and if the proposer holds an equity and the responder an equality norm. In this case, the proposer offers less than half to the responder, while the responder requests half of the pie. Winter, Rauhut, and Helbing (2012) estimated normative types (equity versus equality norm followers) and analyzed the likelihood of conflicts for pairs holding similar and different norms. Conflicts in the ultimatum game were operationalized as rejected offers. It turned out that conflicts occurred substantially more often if actors disagreed about the normative content than if they adhered to the same normative content (Figure 6). This gives evidence for our theory that content-related conflicts in disjoint norms represent the most severe case of conflict.
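A compact way to see when the two norms collide is sketched below; the endowment numbers are illustrative, and the decision rules simply paraphrase the equality and equity (proportional-to-effort) norms described above.

```python
# A sketch of when the two fairness norms clash in the heterogeneous
# ultimatum game. Numbers are illustrative.
def equality_offer(pie, proposer_input, responder_input):
    return 0.5 * pie

def equity_offer(pie, proposer_input, responder_input):
    return pie * responder_input / (proposer_input + responder_input)

def conflict(pie, proposer_input, responder_input, proposer_norm, responder_norm):
    offer = proposer_norm(pie, proposer_input, responder_input)
    demand = responder_norm(pie, proposer_input, responder_input)
    return offer < demand  # rejection: the responder demands more than is offered

pie = 10
# Proposer contributed more (7 vs. 3). Equity proposer meets equality responder:
print(conflict(pie, 7, 3, equity_offer, equality_offer))    # True  -> rejection
# Both hold the same norm, whichever it is:
print(conflict(pie, 7, 3, equality_offer, equality_offer))  # False
print(conflict(pie, 7, 3, equity_offer, equity_offer))      # False
```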

40

20 10 0

Content

30

Commitment

Conflict rate in %

50

Notes: Conflict rates represent rejections in ultimatum games. No normative conflict represents cases where proposer and responder adhere to the same norm (both equality or both equity). Normative conflict represents cases where proposer and responder hold different norms (one equality and the other equity).

Fig. 6: Conflict rates without (content-related) normative conflicts (left) and with (content-related) normative conflicts (right) (Source: Own compilation based on the data by Winter, Rauhut, and Helbing 2012).

254 | Heiko Rauhut and Fabian Winter

5 Implications Our terminology developed here sorts our reasoning about conflicts and promotes a more multi-faceted view about the underlying mechanisms of normative behavior. Our theoretical arguments may invite people to rethink cooperation failures observed in the lab and in the field. One such example may be the explanation of seemingly “antisocial” behavior. Hermann, Thöni, and Gächter (2008) conducted a public-goods experiment with punishment (see section 4.1) in several different countries. They found that some societies tend to limit punishment to low contributors, while others also punish high contributors. The authors argued that their results might best be explained by heterogeneity in “civic duty norms” across societies: “[I]f participant pools held different social norms with regard to cooperation and free-riding, they actually might have punished differently” (Hermann, Thöni, and Gächter 2008:1365, emphasis added). In contrast, our theoretical sketch developed here suggests that “antisocial” punishment is an indicator of normative conflicts within societies. One subgroup of a society might try to promote a high level of commitment to the collective good and only punish under-contributors. At the same time, another group might be discouraged by other people trying to force them to do anything, even if it was in their best interest. This group perceives high-contributors as overly ambitious, vain, or even hypocritical, and fears that they raise the bar of cooperation too high. A similar norm of modesty has already been reported in the Hawthorne experiments by Roethlisberger and Dickson (2003:384 [1939]). Instead of enforcing high contributions, they punish those who contribute too much. Norm violations are thus punished by two opposed groups: over- and under-contributors.

6 Conclusion This chapter outlines new theoretical ideas about normative conflicts and provides a new typology. Four types are distinguished based on the distinction between conjoint and disjoint norms by Coleman (1990) and our own classification of commitment-related and content-related normative conflicts (Winter, Rauhut, and Helbing 2012). We order the four types of normative conflicts according to their conflict potential and their effectiveness with which conflict can be restored by punishment. So far, the literature discussed commitment-related conflicts as the main problem. Here, people must agree on the extent to which social norms should restrain their selfinterest. Despite agreement that a specific norm should be followed, “undercutting” is regarded as legitimate by some and unacceptable by others. Thus, different degrees of normative commitment are an important source of normative conflict. However, we conjecture that content-related conflicts are more severe than commitment-related ones. Consequently, we expect punishment in content-related con-

Types of Normative Conflicts and the Effectiveness of Punishment |

255

flicts to be less effective in restoring cooperation. Despite actors deciding to be cooperative and contributing an appropriate share to the commons, they hold different norms of what they consider to be fair. The driving factors of commitment-related conflicts are different levels of selfishness or diverging beliefs about the cooperativeness of others. We expect people to be relatively open to persuasion to be more cooperative when others are also cooperative. We argue that people are also relatively open to argumentation that others are more cooperative than they had believed. In contrast, we expect content-related conflicts to be less easy restorable. When actors hold distinct convictions (i.e., when there is normative conflict), different normative viewpoints tend to be strongly defended, making more conflict resolution necessary. Others must be made amenable to different points of view. Communication, clarification and approval of distinct moral principles need more time and energy and more complex kinds of conflict resolution than punishment. For example, taking turns can be one solution for the peaceful coexistence of different moral principles (Winter 2014). An obvious next research step would be the development of an empirical research design through which all types of conflicts can be studied in a more comparable way. Our comparison over different experiments, subject pools and designs is limited to providing some insights and novel ideas. A next step would require a setup in which only the types of conflicts vary. The most direct test of our theoretical conjectures would be a laboratory design in which all types of normative conflicts were implemented and subjects were randomly allocated to different types of normative conflicts. A measure of normative conflict could be the extent of counter-punishment in all four types of normative conflicts. Ideally, such a laboratory design would go hand in hand with an analytical model, from which the hypothesized extent of conflict and effectiveness of punishment can be deduced. Despite not having formulated a rigorous theoretical model and having not provided results from a tailor-made laboratory experiment, we believe that our typology has many new implications for the understanding of when social order emerges spontaneously and how it can be organized by mechanism design. We believe that our new perspective can guide social theory and be applied to conflict resolution in the understanding and management of social norms, cooperation, and conflicts.

Bibliography [1] [2]

Ashworth, Tony. 1980. Trench warfare 1914–1918: the live and let live system. New York: Holmes & Meier. Axelrod, Robert M. 1984. The evolution of cooperation. New York: Basic Books.

256 | Heiko Rauhut and Fabian Winter

[3]

[4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19]

[20] [21] [22] [23] [24] [25]

[26] [27] [28]

Avrahami, Judith, Werner Güth, Ralph Hertwig, Yaakov Kareev, and Hironori Otsubo. 2013. “Learning (not) to yield: An experimental study of evolving ultimatum game behavior.” The Journal of Socio-Economics 47:47–54. Berger, Roger, and Heiko Rauhut. 2014. “Reziprozität und Reputation.” Pp. 715–742 in Handbuch Modellbildung und Simulation, edited by N. Braun, and N. Saam. Wiesbaden: VS Verlag. Camerer, Colin. 2003. Behavioral game theory: Experiments in strategic interaction. Princeton: Princeton University Press. Coleman, James S. 1990. Foundations of social theory. Cambridge, MA: The Belknap Press of Harvard University Press. Dahrendorf, Ralf. 1958. Homo Sociologicus. Ein Versuch zur Geschichte, Bedeutung und Kritik der Kategorie der sozialen Rolle. Opladen: Westdeutscher Verlag. Diekmann, Andreas, and Peter Preisendörfer. 2003. “The behavioral effects of environmental attitudes in low-cost and high-cost situations.” Rationality and Society 15(4):441–472. Diekmann, Andreas. 2010. “Not the First Digit! Using Benford’s Law to Detect Fraudulent Scientific Data.” Journal of Applied Statistics 34(3):321–329. Durkheim, Emile. [1897] 1997. Suicide. Glencoe, IL: Free Press. Elster, Jon. 1989. The Cement of Society: A Study of Social Order. Cambridge: Cambridge University Press. Fehr, Ernst, and Simon Gächter. 2000. “Fairness and Retaliation: The Economics of Reciprocity.” Journal of Economic Perspectives 14:159–181. Fehr, Ernst, and Simon Gächter. 2002. “Altruistic Punishment in Humans.” Nature 415(10):137– 140. Fehr, Ernst, and Klaus M. Schmidt. 1999. “A Theory of Fairness, Competition and Cooperation.” Quarterly Journal of Economics 114(3):817–868. Gambetta, Diego. 2009. “Signaling.” Pp. 168–194 in The Oxford Handbook of Analytical Sociology, edited by P. Hedström, and P. Bearman. Oxford: Oxford University Press. Hechter, Michael, and Karl-Dieter Opp. 2001. “Introduction.” Pp. xi–xx in Social norms, edited by M. Hechter, and K.-D. Opp. New York: Russell Sage Foundation. Herrmann, Benedikt, Christian Thöni, and Simon Gächter. 2008. “Antisocial punishment across societies.” Science 319(5868):1362–1367. Hyman, Richard, and Ian Brough. 1975. Social Values and Industrial Relations: Study of Fairness and Inequality (Warwick Studies in Industrial Relations). Oxford: Blackwell Publishers. Kingsley, David C. 2016. “Endowment heterogeneity and peer punishment in a public good experiment: Cooperation and normative conflict.” Journal of Behavioral and Experimental Economics (in press). Nikiforakis, Nikos, Charles N. Noussair, and Tom Wilkening. 2012. “Normative conflict and feuds: The limits of self-enforcement”. Journal of Public Economics 96(9):797–807. Nowak, Martin A., and Karl Sigmund. 1998. “Evolution of Indirect Reciprocity by Image Scoring.” Nature 393(6685):573–577. Olson, Mancur. 1965. The Logic of Collective Action. Cambridge, MA: Harvard University Press. Parsons, Talcot. 1937. The Structure of Social Action. Glencoe, IL: Free Press. Röthlisberger, Fritz J., and William J. Dickson. 2003. The early sociology of management and organizations. Vol. 5, Management and the Worker. London and New York: Routledge. Przepiorka, Wojtek, and Diekmann, Andreas. 2013. “Individual heterogeneity and costly punishment: a volunteer’s dilemma.” Proceedings of the Royal Society of London B: Biological Sciences 280(1759):20130247. Sigmund, Karl. 2010. The Calculus of Selfishness. Princeton, NJ: Princeton University Press. Stouffer, Samuel A. 1949. The American Soldier. Princeton, NJ: Princeton University Press. 
Ullmann-Margalit, Edna. 1977. The Emergence of Norms. Oxford: Clarendon Press.

Types of Normative Conflicts and the Effectiveness of Punishment

|

257

[29] Voss, Thomas. 2001. “Game-Theoretical Perspectives on the Emergence of Social Norms.” Pp. 104–136 in Social norms, edited by M. Hechter, and K.-D. Opp. New York, NY: Russell Sage Foundation. [30] Wedekind, Claus, and Manfred Milinski. 2000. “Cooperation Through Image Scoring in Humans.” Science 288(5467):850–852. [31] Winter, Fabian, Heiko Rauhut, and Dirk Helbing. 2012. “How norms can generate conflict: An experiment on the failure of cooperative micro-motives on the macro-level.” Social Forces 90(3):919–948. [32] Winter, Fabian. 2014. “Fairness Norms Can Explain the Emergence of Specific Cooperation Norms in the Battle of the Prisoner’s Dilemma.” The Journal of Mathematical Sociology 38(4):302–320.

Ben Jann and Elisabeth Coutts

Social Status and Peer-Punishment: Findings from Two Road Traffic Field Experiments Abstract: In a seminal experiment, Doob and Gross (1968) examined the influence of social status on peer-punishment of norm violations in traffic. They observed an inverse relationship between the economic status indicated by a car that was blocking an intersection and the punishment meted out to the driver of that car, with “punishment” taking the form of a honk of the car horn. In a more recent experiment, Diekmann et al. (1996) noted the status and reactions of the cars blocked by a single midstatus car. Blocked drivers at the wheel of a higher-status car were found to punish more aggressively than drivers of a lower-status car. Our study employs a combined design to separate the effects of driver and blocker status. In two field experiments, we varied the status of the norm-violating car and recorded the status of the blocked driver’s (i.e., the experimental subject’s) car. Our results provide evidence that social distance facilitates peer-punishment. Punishment was expressed less readily when the blocked and blocking cars indicated a similar social status.

1 Introduction Various studies have examined the relationship between socioeconomic status and the peer-sanctioning of norm violations (such as unnecessarily blocking the way) in road traffic. Most of these have focused on the effect of the status of only one of the parties, either that of the norm violator or that of the punisher. In a seminal study examining the effect of the norm violator’s status, Doob and Gross (1968) measured horn-honking response times as an indicator of drivers’ aggression in response to being blocked by an experimental car at traffic lights in the United States. Two different blocking cars were used, each indicating a different social status. As response times were significantly shorter, and responses occurred significantly more frequently, when a driver was frustrated (i.e., blocked) by an automobile indicating lower status, Doob and Gross concluded that the presumable social status of the blocking driver and aggression expressed toward that blocking driver are inversely related. Deaux (1971) found a similar yet non-significant effect in one roughly contemporaneous replication Note: We are indebted to Renato Marioni and Stephan Suhner (Experiment 1), and to Jörg Rothe, Heiko Schmiedeskamp, Hélène Venningen, Jelena Curcic, and Jakub Swiech (Experiment 2), for their support in designing the experiments, conducting the fieldwork, and/or preparing the data. We thank BMW (Schweiz) AG for providing a vehicle (free of charge) for Experiment 2. Experiment 1 has also been reported in Jann, Suhner, and Marioni (1995) and in Jann (2009); Experiment 2 has also been reported in Rothe and Schmiederkamp (2006). https://doi.org/10.1515/9783110472974-013

260 | Ben Jann and Elisabeth Coutts

of the experiment in the U.S., but Chase and Mills (1973) found the opposite effect in another such replication. The effect reported by Doob and Gross was, however, recently replicated in a Japanese study that found longer honking latencies in response to being blocked by a high-status car than a low-status car, as long as the car did not display a beginning driver’s plate (Yazawa 2004). Finally, drivers in a study by McGarva and Steiner (2000) responded more aggressively to provocation from a lowstatus driver than from a high-status driver. Other studies have instead looked at the effect of the blocked driver’s status on a honking response. For example, Diekmann et al. (1996) also blocked drivers at traffic lights (in Germany) and recorded horn-honking response times, but held the status of the blocking car constant while measuring the status indicated by the blocked car (containing the potential punisher). Diekmann et al. found a positive relationship between the status of the driver and the degree of aggression he or she displayed toward the driver of the blocking car (with the exception of the lowest class drivers, who acted fairly aggressively as well). Results from such horn-honking studies are traditionally discussed in the context of theories of aggressive behavior. The idea is that being blocked causes frustration and anger on the side of the blocked driver (Baron 1976; Lajunen and Parker 2001; Lajunen, Parker, and Stradling 1998; Lawton and Nutter 2002), who may then react with responses such as horn honking, obscene gestures, flashing high beams, or tailgating (Hennessy and Wiesenthal 1997; Parker, Lajunen, and Summala 2002; Turner, Layton, and Simons 1975). Whether such behavior is shown depends on both the aggressor’s traits and the situation in which the aggression occurs (for a corresponding general aggression model, see Anderson and Bushman 2002; for a similar model applying specifically to driver behavior, see Shinar 1998). A distinction is also made between hostile aggression as an impulsive response intended to harm a victim, and instrumental aggression, which is a premeditated action used as a means to achieve some goal other than harming a victim (Anderson and Bushman 2002:29). Although drivers who engage in horn honking are also likely to engage in other forms of mild aggression (Novaco 1991; Shinar 1998) and horn-honking behavior varies according to factors that have been observed to promote aggressive responses in general (such as uncomfortably hot temperatures, stressful circumstances, or increased anonymity: see Baron 1976; Kenrick and MacFarlane 1986; Hennessy and Wiesenthal 1999; Ellison et al. 1995), it seems obvious that honking often has an instrumental component, that is, the attempt to motivate the blocker to move his or her car. However, whether honking represents an expression of hostile or of instrumental aggression (which can be difficult to separate, as is pointed out by various authors: see Doob and Gross 1968; McGarva and Steiner 2000; Shinar 1998), there is reason to believe that drivers blocked at an intersection by an experimental car will find the experience frustrating and are likely to retaliate by honking their own car horns. An explanation for why blocked drivers honk their horn is that the frustrating situation makes them angry. Anger has been defined as an attempt to adjust social behav-

Social Status and Peer-Punishment: Findings from Two Road Traffic Field Experiments |

261

ior when someone else has violated rules or norms. It is an emotional state that often results in aggressive behavior, and may indeed be associated with aggressive behavior while driving (Lawton and Nutter 2002). There are, however, differences between people in terms of their disposition toward becoming angry, both in general (Pitkänen 1973; Verona, Patrick, and Lang 2002) and in response to frustrations on the road (Deffenbacher et al. 2001; Deffenbacher, Oetting, and Lynch 1994; McGarva and Steiner 2000; Yagil 2001). Anger is expressed less often than it is experienced, both while driving (Lajunen and Parker 2001) and in general (Ramirez, Santisteban, Fujihara, and Van Goozen 2002). One factor thought to influence the experience and the expression of anger is the social status of the angered person. Various theorists have discussed the role of emotional expression and suppression in establishing and maintaining social hierarchies (Clark 1990; Keltner, Gruenfeld, and Anderson 2003; Kemper 1978; Kemper 1987; Lovaglia and Houser 1996; McKinnon 1994; Ridgeway and Johnson 1990; SmithLovin 1990), predicting that those who occupy higher status positions will experience less negative affect than those of lower status and will also express any anger they experience more freely. There is some empirical evidence that this is the case. Using broad-based population samples, Haukkala (2002) and Schieman (2003) found either no SES-related differences in trait anger (Haukkala) or less anger in higher economic classes (Schieman). However, both authors found that anger, when experienced, was more likely to be expressed by those with better education or higher income. Other studies have found similar effects on the expression of anger in the workplace, with those of higher occupational status more likely to express their anger than those of a lower occupational status, although those with lower status often report experiencing more anger at work (Lively 2000; Sloan 2004).¹ These results correspond with those showing a better fit between emotion and behavior (or less inhibited behavior) in high-status individuals, whether that higher status occurs naturally or has been experimentally produced (Anderson and Berdahl 2002; Hecht and LaFrance 1988; Keltner, Gruenfeld, and Anderson 2003). One study (Galinsky, Gruenfeld, and Magee 2003) artificially varied the status of subjects within their experimental groups, and found that participants of greater status were quicker to stop an irritating noise than those of lesser status. The proneness of high-status individuals to act in accordance with their emotions or wishes may also be reflected in more insistent driving techniques. For example, Taubman-Ben-Ari, Mikulincer, and Gillath (2004) found that placing high value on a driving style that reflected behaviors such as honking and flashing high beams at other drivers (“angry driving”) was positively associated with higher scores on Burger and Cooper’s (1979) Desirability for Control scale. In other words, those who agreed with statements such as “I would

1 There is evidence that anger expression differs by social group, and also that people use the emotion expressed in a reaction to a trying or frustrating situation as a cue to an actor’s social status. A higher status is ascribed to the person with the angry reaction (Conway, DiFazio, and Mayman 1999: Study 2; Tiedens 2001; Tiedens, Ellsworth, and Mesquita 2000).

262 | Ben Jann and Elisabeth Coutts

prefer to be a leader than a follower” and disagreed with statements such as “Others usually know what is best for me” also reported more honking and light-flashing behavior while driving. There is, however, reason to believe that the relevant determinant of the expression of anger or irritation is less the status of the angered person than the difference in status between the angered person and the person who angered him or her. Some empirical evidence indicates that aggression “flows downward” in the status chain. For instance, subjects in Kuppens, Van Mechelen and Meulders’ (2004) experiment reported being more likely to express anger toward a target of lower relative status than toward one of higher relative status, a result also obtained by Allan and Gilbert (2002). Using a probability sample of the U.S. population, Sloan (2004) found that workers were more likely to express anger toward their subordinates than toward their supervisors. Such results are also consistent with evidence from various animal studies on the establishment and maintenance of social hierarchies, in which aggression is found to flow downward (Barroso, Alados, and Boza 2000). Alternatively, sanctioning behavior in road traffic may reflect a more general phenomenon of lower intra-group aggression or higher inter-group aggression. One mechanism for such an effect is a greater willingness to cooperate and a reduced propensity to aggress against actors whom one perceives as belonging to the same group. Research on social categorization and inter-group behavior (Billig and Tajfel 1973; Brewer and Kramer 1985; Mummendey and Schreiber 1983; Robinson 1996; Tajfel 1982a; Tajfel 1978; Tajfel 1982b; Tajfel et al. 1971; Turner, Brown, and Tajfel 1979) has revealed a strong bias toward favoring the in-group in many contexts – importantly, even “[. . . ] in the absence of comparison with any other groups” (Brewer 1979:321; also see Kramer and Brewer 1986). In-group favoritism implies that aggression “flows outward”. There are also arguments for the reverse effect: that is, that aggression “flows inward”. Gould (2003) argues that conflict occurs less often in relationships in which there is a clear hierarchy than in “symmetrical relationships”. The reason is that people have a strong tendency to battle out a ranking if their positions are ambiguous due to the lack of an established hierarchy. Whichever of these hypotheses applies, if the punishing behavior can be predicted from the difference in status between two parties to a conflict, previous studies on horn-honking responses have examined only one half of the equation. Some, such as the study conducted by Doob and Gross (1968), found that low-status blocking cars elicited faster reactions and thus higher levels of sanctioning than high-status blocking cars. Studies such as the one from Diekmann et al. (1996) report that high-status drivers reacted more quickly to having their progress impeded than low-status drivers. In the current study, we investigate a possible interaction effect between the status of the blocker and the status of the frustrated driver. We assume that the disparity between the statuses of the actors (rather than the status of one or the other per se) determines the aggressiveness displayed in the blocked-intersection situation.

Social Status and Peer-Punishment: Findings from Two Road Traffic Field Experiments | 263

With respect to the various hypotheses about how the status differences matter, we conjecture that all of them might be true, but that they apply to different situations. For example, in a setting where there is competition for hierarchical positions, one could assume with Gould (2003) that most aggression occurs among actors with ambiguous positions. Likewise, in a situation in which lower status actors depend on higher ranking actors, such as in a workplace setting where the latter can exercise power over the former, aggression may “flow downward”. Conversely, the underdogs may rebel against the oppressors, if given a chance to do so: that is, if the situation is such that they do not have to fear further repression. In this case, aggression would “flows upward”. In the situation of blocked vehicles under study, however, we believe that the status indicated by the cars of the two drivers mostly functions as a device for social categorization, giving rise to the mechanisms of in-group favoritism. Hence, we assume that sanctioning behavior “flows outward”: that is, sanctioning behavior is expressed more readily if the status between the two actors is different, independent of the direction of the difference. Note, however, that the same pattern could also result if, for example, higher status drivers punish lower status drivers because they feel more entitled to use the road, and, at the same time, the “underdogs” take the chance to rebel because they can do so without fearing retaliation.

2 Methods Experiment 1 We blocked cars at traffic lights using an experimental car and measured hornhonking response times in a similar manner to Doob and Gross (1968). A pre-test was conducted to practice the blocking method and test our ability to capture the relevant information on our experimental subjects validly and reliably. Our experiment was conducted on two consecutive Saturday mornings in spring 1995 at an intersection with relatively light traffic in Bern, Switzerland. On the first Saturday we used an experimental car indicating a high social status (a black 1995 Audi A6 2.6L), and on the second a car indicating a low social status (a blue 1989 Volkswagen Golf C1 Mark III). Traffic conditions were similar on both mornings. As in other studies, the use of this method reflects the presumption that the car driven by a subject is (to some degree) assumed by the drivers of other cars to reflect his or her social status (Marsh and Collett 1986 provide evidence that this is the case). It also assumes that other subjects are able to perceive information such as the make of an automobile, which seems reasonable since drivers appear to note a wide variety of information about other drivers spontaneously (Knapper and Cropley 1980). An experimental trial was initiated only when the experimental car could be stopped as the first car in a line formed at a red light, and when it was followed by just

264 | Ben Jann and Elisabeth Coutts

one car, whose driver’s behavior was being recorded.² After the light turned green, the experimental car remained stopped until the driver in the car behind it honked. The experimental car contained a driver and two visible observers, all male. One of the observers measured the time between the light’s changing and the honking response. Using the mirrors, the other observer noted some information about the blocked subject, including the sex and estimated age of the driver, as well as the make, model, and status indicated by the blocked vehicle (in terms of one of three hierarchical categories based on the car’s make, model, and approximate age). If a blocked subject did not respond within the twelve-second period during which the light was green, the case was considered censored at t = 12. In total, 123 valid cases were observed, approximately 60 on each of the mornings, of which 26 represented censored measurements. Experiment 2 In the second experiment, we blocked cars in a one-way street with relatively light traffic in the inner city of Zurich. We placed the experimental car approximately 30 meters down the road from the entry into the street, positioned slightly diagonally so that approaching vehicles could not pass and that the diver of the blocking car could be seen. After conducting several pretests, our experiment was carried out on a sunny Tuesday, between 10:30 a.m. and 5:40 p.m., in summer 2005. We used two experimental cars, one indicating high social status (a dark silver 2005 BMW 530i limousine; selling price 64,000 CHF) and one indicating low social status (a silver 1995 VW Golf 1800 Rolling Stones; selling price 24,000 CHF). Cars were switched about every 20 trials. We also varied the sex of the driver in the blocking car, switching drivers about every 10 trials. Since traffic conditions and temperature changed during the day, we control for temperature and traffic density in the analyses below.³ An experimental trial was initiated when a vehicle entered the street after the experimental car was in position. The experimental car remained stopped until the blocked car (or one of the subsequent cars, if several vehicles entered the street) honked. Each trail was taped by two video cameras, one hidden below a piece of clothing in the back of the experimental car and one operated by a confederate hiding in a hedge on the side of the street. Two further confederates, one on each side of the street, took notes about the blocked car, its driver, and the horn honking reaction using standardized forms (including information such as the time until the horn was 2 All trials were conducted at the same intersection between a main street and a side street. The trials were conducted in the side street, alternating the direction between each trial. 3 Temperature data for Zurich (one measurement every 10 minutes) was obtained from MeteoSwiss (Federal Office of Meteorology and Climatology). Information on hourly traffic flow (number of vehicles counted by the traffic sensors in Zurich) was obtained from ASTRA (Federal Roads Office). We used linear interpolation between measurements to match temperature and traffic density to the individual trials.

Social Status and Peer-Punishment: Findings from Two Road Traffic Field Experiments | 265

honked, the number of blocked vehicles, the sex and estimated age of the driver, and the status indicated by the foremost blocked vehicle). All collected information was validated and complemented based on an analysis of the videos later on. For example, exact measurements of the horn-honking response times were obtained from the videos (in a few cases, the honking was not audible on the video, in which case we used the measurement taken in the field; the correlation between the measurements taken in the field and the videos is r = 0.98). Based on stills from the videos, an automobile expert coded the exact make and model of the blocked car, its approximate production year, and its selling price. In the analyses below we exclude a handful of observations because, for example, the approaching car turned into a parking spot instead of being blocked, or because information on some key variables was missing or inconsistent (due to the failure of the observers to take notes, or due to missing video recordings). We also exclude 10 observations because the blocked vehicle was a delivery van or motorcycle, for which a status comparison to the experimental car is difficult. In total, 106 valid cases are available for the analysis. In 10 cases, the honking response came from a successive vehicle instead of the foremost blocked car (detailed data was collected only for the foremost car). We treat these cases as censored in our analysis. We use two measures for the status of the blocked vehicles: (a) a subjective classification in terms of one of three hierarchical categories similar to Experiment 1 (lower, middle, higher), and (b) the estimated monetary value of the vehicle. The monetary value is equal to the selling price after applying a yearly depreciation of 5 %. There is a clear relation between the two measures: the average monetary values are 16,497 CHF, 24,971 CHF, or 37,573 CHF for vehicles classified as lower, middle, or higher status. Data analysis Since there are censored response times, the techniques of event history modeling are the most appropriate statistical tools for analyzing the data (Diekmann et al. 1996:763). We use the product-limit method to estimate survival curves as descriptive measures. Multivariate analysis employs the semi-parametric Cox regression model (Cox 1972; Diekmann and Mitter 1984). In the Cox model, the hazard rate r(t) of horn-honking (i.e., the probability of a horn-honking event at time t, conditional on its not having yet occurred) is modeled as the product of an unspecified baseline hazard rate and the exponent of a linear function of the covariates. In the following analysis we will report the exponents of the estimated coefficients, since they can be interpreted in a straightforward manner as multiplication effects on the hazard rate, that is, as hazard ratios (effects greater than one imply an increase in the hazard rate and faster honking reactions; effects lower than one imply a decrease in the hazard rate and slower honking reactions). The Cox regression assumes proportional hazards at each point in time. The applicability of this assumption was tested, and deviation from it was negligible for the models discussed below (see last row in Table 1).

266 | Ben Jann and Elisabeth Coutts

3 Results Figure 1 shows the horn-honking survival functions from the two experiments. In Experiment 1, the time-window in which a honking reaction could occur was restricted to 12 seconds. About 80 percent of all blocked drivers honked within these 12 seconds (i.e., the survival function drops down to about 20 %). In Experiment 2, there was no such restriction, as the blocking car remained stopped until the first honking reaction occurred (the maximum time recoded in our data is 60 seconds). In Figure 1, we only display the survival curve for the first 20 seconds, within which about 80 % of the blocked subjects honked. Overall, honking reactions occurred faster in Experiment 1 than in Experiment 2. The reason is that in Experiment 2, reaction time was measured from when the blocked vehicle entered the street, whereas in Experiment 1, time was measured from the moment the lights turned green, with the blocked vehicle already in position behind the experimental car. 1 .9 .8 .7 .6 .5 Experiment 2 .4 .3

Experiment 1

.2 .1 0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 Seconds Fig. 1: Horn-honking survival functions.

To test our hypothesis that honking reactions are affected by the status difference between the experimental car and the blocked vehicle, Table 1 displays the results of several Cox regressions (for descriptive statistics, see Table 2 in the Appendix). In Model 1, which is based on the data of Experiment 1, we see that a status difference between the two vehicles accords with a significant increase in the hazard rate of honking: a onepoint status difference increases the hazard rate by about 40 %. The corresponding results from Experiment 2 (Model 3) are very similar (showing a significant increase in

Social Status and Peer-Punishment: Findings from Two Road Traffic Field Experiments | 267

Tab. 1: Multivariate analysis of honking response times (z-values in brackets). Experiment 1

Experiment 2

Status

Status

Model 1 Absolute difference in status or valuea

Model 2

1.42∗

Model 3

Value Model 4

1.45∗

(2.16)

Model 5

Model 6

2.40∗∗

(2.44)

(3.07)

Downward difference in status or valuea

1.11 (0.46)

1.07 (0.34)

2.99∗ (2.50)

Upward difference in status or valuea

1.90∗ (2.57)

2.02∗∗ (3.17)

2.11∗ (2.16)

Status of experimental car (0 low, 1 high)

1.15 (0.67)

0.67 (−0.99)

Driver in experimental car is female (0/1)

0.85 (−0.71)

0.46∗ (−2.08)

0.52∗ (−2.23)

0.66 (−0.88)

0.52∗∗ (−2.81)

0.53∗∗ (−2.69)

0.51∗∗ (−2.84)

0.50∗∗ (−2.91)

Driver in blocked car is female (0/1)

0.64 (−1.64)

0.55∗ (−2.08)

1.61+ (1.95)

1.50+ (1.66)

1.50+ (1.67)

1.50+ (1.68)

Blocked driver aged 18 through 30 (0/1)

1.45 (1.27)

1.33 (0.97)

0.48∗ (−2.42)

0.47∗ (−2.47)

0.47∗ (−2.41)

0.47∗ (−2.40)

Blocked driver aged 56 or older (0/1)

1.72∗ (2.07)

1.78∗ (2.18)

1.46 (1.12)

1.44 (1.07)

1.27 (0.72)

1.31 (0.80)

Business vehicle (0/1)

2.38∗ (2.01)

2.05 (1.64)

2.28+ (1.94)

2.37∗ (2.00)

Temperature

1.13 (0.75)

1.24 (1.25)

1.18 (0.99)

1.16 (0.91)

Traffic density

1.09 (1.05)

1.12 (1.37)

1.10 (1.16)

1.10 (1.17)

Direction of entry into road (0 left, 1 right)

2.10∗∗ (2.65)

2.14∗∗ (2.72)

2.15∗∗ (2.74)

2.14∗∗ (2.73)

Number of trials (events) Likelihood ratio χ 2 (df ) Proportional-hazards test (p-value)

123(97)

123(97)

13.5(5)∗

16.0(6)∗

0.784

0.800

106(96)

106(96)

106(96)

106(96)

26.4(10)∗∗ 31.0(11)∗∗ 29.8(10)∗∗∗ 30.2(11)∗∗ 0.414

0.476

0.640

0.728

Notes: Displayed are hazard ratios from proportional-hazards models (Cox regressions). Reference age group: drivers aged 31 through 55. a Difference in status (0: same level, 1: low or high vs. middle, 2: low vs. high) (models 1–4) or difference in log value (models 5/6) between blocked vehicle and blocking vehicle. The difference is downward (upward) if the status/value of the blocked vehicle is higher (lower) than the status/value of the blocking vehicle. + p < 0.10, ∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001.

268 | Ben Jann and Elisabeth Coutts

the hazard rate of about 45 % for a one-point status difference).⁴ Furthermore, if the status difference is measured in terms of the difference in monetary value between the vehicles, we see a very clear and statistically significant effect (Model 5). Since we use a logarithmic specification, the coefficient of 2.40 can be interpreted as about a 2.40ln(2) − 1 = 83 % increase in the hazard rate if the higher status vehicle is worth about twice as much as the lower status vehicle, compared to a situation in which the value of both vehicles is the same. Overall, these results provide clear evidence for the “difference hypothesis” (the hypothesis that sanctioning is exerted more readily if there is a status difference between the two actors). Also note that the above models fit the data significantly better than models in which the status or monetary value of the blocked vehicle is introduced as is, without taking differences to the status of the experimental car (not shown). To put it another way, the actors’ status levels per se do not explain the patterns found in our data; it is the combination of status between the two actors that matters. Against the backdrop of the literature discussed above, an interesting question is whether the effects work the same in both directions, or whether, for example, aggression mainly “flows downward”. In Models 2, 4, and 6, the effects of the status difference are separated into an effect of a downward difference (the blocked driver has a higher status than the blocker) and an effect of an upward difference (the blocked driver has a lower status than the blocker). The results from Model 2 (Experiment 1) and Model 4 (Experiment 2) suggest that status matters when a lower status car is blocked by a higher status car, but not in the reverse case. These results suggest that aggression “flows upward”, but the results are not fully conclusive, as the difference in effects of a downward difference and an upward difference is not statistically significant in Model 2 (p = 0.119) and only mildly significant in Model 4 (p = 0.034). It is thus not entirely clear whether the distinction between downward and upward differences really matters. Furthermore, Model 6, in which status is measured in terms of the monetary value of the vehicles, does not provide support for such a distinction. Here, both effects are statistically significant, and the effect of a downward difference is in fact somewhat stronger (although the difference between the two effects is far from being statistically significant: p = 0.515). The results from Model 6 thus suggest that the relationship between peer-punishment and status is similar in both situations. For a better impression of the size of the discussed effects, Figure 2 displays the predicted survival curves from Models 2, 4, and 6 for different combinations of the

4 The status difference variable in these models can take on three values: 0 (same status category), 1 (difference between a middle status reactor and a lower or higher status experimental car), and 2 (difference between a higher status reactor and a lower status experimental car, or vice versa). Since the null hypothesis of a linear effect of the status difference (i.e., the effect of a one-point status difference is exactly half of the effect of a two-point status difference) cannot be rejected (p-value of 0.893 for Model 1 and 0.601 for Model 3), we refrain from using a more complex specification with separate effects for the two levels of status difference.

Social Status and Peer-Punishment: Findings from Two Road Traffic Field Experiments | 269

Low status blocker

High status blocker

Experiment 1 (status)

Experiment 1 (status)

1

1

.9

.9

.8

.8

.7

.7

.6

.6

.5

.5

.4

.4

.3

.3

.2

.2

.1

.1

0

0 2

4

6

8

10

12

14

16

18 20

2

4

6

Experiment 2 (status)

8

10

12

14

16

18 20

Experiment 2 (status)

1

1

.9

.9

.8

.8

.7

.7

.6

.6

.5

.5

.4

.4

.3

.3

.2

.2

.1

.1

0

0 2

4

6

8

10

12

14

16

18 20

2

4

6

Experiment 2 (value)

8

10

12

14

16

18 20

16

18 20

Experiment 2 (value)

1 .9 .8

1 .9 .8

.7 .6 .5 .4 .3 .2 .1

.7 .6 .5 .4 .3 .2 .1

0

0 2

4

6

Difference:

8

10

12

Small

14

16

18 20

Medium

2

4

6

8

10

12

14

Large

Fig. 2: Predicted survival functions by status or value difference from Models 2, 4, and 6.

270 | Ben Jann and Elisabeth Coutts

status of the experimental vehicle and the status of the blocked driver (with average values for the control variables). In the case of Model 2 (upper subgraphs) and Model 4 (middle subgraphs), the scenarios reflect the possible combinations of the categorical status variable (small difference: same category; medium difference: middle vs. low or high; large difference: low vs. high). In the case of Model 6 (lower subgraphs), the scenarios are determined by relative differences in monetary value (small difference: same value; medium difference: the value of one of the vehicles is 50 % higher than the value of the other vehicle; large difference: one of the vehicles is worth twice as much as the other; these scenarios were chosen in accordance with the approximate differences in average vehicle values between the three status groups, for which see above). The subgraphs on the left illustrate the effect of a (downward) status difference in case of the lower status experimental car; the subgraphs on the right show the effect of an (upward) status difference in case of the upper status experimental car. In all cases larger status differences lead to lower survival curves: the larger the status difference, the more drivers honk their horn within a given timespan. In the upper two subgraphs on the left, the differences between the curves are negligible (and not statistically significant). In the other cases, however, the differences are substantial. For example, in Experiment 2, only about 20 % honk within the first 10 seconds when both vehicles belong to the higher status class, but more than 60 % of lower class drivers honk in the same timespan if they are blocked by a higher class vehicle (middle subgraph on the right). Correspondingly, the median response time (the time until 50 % of the cars honked) is almost 20 seconds in the former case, but only 8 seconds in the later. For Experiment 1, the effects of an upward status difference are of similar magnitude (see the upper right subgraph). If status is measured in terms of vehicle value, the effects are substantial in both directions (see lower subgraphs), but the magnitude of the effects is somewhat smaller than above. Median response times for large and small differences were about 10 and 15 seconds in the case of the low status blocker, and about 14 and 20 seconds in the case of the high status blocker. Note that the curves in the right subgraphs tend to be higher than the curves in the left subgraphs. This means that, controlling for status difference, an upper status experimental car elicited a somewhat slower honking responses than a lower status car. The corresponding coefficients (reflecting the difference between the solid lines on the left and on the right) point in the same direction in all three models, but only in Model 4 is the coefficient statistically significant. The evidence for a more generous treatment of high-status norm violators is therefore only weak. With respect to the control variables, we find a clear effect of the sex of the driver of the experimental car. Hazard rates were substantially lower if the driver was female and not male (Models 4–6, Experiment 2 only; in Experiment 1 the driver was always male). With respect to the sex of the blocked driver, we found inconsistent results between the two experiments. In Experiment 1, females tended to have lower hazard rates than males, but in Experiment 2, females tended to show faster honking reac-

Social Status and Peer-Punishment: Findings from Two Road Traffic Field Experiments | 271

tions (the effects, however, were only marginally significant).⁵ One may suspect that the combination of the genders of the two drivers matters. Adding an interaction term to the models of Experiment 2 did reveal a diminishing effect if both drivers were female, but the difference was far from being statistically significant (p-values between 0.3 and 0.4, depending on the model). In terms of the age of the blocked drivers, the results were also inconsistent between the experiments. Whereas older drivers in Experiment 1 had significantly higher hazard rates than middle-aged drivers, the corresponding effect was smaller and not significant in Experiment 2 (although pointing in the same direction). Furthermore, young drivers in Experiment 1 tended to have higher hazard rates (although this was not statistically significant), whereas younger drivers in Experiment 2 had significantly lower hazard rates than middle-aged drivers. For Experiment 2, for which more control variables are available, we found mild evidence for faster honking reactions in drivers of business vehicles. Temperature and traffic density had effects in the expected direction, but were not statistically significant. A clear effect, however, was found for the direction from which the blocked car entered the street. This is a purely technical effect related to the way in which we determined the starting point for measuring the honking response times. We also evaluated the effects of some further control variables, such as the color of the blocked car, without finding any meaningful results (not shown).⁶

5 Results from other studies on the relative willingness of males and females to sound their car horns are also ambiguous. While Doob and Gross (1968), Shinar (1998:149–150), and Shinar and Compton (2004) report significantly fewer honking responses by female drivers, the effect has not been significant in several replications of the horn-honking experiment (Chase and Mills 1973; Deaux 1971; Diekmann et al. 1996; Ellison et al. 1995; Forgas 1976; Kenrick and MacFarlane 1986; Shinar 1998:151– 156; Turner, Layton, and Simons 1975), although the results of most of these studies showed longer latencies for women. Hennessey and Wiesenthal (1999) find no differences between men and women in behaviors such as honking, but suggest in a later article (2001) that a more distinct difference between men and women may be expected in the case of “driver violence”, that is, more severe forms of behavior, such as chasing other drivers or vandalizing vehicles. These results fit well with those reported for aggression in general. A meta-analysis by Bettencourt and Miller (1996) suggests that the largest differences between male and female aggression occur either in conditions in which there is no provocation, or when the aggression is expressed in physical form. 6 One variable, whether successive vehicles were present behind the blocked car, did have a significant effect (and including this variable also rendered the effect of traffic density significant). One could expect that the presence of successive vehicles is an additional stressor putting pressure on the blocked car, leading to faster honking reactions. Oddly, however, the effect was negative. We did not include this effect in our models because the result is an artifact of how the variable was measured. The longer a blocked driver refrained from honking, the higher the chance that additional vehicles appeared in the street (this also explains the increased effect of traffic density, as more cars appear in a given timespan if traffic density is high). Because the variable only measures whether additional vehicles were present, but not the exact times at which they appeared, the variable is endogenous to the honking behavior of the blocked driver. To estimate the effect of additional vehicles consistently, their appearance would have to be introduced in terms of a time-varying covariate.

272 | Ben Jann and Elisabeth Coutts

4 Conclusions The findings reported in this paper provide evidence that the disparity in social status between two actors has a positive effect on the degree of sanctioning behavior that is expressed during their interaction. These results were obtained in two road-traffic field experiments, during which subjects’ behavior was observed after their cars had been blocked by another car. Specifically, latencies in horn-honking responses were significantly higher in cases where the driver of a car was blocked by an experimental car of similar status than in cases where it was blocked by a car of quite different status. These results support our hypothesis. For the question of whether status differences operate in the same way irrespective of the direction of the difference (the punisher’s status being higher than the norm violator’s status, or vice versa), or whether sanctioning “flows downward” as suggested by literature on aggressive behavior, our findings are ambiguous. The results for one of our status measures suggests that sanctioning mainly “flows upward”, counter to the expectation from the literature. However, the statistical evidence for a difference in the effect depending on direction is not particularly strong. Moreover, clearly symmetric effects were found for our second status measure. Despite the fact that our results were obtained with a few deviations from previous experimental designs, they may reflect something more than the choice of a particular methodology, especially since similar results were obtained by Diekmann et al. (1996). The experimental car used to block the intersection by Diekmann et al. was classified as “lower middle class”, which was also the class of blocked drivers who showed the lowest level of horn-honking in their experiment. The level of horn-honking increased monotonically for higher status classes, and also for the lower status class: that is, the larger the status difference, the higher the level of horn-honking. Whether similar social status may have contributed to the effects detailed in other previous studies is difficult to assess, since the status of the blocked car was not reported in those studies. However, according to the status-similarity hypothesis, the results of the classic hornhonking experiments described in the introduction should depend on the composition of the sample of blocked subjects. If there are, for example, predominantly high-status subjects in the sample, one would expect a lower-status blocking vehicle to elicit more aggressive responses than a higher-status car, as observed by Doob and Gross (1968) and Deaux (1971). If, on the other hand, the reactors are drivers of mostly low-status cars, one would expect more aggressive responses toward a higher-status blocking vehicle than to a lower-status blocking vehicle, as reported by Chase and Mills (1973). The situation we study can be seen as a social dilemma in the sense that individual behavior (blocking the road) has negative externalities for the public (impediment of traffic flow), and that the blocking car violates a social norm (to keep the road clear if possible). Most likely, however, there is no second-order dilemma with respect to the enforcement of norm compliance, as most of the externalities are imposed on a single

Social Status and Peer-Punishment: Findings from Two Road Traffic Field Experiments | 273

actor (the blocked driver). It is reasonable to assume that the costs of sanctioning are much lower than the benefits, even though there may be a small chance that the norm violator will engage in retaliatory behavior rather than move the car. Nonetheless, our study is a valuable contribution to the literature on peer-punishment, as it shows how punishing behavior depends on the social status of the actors. We are, however, skeptical about whether our results can be applied to situations in which status is more than a mere token of social categorization. As discussed in the introduction, different results may, for example, be expected in a situation characterized by competition for status positions, or in a situation where an explicit power relationship exists between actors of different status. How status relates to sanctioning in different types of situations is an interesting question to be studied in future research.

Appendix Tab. 2: Descriptive statistics of the predictors.

Status of blocker (0 low, 1 high) Driver in blocking car is female (0/1) Status of blocked vehicle – low – middle – high Absolute difference in status (0–2) Downward difference in status (0–2) Upward difference in status (0–2) Natural logarithm of value of blocked vehicle (selling price minus 5 % depreciation per year) Absolute difference in log value Downward difference in log value Upward difference in log value Female blocked driver (0/1) Estimated age of blocked driver – 18 through 30 – 31 through 55 – 56 or older Business vehicle (0/1) Temperature (in degree Celsius) Traffic density (in 1000 vehicles per hour) Direction of entry into road (0 left, 1 right) Number of observations

Experiment 1

Experiment 2

Mean

Mean

Std. dev.

0.496

0.481 0.528

0.252 0.553 0.195 0.976 0.463 0.512

0.368 0.387 0.245 0.972 0.443 0.528 10.02

0.236 0.138 0.650 0.211

123

0.671 0.669 0.694

0.792 0.259 0.533 0.349 0.406 0.462 0.132 0.075 28.19 11.64 0.764 106

Std. dev.

0.786 0.705 0.771 0.451 0.497 0.363 0.627

0.984 2.214

274 | Ben Jann and Elisabeth Coutts


Andreas Flache, Dieko Bakker, Michael Mäs, and Jacob Dijkstra

The Double Edge of Counter-Sanctions. Is Peer Sanctioning Robust to Counter-Punishment but Vulnerable to Counter-Reward?

Abstract: Peer sanctioning institutions are powerful solutions to the freerider problem in collective action. However, counter-punishment may deter sanctioning, undermining the institution. Peer-reward can be similarly vulnerable, because peers may exchange rewards for rewards (“counter-reward”) rather than enforce contributions to the collective good. Based on social exchange arguments, we hypothesize that peer-reward is vulnerable in a repeated game where players are fully informed about who rewarded them in the past. Social preference arguments suggest that peer-punishment is robust under the same conditions. This contrast was tested in an experiment in which counter-sanctioning was precluded due to anonymity of enforcers in one treatment and allowed in another treatment by non-anonymity of enforcers. This was done both for a reward and for a punishment institution. In line with the exchange argument, non-anonymity boosted reward-reward exchanges. Punishment was only somewhat reduced when enforcers were not anonymous. In contrast with previous experiments, we found no effects of counter-sanctioning on contributions. Thus, non-anonymity did not undermine the effectiveness of the peer sanctioning institutions in our experiments, neither for reward nor for punishment. Our results suggest that previous claims about the vulnerability of peer-punishment to counter-punishment may not generalize to non-anonymous repeated interactions.

1 Introduction

Human societies depend on the successful provision of collective goods, which typically require the contribution of many to be produced.¹ However, as Olson (1965) prominently argued, groups may fail to supply collective goods due to members’ rationally “freeriding”, unless there are selective incentives eliciting contributions (Olson 1965).

1 An important exception is the so-called volunteer’s dilemma introduced by Diekmann (1985). In this game, one contributor is enough to generate the collective good.

Note: This work has benefitted from insightful comments by the editors and reviewers, as well as from stimulating discussions with the members of the Norms and Networks research group at the Department of Sociology/ICS of the University of Groningen. Any remaining deficiencies are, of course, the sole responsibility of the authors.

https://doi.org/10.1515/9783110472974-014


Environmental pollution, lack of effort in joint team production, or the failure to maintain a valuable community resource are examples of Olson’s famous “logic of collective action” (Bouma, Bulte, and van Soest 2008; Hardin 1968; Petersen 1992). But the empirical picture is not all that bleak. Collective goods are provided even without formal institutional solutions, as in mass protests to overthrow an oppressive regime (Opp, Voss, and Gern 1995), effective lobbying associations (Marwell and Oliver 1993), Wikipedia (Anthony, Smith, and Williamson 2009), or successful “selfmanaging teams” at the workplace (Barker 1993). Our chapter focuses on peer sanctioning as an informal social institution supporting collective action. Sociologists (Coleman 1990; Homans 1951; Homans 1974), economists (Fehr and Gächter 2000; Fehr and Gächter 2002; Kandel and Lazear 1992), and political scientists (Ostrom, Walker, and Gardner 1992) have long recognized the importance of peer sanctioning for the enforcement of cooperation. Under a peer sanctioning institution, group members failing to pull their weight face expressions of disapproval, physical punishment, or ostracism by their peers (Homans 1951). Contributors are rewarded with peer approval, praise, or affirmation of their social standing (Willer 2009). Research in work groups has provided ample evidence of the power of peer sanctioning, starting with Roethlisberger and Dickson’s classic Hawthorne studies (1939). How robust, however, are peer sanctioning institutions as solutions to collective action problems? Our contribution focuses in particular on how the robustness of peer sanctioning depends on whether the sanction is reward (for cooperative behavior) or punishment (of uncooperative behavior), and we report results from a laboratory experiment (see Van Miltenburg et al. 2014 for a similar approach). The robustness of peer sanctioning is debated in the literature because peer sanctioning institutions are threatened by the “second order free rider problem” (Oliver 1980). Peer sanctioning being both individually costly and in the collective interest, it is itself a collective good (Coleman 1990). Rational egoists should, therefore, refrain from sanctioning. Incentivized experiments with collective good games showed, however, that players are willing to invest in costly peer-punishment (Diekmann and Przepiorka 2015; Fehr and Gächter 2000; Fehr and Gächter 2002) and costly peerreward (Flache and Bakker 2012; Flache 1996; Van Miltenburg et al. 2014). Fehr and Gächter (2000; 2002) and Fehr and Gintis (2007) explain “altruistic punishment” by positing a behavioral disposition or social preference in at least part of the human population “to punish violations of cooperative norms even at a net cost to the punisher” (Fehr and Gintis 2007:45). Even when taking social preferences into account, theories of peer sanctioning are challenged to explain its robustness in the face of counter-sanctioning. In work teams, collaborative projects in science, and close-knit communities, for instance, punishers are not anonymous and face future interactions with those they punish. Consequently, fear of retaliation may deter peer-punishment. The experimental design employed by Fehr and Gächter, and virtually all follow-up studies in this paradigm, excluded


counter-punishment. To study the effects of counter-punishment, Nikiforakis (2008) added a stage to each period of the standard collective good experiment in which subjects first learned how strongly other group members had punished them and were given the opportunity to counter-punish. Participants were deterred from using punishment against freeriders in the first place, reducing contributions to the collective good (see also Denant-Boemont, Masclet, and Noussair 2007). Research on counter-rewards similarly suggests that peer-reward may be vulnerable to the effects of counter-reward. Flache (1996; 2002; see also Flache and Macy 1996; Flache, Macy, and Raub 2000) analyzed a repeated game in which players could reward each other based on information about others’ contribution to the collective good in previous periods. They compared two conditions, one in which enforcers remained anonymous and one in which their identity was revealed prior to the next round of the game. Both social learning theory (Flache and Macy 1996) and game-theory (Flache 2002) predicted that without anonymity players would refrain from using rewards as an instrument to enforce contributions under a large range of conditions. The reason is that in the non-anonymous condition players establish mutually beneficial reward exchanges, even with freeriders. Experimental tests confirmed the related prediction that subjects contribute less to the collective good in the non-anonymous condition (Flache and Bakker 2012; Flache 1996). There is a crucial difference in the experimental designs used to study counterpunishment and counter-reward. On the one hand, counter-punishment was extremely salient and unambiguous in previous studies (Denant-Boemont, Masclet, and Noussair 2007; Nikiforakis 2008). After having been exposed to a sanction, subjects were immediately given the opportunity to strike back, and only players who had imposed a punishment in the preceding punishment stage could be targeted for counter-punishment. In the counter-reward experiments (Flache and Bakker 2012; Flache 1996), on the other hand, counter-sanctioning was more ambiguous. Participants were always fully informed about all group members’ contributions and reward decisions, but the rules of the game did not favor any particular reaction to others’ behaviors in the reward-stage of the game. Unlike in the counter-punishment experiments, there was no explicit stage of the game where counter-rewards were possible. Instead, participants could respond with changing contributions and reward decisions in subsequent periods. The latter implementation of counter-sanctioning makes it more difficult for the researcher to disentangle rewards from counter-rewards, but comes with the important advantage of increased external validity. In real-life collective good situations, sanctions are inherently ambiguous signals, often embedded in long-term exchange processes in which players can respond with either contributions or sanctions to previous contributions or sanctions from others. It is an open question whether punishment institutions are vulnerable to countersanctioning under these more realistic conditions of the counter-reward experiments. The ambiguity of sanctions in repeated non-anonymous interactions leaves room for


at least two possibilities. One is that potential enforcers are deterred by the prospect of counter-sanctions (Nikiforakis and Engelmann 2011). Actors contemplating punishment thus fear counter-punishment, and actors contemplating withholding a reward fear forfeiting future rewards themselves. The other possibility is that sanctioned freeriders refrain from retaliation because the enforcers against whom they retaliate can strike back in future encounters (Nikiforakis and Engelmann 2011). If the latter mechanism prevails, lack of anonymity combined with a “shadow of the future” should not undermine the effectiveness of a peer sanctioning institution as a solution to the problem of collective action. If the former mechanism predominates, however, repeated non-anonymous interactions will severely curtail the effectiveness of peer sanctioning. In this study, we propose and experimentally test the theoretical prediction that the vulnerability of a peer sanctioning institution to counter-sanctions depends on whether the sanctioning institution is peer-reward or peer-punishment. In the subsequent section, we elaborate our theoretical predictions. We then describe the experiment and our results.

2 Theory and hypotheses

Under both peer-reward and peer-punishment institutions, group members provide selective incentives to fellow group members using social or material resources. Actors considering freeriding face the possibility of peers using the incentive conditionally on a target’s sufficient contribution to the common good. Theoretical approaches in the literature differ when it comes to explaining why group members sanction in the first place. The social preference explanation (e.g., Bolton and Ockenfels 2000; Charness and Rabin 2002; Dijkstra 2012; Fehr and Schmidt 1999) posits that norm enforcers derive satisfaction from punishing a perpetrator, such that the material or social costs of enforcement are subjectively more than compensated. Peer-reward institutions, by contrast, are typically addressed from the perspective of social exchange theory (Coleman 1990; Dijkstra 2012; Dijkstra 2015; Holländer 1990; Homans 1974). In this view, group members reward contributors because the costs of rewarding are more than compensated by future contributions to the collective good that the reward recipient will make in response. This exchange-theoretic model of peer sanctioning replaces the intrinsic benefits of sanctioning, central to theories of peer-punishment, with the extrinsic motivation of obtaining future compensation for a present investment in an ongoing exchange relation.

In the present research, we follow previous work in assuming that peer-punishment is predominantly motivated by its intrinsic benefits (e.g., vengeance, relief) for enforcers, whereas peer-reward is mainly driven by the enforcers’ desire to maintain a beneficial exchange with the target of the sanction. At the same time, we acknowl-


edge that future work should move beyond this distinction and explore the possibility that peer-rewards may be emotionally motivated, just as peer-punishment decisions may also be affected by exchange considerations. Evidence of “altruistic rewarding” has, for example, been provided by experiments with reward-based sanctioning institutions in which players invested in costly peer-reward, although reciprocation was precluded by a one-shot design (Van Miltenburg et al. 2014). For an anonymous sanctioning institution, both emotionally driven peer-punishment and exchange driven peer-reward lead to the same qualitative prediction about effects of the institution. Both institutions will increase contribution rates in a repeated collective good game, in comparison with a game without a peer sanctioning institution (Hypothesis 1). More precisely, and in line with the common standard in the literature, we formulate this claim for a collective good game that imposes the incentive structure of an indefinitely repeated N-person Prisoner’s Dilemma game in which players are aware of the aggregate contribution of their peers in previous rounds but have – in the absence of a peer sanctioning institution – no means of responding other than by adapting their own contribution decision. For games like this, a common pattern frequently observed in experimental research is that participants initially make substantial contributions in the absence of a peer sanctioning institution, but cooperation rates gradually collapse as the game progresses (Andreoni 1988; Camerer 2003; Ledyard 1995). An anonymous peer-punishment institution adds a second stage in every round, in which players learn individual peers’ contribution levels in the previous stage and can then respond with a negative sanction targeted at a particular peer. The institution is anonymous in the sense that targets do not know who imposed a sanction upon them. At the same time, in our design, the group remains stable throughout the game and all players keep the same labels, such that a sanction can always be given conditionally on their past contribution behavior. Imposing the sanction comes at a cost to both the recipient and the enforcer. An anonymous peer-reward institution differs from this only in that the sanction benefits the recipient. The expected positive effects of anonymous peer sanctioning derive from the assumption that peer sanctions will be imposed, and that players expect this. This implies that the reward (punishment) a player receives from her peers in the sanctioning stage of a round of the game is higher if this player contributed (did not contribute) to the collective good in the contribution stage (Hypothesis 2 and Hypothesis 3). A game-theoretic analysis can highlight more precisely the scope conditions under which effectiveness of a peer-reward institution is in line with players’ strategic rationality, even without intrinsic motivation to sanction. Drawing on the theory of repeated games (Friedman 1971; Friedman 1986; Taylor 1987), Flache (1996; see also Flache, Macy, and Raub 2000; Flache 2002) showed that the exchange-theoretic explanation of the effectiveness of peer-reward can be reconstructed in terms of individually rational conditional cooperation in an indefinitely repeated game (see also Spagnolo 1999 for a similar analysis). More precisely, given a sufficiently large contin-


uation probability, effective peer-reward under an anonymous peer-reward institution can be sustained by a subgame-perfect equilibrium in which all players condition their contributions to the collective good as well as the rewards they give for their peers’ contribution behavior in previous rounds of the game. This model of peer-reward imposes constraints on the incentive structure under which the institution can be effective. In a nutshell, the institution can be effective only if there is sufficient interest in future payoffs: the value of the reward is large enough to discourage players from risking losing future rewards despite the benefits of freeriding, and the costs of providing a reward are sufficiently low to guarantee a net benefit as long as the reward recipients respond with cooperation. But will the institution remain effective once enforcers are identifiable (thus making counter-reward possible), all other things being equal? To answer this, we will focus theoretically and experimentally on payoff parameters for the game that meet the constraints under which effective peer-reward is sustained by a subgame perfect equilibrium. The conditions under which effective peer-reward is possible will also be taken as starting points for studying peer-punishment in our contribution. We assume that the provision of peer-punishment – unlike the provision of peer-reward – is more robust against higher costs or lower benefits of contribution and sanctions due to the additional intrinsic benefits of punishing. We therefore assume that if anonymous peerreward is effective for given payoff parameters, so is anonymous peer-punishment. To render peer-reward and peer-punishment comparable, we equalize the material unit costs and benefits of the collective good and the unit costs of sanctioning under both institutions. Moreover, the material value of avoiding a punishment is equal to the material value of receiving a reward.² Alignment of the two institutions allows isolating the effects of adding the possibility of counter-sanctions to either institution.

2 To be precise, the games are not exactly aligned in this way. A difference that remains is that perpetual conditional cooperation under the reward institution requires that players bear the costs of providing rewards throughout the game, whereas under the punishment institution perpetual conditional cooperation requires that no punishments need ever be carried out (see van Miltenburg et al. 2014 for a similar argument). This implies that when all costs and benefits are equal in the way described above, the conditions for conditional cooperation to constitute a Nash equilibrium in the repeated game are actually less restrictive for a punishment institution than they are for a reward institution. However, for the punishment game, a Nash equilibrium does not assure individual rationality in the stricter sense that the threat of actually carrying out the punishment is credible once it has failed to deter freeriding. Unless we assume intrinsic motivation, the fact that punishment is costly may withhold players from imposing the sanction in that situation. Technically, the strategy of conditional punishment may constitute a Nash equilibrium, but it is not necessarily a subgame perfect one. This is different for the reward game, where carrying out the threat of not rewarding is cost-free. Flache (1996; 2002) proved that in a reward game, the corresponding Nash equilibrium is also subgame perfect. For these reasons, the two games cannot readily be aligned in terms of the conditions under which conditional cooperation is individually rational. The best we can do, therefore, is to align all cost and benefit parameters such that they are equal in absolute value.
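The “sufficient interest in future payoffs” requirement can be made concrete with a stylized back-of-the-envelope check. The sketch below is not the authors’ formal analysis; it merely compares, for illustrative parameter names (c, r, n, w), the one-round gain from freeriding with the discounted stream of rewards that conditionally rewarding peers would withhold in the future.

```python
# Stylized sketch of a threshold condition for effective peer-reward.
# Parameter names and the simplification are illustrative, not taken from the chapter.

def freeriding_deterred(c, r, n, w):
    """Return True if forgone future rewards outweigh the one-shot gain from freeriding.

    c : cost of contributing to the collective good
    r : value of one reward received from a peer
    n : group size
    w : continuation probability (weight placed on future rounds)
    """
    one_shot_gain = c                      # a freerider saves the contribution cost once
    per_round_loss = (n - 1) * r           # but forfeits the peers' rewards in later rounds
    future_loss = w / (1 - w) * per_round_loss
    return future_loss >= one_shot_gain

# Example: with a low continuation probability the institution cannot be effective.
print(freeriding_deterred(c=20, r=10, n=5, w=0.1))  # False
print(freeriding_deterred(c=20, r=10, n=5, w=0.5))  # True
```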


The non-anonymous sanctioning institutions we study differ from the anonymous ones only in that enforcers are identifiable for their peers in the interactions following enforcement. It has been shown that this small difference can profoundly affect a peerreward institution (Flache and Bakker 2012; Flache and Macy 1996; Flache 1996). In a repeated game, the loss of anonymity provides players with additional information to condition their behavior. In particular, there is a new conditional strategy in which present rewards are conditioned on rewards received in the past. The problem for the effectiveness of the peer-reward institution is that in the corresponding equilibrium of the repeated game, the prospect of attaining future rewards from peers is no longer an incentive for a player to contribute to the collective good. Instead, it is an incentive to keep rewarding the peer even if she is a freerider. To be sure, everyone would still be better off if the collective good were to be provided and players rewarded each other at the same time. Technically, the equilibrium corresponding to only mutual reward is payoff inferior to a competing one in which everyone both contributes and rewards all peers. However, the mutual-reward equilibrium also turns out to be more robust under random deviations (Flache 2002), easier to coordinate upon for boundedly rational backward-looking players (Flache and Macy 1996), and consistent with individual rationality under a less restrictive set of conditions (Flache, Macy, and Raub 2000; Flache 1996). Intuitively, the reason for this discrepancy is that a mutually beneficial exchange of reward for reward between two “friends” in a group is easier to establish and maintain than the more complex multilateral exchange between a sufficient number of contributors to the collective good and a sufficient number of peers needing to reward them for their efforts to motivate them into contributing (see Manhart and Diekmann 1989 for a similar argument about group size effects on the robustness of conditional cooperation). Building on this theoretical work, we hypothesize that contribution rates will be lower under a non-anonymous peer-reward institution than under an otherwise equivalent anonymous one (Hypothesis 4). We expect that the rewards a player receives from her peers depend less on being a contributor when the peer-reward institution allows counter-reward than in an equivalent anonymous peer-reward institution (Hypothesis 5). Effects of lifting the anonymity behind which enforcers can hide should be different when the motivation to sanction freeriders in the first place is intrinsic. To be sure, the experiments of Nikiforakis (2008) and others indicate that the costs of future retaliation are not entirely disregarded by enforcers in a peer-punishment institution. However, these experiments are crucially different from the peer-punishment institution we consider here. In our experiment, a player who counter-punishes an enforcer must face the possibility of retaliation by that same enforcer in the future. Other than for counter-reward, we cannot draw on elaborated formal modelling work to form theoretical expectations about effects of non-anonymity of punishers in a repeated game framework. The reason is that previous research on counter-punishment has focused both theoretically and empirically on one-shot games. Nevertheless, the underlying


logic of intrinsic motivation can also be applied to informally hypothesize effects of anonymous vs. non-anonymous punishment institutions in repeated games. Our reasoning starts from the notion that the possibility of counter-punishment evoking revenge is very real if we follow the arguments of Fehr and Gächter (2000; 2002; see also Bowles and Gintis 2011). Players can be expected to anticipate or learn from interactions with their peers that a considerable fraction of a group is prepared to punish norm violators at a cost to themselves. While non-anonymity of enforcement may thus reduce the extent to which freeriders are punished compared to an anonymous punishment institution, we expect that the prospect of future retaliation strongly deters punished freeriders from counter-punishing. In other words, under a non-anonymous punishment institution, the sanctions imposed on a player should be less contingent on the sanctions imposed by that player on her peers in the past than under a non-anonymous reward institution (Hypothesis 6). Accordingly, we also expect that the degree to which non-anonymity of the sanction will reduce rates of contribution to the collective good is higher if the sanction is reward than if the sanction is punishment (Hypothesis 7).

3 Method

3.1 Game and conditions

We compared five different institutions in our experiment, using five treatments in a between-subjects design. In all treatments, subjects played a repeated five-person Prisoner’s Dilemma game. Following Flache (1996), we adopted a stage game with a dichotomous choice. In every round of the game, all players had to decide simultaneously whether or not to contribute to a collective good. After each round, they learned how many members contributed, and how many points they had earned from the collective good. The group playing the game consisted of the same five subjects with the same (anonymized) identities in all rounds. The first treatment of our experiment represented the baseline situation without a peer-sanctioning institution. The remaining four treatments added four different sanctioning institutions to the collective good game, implemented by a sanctioning stage following the contribution stage in every round of the game.

We manipulated the peer sanctioning institution in a full factorial design along two dimensions. The first dimension was whether the sanction was reward or punishment. In both cases, players learned in the second stage of every round who had contributed and who had failed to contribute to the collective good in the first stage of that round. They also learned the amount of contributions every group member had made in all previous rounds of the game. Next, subjects had to decide simultaneously whether or not to impose a sanction on each of the other group members, at a


cost to themselves. Under the reward (punishment) institution, the sanction increased (decreased) the wealth of the target. Unlike in the design of Fehr and Gächter (2000; 2002; see also Nikiforakis 2008; Denant-Boemont, Masclet, and Noussair 2007), players could not vary the magnitude of a sanction and there was no budget limitation constraining the number of sanctions they could impose. Instead, each sanction had a fixed cost for the enforcer and a fixed effect on the wealth of the recipient. These design choices were made to represent the situation of an informal peer sanctioning institution in which actors’ instruments for sanctioning are typically social resources that do not depend on the material output of the collective action (e.g., the force of reprobation does not depend on material wealth). The second dimension along which the peer sanctioning institutions varied was the anonymity of the enforcer. With anonymous peer sanctioning, players learned after the second stage of every period only the number of other players who had punished (or rewarded) them in the previous stage. They were not told by whom. When peer sanctioning was not anonymous, players also learned who had imposed sanctions upon them. Moreover, subject labels were never changed so that group members remained identifiable and could be linked to their previous actions by all participants of the game, in all rounds. In the non-anonymous treatments, subjects also received information of how often they had sanctioned each of their fellow group members on average, in all previous rounds of the game. Only in the non-anonymous sanctioning treatment did they additionally learn how often on average they had been sanctioned by each of their fellow group members in previous periods.
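To make the two-stage structure of a round concrete, a minimal sketch of the protocol follows. It is an illustration only, not the web application used in the experiment, and the strategy callbacks contribute_fn and sanction_fn are hypothetical placeholders.

```python
# Minimal sketch of one round: a simultaneous contribution stage followed by a
# simultaneous sanctioning stage in which every player decides, for each fellow
# group member, whether to impose a sanction. Names are illustrative only.
from dataclasses import dataclass, field

@dataclass
class Round:
    contributions: dict = field(default_factory=dict)    # player -> True/False
    sanctions: dict = field(default_factory=dict)         # (enforcer, target) -> True/False

def play_round(players, contribute_fn, sanction_fn, history):
    rnd = Round()
    for p in players:                                     # stage 1: contribution decisions
        rnd.contributions[p] = contribute_fn(p, history)
    for p in players:                                     # stage 2: sanctioning decisions,
        for target in players:                            # one per fellow group member
            if target != p:
                rnd.sanctions[(p, target)] = sanction_fn(p, target, rnd.contributions, history)
    history.append(rnd)
    return rnd
```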

3.2 Payoff structure and wealth

Subjects received an initial endowment and were informed that their total wealth at the end of the game consisted of the initial endowment plus the total number of points earned in all periods of the game. To align wealth effects of punishment and reward we calculated endowments such that the maximal and minimal number of money units (MUs) subjects could end up with at the end of the game was equal across all peer sanctioning institutions. We chose payoff parameters such that players faced a considerable incentive to freeride in the collective action game, such that sanctions could provide a strong incentive to either earn rewards from the other players or avoid being punished. The payoff a player received in one round of the game consisted of the sum of the payoffs obtained in the collective good stage and (except for the baseline) the sanctioning stage of the game. The payoff rule for the collective good part implied a conventional linear N-person Prisoner’s Dilemma structure. A contribution added 30 MUs to the total value of the collective good in the current round at a cost of 20 MUs to the contributor. The total value of the collective good generated by the group in one round was divided by the number of players and the resulting amount added to every player’s

Tab. 1: Payoff of a player in the collective good game.

                                Number of peers who contributed
                                  0     1     2     3     4
  Player did not contribute       0     6    12    18    24
  Player contributed            −14    −8    −2     4    10

payoff in that round. This imposed a social dilemma in that the marginal costs of contribution (20) were more than three times as large as the marginal benefits (30/5 = 6), yielding a marginal net benefit of −14 MU. Table 1 shows the resulting payoff structure of the collective good stage game.

In the sanctioning stage of one round, a player could impose a sanction on each other group member at a cost of 3 MUs per target to the enforcer. The benefits of being rewarded or the costs of being punished increased linearly in the number of group members who rewarded (or punished) the target. Being rewarded (or punished) by one other player increased (or decreased) the recipient’s wealth by 10 MUs. The relatively low cost of imposing a sanction compared to the value of being sanctioned implements the notion that social rewards or punishments are relatively easy to “produce” in an informal peer sanctioning institution, but can have a substantial impact on a recipient’s subjective wellbeing (Coleman 1990). This choice made peer sanctions potentially an effective instrument to elicit contributions to the collective good. Tables 2 and 3 describe the payoff structure of the stage game in the sanctioning phase for reward and punishment respectively. Being rewarded by two peers, or avoiding the punishment of two peers, could offset the costs of a contribution to the collective good. This also created a considerable incentive to use sanctions as instruments to elicit reward or avoid punishment in the institutions with counter-sanctions. At 3/10, the cost-to-benefit ratio of mutual reward was considerably better than the cost-to-benefit ratio of contributing to the collective good in a situation of universal cooperation (20/30).

Tab. 2: Payoff of player at the sanction stage under a peer-reward institution.

  Number of peers                   Number of peers rewarding the player
  rewarded by the player              0     1     2     3     4
  0                                   0    10    20    30    40
  1                                  −3     7    17    27    37
  2                                  −6     4    14    24    34
  3                                  −9     1    11    21    31
  4                                 −12    −2     8    18    28

Tab. 3: Payoff of player at the sanction stage under a peer-punishment institution.

  Number of peers                   Number of peers punishing the player
  punished by the player              0     1     2     3     4
  0                                   0   −10   −20   −30   −40
  1                                  −3   −13   −23   −33   −43
  2                                  −6   −16   −26   −36   −46
  3                                  −9   −19   −29   −39   −49
  4                                 −12   −22   −32   −42   −52
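For readers who want to recompute these payoffs, a short sketch follows. It only restates the parameters given in the text (a group of 5, contributions adding 30 MU at a cost of 20 MU, sanctions costing the enforcer 3 MU and changing the target’s wealth by 10 MU) and is not the software used in the experiment.

```python
# Recomputes the payoffs in Tables 1-3 from the parameters stated in the text.
N, POOL_ADD, CONTRIB_COST = 5, 30, 20        # group size, MU added per contribution, its cost
SANCTION_COST, SANCTION_VALUE = 3, 10        # cost to the enforcer, effect on the target

def collective_good_payoff(contributed, n_peers_contributing):
    share = POOL_ADD * (n_peers_contributing + contributed) / N
    return share - CONTRIB_COST * contributed

def sanction_stage_payoff(n_given, n_received, reward=True):
    sign = 1 if reward else -1
    return sign * SANCTION_VALUE * n_received - SANCTION_COST * n_given

print([collective_good_payoff(0, k) for k in range(5)])   # row 1 of Tab. 1: 0, 6, ..., 24
print([collective_good_payoff(1, k) for k in range(5)])   # row 2 of Tab. 1: -14, -8, ..., 10
print(sanction_stage_payoff(2, 3, reward=True))            # Tab. 2, 2 given / 3 received: 24
print(sanction_stage_payoff(2, 3, reward=False))           # Tab. 3, 2 given / 3 received: -36
```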

3.3 Duration of the game and payment

Subjects received no information about the duration of the game other than that the maximum duration of an experimental session was one hour. Thus, since subjects did not know the number of rounds to be played, we regard this as an indefinitely repeated game. In fact, the game ended after 20 rounds and endowments were calculated such that, across all sanctioning treatments, subjects could in the worst case end up with zero MU, whereas in the best case they would end up with 1,800 MU. In the punishment treatments, subjects started with a wealth of 1,320 MU in their accounts, whereas they started with 520 MU in the reward treatments. In the baseline treatment, subjects could never end up with fewer than zero MUs and they could earn maximally 760 MUs after 20 rounds. All subjects were paid a show-up fee of 5 € and also received a payment proportional to the final wealth in their account at a conversion rate of 1 € per 180 MU. Unlike many experiments with designs that prevent reciprocation (e.g., Fehr and Gächter 2002), the subjects of our experiments received a sizeable endowment at the outset rather than a series of small endowments per round. This approach allows comparison with previous similar experiments using peer-reward in a repeated setting (Flache 1996). It also allows subjects better to realize the long-term consequences of their actions by showing how their wealth develops over time, in accordance with the theoretical notion of a longer-term exchange perspective in the reward treatment.
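The endowment arithmetic can be verified directly. In the sketch below, the per-round extremes follow from Tables 1–3; the 280 MU baseline endowment is our inference from the reported 0–760 MU range, not a figure stated in the text.

```python
# Checks that endowment + 20 rounds of the extreme per-round payoffs gives the
# wealth bounds reported in the text (0 to 1,800 MU in the sanctioning treatments).
ROUNDS = 20

def wealth_bounds(endowment, per_round_min, per_round_max):
    return endowment + ROUNDS * per_round_min, endowment + ROUNDS * per_round_max

print(wealth_bounds(520, -14 - 12, 24 + 40))    # reward treatments          -> (0, 1800)
print(wealth_bounds(1320, -14 - 52, 24 + 0))    # punishment treatments      -> (0, 1800)
print(wealth_bounds(280, -14, 24))              # baseline (endowment inferred) -> (0, 760)
print(1800 / 180)                               # maximal variable payment: 10 euros
```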


3.4 Participants

The experiments were conducted at the “Sociological Laboratory” of the University of Groningen, The Netherlands (http://www.gmw.rug.nl/~orsee/public/). The Sociological Laboratory’s subject pool mostly consists of undergraduate students from a variety of disciplines including sociology, economics, law, biology, physics, etc. Students of psychology and sociology are overrepresented in the subject pool. The rules of the Sociological Laboratory guarantee that subjects will not be deceived, and that they will be paid for their efforts. We conducted nine sessions with 120 subjects in total, where 20 subjects (4 groups) participated in the baseline treatment. Of the subjects, 25 were assigned to the reward treatment and 35 to the reward and counter-reward treatment. The treatments with only punishment and with punishment and counter-punishment each comprised 20 participants. Experiments took place in computer rooms, with subjects in separate cubicles. The experiments were programmed as a web application by the second author. After an oral introduction by the experimenter, each participant randomly picked an envelope containing detailed instructions and a login code to start the experiment. This ensured random assignment of participants to groups.

4 Results

Hypothesis 1 states that contribution rates are higher in the anonymous sanctioning institutions than in the baseline treatment. Figure 1 pictures the development of contribution rates in the five treatments of the experiment. Dots show observed average rates in sets of five subsequent periods. The error bars show 95 % confidence intervals with robust standard errors clustered on subjects.³ In the baseline condition, the contribution rate was 0.37 averaged over all periods. Supporting Hypothesis 1, Figure 1 shows fewer contributions in the baseline condition than in all other treatments. Compared to the baseline treatment, random-effects logistic regressions with random intercepts at the level of subjects and groups showed significantly more contributions in the anonymous reward treatment (z = 6.45 [p = 0.000], log odds model), the non-anonymous reward treatment (z = 4.82 [p = 0.000], log odds model), the punishment treatment (z = 2.08 [p = 0.037], log odds model) and the non-anonymous punishment treatment (z = 2.61 [p = 0.009], log odds model). The same regression models with round as an additional independent variable showed significantly decreasing contribution rates in the baseline treatment (z = −3.98 [p = 0.000]) and the treatments with anonymous punishment and non-anonymous punishment (z = 2.46 [p = 0.014]). There was no significant trend in the remaining treatments (all abs(z) < 0.87, [p > 0.38]). These results not only support Hypothesis 1 in showing that contribution rates in the conditions with anonymous peer sanctioning exceed those in the baseline, but also demonstrate that, under the institutions allowing counter-sanctioning, contribution rates are higher than in the baseline.

3 Due to the small number of observations on the levels of subjects and groups, it was not possible to estimate multi-level logistic regressions for this figure.
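To illustrate the kind of model behind these estimates, the following sketch fits a logistic regression of the contribution decision on treatment dummies with standard errors clustered on subjects. It is a rough stand-in only: the chapter reports random-effects (multi-level) models, and the data below are synthetic, generated solely so that the snippet runs.

```python
# Synthetic-data illustration of a treatment-effect logit with subject-clustered SEs.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
subjects = np.arange(60)
treatment = rng.choice(["baseline", "reward", "punishment"], size=60)
df = pd.DataFrame({
    "subject": np.repeat(subjects, 20),       # 60 hypothetical subjects, 20 rounds each
    "treatment": np.repeat(treatment, 20),
})
p = df["treatment"].map({"baseline": 0.4, "reward": 0.8, "punishment": 0.6})
df["contributed"] = rng.binomial(1, p)

fit = smf.logit("contributed ~ C(treatment)", data=df).fit(
    disp=False, cov_type="cluster", cov_kwds={"groups": df["subject"]}
)
print(fit.summary())
```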

Fig. 1: Treatment effects on contribution rates over time. (Five panels: Baseline, Anonymous reward, Non-anonymous reward, Anonymous punishment, Non-anonymous punishment; contribution rate plotted for periods 1–5, 6–10, 11–15, and 16–20.)

Supporting Hypotheses 2 and 3, Figure 2 shows that in peer sanctioning institutions, subjects who contributed received more rewards (fewer punishments) in the subsequent sanctioning stage. Multi-level Poisson regressions with random intercepts at the level of groups and subjects showed that the effect of the subject’s own contribution behavior on group members’ sanctioning decisions directed at the subject was significant in all peer sanctioning treatments. After a subject contributed, the expected log count of the number of received rewards increased by 2.17 (z = 8.33 [p = 0.000]) in the anonymous reward treatment and by 1.33 (z = 10.96 [p = 0.000]) in the non-anonymous reward treatment. The effect of a subject’s contribution on the amount of sanctions received was significantly stronger in the anonymous reward treatment than in the non-anonymous reward treatment (z = −2.89 [p = 0.004]), supporting Hypothesis 5. After a subject contributed, the expected log count of the number of received punishments decreased by 1.56 (z = −9.97 [p = 0.000]) in the anonymous punishment treatment and by 3.32 (z = −11.05 [p = 0.000]) in the non-anonymous punishment treatment. The difference in the two effects is significant (z = −4.53 [p = 0.000]).

Fig. 2: Treatment effects on average number of received sanctions in Period t, over subject’s contribution choice in the same period. (Four panels: Anonymous reward, Non-anonymous reward, Anonymous punishment, Non-anonymous punishment; average number of sanctions received by whether the subject defected or contributed.)

Figure 1 shows that contribution rates hardly differed between the anonymous and non-anonymous reward treatments, contradicting Hypothesis 4 that contribution rates should be lower in the latter than in the former. Average contribution rates in the non-anonymous reward treatment were about 5 % lower than in the anonymous reward treatment, but random-effects logistic regressions with random intercepts at the level of subjects and groups revealed that this difference is not significant (z = −0.96 [p = 0.337]). A trend analysis showed that neither the slopes (z = −0.85 [p = 0.400]) nor the intercepts (z = −0.31 [p = 0.757]) of the contribution rates differed significantly between the two reward treatments. In the non-anonymous punishment treatment, contribution rates were 5 % higher than in the anonymous punishment treatment. This effect, too, was insignificant (z = 0.03 [p = 0.976]). The trend analysis also showed no treatment effect (intercept: z = 0.33 [p = 0.741]; slope: z = −1.20 [p = 0.230]).

Figure 3 depicts the development of sanctioning rates over time. Each dot shows the rate of sanctioning of all subjects of the respective treatment at the respective period of the experiment. The solid lines show linear trends. We estimated these trends with multi-level logistic regressions with random intercepts at the level of groups and subjects, using a dataset with 4 observations per subject and period (one sanctioning decision for each of the 4 fellow group members). We found no significant difference in initial sanctioning rates between the anonymous reward treatment and the non-anonymous reward treatment (z = −0.05 [p = 0.960], log odds model). However, in the anonymous reward treatment, sanctioning rates significantly decreased (z = −6.28 [p = 0.000]), whereas there was a significant linear increase in sanctioning in the non-anonymous reward treatment (z = 2.69 [p = 0.007]).

Based on the linear trend, the estimated reward rate dropped from 0.682 initially to 0.480 in round 20 in the anonymous reward treatment, and rose from 0.650 to 0.716 in the non-anonymous reward treatment. Figure 3 shows that subjects sanctioned much less in the two punishment treatments. In the anonymous punishment treatment, punishment rates fell significantly over time (z = −2.25 [p = 0.024]). The same trend was found in the non-anonymous punishment treatment (z = −2.66 [p = 0.008]). Neither the intercept (z = −1.74 [p = 0.082]) nor the slope (−0.95) of the trend line differed between the two treatments (in the log odds model). Based on the linear trend, the estimated punishment rate dropped from 0.167 initially to 0.104 in round 20 in the anonymous punishment treatment and from 0.082 to 0.030 in the non-anonymous punishment treatment.

Fig. 3: Development of sanctioning rates. (Four panels: Anonymous reward, Non-anonymous reward, Anonymous punishment, Non-anonymous punishment; overall sanctioning rates per period with linear trends.)

Fig. 4: Effects of other players’ contribution and sanction of focal player on sanctioning behavior of focal player. (Four panels: Anonymous reward, Non-anonymous reward, Anonymous punishment, Non-anonymous punishment; sanctioning rate of player i in period t by whether player j contributed or defected in t and whether j sanctioned i in t−1, with the number of observations above each bar.)


Figure 4 shows sanctioning rates in the four treatments with sanctioning institutions. To generate Figure 4, we analyzed dyadic data, where each case represents the decision of subject i to sanction or not sanction one of her four team members j in a given period t. Figure 4 shows the sanctioning rates of i in Period t depending on j’s contribution decision in the same period and whether or not j had sanctioned i in the previous period. The numbers above the bars of Figure 4 indicate how often one of our subjects faced the respective decision. The error bars shown in Figure 4 visualize the 95 % confidence intervals estimated with robust standard errors (clustered on subjects). In the following, however, we report only results from multi-level logistic regressions with random intercepts at the levels of groups and subjects. The two left-hand panels of Figure 4 show that contributors (grey bars) were rewarded more often than defectors (white bars) in both treatments with a peer-reward institution. This difference is significant (z = 23.16 [p = 0.000]). The two right-hand panels show that contributors were punished less often than defectors (z = −13.97 [p = 0.000]) in treatments with a peer-punishment institution. Our results are compatible with counter-rewarding having taken place. Even for the anonymous reward treatment, Figure 4 shows that subjects rewarded by a team mate j in the previous period rewarded that team mate at a higher rate in the current period (z = 14.62 [p = 0.000]). Subjects in this treatment were not informed about which group member rewarded them. However, this effect suggests that subjects nevertheless inferred from past periods who had rewarded them and tried to reciprocate, despite the risk of rewarding the ‘wrong’ group member. In line with the notion that subjects use reward as an instrument of exchange when this is possible, Figure 4 also shows that there was clearly more counter-rewarding in the non-anonymous reward treatment. When also controlling for the group member’s contribution, we found a significant increase in reward rates when the group member receiving the reward had previously rewarded the subject (z = 13.89 [p = 0.000]). This was observed both when the other group member contributed and when she defected. There is weaker evidence for counter-punishment. In the treatment with anonymous punishment, there is no significant effect of j’s previous sanctioning behavior on i’s punishment decision (z = −0.27 [p = 0.787]). This is different from what we observed in the anonymous reward treatment. In the non-anonymous punishment treatment, subjects punished a group member more often if they had been punished by the same subject in the previous period (z = 2.91 [p = 0.004]). Subjects even counter-punished contributors (with a rate of 9.4 %) if the contributor had punished them in the previous period. Note, however, that our subjects encountered this situation only 64 times. Thus, this very strong form of “antisocial counter-punishment” was observed only six times. Hypothesis 6 predicts that, in the non-anonymous punishment institution, the sanctions imposed on a player by her peers should be less contingent on the sanctions imposed by that player on her peers in the past than in a non-anonymous reward institution. Figure 5 provides no support for this hypothesis, showing sanctioning rates

Fig. 5: Test of interaction of treatment with effects of sanctioning behavior of other player on focal player’s sanction (hypothesis 6). (Two panels, Reward treatments and Punishment treatments: sanctioning rate of player i in period t by whether player j sanctioned i in t−1, separately for anonymous and non-anonymous institutions.)

estimated with multi-level logistic regressions with random intercepts at the level of groups and subjects. Figure 5 shows that we did not find for the comparison of the two reward treatments a stronger effect for having been sanctioned by a team member in the past than we find for the comparison of the two punishment treatments. To be precise, we did find a significant difference between these two treatments in the log odds model (z = 2.26 [p = 0.024]), but as Figure 5 shows, this effect is not meaningful in terms of probability differences. Hypothesis 7 predicts that the possibility of counter-sanctioning will reduce contribution rates more in the reward treatments than in the punishment treatments. Figure 6 compares contribution rates in the four treatments estimated with multi-level logistic regressions with random intercepts at the level of groups and subjects. It turns out that there is no significant difference in the contribution rates between the two reward treatments (z = − 0.18 [p = 0.857]), nor is there a difference between the two punishment treatments (z = − 0.03 [p = 0.976]). Thus, our data lend no support to Hypothesis 7. The different contribution and sanctioning patterns in the five treatments also translated into payoff differences. In the baseline treatment, participants earned on average 3.7 MU per round (t = 3.74 [p = 0.001]). In stark contrast, participants earned on average 25.1 MU (t = 30.00 [p = 0.000]) in the anonymous reward treatment and 26.6 MU (t = 21.59 [p = 0.000]) in the non-anonymous reward treatment. The difference between the two reward treatments is not significant (t = −1.68 [p = 0.098]).


[Figure 6: contribution rate (0–1) by anonymity of the sanction (anonymous vs. non-anonymous), shown separately for the reward and punishment treatments.]

Fig. 6: Test of interaction effect of reward vs. punishment by anonymity of sanction on contribution (hypothesis 7).

Obviously, these high payoffs per round resulted from the very high contributions and the fact that receiving a reward increases one’s payoff. Payoffs were much lower in the punishment treatments. In the anonymous punishment treatment, participants lost 0.1 MU on average (t = −0.07 [p = 0.945]). In 40 % of all played rounds in this treatment, participants had a negative payoff. In the non-anonymous punishment treatment, participants earned 4.50 MU on average (t = 7.09 [p = 0.000]).

Subjects could earn more MUs in the reward conditions due to the rules of the game. We corrected for this with lower endowments in the reward conditions as compared to the punishment conditions, aligning the maximum possible wealth in all sanctioning conditions to 1,800 MU (see above). This resulted in higher average net wealth at the end of the game for subjects in the punishment conditions (anonymous: 1,318, non-anonymous: 1,410) compared with subjects in the reward conditions (anonymous: 1,022, non-anonymous: 1,052).

5 Discussion and conclusion

Previous research has found that peer sanctioning institutions as a solution to the freerider problem can be vulnerable to counter-sanctioning. However, these studies did not allow for an assessment of whether peer-punishment is vulnerable under the same conditions as peer-reward. In this chapter we focused on conditions reflecting peer sanctioning in many empirical settings, that is, small group situations in which enforcers cannot remain anonymous and retaliation against sanctions and counter-sanctions is possible in future encounters. We expected that the non-anonymity of enforcers in repeated interaction may eliminate the vulnerability of peer-punishment to counter-punishment postulated by previous research, while we expected non-anonymity to reduce the effectiveness of peer-reward. Comparing the effects of counter-punishment and counter-reward for this setting in a collective good experiment, we found no evidence that the possibility for counter-sanctioning undermined


peer sanctions as a solution to the freerider problem. These results put into perspective the results of previous studies of counter-punishment (Nikiforakis 2008) and support our expectation that counter-punishment may not be a problem in repeated non-anonymous interactions, as they often occur in real-life collective action settings. What we did not expect was that a similar robustness to non-anonymity would be found for a reward institution, in contrast with previous research (Flache 1996). Our results thus raise probing questions.

Strikingly, our experiments did support the theoretical mechanisms on the basis of which we expected that peer sanctioning would be vulnerable to counter-reward and robust to counter-punishment. We found that subjects traded rewards for rewards rather than using them to enforce contributions to the collective effort in the non-anonymous reward treatment. This was in line with the notion that reward is driven by a logic of exchange and conditional cooperation that can undermine enforcement. We also found, as hypothesized, that the possibility of counter-punishment only slightly reduced enforcement rates in the non-anonymous peer-punishment institutions, consistent with the explanation that peer-punishment is intrinsically motivated.

A reason why the theoretical mechanisms we found did not induce a corresponding difference in terms of the vulnerability of the institutions could be that the vulnerability of peer-reward only shows up in the longer term. Our data (see Figure 3) suggest that the differences between sanctioning behavior with and without the possibility of counter-reward gradually increase over time, which is in line with theoretical predictions from learning theory and results of earlier studies using longer games than ours (Flache and Bakker 2012; Flache 1996). These differences apparently did not yet translate into differences in contribution decisions, but it is possible that contributions would be affected over horizons longer than the 20 rounds we employed. An extrapolation of the increasing trend in reward rates that we found in the non-anonymous reward treatment allows for speculation that, after sufficient time, participants in this treatment may discover that they can obtain high earnings even without having to make costly contributions, eventually yielding a decline in contribution rates similar to the one found by Flache (1996). Likewise, an extrapolation of the simultaneous increase in contribution rates and decline in punishment rates, which we observed in the non-anonymous punishment treatment after the initial phase of the game, could indicate that eventually both freeriding and counter-punishment may be sufficiently deterred by experiences of retaliation to guarantee sustained contribution without the need for actual punishment. Future research should, therefore, study longer games and manipulate the length of the game to assess how this affects the link between contribution behavior, sanctioning and the possibility of counter-sanctioning.

Our results also differ in further interesting ways from previous experiments comparing reward and punishment. While we observe higher contribution rates in the anonymous reward condition than in the anonymous punishment condition, Van Miltenburg and coauthors (2014) find the opposite relation. More precisely, contribution rates in their punishment conditions are approximately comparable to our anonymous


punishment condition, but contribution rates in our anonymous reward conditions are clearly higher than in Van Miltenburg et al.’s reward condition. However, in contrast with our anonymous sanctioning conditions, their treatment uses random matching after every round. We believe that this can explain why we observe higher contributions under anonymous reward. Unlike theirs, our reward treatment allows for the emergence of a repeated exchange of reward for contribution, even when enforcers are unknown. Contributors know even in the anonymous reward condition that they can expect future reward for present contributions. This provides a stronger incentive to contribute than in a random matching design. If punishment is primarily driven by emotional gratification, this difference in design should be less relevant for the punishment treatment than for the reward treatment, because under punishment the prospect of building up long-term exchanges is not the main motivation to contribute. The fact that we do not use random matching may, however, explain why we find relatively low punishment rates compared to previous experiments with peer-punishment. Without random matching, persistent freeriders risk repeated and escalating punishment from the same emotionally-driven enforcers. Accordingly, in an anonymous punishment design without random matching, less punishment may be needed to credibly signal that freeriding will elicit future sanctions.

We believe that a promising direction for future research is the integration of theoretical models of peer-punishment based on social preferences with models of peer-reward based on conditional cooperation and exchange. While these mechanisms have hitherto typically been separated in the literature, there is no compelling reason why this should be so. Even when “altruistic punishment” is intrinsically motivated, players can still consider the (partially subjective) costs and benefits of it in terms of long-term exchange outcomes. Conversely, even when conditional cooperation in the exchange of peer-reward against contribution is primarily driven by associated long-term costs and benefits, actors may derive some intrinsic benefit from not rewarding a freerider, or from rewarding a contributor. Future research can assess whether such combined models can better account for differences between peer-punishment and peer-reward.

Peer sanctioning is far from the only solution to freerider problems in collective action, and under many conditions it may not be the most powerful or most efficient solution. For example, signaling and reputation systems can be highly effective in large-scale anonymous online markets (Diekmann et al. 2014) in which there is no possibility of long-term repeated interaction with the same partners. However, the conditions of relatively small scale, non-anonymous and repeated interaction that we focused upon here reflect those of many freerider problems people face in their daily lives: at the workplace, in their neighborhoods or in informal social groups. Our work suggests that under those conditions, the informal “bottom up” institution of peer sanctioning may be less vulnerable and more effective than previous research on counter-sanctioning has assumed. This is a potentially hopeful insight that warrants further scrutiny in future work.


Appendix: instructions

Below we present sample instructions for the Punishment and Counter-punishment condition. Presented first are the written instructions given to participants at the start of the experiment. The written instructions were very similar across all conditions. For the Baseline condition, sentences referring to a change in others’ payoffs were removed. In conditions which employed Reward sanctioning rather than Punishment sanctioning, the word “decrease” was replaced with the word “increase”. For each condition the endowment was changed to the appropriate amount. No other changes were required. The written instructions for the Punishment and Counter-punishment condition were as follows:

Fig. 7: Written instructions for the Punishment and Counter-punishment condition.

We also present a sample screen from the Punishment and Counter-punishment condition, showing the sanctioning stage. In this stage, participants are presented with their buddies’ contribution decisions and contribution history. They respond by selecting Yes or No for each buddy, indicating whether they want to reduce this buddy’s payoff or not. Participants are immediately shown how much the selected decisions will cost them. This screen appeared in all conditions in which sanctioning was possible.


Fig. 8: Sanctioning stage for the anonymous punishment condition and the non-anonymous punishment condition.

The decision screens throughout the experiment were designed to make sure participants had all available information from the current period on their screen at all times, while not overwhelming them. As the experiment progressed through the stages of each period, the lines of the decision table were filled from top to bottom. The condition a participant played determined which lines of the decision table were presented. Throughout the experiment, we color-coded decisions (blue) and consequences (orange) to help participants keep track of what happened during the current period.

Bibliography

[1] Andreoni, James. 1988. “Why Free Ride?: Strategies and Learning in Public Goods Experiments.” Journal of Public Economics 37(3):291–304.
[2] Anthony, Denise, Sean W. Smith, and Timothy Williamson. 2009. “Reputation and Reliability in Collective Goods: The Case of the Online Encyclopedia Wikipedia.” Rationality and Society 21(3):283–306.
[3] Barker, James R. 1993. “Tightening the Iron Cage: Concertive Control in Self-Managing Teams.” Administrative Science Quarterly 38(3):408–437.
[4] Bolton, Gary E., and Axel Ockenfels. 2000. “ERC: A Theory of Equity, Reciprocity, and Competition.” American Economic Review 90(1):166–193.
[5] Bouma, Jetske, Erwin Bulte, and Daan van Soest. 2008. “Trust and Cooperation: Social Capital and Community Resource Management.” Journal of Environmental Economics and Management 56(2):155–166.
[6] Bowles, Samuel, and Herbert Gintis. 2011. A Cooperative Species: Human Reciprocity and Its Evolution. Princeton, NJ: Princeton University Press.
[7] Camerer, Colin F. 2003. Behavioral Game Theory: Experiments in Strategic Interaction. New York: Russel Sage Foundation.
[8] Charness, Gary, and Matthew Rabin. 2002. “Understanding Social Preferences with Simple Tests.” Quarterly Journal of Economics 117(3):817–869.
[9] Coleman, James S. 1990. Foundations of Social Theory. Cambridge, MA: Harvard University Press.
[10] Denant-Boemont, Laurent, David Masclet, and Charles N. Noussair. 2007. “Punishment, Counterpunishment and Sanction Enforcement in a Social Dilemma Experiment.” Economic Theory 33(1):145–167.
[11] Diekmann, Andreas. 1985. “Volunteer’s Dilemma.” Journal of Conflict Resolution 29(4):605–610.
[12] Diekmann, Andreas, Ben Jann, Wojtek Przepiorka, and Stefan Wehrli. 2014. “Reputation Formation and the Evolution of Cooperation in Anonymous Online Markets.” American Sociological Review 79(1):65–85.
[13] Diekmann, Andreas, and Wojtek Przepiorka. 2015. “Punitive Preferences, Monetary Incentives and Tacit Coordination in the Punishment of Defectors Promote Cooperation in Humans.” Scientific Reports 5:10321.
[14] Dijkstra, Jacob. 2012. “Explaining Contributions to Public Goods: Formalizing the Social Exchange Heuristic.” Rationality and Society 24(3):324–342.
[15] Dijkstra, Jacob. 2015. “Social Exchange: Relations and Networks.” Social Network Analysis and Mining 5(1):60. doi:10.1007/s13278-015-0301-1.
[16] Fehr, Ernst, and Simon Gächter. 2000. “Cooperation and Punishment in Public Goods Experiments.” American Economic Review 90(4):980–994.
[17] Fehr, Ernst, and Simon Gächter. 2002. “Altruistic Punishment in Humans.” Nature 415(6868):137–140.
[18] Fehr, Ernst, and Herbert Gintis. 2007. “Human Motivation and Social Cooperation: Experimental and Analytical Foundations.” Annual Review of Sociology 33:43–64.
[19] Fehr, Ernst, and Klaus M. Schmidt. 1999. “A Theory of Fairness, Competition, and Cooperation.” The Quarterly Journal of Economics 114(3):817–868.
[20] Flache, Andreas. 1996. The Double Edge of Networks: An Analysis of the Effect of Informal Networks on Cooperation in Social Dilemmas. Amsterdam: Thesis Publishers.
[21] Flache, Andreas. 2002. “The Rational Weakness of Strong Ties: Failure of Group Solidarity in a Highly Cohesive Group of Rational Agents.” Journal of Mathematical Sociology 26(3):189–216.
[22] Flache, Andreas, and Dieko M. Bakker. 2012. “De Zwakke Kant van Sociale Controle. Theoretisch en Experimenteel Onderzoek naar de Spanning Tussen Groepsreciprociteit en Relationele Reciprociteit in een Sociaal Dilemma.” Pp. 103–130 in Samenwerking in sociale dilemma’s. Voorbeelden van Nederlands onderzoek, edited by V. Buskens and I. Maas. Amsterdam: Amsterdam University Press.
[23] Flache, Andreas, and Michael W. Macy. 1996. “The Weakness of Strong Ties: Collective Action Failure in a Highly Cohesive Group.” Journal of Mathematical Sociology 21(1–2):3–28.
[24] Flache, Andreas, Michael W. Macy, and Werner Raub. 2000. “Do Company Towns Solve Free Rider Problems? A Sensitivity Analysis of a Rational-Choice Explanation.” In The Management of Durable Relations: Theoretical and Empirical Models for Households and Organizations, edited by W. Raub and J. Weesie. Amsterdam: Thela Thesis.
[25] Friedman, James W. 1971. “A Non-Cooperative Equilibrium for Supergames.” Review of Economic Studies 38(1):1–12.
[26] Friedman, James W. 1986. Game Theory With Applications to Economics. New York: Oxford University Press.
[27] Hardin, Garrett. 1968. “The Tragedy of the Commons.” Science 162(3859):1243–1248.
[28] Holländer, Heinz. 1990. “A Social Exchange Approach to Voluntary Cooperation.” The American Economic Review 80(5):1157–1167.
[29] Homans, George C. 1951. The Human Group. New York: Harcourt Brace Jovanovich.
[30] Homans, George C. 1974. Social Behavior: Its Elementary Forms. New York: Harcourt Brace Jovanovich.
[31] Kandel, Eugene, and Edward P. Lazear. 1992. “Peer Pressure in Partnerships.” Journal of Political Economy 100(4):801–817.
[32] Ledyard, John O. 1995. “Public Goods: A Survey of Experimental Research.” Pp. 111–194 in Handbook of Experimental Economics, edited by J. H. Kagel and A. E. Roth. Princeton, NJ: Princeton University Press.
[33] Manhart, Klaus, and Andreas Diekmann. 1989. “Cooperation in 2- and N-Person Prisoner’s Dilemma Games: A Simulation Study.” Analyse und Kritik 11(2):134–153.
[34] Marwell, Gerald, and Pamela Oliver. 1993. The Critical Mass in Collective Action. Cambridge: Cambridge University Press.
[35] Van Miltenburg, Nynke, Vincent Buskens, Davide Barrera, and Werner Raub. 2014. “Implementing Punishment and Reward in the Public Goods Game: The Effect of Individual and Collective Decision Rules.” International Journal of the Commons 8(1):47–78.
[36] Nikiforakis, Nikos. 2008. “Punishment and Counter-Punishment in Public Good Games: Can We Really Govern Ourselves?” Journal of Public Economics 92(1):91–112.
[37] Nikiforakis, Nikos, and Dirk Engelmann. 2011. “Altruistic Punishment and the Threat of Feuds.” Journal of Economic Behavior and Organization 78(3):319–332.
[38] Oliver, Pamela. 1980. “Rewards and Punishments as Selective Incentives for Collective Action: Theoretical Investigations.” American Journal of Sociology 85(6):1356–1375.
[39] Olson, Mancur. 1965. The Logic of Collective Action. Cambridge: Harvard University Press.
[40] Opp, Karl-Dieter, Peter Voss, and Christiane Gern. 1995. Origins of a Spontaneous Revolution: East Germany, 1989. Ann Arbor, MI: University of Michigan Press.
[41] Ostrom, Elinor, James Walker, and Roy Gardner. 1992. “Covenants With and Without a Sword: Self-Governance Is Possible.” The American Political Science Review 86(2):404–417.
[42] Petersen, Trond. 1992. “Individual, Collective and Systems Rationality in Work Groups: Dilemmas and Market-Type Solution.” American Journal of Sociology 98(3):469–510.
[43] Roethlisberger, Fritz J., and William J. Dickson. 1939. Management and the Worker: An Account of a Research Program Conducted by the Western Electric Company, Hawthorne Works. Cambridge, MA: Harvard University Press.
[44] Spagnolo, Giancarlo. 1999. “Social Relations and Cooperation in Organizations.” Journal of Economic Behavior and Organization 38(1):1–25.
[45] Taylor, Michael. 1987. The Possibility of Cooperation. Cambridge: Cambridge University Press.
[46] Willer, Robb. 2009. “Groups Reward Individual Sacrifice: The Status Solution to the Collective Action Problem.” American Sociological Review 74(1):23–43.

Fabian Winter and Axel Franzen

Diffusion of Responsibility in Norm Enforcement
Evidence from an n-Person Ultimatum Bargaining Experiment

Abstract: The enforcement of social norms is crucial for the functioning of groups, organizations, and societies. Often, norm enforcement requires the coordinated action of many individuals, but sometimes a subset of actors or even a single person is sufficient to punish a norm violation. The latter situation lowers the barrier of norm enforcement, but it also introduces a coordination problem of who should be responsible for the punishment. Just like in the Volunteer’s Dilemma, this might then lead to the diffusion of responsibility, where everybody hopes everyone else will enforce the norm. In this chapter, we study norm enforcement in the multiple responder ultimatum game, a game with one proposer and multiple responders. In this game, a rejection by a single responder is sufficient to punish the proposer, and only those who choose to reject the offer lose the offered amount. We derive hypotheses from different models of (behavioral) game theory, and test these hypotheses in a lab experiment. Our results suggest that the diffusion of responsibility can also be observed in our design: more responders lower the likelihood of rejecting a given offer. We also find that responder behavior in the multiple responder ultimatum game is best explained by an adapted version of the Fehr–Schmidt model.

1 Introduction

Social norms prescribe more or less explicitly how individuals should behave and are vitally important for the functioning of a society. One of the most important norms is the prohibition against killing or threatening the life of fellow human beings. If this norm did not exist, or if compliance was not enforced, life would probably be as Thomas Hobbes described in his famous Leviathan: “solitary, poor, nasty, brutish, and short” (Hobbes 1651). Societies sanction non-compliance with the anti-homicide norm with the highest available penalty: lifelong imprisonment, or even death in some countries.

Another, much less dramatic, example of an essential norm is property rights. Harold Demsetz (1967) described their importance for society. If there were no property rights for private goods, or if they were not properly enforced, individuals would not invest in housing, agriculture, or any other kind of handicraft or business, since they would run the risk of losing their investments and efforts to others not involved in the production process. The building of houses, cities, and the functioning of any complex society would become unthinkable and certainly highly improbable.


Not all norms are as obviously important and fundamental as the above examples. More recently, sociologists and economists have focused on norms of fairness and reciprocity, which have already been discussed in the classic writings of Georg Simmel (1908), Alvin Gouldner (1960), and Marcel Mauss (1968). The norm of reciprocity can be defined as the expectation that an individual will treat other individuals as they have treated him. Hence, an actor will return a favor but will also punish others for misbehavior towards him. Since returning favors and losses restores equality between two actors, the fairness norm is possibly even more fundamental than the norm of reciprocity. Fairness can be defined as the idea that individuals are entitled to the same rights and resources. It is easy to see why both norms should be important for societies. Suppose a society consists of many individuals who adhere to these norms; knowing that others return favors should ease the initiation of social and economic exchange. Likewise, societies in which individuals fear exploitation and unfair treatment will less easily engage in exchange and trade, or will restrict exchange to a very small group of kin contacts. However, restricting exchange and trade to small groups of kin contacts limits a society’s ability to fully realize the advantages of specialization, and thus economic development. Not all members in a given society adhere to fairness norms to the same extent: compliance depends on the situation and on the interplay of different variations of fairness and reciprocity (Franzen and Pointner 2012; Pointner and Franzen 2016). There is nevertheless convincing evidence that reciprocity is a strong norm in some societies (Diekmann 2004).

Norms must be reinforced to persist and to guide behavior. If norms are institutionalized, they are clearly defined by laws and reinforced by institutions of the state such as attorneys, courts, and the police. This is the case for the anti-homicide norm and property rights. Other norms, such as fairness or reciprocity, are less clearly defined, and professional institutions do not usually enforce them, at least not directly. Violations of the norm of fairness have no legal consequences in many instances. These norms must therefore be reinforced through individual voluntary sanctions to persist. However, norm enforcement is often costly, and rational actors face incentive problems for contributing to the enforcement of norms. This problem is often referred to as the “second order public goods problem” (Oliver 1980).

There are various ways to study the conditions under which voluntary norm enforcement takes place and which circumstances facilitate it. In experimental game theory, one very widely used design is the Ultimatum Game (UG) (Güth, Schmittberger, and Schwarze 1982), in which an endowment is given to two players. One of them (the proposer) can divide the endowment any way he wants. The second player (the responder) can then either accept the offer or reject it. In case of acceptance, both players receive the proposed amounts. In case of rejection, both players receive nothing. From a game-theoretical point of view, and under the assumption that players are payoff-maximizing, the responder should accept any offer bigger than zero (and the proposer should only offer the smallest possible amount that is bigger than zero). This solution


is the only subgame-perfect Nash equilibrium.¹ However, many experiments with the UG show that offers below 40 % are very often rejected (Güth and Kocher 2013). One interpretation of these results is that responders become angry because proposers violate norms of fairness when offering less than half of the endowment, and that they sanction this unfair behavior. Since the sanctioning occurs even though it is costly, it is often referred to as “altruistic punishment”. Altruistic punishment can also be observed in public good games (PGG). Fehr and Gächter (2000; 2002) showed that actors sanction non-cooperating actors in the PGG even if sanctions are costly. Furthermore, the possibility of sanctioning non-cooperative players restores cooperation and leads at least in some circumstances to almost full cooperation in four-person groups (but see Egas and Riedl 2008 for other results).

In comparison to the PGG, in the UG a single player is sufficient for sanctioning a norm violation. However, many real-life situations are also characterized by the fact that the group of potential norm-sanctioning individuals is larger than one. If this is the case, the bystanders face a coordination problem of who should do the sanctioning. Hence, the idea of punishment can be thought of as a step-level second-order public goods problem.

There are several ways to study step-level public good games. One way is the approach of Diekmann and Przepiorka (2015), who model it as a Volunteer’s Dilemma (VOD) (Diekmann 1985) and, alternatively, as a Missing Hero Dilemma (MHD). In the VOD, each individual (bystander) receives utility U if a norm is reinforced. The person who does the sanctioning faces costs, but these costs are smaller than his utility (U − K > 0). However, if no player enforces the norm, then everyone receives a utility of zero. In the VOD, therefore, every actor prefers to punish a norm violation if nobody else does so. But since enforcement by a single actor is sufficient, individuals in the VOD face the problem of who should do the sanctioning. In the VOD, peer punishment is on the equilibrium path, such that the question of who punishes becomes a coordination game rather than a social dilemma. This substantially relaxes the problem of peer punishment. In the MHD, a public good can also be provided by a single player. However, unlike the VOD, the costs of punishment (K) exceed the benefits (U) gained from the public good (U − K < 0). Hence, no player has an incentive to punish (C) irrespective of what others do, and not to punish (D) is a dominant strategy. If no player punishes, however, then all players receive zero as in the VOD. The MHD is thus a social dilemma in the

1 The all/nothing split is the only subgame-perfect equilibrium, but the UG has many other Nash equilibria. If the responder threatens to reject any offer lower than x, offering x is the Nash equilibrium of the game. This threat is not credible, but credibility is not a requirement of the Nash equilibrium concept. The commonly observed 50:50 splits in the UG are therefore also not in contradiction to game theory per se. They do, however, question the predictive reliability of the subgame-perfect equilibrium.


sense that self-regarding players will not provide the good. Non-provision and a payoff of zero is the only Nash equilibrium.

We study the problem of norm enforcement by a further variation of the cost and benefit structure. The game we propose is a variation of the UG such that there are multiple responders. In the multiple-responder ultimatum game (MRUG), a single proposer makes an offer to more than one responder simultaneously. The norm of fairness gets violated if the proposer offers less than half of the original endowment to the responders. Every responder can then veto the proposer’s offer, in which case the vetoing (sanctioning) individual receives a payoff of zero, as in the original UG. Hence, the cost (K) of sanctioning is exactly as large as the payoff of accepting the offer. Thus, in the MRUG, K = U, which equals the offer of the proposer. Unlike in the MHD or the VOD, every non-sanctioning responder in the MRUG receives a payoff of U > 0 even if no responder does the sanctioning, rather than 0. The MRUG is not a social dilemma, since a veto by one (or more) of the responders will not increase the payoff of other responders; but it offers the opportunity to study the question of whether an individual’s willingness to punish norm deviations depends on how many bystanders can enforce the norm.

We discuss three different conditions in which potential enforcers are either alone, or with one or three other responders. We show that some results of the experimental research on the VOD do not depend on the assumption of equilibrium punishment. On the one hand, the individual decision to reject an offer decreases with the number of responders in the MRUG. When the potential enforcer is alone, the probability that a too-low offer will be rejected is significantly higher than in groups of two (odds ratio = 1.24) or in groups of four (odds ratio = 1.42). We will refer to this as the Individual Bystander Effect (IBE). On the other hand, the overall empirical probability that norms are enforced increases with group size in the VOD (Franzen 1995). This is also the case for the MRUG. We will refer to this as the Aggregate Bystander Effect (ABE). We show that the individual decision to engage in punishment is partly consistent with a specific form of inequality aversion put forth by Fehr and Schmidt (1999). We also, however, show that the effect of bystanders on the provision of the public good is not correctly predicted by this model, suggesting that sanctioning is partly driven by the individual desire to personally punish the offender, but that the tendency to punish also depends on the number of bystanders. This suggests that it might be important to see the norm being enforced rather than feeling obliged or emotionally driven to enforce the norm oneself. We show that this notion is captured by a different interpretation of the Fehr–Schmidt model.

Extending the UG by adding more responders to the decision situation has been done before (e.g., Grosskopf 2003; Fischbacher, Fong, and Fehr 2009). Grosskopf (2003) compares the standard UG with a UG with three responders. However, unlike our MRUG, the proposer receives his share if at least one responder accepts the offer, and receives zero only if all responders reject. This introduces competition among the responders, and induces the proposer to reduce the offer, since the probability


that at least one of the responders accepts the offer increases with the number of responders. Similarly, Fischbacher, Fong, and Fehr (2009) vary the UG between one, two, and five responders. In this study, the proposer receives the proposed share if at least one responder accepts and is only punished if all responders reject. In contrast, in the MRUG, one rejection is sufficient to punish the proposer and responders face the problem of who should volunteer to punish and who should forgo the offer. To our knowledge, this is the first time such a decision situation has been considered.
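To make the contrast between the three payoff structures discussed in this introduction concrete, here is a small, purely illustrative Python sketch (not part of the chapter) of a single potential enforcer’s payoff in the VOD, the MHD, and the MRUG; the parameter values U, K, and x are arbitrary stand-ins.

    def enforcer_payoff(game, i_sanctions, someone_else_sanctions, U=2.0, K=1.0, x=1.0):
        # VOD: U > K; MHD: U < K; in both, nobody sanctioning leaves everyone with 0.
        if game in ("VOD", "MHD"):
            if not (i_sanctions or someone_else_sanctions):
                return 0.0
            return U - K if i_sanctions else U
        # MRUG: rejecting (sanctioning) forgoes the offer x; other responders' vetoes
        # do not change a non-rejecting responder's payoff.
        if game == "MRUG":
            return 0.0 if i_sanctions else x
        raise ValueError(game)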

2 Theoretical predictions

The multiple-responder ultimatum game (MRUG) is played by n ≥ 2 players, one proposer and n − 1 responders. It extends the standard ultimatum game (Güth, Schmittberger, and Schwarze 1982) by adding additional responders. As in the UG, a proposer makes an offer x to the responders on how to split a sum of money π. The offer is then multiplied by the number of responders (n − 1), such that every responder receives the same offer x, which can individually be accepted or rejected. Rejecting an offer leads to zero profit for the rejecting responder. If at least one responder rejects, the proposer loses his share and receives 0 as well. Responders, on the other hand, can keep their offer, even if another responder rejects, unless the responder rejects the offer themselves. These design choices have several consequences.

First, the collective welfare ω (i.e., the total sum of money which is paid out in case of acceptance) depends on the proposer’s offer and is given by

ω = π − x + (n − 1) ⋅ x    if the offer is accepted by all responders,
ω = (n_a − 1) ⋅ x          if the offer is rejected by at least one responder and accepted by n_a responders.

If the proposer keeps everything and thus offers x = 0, the total sum will be the initial endowment π. If, on the other hand, he decides to give away everything and to offer x = π, the proposer loses π but transfers (n − 1) ⋅ π in total. This means that the proposer can “print money”: if there are at least two responders, offering everything to them is socially efficient since the proposer loses x but transfers (n − 1) ⋅ x > x in total. If proposers try to maximize the collective additive welfare, this can even lead to offers of x = π. We will not discuss this point further, as we are mainly interested in the responders’ choices. The interesting consequence of this design choice is that offers of 50 % are always egalitarian, irrespective of the number of other players. This keeps the fair split constant (and thus comparable) over treatments. An alternative definition with a constant size of the pie, where every responder would receive x/(n − 1), would give different egalitarian offers for different n and thus different predictions for different treatments.
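As a quick numerical illustration of the “money printing” point (a sketch, not part of the original chapter), the total payout when all responders accept can be computed as follows, following the verbal rules above; π = 10 and n = 5 are arbitrary example values.

    def total_if_all_accept(pi, x, n):
        # Proposer keeps pi - x; each of the n - 1 responders receives x.
        return (pi - x) + (n - 1) * x

    print(total_if_all_accept(pi=10, x=0, n=5))    # 10: keeping everything preserves the pie
    print(total_if_all_accept(pi=10, x=10, n=5))   # 40: offering everything "prints money"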


Second, rejections only affect the proposer and the rejecting responder(s). Thus, sanctions can be targeted specifically to the wrongdoer and do not impose externalities on other responders. Bribery would be a real-world scenario of this kind: if a Mafioso’s bribes do not satisfy every single corrupt policeman, a single deviation could bring him to jail, while all the others could keep their already received bribes.

Finally, as we will see later, the psychological costs of norm violations make punishment more attractive. Lower offers, and thus stronger norm violations, cause a higher psychological incentive to punish. At the same time, the opportunity costs of rejection are lower for lower offers, which are thus less costly to reject in monetary terms. This aligns psychological and monetary costs, but at the same time ensures that the monetary costs of rejection are always positive.

In what follows, we will discuss several formal models of human behavior and derive predictions from these models about the effect of different offers and group sizes in the MRUG. The first model applies the subgame-perfect equilibrium under common knowledge of rationality and material maximization. Under these very strict assumptions, all offers will be accepted such that we would not expect group size effects. We will then very briefly sketch the social preference utility function proposed by Fehr and Schmidt (1999) and derive hypotheses from two variants of this model. Depending on the exact formulation of the model, we will derive group size effects for at least a range of offers and increasing rejection rates for low offers.

2.1 The rational man

The equilibrium analysis of the different treatments under the assumption of a “rational man” is straightforward. For the one-responder case, the usual subgame-perfect equilibrium of offering and accepting (almost) nothing applies (Güth, Schmittberger, and Schwarze 1982). This reasoning has already been discussed in the introduction. It is also easy to see that the equilibrium outcome does not change if more responders enter the game. Additional responders neither affect the remaining responder’s payoff, nor are they affected by her choices, implying that responder behavior does not change. And since subgame perfection in the UG only depends on the responder, the equilibrium does not change either. Since for any number of responders the dominant strategy is to accept all offers, predictions relying on the subgame perfection concept and the assumption of a “rational man” do not predict a diffusion of responsibility in larger groups. This implies three hypotheses: (1) All offers will be accepted. (2) Following from this floor effect, we predict no differences at the micro level, that is, the individual probability of rejecting an offer (IBE) is constant across treatments. (3) Finally, also the provision of punishment at the macro level (ABE) does not differ between treatments.


2.2 The social man

In recent years, several “behavioral” models of social interaction have been proposed to understand the discrepancies between many empirical results and the predictions made by game theory. Since these new models change their conception towards a more socially embedded, often norm-guided man, one could contrast the “rational man” with the “social man”. This norm-guided actor has lately been refurbished in the “neo-classical repair shops”, most notably by the theory of inequity aversion propagated by Fehr and Schmidt (1999). Their theory is neo-classical in the sense that it does not challenge the as-if idea of a maximizing, or rational, decision-maker; yet it is also a repair program, since it enriches the utility function and changes the game-theoretical presentation such that the game-theoretical solution moves closer to reality instead of modeling the decision process directly (Güth 2011:20).

We will use two simplifying interpretations of the theory by Fehr and Schmidt (henceforth the FS-Model) to illustrate how their theoretical model can be used to predict different reactions to norm violations. Interestingly, the diffusion of responsibility in our framework of ultimatum bargaining can be a result of these social preference models. Before discussing our interpretations of the FS-Model in the next two subsections, let us briefly review the underlying idea of their approach. Fehr and Schmidt assume that players are driven by three forces: material self-interest, the disutility from norm-violations to their own disadvantage, and the disutility from norm-violations to their own advantage. The latter two forces are referred to as “inequity aversion”. More technically, Fehr and Schmidt define the following utility function of player i:

u_i(x_i, x_j) = x_i − (α_i / (n − 1)) ∑_{j≠i} max(x_j − x_i, 0) − (β_i / (n − 1)) ∑_{j≠i} max(x_i − x_j, 0)

The material payoff of i is denoted by x_i ≥ 0, the payoff of j by x_j, and without loss of generality we can further assume that ∑_{i=1}^{n} x_i = 1.² The second term of this equation corresponds to the disutility experienced from being worse off than others (scaled by the “greed-parameter” α_i ≥ 0) and the third term corresponds to the disutility from norm-violations to one’s own advantage (scaled by β_i, 0 ≤ β_i ≤ 1). The parameters α_i and β_i can be interpreted as the degree of commitment or internalization of a norm (Winter, Rauhut, and Helbing 2012). The greater α_i and β_i are, the more weight the person puts on egalitarian outcomes. In what follows, we will restrict our discussion to cases where the decision-maker is worse off than the other person, implying x_i ≤ 1/2. We believe that this case is more interesting, since norm violations to one’s own advantage will never lead to norm enforcement (β_i ≤ 1).

² Note that this normalization does not violate our definition of the game. We allow the pie size to vary according to the proposer’s offer, so the pie size is not bound to a specific value, for example, π = 1. The normalization is still possible by simply comparing the relative payoffs for a given offer.
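The utility function above can be transcribed directly into code. The following Python sketch is only meant to make the definition concrete; the payoff vector and parameter values are illustrative.

    def fs_utility(payoffs, i, alpha, beta):
        # Fehr-Schmidt utility of player i given everyone's material payoffs.
        x_i = payoffs[i]
        others = [x_j for j, x_j in enumerate(payoffs) if j != i]
        envy = sum(max(x_j - x_i, 0) for x_j in others) / len(others)
        guilt = sum(max(x_i - x_j, 0) for x_j in others) / len(others)
        return x_i - alpha * envy - beta * guilt

    # Example: a responder holding 0.2 facing a proposer holding 0.8, with alpha = 1.
    print(fs_utility([0.8, 0.2], i=1, alpha=1.0, beta=0.3))  # 0.2 - 1 * 0.6 = -0.4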


2.2.1 The unconditionally enforcing man

The behavioral model of unconditional enforcement makes simple, non-strategic assumptions about norm enforcement. If I feel treated badly, I will react, no matter what others do or think. It is a first simplification of the FS-Model, and differs from the original model in two ways. First, it assumes that the responder compares her outcome only to the proposer’s outcome. Any remaining inequalities to other parties (e.g., responders) are considered irrelevant. This is of course by no means an innocent assumption. People might for instance feel treated unfairly if others receive a bigger share of the pie, even though they do not differ in any relevant characteristic. In our case, however, offers are restricted to be the same for all responders, such that this factor should not play a role. The second simplification concerns the sophistication of reasoning. We assume that actors are simply not strategic. If and only if they feel treated unfairly, they will reject. Responders thus live in a non-strategic world: they believe that only they can punish a norm violation. Also, this assumption is by no means innocent. It excludes diffusion of responsibility via the following VOD-like reasoning among the potential rejecters: “I would reject, but maybe there is someone else who would reject as well, so let me reject with probability p* < 1.”

When offered x_i ≤ 0.5 by the proposer, the responder evaluates how much less she has than the proposer. If the difference gets too big, the offer might be seen as a punishable violation of the equality norm. More formally, the above utility function thus reduces to

u_i(x_i, 1 − x_i) = x_i − α_i max((1 − x_i) − x_i, 0) .    (1)

The responder compares her utility from the offered x_i to the utility of rejection u(0, 0) = 0, in which case both would receive zero and the inequality vanishes. Solving for α_i gives the critical level of norm internalization necessary to punish too-low offers as

α* = x_i / ((1 − x_i) − x_i) .    (2)

Thus, an offer x_i will be rejected if the individual α_i is greater than α*. The likelihood that an offer is rejected thus depends on the offer x_i, as well as on the individual norm internalization α_i, but not on the number of other responders n − 1. Since α_i is a latent construct, and thus unknown, we must make assumptions about its distribution in the population to derive hypotheses about rejection rates. For the sake of simplicity, assume that the α_i’s are drawn from the uniform distribution α_i ∼ U(0, s), where s denotes the upper bound of the interval. The probability of rejecting an offer is thus given by

P(α_i > α*) = 1 − max(min(x_i / (s ((1 − x_i) − x_i)), 1), 0) .
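The following short Python sketch (not from the chapter) evaluates the critical α of equation (2) and the implied rejection probability, assuming α_i ∼ U(0, s) with s = 3 as in the theoretical figures.

    def alpha_star(x):
        # Equation (2); defined for offers x < 1/2.
        return x / ((1 - x) - x)

    def p_reject_unconditional(x, s=3.0):
        p_accept = min(max(alpha_star(x) / s, 0.0), 1.0)  # P(alpha_i < alpha*)
        return 1.0 - p_accept

    for x in (0.1, 0.2, 0.3, 3 / 7):
        print(round(x, 3), round(p_reject_unconditional(x), 3))
    # 0.1 -> 0.958, 0.2 -> 0.889, 0.3 -> 0.75, 3/7 -> 0.0 (the "safe offer" for s = 3)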

Diffusion of Responsibility in Norm Enforcement | 311

Figure 1a plots this probability for one, two, and four responders and some arbitrarily chosen s = 3.³ Evidently, the probabilities at the micro-level do not differ, because other responders are simply not considered. 1 P (having offer rejected (ABE))

P (rejecting an offer (IBE))

1 .8 .6 .4 .2 0

.6 .4 .2 0

0

i

.8

~ U(0,3)

.1 .2 .3 .4 Offered share of the pie 1 Resp.

2 Resp.

.5 4 Resp.

(a) Unconditional norm enforcement: Micro

0

i

.1 .2 .3 .4 Offered share of the pie

~ U(0,3)

1 Resp.

2 Resp.

.5 4 Resp.

(b) Unconditional norm enforcement: Macro

Notes: We assume α_i ∼ U(0, 3). The individual probability of rejecting an offer does not depend on the number of other responders, and thus there is no IBE. Contrary to the VOD, more responders lead to higher aggregate rejection rates, and no or even a reversed ABE.
Fig. 1: Individual probability of rejecting an offer (IBE, a) and global probability that an offer is rejected (ABE, b) for the case of 1, 2, and 4 responders if norms are unconditionally enforced.

What, then, are the consequences at the macro-level? The probability that a norm is enforced by at least one person is given by

1 − P(α_i < α*)^{n−1} ,

which is increasing in n. The more responders receive the same low offer, the higher the likelihood that it will be rejected, which is the opposite of the theoretical predictions in the VOD. Another observation might be of interest: depending on the distribution of α_i’s, offers slightly below one half might be “safe” offers. The flat line in Figure 1 indicates that these offers will never be rejected. Since we assume that α_i ∼ U(0, s), and thus have an upper bound, we can calculate the safe offer as

x* = s / (2s + 1) .

3 All results discussed here qualitatively hold for all s > 0. High s leads to higher rejection rates, and only offers close to 50 % could be considered “safe offers”.

In Figure 1 (and all other theoretical figures) we assume s = 3, such that offers greater than 3/7 can be considered safe. It is also easy to see that, as the upper bound of α increases, x* approaches 1/2, the only truly safe offer. Comparing the model of unconditional enforcement to the rationality of subgame perfection leads to four competing hypotheses: (1) Lower offers are more likely to be rejected than higher offers; (2) the individual willingness to punish norm violations does not depend on others (there is no IBE); but (3) norm violations are more likely to be punished (reversed ABE), and (4) all offers x_i < 0.5 might be rejected.
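A corresponding sketch for the macro level (again assuming α_i ∼ U(0, s), and not part of the original text) computes the probability that at least one of the n − 1 responders rejects, together with the safe-offer bound s/(2s + 1).

    def p_enforced_unconditional(x, n_responders, s=3.0):
        p_accept = min(max((x / ((1 - x) - x)) / s, 0.0), 1.0)
        return 1.0 - p_accept ** n_responders

    for n_responders in (1, 2, 4):
        print(n_responders, round(p_enforced_unconditional(0.3, n_responders), 3))
    # 0.75, 0.938, 0.996: the aggregate enforcement probability rises with group size

    s = 3.0
    print(s / (2 * s + 1))   # 3/7: offers above this bound are "safe" and never rejected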

2.2.2 The conditionally enforcing man

Our second version of the FS-Model differs from the previous one by assuming that the responder not only compares her payoff to that of the proposer, but also to that of all the others. She does not want to be worse off than either the proposer or the other responders, but she is still non-strategic and does not consider that others might be angry and therefore reject as well. Adapting this reasoning to the FS utility function, the decision-maker again must evaluate whether she should accept an offer or reject it. If she accepts a low offer (and all others do so as well), the utility is now given by

u_i(accept) = x_i − (α_i / (n − 1)) ∑_{j≠i} max(x_j − x_i, 0) .

Note that the second term differs from the second term in equation (1), which was the unconditional enforcement. It now represents the average deviation of all players j from i’s offer, instead of only representing the difference between the proposer and the responder i. Rejecting an offer while all others accept it would lead to

u_i(reject) = − α_i (n − 2) x_i / (n − 1) .

Thus, rejection leads to a payoff of 0 in the one-responder case, but since the other (by assumption non-rejecting) responders receive x_i, rejection might leave the responder with a utility of even less than 0 in the multiple-responder case. The inequality towards the proposer would be reduced to 0 in case of rejection, but the remaining n − 2 responders still have more than the rejecting responder. A responder thus rejects an offer (given all others accept) if the following inequality holds:

x_i − (α_i / (n − 1)) ∑_{j≠i} max(x_j − x_i, 0) < − α_i (n − 2) x_i / (n − 1) .

Solving for the critical α-level for accepting a given offer x for a given number of n − 1 responders, we get

α*_{n−1} = (x_i − n x_i) / (n x_i − 1) .
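The critical level just derived is easy to evaluate numerically; the sketch below (illustrative only, not from the chapter) uses the expression above, with n denoting the total number of players, one proposer plus n − 1 responders.

    def alpha_star_conditional(x, n):
        # Critical alpha*_{n-1}; defined for offers x < 1/n.
        return (x - n * x) / (n * x - 1)

    print(alpha_star_conditional(0.10, 2))   # 0.125, identical to x/(1 - 2x) for one responder
    print(alpha_star_conditional(0.10, 5))   # 0.8: with four responders a larger alpha is required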

[Figure 2: critical degree of norm internalization α*_{n−1} (y-axis, 0–8) plotted against the number of responders (x-axis, 1–9), with separate curves for offers x_i = .33, .15, and .1.]

Notes: In a game with 6 responders, an offer of x_i = .1 would for instance be rejected if i’s α_i was greater than 1.25.

Fig. 2: The relationship between norm internalization and the number of responders for different offers.

Note that for x_i < 1/n, the required norm internalization α*_{n−1} is increasing in n. Figure 2 plots the critical α*_{n−1} for different offers. Furthermore, offers accepted by the conditionally enforcing man will certainly be accepted by the unconditional enforcer, but not vice versa, or (more formally) ∀ x_i ∈ (0, 1/n): α*_{n−1} > α*. The reason for this is that with more responders, rejection becomes more unattractive, since some inequality towards the (now better off) other responders remains. This reasoning also shifts the safe offer discussed above to x* = 1/n. Irrespective of the α-level, offers above 1/n will always be accepted, because at this point, the inequality in case of rejection towards the remaining (accepting) responders outweighs the inequality a rejecting responder would reduce towards the proposer. Now, let us again assume that α_i ∼ U(0, s). With some more reordering, we get the probability that person i will reject a given offer as

P_i(α_i > α*_{n−1} | n, x) = 1 − (α*_{n−1} / s) .

Recall that α*_{n−1} is increasing in n, and thus the probability of rejection is decreasing in n. As we can also see from Figure 3a, the more responders, the lower the likelihood that an offer will be rejected. Thus, conditional enforcement implies an Individual Bystander Effect (IBE). Introducing social preferences towards bystanders and proposers thus leads to a phenomenon off the equilibrium path (rejections in the ultimatum game) that can also be observed on the equilibrium path in the VOD.


The individual probabilities might differ from the probability that unfair offers are rejected by at least one responder, and thus norm violations are sanctioned. We can derive the global probabilities that a given offer is rejected by at least one responder as

1 − (α*_{n−1} / s)^{n−1} .

Figure 3b plots the macro-probability of the norm being enforced.⁴ Surprisingly, the probability increases in n for very low offers, but decreases in n for higher offers. The Aggregate Bystander Effect (ABE) can thus be expected if the offers are high, but this is reversed for low offers, where a greater group leads to a higher likelihood of norm enforcement.
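A small numerical sketch (not part of the chapter) of the micro- and macro-probabilities under conditional enforcement illustrates both effects for a very low offer; α_i ∼ U(0, s) with s = 3, and n again counts all players.

    def p_reject_conditional(x, n, s=3.0):
        if x >= 1.0 / n:                          # offers above 1/n are always accepted
            return 0.0
        a_star = (x - n * x) / (n * x - 1)
        return 1.0 - min(a_star / s, 1.0)

    def p_enforced_conditional(x, n, s=3.0):
        return 1.0 - (1.0 - p_reject_conditional(x, n, s)) ** (n - 1)

    for n in (2, 3, 5):                           # 1, 2, and 4 responders
        print(n, round(p_reject_conditional(0.05, n), 3),
              round(p_enforced_conditional(0.05, n), 3))
    # Individual rejection falls (0.981, 0.961, 0.911) while the probability that at
    # least one responder rejects rises (0.981, 0.998, 1.0) for this very low offer.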

[Figure 3: two panels. (a) Conditional norm enforcement (Micro), y-axis “P (rejecting an offer (IBE))”; (b) Conditional norm enforcement (Macro), y-axis “P (having offer rejected (ABE))”; x-axis in both panels: offered share of the pie (0–.5), with curves for 1, 2, and 4 responders and α_i ∼ U(0, 3).]

Notes: We assume α_i ∼ U(0, 3). The more responders, the higher the probability of accepting an offer, as predicted by the IBE (see left). Whether the Aggregate Bystander Effect holds depends on the offer: for higher offers, more responders lead to lower aggregate rejection rates as predicted by the ABE; for very low offers, this is reversed.
Fig. 3: Individual probability of rejecting an offer (IBE, a) and global probability that an offer is rejected (ABE, b) for the case of 1, 2, and 4 responders if norms are only enforced if other responders are not much better off.

4 All results discussed here again qualitatively hold for all s > 0. High s leads to higher rejection rates, and only offers close to the 1/n-split could be considered “safe offers”.


2.3 Summary of hypotheses

The three models discussed above differ substantially in their predictions about how responders react to the offers they receive. Subgame perfection (SP) implies three hypotheses. First, all offers will be accepted (SP1). Following from this floor effect, we expect no differences at the micro level, or (to put it differently) no IBE (SP2). Finally, subgame perfection implies that there will of course be no ABE macro-level effect on the provision of punishment (SP3).

Comparing the rationale of subgame perfection to the model of unconditional enforcement (UE) leads to four competing hypotheses. Lower offers are more likely to be rejected than higher offers (UE1). The individual willingness to punish norm violations also does not depend on others (no IBE), but this is caused by a direct comparison only between proposer and the deciding responder, not by the floor effect as above (UE2). Consequently, norm violations are more likely to be punished such that unconditional enforcement predicts a reversed ABE (UE3), and all offers x_i < 0.5 might be rejected, if the parameter α_i is high enough (UE4).

Thirdly, conditional enforcement (CE) also predicts a higher likelihood of rejection if offers are low (CE1). We would expect to observe an IBE, or a decreasing individual willingness to reject, as the number of responders grows (CE2). The macro effects (ABE) are different for different offers: while CE predicts ABE for high offers, it also suggests a reversed ABE for low offers (CE3). Finally, some offers x_i < 0.5 can be “safe offers”, which will never be rejected. Table 1 summarizes our hypotheses.

Tab. 1: Overview of the predictions regarding micro- and macro-probabilities of norm enforcement, and the existence of safe offers.

                      Subgame perfection    Unconditional enforcement    Conditional enforcement
Bystander effect
  Micro               No difference         No difference                Decreasing in n
  Macro               No difference         Increasing in n              Increasing/decreasing in n
Safe offer
  One responder       0                     .5                           .5
  Two responders      0                     .5                           1/3
  Four responders     0                     .5                           1/5


3 Experimental design

The participants were 127 students (mean age: 24.31, s.d. 4.59, min. 20, max. 58 years) of the University of Berne from a wide range of academic disciplines. At least 27 subjects (21 %) were female.⁵ The experiment was conducted using the z-Tree software developed by Fischbacher (2007). At the beginning of each session, subjects were randomly assigned to one of the computer terminals. General instructions regarding the procedure were given on the computer screen. Before starting the experiments, questions regarding the design were answered by the experimental staff to verify that everyone understood the rules. The experiment started when there were no further questions. Communication was prohibited from that point onward. After completing the experiment, subjects were individually paid at the experimenter’s desk.

3.1 Experimental manipulations

To test our theory, we employed a 3 × 11 within-subject factorial design, varying the number of responders in one group (one, two or four) and the offer in the ultimatum game using the strategy method (0–100 %).

3.1.1 Varying the number of responders

Our first manipulation varied the number of responders. Every subject made his or her decision for every possible number of responders. The sequence of decisions was balanced across all subjects, such that a third started in the treatment with one responder, another third with two and the remaining third with four (the same applies to the remaining sequence). This allowed us to separate learning effects from group size effects. Next to the decision form, subjects saw the respective pictures from Figure 4.

3.1.2 Varying offers

Our second variation of the ultimatum game introduced the strategy vector method (Selten 1967; Fischbacher, Gächter, and Fehr 2001). The strategy method has been argued to be a better measure of normative behavior than the usual play method (Rauhut and Winter 2010).

5 Due to a coding error in the experiment, “no answer” was coded as male in the gender question. This error was removed in later sessions, but the gender composition in these later sessions is still skewed towards males.


Notes: The pictures were displayed in the instructions for the respective parts as well as next to the decision screen.
Fig. 4: Graphical representation of the number of responders as used in the experiment.

A “conventional” ultimatum game using the play method would ask a proposer to offer the responder a certain amount of money. The responder could then accept or reject this offer. Our implementation of the strategy method instead asked the responder whether he or she would accept an offer of 0, 1, . . . , 10. In contrast with the play method, our method returned a vector of acceptance choices for every possible offer (hence the term “strategy vector method”). In a second step, a proposer entered his or her offer. Finally, the computer randomly matched a proposer and a responder and determined both players’ payoff given the proposer’s offer and the responder’s decision on this realized offer.
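A brief Python sketch (illustrative only, not the experiment’s z-Tree code) of how the strategy vector method resolves payoffs in the one-responder case: the responder’s eleven accept/reject decisions are recorded in advance, the proposer’s realized offer selects one of them, and this determines both payoffs. A pie of 10 is assumed here to match the 0–10 offer grid.

    import random

    def resolve(offer, acceptance_vector, pie=10):
        accepted = acceptance_vector[offer]            # look up the pre-recorded decision
        responder = offer if accepted else 0
        proposer = (pie - offer) if accepted else 0
        return proposer, responder

    # Example strategy vector: reject offers 0-2, accept 3 and above (a monotone strategy).
    acceptance_vector = [False, False, False] + [True] * 8
    realized_offer = random.choice(range(11))          # stand-in for the proposer's choice
    print(realized_offer, resolve(realized_offer, acceptance_vector))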

3.1.3 Varying roles After all participants had completed the experiment in the role of the responder, we announced that all participants would also play the role of the proposer. Again, we varied the number of responders according to our experimental treatments. Subjects were paid according to only one of their decisions. We therefore randomly determined


whether they were paid as a proposer or responder and with one, two, or four responders.

4 Results We will start the discussion of our results by reporting the observed data, and will then continue to evaluate the predictive power of the theoretical models developed in Section 2.

4.1 Observed data Figure 5a plots the observed probability that an offer is rejected against the offer ranging from 0 (everything is kept by the proposer) to 10 (everything is offered to the responder). Independently of the number of responders, zero-offers are rejected with a probability of more than 90 %, whereas 50 / 50 offers are accepted with a probability of more than 90 %.

(a) [Plot of the average rejection rate against offers 0–10, shown separately for 1, 2, and 4 responders.]

(b) Random effects logistic regression estimating the probability of rejection:

               Rejection
1 Responder    ref.
2 Responder    –0.221*   (–1.99)
4 Responder    –0.367**  (–3.27)
period         –0.124*   (–2.21)
offer          –6.291*** (–29.52)
decisions      4191
subjects       127
Pseudo R²      0.484
* p < .05, ** p < .01, *** p < .001

Notes: Coefficients are displayed as marginal effects at the mean, z-statistics in parentheses. Fig. 5: Observed probability of rejection for 1, 2, and 4 responders (a) and random effects logistic regression with subject-specific error term estimating the probability of rejection (b).


Note that offers above 50 % are again more likely to be rejected, and recall that our subjects decided for every offer whether they would like to accept it or not. About 85 % of our respondents reported monotone strategies, rejecting every offer below a certain threshold and accepting everything above it. The remaining participants (with one exception) gave an acceptance interval: they rejected offers that were too low or too high but usually accepted 50 / 50 offers. These acceptance intervals could reach from 1 to 9, or be exactly 5, in which case all unequal offers would be rejected. The table in Figure 5 reports a random effects logistic regression with subject-specific error terms, estimating the difference in rejection probabilities for the different treatments. Estimates are reported as marginal effects at the mean. Note that the coefficient for offer is far below 0 and significant, indicating that higher offers are much less likely to be rejected. More importantly, however, the estimated treatment effects are significant but small compared to the effect for offers. Responders are less likely to reject an offer if there is one other responder who could reject as well, and even less so if there are three others. The difference between two and four responders is, however, not statistically significant (Wald test, p = 0.175). This result is consistent with the previously discussed hypothesis that more responders lead to less individual norm enforcement.

4.2 Comparing observed to predicted data To evaluate how well the models developed in Section 2 perform, we will now concentrate on offers not larger than 50 % and normalize the sum to be distributed to 1 by dividing all offers and acceptance thresholds by the highest possible offer.

4.2.1 The rational man Figures 7a and 7b once again plot the observed likelihood of rejecting an offer at the individual level (the individual bystander effect at the micro-level in a) and the likelihood that the fairness norm is enforced at all (the aggregate bystander effect at the macro-level in b). Recall that subgame perfection predicted all offers would be accepted, but evidently low offers are rejected. We can even quantify this contrast between predicted and observed rejection rates by calculating the mean squared difference (MSD) between the two. The MSD is commonly used in prediction tournaments when several models compete for the best behavioral predictions (cf. Erev et al. 2010). In our case, it can range from 0 (all predictions are correct) to 1 (all predictions are maximally off). In the two-responder case, the MSD between the prediction for the subgame perfect equilibrium and the observed data is 0.51; in the four-responder case it is 0.49.
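As a minimal illustration of this comparison, the MSD can be computed as follows; the observed rejection rates used in the example are hypothetical placeholders, not our data.

```python
def msd(predicted, observed):
    """Mean squared difference between predicted and observed rejection probabilities.

    Both arguments are sequences with one entry per offer level, so the result
    ranges from 0 (all predictions correct) to 1 (all predictions maximally off).
    """
    assert len(predicted) == len(observed)
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)

# Subgame perfection predicts that no offer is ever rejected:
predicted_sp = [0.0] * 6                          # offers 0, .1, .2, .3, .4, .5
observed = [0.95, 0.80, 0.60, 0.30, 0.10, 0.05]   # hypothetical observed rejection rates
print(round(msd(predicted_sp, observed), 2))      # 0.33 for these made-up numbers
```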


4.2.2 The social man To evaluate the performance of the remaining two models, we will continue to analyze our data in the following order: (1) Estimate the individual norm internalization α i from the 1-responder treatment; (2) Plug the obtained distribution of α i ’s into the models for two responders and four responders and calculate the predicted rejection rates; and (3) Compare predicted values to observed values for all three models. To estimate the true α i , we assume that it has been measured in our 1-responder treatment.⁶ Remember from equation (2) that an offer x i will be rejected if

α i > x i / ((1 − x i ) − x i ).

If an offer x1 was rejected (α i > α∗(x1)) but x2 = x1 + 0.1 was accepted (α i < α∗(x2)), some offer between x1 and x2 must be the last acceptable offer. We will define i’s least acceptable offer as x∗i = (x1 + x2)/2, the arithmetic mean of her highest rejected and lowest accepted offer. Say, for instance, i rejected an offer of 0.2 and accepted an offer of 0.3; then her least acceptable offer would be 0.25. Plugging this value into the equation above, we get α i = 0.5. Figure 6 plots the distribution of estimated α i ’s obtained by this procedure. The supposedly bimodal distribution is in fact a result of the nonlinear relation between α i and the acceptance threshold. Only quite small values of α i are sufficient to reject small offers, but α i converges to infinity if offers close to the equal split are rejected. Participants who rejected an offer of x i = 0.4 have a corresponding α i = 4.5. Given the empirical distribution of α i , we can now plot the likelihood that an offer is rejected by plugging the α i ’s into the models discussed above and simply calculating the share of rejections for any given offer.
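A compact way to summarize this estimation and prediction step is sketched below. It assumes that offers have already been normalized to the 0–0.5 range and that strategies are monotone with the equal split accepted; the helper names are ours, and the last function is only a stylized version of the prediction for the unconditional model, whose rejection rule does not depend on the number of other responders.

```python
def least_acceptable_offer(acceptance_vector):
    """Midpoint between the highest rejected and the lowest accepted offer
    (monotone strategy assumed, with the equal split of 0.5 always accepted)."""
    rejected = [o for o, accepted in acceptance_vector.items() if not accepted]
    accepted = [o for o, accepted in acceptance_vector.items() if accepted]
    if not rejected:                       # accepts everything, even a zero offer
        return 0.0
    return (max(rejected) + min(accepted)) / 2

def alpha_from_threshold(x_star):
    """Smallest degree of norm internalization consistent with rejecting all offers
    below the least acceptable offer x_star (inverts alpha > x / ((1 - x) - x))."""
    return x_star / (1 - 2 * x_star)

def unconditional_rejection_rate(alphas, offer):
    """Predicted share of responders who reject a given offer (offer < 0.5) when the
    rejection rule does not depend on the number of other responders."""
    critical = offer / (1 - 2 * offer)
    return sum(a > critical for a in alphas) / len(alphas)

# A responder who rejects 0.2 but accepts 0.3 has x* = 0.25 and alpha = 0.5:
print(alpha_from_threshold(0.25))          # 0.5
```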

4.2.3 Unconditional enforcement Figure 7c plots the individual bystander effect predicted by the unconditional enforcement model given the estimated α i . Remember that this model predicted no difference between the different treatments, but empirically the responder’s willingness to reject an offer shrinks for larger groups. Qualitatively speaking, the (statistically significant) observed individual bystander effect was not predicted by the model. On the other hand, the observed diffusion of responsibility is small, and the values predicted by the

6 The true α might also be measured in the two-responder or four-responder treatment. If it could (in principle) be measured in one of the treatments, and if this could be the same for all participants, the relative order of treatments would be preserved, though the predictions would shift.


[Histogram of the estimated α i values, with frequency on the vertical axis and α i from 0 to 4 on the horizontal axis.] Fig. 6: Estimated degrees of norm internalization α i .

theoretical model fit the observed data very well, much better than any other model presented here (MSD for 2 responders = 0.02, MSD for 4 responders = 0.04). Figure 7d plots the predicted probabilities of norm enforcement at the macro-level. Again, the predicted values resemble the observed data very well, both qualitatively and quantitatively. Contrary to the theoretical predictions for the VOD, more responders lead to more norm enforcement at the macro level, both empirically and in the unconditional enforcement model.

4.2.4 Conditional enforcement In contrast to the unconditional model discussed before, conditional enforcement assumes that people react to the presence of other potential enforcers. In terms of data fit, our data are less supportive of the conditional enforcement model, though it still performs better than the predictions derived from subgame-perfect equilibria. Again, Figure 7e shows the predicted micro probabilities of norm enforcement. The predictions differ substantially from the observed data. Quantitatively, the MSD for two responders increases to 0.20, and even to 0.26 in the four-responder case. The conditional enforcement model, however, correctly predicts the negative relationship between the individual willingness to enforce the norm and an increased number of possible punishers. Yet the theoretically predicted effect overstates its empirical importance, so that the quantitative fit does not reflect this qualitative consistency. Furthermore, the observed individual probability that an offer of 0.2 is rejected in groups with two responders is above 60 %, which again contradicts the model’s predicted existence of safe offers smaller than 0.5.

[Figure 7 contains six panels plotting the probability of rejecting an offer (IBE) or of having an offer rejected (ABE) against the offered share of the pie (0–.5): (a) Observed (Micro), (b) Observed (Macro), (c) Predicted by unconditional enforcement (Micro), (d) Predicted by unconditional enforcement (Macro), (e) Predicted by conditional enforcement (Micro), (f) Predicted by conditional enforcement (Macro).]

Notes: Different shades of grey indicate the 1, 2, and 4-responder treatment, respectively. Fig. 7: Observed individual enforcement rates (a) and collective enforcement rates (b), and enforcement rates predicted by the unconditional model (c, d), and the conditional model (e, f) for the individual and the collective probability of norm enforcement.


Similar results also hold for the macro effect (see Figure 7f). Since it predicts opposing effects for different ranges of offers, it also makes qualitatively correct predictions for the lower range of offers. It is easy to see, however, that the predicted functional form on the macro-level is very different from the observed data. Our results can be summarized from two perspectives. On the one hand, we do find support for the diffusion of responsibility in statistical terms. Low offers are 2–4 % more likely to be rejected, and thus norm violations are more likely to be punished, if responders are alone. These estimates are statistically significant at least at the 5 % level. On the other hand, seen from the perspective of model comparison, the effects are small in comparison to the differences in model fit. The unconditional enforcement model has an MSD in the region of 0.03, while the only model predicting the bystander effect, conditional enforcement, is off by an MSD of at least 0.2.

Tab. 2: Summary of the qualitative and quantitative predictions, and results.

                     Subgame perfection   Unconditional enforcement   Conditional enforcement      Data
Bystander effect
  Micro              No difference        No difference               Decreasing in n              Weakly decreasing
  Macro              No difference        Increasing in n             Increasing/decreasing in n   No
Safe offer
  One responder      0                    .5                          .5                           .5
  Two responders     0                    .5                          1/3                          .5
  Four responders    0                    .5                          1/5                          .5
MSD
  Two responders     .51                  .02                         .20
  Four responders    .49                  .04                         .26

5 Conclusions Norm enforcement is often costly. In many situations, these costs even outweigh the immediate benefits, which turns the provision of an enforcement institution into a classical social dilemma. In other cases, seeing a norm being enforced might have immediate intrinsic value, for instance when it reduces negative emotions like anger or the feeling of injustice. In this chapter, we take this reasoning a step further and discuss the relationship between the provision of public goods (in our case the provision


of costly punishment) and the Volunteer’s Dilemma. The crux of the VOD lies in the fact that a number of rational actors have to coordinate who must provide the public good. This leads to a diffusion of responsibility, even though every actor individually has an incentive to cooperate. The theoretical contribution of this chapter is to show that this diffusion of responsibility can also emerge in situations where no one has a material incentive to cooperate. We model this situation as an ultimatum game with multiple responders, where a single rejection suffices to destroy the proposer’s gains. Diffusion of responsibility would not be expected under purely materialistic preferences. However, we show that diffusion of responsibility follows from a general interpretation of social preferences in the spirit of Fehr and Schmidt (1999). We term this interpretation “conditional enforcement” and derive quantitative hypotheses from it. In another interpretation, termed “unconditional enforcement”, rejections of low offers are likely but independent of the number of enforcers. Here, diffusion of responsibility would not be expected. Empirically, many regularities observed in the Volunteer’s Dilemma extend to the setting described here. The diffusion of responsibility observed in the VOD does not seem to depend on the fact that private benefits exceed the private costs of the public good, either theoretically or empirically. Our data suggest that some diffusion of responsibility also occurs if the costs of providing the public good of norm enforcement exceed the benefits. On the other hand, the mixed Nash equilibrium in the symmetric VOD predicts that provision of the good also depends negatively on the number of potential volunteers. This counterintuitive result is usually rejected by experimental data. Our data on the ultimatum game mirror the empirical results in the VOD: more responders lead to higher global rejection rates. However, even though these effects are statistically significant, their magnitude is small, which casts doubt on their substantive importance. The fit of the “unconditional enforcement” model surpasses that of all other models by at least one order of magnitude. This lends support to the idea that the enforcement of social norms depends only to some degree on material and situational incentives. The core of the “unconditional enforcement” model is the payoff comparison between the actor and the norm violator, whereas in the model of “conditional enforcement” it is the comparison between the actor and all others. In the former, norm violations are sanctioned when the differences in payoff between actor and violator become unbearable. Payoff differences to other victims of the norm violation are, by assumption of the model, not important for the sanctioning decision. This distinguishes “unconditional enforcement” from “conditional enforcement”, as well as from the underlying model by Fehr and Schmidt (1999). Fehr and Schmidt, and the model of “conditional enforcement”, assume that everyone compares themselves to everybody else. Our interpretation of their work specifies exactly what constitutes the relevant comparison group.


Our results strongly suggest that some inequalities are more important than others. Seeing a norm violator being better off clearly outweighs remaining or newly emerging inequalities towards other victims. This highlights the importance of carefully evaluating popular models of social preferences and their often insufficiently reflected use.

Bibliography
[1] Carpenter, Jeffrey P. 2007. “Punishing free-riders: How group size affects mutual monitoring and the provision of public goods.” Games and Economic Behavior 60(1):31–51.
[2] Demsetz, Harold. 1967. “Toward a theory of property rights.” The American Economic Review 57(2):347–359.
[3] Diekmann, Andreas. 1985. “Volunteer’s dilemma.” Journal of Conflict Resolution 29(4):605–610.
[4] Diekmann, Andreas. 2004. “The Power of Reciprocity: Fairness, Reciprocity, and Stakes in Variants of the Dictator Game.” Journal of Conflict Resolution 48(4):487–505.
[5] Diekmann, Andreas, and Wojtek Przepiorka. 2015. “Punitive preferences, monetary incentives and tacit coordination in the punishment of defectors promote cooperation in humans.” Scientific Reports 5:10321.
[6] Egas, Martijn, and Arno Riedl. 2008. “The Economics of Altruistic Punishment and the Maintenance of Cooperation.” Proceedings of the Royal Society B 275(1637):871–878.
[7] Erev, Ido, Eyal Ert, Alvin E. Roth, Ernan Haruvy, Stefan M. Herzog, Robin Hau, Ralph Hertwig, Terrence Stewart, Robert West, and Christian Lebiere. 2010. “A choice prediction competition: Choices from experience and from description.” Journal of Behavioral Decision Making 23(1):15–47.
[8] Fehr, Ernst, and Simon Gächter. 2000. “Cooperation and Punishment in Public Goods Experiments.” The American Economic Review 90(4):980–994.
[9] Fehr, Ernst, and Simon Gächter. 2002. “Altruistic Punishment in Humans.” Nature 415(6868):137–140.
[10] Fehr, Ernst, and Klaus M. Schmidt. 1999. “A theory of fairness, competition, and cooperation.” Quarterly Journal of Economics 114(3):817–868.
[11] Fischbacher, Urs. 2007. “z-Tree: Zurich toolbox for ready-made economic experiments.” Experimental Economics 10(2):171–178.
[12] Fischbacher, Urs, Christina M. Fong, and Ernst Fehr. 2009. “Fairness, errors and the power of competition.” Journal of Economic Behavior & Organization 72(1):527–545.
[13] Fischbacher, Urs, Simon Gächter, and Ernst Fehr. 2001. “Are people conditionally cooperative? Evidence from a public goods experiment.” Economics Letters 71(3):397–404.
[14] Franzen, Axel. 1995. “Group size and one-shot collective action.” Rationality and Society 7(2):183–200.
[15] Franzen, Axel, and Sonja Pointner. 2012. “Anonymity in the Dictator Game Revisited.” Journal of Economic Behavior & Organization 81(1):74–81.
[16] Gouldner, Alvin W. 1960. “The Norm of Reciprocity: A Preliminary Statement.” American Sociological Review 25(2):161–178.
[17] Grosskopf, Brit. 2003. “Reinforcement and Directional Learning in the Ultimatum Game with Responder Competition.” Experimental Economics 6(2):141–158.
[18] Güth, Werner. 2011. Pp. 19–28 in Experimental Economics: Financial Markets, Auctions, and Decision Making: Interviews and Contributions from the 20th Arne Ryde Symposium, edited by N. Frederik, and H. Holm. Luxemburg: Springer Science & Business Media.
[19] Güth, Werner, and Martin G. Kocher. 2013. “More than Thirty Years of Ultimatum Bargaining Experiments: Motives, Variations, and a Survey of the Recent Literature.” CESIFO Working Paper No. 4380, LMU, Munich.
[20] Güth, Werner, Rolf Schmittberger, and Bernd Schwarze. 1982. “An experimental analysis of ultimatum bargaining.” Journal of Economic Behavior & Organization 3(4):367–388.
[21] Hobbes, Thomas. [1651] 1997. Leviathan – or the Matter, Forme and Power of a Commonwealth Ecclesiasticall and Civil. New York: Touchstone.
[22] Mauss, Marcel. 1968. Die Gabe: Form und Funktion des Austauschs in archaischen Gesellschaften. Frankfurt am Main: Suhrkamp.
[23] Oliver, Pamela. 1980. “Rewards and punishments as selective incentives for collective action: theoretical investigations.” American Journal of Sociology 85(6):1356–1375.
[24] Pointner, Sonja, and Axel Franzen. 2016. “The Nature of Fairness in Dictator and Ultimatum Games.” Pp. 232–252 in Essays on Inequality and Integration, edited by A. Franzen, C. Joppke, B. Jann, and E. Widmer. Zürich: Seismo.
[25] Raihani, Nichola J., and Redouan Bshary. 2011. “The evolution of punishment in n-player public goods games: A volunteer’s dilemma.” Evolution 65(10):2725–2728.
[26] Rauhut, Heiko, and Fabian Winter. 2010. “A sociological perspective on measuring social norms by means of strategy method experiments.” Social Science Research 39(6):1181–1194.
[27] Selten, Reinhard. 1967. “Die Strategiemethode zur Erforschung des eingeschränkt rationalen Verhaltens im Rahmen eines Oligopolexperiments.” Pp. 136–168 in Beiträge zur experimentellen Wirtschaftsforschung, edited by H. Sauermann. Tübingen: JCB Mohr.
[28] Simmel, Georg. 1908. Soziologie: Untersuchungen über die Formen der Vergesellschaftung. Berlin: Duncker & Humblot.
[29] Tutić, Andreas. 2014. “Procedurally Rational Volunteers.” The Journal of Mathematical Sociology 38(3):219–232.
[30] Winter, Fabian, Heiko Rauhut, and Dirk Helbing. 2012. “How norms can generate conflict: An experiment on the failure of cooperative micro-motives on the macro-level.” Social Forces 90(3):919–946.

Nynke van Miltenburg, Vincent Buskens, and Werner Raub

Endogenous Peer Punishment Institutions in Prisoner’s Dilemmas: The Role of Noise Abstract: We study six-person Prisoner’s Dilemmas (PDs), in which subjects endogenously decide whether to implement a peer punishment institution in their group and whether the punishment institution, if implemented, implies more or less severe punishment. We consider PDs with perfect information on other subjects’ previous behavior and PDs in which subjects observe each other’s behavior with noise. We expect subjects to be less inclined to implement a punishment institution and to be less likely to prefer severe punishments with noise than without noise. We find that, without noise, the majority of groups choose a punishment institution with severe punishments. With noise, most groups do not implement a punishment institution. Both with and without noise, cooperation and earnings increase when a punishment institution is implemented, especially with opportunities for severe punishments. However, subjects in the noise condition perceive lower earnings under severe punishments than under the other options. With noise, moreover, observing the punishment of cooperators discourages subjects from implementing a punishment institution in subsequent interactions.

1 Introduction Social groups such as neighborhoods, work teams, and religious communities thrive when members cooperate in pursuing group interests. However, cooperation problems arise if defecting on the group benefits individual members (Raub, Buskens, and Corten 2014). In such situations, group members may encourage their peers to cooperate, for example through punishment. Henceforth, opportunities for actors to punish their fellow group members (opportunities for peer punishment) are referred to as a “punishment institution” (North 1990). Experimental research demonstrates that cooperation and long-term welfare increase when a punishment institution is exogenously implemented by the experimenter (for overviews, see Balliet, Mulder, and Van Lange 2011; Chaudhuri 2011; see Diekmann and Przepiorka 2015; Przepiorka and Diekmann 2013 for related empiri-

Note: Comments from, and discussions with, colleagues of our Utrecht group “Cooperation in Social and Economic Relations” are gratefully acknowledged. This chapter is part of the project “The Feasibility of Cooperation under Various Sanctioning Institutions,” funded by the Netherlands Organization for Scientific Research (NWO, grant 400-09-159). Support for Raub was provided by NWO (PIONIER-program “The Management of Matches”; grants S 96–168 and PGS 50–370). Raub acknowledges the hospitality of Nuffield College, University of Oxford. https://doi.org/10.1515/9783110472974-016


cal work on punishment as a mechanism of cooperation that uses the Volunteer’s Dilemma for modeling incentives in situations with punishment options). This raises the deeper theoretical question of whether actors involved in cooperation problems anticipate the consequences of punishment institutions and voluntarily implement peer punishment institutions in their group (Prendergast 1999). Several studies consider settings in which actors decide whether to interact under a punishment institution. They find that most actors initially prefer to interact without a punishment institution, but that punishment institutions gain popularity over time (Ertan, Page, and Putterman 2009; Gürerk 2013; Gürerk, Irlenbusch, and Rockenbach 2006; Rockenbach 2009; Markussen, Putterman, and Tyran 2014; Rockenbach and Milinski 2006). We build on these findings in two ways. First, previous studies on endogenous punishment institutions focus on cooperation problems in which actors have accurate information regarding the contributions of other group members. While this may drive results on institution formation (Nikiforakis 2014), it does not sufficiently reflect many everyday cooperation problems (Bereby-Meyer 2012). Here, we compare cooperation problems where actors are accurately informed of the decisions of their peers with cooperation problems characterized by noise. With noise, actors may observe others’ cooperation as defection, and vice versa.¹ Noise implies that defectors may remain undetected, while cooperators may receive “misguided” punishment. Accordingly, noise impedes the capacity for exogenous punishment institutions to maintain cooperation (Fischer, Grechenig, and Meier 2013; Grechenig, Niklisch, and Thöni 2010; Van Miltenburg, Przepiorka, and Buskens 2015). It is thus unclear whether findings on endogenous punishment institutions generalize to more realistic noisy environments. Second, when actors involved in everyday cooperation problems decide to implement a punishment institution, it is likely that they also design the details of their institution. In our experiment, in addition to deciding whether to implement a punishment institution, actors choose punishment effectiveness, namely, the extent to which each punishment will reduce its recipient’s earnings (Egas and Riedl 2008; Nikiforakis and Normann 2008). Both in conditions with and without noise, high punishment effectiveness deters defection better than low punishment effectiveness. Accordingly, both with and without noise, cooperation rates are found to increase with exogenous punishment effectiveness (Ambrus and Greiner 2012; Egas and Riedl 2008; Nikiforakis and Normann 2008). Actors may, however, be reluctant to allow punishment with high effectiveness, as it implies a risk of being severely punished themselves (Buchanan and Tullock 1962). Even without noise, cooperators are at risk of being punished antisocially (Cinyabuguma, Page, and Putterman 2006; Herrmann, Thöni, and Gächter

1 We consider one-shot interactions. Theoretical complexities involved in strategic punishment with noise in repeated interactions thus do not apply in our setting (Fudenberg, Rand, and Dreber 2012; Green and Porter 1984; Wu and Axelrod 1995).


2008). With noise, due to misguided punishment, the likelihood that cooperators will be punished is even higher, which might make actors more reluctant to allow opportunities for punishment with high effectiveness. In our experiment, actors individually indicate (henceforth: vote) whether they wish to interact under a punishment institution and which level of effectiveness they prefer. Options winning majority votes are implemented. Many iterations may be required before voting outcomes and behaviors under each outcome converge to a stable pattern (Ambrus and Greiner 2012; Gächter, Renner, and Sefton 2008). Furthermore, while repeated interactions in fixed groups provide incentives for reputation formation that may be anticipated in votes, such incentives are ruled out in one-shot interactions. We therefore study endogenous implementation of punishment institutions and effectiveness over a long sequence of one-shot interactions. The chapter is organized as follows. In Section 2 we review related experimental literature. Section 3 provides hypotheses. The laboratory experiment is presented in Section 4. The results are reported in Section 5. Section 6 concludes.

2 Cooperation with punishment institutions: related literature Experiments comparing voluntary contribution games with and without exogenous punishment institutions have revealed a number of consistent behavioral patterns (Chaudhuri 2011). Typically, in the absence of a punishment institution, cooperation rates begin at approximately 50 % and decline to almost complete defection if oneshot interactions are repeated with different partners (Camerer 2003; Ledyard 1995). Several subjects in these interactions can be classified as conditional cooperators, who cooperate as long as they expect that others will do the same (Fischbacher, Gächter, and Fehr 2001). Under exogenous punishment institutions, several subjects punish defectors, even in one-shot interactions (Fehr and Gächter 2002), and cooperation is typically maintained at high levels (Balliet, Mulder, and Van Lange 2011; Chaudhuri 2011; Nikiforakis 2014). Cooperators often receive some punishment, too (Cinyabuguma, Page, and Putterman 2006; Herrmann, Thöni, and Gächter 2008; Ostrom, Walker, and Gardner 1992), hindering their subsequent cooperation. Henceforth, punishment directed at observed defectors is deemed “pro-social” and punishment directed at observed cooperators “antisocial.” In most experiments, every punishment depletes recipient income by three times the cost of punishment allocation (1 : 3 effectiveness, e.g., Fehr and Gächter 2002). If punishment effectiveness is varied exogenously, cooperation rates increase with effectiveness (Ambrus and Greiner 2012; Egas and Riedl 2008; Nikiforakis and Normann 2008). However, the higher the effectiveness of the


punishment, the more resources it destroys. Accordingly, group earnings are found to decrease with effectiveness in series of one-shot interactions (Egas and Riedl 2008). Experiments employing noise in cooperation problems with exogenous punishment institutions and 1 : 3 punishment effectiveness find that a 20 % or higher probability of inaccurately observing the contribution decisions of others causes a decrease in cooperation rates and earnings (Fischer, Grechenig, and Meier 2013; Grechenig, Niklisch, and Thöni 2010; Van Miltenburg, Przepiorka, and Buskens 2015). Studies examining other types of noise also find negative effects on earnings in cooperation problems with a punishment institution (Ambrus and Greiner 2012; Bornstein and Weisel 2010; Patel, Cartwright, and Van Vugt 2010). Evidence suggests that, with noise, cooperation rates and earnings increase with punishment effectiveness in repeated interactions in fixed groups (Ambrus and Greiner 2012). A growing number of experimental studies examine the implementation of peer punishment institutions through voting procedures. To our knowledge, none of these studies has addressed noise or endogenous punishment effectiveness. Sutter, Haigner, and Kocher (2010) allow subjects to vote before the first interaction on the implementation of a punishment institution, a reward institution, or neither. They consider two exogenous levels of effectiveness, and find that the majority of groups does not opt for a punishment or reward institution under 1 : 1 effectiveness while most groups select a reward institution under 1 : 3 effectiveness. Botelho et al. (2005) allow subjects to vote in a final round on whether to implement a punishment institution after a series of cooperation problems with and without a punishment institution. They find that only one group votes in favor of the punishment institution. However, in studies where fixed groups are offered multiple opportunities to vote, and the outcome applies to several ensuing interactions, punishment institutions gain popularity over time. Ertan, Page, and Putterman (2009) find that most groups allow for punishment of belowaverage contributors after a number of voting rounds. Kamei, Putterman and Tyran (2011) and Markussen, Putterman, and Tyran (2014) find that, after several votes, many subjects prefer peer punishment institutions over institutions that automatically punish defectors if the cost of implementing a peer punishment institution is low relative to automatic punishment. Finally, studies in which subjects can migrate between groups with and without a punishment institution find that, over time, many subjects opt for punishment institutions (Gürerk 2013; Gürerk, Irlenbusch, and Rockenbach 2009; Rockenbach and Milinski 2006; Fehr and Williams 2013).


3 Experimental game and hypotheses 3.1 A cooperation problem with noise, a punishment stage, and a voting stage We first outline how cooperation and punishment proceed in the experiment, and then describe the voting procedure. Cooperation problems are modeled as one-shot six-person Prisoner’s Dilemmas (PDs). Most previous studies employ a Public Goods Game, in which actors can contribute any fraction of their endowment. In our PDs, actors can only contribute their entire endowment or nothing. Due to this feature, the PD, unlike the Public Goods Game, allows for a straightforward implementation of noise (Ambrus and Greiner 2012). Each actor i receives an endowment w > 0. Subsequently, each actor decides independently and simultaneously whether to contribute his or her entire endowment to a “group account”: that is, actor i’s contribution c i is either w (cooperation) or 0 (defection). Joint contributions c = ∑ c i are multiplied by m, with 1 < m < 6, and divided equally among group members. Thus, actors gain more individually by defecting than by cooperating (since m < 6), while the group payoff is maximized if everyone cooperates (since m > 1). Each actor is better off under full cooperation than if all defect (since mw > w). Still, Pareto-suboptimal full defection is the unique Nash equilibrium of the PD under the assumption that actors are rational and are self-regarding in the sense of caring exclusively for their own payoffs. Henceforth, we refer to rational and self-regarding actors as payoff-maximizing actors. In the experiment, we comply with conventional values of endowment and individual return from cooperation by using w = 20 and m = 2.4 respectively (Fehr and Gächter 2002). After the contribution stage, each actor i receives a signal o ij on each other group member j’s contribution decision (j ≠ i). Two variants of the PD are considered that differ in the accuracy of the signal. Without noise, the signal is always accurate (o ij = c j ). With noise, the signal may be inaccurate (o ij ≠ c j ) with probability p and accurate with probability 1 − p. If a signal is inaccurate, i observes j’s cooperation as a defection and vice versa. Whether o ij is accurate is determined independently for each observing actor i of each choice c j . Payoffs are based on actual contributions, although actors cannot infer actual payoffs from the contributions they observe with noise. Subjects in the experiment only observe their actual payoffs at the end of a session. Noise does not affect the payoff structure of the PD: the equilibrium of zero contributions remains unchanged for payoff-maximizing actors. In the experiment, p = 0.2. Each actor’s contribution decision is thus incorrectly observed, on average, by one of the five others in a six-person group. In both PDs (with and without noise), if a punishment institution is implemented, once actors observe signals o ij , each i can decide for each j whether to punish j. While in related studies actors can typically choose the level of their punishment, we employ


dichotomous punishment decisions to ensure that actors cannot compensate for low effectiveness by allocating more severe punishment, and vice versa. If i decides to punish j, actor i pays a cost of a > 0 for punishing, and the payoff for j is reduced by b > a. Thus, i’s earnings from the contribution stage of the PD are reduced by an amount a for each j whom i decides to punish, and by an amount b for each j who decides to punish i. As punishment is costly, payoff-maximizing actors do not punish in oneshot interactions. Thus, when all actors are payoff maximizing and expect others to be as well, no actor enacts or expects to receive punishment, and the equilibrium of full defection remains unchanged. In the experiment, punishment costs are constant at a = 2. Two levels of effectiveness a : b are considered. With low effectiveness, b = 6 so that a : b = 2 : 6 = 1 : 3, the most commonly used ratio in the literature. With high effectiveness, b = 12 so that a : b = 2 : 12 = 1 : 6.² Recipients of punishment thus lose three or six times the amount that actors spend on punishment allocation. Whether a punishment stage is added to the PD and, if so, the level of effectiveness employed, is decided by vote. The voting stage takes place in every PD before actors make contribution decisions. Voting is compulsory and costless. When actors vote, they know whether the PD they are participating in includes noise. First, actors vote on whether to add a punishment stage to the PD. Second, actors vote for high or low effectiveness of punishment regardless of whether they voted in favor of a punishment stage and without knowing whether a punishment stage will be added. If a majority votes against the punishment stage, it is omitted further on in the game. This outcome is referred to as No Punishment (NP). If a majority votes in favor of a punishment stage, this stage is added to the PD with the corresponding punishment effectiveness voted for. These outcomes are referred to as Low Punishment (LP) and High Punishment (HP). If exactly three of the six group members vote in favor of implementing a punishment institution, or for a certain effectiveness, the respective institution is randomly determined.³

2 Group size and punishment effectiveness are set so that a different number of punishers is required to deter defectors under each effectiveness and noise condition, and enforcing cooperation is feasible because not too many group members need to be punished to achieve deterrence. 3 Theoretically, this procedure cannot exclude strategic votes. Actors who prefer HP (LP) over NP and NP over LP (HP), and who expect at least three others to vote for the punishment effectiveness they least prefer, vote against implementing a punishment institution (their second preference) to avoid ending up with their least preferred outcome, if they expect to be pivotal. We expect these preference patterns and expectations of other votes to be unlikely. We employ our voting procedure, as it is fast to implement and easy for subjects to understand.
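To illustrate how these parameters interact, the following sketch simulates a single six-person round with the experimental values (w = 20, m = 2.4, p = 0.2, a = 2, and b = 6 or 12). The contribution and punishment rules passed in are arbitrary placeholders chosen for illustration, not a model of subject behavior or of the experimental software.

```python
import random

W, M, P_NOISE, PUNISH_COST = 20, 2.4, 0.2, 2

def play_round(contributes, punishes_observed_defectors, b, noise=True, rng=random):
    """One Prisoner's Dilemma with noisy signals and an optional punishment stage.

    contributes                  -- list of six booleans (True = contribute the endowment w)
    punishes_observed_defectors  -- list of six booleans (True = punish everyone observed defecting)
    b                            -- points deducted per punishment received (6 under LP, 12 under HP)
    """
    n = len(contributes)
    pot = sum(W for c in contributes if c) * M
    payoffs = [(0 if c else W) + pot / n for c in contributes]

    for i in range(n):                        # punishment stage
        if not punishes_observed_defectors[i]:
            continue
        for j in range(n):
            if i == j:
                continue
            signal = contributes[j]
            if noise and rng.random() < P_NOISE:
                signal = not signal           # i misreads j's decision with probability 0.2
            if not signal:                    # i punishes whoever looks like a defector
                payoffs[i] -= PUNISH_COST
                payoffs[j] -= b

    return payoffs

random.seed(1)
# Five cooperators, one defector; only actor 0 punishes observed defectors, under LP (b = 6):
print(play_round([True] * 5 + [False], [True] + [False] * 5, b=6))
```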


3.2 Costs and benefits of interacting under a punishment institution If all actors are payoff maximizing and expect others to be payoff maximizers as well, irrespective of noise, no punishment is allocated in a subgame perfect equilibrium, and actors defect regardless of the presence of a punishment institution. Voting outcomes are then irrelevant with respect to behavior and earnings, and actors are indifferent in the vote for a punishment institution and for effectiveness. However, as we have seen, it is evident from earlier experiments that there are conditional cooperators, pro-social punishers, and antisocial punishers who behave in a manner that is inconsistent with the payoff-maximizing actor model. The presence of conditional cooperators and punishers alters the implications of the voting outcome for a payoff-maximizing actor i. First, while conditional cooperation can imply that some of i’s fellow group members cooperate under NP, pro-social punishers are likely to enforce even more cooperation under LP and HP. Second, while i always defects and cannot be punished under NP, i may be punished or may cooperate to avoid receiving punishment under LP and HP. Thus, interacting under LP or HP can “benefit” i compared to NP through increased cooperation of other group members. However, LP and HP can imply “costs” for i, as i may have to cooperate and may be punished. Payoff-maximizing actors who anticipate the presence of conditional cooperators and punishers may then vote for LP or for HP if they expect benefits to outweigh costs. Table 1 shows how the costs and benefits of interacting under LP or HP rather than NP for payoff-maximizing actors depend on noise and punishment effectiveness. The table is used to develop hypotheses on votes. The far-left column displays the number of pro-social punishers in a group (group members who punish others whom they observe to be defectors). Corresponding rows outline how earnings of payoff-maximizing actors are affected through interacting under LP or HP instead of NP. For noise conditions, the table shows expected values based on an average of 20 % of pro-social punishers inaccurately observing the focal actor’s contribution decision. We assume that actors behave as if they use Table 1 to maximize expected payoffs, given their expected number of pro-social punishers. Antisocial punishment is not accounted for in Table 1, but we later hypothesize on the effects of experiencing punishment directed at (observed) cooperators. The columns entitled “Pun D” and “Pun C” in Table 1 show (expected) punishments that actors receive under LP and HP after defecting and cooperating respectively. Each punishment subtracts six points under LP and twelve points under HP. Accordingly, the no noise section of Table 1 shows that if actors defect, they receive six punishment points under LP and twelve under HP for each pro-social punisher in the group. Actors are not punished for cooperating. With noise, on average 20 % of pro-social punishments are targeted at cooperators. Accordingly, the noise section of Table 1 shows that under LP actors expect 0.8 × 6 = 4.8 punishment points for

Tab. 1: Punishment received for defection and cooperation (Pun D, Pun C), payoff-maximizing contribution decisions (Dec); payoff loss compared to defecting under NP given the indicated decisions (Cost); required increases in others’ cooperation rates relative to NP to offset indicated costs (Incr) for each voting outcome and number of pro-social punishers in a group. For the noise condition, the table shows expected values based on an average of 20 % inaccurate observations. Payoffs correspond to those in the experiment.

No noise
No. of pro-social    LP                                   HP
punishers            Pun D  Pun C  Dec  Cost  Incr        Pun D  Pun C  Dec  Cost  Incr
0                    0      0      D    0     0           0      0      D    0     0
1                    6      0      D    6     1           12     0      C    12    2
2                    12     0      C    12    2           24     0      C    12    2
3                    18     0      C    12    2           36     0      C    12    2
4                    24     0      C    12    2           48     0      C    12    2
5                    30     0      C    12    2           60     0      C    12    2

Noise
No. of pro-social    LP                                   HP
punishers            Pun D  Pun C  Dec  Cost  Incr        Pun D  Pun C  Dec  Cost  Incr
0                    0      0      D    0     0           0      0      D    0     0
1                    4.8    1.2    D    4.8   1           9.6    2.4    D    9.6   2
2                    9.6    2.4    D    9.6   2           19.2   4.8    C    16.8  3
3                    14.4   3.6    D    14.4  2           28.8   7.2    C    19.2  3
4                    19.2   4.8    C    16.8  3           38.4   9.6    C    21.6  3
5                    24     6      C    18    3           48     12     C    24    3


defecting and 0.2 × 6 = 1.2 punishment points for cooperating for each pro-social punisher in the group. Likewise, under HP actors expect 0.8 × 12 = 9.6 punishment points for defecting and 0.2 × 12 = 2.4 punishment points for cooperating for each pro-social punisher. For example, if actors under LP with noise expect that their group contains four pro-social punishers, they should expect 4 × (0.8 × 6) = 19.2 punishment points for defecting and 4 × (0.2 × 6) = 4.8 punishment points for cooperating. The “Dec” columns in Table 1 show payoff-maximizing contribution decisions. In our experiment, actors get 20 × 2.4 / 6 = 8 points returned from their own contribution, so cooperation yields 20 − 8 = 12 points less than defection. Thus, actors maximize the (expected) payoff by cooperating if the (expected) punishment for defection (“Pun D”) is at least 12 points higher than the (expected) punishment for cooperation (“Pun C”). For example, with four pro-social punishers under LP with noise, the difference between expected punishment for cooperation and defection is 19.2 − 4.8 = 14.4 > 12. Accordingly, payoff-maximizing actors under noise who anticipate four pro-social punishers cooperate under LP. The “Cost” columns in Table 1 specify the (expected) costs of interacting under LP or HP: that is, the earnings that payoff-maximizing actors (expect to) lose when they interact under LP or HP rather than NP. Costs comprise two elements. First, defection is payoff maximizing under NP, while cooperation may maximize (expected) payoffs under punishment institutions. Second, actors cannot be punished under NP, but might receive punishment under LP or HP. Accordingly, if defection maximizes (expected) payoffs, “Cost” equals the (expected) punishment for defection (“Pun D”). If cooperation maximizes (expected) payoffs, “Cost” equals the (expected) punishment for cooperation (“Pun C”) plus 12, that is, the costs of cooperation rather than defection. For example, in a group with four pro-social punishers under LP with noise, payoff-maximizing actors cooperate and expect 4.8 punishment points for doing so. Hence, cooperating under LP rather than defecting under NP brings expected costs of 4.8 + 12 = 16.8. The costs of interacting under LP or HP can be offset by increased cooperation. In our experiment, actors receive eight points for each cooperating group member. The “Incr” columns in Table 1 outline increases in others’ cooperation rates required to offset the (expected) costs of interacting under LP and HP rather than NP. This is equal to “Cost” divided by eight and rounded up to the next integer value. For example, in a group with four pro-social punishers under LP with noise, when three more fellow group members cooperate under LP than under NP, earnings from the cooperation of others increase by 3 × 8 = 24 points, offsetting the expected cost of 16.8 points. Thus, if payoff-maximizing actors in the PD with noise expect four pro-social punishers under LP, they expect to earn more under LP than under NP if they expect that, compared to NP, the number of others who cooperate increases by at least three under LP.
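The arithmetic behind the “Pun”, “Dec”, “Cost”, and “Incr” entries can be condensed into a few lines. The sketch below uses our own function name and simply re-derives Table 1 cells from the experimental parameters (w = 20, m = 2.4, p = 0.2); it is an illustration, not part of the analysis code.

```python
import math

def table1_cell(n_punishers, b, noise):
    """Expected punishment for defection and cooperation, the payoff-maximizing
    decision, the cost relative to defecting under NP, and the required increase
    in others' cooperation, for one cell of Table 1."""
    hit_rate = 0.8 if noise else 1.0                 # share of pro-social punishments reaching a defector
    pun_d = n_punishers * hit_rate * b
    pun_c = n_punishers * (1 - hit_rate) * b
    cooperate = (pun_d - pun_c) >= 12                # contributing costs 20 - (20 * 2.4 / 6) = 12 points
    cost = (12 + pun_c) if cooperate else pun_d
    incr = math.ceil(cost / 8)                       # each additional cooperator is worth 8 points
    return round(pun_d, 1), round(pun_c, 1), ("C" if cooperate else "D"), round(cost, 1), incr

# Noise, LP (b = 6), four pro-social punishers:
print(table1_cell(4, b=6, noise=True))               # (19.2, 4.8, 'C', 16.8, 3)
```

Looping the function over zero to five pro-social punishers, both effectiveness levels, and both noise conditions reproduces the entries of Table 1.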


3.3 Hypotheses: noise and punishment effectiveness We consider two types of hypotheses. First, we develop hypotheses on voting and cooperation with and without noise. We ignore antisocial punishment in these hypotheses.⁴ Second, we present hypotheses on how experiencing certain behaviors of other group members, including punishments that may be antisocial, affect subsequent voting. The more actors allocate pro-social punishment under a noise condition or voting outcome, the more punishment defectors receive relative to cooperators under this outcome. This may in turn affect cooperation and voting. On the one hand, actors may be less inclined to punish under HP than under LP, due to a belief that punishment is too severe. On the other, actors may be more inclined to punish under HP than under LP because they get more “value” for punishment expenditures. Previous studies suggest that the higher the effectiveness of punishment, the more actors are inclined to punish, both with and without noise (Ambrus and Greiner 2012; Egas and Riedl 2008). We may expect that noise renders actors reluctant to punish observed defectors, as they may target actual cooperators. A previous study employing 1 : 3 effectiveness (Van Miltenburg, Przepiorka, and Buskens 2015) found that noise had no significant effect on decisions to punish observed defectors. How noise and punishment effectiveness affect punishment decisions will be tested empirically and evaluated in light of differences between experimental conditions. We first hypothesize on the effects of noise on voting. The “Pun” columns in Table 1 show that for each number of pro-social punishers, defectors are punished less and cooperators are punished more with noise than without noise. If noise renders prosocial punishers reluctant to punish, defectors receive even less punishment relative to cooperators with noise. Accordingly, we expect that actors anticipate more cooperation under LP and HP without noise than with noise. Moreover, the “Incr” columns in Table 1 show that actors often require more cooperation with noise to offset expected costs of interacting under LP or HP. We expect the lower expected benefits, and higher expected costs, to render actors less inclined to vote in favor of implementing a punishment institution with noise.

4 We acknowledge that even a small degree of antisocial punishment can have pronounced effects on cooperation. Still, we choose to ignore antisocial punishment for Hypotheses 1–4 for two reasons. First, we assume that actors who typically have no previous experience with punishment institutions in the PD do not a priori expect the possibility of antisocial punishment (directed at observed cooperators) when voting. When actors observe punishment that might be antisocial, we expect that this will affect their subsequent votes. This is captured in Hypotheses 9 and 10. Second, given that antisocial punishment is a rare phenomenon, we do not expect major differences in the consequences of antisocial punishment between the experimental conditions that would change the expected effects of Hypotheses 1–4.


Hypothesis 1. Actors are less inclined to vote in favor of implementing a punishment institution with noise. The “Cost” columns in Table 1 show that, with noise, interacting under HP has a higher cost than interacting under LP for each number of pro-social punishers. This difference is even more pronounced when, in line with previous research, actors are more inclined to punish under HP than under LP. Thus, with noise, actors only vote for HP over LP if they expect that more cooperation under HP offsets the additional cost. Without noise, the expected costs of interacting under HP and LP are equal, unless actors expect only one pro-social punisher under LP. Thus, without noise, actors typically do not require a promise of higher cooperation under HP than LP to induce them to vote for HP. We hypothesize that actors are more reluctant to vote for HP with noise than without noise. Hypothesis 2. Actors are less inclined to vote for HP rather than LP with noise. One effect that is not accounted for in Table 1 may further reinforce the negative effects of noise on voting for (more effective) punishment institutions after actors gain experience with the PD under each voting outcome. With noise, and when there are more cooperators than defectors, more cooperators are observed as defectors, on average, than the other way around. Thus, the more actual cooperation exceeds 50 %, the more the cooperation rates that actors observe fall below actual cooperation rates. The opposite holds for cooperation rates lower than 50 %. This may imply that the cooperation rates of others, and therefore one’s own earnings, that actors observe with noise under a punishment institution are lower than actual cooperation and earnings, while observed cooperation and earnings under NP are higher than they actually are. We test empirically whether these observation biases have consequences for the effect of punishment institutions on observed cooperation or earnings. We now hypothesize on cooperation under each voting outcome. First, for both noise conditions, the “Pun” columns in Table 1 show that for each number of pro-social punishers, defectors receive more punishment relative to cooperators under HP than under LP. The difference is even more pronounced when, in line with previous research, actors are more inclined to punish under HP than LP. Second, for each number of pro-social punishers, the “Pun” columns in Table 1 show that defectors receive less punishment relative to cooperators with noise than without noise. These differences between noise conditions are even more pronounced when noise makes actors reluctant to allocate pro-social punishment. In both noise conditions, therefore, we hypothesize that cooperation rates are higher under HP than under LP, and for each level of punishment effectiveness we hypothesize higher cooperation rates without noise than with noise. Hypothesis 3. Both with and without noise, cooperation rates are higher under HP than under LP and higher under LP than under NP.


Hypothesis 4. Cooperation rates under LP and HP are higher without noise than with noise.

3.4 Hypotheses: the effects of experience A second set of hypotheses addresses how experiences affect subsequent votes. First, votes may be affected by the cooperation of fellow group members that actors observed in previous PDs under each voting outcome. Regarding votes on whether to implement a punishment institution, the more cooperation actors have experienced under LP and HP rather than NP, the more likely they will be to expect higher earnings under punishment institutions than under NP. Regarding the vote for punishment effectiveness, the more cooperation actors have experienced from other actors under HP relative to LP, the more likely they will be to expect higher earnings under HP than under LP. Hypothesis 5. Actors are more inclined to vote for an institution (NP, LP, or HP) the more cooperation they have experienced under this institution, and the less cooperation they have experienced under alternative institutions. The “Incr” columns in Table 1 show that, with noise, actors often require more cooperation under LP or HP than without noise to offset the expected costs of interacting under LP or HP rather than NP. The “Cost” columns show that, with noise, interacting under HP has higher expected costs than interacting under LP, such that, with noise, actors only vote for HP if they expect more cooperation under HP than under LP. The difference in costs is even more pronounced if actors are more inclined to punish under HP than under LP. Conversely, without noise, the costs of interacting under HP and LP are mostly equal. Thus, with noise, more experienced cooperation may be required than without noise to convince actors to vote for a (more effective) punishment institution. Hypothesis 6. The effect of previously experienced cooperation on the likelihood of voting for a certain institution is weaker with noise than without noise. Second, votes may be affected by experiencing that observed defectors are punished. The more observed defectors are punished, the more pro-social punishment actors might anticipate in future interactions. This increases the expected benefits of interacting under LP or HP through increased expected cooperation. However, the “Cost” columns in Table 1 show that the expected costs of interacting under punishment institutions also increase with the number of pro-social punishers. Additionally, and especially with noise, the difference between the expected costs of interacting under HP and of interacting under LP increases with the number of pro-social punishers. With noise, therefore, it remains unclear whether actors expect the benefits of increased pro-social punishment to outweigh the costs. Without noise, the costs of interacting under LP or HP are hardly affected by the number of pro-social punishers, while the

Endogenous Peer Punishment Institutions in Prisoner’s Dilemmas: The Role of Noise

| 339

benefits are higher because more actors will be induced to cooperate due to the punishment institution. We thus expect that experiencing that observed defectors are punished renders actors more inclined to opt for a (more effective) punishment institution without noise. In addition, we expect that this effect will be reduced with noise, because observing defectors being punished might make actors realize that if their cooperative behavior is misperceived, they may also be punished. Hypothesis 7. Without noise, experiencing punishment of observed defectors has a positive effect on the likelihood of voting in favor of implementing a punishment institution and voting in favor of HP. Hypothesis 8. Noise reduces the effect of experienced punishment of observed defectors under LP and HP on the likelihood of voting in favor of implementing a punishment institution and voting in favor of HP. Finally, votes may be affected by experiencing that observed cooperators are punished. In conditions with and without noise, the punishment of observed cooperators discourages cooperation, especially under HP where punishments are more severe. Actors who observe that cooperators are punished thus expect lower cooperation rates under punishment institutions in the future, and may expect to receive punishment if they cooperate themselves. Accordingly, we hypothesize that punishment aimed at observed cooperators renders actors more reluctant to vote in favor of implementing a punishment institution and in favor of HP. Hypothesis 9. Actors are less inclined to vote in favor of implementing a punishment institution and less inclined to vote for HP rather than LP after experiencing punishment of observed cooperators under LP and HP. With noise, experiencing that observed cooperators are punished might be the result of a wrong observation of the recipients’ contribution decision. Whether a contribution decision is observed accurately is independently determined for every fellow group member. Observing that a cooperator is punished can thus be the result of an incorrect observation by either the focal actor or the punisher. Realizing that such an observation might be due to noise, actors should be less strongly affected by such observations. After all, the punishment of cooperators might signal either the presence of pro-social or of antisocial punishers in the population. The negative effects of observing cooperators being punished may therefore be smaller with noise than without, as without noise punishment of cooperators is antisocial by definition and implies the presence of antisocial punishers in the population. Hypothesis 10. Effects of experienced punishment of observed cooperators under LP and HP on the likelihood to vote in favor of implementing a punishment institution and to vote for HP are less negative with noise than without noise.


4 Experimental procedure

In the experiment, subjects participated in series of six-person PD games with endogenous punishment institutions. The PD with noise was employed in half of the sessions, while in the other half subjects participated in the PD without noise. Subjects were informed about whether and how noise was implemented. Payoffs were presented as points that were translated into monetary earnings at the end of a session. Throughout each session, subjects were randomly re-matched in different groups of six after every interaction.

At the start of the experiment, subjects received paper instructions that only described the contribution stage of the PD. Subsequently, they answered control questions on the computer to verify that they understood this part of the PD. If a subject did not answer a question correctly, the right answer was presented on the screen. Each session proceeded with five periods of a PD with a contribution stage only, to familiarize the subjects with the decision situation. These initial PDs were played either with or without noise, depending on the experimental condition. After each period, subjects were informed, possibly with noise, about contribution decisions by group members and of their own payoff. In the noise condition, instead of being informed of their actual earnings, subjects were informed of the payoff they would have earned if the contribution decisions they observed had been accurate. This prevented subjects from inferring actual contribution rates.

After the five initial periods, subjects interacted for 40 periods in the PD with endogenous punishment institutions and effectiveness as described above. Each period opened with two votes: on whether or not to implement a punishment institution, and a choice between LP and HP. The option chosen by the majority of the six-person group was implemented.⁵ Subjects received new instructions describing this phase of the experiment. After each voting stage, subjects were informed of group voting outcomes, but not about how many group members had voted for each alternative. Regardless of voting outcomes, in this part of the experiment subjects received an additional endowment of 10 points after the contribution stage of each PD. Punishments cost two points to enact. In groups that interacted under LP or HP, therefore, the endowment allowed subjects to punish all five fellow group members. Under LP and HP, subjects were informed of punishments received by all group members after the punishment stage. Under each voting outcome, subjects were informed of their own earnings after each interaction. Again, in the noise condition, these earnings were based on contribution decisions perceived by subjects with noise. While subjects could theoretically acquire negative aggregate earnings during this part of the experiment, we did not develop a protocol for negative earnings, as they were highly unlikely and indeed never occurred.

5 Neutral labels were used in the experiment. Options at the voting stage were referred to as “System A” (NP), “System B1” (LP), and “System B2” (HP). Subjects were informed that ties were broken randomly.


At the end of the session, subjects were informed of their actual accumulated earnings. The experiment was programmed using z-Tree (Fischbacher 2007) and conducted at the ELSE laboratory at Utrecht University. Subjects were recruited using the online recruiting system ORSEE (Greiner 2004). There were 156 participants in total: 78 in the noise condition and 78 in the condition without noise (41 % male, 85 % students, 30 % economics students). Subjects earned 12.50 € on average (earnings ranged from a minimum of 7.50 € to a maximum of 15.50 €). Further information on the experiment (such as the instructions provided to the subjects) is available in the Supplementary Information, which can be obtained upon request.
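Purely as an illustration, the following minimal sketch (not the authors’ z-Tree implementation) captures the two mechanics described above: majority voting over the institutions with random tie-breaking, and the noisy display of contributions in which each observer misperceives each fellow group member’s decision independently with probability 0.2. All function and parameter names are ours.

```python
import random

GROUP_SIZE = 6
NOISE_P = 0.2  # probability that a single observation of a decision is flipped

def vote_outcome(pro_punishment_votes, hp_votes):
    """First vote: punishment institution vs. NP; second vote: HP vs. LP.
    Ties in the six-person group are broken randomly (see footnote 5)."""
    def majority(yes, total=GROUP_SIZE):
        if yes * 2 == total:            # a 3-3 tie
            return random.random() < 0.5
        return yes * 2 > total
    if not majority(pro_punishment_votes):
        return "NP"
    return "HP" if majority(hp_votes) else "LP"

def observed_contributions(true_contributions, noise=True):
    """Return one observer's (possibly distorted) view of the others' decisions."""
    observed = []
    for c in true_contributions:
        flipped = noise and random.random() < NOISE_P
        observed.append((not c) if flipped else c)
    return observed

# Example: 4 of 6 group members favor a punishment institution, 2 of 6 favor HP.
print(vote_outcome(pro_punishment_votes=4, hp_votes=2))
print(observed_contributions([True, True, False, True, True]))
```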

5 Results

5.1 Descriptive results

In line with previous findings (Ledyard 1995), cooperation rates in the five initial PDs without a voting stage steadily declined from roughly 45 % to roughly 28 % regardless of noise (output not shown). However, marked noise effects were visible in the PDs with a voting stage. Figures 1a–1d show individual votes, voting outcomes, contribution decisions, and earnings for each period with a voting stage and for the conditions with and without noise.

Subjects made two decisions at each voting stage: whether to vote in favor of implementing a punishment institution, and whether to vote for LP or HP. Figure 1a shows the proportion of subjects who voted in favor of implementing a punishment institution rather than NP and the proportion of subjects who voted for HP rather than LP. Note that these are two independent decisions. Without noise, only 35 % of all subjects initially voted in favor of a punishment institution. Punishment institutions quickly increased in popularity, with a stable majority of roughly 85 % voting in favor of implementing a punishment institution after period ten. In the vote for effectiveness, 35 % initially voted for HP. However, HP also increased in popularity, with a stable majority of around 70 % voting for HP after period ten. With noise, 20 % initially voted in favor of implementing a punishment institution, increasing to roughly 50 % over the forty periods. In addition, 20 % of all subjects initially voted for HP, increasing to 35 %. This tendency toward punishment institutions complements previous findings (Ertan, Page, and Putterman 2009; Gürerk, Irlenbusch, and Rockenbach 2006; Markussen, Putterman, and Tyran 2014).

Figure 1b presents voting outcomes. Without noise, most groups initially voted for NP, but, after several rounds, most groups voted for HP, followed by LP. Groups hardly ever voted for NP in later rounds. With noise, almost all groups initially voted


Fig. 1: (a) Proportion of subjects voting in favor of implementing a punishment institution rather than NP, and proportion of subjects voting for HP rather than LP without (left) and with (right) noise. (b) Proportion of groups under each voting outcome without (left) and with (right) noise. (c) Cooperation rates for each voting outcome without (left) and with (right) noise. (d) Earnings for each voting outcome without (left) and with (right) noise.


for NP. Subsequently, a substantial proportion of groups voted for one of the punishment institutions, but NP remained the most popular in almost every period. Of the two punishment institutions, more groups voted for LP than for HP.

Figure 1c shows cooperation rates for each voting outcome. Without noise, cooperation rates under NP were relatively low. Conversely, almost full cooperation was immediately achieved under HP. Roughly 60 % cooperated initially under LP, but this rapidly increased to nearly full cooperation. With noise, cooperation rates under NP likewise remained relatively low. From the first period, cooperation rates were considerably higher under both LP and HP. Cooperation rates slowly decreased over time under LP, but remained high under HP. Under both HP and LP, cooperation rates were consistently lower with noise. Previous studies also found that cooperation rates increased under endogenous punishment institutions (Kamei, Putterman, and Tyran 2011; Markussen, Putterman, and Tyran 2014; Sutter, Haigner, and Kocher 2010) and that cooperation rates were lower under exogenous punishment institutions with noise (Fischer, Grechenig, and Meier 2013; Grechenig, Niklisch, and Thöni 2010; Van Miltenburg, Przepiorka, and Buskens 2015), but increased with effectiveness (Ambrus and Greiner 2012; Egas and Riedl 2008; Nikiforakis and Normann 2008).

Figure 1d shows that, without noise, higher earnings were achieved almost immediately under LP and HP in comparison with NP. With noise, earnings under LP and HP were somewhat higher than under NP, but this difference was smaller than without noise. Additionally, while higher cooperation rates were achieved under HP than under LP with noise, this does not appear to translate into higher earnings. Previous studies found that exogenous punishment institutions negatively affected earnings with noise under 1 : 3 effectiveness (Grechenig, Niklisch, and Thöni 2010; Van Miltenburg, Przepiorka, and Buskens 2015).

Tab. 2: Number of actual defectors and cooperators (N), percentage of defectors and cooperators receiving at least one punishment (% punished), average number of punishments received by defectors and cooperators (Av. punished), and average number of punishments received by defectors and cooperators overall (Av. total) under LP and HP, with and without noise.

                    Actual defectors                               Actual cooperators
Noise  Eff.   N     % punished   Av. punished   Av. total    N      % punished   Av. punished   Av. total
No     LP     99    91           2.68           2.43         705    17           1.08           0.18
No     HP     51    100          3.37           3.37         1929   17           1.09           0.18
Yes    LP     410   74           1.66           1.23         586    33           1.17           0.38
Yes    HP     41    95           2.56           2.44         271    49           1.21           0.59

Table 2 shows the punishments received by cooperators and defectors. Without noise, 91 % of defectors were punished by at least one fellow group member under LP, and every defector received punishment under HP. The average number of punishments that defectors received was also slightly higher under HP than under LP, such


that defectors received more punishments on average under HP than under LP. Conversely, few cooperators were punished, typically by one group member only, such that cooperators received few punishments on average under both LP and HP. With noise, most actual defectors were punished, though at lower percentages and by fewer punishers than without noise. Accordingly, under LP and HP, defectors received fewer punishments on average with noise than without. Several cooperators were punished in the noise condition, especially under HP, but only by slightly more than one group member on average, so the average number of punishments received by actual cooperators remained low. Table 2 shows that, under LP, defectors received an average of 1.23 × 6 = 7.38 punishment points and cooperators received an average of 0.38 × 6 = 2.28 punishment points with noise. Thus, the difference in average punishment for defection and cooperation (7.38 − 2.28 = 5.1 points) did not offset the 12-point payoff advantage of defecting, such that defectors earned more than cooperators on average. Similar calculations reveal that punishment under the other voting outcomes is on average sufficient to deter defection. This may explain why LP with noise is the only punishment institution that does not result in nearly full cooperation, but shows a decline in cooperation over time (Figure 1c).
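The same back-of-the-envelope comparison can be written out explicitly. This is a minimal sketch, assuming (as the multiplication above implies) that each received punishment lowers the recipient’s payoff by 6 points under LP; the variable names are ours.

```python
# Deterrence check for LP with noise, using the averages from Table 2.
POINTS_PER_PUNISHMENT_LP = 6   # assumed deduction per punishment under LP
DEFECTION_ADVANTAGE = 12       # payoff advantage of defecting over cooperating

avg_pun_defector = 1.23 * POINTS_PER_PUNISHMENT_LP    # 7.38 points
avg_pun_cooperator = 0.38 * POINTS_PER_PUNISHMENT_LP  # 2.28 points
extra_cost_of_defecting = avg_pun_defector - avg_pun_cooperator  # 5.10 points

# True: the punishment differential does not offset the gain from defecting,
# so defection still pays on average under LP with noise.
print(extra_cost_of_defecting < DEFECTION_ADVANTAGE)
```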

5.2 Explanatory results: cooperation, punishment, and earnings

In what follows, multilevel regression models are presented with random effects at the subject level to control for interdependencies within subjects. We have too few observations at the session level to include random effects at the session level. Models with fixed session effects are difficult to interpret, as sessions and noise are perfectly collinear by design. To facilitate a straightforward interpretation of noise effects, sessions are not controlled for in the models presented below. We indicate when effects are not robust if session fixed effects are included to control for interdependencies at the session level.

Figures 2a–2c show the effects of noise and the voting outcome on the predicted probability of punishing an observed defector, predicted cooperation, and predicted period earnings. These predictions are based on regression models available in the Supplementary Information. The differences described below are significant according to these models. The Supplementary Information presents corresponding descriptive statistics.

Figure 2a shows the predicted probability that a subject will punish an observed defector.⁶ In each period, subjects observed between zero and five defectors in their

6 No significant effectiveness or noise differences were found regarding the predicted probability of punishing an observed cooperator. Predicted probabilities fall below 0.003 for each level of effectiveness, with and without noise.


Fig. 2: (a) Predicted probability of punishing an observed defector under each effectiveness and depending on noise with 95 % confidence intervals. (b) Predicted probability of cooperation under each voting outcome with and without noise, and predicted probability to observe a cooperator with noise with 95 % confidence intervals. (c) Predicted period earnings for each voting outcome, and predicted observed period earnings with noise with 95 % confidence intervals.

group, before deciding whether or not to punish them. Figure 2a shows that the predicted probability of punishing an observed defector is higher under HP than under LP, with as well as without noise, and for both HP and LP the predicted probability is lower with noise. Thus, as confirmed in Table 2, defectors received more punishments relative to cooperators under HP than under LP and more without noise than with noise. Note that in deriving our hypotheses we had already anticipated that noise may ren-


der actors reluctant to allocate pro-social punishment, and that subjects may be more inclined to punish (observed) defectors when punishment effectiveness is high. The left panel of Figure 2b shows the predicted probability that a subject will cooperate. In line with Hypothesis 3, Figure 2 shows that the predicted probability of cooperating is higher under HP than LP and higher under LP than NP, regardless of noise. Moreover, under LP and HP the predicted probability to cooperate is lower with noise, confirming Hypothesis 4. Thus, as expected, punishment, and especially severe punishment, promotes cooperation, while noise reduces the cooperation-enhancing effects of punishment institutions. We noted that, with noise, if cooperation under LP and HP exceeded 50 % while cooperation under NP fell below 50 %, cooperation rates observed by subjects under LP and HP may fall below actual cooperation rates, while observed cooperation rates under NP were higher than in reality. The right-hand panel of Figure 2b shows the predicted probability of observing a cooperator with noise. There are five observations per subject for each PD, one for the observed contribution decision of each fellow group member. The predicted probability of observing a cooperator followed the same pattern as the predicted actual cooperation, though the difference between NP and the punishment institutions was smaller than in reality. Figure 2c shows predicted actual and perceived period earnings with noise. Both with and without noise, predicted actual earnings were higher under HP than under LP and higher under LP than NP. With noise, however, subjects were predicted to observe significantly lower earnings under HP than under NP and LP. Thus, while the positive effect of interacting under a punishment institution on cooperation rates is partly detected by subjects in the noise condition, the subjects do not observe that this translates into higher earnings.
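The subject-level models behind these predictions are reported in the Supplementary Information. Purely as an illustration of this kind of analysis (not the authors’ actual specification), the sketch below fits a pooled logit with subject-clustered standard errors as a simple stand-in for subject-level random effects; the toy data frame and all variable names are ours.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Toy data: one row per subject-period, with a binary vote for a punishment
# institution, a noise indicator, and the period number (illustrative values).
df = pd.DataFrame({
    "subject":  [1, 1, 2, 2, 3, 3, 4, 4],
    "noise":    [0, 0, 0, 0, 1, 1, 1, 1],
    "period":   [1, 2, 1, 2, 1, 2, 1, 2],
    "vote_pun": [0, 1, 1, 1, 0, 0, 1, 0],
})

# Logit of voting for a punishment institution on noise and period,
# with standard errors clustered by subject.
model = smf.logit("vote_pun ~ noise + period", data=df)
result = model.fit(cov_type="cluster", cov_kwds={"groups": df["subject"]}, disp=False)
print(result.summary())
```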

5.3 Explanatory results: votes

We now model the effect of noise and experience on the decision of whether to vote in favor of implementing a punishment institution. The experienced cooperation rate is calculated as the average number of other group members observed by a subject to be cooperators in all PDs with a voting stage that preceded the current vote, separated by whether the subject interacted under NP, LP, or HP. Experienced punishment of defection is measured as the average number of punishments received by observed defectors (including the subject)⁷ in all preceding PDs in which a subject interacted under LP and under HP. The same method is used to measure the experienced punishment of cooperation. If subjects never experienced a voting outcome or never observed cooperation or defection under LP or HP, they were assigned a value of zero for

7 The number of subjects receiving punishment for defection and for cooperation is insufficient to distinguish between punishments received by subjects and punishments received by other group members. The results presented below do not change when own received punishment is excluded.


the corresponding experience. Dichotomous variables indicating whether zero was assigned are included to account for this imputation. We control for the period and for whether the subject had voted in favor of implementing a punishment institution in the previous period. Model 1 of Table 3 shows that subjects were less inclined to vote in favor of implementing a punishment institution with noise than without noise, which is in line with Hypothesis 1. In Model 2, noise effects were just insignificant, indicating that such effects are partly explained by the different experiences actors have with and without noise. Model 2 also shows that the more cooperation subjects experienced under NP, the less inclined they were to vote in favor of implementing a punishment institution, whereas the more cooperation they experienced under LP and HP, the more they were inclined to vote in favor of implementing a punishment institution. This confirms Hypothesis 5. The model further shows that subjects were less inclined to vote in favor of implementing a punishment institution after experiencing more punishment of cooperators under HP, confirming Hypothesis 9 for HP but not for LP. Model 3 presents interactions between experience and noise. The effects of cooperation experienced under NP and LP on the likelihood to vote in favor of implementing a punishment institution were not significantly affected by noise. The positive effect of cooperation experienced under HP was stronger with noise than without noise. There is thus no evidence that noise weakens the effect of experienced cooperation, implying that Hypothesis 6 is not confirmed. Contrary to Hypothesis 7, more experiences of punishment aimed at defectors did not render actors more inclined to vote in favor of implementing a punishment institution without noise. This does not leave room for a reduction of such an effect in the noise condition (Hypothesis 8). Additionally, the effect of experienced punishment of cooperation was not significantly lower with noise than without noise, implying that Hypothesis 10 is not confirmed. In contrast with Hypothesis 10, subjects were, only with noise, less inclined to vote in favor of implementing a punishment institution, the more punishment of cooperation they experienced under LP. It may be that subjects observed antisocial punishments too occasionally without noise (Table 2) for it to affect their votes. Experienced punishment of cooperation under HP likewise rendered actors less inclined to vote in favor of implementing a punishment institution with noise, but the effect was not significant, possibly because few groups interacted under HP. In sum, and as expected, the more cooperation subjects experienced under punishment institutions relative to NP, the more inclined they were to vote in favor of implementing a punishment institution. However, we found no evidence that the effect of experiences differed with or without noise in the hypothesized manner (Hypotheses 6–8 and 10). Instead, only with noise observing that cooperators are punished deterred subjects from voting in favor of implementing a punishment institution. Table 4 shows the effect of noise and experience on the decision of whether to vote for HP instead of LP. The same experiences are used as in Table 3, except experienced cooperation under NP. All votes are included regardless of whether a subject voted in


Tab. 3: Logistic regression on the decision of whether or not to vote in favor of implementing a punishment institution in period t. Model 1 Hyp. Noise Experiences (observed) Cooperation NP × noise Cooperation LP × noise Cooperation HP × noise Av. punishment defectors LP × noise Av. punishment defectors HP × noise Av. punishment cooperators LP × noise Av. punishment cooperators HP × noise Controls In favor of pun. t − 1 × noise Period × noise Constant Subject level variance Log likelihood

Model 2

Exp. Sign.

Coeff.

S.e.

1



−1.786∗∗ 0.313

5 6 5 6 5 6 7

− + + − + − +

8 7

Model 3

Coeff.

S.e.

Coeff.

S.e.

−0.732

0.419

−2.477

1.379 0.310 0.392 0.267 0.338 0.690 0.716 0.233

−0.375∗ 0.181

−0.059

0.130

−0.377 0.092 0.278 0.269 −1.012 1.644∗ 0.055

− +

0.064

0.110

0.269 0.364

0.311 0.275

8 9

− −

−0.177

0.375

−0.508 0.583

0.314 0.589

10 9

+ −

−0.743∗ 0.326

10

+

0.419∗∗ 0.148 0.552∗∗ 0.157

3.499∗∗ 0.116

3.277∗∗ 0.119

0.033

0.009

−2.588∗∗ 0.839 0.223 0.666 −1.103

0.788 0.179 0.237 0.011 0.016 3.457

0.241

−3.222∗∗ 0.844

3.440∗∗ −0.413 0.015 0.005 2.645

2.907 0.555 −1480.390

3.731∗∗ 0.698 −1439.326

3.837∗∗ 0.730 −1418.964

0.005

−0.779 ∗∗

0.007

Notes: Random effects at the subject level (6,084 PDs by 156 subjects). Controlled for dichotomous variables indicating whether a value of zero was assigned to legitimate missing values. ∗ p < .05, ∗∗ p < .01.

favor of implementing a punishment institution. Model 4 in Table 4 shows that subjects were less inclined to vote for HP with noise, confirming Hypothesis 2. Model 5 shows that subjects were more inclined to vote for HP after experiencing lower cooperation rates under LP. However, subjects were not more inclined to vote for HP after experiencing higher cooperation rates under HP, only partly confirming Hypothesis 5. The model further shows that experienced punishment of defection under HP posi-


tively affected the likelihood of voting for HP, partially confirming Hypothesis 7. Actors were less inclined to vote for HP after experiencing more punishment of cooperation under HP, but not under LP, again partially confirming Hypothesis 9.

Interactions between noise and experience are shown in Model 6. Supporting Hypothesis 6, the negative effect of experienced cooperation under LP on the likelihood to vote for HP was significantly smaller with noise than without noise. However, experienced cooperation under HP did not interact with noise, thus only partly confirming Hypothesis 6. The effects of experienced punishment of defection and cooperation did not change with noise, implying that Hypotheses 8 and 10 are not confirmed. Instead,

Tab. 4: Logistic regression on the decision of whether or not to vote for HP instead of LP in period t. Model 4 Hyp. Noise Experiences (observed) Cooperation LP × noise Cooperation HP × noise Av. punishment defectors LP × noise Av. punishment defectors HP × noise Av. punishment cooperators LP × noise Av. punishment cooperators HP × noise Controls Voted for HP t − 1 × noise Period × noise Constant Subject level variance Log likelihood

Model 5 S.e.

Coeff.

Model 6

Exp. Sign.

Coeff.

S.e.

2



−1.718∗∗ 0.299

5 6 5 6 7

− + + − +

8 7

− +

8 9

− −

−0.054

10 9

+ −

−1.403∗∗ 0.302

10

+

Coeff.

−0.703∗ 0.327† −3.357∗∗ 1.251 −0.506∗∗ 0.138 0.079

0.147

0.118

0.127

0.239∗ 0.100

0.352

−0.826∗∗ 0.227 0.684∗ 0.306† −0.982 0.640 1.151 0.667 0.230 0.207 0.008 0.009

0.298 0.233

0.313 −0.242

0.282 0.525

−0.135 0.776 −1.611∗∗ 0.554† 0.419

3.579∗∗ 0.109

S.e.

0.017∗∗ 0.004

−0.013

0.007

−1.483∗∗ 0.223

0.135

0.772

3.307∗∗ 0.157 0.163 0.219 −0.005 0.010 0.000 0.015 6.776∗ 3.144

2.227 0.449 −1559.525

2.207∗∗ 0.457 −1552.497

∗∗

2.595 0.498 −1612.811

3.421∗∗ 0.112

0.680

∗∗

Notes: Random effects at the subject level (6,084 PDs by 156 subjects). Controlled for dichotomous variables indicating whether a value of zero was assigned to legitimate missing values. † not robust when controlling for session-fixed effects, ∗ p < .05, ∗∗ p < .01.


experiencing punishment of cooperation under HP rendered subjects less inclined to vote for HP, regardless of noise. Note that this effect is not robust when session fixed effects are considered. This is attributable to the fact that one session without noise had a particularly high rate of antisocial punishment, which, in line with our theory, generated more votes for LP than the other sessions. In sum, it is partly confirmed that subjects are more inclined to vote for HP after experiencing higher cooperation rates under HP and lower cooperation rates under LP. However, interactions between noise and experience do not consistently affect voting decisions in the hypothesized manner. Instead, experiencing the punishment of cooperators affects subsequent votes regardless of noise.

6 Conclusion and discussion

We have studied endogenous implementation of peer punishment institutions and punishment effectiveness in cooperation problems where actors accurately observe the behavior of their peers, and in cooperation problems with a 20 % probability of cooperation being observed as defection (and vice versa). In a laboratory experiment, subjects decided by majority vote whether to interact without a punishment institution (NP) or to implement a punishment institution. If the latter was chosen, subjects selected low (LP) or high (HP) punishment effectiveness.

As expected, subjects were less inclined to vote in favor of implementing a punishment institution (and less inclined to vote for HP rather than LP) with noise than without (Hypotheses 1 and 2). Without noise, after some initial reluctance, the vast majority of groups interacted under HP. With noise, NP was chosen most often, although LP gained popularity over time. As expected, both with and without noise, cooperation rates were higher under HP than under LP and higher under LP than under NP (Hypothesis 3). Moreover, cooperation rates under punishment institutions were lower with noise (Hypothesis 4). Interestingly, in both conditions, earnings were higher under LP and HP than under NP, but subjects in the noise condition observed lower earnings under HP than under LP and NP.

These results complement previous findings that actors in environments characterized by perfect information implement peer punishment institutions after some experience with the decision situation (Ertan, Page, and Putterman 2009; Gürerk 2013; Gürerk, Irlenbusch, and Rockenbach 2006; Gürerk, Irlenbusch, and Rockenbach 2009; Markussen, Putterman, and Tyran 2014; Rockenbach and Milinski 2006). We found in addition that actors support high punishment effectiveness, but less so the more punishment with high effectiveness is targeted at cooperators (Hypothesis 9).

An increasing preference for punishment institutions and for high punishment effectiveness does not extend to noisy environments. With noise, lower observed


earnings under LP than under NP and more frequent experiences of cooperators being punished may discourage subjects from implementing punishment institutions. Noise is often present in cooperation problems outside of laboratory settings (Bereby-Meyer 2012). Our noise results thus show that previous findings on endogenous institution formation should be interpreted with care, since these findings are based on experiments that did not include noise. This raises questions regarding whether other (more realistic) environments inhibit actors’ willingness to interact under a peer punishment institution. For example, actors may be less inclined to opt for a punishment institution that allows for counter-punishment (Denant-Boemont, Masclet, and Noussair 2007; Nikiforakis 2008; Nikiforakis and Engelmann 2011) or may be less inclined to implement a punishment institution in populations where punishment is often used antisocially (Herrmann, Thöni, and Gächter 2008). In addition to the effects of noise on voting outcomes and cooperation rates, our analysis pinpointed determinants of individual votes. As expected, the more cooperation actors have experienced under an option relative to alternatives, the more they are inclined to vote for it (Hypothesis 5). However, we cannot confirm that actors account for noise in their experiences in the hypothesized manner. When actors aim to vote for the option in which they receive the highest earnings and realize that their experiences are noisy, they are expected to react differently to experienced behavior of fellow group members with noise than without noise (Hypotheses 6–8, 10). For example, with noise, more cooperation is required under punishment institutions than without noise for earnings to exceed those generated under NP. Therefore, observing cooperation with noise under LP and HP should encourage subjects less to vote for punishment institutions than without noise. Such conjectures are not systematically corroborated by the data. Instead, certain experiences affect subsequent votes regardless of noise. This may suggest that, while subjects do react to previously experienced positive or negative incentives, they fail to consider noise. Future research may assess how actors in cooperation problems aim to maximize their expected earnings when presented with noisy information.

7 Postscript

Cooperation problems and the role of punishment institutions in fostering cooperation have been a subject of the seminal work of Andreas Diekmann, who has combined careful theoretical arguments with well-designed experiments. Core features of his work in this field, which we have tried to emulate in this chapter, are, first, seriously using game-theoretic analysis and not discarding game theory based on superficial arguments, and, second, employing additional assumptions and alternatives to game-theoretic reasoning when needed and when such additional assumptions and alternatives can be clearly spelled out.


Bibliography

[1] Ambrus, Attila, and Ben Greiner. 2012. “Imperfect public monitoring with costly punishment: an experimental study.” American Economic Review 102(7):3317–3332.
[2] Balliet, Daniel, Laetitia B. Mulder, and Paul A. M. van Lange. 2011. “Reward, punishment, and cooperation: a meta-analysis.” Psychological Bulletin 137(4):594–615.
[3] Bereby-Meyer, Yoella. 2012. “Reciprocity and uncertainty.” Behavioral and Brain Sciences 35(1):18–19.
[4] Bornstein, Gary, and Ori Weisel. 2010. “Punishment, cooperation, and cheater detection in ‘noisy’ social exchange.” Games 1(1):18–33.
[5] Botelho, Anabela, Glenn W. Harrison, Ligia Pinto, and Elisabet E. Rutström. 2005. “Social norms and social choice.” Unpublished manuscript.
[6] Buchanan, James, and Gordon Tullock. 1962. The Calculus of Consent: Logical Foundations of Constitutional Democracy. Ann Arbor, MI: University of Michigan Press.
[7] Camerer, Colin. 2003. Behavioral Game Theory: Experiments in Strategic Interaction. Princeton, NJ: Princeton University Press.
[8] Chaudhuri, Ananish. 2011. “Sustaining cooperation in laboratory public goods experiments: a selective survey of the literature.” Experimental Economics 14(1):47–83.
[9] Cinyabuguma, Matthias, Talbot Page, and Louis Putterman. 2006. “Can second-order punishment deter antisocial punishment?” Experimental Economics 9(3):265–279.
[10] Denant-Boemont, Laurent, David Masclet, and Charles N. Noussair. 2007. “Punishment, counterpunishment and sanction enforcement in a social dilemma experiment.” Economic Theory 33(1):145–167.
[11] Diekmann, Andreas, and Wojtek Przepiorka. 2015. “Punitive preferences, monetary incentives and tacit coordination in the punishment of defectors promote cooperation in humans.” Scientific Reports 5:10321.
[12] Egas, Martijn, and Arno Riedl. 2008. “The economics of altruistic punishment and the maintenance of cooperation.” Proceedings of the Royal Society B: Biological Sciences 275:871–878.
[13] Ertan, Arhan, Talbot Page, and Louis Putterman. 2009. “Who to punish? Individual decisions and majority rule in mitigating the free rider problem.” European Economic Review 53(5):495–511.
[14] Fehr, Ernst, and Simon Gächter. 2002. “Altruistic punishment in humans.” Nature 415(6868):137–140.
[15] Fehr, Ernst, and Tony Williams. 2013. “Endogenous emergence of institutions to sustain cooperation.” Unpublished manuscript.
[16] Fischbacher, Urs. 2007. “z-Tree: Zürich toolbox for ready-made economic experiments.” Experimental Economics 10(2):171–178.
[17] Fischbacher, Urs, Simon Gächter, and Ernst Fehr. 2001. “Are people conditionally cooperative? Evidence from a public goods experiment.” Economics Letters 71(3):397–404.
[18] Fischer, Sven, Kristoffel R. Grechenig, and Nicolas Meier. 2013. “Cooperation under punishment: imperfect information destroys it and centralizing punishment does not help.” Unpublished manuscript.
[19] Fudenberg, Drew, David G. Rand, and Anna Dreber. 2012. “Slow to anger and fast to forgive: cooperation in an uncertain world.” American Economic Review 102(2):720–749.
[20] Gächter, Simon, Elke Renner, and Martin Sefton. 2008. “The long-run benefits of punishment.” Science 322(5907):1510.


[21] Grechenig, Kristoffel, Andreas Niklisch, and Christian Thöni. 2010. “Punishment despite reasonable doubt – a public goods experiment with sanctions under uncertainty.” Journal of Empirical Legal Studies 7(4):847–867.
[22] Green, Edward J., and Robert H. Porter. 1984. “Noncooperative collusion under imperfect price information.” Econometrica 52(1):87–100.
[23] Greiner, Ben. 2004. “The online recruitment system ORSEE 2.0: a guide for the organization of experiments in economics.” Unpublished manuscript.
[24] Gürerk, Özgür. 2013. “Social learning increases the acceptance and the efficiency of punishment institutions in social dilemmas.” Journal of Economic Psychology 34:229–239.
[25] Gürerk, Özgür, Bernd Irlenbusch, and Bettina Rockenbach. 2006. “The competitive advantage of sanctioning institutions.” Science 312(5770):108–111.
[26] Gürerk, Özgür, Bernd Irlenbusch, and Bettina Rockenbach. 2009. “Voting with feet: community choice in social dilemmas.” Unpublished manuscript.
[27] Herrmann, Benedikt, Christian Thöni, and Simon Gächter. 2008. “Antisocial punishment across societies.” Science 319(5868):1362–1367.
[28] Kamei, Kenju, Louis Putterman, and Jean-Robert Tyran. 2011. “State or nature? Formal vs. informal sanctioning in the voluntary provision of public goods.” Unpublished manuscript.
[29] Ledyard, John O. 1995. “Public goods: A survey of experimental research.” Pp. 11–94 in Handbook of Experimental Economics, edited by J. Kagel and A. E. Roth. Princeton, NJ: Princeton University Press.
[30] Markussen, Thomas, Louis Putterman, and Jean-Robert Tyran. 2014. “Self-organization for collective action: an experimental study of voting on sanction regimes.” Review of Economic Studies 81(1):301–324.
[31] Nikiforakis, Nikos. 2008. “Punishment and counter-punishment in public good games: can we really govern ourselves?” Journal of Public Economics 92(1–2):91–112.
[32] Nikiforakis, Nikos. 2014. “Self-governance through altruistic punishment?” Pp. 197–213 in Reward and Punishment in Social Dilemmas, edited by P. A. M. van Lange, B. Rockenbach, and T. Yamagishi. Oxford: Oxford University Press.
[33] Nikiforakis, Nikos, and Dirk Engelmann. 2011. “Altruistic punishment and the threat of feuds.” Journal of Economic Behavior & Organization 78(3):319–332.
[34] Nikiforakis, Nikos, and Hans-Theo Normann. 2008. “A comparative statics analysis of punishment in public-good experiments.” Experimental Economics 11(4):358–369.
[35] North, Douglass C. 1990. Institutions, Institutional Change and Economic Performance. Cambridge: Cambridge University Press.
[36] Ostrom, Elinor, James Walker, and Roy Gardner. 1992. “Covenants with and without a sword: Self-governance is possible.” American Political Science Review 86(2):404–417.
[37] Patel, Amrish, Edward Cartwright, and Mark van Vugt. 2010. “Punishment cannot sustain cooperation in a public good game with free-rider anonymity.” Unpublished manuscript.
[38] Prendergast, Canice. 1999. “The provision of incentives in firms.” Journal of Economic Literature 37(1):7–63.
[39] Przepiorka, Wojtek, and Andreas Diekmann. 2013. “Individual heterogeneity and costly punishment: a volunteer’s dilemma.” Proceedings of the Royal Society B: Biological Sciences 280:20130247.
[40] Raub, Werner, Vincent Buskens, and Rense Corten. 2014. “Social dilemmas and cooperation.” Pp. 597–626 in Handbuch Modellbildung und Simulation in den Sozialwissenschaften, edited by N. Braun and N. J. Saam. Wiesbaden: Springer Fachmedien.
[41] Rockenbach, Bettina, and Manfred Milinski. 2006. “The efficient interaction of indirect reciprocity and costly punishment.” Nature 444(7120):718–723.


[42] Sutter, Matthias, Stefan Haigner, and Martin G. Kocher. 2010. “Choosing the carrot or the stick? Endogenous institutional choice in social dilemma situations.” Review of Economic Studies 77(4):1540–1566.
[43] Van Miltenburg, Nynke, Wojtek Przepiorka, and Vincent Buskens. 2015. “Collective decision rules for implementing punishment in n-person Prisoner’s Dilemmas with noise in the display of contributions.” Unpublished manuscript.
[44] Wu, Jianzhong, and Robert Axelrod. 1995. “How to cope with noise in the iterated Prisoner’s Dilemma.” Journal of Conflict Resolution 39(1):183–189.

Part V: Trust and Trustworthiness

Margit E. Oswald and Corina T. Ulshöfer

Cooperation and Distrust – a Contradiction?

Abstract: Trust is usually considered a prerequisite of cooperation in social dilemmas. Experimental studies show that people cooperate surprisingly often. However, this may happen because almost no risk from cooperation exists, and trust can be shallow. In daily life, trust is normally more evidence-based: that is, sufficient information has to be provided to allow the development of trust. The higher the risk resulting from cooperation, the more necessary such a collection of information becomes. During the phase of information gathering, a state of distrust, rather than a state of trust, may be evolutionarily functional. Experimental studies show that people in a state of distrust (a) do not take the opinion of others automatically as their true position, but rather display extensive attributional considerations, (b) generate ideas opposite to, or incongruent with, those in the message of the distrusted others, and prefer nonroutine strategies, (c) perform better in logical reasoning, and (d) show an increase in cognitive flexibility. We will discuss whether distrust can be seen as a state of mind that enhances mindful processing. Furthermore, we examine whether a state of distrust improves accuracy in detecting fraud and lies, and thus decreases the risk involved in cooperation in the long run.

1 What do experimental social dilemmas have in common with social dilemma situations in daily life?

In social sciences, decision-making and social interactions are often analyzed by using experiments designed as social dilemmas. The so-called “trust games” are one important group of such social dilemmas: two or more players are provided with a certain amount of money, and each of them is asked to give a share of this money, or all of it, to a collective account (Fehr and Fischbacher 2004). The referee multiplies the resulting amount of money with a factor b (1 < b < n). This outcome determines the returns of the game and is apportioned among the players independent of their particular stake. The structure of social dilemmas in this game can be easily identified. Players win the most by investing the least, but only as long as there are other players who invest enough, and therefore act cooperatively. The gain is maximized if all players cooperate. It is assumed that those players who invest all or most of their money trust the other player or players because they expect cooperation (positive or benevolent behavior) without being able to apply control over the respective outcome. Thus, Mayer, Davis, and Schoorman (1995:712) define trust as follows: “Trust is the willingness of a


party to be vulnerable to the actions of another party based on the expectation that the other will perform a particular action important to the trustor, irrespective of the ability to monitor or control that other party.” Trust (or distrust) may be a disposition as well as a certain state of mind. In the first case, the player has the general tendency to expect benevolence (or malevolence) from most other people, not only in trust games but also in other situations (cf. Rotter 1967). In the second case, the trust (or distrust) of the player is not generally given but refers to a specific person or to a specific group and may be induced by more or less extensive experiences or sensations. In their metaanalyses, Balliet and van Lange (2013) show that overall specific trust is more associated with cooperation (r = .58) than with dispositional trust (r = .26). In these trust games, as well as in other games with other dilemma structures, cooperative behavior may increase, and defective, egoistic, behavior may decrease, under certain conditions. More cooperation is shown if players interact more often, instead of engaging in one-shot games, the monetary incentive for cooperative (compared to that for defective) behavior is relatively high, the players are similar in their attitudes, or the players trust each other (Balliet and Van Lange 2013; Dirks and Ferrin 2001; Fischer 2009; Hardin 2006). So what do these experimental games have in common with social interactions in daily life? First, many social interactions in daily life often have the structure of a social dilemma. Just think of the recent donation appeal by Wikipedia, in late 2015, to support this non-profit information website, or the decision whether or not one should report an observed offense to the police. Second, people encountering social dilemmas in daily life may also be previously unknown to each other, and may only interact for a short period of time, as is the case in experimental games. However, many “players” in daily life may not come together with the sole purpose of attending to a social dilemma. Interacting partners in daily life may already have more or less competent knowledge about each other, and/or the person to be trusted may represent a certain social role, such as that of a physician or a supervisor. In this respect, person-specific trust or distrust in daily life is even more rationally justified. On the one hand, the justification can be based on one’s own experiences with the other person. Ideally, this trust is built up slowly and turns from a calculus-based trust to a knowledge-based trust, or even to an identification-based trust (Lewicki and Bunker 1996; Lewicki, Tomlinson, and Gillespie 2006; Rempel, Holmes, and Zanna 1985). On the other hand, it may be a role-based trust. The trust of groups of people, such as judges or physicians, as well as organizations, is transferred to their particular representatives. It is an abstract trust of a function, or a role owner, as well as the system of organization behind him or her which ensures that roleadequate behavior is met and maintained (Kramer 1999; McKnight, Cummings, and Chervany 1998; Oswald 2006; Oswald 2010). This role-based trust can be developed (a) via direct contact with role owners, (b) via information from a third party about the reputation of the person or organization in question, or (c) via information from the media (Ferrin, Dirks, and Shah 2006). Nevertheless, it is necessary to state that in


most social dilemmas in daily life, people refer to knowledge-based trust developed via direct or indirect experiences. In contrast to daily life, there is rarely role-based trust in one-shot trust games. Knowledge-based trust, which may develop through repetitions, is relatively reduced, as possibilities for behavioral feedback are very restricted. In addition, before we ask ourselves if one can speak of trust in these situations at all, we should consider the very low risks at stake in experimental games. In most studies the monetary incentive for a cooperative or defective decision amounts only to a “pocket money” of several Euros, and no real loss is possible if participants’ trust (cooperation) is betrayed by the partner (e.g., Harth and Regner 2016: average earnings 12.23 €, no real loss possible; Brañas-Garza, Espín, and Lenkei 2016: average earnings between 9 and 11 €, no real loss possible). This fact is often criticized, as it has been shown that the magnitude of the monetary incentive influences the decision behavior (Gneezy and Rustichini 2000; Parco, Rapoport, and Stein 2002).
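The payoff logic of the trust games described at the beginning of this section can be made concrete with a small sketch. It assumes that the multiplied collective account is split equally among the n players (one reading of “apportioned among the players independent of their particular stake”); the endowment, the factor b, and the function name are illustrative rather than taken from a particular study.

```python
def payoffs(contributions, endowment=10.0, b=2.0):
    """Linear public goods payoffs: keep what you did not contribute, plus an
    equal share of the multiplied collective account (requires 1 < b < n)."""
    n = len(contributions)
    shared = b * sum(contributions) / n
    return [endowment - c + shared for c in contributions]

# Full cooperation maximizes the group's total, but a lone free-rider earns more.
print(payoffs([10, 10, 10, 10]))   # [20.0, 20.0, 20.0, 20.0]
print(payoffs([0, 10, 10, 10]))    # [25.0, 15.0, 15.0, 15.0]
```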

2 The necessity of well-founded trust varies with the risk from cooperation

The risk arising from cooperative behavior can vary strongly. That risk may be a simple loss of game money, or a loss of real money from a few pence up to millions, or several years in prison (if one plays a prisoner’s dilemma in real life), or even the loss of one’s own life. The latter loss can happen, for example, up in the mountains when a rope team with two climbers gets into a dangerous situation. The higher the risk of a social dilemma, the more secure one has to be that one’s partner can be trusted (Mayer, Davis, and Schoorman 1995). Balliet and Van Lange (2013) show, using a meta-analysis, that trust exhibits a stronger positive association with cooperation during situations involving larger, compared to smaller, amounts of conflict. Thus, trust matters more when there is a serious conflict of interest.

Alternatively, in case of high risks, one can establish control or protective measures, such as the so-called “worst-case protection” or “hedges”, which often exist as supplementary agreements (with corresponding compensations) in case of missing cooperation (Meyerson, Weick, and Kramer 1996). A combination of trust and some control in cases of high risk may be meaningful for the further development of trust (Das and Teng 1998; Gangl, Hofmann, and Kirchler 2015). However, if the decision to cooperate is made by external persons or authorities, the structure of the decision changes. This is no longer a social dilemma at all (cf. Messick and Brewer 1983).

If the risks are low, people may choose cooperative behavior because they understand it as a kind of social politeness. If they choose to defect, they will breach this politeness norm and have to expect social disapproval. In such cases it may be enough to see the other person, or to hear their voice, to build up a feeling of trustworthiness.


This automatically-developed feeling needs to be differentiated from the more rational process of assessing the likelihood of becoming the victim of exploitation (Neumann 2014). Dunning et al. (2014) prove, in their studies, that this feeling of trustworthiness towards strangers is due to a norm mandating that they show respect for each other even if they are not convinced of their goodwill. If, as in such a case, no knowledge about the trustworthiness of the specific partner is required, Lewicki and Bunker (1996) talk of a calculus-based trust. As soon as the risks from cooperation increase, however, the situation changes significantly, as mentioned above. In romantic relationships, a kind of testing of mutual reliability and integrity takes place after the first romantic feelings are felt. A jealous watch is kept over appointments or promises. The social behavior of the partner towards friends and relatives is also tested, so as to build up knowledge about his or her benevolence and integrity (Rempel, Holmes, and Zanna 1985). Whenever the risks in the relationship increase, for example, in case of the decision to have a child, new critical partner tests may take place. An emancipated woman will thus try to ensure that the partner takes responsibility for parenting, even if that means that he has to cut down his career intentions to a certain degree. Trust development follows another path in case of role-based trust, as previously mentioned. If the possibility of behavior testing of the role owner is missing, as for example with a new physician, trust relies on the general reputation of the social role or organization (Diekmann and Wyder 2002; Li and Jia 2008). This reputation is normally built using information from third parties, one’s own experiences with representatives of this social role or organization, and/or by reports and statistics in the media. This form of trust may also be accompanied by more or less extensive tests and inquiries. These trust tests are tainted with uncertainty. A trust decision is inevitably a prediction based on the strength of past experience, and this prediction may be erroneous. However, it is important for further argumentation to know the fact that people facing an increasing risk from cooperative behavior will make a point of knowledgebased trust, built up through their own experiences, or the experiences of third parties. Furthermore, where this justified trust does not exist, further control mechanisms will be demanded as a substitute for trust. Finally, the question remains as to the quality of the trust test: how correct the prediction is, and how valid the resulting trust decisions have been. We thereby reach the central topic of this essay.

3 Cognitive functions of distrust

With increasing risk, cooperative behavior is probably based on knowledge-based trust in the specific person, or the role owner. The more critical the test, the more profound should be the knowledge. Interestingly, the motivation for such a crucial


test is often a state of distrust. Let us analyze the research which corroborates this assumption. The research about cognitive functions of distrust has vastly increased over the last twenty years and shows that a state of distrust is connected with (a) reasoning about alternative explanations, (b) the use of non-routine strategies, (c) cognitive flexibility and (d) the overcoming of a positive test strategy. We will report the main results of this research, because an overview of all studies may provide a new insight into the importance of distrust as a state of mind.

3.1 In a state of distrust people think about possible alternative explanations

People in a normal environment usually trust the statements of others and consider them as true. This well-developed “truth bias” may be considered as a heuristic that does not only simplify the cognitive processes during social interactions, but also makes the communication easier and contributes to the maintenance of social relations (Grice 1975; Kraut and Higgings 1984; Stiff, Kim, and Ramesh 1992). In a state of distrust, however, people have doubts about whether statements can be taken at face value. Distrust can be considered as the perception that the environment is not normal and functions as a warning signal – things may not be as they appear (Schul, Mayo, and Burnstein 2004). Thus, distrust indicates danger, that is, the possibility of being cheated by the interaction partner who may try to get ahead at the expense of others. While trust means renouncing social control, totally or at least partially (Mayer, Davis, and Schoorman 1995), in a state of distrust it appears necessary to check the behavior of the distrusted person. Therefore, one of the most relevant functions of distrust is that people are no longer guided by first impressions or the seemingly obvious reasons for another’s behavior; instead they take a closer look at the circumstances.

Fein and colleagues (Fein, McCloskey, and Tomlinson 1997; Fein 1996; Hilton, Fein, and Miller 1993) were able to show that the fundamental tendency to explain the behavior of others as a definite expression of their attitudes or disposition, the so-called correspondence bias (Jones and Harris 1967), decreases in a state of distrust or suspicion. Students’ political statements were questioned much more regarding external influences, and were less attributed to their own political attitude, if the analysts were in a state of distrust rather than in a neutral state of mind. Furthermore, according to Schul, Burnstein, and Bardi (1996), as well as Schul, Mayo, and Burnstein (2004), distrust can be interpreted as the tendency to be ready to resist the persuasive intent of others and to think about possible alternative explanations. If people suspect the validity of messages, as argued by the authors, they encode messages as if they were true and, at the same time, as if their opposite was true. Thus, people tend to search for non-obvious alternative interpretations of the given informa-


tion, although such attempts might be cognitively taxing (Schul, Burnstein, and Bardi 1996). It can, however, be assumed that the opposite of any given information should be more quickly available in situations of distrust than of trust, and incongruent or so-called “counter-scenarios” should be more easily activated. This latter assumption was tested in an extraordinary study by Schul, Mayo, and Burnstein (2004). When subjects were primed with an adjective (e.g., “difficult”) superimposed on a face presented on the computer screen, they recognized synonyms (e.g., “complicated”) faster than antonyms (e.g., “easy”) if the face was trustworthy. However, antonyms were recognized faster than synonyms if the face was untrustworthy. According to Schul, Mayo, and Burnstein (2004), this interaction effect corroborates the assumption that an activation of incongruent associations is a generalized pattern of response in a state of distrust. We could not replicate these results in four experiments (Ulshöfer, Ruffieux, and Oswald 2013). While the study by Schul, Mayo, and Burnstein (2004) was conducted in Israel, ours were conducted in Switzerland, and it may be possible that the effects of this specific paradigm are culture-specific. However, there are diverse studies with different experimental paradigms which show that people in a state of distrust are stimulated to leave the standard path of association and thought, as we will demonstrate below. Before we analyze these further studies, we want to explain a bit more how distrust and trust are manipulated in these experiments. This explanation will probably facilitate readers’ understanding, but has also the benefit that the description of the following studies with either the same or diverse manipulations of distrust and trust remains more economical and manageable. Distrust is often manipulated independently of the upcoming task that subjects have to fulfill. In Schul, Mayo, and Burnstein (2004), as well as in other studies, faces with trustworthy or untrustworthy expressions have been used to elicit a state of trust or distrust in the subjects (experiment 3 in Schul, Mayo, and Burnstein 2008). In other studies, tasks which lead to distrust have to be solved. For example, one has to judge if a questionnaire was answered by women, or by men who only pretend to be women. Those questionnaires contain, for example, descriptions of what is in the purse of the person answering, or how one might exchange a flat tire (experiment 1 in Schul, Mayo, and Burnstein 2008). Another priming method is used by Mayer and Mussweiler (2011) in their second experiment. Subjects have to set up correct sentences from several words. They receive so-called “scrambled sentences” in which several words are presented in a randomized order, with one word often not belonging to the sentence. In the condition of distrust, a majority of the resulting sentences had distrustful content, for example, “asked a fraudulent question”. In other experiments the priming of distrust occurs unconsciously. Subjects have to fulfill a lexical decision task (LDT) where they have to decide whether a letter combination is a word (e.g., “jacket”) or a non-word (e.g., “bealk”). Before these words or non-words are shown on the screen, the verbs “ver-


trauen [trust]” or “misstrauen [distrust]” are shown for 13 ms. (experiments 1 and 3 in Mayer and Mussweiler 2011). A manipulation check was not always executed in these experiments, and the effects of a distrust manipulation were not always compared to a trust manipulation, only to a neutral control group. Nonetheless, several studies showed a consistent picture.

3.2 Distrusting individuals prefer non-routine strategies in unusual environments As already mentioned, Schul, Burnstein, and Bardi (1996) and Schul, Mayo, and Burnstein (2004) assume that distrustful people are on guard, believe things may not be as they appear, activate incongruent associations, and consider alternative explanations of the obvious reasons of another’s behavior. Elaborating further on these assumptions, the authors now postulate, from a broader perspective, that people who distrust will particularly avoid those strategies to solve problems that are frequently or routinely taken. Schul, Mayo, and Burnstein (2008:1294) postulate that trust and distrust are generally associated with different types of thought processes: under conditions of trust, people succeed more in making inferences about typical environments, whereas those who distrust perform better in environments that are unusual, unexpected, or non-routine. In three experiments the authors analyzed how subjects in a state of distrust solve problems with routine solutions (for example, to predict changes of a variable Y which increases when two predictors X1 and X2 increase) or solve problems which have solutions that deviate from the routine (for example, to predict changes of a variable Y which increases with predictor X1 , but decreases with predictor X2 ). Schul, Mayo, and Burnstein (2008) confirmed their hypothesis and commented on their findings in this way: distrust “sensitizes individuals to departures from the expected and increases the likelihood that they will search for irregularities and nonroutine contingencies” (Schul, Mayo, and Burnstein 2008:1300).

3.3 A state of distrust may enhance cognitive flexibility

On the basis of the assumptions by Schul, Mayo, and Burnstein (2004; 2008), Mayer and Mussweiler (2011) reasoned that distrust is also beneficial for cognitive flexibility, which is one of the main components of creativity: "[...] people in a distrust-mind elaborate more, and they do so in a specific way: They seem to entertain multiple interpretations of potentially valid information rather than to elaborate intensely on that information within only one interpretation frame. [...] The latter is consistent with category diversity and is thus indicative of cognitive flexibility" (Mayer and Mussweiler 2011:1263). Given that numerous studies show that creativity in groups implies a context


of mutual trust (Bechtoldt et al. 2010; Ekvall and Ryhammar 1999), they reduce their postulate to such situations where people act alone, and therefore are in a private context. In their first experiment, the authors showed that subliminally-primed distrust (vs. trust) had detrimental effects on creativity (measured by an idea-generation-task) presumed to be public. However, an opposite tendency emerged as soon as people believed themselves to be in a private environment. Further experiments support the assumption that distrust enhances creativity, whereas cognitive flexibility mediates the interrelation. The last two of the four experiments focus mainly on cognitive flexibility. If cognitive flexibility increases, the question arises as to whether people perceive even less typical representative members of semantic categories (e.g., vehicle, furniture or vegetable) still as members of their category, or rather judge that they do not fit their category after all. The higher the so-called “category inclusiveness”, the more participants are willing to include even less representative members in their category. In fact, Mayer and Mussweiler (2011) show that subliminally primed distrust induces an increased category inclusiveness and thus cognitive flexibility. In Experiment 3, for example, participants under conditions of distrust included less typical exemplars (e.g., “camel”) more in their respective categories (e.g., “vehicle”) than participants under conditions of trust or in a neutral state of mind.

3.4 A state of distrust helps to overcome a Positive Test Heuristic Our thinking and perception is normally based on hypotheses, and we therefore tend to test those phenomena which are predicted by our assumptions. This tendency to search for predicted phenomena, and not for those which falsify the hypothesis, is called a Positive Test Heuristic. Under certain premises, this heuristic leads automatically to an affirmation of the tested hypothesis, and thus even to an illusory confirmation (Klayman and Ha 1987; Oswald and Grosjean 2004). Wason (1960; 1968) has analyzed the strategies of his subjects under such preconditions. In the paradigm of the ‘rule discovery task’, subjects were shown a sequence of numbers (e.g., 2–4–6) and had to find the rule behind this sequence (Wason 1960). Most subjects thought of the rule as ‘a sequence of even numbers’, and named triples as, for example, 6–8–12 or 20–22–24. They applied the Positive Test Heuristic. However, the correct rule was more general than the assumed one, as it was ‘any series of increasing numbers’. They always received a positive feedback regarding the test of their hypothesis, even though it was wrong. Only a Negative Test Heuristic would lead them to the correct rule: they have to name a triple which is contrary to their own hypothesis. Mayo, Alfasi, and Schwarz (2014) tested whether a state of distrust helps subjects to avoid the Positive Test Heuristic. The basis of this study is the assumption that a person in a state of distrust focuses on how things may be different from the hypothesis they have in mind. The researchers compared four groups of subjects, who had different degrees of dispositions to trust, or were manipulated with trust and distrust. They confirmed


their hypothesis. Subjects in a state of distrust, or with a disposition to distrust, used the Negative Test Heuristic more often than subjects in a state of trust or with a disposition to trust. In contrast to Mayer and Mussweiler (2011), they demonstrated that the initial idea-generation stage is not the only route to the higher creativity observed (Mayo, Alfasi, and Schwarz 2014). In the second paradigm, the 'selection task', subjects had to test the logical rule "if p then q" (Wason 1968). A correct test requires that subjects test not only for p but also for non-q, which normally only 5–10 % of subjects do. Gigerenzer and Hug (1992) found that subjects did better if there was a possibility that cheating had taken place and they were able to uncover it. If the rule "if p then q" is introduced as a social rule, such as "If one sleeps in a mountain shelter, then one should bring wood for the keeper of the shelter", subjects tested non-q more often: they decided to investigate whether those persons who did not carry wood up to the shelter would sleep there or not. Testing non-q under this condition is much easier than under an abstract rule such as "If there is a vowel on the front side, there is a number on the back side."
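The logic of the 2–4–6 rule discovery task described above is easy to make concrete. The following minimal sketch (in Python) takes the hypothesized rule ("increasing even numbers") and the true rule ("any increasing sequence") from the description of Wason (1960) above; the particular triples and function names are our own illustrative choices, not part of the original studies.

```python
# Why the Positive Test Heuristic cannot falsify a too-narrow hypothesis in
# Wason's 2-4-6 task: the true rule (unknown to the subject) is broader than
# the hypothesized one, so hypothesis-conforming triples always get a "yes".

def true_rule(triple):
    a, b, c = triple
    return a < b < c                      # any increasing sequence

def hypothesis(triple):
    a, b, c = triple
    return a < b < c and all(x % 2 == 0 for x in triple)  # increasing even numbers

positive_tests = [(6, 8, 12), (20, 22, 24)]   # conform to the hypothesis
negative_tests = [(1, 2, 3), (3, 5, 7)]       # violate the hypothesis

for triple in positive_tests:
    # Feedback is always "yes"; the hypothesis is never challenged.
    print(triple, hypothesis(triple), true_rule(triple))   # True True

for triple in negative_tests:
    # Feedback is "yes" although the hypothesis predicts "no":
    # only these negative tests reveal that the assumed rule is too narrow.
    print(triple, hypothesis(triple), true_rule(triple))   # False True
```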

4 Does distrust improve the detection of cheating and lying? The cognitive consequences of distrust are seen as a result of evolutionary adaptation processes which have developed out of constant danger of being cheated by others (Mayer and Mussweiler 2011; Schul, Mayo, and Burnstein 2004; 2008). People can protect themselves, if they consider what might happen if the opposite of what the other says is true (Schul, Mayo, and Burnstein 2004). Moreover, people under conditions of distrust are likely to avoid routine strategies, because these strategies are easily anticipated by whoever may be seeking to deceive them (Schul, Mayo, and Burnstein 2008). Mayer and Mussweiler (2011:1263) also emphasized the adaptive advantage of preparing oneself through elaborate information processing, and with counter-scenarios for possible cheating. They compared this adaptive strategy to the less effective strategy of avoiding every situation which may be associated with a potential fraud. So we summarize that a state of distrust leads to elaborate information processing: alerting subjects not to take information at face value, but rather increasing the likelihood that they will consider that things might be different from what meets the eye (Schul, Mayo, and Burnstein 2004); enhancing unusual non-routine strategies in problem solving (Schul, Mayo, and Burnstein 2008); activating cognitive flexibility (Mayer and Mussweiler 2011); and improving critical hypotheses testing (Mayo, Alfasi, and Schwarz 2014). It therefore seems reasonable to assume that distrust enhances a subject’s ability to uncover the ‘true’ intentions of liars and cheaters. One would assume, for example, that people in a state of distrust will pay more attention to the con-


tents of the statements of others, analyze possible ulterior motives more, and therefore uncover more logical inconsistencies (see Appendix for an example). However, one also has to question the assumed adaptiveness of these strategies. Indeed, to our surprise, even though there are numerous studies about the adaptive consequences of distrust, until now there has been no study to test whether distrust is advantageous for the accuracy of detecting fraud or lies. Recently, Reinhard and Schwarz (2012) have shown that the accuracy of lie detection can be improved if people elaborate the content of a message instead of relying on non-verbal behavior. It improves the accuracy of lie detection because content cues, for example, the number of reported details, or logical inconsistency, are generally more valid than non-verbal cues, such as gaze aversion (Vrij 2008). However, the elaboration of information processing in this study was due to a manipulated negative mood rather than to a state of distrust. Additionally, in a pretest by Ruffieux and Oswald (2015), it was examined whether subliminal primed distrust (vs. control condition) increased the accuracy of lie detection, given that judgments should be especially driven by logical inconsistencies of the arguments. Indeed, the accuracy of judgments in a state of distrust (76.7 %) was better than for participants in a neutral state of mind (63.3 %). Although this difference of the overall accuracy was not statistically significant, it could be verified that distrust improves the detection of a lie, but not at the expense of detecting the truth. Moreover, the additionally measured self-report about cues used during the veracity judgment showed that participants under conditions of distrust predominantly reported having used contradiction or inconsistency as a cue (53.3 %), whereas those in a neutral state of mind reported this less often (40 %). However, the hypothesis that, in a state of distrust, people are better at uncovering cheating and fraud is not necessarily a truism, nor so easy to confirm. One has to consider that people who are afraid of fraud and cheating are not the only ones who have developed this effective cognitive strategy: the cheaters and liars have done so too (Schul, Mayo, and Burnstein 2004). Furthermore, the state of distrust has to be differentiated from the disposition to distrust. The latter can even be counterproductive, because a general tendency to distrust may lead to a weak tendency to show trusting behavior and to a strategy of avoiding situations which may be associated with fraud. There will therefore be few possibilities to learn from experience, and generally distrusting people run the risk of wrongfully accusing others (Yamagishi, Kikuchi, and Kosugi 1999). Finally, how confident people are in their distrust may be crucial. In the cited studies, merely a state of distrust was manipulated, rather than a firm conviction that a person has malicious intentions. However, a person needs to be uncertain as to whether they should trust or not, or they may not critically examine the other party and the environment and may thus easily fall for a Positive Test Heuristic. To sum up this section: from an evolutionary point of view, a state of distrust should especially serve the function of detecting cheating or lying sooner, and more


accurately, than a neutral state or a state of trust. Unfortunately, this conclusion has not yet been sufficiently investigated.
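To illustrate why a difference such as the 76.7 % versus 63.3 % accuracy reported for the pretest above can fail to reach conventional significance, one can run a simple two-proportion test. The sketch below is illustrative only: the chapter reports no group sizes or test statistics at this point, and the assumed n of 30 per condition (under which 23/30 and 19/30 reproduce the reported percentages) is our assumption, not a figure from Ruffieux and Oswald (2015).

```python
# Two-proportion z-test, illustrative only. The per-group n = 30 is an
# assumption (23/30 = 76.7 %, 19/30 = 63.3 %), not reported in the chapter.
from math import sqrt, erf

def two_proportion_z(hits1, n1, hits2, n2):
    p1, p2 = hits1 / n1, hits2 / n2
    pooled = (hits1 + hits2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided, normal approx.
    return z, p_value

z, p = two_proportion_z(23, 30, 19, 30)
print(f"z = {z:.2f}, p = {p:.2f}")  # about z = 1.13, p = 0.26: not significant
```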

5 Distrust and significant risks arising from cooperation As long as cooperation is examined in social dilemmas, where the risk from cooperation is little more than pocket money, we will not learn when (and under which circumstances) people reliably trust their interaction partner. Trust in these experimental situations may not be much more than easily-shaken confidence in politeness norms. As soon as there are significant risks from cooperation, another level of trust needs to exist. From a theoretical point of view, it is worth mentioning that more research about the process of trust development has been conducted on experimental dilemmas (Ismayilov and Potters 2016; Neumann 2014; Przepiorka and Diekmann 2012). However, these apply to general situations, with only a low cooperation risk. To know more about cooperation in daily life, we have to learn something about the development of trust when the risk arising from cooperation varies. Without direct or indirect experience regarding the benevolence and integrity of the other partner, one may not expect cooperative behavior in real conflict situations, or cooperation may end very abruptly. What role does distrust play in the development of trust, when the risk arising from cooperation is significant? Even if it may be paradoxical at first glance, distrust seems to be a functional state of mind for the development of trust. To reduce the risk from cooperation in a dilemma, one has to make the correct decision as to whether the other party is to be trusted. Nothing could be worse than betting on the wrong horse and judging trustworthiness wrongly. A correct decision requires elaborate and critical information processing about the benevolence and the integrity of one’s partner. Moreover, we now know that this occurs more often under circumstances of distrust than it does under circumstances of trust.


Appendix

Fig. 1: Andreas Diekmann and Margit E. Oswald in the same place at the same time.
Notes: Andreas Diekmann and Margit E. Oswald are in the same place at the same time in the mountains, on the First (8357 ft.) in the Kander Valley, Switzerland. But can we trust this assumption? Is the picture possibly fake? What kind of counter-scenarios may be possible? Questions like these occur to people who are in a state of distrust, and they will detect the striking inconsistencies in the shadows.

Bibliography

Balliet, Daniel, and Paul A. M. Van Lange. 2013. "Trust, conflict, and cooperation: a meta-analysis." Psychological Bulletin 139(5):1090–1112.
Bechtoldt, Myriam N., Carsten K. W. De Dreu, Bernard A. Nijstad, and Hoon-Seok Choi. 2010. "Motivated information processing, social tuning, and group creativity." Journal of Personality and Social Psychology 99(4):622–637.
Brañas-Garza, Pablo, Antonio M. Espín, and Balint Lenkei. 2016. "BMI is not related to altruism, fairness, trust or reciprocity: Experimental evidence from the field and the lab." Physiology & Behavior 156:79–93. Retrieved May 30, 2016 (http://www.ncbi.nlm.nih.gov/pubmed/26780149).
Das, T. K., and Bing-Sheng Teng. 1998. "Between trust and control: developing confidence in partner cooperation in alliances." Academy of Management Review 23(3):491–512.
Diekmann, Andreas, and David Wyder. 2002. "Vertrauen und Reputationseffekte bei Internet-Auktionen." Kölner Zeitschrift für Soziologie und Sozialpsychologie 54(4):674–693.
Dirks, Kurt T., and Donald L. Ferrin. 2001. "The Role of Trust in Organizational Settings." Organization Science 12(4):450–467.
Dunning, David, Joanna E. Anderson, Thomas Schlösser, and Daniel Ehlebracht. 2014. "Trust at Zero Acquaintance: More a Matter of Respect Than Expectation of Reward." Journal of Personality and Social Psychology 107(1):122–141.
Ekvall, Göran, and Lars Ryhammar. 1999. "The creative climate: Its determinants and effects at a Swedish university." Creativity Research Journal 12(4):303–310.
Fehr, Ernst, and Urs Fischbacher. 2004. "Social norms and human cooperation." Trends in Cognitive Sciences 8(4):185–190.
Fein, Steven. 1996. "Effects of suspicion on attributional thinking and the correspondence bias." Journal of Personality and Social Psychology 70(6):1164–1184.
Fein, Steven, Allison L. McCloskey, and Thomas M. Tomlinson. 1997. "Can the jury disregard that information? The use of suspicion to reduce the prejudicial effects of pretrial publicity and inadmissible testimony." Personality and Social Psychology Bulletin 23(11):1215–1226.
Ferrin, Donald L., Kurt T. Dirks, and Pri P. Shah. 2006. "Direct and indirect effects of third-party relationships on interpersonal trust." The Journal of Applied Psychology 91(4):870–883.
Fischer, Ilan. 2009. "Friend or foe: subjective expected relative similarity as a determinant of cooperation." Journal of Experimental Psychology: General 138(3):341–350.
Gangl, Katharina, Eva Hofmann, and Erich Kirchler. 2015. "Tax authorities' interaction with taxpayers: A conception of compliance in social dilemmas by power and trust." New Ideas in Psychology 37:13–23.
Gigerenzer, Gerd, and Karl Hug. 1992. "Domain-specific reasoning: Social contracts, cheating, and perspective change." Cognition 43(2):127–171.
Gneezy, Uri, and Aldo Rustichini. 2000. "Pay enough or don't pay at all." Quarterly Journal of Economics 115(3):791–810.
Grice, Herbert P. 1975. "Logic and conversation." Pp. 41–58 in Speech Acts, edited by P. Cole, and J. L. Morgan. New York: Academic Press.
Hardin, Russell. 2006. Trust. Cambridge, MA: Polity Press.
Harth, Nicole S., and Tobias Regner. 2016. "The spiral of distrust: (Non-)cooperation in a repeated trust game is predicted by anger and individual differences in negative reciprocity orientation." International Journal of Psychology, February. Retrieved June 13, 2016 (http://www.ncbi.nlm.nih.gov/pubmed/26865362).
Hilton, James L., Steven Fein, and Dale T. Miller. 1993. "Suspicion and dispositional inference." Personality & Social Psychology Bulletin 19(5):501–512.
Ismayilov, Huseyn, and Jan Potters. 2016. "Why do promises affect trustworthiness, or do they?" Experimental Economics 19(2):382–393.
Jones, Edward E., and Victor A. Harris. 1967. "The Attribution of Attitudes." Journal of Experimental Social Psychology 3(1):1–24.
Klayman, Joshua, and Young-Won Ha. 1987. "Confirmation, Discontinuation, and Information in Hypothesis Testing." Psychological Review 94(2):211–228.
Kramer, Roderick M. 1999. "Trust and distrust in organizations: Emerging Perspectives, Enduring Questions." Annual Review of Psychology 50:569–598.
Kraut, Robert E., and Edward T. Higgings. 1984. "Communication and social cognition." Pp. 87–127 in Handbook of social cognition, edited by R. S. Jr. Wyer, and T. K. Srull. Hillsdale, NJ: Erlbaum.
Lewicki, Roy J., and Barbara B. Bunker. 1996. "Developing and maintaining trust in work relationships." Pp. 114–139 in Trust in Organizations: Frontiers of Theory and Research, edited by R. M. Kramer, and T. Tyler. Thousand Oaks, CA: Sage.
Lewicki, Roy J., Edward C. Tomlinson, and Nicole Gillespie. 2006. "Models of Interpersonal Trust Development: Theoretical Approaches, Empirical Evidence, and Future Directions." Journal of Management 32(6):991–1022.
Li, Chaoping, and Liangding Jia. 2008. "How do I trust thee? The employee-organization relationship, supervisory support, and middle manager trust in the organization." Human Resource Management 47(1):111–132.
Mayer, Jennifer, and Thomas Mussweiler. 2011. "Suspicious spirits, flexible minds: When distrust enhances creativity." Journal of Personality and Social Psychology 101(6):1262–1277.
Mayer, Roger C., James H. Davis, and F. David Schoorman. 1995. "An integrative model of organizational trust." The Academy of Management Review 20(3):709–734.
Mayo, Ruth, Dana Alfasi, and Norbert Schwarz. 2014. "Distrust and the positive test heuristic: dispositional and situated social distrust improves performance on the Wason rule discovery task." Journal of Experimental Psychology: General 143(3):985–990.
McKnight, D. Harrison, Larry L. Cummings, and Norman L. Chervany. 1998. "Initial trust formation in new relationships." Academy of Management Review 23(3):473–490.
Messick, David, and Marilynn B. Brewer. 1983. "Solving social dilemmas: A review." Pp. 11–44 in Review of personality and social psychology, Vol. 4, edited by L. Wheeler, and P. Shaver. Beverly Hills, CA: Sage.
Meyerson, Debra, Karl E. Weick, and Roderick M. Kramer. 1996. "Swift trust and temporary groups." Pp. 166–195 in Trust in organizations: Frontiers of theory and research, edited by R. Kramer and T. Tyler. Thousand Oaks, CA: Sage.
Neumann, Robert. 2014. "Understanding trustworthiness: using response latencies from CATI surveys to learn about the 'crucial' variable in trust research." Quality & Quantity 50(1):43–64.
Oswald, Margit E., and Stefan Grosjean. 2004. "Confirmation bias." Pp. 79–96 in Cognitive Illusions: A Handbook on Fallacies and Biases in Thinking, Judgement and Memory, edited by R. F. Pohl. Hove: Psychology Press.
Oswald, Margit E. 2006. "Vertrauen in Personen und Organisationen." Pp. 710–716 in Handbuch der Psychologie. Vol. 3, Sozialpsychologie und Kommunikationspsychologie, edited by H. W. Bierhoff, and D. Frey. Göttingen: Hogrefe.
Oswald, Margit E. 2010. "Vertrauen in Organisationen." Pp. 63–85 in Vertrauensforschung: State of the art, edited by M. K. W. Schweer. Frankfurt: Peter Lang.
Parco, James E., Amnon Rapoport, and William E. Stein. 2002. "Effects of financial incentives on the breakdown of mutual trust." Psychological Science 13(3):292–297.
Przepiorka, Wojtek, and Andreas Diekmann. 2012. "Temporal Embeddedness and Signals of Trustworthiness: Experimental Tests of a Game Theoretic Model in the United Kingdom, Russia, and Switzerland." European Sociological Review 29(5):1010–1023.
Reinhard, Marc-André, and Norbert Schwarz. 2012. "The influence of affective states on the process of lie detection." Journal of Experimental Psychology: Applied 18(4):377–389.
Rempel, John K., John G. Holmes, and Mark P. Zanna. 1985. "Trust in close relationships." Journal of Personality and Social Psychology 49(1):95–112.
Rotter, Julian. 1967. "A new scale for the measurement of interpersonal trust." Journal of Personality 35(4):651–665.
Ruffieux, Nicole, and Margit E. Oswald. 2015. "Distrust facilitates analytical reasoning." University of Bern. Unpublished manuscript. Retrieved October 21, 2016 (http://www.soz.psy.unibe.ch/unibe/portal/fak_humanwis/philhum_institute/inst_psych/psy_soz/content/e48660/e48674/e69660/e69692/files494908/ruffieux_oswald_0802_ger.pdf).
Schul, Yacoov, Eugene Burnstein, and Anat Bardi. 1996. "Dealing with deceptions that are difficult to detect: Encoding and judgment as a function of preparing to receive invalid information." Journal of Experimental Social Psychology 253(32):228–253.
Schul, Yacoov, Ruth Mayo, and Eugene Burnstein. 2004. "Encoding under trust and distrust: The spontaneous activation of incongruent cognitions." Journal of Personality and Social Psychology 86(5):668–679.
Schul, Yaccov, Ruth Mayo, and Eugene Burnstein. 2008. "The value of distrust." Journal of Experimental Social Psychology 44(5):1293–1302.
Stiff, James B., H. J. Kim, and C. N. Ramesh. 1992. "Truth biases and aroused suspicion in relational deception." Communication Research 19(3):326–345.
Ulshöfer, Corina T., Nicole Ruffieux, and Margit E. Oswald. 2013. "Thinking the opposite under distrust: Do untrustworthy faces facilitate the recognition of antonyms?" University of Bern. Unpublished manuscript. Retrieved October 21, 2016 (http://www.soz.psy.unibe.ch/unibe/portal/fak_humanwis/philhum_institute/inst_psych/psy_soz/content/e48660/e48674/e69660/e69692/files494909/ulshfer_ruffieux_oswald_2013_211016_ger.pdf).
Vrij, Aldert. 2008. Detecting lies and deceit: Pitfalls and opportunities. Chichester: Wiley-Interscience.
Wason, Peter C. 1960. "On the failure to eliminate hypotheses in a conceptual task." Quarterly Journal of Experimental Psychology 12(3):129–140.
Wason, Peter C. 1968. "Reasoning about a rule." Quarterly Journal of Experimental Psychology 20(3):273–281.
Yamagishi, Toshio, Masako Kikuchi, and Motoko Kosugi. 1999. "Trust, gullibility, and social intelligence." Asian Journal of Social Psychology 2(1):145–161.

Wojtek Przepiorka and Joël Berger

Signaling Theory Evolving: Signals and Signs of Trustworthiness in Social Exchange

Abstract: Signaling theory is concerned with situations of strategic interdependence in which one actor (the sender) aims at persuading another actor (the receiver) of a fact the receiver does not know or is uncertain about. The unobserved fact can be a quality of the sender the receiver would like to know and act upon. Signaling theory has been used to explain individuals' investments in higher education, advertisement, cultural consumption, aggressive behavior, and decision-making in social dilemmas. In this chapter, we give an overview of how signaling theory can be used to explain trust and trustworthiness in social exchange. After restating the core elements of the theory, we discuss conceptual extensions to the basic framework which have proved useful in explaining trust and trustworthiness in social exchange. In particular, we show how distinguishing between signaling costs and benefits, signals and signs, and the production and display of signals and signs can make signaling theory more broadly applicable in sociological scholarship. We illustrate these conceptual extensions with empirical evidence from laboratory experiments. The chapter concludes with an outlook on future research.

1 Introduction

Most, if not all, human social interaction takes place under conditions of uncertainty, that is, in situations where one, several, or all parties are not fully informed about their interaction partners' true intentions, preferences, and/or constraints. Even a social encounter as simple as two pedestrians approaching each other on the sidewalk involves uncertainty, sometimes causing two strangers to engage in a brief "dance" before they can go on their way. If, as in this example, actors' interests overlap (i.e., if X wishes Y to pass on the left, so does Y), communication conveying actors' preferences and intentions can reduce uncertainty to an extent that benefits all parties (e.g., X prevents a collision by pointing in the direction he or she wants to go). However, in a significant proportion of human social interaction, the interacting parties' interests diverge or are even opposed. In such situations mere statements and gestures may not reduce uncertainty, as making one's counterpart believe one thing to withhold the truth about another can be highly beneficial from an individual perspective. For example, in situations of conflict, when actors fight over a scarce resource, making adversaries believe

Note: We would like to thank Diego Gambetta and Ben Jann for helpful comments and suggestions. https://doi.org/10.1515/9783110472974-018


that one will endure and win a fight may lead them to recognize defeat without challenge. In situations where trust is at stake, when actors must trust one another to gain but suffer a loss if their trust is abused, making one’s partner believe one is trustworthy is in one’s best interest, whether honest or not. Statements such as “don’t bother, I’m stronger than you” or “send me your money, I’m trustworthy” will convince the gullible, but no one with their head firmly in place. If binding contracts cannot be made and/or fulfilled, is there a way for communication to solve these social dilemmas? Signaling theory, put forward independently by biologists and economists, addresses this question. In biology, the literature on the evolution of animal communication has focused on how a system of “honest” signals can be sustained once it has evolved (Searcy and Nowicki 2005). With the acknowledgement of the individual-selectionist approach in evolutionary theory, former arguments that comprehended signaling as a coordination device in cooperative interactions were discarded. The main argument brought forward by Dawkins and Krebs (1978) was that deceptive signals benefit individuals not endowed with the qualities that the receiver anticipates based on these signals. This manipulative interpretation of signaling, however, leads to the conclusion that receivers will eventually ignore these signals, as they do not entail any benefits for them. Zahavi (1975) offered a way out of the signaling paradox by assuming that signals are costly and entail a handicap that only individuals with superior qualities could sustain. The tail of a peacock became the prime example of the so-called handicap principle (Zahavi and Zahavi 1997). However, not until Grafen (1990) published a formal analysis, showing that costly signals can indeed be evolutionarily stable, did the handicap principle gain wider recognition among biologists. In economics, signaling theory was most prominently introduced by Spence (1974), who suggested that educational attainment operates as a signal in an employer-employee interaction. The assumption inherent to Spence’s model is that signaling costs are negatively correlated with productivity. Actors with higher productivity are assumed to have attained a higher degree of education at lower costs in terms of time and effort. Thus, diplomas and degree certificates reduce an employer’s uncertainty regarding a potential employee’s productivity. At about the same time, Nelson (1974) addressed the question of why producers of experience goods (goods the quality of which cannot be observed in advance) advertise their products if such advertising cannot add valuable information prior to purchase. He argued that advertisement increases consumers’ confidence in the product’s quality, because only consumers’ repeated purchases would compensate the producer for the initial expenses on advertisement. Inspired by contributions in economics and game theory (Cho and Kreps 1987; Frank 1988; Camerer 2003), rational choice sociologists’ discussion of signaling theory relates to the research on social dilemmas in general and the research on trust dilemmas in particular (Voss 1998; Bacharach and Gambetta 2001; Raub 2004; Przepiorka and Diekmann 2013). Signaling theory offers a reliable way to overcome the trust


dilemma by answering the following two questions (see also Bozoyan 2015): (1) How can an actor decide whether another actor can be trusted? (2) How can an actor establish his or her trustworthiness? In what follows, we review the core elements of the theory by applying it to the trust dilemma in social exchange. We then discuss conceptual extensions to the basic framework which have proved useful in explaining trust and trustworthiness in social exchange. Most of these conceptual extensions have been developed elsewhere; this chapter puts them together and, where necessary, states more precisely the conditions under which they apply.

2 Trust and trustworthiness in social exchange Trust dilemmas arise in sequential exchange between a truster (first moving party) and a trustee (second moving party) if the following five conditions are met: (1) the exchange is not based on a formally binding agreement; (2) a self-regarding trustee has no incentive to meet the truster’s advance; (3) the truster regrets having made an advance if it remains unmet by the trustee; (4) both truster and trustee are better off if the trustee meets the truster’s advance than if no exchange takes place; and (5) the truster is uncertain of whether his or her advance will be met by the trustee (Coleman 1990: Ch. 5). In game-theoretic terms, the trust dilemma has been described by means of the trust game (Dasgupta 1988; Kreps 1990).¹ The extensive form of the trust game (TG) is represented by the right subtree in Figure 1. In the TG, the truster (Player 1) moves first and decides whether to make an advance (a) or not (¬a). If the truster decides for the latter, no exchange takes place and both parties to the interaction merely save the opportunity costs (P) of engaging in an exchange with each other. Only if the truster makes an advance does the trustee (Player 2) have a possibility to decide. The trustee can decide whether to meet the truster’s advance (m) or not (¬m). If the trustee meets the truster’s advance, the exchange takes place and both the truster and the trustee earn the gains from trade (R > P). However, the trustee gains more by not meeting the truster’s advance (T > R). Since the trustee is self-regarding, he or she does not meet the truster’s advance, and the truster earns less than if he or she had not made an advance (S < P). Note that the TG only fulfils the first four of the five necessary conditions listed above and thus remains an inadequate representation of a trust dilemma. Since in the

1 Note that in the behavioral and experimental economics literature, the investment game (Berg, Dickhaut, and McCabe 1995) is often called a “trust game” (see also Camerer 2003). We will not consider the investment game here. Roughly speaking, the investment game is a continuous version of the binary trust game described in this chapter. For a critical assessment of the investment game as a representation of the trust dilemma, see Ermisch and Gambetta (2006).


TG, the truster can be certain that the trustee will not meet his or her advance, the truster will refrain from making an advance and no exchange will take place. A trust dilemma also requires that the truster is uncertain whether the trustee will meet his or her advance. Only the trust game with incomplete information (TGI) meets all five necessary conditions of a trust dilemma (Camerer and Weigelt 1988; Dasgupta 1988; Voss 1998; Bacharach and Gambetta 2001; Buskens 2002; Raub 2004). The TGI extends the TG as follows (see Figure 1). First, it introduces nature (N) as a player (Harsanyi 1967), which randomly chooses whether the truster interacts with the trustee in the TG (right subtree in Figure 1) or in the assurance game (AG, left subtree in Figure 1). The only difference between the TG and the AG is that in the AG, the trustee has an incentive to meet the truster’s advance because R + b > T − c, where b > 0 and/or c > 0. However, the truster does not know which of the two games nature has chosen for him or her to interact with the trustee. The truster only knows that nature chooses the AG with probability α and the TG with probability 1−α, and this is common knowledge. Knowing α, the truster thus decides whether or not to make an advance based on his or her expected gains from either action, that is EU[a] = αR + (1 − α)S and EU[¬a] = P, respectively. The truster makes an advance if EU[a] > EU[¬a] or, put differently, if the probability α of interacting in the AG is above a certain threshold: α > (P − S)/(R − S)

(1)

Fig. 1: The trust game with incomplete information (TGI). [Game tree: N chooses the AG (probability α) or the TG (probability 1 − α); in each subtree Player 1 chooses a or ¬a, then Player 2 chooses m or ¬m.]

Notes: In the TGI, Nature (N) moves first and determines the game in which the truster (Player 1) interacts with a trustee (Player 2). With probability α the game is an assurance game (AG), in which the trustee is trustworthy, and with probability 1 − α it is a trust game (TG), in which the trustee is untrustworthy. The probability α is common knowledge; the truster does not know in which game he or she is interacting with the trustee, and this is denoted by the dashed line. If the truster makes an advance (a), a trustworthy trustee meets the advance (m), whereas an untrustworthy trustee does not (¬m). In the first case, the truster's payoff is R and the trustee's payoff is R + b. In the second case, the truster's payoff is S and the trustee's payoff is T. If the truster does not make an advance (¬a), both the truster's and the trustee's payoff is P. The payoffs are ordered as follows: T > R > P > S and R + b > T − c.
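The structure summarized in the notes to Figure 1 can also be written down directly. The sketch below uses arbitrary payoff values chosen only to respect the stated ordering T > R > P > S and R + b > T − c; the numbers themselves are our illustrative assumptions, not values from the text.

```python
# Minimal sketch of the TGI in Figure 1 with illustrative payoffs that respect
# T > R > P > S and R + b > T - c (the numbers are ours, chosen for illustration).
T, R, P, S = 4, 3, 2, 0
b, c = 2, 0

# Trustee best responses after an advance (a):
trustworthy_meets = (R + b) > (T - c)    # AG subtree: meets the advance
untrustworthy_meets = R > T              # TG subtree: does not meet it

# Truster's decision, equation (1): advancing pays iff alpha > (P - S) / (R - S).
alpha_star = (P - S) / (R - S)

def truster_advances(alpha):
    eu_advance = alpha * R + (1 - alpha) * S
    return eu_advance > P                # compare with EU[not advancing] = P

print(trustworthy_meets, untrustworthy_meets)        # True False
print(round(alpha_star, 3))                          # 0.667 with these payoffs
print(truster_advances(0.8), truster_advances(0.5))  # True False
```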


The truster refrains from making an advance if the condition specified in equation (1) is not met. Unlike in the TG, in the AG, the trustee gains an additional benefit (b) from meeting the truster’s advance, and/or incurs an additional cost (c) from failing to meet the truster’s advance. If the truster makes an advance (a), therefore, the trustee will meet the advance (m). Before we describe where these additional benefits and costs come from, let us first define the terms trust, trustworthiness and competence based on the theoretical framework outlined thus far (for other definitions of trust see, e.g., Hardin 2002; Uslaner 2002; Yamagishi and Yamagishi 1994): – Trust is a truster’s subjective belief regarding the trustee’s trustworthiness and/or competence, based on which the truster decides whether or not to make an advance. – Trustworthiness is the trustee’s intention to meet the truster’s advance. – Competence is the trustee’s ability (in terms of skill and knowledge) to meet the truster’s advance. For the sake of simplicity, in what follows, we will consider the trust dilemma only as a dilemma in which the truster’s uncertainty concerns the trustworthiness of the trustee, but not the trustee’s competence (see Raub 2004 on this point). Following Riegelsberger, Sasse, and McCarthy (2005), we divide a trustee’s additional benefits and costs from meeting or not meeting the truster’s advance, respectively, in extrinsic (i.e., contextual) and intrinsic (i.e., psychological) benefits and costs. A trustee’s extrinsic benefits and costs result from the trustee’s social and institutional embeddedness (Buskens and Raub 2013; Diekmann et al. 2014; Hardin 2002; Hume [1740] 1969; Posner 2000; Sosis 2005; Przepiorka 2013; Przepiorka and Diekmann 2013). In these cases it can be rational for a trustee to act trustworthily, as not meeting a truster’s advance can result in not being trusted ever after, and this may be less beneficial than meeting the truster’s advance and being continuously trusted in the future. For example, electronic reputation systems in online markets create real (i.e., financial) incentives for anonymous traders to behave in a trustworthy way (e.g., Diekmann and Przepiorka 2017). This is not to say that online traders have no intrinsic motives for being trustworthy. A trustee’s intrinsic benefits and costs result from the trustee’s otherregarding preferences (Bolton and Ockenfels 2000; Braun 1992; Fehr and Schmidt 1999; Snijders 1996) and internalized norms of reciprocity and fairness (Bacharach and Gambetta 2001; Bacharach, Guerra, and Zizzo 2007; Falk and Fischbacher 2006; Voss 1998). For example, an online trader may derive a psychological benefit from the fact that his or her merchandise will make his or her trading partner happy and/or feel guilty if, after receiving the money from the trading partner, he or she did not send the merchandise in return. In the reminder of this chapter, we take for granted that (1) a proportion α of trustees is motivated by extrinsic and/or intrinsic benefits and/or costs (henceforth trustworthy-making properties) such that these trustees would meet a truster’s ad-


vance, and (2) that these trustees’ trustworthy-making properties are a priori unobservable. In the next section, we make use of signaling theory to address the questions of how trustees can convince trusters of their trustworthiness, and of how trusters can tell the trustworthy trustees apart from the untrustworthy ones.

3 Signals of trustworthiness in social exchange Broadly defined, signals are actors’ actions purposefully taken to change other actors’ beliefs. The main tenets of signaling theory as applied to the trust dilemma are as follows (see also Bliege Bird and Smith 2005). First, actors differ in their trustworthymaking properties: that is, some actors are trustworthy, and others are less so. Second, these trustworthy-making properties are only imperfectly observable or not observable at all. Third, actors benefit from knowing their interaction partners’ trustworthymaking properties. Finally, under certain conditions, signals allow actors to convey their trustworthy-making properties and infer the trustworthy-making properties of potential interaction partners. According to “classical” signaling theory (Spence 1974), a signal is produced strategically by a deliberate act. The signaler acts anticipating that an observer will interpret his or her act and infer his or her unobservable properties from it. A signal is informative of the signaler’s unobservable properties if it is type-separating. A signal is type-separating only if the true bearer of the relevant properties can afford to produce the signal and produces it while someone not equipped with these properties cannot afford to produce the signal. If all signalers are able to produce a signal and produce it, the signal does not convey any information about these signalers. Because of the conditions that must obtain for a signal to be type-separating, signaling theory is often called costly signaling theory. A classic example is the earning of a university degree to signal one’s productivity to potential employers (Spence 1974). An employer may have a good idea of the average productivity of potential employees, but does not know the productivity of any one applicant. For productive types, acquiring a university degree is associated with a cost s1 , and for unproductive types acquiring a degree is associated with a higher cost s2 > s1 . If, conditional on having acquired a degree, the discounted lifetime income is w irrespective of type, then earning a university degree is a type-separating signal of productivity if w − s1 > 0 > w − s2 . Under these conditions, an employer will know that an applicant with a degree is productive, whereas an applicant without a degree is not; the employer will therefore hire the former rather than the latter. How can we apply (costly) signaling theory to the interaction between a truster and a trustee in the trust dilemma? Recall first that in the TGI (Figure 1), the trustee in the AG is trustworthy because R + b > T − c, the trustee in the TG is untrustworthy because T > R, and both types of trustees earn the same (P) if the truster does not


make an advance. The relevant but unobserved property of the trustee that the truster would like to know about is the trustee’s trustworthiness. The TGI can be extended so that the trustee (knowing his or her type) can first decide whether to produce a signal at a certain cost or not (see, e.g., Przepiorka and Diekmann 2013). Like in the job market example described in the previous paragraph, let us first assume that, for a trustworthy trustee, producing the signal is associated with a cost sAG , and for the untrustworthy trustee producing the signal is associated with a higher cost sTG > sAG (the subscripts correspond to the games in which a trustworthy and untrustworthy trustee, respectively, interact with a truster: see Figure 1). For the truster to interpret the trustee’s type based on the signal, the signal must be type-separating: that is, a trustworthy trustee produces a signal, and an untrustworthy trustee does not. Only then can the truster infer that the trustee producing the signal is trustworthy and safely make an advance, or abstain from making an advance if no signal is produced. The signal is type-separating if the trustworthy trustee can afford to send it, while the untrustworthy trustee cannot. That is, if R + b − sAG > P and P > T − sTG , respectively. If, however, R + b − sAG > P and T − sTG > P, then both types of trustees can afford to produce the signal and may do so, in which case the signal will be uninformative of type. Although formally correct, this straightforward application of signaling theory to the trust dilemma in social exchange can lead to several misconceptions, which, as we claim, may have hampered the propagation of signaling theory in the social sciences in general (and in sociological scholarship in particular). These misconceptions are: (1) signaling theory is about signaling costs; (2) to be type-separating, signaling costs must differ across types; (3) signals are always produced strategically; (4) type-separating signals must be costly. We will next discuss conceptual extensions to the basic framework which have proved useful in explaining trust and trustworthiness in social exchange, and which dissolve these four misconceptions about signaling theory. We will show how distinguishing between signaling costs and benefits helps to dismiss misconceptions (1) and (2), and how distinguishing between signals, signs, their production and their display helps to contradict misconceptions (3) and (4).
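Before turning to these extensions, the type-separating check just described can be summarized in a short sketch: a signal separates types only if the trustworthy trustee can afford to produce it (R + b − sAG > P) while the untrustworthy trustee cannot (P > T − sTG). The payoff and cost values below are illustrative assumptions, not values given in the chapter.

```python
# Type-separating signal check for the trust dilemma (illustrative values only).
T, R, P = 4, 3, 2
b = 2

def separating(s_AG, s_TG):
    trustworthy_can_signal = R + b - s_AG > P     # affordable for the trustworthy type
    untrustworthy_would_signal = T - s_TG > P     # worthwhile for the untrustworthy type
    return trustworthy_can_signal and not untrustworthy_would_signal

print(separating(s_AG=1, s_TG=3))   # True: only the trustworthy type signals
print(separating(s_AG=1, s_TG=1))   # False: both types can afford the signal
```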

4 Extending the signaling theory framework

4.1 Signaling costs and benefits

By assuming that trustworthy and untrustworthy types experience different costs in producing a signal, we restrict ourselves to signals with production costs that are (causally) related to the property of interest. This relationship is obvious in the example of university degrees as type-separating signals of a potential employee's pro-


ductivity, where the property of interest (productivity) facilitates the production of the signal (the university degree). In the case of the trust dilemma, it is not obvious how trustworthiness (as defined above) could facilitate signal production, or what could signal trustworthiness. We will come back to this point shortly. The general point we want to make here is that a difference in signaling costs is neither a sufficient nor a necessary condition for a signal to be type-separating; what matters are the net benefits (Johnstone 1997; Gambetta 2009). Let us make this point explicit with regard to the trust dilemma by assuming that sTG = sAG = s > 0. Under this assumption, the signal is type-separating if: R−P+b>s>T−P

(2)
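Equation (2) can be checked directly; the sketch below treats the case of equal signaling costs s for both types, in which separation presupposes that the trustworthy type's extra benefit satisfies b > T − R, as discussed next. The numbers are again illustrative assumptions.

```python
# Equation (2) with equal signaling costs s for both types (illustrative values).
T, R, P = 4, 3, 2

def separating_equal_costs(b, s):
    return R - P + b > s > T - P     # equation (2)

print(separating_equal_costs(b=2, s=2.5))    # True: 3 > 2.5 > 2
print(separating_equal_costs(b=0.5, s=2.5))  # False: b <= T - R, no s can separate
print(separating_equal_costs(b=2, s=1.5))    # False: s so low that both types signal
```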

That is, what makes the signal type-separating is not the signaling costs alone but the trustworthy trustee’s benefit (b) from meeting the truster’s advance, which an untrustworthy trustee does not obtain net of the costs. For example, the trustworthy trustee might be trustworthy because of his or her temporal embeddedness, which makes future interactions with the truster likely and thus making it worthwhile to behave trustworthily in the first encounter. Trustee types thus differ in the probabilities of meeting the truster again in the future, which are unobservable by the truster (Posner 2000; Przepiorka and Diekmann 2013). Assuming that the truster would not make any further advances if the trustee did not meet the truster’s first advance, trustees with a high probability of meeting the truster again have a strong incentive (b) to be trustworthy from the start, whereas trustees with a low probability of meeting the truster again have no incentive to be trustworthy (see Przepiorka and Diekmann 2013). Note that equation (2) implies that c has no bearing on the conditions under which a signal can be type-separating. If equation (2) does not hold, that is if T − R ≥ b ≥ 0, but c > 0 such that R + b > T − c still holds, there will be trustworthy and untrustworthy trustees, but no type-separating signaling equilibrium can emerge, irrespective of how large s is. To see this, first recall from above that T > R > P. If s is too large such that the untrustworthy type cannot afford to send the signal (s > T − P), neither can the trustworthy type afford to send it (s > R + b − P); if s is small enough that the trustworthy type can afford to send the signal (s < R + b − P), so can the untrustworthy type afford to send it (s < T − P). Under these conditions, a so-called pooling equilibrium will emerge, in which all trustees behave in the same way and the truster ignores trustees’ signaling behavior and merely decides according to equation (1). A noteworthy case is the one in which T − R ≥ b ≥ 0 for some trustworthy trustees, and b > T − R for all other trustworthy trustees. Under these conditions, a socalled semi-separating signaling equilibrium can emerge, in which only the trustworthy trustees with b > T − R can afford to send a signal, whereas the other trustworthy trustees cannot and will therefore be indistinguishable from untrustworthy trustees. Let us now come back to the case in which untrustworthy trustees have higher costs of producing the signal than trustworthy trustees. Clearly, this case requires that the costs of signal production are somehow related to the property of interest


(trustworthiness). To establish this relationship, recall our distinction between extrinsic and intrinsic trustworthy-making properties. It is hard to think of a relation between extrinsic trustworthy-making properties and costs of signal production; extrinsic trustworthy-making properties are mainly based on selective incentives which only materialize after the interaction, if at all. However, signals are produced before an interaction takes place. There are many examples of a relationship between intrinsic trustworthy-making properties and costs of signal production. For example, in recent years (experimental) evidence has accumulated which shows that generosity (including charitable giving) and trustworthiness are correlated (Ashraf, Bohnet, and Piankov 2006; Albert et al. 2007; Chaudhuri and Gangadharan 2007; Blanco, Engelmann, and Normann 2011; Fehrler and Przepiorka 2013; Gambetta and Przepiorka 2014; Gambetta and Székely 2014; Przepiorka and Liebe 2016; but see also Fehrler and Przepiorka 2016, who do not find evidence for such a correlation). The positive relationship between generosity and trustworthiness can be explained via otherregarding preferences (see the online appendix to Gambetta and Przepiorka 2014). If we allow actors to derive an additional (psychological) benefit from both producing the signal by being generous and being trustworthy, then producing the signal is less costly for trustworthy types than for untrustworthy types. Let us denote this additional benefit from producing the signal by being generous with g. For example, giving to charity (Harbaugh, Mayr, and Burghart 2007) or working as a volunteer at a non-profit organization (Fehrler and Kosfeld 2014) could be such signals. In these cases, even if the material costs are the same for all types of trustees, that is, sTG = sAG = s > 0, it can be the case that sTG > sAG − g. However, recall that a difference in signaling costs is not a sufficient condition for a signal to be typeseparating. For the signal to be type-separating, it must hold that R + b − sAG + g > P and P > T − sTG . This can be simplified to: R−P+b+g>s>T−P

(3)
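Analogously, equation (3) adds the intrinsic benefit g that a trustworthy trustee derives from producing the signal (e.g., from giving); with a sufficiently large g, separation is possible even if b = 0. The values below are again illustrative assumptions.

```python
# Equation (3): an intrinsic production benefit g can do the separating work
# even when the extra benefit from being trusted is b = 0 (illustrative values).
T, R, P = 4, 3, 2

def separating_with_g(b, g, s):
    return R - P + b + g > s > T - P   # equation (3)

print(separating_with_g(b=0, g=2, s=2.5))    # True: g > T - R
print(separating_with_g(b=0, g=0.5, s=2.5))  # False: g <= T - R, no s can separate
```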

Unlike signals which are equally costly for all types of trustees to produce, b can now also be zero as long as R + b > T − c, where c > 0. For example, if a trustee would feel guilty for not meeting the truster’s advance (i.e., incurring an intrinsic cost c) and his or her production of the signal would yield an additional benefit g due to guilt reduction, the signal can be type-separating as long as R − P + g > s > T − P.² In other words, guilt could sustain both charitable giving and trustworthiness. By making the conceptual distinction between signaling costs and benefits, we have shown that the net benefits matter for type-separation rather than the costs

2 This is in line with the other-regarding preferences model by Fehr and Schmidt (1999), which predicts that actors with a β (“guilt”) parameter above a certain threshold will both choose a generous split as dictators in a dictator game and meet a truster’s advance as trustees in a trust game. In other words, dictator game giving can be a type-separating signal of trustworthiness (see Gambetta and Przepiorka 2014).


alone, and that therefore signaling costs do not have to differ across types to be typeseparating. Thus far, however, we have assumed that, to be type-separating, untrustworthy trustees must not find it worthwhile to produce a signal: it must hold that s > T − P. In the next section, we will show that even if sTG = sAG = s = 0, as long as g > 0 for trustworthy trustees and g = 0 for untrustworthy trustees, producing the signal can be type-separating under certain conditions. These conditions are unveiled once we introduce the conceptual distinction between signals, signs, their production and their display.

4.2 Signals, signs, their production, and their display The conceptual distinction between signals and signs has been developed by Bacharach and Gambetta (2001), Gambetta and Hamill (2005), Gambetta (2009), Gambetta and Przepiorka (2014), and is further developed here. Recall our definition that signals are actors’ actions purposefully taken to change other actors’ beliefs. We define signs as anything in the environment that can change actors’ beliefs, once perceived. For example, if one looks out of the window and sees dark clouds gathering, one is more likely to believe that it is going to rain; if one overhears someone speaking with a French accent, one is more likely to believe that this person is French by birth; if one notes a seller’s reputation score on eBay to be overwhelmingly positive, one is more likely to believe that one can trust this seller with one’s money. Signs and signals are related in that they affect actors’ beliefs, but they are also related in that signs can become signals, and vice versa. A sign becomes a signal if an actor displays (or hides) the sign for the purpose of changing another actor’s beliefs. A signal becomes a sign if, once produced by an actor for the purpose of changing another actor’s beliefs, it remains on display for other reasons. It seems futile to construct an example in which the sign of gathering dark clouds is turned into a signal, whereas a French accent or a seller’s reputation score on eBay can more plausibly be exemplified as signs and signals. For example, during World War II, French spies hiding their accents when speaking German did so with the purpose of making their enemies believe they were native Germans; by hiding them, their French accents became signals. Note, however, that for their German counterparts, hearing someone talking German flawlessly remained a sign, as they of course would not have sensed that something was being hidden from them on purpose. In peer-topeer online markets, having one’s reputation score conspicuously displayed is a standard feature, and concealing one’s reputation score is rarely an option. Hence, online traders may not think of their and their exchange partners’ reputation scores as being purposefully displayed, and therefore perceive them as signs rather than signals. However, before their reputation can be displayed, online sellers must first build one. Online sellers can build a good reputation by behaving trustworthily when dealing with buyers to receive positive ratings. If, by building a good reputation, these sellers


intend to affect future buyers’ beliefs about their trustworthiness, their good reputation is a signal they produce. What makes an online seller’s reputation a signal is its purposeful production rather than its automatic display.³ However, an online seller’s good reputation may be the byproduct of the seller being trustworthy because of his or her intrinsic trustworthy-making properties (e.g., guilt aversion, norm of reciprocity, etc.). In this case, reputation is not produced as a signal but as a sign, because it is not produced for the purpose of changing a buyer’s beliefs (yet still has the potential to do so). Whether a seller’s reputation is produced strategically as a signal or naturally as a sign is difficult to determine without knowing the seller’s intentions at the time of production. Still, actors’ intentions can sometimes be derived from the context in which production takes place, or actors may have proof of their intentions at the time of production. The distinction between signals that are produced strategically and signs that are produced naturally opens the door for signaling theory to be applied to “cheap” actions such as gestures and talk.⁴ Actions are not always carried out in anticipation of their future information value; people are not strategic all the time and often produce signs, the “raw material” of potential signals, with other reasons in mind. The interesting aspect of naturally produced signs is that they can be reliably informative of type even though their production costs are not type-separating – in fact, their real production costs can be virtually zero. Simple gestures, which in strategic situations could be easily performed by any actor, can become informative. If actors do not anticipate the information value of their actions, those who behave kindly, for example, do so because they are good-hearted and derive an intrinsic benefit g > 0 from it, while those who are not good-hearted do not derive an intrinsic benefit from being kind (g = 0). Many acts of kindness have negligible costs and are therefore not mimic-proof (i.e., sTG = sAG = s = 0); yet unconditional kindness has been found to explain much of the variation in trustworthiness (Ashraf, Bohnet, and Piankov 2006). So how could “cheap” acts of kindness function as type-separating signals of trustworthiness? This question cannot be answered with regard to sign production by means of our signaling model, because at the time of production actors must neither know nor expect that an interaction with trust at stake will follow in which their trustworthiness may

3 An online seller's reputation score may not be the best example to clarify the distinction between the production and the display of signs and signals, because it is produced and automatically displayed at the same time, in which case only production lends itself to purposeful acts. Tattoos are perhaps a more suitable example (see Gambetta 2009:171). Tattoos, denoting gang membership for instance, are produced at one point in time and can later be purposefully displayed as signals, but also unconsciously displayed as signs.
4 The notion of naturally produced signs relates to Spence's notion of indices (Spence 1974:9–11 and Ch. 4) and Frank's notion of passive signals (Frank 1988: Ch. 5). However, by introducing the distinction between sign production and display, indices or passive signals can also become signals at the level of display (Gambetta and Przepiorka 2014).


be inferred from their kindness. However, the question can be answered by our signaling model with regard to sign display. Actors who miss being kind when they have a chance will later find it too costly (relative to the benefits) to fabricate evidence of their kindness, while displaying the evidence will be affordable by actors who are indeed kind. For the sake of simplicity, let us assume that trustworthiness and natural kindness are perfectly correlated and denote the cost for displaying evidence of an actor's kind act with s′AG and s′TG if the actor is trustworthy or untrustworthy respectively. Displaying the evidence of one's kindness is then a type-separating signal of trustworthiness if R + b − s′AG > P and P > T − s′TG. In other words, a past action that produces a sign of an unobservable property of the actor becomes itself a new, not directly observable property of the actor, and the displayable evidence of the action's having occurred becomes a new, second-order signal. Natural actions can thus make reliable communication more efficient, because it is much cheaper for the honest actor. This is the case if the type-separating conditions apply to the displayable evidence both that an action has occurred and that the actor did not anticipate its future information value (Gambetta and Przepiorka 2014). Signaling theory, as part of rational choice theory, may appear to many sociologists as a blind alley, where sociological relevance is inadvertently traded for theoretical rigor. However, signaling theory is being developed further, and we believe that the pairing of formal game-theoretic modelling with an open eye for the games we play in everyday life (Goffman 1959; Goffman 1969) will make signaling theory a widely used instrument in sociological scholarship. However, theoretical innovations must be put to an empirical test and confronted with sociologically relevant phenomena. In the next section, we review some empirical tests of the theoretical arguments we have presented in this chapter. Most of these studies are laboratory experiments conducted by sociologists testing hypotheses regarding trust and trustworthiness in trust dilemmas. (For an earlier review of experiments with "signaling games", see Camerer 2003: Ch. 8.)
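To make the display condition just stated concrete, here is a minimal sketch in Python. It uses the chapter's payoff labels for the trustee (P for no trust, R for honored trust, T for abused trust, b for the additional benefit a trustworthy type draws from a realized exchange) and purely hypothetical numbers; it illustrates the logic and is not the authors' model code.

```python
# Type-separating display of evidence of a past kind act (hedged illustration).
# P, R, T, b follow the chapter's trustee payoffs; s_trustworthy and s_untrustworthy
# are hypothetical costs of displaying, or fabricating, the evidence.

def is_type_separating(P, R, T, b, s_trustworthy, s_untrustworthy):
    """True if only the trustworthy type gains from displaying the evidence."""
    trustworthy_gains = R + b - s_trustworthy > P   # display, be trusted, honor trust
    mimic_does_not_gain = P > T - s_untrustworthy   # faking and abusing does not pay
    return trustworthy_gains and mimic_does_not_gain

# Cheap, mimicable production is not separating ...
print(is_type_separating(P=30, R=60, T=100, b=20, s_trustworthy=0, s_untrustworthy=0))   # False
# ... but display becomes separating once fabricating the evidence is costly enough.
print(is_type_separating(P=30, R=60, T=100, b=20, s_trustworthy=5, s_untrustworthy=80))  # True
```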

5 Experimental evidence To our knowledge, Bolle and Kaehler (2007) are the first to experimentally investigate a signaling model of social exchange. In their experiment, a trustworthy trustee benefits from meeting a truster’s advance, whereas an untrustworthy trustee benefits from not meeting the truster’s advance. Participants in the role of truster cannot tell trustworthy from untrustworthy types, but they know that they will meet a trustworthy trustee with probability α which is varied in the experiment. In one experimental condition (low-alpha), α is low so that the truster has a higher expected payoff from not making an advance (i.e., the condition specified in equation (1) is not met). In the other experimental condition (high-alpha), α is high enough so that a truster has a higher expected



payoff from making an advance. In both conditions, the trustees can signal their trustworthiness by making a costly gift to a truster before the truster decides whether to make an advance. If a trustee decides to make a gift, he or she incurs the costs for doing so irrespective of what the truster does thereafter. In this experimental setup, the payoffs are such that an untrustworthy trustee does not gain from sending a gift, even if the truster makes an advance after having received the gift. Thus, a gift is a typeseparating signal because only trustworthy trustees gain from sending it. Although the truster would benefit from knowing the trustee’s type in both the low-alpha and the high-alpha condition, a type-separating equilibrium is more likely to emerge in the low-alpha condition. The reason is that, in the high-alpha condition, trusters can be expected to make an advance without being sent a gift, which eliminates the trustees’ incentives to send a gift. It is therefore more likely that a pooling equilibrium emerges in which no trustee sends a gift. Although their results concur imperfectly with these theoretical predictions, the results are quite close. A related study is conducted by Przepiorka and Diekmann (2013) to test a theoretical argument put forward by Posner (1998; 2000). Posner suggests that trustees interested in long-term social exchange relations invest in signals so that trusters can discriminate between them and less trustworthy trustees, who are not interested in long-term social exchange relations. Unlike in Bolle and Kaehler (2007), trustees differ in their probabilities that in a series of trust dilemmas a further interaction will take place between the same actors, rather than in their payoffs from a single encounter. Consequently, only trustees with a high probability of repeated encounters are compensated for the costs they incur if they send a signal, which establishes the conditions for a type-separating equilibrium. In line with expectations, these trustees are more trustworthy and spend more money on signals than trustees with a low probability of repeated encounters. Also in accordance with theoretical expectations, signaling behavior is more indicative of type in the low-alpha condition than in the high-alpha condition, where the probability of being trusted is high and sending signals therefore not worthwhile. Contrary to expectations, however, signaling does not enhance the level of trust in the low-alpha condition relative to the trust level in the condition without a signaling opportunity. The reason is the high level of unconditional trust in the absence of a signaling opportunity (see Camerer and Weigelt 1988). Still, the more trustees spend on signaling, the higher the probability that trusters make an advance. As in the Bolle and Kaehler (2007) experiment, trusters benefit from the introduction of a signaling opportunity, while the opposite is true for trustees, mainly due to the costs they incur from sending signals. In another computerized laboratory experiment, Fehrler and Przepiorka (2013) test the hypothesis that generosity in the form of charitable giving is a signal of trustworthiness, which can be expected based on evolutionary reasoning (Gintis, Smith, and Bowles 2001). They exploit the natural variation in subjects’ generosity and trustworthiness rather than inducing different types of trustees by varying monetary incentives. Their results show that trustees who give to charity are indeed more trustworthy,


and trusters infer the trustees’ trustworthiness from these trustees’ generosity when they decide whether to make an advance in a trust dilemma. In all the three studies discussed above, real (i.e., monetary) signaling costs are the same for all trustee types. As elaborated in the first part of this chapter, there are two general conditions under which a signal can be type-separating but equally costly for all types. The first condition is that the benefits of the trustees differ such that it only pays for the trustworthy trustees to send a signal. The second condition is that it does not pay for any of the types to send a signal, but the trustworthy type reaps an additional intrinsic benefit from sending a signal and meeting a truster’s advance. Examples of the first condition are given in Bolle and Kaehler (2007) and Przepiorka and Diekmann (2013). In both studies, the costs for sending the signal are the same for all trustees. However, it only pays for trustworthy trustees to send a signal and realize an exchange, whereas this does not pay for untrustworthy trustees (i.e., R − P + b > s > T − P). An example of the second condition is given in Fehrler and Przepiorka (2013). In their study, not only are the signaling costs the same for both types, but a successful exchange does not even entirely compensate the trustworthy type for the signaling costs. It is exactly the willingness to be generous and give to charity, even if such charitable giving is costly, that distinguishes trustworthy from untrustworthy types. In other words, there is a correlation between the willingness to produce the signal and trustworthiness: the trustworthy trustees, via their otherregarding preferences, derive an intrinsic benefit from giving to charity as they do from meeting a truster’s advance (R + b − s + g > P and P > T − s). This is in line with a recent study conducted by Berger (2017a), which shows that pro-environmental behavior correlates with trustworthiness, and that trusters infer trustees’ trustworthiness from these trustees’ decisions to buy environmentally friendly products as opposed to conventional products. Several experimental studies have been conducted which illustrate the distinction between signals and signs of trustworthiness, as well as the production and display of such signals and signs. An interesting example of a naturally produced sign of trustworthiness is discussed by Feinberg, Willer, and Keltner (2012). In a series of (quasi-) experimental studies, the authors find that the extent of embarrassment displayed by a subject in awkward social situations is predictive of his or her trustworthiness, and observers use embarrassment as an indicator for trustworthiness in the trust dilemma. In everyday interactions, embarrassment is difficult to produce strategically; it is therefore a sign rather than a signal. Generosity can also be a sign of trustworthiness. Gambetta and Przepiorka (2014) conduct an experiment in which they let subjects produce signs and signals of trustworthiness by letting them decide between a generous and a selfish division of money in the dictator game (DG). In one condition, subjects make their DG decisions without knowing that a trust dilemma will follow in which they will act as trustees. These subjects’ DG decisions thus produce signs naturally. In another condition, subjects know everything from the start and are thus able to produce signals strategically. In a sec-


ond stage comprising a trust dilemma, trusters can use information about trustees’ DG decisions as an indicator of trustworthiness. It turns out that trusters rely on this information more if trustees were not aware that a trust dilemma would follow after the DG. What is more, given an opportunity to display how they decided in the DG, generous trustees display their generosity, whereas selfish trustees try to hide their selfishness (see also Gambetta and Székely 2014).
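The low-alpha/high-alpha logic of the experiments reviewed above can be sketched as follows. The snippet assumes that the condition the chapter refers to as equation (1) is the usual expected-payoff comparison for the truster; the payoff numbers are invented for illustration only.

```python
# Hedged sketch: when does a truster advance without any signal from the trustee?
# R1 = truster's payoff if the advance is met, S1 = payoff if it is exploited,
# P1 = payoff from not advancing; alpha = share of trustworthy trustees.

def advance_without_signal(alpha, R1, S1, P1):
    return alpha * R1 + (1 - alpha) * S1 > P1

R1, S1, P1 = 60, 20, 30
for alpha in (0.2, 0.8):
    print(alpha, advance_without_signal(alpha, R1, S1, P1))
# Low alpha (False): the truster withholds the advance unless trustees separate
# themselves, e.g., by sending a costly gift, so a type-separating equilibrium
# can emerge. High alpha (True): advancing pays anyhow, trustees have little
# reason to send gifts, and a pooling equilibrium without signals is more likely.
```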

6 Conclusion and outlook Since it was put forward 40 years ago, signaling theory has been used to explain behavior in social interaction by biologists, economists and other social and behavioral scientists. Only relatively recently have sociologists become interested in signaling theory, and its application in sociology is still rare. This may seem understandable: signaling theory has often been presented in formal, game theoretic terms, and most sociologists balk at formal theorizing, not to speak of rational choice and game theory. Even those more inclined to formal theoretical approaches may have found signaling theory too blunt an instrument to explain sociologically relevant phenomena. We suspect that the following four misconceptions contribute to the relatively low appreciation of signaling theory in sociology: (1) signaling theory is about signaling costs; (2) to be type-separating, signaling costs must differ across types; (3) signals are always produced strategically; and (4) type-separating signals must be costly. The aim of this chapter has been to make signaling theory more accessible to sociological scholarship by resolving these misconceptions. By first distinguishing between signaling costs and benefits, we resolve misconceptions (1) and (2). This may seem trivial, but experience has taught us that the obvious tends to be overlooked, particularly its implications. While signaling costs accrue when signals are produced, their benefits lie in the future. More importantly, with this distinction in mind, it becomes conceivable that the benefits of signals can differ across types and not only the costs. It follows that it is the difference in net benefits that makes a signal potentially type-separating, not necessarily the difference in costs alone. This conceptual distinction between signaling costs and benefits opens the door for signaling theory to explain cultural consumption and the emergence of seemingly irrational norms (Wenegrat et al. 1996; Posner 2000; Voland 2004; Diekmann and Przepiorka 2010). Second, by distinguishing between signals and signs, and between the production and the display of signals and signs, we resolve misconceptions (3) and (4). Some actions are strategic, produced for the purpose of informing others about one’s unobserved properties and persuade them to act in a certain way. Strategically-produced signals are type-separating if they are unaffordable by mimics. Other actions, chosen for other purposes, can still send information as a by-product; if actors do not antic-


ipate the information value of their actions, the signs these actions produce can be persuasive even if their production is virtually costless and thus affordable by mimics. Once produced, signs (and signals) can be later re-displayed with efficiency gains. This requires hard-to-fake evidence (a second-order signal) to prove that their production occurred and show the “type-separating” conditions under which it did so. The distinction between signals that are produced strategically and signs that are produced naturally opens the door for signaling theory to be applied on “cheap” actions such as gestures and talk (Gambetta and Przepiorka 2014). Moreover, the distinction between production and display enables signaling theory to be applied on any kind of evidence for one’s past actions and to conceptually incorporate reputation formation, the acquisition of social capital, and the emergence of institutions that promote cooperation in humans under its theoretical framework (Diekmann 2007; Przepiorka 2013). Sociologists have just started to explore and exploit the potential of signaling theory to explain behavior in social interactions. In this chapter, we have also reviewed empirical evidence, mostly coming from laboratory experiments with trust dilemmas. Apart from a relatively complex setup, a major challenge researchers face when testing signaling models experimentally is the inducing of different actor types (e.g., trustworthy and untrustworthy trustees). In theory, assuming rational and self-regarding actors, one can derive the conditions under which a type-separating signaling equilibrium is most likely to emerge (see above). In practice, these conditions depend on subjects’ beliefs and behaviors. For example, despite being informed that a high proportion of interactions will take place in the TG rather than the AG (see Figure 1), subjects may still believe the probability of meeting a trustworthy trustee is high because they expect other subjects to have other-regarding preferences. In line with this conjecture, it has been shown that an overwhelming majority of trustees will meet a truster’s advance in the AG, but a significant proportion of trustees will also meet a truster’s advance in the TG (Camerer 2003; Bolle and Kaehler 2007; Przepiorka and Diekmann 2013). Under these conditions, a rational truster will make an advance anyhow, in which case a trustee will be reluctant to spend money on sending a signal. Because a significant proportion of experimental subjects have other-regarding preferences, the conditions under which signaling should occur efficiently are difficult to implement in lab experiments, not to mention online or field experiments (but see Berger 2017b). One solution to this problem could be to let subjects filling the role of trusters (knowingly) interact with simulated trustees, which are programmed to behave in accordance with the theoretic model. This strategy would allow researchers to test hypotheses about trusters’ behavior in an environment in which the trust problem is large enough for signals to be worth sending (Przepiorka 2009: Ch. 6). Conversely, the same approach could be taken to investigate trustees’ signaling behavior when they knowingly interact with simulated trusters. Clearly, these approaches merely test hypotheses and do not tell us about the subjects’ actual beliefs, preferences and be-



haviors. An alternative approach, which several recent studies have taken, is to use the natural variation in subjects’ preferences, give them an opportunity to produce signals and signs of their preferences, and investigate how far observers infer trustworthiness from these signals and signs (e.g., Fehrler and Przepiorka 2013; Gambetta and Przepiorka 2014). This approach has been successful in showing that generosity (Fehrler and Przepiorka 2013; Gambetta and Przepiorka 2014), sustainable consumption (Berger 2017a), corporate social responsibility (Fehrler and Przepiorka 2016), and moral judgements (Simpson, Harrell, and Willer 2013) can function as signals and signs of trustworthiness, whereas, contrary to theoretical expectations (Gintis, Smith, and Bowles 2001), the punishment of selfishness does not seem to have this function (Przepiorka and Liebe 2016). It is important to know which actions convey information about actors’ trustworthiness, because knowing whom to trust promotes cooperation and cohesion in society. If institutions are designed that ignore, or even inhibit, observable actions that produce signals and signs of trustworthiness, societies may lose their ability to reinforce the moral principles on which they are based.

Bibliography

[1] Albert, Max, Werner Güth, Erich Kirchler, and Boris Maciejovsky. 2007. "Are we nice(r) to nice(r) people? An experimental analysis." Experimental Economics 10(1):53–69. [2] Ashraf, Nava, Iris Bohnet, and Nikita Piankov. 2006. "Decomposing trust and trustworthiness." Experimental Economics 9(3):193–208. [3] Bacharach, Michael, and Diego Gambetta. 2001. "Trust in Signs." Pp. 148–184 in Trust in Society, edited by K. S. Cook. New York: Sage. [4] Bacharach, Michael, Gerardo Guerra, and Daniel J. Zizzo. 2007. "The self-fulfilling property of trust: An experimental study." Theory and Decision 63(4):349–388. [5] Berg, Joyce, John Dickhaut, and Kevin McCabe. 1995. "Trust, reciprocity, and social history." Games and Economic Behavior 10(1):122–142. [6] Berger, Joël. 2017a. "Is 'buying green' a signal of trustworthiness in social exchange? Evidence from laboratory experiments." Unpublished manuscript, Department of Sociology, University of Zurich. [7] Berger, Joël. 2017b. "Are luxury brand labels and 'green' labels costly signals of social status? An extended replication." PLOS ONE 12(2):e0170216. [8] Blanco, Mariana, Dirk Engelmann, and Hans T. Normann. 2011. "A within-subject analysis of other-regarding preferences." Games and Economic Behavior 72(2):321–338. [9] Bliege Bird, Rebecca, and Eric Alden Smith. 2005. "Signaling theory, strategic interaction, and symbolic capital." Current Anthropology 46(2):221–248. [10] Bolle, Friedel, and Jessica Kaehler. 2007. "Introducing a signaling institution: An experimental investigation." Journal of Institutional and Theoretical Economics 163(3):428–447. [11] Bolton, Gary E., and Axel Ockenfels. 2000. "ERC: A theory of equity, reciprocity, and competition." American Economic Review 90(1):166–193. [12] Bozoyan, Christiane. 2015. "Vertrauen und Vertrauenswürdigkeit." Pp. 195–216 in Experimente in den Sozialwissenschaften, edited by M. Keuschnigg, and T. Wolbring. Baden-Baden: Nomos.


[13] Braun, Norman. 1992. “Altruismus, Moralität und Vertrauen.” Analyse & Kritik 14(2):177–186. [14] Buskens, Vincent. 2002. Social networks and trust. Dordrecht: Kluwer Academic Publishers. [15] Buskens, Vincent, and Werner Raub. 2013. “Rational Choice Research on Social Dilemmas: Embeddedness Effects on Trust.” Pp. 113–150 in The Handbook of Rational Choice Social Research, edited by R. Wittek, T. A. B. Snijders, and V. Nee. Stanford, CA: Stanford University Press. [16] Camerer, Colin F. 2003. Behavioral game theory. Princeton, NJ: Princeton University Press. [17] Camerer, Colin, and Keith Weigelt. 1988. “Experimental test of a sequential equilibrium reputation model.” Econometrica 56(1):1–36. [18] Chaudhuri, Ananish, and Lata Gangadharan. 2007. “An experimental analysis of trust and trustworthiness.” Southern Economic Journal 73(4):959–985. [19] Cho, In-Koo, and David M. Kreps. 1987. “Signaling games and stable equilibria.” Quarterly Journal of Economics 102(2):179–222. [20] Coleman, James S. 1990. Foundations of social theory. Cambridge, MA: The Belknap Press of Harvard University Press. [21] Dasgupta, Partha. 1988. “Trust as a Commodity.” Pp. 49–72 in Trust: Making and Breaking Cooperative Relations, edited by D. Gambetta. Oxford: Blackwell. [22] Dawkins, Richard, and John R. Krebs. 1978. “Animal Signals: Information or Manipulation?” Pp. 282–309 in Behavioral Ecology: An Evolutionary Approach, edited by J. R. Krebs, and N. B. Davies. Oxford: Blackwell. [23] Diekmann, Andreas. 2007. “Dimensionen des Sozialkapitals.” Pp. 47–65 in Sozialkapital. Grundlagen und Anwendungen, Sonderheft 47/2007 der Kölner Zeitschrift für Soziologie und Sozialpsychologie, edited by A. Franzen, and M. Freitag. Wiesbaden: VS. [24] Diekmann, Andreas, Ben Jann, Wojtek Przepiorka, and Stefan Wehrli. 2014. “Reputation formation and the evolution of cooperation in anonymous online markets.” American Sociological Review 79(1):65–85. [25] Diekmann, Andreas, and Wojtek Przepiorka. 2010. “Soziale Normen als Signale. Der Beitrag der Signaling-Theorie.” Pp. 220–237 in Soziologische Theorie kontrovers. Sonderheft der Kölner Zeitschrift für Soziologie und Sozialpsychologie, edited by G. Albert, and S. Sigmund. Wiesbaden: VS. [26] Diekmann, Andreas, and Wojtek Przepiorka. 2017. “Trust and Reputation in Markets.” in The Oxford Handbook of Gossip and Reputation, edited by F. Giardini, and R. Wittek. Oxford: Oxford University Press. [27] Ermisch, John, and Diego Gambetta. 2006. “People’s trust: The design of a survey-based experiment.” Working Paper, University of Essex, Colchester. [28] Falk, Armin, and Urs Fischbacher. 2006. “A theory of reciprocity.” Games and Economic Behavior 54(2):293–315. [29] Fehr, Ernst, and Klaus M. Schmidt. 1999. “A theory of fairness, competition, and cooperation.” Quarterly Journal of Economics 114(3):817–868. [30] Fehrler, Sebastian, and Michael Kosfeld. 2014. “Pro-social missions and worker motivation: An experimental study.” Journal of Economic Behavior & Organization 100:99–110. [31] Fehrler, Sebastian, and Wojtek Przepiorka. 2013. “Charitable giving as a signal of trustworthiness: Disentangling the signaling benefits of altruistic acts.” Evolution and Human Behavior 34(2):139–145. [32] Fehrler, Sebastian, and Wojtek Przepiorka. 2016. “Choosing a partner for social exchange: Charitable giving as a signal of trustworthiness.” Journal of Economic Behavior & Organization 129:157–171. [33] Feinberg, Matthew, Robb Willer, and Dacher Keltner. 2012. 
“Flustered and faithful: Embarrassment as a signal of prosociality.” Journal of Personality and Social Psychology 102(1):81–97.


[34] Frank, Robert H. 1988. Passions within reason: The strategic role of the emotions. New York: Norton. [35] Gambetta, Diego. 2009. “Signaling.” Pp. 168–194 in The Oxford Handbook of Analytical Sociology, edited by P. Hedström, and P. Bearman. Oxford: Oxford University Press. [36] Gambetta, Diego, and Heather Hamill. 2005. Streetwise: How Taxi Drivers Establish Their Customers’ Trustworthiness. New York: Russell Sage Foundation. [37] Gambetta, Diego, and Wojtek Przepiorka. 2014. “Natural and strategic generosity as signals of trustworthiness.” PLOS ONE 9(5):e97533. [38] Gambetta, Diego, and Áron Székely. 2014. “Signs and (counter)signals of trustworthiness.” Journal of Economic Behavior & Organization 106:281–297. [39] Gintis, Herbert, Eric A. Smith, and Samuel Bowles. 2001. “Costly signaling and cooperation.” Journal of Theoretical Biology 213(1):103–119. [40] Goffman, Erving. 1959. The Presentation of Self in Everyday Life. Garden City, NY: Doubleday, Anchor Books. [41] Goffman, Erving. 1969. Strategic Interaction. Philadelphia: University of Pennsylvania Press. [42] Grafen, Alan. 1990. “Biological signals as handicaps.” Journal of Theoretical Biology 144(4):517–546. [43] Harbaugh, William T., Ulrich Mayr, and Daniel R. Burghart. 2007. “Neural responses to taxation and voluntary giving reveal motives for charitable donations.” Science 316(5831):1622–1625. [44] Hardin, Russell. 2002. Trust and trustworthiness. New York: Russell Sage Foundation. [45] Harsanyi, John. 1967. “Games with incomplete information played by ‘bayesian’ players, I: The basic model.” Management Science 14(3):159–182. [46] Hume, David. [1740] 1969. A treatise of human nature. Harmondsworth, Middlesex: Penguin Books. [47] Johnstone, Rufus A. 1997. “The Evolution of Animal Signals.” Pp. 155–178 in Behavioral Ecology: An Evolutionary Approach, edited by J. R. Krebs, and N. B. Davies. Oxford: Blackwell. [48] Kreps, David. 1990. “Corporate Culture and Economic Theory.” Pp. 90–143 in Perspectives on Positive Political Economy, edited by J. E. Alt, and K. A. Shepsle. Cambridge, MA: Cambridge University Press. [49] Nelson, Phillip. 1974. “Advertising as information.” Journal of Political Economy 82(4):729– 754. [50] Posner, Eric A. 1998. “Symbols, signals, and social norms in politics and the law.” Journal of Legal Studies 27(2):765–798. [51] Posner, Eric A. 2000. Law and social norms. Cambridge, MA: Harvard University Press. [52] Przepiorka, Wojtek. 2009. “Reputation and signals of trustworthiness in social interactions.” Dissertation Nr. 18649, ETH Zurich. [53] Przepiorka, Wojtek. 2013. “Buyers pay for and sellers invest in a good reputation: More evidence from eBay.” Journal of Socio-Economics 42(C):31–42. [54] Przepiorka, Wojtek. 2014. “Reputation in offline and online markets: Solutions to trust problems in social and economic exchange.” Economic Sociology, the European Electronic Newsletter 16(1):4–10. [55] Przepiorka, Wojtek, and Andreas Diekmann. 2013. “Temporal embeddedness and signals of trustworthiness: Experimental tests of a game theoretic model in the United Kingdom, Russia, and Switzerland.” European Sociological Review 29(5):1010–1023. [56] Przepiorka, Wojtek, and Ulf Liebe. 2016. “Generosity is a sign of trustworthiness – the punishment of selfishness is not.” Evolution and Human Behavior 37(4):255–262. [57] Raub, Werner. 2004. “Hostage posting as a mechanism of trust: Binding, compensation, and signaling.” Rationality and Society 16(3):319–365.


[58] Riegelsberger, Jens, M. Angela Sasse, and John D. McCarthy. 2005. “The mechanics of trust: A framework for research and design.” International Journal of Human-Computer Studies 62(3):381–422. [59] Searcy, William A., and Stephen Nowicki. 2005. The evolution of animal communication: Reliability and deception in signaling systems. Princeton, NJ: Princeton University Press. [60] Simpson, Brent, Ashley Harrell, and Robb Willer. 2013. “Hidden paths from morality to cooperation: Moral judgments promote trust and trustworthiness.” Social Forces 91(4):1529–1548. [61] Snijders, Chris. 1996. Trust and commitments. Amsterdam: Thela Thesis. [62] Sosis, Richard. 2005. “Does religion promote trust? The role of signaling, reputation, and punishment.” Interdisciplinary Journal of Research on Religion 1:1–30. [63] Spence, Michael A. 1974. Market signaling: Informational transfer in hiring and related screening processes. Cambridge, MA: Harvard University Press. [64] Uslaner, Eric M. 2002. The Moral Foundations of Trust. Cambridge: Cambridge University Press. [65] Voland, Eckart. 2004. “Normentreue zwischen Reziprozität und Prestige-Ökonomie: Eine soziobiologische Interpretation kostspieliger sozialer Konformität.” Pp. 177–189 in Fakten statt Normen?, edited by C. Lütge, and G. Vollmer. Baden-Baden: Nomos. [66] Voss, Thomas. 1998. “Vertrauen in modernen Gesellschaften. Eine spieltheoretische Analyse.” Pp. 91–129 in Der Transformationsprozess: Analysen und Befunde aus dem Leipziger Institut für Soziologie, edited by R. Metze, K. Mühler, and K.-D. Opp. Leipzig: Leipziger Universitätsverlag. [67] Wenegrat, Brant, Lisa Abrams, Eleanor Castillo-Yee, and I. Jo Romine. 1996. “Social norm compliance as a signaling system. I. Studies of fitness-related attributions consequent on everyday norm violations.” Ethology and Sociobiology 17(6):403–416. [68] Yamagishi, Toshio, and Midori Yamagishi. 1994. “Trust and Commitment in the United States and Japan.” Motivation and Emotion 18(2):129–166. [69] Zahavi, Amotz. 1975. “Mate selection – A selection for a handicap.” Journal of Theoretical Biology 53(1):205–214. [70] Zahavi, Amotz, and Avishag Zahavi. 1997. The Handicap Principle: A Missing Piece of Darwin’s Puzzle. Oxford: Oxford University Press.

Manuela Vieth and Jeroen Weesie

Trust and Promises as Friendly Advances
Experimental Evidence on Reciprocated Kindness

Abstract: People's decision-making is influenced, not only by their own outcomes and those of others, but also by the mere choice of a preceding behavioral option. This can be due to feelings of obligation to return a favor, feelings of indignation about others' misconduct, or the desire for self-consistency. We explore the impact of trustfulness and of promising trustworthiness on subsequent decisions using experimental data from various trust games, hostage trust games, and dictator games. Our lab experiment is designed as within-subject sets of structurally identical (sub)games resulting from friendly or unfriendly actual behavior in single encounters. This allows us to analyze the "pure" effects of preceding decisions without making specific assumptions about actors' outcome preferences. We find evidence that both friendly and unfriendly behavior is reciprocated. Trustors reward trustees' promises and punish omitted promises. Trustees tend to reciprocate trustfulness and are inclined to keep promises.

1 Introduction Social and economic situations with interdependencies between actors are often characterized by incentives for opportunistic behavior (Williamson 1985): actors are tempted to take advantage at the expense of their partner. An example in everyday life is trust. Trust is involved in situations where people make a “risky advance” in the sense that they provide others with an opportunity for exploitation. For instance, if we buy something second-hand, we often cannot be sure about the product quality. Buying something online also involves the risk that the seller might not deliver. In such situations, we would like the others to convince us that they can be trusted. For example, we wish sellers to provide safeguards such as guarantees or warranties for products we buy. Similarly, we are also frequently asked to provide a safeguard, such Note: We thank Vincent Buskens for comments, assistance during experimental sessions, and improvements in the Dutch version of instruction texts. For assistance during the experiment, we thank Rense Corten, Dennie van Dolder, and Richard Zijdeman. And we gratefully acknowledge comments made by Ozan Aksoy, Davide Barrera, Andreas Diekmann, Ben Jann, Wojtek Przepiorka, and Werner Raub, as well as by participants at the LMU seminar in Venice 2006, at the Japanese-German meeting of the DGS section “Modellbildung und Simulation” 2007 in Zurich, at the “Behavioral Studies” colloquium at ETH Zurich in 2007, and at the IIS 2008 World Congress in Budapest. Financial support was provided by the Netherlands Organization for Scientific Research (NWO) under grant MAGW 400-05-089. https://doi.org/10.1515/9783110472974-019


as a deposit for using certain facilities or for borrowing something. However, many safeguards do not completely remove the temptation to abuse trust but invoke an intrinsic commitment. For instance, informal promises can be cheap-talk: they neither involve objective costs for the person making the promise nor objective benefits for the person receiving the promise. Not accepting an imperfect safeguard can even be perceived as impolite, especially if presented as a gift. This also holds for unwanted gifts and is exploited for marketing strategies (Cialdini 2001 gives examples of free sample products, methods of door-to-door salesmen, and religious missionaries approaching passers-by with flowers). In this chapter we study the following two questions: How does making or omitting a promise of trustworthiness influence trustfulness, and how do promise decisions and trustfulness affect trustworthiness? Two powerful social-psychological forces are at work: inter-personal feelings of obligation or indignation and intra-personal self-consistency (Cialdini 2001: Chs. 2–3; Coleman 1990: Ch. 12; Gouldner 1960). Both forces can induce behavioral patterns of reciprocity that are induced by the mere behavioral process and not by outcome-based motivations. Reciprocity is a behavioral pattern of returning favors and retaliating for unkind actions (for a review, see Kolm and Ythier 2006: Chs. 4, 6–8). We conducted a game-theoretical lab experiment, designed as within-subject sets of structurally identical (sub)games that differ by preceding friendly and unfriendly decisions (“behavioral context”). This allows us to maximally control for effects of outcomes and of individual characteristics. It thereby gives maximal room for studying reciprocity resulting from motivations that are triggered by preceding behavior rather than induced by changes in objective outcomes. Inspired by Snijders (1996), we thus contribute theoretically and empirically to previous research by improving and extending analyses of preceding behavior in trust situations.

2 Process-based reciprocity in trust situations

2.1 The problem of trustworthiness for trustfulness

The standard game-theoretical model describing trust situations is the Trust Game (TG) (Dasgupta 1988; Kreps 1990) (Figure 1a). It highlights the core features of trust situations (see also Coleman 1990: Ch. 5). First, two actors are involved: a trustor (1) and a trustee (2). Both actors are better off with honored trust than with no trust at all (Ri > Pi, with i = 1, 2). Second, the trustee makes a decision after the trustor has placed trust. Despite the collective advantage arising from honored trust, the trustee has incentives to abuse trust (T2 > R2), while the trustor has something to lose if trust is abused (S1 < P1). The outcomes in the game tree are called "objective" because they are displayed in terms of money.


If actors are motivated largely by their own objective outcomes and assume similar motivations on the part of others, trust will not be placed because placed trust would be abused. However, numerous studies have shown that people often do place trust and do honor trust, indicating that other-regarding motivations play a role (Snijders 1996; for reviews, see Camerer 2003: Ch. 2; Ostrom and Walker 2003). Two kinds of other-regarding motivations have been distinguished that give rise to reciprocity: outcome-based and intention-based motivations (for a review, see Fehr and Schmidt 2006). Outcome-based other-regarding motivations are social (value) orientations rooted in social comparisons (for reviews, see Au and Kwong 2004; McClintock and van Avermaet 1982). The basic idea is that actors take into account the objective outcomes of their interaction partners: actors' utility is determined by a function of their own objective outcomes and those of others (Liebrand 1984; McClintock 1972; Messick and McClintock 1968; Weesie 1993; Weesie 1994b). Various social values have been distinguished, for instance, an "equalitarian" orientation (MacCrimmon and Messick 1976). Given such orientations, actors minimize the difference between their own and others' objective outcomes (Kelley and Thibaut 1978). This idea is also the basis for models of "inequality aversion" (e.g., Bolton and Ockenfels 2000; Fehr and Schmidt 1999; van Lange 1999; Ledyard 1995; Weesie 1994a; for a guilt model, see Snijders 1996). Deviations from an equal outcome are assumed to inflict emotional disutility (e.g., caused by feelings of guilt or envy). This induces people to strive for outcome equality which can promote reciprocal behavior. Applying inequality aversion to the trust game, trustees honor trust or share gains generously if guilt feelings are strong enough (Snijders 1996; McCabe, Rigdon, and Smith 2003). Now consider that the trustee's decision of whether to honor trust or not constitutes a distribution decision in the TG because to honor trust means to return some benefit. Separating this subgame yields a dichotomous Dictator Game (DG) with the trustee in the role of the dictator that represents the trustee's sharing decision without a behavioral context (Figure 1b). We use the term "behavioral context" in the sense that an actor makes a decision in a subgame as a part of a larger game: the behavioral context consists of decisions made earlier in that game and of information about induced changes.

Fig. 1: Trust Game (TG) and dichotomous Dictator Game (DG). (a) Trust Game: the trustor (1) chooses between no trust (P1, P2) and trust; after placed trust, the trustee (2) chooses between abuse (S1, T2) and honor (R1, R2), with R1 > P1 > S1 and T2 > R2 > P2. (b) Dictator Game: the dictator (2) chooses between keep (S1, T2) and share (R1, R2), with R1 > S1 and T2 > R2.


For instance, the trustor's decision to place trust is the behavioral context for the trustee's choice of whether or not to honor that trust. We can now compare the trustee's decision in the TG with the dictator's decision in the DG. If the objective outcomes in the TG and the DG are identical, outcome-based motivations do not induce a difference in behavior between the two situations (McCabe, Rigdon, and Smith 2003). However, the trustee in the TG will only be in the favorable position if the trustor has placed trust. Thus, other-regarding motivations that account for behavioral processes of how a certain outcome is obtained become relevant. Specifically, perceived kindness and unkindness from interaction partners activate intention-based motivations. When someone does us a favor, we feel indebted to that person. We are then driven to give something in return to remove this "shadow of indebtedness" (Gouldner 1960:174), often even when the favor was unwanted (Cialdini 2001: Ch. 2; Coleman 1990: Ch. 12). Omitting or delaying an obligation to return a favor causes intrinsic distress and emotional tension. Similarly, inflicted harm demands retaliation, especially if the harm was avoidable or unjustified. For instance, people become unfriendly toward others who behave impolitely without justified reason. Thus, the driving forces of people's behavior are feelings of obligation to return a favor (Cialdini 2001: Ch. 2; Coleman 1990: Ch. 12; Gouldner 1960) and feelings of indignation that induce people to retaliate for inflicted losses (Gouldner 1960). Experimental studies suggest that the choice of a specific option and information about non-chosen alternatives indicate a person's kindness (e.g., Brandts and Solà 2001; Charness and Rabin 2005; Cox 2004; Falk, Fehr, and Fischbacher 2003; Gallucci and Perugini 2000; Gautschi 2000; McCabe, Rigdon, and Smith 2003; Snijders 1996). Various theoretical models have been developed to account for intention-based motivations (e.g., Charness and Rabin 2002; Dufwenberg and Kirchsteiger 2004; Falk and Fischbacher 2006; Levine 1998; Rabin 1993). Evaluating the kindness of others is mainly based on three ingredients: the extent to which (1) an actor's own outcomes and (2) others' outcomes are shaped by (3) others' intentional decisions (e.g., Falk and Fischbacher 2006). The basic assumption is that actors benefit from others' friendly actions and suffer from others' unfriendly actions. Thereby, others' actions will be considered as being particularly kind if others incur sacrifices to behave in a friendly manner. However, received gains and the sacrifice of others will only be perceived as something friendly if the other could have chosen a less friendly alternative. Perceiving kindness gives rise to positive feelings toward the other person, while unkindness triggers negative feelings. The emotional utility caused by others' friendly behavior constitutes the basis for feelings of obligation that motivate actors to reward others. Correspondingly, the emotional disutility caused by others' unfriendly behavior gives rise to feelings of indignation that motivate actors to retaliate.

Kindness of placed trust. Accounting for intention-based motivations suggests that sharing decisions of trustees and dictators differ despite identical outcomes. Trustees experience an increased outcome (T2 > P2 or R2 > P2) due to the trustors' decision to place trust.


Of course, in a trust situation both the trustee and the trustor benefit from honored trust (R1 > P1). However, the trustor faces the risk of trust being abused, which would inflict a loss upon the trustor (S1 < P1). As mentioned, an actor's behavior will be perceived as being particularly kind if that actor incurs actual or potential sacrifices while providing benefits to others. By placing trust, the trustor indicates the positive belief that the trustee will honor that trust. For these reasons, placed trust constitutes a kind advance that induces in trustees a feeling of obligation to return the favor. Thus, trustees should be more strongly motivated to share gains than dictators.

Hypothesis 1. Compared to honoring trust in the TG, gains are less likely to be shared in the DG.

Note again that we compare decisions in behavioral contexts with identical objective outcomes. We do not thereby account for the possible moderating effects of outcomes (see the discussion for further remarks on this).
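To fix ideas before turning to promises, the following sketch spells out the two game forms of Figure 1 for purely money-motivated actors. The numbers only need to respect the orderings R1 > P1 > S1 and T2 > R2 > P2; they match the baseline payoffs used later in the experimental design, but any numbers with that ordering would do.

```python
# Hedged sketch of Figure 1 with illustrative baseline payoffs (trustor, trustee).
payoffs = {
    "no_trust":      (30, 30),   # (P1, P2)
    "trust_honored": (60, 60),   # (R1, R2)
    "trust_abused":  (20, 100),  # (S1, T2)
}

# Trust Game, solved backwards assuming both actors maximize their own money:
trustee_choice = max(["trust_honored", "trust_abused"], key=lambda k: payoffs[k][1])
trustor_places_trust = payoffs[trustee_choice][0] > payoffs["no_trust"][0]
print(trustee_choice, trustor_places_trust)   # 'trust_abused' False -> no trust placed

# The dichotomous Dictator Game of Figure 1b is just the trustee's subgame:
# "keep" yields T2 and "share" yields R2, i.e., the same monetary outcomes,
# but without the behavioral context of a trustor having placed trust first.
```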

2.2 Promises of trustworthiness

Trustfulness depends on the possibility of trustworthiness. Trustees might therefore seize opportunities to promise their trustworthiness to reduce the trustor's concern about abused trust. Promises are expressed intentions to perform a certain action that yields a gain to the other person. Raub (1992) proposes the Hostage Trust Game (HTG, where the term "hostage" is used in the sense of a bond: see Schelling 1960) in which the trustee has a commitment option prior to the TG (see also Weesie and Raub 1996). A commitment is thereby understood as a "voluntary strategic action", costly or not, with the purpose of "reducing one's freedom of choice" or changing the outcomes (Schelling 1960). In our context, the commitment stage represents the trustee's choice of whether to promise trustworthiness or not (Figure 2).

Fig. 2: Hostage Trust Game (HTG). The trustee (2) first chooses between no promise and promise. No promise leads to the Trust Game with initial payoffs (TG|H02): the trustor (1) chooses no trust (P1, P2) or trust, after which the trustee chooses abuse (S1, T2) or honor (R1, R2). A promise leads to the Trust Game with modified payoffs (TG|H+2): no trust (P1, P2 − c), abuse (S1, T2 − v2 − c), honor (R1, R2 − c); R1 > P1 > S1; T2 > R2 > P2; v2 ≥ 0; c ≥ 0.

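Before the behavioral arguments, a short sketch of the payoff bookkeeping in Figure 2 may help. The promise properties c = 5 and v2 = 10 on the 60/30/100 baseline are taken from the numerical example given later in the experimental-design section; the two conditions at the end correspond to the "perfectly binding" and "affordable" conditions discussed in the text below.

```python
# Hedged sketch: how a made promise modifies the trustee's payoffs in the HTG.
R2, P2, T2 = 60, 30, 100   # trustee's payoffs in the initial Trust Game
c, v2 = 5, 10              # transaction costs and binding value of the promise

after_promise = {
    "no_trust":      P2 - c,        # transaction costs are sunk in any case
    "trust_honored": R2 - c,
    "trust_abused":  T2 - v2 - c,   # the bond is forfeited only after abuse
}
print(after_promise)

# With purely money-motivated actors, a promise only solves the trust problem if:
perfectly_binding = v2 > T2 - R2    # it removes the temptation to abuse trust
affordable        = c < R2 - P2     # and promising still pays for the trustee
print(perfectly_binding, affordable)  # False, True for these illustrative values
```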

If the trustee promises his trustworthiness, the initial outcomes can be modified by objective properties of the promise: trustees choose between the initial TG (TG|H02 ) and a TG with modified outcomes (TG|H+2 ). Making a promise can be associated with transaction costs (c), which represent an irreversible investment that the trustee loses irrespective of subsequent choices (even if trust is subsequently withheld). For instance, such costs are involved in contacting the other person (e.g., writing a letter or travelling to a meeting), making a present (or offering product samples), or designing a contract. Promises can also involve something that is of value for the trustee (e.g., a deposit) which will be lost if he abuses trust after he promised trustworthiness. Objective bonds reducing the trustee’s temptation to renege on his promise are represented by the binding value (v2 ). Assume that actors are largely motivated by their own objective outcomes. In that case, trustees promise to behave in a trustworthy manner, trustors place trust, and trustees honor trust if the value of the bond completely removes the trustee’s temptation to abuse trust (perfectly binding: v2 > T2 − R2 ) and if the transaction costs are low enough (affordable: c < R2 − P2 ). (For formal game-theoretical analyses of commitments in the TG and in the related Prisoner’s Dilemma regarding standard selfishness assumptions, see Raub 2004; Raub and Keren 1993; Raub and Weesie 2000; Snijders 1996; Voss 1998; Weesie and Raub 1996.) Experiments using the HTG or the Prisoner’s Dilemma with commitment option show that already imperfectly binding commitments promote placing and honoring trust, and that minimal transaction costs hamper commitment posting (Bracht and Feltovich 2008; Mlicki 1996; Raub and Keren 1993; Snijders 1996; Yamagishi 1986; for negotiation problems, also see Duffy and Feltovich 2006; Prosch 2006). Even free communication without changes in objective outcomes (“cheap-talk”) promotes trustfulness and trustworthiness (for reviews see, e.g., Balliet 2010; Bicchieri 2002; Brosig 2006; Crawford 1998; Kopelman, Weber, and Messick 2002; Ostrom and Walker 2003; Sally 1995; Shankar and Pavitt 2002). Outcome-based motivations (e.g., inequality aversion) induce trustees with suffciently strong guilt feelings to honor trust in the TG (McCabe, Rigdon, and Smith 2003; Snijders 1996). In the HTG, the binding value of made promises reduces the trustee’s outcome of abused trust. On the one hand, the reduction of the trustee’s temptation promotes trustworthiness. On the other, promise properties can decrease advantageous outcome inequality on the part of the trustee. The impact of guilt-based inequality aversion of trustees can thus be smaller in the HTG than in the TG. This is also an obstacle in situations with asymmetric outcomes that typically arise from transaction costs. In the case of cheap-talk promises, trustees’ outcome-based motivations cannot predict any difference in trustworthiness and trustfulness. Comparing behavior in structurally identical decision situations that only differ in the behavioral context needs to account for process-based motivations. Consider that the trustee in the HTG decides a second time when choosing between honoring and abusing trust. Therefore, not only intention-based motivations play a role, but also a desire for self-consistency. Self-consistency is another process-based motivation. Peo-


ple seek to behave consistently with their beliefs, attitudes, and previous choices (for reviews see, e.g., Cialdini 2001: Ch. 3; Gass and Seiter 2007: Ch. 3; Kunda 2002; Webster 1975). For instance, people tend to behave in accordance with agreements. This also holds for cases in which people discover hidden costs, feel uneasy upon rethinking, or have been instructed or even forced against their inclinations (for examples of studies on persuasion and salesman practices see Cialdini 2001; Gass and Seiter 2007). People then adjust their beliefs and opinions to maintain an impression of self-consistency. Inconsistent behavior causes cognitive dissonance, and this in turn inflicts internal tension and distress, which people seek to avoid (Akerlof and Dickens 1982; Aronson 1992; Festinger 1957; Heider 1944; Heider 1958). People use methods for reducing cognitive dissonance because they benefit from behaving consistently (for an overview see, for example, Gass and Seiter 2007:58). First, self-consistency is crucial for keeping up a self-schema, or “an integrated set of memories, beliefs, and generalizations about one’s behavior in a given domain” (Kunda 2002:452). Such self-knowledge in specific areas is the basis for people’s general self-evaluation and self-esteem. Second, self-consistency helps create an image of self-competence in a complex world. By behaving consistently, people maintain the impression that they control events in their life. Third, self-consistency reduces decision costs as people do not have to rethink all aspects of an identical (or similar) situation (Cialdini 2001). In the following, we explain how we expect intention-based motivations and the desire for self-consistency to influence trustworthiness and trustfulness (for a more detailed discussion, see Vieth 2009).

2.2.1 The effects of obligation and self-consistency on trustworthiness Kindness of placed trust after promising trustworthiness. We start with analyzing trustworthiness in a decision situation after the trustee has made a promise to honor trust (TG|H+2 ). First, trustfulness activates a feeling of obligation to return the favor by honoring that trust (Hypothesis 1). This also holds after the trustee has promised trustworthiness, because in doing so, the trustee shares responsibility for the trustor’s trustfulness. This increases the felt obligation to return the favor of placed trust. Second, the desire for self-consistency induces the trustee to keep his promise. Lying to exploit others’ trustfulness causes distress (Baumgartner et al. 2009). Due to the desire for self-consistency, some trustees with weak feelings of obligation would abuse trust in the TG but honor trust in the TG|H+2 just because they promised trustworthiness. To the extent that the effect of self-consistency adds to the promoting impact of feelings of obligation, trustees are more likely to honor trust. Furthermore, we assume that the properties of the promise moderate the promoting influence of promising trustworthiness on honoring trust. First, the binding value objectively reduces the trustee’s temptation and hence the trustor’s risk. Regarding intention-based motivations, the more objectively binding the promise, the


less the trustee feels obliged to keep it. Second, if trust would not be placed after the trustee promised trustworthiness, the incurred transaction costs were wasted. Therefore, trustees might perceive placed trust as a reward for the sacrificed transaction costs. This increases the trustee’s feelings of obligation to return the favor. Hypothesis 2. Compared to the TG (i.e., without promise opportunity), trust is more likely to be honored after trustworthiness has been promised (TG|H+2 ). Moreover, the effect of placed trust on trustworthiness becomes less promoting with increasing binding value v2 , but more promoting with increasing transaction costs c. Unkindness of placed trust after omitting the promise. We now turn to the decision situation that arises after the trustee decided not to promise trustworthiness (TG|H02 ). Consider that trustees decide whether to honor trust after they are confronted with trustfulness, despite having omitted the promise. In general, the trustor’s trustfulness invokes a feeling of obligation to return the favor of placed trust. However, trustees not promising trustworthiness reduce internal distress if they are convinced they would abuse trust. Thus, after a withheld promise, feelings of obligation compete with the desire for self-consistency: self-consistency induces the trustee to abuse trust, while feelings of obligation induce the trustee to honor trust. Self-consistency can undermine feelings of obligation because it prevents trustees who omitted the promise from feeling the unease of cognitive dissonance. Trustees might even perceive placed trust in a negative way after they explicitly refuse to promise trustworthiness. Trustees then might abuse trust because they feel irritated. Irritation can arise from the caused intrinsic conflict between obligation and self-consistency or from perceiving trustfulness as a manipulation attempt (Cialdini 2001). Trustees might even consider placed trust as unintelligent behavior not worthy of reward. Moreover, abusing trust seems more legitimate after having omitted the promise. For these reasons, we expect less trustworthiness when the promise has been omitted (TG|H02 ) than when no promise is possible (TG). Again, the properties of the promise deserve additional attention. The binding value becomes relevant for trustees who renege on their promise. Trustees who do not promise trustworthiness because of a high binding value thus show their intention to abuse trust. The higher the binding value of an omitted promise, the more likely trustees are to perceive trustfulness as unintelligent behavior. The feeling of obligation is therefore undermined more strongly the higher the binding value, and trustworthiness decreases. Next, trustees might refrain from making the promise because of high transaction costs. Trustees cannot easily convince themselves that they would abuse trust anyway. Trustees might rather hope that trustors accept that the trustee refrains from making the promise because of high transaction costs. Thus, the higher the transaction costs, the stronger the feelings of obligation to behave in a trustworthy manner and the less likely it becomes that these feelings will be outweighed by the desire for self-consistency. High binding values of a withheld promise can also foster a selection of trustees who do not experience strong feelings of obligation. In



contrast, high transaction costs also induce trustees with strong feelings of obligation to withdraw from making the promise. Hypothesis 3. Compared to the TG (i.e., the decision situation without a promise opportunity), trust is less likely to be honored after a possible promise of trustworthiness has not been made (TG|H02 ). Moreover, the effect of placed trust on trustworthiness becomes more hampering with increasing binding value v2 , but less hampering with increasing transaction costs c.

2.2.2 The effects of obligation, indignation, and anticipated self-consistency on trustfulness Kindness of promised trustworthiness. We now analyze the trustor’s decision of whether to place trust or not. Promised trustworthiness (TG|H+2 ) involves a prospect for the trustor to receive increased outcomes (R1 > P1 ) from honored trust. In this sense, a voluntary promise of trustworthiness is a friendly advance that invokes feelings of obligation to return the favor by placing trust. Moreover, trustors might anticipate the general desire for self-consistency that induces trustees to keep their promise (Hypothesis 2). We assume the promoting effect of received promises to vary with the properties of the promise. By making a promise with a high binding value, a trustee reduces his temptation (T2 − R2 ) to abuse trust. Trustors might perceive this as an indication of the trustee’s willing to bind himself and thereby to help the trustor to place trust. Since transaction costs are an irreversible investment, trustors might consider costly promises as an indication of particular kindness. Receiving a promise despite high transaction costs induces trustors to feel an even stronger obligation to return the favor. Hypothesis 4. Compared to the TG (i.e., the decision situation without a promise opportunity), trust is more likely to be placed after trustworthiness has been promised (TG|H+2 ). Moreover, the effect of receiving the promise of trustworthiness on placing trust becomes more promoting with increasing binding value v2 and more promoting with increasing transaction costs c. Unkindness of an omitted promise of trustworthiness. Next, recall the arguments that the trustee’s desire for self-consistency in general reduces trustworthiness after the trustee omits the promise (TG|H02 ) (Hypothesis 3). Anticipating this hampering effect, trustors become more reluctant to place trust. The trustee explicitly chooses not to promise his trustworthiness, which implies the choice of not providing the trustor with the prospect of a gain. Trustors might therefore perceive the omitted promise as unfriendly, which creates feelings of indignation. In this case, the trustor would even be driven to retaliate by withholding trust.


The felt indignation might increase with the binding value of the omitted promise, as this indicates that the trustee considered abusing trust (Hypothesis 3). Regarding transaction costs, trustors might accept that the promise has not been made simply because of high transaction costs. Hypothesis 5. Compared to the TG (i.e., the decision situation without a promise opportunity), trust is less likely to be placed after a possible promise of trustworthiness has not been made (TG|H02). The effect of an omitted promise on placing trust becomes more hampering with increasing binding value v2, but less hampering with increasing transaction costs c. Summarizing the hypotheses highlights the pattern of reciprocal behavior that results from intention-based motivations and from self-consistency (Table 1). Kind advances are returned in kind, and unkind behavior triggers unkind responses. The binding value mostly increases the effects of made and omitted promises. An exception is the impact of made promises on trustworthiness (TG|H+2), which is hampered by the binding value. Transaction costs always promote trustfulness and trustworthiness as they increase the positive impact of made promises (TG|H+2) and mitigate the negative influence of omitted promises (TG|H02). Note again the opposing effects: the binding value promotes self-consistency after the trustee has omitted the promise (TG|H02), which reduces trustworthiness, whereas transaction costs strengthen feelings of obligation that increase trustworthiness.

Tab. 1: Overview of hypotheses and notation.

                        Placing trust   Honoring trust
Behavioral contexts
  DG                         —               −          Dictator Game (no placed trust)
  TG                       (ref.)          (ref.)       Trust Game (no promise option)
  TG|H+2                     +               +          TG after a made promise to honor trust
  TG|H02                     −               −          TG after an omitted promise to honor trust
Binding value v2                                        Change of the effects of made and omitted promises of
  in TG|H+2                  +               −          trustworthiness with increasing binding value v2
  in TG|H02                  −               −
Transaction costs c                                     Change of the effects of made and omitted promises of
  in TG|H+2                  +               +          trustworthiness with increasing transaction costs c
  in TG|H02                  +               +

Notes: The hypotheses for effects of behavioral contexts are formulated in terms of differences toward the TG.


3 Experimental design, data, and statistical methods 3.1 Experimental design: sets of (sub)games The aim of our experiment is to analyze the effects of preceding decisions on subsequent behavior in trust situations. We thus need to control for diverse outcome-based motivations and for the general personal characteristics of participants. For this purpose, we designed our lab experiment as sets of (sub)games (TGs, HTGs, and DGs) with identical extensive forms (i.e., identical choice structure and payoff structure). We constructed our sets of (sub)games in such a way that payoffs in games without a behavioral context were exactly the same as payoffs in the corresponding subgames of the TG and the HTG. For this purpose, we subtracted the absolute values of promise properties at the beginning of some HTGs and added them for other HTGs. We then completed our design with separate TGs and DGs for the different payoff combinations (for details, see Vieth and Weesie 2006). For example, consider the following baseline payoffs used in a set of (sub)games: R1 = R2 = 60, P1 = P2 = 30, S1 = 20, and T2 = 100 (see the numerical example in Figure 3). These payoffs constitute the outcomes of a subgame after the promise was not made (TG|H02 of HTG1). The same payoffs were used in a separate TG and in a separate DG. In an HTG, making the promise changes the trustee’s payoffs because the promise properties are subtracted. We needed a subgame after the promise was made (TG|H+2 ) with payoffs identical to the decision situation after the promise was not made (TG|H02 ). Therefore, we constructed a second HTG by adding the transaction costs (c = 5) and the binding value (v2 = 10) at the beginning to the respective trustee’s payoffs. This yielded R2 = 60 + 5, P2 = 30 + 5 and T2 =100 + 5 + 10 after the promise was not made (TG|H02 of HTG2). In the subgame after the promise was made, the initially added promise properties were subtracted again (TG|H+2 of HTG2). Making the promise in HTGs with promise properties added in the beginning (HTG2) thus results in a subgame with payoffs identical to those in the HTG after the promise is not made starting with the baseline payoffs (HTG1). Similarly, if the promise properties were subtracted at the beginning (HTG3, not included in Figure 3), the subgame after the promise was not made has exactly the same payoffs as the subgame after the promise was made in the HTG starting with the baseline payoffs (HTG1). These implicit shifts of payoffs in HTGs, TGs, and DGs on the scale of the promise properties were not explicit to participants. They were hidden by variations of outcome parameters and by mixing sets of (sub)games (described below). We generated different sets of sub(games) with identical payoffs by varying some outcome parameters (Table 2). These variations were included in the design for the methodological reasons mentioned above and for possible further analyses (for details, see Vieth and Weesie 2006; for further analyses, for example, Vieth 2009). Four baseline payoff combinations were distinguished by varying the payoffs resulting from


[Figure: game trees of HTG1 (baseline payoffs; the promise properties v2 and c are subtracted from the trustee’s payoffs after a made promise), the separate TG and DG with the same baseline payoffs, and HTG2 (promise properties added to the trustee’s payoffs at the outset and subtracted again after a made promise). In each HTG the trustee first decides Promise/No promise, then the trustor decides Trust/No trust, and then the trustee decides Honor/Abuse; in the DG the dictator decides Share/Keep.]
Notes: The design allows for the comparison of the trustor’s behavior in (sub)games, indicated by dashed boxes (TG|H02 of HTG1, TG, and TG|H+2 of HTG2), and of the trustee’s behavior in (sub)games, indicated by dotted boxes (TG|H02 of HTG1, TG, DG, and TG|H+2 of HTG2). These sets of (sub)games constitute the “subject-payoff response sets” used in the statistical analyses. Numerical example: S1^high = 20, T2^high = 100, R1 = R2 = 60, P1 = P2 = 30, v2^low = 10, c^low = 5.
Fig. 3: Sets of games with identical subgames.
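The construction of payoff-identical (sub)games depicted in Figure 3 can be made concrete with a small sketch (not part of the original chapter; the function and variable names are illustrative assumptions, and the payoffs follow the chapter’s notation and the numerical example of the figure):

# Minimal illustrative sketch (not from the original study): constructing
# payoff-identical (sub)games by adding or subtracting the promise properties.
# P2, R2, T2 are the trustee's payoffs; v2 = binding value, c = transaction costs.

def trustee_subgames(P2, R2, T2, v2, c, added_at_start=False):
    """Trustee payoffs in the no-promise and made-promise subgames of an HTG."""
    shift = 1 if added_at_start else 0
    no_promise = {"no_trust": P2 + shift * c,
                  "honor":    R2 + shift * c,
                  "abuse":    T2 + shift * (v2 + c)}
    promise = {k: v - ((v2 + c) if k == "abuse" else c)
               for k, v in no_promise.items()}
    return {"no_promise": no_promise, "promise": promise}

htg1 = trustee_subgames(30, 60, 100, v2=10, c=5)                       # baseline at the start
htg2 = trustee_subgames(30, 60, 100, v2=10, c=5, added_at_start=True)  # properties added first

# The made-promise subgame of HTG2 has exactly the baseline payoffs, i.e. the
# payoffs of the no-promise subgame of HTG1 (and of the separate TG and DG).
assert htg2["promise"] == htg1["no_promise"]   # {'no_trust': 30, 'honor': 60, 'abuse': 100}
print(htg1["promise"])                         # {'no_trust': 25, 'honor': 55, 'abuse': 85}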

abused trust (S1 and T2 ) at two levels each (low, high). As baseline payoffs, we chose 0 or 20 for S1 and 80 or 100 for T2 . The baseline payoffs after no trust (P i ) and after honored trust (R i ) were fixed at P1 = P2 = 30 and R1 = R2 = 60. The two promise properties were varied at three levels each (no, low, high). “No” indicates v2 = 0 or


Tab. 2: Outcome parameters of the experimental design.

Design parameters: S1(2) × T2(2) × v2(3) × c(3)

Payoff parameters: S1(2) × T2(2)        Promise properties: v2(3) × c(3)
  S1^low = 0                            v2^no = 0
  S1^high = 20                          v2^low = 1/4 ⋅ (T2 − R2) = {5, 10}
  T2^low = 80                           v2^high = 3/4 ⋅ (T2 − R2) = {15, 30}
  T2^high = 100                         c^no = 0
  R1 = R2 = 60                          c^low = 1/6 ⋅ (R2 − P2) = 5
  P1 = P2 = 30                          c^high = 4/6 ⋅ (R2 − P2) = 20

c = 0. Low binding values were defined as 1/4 ⋅ (T2 − R2), high binding values as 3/4 ⋅ (T2 − R2). For instance, consider T2^high = 100 and R2 = 60. In this case, the possible binding values in our design are v2^no = 0, v2^low = 1/4 ⋅ (T2 − R2) = 10, and v2^high = 3/4 ⋅ (T2 − R2) = 30. In some HTGs with promise properties initially subtracted, the promise is perfectly binding (v2 > T2 − R2). Levels for transaction costs are defined on the scale R2 − P2 (i.e., the “gain of cooperation”) with c^low = 1/6 ⋅ (R2 − P2) = 5 and c^high = 4/6 ⋅ (R2 − P2) = 20. Binding values and transaction costs resulted in 9 combinations of promise properties, yielding 36 combinations together with the baseline payoffs. As explained above, we designed sets of (sub)games by adding or subtracting promise properties. For initially added promise properties, we selected only combinations in which both promise properties were positive (c > 0 and v2 > 0). Since the cheap-talk case (c = 0 and v2 = 0) does not change the payoffs, this design yields 80 combinations of total payoffs that can occur in the sets of (sub)games. Each participant made decisions in two sets of (sub)games in the role of player 1 (trustor, receiver), and in two other sets in the role of player 2 (trustee, dictator). For each encounter, participants were randomly and anonymously matched with another participant. We employed a “stranger matching”, whereby the probability of re-matching was minimized within each type of game (for details, see Vieth and Weesie 2006). In this sense, each game constituted a single encounter (one-shot game). The sets of (sub)games were mixed by clustering the types of games. First, 12 TGs were played, then 14 HTGs, and thereafter 10 DGs. In two of the TGs and in two of the HTGs, trustees had no objective incentive to abuse trust (T2 < R2). These games are not involved in the reported analyses, but were included in the design to check the attention of participants. In these temptation-free decision situations, we observed 82.1 % trustfulness and 95.3 % trustworthiness, which indicates that participants paid sufficient attention to the objective outcomes. These percentages are

significantly higher (p < 0.0001) than the highest average levels (in the TG|H+2) in our analyses (Table 4). Note that we neither expect full trustfulness nor full trustworthiness because of possible influences of negative other-regarding outcome-based motivations (e.g., aggressive or competitive tendencies). A brief questionnaire about sociodemographic characteristics of participants (e.g., gender, age, education) separated the TGs from the HTGs. Other questions about personal attitudes and opinions followed the DGs. Analyses of questionnaire items are not reported here. In each game cluster, player roles were changed after half of the periods. In addition to randomly changing interaction partners, payoffs and promise properties (in HTGs) changed from one period to the next. The combinations and sequences of payoffs and promise properties were varied across experimental sessions employing a factorial design. Our experimental design improves in various respects on previous research (for further discussion, see Vieth 2009). Studies by Cox (2004), McCabe, Rigdon, and Smith (2003), and Snijders (1996) (who likewise constructed (sub)games with identical payoffs) are specifically related to our approach. Our improvements are based on the combination of three ingredients. First, our within-subject design allows conclusions to be drawn at the individual level. Effects on the individual level can even point in the opposite direction from the combined effect on an aggregate level (on inequality aversion, see Blanco, Engelmann, and Normann 2011; on the “ecological fallacy”, see Robinson 1950). Second, by eliciting actual and sequential decisions from our participants, our behavioral contexts are created endogenously by preceding kind and unkind decisions, and all decisions are fully outcome-relevant. In contrast, when participants are asked to indicate their choices for all potential states of the decision situation (“strategy method”, Selten 1967), decisions remain hypothetical (even if a randomly chosen one is paid). Research suggests that this undermines the influences of emotions and creates artificial consistency in each participant’s decisions (Brosig, Weimann, and Yang 2003; Casari and Cason 2009; McCabe, Rigdon, and Smith 2003; McCabe, Smith, and LePore 2000; McKelvey and Palfrey 1998; Roth 1995:322–323). Third, by employing the Trust Game, which has binary choice sets, we maximally reduce the ambiguity of decisions with respect to perceived kindness. This enables us to derive and test hypotheses without additional information or specific assumptions about people’s preferences and beliefs (see also the discussion in Section 5). The experiment was computer-assisted, for which we used the software package “z-Tree” (Fischbacher 2007). The decision situations were presented on the screen as a table displaying own and others’ choice options, the respective objective outcomes, and choices made on preceding decision stages. All choice options were neutrally labelled (“up”, “down”, “sending the message ‘I will choose up.’”). In addition to general information on paper, participants received on-screen instructions and a tutorial before each game cluster. Outcomes were displayed in the tables as points representing monetary gains (one Euro cent for each point). Participants were paid anonymously and immediately after the experiment. On average, participants earned 16 €. The experiment was conducted in November 2006 at the ELSE lab at Utrecht University. During the experiment, all texts were presented in Dutch (for details on experimental design, instructions, and screen setup, see Vieth and Weesie 2006). Using “ORSEE” (Greiner 2015), 156 persons were recruited from the ELSE participant pool and took part in nine groups of 16 to 20 participants. Nearly all the participants were students enrolled in various fields at Utrecht University.
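As a further illustration of the parameter variation described above and summarized in Table 2, the following sketch (not part of the original chapter; all names are illustrative) enumerates the 2 × 2 × 3 × 3 = 36 design combinations of payoff parameters and promise properties:

# Illustrative sketch: the design grid of Table 2. v2 is defined relative to
# (T2 - R2) and c relative to (R2 - P2), so the absolute v2 levels depend on T2.
from itertools import product

R, P = 60, 30
S1_levels = [0, 20]                  # low / high
T2_levels = [80, 100]                # low / high
v2_fractions = [0, 1/4, 3/4]         # no / low / high binding value
c_fractions = [0, 1/6, 4/6]          # no / low / high transaction costs

design = [{"S1": s1, "T2": t2,
           "v2": round(fv * (t2 - R)),
           "c":  round(fc * (R - P))}
          for s1, t2, fv, fc in product(S1_levels, T2_levels,
                                        v2_fractions, c_fractions)]

print(len(design))    # 36 design combinations
print(design[-1])     # {'S1': 20, 'T2': 100, 'v2': 30, 'c': 20}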

3.2 Data and statistical methods

The 156 participants made 1716 “placing trust” decisions in the role of the trustor and 1389 “honoring trust” decisions as a trustee or dictator (Table 3). Of the 80 possible different payoff combinations, 76 were realized for “placing trust” and 71 for “honoring trust”. The variation is mainly due to the elicitation of decisions in actually realized subgames. For instance, a trustee can only decide whether to honor trust after trust has been placed. For our analyses, we constructed “subject-payoff response sets”, that is, we grouped the decisions of each subject made in (sub)games of identical extensive form (Figure 3). As the grouping was based on total payoffs (i.e., promise properties are subtracted in the case the promise has been made), the two subgames of HTGs were separated into different groups (except for the cheap-talk case). This yielded 929 subject-payoff response sets of “placing trust” decisions made in TGs and subgames of HTGs. For “honoring trust” decisions, we had 877 subject-payoff response sets involving DGs and subgames of TGs and HTGs. Each subject-payoff response set involved 1–5 decisions.

Tab. 3: Number of cases and units of analyses.

                                     Placing trust              Honoring trust
Number of ...                     all data   in analyses     all data   in analyses
subjects                             156         118            156          70
total payoffs                         76          48             71          35
subject-payoff response sets         929         212            877         101
decisions in total                  1716         560           1389         248

Notes: Total payoffs are combinations of payoffs and promise properties.

The grouping in subject-payoff response sets is reflected by a “fixed effects” statistical model in which we make minimal assumptions about differences between subjects and outcomes to analyze the “pure” effects of preceding behavior. In such a fixed effects approach, only subject-payoff response sets in which decisions vary carry statistical information. Thus, our fixed effects approach excludes subject-payoff response sets that involve only one single decision or consist of more than one but always the same decision. In our analyses, we therefore had 212 subject-payoff response sets with


560 “placing trust” decisions and 101 subject-payoff response sets with 248 “honoring trust” decisions (Table 3). Note that the selection of informative cases is a strength of the powerful design we employed to explore the effect of behavioral advances. We have sufficient decisions and response sets for the presented statistical analyses. The numbers of decisions and response sets involved in the analyses are summarized per subgame in Table 4. For a made or omitted cheap-talk promise (c = 0 and v2 = 0), some response sets involved two decisions for the same subgame. For instance, this is the case for 49 “placing trust” decisions (204 − 155) after the trustee promised trustworthiness (TG|H+2). On average across all (sub)games, subjects decided in 269 of the 560 cases (48.0 %) to place trust and in 127 of the 248 cases (51.2 %) to honor trust or to share gains. The frequencies of placed trust and of honored trust seem to differ considerably between the behavioral contexts. However, behavioral contexts are created endogenously by the specific decisions made. For instance, trustees might have had the chance to honor trust and might have done so more often in some behavioral contexts than in others just because the outcomes were perceived as favorable. Therefore, testing our hypotheses about effects of behavioral advances on subsequent decisions requires controlling for influences of outcome-based motivations and for individual heterogeneity.

Tab. 4: Summary of data in the analyses per (sub)game.

              Placing trust (x)             Honoring trust (z)
          N(dec)   N(sets)    % x        N(dec)   N(sets)    % z
DG                                         101       101     28.7
TG          212       212     57.1          63        63     52.4
TG|H+2      204       155     59.8          66        63     84.8
TG|H02      144       139     18.1          18        18     50.0
∑           560                48.0        248               51.2

Notes: Only mixed response sets are involved in the analyses. Constant response sets carry no statistical information in a fixed effects approach. The percentages of placed trust (% x) and honored trust (% z) are calculated for the respective number of decisions.
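To illustrate how the analysis sample of Table 4 comes about, the following sketch (not taken from the original analysis files; all column names and values are illustrative assumptions) groups decisions into subject-payoff response sets and keeps only the mixed sets that carry information in a fixed-effects approach:

# Illustrative sketch: build subject-payoff response sets and drop the sets
# in which the binary decision does not vary (they carry no information in a
# fixed-effects / conditional-likelihood analysis).
import pandas as pd

decisions = pd.DataFrame({
    "subject":      [1, 1, 1, 2, 2, 3, 3],
    "total_payoff": ["A", "A", "A", "A", "A", "B", "B"],   # payoff combination id
    "context":      ["TG", "TG|H+", "TG|H0", "TG", "TG|H+", "TG", "TG|H+"],
    "trust":        [1, 1, 0, 0, 0, 1, 1],                 # decision (0/1)
})

decisions["set_id"] = (decisions["subject"].astype(str) + "-"
                       + decisions["total_payoff"])

# A response set is informative only if the decision varies within it.
mixed = decisions.groupby("set_id")["trust"].transform("nunique") > 1
analysis_sample = decisions[mixed]
print(analysis_sample)   # only subject 1's response set remains in this toy example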

To control for other effects, we used logistic regression models with fixed effects to analyze the decisions nested in the subject-payoff response sets. Models were fitted by conditional maximum likelihood. Regarding this approach, see the Rasch model (Fischer and Molenaar 1995; Rasch [1960] 1980), which is known as the fixed effects estimator for binary panel data in econometrics (Chamberlain 1980). The baseline models can be described as follows:

Logit Prob(y_ijk | σ_ij) = σ_ij + η′_ijk β


The model specifies the probability of trustfulness or trustworthiness of a subject i in the behavioral context of a (sub)game k that has a total payoff j, where σ_ij represents the fixed effects for subject-payoff combinations, η_ijk are attributes of the behavioral contexts k (and of controls) that vary within subject-payoff combinations, and β are parameters. Our analysis assumes neither that all subjects would have the same responses for all payoff combinations nor that the difference in the probability of trustfulness or trustworthiness between total payoffs would be the same for all subjects. In fact, subjects and payoffs may fully interact. We do however assume that the effect of a behavioral context on behavior is the same for all subject-payoff combinations (see the discussion for further remarks).
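A minimal sketch of such a model fit (not the authors’ code; it assumes the ConditionalLogit class available in recent versions of the Python package statsmodels, and all variable names and data values are illustrative) looks as follows:

# Fixed-effects logit fitted by conditional maximum likelihood, with one fixed
# effect per subject-payoff response set. The set-specific intercepts sigma_ij
# are conditioned out of the likelihood, so only within-set variation in the
# outcome identifies the coefficients of the behavioral-context dummies.
import pandas as pd
from statsmodels.discrete.conditional_models import ConditionalLogit

df = pd.DataFrame({
    "honor":   [1, 0, 1, 0, 1, 1, 0, 0],   # trustworthiness decision (0/1)
    "promise": [1, 0, 0, 1, 1, 0, 1, 0],   # made-promise dummy (illustrative)
    "period":  [1, 2, 3, 4, 1, 2, 3, 4],   # past periods per game (control)
    "set_id":  [1, 1, 1, 1, 2, 2, 2, 2],   # subject-payoff response set
})

model = ConditionalLogit(df["honor"], df[["promise", "period"]],
                         groups=df["set_id"])
print(model.fit().summary())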

4 Results 4.1 Analyses for trustworthiness Trustees decide whether to honor trust or not after trust has been placed in the TG or after the trustee’s decision of whether or not to promise trustworthiness in the HTG (TG|H+2 and TG|H02 ). Furthermore, trustees decide whether or not to share gains without behavioral context in the DG. Four differently embedded sharing decisions can thus be distinguished. Following our hypotheses, we use the TG as the reference category in our analyses (Table 5). The first model for trustworthiness (model TW1) contains the dummy variables for the behavioral contexts (Panel A). We present coefficients rather than marginal effects or unit effects because response probabilities can only be estimated at the cost of making specific assumptions about the distribution of fixed effects. Thus, we test our hypotheses in terms of relative effects: effects on the odds rather than effects on the probability (Hoijtink and Boomsma 1995). Wald tests for the differences between the coefficients of (sub)game dummies are reported in Panel C of Table 5. We control for the number of past periods per game because subject-payoff response sets are composed of decisions made in different periods. The period is counted for each type of game (i.e., 1–12 for the TG, 1–14 for the HTG, and 1–10 for the DG). We thus control for a behavioral trend of increasing or decreasing trustworthiness (e.g., resulting from experiences in previous interactions; see the discussion in Section 5 for further remarks). In the second model for trustworthiness (model TW2), we include the properties of the promise (the binding value and transaction costs). We distinguish these effects for the two HTG subgames resulting from the trustee’s decision about making the promise. Again, these interaction effects are relative ones on the odds of the probability that trust is honored. The coefficients for the two HTG subgames in model TW2 represent the effects of making or omitting the promise in cheap-talk cases (v2 = 0 and c = 0). Note again that changes in objective payoffs due to promise properties are captured by including fixed effects for subject-payoff response sets. Co-


efficients for behavioral contexts and for promise properties therefore represent the effects of preceding decisions that are not based on objective outcomes. The likelihood-ratio test (Panel B) for model TW1 against the null model with period control shows that trustworthiness generally differs significantly between behavioral contexts (LR χ²(df = 3) = 36.20; p < 0.0001). Moreover, properties of the promise in general significantly moderate the influences that the trustee’s promise decision exerts on trustworthiness. This is indicated by the likelihood-ratio test for model TW2 against model TW1 (LR χ²(df = 4) = 10.98; p = 0.0268). Separate likelihood-ratio tests likewise show that the binding value (LR χ²(df = 2) = 5.84; p = 0.0538) and the transaction costs (LR χ²(df = 2) = 8.39; p = 0.0150) significantly moderate the influences of making and omitting the promise in the model TW2. However, separate likelihood-ratio tests for the two HTG subgames in the model TW2 show that only the influence of omitted promises is significantly influenced by the properties of the promise (LR χ²(df = 2) = 6.77; p = 0.0339). No support for such moderating influences can be found for made promises (LR χ²(df = 2) = 4.50; p = 0.1057). In the following, we describe and discuss the results for the behavioral contexts (Table 5). We argued that trustees perceive placed trust as a friendly advance that invokes feelings of obligation which increase trustworthiness (Hypothesis 1). Although the coefficient is only marginally significant, we find that trustees are indeed less likely to share gains in the DG than after trust has been placed in the TG (Table 5) (see also Cox 2004; Gautschi 2000; McCabe, Rigdon, and Smith 2003). We next hypothesized that promising to honor trust (TG|H+2) promotes trustworthiness by increasing feelings of obligation and activating the desire for self-consistency (Hypothesis 2). Supporting this reasoning, the results show a strongly positive and highly significant coefficient of having made the promise. This holds for the combined effects, including those of promise properties (model TW1) and even cheap-talk promises (model TW2). In fact, the positive effect of making the promise is significantly stronger than the negative effect of the absence of placed trust in the DG (test of differences between absolute coefficients: Wald χ²(df = 1) = 4.50; p = 0.0338 in model TW1 and Wald χ²(df = 1) = 2.84; p = 0.0918 for cheap-talk promises in model TW2). This suggests that placed trust after made promises motivates trustworthiness through both self-consistency and feelings of obligation. However, we cannot ascertain whether self-consistency fosters feelings of obligation or complements its impact. The strong increase in trustworthiness can also be due to the very strong impact of self-consistency despite possibly reduced feelings of obligation (see the discussion in Section 5). Regarding the moderating influences of promise properties, the binding value should hamper the promoting influence of having promised to honor trust because placed trust becomes a smaller favor with increasing binding value (Hypothesis 2). In our analyses, the coefficient of the binding value after the promise has been made is not significant (model TW2). Thus, we do not find support for our idea. However, the results show that the influence of making the promise tends to be more promoting


Tab. 5: Logistic regression of trustworthiness with fixed effects for subject-payoff response sets.

(A) Regression coefficients
                                          TW1               TW2
                       Hypotheses      b       se        b       se
Behavioral contexts
  DG                     H1: −       −0.54∘   0.29     −0.55∘   0.29
  TG                                (ref.)             (ref.)
  TG|H+2                 H2: +        1.92∗∗∗ 0.53      1.74∗∗∗ 0.60
  TG|H02                 H3: −        0.18    0.57      1.01    1.34
Binding value v2
  in TG|H+2              H2: −                         −0.03    0.04
  in TG|H02              H3: −                         −0.24∘   0.13
Transaction costs c
  in TG|H+2              H2: +                          0.12∘   0.07
  in TG|H02              H3: +                          0.18∘   0.10
Past periods per game                −0.15    0.11     −0.16    0.12

(B) Model comparisons
                            TW1                 TW2
                       χ²        df        χ²        df
LR test (TW0)        36.20∗∗∗     3      47.18∗∗∗     7
LR test (TW1)                            10.98∗       4

(C) Pairwise comparisons of behavioral contexts (Wald tests)
                            TW1                 TW2
                      ∆b        se        ∆b        se
TG|H02 − TG|H+2     −1.75∗∗    0.64     −0.73      1.35
DG − TG|H+2         −2.47∗∗∗   0.55     −2.29∗∗    0.61
DG − TG|H02         −0.72      0.55     −1.57      1.35

Notes: N(response sets) = 101, N(decisions) = 248, N(subjects) = 70; (sub)games (0/1), past periods per game (1 … 10/12/14), binding value v2 = (0, 5, 10, 15, 30), transaction costs c = (0, 5, 20). Likelihood-ratio tests are reported against the null model with period control (TW0) and against the model TW1. Log likelihood (ll): ll|TW0 = −86.81, ll|TW1 = −68.71, ll|TW2 = −63.23. ∗∗∗ p ≤ 0.001, ∗∗ p ≤ 0.01, ∗ p ≤ 0.05, ∘ p ≤ 0.1 (two-sided p-values).

with increasing transaction costs. This suggests that trustees perceive placed trust as more kind and as a reward for having sacrificed high transaction costs (Hypothesis 2). If the trustee has explicitly omitted the promise of trustworthiness (TG|H02), but nevertheless gets the chance to decide about honoring trust, the influence of self-consistency might compete with feelings of obligation (Hypothesis 3). The results show that the coefficient for having omitted the promise is positive, though not significant. This holds for both the estimate involving influences of promise properties (model TW1; and see Snijders 1996) and the estimate for omitted cheap-talk promises (model TW2). The insignificant coefficient suggests that the two motivations indeed


compete with one another. The positive sign of the coefficient might indicate that feelings of obligation are not easily undermined by the desire for self-consistency. Considering the properties of the omitted promise helps disentangle the influences of the motivations. As argued above, the binding value of the omitted promise should promote self-consistency, while transaction costs that would have been associated with making the promise should strengthen feelings of obligation (Hypothesis 3). Our results provide support for both the hampering influence of self-consistency and the promoting influence of feelings of obligation. The effect of placed trust despite the omitted promise tends to promote trustworthiness as the transaction costs that the trustee would have sacrificed increase. The influence of omitted promises on trustworthiness is significantly positive for transaction costs c ≥ 7 (b = 2.29 (= 1.01 + 7 ⋅ 0.18), se = 1.37, Wald χ²(df = 1) = 2.77; p = 0.0961). This indicates that trustees feel obliged to reward trustfulness because they might assume that trustors have been so kind as to show their understanding. In contrast, we find a tendency that omitted promises significantly hamper trustworthiness for binding values v2 ≥ 18 (b = −3.23 (= 1.01 − 18 ⋅ 0.24), se = 1.95, Wald χ²(df = 1) = 2.74; p = 0.0976). Thus, mechanisms of cognitive dissonance reduction might have been successful. Trustees might even perceive placed trust negatively after having omitted a promise with high binding values. The desire for self-consistency then undermines the unwanted feeling of obligation induced by placed trust.
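How such conditional effects and their Wald tests are obtained from the estimates of model TW2 can be illustrated with a short sketch (not the authors’ code; the covariance matrix below is a placeholder built only from the reported standard errors and ignores the covariances between coefficients, so the resulting standard errors only roughly approximate the ones reported above):

# Illustrative sketch: the effect of an omitted promise at given promise
# properties is a linear combination of the TG|H0 coefficient and its
# interactions with v2 and c; its variance follows from the quadratic form
# w' V w with the coefficients' covariance matrix V.
import numpy as np

def conditional_effect(b, V, v2, c):
    w = np.array([1.0, v2, c])        # weights for (b_TG|H0, b_v2, b_c)
    eff = w @ b
    se = np.sqrt(w @ V @ w)
    return eff, se, (eff / se) ** 2   # effect, s.e., Wald chi-square (1 df)

b = np.array([1.01, -0.24, 0.18])     # point estimates from model TW2
V = np.diag([1.34, 0.13, 0.10]) ** 2  # placeholder: squared s.e.'s, no covariances

print(conditional_effect(b, V, v2=0, c=7))    # omitted promise with sizeable transaction costs
print(conditional_effect(b, V, v2=18, c=0))   # omitted promise with a high binding value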

4.2 Analyses for trustfulness

We distinguish three differently embedded TGs for the trustor’s decision of whether or not to place trust: the TG without behavioral context, and the two subgames in the HTG after the trustee has decided whether or not to promise trustworthiness (TG|H+2 and TG|H02). The results for trustfulness (Table 6) are presented in a similar way to the results for trustworthiness (as described for Table 5). The period in which the trustor decides whether to place trust has a strong and highly significant negative effect on trustfulness (Table 6). Recall that no effect of the decision period on trustworthiness has been found (Table 5). Thus, trustors might have experienced abused trust in previous encounters and become more reluctant to place trust. Likewise contrasting the analyses for trustworthiness, we do not find support for the idea that promise properties would moderate effects of behavioral contexts on trustfulness (LR χ²(df = 4) = 2.61; p = 0.6257; see also model TF2). Separate likelihood-ratio tests of joint significance as reported for the analyses for trustworthiness also do not provide support for moderating influences of promise properties (details not reported). Nevertheless, regarding the influence of the trustee’s promise decision, trustfulness in general differs significantly between behavioral contexts (LR χ²(df = 2) = 54.99; p < 0.0001). We turn again to the details of the behavioral contexts (Table 6).


Tab. 6: Logistic regression of trustfulness with fixed effects for subject-payoff response sets.

(A) Regression coefficients
                                          TF1               TF2
                       Hypotheses      b       se        b       se
Behavioral contexts
  TG                                (ref.)             (ref.)
  TG|H+2                 H4: +        0.46∗    0.19      0.49∗    0.23
  TG|H02                 H5: −       −1.29∗∗∗  0.24     −1.31∗∗   0.44
Binding value v2
  in TG|H+2              H4: +                           0.01     0.02
  in TG|H02              H5: −                           0.03     0.02
Transaction costs c
  in TG|H+2              H4: +                          −0.02     0.02
  in TG|H02              H5: +                          −0.02     0.03
Past periods per game                −0.16∗∗∗  0.04     −0.16∗∗∗  0.05

(B) Model comparisons
                            TF1                 TF2
                       χ²        df        χ²        df
LR test (TF0)        54.99∗∗∗     2      57.60∗∗∗     6
LR test (TF1)                             2.61        4

(C) Pairwise comparisons of behavioral contexts (Wald tests)
                            TF1                 TF2
                      ∆b        se        ∆b        se
TG|H02 − TG|H+2     −1.75∗∗    0.26     −1.80      0.47

Notes: N(response sets) = 212, N(decisions) = 560, N(subjects) = 118; (sub)games (0/1), past periods per game (1 … 12/14), binding value v2 = (0, 5, 10, 15, 30), transaction costs c = (0, 5, 20). Likelihood-ratio tests are reported against the null model with period control (TF0) and against the model TF1. Log likelihood (ll): ll|TF0 = −193.80, ll|TF1 = −166.30, ll|TF2 = −165.00. ∗∗∗ p ≤ 0.001, ∗∗ p ≤ 0.01, ∗ p ≤ 0.05 (two-sided p-values).

As argued above, the trustee’s promise to behave in a trustworthy manner is a friendly gesture, because it provides the trustor with the perspective of a gain (Hypothesis 4). Thus, trustors should feel an obligation to reward the trustee’s promise by placing trust (TG|H+2 ). Trustors might also anticipate the increase in trustworthiness due to the promise (Table 5 and Hypothesis 2). Our results indeed show that trustors are significantly more likely to place trust after the promise has been made (Table 6). However, the effect is not very strong. This contrasts with the previous finding that trustees staunchly seek to keep their promises irrespective of insufficient objective bonds (Table 5). If a possible promise of trustworthiness has not been made (TG|H02 ), the trustor should be less inclined to place trust due to feeling indignation and anticipating reduced trustworthiness (Hypothesis 5). Supporting this idea, trustfulness is strongly reduced after the promise has not been made (see also Gautschi 2000; Snijders 1996).


In fact, the hampering impact of the omitted promise on trustfulness is significantly larger than the promoting influence of having received the promise (test of differences between absolute coefficients: Wald χ²(df = 1) = 6.16; p = 0.0131 in model TF1 and Wald χ²(df = 1) = 2.40; p = 0.1214 for cheap-talk promises in model TF2). The difference between the coefficients is not significant in the case of cheap-talk promises. This might be due to the small number of cases in which trustees have omitted a cheap-talk promise (9 of 79 cases, or 11.4 %, in our analyses for trustfulness). Attributing the strong withdrawal of trustfulness to a false consensus effect (i.e., trustors expect trustees to behave as the trustor would do in the trustee’s role) is unlikely because participants have made decisions in both decision roles, and results for trustfulness differ from results for trustworthiness. The sharp decrease in trustfulness thus cannot result from the mere anticipation of abused trust. Rather, strong feelings of indignation are at work, inducing trustors to seek revenge for the omitted promise.

5 Summary and perspectives

Our study provides evidence for reciprocity that is due to influences of preceding behavior on subsequent decision-making, irrespective of changes in objective outcomes. We analyzed trustfulness and trustworthiness in different behavioral contexts created by preceding friendly and unfriendly decisions. Based on sociological and social-psychological research, we argued that people are motivated by feelings of obligation or indignation (Cialdini 2001: Ch. 2; Coleman 1990: Ch. 12; Gouldner 1960) and by the desire for self-consistency (Cialdini 2001: Ch. 3; Festinger 1957; Heider 1958). To study the “pure” effects of preceding behavior, we have conducted a lab experiment that allows for within-subject comparisons of trustfulness and trustworthiness in structurally identical (sub)games with different behavioral contexts that are generated endogenously by actual preceding kind and unkind decisions. By the combination of ingredients in our experimental design, we also improve on previous research (e.g., Cox 2004; McCabe, Rigdon, and Smith 2003; Snijders 1996). Our results mainly support the theoretical ideas about influences of process-based motivations. We found that placing trust tends to promote the trustee’s generosity. Trustfulness and trustworthiness are boosted by promises to honor trust and sharply reduced by omitted promises. Objective properties of promises (binding values and transaction costs) tend to moderate behavioral effects on trustworthiness but not on trustfulness. As in previous studies on communication effects, we found that the influence of cheap-talk promises cannot be explained by theoretical models in which perceived kindness is based on changes of objective outcomes induced by preceding behavior (see for example Falk and Fischbacher 2006; for a further discussion see Vieth 2009). Some aspects of our study require further remarks concerning statistical analyses, experimental design, social-psychological assumptions, and formal theoretical


analyses. First, we mentioned that our statistical analyses were based on the assumption that the effect of behavioral context on behavior is the same for all subject-payoff combinations. This is a strong homogeneity assumption and could be relaxed by allowing the (sub)game coefficients to vary randomly with subject-payoff response sets. However, such analyses would require many more observations, as well as additional assumptions about the distribution of random effects. We avoid this in our fixed effects approach. Second, our experimental design involved many variations, including asymmetric payoff structures. We ignored subgame-payoff interactions for reasons of restricted sample size, and controlled for additive effects of objective outcomes. To some extent, fewer variations seem advisable for further experiments. However, fixing the payoff parameters, for instance, would have induced participants to focus exclusively on behavioral advances and choice options. This might reduce decision noise, but increase the risk of response biases (e.g., participant awareness biases). Our experimental design has the advantage that participants also paid attention to payoff variations (see Section 3.1 on the experimental design). The relatively large number of variations in our experiment thus strengthens the reported results. We also clustered the types of games in our experiment and fixed the ordering: first TGs, then HTGs, and finally DGs. A short questionnaire assured a break between the TGs and the HTGs, and the relationship between HTGs and DGs is less obvious. Decisions compared within response sets were from different periods and games. Nevertheless, it is possible that the specific ordering has an effect, for instance, if participants made increasingly selfish decisions in the course of the experiment. We chose this fixed ordering for two reasons. First, decisions in DGs could most strongly affect subsequent decision-making because outcome-based preferences are revealed. Second, decision situations should not become increasingly simple, as would have been the case with an ordering such as HTG–TG–DG. Among other methodological drawbacks, this would have revealed the nesting of (sub)games. Some of the discussed design issues specifically arise because we employed a within-subject design (see also Keren 1993; Putt 2005). In a between-subjects design, practice effects and carryover effects can be ruled out. Even in a series of single encounters, experiences in preceding decision situations might subsequently influence people’s behavior toward other persons (for example due to indirect reciprocity, changes in mood and beliefs induced by positive and negative experiences, or influences of group dynamics). These aspects are of particular interest for further research. For the purpose of our study, a within-subject design is more suitable because it enables analyzing the influence of motivations at the individual level while controlling for (additive) individual heterogeneity and for influences of objective outcomes without making assumptions about specific outcome-based motivations. Third, we simplified our social-psychological assumptions about the influence of feelings of obligation or indignation and the desire for self-consistency. People can also feel an obligation to return a favor because of normative expectations rather


than perceived kindness (for example, considering unwanted gifts: see Cialdini 2001). Promise properties have been helpful for disentangling the opposing effects of obligation and self-consistency on trustworthiness, but we cannot separate these influences in the cases in which the promise has been made. Further research could investigate conditions under which the motivation is inter-personal (i.e., intentionally reciprocal due to obligation or indignation) rather than intra-personal (self-consistency). Investigating underlying motivations and psychological processes requires measuring whether an action is indeed perceived as kind or unkind, what emotions are triggered, what reasons people have for their decisions, what beliefs actors have, and how these beliefs are updated. In our experiment, we consciously decided to omit such questions because asking participants to consider the other person can influence decision-making. Some experiments report a bias toward other-regarding behavior (Gächter and Renner 2010; Hoffman, McCabe, and Smith 2008), whereas in other experiments indications have been found that measuring beliefs promotes selfish behavior (Croson 2000). Measuring beliefs thus influences people’s decision-making, but the direction, the extent, and the conditions of such biases still require further research. For this purpose, also studying other decision situations that are not trust situations might be fruitful. For instance, the influence of omitting a promise to behave in a friendly manner (which we argued to be unfriendly) might differ from the influence of explicit threats (for negotiation problems see Prosch 2006; for reward promises and punishment threats see Vieth 2009). Fourth, a more formal theoretical analysis could help formulate more quantitative hypotheses about effect sizes and derive thresholds for certain parameter intervals. For instance, the extent to which self-consistency undermines feelings of obligation after an omitted promise depends on the properties of the promise. Employing formal models, effects of various motivations could be specified and disentangled more clearly (for testing formal hypotheses consider discussion point 3 on measurement issues). Formal modeling would also provide insights in how firmly our hypotheses are rooted in game-theoretical rationality. Actors might be described by a set of individual parameters (i.e., how sensitive they are to obligations, self-consistency etc.). However, it is unlikely that people have rational expectations regarding such social motivations (for biased beliefs about outcome-based motivations see Aksoy and Weesie 2013). A thorough formalization of the theoretical ideas and insights we have presented is therefore an ambitious and challenging task and presumably requires simplifying assumptions (for further discussion, see Vieth 2009).


Bibliography

[1] Akerlof, George A., and William T. Dickens. 1982. “The Economic Consequences of Cognitive Dissonance.” American Economic Review 72(3):307–319.
[2] Aksoy, Ozan, and Jeroen Weesie. 2013. “Hierarchical Bayesian Analysis of Biased Beliefs and Distributional Social Preferences.” Games 4(1):66–88.
[3] Aronson, Elliot. 1992. “The Return of the Repressed: Dissonance Theory Makes a Comeback.” Psychological Inquiry 3(4):303–311.
[4] Au, Wing Tung, and Jessica Y. Y. Kwong. 2004. “Measurements and Effects of Social Orientation in Social Dilemmas: A Review.” Pp. 71–98 in Contemporary Psychological Research on Social Dilemmas, edited by R. Suleiman, D. V. Budescu, I. Fischer, and D. M. Messick. Cambridge, MA: Cambridge University Press.
[5] Balliet, Daniel. 2010. “Communication and Cooperation in Social Dilemmas: A Meta-Analytic Review.” Journal of Conflict Resolution 54(1):39–57.
[6] Baumgartner, Thomas, Urs Fischbacher, Anja Feierabend, Kai Lutz, and Ernst Fehr. 2009. “The Neural Circuitry of a Broken Promise.” Neuron 64(Dec 10):756–770.
[7] Bicchieri, Cristina. 2002. “Covenants Without Sword: Group Identity, Norms, and Communication in Social Dilemmas.” Rationality and Society 14(2):192–228.
[8] Blanco, Mariana, Dirk Engelmann, and Hans-Theo Normann. 2011. “A Within-Subject Analysis of Other-Regarding Preferences.” Games and Economic Behavior 72(2):321–338.
[9] Bolton, Gary E., and Axel Ockenfels. 2000. “A Theory of Equity, Reciprocity, and Competition.” American Economic Review 90(1):166–193.
[10] Bracht, Juergen, and Nick Feltovich. 2008. “Efficiency in the Trust Game: An Experimental Study of Precommitment.” International Journal of Game Theory 37(1):39–72.
[11] Brandts, Jordi, and Carles Solà. 2001. “Reference Points and Negative Reciprocity in Simple Sequential Games.” Games and Economic Behavior 36(2):138–157.
[12] Brosig, Jeannette. 2006. “Communication Channels and Induced Behavior.” Mimeo, University of Magdeburg.
[13] Brosig, Jeannette, Joachim Weimann, and Chun-Lei Yang. 2003. “The Hot versus Cold Effect in a Simple Bargaining Experiment.” Experimental Economics 6(1):75–90.
[14] Camerer, Colin F. 2003. Behavioral Game Theory. Princeton, NJ: Princeton University Press.
[15] Casari, Marco, and Timothy N. Cason. 2009. “The Strategy Method Lowers Trustworthy Behavior.” Mimeo, Purdue University.
[16] Chamberlain, Gary. 1980. “Analysis of Covariance with Qualitative Data.” Review of Economic Studies 47(1):225–238.
[17] Charness, Gary, and Matthew Rabin. 2002. “Understanding Social Preferences with Simple Tests.” Quarterly Journal of Economics 117(3):817–869.
[18] Charness, Gary, and Matthew Rabin. 2005. “Expressed Preferences and Behavior in Experimental Games.” Games and Economic Behavior 53(2):151–169.
[19] Cialdini, Robert B. 2001. Influence: Science and Practice. Boston, MA: Allyn & Bacon.
[20] Coleman, James S. 1990. Foundations of Social Theory. Cambridge, MA: Belknap Press.
[21] Cox, James C. 2004. “How to Identify Trust and Reciprocity.” Games and Economic Behavior 46(2):260–281.
[22] Crawford, Vincent. 1998. “A Survey of Experiments on Communication via Cheap Talk.” Journal of Economic Theory 78(2):286–298.
[23] Croson, Rachel T. A. 2000. “Thinking like a Game Theorist: Factors Affecting the Frequency of Equilibrium Play.” Journal of Economic Behavior and Organization 41(3):299–314.


[24] Dasgupta, Partha. 1988. “Trust as a Commodity.” Pp. 49–72 in Trust: Making and Breaking Cooperative Relations, edited by D. Gambetta. Oxford: Blackwell. [25] Dufwenberg, Martin, and Georg Kirchsteiger. 2004. “A Theory of Sequential Reciprocity.” Games and Economic Behavior 47(2):268–290. [26] Falk, Armin, Ernst Fehr, and Urs Fischbacher. 2003. “On the Nature of Fair Behavior.” Economic Inquiry 41(1):20–26. [27] Falk, Armin, and Urs Fischbacher. 2006. “A Theory of Reciprocity.” Games and Economic Behavior 54(2):293–315. [28] Fehr, Ernst, and Klaus M. Schmidt. 1999. “A Theory of Fairness, Competition, and Cooperation.” Quarterly Journal of Economics 114(3):817–868. [29] Fehr, Ernst, and Klaus M. Schmidt. 2006. “The Economics of Fairness, Reciprocity and Altruism – Experimental Evidence and New Theories.” Pp. 615–691 in Handbook of the Economics of Giving, Altruism and Reciprocity. Vol. 1, edited by S. Kolm, and J. M. Ythier. Amsterdam: Elsevier. [30] Festinger, Leon. 1957. A Theory of Cognitive Dissonance. Stanford, CA: Stanford University Press. [31] Fischbacher, Urs. 2007. “z-Tree: Zurich Toolbox for Ready-Made Economic Experiments.” Experimental Economics 10(2):171–178. [32] Fischer, Gerhard H., and Ivo W. Molenaar, eds. 1995. Rasch Models. Foundations, Recent Developments and Applications. New York: Springer. [33] Gächter, Simon, and Elke Renner. 2010. “Effects of (Incentivized) Belief Elicitation in Public Goods Experiments.” Experimental Economics 13(3):364–377. [34] Gallucci, Marcello, and Marco Perugini. 2000. “An Experimental Test of a Game-Theoretical Model of Reciprocity.” Journal of Behavioral Decision Making 13(4):367–389. [35] Gass, Robert H., and John S. Seiter. 2007. Persuasion, Social Influence, and Compliance Gaining. 3rd edn. Boston, MA: Pearson. [36] Gautschi, Thomas. 2000. “History Effects in Social Dilemma Situations.” Rationality and Society 12(2):131–162. [37] Gouldner, Alvin W. 1960. “The Norm of Reciprocity.” American Sociological Review 25(2):161– 178. [38] Greiner, Ben. 2015. “An Online Recruitment System for Economic Experiments.” Journal of the Economic Science Association 1(1):114–125. [39] Heider, Fritz. 1944. “Social Perception and Phenomenal Causality.” Psychological Review 51(6):358–374. [40] Heider, Fritz. 1958. The Psychology of Interpersonal Relations. New York: Wiley. [41] Hoffman, Elizabeth, Kevin McCabe, and Veron L. Smith. 2008. “Prompting Strategic Reasoning Increases Other-Regarding Behavior.” Pp. 423–428 in Handbook of Experimental Economics Results. Handbooks in Economics 28, Vol. 1, edited by C. R. Plott, and V. L. Smith. Amsterdam: Elsevier. [42] Hoijtink, Herbert, and Anne Boomsma. 1995. “On Person Parameter Estimation in the Dichotomous Rasch Model.” Pp. 53–68 in Rasch Models: Foundations, Recent Developments, and Applications, edited by G. H. Fischer, and I. W. Molenaar. New York: Springer New York. [43] Kagel, John H., and Alvin E. Roth, eds. 1995. The Handbook of Experimental Economics. Princeton, NJ: Princeton University Press. [44] Kelley, Harold H., and John W. Thibaut. 1978. Interpersonal Relations: A Theory of Interdependence. New York: Wiley. [45] Keren, Gideon. 1993. “Between or Within Subjects Design: A Methodological Dilemma.” Pp. 257–272 in A Handbook for Data Analysis in the Behavioral Sciences, Vol. 1, edited by G. Keren, and C. Lewis. Hillsdale, NJ: Lawrence Erlbaum.


[46] Kolm, Serge-Christophe, and Jean Mercier Ythier, eds. 2006. Handbook of the Economics of Giving, Altruism and Reciprocity. Handbooks in Economics 23. Amsterdam: Elsevier. [47] Kopelman, Shirli, John M. Weber, and David M. Messick. 2002. “Factors Influencing Cooperation in Commons Dilemmas: A Review of Experimental Psychological Research.” Pp. 113–156 in The Drama of the Commons, edited by E. Ostrom, T. Dietz, N. Dolšak, P. C. Stern, S. Stonich, and E. U. Weber. Washington, DC: National Academy Press. [48] Kreps, David M. 1990. “Corporate Culture and Economic Theory.” Pp. 90–143 in Perspectives on Positive Political Economy, edited by J. E. Alt, and K. A. Shepsle. Cambridge: Cambridge University Press. [49] Kunda, Ziva. 2002. Social Cognition. Making Sense of People. Cambridge, MA: MIT Press. [50] Ledyard, John O. 1995. “Public Goods: A Survey of Experimental Research.” Pp. 111–194 in The Handbook of Experimental Economics, edited by J. H. Kagel, and A. E. Roth. Princeton, NJ: Princeton University Press. [51] Levine, David K. 1998. “Modeling Altruism and Spitefulness in Experiments.” Review of Economic Dynamics 1(3):593–622. [52] Liebrand, Wim B. G. 1984. “The Effects of Social Motives, Communication and Group Size on Behavior in an N-Person, Multi-Stage, Mixed-Motive Game.” European Journal of Social Psychology 14(3):239–264. [53] MacCrimmon, Kenneth R., and David M. Messick. 1976. “A Framework for Social Motives.” Behavioral Science 21(2):86–100. [54] McCabe, Kevin A., Mary L. Rigdon, and Veron L. Smith. 2003. “Positive Reciprocity and Intentions in Trust Games.” Journal of Economic Behavior and Organization 52(2):267–275. [55] McCabe, Kevin A., Veron L. Smith, and Michael LePore. 2000. “Intentionality Detection and ‘Mindreading’: Why Does Game Form Matter?” Proceedings of the National Academy of Sciences 97(8):4404–4409. [56] McClintock, Charles G. 1972. “Social Motivation – A Set of Propositions.” Behavioral Science 17(5):438–454. [57] McClintock, Charles G., and Eddy van Avermaet. 1982. “Social Values and Rules of Fairness: A Theoretical Perspective.” Pp. 43–71 in Cooperation and Helping Behavior, edited by V. J. Derlega, and J. L. Grzelak. New York: Academic Press. [58] McKelvey, Richard D., and Thomas R. Palfrey. 1998. “Quantal Response Equilibria for Extensive Form Games.” Experimental Economics 1(1):9–41. [59] Messick, David M., and Charles G. McClintock. 1968. “Motivational Bases of Choice in Experimental Games.” Journal of Experimental Social Psychology 4(1):1–25. [60] Mlicki, Pawel P. 1996. “Hostage Posting as a Mechanism for Co-Operation in the Prisoner’s Dilemma Game.” Pp. 165–183 in Frontiers in Social Dilemmas Research, edited by W. B. G. Liebrand, and D. M. Messick. Berlin: Springer. [61] Ostrom, Elinor, and James Walker, eds. 2003. Trust and Reciprocity. Interdisciplinary Lessons from Experimental Research. New York: Russell Sage. [62] Prosch, Bernhard. 2006. “Kooperation durch soziale Einbettung und Strukturveränderung.” Mimeo (Habilitationsschrift), University of Erlangen-Nürnberg. [63] Putt, Mary E. 2005. “Carryover and Sequence Effects.” Pp. 197–201 in Encyclopedia of Statistics in Behavioral Science, edited by B. S. Everitt, and D. C. Howell. New York: Wiley. [64] Rabin, Matthew. 1993. “Incorporating Fairness into Game Theory and Economics.” American Economic Review 83(5):1281–1302. [65] Rasch, Georg. [1960] 1980. Probabilistic Models for Some Intelligence and Attainment Tests. Chicago: University of Chicago Press.


[66] Raub, Werner. 1992. “Eine Notiz über die Stabilisierung von Vertrauen durch eine Mischung von wiederholten Interaktionen und glaubwürdigen Festlegungen.” Analyse und Kritik 14(2):187–194. [67] Raub, Werner. 2004. “Hostage Posting as a Mechanism of Trust.” Rationality and Society 16(3):319–366. [68] Raub, Werner, and Gideon Keren. 1993. “Hostages as a Commitment Device.” Journal of Economic Behavior and Organization 21(1):43–67. [69] Raub, Werner, and Jeroen Weesie. 2000. “Cooperation via Hostages.” Analyse und Kritik 22(1):19–43. [70] Robinson, W. S. 1950. “Ecological Correlations and the Behavior of Individuals.” American Sociological Review 15(3):351–357. [71] Roth, Alvin E. 1995. “Bargaining Experiments.” Pp. 253–348 in in The Handbook of Experimental Economics, edited by J. H. Kagel, and A. E. Roth. Princeton, NJ: Princeton University Press. [72] Sally, David. 1995. “Conversation and Cooperation in Social Dilemmas: A Meta-Analysis of Experiments from 1958 to 1992.” Rationality and Society 7(1):58–92. [73] Schelling, Thomas C. 1960. The Strategy of Conflict. Cambridge, MA: Harvard University Press. [74] Selten, Reinhard. 1967. “Die Strategiemethode zur Erforschung des eingeschränkt rationalen Verhaltens im Rahmen eines Oligopolexperiments.” Pp. 136–168 in Beiträge zur experimentellen Wirtschaftsforschung, edited by H. Sauermann. Tübingen: Mohr-Siebeck. [75] Shankar, Anisha, and Charles Pavitt. 2002. “Resource and Public Goods Dilemmas: A New Issue for Communication Research.” The Review of Communication 2(3):251–272. [76] Snijders, Chris. 1996. Trust and Commitments. Amsterdam: Thela Thesis. [77] van Lange, Paul A. M. 1999. “The Pursuit of Joint Outcomes and Equality in Outcomes: An Integrative Model of Social Value Orientation.” Journal of Personality and Social Psychology 77(2):337–349. [78] Vieth, Manuela. 2009. Commitments and Reciprocity in Trust Situations. Experimental Studies on Obligation, Indignation, and Self-Consistency. Utrecht University, Utrecht. [79] Vieth, Manuela, and Jeroen Weesie. 2006. “Codebook of CoRe Experiments 1: Promises of Trustworthiness in Trust Games. Sets of Identical (Sub)Games.” Mimeo, Utrecht University. [80] Voss, Thomas. 1998. “Vertrauen in modernen Gesellschaften.” Pp. 91–129 in Der Transformationsprozess, edited by R. Metze, K. Mühler, and K.-D. Opp. Leipzig: Universitätsverlag. [81] Webster, Murray. 1975. Actions and Actors. Principles of Social Psychology. Cambridge, MA: Winthrop. [82] Weesie, Jeroen. 1993. “Social Orientations in the Prisoner’s Dilemma.” ISCORE paper 14, ICS/Sociology, Utrecht University. [83] Weesie, Jeroen. 1994a. “Fairness Orientations in Symmetric 2x2 Games.” Mimeo, Utrecht University. [84] Weesie, Jeroen. 1994b. “Social Orientations in Symmetric 2x2 Games.” ISCORE paper 17, ICS/Sociology, Utrecht University. [85] Weesie, Jeroen, and Werner Raub. 1996. “Private Ordering.” Journal of Mathematical Sociology 21(3):201–240. [86] Williamson, Oliver E. 1985. The Economic Institutions of Capitalism. New York: Free Press. [87] Yamagishi, Toshio. 1986. “The Provision of a Sanctioning System as a Public Good.” Journal of Personality and Social Psychology 51(1):110–116.

Chris Snijders, Marcin Bober, and Uwe Matzat

Online Reputation in eBay Auctions: Damaging and Rebuilding Trustworthiness Through Feedback Comments from Buyers and Sellers Abstract: Research on online reputation systems has shown that, in general, a higher reputation score increases trust in the seller in the sense that those with a higher reputation score get better prices for their goods (the “reputation premium”) and may be more likely to attract buyers (a higher probability of a sale). Likewise, negative feedback from buyers reduces trust in the seller. In this chapter, we analyze whether and how sellers who are confronted with negative feedback can rebuild trust by reacting appropriately. We also assess the monetary value of textual feedback, as compared to the value of the numerical reputation score. Data obtained in an online choice-based experiment allows us to analyze both the effects of buyers’ feedback text comments on the trust of a buyer in eBay sellers and the effect of a seller’s textual reaction on rebuilding trust. Our results show that textual feedback from buyers can indeed strongly affect the purchasing decisions of other buyers. Sellers can counteract the effect of negative comments, but only when the appropriate trust-rebuilding strategy is chosen in formulating the reply. The monetary value of adequate textual feedback is surprisingly high when compared to the value of the reputation score.

1 Introduction

The possibility of online interaction is changing the way people buy and sell products. Online interaction allows people who would otherwise never be able to reach each other to interact. One could argue that the fact that markets have gone online has brought them closer to the free-market ideal: there are plenty of sellers and plenty of buyers, even for the most obscure objects, and all of them can interact in unrestricted competition. Matters are not, however, that straightforward. Online market interaction faces serious problems of trust between buyers and sellers, and both buyers and sellers of products or services know this. Online traders usually do not know each other, live in distant places, interact only for a single commercial transaction, and may never interact again. For this reason, many online trading sites (eBay, Amazon, and others) allow their buyers to report on their experience with the seller, and in most cases the sellers are allowed to respond. In fact, many separate consumer review sites have emerged that allow the assessment of parties offering goods or services, such as review sites for hotels, restaurants, vacation attractions, car dealers and mechanics, medical professionals, programmers, handymen, and online eBay traders, to name just a few. Consumers have therefore become producers of powerful content by

422 | Chris Snijders, Marcin Bober, and Uwe Matzat

providing feedback information about products and services (Korfiatis, Garcia-Bariocanal, and Sanchez-Alonso 2012; Utz, Kerkhof, and van den Bos 2012). Sellers know this, and use positive information from consumer review sites to their own benefit. For instance, some airlines report their scores from user reviews, or restaurants may mention how well they have previously scored on a restaurant review site. Roughly speaking, the information about previous encounters usually consists of a quantitative part (a numerical score, or a total number of “thumbs up”, for instance) and a qualitative part (usually a certain amount of text). Especially because the number of online interactions is growing so fast, the amount of available information about previous interactions expands quickly, which makes it necessary for prospective traders to rely on summary statistics or general heuristics to cope with the elaborate information that is potentially available. Because of this, reputation – which used to rely on actual “word-of-mouth”, transmitted within relatively close-knit groups (cf. Diekmann et al. 2014) – has changed into a possibly equivalent but potentially quite different “word-of-Internet” process. Many studies have examined the extent to which the numerical indicators in feedback profiles influence the probability of sales and price premiums (Ba and Pavlou 2002; Bolton, Katok, and Ockenfels 2002; Diekmann, Jann, and Wyder 2009; Snijders and Zijdeman 2004; Standifird 2001; see Bajari and Hortacsu 2004 or the online supplement of Diekmann et al. 2014 for a review). Their results indicate that a seller’s high reputation score leads to a price premium (often, but not always, relatively small) for the buyer (see however Snijders and Weesie 2009 for several issues related to the measurement and interpretation of this premium). In this chapter, we instead analyze the semantic feedback of the reputation system, focusing on eBay as our auction platform. Apart from giving a numerical evaluation, a buyer on eBay can provide a short text comment indicating to the community of other potential buyers his satisfaction or dissatisfaction with the seller’s product and service. In addition, the seller can react to the buyer’s comments by providing a short text comment that indicates his evaluation of the buyer or of the given comment. In particular, we consider three research questions related to the process of dealing with feedback information. First, does an accusation from a previous buyer affect the future perceived trustworthiness of the seller, and does it make a difference whether the accusation is competence- or morality-related? Note that quantitative summary statistics as supplied online typically do not carry such information, and the only place from which users can access it is the semantic feedback. Second, if there is indeed a loss of trust on the part of the buyers, is it possible for a seller to regain that lost trust by adequately responding to either kind of accusation? Again, this can only be assessed from the semantic feedback buyers and sellers give each other, and cannot be deduced from any quantitative summary statistic (at least not as they are implemented now). Third, we want to assess how the value of different kinds of semantic feedback compares to the value of quantitative feedback, for several reasons. One is that the quantitative feedback summary is in a way the most condensed:
it allows users to interpret a history of interactions in an instant. However, this can come at a cost of throwing away information users would have liked to have. How good a representation is the quantitative information? A second reason is that people are more used to information conveyed semantically, and better able to both convey and understand meaning in this way than when using or assessing numbers. After all, “word-of-mouth” as we knew it in the pre-Internet era was (and still is) based more on a handful of stories that supposedly conveyed a characteristic image than on an extensive overview of past performance. In this chapter, we present two main findings that extend earlier insights about the role of reputation in the online trust process. First, we replicate the findings of Utz, Matzat, and Snijders (2009), showing that negative comments by buyers reduce trust in a seller and that rebuilding trust online can best be achieved through apologizing, irrespective of the kind of accusation. Second, we compare different kinds of textual buyer and seller feedback and estimate the amount of money a seller loses on average through negative textual feedback. This allows an estimate of how much money the seller can regain by reacting appropriately. The chapter is structured as follows. In the following section, we give a brief overview of reputation systems used in online auctions such as eBay, and follow up with a summary of current knowledge about trust-building and rebuilding. The third section offers a description of the research design and measurements, showing how we can quantify the success of the strategies sellers and buyers use to reduce and rebuild the seller’s trustworthiness. We then present and discuss the results of the online experiment. In the final part, we discuss the implications for future research on online reputation systems and trust building.

2 Theoretical background

2.1 Online auctions, trust, and reputation systems

Trust plays various roles in stimulating consumers’ behavior. For instance, consumers’ trust in systems such as the Internet (Grabner-Kräuter and Kaluscha 2003), and trust in third-party mechanisms such as escrow services (Pavlou and Gefen 2004), have facilitated e-commerce by effectively replacing the necessary trust in the other party by trust in the system guiding the interaction. However, in many online interactions, interpersonal trust is still crucial (Mayer, Davis, and Schoorman 1995). In this chapter, we focus on online interaction as it occurs on auction sites such as eBay, where buyers and sellers meet but can only benefit to some extent from institutional safeguards for the transaction. Although the trust issue works both ways, we focus here on the necessary trust of the buyer in the seller. Trust plays a role in the sense that the buyer hands over the control of the situation to the seller: the buyer
must wait and see whether the seller will fulfill the contract as promised even though the seller may have incentives to behave in his own short-term interests (cf. Coleman 1990). A buyer’s trusting disposition, and the perceived trustworthiness of the seller, both affect the extent to which the buyer is willing to trust the seller. In turn, perceived trustworthiness (as perceived by the buyer) depends on a mix of the perceived ability, benevolence, and integrity of the other party (Bhattacherjee 2002; Chiu, Huang, and Yen 2010; Metzger 2004; Metzger 2006), which may in turn depend on contextual or institutional characteristics (cf. Przepiorka 2014). One could argue that the perceptions about the other party pertain either to their competence (the seller may want to be cooperative but is not able to do so) or to their morality (the seller may be able to cooperate but does not want to) (Mayer, Davis, and Schoorman 1995; Wojciszke 2005). To assess the competence and morality of other parties, many auction sites have introduced “reputation systems” to collect and publish information about the past transaction behavior of their members (Resnick et al. 2000). After every eBay transaction, for instance, both seller and buyer can give feedback about the business partner by placing a comment and indicating whether the experience has been positive, negative, or neutral. Auction sites such as eBay summarize the interaction data for their users. The eBay member profile (and most other reputation systems we are aware of) combines both qualitative and quantitative feedback. The qualitative or textual feedback includes a history of the past transactions, each associated with positive, neutral or negative comments of the buyers and the replies from the seller. In a reply, sellers can explain or clarify their actions and defend themselves against accusations. Browsing through these “conversations”, a buyer can evaluate how sellers deal with their clients. Because the number of transactions is large, it would be a real burden to go through all of them, or even through a representative sample. This is part of the reason why auction sites also offer quantitative feedback, which typically consists of some compound reputation score and an overview of the feedback ratings, which informs about the number of different types of evaluations (positive, neutral, negative) that a seller has received in the last 1, 6, and 12 months or so. On eBay, the quantitative reputation score of a user equals the difference between the unique number of positive and negative evaluations (semantic comments are not used to calculate the reputation score). Previous research has shown that reputation pays, although perhaps not as much or as consistently as one might expect, and with an emphasis on quantitative summary statistics. Sellers with high reputation scores can sell their products for higher prices because the buyers avoid the risk and prefer to buy from a reliable source, even if it is more expensive (Diekmann et al. 2014; Dellarocas 2003; Lee, Im, and Lee 2000; Standifird 2001; Ba and Pavlou 2002; Li and Lin 2004; Snijders and Weesie 2009). On eBay, this positive effect of reputation on the selling price is often relatively small, perhaps because although buyers are willing to pay relatively high amounts for reputation they do not have to because of the abundance of high reputation sellers on eBay (Snijders and Weesie 2009).
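To make the scoring rule described above concrete, the following minimal sketch (our illustration, not eBay's actual implementation, which has changed over the years) counts unique positive raters minus unique negative raters; the text of the comments plays no role in the score.

```python
# Simplified sketch of the scoring rule described above (our illustration; eBay's
# actual rules have changed over time): the score is the number of unique members
# who left positive feedback minus the number who left negative feedback.
def feedback_score(evaluations):
    """evaluations: iterable of (member_id, rating) pairs, rating in {'pos', 'neu', 'neg'}."""
    positive = {member for member, rating in evaluations if rating == "pos"}
    negative = {member for member, rating in evaluations if rating == "neg"}
    return len(positive) - len(negative)

# Two unique positive raters and one negative rater yield a score of 1.
print(feedback_score([("a", "pos"), ("b", "pos"), ("a", "pos"), ("c", "neg")]))
```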


2.2 Textual feedback and its role in rebuilding trust

Textual feedback of consumers on review sites can have powerful effects (Lee, Park, and Han 2008; Park and Kim 2008; Utz, Kerkhof, and van den Bos 2012). Textual feedback on eBay can also have a strong impact, and affect the price premium. A negative comment that describes a seller as irresponsible or immoral decreases the price that buyers are willing to pay. Pavlou and Dimoka (2006) even showed that extraordinary comments may explain up to 20–30 % of the variance in price premiums. Utz, Matzat, and Snijders (2009) have later shown, using a scenario-based design, that more ordinary comments can likewise influence the credibility of the seller. Moreover, users tend to search for this kind of information: the average buyer on eBay reads at least one page of comments before placing a bid (Pavlou and Dimoka 2006), though one should interpret such statistics with caution as such trends may come and go. Since textual feedback can have such a strong impact on the trust of the buyer in the seller, it likewise has great potential for trust rebuilding. Kim et al. (2004) define rebuilding trust as “activities directed at making a trustor’s trusting beliefs and trusting intentions more positive after a violation is perceived to have occurred”. In online markets, one way to rebuild a lost reputation in the short term is by formulating an adequate reply. The seller could for instance react to the negative comment by denying the accusation or simply apologizing. We see this happening on eBay, and experimental evidence similarly suggests that those who have “betrayed trust” before are more likely to construct messages to persuade others to trust them (Schniter, Sheremeta, and Sznycer 2013). The literature on trust rebuilding distinguishes between competence-based and morality-based trust violations (cf. Kim et al. 2004; Utz, Matzat, and Snijders 2009; Matzat and Snijders 2012). Competence-based violations concern a seller’s skills to fulfill his or her obligations. In online auction settings, an example of a competence-based violation might be a situation where a seller wraps the parcel in the wrong way so that it becomes damaged during transportation, or a seller might unintentionally misrepresent an item because of a lack of knowledge about it. Morality-based violations concern a seller’s intentional unethical behavior. An example of a morality-based violation could be a situation where a seller is knowingly offering a broken product without acknowledging it in the description, or knowingly using cheaper packaging material. In line with earlier arguments and findings (Pavlou and Dimoka 2006; Utz, Matzat, and Snijders 2009), we expect that both types of accusations affect a seller’s trustworthiness for subsequent buyers. We therefore test the following two hypotheses: Hypothesis 1. A buyer’s morality-based accusation decreases the attractiveness of the seller’s offer (compared to negative feedback without an accusation). Hypothesis 2. A buyer’s competence-based accusation decreases the attractiveness of the seller’s offer (compared to negative feedback without an accusation).


Based on (game-theoretic and common sense) logic, just a single accusation of a seller being immoral will strongly affect the buyer’s perception of that seller’s trustworthiness: this seller is not safe to do business with and has now revealed his “true nature”, in the sense that he would deceive if he gets the chance. This is different from a competence-based trust violation: even a very skilled person can make a mistake, so that violations that are competence-based are usually considered less informative (Kim et al. 2004). Given the potentially crucial effect of the comments on trust, online auction sites provide mechanisms that allow sellers to reduce the potentially unjust negative effect of unfavorable comments. Those who receive a negative comment can reply and try to explain what happened. Previous studies distinguish two main strategies that might help rebuild trust after receiving a negative comment: apology and denial (Ohbuchi, Kameda, and Agarie 1989; Bottom et al. 2002; Schweitzer, Hershey, and Bradlow 2006). According to Kim et al. (2004) the effectiveness of these two reply strategies, apology and denial, is moderated by the type of trust violation. Their reasoning runs as follows. Because negative information is more informative in the case of morality-based trust violations, the best strategy for the seller is to deny it: it does not make sense to apologize for lack of morality. Conversely, an apology is the better strategy in the case of competence-based trust violations, both because it shows redemption and because negative information is less insightful for assessing one’s competence: it is not detrimental to admit to an accidental slip-up. Kim et al. (2004) provide evidence for the relative advantages of apologies and denials in an experimental study that focuses on face-to-face interaction in which parties had limited prior interaction and the relationship was in the emergent stages (as opposed to a relationship with a longer history). Their findings were not replicated by Utz, Matzat, and Snijders (2009) in a study focusing mainly on eBay interaction. They showed that an apology on eBay is the best strategy as a reaction to both types of violations. This might be a result of the generally low level of trust among eBay users (Matzat 2009). It seemed that participants perceived sellers who denied the accusations in the experiment as persons whose guilt was already proven. Alternatively, it may have been the consequence of the fact that buyers on eBay do not see an eBay interaction as a standard purchasing interaction, and simply consider the seller to be responsible for the whole of it, even when something outside the influence of the seller has caused the purchase to go wrong in some way. The results in Utz, Matzat, and Snijders (2009) further suggested that “pure” apologies work better in rebuilding trust than “extended” apologies, in which an explanation was offered in addition to the apology. Curiously, findings from the marketing literature are at odds with both previous findings. Basso and Pizzutti (2016) find, using a scenario experiment in which subjects were asked to consider not being allowed into their hotel room even though they had arrived on time, that an apology works better when the failure is based on morality than when it is based on incompetence (the reverse is true for promises to behave better next time). The reason for this is not obvious;
Basso and Pizzutti (2016) mention that this is because there must be a match between the failure and the repair (we tend to agree), but why an apology matches better with an integrity-based violation than with a competence-based violation remains unclear. Given the inconsistent results of the strategies to rebuild trust and the conflicting arguments one can easily come up with, we do not formulate hypotheses about differences in the effectiveness of the seller’s strategies. We are nevertheless interested in considering whether both strategies help the seller to regain authority by diminishing the negative effect of an accusation, and therefore test the following hypotheses: Hypothesis 3. A seller’s apology as a reaction to a buyer’s accusation increases the attractiveness of the offer (compared to no reaction). Hypothesis 4. A seller’s denial of a buyer’s accusation increases the attractiveness of the offer (compared to no reaction). Utz, Matzat, and Snijders (2009) used a scenario-based experiment that forced participants to decide about the trustworthiness of a single eBay seller profile based in part on textual comments from that seller and their buyers. This decision, however, does not resemble the decision situation of buyers on eBay, who must usually choose between several seller options. The buyers in the experiments of Utz, Matzat, and Snijders (2009) had no chance to compare different offers, as would be the case when buying on eBay. This may have limited the external validity of the findings, and makes it harder to interpret the size of the effect (see the design of our experiment below). In this chapter, we utilize a more realistic scenario. The conjoint-based design that we use causes participants to face a situation where they have to rank several product offers. In addition, the conjoint-based design allows estimating the monetary value of both negative textual feedback and different trust rebuilding strategies. In other words, we can estimate how much sellers on average lose when they get a negative comment and how much of this lost money they can regain through different reply strategies. The monetary values allow us to comprehend better and visualize the effectiveness of different kinds of feedback (e.g., quantitative vs. textual) and benchmark trust reparation strategies. There are reasons to believe that written, qualitative feedback might have stronger effects than quantitative feedback. Text allows for more subtle messages than just numbers, and people in general are used to dealing with text rather than numbers and are therefore more likely to feel strongly about it, and we would expect that a single comment might lead traders to trust or distrust a particular seller (whereas we do not expect this to hold for numerical feedback, except in rare instances). It is not, however, easy to quantify in this context what a “stronger” effect is, but we can show how the positive value of a reputation score compares to the hypothesized negative value of an accusation and the potentially positive value of a rebuttal by the seller. We therefore formulate two additional descriptive research questions:


D 1. How large is the monetary value of a buyer’s textual comments (accusations) compared to the value of the seller’s numerical reputation score? D 2. How large is the monetary value of a seller’s textual comments (rebuttals to accusations) compared to the negative value of a buyer’s textual comment?

3 Study design and measurements

3.1 Participants

Participants were recruited in April 2009 among the users of one of the largest commercial opt-in Internet panels in the Netherlands (the language used in the experiment was Dutch), inviting participants to “a study on online auctions such as eBay”. In line with the standard rules of the panel, participants were rewarded by credits that they could later use for online shopping on the panel’s website. Participants were 191 present or past eBay users (52 % males). The average age of the participants was 42 years (SD = 11.8, range 16–74 years), and 41 % had some kind of higher education (including higher vocational education). Most of the participants used the Internet on a daily basis (76 %) and had been using the Internet for a long time (74 % had been using the Internet since before 2000). Only current or past eBay users could participate. About 40 % of the subjects reported that they had been using eBay for more than four years.

3.2 Procedure

Participants received a link to a web-based survey. The study took approximately 15–20 minutes to complete. In the main part of the study, the participants ranked the attractiveness of three different offers for the exact same model of a digital camera from different sellers. The respondents rated the items by choosing the most and the least preferred offer out of the three offers presented on the screen. This task was repeated five times (each time with three different seller profiles). In total, each person was exposed to 15 offers (five comparisons of three offers). The offers differed in terms of the price of the product, the reputation score of the seller, the type of trust violation (as mentioned by a previous buyer), and the type of seller’s reaction (the reply to a comment). The number of levels per aspect is described in more detail below. Table 1 provides an overview of the manipulated factors and their levels. Figure 1 shows an example item and summarizes the stimuli received by participants. The procedure is similar to what is known as a “choice-based conjoint” (CBC) procedure: participants rank a small set (here, a set of three offers in which they choose their most and least preferred option) so that the analysis of the data allows


Tab. 1: Overview of factors manipulated in the choice-based experiment.

Factor | Factor levels
Price (€) | 10 levels from 300–350
Reputation (1 point) | 15 levels from 5–4,398
Accusation | 3 levels: none, morality-based, competence-based
Reaction∗ | 5 levels: none, pure apology, apology with explanation, pure denial, denial with explanation
Scenario type | 2 levels: broken product, delayed delivery

∗ After the accusation type ‘none’, only the reaction type ‘none’ was displayed.
Notes: Participants ranked 5 sets of 3 vignettes. The vignettes differed in terms of price, feedback score, scenario, accusation, and reply.

Fig. 1: An example of a vignette (eBay offer) used in the study.


the researcher to estimate the relative value the participants attach to the different elements of the offer, under the assumption that the participants’ implicit preference values can be adequately represented as a weighted average of the separate characteristics. It is important to note that the full choice set is available in this experiment. Most of the empirical studies on eBay behavior use data downloaded from the eBay site that does not include the choice set of the buyers (that is, the options that buyers have considered but did not buy), so that the value of the separate offer characteristics to the buyer cannot be estimated appropriately. Instead, what is measured is what buyers are actually paying for reputation (or other characteristics of the offer), not what they would be willing to pay for reputation. In fact, under the abovementioned assumptions, our procedure estimates willingness to pay irrespective of whether participants would actually consider buying one of the options (cf. Snijders and Weesie 2009). After the rankings, the respondents were asked some questions that allowed us to evaluate the efficiency and credibility of the manipulations used in the experiment (different types of trust violations and the seller’s reactions). In addition, participants answered several questions regarding their eBay usage, their general disposition to trust, and some demographics (e.g., age, gender, education).
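For readers who want the assumption spelled out, a hedged sketch in our own notation (not taken from the chapter): each offer's attractiveness is a weighted sum of its characteristics plus noise, choices follow McFadden's logit form, and the willingness to pay for a characteristic is its weight relative to the price weight.

```latex
% Sketch in our notation: V_{ij} is the systematic attractiveness of offer i in choice set j,
% d_{ij} collects the accusation/reaction dummies, and eps_{ij} is an i.i.d. type-I extreme
% value error, which yields the logit choice probabilities used in the analysis below.
\[
  U_{ij} = V_{ij} + \varepsilon_{ij}, \qquad
  V_{ij} = \beta_{\mathrm{price}}\,\mathrm{price}_{ij}
         + \beta_{\mathrm{rep}}\,\mathrm{rep}_{ij}
         + \boldsymbol{\beta}_{\mathrm{text}}^{\top}\mathbf{d}_{ij}
\]
\[
  \Pr(\text{offer } i \text{ chosen from } S_j)
     = \frac{\exp(V_{ij})}{\sum_{k \in S_j} \exp(V_{kj})}, \qquad
  \mathrm{WTP}_{a} = \frac{\beta_{a}}{\lvert \beta_{\mathrm{price}} \rvert}.
\]
```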

3.3 Offer characteristics

As Table 1 shows, the price of the digital camera was manipulated on 10 levels and ranged from 300 to 350 €, which was consistent with actual prices for the particular camera at that time. The overall feedback score was manipulated on 15 levels, starting from 5 and progressing roughly on a log-linear scale up to 4,398. The score was always higher than zero, so all the fictional sellers had positive feedback (on eBay, buyers with a negative overall feedback score are rare, and those with a score lower than –3 are removed from the site). In addition to the total score, the offers included a table with the detailed number of negative and positive comments. The number of negative comments was fixed at two, so the number of positive comments equaled the total minus two. Types of trust violation were manipulated on three levels and operationalized as a negative comment placed on the auction page. The comment could be absent (the control condition), relate to a morality-based trust violation (e.g., “unreliable eBayer, product malfunctioned”), or relate to a competence-based violation (“incompetent eBayer, product was damaged (badly packaged)”). Types of trust-rebuilding strategy were operationalized as a seller’s reply to a negative comment. The type of reply was consistent with the violation type. This variable was manipulated on 5 levels: pure denial (that is, just a denial and nothing else),
denial with explanation, pure apology, apology with explanation, and no reaction as a baseline condition (see the Appendix for the precise formulation of these conditions). The type of scenario varied between the subjects. At the beginning of the experiment, each of the participants was randomly assigned to one of two different scenarios as mentioned above: one scenario about a delayed delivery or one scenario about the delivery of a broken product. The type of scenario codetermined the content of the comments and the replies to the comments (see the Appendix for details).

3.4 Design

The total number of possible vignettes, given our levels, equals 10 (price) × 15 (reputation) × 11 (violation and reaction) × 2 (scenario) = 3,300. The 11 “violation and reaction” levels are based on two types of accusations (morality vs competence), with five kinds of replies each (no reply, apology, extended apology, denial, extended denial), plus a control condition without accusation. We randomly generated different sets of three eBay offers (vignettes). Note that the price, feedback score, violation type, and violation response varied both within (and to some extent also between) subjects, as each person was exposed to five comparisons of three offers. In addition, each respondent was assigned to one of two scenarios, describing either a delayed delivery or the delivery of a broken product. We removed the data of 10 respondents who completed the experiment in less than four minutes, or whose data turned out to be suspicious because of a very large number of identical answers in matrix questions, leading to (193 − 10) × 15 = 2,745 offers (randomly chosen out of 3,300) that are included in the actual data analysis.
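As a quick check on the arithmetic, the following small sketch (the level names are ours; the levels follow Table 1) enumerates the full factorial vignette space and confirms the 3,300 count.

```python
# Sketch (our names; levels follow Table 1) enumerating the full factorial design.
from itertools import product

prices = range(10)        # 10 price levels between 300 and 350 euro
reputations = range(15)   # 15 feedback-score levels between 5 and 4,398
accusation_reaction = [("none", "none")] + [
    (accusation, reaction)
    for accusation in ("morality-based", "competence-based")
    for reaction in ("none", "pure apology", "apology + explanation",
                     "pure denial", "denial + explanation")
]                          # 1 + 2 * 5 = 11 accusation/reaction combinations
scenarios = ("broken product", "delayed delivery")

vignettes = list(product(prices, reputations, accusation_reaction, scenarios))
print(len(vignettes))      # 10 * 15 * 11 * 2 = 3300
```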

3.5 Dependent and control variables

Attractiveness of the offer: The respondents ranked any set of three offers from the most to the least attractive offer, by choosing first the option that they preferred the most and then by choosing the option that they preferred the least.

Control variables

General Internet experience was measured with separate standard items such as: years of experience, average weekly usage, etc.

Internet auction experience was measured with six questions that allowed the participants to report how often they buy or sell goods on eBay, perceptions about trustworthiness of such activities and the approximate number of transactions, etc.

Knowledge of photography: To differentiate between professional and amateur photographers, the participants were asked to indicate what sort of camera they used and for which kind of activities.


Trust disposition was measured with a scale consisting of four items, based on the measure by Jarvenpaa, Knoll, and Leidner (1998). Examples of these items are: “Most people are honest in describing their experiences and abilities” and “Most people answer personal questions honestly”. Participants answered the questions on 5-point Likert scales ranging from do not agree at all to fully agree (alpha = 0.81). Willingness to buy technological gadgets was measured with one item: How often do you spend money on gadgets, electronics and multimedia equipment? The respondents answered on a 7-point scale ranging from very often to I do not spend at all.

4 Results

The data was analyzed using conditional logistic regression with robust standard errors at the participant level. We recoded the dependent variable (attractiveness of the offer) into a binary variable, with a 1 indicating the most attractive offer. As is well known in the literature, conditional logit on this kind of data allows us to estimate the contribution of the separate offer characteristics to the attractiveness of the offer in a way that is consistent with “McFadden’s choice model” (McFadden 1974). This is based on the assumption that attractiveness can be modeled as a linear combination of the characteristics and an error term with a specific distribution. Conditional logistic regression can be understood as a special case of logistic regression where one takes into account that it is known up front that (in our case) precisely two out of every three offers have been given a value of 0, and one (the preferred offer) a value of 1. Table 2 shows the results of this analysis. The results show that an offer is more likely to be chosen if the price is lower, the seller’s reputation is higher, and if there are no negative textual comments displayed (i.e., no accusations of immorality or incompetence) on the user’s profile page (see Table 2), which is what one would expect. In the scenario of the broken product, both types of accusations (morality-based and competence-based) show a negative and significant effect, reducing the attractiveness of the seller’s offer. The interaction effects between the scenario of the delayed delivery and the two types of accusations are both insignificant, indicating that the effects of the accusations do not differ significantly between the scenarios, thereby providing support for Hypotheses 1 and 2 about the effects of morality- and competence-based accusations. Both an extended apology and a pure apology show a positive and significant effect on the attractiveness of the offer in the case of being accused of a competence-based trust violation. The insignificant interaction effect between the scenario type and the two types of apologies (“Delayed delivery” × Pure apology and “Delayed delivery” × Apology + Explanation) shows that this effect is similar across both types of scenarios. That is, under the condition of a competence-based trust violation Hypothesis 3 about the effects of apologies finds support.


Tab. 2: Conditional logit analysis on the attractiveness of the offer^a.

Variable | (1) | (2)
Price (1 €) | −0.0296∗∗∗ | −0.0224∗∗∗
Reputation (1 point) | 0.000497∗∗∗ | 0.000492∗∗∗
Accusation of immorality | −0.817∗∗∗ | −0.785∗∗
Accusation of incompetence | −0.657∗∗∗ | −1.045∗∗∗
Pure apology^b | 1.224∗∗∗ | 1.147∗∗∗
Apology + Explanation | 0.889∗∗∗ | 1.504∗∗∗
Pure denial | −0.976∗∗∗ | −1.083∗∗
Denial + Explanation | 0.101 | −0.404
Acc. of immorality × Pure apology | | −0.154
Acc. of immorality × Apology + Explanation | | −1.123∗∗
Acc. of immorality × Pure denial | | −0.556
Acc. of immorality × Denial + Explanation | | 0.202
“Delayed delivery” × Price | | −0.0151∗
“Delayed delivery” × Reputation score | | 0.0000404
“Delayed delivery” × Accusation of immorality | | −0.183
“Delayed delivery” × Accusation of incompetence | | 0.746
“Delayed delivery” × Pure apology | | −0.170
“Delayed delivery” × Apology + Explanation | | −0.579
“Delayed delivery” × Pure denial | | 0.0192
“Delayed delivery” × Denial + Explanation | | 0.611
“Delayed delivery” × Acc. of immorality × Pure apology | | 1.122∗
“Delayed delivery” × Acc. of immorality × Apology + Explanation | | 1.109∗
“Delayed delivery” × Acc. of immorality × Pure denial | | 1.305
“Delayed delivery” × Acc. of immorality × Denial + Explanation | | 0.313
Observations | 2745 | 2745

^a The main effect of the scenario type (broken product vs. delayed delivery) cannot be included because it did not vary between the comparisons. Baseline categories: no accusation, no reaction, broken product scenario.
^b The effect of the variable “pure apology” estimates the effect size of a pure apology as a reaction to a competence-based accusation, because we include an additional interaction term for its effect under the condition of a morality-based accusation. The same holds for other main effects of other reaction types.
Standard errors are robust with respect to clustering at the subject level.
∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001.
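To illustrate how a model like the one in Table 2 turns offer characteristics into choice probabilities, here is a minimal sketch (our toy offers, not the experimental data; only three attributes are used, with the model 2 coefficients for the broken-product baseline).

```python
# Minimal sketch (not the authors' code) of McFadden's conditional logit for one
# choice set of three offers. Offers are invented for illustration; coefficients
# are the Table 2, model 2 estimates for price, reputation score, and the
# accusation-of-immorality dummy in the broken-product baseline.
import numpy as np

def choice_probabilities(X, beta):
    """X: (n_offers, n_attributes) matrix for one choice set; beta: coefficient vector."""
    v = X @ beta                    # systematic utility of each offer
    expv = np.exp(v - v.max())      # subtract the maximum for numerical stability
    return expv / expv.sum()        # logit choice probabilities, summing to 1

beta = np.array([-0.0224, 0.000492, -0.785])
offers = np.array([
    [300.0,  150.0, 0.0],   # cheapest offer, modest reputation, clean comment history
    [320.0, 4398.0, 0.0],   # pricier offer, very high reputation
    [305.0,  800.0, 1.0],   # mid-priced offer with an accusation of immorality
])
print(choice_probabilities(offers, beta))  # the clean, high-reputation offer gets the largest probability
```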

A closer look at the interaction effects between the accusation of immorality and the two types of apologies reveals the following. A pure apology increases the attractiveness of the offer in both scenarios under the condition of a morality-based trust violation. An extended apology as a reaction to the accusation of a morality-based trust violation increases the attractiveness of the offer in the scenario of a delayed delivery, but not in the scenario of a broken product. With respect to a pure apology, Hypothesis 3 is supported also under the condition of a morality-based trust violation and thus finds support for both
types of trust violations and in both scenarios. With respect to an extended apology, Hypothesis 3 finds support in three out of four cases, namely in both scenarios under the condition of a competence-based trust violation and in the scenario of a delayed delivery under the condition of a morality-based trust violation. The only exception is the scenario of a broken product under the condition of a morality-based trust violation. We can therefore conclude that Hypothesis 3 finds substantial support. An apology tends to increase the attractiveness of the offer. Results with respect to Hypothesis 4 about the effects of denials are as follows. Under all four conditions (both scenarios × both types of trust violations) both types of denial (pure and extended denial) do not show a significant positive effect. Sometimes the effect is even negative (and significant). In any case, Hypothesis 4 regarding the positive effect of a denial is not supported. The results in Table 2 allow comparing the effects of the different kinds of textual feedback in terms of monetary values. We know from Table 2 what the effect of the price of the product is on the attractiveness of the offer. We also know what the effect is of, for instance, negative textual feedback on the attractiveness of the offer. Comparing these two effects allows us to estimate the monetary value of negative textual feedback. Table 3 and Figure 2 represent the data from Table 2 in terms of the monetary value of the various kinds of negative comments and their replies when selling a professional camera (worth about 300–350 euro). A negative comment from a buyer lowers the attractiveness of the offer. An accusation of immorality in the scenario of a broken product “costs” about 35 € in the sense that it would be just as attractive as an offer that is otherwise equal, but 35 € cheaper. A seller accused of incompetence in the scenario of a broken product loses a bit more; the final selling price is lowered by 47 €. The numbers clearly indicate that a single textual comment can do serious damage, especially when compared to the relatively small effect of the numeric feedback. The seller can reduce the negative effect of an accusation by using the optimal way of replying. Under all conditions (both scenarios and both types of trust violation) offering a short apology is an adequate strategy. For instance, in the case of a broken product and being accused of a competence-based trust violation, offering an apology recovers about 51 €, which completely recoups the loss. The net difference from zero is not statistically significant, so we cannot claim that an apology more than compensates for negative feedback, but it also does not seem to lead to a net loss. In this case, therefore, the monetary value of the seller’s strategy is equal to the monetary value of the buyer’s textual comment. In the case of a delayed delivery and an accusation of a morality-based trust violation, the best strategy is a pure apology (that is, an apology with no explanation) which recovers roughly 86 €. Our results indicate that a single point of (extra) reputation allows the seller to raise the price by roughly two cents, which implies that the seller would need to get an extra 2,350 points to reduce the negative effect of an accusation of incompetence in the broken product scenario.
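Our reading of how these euro figures are obtained is a simple back-of-the-envelope conversion (a sketch, not the authors' code): divide a coefficient from Table 2 by the absolute price coefficient, since a one euro price cut raises the offer's utility by the size of the price coefficient.

```python
# Back-of-the-envelope conversion of Table 2, model 2 coefficients into euro values,
# as reported in Table 3 (broken-product baseline).
b_price = -0.0224          # per euro
b_reputation = 0.000492    # per reputation point
b_immorality = -0.785      # accusation of immorality
b_incompetence = -1.045    # accusation of incompetence

def euro_value(b):
    return b / abs(b_price)

print(round(euro_value(b_reputation), 3))    # ~0.022 euro per extra reputation point
print(round(euro_value(b_immorality), 2))    # ~ -35 euro, cf. Table 3
print(round(euro_value(b_incompetence), 2))  # ~ -47 euro, cf. Table 3
# Points needed to offset an accusation of incompetence: ~2,100 with unrounded
# coefficients, and roughly 2,300-2,350 when the per-point value is rounded to
# two cents as in the text.
print(round(abs(euro_value(b_incompetence)) / euro_value(b_reputation)))
```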


Fig. 2: Illustration of price changes depending on scenario, type of accusation, and type of reaction. Panels: “Broken product” scenario and “Delayed delivery” scenario.
Notes: An illustration of how the estimated selling price of a product worth 300 euro (no accusations) changes depending on different kinds of accusations and seller’s replies. Please note that not all differences are significant (see Table 2 for significance levels).


Tab. 3: Monetary value of trust violations and trust rebuilding strategies, computed using conditional logit^a.

Variable | Price in €
Reputation (1 point) | 0.02∗∗∗
Accusation of immorality | −35.04∗∗
Accusation of incompetence | −46.65∗∗∗
Pure apology | 51.21∗∗∗
Apology + Explanation | 67.14∗∗∗
Pure denial | −48.35∗∗
Denial + Explanation | −18.04
Acc. of immorality × Pure apology | −6.88
Acc. of immorality × Apology + Explanation | −50.13∗∗
Acc. of immorality × Pure denial | −24.82
Acc. of immorality × Denial + Explanation | 9.02
“Delayed delivery” × Reputation score | 0.00
“Delayed delivery” × Accusation of immorality | −8.17
“Delayed delivery” × Accusation of incompetence | 33.30
“Delayed delivery” × Pure apology | −7.59
“Delayed delivery” × Apology + Explanation | −25.85
“Delayed delivery” × Pure denial | 0.86
“Delayed delivery” × Denial + Explanation | 27.28
“Delayed delivery” × Acc. of immorality × Pure apology | 50.09∗
“Delayed delivery” × Acc. of immorality × Apology + Explanation | 49.51∗
“Delayed delivery” × Acc. of immorality × Pure denial | 58.26
“Delayed delivery” × Acc. of immorality × Denial + Explanation | 13.97

Notes: Estimates are based on standard errors that are clustered at the participant level.
^a See also Figure 2 for a different presentation of these data.
∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001.

This shows that a buyer’s textual feedback, if used correctly, is a powerful weapon compared to the numerical reputation score. We have run several robustness checks on our data: including personal characteristics such as trust disposition as interaction effects (introducing them as main effects is not possible given that they do not vary across comparisons within subjects), restricting the sample to more experienced users, randomly deleting several non-preferred offers (which is a standard robustness check for the choice model), and analyzing the data as a rank-ordered logit model. None of these checks alters our results substantially (results not reported here).


5 Summary, discussion, and conclusions

In this chapter, we have contributed to answering the questions of whether and to what extent buyers can influence the perceived trustworthiness of sellers on eBay, and whether and to what extent it pays for sellers to monitor negative feedback and react appropriately. Our study empirically tested these questions using a choice-based experiment and tried to back them up with social psychological theory on building and rebuilding trust. In addition, we estimated the monetary value of the semantic feedback. Our research shows that the reputation of the sellers can be seriously damaged by a single negative textual comment: as a seller, just trying to recover from this loss by collecting additional positive reputation points will take a long while. Our results also show that a smart and personalized reply strategy can be a crucial key to rebuilding lost reputation. A single appropriate reply can recoup all, or almost all, losses suffered from negative feedback. It is worthwhile to think through the basis of this argument in some more detail to see its implications more clearly. An often-heard argument is that, in principle, online buyer-seller interaction cannot survive without trust, given the fact that sales are transacted through mainly single-shot encounters between anonymous individuals. A possible solution to this cooperation problem is the use of a reputation system that can alleviate the anonymity of encounters and allow for reputation formation. If the reputation system were indeed such a solution, one would expect it to be possible to show that the reputation system always delivers positive value to those with a good reputation. Although earlier research has found that the quantitative elements of the feedback system indeed contribute to the development of trust and to a price premium for those with a higher reputation score, this price premium often seems to be small (with some notable exceptions, for which see the overviews mentioned in the introduction). This calls into question whether it is reasonable to assume that the price premium alone would be sufficient for the reputation system to be the cause for the feasibility of online interaction. Instead, one could argue that the strength of the reputation system does not lie in the value it attaches to sellers with higher reputation scores, but in the mere fact of its existence. People know bad behavior will be punished, and this punishment is a sufficient deterrent for bad behavior, but collecting more reputation points does not help much over and above this main effect (cf. Bolton, Katok, and Ockenfels 2004). Our findings add an argument to this line of reasoning. Recent research has focused on the trust-building effects of the textual feedback comments, and it has been shown that negative textual feedback comments accusing the seller of a violation of trust reduce the perceived trustworthiness of the seller (Pavlou and Dimoka 2006; Utz, Matzat, and Snijders 2009). Sellers can repair their damaged trustworthiness (Utz, Matzat, and Snijders 2009; Matzat and Snijders 2012). This suggests that semantic feedback might play an important role in being able to mimic the offline reputation
process online, but the research has some limitations. First, the experimental procedure of earlier studies forced participants to evaluate just one offer at a time, thereby limiting the external validity of their findings. In reality, users often have to evaluate more offers simultaneously. Moreover, earlier research was unable to quantify the monetary values of buyers’ and sellers’ feedback so that they could be better compared with each other and with the effect of the numerical reputation score. Our experimental setup allows us to incorporate the full choice set under the consideration of the participants in the analyses. This in turn allows us to compare the monetary impact of the buyers’ and sellers’ textual feedback comments, quantify the effect of semantic feedback and compare it with the effect of reputation scores. Our most remarkable findings follow. Buyers can strongly influence the sellers’ perceived trustworthiness through textual feedback. Although sellers can regain their loss, this requires a specific and adequate response by the seller. In our data, the power of textual feedback is especially visible when compared directly to the effect of reputation scores. A seller would need to get an extra 2,300 reputation points to reduce the negative effect of a buyer’s accusation of incompetence. Even though the precise tradeoff between textual and quantitative feedback is obviously dependent on the context and content of the feedback itself, this shows that the potential of textual feedback is a force to be reckoned with when compared to the mere reputation score. Hence, our results offer some insights into the previously mentioned argument regarding the value of a reputation system. Its existence could affect the probability of mutually profitable online behavior (this is hard to compare other than in an experimental setting). The quantitative content of the reputation system, which has received the bulk of the empirical attention, adds to its value, but our findings show that the semantic content of the reputation system might well be the stronger factor. Our findings suggest that sellers would be wise to try and react to an allegation, as this will allow them to regain some (or possibly even all) of the trust they have lost. However, not all types of reactions recover the loss that is incurred. In general, buyers believe in the accusations of other buyers, but not in all types of reactions by the seller, who is often simply regarded as guilty by definition (Matzat 2009). Although it is not completely clear which reaction to an accusation is best under which circumstances (our data is too limited to answer this issue definitively), a brief apology was found to be a decent reply (when compared to an extended apology or a denial). Our findings have several practical implications. Designers and managers of online reputation systems and markets may want to give feedback comments a more prominent place on their web sites, given that they substantially influence perceived trustworthiness. In addition, given the knowledge that negative information is of special interest to users of online auctions, finding negative information could be made much easier (finding a negative comment on eBay took four to five clicks during our period of data collection). There are several obvious drawbacks to our study. First, although the experiment was set up to mimic the look-and-feel of eBay, and was certainly a step up in terms


of the realism of the choice circumstances compared to many scenario-based experiments, our participants did not actually buy the product they favored. Even though, strictly speaking, the underlying model does not assume that an actual sale needs to be involved (McFadden’s choice model merely measures preferences), one could rightly posit that results might have been different when an actual purchase was at stake. Second, the negative value of an accusation and the positive value of a rebuttal are subtle matters and may depend on the precise wording. We are fairly certain that our manipulations worked well, for example in the sense that accusations of incompetence were indeed interpreted in that light (and not as an accusation of immorality), but it might still matter how the accusation itself is phrased. The same argument goes for the rebuttal comments. Third, it is worth noting that negative information in our experiment was easily accessible and overexposed when compared to the real eBay site. This was because we had to make the offers more compact (e.g., by mixing user profile information with the product page) and needed to restrict ourselves to the variables that we could meaningfully vary. Although the manipulation of the comments and rebuttals might create its own demand to some extent – because participants react to the variance in stimuli – it does seem that they react in the same way and, perhaps more importantly, they do seem to react more strongly to the text than to the numbers. Nevertheless, future research should partial out whether effects of semantic feedback remain as strong as they are here when they are less outspoken than in our scenarios. Which kind of rebuttal best fits different accusations remains mysterious, in the sense that the theoretical arguments are as yet still not very refined (“the rebuttal should match the accusation” is the basic argument) and empirical results vary across studies. Future studies should address this issue, both in terms of theoretical elaboration and of empirical testing. Finally, although we see no compelling reasons why other reputation systems on Internet sites would be different from a theoretical point of view, it is an open empirical question whether our findings extend beyond eBay to other auction sites, or perhaps to the struggle for trustworthiness in other social media sites. Moreover, the precise reason for the difference between face-to-face and online trust rebuilding requires further examination (Kim et al. 2004). Buyers do not change their nature when they purchase in online environments. Therefore, future theory should try to explain why, and under what conditions, users behave in different ways in online and offline worlds.


Appendix

The tables below contain the list of experimental manipulations (textual feedback) used in the study. All texts were originally in Dutch.

Tab. 4: Scenario 1, “Broken product”.

Condition | Content of accusation | Content of seller’s response
Accusation of incompetence + No reaction | Incompetent eBayer, product was damaged (badly packaged) | (none)
Accusation of incompetence + Pure denial | Incompetent eBayer, product was damaged (badly packaged) | Nonsense. I cannot be blamed for this
Accusation of incompetence + Denial with explanation | Incompetent eBayer, product was damaged (badly packaged) | Item was packaged carefully. Not my fault if he drops it while unwrapping!
Accusation of incompetence + Pure apology | Incompetent eBayer, product was damaged (badly packaged) | My apologies, my mistake
Accusation of incompetence + Apology with explanation | Incompetent eBayer, product was damaged (badly packaged) | I apologize. I wrapped an item this way for the first time and thought it would work out well
Accusation of immorality + No reaction | Unreliable eBayer, product malfunctioned | (none)
Accusation of immorality + Pure denial | Unreliable eBayer, product malfunctioned | Nonsense. I cannot be blamed for this
Accusation of immorality + Denial with explanation | Unreliable eBayer, product malfunctioned | Accusation is false. My advertisements are always completely truthful
Accusation of immorality + Pure apology | Unreliable eBayer, product malfunctioned | My apologies, my mistake
Accusation of immorality + Apology with explanation | Unreliable eBayer, product malfunctioned | Sorry, my mistake. I created the ad in a hurry and forgot to mention these small defects


Tab. 5: Scenario 2, “Delayed delivery”.

Condition | Content of accusation | Content of seller’s response
Accusation of incompetence + No reaction | Wrong zip code, it took four weeks before I got the product | (none)
Accusation of incompetence + Pure denial | Wrong zip code, it took four weeks before I got the product | Nonsense. I cannot be blamed for this
Accusation of incompetence + Denial with explanation | Wrong zip code, it took four weeks before I got the product | Buyer gave me the wrong zip code
Accusation of incompetence + Pure apology | Wrong zip code, it took four weeks before I got the product | My apologies, my mistake
Accusation of incompetence + Apology with explanation | Wrong zip code, it took four weeks before I got the product | I apologize. I have mixed up two digits
Accusation of immorality + No reaction | Unreliable eBayer, it took four weeks before I got the product | (none)
Accusation of immorality + Pure denial | Unreliable eBayer, it took four weeks before I got the product | Nonsense. I cannot be blamed for this
Accusation of immorality + Denial with explanation | Unreliable eBayer, it took four weeks before I got the product | I have shipped it immediately, don’t know what is going on at the post office
Accusation of immorality + Pure apology | Unreliable eBayer, it took four weeks before I got the product | My apologies, my mistake
Accusation of immorality + Apology with explanation | Unreliable eBayer, it took four weeks before I got the product | Sorry. I was sick and did not get around to it

Bibliography

[1] Ba, Sulin, and Paul A. Pavlou. 2002. “Evidence of the effect of trust building technology in electronic markets: Price premiums and buyer behavior.” MIS Quarterly 26(3):243–268.
[2] Bajari, Patrick, and Ali Hortacsu. 2004. “Economic Insights from Internet Auctions.” Journal of Economic Literature 42:457–486.
[3] Basso, Kenny, and Cristiane Pizzutti. 2016. “Trust Recovery Following a Double Deviation.” Journal of Service Research 19(2). doi:10.1177/1094670515625455.
[4] Bhattacherjee, Anol. 2002. “Individual trust in online firms: Scale development and initial test.” Journal of Management Information Systems 19(1):211–241.
[5] Bolton, Gary E., Elena Katok, and Axel Ockenfels. 2002. “How effective are online reputation mechanisms? An experimental investigation.” Management Science 50(11):1587–1602.
[6] Bolton, Gary E., Elena Katok, and Axel Ockenfels. 2004. “Trust among Internet Traders: A Behavioral Economics Approach.” Analyse und Kritik 26(1):185–202.
[7] Bottom, William P., Kevin Gibson, Steven E. Daniels, and J. Keith Murnighan. 2002. “When talk is not cheap: Substantive penance and expressions of intent in rebuilding cooperation.” Organization Science 13(5):497–513.
[8] Chang, Wen-Hsi, and Jau-Shien Chang. 2012. “An effective early fraud detection method for online auctions.” Electronic Commerce Research and Applications 11(4):346–360.
[9] Chiu, Chao-Min, Hsin-Yi Huang, and Chia-Hui Yen. 2010. “Antecedents of trust in online auctions.” Electronic Commerce Research and Applications 9(2):148–159.
[10] Coleman, James S. 1990. Foundations of Social Theory. Cambridge, MA: Harvard University Press.
[11] Dellarocas, Chrysanthos. 2003. “The digitization of word-of-mouth: Promise and challenges of online feedback mechanisms.” Management Science 49(10):1407–1424.
[12] Diekmann, Andreas, Ben Jann, Wojtek Przepiorka, and Stefan Wehrli. 2014. “Reputation formation and the evolution of cooperation in anonymous online markets.” American Sociological Review 79(1):65–85. Online supplement: http://goo.gl/fLCWWc.
[13] Diekmann, Andreas, Ben Jann, and David Wyder. 2009. “Trust and reputation in internet auctions.” Pp. 139–165 in eTrust: Forming Relationships in the Online World, edited by K. S. Cook, C. Snijders, V. Buskens, and C. Cheshire. New York: Russell Sage Foundation.
[14] Grabner-Kräuter, Sonja, and Ewald A. Kaluscha. 2003. “Empirical research in online trust: A review and critical assessment.” International Journal of Human-Computer Studies 58(6):783–812.
[15] Jarvenpaa, Sirkka L., Kathleen Knoll, and Dorothy E. Leidner. 1998. “Is anybody out there? Antecedents of trust in global virtual teams.” Journal of Management Information Systems 14(4):29–64.
[16] Kim, Kwanho, Yerim Choi, and Jonghun Park. 2013. “Pricing fraud detection in online shopping malls using a finite mixture model.” Electronic Commerce Research and Applications 12(3):195–207.
[17] Kim, Peter H., Donald L. Ferrin, Cecily D. Cooper, and Kurt Dirks. 2004. “Removing the shadow of suspicion: The effects of apology versus denial for repairing competence- versus integrity-based trust violations.” Journal of Applied Psychology 89(1):104–118.
[18] Korfiatis, Nikolaos, Elena García-Bariocanal, and Salvador Sánchez-Alonso. 2012. “Evaluating content quality and helpfulness of online product reviews: The interplay of review helpfulness vs. review content.” Electronic Commerce Research and Applications 11(3):205–217.
[19] Lee, Zoonky, Il Im, and Sang J. Lee. 2000. “The effect of negative buyer feedback on prices in Internet auction markets.” Pp. 286–287 in ICIS ’00 Proceedings of the twenty first International Conference on Information Systems, edited by W. J. Orlikowski, P. Weill, S. Ang, and H. C. Krcmar. Atlanta: Association for Information Systems.
[20] Lee, Jumin, Do-Hyung Park, and Ingoo Han. 2008. “The effect of negative online consumer reviews on product attitude: An information processing view.” Electronic Commerce Research and Applications 7(3):341–352.
[21] Li, Dahui, and Zhangxi Lin. 2004. “Negative reputation rate as the signal of risk in online consumer-to-consumer transactions.” Proceedings of ICEB:5–8.
[22] Matzat, Uwe. 2009. “The Acceptance and Effectiveness of Social Control on the Internet: A Comparison between Online Auctions and Knowledge Sharing Groups.” Pp. 238–265 in Trust and Reputation, edited by K. S. Cook, C. Snijders, and V. Buskens. New York: Russell Sage Foundation.
[23] Matzat, Uwe, and Chris Snijders. 2012. “Rebuilding Trust in Online Shops on Consumer Review Sites: Sellers’ Responses to User-Generated Complaints.” Journal of Computer-Mediated Communication 18(1):62–79.
[24] Mayer, Roger C., James H. Davis, and David Schoorman. 1995. “An Integrative Model of Organizational Trust.” The Academy of Management Review 20(3):709–734.
[25] McFadden, Daniel L. 1974. “Conditional Logit Analysis of Qualitative Choice Behavior.” Pp. 105–142 in Frontiers in Econometrics, edited by P. Zarembka. New York: Academic Press.

Online Reputation: Damaging and Rebuilding Trustworthiness |

443

[26] Metzger, Miriam J. 2004. “Privacy, trust, and disclosure: Exploring barriers to electronic commerce.” Journal of Computer-Mediated Communication 9(4). doi:10.1111/j.10836101.2004.tb00292.x. [27] Metzger, Miriam J. 2006. “Effects of site, vendor, and consumer characteristics on web site trust and disclosure.” Communication Research 33(3):155–179. [28] Ohbuchi, Ken-ichi, Masuyo Kameda, and Nariyuki Agarie. 1989. “Apology as aggression control: Its role in mediating appraisal of and response to harm.” Journal of Personality and Social Psychology 56(2):219–227. [29] Park, Do-Hyung, and Sara Kim. 2008. “The effects of consumer knowledge on message processing of electronic word-of-mouth via online consumer reviews.” Electronic Commerce Research and Applications 7(4):399–410. [30] Pavlou, Paul A., and Angelika Dimoka. 2006. “The nature and role of feedback text comments in online marketplaces: Implications for trust building, price premiums, and seller differentiation.” Information Systems Research 17(4):392–414. [31] Pavlou, Paul A., and David Gefen. 2004. “Building effective online marketplaces with institution-based trust.” Information Systems Research 15(1):37–59. [32] Przepiorka, Wojtek. 2014. “Reputation in offline and online markets: Solutions to trust problems in social and economic exchange.” The European Economic Sociology Newsletter 16(1):4– 10. [33] Reeder, Glenn D., and Marilynn B. Brewer. 1979. “A schematic model of dispositional attribution in interpersonal perception.” Psychological Review 86(1):61–79. [34] Resnick, Paul, Richard Zeckhauser, Eric Friedman, and Ko Kuwabara. 2000. “Reputation systems: Facilitating trust in internet interactions.” Communications of the ACM 43(12):45–48. [35] Schniter, Eric, Roman M. Sheremeta, and Daniel Sznycer. 2013. “Building and rebuilding trust with promises and apologies.” Journal of Economic Behavior and Organization 94:242–256. doi:10.2139/ssrn.2144021. [36] Schweitzer, Maurice E., John C. Hershey, and Eric T. Bradlow. 2006. “Promises and lies: Restoring violated trust.” Organizational Behavior and Human Decision Processes 101(1):1–19. [37] Skowronski, John J., and Don E. Carlston. 1987. “Social judgment and social memory: The role of cue diagnosticity in negativity, positivity, and extremity biases.” Journal of Personality and Social Psychology 52(4):689–699. [38] Snijders, Chris, and Jeroen Weesie. 2009. “Trust and reputation in an online programmer’s market: A case study.” Pp. 166–185 in Trust and Reputation, edited by K. S. Cook, C. Snijders, V. Buskens, and C. Cheshire. New York: Russel Sage Foundation. [39] Snijders, Chris, and Richard Zijdeman. 2004. “Reputation and Internet Auctions: eBay and beyond.” Analyse and Kritik 26(1):158–184. [40] Standifird, Stephen S. 2001. “Reputation and e-commerce: eBay auctions and the asymmetrical impact of positive and negative ratings.” Journal of Management 27(3):279–295. [41] Utz, Sonja, Peter Kerkhof, and Joost van den Bos. 2012. “Consumers rule: How consumer reviews influence perceived trustworthiness of online stores.” Electronic Commerce Research and Applications 11(1):49–58. [42] Utz, Sonja, Uwe Matzat, and Chris Snijders. 2009. “Online Reputation Systems: The Effects of Feedback Comments and Reactions on Building and Rebuilding Trust in Online Auctions.” International Journal of Electronic Commerce 13(3):95–118. [43] Wojciszke, Bogdan. 2005. “Affective concomitants of information on morality and competence.” European Psychologist 10(1):60–70.

Part VI: Game Theory

Heinrich H. Nax, Ryan O. Murphy, and Dirk Helbing

Nash Dynamics, Meritocratic Matching, and Cooperation

Abstract: John F. Nash (1950) proposed dynamics for repeated interactions according to which agents myopically play individual best responses against their observations of other agents' past play. Such dynamics converge to Nash equilibria. Without suitable mechanisms, this means that best-response dynamics can lead to low levels of cooperative behavior and thus to inefficient outcomes in social dilemma games. Here, we discuss the theoretical predictions of these dynamics in a variety of social dilemmas and assess them in light of behavioral evidence. We particularly focus on "meritocratic matching", a class of mechanisms that leads to both low-cooperation (inefficient) and high-cooperation (near-efficient) equilibria (Gunnthorsdottir et al. 2010; Nax, Murphy, and Helbing 2014; Nax et al. 2015). Most behavioral theories derived from related social dilemmas cannot explain the behavioral evidence for this class of games, but Nash dynamics provide a very satisfactory explanation. We also argue that Nash dynamics provide a parsimonious account of behavioral results for several different social dilemmas, with the exception of the linear public goods game.

1 Introduction

Without an appropriate institution, norm or mechanism, cooperation in repeated social dilemma interactions tends to deteriorate over time. Evidence of such phenomena is not restricted to controlled laboratory experiments (see, for example, Ledyard 1995 and Chaudhuri 2011 for reviews); it is also a widely recorded phenomenon in real-world interactions such as collective action and common-pool resource management (Olson 1965; Ostrom 1990; Ostrom 2000; Ostrom 2005; Ostrom, Walker, and Gardner 1994; Ostrom and Walker 2013). The long-term fate of these collective interactions is often a tragedy of the commons (Hardin 1968), a situation in which narrow self-interest and rational behavior lead to collectively worse outcomes than could be achieved by collective action. Game theory (von Neumann and Morgenstern 1944), especially non-cooperative game theory (Nash 1950; Nash 1951), has made formal analyses of strategic interactions possible. This formal framework allows the identification of the key characteristics that determine whether interactions will lead to efficient outcomes or succumb to the tragedy of the commons. Large parts of the game-theoretic inquiry are concerned with static models. In this chapter, we focus on dynamic models, particularly best-response dynamics as first introduced in Nash's Ph.D. dissertation (Nash 1950). According to such dynamics, indi-


viduals play a repeated game and adapt by myopically best responding, or choosing the strategy that maximizes their current-period payoff in response to the strategies they observed other agents to adopt in the past. Such dynamics converge to a Nash equilibrium, as articulated in Nash’s dissertation. This part of Nash’s dissertation is not as well-known because the short section of the dissertation that contained this material was omitted in the publication (Nash 1951; see Young 2011 for a discussion). More broadly, Nash dynamics are part of the strand of evolutionary game theory models that are not strictly based on the classical replicator argument. Other important dynamics that are closely related to Nash dynamics include fictitious play, stochastic fictitious play, and other forms of belief-based learning models. It is worth emphasizing again that all these dynamic approaches are about noisily best-responding agents, usually presuming agents are driven by narrow self-interest, and that, even holding preferences constant, we can go far in understanding and modeling emergent behavior when considering such dynamics. The dynamic modeling of emergent behavior is a fruitful approach in two ways. First, it is often quite powerful: we can explain why certain outcomes are selected rather than others by looking at the long-term outcomes of the noisy dynamics. Second, dynamic modeling is general: we do not need a different preference theory for each different game, or a preference model at the individual level for all these different games. This is not to argue that individual level preference models are wrong or useless, but they may not be a good first step in trying to understand complicated social interactions and associated emergence behavior. The parsimony of the dynamic model makes it a promising first step and, as we show, can be applied broadly. In the next section, we apply a Nash dynamics analysis to four types of social dilemma game, for each of which there are many real-world examples. (1) The linear public goods game. Individuals decide separately whether, and if so how much, to contribute to the provisioning of a public good. Each individual contribution leads to a greater collective benefit, but the individual cost of its provision exceeds the private benefit. There is therefore a clash between private and collective interests. Not contributing is the dominant individual strategy, but full contribution by everyone maximizes collective payoffs. This game, with a simple linear payoff structure, was first studied as the voluntary contribution mechanism (VCM) applied to the linear public goods game (L-PGG) by Marwell and Ames (1979) (see also Issac, McCue, and Plott 1985). The unique Nash equilibrium of the VCM in the context of L-PGG is universal freeriding/noncontribution, while the outcome that maximizes collective payoffs is universal full-contribution. (2) The step-level public good. In this game, a certain minimum number (or a certain level) of contributions is needed to reach a threshold at or above which the public good is provided. Reaching the threshold requires contributions by two or more players. An individual is


best off if the public good is provided but he himself freerides. The second-best outcome is for the public good to be provided and for the individual to have also contributed himself. Third-best outcome is for the public good not to be provided, without the individual contributing himself. The worst outcome for the individual is to contribute but for the total contributions to fall short of the threshold, leading to the public good not being provided. Palfrey and Rosenthal (1984) introduced this class of game, which has since been studied experimentally (e.g., Kragt, Orbell, and Dawes 1983; Rapoport and Suleiman 1993). There are two types of pure Nash equilibria in this game: one where the threshold is exactly met, which is also the outcome that potentially maximizes collective payoffs, and one where no one contributes, that is, nonprovision of the public good. One therefore has a coordination problem: who, if anyone, will contribute? (3) The volunteer’s dilemma. Exactly one volunteer is needed to provide the public good. The individual is bestoff if the public good is provided by someone else (i.e., someone else volunteers). However, in case no one else volunteers, the individual best response is to do it (i.e., to be the volunteer). Diekmann (1985) introduced this game. The pure Nash equilibria in this game are asymmetric, with any one of the players volunteering. There is also a mixed equilibrium that results in positive probabilities of volunteering. The socially desirable outcome, in terms of total collective payoffs, is for the volunteer to be the one who has the lowest cost of volunteering, but the problem is again one of coordination: who will volunteer? (4) Group-based meritocratic matching. Individuals jointly create a good, which is of a public-goods character, within several separate “clubs” (as in Buchanan 1965): that is, the benefits from the public goods provided in each club do not transcend club boundaries. Inside the club, the same structure as in L-PGG prevails, but admittance to clubs is based on contribution decisions. We refer to this contribution-based group admittance as “meritocratic matching”. Gunnthorsdottir et al. (2010), Nax, Murphy, and Helbing (2014), and Nax et al. (2015) formulated such “meritocratic matching” mechanisms theoretically and empirically. As in the step-level public goods game, the game potentially (whether this is the case depends on various parameters of the game) has two types of pure Nash equilibria: one (which often but not always exists) with many contributions, which is near-efficient, and the other (which always exists) with universal freeriding. Again, a coordination problem emerges and the crux of the strategic interaction is finding the cooperators/contributors. Next, we will show that the kind of Nash dynamics discussed above provide a simple and parsimonious account of behavior in many social dilemma games. Only lin-


ear public goods games are an exception to this dynamics account. We believe this is notable because a large (perhaps disproportionate) amount of attention has been given to these games, which has led to a commonly held conclusion that more complex mechanisms like social preferences (or reciprocity, which is changes/dynamics in actions or social preferences) are required to understand cooperative behavior in social interactions. While this may be true for linear public goods games, this may not be so in many other games, as selfish but imperfect (noisy) agents interacting in particular settings lead to efficient outcomes, and yield various commonly observed dynamics. The spontaneous invocation of social preferences in these other games is perhaps unnecessary, as other simpler mechanisms can account for much of the observed behavior from experiments. Moreover, these other social dilemma games contain multiple equilibria that better correspond to various interactions in the real world, as has been argued, for example, by various biologists studying the volunteer’s dilemma (Raihani and Bshary 2011). Often, mechanisms transform interactions into coordination games from social dilemma games, and social dilemmas with multiple equilibria present agents again with a coordination problem. Take the example of cooperation emerging in the presence of punishment. If cooperation is upheld by punishment of defectors, it is an open question who will punish when defection occurs. What is noteworthy is that Nash dynamics offer a solution to this coordination problem. The dynamics settle into equilibria whose “basins of attraction” are bigger than others. One example of this is the generalized Meritocracy games we developed. In these games, the near-efficient equilibria often have a large basin of attraction compared with the zero-contribution equilibrium, and this feature foments interesting dynamics. It establishes a “rational” pathway for narrowly self-interested agents to secure the tremendous efficiency gains available in this strategic context. Contrast this with L-PGG, where there is no such mechanism. Taken all together, we see a variety of interesting social dilemmas in the wild that are not isomorphic to the standard PD game or the common L-PGG. We strongly encourage researchers to pay more attention to identifying, modeling, and studying these other social dilemmas that are different than the fruit flies we know well (e.g., PD and L-PGG). In addition, we particularly encourage the development of quantitative models of social dilemmas that facilitate the exploration of particular mechanisms (information, signaling, etc.) that can facilitate the producing of interesting (non-trivial) dynamics. Lastly, we encourage parsimonious approaches to understanding emergent dynamics. This implies starting with a concept like Nash dynamics (or stochastic fictitious play, or other simple myopic learning models) as a way to begin accounting for emergent behavior rather than evoking more complicated preference refinements. This is not to say that non-selfish preferences do not exist, or are not potentially important, but rather that there may be other simpler mechanisms that can account for the empirics, and Occam’s razor directs us to look to these simpler mechanisms first.


The focus of this chapter will be on Nash dynamics and on meritocratic matching, which, broadly, falls under “assortative matching”, a widespread phenomenon. In evolutionary biology, assortative matching is the key mechanism underlying various forms of kin selection (Hamilton 1964a; Hamilton 1964b); for example, via limited dispersal/locality (spatial interactions; Nowak and May 1992; Eshel, Samuelson, and Shaked 1998; Skyrms 2004) or greenbeard genes (Dawkins 1976; Fletcher and Doebeli 2009; Fletcher and Doebeli 2010). Similarly, assortative matching can be expressed via “homophily” (Alger and Weibull 2012; Alger and Weibull 2013; Xie, Cheng, and Zhou 2015). More specifically, meritocratic matching is a mechanism that leads to assortativity of actions, rather than of genes or locations, and is particularly relevant for human interactions, where institutions exist that can determine who interacts with whom based on observable behavior. The “meritocratic” element is part of the institutional structure of the interaction, and not part of the decisions made by the involved individuals (even though one can think of it as an evolving institution: Nax and Rigos 2016). Examples include school/university admission, team-based payment schemes, and organizational selection and recruitment. These examples have in common that individuals gain access to better groups (leading to better payments) based on their own effort and performance. The crucial common characteristic here is that agents make a precommitted (irrevocable) investment decision that leads to assortative matching with potentially important payoff consequences at the end of the interaction (see Nax, Murphy, and Helbing 2014 for a more detailed discussion). In human interactions, meritocratic matching has been shown, theoretically and experimentally, to stabilize nearefficient contribution levels effectively (Gunnthorsdottir et al. 2010; Nax, Murphy, and Helbing 2014; Nax et al. 2015). To further assist in understanding how meritocratic matching works, we shall explain briefly (before introducing the mechanism formally in a subsequent section) its basic principles. A population of agents is divided into several groups based on contribution decisions: contributors (freeriders) tend to be matched with other contributors (freeriders). Contribution decisions precede group matching, and meritocratic matching creates incentives to contribute to be matched with others doing likewise. The resulting structures of Nash equilibria are as follows. On the one hand, there exist multiple asymmetric near-efficient equilibria with many contributors and only a few freeriders. On the other hand, there continues to exist a Pareto deficient symmetric equilibrium in which all players freeride. It is noteworthy that the meritocracy mechanism enables high-efficiency equilibria to emerge even with narrowly self-regarding players. This stands in notable contrast to the vast majority of existing work on cooperation that invokes other-regarding preferences or reciprocity (i.e., contingent otherregarding preferences) as a means of achieving collective improvement. To add to the discussion about such mechanisms, the purpose of this chapter is threefold. First, the consequences of “Nash dynamics” in various social dilemma games are reviewed and compared with experimental evidence. Second, these dynam-


ics are used as the basis to explain, in an elemental way (retaining the assumption of self-regarding preferences), how the players coordinate playing strategies that yield a nearly perfectly efficient equilibrium under meritocratic matching. Finally, differences and similarities with L-PGG, step-level public goods, and volunteer’s dilemmas are discussed. The rest of this chapter is divided as follows. First, we introduce Nash dynamics. In Section 3, we introduce the classes of social dilemmas under consideration. Sections 4 to 5, respectively, present the predictions of the Nash dynamics, discuss existing experimental evidence, and provide alternative explanations. Section 6 concludes.

2 Nash dynamics

Evolutionary arguments form the backbone of much of the "emergence of cooperation" literature (Axelrod and Hamilton 1981; Axelrod 1984) related to social dilemmas, of which, for example, West, Griffin, and Gardner (2007), and West, Mouden, and Gardner (2011) provide reviews from an evolutionary biology perspective. Nash himself proposed a particular kind of dynamic justification for his equilibrium concept, and we will focus on these "Nash dynamics" in this chapter. Before we introduce them formally, however, we would like to provide a less formal overview. First, let us recall the static justification of the Nash equilibrium: an outcome of a game is a Nash equilibrium if and only if all strategies that are being played constitute mutual best replies to one another. In other words, in an equilibrium it behooves no player to change their selected strategy unilaterally. How is a Nash equilibrium reached? One way to think about how a Nash equilibrium is arrived at is as the outcome of an indefinitely repeated game. In that game, agents have the opportunity to revise their strategies over time in light of the past actions of others as played over iterations of the game. If all agents myopically choose to play best replies against their observations of other players' chosen strategies from the past, who themselves also play best replies given their observations, then such a process will lead to a Nash equilibrium. Certainly, any Nash equilibrium is an absorbing state of such a dynamic process as no player has an incentive to unilaterally choose another action given the other players' choices. Which equilibria will emerge when there are multiple equilibria? This equilibrium selection question has been addressed in the evolutionary games literature. One route of enquiry, not based on Nash dynamics, is concerned with evolutionary stable strategies (Taylor and Jonker 1978; Helbing 1992; Helbing 1996; Weibull


1995) based on the replication/imitation of strategies with higher fitness (Darwin 1871; Maynard Smith and Price 1973; Maynard Smith 1987). Such arguments are at the heart of evolutionary game theory (often abbreviated as EGT) as applied to biology. In social scientific enquiries of human decision-making, beginning with the seminal contributions by Foster and Young (1990), Young (1993) and Kandori, Mailath, and Rob (1993), analyses take Nash dynamics as the baseline instead of replication/imitation, which presumes only gradual adaptation. The crucial novelty is that "noise" is added in the sense of random behavioral deviations from the predominant best-response rule (Helbing 2010; Mäs and Helbing 2014). This added noise, in various forms, has generated sharp long-run predictions and led to a rich theoretical literature, where long-run predictions of which Nash equilibria will be selected depend crucially on the noise modeling (Bergin and Lipman 1996; Blume 2003). Recently, the underlying assumptions of noise are being investigated behaviorally using laboratory studies (Mäs and Nax 2015; Young 2015). To date, most of this literature focuses on coordination games. In the context of social dilemma games, the connections between evolutionary explanations and behavioral/experimental studies are less direct: experimental evidence is explained by social preferences and social norms (e.g., Chaudhuri 2011), and social preferences and social norms are in turn explained by indirect evolutionary arguments (e.g., Alger and Weibull 2012; Alger and Weibull 2013). The reason for this separation is that both models are complex enough when the two issues are separated, hence a full formal treatment appears infeasible (see Schelling 1971; Skyrms 2004). The advantage of following the route of the simpler Nash dynamics, as we do in this chapter, is that these models are tractable, and their macro predictions are easy to verify. This approach is also more parsimonious and more general than the preference/norm based approaches (at least as a first step). This approach to explaining observed behaviors can be successful when the game has more than one equilibrium and/or does not feature a dominant strategy. When there is only one equilibrium (as in L-PGG), one must turn to more complex models to explain why some individuals would ever contribute. But when multiple equilibria exist, and when there is more room for strategizing, then Nash dynamics lead to/from all equilibria to one another, and perturbed Nash dynamics will make predictions about their relative stability. We shall now provide a simple framework to express several alternative individual-level adjustment dynamics that fall under the category of "Nash dynamics", and we shall use them to try and explain behavioral regularities observed in a number of experiments.

The model. Suppose the same population of players $N = \{1, 2, \ldots, n\}$, by choosing actions from the same finite action set $C$, repeatedly plays the same symmetric noncooperative game in periods $T = \{1, 2, \ldots, t\}$, where each outcome, an action $n$-tuple $c = \{c_i\}_{i \in N}$, implies payoff consequences $\phi_c = \{\phi_i(c)\}_{i \in N}$.

Denote by $BR_i^t(c_{-i}^t)$ the best response by player $i$ against the actions played by the others in period $t$, $c_{-i}^t$. Omitting period superscripts, being a best response means that $\phi_i(BR_i | c_{-i})$, that is, the payoff obtained by $i$ by playing $BR_i(c_{-i})$, is larger than any other payoff obtainable by playing an alternative action (assuming it is unique). Of course, an outcome $c$ is a Nash equilibrium if, and only if, $c_i = BR_i(c_{-i})$ for all players. We assume (along the lines of Kandori, Mailath, and Rob 1993 and Young 1993) that, over time, each player plays $BR_i(c_{-i})$ with probability $(1 - \epsilon)$, and all other strategies with some positive probability summing up to $\epsilon$. Note that, as $\epsilon \to 0$, only certain outcomes (called stochastically stable states) have positive probability in the long-run distribution of the dynamic (Foster and Young 1990). Of course, the crucial question for each player is to hypothesize what others' actions will be to decide what one should play oneself. A standard assumption in evolutionary games – and the route taken in Nash (1950) – is to base this hypothesis on information about other players' actions in the past. However, there are several alternative assumptions as to how information from the past is processed, which we will discuss below. There are also a number of differences in sampling from players' pasts, but we shall not address this issue here, assuming that all past actions are perfectly observable (as they are in many experimental settings). We illustrate these different assumptions using the two-by-two coordination game known as the battle of the sexes, where each player chooses between two actions, "opera" and "football". Each player receives a payoff of two (zero) from coordination (anti-coordination) on any of the actions, and in addition has an idiosyncratic preference worth an additional payoff of one for one of the two actions (the man prefers football, the woman prefers opera). Of course, the best response in such a game is always to match the other's action and there are, therefore, two pure-strategy Nash equilibria. The important point is that one equilibrium is better for one player, and the other equilibrium is better for the other.

2.1 Basic Nash players

John Nash himself, in his PhD thesis (Nash 1950), formulated the following model of dynamic players (as discussed in Young 2011). We shall call these players basic Nash players. A basic Nash player plays
$$BBR_i^t := BR_i(c_{-i}^{t-1}),$$
taking as his hypothesis about $c_{-i}^t$ the previous-period observation of others' actions. This model is widely used under various names, including myopic best reply. Of course, Nash equilibria are absorbing states of basic Nash play dynamics. In the context of the battle of the sexes game, if a man observes the woman playing opera this period, he will play opera next period.
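To make the adjustment rule concrete, the following minimal sketch (Python; ours, not from the chapter) simulates basic Nash play in the battle of the sexes game described above. The payoff numbers follow the text (two for coordination, zero for anti-coordination, plus one for the idiosyncratically preferred action); the asynchronous revision order and all function names are illustrative assumptions.

```python
# A minimal sketch of basic (myopic) best-reply dynamics in the battle of
# the sexes, with payoffs as in the text: 2 for coordination, 0 for
# anti-coordination, plus 1 for playing one's idiosyncratically preferred action.
ACTIONS = ["opera", "football"]
PREFERRED = {"man": "football", "woman": "opera"}

def payoff(player, own, other):
    return (2 if own == other else 0) + (1 if own == PREFERRED[player] else 0)

def best_reply(player, observed_other):
    # A basic Nash player best-responds to the action observed last period.
    return max(ACTIONS, key=lambda a: payoff(player, a, observed_other))

# Asynchronous revision (one player adjusts per period) is assumed here to keep
# the illustration simple; play converges to one of the two pure-strategy Nash
# equilibria, which are absorbing states of the dynamic.
state = {"man": "football", "woman": "opera"}  # start anti-coordinated
for period, mover in enumerate(["man", "woman", "man", "woman"], start=1):
    other = "woman" if mover == "man" else "man"
    state[mover] = best_reply(mover, state[other])
    print(period, dict(state))
```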


2.2 Clever Nash players

Building on the notion of the basic Nash player (as used in Young 1993), Saez-Marti and Weibull (1999) introduce the notion of "cleverness" among agents, leading to so-called clever Nash players. A clever Nash player plays a best response against the other player's basic best responses. More formally, a clever Nash player plays
$$CBR_i^t := BR_i(BR_{-i}(c^{t-1})) = BR_i(BBR_{-i}^t),$$
taking as his hypothesis about $c_{-i}^t$ the current-period basic Nash player best responses, $BBR_j^t = BR_j(c_{-j}^{t-1})$, for all other players $j$. A clever Nash player predicts, by looking backwards, the current play of basic Nash players, and he can therefore potentially improve his play by unilaterally responding differently to the basic best response. Naturally, Nash equilibria also remain absorbing states of clever Nash play dynamics. In the context of the battle of the sexes game, if a man played football this period he will assume that the woman will play football next period, and therefore play football next period too.

2.3 Forward-looking Nash players

In the spirit of Saez-Marti and Weibull (1999), and adding an element of forward-lookingness, we introduce the notion of "forward-looking cleverness". We shall therefore call such players forward-looking Nash players. A forward-looking Nash player plays $FBR_i^t$ to maximize his next-period payoff. He assumes all others $j$ will act as basic Nash players next period and will play $BBR_j^{t+1}$, taking as given $FBR_i^t$ and their $BBR_k^t$ for all $k \neq i, j$. For himself, he intends to play clever Nash $CBR_i^{t+1}$ against $BBR_{-i}^{t+1}$. In other words, this means that a forward-looking Nash player predicts the future consequences of his own current-period action on his next-period payoff, and he chooses his own action to maximize his forward-looking payoff. Note that a forward-looking Nash player can therefore consider the consequences of unilateral deviations and further consider the consequences of inducing multilateral deviations. Such a player can move play of the population from one Nash equilibrium to another. In general, however, Nash equilibria need not be absorbing states of forward-looking Nash dynamics. It is worth pointing out that it is an under-explored avenue to link foresightedness with the rich literature on cognitive hierarchy. In the context of the battle of the sexes game, this means that the man player will play football this period to make the woman player choose football next period, even at the risk of anti-coordination this period.


2.4 Perturbed dynamics

We assume that the order of magnitude of noise added to basic Nash play is higher than that added to clever Nash and forward-looking Nash play. However, noise is added to all types of Nash play. Hence, the process continues to be ergodic (every outcome is reached from every outcome with positive probability), meaning that we can apply ergodic theory and stochastic stability arguments. Our baseline assumption will be to consider uniform constant deviation rates for each type of agent. An alternative way of introducing noise to underlying Nash play dynamics would be to develop an approach where the probability of an error depends on its cost vis-à-vis the best response. This approach is taken in Blume (1993) and many subsequent contributions, and corresponds to the approach of quantal response equilibrium (McKelvey and Palfrey 1995; McKelvey and Palfrey 1998). Since this chapter is analytical rather than motivated by fitting data, however, we shall go for the simpler (i.e., one parameter) uniform noise assumption in our approach.
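As a complement to the sketch in Section 2.1, the uniform-noise assumption can be illustrated as follows (again Python, again ours and purely illustrative): with probability ε the revising player picks an action uniformly at random instead of the best reply, and counting visited outcomes over a long horizon gives a rough picture of the long-run distribution referred to above.

```python
import random

ACTIONS = ["opera", "football"]
PREFERRED = {"man": "football", "woman": "opera"}

def payoff(player, own, other):
    return (2 if own == other else 0) + (1 if own == PREFERRED[player] else 0)

def noisy_best_reply(player, observed_other, eps):
    # With probability (1 - eps) play the best reply; with probability eps pick
    # an action uniformly at random (the one-parameter uniform noise assumption).
    best = max(ACTIONS, key=lambda a: payoff(player, a, observed_other))
    return best if random.random() > eps else random.choice(ACTIONS)

def long_run_counts(eps=0.05, periods=100_000):
    counts, state = {}, {"man": "football", "woman": "opera"}
    for _ in range(periods):
        mover = random.choice(["man", "woman"])  # asynchronous revision
        other = "woman" if mover == "man" else "man"
        state[mover] = noisy_best_reply(mover, state[other], eps)
        outcome = (state["man"], state["woman"])
        counts[outcome] = counts.get(outcome, 0) + 1
    return counts

print(long_run_counts())  # the two coordination outcomes absorb most of the mass
```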

3 Social dilemma games

We shall now detail our main game application, called "meritocratic matching games", and introduce the other three classes of games mentioned in the introduction for subsequent comparison.

3.1 Meritocratic matching games

Consider the following meritocratic matching game (MMG). All agents in the population $N = \{1, 2, \ldots, n\}$ have to decide simultaneously whether to contribute toward the provision of local public goods (in the sense of a club/team good; Buchanan 1965), choosing an arbitrary amount $c_i$ from some fixed budget $B$ such that $c_i \in [0, B]$. Given the vector of all contributions, $c$, the population is divided into several groups of equal size $s < n$, and contributors (freeriders) tend to be matched with contributors (freeriders). We shall call such a matching meritocratic matching, and the resulting class of games are the meritocratic matching games. Meritocratic matching encompasses a range from no-meritocracy to full-meritocracy, which we shall instantiate as follows. Suppose i.i.d. Gaussian noise, $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$ with $\sigma \in (0, \infty)$, is added to each actual contribution decision so that, instead of the actual contribution $c_i$, only the noised contribution, $(c_i + \epsilon_i)$, is observable. Players are then ranked according to $\{(c_i + \epsilon_i)\}_{i \in N}$, from highest to lowest, and groups are composed from this order: the highest $s$ values of $(c_i + \epsilon_i)$ form group one, etc.


$\beta := 1/\sigma$ represents the level of meritocracy in the system.
– For $\beta \to \infty$ (or $\sigma^2 \to 0$), noise vanishes and we approach full-meritocracy. In that case, no player contributing less than another can be ranked higher than him, but there is random tie-breaking to ascertain the precise ranking of contributors who contribute the same amount.
– For $\beta \to 0$ (or $\sigma^2 \to \infty$), noise takes over and we approach no-meritocracy or random group matching (as, for example, in Andreoni 1988).
The strength of assortativity in our process is expressed by $\beta$, an index that can be related to the so-called "index of assortativity" (Bergstrom 2003; Bergstrom 2013; Jensen and Rigos 2014; Nax and Rigos 2016; see also Wright's "F-statistic" 1921; 1922; 1965). It corresponds to the notion of "institutional fidelity" in the sense that some real-world institutions/mechanisms endeavor to be meritocratic but do so imperfectly for a variety of different reasons. Meritocratic matching, in the form of an assortative matching of contributors and freeriders alike, creates incentives to contribute so as to be matched with others doing likewise. Given contribution decisions and the groups that form based on these, some marginal per-capita rate of return, $r \in (1/s, 1)$, determines the return in each of the $n/s$ local public goods, which are shared equally among the agents in each group. Note that there are no payoff transfers between players. Denote by $S_i$ the group in which $i$ is matched. Consequently, $i$ will receive a monetary payoff of $\phi_i = B - c_i + r \cdot \sum_{j \in S_i} c_j$. There is no a priori dominant strategy in this game under meritocratic matching (provided $\beta$ is sufficiently high). Nevertheless, the outcome where all players freeride (contribute zero) is a Nash equilibrium for any value of $\beta$. In addition, if $\sigma^2$ is not too large and $r$ is sufficiently large, then there exist additional Nash equilibria (Gunnthorsdottir et al. 2010; Nax, Murphy, and Helbing 2014). These are asymmetric outcomes where a large majority, of size $m > (n - s)$, of the population contributes fully, while only a marginal minority of players, of size $(n - m) < s$, freerides. The exact size of the freeriding minority depends on the game's defining parameters. Meritocracy has been shown experimentally (Gunnthorsdottir et al. 2010; Rabanal and Rabanal 2014) to effectively implement near-efficient contribution levels at the high Nash equilibrium values. This result has been shown to generalize to the inclusion of noise for general meritocracy levels, theoretically (Nax, Murphy, and Helbing 2014) and experimentally (Nax et al. 2015). It is noteworthy that this mechanism works, in the sense of implementing near-efficient outcomes, with homogeneous, narrowly self-regarding (Nash) players. An issue we have so far left unaddressed is to explain how the population coordinates into play of high equilibria.
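To fix ideas, here is a minimal one-round sketch of the mechanism just described (Python; the parameter values, the contribution profile, and all names are illustrative assumptions, not data or code from the cited studies): contributions are perturbed by Gaussian noise, players are ranked, groups of size $s$ are filled from the top of the ranking, and payoffs follow $\phi_i = B - c_i + r \cdot \sum_{j \in S_i} c_j$.

```python
import random

def mmg_round(contributions, s, r, B, sigma):
    """One illustrative round of a meritocratic matching game.

    contributions: list of c_i in [0, B]; s: group size (must divide n);
    r: marginal per-capita rate of return in (1/s, 1); sigma: noise std. dev.
    Returns payoffs phi_i = B - c_i + r * (sum of contributions in i's group).
    """
    n = len(contributions)
    assert n % s == 0
    # Rank players by noisy contributions c_i + eps_i, highest first.
    noisy = [(c + random.gauss(0.0, sigma), i) for i, c in enumerate(contributions)]
    order = [i for _, i in sorted(noisy, reverse=True)]
    payoffs = [0.0] * n
    for g in range(n // s):
        group = order[g * s:(g + 1) * s]
        pot = sum(contributions[i] for i in group)
        for i in group:
            payoffs[i] = B - contributions[i] + r * pot
    return payoffs

# Example: 12 players in groups of 4, ten full contributors and two freeriders
# (an assumed profile in the spirit of the asymmetric near-efficient equilibria).
c = [20] * 10 + [0, 0]
print(mmg_round(c, s=4, r=0.5, B=20, sigma=0.01))   # near full meritocracy
print(mmg_round(c, s=4, r=0.5, B=20, sigma=100.0))  # close to random matching
```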


3.2 Related games

3.2.1 Linear public goods game

The voluntary contributions mechanism (VCM) in the context of a linear public goods game (L-PGG), introduced by Marwell and Ames (1979; see also Isaac, McCue, and Plott 1985 and Isaac and Walker 1988), is used widely to study public goods dilemma games in the behavioral sciences (see reviews by Ledyard 1995 and Chaudhuri 2011). Under the linear public goods game (L-PGG), all agents in the population $S = \{1, 2, \ldots, s\}$ simultaneously decide how much to contribute to a public good, choosing an arbitrary amount $c_i$ from some fixed budget $B$ such that $c_i \in [0, B]$. As before, given all contributions $c$, together with a fixed marginal per-capita rate of return $r \in (1/s, 1)$, the public good is then shared equally among the agents, who receive a total payoff of $\phi_i = B - c_i + r \cdot \sum_{j \in S} c_j$. Notice now there is no link between contribution decisions and group matching. Of course, the way to maximize one's payoff, therefore, given any combination of contribution decisions by the others, is to set $c_i = 0$. The unique Nash equilibrium is thus characterized by universal freeriding, meaning non-provision of the public good and lowest collective payoffs.
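The dominance of freeriding can be read off the payoff function directly; the following short derivation (ours, added for clarity) differentiates the payoff given above with respect to one's own contribution and sums payoffs over the group.

```latex
% Marginal effect of one's own contribution, and collective payoffs, in the L-PGG:
\begin{align*}
  \phi_i &= B - c_i + r \sum_{j \in S} c_j,
  & \frac{\partial \phi_i}{\partial c_i} &= r - 1 < 0
  \quad\text{since } r \in \bigl(\tfrac{1}{s}, 1\bigr),\\
  \sum_{i \in S} \phi_i &= sB + (sr - 1) \sum_{i \in S} c_i,
  & &\text{which increases in contributions since } sr > 1.
\end{align*}
```

So freeriding ($c_i = 0$) is individually dominant while universal full contribution maximizes collective payoffs, which is exactly the clash between private and collective interests described in the introduction.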

3.2.2 Step-level public goods

The VCM with decisions restricted to a binary choice of whether to contribute ($c_i = 1$) or to freeride ($c_i = 0$) is a special case of the step-level public goods game introduced by Palfrey and Rosenthal (1984), which we shall abbreviate as k-PGG. In k-PGG, agents, again, simultaneously decide whether to contribute or not (now $B = 1$). The public good is then provided and shared equally among the agents if, and only if, at least $k$ agents contribute. If fewer than $k$ agents contribute, payoffs are $\phi_i = 0$ for contributors and $\phi_i = 1$ for freeriders. If at least $k$ agents contribute, payoffs are $\phi_i = s \cdot r - 1$ for contributors and $\phi_i = s \cdot r$ for freeriders. The way to maximize one's own payoff, given exactly $(k - 1)$ contributors among the others, is to contribute. However, for any other number of contributors among the others, a unilateral decision is not pivotal, hence the best response is to freeride. Therefore, the resulting structure of pure-strategy Nash equilibria for k-PGGs with $1 < k < s$ is as follows:
(A) there exist multiple asymmetric equilibria, in each of which exactly $k$ players contribute and the others freeride;
(B) there exists a symmetric equilibrium in which all players freeride.
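The pivotality logic can be checked mechanically; the sketch below (Python, with illustrative parameter values) computes a player's payoff from contributing versus freeriding for each possible number of contributors among the others, using the payoff rules just stated.

```python
def k_pgg_payoff(contribute, others_contributing, k, s, r):
    # Step-level public good: provided iff total contributors >= k.
    total = others_contributing + (1 if contribute else 0)
    provided = total >= k
    if contribute:
        return s * r - 1 if provided else 0
    return s * r if provided else 1

# Example with s = 5 players, threshold k = 3, rate of return r = 0.5.
s, k, r = 5, 3, 0.5
for others in range(s):
    pay_c = k_pgg_payoff(True, others, k, s, r)
    pay_f = k_pgg_payoff(False, others, k, s, r)
    best = "contribute" if pay_c > pay_f else "freeride"
    print(f"{others} others contribute: contribute={pay_c}, freeride={pay_f} -> {best}")
```

As expected, contributing is a best reply only when exactly $k - 1$ others contribute; everywhere else the player is not pivotal and freerides.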


3.2.3 The volunteer's dilemma

An important, and different, special case of k-PGG is when $k = 1$: the volunteer's dilemma game (Diekmann 1985), here abbreviated VDG. In VDG, the symmetric equilibrium in which all players freeride falls apart because every player is pivotal when no one volunteers, and the only equilibria in pure strategies are asymmetric, such that exactly one player volunteers. A symmetric mixed-strategy equilibrium where all players contribute with some positive probability also exists, with the surprising comparative static that lower-cost volunteers will volunteer with a smaller probability in equilibrium than higher-cost players.
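For completeness, the symmetric mixed equilibrium can be derived from the k-PGG payoffs above with $k = 1$ (this derivation is ours, under the assumption $sr > 2$ so that the volunteering probability is interior): a volunteer gets $sr - 1$ for sure, while a freerider gets $sr$ if at least one of the $s - 1$ others volunteers and $1$ otherwise, and indifference pins down the volunteering probability $p$.

```latex
% Indifference condition for the symmetric mixed equilibrium of the VDG (k = 1),
% writing q = (1 - p)^{s-1} for the probability that no other player volunteers:
sr - 1 = sr\,(1 - q) + 1 \cdot q
\;\Longrightarrow\;
q = \frac{1}{sr - 1}
\;\Longrightarrow\;
p = 1 - \left(\frac{1}{sr - 1}\right)^{\frac{1}{s-1}} .
```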

4 Predictions

4.1 Baseline evolutionary predictions

4.1.1 Related games

First, we shall consider the baseline evolutionary predictions for the voluntary contribution mechanism applied to the linear public goods game (L-PGG) and for the step-level public goods game (k-PGG). Universal freeriding is the only stochastically stable equilibrium in L-PGG and k-PGG (Myatt and Wallace 2008). For L-PGG under VCM, the reason is quite simply that there is only one Nash equilibrium, which is the noncooperative outcome. For k-PGG, the reason is more subtle and is a consequence of the incentive structure. Above the threshold, that is, when there are already sufficiently many cooperators/contributors to provide the public good, freeriding is a better strategy. The same is true two steps below the threshold. Hence, only at the threshold (for a contributor), or one step below (for a defector), is contributing a best reply. A positive chance of miscoordination away from the local attractor of the high Nash equilibrium, say by an ϵ-tremble or a "bad apple" amid the contributors (Myatt and Wallace 2008), therefore takes players away from the efficient equilibrium (out of the good basin of attraction) toward the freeriding equilibrium (into the bad basin of attraction). In terms of dynamics over time, evolutionary predictions are quite different between L-PGG and k-PGG, due to the absence of a dominant strategy in k-PGG. In L-PGG, since freeriding is a dominant strategy, we should observe play at or very close to zero contributions throughout. By contrast, in k-PGG, we may see initial play at the high-equilibrium outcome, followed by a relatively sharp drop down to freeriding, where the process will remain. Next, we turn to the volunteer's dilemma games (VDG). In VDG, all (pure strategy) Nash equilibria are stochastically stable if the game is symmetric. What happens when there is asymmetry depends on the underlying noise structure. Under uniform devi-


ation rates, all outcomes are stochastically stable. With cost-dependent deviations, the outcome where the player with the lowest provision-cost would volunteer is selected. In terms of evolutionary stability, all Nash equilibria are stable. In the presence of asymmetries amongst players, the evolutionary stable states may additionally include asymmetric mixed-strategy equilibria, and even the welfare-maximizing equilibrium where the lowest-cost player volunteers with probability one may be an evolutionary stable state (He et al. 2014). The dynamics of the evolving game may be quite complex (see Raihani and Bshary 2011; Diekmann and Przepiorka 2015). We may see turn-taking behaviors, lock-in to one specific volunteer, and occasional breakdown or over-volunteering if players follow mixed strategies. A symmetric VDG can have quick lock-in behavior under stochastic fictitious play. If all players have a non-zero probability of volunteering, eventually one of them does, and they are then frozen in that configuration with the unlucky volunteer and everyone else staying out.

4.1.2 Meritocratic matching games

In terms of stability, the class of MMGs is divided roughly into three types:
(A) For games with low meritocracy and/or low rates of return, the freeriding equilibrium is the unique Nash prediction and therefore stable.
(B) For games with intermediate meritocracy and an intermediate rate of return, near-efficient equilibria exist, but the freeriding equilibrium is the unique stable equilibrium.
(C) For games with high meritocracy and a high rate of return, near-efficient equilibria exist and are stable.
Play of the two extreme cases – (A) and (C) – from the experiment by Nax et al. (2015) is summarized in Figure 1. In terms of dynamics over time, ex ante, we would expect the following behaviors for the three groups of MMGs. For the first (A), we expect similar dynamics as in L-PGG. Indeed, this is what Figure 1 illustrates. For the second (B), we expect similar dynamics as in k-PGG. Finally, for the third (C), we expect dynamics where initial play could either already be close to a high equilibrium, or start closer to the freeriding outcome. In the latter case, we would expect a relatively sharp increase in contributions quickly, leading from the freeriding outcome to high-equilibrium cooperation levels. Once at high-equilibrium levels, we would expect the process to remain there with high stability. Figure 1 illustrates experimental play of (C).


[Figure 1 shows two panels of contribution frequencies by round: the left panel for random matching ("random") and the right panel for exogenous perfect meritocratic matching ("exo perfect"); the axes are Contribution, Frequency, and Round.]

Notes: Treatments varied with respect to the degree of meritocracy in the system, ranging from "no-merit" (random re-matching) to "perfect-merit" (a perfectly meritocratic matching protocol). In the figure, the contribution patterns for the case of no-merit are shown on the left, and for perfect-merit on the right.

Fig. 1: Contribution patterns from a laboratory experiment under meritocratic matching (Source: Nax et al. (2015); kindly produced by S. Balietti).

4.2 Experimental evidence

The experimental evidence for the L-PGG is well known (as reviewed in Ledyard 1995, and more recently in Chaudhuri 2011). Basically, without further mechanisms, contributions start at some intermediate level and decay over time by roughly half the amount every ten rounds under random re-matching (Andreoni 1988), and less when group matching is fixed. The initial contribution pattern is unexpected and is probably best explained by the introduction of additional features such as social preferences (Fischbacher and Gächter 2010; Chaudhuri 2011) or learning (Burton-Chellew, Nax, and West 2015). Nash dynamics cannot explain these high initial levels of cooperation/contribution, but they do explain what happens with iterated interactions. For k-PGGs, the pattern depends crucially on how many contributors, relative to the population size, are needed. The likelihood that the threshold is met or exceeded is higher for lower thresholds relative to the population size (for important and recent contributions see, for example, Erev and Rapoport 1990; Potters, Sefton, and Vesterlund 2005; Potters, Sefton, and Vesterlund 2007; Gächter et al. 2010a; and Gächter et al. 2010b; a lacuna exists for an up-to-date literature review of k-PGGs). For VDGs, Diekmann and Przepiorka (2015) discuss much of the relevant behavioral evidence. What is important is that there is evidence of turn-taking. Moreover, the counterintuitive comparative static of the mixed equilibrium, where the lowest-cost volunteers volunteer with probabilities lower than the others, is behaviorally not confirmed. Instead, they contribute more often, which is more in line with Harsanyi-


Selten logic that the payoff-dominant Nash equilibria are behaviorally focal (as discussed in Diekmann 1993). This is also what Nash dynamics with cost-dependency predict. For MMGs, there exists evidence that suggests that near-efficient contribution levels are achieved close to what theory predicts (Gunnthorsdottir et al. 2010; Rabanal and Rabanal 2014; Nax, Murphy, and Helbing 2014), and for noisy meritocracy (e.g., imperfect assortative matching). Three aspects of behavioral evidence are especially noteworthy: (1) The near-efficient equilibrium is the uniquely stable equilibrium, in the sense that outcomes at or close to that equilibrium are played in all experiments. The freeriding equilibrium is never played. (2) A large fraction of players take turns to freeride in equilibrium. Often, turn-taking functions without, or with very little, loss of equilibrium miscoordination. (3) Other than in L-PGG, where contributions gradually decline towards the Nash equilibrium over time, there is almost no change in behavior (they may be learning to coordinate and trust each other) in the MMGs. Instead, players play at or very close to the near-efficient equilibrium virtually from the start of the game and continue unabated. We shall dedicate the remainder of this chapter to explanations of these phenomena.

5 No magic

Kahneman (1988) speaks of "magic" in the context of market entry games, meaning the observed yet unexplained complex asymmetric equilibria that were successfully coordinated upon in experiments, including turn-taking, "without learning and communication" (Camerer and Fehr 2006:50). This is observed despite individuals acting without clear structure (see Ochs 1999 for a review). In our MMGs, the asymmetric Nash equilibria are being played too, despite the existence of a simple symmetric Nash equilibrium to which players could resort instead. Hence, in our MMGs, in the sense of Kahneman (1988:12), it has been suggested that "subjects display more complex coordination and 'magic' than hitherto observed" (Gunnthorsdottir, Vragov, and Shen 2010b). We shall attempt to unravel the magic in this section, based on decision-theoretic foundations that mirror most closely the basic logic of Nash behavior, thus offering a very simple "no-magic argument" for our three phenomena.


5.1 Stability of near-efficiency

First, we shall address the question of why the near-efficient equilibrium is more stable. The reason is simple. All that is needed to jump out of the basin of attraction of the no-contribution equilibrium into that of the near-efficient equilibrium is for two players to contribute fully; and so as not to exceed Nash predictions, all that is needed are a few players to contribute zero. The latter can be explained by basic Nash play and by clever Nash play, and the former by forward-looking Nash play. The reason is that the basic best response to all other players contributing fully is to contribute zero, while the forward-looking best response is to contribute fully once a few players contribute fully, which will subsequently lead to near-efficient contributions. Mistakes by basic Nash players, therefore, together with clever and forward-looking Nash play by just a few agents for each category, explain why the asymmetric, near-efficient equilibrium will be played quickly. Without forward-looking Nash players, the efficient outcome could also emerge after some time with some noisy players and best response.

5.2 Turn-taking Next, we shall address the phenomenon of turn-taking. Suppose we start the process off in the near-efficient equilibrium, assuming it exists. If so, then predicting the tremble of a contributor will sway a freerider playing clever best response to contribute fully. Similarly, out of equilibrium: if too few freeriders exist vis-à-vis the near-efficient equilibrium, then a clever Nash player currently freeriding will switch to contributions, expecting the basic Nash players to implement the freeriding strategy. Similarly, if too many freeriders exist vis-à-vis the near-efficient equilibrium, then a clever Nash player currently contributing will switch to freeriding, expecting the basic Nash players to implement the contribution strategy. Low-probability mistakes by basic Nash players, therefore, together with clever Nash play by one or two agents, could explain turn-taking.

5.3 No learning

The reason why no learning is required to account for the near-efficient equilibrium virtually from the beginning is because most players can reactively play a basic Nash best response. In fact, this basic Nash play is what allows the clever and forward-looking players to coordinate between different equilibria. In addition, for the near-efficient Nash equilibrium, there are no mixed motives between individual and group incentives for the large majority of players in equilibrium who will contribute fully. It is best for them to play it and it is best for the collective that it is played.


5.4 Predictions for related games In L-PGG under VCM, our model predicts zero contributions throughout the iterated games. Evidence of intermediate contributions would imply astonishingly high deviation rates, which would ebb as contributions decline. This explanation is unsatisfactory, and alternative explanations are required. In k-PGGs, our model predicts that (provided sufficiently many clever and forward-looking players exist) the threshold would be reached or even exceeded. In fact, it offers a simple explanation of the phenomena, suggesting that early contributors are clever or forward-looking, which is very much in line with the explanations that have been proposed. In VDGs, our predictions are especially interesting. For example, a forward-looking Nash player will never volunteer unless he expects no other player to do so, or if currently there is over-volunteering and he expects a backlash. Similarly, a clever Nash player would volunteer if currently there is over-volunteering, but never if there is no volunteer already. Basic Nash players, on the other hand, react to the current market pressure. In an asymmetric volunteer’s dilemma, this implies that the strong players will manage to avoid volunteering if they are clever or forward-looking, but not if they are basic Nash players.

6 Concluding remarks Many interactions are such that Nash equilibria predictions are highly sensitive to the exact assumptions we make about the agents’ utilities and about their beliefs about the other players. The standard linear public goods game is one such example. Based on pure material self-interest, we could not explain why subjects in laboratory experiments consistently contribute positive amounts. Hence, one needs to turn to more complex models involving bounded rationality, social preferences, reciprocity, etc. The exact modeling assumptions will then be crucial for predicting how much is contributed under the VCM and at what time. Other interactions are different. In some games, no matter what assumptions we make about the agents’ utility functions and about their beliefs, one and the same type of outcome is generally predicted. The meritocratic matching game was such an interaction. In situations like this, models of perturbed Nash dynamics make robust and accurate predictions, while the many explanations from other related games (such as the standard linear public goods games) do not. The aim of this chapter was to illustrate how predictions about (non-) cooperative behavior can be made on the basis of various perturbed Nash dynamics in the context of social dilemma games, particularly those involving meritocratic matching. One way to reconcile our findings that no single explanation could account for the evidence


across social dilemmas is to conclude that different game structures and institutions influence different preferences and foster the emergence of different beliefs. Our main conclusion is that simple noise-driven learning models can explain a great deal of what we observe in various social dilemma games, although this does not apply to all the stylized facts that are commonly observed in experimental play of linear public goods games. The principle of parsimony would dictate employing these simpler learning based approaches before resorting to more involved theoretical approaches based on preferences and beliefs at the individual level. At first blush, this conclusion may appear to encroach on the explanatory power of preferences and beliefs in understanding decision-making in games. On the contrary, by hierarchically accounting for phenomena, and first explaining as much as possible with noise and imperfect players, what remains may then be better explained by more complex and idiosyncratic preferences and beliefs models.

Bibliography

[1] Alger, Ingela, and Jörgen W. Weibull. 2012. "A generalization of Hamilton's rule – Love others how much?" Journal of Theoretical Biology 299:42–54.
[2] Alger, Ingela, and Jörgen W. Weibull. 2013. "Homo Moralis – Preference Evolution Under Incomplete Information and Assortative Matching." Econometrica 81(6):2269–2302.
[3] Andreoni, James. 1990. "Impure Altruism and Donations to Public Goods: A Theory of Warm-Glow Giving." The Economic Journal 100(401):464–477.
[4] Axelrod, Robert. 1984. The Evolution of Cooperation. New York: Basic Books.
[5] Axelrod, Robert, and William D. Hamilton. 1981. "The Evolution of Cooperation." Science 211(4489):1390–1396.
[6] Bergin, James, and Bart L. Lipman. 1996. "Evolution with State-Dependent Mutations." Econometrica 64(4):943–956.
[7] Bergstrom, Theodore C. 2003. "The Algebra of Assortative Encounters and the Evolution of Cooperation." International Game Theory Review 5(3):211–228.
[8] Bergstrom, Theodore C. 2013. "Measures of Assortativity." Biological Theory 8(2):133–141.
[9] Blume, Lawrence E. 1993. "The Statistical Mechanics of Strategic Interactions." Games and Economic Behavior 5(3):387–424.
[10] Blume, Lawrence E. 2003. "How noise matters." Games and Economic Behavior 44(2):251–271.
[11] Buchanan, James M. 1965. "An Economic Theory of Clubs." Economica 32(125):1–14.
[12] Burton-Chellew, Maxwell N., Heinrich H. Nax, and Stuart A. West. 2015. "Payoff-based learning explains the decline in cooperation in public goods games." Proceedings of the Royal Society B. doi:10.1098/rspb.2014.2678.
[13] Bush, Robert R., and Frederick Mosteller. 1955. Stochastic Models for Learning. New York: Wiley & Sons.
[14] Camerer, Colin F., and Ernst Fehr. 2006. "When Does "Economic Man" Dominate Social Behavior?" Science 311(5757):47–52.
[15] Chaudhuri, Ananish. 2011. "Sustaining cooperation in laboratory public goods experiments: a selective survey of the literature." Experimental Economics 14(1):47–83.


[16] Darwin, Charles. 1871. The Descent of Man, and Selection in Relation to Sex. London: John Murray. [17] Dawes, Robyn M., and Richard Thaler. 1988. “Anomalies: Cooperation.” Journal of Economic Perspectives 2(3):187–197. [18] Dawkins, Richard. 1976. The Selfish Gene. Oxford: Oxford University Press. [19] Diekmann, Andreas. 1985. “Volunteer’s Dilemma.” Journal of Conflict Resolution 29(4):605– 610. [20] Diekmann, Andreas. 1993. “Cooperation in an asymmetric Volunteer’s dilemma game theory and experimental evidence.” International Journal of Game Theory 22(1):75–85. [21] Diekmann, Andreas, and Wojtek Przepiorka. 2015. “Punitive preferences, monetary incentives and tacit coordination in the punishment of defectors promote cooperation in humans.” Scientific Reports 5:17–52. [22] Doebeli, Michael, and Christoph Hauert. 2005. “Models of cooperation based on the Prisoner’s Dilemma and the Snowdrift game.” Ecology Letters 8(7):748–766. [23] Erev, Ido, and Amnon Rapoport. 1990. “Provision of Step-Level Public Goods: The Sequential Contribution Mechanism.” Journal of Conflict Resolution 34(3):401–425. [24] Erev, Ido, and Alvin E. Roth. 1998. “Prediction How People Play Games with Unique, Mixed Strategy Equilibria.” American Economic Review 88(4):848–881. [25] Eshel, Ilan, Larry Samuelson, and Avner Shaked. 1998. “Altruists, Egoists, and Hooligans in a Local Interaction Model.” American Economic Review 88(1):157–179. [26] Fischbacher, Urs, and Simon Gächter. 2010. “Social Preferences, Beliefs, and the Dynamics of Free Riding in Public Good Experiments.” American Economic Review 100(1):541–556. [27] Fletcher, Jeffrey A., and Michael Doebeli. 2009. “A simple and general explanation for the evolution of altruism.” Proceedings of the Royal Society B 276(1654):13–19. [28] Fletcher, Jeffrey A., and Michael Doebeli. 2010. “Assortment is a more fundamental explanation for the evolution of altruism than inclusive fitness or multilevel selection: reply to Bijma and Aanen.” Proceedings of the Royal Society B 277(1682):677–678. [29] Foster, Dean, and Peyton Young. 1990. “Stochastic Evolutionary Game Dynamics.” Theoretical Population Biology 38(2):219–232. [30] Gächter, Simon, Daniele Nosenzo, Elke Renner, and Martin Sefton. 2010a. “Sequential vs. simultaneous contributions to public goods: Experimental evidence.” Journal of Public Economics 94(7–8):515–522. [31] Gächter, Simon, Daniele Nosenzo, Elke Renner, and Martin Sefton. 2010b. “Who makes a good leader? Cooperativeness, optimism, and leading-by-example.” Economic Inquiry 50(4):953– 967. [32] Gunnthorsdottir, Anna, Roumen Vragov, Stefan Seifert, and Kevin McCabe. 2010. “Nearefficient equilibria in contribution-based competitive grouping.” Journal of Public Economics 94(11–12):987–994. [33] Gunnthorsdottir, Anna, Roumen Vragov, and Jianfei Shen. 2010b. “Tacit coordination in contribution-based grouping with two endowment levels.” Pp. 13–75 in Research in Experimental Economics. Vol. 13, Charity with Choice, edited by R. M. Isaac, and D. Norton. Bingley: Emerald Group Publishing Limited. [34] Hamilton, William D. 1964a. “Genetical evolution of social behavior I.” Journal of Theoretical Biology 7(1):1–16. [35] Hamilton, William D. 1964b. “Genetical evolution of social behavior II.” Journal of Theoretical Biology 7(1):17–52. [36] Hardin, Garrett. 1968. “The Tragedy of the Commons.” Science 162(3859):1243–1248. [37] Harsanyi, John C., and Reinhard Selten. 1988. A General Theory of Equilibrium Selection in Games. Cambridge, MA: The MIT Press.


[38] He, Jun-Zhou, Rui-Wu Wang, and Yao-Tang Li. 2014. “Evolutionary Stability in the Asymmetric Volunteer’s Dilemma.” PLoS ONE 9(8):e103931. [39] Helbing, Dirk. 1992. “Interrelations between Stochastic Equations for Systems with Pair Interactions.” Physica A: Statistical Mechanics and its Applications 181(1–2):29–52. [40] Helbing, Dirk. 1996. “A Stochastic Behavioral Model and a ‘Microscopic’ Foundation of Evolutionary Game Theory.” Theory and Decision 40(2):149–179. [41] Helbing, Dirk. 2010. “The future of social experimenting.” PNAS 107(12):5265–5266. [42] Helbing, Dirk. 2015. The Automation of Society Is Next: How to Survive the Digital Revolution. North Charleston, SC: CreateSpace Independent Publishing Platform. [43] Isaac, Mark R., and James M. Walker. 1988. “Group Size Effects in Public Goods Provision: The Voluntary Contributions Mechanism.” Quarterly Journal of Economics 103(1):179–199. [44] Isaac, Mark R., Kenneth F. McCue, and Charles R. Plott. 1985. “Public goods provision in an experimental environment.” Journal of Public Economics 26(1):51–74. [45] Jensen, Martin K., and Alexandros Rigos. 2014. Evolutionary Games with Group Selection. Working Paper No. 14/9, University of Leicester. [46] Kahneman, Daniel. 1988. “Experimental Economics: A Psychological Perspective.” Pp. 11–20 in Bounded Rational Behavior in Experimental Games and Markets, edited by R. Tietz, W. Albers, and R. Selten. Berlin: Springer. [47] Kandori, Michihiro, George J. Mailath, and Rafael Rob. 1993. “Learning, Mutation, and Long Run Equilibria in Games.” Econometrica 61(1):29–56. [48] Van de Kragt, Alphons J. C., John M. Orbell, and Robyn M. Dawes. 1983. “The Minimal Contributing Set as a Solution to Public Goods Problems.” American Political Science Review 77(1):112–122. [49] Ledyard, John O. 1995. “Public Goods: A Survey of Experimental Research.” Pp. 111–194 in Handbook of Experimental Economics, edited by J. H. Kagel, and A. E. Roth. Princeton, NJ: Princeton University Press. [50] Mäs, Michael, and Dirk Helbing. 2014. “Noise in behavioral models can improve macro-predictions when micro-theories fail.” Preprint. [51] Mäs, Michael, and Heinrich H. Nax. 2015. “A Behavioral Study of “Noise” in Coordination Games.” Journal of Economic Theory 162:195–208. [52] Marwell, Gerald, and Ruth E. Ames. 1979. “Experiments on the Provision of Public Goods. I. Resources, Interest, Group Size, and the Free-Rider Problem.” American Journal of Sociology 84(6):1335–1360. [53] Maynard Smith, John. 1987. “The Theory of Games and the Evolution of Animal Conflicts.” Journal of Theoretical Biology 47(1):209–221. [54] Maynard Smith, John, and George R. Price. 1973. “The logic of animal conflict.” Nature 246(5427):15–18. [55] McKelvey Richard D., and Thomas R. Palfrey. 1995. “Quantal response equilibria for normal form games.” Games and Economic Behavior 10(1):6–38. [56] McKelvey, Richard D., and Thomas R. Palfrey. 1998. “Quantal response equilibria for extensive form games.” Experimental Economics 1(1):9–41. [57] Myatt, David P., and Chris Wallace. 2008. “An evolutionary analysis of the volunteer’s dilemma.” Games and Economic Behavior 62(1):67–76. [58] Nash, John. 1950. “Non-cooperative games.” Ph.D. thesis, Princeton University. [59] Nash, John. 1951. “Non-cooperative games.” The Annals of Mathematics 54(2):286–295. [60] Nax, Heinrich H., Stefano Balietti, Ryan O. Murphy, and Dirk Helbing. 2015. Meritocratic Matching Can Dissolve the Efficiency-Equality Tradeoff: The Case of Voluntary Contributions Games. Mimeo (SSRN 2604140).


[61] Nax, Heinrich H., Ryan O. Murphy, and Dirk Helbing. 2014. “Stability and welfare of ‘meritbased’ group-matching mechanisms in voluntary contribution game.” Risk Center working paper, ETH Zurich. [62] Nax, Heinrich H., and Matjaž Perc. 2015. “Directional Learning and the Provisioning of Public Goods.” Scientific Reports 5(8010). doi:10.1038/srep08010. [63] Nax, Heinrich H., Matjaž Perc, Attila Szolnoki, and Dirk Helbing. 2015a. “Stability of cooperation under image scoring in group interactions.” Scientific Reports 5:12145. [64] Nax, Heinrich H., and Alexandros Rigos. 2016. “Assortativity Evolving from Social Dilemmas.” Journal of Theoretical Biology 395:194–203. [65] Nowak, Martin A., and Robert M. May. 1992. “Evolutionary Games and Spatial Chaos.” Nature 359(6398):826–829. [66] Ochs, Jack. 1999. “Coordination in market entry games.” Pp. 143–172 in Games and Human Behavior: Essays in Honor of Amnon Rapoport, edited by D. Budescu, I. Erev, and R. Zwick. New York: Erlbaum. [67] Olson, Mancur. 1965. The Logic of Collective Action: Public Goods and the Theory of Groups. Cambridge, MA: Harvard University Press. [68] Ostrom, Elinor. 1990. Governing the Commons: The Evolution of Institutions for Collective Action. Cambridge: Cambridge University Press. [69] Ostrom, Elinor, James Walker, and Roy Gardner. 1994. Rules, Games, and Common-Pool Resources. Michigan: University of Michigan Press. [70] Ostrom, Elinor. 2000. “Collective Action and the Evolution of Social Norms.” Journal of Economic Perspectives 14(3):137–158. [71] Ostrom, Elinor, and James Walker. 2003. Trust and Reciprocity: Interdisciplinary Lessons from Experimental Research. New York: Russell Sage Foundation. [72] Ostrom, Elinor. 2005. Understanding Institutional Diversity. Princeton, NJ: Princeton University Press. [73] Palfrey, Thomas R., and Howard Rosenthal. 1984. “Participation and the provision of discrete public goods: a strategic analysis.” Journal of Public Economics 24(2):171–193. [74] Potters, Jan, Martin Sefton, and Lise Vesterlund. 2005. “After You – Endogenous Sequencing in Voluntary Contribution Games.” Journal of Public Economics 89(8):1399–1419. [75] Potters, Jan, Martin Sefton, and Lise Vesterlund. 2007. “Leading-by-example and signaling in voluntary contribution games: an experimental study.” Economic Theory 33(1):169–182. [76] Rabanal, Jean P., and Olga A. Rabanal. 2014. Efficient Investment via Assortative Matching: A laboratory experiment. Mimeo. [77] Raihani, Nichola J., and Redouan Bshary. 2011. “The evolution of punishment in n-player public goods games: a volunteer’s dilemma.” Evolution 65(10):2725–2728. [78] Rapoport, Amnon, and Ramzi Suleiman. 1993. “Incremental Contribution in Step-Level Public Goods Games with Asymmetric Players.” Organizational Behavior and Human Decision Processes 55(2):171–194. [79] Roth, Alvin E., and Ido Erev. 1995. “Learning in extensive-form games: Experimental data and simple dynamic models in the intermediate term.” Games and Economic Behavior 8(1):164– 212. [80] Saez-Marti, Maria, and Jörgen W. Weibull. 1999. “Clever Agents in Young’s Evolutionary Bargaining Model.” Journal of Economic Theory 86(2):268–279. [81] Schelling, Thomas. 1971. “On the ecology of micromotives.” Public Interest 25:61–98. [82] Skyrms, Brian. 2004. The Stag Hunt Game and the Evolution of Social Structure. Cambridge: Cambridge University Press. [83] Taylor, Peter D., and Leo B. Jonker. 1978. “Evolutionary stable strategies and game dynamics.” Mathematical Biosciences 40(1–2):145–156.


[84] Thorndike, Edward L. 1898. Animal Intelligence: An Experimental Study of the Associative Processes in Animals. New York: Macmillan. [85] von Neumann, John, and Oskar Morgenstern. 1944. Theory of Games and Economic Behavior. Princeton, NJ: Princeton University Press. [86] Weibull, Jorgen W. 1995. Evolutionary Game Theory. Cambridge, MA: MIT Press. [87] West, Stuart A., Ashleigh S. Griffin, and Andy Gardner. 2007. “Evolutionary Explanations for Cooperation.” Current Biology 17(16):R661–R672. [88] West, Stuart A., Claire El Mouden, and Andy Gardner. 2011. “Sixteen common misconceptions about the evolution of cooperation in humans.” Evolution and Human Behavior 32(4):231–262. [89] Wright, Sewall. 1921. “Systems of Mating.” Genetics 6(3):111–178. [90] Wright, Sewall. 1922. “Coefficients of inbreeding and relationship.” American Naturalist 56(645):330–338. [91] Wright, Sewall. 1965. “The interpretation of population structure by F-statistics with special regard to systems of mating.” Evolution 19(3):395–420. [92] Xie, Yu, Siwei Cheng, and Xiang Zhou. 2015. “Assortative mating without assortative preference.” PNAS 112(19):5974–5978. [93] Young, H. Peyton. 1993. “The evolution of conventions.” Econometrica 61(1):57–84. [94] Young, H. Peyton. 2008. “Social norms.” In The New Palgrave Dictionary of Economics, 2nd edn., edited by S. N. Durlauf, and L. E. Blume. London: Palgrave Macmillan [95] Young, H. Peyton. 2009. “Learning by trial and error.” Games and Economic Behavior 65(2):626–643. [96] Young, H. Peyton. 2011. “Commentary: John Nash and evolutionary game theory.” Games and Economic Behavior 71(1):12–13. [97] Young, H. Peyton. 2015. “The Evolution of Social Norms.” Annual Review of Economics 7:359– 387.

Friedel Bolle

A Note on the Strategic Determination of the Required Number of Volunteers

Abstract: Players 1, 2, . . . , n decide simultaneously and independently whether or not to contribute one unit to the production of a public good. The players believe that the public good is produced if at least k units are contributed. While in the literature on these Binary Threshold Public Good games k is common knowledge, we assume that the real k = k∗ is known only by an additional player 0 (called principal) who is not a player in the contribution game but who also profits from the public good. Does he have an incentive to strategically misrepresent k∗? We show that, under quite general conditions, the principal is better off if he pretends k > k∗. Pretending k < k∗ can be optimal only if no completely mixed-strategy equilibrium exists, that is, for high cost/benefit ratios. The principal's lie is often also beneficial to the other players.

1 Introduction

The value of Game Theory for laymen as well as for experts in understanding social relations becomes obvious in the discussion of a limited number of "simple" but eminent example games. The most famous of these "little treasures" (Goeree and Holt 2001) is the Prisoner's Dilemma, but there are more games which dissect the intrinsic problems of human cooperation and conflict. Andreas Diekmann (1985; 1993) contributed such a "treasure" to the scientific community: the Volunteer's Dilemma. One volunteer is necessary to produce a public good for a group of n persons, but, as production is costly, all members of the group would prefer to free-ride. Further theoretical investigations are concerned with different dynamic versions of the Volunteer's Dilemma (Otsubo and Rapoport 2008; Bilodeau and Slivinski 1996; Weesie 1993; Weesie 1994). An evolutionary analysis is provided by Myatt and Wallace (2008). Further experimental studies of the Volunteer's Dilemma are Franzen (1995) and Goeree, Holt, and Moore (2005). Raihani and Bshary (2011), Przepiorka and Diekmann (2013), and Diekmann and Przepiorka (2015) model punishment in Public Good games as a Volunteer's Dilemma. A generalization of the Volunteer's Dilemma game is the Binary Threshold Public Good (BTPG) game, where instead of one volunteer a "sufficiently large" team of volunteers is needed to produce the public good. Under equality of the members' abilities, "sufficiently large" means "at least k of n". Most discussed among these games is the case k = n, the Stag Hunt game (Rousseau 1762), which is the model game for the question of whether payoff-dominant or risk-dominant equilibria are selected (Carlson and van Damme 1993; Van Huyck, Battalio, and Beil 1990; Rydval and Ortmann 2005;


Battalio, Samuelson, and Van Huyck 2001). The classic static binary voting model (requiring k > n/2 or a qualified majority) is based on reflections by Downs (1957), which have been formalized and supplemented by Riker and Ordeshook (1968). The main problem which is discussed with this model is the Paradox of Voting, that is a large percentage of people voting in a general election even though the probability of making a decisive impact is tiny. There are some other theoretical investigations with intermediate thresholds k (Sonnemans, Schram, and Offerman 1998; Goeree and Holt 2005; Palfrey and Rosenthal 1984; Palfrey and Rosenthal 1985). The general BTPG game with complete information is analyzed in Bolle (2015). In this note, I want to discuss a problem which emerges when the threshold is not publicly known but stated by an interested party with superior knowledge.¹ Imagine that a young woman without much income has to move to a new apartment. She needs help from at least k ∗ other persons to carry some heavy furniture, so she calls her n good friends and tells them that she needs volunteers on Saturday morning. All friends expect to be asked and would be offended if not. They cannot be expected to be reliable, however. Excuses are cheap and her friends will take this as a game and will play an equilibrium strategy according to their costs and their altruistic benefits (the good feeling that her moving took place smoothly). The question is: should she tell them that she needs at least k ∗ helpers or should she misrepresent the threshold k ∗ ? A problem is that her helpers will find out the true necessary number k ∗ of helpers in the process of moving (more generally, while attempting to produce the public good) and that they do not like to be manipulated. They might also make her responsible for the waste of their time when there are not enough helpers or when some helpers are superfluous. These negative feelings (misrepresentation costs) of her helpers have an effect on her and will be taken into account as misrepresentation costs for the principal. In a repeated game, misrepresentation causes doubts about the truth of the communicated k of later projects. This is a particularly severe problem for organizations like the German betterplace.org and IBG, which organize voluntary work for many projects. For clearly defined distinct projects, these organizations mention an exact number of required volunteers: to remain trustworthy, they must apply the instrument of misrepresentation cautiously. In our conclusion, we will comment on games of incomplete information, which are played if the other players doubt the communicated k.

1 Myatt and Wallace (2009) ask which optimal threshold to introduce in a “normal” Public Good game with continuous voluntary contributions and a continuous production function of the public good. They discuss this problem for Quantal Response Equilibria (McKelvey and Palfrey 1995) and under evolutionary aspects.


2 The optimal threshold The game consists of two stages. Stage 1: Player 0 (the principal) announces the number k of volunteers necessary to produce a public good. Stage 2: Players from N = {1, . . . , n} simultaneously and independently decide whether or not to contribute to the production of the public good. Evaluation and information assumptions: We assume that all utilities are expressed as monetary equivalents. For the players i > 0 and the principal, the public good has a value of G i > 0. If a player i > 0 contributes, he incurs costs c i with G i > c i > 0. The true number k ∗ which is necessary for the production is known only to the principal. For i > 0, c i /G i = ρ and ρ is common knowledge. Players 1, . . ., n believe the communicated k to be truthful. (This assumption will be discussed in the conclusion.) We assume that, in the successful or unsuccessful attempt to produce the public good, the players learn the correct k ∗ . The principal’s misrepresentation costs C0 (k, k ∗ ), and the players’ misrepresentation costs C i (k, k ∗ ), have a minimum at k = k ∗ , which can be normalized to zero.

2.1 The second stage of the game

In BTPG games there are many pure, mixed, and pure/mixed-strategy equilibria. We therefore need a selection criterion. For this we rely on the Harsanyi–Selten (1988) proposal (HS), which selects, in symmetric games, the payoff-dominant symmetric equilibrium. Because of the above assumption c_i/G_i = ρ (which (5) shows to be the relevant condition for symmetry), and because the players 1, . . . , n expect k = k∗ and therefore do not expect to suffer from misrepresentation costs, the second stage of our game is a symmetric subgame. Let us now derive the equilibrium strategies, differentiate between symmetric and asymmetric equilibria, and investigate attributes of symmetric equilibria. In Stage 2, there are \binom{n}{k} pure-strategy equilibria in which exactly k of the n players contribute. For k < n, these are asymmetric equilibria which are excluded according to our selection criterion. For k > 1, there is an additional symmetric pure-strategy equilibrium where no player contributes, and for k = n, there is a symmetric pure-strategy equilibrium where all players contribute. The no-contribution equilibrium is payoff-inferior with respect to all other symmetric equilibria; the full-contribution equilibrium in the case k = n payoff-dominates all other symmetric equilibria (see Lemma 1 below).


Let us now turn to mixed-strategy equilibria. Let us assume that all players i play equilibrium strategies and define

Q = prob(at least k contribute),
Q_i^+ = prob(at least k − 1 of the players from N − {i} contribute),
Q_i^− = prob(at least k of the players from N − {i} contribute).

Then

q_i = Q_i^+ − Q_i^−   (1)

is the probability that player i is decisive. If i's contribution probability is π_i, then

Q = π_i Q_i^+ + (1 − π_i) Q_i^− = Q_i^− + π_i q_i = Q_i^+ − (1 − π_i) q_i .   (2)

Player i's expected payoff is

R_i = Q G_i − π_i c_i .   (3)

In a mixed-strategy equilibrium, R_i is independent of π_i, which requires ∂R_i/∂π_i = 0, or

q_i G_i − c_i = 0 ,   (4)

or

q_i = c_i/G_i = ρ .   (5)

From (2), (3), and (4) we derive the expected equilibrium payoff of a mixed-strategy player:

R_i = Q_i^− G_i = Q_i^+ G_i − c_i .   (6)

For symmetric players, that is, c_i/G_i = ρ, the completely mixed-strategy equilibrium is necessarily symmetric (Bolle 2015) and the common mixture probability π is derived from (5), that is,

\binom{n−1}{k−1} π^{k−1} (1 − π)^{n−k} = c_i/G_i = ρ ,   (7)

provided a solution exists. The probability that a player is decisive (that exactly k − 1 others contribute) is equal to the cost/benefit ratio ρ. For k = 1, there is always a unique symmetric mixed-strategy equilibrium (Diekmann 1985):

π = 1 − ρ^{1/(n−1)} .   (8)

In the case 1 < k < n, the left-hand side of (7) is a unimodal function of π with a maximum at

π_maximizing = (k − 1)/(n − 1) .   (9)

Therefore, for large enough ρ, equation (7) has no solution, and for small enough ρ it has two solutions π′ < π″ (border case π′ = π″ = π_maximizing). If there is no solution for (7), then the equilibrium where no player contributes is selected. In addition to these equilibria, there are many equilibria where some players play pure and others mixed strategies, but all these equilibria are asymmetric.
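As a small numerical illustration of equation (7), the sketch below evaluates its left-hand side (the probability of being decisive) and finds the two roots π′ ≤ π″ by bisection on the rising and falling branches around (k − 1)/(n − 1). The parameter values in the example call are arbitrary and only meant to show how the condition can be checked.

```python
from math import comb

def decisive_prob(pi, n, k):
    """Left-hand side of (7): probability that exactly k-1 of the other
    n-1 players contribute when each contributes with probability pi."""
    return comb(n - 1, k - 1) * pi ** (k - 1) * (1 - pi) ** (n - k)

def roots_of_7(n, k, rho):
    """Return (pi_low, pi_high), the two solutions of (7) for 1 < k < n,
    or None if equation (7) has no solution for this rho."""
    pi_max = (k - 1) / (n - 1)
    if decisive_prob(pi_max, n, k) < rho:
        return None                      # no completely mixed equilibrium

    def bisect(lo, hi, increasing):
        for _ in range(200):
            mid = (lo + hi) / 2
            too_low = decisive_prob(mid, n, k) < rho
            if too_low == increasing:
                lo = mid                 # root lies above mid
            else:
                hi = mid                 # root lies below mid
        return (lo + hi) / 2

    pi_low = bisect(0.0, pi_max, increasing=True)    # rising branch
    pi_high = bisect(pi_max, 1.0, increasing=False)  # falling branch
    return pi_low, pi_high

# example: n = 4, k = 2, cost/benefit ratio rho = 0.3
print(roots_of_7(4, 2, 0.3))
```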


Lemma 1. A symmetric equilibrium with a threshold k, where all players 1, . . . , n contribute with π″, payoff-dominates an equilibrium where all players contribute with π′ < π″.
Proof. Q_i^− is larger for π″ than for π′. The Lemma therefore follows from (6).



Proposition 1. In symmetric BTPG games with threshold k, the following equilibria are selected by HS:
(i) For k = 1, (8) describes the equilibrium probabilities π(k).
(ii) For 1 < k < n, and if (7) has solutions π′ ≤ π″, the equilibrium with π(k) = π″ is selected; otherwise π(k) = 0.
(iii) For k = n, π(k) = 1.
Proof. (i) The unique symmetric equilibrium is described by (8). (ii) The only symmetric equilibria are π(k) = 0 and, if existent, the solutions π′ or π″ of (7). Because of Lemma 1, π″ is selected. (iii) The equilibrium π(k) = 0 yields zero utility for all. The unique mixed-strategy equilibrium defined by (7) is also connected with R_i = Q_i^− G_i = 0, while the all-contributing equilibrium yields R_i = G_i − c_i > 0. ◼
Proposition 2. If solutions of (7) exist for k and for k + 1, then π″(k + 1) > π″(k).


k/(n − 1), the left-hand side of (7) is larger for k + 1 than for k. From (7), (a), and (b) follows that π 󸀠󸀠 (k + 1) > π 󸀠󸀠 (k). ◼ Note that, in the following, π(k) denotes the equilibrium probabilities of the equilibria selected by HS. Proposition 2 includes the case k = 1 and can be extended to k = n, where not the (unique) solution of (7) but the pure strategy equilibrium π(n) = 1 is selected.

2.2 The first stage of the game

We have derived the equilibria which emerge after the selection of a threshold k. Let us now turn to Stage 1 of the game: which k will the principal choose? Assuming G_0 = 1 (normalization), the expected payoff is

R_0 = Q(k∗, π(k)) − C_0(k, k∗) ,   (10)

with π(k) as described above and Q(k ∗ , π(k)) = probability of production when players contribute with π(k) and the true threshold is k ∗ . C0 (k, k ∗ ) describes the expected costs of misrepresentation, namely when the members learn the real threshold k ∗ in their successful or unsuccessful attempt to produce the public good. Different shapes

of C_0(k, k∗) are possible, for example a kinked linear function

C_0(k, k∗) = γ^+ (k − k∗) for k > k∗,   C_0(k, k∗) = γ^− (k∗ − k) for k ≤ k∗ ,   (11)

with² γ^+ ≥ 0, γ^− ≥ 0, or a quadratic function

C_0(k, k∗) = γ (k − k∗)^2 .   (12)

The principal chooses that k which maximizes (10).

Proposition 3.
(i) For small enough C_0(k, k∗), the principal will maximize π(k) by choosing k = n.
(ii) For large enough C_0(k, k∗), the principal will tell the truth k = k∗, even if (7) has no solution.
(iii) If (7) has a solution for k = k∗, then the principal will choose k ≥ k∗; otherwise the optimal k-value may be smaller than k∗.
(iv) If the misrepresentation costs of players 1, . . . , n are small enough, then these players also profit from the principal's choice of k.
Proof. (i), (ii), and (iii) follow from Proposition 2 and (10). (iv) follows from Propositions 1 and 2. ◼

For h < k ≤ k∗, k has lower misrepresentation costs than h. Therefore, if solutions of (7) exist, below k∗ larger k are always preferable to smaller k. This applies to all players 0, 1, . . . , n. Together with Proposition 3 (iv), this gives us the impression that the interests of the principal and the other players are often aligned. Figure 1 shows the optimal k for misrepresentation costs (11) with γ^− = 0 and different ρ and γ = γ^+. If solutions of (7) exist (ρ ≤ 4/9), then k = 1 will never be chosen and the race is mainly between k = 2 and k = 3. For small γ, the advantage of a higher contribution probability makes k = 3 more profitable. Note, however, that the lower line in Figure 1 indicates choices for γ = 1/40. For smaller γ, k = 4 is optimal for smaller ρ than indicated in this line. If no solution of (7) exists for k = 2, then a solution for k = 3 does not exist either; the race is only between k = 1 and k = 4, the latter with a success probability of 1 (Proposition 1), but also with higher misrepresentation costs when the four members of the group detect that only two would have been necessary. In the dotted area, players 1–4 also profit from misrepresentation, provided their costs C_i(k, k∗) are low enough.
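As a numerical companion to Proposition 3 and Figure 1, the following sketch computes the principal's expected payoff (10) for every announced k and picks the maximizer, using the HS-selected contribution probabilities of Proposition 1 and the kinked cost function (11). It reuses decisive_prob and roots_of_7 from the previous sketch; all parameter values are illustrative examples, not results reported in this chapter.

```python
from math import comb

def pi_selected(n, k, rho):
    """Contribution probability of the HS-selected equilibrium (Proposition 1):
    k = 1: unique mixed equilibrium (8); 1 < k < n: larger root of (7),
    or 0 if (7) has no solution; k = n: full contribution."""
    if k == 1:
        return 1 - rho ** (1 / (n - 1))
    if k == n:
        return 1.0
    roots = roots_of_7(n, k, rho)        # helper defined in the sketch above
    return roots[1] if roots else 0.0

def success_prob(n, k_true, pi):
    """Q(k*, pi): probability that at least k_true of the n players contribute."""
    return sum(comb(n, j) * pi ** j * (1 - pi) ** (n - j)
               for j in range(k_true, n + 1))

def optimal_announcement(n, k_true, rho, gamma_plus, gamma_minus=0.0):
    """Maximize (10) over announced k with the kinked misrepresentation cost (11)."""
    def payoff(k):
        cost = (gamma_plus * (k - k_true) if k > k_true
                else gamma_minus * (k_true - k))
        return success_prob(n, k_true, pi_selected(n, k, rho)) - cost
    return max(range(1, n + 1), key=payoff)

# illustrative parameters in the spirit of Figure 1 (n = 4, k* = 2)
print(optimal_announcement(n=4, k_true=2, rho=0.3, gamma_plus=0.1))
```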

2 There are arguments for γ − < γ + (interpreting k < k∗ as humbleness and attempting not to appear too demanding), as well as for the contrary (because k < k∗ increases the probability of insufficient contributions).



Fig. 1: The strategic choice of k for n = 4, k∗ = 2, (11) with γ^− = 0, and different ρ (Rho) and γ^+ (Gamma), both from {1/40, 2/40, . . . , 39/40}. With C_i(k, k∗) = 0, the players 1–4 profit from misrepresentation in the dotted area.

Figure 1 changes considerably after adopting another cost function C0 (k, k ∗ ), for example (12) instead of (11). In the latter case, the k = 1 combinations disappear and are substituted by k = 4 for small γ and by k = 2 for large γ. In the latter cases, the public good will be produced with zero probability (Proposition 1). Proposition 2 implies that misrepresentations k < k∗ are profitable only if (7) does not have a solution. This is different if the cost function no longer has a minimum at k = k∗ , for example if γ− < 0.
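To mirror the remark above about replacing (11) by the quadratic cost (12), the same optimizer can simply be given a different cost term. This snippet assumes the helper functions pi_selected and success_prob from the sketch above; the parameter values are again arbitrary.

```python
# same optimizer as above, but with the quadratic cost (12) swapped in
def optimal_announcement_quadratic(n, k_true, rho, gamma):
    def payoff(k):
        return (success_prob(n, k_true, pi_selected(n, k, rho))
                - gamma * (k - k_true) ** 2)
    return max(range(1, n + 1), key=payoff)

print(optimal_announcement_quadratic(n=4, k_true=2, rho=0.3, gamma=0.05))
```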

3 Conclusion

We have investigated a situation where a public good is produced if at least k∗ of n players contribute a fixed amount. Such games are plagued by a plethora of equilibria. For applications to certain problems it is thus inevitable to use a selection principle. In this chapter, we have followed the suggestion of Harsanyi and Selten (1988). If it is possible for an informed party (the principal) to misrepresent the necessary number k∗ of volunteers, she will often choose this option, which is not only beneficial for herself but usually also for the members of the group who produce the threshold public good. The result is driven by two incentives for misrepresentation. First, in the case of high cost/benefit ratios (ρ), the truthful communication of k∗ may be connected with the non-existence of a mixed-strategy equilibrium, and thus induce the selection of the zero-contribution equilibrium where the public good is never produced. Second, higher thresholds are accompanied by mixed-strategy equilibria (if they exist) with higher contribution probabilities (Proposition 2, if applicable). The latter incentive rests on HS equilibrium selection; otherwise higher thresholds may instead increase the attraction of the often risk-dominant³ no-contribution equilibrium. Another question is whether these theoretically derived results are valid for "real people", who are usually not homines economici. In an experimental investigation, Bolle and Spiller (2016) show that higher thresholds are accompanied by higher contribution frequencies and that, in a finite mixture model, contribution to the public good is explained with a share of 49 % of players who play the equilibrium selected by Harsanyi and Selten (1988). The basic incentives for misrepresentation of the threshold thus exist also among real people. A last issue is the question of whether the members 1, . . . , n believe the principal's information about the threshold k. Real people may be convinced that it is better for the group to believe the principal, but a rational player cannot ignore the possibility that k ≠ k∗. A game with incomplete information would assume that the players know a prior distribution of k∗. If the principal plays a pure strategy k = f(k∗), then the players know k∗ if f^{−1}(k∗) contains exactly one element. If f^{−1}(k∗) contains more than one element, they can compute the probability of different k∗ from the prior distribution of k∗. (The resulting second-stage game is then no longer a BTPG game.) This does not exclude the principal playing a pure strategy, but in many cases the equilibrium of the incomplete information game might also require a mixed strategy from the principal. We may additionally think about an extension of the principal's strategy space by allowing him not to communicate the threshold k or the number of players n. This is the usual strategy when organizing a Festschrift (which may, but need not, be considered as a BTPG game). The formal discussion of such a game is, however, beyond the scope of this chapter.

Bibliography

[1] Battalio, Raymond, Larry Samuelson, and John Van Huyck. 2001. "Optimization incentives and coordination failure in laboratory stag hunt games." Econometrica 69(3):749–764.
[2] Berkson, Joseph. 1980. "Minimum Chi-Square, not Maximum Likelihood!" The Annals of Statistics 8(3):457–487.

3 Harsanyi and Selten (1988) also define risk-dominance for the case n > 2 (see Bolle and Spiller 2016).


[3] Bilodeau, Marc, and Al Slivinsky. 1996. "Toilet Cleaning and Department Chairing: Volunteering a Public Service." Journal of Public Economics 59(2):299–308.
[4] Bolle, Friedel. 2015. "Costly Voting – A General Theory of Binary Threshold Public Goods." Discussion Paper, Frankfurt (Oder).
[5] Bolle, Friedel, and Jörg Spiller. 2016. "Not efficient but payoff dominant – Experimental investigations of equilibrium play in binary threshold public good games." Discussion Paper, Frankfurt (Oder).

[7]

[8] [9] [10]

[11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22]

[23] [24]

| 479

Bilodeau, Marc, and Al Slivinsky. 1996. “Toilet Cleaning and Department Chairing: Volunteering a Public Service.” Journal of Public Economics 59(2):299–308. Bolle, Friedel. 2015. “Costly Voting – A General Theory of Binary Threshold Public Goods.” Discussion Paper, Frankfurt (Oder). Bolle, Friedel, and Jörg Spiller. 2016. “Not efficient but payoff dominant – Experimental investigations of equilibrium play in binary threshold public good games.” Discussion Paper, Frankfurt (Oder). Bullinger, Anke F., Emily Wyman, Alicia P. Melis, and Michael Tomasello. 2011. “Coordination of chimpanzees (Pan troglodytes) in a stag hunt game.” International Journal of Primatology 32(6):1296–1310. Carlsson, Hans, and Erik van Damme. 1993. “Equilibrium Selection in Stag Hunt Games.” Chapter 12 in Frontiers of game theory, edited by K. Binmore, A. Kirman, and P. Tani. Cambridge, MA: MIT Press. Diekmann, Andreas. 1985. “Volunteer’s Dilemma.” Journal of Conflict Resolution 29(4):605– 610. Diekmann, Andreas. 1993. “Cooperation in an Asymmetric Volunteer’s Dilemma Game – Theory and Experimental Evidence.” International Journal of Game Theory 22(1):75–85. Diekmann, Andreas, and Wojtek Przepiorka. 2015. “Punitive preferences, monetary incentives and tacit coordination in the punishment of defectors promote cooperation in humans.” Scientific reports 5:10321. Downs, Anthony. 1957. An Economic Theory of Democracy. New York: Harper. Franzen, Axel. 1995. “Group size and one-shot collective action.” Rationality and Society 7(2):183–200. Goeree, Jacob K., and Charles A. Holt. 2001. “Ten little treasures of game theory and ten intuitive contradictions.” American Economic Review 91(5):1402–1422. Goeree, Jacob K., Charles A. Holt, and Angela K. Moore. 2005. “An Experimental Examination of the Volunteer’s Dilemma.” Unpublished working paper. Harsanyi, John C., and Reinhard Selten. 1988. A general theory of equilibrium selection in games. Cambridge, MA: MIT Press. McKelvey, Richard D., and Thomas R. Palfrey. 1995. “Quantal response equilibria for normal form games.” Games and Economic Behavior 10(1):6–38. Myatt, David P., and Chris Wallace. 2008. “An evolutionary analysis of the volunteer’s dilemma.” Games and Economic Behavior 62(1):67–76. Myatt, David P., and Chris Wallace. 2009. “Evolution, Teamwork and Collective Action: Production Targets in the Private Provision of Public Goods.” The Economic Journal 119(534):61–90. Otsubo, Hironori, and Rapoport, Amnon. 2008. “Dynamic Volunteer’s Dilemmas over a Finite Horizon: An Experimental Study.” Journal of Conflict Resolution 52(6):961–984. Palfrey, Thomas R., and Howard Rosenthal. 1984. “Participation and the provision of discrete public goods: a strategic analysis.” Journal of Public Economics 24(2):171–193. Palfrey, Thomas R., and Howard Rosenthal. 1985. “Voter Participation and Strategic Uncertainty.” American Political Science Review 79(1):62–78. Przepiorka, Wojtek, and Andreas Diekmann. 2013. “Individual heterogeneity and costly punishment: a volunteer’s dilemma.” Proceedings of the Royal Society of London B: Biological Sciences 280(1759):20130247. Raihani, Nichola J., and Redouan Bshary. 2011. “The Evolution of Punishment in n-player Public Good games: a Volunteer’s Dilemma.” Evolution 65(10):2725–2728. Rapoport, Amnon, and Dalit Eshed-Levy. 1989. “Provision of step-level public goods: Effects of greed and fear of being gypped.” Organizational Behavior and Human Decision Processes 44(3):325–344.

480 | Friedel Bolle

[25] Rousseau, Jean Jacques. 1997. ‘The Social Contract’ and Other Later Political Writings, translation of the original of 1762 by Victor Gourevitch. Cambridge: Cambridge University Press. [26] Rydval, Ondrej, and Andreas Ortmann. 2005. “Loss avoidance as selection principle: evidence from simple stag-hunt games.” Economics Letters 88(1):101–107. [27] Sonnemans, Joep, Arthur Schram, and Theo Offerman. 1998. “Public good provision and public bad prevention: The effect of framing.” Journal of Economic Behavior & Organization 34(1):143–161. [28] Van Huyck, John B., Raymond C. Battalio, and Richard O. Beil. 1990. “Tacit coordination games, strategic uncertainty, and coordination failure.” American Economic Review 80(1):234–248.

Thomas Gautschi

Is No News Bad News? A Hostage Trust Game with Incomplete Information and Fairness Considerations of the Trustee Abstract: The Trust Game is often used to illustrate cooperation problems in situations of sequential exchange where one actor runs the risk of losing a possible initial investment (i.e., trust is abused if placed). The Hostage Trust Game is an extension of this simple scenario that allows for (binding) commitments by the second actor, which should facilitate an investment by the first actor. If no hostage is placed by the second actor, however, the Trust Game and the Hostage Trust Game are identical. Under complete and perfect information, outcomes in the two games must thus be identical if assuming rational utility-maximizing actors. Nevertheless, experimental evidence exists that trust is withheld more often in the Hostage Trust Game after no hostage has been posted compared to the equivalent Trust Game. We seek to find a simple theoretical explanation for this finding and propose a model of incomplete information which additionally allows for fairness considerations by the trustee.

1 Introduction The Trust Game (Dasgupta 1988; Kreps 1990a) is a formal description of a specific class of cooperation problems based on dyadic and non-cooperative exchange. In the simple one-shot Trust Game, the trustor (or principal, or actor 1) moves first and decides between placing or withholding trust. The game ends if the trustor withholds trust. As soon as trust has been placed, the trustee (or agent, or actor 2) must decide between honoring or abusing placed trust. In both cases the game ends. The game is played under circumstances of complete and perfect information (e.g., Rasmusen 2007: Ch. 2), where the actors are rational in the sense that they maximize their own utility (i.e., money, points). Given the actors’ preferences, the game has a unique subgame perfect equilibrium such that the trustor withholds trust while the trustee will abuse trust if placed by the trustor. The game is thus a formal description of a cooperation problem where the principal runs the risk of losing a potential concession or investment. Since both actors could benefit from mutual cooperation compared to the equilibrium outcome, the question of how to solve this principal-agent problem is pressing (for an analysis of trust situations, see Coleman 1990: Ch. 5; for principal-agent problems, see Laffont and Martimort 2002). Several ways to escape from this Pareto-inefficient solution are conceivable (for an overview Kollock 1998). One example might be infinitely repeated encounters of the trustor and the trustee (i.e., dyadic embeddedness), such that cooperation can https://doi.org/10.1515/9783110472974-023

482 | Thomas Gautschi

be an equilibrium (e.g., Abreu 1988 or Fudenberg and Tirole 1991:152–162 on the Folk Theorem, and Friedman 1986 on conditional cooperation using trigger strategies). Another example is playing the game in networks (i.e., network embeddedness, where the trustor and the trustee are related to third parties) to exploit reputation effects (see Kreps et al. 1982 on reputation effects, and Buskens 2002: Ch. 3, Buskens and Raub 2002, Buskens and Yamaguchi 1999, and Raub and Weesie 1990 on network effects). Another solution to trust problems was pointed out by Schelling (1960; see also Williamson 1985: Ch. 7 and 8): the introduction of credible commitments in the form of hostages. Strategic moves like introducing hostages reshape the incentive structure of the situation. A hostage serves as a self-binding commitment such that the incentives to defect are mitigated or even completely removed. In a trust situation, this may ensure cooperation in the sense that placed trust is honored by the trustee in equilibrium. Schelling’s ideas are captured in a straightforward extension of the Trust Game called the Hostage Trust Game (Raub 2004). The trustee moves first and decides whether or not to place a hostage. His decision is observed by the trustor. Placing the hostage changes the payoff structure of the game such that the trustee’s incentive to abuse trust is totally or at least partly removed. Cooperation, depending on the value of the hostage, can thus be an equilibrium in the Hostage Trust Game (e.g., Snijders 1996:125–146; Snijders 2000; Snijders and Buskens 2001). However, if the trustee denies to post a hostage, the game simply becomes a one-shot Trust Game where mutual defection (no trust and an abuse of trust if placed) is the unique equilibrium.¹ Despite this normative logic, experimental data (e.g., Snijders 1996) shows that subjects in the role of the trustor behave differently in a one-shot Trust Game than they do in the respective subgame of a Hostage Trust Game which is reached after the trustee denies to post the hostage. More precisely, experimental subjects playing as the trustor are less likely to place trust in a Hostage Trust Game where no hostage has previously been posted than they are in the Trust Game. Since “not to post a hostage” leads to a subgame equivalent to the one-shot Trust Game, such behavior is inconsistent. Regardless of any normative equilibrium-considerations, under complete and perfect information (e.g., Rasmusen 2007: Ch. 2), the subjects’ behavior should be identical in both games (i.e., either placing or withholding trust). The question is, therefore: why do people behave so inconsistently in simple but identical situations, and is such behavior supported by a Nash equilibrium? In contrast to this theoretical analysis, experimental data (e.g., Snijders 1996) shows that subjects in the role of the trustor behave differently in a one-shot Trust Game and the respective subgame of a Hostage Trust Game which they reach after

1 See also, for example, Keren and Raub (1993), Mlicki (1996), or Raub and Keren (1993) for hostage analyses in the Prisoner’s Dilemma Game. See further Weesie and Raub (1996) for an analysis of Hostage Games, or Raub and Weesie (2000) for hostages as signaling devices.

Is No News Bad News? | 483

the trustee denies to post the hostage. More precisely, experimental subjects playing as the trustor were less likely to place trust in a Hostage Trust Game with no hostage previously posted, compared to their decision in the Trust Game. Since “not to post a hostage” leads to a subgame equivalent to the one-shot Trust Game, such behavior is “inconsistent” and needs explanation.² Regardless of any equilibrium-considerations but given that the experiments were played under complete and perfect information, we should at least expect subjects to show consistent behavior in the Trust Game and the Hostage Trust Game where no hostage is posted (i.e., either place or withhold trust in both games). The question thus is why people behave “inconsistently” in simple but identical situations (see also Gautschi 2002)? And whether such behavior could indeed be supported by a Nash equilibrium? We conclude that the experimental results suggest that the decision not to post a hostage apparently also reveals information to the trustor about the incentives of the trustee. The trustor’s behavior in the simple Trust Game following the no-hostage option seems to be affected by the trustee’s choice. Harsanyi and Selten (1988:90) remark that “After all, once the subgame has been reached all other parts of the game are strategically irrelevant.”. This, however, may not be the case in the experimental laboratory. Subjects playing as the trustor seem to take a hostage as a signaling device (whether posted or not) that can reveal otherwise unobservable (social) preferences (e.g., Aksoy 2013; Vieth 2009). For instance, they could have inferred the trustworthiness of the trustee from his posting of a hostage, whereas his failure to post a hostage may have been taken as a signal of untrustworthiness. The logic for such behavior by the trustor could be as follows: a trustworthy trustee posts a hostage, since by honoring placed trust he will not lose it. Consequently, he places the hostage to signal his trustworthiness. By inversion, a trustee who does no post a hostage is untrustworthy. Unfortunately, the trustee’s reasoning may differ. He may refrain from posting the hostage because he considers himself trustworthy, (wrongly) assuming he has no need to signal that fact. As suggested by Kreps (1990b), a situation in which one actor is unsure about the incentives of the other is best modeled by introducing incomplete information. The main goal of this note is to relax the assumptions of the simple model of trust and see whether it can explain the experimental findings. We introduce incomplete information and trustees with some social preferences, in other words distinct types of trustees that will differ by their inclination to abuse trust if placed. The next section will briefly summarize Snijders’ (1996) empirical findings. In Section 3, we describe the Trust Game with incomplete information. It serves as the baseline model for the Hostage Trust Game with incomplete information, which we discuss in Section 4. This section also develops the equilibria of the game and discusses whether they are in

2 We refer to “inconsistent behavior” as the experimental subjects’ deviation from the prediction of the game-theoretic model, assuming rational actors with complete and perfect information.

484 | Thomas Gautschi

line with the empirical results. The concluding section summarizes the analysis. The appendix specifies conditions for the equilibria discussed in Section 4.

2 No news is bad news: experimental evidence from a hostage trust game experiment Experimental subjects were recruited from Groningen University (n = 106) and Amsterdam University (n = 106).³ They participated in an experiment reflecting a Trust Game and a Hostage Trust Game played under complete and perfect information, in which no references to “trust” were made (see Figure 1: the names of the games were not mentioned in the experiments). Subjects were promised a minimum of 10 Dutch Guilders (approximately 4.50 €, or 6 $, at the time of the experiment) for participating. In addition, about 15 % of the subjects would earn more (up to a maximum of 120 Dutch Guilders, or 54.50 €) depending on their performance. All participants were orally instructed by the experimenter about the task to perform. In addition, they also received written instructions and a graphical representation of the games to be played. Each subject played five Trust Games and (later) five Hostage Trust Games, making a within-subjects comparison possible. The games differed in terms of their absolute payoffs, although the ordinal ranking of the payoffs for each combination of alternatives was constant. Subjects were asked to take a decision as the trustor and the trustee in each Trust Game as well as in each Hostage Trust Game. We are only interested in the participants’ behavior as the trustor (i.e., actor 1 in Figure 1). The hostage institution was such that if the hostage was posted by the trustee (i.e., actor 2 in Figure 1), the hostage either went to a third party or to the opponent in case of opportunistic behavior by the trustee. However, our interest will only concern situations where no hostage is posted and the left subgame of the Hostage Trust Game in Figure 1 is reached. The trustor was asked to take a decision in the Hostage Trust Game under the assumption that the trustee has refrained from posting the hostage. Her choice situation thus resembles the situation she previously faced in the Trust Game. Neglecting game theoretic equilibrium predictions, one would expect that subjects choosing left (right) in the Trust Game should also choose left (right) once they reach the left subgame of the Hostage Trust Game. The results of Snijders’ experiment are summarized in Table 1. The coefficient of the binary choice model on placing trust results in the negative predictor Subgame HTG. This shows that, in comparison to their decision in the Trust Game, fewer subjects (trustors) chose to play right (i.e., to trust) in the subgame of the Hostage Trust Game reached after no hostage has been posted by the trustee. The marginal effect reveals, at the mean of the independent variables, an estimated

3 For a detailed discussion of the setup and the results, see Snijders (1996:65–84).



Fig. 1: Trust Game and Hostage Trust Game in Snijders' (1996) experiments (T_2 > R_1 = R_2 > P_1 = P_2 > S_1). Value of hostages: H_1 = 0, H_2 > 0 in Groningen, and H_1 = H_2 > 0 in Amsterdam.

decrease in the probability of playing right in the Hostage Trust Game by 0.10. The data thus unveil a significant tendency by the experimental subjects to trust less in the Hostage Trust Game after no hostage has been posted compared to their decision in the Trust Game. Since each game was played five times, thereby adopting different payoffs, the regression analysis also contains right-hand variables that capture the variance in the payoffs. The variable Temptation, equivalent to (T_2 − R_2)/(T_2 − S_1), is an indicator for the trustee's incentives to abuse trust if placed. The larger the value of Temptation, the less inclined the trustor should be to place trust. The respective regression coefficient, however, shows no effect of the variation in Temptation on the behavior of the subjects playing as a trustor. The variable Risk, on the other hand, is an indicator for the insecurity a trustor faces by placing trust. It puts into perspective the payoff differences from unjustified trust in comparison with justified trust and is defined as (P_1 − S_1)/(R_1 − S_1).

Tab. 1: Probit regression with robust standard errors that the trustor chooses to trust in the 'no hostage' subgame of the Hostage Trust Game.

Predictors†      Coefficient   p-Value   Marginal effect
Subgame HTG      −0.48         0.00      −0.10
Risk             −2.08         0.00      −0.44
Temptation       −0.02         0.95
Constant         −0.62         0.33
N = 1883, Pseudo R² = 0.29, L = −668.3

† Only relevant predictors of Table 6.5 (Snijders 1996:158) are reported here.
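Purely as an illustration of how an estimate like the one in Table 1 could be produced, the following sketch fits a probit model with cluster-robust standard errors and mean marginal effects using statsmodels. The data frame, its file name and its column names (trust, subgame_htg, risk, temptation, subject_id) are hypothetical placeholders; this is not Snijders' original data or code.

```python
import pandas as pd
import statsmodels.api as sm

# hypothetical long-format data: one row per trustor decision, with a binary
# outcome `trust` and the predictors reported in Table 1
df = pd.read_csv("snijders_choices.csv")          # placeholder file name

X = sm.add_constant(df[["subgame_htg", "risk", "temptation"]])
model = sm.Probit(df["trust"], X)

# subject-clustered (robust) standard errors, as in a repeated-choice design
result = model.fit(cov_type="cluster", cov_kwds={"groups": df["subject_id"]})
print(result.summary())

# marginal effects evaluated at the means of the independent variables
print(result.get_margeff(at="mean").summary())
```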

An increase in Risk should thus lead to less trust being placed by the trustor. The respective regression coefficient shows the expected negative and statistically significant effect on the probability to place trust.⁴ The interesting effects, however, result from the regressor Subgame HTG. Under complete and perfect information, and given the actors' preferences, we would predict a coefficient of zero for this regressor. The theoretically unimportant effect of a hostage that could have been (but was not) posted, however, negatively affects the trustor's probability of placing trust. We conclude that not placing a hostage seems, in the eyes of the trustor, to contain important information not reflected in the payoffs. By not posting the hostage, the trustee obviously appears less trustworthy. Not posting a hostage when one could have done so has apparently been taken by the subjects playing as the trustor as a signal of untrustworthiness. A model of complete information is thus not suitable for explaining observed behavior in the experiment, because the subjects playing the trustor were obviously not aware of their partner's preferences. This raises the question of whether an equilibrium in a game of incomplete information exists which could lend support to the experimental finding just discussed. Put differently, is it possible to capture the effects shown in the experiment by using a slightly more complex model of trust (one with incomplete information and social orientations by the trustee)? Models which make use of other-regarding preferences (e.g., Bolton and Ockenfels 2000; Fehr and Schmidt 1999) have been successful in explaining experimental outcomes, at least for specific social dilemma games. For a test of models of other-regarding preferences see, for instance, Fehr and Schmidt (2006). However, these models have also been criticized for their approach toward modeling fairness and a faulty understanding of rationality (e.g., Binmore 2005a; Binmore 2005b; Binmore and Shaked 2010).

3 The trust game with incomplete information

The basic one-shot Trust Game, played under complete information and described above, can be extended as follows. We still assume that actors are rational in the sense that they maximize their own utility (i.e., money, points) but add some probability for social preferences by the trustee. First, Nature moves and decides the game-theoretical type of the trustee.⁵ The trustee is of the good type with probability 0 < π_TG < 1 and

4 The game-theoretic model, of course, does not directly predict effects of Temptation and Risk. These are variables which reflect the effects of different payoffs, if existent, on observed behavior in the laboratory. Nevertheless, the trustor's placing of trust is dependent on Risk in equilibrium. Her prior beliefs about the trustee's preferences to abuse trust if placed are a function of Risk.
5 Throughout this chapter, we name different game-theoretical types of trustees by labels such as 'good', 'bad', and (in the Hostage Trust Game) 'mediocre'. These labels, for ease of understanding, refer to the trustees' characters or, put differently, to their preferences for abusing or honoring trust.


of the bad type with probability 1 − π_TG. In analyzing the game, the good and the bad trustee are two separate actors. The trustee knows his own type while the trustor is unaware of the type of the trustee. The structure of the game is common knowledge (e.g., Rasmusen 2007:48–49), but the game is one of asymmetric information. The payoff structure of the subgames,⁶ emerging after Nature has chosen the type of the trustee, is determined by the ordinal ranking of the payoffs: R_1 > P_1 > S_1 and T_2 > R_2 > P_2, where the subscript 1 (2) denotes the trustor's (trustee's) payoffs.


Fig. 2: The Trust Game with Incomplete Information.

The two types of trustees can be distinguished via a minor refinement of their reward-payoff R_2. If the trustee honors trust if placed by the trustor, fairness considerations (due to his social orientation) increase his payoff R_2 by an amount ∆.⁷ Figure 2 depicts the Trust Game with incomplete information, where the dotted line indicates an information set (since Nature's move is not disclosed to the trustor). We distinguish the two types of trustees as follows:
(1) The Good Trustee: If the trustee is of the good type, his additional "fairness payoff" is ∆_G ≥ T_2 − R_2. Hence, he will honor trust if placed by the trustor. The good trustee will never choose to abuse trust. Since T_2 > R_2 always holds, ∆_G either fully compensates for the loss the trustee undergoes by reciprocation (if ∆_G + R_2 > T_2) or at least makes him indifferent about honoring or abusing placed

6 Note that these are not subgames in the proper sense, since incomplete information prohibits the trustor from knowing whether, after Nature has chosen the value of π TG , the left or right Trust Game in Figure 2 is played. The mere forking of the game tree is thus not relevant to the trustor’s decision, since asymmetric information prevents her from knowing which branch is being played. In a slight abuse of terminology, however, we still use the term ‘subgame’. 7 It does not make any difference whether the trustee receives an additional payoff ∆ in addition to his reward-payoff R 2 or whether ∆ is subtracted from his temptation-payoff T 2 .

trust (if ∆_G + R_2 = T_2). In that case, it is assumed that the trustee honors placed trust. The left subgame in Figure 2 is played.
(2) The Bad Trustee: If the trustee is of the bad type, his additional "fairness payoff" is ∆_B < T_2 − R_2. ∆_B does not completely compensate for the payoff difference between T_2 and R_2. Hence, he will abuse trust if placed by the trustor. The bad trustee thus has the same preferences as the trustee we came across in the one-shot Trust Game under complete and perfect information. The right subgame in Figure 2 is played.
The game depicted in Figure 2 is easily analyzed. The condition that the trustor places trust is fulfilled if and only if her expected utility as a result of placing trust exceeds her expected utility from withholding trust. Based on subjective expected utility considerations, the trustor will trust the trustee whenever π_TG R_1 + (1 − π_TG) S_1 > P_1 or, rearranged, when π_TG >

P1 − S1 = : π∗ . R1 − S 1

In other words, the trustor decides to trust the trustee whenever her assessment about the latter’s trustworthiness exceeds the ratio of a potential loss due to unjustified trust (P1 − S1) and a potential gain due to justified trust (R1 − S1). This inequality resembles, for instance, Coleman’s (1990:97–102) assertion in analyzing trust in dyadic exchange relations, but also follows from a game-theoretic model assuming a Trust Game with incomplete information. Based on this simple analysis, we can now formalize a condition on the trustor’s assessment about the trustee being trustworthy. In other words, to explain the empirical findings summarized in Section 2, it needs to be shown that there exists an equilibrium in the Hostage Trust Game with incomplete information (see the following section) in which the trustor’s assessment about facing a good trustee (i.e., this probability will be labeled πG in the Hostage Trust Game) – one who always honors trust regardless of his hostage decision – falls short of π*. More formally, if there indeed exists an equilibrium in this game in which a trustor’s assessment about a good type trustee fulfills πG < π* < πTG, one can then infer that we have an equilibrium in which the trustor in the Hostage Trust Game with incomplete information withholds trust more often than in the Trust Game with incomplete information. Due to the asymmetric information of the game, we make use of perfect Bayesian equilibria (Kreps and Wilson 1982). A perfect Bayesian equilibrium is a combination of strategies and a set of beliefs such that, at each node of the game, the strategies are Nash given the beliefs and strategies of all other actors. Actors’ beliefs in each information set are rational: that is, they follow Bayes’s Rule, while in the case of out-of-equilibrium behavior we propose beliefs that do not contradict Bayes’s Rule.
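To make the decision rule concrete, the following minimal Python sketch (my own illustration, not part of the chapter) computes the threshold π* from the trustor’s payoffs and returns her trust decision for a given prior πTG; the payoff values used below are the Snijders (1996) parameterization quoted later in the numerical example.

```python
def trust_threshold(R1, P1, S1):
    """Threshold pi* = (P1 - S1) / (R1 - S1), defined for R1 > P1 > S1."""
    return (P1 - S1) / (R1 - S1)

def trustor_places_trust(pi_TG, R1, P1, S1):
    """The trustor places trust iff her prior of facing a good trustee exceeds pi*."""
    return pi_TG > trust_threshold(R1, P1, S1)

# Payoffs taken from the numerical example quoted later in the chapter (R1 > P1 > S1).
R1, P1, S1 = 50, 40, 15
print(trust_threshold(R1, P1, S1))            # 0.714...
print(trustor_places_trust(0.8, R1, P1, S1))  # True
print(trustor_places_trust(0.6, R1, P1, S1))  # False
```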


4 The hostage trust game with incomplete information

Take a look at the Hostage Trust Game with incomplete information as depicted in Figure 3. Again, Nature moves first and decides on the type of the trustee. This time, the trustee can be of three different types: the good type with probability 0 < πG < 1, the mediocre type with probability 0 < πM < 1, or the bad type with probability πB = 1 − πG − πM. Subsection 4.1 will describe the three types of trustees in detail. As in the Trust Game with incomplete information, the trustee knows his own type, while the trustor is unaware of the type of the trustee she meets. As usual, the structure of the game is common knowledge. Again, we assume rational actors who maximize their own utility (i.e., money, points) and trustees with some probability for social preferences. The trustees thus receive an additional payoff ∆ due to fairness considerations if they honor placed trust. The trustor can observe whether or not a hostage has been posted by a trustee before she moves. However, she cannot observe Nature’s move. Therefore, dotted lines in Figure 3 indicate the two information sets.

Payoffs in Figure 3: R2∆ = R2 + ∆; T2− = T2 − H; for S1+ see text. H− denotes ‘not posting a hostage’, H+ denotes ‘posting a hostage’; nt = no trust, pt = place trust, at = abuse trust, ht = honor trust.

Fig. 3: The Hostage Trust Game with Incomplete Information.

Depending on whether the trustee posts a hostage or refrains from doing so, the expected utilities for the trustor, E[U1(⋅)], and the trustee, E[U2(⋅)], assigned to specific combinations of actions, are as follows.


First, if the trustee refrains from posting a hostage, the payoffs result from those subgames reached by running along the branches denoted H− (see Figure 3):

(a) E[U1(no trust, abuse trust)] = P1,  E[U2(no trust, abuse trust)] = P2
(b) E[U1(no trust, honor trust)] = P1,  E[U2(no trust, honor trust)] = P2
(c) E[U1(place trust, abuse trust)] = S1,  E[U2(place trust, abuse trust)] = T2
(d) E[U1(place trust, honor trust)] = R1,  E[U2(place trust, honor trust)] = R2∆

where the ordinal ranking of the payoffs is represented by R1 > P1 > S1 and T2 > R2 > P2, and R2∆ is a shorthand for the payoff R2 + ∆.
Second, if the trustee posts a hostage, the trustor and the trustee can expect either of the following payoffs resulting from the subgames reached after running along the branches denoted H+ (again see Figure 3):

(e) E[U1(no trust, abuse trust)] = P1,  E[U2(no trust, abuse trust)] = P2
(f) E[U1(no trust, honor trust)] = P1,  E[U2(no trust, honor trust)] = P2
(g) E[U1(place trust, abuse trust)] = S1+,  E[U2(place trust, abuse trust)] = T2−
(h) E[U1(place trust, honor trust)] = R1,  E[U2(place trust, honor trust)] = R2∆

where T2− is a shorthand for the payoff T2 − H, in which H denotes the value of the hostage which is lost if the trustee posts the hostage but defects subsequently (i.e., he abuses placed trust). In this case, the trustor earns

S1+ = S1 if the hostage goes to a third party, and S1+ = S1 + H if the hostage goes to the trustor.

Even though we are interested in an equilibrium where no hostage is posted, there might still be at least one equilibrium (pure or mixed) where at least one of the trustees posts a hostage. Since such an equilibrium may depend on whether or not the trustor receives the hostage in addition to her sucker payoff S1, the above distinction is necessary at this point.
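As a compact reference, the following Python sketch (my own illustration, not from the chapter) encodes the payoff combinations (a)–(h); the concrete payoff values follow the Snijders (1996) set quoted in the numerical example below, and the value of ∆ is a hypothetical choice that fixes the trustee’s type.

```python
# Payoff parameters (Snijders 1996 set quoted later in the chapter).
T2, R1, R2, P1, P2, S1 = 75, 50, 50, 40, 40, 15
H = 5          # value of the hostage
DELTA = 30     # trustee's fairness payoff Delta (hypothetical; here Delta > T2 - R2, a good type)

def payoffs(hostage, trust, honor, hostage_to_trustor=False):
    """Return (trustor, trustee) payoffs for one play of the Hostage Trust Game.

    Mirrors combinations (a)-(h): no trust -> (P1, P2); trust honored -> (R1, R2 + Delta);
    trust abused -> (S1 or S1 + H, T2 or T2 - H) depending on whether a hostage was posted."""
    if not trust:
        return P1, P2                                   # (a), (b), (e), (f)
    if honor:
        return R1, R2 + DELTA                           # (d), (h)
    if hostage:                                         # (g): the hostage is lost
        s1_plus = S1 + H if hostage_to_trustor else S1
        return s1_plus, T2 - H
    return S1, T2                                       # (c)
```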

4.1 The game theoretical type of the trustee

The behavior of the trustee, that is, whether he honors or abuses the trust placed by the trustor, is strictly dependent on the value of ∆. As mentioned above, we face three game theoretical types of trustees from which Nature chooses with a positive probability at the beginning of the game.
(1) The Good Trustee: A trustee is good with probability πG = Pr(∆ > T2 − R2). He honors trust after posting a hostage, as well as after refraining from posting the hostage. A good trustee has no incentive to abuse trust if placed, since ∆ exceeds T2 − R2. He is therefore fully compensated for the foregone gain equal to the difference T2 − R2 (the left Hostage Trust Game in Figure 3 is played).


(2) The Mediocre Trustee: A trustee is mediocre with probability πM = Pr(T2 − R2 − H < ∆ < T2 − R2). He honors trust after posting a hostage, but abuses trust after refraining from posting the hostage (the Hostage Trust Game in the middle of Figure 3 is played).
(3) The Bad Trustee: A trustee is bad with probability πB = Pr(∆ < T2 − R2 − H). He abuses trust after posting a hostage as well as after refraining from posting the hostage. The value of ∆ is not enough to compensate him for the foregone gain in case he were to honor placed trust (the right Hostage Trust Game in Figure 3 is played).
As usual, the probabilities πG, πM, and πB reflect the trustor’s prior beliefs about the game theoretical type of the trustee.
Theorem 1. Consider the Hostage Trust Game with incomplete information as depicted in Figure 3, the three different game theoretical types of trustees and one trustor, all with their subjective expected utility as outlined under (a) through (h). Moreover, define the probability that the trustor places trust after observing a trustee posting a hostage as q+ = Pr(place trust|H+), and the probability that the trustor places trust after observing a trustee not posting a hostage as q− = Pr(place trust|H−). In addition, define the following sets of trustees: the set JG of good trustees, the set JM of mediocre trustees, and the set JB of bad trustees. Then define three possible hostage sets, all of which may be empty: the set H+ of hostage-posting trustees, the set H− of non-hostage-posting trustees, and the set H(0,1) of trustees mixing with a probability in the open interval (0,1) over whether or not to post a hostage. The probability that a good trustee posts a hostage is denoted as pG = Pr(H+|good trustee), the probability of a mediocre trustee posting a hostage is denoted as pM = Pr(H+|mediocre trustee), and the probability of a bad trustee posting a hostage is denoted as pB = Pr(H+|bad trustee). Finally, define the following payoff-ratios:

RISK = (P1 − S1)/(R1 − S1),
RISK+ = (P1 − S1+)/(R1 − S1+).

Generally, the Hostage Trust Game with incomplete information then has the following four equilibria: Equilibrium I: Suppose that (J G ∪ J M ∪ J B ) ⊂ H+ , H− = 0, and H(0,1) = 0; then and only then an equilibrium of the following type exists: All types of trustees post a hostage with probability one, while the trustor subsequently places trust if and only if π G < RISK and π G + π M > RISK+ are simultaneously fulfilled. The trustees consequently play according to their strategies defined in Subsection 4.1 and denoted by the double-line in Figure 3. Equilibrium II: Suppose that (J M ∪ J B ) ⊂ H− , J G ⊂ H(0,1) , and H+ = 0; then and only then an equilibrium of the following type exists:


A mediocre and a bad type trustee refrain from posting a hostage with probability one, while a good type trustee posts a hostage with probability

0 < pG < 1 − [(1 − πG)/πG] · [(P1 − S1)/(R1 − P1)].

The trustor subsequently places trust if and only if πG > RISK. Consequently, all trustees play according to their strategies defined in Subsection 4.1 and denoted by the double-line in Figure 3.
Equilibrium IIIa: Suppose that (JG ∪ JM ∪ JB) ⊂ H(0,1), H+ = 0, and H− = 0; then and only then an equilibrium of the following type exists: All types of trustees provide a hostage with probability 0 < pG = pM = pB < 1 and the trustor withholds trust if and only if πG + πM < RISK+. Consequently, the trustees play according to their strategies defined in Subsection 4.1 and denoted by the double-line in Figure 3.
Equilibrium IIIb: Suppose that (JG ∪ JM ∪ JB) ⊂ H(0,1), H+ = 0, and H− = 0; then and only then an equilibrium of the following type exists: All types of trustees provide a hostage with probability 0 < pG ≠ pM ≠ pB < 1, and the trustor then withholds trust if and only if

(πG pG + πM pM)/(πG pG + πM pM + πB pB) < RISK+

and

πG(1 − pG)/[πG(1 − pG) + πM(1 − pM) + πB(1 − pB)] < RISK

are simultaneously fulfilled. Consequently, the trustees play according to their strategies defined in Subsection 4.1 and denoted by the double-line in Figure 3. Proof. See appendix.



Let us summarize the results of the above theorem. First and most important, there exists no equilibrium which would by all means support the empirical findings by Snijders (1996): a trustor’s assessment – after no hostage has previously been posted – about facing a good type trustee can never be smaller than the threshold π* for placing trust in the Trust Game with incomplete information. If no hostage has been posted by any of the three types of trustees, the inequality πG < π* < πTG is never fulfilled in equilibrium. That is, incomplete information and some social preferences in the sense of an extra utility term for the trustee are not sufficient for explaining the experimental subjects’ inconsistent behavior in the laboratory. However, with a little goodwill we may find some indication of an equilibrium which points in the direction of the experimental results. There is some evidence that a trustor in the Hostage Trust Game with incomplete information could choose to withhold trust more often in equilibrium than in the Trust Game with incomplete information. Equilibrium IIIb is the candidate for such speculation.


In comparison to the Trust Game with incomplete information, Nature introduces a new, mediocre type of trustee to the Hostage Trust Game with incomplete information with probability πM. This trustee honors placed trust after having posted a hostage and abuses placed trust otherwise. Due to this mediocre type, the trustor’s assessment of πG in equilibrium IIIb can indeed fall short of the threshold π* of the Trust Game with incomplete information. As we know from the experimental setup, we would need to consider a situation where no hostage is posted and the trustor’s assessment about meeting a good type trustee must fulfill πG < π*.⁸ Such a situation does not exist in equilibrium, but equilibrium IIIb states that the probabilities for each of the three game theoretical types of trustees to provide a hostage (i.e., pG, pM, and pB) are different but fall within the open interval (0,1). If we assume a situation where all three probabilities tend to zero, we have a limiting equilibrium where hostage posting is unlikely to occur. If we are willing to translate “close to zero” as “neither of the trustees posts a hostage”, we have a theoretical explanation for the experimental findings.
Let us provide a numerical example. Assume that the payoffs are T2 = 75, R1 = R2 = 50, P1 = P2 = 40, S1 = 15 and the value of the hostage equals H = 5. This is one set of payoffs taken from the experiments conducted by Snijders (1996). The values for RISK and RISK+ are then easily computed: RISK = 0.7143 and RISK+ = 0.6667. Further assume the probabilities for hostage posting by any game theoretical type of trustee to be pG = 0.00001, pM = 0.00002, and pB = 0.00003. Finally assume that the prior beliefs by the trustor about Nature choosing a specific type of trustee are given by the probabilities πG = 0.45 for the good trustee, πM = 0.25 for the mediocre trustee, and πB = 0.3 for the bad trustee. We then have:

(πG pG + πM pM)/(πG pG + πM pM + πB pB) = 0.513514 < RISK+ = 0.6667, and
πG(1 − pG)/[πG(1 − pG) + πM(1 − pM) + πB(1 − pB)] = 0.450004 < RISK = 0.7143.

Jointly, these values fulfill the conditions for equilibrium IIIb. Remember that in the Trust Game the trustor will withhold trust as long as πTG (her prior belief of facing a good trustee) is smaller than RISK. Since the value for RISK is the same in both games under consideration, the trustor would refuse to place trust in the Trust Game with incomplete information for any belief in the range 0 < πTG < 0.7143. On the contrary, the trustor’s belief of facing a good trustee in the Hostage Trust Game with incomplete information can, ceteris paribus, only vary in the range 0 < πG < 0.6285.
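The arithmetic of this example can be checked with a short script (my own verification aid, not part of the chapter); it assumes, as the value RISK+ = 0.6667 implies, that a lost hostage goes to the trustor (S1+ = S1 + H).

```python
# Payoffs and beliefs from the numerical example above (Snijders 1996 parameterization).
T2, R1, R2, P1, P2, S1, H = 75, 50, 50, 40, 40, 15, 5
RISK = (P1 - S1) / (R1 - S1)                   # 0.7143 (rounded)
S1_plus = S1 + H                               # hostage goes to the trustor
RISK_plus = (P1 - S1_plus) / (R1 - S1_plus)    # 0.6667 (rounded)

pG, pM, pB = 0.00001, 0.00002, 0.00003         # hostage-posting probabilities
piG, piM, piB = 0.45, 0.25, 0.30               # trustor's prior beliefs

cond_plus = (piG * pG + piM * pM) / (piG * pG + piM * pM + piB * pB)
cond_minus = piG * (1 - pG) / (piG * (1 - pG) + piM * (1 - pM) + piB * (1 - pB))

print(round(RISK, 4), round(RISK_plus, 4))     # 0.7143 0.6667
print(round(cond_plus, 6), cond_plus < RISK_plus)    # 0.513514 True
print(round(cond_minus, 6), cond_minus < RISK)       # 0.450004 True
```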

8 Note that π TG denotes the trustor’s prior beliefs of encountering a good type trustee in the Trust Game with incomplete information, while π G denotes the trustor’s prior beliefs about the occurrence of a good type trustee in the Hostage Trust Game with incomplete information. Even though these priors need not adopt the same values, they nevertheless refer to the same game theoretical type of trustee, viz., one who always honors placed trust. Hence, what needs to be shown is that π G < π TG < π ∗ .


Hence, several values for πG can be found which indeed fall short of the threshold value and fulfill πG < πTG < π*.
Even more straightforward is the interpretation of equilibrium IIIa. Again, we need the assumption that pG = pM = pB tends to zero, such that the probability of any trustee posting a hostage becomes negligible. In equilibrium, as long as πG + πM < RISK+, the trustor is not willing to place trust. Since πG + πM together need to be smaller than RISK+ (and RISK+ is always smaller than RISK), it can easily be seen that πG can again fall short of πTG.
The remaining two equilibria (equilibrium I and equilibrium II) cannot lend support to the experimental findings by Snijders (1996), even if interpreted laxly, as we did with equilibria IIIa and IIIb. In equilibrium II, the trustor places trust as long as her assessment about facing a good type trustee (πG) exceeds RISK. However, she only does this as long as a good type trustee mixes within a certain range, specified above, over whether or not to post a hostage. A mediocre and a bad type trustee refrain from posting a hostage. Therefore, whenever the trustor sees a hostage being posted she can be sure she faces a good type trustee. If no hostage can be observed by the trustor, she might still be facing a good type trustee, but she is not able to infer the opponent’s type. Put differently, by not observing a hostage, the trustor faces the same decision problem as in the Trust Game with incomplete information. It is, therefore, not surprising that the condition for cooperation in the Hostage Trust Game with incomplete information mirrors the one in the Trust Game with incomplete information.
A logical “extension” of equilibrium II is given by equilibrium I. As soon as the trustor’s prior belief about an encounter with a good type trustee falls short of RISK (viz., πG < RISK), she only places trust if her assessment about the common pool of good and mediocre type trustees is larger than RISK+ (viz., πG + πM > RISK+). However,


equilibrium behavior of the trustor requires hostage posting with probability one by all types of trustees. This does not provide any signal about the type of a trustee, but it nevertheless makes sure that a good and a mediocre type trustee will honor placed trust. Therefore, to trust in the first place pays off for the trustor. All four equilibria of the Hostage Trust Game with incomplete information are visualized in Figure 4.

Fig. 4: Equilibria in the Hostage Trust Game with Incomplete Information.

We have shown that the counterfactual behavior of experimental subjects cannot be explained by a model that makes use of incomplete information and the social orientation of the trustee. However, there are some clues in equilibria IIIb and IIIa that might be taken as an indication that Snijders’ (1996) results are not so strange after all. If one is willing to accept the above reasoning, the question raised in the title – whether “no news” is “bad news” – could be answered with a Yes. “No news” in the sense that the trustee did not provide the hostage might, in the eyes of the experimental subjects playing as trustor, contain information about the trustee’s further intentions: he posts no hostage, because he will not honor trust if placed and will thus not run the risk of losing the hostage. Whether this skepticism is justified cannot, unfortunately, be answered. In general, the analysis suggests that it seems wiser to send clear signals about one’s own cooperative intentions than to rely upon a diffuse signal (“I do not intend to abuse trust, and since it will not be necessary to induce trust, I will not post the hostage”) which may be misunderstood by the other player(s). This is especially so because sending a signal is free of cost in many situations. Since the analysis also showed that an equilibrium exists in which all types of trustees place a hostage, however, a clear signal may not be enough. The bottom line is thus as follows: choose your signals wisely, send them when necessary, and make sure the signal cannot be imitated.

5 Conclusion and discussion

Experimental evidence (e.g., Snijders 1996) shows that subjects tend to withhold trust more often in a Hostage Trust Game where the trustee refrains from producing the hostage than they do in a similar Trust Game. However, since the simple one-shot Trust Game is equivalent to the subgame of the Hostage Trust Game (which starts when no hostage is posted), an equilibrium analysis under complete information will predict identical outcomes in both games. We seek to explain the experimental findings by putting forward a Hostage Trust Game with incomplete information and trustees with some social preferences. It is shown that in such a game, compared to a Trust Game with incomplete information, there exists no equilibrium in which a trustor withholds trust more often if the probability of hostage posting is exactly zero. Thus, the findings by Snijders (1996) cannot be explained by the model put forward. However, if all game theoretical


types of trustees randomize with a probability in the open interval (0,1) over whether or not to post a hostage, a trustor’s assessment of encountering a good trustee may indeed fall short of her beliefs about such a type of trustee in the Trust Game with incomplete information. In other words, if we assume that in the experiments, subjects playing as a trustee would randomize hostage posting with probabilities close to zero, we have a theoretical model which resembles the experimental findings by Snijders (1996) as closely as possible.
We have also seen that hostages may be posted in equilibrium. However, none of these equilibria are separating in the sense that only a good (and possibly a mediocre) type trustee would place a hostage, while a bad type trustee would refrain from providing a hostage. A trustor would thus place trust after observing a hostage, but withhold it when no hostage is observed. Our analysis therefore shows that hostages cannot serve as a signaling device in a Hostage Trust Game with incomplete information. This is not surprising, since Snijders (1996:171) already noted that “hostages cannot be type revealing in Hostage Trust Games.” Since the hostage is not binding for a bad type trustee (he abuses placed trust even after he posts a hostage), a trustor will never place trust with probability one in a pooling equilibrium.
But why are there no equilibria in this Hostage Trust Game with incomplete information where only the good type trustee, or the good and the mediocre type trustee, post a hostage, while the trustor places trust if she observes a hostage? After all, there are “candidates” for such an equilibrium (see appendix, case (i) and case (iv)). However, in any of these cases, it pays for the bad type trustee to provide a hostage and mimic a reliable type. Consequently, this leads to the trustor’s conditional cooperation described by equilibria I–IIIb. The conclusion would thus be that misusing signals should incur considerable costs for the sender. The model developed in this note, however, is rather simple, and driven by the desire to theoretically explain an obliquity in Snijders’ (1996) experimental results. It thus neglects the costliness of signals. A reasonable extension of the model would introduce credible signals in the sense that hostage-posting is made costly to the trustee. More precisely, the costs of hostage-posting must be highest for the bad trustee but lowest for the good trustee.


Appendix: Proof of Theorem 1

Let us first lay out the conditions that either one of the game theoretical types of trustees posts a hostage.
Lemma A1. A good trustee posts a hostage whenever (A.1) is fulfilled and refrains from posting a hostage otherwise. A mediocre actor posts a hostage whenever (A.2) is fulfilled and refrains from posting a hostage otherwise. A bad trustee posts a hostage whenever (A.3) is fulfilled and refrains from posting a hostage otherwise. A trustee mixes over whether or not to post a hostage with probability pG = Pr(H+|good trustee), pM = Pr(H+|mediocre trustee), or pB = Pr(H+|bad trustee), respectively, whenever the right-hand and left-hand side of (A.1), (A.2), or (A.3), respectively, become equal.

q+ > q−   (A.1)
q+ > q− (T2 − P2)/(R2 − P2 + ∆) =: qπM   (A.2)
q+ > q− (T2 − P2)/(R2 − P2 − H) =: qπB   (A.3)

Proof. Demanding that the subjective expected utility from posting a hostage is always larger than the subjective expected utility from not posting a hostage for each of the game theoretical types of trustee leads to inequality (A.1) for the good trustee, inequality (A.2) for the mediocre trustee, and inequality (A.3) for the bad trustee. ◼
Inequalities (A.1), (A.2), and (A.3) must satisfy q− < qπM < qπB. Taking into account the sets of trustees and hostage possibilities as defined in Theorem 1, eight possible cases for an equilibrium analysis in pure strategies exist:

(i) JG ⊂ H+, (JM ∪ JB) ⊂ H−:  q− < q+ < qπM < qπB
(ii) JM ⊂ H+, (JG ∪ JB) ⊂ H−:  qπM < q+ < q− < qπB
(iii) JB ⊂ H+, (JG ∪ JM) ⊂ H−:  qπB < q+ < q− < qπM
(iv) (JG ∪ JM) ⊂ H+, JB ⊂ H−:  q− < qπM < q+ < qπB
(v) (JG ∪ JB) ⊂ H+, JM ⊂ H−:  q− < qπB < q+ < qπM
(vi) (JM ∪ JB) ⊂ H+, JG ⊂ H−:  qπM < qπB < q+ < q−
(vii) (JG ∪ JM ∪ JB) ⊂ H+, H− = 0:  q− < qπM < qπB < q+
(viii) (JG ∪ JM ∪ JB) ⊂ H−, H+ = 0:  q+ < q− < qπM < qπB

As can be seen, only cases (i), (iv), (vii) and (viii) do not violate the necessary condition q− < qπM < qπB. We can, therefore, restrict ourselves to the following cases when checking for mixed equilibria:

(ix) JG ⊂ H(0,1), (JM ∪ JB) ⊂ H−, H+ = 0:  q− = q+ < qπM < qπB
(x) JG ⊂ H+, JM ⊂ H(0,1), JB ⊂ H−:  q− < qπM = q+ < qπB
(xi) (JG ∪ JM) ⊂ H+, JB ⊂ H(0,1), H− = 0:  q− < qπM < q+ = qπB
(xii) (JG ∪ JM ∪ JB) ⊂ H(0,1), H+ = 0, H− = 0:  q+ = q− = qπM = qπB

where case (xii) can only be true for q+ = q− = qπM = qπB = 0.
Lemma A2. An equilibrium in the Hostage Trust Game with incomplete information (as depicted in Figure 3) only exists in case (vii), denoted as equilibrium I, in case (ix), denoted as equilibrium II, and in case (xii), denoted as equilibrium IIIa and equilibrium IIIb.


Proof. We will examine the four different equilibria separately. The proof that the other possible cases, that is, (i), (iv), (viii), (x), and (xi), cannot constitute an equilibrium runs along the same lines as this proof. What needs to be shown is that each case leads to a contradiction in the inequalities formed by q−, q+, qπM, and qπB. ◼
Equilibrium I: Given the three game theoretical types of trustee, a trustor’s expected utility if a hostage is posted is as follows:

E[U1(C1|H+)] = (πG + πM) R1 + πB S1+
E[U1(D1|H+)] = P1

Demanding that E[U1(C1|H+)] > E[U1(D1|H+)], taking into account that πB = 1 − πG − πM and rearranging terms leads to

πG > (P1 − S1+)/(R1 − S1+) − πM =: πG**.

Therefore, a trustor’s probability to play cooperatively after a hostage was posted is

q+ = 0 if πG < πG**, and q+ = 1 if πG > πG**,

where πG** decreases if the rules of the Hostage Trust Game demand that a lost hostage goes to the trustor.⁹ Since all trustees place a hostage, q− can only be calculated on the basis of passive conjectures (e.g., Rasmusen 2007:160). Assuming the simplest case, namely that the trustor’s priors do not change after she observes that no hostage has been posted (hence, out-of-equilibrium behavior by a trustee does not leave anything to learn for the trustor), leads to E[U1(C1|H−)] > E[U1(D1|H−)] and, rearranging, to

πG > (P1 − S1)/(R1 − S1) =: πG*.

Hence, a trustor’s probability to play cooperatively after no hostage was posted is

q− = 0 if πG < πG*, and q− = 1 if πG > πG*.   (A.4)

Since πG** < πG*, we distinguish the following situations:

πG < πG** < πG*:  q+ = q− = 0
πG** < πG < πG*:  q+ > q−
πG** < πG* < πG:  q+ = q− = 1

Since we demand that q− < qπM < qπB < q+, only πG** < πG < πG* can be in equilibrium. Hence, whenever all game theoretical types of trustees post a hostage, the trustor is willing to cooperate if and only if she assesses the probability πG (of playing a good trustee) to be in the open interval (πG**, πG*). Put differently, πG < RISK and πG + πM > RISK+ must be fulfilled simultaneously. All trustees play according to their strategy described in Subsection 4.1 and denoted by a double line in Figure 3. This forms equilibrium I.

9 This can be seen since ∂/∂S1 [(P1 − S1)/(R1 − S1)] < 0.
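For concreteness, a small script (my own illustration, again using the Snijders (1996) payoffs quoted in Section 4 and a hypothetical prior πM) evaluates the two thresholds of equilibrium I, assuming the lost hostage goes to the trustor:

```python
# Thresholds pi_G** and pi_G* of equilibrium I (lost hostage goes to the trustor).
R1, P1, S1, H = 50, 40, 15, 5
piM = 0.25                                        # hypothetical prior on the mediocre type
S1_plus = S1 + H

pi_star = (P1 - S1) / (R1 - S1)                   # pi_G* = RISK = 0.714...
pi_double_star = (P1 - S1_plus) / (R1 - S1_plus) - piM   # pi_G** = 0.416...

# The trustor cooperates after observing a hostage only if pi_G lies in (pi_G**, pi_G*).
print(pi_double_star, pi_star, pi_double_star < pi_star)   # ... True
```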


Equilibrium II: Given the three game theoretical types of trustees, a trustor’s expected utility if a hostage is posted is as follows:

E[U1(C1|H+)] = R1
E[U1(D1|H+)] = P1

Since R1 > P1 always holds for any Trust Game, it must be the case that q+ = 1. Given that a trustor observes no hostage being posted, her subjective expected utility is as follows:

E[U1(C1|H−)] = [(1 − pG) πG / ((1 − pG) πG + πM + πB)] R1 + [(πM + πB) / ((1 − pG) πG + πM + πB)] S1
E[U1(D1|H−)] = P1

Demanding E[U1(C1|H−)] > E[U1(D1|H−)] and rearranging terms leads to

pG < 1 − [(πM + πB)/πG] · [(P1 − S1)/(R1 − P1)] =: p*.

Since pG is a probability in the open interval (0,1), we have the restriction

[(πM + πB)/πG] · [(P1 − S1)/(R1 − P1)] < 1, i.e., πG > (P1 − S1)/(R1 − S1),

or, analogously, πG > RISK. Consequently, we see that q− = 1. In equilibrium, we demand that q− = q+ < qπM < qπB, which is indeed fulfilled by the above conditions. Hence, a mediocre and a bad type of trustee refrain from posting a hostage, while a good trustee posts a hostage with probability 0 < pG < p* and a trustor places trust whenever her priors about the good type of trustee fulfill πG > RISK. All trustees play their strategy described in Subsection 4.1 and denoted by a double line in Figure 3. This forms equilibrium II.
Equilibria IIIa and IIIb: Given the three game theoretical types of trustees, a trustor’s expected utility if a hostage is posted is as follows:

E[U1(D1|H+)] = P1
E[U1(C1|H+)] = [(pG πG + pM πM)/(pG πG + pM πM + pB πB)] R1 + [pB πB/(pG πG + pM πM + pB πB)] S1+

and if no hostage is being posted:

E[U1(D1|H−)] = P1
E[U1(C1|H−)] = [πG(1 − pG)/(πG(1 − pG) + πM(1 − pM) + πB(1 − pB))] R1 + [(πM(1 − pM) + πB(1 − pB))/(πG(1 − pG) + πM(1 − pM) + πB(1 − pB))] S1

In equilibrium, we demand that q+ = q− = qπM = qπB and, hence, all types of trustees are mixing over whether or not to post a hostage while a trustor withholds trust. Therefore, E[U1(C1|H+)] < E[U1(D1|H+)] and E[U1(C1|H−)] < E[U1(D1|H−)] must be fulfilled simultaneously, leading to

(πG pG + πM pM)/(πG pG + πM pM + πB pB) < RISK+

and

πG(1 − pG)/[πG(1 − pG) + πM(1 − pM) + πB(1 − pB)] < RISK.

Depending on the probabilities of the trustees to post a hostage, we find that:
– Whenever all trustees are mixing over whether or not to post a hostage with the same probability 0 < pG = pM = pB < 1, the conditions for an equilibrium reduce to πG + πM < RISK+ and πG < RISK, under which a trustor always withholds trust. Since RISK+ is always smaller than RISK, the inequality πG + πM < RISK+ forms a stronger restriction than πG < RISK. All trustees play their strategy described in Subsection 4.1 and denoted by a double line in Figure 3. This forms equilibrium IIIa.
– Whenever all trustees are mixing over whether or not to post a hostage with different probabilities 0 < pG ≠ pM ≠ pB < 1, a trustor withholds trust if

(πG pG + πM pM)/(πG pG + πM pM + πB pB) < RISK+

and

πG(1 − pG)/[πG(1 − pG) + πM(1 − pM) + πB(1 − pB)] < RISK

are simultaneously fulfilled. All trustees play according to their strategies defined in Subsection 4.1 and denoted by the double-line in Figure 3. This forms equilibrium IIIb.

This proves Lemma A2 and thus Theorem 1.



Bibliography [1]

Abreu, Dilip. 1988. “On the Theory of Infinitely Repeated Games with Discounting.” Econometrica 56(2):383–396. [2] Aksoy, Ozan. 2013. Social Preferences and Beliefs in Non-Embedded Social Dilemmas. Zutphen: Wöhrmann Print Service. [3] Binmore, Kenneth G. 2005a. “Economic Man – or Straw Man? A Commentary on Henrich et al.” Behavioral and Brain Science 28(06):817–818. [4] Binmore, Kenneth G. 2005b. “Economic Man – or Straw Man? A Comment on Henrich et al.” Mimeo. University College London. [5] Binmore, Kenneth G., and Avner Shaked. 2010. “Experimental Economics: Where Next?” Journal of Economic Behavior and Organization 73(1):87–100. [6] Bolton, Gary E., and Axel Ockenfels. 2000. “ERC: A Theory of Equity, Reciprocity, and Competition.” American Economic Review 90(1):166–193. [7] Buskens, Vincent. 2002. Social Networks and Trust. Boston: Kluwer. [8] Buskens, Vincent, and Werner Raub. 2002. “Embedded Trust: Control and Learning.” Advances in Group Processes 19:167–202. [9] Buskens, Vincent, and Kazuo Yamaguchi. 1999. “A New Model for Information Diffusion in Heterogeneous Social Networks.” Pp. 281–325 in Sociological Methodology, edited by M. P. Becker, and M. E. Sobel. Oxford: Blackwell. [10] Camerer, Colin F. 2003. Behavioral Game Theory: Experiments in Strategic Interactions. Princeton, NJ: Princeton University Press. [11] Coleman, James S. 1990. Foundations of Social Theory. Cambridge, MA: The Belknap Press of Harvard University Press.


[12] Dasgupta, Partha. 1988. “Trust as a Commodity.” Pp. 49–72 in Trust: Making and Breaking Cooperative Relations, edited by D. Gambetta. Oxford: Basil Blackwell. [13] Fehr, Ernst, and Klaus M. Schmidt. 1999. “A Theory of Fairness, Competition, and Cooperation.” Quarterly Journal of Economics 114(3):817–868. [14] Fehr, Ernst, and Klaus M. Schmidt. 2006. “The Economics of Fairness, Reciprocity and Altruism – Experimental Evidence and New Theories.” Pp. 615–691 in Handbook of the Economics of Giving, Altruism and Reciprocity. Vol. 1, edited by S. C. Kolm, and J. M. Ythier. Amsterdam: Elsevier. [15] Friedman, James W. 1986. Game Theory with Applications to Economics. New York: Oxford University Press. [16] Fudenberg, Drew, and Jean Tirole. 1991. Game Theory. Cambridge, MA: MIT Press. [17] Gautschi, Thomas. 2002. “History Effects in Social Dilemma Situations.” Rationality and Society 12(2):131–162. [18] Harsanyi, John C., and Reinhard Selten. 1988. A General Theory of Equilibrium Selection in Games. Cambridge, MA: MIT Press. [19] Keren, Gideon, and Werner Raub. 1993. “Resolving Social Conflicts through Hostage Posting: Theoretical and Empirical Considerations.” Journal of Experimental Psychology, General 122(4):429–448. [20] Kollock, Peter. 1998. “Social Dilemmas: The Anatomy of Cooperation.” Annual Review of Sociology 24(1):183–214. [21] Kreps, David M. 1990a. “Corporate Culture and Economic Theory.” Pp. 90–143 in Perspectives on Positive Political Economy, edited by J. E. Alt, and K. A. Shepsle. Cambridge, MA: Harvard University Press. [22] Kreps, David M. 1990b. A Course in Microeconomic Theory. New York: Harvester Wheatsheaf. [23] Kreps, David M., and Robert Wilson. 1982. “Sequential Equilibria.” Econometrica 50(4):863– 894. [24] Kreps, David M., Paul Milgrom, John Roberts, and Robert Wilson. 1982. “Rational Cooperation in the Finitely Repeated Prisoners’ Dilemma.” Journal of Economic Theory 27(2):245–252. [25] Laffont, Jean-Jacques, and David Martimort. 2002. The Theory of Incentives: The PrincipalAgent Model. Princeton: Princeton University Press. [26] Mlicki, Pawel. 1996. “Hostage Posting as a Mechanism for Co-operation in the Prisoner’s Dilemma Game.” Pp. 165–183 in Frontiers in Social Dilemma Research, edited by W. B. G. Liebrand, and D. M. Messick. Berlin: Springer. [27] Rasmusen, Eric. 2007. Games and Information: An Introduction to Game Theory. 4th ed. Malden, MA: Blackwell. [28] Raub, Werner. 1992. “Eine Notiz über die Stabilisierung von Vertrauen durch eine Mischung von wiederholten Interaktionen und glaubwürdigen Festlegungen.” Analyse & Kritik 14(2):187– 194. [29] Raub, Werner. 2004. “Hostage Posting as a Mechanism of Trust. Binding, Compensation, and Signaling.” Rationality and Society 16(3):319–365. [30] Raub, Werner, and Gideon Keren. 1993. “Hostages as a Commitment Device. A Game-Theoretic Model and an Empirical Test of Some Scenarios.” Journal of Economic Behavior and Organization 21(1):43–67. [31] Raub, Werner, and Jeroen Weesie. 1990. “Reputation and Efficiency in Social Interactions: An Example of Network Effects.” American Journal of Sociology 96(3):626–654. [32] Raub, Werner, and Jeroen Weesie. 2000. “Cooperation via Hostages.” Analyse & Kritik 22(1):19–43. [33] Schelling, Thomas C. 1960. The Strategy of Conflict. Cambridge, MA: Harvard University Press. [34] Snijders, Chris. 1996. Trust and Commitments. Amsterdam: Thesis Publisher.


[35] Snijders, Chris. 2000. “Trust via Hostage Posting.” Pp. 114–116 in The Management of Durable Relations. Theoretical Models and Empirical Studies of Households and Organizations, edited by J. Weesie, and W. Raub. Amsterdam: Thesis Publisher. [36] Snijders, Chris, and Vincent Buskens. 2001. “How to Convince Someone That You Can Be Trusted? The Role of ‘Hostages’.” Journal of Mathematical Sociology 25(4):355–383. [37] Vieth, Manuela. 2009. Commitments and Reciprocity in Trust Situations. Experimental Studies on Obligation, Indignation, and Self-Consistency. Ede: Ponsen & Looijen. [38] Weesie, Jeroen, and Werner Raub. 1996. “Private Ordering: A Comparative Institutional Analysis of Hostage Games.” Journal of Mathematical Sociology 21(3):201–240. [39] Williamson, Oliver. 1985. The Economic Institutions of Capitalism. New York: Free Press.

| Part VII: Experimental Methods

Hartmut Esser

When Prediction Fails
The Reactions of Rational Choice Theory and Behavioral Economics to the Unexpected Appearance of Framing-Effects

Abstract: This contribution deals with the various ways in which Rational Choice Theory (RCT) and behavioral economics have reacted to the numerous violations of the axiom of irrelevant alternatives by framing-effects. It starts with reporting the findings of some variations of the seminal Kahneman–Tversky-Experiments, which show that, even with strong changes in incentives, significant effects of the verbal “definition of the situation” remain. The reactions of RCT and behavioral economics to similar findings in game-theoretical experiments (also with “real” incentives) are summarized by three points: widening, annexations and by-catches. Widening refers to adding new types of incentives to the utility functions of a then “wide” RCT. Annexations means attempts to incorporate certain parts of sociology, anthropology and (social-)psychology into the framework of a (wide) RCT. By-catches are unnoticed discoveries of framing-effects in empirical investigations, which at first sight fail to find or seem to refute them. The contribution is embedded in the framework of the model of frame-selection (MFS), an approach designed to integrate the different concepts and mechanisms in a formal model. The contribution closes with some other recent efforts to develop such an “integrated” theory of action from inside of RCT.

1 Rational Choice? Rational Choice!

Andreas Diekmann is one of the most distinguished, productive and dedicated advocates of analytical social sciences. He is exceptionally capable of applying their methods and theoretical instruments to central social problems, and of demonstrating their potential even to outsiders. It is well known that rational choice theory (RCT) is one of the most important bases for analytical social sciences. Its unique power lies in the development of precise models of action and its aggregated consequences, including models of strategic interaction and their dynamics. For good reasons, Diekmann and many others have always refused to change this basis (see, inter alia, Diekmann and Voss 2004; Diekmann 2010). Of course, reasons to make such a change have existed for a long time: innumerable observations of a merely “‘bounded’ rationality”, strange “paradoxes”, puzzling effects, and “biases” of any kind (such as those Jon Elster has collected like glittering jewelry). RCT readily dismisses these objections as “psychology” or sociological drivel, which turn out to be trivial when faced with hard facts and high costs. During the last 15 years, however, things have become more serious due to



the results of experimental game theory. Andreas Diekmann, among others, has contributed substantially to the development and dissemination of this approach. For example, contrary to all RCT predictions, about 50 % of participants in collective good experiments regularly decide to act in a way they should not according to RCT-based game theory (see, for example, Fehr and Gächter 2002:138, Figure 2). That is, participants cooperate even in anonymous situations (possibly by taking into account high costs) and also during final rounds of iterated games, when according to RCT everybody tries to fool everyone else. Such a result would be disastrous to any theory, even much less specific theories than RCT with its precise presumptions and particularly high risk of failure. This is all the more so under the strict conditions of theory-based controlled experiments, which are necessary for any test of a theory. However, no theory program abandons its bases at the first sign of failure, especially not RCT, which is well on its way to becoming the standard theory for the entirety of the social sciences. The solution: in addition to material and egoistic motives, one may take into account immaterial and altruistic ones, such as those oriented towards certain reference points in a fair trade. If one adjusted the formal requirements for the mathematical solution of game theoretical equilibria in such a way, all would be well again: The models re-systematize the various results and remain theoretically manageable. This is all that is required, for, above all, RCT is an extremely useful instrument, which merely has to meet certain formal requirements, classifies various results and gives rise to fruitful new hypotheses, which will then prove themselves valid. The underlying “real” processes were uninteresting and could not be completely understood anyway. Moreover, they were not decisive as long as the new models of a “wide” RCT function properly.

2 Framing? Framing!

This approach has been successful, and to date the strategy of including new kinds of motives and expectations substantially determines the ways of thinking and developments in behavioral economics and empirical RCT. However, much of this looks like helplessly poking about in a fog, and thus (as Lakatos would put it) is a clear indication of a degenerative problem shift. Above all, it does not appear that the really serious anomalies and theoretical problems at the core of the formal assumptions of orthodox as well as extended RCT could be handled this way. This includes the idea that, for example, preferences are far from being “well ordered” empirically, and that phenomena of framing contribute systematically to the “definition” of the situation, in which occasionally even very subtle symbols may turn the preference structure upside down and, in the extreme cases, dampen and even eliminate the effect of all “rational” incentives.


Phenomena of framing, as is well known, were already described 20 years ago by social psychologists Tversky and Kahneman. Minor variations in the verbal presentation of a situation resulted in dramatic changes in reactions, while incentive structures remained the same. According to RCT, this must not happen: it would be a severe violation of the axiom of independence, according to which irrelevant aspects should have no effects and communication and language are nothing but “cheap talk” without consequences. Tversky and Kahneman reacted in an instrumentalist way and formulated the so-called “prospect theory” based on further findings from their experiments (see, for example, Tversky and Kahneman 1981): there exist varying utility functions for gains and losses and a non-linear course of the expectation function. A variable reference point, including, inter alia, the description of a situation, determines whether a given “objective” incentive is regarded as a gain or a loss. This “prospect theory” could indeed resolve some of RCT’s anomalies, like risk aversion to losses. Yet, also here the problem became obvious immediately: the functions are only fitted to the observed results and adjusted to some formal requirements of mathematical tractability, but the underlying causal processes remain unexplained. However, substantial theory and knowledge about the underlying causal processes has to precede any formalization. This particularly applies to the impact of verbal framing on setting the reference point. Tversky and Kahneman do not comment on this point, and none of the variants of RCT considers the spontaneous triggering of completely different views and reactions through the “definition” of the situation via “significant symbols”, regardless of any incentives. It would really be a disaster if a member of the orthodox RCT camp had to acknowledge this.
One may question, however, whether the findings on the framing via verbal variations in the description of the situation in Tversky’s and Kahneman’s experiment are correct. There are indeed good reasons for doubt. For an easier understanding, we will summarize the description of the experiment. The issue was combating an “unusual Asian disease” endangering the life of 600 persons. A first group of subjects was asked to evaluate two different programs (A and B), and a second group was asked to evaluate two other programs (C and D), in terms of their suitability to fight the epidemic on the basis of the following information:
Group 1:
– If Program A is adopted, 200 people will be saved.
– If Program B is adopted, there is a 1/3 probability that 600 people will be saved, and a 2/3 probability that no people will be saved.
Group 2:
– If Program C is adopted, 400 people will die.
– If Program D is adopted, there is a 1/3 probability that nobody will die, and a 2/3 probability that 600 people will die.


Three aspects varied in the specifications for describing the situation: “save” or “die” for describing the programs’ effects, positively or negatively worded consequences (like “save” instead of “not die” or “die” instead of “not saved”) and certain versus risky consequences. The special feature was that, according to RCT, one would expect the same result for all programs: 200 of the 600 affected persons will survive. Empirically, however, things turned out to be different. Program A was preferred by 72 % against Program B (28 %), but Program C was preferred only by 22 % against Program D (78 %). Consequently, there was a difference of 50 percentage points, merely due to use of the terms “save” versus “die”! One could possibly interpret this finding as the effect of the following norm: if someone will be saved with certainty, one does not put him or her at the risk of dying any more (A vs. B); and before sending people to certain death, one will still give them a chance to survive (C vs. D). These findings were quite puzzling from the perspective of RCT, even if one does not presume a perfectly informed homo economicus. As stated above, however, one does not abandon a successful theory program’s basis because of a dubious anomaly. Table 1 describes three variations of the specifications in the experimental design by Tversky and Kahneman. This replication study was conducted within the framework of a project supported by the Deutsche Forschungsgemeinschaft (DFG), at the chair of sociology and philosophy of science at the University of Mannheim. The original version of the experiment described above is presented in row “a.” of Table 1 (cf. Stocké 1996: Section 5). Looking more closely at Tversky’s and Kahneman’s experiment reveals that the specifications in the description of the safe situations are peculiarly incomplete. As the text above shows, the specifications for the negatively phrased consequences in

Tab. 1: Variation of the specifications of information in the framing experiment by Tversky and Kahneman (1981), reconstructed and presented according to Stocké (1996:47ff.).

Framing: Frame “save” (positive / negative formulation) and Frame “die” (positive / negative formulation)

Certainty:
a. Original version (incomplete): Program A – 200 saved (p(resc) = 1) / 400 not saved (p(resc) = 0 + d); Program C – 200 not die (p(resc) = 1 − d) / 400 die (p(resc) = 0)
b. Complete information: Program A – 200 saved (p(resc) = 1) / 400 not saved (p(resc) = 0); Program C – 200 not die (p(resc) = 1) / 400 die (p(resc) = 0)
c. Reversed version (incomplete): Program A – 200 saved (p(resc) = 1 − d) / 400 not saved (p(resc) = 0); Program C – 200 not die (p(resc) = 1) / 400 die (p(resc) = 0 + d)

Risk:
d. All versions (complete): Program B – 600 saved with p = 1/3 / 0 saved with p = 2/3; Program D – 0 die with p = 1/3 / 600 die with p = 2/3

Notes: p(resc): expected value for the proportion of survivors.


the “gain”-frame, and the positively phrased ones in the “loss”-frame, are missing in the original version in terms of the “safe” Programs A and C (Table 1, row “a.”). What consequences might this have? The first striking point is that for the safe “gain”-frame (Program A) there can be no higher value than p = 1 for 200 survivors, and for the safe “loss”-frame (Program C) the value cannot become lower than p = 0. If there is an error at all (in terms of any value of d), the value of p will increase inevitably by d in the “gain”-frame for the negative phrase (“not saved”); it will also inevitably decrease in the “loss”-frame for the positive phrase (“not die”). Everything else remains the same. Consequently, in case of doubt more than 200 survivors will be estimated for Program A and for Program C less than 200. Therefore, the expected utility shifts symmetrically to the advantage of Program A and to the disadvantage of Program C. This would, however, explain the framing effect found by Tversky and Kahneman through a small and justifiable concession to the “bounded rationality” of real human beings: If information must be processed, and if homo economicus perhaps makes mistakes in doing so, the mistakes will add up inevitably to the difference, which terrified RCT so much – completely without framing! Table 2 summarizes this point (column “a.”, “expected survivors”). Tab. 2: Effects of different information specifications in the framing experiment according to Tversky and Kahneman (1981), reconstructed and presented according to Stocké (1996:41, Table 2, 49, Table 4, 53, Table 5). Version
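The arithmetic behind this argument can be made explicit with a small sketch (my own illustration; the guessing error d is a free parameter, not a value from the study): under the incomplete original wording, any positive error d pushes the expected number of survivors above 200 for Program A and below 200 for Program C.

```python
def expected_survivors_program_A(d):
    """Original 'save' frame: 200 are saved for sure; the fate of the other 400 must be guessed.
    A guessing error d > 0 can only raise their assumed survival chance above 0."""
    return 200 * 1.0 + 400 * (0.0 + d)

def expected_survivors_program_C(d):
    """Original 'die' frame: 400 die for sure (stated); survival of the 200 must be guessed.
    A guessing error d > 0 can only lower their assumed survival chance below 1."""
    return 200 * (1.0 - d) + 400 * 0.0

for d in (0.0, 0.05, 0.10):
    print(d, expected_survivors_program_A(d), expected_survivors_program_C(d))
# d = 0  -> 200 vs. 200 (no difference)
# d > 0  -> Program A looks better than 200 survivors, Program C worse than 200
```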

a. Incomplete information (original version)

b. Complete information

c. Incomplete information (reversed version)

Program

Expected survivors

% votes

Expected survivors

% votes

Expected survivors

% votes

A B C D

> 200 200 < 200 200

50 (72) 50 (28) 25 (22) 75 (78)

200 200 200 200

42 58 38 62

< 200 200 > 200 200

32 68 59 41

Notes: Original effects by Tversky and Kahneman are indicated in brackets in Column “a.”, “% votes”. For the number of cases (approximately 100 in each case) and exact significance statistics, see Stocké (1996: Tables 2, 4 and 5).

Is this explanation simply by effects of missing information therefore evidence of no “framing”, and instead an artefact of information specifications? This is what was tested systematically in the project. The replication of Tversky’s and Kahneman’s original version, including the missing information, initially produced structurally identical results (with only a minor difference). The proportion by which subjects preferred Program A over B differed by 25 percentage points from the proportion by which sub-

510 | Hartmut Esser

jects preferred Program C over D (Table 2, Column “a.”, “% votes”; results of the original experiment by Tversky and Kahneman are indicated in parentheses).¹ According to RCT, the difference should disappear completely when all information is available, because all programs have the same effect and verbal variation is irrelevant, and there are no errors as all necessary information is presented. This was exactly the result of the experiment. The difference between the proportion of subjects preferring Program A over B and the proportion of subjects preferring Program C over D now amounts to only 4 percentage points, which is far from statistical significance (Table 2, Column “b.”, “% votes”). There is thus no shred of evidence of framing, if subjects did not have to guess. If it were now possible to turn the results upside down by a reversed incomplete version in comparison to the original, RCT’s triumph would be perfect. This is exactly what was found (Table 2, Column “c.”). In the symmetrically reversed version, only 32 % of the subjects chose Program A over Program B. However, 59 % of the subjects favored Program C over D, thus meaning that 27 percentage points fewer participants favored Program A over B than Program C over D. Heureka! One can hardly expect more. The findings look indeed like a splendid resurrection of RCT, and an excellent example of a progressive problem shift. However, when testing the robustness of RCT against theoretically extrinsic influences one should also consider incentives, the second fundamental part of RCT, in addition to expectations. Objective gains, here the number of survivors, were the same in all experiments. What, then, will happen, if incentives for a program with the same label increase respectively? Should not all the more the myth of framing disappear? Table 3 describes the experimental design and presents the results of the conducted experiments. In the safe program versions (A and C), 201, 210, 250 and even 300 persons will survive according to the specifications (Column 1 in Table 3) instead of 200, as was the case before. This means a clear increase in the objective efficiency of Program A as compared to Program C, and of Program B as compared to Program D. This can be derived from the respective differences of 0, 1 and 10 up to even 100 (Column 2 in Table 3). According to RCT, one should now assume that the “framing” effect will disappear with minor changes in incentives in the form of an increased number of survivors, and that decisions will be based more or less directly on the increase in incentives. Yet, the findings are different: although the preferences for Programs A and C (Columns 3 and 4) increase, this increase is only moderate and the difference between the proportion by which Program A is preferred over B and the proportion by which Program C is preferred over D – in which framing effects are reflected (Column 5) – hardly changes.

1 None of the published replication studies found such a high difference between programs A and C as was reported in the original experiment by Tversky and Kahneman.

When Prediction Fails | 511

Tab. 3: The (un-)conditionality of framing effects in Tversky’s and Kahneman’s experiment, reconstructed and presented according to Stocké (1996:56, Figure 6 and Table 6). 1

2

3

4

5

Survivors A resp. B/C resp. D

Difference survivors

Frame “save” % votes A vs B

Frame “die” % votes C vs D

Difference % votes A vs C

1

200/200

0

72

22

50

2

200/200 201/200 210/200 250/200 300/200

0 1 10 50 100

50 53 55 61 64

25 32 36 40 48

25 21 19 21 16

Notes: All differences in Column 5 are significant. Row 1: original version by Tversky and Kahneman (1981), row 2: replication study. For the number of cases (approximately 100 in each case) and exact significance statistics, see Stocké (1996: Table 6).

Both the following phenomena obviously do exist: sensitivity to incentives by some of the subjects (as predicted by RCT), and a total insensitivity by others, who base their decisions exclusively on verbal symbols that should be irrelevant according to the axioms of RCT.

3 The model of frame selection Tversky’s and Kahneman’s findings have by no means been the only evidence of framing effects and for the neutralization of incentives. Some of the most famous and frequently replicated experiments are those conducted by Liberman, Samuels, and Ross (2004): 70 % of subjects in a collective good experiment cooperated when the word “community” was presented to them, and only 33 % did so in the case of the words “Wall Street”, independently of their before revealed preferences. Bare words apparently “defined” the situation again, once under the frame of cooperation and solidarity, and then as a situation, in which egoism and utility maximization prevail. For the light-hearted claim that wide RCT constitutes the general microfoundation of social sciences (as raised by, for example, Fehr and Gintis 2007:49, 61), this is a rather cold shower. In the meantime, framing effects have actually become a source of considerable concern for some proponents of traditional RCT, and with good reason. They violate key assumptions of RCT, refer to mechanisms for the selection of reactions that are unknown to any variant of RCT, and – contrary to Tversky and Kahneman – real incentives are involved to a large extent, so that one cannot dismiss all this as cheap talk. Therefore, the only recourse is to play down and push away the relevance


This is comprehensible and, in the first instance, reasonable as long as no more advanced theoretical instruments (in basic terms at least) are available, and as long as any alternatives represent little more than orientation hypotheses or generalizations of empirical findings. Irritations would, however, remain with each replication study confirming framing effects, and a theory is still needed. The model of frame selection (MFS) represents one attempt at such a theory (see Kroneberg 2014; Esser and Kroneberg 2015).

The MFS assumes that any visible behavior is the result of a fundamental sequence of three selections, which cannot be observed directly. Firstly: the selection of a (typecasting) view of the situation, that is, of a frame. Secondly: the selection of a (habitualized) action program, that is, a script, based on the prior framing. Thirdly: the selection of a specific act on the basis of the preceding processes of frame and script selection. For this fundamental sequence, three interconnected mechanisms are assumed.

Firstly: the definition of the situation as the starting point and constraint for all further selections of scripts and subsequent specific acts.

Secondly: a variable rationality governing these selections, ranging from the extreme of an automatic-spontaneous and fast “behavior” structured by the past and by programs up to a reflected-calculating “action”, which is slow(er), intentional and directed toward future consequences. This is nothing new. Kahneman, for example, refers to an intuitive-spontaneous and fast System 1 and a rational-evaluating and slow System 2; and Alfred Schütz and George H. Mead developed such a distinction in their analyses of everyday behavior long before. Variable rationality indicates that, although the capacity of human beings for data processing is limited, the extent to which a situation is elaborated is not fixed in advance, as the model of homo economicus would presuppose. The conditions for this have been analyzed in detail and confirmed experimentally within the framework of the so-called dual-process theory (DPT) in social psychology (see the seminal contribution by Fazio 1990). The higher the motivation for a reflected decision, the more favorable the opportunities for estimating consequences, and the lower the costs of obtaining the necessary information, the more likely it is that reflections on the details of the situation will occur. If one of these conditions is not met, people will react spontaneously, with the possible result of deviations from RCT’s predictions.

Thirdly: the definition of the situation, that is, the framing, is controlled by cognitive processes of “decoding” a previously acquired mental model or program by categorization – in other words, the (more or less “matching”) recognition of an object in a situation according to certain previously acquired, prototypical conceptions. This is the starting point for the branching of variable rationality, in which the framing of the situation (and everything else) takes place. The most important conditions for an easy recognition, and thus for a strong, uninterrupted and fast framing, include the (chronic and temporary) accessibility of a mental model as the result of previous learning and “encoding” and its short-term activation (“priming”), the current presence of a certain object in the situation, the previously acquired link between the saved mental model and the object (which transfers a simple “objective” object into a more or less significant cultural “symbol”), and, finally, the absence of disturbances in the situation in which the object is observed.


These conditions determine the strength of a frame’s activation for a specific definition of a situation in comparison to another. The MFS formally interconnects the definition of the situation, categorization and variable rationality. If a mental model can be accessed easily, if it is linked strongly to a symbolically meaningful object, that is, a “cue”, if it is not disturbed and if, therefore, the activation weight is high, the respective frame will be triggered automatically and uncontrollably (and possibly the same will happen to the activation of associated scripts and acts). If, however, only one of the conditions is not met, or is met to a lesser extent, the activation of the frame will be weakened and, thus, so will the clarity of the definition of the situation. This, in turn, will – ceteris paribus – reduce the strength with which the framing determines subsequent reactions, script activation and concrete action. Equation (1) describes these relations. An actor will follow the reflecting-calculating mode under the following condition:²

p ⋅ (1 − AW_i) ⋅ U > C    (1)

The term AW_i denotes a frame’s activation weight (with AW_i ∈ [0, 1]) as a consequence of categorization, p the availability of opportunities (with p ∈ [0, 1]), U the motivation, and C the costs of further elaborating the situation’s details – that is, the conditions identified in DPT’s well corroborated findings. From this follows an important consequence of the MFS: if either the activation weight (AW_i) is high – in the extreme case equal to one – or the opportunities (p) are few or even absent (p = 0), no further elaboration will arise, even if the costs (C) are extremely low. This will also happen independently of any motivation for thinking things over again. Statistically, this would emerge as a negative interaction between the strength of the framing (and the accessibility and match underlying it) and the effectiveness of “rational” incentives (preferences and expectations). Accordingly, there are two extreme points: the complete independence of all reactions from incentives on the one hand, and their increasing responsiveness, up to the degree a homo economicus would require, on the other. RCT would, therefore, represent a special case of the MFS, one that is more likely to apply when the frames’ activation weights are low and when more opportunities and accessible information for an elaboration of the situation exist. Analogously, one could also integrate the normative and the interpretative paradigm of sociology, institutional theories and the cultural sciences into the MFS and relate them to the utilitarian paradigm of RCT. Institutional rules and culturally “significant” symbols define a situation, and action follows organizational and everyday routines. The interpretative paradigm discusses the continuous disturbances of everyday life’s certainties, which cannot be overcome “rationally”, as well as the constant repairs via attempts to give the whole situation a “rational” meaning.
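Read as a simple threshold rule, condition (1) can be sketched in a few lines of code. This is a minimal illustration only; the function name and the numeric values are invented for this purpose and are not taken from the MFS literature.

```python
def mfs_mode(aw_i, p, u, c):
    """Condition (1): elaborate (rc-mode) only if p * (1 - AW_i) * U > C.

    aw_i: activation weight of the frame, in [0, 1]
    p:    opportunities for elaborating the situation, in [0, 1]
    u:    motivation (importance of a "correct" reaction)
    c:    costs of obtaining and processing further information
    """
    return "rc-mode" if p * (1 - aw_i) * u > c else "as-mode"

# A perfectly matching frame (AW_i = 1) or missing opportunities (p = 0)
# block elaboration regardless of motivation and costs ...
print(mfs_mode(aw_i=1.0, p=1.0, u=100.0, c=0.01))  # -> as-mode
# ... whereas a weakly activated frame with good opportunities lets
# incentives come into play again.
print(mfs_mode(aw_i=0.3, p=0.8, u=10.0, c=1.0))    # -> rc-mode
```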

2 For details on the explanation and derivation of the equation, see Kroneberg (2014: Section 5.4).


Could the MFS, therefore, possibly represent a decisive step towards the long-sought “general theory of action” for the social sciences as a whole? For many, including those who admit that orthodox variants (above all) of RCT are problematic, this would be unthinkable: they believe either that RCT is, after all, able to solve all problems using its own instruments, or that the MFS is nothing more than a far too complicated variant of (extended) RCT. For the specifically sociological approaches (the normative and the interpretative paradigm), the MFS would also hardly be acceptable: they consider it to be nothing more than a culturalistically disguised version of RCT, and thus not worth discussing.

This opinion is justified by pointing to the way in which the aspect of variable rationality is modeled within the framework of the MFS (following DPT): the branching into spontaneous or elaborated selections would then again be nothing more than a rational decision, always involving some deliberation about future consequences. Yet the MFS states something else. Each basic sequence starts with an uncontrollable perception of a symbolically “significant” object in an automatic-spontaneous process of comparison. There is thus no “choice” oriented towards utility, costs and future consequences, but processes of pattern recognition, categorization and program activation. If this match is perfect, nothing more will happen than the activation of a (habitualized) program. However, if this is not the case, the degree of attention will increase promptly, possibly even if deviations and disturbances are only minor, and an uncontrollable, automatic, inner search for memory contents and clues for solving the riddle will start. How far this search will go depends on motivation, opportunities and costs, but this, too, emerges within the spontaneous process of triggering a program. The motivation describes a person’s evaluation of the relevance of a “correct” reaction, automatically activated through a currently perceived “cue” and not in question before, often reinforced and driven by emotions and biochemical processes in the background. Opportunities describe objectively existing limitations, particularly of the time that is (or is not) available for the automatic elaboration. The costs, finally, reflect whether and how easily the respective information can be found objectively once the search has started. In this process, nothing is “decided”; everything is automatically initiated – and occasionally stopped – by objective circumstances.

In short: the MFS is neither a variant of wide RCT nor an unnecessarily complicated version of DPT, which itself would then be nothing more than a variant of extended RCT, as, for example, Opp (2016) believes. All these assessments ignore the fact that the MFS assumes a basic process of selection other than the “decision” that depends solely on preferences and expectations: namely pattern recognition, categorization, symbolic decoding and (mis-)match, which are formally connected with the “definition” of the situation and variable rationality.


The MFS is not the only approach to provide a theoretical concept for explaining framing effects. In addition to, for example, approaches of “program-based behavior” in evolutionary psychology and biology (see already Vanberg 2002), these include the goal-framing theory (GFT) developed by Siegwart Lindenberg over decades of effort. GFT starts from the assumption that certain overarching “goals” frame the situation, while other goals remain in the background (see Lindenberg 2016 for an up-to-date summary). The “goal-frames” in GFT, just like the frames in the MFS, are not simple “preferences” but comprehensive “mind sets”, which focus attention on a specific aspect (see, inter alia, Lindenberg 2016:47ff.). Lindenberg distinguishes three “goal-frames”: the “hedonic goal”, the “normative goal” and the “gain goal”. Hedonic goals refer to the basic material needs for immediate wellbeing; normative goals to group-related social goals like solidarity, fairness and reciprocity; and gain goals aim at increasing one’s own resources through, inter alia, investments and the waiving of short-term profits, but also through strategic behavior and the profitable exploitation of others. The three “goal-frames” are universal and chronically accessible. However, only one goal at a time is in the foreground, which is partly due to universal internal processes like hunger or a lack of recognition and partly activated by culturally variable “cues” and processes of “priming”.

There is as yet no formal model of the processes of “goal-framing”. This particularly applies to the mechanisms through which, and the conditions under which, culturally variable “cues” affect the universal (“goal-”)framing, and to how the “goal-frames” push themselves to the fore or not. There are, however, clear clues for connecting this to the MFS. “Goal-frames” can be part of culturally determined “significant” mental models, for example of solidarity or competition; they can compete with them in the “definition” of the situation; or they can support or disrupt their activation through internal signals like hunger or emotional alerts. The concept of “goal-framing”, in turn, represents an important complement to the MFS itself. Through the link of activating symbols to the condition of accessibility one could, for example, also explain the particular temporary sensitivity to certain “cues”, such as indications of food when one is hungry and a hedonic goal is in the foreground. Goal-framing theory offers an important specification for this: hedonic goals always display the strongest anchoring and accessibility, and normative ones the weakest. Moreover, this is an important additional specification which also allows for understanding neurobiologically anchored and evolutionarily evolved deep processes in the “definition” of the situation, as well as for explaining the varying vulnerability of certain institutional and cultural framings to disruptions in the context of the situation.


4 Reactions

As mentioned above, framing effects constitute more than a minor problem for RCT. They affect not only the paradigm’s protective belt of auxiliary assumptions, but its (axiomatic) core. The stakes are high: all the previous accomplishments and achievements of RCT, and the idea that the economic model might form the common basis for all social sciences, are under attack. It is a long way from irrefutable indications that something is not correct to a new perspective taking over, especially if nobody can predict where this will all finally lead. Not everybody reacts in the same way: weakening the strictness and extension of the basic assumptions, confused trials (and errors; see also the exhortations in Binmore and Shaked 2010:89ff. not to disregard the methodological standards of theory development through excessive experimentation), and even an occasional glance at other disciplines. Not a few now stick even more firmly to RCT and try, after a brief shock, to make sure that not everything was wrong and not for nothing. Scientific knowledge, however, differs from religious belief: it changes in an interplay between empirical findings and theoretical systematization, and this, if possible, in such a way that the successful parts of a paradigm are preserved. This means that one can also specify the conditions for the achievements and explain the failures with a new model in a “correcting explanation”, just as Popper proposed for such cases. Against this background, we may summarize three reactions to framing effects and to the MFS as a proposal for such a correcting explanation: the extension of RCT; the annexation of sociology and other disciplines infected by “culture” by an extended version of RCT; and some unintended findings of framing effects in RCT experiments and other analyses (here referred to as “by-catches”), which at first seem to prove that one can confidently refrain from all this fuss about framing.

“Wide” RCT

The discovery of motives of fairness, justice, and altruism in game theoretical experiments has, as we have already seen, encouraged proponents of RCT to think about issues which until then were irrelevant for most of them. The most important reaction consisted in the extension of RCT: altruistic-social motives are now also included in the utility function, in addition to egoistic-material incentives. At first sight this may look not only like a simple and effective repair of RCT, but also like an increase in its logical content. The argument (according to Opp 1999:182) is that the fewer factors are included in the set of initial conditions, the more a theory is subject to restrictions and, thus, the lower its logical content in comparative terms. The wide version of RCT would then imply that an egoistic motive S and/or an altruistic motive R caused an action A, whereas the narrow version would indicate that an egoistic motive, and just not (also) an altruistic motive, did so:

narrow version: (S ∧ ¬R) → A    (2a)
wide version: (S ∨ R) → A    (2b)


Accordingly, the narrow version would be linked conjunctively to a restrictive initial condition (∧¬R), which obviously does not apply to the wide version with its disjunctive extension (∨R). But is this a correct reconstruction? For the evaluation of the logical content, it does not matter which and how many factors are somehow included in the set of a theory’s initial conditions. Rather, it is crucial to determine the exact nature of the causal relation for explaining an explanandum A: the utility function, including exactly the consequences it permits or prohibits. The narrow version of RCT then states something else: if the egoistic motive S for an action A is stronger than for an action B (S(A)), then A will be chosen – disregarding all other things. In contrast, the wide version must assume a limiting condition for the occurrence of A: if the egoistic motive for A is stronger than the one for B, and if it is not the case that the combined value of the egoistic and the altruistic motive for B exceeds the one for A (∧¬SR(B)), then A will occur. This is expressed in equation (3b). Because the initial conditions of the narrow version can be extended by any further condition, the restriction of the wide version can – for a direct comparison of both versions’ logical content – be added disjunctively (∨¬SR(B) in equation (3a)). The difference is immediately striking. In terms of its claim to validity, the narrow version is less restricted in its initial conditions than the wide one and, therefore, has – ceteris paribus – the higher logical content.

narrow version: S(A) ∨ ¬SR(B) → A    (3a)
wide version: S(A) ∧ ¬SR(B) → A    (3b)
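The difference in logical content can be illustrated with a toy enumeration. Treating S(A) and ¬SR(B) as simple binary conditions – a deliberate simplification introduced only for this illustration – one can count in how many of the four possible constellations each version commits itself to predicting A, and could therefore be falsified by observing non-A.

```python
from itertools import product

def antecedent_narrow(s_a, not_sr_b):
    return s_a or not_sr_b    # (3a): S(A) ∨ ¬SR(B)

def antecedent_wide(s_a, not_sr_b):
    return s_a and not_sr_b   # (3b): S(A) ∧ ¬SR(B)

cases = list(product([True, False], repeat=2))
narrow_commits = sum(antecedent_narrow(s, r) for s, r in cases)
wide_commits = sum(antecedent_wide(s, r) for s, r in cases)
# The narrow version predicts A (and forbids non-A) in 3 of the 4
# constellations, the wide version in only 1 of 4 - ceteris paribus,
# the narrow version has the higher logical content.
print(narrow_commits, wide_commits)  # -> 3 1
```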

This is also the common opinion within RCT itself: “The models [of wide RCT] will thus become more ‘realistic’, but at the same time there is a danger that the theory will be immunized against empirical criticism through adding ever more utility components” (Diekmann and Voss 2004:20; translation by HE). But insisting on the more substantial narrow version will, of course, not help much if falsifications become more frequent. Therefore, something must be changed if a theory is simple, risky and of high logical content, yet empirically wrong, at least partly and under certain conditions. Serious reactions within RCT and behavioral economics accordingly went beyond simply extending the utility function to a broader set of preferences. The ERC model, for example, can be considered one of the first attempts to specify the interplay between egoistic and social motives in one single utility function (Ockenfels 1999:132ff.). Although human beings are egoistic, utility decreases if an action deviates from a social reference point such as fairness, where a joint profit is divided fifty-fifty. Other models analogously consider, for example, inequity aversion or reciprocal fairness as social motives. What is always important here is that the new utility function complies with certain formal requirements – especially for the derivation of game theoretical equilibria and of the empirical implications that are now expected. Surely, this attempt was not without success (although the logical content certainly did not increase), and it is clearly to be preferred over adding motives to the list of conditions to see whether any effects occur and whether “explained” variance increases.³


Annexations

These developments can certainly foster the idea that the key to a universal basic theory of the social sciences may consist in the extension of RCT by altruistic motives. In several contributions, a kind of annexation or even incorporation of sociology, cultural anthropology and (social) psychology into an RCT extended by the “internalization” of “norms” has been suggested, with reference to, inter alia, Talcott Parsons (Fehr and Gintis 2007:49, 61; Gintis 2007:7ff.). This sounds plausible at first, and the rather casual extensions of RCT hardly amount to anything else. But this is not what Parsons had in mind regarding norms, and not what has always been the counter-program to the utilitarian paradigm in the other disciplines. In one of the central elements of his theory of action, the so-called “unit act”, Parsons summarizes this as follows. Each act consists of four elements forming one unit. The first three elements are: an actor, certain goals, and a situation, which involves currently uncontrollable conditions on the one hand and selectable options for goal achievement, the means, on the other. These three elements constitute, so to speak, the RCT part of the “unit act”. Parsons does not stop here, and his most important concern is the following: what is regarded as actor, goal and situation (and thus as stable conditions and effective means) is not determined in advance by a general utility function, but controlled by a fourth and decisive condition: the normative orientation. It is the “definition” of the overall situation, which precedes any elementary act and without which no consistent action would be possible (Parsons 1937:44). This, however, is disregarded in all of the narrow or wide versions of RCT: it would mean the acknowledgement of a completely different mechanism to explain human action.

Through this, two different meanings of the concept of a norm and of how it is used become obvious. RCT considers norms as nothing more than internalized preferences (or goals), which may influence action in addition to other motives and expectations (Gintis 2007:8). For Parsons and large parts of sociology, cultural anthropology and (social) psychology, they also represent general orientations, internal “representations” and mental models of typical situations, which first of all “define” everything: what “type” an actor currently represents, his current goals, and which means he presently regards as appropriate. It is the pivotal difference to economics that runs across sociology, cultural anthropology and many parts of cognitive (social) psychology, for example in the distinction between “intentions” (following Ajzen and Fishbein) and “attitudes” (following Allport), which Fazio used as the starting point for his integrative MODE model (Fazio 1990: Parts III and IV).

3 The lack of precision in the causal relation has also been the main problem for other proposals for an “extension” of RCT, such as the DBO concept developed by Hedström. According to this, action depends – somehow – on “desires”, “beliefs” and “opportunities”. This is no extension in the sense described above, but a careless loosening of indispensable requirements for explanatory social science; see Diekmann (2010:194f.) on this point.


Closely related to this are assumptions about the effects of norms. For wide RCT, any change in incentives, normative or otherwise, results in a change in the EU-weights for a certain alternative. Thus, any action is variable and “conditional”, even when the EU-differences between options and the costs are high. According to the “unit act”, however, normative orientations are “categorical” in nature. This means that they operate “unconditionally”, and this particularly applies in cases where they are activated spontaneously as an internalized reaction program. Within the MFS this is represented by the increasing suppression of the sensitivity to incentives with the strength of frame activation (following equation (1) above). Parsons believed that the understanding of norms as a kind of unconditional imperative formed the basis for any consistent action and for any social order. Since then, we have learned that social order can also arise among egoistic-rational actors, but we also know that the effects of orientation which Parsons presumed in the “unit act” exist empirically. The world of everyday action (and the findings of experiments) is full of examples of this. However, none of the versions of RCT, neither narrow nor wide, has provided a concept for this so far.

“By-catches”

Meanwhile, there is a considerable number of contributions on framing effects as the definition of the situation activated by categorizations. They result from experiments of RCT and behavioral economics itself. Most of the findings are similar to those of Tversky and Kahneman or Liberman, Samuels, and Ross (see the references in Dufwenberg, Gächter, and Hennig-Schmidt 2011:462f., Footnotes 6–10; Engel and Rand 2014:387, Footnote 1; Ellingsen et al. 2012:117, Footnote 2, and 119, Footnote 11). The results also occur for other symbols, such as “watching eyes” (see, inter alia, Fathi, Bateson, and Nettle 2014), signs of order violations like smeared walls or fallen bicycles, or apparent prosocial behavior like the sweeping of a pavement by uninvolved people (see, inter alia, Keizer, Lindenberg, and Steg 2013). Even the effects of the classical cooperation experiment by Fehr and Gächter (2002:138, Figure 2) can be explained without the additional assumptions which wide RCT has to make in this case. Although in the beginning 50 % of the subjects cooperate, this rate decreases to nearly 10 %. The mere announcement of possible punishments then results in a sharp increase in the cooperation rate by more than 50 percentage points to over 60 %, and ultimately to nearly 100 %. Actions themselves obviously also function as signals for the “definition” of the game as cooperative or competitive in nature. One could interpret the decline of the high cooperation rate during the first couple of rounds as the result of conditional defection, which is well known in RCT. This, however, cannot explain the sharp increase in cooperation by 50 percentage points through the mere announcement of possible sanctions, as this should be irrelevant for a “rational” actor.


One can reconstruct this easily using the framing concept: the announcement constitutes the “significant” signal which newly and unambiguously establishes the frame of a joint project for all subjects.

In addition to the extensive evidence from studies on DPT, contributions from behavioral economics and RCT also provide evidence for variable rationality. These include studies on the effects of time pressure on decisions. The less time one has to make a decision, the more likely it is that one reacts intuitively and spontaneously and orients oneself toward certain “default options”, like the friendliness in the first move that has long been trained in everyday life. But the more time one has (or needs), the more “rationally” one decides – and this usually means one acts egoistically (Rand, Greene, and Nowak 2012; Engel and Rand 2014). Each interruption of the categorization process apparently has the same effect. For example, subjects become slower, more egoistic and emotionally cooler if they have to use a foreign language in the experiments (see, for example, Costa et al. 2014). The background is a frame’s accessibility in terms of the conditions for a perfect match: communicating in the mother tongue makes everything more accessible, including the more deeply embedded emotions.

In short, there is a lot of empirical evidence for framing effects in experimental behavioral economics and RCT themselves. In addition, several experiments have been conducted in terms of the MFS which confirm the effect of suppressing even very high incentives and risks when framing is strong (for a detailed account see Kroneberg 2014:109f., Table 4.1). There are deviations and refutations, albeit not many; they are, however, of particular importance for discussing the crux of the matter. In a variation of Liberman’s experiment (“stock market” instead of “Wall Street”), Ellingsen et al. (2012) found the usual framing effect in the first instance. The authors wanted to examine whether this was merely an effect of cognitive coordination through beliefs, and thus compatible with RCT, or whether the label also activates social preferences, which would be incompatible with RCT. For this purpose, subjects were told that they would not play against another actively deciding human subject, but against a computer whose choices were only passive. This aimed at neutralizing possibly effective social motives. All framing effects now disappeared (Ellingsen et al. 2012:124, Figure 2): obviously, no preferences were changed by the label, which would have contradicted RCT. In a further experiment, they examined what happens in a sequential game, in which a player makes his move only after the other player’s decision. Again, no differences arose between the two descriptions of the game, yet considerable differences arose depending on the first player’s move: 50 % of second movers cooperated in cases where the first mover had cooperated, but only 20 % did so in cases of defection (Ellingsen et al. 2012:127, Figure 6). The first mover’s action apparently completely overwrote the framing effect.

Framing effects fail perhaps most clearly in the study by Dufwenberg, Gächter, and Hennig-Schmidt (2011). Here the authors varied two labels for the game: one describing the gains (“give” vs. “take”), and the other describing the context (“neutral” vs. “community”).


Analogously to Liberman’s experiment, they presumed that “give” and “community”, as compared to “take” and “neutral”, would support cooperation, both for one’s own contribution and for the other player’s (first- and second-order) beliefs about cooperation. The surprising result was that (in addition to an effect of describing the gains as “give” that went in the expected direction) the framing effect was reversed: less cooperation and fewer cooperative beliefs in the “community” game (Dufwenberg, Gächter, and Hennig-Schmidt 2011:466–468, Figures 1–3). Two more recent survey studies could not find the core effect of the MFS either: the weakening of rational incentives through a strong (normative) framing. Both studies interpreted this as a confirmation of the range of wide RCT and as a refutation of the MFS. One of the studies addressed the effects of marital role conceptions on couples’ decisions to move for job-related reasons (Auspurg, Frodermann, and Hinz 2014). They found that everything follows the rules of RCT-based negotiation theory: there was no evidence of a damping effect of marital role conceptions (Auspurg, Frodermann, and Hinz 2014:41, Table 3). The other contribution examined the impact of protest norms during the Monday demonstrations in Leipzig in 1989. Again, there was no evidence that the strength of a norm reduced the effects of incentives (Opp 2016:20, Table 3).

Does this, therefore, mark the end for framing and the MFS? As is common with such rescue attempts, one does not, of course, know immediately whether the saving idea applies or not. But the matter is surprisingly simple and follows easily from some details which occur incidentally and remain nearly unnoticed in the respective studies. One can perhaps detect these details only with an adequate theoretical sensor, in this case the MFS. In the study by Ellingsen et al. (2012), the framing effect disappears in the game where subjects play against a computer. In the other, it disappears only when they experience the first mover’s decision. Both, however, can be understood as a change in context, or as a disruption of the “community” frame. A computer surely does not signal uninterruptedly that the situation is a matter of a common good and communal solidarity. This also applies to the other finding by Ellingsen et al.: action itself is always a signal, and somebody’s defection is not compatible with the uninterrupted activation of a common-good frame. It is the same process as in the case of the sharp increase in cooperation through the mere announcement of possible sanctions, as reported in the study by Fehr and Gächter.

Things are not so simple in the case of the reversal of the framing effect in the analysis by Dufwenberg, Gächter, and Hennig-Schmidt (2011). Here there are no such obvious disruptions and replacements of the “community” framing in the experimental design, and the authors are genuinely at a loss. They openly admit to having no theory explaining the effects. This particularly applies to the reversal of the label effect. But they set out with a proposal which could well originate from Max Weber, Alfred Schütz, Talcott Parsons, George H. Mead or the MFS. The experiments had originally been conducted at the University of Bonn; but this, they argue, is a giant university, and in Germany the term “Gemeinschaft” (community) has a negative connotation for well-known historical reasons.


This should be different in St. Gallen, a small university with a strong “corporate identity” located in Switzerland, where nobody associates negative emotions with “Gemeinschaft” (rather the contrary). What distinguishes the contexts? The meaning of the words. What can be done? Conduct the same experiment in St. Gallen (Dufwenberg, Gächter, and Hennig-Schmidt 2011:477). Table 4 presents the results.

Tab. 4: Differences between framing effects in Bonn and St. Gallen (average numbers; according to Dufwenberg, Gächter, and Hennig-Schmidt 2011:472f., 477, Table 7).

                   Contributions             Beliefs (first-order)      Beliefs (second-order)
Framing            Neutral     Community     Neutral     Community      Neutral     Community
Bonn               5.7         4.5           8.1         6.8            9.2         6.8
St. Gallen         6.8         10.4          8.9         10.6           8.1         9.7
p (difference)     n.s.        0.002         n.s.        0.001          n.s.        0.013

There were no differences in the effect of the wording regarding the gains: “give” and “take” probably have the same meaning everywhere. With regard to the neutral label, too, nothing happens. The situation is different, however, when it comes to the “community” game: in St. Gallen it results in the reinforcement of cooperation usually found in the Liberman replications in comparison with the neutral label, and this for all three aspects – contributions as well as first- and second-order expectations. It even reverses the result found in Bonn.

The absence of the damping effects predicted by the MFS with regard to moving for job-related reasons can also be explained quite easily with the instruments of the MFS. In both cases, the conditions under which RCT comes into effect according to the MFS are met. But then the MFS was not refuted, because it makes the same predictions as RCT. In the case of the marital role orientations, everything supports the applicability of RCT as a special case of the MFS. Job-related moves are no spontaneous matter of everyday life, and there is presumably no complete script available for their realization. Thus, there were many opportunities for an interruption of the further progress up to the concrete action, even in the case of an uninterrupted activation of the gender role frame. This applies especially because the spouse is always affected too, and one cannot “decide” quickly on one’s own. Moreover, the question of whether to move or not is important enough, and there is also probably sufficient time for a thorough deliberation. It is, however, also possible that the gender role frame had been questioned before, because the respective spouse did not share it. But, unlike what the authors describe and implement empirically (Auspurg, Frodermann, and Hinz 2014:45, Footnote 11), the uninterrupted joint definition of the situation is essential for a strong framing in a couple’s relationship. This had already been clearly demonstrated in the application of the MFS to marriage and divorce (see Esser 2002).


Unfortunately, one cannot reconstruct the constellation of the spouses regarding role orientations from the study by Auspurg, Frodermann, and Hinz (2014) itself: only one of the spouses was interviewed about role orientations.

In principle, the same arguments apply to the demonstrations in Leipzig in the GDR at the end of 1989, even without going into the specific circumstances. As is well known, protest norms were anything but institutionally and culturally supported, and thus anything but easily accessible orientations, in the GDR. There existed no well-rehearsed routines for protests, especially not within the morally motivated parts of the protest movement. Regarding the motives within the general orientation, protests occurred both for immaterial reasons, like German unity, democratic rights and freedom, or an internal reform of socialism, and for egoistic aims, such as the search for a better material life – with different weights within the respective milieus of everyday life. So even where strong general attitudes favored participation in the demonstrations, there were inconsistencies in the orientation. There were also sufficient interruptions affecting the practical translation into concrete action to start a process of deliberation, even when the normative inclination was strong. Incentives and time were also sufficiently available. In any case, the study itself presumes that actors were following the RC-mode: “We thus predict that people [in Leipzig at that time] in general deliberate whether to participate or not and do not act spontaneously” (Opp 2016:10; emphasis changed). But then the MFS states that norms indeed function merely as preferences, and precisely not as orientations or frames, and that, therefore, a damping effect cannot be expected, even if these normative preferences were strong.

5 Perspectives

Without question, the discovery of framing effects has forced RCT – both the wide and the narrow version – onto the defensive, and the need for repairs, for downplaying the problems and for refuting the anomalies increases with each new indication of a framing effect – sometimes also at the expense of (scientific) reason and fairness. This is legitimate, and it is not merely a part of the scientific search for truth: it is its very core. But research must also proceed, because the indications do not decrease. Nobody appreciates attacks from outside(rs). It is thus much more effective if competent, prominent and respected representatives of the contested paradigm themselves start to consider ideas which were completely inconceivable before. This is exactly what has happened.

The developments are not that new. They had two starting points: the attempts to get a grip on “bounded rationality”, and the discovery that the basic axioms of RCT (which must also apply to a wide version) are not fulfilled empirically – to an extent that only the blind could fail to acknowledge.


The most important conversion regarding framing effects (and the main fundamentals of the MFS) was the acceptance that behavior is controlled not merely by “decisions” (conscious or unconscious, spontaneous or delayed) and the shadow of the future, but also by processes of the categorization of situations and the activation of programs through the decoding of symbols and of mental models encoded by learning in the past. Herbert Simon (1983) already provided evidence for this, and more recent examples include the ideas and proposals in the contributions by Bicchieri, by Fehr and Hoff, and in particular by Rubinstein, who are all sufficiently familiar with (narrow and wide) RCT to know what they are doing. Moreover, they are far too respected in this area to be simply pushed aside. In Chapter 2, “Habits of the Mind”, of her book The Grammar of Society, Bicchieri uses terms and processes like “trigger cues”, “categorization” and the activation of “schemas (or scripts)” (Bicchieri 2006:88ff.); in their contribution “Tastes, Castes and Culture”, Fehr and Hoff refer to concepts like “anchoring, framing and multiple social identities” and “frame switches” (Fehr and Hoff 2011:6ff.); Rubinstein considers “reference points”, “frames” and “framing effects” (Salant and Rubinstein 2008; Sandroni 2011) as well as reaction time as a mediating condition for the rationality of action and the violation of RCT’s axioms (Rubinstein 2013) – almost everything, therefore, that constitutes the MFS. Some, for example Dufwenberg, Gächter, and Hennig-Schmidt (2011:460, 471), openly admit that a theory of framing and its effects is badly needed but not yet available. Another example is Gintis, who acknowledges framing effects in his various proposals to declare wide RCT the general theory of the social sciences, but admits that his conception has not yet found a theoretical place for them (Gintis 2007:11).

What, therefore, are the next steps? Three tasks for the future become apparent. Firstly: the integration, further formal specification and axiomatization of the various approaches and models that try to cope with framing effects. Secondly: the clarification of the conditions for analyzing strategic situations and their dynamics under the assumption of framing effects and spontaneous (and hence unconditional) reactions. Thirdly: the systematic empirical analysis of framing effects and their specific conditions, such as those the MFS specifies and systematizes, via experiments (see Tutić 2015 for a review and systematization of the various approaches, and a proposal for a research program aiming at further specification and integration). This would certainly be no trifling matter, and it is an open question how things will proceed; yet more has already been accomplished than an initial step.


Bibliography

[1] Auspurg, Katrin, Corinna Frodermann, and Thomas Hinz. 2014. “Berufliche Umzugsentscheidungen in Partnerschaften. Eine experimentelle Prüfung von Verhandlungstheorie, Frame-Selektion und Low-Cost-These.” Kölner Zeitschrift für Soziologie und Sozialpsychologie 66(1):21–50.
[2] Bicchieri, Cristina. 2006. The Grammar of Society. The Nature and Dynamics of Social Norms. Cambridge: Cambridge University Press.
[3] Binmore, Ken, and Avner Shaked. 2010. “Experimental Economics: Where Next?” Journal of Economic Behavior & Organization 73(1):87–100.
[4] Costa, Albert, Alice Foucart, Sayuri Hayakawa, Melina Aparici, Jose Apesteguia, Joy Heafner, and Boaz Keysar. 2014. “Your Morals Depend on Language.” PLoS ONE 9(4):e94842. doi:10.1371/journal.pone.0094842.
[5] Diekmann, Andreas. 2010. “Analytische Soziologie und Rational Choice.” Pp. 193–201 in Die Analytische Soziologie in der Diskussion, edited by T. Kron, and T. Grund. Wiesbaden: VS Verlag für Sozialwissenschaften.
[6] Diekmann, Andreas, and Thomas Voss. 2004. “Die Theorie rationalen Handelns. Stand und Perspektiven.” Pp. 13–29 in Rational-Choice-Theorie in den Sozialwissenschaften. Anwendungen und Probleme, edited by A. Diekmann, and T. Voss. München: Oldenbourg Verlag.
[7] Dufwenberg, Martin, Simon Gächter, and Heike Hennig-Schmidt. 2011. “The Framing of Games and the Psychology of Play.” Games and Economic Behavior 73(2):459–478.
[8] Ellingsen, Tore, Magnus Johannesson, Johanna Mollerstrom, and Sara Munkhammar. 2012. “Social Framing Effects: Preferences or Beliefs?” Games and Economic Behavior 76(1):117–130.
[9] Engel, Christoph, and David G. Rand. 2014. “What Does ‘Clean’ Really Mean? The Implicit Framing of Decontextualized Experiments.” Economics Letters 122(3):386–389.
[10] Esser, Hartmut. 2002. “In guten wie in schlechten Tagen? Das Framing der Ehe und das Risiko zur Scheidung. Eine Anwendung und ein Test des Modells der Frame-Selektion.” Kölner Zeitschrift für Soziologie und Sozialpsychologie 54(1):27–63.
[11] Esser, Hartmut, and Clemens Kroneberg. 2015. “An Integrative Theory of Action. The Model of Frame Selection.” Pp. 63–85 in Order on the Edge of Chaos: Social Psychology and the Problem of Social Order, edited by E. J. Lawler, S. R. Thye, and J. Yoon. New York: Cambridge University Press.
[12] Fathi, Moe, Melissa Bateson, and Daniel Nettle. 2014. “Effects of Watching Eyes and Norm Cues on Charitable Giving in a Surreptitious Behavioral Experiment.” Evolutionary Psychology 12(5):878–887.
[13] Fazio, Russell H. 1990. “Multiple Processes by Which Attitudes Guide Behavior: The MODE Model as an Integrative Framework.” Pp. 75–109 in Advances in Experimental Social Psychology, Vol. 23, edited by M. P. Zanna. San Diego: Academic Press.
[14] Fehr, Ernst, and Simon Gächter. 2002. “Altruistic Punishment in Humans.” Nature 415(6868):137–140.
[15] Fehr, Ernst, and Herbert Gintis. 2007. “Human Motivation and Social Cooperation: Experimental and Analytical Foundations.” Annual Review of Sociology 33(1):43–64.
[16] Fehr, Ernst, and Karla Hoff. 2011. “Introduction: Tastes, Castes and Culture: The Influence of Society on Preferences.” The Economic Journal 121(556):F396–F412.
[17] Gintis, Herbert. 2007. “A Framework for the Unification of the Behavioral Sciences.” Behavioral and Brain Sciences 30(1):1–61.
[18] Keizer, Kees, Siegwart Lindenberg, and Linda Steg. 2013. “The Importance of Demonstratively Restoring Order.” PLoS ONE 8(6):e65137. doi:10.1371/journal.pone.0065137.


[19] Kroneberg, Clemens. 2014. “Frames, Scripts, and Variable Rationality: An Integrative Theory of Action.” Pp. 97–123 in Analytical Sociology. Actions and Networks, edited by G. Manzo. Chichester: Wiley & Sons, Ltd.
[20] Liberman, Varda, Steven M. Samuels, and Lee Ross. 2004. “The Name of the Game: Predictive Power of Reputations versus Situational Labels in Determining Prisoner’s Dilemma Game Moves.” Personality and Social Psychology Bulletin 30(9):1175–1185.
[21] Lindenberg, Siegwart. 2016. “Social Rationality and Weak Solidarity. A Coevolutionary Approach to Social Order.” Pp. 43–62 in Order on the Edge of Chaos. Social Psychology and the Problem of Social Order, edited by E. J. Lawler, S. R. Thye, and J. Yoon. Cambridge: Cambridge University Press.
[22] Ockenfels, Axel. 1999. Fairneß, Reziprozität und Eigennutz. Ökonomische Theorie und experimentelle Evidenz. Tübingen: Mohr Siebeck GmbH & Co.
[23] Opp, Karl-Dieter. 1999. “Contending Conceptions of the Theory of Rational Action.” Journal of Theoretical Politics 11(2):171–202.
[24] Opp, Karl-Dieter. 2016. When Do People Follow Norms and When Do They Pursue Their Interests? Implications of Dual-Process Models and Rational Choice Theory, Tested for Protest Participation. Leipzig and Washington. Unpublished manuscript.
[25] Parsons, Talcott. 1937. The Structure of Social Action. A Study in Social Theory with Special Reference to a Group of Recent European Writers. Vol. 1, Marshall, Pareto, Durkheim. New York and London: McGraw-Hill.
[26] Rand, David G., Joshua D. Greene, and Martin A. Nowak. 2012. “Spontaneous Giving and Calculated Greed.” Nature 489(7416):427–430.
[27] Rubinstein, Ariel. 2013. “Response Time and Decision Making. A ‘Free’ Experimental Study.” Judgment and Decision Making 8(5):540–551.
[28] Salant, Yuval, and Ariel Rubinstein. 2008. “(A,f): Choice with Frames.” Review of Economic Studies 75(4):1287–1296.
[29] Sandroni, Alvaro. 2011. “Akrasia, Instincts and Revealed Preferences.” Synthese 181(1):1–17.
[30] Simon, Herbert A. 1983. Reason in Human Affairs. Stanford, CA: Stanford University Press.
[31] Stocké, Volker. 1996. Relative Knappheiten und die Definition der Situation. Die Bedeutung von Formulierungsunterschieden, Informationsmenge und Informationszugänglichkeit in Entscheidungssituationen: Ein Test der Framinghypothese der Prospect-Theory am Beispiel des ‘asian disease problem’. Research Report for the German National Science Foundation (DFG). Mannheim: University of Mannheim.
[32] Tutić, Andreas. 2015. “Warum denn eigentlich nicht? Zur Axiomatisierung soziologischer Handlungstheorie.” Zeitschrift für Soziologie 44(2):83–98.
[33] Tversky, Amos, and Daniel Kahneman. 1981. “The Framing of Decisions and the Psychology of Choice.” Science, New Series 211(4481):453–458.
[34] Vanberg, Viktor J. 2002. “Rational Choice vs. Program-Based Behavior: Alternative Theoretical Approaches and their Relevance for the Study of Institutions.” Rationality and Society 14(1):7–54.

Marc Höglinger and Stefan Wehrli

Measuring Social Preferences on Amazon Mechanical Turk

Abstract: Social preferences are receiving increased attention in the social sciences, especially in behavioral economics and social psychology. From this arises the need to measure individuals’ social preferences both in the laboratory and in surveys of the broader population. The recently proposed SVO slider measure (Murphy et al. 2011) is supposed to be feasible for laboratory as well as for survey research. Our aim is to evaluate this measure using an online survey distributed on Amazon Mechanical Turk (MTurk). We compare the elicited social preferences on MTurk to those found in laboratory settings, look at sociodemographic variation in measured social preferences, and evaluate the measure’s test-retest reliability. In addition, we investigate how the standard dictator game performs as an alternative (and shorter) measure of prosocial preferences. Finally, we explore the correlation of these two incentivized measures with established survey items on self-reported prosocial behavior. Results show that social preferences elicited with the SVO slider on MTurk have a similar distribution to those found in laboratory settings. Also, the SVO slider turns out to have a high test-retest reliability (Pearson’s r = 0.79). However, the SVO measure correlates only weakly with self-reported prosocial behavior items but, interestingly, considerably with the survey response time.

1 Introduction

Over the last two decades, social preferences have received considerable attention and become an important domain of research in rational choice theory and experimental economics. Experimental research based on simple behavioral games has profoundly challenged the selfishness axiom and repeatedly confirmed that people often make choices that take the wellbeing of others into account (Henrich et al. 2005; Fehr and Gintis 2007). This has also spurred theoretical work that has given way to a wealth of new models departing from narrow self-interest (see Fehr and Schmidt 1999; Fehr and Schmidt 2003; Sobel 2005). Heterogeneity in preferences is now widely acknowledged, even if the validity, context dependence, generalizability, and relevance of corresponding experimental findings are still the subject of intense debate (e.g., Bardsley 2008; List 2007). Nevertheless, social preferences have profound implications for the evolution of cooperation. They interact with the incentive structure of institutions in decisive ways, such that the presence of a certain number of actors with prosocial or reciprocal dispositions can trigger very different aggregate outcomes (e.g., Gürerk, Irlenbusch, and Rockenbach 2006; Diekmann et al. 2014).

Note: We thank Chris Snijders for his helpful suggestions for the improvement of the original manuscript, and Claudia Jenny for proofreading.

https://doi.org/10.1515/9783110472974-025


Social psychologists have long recognized that people fundamentally differ with respect to self- versus other-regarding preferences, and that these differences affect cooperative behavior. Consequently, psychologists have been developing measures of “social value orientations” for a long time already (see van Lange et al. 1997; Bogaert, Boon, and Declerck 2008; Murphy and Ackermann 2014 for recent reviews). Social value orientations are personality traits, or stable individual differences, in how individuals prefer to allocate resources between themselves and others. This view of the heterogeneity of preferences contrasts with standard rational choice theory as formulated by Gary Becker (1976:5), who states that “preferences are assumed not to change substantially over time, nor to be very different between wealthy and poor persons, or even between persons in different societies and cultures.” In view of the large body of literature on social preferences, we still know surprisingly little about the intra-personal stability of social preferences and about how these individual differences are distributed across sociodemographic groups within societies and across cultures.

Both the economic and the psychological research tradition have largely relied on incentivized laboratory experiments, typically carried out using small samples of university students. In response to criticisms regarding the limited generalizability of experimental results (e.g., Henrich, Heine, and Norenzayan 2010; Levitt and List 2007), scholars increasingly applied these methods outside the laboratory in field experiments (e.g., Falk 2007). They relocated the laboratory to many countries and different cultural settings (Herrmann, Thöni, and Gächter 2008; Henrich et al. 2005), or embedded behavioral experiments in general population surveys (Fehr et al. 2002; Diekmann 2004; Bekkers 2007; Carpenter, Connolly, and Knowles Meyers 2008). In this chapter, we follow the latter route by including incentivized measures from each of the two experimental research traditions in an online survey. Like Gächter, Herrmann, and Thöni (2004), who used survey questions from the General Social Survey (GSS) to investigate the link between trust and voluntary contributions in public goods games, we combine experimental with survey evidence.

Our study focuses on prosocial and altruistic behavior captured with the ‘SVO Slider Measure’ of Murphy, Ackermann, and Handgraaf (2011) and with the canonical dictator game (Kahneman, Knetsch, and Thaler 1986). We investigate how these two measures correlate, and how they are related to self-reported behavior, using the General Social Survey’s altruistic behavior items (Smith 2006). In addition, we will revisit Gary Becker’s claim of social preferences’ independence from sociodemographic characteristics and try to replicate existing evidence that, for example, women (Eckel and Grossman 1998) and older people (Carpenter, Connolly, and Knowles Meyers 2008) are on average more prosocial. For our study, we used a larger sample (N = 871) than is typically recruited for laboratory experiments. This provides more statistical power for testing whether individual characteristics are related to prosocial preferences. We collected our data using the online labor market Amazon Mechanical Turk (MTurk), which has rapidly become one of the largest online subject pools for behavioral research.


While MTurk samples are not representative of the general population, their higher heterogeneity relative to other subject pools makes them better suited to the analysis of individual differences in prosocial behavior. MTurk allows administering incentivized survey experiments to a more heterogeneous population, but it has disadvantages similar to those of laboratory research in terms of self-selection bias and the participants’ experience with behavioral experiments. This might be of concern because we found that social preferences are correlated with participants’ arrival rank (early vs. late study takers) and with their experience with behavioral studies.

The remainder of the chapter is organized as follows. In Section 2, we briefly describe the SVO slider (Murphy, Ackermann, and Handgraaf 2011) and how it relates to the canonical dictator game, and we introduce the ‘altruistic behavior module’ from the GSS. Section 3 presents the design and sample of our study. In Section 4, we report the distribution of prosocial preferences found in our sample and compare it with previous studies. The reliability and convergent validity of the SVO slider is addressed in Section 5. In Section 6, we explore the conditional distribution of the revealed preferences from the SVO slider and the dictator game with regard to some basic sociodemographic characteristics. Finally, we report correlations of the two incentivized measures with self-reported prosocial behavior elicited with the GSS ‘altruistic behavior module’ (Section 7) and with survey meta-data such as participants’ survey retention time, arrival rank, and participation in previous studies (Section 8).

2 Measuring social preferences

In experimental economics, the workhorse for eliciting social preferences is a one-shot game, incentivized with monetary payoffs and played against one or several anonymous interaction partners. A social preference is understood as a decision maker’s concern for the other’s outcome that influences her behavior in allocation decisions. A person is said to be altruistic if her utility increases with the wellbeing of her interaction partner. In the simplest of all canonical games, the dictator game, one player receives an initial endowment that she can allocate between herself and another player. In observing how she splits the cake, we measure her concern for the other’s material payoff. In terms of a simple additive-separable utility function, player i’s utility U_i depends on her own outcome x_i and a valuation a_i of the recipient’s payoff x_j. Hence, U_i(x_i, x_j) = x_i + a_i ⋅ x_j, with the share of the endowment the dictator keeps for herself being x_i = 1 − x_j, and a_i being an altruism parameter that includes the standard case of pure self-interest if a_i = 0 (see Levine 1998 or Andreoni and Miller 2002 for a detailed discussion of utility functions embodying altruism). Depending on the recipient, the context, and the framing of the dictator game, splits over the entire range of possible allocations have been observed. However, the outcome of a dictator game is typically a mixture of transfers of zero, fair splits at 50 %, and only a few offers in between these two focal points or above 50 % (see Engel 2011 for a meta-analysis).
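A minimal sketch of this allocation logic is given below: it simply evaluates U_i = x_i + a_i ⋅ x_j over a discrete grid of possible transfers and returns the transfer that maximizes the dictator’s utility. The grid, the function names and the parameter values are illustrative only and are not taken from the chapter’s study materials. (With this linear specification the optimum is always a corner solution – keep everything if a_i < 1, give everything if a_i > 1 – which is one reason why the literature cited above considers richer utility functions.)

```python
def dictator_utility(x_i, x_j, a_i):
    """Additive-separable utility U_i = x_i + a_i * x_j (shares of the endowment)."""
    return x_i + a_i * x_j

def chosen_transfer(a_i, steps=100):
    """Transfer x_j in {0, 0.01, ..., 1} that maximizes U_i, with x_i = 1 - x_j."""
    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=lambda x_j: dictator_utility(1 - x_j, x_j, a_i))

print(chosen_transfer(a_i=0.0))  # 0.0 - pure self-interest: keep the whole endowment
print(chosen_transfer(a_i=1.5))  # 1.0 - valuing the other above oneself: give everything
```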


a mixture of zero transfers, fair splits at 50 %, and only a few offers in between these two focal points or above 50 % (see Engel 2011 for a meta-analysis). The tradeoff between the dictator's and the receiver's payoffs opens a two-dimensional space of joint outcomes depicted in Figure 1, where all possible allocations of the dictator are plotted as the dashed line (reproduced from Murphy and Ackermann 2014:16). The x-axis corresponds to the decision maker's own payoff (x_i), the y-axis to the other person's payoff (x_j). The conceptual framework depicted in Figure 1 was introduced by Liebrand (1984) for his Ring measure and provides a classification of potential revealed preferences in a self vs. other outcome space. Because not all preference types on the ring have the same empirical relevance, most SVO measures restrict their measurement to a particular segment of the ring.
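To make the additive-separable utility function above concrete, the following minimal Python sketch – our own illustration, not part of the original study, with hypothetical parameter values – evaluates the utility-maximizing transfer for different altruism weights a_i. It also makes visible that the purely linear specification yields only corner solutions, so interior allocations such as the 50/50 focal point require richer preferences (for example, inequality aversion or fairness concerns).

def dictator_utility(keep, give, alpha):
    # Additive-separable utility from the text: U_i = x_i + a_i * x_j,
    # with keep = x_i, give = x_j, and alpha = a_i.
    return keep + alpha * give

def best_transfer(endowment=100, step=10, alpha=0.0):
    # Utility-maximizing transfer for a dictator with altruism weight alpha.
    # With a linear weight the optimum is a corner solution: give nothing
    # if alpha < 1, give everything if alpha > 1.
    options = range(0, endowment + 1, step)
    return max(options, key=lambda give: dictator_utility(endowment - give, give, alpha))

print(best_transfer(alpha=0.0))  # 0   (pure self-interest)
print(best_transfer(alpha=0.5))  # 0   (weak altruism still transfers nothing)
print(best_transfer(alpha=1.5))  # 100 (a weight above 1 transfers everything)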

Note: Dashed line indicates the possible allocations in the standard dictator game. Fig. 1: Self vs. other outcome space with tradeoffs of the SVO slider measure (ring categories: altruistic, prosocial, individualistic, competitive, martyr, masochistic, sadomasochistic, sadistic).

The actual measurement of social value orientations with the SVO slider is similar to that of the dictator game. Decision makers are confronted with a series of allocation tasks (“items”) called decomposed games. Four out of the 15 allocation items are standard dictator games, while the other 11 implement different tradeoffs. Unlike in many


economic applications, preferences are not measured on the basis of games with strategic interaction. This helps decouple choices from complications introduced by the decision maker's beliefs about what her interaction partner would prefer. Scholars in the SVO tradition have developed several taxonomies to classify motivational orientation. The most common categorization is that of prosocial vs. proself. The former reflects a concern for the wellbeing of others (altruism) and for equality (fairness), whereas the latter is further subdivided into individualistic and competitive subtraits (Bogaert, Boon, and Declerck 2008; van Lange et al. 1997). The SVO slider of Murphy and colleagues (2011) distinguishes four preference types: altruistic, prosocial, individualistic, and competitive. They claim their measure has several advantages, such as higher resolution, transitivity checks, and higher efficiency, thus making the SVO slider also suitable for survey research. The items of the SVO slider span the shaded area in Figure 1, which contains the empirically most common SVO types. Underlying the classification is a unidimensional SVO score in the form of an angle (expressed in degrees) that is interpreted as a measure of concern for the other's outcome. In item five of the SVO slider, for example, the respondent has to choose from the following set of self-other allocations: {(100, 50), (94, 56), (88, 63), (81, 69), (75, 75), (69, 81), (63, 88), (56, 94), (50, 100)}. The mean allocations for self (Ā_s) and for the other (Ā_O) are computed across the six primary SVO items. An additional set of nine secondary items (capturing, for example, inequality aversion, fairness, or efficiency) can be used to disentangle the motivational foundation of prosociality; however, we will not consider them here. The SVO angle equals SVO° = arctan((Ā_O − 50)/(Ā_s − 50)). Note that, since the items of the SVO slider are not symmetrically distributed around the ring, the actual SVO angle does not correspond to the angles in Figure 1. Instead, the SVO angle takes the value of −16.26 degrees for an archetypical competitive player and goes up to 61.39 degrees for a person with purely altruistic preferences. (For a full specification of all items and further psychometric properties of the SVO slider see Murphy, Ackermann, and Handgraaf 2011:774ff.) A different approach to measuring prosociality, which is regularly applied in survey research, is based on additive indexes of self-reported behavior. The GSS altruism index (Smith 2006) is an established and widely used instrument supposed to capture a general propensity towards altruism. It is based on summed-up frequencies of prosocial behaviors from a set of different contexts: that is, how often a respondent gave, helped, volunteered, etc. within a particular time period.¹

1 The question reads as follows: “During the past 12 months, how often have you done each of the following things: [list of prosocial behavior items].” Surveyed items are, among others: given blood, given to the homeless, returned extra change, volunteered, given up a seat, helped someone who was away, carried a stranger’s belongings, lent money, or helped someone find a job. Response options are: more than once a week, once a week, once a month, at least two or three times in the past year, once in the past year, not at all in the past year (see Höglinger and Wehrli 2016 for the full wording and a complete item list).


Such an approach raises several methodological issues. Perhaps the most critical one is item selection: the types of prosocial behavior that underlie the index strongly influence how a particular respondent scores. Additionally, self-reports pose several well-known problems such as recall bias or socially desirable responding (Paulhus 1985). However, the incentivized measures referred to above may also be highly context-specific. They capture a very particular type of prosocial behavior, namely sharing small amounts of (windfall) money with others. Nevertheless, the experimental paradigms have the advantage that all respondents are confronted with an identical situation in which decisions have real consequences for their own and their interaction partners' earnings. Even if the GSS altruism index and the two incentivized measures do not address exactly the same construct, we would still expect them to be highly correlated, as they are supposed to capture the same (or at least a very similar) theoretical concept. In Sections 5 and 7, we will discuss the convergent and predictive validity of the measures in detail. Even though we do not provide a concluding assessment of which measure is preferable, it is nonetheless important to know whether (and how) these supposedly closely related approaches coincide.
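As a concrete illustration of the scoring just described, the following Python sketch computes the SVO angle from a respondent's six primary-item choices and maps it onto a type. It is our own minimal implementation rather than the authors' code; the type boundaries are the conventional cutoff values associated with Murphy, Ackermann, and Handgraaf (2011), quoted here as an assumption, and the example allocations are hypothetical.

import math

def svo_angle(primary_choices):
    # primary_choices: list of (self, other) allocations, one per primary item.
    # SVO angle (in degrees) = arctan((mean_other - 50) / (mean_self - 50));
    # atan2 is used so that a mean self-allocation of exactly 50 does not divide by zero.
    mean_self = sum(s for s, _ in primary_choices) / len(primary_choices)
    mean_other = sum(o for _, o in primary_choices) / len(primary_choices)
    return math.degrees(math.atan2(mean_other - 50.0, mean_self - 50.0))

def svo_type(angle):
    # Conventional cutoffs reported in the SVO slider literature (assumption).
    if angle > 57.15:
        return "altruistic"
    if angle > 22.45:
        return "prosocial"
    if angle > -12.04:
        return "individualistic"
    return "competitive"

# Hypothetical respondent with mean allocations of 80 points for self and 70 for the other:
# arctan(20 / 30) is roughly 33.7 degrees, which falls into the prosocial category.
angle = math.degrees(math.atan2(70 - 50, 80 - 50))
print(round(angle, 1), svo_type(angle))  # 33.7 prosocial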

3 Design and study sample

The participants for the study were recruited through the crowdsourcing platform Amazon Mechanical Turk (MTurk). MTurk is frequently used to recruit participants for scientific surveys and experiments (Horton, Rand, and Zeckhauser 2011; Chandler, Mueller, and Paolacci 2014; Paolacci, Chandler, and Ipeirotis 2010). MTurk samples are not representative of the general population, but have been shown to be more heterogeneous than other convenience samples generally used for experimental research. Results from some classical experiments carried out on MTurk have been shown to be comparable to those from other samples, suggesting that MTurk samples are suitable for experimental research (Berinsky, Huber, and Lenz 2012). For our study, we posted a HIT (Human Intelligence Task) asking for participation in a "Study on Decision Making" that required completing two online surveys – one immediately, the other one week later (see Höglinger and Wehrli 2016 for the detailed study design, including screenshots of the questionnaire). A base payment of 2 $ was offered for completing the two surveys, with the possibility of earning up to an extra 3 $ on various decision tasks. Of the 1,109 MTurk workers who accepted the HIT, 1,104 completed the first survey and 897 (81 %) completed the second. Completing the second survey was slightly negatively associated with the SVO slider angle (r = −0.06, p = 0.07); consequently, our final sample, consisting only of respondents completing


both surveys, is slightly less prosocial than the original sample.² Finally, we dropped 26 respondents who did not pass a screening question (adapted from Berinsky, Margolis, and Sances 2014), leaving us with N = 871 cases for the main analyses. The final sample consisted of 47 % females, 99 % US citizens, and 83 % white Caucasians; 86 % had one sibling or more and 43 % lived in a village or small town at age 16. The median age was 34, with the first quartile at 28 and the third at 43; 12 % indicated a high school diploma as their highest educational attainment, 35 % some college or an Associate's Degree, 40 % a Bachelor's Degree, and 13 % a graduate degree; 48 % were full-time employed, 14 % part-time employed, 17 % self-employed, 11 % homemakers, and 8 % students (overlapping categories possible). The median self-reported number of prior MTurk study participations was 500, with the first quartile at 100 and the third at 1,506, reflecting that most MTurk workers have substantial experience with scientific studies. In both surveys, participants first completed the SVO slider. For this, each respondent was randomly matched (ex post) with another anonymous participant who acted as the receiver.³ Respondents also served as receivers, though for a different participant, to rule out reciprocity. Next, respondents played a prisoner's dilemma game with yet another participant. A range of sociodemographics, as well as questions on attitudes and prosocial behavior, followed. At the end of the second survey only, participants played a standard dictator game. For each game – the SVO slider, the prisoner's dilemma, and the dictator game – respondents were randomly paired with a different participant. The actual matching, the calculation of payments, and the feedback were carried out only after all participants had finished the second survey. For the SVO slider, participants made 15 allocation decisions, with endowments of between 90 and 100 points. One of the 15 allocations was randomly implemented and the corresponding payments allotted. In the prisoner's dilemma game, the temptation payoff was 100 points, the sucker payoff 0, the cooperation payoff 60 each, and the defection payoff 20 each. The dictator game had an endowment of 100 points, which could be split into units of 10 points. The conversion rate was 0.50 $ per 100 points for all games. Even though these stakes are low compared to laboratory experiments, they are common for MTurk studies and seem to result in behavioral patterns similar to those observed in laboratory studies with much higher stakes (see Amir, Rand, and Gal 2012; Keuschnigg, Bader, and Bracher 2016; Raihani, Mace, and Lamba 2013).

2 Where meaningful, we therefore performed our analyses by additionally including respondents that had participated only in survey one. Reported results are robust and do not change qualitatively. 3 Participants were informed about this in the following way: “In this task, you have been randomly paired with another person whom we will refer to as the ‘Other’. This other person is a participant in this study and someone you do not know and who will remain mutually anonymous. All of your choices are completely confidential. You will not be paired again with this person outside of task 1.”
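To make the stakes described in this section concrete, the following sketch encodes the point payoffs and the points-to-dollar conversion. It is our own illustration, and all function and variable names are hypothetical rather than taken from the study materials.

POINTS_PER_DOLLAR = 200  # 100 points = 0.50 $, i.e. 200 points per dollar

def to_dollars(points):
    return points / POINTS_PER_DOLLAR

# Prisoner's dilemma payoffs used in the study, in points:
# temptation 100, mutual cooperation 60 each, mutual defection 20 each, sucker 0.
PD_POINTS = {
    ("defect", "cooperate"): 100,
    ("cooperate", "cooperate"): 60,
    ("defect", "defect"): 20,
    ("cooperate", "defect"): 0,
}

def pd_earnings(own_choice, other_choice):
    # Own earnings in dollars from one play of the study's prisoner's dilemma.
    return to_dollars(PD_POINTS[(own_choice, other_choice)])

print(pd_earnings("cooperate", "cooperate"))  # 0.3
print(pd_earnings("defect", "cooperate"))     # 0.5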


4 Distribution of social preferences

In a first step, we analyze the distribution of prosocial preferences in our sample. Figure 2 shows the frequency distribution of the SVO angles elicited with the slider measure (left side) and of giving in the dictator game (right side).⁴ The SVO angle shows an accentuated bimodal distribution, with high frequencies at the focal values of 7.8 degrees (26 %), 34.9 degrees (22 %), and 37.5 degrees (13 %). Categorizing participants according to Murphy, Ackermann, and Handgraaf (2011), we identify 59 % of our respondents as prosocial and 41 % as proself. The competitive and the altruistic subtypes are virtually nonexistent, with 4 and 1 cases out of 871 respectively. We therefore restrict ourselves in the following analysis of SVO types to the two basic orientations of prosocial and proself, and collapse the five outlying cases into the neighboring categories. Our type distribution is in good agreement with Murphy, Ackermann, and Handgraaf (2011), whose lab experiments found 59 % of respondents to be of the prosocial type and 35 % to be proself. Using the four types (competitive, individualistic, prosocial, and altruistic), we can explain 86 % of the variance in the SVO slider angle in our sample (the R² from regressing the SVO angle on type dummies). The dichotomous classification into prosocial vs. proself has very similar explanatory power, with 83 %. Hence, there is limited additional value in using the more fine-grained SVO angle measure in place of the four-type or two-type classification.
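The variance decomposition behind these R² figures can be reproduced in a few lines. The sketch below – our own illustration with hypothetical toy data – computes the share of variance in the SVO angle explained by a categorical type classification, which is equivalent to the R² of an OLS regression of the angle on type dummies.

import numpy as np

def r_squared_from_types(angles, types):
    # 1 - SS_within / SS_total: the fitted values of a regression on type
    # dummies are the type means, so this equals the regression R-squared.
    angles = np.asarray(angles, dtype=float)
    types = np.asarray(types)
    ss_total = np.sum((angles - angles.mean()) ** 2)
    ss_within = sum(
        np.sum((angles[types == t] - angles[types == t].mean()) ** 2)
        for t in np.unique(types)
    )
    return 1.0 - ss_within / ss_total

# Hypothetical toy data: angles clustering at the two focal values, two types.
angles = [7, 8, 9, 34, 36, 38]
types = ["proself", "proself", "proself", "prosocial", "prosocial", "prosocial"]
print(round(r_squared_from_types(angles, types), 2))  # 0.99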

Fig. 2: Distribution of prosocial preference measures (left panel: SVO slider angle in degrees, with regions labeled competitive, individualistic, prosocial, and altruistic; right panel: giving in the dictator game in points; vertical axes: % of participants).

4 The SVO angle shown is from survey one and the dictator game (DG) from survey two, because the DG was played only in the second survey. As we will show later, the SVO angle distributions from survey one and survey two were identical.


We now turn to the distribution of prosocial preferences elicited using the dictator game. We let participants play a standard dictator game with an endowment of 100 points (equivalent to 0.50 $), which they had to allocate in intervals of 10 points to themselves and to another anonymous, randomly chosen participant. The right panel of Figure 2 indicates that 42 % gave nothing to the other person, while 35 % transferred 50 points, that is, shared equally with the other person. The donation rate, that is, the average share of points transferred, was 27 %. This is in line with behavior observed previously in both physical laboratories and on MTurk. Engel (2011) reports an average donation rate of 28.4 % for the lab in a meta-study. The focal points are more pronounced in our case, with 42 % (versus Engel's 36 %) choosing the zero transfer and 35 % (versus 17 %) choosing the equal split. The probability mass for the categories in between is much lower compared to what is commonly found in dictator games, suggesting that our MTurk sample is more prone to choose ideal-typical strategies.

5 Reliability and convergent validity of the SVO slider measure

In the following, we investigate the reliability and the convergent validity of the SVO slider measure. The second SVO slider measurement from survey two, elicited one week after survey one, shows an essentially identical distribution of the SVO angle. A Kolmogorov–Smirnov test does not reject equality of distributions (p = 0.29), and the angle mean changes only marginally from 24.1 to 24.7 (p = 0.06). At the individual level, there are some differences between the two SVO measures; still, the correlation is quite high (0.79), and the median absolute individual difference between the two measures is only 1.6 degrees (mean = 5, sd = 7.5). Looking at SVO types – a dummy variable indicating prosocial vs. proself – we see the same pattern. The proportion of 39 % proselfs (41 % in survey one) and 60 % prosocials (59 %) remains basically unchanged. The correlation of being prosocial (vs. proself) in survey one and survey two is 0.72. Eighty-six percent show a consistent type in both measures, and the direction of change of the 14 % that change type is roughly symmetrical. In sum, the test-retest reliability of the SVO slider over a one-week period is fairly high, and there is also no overall shift in measured preferences in the second survey. Murphy, Ackermann, and Handgraaf (2011) found a slightly higher type consistency of 89 % and an angle correlation of 0.92 in their experiments over the same one-week period. Volk, Thöni, and Ruigrok (2012) measured contributions to public goods games at three distinct points in time over a five-month period. Interestingly, they too found the preference distribution to remain virtually unchanged. At the individual level, however, only 50 % of their participants were classified as the same preference type over all three waves. To the best of our knowledge, the longest period studied in


the literature on the intra-personal stability of social preferences was the Dutch Telepanel reported in Van Lange et al. (1997), which took place over a time span of 19 months. They found that 59 % of their respondents exhibited stable social value orientations over this long period. At first glance, all these test-retest reliabilities seem quite high, but this is not the case if stable personality traits are chosen as a benchmark. Furthermore, the declining stability suggests that social preferences are subject to considerable change over longer periods (see Bekkers 2004 for a detailed discussion). To assess the SVO slider's convergent validity with other incentivized measures, we compare it with behavior in the dictator game (DG) and in the prisoner's dilemma game (PD). The correlation between the dictator game and the SVO angle, both from survey two, is 0.42. The result is almost identical (0.41) if we take the SVO type (a dummy for being prosocial vs. proself) instead of the angle. Marginally lower correlations of 0.39 and 0.37 are found if we use the SVO measure from the first survey. These results are hardly surprising, as the SVO slider consists of several decision tasks similar to those in a dictator game. The most commonly applied convergence test in the SVO literature is a comparison of revealed SVO types with the decision in a social dilemma situation (see Balliet, Parks, and Joireman 2009). That is why we let our respondents play a prisoner's dilemma game in both survey waves (T = 100, R = 60, P = 20, and S = 0, with 100 points equaling 0.50 $). We found that 61 % of our respondents chose the cooperative strategy in survey one and 59 % in survey two. The correlation with the SVO angle was 0.32 for survey one and 0.37 for survey two; for the dummy of being prosocial (vs. not) it was 0.31 and 0.35, respectively. Murphy, Ackermann, and Handgraaf (2011) found a considerably lower convergent validity of the SVO angle with the prisoner's dilemma (0.24). Balliet, Parks, and Joireman (2009) report an average correlation of r = 0.30 in their meta-analysis of 82 studies relating social value orientations to behavior in a social dilemma game. Again, this is in good agreement with our findings.
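The reliability and convergence statistics reported in this section are simple summaries of two waves of data. A minimal sketch of how they can be computed is given below; it is our own illustration, and the input values are hypothetical placeholders rather than study data.

import numpy as np

def test_retest_summary(angle_w1, angle_w2, type_w1, type_w2):
    # Pearson correlation of the two angle measurements, median absolute
    # within-person change, and the share of respondents with the same
    # type classification in both waves.
    a1, a2 = np.asarray(angle_w1, float), np.asarray(angle_w2, float)
    r = np.corrcoef(a1, a2)[0, 1]
    median_abs_diff = np.median(np.abs(a1 - a2))
    consistent = np.mean(np.asarray(type_w1) == np.asarray(type_w2))
    return r, median_abs_diff, consistent

# Hypothetical data for five respondents:
r, diff, consistent = test_retest_summary(
    [8, 35, 24, 7, 37], [10, 33, 25, 8, 20],
    ["proself", "prosocial", "prosocial", "proself", "prosocial"],
    ["proself", "prosocial", "prosocial", "proself", "proself"],
)
print(round(r, 2), diff, consistent)  # correlation, median |change|, share consistent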

6 Correlation of the SVO slider and the dictator game with sociodemographics

In the next step, we explore how prosocial preferences correlate with various basic sociodemographic characteristics. In the literature, there is convincing evidence that social preferences are related to personality traits (Volk, Thöni, and Ruigrok 2012) and that there are variations between cultures (Henrich et al. 2005). The evidence for sociodemographic variation is, however, still limited and not conclusive. In the following, we report results on the relation of the SVO slider angle and giving in the dictator game to some basic sociodemographic variables that have been discussed in the literature. Figure 3 shows standardized beta coefficients of OLS models on


the SVO angle and giving in the dictator game.⁵ In general, none of the explored covariates is strongly correlated with either of the two measures. We find the largest effects for females, with 0.07 (SVO angle, p = 0.047) and 0.10 (DG giving, p < 0.01). This result matches previous findings on gender and giving in the dictator game. Eckel and Grossman (1998) found females sharing twice as much as men; Bolton and Katok (1995) found only an insignificantly higher contribution level among female participants. Engel's (2011) meta-analysis, averaging over 12 studies, reports that females gave 5.6 % more than men. Gender differences in contribution levels seem to be small, which might explain the mixed results typically reported in the experimental literature. Carpenter, Connolly, and Knowles Meyers (2008) embedded a dictator game in a representative population survey and found no gender effects, although they did report significant effects for age and education: older and more educated people behave more generously and contribute significantly more. In our sample, however, we do not find any age or education effects that reach conventional levels of significance.

Note: Beta coefficients from OLS models (lines indicate 95 % confidence interval). Fig. 3: SVO angle and giving in the dictator game, and socio-demographics (covariates: female, age, non-Caucasian, education, household income (log), student, full-time employed, parents' education, number of siblings, lived in village or small town when 16).

5 We report standardized beta coefficients for easier legibility and comparability with the bivariate correlations from other sections. Note that, because beta coefficients are standardized by the standard deviations of the variables under consideration, they depend on the corresponding sample variance. Plots of coefficient estimates were produced using the coefplot command for Stata (Jann 2014).


Experimentalists have often been criticized for generating artifacts with subject pools that rely only on students and not on samples of ordinary people who "rise early and get their hands dirty". For some scholars, Engel's (2011) finding that students on average give less than non-students probably came as a relief. To the best of our knowledge, no study has yet analyzed dictator transfers from different occupational groups on MTurk. We find no significant difference between students and non-students. However, using the dichotomous SVO type instead of the angle, students turn out to be more prosocial (β = 0.07, p = 0.049, not reported). Hence, if there is a systematic pattern in our data, it is exactly the opposite of what one would expect according to the literature. With respect to employment, we found no effect of full-time employment on giving in the dictator game, but the SVO angle shows a weak negative beta coefficient of –0.08. Accordingly, the 48 % of our respondents who claimed to work full time exhibit a slightly lower SVO angle compared to those not full-time employed. In the SVO literature, the development of prosocial orientations is explained by exposure to different levels of secure attachment, which itself is thought to be a function of interactions between caregiver and child, family members, and interaction patterns with peers during childhood and early adolescence (van Lange et al. 1997). Sociologists in the tradition of social capital research have often studied the availability of resources within the family and the surrounding community (Coleman 1990). Our data show that having grown up in a village or small town has a moderate positive effect on giving in the dictator game (0.10) but is unrelated to the SVO angle. In addition, the 'sibling-prosocial' hypothesis of van Lange et al. (1997) – that people with more siblings tend to be more prosocial – is not supported by our data. None of the other covariates, such as race, social background, or income, exhibits a substantial or significant effect on prosocial preferences. Prosocial preferences therefore do not seem to be strongly related to sociodemographic variables, and are thus rather uniformly distributed in the broader population. This obviously only holds for the few basic sociodemographic variables under consideration. Still, Gary Becker's (1976:5) assumption of a uniform distribution of preferences across socioeconomic groups seems more warranted than the assumption of narrow self-interest.
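Figure 3 is based on standardized (beta) coefficients; as noted in footnote 5, the original plots were produced with Stata's coefplot command. Purely as an illustration of what such coefficients are, the sketch below – our own, with hypothetical data – obtains them by z-standardizing the outcome and all covariates before an ordinary OLS fit.

import numpy as np

def standardized_betas(y, X):
    # Standardized (beta) coefficients: z-standardize the outcome and every
    # covariate, then run OLS; the slopes are then expressed in standard
    # deviations of y per standard deviation of each covariate.
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    yz = (y - y.mean()) / y.std()
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    design = np.column_stack([np.ones(len(yz)), Xz])  # add an intercept
    coefs, *_ = np.linalg.lstsq(design, yz, rcond=None)
    return coefs[1:]  # drop the intercept, keep the beta coefficients

# Hypothetical example: regress an SVO angle on a female dummy and age.
rng = np.random.default_rng(0)
female = rng.integers(0, 2, 200)
age = rng.normal(35, 10, 200)
angle = 20 + 3 * female + 0.05 * age + rng.normal(0, 10, 200)
print(standardized_betas(angle, np.column_stack([female, age])))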

7 Correlation of the SVO slider and the dictator game with self-reported prosocial behavior

Survey researchers will primarily be interested in the convergent validity of the behavioral measures with their own toolbox, in order to assess whether these measures justify the additional cost and complexity caused by incentivization. The survey researcher's standard approach to measuring prosocial behavior consists of established and widely applied scales and indexes of self-reported prosocial behavior. For the current comparison, we used the "altruistic behavior module" from the General Social Survey (GSS), which captures a wide range of altruistic helping behaviors in different contexts (Smith 2006;


see footnote 1 for a more detailed description). Which measure is superior cannot be addressed here, since in the case of a low correlation either or both sides of the equation may be weak. Results of our implementation of the GSS altruism index suggest a relatively high internal consistency, with a Cronbach's alpha of 0.88 (see Sijtsma 2009 for a critical discussion of alpha). We also performed a principal component analysis, which suggests that our measured construct is indeed unidimensional. Finally, we compared the prevalence of stated prosocial behaviors in our sample with that of a general population sample (GSS) as reported in Einolf (2008:1273) and find good agreement.⁶ Using an additive index of prosocial behavior, we find only a weak association with the SVO angle (Pearson's r = 0.08) and a modest association (r = 0.18) with giving in the dictator game (see Figure 4).⁷ Looking at individual items, we find no relation with the SVO slider whatsoever for most of the 15 surveyed behaviors. Only 4 of the 15 items – namely "allowed someone to cut ahead", "gave money to charity", "helped others with housework", and "talked to a depressed person" – showed significant bivariate correlations, ranging between 0.08 and 0.13. Giving in the dictator game shows stronger correlations with self-reported prosocial behavior: 11 of the 15 items were significantly correlated, with values between 0.07 and 0.16. Both measures exhibit stronger correlations with the overall index than with the individual items. In sum, the agreement between incentivized measures and self-reported altruistic behavior is weak. At least three interpretations of this finding are possible. First, the SVO slider and the dictator game capture social preferences poorly. Second, the GSS module captures altruistic behavior poorly. Third, prosocial preferences correlate only weakly with real-world altruistic behavior. Unfortunately, our design does not allow a conclusive assessment here. To put our results into context, Gächter, Hermann, and Thöni (2004) found a similarly weak link between the GSS trust index and contributions in a public goods game (r = 0.21), where the individual trust items all showed correlations below 0.10. In the light of these findings, survey researchers, who usually confront severe time and space constraints in questionnaires, will likely choose to include the simpler dictator game instead of the SVO slider measure, which consists of six allocation decisions even in its short version. This is especially likely because the dictator game demonstrates a considerably stronger correlation with self-reported altruistic behavior in our study than the SVO slider.

6 For most items, we find differences in prevalence below seven percentage points; one item is not reported in Einolf (2008). However, two items show substantial differences: “lent money” and “helped in housework” have significantly lower prevalence in our data set. We speculate that this is primarily due to the lower age of MTurk users compared to the general population. 7 Results are very similar and qualitatively identical if we use the SVO types prosocial vs. proself and a dichotomized indicator of dictator transfers.


Note: Pearson's r (lines indicate 95 % confidence interval). Fig. 4: Bivariate correlation of the SVO slider and giving in the dictator game with self-reported prosocial behavior (items: additive index of all items; gave blood; gave to homeless; returned extra change; allowed someone to cut ahead; volunteered to charity; gave money to charity; gave up seat; helped someone who was away; carried a stranger's belongings; gave directions; loaned item; helped others with housework; lent money; talked to depressed person; helped someone find job).

In addition, our results show that the SVO slider measure and the dictator game seemingly measure quite different things from the GSS altruism index. Hence, the measures can hardly substitute for each other. Our study cannot assess what they actually measure, but as they purport to measure the same concept – a preference for altruism or prosociality – it is necessary to investigate this issue further.
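The internal-consistency figure quoted above (Cronbach's alpha) can be computed directly from the item matrix. The following sketch is a generic implementation of the textbook formula, written by us for illustration and not the authors' code; the toy data are hypothetical.

import numpy as np

def cronbach_alpha(item_matrix):
    # item_matrix: respondents in rows, items in columns.
    # alpha = k/(k-1) * (1 - sum of item variances / variance of the sum score)
    items = np.asarray(item_matrix, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

# Hypothetical 5-respondent, 4-item example with strongly correlated items:
toy = np.array([
    [1, 2, 1, 2],
    [2, 3, 2, 2],
    [3, 3, 3, 4],
    [4, 5, 4, 4],
    [5, 5, 5, 5],
])
print(round(cronbach_alpha(toy), 2))  # about 0.98 for this toy matrix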


8 Correlation with participants' survey retention time, arrival rank, and study experience

One of the most persistent concerns among researchers who use MTurk is data quality (Chandler, Mueller, and Paolacci 2014). This comes as no surprise, because the requester-worker relationship between researchers and participants on MTurk resembles a principal-agent problem. We therefore expect workers to show low effort when monitoring is hard. Amazon has implemented the transaction in favor of the requester: the requester can deny payment if the work does not meet the required standards. However, scientific requesters typically face a situation where the effort consists of an honest answer to the researcher's questions, which is usually not verifiable. Furthermore, payments on MTurk are normally below minimum-wage levels, with the result that respondents have little incentive to invest more attention than necessary. Accordingly, we expect money-maximizing workers to minimize the time they spend filling out a survey. Our data indeed support this argument. Figure 5 reports correlations of our measures of prosociality with various response-time measurements. There is a strong positive correlation between the SVO angle and the time spent choosing positions on the SVO slider instrument (r = 0.48). Our results match those of Chen and Fischbacher (2016), who found, for students in the lab, a correlation of virtually identical magnitude between response time and the SVO angle (r = 0.52). They explain the shorter response times of individualistic subjects by their smaller information requirements and less cognitive conflict between motives, and conclude that response times can be a cheap indicator of social preferences. Whether this interesting pattern generalizes to subject pools and study settings other than the laboratory or MTurk (where participants are much less incentive-driven) is of course questionable. Figure 5 also shows that the total time used to complete both surveys is positively correlated with being prosocial (0.16 for the SVO angle, 0.09 for DG giving). Giving in the dictator game (DG) and the time to complete the DG, however, do not follow the same pattern: the correlation between these two variables is negative (r = −0.07, p = 0.04). Another common threat to data quality discussed by Chandler, Mueller, and Paolacci (2014) is the presence of a very active core group of highly productive workers who participate disproportionately often in scientific studies posted on MTurk. These so-called super turkers use tools that inform them when new work is available, and they arrive at the study site within seconds of a HIT being posted. According to Chandler, Mueller, and Paolacci (2014), these participants are typically experienced players who are already familiar with many common paradigms in experimental research. The lower part of Figure 5 reports results that allow us to assess the impact of these experienced players.


Note: Pearson's r (lines indicate 95 % confidence interval). Fig. 5: Bivariate correlation of the SVO slider and giving in the dictator game with participants' survey retention, arrival time, and experience (measures: time to complete SVO slider (log); time to complete dictator game (log); time to complete both surveys (log); arrival rank for study HIT; self-reported participation in MTurk studies (log); observed participation in DeSciL studies (log)).

The arrival rank for our HIT – the order in which participants accepted the HIT – is positively correlated with prosocial behavior (0.11 for the SVO angle, p < 0.001; 0.06 for DG giving, p = 0.06): in other words, participants arriving early at the survey website are significantly less prosocial. The same conclusion can be drawn from two proxy measures of experience. We asked respondents to self-report the number of studies they had previously completed on MTurk. In addition, we used the unobtrusive count of previous interactions with our requester account to determine how often they had participated in studies carried out by the ETH Decision Science Laboratory (DeSciL). Both measures clearly indicate that experienced workers are substantially less prosocial. This shows that the possibility of experienced workers behaving differently must be considered when carrying out scientific studies on MTurk. In particular, studies using small samples without explicitly excluding "super turkers" might end up with a highly selective group of participants. This might lead to undesired bias and offset the effort of recruiting more heterogeneous samples by carrying out studies on MTurk instead of in the laboratory.

9 Discussion and conclusion

Research on social preferences in experimental economics and social psychology relies to a great extent on laboratory experiments with canonical games, such as the


dictator game, carried out using small samples of university students. Field and survey researchers from the social sciences often have serious reservations regarding the generalizability of these experimental findings. Our study addresses this by implementing two incentivized measures of social preferences from the lab, the SVO slider measure (Murphy, Ackermann, and Handgraaf 2011) and the standard dictator game, in a survey distributed on MTurk. This way, we were able to broaden the participant pool and obtain a much more heterogeneous sample than those typically used in laboratory experiments. In addition, we applied the altruistic behavior module from the General Social Survey, an established self-report-based survey measure of prosocial behavior, to investigate how the incentivized measures compare to real-life (self-reported) behavior. Our prime contribution lies in an evaluation of the reliability of the SVO slider measure, its comparison with related measures, and an assessment of its convergent validity. Our results show that prosocial preferences are similarly distributed in an MTurk sample and in laboratory subject pools. Using the SVO slider, we identify a virtually identical distribution of social value orientations to what Murphy and colleagues (2011) obtained from laboratory experiments. Moreover, the contribution rate of our sample in a dictator game is in agreement with the average rates reported in Engel's (2011) meta-analysis. Results regarding the altruistic behavior items from the General Social Survey are also close to the distribution reported by Einolf (2008). We also found that the SVO slider has fairly high intertemporal reliability and correlates moderately with giving in the dictator game (r = 0.42). The correlation of the SVO slider with behavior in the prisoner's dilemma game (r = 0.32) comes close to what has been reported in the meta-analysis of Balliet, Parks, and Joireman (2009). Hence, we conclude that measuring social preferences on MTurk is a viable option that should pique the social scientist's curiosity. However, we found that prosocial preferences as elicited with these two measures are more or less homogeneously distributed in the broader population – at least when we look at some basic sociodemographic characteristics. We implemented the SVO slider because it promises several advantages over traditional and more popular measures from the SVO literature. One major advantage is the underlying unidimensional angle, which allows for a higher-resolution measurement of social value orientations. However, our analysis showed that the fine-grained scale of the SVO slider offers little additional insight compared to a dichotomous categorization of individuals into prosocials and proselfs. Furthermore, we found only a weak correlation between the SVO slider and self-reported prosocial behavior (r = 0.08), whereas giving in the dictator game shows a somewhat stronger correlation (r = 0.18). Hence, in terms of pure convergent validity with self-reported behavior, the SVO slider is outperformed by the dictator game. This conclusion must be taken with a grain of salt, since self-reported behavior is far from being a ground truth. In any case, the fact that the incentivized measures correlate only weakly with established survey scales of prosociality raises the question of what these scales actually measure and how they relate to real-world behavior.


Using MTurk to conduct behavioral and survey research has interesting advantages. Many research groups have replicated findings from experimental research on MTurk and repeatedly reported results that are at least qualitatively in good agreement (Crump, McDonnell, and Gureckis 2013). Apart from simply being cheaper, faster, more convenient, and often more transparent than many other sources of convenience samples, MTurk makes it possible to confront experimental treatments with more sociodemographic heterogeneity and environmental variation and noise. While well-known samples of students may be superior in providing initial proof for the existence of certain phenomena, MTurk turned out to be valuable in probing the generalizability and robustness of previous findings. MTurk may thus be regarded as a good complement to laboratory research rather than a substitute for it. Nevertheless, using MTurk samples may also introduce threats to data quality. Experienced workers show up disproportionally early in the data collection process on MTurk and invest significantly less time on the treatment. Moreover, the early arriving and more experienced workers might behave considerably differently than those arriving later in the collection process. In the present case, they behave less prosocially. Recruiting small samples is thus a major threat to data quality on MTurk because one likely ends up with a very selective group that is highly accustomed to all kinds of experimental treatments, games, and standard measures – something most researchers leaving the lab and using MTurk expressly seek to avoid.

Bibliography


Amir, Ofra, David G. Rand, and Yaakokov Kobi Gal. 2012. “Economic Games on the Internet: The Effect of $1 Stakes.” PLoS One 7(2):e31461. Andreoni, James, and John Miller. 2002. “Giving According to Garp: An Experimental Test of the Consistency of Preferences for Altruism.” Econometrica 70(2):737–753. Balliet, Daniel, Craig Parks, and Jeff Joireman. 2009. “Social Value Orientation and Cooperation in Social Dilemmas: A Meta-Analysis.” Group Processes Intergroup Relations 12:533–545. Becker, Gary S. 1976. The Economic Approach to Human Behavior. Chicago: University of Chicago Press. Bekkers, René. 2004. “Stability, Reliability and Validity of Social Value Orientations.” Working Paper, available at SSRN: http://ssrn.com/abstract=2274560. Bekkers, René. 2007. “Measuring Altruistic Behavior in Surveys: The All-Or-Nothing Dictator Game.” Survey Research Methods 1(3):139–144. Berinsky, Adam J., Gregory A. Huber, and Gabriel S. Lenz. 2012. “Evaluating Online Labor Markets for Experimental Research: Amazon.com’s Mechanical Turk.” Political Analysis 20(3):351– 368. Berinsky, Adam J., Michele F. Margolis, and Michael W. Sances. 2014. “Separating the Shirkers from the Workers? Making Sure Respondents Pay Attention on Self-Administered Surveys.” American Journal of Political Science 58(3):739–753.


Bogaert, Sandy, Christoph Boon, and Carolyn Declerck. 2008. “Social Value Orientation and Cooperation in Social Dilemmas: A Review and Conceptual Model.” British Journal of Social Psychology 47:453–480. Bolton, Gary E., and Elena Katok. 1995. “An Experimental Test for Gender Differences in Beneficent Behavior.” Economics Letters 48(3–4):287–292. Bradsley, Nicholas. 2008. “Dictator Game Giving: Altruism or Artefact?” Experimental Economics 11(2):122–133. Carpenter, Jeffery, Christina Connolly, and Caitlin Knowles Meyers. 2008. “Altruistic behavior in a representative dictator experiment.” Experimental Economics 11(3):282–298. Chandler, Jesse, Pam Mueller, and Gabriele Paolacci. 2014. “Nonnaïveté among Amazon Mechanical Turk Workers: Consequences and Solutions for Behavioral Researchers.” Behavior Research Methods 46(1):122–130. Chen, Fadong, and Urs Fischbacher. 2016. “Response time and click position: Cheap indicators of Preferences.” Journal of the Economic Science Association 2(2):109–126. Coleman, James S. 1990. Foundations of Social Theory. Cambridge, MA: Belknap Press of Harvard University Press. Crump, Matthew J. C., John V. McDonnell, and Todd M. Gureckis. 2013. “Evaluating Amazon’s Mechanical Turk as a Tool for Experimental Behavioral Research.” PLoS ONE 8:e57410. Diekmann, Andreas. 2004. “The Power of Reciprocity.” Journal of Conflict Resolution 48(4):487–515. Diekmann, Andreas, Ben Jann, Wojtek Przepiorka, and Stefan Wehrli. 2014. “Reputation Formation and the Evolution of Cooperation in Anonymous Online Markets.” American Sociological Review 79(1):65–85. Eckel, Catherine C., and Philip J Grossman. 1998. “Are Woman Less Selfish Than Men? Evidence from Dictator Experiments.” The Economic Journal 108(448):726–735. Einolf, Christopher J. 2008. “Empathic Concern and Prosocial Behaviors: A Test of Experimental Results Using Survey Data.” Social Science Research 37(4):1267–1279. Engel, Christoph. 2011. “Dictator Games: A Meta Study.” Experimental Economics 14(4):583– 610. Falk, Armin. 2007. “Gift Exchange in the Field.” Econometrica 75(5):1501–1511. Fehr, Ernst, Urs Fischbacher, Bernhard von Rosenblatt, Jürgen Schupp, and Gert G. Wagner. 2002. “A Nation-Wide Laboratory: Examining Trust and Trustworthiness by Integrating Behavioral Experiments into Representative Surveys.” Schmollers Jahrbuch 122:519–543. Fehr, Ernst, and Klaus M. Schmidt. 1999. “A Theory of Fairness, Competition, and Cooperation.” The Quarterly Journal of Economics 114(3):817–868. Fehr, Ernst, and Klaus M. Schmidt. 2003. “Theories of Fairness and Reciprocity: Evidence and Economic Applications.” Pp. 208–257 in Advances in Economics and Econometrics: 8th World Congress, edited by M. Dewatripont, L. P. Hansen, and S. J. Turnovsky. Cambridge: Cambridge University Press. Fehr, Ernst, and Herbert Gintis. 2007. “Human Motivation and Social Cooperation: Experimental and Analytical Foundations.” Annual Review of Sociology 33:43–64. Gächter, Simon, Benedikt Hermann, and Christian Thöni. 2004. “Trust, Voluntary Cooperation, and Socio-Economic Background: Survey and Experimental Evidence.” Journal of Economic Behavior and Organization 55(4):505–531. Gürek, Özgür, Bernd Irlenbusch, and Bettina Rockenbach. 2006. “The Competitive Advantage of Sanctioning Institutions.” Science 312(5770):108–111. Henrich, Joseph, Robert Boyd, Samuel Bowles, Colin Camerer, Ernst Fehr, Herbert Gintis, Richard McElreath, Michael Alvard, Abigail Barr, Jean Ensminger, Natalie Smith Henrich, Kim Hill, Francisco Gil-White, Michael Gurven, Frank W. 
Marlowe, John Q. Patton, and David


Tracer. 2005. “‘Economic Man’ in Cross-Cultural Perspective: Behavioral Experiments in 15 Small-Scale Societies.” Behavioral and Brain Sciences 28(6):795–855. Henrich, Joseph, Steven J. Heine, and Ara Norenzayan. 2010. “The Weirdest People in the World?” Behavioral and Brain Sciences 33(2/3):1–75. Hermann, Benedikt, Christian Thöni, and Simon Gächter. 2008. “Antisocial Punishment across Societies.” Science 319(5868):1362–1367. Höglinger, Marc, and Stefan Wehrli. 2016. A Study on Human Decision Making. Documentation. Zürich: ETH Zurich. Available at: https://www.descil.ethz.ch/projects/1510-peersvo. Horton, John J., David G. Rand, and Richard J. Zeckhauser. 2011. “The Online Laboratory: Conducting Experiments in a Real Labor Market.” Experimental Economics 14(3):399–425. Jann, Ben, 2014: Plotting regression coefficients and other estimates. Stata Journal 14(4):708– 737. Kahneman, Daniel, Jack L. Knetsch, and Richard H. Thaler. 1986. “Fairness as a Constraint on Profit Seeking: Entitlements in the Market.” American Economic Review 76(4):728–741. Keuschnigg, Marc, Felix Bader, and Johannes Bracher. 2016. “Using Crowdsourced Online Experiments to Study Context-Dependency of Behavior.” Social Science Research 59:68–82. Levine, David K. 1998. “Modeling Altruism and Spitefulness in Experiments.” Review of Economic Dynamics 1(3):593–622. Levitt, Steven D., and John A. List. 2007. “What Do Laboratory Experiments Measuring Social Preferences Reveal About the Real World?” Journal of Economic Perspectives 21(2):153–174. Liebrand, Wim B. G. 1984. “The Effect of Social Motives, Communication and Group Size on Behaviour in an N-Person Multi-Stage Mixed-Motive Game.” European Journal of Social Psychology 14:239–246. List, John A. 2007. “On the Interpretation of Giving in Dictator Games.” Journal of Political Economy 115(3):482–493. Murphy, Ryan O., Kurt A. Ackermann, and Michael J. J. Handgraaf. 2011. “Measuring Social Value Orientation.” Judgement and Decision Making 6(8):771–781. Murphy, Ryan O., and Kurt A. Ackermann. 2014. “Social Value Orientation: Theoretical and Measurement Issues in the Study of Social Preferences.” Personality and Social Psychology Review 18(1):13–41. Paolacci, Gabriele, Jesse Chandler, and Panagiotis G. Ipeirotis. 2010. “Running Experiments on Amazon Mechanical Turk.” Judgment and Decision Making 5(5):411–419. Paulhus, Delroy L. 1985. “Two-Component Models of Socially Desirable Responding.” Journal of Personality and Social Psychology 46(3):598–609. Raihani, Nichola J., Ruth Mace, and Shakti Lamba. 2013. “The Effect of $1, $5 and $10 Stakes in an Online Dictator Game.” PLoS One 8(8):e73131. Sijtsma, Klaas. 2009. “On the Use, the Misuse, and the Very Limited Usefulness of Cronbach’s Alpha.” Psychometrika 74(1):107–120. Smith, Tom W. 2006. Altruism and Empathy in America: Trends and Correlates. Chicago: National Opinion Research Center, University of Chicago. Sobel, Joel. 2005. “Interdependent Preferences and Reciprocity.” Journal of Economic Literature 43(2):392–436. Van Lange, Paul A. M., Wilma Otten, Ellen M. N. De Bruin, and Jeffrey A. Joireman. 1997. “Development of Prosocial, Individualistic, and Competitive Orientations: Theory and Preliminary Evidence.” Journal of Personality and Social Psychology 73(4):733–746. Volk, Stefan, Christian Thöni, and Winfried Ruigrok. 2012. “Temporal stability and psychological foundations of cooperation preferences.” Journal of Economic Behavior & Organization 81(2):664–676.

Roger Berger and Bastian Baumeister

Repetition Effects in Laboratory Experiments

Abstract: Subjects of laboratory experiments are often recruited from a subject pool. Such experiments are typically similar in content and demands, so learning processes can be assumed when the same subjects attend two subsequent experimental sessions. We label these "repetition effects", because the learning processes involved differ from those at work when repeated decisions are made within a single experiment. Such repetition effects endanger the validity of laboratory experiments. Repetition effects have a procedural component (subjects get to know the procedures and equipment in the lab) and a social component (subjects learn how to interact within the special population of laboratory subjects). Furthermore, they can result from the self-selection of subjects into repeated experiments. In two pilot studies, we show empirically how repetition effects can undermine the validity of experimental results. For this purpose, the same subjects repeated identical experiments (a cognitive reflection test, a hit game, and a dirty faces game in study 1; a prisoner's dilemma in study 2) several months apart. The results indicate that there were substantial procedural and social learning effects, even when the repeated experiment occurred as much as six months after the initial experiment. Self-selection, however, did not play a major role.

1 Introduction

Laboratory experiments have become increasingly popular in the social sciences. A good part of this trend can be traced back to Andreas Diekmann's tireless promotion of experimental designs in general and laboratory experiments in particular, most notably in German-speaking circles of sociology. Nevertheless, there is hardly any systematic research on the methodological pitfalls of laboratory experiments. For example, there is limited knowledge on population bias effects (Henrich, Heine, and Norenzayan 2010), selection bias effects (Levitt and List 2007a; Levitt and List 2007b) and the effects of non-representative samples (Fiore 2009). We thus pose a methodological question that – to our knowledge – has not been brought up before: what happens if the same subjects are invited to perform the same experiments several times? Why is this relevant? First, this is a question of reliability. Measuring the variables in the same social situation with the same actors should result in the same findings. A lack of reliability would imply that the experiments lacked internal validity. Second, this also compromises their external validity (although external validity might not be the principal goal of laboratory experiments). If, for example, some causal mechanism is found only once in an experimental situation and is not replicable in any repetition within the same population, this mechanism may not be relevant in the real


world where decisions are most often made more than once. Third, many experimental results stem from laboratories that use subject pools. These subjects typically take part in numerous experiments. Most of these experiments share common features such as rooms, experimenters, and payoff procedures, and other features are highly similar, such as the social dilemma situations studied or the number of participants. If there are any effects of repetition, therefore, it is useful to be aware of them in order to control or prevent them. In two initial studies, we explored whether and how prior experience with a given experiment might influence subjects' behavior. In the following theoretical sections, we propose that such effects of repetition might result from three mechanisms: a reduction of cognitive confusion; social learning through updating beliefs about other subjects; and self-selection of subjects into additional experiments. Subsequently, we present empirical results that assess whether these three mechanisms do cause repetition effects. In our conclusion, we point to possible consequences that may arise when experimenters do not account for repetition effects.

2 Theory: Why could there be repetition effects?

Berger (2015) conducted an exploratory series of one-shot prisoner's dilemmas (PD) with varying degrees of anonymity. Because of the one-shot nature of the experimental situation, reputation-building should not influence the decisions. Independent of the degree of anonymity, there is only one Nash equilibrium in the experimental interaction, in which both participants defect. However, a lack of anonymity towards other subjects and/or the experimenters in the laboratory may foster cooperation, because some subjects fear negative sanctions beyond the experimental decision (for example, after the experimental session). It was therefore supposed that, with increasing degrees of anonymity, more actors would choose the Nash equilibrium decision, thus lowering the cooperation rate. When tested empirically, this assumption held true for three of the four anonymity conditions. Surprisingly, however, the highest cooperation rate was found in the experiment with the highest degree of anonymity – a double-blind anonymization procedure which concealed the individual's decision from other subjects and experimenters (see Figure 1). Berger assumes that the complex, difficult-to-comprehend anonymization procedure might have been unsettling for unprepared subjects. If this was the case, the objectively highest degree of anonymity may have been subjectively perceived by the participants to be the lowest, resulting in higher cooperation rates. The subjects may only have realized that their decision had indeed been taken anonymously once the whole experimental session was completed. Based on these considerations, the participants in the "double-blind anonymous" condition were re-invited to another session with the same one-shot PD three months later.
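The claim that mutual defection is the only Nash equilibrium of the one-shot PD can be checked mechanically. The sketch below is our own illustration; the payoff numbers are hypothetical placeholders that merely respect the standard PD ordering T > R > P > S and are not the stakes used by Berger (2015).

def best_response(payoffs, other_choice):
    # Own best response given the other player's choice, from a dict
    # mapping (own, other) -> own payoff.
    return max(("cooperate", "defect"), key=lambda own: payoffs[(own, other_choice)])

# Any payoffs with the prisoner's dilemma ordering T > R > P > S will do.
T, R, P, S = 4, 3, 1, 0
payoffs = {
    ("defect", "cooperate"): T,
    ("cooperate", "cooperate"): R,
    ("defect", "defect"): P,
    ("cooperate", "defect"): S,
}

# Defection is the best response to either choice, so mutual defection is the
# unique Nash equilibrium of the one-shot game, independent of anonymity.
print(best_response(payoffs, "cooperate"), best_response(payoffs, "defect"))  # defect defect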


Notes: DB-A – Double-Blind Anonymous, DB-NA – Double-Blind Non-Anonymous, B-A – Blind Anonymous, B-NA – Blind Non-Anonymous; n = 34 in the repeated experiment (observation 2). Fig. 1: Cooperation rates in one-shot prisoner's dilemma experiments (data source: Berger 2015).

The cooperation rate in this repeated experiment dropped to 8 % (see Figure 1), a value that was in line with the original conjectures. Evidently, between the first experiment and its repetition three months later, something had happened to the subjects that compromised the results. We label this measured difference a repetition effect. The aims of this article are to check whether the repetition effect is robust and to explore possible explanations for it. To this end, we consider three possible explanatory factors for the repetition effect: (1) familiarization with the experimental procedure, and thereby a reduction of subjects' confusion; (2) social learning through updating beliefs about other subjects, and a resulting reduction of subjects' kindness;¹ and (3) self-selected participation in repeated experiments. To our knowledge, there are no studies which systematically investigate repeated experiments with the same participants. However, there are many studies concerned with repeated decisions within one single experiment. If we only consider experiments similar to the PD,² where subjects are matched with a different partner in each round,³ a trend of decreasing cooperation rates can be observed over repeated rounds. This is similar to the effects found in Berger (2015) (also see Andreoni and Croson 2008; Andreoni and Miller 1993; Camerer 2003:45f.; Roth 1995:26ff.). Andreoni (1995) proposes

1 Kindness here is meant in terms of complacency towards other subjects that is said to diminish with experience from previous experiments (see Section 2.2). 2 This includes also classic “public good games” (see Andreoni below), which can be interpreted as a generalization of the prisoner’s dilemma (Levitt and List 2007b:155). 3 Indeed, such experiments are termed “repeated” in contrast to iterated experiments where the partners stay the same over several rounds.


two explanations for this phenomenon: Kindness and Confusion. We suppose that the same factors may be responsible for repetition effects. In the next two sections, we therefore briefly summarize Andreoni’s findings and connect them to our research question.

2.1 Confusion and procedural learning as a cause of repetition effects

Laboratory experiments usually consist of several decisions. To make an informed decision, the subjects must understand the rules and incentives of an experiment. Confusion describes the assumption that some subjects do not immediately understand the equipment, procedures, rules, and incentives of an experiment and are therefore inclined to make the cooperative decision. These participants are not necessarily altruists; rather, they simply do not understand that defection may yield higher payoffs. Andreoni (1995) conducted a series of multi-period, repeated public goods games designed in such a way that it was possible to distinguish whether a subject cooperated due to confusion or because she was an altruist. He showed that about 50 % of all cooperative decisions in these experiments were caused by confusion. According to Andreoni (1995), the pattern of gradually decreasing cooperation rates with continued repetition of the experimental decision is mostly a consequence of decreasing confusion as the subjects learn the experimental procedure. We refer to this as “procedural learning”. Subsequent studies have yielded similar results (e.g., Houser and Kurzban 2002; Ferraro and Vossler 2010). We conclude that the repetition effect could be due to procedural learning occurring between two consecutive experimental sessions.

2.2 Kindness and social learning as a cause of repetition effects

In social dilemma situations like the PD, the collective gain is maximized if both actors mutually cooperate. Yet there is always an individual incentive to defect at the expense of the other actor(s). As soon as multiple or repeated interactions occur, the possibility of learning from the behavior of other actors arises. Such social learning (as we term it here) is also found in Andreoni’s public goods games. He states that some subjects cooperate because they just want to be nice. Andreoni labels this behavior kindness. Kind (or altruistic) subjects are not confused and know that defecting would lead to higher payoffs, but nevertheless prefer to cooperate. It is likely that, at some point, defecting players in repeated social dilemmas will exploit altruistic players. This disenchanting experience may lead kind actors to defect as well, in order to protect themselves from further exploitation.


But do the effects of repeated decisions within a single experiment – whether due to procedural or social learning – carry over to the evidence presented above? Obviously, there is a considerable difference between an experiment that is repeated after several months and decisions that are repeated immediately with the same participants within one session. We next present an explanation for repetition effects that applies only to repeated decisions made in subsequent, repeated experimental sessions: self-selection into the repeated experiment.

2.3 Laboratory experience and self-selection as a cause of repetition effects

Several studies show that experienced subjects contribute less in ordinary public goods games than inexperienced ones (Isaac, Walker, and Thomas 1984; Palfrey and Prisbrey 1997; Ledyard 1995:147). Guillén and Veszteg (2012) find that the usual practice of inviting subjects from pools of potential participants tends to breed so-called “lab rats”. These subjects participate in many experiments and gain higher payoffs on average than inexperienced subjects. It is unclear what causes these tendencies. Palfrey and Prisbrey (1997) suggested that experienced subjects simply make fewer “mistakes” in their decision behavior. According to this argument, confusion and procedural learning are the dominant forces that drive repetition effects. Guillén and Veszteg (2012), on the other hand, show that the decision to participate in another experiment also depends on gender and on profitability in previous experiments: male, money-making subjects have a higher tendency to take part in several experiments. Consequently, the drastic decrease in cooperation rates found by Berger (2015) could partially be due to a selection effect, in the sense that subjects who defected in the first experiment were more likely to take part in the second experiment than subjects who cooperated in the first experiment.

3 Empirical evidence: two pilot studies

Based on the above considerations, we conducted two pilot studies. The objectives were to explore whether repetition effects are robust, whether procedural learning and social learning exist between two experimental sessions that are several months apart, and whether repetition effects can be explained as a consequence of self-selection. Both experimental series were conducted at the “Leipziger Experimentallabor LEx.”⁴

4 http://lex.sozphil.uni-leipzig.de/.


Members of the LEx subject pool were invited to take part; they had been recruited via several means, such as advertisements on the University’s Facebook page, flyers, and personal recruitment in various lectures. Subjects who indicated that they already had lab experience were excluded. The laboratory⁵ itself is situated at the Institute of Sociology, University of Leipzig, Germany. The room has no windows and neutral gray walls. Portable dividers provide the necessary anonymity. We next present the experimental conditions, procedures, and results.

3.1 Experiment 1: procedural learning in repeated experiments

Procedural learning is a likely cause of repetition effects. It can be assumed that subjects are confused when they are presented with an experiment for the first time, but come to understand the task at hand after some time. The goal of this series of experiments was to find out whether such learning occurs between repeated experimental sessions taking place several months apart. To do this, we tracked the individual subjects’ performances over all experimental sessions in which they participated.

3.1.1 Experimental procedure

To separate the effect of procedural learning from social learning, the experiment had to meet two requirements. First, the tasks had to be sufficiently demanding to guarantee a perceptible variance in the subjects’ performances. Second, any effects stemming from social preferences or interactions had to be eliminated. Our experiment therefore consisted of the following three cognitive tasks.

Initially, the subjects were asked to solve the cognitive reflection test (CRT) introduced by Frederick (2005). This test consists of three questions posed in such a way that the first answer that springs to mind is usually incorrect. The following questions were asked:
– A bat and a ball cost 1.10 $ in total. The bat costs 1.00 $ more than the ball. How much does the ball cost?
– If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets?
– In a lake, there is a patch of lily pads. Every day, the patch doubles in size. If it takes 48 days for the patch to cover the entire lake, how long would it take for the patch to cover half of the lake?

In Frederick’s studies, only 17 % of all respondents could answer all three items correctly.
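The intuitive answer to the first item (10 cents) fails because the price difference would then be only 0.90 $. The standard solution, given here as a worked example, follows from combining the two stated conditions:

\[
\text{bat} + \text{ball} = 1.10, \qquad \text{bat} = \text{ball} + 1.00
\;\Rightarrow\; 2\cdot\text{ball} = 0.10
\;\Rightarrow\; \text{ball} = 0.05,
\]

that is, 5 cents rather than 10. The corresponding correct answers to the second and third items are 5 minutes and 47 days.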

5 We used the program “z-Tree” to create and run the experiments (Fischbacher 2007).


Afterwards, the subjects played a so-called Hit Game (HG, sometimes also referred to as a Race Game; see Gneezy, Rustichini, and Vostroknutov 2010). In this game, two players draw balls from an urn, and the player who draws the last ball wins. The total number of balls in the urn, as well as the number of balls each player may draw per turn, varies from round to round. If both players play rationally, the winner is predetermined by these parameters, so one player is always in the winning position. To rule out social preferences and social learning, the subjects’ interaction partner was a specially programmed computer algorithm, and the subjects were aware of this from the beginning. The computer was predetermined to be in the losing position. However, as soon as the subject made a mistake, the algorithm was programmed to exploit this and win the round. Seven rounds of increasing difficulty were played.

The experiment concluded with the Dirty Faces Game (DFG), which originates from Littlewood (1953). Each player is assigned one of two types, either X or O. The players know the types of all other players in the game, but not their own. Furthermore, there is always at least one X player, and this information is publicly announced. Each round consists of several turns, and in each turn the players publicly proclaim which type they think they are (X, O, or unknown). As soon as they state a type other than “unknown”, the round is over for them. The goal is to deduce one’s own type correctly and as quickly as possible, using the information gathered from the other players (Weber 2001). The game is designed in such a way that choosing “unknown” until the end of the round leads, on average, to higher payoffs than guessing a type, so players who do not understand the game have no rational incentive to “gamble”. After reading a comprehensive introduction to the game,⁶ the subjects played seven increasingly difficult rounds with a computer algorithm.⁷ As with the HG, the subjects knew from the outset that they were playing against an algorithm that would always decide rationally.

All three tasks yielded monetary payoffs depending on the number of correctly solved subtasks. The show-up fee was 5.00 €, and the subjects could gain up to an additional 6.50 €, leading to a possible maximum payoff of 11.50 €.

The procedural learning experiment was conducted with two samples, S1 and S2. Subjects of S1 were invited to the experiment on three occasions, with a period of three months between the observations. Subjects of S2 were invited twice, with a period of six months between the observations (see Table 1). They were not made aware in advance that they would play the same games with the same payoffs. Our goal was to determine whether procedural learning processes occurred between the observations, how strong they were, and how they differed from task to task.

6 Containing a printed introduction with examples and a quiz.
7 Subjects also played the DFG with human co-players in the same experimental session (see Grehl and Tutić 2015). These results are not shown as they are not relevant to our research question.


Tab. 1: Design of the procedural learning experiments.

                 T                T + 3 months    T + 6 months
Sample 1 (S1)    O1 (Mar 2014)    O2              O3
n                112              66              34
Sample 2 (S2)    O1 (Nov 2013)                    O2
n                61                               34

Notes: O1 – Observation 1, O2 – Observation 2, O3 – Observation 3.

We also wanted to investigate how the repetition effects differed between S1 and S2. We assumed that the longer hiatus between observations in S2 would lead to larger memory gaps and therefore to smaller repetition effects. Finally, we wanted to ascertain whether cognitive learning effects persist if the subjects repeat the experiment twice.

3.1.2 Results

Figure 2 shows the average scores of each observation. Only subjects who participated in all the observations of their respective sample are included. The score difference between the observations is thus a pure learning effect resulting from the repetition of the experiment. These differences in the average scores are only significant for the CRT and the DFG (Sample 2 only). Nonetheless, subjects seemed to learn between the observations, even though they did not know that they would have to use their knowledge again several months after the original experiment.

The strongest increase in average scores occurred between the observations of the CRT (see Figure 2). We suspected that the low difficulty of the questions and the negative correlation between the subjects’ impulsiveness and the CRT score (see Frederick 2005) drove subjects with low scores to “just take their time” in the next observation. The data support this assumption: subjects who answered item 1 incorrectly during O1 did indeed take about 20 seconds longer to solve the item during O2 than other participants. A similar effect can be found for item 2, but not for item 3.

On the other hand, the learning effect between any two observations of the Hit Game is relatively small and insignificant. While there is a simple solution to the game, subjects with no experience in these types of games typically have no time during a laboratory experiment to deduce this algorithm. They thus rely on their intuitive understanding of the game and their skill in backwards induction. Grehl and Tutić (2015) show that most subjects can reliably predict one move of their opponent, but do not substantially outperform random guessing in situations where they have to predict two or more moves. We assume that backwards induction is a skill that is relatively hard to improve in a short time span. This should explain the only slight learning effects observed in our series of experiments.
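To illustrate why the HG has a predetermined winner under rational play, the following minimal sketch computes a winning move by backward induction for a generic variant in which each player removes between 1 and k balls per turn and the player who takes the last ball wins. This parameterization is an assumption for illustration; it is not necessarily the exact one used in our sessions, and the code is not part of the experimental software.

```python
# Backward-induction sketch for a simple Hit Game variant: an urn with
# n balls, each turn a player removes between 1 and k balls, and the
# player who removes the last ball wins.
from functools import lru_cache

def winning_move(n: int, k: int):
    """Return a draw that forces a win for the player to move,
    or None if every draw leaves the opponent in a winning position."""
    @lru_cache(maxsize=None)
    def is_winning(balls: int) -> bool:
        # A position is winning if some legal draw leaves the opponent in a
        # losing position; with 0 balls left, the player to move has already
        # lost, because the opponent took the last ball.
        return any(not is_winning(balls - d) for d in range(1, min(k, balls) + 1))

    for d in range(1, min(k, n) + 1):
        if not is_winning(n - d):
            return d
    return None

# Example: with 13 balls and draws of 1-3 balls, the player to move wins by
# drawing 1 ball first (positions that are multiples of k+1 are lost).
print(winning_move(13, 3))  # -> 1
```

In this variant the losing positions are exactly the multiples of k+1, which is the kind of rule a subject would have to deduce on the spot; the backward-induction recursion above makes explicit how many steps of look-ahead that requires.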


[Figure 2: average relative scores (0–1) in the Cognitive Reflection Test, the Hit Game, and the Dirty Faces Game; panel (a) Sample 1, Observations 1–3; panel (b) Sample 2, Observations 1–2; n = 34 in each sample.]

Notes: Only participants who attended all observations are included. Confidence intervals: 90 %.

Fig. 2: Average relative scores in the procedural learning experiment.

While the learning effects between the instances of the DFG are smaller than for the CRT (see Figure 2), they are significant for sample 2. We had not expected that the subjects would learn the DFG more easily than the HG; in fact, some of our colleagues perceived the DFG to be the more difficult game. However, it should be noted that the participants received more detailed training and preparation for the DFG than the short instructions provided for the Hit Game. Furthermore, the DFG was played twice (with computer players and with human players: see footnote 7), whereas the HG was only played once. Unfortunately, it remains unknown how the learning effects for the HG would have played out if the subjects had received training similar to that for the DFG. Nevertheless, we find measurable learning effects between observations in both samples.


They seem to be stronger for relatively simple tasks, and training appears to have had a positive influence. In addition, the learning effect is still observable and strong after two repetitions.

There are, however, some unexpected results when we compare the learning effects of both samples. We initially assumed that S2, with only one repetition after six months, would show smaller score differences between the two observations than S1, where the repetition took place after three months. Interestingly, the opposite is the case. Although the time gap between O1 and O2 was twice as long for S2 as for S1, the subjects of S2 still showed stronger learning effects in all three tasks.⁸ We suppose that this could be the consequence of a sampling bias. Ideally, the first observation of both samples should have been conducted at the same time, but this was not possible for practical reasons. It could be that, for some undetermined reason, S2 consisted of “better learners” than S1. Tables 4 and 5 (see below) show that the subjects of S1 are, in fact, on average one year younger than the subjects of S2. This suggests that both samples may differ significantly in their composition, which is a flaw in our study. However, the fact that two samples drawn from the same subject pool at slightly different times differ in age shows that the reliability of laboratory experimental results cannot be taken for granted. Likewise, this result raises important questions about methodologically uninformed comparisons of laboratory results from different subject pools.

Overall, this shows that the repetition effect may be explained by procedural learning between two subsequent experiments several months apart. This learning must happen unwittingly, as the subjects did not originally know that they would have to solve the tasks again at a later date. Astonishingly, this was even true when there was a period of about half a year between the two experiments. We must nevertheless concede that most of these results are not statistically significant. It is in the nature of repeated experiments that only some of the subjects can be invited twice or more. Only about 60 % of the participants in each observation of the first study returned for the next session (see Tables 4 and 5 below).⁹ In the second study (see below) this figure dropped further, to only 33 % of the participants (see Table 6 below).

3.2 Experiment 2: social learning in repeated experiments

In a second series of experiments, we tested the robustness of Berger’s (2015) results with the one-shot PD. In contrast to his experiment, however, we avoided any procedural difficulties that might stem from the anonymization procedure, in order to prevent confounding social learning with procedural learning.

8 These differences are significant for the DFG.
9 See the Appendix for a more detailed analysis of the subjects’ learning process.


3.2.1 Experimental procedure

We chose the following anonymization procedure: all subjects were assembled in the laboratory, with dividers concealing each subject’s decisions from the others. The experimenters kept a record of all participants, and the participants knew this. The subjects then received instructions and played a practice session, followed by a payoff session, against different opponents. After a concluding questionnaire, each participant received her payoff and left the room one by one. The payoffs were hidden in envelopes so that the person handing over the payoff could not connect it to the receiving subject. We thus used an anonymity procedure comparable to Berger’s (2015) “blind non-anonymous” condition. The payoff matrix of the PD used is presented in Table 2.

Tab. 2: Prisoner’s dilemma (payoffs in Euros).

        C        D
C       9, 9     5, 13
D       13, 5    7, 7
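A quick check of these payoffs confirms the dilemma structure described in Section 2: defection strictly dominates cooperation for both players, since

\[
u(D \mid \text{partner plays } C) = 13 > 9 = u(C \mid C)
\qquad\text{and}\qquad
u(D \mid \text{partner plays } D) = 7 > 5 = u(C \mid D),
\]

so mutual defection with payoffs (7, 7) is the unique Nash equilibrium, even though mutual cooperation with (9, 9) would leave both players better off.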

The social learning experiment was conducted with one sample. The initial observation (O1) took place in July 2015. After three and ten months, respectively, the subjects received another invitation to the same experiment. They did not know in advance that they would play the same game with the same payoffs. Since only eight subjects participated in the final observation (O3), the following analysis focuses on observations O1 and O2. Table 3 shows the experimental design.

Tab. 3: Design of the social learning experiments.

            T                T + 3 months    T + 10 months
Sample 1    O1 (Jul 2015)    O2              O3
n           73               24              8

Notes: O1 – Observation 1, O2 – Observation 2, O3 – Observation 3.

3.2.2 Results

Figure 3 displays the cooperation rates in the one-shot prisoner’s dilemma for observations O1 and O2.


[Figure 3: bar chart of cooperation rates in the practice round and the payoff round, for Observation 1 and Observation 2 (n = 24).]

Notes: Only participants who attended all observations are included. Only the decision made during the payoff round was relevant for each subject’s monetary gain. Confidence intervals: 90 %.

Fig. 3: Cooperation rates in a one-shot prisoner’s dilemma experiment (Observation 1) and the repeated experiment (Observation 2).

Only the 24 subjects who took part in both experiments are included (see Table 6 below). In addition, the cooperation rate is plotted for the corresponding practice rounds without monetary payoff. There is a substantial repetition effect: while there is no repetition effect at all for the unpaid practice round, the cooperation rate in the repeated paid session is about 0.167 below the value of the first observation (see Figure 3). It is conceivable that the subjects simply chose an option at random during the practice round, since it was of no relevance for their monetary outcome.

[Figure 4: bar chart of defection rates at Observation 2, by Ego’s own decision in O1 (cooperated/defected) and by Alter’s decision in O1 (cooperated/defected); n = 24.]

Notes: O1 – Observation 1, O2 – Observation 2. Confidence intervals: 90 %.

Fig. 4: Defection rate at Observation 2 as a consequence of the subject’s defection experience during Observation 1.


In O3, all eight remaining participants defected, continuing the trend from the first to the second observation. In Section 2.2 above, we suggested that this learning effect in the second experiment could be due to a disenchanting experience in the first experiment. Figure 4 shows that the opponent’s (Alter’s) decision during O1 has no influence on the O2 decision of subjects (Ego) who defected in O1. However, O1-cooperators who were exploited by defectors in O1 have a much higher inclination to defect during O2 than O1-cooperators who met a cooperative partner. In O3, only one subject who had cooperated in O2 remained, and this subject ultimately defected too. While the results point in the expected direction (subjects seem to update their beliefs about the other subjects in the laboratory experiment), none of the measured differences are statistically significant. This comes as no surprise given the low case numbers (especially for the last observation), which lead to insufficient statistical power.
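The conditional rates in Figure 4 amount to a simple cross-tabulation of the O2 decision against the subject’s own and the partner’s O1 decision. The sketch below illustrates this computation; the file and variable names (ego_o1, alter_o1, ego_o2) are hypothetical assumptions and not the authors’ analysis code.

```python
# Illustrative sketch of the conditional defection rates shown in Figure 4;
# file and variable names are hypothetical.
import pandas as pd

df = pd.read_csv("pd_repeated_sessions.csv")  # one row per subject observed in O1 and O2

# O2 defection rate by the subject's own O1 decision
print(df.groupby("ego_o1")["ego_o2"].apply(lambda s: (s == "defect").mean()))

# O2 defection rate of former O1-cooperators, by the partner's O1 decision
coop = df[df["ego_o1"] == "cooperate"]
print(coop.groupby("alter_o1")["ego_o2"].apply(lambda s: (s == "defect").mean()))
```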

3.3 Selection effects in repeated experiments

The third explanation for repetition effects is the self-selection of successful subjects into the repeated experiments. We present the evidence on this explanation in Tables 4–6, which display indicators of success and the distribution of age and gender for each experiment. In contrast to Guillén and Veszteg (2012), we find hardly any evidence of self-selection into the repeated experiments. In the experiments involving cognitive tasks, participants who decided to visit the laboratory again did not achieve higher scores – and therefore higher profits – in O1 than participants who did not return.

Tab. 4: Descriptive overview of procedure learning experiments, sample 1.

              O1                                               O2
              Participated   Participated   Difference        Participated only    Participated in      Difference
              only in O1     in O1 and O2   (p-Values)        in (O1 and) O2       (O1,) O2 and O3      (p-Values)
%-CRT         45             47             −2 (0.78)         69                   64                   5 (0.58)
%-Hit-Game    36             36             0 (0.90)          38                   37                   1 (0.81)
%-DF-Game     39             36             3 (0.54)          45                   41                   4 (0.48)
%-Male        35             30             5 (0.62)          34                   26                   8 (0.49)
Age           22.7           23.0           −0.3 (0.57)       23.1                 23.3                 −0.2 (0.83)
n             46             66                               32                   34
Total n       112                                             66

Notes: Displayed are the percentages of correctly solved tasks in the Cognitive Reflection Test (CRT), the Hit-Game, and the Dirty-Faces-Game (DF-Game), the percentage of male subjects, the average age, and the case numbers. p-values in parentheses. O1 – Observation 1, O2 – Observation 2, O3 – Observation 3.


Tab. 5: Descriptive overview of procedure learning experiments, sample 2.

              O1
              Participated   Participated   Difference
              only in O1     in O1 and O2   (p-Values)
%-CRT         54             46             8 (0.34)
%-Hit-Game    34             39             −5 (0.30)
%-DF-Game     34             35             −1 (0.83)
%-Male        26             35             −9 (0.43)
Age           24.4           24.3           0.1 (0.97)
n             27              34
Total n       61

Notes: Displayed are the percentages of correctly solved tasks in the Cognitive Reflection Test (CRT), the Hit-Game, and the Dirty-Faces-Game (DF-Game), the percentage of male subjects, the average age, and the case numbers. p-values in parentheses. O1 – Observation 1, O2 – Observation 2.

Tab. 6: Descriptive overview of the one-shot prisoner’s dilemma experiments.

                            O1                                              O2
                            Participated   Participated   Difference       Participated only   Participated in    Difference
                            only in O1     in O1 and O2   (p-Values)       in (O1 and) O2      (O1,) O2 and O3    (p-Values)
%-Cooperation (payoff)      44             42             2 (0.87)         31                  13                 18 (0.32)
%-Cooperation (practice)    54             63             −9 (0.50)        63                  63                 0 (1.00)
%-Male                      33             38             −5 (0.73)        31                  50                 −19 (0.37)
Age                         25.4           25.7           −0.3 (0.79)      25.3                27.3               −2 (0.34)
Profit in €                 8.5            8.4            −0.1 (0.92)      8.0                 8.0                0 (1.00)
n                           48             24                              16                  8
Total n                     72                                             24

Notes: Displayed are the percentage of cooperation, the percentage of male subjects, the average age, and the case numbers. p-values in parentheses. O1 – Observation 1, O2 – Observation 2, O3 – Observation 3.

There is also no significant difference in the gender or age distribution between those who participated only once and those who returned. This finding also holds for the experiment involving social interaction. However, Guillén and Veszteg (2012) used a much longer series of experiments with a much higher number of subjects.¹⁰ It is possible that their effects are the result of this larger number and longer timespan of observations.

10 Guillén and Veszteg (2012) observed more than 2000 subjects over a period of three years.


Our results from the last observation of the PD experiments (O3) hint at this explanation: in O3, the share of male subjects is the highest of all observations, and these subjects had also been successful in the previous experiments insofar as they had not been exploited. Only one of the eight subjects who participated in O3 had been cooperative in O2 (and she, too, turned to defection in O3). Nevertheless, we conclude that the repetition effects in our study are not the result of statistically significant self-selection of subjects into repeated experiments.

4 Conclusion

The first objective of our analysis was to determine whether prior experience with a given experiment influences the behavior of subjects in a laboratory setting. This question can be answered in the affirmative. In accordance with previous evidence (Berger 2015), we uncovered hints of a repetition effect in a one-shot PD that was repeated after several months. In addition, we found substantial repetition effects in experiments on cognitive abilities that were repeated after three and six months respectively.

The second objective was to explore three possible explanations for this repetition effect: procedural learning, social learning, and self-selection of successful subjects into repeated experiments. We found evidence of procedural learning by the subjects across two sessions, even when there were six months between the sessions. Furthermore, there was additional learning if the experiment was repeated a second time. We also found some preliminary evidence on how social learning occurs through updated beliefs about the other subjects involved, which is of particular importance in social dilemma experiments: cooperative subjects who were exploited by defectors in the initial experiment turned to defection themselves in the repeated experiment three months later. Both procedural and social learning must occur unwittingly, because the subjects were not aware that they would have to make the same decision again at a later date. We did not, however, find conclusive evidence for the self-selection of successful subjects into repeated experiments. Our findings stand in contrast to those of Guillén and Veszteg (2012), whose observations covered a much longer timespan and involved many more subjects. It is thus possible that self-selection effects are subtle from one experimental session to the next, and only become more distinct in subject pools that involve more experimental sessions per subject.

The pilot studies presented here investigated only a relatively small number of subjects. To confirm our findings, we recommend further research using larger samples. One can nevertheless discuss the relevance of repetition effects for laboratory experiments in the social sciences. Are they actually problematic at all? For instance, it could be argued that having prior experience with a given problem corresponds more closely to what people usually experience in the social realities of everyday life.


A pool of experienced subjects could thus increase the internal, and maybe even the external, validity of laboratory experiments. The fact remains, however, that we know very little about the reliability of results from laboratory experiments, although (partial) repetitions of experiments may often happen accidentally. The prevalence of repetition effects is probably as much a function of the characteristics of the subject pool as of the number of experiments in which its members participate. It is plausible that these attributes vary between experimental laboratories, which could threaten a meaningful and valid comparison of results across laboratories. One could still argue that the presented evidence concerns only the levels of marginal shares in different experiments, and that there is in fact no problem, provided that only treatment effects (for example, the effect of information about the partner on cooperation) are interpreted. However, we do not know whether the reliability and robustness of treatment effects are affected in repeated experiments. As long as we lack satisfactory answers to these questions, further investigation is necessary. Otherwise, repetition effects could reduce the validity and credibility of laboratory experiments as a social-scientific research method.

Appendix

Tables 7 and 8 show OLS regressions with the relative score difference of the corresponding task between two observations as the dependent variable.¹¹ Only one relevant effect is visible in both samples: the individual score on a task in the previous session has a negative effect on the score difference between the two observations. This may seem counterintuitive, but the explanation is simple. We assume that each subject has a natural score limit for any given task.¹² Some persons reach this upper limit relatively quickly. Persons who score higher during O1 cannot increase their skills further, or may even perform worse in later sessions. The “slower learners”, on the other hand, perform worse during O1 but have a higher learning potential in the subsequent observation(s). For S1, we additionally controlled for the influence of the learning effect between O1 and O2. Interestingly, this effect is negative as well. This is another indication of our assumption that most subjects have a single “Eureka!” moment, after which their cognitive performance does not improve.

Table 9 displays the Pearson correlations of the scores on each task between the respective observations. They are always positive, and most values are sizeable and significant. This means that subjects who perform well in comparison with the other participants during O1 also perform comparatively well in later observations. For the HG and the DFG, the standard deviation of the score distribution increases over time. Therefore, cognitively able subjects seem to learn better between two experiments if the task is more challenging. For the CRT, which is the least difficult task, this conclusion does not apply.

11 The small sample size in our study makes complex models (for example with interaction effects) unfeasible. We also tested whether gender, age, knowledge of microeconomics, or the academic subject influenced repetition effects. No relevant effects were found.
12 There is also a limit given by the tasks themselves. Although almost no subjects reach the absolute score limit of the HG and the DFG in any observation, about 20 % of the subjects reach the upper score limit of the CRT in O1.
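To make the setup of Tables 7 and 8 concrete, the following sketch shows how such a score-difference regression could be estimated. The data file and variable names (crt_o1, crt_o2, age, male) are purely illustrative assumptions, not the authors’ actual data or analysis code.

```python
# Illustrative sketch of the score-difference regressions in Tab. 7/8;
# file and variable names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("procedural_learning_sample1.csv")  # one row per subject

# dependent variable: change in the relative CRT score between O1 and O2
df["crt_diff"] = df["crt_o2"] - df["crt_o1"]

# regressors: age, gender, and the score already reached in the previous
# session, rescaled so that one unit equals 10 percentage points
model = smf.ols("crt_diff ~ age + male + I(crt_o1 * 10)", data=df).fit()
print(model.summary())
```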


Tab. 7: Explanation of cognitive learning effects in sample 1: OLS-regression on relative score difference.

                                      Procedural learning effect                           Procedural learning effect
                                      between O1 and O2                                    between O2 and O3
                                      CRT              Hit              DF                 CRT              Hit              DF
Age in years                          0.006 (0.68)     −0.005 (0.53)    0.006 (0.82)       −0.013 (0.42)    −0.001 (0.97)    0.007 (0.64)
Gender (1 = male)                     0.067 (0.43)     0.038 (0.39)     0.032 (0.48)       0.049 (0.63)     0.082 (0.29)     0.015 (0.88)
% of task solved in previous
session (1 = 10 %)                    −0.042 (0.00)∗∗  −0.049 (0.00)∗∗  −0.021 (0.24)∗     −0.024 (0.05)∗   −0.065 (0.00)∗∗  −0.010 (0.56)
Learning effect between O1
and O2 (1 = 10 %)                                                                          −0.023 (0.06)+   −0.028 (0.31)    −0.064 (0.05)∗
Cons.                                 0.233 (0.48)     0.289 (0.00)∗∗   −0.024 (0.313)     0.582 (0.12)     0.268 (0.37)     −0.021 (0.953)
n                                     66               66               66                 34               34               34
R2                                    0.24             0.20             0.08               0.34             0.31             0.22

Notes: p-values in parentheses. CRT – Cognitive Reflection Test, HIT – Hit-Game, DF – Dirty-Faces-Game. O1 – Observation 1, O2 – Observation 2, O3 – Observation 3. + p < 0.10, ∗ p < 0.05, ∗∗ p < 0.01.

Tab. 8: Explanation of cognitive learning effects in sample 2: OLS-regression on relative score difference.

                                      Procedural learning effect between O1 and O2
                                      CRT               Hit               DF
Age in years                          0.003 (0.80)      −0.014 (0.11)     −0.003 (0.82)
Gender (1 = male)                     0.060 (0.49)      −0.080 (0.18)     0.172 (0.04)∗
% of task solved in O1 (1 = 10 %)     −0.051 (0.00)∗∗   −0.072 (0.00)∗∗   −0.026 (0.24)
Cons.                                 0.384 (0.21)      0.695 (0.00)∗∗    0.285 (0.313)
n                                     34                34                34
R2                                    0.40              0.47              0.22

Notes: p-values in parentheses. CRT – Cognitive Reflection Test, HIT – Hit-Game, DF – Dirty-Faces-Game. O1 – Observation 1, O2 – Observation 2. + p < 0.10, ∗ p < 0.05, ∗∗ p < 0.01.


Tab. 9: Correlations of scores between different observations of the cognitive learning experiments.

                            Sample 1                                              Sample 2
                            CRT             Hit             DF                    CRT             Hit            DF
Observations O1 and O2      0.624 (0.00)∗∗  0.457 (0.00)∗∗  0.718 (0.00)∗∗        0.600 (0.00)∗∗  0.284 (0.11)   0.449 (0.01)∗∗
Observations O2 and O3      0.752 (0.00)∗∗  0.484 (0.00)∗∗  0.673 (0.00)∗∗
Observations O1 and O3      0.591 (0.00)∗∗  0.212 (0.23)    0.713 (0.00)∗∗
n                           34              34              34                    34              34             34

Notes: p-values in parentheses. CRT – Cognitive Reflection Test, HIT – Hit-Game, DF – Dirty-Faces-Game. + p < 0.10, ∗ p < 0.05, ∗∗ p < 0.01.

Bibliography

[1] Andreoni, James. 1995. “Cooperation in Public-Goods Experiments: Kindness or Confusion?” The American Economic Review 85(4):891–904.
[2] Andreoni, James, and Rachel Croson. 2008. “Partner versus Strangers: Random Rematching in Public Goods Experiments.” Pp. 776–783 in Handbook of Experimental Economics, Results Volume, edited by C. R. Plott, and V. L. Smith. Amsterdam: North-Holland.
[3] Andreoni, James, and John H. Miller. 1993. “Rational Cooperation in Finitely Repeated Prisoner’s Dilemma: Experimental Evidence.” The Economic Journal 103(418):570–585.
[4] Berger, Roger. 2015. “Das Laborexperiment als sozialer Prozess.” Pp. 53–76 in Experimente in den Sozialwissenschaften. Sonderband der Sozialen Welt, edited by M. Keuschnigg, and T. Wolbring. Baden-Baden: Nomos.
[5] Camerer, Colin F. 2003. Behavioral Game Theory: Experiments in Strategic Interaction. Princeton, NJ: Princeton University Press.
[6] Ferraro, Paul J., and Christian A. Vossler. 2010. “The Source and Significance of Confusion in Public Goods Experiments.” The B.E. Journal of Economic Analysis & Policy 10(1):1–42.
[7] Fiore, Annamaria. 2009. “Experimental Economics: Some Methodological Notes.” MPRA Paper No. 12498. Retrieved January 2, 2017 (https://mpra.ub.uni-muenchen.de/12498/).
[8] Fischbacher, Urs. 2007. “z-Tree: Zurich toolbox for ready-made economic experiments.” Experimental Economics 10(2):171–178.
[9] Frederick, Shane. 2005. “Cognitive Reflection and Decision Making.” The Journal of Economic Perspectives 19(4):25–42.
[10] Gneezy, Uri, Aldo Rustichini, and Alexander Vostroknutov. 2010. “Experience and insight in the Race game.” Journal of Economic Behavior and Organization 75(2):144–155.
[11] Grehl, Sascha, and Andreas Tutić. 2015. “Experimental Evidence on Iterated Reasoning in Games.” PLoS ONE 10(8): e0136524. doi:10.1371/journal.pone.0136524.
[12] Guillén, Pablo, and Róbert F. Veszteg. 2012. “On ‘lab rats’.” The Journal of Socio-Economics 41(5):714–720.
[13] Henrich, Joseph, Steven J. Heine, and Ara Norenzayan. 2010. “The Weirdest People in the World?” Behavioral and Brain Sciences 33(2/3):61–135.
[14] Houser, Daniel, and Robert Kurzban. 2002. “Revisiting Kindness and Confusion in Public Goods Experiments.” The American Economic Review 92(4):1062–1069.


[15] Isaac, R. Mark, James Walker, and Susan Thomas. 1984. “Divergent evidence on free riding: An experimental examination of possible explanations?” Public Choice 34(2):113–149.
[16] Ledyard, John O. 1995. “Public Goods: A Survey of Experimental Research.” Pp. 111–194 in The Handbook of Experimental Economics, edited by J. H. Kagel, and A. E. Roth. Princeton, NJ: Princeton University Press.
[17] Levitt, Steven D., and John A. List. 2007a. “Viewpoint: On the Generalizability of Lab Behaviour to the Field.” Canadian Journal of Economics 40(2):347–370.
[18] Levitt, Steven D., and John A. List. 2007b. “What Do Laboratory Experiments Measuring Social Preferences Reveal about the Real World?” Journal of Economic Perspectives 21(2):153–174.
[19] Littlewood, John E. 1953. A Mathematician’s Miscellany. London: Methuen & Co. Ltd.
[20] Palfrey, Thomas R., and Jeffrey E. Prisbrey. 1997. “Anomalous Behavior in Public Goods Experiments: How Much and Why?” The American Economic Review 87(5):829–846.
[21] Roth, Alvin E. 1995. “Introduction to Experimental Economics.” Pp. 3–109 in The Handbook of Experimental Economics, edited by J. H. Kagel, and A. E. Roth. Princeton, NJ: Princeton University Press.
[22] Weber, Roberto A. 2001. “Behavior and Learning in the ‘Dirty Faces’ Game.” Experimental Economics 4(3):229–242.

Notes on the Editors and Contributors Katrin Auspurg is Professor of Quantitative Empirical Research at the Department of Sociology, LMU Munich, Germany. Her research interests include social inequalities, labor market and discrimination research, experimental methods, and survey methodology. Margit Averdijk is a senior research associate at the Zurich Project on the Social Development from Childhood to Adulthood (z-proso) at the Jacobs Center for Productive Youth Development at the University of Zurich, Switzerland. Her research interests include victimization research, victimization over the life-course, child sexual abuse, crime prevention, and social science methodology. Dieko Bakker is a PhD student at the Department of Sociology/ICS at the University of Groningen, the Netherlands. His research interests include social norms and normative conflict, cooperation, altruism, and experimental methodology. Bastian Baumeister is a scientific assistant at the Department of Sociology at the University of Leipzig, Germany. His primary interests are social science statistics, methodology and laboratory experiments. Joël Berger is a postdoctoral researcher at the Institute of Sociology, University of Zurich, Switzerland. His research and teaching interests include social order, social inequality, game theory, and experimental methods. Roger Berger is Professor of Sociology at the University of Leipzig, Germany. His research and teaching interests include social science methodology, game theory, and basics of cooperation. Marcin Bober is a former researcher and a PhD student in the Human-Technology Interaction Group at Eindhoven University of Technology, the Netherlands, where he conducted research on humancomputer interaction, online reputation systems, and social data visualization. Currently, he is a User Experience Lead at Google. Friedel Bolle is Professor Emeritus of Economics, still working on research projects at the EuropaUniversität Viadrina in Frankfurt (Oder), Germany. His research interests are mainly experimental and behavioral economics, but also energy economics and applied game theory. Vincent Buskens is Professor of Sociology at the Department of Sociology/ICS, Utrecht University, the Netherlands. His research interests include sociological theory, game theory, mathematical sociology, experimental sociology, social dilemmas, social networks, and institutions. Elisabeth Coutts was a PhD student under Andreas Diekmann at ETH Zurich, Switzerland. She passed away after a long illness on August 5, 2009. Jacob Dijkstra is Associate Professor of Sociology at the Department of Sociology/ICS at the University of Groningen, the Netherlands. His research interests include experimental methods, game theory, social networks, and formal theory. Manuel Eisner is Professor of Comparative and Developmental Criminology at the University of Cambridge, UK. His research interests include the history of violence, human evolution and psychopathologies, aggression over the life course, cross-cultural comparative analyses of crime, and violence prevention. Hartmut Esser is Professor Emeritus of Sociology and Philosophy of Science at the University of Mannheim, Germany. His research interests include sociological theory, methodology of the so-


cial sciences, theories of action, migration, integration and ethnic conflicts, marital relations, and (currently) educational systems and educational inequality. A volume on Soziologie. Allgemeine Grundlagen (1993) and six volumes on Soziologie. Spezielle Grundlagen (1999–2001) represent the core of his work. Andreas Flache is Professor of Sociology at the Department of Sociology/ICS at the University of Groningen, the Netherlands. His research and teaching focuses on computational modeling, social complexity, opinion dynamics, cooperation, experimental research, and social networks. Axel Franzen is Professor of Sociology at the University of Bern, Switzerland. His research and teaching interests include methods of empirical social research, game theory, and environmental sociology. Thomas Gautschi is Professor of Sociological Methodology at the University of Mannheim, Germany. His research and teaching interests include game theory, network analysis, model building, economic sociology, social science methodology and statistics, and experimental methods. Jean-Louis van Gelder is a senior researcher at the Netherlands Institute for the Study of Crime and Law Enforcement at the University of Amsterdam (NSCR), the Netherlands. His research interests concern the use of innovative methods in crime research, risk perception and behavior, time orientation and self-control, and cognition and affect in criminal decision making. Christiane Gross is Professor of Quantitative Methods in the Social Sciences at the University of Würzburg, Germany. Her research and teaching interests include quantitative methods and social inequality in education, work, and health. Dirk Helbing is Professor of Computational Social Science at the Department of Humanities, Social and Political Sciences and affiliate of the Computer Science Department at ETH Zurich, Switzerland. He earned a PhD in physics at the University of Stuttgart and was Managing Director of the Institute of Transport and Economics at Dresden University of Technology. He is internationally known for his work on pedestrian crowds, vehicle traffic, and agent-based models of social systems. Furthermore, he coordinates the FuturICT Initiative, is an elected member of the German National Academy of Sciences (“Leopoldina”) and worked for the World Economic Forum’s Global Agenda Council on Complex Systems. Thomas Hinz is Professor of Empirical Social Research and Survey Methodology at the Department of Sociology, University of Konstanz, Germany. His research interests include social inequalities and discrimination in markets, labor market research, experimental and survey methods. Marc Höglinger is a research associate at the Winterthur Institute of Health Economics of the Zurich University of Applied Sciences, Winterthur, Switzerland. His main research interests are survey methods, the sociology of health, organizational science, and the sociology of work. Ben Jann is Professor of Sociology at the University of Bern, Switzerland. His research and teaching interests include social stratification and inequality, labor market sociology, social science methodology, and statistics. Monika Jungbauer-Gans is Scientific Director of the German Centre for Higher Education Research and Science Studies and Professor of Higher Education Research and Science Studies at the University of Hannover, Germany. Her research interests are education, the sociology of health, social inequality und diversity, the labor market, stigmatization, and research methods.


Ulf Liebe is Associate Professor at the Institute of Sociology at the University of Bern, Switzerland. His research and teaching interests include theory comparison, environmental sociology, environmental economics, economic sociology, and experimental methods. Siegwart Lindenberg is Professor of Cognitive Sociology at the Departments of Sociology and the Interuniversity Center for Social Science Theory and Methodology (ICS), University of Groningen, and the department of Social Psychology, Tilburg University, both in the Netherlands. He is a member of the Royal Netherlands Academy of Arts and Sciences. His interests lie in the development, testing and application of theories of social rationality that deal with the influence of the social environment on social need fulfillment, norms, cooperative behavior and self-regulation; and in the application of these theories to the explanation of pro- and anti-social behavior and the conditions of joint production. Michael Mäs is Assistant Professor at the Department of Sociology/ICS at the University of Groningen, the Netherlands. His research focuses on social influence and cooperation in social networks, the emergence of social norms and institutions, experimental methods, and the micro-macro problem. Uwe Matzat is Assistant Professor of Sociology in the Human-Technology Interaction Group at Eindhoven University of Technology, the Netherlands. His research and teaching interests include social media design and use, online reputation systems, methods of online data collection, the social consequences of online technologies, and social network analysis. Nynke van Miltenburg currently works outside academia. Her research interests include social dilemmas, analytical sociology, and experimental methods. Ulrich Mueller is Professor Emeritus for Medical Sociology and Social Medicine at the Medical School, Philipps-University Marburg, and presently heads the Mortality-Follow-Up team of the German National Cohort, hosted by the Federal Institute for Population Research in Wiesbaden, Germany. Ryan O. Murphy is the Director of Behavioral Science at Morningstar Inc. and a visiting professor at the University of Zürich, Switzerland. Previously he was the Chair of Decision Theory and Behavioral Game Theory at the ETH Zürich. His research and teaching interests are in the area of behavioral economics and cognitive psychology, and his recent work is related to measuring people’s preferences, and modeling psychological/structural factors in strategic interactions. Aja Louise Murray is a research associate working on the Zurich Project on the Social Development from Childhood to Adulthood at the University of Cambridge, UK. Her main research interests are in the developmental aspects of mental health, especially attention-deficit/hyperactivity disorder, autism spectrum disorders, and aggression. Heinrich Nax is a senior scientist at Computational Social Sciences at ETH Zurich, Switzerland. His research and teaching interests involve game theory, particularly behavioral, experimental and evolutionary game theory. Natascha Nisic is Junior Professor of Economic Sociology at the University of Hamburg, Germany. Her research and teaching interests include economic and labor market sociology, family sociology, social stratification and inequality, and social science methods. Karl-Dieter Opp is Professor Emeritus at the University of Leipzig, Germany, and Affiliate Professor at the University of Washington (Seattle). 
His fields of interest are social theory, political participation, social norms and institutions, and the philosophy of the social sciences. Margit E. Oswald is Professor Emeritus of Social Psychology and Legal Psychology at the University of Bern, Switzerland. Her main research interests include rationality and biases of human information

570 | Notes on the Editors and Contributors

processing, treatment and punishment of deviancy by lay people and professionals, aggression, social justice and conflict resolutions, and the development of social stereotypes and prejudices. Peter Preisendörfer is Professor of Sociology at the Johannes Gutenberg-University Mainz, Germany. His research and teaching interests include environmental sociology, the sociology of organizations, entrepreneurship, and the quantitative methods of social research. Wojtek Przepiorka is Assistant Professor at the Department of Sociology/ICS at Utrecht University, the Netherlands. His research and teaching interests include analytical sociology, economic sociology, game theory, organizational behavior, and experimental methods. Werner Raub is Professor of Sociology at Utrecht University, the Netherlands, and at the Interuniversity Center for Social Science Theory and Methodology (ICS). His research and teaching cover a variety of areas in theoretical sociology, applications of mathematical models in sociology, organizational behavior, experimental research, and topics on the interface of analytical social science and philosophy. Heiko Rauhut is Associate Professor of Social Theory and Quantitative Methods at the Institute of Sociology at the University of Zurich, Switzerland. His research and teaching interests include social norms, cooperation, social networks, the sociology of science, analytical sociology, game theory, experimental methods, and applied statistics. Denis Ribeaud is scientific project coordinator of the Zurich Project on the Social Development from Childhood to Adulthood (z-proso) at the Jacobs Center for Productive Youth Development of the University of Zurich. His research interests include aggression and delinquency over the life course, trends in youth violence, crime prevention, dating violence, and mechanisms of moral neutralization and self-control. Chris Snijders is Professor of the Sociology of Technology and Innovation at Eindhoven University of Technology, the Netherlands. His research interests include human and computer-based decisionmaking, online behavior and measurement, human-data interaction, and behavioral research methods. Andreas Tutic is a senior researcher at the Institute of Sociology at the University of Leipzig, Germany. His research and teaching interests include action theory, mathematical sociology, and experimental social science. Corina T. Ulshöfer is a research associate in evaluation for the educational department of the Canton of Bern in Switzerland. She was previously a research assistant in social psychology at the University of Bern, and wrote her PhD thesis on information processing under trust and distrust. Manuela Vieth has been teaching and conducting research mainly in the fields of social norms and motivations, models of social interactions and processes, and experimental methods. Thomas Voss is Professor of Sociology at the University of Leipzig, Germany. His research interests include rational choice and game theory, philosophy of social science, and economic sociology. Jeroen Weesie is Associate Professor of Mathematical Sociology at the Department of Sociology/ICS at Utrecht University, the Netherlands. His research and teaching activities include formal models of social interactions and processes (including game theory), social norms and motivations, mechanism design, social networks, organizational behavior, and statistics. Stefan Wehrli is the laboratory manager at the Decision Science Laboratory at ETH Zurich, Switzerland. 
His research interests include experimental and survey methodology, social networks, and computational social science.

Notes on the Editors and Contributors

| 571

Fabian Winter is head of the Max Planck Research Group “Mechanisms of Normative Change” at the Max-Planck-Institute for Research in Collective Goods in Bonn, Germany. He is interested in social norms, game theory, economic sociology, causal inference, Big Data, and experimental methods. Rolf Ziegler is Professor Emeritus at the University of Munich, Germany. His research and teaching interests include the analysis of social networks; the survival and success of newly-founded enterprises; mathematical models in the social sciences; and norms, social order and rational choice. He is a member of the Bavarian Academy of science and “Leopoldina” (the German National Academy of Sciences). Important publications include The Kula Ring of Bronislaw Malinowski. A Simulation Model of the Co-Evolution of an Economic and Ceremonial Exchange System (2007); Der Erfolg neugegründeter Betriebe (3rd edition 2007); Networks of Corporate Power. A Comparative Analysis of Ten Countries (1985); and Theorie und Modell. Der Beitrag der Formalisierung zur soziologischen Theorienbildung (1972).